Spectral reflectance recovery using optimal illuminations

Open Access

Abstract

The spectral reflectance of objects provides intrinsic information on material properties that has been proven beneficial in a diverse range of applications, e.g., remote sensing, agriculture, and diagnostic medicine. Existing methods for spectral reflectance recovery from RGB or monochromatic images either ignore the effect of the illumination or implement/optimize the illumination under the linear representation assumption of the spectral reflectance. In this paper, we present a simple and efficient convolutional neural network (CNN)-based spectral reflectance recovery method with optimal illuminations. Specifically, we design an illumination optimization layer to optimally multiplex illumination spectra in a given dataset or to design optimal ones under physical restrictions. Meanwhile, we develop a nonlinear representation for the spectral reflectance in a data-driven way and jointly optimize the illuminations under this representation in a CNN-based end-to-end architecture. Experimental results on both synthetic and real data show that our method outperforms state-of-the-art methods and verify the advantages of the deeply optimized illumination and the nonlinear representation of the spectral reflectance.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

The spectral reflectance of objects contains underlying characteristic and physical properties and has been proven beneficial to numerous applications, including object classification [1,2], remote sensing [3–5], medical diagnosis [6–8], image relighting [9,10], and so on.

Hyperspectral cameras are widely used for the spectral acquisition of a scene; they densely sample the spectral signature across wavelength bands for each point of the scene. The obtained intensity of the pixel located at $(x,y)$ for each band can be described as

$$p(x,y,\lambda) = h(\lambda)l(\lambda)s(x,y,\lambda),$$
where $h(\lambda )$ is the spectral response of the hyperspectral camera, $l(\lambda )$ is the incoming illumination, and $s(x,y,\lambda )$ represents the spectral reflectance at wavelength $\lambda$ and location $(x,y)$. When the spectral response $h(\lambda )$ and the illumination $l(\lambda )$ are calibrated in advance and $p(x,y,\lambda )$ is measured by the hyperspectral camera, the spectral reflectance $s(x,y,\lambda )$ can be directly calculated from Eq. (1).

Whiskbroom and pushbroom systems [11,12] are widely used for hyperspectral imaging; they capture the scene by scanning it point by point or line by line. To capture dynamic scenes, snapshot hyperspectral cameras [13–16] have been designed that multiplex the spectra onto a 2D spatial sensor. However, such hyperspectral acquisition devices are typically complex, and only a handful of researchers can access them.

To obtain the spectral reflectance of a scene with the cameras in our daily life, spectral recovery from RGB images has drawn much attention. Quite a few researchers reconstruct the spectral reflectance [17–19] or the product of the spectral reflectance and the illumination [20–26], i.e., $l(\lambda )s(\lambda )$, from a single RGB image. However, the three-to-many mapping between an RGB image and the corresponding spectral image is a severely under-determined problem, so these methods suffer from high errors due to the limited scene information, especially under unknown illumination. Thus, they can hardly satisfy high demands on spectral accuracy.

To capture more information about a scene, some researchers employ multiple images captured by conventional RGB or monochrome cameras under multiple known filters [27,28] or illuminations [29–32] to recover high quality spectral reflectance. Yamaguchi et al. [27] rotated multiple filters and Chakrabarti et al. [28] put tunable filters in front of a monochrome camera to sample the spectral information with multiple images. Besides, Park et al. [29] estimated the spectral reflectance through multiplexed illumination. Han et al. [31] recovered spectral reflectance by exploiting the unique color-forming mechanism of a digital light processing projector. Chi et al. [30] obtained spectral reflectance by optimizing wide-band filtered illumination based on an active lighting approach. Lam et al. [32] employed optimal basis lights for spectral reflectance recovery. Later, Oh et al. [10] used multiple images captured by different consumer-level digital cameras to reconstruct spectral images.

All these methods for spectral reflectance recovery from multiple images assume that the spectral reflectance can be linearly represented with a small number of basis functions (e.g., PCA and [33]) and recover the spectral reflectance with these bases. On the basis of this linear assumption, [29] found the optimal multiplexed illuminations in a brute-force way, [30] selected optimized wide-band filters in front of the illuminant by minimizing their condition number, and [32] employed the orthogonal representation of the spectral reflectance bases as the illuminations, which in theory reduces the condition number of the illumination spectra to one and yields the most effective lights under the linear representation. However, real-world spectral reflectance is much more complex and does not meet the linear assumption [21,25,26], so optimizing the illumination under this assumption is not well founded and cannot guarantee optimality.

In this paper, we build a hyperspectral imaging system utilizing multiple input images captured under a set of optimal lights, as shown in Fig. 1. In particular, we learn a simple and efficient convolutional neural network (CNN) to recover spectral reflectance with illumination optimization. Our model can simultaneously learn the optimal illumination spectra and a nonlinear representation for the spectral reflectance recovery through an end-to-end network. Specifically, we design illumination optimization layers to optimally multiplex illuminations in a given dataset, or to design optimal ones under some physical restrictions. These optimization layers are then jointly learned with the spectral recovery in a CNN-based architecture. Experimental results on synthetic and real images show the superior performance of our method both in simulation and in a real capture system.

Fig. 1. Overview of the proposed spectral reflectance imaging system. First, images under multiple optimal illuminations are captured by RGB/monochrome cameras. Then the learned network recovers spectral reflectance from these captured images.

In summary, our main contributions are as follows:

  • We effectively model optimal illumination as the optimization of convolution layers.
  • We develop the nonlinear representation for the spectral reflectance in a data-driven way and jointly optimize the illumination under this representation.
  • We implement the proposed method on multiple RGB/monochromatic images and discuss the smallest number of RGB/monochromatic images needed for high quality spectral reflectance recovery.
  • We set up a real spectral reflectance capture system and achieve promising results in practice.

2. Illumination optimization and spectral reflectance recovery

In this section, we first formulate the problem. Then, the illumination optimization layer and the spectral reflectance recovery network are presented. Finally, we provide the learning details.

2.1 Formulation

When taking an image of a reflective scene using an ordinary RGB or monochromatic camera, the intensity of each pixel for the $p$-th channel under the $m$-th illumination is

$$Y_{p,m}(x,y) = \int c_p(\lambda)l_{m}(\lambda)s(x,y,\lambda)d\lambda,$$
where $c_p(\lambda )$ is the corresponding camera spectral response (CSR) function. In practice, Eq. (2) can be discretized across wavelength and written as
$$Y_{p,m}(x,y) = \sum_{b=1}^{B}c_p(\lambda_{b})l_{m}(\lambda_{b})s(x,y,\lambda_{b}),$$
where $\lambda _{b}$ $(b = 1, 2, \ldots , B)$ is the discrete representation of wavelength $\lambda$ and $B$ is the number of spectral bands.
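For concreteness, the following is a minimal NumPy sketch of the discretized imaging model in Eq. (3), assuming the camera spectral response, the illumination spectra, and a reflectance cube are given as arrays; the function and variable names are illustrative and not taken from the authors' code.

import numpy as np

def render_measurements(csr, lights, refl):
    """Simulate the images Y_{p,m} captured under each illumination, Eq. (3).

    csr:    (P, B) camera spectral response for P channels and B bands.
    lights: (M, B) illumination spectra.
    refl:   (N, O, B) spectral reflectance cube of the scene.
    """
    # Band-wise product of illumination and reflectance at every pixel,
    # then integration over bands weighted by the camera response.
    radiance = refl[None, ...] * lights[:, None, None, :]   # (M, N, O, B)
    return np.einsum('mnob,pb->mnop', radiance, csr)        # (M, N, O, P)

# Toy example: 31 bands, an RGB response, two illuminations, a 4x4 scene.
csr = np.random.rand(3, 31)
lights = np.random.rand(2, 31)
refl = np.random.rand(4, 4, 31)
Y = render_measurements(csr, lights, refl)   # shape (2, 4, 4, 3)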

The image $\{Y_{p,m}(x,y)\}$ under the $m$-th illumination can be simply denoted as $\mathbf {Y}_{m} \in \mathbb {R}^{N\times O \times P}$. $N$ and $O$ are the number of rows and columns, respectively. $P$ is the number of captured channels, and $P=3$ for the RGB image and $P=1$ for the monochromatic image. When the scene is captured under $M$ illuminations, the full captured data for a scene can be expressed as

$$\mathbf{Z} = [\mathbf{Y}_{1};\mathbf{Y}_{2};\cdots;\mathbf{Y}_{M} ].$$
It can be clearly observed from Eq. (3) that the radiance of a scene captured by an RGB or monochromatic camera is determined by three aspects, i.e., the CSR, the illumination, and the scene's intrinsic property, namely the spectral reflectance. To obtain the spectral reflectance with high fidelity, the effect of the illumination should be investigated for a given camera. In this work, we effectively model optimal illumination as a CNN optimization layer and combine it with a simple spectral network to recover high quality spectral reflectance under a nonlinear representation. The overall framework of the proposed method is shown in Fig. 2.

Fig. 2. Overview of our CNN-based method which jointly learns the optimal illumination and the spectral reflectance recovery. The black arrow represents the training stage and the red arrow denotes the testing stage.

2.2 Illumination optimization

We present two approaches to obtain the optimal illumination. For existing light sources (e.g., various LEDs [34]), we design a multiplex convolution layer to retrieve the optimal illumination combinations. Beyond this optimal multiplexing, the optimal illumination design is further learned under some physical restrictions by modeling the illumination as a convolution layer; the designed lights can be provided by the Nikon Equalized Light Source (ELS) [32].

2.2.1 Illumination multiplexing

Here, we describe how to design a multiplex convolution layer to retrieve the optimal illumination combinations for existing light sources (e.g., various LEDs [34]).

To optimally multiplex illumination from existing light sources, we first synthesize spectral images for each scene under the different illuminations in a candidate dataset. According to Eq. (3), each illumination is multiplied element-wise with the spectral reflectance along the spectral bands and can therefore be regarded as the weights in a $1 \times 1$ depth-wise convolution kernel. Thus, the $t$-th scene under the $j$-th candidate illumination from existing light sources can be described as

$$\mathbf{X}_{j,t} = \boldsymbol{l}_{j}*\mathbf{S}_{t},$$
where $\boldsymbol {l}_{j} = [l_{j}(\lambda _{1}),l_{j}(\lambda _{2}),\ldots ,l_{j}(\lambda _{B})] \in \mathbb {R}^{1\times 1\times B}$ is the $j$-th illumination spectrum, and $\mathbf {S}_{t} \in \mathbb {R}^{N\times O\times B}$ denotes the spectral reflectance $\{s(x,y,\lambda _{b})\}$ for the $t$-th scene.

Then, we stack the corresponding candidate spectral images under different illuminations for each scene and express it as

$$\mathcal{X}_{t} = stack(\mathbf{X}_{1,t}; \mathbf{X}_{2,t}; \cdots; \mathbf{X}_{J,t}),$$
where $J$ is the number of illuminations in the candidate light dataset. Thus, the illumination optimization is equivalent to choosing the optimal linear combinations of $\mathbf {X}_{j,t}$ for all scenes. As each $\mathbf {X}_{j,t}$ is a 3-dimensional tensor, the illumination multiplexing can be simply implemented by a $1 \times 1 \times 1$ convolution layer with $M$ kernels, where each kernel corresponds to one optimal multiplexed illumination in Eq. (4). As shown in Fig. 3(a), the $j$-th parameter in the $m$-th kernel is the coefficient of the $j$-th illuminant in the candidate dataset for the $m$-th multiplexed illumination.

Fig. 3. The illustration of the $m$-th illumination optimization.

Let $\boldsymbol {V_{s}}$ and $\mathbf {Z}_t^{s}$ denote the convolution kernels for the illumination multiplexing and the ground truth of the $t$-th scene under the $M$ optimally multiplexed illuminations, respectively. The output under the optimal illumination multiplexing for the $t$-th scene can be described as $\mathbf {C}(\boldsymbol {V_{s}} * \mathcal {X}_{t})$, where $\mathbf {C}(\cdot )$ denotes integrating the spectra $\boldsymbol {V_{s}} * \mathcal {X}_{t}$ over the CSR.

Thus, considering the non-negativity of each illumination, $\boldsymbol {V_{s}}$ can be determined by minimizing the mean squared error (MSE) between the prediction $\mathbf {C}(\boldsymbol {V_{s}} * \mathcal {X}_{t})$ and the corresponding ground truth $\mathbf {Z}_t^{s}$ under a non-negative constraint

$$\mathcal{L}_{s}(\boldsymbol{V_{s}}) = \sum_{t=1}^{T}\Vert \mathbf{C} (\boldsymbol{V_{s}} * \mathcal{X}_{t}) - \mathbf{Z}_t^{s} \Vert _2^2, \quad s.t. \quad \boldsymbol{V_{s}} \geq 0,$$
where $T$ is the number of training samples.
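As an illustration, the following is a minimal PyTorch sketch of this multiplexing layer: the $J$ candidate spectral images are stacked along the channel axis and mixed by a $1 \times 1 \times 1$ convolution with $M$ kernels, following Eqs. (5)–(7). Enforcing the non-negativity of $\boldsymbol{V_{s}}$ by clamping the weights after each optimizer step is an assumption of this sketch, since the paper does not specify how the constraint is implemented.

import torch
import torch.nn as nn

class IlluminationMultiplex(nn.Module):
    """1x1x1 convolution whose M kernels hold the J mixing coefficients V_s."""
    def __init__(self, num_candidates, num_lights):
        super().__init__()
        self.mix = nn.Conv3d(num_candidates, num_lights, kernel_size=1, bias=False)

    def forward(self, x):
        # x: (batch, J, B, N, O) -- stacked candidate spectral images per scene.
        return self.mix(x)   # (batch, M, B, N, O) multiplexed spectral images

    def clamp_nonnegative(self):
        # Keep V_s >= 0, as required by the constraint in Eq. (7).
        with torch.no_grad():
            self.mix.weight.clamp_(min=0)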

2.2.2 Illumination design

Beyond the optimal multiplexing for existing light sources, in this section we further present how to learn the optimal illuminations under some physical restrictions. This can effectively guide the design of an off-the-shelf light source.

According to Eq. (5), each illumination can be regarded as the weights in a $1 \times 1$ depth-wise convolution kernel, as shown in Fig. 3(b). The $M$ illuminations used for the spectral data collection in Eq. (4) can be interpreted as the weights in a $1 \times 1$ convolution layer with $M$ depth-wise kernels.

Let $\boldsymbol {V_{d}}$ denote the corresponding convolution kernels. The captured data for each scene under the $M$ designed optimal illuminations can be expressed as $\mathbf {C}(\boldsymbol {V_{d}} * \mathbf {S}_{t})$. As the illumination should always be positive, all parameters in $\boldsymbol {V_{d}}$ are constrained to be non-negative. Besides, according to [10,17,20], the illumination function should change smoothly across neighboring bands to facilitate its physical realization. Thus, the values in $\boldsymbol {V_{d}}$ can be obtained by minimizing the MSE under a non-negative smoothness constraint

$$\begin{aligned} \mathcal{L}_{d}(\boldsymbol{V_{d}}) = & \sum_{t=1}^{T}\Vert \mathbf{C}(\boldsymbol{V_{d}} * \mathbf{S}_{t})- \mathbf{Z}_t^{d} \Vert _2^2+ \eta\Vert G\boldsymbol{V_{d}} \Vert _2^2, \quad s.t. \quad \boldsymbol{V_{d}} \geq 0, \end{aligned}$$
where $\eta$ is a predefined parameter, $G$ is the first-order difference matrix that accounts for spectral smoothness, and $\mathbf {Z}_t^{d}$ is the corresponding ground truth under the designed optimal illuminations. The optimal illuminations are the learned weights of the $M$ convolution kernels in $\boldsymbol {V_{d}}$.
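For illustration, a minimal PyTorch sketch of this design layer is given below: the $M$ designed spectra are free parameters of size $M \times B$, each of which modulates the reflectance cube band by band as in Eq. (5), and the first-difference penalty plays the role of the term $\eta\Vert G\boldsymbol{V_{d}} \Vert _2^2$ in Eq. (8). The positive initialization and the clamping step are assumptions of this sketch.

import torch
import torch.nn as nn

class IlluminationDesign(nn.Module):
    """Learnable illumination spectra V_d of shape (M, B)."""
    def __init__(self, num_lights, num_bands):
        super().__init__()
        self.spectra = nn.Parameter(torch.rand(num_lights, num_bands))

    def forward(self, refl):
        # refl: (batch, B, N, O) reflectance cube S_t.
        # Output: (batch, M, B, N, O), the scene lit by each designed spectrum.
        return refl.unsqueeze(1) * self.spectra[None, :, :, None, None]

    def smoothness(self):
        # First-difference penalty along wavelength (the G matrix in Eq. (8)).
        return (self.spectra[:, 1:] - self.spectra[:, :-1]).pow(2).sum()

    def clamp_nonnegative(self):
        # Keep V_d >= 0, as required by the constraint in Eq. (8).
        with torch.no_grad():
            self.spectra.clamp_(min=0)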

2.3 Spectral reflectance recovery

To represent the spectral reflectance nonlinearly, we develop a simple and efficient network with $1 \times 1$ convolution layers to recover the spectral reflectance, which has low time complexity.

The recovery network comprises $K$ layers, and each layer consists of a 2D convolution and a rectified linear unit (ReLU) [35] activation. Without loss of generality, $\mathbf {Z}$ is used to denote the input of the recovery network; it represents $\mathbf {Z}^{s}$ in the illumination multiplexing setting and $\mathbf {Z}^{d}$ in the illumination design case. Thus, the output of the $k$-th layer can be expressed as

$$\begin{aligned} \mathbf{F}_{k} & = \textrm{ReLU}(\mathbf{W}_{k}*\textrm{stack}(\mathbf{F}_{k-1},\mathbf{Z})+\boldsymbol{b}_{k}), \end{aligned}$$
where $\mathbf {F}_{0} = 0$ and $\textrm {ReLU}(x) = \max \{x,0\}$. $\mathbf {W}_{k}$ and $\boldsymbol {b}_{k}$ represent the convolution kernels and biases for the $k$-th layer, respectively. The convolution kernels are of size $a_{0} \times 1 \times 1\times a_{1}$ for the first layer, where $a_{0}$ and $a_{1}$ are the numbers of input and output channels, respectively. In the $2$-nd to $(K-1)$-th layers, the kernels are of size $(a_{1}+a_{0}) \times 1 \times 1 \times a_{1}$. Finally, the kernels in the last layer are of size $(a_{1}+a_{0}) \times 1\times 1\times B$. In the experiments, we set $K=5$, $a_{1}=64$, $a_{0} = 3M$ for RGB images, and $a_{0} = M$ for monochromatic images.
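As a concrete illustration, the following is a minimal PyTorch sketch of this recovery network with $K$ layers of $1 \times 1$ convolutions, where the captured data $\mathbf {Z}$ is re-concatenated to the input of every layer as in Eq. (9); applying ReLU after the last layer is an assumption here (the reflectance is non-negative, so it is harmless).

import torch
import torch.nn as nn

class RecoveryNet(nn.Module):
    """K layers of 1x1 convolution + ReLU with Z re-concatenated at each layer."""
    def __init__(self, in_channels, num_bands, width=64, depth=5):
        super().__init__()
        layers = [nn.Conv2d(in_channels, width, kernel_size=1)]   # F_0 = 0, so layer 1 sees only Z
        layers += [nn.Conv2d(width + in_channels, width, kernel_size=1)
                   for _ in range(depth - 2)]
        layers += [nn.Conv2d(width + in_channels, num_bands, kernel_size=1)]
        self.layers = nn.ModuleList(layers)

    def forward(self, z):
        # z: (batch, a_0, N, O) -- stacked captures under the M illuminations.
        f = self.layers[0](z).relu()
        for layer in self.layers[1:]:
            f = layer(torch.cat([f, z], dim=1)).relu()
        return f   # (batch, B, N, O) predicted spectral reflectance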

2.4 Learning details

In our method, we simultaneously learn the optimal illumination and the parameters for spectral reflectance recovery, as shown in Fig. 2. During the training stage, the layer for the optimal illuminations works like an encoder: it updates its parameters to maximize the amount of information carried by the captured images. Then, the spectral reflectance recovery network, which works like a decoder, transforms the input data into the spectral reflectance of a scene.

As the illumination optimization layer is placed in front of the spectral reflectance recovery network, the output of the former directly flows into the latter, so the optimal illumination and the recovery network have to be jointly learned. Let $\mathbf {\Theta }$ denote the parameters of the spectral reflectance recovery network. To optimally multiplex illuminations in a candidate dataset, the entire network can be learned by minimizing the loss

$$\mathcal{L}(\textbf V_{s}, \boldsymbol\Theta) = \tau_{1} \mathcal{L}_{s}(\textbf V_{s}) + \sum_{t=1}^T\Vert f(\mathbf{Z}_{t}^{s},\boldsymbol{\Theta}) - \textbf S_{t} \Vert _2^2, \quad s.t. \quad \boldsymbol{V_{s}} \geq 0,$$
where $f(\mathbf {Z}_{t}^{s},\boldsymbol{\Theta })$ and $\textbf S_{t}$ are the predicted spectral reflectance and the corresponding ground truth, respectively.

Similarly, the entire network for the optimal illumination design can be trained by minimizing the loss

$$\mathcal{L}(\textbf V_{d}, \boldsymbol\Theta) = \tau_{2} \mathcal{L}_{d}(\textbf V_{d}) + \sum_{t=1}^T\Vert f(\mathbf{Z}_{t}^{d},\mathbf{\Theta}) - \textbf S_{t} \Vert _2^2, \quad s.t. \quad \boldsymbol{V_{d}} \geq 0.$$
The optimal illumination can be obtained from the learned weights of the convolution kernels $\textbf V_{s}$ or $\textbf V_{d}$.

In the testing stage, the input images are the data captured by RGB or monochromatic cameras under the optimal illuminations, denoted as $\mathbf {Z}_{test}$. The predicted spectral reflectance $\textbf S$ for a tested scene can be obtained by $\textbf S = f(\mathbf {Z}_{test},\mathbf {\Theta })$.

In the experiments, $\eta$, $\tau _{1}$, and $\tau _{2}$ are set to 0.1, 1, and 1, respectively. We use the well-known Kaiming initialization [36] to initialize the spectral recovery network and uniformly initialize the parameters in the illumination optimization layer with positive weights. The losses are minimized with the adaptive moment estimation (Adam) method [37], and the momentum parameter is set to 0.9. The learning rate is initially set to $10^{-3}$ and decayed to $10^{-5}$. Our method runs on an NVIDIA GTX 1080Ti GPU with the deep learning framework PyTorch.
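Putting the pieces together, here is a minimal joint-training sketch for the illumination design variant of Eq. (11), assuming the modules sketched above, a data loader yielding reflectance cubes of shape (batch, B, N, O), and a fixed CSR tensor csr of shape (P, B). Since the simulated captures are rendered from $\boldsymbol{V_{d}}$ itself, the data-fidelity part of $\mathcal{L}_{d}$ vanishes and only the reconstruction and smoothness terms remain; the hyperparameters follow the values reported above, while the variable names (loader, csr, etc.) are illustrative.

import torch

def integrate_csr(lit, csr):
    # lit: (batch, M, B, N, O) lit spectral cubes; csr: (P, B) camera response.
    # Returns the stacked captures Z of shape (batch, M*P, N, O), following Eq. (3).
    z = torch.einsum('bmcnh,pc->bmpnh', lit, csr)
    return z.flatten(1, 2)

num_lights, num_bands = 6, 31
design = IlluminationDesign(num_lights, num_bands)
net = RecoveryNet(in_channels=num_lights * csr.shape[0], num_bands=num_bands)
opt = torch.optim.Adam(list(design.parameters()) + list(net.parameters()), lr=1e-3)
eta, tau2 = 0.1, 1.0

for refl in loader:                         # reflectance cubes S_t
    z = integrate_csr(design(refl), csr)    # simulated captures under V_d
    pred = net(z)
    loss = ((pred - refl) ** 2).sum() + tau2 * eta * design.smoothness()
    opt.zero_grad()
    loss.backward()
    opt.step()
    design.clamp_nonnegative()              # keep V_d >= 0 after each step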

3. Experiments on synthetic data

In this section, we evaluate our method using synthetic data. First, the dataset and learning settings are introduced. Then, experiments on both RGB and monochromatic images are provided, which show the effectiveness of our method.

3.1 Datasets and settings

Our experiments are conducted on four public hyperspectral datasets, including the ICVL dataset [20], the NUS dataset [17], the Harvard dataset [28], and the CAVE dataset [38]. The ICVL dataset consists of 201 high-quality natural spectral images in total and is by far the most comprehensive natural hyperspectral dataset. We randomly select 101 of the 201 images in the ICVL dataset as training data, and the rest are used for testing. The NUS dataset has 41 images in the training set and 25 images in the testing set. The Harvard dataset contains 50 natural spectral images; we randomly select 35 images for training and the rest for testing. The CAVE dataset has 32 spectral images, of which 20 images are randomly selected for training and the rest are used for testing. Images in all four datasets have 31 bands.

Eight LED light sources [34] are used for the illumination multiplexing. The intensity of each LED can be controlled by its voltage, which makes these lights flexible to multiplex. In all experiments, unless otherwise stated, we use the CSR of a Canon 5D Mark II, as in [21], to synthesize RGB images and the CSR of a FLIR GS3-U3-15S5M, as in [25], to simulate monochromatic images from the given spectral datasets.

Three image quality metrics are utilized to evaluate the performance of all methods, including root-mean-square error (RMSE), structural similarity (SSIM) [39], and spectral angle mapping (SAM) [40]. RMSE and SSIM measure the 2D spatial fidelity of an image, while SAM measures the error along the 1D spectral dimension. Smaller values of RMSE and SAM indicate better recovery, and a larger value of SSIM means better performance.
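For reference, the following is a minimal sketch of two of these metrics, assuming pred and gt are NumPy arrays of shape (N, O, B); SSIM is omitted here, as it is available in standard libraries (e.g., skimage.metrics.structural_similarity).

import numpy as np

def rmse(pred, gt):
    # Root-mean-square error over all pixels and bands.
    return np.sqrt(np.mean((pred - gt) ** 2))

def sam(pred, gt, eps=1e-8):
    # Mean spectral angle (in radians) between predicted and true spectra.
    dot = np.sum(pred * gt, axis=-1)
    norms = np.linalg.norm(pred, axis=-1) * np.linalg.norm(gt, axis=-1)
    angles = np.arccos(np.clip(dot / (norms + eps), -1.0, 1.0))
    return angles.mean()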

3.2 Results on RGB images

First, we conduct experiments on RGB images. The CSR of a conventional RGB camera is used to synthesize RGB images from spectral images under multiple optimal illuminations as the input of all compared methods. Here, we compare our method with [29], which recovers spectral reflectance from two RGB images under two optimal illuminations found in a brute-force way. Intuitively, more input images contain more spatial and spectral information and thus lead to reflectance reconstruction with higher fidelity. To compare fairly with their work, we also run our method with 2 optimally multiplexed illuminations and 2 optimally designed illuminations, denoted as Ours$_{m1}$ and Ours$_{m2}$, respectively.

Table 1 provides quantitative results for the RGB setting, and the best results are highlighted in bold. To visualize the experimental results, the error images of two representative recovered scenes in the spectral datasets are shown in Fig. 4. Error images are the absolute error between the prediction and the corresponding ground truth, which perceptually presents the effectiveness of our method for 2D spatial reconstruction. The average point-wise absolute error for each band is shown in Fig. 6(a). It shows that our method, especially the illumination design, works well for spectral recovery. The method in [29] selects the optimal multiplexed light sources in a brute-force way under the linear assumption for the spectral reflectance, while our method effectively exploits the nonlinear representation of the spectral reflectance and jointly optimizes the illumination, which makes our method perform much better than [29].

Fig. 4. Visual quality comparison for the spectral recovery from two RGB images. The error maps for [29]/Ours$_{m1}$/Ours$_{m2}$ results and the scenes are shown from left to right.

Fig. 5. Visual quality comparison for the spectral recovery from multiple monochromatic images. The error maps for [32]/Ours-$s3$/Ours-$s6$/Ours-$s8$/Ours-$d3$/Ours-$d6$/Ours-$d8$ results and the scenes are shown from left to right.

Fig. 6. The absolute error between ground truth and recovered results along spectra for all compared methods; the corresponding results are shown in Figs. 4 and 5.

Table 1. Quantitative results on multiple RGB images’ setting. $\textrm {Ours}_{m1}$ and $\textrm {Ours}_{m2}$ denote multiplexing 2 optimal illuminations and designing 2 optimal illuminations, respectively.

3.3 Results on monochromatic images

We also conduct our method using monochromatic images under different illuminations and compare it with Lam's method [32]. The method in [32] recovers spectral reflectance from 8 monochromatic images under different illuminations, which are the orthogonal representations of the spectral reflectance bases in [33]. Its designed illuminations are optimal under the linear assumption for the spectral reflectance. Our method is performed under 3, 6, and 8 optimal illuminations, denoted as Ours-${s3}$, Ours-${s6}$ and Ours-${s8}$ for optimal illumination multiplexing, and Ours-${d3}$, Ours-${d6}$ and Ours-${d8}$ for optimal illumination design. Table 2 provides the corresponding results, and Figs. 5 and 6(b) visually show the corresponding spatial and spectral errors for two typical scenes. We can see that our method performs much better even under 3 optimal illuminations. This further verifies the rationality of the nonlinear representation for the spectral reflectance and the superiority of our illumination optimization under this representation.

Table 2. Quantitative results on multiple monochromatic images’ setting. Ours-${s3}$, Ours-${s6}$ and Ours-${s8}$ denote our method under 3, 6 and 8 optimal multiplexed illuminations. Ours-${d3}$, Ours-${d6}$ and Ours-${d8}$ denote our method under 3, 6 and 8 optimal designed illuminations.

4. Experiments on real data

To further evaluate the effectiveness of our method, we set up a real capture system, as shown in Fig. 7(a). A FLIR GS3-U3-15S5M monochrome camera with a Fujinon TF8DA-8B lens is used to capture data, and its CSR is shown in Fig. 12(b). The programmable light source ELS provides the optimal illuminations. According to the analysis in Section 3.3, monochromatic images under six optimal illuminations are sufficient for spectral reflectance recovery, so we capture the real data under six different illuminations.

Fig. 7. The setup for real data and visualized result. (a) Real capture system setup. (b) The optimal illumination spectra. (c) The recovered spectral reflectance results in RGB.

As shown in Fig. 2, we first train our model with the given CSR, learn six optimal illuminations from the spectral datasets, and implement them via the ELS. The number of spectral bands $B$ is $31$. Then we use the monochromatic camera to capture a scene under the six designed illuminations, which are shown in Fig. 8. The scene contains a Macbeth Color Checker with 24 color patches, and the standard spectral reflectance is known for each patch. We stack these captured monochrome images, send them to the spectral recovery network, and obtain the spectral reflectance. For each color patch, we treat the average intensity of the corresponding region as the final estimate to avoid the influence of noise.
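For clarity, the following is a minimal sketch of this evaluation step, assuming the six captured monochrome frames, the trained recovery network net, and one boolean mask per color patch are available; all names are illustrative rather than taken from the authors' code.

import numpy as np
import torch

def estimate_patch_spectra(frames, net, patch_masks):
    # frames: list of six (N, O) monochrome images, one per optimal illumination.
    z = torch.from_numpy(np.stack(frames, axis=0)).float().unsqueeze(0)   # (1, 6, N, O)
    with torch.no_grad():
        refl = net(z).squeeze(0).numpy()                                  # (B, N, O)
    # Average the recovered spectrum over each patch region to suppress noise.
    return [refl[:, mask].mean(axis=1) for mask in patch_masks]           # list of (B,) spectra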

Fig. 8. The captured six monochromatic images under six optimal illuminations.

Figure 7(b) shows the calibrated optimal illuminations implemented by the ELS, which are not orthogonal and cannot meet the requirement in [32] for spectral recovery. Nevertheless, the method in [29] is suitable for a monochromatic camera under an arbitrary number of illuminations. Thus, we run [29] and our method on the real data. The results from our method and Park's method [29] are shown in Table 3 and Fig. 9, which provide quantitative results and recovered spectra for randomly selected patches in the real scene, respectively. By comparing these results, we can conclude that our method, using images captured under the learned illuminations, approximates the reflectance better. This convincingly demonstrates the effectiveness of the proposed method in a real capture system.

Fig. 9. The recovered spectra of four typical patches on a Macbeth Color Checker. These four patches are marked in Fig. 7.

Table 3. The quantitative results on real data. RMSE results of eight typical patches and the average RMSE of all patches in the ColorChecker are provided. The eight patches are marked in Fig. 7(c).

5. Discussion

In this section, we show the feasibility of our method on the spectral reflectance recovery from a single RGB image and discuss the effect from different illuminations.

5.1 Feasibility on a single image

Here, we first provide evidence on the effect of the optimal illumination with a single image as the model input, to exclude the improvement brought by multiple inputs. We compare our method with six state-of-the-art methods, including the radial basis function based method (RBF) [17], the sparse representation based method (SR) [20], the manifold-based mapping (MM) [21], the deeply learned CSR based method (DLCSR) [25], CNN-based HSI enhancement (HSCNN+) [24], and the CNN-based method with CSR selection (CNNS) [26]. Our method is evaluated in 3 different cases, including spectral recovery without illumination optimization, spectral recovery with optimal illumination multiplexing, and spectral recovery with optimal illumination design, denoted as Ours$_{s1}\sim$Ours$_{s3}$. We test all these methods on the four spectral datasets, and the experimental results are provided in Table 4. It can be noticed that, by adding the illumination optimization layer, Ours$_{s2}$ and Ours$_{s3}$ outperform Ours$_{s1}$, which shows the effectiveness of the optimal illumination. Even though we use only one input image, Ours$_{s3}$ outperforms all the state-of-the-art methods. Besides, although the results of CNNS [26] are close to ours, its CSR selection operation can hardly be extended to the multiple-image setting, and its time complexity is approximately 90 times that of ours.

Table 4. Quantitative results on the single RGB image setting. Ours$_{s1}\sim$Ours$_{s3}$ denote our methods on spectral recovery without illumination optimization, with optimal illumination multiplexing, and with optimal illumination design, respectively.

To visualize the experimental results, the error images of a representative recovered spectral image from the spectral datasets are shown in Fig. 10. The absolute error along the spectra for all methods is shown in Fig. 10(j). We can see that the recovered images from our method are consistently more accurate. This further verifies that our method provides higher spatial and spectral accuracy, and a conclusion similar to that of Table 4 can be drawn.

Fig. 10. Visual quality comparison for the spectral recovery from a single RGB image. The error maps for RBF/SR/MM/DLCSR/HSCNN+/CNNS/Ours$_{s1}$/Ours$_{s2}$/Ours$_{s3}$ results and the absolute error along spectra are shown.

5.2 Discussion on illumination

Here, we first conduct extensive experiments on different numbers of RGB images with the optimal illumination design; the results are shown in Fig. 11(a). It can be seen that the spectral reflectance recovered from two input RGB images has much higher accuracy than that from a single image, and inputting more than two images does not obviously improve the spectral recovery accuracy. This implies that two RGB images are sufficient to recover the spectral reflectance under the optimal illuminations.

Fig. 11. (a) RMSE results of the recovered spectral reflectance under different numbers of RGB images. (b) The used CSR of the RGB camera.

To validate the effect of the number of monochromatic images, we provide the RMSE results of the recovered spectral reflectance under different numbers of optimal illuminations in Fig. 12(a). The RMSE from a single monochromatic image is much higher than that from a single RGB image. The RMSE decreases as the number of input images increases, and inputting more than six images does not obviously improve the spectral recovery accuracy any further.

According to Figs. 11(a) and 12(a), the RMSE under six monochromatic images is obviously lower than that from two RGB images, although they have the same number of channels. The reason may be that the CSR of a consumer RGB camera is usually close to zero in some spectral ranges (Fig. 11(b)) and cannot be well compensated by the illumination optimization. Thus, to obtain high quality spectral reflectance, capturing the scene with a monochromatic camera under multiple illuminations is a better choice.

Fig. 12. (a) RMSE results of the recovered spectral reflectance under different numbers of monochromatic images. (b) The used CSR of the monochromatic camera.

To empirically analyze how the illumination optimization layer works, we take the optimal illumination design on monochromatic images as an example; the corresponding illumination spectral distributions under different numbers of designed illuminations are provided in Figs. 13(a)–(d). It can be observed that, in all situations, the spectral distributions of the light sources nearly cover the entire wavelength range, which ensures that the camera samples enough information across the whole wavelength range as far as possible.

Fig. 13. The designed optimal illuminations and the corresponding singular values on monochromatic images setting. (a)–(d) show the spectral distribution of optimal illuminations under 3, 4, 5 and 6 input images, respectively. Their corresponding singular values are provided in (e)–(h).

Another important criterion for a good illumination set is low linear correlation, as images captured under linearly dependent illuminations cannot provide new information. If a certain light can be linearly expressed by two other illuminations, i.e., $l_3(\lambda ) = \alpha _1l_1(\lambda ) + \alpha _2l_2(\lambda )$, then according to Eq. (3) the intensity of each pixel for the $p$-th channel under the illumination $l_3$ can be represented as

$$\begin{aligned}Y_{p,3}(x,y) & = \sum_{b=1}^{B}c_p(\lambda_{b})l_{3}(\lambda_{b})s(x,y,\lambda_{b})\\&= \sum_{b=1}^{B}c_p(\lambda_{b})[\alpha_1l_1(\lambda_{b}) + \alpha_2l_2(\lambda_{b})]s(x,y,\lambda_{b})\\&=\alpha_1\sum_{b=1}^{B}c_p(\lambda_{b})l_{1}(\lambda_{b})s(x,y,\lambda_{b}) + \alpha_2\sum_{b=1}^{B}c_p(\lambda_{b})l_{2}(\lambda_{b})s(x,y,\lambda_{b}) \\ &= \alpha_1Y_{p,1}(x,y) + \alpha_2Y_{p,2}(x,y). \end{aligned}$$
This means that the information captured under the illumination $l_3$ has already been provided by the measurements under illuminations $l_1$ and $l_2$, and no additional information is captured. To illustrate this feature of the optimal illuminations more explicitly, we provide the singular values and condition numbers of the designed illuminations, which are two important quantities for evaluating linear independence and orthogonality. The singular values are shown in Figs. 13(e)–(h), and their corresponding condition numbers are 1.6407, 1.9181, 1.9002, and 3.2973. These results further confirm the low linear correlation of the optimal illuminations.
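As a simple check of this property, the singular values and condition number of a set of illumination spectra can be computed as in the following sketch, where the learned spectra are stacked into an $M \times B$ matrix; the names are illustrative.

import numpy as np

def illumination_condition(lights):
    # lights: (M, B) matrix whose rows are the designed illumination spectra.
    singular_values = np.linalg.svd(lights, compute_uv=False)
    condition_number = singular_values[0] / singular_values[-1]
    return singular_values, condition_number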

6. Conclusion

In this paper, we present an effective CNN-based spectral reflectance recovery method with illumination optimization, which simultaneously learns the optimal illumination and a mapping for the spectral recovery using an end-to-end network. Specifically, we design an illumination optimization layer to optimize the illumination, develop the nonlinear representation for the spectral reflectance in a data-driven way, and jointly optimize the illumination under this representation in a CNN-based architecture. Experimental results on synthetic and real data verify the advantages of the nonlinear representation of the spectral reflectance and of the deeply optimized illumination under this representation.

Funding

National Natural Science Foundation of China (61425013, 61672096).

Disclosures

The authors declare no conflicts of interest.

References

1. B. Qi, C. Zhao, E. Youn, and C. Nansen, “Use of weighting algorithms to improve traditional support vector machine based classifications of reflectance data,” Opt. Express 19(27), 26816–26826 (2011).

2. X. Cao, F. Zhou, L. Xu, D. Meng, Z. Xu, and J. Paisley, “Hyperspectral image classification with markov random fields and a convolutional neural network,” IEEE Trans. Image Processing 27(5), 2354–2367 (2018).

3. M. A. Loghmari, M. S. Naceur, and M. R. Boussema, “A spectral and spatial source separation of multispectral images,” IEEE Trans. Geosci. Electron. 44(12), 3659–3673 (2006).

4. J. M. Bioucas-Dias, A. Plaza, G. Camps-Valls, P. Scheunders, N. M. Nasrabadi, and J. Chanussot, “Hyperspectral remote sensing data analysis and future challenges,” IEEE Geosci. Remote Sens. Mag. 1(2), 6–36 (2013).

5. D. B. Gillis, J. H. Bowles, M. J. Montes, and W. J. Moses, “Propagation of sensor noise in oceanic hyperspectral remote sensing,” Opt. Express 26(18), A818–A831 (2018).

6. S. Nakariyakul and D. P. Casasent, “Fast feature selection algorithm for poultry skin tumor detection in hyperspectral data,” J. Food Eng. 94(3-4), 358–365 (2009).

7. L. L. Randeberg, I. Baarstad, T. Løke, P. Kaspersen, and L. O. Svaasand, “Hyperspectral imaging of bruised skin,” Proc. SPIE 6078, 60780O (2006).

8. G. N. Stamatas, C. J. Balas, and N. Kollias, “Hyperspectral image acquisition and analysis of skin,” Proc. SPIE 4959, 77–83 (2003).

9. A. Lam and I. Sato, “Spectral modeling and relighting of reflective-fluorescent scenes,” in Proceedings of Conference on Computer Vision and Pattern Recognition (2013).

10. S. Wug Oh, M. S. Brown, M. Pollefeys, and S. Joo Kim, “Do it yourself hyperspectral imaging with everyday digital cameras,” in Proceedings of Conference on Computer Vision and Pattern Recognition (2016).

11. R. W. Basedow, D. C. Carmer, and M. E. Anderson, “Hydice system: Implementation and performance,” Proc. SPIE 2480, 258–267 (1995).

12. W. M. Porter and H. T. Enmark, “A system overview of the airborne visible/infrared imaging spectrometer (aviris),” in Annual Technical Symposium (1987), pp. 22–31.

13. A. Gorman, D. W. Fletcher-Holmes, and A. R. Harvey, “Generalization of the lyot filter and its application to snapshot spectral imaging,” Opt. Express 18(6), 5602–5608 (2010).

14. B. K. Ford, M. R. Descour, and R. M. Lynch, “Large-image-format computed tomography imaging spectrometer for fluorescence microscopy,” Opt. Express 9(9), 444–453 (2001).

15. L. Gao, R. T. Kester, N. Hagen, and T. S. Tkaczyk, “Snapshot image mapping spectrometer (ims) with high sampling density for hyperspectral microscopy,” Opt. Express 18(14), 14330–14344 (2010).

16. N. Gat, G. Scriven, J. Garman, M. D. Li, and J. Zhang, “Development of four-dimensional imaging spectrometers (4d-is),” Proc. SPIE 6302, 63020M (2006).

17. R. M. H. Nguyen, D. K. Prasad, and M. S. Brown, “Training-based spectral reconstruction from a single rgb image,” in Proceedings of European Conference on Computer Vision (2014), pp. 186–201.

18. A. Robles-Kelly, “Single image spectral reconstruction for multimedia applications,” in Proceedings of International Conference on Multimedia (2015), pp. 251–260.

19. Y. Fu, Y. Zheng, L. Zhang, and H. Huang, “Spectral reflectance recovery from a single rgb image,” IEEE Trans. Comput. Imaging 4(3), 382–394 (2018).

20. B. Arad and O. Ben-Shahar, “Sparse recovery of hyperspectral signal from natural rgb images,” in Proceedings of European Conference on Computer Vision (2016), pp. 19–34.

21. Y. Jia, Y. Zheng, L. Gu, A. Subpa-Asa, A. Lam, Y. Sato, and I. Sato, “From rgb to spectrum for natural scenes via manifold-based mapping,” in Proceedings of International Conference on Computer Vision (2017), pp. 4715–4723.

22. A. Alvarez-Gila, J. van de Weijer, and E. Garrote, “Adversarial networks for spatial context-aware spectral image reconstruction from rgb,” in The IEEE International Conference on Computer Vision Workshops (2017).

23. Z. Xiong, Z. Shi, H. Li, L. Wang, D. Liu, and F. Wu, “Hscnn: Cnn-based hyperspectral image recovery from spectrally undersampled projections,” in Proceedings of International Conference on Computer Vision Workshops (2017).

24. Z. Shi, C. Chen, Z. Xiong, D. Liu, and F. Wu, “Hscnn+: Advanced cnn-based hyperspectral recovery from rgb images,” in The IEEE Conference on Computer Vision and Pattern Recognition Workshops (2018).

25. S. Nie, L. Gu, Y. Zheng, A. Lam, N. Ono, and I. Sato, “Deeply learned filter response functions for hyperspectral reconstruction,” in Proceedings of Conference on Computer Vision and Pattern Recognition (2018).

26. Y. Fu, T. Zhang, Y. Zheng, D. Zhang, and H. Huang, “Joint camera spectral sensitivity selection and hyperspectral image recovery,” in Proceedings of European Conference on Computer Vision (2018).

27. M. Yamaguchi, H. Haneishi, H. Fukuda, J. Kishimoto, H. Kanazawa, M. Tsuchida, R. Iwama, and N. Ohyama, “High-fidelity video and still-image communication based on spectral information: natural vision system and its applications,” Proc. SPIE 6062, 60620G (2006).

28. A. Chakrabarti and T. Zickler, “Statistics of real-world hyperspectral images,” in Proceedings of Conference on Computer Vision and Pattern Recognition (2011), pp. 193–200.

29. J.-I. Park, M.-H. Lee, M. D. Grossberg, and S. K. Nayar, “Multispectral imaging using multiplexed illumination,” in Proceedings of International Conference on Computer Vision (2007), pp. 1–8.

30. C. Chi, H. Yoo, and M. Ben-Ezra, “Multi-spectral imaging by optimized wide band illumination,” Int. J. Comput. Vis. 86(2-3), 140–151 (2010).

31. S. Han, I. Sato, T. Okabe, and Y. Sato, “Fast spectral reflectance recovery using DLP projector,” Int. J. Comput. Vis. 110(2), 172–184 (2014).

32. A. Lam, A. Subpa-Asa, I. Sato, T. Okabe, and Y. Sato, “Spectral imaging using basis lights,” in Proceedings of British Machine Vision Conference (2013).

33. J. P. Parkkinen, J. Hallikainen, and T. Jaaskelainen, “Characteristic spectra of munsell colors,” J. Opt. Soc. Am. A 6(2), 318–322 (1989).

34. M. Kitahara, T. Okabe, C. Fuchs, and H. P. Lensch, “Simultaneous estimation of spectral reflectance and normal from a small number of images,” in VISAPP (1) (2015), pp. 303–313.

35. V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” in Proceedings of International Conference on Machine Learning (2010), pp. 807–814.

36. K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in Proceedings of International Conference on Computer Vision (2015).

37. D. P. Kingma and J. L. Ba, “Adam: a method for stochastic optimization,” in Proceedings of International Conference on Learning Representations (2015).

38. F. Yasuma, T. Mitsunaga, D. Iso, and S. K. Nayar, “Generalized assorted pixel camera: Postcapture control of resolution, dynamic range and spectrum,” IEEE Trans. Image Processing 19(9), 2241–2253 (2010).

39. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Processing 13(4), 600–612 (2004).

40. F. A. Kruse, A. B. Lefkoff, J. W. Boardman, K. B. Heidebrecht, A. T. Shapiro, P. J. Barloon, and A. F. H. Goetz, “The spectral image processing system (sips)–interactive visualization and analysis of imaging spectrometer data,” Remote Sens. Environ. 44(2-3), 145–163 (1993).
