CVNet: confidence voting convolutional neural network for camera spectral sensitivity estimation

Abstract

Spectral sensitivity, as one of the most important parameters of a digital camera, plays a key role in many computer vision applications. In this paper, a confidence voting convolutional neural network (CVNet) is proposed to rebuild the spectral sensitivity function, modeled as the sum of weighted basis functions. By evaluating the useful information supplied by different image segments, disparate confidences are calculated to automatically learn the basis functions’ weights, using only one image captured by the target camera. Three types of basis functions are constructed and employed in the network: the Fourier basis function (FBF), the singular value decomposition basis function (SVDBF), and the radial basis function (RBF). Results show that the accuracy of the proposed method with FBF, SVDBF, and RBF is 97.92%, 98.69%, and 99.01%, respectively. We provide the theory behind the network design, build a dataset, demonstrate the training process, and present high-precision experimental results. Without bulky benchtop setups and strict experimental constraints, this simple and effective method could become an alternative for spectral sensitivity function estimation.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

During the imaging process, the light emitted from an illumination source is reflected from an object and then recorded by a camera [1]. The quality of the image is determined by the intensity of the light source, the spectral reflectance of the objects, and the camera response. The spectral sensitivity of the camera establishes a strict mapping between the image intensity and the scene radiance. Its estimation has been a significant research task in computer vision, with a wide range of applications such as multispectral imaging [2,3], illumination optimization [4,5], spectral reflectance reconstruction [6], and color correction [7]. Based on camera response expansion and a pseudo-inverse operation, Liang and Wan proposed an optimized method for spectral reflectance reconstruction by selecting and weighting the training samples [8]. With the estimated camera spectral sensitivity and spectral reflectance, Lee et al. found the optimal illumination to maximize the color difference of objects and discriminated them by their different spectra [9]. Qiu and Xu illustrated the relationship between absolute radiometric quantities and camera response [10], which is important for photographic calibration [11,12] and photometric measurement. However, although the demand for accurate measurement of camera spectral sensitivity is high, it is rarely easy to obtain. Because spectral sensitivity is closely related to the semiconductor quality and imaging quality of the sensors, which depend mainly on the manufacturing technique, it is rarely disclosed by manufacturers.

A variety of spectral sensitivity recovery techniques have been proposed and evaluated over the past decades. Most approaches fall into one of the following groups: the scanning method, constraint optimization based methods, basis function based methods, and artificial neural network (ANN) based methods.

Scanning method. The traditional approach to acquiring camera spectral sensitivity is to measure the camera intensity response under a series of narrow-band monochromatic lights generated by a monochromator [13,14], scanning across the whole spectrum. This method is accurate and easy to handle, but in many cases it relies on expensive instruments and strictly controlled experimental conditions.

Constraint optimization based method. Various constraint optimization methods [15–17] have been proposed for spectral sensitivity recovery, solving the problem by imposing constraints on the illumination spectrum or on the spectral sensitivity itself. Finlayson et al. developed a rank-based camera spectral sensitivity estimation technique that can recover the linear device spectral sensitivities from linear images and the effective linear sensitivities from rendered images [18]. The rank order of camera responses is a powerful tool for estimating the camera spectral sensitivity functions, but it imposes a constraint on the shape of the underlying spectral sensitivity curve of the sensor. Huynh and Robles-Kelly showed that spectral sensitivity estimation can be cast as a well-defined mathematical problem [19]. Under the assumption that the scene lighting has a smooth spectral variation, they solved it with coordinate-descent optimization by imposing constraints on the illumination spectrum.

Basis function based method. In recent years, basis function based methods have been proposed that require no preset limitations on the spectral sensitivity or the illumination spectrum distribution and that offer strong dimensionality reduction. Zhao et al. used different types of basis functions, including the polynomial basis function, Fourier series, radial basis function, and SVD basis function, to estimate the unknown spectral sensitivity of an arbitrary camera [20]. The estimation accuracy of the different basis functions was discussed, and results showed that the radial basis function performed best among the four.

ANN based method. Building on Ref. [20], the method proposed by Chaji et al. estimated the spectral sensitivity function using neural learning and architecture [21], modeling the objective function as the sum of weighted radial basis functions. An artificial neural network was specifically designed to rebuild the spectral sensitivity function. Since the loss function of the network in Ref. [21] was evaluated by the difference between the reconstructed image and the ground truth image, the illumination spectrum and object reflectance had to be measured in advance. Moreover, applying the model to a new camera required updating the dataset and retraining the network.

In this paper, a confidence voting neural network is proposed to estimate the camera spectral sensitivity function by modeling the objective function as a linear combination of weighted basis functions. The reconstruction of spectral sensitivity can be regarded as the inverse mathematical problem of image recording. Thus, the input of the network is a single-exposure picture taken by the camera, and the output is the rebuilt spectral sensitivity curves of the three channels. High-dimensional features are extracted from different picture segments through convolutional operations and then transformed into basis function weights by confidence voting. This design encourages the network to learn information about the illumination spectrum and object reflectance, so that the mapping between the input image and spectral sensitivity can be built automatically. With our method, the trained network performs well not only on the training dataset but also on the test dataset. Experiments show that the spectral sensitivity of a Nikon D3X digital camera can be estimated by the trained model with 98.98% average accuracy.

This paper proceeds as follows: Section 2 reports the process of estimating spectral sensitivity curves from image data and the implementation of CVNet, covering the construction of CVNet, the acquisition of the basis functions, and the training procedure of the network. Section 3 presents experiments and results, demonstrating the model’s performance on the training and test datasets, along with comparisons of our approach to other works. Finally, discussions and conclusions are drawn in Section 4.

2. Method

2.1 Camera response formation model

As mentioned in Section 1, the reproduction of spectral sensitivity can be regarded as an inverse problem of the camera imaging process, which is determined by the relative power distribution of the illumination source, the spectral reflectance of the object, and the response of the camera. Let L, Rx, and Sk represent the power distribution of the illumination source, the spectral reflectance of point x, and the spectral sensitivity of the camera in channel k, respectively. Then the camera response value Vkx of the pixel at position x can be given by:

$${V_{kx}} = \int_\lambda {L(\lambda ){R_x}(\lambda ){S_k}(\lambda )} d\lambda. $$

Each element on the right side of Eq. (1) is a function of wavelength. Meanwhile, Eq. (1) can be rewritten in a vector-matrix form as follows:

$${V_{kx}} = {{\textbf L}^T}{{\textbf R}_x}{{\textbf S}_k}, $$
where Rx is a diagonal matrix whose diagonal entries are the spectral reflectance values Rx(λ), and both L and Sk are one-dimensional vectors. Equation (2) presents the fundamental imaging process. It will be further used to generate our training datasets, as discussed in Section 2.2.1.
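As a quick illustration, the discrete form of Eq. (2) can be evaluated in a few lines of NumPy. This is a minimal sketch assuming 1 nm sampling over 400–780 nm; the random spectra are placeholders, not the measured data used in this paper.

```python
import numpy as np

n_wl = 381  # wavelength samples, 400-780 nm at 1 nm steps

# Placeholder spectra (the paper uses measured data).
L = np.random.rand(n_wl)    # illumination spectrum L(lambda)
r_x = np.random.rand(n_wl)  # spectral reflectance of scene point x
S_k = np.random.rand(n_wl)  # spectral sensitivity of channel k

# Eq. (2): V_kx = L^T R_x S_k, with R_x = diag(r_x).
V_kx = L @ np.diag(r_x) @ S_k

# Equivalent (and cheaper) elementwise form of the integral in Eq. (1).
assert np.isclose(V_kx, np.sum(L * r_x * S_k))
```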

2.2 Spectral sensitivity reconstruction

We aim to estimate the spectral sensitivity of the camera using only a single exposure picture as the input of an artificial neural network. CVNet was designed to achieve this goal, motivated by deep learning optimization and basis function fitting algorithms. Figure 1 shows the block diagram of our method. There are three core modules, colored orange in Fig. 1: an image generator module for training image synthesis, a basis function generator module for producing basis functions, and the CVNet itself, which is responsible for generating the corresponding weight coefficients. The hidden layer parameters are optimized by minimizing the MSE (mean square error) loss between the ground-truth curves and the reconstructed curves, shown as the dotted line in Fig. 1, and the estimated spectral sensitivity is obtained by a weighted summation of the basis functions.

Fig. 1. Block diagram of our method.

2.2.1 Image generation

Equation (2) is employed to produce the input training images of CVNet. The required datasets contain illumination spectrum data L of the light source, spectral reflectance data R of the object, and spectral sensitivity data S of the camera. Images can be generated by randomly combining the elements L, R, and S based on Eq. (2). Figure 2 demonstrates the detailed internal structure of the Image Generator. A multispectral target (the 140-patch Color Checker) is used as the imaging target to provide more effective spectral reflectance information.

Fig. 2. Internal structure of the Image Generator. 10,500 image data can be generated by randomly integrating these elements L, R, and S.
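A minimal sketch of such a generator is shown below, rendering each of the 140 patches as a uniform block via Eq. (2). The 14×10 patch layout, patch size, and random spectra are illustrative assumptions; the paper draws L, R, and S from 21 illuminants, 25 reflectance sets, and 20 sensitivities (see Section 3.1).

```python
import numpy as np

def render_checker(L, R, S, patch_px=36):
    """Render a synthetic 140-patch checker image via Eq. (2).

    L: (381,) illuminant; R: (140, 381) patch reflectances;
    S: (381, 3) RGB spectral sensitivities.
    """
    V = (R * L[None, :]) @ S                               # (140, 3) patch responses
    grid = V.reshape(10, 14, 3)                            # assumed 14 x 10 layout
    img = np.kron(grid, np.ones((patch_px, patch_px, 1)))  # tile into uniform blocks
    return img / img.max()                                 # normalize to [0, 1]

# One synthetic training image from placeholder spectra.
img = render_checker(np.random.rand(381),
                     np.random.rand(140, 381),
                     np.random.rand(381, 3))
```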

2.2.2 CVNet architecture design

The traditional basis function based method relies on enormous amounts of computation to acquire the weight of each basis function, yet the image itself can provide useful information about the spectral sensitivity, which also matters for the convergence rate and accuracy of the model. To exploit this, CVNet is designed to harvest more accurate reference information from the image, in which feature correspondences can be discovered. The structure of CVNet is shown in Fig. 3; it consists of 1 gamma layer, 6 convolutional layers, 5 pooling layers, and 1 confidence voting layer. During training, it takes the simulated images (of size 512×512×3, for example) produced by the Image Generator as input and outputs the weight coefficient of each basis function.

Fig. 3. Detailed structure of CVNet. It takes 512×512×3 simulated images produced by Image Generator as input and outputs 1×3n weight data.

Gamma nonlinear layer: The gamma nonlinear layer compensates for the nonlinearity of the model, which improves the gradient descent process between the hidden layers of the network.

Multi-convolutional and pooling layers: Through multiple convolution and pooling operations with different kernel sizes, the intensity values of the image pixels are transformed into high-dimensional features, such as information about the illumination spectrum and spectral reflectance.

Confidence voting layer: The extracted features are mapped to weighting coefficient matrices for the three channels, each with a corresponding confidence. The matrix length is 1 + 3n, where the basis function number n is determined by the output of the Basis Function Generator. By applying a softmax function to the first dimension of the confidence voting layer, normalized confidence levels are obtained. Each confidence value represents the importance of its associated weight matrix. For example, when the extracted high-dimensional features come from a part of the image with little effective spectral information, the generated weight matrix receives a lower confidence.

The final weight W of the basis functions for RGB channels can be calculated by Eq. (3), which details the confidence voting process:

$${\textbf W} = \sum\limits_{i = 1}^{13 \times 13} {{{\textbf c}_i} \cdot {{\textbf w}_i}}, $$
where the 1×3n vector wi is the weight coefficient matrix of the i-th image segment and ci is the corresponding confidence. The condition 0 ≤ ci ≤ 1 (for all i) must be satisfied.
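A minimal PyTorch sketch of this voting step is given below, assuming the convolutional backbone has already produced a 13×13 grid of feature vectors; the channel count and the 1×1-convolution voting head are illustrative choices, not the paper’s exact architecture.

```python
import torch
import torch.nn as nn

class ConfidenceVoting(nn.Module):
    """Map a 13x13 feature grid to a single weight vector per Eq. (3)."""
    def __init__(self, feat_ch, n_basis):
        super().__init__()
        # Each grid cell votes: 1 confidence score + 3n basis weights.
        self.vote = nn.Conv2d(feat_ch, 1 + 3 * n_basis, kernel_size=1)

    def forward(self, feats):               # feats: (B, C, 13, 13)
        v = self.vote(feats).flatten(2)     # (B, 1+3n, 169)
        c = torch.softmax(v[:, :1], dim=2)  # confidences: 0 <= c_i <= 1, sum to 1
        w = v[:, 1:]                        # per-cell weight matrices w_i
        return (c * w).sum(dim=2)           # (B, 3n): W = sum_i c_i * w_i

W = ConfidenceVoting(feat_ch=256, n_basis=8)(torch.randn(2, 256, 13, 13))
print(W.shape)  # torch.Size([2, 24])
```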

2.2.3 Basis function optimization algorithm

Basis functions reduce the dimension of the spectral sensitivity, since the required number of functions is much smaller than the dimension of the spectral sensitivity itself, resulting in less computing time and faster model convergence. The spectral sensitivity can therefore be robustly estimated as a linear combination of basis functions. Inspired by Refs. [20] and [21], we integrate basis function theory into a deep learning algorithm to develop a new spectral sensitivity estimation method with real-time capability and high accuracy. The Fourier basis function (FBF), singular value decomposition basis function (SVDBF), and radial basis function (RBF) are employed in our network because:

  • (1) FBF has the properties of normalization, orthogonality, and completeness, which enables it to describe more details of the objective functions.
  • (2) The singular value decomposition (SVD) algorithm, widely used in principal component analysis (PCA), not only provides a set of normalized, orthogonal eigenvectors but also reduces the dimension of the problem. The SVDBF is calculated by applying the SVD algorithm to the spectral sensitivity database; its vectors can be regarded as the principal components of this database and are well suited to fitting spectral sensitivity functions.
  • (3) RBF is characteristically close to the spectral sensitivity functions.
FBF. The basis functions FFBF made up of Fourier basis functions are written as:
$${{\textbf F}_{FBF}} = \sum\limits_{j = 1}^n {\cos ({2\pi \times {f_j} \times {\mathbf \lambda }} )}, $$
where fj is the frequency of each basis function and λ=[400, 401, …, 780]T is a 381×1 vector representing 381 sampling points of the wavelength in the visible range (400 ∼ 780 nm).

To find the correct frequency fj, 48 sets of spectral sensitivity data of 16 cameras are collected. Based on the linear additive properties of Fourier transform, their fundamental frequency components fj are extracted from the frequency domain, which can be derived as:

$$T({{f_j}} )= {{\cal F}}({{S_1} + \ldots + {S_j} + \ldots + {S_{48}}} ), $$
where ℱ denotes the symbol of Fourier transform, Sj is the spectral sensitivity function of each channel. The amplitude distribution of T(fj) is demonstrated in Fig. 4, from which the optimal frequency components fj can be located in 0 ∼ 6 Hz.
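A sketch of locating these components is given below, assuming the 48 curves are stacked in a (381, 48) array; the random database stands in for the measured one.

```python
import numpy as np

S_db = np.random.rand(381, 48)             # placeholder: 16 cameras x 3 channels

spectrum_sum = S_db.sum(axis=1)            # S_1 + ... + S_48 of Eq. (5)
T = np.abs(np.fft.rfft(spectrum_sum))      # amplitude spectrum |F(...)|
freqs = np.fft.rfftfreq(381, d=1.0 / 381)  # in cycles per visible range
# For the measured database, the dominant components lie below ~6 (cf. Fig. 4).
```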

Fig. 4. Amplitude distribution T(fj) in the frequency domain.

When the initial frequencies fj are known, the basis functions FFBF can be calculated by Eq. (4), which are shown with different colors in Fig. 5.

Fig. 5. (a) The first four Fourier basis functions. (b) The last three Fourier basis functions.

The number of basis functions changes with the specific accuracy requirement. More functions are required to achieve higher accuracy, but it leads to longer computing time and higher memory occupation. Not all the basis functions are shown here.
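The basis matrix of Eq. (4) is straightforward to build; a sketch follows, assuming the wavelength axis is normalized to [0, 1] and using placeholder frequencies inside the 0 ∼ 6 range located above.

```python
import numpy as np

lam = np.arange(400, 781)      # 381 wavelength samples
t = (lam - 400) / (780 - 400)  # normalize the visible range to [0, 1]

def fourier_basis(freqs):
    """Return a (381, n) matrix; column j is cos(2*pi*f_j*t), per Eq. (4)."""
    return np.cos(2 * np.pi * np.outer(t, freqs))

F_FBF = fourier_basis(np.linspace(0.0, 6.0, 8))  # 8 placeholder frequencies
print(F_FBF.shape)                               # (381, 8)
```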

SVDBF. Based on the database of spectral sensitivities of 16 cameras, the eigenvectors used as basis functions can be calculated by applying the SVD method to the database. Assume the matrix S of the spectral sensitivity database is decomposed by the SVD algorithm, as shown in Eq. (6):

$${{\textbf S}_{m \times n}} = {{\textbf U}_{m \times m}}{{\mathbf \Sigma }_{m \times n}}{\textbf V}_{n \times n}^T, $$
where m=381 is the number of wavelength sampling points and n=16×3 = 48 is the number of singular values of S. The left singular matrix Um×m and the right singular matrix Vn×n consist of the eigenvectors of (SST)m×m and (STS)n×n, respectively, and the superscript T denotes transposition. Each column of U is a 381×1 eigenvector that is used as a basis function, and Σ is a diagonal matrix made up of 48 singular values, each of which represents the weight of the corresponding eigenvector. Figure 6 shows the basis functions extracted from SVD.

Fig. 6. The first nine basis functions extracted from SVD.

Using the SVD algorithm to select the basis functions is a sound criterion, because larger singular values correspond to more important eigenvectors. The percentage of each of the first nine eigenvalues is given in Table 1.

Table 1. Percentage of each eigenvalue.

It can be seen that the first nine eigenvalues together account for 90.86%, which means the corresponding eigenvectors capture 90.86% of the information in the spectral sensitivity database.
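A minimal sketch of extracting the SVDBF via Eq. (6) is shown below; the random matrix stands in for the 381×48 measured database.

```python
import numpy as np

S_db = np.random.rand(381, 48)  # placeholder for the measured database

U, sigma, Vt = np.linalg.svd(S_db, full_matrices=False)
F_SVDBF = U[:, :9]              # first nine eigenvectors as basis functions

# Share of the total singular-value weight kept by the first nine vectors
# (cf. Table 1, which reports 90.86% for the measured database).
print(sigma[:9].sum() / sigma.sum())
```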

RBF. Radial basis function is commonly used in the basis function methods, which can be written as:

$${{\textbf F}_{RBF}} = \sum\limits_{j = 1}^n {\exp \left[ { - \frac{{{{({{\mathbf \lambda } - {c_j}} )}^2}}}{{{\delta _j}}}} \right]}, $$
where cj and δj are constants that determine the center and the width of the curve, respectively. Similarly, λ=[400, 401, …, 780]T is a 381×1 column vector consisting of 381 sample points of the wavelength. Notably, both cj and δj are hidden layer parameters that are optimized during the training phase; otherwise, the estimation is less accurate, with severe oscillation of the reconstructed curves. Figure 7 shows the radial basis functions for the three channels.

Fig. 7. Radial basis functions.
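A sketch of the radial basis matrix of Eq. (7) follows, with evenly spaced centers cj and a common width δj as placeholder initial values; in CVNet both are learnable.

```python
import numpy as np

lam = np.arange(400, 781, dtype=float)  # 381 wavelength samples

def rbf_basis(centers, widths):
    """Return a (381, n) matrix; column j is exp(-(lam - c_j)^2 / delta_j)."""
    return np.exp(-(lam[:, None] - centers[None, :]) ** 2 / widths[None, :])

centers = np.linspace(420, 760, 8)  # placeholder initial c_j (nm)
widths = np.full(8, 2000.0)         # placeholder initial delta_j
F_RBF = rbf_basis(centers, widths)  # (381, 8)
```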

To summarize, the Basis Function Generator is responsible for generating a 381×n basis function matrix F, which is combined with the n×3 weight matrix W to produce the final estimated spectral sensitivity functions:

$${\textbf S}_{381 \times 3}^{\ast } = {{\textbf F}_{381 \times n}}{{\textbf W}_{n \times 3}}, $$
where S* is the estimation result of the spectral sensitivity for RGB channels.

2.3 Loss function

To prevent over-fitting during training, our approach combines two loss terms. The overall loss function is:

$${L_{overall}} = {L_{MSE}} + {L_2} = \frac{1}{N}\sum\limits_{x = 1}^{381 \times 3} {{{({s_x^{\ast} - {s_x}} )}^2}} + \sum\limits_i {\omega _i^2}, $$
where LMSE is the mean square error between the ground truth S and the reconstruction result S*, L2 is the squared Euclidean norm penalty, N=381 × 3 is the number of sampling points over the three channels, sx* and sx are the components of S* and S, respectively, and ωi is a hidden layer weight, defined as a default optimizable parameter of the network.

During the training phase, the weight coefficient and shape of the basis functions are optimized to minimize Loverall. In this case, the optimal combination of basis functions can be learned.
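A minimal PyTorch sketch of Eq. (9) is shown below; the regularization strength l2_lambda is an illustrative assumption, as the paper does not report one.

```python
import torch

def overall_loss(S_est, S_gt, model, l2_lambda=1e-4):
    """Eq. (9): MSE over the 381 x 3 samples plus an L2 penalty on the weights."""
    mse = torch.mean((S_est - S_gt) ** 2)                 # L_MSE, N = 381 * 3
    l2 = sum((p ** 2).sum() for p in model.parameters())  # sum_i w_i^2
    return mse + l2_lambda * l2

# S_est itself comes from Eq. (8): S* = F @ W, basis (381, n) times weights (n, 3).
```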

3. Experiment and results

3.1 Dataset acquisition

The image generating method described in Section 2.2.1 was employed to create the training and test datasets. The database adopted 21 CIE (International Commission on Illumination) standard illuminants from the CIE technical report (Colorimetry, 4th edition) [22] as data L, 25 sets of spectral reflectance of the 140-patch Color Checker, measured under various lighting conditions using a hyperspectral camera, as data R, and 20 spectral sensitivities of different cameras as data S. The elements L, R, and S were then processed by self-developed software to produce 10,500 simulated raw images. The light source used here was an LED lighting box constructed in our laboratory: a 700 mm × 400 mm LED panel integrated with 128 LEDs of 11 kinds (sketch shown in Fig. 8(a)). The 11 kinds of high-power LEDs consisted of 8 color LEDs and 3 white LEDs with different CCTs, whose spectral distributions are given in Fig. 8(b). The luminance level of each LED could be controlled by the LED control circuit, and the target spectral distribution was modulated by the software based on a light matching algorithm and the feedback signal from a spectrometer.

Fig. 8. (a) Sketch of the lightbox embedded with LED panels. (b) Spectral distribution of used LEDs. Intensity is in arbitrary units.

The hyperspectral camera used here was a Specim IQ made by Spectral Imaging Ltd., working in the visible to near-infrared band (400 ∼ 1000 nm). Figure 9 shows the whole experimental setup for spectral reflectance measurement.

Fig. 9. Experiment setup to capture the spectral reflectance of the 140-patch Color Checker.

3.2 Network training and preprocessing implementation

With each module of the estimation process defined, the network was trained and tested on the dataset described above, i.e., the 10,500 simulated images. The training set contained 8,400 pairs, each consisting of an input image and the corresponding spectral sensitivity functions S defined as the ground truth. There were 1,050 images in the testing set, and the remaining 1,050 images were used to validate the model after each epoch of training. Here one epoch denotes a complete pass over the training data, and the whole training phase contained 320 epochs. To avoid numerical instability during training, each mini-batch contained 100 training images of size 512 × 512 × 3 along with 100 ground-truth curves of size 381 × 3, and the model was trained with a learning rate of 3e-4. Meanwhile, preprocessing the training images before delivering them to the network improved the robustness of the model, for which the gamma function was additionally considered. The training images were randomly augmented by horizontal or vertical flipping, followed by random rotation of 0° ∼ 135° or random cropping, with the constraint that the resized images had to cover at least 80% of the information in the raw images.
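A sketch of this setup using torchvision-style augmentation is given below; the optimizer choice (Adam) and the weight_decay value are assumptions, since the paper specifies only the learning rate, batch size, and epoch count.

```python
import torch
from torchvision import transforms

# Augmentation pipeline matching the preprocessing described above.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomVerticalFlip(),
    transforms.RandomRotation(degrees=(0, 135)),
    transforms.RandomResizedCrop(512, scale=(0.8, 1.0)),  # keep >= 80% of the image
])

def train(model, train_loader, epochs=320, lr=3e-4):
    """One possible training loop; Adam and weight_decay are assumptions."""
    # weight_decay realizes the L2 term of Eq. (9) inside the optimizer.
    opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=1e-4)
    for _ in range(epochs):
        for images, gt_curves in train_loader:  # mini-batches of 100 pairs
            opt.zero_grad()
            s_est = model(images)               # (B, 381, 3) reconstructed curves
            loss = torch.mean((s_est - gt_curves) ** 2)  # L_MSE of Eq. (9)
            loss.backward()
            opt.step()
        # the 1,050 validation images are evaluated here after each epoch
```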

3.3 Network training and testing

For each training phase, the value of the loss function over the iterations was recorded, i.e., Loverall (hereafter called MSE), which serves as a direct criterion for evaluating the experimental accuracy. The final test loss and the estimated results (clipped to 0 ∼ 1) calculated on the testing dataset were also recorded. Training results showed that both the number and the specific shape of the basis functions significantly affect the experimental accuracy. With the function number n fixed to 8, the performance of each type of basis function is shown in Figs. 10–12, giving the variation of the loss function, the estimated spectral sensitivity functions, and the ground truth data. Batch and epoch training loss represent the mean MSE of each mini-batch and each training epoch, respectively.

Fig. 10. (Top left) Loss value of each batch under the setting of FBF (n=8). The red circle denotes the local enlargement of the curve. (Top right) Illustration of the validation loss and epoch loss for FBF (n=8). (Bottom) Estimated results for RGB channels along with the corresponding ground truth data, which are calculated from the testing dataset. Solid and dashed lines represent the ground truth (GT) and estimated (E) spectral sensitivity functions, respectively.

Fig. 11. (Top left) Loss value of each batch under the setting of SVDBF (n=8). The red circle denotes the local enlargement of the curve. (Top right) Illustration of the validation loss and epoch loss for SVDBF (n=8). (Bottom) Estimated results for RGB channels along with the corresponding ground truth data, which are calculated from the testing dataset. Solid and dashed lines represent the ground truth (GT) and estimated (E) spectral sensitivity functions, respectively.

Fig. 12. (Top left) Loss value of each batch under the setting of RBF (n=8). The red circle denotes the local enlargement of the curve. (Top right) Illustration of the validation loss and epoch loss for RBF (n=8). (Bottom) Estimated results for RGB channels along with the corresponding ground truth data, which are calculated from the testing dataset. Solid and dashed lines represent the ground truth (GT) and estimated (E) spectral sensitivity functions, respectively.

As can be seen in Figs. 10–12, the loss function of all methods converges after approximately 1000 batch iterations, and the three sets of curves coincide closely over the whole spectrum, which proves the feasibility of the proposed model. The final test loss (MSE) for FBF, SVDBF, and RBF is 0.0386, 0.0133, and 0.0106, respectively. Accordingly, the estimation accuracy of RBF is the best while that of FBF is the worst. The reason might be that the dimension of the subspace spanned by the first eight Fourier basis functions is lower than that of the spectral sensitivity. Compared with FBF, the first eight eigenvectors extracted from SVD cover more than 89.3% of the information in the spectral sensitivity database, indicating higher linear independence of these vectors. RBF performs better because it is characteristically close to the objective functions.

A larger number of basis functions is expected to provide better results. To investigate how n affects the estimation accuracy, experimental results for different values of n, covering the range from n=6 to n=50, are given in Table 2. In particular, the performance of the network under the settings n=16 and n=50 is illustrated in Fig. 13 and Fig. 14.

Fig. 13. Estimated results for RGB channels along with the corresponding ground truth data with three types of basis functions under the setting of n=16.

Fig. 14. Estimated results for RGB channels along with the corresponding ground truth data with three types of basis functions under the setting of n=50.

Table 2. The final test loss (MSE) for different basis functions.

As demonstrated in Fig. 13 and Fig. 14, the smoothness and continuity of the ground truth curves are accurately learned by the estimated spectral sensitivity functions. When n is 16 (or 50), the final test loss (MSE) for FBF, SVDBF, and RBF is 0.0090 (0.0086), 0.0125 (0.0131), and 0.0088 (0.0093), respectively. The results in Fig. 13 and Fig. 14, using more basis functions, show an improvement over Figs. 10–12. Meanwhile, the accuracy of FBF is more sensitive to the increase of n than that of the other two basis functions, since the additional Fourier basis functions can describe more details of the spectral sensitivity functions. Because the last eight eigenvectors of SVDBF carry less than 10.7% of the weight, the additional eigenvectors contribute little to the experimental accuracy.

Given different function numbers, the overall MSE results are shown in Table 2 and Fig. 15. The minimal MSE loss of 0.0086 is obtained using FBF. The main reason is that, compared with the other two basis functions, FBF is the only one with the properties of normalization, orthogonality, and completeness, which means it can offer more high-dimensional information to fit more features of the objective functions; theoretically, an arbitrary signal can be composed from an integral of sinusoids. RBF and SVDBF perform better especially when the function number is smaller than 10, which also costs less computing time. Besides, as n increases, the MSE of the three basis functions fluctuates to different degrees. This may be caused by over-fitting with large numbers of basis functions and parameters. For instance, when n is 25 or larger, the dimension of the basis also increases; more details can be learned and overfitted during the training phase, but such ‘details’ and high-frequency components may have a negative effect in real testing.

Fig. 15. The final test loss (MSE) for different basis functions.

Because the confidence voting module plays a key role in CVNet, to investigate its effect on experimental accuracy, a traditional convolutional neural network (CNN), in which the confidence voting layer is replaced with 2 convolutional layers and 2 pooling layers, is employed for comparison. The final MSE of this CNN without the confidence voting layer is compared with that of CVNet in Table 3, under the setting of n=8. Better results are obtained with the confidence voting layer, since it can evaluate the amount of useful information supplied by different image segments and thereby increase the reliability of the model.

Table 3. Comparison of CVNet and CNN without confidence voting module.

Besides, a new synthetic dataset was collected for a simulation test, whose spectral sensitivity data, light source data, and reflectance data had never been used in the training dataset. The MSE losses of our network on the new dataset are given in Table 4; they are at the same level as the MSE in Table 3, verifying the generalization ability of the proposed model.

Table 4. Testing results on the synthetic dataset.

The robustness against noise was also considered and analyzed by adding random noise to the testing images (those used in Table 4). Testing results on the noisy dataset are shown in Table 5 and Fig. 16. Compared with Table 4, the added noise has little effect on the experimental accuracy, and the noise robustness of the model becomes stronger as n increases. This is because the confidence voting module in our network is specifically designed to automatically discriminate the features extracted from different segments of the image: high-dimensional features learned from ‘noisy segments’ score lower confidence levels, making little contribution to the estimation result.
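As a sketch of this test, additive Gaussian noise is one plausible choice (the paper says only ‘random noise’); the noise level below is a placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(img, sigma=0.01):
    """Add zero-mean Gaussian noise to a [0, 1] image (noise model assumed)."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)
```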

Fig. 16. Experimental results using datasets with and without noise for comparison. Solid lines represent the results without noise. Dotted lines represent the results with noise, labeled (N-).

Table 5. Testing results on the synthetic dataset with random noise.

All approaches were run on a Tesla P100 GPU and the average runtime over 50 runs was recorded. Under the settings n=8, n=16, n=25, and n=50, CVNet took 10.7 ms, 11.7 ms, 12.9 ms, and 16.2 ms, respectively, for each conversion of a 512 × 512 image into the estimated spectral sensitivity functions. In contrast, the traditional scanning method requires dozens of minutes to hours to measure the spectral response over the whole visible band, depending on the speed of system calibration, mechanical scanning, and image signal processing.

3.4 Comparison to other references

For a fair comparison, all the measurement errors are converted into accuracy. Experimental results reported by our method and other references are demonstrated in Table 6.

Table 6. Comparison of our experimental results to other references.

As shown in Table 6, the accuracy of our method is higher than that of the other three methods: 91.26% using a fluorescent target, 95.79% using neural learning and architecture, and 95.00% using a monochromator and narrow-band interference filters. The method of Ref. [21] requires a new spectral reflectance dataset when the trained model is adapted to a new camera. The use of a fluorescent target, monochromator, and narrow-band interference filters is imperative during the measurements in Refs. [23] and [24], and these devices are expensive and difficult to set up. Compared with these traditional approaches, the proposed method significantly improves the reconstruction accuracy.

3.5 New approach validation

To test the performance of the trained network in a real measurement, a raw image (*.nef file) of the 140-patch Color Checker was captured with a Nikon D3X. Demosaicing was then performed using the Bayer pattern for each color channel; no other operations were applied to the image gray-level intensities or chromatic values. Under the settings n=8, 16, and 50, the estimated spectral sensitivity functions with the three types of basis functions and the ground-truth data of the Nikon D3X are shown in Fig. 17. In addition, a natural raw image (not the checker image used for training; shown in Fig. 18) captured by the Nikon D3X is also used as the network input, and the corresponding reconstruction results are shown in Fig. 19.
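For the raw-file handling, a minimal sketch using the rawpy library is shown below (the paper does not name its demosaicing tool); gamma and auto-brightening are disabled so the gray-level intensities stay untouched, and the file path is illustrative.

```python
import numpy as np
import rawpy

with rawpy.imread('d3x_color_checker.nef') as raw:
    # Demosaic only: linear output, no gamma curve, no auto brightening.
    rgb = raw.postprocess(gamma=(1, 1), no_auto_bright=True, output_bps=16)

img = rgb.astype(np.float32) / 65535.0  # normalize to [0, 1] for the network
```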

Fig. 17. Estimated spectral sensitivity and ground-truth data of Nikon D3X with different basis functions. B, G, and R denote three channels of the camera. For n=8, 16 and 50, the MSE of three kinds of basis functions (FBF), (SVDBF), and (RBF) is (0.0358, 0.0088, 0.0072), (0.0130, 0.0148, 0.0115), and (0.0109, 0.0071, 0.0074), respectively.

Fig. 18. A natural raw image captured by Nikon D3X for real measurement.

Fig. 19. Estimated spectral sensitivity and ground-truth data of Nikon D3X using a natural raw image, instead of the 140-patch Color Checker images used for training. For n=8, 16 and 50, the MSE of three kinds of basis functions (FBF), (SVDBF), and (RBF) is (0.0472, 0.0096, 0.0107), (0.0121, 0.0120, 0.0115), and (0.0087, 0.0081, 0.0077), respectively.

From Fig. 17 and Fig. 19, we can see that our network performs well not only on the checker images but also on the natural image. It should be noted, however, that rebuilding the spectral sensitivity becomes a severely ill-posed problem if the input scene image contains too little spectral information; better results are obtained with imaging targets that have rich color features. In future work, we plan to investigate in detail how the amount of scene color features influences the reconstruction accuracy and to determine the minimum number of color shades needed in the input image. Meanwhile, other deep learning architectures with strong generation ability, such as the generative adversarial network (GAN), will also be considered. For CVNet, however, we still recommend using objects with more color features to achieve high spectral response estimation accuracy in real measurements.

In Fig. 17 and Fig. 19, although the reproduced curves are very close to the original ones, details of the objective functions are lost. Deep learning techniques work best when the training data resembles the evaluation data; the main reason for the loss of detail is that the spectral curves used in the simulation are very smooth, whereas real camera spectral curves may not be. To address this problem, the network can be retrained with a new dataset containing such irregular spectral curves. Moreover, the characteristics of the basis functions can be optimized.

For that purpose, the network has been trained with another 10,500 images including irregular spectral responses, while a new type of radial basis function (RBF*) is employed, whose parameters cj and δj are fixed and cannot be adjusted during training. As discussed in Section 2.2.3, this kind of basis function may lead to oscillation in the reconstructed curve, but when n is small this characteristic is suitable for fitting non-smooth spectral responses. The reconstruction result is shown in Fig. 20; as before, the network input is a raw image captured by the Nikon D3X.

Fig. 20. Experimental result of Nikon D3X calculated by RBF* (MSE=0.0062), with the network trained on the new dataset.

As expected, better results are obtained with the retrained network and RBF*, showing the potential of our network to deal with curves that may not be smooth. In the future, the training dataset can be expanded to more kinds of spectral curves, and more basis functions can be designed to adapt to various experimental conditions. For example, in addition to RBF*, Fourier basis functions with high frequencies or radial basis functions with small half-widths can be used to learn more details and high-frequency components of the objective functions.

4. Discussion and conclusions

In this study, CVNet was proposed to reconstruct the camera spectral sensitivity, defined as a sum of weighted basis functions, with only one image as input. There are two major parts in the network: (1) basic convolutional operations and (2) confidence voting integration. High-dimensional features from different image regions are extracted through multiple convolutional operations and then integrated by the confidence voting algorithm to generate the weights. The mapping from image to spectral sensitivity can thus be rebuilt autonomously. Experiments show that the estimation results of the proposed method have high accuracy in both training and validation.

Based on the camera response formation model, 10,500 simulated raw images were adopted as the database for network training, validation, and testing. Meanwhile, a gamma nonlinear layer and image preprocessing were proposed to improve the model robustness. Results show that the three basis functions yield disparate accuracy as the function number n changes. When n is small, RBF and SVDBF perform better because of their strong dimensionality reduction; as n increases, FBF achieves higher accuracy due to its orthogonal completeness. Finally, a Nikon D3X was tested on the well-trained network. The MSE of the estimated spectral sensitivity for FBF, SVDBF, and RBF is 0.0088, 0.0148, and 0.0071, respectively, which verifies the effectiveness of our model.

Compared with other approaches, our proposed method features simpler design, lower cost, higher flexibility, and no strict experimental requirements. Because CNNs allow parallel batch processing, this work can run on a GPU to accelerate computation. Relying mostly on deep learning computation, the whole estimation can be completed without an expensive and redundant hardware environment. Another advantage of our method is its real-time capability: the developed algorithm has millisecond-level running time, so it can be used in many real-time measurements. For example, some sensors are extremely sensitive to changes in environmental temperature and air humidity, which means a spectral sensitivity pre-calibrated with the traditional spectral scanning method would not remain valid; our artificial neural network can instead realize in-time spectral sensitivity reconstruction and computational imaging.

Certainly, the precision of the model can be improved by setting more complex basis functions, but this leads to longer computation time and higher memory occupation. The algorithm can be optimized in the future to find a better balance between the required precision and the hardware cost. Moreover, although GPUs offer rapid instruction processing, many smart applications, such as mobile phones and portable devices, lack a high-powered GPU; this can be addressed by algorithm optimization for terminal systems and model migration.

Moreover, this method can be used in multi-spectral imaging by providing in-time camera spectral sensitivity reconstruction. For that purpose, multi-modal learning mechanism can be developed to learn both the camera characteristic and scene spatio-spectral information from only one captured image of the scene. This will offer great potential for real-time 3D imaging, virtual/augmented reality display, biomedical spectroscopy measurement, and so on.

Funding

National Natural Science Foundation of China (62075143).

Acknowledgment

The authors would like to acknowledge funding support from The National Natural Science Foundation of China.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. J. Y. Hardeberg, H. Brettel, and F. J. M. Schmitt, “Spectral characterization of electronic cameras,” (1998), pp. 100–109.

2. M.-H. Lee, H. Park, I. Ryu, and J.-I. Park, “Fast model-based multispectral imaging using nonnegative principal component analysis,” Opt. Lett. 37(11), 1937–1939 (2012). [CrossRef]  

3. S. Ono, “Snapshot multispectral imaging using a pixel-wise polarization color image sensor,” Opt. Express 28(23), 34536–34573 (2020). [CrossRef]  

4. A. M. Nahavandi, M. Safi, P. Ojaghi, and J. Y. Hardeberg, “LED primary selection algorithms for simulation of CIE standard illuminants,” Opt. Express 28(23), 34390–34405 (2020). [CrossRef]  

5. H.-C. Wang, Y.-T. Chen, J.-T. Lin, C.-P. Chiang, and F.-H. Cheng, “Enhanced visualization of oral cavity for early inflamed tissue detection,” Opt. Express 18(11), 11800–11809 (2010). [CrossRef]  

6. J. Liang, K. Xiao, M. R. Pointer, X. Wan, and C. Li, “Spectra estimation from raw camera responses based on adaptive local-weighted linear regression,” Opt. Express 27(4), 5165–5180 (2019). [CrossRef]  

7. Z. Sadeghipoor, Y. M. Lu, and S. Suesstrunk, “Optimum Spectral Sensitivity Functions for Single Sensor Color Imaging,” in Digital photography VIII, (Burlingame, CA(US), 2012), pp. 829904.829901–829904.829914.

8. J. Liang and X. Wan, “Optimized method for spectral reflectance reconstruction from camera responses,” Opt. Express 25(23), 28273–28287 (2017). [CrossRef]  

9. M.-H. Lee, D.-K. Seo, B.-K. Seo, and J.-I. Park, “Optimal illumination for discriminating objects with different spectra,” Opt. Lett. 34(17), 2664–2666 (2009). [CrossRef]  

10. J. Qiu and H. Xu, “Camera response prediction for various capture settings using the spectral sensitivity and crosstalk model,” Appl. Opt. 55(25), 6989–6999 (2016). [CrossRef]  

11. M. Rump, A. Zinke, and R. Klein, “Practical spectral characterization of trichromatic cameras,” in Proceedings of the 2011 SIGGRAPH Asia Conference, (Association for Computing Machinery, Hong Kong, China, 2011), p. Article 170.

12. F. Sigernes, J. M. Holmes, M. Dyrland, D. A. Lorentzen, T. Svenøe, K. Heia, T. Aso, S. Chernouss, and C. S. Deehr, “Sensitivity calibration of digital colour cameras for auroral imaging,” Opt. Express 16(20), 15623–15632 (2008). [CrossRef]  

13. M. M. Darrodi, G. Finlayson, T. Goodman, and M. Mackiewicz, “Reference data set for camera spectral sensitivity estimation,” J. Opt. Soc. Am. A 32(3), 381–391 (2015). [CrossRef]  

14. F. Sigernes, M. Dyrland, N. Peters, D. A. Lorentzen, T. Svenøe, K. Heia, S. Chernouss, C. S. Deehr, and M. Kosch, “The absolute sensitivity of digital colour cameras,” Opt. Express 17(22), 20211–20220 (2009). [CrossRef]  

15. K. Mahmoud, S. Park, S.-N. Park, and D.-H. Lee, “Measurement of normalized spectral responsivity of digital imaging devices by using a LED-based tunable uniform source,” Appl. Opt. 52(6), 1263–1271 (2013). [CrossRef]  

16. R. Kawakami, H. Zhao, R. T. Tan, and K. Ikeuchi, “Camera Spectral Sensitivity and White Balance Estimation from Sky Images,” International Journal of Computer Vision 105 (2013).

17. J. Zhu, X. Xie, N. Liao, Z. Zhang, W. Wu, and L. Lv, “Spectral sensitivity estimation of trichromatic camera based on orthogonal test and window filtering,” Opt. Express 28(19), 28085–28100 (2020). [CrossRef]  

18. G. Finlayson, M. M. Darrodi, and M. Mackiewicz, “Rank-based camera spectral sensitivity estimation,” J. Opt. Soc. Am. A 33(4), 589–599 (2016). [CrossRef]  

19. C. P. Huynh and A. Robles-Kelly, “Recovery of Spectral Sensitivity Functions from a Colour Chart Image under Unknown Spectrally Smooth Illumination,” in 2014 22nd International Conference on Pattern Recognition, 2014), 708–713.

20. H. Zhao, R. Kawakami, R. T. Tan, and K. Ikeuchi, “Estimating basis functions for spectral sensitivity of digital cameras,” in Meeting on Image Recognition and Understanding (MIRU) (2009), pp. 7–13.

21. S. Chaji, A. Pourreza, H. Pourreza, and M. Rouhani, “Estimation of the camera spectral sensitivity function using neural learning and architecture,” J. Opt. Soc. Am. A 35(6), 850–858 (2018). [CrossRef]  

22. CIE 015:2018, Colorimetry, 4th Edition (Commission Internationale de l’Eclairage, 2018).

23. S. Han, Y. Matsushita, I. Sato, T. Okabe, and Y. Sato, “Camera spectral sensitivity estimation from a single image under unknown illumination by using fluorescence,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition. [v.1], (Providence, Rhode Island, USA, 2012), pp. 805–812. [CrossRef]  

24. C. Mauer and D. Wueller, “Measuring the spectral response with a set of interference filters,” in Digital photography V, (San Jose, CA(US), 2009), pp. 72500S:72501–72500S:72510.
