
DeepCubeNet: reconstruction of spectrally compressive sensed hyperspectral images with deep neural networks

Open Access

Abstract

Several hyperspectral (HS) systems based on compressive sensing (CS) theory have been presented to capture HS images with high accuracy and with fewer measurements than needed by conventional systems. However, the reconstruction of HS compressed measurements is time-consuming and commonly involves hyperparameter tuning for each scenario. In this paper, we introduce a Convolutional Neural Network (CNN) designed for the reconstruction of HS cubes captured with CS imagers based on spectral modulation. Our Deep Neural Network (DNN), dubbed DeepCubeNet, provides a significant reduction in reconstruction time compared to classical iterative methods. The performance of DeepCubeNet is investigated on simulated data, and we demonstrate for the first time, to the best of our knowledge, real reconstruction of CS HS measurements using a DNN. We demonstrate significantly enhanced reconstruction accuracy compared to iterative CS reconstruction, as well as improvement in reconstruction time by many orders of magnitude.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

For more than a decade, Compressive Sensing (CS) techniques have been explored extensively for optical sensing and imaging [1], with the purpose of reducing the required acquisition effort and data storage space. In hyperspectral (HS) sensing specifically, the issues of storage and acquisition time are even more dominant because of the high dimensionality of HS cubes: classical acquisition techniques fail to deal with dynamic scenes due to long acquisition times, and HS cubes can reach tens of gigabytes of storage.

A number of techniques based on the CS framework have been proposed recently for the acquisition of HS data [1–10]. Most of these techniques show promising reconstruction results while maintaining a low number of measurements. However, they share a common downside: the reconstruction is obtained by iterative algorithms, which are very time-consuming and usually demand tuning parameters that may be scenario sensitive. One of these techniques is the Compressive Sensing Miniature Ultra-Spectral Imager (CS-MUSI) [11]. The CS-MUSI was demonstrated to capture hundreds of spectral bands with an order of magnitude fewer samples than conventional CS imagers. The CS-MUSI employs CS only in the spectral domain; therefore, it fully preserves the spatial resolution. A brief description of the CS-MUSI is given in Sec. 2. The CS-MUSI was shown to reach high reconstruction accuracy using iterative reconstruction algorithms; however, as mentioned, those algorithms are time-consuming, and naive reconstruction of one HS cube may take several hours due to their iterative nature. The reconstruction time is particularly long with the CS-MUSI due to its capability of capturing HS images of Gigavoxel size.

In recent years, deep neural networks (DNNs) [12] have been shown to provide state-of-the-art results for machine learning tasks, and also to reduce computation times at prediction, owing to their feed-forward structure. Various types of DNNs, especially Convolutional Neural Networks (CNNs), show high accuracy in the tasks of denoising [13], super-resolution [14], inverse problem solving in imaging [15], and CS reconstruction [16]. CNNs have been applied to various optical imaging applications, such as optical tomography [17], computational imaging [18], holography [19], microscopy [20], ptychography [21], and imaging behind scattering media [22], amongst others.

CNNs were applied to the processing of HS data in [23–25]. In [23], the authors reconstructed HS cubes from RGB measurements using an architecture similar to Very Deep Super Resolution (VDSR) [26]. The authors tested different blocks, where the best-performing block was a two-dimensional (2D) dense convolutional block. The paper shows high reconstruction accuracy for all the 2D convolutional architectures, with faster run times than classical sparse coding methods. However, the authors presented reconstruction results with only 31 spectral channels (cf. 391 in our work herein) and did not demonstrate generalizability to other datasets (which is demonstrated in our present work). Recently, a Deep Learning (DL) scheme [24] was shown to reconstruct CS spectrometer measurements. However, all training and reconstruction were done on simulated measurements using a Gaussian Mixture prior, which does not necessarily describe real spectra. In [27], we presented a DNN for the reconstruction of compressively sensed spectral images; that DNN is applied in the spectral domain only, and was demonstrated to obtain moderate performance on simulated data only.

Lastly, simulated reconstruction for Compressive Coded Aperture Spectral Imaging (CASSI) [2], with joint optimization of its coded aperture, was shown to achieve high reconstruction accuracy in [25]. Due to the structure of CASSI, which multiplexes both the spatial and the spectral domains, the authors use two concatenated CNNs: a spatial CNN followed by a spectral CNN. This makes the reconstruction more complex and demands more parameters. The authors report performance degradation on the Harvard dataset [28], which indicates a lack of generalization. Also, the CASSI system inherently loses information in the spatial domain by using a coded aperture. In contrast, the CS-MUSI system encodes only the spectral domain without damaging the spatial resolution. Moreover, the mentioned paper deals with the reconstruction of 31 spectral channels, while here we demonstrate CS HS reconstruction with over an order of magnitude more channels.

In this paper, we present a DNN developed for the reconstruction of data captured with the CS-MUSI. The CS-MUSI preserves the spatial resolution and allows reconstruction of hundreds of spectral bands. These properties have been found to be very important, for example, for target detection tasks [29,30]. For the reconstruction of spectral cubes captured with the CS-MUSI, we designed an end-to-end DNN, dubbed DeepCubeNet (Deep hyperspectral Cube reconstruction Network) [31]. Our DNN consists essentially of two parts: the first is an approximate solver in the form of a pseudo-inverse operation, and the second is a U-Net [32] architecture in which the 2D convolutions are replaced by 3D convolutions.

The structure of the paper is as follows. In Sec. 2, we briefly describe the CS-MUSI system. In Sec. 3, we describe the DeepCubeNet architecture and the training process. In order to train DeepCubeNet for the task of CS HS reconstruction, we created a CS HS database by simulating the compressed measurements: we applied the CS-MUSI compression operator to each spectrum of the HS cubes in the ICVL benchmark dataset [33]. We then trained the network to reconstruct the original HS cubes from these measurements. In Sec. 4, we present reconstruction results. We provide quantitative results in the form of Peak Signal-to-Noise Ratio (PSNR), together with visual comparisons. We show reduced reconstruction time and high accuracy both for simulated data and for real measurements acquired by the CS-MUSI. We test our network on real data acquired by the CS-MUSI and compare the results to ground-truth spectra measured by a commercial spectrometer. Applying the network to real data is challenging, because we did not have proper ground-truth spectra for training the network in a supervised fashion or for applying transfer learning to the network trained on simulations. Nevertheless, the results are competitive with classical reconstruction methods and outperform other reported deep learning schemes. To our knowledge, this is the first demonstration of reconstruction of real compressive HS imager measurements using DL techniques.

2. Imaging system: CS-MUSI camera

The CS-MUSI [3,11] was designed to work as a spectral modulator compliant with the CS framework, using a single liquid crystal (LC) phase retarder that encodes only the spectral domain. The CS-MUSI changes its spectral modulation by modifying the refractive index of the LC cell via application of different voltages. The spectral transmission response of the LC phase retarder, for the case where the optical axis of the LC cell is at 45° to two perpendicular polarizers, can be described by [34]:

$${\phi _{\textrm{LC}}}({\lambda ,{V_i}} )= \frac{1}{2} - \frac{1}{2}\cos \left( {\frac{{2\pi \Delta n({V_i})d}}{\lambda }} \right),$$
where $d$ is the cell thickness, $\lambda$ is the wavelength, and $\Delta n({V_i})$ is the birefringence produced by voltage ${V_i}$.
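To make the modulation model concrete, the following is a minimal NumPy sketch of Eq. (1); the optical-path-difference values $\Delta n({V_i})d$ are illustrative placeholders rather than calibrated CS-MUSI values.

```python
import numpy as np

# Sketch of Eq. (1): spectral transmission of the LC phase retarder.
# delta_n_d holds Delta_n(V_i)*d, one value per applied voltage; these are
# hypothetical placeholders, not calibrated CS-MUSI values.
wavelengths = np.linspace(400e-9, 800e-9, 391)  # 391 bands over 400-800 nm
delta_n_d = np.linspace(2e-6, 20e-6, 32)        # hypothetical optical path differences

# phi_LC(lambda, V_i) = 1/2 - 1/2 * cos(2*pi*Delta_n(V_i)*d / lambda)
phi = 0.5 - 0.5 * np.cos(2 * np.pi * delta_n_d[:, None] / wavelengths[None, :])
print(phi.shape)  # (32, 391): one transmission curve per voltage
```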

The CS-MUSI camera we built is shown in Fig. 1. An LC cell is placed in the image plane of a zoom lens. The light transmitted through the LC cell is conjugated to a sensor array using a 1:1 relay lens.

Fig. 1. CS-MUSI camera.

The CS process may be described by applying a sensing matrix $\boldsymbol{\Phi }$ to an HS cube $\textbf{f}$, resulting in a compressed cube $\textbf{g}$, as described in Eq. (2):

$$\textbf{g} = \boldsymbol{\Phi}\textbf{f}.$$
The sensing matrix $\boldsymbol{\Phi }$ of the CS-MUSI has the form of:
$$\boldsymbol{\Phi } = \left[ {\begin{array}{cccc} {{\boldsymbol{\Phi }_\lambda }}&{{0^{{M_\lambda } \times {N_\lambda }}}}& \cdots &{{0^{{M_\lambda } \times {N_\lambda }}}}\\ {{0^{{M_\lambda } \times {N_\lambda }}}}&{{\boldsymbol{\Phi }_\lambda }}& \cdots &{{0^{{M_\lambda } \times {N_\lambda }}}}\\ \vdots & \vdots & \ddots & \vdots \\ {{0^{{M_\lambda } \times {N_\lambda }}}}&{{0^{{M_\lambda } \times {N_\lambda }}}}& \cdots &{{\boldsymbol{\Phi }_\lambda }} \end{array}} \right],$$
where ${\boldsymbol{\Phi }_\lambda }$ is the spectral sensing matrix representing the spectral modulation of each pixel. The size of ${\boldsymbol{\Phi }_\lambda }$ is ${M_\lambda } \times {N_\lambda }$. The rows of ${\boldsymbol{\Phi }_\lambda }$ are determined by the LC spectral transmission response of Eq. (1). Equation (3) has the form of a block-diagonal matrix and can be written in an abbreviated form as $\boldsymbol{\Phi } = \textbf{I} \otimes {\boldsymbol{\Phi }_\lambda }$, where the operator ${\otimes}$ denotes the Kronecker tensor product [35] and I is the identity matrix of size ${N_x}{N_y} \times {N_x}{N_y}$. The CS-MUSI spectral sensing matrix was obtained by pseudo-randomly sampling the full transmission map of the system, which was measured during the calibration process; the matrix is presented in Fig. 2.
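Because of this Kronecker structure, the full matrix $\boldsymbol{\Phi}$ never needs to be formed explicitly: applying ${\boldsymbol{\Phi}_\lambda}$ to each pixel's spectrum is equivalent. A small NumPy sketch with toy dimensions (and spectra assumed to be stacked pixel by pixel in the vectorized cube) illustrates the equivalence:

```python
import numpy as np

# Sketch of Eqs. (2)-(3): the block-diagonal sensing matrix Phi = I (x) Phi_lambda
# applied to a vectorized HS cube. Spatial dimensions are toy values.
Nx, Ny, N_lam, M_lam = 4, 4, 391, 32
Phi_lam = np.random.rand(M_lam, N_lam)      # stand-in for the calibrated matrix
f = np.random.rand(Nx * Ny, N_lam)          # HS cube, one spectrum per row

# Explicit Kronecker form (only feasible for tiny cubes):
Phi = np.kron(np.eye(Nx * Ny), Phi_lam)     # (Nx*Ny*M_lam, Nx*Ny*N_lam)
g_full = Phi @ f.reshape(-1)

# Equivalent, memory-efficient form: apply Phi_lambda to each pixel's spectrum.
g_fast = (f @ Phi_lam.T).reshape(-1)
print(np.allclose(g_full, g_fast))          # True
```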

Fig. 2. Visualization of the 32 by 391 CS-MUSI spectral sensing matrix, ${\boldsymbol{\Phi }_\lambda }$, describing the spectral transmission in the range of 400-800 nm for 32 different voltages applied on the LC.

3. CS-MUSI data reconstruction with DeepCubeNet

3.1 Data preparation

For training the DeepCubeNet we used the ICVL HS database [33]. The data consists of HS images acquired by the Specim PS Kappa DX4 HS camera, with spectral bands in the range of 400-1000 nm. We chose 122 HS cubes as our training set, 10 HS cubes as the validation set, and 10 HS cubes as the test set. The validation and test sets were carefully selected to provide variability of scenes and conditions, including indoor and outdoor conditions and objects captured from different distances and having various sizes. Figure 3 shows RGB representations of three cubes from the dataset.

Fig. 3. RGB representations of three of the data set cubes.

The CS system is designed to capture data in the spectral range of 410-800 nm; therefore, we cropped this range from the original spectral cubes and resized the data to 391 channels. We then simulated the CS-MUSI sensing process by compressing the HS cubes to 32 channels, applying the sensing matrix of Eq. (3) following the process of Eq. (2).

The process of extracting training pairs of compressed and ground-truth patches is shown in Fig. 4. Having both the compressed and original HS cubes, we extracted pairs of patches with a spatial size of 64 × 64, so that each simulated compressed patch of size 64 × 64 × 32 has its original ground-truth pair of size 64 × 64 × 391. The patches have an overlap of 32 pixels. Due to the large spatial dimensions of the cubes, the number of patches generated is on the order of thousands. Additionally, HS cubes demand large amounts of memory and therefore cannot be processed in batches in our CNN, which results in long training times. Since the CS-MUSI modulates only the spectral-domain information without harming the spatial information, we addressed these memory and training-time issues by applying spatial down-sampling to all HS cubes. The down-sampling enables faster training without significant loss of information, and also helps with the class imbalance present in the data set by introducing more variance in each patch.
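A minimal sketch of the patch-pair extraction described above (64 × 64 spatial patches with a 32-pixel overlap, i.e., stride 32); the helper name and cube sizes are illustrative:

```python
import numpy as np

# Extract aligned training pairs: 64x64 spatial patches with stride 32,
# taken at identical locations from the compressed cube (32 channels)
# and the ground-truth cube (391 channels).
def extract_patch_pairs(cs_cube, hs_cube, patch=64, stride=32):
    """cs_cube: (Nx, Ny, 32) compressed; hs_cube: (Nx, Ny, 391) ground truth."""
    pairs = []
    nx, ny = cs_cube.shape[:2]
    for x in range(0, nx - patch + 1, stride):
        for y in range(0, ny - patch + 1, stride):
            pairs.append((cs_cube[x:x + patch, y:y + patch, :],
                          hs_cube[x:x + patch, y:y + patch, :]))
    return pairs

pairs = extract_patch_pairs(np.zeros((256, 256, 32)), np.zeros((256, 256, 391)))
print(len(pairs))  # 49 patch pairs for a 256x256 cube
```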

Fig. 4. Data pair generation: the original HS cube is compressed using the CS matrix; patches are extracted with overlap from both the original HS cube and the CS cube, generating training pairs. ${N_x}$ and ${N_y}$ are the spatial sizes of the cube, ${N_\lambda }$ is the spectral domain size, and ${M_\lambda }$ is the number of CS-MUSI measurements.

After repeating the above process for each HS cube, we end up with 1071 training patches and 82 validation patches.

3.2 DeepCubeNet architecture

The architecture of the DeepCubeNet [31] is shown in Fig. 5.

Fig. 5. DeepCubeNet architecture: a pseudo-inverse operation followed by a three-level U-Net. The size of the output tensor is given to the left of each box, and the number of filters is denoted on top of each box. The grey boxes represent copied feature maps.

DeepCubeNet consists of two parts: the first is a pseudo-inverse operator, and the second is a U-Net architecture with 3D convolutions. The pseudo-inverse operation back-projects the data from the compressed domain to the HS domain. The pseudo-inverse matrix is defined in Eq. (4):

$$\boldsymbol{\Phi}_\lambda^{\dagger} = \boldsymbol{\Phi}_\lambda^{T}{\left( {\boldsymbol{\Phi}_\lambda \boldsymbol{\Phi}_\lambda^{T}} \right)^{-1}}.$$
Applying back-projection before the DNN is not a new concept; it was shown to be an essential part of CT reconstruction from undersampled measurement schemes in [36]. The pseudo-inverse transform of the CS-MUSI sensing matrix serves as an initial back-projection. Additionally, it introduces prior information about the imaging system and provides the network with an initial guess, allowing it to converge to a desired minimum.
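As an illustration of Eq. (4) and the back-projection it performs, here is a minimal NumPy sketch; the sensing matrix below is a random stand-in for the calibrated CS-MUSI matrix:

```python
import numpy as np

# Sketch of Eq. (4): the Moore-Penrose pseudo-inverse of the spectral sensing
# matrix, used as the initial back-projection from 32 measurements to 391 bands.
Phi_lam = np.random.rand(32, 391)   # stand-in for the calibrated matrix

Phi_pinv = Phi_lam.T @ np.linalg.inv(Phi_lam @ Phi_lam.T)   # Eq. (4) explicitly
# np.linalg.pinv gives the same result for a full-row-rank matrix:
print(np.allclose(Phi_pinv, np.linalg.pinv(Phi_lam)))        # True (up to precision)

g = np.random.rand(32)              # one compressed spectrum
f0 = Phi_pinv @ g                   # initial 391-band estimate
```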

The pseudo-inverse operation is realized as the first block in Fig. 5. It consists of a 1 × 1 2D convolutional layer with 391 filters, with weights determined by the pseudo-inverse matrix, as shown in Fig. 6. We set this first layer to be non-trainable, which helps the neural network avoid overfitting.
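A possible realization of this frozen layer is sketched below, assuming a TensorFlow/Keras implementation (our assumption; the paper does not specify the framework). `Phi_pinv` stands for the $391 \times 32$ pseudo-inverse of Eq. (4), here replaced by a random placeholder:

```python
import numpy as np
import tensorflow as tf

# Sketch of the fixed back-projection layer: a 1x1 2D convolution with 391
# filters whose weights are the pseudo-inverse matrix and which is frozen
# (trainable=False). Phi_pinv is a placeholder for the matrix of Eq. (4).
Phi_pinv = np.random.rand(391, 32).astype("float32")

inp = tf.keras.Input(shape=(64, 64, 32))          # compressed patch, 32 channels
back_proj = tf.keras.layers.Conv2D(
    filters=391, kernel_size=1, use_bias=False,
    activation="linear", trainable=False, name="pseudo_inverse")
x = back_proj(inp)                                # (64, 64, 391) initial estimate

# A 1x1 conv kernel has shape (1, 1, in_channels, out_channels), so the
# pseudo-inverse is transposed before being loaded into the frozen layer.
back_proj.set_weights([Phi_pinv.T.reshape(1, 1, 32, 391)])
```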

Fig. 6. Initial back-projection using the pseudo-inverse operator as the first 2D convolutional operation in Fig. 5.

After the pseudo-inverse operation, the data passes through a U-Net [32] architecture, in which 3D convolutions replace the 2D convolutions of the original architecture. 3D convolution is used to exploit both the spatial context of neighboring pixels and the spectral correlation of neighboring bands [37]. The U-Net architecture consists of an encoder and a decoder with multiple resolutions and concatenations between the encoder and decoder parts. Table 1 describes the convolutional kernel sizes and pooling operations used in each level of the encoder and decoder of DeepCubeNet.

Table 1. Encoder and decoder architectures

We should mention that we have tried other architectures based on 2D convolutional layers, inspired by the performance of HSCNN+ [23]; these converged to PSNRs on the order of 30 dB. However, they failed to generalize to the reconstruction of real measurements due to their inability to correctly describe spectral information.

As can be seen in Fig. 5 and Table 1, the architecture also employs max-pooling operations that are performed solely on the spectral axis. This relates to the operation of the CS-MUSI, which leaves the spatial domain intact; therefore, there is no need for pooling in the spatial domain. However, the spectral length of the cubes is not divisible by the max-pooling size, which prevents direct concatenation of the encoder and decoder feature maps; cropping was therefore applied during the feature-map concatenations to deal with this issue.
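The following single-level sketch, again assuming TensorFlow/Keras, illustrates the combination of 3D convolutions, spectral-only max pooling, and the cropping required because the 391-band spectral axis is odd. The filter counts and kernel sizes are illustrative and do not reproduce Table 1, and `same` padding stands in for the reflective padding described in the next paragraph, which would require a custom layer:

```python
import tensorflow as tf
from tensorflow.keras import layers

inp = tf.keras.Input(shape=(64, 64, 391, 1))   # back-projected cube + channel dim

# Encoder: 3D convolution, then max pooling along the spectral axis only.
enc = layers.Conv3D(16, 3, padding="same", activation="relu")(inp)   # (64,64,391,16)
down = layers.MaxPooling3D(pool_size=(1, 1, 2))(enc)                 # (64,64,195,16)

mid = layers.Conv3D(32, 3, padding="same", activation="relu")(down)  # (64,64,195,32)

# Decoder: spectral upsampling gives 390 bands, so the 391-band encoder
# features are cropped by one band before concatenation.
up = layers.UpSampling3D(size=(1, 1, 2))(mid)                        # (64,64,390,32)
enc_crop = layers.Cropping3D(cropping=((0, 0), (0, 0), (0, 1)))(enc) # (64,64,390,16)
dec = layers.Concatenate()([up, enc_crop])
dec = layers.Conv3D(16, 3, padding="same", activation="relu")(dec)

# Final ReLU keeps the reconstructed spectra non-negative. Note that in this
# simplified one-level sketch the output has 390 bands; the full network's
# cropping scheme differs.
out = layers.Conv3D(1, 3, padding="same", activation="relu")(dec)
model = tf.keras.Model(inp, out)
print(model.output_shape)   # (None, 64, 64, 390, 1)
```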

To avoid boundary artifacts, we used reflective padding before each 3D convolution. Each 3D convolution is followed by a ReLU activation. The first 2D convolution, whose weights are set to the pseudo-inverse of the sensing matrix, is followed by a linear activation. The last layer of the network is also followed by a ReLU activation, a direct consequence of the spectrum being non-negative.

As a loss function we chose the MSE, as is common in previous work. We also tested other loss functions, such as the mean absolute error (MAE) and the Huber loss, which showed no significant improvement over the MSE loss. MSE is more sensitive to outliers in the data than MAE; when dealing with a small and unbalanced data set, this sensitivity ensures that outliers affect the training process.

4. Experimental results

The training was done on a GeForce GTX 1080 Ti GPU for 38 epochs until early stopping. We used the ADAM optimizer with an initial learning rate of 0.0001 and a batch size of 2, with PSNR as the evaluation metric. At convergence, the PSNR reached 49.52 dB on the training set and 50.16 dB on the validation set. On a spatially down-sampled test set, the PSNR reached 48 dB, and on full-resolution reconstruction, the PSNR reached 44.5 dB. This is an improvement of more than 10 dB over the state-of-the-art DL-based reconstruction [25]. The average time to reconstruct a 64 × 64 × 391 HS patch from a 64 × 64 × 32 compressed patch captured with CS-MUSI is 0.25 seconds on a GPU and 6 seconds on a CPU. In comparison, the average reconstruction time with the TwIST solver [38] is 14336 seconds.
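For reference, the PSNR figures quoted here can be computed as in the following minimal sketch, assuming signals normalized to a peak value of 1:

```python
import numpy as np

# Peak Signal-to-Noise Ratio for a reconstruction against its ground truth.
def psnr(reference, estimate, peak=1.0):
    mse = np.mean((reference - estimate) ** 2)
    return 10 * np.log10(peak ** 2 / mse)

gt = np.random.rand(64, 64, 391)
rec = gt + 0.001 * np.random.randn(64, 64, 391)   # synthetic low-noise estimate
print(f"{psnr(gt, rec):.1f} dB")                  # ~60 dB for this noise level
```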

As explained in Sec. 3.1, we perform spatial down-sampling of the database to reduce training time and storage. To show that spatial down-sampling does not deteriorate the prediction accuracy for a high-spatial-resolution cube, we provide an example in Fig. 7, which shows a full-resolution reconstruction of the spectral image at wavelength 500 nm, with 43.6 dB PSNR and 0.97 SSIM.

Fig. 7. Spatial resolution preservation: (a) original full-resolution ground truth at 500 nm; (b) reconstruction.

Figure 8 shows three example spectra from the test set with DeepCubeNet reconstruction, TwIST reconstruction, and ground truth. The PSNRs range from 38.4 dB to 41.8 dB for DeepCubeNet, and from 26.1 dB to 30.7 dB for TwIST. It can be seen that DeepCubeNet reconstructs the signal more accurately and at higher resolution than TwIST.

Fig. 8. (a)-(c) Three examples of spectra reconstructed with DeepCubeNet (red dashed line) and TwIST (dotted line), compared with ground truth (blue solid line).

We tested the generalization of our network on real data acquired by the CS-MUSI [3,11]. This presents a challenge because we did not have a database with real ground truth to use for transfer learning or supervised learning. Figure 9 shows the reconstruction results for an image consisting of three RGB LED arrays. Figure 9(a) shows the pseudo-color image of the LEDs, obtained by projecting the HS cube predicted by DeepCubeNet onto the RGB space, and Fig. 9(b) shows the corresponding photo of the same LEDs taken with a standard RGB camera. Comparing the predictions to spectrometer measurements, we reach a PSNR of 30 dB for the blue LED, 31.7 dB for the red LED, and 27.9 dB for the green LED. Figure 9(c) shows the reconstructed LED spectra compared to the spectrometer ground-truth measurements.

Fig. 9. (a) RGB projection of the predicted LED HS cube; (b) RGB image of the RGB LED arrays; (c) spectra reconstruction of the green, blue, and red LEDs in Fig. 9(a) compared to spectrometer ground truths.

Figure 10 demonstrates HS imaging of light reflected from objects that have broadband spectra. In this example, the illumination source was a halogen lamp, and the light reflected from three car models was captured by the CS-MUSI. Figure 10(a) shows a photo of the three car models taken with a standard RGB camera. Figure 10(b) presents the pseudo-color image obtained by projecting the HS cube predicted by DeepCubeNet onto the RGB space. Owing to the separability of the spatial and spectral behavior of the CS-MUSI acquisition process [3,11], the spatial resolution is not degraded by the spectral sensing process, and the fine spatial details of the car models are preserved in the imaging process. Figures 10(c)–10(k) display nine images from the reconstructed cube at different wavelengths.

Fig. 10. (a) RGB image of the three car models. (b) RGB representation of the reconstructed car-model HS image. Nine subfigures (c)-(k) from the entire HS cube are presented, corresponding to different wavelengths (482, 490, 518, 536, 566, 588, 595, 630, 642 nm).

The reconstruction of the LED and car-model HS cubes shows that the DNN generalizes well, despite the fact that the data set fed to the DNN at the training stage contained almost no measurements under similar conditions, and no LED measurements at all. By learning the correct transformation from CS measurements to HS cubes, and generalizing over a large variety of spectra, DeepCubeNet successfully reconstructed the LED cube with minimal noise.

5. Conclusions

We have introduced the DeepCubeNet DNN for the reconstruction of CS HS images. We show that the CS-MUSI system with DeepCubeNet reconstruction outperforms previously presented systems by at least 10 dB PSNR, with an order of magnitude more spectral bands. Further, it improves the reconstruction speed of CS-MUSI data by more than three orders of magnitude. DeepCubeNet generalizes well to real measurements without any further calibration beyond the initial training, in contrast to current iterative solvers that may require parameter tuning per scene.

This performance was achieved by devising a neural network adapted to our problem. First, DeepCubeNet uses a pseudo-inverse projection as part of the network, which prevents overfitting and introduces prior knowledge of the physical measurement system into the neural network. This added knowledge also reduces the complexity of the network by allowing a reduction in the number of parameters, since the projection from the CS domain to the HS domain is performed by the pseudo-inverse operation. Additionally, the use of 3D convolutions makes it possible to capture the 3D context of HS images and leads to better generalization. Moreover, we show that, because CS-MUSI compresses only the spectral domain, we may train the network with spatially down-sampled cubes, thus saving training time and storage space while introducing higher variability into each training patch.

The method can be further improved by introducing a dedicated data set with real CS-MUSI measurements and corresponding HS cubes. Nevertheless, due to the generalization capability of the DNN, we believe transfer learning may be used, which would require gathering less data than training a DNN from scratch.

The DeepCubeNet architecture was illustrated with a CS-MUSI system that performs only spectral multiplexing, using an LC phase retarder as the spectral modulator. The architecture can also be realized with other spectral modulators, such as the Fabry-Perot resonator [9,10].

Funding

Ministry of Science, Technology and Space, Israel (3-13351, 3-18410).

Acknowledgments

The code repository, model and example data are freely available at https://github.com/dngedalin/DeepCubeNetPublic

Disclosures

The authors declare no conflicts of interest.

References

1. A. Stern, Optical Compressive Imaging (CRC Press, 2016).

2. G. R. Arce, D. J. Brady, L. Carin, H. Arguello, and D. S. Kittle, “Compressive coded aperture spectral imaging: An introduction,” IEEE Signal Process. Mag. 31(1), 105–115 (2014). [CrossRef]  

3. I. August, Y. Oiknine, M. AbuLeil, I. Abdulhalim, and A. Stern, “Miniature compressive ultra-spectral imaging system utilizing a single liquid crystal phase retarder,” Sci. Rep. 6(1), 23524 (2016). [CrossRef]  

4. Y. August, C. Vachman, Y. Rivenson, and A. Stern, “Compressive hyperspectral imaging by random separable projections in both the spatial and the spectral domains,” Appl. Opt. 52(10), D46–D54 (2013). [CrossRef]  

5. X. Lin, G. Wetzstein, Y. Liu, and Q. Dai, “Dual-coded compressive hyperspectral imaging,” Opt. Lett. 39(7), 2044–2047 (2014). [CrossRef]  

6. M. A. Golub, A. Averbuch, M. Nathan, V. A. Zheludev, J. Hauser, S. Gurevitch, R. Malinsky, and A. Kagan, “Compressed sensing snapshot spectral imaging by a regular digital camera with an added optical diffuser,” Appl. Opt. 55(3), 432–443 (2016). [CrossRef]  

7. G. R. Arce, H. Rueda, C. V. Correa, A. Ramirez, and H. Arguello, “Snapshot compressive multispectral cameras,” In Wiley Encyclopedia of Electrical and Electronics Engineering, pp. 1–22 (John Wiley & Sons, 2017).

8. X. Wang, Y. Zhang, X. Ma, T. Xu, and G. R. Arce, “Compressive spectral imaging system based on liquid crystal tunable filter,” Opt. Express 26(19), 25226–25243 (2018). [CrossRef]  

9. Y. Oiknine, I. August, D. G. Blumberg, and A. Stern, “NIR hyperspectral compressive imager based on a modified Fabry–Perot resonator,” J. Opt. 20(4), 044011 (2018). [CrossRef]  

10. Y. Oiknine, I. August, and A. Stern, “Multi-aperture snapshot compressive hyperspectral camera,” Opt. Lett. 43(20), 5042–5045 (2018). [CrossRef]  

11. Y. Oiknine, I. August, V. Farber, D. Gedalin, and A. Stern, “Compressive sensing hyperspectral imaging by spectral multiplexing with liquid crystal,” J. Imaging 5(1), 3 (2018). [CrossRef]  

12. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]  

13. J. Xie, L. Xu, and E. Chen, “Image denoising and inpainting with deep neural networks,” In Advances in neural information processing systems, pp. 341–349 (2012).

14. C. Dong, C. C. Loy, K. He, and X. Tang, “Image super-resolution using deep convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 295–307 (2016). [CrossRef]  

15. A. Lucas, M. Iliadis, R. Molina, and A. K. Katsaggelos, “Using deep neural networks for inverse problems in imaging: beyond analytical methods,” IEEE Signal Process. Mag. 35(1), 20–36 (2018). [CrossRef]  

16. K. Kulkarni, S. Lohit, P. Turaga, R. Kerviche, and A. Ashok, “ReconNet: Non-iterative reconstruction of images from compressively sensed measurements,” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 449–458 (2016).

17. U. S. Kamilov, I. N. Papadopoulos, M. H. Shoreh, A. Goy, C. Vonesch, M. Unser, and D. Psaltis, “Learning approach to optical tomography,” Optica 2(6), 517–522 (2015). [CrossRef]  

18. A. Sinha, J. Lee, S. Li, and G. Barbastathis, “Lensless computational imaging through deep learning,” Optica 4(9), 1117–1125 (2017). [CrossRef]  

19. Y. Wu, Y. Rivenson, Y. Zhang, Z. Wei, H. Günaydin, X. Lin, and A. Ozcan, “Extended depth-of-field in holographic imaging using deep-learning-based autofocusing and phase recovery,” Optica 5(6), 704–710 (2018). [CrossRef]  

20. Y. Rivenson, Z. Göröcs, H. Günaydin, Y. Zhang, H. Wang, and A. Ozcan, “Deep learning microscopy,” Optica 4(11), 1437–1443 (2017). [CrossRef]  

21. T. Nguyen, Y. Xue, Y. Li, L. Tian, and G. Nehmetallah, “Deep learning approach for Fourier ptychography microscopy,” Opt. Express 26(20), 26470–26484 (2018). [CrossRef]  

22. G. Satat, M. Tancik, O. Gupta, B. Heshmat, and R. Raskar, “Object classification through scattering media with deep learning on time resolved measurement,” Opt. Express 25(15), 17466–17479 (2017). [CrossRef]  

23. Z. Shi, C. Chen, Z. Xiong, D. Liu, and F. Wu, “HSCNN+: Advanced CNN-based hyperspectral recovery from RGB images,” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 939–947 (2018).

24. C. Kim, D. Park, and H. Lee, “Convolutional neural networks for the reconstruction of spectra in compressive sensing spectrometers,” Proc. SPIE 10937, 109370L (2019). [CrossRef]  

25. L. Wang, T. Zhang, Y. Fu, and H. Huang, “HyperReconNet: Joint Coded Aperture Optimization and Image Reconstruction for Compressive Hyperspectral Imaging,” IEEE Trans. Image Process. 28(5), 2257–2270 (2019). [CrossRef]  

26. J. Kim, J. Kwon Lee, and K. Mu Lee, “Accurate image super-resolution using very deep convolutional networks,” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1646–1654 (2016).

27. Y. Heiser, Y. Oiknine, and A. Stern, “Compressive hyperspectral image reconstruction with deep neural networks,” Proc. SPIE 10989, 109890M (2019). [CrossRef]  

28. A. Chakrabarti and T. Zickler, “Statistics of real-world hyperspectral images,” In CVPR 2011, pp. 193–200 (IEEE, 2011).

29. D. Gedalin, Y. Oiknine, I. August, D. G. Blumberg, S. R. Rotman, and A. Stern, “Performance of target detection algorithm in compressive sensing miniature ultraspectral imaging compressed sensing system,” Opt. Eng. 56(4), 041312 (2017). [CrossRef]  

30. Y. Oiknine, D. Gedalin, I. August, D. G. Blumberg, S. R. Rotman, and A. Stern, “Target detection with compressive sensing hyperspectral images,” Proc. SPIE 10427, 104270O (2017). [CrossRef]  

31. D. Gedalin, “DeepCubeNetPublic,” https://github.com/dngedalin/DeepCubeNetPublic.

32. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” In International Conference on Medical image computing and computer-assisted intervention, pp. 234–241 (Springer, 2015).

33. B. Arad and O. Ben-Shahar, “Sparse recovery of hyperspectral signal from natural RGB images,” In European Conference on Computer Vision, pp. 19–34 (Springer, 2016).

34. A. Yariv and P. Yeh, Optical Waves in Crystals (Wiley, 1984).

35. Y. Rivenson and A. Stern, “Compressed imaging with a separable sensing operator,” IEEE Signal Process. Lett. 16(6), 449–452 (2009). [CrossRef]  

36. K. H. Jin, M. T. McCann, E. Froustey, and M. Unser, “Deep convolutional neural network for inverse problems in imaging,” IEEE Trans. Image Process. 26(9), 4509–4522 (2017). [CrossRef]  

37. S. Mei, X. Yuan, J. Ji, Y. Zhang, S. Wan, and Q. Du, “Hyperspectral image spatial super-resolution via 3D full convolutional neural network,” Remote Sens. 9(11), 1139 (2017). [CrossRef]  

38. J. M. Bioucas-Dias and M. A. Figueiredo, “A new TwIST: Two-step iterative shrinkage/thresholding algorithms for image restoration,” IEEE Trans. Image Process. 16(12), 2992–3004 (2007). [CrossRef]  
