Input aperture restriction of the spatial spectral compressive spectral imager and a comprehensive solution for it

Open Access

Abstract

Compressive spectral imaging (CSI) is an attractive spectral imaging technique because it can acquire a spectral image data cube in a single snapshot. One notable CSI scheme is the spatial spectral compressive spectral imager (SSCSI), which combines low complexity with high-quality recovered spectral images. However, the SSCSI is restricted to a small input aperture, which reduces the optical efficiency and signal-to-noise ratio of the system. In this paper, the effect of the input aperture size on the SSCSI system is analyzed. The analysis shows that as the input aperture increases, the incident light from different spectral bands overlaps on the mask and the encoding pattern of each spectral band becomes ambiguous, so the reconstruction quality of the data cube deteriorates severely. A new scheme is proposed to deal with this problem. First, the observed image is resampled and recombined into new sub-observed images to improve the frequency response of the encoding pattern. Then each sub-observed image is divided into multiple subsets to reduce the coherence of the sensing matrix. Compared to the original reconstruction algorithm for the SSCSI system, the peak signal-to-noise ratio (PSNR) is improved by more than 3 dB, and the spectral reconstruction accuracy and noise suppression capability are also improved.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Spectral image data form a three-dimensional data cube that contains both spatial and spectral information and therefore has a large volume. Spectral imaging instruments with a two-dimensional focal plane detector usually need to scan over time to obtain the three-dimensional data cube, so they cannot cope with changing scenes. Compressive spectral imaging (CSI) allows the signal to be sampled below the Nyquist rate [1], which greatly reduces the amount of sampled data and makes it possible to obtain a three-dimensional data cube in a single snapshot.

CSI must satisfy two basic criteria: sparsity and incoherent measurement (the isometry property) [2]. Spectral images are correlated in both the spatial and spectral dimensions, so they can be represented sparsely in a suitable transform domain. To achieve incoherent measurement, various CSI systems with different structures have been proposed [3–10]. The best known are the coded aperture snapshot spectral imager (CASSI) and its modifications [3–5]. In the CASSI system, a two-dimensional coded mask encodes the spatial dimension, and spectral modulation is realized by one or two dispersive elements. The single disperser CASSI (SD-CASSI) has a simple structure, but its observations are aliased, which affects the quality of the reconstructed data [3,4]. Gehm et al. proposed a dual disperser CASSI (DD-CASSI) system [5] by changing the position of the coded mask and adding an extra disperser. Lin et al. proposed the dual-coded hyperspectral imaging (DCSI) system [6], which sets two coded masks in the optical path to encode the spatial and spectral dimensions respectively, increasing the randomness of the sensing matrix and the accuracy of the spectral reconstruction. Correa et al. proposed a snapshot colored compressive spectral imager (SCCSI) system [7], in which the mask is replaced by a set of tiny filters attached to the sensor so that only a portion of the bands is encoded. However, the high complexity of the DD-CASSI and DCSI systems reduces their optical efficiency and makes assembly and calibration difficult [11]. The filters in the SCCSI system are difficult and costly to produce, which limits its wide application in practice [12]. Subsequently, Liu et al. proposed a spatial spectral compressive spectral imager (SSCSI) system [8] based on agile spectral imaging technology [13]. SSCSI uses only one coded mask and one disperser to modulate spatial and spectral information simultaneously, which avoids image aliasing at low system complexity.

Recently, some scholars have proposed modulating the spectral information directly, as in the compressive spectral imaging with diffractive lenses (CSID) system [9] and the miniature ultra-spectral imaging (MUSI) system [10]. CSID modulates spectral information by changing the focal length of the diffractive lens, and MUSI uses the birefringence effect of liquid crystal elements to encode spectral information. The optical structures of these two systems are relatively simple. Nevertheless, CSID requires multiple measurements to achieve good results, and MUSI retains redundant spatial information, which reduces the compression rate.

Among the above systems, SSCSI is particularly attractive. It is similar to the DD-CASSI system but is implemented with lower complexity, and it can recover higher-quality spectral data [14]. Based on the effect of the “spectral plane” [13], it uses only one coded mask and one disperser to encode the spectral and spatial dimensions simultaneously. However, according to the conclusion in [13], the input aperture of the SSCSI system must be small enough to ensure a high sampling efficiency. A small input aperture admits little incident light, which degrades the sensitivity and signal-to-noise ratio of the system.

In this paper, we present a method to solve this problem. We divide the observed image into multiple sub-regions according to the projection size and then extract the corresponding pixels to form multiple sub-observed images. The sensing matrix corresponding to each sub-image is thereby improved in terms of frequency response and randomness. Because the sub-images have a low spatial resolution after partition, the method is combined with a “super-resolution” reconstruction algorithm to avoid a loss of spatial resolution, so that high-quality spectral image data can finally be reconstructed.

The rest of this paper is organized as follows. In Section 2, the imaging principle and the input aperture restriction of the SSCSI system are briefly introduced. In Subsection 3.1, the influence of the input aperture on the spatial frequency response and the coherence of the sensing matrix of SSCSI is analyzed. In Subsection 3.2, the proposed optimization method is presented, and its effect on the frequency response and coherence of the sensing matrix is verified through simulation. In Section 4, simulation experiments show that the proposed method can improve the PSNR and spectral accuracy of the SSCSI system with a large input aperture.

2. Brief description of SSCSI system

In the SSCSI system, the incident light is first imaged onto the grating plane by the objective lens and, after dispersion, imaged onto the sensor by the back-end lens, as shown in Fig. 1. A spectral plane is formed between the back-end lens and the sensor, i.e., light of the same wavelength from the scene converges on a straight line perpendicular to the direction of dispersion. The coded mask is located between the sensor and the spectral plane, and its position is described by $s = {d_m}/{d_a}$, the ratio of the mask-to-sensor distance ${d_m}$ to the distance ${d_a}$ between the spectral plane and the sensor. The transfer model of the SSCSI system can be written as

$$g(x,y) = \int\limits_\Lambda {T(x(1 - s) + sa\lambda ,y)f(x,y,\lambda )} d \lambda ,$$
where T is the coded mask, f is the spectral data of the scene, x, y and λ represent the spatial and spectral coordinates respectively, $\Lambda $ is the spectral range and g is the observed image. In the SSCSI system, the coding of the spectral and spatial dimensions is controlled by changing s. When $s = 0$, the mask coincides with the sensor position. In this case there is no dispersion on the mask and all spectral bands are coded by the same pattern, so the spectral dimension cannot be modulated and all spectral information is lost. When $s = 1$, the coded mask coincides with the spectral plane and each wavelength appears in a single “column” of the mask, which gives the highest spectral resolution in the reconstruction. However, a large amount of spatial information is then lost during observation, and the recovered data is most ambiguous in the spatial dimension. Therefore, s should generally lie between 0 and 1.
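As a rough illustration of Eq. (1), the following sketch discretizes the integral into a sum over L bands and shifts the mask by a whole number of columns per band. It assumes a pinhole aperture and ignores the $(1 - s)$ magnification of the mask coordinate; the function name `sscsi_forward` and the parameter `shift_per_band` are hypothetical and stand in for the product $sa\lambda$ expressed in mask columns.

```python
import numpy as np

def sscsi_forward(cube, mask, shift_per_band=1):
    """Discrete sketch of Eq. (1): g(x, y) = sum_k T(x + k*shift, y) * f(x, y, k).

    cube: (N, N, L) spectral data cube with axes (y, x, band).
    mask: (N, N + (L - 1) * shift_per_band) coded mask with axes (y, x).
    shift_per_band: assumed integer column offset between adjacent bands.
    """
    n_rows, n_cols, n_bands = cube.shape
    g = np.zeros((n_rows, n_cols))
    for k in range(n_bands):
        start = k * shift_per_band
        # Band k is coded by the mask region shifted by k * shift columns.
        g += mask[:, start:start + n_cols] * cube[:, :, k]
    return g

rng = np.random.default_rng(0)
N, L = 8, 4
cube = rng.random((N, N, L))
mask = rng.integers(0, 2, size=(N, N + L - 1)).astype(float)

g = sscsi_forward(cube, mask)          # adjacent bands see different codes
g_s0 = sscsi_forward(cube, mask, 0)    # zero shift mimics s = 0: one code for all bands
```

With zero shift the measurement reduces to the mask multiplied by the band-summed image, which is the $s = 0$ case in which the spectral dimension is not modulated.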

Fig. 1. The optical path of SSCSI.

According to the conclusion in [13], to make the spectrum more distinguishable, the range of each spectrum on the spectral plane should be as narrow as possible. The imaging system should be similar to a pinhole imaging device, but considering the actual imaging quality, the input aperture is generally set around F#11. As the input aperture becomes larger, the range of a single spectrum on the spectral plane becomes larger, as well as the overlap of adjacent spectra, which will affect the imaging quality of the SSCSI system in both spatial and spectral dimensions. This will be analyzed in detail below.

3. Our optimization method

3.1 Influence of input aperture on SSCSI imaging

In an ideal SSCSI system, the input aperture should be as small as possible to ensure that the incident light of the same wavelength converges at the same point on the spectral plane. In [14], the influence of the code element and pixel sizes on spatial resolution is analyzed, as well as the influence of the position of the coded mask on the spectral resolution. However, that analysis assumes that the front imaging system is an ideal pinhole imaging system. As the input aperture increases, the projected area of the incident light on the coded mask increases, which affects the modulation of both the spatial and spectral dimensions and, ultimately, the reconstructed spectral data.

According to compressive sensing theory, the sensing matrix of the SSCSI system directly affects the quality of the reconstructed data. In general, the quality of the sensing matrix is measured by the restricted isometry property (RIP), as shown in Eq. (2):

$$(1 - {\delta _s}){\bigg \Vert}\theta {\bigg \Vert}_2^2 \le {\bigg \Vert}{A\theta } {\bigg \Vert}_2^2 \le (1 + {\delta _s}){\bigg \Vert}\theta {\bigg \Vert}_2^2 ,$$
where $\theta $ is the sparse coefficient vector, A is the sensing matrix, and a smaller ${\delta _s}$ corresponds to higher reconstruction quality. To simplify calculation and analysis, the coherence of the sensing matrix is usually analyzed instead of running the complex and time-consuming reconstruction process. The lower the coherence of the sensing matrix, the higher the reconstruction quality. The observation process of SSCSI can be written as
$$G = A\theta + \eta = H\phi \theta + \eta ,$$
where G is the observation data, H and $\phi $ are the coded dispersion accumulation process and the sparse dictionary respectively, and $\eta $ is the observation noise. The coherence of the sensing matrix A, denoted $\mu (A )$, is the maximum absolute correlation between any two distinct normalized columns of A:
$$\mu (A )= \mathop {\max }\limits_{m \ne n} \left|{\left\langle {\frac{{A(:,m)}}{{{\big \Vert}{A(:,m)} {\big \Vert}}},\frac{{A(:,n)}}{{{\big \Vert}{A(:,n)} {\big \Vert}}}} \right\rangle } \right| ,$$
where $A(:,m)$ denotes the m-th column of A. Because the dictionary $\phi $ in the sensing matrix A is fixed, the coherence of A is mainly influenced by H. To keep the coherence of H low, adjacent code elements on the mask should be as different as possible, which can be achieved by optimizing the pattern of the mask to reduce the element “clustering” effect [15].
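The coherence in Eq. (4) can be evaluated directly from the Gram matrix of the normalized columns. The sketch below is a minimal NumPy version; the function name `mutual_coherence` is an assumption, and all-zero columns are simply dropped because they cannot be normalized.

```python
import numpy as np

def mutual_coherence(A):
    """Largest absolute inner product between distinct normalized columns of A."""
    norms = np.linalg.norm(A, axis=0)
    keep = norms > 0                      # zero columns cannot be normalized
    A_unit = A[:, keep] / norms[keep]
    gram = np.abs(A_unit.T @ A_unit)      # pairwise column correlations
    np.fill_diagonal(gram, 0.0)           # exclude the m == n terms
    return gram.max()

# Orthogonal columns give 0; columns at 45 degrees give 1/sqrt(2).
mu_identity = mutual_coherence(np.eye(3))
mu_example = mutual_coherence(np.array([[1.0, 1.0], [0.0, 1.0]]))
```

For large sensing matrices the full Gram matrix may not fit in memory, in which case the maximum would have to be accumulated over column blocks.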

In the SSCSI system, to keep the sensing matrix non-singular, the incident light of adjacent spectral bands needs to shift by at least one column on the mask, so that different spectral bands obtain different encoding patterns. When the input aperture is a pinhole and an appropriate disperser is selected, the incident light of different spectral bands passes through different coding regions and finally converges to the same pixel on the sensor. As the input aperture increases, however, the incident beam becomes wider and its projected area on the mask becomes larger. The incident light from different spectral bands, especially adjacent ones, then overlaps on the mask. The coding difference between adjacent spectral bands is therefore reduced, that is, the coherence of the sensing matrix increases. It has been pointed out in [14] that the higher the coherence of the sensing matrix, the worse the quality of the reconstructed spectral data.

For example, as shown in Fig. 2, the red, green, and blue lines represent incident light of different spectral bands from the same point. As the input aperture increases, the projected area on the mask expands to $q \times q$ elements ($q = 2$ as an example), and the encoding pattern of each band is a weighted combination of all elements in the projected area. The overlap between the projections of different spectral bands leads to strong coherence between their final encoding patterns, which increases the coherence of the sensing matrix A. One way to address this is to improve the dispersion capability of the grating so that the spectrum is more dispersed and the “overlapping area” of adjacent spectral bands on the mask is reduced. But that would add cost to the system, and the wider spectral plane would make some of the spectrum uncollectable [14], reducing the number of observable spectral bands. Therefore, reducing the overlapping area of adjacent spectral bands on the mask with the existing SSCSI hardware is an important issue for improving the observation efficiency of the system.

Fig. 2. Adjacent spectral bands share elements on the mask (suppose the projection region of a single spectrum is 2 × 2 elements on the mask) and the change of H matrix.

Overlap occurs not only between adjacent spectral bands but also between adjacent pixels in the same spectral band. In Fig. 3, red and blue represent the incident light of two adjacent pixels in the same spectral band. When the distance between the mask and the sensor is e, the incident light of the two adjacent pixels generates an overlapping area of size Q on the mask, which can be defined as

$$Q = \frac{e}{f}\Delta c + \frac{{e \times AP}}{{\gamma + f}} ,$$
where AP is the input aperture size, $\Delta c$ is the pixel size, and $\gamma $ is a compensation term introduced in the calculation, which is a fixed quantity determined by the optical system. When the input aperture AP or the distance e increases, the overlapping region Q increases and the difference between the final encoding patterns of adjacent pixels decreases. The frequency response of the final encoding pattern of each spectral band is therefore lowered, which affects the spatial sampling.
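To make the dependence in Eq. (5) concrete, the short sketch below evaluates Q for a few aperture sizes. All numerical values are hypothetical and chosen only for illustration (lengths in millimetres); f is taken to be the focal length appearing in Eq. (5), and the function name `overlap_size` is an assumption.

```python
def overlap_size(e, f, delta_c, aperture, gamma):
    """Eq. (5): Q = (e / f) * delta_c + e * aperture / (gamma + f)."""
    return (e / f) * delta_c + e * aperture / (gamma + f)

# Hypothetical values in millimetres, not taken from the paper.
e, f, delta_c, gamma = 0.5, 50.0, 0.005, 0.0
q_small = overlap_size(e, f, delta_c, aperture=1.0, gamma=gamma)
q_large = overlap_size(e, f, delta_c, aperture=4.0, gamma=gamma)
print(q_small, q_large)  # Q grows linearly with the aperture term
```

Under these assumed numbers the aperture term dominates the pixel-size term, so Q scales almost in proportion to AP.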

Fig. 3. Overlapping of incident light from adjacent pixels on the coded mask.

Edgar et al. [15] conducted a frequency domain analysis to optimize the pattern of the mask. The frequency response of the mask has a direct effect on reconstruction quality in the spatial dimension. A mask with a high-frequency response can observe high-frequency information, so the recovered spectral data is rich in detail. They used the power spectral density for quantitative analysis, which can be given by

$${P_f} = \frac{1}{R}\sum\limits_{i = 1}^R {\frac{{DFT({{\varphi_i}} )}}{{\varepsilon ({{\varphi_i}} )}}} ,$$
where ${P_f}$ is the frequency statistic of the mask, R is the number of areas into which the mask is divided for separate calculation, and ${\varphi _i}$ is the segment of the mask in the i-th area. $DFT({{\varphi_i}} )$ is the Discrete Fourier Transform (DFT) of ${\varphi _i}$, and $\varepsilon ({{\varphi_i}} )$ normalizes by the energy of the mask segment. When code elements are uniformly distributed on the mask, the frequency response range is relatively broad. Conversely, when similar code elements are clustered together on the mask, the frequency response range is reduced, and fine spatial detail cannot be sampled.
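A block-averaged spectrum in the spirit of Eq. (6) can be sketched as follows. Several choices here are assumptions rather than details given in the text: the squared DFT magnitude is used, each block has its mean removed, $\varepsilon$ is taken as the sum of squares of the block, and the fraction of energy above a cutoff frequency is used as a single summary number. The $q \times q$ averaging in `box_average` stands in for the aperture-induced overlap on the mask.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def box_average(mask, q):
    """Equivalent pattern when each pixel averages a q x q window of code elements."""
    return sliding_window_view(mask, (q, q)).mean(axis=(2, 3))

def block_psd(pattern, block=16):
    """Average energy-normalized |DFT|^2 over non-overlapping block x block areas."""
    acc = np.zeros((block, block))
    count = 0
    for r in range(0, pattern.shape[0] - block + 1, block):
        for c in range(0, pattern.shape[1] - block + 1, block):
            phi = pattern[r:r + block, c:c + block]
            phi = phi - phi.mean()            # drop the DC term
            energy = np.sum(phi ** 2)
            if energy == 0:
                continue                      # skip constant blocks
            acc += np.abs(np.fft.fft2(phi)) ** 2 / energy
            count += 1
    return acc / max(count, 1)

def high_freq_fraction(psd, cutoff=0.25):
    """Share of spectral energy above `cutoff` cycles per sample (radial)."""
    freqs = np.fft.fftfreq(psd.shape[0])
    fy, fx = np.meshgrid(freqs, freqs, indexing="ij")
    return psd[np.hypot(fx, fy) > cutoff].sum() / psd.sum()

rng = np.random.default_rng(0)
mask = rng.integers(0, 2, size=(64, 64)).astype(float)
hf_q1 = high_freq_fraction(block_psd(mask))                  # no overlap
hf_q4 = high_freq_fraction(block_psd(box_average(mask, 4)))  # 4 x 4 projection
```

The averaged pattern retains a smaller share of high-frequency energy than the raw random mask, which is the loss of frequency response attributed to clustering and overlap.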

In the SSCSI system, the final encoding pattern of each pixel changes with the location of the mask. When the mask is located in the focal plane of the system ($s = 0$), the corresponding spatial frequency response is the highest. Therefore, the spatial information is the most abundant, but at this time, the corresponding encoding mode of each spectral band is the same, so the coherence of the sensing matrix is the largest, which means that the spectral information cannot be recovered. As the mask moves away from the focal plane ($0 < s < 1$), the corresponding encoding pattern of each spectral band starts to be different, so the sensing matrix gradually changes into a non-singular matrix to recover the spectral information, but the incident light of adjacent pixels will form more “overlapping areas”, leading to a further decline in frequency response. In the final recovered data, the high-frequency spatial information is lost, resulting in a fuzzy image. As shown in Fig. 4, when the overlapping area of adjacent pixels increases, the corresponding encoding pattern changes.

Fig. 4. Changes in encoding pattern correspond to the projection size of incident light on the mask (the encoding pattern corresponds to a single spectral band).

When the projection of the incident light covers $1 \times 1$ element on the mask, each pixel corresponds to one code element, and the code elements of adjacent pixels do not overlap. In this case, the final encoding pattern has a high frequency. When the projection covers $2 \times 2$ elements, each pixel corresponds to a $2 \times 2$ encoding region and adjacent pixels share a $2 \times 1$ region on the mask, so the frequency of the final encoding pattern decreases. At $10 \times 10$, adjacent pixels share a $10 \times 9$ encoding region on the mask, and the frequency of the encoding pattern is already very low. Therefore, improving the frequency response of the encoding pattern while preserving the spectral resolution of the system is an important problem for recovering high-quality spatial information in the SSCSI system.

3.2 Optimization based on frequency and coherence of the sensing matrix

As mentioned above, the input aperture of the SSCSI system must be large enough to ensure sufficient incident light, but a larger aperture also increases the coherence of the sensing matrix and reduces the frequency response of the encoding pattern of each spectral band. Therefore, reducing the coherence of the sensing matrix and improving the frequency response of the encoding pattern are important for improving the observation efficiency of the SSCSI system.

First, the frequency response of the encoding pattern is considered. In the SSCSI system, the incident light cannot converge on the mask, so the coding region corresponding to adjacent pixels will overlap, and this overlapping area will lead to the reduction of the final coding difference, as shown in Fig. 5. Therefore, with the continuous increase of s in the SSCSI system, the spatial resolution will gradually deteriorate. The solution to this issue is to reduce the overlapping area of adjacent pixels on the mask.

Fig. 5. The overlapping area corresponds to adjacent pixels.

To meet the RIP criterion in compressive sensing [2], the elements of the mask are generally random, and there is little correlation between adjacent code elements. If the incident light of adjacent pixels does not overlap on the mask, the final encoding pattern is greatly different, and the frequency response can also be improved. However, due to the limitation of optical imaging, the incident light corresponding to adjacent image points on the non-focal plane inevitably overlaps, which is called the “defocusing” phenomenon.

In the SSCSI system, the frequency response of the encoding pattern decreases mainly because the incident light of adjacent pixels overlaps on the mask. In this paper, we define the weighted result of the multiple code elements corresponding to a single pixel as an “equivalent element”. Because the code elements on the mask are random, the equivalent elements corresponding to non-overlapping areas of the coded mask are quite different. Therefore, the points in the observed image whose projected areas on the mask do not overlap can be extracted and recombined into a new observed image to improve the frequency response of the encoding pattern. Assume that the incident light of each image point is projected onto a $q \times q$ region of the mask. Adjacent image points then share an overlapping region of size $q \times (q - 1)$ in the horizontal direction and $(q - 1) \times q$ in the vertical direction on the mask, so the incident light of points separated by $(q - 1)$ pixels in the observed image does not overlap on the mask. In Fig. 6, $q = 2$ indicates that the incident light of each pixel is projected onto a $2 \times 2$ area of the mask, and ${P_{i - 1}}$, ${P_i}$, ${P_{i + 1}}$ represent three adjacent pixels on the sensor. The incident light of two adjacent pixels, such as ${P_{i - 1}}$ and ${P_i}$ or ${P_i}$ and ${P_{i + 1}}$, generates an overlapping area (vertical direction) of $1 \times 2$ elements on the mask, as shown in Fig. 6(b). ${P_{i - 1}}$ and ${P_{i + 1}}$ are separated by one pixel, and their incident light has no overlapping area on the mask, as can be seen in Fig. 6(a). Therefore, interval sampling of the observed image can be used to recombine the pixels whose incident light falls on non-overlapping regions of the mask into a new observed image. As the difference between the equivalent elements of the pixels increases, the frequency response of the corresponding encoding pattern increases accordingly.
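The interval sampling described above amounts to a polyphase split of the observed image. The sketch below shows one possible NumPy formulation, assuming the image size is divisible by q; the function names `interval_subimages` and `merge_subimages` are hypothetical.

```python
import numpy as np

def interval_subimages(g, q):
    """Split g into q*q sub-images by taking every q-th pixel in each direction."""
    n_rows, n_cols = g.shape
    if n_rows % q or n_cols % q:
        raise ValueError("image size must be divisible by q")
    # Sub-image (i, j) collects the pixels g[i + a*q, j + b*q].
    return [g[i::q, j::q] for i in range(q) for j in range(q)]

def merge_subimages(subs, q):
    """Inverse of interval_subimages: interleave the sub-images back into one image."""
    h, w = subs[0].shape
    g = np.empty((h * q, w * q), dtype=subs[0].dtype)
    for k, sub in enumerate(subs):
        i, j = divmod(k, q)
        g[i::q, j::q] = sub
    return g

g = np.arange(36.0).reshape(6, 6)
subs = interval_subimages(g, 2)      # four 3 x 3 sub-images
restored = merge_subimages(subs, 2)
```

Within each sub-image, neighbouring samples come from pixels that are q apart on the sensor, so their $q \times q$ projections on the mask do not share code elements.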

Fig. 6. Interval sampling of observation image.

When the $N \times N$ observed image is sampled at an interval of $(q - 1)$ pixels, $q \times q$ sub-images with a resolution of $\frac{N}{q} \times \frac{N}{q}$ will be generated, and the observation process of the whole system will be converted to

$${g_i} = {d_i} \times H\phi \theta + \eta ,$$
where ${d_i} \in {{\mathbb R}^{\frac{{{N^2}}}{{{q^2}}} \times {N^2}}}(i = 1:{q^2})$ represents the interval sampling, ${g_i} \in {{\mathbb R}^{\frac{N}{q} \times \frac{N}{q}}}(i = 1:{q^2})$ are the sub-images, $H \in {{\mathbb R}^{{N^2} \times {N^2}L}}$ is the coded dispersion matrix, $f = \phi \theta \in {{\mathbb R}^{N \times N \times L}}$ is the spectral data cube, and L is the number of spectral bands. The frequency response of the encoding pattern corresponding to the low-resolution ${g_i}$ is higher than that corresponding to the original data. However, if the sub-images are reconstructed directly, only data with low spatial resolution can be recovered. Arguello et al. [16] proposed a variable-spatial-resolution method to improve the reconstruction speed of compressive spectral imaging: assuming that each $q \times q$ region in the observed image shares the same spectral information, the observed images can be downsampled by a factor of q to speed up the reconstruction. Arce et al. [17] proposed that, in the SSCSI system, if the code element is smaller than the pixel size, spatial resolution can be improved through super-resolution reconstruction. Therefore, super-resolution reconstruction can be performed on the sub-images to obtain spectral data with a high spatial resolution.
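The operator ${d_i}$ in Eq. (7) can be written as a row-selection matrix acting on the vectorized observation. The sketch below builds it densely for a small example, assuming row-major vectorization of the $N \times N$ image; the function name `selection_matrix` and the use of an offset pair (i, j) in place of the single index i are illustrative choices. A sparse representation would be preferable at realistic sizes.

```python
import numpy as np

def selection_matrix(N, q, i, j):
    """d for offset (i, j): picks pixels (i + a*q, j + b*q) from a row-major N x N image."""
    rows = np.arange(i, N, q)
    cols = np.arange(j, N, q)
    flat_index = (rows[:, None] * N + cols[None, :]).ravel()
    d = np.zeros((flat_index.size, N * N))
    d[np.arange(flat_index.size), flat_index] = 1.0   # one 1 per row
    return d

N, q = 6, 2
g = np.arange(float(N * N)).reshape(N, N)
d = selection_matrix(N, q, 1, 0)
g_sub = (d @ g.ravel()).reshape(N // q, N // q)   # equals g[1::2, 0::2]
```

The matrix has $N^2/q^2$ rows and $N^2$ columns, matching the dimensions stated for ${d_i}$.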

Because the incident light of a pixel is projected onto a $q \times q$ region of the mask, the incident light can be regarded as being encoded by $q \times q$ code elements and then converging to one pixel, which is a down-sampling process. In theory, if the pattern of the $q \times q$ code elements corresponding to each pixel is known, the spatial resolution can be improved by a factor of q through optimization-based reconstruction. However, in an actual SSCSI system, the final encoding pattern of each spectral band differs from the pattern on the coded mask, because it is a weighted average of $q \times q$ code elements. Moreover, q is unknown and needs to be calibrated. Combined with Eq. (5), q can be calculated as follows:

$$q = \frac{Q}{{\Delta d}} + 1 = \frac{1}{{\Delta d}} \times \left( {\frac{e}{f}\Delta c + \frac{{e \times AP}}{{\gamma + f}}} \right) + 1 ,$$
where $\Delta d$ represents the size of a code element on the mask. It can be seen from Eq. (8) that q increases as the input aperture AP increases. In an actual SSCSI system, the coded mask must be calibrated to obtain the encoding pattern corresponding to each spectral band, so q can be obtained during the same calibration. The specific method is as follows. First, uniform white light illuminates a whiteboard target, and the mask is adjusted to the sensor plane to obtain the pattern T. Because adjacent bands are offset by one element on the mask, the region ${T_{i = 1:L}}$ of the mask corresponding to each band can be obtained. Next, the mask is moved to a pre-computed location determined by the desired spectral resolution, which can be calculated according to Eq. (20) in [14], and the equivalent encoding pattern ${T^{\prime}}$ is recorded. Lastly, q is obtained by solving
$$\mathop {\min }\limits_q {\left\Vert{{T^{\prime}} - \frac{1}{L}\sum\limits_{i = 1}^L {conv2({T_i},K)} } \right\Vert_2} ,$$
where $conv2({T_i},K)$ denotes the two-dimensional convolution of ${T_i}$ with K, and K is an all-ones convolution kernel of size $q \times q$ that simulates light rays converging to an image point after passing through several code elements. After the $q \times q$ code elements corresponding to each pixel are obtained, the reconstruction process becomes
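The search in Eq. (9) can be sketched as a brute-force scan over candidate values of q. In the toy example below the “measured” pattern ${T^{\prime}}$ is synthesized from the same model with a known q, so it only demonstrates the mechanics; a real calibration would need ${T^{\prime}}$ on the same intensity scale as the unnormalized all-ones kernel. The names `box_same` and `estimate_q` are hypothetical, and zero padding at the borders is an assumption.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def box_same(T, q):
    """Same-size filtering of T with a q x q all-ones kernel (zero padded)."""
    before, after = q // 2, q - 1 - q // 2
    padded = np.pad(T, ((before, after), (before, after)))
    return sliding_window_view(padded, (q, q)).sum(axis=(2, 3))

def estimate_q(T_prime, T_bands, q_candidates):
    """Return the q minimizing || T' - (1/L) * sum_i conv2(T_i, K_q) ||_2."""
    errors = {}
    for q in q_candidates:
        model = np.mean([box_same(Ti, q) for Ti in T_bands], axis=0)
        errors[q] = np.linalg.norm(T_prime - model)
    return min(errors, key=errors.get), errors

rng = np.random.default_rng(1)
N, L, q_true = 32, 8, 3
T = rng.integers(0, 2, size=(N, N + L - 1)).astype(float)
T_bands = [T[:, i:i + N] for i in range(L)]   # one-column offset per band
T_prime = np.mean([box_same(Ti, q_true) for Ti in T_bands], axis=0)

q_hat, errors = estimate_q(T_prime, T_bands, range(1, 7))
```

Because the all-ones kernel is not normalized, the model intensity grows with $q^2$, which makes the minimum easy to identify in this synthetic setting.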
$$\mathop {\arg \min }\limits_\theta ({{{{\big \Vert}{{g_i} - {d_i}H_i^{\prime}\phi \theta } {\big \Vert}}_2} + \lambda {{{\big \Vert}\theta {\big \Vert}}_1}} ) ,$$
where ${d_i}H_i^{\prime} \in {{\mathbb R}^{\frac{{{N^2}}}{{{q^2}}} \times {N^2}L}}$ is the coded dispersion matrix of the sub-image ${g_i}$. The encoding pattern corresponding to ${d_i}H_i^{\prime}$ differs considerably from the one corresponding to H in Eq. (7).

The power spectral density of the encoding pattern described in Eq. (6) was used to compare ${d_i}H_i^{\prime}$ with H and verify the effectiveness of the optimization. In the experiment, the coded mask was generated randomly 3000 times, and the corresponding observation matrices were generated for different values of q. The mean results over the 3000 trials are shown in Table 1 below:

Table 1. Power Spectral Density of Encoding Pattern Corresponding to Different ${q}$ Values

As q increases, the power spectral density of the encoding pattern corresponding to H in the SSCSI system gradually decreases. In contrast, the power spectral density of the encoding pattern corresponding to ${d_i}H_i^{\prime}$ remains unchanged after the interval sampling proposed in this paper, so higher-frequency information can still be sampled. For an intuitive comparison, the encoding patterns corresponding to the original matrix H and the matrix ${d_i}H_i^{\prime}$ are shown in Fig. 7. As q increases, the final encoding pattern of the SSCSI system gradually becomes blurred, whereas the encoding pattern optimized by the proposed method changes slowly. Therefore, high-frequency information can still be observed effectively, which ensures a reconstruction with higher spatial quality.

Fig. 7. Different q values correspond to the original encoding pattern and the optimized encoding pattern. (The original image has a large size (256×256), and the size of 20×20 in the central area was selected to display).

In the spectral dimension, the coherence of the sensing matrix was analyzed in Subsection 3.1. Following the conclusion in [13], the optical path of the SSCSI system with a non-pinhole aperture is shown in Fig. 8.

Fig. 8. The optical path of SSCSI system with a non-pinhole aperture.

Red, green, and blue represent three different spectral bands corresponding to the image point ${X_s}$. The different spectral bands overlap on the mask, which reduces the coding difference between them and destroys the non-singularity of the final sensing matrix. Therefore, if the overlap region can be reduced, the coherence of the sensing matrix can be reduced. The width ${R_\theta }$ of each spectrum on the “spectral plane” is defined as

$${R_\theta } = \theta \frac{{{F_1}{F_2}}}{{p + {F_1} - {F_2}}} ,$$
where ${F_1}$ and ${F_2}$ represent the focal lengths of lens1 and lens2 respectively. The overlapping regions can be reduced by decreasing ${R_\theta }$. Because ${R_\theta }$ is proportional to $\theta $, reducing $\theta $ significantly reduces ${R_\theta }$. In general, $\theta $ is defined as $\theta = \frac{{{a_1}}}{{{F_1}}}$, where ${a_1}$ is the input aperture of lens1. The direct approach is to reduce ${a_1}$, but this reduces the amount of incident light, and a small aperture also reduces the optical resolution of the imaging system.

As in the spatial optimization, an increase in the coherence of the sensing matrix can be avoided by reducing the “shared” part of the coding region. To keep the sensing matrix non-singular, the incident light of different spectral bands needs to shift by at least one column on the mask for effective observation. When the projected area of incident light on the mask is $q \times q$, and the nonlinearity of the dispersion is neglected, the projected area of each spectral band on the mask is also $q \times q$. The incident light of adjacent spectral bands therefore generates a $q \times (q - 1)$ overlapping region on the mask, and spectral bands separated by $(q - 1)$ code elements do not overlap with each other. If these spectral bands are extracted and recombined, the coherence of the corresponding sensing matrix is reduced. This process is shown in Fig. 9. The spectral data to be recovered is divided into $\{{{f_1},{f_2},{f_3}, \cdots ,{f_q}} \}$ according to the projection size q of incident light on the mask, and $\{{{H_1}\phi ,{H_2}\phi ,{H_3}\phi , \cdots ,{H_q}\phi } \}$ are the sensing matrices corresponding to the subsets, each of which is non-singular because the incident light of its spectral bands does not overlap. The reconstruction process of the system becomes

$$\min \left( {{{\left\Vert{G - \sum\limits_{n = 1}^q {{H_n}\phi {\theta_n}} } \right\Vert}_2} + \lambda {{{\big \Vert}{{\theta_1} + \cdots + {\theta_q}} {\big \Vert}}_1}} \right) .$$
Combined with the optimization in the spatial dimension, Eq. (12) can be modified as
$$\min \left( {{{\left\Vert{{g_i} - \sum\limits_{n = 1}^q {{d_i}{H_n}\hat \phi {{\hat \theta }_n}} } \right\Vert}_2} + \lambda {{\left\Vert{{{\hat \theta }_1} + \cdots + {{\hat \theta }_q}} \right\Vert}_1}} \right) ,$$
where ${H_n}\hat \phi $ and ${\hat \theta _n}$ (n = 1 : q) are the sensing matrix and sparse coefficients corresponding to each subset, and q is both the projection size of incident light on the mask and the number of subsets. Because each sub-image ${g_i}$ can restore the complete spectral data of the scene, combining the ${q^2}$ sub-images provides better noise suppression.
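The grouping of spectral bands into q subsets can be sketched with simple index arithmetic. The example assumes that band k is projected onto mask columns k to k + q − 1 (one-column offset per band, linear dispersion); the function names `band_subsets` and `projections_overlap` are hypothetical.

```python
def band_subsets(n_bands, q):
    """Subset n holds bands n, n + q, n + 2q, ... (0-based band indices)."""
    return [list(range(n, n_bands, q)) for n in range(q)]

def projections_overlap(band_a, band_b, q):
    """Band k covers mask columns [k, k + q), so two bands overlap if closer than q."""
    return abs(band_a - band_b) < q

L, q = 31, 3
subsets = band_subsets(L, q)

# No two distinct bands inside one subset share mask columns.
no_overlap = all(
    not projections_overlap(a, b, q)
    for subset in subsets
    for a in subset
    for b in subset
    if a != b
)
```

Every band belongs to exactly one subset, so the subsets together still describe the complete spectral data cube.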

Fig. 9. Diagram of spectral dimensional interval sampling.

To verify the coherence of the optimized sensing matrix, the spectral data cube was assumed to have a size of $32 \times 32 \times 16$. A random binary pattern was adopted for the coded mask, and a DCT dictionary was used as the sparsifying dictionary. The observation matrix of the original SSCSI system and the optimized observation matrix were generated accordingly. A total of 1000 random experiments were conducted for projection sizes from $q = 2$ to $q = 10$. Eq. (4) was used to calculate the coherence of the sensing matrix for the original SSCSI system and for the optimized one, and the mean results are shown in Table 2.
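The coherence in Eq. (4) is the largest absolute inner product between distinct normalized columns of the sensing matrix. A minimal NumPy sketch of that computation follows. The function name `mutual_coherence` is hypothetical, and the random binary matrix is only a toy stand-in for a sensing matrix, not the SSCSI matrix $H\phi$ used in the experiment.

```python
import numpy as np


def mutual_coherence(A):
    """Largest absolute normalized inner product between distinct columns."""
    A = np.asarray(A, dtype=float)
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0          # all-zero columns stay zero after division
    A_unit = A / norms
    gram = np.abs(A_unit.T @ A_unit)
    np.fill_diagonal(gram, 0.0)      # exclude the m == n terms
    return float(gram.max())


rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(64, 128)).astype(float)  # toy random binary matrix
mu = mutual_coherence(A)
```

Averaging `mutual_coherence` over many random draws would give a mean value of the kind reported per $q$, provided the actual sensing matrices are built as described in the paper.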

Table 2. Coherence of Sensing Matrix with Different $q$ Values

As $q$ increases, the overlap area on the mask between different spectral bands gradually expands, which increases $\mu (A )$. After optimization by the method in this paper, there is no overlap on the mask between adjacent spectral bands, so the coherence $\mu (A )$ decreases. When $q = 10$, however, the effect of the optimization is not very obvious. The reason is that the $10 \times 10$ coding region contains 100 elements, all randomly distributed binary codes, so the final encoding pattern of each spectral band has a high probability of approaching $50\%$ gray. This means that although the optimization method in this paper divides the spectral bands into multiple subsets, the optimized sensing matrix is still close to singular.
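The "50% gray" argument can be checked numerically under a simplifying assumption that is not stated in the paper: mask elements are independent and equal to 0 or 1 with probability 1/2, and the effective code of a band is the mean over its $q \times q$ region. The standard deviation of that mean is then $0.5/q$, so it shrinks toward zero contrast as $q$ grows. The function name `effective_code_std` is hypothetical.

```python
import numpy as np


def effective_code_std(q, trials=20000, seed=0):
    """Empirical std of the mean of a q x q random binary patch."""
    rng = np.random.default_rng(seed)
    patches = rng.integers(0, 2, size=(trials, q, q))
    return float(patches.mean(axis=(1, 2)).std())


for q in (2, 4, 10):
    # Compare the simulated spread with the analytic value 0.5 / q.
    print(q, effective_code_std(q), 0.5 / q)
```

For $q = 10$ the analytic spread is 0.05, so under this model almost all effective code values lie within a narrow band around 0.5.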

4. Simulations

To verify the effectiveness of the proposed method, the GAP-TV algorithm [18] was used as the solver to reconstruct the spectral data, because it has low computational complexity and good recovery quality. The authors of Ref. [8] also used GAP-TV as the baseline method in a subsequent study [20]. In addition, according to Ref. [7], different reconstruction methods have no obvious influence on the reconstruction results of the SSCSI system. It is therefore appropriate to choose GAP-TV as the reconstruction method for comparing the effect of the optimization. The code of GAP-TV [21] was downloaded from the home page of the author of Ref. [8]. The reconstruction results of the original SSCSI observed data were compared with the reconstruction results of the data optimized by the proposed method. The experiment used the multispectral data of Ref. [19]. The size of the spectral data cube was $512 \times 512 \times 31$, the spectral range was 400 nm to 700 nm, and the sampling interval was 10 nm. The projection of the incident light on the mask was simulated as a $q \times q$ code region, with $q$ variable. Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) were compared in the spatial dimension. For spectral accuracy, specific image blocks were selected to plot the agreement between the recovered result of each band and the ground truth.
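For reference, PSNR can be computed as in the following sketch. The paper does not state the peak value or how PSNR is aggregated over bands, so the choices here (peak of 1.0 for data scaled to [0, 1], and a plain average over the spectral axis) are assumptions, and the function names are hypothetical.

```python
import numpy as np


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for arrays scaled to [0, peak]."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / mse))


def mean_band_psnr(cube_ref, cube_est, peak=1.0):
    """Average the per-band PSNR over the last (spectral) axis."""
    bands = cube_ref.shape[-1]
    return float(np.mean([psnr(cube_ref[..., k], cube_est[..., k], peak)
                          for k in range(bands)]))
```

With a $512 \times 512 \times 31$ cube, `mean_band_psnr` would average 31 per-band values.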

First, the experiment simulated a projection size of $2 \times 2$ elements. The result is shown in Fig. 10.

Fig. 10. Reconstruction results from 400 nm to 700 nm: (a) direct reconstruction from the original observed image; (b) reconstruction optimized by the algorithm in this paper.

Fig. 10(a) corresponds to reconstruction directly from the observation: the measured image is on the right, and the reconstruction results are on the left. Fig. 10(b) corresponds to the algorithm in this paper. Since the area projected on the mask is $2 \times 2$ elements, the measured image is recombined into four low-resolution sub-measured images, and each sub-image is divided into two subsets to avoid overlap between spectral bands. The four sub-measured images are shown on the right, and the spectral reconstruction is shown on the left.
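The recombination of the measured image into $q^2$ low-resolution sub-images can be sketched with array slicing. This sketch assumes that sub-image $(a, b)$ simply takes every $q$-th pixel starting at row $a$ and column $b$, which is one plausible reading of the interval sampling described in the paper; the function name `interval_subimages` is hypothetical.

```python
import numpy as np


def interval_subimages(g, q):
    """Split image g into q*q sub-images by taking every q-th pixel."""
    rows, cols = g.shape
    if rows % q or cols % q:
        raise ValueError("image size must be a multiple of q")
    # Sub-image (a, b) holds pixels g[a + i*q, b + j*q].
    return {(a, b): g[a::q, b::q] for a in range(q) for b in range(q)}


g = np.arange(512 * 512, dtype=float).reshape(512, 512)  # placeholder image
subs = interval_subimages(g, 2)
```

For a $512 \times 512$ measurement, $q = 2$ gives four $256 \times 256$ sub-images, while $q = 8$ gives 64 sub-images of only $64 \times 64$ pixels each.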

To evaluate the reconstruction quality in the spatial dimension, PSNR was used for comparison, and the results are shown in Fig. 11(a). The average PSNR obtained with the algorithm in this paper is 3 dB higher than that of direct reconstruction, and the SSIM is 0.05 higher. Because the original measured image is divided into several sub-measured images, the number of observations increases, the noise in the reconstruction process is suppressed to some extent, and the reconstruction results are smoother and clearer, as shown in Fig. 11(b).

Fig. 11. Quality comparison in the spatial dimension.

The comparison of spectral signatures is shown in Fig. 12. Four image blocks were selected, and the normalized intensity from 400 nm to 700 nm was plotted for each. The proposed method follows the spectral curve of the original data more closely than direct reconstruction does. To verify the adaptability of the proposed method to different data, three other scenes were tested, as shown in Fig. 13. The recovery quality of the proposed method is better than that of direct reconstruction in both PSNR and SSIM.

Fig. 12. Comparison of spectral recovery accuracy.

Fig. 13. Comparison between the direct reconstruction of observation data and the optimized reconstruction of observation data in this paper in multiple scenes.

In the SSCSI system, the coded mask may be moved toward the spectral plane to achieve higher spectral resolution, and the input aperture of the front objective lens may be enlarged to collect more light in some applications. As a result, the projected area of the incident light on the mask increases. To verify the effectiveness of the proposed method for a larger $q \times q$ region, projected areas of $4 \times 4$ and $8 \times 8$ were simulated. The results are shown in Table 3 and Table 4.

Table 3. Comparison Result of $4 \times 4$

Table 4. Comparison Result of $8 \times 8$

In Table 3, $q = 4$, which means that the projected area on the coded mask becomes $4 \times 4$. As the projected area increases, the spatial frequency response of the encoding pattern decreases and the coherence of the sensing matrix increases. The PSNR and SSIM of the final recovery results therefore degrade clearly relative to the $2 \times 2$ case. After optimization by the method in this paper, however, the PSNR improves by about 2 dB and the structural similarity (SSIM) is also higher. In Table 4, $q = 8$. The PSNR of the method in this paper is slightly higher than that of direct reconstruction from the observed data, but there is no significant improvement in SSIM. The reason is that optimizing and recombining the measured image generates 64 sub-measured images, and each sub-image requires 64-fold super-resolution, which is difficult. Compared with direct reconstruction, the advantage is therefore not obvious.

5. Conclusion

In this paper, the influence of the input aperture on the SSCSI imaging system is analyzed in depth. Because of imaging quality requirements, the aperture in a practical SSCSI system cannot be infinitely small, so the incident light passes through a $q \times q$ area on the mask. In the spatial dimension, the difference between the encoding patterns of adjacent pixels decreases, and the spatial frequency response of the system decreases accordingly, which degrades the quality of spatial recovery. In the spectral dimension, the difference between the encoding patterns of spectral bands also decreases, which increases the coherence of the sensing matrix and hinders the recovery of spectral information. Both problems were analyzed, and a pre-processing method based on the optimization and reorganization of the observed data was proposed. The coherence of the sensing matrix corresponding to the observation data is reduced, and the spatial frequency response of the system is improved, while the original imaging system remains unchanged. The experimental results show that the proposed method improves the spectral accuracy, spatial smoothness, and PSNR compared with reconstruction directly from the observed data of the SSCSI system. However, when the projection area of the incident light is large, the improvement is limited, which remains a problem for future research.

Funding

National Natural Science Foundation of China (61675161, 61572395); Fundamental Research Funds for the Central Universities (zdyf2017003).

Disclosures

The authors declare no conflicts of interest.

References

1. E. J. Candes and M. B. Wakin, “An Introduction to Compressive Sampling,” IEEE Signal Process. Mag. 25(2), 21–30 (2008).

2. E. J. Candes, J. Romberg, and T. Tao, “Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information,” IEEE Trans. Inf. Theory 52(2), 489–509 (2006).

3. M. Gehm, S. McCain, N. Pitsianis, D. Brady, P. Potuluri, and M. Sullivan, “Static two-dimensional aperture coding for multimodal, multiplex spectroscopy,” Appl. Opt. 45(13), 2965–2974 (2006).

4. A. Wagadarikar, R. John, R. Willett, and D. Brady, “Single disperser design for coded aperture snapshot spectral imaging,” Appl. Opt. 47(10), B44–B51 (2008).

5. M. E. Gehm, R. John, D. J. Brady, R. M. Willett, and T. J. Schulz, “Single-shot compressive spectral imaging with a dual-disperser architecture,” Opt. Express 15(21), 14013–14027 (2007).

6. X. Lin, G. Wetzstein, Y. Liu, and Q. Dai, “Dual-coded compressive hyperspectral imaging,” Opt. Lett. 39(7), 2044–2047 (2014).

7. C. V. Correa, H. Arguello, and G. R. Arce, “Snapshot colored compressive spectral imager,” J. Opt. Soc. Am. A 32(10), 1754–1763 (2015).

8. X. Lin, Y. Liu, J. Wu, and Q. Dai, “Spatial-spectral encoded compressive hyperspectral imaging,” ACM Trans. Graph. 33(6), 1–11 (2014).

9. O. F. Kar and F. S. Oktem, “Compressive spectral imaging with diffractive lenses,” Opt. Lett. 44(18), 4582–4585 (2019).

10. I. August, Y. Oiknine, M. AbuLeil, I. Abdulhalim, and A. Stern, “Miniature compressive ultra-spectral imaging system utilizing a single liquid crystal phase retarder,” Sci. Rep. 6(1), 23524 (2016).

11. X. Cao, T. Yue, X. Lin, S. Lin, X. Yuan, Q. Dai, L. Carin, and D. J. Brady, “Computational snapshot multispectral cameras: Toward dynamic capture of the spectral world,” IEEE Signal Process. Mag. 33(5), 95–108 (2016).

12. J. D. Barrie, K. A. Aitchison, G. S. Rossano, and M. H. Abraham, “Patterning of multilayer dielectric optical coatings for multispectral ccds,” Thin Solid Films 270(1-2), 6–9 (1995).

13. A. Mohan, R. Raskar, and J. Tumblin, “Agile Spectrum Imaging: Programmable Wavelength Modulation for Cameras and Projectors,” Comput. Graph. Forum 27(2), 709–717 (2008).

14. E. Salazar, A. Parada-Mayorga, and G. R. Arce, “Spectral Zooming and Resolution Limits of Spatial Spectral Compressive Spectral Imagers,” IEEE Trans. Comput. Imaging 5(2), 165–179 (2019).

15. E. Salazar and G. R. Arce, “Coded Aperture Optimization in Spatial Spectral Compressive Spectral Imagers,” IEEE Trans. Comput. Imaging 6, 764–777 (2020).

16. Y. Mejia-Melgarejo, O. Villarreal-Dulcey, and H. Arguello, “Adjustable spatial resolution of compressive spectral images sensed by multispectral filter array-based sensors,” Rev. Fac. Ing. Antioquia 78(78), 89–98 (2015).

17. E. Salazar, A. Parada, and G. R. Arce, “Spatial Super-resolution reconstruction via SSCSI Compressive Spectral Imagers,” in Imaging and Applied Optics 2018 (3D, AO, AIO, COSI, DH, IS, LACSEA, LS&C, MATH, pcAOP), OSA Technical Digest (Optical Society of America, 2018), pp. CTu5D.5.

18. X. Yuan, “Generalized alternating projection based total variation minimization for compressive sensing,” in Proceedings of IEEE International Conference on Image Processing (IEEE, 2016), pp. 2539–2543.

19. F. Yasuma, T. Mitsunaga, D. Iso, and S. K. Nayar, “Generalized Assorted Pixel Camera: Postcapture Control of Resolution, Dynamic Range, and Spectrum,” IEEE Trans. on Image Process. 19(9), 2241–2253 (2010).

20. X. Yuan, Y. Liu, J. Suo, and Q. Dai, “Plug-and-Play Algorithms for Large-Scale Snapshot Compressive Imaging,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2020), pp. 1444–1454.

21. Y. Liu, “Rank Minimization for Snapshot Compressive Imaging (TPAMI'19),” GitHub (2019), https://github.com/liuyang12/DeSCI.

