
Investigating deep optics model representation in affecting resolved all-in-focus image quality and depth estimation fidelity

Open Access

Abstract

The end-to-end (E2E) optimization of optics and image processing, dubbed deep optics, has renewed the state-of-the-art in various computer vision tasks. However, specifying the proper model representation or parameterization of the optical elements remains elusive. This article comprehensively investigates three modeling hypotheses of phase coded-aperture imaging in a representative context of deep optics: joint all-in-focus (AiF) imaging and monocular depth estimation (MDE). Specifically, we analyze the respective trade-offs of these models and provide insights into relevant domain-specific requirements, explore the connection between the spatial features of the point spread function (PSF) and the performance trade-off between the AiF and MDE tasks, and discuss the model sensitivity to possible fabrication errors. This study provides new prospects for future deep optics designs, particularly those aiming for AiF and/or MDE.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Deep optics, the encompassing field of jointly designing optics, sensor electronics, and processing algorithms in an end-to-end (E2E) manner, has enhanced the performance of many image-related measurement tasks [1], including spectral imaging [2–4], single-molecule localization microscopy [5], extended depth of field [6], achromatic imaging [7], high dynamic range imaging [8], and the combination of all-in-focus (AiF) imaging and monocular depth estimation (MDE) [9,10]. Of these tasks, high-quality AiF-MDE in particular has a substantial impact on many other exciting fields, including robotics [11], autonomous driving [12], and augmented reality [13]. Conventional approaches relying on time-of-flight, stereo pairs, or structured illumination are often accompanied by unwieldy, bulky hardware. Fortunately, many deep optics setups simply use widely accessible lenses with a lightweight reinforcement, such as a diffractive optical element (DOE), to code the aperture phase.

A general framework of deep optics designs is illustrated in Fig. 1. The point spread function (PSF) of the optical system, encoded by the specified optics model, is convolved with the ground truth image, which contains domain-specific information such as the depth or spectrum of the scene. The sensor data are formed by compressing the entire wave field into an RGB or monochromatic image. These measurements are then fed into a neural network, whose output is the predicted counterpart of the ground truth information. The loss function evaluates the deviation between the ground truth and the prediction. During the E2E optimization, backpropagation starts from the loss and continues along the entire pipeline, so the parameters of both the optics model and the neural network can be updated iteratively. Once the optimization converges, the optimal optical design and the corresponding image processing network for a domain-specific task are obtained.


Fig. 1. Illustration of a general deep optics framework and the phase texture visualization of three investigated optics model representations. (a) PW model. (b) CR model. (c) OAM model. The solid and dashed arrows denote the forward and backward propagation.


Notably, such designs generally involve considerations of computational efficiency, optimization complexity, degree of freedom (DoF), fabrication feasibility, and calibration complexity. Benefiting from emerging hardware and software technologies, computational efficiency is becoming less of a concern. The primary issue currently is that such optics models require a large DoF for optimization, which is often constrained by fabrication limits of all kinds simultaneously [8]. In search of an appropriate optics model for a specific visual task, one often seeks ways to expand the design DoF. However, even when fabrication constraints are taken into account, the extra introduced DoF may in turn result in considerable calibration difficulty when dealing with fabrication and assembly errors in practice. In this respect, a freer optics model, if it does not promise a significant improvement, may not deserve the hassle of its fabrication and calibration and is therefore much less valuable. This is counter-intuitive to the common understanding that the higher the DoF, the better. Moreover, an intuitive yet simply structured parameterization may be beneficial not only for reducing optimization complexity but also for improving computational efficiency.

As a showcase of such a circumstance, in this article we comprehensively compare three phase-encoding parameterization types. Starting from the extreme case where the full DoF of a DOE is warranted, we assume each of the graven pixels to be independent of the others, leading to a pixel-wise (PW) parameterized phase distribution during optimization [Fig. 1(a)]. This scheme has been widely used in state-of-the-art designs [5,6]. However, the PW model offers very little control over local smoothness, making the exact design difficult to realize in practice. On the contrary, considering the fabrication feasibility of optics, rotationally symmetric shapes have been the most favored type for centuries, starting with spherical lenses and later aspherical ones [14]. Their counterparts in the DOE sense, Fresnel lenses consisting of concentric rings, have also been intensively investigated [15,16]. This type of DOE design possesses only one DoF: the radius of curvature of the surface.

Notably, to enable a higher DoF, one can further relax the radial phase continuity constraint and treat the rings as independent variables; accordingly, concentric ring (CR) phase plates [Fig. 1(b)] have recently been reported to solve AiF, MDE, and achromatic imaging problems [7,9,17,18]. Like conventional spherical optics, the CR model is robust to the practical installation and calibration of lenses. Moreover, its symmetry can not only accelerate the diffraction modeling through dimensionality reduction but also facilitate a relatively practical fabrication process.

Unfortunately, the radial discontinuity of the CR model makes sophisticated fabrication techniques like lithography indispensable. From this point of view, this parameterization is over-constrained and wastes the underlying DoF that lithography techniques can grant inside the otherwise planar rings [19]. To extend the DoF of the CR type, adding extra features to the phase distribution is often considered. Although various parameterizations are available, if the phase distribution of each ring in the CR model is treated as a zero-order polynomial of the azimuthal angle, the DoF can be extended by intuitively introducing an extra first-order term, i.e., a vortex shape [see Fig. 1(c)]. This term expands the DoF by a factor of two, providing more encoding capability for modulating incoming waves subject to the desired tasks. It is worth noting that this parameterization is similar to the phase plate forming a helix-shaped PSF, which has been widely used for depth estimation [20–22]. In addition, it simultaneously imparts extra orbital angular momentum (OAM) to the photons transmitted through each ring [23]; we therefore term it the OAM model in this work.

The fact that most imaging systems feature a circular aperture naturally raises the option of using Zernike polynomials to parameterize the aperture wavefront [24], which have been widely applied in deep optics [4,6,25]. The Zernike model differs from the aforementioned sub-aperture parameterizations in that it is realized by the summation of a set of full-aperture, orthogonal, and smooth bases. Although this model guarantees phase smoothness, it can be ambiguous to specify the proper number and type of bases from the infinite set. Importantly, for the target AiF-MDE task, the Zernike model has shown sub-optimal performance compared with the CR model, as indicated in prior work [9]. Therefore, for brevity, we exclude this model from our investigation.

We adopt the AiF-MDE task to compare the three parameterizations. As a representative deep optics application, the AiF-MDE task contains inherent conflicts between its sub-goals: AiF prefers an easily invertible PSF, which tends to be depth-invariant [26], whereas MDE tends to preserve depth-dependent features [24]. Fortunately, the two tasks can still be balanced well in an E2E scheme, especially when a powerful neural network is involved [9]. As one of the greatest strengths of deep optics, E2E optimization can usually further improve the overall performance of multi-objective designs, despite trade-offs that appear inevitable from the viewpoint of sequential, separate designs.

In this work, we investigate the impact of the parameterization strategy of a deep optics model on its potential for balancing the AiF and MDE sub-goals and thereby lifting the overall performance. First, to assess the performance of the optics models on this well-known trade-off, we train three models (PW, CR, and OAM) with different weight assignments for the AiF and MDE sub-goals and derive the corresponding metrics. The performance comparison of the three optics models is presented and thoroughly analyzed. Then, we characterize the PSFs of the three optics models and reveal the link between the PSF features and the trade-off behavior. Finally, these optics models are assessed in the presence of synthetic fabrication errors.

2. Optics model representation

Although optical coding can be realized on various platforms, here we take the phase-only DOE as an example considering its widespread use in E2E designs. The phase modulation induced by a DOE can be described as

$$\phi\left(u,v,\lambda\right) = \frac{2 \pi \left[n\left(\lambda\right)-n_{\mathrm{air}}\right]}{\lambda} h\left(u,v\right),$$
where $\left (u,v\right )$ are the lateral coordinates of the pixels on the aperture, and $\lambda$ is the wavelength of the incident light. $n$ is the refractive index of the DOE, $n_\mathrm {air}\approx 1$ is that of air, and $h$ is the height profile of the DOE.
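Equation (1) is a direct pointwise map from a height profile to a phase profile. As a minimal sketch (not from the paper; the refractive index value below is a hypothetical fused-silica-like choice), it can be written as:

```python
import numpy as np

def doe_phase(height, wavelength, n_doe, n_air=1.0):
    """Phase shift phi(u, v, lambda) induced by a DOE height profile, Eq. (1).

    height: 2D array of surface heights h(u, v) in meters.
    wavelength: incident wavelength lambda in meters.
    n_doe: refractive index of the DOE material at this wavelength.
    """
    return 2.0 * np.pi * (n_doe - n_air) / wavelength * height

# Sanity check: a height of lambda / (n - 1) yields a full 2*pi phase shift.
lam = 545e-9                 # principal wavelength used later in the paper
n = 1.46                     # hypothetical material index (assumption)
h_max = lam / (n - 1.0)      # height corresponding to a 2*pi shift at lam
phi = doe_phase(np.full((4, 4), h_max), lam, n)
```

The wavelength dependence of both $n$ and the $1/\lambda$ factor is what makes a single height map produce different phase patterns per color channel.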

General criteria for optics model representation include the quality of convergence during optimization, fabrication feasibility, and information-encoding capability. The most straightforward type for numerical optimization is the PW parameterization [8], i.e., the height of every pixel on the DOE is treated as a variable in the E2E optimization. Although this model theoretically covers all possible phase modulations of a DOE, in practice it can be problematic to converge to a reasonably good result due to the enormous number of parameters. Worse still, the optimized DOE most likely features a rough height profile, since each pixel is optimized independently, resulting in fabrication challenges.

To decrease the number of parameters and alleviate possible fabrication complexity, recently, a model constituted by a set of vanilla concentric rings, i.e., CR model, has been proposed, manifesting great superiority in applications of AiF-MDE [9] and diffractive achromats [7]. The phase coding of this model can be formulated by

$$\begin{aligned} \phi\left(r, \lambda\right) = \sum_{m=1}^{M} b_{m}\left(\lambda\right) \left[\operatorname{Circ}_{m} \left( r\right) - \operatorname{Circ}_{m-1} \left( r\right)\right], \end{aligned}$$
where $r=\sqrt {u^{2}+v^{2}}$, $\operatorname {Circ}_{m} \left ( r\right )=\operatorname {circ}\left (r / r_{m}\right )$, and $\operatorname {Circ}_{0} \left ( r\right )=0$. $r_{m}=md$, $m=1,2, \ldots, M$ ($d$ is the width of each ring), and $\operatorname {circ}$ is the unit circ function. The phase shifts $b$ of each ring are the variables to optimize. Owing to its rotationally symmetric structure, the efficiency of numerical computation can be tremendously enhanced by reducing the diffraction calculation from two dimensions (2D) to 1D. In addition, this feature can simplify the fabrication and hardware assemblage. However, the relatively simple structure in turn significantly limits the DoF. The resulting PSF is restricted to a circular shape, which indicates that information can be encoded along only one spatial dimension.
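The ring decomposition of Eq. (2) assigns one phase value $b_m$ to the annulus between radii $(m-1)d$ and $md$. The following NumPy sketch (our own illustration, up to the convention chosen at ring boundaries) builds such a CR phase map on a square pixel grid:

```python
import numpy as np

def cr_phase(b, d, grid_size, pitch):
    """Concentric-ring (CR) phase profile of Eq. (2).

    b: length-M array of per-ring phase shifts b_m (the optimizable variables).
    d: ring width (same physical units as pitch).
    grid_size: number of pixels per side of the square aperture grid.
    pitch: physical pixel pitch of the grid.
    """
    c = (np.arange(grid_size) - (grid_size - 1) / 2.0) * pitch
    u, v = np.meshgrid(c, c)
    r = np.sqrt(u ** 2 + v ** 2)
    # Ring index: a pixel with (m-1)*d <= r < m*d belongs to ring m.
    idx = np.floor(r / d).astype(int)
    phi = np.zeros_like(r)
    inside = idx < len(b)            # zero phase beyond the outermost ring
    phi[inside] = np.asarray(b)[idx[inside]]
    return phi
```

Because $\phi$ depends only on $r$, a practical implementation need only store and optimize the 1D vector $b$, which is the dimensionality reduction the text refers to.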

In principle, depth information can be directly extracted from the depth-dependent shape of the PSF. One representative is the helix-shaped PSF [20–22], which features a depth-dependent rotation. The corresponding phase profile is also constituted by a set of concentric rings, but with spiral profiles rather than flat ones. Specifically, the phase modulation of each ring is $l\varphi$ ($l$ is the topological charge, and $\varphi$ is the azimuthal angle on the aperture plane). Therefore, the incident light of each ring carries an OAM of $l\hbar$ per photon ($\hbar$ is the reduced Planck constant). Adding a constant phase shift $b$ to each ring extends the DoF without introducing manufacturing difficulty. This modification yields a generalized model covering both the OAM and the CR types, which can be expressed with similar notations as

$$\begin{aligned} \phi\left(r, \varphi, \lambda\right) =\sum_{m=1}^{M} \left[l_{m}\left(\lambda\right) \varphi+b_{m}\left(\lambda\right)\right] \left[\operatorname{Circ}_{m} \left( r\right) -\operatorname{Circ}_{m-1} \left( r\right) \right], \end{aligned}$$
where $l_{m}$ is the topological charge induced by the $m$-th ring. Conventionally, it is defined as an integer. However, here we set $l_{m} \in \mathbb {R}$ because an integer $l_{m}$ is intractable for simultaneously encoding multiple wavelengths.
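Equation (3) adds the first-order azimuthal term $l_m\varphi$ on top of the per-ring constant $b_m$; setting all $l_m = 0$ recovers the CR model of Eq. (2). A minimal sketch of this generalized model (our own illustration, same grid conventions as above):

```python
import numpy as np

def oam_phase(l, b, d, grid_size, pitch):
    """Generalized OAM/CR phase of Eq. (3): phi = l_m * varphi + b_m per ring.

    l: length-M array of (real-valued) topological charges l_m.
    b: length-M array of per-ring constant phase shifts b_m.
    d: ring width; grid_size, pitch: aperture sampling parameters.
    """
    c = (np.arange(grid_size) - (grid_size - 1) / 2.0) * pitch
    u, v = np.meshgrid(c, c)
    r = np.sqrt(u ** 2 + v ** 2)
    varphi = np.arctan2(v, u)            # azimuthal angle on the aperture
    idx = np.floor(r / d).astype(int)    # ring index per pixel
    phi = np.zeros_like(r)
    inside = idx < len(l)
    phi[inside] = (np.asarray(l)[idx[inside]] * varphi[inside]
                   + np.asarray(b)[idx[inside]])
    return phi
```

With $l_m \in \mathbb{R}$ the phase is discontinuous across $\varphi = \pm\pi$, which is why integer charges are conventional for a single wavelength but, as noted above, intractable when multiple wavelengths must be encoded simultaneously.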

3. Implementation and analysis

We establish an E2E pipeline and perform optimization threads with the three types of DOE models. In brief, the PSFs of the optical system coded by the DOE are convolved with the ground truth images segmented according to the ground truth depth map. The sensor image is then simulated by summing up the convolved segments after correcting occlusion at the transitional regions, and is pre-processed to form a depth-dependent image stack. This image stack, along with the sensor image, is fed into a convolutional neural network (CNN), whose output is an estimated AiF image with its depth map. Specifically, we choose the U-Net as the architecture of the CNN, as it is the most representative network in deep optics, especially for AiF-MDE. Details about the pipeline, datasets, and hyperparameter settings can be found in Supplement Section 1.
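The layered image-formation step can be sketched as follows. This is a simplified illustration (not the paper's exact implementation): it convolves each depth segment with that layer's PSF via FFT and sums the results, omitting the occlusion correction at transitional regions described above.

```python
import numpy as np

def simulate_sensor(image, depth_idx, psfs):
    """Simplified layered image formation: convolve each depth segment
    with its PSF and sum (occlusion handling omitted).

    image: (H, W) ground-truth image.
    depth_idx: (H, W) integer depth-layer index per pixel.
    psfs: (D, H, W) normalized PSFs, one per depth layer, on the image grid.
    """
    sensor = np.zeros_like(image, dtype=float)
    for d in range(psfs.shape[0]):
        segment = image * (depth_idx == d)   # pixels belonging to layer d
        # FFT-based circular convolution with the layer's centered PSF
        sensor += np.real(np.fft.ifft2(np.fft.fft2(segment)
                                       * np.fft.fft2(np.fft.ifftshift(psfs[d]))))
    return sensor
```

In the full differentiable pipeline this step would be implemented in an autodiff framework so that gradients flow from the loss back to the DOE parameters.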

For the quantification of the two dimensions—AiF image quality and depth estimation fidelity, we use checkpoints of the validation step with the lowest loss of AiF images and depth maps, denoted by $C^{\prime }$ and $C^{\prime \prime }$, respectively. Notably, checkpoints are snapshots of the model status at the current optimization step, which store all values of the hyperparameters and optimizable variables, including the profile of the DOE and each parameter of the CNN.

3.1 AiF-MDE trade-off analysis

First, we assess the performance of the three models with the weight ratio between the AiF image and the depth map $R_{w} = w_{\mathrm {RGB}} : w_{\mathrm {Depth}} = 1:1$. The OAM model is optimized with two different initialization manners, corresponding to the original double ($\mathrm {OAM_{d}}$) and triple ($\mathrm {OAM_{t}}$) helix PSFs, respectively. The PW and the CR ones are both initialized with zeros. The phase distribution and PSFs of the initial status can be found in Supplement Fig. S2. Selected estimated AiF images from $C^{\prime }$ and depth maps from $C^{\prime \prime }$ are shown in Fig. 2. The results are quantitatively evaluated by the average peak signal-to-noise ratio (PSNR) of the AiF images and mean absolute error (MAE) of the depth maps, respectively. We observe that the quality of the AiF images reconstructed from the PW model is the best, while that from the $\mathrm {OAM_{t}}$ model is the worst. In contrast, the depth estimation fidelity of the four models exhibits a reversed trend.
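The weight ratio $R_w$ enters the training objective as a weighted sum of the two sub-goal losses. The exact loss terms are given in the paper's Supplement; the sketch below is a hypothetical stand-in (MSE for the image, L1 for depth, matching the PSNR and MAE metrics reported) that only illustrates how $R_w = w_{\mathrm{RGB}} : w_{\mathrm{Depth}}$ shapes the trade-off:

```python
import numpy as np

def aif_mde_loss(rgb_pred, rgb_gt, depth_pred, depth_gt,
                 w_rgb=1.0, w_depth=1.0):
    """Weighted two-term objective; R_w = w_rgb : w_depth.

    Per-term losses (MSE for the image, L1 for depth) are assumptions
    for illustration, not the paper's exact choice.
    """
    l_rgb = np.mean((rgb_pred - rgb_gt) ** 2)       # AiF sub-goal
    l_depth = np.mean(np.abs(depth_pred - depth_gt))  # MDE sub-goal
    return w_rgb * l_rgb + w_depth * l_depth
```

Scaling up $w_{\mathrm{RGB}}$ (e.g., $R_w = 10:1$) biases the optimization toward AiF imaging, which is exactly the knob varied in Section 3.1.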


Fig. 2. Estimated AiF images and depth maps from four models configured with $R_{w} = 1:1$. The $1^{\mathrm {st}}$ and $3^{\mathrm {rd}}$ rows show parts of AiF images from the validation set. The $2^{\mathrm {nd}}$ and $4^{\mathrm {th}}$ rows are the corresponding depth maps. The image PSNR and the MAE of the depth maps are indicated on the upper-left of each block. The MAE of the depth maps is measured in meters. Note that the AiF images and the depth maps are resolved from saved checkpoints $C^{\prime }$ and $C^{\prime \prime }$, respectively.


The average metrics across the whole validation set are listed in Table 1 and are consistent with Fig. 2: the best and worst AiF imaging performance is achieved by the PW and $\mathrm {OAM_{t}}$ models, respectively. The case for depth estimation, however, is generally, but not completely, reversed. We observe that the MAE of the depth maps estimated from the PW model is smaller than that of the CR one, which may be due to the different DoFs of these two models. As mentioned above, the PW model has the highest DoF; interestingly, however, it does not achieve the best performance in both AiF imaging and depth estimation. This may also result from its high DoF, which likely leads to over-convergence on AiF imaging with small but featureless PSFs under this weight combination. Although the AiF image quality from the OAM models is worse than that of the PW and CR models, their depth maps show better fidelity. The $\mathrm {OAM_{d}}$ model shows better AiF image quality than the $\mathrm {OAM_{t}}$ one, but the two have similar depth estimation performance.


Table 1. Quantitative assessment of optics models for AiF imaging and depth estimation implemented with $R_{w}=1:1$.

Since the captured image is determined by the PSF while the architecture of the CNN remains unchanged, we analyze the PSFs of the four models to further explore the underlying reasons for the performance deviations. The phase distribution (at the principal wavelength of 545 nm) of the optimized DOE models and the corresponding color PSFs are visualized in Fig. 3. The superscripts $^{\prime }$ and $^{\prime \prime }$ of each model name denote the results obtained from the two checkpoints $C^{\prime }$ and $C^{\prime \prime }$, respectively. For each model, the optimized phase distribution of the DOE and the corresponding PSFs obtained from $C^{\prime }$ and $C^{\prime \prime }$ are similar. This indicates that the main difference between $C^{\prime }$ and $C^{\prime \prime }$ lies in the network parameters, which accounts for the different AiF imaging and depth estimation performance during the training of each model. Among the four models, unsurprisingly, the phase distribution of the PW model is the roughest. Surprisingly, however, its phase distribution, as well as its PSFs, converges to an almost concentric shape. For the CR model, the PSFs at all depths exhibit similar circular shapes but have depth-dependent color features, which can assist depth estimation. In contrast, the PSFs of the two OAM models exhibit more depth-dependent shapes, although they are more diffused than the PW and CR ones.


Fig. 3. Optimized phase distribution at the principal wavelength of DOE models and corresponding color PSFs along the target depth range of 1 m – 5 m. The $1^{\mathrm {st}}$ column presents the optimized phase distribution of the DOE, and the $2^{\mathrm {nd}} - 7^{\mathrm {th}}$ columns are the corresponding PSFs, which are uniformly sampled with the inverse perspective scheme [9] within the target depth range (visualization of the full 16 PSFs can be found in the Supplement Fig. S3, S4). For better visualization, the PSFs are cropped to the central $64\times 64$ pixels and shown in normalized amplitude.


To further assess the performance in balancing the trade-off between the AiF and MDE sub-goals, we set a series of alternative weight ratios $R_{w} \in \left \{1:10, 1:5, 5:1, 10:1\right \}$ in the loss function and train the four models with these ratios. Including the case of $R_{w} = 1:1$, 20 independent optimization threads are conducted. We then collect all the checkpoints at the end of every training epoch for a detailed analysis, since they reveal the attainable performance of the optics models throughout the optimization process. The validation metrics of each checkpoint are presented in Fig. 4. Note that points closer to the top-left corner represent better overall performance, since that is the direction along which both metrics approach their optima. The performance bounds and cloud patterns of the four models suggest that, despite its rough surface profile, the PW model dominates most of the region, achieving better performance in both AiF and MDE. The CR model exhibits better overall performance than the OAM ones; however, in the region boxed by the solid black line, the OAM models are likely to obtain better performance than in those boxed by the dashed black lines. We conclude that, in most cases, the CR model is preferred for the joint AiF and MDE task. For applications where MDE matters more than AiF, the OAM model is likely the better choice, because it shows the potential to obtain a comparable depth-map MAE together with a better AiF PSNR, and vice versa. Nevertheless, this superiority may be too weak to be preserved under manufacturing errors.
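Reading checkpoints toward the "top-left corner" of Fig. 4 amounts to finding the non-dominated (Pareto-optimal) set in the (PSNR, MAE) plane, where higher PSNR and lower MAE are both better. A small helper (our own illustration, not code from the paper) makes this precise:

```python
import numpy as np

def pareto_front(psnr, mae):
    """Indices of checkpoints not dominated by any other checkpoint.

    A checkpoint i is dominated if some other checkpoint is at least as
    good in both metrics (psnr >= psnr[i], mae <= mae[i]) and strictly
    better in at least one.
    """
    psnr, mae = np.asarray(psnr, float), np.asarray(mae, float)
    keep = []
    for i in range(len(psnr)):
        dominated = np.any((psnr >= psnr[i]) & (mae <= mae[i])
                           & ((psnr > psnr[i]) | (mae < mae[i])))
        if not dominated:
            keep.append(i)
    return keep
```

Applied per model, the front corresponds to the outermost "performance bound" curves drawn in Fig. 4.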


Fig. 4. Validation metrics from each epoch of the 20 studied optimization threads. The depth MAE is shown on a logarithmic scale for visualization purposes. The colored curves connect the outermost dots of each model, indicating the performance bounds. The density distribution of the dots is visualized as colored clouds. The shape of the markers represents the results obtained with different $R_{w}$ (see Supplement 1, Fig. S5, for the split view). The region bounded by the solid black box indicates where the OAM models outperform the CR one, compared to those by dashed black boxes, irrespective of the PW model. Each optimization thread takes about 33 hours, and the convergence during the training can be found in Supplement 1, Fig. S6, S7. In addition, two groups of selected predictions varying with the weight ratios can be found in Supplement Fig. S8, S9.


As noted before, the OAM model can theoretically degrade into the CR one. In practice, however, our studied $\mathrm {OAM_{d}}$ and $\mathrm {OAM_{t}}$ models did not converge to the CR one to obtain a better overall performance. This may be attributed to the sub-optimal initialization, in which each ring starts with a large $l$ and is therefore far from the CR limit of $l\to 0$ in every ring. To validate this hypothesis, we optimized another OAM model initialized with zeros, i.e., all $l=0$ and $b=0$. The optimized results (Supplement 1, Fig. S10) suggest that the parameter $l$ in each ring may not be as effective in promoting the AiF-MDE performance as expected, since only a few rings carry noticeable vortex features.

3.2 PSF characterization

Extending the qualitative analysis in Fig. 3, we quantitatively analyze the PSF energy concentration and the similarity between depth layers for the four models. The energy concentration is evaluated as the proportion of energy within the central field of view, the circular area with a radius of 6 pixels. The shape similarity of PSFs at different depth layers is evaluated by the pairwise correlation coefficient. As shown in Fig. 5, when $R_{w}$ becomes larger, i.e., the E2E optimization favors AiF imaging over depth estimation, the PSFs of all four models exhibit an overall increase in energy concentration [Fig. 5(a)] as well as in shape similarity [Fig. 5(b)]. Notably, lower shape similarity indicates stronger depth dependency. Moreover, the PSF of the CR model possesses the highest energy concentration except for the PW model [Fig. 5(c)], while the shapes of the PSFs of the two OAM models vary dramatically with depth [Fig. 5(d)]. The corresponding PSFs of the four models obtained from $C^{\prime }$ and $C^{\prime \prime }$ at 16 depth layers can be found in Supplement 1, Fig. S3, S4. Combining this observation with Fig. 3 and Fig. 4, we find that PSFs with higher energy concentration likely benefit AiF imaging, while those featuring strongly depth-dependent shapes are more beneficial to depth estimation.
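The two metrics just described are straightforward to compute. The sketch below follows the stated definitions (a 6-pixel-radius central disk; pairwise correlation across depth layers), with the PSF center taken as the array center as an implementation assumption:

```python
import numpy as np

def energy_concentration(psf, radius=6):
    """Fraction of PSF energy within `radius` pixels of the array center."""
    h, w = psf.shape
    y, x = np.ogrid[:h, :w]
    mask = (y - h // 2) ** 2 + (x - w // 2) ** 2 <= radius ** 2
    return psf[mask].sum() / psf.sum()

def mean_pairwise_correlation(psf_stack):
    """Average pairwise correlation coefficient between PSFs at different
    depth layers; lower values indicate stronger depth dependency."""
    flat = psf_stack.reshape(psf_stack.shape[0], -1)
    c = np.corrcoef(flat)                       # (D, D) correlation matrix
    iu = np.triu_indices(len(flat), k=1)        # unique off-diagonal pairs
    return c[iu].mean()
```

A perfectly depth-invariant PSF stack gives a mean correlation of 1, while strongly rotating or reshaping PSFs (as with the OAM models) push it toward 0.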


Fig. 5. PSF characterization. (a) and (b) are the average energy concentration and similarity of PSFs at 16 depth layers of the four models with different $R_{w}$. (c) and (d) are the further averaged values of the four models by different $R_{w}$. The average energy concentration and pairwise correlation coefficient are denoted by $\eta$ and $c$, respectively.


3.3 Robustness to fabrication error

Although the simulation results have revealed the pros and cons of the three investigated optics models, possible fabrication errors would inevitably distort the optical response. To this end, we model the fabrication error by adding Gaussian noise with different standard deviations (SDs) to the height maps of the optimized DOEs. Specifically, the SDs are defined relative to the maximum height $h_{\mathrm {max}}$ of the DOE profiles, which corresponds to a 2$\pi$ phase shift at the principal wavelength. The performance of the optics models is then assessed on the validation set. The validation process is the same as that described in Section 3.1, except for the difference in the height maps of the DOEs.
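The perturbation described above can be sketched in a few lines (our own illustration of the stated noise model):

```python
import numpy as np

def perturb_height(height, rel_sd, h_max, rng=None):
    """Add Gaussian fabrication noise to a DOE height map.

    rel_sd: noise SD as a fraction of h_max, where h_max is the height
    giving a 2*pi phase shift at the principal wavelength (Section 3.3).
    rng: optional seed or Generator for reproducible noise draws.
    """
    rng = np.random.default_rng(rng)
    return height + rng.normal(0.0, rel_sd * h_max, size=height.shape)
```

Averaging validation metrics over several seeds, as done in the five repeated validations, smooths out the randomness of individual noise draws.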

We performed five simulation trials for each model with different height-map error levels. To mitigate random variation, in each trial the reported validation result is the average of five repeated validations with different random seeds for the Gaussian noise. The resulting image PSNR and depth MAE under the weight ratio $R_{w}=1:1$ are illustrated in Fig. 6. As shown in Fig. 6(a), as the fabrication error increases, the image PSNR of all optics models exhibits a decreasing trend. Notably, in contrast to the other two models, the PW model is severely affected, as expected. This is probably because the optimized DOE possesses many high-frequency features that are most likely to be corrupted during fabrication, so its resulting PSF is more likely to deviate from the target distribution.


Fig. 6. Visualization of robustness to fabrication errors of investigated optics models. (a) and (b) are image PSNR and depth MAE varied with the height map error of DOEs, respectively. The level of the fabrication error is controlled by the SD relative to the maximum height of DOE $h_{\mathrm {max}}$.


With respect to depth estimation [Fig. 6(b)], the PW and CR models exhibit more sensitivity than the OAM models. Interestingly, however, the depth MAE of the PW and CR models, especially the PW one, does not follow a monotonic trend like the image PSNR. This observation indicates that a slight variation of the PSF may boost the depth estimation performance. One possibility is that, at the assessed checkpoints, these optics models had not yet reached a reasonable local optimum for depth estimation. In addition, the CR model exhibits weak robustness to fabrication errors for MDE. We thus conclude that MDE may be more sensitive to deviations of circular PSFs.

4. Conclusion

Specifying a model parameterization of deep optics that can properly balance optimization efficiency and imaging performance is crucial yet challenging. Built upon a representative domain-specific application, we have investigated the impact of DOE parameterization on AiF image quality and depth estimation fidelity. Our conclusions are as follows:

  • 1. Comparing three representative models with varying weight ratios between the AiF and MDE sub-goals, the PW model possesses the overall best performance in simulation. However, it may be surpassed by the CR one in real-world applications due to fabrication challenges induced by its rough surface profile. For applications where high-quality MDE is a priority, the OAM models can show slightly better image PSNR at the same depth MAE as the CR one (and vice versa), but this weak superiority is most likely to be overwhelmed by fabrication errors. In addition, both the PW and OAM models may require complicated calibration during actual lens assembly, especially with multiple elements, while the CR one can significantly simplify this process thanks to its rotational symmetry. Overall, for the AiF-MDE task, near-optimal performance can be obtained with the CR model in most cases, especially considering its manufacturing feasibility and promising computational efficiency.
  • 2. The results from the OAM model initialized with zeros indicate that a higher DoF of the optics model does not always lead to better performance for specific tasks. This is still an open question, where essential physical limitations of the optics model and optimization challenges should be considered thoroughly.
  • 3. A PSF with concentrated energy is more likely to benefit the AiF image reconstruction, while a PSF with a depth-dependent shape is more suitable for MDE. Accordingly, we can infer that a PSF possessing both high energy concentration and distinct depth-dependent shape may boost the AiF imaging and MDE performance simultaneously. Nevertheless, more advanced PSF engineering approaches are demanded.
  • 4. The PW model most likely would suffer from more fabrication errors compared with the other two models. Therefore, when parameterizing an optics model, the robustness to fabrication deviations should get sufficient attention. Intuitively, one can add fabrication-driven constraints like the quantization aware scheme [27] into the optimization, or specify parameterization manners that are robust to this fabrication defect impact, such as the CR or OAM models.
We believe these findings can provide insightful guidelines for future deep optics designs. Last, although orthogonal to the technical scope of this investigation, we would like to note several limitations. The three optics models above form a comprehensive comparison involving two extreme cases (PW and CR) and one in between (OAM). However, the possibilities between the two extremes are infinite, and an exhaustive comparison is impractical. We also note that, although the Zernike model is not included in our comparison, it remains a common choice for parameterizing the wavefront in deep optics. Moreover, most existing optical elements, including camera lenses and DOEs, as well as propagation methods, are modeled with the paraxial approximation and spatial shift-invariance. In the non-paraxial case, the optics models may demand further assessment via more rigorous modeling [28,29]. In addition, only three principal wavelengths for the sensor response are considered here, while more spectral channels should be included to approach real-world experimental conditions. Nevertheless, addressing these two limitations may require substantially more computational effort.

Funding

National Key Research and Development Program of China (2018YFA0701400); National Natural Science Foundation of China (92050115).

Acknowledgments

The authors gratefully acknowledge Gordon Wetzstein, Cindy Nguyen, Rui Wang, and Edmund Y. Lam for fruitful discussion.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. T. Klinghoffer, S. Somasundaram, K. Tiwary, and R. Raskar, “Physics vs. learned priors: Rethinking camera and algorithm design for task-specific imaging,” arXiv preprint arXiv:2204.09871 (2022).

2. L. Wang, T. Zhang, Y. Fu, and H. Huang, “Hyperreconnet: Joint coded aperture optimization and image reconstruction for compressive hyperspectral imaging,” IEEE Trans. Image Process. 28(5), 2257–2270 (2019).

3. W. Zhang, H. Song, X. He, L. Huang, X. Zhang, J. Zheng, W. Shen, X. Hao, and X. Liu, “Deeply learned broadband encoding stochastic hyperspectral imaging,” Light: Sci. Appl. 10(1), 108 (2021).

4. H. Arguello, S. Pinilla, Y. Peng, H. Ikoma, J. Bacca, and G. Wetzstein, “Shift-variant color-coded diffractive spectral imaging system,” Optica 8(11), 1424–1434 (2021).

5. E. Nehme, D. Freedman, R. Gordon, B. Ferdman, L. E. Weiss, O. Alalouf, T. Naor, R. Orange, T. Michaeli, and Y. Shechtman, “Deepstorm3d: dense 3d localization microscopy and psf design by deep learning,” Nat. Methods 17(7), 734–740 (2020).

6. V. Sitzmann, S. Diamond, Y. Peng, X. Dun, S. Boyd, W. Heidrich, F. Heide, and G. Wetzstein, “End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging,” ACM Trans. Graph. 37(4), 1–13 (2018).

7. X. Dun, H. Ikoma, G. Wetzstein, Z. Wang, X. Cheng, and Y. Peng, “Learned rotationally symmetric diffractive achromat for full-spectrum computational imaging,” Optica 7(8), 913–922 (2020).

8. C. A. Metzler, H. Ikoma, Y. Peng, and G. Wetzstein, “Deep optics for single-shot high-dynamic-range imaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2020), pp. 1375–1385.

9. H. Ikoma, C. M. Nguyen, C. A. Metzler, Y. Peng, and G. Wetzstein, “Depth from defocus with learned optics for imaging and occlusion-aware depth estimation,” in 2021 IEEE International Conference on Computational Photography (ICCP), (2021), pp. 1–12.

10. S.-H. Baek, H. Ikoma, D. S. Jeon, Y. Li, W. Heidrich, G. Wetzstein, and M. H. Kim, “Single-shot hyperspectral-depth imaging with learned diffractive optics,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, (2021), pp. 2651–2660.

11. M. Ye, E. Johns, A. Handa, L. Zhang, P. Pratt, and G.-Z. Yang, “Self-supervised siamese learning on stereo image pairs for depth estimation in robotic surgery,” arXiv preprint arXiv:1705.08260 (2017).

12. Y. Wang, W.-L. Chao, D. Garg, B. Hariharan, M. Campbell, and K. Q. Weinberger, “Pseudo-lidar from visual depth estimation: Bridging the gap in 3d object detection for autonomous driving,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2019), pp. 8445–8453.

13. W. Lee, N. Park, and W. Woo, “Depth-assisted real-time 3d object detection for augmented reality,” in ICAT, vol. 11 (2011), pp. 126–132.

14. P. L. Ruben, “Design and use of mass-produced aspheres at kodak,” Appl. Opt. 24(11), 1682–1688 (1985).

15. Y. Peng, Q. Fu, H. Amata, S. Su, F. Heide, and W. Heidrich, “Computational imaging using lightweight diffractive-refractive optics,” Opt. Express 23(24), 31393–31407 (2015).

16. A. Nikonorov, R. Skidanov, V. Fursov, M. Petrov, S. Bibikov, and Y. Yuzifovich, “Fresnel lens imaging with post-capture image processing,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), (2015), pp. 33–41.

17. S. Elmalem, R. Giryes, and E. Marom, “Learned phase coded aperture for the benefit of depth of field extension,” Opt. Express 26(12), 15316–15331 (2018).

18. H. Haim, S. Elmalem, R. Giryes, A. M. Bronstein, and E. Marom, “Depth estimation from a single image using deep learned phase coded mask,” IEEE Trans. Computat. Imaging 4(3), 298–310 (2018).

19. Q. Fu, H. Amata, and W. Heidrich, “Etch-free additive lithographic fabrication methods for reflective and transmissive micro-optics,” Opt. Express 29(22), 36886–36899 (2021).

20. S. R. P. Pavani and R. Piestun, “High-efficiency rotating point spread functions,” Opt. Express 16(5), 3484–3489 (2008).

21. S. R. P. Pavani, M. A. Thompson, J. S. Biteen, S. J. Lord, N. Liu, R. J. Twieg, R. Piestun, and W. E. Moerner, “Three-dimensional, single-molecule fluorescence imaging beyond the diffraction limit by using a double-helix point spread function,” Proc. Natl. Acad. Sci. 106(9), 2995–2999 (2009).

22. S. Prasad, “Rotating point spread function via pupil-phase engineering,” Opt. Lett. 38(4), 585–587 (2013).

23. X. Liu, Y. Peng, S. Tu, J. Guan, C. Kuang, X. Liu, and X. Hao, “Generation of arbitrary longitudinal polarization vortices by pupil function manipulation,” Adv. Photonics Res. 2(1), 2000087 (2021).

24. Y. Shechtman, S. J. Sahl, A. S. Backer, and W. E. Moerner, “Optimal point spread function design for 3d imaging,” Phys. Rev. Lett. 113(13), 133902 (2014).

25. Y. Wu, V. Boominathan, H. Chen, A. Sankaranarayanan, and A. Veeraraghavan, “Phasecam3d—learning phase masks for passive single view depth estimation,” in 2019 IEEE International Conference on Computational Photography (ICCP), (IEEE, 2019), pp. 1–12.

26. E. R. Dowski and W. T. Cathey, “Extended depth of field through wave-front coding,” Appl. Opt. 34(11), 1859–1866 (1995).

27. L. Li, L. Wang, W. Song, L. Zhang, Z. Xiong, and H. Huang, “Quantization-aware deep optics for diffractive snapshot hyperspectral imaging,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2022), pp. 19780–19789.

28. F. Wyrowski and M. Kuhn, “Introduction to field tracing,” J. Mod. Opt. 58(5-6), 449–466 (2011).

29. S. Schmidt, T. Tiess, S. Schröter, R. Hambach, M. Jäger, H. Bartelt, A. Tünnermann, and H. Gross, “Wave-optical modeling beyond the thin-element-approximation,” Opt. Express 24(26), 30188–30200 (2016).

Supplementary Material (1)

Supplement 1: revised supplemental material


Figures (6)

Fig. 1. Illustration of a general deep optics framework and the phase texture visualization of the three investigated optics model representations. (a) PW model. (b) CR model. (c) OAM model. The solid and dashed arrows denote the forward and backward propagation.
Fig. 2. Estimated AiF images and depth maps from four models configured with $R_{w} = 1:1$. The $1^{\mathrm {st}}$ and $3^{\mathrm {rd}}$ rows show parts of AiF images from the validation set. The $2^{\mathrm {nd}}$ and $4^{\mathrm {th}}$ rows are the corresponding depth maps. The image PSNR and the MAE of the depth maps are indicated at the upper left of each block. The MAE of the depth maps is measured in meters. Note that the AiF images and the depth maps are resolved from saved checkpoints $C^{\prime }$ and $C^{\prime \prime }$, respectively.
Fig. 3. Optimized phase distribution at the principal wavelength of the DOE models and the corresponding color PSFs along the target depth range of 1 m – 5 m. The $1^{\mathrm {st}}$ column presents the optimized phase distribution of the DOE, and the $2^{\mathrm {nd}} - 7^{\mathrm {th}}$ columns are the corresponding PSFs, uniformly sampled with the inverse perspective scheme [9] within the target depth range (visualizations of the full 16 PSFs can be found in Supplement 1, Fig. S3 and S4). For better visualization, the PSFs are cropped to the central $64\times 64$ pixels and shown in normalized amplitude.
Fig. 4. Validation metrics from each epoch of the 20 studied optimization clusters. The depth MAE is shown on a logarithmic scale for visualization purposes. The colored curves connect the outermost dots of each model, indicating the performance bounds. The density distribution of the dots is visualized as colored clouds. The shape of the markers represents the results obtained with different $R_{w}$ (see Supplement 1, Fig. S5, for the split view). The region bounded by the solid black box indicates where the OAM models outperform the CR one, compared to the regions bounded by the dashed black boxes, irrespective of the PW model. Each optimization thread takes about 33 hours, and the convergence during training can be found in Supplement 1, Fig. S6 and S7. In addition, two groups of selected predictions varying with the weight ratios can be found in Supplement 1, Fig. S8 and S9.
Fig. 5. PSF characterization. (a) and (b) are the average energy concentration and similarity of PSFs at 16 depth layers for the four models with different $R_{w}$. (c) and (d) are these values further averaged over the four models by different $R_{w}$. The average energy concentration and the pairwise correlation coefficient are denoted by $\eta$ and $c$, respectively.
Fig. 6. Visualization of the robustness of the investigated optics models to fabrication errors. (a) and (b) show the image PSNR and depth MAE, respectively, as the height map error of the DOEs varies. The level of fabrication error is controlled by the SD relative to the maximum DOE height $h_{\mathrm {max}}$.

Tables (1)


Table 1. Quantitative assessment of optics models for AiF imaging and depth estimation implemented with $R_{w} = 1:1$.

Equations (3)

$$\phi(u, v, \lambda) = \frac{2\pi\,[n(\lambda) - n_{\mathrm{air}}]}{\lambda}\, h(u, v),$$
$$\phi(r, \lambda) = \sum_{m=1}^{M} b_{m}(\lambda)\,[\mathrm{Circ}_{m}(r) - \mathrm{Circ}_{m-1}(r)],$$
$$\phi(r, \varphi, \lambda) = \sum_{m=1}^{M} [\,l_{m}(\lambda)\,\varphi + b_{m}(\lambda)\,]\,[\mathrm{Circ}_{m}(r) - \mathrm{Circ}_{m-1}(r)].$$
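The three phase parameterizations above (PW per-pixel height map, CR concentric rings, OAM rings with azimuthal ramps) can be sketched numerically as below. The grid size, ring count, wavelength, refractive index, and the random parameter values are placeholder assumptions chosen only to illustrate the structure of each model.

```python
import numpy as np

N, M = 256, 8                     # pupil grid size and number of rings (assumed)
lam = 550e-9                      # principal wavelength in meters (assumed)
n_lam, n_air = 1.46, 1.0          # fused-silica-like index vs. air (assumed)

u = np.linspace(-1.0, 1.0, N)
uu, vv = np.meshgrid(u, u)
r = np.sqrt(uu**2 + vv**2)        # normalized pupil radius
phi_az = np.arctan2(vv, uu)       # azimuthal angle

# PW model: a free-form height map h(u, v) sets the phase pixel by pixel.
h = np.random.default_rng(0).uniform(0.0, 1e-6, (N, N))
phase_pw = 2.0 * np.pi * (n_lam - n_air) / lam * h

# Annular masks Circ_m(r) - Circ_{m-1}(r): disjoint rings of equal radial width.
edges = np.linspace(0.0, 1.0, M + 1)
rings = [(r >= edges[m]) & (r < edges[m + 1]) for m in range(M)]

# CR model: one constant phase offset b_m per concentric ring.
b = np.random.default_rng(1).uniform(0.0, 2.0 * np.pi, M)
phase_cr = sum(b[m] * rings[m] for m in range(M))

# OAM model: each ring carries an azimuthal ramp l_m * phi plus an offset b_m.
l = np.random.default_rng(2).integers(-3, 4, M)
phase_oam = sum((l[m] * phi_az + b[m]) * rings[m] for m in range(M))
```

One way to read the trade-off studied in the article: the PW model has $N^{2}$ free parameters, while the CR and OAM models compress the search space to $M$ and $2M$ parameters per wavelength, respectively, at the cost of restricting the attainable phase textures.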