
Snapshot ptychography on array cameras

Open Access

Abstract

We use convolutional neural networks to recover images optically down-sampled by 6.7× using coherent aperture synthesis over a 16-camera array. Where conventional ptychography relies on scanning and oversampling, here we apply decompressive neural estimation to recover a full-resolution image from a single snapshot, although, as shown in simulation, multiple snapshots can be used to improve signal-to-noise ratio (SNR). In-place training on experimental measurements eliminates the need to directly calibrate the measurement system. We also present simulations of diverse array camera sampling strategies to explore how snapshot compressive systems might be optimized.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The cross range resolution of diffractive imaging systems is aperture limited. Radar imaging has long used aperture synthesis from moving or distributed receivers to increase aperture [1]. Numerous studies of synthetic aperture ladar have attempted to extend this advantage to optical frequencies [2]. While such systems have been demonstrated with significant range and resolution [3,4], the challenges of holographic stability and referencing have limited their applicability. Recently, reference-free aperture synthesis using ptychography has become increasingly popular, beginning with the seminal demonstration by Zheng et al. of a gigapixel-scale microscope [5]. While gigapixel-scale aperture synthesis has also been demonstrated by holographic methods [6,7], Fourier ptychography (FP) requires no reference signal and was implemented by Zheng et al. with a simple LED illumination array.

Diverse approaches have subsequently been proposed to improve the resolution [8], portability [9], or acquisition speed [10] of this setup. Single-shot FP has been demonstrated with a diffractive grating [11], a lens array [12], or color multiplexing [13]. Multi-camera systems capture band-limited images in parallel using multiple cameras to increase the imaging throughput [14–16]. Aperture-scanning FP translates the aperture with a mechanical stage [17] or performs digital scanning with a spatial light modulator (SLM) [18]. The development of aperture-scanning FP further permits macroscopic super-resolution imaging where far-field propagation is equivalent to the Fourier transform of the target field [17,19,20]. Beyond the increased resolution and space-bandwidth product, the advantages of FP also include phase imaging [21], digital refocusing [17], 3D imaging [22] and aberration correction [23]. Recent comprehensive reviews of FP are presented in Konda et al. [24] and Zheng et al. [25].

Here we consider the extension of array camera aperture synthesis to macroscopic imaging systems. Such an extension was previously suggested by Holloway et al. [19], but to our knowledge has yet to be demonstrated. Calibration of the forward mapping from object space to the sensor array is the primary challenge to such a demonstration. We use data measured directly on the array to train a convolutional neural network to directly invert multiaperture data, avoiding both the calibration problem and reliance on phase retrieval algorithms. In so doing, we also demonstrate single frame synthetic aperture imaging, which ultimately may enable video-rate multiaperture coherent imaging. Similar methods have previously been demonstrated in diverse applications of snapshot compressive imaging [26].

Conventional Fourier ptychography uses phase retrieval algorithms, as reviewed for example in [27], to combine coherent image data across multiple frames. Phase retrieval algorithms depend on data redundancy; typical systems require at least 60% overlap in the Fourier space between adjacent images [19]. The scanning associated with redundant sampling and the iterative nature of the reconstruction algorithms lead to long acquisition and processing times. To address the processing aspect of this challenge, deep-learning (DL)-based algorithms have been proposed [28,29]. Kappeler et al. [30] proposed a 3-layer CNN and demonstrated reconstruction performance better than alternating projection methods when there was no overlap. Nguyen et al. [31] proposed a conditional generative adversarial network (cGAN) and reported 40 times faster reconstruction. Boominathan et al. [32] proposed a U-Net with different training strategies for high overlap and low overlap cases, and showed improved reconstruction in all cases. Shamshad et al. [33] utilized generative models with a subsampling operator, which required less observed data and was more robust to noise. To improve the network generalization, Zhang et al. [34] proposed to synthesize a complex field from the measurements as the input to the network. Xue et al. [35] proposed to reconstruct the phase and assess the estimated phase at the same time using a Bayesian convolutional neural network.

While simulations and experiments have demonstrated that DL methods improve the imaging efficiency in FP, less attention has been paid to their advantages in hardware design. In fact, because DL reconstruction need not rely on the analytic forward model, accurate system calibration is no longer needed. Here we show that a fixed array camera can be used with end-to-end neural training to recover images upscaled by 6.7× in resolution relative to the single aperture limit. Our contributions include demonstration of an "in place" training strategy and testing strategies to confirm synthetic aperture performance. In particular, we use data selection to show that observed resolution enhancements are intrinsically tied to array size. In addition to our experimental demonstration, we present simulations of diverse array sampling strategies, including multiframe strategies based on subaperture array translation.

While innovations in calibration strategy and processing beyond the scope of this study will be needed to field coherent multiaperture cameras, the results presented here confirm the ability of such systems to greatly exceed the single aperture diffraction limit and the utility of neural processing in image formation from such systems.

Section 2 describes the experimental system we built to demonstrate the proposed approach. Section 3 details our neural training and estimation strategy and experimental results. Section 4 presents design simulations to help understand the impact of subaperture size and distribution and compression ratio on system performance. Finally, Section 5 discusses the significance of the results presented here and potential next steps in the development of coherent array imaging.

2. System design

We consider the array camera imaging system shown in Fig. 1(a). The object is illuminated by a coherent light source, such as a laser, and captured by a camera array. One may roughly model each camera as a low pass filter on the object field with transfer function $H(u,v)=P(\lambda F u,\lambda F v)$, where $P(x,y)$ is the pupil function [36] and $F$ is the focal length. In an array of identical cameras, the transfer function for the $i^{th}$ camera centered at position $(x_i,y_i)$ is $P(\lambda F u-x_i,\lambda F v-y_i)$. The corresponding coherent impulse response for this camera is

$$h_i(x,y)=e^{2\pi i\frac{x_ix+y_iy}{\lambda F}}h_o(x,y),$$
where $h_o(x,y)$ is the point spread function for a camera at the center of the $(x,y)$ plane. In practice, camera tilt, focal state variation and uncertainty in $(x_i,y_i)$ impact how well the phase function $\phi_i(x,y)\approx 2\pi\frac{x_ix+y_iy}{\lambda F}$ is known, but for present purposes it is sufficient to define the array measurement model as
$$g_i(x',y')=\left |\int\int f(x,y) e^{i\phi_i(x,y)}h_o(x'-x,y'-y) dx dy\right |^2.$$
Conventional Fourier ptychography uses iterative phase retrieval to invert the spectrogram given in Eq. (2). Here, however, we propose to directly apply deep learning to estimate $f(x,y)$. This approach enables reconstruction from undersampled Fourier data and avoids the need to precisely characterize $\phi_i(x,y)$.
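As a concrete illustration of Eqs. (1)-(2), the short numerical sketch below simulates one camera's intensity image from a known object field. It is a minimal rendering only: the function name, coordinate grid, and FFT-based circular convolution are illustrative choices, not the processing used in our experiments.

```python
import numpy as np

def camera_measurement(f, h0, xi, yi, wavelength, F, dx):
    """Simulate the i-th camera intensity per Eqs. (1)-(2):
    g_i = |(f * exp(i*phi_i)) convolved with h_o|^2, with
    phi_i(x, y) = 2*pi*(x_i*x + y_i*y)/(lambda*F).
    f: complex object field, h0: on-axis coherent PSF (centered),
    (xi, yi): camera position, dx: object-plane sample spacing."""
    rows, cols = f.shape
    y, x = np.mgrid[:rows, :cols] * dx                # object-plane coordinates
    phi = 2 * np.pi * (xi * x + yi * y) / (wavelength * F)
    tilted = f * np.exp(1j * phi)                     # object field with linear phase tilt
    # FFT-based (circular) convolution with the on-axis impulse response h_o
    g = np.fft.ifft2(np.fft.fft2(tilted) * np.fft.fft2(np.fft.ifftshift(h0)))
    return np.abs(g) ** 2                             # intensity recorded by the camera
```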


Fig. 1. Comparison between the proposed method and conventional FP. (a) Proposed array camera snapshot FP. (b) Aperture-scanning FP [20]. The proposed method uses a sparse sampling strategy while conventional FP requires scanning the entire Fourier space in an overlapping manner.


In previous multiaperture FP studies, either the camera positions or the illumination wave direction is varied to enable oversampling of the target Fourier space. For example, Fig. 1(b) shows the aperture-scanning FP [20], where the camera moves to capture different regions of the Fourier space. The single frame Fourier coverage may be visualized by a disk in the object Fourier space, where the disk is defined by the pupil function. Shifting the phase $\phi_i$ by changing the camera position $x_i,y_i$ or by changing the coherent wave illumination angle shifts the position of the bandpass filter. Conventional FP assumes a dense array of overlapping bandpass measurements. Such sampling is not possible in a single frame of multi-camera data. Rather, we sparsely sample the Fourier space as illustrated at the right of Fig. 1(a).

An experimental system built to validate the proposed approach is shown in Fig. 2. A superluminescent 650 nm light emitting diode (Exalos, Langhorne PA) was used for illumination. A spatial filter was utilized to collimate the source. Object patterns $f(x,y)$ were created using a liquid crystal spatial light modulator (SLM, Hamamatsu X10468), containing $600\times 800$ pixels with a pixel pitch of 20 $\mu$m. The reflected, phase modulated wave was imaged onto the camera array. All cameras in the array were focused on the SLM plane. The array consisted of 16 1MP OV9281 global shutter cameras (Arducam B0267), each coupled with a Marshall 25 mm f/2.5 lens (V-4325) and operated on an Nvidia Jetson Nano array. We 3D printed the supporting frame to mount the cameras in a $4\times 4$ array. The sensors of the 16 cameras were not on the same plane, which allowed a slightly more compact design. The offset between the optical axes of adjacent lenses was approximately 33 mm. We also adjusted the orientation of each camera such that the target appeared at the center of its captured frame.


Fig. 2. Experimental camera array and imaging system.


To avoid grating diffraction from the pixelation of the SLM and maintain a proper measurement resolution, the SLM was placed 1.1 m away from the camera array. The period of the diffraction pattern at this distance was 35.7 mm. By letting the $0^{th}$ order diffraction fall into the gap between the right four cameras in the middle layers, no diffraction pattern was captured. At this distance, the target was measured by approximately $90\times 120$ pixels on each camera.
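As a consistency check on the quoted period, assuming the standard estimate that the diffraction-order spacing at distance $z$ from a pixelated modulator with pitch $p$ is $\lambda z/p$:
$$\Lambda \approx \frac{\lambda z}{p} = \frac{650\ \textrm{nm}\times 1.1\ \textrm{m}}{20\ \mu\textrm{m}} \approx 35.7\ \textrm{mm}.$$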

The liquid crystal on silicon SLM modulates the phase in proportion to the voltage applied over each pixel [37]. In our setup, the voltage is controlled by the pixel intensity, and higher intensity corresponds to larger phase retardation. Due to the pixel crosstalk caused by the fringing fields and elastic forces of the material, the expected phase retardation is spatially low pass filtered [38,39]. To mitigate this effect we limited our experiments to binary phase modulation with maximal retardation and clear high frequency images. An experimental analysis of the resolution of the SLM is presented in Section 1 of Supplement 1.

The pixel magnification was $600/90\approx 6.7$, meaning that one camera pixel measures $6.7$ SLM pixels. The goal of the system is to jointly process the 16 array camera images to upsample them to the original image resolution. Such upsampling is possible because of the systematic variation in the subsampled images due to the phase functions $\phi_i(x,y)$. To demonstrate the feasibility of such upsampling, we used the physical array to measure output signals for several thousand images displayed on the SLM. We used the known input images as ground truth and the output image array as input to a convolutional neural network. We then trained the network to associate the ground truth images with the measured data.

We first collected 2665 vector clip arts from Openclipart (https://openclipart.org/). With image augmentation methods, i.e., rotation, flipping, and scaling, we generated 23200 binary images with resolution $576\times 768$, which were zero-padded to $600\times 800$. The padding was applied because boundary pixels were occluded by the case of the SLM from some viewpoints. During the capture process, 5 frames were averaged for each camera to suppress noise. The low-resolution images from each camera were cropped to just the $90 \times 120$ region imaging the SLM. An example training image and its corresponding measurements are shown in Fig. 3.
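The sketch below illustrates this data preparation for a single clip-art image, assuming PIL/numpy processing; the augmentation parameters, centering, and binarization threshold are our illustrative choices rather than the exact pipeline used to build the dataset.

```python
import numpy as np
from PIL import Image

def prepare_slm_pattern(img, angle=0.0, scale=1.0, flip=False):
    """Illustrative augmentation (rotation, flipping, scaling) of one clip-art
    image, binarized at 576x768 and zero-padded to the 600x800 SLM grid."""
    canvas = Image.new("L", (768, 576), 0)                      # working canvas (w, h)
    w, h = int(img.width * scale), int(img.height * scale)
    im = img.convert("L").resize((w, h)).rotate(angle, expand=True)
    if flip:
        im = im.transpose(Image.FLIP_LEFT_RIGHT)
    canvas.paste(im, ((768 - im.width) // 2, (576 - im.height) // 2))
    pattern = (np.array(canvas) > 127).astype(np.uint8)         # binary phase pattern
    padded = np.zeros((600, 800), dtype=np.uint8)               # zero-pad to the SLM size
    padded[12:588, 16:784] = pattern
    return padded

def average_frames(frames):
    """Average the 5 captured frames per camera to suppress noise."""
    return np.mean(np.stack(frames, axis=0), axis=0)
```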


Fig. 3. A data sample on the SLM and the corresponding measurements from the camera array.


3. Image estimation

The reconstruction network adopted the U-Net structure [40] with dense blocks [41], as shown in Fig. 4. The network consists of an initial convolutional layer, 14 dense blocks, 7 transition layers, 7 upsampling layers and a final convolutional layer. The definitions of the dense block and transition layer follow the original DenseNet [41]. Each dense block consists of 5 BN-PReLU-Conv(1$\times$1)-BN-PReLU-Conv(3$\times$3) building blocks. The growth rate is $k = 24$ and each bottleneck layer produces $4k$ feature maps. The compression factor $\theta$ equals 0.8 in the transition layers. The upsampling layer replaces the max-pooling of the transition layer with a deconvolution, and we apply a compression factor $\theta = 0.2$. The initial convolutional layer produces 64 feature maps with filter size $3\times 3$, and the final convolutional layer uses filter size $1\times 1$. We upsampled the measurements so that the input to the network matched the output image in spatial dimensions.
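A minimal PyTorch sketch of the building blocks described above is given below. Only the dense layer, dense block, and encoder-side transition are shown; the full encoder-decoder wiring, skip connections, and exact channel bookkeeping are omitted, and the class names are our own rather than those in the released code [44].

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """One BN-PReLU-Conv(1x1)-BN-PReLU-Conv(3x3) building block: the 1x1
    bottleneck outputs 4k channels and the 3x3 convolution adds k new
    feature maps (growth rate k = 24)."""
    def __init__(self, in_ch, k=24):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.PReLU(),
            nn.Conv2d(in_ch, 4 * k, kernel_size=1, bias=False),
            nn.BatchNorm2d(4 * k), nn.PReLU(),
            nn.Conv2d(4 * k, k, kernel_size=3, padding=1, bias=False))

    def forward(self, x):
        return torch.cat([x, self.block(x)], dim=1)   # dense connectivity

class DenseBlock(nn.Module):
    """Five stacked building blocks, so each dense block adds 5k channels."""
    def __init__(self, in_ch, k=24, n_layers=5):
        super().__init__()
        self.layers = nn.Sequential(
            *[DenseLayer(in_ch + i * k, k) for i in range(n_layers)])

    def forward(self, x):
        return self.layers(x)

class Transition(nn.Module):
    """Encoder-side transition: channel compression (theta = 0.8) followed by
    2x max-pooling; the decoder-side upsampling layer replaces the pooling
    with a transposed convolution and uses theta = 0.2."""
    def __init__(self, in_ch, theta=0.8):
        super().__init__()
        self.block = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.PReLU(),
            nn.Conv2d(in_ch, int(in_ch * theta), kernel_size=1, bias=False),
            nn.MaxPool2d(2))

    def forward(self, x):
        return self.block(x)
```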


Fig. 4. Illustration of the reconstruction network. The network is adapted from the U-Net [40] and the DenseNet [41].


Our 23200 image dataset was separated into 20000 training images and 3200 testing images. The network was first trained with binary cross-entropy loss, the Adam optimizer [42], and a learning rate of 0.0003 for 100 epochs in PyTorch. After that, we built a sub-dataset by selecting the data with poor reconstruction SSIM. We tuned the network with this sub-dataset for 20 epochs using the following loss:

$$l = l_{BCE} + \lambda l_{SSIM},$$
where we selected $\lambda = 0.01$. This training method avoided the domination of smooth data [43]. The training was performed on four Nvidia Tesla V100 GPUs with a batch size of 12. The resulting network, along with the code used to train it, is available for download at [44].
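As an illustration of the fine-tuning stage, the sketch below implements the loss of Eq. (3) and a simple poor-SSIM selection rule. We interpret $l_{SSIM}$ as $1-\mathrm{SSIM}$, the SSIM helper is assumed to come from a package such as pytorch_msssim, and the selection threshold is an illustrative value rather than the one used in our experiments.

```python
import torch
import torch.nn.functional as F
from pytorch_msssim import ssim   # assumed differentiable SSIM helper

def combined_loss(pred, target, lam=0.01):
    """Fine-tuning loss of Eq. (3): BCE plus a weighted SSIM term
    (lambda = 0.01); pred is assumed to be a sigmoid output in [0, 1]."""
    l_bce = F.binary_cross_entropy(pred, target)
    l_ssim = 1.0 - ssim(pred, target, data_range=1.0)
    return l_bce + lam * l_ssim

@torch.no_grad()
def select_hard_examples(model, loader, threshold=0.9):
    """Collect indices of samples with poor reconstruction SSIM to form the
    sub-dataset used for the 20-epoch tuning stage (loader with batch size 1)."""
    hard = []
    for idx, (x, y) in enumerate(loader):
        if ssim(model(x), y, data_range=1.0).item() < threshold:
            hard.append(idx)
    return hard
```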

We evaluated the network with widely used image quality assessment metrics, and the results are summarized in Table 1. We also show results for networks that used only subsets of the 16 measured images. (See the following text for details.) While the trend to improved image resolution is clear in the results, the effect of aperture synthesis is much clearer in actual images for the sparse binary patterns used here.


Table 1. Performance metrics evaluated on the testing data.

Several reconstructed samples are demonstrated in Fig. 5. (In the current section, "reconstruction" refers to the network output after thresholding.) In each sample, the reconstructed image, the ground truth image, and the image directly down-sampled to $90\times 120$ are shown. The down-sampled image represents the resolving power of a single camera in terms of the sensor pixel size. The images show that the proposed imaging method overcomes the pixel-limited resolution and super-resolves the texture details. It is worth noting that the measurements did not include a bright-field image as in traditional FP, so the low-frequency structural information was inferred from the measurements. More samples, along with their measurements, are shown in Section 2 of Supplement 1.


Fig. 5. Improvement from pixel limited resolution. Zoom in to see details and quantitative evaluations.


The resolution of a camera is limited by diffraction blur, geometric aberration and pixel sampling. We studied the actual resolution improvement of the system by imaging a resolution test chart. The target and the reconstructed image are shown in Fig. 6(a). The width of each line increases from 1 pixel to 14 pixels in the left two columns and from 1 pixel to 7 pixels in the right column. We also directly imaged the target with a single camera using a polarizer, and we compare the direct imaging and the reconstruction from 16 measurements in Fig. 6(b). From direct imaging, the minimum resolvable width of a bar is 7 pixels on the SLM, which agrees with the down-sampling ratio of the camera. With the proposed imaging method, the width of a resolvable bar decreases to 4 pixels, and we can still see repeating patterns in the right column when the width is 3 pixels.


Fig. 6. Resolution improvement from a single image. The bars with the width of 4 pixels can be resolved, and we can still see patterns when the width is 3 pixels (red bounding box in (a)). Zoom in to see details.


To confirm that our reconstructed image quality is based on aperture synthesis over the full camera array we trained two more networks that used only subsets of the 16 measured images. The first network used the images from the right four cameras in the middle layers, and the second network used the remaining 12 images. The quantitative results are shown in Table 1, and visual results are shown in Fig. 7. As one would expect, measurements close to the optical axis contribute to the reconstruction of low-frequency information, and the system relies on off-axis measurements to recover high-frequency details.


Fig. 7. Comparing the reconstruction results using different numbers of measurements.


Figure 8 shows example images with less satisfactory reconstruction results, which also represent the typical types of error in the testing data. The most common errors come from random dots or large areas without phase variation in the image. Ptychographic image synthesis is based on self-referencing interference between adjacent pixels. For discrete point sources and other sparse images, such interference does not occur and aperture synthesis is impossible. The second type of error comes from animal images, which contain many short lines with varying orientations. While lines are easier to reconstruct than dots, the varying random orientations still pose challenges in generating a binary image, but we are able to observe the texture from the network output before thresholding. We also see significant errors in artistic images, which are difficult to avoid due to the lack of similar samples in the training set. To improve the reconstruction fidelity and build systems for wider applications, the following aspects should be considered.


Fig. 8. Representative samples with reconstruction errors.


Calibration. One challenge in traditional FP is calibration, because the reconstruction algorithm requires an accurate forward model. In our experiments, we manually cropped the SLM region from the image without careful pixel alignment, and we did not characterize the pixel cross-talk on the SLM. While the calibration can be implicitly performed by the neural network, lifting this burden could improve the network’s resolving power.

Camera arrangement. In our experiments, we did not directly measure the pattern on the SLM, and the results show that low-frequency information can be inferred from measured high-frequency information. However, adding a direct measurement should improve the reconstruction, especially when the image consists mainly of low-frequency components. In Section 4, we further discuss other considerations in the aperture distribution.

Dataset and training. The performance of a neural network highly depends on the training data. In our experiments, the training data included 1880 samples containing random geometrical shapes, as shown in Fig. 9(a), and the network could reconstruct Fig. 9(b) without perceptible error, see Fig. 9(c). In contrast, the reconstructed image, Fig. 9(d), showed obvious artifacts when the geometrical data were removed from the training set. The performance of the network is also affected by the training strategy. Figure 10 compares the reconstruction performance before and after the network was tuned with the challenging sub-dataset. The reconstruction of details improved with this training trick.


Fig. 9. Effect of training data on reconstruction performance. (a) Samples with geometrical shapes were included in the training set. (b) A sample in testing data. (c) The testing sample could be exactly reconstructed. (d) Reconstruction showed significant artifacts when the geometrical shapes were removed from training data.



Fig. 10. Effect of tuning with challenging sub-dataset.


4. Design analysis

In this section, we simulate diverse systems to consider how the experimental results presented above might improve. While the proposed method does not require an accurate forward model for reconstruction, the reconstruction fidelity is naturally affected by the diameter(s) and the distribution of the camera apertures. Here we simulate 10 aperture distribution strategies and compare their performances.

The simulation follows traditional FP as described in [45], and for simplicity we drop the phase factor and the coordinate scaling. The complex wave from the object is denoted $\psi (x,y)$ and the field at the Fourier plane is denoted $\hat {\psi }(x',y')$; the image measurement by the $i^{th}$ camera can then be expressed as

$$I_i(x,y) = |\mathcal{F}[\hat{\psi}(x',y')A(x'-x'_i,y'-y'_i)]|^2,$$
where $\mathcal {F}$ is the Fourier transform and $A(x'-x'_i,y'-y'_i)$ is the aperture centered at $(x'_i,y'_i)$ defined as
$$A(x',y')=\begin{cases} 1 & x'^2+y'^2\leq\left(\frac{d}{2}\right)^2\\ 0 & \text{otherwise} \end{cases},$$
where $d$ is the diameter of the aperture. In simulations, we assumed the wave from the object $\psi (x,y)$ was a real-valued image with $512\times 512$ pixels.

In the first 4 strategies, we considered 9 apertures with aperture diameter $d=128$. Strategy 1 assumed an intuitively ideal but physically challenging layout in which 9 apertures were densely packed at the center of the Fourier space. Strategy 2 assumed a uniform, symmetrical and sparse distribution. Strategy 3 assumed a uniform, asymmetrical and sparse distribution. Strategy 4 assumed a sparse but loosely structured distribution in which each aperture was given a random shift relative to strategy 2; this strategy best described our physical setup.

Strategies 5 and 6 considered 16 and 36 apertures with diameters $d = 96$ and $d = 64$, respectively. Strategies 7-10 considered multi-scale aperture diameters and random distributions. The number of apertures and diameters for each distribution are illustrated in Fig. 11. The numbers of apertures were selected so that all strategies except strategy 7 had similar total measured pixels, while strategy 7 measured approximately 20% fewer pixels.


Fig. 11. Different aperture distribution strategies. Strategies 1-6 consider a single aperture diameter, and strategies 7-10 consider multi-scale aperture diameters. Strategy 7 consists of $d = 128, 96$ and 64, each with 4 apertures. Strategy 8 consists of $d = 128, 96$ and 64, each with 5 apertures. Strategy 9 consists of $d = 128$ and 96, with 1 and 14 apertures respectively. Strategy 10 consists of $d = 128$ and 64, with 1 and 32 apertures respectively. The total measured pixels and the measured percentage of the Fourier space are labeled under each strategy.


The simulation data were generated from the DIV2K dataset [46] and the CLIC dataset [47]. We cropped 20000 patches, each containing $512\times 512$ pixels, and divided them into 15000 training, 2500 validation and 2500 testing samples. We modeled Poisson noise by introducing a parameter $n$ representing the expected number of photons, so the target was $\psi (x,y) = I_{gt}(x,y)$, where $I_{gt}$ represented the image normalized to $[0,1]$, and the measurement became:

$$I_i(x,y) = Poisson(|\mathcal{F}[\hat{\psi}(x',y')A(x'-x'_i,y'-y'_i)]|^2\times n).$$
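For reference, the following sketch implements the simulated measurement of Eqs. (4)-(6): the object spectrum is masked by the shifted circular aperture, inverse transformed, squared in magnitude, and optionally corrupted by Poisson noise with expected photon count $n$. Aperture centers are given in Fourier-plane pixels, and the photon normalization is an illustrative choice.

```python
import numpy as np

def circular_aperture(size, d, xi, yi):
    """Binary aperture A of diameter d centered at (xi, yi) on a size x size
    Fourier-plane grid, per Eq. (5); coordinates are pixels from the grid center."""
    y, x = np.mgrid[:size, :size] - size // 2
    return ((x - xi) ** 2 + (y - yi) ** 2 <= (d / 2) ** 2).astype(float)

def simulate_measurement(img, xi, yi, d=128, n_photons=None):
    """Simulated camera intensity per Eqs. (4) and (6) for a real-valued
    512x512 image normalized to [0, 1]."""
    size = img.shape[0]
    spectrum = np.fft.fftshift(np.fft.fft2(img))             # \hat{psi}(x', y')
    masked = spectrum * circular_aperture(size, d, xi, yi)   # apply shifted aperture
    intensity = np.abs(np.fft.ifft2(np.fft.ifftshift(masked))) ** 2
    if n_photons is not None:                                # Poisson noise, Eq. (6)
        intensity = np.random.poisson(intensity * n_photons) / n_photons
    return intensity
```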
The reconstruction still adopted the U-Net structure shown in Fig. 4, but with 6 transition layers and 6 upsampling layers. The growth rate was 16, and each dense block consisted of 5 building blocks. We also adopted the residual learning scheme [48] and asked the network to predict the residual relative to the bright-field low-resolution image.

We first trained the network on noise-free data following Eq. (4), and we then tuned the network with noisy data ($n=10^3$). The performance of the networks on the testing data with different noise levels is summarized in Table 2, and three reconstruction samples are shown in Fig. 12. Extra samples and full-resolution images are shown in Section 3 of Supplement 1. We emphasize the following observations:

  • 1. Sparsity: Although the dense distribution better preserved the structural information of the image, its ability to resolve high-frequency information was limited. In contrast, sparse measurement strategies captured more high-frequency information and recovered more details while still maintaining high PSNR. The dense distribution also showed poor robustness to noise compared with sparse distributions.
  • 2. Random distribution: Given the diameter and the number of apertures, as shown by strategies 2-4, randomly distributed apertures outperformed the others in both quantitative evaluation and visual results. We attribute this effect to the reduced aliasing ambiguity commonly observed with random projections in compressive measurement systems.
  • 3. Multi-scale apertures: While the resolving power of the system decreased with the aperture diameter in single-aperture-size cases, improved results were demonstrated by combining multi-scale apertures, which even achieved competitive results with fewer measured pixels. We attribute this improvement to a match between sampling structure and the multi-scale features of natural images.


Fig. 12. Comparing the reconstruction results with different aperture distributions. The full resolution images were reconstructed but only the zoomed-in details of the reconstructed images are shown for easy comparison.



Table 2. Quantitative comparisons of different distribution strategies.

The first two observations allow great flexibility in camera array design, and the third observation allows using smaller apertures without compromising the resolving power of the system. These pave the way for developing cheap and portable systems for wider applications.

While we applied the same network structure used on the experimental data to the simulation data, it should be noted that reconstructing a gray-scale image is more challenging than a binary image. In fact, state-of-the-art single image super-resolution algorithms easily exceed 10 million parameters [49], whereas our network had only 5 million, so improved results could be achieved with advanced network structures or training techniques. The network structure also ignores the different scales of the measurements. Future research can focus on jointly optimizing the aperture distribution and the reconstruction network.

Results above and in the supplement document show that sparse sampling strategies can produce visually satisfactory reconstruction for most images. Further improving the reconstruction fidelity requires increasing the number of measured pixels and the Fourier space coverage. This process is easily achieved within a camera array system. Because the system throughput has been increased with multiple cameras, the reconstruction fidelity can be significantly improved by spatially shifting the system and increasing the number of snapshots. We simulate this process following the aperture distribution strategy 4.

With this aperture distribution, each snapshot captures 9 images, covering 44% of the Fourier space. To cover more of the Fourier space, we can shift the system and take multiple snapshots, so in total $9k$ images are captured with $k$ snapshots. We simulated $k = 1\cdots 6$, increasing the covered Fourier space from 44% to 100%. For the reconstruction network, we increased the growth rate to 24 and the number of building blocks in each dense block to 6. With the increased number of measurements and coverage of the Fourier space, the reconstruction may be further improved with traditional alternating projection algorithms by initializing the algorithms with the network prediction. As a benchmark, we simulated traditional FP with a standard 61% overlap and 100 measurements. We also simulated applying only the alternating projection algorithm to the 54 images captured from 6 snapshots. For fair comparison, we maintained the total number of photons from the source image, so the number of photons per measurement decreased with the increased overlap in Fourier space. We show the simulation results in Table 3. Image samples and the detailed aperture distributions are provided in Section 4 of Supplement 1.


Table 3. Quantitative comparisons of different number of snapshots. The percentages of the measured Fourier space are labeled in the table.

The results show that: 1) in low-noise conditions, increasing the number of measurements improves the reconstruction fidelity, and with the alternating projection algorithm the system achieves performance competitive with traditional FP; 2) as the noise level increases, the deep learning method outperforms alternating projection with far fewer measurements.

Because the proposed system does not require accurate calibration or a forward model, the shift in Fourier space can also be achieved by adjusting the illumination angle instead of shifting the camera system, for example by adding extra illumination sources or mounting the laser on a translation stage.

5. Conclusion

We have shown that it is possible to combine coherent image data over multiple camera apertures to super-resolve a remote scene with a single snapshot of data. Of course, our system is contrived in the sense that we have full control over the object field through an SLM, which allows us to train the system without fully calibrating the structure of the forward model. In future work, we hope to build on the results presented here to create synthetic aperture images of natural objects. We imagine that such an imaging system can be calibrated with a combination of structured illumination and test objects, but we leave demonstration of such calibration to future work. We have also compared diverse array structures and found that unstructured arrays perform best with snapshot reconstruction. Again referring to future work, we anticipate that multiframe estimation over moving platforms will further improve these results.

Funding

Defense Advanced Research Projects Agency (N66001-21-1-4030).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the experimental results presented in this paper are available in Ref. [44]. Data underlying the simulation results presented in this paper are available in Ref. [46,47].

Supplemental document

See Supplement 1 for supporting content.

References

1. M. Ryle and A. Hewish, “The synthesis of large radio telescopes,” Mon. Not. R. Astron. Soc. 120(3), 220–230 (1960). [CrossRef]  

2. S. M. Beck, J. R. Buck, W. F. Buell, R. P. Dickinson, D. A. Kozlowski, N. J. Marechal, and T. J. Wright, “Synthetic-aperture imaging laser radar: laboratory demonstration and signal processing,” Appl. Opt. 44(35), 7621–7629 (2005). [CrossRef]  

3. B. W. Krause, J. Buck, C. Ryan, D. Hwang, P. Kondratko, A. Malm, A. Gleason, and S. Ashby, Synthetic aperture ladar flight demonstration, in CLEO: Science and Innovations (Optical Society of America, 2011), p. PDPB7.

4. N. Wang, R. Wang, D. Mo, G. Li, K. Zhang, and Y. Wu, “Inverse synthetic aperture ladar demonstration: system structure, imaging processing, and experiment result,” Appl. Opt. 57(2), 230–236 (2018). [CrossRef]  

5. G. Zheng, R. Horstmeyer, and C. Yang, “Wide-field, high-resolution fourier ptychographic microscopy,” Nat. Photonics 7(9), 739–745 (2013). [CrossRef]  

6. D. J. Brady and S. Lim, Gigapixel holography, in 2011 ICO International Conference on Information Photonics, (IEEE, 2011), pp. 1–2.

7. J. R. Fienup and A. E. Tippie, Gigapixel synthetic-aperture digital holography, in Tribute to Joseph W. Goodman, vol. 8122 (International Society for Optics and Photonics, 2011), p. 812203.

8. X. Ou, R. Horstmeyer, G. Zheng, and C. Yang, “High numerical aperture fourier ptychography: principle, implementation and characterization,” Opt. Express 23(3), 3472–3491 (2015). [CrossRef]  

9. S. Dong, K. Guo, P. Nanda, R. Shiradkar, and G. Zheng, “Fpscope: a field-portable high-resolution microscope using a cellphone lens,” Biomed. Opt. Express 5(10), 3305–3310 (2014). [CrossRef]  

10. L. Bian, J. Suo, G. Situ, G. Zheng, F. Chen, and Q. Dai, “Content adaptive illumination for fourier ptychography,” Opt. Lett. 39(23), 6648–6651 (2014). [CrossRef]  

11. X. He, C. Liu, and J. Zhu, “Single-shot fourier ptychography based on diffractive beam splitting,” Opt. Lett. 43(2), 214–217 (2018). [CrossRef]  

12. B. Lee, J.-y. Hong, D. Yoo, J. Cho, Y. Jeong, S. Moon, and B. Lee, “Single-shot phase retrieval via fourier ptychographic microscopy,” Optica 5(8), 976–983 (2018). [CrossRef]  

13. J. Sun, Q. Chen, J. Zhang, Y. Fan, and C. Zuo, “Single-shot quantitative phase microscopy based on color-multiplexed fourier ptychography,” Opt. Lett. 43(14), 3365–3368 (2018). [CrossRef]  

14. A. C. Chan, J. Kim, A. Pan, H. Xu, D. Nojima, C. Hale, S. Wang, and C. Yang, “Parallel fourier ptychographic microscopy for high-throughput screening with 96 cameras (96 eyes),” Sci. Rep. 9(1), 11114 (2019). [CrossRef]  

15. J. Kim, B. M. Henley, C. H. Kim, H. A. Lester, and C. Yang, “Incubator embedded cell culture imaging system (emsight) based on fourier ptychographic microscopy,” Biomed. Opt. Express 7(8), 3097–3110 (2016). [CrossRef]  

16. P. C. Konda, J. M. Taylor, and A. R. Harvey, “Multi-aperture fourier ptychographic microscopy, theory and validation,” Optics and Lasers in Engineering 138, 106410 (2021). [CrossRef]  

17. S. Dong, R. Horstmeyer, R. Shiradkar, K. Guo, X. Ou, Z. Bian, H. Xin, and G. Zheng, “Aperture-scanning fourier ptychography for 3d refocusing and super-resolution macroscopic imaging,” Opt. Express 22(11), 13586–13599 (2014). [CrossRef]  

18. X. Ou, J. Chung, R. Horstmeyer, and C. Yang, “Aperture scanning fourier ptychographic microscopy,” Biomed. Opt. Express 7(8), 3140–3150 (2016). [CrossRef]  

19. J. Holloway, M. S. Asif, M. K. Sharma, N. Matsuda, R. Horstmeyer, O. Cossairt, and A. Veeraraghavan, “Toward long-distance subdiffraction imaging using coherent camera arrays,” IEEE Trans. Comput. Imaging 2(3), 251–265 (2016). [CrossRef]  

20. J. Holloway, Y. Wu, M. K. Sharma, O. Cossairt, and A. Veeraraghavan, “Savi: Synthetic apertures for long-range, subdiffraction-limited visible imaging using fourier ptychography,” Sci. Adv. 3(4), e1602564 (2017). [CrossRef]  

21. X. Ou, R. Horstmeyer, C. Yang, and G. Zheng, “Quantitative phase imaging via fourier ptychographic microscopy,” Opt. Lett. 38(22), 4845–4848 (2013). [CrossRef]  

22. R. Horstmeyer, J. Chung, X. Ou, G. Zheng, and C. Yang, “Diffraction tomography with fourier ptychography,” Optica 3(8), 827–835 (2016). [CrossRef]  

23. J. Chung, J. Kim, X. Ou, R. Horstmeyer, and C. Yang, “Wide field-of-view fluorescence image deconvolution with aberration-estimation from fourier ptychography,” Biomed. Opt. Express 7(2), 352–368 (2016). [CrossRef]  

24. P. C. Konda, L. Loetgering, K. C. Zhou, S. Xu, A. R. Harvey, and R. Horstmeyer, “Fourier ptychography: current applications and future promises,” Opt. Express 28(7), 9603–9630 (2020). [CrossRef]  

25. G. Zheng, C. Shen, S. Jiang, P. Song, and C. Yang, “Concept, implementations and applications of fourier ptychography,” Nat. Rev. Phys. 3(3), 207–223 (2021). [CrossRef]  

26. X. Yuan, D. J. Brady, and A. K. Katsaggelos, “Snapshot compressive imaging: Theory, algorithms, and applications,” IEEE Signal Process. Mag. 38(2), 65–88 (2021). [CrossRef]  

27. L.-H. Yeh, J. Dong, J. Zhong, L. Tian, M. Chen, G. Tang, M. Soltanolkotabi, and L. Waller, “Experimental robustness of fourier ptychography phase retrieval algorithms,” Opt. Express 23(26), 33214–33240 (2015). [CrossRef]  

28. S. Jiang, K. Guo, J. Liao, and G. Zheng, “Solving fourier ptychographic imaging problems via neural network modeling and tensorflow,” Biomed. Opt. Express 9(7), 3306–3319 (2018). [CrossRef]  

29. T. J. Schulz, D. J. Brady, and C. Wang, “Photon-limited bounds for phase retrieval,” Opt. Express 29(11), 16736–16748 (2021). [CrossRef]  

30. A. Kappeler, S. Ghosh, J. Holloway, O. Cossairt, and A. Katsaggelos, Ptychnet: Cnn based fourier ptychography, in 2017 IEEE International Conference on Image Processing (ICIP), (IEEE, 2017), pp. 1712–1716.

31. T. Nguyen, Y. Xue, Y. Li, L. Tian, and G. Nehmetallah, “Deep learning approach for fourier ptychography microscopy,” Opt. Express 26(20), 26470–26484 (2018). [CrossRef]  

32. L. Boominathan, M. Maniparambil, H. Gupta, R. Baburajan, and K. Mitra, Phase retrieval for fourier ptychography under varying amount of measurements, arXiv preprint arXiv:1805.03593 (2018).

33. F. Shamshad, F. Abbas, and A. Ahmed, Deep ptych: Subsampled fourier ptychography using generative priors, in ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), (IEEE, 2019), pp. 7720–7724.

34. J. Zhang, T. Xu, Z. Shen, Y. Qiao, and Y. Zhang, “Fourier ptychographic microscopy reconstruction with multiscale deep residual network,” Opt. Express 27(6), 8612–8625 (2019). [CrossRef]  

35. Y. Xue, S. Cheng, Y. Li, and L. Tian, “Reliable deep-learning-based phase imaging with uncertainty quantification,” Optica 6(5), 618–629 (2019). [CrossRef]  

36. D. J. Brady, Optical imaging and spectroscopy (John Wiley & Sons, 2009).

37. Z. Zhang, Z. You, and D. Chu, “Fundamentals of phase-only liquid crystal on silicon (lcos) devices,” Light Sci Appl 3(10), e213 (2014). [CrossRef]  

38. M. Persson, D. Engström, and M. Goksör, “Reducing the effect of pixel crosstalk in phase only spatial light modulators,” Opt. Express 20(20), 22334–22343 (2012). [CrossRef]  

39. W. Zaperty and T. Kozacki, Numerical model of diffraction effects of pixelated phase-only spatial light modulators, in Speckle 2018: VII International Conference on Speckle Metrology, vol. 10834 (International Society for Optics and Photonics, 2018), p. 108342A.

40. O. Ronneberger, P. Fischer, and T. Brox, U-net: Convolutional networks for biomedical image segmentation, in International Conference on Medical image computing and computer-assisted intervention, (Springer, 2015), pp. 234–241.

41. G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, Densely connected convolutional networks, in Proceedings of the IEEE conference on computer vision and pattern recognition, (2017), pp. 4700–4708.

42. D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv preprint arXiv:1412.6980 (2014).

43. M. Gharbi, G. Chaurasia, S. Paris, and F. Durand, “Deep joint demosaicking and denoising,” ACM Trans. Graph. 35(6), 1–12 (2016). [CrossRef]  

44. C. Wang, M. Hu, and D. J. Brady, “Snapshot ptychography code,” GitHub (2021), https://github.com/djbradyAtOpticalSciencesArizona/arrayCameraFourierPtychography.

45. G. Zheng, Fourier ptychographic imaging: a MATLAB tutorial (Morgan & Claypool Publishers, 2016).

46. E. Agustsson and R. Timofte, Ntire 2017 challenge on single image super-resolution: Dataset and study, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, (2017).

47. CLIC, Workshop and challenge on learned image compression, (2018). http://clic.compression.cc/2018/.

48. K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising,” IEEE Trans. on Image Process. 26(7), 3142–3155 (2017). [CrossRef]  

49. Y. Guo, J. Chen, J. Wang, Q. Chen, J. Cao, Z. Deng, Y. Xu, and M. Tan, Closed-loop matters: Dual regression networks for single image super-resolution, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2020).

Supplementary Material (1)

Supplement 1: Supplemental example images

