
Multiscale diffractive U-Net: a robust all-optical deep learning framework modeled with sampling and skip connections

Open Access

Abstract

As an all-optical learning framework, diffractive deep neural networks (D2NNs) have great potential in running speed, data throughput, and energy consumption. The depth of the networks and the misalignment of the layers are two problems that limit their further development. In this work, a robust all-optical network framework based on multi-scale feature fusion, the multiscale diffractive U-Net (MDUNet), is proposed. The depth expansion and alignment robustness of the network are significantly improved by introducing sampling and skip connections. Compared with common all-optical learning frameworks, MDUNet achieves the highest accuracies of 98.81% and 89.11% on MNIST and Fashion-MNIST respectively. The testing accuracies on MNIST and Fashion-MNIST can be further improved to 99.06% and 89.86% respectively by using the ensemble learning method to construct an optoelectronic hybrid neural network.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

As one of the most rapidly developing directions of artificial intelligence, deep learning has achieved great success in many fields [1–4]. By training a neural network model, a nonlinear mapping relationship between input and output is established [5–7]. Expanding the network scale is usually necessary when neural networks are used to solve complex tasks [8,9], which places strict requirements on the computing efficiency and data throughput of hardware. Traditional electronic deep neural networks use large-scale transistor circuits to perform calculations [10]. However, the scaling of electronic transistors described by Moore's law is approaching its physical limit, and performance growth through electronic hardware implementations has become unsustainable [11]. Speed and energy are fundamentally limited by parasitic capacitance, the tunnelling effect and crosstalk [12]. Optical operations can perform convolution, Fourier transforms and differentiation at the speed of light, providing additional fitting degrees of freedom for neural networks [13,14]. In early works, optical operations usually played an auxiliary role in reducing the scale of electrical neural networks [15]. Recently, the combination of artificial intelligence and nanophotonics has produced a number of nanophotonic devices, which offer a new approach to this problem [16]. Photonic circuits integrated on a chip can simulate synapses and neural signaling to build artificial neural networks [17]. The diffractive deep neural network (D2NN) is an all-optical neural network framework based on holographic technology, which can be built with deep learning to perform a variety of complex functions [18]. The appearance of the D2NN provides a new possibility for overcoming the inherent defects of electrical neural networks [19].

The D2NN adopts the training mode of traditional machine learning but strictly follows the optical transfer model during modeling [20]. Passive multi-layer diffractive layers control the phase/amplitude of the transmitted light field to achieve all-optical inference with low energy consumption and high throughput. D2NNs with cascaded architectures are widely used in classification tasks, object segmentation, optical shaping and logical calculation [21–25]. Although D2NNs have been successfully applied in many fields, the depth of the networks [26] and the misalignment of the layers remain two problems that limit their further development. The residual block (RB) was introduced from the electrical neural network model ResNet to expand the depth of the networks [27]. However, RBs can only transmit short-range residual information, and the corresponding pixels of the diffractive layers are required to be strictly aligned. A training scheme was introduced that significantly increased the robustness of diffractive networks against misalignments [28]. However, a large number of error samples must be generated in advance, and the testing accuracy under normal conditions may be affected.

As a mature framework in electrical neural networks, U-Net has been widely used in many fields [29,30]. U-Net was first proposed to solve the problem of biological image segmentation [31]. In optics, U-Net has been used for tasks such as wavefront correction and scattering imaging: the wavefront correction of holographic tomography was performed by U-Net to obtain three-dimensional images of protozoan cell samples [32], and speckle correlations were extracted by U-Net to better complete speckle image reconstruction [33]. The excellent performance of U-Net is attributed to two special structures, sampling and skip connections: features of the target at different scales are obtained through multiple downsampling and upsampling steps, and the multi-scale features are fused through skip connections. In this paper, a robust all-optical deep learning framework based on multi-scale feature fusion, called the multiscale diffractive U-Net (MDUNet), is proposed; it introduces sampling and skip connections to achieve multi-scale feature extraction and fusion.

The sampling module is realized by changing the pixel size and resolution of each passive diffractive layer, while beam splitters and reflectors fuse the features of the corresponding scale. The sampling module significantly improves the robustness of the model, and the skip connections allow the network depth to be expanded. Compared with the cascaded network, the parameter requirements of the model are also reduced as the sampling depth increases. We set up comparative simulations between MDUNet and the ordinary cascaded D2NN on MNIST and Fashion-MNIST to verify their performance. Compared with common all-optical learning frameworks, MDUNet achieves the highest accuracies of 98.81% and 89.11% on MNIST and Fashion-MNIST respectively. The testing accuracies on MNIST and Fashion-MNIST can be further improved to 99.06% and 89.86% respectively by using the ensemble learning method to construct an optoelectronic hybrid neural network. After four sampling operations, the testing accuracy remains above 90% even if a diffractive layer is displaced by 7 wavelengths in the radial direction.

2. Modeling methods and parameters

The phase and amplitude modulation are carried out independently through multiple passive diffractive layers. Each neuron acts as a secondary source of the wave, and the pixel-wise modulated light at each layer is regarded as a new diffractive light source that propagates forward [18], as shown in Fig. 1(a).


Fig. 1. (a) Architecture of the MDUNet. The neurons of each layer are linked to the neurons of the neighboring layer through free-space wave propagation. (b) Schematic diagram of the downsampling and upsampling principle. In the downsampling (upsampling) process, the pixel resolution of the diffractive layer is reduced to 1/2 (enlarged to 2 times) of the previous layer, and the pixel size is enlarged to 2 times (reduced to 1/2). (c) The framework of MDUNet for MNIST and Fashion-MNIST. The features of different scales after sampling are shown in Supplement 1 (see Fig. S1).


As an all-optical learning framework, the neurons of each layer are linked to the neurons of the neighboring layer through free-space wave propagation following the Rayleigh-Sommerfeld [34] diffraction equation in the far field regime:

$${w_i}(x,y,z) = \frac{{z - {z_i}}}{{{r^2}}}\left( {\frac{1}{{2\pi r}} + \frac{1}{{j\lambda }}} \right)\exp \left( {\frac{{j2\pi r}}{\lambda }} \right),$$
where i represents the i-th pixel of a given layer of the system located at position $({x_i},{y_i},{z_i})$, $\lambda$ is the operating wavelength, $r = \sqrt{(x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2}$ and $j = \sqrt{-1}$. Due to the limited size of the diffractive layer, the energy loss caused by propagation is also accounted for through Eq. (1). More details about the network principle and training parameter settings of the D2NN are given in Supplement 1.
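For concreteness, a minimal NumPy sketch of Eq. (1) is given below; the function names are ours, and the direct double sum is written for clarity only (practical implementations typically use FFT-based angular-spectrum propagation instead of this O(N⁴) evaluation):

```python
import numpy as np

def rs_weight(x, y, z, xi, yi, zi, wavelength):
    """Secondary-source weight w_i(x, y, z) of Eq. (1)."""
    r = np.sqrt((x - xi) ** 2 + (y - yi) ** 2 + (z - zi) ** 2)
    return ((z - zi) / r ** 2) * (1.0 / (2 * np.pi * r) + 1.0 / (1j * wavelength)) \
        * np.exp(1j * 2 * np.pi * r / wavelength)

def propagate(field, pixel_size, distance, wavelength):
    """Propagate a modulated layer output to the next layer by summing all
    secondary sources over the grid (direct evaluation of Eq. (1))."""
    n = field.shape[0]
    coords = (np.arange(n) - n // 2) * pixel_size
    X, Y = np.meshgrid(coords, coords)
    out = np.zeros((n, n), dtype=complex)
    for p in range(n):
        for q in range(n):
            out += field[p, q] * rs_weight(X, Y, distance, X[p, q], Y[p, q], 0.0, wavelength)
    return out
```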

The most distinctive architectural elements of MDUNet are the downsampling-upsampling modules (D-UM) and skip connections (SC). The D-UM performs downsampling or upsampling by changing the pixel size of the passive diffractive layer, as shown in Fig. 1(b). The sampling rate is set to 2 in this paper: in the downsampling process, the pixel resolution of the diffractive layer is reduced to 1/2 of the previous layer and the pixel size is doubled; conversely, in the upsampling process, the single-sided pixel resolution is doubled relative to the previous layer and the pixel size is halved. In MDUNet, features of different scales are defined as the light-field distributions after sampling at different scales. Since light fields can be superimposed directly by interference, optical features at corresponding scales can be fused. The network performance can be improved by using sampling modules at different scales to obtain multi-scale features of the target.
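As an illustration, the field bookkeeping of one D-UM stage might be sketched as follows; the coherent 2×2 superposition used for downsampling is our assumption about how four half-size pixels map onto one double-size pixel, not a prescription from the paper:

```python
import numpy as np

def downsample_2x(field):
    """Pixel resolution halves, pixel size doubles: each 2x2 block of the
    previous layer is assumed to superimpose coherently on one larger pixel."""
    return (field[0::2, 0::2] + field[1::2, 0::2]
            + field[0::2, 1::2] + field[1::2, 1::2])

def upsample_2x(field):
    """Pixel resolution doubles, pixel size halves: each pixel of the previous
    layer spreads over a 2x2 block of smaller pixels."""
    return np.repeat(np.repeat(field, 2, axis=0), 2, axis=1)
```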

As for the SC, beam splitters (BS) with a trainable splitting ratio γ are used to establish the skip connections, which route the light field to the designated layer via reflectors. The splitting ratio varies in the range 0–1: if γ is 0, the incident signal is transmitted entirely by reflection through the skip connection; if γ is 1, the incident signal is transmitted entirely through the main path, i.e., the skip connection is removed. During training, this parameter is optimized by the backpropagation algorithm. Thus, the BS can be modeled as:

$${U_t} = {U_{in}} \times \sqrt \gamma ,$$
$${U_r} = {U_{in}} \times \sqrt {1 - \gamma } ,$$
where ${U_{in}}$ is the input light field, ${U_t}$ is the transmitted light field and ${U_r}$ is the reflected light field. Light-field propagation is again modeled by Eq. (1). In a physical experiment, the relative time delay should be eliminated by extending the main path to match the optical length of the skip connection.
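A short sketch of the trainable beam splitter of Eqs. (2)–(3), written here against the TensorFlow API the paper reports using (variable and function names are ours):

```python
import tensorflow as tf

# Trainable splitting ratio, initialized to 1 (skip connection initially
# inactive, as described later in this section); clipping keeps it in the
# physical range [0, 1].
gamma = tf.Variable(1.0, dtype=tf.float32)

def beam_splitter(u_in, gamma):
    """Eqs. (2)-(3): U_t = U_in * sqrt(gamma), U_r = U_in * sqrt(1 - gamma)."""
    g = tf.clip_by_value(gamma, 0.0, 1.0)
    u_t = u_in * tf.cast(tf.sqrt(g), u_in.dtype)
    u_r = u_in * tf.cast(tf.sqrt(1.0 - g), u_in.dtype)
    return u_t, u_r

# The skip connection later fuses u_r with the main-path field by
# interference, i.e. direct superposition of the two complex fields:
# fused_field = main_path_field + u_r
```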

To increase the inference capability of the MDUNet, a nonlinear optical layer is added after the passive diffractive layer using a photorefractive crystal (strontium barium niobate, SBN:60), which generates nonlinear phase modulation according to the incident light intensity [35,36]. The crystal thickness and the voltage across the crystal can be set to 1 mm and 972 V respectively, for which the phase variation of the nonlinear material lies between 0 and $\pi$ and can be formulated following Ref. [25]:

$$\Delta \phi = \pi \langle I \rangle / ({1 + \langle I \rangle}).$$
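A minimal sketch of this activation applied to a complex field follows; whether ⟨I⟩ denotes the local pixel intensity (as assumed here) or a spatially averaged one, and how it is normalized, are our assumptions:

```python
import numpy as np

def sbn_activation(field):
    """Nonlinear phase modulation of Eq. (4): dphi = pi*<I>/(1 + <I>),
    a saturating 0..pi phase shift added on top of the field's own phase."""
    intensity = np.abs(field) ** 2   # assumed: local intensity, pre-normalized
    dphi = np.pi * intensity / (1.0 + intensity)
    return field * np.exp(1j * dphi)
```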

Taking a 7-layer MDUNet as an example, its network structure is shown schematically in Fig. 1(c). It consists of 3 pairs of D-UMs and SCs, and the last diffractive layer is used to improve the SNR of the results. Considering the high integration requirements, MDUNet operates at a wavelength of 785 nm. A resolution of 128 × 128 pixels is used for the input and output, and the minimum pixel size is set to 420 nm. The details of the diffractive layers are given in Supplement 1.

The corresponding amplitude and phase can be obtained through Eqs. (S4)–(S5) (see Supplement 1). As mentioned above, amplitude modulation is limited to (0, 1), while phase modulation is limited to (0, 2π) due to periodicity [37]. To ensure maximum energy transfer efficiency, the amplitude modulation is initialized to 1, while the phase modulation is initialized with random weights owing to the periodicity of the phase. The initial value of the splitting ratio $\gamma$ is set to 1, and its training interval is 0–1. The input of MDUNet is divided into pixels with a specific resolution through amplitude coding. The output plane of the network is divided into 10 non-overlapping regions, each representing a class in the dataset. During the training phase, the SoftMax function is applied before the output layer to highlight the region with the highest light intensity [38]; this setting only improves gradient transfer during training and does not affect the intensity distribution in actual testing. The network parameters are optimized by the backpropagation algorithm, with a learning rate of 0.001 and a batch size of 8; the optimizer is Adam and the loss function is cross entropy. The model was implemented in Python 3.7.11 with TensorFlow 1.15 (Google Inc.) and run on a desktop computer (Nvidia Titan RTX graphics processing unit, Intel Xeon Platinum 8160 CPU, 48 cores, 256 GB RAM, Microsoft Windows 10).
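The region readout and training loss described above can be sketched as follows (the detector-region layout and tensor shapes are our assumptions; only the Adam settings and the softmax-plus-cross-entropy recipe come from the text):

```python
import tensorflow as tf

def class_scores(output_field, region_masks):
    """Integrate output-plane intensity over the 10 non-overlapping detector
    regions. output_field: [batch, H, W] complex; region_masks: [10, H, W]
    binary masks (their layout is assumed here)."""
    intensity = tf.abs(output_field) ** 2
    return tf.einsum('bhw,chw->bc', intensity, region_masks)

def training_loss(output_field, labels, region_masks):
    """Softmax over region intensities + cross entropy; the softmax only aids
    gradient transfer during training, while at test time the class is simply
    the region with the highest intensity."""
    logits = class_scores(output_field, region_masks)
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                       logits=logits))

# TF 1.15 style training step (Adam, learning rate 0.001, batch size 8):
# train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
```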

3. Results

3.1 Comparison of MDUNet and D2NN on MNIST and Fashion-MNIST

MNIST and Fashion-MNIST are used to compare the classification accuracy of the D2NN and the MDUNet with different numbers of layers. Both models include amplitude, phase and nonlinear activation layers and are trained with the same parameters mentioned above. The results are shown in Fig. 2. Since D-UMs always appear in symmetric pairs, the tests start with a 5-layer network and two layers are added at a time.


Fig. 2. Comparison of blind test accuracy and number of neurons between MDUNet and D2NN for different number of layers: (a) MNIST, (b) Fashion-MNIST. (c) The number of neurons of MDUNet and D2NN with different layers.


Figures 2(a) and 2(b) show that, for the same number of layers, the testing accuracy of MDUNet is always better than that of the D2NN. As the number of layers increases, vanishing and exploding gradients gradually appear in the cascaded D2NN, degrading the final classification accuracy. In contrast, MDUNet can effectively transmit gradient information as the number of layers increases, because skip connections exist between the corresponding downsampling and upsampling layers. The depth advantage of MDUNet is therefore reflected as the network becomes deeper.

Furthermore, MDUNet significantly reduces the number of training parameters thanks to the D-UMs. For input objects with a resolution of 128 × 128 pixels, the number of neurons of MDUNet and D2NN is compared in Fig. 2(c). As the number of layers increases, the number of neurons in D2NN grows linearly, while the number of neurons in MDUNet grows only slightly owing to the D-UMs and SCs. Compared with Ref. [26], MDUNet improves the testing accuracy on MNIST and Fashion-MNIST to 98.81% and 89.11% respectively, although only 11-layer networks and about $12 \times 10^4$ neurons are used for training.

Trainable skip connections and fewer parameters reduce the complexity of the network, and the convergence efficiency of MDUNet is greatly improved. Figure 3 compares the convergence of the cascaded D2NN and MDUNet with different numbers of layers. The convergence efficiency of MDUNet is clearly higher than that of the ordinary cascaded D2NN for the same number of layers, and this advantage becomes more pronounced as the depth increases. In Fig. 3(b), although the 5-layer network has higher accuracy than the deeper networks immediately after random initialization, the deeper networks eventually surpass it with further training.


Fig. 3. Comparison of convergence plots of MDUNet and D2NN: (a) MNIST, (b) Fashion-MNIST.


The blind inference accuracies of MDUNet and D2NN on MNIST against various levels of misalignment in four directions (up, down, left, right) are compared in Fig. 4. For each network, the most central layer was selected as the testing target, so as to probe the influence of diffractive-layer misalignment on inference performance to the maximum extent. Figure 4 shows that MDUNet remains highly robust over a range of offsets of the diffractive layer within the D-UMs; as the offset continues to increase, the accuracy remains relatively stable. This is because MDUNet takes alignment robustness into account when establishing the model: through the downsampling-upsampling module, multiple (single) pixels of the upper layer in the downsampling (upsampling) stage correspond to a single pixel (multiple pixels) of the next layer in spatial position, so a small alignment deviation has little impact on the final result. As the sampling depth increases, the robustness becomes stronger. Compared with Ref. [28], MDUNet achieves better robustness after the third downsampling, and the testing accuracy does not decrease due to the introduction of error data.


Fig. 4. Comparison of the blind inference accuracies of MDUNet and D2NN for MNIST against various levels of misalignment in four directions (7-Layers: 4th layer, 9-Layers: 5th layer, 11-Layers: 6th layer).


Furthermore, because the pixel feature sizes differ at different sampling depths, the offset robustness also exhibits periodic stability, which is attributed to the spatial correlation of the diffractive layers introduced by the model. To verify the relationship between spatial correlation and periodic robustness, we calculated autocorrelation and cross-correlation curves between the testing layers and their adjacent layers at three different sampling depths, shown in Fig. 5.
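The paper does not give the correlation estimator; a plausible normalized cross-correlation sketch over lateral whole-pixel shifts (with periodic boundaries via np.roll as a simplification) is:

```python
import numpy as np

def shift_correlation(layer_a, layer_b, max_shift):
    """Normalized correlation between two phase maps as layer_b is shifted
    laterally by whole pixels; passing layer_a == layer_b yields the
    autocorrelation curve."""
    def ncc(a, b):
        a = a - a.mean()
        b = b - b.mean()
        return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    return [ncc(layer_a, np.roll(layer_b, s, axis=1)) for s in range(max_shift + 1)]
```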


Fig. 5. Autocorrelation curve and cross-correlation curve with adjacent layers. (a)-(c) MDUNet: (a) 7-Layers: 4th layer, (b) 9-Layers: 5th layer, (c) 11-Layers: 6th layer, (d) D2NN: 9-Layers: 5th layer.


Due to the D-UMs, MDUNets with different sampling depths have different feature sizes: the feature size of the diffractive layer doubles with each increase in sampling depth. This gives the correlation curve a period corresponding to the feature size. As seen in Figs. 5(a)–(c), the correlation curve changes linearly within each period, and the trend remains consistent. This periodic variation greatly improves the stability of pixel matching between layers. The drop in autocorrelation within the first feature size is the main reason for the initial decrease in testing accuracy after the testing layer is shifted in Fig. 4; the larger the autocorrelation period, the slower the accuracy drop caused by a diffractive-layer shift. From the second period onward, the autocorrelation curve changes gradually, and its interplay with the cross-correlation leads to the periodic variation of the testing accuracy. Because the correlation curves change slowly and consistently within a period, the testing accuracy fluctuates only within a small range in the same period. Only when the dislocation occurs at a period boundary do the autocorrelation and cross-correlation change abruptly, causing a large change in the testing accuracy.

As seen from Fig. 5(d), the diffractive layer of the D2NN has spatial correlation only within a single pixel. In other words, every pixel shift of the diffractive layer leads to an abrupt change in the correlation, which explains the sharp decline of the D2NN testing accuracy in Fig. 4. Compared with the D2NN, MDUNet introduces spatial correlation into the network modeling process through the D-UMs, effectively improving the robustness of the network.

3.2 Optoelectronic hybrid model constructed by ensemble learning

Compared with the CNN architecture LeNet, the accuracy of the all-optical learning framework MDUNet approaches that of LeNet-4 on MNIST (98.9%) and is comparable to LeNet-4 on Fashion-MNIST (89.9%). Analysis of the testing results of MDUNet with different sampling depths (see Supplement 1) shows that different sampling depths lead to differences in the feature extraction of targets. To combine the advantages of the different sampling models, weighted voting was added after the all-optical MDUNet: the testing results of the MDUNets with 5/7/9/11 layers were combined by weighted voting according to their testing accuracies, and the optoelectronic hybrid model EL-MDUNet was constructed by the ensemble learning method [22,39], as shown in Fig. 6.


Fig. 6. The optoelectronic hybrid model (EL-MDUNet) constructed by ensemble learning.


The differences in testing accuracy among networks with different sampling depths are small. To increase the sensitivity of the hybrid network to these differences, an exponential function combined with an adjustment coefficient K is used. The following definition is adopted for the weight distribution of the different models:

$$weight_i = \frac{\exp ({K \times Acc_i})}{\sum\limits_{n = 1}^{N} \exp ({K \times Acc_n})},$$
where i is the model index, N is the number of models, K is the weight adjustment coefficient (set to 2 for MNIST and 1.5 for Fashion-MNIST in this model), and $Acc_i$ is the testing accuracy of the i-th model. With this weighting, models with higher accuracy receive higher weights; under this setting, the final output is affected only when two or three shallow networks disagree with the deep networks simultaneously.
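Eq. (5) and the subsequent vote can be sketched directly (the per-model class scores and the example accuracies below are illustrative placeholders, except the 11-layer MNIST value of 98.81% reported above):

```python
import numpy as np

def ensemble_weights(accuracies, K):
    """Eq. (5): weight_i = exp(K * Acc_i) / sum_n exp(K * Acc_n)."""
    w = np.exp(K * np.asarray(accuracies, dtype=float))
    return w / w.sum()

def weighted_vote(scores, weights):
    """Fuse per-model class scores [n_models, n_classes] by a weighted sum
    and return the winning class index."""
    return int(np.argmax(np.tensordot(weights, scores, axes=1)))

# MNIST example (K = 2); the first three accuracies are placeholders:
w = ensemble_weights([0.980, 0.984, 0.986, 0.9881], K=2)
```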

Weighted voting over the predictions of the 5-layer, 7-layer, 9-layer and 11-layer MDUNet models finally improves the blind test accuracy on MNIST and Fashion-MNIST to 99.06% and 89.86% respectively. The confusion matrices of the test results are shown in Fig. 7; the classification accuracy of all 10 target classes remains at a high level under the ensemble learning model.


Fig. 7. Confusion matrix for inference results of ensemble learning model. (a): MNIST, (b): Fashion-MNIST.


The testing accuracies of different models on the two datasets (MNIST and Fashion-MNIST) are compared in Table 1. Compared with other all-optical D2NN models, MDUNet achieves higher accuracy with fewer neurons. With ensemble learning, the testing accuracy is further improved: the accuracy of EL-MDUNet is comparable to that of optoelectronic hybrid neural networks and takes the lead on the MNIST dataset.


Table 1. Testing accuracy of different models for MNIST and Fashion-MNIST datasets

4. Conclusion

An all-optical neural network architecture, MDUNet, with both depth extensibility and alignment robustness has been proposed. We have introduced downsampling and upsampling modules into the D2NN, which directly increases the alignment robustness between layers and reduces the required number of neurons from the perspective of model construction. The network effectively extracts and fuses target features at different scales, and the spatial correlation of the modulation layers is enhanced through the sampling operation. The network depth can be effectively extended by introducing skip connections to fuse the features of corresponding scales, and the network performance improves significantly as the depth increases. MDUNet offers a new approach to multi-scale feature extraction and fusion in optical neural networks.

Funding

Shanghai Municipal Science and Technology Major Project; Shanghai Frontiers Science Center Program (2021-2025 No. 20); National Key Research and Development Program of China (2021YFB2802000); National Natural Science Foundation of China (11972212, 12002213, 12072200, 61975123).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]  

2. M. L. Minsky, “Neural nets and the brain-model problem,” Ph.D. thesis (Princeton University, 1954).

3. G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science 313(5786), 504–507 (2006). [CrossRef]  

4. D. Silver, A. Huang, C. J. Maddison, et al., “Mastering the game of Go with deep neural networks and tree search,” Nature 529(7587), 484–489 (2016). [CrossRef]

5. S. S. Haykin and R. Gwynn, Neural Networks and Learning Machines (Prentice Hall, 2008).

6. S. S. Haykin, Neural Networks - A Comprehensive Foundation (Prentice Hall, 1994).

7. T. Bouwmans, S. Javed, M. Sultana, and S. K. Jung, “Deep neural network concepts for background subtraction: A systematic review and comparative evaluation,” Neural Networks 117, 8–66 (2019). [CrossRef]  

8. S. Bubeck and M. Sellke, “A Universal Law of Robustness via Isoperimetry,” in Advances in Neural Information Processing Systems (NeurIPS) 34 (2021).

9. N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, and J. Dean, “Outrageously large neural networks: The sparsely-gated mixture-of-experts layer,” presented at the ICLR, France, 24–26 (2017).

10. B. McMullin, “John von Neumann and the Evolutionary Growth of Complexity: Looking Backward, Looking Forward,” Artificial Life 6(4), 347–361 (2000). [CrossRef]  

11. M. M. Waldrop, “The chips are down for Moore’s law,” Nature 530(7589), 144–147 (2016). [CrossRef]  

12. D. A. B. Miller, “Attojoule optoelectronics for low-energy information processing and communications,” J. Lightwave Technol. 35(3), 346–396 (2017). [CrossRef]  

13. H. M. Ozaktas, B. Barshan, and D. Mendlovic, “Convolution and Filtering in Fractional Fourier Domains,” Opt. Rev. 1(1), 15–16 (1994). [CrossRef]  

14. D. Mendlovic and H. M. Ozaktas, “Fractional Fourier transform and their optical implementation: I,” J. Opt. Soc. Am. A 10(9), 1875–1880 (1993). [CrossRef]  

15. J. Chang, V. Sitzmann, D. Xiong, W. Heidrich, and G. J. S. R. Wetzstein, “Hybrid optical-electronic convolutional neural networks with optimized diffractive optics for image classification,” Sci. Rep. 8(1), 12324 (2018). [CrossRef]  

16. E. Goi, Q. Zhang, X. Chen, H. Luan, and M. Gu, “Perspective on photonic memristive neuromorphic computing,” PhotoniX 1(1), 3 (2020). [CrossRef]  

17. Q. Zhang, H. Yu, M. Barbiero, B. Wang, and M. Gu, “Artificial neural networks enabled by nanophotonics,” Light: Sci. Appl. 8(1), 42 (2019). [CrossRef]  

18. X. Lin, Y. Rivenson, N. T. Yardimci, M. Veli, Y. Luo, M. Jarrahi, and A. Ozcan, “All-optical machine learning using diffractive deep neural networks,” Science 361(6406), 1004–1008 (2018). [CrossRef]  

19. T. Zhou, X. Lin, J. Wu, Y. Chen, H. Xie, Y. Li, J. Fan, H. Wu, L. Fang, and Q. Dai, “Large-scale neuromorphic optoelectronic computing with a reconfigurable diffractive processing unit,” Nat. Photonics 15(5), 367–373 (2021). [CrossRef]  

20. E. Goi, X. Chen, Q. Zhang, B. P. Cumming, S. Schoenhardt, H. Luan, and M. Gu, “Nanoprinted high-neuron-density optical linear perceptrons performing near-infrared inference on a CMOS chip,” Light: Sci. Appl. 10(1), 40 (2021). [CrossRef]  

21. C. Hu, H. Huang, M. Chen, S. Yang, and H. Chen, “Video object detection from one single image through opto-electronic neural network,” APL Photonics 6(4), 046104 (2021). [CrossRef]  

22. J. Shi, L. Zhou, T. Liu, C. Hu, K. Liu, J. Luo, H. Wang, C. Xie, and X. Zhang, “Multiple-view D2NNs array: realizing robust 3D object recognition,” Opt. Lett. 46(14), 3388–3391 (2021). [CrossRef]  

23. C. Qian, X. Lin, X. Lin, J. Xu, Y. Sun, E. Li, B. Zhang, and H. Chen, “Performing optical logic operations by a diffractive neural network,” Light: Sci. Appl. 9(1), 59 (2020). [CrossRef]  

24. J. Shi, D. Wei, C. Hu, M. Chen, K. Liu, J. Luo, and X. Zhang, “Robust light beam diffractive shaping based on a kind of compact all-optical neural network,” Opt. Express 29(5), 7084–7099 (2021). [CrossRef]  

25. Y. Tao, J. Wu, T. Zhou, H. Xie, F. Xu, J. Fan, L. Fang, X. Lin, and Q. Dai, “Fourier-space Diffractive Deep Neural Network,” Phys. Rev. Lett. 123(2), 023901 (2019). [CrossRef]  

26. R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training Recurrent Neural Networks,” in Proceedings of the 30th International Conference on Machine Learning, PMLR28, 1310–1318 (2013).

27. H. Dou, T. Deng, T. Yan, H. Wu, X. Lin, and Q. Dai, “Residual D2NN: training diffractive deep neural networks via learnable light shortcuts,” Opt. Lett. 45(10), 2688–2691 (2020). [CrossRef]  

28. D. Mengu, Y. Zhao, N. T. Yardimci, Y. Rivenson, M. Jarrahi, and A. Ozcan, “Misalignment resilient diffractive optical networks,” Nanophotonics 9(13), 4207–4219 (2020). [CrossRef]

29. N. Cinar, A. Ozcan, and M. Kaya, “A hybrid DenseNet121-UNet model for brain tumor segmentation from MR Images,” Biomed. Signal Process. Control 76, 103647 (2022). [CrossRef]  

30. M. Yang, Y. Yuan, and G. Liu, “SDUNet: Road extraction via spatial enhanced and densely connected UNet,” Pattern Recognit. 126, 108549 (2022). [CrossRef]  

31. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” Medical Image Computing and Computer-Assisted Intervention (Springer, 2015), pp. 234–241.

32. L. Lin, C. Huang, Y. Chen, D. Chu, and C. Chen, “Deep learning-assisted wavefront correction with sparse data for holographic tomography,” Opt. Lasers Eng. 154, 107010 (2022). [CrossRef]  

33. Y. Wang, Z. Lin, H. Wang, C. Hu, H. Yang, and M. Gu, “High-generalization deep sparse pattern reconstruction: feature extraction of speckles using self-attention armed convolutional neural networks,” Opt. Express 29(22), 35702–35711 (2021). [CrossRef]  

34. J. W. Goodman, Introduction to Fourier Optics, 3rd ed. (Roberts & Company, 2005).

35. R. W. Boyd, Nonlinear Optics (Elsevier, 2003).

36. R. Amin, J. K. George, H. Wang, R. Malti, Z. Ma, H. Dalir, J. B. Khurgin, and V. J. Sorger, “An ITO–graphene heterojunction integrated absorption modulator on Si-photonics for neuromorphic nonlinear activation,” APL Photonics 6(12), 120801 (2021). [CrossRef]  

37. D. Mengu, Y. Luo, Y. Rivenson, and A. Ozcan, “Analysis of Diffractive Optical Neural Networks and Their Integration with Electronic Neural Networks,” IEEE J. Sel. Top. Quantum Electron. 26(1), 1–14 (2020). [CrossRef]  

38. E. Grave, A. Joulin, M. Cissé, D. Grangier, and H. Jégou, “Efficient softmax approximation for GPUs,” in Proceedings of the 34th International Conference on Machine Learning (2017), pp. 1302–1310.

39. M. S. S. Rahman, J. Li, D. Mengu, Y. Rivenson, and A. Ozcan, “Ensemble learning of diffractive optical networks,” Light: Sci. Appl. 10(1), 14 (2021). [CrossRef]  

Supplementary Material

Supplement 1: 1. Network modeling and parameter definition; 2. Features of different scales of MDUNet.
