
Image-free classification of fast-moving objects using “learned” structured illumination and single-pixel detection

Open Access

Abstract

Object classification generally relies on image acquisition and subsequent analysis. Real-time classification of fast-moving objects is a challenging task. Here we propose an approach for real-time classification of fast-moving objects without image acquisition. The key to the approach is to use structured illumination and single-pixel detection to acquire the object features directly. A convolutional neural network (CNN) is trained to learn the object features. The “learned” object features are then used as structured patterns for structured illumination. Object classification is achieved by collecting the resulting light signals with a single-pixel detector and feeding the single-pixel measurements to the trained CNN. In our experiments, we show that accurate and real-time classification of fast-moving objects can be achieved. Potential applications of the proposed approach include rapid classification of flowing cells, assembly-line inspection, and aircraft classification in defense applications. Benefiting from the use of a single-pixel detector, the approach might also be applicable to the classification of hidden moving objects.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Object classification has found important applications in various fields, such as traffic control [1], assembly-line industrial inspection [2], remote sensing [3], medical cytometry [4], and military applications [5]. Most available object classification systems are based on image acquisition and analysis (also known as machine vision). Object classification algorithms rely on the features extracted from the target images. Thus, an image-based object classification system often employs an “acquire-and-extract” strategy, in which the system first acquires images of the target objects and then deploys certain image analysis algorithms to extract the object features for classification.

Although image-based object classification systems have been explored for decades [6–8], classification of fast-moving objects in real time and over a long duration remains a challenging task. Fast-moving objects often produce dramatic motion blur in the captured images. The induced motion blur causes severe image quality degradation and therefore reduces the achievable classification accuracy. High-speed photography [9–11] and sophisticated image analysis algorithms [12,13] are possible approaches to handling motion blur. However, it is challenging to apply high-speed photography over a long duration, as the large amount of data generated demands corresponding data storage, bandwidth, and processing capacity. In addition, sophisticated image analysis algorithms are, in general, computationally expensive and therefore not suited for real-time classification.

Image-based object classification systems commonly capture two-dimensional (2-D) images for subsequent feature extraction. Apart from the object features, images generally contain a large amount of redundant data that has to be acquired, transferred, and processed in these image-based systems. Recently, S. Ota et al. proposed an image-free method for the classification of high-speed moving blood cells, termed ghost cytometry [4]. Inspired by the advancement of ghost imaging [14,15], this method uses a static, random pattern to modulate the illumination light field. As such, the feature information of high-speed moving cells is compressively encoded into a one-dimensional (1-D) light intensity signal. The resulting 1-D light signal is collected by a single-pixel detector and fed to a support vector machine for cell classification. This method demonstrates that the image-free scheme enables higher data efficiency by avoiding redundant image acquisition. Similarly, several other single-pixel-imaging-inspired methods have been proposed for static object classification [16–18].

In this paper, we report an image-free object classification approach that involves adaptive structured illumination and single-pixel detection. Different from the conventional “acquire-and-extract” strategy, our approach employs an “extract-and-acquire” strategy. In this strategy, we first train a convolutional neural network (CNN) to learn the object features. The “learned” object features are then used as structured patterns for illumination. As such, the feature information of the target objects can be compressively encoded into 1-D light intensity signals. We use a single-pixel detector to acquire the resulting signals and feed them to the trained CNN for object classification. The reported approach is both data- and computation-efficient, allowing real-time and long-duration classification of fast-moving objects. We note that the combination of deep learning and single-pixel detection has recently been demonstrated for reducing data acquisition time or improving image reconstruction quality in imaging [19–22]. Here we further demonstrate that such a combination can also achieve fast-moving object classification in an image-free manner, which might open up a new avenue for single-pixel imaging.

The rest of the paper is organized as follows. In Section 2, we describe the principle and the implementation of the proposed image-free approach. In Section 3, we present the design of our CNN and how it is trained for moving object classification; we also describe how the object features are extracted from the trained CNN and used to generate the patterns for structured illumination. In Section 4, we present an experiment that validates the proposed approach in the classification of fast-moving digits. The performance, advantages, and further development of the approach are discussed in Section 5. Finally, Section 6 offers concluding remarks.

2. Principle

Single-pixel imaging has been demonstrated to be an effective way of acquiring spatial information. It consists of two steps: structured illumination and single-pixel detection [23,24]. Structured illumination refers to the projection of a sequence of spatially modulated patterns onto the target objects. The desired object information can be encoded in this modulation process. Single-pixel detection refers to the detection of the resulting light signal with a spatially unresolving photodetector, such as a photodiode or solar cell. In a typical single-pixel imaging implementation, random [25] or deterministic orthogonal patterns [26] are used in structured illumination to encode the 2-D object information into 1-D single-pixel measurements. The 2-D object image can then be reconstructed from the single-pixel measurements through correlation [25], inverse orthogonal transformation [26], or compressive sensing [24].

Instead of reconstructing images, our approach only acquires the feature information of the target objects for object classification. The key innovation is to design the proper structured patterns that can be used in structured illumination for effective and efficient feature information acquisition. The detected 1-D signal can be viewed as a measure of the similarity between the projected structured patterns and the target objects. If the structured pattern and the target object have spatial features in common, the magnitude of the resulting single-pixel measurements would be large, and vice versa. With the optimized structured patterns, the resulting single-pixel measurement sequence allows objects to be accurately classified.
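
To make the inner-product view concrete, the following is a minimal NumPy sketch (with a made-up 28×28 object and two illustrative patterns, not the learned kernels) of how one single-pixel measurement scores the similarity between a projected pattern and the object:

```python
import numpy as np

# Hypothetical 28x28 object: a bright vertical stripe on a dark background.
obj = np.zeros((28, 28))
obj[:, 12:16] = 1.0

# One pattern sharing the object's feature, one with an orthogonal feature.
pattern_match = np.zeros((28, 28))
pattern_match[:, 12:16] = 1.0
pattern_miss = np.zeros((28, 28))
pattern_miss[12:16, :] = 1.0

# A single-pixel measurement is (up to a constant factor) the inner product of
# the illumination pattern and the object's transmittance map.
m_match = np.sum(pattern_match * obj)   # large: pattern and object share features
m_miss = np.sum(pattern_miss * obj)     # small: little spatial overlap
print(m_match, m_miss)                  # 112.0  16.0
```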

The question is how to design the optimized structured patterns for structured illumination. In practice, it is often difficult to analytically express the object features. In our implementation, we employ a CNN for automatic feature information extraction. Specifically, the convolution kernels of our trained CNN are used as the structured patterns. We binarize the grayscale kernels for high-speed pattern projection. The resulting signals are fed to the trained CNN for object classification. The entire object classification process is image-free, ensuring high efficiency in data acquisition. Additionally, the architecture of the designed CNN is so simple that fast-moving objects can be classified in real time.

The implementation of the proposed image-free method can be summarized in the following six steps:

  • 1) CNN architecture design;
  • 2) CNN training;
  • 3) Generation of structured illumination patterns by extracting the convolution kernels from the trained CNN;
  • 4) Binarization of the structured illumination patterns;
  • 5) Illuminating the target with the binarized structured patterns and collecting the resulting light intensity sequence with a single-pixel detector;
  • 6) Feeding the single-pixel measurement sequence to the CNN and deriving the classification results from the CNN.

3. CNN architecture design, training, and illumination pattern generation

As a proof of concept, we demonstrate the proposed method on the classification of fast-moving handwritten digits. The digits are from the MNIST handwritten digit database [27].

3.1 CNN architecture design

The architecture of the proposed CNN is presented in Fig. 1(a). The CNN consists of an image input layer, a convolutional layer, a deconvolutional layer, 3 fully connected layers, and an output layer. The image input layer is used only in CNN training and testing; it accepts training or testing images as input. The dimension of the image input layer is 28×28, which coincides with the size of the images provided by the database. The dimension of the convolution kernel layer is 28×28×15. In other words, the convolution kernel layer consists of 15 convolution kernels and the size of each kernel is 28×28. The size of the kernels is identical to the input image size, so that each convolution is equivalent to an inner product, which is a key innovation of the proposed image-free method. Such a design allows the convolution to be conducted by optical means. Specifically, as Fig. 1(b) shows, when the target object is illuminated by a kernel pattern, the resulting single-pixel measurement is the inner product of the kernel pattern and the target object image. The deconvolutional layer is used for feature upsampling. To increase the nonlinearity of the network, the feature map of the deconvolutional layer is flattened and connected to the fully connected layers. The 3 fully connected layers following the deconvolutional layer have 400, 200, and 100 units, respectively. In the output layer, the “Softmax” activation function [28] is empirically used to predict the final output.
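
The following PyTorch sketch mirrors the architecture described above. The layer sizes follow the text; the deconvolution parameters and the choice of activation functions between layers are assumptions, and the authors' own training code (linked in Section 3.2) remains the reference implementation.

```python
import torch
import torch.nn as nn

class SinglePixelCNN(nn.Module):
    """Sketch of the proposed CNN. The 15 full-size (28x28) convolution kernels
    double as structured-illumination patterns, so convolving one input image
    reduces to 15 inner products, i.e., a 15-element feature map."""

    def __init__(self, n_kernels=15, n_classes=10):
        super().__init__()
        # Kernel size equals the image size, so each kernel yields a 1x1 output.
        self.conv = nn.Conv2d(1, n_kernels, kernel_size=28, bias=False)
        # Deconvolution upsamples the 15x1x1 feature map; the 4x4 output size
        # chosen here is an assumption (not specified in the paper).
        self.deconv = nn.ConvTranspose2d(n_kernels, n_kernels, kernel_size=4)
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_kernels * 4 * 4, 400), nn.ReLU(),
            nn.Linear(400, 200), nn.ReLU(),
            nn.Linear(200, 100), nn.ReLU(),
            nn.Linear(100, n_classes),  # softmax is applied by the loss / at inference
        )

    def forward(self, image):
        # Training/testing path: the "optical" convolution is simulated electronically.
        feature_map = self.conv(image)             # (N, 15, 1, 1)
        return self.head(self.deconv(feature_map))

    def forward_from_measurements(self, measurements):
        # Deployment path: 15 differential single-pixel measurements are injected
        # directly into the feature map; the optics already did the convolution.
        feature_map = measurements.view(-1, 15, 1, 1)
        return self.head(self.deconv(feature_map))
```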


Fig. 1. The schematic of CNN-based object classification. (a) The architecture of the proposed CNN. Similar to conventional CNNs, the proposed CNN has an image input layer that accepts images as input during network training. The convolutional kernels in the trained CNN are used as the patterns for structured illumination. Different from conventional CNNs, when the proposed CNN is used for object classification (b), the feature map becomes the input layer, accepting the single-pixel measurements as input. The single-pixel measurements are obtained by illuminating the moving target objects with the convolutional kernels; as such, the convolution is implemented in an optical manner. In comparison with the conventional image-based counterpart (c), where high-speed photography is required, the proposed method achieves object classification without image acquisition.


We note that the feature map between the convolution kernel layer and the deconvolutional layer is an intermediate result in conventional CNNs. In the proposed method, the feature map is designed to accept the single-pixel measurements for subsequent object classification, because the convolutions can be done optically by illuminating the target object with the kernel patterns. In comparison with the typical image-based object classification scheme (Fig. 1(c)), the proposed method achieves “optical convolution” rather than electronic convolution. In addition, the proposed method removes the need for image acquisition, which is an innovation of our method.

3.2 CNN training

The MNIST database provides 60,000 training images and 10,000 test images. These images are upright and centered. Considering that objects in motion might undergo rotation and translation, we apply a random translation (-11 to 11 pixels) and rotation (-15 to 15 degrees) to each training image. By doing so, we intend to improve the robustness of the CNN for moving object classification. Figure 2 shows an example of 8 original training images and the corresponding shifted and rotated images.
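
A possible way to implement this augmentation, sketched here with torchvision (the original code may differ): a random shift of up to ±11 of the 28 pixels and a random rotation of up to ±15 degrees are applied on the fly to every training image.

```python
from torchvision import datasets, transforms

# Random translation up to +/-11 pixels (expressed as a fraction of the 28-pixel
# image size) and rotation up to +/-15 degrees, applied on the fly.
augment = transforms.Compose([
    transforms.RandomAffine(degrees=15, translate=(11 / 28, 11 / 28)),
    transforms.ToTensor(),
])

train_set = datasets.MNIST(root="./data", train=True, download=True, transform=augment)
```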


Fig. 2. Example of the training images. The first row shows the original images and the second row shows the images with random lateral shift and rotation.


In the training process, the CNN weights are initialized from a truncated normal distribution with a mean of 0 and a standard deviation of 0.1. All the bias terms are initialized to 0. The cross-entropy loss function is adopted for optimization. The error estimated by the loss function is backpropagated through the network and the network’s parameters are updated with the adaptive moment estimation (ADAM) optimizer [29], a stochastic optimization method. The learning rate is set to 0.001 and the mini-batch size is 128. The training process takes 725.96 seconds on our computer, which is equipped with an Intel Core i3-7300 CPU, 32 GB of RAM, and an NVIDIA RTX 2080Ti graphics card. The code for the CNN training is available on GitHub: https://github.com/zibangzhang/image-free-single-pixel-moving-object-classification. After training, we use all 10,000 test images provided by the database to test the CNN. Among the 10,000 test images, 9,611 are correctly classified, giving an overall accuracy of 96.11%.
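
For orientation, a minimal PyTorch training loop matching the stated hyperparameters (truncated normal initialization, ADAM, cross-entropy loss, learning rate 0.001, mini-batch size 128) might look as follows; the number of epochs is not stated in the paper and is assumed here, and the linked GitHub code remains authoritative.

```python
import torch
from torch.utils.data import DataLoader

model = SinglePixelCNN()                     # architecture sketch from Section 3.1
loader = DataLoader(train_set, batch_size=128, shuffle=True)

# Truncated normal initialization (mean 0, std 0.1) for weights; zeros for biases.
for p in model.parameters():
    if p.dim() > 1:
        torch.nn.init.trunc_normal_(p, mean=0.0, std=0.1)
    else:
        torch.nn.init.zeros_(p)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)   # ADAM optimizer
criterion = torch.nn.CrossEntropyLoss()                     # cross-entropy loss

for epoch in range(20):                      # epoch count assumed for illustration
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```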

To demonstrate that the resulting kernel patterns are optimized for handwritten digit classification, we compare them with sampling using Fourier patterns, Hadamard patterns, and random patterns. Specifically, we train another 3 CNNs for the comparison. The architecture of these CNNs is as Fig. 1(b) shows: the feature map serves as the input layer. We simulate the single-pixel measurements by convolving the training images with these pattern sets, and the resulting single-pixel measurements are fed to the feature map for network training. The testing accuracy of these 3 CNNs (49.63% for Fourier, 51.21% for Hadamard, and 57.52% for random) is remarkably lower than that of the proposed method. The lower accuracy might be because these pattern sets encode less feature information of the target objects, so the differences among the target objects are not well revealed in the single-pixel measurements, and the resulting ambiguity lowers the accuracy.
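
The simulated measurements used in this comparison can be generated as inner products between each training image and each pattern in the set, as in the sketch below (a random pattern set is shown; the Fourier and Hadamard sets are built analogously):

```python
import numpy as np

def simulate_measurements(images, patterns):
    """Simulate single-pixel measurements as inner products.
    images: (N, 28, 28) array; patterns: (15, 28, 28) array; returns (N, 15)."""
    return np.einsum('nij,kij->nk', images, patterns)

rng = np.random.default_rng(0)
random_patterns = rng.random((15, 28, 28))
# measurements = simulate_measurements(train_images, random_patterns)
# The 15-element measurement vectors are then fed to the feature map of a CNN
# with the same head as in Fig. 1(b) for training.
```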

3.3 Structured illumination pattern generation

The trained CNN “learns” the features of the handwritten digits. As Fig. 3 shows, the features learned from the training data are revealed in the convolution kernels. We propose to use the convolution kernels as structured illumination patterns.


Fig. 3. Convolution kernels extracted from the trained CNN. The size of the kernels is 28×28 pixels.


However, the intensity of the convolution kernels ranges from -1 to 1, while digital micromirror devices (DMDs), as intensity-only spatial light modulators, can only generate patterns with non-negative values. Thus, we binarize the convolution kernels and perform differential measurements. Specifically, we use the “upsample-and-dither” strategy [30] for pattern binarization. As Fig. 4 shows, the binarization strategy consists of two steps. First, the grayscale kernel patterns are upsampled using the “bicubic” interpolation algorithm [31] with a scaling factor k. Second, the upsampled patterns are binarized using the Floyd-Steinberg dithering algorithm [32]. As the resulting patterns are non-negative, we further employ differential measurement, which refers to illuminating the target objects with a binarized pattern $P^+$ followed by its inverse $P^- = 1 - P^+$; the difference of the corresponding single-pixel measurements is then used as one effective measurement. The differential measurement doubles the number of patterns.
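
A minimal sketch of this upsample-and-dither binarization using Pillow, whose 1-bit conversion applies Floyd-Steinberg dithering by default; the mapping of kernel values from [-1, 1] to display gray levels is an assumption:

```python
import numpy as np
from PIL import Image

def binarize_kernel(kernel, k=25):
    """Upsample-and-dither binarization of one grayscale kernel (values in [-1, 1]).
    Returns the binary pattern P+ and its inverse P- = 1 - P+."""
    gray = np.uint8((kernel + 1.0) / 2.0 * 255)                    # rescale to [0, 255] (assumed)
    up = Image.fromarray(gray).resize(
        (gray.shape[1] * k, gray.shape[0] * k), Image.BICUBIC)     # bicubic upsampling
    p_plus = np.array(up.convert("1"), dtype=np.uint8)             # Floyd-Steinberg dithering
    return p_plus, 1 - p_plus

# A 28x28 kernel with k = 25 yields a 700x700 binary pattern pair for the DMD.
p_plus, p_minus = binarize_kernel(np.random.uniform(-1, 1, (28, 28)), k=25)
```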


Fig. 4. Illustration of binary pattern generation.


4. Experiments

The schematic of our experimental set-up is shown in Fig. 5. The structured illumination is generated using a DMD (Texas Instruments DMD 4100) with a 10-watt white LED as the light source. For structured pattern generation, the scaling factor is k = 25, which results in structured patterns of 700×700 pixels. As such, we make full use of the area of the DMD. The structured patterns displayed on the DMD are projected onto the target objects through Lens 1. In our experiment, the target objects are 8 handwritten digits randomly chosen from the test set of the database. In order to simulate moving digits, we print the digit images (black background and white digits) on transparent films and place them uniformly on a black paper disk. The disk is driven by a motor, and its rotation speed can be tuned by adjusting the input voltage of the motor. The radius of the disk is 0.15 m. The disk is placed on the image plane of the DMD. The structured illumination area on the disk is 45 mm×45 mm. When a digit falls within the illumination area, light passes through it; otherwise, the light is blocked. A photodiode (Thorlabs PDA-100A2) is used as a single-pixel detector to collect the light transmitted through the objects. The collected 1-D signal is digitized by a data acquisition board (National Instruments USB-6340) and fed to the trained CNN on the computer for classification.


Fig. 5. Schematic of the experimental set-up.


We adjust the input voltage of the motor to control the speed at which the digits move. The DMD operates at a refresh rate of 17,850 Hz. We note that this is the optimal refresh rate for our experiment: we find that if the refresh rate is higher than this threshold, the classification accuracy becomes lower, presumably because a higher refresh rate leads to a shorter integration time for each single-pixel measurement and, consequently, a lower signal-to-noise ratio. The data acquisition board operates at 500 kHz, so we can take 28 (≈500,000/17,850) samples for each structured pattern. We then average the 28 samples to derive a single-pixel measurement. The measurements for the moments when no digit is in the field of view (illumination area) are filtered out. The filtering is done automatically by thresholding, because the disk blocks all the light from the DMD at such moments and the magnitude of the corresponding single-pixel measurements is remarkably lower than that of the rest. Employing the differential measurement strategy described in Section 3.3, we feed every 15 successive differential single-pixel measurements to the feature map of the CNN for object classification.
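
The signal chain described above can be summarized in a short sketch (the threshold value used to discard empty-field-of-view frames is an illustrative assumption):

```python
import numpy as np

def measurements_from_samples(raw, n_patterns=30, samples_per_pattern=28, threshold=0.05):
    """Turn raw photodiode samples for one pass of the 30-pattern sequence (15 P+/P- pairs)
    into 15 differential single-pixel measurements, or None if no digit is in view."""
    per_pattern = raw[:n_patterns * samples_per_pattern].reshape(n_patterns, samples_per_pattern)
    means = per_pattern.mean(axis=1)          # average the 28 samples of each pattern
    if means.max() < threshold:               # disk blocks all light: no digit in view
        return None
    return means[0::2] - means[1::2]          # differential measurements: m(P+) - m(P-)

# The 15 values are then injected into the feature map of the trained CNN, e.g.:
# logits = model.forward_from_measurements(torch.tensor(diff, dtype=torch.float32))
```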

We loop the 15 pairs of binary structured patterns 400 times and collect 12,000 single-pixel measurements, during which the disk keeps rotating. Thus, each digit on the disk travels through the field of view multiple times. The accuracy is calculated by dividing the number of correctly classified digits by the number of digits that have travelled through the field of view. The classification results are presented in Table 1. When the input voltage is 1.5 V, the linear speed of the digits is 3.61 m/s. Twenty-one digits pass through the field of view and all of them are correctly classified, so the achieved accuracy is 100%. It is not surprising that this accuracy is higher than that in the static simulation (96.11%), because the latter is a statistical value over 10,000 test images, some of which contain quite ambiguous digits. Considering that the field of view is only 45 mm×45 mm, each digit enters and leaves the field of view within ∼0.012 seconds. Thus, we claim that the digits are moving at a high speed. On the whole, the accuracy decreases gradually as the rotation speed increases. Even when the target objects move at a linear speed of 8.24 m/s, the classification accuracy is still higher than 50%.


Table 1. Classification results of moving handwritten digits for different moving speeds

5. Discussion

Temporal resolution is crucial for object classification methods. The temporal resolution of the proposed method is determined by both the number of structured patterns and the refresh rate of the spatial light modulator. The number of patterns used in structured illumination varies with the number of categories of target objects and the similarity among the categories. Specifically, more patterns might be required if there are more categories of target objects or if the similarity among the target objects is higher, because more feature information is needed to differentiate one object from another. In our case, the number of categories of handwritten digits is 10 and the similarity between some handwritten digits is high; for example, handwritten “1” and “7”, “4” and “6”, and “3” and “9” are sometimes difficult to distinguish. Thus, we use 15 pairs of structured patterns. The number of patterns might be further reduced by refining the CNN architecture. On the other hand, the DMD we use to generate the structured illumination operates at 17,850 Hz. In other words, the temporal resolution of our method is 1.68 ms (15×2/17,850 s) and the equivalent frame rate is 595 classifications per second. However, a higher DMD refresh rate results in a lower signal-to-noise ratio in the single-pixel measurements, as the integration time for each single-pixel measurement becomes shorter. Thus, it is reasonable to use a high-speed and high-flux spatial light modulator (for example, a high-speed LED matrix [33,34]).
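
The quoted figures follow directly from the pattern count and the DMD refresh rate:

```python
patterns_per_classification = 15 * 2           # 15 differential (P+/P-) pattern pairs
refresh_rate_hz = 17_850                       # DMD refresh rate
t_resolution = patterns_per_classification / refresh_rate_hz   # ~1.68e-3 s
frame_rate = 1 / t_resolution                                   # ~595 classifications/s
print(f"{t_resolution * 1e3:.2f} ms, {frame_rate:.0f} per second")
```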

The computational complexity of the proposed method is low: the computation takes only 1.43 ms on average. In other words, the latency of the proposed method is short. Additionally, the proposed method demands a very small amount of data for object classification. Specifically, each classification requires 1,680 bytes of data (16 bits (2 bytes)/sample × 28 samples/measurement × 30 measurements/classification). The high data efficiency avoids the issue of data accumulation. We note that the latency and the operating duration of conventional image-based object classification methods are generally compromised by limited data storage, bandwidth, or processing capacity, because high-speed photography generates a huge amount of data and image analysis algorithms are typically computationally expensive. For some applications, such as assembly-line inspection, where real-time and long-duration classification is desired, the proposed method might outperform conventional methods, because it is both computation- and data-efficient. Another advantage of the proposed method is the capability for hidden object classification, because it works with a single-pixel detector for data acquisition and single-pixel detectors can work over a wider waveband than conventional silicon-based image sensors.

In order to improve the classification accuracy for objects with rotation, we apply a random rotation to each training image, and this strategy has proved effective in our experiment. However, in practice, the rotation of the target objects may vary, so it may be necessary to generate more training images with different rotation angles to train the CNN. Fortunately, this strategy does not require any extra labeled data.

6. Conclusion

We report an image-free and real-time classification method for fast-moving objects. The reported method is based on structured illumination, single-pixel detection, and deep learning. As experimentally demonstrated, the method can successfully classify objects moving as fast as 3.61 m/s in a field of view of 45 mm×45 mm. The achievable temporal resolution is 1.68 ms, and each classification requires only 1,680 bytes of data and 1.43 ms of computation. Thus, the reported method is both data- and computation-efficient, allowing for real-time and long-duration classification. Potential applications of the reported method include rapid classification of flowing cells, assembly-line inspection, and aircraft classification in defense applications. Additionally, as single-pixel detectors have the advantage of sensing at nonvisible wavebands, the reported method might be applicable to hidden object classification.

Funding

National Natural Science Foundation of China (61905098, 61875074); Fundamental Research Funds for the Central Universities (11618307).

Disclosures

The authors declare no conflicts of interest.

References

1. D. Cireşan, U. Meier, J. Masci, and J. Schmidhuber, “Multi-column deep neural network for traffic sign classification,” Neural Networks 32, 333–338 (2012). [CrossRef]  

2. R. Zhao, R. Yan, Z. Chen, K. Mao, P. Wang, and R. X. Gao, “Deep learning and its applications to machine health monitoring,” Mech. Syst. Signal Pr. 115(15), 213–237 (2019). [CrossRef]  

3. X. Lu, X. Zheng, and Y. Yuan, “Remote Sensing Scene Classification by Unsupervised Representation Learning,” IEEE T. Geosci. Remote 55(9), 5148–5157 (2017). [CrossRef]  

4. S. Ota, R. Horisaki, Y. Kawamura, M. Ugawa, I. Sato, K. Hashimoto, R. Kamesawa, K. Setoyama, S. Yamaguchi, K. Fujiu, K. Waki, and H. Noji, “Ghost cytometry,” Science 360(6394), 1246–1251 (2018). [CrossRef]  

5. J. Neulist and W. Armbruster, “Segmentation, classification, and pose estimation of military vehicles in low resolution laser radar images,” P. Soc. Photo-Opt. Ins. 5791, 218–225 (2005). [CrossRef]  

6. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. Berg, and F. Li, “Imagenet large scale visual recognition challenge,” Int. J. Comput. Vision 115(3), 211–252 (2015). [CrossRef]  

7. A. Krizhevsky, I. Sutskever, and G. Hinton, “Imagenet classification with deep convolutional neural networks,” Adv. Neur. In. 25, 1097–1105 (2012).

8. A. Andreopoulos and J. Tsotsos, “50 years of object recognition: Directions forward,” Comput. Vis. Image Und. 117(8), 827–891 (2013). [CrossRef]  

9. M. Vollmer and K. Möllmann, “High speed and slow motion: the technology of modern high speed cameras,” Phys. Educ. 46(2), 191–202 (2011). [CrossRef]  

10. A. Veeraraghavan, D. Reddy, and R. Raskar, “Coded strobing photography: Compressive sensing of high speed periodic videos,” IEEE T. Pattern Anal. 33(4), 671–686 (2011). [CrossRef]  

11. D. Reddy, A. Veeraraghavan, and R. Chellappa, “P2C2: Programmable pixel compressive camera for high speed imaging,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2011), pp. 329–336.

12. L. Xu and J. Jia, “Two-phase kernel estimation for robust motion deblurring,” in European conference on computer vision (2010), pp. 157–170.

13. J. Sun, W. Cao, Z. Xu, and J. Ponce, “Learning a convolutional neural network for non-uniform motion blur removal,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2015), pp. 769–777.

14. A. Gatti, E. Brambilla, M. Bache, and L. A. Lugiato, “Ghost imaging with thermal light: comparing entanglement and classical correlation,” Phys. Rev. Lett. 93(9), 093602 (2004). [CrossRef]  

15. J. H. Shapiro, “Computational ghost imaging,” Phys. Rev. A 78(6), 061802 (2008). [CrossRef]  

16. H. Chen, J. Shi, X. Liu, Z. Niu, and G. Zeng, “Single-pixel non-imaging object recognition by means of Fourier spectrum acquisition,” Opt. Commun. 413, 269–275 (2018). [CrossRef]  

17. S. Jiao, J. Feng, Y. Gao, T. Lei, Z. Xie, and X. Yuan, “Optical machine learning with incoherent light and a single-pixel detector,” Opt. Lett. 44(21), 5186–5189 (2019). [CrossRef]  

18. F. Wang, H. Wang, H. Wang, G. Li, and G. Situ, “Learning from simulation: An end-to-end deep-learning approach for computational ghost imaging,” Opt. Express 27(18), 25560–25572 (2019). [CrossRef]  

19. C. F. Higham, R. Murray-Smith, M. J. Padgett, and M. P. Edgar, “Deep learning for real-time single-pixel video,” Sci. Rep. 8(1), 2369 (2018). [CrossRef]  

20. M. Lyu, W. Wang, H. Wang, H. Wang, G. Li, N. Chen, and G. Situ, “Deep-learning-based ghost imaging,” Sci. Rep. 7(1), 17865 (2017). [CrossRef]  

21. T. Shimobaba, Y. Endo, T. Nishitsuji, T. Takahashi, Y. Nagahama, S. Hasegawa, M. Sano, R. Hirayama, T. Kakue, A. Shiraki, and T. Ito, “Computational ghost imaging using deep learning,” Opt. Commun. 413, 147–151 (2018). [CrossRef]  

22. Y. He, G. Wang, G. Dong, S. Zhu, H. Chen, A. Zhang, and Z. Xu, “Ghost Imaging Based on Deep Learning,” Sci. Rep. 8(1), 6469 (2018). [CrossRef]  

23. M. P. Edgar, G. M. Gibson, and M. J. Padgett, “Principles and prospects for single-pixel imaging,” Nat. Photonics 13(1), 13–20 (2019). [CrossRef]  

24. M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F. Kelly, and R. G. Baraniuk, “Single-pixel imaging via compressive sampling,” IEEE Signal Proc. Mag. 25(2), 83–91 (2008). [CrossRef]  

25. B. Sun, M. P. Edgar, R. Bowman, L. E. Vittert, S. Welsh, A. Bowman, and M. J. Padgett, “3D computational imaging with single-pixel detectors,” Science 340(6134), 844–847 (2013). [CrossRef]  

26. Z. Zhang, X. Wang, G. Zheng, and J. Zhong, “Hadamard single-pixel imaging versus Fourier single-pixel imaging,” Opt. Express 25(16), 19619–19639 (2017). [CrossRef]  

27. Y. LeCun, C. Cortes, and C. J. C. Burges, “THE MNIST DATABASE of handwritten digits,” http://yann.lecun.com/exdb/mnist/.

28. R. Dunne and N. Campbell, “On the pairing of the softmax activation and cross-entropy penalty functions and the derivation of the softmax activation function,” in Proc. 8th Aust. Conf. on Neural Networks (1997), p. 185.

29. D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” https://arxiv.org/abs/1412.6980 (2014).

30. Z. Zhang, X. Wang, G. Zheng, and J. Zhong, “Fast Fourier single-pixel imaging via binary illumination,” Sci. Rep. 7(1), 12029 (2017). [CrossRef]  

31. H. Hou and H. Andrews, “Cubic splines for image interpolation and digital filtering,” IEEE Trans. Acoust., Speech, Signal Process. 26(6), 508–517 (1978). [CrossRef]  

32. R. Floyd and L. Steinberg, “An adaptive algorithm for spatial grey scale,” Proc. Soc. Inf. Display 17, 75–77 (1976).

33. Z. Xu, W. Chen, J. Penuelas, M. Padgett, and M. Sun, “1000 fps computational ghost imaging using LED-based structured illumination,” Opt. Express 26(3), 2427–2434 (2018). [CrossRef]  

34. E. Salvador-Balaguer, P. Latorre-Carmona, C. Chabert, F. Pla, J. Lancis, and E. Tajahuerce, “Low-cost single-pixel 3D imaging by using an LED array,” Opt. Express 26(12), 15623–15631 (2018). [CrossRef]  
