
Intelligent evaluation for lens optical performance based on machine vision

Open Access

Abstract

Optical performance evaluation is a critical process in the production of collimating lenses. However, the current visual inspection of lens light-spot images is inefficient and prone to fatigue. Intelligent detection based on machine vision and deep learning can improve evaluation efficiency and accuracy. In this study, a dual-branch structure light-spot evaluation model based on deep learning is proposed for collimating lens optical performance evaluation, and a lens light-spot image dataset containing 9000 images with corresponding labels is built. Experimental results show that the proposed model achieves accurate classification of lens optical performance. Combined with the proposed weighted multi-model voting strategy, the model performance is further improved and the classification accuracy reaches 98.89%. Through the developed application software, the proposed model can be readily applied to quality inspection in collimating lens production.

Published by Optica Publishing Group under the terms of the Creative Commons Attribution 4.0 License. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.

1. Introduction

As the basic unit of an optical system, optical glass elements are widely used in various fields, such as scientific research and industrial production [1,2]. The collimating lens is an aspheric glass lens, an optical element used mainly to generate parallel beams [3–6]. To guarantee reliable application in optical imaging systems and precision measurement, it is necessary to test and evaluate the optical performance of collimating lenses [7,8]. Optical performance evaluation is a critical process in the production of collimating lenses. The traditional evaluation method is to observe the lens light-spot images manually and judge whether the optical performance of the lens is acceptable according to a reference legend and personnel experience. However, manual observation is inefficient and prone to fatigue [9].

Human perception mainly depends on vision, and the core technology of intelligent perception is machine vision. Intelligent detection based on machine vision is deeply integrated with industrial demands to promote the transformation and upgrading of the manufacturing industry [10–12]. Wu et al. [13] proposed a fast-axis collimating lens recognition algorithm based on machine vision, which can be used in actual production to address the low efficiency and accuracy of manually sorted collimating lenses. Kuo et al. [14] developed an automated defect inspection system for image sensors and realized defect identification and classification using a Support Vector Machine (SVM) based on machine vision. Mizutani et al. [15] introduced a convolutional neural network to increase the quality and speed of ghost imaging detection for position mapping of a weakly scattered light source. These studies show that intelligent detection based on machine vision offers high efficiency and accuracy, facilitates the intelligent transformation of industrial production lines, and provides data statistics and timely feedback to guide process improvement. Machine vision is one of the key technologies for intelligent manufacturing and has become an effective replacement for manual observation [16,17].

The objective of this study is to develop an intelligent evaluation model based on machine vision for the optical performance evaluation of collimating lenses. To build the lens light-spot image dataset, a lens light-spot image acquisition system based on machine vision is designed, and the mapping relation between lens light-spot image morphological features and lens optical performance evaluation is studied for labeling the dataset. Before model training, the original images are preprocessed by feature location, image cropping, image filtering and data standardization. The intelligent evaluation model for lens optical performance is then designed and trained. Through visual analysis of the trained model, heat-maps of the regions attended to during feature extraction are obtained. A weighted multi-model voting strategy is proposed to further improve the model performance. Finally, application software is developed to realize the deployment and application of the intelligent evaluation model in collimating lens production.

2. Lens production and evaluation

The collimating lens is an optical element used to modulate light into a parallel collimated beam [5,18,19]. As a kind of aspheric glass lens, the collimating lens is generally produced by Precision Glass Molding (PGM) technology [20,21], which is one of the most important approaches for producing optical glass lenses [22,23]. Figure 1 presents the process flow diagram of collimating lens production. The main production processes include optical design, mold design, mold manufacture, glass molding, geometric evaluation, assembly housing, optical performance evaluation and packaging.

Fig. 1. Process flow diagram of collimating lens production.

Geometric evaluation and optical performance evaluation are two important inspection processes in the production of collimating lenses [7]. The main factors affecting geometric evaluation are deviations in the mold manufacture and glass molding processes, while optical performance evaluation is additionally affected by deviations in the assembly housing process. For optical performance evaluation, the main interfering factors are marked with (a), (b) and (c) in Fig. 1. This study focuses on optical performance evaluation.

To evaluate lens optical performance, the general method is to analyze the image formed by a corresponding optical test system. For collimating lenses, each image obtained by a customized optical system contains a light-spot and several stripes, and is referred to as a lens light-spot image [7].

Manual detection analyzes these characteristics by observing lens light-spot images and evaluating the optical performance of the corresponding lenses. To improve detection efficiency, a feasible approach is to combine machine vision and artificial intelligence to acquire and analyze lens light-spot images instead of relying on visual inspection: machine vision is used to capture images in production, and an intelligent model based on deep learning is used to evaluate optical performance.

Figure 2 presents the schematic diagram of the model training stage and the applied stage. In the training stage of the intelligent evaluation model, a large-scale dataset needs to be built to provide training data and corresponding labels for the model input and the loss function calculation, respectively; it includes lens light-spot images and optical performance evaluation labels. In the applied stage, the trained model is compiled and packaged for deployment in the developed application software. The application software takes the images captured by the production on-line camera as input, calls the deployed trained model to perform the calculation, and outputs the predicted optical performance evaluation.

Fig. 2. Schematic diagram of model training stage and applied stage.

3. Lens light-spot image dataset

The work of building a dataset mainly includes image acquisition and labeling. For the Lens Light-spot Image (LLI) dataset, lens light-spot image acquisition and labeling of optical performance evaluation are described in this section.

3.1 Image acquisition

An image acquisition system based on machine vision is developed for the LLI dataset, as shown in Fig. 3(a). Through this system, lens light-spot images with different morphologies can be recorded, as shown in Fig. 3(b) and Fig. 3(c). Figure 3(d) presents the equipment for detecting the optical performance of collimating lenses [7].

Fig. 3. LLI dataset image acquisition system. (a) Schematic diagram of image acquisition system. (b) Accepted lens light-spot image. (c) Rejected lens light-spot image. (d) Collimating lens optical evaluation equipment.

In the image acquisition system, the light emitted by the laser diode is a modulated Gaussian beam with a wavelength of 635 nm. A sleeve is coaxial with the light beam. The front end of the sleeve clamps the collimating lens to be detected, and the rear end of the sleeve is the outlet of the light beam. The sleeve is fixed on a displacement adjuster that controls the axial movement of the sleeve, and thus the distance between the lens and the light source, to achieve appropriate focal-length matching. After leaving the sleeve, the light beam is projected onto an imaging screen placed 15 m away from the lens, with its center coaxial with the light beam. The lens light-spot image is displayed on the imaging screen. The imaging screen has good light transmittance, so the imaging pattern can be clearly observed from the back of the screen. Therefore, a camera is placed behind the imaging screen, also coaxial with the light beam. A filter installed in front of the camera lens passes only red light with wavelengths of 625-665 nm, reducing the interference of ambient light on the imaging. Finally, the image data is transmitted to a computer and saved in real time.

The industrial camera uses a Sony IMX178 rolling-shutter CMOS sensor with 6 million effective pixels and a maximum image resolution of 3072×2048. A total of 600 collimating lenses were selected for image acquisition of the LLI dataset. Each lens is clamped three times, rotated by a random angle at each clamping, to test the rotational symmetry of its optical performance. The camera continuously exposes and captures 5 images at an interval of 3 seconds, so each clamping produces 5 images and each lens produces 15 images. With 600 lenses, a total of 9000 images are collected for the LLI dataset.

3.2 Labeling of optical performance evaluation

After image acquisition, each image is assigned a label to build the LLI dataset. Figure 3(b) and Fig. 3(c) present typical lens light-spot images corresponding to different lens optical performance evaluations. The pattern in each lens light-spot image is mainly composed of a central oval highlight area and several stripes distributed on the left and right. The morphology and distribution of the central oval highlight area and the stripes are the main features that distinguish the optical performance classes.

  • (1) Label = 1. Figure 3(b) presents the lens light-spot image corresponding to an accepted optical performance evaluation. It has a regular and symmetrical stripe distribution, uniform spacing between stripes and clear stripe boundaries. The shape of the central oval highlight area is regular, and there is no stray light above or below it.
  • (2) Label = 0. Figure 3(c) presents the lens light-spot image corresponding to a rejected optical performance evaluation. The stripes are too thick and poorly symmetric, and the primary stripe is connected to the central oval highlight area. In addition, stray light is distributed above and below the central oval highlight area.

In the labeling process, the distribution rules of the image features can be exploited. For a lens at the same clamping angle, there are only slight differences among the 5 captured images. These subtle differences may come from light jitter caused by inevitable vibration of the worktable; they do not affect the optical performance classification represented by the image morphology, but can be used to enhance the robustness of the trained model. However, a small number of samples have poor rotational symmetry, and their lens light-spot image morphological features differ considerably at different clamping angles. Therefore, in the labeling work, the 9000 images correspond to about 1800 label groups.

Furthermore, to improve the effectiveness and accuracy of scoring, the labeling process adopts an expert evaluation method. Each sample is scored separately by 10 engineers, and the sum of the 10 scores of a sample numbered N is recorded as Sum(N). The scoring consistency is evaluated as follows.

  • (a) Sum(N) ≤ 2, the consistency is acceptable, Label = 0;
  • (b) Sum(N) ≥ 8, the consistency is acceptable, Label = 1;
  • (c) 2 < Sum(N) < 8, the consistency is rejected.

If the consistency of the 10 scores is rejected, the label value of the sample is determined through group discussion. If there is still any dispute after discussion, the lens is handed over to the product application department for double testing. Table 1 presents the distribution of the LLI dataset after labeling. Among the 9000 images, 7380 are labelled 1 and 1620 are labelled 0. The percentage of samples with a rejected optical performance evaluation is relatively small, so the LLI dataset has an unbalanced distribution of positive and negative samples for model classification. Therefore, in addition to data cleaning, appropriate data augmentation through image processing should also be considered.
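As an illustration, the consistency rule in (a)–(c) can be written as a small helper function. This is a minimal sketch, assuming each engineer gives a binary score (0 = rejected, 1 = accepted), which the thresholds of 2 and 8 out of 10 imply but the text does not state explicitly:

def consistency_label(scores):
    """Map 10 expert scores to a dataset label, following rules (a)-(c).

    Returns 0 or 1 when the scores are consistent, or None when the
    consistency is rejected (2 < Sum(N) < 8) and group discussion is needed.
    """
    total = sum(scores)      # Sum(N)
    if total <= 2:
        return 0             # (a) consistently rejected
    if total >= 8:
        return 1             # (b) consistently accepted
    return None              # (c) consistency rejected


# Example: 9 engineers accept, 1 rejects -> Sum(N) = 9 >= 8 -> Label = 1
print(consistency_label([1, 1, 1, 1, 1, 1, 1, 1, 1, 0]))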

Table 1. Distribution of LLI dataset

4. Image preprocessing

Images are the information carriers of machine vision. Image analysis and processing are the key technologies for automatically understanding the images acquired by the hardware of machine vision systems [24]. The purpose of image processing is to enable the machine to understand the image better and to prepare for the next step of image analysis [9]. General image processing methods include scaling, cropping, rotation, flipping, filling, noise addition, gray transformation, linear transformation, affine transformation, brightness transformation, saturation transformation, contrast transformation, data centering and data normalization, which improve the efficiency of model training and enhance the generalization ability of the model in application [25].

For the LLI dataset, several image processing operations are carried out before the image data is input into the deep convolutional neural networks. Feature location, image cropping, image filtering and data normalization are introduced in this section.

4.1 Feature location and image cropping

In an original image with a resolution of 3072×2048 pixels, the lens light-spot pattern is the main area characterizing lens optical performance, but it occupies only a small part of the image. In this case, model training may be time-consuming and difficult. Therefore, feature location and image cropping are necessary.

Feature location finds the position of the lens light-spot pattern in the original image, and image cropping removes the redundant background area around it. Figure 4 presents the schematic diagram of lens light-spot pattern location and image cropping. To improve the efficiency of image processing and expand the diversity of the dataset, cropped image A and cropped image B are obtained successively. Image A is the primary cropped image, retaining most of the stripes around the light-spot pattern center. Image B is the second-round cropped image, with about five stripes around the light-spot pattern center.

  • (1) Primary cropping. To obtain cropped image A from the original image, steps S1, S2 and S3 are required, as shown in Fig. 4. The first step S1 obtains the basic information of the original image, including image width W and height H, and the position coordinates (X, Y) of the pixel with the maximum gray value. The second step S2 is the primary calculation of the coordinates of the upper left boundary point (XAL, YAU) and the lower right boundary point (XAR, YAD) of the cropped image:
    $${X_{\textrm{AL}}} = X - ({H/5} ),{Y_{\textrm{AU}}} = Y - 0.8({H/5} );$$
    $${X_{\textrm{AR}}} = X + ({H/5} ),{Y_{\textrm{AD}}} = Y + 1.2({H/5} ).$$
However, considering the special case in which the lens light-spot pattern is located at the edge of the original image, boundary verification is necessary. The third step S3 is boundary verification, which ensures that the cropped image has a uniform size. If XAL < 0, the leftmost point is beyond the boundary of the original image, so the X axis coordinates are reset: XAL = 0, XAR = 2(H/5). If XAR > W, the rightmost point is beyond the boundary, so XAL = W - 2(H/5), XAR = W. If YAU < 0, the uppermost point is beyond the boundary, so YAU = 0, YAD = 2(H/5). If YAD > H, the lowermost point is beyond the boundary, so YAU = H - 2(H/5), YAD = H. After S3 boundary verification, coordinates (XAL, YAU) and (XAR, YAD) delimit the area of the original image that forms cropped image A. The cropped image A, with a resolution of 768×768 pixels, is used to build a sub dataset called LLIc1.
  • (2) Second-round cropping. The fourth step S4 obtains the basic information of cropped image A, including image width WA and height HA, and the position coordinates (XA, YA) of the pixel with the maximum gray value in cropped image A. The calculation principles of S5 and S6 are similar to those of S2 and S3, with some coefficients changed, as shown in Fig. 4, so their details are not repeated here. After S6 boundary verification, coordinates (XBL, YBU) and (XBR, YBD) delimit the area of cropped image A that forms the new cropped image B. The cropped image B, with a resolution of 256×256 pixels, is used to build another sub dataset called LLIc2. A minimal code sketch of the primary cropping procedure is given after this list.
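The sketch below illustrates steps S1-S3 under stated assumptions: it is not the authors' code, the brightest-pixel search is done on a grayscale mean of the channels, and the final resize of the 2(H/5)×2(H/5) window to 768×768 pixels is assumed, since the paper does not detail it.

import numpy as np

def crop_primary(img):
    """Primary cropping (S1-S3): a 2(H/5) x 2(H/5) window centered near the
    brightest pixel, with boundary verification so the crop size stays uniform."""
    H, W = img.shape[:2]                                   # S1: image size
    gray = img.mean(axis=2) if img.ndim == 3 else img
    Y, X = np.unravel_index(np.argmax(gray), gray.shape)   # brightest pixel (row Y, column X)

    h5 = H // 5                                            # S2: primary calculation
    x_al, y_au = X - h5, Y - int(0.8 * h5)
    x_ar, y_ad = X + h5, Y + int(1.2 * h5)

    # S3: boundary verification
    if x_al < 0:
        x_al, x_ar = 0, 2 * h5
    if x_ar > W:
        x_al, x_ar = W - 2 * h5, W
    if y_au < 0:
        y_au, y_ad = 0, 2 * h5
    if y_ad > H:
        y_au, y_ad = H - 2 * h5, H
    return img[y_au:y_ad, x_al:x_ar]                       # resized to 768x768 afterwards (assumed)

The second-round cropping for LLIc2 follows the same pattern with the changed coefficients of S5 and S6.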

Fig. 4. Schematic diagram of lens light-spot pattern location and image cropping.

After all images of the LLI dataset have been cropped, two sub datasets, LLIc1 and LLIc2, with the same scale as the original LLI dataset are obtained. In the cropped images of LLIc1 and LLIc2, the light-spot pattern occupies a larger proportion of the frame, which is conducive to the training, deployment and application of the deep convolutional neural network model. Each image in LLIc1 has a large field of view and contains more stripe information, while each image in LLIc2 has a small field of view and focuses on the central area of the light-spot pattern. In model training, LLIc1 and LLIc2 therefore play different and complementary roles.

4.2 Image filtering and data normalization

Filtering is an effective way to perform data augmentation [26]. Applying suitable filtering operations to the training data helps the trained model learn more generalized features, improves its robustness, and makes it better able to cope with special conditions such as noise interference and abnormal imaging. For the LLI dataset, the filtering methods used include the mean filter, median filter, bilateral filter, blur filter, sharpening filter, boundary enhancement and Gaussian blur. In addition, a random Gamma transform is applied to enhance the images by improving dark details and adjusting brightness through a nonlinear transformation.

To balance the distribution of positive and negative samples in the dataset, a larger augmentation ratio is applied to the data labelled 0. It should be noted that balancing the dataset in this way may cause overfitting during model training, which should be considered in the model design. After data cleaning and data augmentation, about 22% of the positive and negative samples, respectively, are randomly selected as the test set. With the same data cleaning and data augmentation, the data distributions of LLIc1 and LLIc2 remain consistent, as shown in Table 2.
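For illustration, a minimal augmentation sketch covering some of the filters listed above; the specific parameters and the use of Pillow are assumptions, not the authors' settings:

import random
from PIL import Image, ImageFilter

def random_augment(img: Image.Image) -> Image.Image:
    """Apply one randomly chosen filter (mean, median, Gaussian blur, sharpen,
    edge enhance) or a random Gamma transform; parameters are illustrative."""
    op = random.choice(["mean", "median", "gauss", "sharpen", "edge", "gamma"])
    if op == "mean":
        return img.filter(ImageFilter.BoxBlur(1))
    if op == "median":
        return img.filter(ImageFilter.MedianFilter(3))
    if op == "gauss":
        return img.filter(ImageFilter.GaussianBlur(1.0))
    if op == "sharpen":
        return img.filter(ImageFilter.SHARPEN)
    if op == "edge":
        return img.filter(ImageFilter.EDGE_ENHANCE)
    # random Gamma transform: gamma < 1 brightens dark details
    gamma = random.uniform(0.6, 1.4)
    return img.point(lambda v: int(255 * (v / 255) ** gamma))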

Table 2. Distribution of LLIc1 and LLIc2

Data standardization is used to obtain data that follow a standard normal distribution, with a mean of 0 and a standard deviation of 1. It eliminates errors caused by different dimensions, data variation or large differences in values. In the field of computer vision, the mean value and standard deviation of the large-scale ImageNet dataset are Mean (ImageNet) = (0.485, 0.456, 0.406) and Std (ImageNet) = (0.229, 0.224, 0.225), where the three numbers refer to the R, G and B color channels in turn. These values were calculated from millions of images [27,28] and are commonly used for normalization when training models. However, considering the particularity of lens light-spot images, the mean value and standard deviation are recalculated from the LLI dataset: Mean (LLIc1) = (0.1211, 0.0084, 0.0096), Std (LLIc1) = (0.1744, 0.0875, 0.0710); Mean (LLIc2) = (0.3916, 0.0550, 0.0493), Std (LLIc2) = (0.2912, 0.2204, 0.1837). LLIc1 refers to the sub dataset of images cropped once, and LLIc2 to the sub dataset of images cropped twice, as shown in Fig. 4.

Compared with ImageNet, the mean value and standard deviation of LLI decrease significantly, and the change is more pronounced for LLIc1. The mean value in the R channel of LLIc1 is about one quarter of that of ImageNet, while the mean values of the G and B channels are two orders of magnitude smaller than those of ImageNet. The reason for this large difference is that the images in the LLIc1 sub dataset have a large black background area. For LLIc2, the mean value and standard deviation in the R channel change little, and the mean values of the G and B channels are one order of magnitude smaller than those of ImageNet. Compared with LLIc1, the images in LLIc2 have undergone a second-round cropping, so each image has a larger proportion of the light-spot pattern area with higher brightness in the R channel.

Normalization is performed before standardization. The variable X0 (0 ≤ X0 ≤ 255) refers to the value of each pixel of each channel in an RGB image. By normalization, each pixel value is divided by 255:

$${X_1} = {X_0}/255\textrm{ }(0 \le {X_1} \le 1).$$

Standardization is to subtract the mean value (Mean) from X1 and divide by the standard deviation (Std):

$${X_2} = ({{X_1} - Mean} )/Std.$$

The standardized data approximately follow a standard normal distribution, which helps the model converge quickly.
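As an illustration, Eqs. (3) and (4) correspond to a standard normalize-then-standardize pipeline. The sketch below assumes a torchvision-style preprocessing chain, which the paper does not specify; the statistics are the LLIc1 values given above:

from torchvision import transforms

# Dataset-specific statistics reported above (R, G, B order)
MEAN_LLIC1 = (0.1211, 0.0084, 0.0096)
STD_LLIC1 = (0.1744, 0.0875, 0.0710)

# ToTensor applies Eq. (3): X1 = X0 / 255; Normalize applies Eq. (4):
# X2 = (X1 - Mean) / Std, channel by channel.
preprocess_llic1 = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=MEAN_LLIC1, std=STD_LLIC1),
])

An analogous transform with Mean (LLIc2) and Std (LLIc2) would serve the second branch.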

5. Intelligent evaluation model

Deep learning methods can automatically extract and combine the essential feature information of objects, and they are especially well suited to image classification [29,30]. The CNN is the most popular architecture for image classification [31]. Based on the CNN architecture, a Dual-branch Structure Light-spot Evaluation (DSLE) model for lens optical performance is designed and trained in this section. Through visual analysis of the trained model, heat-maps of the regions attended to during feature extraction are obtained and discussed. A weighted multi-model voting strategy is proposed to further improve the model performance. Finally, application software is developed to realize the deployment and application of the intelligent evaluation model in collimating lens production.

5.1 Model design and training

The main structure of the DSLE model includes deep Convolutional Neural Networks (CNNs) and Fully Connected (FC) layers [30,32]. For industrial applications, such as mobile and embedded devices, computing power may be limited, so a small, low-latency model is important. MobileNet is one of the network structures proposed to move CNNs out of the laboratory and into industrial applications [33,34]. As a classic lightweight network, MobileNet has been widely used in industry since it was proposed. It is based on a streamlined architecture that uses depthwise separable convolution to build lightweight deep neural networks; depthwise separable convolution effectively improves the computational efficiency of the convolution layers and thus greatly improves the running speed.

EfficientNet is one of the representative CNNs for classification tasks with excellent performance, and its structure is generated and optimized by neural architecture search [35]. EfficientNet adopts MBConvBlock modules, whose attention mechanism makes the network pay more attention to important details. Moreover, EfficientNet adopts an efficient compound model scaling to balance network depth, network width and input image resolution, taking both speed and accuracy into account. Hence, it achieves high accuracy while saving computing resources.

As an intelligent evaluation model based on machine vision and oriented to on-line detection and sorting in actual production, the model requires not only high classification accuracy but also high real-time performance. The proposed DSLE model combines MobileNet and EfficientNet, as shown in Fig. 5. In the DSLE model, the two sub datasets LLIc1 and LLIc2 are used for model training. The input ports of the dual-branch structure are Input I and Input II. Different combinations of sub datasets and input ports form four input models, M1, M2, M3 and M4. For each input model, each pair of samples from the two sub datasets corresponds to the same original image before cropping. The two backbone CNNs are MobileNet-V2 and EfficientNet-B2, corresponding to Input I and Input II, respectively. The data output by the CNNs enter two parallel data processing streams composed of FC layers and are then merged into a shared data processing stream, also composed of FC layers. Multiple FC layers increase the number of neurons to improve the model complexity, nonlinear expression ability and learning ability. The one-dimensional array [P0, P1] is the output of the last FC layer. The index position IP of the maximum value in [P0, P1] is the final output of the model and is the predicted optical performance evaluation, where IP = 0 means rejected and IP = 1 means accepted. During training, a loss value is calculated by comparing the output with the corresponding label, and back propagation (BP) is used to update the network parameters.
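A minimal PyTorch sketch of this dual-branch idea is given below. The backbones and two-class output follow the description above, but the FC layer sizes, the use of torchvision models (a recent version with the weights argument is assumed), and the feature dimensions are assumptions rather than the authors' exact architecture:

import torch
import torch.nn as nn
from torchvision import models

class DSLE(nn.Module):
    """Dual-branch sketch: MobileNet-V2 on Input I and EfficientNet-B2 on
    Input II, each followed by its own FC stream, then a shared FC stream
    ending in two logits [P0, P1]."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.backbone1 = models.mobilenet_v2(weights=None)
        self.backbone2 = models.efficientnet_b2(weights=None)
        d1 = self.backbone1.classifier[-1].in_features   # feature width of MobileNet-V2
        d2 = self.backbone2.classifier[-1].in_features   # feature width of EfficientNet-B2
        self.backbone1.classifier = nn.Identity()        # keep pooled features only
        self.backbone2.classifier = nn.Identity()
        self.branch1 = nn.Sequential(nn.Linear(d1, 256), nn.ReLU())
        self.branch2 = nn.Sequential(nn.Linear(d2, 256), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(512, 128), nn.ReLU(),
                                  nn.Linear(128, num_classes))

    def forward(self, x1, x2):
        f1 = self.branch1(self.backbone1(x1))            # Input I stream
        f2 = self.branch2(self.backbone2(x2))            # Input II stream
        return self.head(torch.cat([f1, f2], dim=1))     # logits [P0, P1]

In training, the logits would be compared with the label through a cross-entropy loss, and logits.argmax(dim=1) gives the predicted index IP.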

Fig. 5. Schematic diagram of DSLE model.

In this work, all training and testing experiments were performed in Python on a computer configured with an Intel Core i9-10920X 3.50 GHz CPU and 16 GB RAM. In addition, an Nvidia TITAN RTX GPU was used to speed up the training process. Table 3 presents the training parameters and test results of the DSLE model.

Table 3. Training parameters and test results of DSLE model

The test results show that the trained models have high real-time performance, with a test speed of about 49 fps. Compared with the other three input models, M4 achieves better accuracy, sensitivity and False Negative Rate (FNR). High sensitivity, or low FNR, indicates a small probability that a lens with actually rejected performance is misjudged as accepted. Each image in LLIc1 has a large field of view and contains more stripe information, while each image in LLIc2 has a small field of view and focuses on the central area of the light-spot pattern. The fused input of LLIc1 and LLIc2 gives the model features from different fields of view, which improves the feature analysis and recognition ability of the trained model. M2 performs better in terms of specificity and False Positive Rate (FPR). High specificity, or low FPR, indicates a small probability that a lens with actually accepted performance is misjudged as rejected. In addition, the results show that FNR is higher than FPR. Although data preprocessing balances the distribution of positive and negative samples, the balance achieved by data augmentation may lead to overfitting in training, resulting in weaker sensitivity and FNR indicators.

5.2 Visualization of feature extraction

The visualization of model feature extraction shows which features the deep model has learned during training and which regions of a target image the model attends to most. Figure 6 presents some typical examples of feature extraction visualization. These heat-maps are generated with the Gradient-weighted Class Activation Mapping (Grad-CAM) method [36].

Fig. 6. Examples of feature extraction visualization. (a) ∼ (f) Lens light-spot images with accepted optical performance evaluation. (a) Common view of a lens light-spot image. (b) The position of lens light-spot pattern deviates from the image center. (c) The image size is large, but the proportion of lens light-spot pattern morphological feature area is small. (d) The lens light-spot pattern is incomplete. (e) The upper region of the lens light-spot pattern is blocked. (f) The left region of the lens light-spot pattern is blocked. (g) ∼ (h) Lens light-spot images with rejected optical performance evaluation. (i) Blank control.

In each subfigure in Fig. 6, the top is the feature extraction visualization heat-map, and the bottom is the corresponding lens light-spot image. For a pair of images, each pixel in the heat-map corresponds to the pixel at the same position in the lens light-spot image according to X and Y coordinates. In the heat-maps, the red area corresponds to the features highly concerned by the model, while the blue area corresponds to the features slightly concerned by the model. From the examples of feature extraction visualization, it can be seen that the trained model effectively pays attention to the main morphological features in the lens light-spot images.

The center of the lens light-spot pattern is an oval highlight area, and the morphology of the stripes around it is the main factor distinguishing lens optical performance evaluations. Figure 6(a) shows a common view of a lens light-spot image with an accepted optical performance evaluation. The trained model is most interested in the stripe morphology on the left and right sides of the oval highlight area, especially the upper and lower regions of the stripes on the right side. Figures 6(b) ∼ (f) show that the trained model correctly finds and attends to the main morphological features in special cases, such as when the lens light-spot pattern occupies only a small portion of the picture, lies at the edge of the picture, or is incomplete. For lens light-spot images with a rejected optical performance evaluation, as shown in Fig. 6(g) and Fig. 6(h), the trained model pays more attention to the stripe morphology at the top of the oval highlight area. For the blank control in Fig. 6(i), the trained model cannot find a regular stripe morphology and therefore fails to find an area of interest.
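For reference, the sketch below is a generic Grad-CAM implementation using PyTorch hooks, not the authors' code: the activations and gradients of a chosen convolutional layer are captured, the channel weights are the spatially averaged gradients, and the heat-map is the ReLU of the weighted sum of activation maps. The arguments feature_layer (for example, the last convolutional block of one backbone) and the tuple inputs (two tensors for the dual-branch DSLE model) are assumptions for illustration.

import torch
import torch.nn.functional as F

def grad_cam(model, feature_layer, inputs, class_idx):
    """Compute a normalized Grad-CAM heat-map for the class `class_idx`."""
    acts, grads = {}, {}
    h1 = feature_layer.register_forward_hook(
        lambda m, i, o: acts.update(value=o.detach()))
    h2 = feature_layer.register_full_backward_hook(
        lambda m, gi, go: grads.update(value=go[0].detach()))
    try:
        model.eval()
        logits = model(*inputs)                  # inputs: tuple of (1, 3, H, W) tensors
        logits[0, class_idx].backward()          # gradient of the chosen class score
    finally:
        h1.remove()
        h2.remove()

    weights = grads["value"].mean(dim=(2, 3), keepdim=True)        # GAP of gradients
    cam = F.relu((weights * acts["value"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=inputs[0].shape[2:],
                        mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)       # scale to [0, 1]
    return cam[0, 0]                                               # (H, W) heat-map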

5.3 Weighted multi-model voting strategy

During the training of the DSLE model, it is found that some samples have obvious characteristics, and the mapping relation between their lens light-spot image features and the optical performance evaluation is strong; the trained model predicts their classification well. However, for samples with unclear characteristics, the mapping relation between lens light-spot image features and optical performance evaluation is weak, and it is difficult for the trained model to make a correct prediction. Therefore, a Weighted Multi-model Voting Strategy (WMVS) based on the idea of partitioning the training data is proposed in this study. Figure 7 presents the schematic diagram of WMVS.

Fig. 7. Schematic diagram of weighted multi-model voting strategy.

In the model training stage, WMVS configures three DSLE models in a hierarchical, multi-model coupled training mode. First, the whole training dataset is input to Model (a) for training after image preprocessing, and the loss value of each sample in the third training epoch is recorded. The median of these loss values is taken as the threshold Q of the data distributor. In the data distributor, the loss value of each sample is compared with Q: if the loss value is less than Q, the sample is allocated to Data Part I; if it is greater than or equal to Q, the sample is allocated to Data Part II. In this way, the training data are divided into two parts of approximately equal size. Data Part I and Data Part II are then used to train Model (b) and Model (c), respectively. After training, the trained Model (a), Model (b) and Model (c) are obtained.
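A minimal sketch of this data distributor, assuming the per-sample losses of the third epoch are available as a mapping from sample id to loss (a hypothetical data structure used only for illustration):

import statistics

def split_by_loss(sample_losses):
    """WMVS data distributor: the median of the third-epoch losses of Model (a)
    is the threshold Q; samples with loss < Q form Data Part I (for Model (b))
    and the rest form Data Part II (for Model (c))."""
    Q = statistics.median(sample_losses.values())
    part_1 = [k for k, v in sample_losses.items() if v < Q]
    part_2 = [k for k, v in sample_losses.items() if v >= Q]
    return Q, part_1, part_2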

In the model test stage, each test sample is input to Model (a), Model (b) and Model (c) in parallel. The output P of each model is a one-dimensional array [P0, P1], where P0 represents the probability that the sample is predicted as rejected and P1 the probability that it is predicted as accepted. If a model has strong feature recognition and classification ability for a sample, there is a significant difference between its P0 and P1. In most cases, the classification results represented by P(a), P(b) and P(c) from Model (a), Model (b) and Model (c) are consistent. In case of divergence among the three models, the model that recognizes effective features outputs P0 and P1 with a significant difference, while the other two models may output P0 and P1 with only a small difference. According to the weighted voting strategy in WMVS, the fused one-dimensional array WP = [WP0, WP1] is obtained. P(a), P(b) and P(c) are weighted by W(a), W(b) and W(c), respectively, and WP0 and WP1 are calculated by Eq. (5):

$$\begin{array}{l} W{P_0} = W(a ){P_0}(a )+ W(b ){P_0}(b )+ W(c ){P_0}(c ),\\ W{P_1} = W(a ){P_1}(a )+ W(b ){P_1}(b )+ W(c ){P_1}(c ).\end{array}$$

The index position WIP of the maximum value in array [WP0, WP1] is the final output of WMVS and refers to the predicted value for the optical performance evaluation, where WIP = 0 refers to rejected and WIP = 1 refers to accepted. For a tested sample, the fused one-dimensional array WP and final output WIP will depend more on the model with strong recognition ability, which plays a leading role in this prediction of WMVS.

Different weight combinations may yield different final results. All weight combinations are traversed at a chosen resolution to obtain a list of test accuracies, one per combination, and the combination with the best test accuracy is selected as the weights for P(a), P(b) and P(c). The selected weight combination may not be beneficial for every sample, but statistically it improves the performance on the whole test set. Therefore, WMVS effectively reduces the false negative rate and false positive rate caused by overfitting of a single model, and further improves the prediction accuracy.
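The following sketch illustrates the weighted vote of Eq. (5) and the weight traversal; the grid resolution and the assumption that the three weights sum to 1 are illustrative choices not stated in the paper:

import itertools
import numpy as np

def wmvs_predict(p_a, p_b, p_c, w):
    """Fuse the per-model arrays [P0, P1] with weights (Eq. (5)) and return
    WIP, the index of the larger fused value (0 = rejected, 1 = accepted)."""
    wp = w[0] * np.asarray(p_a) + w[1] * np.asarray(p_b) + w[2] * np.asarray(p_c)
    return int(np.argmax(wp))

def search_weights(preds, labels, step=0.05):
    """Traverse weight combinations at resolution `step` and keep the one with
    the best test accuracy. `preds` is a list of (P(a), P(b), P(c)) tuples."""
    best_w, best_acc = None, -1.0
    grid = np.arange(0.0, 1.0 + 1e-9, step)
    for wa, wb in itertools.product(grid, grid):
        wc = 1.0 - wa - wb                      # weights assumed to sum to 1
        if wc < -1e-9:
            continue
        w = (wa, wb, max(wc, 0.0))
        acc = np.mean([wmvs_predict(pa, pb, pc, w) == y
                       for (pa, pb, pc), y in zip(preds, labels)])
        if acc > best_acc:
            best_w, best_acc = w, acc
    return best_w, best_acc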

Table 4 presents the training parameters and test results of the models in WMVS. In the experiment, the threshold Q of the data distributor is 6.54×10−5. The whole training dataset contains 17536 samples; after processing by the data distributor, 8766 samples are used for Model (b) training and 8770 samples for Model (c) training. Each sample consists of two images, one from LLIc1 and one from LLIc2, together with a label for training. The best accuracies of Model (a), Model (b) and Model (c) are 97.28%, 96.91% and 96.84%, respectively. The training time of each model is about 150-160 minutes, and it takes a further 29 minutes to find the best weight combination through traversal and comparison. Based on the selected weight combination, the test results of WMVS are obtained, as shown in Table 5. The best accuracy obtained by WMVS is 98.89%, with a corresponding sensitivity of 97.97% and specificity of 99.41%. Compared with the results of the DSLE model in Table 3, the comprehensive performance across multiple indicators is improved, although the test speed is significantly reduced. If there is no strict real-time requirement, the WMVS method is preferred for its better accuracy.

Table 4. Training parameters and test results of models in WMVS

Table 5. Test results of WMVS

5.4 Application software development

Based on the trained model, application software for intelligent evaluation of collimating lens optical performance is developed. Figure 8 presents the operation interface of the developed application software.

Fig. 8. Operation interface of developed application software.

On the left side of the interface is the original image captured by the camera, and on the right side is the target image for analysis. The main functions and operations of the software include Start Camera, Capture Image, AI Evaluation, Load Locally, View Log and Auto Mode. The bottom of the interface is the information feedback area. Figure 8 presents an example of the evaluation report, showing that the test serial number is 1, the Optical Performance Evaluation is Qualified, and the test time is appended at the end.

In actual production, a lens is manually clamped on the collimating lens optical evaluation equipment, as shown in Fig. 3(d). Accordingly, Start Camera, Capture Image, AI Evaluation and Load Locally are manual operations, which assist the user in improving detection accuracy. Furthermore, an automatic clamping device will be developed to realize real-time automatic detection, corresponding to Auto Mode in the software interface.

6. Conclusions

Intelligent detection technology based on machine vision and deep learning can be employed to improve the quality inspection of optical components. To develop an intelligent evaluation model for collimating lens optical performance classification, a light-spot image acquisition system based on machine vision is developed, and a lens light-spot image dataset is built with it. The proposed dataset, named LLI, contains 9000 collimating lens light-spot images with corresponding labels. Through feature location and image cropping, two sub datasets with different characteristics, LLIc1 and LLIc2, are obtained. A dual-branch structure light-spot evaluation (DSLE) model based on deep learning is then proposed for lens optical performance. Experimental results show that the DSLE model achieves better performance with the input model M4 (LLIc2-Input I, LLIc1-Input II). Through the visualization of feature extraction, the regions of interest of the model are analyzed; the results reveal that the trained model is most interested in the stripe morphology on the left and right sides of the oval highlight area, especially the upper and lower regions of the stripes on the right side. To further improve the accuracy and robustness of the model, a weighted multi-model voting strategy (WMVS) is proposed. Combined with WMVS, the model performance is further improved and the classification accuracy reaches 98.89%. Finally, application software for intelligent evaluation of collimating lens optical performance is developed based on the trained model. Through this software, the proposed dual-branch structure light-spot evaluation model can be well applied to quality inspection in collimating lens production. The model and application software developed in this study are scalable: after training on corresponding datasets, the proposed DSLE model can also be applied to other related evaluation and classification tasks, such as roughness, form error and surface defects.

Funding

National Natural Science Foundation of China (52035009).

Acknowledgments

The authors express their sincere thanks to Xumu Jiang for the support on the experiments.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are available from the corresponding author upon a reasonable request.

References

1. C. Liu and H. Gross, “Numerical optimization strategy for multi-lens imaging systems containing freeform surfaces,” Appl. Opt. 57(20), 5758–5768 (2018). [CrossRef]  

2. Z. Zhu, S. Wei, Z. Fan, and D. Ma, “Freeform illumination optics design for extended LED sources through a localized surface control method,” Opt. Express 30(7), 11524–11535 (2022). [CrossRef]  

3. F. Z. Fang, X. D. Zhang, A. Weckenmann, G. X. Zhang, and C. Evans, “Manufacturing and measurement of freeform optics,” CIRP Ann. 62(2), 823–846 (2013). [CrossRef]  

4. F. Z. Fang, N. Zhang, and X. D. Zhang, “Precision injection molding of freeform optics,” Adv. Opt. Technol. 5(4), 303–324 (2016). [CrossRef]  

5. J.-J. Chen and C.-T. Lin, “Freeform surface design for a light-emitting diode-based collimating lens,” Opt. Eng. 49(9), 093001 (2010). [CrossRef]  

6. G. E. Romanova and X. Qiao, “Composition of collimating optical systems using aberration theory,” J. Opt. Technol. 88(5), 274–281 (2021). [CrossRef]  

7. Y. Zhang, G. P. Yan, Z. X. Li, and F. Z. Fang, “Quality improvement of collimating lens produced by precision glass molding according to performance evaluation,” Opt. Express 27(4), 5033–5047 (2019). [CrossRef]  

8. X. L. Liu, X. D. Zhang, F. Z. Fang, and Z. Zeng, “Performance-controllable manufacture of optical surfaces by ultra-precision machining,” Int J Adv Manuf Technol 94(9-12), 4289–4299 (2018). [CrossRef]  

9. Z. H. Ren, F. Z. Fang, N. Yan, and Y. Wu, “State of the art in defect detection based on machine vision,” Int. J. of Precis. Eng. and Manuf.-Green Tech. 9(2), 661–691 (2022). [CrossRef]  

10. T. Wang, Y. Chen, M. Qiao, and H. Snoussi, “A fast and robust convolutional neural network-based defect detection model in product quality control,” Int J Adv Manuf Technol 94(9-12), 3465–3471 (2018). [CrossRef]  

11. D. H. Kim, T. J. Kim, X. L. Wang, M. Kim, Y. J. Quan, J. W. Oh, S. H. Min, H. J. Kim, B. Bhandari, and I. Yang, “Smart machining process using machine learning: A review and perspective on machining industry,” Int. J. of Precis. Eng. and Manuf.-Green Tech. 5(4), 555–568 (2018). [CrossRef]  

12. J. Stempin, A. Tausendfreund, D. Stöbener, and A. Fischer, “Roughness measurements with polychromatic speckles on tilted surfaces,” Nanomanuf. Metrol. 4(4), 237–246 (2021). [CrossRef]  

13. G. D. Wu, M. Zhu, Q. Y. Jiang, and X. Sun, “Fast-axis collimating lens recognition algorithm based on machine vision,” in Journal of Physics: Conference Series, (IOP Publishing, 2021), 012157.

14. C. F. J. Kuo, W. C. Lo, Y. R. Huang, H. Y. Tsai, C. L. Lee, and H. C. Wu, “Automated defect inspection system for CMOS image sensor with micro multi-layer non-spherical lens module,” Journal of Manufacturing Systems 45, 248–259 (2017). [CrossRef]  

15. Y. Mizutani, S. Kataoka, T. Uenohara, and Y. Takaya, “Ghost imaging with deep learning for position mapping of weakly scattered light source,” Nanomanuf. Metrol. 4(1), 37–45 (2021). [CrossRef]  

16. S. H. Huang and Y. C. Pan, “Automated visual inspection in the semiconductor industry: A survey,” Computers in Industry 66, 1–10 (2015). [CrossRef]  

17. J. J. Wang, Y. L. Ma, L. B. Zhang, R. X. Gao, and D. Z. Wu, “Deep learning for smart manufacturing: Methods and applications,” Journal of Manufacturing Systems 48, 144–156 (2018). [CrossRef]  

18. X. Mao, J. Li, F. Wang, R. Gao, X. Li, and Y. Xie, “Fast design method of smooth freeform lens with an arbitrary aperture for collimated beam shaping,” Appl. Opt. 58(10), 2512–2521 (2019). [CrossRef]  

19. J.-J. Chen, T.-Y. Wang, K.-L. Huang, T.-S. Liu, M.-D. Tsai, and C.-T. Lin, “Freeform lens design for LED collimating illumination,” Opt. Express 20(10), 10984–10995 (2012). [CrossRef]  

20. W. D. Liu and L. C. Zhang, “Thermoforming mechanism of precision glass moulding,” Appl. Opt. 54(22), 6841–6849 (2015). [CrossRef]  

21. Y. Liu, Y. Xing, C. Li, C. Yang, and C. Xue, “Analysis of lens fracture in precision glass molding with the finite element method,” Appl. Opt. 60(26), 8022–8030 (2021). [CrossRef]  

22. J. Xie, T. Zhou, B. Ruan, Y. Du, and X. Wang, “Effects of interface thermal resistance on surface morphology evolution in precision glass molding for microlens array,” Appl. Opt. 56(23), 6622–6630 (2017). [CrossRef]  

23. L. Zhang and A. Y. Yi, “Investigation of mid-infrared rapid heating of a carbide-bonded graphene coating and its applications in precision optical molding,” Opt. Express 29(19), 30761–30771 (2021). [CrossRef]  

24. C. Steger, M. Ulrich, and C. Wiedemann, Machine Vision Algorithms and Applications (John Wiley & Sons, 2018).

25. M. Sonka, V. Hlavac, and R. Boyle, Image Processing, Analysis, and Machine Vision (Cengage Learning, 2014).

26. A. K. Jain, Fundamentals of Digital Image Processing (Prentice-Hall, Inc., 1989).

27. J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2009), 248–255.

28. M. Wang, Z. Zhuang, K. Wang, S. Zhou, and Z. Liu, “Intelligent classification of ground-based visible cloud images using a transfer convolutional neural network and fine-tuning,” Opt. Express 29(25), 41176–41190 (2021). [CrossRef]  

29. Z. Q. Zhao, P. Zheng, S. T. Xu, and X. D. Wu, “Object detection with deep learning: A review,” IEEE Trans. Neural Netw. Learning Syst. 30(11), 3212–3232 (2019). [CrossRef]  

30. Y. Lecun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]  

31. M. Kavitha, R. Gayathri, K. Polat, A. Alhudhaif, and F. Alenezi, “Performance evaluation of deep e-CNN with integrated spatial-spectral features in hyperspectral image classification,” Measurement 191, 110760 (2022). [CrossRef]  

32. Q. R. Zhang, M. Zhang, T. H. Chen, Z. F. Sun, Y. Z. Ma, and B. Yu, “Recent advances in convolutional neural network acceleration,” Neurocomputing 323, 37–51 (2019). [CrossRef]  

33. M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “Mobilenetv2: Inverted residuals and linear bottlenecks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2018), 4510–4520.

34. A. Howard, M. Sandler, G. Chu, L.-C. Chen, B. Chen, M. Tan, W. Wang, Y. Zhu, R. Pang, and V. Vasudevan, “Searching for mobilenetv3,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (IEEE, 2019), 1314–1324.

35. M. Tan and Q. V. Le, “Efficientnet: Rethinking model scaling for convolutional neural networks,” in International Conference on Machine Learning (PMLR, 2019), 6105–6114.

36. R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-cam: Visual explanations from deep networks via gradient-based localization,” in Proceedings of the IEEE International Conference on Computer Vision (IEEE, 2017), 618–626.
