
Unsupervised anomaly detection of MEMS in low illumination based on polarimetric Support Vector Data Description


Abstract

Low illumination makes it challenging to conduct anomaly detection on material surfaces. Adding polarimetric information helps expand the pixel range and recover the background structure of network inputs. In this letter, an anomaly detection method for low illumination is proposed that utilizes polarization imaging and a patch-wise Support Vector Data Description (SVDD) model. Polarimetric information of a Micro Electromechanical System (MEMS) surface is captured by a division-of-focal-plane (DoFP) polarization camera and used to enhance the low illuminated images. The enhanced, defect-free images serve as the training set of the model, making it suitable for anomaly detection. The proposed method generates heatmaps that locate defects correctly. It reaches an anomaly detection score (AUROC) of 0.996, which is 22.4% higher than that of low illuminated images and even higher than that of normally illuminated images.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Object detection and recognition in low illumination are difficult because degraded imaging conditions narrow the pixel range and decrease the signal-to-noise ratio of acquired images [1–4]. It is therefore necessary to enhance image quality before detecting defects. Compared with intensity imaging, polarization imaging records the illuminance in four linear and two circular polarization directions, providing more physical information for image processing [5]. It has been studied and applied for image enhancement and recovery in complex imaging environments such as fog [6,7], smoke [8,9], and water [10–13]. Polarization imaging thus expands the information dimensions and offers clear advantages in low illumination conditions [14].

Unsupervised anomaly detection [15–18] uses deep neural network models to detect previously unseen rare objects or events without any prior knowledge about them. Unlike target detection models, it can distinguish normal and abnormal samples without training labels, a setting also described as one-class classification [19]. One-class support vector machine (OC-SVM) [20] and support vector data description (SVDD) [21,22] are classical algorithms for one-class classification. The input normal and abnormal samples are linearly inseparable in low-dimensional space. Encoders are trained to project them into a high-dimensional space and to seek a hypersphere that contains all the normal samples. At test time, normal samples lie inside the hypersphere, with smaller distances from the center and lower anomaly scores, while abnormal samples show the opposite behavior. Deep SVDD replaces kernel functions with a deep neural network, allowing data-driven transformations [23,24]. Patch SVDD extends this directly to the patch level and thus introduces patch scoring and anomaly mapping [25]. However, for anomaly detection on low illuminated images, these SVDD-based methods perform badly and generate incorrect anomaly maps.

In this paper, we propose a low illumination anomaly detection method based on polarization imaging and Patch SVDD. First, polarimetric information is used to enhance the low illuminated images; then unsupervised anomaly detection and segmentation are conducted on the enhanced datasets. This method expands the pixel range of low illuminated images to [0, 255] and restores the basic structure of Micro Electromechanical System (MEMS) devices. The area under the receiver operating characteristic curve (AUROC) [26] of the proposed method reaches 0.996, which is 22.4% higher than that of anomaly detection based on the original low illuminated images.

2. Method

2.1 System setups

As shown in Fig. 1(a), the data acquisition system consists of three parts: a division-of-focal-plane (DoFP) polarization camera, a microscope, and a MEMS wafer. (1) The DoFP camera (PHX050S-P) is connected to the computer via a network cable to set imaging parameters and transmit data. As shown in Fig. 2, inside a PolarizeMono8-format image, every $4 \times 4$ imaging grid stores the luminance of the RGB channels in four polarization directions $[{I_{00}},{I_{90}},{I_{45}},{I_{135}}]$. (2) The microscope was equipped with a $10 \times $ eyepiece and a $20 \times $ objective lens and illuminated by a halogen lamp to obtain sufficient detail. (3) The MEMS wafer was placed on a stage; by moving the stage in the x and y directions, every MEMS cell on the wafer could be observed. Such a system can also be extended to full Stokes Vector (FSV) estimation by adding a controllable retarder just before the camera. The orientation ${\theta _i}$ of the retarder is modified twice to obtain two groups of the image list $[{I_{00}},{I_{90}},{I_{45}},{I_{135}}]$, as shown in Fig. 1(b).

Fig. 1. (a) Data acquisition system. (b) Image list $[{I_{00}},{I_{90}},{I_{45}},{I_{135}}]$ for full Stokes Vector estimation.

Fig. 2. Polarization demodulation algorithm.

To avoid effects from the interpolation algorithm provided by the acquisition software, we turned off gain and automatic white balance and fixed the noise value to 25 dB. Considering the drawbacks of uneven illumination in the imaging area, a $1280 \times 1280$ evenly illuminated region of interest was selected. The pixel format was set to the PolarizeMono8 format mentioned above, in which every pixel records the illuminance of a single RGB channel in a single polarization direction. Polarization images such as the Stokes Vector, degree of linear polarization (DoLP), and angle of polarization (AoP) can be calculated from the PolarizeMono8 images for further image enhancement.

2.2 Datasets collection

During polarization imaging, both photon-starved and photon-abundant images were captured for the contrast experiment. The exposure time of the DoFP polarization camera was adjusted to simulate different luminance environments: 750 $\mu s$ (short) for low illuminated images and 25,000 $\mu s$ (normal) for normally illuminated images. More specifically, the MEMS wafer was first imaged under the normal exposure, then kept fixed and imaged with the short exposure over the same areas; the two pictures form a pair of image data. The anomaly detection task trains on normal samples and tests on either normal or abnormal samples. MVTec [27] is an anomaly detection dataset containing 5354 high-resolution color images of different object and texture categories (e.g., bottle, capsule, grid). It has 15 categories, with 3629 images for training and validation and 1725 images for testing; the training sets contain only normal images (around 200 per category), while the test sets contain both normal and abnormal images. Referring to the design of the MVTec dataset, 240 normal image pairs were collected as training data, and a further 80 normal and 80 abnormal image pairs were collected as validation and test data, giving a 6:2:2 ratio of training, validation, and test data, as shown in Table 1. Mask labels of the abnormal images were then generated with the Labelme software; these labels were used to calculate indicators during validation and testing but did not guide model training.


Table 1. Image numbers of datasets

2.3 Polarization demodulation algorithm

As shown in Fig. 2, the polarization camera captures raw data in the PolarizeMono8 format, which stores the luminance of the RGB channels and the different polarization directions in a single acquisition thanks to the integrated micro-polarizer array. To compute the polarization images, the RGB channels and four polarization directions were separated and concatenated into an RGB image list $[{I_{00}},{I_{90}},{I_{45}},{I_{135}}]$, where the subscript of ${I_{00}}$ denotes the angle between the polarizer axis and the horizontal direction; ${I_{90}}$, ${I_{45}}$, and ${I_{135}}$ are defined similarly. The Stokes Vector can be calculated from this image list by Eq. (1).

$$S = {[{S_0},{S_1},{S_2},{S_3}]^T} = {[{I_{00}} + {I_{90}},{I_{00}} - {I_{90}},{I_{45}} - {I_{135}},{I_L} - {I_R}]^T}$$
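As a concrete illustration, a minimal NumPy sketch of the demodulation and of the linear part of Eq. (1) is given below. The 2×2 super-pixel layout assumed in the comments is illustrative only; the actual arrangement of the PHX050S-P's 4×4 RGB-polarization grid should be taken from the camera documentation.

```python
import numpy as np

def demodulate_dofp(raw):
    """Split a single-channel DoFP mosaic into four polarization channels.

    Assumes an illustrative 2x2 micro-polarizer super-pixel laid out as
        [I90  I45]
        [I135 I00]
    Adjust the offsets to match the actual sensor layout.
    """
    i90  = raw[0::2, 0::2].astype(np.float64)
    i45  = raw[0::2, 1::2].astype(np.float64)
    i135 = raw[1::2, 0::2].astype(np.float64)
    i00  = raw[1::2, 1::2].astype(np.float64)
    return i00, i90, i45, i135

def linear_stokes(i00, i90, i45, i135):
    """Linear Stokes components of Eq. (1); S3 requires circular analyzers."""
    return i00 + i90, i00 - i90, i45 - i135  # S0, S1, S2
```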

In addition, we have tried to estimate the full Stokes Vector (FSV) and its circular component ${S_3}$ by combining the DoFP camera and a retarder [28] :

$$\; FSV = {W^ + } \times I$$
$${W^ + } = {({W^T} \times W)^{ - 1}} \times {W^T}$$

The $8 \times 1$ intensity vector I is obtained by modifying the orientation ${\theta _{i,\,i \in \{ 1,2\} }}$ of the retarder and capturing twice; it is the transpose of the image list $[{I_{{\theta _1}\_00}},{I_{{\theta _1}\_90}},{I_{{\theta _1}\_45}},{I_{{\theta _1}\_135}},{I_{{\theta _2}\_00}},{I_{{\theta _2}\_90}},{I_{{\theta _2}\_45}},{I_{{\theta _2}\_135}}]$. ${W^ + }$ denotes the Moore-Penrose pseudo-inverse of the measurement matrix ${W_{8 \times 4}}$:

$${W_{8 \times 4}} = \frac{1}{2}\left[ {\begin{array}{cccc} 1&{{{\cos }^2}2{\theta_1} + \cos \delta {{\sin }^2}2{\theta_1}}&{(1 - \cos \delta )\sin 2{\theta_1}\cos 2{\theta_1}}&{ - \sin \delta \sin 2{\theta_1}}\\ 1&{ - ({{{\cos }^2}2{\theta_1} + \cos \delta {{\sin }^2}2{\theta_1}} )}&{ - (1 - \cos \delta )\sin 2{\theta_1}\cos 2{\theta_1}}&{\sin \delta \sin 2{\theta_1}}\\ 1&{(1 - \cos \delta )\sin 2{\theta_1}\cos 2{\theta_1}}&{({{{\sin }^2}2{\theta_1} + \cos \delta {{\cos }^2}2{\theta_1}} )}&{\sin \delta \cos 2{\theta_1}}\\ 1&{ - (1 - \cos \delta )\sin 2{\theta_1}\cos 2{\theta_1}}&{ - ({{{\sin }^2}2{\theta_1} + \cos \delta {{\cos }^2}2{\theta_1}} )}&{ - \sin \delta \cos 2{\theta_1}}\\ 1&{{{\cos }^2}2{\theta_2} + \cos \delta {{\sin }^2}2{\theta_2}}&{(1 - \cos \delta )\sin 2{\theta_2}\cos 2{\theta_2}}&{ - \sin \delta \sin 2{\theta_2}}\\ 1&{ - ({{{\cos }^2}2{\theta_2} + \cos \delta {{\sin }^2}2{\theta_2}} )}&{ - (1 - \cos \delta )\sin 2{\theta_2}\cos 2{\theta_2}}&{\sin \delta \sin 2{\theta_2}}\\ 1&{(1 - \cos \delta )\sin 2{\theta_2}\cos 2{\theta_2}}&{({{{\sin }^2}2{\theta_2} + \cos \delta {{\cos }^2}2{\theta_2}} )}&{\sin \delta \cos 2{\theta_2}}\\ 1&{ - (1 - \cos \delta )\sin 2{\theta_2}\cos 2{\theta_2}}&{ - ({{{\sin }^2}2{\theta_2} + \cos \delta {{\cos }^2}2{\theta_2}} )}&{ - \sin \delta \cos 2{\theta_2}} \end{array}} \right]$$

where $\delta $ denotes the phase retardance of the retarder. In practice, a narrow-band (633 $nm$) $\lambda /4$ retarder with $\delta = \pi /2$ was adopted, and ${\theta _1} = 87.0^\circ ,{\theta _2} = 177.0^\circ $ were selected. However, as observed in the FSV estimation result (Fig. 3), the pattern of the circular component ${S_3}$ is not periodic, and its pixel values are too small compared with the other components; it could hardly contribute to the final enhanced images after denoising and MINMAX normalization. Therefore, only the linear components ${S_0}$, ${S_1}$, ${S_2}$ were used in the following process.
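For reference, a least-squares FSV estimate following Eqs. (2)–(4) might look like the sketch below; the function names and the per-pixel vector treatment are our own, not the authors' implementation.

```python
import numpy as np

def retarder_rows(theta, delta):
    """Four rows of the measurement matrix in Eq. (4) for one orientation."""
    c2, s2 = np.cos(2 * theta), np.sin(2 * theta)
    cd, sd = np.cos(delta), np.sin(delta)
    a = c2**2 + cd * s2**2
    b = (1 - cd) * s2 * c2
    d = s2**2 + cd * c2**2
    return 0.5 * np.array([[1,  a,  b, -sd * s2],
                           [1, -a, -b,  sd * s2],
                           [1,  b,  d,  sd * c2],
                           [1, -b, -d, -sd * c2]])

def estimate_fsv(intensities, theta1=np.deg2rad(87.0),
                 theta2=np.deg2rad(177.0), delta=np.pi / 2):
    """Eqs. (2)-(3): FSV = W+ x I, where `intensities` is the 8-vector
    [I_t1_00, I_t1_90, I_t1_45, I_t1_135, I_t2_00, I_t2_90, I_t2_45, I_t2_135]."""
    w = np.vstack([retarder_rows(theta1, delta), retarder_rows(theta2, delta)])
    return np.linalg.pinv(w) @ intensities  # pinv(W) = (W^T W)^{-1} W^T here
```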

Fig. 3. FSV estimation by combining the DoFP polarization camera and a retarder.

2.4 Enhancement via polarized images

After obtaining the Stokes Vector, the DoLP and AoP images were calculated by Eq. (5) and Eq. (6).

$$DoLP = \frac{{\sqrt {S_1^2 + S_2^2} }}{{{S_0}}}$$
$$AoP = \frac{1}{2}\arctan (\frac{{{S_2}}}{{{S_1}}})$$

where ${S_1}$ and ${S_2}$ are the second and third Stokes components defined in Eq. (1).

Noise is amplified by the nonlinear calculations involved in generating polarization images, especially AoP images. The polarization images were therefore first denoised and converted to grayscale. Then MINMAX normalization was applied to scale the pixel range to [0, 1]:

$${g_{out}}(x,y) = \frac{{{g_{in}}(x,y) - Min}}{{Max - Min}}$$

where ${g_{in}}(x,y)$ and ${g_{out}}(x,y)$ denote the input and output grayscale values, and $(x,y)$ is the location of any pixel in a polarization image (DoLP or AoP). Min and Max denote the minimum and maximum values of the same image. The DoLP and AoP images were then combined into a single polarization image ${I_p}(x,y)$:

$${I_p}(x,y) = \frac{1}{2}[DoLP(x,y) + AoP(x,y)]$$

Logarithmic transformation is defined as:

$${p_{out}}(x,y) = {\log _{(1 + \nu )}}[1 + \nu {p_{in}}(x,y)]$$

where ${p_{in}}(x,y)$ and ${p_{out}}(x,y)$ denote the input and output pixel values, and $\nu $ is a parameter that determines the strength of the transformation, as Fig. 4 plots. The original line represents ${p_{out}}(x,y) = {p_{in}}(x,y)$. The gradient of the logarithmic transformation curve is relatively large for low inputs and gradually decreases as the input value increases. The contrast of darker regions with lower pixel values therefore increases after logarithmic transformation, so more detail can be extracted from the dark.
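The enhancement chain of Eqs. (5)–(9) can be condensed into a short NumPy sketch. The small epsilon guards, the use of arctan2 in place of arctan, and the value of $\nu$ are illustrative choices of ours; the denoising step is only indicated by a comment.

```python
import numpy as np

def minmax(img):
    """Eq. (7): normalize pixel values to [0, 1]."""
    return (img - img.min()) / (img.max() - img.min() + 1e-12)

def enhance(s0, s1, s2, nu=100.0):
    """Polarimetric enhancement following Eqs. (5)-(9)."""
    dolp = np.sqrt(s1**2 + s2**2) / (s0 + 1e-12)   # Eq. (5)
    aop = 0.5 * np.arctan2(s2, s1)                 # Eq. (6); arctan2 avoids 0/0
    # ... denoise dolp and aop here (e.g., median filtering) ...
    i_p = 0.5 * (minmax(dolp) + minmax(aop))       # Eq. (8)
    return np.log(1 + nu * i_p) / np.log(1 + nu)   # Eq. (9): log base (1 + nu)
```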

Fig. 4. Logarithmic transformation curves for different $\nu $.

Moreover, pseudo-color processing was applied to convert the output images from grayscale to RGB. The flow chart of enhancement via polarized images is shown in Fig. 5. In this way, the polarization information is used to enhance the quality of the low illuminated images so that the differences between defects and background become much easier to distinguish. The overall pixel range of the enhanced image is also expanded.

Fig. 5. Flow chart of enhancement via polarized images (DoLP and AoP).

2.5 Patch-SVDD training

SVDD-based methods solve the problem of linear inseparability as shown in Fig. 6. Normal and abnormal samples are linearly inseparable in low-dimensional space, but they can be projected into a high-dimensional space by a kernel function or an encoder, where normal samples lie inside the hypersphere and abnormal samples lie outside. SVDD projects input samples with handcrafted kernel functions (e.g., linear, polynomial, and Gaussian kernels) [21]. Deep SVDD projects input samples with a deep neural network (encoder) trained on a large amount of data [23]. It aims to minimize the radius of the hypersphere, pushing the samples toward the center and finding a minimal hypersphere that encloses all the samples. Hence, encoders are trained to minimize the Euclidean distances between the projected samples and the center of the hypersphere via the loss function ${L_{SVDD}}$:

$${L_{SVDD}} = \sum\limits_i {||{E_\theta }({x_i}) - c|{|_2}}$$

where ${x_i}$ denotes the input samples, ${E_\theta }({\cdot} )$ denotes the encoder with parameters $\theta $, c is the center of the hypersphere calculated before training, and $||\cdot |{|_2}$ denotes the Euclidean distance.
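A minimal PyTorch rendering of Eq. (10) is sketched below, assuming the common convention that the center c is fixed to the mean of the initial embeddings before training.

```python
import torch

@torch.no_grad()
def init_center(encoder, loader):
    """Fix the hypersphere center c to the mean initial embedding."""
    return torch.cat([encoder(batch) for batch in loader]).mean(dim=0)

def svdd_loss(encoder, x, c):
    """Eq. (10): sum of Euclidean distances from embeddings to the center."""
    return torch.norm(encoder(x) - c, dim=1).sum()
```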

Fig. 6. In SVDD-based methods, normal and abnormal samples are projected into a high-dimensional space by a kernel function or encoder to solve the problem of linear inseparability.

At test time, the representation of every normal training sample is stored in the trained model. The anomaly score of an input is its distance from the center c, and samples whose anomaly score exceeds the radius are judged abnormal.

However, Deep SVDD projects and scores the whole image: it can detect abnormal images but cannot segment abnormal regions. Patch SVDD is a patch-wise extension of Deep SVDD. The original input images are divided into $n \times n$ patches (in practice, we used $32 \times 32$ and $64 \times 64$ patches), and every patch ${p_i}$ is sent to the model in place of the original image ${x_i}$ defined in Eq. (10). Unlike Deep SVDD, which calculates a single center c before training, patches from one training image are automatically encoded onto several hyperspheres with different centers, so that semantically similar patches gather on the same hypersphere. Encoders are trained to minimize the Euclidean distances between features with the following loss function ${L_{P - SVDD}}$:

$${L_{P - SVDD}} = \sum\limits_{i,i^{\prime}}^{} {||{E_\theta }({p_i}) - {E_\theta }({p_{i^{\prime}}})|{|_2}}$$
where ${p_{i^{\prime}}}$ is a patch spatially adjacent to ${p_i}$. In practice, self-supervised learning (SSL) is used to prevent the patch embeddings from collapsing into a single point. The corresponding loss ${L_{SSL}}$ is calculated with the cross-entropy loss (CEL):
$${L_{SSL}} = CEL\{ y,C[{E_\theta }({p_1}),{E_\theta }({p_2})]\} ={-} \frac{1}{N}\sum\limits_i {\sum\limits_{c = 0}^7 {{y_{ic}} \cdot \log {C_{ic}}[{E_\theta }({p_1}),{E_\theta }({p_2})]} }$$
where N denotes the number of input training patches. As shown in Fig. 7(b), within a $3 \times 3$ patch grid, ${p_1}$ is the center patch and ${p_2}$ is any one of the surrounding patches. The classifier receives the embedded tensors of ${p_1}$ and ${p_2}$, encoded by ${E_\theta }({\cdot} )$, and predicts the relative position $C({\cdot} , \cdot )$ between ${p_1}$ and ${p_2}$. As shown in Fig. 7(a), the indicator function y of the relative position $\{ c\,|\,0 \le c \le 7,\;c \in \mathbb{N}\} $ is 1 at the true position and 0 elsewhere. The network structure of polarimetric SVDD is shown in Fig. 7(b). The $Encode{r_{small}}$ consists of 8 convolutional layers: Conv(32, 3, 2, 0) – Conv(64, 3, 1, 0) – Conv(128, 3, 1, 0) – Conv(128, 3, 1, 0) – Conv(64, 3, 1, 0) – Conv(32, 3, 1, 0) – Conv(32, 3, 1, 0) – Conv(64, 3, 1, 0). The $Encode{r_{big}}$ consists of 2 convolutional layers: Conv(128, 2, 1, 0) – Conv(64, 1, 1, 0). Conv(c, k, s, p) denotes a convolutional layer with c output channels, kernel size k ${\times} $ k, stride s, and padding p. The $Classifier$ consists of 2 linear layers with 128 output features and 1 normalization layer. The activation is LeakyReLU with slope 0.01. Finally, the total loss combines the two losses with a scale factor $\lambda $:
$${L_{total}} = \lambda {L_{P - SVDD}} + {L_{SSL}}$$
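The three losses of Eqs. (11)–(13) can be sketched in PyTorch as follows. The encoder is assumed to output flat (batch, d) embeddings, and feeding the classifier the concatenation of the two embeddings is one simple design choice, not necessarily the authors' exact wiring.

```python
import torch
import torch.nn.functional as F

def patch_svdd_loss(enc, p, p_near):
    """Eq. (11): pull embeddings of spatially adjacent patches together."""
    return torch.norm(enc(p) - enc(p_near), dim=1).sum()

def ssl_loss(enc, clf, p1, p2, y):
    """Eq. (12): 8-way classification of the position of p2 around p1."""
    logits = clf(torch.cat([enc(p1), enc(p2)], dim=1))  # (batch, 8)
    return F.cross_entropy(logits, y)  # y holds class indices 0..7

def total_loss(enc, clf, p1, p2, y, lam=1e-3):
    """Eq. (13): weighted combination with scale factor lambda."""
    return lam * patch_svdd_loss(enc, p1, p2) + ssl_loss(enc, clf, p1, p2, y)
```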

Moreover, hierarchical encoding is added to handle size variation:

$${e_{big}}(p) = {E_{big}}(Cat({E_{small}}({p_s})))$$

where ${E_{small}}({\cdot} )$ denotes the small encoder, ${E_{big}}({\cdot} )$ denotes the big encoder, and $Cat({\cdot} )$ denotes concatenation of the embedding codes. Every patch is divided into $2 \times 2$ sub-patches ${p_s}$, which are encoded by the small encoder and concatenated into the small embedding codes ${e_{small}}$; these are then encoded together by the big encoder to produce the big embedding codes ${e_{big}}$. The receptive fields of the small and big encoders were set to 32 and 64, respectively. Both ${e_{small}}$ and ${e_{big}}$ are used to calculate the total loss ${L_{total}}$.
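A sketch of Eq. (14), assuming 64×64 patches, 32×32 sub-patches, and a small encoder that outputs a (batch, d, 1, 1) code so that the 2×2 spatial concatenation matches the Conv(128, 2, 1, 0) input of the big encoder:

```python
import torch

def encode_big(e_small, e_big, patch):
    """Eq. (14): hierarchical encoding of a 64x64 patch."""
    subs = [patch[:, :, 32*i:32*(i+1), 32*j:32*(j+1)]
            for i in range(2) for j in range(2)]    # 2x2 grid of sub-patches
    codes = [e_small(s) for s in subs]              # each (b, d, 1, 1)
    top = torch.cat(codes[:2], dim=3)               # concatenate along width
    bottom = torch.cat(codes[2:], dim=3)
    return e_big(torch.cat([top, bottom], dim=2))   # (b, d, 2, 2) -> big encoder
```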

Fig. 7. (a) Indicator function y of relative positions 0 to 7 used in self-supervised learning. (b) SVDD network of the proposed method.

We set the embedding dimension of the model to 64 to yield better inspection results. During training, the adaptive moment estimation (Adam) [29] optimizer was used to optimize the loss function, with the scale factor $\lambda $ set to 0.001. The batch size was 64, all input images were resized to $3 \times 256 \times 256$, and the initial learning rate was 0.0001. We trained the model for 300 epochs on an NVIDIA GeForce RTX 2080 Ti GPU.
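The reported hyper-parameters translate into a conventional training loop. The sketch below reuses the loss helpers above; `PolarSVDD` and `train_loader` are placeholders for the network and the patch-pair sampler, which the text does not fully specify.

```python
import torch

model = PolarSVDD(embed_dim=64)  # placeholder wrapping the encoders and classifier
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for epoch in range(300):
    for patches, neighbors, positions in train_loader:  # pairs from 3x256x256 images
        loss = total_loss(model.encoder, model.classifier,
                          patches, neighbors, positions, lam=1e-3)
        opt.zero_grad()
        loss.backward()
        opt.step()
```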

The flow chart of the proposed method is shown in Fig. 8. The input low illuminated polarized images were first demodulated and enhanced, and then used to train the Patch SVDD model. To evaluate whether Patch SVDD fits the MEMS dataset, ground truth images were used to train the model ahead of time. To verify the effectiveness of the polarization demodulation algorithm, images without enhancement (the ${S_0}$ component of the Stokes Vector) were also used to train the model.

Fig. 8. Flow chart of the proposed polarimetric SVDD.

3. Experimental results and discussions

The total loss and the area under the receiver operating characteristic curve (AUROC) during training on the polarimetric enhanced images are shown in Fig. 9. Figure 9(a) plots the training loss (Train) and validation loss (Val). Both losses decrease as training proceeds. After approximately 200 epochs, the training loss stops decreasing and stays steady around 0.1; the validation loss is slightly higher than the training loss and does not rise again. In other words, the model training is stable after 200 epochs and the model has not overfitted.

Fig. 9. Total loss and area under the receiver operating characteristic curve (AUROC) during training on polarimetric enhanced images. (a) Training loss (Train) and validation loss (Val). (b) AUROC.

It is common to use the receiver operating characteristic curve (ROC) to describe the performance of a binary classification model, and the area under the ROC (AUROC) to quantify it [26]. In our dataset, normal samples are positive (P) and abnormal samples are negative (N). The model predicts whether input samples are positive or negative, and each prediction is either true (T) or false (F); predicted results can thus be described as true positive (TP), true negative (TN), false positive (FP), or false negative (FN). The true positive rate (TPR) is the proportion of true positives among all positive samples:

$$TPR = \frac{{TP}}{{TP + FN}}$$

Likewise, the false positive rate (FPR) is the proportion of false positives among all negative samples:

$$FPR = \frac{{FP}}{{FP + TN}}$$

For a selected threshold, an input sample is predicted positive if the score given by the classifier exceeds the threshold, and negative otherwise. TPR and FPR change as different thresholds are selected; by sweeping the threshold, multiple (FPR, TPR) pairs are obtained and the ROC is plotted with FPR on the x axis and TPR on the y axis. The area under the ROC (AUROC) judges the performance of a classifier: the closer the AUROC is to 1, the stronger the classification ability. According to the Wilcoxon-Mann-Whitney statistic [30], if a normal sample and an abnormal sample are randomly selected from the dataset, the AUROC equals the probability that the abnormal sample receives a higher anomaly score than the normal sample:

$$AUROC[{s_\theta }] = P[{s_\theta }(normal) < {s_\theta }(abnormal)]$$
where ${s_\theta }$ is the anomaly score function for an image or feature map, measuring the Euclidean distance $||\cdot |{|_2}$ as shown in Eq. (18):
$${s_\theta }(p) = ||{E_\theta }(p) - {E_\theta }({p_{normal}})|{|_2}$$

where p and ${p_{normal}}$ denote the input patch and the stored nearest normal patch, and ${E_\theta }({\cdot} )$ is the encoder with trained parameters $\theta $. The trained model stores the features of patches at different locations. When it receives a test image, the L2 distance between each test patch and its nearest normal patch is calculated as the anomaly score of the patch. The score of the whole image, $s_\theta ^{image}$, is the maximum over all patch scores:

$$s_\theta ^{image} = \mathop {\max }\limits_{i,j} ({e_{small}}(p) \odot {e_{big}}(p))$$

where ${e_{small}}(p)$ and ${e_{big}}(p)$ denote the embedding codes generated by the small and big encoders, respectively, and ${\odot} $ is element-wise multiplication.
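Under one plausible reading of Eqs. (18)–(19), each encoder yields a patch-level anomaly map (nearest-normal-patch distances) and the two maps are combined element-wise; a NumPy sketch of that interpretation:

```python
import numpy as np

def patch_score_map(test_feats, normal_feats):
    """Eq. (18): L2 distance from each test patch embedding (n, d) to its
    nearest stored normal patch embedding (m, d)."""
    d2 = ((test_feats[:, None, :] - normal_feats[None, :, :]) ** 2).sum(-1)
    return np.sqrt(d2.min(axis=1))

def image_score(map_small, map_big):
    """Eq. (19): element-wise product of the two score maps, then maximum."""
    return float((map_small * map_big).max())
```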

Thus, the AUROC of the test set can be calculated by Eq. (17).
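Given per-image anomaly scores, the Wilcoxon-Mann-Whitney form of Eq. (17) can be evaluated directly; this sketch counts how often an abnormal sample outscores a normal one over all pairs, with ties counted as one half:

```python
import numpy as np

def auroc(scores_normal, scores_abnormal):
    """Pairwise Wilcoxon-Mann-Whitney estimate of the AUROC, Eq. (17)."""
    sn = np.asarray(scores_normal, dtype=float)[:, None]
    sa = np.asarray(scores_abnormal, dtype=float)[None, :]
    pairs = sn.size * sa.size
    return ((sn < sa).sum() + 0.5 * (sn == sa).sum()) / pairs
```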

Figure 9(b) shows the evolution of the AUROC during model training. At the beginning, the AUROC is about 0.55, where 0.5 corresponds to a random classifier. As the epochs increase, the AUROC rises rapidly; at about 80 epochs it nearly reaches 1.0 and remains stable until training ends. The maximum AUROC is 0.996.

Figure 10 shows the results of three groups of experiments. Rows from top to bottom show results from ground truth images (GT), low illuminated images (Low), and polarimetric enhanced images (Enhanced). Each row contains the input, the mask label, the model prediction, and the histogram of the input. Defect positions are outlined in red in the mask label. The prediction performs anomaly segmentation by drawing a heatmap from the anomaly scores of the patches; darker regions indicate higher anomaly scores. The histogram counts the number of pixels at each gray level in the input image, with pixel values from 0 to 255 on the horizontal axis and pixel counts on the vertical axis.

Fig. 10. Comparison of labels, predictions, and input histograms among ground truth images (GT), low illuminated images (Low), and polarimetric enhanced images (Enhanced).

Ground truth images have a relatively wide pixel range of [0, 105] with a uniform pixel distribution. Because of the large pixel range and distinctive features, Patch SVDD correctly scores the abnormal patches and generates defect heatmaps when it receives polarimetric enhanced or ground truth images. In contrast, the pixel range of the low illuminated images is narrow and concentrated near 0: in the histogram, pixel values lie within [0, 13] and most equal 2. Almost the whole low illuminated image is therefore black, making the differences between defects and background inconspicuous. Some background patches are identified as outliers by Patch SVDD, producing high scores in normal regions and making anomaly segmentation in the heatmaps difficult. After enhancement via polarized images, the pixel range of the inputs is extended to [0, 255]; the polarimetric enhanced images allow identifying the periodic background structure and the white abnormal regions.

More anomaly detection results for normal and abnormal samples are shown in Fig. 11. The model trained on ground truth images predicts defect locations exactly. For low illuminated images, some background regions are easily judged as anomalies owing to the narrow pixel range and inconspicuous features. Polarimetric enhanced images yield darker patches in the defect regions, making defects easier to detect. The predictions generated from images enhanced with the proposed method are mostly consistent with those from ground truth images.

Fig. 11. Comparison of predictions among GT, Low, and Enhanced, including abnormal samples a, b, c and normal samples d, e, f.

Normal samples containing only the periodic background were also tested. It is worth noting that ground truth normal samples cannot be accurately predicted by Patch SVDD; the reason is analyzed below. Low illuminated images are still scored incorrectly, resulting in high anomaly scores in most normal regions and correspondingly dark heatmaps. The prediction map of the polarimetric enhanced images is almost white, indicating that the input samples were correctly judged as normal. Combined with the results on abnormal samples, the proposed enhancement method meets the demand of distinguishing normal and abnormal regions in images.

The poor performance on ground truth normal samples is closely related to the original design of the network. Patch SVDD divides an input image into multiple patches so that the model can score patches rather than a single image, and it relies on self-supervised learning to predict the relative positions among patches. These techniques improve performance when the background is an object (e.g., bottle, toothbrush), but yield minimal improvements on texture backgrounds (e.g., carpet, wood). When the pattern of a normal sample is repetitive and periodic, patches from different positions have almost identical features, which makes the self-supervised task ineffective. In short, the incomplete translation invariance of Patch SVDD leads to errors when generating heatmaps of ground truth normal samples. By comparison, low illumination compresses the pixel range and destroys the basic periodic structure; the proposed enhancement method widens the pixel range but does not completely restore the periodic structure. Patch SVDD is therefore effective on polarimetric enhanced normal samples.

Table 2 shows the overall AUROC for the contrast experiments we conducted. The ground truth group (GT), low illuminated group (Low), and polarimetric enhanced group (Enhanced) achieved segmentation scores of 0.995, 0.814, and 0.996, respectively. The low illuminated group scores lower because of the classification difficulty and incorrect heatmap generation, while each of the other two groups behaves as an almost perfect binary classifier. The handling of periodic structures in the enhanced group leads to slightly better performance than the ground truth group, which is reflected in its slightly higher AUROC.

Table 2. Detection and segmentation AUROC of GT, Low, and Enhanced, with inputs being the feature maps generated by ${E_{big}}(p)$, ${E_{small}}(p)$, and ${E_{small}}(p) \odot {E_{big}}(p)$

4. Conclusion

In conclusion, a low illumination anomaly detection method based on polarimetric information and the unsupervised SVDD model is proposed, targeting MEMS surface defects. Enhancing the low illuminated images expands the pixel range and reconstructs the defect-free background. The experimental results verify that the proposed method markedly improves the prediction performance of SVDD compared with using the low illuminated images directly. It also performs well on normal samples with repetitive, periodic backgrounds, where the original Patch SVDD struggles.

Funding

National Key Research and Development Program of China (2018YFF01013203).

Acknowledgments

Experiment equipment was provided by School of Precision Instrument and Opto-electronics Engineering, Tianjin University.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. A. Markman and B. Javidi, “Integrated circuit authentication using photon-limited x-ray microscopy,” Opt. Lett. 41(14), 3297–3300 (2016). [CrossRef]  

2. A. Markman, X. Shen, and B. Javidi, “Three-dimensional object visualization and detection in low light illumination using integral imaging,” Opt. Lett. 42(16), 3068–3071 (2017). [CrossRef]  

3. A. Markman and B. Javidi, “Learning in the dark: 3D integral imaging object recognition in very low illumination conditions using convolutional neural networks,” OSA Continuum 1(2), 373–383 (2018). [CrossRef]  

4. A. Markman, T. O’Connor, H. Hotaka, S. Ohsuka, and B. Javidi, “Three-dimensional integral imaging in photon-starved environments with high-sensitivity image sensors,” Opt. Express 27(19), 26355–26368 (2019). [CrossRef]  

5. B. Huang, T. Liu, J. Han, and H. Hu, “Polarimetric target detection under uneven illumination,” Opt. Express 23(18), 23603–23612 (2015). [CrossRef]  

6. J. Guan and J. Zhu, “Target detection in turbid medium using polarization-based range-gated technology,” Opt. Express 21(12), 14152–14158 (2013). [CrossRef]  

7. L. Zhang, Z. Yin, K. Zhao, and H. Tian, “Lane detection in dense fog using a polarimetric dehazing method,” Appl. Opt. 59(19), 5702–5707 (2020). [CrossRef]  

8. X. Li, P. Han, F. Liu, Y. Wei, X. Shao, X. Guangzhang, D. Wang, and Li, “Imaging through haze utilizing a multi-aperture coaxial polarization imager,” in Frontiers in Optics / Laser Science, OSA Technical Digest (Optical Society of America, 2018), paper JW4A.136.

9. L. Shen, M. Reda, and Y. Zhao, “Image-matching enhancement using a polarized intensity-hue-saturation fusion method,” Appl. Opt. 60(13), 3699–3715 (2021). [CrossRef]  

10. X. Li, H. Hu, L. Zhao, H. Wang, Y. Yu, L. Wu, and T. Liu, “Polarimetric image recovery method combining histogram stretching for underwater imaging,” Sci Rep 8(1), 12430 (2018). [CrossRef]  

11. F. Liu, P. Han, Y. Wei, K. Yang, S. Huang, X. Li, G. Zhang, L. Bai, and X. Shao, “Deeply seeing through highly turbid water by active polarization imaging,” Opt. Lett. 43(20), 4903–4906 (2018). [CrossRef]  

12. H. Hu, Y. Zhang, X. Li, Y. Lin, Z. Cheng, and T. Liu, “Polarimetric underwater image recovery via deep learning,” Opt. Laser Eng 133, 106152 (2020). [CrossRef]  

13. T. Liu, Z. Guan, X. Li, Z. Cheng, Y. Han, J. Yang, K. Li, J. Zhao, and H. Hu, “Polarimetric underwater image recovery for color image with crosstalk compensation,” Opt. Laser Eng 124, 105833 (2020). [CrossRef]  

14. H. Hu, Y. Lin, X. Li, P. Qi, and T. Liu, “IPLNet: a neural network for intensity-polarization imaging in low light,” Opt. Lett. 45(22), 6162–6165 (2020). [CrossRef]  

15. T. Schlegl, P. Seebock, S. M. Waldstein, U. Schmidt-Erfurth, and G. Langs, “Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery,” Information Processing in Medical Imaging 10265, 146–157 (2017). [CrossRef]  

16. S. Mei, H. Yang, and Z. P. Yin, “An Unsupervised-Learning-Based Approach for Automated Defect Inspection on Textured Surfaces,” IEEE Trans. Instrum. Meas. 67(6), 1266–1277 (2018). [CrossRef]  

17. S. Akcay, A. Atapour-Abarghouei, and T. P. Breckon, “GANomaly: Semi-supervised Anomaly Detection via Adversarial Training,” in Computer Vision – ACCV 2018 (Springer, 2019).

18. T. Schlegl, P. Seebock, S. M. Waldstein, G. Langs, and U. Schmidt-Erfurth, “f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks,” Medical Image Analysis 54, 30–44 (2019). [CrossRef]  

19. M. M. Moya, M. W. Koch, and L. D. Hostetler, “One-class classifier networks for target recognition applications,” (1993).

20. B. Scholkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson, “Estimating the support of a high-dimensional distribution,” Neural Computation 13(7), 1443–1471 (2001). [CrossRef]  

21. D. M. J. Tax and R. P. W. Duin, “Support Vector Data Description,” Machine Learning 54(1), 45–66 (2004). [CrossRef]  

22. K. Sjostrand, M. S. Hansen, H. B. Larsson, and R. Larsen, “A path algorithm for the support vector domain description and its application to medical imaging,” Medical Image Analysis 11(5), 417–428 (2007). [CrossRef]  

23. L. Ruff, R. Vandermeulen, N. Görnitz, L. Deecke, S. Siddiqui, A. Binder, E. Müller, and M. Kloft, “Deep One-Class Classification,” in Proceedings of the 35th International Conference on Machine Learning (2018).

24. P. Liznerski, L. Ruff, R. A. Vandermeulen, B. J. Franks, M. Kloft, and K.-R. Müller, “Explainable Deep One-Class Classification,” arXiv:2007.01760 (2020).

25. J. Yi and S. Yoon, “Patch SVDD: Patch-level SVDD for Anomaly Detection and Segmentation,” arXiv:2006.16067 (2020).

26. T. Calders and S. Jaroszewicz, “Efficient AUC Optimization for Classification,” in Knowledge Discovery in Databases: PKDD 2007 (Springer Berlin Heidelberg, 2007), 42–53.

27. P. Bergmann, M. Fauser, D. Sattlegger, and C. Steger, “MVTec AD — A Comprehensive Real-World Dataset for Unsupervised Anomaly Detection,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2019), pp. 9584–9592.

28. X. Li, H. Hu, F. Goudail, and T. Liu, “Fundamental precision limits of full Stokes polarimeters based on DoFP polarization cameras for an arbitrary number of acquisitions,” Opt. Express 27(22), 31261–31272 (2019). [CrossRef]  

29. D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv:1412.6980 (2014).

30. J. A. Hanley and B. J. McNeil, “The Meaning and Use of the Area under a Receiver Operating Characteristic (ROC) Curve,” Radiology 143(1), 29–36 (1982). [CrossRef]
