
Detecting vibrations in digital holographic multiwavelength measurements using deep learning

Open Access

Abstract

Digital holographic multiwavelength sensor systems integrated in the production line on multi-axis systems such as robots or machine tools are exposed to unknown, complex vibrations that affect the measurement quality. To detect vibrations during the early steps of hologram reconstruction, we propose a deep learning approach using a deep neural network trained to predict the standard deviation of the hologram phase. The neural network achieves 96.0% accuracy when confronted with training-like data while it achieves 97.3% accuracy when tested with data simulating a typical production environment. It performs similarly to, or even better than, comparable classical machine learning algorithms. A single prediction of the neural network takes 35 µs on the GPU.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. INTRODUCTION

The demands for quality control of manufactured parts during production are constantly rising. At the same time, production lines must be more flexible to handle a wide variety of parts and respond quickly to incoming orders. With increasing digitalization, production steps are becoming more interconnected, and more data are being collected. One approach to optimize capacity utilization is order-controlled production (OCP) [1]. However, to ensure quality, more measurements are needed, requiring faster and more precise sensor systems that can be integrated into the processing steps [2]. Digital holography (DH) is a technique that can meet these requirements [3–6] but is very sensitive to vibration due to its interferometric measurement principle [7]. While digital holographic sensors have already proven their applicability when exposed to harsh conditions and complex vibrations [3,4], these complex vibrations still influence the measurement quality and can lead to bad results. Assuming that the exposure times are short compared to the time interval between successive camera images, axial vibrations mainly generate errors in the phase steps. These phase step errors result in errors of the measured phase images. The ideal phase step distribution for three phase steps is a step difference of $2\pi /3$; thus, phase step errors up to this range are critical. In that case, the phase step reconstruction cannot be performed correctly because two of the three phase steps will be the same, while the algorithm used requires at least three different phase steps. This range corresponds to vibration amplitudes of $\lambda /6$. To enhance the usability of such sensors in production, we need to detect or compensate for vibrations to make the measurements more reliable.
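This correspondence follows from the double-pass geometry of a measurement in reflection: an axial displacement $\Delta z$ between two camera frames changes the optical path by $2\Delta z$, so the induced phase step error is

$$\Delta \alpha = \frac{4\pi}{\lambda}\,\Delta z, \qquad \Delta z = \frac{\lambda}{6} \;\Rightarrow\; \Delta \alpha = \frac{2\pi}{3},$$

which is the relation behind the critical vibration amplitude of $\lambda/6$ stated above.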

Methods to compensate for vibrations in phase-shifting interferometry setups are typically split into active and passive measures. Active measures try to measure the occurring vibrations and compensate for them through adjustments of the interferometry setup, for instance by manipulating the piezoelectric transducer (PZT) commonly used for the phase shifting [8–10] or other phase shifting devices [11]. Other solutions have the same objective but do not actively influence the interferometer’s internal setup, such as placing the setup on a damped optical table. Passive measures, on the other hand, do not actively intervene in the interferometer setup. Instead, they try to minimize the influence of vibration through other approaches, e.g., iterative ones [12,13], spectral analysis [14], random phase shifting [15], or single-shot approaches [16,17]. Nowadays, with the availability of large data sets, the use of machine learning (ML) and deep learning (DL) [18] algorithms is a common approach. These DL algorithms are based on different kinds of neural network architectures relying on the principal ideas of multilayer perceptrons (MLPs) based on the perceptron [19] or convolutional neural networks (CNNs) [20]. Although such algorithms are used in many research fields, their adoption in optical metrology [21] and in digital holography [22] is a more recent development. Nevertheless, applications of DL in DH primarily focus on reconstruction [23], focus prediction [24–26], or denoising [27]. We are not aware of an application for vibration detection in DH yet.

In [6], we have shown that a laterally resolved evaluation of the phase steps can improve the measurement quality during vibrations to a certain degree. During this evaluation, we compute parameters that should be able to indicate vibrations in the measurements. This suggests that a detection algorithm can be implemented without the need to change the existing setup. As we can easily create a large data set of measurements, we decide to use deep learning for the detection. Thus, with this work, we present the latest developments in our cascaded data processing [6,28], where we propose using deep learning in the form of a deep neural network to detect vibrations in the measurement data during the early steps of reconstruction. We use our HoloTop NX sensor [29] for multiwavelength DH measurements in reflection.

Detecting vibrations in the raw measurement data before full evaluation is particularly beneficial because the steps of object wave reconstruction and object wave propagation are computationally expensive with computation times in the range of several hundred milliseconds on modern GPUs. Thus, we want to detect vibrations as early as possible to make recapturing of a new data set feasible.

In Section 2, we present the sensor system we are using and the principles of the hologram acquisition. Furthermore, we propose a deep learning approach for vibration detection. Section 3 covers the experimental setup for data generation, the data processing, and the test data sets. The results of the training and testing are presented in Section 4. Finally, the conclusion is given in Section 5.

2. METHODS

A. Sensor System

We use our digital holographic sensor HoloTop NX [29]. Thanks to its compact setup, it can perform measurements in machine tools or on robots (Fig. 1). The sensor has a diameter of 125 mm, a height of about 200 mm, and a weight of around 2 kg. It has a field of view (FoV) of $12.5\times 12.5\;{\rm mm}^2$ and can process up to 20 million 3D points per second [30].

Fig. 1. HoloTop NX sensor. (a) Integration in a machine tool. (b) Measurements on a collaborative robot.

B. Hologram Acquisition

We use temporal phase shifting to reconstruct the complex object wave. For each wavelength $\lambda$, we capture three interferograms ${I_{\lambda ,n}}(x,y)$ with $n = 0,1,2$. For each interferogram, the sensor’s internal piezoelectric transducer (PZT) generates a phase step ${\alpha _n}(x,y)$. These phase steps are approximately ${\alpha _n}(x,y) = 2\pi n/3$. Random axial vibration introduces an unknown phase shifting error $\Delta \alpha$; the real phase steps are thus ${\tilde \alpha _n}(x,y) = {\alpha _n}(x,y) + \Delta \alpha$. In our investigation, we focus on axial vibrations because they act along the direction of the interferometer’s sensitivity vector. Given the sensor’s lateral sampling of 4 µm [29] and exposure times of less than 500 µs (e.g., 100 µs for the conducted test measurements), lateral vibrations would have to be large compared to the critical axial vibrations in the range of nanometers to have an effect. It is unlikely that lateral vibrations on the order of multiple micrometers occur while the axial vibrations remain below 100 nm.

To reconstruct the complex object wave, we first calculate the real phase steps ${\tilde \alpha _n}(x,y)$ using the algorithm proposed by Cai et al. [31]. Four constants $p$, $q$, $r$, and $c$ are defined in Eq. (1), where ${I_n}$ denotes the pixelwise interferogram intensity corresponding to the $n$th phase step, $|x|$ denotes the absolute value of $x$, and $\langle x\rangle$ is the average over all values of $x$ [31],

$$\begin{split}p& = { \langle |{I_1} - {I_0}|\rangle},\\q& = { \langle |{I_2} - {I_1}|\rangle},\\r& = { \langle |{I_2} - {I_0}|\rangle},\\c& = { 2pqr{{\left[{2({p^2}{q^2} + {p^2}{r^2} + {q^2}{r^2}) - ({p^4} + {q^4} + {r^4})} \right]}^{- 1/2}}.}\end{split}$$

Eventually, the phase steps ${\alpha _1}$ and ${\alpha _2}$ can be calculated by Eq. (2),

$${\alpha _1} = 2{\sin}^{- 1} \left({\frac{p}{c}} \right),\quad {\alpha _2} = 2 {\sin}^{- 1} \left({\frac{q}{c}} \right).$$

The phase step ${\alpha _0}$ is assumed to be zero as the steps are relative to each other. To achieve a better and spatially resolved reconstruction of the phase steps, we proposed the adaptation of the algorithm to a reconstruction in ${16} \times {16}$ regions of interest (ROIs) [6]. The resolution of the sensor is ${3008}\;{\rm px} \times {3008}\;{\rm px}$, which yields ${188}\;{\rm px} \times {188}\;{\rm px}$ for each ROI.
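A minimal NumPy sketch of this ROI-wise phase step estimation, following Eqs. (1) and (2), is shown below. The 16 × 16 ROI grid and the sensor resolution are taken from the text; the function name, the loop-based implementation, and the clipping that guards the arcsine against numerical noise are our own choices.

```python
import numpy as np

def phase_steps_per_roi(I0, I1, I2, n_roi=16):
    """Estimate the phase steps alpha_1 and alpha_2 per ROI (Cai et al., Eqs. (1)-(2)).

    I0, I1, I2: interferograms of shape (3008, 3008) for one wavelength, as float arrays.
    Returns two (n_roi, n_roi) arrays of phase steps in radians.
    """
    h, w = I0.shape
    rh, rw = h // n_roi, w // n_roi                    # 188 x 188 px per ROI
    alpha1 = np.zeros((n_roi, n_roi))
    alpha2 = np.zeros((n_roi, n_roi))
    for i in range(n_roi):
        for j in range(n_roi):
            sl = (slice(i * rh, (i + 1) * rh), slice(j * rw, (j + 1) * rw))
            p = np.mean(np.abs(I1[sl] - I0[sl]))       # Eq. (1)
            q = np.mean(np.abs(I2[sl] - I1[sl]))
            r = np.mean(np.abs(I2[sl] - I0[sl]))
            c = 2 * p * q * r / np.sqrt(
                2 * (p**2 * q**2 + p**2 * r**2 + q**2 * r**2)
                - (p**4 + q**4 + r**4))
            alpha1[i, j] = 2 * np.arcsin(np.clip(p / c, -1, 1))   # Eq. (2)
            alpha2[i, j] = 2 * np.arcsin(np.clip(q / c, -1, 1))
    return alpha1, alpha2
```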

Greivenkamp [32] proposed a method to calculate the phase of the complex object wave $\varphi (x,y)$ using the interferograms ${I_n}(x,y)$ and the phase steps ${\tilde \alpha _n}(x,y)$. With the auxiliary constants ${a_0}(x,y)$, ${a_1}(x,y)$, and ${a_2}(x,y)$ introduced in [32], we can write the interferogram equation in the form of Eq. (3),

$$\begin{split}{I_n}(x,y) &= {a_0}(x,y) + {a_1}(x,y)\cos ({\tilde \alpha _n}(x,y))\\&\quad + {a_2}(x,y)\sin ({\tilde \alpha _n}(x,y)).\end{split}$$

This yields the complex wave ${C_\lambda}(x,y) = {a_1}(x,y) + i{a_2}(x,y)$, where $i$ denotes the imaginary unit. Now we can calculate the phase $\varphi (x,y)$ at each location of the interferogram with Eq. (4):

$$\varphi (x,y) = {\tan}^{- 1} \left({\frac{{{a_2}(x,y)}}{{{a_1}(x,y)}}} \right).$$
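A compact sketch of this phase reconstruction, assuming the phase steps are treated as constant within the region being evaluated, is given below. The per-pixel least-squares solution of Eq. (3) and the use of the quadrant-aware arctangent in place of the plain $\tan^{-1}$ of Eq. (4) are our implementation choices.

```python
import numpy as np

def reconstruct_phase(frames, alphas):
    """Solve Eq. (3) per pixel and return the phase of Eq. (4).

    frames: (3, H, W) interferograms I_n as float arrays;
    alphas: length-3 array of phase steps (rad) for this region.
    Returns the wrapped phase phi(x, y) and the complex wave C_lambda(x, y).
    """
    alphas = np.asarray(alphas, dtype=float)
    A = np.stack([np.ones(3), np.cos(alphas), np.sin(alphas)], axis=1)  # 3 x 3 system matrix
    I = frames.reshape(3, -1)                                           # one column per pixel
    a0, a1, a2 = np.linalg.lstsq(A, I, rcond=None)[0]                   # least-squares fit of Eq. (3)
    phase = np.arctan2(a2, a1).reshape(frames.shape[1:])                # Eq. (4)
    C = (a1 + 1j * a2).reshape(frames.shape[1:])                        # complex object wave
    return phase, C
```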

Due to speckle, the hologram phase $\varphi (x,y)$, which carries the information about the height of the object, consists of noise for a single wavelength and thus cannot be used directly. In addition, the unambiguous measurement range is $\lambda /2$ for measurements in reflection, as performed here. For a single wavelength in the range of 500 nm, this is too small to capture the whole height information of an object. To measure such rough, speckle-producing objects, multiwavelength DH is used as proposed by Wagner et al. [33]. We combine several measurements at different wavelengths, e.g., in the range of 514–517 nm, to generate multiple synthetic wavelengths $\Lambda$ in the range of 88–767 µm. Thereby, we extend the measurement range while keeping the precision of the smallest synthetic wavelength used. It has to be considered that large synthetic wavelengths suffer from magnified phase noise, which more recent methods [34] address. The upper half of Fig. 2 shows the described processing steps; for illustration purposes, these steps are shown with a coin sample.
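For reference, two single wavelengths $\lambda_i$ and $\lambda_j$ produce a synthetic wavelength according to the usual two-wavelength relation; the numerical values below are illustrative picks from the stated 514–517 nm range:

$$\Lambda_{ij} = \frac{\lambda_i \lambda_j}{|\lambda_i - \lambda_j|}, \qquad \lambda_i = 514\;{\rm nm},\ \lambda_j = 517\;{\rm nm} \;\Rightarrow\; \Lambda_{ij} \approx 88.6\;\mu{\rm m}.$$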

Fig. 2. Diagram of the processing steps involved in hologram acquisition and the proposed DL approach. The upper half of the figure shows the processing steps of hologram acquisition described in Section 2.B; for illustration purposes, these steps are shown with a coin as the sample. For each wavelength ${\lambda _i}$, we capture three interferograms that are then used for phase shift reconstruction. Afterward, the complex object wave is reconstructed and propagated, and synthetic wavelengths are computed. This result is filtered, unwrapped, and finally evaluated. The lower half shows the proposed DL approach with a graphical representation of the proposed DNN and the depth of its hidden layers; the activation functions are not shown. Additionally, it shows where the features are extracted and how the prediction of the network is further evaluated for classification of the measurement.

C. Proposed Deep Learning Approach

We propose to use a deep neural network (DNN) with fully connected layers like the MLP as a regressor [35] to predict the standard deviation of the hologram phase ${\sigma _\varphi}$. When using a flat training sample such as a mirror, the phase $\varphi (x,y)$ is flat as well; thus, the standard deviation ${\sigma _\varphi}$ is small. If vibrations occur, they influence the interferograms and the introduced phase steps ${\tilde \alpha _n}(x,y)$ as described in Section 2.B. Equations (2)–(4) show that the interferograms as well as the phase steps are directly related to $\varphi (x,y)$. Therefore, a measurement influenced by vibrations leads to a phase $\varphi (x,y)$ that is not as smooth as without vibrations. This means ${\sigma _\varphi}$ also increases and, thus, directly indicates vibrations for a flat surface. Using a mirror as the training sample enables direct evaluation of single-wavelength measurements without the need for synthetic wavelength computation, as no speckle occurs. Further, the target ${\sigma _\varphi}$ can be calculated directly from the measurements captured with the setup described in Section 3.A without the need to manually label the data. Because of that, we select the regression task in favor of direct classification. In general, the network learns to predict ${\sigma _\varphi}$ without reconstructing the wavefront from the hologram. It also learns to associate the features of good measurements with low values of ${\sigma _\varphi}$ and bad ones with high values. The decision boundary between good and bad can then be set individually depending on the application.

To be independent of the appearance of the samples measured by the sensor, we choose to use the standard deviations $\sigma$ and mean values $\mu$ over the computed ROIs of the parameters $p$, $q$, $r$, $c$ and the phase steps ${\alpha _1}$ and ${\alpha _2}$. In addition, we compute $\sigma$ and $\mu$ of the gray value intensities over the complete single frames of the raw interferogram data. All these features form the input vector ${{\boldsymbol x}_{{\rm DNN}}}$ of the DNN, shown in Eq. (5),

$$\begin{split}&{{\boldsymbol x}_{{\rm DNN}}}\\& = [{\mu _p} {\sigma _p} {\mu _q} {\sigma _q} {\mu _r} {\sigma _r} {\mu _c} {\sigma _c} {\mu _{{\alpha _1}}} {\sigma _{{\alpha _1}}} {\mu _{{\alpha _2}}} {\sigma _{{\alpha _2}}} {\mu _{{I_1}}} {\sigma _{{I_1}}} {\mu _{{I_2}}} {\sigma _{{I_2}}} {\mu _{{I_3}}} {\sigma _{{I_3}}}].\end{split}$$
The DNN consists of three fully connected hidden layers with ReLU activation functions that map to a single output $y$ without an activation function. The hidden layers have a depth of 256 neurons for the first layer, 128 neurons for the second layer, and 32 neurons for the last layer before the output neuron. Moreover, we use dropout during the training of the network [36]. The lower half of Fig. 2 shows the basic graphical representation of the implemented network without the activation functions. Additionally, the input vector and the functionality of the classification are shown. Furthermore, it shows where we extract the features from the processing steps described in Section 2.B.
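A minimal PyTorch sketch of this architecture is shown below. The layer widths and ReLU activations follow the description above, while the dropout probability and its placement after each hidden layer are assumptions, as they are not stated in the text.

```python
import torch.nn as nn

class VibrationDNN(nn.Module):
    """Fully connected regressor predicting sigma_phi from the 18 features of Eq. (5)."""

    def __init__(self, n_features=18, p_drop=0.2):  # p_drop is an assumed value
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(256, 128), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(128, 32), nn.ReLU(), nn.Dropout(p_drop),
            nn.Linear(32, 1),  # single output y, no activation
        )

    def forward(self, x):
        # x: (batch, n_features) -> (batch,) predicted sigma_phi in rad
        return self.net(x).squeeze(-1)
```

For the hDNN described next, the same architecture can be reused with `n_features=16`, matching the length of the input vector in Eq. (6).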

In addition, we propose a different approach to using the DNN: instead of predicting the target based on the standard deviations and mean values, which yields a spatially unresolved prediction, we can use the DNN to predict the target for each ROI. To do this, we change the input vector to use the computed parameters of every ROI instead of the mean $\mu$ and standard deviation $\sigma$ over all ROIs. Additionally, we divide the raw frames into the same ROIs as the parameters and calculate the mean $\mu$ and standard deviation $\sigma$ of the raw frames in these ROIs, respectively. We also add the global standard deviation and mean of the phase steps to each input vector of the ROIs [Eq. (6)]. We call this a hybrid approach (hDNN) because we combine spatially resolved prediction with the proposed DNN. For the training, this approach has the advantage that the number of available samples is multiplied by the total number of ROIs per measurement (256). Due to the mirror sample, the target ${\sigma _\varphi}$ is nearly the same for each ROI. The features, on the other hand, do vary due to the Gaussian beam profile and its influence on the data. Thus, the evaluation with the ROI approach is not equivalent to using identical values in each ROI (see the sketch following the equation),

$${{\boldsymbol x}_{{\rm hDNN}}} = [p q r c {\alpha _1} {\alpha _2} {\mu _{{\alpha _1}}} {\sigma _{{\alpha _1}}} {\mu _{{\alpha _2}}} {\sigma _{{\alpha _2}}} {\mu _{{I_1}}} {\sigma _{{I_1}}} {\mu _{{I_2}}} {\sigma _{{I_2}}} {\mu _{{I_3}}} {\sigma _{{I_3}}}].$$
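A sketch of how the per-ROI input vectors of Eq. (6) could be assembled is given below, assuming the ROI-wise parameter maps from the phase step reconstruction are available; the function name and argument layout are our own.

```python
import numpy as np

def hdnn_features(p, q, r, c, a1, a2, frames, n_roi=16):
    """Build one 16-element input vector per ROI according to Eq. (6).

    p, q, r, c, a1, a2: (n_roi, n_roi) ROI-wise parameter maps;
    frames: (3, H, W) raw interferograms of one wavelength.
    Returns an array of shape (n_roi * n_roi, 16).
    """
    h, w = frames.shape[1:]
    rh, rw = h // n_roi, w // n_roi
    glob = [a1.mean(), a1.std(), a2.mean(), a2.std()]    # global phase step statistics
    feats = []
    for i in range(n_roi):
        for j in range(n_roi):
            roi = frames[:, i * rh:(i + 1) * rh, j * rw:(j + 1) * rw]
            frame_stats = [s for k in range(3) for s in (roi[k].mean(), roi[k].std())]
            feats.append([p[i, j], q[i, j], r[i, j], c[i, j],
                          a1[i, j], a2[i, j], *glob, *frame_stats])
    return np.asarray(feats)
```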

Since we do not have a reference to compare our results to, we use three classical ML algorithms for comparison. As the simplest one, we choose the least squares (LS) algorithm, and as more advanced algorithms, we choose the support vector machine (SVM) [37] and the random forest (RF) [38]. As a baseline for all of them, we use an estimator that always predicts the mean target value of the respective test data set.

3. DATA ACQUISITION AND PREPARATION

Since there are no suitable data sets available for our application, we must collect and prepare our own data set to train the two proposed DL approaches.

A. Experimental Setup

The detailed description of the HoloTop NX sensor and its optical setup can be found in [29]. To introduce an axial oscillation while measuring with the HoloTop NX sensor, we are using a PZT, on which we mount our sample. The PZT is driven by an amplifier, which is fed by a signal generator with which we can introduce the electrical oscillation signal. In addition, the PZT is mounted on an alignment system to align the sample toward the sensor. The sample we are using here is a mirror. This has the advantage of being able to directly evaluate the reconstruction result of the complex object wave without the need to calculate synthetic wavelengths. The complete setup shown in Fig. 3 is mounted on a damped optical table. We use different laser sources with wavelengths between 514 and 517 nm.

Fig. 3. Scheme of the oscillating measurement setup. Our mirror sample is mounted on the PZT, which is fed by a signal generator. The complete setup is placed on an optical table.

Fig. 4. Histogram of the computed targets ${\sigma _\varphi}$ for the first benchmark. All targets without vibration lie within the first bin. We use this bin to set the decision boundary of ${\sigma _\varphi}$ to 0.31 rad.

The introduced oscillation is a sine wave with a frequency of 84 Hz [39], as this is the dominant frequency in the machine tool in which the holographic sensor will be used for real measurement applications. The amplitude of the oscillation is varied between 39 and 116 nm. To have a baseline for the optimum data quality achievable with the optical setup, additional measurements are acquired without an oscillation. The frame rate of the sensor is 20 Hz.

B. Data Preparation

To use the acquired data for training, they must be preprocessed. The first step is a reference measurement whose reconstructed complex wave field in the hologram plane, ${C_\lambda}(x,y)$, is used to remove any tilt of the sample so that a plane surface is obtained. This tilt removal is done by multiplying the complex conjugate of the reference wave field with the current complex wave field of the measurement in the hologram plane. The same reference measurement is used for all samples because the measurement setup is static. Due to the oscillation, the result of the complex conjugate multiplication might still contain a small residual tilt of the complete wave field in the hologram plane. This is removed by applying a high-pass filter to the complex wave field in the hologram plane. After that, the target ${\sigma _\varphi}$ of the sample is computed directly from the reconstructed phase $\varphi (x,y)$ in the hologram plane, and the features are calculated during the corresponding reconstruction steps. A total of 15,000 sample measurements are generated out of 6000 single measurements, split into approximately 42% good samples, i.e., measurements with vibrations that do not observably affect the data quality, and 58% vibration samples.
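A sketch of how the regression target could be computed from one preprocessed measurement is shown below. The conjugate multiplication follows the description above, but the crude Fourier-domain high-pass and its cutoff are only one possible reading of the described high-pass step and are not taken from the text.

```python
import numpy as np

def target_sigma_phi(C_meas, C_ref, cutoff_px=2):
    """Compute the regression target sigma_phi for one training sample.

    C_meas, C_ref: complex wave fields (H, W) of the measurement and the
    static reference in the hologram plane. cutoff_px is an assumed value.
    """
    C = C_meas * np.conj(C_ref)                            # remove the static sample tilt
    F = np.fft.fftshift(np.fft.fft2(C))
    cy, cx = F.shape[0] // 2, F.shape[1] // 2
    F[cy - cutoff_px:cy + cutoff_px + 1,
      cx - cutoff_px:cx + cutoff_px + 1] = 0               # high-pass: suppress residual low-frequency content
    phi = np.angle(np.fft.ifft2(np.fft.ifftshift(F)))      # reconstructed phase
    return float(np.std(phi))                              # target sigma_phi in rad
```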

C. Test Data Sets

To evaluate the performance of the DNN and the other algorithms, we collect additional data. These data are used for testing purposes only. The first data set is captured using the mechanically oscillating setup with the same amplitude range and the frequencies of 10, 25, 50, 60, and 84 Hz. Additionally, we introduce a constant offset of 233 nm. The frame rate of the sensor is 20 Hz. In total, 600 samples are captured, of which 150 samples are acquired without external oscillation. We calculate the histogram of the computed targets ${\sigma _\varphi}$ shown in Fig. 4 to have a starting point for the selection of the decision boundary of the evaluation in Section 4.

We also want to test the algorithms with data simulating a typical production environment. Therefore, we use the same setup as before, but we do not introduce any external oscillation through the PZT. To simulate manufacturing environments with complex vibrations, we introduce them manually by stimulating the optical table with a hammer. Additionally, we use other laser sources with the wavelengths 632.855, 633.828, and 643.935 nm. In total, we acquired 100 measurements, stimulating the optical table 10 times, from which 300 test samples are calculated for this data set.

4. RESULTS

A. Train Results

We use PyTorch [40] to train the DNN on a computer equipped with a 9th generation Intel i7 CPU, an Nvidia RTX 2080 Ti GPU, and 32 GB RAM. On the same machine, we train the classical ML approaches using the scikit-learn [41] package. The data are split 70:30 into training and validation data. They consist of data captured with the setup described in Section 3.A, with the vibration data captured at a frequency of 84 Hz and oscillation amplitudes of 39–116 nm. In total, 15,000 samples are used for training and validation, with 42% representing samples without vibration. For the hDNN, $256 \times 15,\!000$ samples are used. The DNN is trained for 200 epochs with a batch size of 64 (or $64 \times 256$ for the hDNN). Adam [42] is used as the optimizer and the mean squared error (MSE) as the loss function to which the network is optimized. The learning rate is set to 0.001 and kept constant during training. As metrics, we use the MSE and the mean absolute error (MAE), which are best when values near zero are reached. Additionally, we use the ${R^2}$-score, which reaches 1.0 for perfect predictions and 0 when the model is no better than predicting the mean, as this metric indicates how well the model fits the data. The SVM is used with the standard scikit-learn parameters “kernel” set to “RBF” (radial basis function) and “gamma” set to “scale” (parameter for the kernel coefficients). The RF is trained with the criterion parameter set to “MSE” (mean squared error) and the number of trees set to 100.
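A training sketch with these settings is shown below; the helper function, the data handling, and the use of the regression variants of the scikit-learn estimators (the algorithms predict the continuous target ${\sigma _\varphi}$) are our assumptions.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

def train_dnn(model, x_train, y_train, x_val, y_val,
              epochs=200, batch_size=64, lr=1e-3):
    """Train the DNN as described above: Adam, MSE loss, constant learning
    rate of 0.001, 200 epochs, batch size 64. Inputs are float32 tensors."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device)
    loader = DataLoader(TensorDataset(x_train, y_train),
                        batch_size=batch_size, shuffle=True)
    optimizer = optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    for epoch in range(epochs):
        model.train()                                  # dropout active during training
        for xb, yb in loader:
            xb, yb = xb.to(device), yb.to(device)
            optimizer.zero_grad()
            loss = criterion(model(xb), yb)
            loss.backward()
            optimizer.step()
        model.eval()                                   # dropout disabled for validation
        with torch.no_grad():
            val_loss = criterion(model(x_val.to(device)),
                                 y_val.to(device)).item()
    return model, val_loss

# Classical baselines with the stated scikit-learn settings.
svm = SVR(kernel="rbf", gamma="scale")
rf = RandomForestRegressor(n_estimators=100, criterion="squared_error")  # "mse" in older versions
```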

The training result for the DNN is shown in Fig. 5, where the average loss per epoch is plotted for training and validation. We can see that there is no overfitting; the training loss decreases below 0.1 rad² and the validation loss below 0.05 rad². There is no real difference between the training results of the DNN and the hDNN. Because we apply dropout while training the network and deactivate it for validation and testing, the loss during training is higher than the validation loss. If we do not apply dropout, the training loss falls below the validation loss. The training with dropout yields the better results.

Fig. 5. Training and validation loss (MSE) of the proposed DNN. The loss decreases smoothly to very good results of less than 0.1 rad² for training and less than 0.05 rad² for validation. There is no overfitting.

Figure 6 shows the results of the metrics during validation. The MSE is identical to the loss. The MAE reaches a result of less than 0.14 rad, and the ${R^2}$-score is better than 0.9. Additionally, we verify our training results with a 5-fold cross validation. The split between training and validation is kept at 70:30 as before, but we now apply this split five times, yielding different training and validation data sets, and use the same training parameters. The results of the training and validation losses for each fold are shown in Fig. 7. They are nearly identical between the folds and the training before; thus, the achieved results do not depend on the applied train-validation split. After the initial choice of hyperparameters, we applied hyperparameter tuning for the learning rate, hidden layer sizes, and batch size (a sketch of the sampling is shown below). The learning rate is sampled from a log-uniform probability density function with the limits 0.0001 and 0.1. The first and second hidden layer sizes are sampled as ${2^x}$ with $x \sim {\cal U}(2,10)$, and the third layer as ${2^x}$ with $x \sim {\cal U}(2,8)$. For the batch size, we sample from the defined values 16, 64, 128, and 256, while for the hDNN these values are multiplied by 256. The results of the training with these adapted parameters and network depths are not better than the initial choice, so we keep our initial choice.
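A sketch of the described sampling is given below; treating the exponent $x$ as an integer, the random seed, and the function name are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_hyperparameters(hybrid=False):
    """Draw one hyperparameter configuration from the ranges described above."""
    lr = 10 ** rng.uniform(-4, -1)                 # log-uniform between 0.0001 and 0.1
    h1 = 2 ** rng.integers(2, 11)                  # first hidden layer: 2^x, x in [2, 10]
    h2 = 2 ** rng.integers(2, 11)                  # second hidden layer: 2^x, x in [2, 10]
    h3 = 2 ** rng.integers(2, 9)                   # third hidden layer: 2^x, x in [2, 8]
    batch = int(rng.choice([16, 64, 128, 256])) * (256 if hybrid else 1)
    return {"lr": lr, "hidden": (int(h1), int(h2), int(h3)), "batch_size": batch}
```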

Fig. 6. Achieved metrics in the validation. All three metrics achieve good results with the MAE below 0.15 rad and ${R^2}$-score greater than 0.9. The MSE is identical to the validation loss shown in Fig. 5.

Fig. 7. Achieved loss results in training and validation for the 5-fold cross validation. We can see that the results are nearly identical, which means the training is not dependent on the split of the data set.

The classic algorithms trained on the same data set with the same features as the DNN achieve the training results shown in Table 1. In general, all results are good and in the same range as the DNN and hDNN. Overall, the RF algorithm achieves the best results.

B. Test Results

For the application, our detector should predict as many true positives (sensitivity) and true negatives (specificity) as possible. False positive predictions are not as critical as false negative predictions because a false negative prediction will only be detected after the computationally expensive processing steps. For the first test data set, we use a threshold of ${\sigma _{\varphi ,{\rm label}}} = 0.31$ rad, based on the histogram (Fig. 4) of the computed targets ${\sigma _\varphi}$, to label our data set. This threshold captures all good samples while labeling the data set. In the real application, we cannot calculate the precise value of the target ${\sigma _\varphi}$, e.g., due to the speckle-producing rough surface of the sample, so in general it will not be possible to use this type of threshold to label the prediction of the DNN. Thus, we choose another threshold ${\sigma _{\varphi ,{\rm th}}}$. Since we want to maximize the accuracy of the detector, it is reasonable to choose a ${\sigma _{\varphi ,{\rm th}}}$ that maximizes this classification accuracy. In Fig. 8, the scatter plot of the predicted target ${\sigma _{\varphi ,{\rm pred}}}$ of the DNN against the true target ${\sigma _{\varphi ,{\rm true}}}$ is shown, with a threshold of 0.31 rad for both values. Green marks true positive, blue true negative, orange false positive, and red false negative samples.

Table 1. Training Results of the Classic ML Algorithms

Fig. 8. Scatter plot of the predicted target ${\sigma _{\varphi ,{\rm pred}}}$ by the DNN on the $x$ axis and the true target ${\sigma _{\varphi ,{\rm true}}}$ on the $y$ axis. The data are labeled with the threshold of 0.31 rad indicated by the red lines. The green points are true positive, the blue are true negative, the orange are false positive, and the red are false negative samples. There should be another prediction threshold ${\sigma _{\varphi ,{\rm th}}}$ that increases the prediction accuracy.

We see that there is potential to reach a higher accuracy when the threshold for the prediction is adjusted. Thus, we start with the initial value of 0.31 rad and search for the optimum threshold in the range of 0.31 to 1.0 rad. This yields more than one threshold ${\sigma _{\varphi ,{\rm th}}}$ with the best result. Because of that, we take the mean of these values and set the threshold ${\sigma _{\varphi ,{\rm th}}}$ to 0.59 rad. In the following, we use this threshold of 0.59 rad to label the predictions of the algorithms; a sketch of this search is given below.
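The search can be implemented as a simple grid search over candidate thresholds, as sketched below; the grid step is an assumption, and the labeling convention (a sample counts as vibration affected when its target exceeds the label threshold) follows the description above.

```python
import numpy as np

def best_prediction_threshold(y_pred, y_true, label_th=0.31, lo=0.31, hi=1.0, step=0.01):
    """Find the prediction threshold sigma_phi_th that maximizes classification accuracy."""
    labels = y_true > label_th                            # True = vibration affected
    candidates = np.arange(lo, hi + step, step)
    acc = np.array([np.mean((y_pred > t) == labels) for t in candidates])
    best = candidates[acc == acc.max()]                   # possibly several equally good thresholds
    return float(best.mean())                             # their mean (0.59 rad for our data)
```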

To test the algorithms, we use data that were not used for training and validation. The detailed description of these data sets is given in Section 3.C. In the following, test set 1 corresponds to the first data set described in Section 3.C, which represents training-like data with other frequencies and amplitudes. Test set 2 corresponds to the second data set described there, which aims to simulate a production-like environment.

When the algorithms are tested on this first test data set with its different vibrations, they achieve the regression results shown in Table 2. We can see that the hDNN performs best with the best metrics of $0.139\;{{\rm rad}^2}$ in MSE, 0.229 rad in MAE, and 0.5346 in the ${R^2}$-score. The SVM is nearly identical in all three metrics, while the DNN performs worse but still better than the RF and LS. All algorithms outperform the baseline approach.

Table 2. Test Results of the Algorithms Applied on Never Seen Before Data Containing Additional Frequencies Compared to the Training Data

The classification by applying the threshold ${\sigma _{\varphi ,{\rm th}}} = 0.59\;{\rm rad}$ on the prediction of the algorithms leads to the results shown in Fig. 9. We see that the RF has the best accuracy with 97.0%, followed by the SVM with 96.2% and the DNN with 96.0%. The hDNN performs slightly worse with 93.5%, while the LS is clearly outperformed with only an accuracy of 79.0%. Overall, the SVM, RF, and DNNs achieve near identical performance, while the RF and SVM have fewer false negatives, which is more favorable for our application.

Fig. 9. Confusion matrix of the classification using the first test data set for the algorithms: (a) DNN, (b) hDNN, (c) LS, (d) SVM, and (e) RF. The SVM and RF perform best with the highest accuracy and favorably fewer false negative predictions, followed by the DNNs. The LS algorithm performs worst.

Table 3. Test Results of the Algorithms Applied on Never Seen Before Data Captured by Manual Impulse Stimulation of the Optical Table

Fig. 10. Confusion matrices of the labeled predictions of the algorithms for the second test data set with impulse stimulation of the optical table: (a) DNN, (b) hDNN, (c) LS, (d) SVM, and (e) RF. The hDNN performs best with the highest accuracy, closely followed by the DNN, which favorably has no false negative predictions. The RF and SVM perform worse than the DNNs, while the LS algorithm is outperformed by all.

When the algorithms are confronted with test data set 2, captured by impulse stimulation of the optical table, they achieve the metrics shown in Table 3. The hDNN again achieves the best results with an MSE of $0.017\;{{\rm rad}^2}$, an MAE of 0.065 rad, and an ${R^2}$-score of 0.8168. The DNN is the second-best algorithm, with MSE and MAE about twice as high as the hDNN and an ${R^2}$-score of 0.6143. The SVM and RF perform better than the baseline approach, while the LS algorithm fails to achieve better results than the baseline.

To label the samples of the data, we use the threshold ${\sigma _{\varphi ,{\rm label}}}$ of 0.28 rad, as this value captures all good samples of the data set, i.e., those where no impulse stimulation occurred. The prediction threshold is the same as before, ${\sigma _{\varphi ,{\rm th}}} = 0.59\;{\rm rad}$. After labeling the predictions of the algorithms, we can construct the confusion matrices shown in Fig. 10. The hDNN achieves the best accuracy of 99.0% but still has one false negative prediction. The DNN reaches an accuracy of 97.3% with no false negative predictions, which is favorable for our application, even though it produces more false positive predictions than the hDNN. The SVM has an accuracy of 96.7%, which is slightly less than the DNN. The accuracy of the RF is 92.0%, and the accuracy of the LS is 88.0%; all other algorithms perform better than the LS algorithm.

To conclude the evaluation of the prediction results of the algorithms, Table 4 shows the summary of the discussed results of Figs. 9 and 10.

As described in Section 1, the detection of vibration-affected data has to happen fast and early in the processing chain. Thus, we measure the prediction speed of the DNN in Python, as this is our main approach. When the data and the neural network are already available on the GPU, the time elapsed for a single prediction step (equal to one forward pass of the neural network) is approximately 35 µs, averaged over 1000 single predictions. In the final application, the data will also be processed mainly on the GPU, so this scenario is realistic. If we add the time to move the input vector to the GPU, predict, and move the predicted value back to the CPU, the elapsed time is approximately 540 µs. Compared to the time needed for reconstruction and propagation, this is considerably faster, which means our initial goal is achieved.
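The timing can be reproduced with CUDA events around the forward pass, as sketched below; the warm-up loop and the use of CUDA events instead of host-side timers are our own choices.

```python
import torch

@torch.no_grad()
def time_prediction(model, x, n=1000):
    """Mean GPU time per forward pass in microseconds, averaged over n runs."""
    model = model.eval().cuda()                      # dropout disabled for inference
    x = x.cuda()                                     # input vector already resident on the GPU
    for _ in range(10):                              # warm-up to exclude one-time CUDA setup cost
        model(x)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(n):
        model(x)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / n * 1000.0      # elapsed_time is in ms; convert to µs
```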

Table 4. Summary of the Achieved Accuracies Shown in Figs. 9 and 10

5. CONCLUSION

We have shown that deep learning is a suitable approach to detect vibrations in digital holographic multiwavelength measurements. We propose to use a deep neural network (DNN) with fully connected layers to predict the standard deviation ${\sigma _\varphi}$ of the reconstructed phase $\varphi (x,y)$ in the hologram plane without propagation, which is then used to classify the data. Furthermore, we propose to let the DNN make predictions per ROI (hDNN). In addition, we trained classical machine learning algorithms, namely least squares (LS), support vector machine (SVM), and random forest (RF), to evaluate the performance of the proposed neural network. When the trained algorithms are tested with unseen data related to the training data, the DNNs achieve accuracies of 96.0% (DNN) and 93.5% (hDNN), while the RF and SVM reach accuracies of 97.0% and 96.2%, respectively. When testing the algorithms with data related to the application in the production environment, the DNN and hDNN perform best with accuracies of 97.3% for the DNN and 99.0% for the hDNN. The DNN predicts no false negative samples, which makes it more suitable for the application because a false negative would lead to bad results in the subsequent, time-consuming processing steps, namely the numerical propagation of the complex wave field from the hologram plane to the object plane. Based on the presented results using a mirror, this approach is very promising for mirror-like samples relevant to production applications [3]. Because the main features used are independent of the sample, we expect the training process with mirror samples to have little impact on the application of the algorithm to other sample types; however, we are going to investigate this in the future. Furthermore, a single prediction of the DNN on the GPU takes only approximately 35 µs, which is considerably faster than the steps of wave field reconstruction and propagation, which are in the range of hundreds of milliseconds. To enhance the results even further, more investigations of the influence of vibrations in multiwavelength digital holographic measurements are underway, and some current results of this work are presented in [43].

Acknowledgment

This work is related to the work presented at the Digital Holography and Three-Dimensional Imaging (DH) conference in 2023, Investigation of Permuted Phase Steps on Multiwavelength Digital Holographic Measurements (HTu2C.5), where current topics of our investigation of vibrations in multiwavelength digital holographic measurements are presented. This paper is also based on the master thesis Schwingungskompensation digital-holographischer 3D-Sensorik mithilfe von Machine Learning finished 2023 by Tobias Störk.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

REFERENCES

1. P. Detzner, A. Ebner, M. Horstrup, et al., “PFDL: A production flow description language for an order-controlled production,” in 22nd International Conference on Control, Automation and Systems (ICCAS) (IEEE, 2022), pp. 1099–1106.

2. X. Luo and Y. Qin, eds., Hybrid Machining: Theory, Methods, and Case Studies (Elsevier Science & Technology, 2018).

3. M. Fratz, T. Seyler, A. Bertz, et al., “Digital holography in production: an overview,” Light Adv. Manuf. 2, 134–146 (2021). [CrossRef]  

4. M. Fratz, T. Beckmann, J. Anders, et al., “Inline application of digital holography,” in Digital Holography and 3-D Imaging 2019 Feature Issue, P. Picart, ed. (2019), pp. G120–G126.

5. T. Seyler, T. Beckmann, J. Stevanovic, et al., “Multi-wavelength digital holography on a collaborative robot: Session advances in digital holographic techniques II,” in Digital Holography and Three-Dimensional Imaging (Optical Society of America, 2021).

6. T. Seyler, L. Bienkowski, T. Beckmann, et al., “Multiwavelength digital holography in the presence of vibrations: laterally resolved multi-step phase-shift extraction,” Appl. Opt. 58, G112 (2019). [CrossRef]  

7. T. M. Kreis, Handbook of Holographic Interferometry: Optical and Digital Methods (Wiley-VCH, 2005).

8. I. Yamaguchi, “Active phase-shifting interferometers for shape and deformation measurements,” Opt. Eng. 35, 2930–2937 (1996). [CrossRef]  

9. H. Martin, K. Wang, and X. Jiang, “Vibration compensating beam scanning interferometer for surface measurement,” Appl. Opt. 47, 888–893 (2008). [CrossRef]  

10. A. Schiller, T. Beckmann, M. Fratz, et al., “Motion compensation for interferometric off-center measurements of rotating objects with varying radii,” APL Photonics 4, 071301 (2019). [CrossRef]  

11. C. Zhao and J. H. Burge, “Vibration-compensated interferometer for surface metrology,” Appl. Opt. 40, 6215–6222 (2001). [CrossRef]  

12. L. L. Deck, “Model-based phase shifting interferometry,” Appl. Opt. 53, 4628–4636 (2014). [CrossRef]  

13. Q. Liu, Y. Wang, J. He, et al., “Modified three-step iterative algorithm for phase-shifting interferometry in the presence of vibration,” Appl. Opt. 54, 5833–5841 (2015). [CrossRef]  

14. L. L. Deck, “Suppressing phase errors from vibration in phase- shifting interferometry,” Appl. Opt. 48, 3948–3960 (2009). [CrossRef]  

15. Q. Hao, Q. Zhu, and Y. Hu, “Random phase-shifting interferometry without accurately controlling or calibrating the phase shifts,” Opt. Lett. 34, 1288–1290 (2009). [CrossRef]  

16. J. T. Wiersma and J. C. Wyant, “Vibration insensitive extended range interference microscopy,” Appl. Opt. 52, 5957–5961 (2013). [CrossRef]  

17. D. G. Abdelsalam, B. Yao, P. Gao, et al., “Single-shot parallel four-step phase shifting using on-axis Fizeau interferometry,” Appl. Opt. 51, 4891–4895 (2012). [CrossRef]  

18. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521, 436–444 (2015). [CrossRef]  

19. F. Rosenblatt, “The perceptron: a probabilistic model for information storage and organization in the brain,” Psychol. Rev. 65, 386–408 (1958). [CrossRef]  

20. Y. Lecun, L. Bottou, Y. Bengio, et al., “Gradient-based learning applied to document recognition,” Proc. IEEE 86, 2278–2324 (1998). [CrossRef]  

21. C. Zuo, J. Qian, S. Feng, et al., “Deep learning in optical metrology: a review,” Light Sci. Appl. 11, 39 (2022). [CrossRef]  

22. T. Zeng, Y. Zhu, and E. Y. Lam, “Deep learning for digital holography: a review,” Opt. Express 29, 40572–40593 (2021). [CrossRef]  

23. K. Wang, J. Dou, Q. Kemao, et al., “Y-Net: a one-to-two deep learning framework for digital holographic reconstruction,” Opt. Lett. 44, 4765–4768 (2019). [CrossRef]  

24. Z. Ren, Z. Xu, and E. Y. Lam, “Learning-based nonparametric autofocusing for digital holography,” Optica 5, 337–344 (2018). [CrossRef]  

25. T. Shimobaba, T. Kakue, and T. Ito, “Convolutional neural network-based regression for depth prediction in digital holography,” in IEEE 27th International Symposium on Industrial Electronics (ISIE) (IEEE, 2018), pp. 1323–1326.

26. S. Cuenat, L. Andréoli, A. N. André, et al., “Fast autofocusing using tiny transformer networks for digital holographic microscopy,” Opt. Express 30, 24730–24746 (2022). [CrossRef]  

27. Q. Fang, H. Xia, Q. Song, et al., “Speckle denoising based on deep learning via a conditional generative adversarial network in digital holographic interferometry,” Opt. Express 30, 20666–20683 (2022). [CrossRef]  

28. T. Seyler, L. Bienkowski, T. Beckmann, et al., “Robust multiwavelength digital holography using cascaded data evaluation,” in Imaging and Applied Optics Congress (OSA, 2020), paper HF3G.6.

29. J. Stevanovic, T. Seyler, J. Aslan, et al., “Digital holographic measurement system for use on multi-axis systems,” Proc. SPIE 11782, 398–408 (2021). [CrossRef]  

30. T. Seyler, M. Fratz, A. Schiller, et al., “Digital holography in a machine tool: measuring large-scale objects with micron accuracy,” Proc. SPIE 12618, 126181U (2023). [CrossRef]  

31. L. Z. Cai, Q. Liu, and X. L. Yang, “Generalized phase-shifting interferometry with arbitrary unknown phase steps for diffraction objects,” Opt. Lett. 29, 183–185 (2004). [CrossRef]  

32. J. E. Greivenkamp, “Generalized data reduction for heterodyne interferometry,” Opt. Eng. 23, 350–352 (1984). [CrossRef]  

33. C. Wagner, W. Osten, and S. Seebacher, “Direct shape measurement by digital wavefront reconstruction and multiwavelength contouring,” Opt. Eng. 39, 79–85 (2000). [CrossRef]  

34. R. Guo, S. Lu, Y. Wu, et al., “Robust and fast dual-wavelength phase unwrapping in quantitative phase imaging with region segmentation,” Opt. Commun. 510, 127965 (2022). [CrossRef]  

35. I. Goodfellow, Y. Bengio, and A. Courville, “Deep learning,” in Adaptive Computation and Machine Learning (MIT, 2016).

36. N. Srivastava, G. Hinton, A. Krizhevsky, et al., “Dropout: a simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res. 15, 1929–1958 (2014).

37. B. E. Boser, I. M. Guyon, and V. N. Vapnik, “A training algorithm for optimal margin classifiers,” in Proceedings of the 5th Annual Workshop on Computational Learning Theory - COLT ’92, D. Haussler, ed. (ACM, 1992), pp. 144–152.

38. L. Breiman, “Random forests,” Mach. Learn. 45, 5–32 (2001). [CrossRef]  

39. T. Seyler, “Digitale Holographie in der Werkzeugmaschine,” Dr.-Ing. dissertation (Technische Universität Kaiserslautern, 2020).

40. A. Paszke, S. Gross, F. Massa, et al., “PyTorch: an imperative style, high-performance deep learning library,” arXiv, arXiv:1912.01703 (2019). [CrossRef]  

41. F. Pedregosa, G. Varoquaux, A. Gramfort, et al., “Scikit-learn: machine learning in Python,” J. Mach. Learn. Res 12, 2825–2830 (2011).

42. D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” arXiv, arXiv:1412.6980 (2014). [CrossRef]  

43. T. Störk, T. Seyler, A. Bertz, et al., “Investigation of permuted phase steps on multiwavelength digital holographic measurements,” in Digital Holography and Three-Dimensional Imaging (Optica, 2023).
