
Deep learning-based method for non-uniform motion-induced error reduction in dynamic microscopic 3D shape measurement


Abstract

Reducing non-uniform motion-induced error in dynamic fringe projection profilometry is complex and challenging. Recently, deep learning (DL) has been successfully applied to many complex optical problems with strong nonlinearity and exhibits excellent performance. Inspired by this, a deep learning-based method is developed for non-uniform motion-induced error reduction, taking advantage of the powerful nonlinear fitting ability of deep networks. First, a specially designed dataset for motion-induced error reduction is generated for network training by incorporating complex nonlinearity. Then, the corresponding DL-based architecture is proposed; it contains two parts: in the first part, a fringe compensation module is developed as network pre-processing to reduce the phase error caused by fringe discontinuity; in the second part, a deep neural network is employed to extract the high-level features of the error distribution and establish a pixel-wise hidden nonlinear mapping between the phase with motion-induced error and the ideal one. Both simulations and real experiments demonstrate the feasibility of the proposed method in dynamic microscopic measurement.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Fringe projection profilometry (FPP) is a popular 3D shape measurement technique, which has been broadly applied to industrial inspection, biomedical science, and computer vision [1-4]. It can be divided into two categories: Fourier transform profilometry (FTP) and phase-shifting profilometry (PSP) [5,6]. Compared to the former, PSP has become increasingly appealing to researchers owing to its higher accuracy and sensitivity for static 3D reconstruction [7,8]. Nevertheless, the multi-frame measurement mechanism of PSP limits its performance in dynamic measurement, since object motion introduces an extra unknown phase shift in each captured fringe pattern, leading to motion-induced error. Therefore, the measurement of moving objects remains challenging.

To overcome this problem, two simple and effective strategies are usually employed to alleviate the motion-induced error. The first is to reduce the number of projected patterns [9], so that the object motion becomes slight as the projection period decreases. Unfortunately, temporal phase unwrapping, which requires more fringe patterns, is usually needed to analyze highly discontinuous surfaces [10,11]. The second is to improve the capturing speed [9], so that the object can be considered quasi-static if the capturing speed is much higher than the motion speed. However, high-speed cameras lead to expensive hardware costs [12]. Considering these factors, it is necessary to develop efficient motion-induced error reduction algorithms.

In the past few years, a large number of approaches have been developed to reduce motion-induced error. Li et al. [13] suggested combining FTP and PSP to mitigate the motion-induced error and retrieve the absolute phase from one high-frequency pattern and three low-frequency patterns using a geometric constraint approach [14]. Qian et al. [15] proposed a hybrid Fourier-transform phase-shifting method with four fringe patterns and used stereo phase unwrapping with a dual-camera system. Guo et al. [16] detected the motion region and substituted the phase extracted by FTP based on a dual-frequency composite phase-shifting grating. The above FTP-assisted approaches are convenient owing to the advantage of single-shot reconstruction, but they are subject to the inherent limitation of FTP's phase retrieval accuracy. Liu et al. [17] estimated the phase-shift errors using three phase maps; this method can compensate the motion-induced error at the pixel level for surfaces with large depth variations under uniform translational motion. Wang et al. [18] presented another error compensation method using additional temporal sampling based on defocused binary patterns; this strategy relies on the accuracy of the external trigger and achieves high-speed measurement under the assumption of uniform motion. Besides, Wang et al. [19] found that the motion-induced error has approximately double the frequency of the fringe pattern, used the Hilbert transform to calculate an additional phase map, and then employed the average of the original and additional phase maps for 3D reconstruction. Similarly, Guo et al. [20] divided the four-step phase-shifted patterns into two sets to calculate corresponding phases with opposite distributional tendencies; their average phase can efficiently suppress the motion-induced error.

However, most of the methods mentioned above are limited to uniform motion-induced error compensation because they assume that the movement is constant and very slight, which may not hold in practical dynamic measurement. In contrast to slight uniform motion, the non-uniform motion-induced error is trickier due to its complex nonlinearity, which is difficult to describe and estimate mathematically [21,22]. To reduce the non-uniform motion-induced error, advanced methods urgently need to be explored.

Recently, deep learning has attracted much attention in optical measurement [23-25], including phase unwrapping [26], non-sinusoidal fringe analysis [27], and error compensation [28]. It has been demonstrated that deep learning is superior in nonlinear mapping compared with traditional methods. Inspired by this, we employ deep learning to achieve non-uniform motion-induced error reduction by finding the hidden complex nonlinear mapping. We first develop a training dataset for motion-induced error reduction by analyzing the error distribution in the phase-shifting algorithm. Then, a deep learning-based architecture is built to construct a nonlinear mapping between the original phase and the ideal one. A fringe compensation module is designed to address the problem of fringe discontinuities, and spatial phase unwrapping (SPU) is utilized to obtain the original phase from only three fringe patterns. Finally, the network can be trained to correct both uniform and non-uniform motion-induced errors at the pixel level.

2. Non-uniform motion-induced error analysis

When the measured scene is considered static during the acquisition process, the n-th phase-shifted pattern can be expressed as

$${I_n}(x,y) = A(x,y) + B(x,y)\cos [{{{\varPhi }_{{GT}}}(x,y) + {\delta_n}} ],$$
where A(x, y) is the background intensity, B(x, y) is the intensity modulation, ΦGT(x, y) is the ideal phase, δn = 2π(n-1)/N, N is the number of phase-shift steps, and n = 1, …, N. The minimum number of steps is N = 3, which provides three equations for the three unknowns in Eq. (1). Then, the wrapped phase can be solved by
$${\varphi _{GT}}(x,y) = \arctan \left[ {\frac{{\sum\limits_{n = 1}^N {{I_n}(x,y)\sin ({\delta_n})} }}{{\sum\limits_{n = 1}^N {{I_n}(x,y)\cos ({\delta_n})} }}} \right]. $$
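For concreteness, a minimal NumPy sketch of Eqs. (1)-(2) follows; the use of the four-quadrant arctan2 and the variable names are our own illustrative choices, following the sign convention printed in Eq. (2).

import numpy as np

def wrapped_phase(images):
    # Wrapped phase from N phase-shifted fringe images I_n(x, y), n = 1..N,
    # captured with phase shifts delta_n = 2*pi*(n-1)/N as in Eq. (1).
    N = len(images)
    delta = 2 * np.pi * np.arange(N) / N                  # delta_n = 2*pi*(n-1)/N
    num = sum(I * np.sin(d) for I, d in zip(images, delta))
    den = sum(I * np.cos(d) for I, d in zip(images, delta))
    return np.arctan2(num, den)                           # four-quadrant arctan, Eq. (2)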

Owing to the arctan operation, φGT(x, y) ranges from -π/2 to π/2, i.e., a total range of π. Therefore, the absolute phase ΦGT(x, y) can be obtained by expanding the wrapped phase φGT(x, y) to eliminate the 2π discontinuities using a phase unwrapping algorithm with a well-determined fringe order k(x, y),

$${{\varPhi }_{{GT}}}(x,y) = {\varphi _{GT}}(x,y) + 2\pi \cdot k(x,y).$$

As mentioned previously, the N-step phase-shifting method works well when the measured object remains static because the phase shift δn is precisely known. Once the object begins to move, the actual phase shift acquires an additional error ɛn(x, y), as shown in Fig. 1. The captured fringe patterns and the wrapped phase can then be described as follows

$$I_n^{\prime}(x,y) = A(x,y) + B(x,y)\cos [{{{\varPhi }_{{GT}}}(x,y) + {\delta_n} + {\varepsilon_n}(x,y)} ],$$
$${\varphi _{ER}}(x,y) = \arctan \left[ {\frac{{\sum\limits_{n = 1}^N {I_n^{\prime}(x,y)\sin ({\delta_n})} }}{{\sum\limits_{n = 1}^N {I_n^{\prime}(x,y)\cos ({\delta_n})} }}} \right].$$

Fig. 1. Illustration of motion-induced phase error: (a) motion in Z direction; (b) motion in XY plane.

Then, the motion-induced error can be expressed as [20]

$$\begin{aligned} \Delta \varphi (x,y) &= {\varphi _{ER}}(x,y) - {\varphi _{GT}}(x,y)\\& = \arctan \left[ {\frac{{\cos 2{\varphi_{GT}}\sum\limits_{n = 1}^N {\sin (2{\delta_n} + {\varepsilon_n}) - \sin 2{\varphi_{GT}}\sum\limits_{n = 1}^N {\cos (2{\delta_n} + {\varepsilon_n})} } + \sum\limits_{n = 1}^N {\sin {\varepsilon_n}} }}{{\cos 2{\varphi_{GT}}\sum\limits_{n = 1}^N {\cos (2{\delta_n} + {\varepsilon_n}) + \sin 2{\varphi_{GT}}\sum\limits_{n = 1}^N {\sin (2{\delta_n} + {\varepsilon_n})} } + \sum\limits_{n = 1}^N {\cos {\varepsilon_n}} }}} \right]. \end{aligned}$$

For slight uniform motion, the phase-shift error ɛn is assumed to be very small, such that sin(ɛ) ≈ ɛ and cos(ɛ) ≈ 1. Furthermore, the unknown phase-shift error follows a linear model, ɛn = n×ɛ. In this case, the motion-induced error can be further approximated as follows:

$$\Delta \varphi (x,y) \approx k \cdot \varepsilon \cdot \sin [{2{\varphi_{GT}}(x,y)} ],$$
where k is a constant determined by the number of phase-shift steps N. Equation (7) illustrates that the motion-induced error features a periodic distribution whose frequency is approximately double that of the original fringe. It is relatively easy to estimate ɛn and compensate the phase error under uniform motion [17]. However, once ɛn is no longer a very small linear variable, possibly due to complex surface motion and low capture speed, the non-uniform motion-induced error distribution differs from that of Eq. (7).

To analyze the characteristics of the large non-uniform motion-induced error distribution, we simulated the measurement of a sphere using the three-step phase-shifting algorithm, where the uniform motion is set as ɛ1 = -0.2, ɛ2 = 0, ɛ3 = 0.2 and the large non-uniform motion is set as ɛ1 = 0.2, ɛ2 = 0, ɛ3 = -0.7. Both the uniform and non-uniform motion-induced errors exhibit significant periodic ripples, as shown in Figs. 2(b) and 2(c). From Fig. 2(d), the slight uniform motion-induced error distribution is almost consistent with the conclusion of Eq. (7): it features a sinusoidal distribution and oscillates around the ground truth. However, the large non-uniform motion-induced error distribution shown in Fig. 2(e) is inconsistent with that calculated by Eq. (7). Moreover, the sinusoidal character of the large non-uniform motion-induced error is destroyed and it tends to deviate to one side of the ground truth. This indicates that the non-uniform motion-induced error cannot be precisely modeled by Eq. (7).
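A minimal sketch of this simulation follows Eqs. (4)-(6); a flat surface stands in for the sphere of Fig. 2, and the choices A = B = 0.5 and the fringe frequency are illustrative assumptions, so the sketch reproduces the behaviour rather than the exact figure.

import numpy as np

def wrapped_phase_3step(I):
    # Three-step wrapped phase of Eqs. (2)/(5), using the four-quadrant arctan.
    delta = 2 * np.pi * np.arange(3) / 3
    num = sum(In * np.sin(d) for In, d in zip(I, delta))
    den = sum(In * np.cos(d) for In, d in zip(I, delta))
    return np.arctan2(num, den)

def motion_error(phi_obj, eps, f=1/64):
    # Wrapped motion-induced error (cf. Eq. (6)) for per-frame phase-shift errors eps.
    y, x = np.indices(phi_obj.shape)
    carrier = 2 * np.pi * f * x                           # projected fringe carrier
    delta = 2 * np.pi * np.arange(3) / 3
    def patterns(errs):                                   # Eq. (4) with A = B = 0.5
        return [0.5 + 0.5 * np.cos(carrier + phi_obj + d + e)
                for d, e in zip(delta, errs)]
    phi_er = wrapped_phase_3step(patterns(eps))           # phase with motion error
    phi_gt = wrapped_phase_3step(patterns((0, 0, 0)))     # static (error-free) reference
    return np.angle(np.exp(1j * (phi_er - phi_gt)))       # re-wrap the difference

surface = np.zeros((300, 300))                            # stand-in for the sphere phase
err_uniform = motion_error(surface, (-0.2, 0.0, 0.2))     # slight uniform motion
err_nonuni = motion_error(surface, (0.2, 0.0, -0.7))      # large non-uniform motion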

Fig. 2. Simulated results of motion-induced error: (a) ideal sphere; (b) sphere with uniform motion-induced error; (c) sphere with non-uniform motion-induced error; (d) error distribution of slight uniform motion; (e) error distribution of large non-uniform motion.

Based on the above analysis, the large non-uniform motion error distribution leads to a complex nonlinearity in the motion estimation [21,22]. Given that the unknown phase-shift error ɛn is no longer linear and is too large to satisfy the Taylor series approximation, it is hard to express the nonlinear function ΦGT = f(ΦER) explicitly in practice. Instead, we describe the nonlinear relation between the phase with error and the ideal phase implicitly as follows:

$${{\varPhi }_{{ER}}} = PU \left\{ {\arctan \left[ {\frac{{M ({{{\varPhi }_{{GT}}};{\varepsilon_n}} )}}{{D ({{{\varPhi }_{{GT}}};{\varepsilon_n}} )}}} \right]} \right\},$$
where PU{·} denotes phase unwrapping, and M(·) and D(·) represent $\sum {I_n^{\prime}(x,y)\sin ({\delta _n})}$ and $\sum {I_n^{\prime}(x,y)\cos ({\delta _n})}$, respectively.

3. Deep learning-based non-uniform motion-induced error reduction

From the previous section, the complex non-uniform motion-induced error is difficult to model mathematically due to its inherent nonlinearity. In the past few years, deep learning has been considered an effective data-driven method for such nonlinear, ill-posed inverse problems. Based on the universal approximation theorem [29,30], a deep learning-based method can theoretically fit any nonlinear mapping without an explicit mathematical description; compared with traditional physical model-based methods, it mainly relies on dedicated datasets for the specific problem.

In this section, a deep learning-based method for motion-induced error reduction is illustrated; its framework is shown in Fig. 3. We first design a simulated dataset for motion-induced error according to the physical principles, and network training is implemented to construct a pixel-wise nonlinear mapping between the original phase and the corrected one. In the DL-based error correction process, a fringe compensation module is implemented to address fringe discontinuities. Finally, the trained deep neural network can be employed to correct the motion-induced error in a practical measurement. The details of the dataset generation and the DL-based architecture are described in Sections 3.1 and 3.2, respectively.

Fig. 3. Non-uniform motion-induced error reduction via deep learning.

3.1 Training dataset generation

In deep learning, the dataset plays an important role in training deep neural networks. Therefore, a high-quality training dataset for motion-induced error correction has to be prepared. However, the phase with motion-induced error and the corresponding ground truth are difficult to obtain in practical measurements, and acquiring real data is inefficient and time-consuming. Considering these factors, a simulated dataset, instead of real experimental data, is generated for training the network.

We first generate the ideal phase without any error (the ground truth) ΦGT(x, y), which is used as the network label. Matrices of random numbers with random sizes (e.g., 3×3, 5×5) are generated [27], where the numbers follow a Gaussian distribution. These matrices are then up-sampled by interpolation (e.g., bilinear, bicubic) to produce smooth 3D surfaces that reflect the topography of the measured object. The modulated fringe patterns are obtained by introducing the random surface into the reference fringe patterns. Finally, the corresponding phase distribution ΦGT(x, y) is calculated, as shown in Fig. 4.
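A minimal sketch of this label-generation step follows, assuming SciPy for the interpolation; the seed-matrix sizes and the amplitude scaling are illustrative choices rather than values from the paper.

import numpy as np
from scipy.interpolate import RectBivariateSpline

def random_smooth_phase(h=1024, w=512, max_amp=20.0, rng=np.random.default_rng()):
    # One ground-truth phase Phi_GT(x, y): a small Gaussian random matrix is
    # spline-interpolated to a smooth full-resolution surface.
    k = int(rng.integers(3, 8))                           # random seed-matrix size (3x3 .. 7x7)
    seed = rng.normal(size=(k, k))                        # Gaussian-distributed values
    spl = RectBivariateSpline(np.linspace(0, 1, k), np.linspace(0, 1, k), seed,
                              kx=min(3, k - 1), ky=min(3, k - 1))
    surface = spl(np.linspace(0, 1, h), np.linspace(0, 1, w))
    surface -= surface.min()                              # normalise to [0, max_amp] rad
    return max_amp * surface / (surface.max() + 1e-12)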

Fig. 4. Training dataset generation by incorporating nonlinearity for deep learning.

Next, we generate the phase with motion-induced error ΦER(x, y), which is used as the network input. According to the analysis of motion-induced error in Section 2, we set the phase-shift error ɛn(x, y) for each fringe pattern based on the three-step phase-shifting method as follows:

$$I_n^{\prime}(x,y) = A(x,y) + B(x,y)\cos [{{2}\pi {f} \cdot {x} + 2\pi n/3 + {\varepsilon_n}(x,y){ + }{{\varPhi }_{GT}}(x,y)} ],$$
where n = 1, 2, 3; f is the frequency of the fringe pattern; and ɛ1(x, y) to ɛ3(x, y) range within [-1, 1] rad to simulate large movements. We first generate random matrices of different sizes, where the complexity of the movement is controlled by the size of the random matrix; the non-uniform phase-shift errors are then obtained by interpolating the random matrices to the same size as the network input. In addition, random noise (e.g., Gaussian noise, salt-and-pepper noise) with a random level is added to I′n(x, y). Finally, these three fringe patterns are used to retrieve the phase with motion-induced error ΦER(x, y), as shown in Fig. 4. It is noted that the ripple in the network input ΦER(x, y) is entirely motion-induced, with no other factors such as gamma nonlinearity or intensity saturation; this facilitates the network in learning high-level features of the motion-induced error.
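The input-generation step can be sketched as follows; the noise level, fringe frequency, and seed-matrix sizes are assumed values used only for illustration.

import numpy as np
from scipy.interpolate import RectBivariateSpline

def error_corrupted_patterns(phi_gt, f=1/64, rng=np.random.default_rng()):
    # Three error-corrupted patterns of Eq. (9): per-pixel phase-shift errors
    # eps_n(x, y) in [-1, 1] rad come from small random matrices interpolated
    # to full resolution, plus random Gaussian noise.
    h, w = phi_gt.shape
    y, x = np.indices((h, w))
    patterns = []
    for n in (1, 2, 3):
        k = int(rng.integers(2, 6))                       # seed size controls motion complexity
        seed = rng.uniform(-1, 1, size=(k, k))
        spl = RectBivariateSpline(np.linspace(0, 1, k), np.linspace(0, 1, k),
                                  seed, kx=1, ky=1)       # linear keeps eps within [-1, 1]
        eps = spl(np.linspace(0, 1, h), np.linspace(0, 1, w))
        I = 0.5 + 0.5 * np.cos(2 * np.pi * f * x + 2 * np.pi * n / 3
                               + eps + phi_gt)            # Eq. (9) with A = B = 0.5
        I += rng.normal(0.0, rng.uniform(0.0, 0.02), size=I.shape)
        patterns.append(I)
    return patterns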

3.2 Proposed deep learning-based architecture

3.2.1 Fringe compensation module for network pre-processing

To reduce the number of fringe patterns, we utilize spatial phase unwrapping, which requires fringe continuity. For this reason, a fringe compensation module is first presented to overcome fringe discontinuities; it can be considered a network pre-processing step. Three main cases cause fringe discontinuities: shadow regions, saturated regions, and defocused regions. All of these regions have low intensity modulation. Based on this, the main idea is to use the reference fringe patterns to compensate the regions of low intensity modulation, so as to achieve fringe continuity. Figure 5 shows the schematic of the fringe compensation module.

Fig. 5. Fringe compensation module for network pre-processing.

We first use the three fringe images to obtain a modulation map by calculating the intensity modulation as

$${{I}_{\bmod }}(x,y) = \frac{2}{N}\sqrt {{{\left[ {\sum\limits_{n = 1}^3 {{I_n}(x,y)\sin (2\pi n/3)} } \right]}^2} + {{\left[ {\sum\limits_{n = 1}^3 {{I_n}(x,y)\cos (2\pi n/3)} } \right]}^2}}.$$

The regions that need to be compensated should be selected by a mask image, which is generated by setting an appropriate threshold T for Imod(x, y) as follows:

$$M(x,y) = \left\{ \begin{array}{ll} 1,&{{{I}_{\bmod }}(x,y) \ge T}\\ 0,&{\textrm{otherwise}}\end{array} \right.$$

After obtaining the mask image, the fringe compensation can be implemented. The low intensity modulation regions are located by the mask image M(x, y), and the reference fringe patterns Iref(x, y) that are used to compensate the corresponding regions can be selected by the inverse mask image Minv(x, y) [31]. The operation is expressed as

$$\left\{ \begin{array}{l} {I_{\textrm{M}\_n}}(x,y) = M(x,y) \times {I_n}(x,y)\\ {I_{\textrm{Minv}\_n}}(x,y) = {M_{\textrm{inv}}}(x,y) \times {I_{\textrm{ref}\_n}}(x,y) \end{array} \right.$$

Therefore, the compensated fringe pattern can be fused by

$${I_{\textrm{com}\_n}}(x,y) = {I_{\textrm{M}\_n}}(x,y) + {I_{\textrm{Minv}\_n}}(x,y).$$

Though the compensated regions can only recover the phase of the reference plane, the compensation prevents error diffusion in SPU. Therefore, the phase error caused by fringe discontinuity can be reduced, and the quality of the original phase used in the subsequent network processing can be improved. It is noted that this strategy may perform poorly on multi-colored objects because the color affects the quality of the mask. A potential solution is to fuse multiple masks into an optimal one with the aid of the multiple-exposure technique.
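A compact sketch of the whole module (Eqs. (10)-(13)) for three-step patterns follows; the threshold T is an assumed value that must be tuned per system.

import numpy as np

def compensate_fringes(I, I_ref, T=0.05):
    # I, I_ref: lists of the three captured and three reference fringe patterns.
    delta = 2 * np.pi * np.arange(1, 4) / 3               # 2*pi*n/3, n = 1, 2, 3
    s = sum(In * np.sin(d) for In, d in zip(I, delta))
    c = sum(In * np.cos(d) for In, d in zip(I, delta))
    I_mod = (2 / 3) * np.sqrt(s ** 2 + c ** 2)            # intensity modulation, Eq. (10)
    M = (I_mod >= T).astype(float)                        # binary mask, Eq. (11)
    M_inv = 1.0 - M                                       # inverse mask
    return [M * In + M_inv * Ir                           # Eqs. (12)-(13)
            for In, Ir in zip(I, I_ref)]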

3.2.2 Deep neural network for nonlinear mapping

The deep neural network for non-uniform motion-induced error reduction is based on convolutional residual modules, which are organized into a U-Net, as shown in Fig. 6. To prevent overfitting, we designed a four-layer network consisting of an encoder part and a decoder part. In the encoder part, the dimension of the input data is (W, H, C), where width W = 512, height H = 1024, and channel C = 1. Each residual module includes a convolution layer and a shortcut. The convolution kernel size in each residual module is 3×3, and the max-pooling size is 2×2 with stride 2 for down-sampling. In the decoder part, the up-conv (transposed convolution) size is 2×2 for up-sampling, and the result is concatenated with the corresponding feature map of the contracting path through a skip connection. The number of feature channels of each residual module increases with the layer depth. This network has two main advantages: first, we improve the network's accuracy and avoid network degradation by implementing the residual modules, which skip the weight layers and use the identity function to preserve gradients; second, fine-grained details can be recovered in the prediction by using long skip connections.
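A sketch of such a residual U-Net in Keras follows; apart from the input size, the 3×3 convolutions, the 2×2 pooling/up-conv, the 1×1 output convolution, and the 256-filter cap stated above, the remaining hyper-parameters (activations, filter progression, shortcut projection) are our own assumptions.

import tensorflow as tf
from tensorflow.keras import layers

def res_block(x, filters):
    # Residual module: 3x3 convolution plus a shortcut. A 1x1 projection is used
    # here only to match channel counts; an identity is used when they agree.
    shortcut = x if x.shape[-1] == filters else layers.Conv2D(filters, 1, padding='same')(x)
    y = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return layers.Activation('relu')(layers.Add()([y, shortcut]))

def build_unet(h=1024, w=512, filters=(32, 64, 128, 256)):
    # Encoder-decoder with long skip connections; filter counts capped at 256.
    inp = layers.Input((h, w, 1))
    skips, x = [], inp
    for f in filters[:-1]:                                # encoder: res block + 2x2 max pool
        x = res_block(x, f)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
    x = res_block(x, filters[-1])                         # bottleneck (256 filters)
    for f, skip in zip(reversed(filters[:-1]), reversed(skips)):
        x = layers.Conv2DTranspose(f, 2, strides=2, padding='same')(x)  # 2x2 up-conv
        x = layers.Concatenate()([x, skip])               # long skip connection
        x = res_block(x, f)
    out = layers.Conv2D(1, 1)(x)                          # 1x1 conv, single-channel phase
    return tf.keras.Model(inp, out)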

Fig. 6. Network architecture for non-uniform motion-induced error reduction.

The network process proceeds in two stages: network training and network testing. First, network training is implemented to establish the nonlinear mapping between the data with motion-induced errors and the label data. Then, network testing is performed to correct the phase and verify the network performance.

In the network training stage, we feed the training data to our network as shown in Fig. 6. After passing through a residual module, the images are abstracted into feature maps of size W × H × C. For a convolutional layer, the number of channels depends on the number of convolution filters (kernels), and the same holds for the residual module. To balance network performance and efficiency, the maximum number of filters is set to 256. The subsequent pooling layer simplifies the output of the residual module by performing nonlinear down-sampling. After the max pooling, several up-conv (transposed convolution) layers perform up-sampling. During up-sampling, the outputs of the dataflow paths are concatenated into a feature map of size W × H × C. The last layer applies a 1×1 convolution with a single output channel, which carries the corrected phase. The accuracy of the network is evaluated by a loss function, defined as the mean squared error (MSE)

$$Loss(\theta ) = \frac{1}{{W \times H}}\sum\limits_{x = 1}^W {\sum\limits_{y = 1}^H {[{{{||{{{\varPhi }_{OUT}}(x,y) - {{\varPhi }_{GT}}(x,y)} ||}^2}} ]} } ,$$
where θ is the parameter space, which includes the weights, biases, etc.; ΦOUT(x, y) is the network output; and ΦGT(x, y) is the label. During the training stage, the loss function serves as the feedback signal, and the adaptive moment estimation (Adam) optimizer tunes the network weights to minimize it [32]. Training continues until the network converges.
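A minimal TensorFlow 2 sketch of this training step is shown below; the learning rate is an assumed value.

import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4)   # Adam [32]; learning rate assumed
mse = tf.keras.losses.MeanSquaredError()                    # pixel-wise MSE, Eq. (14)

@tf.function
def train_step(model, phi_er, phi_gt):
    # One optimisation step: minimise the MSE between Phi_OUT and Phi_GT.
    with tf.GradientTape() as tape:
        phi_out = model(phi_er, training=True)
        loss = mse(phi_gt, phi_out)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss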

In the network testing stage, we verify the accuracy of our network using a test dataset that is never seen during training. The network is considered ready for actual measurement if the validation error is within an acceptable level; otherwise, we optimize the network parameters and return to the training stage.

The architecture is implemented using the TensorFlow framework (version 2.0) and trained on a computer with a Core i7-8750H CPU (2.20 GHz) and 16 GB of RAM. The dataset contains 27,000 groups of phase maps for network training and 3,000 groups for network testing. Four NVIDIA GeForce GTX 1080 Ti GPUs were employed to accelerate the training process, which took about 8 hours for the 96 training epochs.

4. Experiments

4.1 Experiments on simulated dataset

To evaluate the performance of the trained network, two testing results on the simulated dataset are shown in this section. The network output is also compared with the result of the Hilbert transform compensation (HTC) method. Figure 7 shows the testing result under uniform motion, where the phase-shift errors ɛ1, ɛ2, ɛ3 are set to -0.3 rad, 0 rad, and 0.3 rad, respectively. To compare the results clearly, we use a phase size of 300×300 pixels. Figures 7(a)-7(c) show the input data (phase with motion-induced error), the HTC result, and the network output (phase after network correction), respectively. Figure 7(d) shows the error distributions of the three results, and Fig. 7(e) shows a cross-section of the three phases for further comparison of the phase height. HTC suppresses the motion-induced error to some extent, but its result still retains ripple errors, as shown by the green line; its root mean squared error (RMSE) and structural similarity (SSIM) are 0.008 and 0.993, respectively. In contrast, the network-corrected phase shown by the red line is smooth and almost coincides with the black dashed line (the label data); its RMSE and SSIM are 0.002 and 0.997, respectively.
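The RMSE and SSIM values quoted here can be computed, for example, as in the following sketch; using scikit-image's SSIM with the phase range as data_range is an assumption about the exact settings.

import numpy as np
from skimage.metrics import structural_similarity

def evaluate(phi_out, phi_gt):
    # Report RMSE and SSIM between a corrected phase map and its ground truth.
    rmse = np.sqrt(np.mean((phi_out - phi_gt) ** 2))
    ssim = structural_similarity(phi_out, phi_gt,
                                 data_range=float(phi_gt.max() - phi_gt.min()))
    return rmse, ssim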

Fig. 7. Network testing results with uniform motion: (a) phase with motion-induced error (input data); (b) corrected phase by HTC method; (c) corrected phase by network (output data); (d) error distributions of the three phases; (e) the 200th row cross-section of the three phases.

Figure 8 shows the testing result under non-uniform motion, where ɛ1, ɛ2, ɛ3 are set to -0.1 rad, 0 rad, and 0.5 rad, respectively. Similar to the results in Fig. 7, the network also performs well in reducing the non-uniform motion-induced error. The error distribution shown in Fig. 8(d) is similar to that in Fig. 7(d), which verifies the performance of the trained network. Compared with the result of HTC, the network output almost eliminates the ripples and its cross-section is closer to the ground truth, as shown in Fig. 8(e). The RMSE values of HTC and the network output are 0.013 and 0.004, while the SSIM values are 0.992 and 0.996, respectively. These testing results demonstrate that both uniform and non-uniform motion-induced errors can be markedly reduced by the deep neural network.

Fig. 8. Network testing results with non-uniform motion: (a) phase with motion-induced error (input data); (b) corrected phase by HTC method; (c) corrected phase by network (output data); (d) error distributions of the three phases; (e) the 380th row cross-section of the three phases.

4.2 Experiments on real scene

Experiments on real scenes are implemented to verify the generalization performance of the proposed deep learning-based approach. First, a microscopic PSP system is developed, as shown in Fig. 9. It consists of a projector (TI LightCrafter 2010) with a resolution of 854×480 and a camera (Basler daA1600-60uc) with a resolution of 1350×800. The camera is synchronized by a trigger signal from the projector, and the captured frame rate is 30 fps. The field of view (FOV) of our system is 10 mm × 8 mm and the depth of field (DOF) is 7 mm. The system calibration is implemented using the triangular stereo model, which calculates the 3D world coordinates of an object point by calibrating both the camera and the projector. The phase is unwrapped by the least-squares algorithm, which is fast and robust for SPU.
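The paper does not detail the least-squares SPU implementation; for illustration, the classic unweighted DCT-based least-squares unwrapper (Ghiglia-Romero) is sketched below, which may differ from the authors' exact solver.

import numpy as np
from scipy.fft import dctn, idctn

def wrap(p):
    # Wrap a phase array into (-pi, pi].
    return np.angle(np.exp(1j * p))

def ls_unwrap(psi):
    # Unweighted least-squares unwrapping of the wrapped phase psi via the DCT.
    M, N = psi.shape
    dx = np.zeros_like(psi)
    dy = np.zeros_like(psi)
    dx[:, :-1] = wrap(np.diff(psi, axis=1))               # wrapped horizontal differences
    dy[:-1, :] = wrap(np.diff(psi, axis=0))               # wrapped vertical differences
    # Discrete Laplacian of the wrapped gradients; the rolled terms vanish at the
    # boundary because the last column/row of dx/dy was left as zero.
    rho = (dx - np.roll(dx, 1, axis=1)) + (dy - np.roll(dy, 1, axis=0))
    rho_hat = dctn(rho, norm='ortho')
    m = np.arange(M)[:, None]
    n = np.arange(N)[None, :]
    denom = 2 * np.cos(np.pi * m / M) + 2 * np.cos(np.pi * n / N) - 4
    denom[0, 0] = 1.0                                     # avoid dividing the DC term by zero
    phi_hat = rho_hat / denom
    phi_hat[0, 0] = 0.0                                   # solution is defined up to a constant
    return idctn(phi_hat, norm='ortho')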

Fig. 9. System setup and the measured objects.

To quantitatively verify the accuracy of the proposed method, a moving sphere is measured by the typical PSP, the Hilbert transform compensation (HTC) method, and the proposed method, respectively. We performed sphere fitting on the measured results of the three methods, which are shown in Figs. 10(a)-10(c), respectively. In Fig. 10(a), the fitted diameter of the typical PSP result is 14.296 mm and the RMS error is 0.102 mm. In Figs. 10(b) and 10(c), the fitted diameters of both the HTC method (14.488 mm) and the proposed method (14.476 mm) are closer to the ground truth, but the RMS error of the proposed method (0.059 mm) is smaller than that of the HTC method (0.080 mm), illustrating that the result of the proposed method is more reliable.
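The sphere fitting can be performed, for instance, with an algebraic least-squares fit such as the sketch below; the exact fitting routine used by the authors is not specified.

import numpy as np

def fit_sphere(points):
    # points: (K, 3) array of measured 3D coordinates.
    # Solve |p - c|^2 = r^2 rewritten as a linear system in (c, r^2 - |c|^2).
    A = np.c_[2 * points, np.ones(len(points))]
    b = (points ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    centre, d = sol[:3], sol[3]
    radius = np.sqrt(d + (centre ** 2).sum())
    residuals = np.linalg.norm(points - centre, axis=1) - radius
    return centre, 2 * radius, np.sqrt(np.mean(residuals ** 2))  # centre, diameter, RMS error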

Fig. 10. Fitting results of the moving sphere: (a) typical PSP; (b) HTC method; (c) deep learning-based method.

The entire dynamic measurement sequence of the moving sphere obtained by the proposed method is shown in Visualization 1, and one frame of the sequence is used for comparison, where Figs. 11(a)-11(c) are the 3D reconstructions by the typical PSP, the HTC method, and the proposed method. Vertical stripes are obvious in the surface reconstructed by the typical PSP, as shown in Fig. 11(a). In Fig. 11(b), the motion-induced error is alleviated and the vertical stripes have almost disappeared after applying the HTC method. With the proposed method, the reconstructed surface is smoother than that of the HTC method and exhibits the best performance. Figure 11(d) shows the cross-sections along the white lines for the three methods. From the zoomed-in view, the HTC method leaves residual errors at some points, whereas the proposed method works stably in error reduction. These experimental results demonstrate the generalization performance of the trained network and illustrate that the proposed deep learning-based method corrects motion-induced errors better than the HTC method in microscopic dynamic measurement.

Fig. 11. Measured results of the moving sphere: (a) typical PSP; (b) HTC method; (c) deep learning-based method; (d) cross-section of the three methods (Visualization 1).

To qualitatively evaluate the generalization capability of the deep learning-based method, we measure a tooth model to perform a dynamic measurement of a complex surface with non-uniform motion. The dynamic measurement sequence can be found in Visualization 2, and four frames of the sequence are used for comparison. Figures 12(a) and 12(b) are the 3D reconstructions by the traditional PSP and the proposed deep learning-based method, and Figs. 12(c) and 12(d) show zoomed-in views of the white boxes in Figs. 12(a) and 12(b). The vertical stripes caused by non-uniform motion in Figs. 12(a) and 12(c) are removed by the proposed deep learning-based approach, as shown in Figs. 12(b) and 12(d). Besides, Fig. 12(e) shows the cross-sections along the white lines in Figs. 12(c) and 12(d): large ripples are clear in the results of the typical PSP, while the results of the proposed method are smooth. These results demonstrate that the proposed method is effective for complex surfaces with non-uniform motion.

Fig. 12. Measured results of the moving tooth model: (a) typical PSP; (b) deep learning-based PSP; (c) zoom-in view of white box in (a); (d) zoom-in view of white box in (b); (e) cross-section of the two methods (Visualization 2).

In addition, we evaluate the performance of the deep learning-based method by measuring a deforming surface. A piece of smooth plastic is deformed under a mechanical load while we perform the dynamic measurement. Figure 13 shows four frames of the measurement results, and the entire sequence can be found in Visualization 3. The typical PSP without error reduction is implemented first, as shown in Fig. 13(a); the 3D reconstruction is almost unacceptable because the reconstructed surface geometry is not smooth. Then, the deep learning-based method is utilized to correct the motion-induced error. In Fig. 13(b), the vertical stripes are removed and the measurement quality is improved. To compare the two results more clearly, Figs. 13(c) and 13(d) show zoomed-in views of the white boxes in Figs. 13(a) and 13(b). In Fig. 13(e), the cross-sections along the white lines in Figs. 13(c) and 13(d) further verify the feasibility of the proposed method for measuring deformable objects.

Fig. 13. Measured results of plastic deformation: (a) typical PSP; (b) deep learning-based PSP; (c) zoom-in view of white box in (a); (d) zoom-in view of white box in (b); (e) cross-section of the two methods (Visualization 3).

5. Conclusions and discussions

This paper proposed a deep learning-based motion-induced error reduction method for microscopic PSP. In our framework, a fringe compensation module is first presented to solve the problem of phase discontinuities in SPU, which supports reconstruction from fewer fringe patterns; then, a deep neural network is constructed by training on a self-established dataset. The network shows a powerful nonlinear estimation ability for pixel-wise error correction in microscopic dynamic measurement. Both simulation and real experimental results demonstrate that the deep learning-based method can significantly suppress uniform and non-uniform motion-induced errors even under low capture speed.

Although the deep learning-based method exhibits good performance, several points can be further discussed. First, the applicability of the proposed method may be limited for sharp-edged objects, because the dataset lacks simulations of such object edges. Second, the proposed method may leave residual errors in some measured scenes, such as those with imaging defocus, because the coupled factors in real data corrupt the features of the motion-induced error distribution and reduce the accuracy of the network. Third, if the object surface has ripples whose period is similar to that of the error distribution, the proposed method may fail. Future work will improve the performance of the network by extending the training dataset and optimizing the network structure.

Funding

National Natural Science Foundation of China (61727810, 61773127); Natural Science Foundation of Guangdong Province (2018A030313306, 2019B1515120036, 501200069); Key Areas of Research and Development Plan Project of Guangdong (2019B010147001).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. S. Zhang, “Recent progresses on real-time 3-d shape measurement using digital fringe projection techniques,” Opt. Lasers Eng. 48(2), 149–158 (2010). [CrossRef]  

2. Z. Zhang, “Review of single-shot 3D shape measurement by phase calculation-based fringe projection techniques,” Opt. Lasers Eng. 50(8), 1097–1106 (2012). [CrossRef]  

3. Z. Wang, Z. Zhang, N. Gao, Y. Xiao, F. Gao, and X. Jiang, “Single-shot 3D shape measurement of discontinuous objects based on a coaxial fringe projection system,” Appl. Opt. 58(5), A169–A178 (2019). [CrossRef]  

4. X. Su and Q. Zhang, “Dynamic 3-d shape measurement method: a review,” Opt. Lasers Eng. 48(2), 191–204 (2010). [CrossRef]  

5. M. Takeda and K. Mutoh, “Fourier transform profilometry for the automatic measurement of 3-D object shapes,” Appl. Opt. 22(24), 3977–3982 (1983). [CrossRef]  

6. C. Zuo, S. Feng, L. Huang, T. Tao, W. Yin, and Q. Chen, “Phase shifting algorithms for fringe projection profilometry: A review,” Opt. Lasers Eng. 109, 23–59 (2018). [CrossRef]  

7. M. Duan, Y. Jin, C. Xu, X. Xu, C. Zhu, and E. Chen, “Phase-shifting profilometry for the robust 3-d shape measurement of moving objects,” Opt. Express 27(16), 22100–22115 (2019). [CrossRef]  

8. Y. Hu, Q. Chen, S. Feng, and C. Zuo, “Microscopic fringe projection profilometry: A review,” Opt. Lasers Eng. 135, 106192 (2020). [CrossRef]  

9. S. Feng, C. Zuo, T. Tao, Y. Hu, M. Zhang, Q. Chen, and G. Gu, “Robust dynamic 3-d measurements with motion-compensated phase-shifting profilometry,” Opt. Lasers Eng. 103, 127–138 (2018). [CrossRef]  

10. C. Zuo, L. Huang, M. Zhang, Q. Chen, and A. Asundi, “Temporal phase unwrapping algorithms for fringe projection profilometry: A comparative review,” Opt. Lasers Eng. 85, 84–103 (2016). [CrossRef]  

11. S. Zhang, “Absolute phase retrieval methods for digital fringe projection profilometry: A review,” Opt. Lasers Eng. 107, 28–37 (2018). [CrossRef]  

12. C. Zuo, T. Tao, S. Feng, L. Huang, A. Asundi, and Q. Chen, “Micro Fourier transform profilometry (µFTP): 3D shape measurement at 10,000 frames per second,” Opt. Lasers Eng. 102, 70–91 (2018). [CrossRef]  

13. B. Li, Z. Liu, and S. Zhang, “Motion-induced error reduction by combining Fourier transform profilometry with phase-shifting profilometry,” Opt. Express 24(20), 23289–23303 (2016). [CrossRef]  

14. Y. An, J. Hyun, and S. Zhang, “Pixel-wise absolute phase unwrapping using geometric constraints of structured light system,” Opt. Express 24(16), 18445–18459 (2016). [CrossRef]  

15. J. Qian, T. Tao, S. Feng, Q. Chen, and C. Zuo, “Motion-artifact-free dynamic 3D shape measurement with hybrid Fourier-transform phase-shifting profilometry,” Opt. Express 27(3), 2713–2731 (2019). [CrossRef]  

16. W. Guo, Z. Wu, Y. Li, Y. Liu, and Q. Zhang, “Real-time 3D shape measurement with dual-frequency composite grating and motion-induced error reduction,” Opt. Express 28(18), 26882–26897 (2020). [CrossRef]  

17. X. Liu, T. Tao, Y. Wan, and J. Kofman, “Real-time motion-induced-error compensation in 3D surface-shape measurement,” Opt. Express 27(18), 25265–25279 (2019). [CrossRef]  

18. Y. Wang, V. Suresh, and B. Li, “Motion-induced error reduction for binary defocusing profilometry via additional temporal sampling,” Opt. Express 27(17), 23948–23958 (2019). [CrossRef]  

19. Y. Wang, Z. Liu, C. Jiang, and S. Zhang, “Motion induced phase error reduction using a Hilbert transform,” Opt. Express 26(26), 34224–34235 (2018). [CrossRef]  

20. W. Guo, Z. Wu, Q. Zhang, and Y. Wang, “Real-time motion-induced error compensation for 4-step phase-shifting profilometry,” Opt. Express 29(15), 23822–23834 (2021). [CrossRef]  

21. L. Lu, V. Suresh, Y. Zheng, Y. Wang, and B. Li, “Motion induced error reduction methods for phase shifting profilometry: a review,” Opt. Lasers Eng. 141, 106573 (2021). [CrossRef]  

22. L. Lu, Y. Yin, Z. Su, X. Ren, Y. Luan, and J. Xi, “General model for phase shifting profilometry with an object in motion,” Appl. Opt. 57(36), 10364–10369 (2018). [CrossRef]  

23. C. Zuo, J. Qian, S. Feng, W. Yin, Y. Li, P. Fan, J. Han, K. Qian, and Q. Chen, “Deep learning in optical metrology: a review,” Light Sci. Appl. 11(1), 39 (2022). [CrossRef]  

24. S. Feng, C. Zuo, Y. Hu, Y. Li, and Q. Chen, “Deep-learning-based fringe-pattern analysis with uncertainty estimation,” Optica 8(12), 1507–1510 (2021). [CrossRef]  

25. K. Wang, Q. Kemao, J. Di, and J. Zhao, “Y4-Net: a deep learning solution to one-shot dual-wavelength digital holographic reconstruction,” Opt. Lett. 45(15), 4220–4223 (2020). [CrossRef]  

26. K. Wang, Y. Li, Q. Kemao, J. Di, and J. Zhao, “One-step robust deep learning phase unwrapping,” Opt. Express 27(10), 15100–15115 (2019). [CrossRef]  

27. S. Feng, C. Zuo, L. Zhang, W. Yin, and Q. Chen, “Generalized framework for non-sinusoidal fringe analysis using deep learning,” Photonics Res. 9(6), 1084–1098 (2021). [CrossRef]  

28. E. Aguénounon, J. Smith, M. AI-Taher, M. Diana, X. Intes, and S. Gioux, “Real-time, wide-field and high-quality single snapshot imaging of optical properties with profile correction using deep learning,” Biomed. Opt. Express 11(10), 5701–5716 (2020). [CrossRef]  

29. T. Chen and H. Chen, “Approximations of continuous functionals by neural networks with application to dynamic systems,” IEEE Trans. Neural Netw. 4(6), 910–918 (1993). [CrossRef]  

30. K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural Networks 4(2), 251–257 (1991). [CrossRef]  

31. J. Tan, Z. He, B. Dong, Y. Bai, L. Lei, and J. Li, “Fast and robust fringe projection profilometry for surface with hole discontinuities via backlighting,” Meas. Sci. Technol. 32(5), 055002 (2021). [CrossRef]  

32. D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” https://arxiv.org/abs/1412.6980 (2014).

Supplementary Material (3)

Visualization 1, Visualization 2, and Visualization 3.
