
Unsupervised deep learning for 3D reconstruction with dual-frequency fringe projection profilometry

Open Access

Abstract

The fringe projection profilometry (FPP) technique has been widely applied to three-dimensional (3D) reconstruction in industry for its high speed and high accuracy. Recently, deep learning has been successfully applied in FPP to achieve high-accuracy and robust 3D reconstruction efficiently. However, network training requires generating and labeling numerous ground truth 3D data, which is time-consuming and labor-intensive. In this paper, we propose an unsupervised convolutional neural network (CNN) model based on dual-frequency fringe images to address this problem. A fringe reprojection model is created to transform the output height map to the corresponding fringe image, enabling unsupervised training of the CNN. Our network takes two fringe images with different frequencies and outputs the corresponding height map. Unlike most previous works, the proposed network avoids extensive data annotation and can be trained without ground truth 3D data. Experimental results verify that our unsupervised model (1) achieves reconstruction accuracy competitive with previous supervised methods, (2) has excellent anti-noise and generalization performance, and (3) saves time for dataset generation and labeling (3.2 hours, one-sixth of the supervised method) and computer space for dataset storage (1.27 GB, one-tenth of the supervised method).

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Fringe projection profilometry (FPP) [1–4] is widely applied in non-contact 3D reconstruction because of its high speed and accuracy. FPP uses the phase-shifting algorithm [5] or the Fourier transform algorithm [6] to process a series of fringe images that are projected onto the object by a projector and then captured by a camera. The latter algorithm requires only one pattern and directly yields the absolute phase map, but it is not robust for reconstructing non-smooth surfaces [7]. For higher accuracy and resolution, the former algorithm is more widely used. The schematic diagram of phase calculation based on the phase-shifting algorithm is shown in Fig. 1. At least three fringe images (usually four for higher accuracy) are needed for this algorithm [8], and the relative phase map is obtained through the phase-shifting calculation. Unwrapping methods, such as path-following methods [9], weighted least-squares phase unwrapping [10], and temporal phase unwrapping algorithms [11–13], are then applied to transform the relative phase map into the absolute phase, after which the 3D reconstruction result (the height map) can be obtained with the calibration data [14]. However, capturing multiple fringe images is time-consuming, which compromises efficiency. Furthermore, both algorithms suffer from gamma nonlinear error caused by optical path distortion in the measuring facilities [15]. Creating a look-up table [16] or taking more phase-shifting steps [17] can reduce this error but inevitably consumes additional time.

Fig. 1. The flowchart of phase-shifting calculation.

Recently, in order to reduce the number of fringe images that need to be captured and to eliminate the influence of the gamma nonlinearity mentioned above, deep learning has been applied to FPP in various ways. Feng et al. [18] collected a large set of fringe images, used the phase-shifting algorithm to produce the corresponding ground truth relative phase maps, and first proposed a deep CNN that recovers a high-accuracy and robust relative phase from a single fringe pattern, combining the advantages of the phase-shifting and Fourier algorithms. Spoorthi et al. [19] trained PhaseNet with numerous simulated relative phase maps as input and absolute phase maps as output to solve the phase-unwrapping problem, showing that a CNN can overcome the instability of pixel classification in traditional unwrapping algorithms. Yu et al. [20] evaluated the capability of a CNN for fringe pattern generation: it inputs one or two grating fringe images and outputs multiple sets of phase-shifted sinusoidal fringes with different frequencies. Several times as many fringe images must be captured as ground truth to train the network; the output images are then substituted into the traditional temporal phase-shifting algorithm to obtain the target absolute phase map. Because at most two fringe images need to be captured by the camera, this method can measure dynamic objects accurately. Qian et al. [21] used one RGB fringe image, whose three color channels contain three phase-shifted sinusoidal fringes, to train a CNN; as a result, the height map can be generated directly from only one RGB fringe pattern. However, a large quantity of ground truth height maps was still necessary for the CNN training. Jeught and Dirckx [22] explored deep learning for fringe analysis by training an end-to-end CNN model with 12,500 groups of 2D/3D data to generate the 3D absolute phase directly from one simulated grayscale fringe image; it predicts the absolute phase map in one step without explicit phase calculation and unwrapping. Deep learning also has other applications in fringe analysis such as modulation enhancement [23], system calibration [24] and image denoising [25,26].

By replacing the traditional phase processing algorithms with a CNN, the aforementioned methods avoid the errors caused by the traditional algorithm principles. Moreover, once the network is trained, these methods can reconstruct the height map from only one or two fringe images, far fewer than traditional algorithms require (at least four fringe images). Considerable time for image projection and capture can be saved, so the reconstruction efficiency is significantly improved. However, the methods mentioned above are subject to several restrictions: (1) Although the trained CNN reduces the number of input fringe images, establishing the training dataset of fringe images and corresponding ground truth height maps is even more troublesome and time-consuming, because generating and labeling the ground truth for each object requires capturing at least three fringe images and running phase-shifting, phase unwrapping and calibration algorithms. In other words, although the network input is only one fringe image, two (or more) additional fringe images must still be collected to produce the corresponding ground truth data; processing the whole dataset in this way involves a large amount of extra image acquisition and manual labeling work, which takes hundreds of hours. (2) The extra fringe images and ground truth data occupy a large amount of computer memory. (3) The absolute phase map calculated by the phase-shifting algorithm inevitably contains errors caused by gamma nonlinear distortion and random noise, so it is unreasonable to regard this calculation result as the ground truth for CNN training. In short, the demand for massive, high-accuracy labeled 3D height maps has become one of the most challenging problems for deep learning-based fringe analysis.

Zheng et al. [27] used computer graphics to establish a digital twin with the parameters calibrated from the real-world system to simplify dataset generation, which reduced the time for ground truth generation to a certain extent but still demanded considerable manual work and memory space. With the digital twin, the ground truth could be obtained directly, avoiding repeated calculation and saving much labeling time. Compared with other methods, the ground truth could be generated without error, and the trained networks could be applied to different FPP systems as long as the relative positions of the camera and the projector are fixed. Nevertheless, the computer-graphics modeling in the digital twin system still costs a lot of time, and the large number of graphic models occupies substantial memory, compromising efficiency. Besides, the digital twin system and the real-world one are not exactly the same, which reduces the accuracy of the absolute phase map when the trained CNN is applied to the real-world FPP system.

In general, all the above methods are based on supervised training, which requires a large set of high-quality training data. Owing to the defects of the labeling algorithms themselves and the difficulty of generating ground truth, high-quality training datasets and efficient training models remain unsolved problems in this field. Against this background, we propose an unsupervised framework to train the network, which, to the best of our knowledge, is the first application of unsupervised learning to an FPP system. Unsupervised deep learning [28] is an important part of machine learning and is usually used when building a large labeled dataset or labeling the ground truth is impractical. If the CNN is trained in an unsupervised way, the dataset contains only the input data, which is easy to obtain. With the unsupervised method applied to the FPP system, the calculation for labeling the ground truth can be omitted and a large amount of memory space can be saved. In summary, the unsupervised CNN for FPP can be trained with far fewer images and less data storage, which further improves the reconstruction efficiency.

The key to realizing an unsupervised deep learning network is to establish the mathematical connection between the input fringe image data and the output 3D height map data. While maintaining reconstruction accuracy, we propose an end-to-end unsupervised framework based on a reprojection model. After a small set of ground truth height maps is used to pretrain the network and constrain its learning direction, the framework learns 3D object reconstruction in an unsupervised manner that removes the need for massive supervised ground truth 3D data, enabling much faster data labeling than a purely supervised framework. The proposed framework takes two modulated fringe patterns with high and low frequency as the input and outputs the height map. With the extrinsic matrices of the measuring system fixed, i.e., the relative position of the projector and camera constant, once the height map is predicted, the standard fringe is re-projected onto its surface by the proposed reprojection model, so that the newly deformed fringe can be compared with the input fringe to establish the loss function. As shown in Fig. 2, the flowchart forms a closed loop that ensures both forward and backward propagation [29], which is the premise of convolutional neural network training. Because the phase-shifting calculation is replaced by reprojection, the ground truth error caused by the traditional phase-shifting algorithm is eliminated. Compared with supervised learning, the main advantage of this model is that the proposed unsupervised framework achieves competitive accuracy (root mean square error within 0.0425 mm) and robustness with much less time and labor for ground truth calculation (about one-sixth of the data generation time of the supervised CNN) and much less memory (about one-tenth of the space of the supervised CNN) for storing 3D phase data and extra fringe images.

Fig. 2. The unsupervised network architecture.

The remainder of this paper is organized as follows. Section 2 discusses the principle of the proposed method. Section 3 analyzes the experiments and results. Section 4 presents the discussion. Section 5 concludes the paper.

2. Unsupervised network model

In this section, the neural network architecture is first illustrated. Then we introduce how the loss function is established based on a reprojection model.

2.1 Neural network architecture

The whole network architecture is shown in Fig. 2. The input of the network consists of two fringe images of high and low frequency, and the output is the corresponding height map. Two Sub-Nets learn the fringe information contained in the two frequency channels respectively. After the Sub-Nets, we place more emphasis on the high-frequency fringe because it contains more detail information: before the concatenation layer, two more convolutional layers are placed in the high-frequency path than in the low-frequency path to increase the learning weight of the high-frequency fringe image. After the concatenation layer, another two convolutional layers output the one-channel height map. During training, once the height map is predicted, the reprojection of each frequency is employed to transform the height map back into fringe images, and two loss functions can then be established for the unsupervised training.
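To make the data flow concrete, the following is a minimal PyTorch sketch of this dual-frequency architecture, assuming a SubNet module that stands in for the U-Net-style sub-network of Fig. 3 (sketched further below); the layer counts, channel widths and names are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn as nn

class SubNet(nn.Module):
    """Stand-in for the U-Net-style Sub-Net of Fig. 3 (a fuller sketch follows below)."""
    def __init__(self, out_channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, out_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)

class DualFrequencyNet(nn.Module):
    """Two Sub-Nets (high/low frequency), extra convolutions on the high-frequency path,
    concatenation, then two final convolutions producing the one-channel height map."""
    def __init__(self, ch=64):
        super().__init__()
        self.subnet_high = SubNet(ch)
        self.subnet_low = SubNet(ch)
        self.high_extra = nn.Sequential(          # two extra layers raise the high-frequency weight
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.head = nn.Sequential(                # after concatenation: two layers -> height map
            nn.Conv2d(2 * ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1),
        )

    def forward(self, fringe_high, fringe_low):   # each input of shape (B, 1, 256, 256)
        f_high = self.high_extra(self.subnet_high(fringe_high))
        f_low = self.subnet_low(fringe_low)
        return self.head(torch.cat([f_high, f_low], dim=1))
```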

Note that preliminary experiments showed that single-fringe unsupervised learning may cause the predicted three-dimensional height map to inherit the fringe pattern of the input image, which makes the trained network less robust. The dual-frequency heterodyne algorithm [30] unwraps the phase by using a low-frequency image to assist high-frequency phase unwrapping. Inspired by this idea, we combine two fringe images with high and low frequencies into a two-channel input for the network. The low-frequency fringe helps rectify the learning direction during training: with the mutual constraint of the two frequencies, inheriting single-frequency fringe information no longer reduces the loss value. Therefore, the dual-frequency network effectively improves the stability of unsupervised training.

The detail of the Sub-Net is illustrated in Fig. 3. We draw on U-Net [31] to establish the Sub-Nets for its excellent performance in semantic segmentation. The architecture of the Sub-Net includes the contracting path (encoding) on the left, the expansive path (decoding) on the right and the copy-and-concatenate path (copying the feature maps in the contracting path and splicing them onto the feature maps in the expansive path) in the middle. As the modulated simulated fringe image passes through the network, the height and width of the feature map are halved and the number of channels is doubled after each convolution block in the encoding path; each block contains two convolutional layers with 3×3 kernels and a max pooling layer with a 2×2 kernel (for increasing the number of channels and down-sampling). Each convolutional layer is followed by a batch normalization (BN) layer and a ReLU activation function, which helps prevent overfitting. In the expansive path, the height and width of the feature map double and the number of channels is halved in each decoding block, which consists of an up-convolutional layer with a 2×2 kernel and two convolutional layers with 3×3 kernels (for up-sampling and decreasing the number of channels), each again followed by a BN layer and a ReLU layer. The last feature map of each encoding block is copied and concatenated with the feature map of the corresponding decoding block via the copy-and-concatenate path to preserve feature information that would otherwise be lost in the encoder and decoder of the deep CNN. Furthermore, the last convolution block is designed so that the height and width of the Sub-Net output feature maps are consistent with the input image (256×256) and the number of channels is 64.
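A minimal sketch of one Sub-Net under this description is given below; the encoder depth and channel counts are assumptions chosen so that a 256×256 single-channel fringe image yields a 64-channel feature map of the same size, with BN and ReLU after every convolution and copy-and-concatenate skip connections.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two 3x3 convolutions, each followed by batch normalization and ReLU, as described above.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class SubNet(nn.Module):
    def __init__(self, base_ch=64, depth=3):
        super().__init__()
        chs = [base_ch * 2 ** i for i in range(depth + 1)]        # e.g. 64, 128, 256, 512
        self.pool = nn.MaxPool2d(2)                               # 2x2 max pooling (down-sampling)
        self.encoders = nn.ModuleList(
            [conv_block(1 if i == 0 else chs[i - 1], chs[i]) for i in range(depth + 1)])
        self.upconvs = nn.ModuleList(
            [nn.ConvTranspose2d(chs[i], chs[i - 1], 2, stride=2) for i in range(depth, 0, -1)])
        self.decoders = nn.ModuleList(
            [conv_block(2 * chs[i - 1], chs[i - 1]) for i in range(depth, 0, -1)])

    def forward(self, x):
        skips = []
        for enc in self.encoders[:-1]:
            x = enc(x)
            skips.append(x)                    # saved for the copy-and-concatenate path
            x = self.pool(x)                   # halve height/width; channels already doubled
        x = self.encoders[-1](x)               # bottom of the "U"
        for up, dec, skip in zip(self.upconvs, self.decoders, reversed(skips)):
            x = up(x)                          # 2x2 up-convolution doubles height/width
            x = dec(torch.cat([skip, x], dim=1))
        return x                               # 64 channels, same 256x256 size as the input
```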

Fig. 3. The detailed architecture of the Sub-Net. Each cube represents a three-dimensional feature map, and both the size of the feature map and the number of channels are marked around the cube. The input and the output data are the fringe image and the height map respectively.

2.2 Loss function

The main difficulty of our proposed method is to establish the conversion between the 3D height maps and the coded information in the fringe images so that the loss function can be constructed. Differentiable renderers [32,33] have been widely applied to realize unsupervised learning for 3D reconstruction: the texture of the object is rendered onto the predicted 3D model, two-dimensional images are extracted from the model, and the loss function is established against the input images to realize unsupervised training. Inspired by this idea, we liken the rendering process to fringe projection in an FPP system, which also renders a fringe texture onto the object. We create a differentiable reprojection model based on the projection principle to render the fringe texture onto the output 3D height map. With this reprojection model, a predicted two-dimensional fringe image can be obtained and compared with the initial input fringe image to form the loss function.

This paper creates a reprojection model for the unsupervised framework to close the loop of the unsupervised learning path. Figure 4 is a simplified schematic diagram of the FPP system as well as of the reprojection model, comprising a pinhole camera, a projector and the object to be measured. With the relative position of the projector and camera defined, the height of the object can be calculated according to the principle of triangular ranging [34]. The standard fringe images are first generated by computer and sent to the projector; the standard fringe is modulated by the object, and the camera finally captures the modulated image. Since $\Delta \textrm{ABO}$ and $\Delta \textrm{CPO}$ in Fig. 4 are similar triangles, the similar-triangle relation can be written as:

$$\frac{{h(x,y)}}{{L - h(x,y)}} = \frac{{\overline {AB} }}{D}.$$
where $L$ is the vertical distance from the optical center of the camera to the reference plane and $D$ is the distance between the camera and the projector. Since the y-axis is parallel to the fringe direction, the phase value in the projector coordinates changes continuously along the x-axis and the phases of all points on each projector ray are consistent; the relative phase value of point O can therefore be expressed as $\Delta \varphi (x,y) = 2\pi {f_0}({x_A} - {x_B})$, where ${f_0}$ is the frequency of the grating fringes and $\Delta \varphi (x,y)$ is the phase map in pixels. The relationship between $h(x,y)$ and $\Delta \varphi (x,y)$ can then be expressed as:
$$\frac{1}{{h(x,y)}} = \frac{1}{L} + \frac{{2\pi D{f_0}}}{{L \cdot \Delta \varphi (x,y)}}.$$
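Equation (2) follows directly from Eq. (1): substituting $\overline {AB} = {x_A} - {x_B} = \Delta \varphi (x,y)/(2\pi {f_0})$ and solving for $h(x,y)$ gives
$$\frac{{h(x,y)}}{{L - h(x,y)}} = \frac{{\Delta \varphi (x,y)}}{{2\pi {f_0}D}}\;\; \Rightarrow \;\;\frac{1}{{h(x,y)}} = \frac{{2\pi {f_0}D + \Delta \varphi (x,y)}}{{L \cdot \Delta \varphi (x,y)}} = \frac{1}{L} + \frac{{2\pi D{f_0}}}{{L \cdot \Delta \varphi (x,y)}}.$$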

Fig. 4. The schematic diagram of FPP system.

When the standard fringes projected by the projector are reflected by the surface of the object, the intensity distribution of modulated fringes can be described as:

$$I(x,y) = \textrm{a}(x,y) + \textrm{b}(x,y)\cos [{2\pi {f_0}x + \Delta \varphi (x,y)} ].$$
where $\textrm{a}(x,y)$ is the background light intensity, $\textrm{b}(x,y)$ is the modulation amplitude and $I(x,y)$ is the intensity distribution of the fringe image. With $\Delta \varphi (x,y)$ known, the modulated fringe image $I(x,y)$ can be calculated by Eq. (3).

To construct the reprojection process, we establish the conversion from the 3D height map to the 2D fringe patterns during network training based on the above equations (Eq. (1) to Eq. (3)). Once the height map $h(x,y)$ of the object is predicted by the neural network, $\Delta \varphi (x,y)$ can be calculated by Eq. (2) with the position parameters ($L$ and $D$) unchanged. We then simulate projecting the standard fringes onto the predicted height map using Eq. (3), which is similar to the rendering process in [32].
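As an illustration, a differentiable PyTorch sketch of this reprojection step is given below. The constants mirror the simulation setup used later in the paper ($L$ = 800 mm, $D$ = 80 mm), while the background/modulation intensities `a`, `b` and the pixel-coordinate normalization are assumptions for illustration rather than the authors' implementation.

```python
import math
import torch

def reproject_fringe(height, f0, L=800.0, D=80.0, a=0.5, b=0.5):
    """Render the modulated fringe image of a predicted height map (Eqs. (2)-(3)).

    height: (B, 1, H, W) tensor of heights in millimetres (the network output).
    f0:     fringe frequency, counted here as periods across the image width.
    """
    B, _, H, W = height.shape
    # Eq. (2) rearranged for the phase map: delta_phi = 2*pi*D*f0*h / (L - h).
    delta_phi = 2 * math.pi * D * f0 * height / (L - height)
    # Pixel x-coordinate normalized to [0, 1) so that f0 counts fringe periods per image.
    x = torch.linspace(0, 1, W, device=height.device).view(1, 1, 1, W).expand(B, 1, H, W)
    # Eq. (3): modulated fringe intensity; every operation here is differentiable,
    # so gradients flow from the fringe-domain loss back to the height map.
    return a + b * torch.cos(2 * math.pi * f0 * x + delta_phi)
```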

The difference between the reprojected fringe image and the input fringe image forms the loss function; the smaller the loss value, the closer the predicted height map is to the ground truth. The proposed reprojection model is composed entirely of differentiable formulas, which ensures the feasibility of back propagation. The model allows grating fringe reprojection from any angle and position in computer space, so changes in the camera and projector positions do not become a bottleneck for its generalization ability.

With the reprojection results and the input images, we establish two loss terms from the two reprojected fringe images and their corresponding input fringes. The root mean square error (RMSE) [35] is employed to form the final loss function, which is composed of a high-frequency part and a low-frequency part:

$$Loss = {\omega _{low}}Loss_1 + {\omega _{high}}Loss_2,$$
where:
$$Loss_i = \sqrt {\frac{1}{n}\sum\limits_{x = 1}^H {\sum\limits_{y = 1}^W {{{({{I_i}(x,y) - {{\hat{I}}_i}(x,y)})}^2}} } } ,\quad i = 1,2.$$

Here, ${I_i}(x,y)$ is the intensity distribution of the input fringe image, ${\hat{I}_i}(x,y)$ is the reprojected intensity, $H$ and $W$ are the pixel height and width of the image, and $n$ is the total number of pixels. The weights ${\omega _{low}}$ and ${\omega _{high}}$ control the relative contributions of the low-frequency and high-frequency RMSE losses. To learn more detail from the high-frequency fringe, we set ${\omega _{high}} = 0.8$ and ${\omega _{low}} = 0.35$. Note that the values of these weights were selected empirically by parameter tuning during training, which is common in deep learning [32,33]. With the parameters set, the loss function is complete and the unsupervised training can be carried out.
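A short PyTorch sketch of this weighted dual-frequency loss is shown below; the function and argument names are illustrative, and the reprojected fringes would come from a differentiable reprojection such as the `reproject_fringe` sketch above.

```python
import torch

def dual_frequency_loss(pred_low, pred_high, input_low, input_high,
                        w_low=0.35, w_high=0.8):
    """Weighted sum of per-frequency RMSE terms (Eqs. (4)-(5)).

    pred_*  : fringe images re-rendered from the predicted height map.
    input_* : the captured (or simulated) input fringe images.
    """
    def rmse(pred, target):
        return torch.sqrt(torch.mean((target - pred) ** 2))

    return w_low * rmse(pred_low, input_low) + w_high * rmse(pred_high, input_high)
```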

The correctness of the loss function is key to reliable unsupervised training. Therefore, the following experiment is designed to verify the reprojection model, so that the correctness of the reprojection-based loss function can be demonstrated.

2.3 Verification of the reprojection model

It is necessary to test the reliability of the proposed reprojection model so that the loss function can be established correctly and the training of the unsupervised network is mathematically reasonable. To verify the feasibility of the reprojection model, we first set up a simulated system: the distance between the projector and the camera ($D$) was set to 80 mm and the distance between the reference plane and the camera ($L$) to 800 mm, a common arrangement in FPP systems. A peaks-shaped surface was then established, as shown in Fig. 5(a), and Eq. (1) to Eq. (3) were used to simulate the reprojection process and generate the corresponding fringe images shown in Fig. 5(c). To test the accuracy of these fringe images, the traditional phase-shifting, unwrapping and calibration algorithms were applied to calculate the height map. With Eq. (3) describing the modulated fringe images, the wrapped phase map can be expressed as:

$$\Delta \varphi (x,y) = {\tan ^{ - 1}}\left( {\frac{{ - \sum {{I_i}} \sin (\delta (i))}}{{\sum {{I_i}} \cos (\delta (i))}}} \right).$$
where $\delta (i) = i \times 2\pi /N$ is the phase-shifting step and ${I_i}(x,y)$ is the $i$th phase-shifted fringe image. We used four phase-shifting steps instead of three to ensure the accuracy of the algorithm. With the wrapped phase obtained from Eq. (6), as shown in Fig. 5(d), the weighted least-squares unwrapping algorithm [36] was applied to generate the absolute phase map. From Fig. 5(e), it can be observed that the absolute phase map clearly shows the morphological characteristics of the peaks. For further demonstration, we employed the calibration method in [14] to transform the absolute phase map into a 3D height map. Several calibration boards with random postures were placed in the computer space to simulate the calibration process, as shown in Fig. 5(f). By extracting the absolute phase values at the corners of the calibration boards and the corresponding corner pixel coordinates in the captured images, the least-squares fitting of the phase-height mapping function in [14] could be carried out. With the calibration parameters fitted in the simulated calibration model, the absolute phase map in Fig. 5(e) was reconstructed into the height map shown in Fig. 5(b). The error map is shown in Fig. 5(g); the average point-to-point error is 0.0028 mm and the worst case is 0.0063 mm. In short, the reprojection model can produce reliable, high-accuracy fringe images from any 3D height map.
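This verification can be reproduced numerically along the following lines. The NumPy sketch below renders four phase-shifted fringes of a known surface with Eqs. (2)-(3) and recovers the wrapped phase with Eq. (6); the smooth Gaussian-bump test surface is a stand-in (the paper uses a peaks-shaped surface), and the geometry constants follow the simulation setup ($L$ = 800 mm, $D$ = 80 mm).

```python
import numpy as np

L, D, f0, N = 800.0, 80.0, 10, 4                     # geometry (mm) and four phase-shifting steps
H = W = 256
x = np.linspace(0, 1, W)[None, :].repeat(H, axis=0)  # normalized pixel coordinates
y = np.linspace(0, 1, H)[:, None].repeat(W, axis=1)
height = 30.0 * np.exp(-((x - 0.5) ** 2 + (y - 0.5) ** 2) / 0.05)   # stand-in smooth surface (mm)

# Eq. (2) rearranged, then Eq. (3) with N equally spaced phase shifts delta_i = 2*pi*i/N.
delta_phi = 2 * np.pi * D * f0 * height / (L - height)
fringes = [0.5 + 0.5 * np.cos(2 * np.pi * f0 * x + delta_phi + 2 * np.pi * i / N)
           for i in range(N)]

# Eq. (6): wrapped phase recovered from the N phase-shifted fringe images.
num = -sum(I * np.sin(2 * np.pi * i / N) for i, I in enumerate(fringes))
den = sum(I * np.cos(2 * np.pi * i / N) for i, I in enumerate(fringes))
wrapped_phase = np.arctan2(num, den)                 # wrapped to (-pi, pi]; unwrapping follows [36]
```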

Fig. 5. The fringe result of the reprojection model and the reconstruction 3D map based on the reprojection fringe images. (a) and (b) initial peaks-shaped 3D surface and the reconstruction result by traditional algorithm respectively. (c) four-step modulated fringe images reprojected on the surface respectively. (d) and (e) the wrapped phase map and the absolute phase map calculated by (c). (f) and (g) the simulated calibration boards and the point-to-point subtraction differences between (a) and (b).

Having demonstrated the reliability of the reprojection model, it is reasonable to use it to assist the network in unsupervised training. The dataset generation and network training can then be carried out.

3. Experiments and results

3.1 Dataset generation and network training

Deep convolutional neural networks provide an excellent ability to learn the regression from fringe code to 3D height value. The main difficulty becomes how to generate enough modulated fringe input data to feed the unsupervised network.

The chief strength and novelty of our reprojection model lies not only in realizing unsupervised deep learning for a full-perspective FPP system but also in providing a robust, high-speed and high-accuracy principle for generating modulated fringe images. We designed a reprojection-based data generator in the MATLAB environment that automatically generates simulated fringe patterns of objects with different height distributions.

The whole data generation process is divided into two steps, as shown in Fig. 6. The first step is to generate a large number of randomly distributed simulated height maps. To improve the generalization ability of the proposed model, we first generated a series of fundamental square matrices whose size and values obey normal random distributions within the ranges 2×2 to 10×10 and 0 to 200, respectively. We then used interpolation to expand each basic matrix into a larger matrix (256×256 in this experiment) as the height map. Because the output of the proposed network is a height map, the unit of the z-axis is set to millimeters. To avoid overfitting and improve the diversity and complexity of the training dataset, nearest-neighbor, bilinear and cubic interpolation were all used. The second step is to simulate the projection process to obtain the modulated input fringe images. The initial standard fringe (grating frequency of 10) was generated and projected onto the height maps in a simulated FPP system in which the distance between the projector and camera ($D$) was 80 mm and the distance between the optical center and the reference plane ($L$) was 800 mm. With the height map in the middle part of Fig. 6, the reprojection model based on Eq. (2) and Eq. (3) transforms the 3D height map into the corresponding modulated fringe image.
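A compact Python sketch of this two-step generator is given below (the authors used a MATLAB implementation); the random-matrix sampling, interpolation orders and geometry constants follow the description above, while details such as the exact random distributions are simplified assumptions.

```python
import numpy as np
from scipy.ndimage import zoom

rng = np.random.default_rng(0)

def random_height_map(size=256):
    # Step 1: a small random base matrix (2x2 to 10x10, values 0-200 mm) ...
    n = int(rng.integers(2, 11))
    base = rng.uniform(0.0, 200.0, size=(n, n))
    # ... expanded to size x size by nearest (0), bilinear (1) or cubic (3) interpolation.
    order = int(rng.choice([0, 1, 3]))
    return zoom(base, size / n, order=order)

def modulated_fringe(height, f0, L=800.0, D=80.0):
    # Step 2: reproject a standard fringe of frequency f0 onto the height map (Eqs. (2)-(3)).
    H, W = height.shape
    x = np.linspace(0, 1, W)[None, :]
    delta_phi = 2 * np.pi * D * f0 * height / (L - height)
    return 0.5 + 0.5 * np.cos(2 * np.pi * f0 * x + delta_phi)

h = random_height_map()
sample = (modulated_fringe(h, f0=10), modulated_fringe(h, f0=29))   # one low/high-frequency pair
```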

Fig. 6. The building process of some random selections in the training dataset.

We generated 30,000 groups of high- and low-frequency input fringe images in total, including 10,000 groups each generated with nearest-neighbor, bilinear and cubic interpolation. The low and high frequencies were set to 10 and 29 respectively, meaning the input fringe images contain 10 and 29 periods of sinusoidal fringe across the pixel range. In this dataset, 90%, 5% and 5% are used as the training, validation and testing sets respectively. To train the network's anti-noise ability, noise of a random type (uniform noise, Gaussian noise, or salt-and-pepper noise) and random level (Gaussian noise with zero mean and variance from 0.0001 to 0.1; salt-and-pepper noise with density from 0.05 to 0.25) was added to the final modulated fringe patterns. For pretraining the network, 30 groups of ground truth height maps (height maps covering each side length of the initial matrix and each interpolation method) were also stored.
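A sketch of this noise augmentation is shown below. The Gaussian-variance and salt-and-pepper-density ranges follow the text; the amplitude range of the uniform noise is not specified in the paper and is an assumption here.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(img):
    """Add one randomly chosen noise type to a fringe image with intensities in [0, 1]."""
    kind = rng.choice(["gaussian", "uniform", "salt_pepper"])
    if kind == "gaussian":
        var = rng.uniform(1e-4, 0.1)                  # variance from 0.0001 to 0.1, zero mean
        noisy = img + rng.normal(0.0, np.sqrt(var), img.shape)
    elif kind == "uniform":
        amp = rng.uniform(0.01, 0.1)                  # assumed amplitude range (not given in the paper)
        noisy = img + rng.uniform(-amp, amp, img.shape)
    else:
        density = rng.uniform(0.05, 0.25)             # salt-and-pepper density from 0.05 to 0.25
        noisy = img.copy()
        mask = rng.random(img.shape) < density
        noisy[mask] = rng.choice([0.0, 1.0], size=int(mask.sum()))
    return np.clip(noisy, 0.0, 1.0)
```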

To train and verify the proposed unsupervised neural network, a PC with a Core i7-6500U CPU (2.50 GHz), 8 GB of RAM and an NVIDIA GeForce GTX 1080 Ti graphics card was used to generate the simulated image data and train the network. The model was trained in the Python environment using PyTorch [37], a widely used machine learning library, to build the whole neural network. We randomly shuffled all the input fringe images and the corresponding ground truth height maps generated sequentially in MATLAB, and adopted dropout and weight decay to prevent overfitting. The network parameters were updated over 40 epochs for pretraining and 500 epochs for unsupervised training, with every 40 fringe images packed as a batch, giving 375,000 iterations of unsupervised training.

The training process adopted the Adam optimizer [38] with the learning rate starting from $10^{-4}$ and halved (up to five times) whenever the loss value stopped decreasing for 10 successive epochs, and the momentum was set to 0.9 to improve computing efficiency. We adopted the RMSE as the loss function to evaluate the accuracy, and after training the final loss value decreased to 0.0021, 0.0063 and 0.0054 on the training, testing and validation image sets respectively. The whole generation of the dataset and optimization of the proposed unsupervised model takes roughly 20 hours in total.
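The optimization loop can be wired up roughly as follows. This is a sketch only: `model`, `train_loader` and `validate` are placeholders for the dual-frequency network, a loader of fringe-pair batches, and a validation-loss routine, while `reproject_fringe` and `dual_frequency_loss` refer to the earlier sketches; PyTorch's `ReduceLROnPlateau` mirrors the halve-on-plateau rule, and the Adam `betas=(0.9, 0.999)` correspond to the reported momentum of 0.9.

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=10)

for epoch in range(500):                                  # unsupervised training epochs
    for fringe_low, fringe_high in train_loader:          # batches of 40 dual-frequency pairs
        height_pred = model(fringe_high, fringe_low)      # forward pass -> height map
        pred_low = reproject_fringe(height_pred, f0=10)   # reprojection model (Sec. 2.2)
        pred_high = reproject_fringe(height_pred, f0=29)
        loss = dual_frequency_loss(pred_low, pred_high, fringe_low, fringe_high)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step(validate(model))                       # validation loss drives the LR halving
```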

In summary, the reprojection-based unsupervised framework (1) pretrains the network with a small set of ground truth data for 50 epochs and (2) feeds a large number of input fringe images without ground truth into the pretrained network for unsupervised learning. When the loss function stops decreasing, the unsupervised training is finished and the trained network can be applied for 3D reconstruction.

3.2 Feasibility and accuracy test

The training process was finished when the root mean square error of the network's results on the validation set stopped decreasing. Once trained, the CNN outputs the height map in pixels from the input fringe patterns. To verify the feasibility of the proposed method, the test image set was used to evaluate the performance of the trained network, and two different error analysis methods were used to comprehensively evaluate the accuracy of its predictions.

Figure 7 illustrates the similarity between the ground truth and the predicted height map by selecting two random horizontal and vertical pixel lines and comparing the height distributions along them. As shown in the right part of Fig. 7, the solid and dotted lines represent the horizontal and vertical indicator lines respectively, and the red and yellow curves refer to the ground truth and predicted values, respectively. The comparison results are presented in the lower part of Fig. 7.

Fig. 7. The error analysis of one random example in testing image set. The comparison of the chosen line is represented at the lower part. The red and blue lines indicate the height value of ground truth and the output of the trained CNN respectively.

To evaluate the accuracy of the prediction results, we use the total RMSE (henceforth referred to as RMSE) between the prediction results and the ground truth. The RMSE of the height values along the chosen lines of the predicted results is 0.1424 mm (horizontal direction) and 0.1574 mm (vertical direction), which supports the validity of the method and the accuracy of the predicted results.

To evaluate the overall accuracy of the predicted height maps over the entire pixel range, the height values at all pixel coordinates were used to analyze the difference between the ground truth and the predicted height map. As illustrated in Fig. 8, fringe patterns randomly chosen from the test image set were sent into the trained CNN, and the predicted results and corresponding accuracy analysis (including the ground truth heat maps, predicted height heat maps and error heat maps) are shown on the right. There is almost no visible difference between the height maps predicted by the network and the ground truth. The error maps, which give the difference at each data point, show that the error is evenly distributed over the whole pixel range. The RMSE is within 0.3245 mm and the maximum error over all test fringe image sets is less than 3.6089 mm. In some cases the accuracy on the testing fringe image set is even higher than the results predicted by the supervised CNN proposed in [22].

Fig. 8. The error analysis based on the entire pixel range of some randomly drawn examples in testing image set. The first column shows the input fringes and the second to fourth columns show the corresponding ground truth, the predicted height maps and the point-to-point error maps respectively. The X and Y axis coordinates represent the pixel range of 256×256, and the Z axis direction represents the height value (millimeter).

It is also necessary to compare the accuracy of the network predictions between the supervised method and our proposed unsupervised method. We built a supervised neural network according to [22], a classic application of deep learning to FPP, and trained it with the same dataset and the same number of iterations. We also generated several special surfaces, shown in the first column of Fig. 9, to test the accuracy of the two trained networks; the maximum point-to-point error and total RMSE of each case are listed in Table 1. As illustrated in Fig. 9 and Table 1, the reconstruction accuracy of the two methods is almost the same for the different surfaces (the total RMSE of the supervised predictions is 0.4839 mm, 0.371 mm, 0.1023 mm and 0.3192 mm respectively, versus 0.4454 mm, 0.2308 mm, 0.3569 mm and 0.4734 mm for our proposed method). The error maps and Table 1 also show that the maximum point-to-point error of the proposed unsupervised CNN is even lower than that of the supervised one in some cases (the maximum errors in the first and third columns are 3.4259 mm and 4.913 mm for the supervised method, versus 2.274 mm and 4.3954 mm for the proposed method). These comparisons demonstrate that our proposed method obtains accuracy competitive with the supervised method.

Fig. 9. The comparison of prediction accuracy between supervised network and proposed network. The first row shows the input fringes and the second shows the corresponding ground truth. The point-to-point error maps of the proposed and supervised method are shown at the third and fourth rows respectively. The X and Y axis coordinates represent the pixel range of 256×256, and the Z axis direction represents the height value (millimeter).

Table 1. Comparison of the reconstruction accuracy by using supervised CNN and our proposed method in simulation environment.

The above analysis demonstrates that the trained CNN is robust in 3D reconstruction of randomly generated fringe patterns and obtains high-accuracy results.

3.3 Generalization capability test

Since the datasets used for training are all randomly generated surfaces, it is necessary to test whether the trained neural network can reconstruct fringe images modulated by common objects and industrial parts, so that the proposed unsupervised CNN is competent for routine industrial measurement tasks. To demonstrate the generalization capability of our proposed method, several common objects and industrial parts were built for measurement. We generated several special surfaces, shown in the second column of Figs. 10–12. The system was set up according to Fig. 4, with the distance between the projector and the camera ($D$) set to 80 mm and the distance between the reference plane and the camera ($L$) set to 800 mm. The procedure in Fig. 6 was then carried out to generate the fringe images in the first column.

Fig. 10. The reconstruction results of some curved objects.

Fig. 11. The reconstruction results of objects with planes and holes.

Fig. 12. The modeling process and reconstruction result of the human face. (a) the dual-frequency input fringe images of the human face. (b) the height map of the human face transformed from the point cloud. (c) the reconstruction result. (d) the error map of the result.

We first evaluate the reconstruction accuracy of the trained network for objects with waveforms and arc gradients. As shown in Fig. 10, three curved surfaces of different complexity were established. The first row of Fig. 10 illustrates the reconstruction result of a yurt-shaped conical surface: the RMSE is 0.130 mm and, except for the top and the edge of the cone, the error is kept within ±0.05 mm; the worst case in the error map is 3.1562 mm. The predicted height map shows little difference from the ground truth. Furthermore, a wavy surface with three bumps, shown in the middle row of Fig. 10, was constructed to test the generalization ability. The proposed method reconstructs a high-accuracy 3D surface from the modulated fringe pattern with a maximum error below 1.7129 mm and an RMSE of 0.1552 mm. Although the predicted height map and the ground truth are almost the same, the error map shows that larger errors concentrate at the bulges, corresponding to the shape of the ground-truth surface. Finally, we complicated the second surface by replacing the flat wave with a sinusoidal wave. As seen in the third row of Fig. 10, although the error is relatively large in the bottom area, the complex wavy surface is reconstructed with high accuracy (RMSE = 0.279 mm, maximum error < 2.0452 mm).

It is also necessary to evaluate the reconstruction of industrial parts, which usually have stepped surfaces with larger gradients and holes, to test whether our proposed method is competent for industrial measurement. We built three models with planes and holes for this experiment. The initial 3D models were established in Creo, and point clouds were extracted from their surfaces to form the ground truth height maps. A heart-shaped three-dimensional model was built first, as shown in Fig. 11. The error map shows that the overall model is reconstructed well but the edge of the heart model has larger errors; the RMSE is 0.332 mm and the highest error is 5.3359 mm. A positioning fixture with an inclined plane and two positioning holes was also built, as shown in the middle part of Fig. 11; the RMSE is 0.384 mm and the highest error is 6.6953 mm. The error map shows that the trained network performs well in reconstructing the planes except for faults at the edges and the holes. Finally, we established a triangular part with an inclined plane and two holes of different shapes, shown at the bottom of Fig. 11. We designed this model because features such as triangular and circular holes, inclined planes and vertical steps are common in industrial parts, making it representative for testing the reconstruction of such parts. The error is mainly concentrated on the inner walls of the holes; nevertheless, the point-to-point error is still less than 6.1514 mm and the RMSE is 0.709 mm. These experiments demonstrate the reconstruction capability of the proposed method for industrial parts.

Finally, it is also interesting to test our proposed network on objects from daily life. We applied a lidar 3D scanner to extract the height map of a human face as the ground truth, shown in Fig. 12(b). The corresponding fringe patterns and the reconstruction result are shown in Fig. 12(a) and Fig. 12(c) respectively. The error map in Fig. 12(d) shows that the proposed framework reconstructs this sophisticated surface with high accuracy except for some random jump errors at the tip of the nose and the corners of the mouth. The RMSE is 0.2368 mm and the maximum error over the pixel range is 3.3954 mm.

The comparisons above demonstrate that the CNN trained by our proposed method has good generalization capability, although the reconstruction performance deteriorates as the complexity of the measured model increases.

3.4 Anti-noise capability test

Considering the instability of ambient light in real measurement environments, it is necessary to investigate the noise tolerance of the trained CNN. Gaussian noise of different levels was added to some randomly generated fringe images, shown in the left part of Fig. 13 (with the ground truth, predicted results and error maps on the right). The trained CNN is hardly affected by low-level noise (the point-to-point error remains within 2.617 mm and the RMSE is 0.4995 mm for a noise variance of $\sigma^2 = 0.001$). The accuracy of the predicted height maps does decrease under high-level noise (maximum error < 6.5337 mm and RMSE = 1.0523 mm for $\sigma^2 = 0.01$; maximum error < 7.4744 mm and RMSE = 1.6471 mm for $\sigma^2 = 0.1$), but the feature information of the 3D height map is still recovered compared with the ground truth on the whole.

Fig. 13. The error analysis with Gaussian noise of different levels, the mean value of the noise is 0, and the variance is from 0.0001 to 0.1. The first column shows the input fringes and the second to fourth columns show the corresponding ground truth, the predicted height maps and the point-to-point error maps respectively. The X and Y axis coordinates represent the pixel range of 256×256, and the Z axis direction represents the height value (millimeter).

To further demonstrate the anti-noise ability of the trained network, we compared the noise tolerance of the trained CNN with that of the traditional phase-shifting algorithm. Figure 14 illustrates the height maps generated by the phase-shifting algorithm and the proposed CNN from fringe images with Gaussian noise of $\sigma^2 = 0.1$. The trained CNN performs markedly better than the traditional method under high-level noise: there is almost no consistency between the height map reconstructed by the phase-shifting algorithm and the ground truth, showing that the traditional phase-shifting method is not robust to high-level noise, whereas the proposed model still reconstructs the height map with high accuracy. Furthermore, compared with the traditional pipeline, the proposed model replaces the phase-shifting calculation, phase unwrapping and 3D reconstruction steps and directly obtains the height map from only two fringe images in one step, which is efficient. The comparison results in Fig. 14 demonstrate that our proposed method provides noise robustness and performs much better than traditional phase shifting with far fewer fringe patterns and far less manual data processing.

Fig. 14. The comparison of anti-noise capability between phase-shifting algorithm and proposed method. The upper part shows the reconstruction process of the traditional phase-shifting method and the lower part shows the reconstruction process of the proposed method and the ground truth.

4. Discussion

A large set of input data and the corresponding accurate ground truth are necessary for deep learning. Nevertheless, preparing high-precision ground truth 3D height maps is a challenge when deep learning is used in structured-light measurement in practice. The ground truth for fringe patterns is now mainly obtained by the phase-shifting algorithm [39] with at least three fringe images per object (usually 4 to 12 images, because single-frequency three-step phase shifting has low accuracy). This approach has two main disadvantages that prevent fast acquisition of high-precision ground truth height maps. First, the ground truth calculation is based mainly on the phase-shifting method, which still introduces errors into the 3D height map, including gamma nonlinear error and random error caused by environmental noise; a CNN trained without high-accuracy ground truth will obviously perform poorly. Second, since the network input is only two fringe images, capturing the extra fringe images and performing the phase-shifting calculation is time-consuming, and storing the ground truth requires a large amount of computer memory. Although sophisticated GPUs or cloud computing services (such as Amazon Web Services, Google Cloud, and Microsoft Azure) have been proposed [22] to overcome the lack of computer memory, the root of the problem (too much 3D data) remains unsolved.

Therefore, applying an unsupervised CNN model can help solve the aforementioned problems. On the one hand, because it is based on reprojection, the unsupervised method directly calculates the loss between two fringe images, which avoids the algorithmic error introduced by the phase-shifting algorithm and theoretically improves the accuracy of the supervision signal. On the other hand, with our proposed method, the storage and calculation of the ground truth height maps and the capture and storage of the extra fringe images are omitted. To demonstrate the effectiveness of the proposed method, we used simulation to reproduce and compare the dataset and ground truth generation processes of the supervised method and our proposed unsupervised method. Two different dataset generators were established to generate 30,000 groups of data each, and the two datasets were fed into the supervised network and our proposed model respectively. The time consumed and the computer memory used by the two methods in data generation and network training (a total of 375,000 iterations) are listed in Table 2. Although our method consumed about half an hour more than the supervised model in network training, it occupies about one-tenth of the memory space (1.27 GB versus 13.89 GB for the supervised method) and about one-sixth of the time for data generation and phase-shifting calculation (3.2 hours versus 19.1 hours).

Table 2. Comparison of the time consumed and the computer memory occupied by using supervised CNN and our proposed method in simulation environment.

It should be noted that the supervised CNN would cost even more time relative to our proposed method in real conditions than in the simulation environment, because the capture of the extra fringe images, which is time-consuming in reality, was replaced by simulation here. Furthermore, the required memory would be much larger, because real FPP images have more pixels and training a deeper network needs a larger dataset as well as more ground truth 3D height maps. Implementing the proposed unsupervised CNN model in a real FPP system would therefore be meaningful for improving both measurement accuracy and network training efficiency.

In this paper, we propose an unsupervised framework to tackle the laborious ground truth labeling required for dataset generation, and the experiments have shown the feasibility and accuracy of our method compared with supervised methods in a simulation environment. However, the implementation of unsupervised deep learning in a real-world FPP system still needs to be discussed. The main difficulty is that the situation is much more complex when the output height map (or other three-dimensional representations such as point cloud, voxel or mesh) must be strictly connected to the input image: the transformation from the 3D height map to the 2D fringe image must be consistent with the real-world FPP system to ensure the accuracy of unsupervised network training. Rendering techniques, which have been successfully applied in unsupervised deep learning for 3D reconstruction [32,40], can be a solution. The rendering procedure includes rendering the surface texture onto the object from a specific angle and projecting the 3D model onto the image plane from the camera position [33,39], which is similar to fringe projection and capture in an FPP system. The extrinsic and intrinsic matrices of the projector and the camera also need to be identified, which is a mature technique [40–43]. With these techniques, unsupervised deep learning for FPP can be carried out in a real-world system.

In general, this paper proposes a reprojection model and designs a loss function to realize unsupervised network training in an FPP system. The simulation experiments have demonstrated the feasibility of the unsupervised network for fringe analysis. Although the proposed model needs further optimization for application in real-world systems, this paper provides ideas and a basis for researchers and engineers. In future work, we plan to apply and optimize the rendering technique mentioned above to evaluate the unsupervised framework in real-world FPP systems.

5. Conclusion

In this paper, a reprojection-based unsupervised CNN model is designed to transform dual-frequency modulated fringe images into the corresponding height maps, and a simulated environment is established to demonstrate the proposed method. Reconstruction results with a point-to-point error below 3.6089 mm are obtained on the randomly generated test image set, an accuracy competitive with supervised methods. Furthermore, the results for fringe images modulated by 3D models of common objects and industrial parts perform equally well, which indicates that our network can be applied in industry. We also compare the proposed CNN with the traditional algorithm on fringe images with different noise levels to verify the network's anti-noise capability. In addition, a comparison of the data generation for supervised and unsupervised CNN training shows that our training model avoids the generation and storage of a large number of 3D ground truth data and reduces the number of fringe images that need to be captured if the system is applied in reality, making training and dataset generation more effective.

Funding

National Key Research and Development Program of China (2020YFB2008200).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper may be obtained from the authors upon reasonable request.

References

1. F. Chen, G. M. Brown, and M. Song, “Overview of 3-D shape measurement using optical methods,” Opt. Eng. 39(1), 10–22 (2000). [CrossRef]  

2. Z. Wang, D. A. Nguyen, and J. C. Barnes, “Some practical considerations in fringe projection profilometry,” Optics and Lasers in Engineering 48(2), 218–225 (2010). [CrossRef]  

3. T. Bell, B. Li, and S. Zhang, “Structured light techniques and applications,” Wiley Encyclopedia of Electrical and Electronics Engineering W8298, 1–24 (1999). [CrossRef]  

4. J. Geng, “Structured-light 3D surface imaging: a tutorial,” Adv. Opt. Photonics 3(2), 128–160 (2011). [CrossRef]  

5. D. Zheng, F. Da, Q. Kemao, and H. S. Seah, “Phase-shifting profilometry combined with Gray-code patterns projection: unwrapping error removal by an adaptive median filter,” Opt. Express 25(5), 4700–4713 (2017). [CrossRef]  

6. X. Su and W. Chen, “Fourier transform profilometry: a review,” Optics and Lasers in Engineering 35(5), 263–284 (2001). [CrossRef]  

7. X. Su and Q. Zhang, “Dynamic 3-D shape measurement method: a review,” Optics and Lasers in Engineering 48(2), 191–204 (2010). [CrossRef]  

8. Y. An, J.-S. Hyun, and S. Zhang, “Pixel-wise absolute phase unwrapping using geometric constraints of structured light system,” Opt. Express 24(16), 18445–18459 (2016). [CrossRef]  

9. D. C. Ghiglia and M. D. Pritt, “Two-dimensional phase unwrapping: theory, algorithms, and software,” Wiley, New York (1998).

10. D. C. Ghiglia and L. A. Romero, “Robust two-dimensional weighted and unweighted phase unwrapping that uses fast transforms and iterative methods,” J. Opt. Soc. Am. A 11(1), 107–117 (1994). [CrossRef]  

11. J. M. Huntley and H. O. Saldner, “Shape measurement by temporal phase unwrapping: comparison of unwrapping algorithms,” Meas. Sci. Technol. 8(9), 986–992 (1997). [CrossRef]  

12. H. O. Saldner and J. M. Huntley, “Temporal phase unwrapping: application to surface profiling of discontinuous objects,” Appl. Opt. 36(13), 2770–2775 (1997). [CrossRef]  

13. H. O. Saldner and J. M. Huntley, “Profilometry using temporal phase unwrapping and a spatial light modulator based fringe projector,” Opt. Eng. 36(2), 610–615 (1997). [CrossRef]  

14. P. Lu, C. Sun, B. Liu, and P. Wang, “Accurate and robust calibration method based on pattern geometric constraints for fringe projection profilometry,” Appl. Opt. 56(4), 784–794 (2017). [CrossRef]  

15. C. Zuo, S. Feng, L. Huang, T. Tao, W. Yin, and Q. Chen, “Phase shifting algorithms for fringe projection profilometry: A review,” Optics and Lasers in Engineering 109, 23–59 (2018). [CrossRef]  

16. S. Zhang and P. S. Huang, “Phase error compensation for a 3-D shape measurement system based on the phase-shifting method,” Opt. Eng. 46(6), 063601 (2007). [CrossRef]  

17. Z. Cai, X. Liu, H. Jiang, D. He, X. Peng, S. Huang, and Z. Zhang, “Flexible phase error compensation based on Hilbert transform in phase shifting profilometry,” Opt. Express 23(19), 25171–25181 (2015). [CrossRef]  

18. S. Feng, Q. Chen, G. Gu, T. Tao, L. Zhang, Y. Hu, W. Yin, and C. Zuo, “Fringe pattern analysis using deep learning,” Adv. Photonics 1(02), 1 (2019). [CrossRef]  

19. G. Spoorthi, S. Gorthi, and R. K. S. S. Gorthi, “PhaseNet: A deep convolutional neural network for two-dimensional phase unwrapping,” IEEE Signal Processing Letters 26(1), 54–58 (2019). [CrossRef]  

20. H. Yu, X. Chen, Z. Zhang, C. Zuo, Y. Zhang, D. Zheng, and J. Han, “Dynamic 3-D measurement based on fringe-to-fringe transformation using deep learning,” Opt. Express 28(7), 9405–9418 (2020). [CrossRef]  

21. J. Qian, S. Feng, Y. Li, T. Tao, J. Han, Q. Chen, and C. Zuo, “Single-shot absolute 3D shape measurement with deep-learning-based color fringe projection profilometry,” Opt. Lett. 45(7), 1842–1845 (2020). [CrossRef]  

22. S. Van der Jeught and J. J. Dirckx, “Deep neural networks for single shot structured light profilometry,” Opt. Express 27(12), 17091–17101 (2019). [CrossRef]  

23. H. Yu, D. Zheng, J. Fu, Y. Zhang, C. Zuo, and J. Han, “Deep learning-based fringe modulation-enhancing method for accurate fringe projection profilometry,” Opt. Express 28(15), 21692–21703 (2020). [CrossRef]  

24. S. Lv, Q. Sun, Y. Zhang, Y. Jiang, J. Yang, J. Liu, and J. Wang, “Projector distortion correction in 3D shape measurement using a structured-light system by deep neural networks,” Opt. Lett. 45(1), 204–207 (2020). [CrossRef]  

25. K. Yan, Y. Yu, C. Huang, L. Sui, K. Qian, and A. Asundi, “Fringe pattern denoising based on deep learning,” Opt. Commun. 437, 148–152 (2019). [CrossRef]  

26. K. Yan, Y. Yu, T. Sun, A. Asundi, and Q. Kemao, “Wrapped phase denoising using convolutional neural networks,” Optics and Lasers in Engineering 128, 105999 (2020). [CrossRef]  

27. Y. Zheng, S. Wang, Q. Li, and B. Li, “Fringe projection profilometry by conducting deep learning from its digital twin,” Opt. Express 28(24), 36568–36583 (2020). [CrossRef]  

28. J. Karhunen, T. Raiko, and K. Cho, “Unsupervised deep learning: A short review,” Advances in Independent Component Analysis and Learning Machines 18, 125–142 (2015). [CrossRef]  

29. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE 86(11), 2278–2324 (1998). [CrossRef]  

30. M. Dai, F. Yang, C. Liu, and X. He, “A dual-frequency fringe projection three-dimensional shape measurement system using a DLP 3D projector,” Opt. Commun. 382, 294–301 (2017). [CrossRef]  

31. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention (Springer, 2015), pp. 234–241.

32. K. Genova, F. Cole, A. Maschinot, A. Sarna, D. Vlasic, and W. T. Freeman, “Unsupervised training for 3d morphable model regression,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), pp. 8377–8386.

33. S. Wu, C. Rupprecht, and A. Vedaldi, “Unsupervised learning of probably symmetric deformable 3d objects from images in the wild,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2020), pp. 1–10.

34. C. Rocchini, P. Cignoni, C. Montani, P. Pingi, and R. Scopigno, “A low cost 3D scanner based on structured light,” in Computer Graphics Forum 20(3) (Wiley Online Library, 2001), pp. 299–308. [CrossRef]  

35. T. Chai and R. R. Draxler, “Root mean square error (RMSE) or mean absolute error (MAE)?–Arguments against avoiding RMSE in the literature,” Geosci. Model Dev. 7(3), 1247–1250 (2014). [CrossRef]  

36. C. Prati, M. Giani, and N. Leuratti, “SAR Interferometry: A 2-D phase unwrapping technique based on phase and absolute values informations,” in 10th Annual International Symposium on Geoscience and Remote Sensing (IEEE, 1990), pp. 2043–2046.

37. A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, and L. Antiga, “Pytorch: An imperative style, high-performance deep learning library,” arXiv preprint arXiv:1912.01703 (2019).

38. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980 (2014).

39. W. Yin, Q. Chen, S. Feng, T. Tao, L. Huang, M. Trusiak, A. Asundi, and C. Zuo, “Temporal phase unwrapping using deep learning,” Sci. Rep. 9, 1–12 (2019). [CrossRef]  

40. S. Zhang and P. S. Huang, “Novel method for structured light system calibration,” Opt. Eng. 45(8), 083601 (2006). [CrossRef]  

41. Y. Yin, X. Peng, A. Li, X. Liu, and B. Z. Gao, “Calibration of fringe projection profilometry with bundle adjustment strategy,” Opt. Lett. 37(4), 542–544 (2012). [CrossRef]  

42. W. Gao, L. Wang, and Z. Hu, “Flexible method for structured light system calibration,” Opt. Eng. 47(8), 083602 (2008). [CrossRef]  

43. E. Wong, S. Heist, C. Bräuer-Burchardt, H. Babovsky, and R. Kowarschik, “Calibration of an array projector used for high-speed three-dimensional shape measurements using a single camera,” Appl. Opt. 57(26), 7570–7578 (2018). [CrossRef]  

Data availability

Data underlying the results presented in this paper may be obtained from the authors upon reasonable request.

Figures (14)

Fig. 1. The flowchart of phase-shifting calculation.
Fig. 2. The unsupervised network architecture.
Fig. 3. The detailed architecture of the Sub-Net. Each cube represents a three-dimensional feature map; the size of the feature map and the number of channels are marked around each cube. The input and output data are the fringe image and the height map, respectively.
Fig. 4. The schematic diagram of the FPP system.
Fig. 5. The fringe result of the reprojection model and the 3D reconstruction based on the reprojected fringe images. (a) and (b) The initial peaks-shaped 3D surface and the reconstruction result obtained by the traditional algorithm, respectively. (c) The four-step modulated fringe images reprojected onto the surface. (d) and (e) The wrapped phase map and the absolute phase map calculated from (c). (f) and (g) The simulated calibration boards and the point-to-point differences between (a) and (b).
Fig. 6. The building process of randomly selected samples in the training dataset.
Fig. 7. The error analysis of one random example in the testing image set. The comparison along the chosen line is shown in the lower part; the red and blue lines indicate the ground-truth height and the output of the trained CNN, respectively.
Fig. 8. The error analysis over the entire pixel range for several randomly drawn examples from the testing image set. The first column shows the input fringes, and the second to fourth columns show the corresponding ground truth, the predicted height maps and the point-to-point error maps, respectively. The X and Y coordinates span the 256×256 pixel range, and the Z axis represents the height value (in millimeters).
Fig. 9. The comparison of prediction accuracy between the supervised network and the proposed network. The first row shows the input fringes and the second row the corresponding ground truth. The point-to-point error maps of the proposed and supervised methods are shown in the third and fourth rows, respectively. The X and Y coordinates span the 256×256 pixel range, and the Z axis represents the height value (in millimeters).
Fig. 10. The reconstruction results of some curved objects.
Fig. 11. The reconstruction results of objects with planes and holes.
Fig. 12. The modeling process and reconstruction result of the human face. (a) The dual-frequency input fringe images of the human face. (b) The height map of the human face transformed from the point cloud. (c) The reconstruction result. (d) The error map of the result.
Fig. 13. The error analysis with Gaussian noise of different levels; the mean of the noise is 0 and the variance ranges from 0.0001 to 0.1. The first column shows the input fringes, and the second to fourth columns show the corresponding ground truth, the predicted height maps and the point-to-point error maps, respectively. The X and Y coordinates span the 256×256 pixel range, and the Z axis represents the height value (in millimeters).
Fig. 14. The comparison of anti-noise capability between the phase-shifting algorithm and the proposed method. The upper part shows the reconstruction process of the traditional phase-shifting method, and the lower part shows the reconstruction process of the proposed method together with the ground truth.

Tables (2)

Table 1. Comparison of the reconstruction accuracy of the supervised CNN and our proposed method in the simulation environment.
Table 2. Comparison of the time consumed and the computer memory occupied by the supervised CNN and our proposed method in the simulation environment.

Equations (6)

$$\frac{h(x,y)}{L-h(x,y)}=\frac{\overline{AB}}{D}.$$
$$\frac{1}{h(x,y)}=\frac{1}{L}+\frac{2\pi D f_0}{L\,\Delta\varphi(x,y)}.$$
$$I(x,y)=a(x,y)+b(x,y)\cos\left[2\pi f_0 x+\Delta\varphi(x,y)\right].$$
$$Loss=\omega_{low}\,Loss_1+\omega_{high}\,Loss_2.$$
$$Loss_i=\frac{1}{n}\sum_{x=1}^{H}\sum_{y=1}^{W}\left(I_i(x,y)-\hat{I}_i(x,y)\right)^2,\quad i=1,2.$$
$$\Delta\varphi(x,y)=\tan^{-1}\!\left(\frac{\sum_{i} I_i\sin\delta(i)}{\sum_{i} I_i\cos\delta(i)}\right).$$
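To make the role of these equations concrete, the following is a minimal PyTorch sketch of how the reprojection model (third equation) and the dual-frequency loss (fourth and fifth equations) could be wired together. It is an illustrative sketch under assumed values: the function names, the system constants L and D, the fringe background/contrast terms a and b, and the pixel-coordinate normalization are placeholders, not the authors' implementation; only the equation structure follows the text.

```python
# Minimal sketch of the fringe reprojection loss, assuming PyTorch (Ref. 37).
# All names, shapes and constants below are illustrative assumptions.
import math
import torch


def reproject_fringe(height, f0, L, D, a=0.5, b=0.5):
    """Convert a predicted height map (B,1,H,W) into a simulated fringe image.

    From 1/h = 1/L + 2*pi*D*f0 / (L*dphi) it follows that
    dphi = 2*pi*D*f0*h / (L - h); the fringe is then
    I = a + b*cos(2*pi*f0*x + dphi).
    """
    B, _, H, W = height.shape
    # pixel coordinate along the fringe direction, normalized to [0, 1)
    x = torch.arange(W, dtype=height.dtype, device=height.device).view(1, 1, 1, W) / W
    dphi = 2 * math.pi * D * f0 * height / (L - height)    # phase modulation caused by height
    return a + b * torch.cos(2 * math.pi * f0 * x + dphi)  # reprojected fringe image


def dual_frequency_loss(height, fringe_low, fringe_high, f_low, f_high,
                        L=500.0, D=100.0, w_low=0.5, w_high=0.5):
    """Weighted per-pixel MSE between the captured fringes and the fringes
    reprojected from the predicted height map, at the low and high frequencies."""
    loss_low = torch.mean((reproject_fringe(height, f_low, L, D) - fringe_low) ** 2)
    loss_high = torch.mean((reproject_fringe(height, f_high, L, D) - fringe_high) ** 2)
    return w_low * loss_low + w_high * loss_high


if __name__ == "__main__":
    # toy usage with random tensors standing in for the CNN output and the captured inputs
    h = torch.rand(1, 1, 256, 256) * 10.0   # predicted height map (mm)
    I_low = torch.rand(1, 1, 256, 256)      # captured low-frequency fringe
    I_high = torch.rand(1, 1, 256, 256)     # captured high-frequency fringe
    print(dual_frequency_loss(h, I_low, I_high, f_low=1.0, f_high=16.0).item())
```

Because the loss compares the network output only against the two captured fringe images, no ground-truth height map enters the computation, which is what makes the training unsupervised.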