
3-D face registration solution with speckle encoding based spatial-temporal logical correlation algorithm

Open Access

Abstract

3-D information acquisition (registration) of the whole face plays a significant role in 3-D human face recognition applications. In this paper, we develop a prototype 3-D system consisting of two binocular measurement units that allows full 3-D reconstruction by exploiting the advantages of a novel correlation algorithm. In this system, we use optical modulation to produce temporally and spatially varying high-density binary speckle patterns to encode the tested face, and then propose a spatial-temporal logical correlation (STLC) stereo matching algorithm to rapidly determine the accurate disparity with a coarse-to-fine strategy. Finally, the 3-D information of the whole face from the left to the right ear (~180°) can be obtained by fusing the data from the two measurement units. Comparative studies are performed on a plastic model and a real human face by simulating real application situations. The results verify the feasibility and good performance of our computational framework and experimental configuration in terms of accuracy and time cost, showing a good application prospect for our future 3-D human face recognition research.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

Introduction

Structured light 3-D measurement techniques [1–4] have been widely applied in numerous fields, such as industrial quality inspection and control, 3-D visualization of cultural relics and historic sites, and 3-D diagnosis of the face, body and other parts in the medical industry [5–8], because they are flexible and versatile in terms of full-field, non-contact operation, high accuracy and reliability.

As a classic geometric architecture for 3-D systems, the binocular stereo vision configuration is commonly adopted for its easy system calibration. Accurate correspondence between the two views is crucial and also challenging, because the final 3-D reconstruction accuracy is directly determined by the quality of the correspondences. In order to obtain high-density pointcloud data, it is generally not advisable to rely on natural texture variation to determine the correspondence; otherwise the accuracy is low and varies from one object to another, with many outliers if an object does not present rich surface texture. For this purpose, numerous geometrically structured patterns have been designed [1] to optically encode the tested surface. However, different encoding and decoding strategies differ in measurement performance and application range. Among them, the methodology based on optical or digital random binary/gray speckle patterns [7–13] is a classic one because such patterns carry rich information and offer non-periodic, isotropic texture that enriches the surface texture, which is beneficial for determining the correspondence.

A speckle-pattern generation approach based on a diffractive optical element (DOE) and laser illumination has been commercially successful. In comparison with other projection devices like DLP or LCD, the laser provides an overall compact system and higher brightness, which has led to the popularity of consumer-level real-time 3-D imaging technologies. Such a technique is widely applied in fields like games, living-room entertainment and human-computer interaction markets, where the accuracy and completeness of the recovered 3-D shape are not especially important and more emphasis is placed on motion robustness. PrimeSense’s depth sensor adopts a single random binary pattern including about ~30,000 dots [14]. Microsoft Kinect Version 1.0 [15], Apple iPhone X 3-D face ID [16] and the Intel RealSense 3-D sensor [17] are several typical examples adopting a similar technique. However, the number and size of the projected speckle dots in the measurement volume or on the tested surface are limited by the manufacturing principle, leading to a low spatial resolution. Besides, the secondary coherence of the laser caused by surface details also leads to differences between the stereo image pairs, which in turn influences the reliability of stereo matching. Especially for 3-D face measurement, the image contrast on the face under laser illumination is lower than with other non-laser light sources [7]; using a high-power laser can solve this issue but carries a potential risk of injuring human eyes. For precise measurement, for example, the 3dMD corporation [8] adopted high-power infrared LEDs rather than a laser to project a fixed optical random binary pattern, so that high-speed medical 3-D face measurement can be achieved with a ~2 ms exposure.

The image correlation, i.e., the establishment of the pixel-to-pixel correspondence between stereo image pair(s) encoded with a speckle pattern, can be calculated using similarity functions. Unfortunately, this process is considerably time-consuming because of the large amount of calculation required to search for homologous point pairs based on the unique gray-level feature information within a given matching window, which limits its applications in many cases. Some classic algorithms and their variants have been presented. For example, Guo et al. [9] adopted zero-mean normalized cross-correlation (ZNCC) to implement the stereo matching in 3-D measurement of the whole body. Jiang et al. [18] utilized the sum of absolute differences (SAD) to balance computation time against the quality of the output disparity map. A comprehensive overview of correlation criteria, including ZNCC, zero-mean normalized sum of squared difference (ZNSSD) and parametric sum of squared difference (PSSD), as well as their equivalences and performances, is presented in [19]. In [20], the authors offer the basic concepts, theory and applications of image correlation applied to 3-D shape measurement. Generally, these algorithms can yield good matching results, but at the expense of a high calculation cost, especially ZNCC.

For 3-D face reconstruction, Wiegmann et al. [13] projected multiple band-limited statistical patterns and utilized a temporal correlation technique (TCT) with the gray information of a 1 pixel × 1 pixel matching window along the stereo image pair sequence to implement the stereo matching. The authors experimentally studied the relationship between their approach and sinusoidal fringe projection (SFP) with different numbers of band-limited statistical patterns [21], concluding that at least six image pairs are required to achieve an acceptable accuracy, but comparisons of calculation cost, accuracy and reliability with ZNCC [22] are not provided. A disadvantage of this technique that should not be neglected is that the number of stereo image pairs used can reach 15 or more, which requires the tested object to remain static for a long time if a human face is measured, leading to low measurement efficiency, including but not limited to a large burden of image transmission and storage.

To reduce the computation cost of correspondence searching, several works have been reported. For example, Liu et al. [23] presented a stereo matching method that projects one additional pattern to assist the correspondence searching. However, it may lower measurement efficiency and makes no contribution to the improvement of accuracy. Based on Kinect V1.0, Wang et al. [24] adopted a logical comparison strategy to obtain an initial disparity and then used a prediction-based growing matching strategy to recalculate the disparity map.

Schaffer et al. [25,26] extended the idea of [13] to 3-D sensing of moving objects. They designed an acousto-optical deflector [27] to create continually scanning laser speckle patterns while two synchronized high-speed cameras capture stereo images, under the assumption that the moving object is nearly static. However, this assumption does not always hold, so they had to remedy motion-induced artifacts [28].

This work presents a 3-D information acquisition solution for reliable and accurate 3-D whole-face registration, which plays an increasingly important role in the development of deep-learning-based 3-D face recognition algorithms. A prototype of the 3-D system, comprising two binocular measurement units, is shown. In this system, we design an optimum speckle mask illuminated by an infrared LED, then employ optical modulation to produce temporally and spatially varying, high-density binary speckle patterns to encode the tested face. We propose a highly efficient stereo matching algorithm based on a coarse-to-fine strategy to estimate the accurate disparity. The acquisition of 3-D point clouds from 2-D image pairs can be finished within 0.0667 s thanks to the STLC stereo matching algorithm. A series of contrast experiments simulating real application situations has shown that the proposed approaches are remarkably advantageous over the widely used ZNCC stereo matching algorithm considering the balance between accuracy and calculation cost.

2. Principle

This section introduces the 3-D information acquisition solution for the whole face from the left to the right ear. First, we describe the geometric configuration of the 3-D system and how to produce spatially and temporally varying, high-density binary structured patterns. Then, on the basis of the built 3-D system, we propose the STLC-based 3-D reconstruction framework.

2.1 3-D information acquisition system of whole face

Figure 1(a) is the schematic diagram of the 3-D system, mainly including two identical binocular stereo measurement units, a computer and a control module. Units A and B are respectively responsible for the 3-D information acquisition of the left and right sides of the human face. The resultant 3-D whole-face pointcloud is obtained by fusing the data from the two viewing angles.


Fig. 1 3-D information acquisition system of whole face: (a) schematic diagram of the 3-D system; (b) designed high-density speckle pattern; (c) schematic diagram of the projector (within the red box); the two cameras are included.


The random binary pattern given in Fig. 1(b) is optimally designed by our proposed method [12] according to the system parameters and then etched on a chrome-plated glass. The total number of black and white dots $N_{dot}$ in the measurement volume can be approximately calculated by the following expression:

$$N_{dot}=\frac{\pi D^{2}}{4(\Delta x)^{2}},\tag{1}$$
where $D$ and $\Delta x$ are the diameter and feature size of the chrome-plated glass, respectively.
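As a quick numerical check (our own illustrative script, not part of the original measurement pipeline), Eq. (1) evaluated with the prototype parameters from Sec. 3 (D = 12 mm, Δx = 9.6 μm) reproduces the dot count quoted there:

```python
import math

# Evaluate Eq. (1): N_dot = pi * D^2 / (4 * dx^2)
D = 12e-3    # diameter of the chrome-plated glass (m)
dx = 9.6e-6  # feature size (m)

n_dot = math.pi * D**2 / (4 * dx**2)
print(f"N_dot ~ {n_dot / 1e6:.2f} million dots")  # ~1.23 million (quoted as ~1.22 million in Sec. 3)
```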

As shown in Fig. 1(c), instead of the generation approach using laser illumination [27], we design an LED-driven random binary speckle pattern projector. Note that the two cameras are included in this figure to better explain how the device works.

In this paper, an infrared LED (850 nm) is used to illuminate the optical component, and a lens projects the pattern onto the reflective surface of a motor-driven prism, by reference to the method in [27], which generates temporally and spatially non-correlated, varying random binary speckle patterns in the measurement volume. Two synchronized cameras with infrared filters capture the pattern image pairs distorted by the tested human face.

2.2 STLC-based stereo matching and 3-D reconstruction of whole face

Building a 3-D face database, i.e., 3-D registration, is the precondition and the source of fundamental data for the development of deep-learning-based 3-D recognition algorithms. In actual 3-D face recognition, usually only part of the tested face is available to the 3-D acquisition system. Therefore, it is necessary to accurately acquire the shape of the whole face from the left to the right ear during registration, where the tested subject can be cooperative.

Referring to the flowchart in Fig. 2, we propose an STLC-based 3-D reconstruction scheme based on several stereo image pairs. Units A and B are respectively responsible for the 3-D reconstruction of the left and right faces, with the same working mechanism and image processing procedure. Taking unit A as an example, the whole image processing procedure mainly includes the following four steps: (1) rectification and binarization of stereo images; (2) STLC-based stereo matching; (3) sub-pixel disparity refinement; and (4) point cloud registration and fusion.


Fig. 2 3-D reconstruction scheme of whole face in registration with STLC.


  • Step 1. Rectification and Binarization of Stereo Images. The captured four stereo image pairs (N = 4 for illustration) are first rectified, so that corresponding points occur only on the same column; this improves the correspondence searching speed, increases robustness and simplifies the correspondence determination. A rectification sketch is given below.
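The paper does not name a rectification library; the following minimal sketch (our assumption) uses OpenCV with calibration results K1, D1, K2, D2, R, T. For a vertical (top/down) rig, the rectified epipolar lines run along columns, so correspondences share the same column index:

```python
import cv2

def rectify_pair(img_top, img_down, K1, D1, K2, D2, R, T):
    """Rectify one top/down stereo pair given stereo calibration results."""
    size = img_top.shape[::-1]  # (width, height) for a grayscale image
    R1, R2, P1, P2, *_ = cv2.stereoRectify(K1, D1, K2, D2, size, R, T)
    m1 = cv2.initUndistortRectifyMap(K1, D1, R1, P1, size, cv2.CV_32FC1)
    m2 = cv2.initUndistortRectifyMap(K2, D2, R2, P2, size, cv2.CV_32FC1)
    top = cv2.remap(img_top, m1[0], m1[1], cv2.INTER_LINEAR)
    down = cv2.remap(img_down, m2[0], m2[1], cv2.INTER_LINEAR)
    return top, down
```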

Then we propose a Local Contrast Binarization (LCB) scheme that computes the local mean in a small patch along the temporal sequence and uses it as the binarization threshold for the current pixel intensity. Binarizing the raw stereo images yields 1-bit LCB images and makes the following correlation computation more efficient. The LCB image $I_{LCB}^{k}(i,j,t)$ is calculated by:

$$I_{LCB}^{k}(i,j,t)=\begin{cases}1 & \text{if } I^{k}(i,j,t)\ge th^{k}(i,j)\\ 0 & \text{if } I^{k}(i,j,t)< th^{k}(i,j)\end{cases},\qquad k\in\{top,down\}.\tag{2}$$

The binarization result is shown in Fig. 3, middle. From Fig. 3, right, we can see how the information of the raw input image (Fig. 3, left) is expressed by the white and black pixels of the LCB image (Fig. 3, middle). By converting the raw input image into a 1-bit LCB image, the cost computation can be implemented as a logical correlation instead of a grayscale similarity correlation, remarkably improving the computational efficiency, which is crucial in 3-D human face recognition applications. In Eq. (2), $I^{k}(i,j,t)$ is the image intensity at pixel position (i, j) in the rectified image along the temporal sequence, and $th^{k}(i,j)$ is the local mean, serving as the binarization threshold, determined by


Fig. 3 Comparisons between the raw input image (left) and the LCB image (middle). To make the distinction clearer, LCB white pixels are highlighted in red in the raw input image (right).


$$th^{k}(i,j)=\frac{\sum_{t=1}^{N}\sum_{h=-\lfloor w_{y}/2\rfloor}^{\lfloor w_{y}/2\rfloor}\sum_{l=-\lfloor w_{x}/2\rfloor}^{\lfloor w_{x}/2\rfloor}I^{k}(i+h,j+l,t)}{N\,w_{x}w_{y}},\qquad k\in\{top,down\},\tag{3}$$

where $th^{k}(i,j)$ is calculated within a rectangular window of size $w_x \times w_y$ (both $w_x$ and $w_y$ are odd integers no less than 3) centered at pixel (i, j) along the temporal sequence, and N is the total number of stereo image pairs.
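A minimal NumPy/SciPy sketch of Eqs. (2)-(3) follows (our own illustration, not the authors' implementation): the threshold is the temporal-spatial local mean, computed as a box filter over the temporal average.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_contrast_binarize(stack, wx=3, wy=3):
    """Local Contrast Binarization (LCB), Eqs. (2)-(3).

    stack: (N, H, W) array of rectified images of one camera along the
    temporal sequence. Returns an (N, H, W) uint8 array of 1-bit images.
    """
    # Eq. (3): mean over the N frames, then a wy x wx spatial box filter.
    th = uniform_filter(stack.mean(axis=0), size=(wy, wx))
    # Eq. (2): threshold every frame against the shared local mean.
    return (stack >= th[None, :, :]).astype(np.uint8)
```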

  • Step 2. STLC-based Stereo Matching. STLC is used to implement pixel-level matching, as shown in Fig. 4. The binarized LCB images $I_{LCB}^{k}(i,j,t)$ are fed to the STLC correlation algorithm, and finally a reduced disparity map is obtained, which is enlarged in the figure for better graphic illustration. A detailed description follows.

Fig. 4 Illustration of STLC-based stereo matching mechanism to compute a low-resolution coarse disparity map.


Here we introduce a binary descriptor $B^{k}(i,j,t)$ for every position in the LCB image pairs along the temporal axis, which is used for the cost computation. The $S_{x}S_{y}$-bit string corresponding to the binary descriptor $B^{k}(i,j,t)$, built from the test locations $(i_{s}, j_{s})$ in a patch of size $S_x \times S_y$, is defined as:

$$B^{k}(i,j,t)=\sum_{s=1}^{S_{x}S_{y}}2^{s-1}\cdot I_{LCB}^{k}(i_{s},j_{s},t),\qquad k\in\{top,down\},\tag{4}$$

where the neighboring pixels $(i_{s}, j_{s})$ lie within a fixed rectangular window of size $S_x \times S_y$ (both $S_x$ and $S_y$ are odd integers no less than 3) centered at pixel (i, j), and $I_{LCB}^{k}(i,j,t)$ is calculated by Eq. (2). After calculating the descriptor $B^{k}(i,j,t)$, i.e., a binary string for each pixel, the matching cost of our proposed STLC method can be constructed as:

$$C_{STLC}(i,j,d)=\frac{\sum_{t=1}^{N}\left\lVert B^{top}(i,j,t)\ \mathrm{XOR}\ B^{down}(i,j-d,t)\right\rVert_{1}}{N\,S_{x}S_{y}},\tag{5}$$

where (i, j−d) is the pixel corresponding to (i, j) with disparity d in the other view, XOR is a bit-wise exclusive-or operation, and $\lVert\cdot\rVert_{1}$ counts the set bits. In short, $C_{STLC}(i,j,d)$ measures the normalized Hamming distance between the top and down binary strings along the temporal sequence.
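A compact NumPy sketch of Eqs. (4)-(5) follows (function and variable names are ours; the disparity is applied along j, as in Eq. (5)):

```python
import numpy as np

def binary_descriptors(lcb, sx=3, sy=3):
    """Eq. (4): pack the sx*sy LCB bits around each pixel into one integer.

    lcb: (N, H, W) uint8 array of 0/1 values from local_contrast_binarize().
    Returns (N, H-sy+1, W-sx+1) uint32 descriptors (valid window centers)."""
    n, h, w = lcb.shape
    desc = np.zeros((n, h - sy + 1, w - sx + 1), dtype=np.uint32)
    bit = 0
    for dy in range(sy):
        for dx in range(sx):
            window = lcb[:, dy:dy + h - sy + 1, dx:dx + w - sx + 1]
            desc |= window.astype(np.uint32) << bit  # weight 2^(s-1)
            bit += 1
    return desc

def stlc_cost(b_top, b_down, d, sx=3, sy=3):
    """Eq. (5): normalized Hamming cost for one candidate disparity d >= 0.

    b_top, b_down: (N, H, W) uint32 descriptor stacks. Returns an (H, W)
    cost map, with np.inf where the shifted views do not overlap."""
    n, h, w = b_top.shape
    cost = np.full((h, w), np.inf)
    x = b_top[:, :, d:] ^ b_down[:, :, :w - d]       # B_top(i,j) vs B_down(i,j-d)
    # popcount each uint32 value via its four bytes
    bits = np.unpackbits(x.view(np.uint8).reshape(x.shape + (4,)), axis=-1)
    cost[:, d:] = bits.sum(axis=(0, -1)) / (n * sx * sy)
    return cost
```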

After the STLC cost calculation, we employ the widely used Winner-Take-All (WTA) strategy [29] to select the integer-pixel disparity via Eq. (6), where the disparity d ∈ [d_min, d_max], with d_min and d_max the minimum and maximum disparities, respectively. This strategy generates each pixel's optimal disparity $D_{INT}(i,j)$ by choosing the disparity with the lowest matching cost among all allowed disparities according to

$$D_{INT}(i,j)=\arg\min_{d}\,C_{STLC}(i,j,d).\tag{6}$$
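Eq. (6) amounts to a per-pixel arg-min over the candidate range; a sketch using the stlc_cost() helper above (illustrative only):

```python
def wta_disparity(b_top, b_down, d_min, d_max):
    """Eq. (6): per-pixel Winner-Take-All over the allowed disparities."""
    h, w = b_top.shape[1:]
    best_cost = np.full((h, w), np.inf)
    best_d = np.zeros((h, w), dtype=np.int32)
    for d in range(d_min, d_max + 1):
        c = stlc_cost(b_top, b_down, d)
        better = c < best_cost            # keep the lowest-cost candidate
        best_cost[better] = c[better]
        best_d[better] = d
    return best_d
```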

A good coarse disparity map can easily be upsampled and refined to a still-accurate full-resolution output. To further reduce the computation of STLC matching, the $S_x \times S_y$ rectangular window of the STLC correlation slides across the input image at fixed-interval steps (e.g., 15 pixels), both horizontally and vertically. Adding one to account for the initial window position, the coarse disparity map $D_{INT}^{LOW}$ is a reduced image of size $O_x \times O_y$, as shown in Fig. 4. Here,

$$O_{x}=\left\lfloor\frac{width-S_{x}}{step_{x}}\right\rfloor+1,\qquad O_{y}=\left\lfloor\frac{height-S_{y}}{step_{y}}\right\rfloor+1,\tag{7}$$

where $\lfloor X\rfloor$ rounds each element of the expression X down to the nearest integer, and $step_x$ and $step_y$ respectively indicate the sliding intervals in the horizontal and vertical directions.

As can be seen, the low-resolution disparity estimate $D_{INT}^{LOW}$ obtained in this step is rather coarse, but sufficient to provide exact initial information for further disparity refinement. To recover a full-resolution disparity $D_{INT}$, we upsample the reduced coarse disparity map simply using nearest-neighbor interpolation; a sketch of this coarse-then-upsample pass is given below. Step 3 then corrects the upsampled map from the lower resolution to a more accurate one.
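The following sketch illustrates the strided coarse pass plus nearest-neighbor upsampling. For simplicity the grid here starts at the image origin rather than at window centers (an assumption of ours; Eq. (7) gives the exact reduced size), and popcount() is a small helper:

```python
def popcount(x):
    """Number of set bits in an integer array (simple sketch helper)."""
    return sum(bin(int(v)).count("1") for v in np.ravel(x))

def coarse_disparity(b_top, b_down, d_min, d_max, step=15):
    """Evaluate the STLC cost only on a grid of points `step` pixels apart,
    then upsample the reduced map to full resolution by nearest neighbor."""
    n, h, w = b_top.shape
    rows, cols = np.arange(0, h, step), np.arange(0, w, step)
    coarse = np.zeros((len(rows), len(cols)), dtype=np.int32)
    for r, i in enumerate(rows):
        for c, j in enumerate(cols):
            cand = range(d_min, min(d_max, j) + 1)   # keep j - d >= 0
            costs = [popcount(b_top[:, i, j] ^ b_down[:, i, j - d])
                     for d in cand]
            coarse[r, c] = (d_min + int(np.argmin(costs))) if costs else 0
    up = np.repeat(np.repeat(coarse, step, axis=0), step, axis=1)
    return coarse, up[:h, :w]   # reduced map and its full-resolution version
```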

  • Step 3. Sub-pixel Disparity Refinement. Step 2 can only give a pixel-level disparity (the low-resolution coarse disparity map shown in Fig. 4; the pixel-level 3-D reconstruction of the upsampled full-resolution disparity is given in Fig. 2). With these initial pixel-level disparity values, a much smaller refinement search range is allowed and a much lower computational cost is needed. Here we apply the STLC-based stereo algorithm to implement the sub-pixel matching within a smaller search range Δd (e.g., ±10 pixels) around the determined pixels along the column, until all pixels are matched.

    Concretely, after obtaining the integer-pixel disparity map $D_{INT}(i,j)$ in Step 2, we calculate the correlation values $C_{STLC}(i, j, d+\Delta d)$ by Eq. (5) and use the WTA strategy of Eq. (6) to update the integer-pixel disparity $D_{INT}(i,j)$ within the smaller search range $d+\Delta d$, where $d = D_{INT}(i,j)$ and $\Delta d \in [-10, 10]$.

    The sub-pixel position $d_{sub}$ is then refined by a five-point quadratic curve fit through $(d-2, C_{-2})$, $(d-1, C_{-1})$, $(d, C_{0})$, $(d+1, C_{+1})$ and $(d+2, C_{+2})$, where $d = D_{INT}(i,j)$ and $C_{m} = C_{STLC}(i, j, d+m)$ for $m = -2, \ldots, +2$.

    Finally, the resultant disparity $D_{SUB}$ is obtained as

    $$D_{SUB}(i,j)=D_{INT}(i,j)+d_{sub}.\tag{8}$$

    The sub-pixel disparity map is then used to reconstruct the 3-D coordinates based on the calibration parameters. A sketch of the five-point fit follows.
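The five-point quadratic fit reduces to locating the vertex of a parabola fitted to the five cost samples; a minimal sketch (our own illustrative helper):

```python
def subpixel_offset(c):
    """Five-point quadratic fit for d_sub (the refinement before Eq. (8)).

    c: the five costs [C-2, C-1, C0, C+1, C+2] sampled at integer
    disparities d-2 .. d+2. Returns the sub-pixel offset d_sub."""
    x = np.arange(-2.0, 3.0)
    a, b, _ = np.polyfit(x, np.asarray(c, dtype=float), 2)  # a*x^2 + b*x + c0
    return float(-b / (2.0 * a)) if a > 0 else 0.0          # vertex = cost minimum
```

Per Eq. (8), the final value at each pixel is then D_SUB = D_INT + subpixel_offset(c).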

  • Step 4. Pointcloud Registration and Fusion. Based on the proposed 3-D information acquisition system of the whole face in Sec. 2.1, this paper focuses on the framework of the STLC-based stereo matching algorithm rather than on pointcloud registration and fusion. Thus, we directly utilize the manual and global registration and fusion functions of the Geomagic Studio 2015 software to register and fuse the 3-D point clouds from units A and B by manually selecting more than three features within the overlapped areas, and finally obtain an accurate and complete 3-D reconstruction of the human face from the left to the right ear.

    Figure 5(a) shows the pointcloud data of the left (red) and right (green) faces that need to be registered and fused. We first select three feature points, numbered 1, 2, 3, on the left (red) pointcloud, and the corresponding feature points, numbered 1', 2', 3', on the right (green) pointcloud; the software then automatically and roughly registers the two pieces of pointcloud, as shown in Figs. 5(b) and 5(c). We further use the global registration function to perform an iterative global registration (Fig. 5(d)) until the convergence condition is met. Finally, the fusion function is employed to obtain the refined whole-face pointcloud data; the fusion process and the final fusion result are given in Figs. 5(e) and 5(f). It should be emphasized that all parameters used during registration and fusion are the default values given by the software.
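For readers who prefer a programmatic pipeline, the manual coarse-plus-global registration above could be approximated by point-to-point ICP. The sketch below uses Open3D as a stand-in for Geomagic Studio (our assumption, not the authors' tooling), with the initial transform taken, e.g., from the three picked feature-point pairs:

```python
import numpy as np
import open3d as o3d  # assumption: Open3D replaces the Geomagic Studio GUI

def register_halves(left_xyz, right_xyz, init=np.eye(4), max_dist=2.0):
    """Refine an initial alignment of the two half-face pointclouds with
    point-to-point ICP; returns the 4x4 transform mapping left onto right."""
    src, dst = o3d.geometry.PointCloud(), o3d.geometry.PointCloud()
    src.points = o3d.utility.Vector3dVector(left_xyz)
    dst.points = o3d.utility.Vector3dVector(right_xyz)
    result = o3d.pipelines.registration.registration_icp(
        src, dst, max_dist, init,
        o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation
```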


Fig. 5 Pointcloud registration and fusion. (a) pointcloud data of the left (red) and right (green) face to be registered and fused; (b) selecting feature points and (c) the process of rough registration; (d) the result of global registration after 5 iterations; (e) the process of fusion; (f) the final fusion result of the whole face from the left to the right ear.


3. Experiments

To verify the performance of the proposed 3-D system and 3-D reconstruction algorithm, a series of experiments is implemented. In addition to comparatively analyzing the measurement results, we also experimentally investigate how the proposed algorithm affects the 3-D reconstruction in terms of accuracy and time cost.

In the prototype 3-D system of Fig. 1, each measurement unit is composed of two vertically configured (top and down) eight-bit monochrome cameras (model: IDS 3360CP-M, resolution 2048 pixels × 1088 pixels, focal length 16 mm, baseline about 280 mm) and a projector. The chrome-plated glass has a thickness of 1.5 mm, a diameter (D) of 12 mm and a feature size (Δx) of 9.6 μm. An infrared LED with a power of 20 W and a wavelength of 850 nm is used to illuminate the optical component, and an 18 mm lens projects the pattern onto the reflective surface of a motor-driven prism, producing temporally and spatially non-correlated, varying random binary speckle patterns (~1.22 million black and white dots) in the measurement volume.

For 3-D whole-face reconstruction, binocular measurement units A and B alternately project and capture the varying random patterns at 120 Hz, distorted by the tested face at a distance of 750 mm from the projector. The system is controlled by a synchronized control circuit that allows the acquisition of four stereo image pairs within 0.0667 s.

It should be emphasized that no image/data post-processing operations, such as smoothing, interpolation or hole-filling, are involved, except for outlier removal of the pointclouds with a 5 × 5 median filter. All experiments are conducted on a 3.5 GHz Intel Core i7-8700K CPU with 16 GB of RAM, without GPU optimization.

3.1 Accuracy evaluation of the 3-D system with SFP

Following the German guideline VDI/VDE 2634 Part 2 [30], a 140 mm × 140 mm square ceramic standard plate with a certified flatness of 0.005 mm, and a dumbbell gauge with sphere diameters D1 = 50.885 mm and D2 = 50.879 mm and a sphere center distance d = 100.0178 mm, are adopted to test the accuracy of the 3-D system. Because measurement units A and B are identical, we select unit A for the accuracy evaluation. A three-frequency (fringe period numbers 1, 12, 72) temporal phase unwrapping algorithm with a four-step phase-shifting technique based on sinusoidal fringe projection (SFP) [31–33] is adopted to implement the 3-D reconstruction because of its high measurement accuracy. We use a DLP4500 (1140 pixels × 912 pixels) digital projector with an 18 mm focal length to project the sinusoidal patterns. The two standard components are placed at distances of 720~800 mm from the projector.
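As background, this reference method reduces to the standard four-step phase-shifting formula plus hierarchical (temporal) unwrapping; a minimal sketch under our own naming (see [31–33] for the exact algorithms):

```python
import numpy as np

def four_step_phase(i1, i2, i3, i4):
    """Wrapped phase from four fringe images, each shifted by pi/2."""
    return np.arctan2(i4 - i2, i1 - i3)

def unwrap_with_lower(phi_low_unwrapped, phi_high_wrapped, ratio):
    """One rung of three-frequency temporal unwrapping: pick the fringe
    order k of the high-frequency phase from the lower-frequency estimate.
    ratio = f_high / f_low (here 12/1, then 72/12)."""
    k = np.round((ratio * phi_low_unwrapped - phi_high_wrapped) / (2 * np.pi))
    return phi_high_wrapped + 2 * np.pi * k
```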

Figure 6 shows the tested results of the ceramic standard plate. Figures 6(a)-6(c) show one of the phase-shifting fringe images with fringe period numbers 1, 12 and 72, respectively. Figure 6(d) illustrates the plane fitting error map of pose No. 5 (see Fig. 6(e)), showing that the error is uniformly distributed. The statistical results listed in Fig. 6(e) indicate that the maximum and minimum fitting standard deviations (Std) are 0.0432 mm (No. 5) and 0.0293 mm (No. 6), respectively, over seven different spatial poses, with no major changes.
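The flatness figure is simply the standard deviation of point-to-plane residuals; one way such a plane-fit Std could be computed (our own helper, not the authors' code):

```python
def plane_fit_std(pts):
    """Std of signed point-to-plane residuals for the flatness test.

    pts: (M, 3) pointcloud of the plate. The best-fit plane normal is the
    right singular vector of the centered cloud with the smallest
    singular value."""
    centered = pts - pts.mean(axis=0)
    normal = np.linalg.svd(centered, full_matrices=False)[2][-1]
    return (centered @ normal).std()
```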


Fig. 6 Tested results of the ceramic standard plate. (a)-(c) one of the phase-shifting fringe images with period numbers 1, 12 and 72; (d) the plane fitting error map of No. 5; (e) fitting Std at seven different spatial poses.


The statistical results of seven repeated tests of the dumbbell gauge at different spatial poses are listed in Table 1. The mean, maximum error and mean error of the measured sphere diameter D1 are 50.8736 mm, 0.0359 mm and 0.0141 mm, respectively, and 50.8739 mm, 0.0339 mm and 0.0180 mm for D2. Their least-squares spherical fitting deviations can be found in the last two columns. The mean, maximum error and mean error of the measured sphere center distance d are 100.0105 mm, 0.0222 mm and 0.0153 mm. Figure 7 illustrates the fitting error distributions of the dumbbell gauge (D1 and D2) for pose No. 3 (see Table 1).
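The diameters and center distance come from least-squares sphere fits; the linear formulation below is one standard way to compute them (a sketch with our own names):

```python
def fit_sphere(pts):
    """Linear least-squares sphere fit: solve 2 c.p + (r^2 - |c|^2) = |p|^2.

    pts: (M, 3) points on one sphere of the dumbbell gauge.
    Returns (center, radius); the diameter is 2 * radius."""
    A = np.hstack([2.0 * pts, np.ones((len(pts), 1))])
    b = (pts ** 2).sum(axis=1)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = sol[:3]
    radius = np.sqrt(sol[3] + (center ** 2).sum())
    return center, radius
```

The center distance d is then the norm of the difference between the two fitted centers.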


Table 1. Tested results of the dumbbell gauge. (Unit: mm)


Fig. 7 Fitting error distributions of dumbbell gauge (D1 and D2) of No. 3. Unit: mm.


Because this work focuses on 3-D face reconstruction, we employ the same SFP-based 3-D reconstruction method mentioned above to standardize a plastic human face model. The system parameters (image contrast, measurement distance, etc.) are carefully adjusted to ensure an optimum measurement result. Figures 8(a)-8(c) and 8(d)-8(f) give some of the captured fringe images from the A- and B-units, respectively. The 3-D reconstruction results of the A- and B-units are registered and fused with the method described in Sec. 2.2, Step 4, yielding a whole 3-D shape (pointcloud) from the left to the right ear, shown in Figs. 8(g)-8(i) from three different perspectives; this will serve as the standard to evaluate the 3-D reconstruction accuracy of our proposed solution.


Fig. 8 A-unit: (a)-(c) one of the phase-shifting fringe images with fringe period numbers 1, 12 and 72; B-unit: (d)-(f) one of the phase-shifting fringe images with fringe period numbers 1, 12 and 72; (g)-(i) 3-D reconstruction results of a plastic human face model from the left to the right ear.


3.2 Comparative experiments using the proposed 3-D reconstruction solution

Based on the same experimental configuration shown in Fig. 1, comparative experiments are implemented. In the two measurement units, we replace the digital projectors with the projectors designed as in Fig. 1(c) to project temporally and spatially non-correlated, varying random binary speckle patterns that encode the tested subject. The plastic human face model is placed at the same measurement distance from the projector. In a time-division-multiplexing manner, the random binary encoding patterns from the projectors of the A- and B-units scan the model while the cameras synchronously capture the stereo image pairs. As an optimum selection balancing measurement time and accuracy, we adopt four stereo image pairs [12] from each of the A- and B-units. Next, we determine the correspondences respectively with the pure ZNCC [12,22] (both pixel- and subpixel-level matching, similarly hereinafter), using a 9 pixels × 9 pixels rectangular matching window, correlation threshold χth = 0.08 and N = 4 (similarly hereinafter, unless otherwise noted), and with the STLC-based 3-D reconstruction solution of Sec. 2.2, using a binarization window of (Sx × Sy) = 3 pixels × 3 pixels and the same matching window size (9 pixels × 9 pixels) and stereo image pairs (similarly hereinafter, unless otherwise noted). Figures 9 and 10 show the reconstruction results using ZNCC and STLC, obtained by registering and fusing the 3-D pointcloud data from the A- and B-units, for N = 4 and N = 1, respectively. Their error distributions relative to the standardized plastic model of Figs. 8(g)-8(i) are shown in the corresponding panels (a') and (b').


Fig. 9 (a) 3-D reconstruction of the plastic model using ZNCC (N = 4) and (a') comparisons with SFP; (b) 3-D reconstruction using STLC (N = 4) and (b') comparisons with SFP. Units: mm.


Fig. 10 (a) 3-D reconstruction of the plastic model using ZNCC (N = 1) and (a') comparisons with SFP; (b) 3-D reconstruction using STLC (N = 1) and (b') comparisons with SFP. Units: mm.


Careful observation shows that, comparing Fig. 9(b') with Fig. 9(a') and Fig. 10(b') with Fig. 10(a'), the former errors are more uniformly distributed, while the latter show larger errors, especially in areas such as the ears, mouth and the two sides of the nose, characterized by complex and steeply sloped shapes. Statistically, the performances of the traditional ZNCC and the proposed STLC, including the maximum error, mean error and Std of the error, are listed in Table 2, and the computation times of the two algorithms as a function of the number of stereo image pairs are illustrated in Fig. 11. The results show that our proposed STLC-based 3-D reconstruction solution has almost the same measurement accuracy as ZNCC when N = 4 and even yields a higher accuracy with N = 1. Furthermore, our proposed scheme shows an overwhelming advantage in matching time regardless of the number (N) of stereo image pairs.


Table 2. Performance comparison between the traditional ZNCC and the proposed STLC by measuring a plastic model. (Unit: mm)


Fig. 11 Computation time comparison between the traditional ZNCC and the proposed STLC by measuring a plastic model.


3.3 3-D whole face reconstruction of a real human

To further validate the feasibility of the proposed scheme for 3-D human face registration, a real human face is measured. Figure 12 demonstrates the experimental results based on the experimental configuration in Fig. 1(a). Figures 12(a)-12(b) and 12(c)-12(d) are the captured stereo image pairs distorted by the left and right faces, respectively. The corresponding 3-D reconstruction results of the whole face with ZNCC and the proposed STLC for N = 4 are shown in Figs. 12(e) and 12(e'). We also employ the quality analysis function of Geomagic Studio 2015 to compare their differences quantitatively. Figure 12(f) visualizes the difference distributions in three different views. The maximum, mean and standard deviation of the absolute (distance) error are 2.972 mm, 0.035 mm and 0.086 mm, respectively. The computation time of ZNCC is 1.0126 s, while that of the proposed STLC is 0.0772 s. By closely observing the color map, we can find that the performance of STLC is very close to that of ZNCC in most areas, which agrees very well with the conclusion in Sec. 3.2 (Fig. 9).


Fig. 12 (a)-(b) and (c)-(d) the stereo image pair sequences (N = 4) of measuring a real human face from the A- and B-units; (e) 3-D reconstruction of the whole face by registering and fusing pointcloud data from (a)-(b) and (c)-(d) with ZNCC; (e') 3-D reconstruction of the whole face by registering and fusing pointcloud data from (a)-(b) and (c)-(d) with STLC; (f) difference distributions between the 3-D reconstructions of (e) and (e').


4. Discussion

In this paper, our attention is mainly focused on how to implement 3-D reconstruction that meets the requirements of 3-D face registration in real application environments. An STLC-based 3-D reconstruction scheme is provided, whose accuracy and efficiency have been quantitatively tested against a plastic human face model standardized by SFP. The sub-pixel matching procedure does not involve any searching, because the integer-pixel disparity has already been obtained by STLC, so the calculation cost is greatly reduced relative to the traditional ZNCC-based matching algorithm. Note that the time consumed by point cloud registration is not measured, since fast and robust point cloud registration is a separate, active research topic. All of the experiments carried out in this paper validate the success and advantages of our proposals over the widely used ZNCC stereo matching algorithm. However, several aspects still need further explanation.

We draw on the idea in [27], where the projector is based on laser illumination, while our LED-driven projector has several advantages: 1) there is no coherence noise, which offers better 3-D reconstruction quality without any image preprocessing [7]; 2) by means of a motor-driven reflective wedge mirror, continually varying random binary encoding patterns are produced; for each measurement unit, only the two cameras need to be strictly synchronized to capture the distorted patterns, while strict synchronization between the projector and the cameras is not needed, yielding good feasibility as opposed to traditional digital projectors; 3) the random binary pattern etched on glass allows the number and average size of the speckle dots to be flexibly controlled according to the system parameters [12], which is extremely beneficial for 3-D reconstruction with different accuracy requirements.

In all experiments, we select the same matching parameters, such as the matching (binarization) window size and correlation threshold. Besides, the image binarization method used for STLC can also affect the final stereo matching. More optimal choices could be made from a statistical viewpoint to balance reliability and spatial resolution, but this topic is beyond the scope of this work.

Let us turn to the experimental results in Fig. 9, Fig. 10 and Table 2. The final statistical error can come from the registration and fusion as well as from the accuracy of the measurement principle, but we use the same standardized plastic human face model and registration operation as the criterion throughout. Therefore, the evaluation results are meaningful as a reference.

Because the contrast of the optical random binary encoding pattern is lower on human face skin than on the plastic face model, we believe it can be increased by improving the projection device. Higher contrast can further boost the quality of the pointcloud. We plan to further investigate high-power infrared light sources, more optimized pattern designs and the associated high-resolution optics to improve the 3-D reconstruction quality.

Considering the logical-calculation property (pixel-level matching) of the STLC-based 3-D reconstruction scheme, our group will make great efforts to port it to a hardware platform instead of the current PC for a substantial increase in computation speed, which will greatly promote the development of 3-D human face recognition applications.

5. Summary

In this paper, we have demonstrated the capability of the proposed scheme and experimental configuration to achieve satisfactory 3-D face reconstructions in registration. Experiments are conducted to evaluate the accuracy of the 3-D face acquisition system using a plastic human face model standardized by SFP. The 3-D results provided by our proposed scheme are comparable with those of the widely used and recognized ZNCC-based 3-D reconstruction scheme, while our proposal displays a much higher matching efficiency under the same parameters, which is attractive in time-critical applications like 3-D face registration and other scenarios. However, the current reconstruction procedure is not perfect and needs to be improved according to the actual application. Our future work will also place more emphasis on implementing a system that can automatically acquire the 3-D face data without human intervention and is expected to be put into practice.

Funding

2018 Sichuan Province Science and Technology Innovation Seedling Project (18-YCG041); Full-time Postdoctoral Research and Development Funds of Sichuan University (2017SCU12023); China Postdoctoral Special Funds (2016T90847); China Postdoctoral Science Fund (2015M572472); National Major Instrument Special Fund (2013YQ490879); National Natural Science Foundation of China (61801057); Sichuan Education Department Project (18ZB0124).

References

1. J. Geng, “Structured-light 3D surface imaging: a tutorial,” Adv. Opt. Photonics 3(2), 128–160 (2011).

2. S. Zhang, “High-speed 3D shape measurement with structured light methods: A review,” Opt. Lasers Eng. 106, 119–131 (2018).

3. M. Takeda and K. Mutoh, “Fourier transform profilometry for the automatic measurement of 3-D object shapes,” Appl. Opt. 22(24), 3977–3982 (1983).

4. Z. H. Zhang, “Review of single-shot 3D shape measurement by phase calculation-based fringe projection techniques,” Opt. Lasers Eng. 50(8), 1097–1106 (2012).

5. X. Liu, H. Zhao, G. Zhan, K. Zhong, Z. Li, Y. Chao, and Y. Shi, “Rapid and automatic 3D body measurement system based on a GPU-Steger line detector,” Appl. Opt. 55(21), 5539–5547 (2016).

6. L. Maier-Hein, P. Mountney, A. Bartoli, H. Elhawary, D. Elson, A. Groch, A. Kolb, M. Rodrigues, J. Sorger, S. Speidel, and D. Stoyanov, “Optical techniques for 3D surface reconstruction in computer-assisted laparoscopic surgery,” Med. Image Anal. 17(8), 974–996 (2013).

7. D. Khan, M. A. Shirazi, and Y. K. Min, “Single shot laser speckle based 3D acquisition system for medical applications,” Opt. Lasers Eng. 105, 43–53 (2018).

8. 3dMD, “3dMD home page,” http://www.3dmd.com/.

9. J. Guo, X. Peng, A. Li, X. Liu, and J. Yu, “Automatic and rapid whole-body 3D shape measurement based on multinode 3D sensing and speckle projection,” Appl. Opt. 56(31), 8759–8768 (2017).

10. M. Sjödahl and P. Synnergren, “Measurement of shape by using projected random patterns and temporal digital speckle photography,” Appl. Opt. 38(10), 1990–1997 (1999).

11. J. García, Z. Zalevsky, P. García-Martínez, C. Ferreira, M. Teicher, and Y. Beiderman, “Three-dimensional mapping and range measurement by means of projected speckle patterns,” Appl. Opt. 47(16), 3032–3040 (2008).

12. P. Zhou, J. Zhu, and H. Jing, “Optical 3-D surface reconstruction with color binary speckle pattern encoding,” Opt. Express 26(3), 3452–3465 (2018).

13. A. Wiegmann, H. Wagner, and R. Kowarschik, “Human face measurement by projecting bandlimited random patterns,” Opt. Express 14(17), 7692–7698 (2006).

14. Wikipedia, “PrimeSense,” https://en.wikipedia.org/wiki/PrimeSense.

15. Microsoft, “Kinect for Windows,” https://developer.microsoft.com/en-us/windows/kinect.

16. Apple, “iPhone X,” https://www.apple.com/cn/iphone-x/.

17. Intel, “RealSense,” https://realsenseapp.intel.com/.

18. J. Jiang, J. Cheng, and H. Zhao, “Stereo matching based on random speckle projection for dynamic 3D sensing,” in International Conference on Machine Learning and Applications (IEEE, 2013), pp. 191–196.

19. B. Pan, H. Xie, and Z. Wang, “Equivalence of digital image correlation criteria for pattern matching,” Appl. Opt. 49(28), 5501–5509 (2010).

20. M. A. Sutton, J. J. Orteu, and H. Schreier, Image Correlation for Shape, Motion and Deformation Measurements: Basic Concepts, Theory and Applications (Springer, 2009).

21. P. Lutzke, M. Schaffer, P. Kühmstedt, R. Kowarschik, and G. Notni, “Experimental comparison of phase-shifting fringe projection and statistical pattern projection for active triangulation systems,” Proc. SPIE 8788, 878813 (2013).

22. L. Di Stefano, S. Mattoccia, and F. Tombari, “ZNCC-based template matching using bounded partial correlation,” Pattern Recognit. Lett. 26(14), 2129–2134 (2005).

23. K. Liu, C. Zhou, S. Wei, S. Wang, X. Fan, and J. Ma, “Optimized stereo matching in binocular three-dimensional measurement system using structured light,” Appl. Opt. 53(26), 6083–6090 (2014).

24. G. Wang, X. Yin, X. Pei, and C. Shi, “Depth estimation for speckle projection system using progressive reliable points growing matching,” Appl. Opt. 52(3), 516–524 (2013).

25. M. Schaffer, M. Grosse, B. Harendt, and R. Kowarschik, “High-speed three-dimensional shape measurements of objects with laser speckles and acousto-optical deflection,” Opt. Lett. 36(16), 3097–3099 (2011).

26. M. Schaffer, M. Grosse, and R. Kowarschik, “High-speed pattern projection for three-dimensional shape measurement using laser speckles,” Appl. Opt. 49(18), 3622–3629 (2010).

27. B. Harendt, M. Grosse, M. Schaffer, and R. Kowarschik, “3D shape measurement of static and moving objects with adaptive spatiotemporal correlation,” Appl. Opt. 53(31), 7507–7515 (2014).

28. M. Große, M. Schaffer, B. Harendt, and R. Kowarschik, “Fast data acquisition for three-dimensional shape measurement using fixed-pattern projection and temporal coding,” Opt. Eng. 50(10), 100503 (2011).

29. D. Scharstein and R. Szeliski, “A taxonomy and evaluation of dense two-frame stereo correspondence algorithms,” Int. J. Comput. Vis. 47(1/3), 7–42 (2002).

30. VDI/VDE 2634 Blatt 2:2002-08, Optische 3D-Messsysteme – Systeme mit flächenhafter Antastung (Beuth Verlag, Berlin, 2002).

31. J. M. Huntley and H. Saldner, “Temporal phase-unwrapping algorithm for automated interferogram analysis,” Appl. Opt. 32(17), 3047–3052 (1993).

32. J. Zhu, P. Zhou, X. Su, and Z. You, “Accurate and fast 3D surface measurement with temporal-spatial binary encoding structured illumination,” Opt. Express 24(25), 28549–28560 (2016).

33. P. Zhou, J. Zhu, X. Su, Z. You, H. Jing, C. Xiao, and M. Zhong, “Experimental study of temporal-spatial binary pattern projection for 3D shape acquisition,” Appl. Opt. 56(11), 2995–3003 (2017).
