Three-dimensional distortion-tolerant object recognition using integral imaging

Sekwon Yeom; Bahram Javidi

doi:10.1364/OPEX.12.005795

1. Introduction

The ability to detect and identify objects in input scenes and categorizing them into a class has been subject of intense research [1–21]. Distortion, rotation, and scaling changes of objects create many challenges toward achieving this aim. Automatic target recognition (ATR) needs to deal with uncooperative objects in complex scenes.

Recently, there has been growing research interest on object recognition in 3D [9–14]. Three-dimensional scenes have additional information beyond that of 2D space. In digital holography, we can reconstruct 2D images with varying depths and perspectives from complex data. In II, we utilize multiple 2D scenes that are characterized by their own perspectives to reconstruct 3D scenes [13, 14].

II is a popular recording and displaying technique for 3D scenes. II uses a pinhole array or a micro-lenslet array as illustrated in Fig. 1. Each lenslet in the lenslet array generates an elemental image on a Charge-Coupled Device (CCD) during the pickup process. Reconstruction is the reverse of the pickup process. The 3D scene of the pseudoscopic real image is formed by propagating elemental images through the lenslet array from the display device. The pseudoscopic real image is displayed by the rays from opposite directions but having the same intensities as in the pickup process. II provides autostereoscopic images with full parallax, continuously varying viewpoints, and real perspectives [22, 23]. Orthoscopic images can be reconstructed by various methods. Recently, more advanced techniques were developed for improved viewing angle and resolution [24].

The scope of the II application has been extended to object recognition and depth estimation. In [13], the II was analyzed by estimating the longitudinal depth of a 3D object. The non-stationary technique improved the resolution of reconstructed II and accuracy of depth estimation in [14]. Recently, novel methods to extract depth information by means of disparity between modified elemental images are proposed in [15, 16].

Object recognition is a comprehensive task to recognize patterns with arbitrary distortion, orientation, location and scale [20, 21]. In this paper, we specify our scope to research on distortion — particularly caused by out-of-plane rotation. One advantage of the II system is that we can record multiple and continuous perspective scenes of objects by one shot instead of using several cameras or changing the position of cameras. Therefore, we can build a compact system in capturing the 3D information for a distortion (rotation)-tolerant object recognition system.

We apply Principle Component Analysis (PCA) and Fisher Linear Discriminant (FLD) analysis to out-of-plane rotation-tolerant 3D object recognition using II. They are both linear transformation techniques for dimensionality reduction according to specific optimal criteria [1,2,3,12,17–21]. In PCA, input vectors are mapped into a subspace spanned by eigenvectors of covariance matrix of training data. The PCA projects data onto the principal components, so it maximizes the scatter of data. This projection is optimal in terms of Mean Square Error (MSE) between original and reconstructed vectors. Therefore, the PCA is more suitable to statistically redundant and highly correlated data which are characteristics of elemental images because it can retain more dominant features and generate less RMSE.

The FLD projects data onto another low dimensional subspace where the discrimination of features are maximized by maximizing the ratio of the between-class scatter to the within-class scatter [1,3]. The FLD is more suitable in the case classes are linearly separable. In the PCA-FLD classifier, we can handle high dimensional data by projecting dominant features onto lower subspace while reducing redundant or noisy data of the II system.

Our decision strategies are composed of two decision rules: 1) the nearest neighbor rule using Euclidean distance, and 2) statistical distance decision rule using Mahalanobis distance. The second rule is equivalent to Maximum Likelihood (ML) decision rule, assuming Gaussian Probability Density Function (PDF) with equal covariances between classes [1].

In this paper, we categorize input elemental images of 3D objects into one of rotation angle sets and/or one of object classes. The performance is analyzed in terms of probability of correct decision, normalized Root Mean Square Error (RMSE), the PCA-FLD cost function, and different decision rules. The results are compared by varying the number of training data.

The organization of the paper is as follows. In Section 2, we present the system description. Overviews of PCA and FLD analysis are described in Section 3. The decision rules and performance evaluations are presented in Section 4. Experimental results are discussed in Section 5. Summary and conclusions are presented in Section 6.

Fig. 1. Optical system for II. Each lenslet produces an elemental image.

Download Full Size | PDF

2. System description

As shown in Fig. 1, II is used to obtain 3D information of the object. Each lenslet obtains a 2D perspective of the 3D scene which is recorded by the CCD. These 2D perspectives are referred to as elemental images.

The proposed system is performed via several stages as shown in Fig. 2: II acquisition, elemental image alignment, PCA-FLD training and projection, and decision rules. In the first stage, we acquire elemental images for several classes of 3D objects. Next, we align elemental images in each integral image with one common reference elemental image which is located at the center of the integral image. The training and projection of PCA-FLD classifier follow in the next stage. Finally, the decision rules determine the classes of input images. In the experiments, we define the class as the rotation angle sets as well as types of objects.

In the following, we discuss the alignment of elemental images. The alignment or segmentation of objects in each elemental image is an important step in the proposed 3D object recognition. Elemental images provide different but continuous perspectives of the same 3D object. The corresponding positions between the objects in elemental images are not uniformly apart from each other. This shift is a function of the longitudinal distance between the 3D object and the lenslet array, and the position of the lenslet. That is, the farther a lenslet is located from the center of the micro-lenslet array, the more the object will be shifted from the center of the elemental image, which corresponds to the optical axis of the lenslet. Therefore, alignment is a necessary preprocess before training their disparity and similarity between objects. By shifting each object to the position which satisfies a certain criterion, we reduce unnecessary variation of objects to extract more distinguished features from them.

In this paper, we adopt the cross-correlation technique for the alignment [13] of elemental images. It is often used in stereo-matching algorithms for position matching of two scenes. Elemental images are shifted to the positions where the cross-correlation coefficients between the central elemental image and the off-axis elemental image are maximized:

[\hat{p}, \hat{q}] = max_{p, q} c (p, q),

c (p, q) = \frac{\sum_{x = 1}^{M_{x}} \sum_{y = 1}^{M_{y}} e_{r} (x, y) e_{i} (x + p, y + q)}{\sum_{x = 1}^{M_{x}} \sum_{y = 1}^{M_{y}} {∣ e_{r} (x, y) ∣}^{2} \sum_{x = 1}^{M_{x}} \sum_{y = 1}^{M_{y}} {∣ e_{i} (x, y) ∣}^{2}},

where c(p,q) is a cross-correlation coefficient for the elemental image intensity e_i and the intensity of the reference elemental image e_r ; and M_x and M_y are the sizes of the elemental image in x and y directions, respectively. We shift each elemental image e_i by the position estimates p̂ and q̂:

{e'}_{1} (x, y) = e_{i} (x + \hat{p}, y + \hat{q}), x = 1, \dots, M_{x}, y = 1, \dots, M_{y} .

Fig. 2. Frameworks of object recognition system using II.

Download Full Size | PDF

3. PCA-FLD classifier

3.1. Principal component analysis (PCA)

The PCA is a popular projection method to represent d-dimensional vectors in the l dimension subspace (l≤d) [1,2,3]. For a real d-dimension random vector x, let the mean vector be µ _x =E(x), and the covariance matrix be ∑_xx=E(x-µ _x )(x-µ _x ) ^t where t denotes matrix transpose. The PCA space is spanned by orthonormal eigenvectors of the covariance matrix; that is, ∑ _xx E=EΛ where the column vectors of E are normalized eigenvectors e _i ’s, i.e., E=[e ₁,…,e _d ], and the diagonal components of Λ are eigenvalues λ_i ’s, i.e. Λ=diag(λ₁,…,λ_d ). The PCA projection matrix W_p is the same as the eigenvector matrix E. Therefore, a projected vector y by PCA projection matrix W_p is y= $W_{P}^{t}$ x=E ^t x.

The PCA projection diagonalizes the covariance matrix of y, i.e. ∑ _yy =E(y-µ _y )(y-µ _y )^t=Λ where µ _y=E(y). If we choose PCA projection matrix W_p =[e ₁,…,e_l ], the PCA subspace is spanned by corresponding l eigenvectors. It is a well known property of the PCA that by choosing l eigenvectors of the largest eigenvalues, the projected vector minimizes the Mean Square Error (MSE) between a vector x and a reconstructed vector x̂ from the PCA subspace. The MSE is defined as:

MSE (\hat{x}) = E {∥ x - \hat{x} ∥}^{2} = \sum_{i = l + 1}^{d} λ_{i}

where x̂=µ _x+W_P (y-µ _y )=µ _x +W_P $W_{P}^{t}$ (x-µ _x ) ; and $λ_{i}^{’}$ s are eigenvalues of λ_d ≤λ _d-1,…,λ ₂≤λ ₁.

The PCA projection reduces the dimension of the vectors while retaining dominant features of the object structure. The PCA can remove any redundant information reducing the noisy parts in 3D data. The optimal l cannot be decided analytically. Usually, the projected vectors of smaller eigenvalues include more noise than the vectors corresponding to larger eigenvalues. However, very low l may not contain enough energy to properly represent the characteristics of the object.

3.2 Fisher linear discriminant (FLD) analysis

The FLD projects l-dimension vectors onto a subspace of k dimension (k≤l). The FLD maximizes the ratio of determinant of between-class scatter matrix S_B to determinant of within-class scatter matrix S_W . The between-class scatter matrix is a measure of the separations of the classes. The within-class scatter matrix is proportional to the sample covariance matrix, so, it is inversely proportional to the concentration of data in each class. S_B and S_W for training data vector y are defined as:

S_{B} = \sum_{j = 1}^{c} n_{j} (m_{j} - m) {(m_{j} - m)}^{t},

S_{W} = \sum_{j = 1}^{c} \sum_{n = 1}^{n_{j}} (y_{j} (n) - m_{j}) {(y_{j} (n) - m_{j})}^{t},

where c is the number of classes; n_j is the number of training data in the class j; m _j is the sample mean vector of the class j; m is the sample mean vector of all training data; and y _j (n) is the n-th training vector in the class j. Total scatter matrix S_T is defined as $S_{T} = S_{B} + S_{W} = \sum_{n = 1}^{n_{t}} (y (n) - m) {(y (n) - m)}^{t}$ where n_t is the total number of training data, that is, $\sum_{j = 1}^{c}$ n_j =n_t ·

Let the FLD transformation matrix be W_F and z be a new vector after projection of y, i.e. z= $W_{F}^{t}$ y. After applying W_F to each scatter matrix, we have:

{\tilde{S}}_{B} = W_{F}^{t} S_{B} W_{F} = \sum_{j = 1}^{c} n_{j} ({\tilde{m}}_{j} - \tilde{m}) {({\tilde{m}}_{j} - \tilde{m})}^{t},

{\tilde{S}}_{W} = W_{F}^{t} S_{W} W_{F} = \sum_{j = 1}^{c} \sum_{n = 1}^{n_{j}} (z_{j} (n) - {\tilde{m}}_{j}) {(z_{j} (n) - {\tilde{m}}_{j})}^{t},

where m̃ _j is the sample mean vector of the class j after projection; m̃ is the sample mean vector of all training data after projection; and z _j (n) is the n-th training vector in the class j after projection. W_F maximizes the cost function, J(W)=|S̃ _B |/|S̃ _W |=|W^tS_BW|/|W^tS_WW|· The column vectors of W_F are the eigenvectors of $S_{W}^{- 1}$ S_B with the largest nonzero k eigenvalues [1,3]. Note that k is a reduced dimension that is k≤c-1 because the rank of S_B is at most c-1. In other words, the maximum number of nonzero eigenvalues of $S_{W}^{- 1}$ S_B is c-1. Therefore, the maximum dimension of the FLD projection for the c-class problem is c-1, so, k ≤c-1≤l.

In the FLD projection, usually S_W is singular because the total number of training data n_t is much less than the dimension of the training vector. We can overcome this problem by applying the PCA first to reduce the dimensionality of the vector. Reduced dimension of the PCA should be less than or equal to n_t -c because the number of independent vectors in S_W is at most n_t -c. We know that dimension k in the FLD subspace should satisfy the relation, k≤c-1≤l. So, we can combine them as k≤c-1≤l≤n_t -c. The FLD combined with the PCA has two consecutive projections of W_P and W_F . Final projected vector in k dimensional space is z= $W_{F}^{t}$ y=W_t _F $W_{P}^{t}$ x with the cost function:

J (W_{F}, W_{P}) = \frac{∣ W_{F}^{t} S_{B} W_{F} ∣}{∣ W_{F}^{t} S_{W} W_{F} ∣} = \frac{∣ W_{F}^{t} W_{P}^{t} S_{B 0} W_{P} W_{F} ∣}{∣ W_{F}^{t} W_{P}^{t} S_{W 0} W_{P} W_{F} ∣}, .

where S_B and S_W is between-class scatter matrix and within-class scatter matrix of training data vector y after PCA projection, respectively; and S_B0 and S_W0 is between-class scatter matrix and within-class scatter matrix of training data vector x, respectively.

4. Decision rules and performance evaluations

4.1 Decision rules

One approach for decision rule is to use Euclidean distance, which is called the nearest neighbor decision rule. We classify a test vector z as the class ĵ as following:

z \in C_{\hat{j}} if \hat{j} = \underset{j}{\arg min} ∥ z - m_{z}^{j} ∥, j = 1, \dots, c,

where

Cĵ _j is the data set of the class jĵ; ‖·‖ is Euclidean norm; and $m_{z}^{j}$ s the sample mean vector of the training data in the class j.

The statistical distance decision rule uses the discriminant function of statistical (Mahalanobis) distance: g_j (z)=(z-m j_z ) ^t ( $\sum^{^}$ ^j _zz )^-1(z- $m_{z}^{j}$ ) where $m_{z}^{j}$ is the sample mean vector of training data in the class j and $\sum^{^}$ ^j _zz is the unbiased sample covariance matrix of the training data in the class j. We classify the test vector z as the class jĵ as following:

z \in C_{\hat{j}} if \hat{j} = \underset{j}{\arg min} g_{j} (z), j = 1, \dots, c,

where Cĵ _j is the data set of the class jĵ. It is also a powerful similarity measure while being used for the case of unknown distributions as well as the Gaussian distribution of the same covariance.

Generally, for the large number of training data or Gaussian distributed test data, the statistical distance decision rule is known to be superior to the nearest neighbor decision rule [1]. However, any number of training data is allowed in the nearest neighbor decision rule while for the statistical distance decision rule, at least k+1 training data are required for each class to avoid singularity of the sample covariance matrices, where k is the dimension of the vector z.

4.2. Performance evaluations

We define the probability of correct decision P_d as:

P_{d} = \frac{Number of correct decisions}{Number of test data} .

Another parameter for the performance evaluation is the RMSE between raw vectors and reconstructed test vectors from PCA subspace as shown in Eq. (4). The RMSE is computed by averaging normalized errors over all test data:

rmse = \sqrt{\frac{1}{n_{test}} \sum_{n = 1}^{n_{test}} \frac{{∥ X_{test} (n) - {\hat{X}}_{test} (n) ∥}^{2}}{{∥ X_{test} (n) ∥}^{2}}},

{\hat{X}}_{test} (n) = m_{x} + {\hat{W}}_{p} {\hat{W}}_{p}^{t} (X_{test} (n) - m_{x}),

where n_test is the number of test data vectors; x _test (n) is the n-th test data vector, m _x is the sample mean vector computed from training data; WŴ _p is the PCA projection matrix computed by the sample mean vector m _x and the unbiased sample covariance matrix $\sum^{^}$ _xx . For the PCA projection matrix, unknown mean and covariance are estimated by the sample mean vector m _x and the unbiased sample covariance matrix $\sum^{^}$ _xx from training data, respectively:

m_{x} = \frac{1}{n_{t}} \sum_{n = 1}^{n_{t}} x (n),

{\sum^{̂}}_{x x} = \frac{1}{n_{t} - 1} \sum_{n = 1}^{n_{t}} (x (n) - m_{x}) {(x (n) - m_{x})}^{t},

where n_t is the total number of training data; and x(n) is the n-th training data vector.

5. Experimental results

In this section, we present the experimental setup and the experimental results for a number of 3D objects. Our experiments for object classification have two approaches: (1) we categorize elemental images into one of rotation angle sets for each object (car); and (2) we classify elemental images for both the object type and the rotation angle set simultaneously. During the first experiment, the number of classes, c is the same as the number of out-of-plane rotation angles sets r. For the second experiment, the number of classes of interest is o×r, where o is the number of object types.

We investigate the system performance in terms of probability of correct decision, RMSE between raw data and the reconstructed data after the PCA projection, the PCA-FLD cost function, different decision rules, and the effect of the number of training data.

5.1. Integral imaging acquisition and pre-process

Figure 1 illustrates the II optical system for capturing 3D scenes from out-of-plane rotated objects. The experimental system is composed of a micro-lenslet array, an imaging lens, and a CCD camera. The focal length of each micro lenslet is about 3 mm, the focal length of the imaging lens is 50 mm, and the f-number of the imaging lens is 2.5. The imaging lens is inserted between the lenslet array and the CCD camera due to the short focal length of the lenslets. Three types of toy cars are used in the experiments as shown in Fig. 3. The size of three toy cars is about 2.5 cm×2.5 cm×4.5 cm. The distance between the CCD camera and the imaging lens is 7.2 cm, and the distance between the micro-lenslet array and the imaging lens is 2.9 cm. One integral image corresponds to a set of 56 (7×8) elemental images and each elemental image has 140×140 pixels. Six sets of elemental images for each toy car are obtained according to 6 different out-of-plane rotation angles. We rotate objects from 30° to 45° and capture integral images by every 3° as shown in Table 1. Movies of integral images are shown in Fig. 4. In the experiments, only intensity information is used.

Let us denote an integral image (a set of elemental images) as I(r,r_i ), when the class of the toy car captured is i and the rotation angle set is r_i . The reference elemental image for alignment is chosen as the elemental image located at the center of I(1, 4). All elemental images are shifted according to the maximizing cross-correlation coefficients in Eq. (2). After alignment, each elemental image constitutes a column vector; therefore, the dimension (d) of the vector is equal to M_x ×M_y , where M_x and M_y are the size of the elemental image in x and y directions, respectively; and d is 19600 pixels (140×140).

Fig. 3. Three toy cars used in the experiments; object type 1, 2 and 3 are shown from right to left.

Download Full Size | PDF

Table 1. 6 out-of-plane rotation angles

View Table | View all tables in this article

Fig. 4. (Each 1.82 MB) Movies of II frames for out-of-plane rotated objects. (a) car 1 (13.98 MB version), (b) car 2 (13.98 MB version), (c) car 3 (13.98 MB version). [Media 1, Media 2, Media 3]

Download Full Size | PDF

5.2 Classification of the rotation angles

In the first experiment, we aim at classifying the rotation angle set of each object. By training randomly selected data from each integral image, we embed necessary information to distinguish out-of-plane rotation.

The number of classes is the same as the number of rotation angles, therefore, c=6. We have defined the number of training data for each class as n_j and the total number of training data as n_t . For training, we randomly select 7 training elemental images from each set of elemental images, so the total number of training data is 42 (n_j =7 and n_t =42). Tests are performed for all other elemental images which are not used for the training. All tests are repeated for 100 runs and the averages are computed. The same experiment is performed for all three types of objects. The dimension of the PCA subspace (l) is set at 30 and the dimension of the FLD subspace (k) is set at 2; l and k are determined heuristically when better results are produced throughout the paper. The same experiments are performed with 15 training data.

Figure 5 shows the first, second, third and last basis-images (column vectors in W_P ) for the object type 1 in the PCA subspace. The basis-images are the eigenvectors e ₁, e ₂, e ₃, and e ₃₀, which correspond to eigenvalues λ ₁, λ ₂, λ ₃ and λ ₃₀, respectively. The basis-images of larger eigenvalues demonstrate more distinguishable characteristics of training images while noisy data are present in the last basis-image.

Fig. 5. Examples of basis-images of PCA subspace (column vectors in W_P ) for rotation angle sets of experiments. c=6, n_j =15, l=30, and k=2. (a) 1st, (b) 2nd, (c) 3rd, (d) 30th basis-images.

Download Full Size | PDF

Figure 6 shows all 2 basis-images (column vectors in W_PW_F ) for the object type 1 in the PCA-FLD subspace. Each basis-image represents different features of rotation angles of the object type 1, individually.

Fig. 6. Examples of basis-images of PCA-FLD subspace (column vectors in W_PW_F ) for rotation angle sets of experiments. c=6, n_j =15, l=30, and k=2. (a) 1st, (b) 2nd basis images.

Download Full Size | PDF

In Figs. 7–8, we present the results when the number of training data is 15. Figure 7 shows average probability of correct decision (P_d ) for each type of objects over 100 runs. Figure 8 shows average RMSE for each type of object. Elemental images taken from the rotation angle set in the mid-range provides smaller RMSE because characteristics of the objects in the mid-range are more embedded during the training.

Fig. 7. Average probability of correct decision (P_d ) for each type of objects. (a) nearest neighbor decision rule, (b) statistical distance decision rule.

Download Full Size | PDF

Fig. 8. Average RMSE for each type of objects.

Download Full Size | PDF

Table 2 shows overall relationship between average probability of correct decision (P_d ), average RMSE for the PCA projection, and average of the cost function J(W_F ,W_P ) for the PCA-FLD classifier. We observe the correct decision rates depend on both of the average RMSE and average of the cost function J(W_F ,W_P ). Statistical distance decision rule shows better performance when the number of training data increases. This is because the data distribution approaches Gaussian as the number of data increases.

Table 2. Performance evaluation for each type of objects

View Table | View all tables in this article

5.3 Classification of the object type and rotation angles

In the second experiment, we determine both the object type and rotation angle set simultaneously. By training data in multiple categories which are composed of different rotation angles and different object types, we characterize the recognition system to distinguish a large variation of object types as well as a small disparity of rotation angles.

The number of classes to be considered is 18 (3×6). As in the first experiment, we randomly select 7 and 15 training data from each set of elemental images. Tests are performed for all other elemental images which are not used for training. All tests are repeated for 100 runs and the averages are computed. The dimension of the PCA subspace (l) is set at 70 and the dimension of the FLD subspace (k) is set at 4.

Figure 9 shows the first, second, third and last basis-images in the PCA subspace (column vectors in W_P ). They have similar properties as in the first experiments; however it is noted that basis images for the PCA subspace demonstrate the effects of different types of cars as well as different rotation angles. Figure 10 shows all 4 basis-images in the PCA-FLD subspace (column vectors in W_PW_F ).

We present the results when the number of training data is 15 in Fig. 11–12. Figure 11 shows average probability of correct decision over 100 runs for the nearest neighbor decision and statistical distance decision rules. Figure 12 shows average RMSE. We classify a total of 18 classes to determine both the object type (the number of different objects is 3) and the rotation angle (the number of rotation angles is 6). Table 3 shows the effect of different numbers of training data. As we use more training data, average RMSE decreases and average of the cost functions increases, resulting in better performance.

Fig. 9. Examples of basis-images of PCA subspace (column vectors in W_P ) for the second experiment. c=18, n_j =15, l=70, and k=4. (a) 1st, (b) 2nd, (c) 3rd, (d) 70th basis images.

Download Full Size | PDF

Fig. 10. Examples of basis-images of PCA-FLD subspace (column vectors in W_PW_F ) for the second experiment. c=18, n_j =15, l=70, and k=4. (a) 1st, (b) 2nd, (c) 3rd, (d) 4th basis-images.

Download Full Size | PDF

Fig. 11. Average probability of correct decision (P_d ) for all 18 classes. (a) nearest neighbor decision rule, (b) statistical distance decision rule.

Download Full Size | PDF

Fig. 12. Average RMSE for all 18 classes.

Download Full Size | PDF

Table 3. Performance evaluation for all 18 classes

View Table | View all tables in this article

6. Summary and conclusion

In this paper, we have presented a 3D object classification system using II. We use the PCA-FLD classifier combined with the nearest neighbor or the statistical distance decision rule. We have presented experimental results for 3D distortion-tolerant recognition system. We have presented experiments to classify input objects into one of possible classes. Three-dimensional out-of-plane rotation-tolerant recognition and classification using II has been demonstrated experimentally. In the future, we may develop the II object recognition system to overcome other obstacles of object recognition such as such as shifting, scaling, and illumination change.

References and links

1. R. O. Duda, P. E. Hart, and D. G. Stork, Pattern classification2nd, (Wiley Interscience, New York, 2001).

2. A. K. Jain, Fundamentals of digital image processing, (Prentice-Hall Inc., 1989).

3. C. M. Bishop, Neural networks for pattern recognition, (Oxford University Press, New Yeork, 1995).

4. A. Mahalanobis and F. Goudail, “Methods for automatic target recognition by use of electro-optic sensors: introduction to the feature Issue,” Appl. Opt. 43, 207–209 (2004). [CrossRef]

5. C. F. Olson and D. P. Huttenlocher, “Automatic target recognition by matching oriented edge pixels,” IEEE Trans. on Image processing, special issue on Automatic Target Detection and Recognition 6, 103–113 (1997).

6. P. Refregier, Noise Theory and applications to physics, (Spinger, 2004).

7. F. A. Sadjadi, ed., Selected papers on Automatic Target Recognition, (SPIE-CDROM, 1999).

8. F. A. Sadjadi, “IR Target detection using probability density functions of wavelet transform subbands,” Appl. Opt. 43, 315–323 (2004). [CrossRef] [PubMed]

9. B. Javidi, ed., Image Recognition and Classification: Algorithms, Systems, and Applications, (Marcel Dekker, New York, 2002). [CrossRef]

10. J. Rosen, “Three-dimensional joint transform correlator,” Appl. Opt. 37, 7538–7544 (1998). [CrossRef]

11. J. J. Esteve-Taboada, D. Mas, and J. Garcia, “Three-dimensional object recognition by Fourier transform profilometry,” Appl. Opt. 38, 4760–4765 (1999). [CrossRef]

12. S. Yeom and B. Javidi, “Three-dimensional object feature extraction and classification with computational holographic imaging,” Appl. Opt. 43, 442–451 (2004). [CrossRef] [PubMed]

13. Y. Frauel and B. Javidi, “Digital three-dimensional image correlation by use of computer-reconstructed integral imaging,” Appl. Opt. 41, 5488–5496 (2002). [CrossRef] [PubMed]

14. S. Kishk and B. Javidi, “Improved resolution 3D object sensing and recognition using time multiplexed computational integral imaging,” Opt. Express 11, 3528–3541 (2003), http://www.opticsexpress.org/abstract.cfm?URI=OPEX-11-26-3528. [CrossRef] [PubMed]

15. J.-H. Park, S. Jung, H. Choi, Y. Kim, and B. Lee, “Depth extraction by use of a rectangular lens array and one-dimensional elemental image modification,” Appl. Opt. 43, 4882–4895 (2004). [CrossRef] [PubMed]

16. C. Wu, A. Aggoun, M. McCormick, and S. Y. Kung, “Depth extraction from unidirectional image using a modified multi-baseline technique,” in Stereoscopic Displays and Virtual Reality Systems IX, Proc. SPIE , 4660, 135–145 (2002).

17. S.-H. Lin, S.-Y. Kung, and L.-J. Lin, “Face recognition/detection by probabilistic decision-based neural network,” IEEE Trans. on Neural Networks, special issue on Artificial Neural Network and Pattern Recognition 8, 114–132 (1997). [PubMed]

18. P. N. Belhumer, J. P. Hespanha, and D. J. Kriegman, “Eigenfaces vs. Fisherfaces: recognition using class specific linear projection,” IEEE Trans. on PAMI. 19, 711–720 (1997). [CrossRef]

19. M. J. Lyons, J. Budynek, and S. Akamatsu, “Automatic classification of single facial images,” IEEE Trans. on PAMI. 21, 1357–1362 (1999). [CrossRef]

20. A. K. Jain, R. P. W. Duin, and J. Mao, “Statistical pattern recognition: a review,” IEEE Trans. on PAMI. 22, 4–37 (2000). [CrossRef]

21. M. Egmont-petersen, D. de Ridder, and H. Handels, “Image processing with neural networks — a review,” J. patt. recog. soc. , 35, 2279–2301 (2002). [CrossRef]

22. T. Okoshi, “Three-dimensional displays,” in Proceedings of the IEEE 68, 548–564 (1980) [CrossRef]

23. F. Okano, H. Hoshino, J. Arai, and I. Yuyama, “Real-time pickup method for a three-dimensional image based on integral photography,” Appl. Opt. 36, 1598–1603 (1997). [CrossRef] [PubMed]

24. J.-S. Jang and B. Javidi, “Time-multiplexed integral imaging for 3D sensing and display,” Optics and Photonics News 15, 36–43 (2004), http://www.osa-opn.org/abstract.cfm?URI=OPN-15-4-36.

r_i	1	2	3	4	5	6
Out-of-plane rotation angle	30°	33°	36°	39°	42°	45°

Object type	Number of training data for each class (n_j )	Average P_d		Average RMSE (%)	Average J(W_F ,W_P )
Object type	Number of training data for each class (n_j )	Nearest Neighbor	Statistical Distance	Average RMSE (%)	Average J(W_F ,W_P )
1		95.1	87.7	5.3	688.2
2	7	92.1	84.7	5.2	216.5
3		91.7	86.0	4.4	208.0
1		99.0	98.8	4.5	3012.1
2	15	96.8	96.9	4.2	585.6
3		97.8	98.2	3.6	766.6

Number of training data for each class (n_j )	Average P_d		Average RMSE (%)	Average J(W_F ,W_P )
Number of training data for each class (n_j )	Nearest Neighbor	Statistical Distance	Average RMSE (%)	Average J(W_F ,W_P )
7	93.54	83.30	4.6	4.49×10⁷
15	97.73	97.72	3.9	3.86×10⁸

r_i	1	2	3	4	5	6
Out-of-plane rotation angle	30°	33°	36°	39°	42°	45°

Object type	Number of training data for each class (n_j )	Average P_d		Average RMSE (%)	Average J(W_F ,W_P )
Object type	Number of training data for each class (n_j )	Nearest Neighbor	Statistical Distance	Average RMSE (%)	Average J(W_F ,W_P )
1		95.1	87.7	5.3	688.2
2	7	92.1	84.7	5.2	216.5
3		91.7	86.0	4.4	208.0
1		99.0	98.8	4.5	3012.1
2	15	96.8	96.9	4.2	585.6
3		97.8	98.2	3.6	766.6

Three-dimensional distortion-tolerant object recognition using integral imaging

Abstract

1. Introduction

2. System description

3. PCA-FLD classifier

3.1. Principal component analysis (PCA)

3.2 Fisher linear discriminant (FLD) analysis

4. Decision rules and performance evaluations

4.1 Decision rules

4.2. Performance evaluations

5. Experimental results

5.1. Integral imaging acquisition and pre-process

5.2 Classification of the rotation angles

5.3 Classification of the object type and rotation angles

6. Summary and conclusion

References and links

Supplementary Material (6)

Cited By

Figures (12)

Tables (3)

Equations (16)

Optics Express