Abstract
In this paper, we focus on the metrological aspects of spectroscopic Mueller ellipsometry, i.e. on the uncertainty estimation of the measurement results. With the help of simulated Mueller matrices, we demonstrate that the commonly used merit functions do not return the correct uncertainty for the measurand under consideration (here shown for the relatively simple case of the geometrical parameter layer thickness for the example system of a SiO2 layer on a Si substrate). We identify the non-optimal treatment of measured and sample-induced depolarization as a reason for this discrepancy. Since depolarization results from sample properties in combination with experimental parameters, it must not be minimized during the parameter fit. Therefore, we propose a new merit function treating this issue differently: It implicitly uses the measured depolarization as a weighting parameter. It is very simple and computationally cheap. It compares for each wavelength the simulated Jones matrix elements to the measured Cloude covariance matrix: ∼$\sum\nolimits_\lambda {{\boldsymbol j}_{\textrm{sim,}\lambda }^{\dagger }{\boldsymbol H}_{\textrm{meas,}\lambda }^{+}{{\boldsymbol j}_{\textrm{sim,}\lambda }}}$. Moreover, an extension is presented which allows us to include the measurement noise in this merit function. With this, reliable statistical uncertainties can be calculated. Except for some pre-processing of the raw data, there is no additional computational cost.
© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
1. Introduction
Spectroscopic ellipsometry (SE) is a well-known technique to gain information about various kinds of samples. Its application areas cover biology, medicine, solar cell inspection, solid state physics, manufacturing control in the semiconductor industry, and many more disciplines. Its full application spectrum is outlined in the classical literature and newer textbooks [1,2,3,4,5,6].
1.1 Ellipsometric metrology
SE is also widely used for metrology on technical samples to measure, for example, layer thicknesses and structure sizes or complex refractive indices of bulk or layered materials.
However, despite its high sensitivity [e.g. 7,8], SE is hardly used for so-called traceable measurements that require “an unbroken metrological traceability chain to” … “a measurement standard” and “a documented measurement uncertainty”, which is ideally established following the Guide to the Expression of Uncertainty in Measurement (GUM) [9]. In a pioneering work, Germer et al. described a GUM-compatible uncertainty evaluation for an application of SE [10]. Later, it was shown that Bayesian uncertainty analysis should be preferred for non-linear problems [11].
So-called OCD tools (optical critical dimension, essentially based on SE) are nowadays a metrology workhorse for process development and process and quality control in front-end semiconductor manufacturing [10,12]. However, the OCD measurements of nano-scaled structures are always traced back to a CD-SEM tool (critical dimension scanning electron microscope, typically used as the in-house golden standard).
A broad database [13] and several handbooks [14] cover the ellipsometric determination of the complex refractive index of many different materials. They indicate that measurement results from different authors for one material often differ remarkably. Since the measurement uncertainties and traceability of ellipsometric measurements are usually missing, one cannot assess the quality of the data.
For another standard application of ellipsometry, the measurement of the layer or film thickness, it is noticeable and astonishing that world-wide only two national metrology institutes (NMI), ITRI (Taiwan) and PTB (Germany), hold a CMC entry for ellipsometry in the BIPM key comparison database [15]. And in fact, at PTB ellipsometric measurements of film thickness standards [16] or of the oxide layer on a high precision silicon sphere [17] are traced back to traceable X-ray reflectometry (XRR) measurements and do not rely on the measurement uncertainty evaluation of the ellipsometric measurement.
Finally, several international film thickness measurement comparisons have been performed during the last 20 years, comparing different methods like XRR, X-ray photoelectron spectroscopy (XPS), transmission electron microscopy (TEM), and partly SE as well [16,18,19,20]. Although the materials, applied methods, and results show some spread, some similarities can be observed: The SE measurement results typically show systematic offsets of the order of one to several nanometers and some linearity deviation compared e.g. to XRR measurements, while the stated uncertainties for ellipsometry are astonishingly small [18], much smaller than the observed offsets to XRR measurements [19], or are missing altogether [16,20]. One BIPM key comparison, K32, has been performed, but only one NMI (NIMT, Thailand) used a dual wavelength ellipsometer [21]; the comparison was almost exclusively dominated by XRR and XPS.
Several reasons for the observed inconsistencies have been discussed and partly proven, such as hydrocarbon films [21], interface/interdiffusion layers, interface and surface roughness, and of course the ambiguity introduced by the choice of the layer model used for the analysis of ellipsometry measurements. However, it is quite evident that there are also some open challenges in the data analysis of ellipsometric measurements, especially in the estimation of realistic measurement uncertainties. Several extensive uncertainty evaluations for different types of ellipsometers have already been discussed and published [e.g. 10,22,23,24,25]. They all essentially deal with estimations of the influence of non-ideal instrument parameters like retardation errors, misalignment, or photometric inaccuracy, just to name a few, and error propagation is used to derive the corresponding uncertainty contributions.
These are excellent and required steps toward a complete uncertainty evaluation. However, we think that we have identified another problem in ellipsometric data evaluation: the inadequate consideration and treatment of depolarization. Depolarization is additional statistical measurement information, which is included in Mueller matrices. Therefore, in the following, we will concentrate on data analysis in Mueller matrix ellipsometry (MME, also called Mueller matrix polarimetry).
Although this technique has been in use for several decades, there are still ongoing discussions on the physical interpretation of the Mueller matrix.
1.2 Interpretation of Mueller matrices
The interpretation of measured Mueller matrices is rather complex and not straightforward. To address this problem, a large number of matrix decompositions have been proposed that allow certain optical sample properties to be derived (Cloude sum decomposition [26], Lu-Chipman polar decomposition [27], normal form decomposition [28,29], symmetric decomposition [30,31], differential decomposition [32], and integral decomposition [33]).
The decomposition techniques are immensely powerful tools, but not all of them can be applied in metrology. This is particularly the case for product decompositions that postulate a sequential occurrence of optical effects which, in reality, take place in parallel. From the metrology point of view, one cannot expect correct measurement results by using mathematical models that are known to be systematically incorrect.
In this paper, we carry forward the idea of the integral decomposition, which is based on the sum decomposition. With Cloude’s sum decomposition [26], any measured depolarizing Mueller matrix can be formulated as a sum of four non-depolarizing Mueller matrices. However, this decomposition is also difficult to interpret. A question comes up: What is the physical equivalent of these four non-depolarizing Mueller matrices? Ossikovski and Arteaga [33] have pointed out that Cloude’s covariance matrix, appearing as an auxiliary quantity in Cloude’s method, can be interpreted as a statistical quantity to describe depolarization.
This interpretation is the starting point of our work. And we will end up with a merit function for Mueller ellipsometry which considers Cloude’s covariance matrix as the statistical measurement uncertainty matrix.
We are aware that a new solution for an old problem needs to be well-justified, especially as at least three different merit functions are commonly in use in the ellipsometric literature. However, we have found out that ellipsometrically achieved uncertainties are frequently incompatible with those from other measurement methods. We will show that the used merit function is of crucial importance here.
But there are also other reasons: One is that ellipsometrically achieved uncertainties are typically given as uncertainty for the mean value achieved from N different wavelengths, while other techniques usually present the uncertainty for a single measurement. Another reason is that merit functions need some a priori knowledge or estimate about the expected range of the measurands. Otherwise, the optimization procedure may yield quite wrong or even non-physical sample parameter values caused by possibly strong (anti-)correlation between different parameters. In many cases, however, this additional information is hard to acquire.
And finally, one should keep in mind that ellipsometry is an integral measurement method. In general, this means that one loses information on the sample’s parameter variances by averaging (here over the illumination spot size). But, as already mentioned before, in Mueller ellipsometry one implicitly also measures some statistical distribution parameters resulting from the stochastic effect of depolarization. In the next sections, we demonstrate how to use them beneficially.
2. Motivation for a new merit function
In ellipsometry, optical and/or geometrical sample parameters (p1,…,pn)=p are measured indirectly. This means that a measured quantity, Ymeas, does not offer direct access to p but must be compared with the corresponding simulated quantity Ysim(p), and the best fitting parameters, $\hat{{\boldsymbol p}}$, have to be determined by numerical optimization techniques. With the standard approach in regression, the least-squares method, optimization is performed by minimizing a merit function of the squared deviations between experimental and simulated test data. Typical ellipsometric measurands are the Mueller matrix M, and for isotropic and homogeneous samples, the ellipsometric angles Ψ and Δ as well as the coefficient ratio ρ.
In the non-depolarizing case, these quantities can be derived from the Jones formalism: The two-dimensional electric field vector in p- and s-co-ordinates of an outgoing beam, Eout, after reflection is linked to the field of the incoming beam, Ein, via the Jones matrix J of the sample:
$${{\boldsymbol E}_{\textrm{out}}} = {\boldsymbol J}\,{{\boldsymbol E}_{\textrm{in}}}$$
Here, the Jones matrix is a complex 2 × 2 matrix. For isotropic samples, J is diagonal with
$${\boldsymbol J} = \left( {\begin{array}{cc} {{r_p}}&0\\ 0&{{r_s}} \end{array}} \right)$$
The reflection coefficients, rp and rs, can be calculated using the Fresnel equations, which are an analytical solution of Maxwell’s equations for planar layers. In the general case (e.g. for structured samples), J must be calculated numerically with so-called Maxwell solvers.
For isotropic samples, the measurands ρ, Ψ, and Δ are connected to the Fresnel coefficients by the fundamental equation of ellipsometry:
$$\rho = \frac{{{r_p}}}{{{r_s}}} = \tan \Psi \,{e^{i\Delta }}$$
Now that the typical ellipsometric measurands M, ρ, Ψ, and Δ have been linked to the Fresnel coefficients, which can be simulated, the following three squared error (SE) functions can be formulated:
These functions are commonly used in least squares parameter optimization. They do not include an experimental error estimate as weights for the measurands. In [34], Jellison has discussed why one should not use an unweighted least squares function (he calls them unbiased functions) as a merit function in regression. Besides, it is not recommended to use SEΨΔ (not even with weights) as a merit function because of its incorrect metric. The point is that the sensitivity on Δ decreases when Ψ approaches zero. From the experimental point of view, this is very easy to understand: When there is zero intensity of p-polarized light (dielectric sample measured at the Brewster angle → rp=0), there is no phase information and hence no phase difference (i.e. Δ)-sensitivity. (Remark: Therefore, accuracy specifications, dΨ and dΔ, for ellipsometers without a given nominal value pair (Ψ, Δ) are useless).
Contrary to SEΨΔ, SEρ and SEM do account for that, though in different ways: When using SEρ, Ψ and Δ are transformed to a vector in the complex plane with the radius tan Ψ and the angle Δ. With SEM, one implicitly interprets Ψ and Δ as spherical co-ordinates and uses their Cartesian equivalents N, S, and C. Therefore, in both interpretations (SEρ and SEM), a distance function (metric) is well defined and used correctly. They are, however, defined in different regimes—SEρ operates more or less on the Jones matrix (E-fields), while SEM acts on the Mueller matrix (intensities).
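The loss of Δ-sensitivity for small Ψ can be illustrated with a minimal sketch. It assumes the common definitions ρ = tan Ψ·exp(iΔ), N = cos 2Ψ, S = sin 2Ψ sin Δ, and C = sin 2Ψ cos Δ; the function names and the chosen angle values are illustrative only and are not taken from the paper.

```python
import cmath
import math


def rho(psi_deg, delta_deg):
    """Complex ratio rho = tan(Psi) * exp(i * Delta)."""
    psi, delta = math.radians(psi_deg), math.radians(delta_deg)
    return math.tan(psi) * cmath.exp(1j * delta)


def nsc(psi_deg, delta_deg):
    """Cartesian equivalents (N, S, C) of the angle pair (Psi, Delta)."""
    psi, delta = math.radians(psi_deg), math.radians(delta_deg)
    return (math.cos(2 * psi),
            math.sin(2 * psi) * math.sin(delta),
            math.sin(2 * psi) * math.cos(delta))


def distances(psi_deg, delta_deg, d_delta_deg):
    """Distance caused by a shift of Delta, in the rho and in the NSC metric."""
    d_rho = abs(rho(psi_deg, delta_deg + d_delta_deg) - rho(psi_deg, delta_deg))
    d_nsc = math.dist(nsc(psi_deg, delta_deg + d_delta_deg),
                      nsc(psi_deg, delta_deg))
    return d_rho, d_nsc


# The same 5 degree error in Delta, once close to the Brewster condition
# (Psi -> 0) and once at a moderate Psi.
near_brewster = distances(0.5, 90.0, 5.0)
mid_range = distances(30.0, 90.0, 5.0)
print("Psi = 0.5 deg:", near_brewster)
print("Psi = 30 deg: ", mid_range)
```

In both metrics the identical Δ error contributes far less when Ψ is small, whereas a plain sum of squared angle differences would count it equally in both cases.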
The different metrics mean that the merit functions apply different inherent spectral weights. So, the question is: which one of the merit functions (Eqs. (8)–(10)) is correct? They cannot all be valid at the same time. Only with perfect measurement data and a perfect model (the best fitting parameters $\hat{{\boldsymbol p}}$ are identical for all wavelengths) does one obtain the same best-fit values with all three merit functions. In this case, different weights do not change the best-fit result. In practice, however, non-perfect samples (and models) are always present.
2.1 A virtual experiment to test the merit functions
A further question becomes relevant: Does any of these merit functions return correct parameter uncertainties, especially if we have sample-induced depolarization? To answer this, we have applied the different merit functions to simulated measurement data. In this way, one can exclude all experimental errors and has full control over all parameters. The virtual experiment and sample are shown in Fig. 1.
We have simulated Mueller ellipsometry measurements on a silicon wafer covered with natural oxide at an incident angle of θ=70°, which is close to the Brewster angle of Si. The dispersion data for Si and SiO2 has been taken from [35] and [14], respectively, for a spectral range of 190–990 nm.
The oxide layer featured a defined thickness variance and therefore the Mueller matrix for each wavelength has been generated in a statistical procedure. We
- 1. generated a large set ($n \sim 10^5$) of Gaussian distributed heights for SiO2 with a mean of h0 and a variance of $\sigma _0^2$, i.e. ${{\textrm{H}}_{{\textrm{SiO}}2}}{\sim }{{\cal N}}({{h_0},\sigma_0^2} )$,
- 2. used the Fresnel equations and calculated the Jones matrix Ji with the given experimental parameters for each ${h_i} \in {{\textrm{H}}_{{\textrm{SiO}}2}}$ with Eq. (2),
- 3. built the Mueller matrices Mi from all Ji by applying Eq. (5),
- 4. calculated the mean Mueller matrix $\bar{{\boldsymbol M}}: = {n^{ - 1}}\sum\nolimits_{i = 1}^n {{{\boldsymbol M}_i}} $ and redefined it by normalization to its (1,1)-element: ${\boldsymbol M}: = \bar{{\boldsymbol M}}/{\bar{M}_{1,1}}$.
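The four steps can be sketched for a single wavelength as follows. This is a minimal illustration under stated assumptions, not the simulation code used for the paper: the optical constants are fixed illustrative values at 633 nm rather than the dispersion data of [35] and [14], the refractive index convention is n + ik with the matching phase factor exp(+2iβ), the matrix `A` is one common convention for converting Jones to Mueller matrices, the ensemble size is reduced for speed, and all function names are hypothetical.

```python
import numpy as np

# Illustrative single-wavelength constants (assumed; n + ik convention).
WAVELENGTH_NM = 633.0
N_AIR, N_SIO2, N_SI = 1.0, 1.457, 3.88 + 0.02j
THETA = np.deg2rad(70.0)

# Maps the Kronecker product J (x) J* onto the Mueller matrix (one convention).
A = np.array([[1, 0, 0, 1], [1, 0, 0, -1], [0, 1, 1, 0], [0, 1j, -1j, 0]])
A_INV = np.linalg.inv(A)


def interface(n1, n2, c1, c2):
    """Fresnel reflection coefficients (r_p, r_s) of a single interface."""
    r_p = (n2 * c1 - n1 * c2) / (n2 * c1 + n1 * c2)
    r_s = (n1 * c1 - n2 * c2) / (n1 * c1 + n2 * c2)
    return r_p, r_s


def jones_layer(h_nm):
    """Diagonal Jones matrix of the stack air / SiO2 (height h) / Si."""
    s0 = N_AIR * np.sin(THETA)
    c0 = np.cos(THETA)
    c1 = np.sqrt(1 - (s0 / N_SIO2) ** 2 + 0j)
    c2 = np.sqrt(1 - (s0 / N_SI) ** 2 + 0j)
    r01 = interface(N_AIR, N_SIO2, c0, c1)
    r12 = interface(N_SIO2, N_SI, c1, c2)
    phase = np.exp(2j * (2 * np.pi * h_nm / WAVELENGTH_NM) * N_SIO2 * c1)
    r_p, r_s = ((a + b * phase) / (1 + a * b * phase) for a, b in zip(r01, r12))
    return np.diag([r_p, r_s])


def mueller_from_jones(J):
    """Non-depolarizing Mueller matrix that belongs to the Jones matrix J."""
    return np.real(A @ np.kron(J, J.conj()) @ A_INV)


def simulate_mean_mueller(h0_nm, sigma0_nm, n=5000, seed=0):
    """Steps 1 to 4: draw heights, average the Mueller matrices, normalize."""
    rng = np.random.default_rng(seed)
    heights = rng.normal(h0_nm, sigma0_nm, size=n)               # step 1
    matrices = [mueller_from_jones(jones_layer(h)) for h in heights]  # steps 2, 3
    M_mean = np.mean(matrices, axis=0)                           # step 4
    return M_mean / M_mean[0, 0]


def polarization_index(M):
    """Depolarization index; equals 1 for a non-depolarizing Mueller matrix."""
    return np.sqrt((np.sum(M ** 2) - M[0, 0] ** 2) / 3.0) / M[0, 0]


M = simulate_mean_mueller(2.0, 0.1)
print(np.round(M, 4))
print("1 - polarization index:", 1.0 - polarization_index(M))
```

Averaging on the level of Mueller matrices (intensities) rather than Jones matrices (fields) is what introduces the depolarization: the thickness spread shows up as a polarization index slightly below one.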
With the generated virtual measurement data, one can test the merit functions (8)–(10). So, we solve the problem:
The results for two examples are summarized in Table 1. Here, ${\hat{\sigma }_{\textrm{h}}}$ is the square root of the variance of the best-fit result. The variance or, in the general multivariate case, the covariance matrix can be calculated as
Table 1 shows that all merit functions give back exactly the correct SiO2 thickness. As mentioned above, this is not astonishing and should not be misinterpreted: the simulation was especially constructed in a way that for each wavelength, the optimum is reached at the nominal height of h0 and therefore the applied different inherent spectral weights have no influence on the best-fit result (an example where this does not hold and thus the different merit functions give different results is given in Appendix 1). Things look different, however, when looking at the uncertainties.
We expect ${\hat{\sigma }_h} \cdot \sqrt {N - \textrm{dof}} $ to equal ${\sigma _0}$. However, none of the merit functions (Eqs. (8)–(10)) gives back the correct result. Hence, it is pointless to discuss the differences among the results. One might argue that it is not reasonable to compare these equations without any experimental error estimate. But here, in this ideal simulation, we do not have any experimental errors and especially no spectrally dependent errors.
From our point of view, a merit function is needed that can recover the (given) sample parameter uncertainties correctly. We will now derive a merit function that fulfills our needs. Later, we will extend it to include also experimental measurement errors.
A final remark on this section: Fig. 2 (left) shows the polarization index
3. The new merit function
3.1 Derivation
We now derive a merit function that implicitly uses the measured sample depolarization as an error estimate, thereby overcoming the problems shown in the last section when using Eqs. (8)–(10).
We start with some general and surely well-known basic definitions. Suppose N Gaussian distributed quantities xm (with dimension n) have been measured independently. For each measurand, the uncertainty shall be given as a covariance matrix, Σx. In an optimization process, the measurands shall be compared to simulated values xs(p) to derive the best-fitting model parameter vector $\hat{{\boldsymbol p}}$. Next, the likelihood function can be formulated as the product of the N probability density functions:
The question is how to apply Eq. (17) in ellipsometry. Obviously, the summation must run over different configurations, differing in terms of the wavelength and/or angle. But what about the quantities ${\boldsymbol \Sigma }$, ${{\boldsymbol x}_m}$, and ${{\boldsymbol x}_s}$ in ${\chi ^2}$’s kernel
H is Hermitian. According to Cloude’s original paper [26], H can also be eigen-decomposed to:
With the same argumentation used above, one can now extract also ${{\boldsymbol j}_{2,\ldots ,4}}$ from H and define H2,…,4 accordingly. Combining all terms $K_{1,\ldots ,4}^2$ leads to
Because of ${\boldsymbol j}_s^{\dagger} {{\boldsymbol H}^ + }{{\boldsymbol j}_s} = {\textrm{tr}}({{{\boldsymbol j}_s}{\boldsymbol j}_s^{\dagger} {{\boldsymbol H}^ + }} )= {\textrm{tr}}({{{\boldsymbol H}_s}{{\boldsymbol H}^ + }} )$, the term can also be interpreted as a comparison between a simulated and measured Cloude covariance matrix. For a better understanding, the quantities are sketched graphically in Fig. 3 in a simplified way.
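The trace identity can be checked numerically with a small sketch. It assumes that the Jones matrix is stacked row-wise into the vector j and that the covariance matrix is formed as the ensemble average of j j†; the ensemble itself (a diagonal Jones matrix with a fluctuating phase) is made up purely for illustration, and the function names are hypothetical.

```python
import numpy as np


def jones_vector(J):
    """Stack the 2x2 Jones matrix row-wise into the 4-element vector j."""
    return np.asarray(J, dtype=complex).reshape(4)


def cloude_covariance(jones_matrices):
    """H = <j j^dagger>, averaged over an ensemble of Jones matrices."""
    js = np.array([jones_vector(J) for J in jones_matrices])
    return np.einsum("ni,nj->ij", js, js.conj()) / len(js)


def kernel(j_s, H):
    """Quadratic form j_s^dagger H^+ j_s with the Moore-Penrose pseudo-inverse."""
    return np.real(j_s.conj() @ np.linalg.pinv(H, hermitian=True) @ j_s)


# Made-up ensemble: isotropic sample whose r_p phase fluctuates slightly.
rng = np.random.default_rng(0)
phases = rng.normal(1.0, 0.05, size=2000)
ensemble = [np.diag([0.3 * np.exp(1j * p), 0.8]) for p in phases]

H = cloude_covariance(ensemble)                     # "measured" covariance
j_s = jones_vector(np.diag([0.3 * np.exp(1j * 1.0), 0.8]))  # "simulated" j
H_s = np.outer(j_s, j_s.conj())                     # simulated covariance

lhs = kernel(j_s, H)
rhs = np.real(np.trace(H_s @ np.linalg.pinv(H, hermitian=True)))
print(lhs, rhs)
```

The pseudo-inverse is used because H is singular for an isotropic sample: the rows and columns that belong to the vanishing off-diagonal Jones elements are zero.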
Finally, we end up with:
With the merit function SEH=${\chi ^2}$ from Eq. (26), we indeed found the solution for our initial examples (see Table 2). Of course, we also confirmed that the merit function works equally well with other thicknesses and uncertainties. Moreover, we have checked that the results are correct for each single wavelength. From that, we heuristically conclude that the inherent spectral weighting is correct as well.
In Appendix 1, we give an additional example of an absorbing layer.
3.2 Extension to include measurement noise
We have demonstrated the new merit function using simulated data. Now we discuss its application to real data. There is a practical problem: The inversion of Cloude’s covariance matrix in Eq. (26) may cause some difficulties. With the simulated data, we had a Cloude covariance matrix with two eigenvalues equal to zero. We overcame the problem of the singular matrix inversion by using the pseudo-inverse matrix. In practice and especially in an important application of ellipsometry, the determination of isotropic layers, one often has to deal with eigenvalues close to zero. Such matrices are ill-conditioned for inversion.
Two solutions are quite simple: 1) If the sample is known a priori to be isotropic, one can neglect the zero off-diagonal elements of the Jones matrix and reduce the 4 × 1 j-vector to a 2 × 1 vector and, equivalently, the 4 × 4 H-matrix to a 2 × 2 matrix. 2) One can apply some matrix regularization. For example, one can use Tikhonov regularization [38] and add a small number (e.g. $10^{-8}$) to all eigenvalues before they are inverted. So, effectively, one adds some noise to get a more stable solution.
Both methods are easy to implement and well established, and therefore, they shall not be discussed here any further. Instead, we propose another method which will allow us to incorporate statistical measurement noise as well. To derive it, we first take a closer look at the measured Mueller matrices for isotropic samples. It turns out that there are not only two small eigenvalues, but sometimes even tiny negative eigenvalues. These are unphysical results, as Cloude pointed out in [26] (indicating e.g. a polarization index larger than one or negative intensities and so on). So, he proposed what we will call “Cloude-filtering” here: Mueller matrices are transformed to Cloude covariance matrices. Physically realizable Mueller matrices correspond to positive semidefinite Cloude covariance matrices. If negative eigenvalues occur, they are set to zero and the corrected covariance matrix is transformed back into a then valid Mueller matrix. Note: The so filtered covariance matrix is the nearest positive semidefinite matrix in the Frobenius norm to the original matrix [39].
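The eigenvalue clipping at the core of Cloude-filtering can be sketched on the level of the covariance matrix. The transformation between Mueller matrix and covariance matrix is omitted here, and the test matrix with one slightly negative eigenvalue is constructed artificially; the function name is hypothetical.

```python
import numpy as np


def cloude_filter_covariance(H):
    """Set negative eigenvalues of a Hermitian matrix to zero."""
    H = 0.5 * (H + H.conj().T)          # remove rounding asymmetries
    w, V = np.linalg.eigh(H)            # real eigenvalues, unitary V
    w_clipped = np.clip(w, 0.0, None)   # negative eigenvalues -> 0
    return (V * w_clipped) @ V.conj().T


# Artificial example: known eigenvalues in a random unitary basis.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4)))
eigenvalues = np.array([1.2, 0.8, 0.01, -0.01])
H_raw = (Q * eigenvalues) @ Q.conj().T

H_psd = cloude_filter_covariance(H_raw)
print(np.round(np.linalg.eigvalsh(H_psd), 6))
print("Frobenius distance:", np.linalg.norm(H_psd - H_raw))
```

In this example the Frobenius distance between the raw and the filtered matrix equals the magnitude of the removed eigenvalue, in line with the nearest-matrix property stated above.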
But a general question remains: How come there is seemingly no (or even “negative”) noise in one or more eigenvector directions when measurements are always noisy?
This can be explained by the fact that a Mueller matrix is not measured directly but calculated from valid and (slightly) invalid raw data (here from Fourier coefficients describing the polarization ellipse measured at the detector when rotating the analyzer; see Appendix 3). Invalid data can be caused by statistical noise or systematic measurement errors (e.g. owing to non-linear detector response, wrong detector offset correction, and so on). While systematic errors should preferably be addressed by technical (hardware) improvements, we focus on the treatment of statistical noise. The proposed method, then, is straightforward and describes the propagation of the measurement noise on the Cloude covariance matrix H in the merit function:
- 1. Suppose raw data evaluation gives ${\boldsymbol m} = {({{M_{1,2}},\ldots ,{M_{4,4}}} )^{\textrm{T}}}$, the 15 best-fitting Mueller matrix elements as a vector and the associated covariance matrix ${{\boldsymbol \Sigma }_{\boldsymbol m}}$ (see Appendix 3 for the calculation of m and ${{\boldsymbol \Sigma }_{\boldsymbol m}}$).
- 2. Generate a large set (n ∼ some thousands) of Mueller matrix element vectors with $\{ {{\boldsymbol m}_1},\ldots ,{{\boldsymbol m}_n}\} {\sim }{{\cal N}}({{\boldsymbol m},{{\boldsymbol \Sigma }_{\boldsymbol m}}} )$ and build the corresponding Mueller matrices $\{ {{\boldsymbol M}_1},\ldots ,{{\boldsymbol M}_n}\} $ from them.
- 3. Cloude-filter these Mueller matrices: ${{\boldsymbol M}^{\boldsymbol \prime}}_{1,\ldots ,n}^{}: = {\textrm{CF}}({{{\boldsymbol M}_{\textrm{1,}\ldots \textrm{,}n}}} )$.
- 4. Renormalize the filtered matrices to their respective (1,1)-element: ${{\boldsymbol M}^{\boldsymbol \prime\prime}}_{1,\ldots ,n}: = {\boldsymbol M}^{\boldsymbol \prime}_{1,\ldots ,n}/M^{\prime}_{1,\ldots ,n;1,1}$.
- 5. Build the mean matrix: ${{\boldsymbol M}_{{\textrm{CF}}}}: = {n^{ - 1}}\sum\nolimits_{k = 1}^n {{\boldsymbol M}^{\boldsymbol \prime\prime}}_{k} $.
- 6. Calculate its Cloude covariance matrix ${{\boldsymbol H}_{{\textrm{CF}}}}$ and use ${\boldsymbol j}_{\textrm{s}}^{\dagger} {\boldsymbol H}_{{\textrm{CF}}}^ + {{\boldsymbol j}_{\textrm{s}}}$ in the merit function (26) or (27), respectively.
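The six steps above can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the vector m is taken to hold the 15 elements M12 to M44 in row-major order with M11 = 1, the matrix `A` and the index reshuffle are one common convention for relating Mueller and Cloude covariance matrices (with tr H = 2·M11), the isotropic test matrix with Ψ = 30° and Δ = 100° is arbitrary, and all function names are hypothetical.

```python
import numpy as np

A = np.array([[1, 0, 0, 1], [1, 0, 0, -1], [0, 1, 1, 0], [0, 1j, -1j, 0]])
A_INV = np.linalg.inv(A)


def reshuffle(X):
    """Swap the inner index pair; converts between J (x) J* and j j^dagger."""
    return X.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(4, 4)


def mueller_to_cloude(M):
    H = reshuffle(A_INV @ M @ A)
    return 0.5 * (H + H.conj().T)


def cloude_to_mueller(H):
    return np.real(A @ reshuffle(H) @ A_INV)


def cloude_filter(M):
    """Clip negative eigenvalues of H and transform back to a Mueller matrix."""
    w, V = np.linalg.eigh(mueller_to_cloude(M))
    return cloude_to_mueller((V * np.clip(w, 0.0, None)) @ V.conj().T)


def noise_propagated_covariance(m, Sigma_m, n=5000, seed=0):
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(m, Sigma_m, size=n)      # step 2
    M_sum = np.zeros((4, 4))
    for m_k in samples:
        M_k = np.concatenate(([1.0], m_k)).reshape(4, 4)       # M11 = 1
        M_f = cloude_filter(M_k)                               # step 3
        M_sum += M_f / M_f[0, 0]                               # step 4
    return mueller_to_cloude(M_sum / n)                        # steps 5 and 6


def isotropic_mueller(psi_deg, delta_deg):
    """Ideal, non-depolarizing Mueller matrix of an isotropic sample."""
    psi, delta = np.deg2rad(psi_deg), np.deg2rad(delta_deg)
    N = np.cos(2 * psi)
    S, C = np.sin(2 * psi) * np.sin(delta), np.sin(2 * psi) * np.cos(delta)
    return np.array([[1, -N, 0, 0], [-N, 1, 0, 0], [0, 0, C, S], [0, 0, -S, C]])


M0 = isotropic_mueller(30.0, 100.0)
m = M0.reshape(16)[1:]                                         # step 1 (assumed given)
H_cf = noise_propagated_covariance(m, 0.01 ** 2 * np.eye(15), n=500)
print(np.round(np.linalg.eigvalsh(H_cf), 5))
```

For the noise-free matrix only one eigenvalue of H differs from zero. After the noise propagation all four eigenvalues are positive, so the inversion in the merit function is regularized by an amount that depends on the measurement noise.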
With decreasing measurement noise, ${{\boldsymbol H}_{{\textrm{CF}}}}$ approximates the original H asymptotically. Experimentally, this can be achieved with an increasing number of measurements. This lowers the mean Mueller matrix uncertainty (${{\boldsymbol \Sigma }_{\boldsymbol m}}\sim 1/\textrm{N}$). Also, in the limiting case, when there is only noise, the method behaves as desired—it leads to seemingly full depolarization.
We demonstrate the application of this method to our initial example for the 2.0 ± 0.1 nm-thick SiO2 layer on Si. To this end, we generate different noise levels. For reasons of simplicity, they shall only depend on one scalar parameter, σm. We define: ${{\boldsymbol \Sigma }_{\boldsymbol m}}({\sigma _{\boldsymbol m}}) = \sigma _{\boldsymbol m}^2{{\textrm{I}}_{15}}$ with I15 being the 15 × 15 identity matrix. We do this for all wavelengths. The parameter of interest is now the recovered SiO2-height uncertainty as a function of the noise level. The result is shown in Fig. 4.
As can be seen, and as described above, in the no-noise limit one recovers the nominal sample uncertainty (here 0.1 nm), while the recovered uncertainty increases with the raw data noise. Although everything is plausible, we do not and cannot claim that the raw data error propagation on the parameter uncertainty is fully correct. Gaussian error propagation cannot be used here as the condition of unconstrained variables is not fulfilled owing to the eigenvalues being restricted to the interval [0,2]. A (more) correct approach would be to use Bayesian statistics. But this would be extremely costly because for each used wavelength, a posterior distribution of each Mueller matrix element would have to be calculated numerically before its expected value could be determined. The proposed method, on the other hand, is computationally cheap. For example, the generation and Cloude-filtering of 5,000 Mueller matrices is done in less than 0.2 s on a modern machine (without parallelization). Please note: This only needs to be done once as part of raw data processing.
Effectively, the method adds noise to the covariance matrix. In this way, the matrix inversion problem with which we started this section is solved by a measurement noise-dependent regularization. The inclusion of statistical measurement noise is an important improvement when aiming at quantitative ellipsometry.
4. Application on measured data
4.1 Data evaluation tools
Before we apply our method in the next section to real data, we define some helpful data evaluation tools.
So far, we have derived the tools to process ellipsometric raw data to a covariance matrix that includes the sample’s polarization properties and the statistical noise of the experiment. Furthermore, a χ2- and a likelihood function for data analysis have been defined. With the extension of the last section, the functions are now given by
As shown in Sections 2 and 3, the averaged covariance is of high relevance as it contains the physical parameter uncertainties. Therefore, we define the average likelihood function as
- • apply the maximum likelihood method: $\hat{{\boldsymbol p}} = \arg \mathop {\max }\limits_{\boldsymbol p} {\bar{{\cal L}}}({\boldsymbol p})$,
- • calculate the posterior distribution (πo, “out”) from a given prior distribution (πi, “in”) with the help of Bayesian statistics:
- • and define the Bayesian information criterion [40] to relatively compare fit models with each other:$${\textrm{BIC}} ={-} 2\ln ({{\cal L}}(\hat{{\boldsymbol p}})) + k\ln (N) ={-} 2N\ln ({\bar{{\cal L}}}(\hat{{\boldsymbol p}})) + k\ln (N)$$
(models with lower values are preferred; k is the number of free parameters). The information criterion balances fit optimality (first summand) against model complexity (second summand). As a result, it guards against both under- and over-fitting.
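The model comparison by means of the information criterion can be sketched in a few lines. The log-likelihood values below are hypothetical placeholders chosen only to show the mechanism; they are not results from the paper, and the function names are illustrative.

```python
import math


def bic(log_likelihood, k, n):
    """BIC = -2 ln(L) + k ln(N)."""
    return -2.0 * log_likelihood + k * math.log(n)


def bic_from_mean_likelihood(mean_log_likelihood, k, n):
    """Same criterion, written with the averaged likelihood: ln(L) = N ln(L_bar)."""
    return -2.0 * n * mean_log_likelihood + k * math.log(n)


N = 940  # number of wavelengths

# Hypothetical maximum log-likelihoods of three nested layer models.
models = {
    "one layer (k=1)": (-570.0, 1),
    "two layers (k=2)": (-560.0, 2),
    "three layers (k=3)": (-559.5, 3),
}
scores = {name: bic(ln_l, k, N) for name, (ln_l, k) in models.items()}
best = min(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: BIC = {score:.2f}")
print("preferred:", best)
```

With these placeholder numbers, the second layer improves the fit by more than its penalty of ln(N), whereas the third layer does not, so the two-layer model is preferred.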
4.2 Examples
To demonstrate how the derived method works on real measurement data, we performed a Mueller ellipsometric measurement on a standard crystalline silicon wafer to determine the thickness of its native oxide layer. As in the simulation examples presented before, a spectral range of 190–1000 nm at N = 940 discrete wavelengths and an angle of incidence of 70° were chosen. Dispersion data were taken from [41] for SiO2 and from [35] for Si.
4.2.1 Single-layer model
Figure 5 shows the spectral contributions to the minimal χ2 value and their distribution being calculated according to Eq. (28). The χ2 values are close to 1 over the whole spectrum, indicating a good agreement of model and measurement. In the DUV-range, the values increase for the following reasons: the intensity of the ellipsometer’s light source and the detector’s sensitivity are lower than in VIS and NIR; and short wavelengths are more sensitive to roughness-induced depolarization. This results in higher noise levels. The χ2 distribution is of a typical shape.
We get ${\chi ^2}({\hat{p}} )/N$ = 1.210 (BICred: 1145.72); this is slightly better than using Palik’s dispersion data [14] for fused SiO2, which gives 1.213 (BICred: 1146.90).
4.2.2 Models with additional interlayers
With the help of the Bayesian information criterion, one can test if it is reasonable to add an interface layer between the oxide and the silicon. We checked it for SiO [14] and a Bruggeman effective medium layer (EMA) [42]. The latter was chosen as a 50:50 mixture of SiO2 and Si to represent a first-order approximation of the gradual transition from silicon to its oxide. With the SiO layer, we got the best result: BICred:1128.85 instead of BICred:1130.60 for the EMA layer. With an additional free height parameter for an EMA layer at the air/SiO2 interface, the information criterion gets worse again (BICred: 1129.12). In this case, the problem is said to be overfitted, while it is underfitted with just one layer. In Table 3, all the results are summarized.
The results of this little model comparison are quite physical and plausible: The latest dielectric parameters for SiO2 from Rodríguez-de Marcos et al. fit better than the older Palik data, and the interface between SiO2 and Si can be described best with SiO instead of an (SiO2/Si) EMA layer.
Now, we focus on the thickness results for the different models. They are also given in Table 3. The best one-layer model yields an oxide height of 2.18 nm with an average standard deviation of 0.38 nm. This is a plausible result, especially when comparing the larger measured depolarization, as indicated by the polarization index and the entropy, to that in the simulation example for h = 2 ± 0.1 nm (Fig. 2 and Fig. 6). When adding interface layers, however, high (anti-)correlations with large single height uncertainties occur. So, while the overall height is quite constant, the layer composition is unclear. Moreover, another problem becomes obvious: The uncertainty ranges also cover negative layer heights, which, of course, are unphysical. So, the Gaussian approximation (which in general assumes unconstrained parameters) cannot be applied here. We propose to use Bayesian statistics according to Eq. (31).
We do so by sampling the posterior distribution numerically with Markov chain Monte Carlo (MCMC) using the random walk Metropolis algorithm [43,44]. Details of the technique used here are described in [45,46], along with their references. Since no external a priori information was available, uniform height distributions have been used as prior πi.
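The sampling scheme can be illustrated with a minimal random walk Metropolis sketch. It does not use the likelihood function of the paper: the Gaussian toy likelihood, its parameters (borrowed from the single-layer result of 2.18 nm and 0.38 nm purely for illustration), the proposal step width, the burn-in length, and the function names are all assumptions. The uniform prior is restricted to non-negative heights.

```python
import math
import random
import statistics


def log_posterior(h, h_hat=2.18, sigma=0.38):
    """Toy log-posterior: Gaussian likelihood times a uniform prior on h >= 0."""
    if h < 0.0:
        return -math.inf            # the prior excludes negative layer heights
    return -0.5 * ((h - h_hat) / sigma) ** 2


def metropolis(log_post, start, step, n_samples, seed=1):
    """Random walk Metropolis sampler for a one-dimensional posterior."""
    rng = random.Random(seed)
    x, lp = start, log_post(start)
    chain = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)       # symmetric random walk step
        lp_new = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if lp_new >= lp or rng.random() < math.exp(lp_new - lp):
            x, lp = proposal, lp_new
        chain.append(x)
    return chain


chain = metropolis(log_posterior, start=2.0, step=0.5, n_samples=20000)
samples = chain[1000:]                            # discard burn-in
print("mean:", statistics.fmean(samples))
print("std: ", statistics.stdev(samples))
```

Because the prior enters only through the log-posterior, physical constraints such as non-negative layer heights are respected by construction, which is not the case for a Gaussian approximation around the best-fit value.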
In Fig. 7 and Fig. 8 the posterior distributions are shown for the single-layer model (with SiO2 dispersion data from [41]) and for the two-layer model (with SiO2 data from [41] and SiO data from [14]) obtained with the mean likelihood function ${\bar{{\cal L}}}({\boldsymbol p})$ from Eq. (30). For the single-layer model, one recognizes a nearly Gaussian distribution. Its expected value and standard deviation (2.18 ± 0.35 nm) are in good agreement with the optimization results given in Table 3. The slight deviation for the standard deviation can be explained with the limited number of sample points (here 20,000).
For the two-layer model, however, one gets a non-symmetric, non-Gaussian posterior distribution for both layer thicknesses (Fig. 8(a) and 8(b)). Also, a high (anti-)correlation between the two heights can be observed (Fig. 8(c)). It is not trivial to express such a result in numbers. The expected values for the layer heights (E(hSiO2) = 1.18 nm and E(hSiO) = 0.82 nm, respectively), for example, are of limited relevance. They agree neither with the most likely values obtained from the optimization nor with the maximum of the posterior distribution. The standard deviations (σ(hSiO2) = 0.67 nm, σ(hSiO) = 0.52 nm, correlation coefficient: −0.87) depend on the expected values, so they are not very meaningful either. So, one has to accept and interpret the whole posterior distribution as the final result.
Instead of ${\bar{{\cal L}}}({\boldsymbol p})$ (Eq. (30)), ${{\cal L}}({\boldsymbol p})$ (Eq. (29)) can be used in the Bayesian formula (Eq. (31)) to calculate the variations of the mean thicknesses of the measurements at N = 940 different wavelengths. They are given as posterior distribution in Fig. 9. Apparently, a Gaussian approximation would be appropriate: The posterior is distributed according to the best-fit values $\hat{{\boldsymbol p}}$, the statistical uncertainty ${\hat{{\boldsymbol \sigma }}_{\boldsymbol p}}$, and the correlations ${\textrm{Corr}}(\hat{{\boldsymbol p}})$, all obtained with the optimization method (Table 3). But please keep in mind: The deviation of the mean thicknesses is not a measure of the layer thickness uncertainty; it usually decreases with the number of measurements.
However, depending on the intended application of the measurement, the mean thickness might be the measurand of interest.
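The difference between the spread of the thickness and the uncertainty of its mean can be illustrated with a small numerical sketch. It assumes, purely for illustration, independent and normally distributed thickness estimates with hypothetical values (2.0 nm mean, 0.1 nm spread), which is a simplification of the situation described above.

```python
import numpy as np

rng = np.random.default_rng(1)
h_true, sigma_h = 2.0, 0.1   # hypothetical thickness and spread in nm

results = {}
for n in (10, 100, 940):
    h = rng.normal(h_true, sigma_h, size=n)   # one toy estimate per wavelength
    spread = h.std(ddof=1)                    # estimates sigma_h, independent of n
    std_of_mean = spread / np.sqrt(n)         # shrinks proportionally to 1/sqrt(n)
    results[n] = (spread, std_of_mean)
    print(f"N={n:4d}  spread={spread:.3f} nm  std. of mean={std_of_mean:.4f} nm")
```

The spread stays close to the assumed 0.1 nm regardless of N, whereas the standard deviation of the mean keeps shrinking as more wavelengths are added.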
For the sake of completeness: For the measured spectra, we have compared the optimization results achieved with the new merit function to those of the traditional merit functions. The results are given in Appendix 5. In Appendix 4, we propose an additional weight function considering the differences in measurement sensitivities for different wavelengths and angles of incidence.
5. Summary, conclusions, and outlook
In this paper, we have dealt with the statistical uncertainty calculation in Mueller ellipsometry. In ellipsometry, the (co-)variance of the mean best-fit parameters (which decreases with the inverse of the typically large number of used wavelengths) is often misinterpreted and misunderstood as the sample’s parameter uncertainty. Apart from that, we have demonstrated, with the help of simulations, that commonly used methods for the ellipsometric uncertainty calculation fail even in one of the simplest cases, namely the determination of the oxide thickness on silicon. The main problem is the inappropriate treatment of depolarization, which is most often simply neglected.
Based on the work of Cloude [26] and Ossikovski and Arteaga [33], we have algebraically derived a new merit function as the central part of the ellipsometric uncertainty calculation. This new merit function solves the problem of determining measurement uncertainties in SE metrology, and it was tested and proved using simulated and noise-free SE data.
We have also presented an extension of our solution which additionally includes statistical measurement noise. Hence, our extended approach enables the analysis of real-world measurement data, which is always affected by noise. As a welcome side-effect, this numerically regularizes the problem and stabilizes the solution. A demonstration with real data was given.
The choice of an adequate model is a major task in ellipsometry. Here, the Bayesian information criterion is helpful, as it allows one to compare different models with each other. This Bayesian information criterion can reasonably be defined and used also in the context of our new approach, since it is based on the likelihood function that we utilize to apply Bayesian statistics. Generally, we recommend Bayesian statistics, especially in cases where Gaussian statistics fail or cannot be applied (e.g. in case of constrained parameters). Examples of how to use the information criterion and how to calculate the posterior distribution with Bayesian statistics were given.
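As a reminder of how such a model comparison works in practice, the following sketch evaluates the information criterion in its standard form after Schwarz [40], k ln(n) − 2 ln(L_max). The log-likelihood values are hypothetical and are not the values behind Table 3; the choice of the number of data points n (here the number of wavelengths) is likewise an assumption.

```python
import math

def bic(max_log_likelihood, n_params, n_data):
    """Schwarz criterion: k * ln(n) - 2 * ln(L_max); lower is better."""
    return n_params * math.log(n_data) - 2.0 * max_log_likelihood

n_wavelengths = 940
# Hypothetical maximum log-likelihoods (NOT the values behind Table 3).
bic_one_layer = bic(-470.0, n_params=1, n_data=n_wavelengths)
bic_two_layer = bic(-468.5, n_params=2, n_data=n_wavelengths)
print(bic_one_layer, bic_two_layer)
```

In this made-up case the two-layer model fits slightly better, but the penalty for the additional parameter outweighs the gain, so the one-layer model would be preferred.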
To sum up, our main conclusion is: Cloude’s covariance matrix can be used as what it is, namely a covariance matrix. With it, sample-induced depolarization can now be considered. This results in a correct weighting of all spectral contributions to the merit function. Standard statistical tools like maximum likelihood, Bayesian information criterion, or Bayesian statistics can be applied.
In this paper, we have implicitly applied many commonly used idealizations like the far-field approximation, plane wavefronts, unlimited spectral detector resolution, perfect polarization optics, and so on. The systematic deviations from the ideal cases lead to systematic contributions to the best-fit results and their uncertainties. In a full uncertainty budget, they should be considered as well.
We have also used seemingly contradictory assumptions: We assumed full coherence for the light–matter interaction at distinct layer thicknesses but averaged the resulting Mueller matrices incoherently. The contradiction can be resolved by considering spatial information on both the sample and the light: We assume that the illuminating light is spatially coherent only over a limited area with a sufficiently constant and distinct layer height. Over larger distances on the sample where the layer thickness may vary, the spatial coherence is assumed to be lost.
Hingerl and Ossikovski have done pioneering work on how to deal with partial coherence in Mueller ellipsometry [47,48]. We will have to incorporate their results when applying our method to inhomogeneous or highly depolarizing rough samples. This will be one of our next goals.
Additionally, for rough sample surfaces, Fresnel’s equations are not applicable anymore. Instead, one must solve Maxwell’s equations numerically to calculate the light–matter interaction. In the past, we successfully used the finite-element solver JCMsuite for this while analyzing periodically structured surfaces [49,50,51]. In the future, we are also going to apply this Maxwell solver to stochastically structured, rough surfaces.
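The effect of this incoherent averaging can be sketched numerically. In the example below, each non-depolarizing Mueller matrix is parametrized by (Ψ, Δ) in one common sign convention, and the thickness variation is replaced by a hypothetical normal spread in Δ. This is a simplification made for illustration; it does not use the Fresnel-based layer model of the paper. The polarization index is evaluated in the standard form of Gil and Bernabeu [36].

```python
import numpy as np

def mueller_isotropic(psi, delta):
    """Normalized Mueller matrix of a non-depolarizing isotropic sample."""
    n = np.cos(2 * psi)
    c = np.sin(2 * psi) * np.cos(delta)
    s = np.sin(2 * psi) * np.sin(delta)
    return np.array([[1, -n, 0, 0],
                     [-n, 1, 0, 0],
                     [0, 0, c, s],
                     [0, 0, -s, c]], dtype=float)

def polarization_index(m):
    """1 for a non-depolarizing matrix, 0 for an ideal depolarizer."""
    return np.sqrt((np.sum(m**2) - m[0, 0]**2) / 3.0) / m[0, 0]

rng = np.random.default_rng(2)
psi = np.deg2rad(30.0)                                    # hypothetical value
deltas = np.deg2rad(rng.normal(120.0, 10.0, size=5000))   # hypothetical spread

m_single = mueller_isotropic(psi, np.deg2rad(120.0))
# Incoherent superposition: average the Mueller matrices, not the Jones matrices.
m_mean = np.mean([mueller_isotropic(psi, d) for d in deltas], axis=0)

p_single = polarization_index(m_single)
p_mean = polarization_index(m_mean)
print(p_single, p_mean)
```

Each individual matrix is non-depolarizing, while their average has a polarization index below one, which is the sample-induced depolarization discussed in this paper.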
Appendix 1: Remarks on absorbing layers
The thickness and uncertainty determination of a dielectric layer, as shown in the introductory example with a SiO2 layer on Si, is quite simple because of the missing absorption. The problem is different with a metallic layer. So, we carried out similar virtual experiments as before but with an aluminum layer on glass. Again, we chose an angle of incidence of 70° and a spectral range of 190–1000 nm. On the one hand, one expects that the mean reconstructed thickness is systematically shifted toward smaller values because light probing thinner parts of the Al layer has a higher remaining intensity than light probing thicker parts. On the other hand, the thickness sensitivity is expected to decrease with increasing absorbance/thickness, which lets us expect a larger thickness uncertainty. And indeed, this is what we observe: Simulation results are shown in Table 4 for two different nominal Al layer thicknesses and uncertainties. While the uncertainties achieved with SEH=χ2 from Eq. (26) are reasonable and plausible, the other merit functions again give wrong results without any recognizable pattern.
Although the effect is small, experimenters and users should always be aware of the systematic deviation and the increased uncertainty when analyzing absorbing layers.
Furthermore, one can also see that the different merit functions give different best-fit values. This results from the spectrally dependent shift of the best-fit height and the different weighting that comes with the selected merit function (see explanation in Section 2).
Appendix 2: Polarization criteria
From the definition of Cloude’s covariance matrix (19), one can see that ${\textrm{tr}}({\boldsymbol H}) = 2{M_{1,1}}$, so for M1,1-normalized Mueller matrices, tr(H) = 2 follows. Furthermore, the trace of a square matrix equals the sum of its eigenvalues, ${\textrm{tr}}({\boldsymbol H}) = \sum\nolimits_{k = 1}^4 {{\lambda _k}}$. As shown in [52,53], this allows the polarization index (14) of a Mueller matrix to be traced back to the eigenvalues of its Cloude covariance matrix.
PD is widely used (and therefore we use it here as well), but it is not the only index to describe the depolarization caused by a sample. The information-based approach leading to the entropy [54] (with ${P_k}$, the normalized eigenvalues: ${P_k} = {\lambda _k}/\sum\nolimits_{l = 1}^4 {{\lambda _l}}$) is also of relevance and probably more fundamental.
As can be seen, both definitions are based on the eigenvalues of Cloude’s covariance matrix, i.e. they are scalar quantities in the interval [0,1] that describe the shape deviation of the 4D covariance ellipsoid from the ideal sphere (fully depolarized state) on the one hand and from a line (fully polarized state) on the other hand. (PD is 1 for fully polarized and 0 for fully depolarized light. For S, it is vice versa.)
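Both quantities can be evaluated directly from the four eigenvalues. The sketch below uses the standard eigenvalue forms from the literature, written so that they do not depend on the normalization of H. The notation may differ from the equations of this paper, and the base-4 logarithm in the entropy is an assumption chosen so that S lies in [0,1].

```python
import numpy as np

def polarization_index_from_eigenvalues(eigvals):
    """sqrt((4 * sum(lam^2) - (sum lam)^2) / (3 * (sum lam)^2))."""
    lam = np.asarray(eigvals, dtype=float)
    total = lam.sum()
    value = (4.0 * np.sum(lam**2) - total**2) / (3.0 * total**2)
    return float(np.sqrt(max(value, 0.0)))   # guard against rounding below zero

def cloude_entropy(eigvals):
    """-sum(P_k * log4(P_k)) with normalized eigenvalues P_k."""
    lam = np.asarray(eigvals, dtype=float)
    p = lam / lam.sum()
    p = p[p > 0]                             # convention: 0 * log(0) = 0
    return float(-np.sum(p * np.log(p)) / np.log(4.0))

# Eigenvalue sets with tr(H) = 2, as for M1,1-normalized Mueller matrices.
cases = {
    "fully polarized": [2.0, 0.0, 0.0, 0.0],
    "partially depolarized": [1.25, 0.25, 0.25, 0.25],
    "fully depolarized": [0.5, 0.5, 0.5, 0.5],
}
for name, lam in cases.items():
    print(name, polarization_index_from_eigenvalues(lam), cloude_entropy(lam))
```

A single non-zero eigenvalue (the line) gives PD = 1 and S = 0, while four equal eigenvalues (the sphere) give PD = 0 and S = 1.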
Appendix 3: Raw data processing
Our Mueller ellipsometer (Sentech SE 850) works in the so-called step-scan mode. For different combinations of the polarizer and the two compensators, the analyzer scans the polarization ellipse. So, the intensity is measured as a function of the analyzer angle ${\theta _A}$:
Appendix 4: Sensitivity weights
The merit function given in Eq. (28) considers depolarization and statistical noise for each single measurement. Until now, when calculating statistical sample parameter uncertainties, the contributions of all measurement configurations (differing e.g. in wavelength or angle of incidence) have been uniformly weighted. However, they can and should additionally be combined with external or technically motivated weights, for example to account for the measurement contrast: We introduce a measure for the sample’s polarizing capabilities and use it as a weight factor for the summands in Eq. (28). To this end, we compare the measured (and normalized) Mueller matrix with that of an ideal reflector (Fresnel coefficients: rs=−1, rp=1). The latter follows from Eqs. (2) and (5) as:
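For orientation, the Mueller matrix of such an ideal reflector can be obtained with the standard Jones-to-Mueller transformation, as in the following sketch. It assumes a Jones matrix ordered as (p, s) and one common choice of the transformation matrix A, which may differ in convention from Eqs. (2) and (5). The weight measure itself is not reproduced here.

```python
import numpy as np

# Transformation between the coherency-vector and the Stokes representation.
A = np.array([[1, 0, 0, 1],
              [1, 0, 0, -1],
              [0, 1, 1, 0],
              [0, 1j, -1j, 0]], dtype=complex)

def jones_to_mueller(jones):
    """Mueller matrix M = A (J kron J*) A^-1 of a non-depolarizing Jones matrix."""
    m = A @ np.kron(jones, np.conj(jones)) @ np.linalg.inv(A)
    return np.real(m)

j_ideal = np.diag([1.0, -1.0])       # rp = 1, rs = -1, ordered as (p, s)
m_ideal = jones_to_mueller(j_ideal)
print(np.round(m_ideal, 12))
```

Under these assumptions, the result is the diagonal matrix diag(1, 1, −1, −1).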
Note: The argumentation we supply here is not limited to our merit function only.
Appendix 5: Measurements analyzed with different merit functions
We have demonstrated on simulated data that the new merit function is the only one to consider the effect of sample-induced depolarization correctly. Nevertheless, the reader may like to compare the optimization results achieved with the traditional merit functions (Section 2) on the measured data presented in Section 4.2. They are presented in Tables 5 and 6 for the single- and two-layer models, respectively.
As can be seen, all the results are in adequate agreement, except for the thickness uncertainties obtained with (Ψ, Δ)-optimization, which are significantly lower. So, for this example, the differences are manageable, but we cannot reliably assess how representative it is.
Appendix 6: Endnotes
1 International vocabulary of metrology (VIM):
Metrological traceability: Property of a measurement result whereby the result can be related to a reference through a documented unbroken chain of calibrations, each contributing to the measurement uncertainty.
2 Owing to the normalization, js is limited to vectors on a 4D complex sphere ${\Omega _R} = \left\{ {\boldsymbol r}\;|\;\|{\boldsymbol r}\| = \sqrt R \right\}$ with R = 2. The normalization parameter C is then:
3 The Cholesky decomposition can be used for this: ${\boldsymbol L}{{\boldsymbol L}^\textrm{T}} = {{\boldsymbol \Sigma }_{\boldsymbol m}}$. Then, the vectors ${{\boldsymbol m}_k}: = {\boldsymbol m} + {\boldsymbol L}{{\boldsymbol x}_k}$ are distributed as desired. ${{\boldsymbol x}_k}$ are vectors with normally distributed random components, each with zero mean and unit variance.
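The procedure of endnote 3 can be sketched in a few lines. The mean vector and covariance matrix below are hypothetical placeholders, not measured values.

```python
import numpy as np

rng = np.random.default_rng(3)
m = np.array([1.0, 0.2, -0.5])                 # hypothetical mean vector
sigma_m = np.array([[0.04, 0.01, 0.00],
                    [0.01, 0.09, 0.02],
                    [0.00, 0.02, 0.16]])       # hypothetical covariance matrix

L = np.linalg.cholesky(sigma_m)                # lower triangular, L @ L.T == sigma_m
x = rng.standard_normal((3, 50_000))           # columns x_k: zero mean, unit variance
samples = m[:, None] + L @ x                   # columns m_k = m + L x_k

print(samples.mean(axis=1))
print(np.cov(samples))
```

The sample mean and sample covariance reproduce m and the prescribed covariance matrix up to statistical fluctuations.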
Funding
Transmet (2016-6).
Disclosures
The authors declare no conflicts of interest.
References
1. R. M. A. Azzam and N. M. Bashara, Ellipsometry and polarized light (North-Holland Publishing Co, 1977).
2. H. Tompkins and E. A. Irene, Handbook of ellipsometry (William Andrew, 2005).
3. M. Schubert, Infrared ellipsometry on semiconductor layer structures: phonons, plasmons, and polaritons, Vol. 209 (Springer Science & Business Media, 2004).
4. H. Fujiwara, Spectroscopic ellipsometry: principles and applications (John Wiley & Sons, 2007).
5. M. Losurdo and K. Hingerl, Ellipsometry at the Nanoscale (Springer, 2013).
6. J. J. Gil and R. Ossikovski, Polarized Light and the Mueller Matrix Approach (CRC, 2017).
7. M. Losurdo, M. Bergmair, G. Bruno, D. Cattelan, C. Cobet, A. de Martino, K. Fleischer, Z. Dohcevic-Mitrovic, N. Esser, M. Galliet, R. Gajic, D. Hemzal, K. Hingerl, J. Humlicek, R. Ossikovski, Z. V. Popovic, and O. Saxl, “Spectroscopic ellipsometry and polarimetry for materials and systems analysis at the nanometer scale: state-of-the-art, potential, and perspectives,” J. Nanopart. Res. 11(7), 1521–1554 (2009).
8. M. Losurdo, Defining and Analysing the Optical Properties of Materials at the Nanoscale, Ges. für Mikro- und Nanoelektronik, (2010).
9. BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP and OIML “Evaluation of Measurement Data - Guide to the Expression of Uncertainty in Measurement,” Joint Committee for Guides in Metrology, Technical Report, JCGM 100, (2008).
10. T. A. Germer, H. J. Patrick, R. M. Silver, and B. Bunday, “Developing an uncertainty analysis for optical scatterometry,” Proc. SPIE 7272, 72720T (2009).
11. C. Elster and B. Toman, “Bayesian uncertainty analysis under prior ignorance of the measurand versus analysis using the Supplement 1 to the Guide: a comparison,” Metrologia 46(3), 261–266 (2009).
12. B. Bunday, “HVM metrology challenges towards the 5 nm node,” Proc. SPIE 9778, 97780E (2016).
13. M. N. Polyanskiy, “Refractive index database,” [Online]. Available: https://refractiveindex.info. [Accessed 09 10 2019].
14. E. D. Palik, Handbook of Optical Constants of Solids (Academic, Inc., 1998).
15. BIPM, “The BIPM key comparison database,” [Online]. Available: https://kcdb.bipm.org/. [Accessed 09 10 2019].
16. K. Hasche, P. Thomsen-Schmidt, M. Krumrey, G. Ade, G. Ulm, J. Stuempel, S. Schaedlich, W. Frank, M. Procop, and U. Beck, “Metrological characterization of nanometer film thickness standards for XRR and ellipsometry applications,” Proc. SPIE 5190, 165 (2003).
17. Y. Azuma, P. Barat, G. Bartl, H. Bettin, M. Borys, I. Busch, L. Cibik, G. D’Agostino, K. Fujii, H. Fujimoto, A. Hioki, M. Krumrey, U. Kuetgens, N. Kuramoto, G. Mana, E. Massa, R. Meeß, S. Mizushima, T. Narukawa, A. Nicolaus, A. Pramann, S. A. Rabb, O. Rienitz, C. Sasso, M. Stock, R. D. Vocke Jr, A. Waseda, S. Wundrack, and S. Zakel, “Improved measurement results for the Avogadro constant using a 28Si-enriched crystal,” Metrologia 52(2), 360–375 (2015).
18. J. Ehrstein, C. Richter, D. Chandler-Horowitz, E. Vogel, D. Ricks, C. Young, S. Spencer, S. Shah, D. Maher, B. Foran, A. Diebold, and P. Yee-Hung, “Thickness Evaluation for 2 nm SiO2 Films, a Comparison of Ellipsometric, Capacitance-Voltage and HRTEM Measurements,” AIP Conf. Proc. 683(1), 331–336 (2003).
19. S. Kohli, C. D. Rithner, P. K. Dorhout, A. M. Dummer, and C. S. Menoni, “Comparison of nanometer-thick films by x-ray reflectivity and spectroscopic ellipsometry,” Rev. Sci. Instrum. 76(2), 023906 (2005).
20. I. Busch, Auf dünnen Schichten - Pilotvergleich HfO2 auf Si, private communication, (2019).
21. M. P. Seah, “CCQM-K32 key comparison and P84 pilot study: Amount of silicon oxide as a thickness of SiO2 on Si,” Metrologia 45(1A), 08013 (2008).
22. J. M. M. De Nijs and A. van Silfhout, “Systematic and random errors in rotating-analyzer ellipsometry,” J. Opt. Soc. Am. A 5(6), 773–781 (1988).
23. D. H. Goldstein and R. A. Chipman, “Error analysis of a Mueller matrix polarimeter,” J. Opt. Soc. Am. A 7(4), 693–700 (1990).
24. Y. J. Cho, W. Chegal, J. P. Lee, and H. M. Cho, “Universal evaluation of combined standard uncertainty for rotating-element spectroscopic ellipsometers,” Opt. Express 24(23), 26215–26227 (2016).
25. X. Cheng, M. Li, J. Zhou, H. Ma, and Q. Hao, “Error analysis of the calibration of a dual-rotating-retarder Mueller matrix polarimeter,” Appl. Opt. 56(25), 7067–7074 (2017).
26. S. R. Cloude, “Conditions for the physical realisability of matrix operators in polarimetry,” Proc. SPIE 1166, 177–185 (1989).
27. S. Lu and R. Chipman, “Interpretation of Mueller matrices based on polar decomposition,” J. Opt. Soc. Am. A 13(5), 1106–1113 (1996).
28. Z.-F. Xing, “On the Deterministic and Non-deterministic Mueller Matrix,” J. Mod. Opt. 39(3), 461–484 (1992).
29. R. Sridhar and R. Simon, “Normal form for Mueller Matrices in Polarization Optics,” J. Mod. Opt. 41(10), 1903–1915 (1994).
30. R. Ossikovski, “Analysis of depolarizing Mueller matrices through a symmetric decomposition,” J. Opt. Soc. Am. A 26(5), 1109–1118 (2009).
31. C. Fallet, A. Pierangelo, R. Ossikovski, and A. de Martino, “Experimental validation of the symmetric decomposition of Mueller matrices,” Opt. Express 18(2), 831–842 (2010).
32. R. Ossikovski, “Differential matrix formalism for depolarizing anisotropic media,” Opt. Lett. 36(12), 2330–2332 (2011).
33. R. Ossikovski and O. Arteaga, “Integral decomposition and polarization properties of depolarizing Mueller matrices,” Opt. Lett. 40(6), 954–957 (2015).
34. G. E. Jellison, “Use of the biased estimator in the interpretation of spectroscopic ellipsometry data,” Appl. Opt. 30(23), 3354–3360 (1991).
35. C. Herzinger, B. Johs, W. McGahan, J. Woollam, and W. Paulson, “Ellipsometric determination of optical constants for silicon and thermally grown silicon dioxide via a multi-sample, multi-wavelength, multi-angle investigation,” J. Appl. Phys. 83(6), 3323–3336 (1998).
36. J. J. Gil and E. Bernabeu, “Depolarization and Polarization Indices of an Optical System,” Optica Acta: International Journal of Optics 33(2), 185–189 (1986).
37. I. Busch, Y. Azuma, H. Bettin, L. Cibik, P. Fuchs, K. Fujii, and S. Mizushima, “Surface layer determination for the Si spheres of the Avogadro project,” Metrologia 48(2), S62–S82 (2011).
38. A. N. Tikhonov and V. Y. Arsenin, Solution of Ill-posed Problems (Winston & Sons, 1977).
39. N. J. Higham, “Computing a nearest symmetric positive semidefinite matrix,” Linear Algebra Appl. 103, 103–118 (1988).
40. G. Schwarz, “Estimating the dimension of a model,” Ann. Statist. 6(2), 461–464 (1978).
41. L. V. Rodríguez-de Marcos, J. I. Larruquert, J. A. Méndez, and J. A. Aznárez, “Self-consistent optical constants of SiO2 and Ta2O5 films,” Opt. Mater. Express 6(11), 3622–3637 (2016).
42. D. A. G. Bruggeman, “Berechnung verschiedener physikalischer Konstanten von heterogenen Substanzen. I. Dielektrizitätskonstanten und Leitfähigkeiten der Mischkörper aus isotropen Substanzen,” Ann. Phys. 416(7), 636–664 (1935).
43. C. Sherlock, P. Fearnhead, and G. O. Roberts, “The random walk Metropolis: Linking theory and practice through a case study,” Statist. Sci. 25(2), 172–190 (2010).
44. H. Haario, E. Saksman, and J. Tamminen, “Adaptive proposal distribution for random walk Metropolis algorithm,” Comput. Stat. 14(3), 375–395 (1999).
45. S. Heidenreich, H. Gross, and M. Bär, “Bayesian approach to determine critical dimensions from scatterometric measurements,” Metrologia 55(6), S201–S211 (2018).
46. K. van den Meersche, K. Soetaert, and D. van Oevelen, “xsample(): An R function for sampling linear inverse problems,” J. Stat. Soft. 30, 1–15 (2009).
47. K. Hingerl and R. Ossikovski, “General approach for modeling partial coherence in spectroscopic Mueller matrix polarimetry,” Opt. Lett. 41(2), 219–222 (2016).
48. R. Ossikovski and K. Hingerl, “General formalism for partial spatial coherence in reflection Mueller matrix polarimetry,” Opt. Lett. 41(17), 4044–4047 (2016).
49. “Program package JCMsuite,” JCMwave GmbH, [Online]. Available: http://www.jcmwave.com.
50. M. Wurm, J. Endres, J. Probst, M. Schoengen, A. Diener, and B. Bodermann, “Metrology of nanoscale grating structures by UV scatterometry,” Opt. Express 25(3), 2460–2468 (2017).
51. M. Hammerschmidt, M. Weiser, X. G. Santiago, L. Zschiedrich, B. Bodermann, and S. Burger, “Quantifying parameter uncertainties in optical scatterometry using Bayesian inversion,” Proc. SPIE 10330, 1033004 (2017).
52. J. J. Gil, “Characteristic properties of Mueller matrices,” J. Opt. Soc. Am. A 17(2), 328–334 (2000).
53. R. Ossikovski, “Alternative depolarization criteria for Mueller matrices,” J. Opt. Soc. Am. A 27(4), 808–814 (2010).
54. S. R. Cloude, “Group theory and polarisation algebra,” OPTIK 75(1), 26–36 (1986).
55. D. H. Goldstein, Polarized Light, 3rd ed. (CRC, 2010).
56. D. E. Aspnes, “Effects of component optical activity in data reduction and calibration of rotating-analyzer ellipsometers,” J. Opt. Soc. Am. 64(6), 812–819 (1974).
57. D. M. Radman and B. D. Cahan, “Effects of component optical activity in data reduction and calibration of rotating-analyzer ellipsometers,” J. Opt. Soc. Am. 71(12), 1546 (1981).
58. I. Markovsky and S. van Huffel, “Overview of total least squares methods,” Signal Processing 87(10), 2283–2302 (2007).