Some aspects on the uncertainty calculation in Mueller ellipsometry

Matthias Wurm; Tobias Grunewald; Sven Teichert; Bernd Bodermann; Johanna Reck; Uwe Richter

doi:10.1364/OE.381244

1. Introduction

Spectroscopic ellipsometry (SE) is a well-known technique for obtaining information about many kinds of samples. Its application areas include biology, medicine, solar cell inspection, solid state physics, manufacturing control in the semiconductor industry, and many other disciplines. Its full range of applications is covered in the classical literature and in newer textbooks [1,2,3,4,5,6].

1.1 Ellipsometric metrology

SE is also widely used for metrology on technical samples to measure, for example, layer thicknesses and structure sizes or complex refractive indices of bulk or layered materials.

However, despite its high sensitivity [e.g. 7,8], SE is hardly used for so-called traceable¹ measurements, which require “an unbroken metrological traceability chain to” … “a measurement standard” and “a documented measurement uncertainty”, the latter ideally established following the Guide to the Expression of Uncertainty in Measurement (GUM) [9]. In pioneering work, Germer et al. described a GUM-compatible uncertainty evaluation for an application of SE [10]. It was later shown that Bayesian uncertainty analysis should be preferred for non-linear problems [11].

So-called OCD tools (optical critical dimension, essentially based on SE) are nowadays a metrology workhorse for process development and for process and quality control in front-end semiconductor manufacturing [10,12]. However, OCD measurements of nanoscale structures are always traced back to a CD-SEM tool (critical dimension scanning electron microscope, typically used as the in-house gold standard).

There is a broad database [13] as well as handbooks [14] on the ellipsometric determination of the complex refractive index of many different materials, and they show that results from different authors for the same material often differ considerably. Since measurement uncertainties and traceability are usually missing for ellipsometric measurements, the quality of these data cannot be assessed.

For another standard application of ellipsometry, the measurement of layer or film thickness, it is remarkable that worldwide only two national metrology institutes (NMIs), ITRI (Taiwan) and PTB (Germany), hold a calibration and measurement capability (CMC) entry for ellipsometry in the BIPM key comparison database [15]. In fact, at PTB, ellipsometric measurements of film thickness standards [16] or of the oxide layer on a high-precision silicon sphere [17] are traced back to traceable X-ray reflectometry (XRR) measurements and do not rely on an uncertainty evaluation of the ellipsometric measurement itself.

Finally, several international comparisons of film thickness measurements have been carried out during the last 20 years, comparing methods such as XRR, X-ray photoelectron spectroscopy (XPS), transmission electron microscopy (TEM), and in part SE [16,18,19,20]. Although the materials, methods, and results vary, some common features can be observed: the SE results typically show systematic offsets of one to several nanometers and some linearity deviation compared with, e.g., XRR, while the stated ellipsometric uncertainties are astonishingly small [18], much smaller than the observed offsets to XRR [19], or are missing altogether [16,20]. One BIPM key comparison, K32, has been performed, but only one NMI (NIMT, Thailand) used a dual-wavelength ellipsometer [21]; the comparison was dominated almost exclusively by XRR and XPS.

Several reasons for the observed inconsistencies have been discussed and partly proven, such as hydrocarbon films [21], interface/interdiffusion layers, interface and surface roughness, and of course the ambiguity introduced by the choice of the layer model used for the analysis of ellipsometry measurements. However, it is quite evident that there are also some open challenges in the data analysis of ellipsometric measurements, especially in the estimation of realistic measurement uncertainties. Several extensive uncertainty evaluations for different types of ellipsometers have already been discussed and published [e.g. 10,22,23,24,25]. They all essentially deal with estimations of the influence of non-ideal instrument parameters like retardation errors, misalignment, or photometric inaccuracy, just to name a few, and error propagation is used to derive the corresponding uncertainty contributions.

These are excellent and necessary steps toward a complete uncertainty evaluation. However, we believe that we have identified another problem in ellipsometric data evaluation: the inadequate consideration and treatment of depolarization. Depolarization is additional statistical measurement information that is contained in Mueller matrices. Therefore, in the following, we concentrate on data analysis in Mueller matrix ellipsometry (MME, also called Mueller matrix polarimetry).

Although this technique has been in use for several decades, the physical interpretation of the Mueller matrix is still under discussion.

1.2 Interpretation of Mueller matrices

The interpretation of measured Mueller matrices is complex and not straightforward. To address this problem, a large number of matrix decompositions have been proposed that allow certain optical sample properties to be derived (Cloude sum decomposition [26], Lu-Chipman polar decomposition [27], normal form decomposition [28,29], symmetric decomposition [30,31], differential decomposition [32], and integral decomposition [33]).

The decomposition techniques are powerful tools, but not all of them can be applied in metrology. This is particularly true for product decompositions, which postulate a sequential occurrence of optical effects that in reality take place in parallel. From the metrological point of view, one cannot expect correct measurement results from mathematical models that are known to be systematically incorrect.

In this paper, we carry forward the idea of the integral decomposition, which is based on the sum decomposition. With Cloude’s sum decomposition [26], any measured depolarizing Mueller matrix can be formulated as a sum of four non-depolarizing Mueller matrices. However, this decomposition is also difficult to interpret, and the question arises: what is the physical equivalent of these four non-depolarizing Mueller matrices? Ossikovski and Arteaga [33] have pointed out that Cloude’s covariance matrix, which appears as an auxiliary quantity in Cloude’s method, can be interpreted as a statistical quantity describing depolarization.

This interpretation is the starting point of our work. We will end up with a merit function for Mueller ellipsometry that treats Cloude’s covariance matrix as the statistical measurement uncertainty matrix.

We are aware that a new solution to an old problem needs to be well justified, especially since at least three different merit functions are commonly used in the ellipsometric literature. However, we have found that ellipsometrically obtained uncertainties are frequently incompatible with those from other measurement methods. We will show that the choice of merit function is of crucial importance here.

There are, however, other reasons as well. One is that ellipsometric uncertainties are typically stated for the mean value obtained from N different wavelengths, while other techniques usually state the uncertainty of a single measurement. Another is that merit functions need some a priori knowledge or estimate of the expected range of the measurands. Otherwise, the optimization procedure may yield quite wrong or even non-physical sample parameter values caused by possibly strong (anti-)correlation between different parameters. In many cases, however, this additional information is hard to acquire.

Finally, one should keep in mind that ellipsometry is an integral measurement method: in general, information on the variances of the sample parameters is lost by averaging (here over the illumination spot). But, as mentioned before, in Mueller ellipsometry one implicitly also measures some statistical distribution parameters resulting from the stochastic effect of depolarization. In the next sections, we demonstrate how to exploit them.

2. Motivation for a new merit function

In ellipsometry, optical and/or geometrical sample parameters (p₁,…,p_n)=p are measured indirectly. This means that a measured quantity, Y_meas, does not offer direct access to p but must be compared with the corresponding simulated quantity Y_sim(p), and the best-fitting parameters, $\hat{{\boldsymbol p}}$, have to be determined by numerical optimization. With the standard approach in regression, the least-squares method, the optimization minimizes a merit function of the squared deviations between experimental and simulated data. Typical ellipsometric measurands are the Mueller matrix M and, for isotropic and homogeneous samples, the ellipsometric angles Ψ and Δ as well as the complex reflection coefficient ratio ρ.

In the non-depolarizing case, these quantities can be derived from the Jones formalism: The two-dimensional electric field vector in p- and s-co-ordinates of an outgoing beam, E_out, after reflection is linked to the field of the incoming beam, E_in, via the Jones matrix J of the sample:

(1)$${{\boldsymbol E}_{out}} = {\boldsymbol J} \cdot {{\boldsymbol E}_{\textrm{in}}}. $$

Here, the Jones matrix is a complex 2 × 2 matrix. For isotropic samples, J is diagonal with

(2)$${\boldsymbol J} = {\textrm{diag}}({{r_{\textrm{p}}},{r_{\textrm{s}}}} ). $$

The reflection coefficients, r_p and r_s, can be calculated using the Fresnel equations, which are an analytical solution of Maxwell’s equations for planar layers. In the general case (e.g. for structured samples), J must be calculated numerically with the so-called Maxwell solvers.

For isotropic samples, the measurands ρ, Ψ, and Δ are connected to the Fresnel coefficients by the fundamental equation of ellipsometry:

(3)$$\rho = \tan {\Psi }\exp i\Delta = {{{r_p}} \mathord{\left/ {\vphantom {{{r_p}} {{r_s}}}} \right.} {{r_s}}}. $$
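For a single film on a substrate, the Fresnel coefficients of the two interfaces combine into the well-known Airy formula. The following NumPy sketch computes r_p and r_s for an ambient | film | substrate stack and extracts Ψ and Δ via Eq. (3). It assumes the convention N = n − ik with a film phase factor exp(−2iβ); other sign conventions flip the sign of Δ. The function names are illustrative, not taken from any particular software, and the refractive indices in the example are rough illustrative values, not the dispersion data used later.

```python
import numpy as np


def normal_component(n, n0, theta0):
    """N cos(theta) inside a medium of complex index n (Snell's law, principal branch)."""
    return np.sqrt(n**2 - (n0 * np.sin(theta0)) ** 2 + 0j)


def film_reflection(n0, n1, n2, d, wavelength, theta0):
    """Airy reflection coefficients (r_p, r_s) of ambient | film | substrate.

    Assumes N = n - ik and the phase factor exp(-2i*beta); d and wavelength
    must be given in the same length unit, theta0 in radians.
    """
    q0, q1, q2 = (normal_component(n, n0, theta0) for n in (n0, n1, n2))

    def rs_interface(qa, qb):
        return (qa - qb) / (qa + qb)

    def rp_interface(na, qa, nb, qb):
        return (nb**2 * qa - na**2 * qb) / (nb**2 * qa + na**2 * qb)

    phase = np.exp(-2j * (2 * np.pi * d * q1 / wavelength))  # exp(-2i*beta)
    rp01, rp12 = rp_interface(n0, q0, n1, q1), rp_interface(n1, q1, n2, q2)
    rs01, rs12 = rs_interface(q0, q1), rs_interface(q1, q2)
    rp = (rp01 + rp12 * phase) / (1 + rp01 * rp12 * phase)
    rs = (rs01 + rs12 * phase) / (1 + rs01 * rs12 * phase)
    return rp, rs


def psi_delta(rp, rs):
    """Eq. (3): rho = r_p / r_s = tan(Psi) exp(i Delta); angles in radians."""
    rho = rp / rs
    return np.arctan(np.abs(rho)), np.angle(rho)


# Example: 2 nm SiO2 on Si at 70 deg and 633 nm (illustrative indices only)
rp, rs = film_reflection(1.0, 1.457, 3.88 - 0.02j, 2.0, 633.0, np.deg2rad(70.0))
psi, delta = psi_delta(rp, rs)
```

Two quick sanity checks follow from the formulas: for d = 0 the stack reduces to the bare substrate, and at the Brewster angle of a non-absorbing substrate r_p vanishes, so Δ becomes undefined.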

Maxwell’s (or Fresnel’s) equations, as solutions in classical optics, are strictly deterministic. Depolarization, a statistical result of sample roughness, finite illumination coherence, and finite bandwidth, among others, can be described in the Mueller formalism as a superposition of deterministic waves. Here, an outgoing Stokes vector, S_out, is linked to the incoming Stokes vector S_in via the Mueller matrix M:

(4)$${{\boldsymbol S}_{\textrm{out}}} = {\boldsymbol M} \cdot {{\boldsymbol S}_{\textrm{in}}}. $$

In the non-depolarizing limit, the Mueller matrix can be calculated from the Jones matrix by:

(5)$${\boldsymbol M} = {\boldsymbol T} \cdot ({{\boldsymbol J} \otimes {{\boldsymbol J}^\ast }} )\cdot {{\boldsymbol T}^{ - 1}}, $$

with * denoting the complex conjugation, ${\otimes}$ the Kronecker product, and the transition matrix

(6)$${\boldsymbol T} = \left( {\begin{array}{cccc} 1 &0 &0 &1\\ 1 &0 &0 &{ - 1}\\ 0 &1 &1 &0\\ 0 &i &{ - i} &0 \end{array}} \right). $$

Note that since most Mueller ellipsometers cannot measure absolute intensities, one usually normalizes M to its (1,1)-element: ${\boldsymbol M}: = {\boldsymbol M}/{M_{1,1}}$. This results in:

(7)$${\boldsymbol M} = \left( {\begin{array}{cccc} \textrm{1} &{\textrm{ - N}} &\textrm{0} &\textrm{0}\\ {\textrm{ - N}} &\textrm{1} &\textrm{0} &\textrm{0}\\ \textrm{0} &\textrm{0} &\textrm{C} &\textrm{S}\\ \textrm{0} &\textrm{0} &{\textrm{ - S}} &\textrm{C} \end{array}} \right), $$

with N = cos 2Ψ, S = sin 2Ψ sin Δ, and C = sin 2Ψ cos Δ, the Cartesian co-ordinates of a point on the unit sphere; hence N²+S²+C²=1.
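Eqs. (5) to (7) can be checked numerically. This NumPy sketch builds the Mueller matrix of a diagonal Jones matrix with the transition matrix T of Eq. (6), normalizes it to its (1,1)-element, and compares it element by element with the N, S, C form of Eq. (7). The helper names `mueller_from_jones` and `expected_isotropic` are hypothetical.

```python
import numpy as np

# Transition matrix T of Eq. (6)
T = np.array([[1, 0, 0, 1],
              [1, 0, 0, -1],
              [0, 1, 1, 0],
              [0, 1j, -1j, 0]])


def mueller_from_jones(J):
    """Eq. (5), normalized to the (1,1)-element as in Eq. (7)."""
    M = T @ np.kron(J, J.conj()) @ np.linalg.inv(T)
    if not np.allclose(M.imag, 0.0):
        raise ValueError("Mueller matrix must be real")
    M = M.real
    return M / M[0, 0]


def expected_isotropic(psi, delta):
    """Eq. (7) with N = cos 2Psi, S = sin 2Psi sin Delta, C = sin 2Psi cos Delta."""
    N = np.cos(2 * psi)
    S = np.sin(2 * psi) * np.sin(delta)
    C = np.sin(2 * psi) * np.cos(delta)
    return np.array([[1, -N, 0, 0],
                     [-N, 1, 0, 0],
                     [0, 0, C, S],
                     [0, 0, -S, C]])


rp, rs = 0.3 * np.exp(2.0j), -0.8 + 0.1j   # arbitrary test coefficients
rho = rp / rs
psi, delta = np.arctan(abs(rho)), np.angle(rho)
M = mueller_from_jones(np.diag([rp, rs]))
```

The normalization by M₁,₁ is what removes the absolute reflectance and leaves only the unit-sphere co-ordinates N, S, C.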

Now that the typical ellipsometric measurands M, ρ, Ψ, and Δ have been linked to the Fresnel coefficients, which can be simulated, the following three squared error (SE) functions can be formulated:

(8)$${\textrm{SE}}_{\Psi \Delta }({\boldsymbol p}) = \sum\nolimits_\lambda ^N \left[ {\left( {\Psi _{\textrm{m},\lambda }} - {\Psi _{\textrm{s},\lambda }}({\boldsymbol p}) \right)^2} + {\left( {\Delta _{\textrm{m},\lambda }} - {\Delta _{\textrm{s},\lambda }}({\boldsymbol p}) \right)^2} \right]$$

(9)$$\textrm{S}{\textrm{E}_\rho }({\boldsymbol p}) = \sum\nolimits_\lambda ^N {{{||{{\rho_{\textrm{m},\lambda }} - {\rho_{\textrm{s},\lambda }}({\boldsymbol p})} ||}^2}}$$

(10)$$\textrm{S}{\textrm{E}_{M}}({\boldsymbol p}) = \sum\nolimits_\lambda ^N {\sum\nolimits_{k,l}^4 {{{({{M_{\textrm{m},\lambda ,k,l}} - {M_{\textrm{s},\lambda ,k,l}}({\boldsymbol p})} )}^2}} }, $$

(Note: We always use λ as an index variable to indicate that the summation runs over N measurement configurations differing in wavelengths and/or angles of incidence. Index m: measured, s: simulated).
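The three merit functions translate directly into code. This sketch assumes that measured and simulated quantities are NumPy arrays indexed by the configuration λ (shape (N,) for Ψ, Δ, and ρ; shape (N, 4, 4) for M). Wrap-around of Δ at ±π is deliberately not handled, and the function names are illustrative.

```python
import numpy as np


def se_psi_delta(psi_m, delta_m, psi_s, delta_s):
    """Eq. (8): unweighted squared error in (Psi, Delta), summed over configurations."""
    return float(np.sum((psi_m - psi_s) ** 2 + (delta_m - delta_s) ** 2))


def se_rho(rho_m, rho_s):
    """Eq. (9): squared modulus of the complex difference in rho."""
    return float(np.sum(np.abs(rho_m - rho_s) ** 2))


def se_mueller(M_m, M_s):
    """Eq. (10): squared difference over all 16 elements and all configurations."""
    return float(np.sum((M_m - M_s) ** 2))
```

None of the three functions carries a weight, which is exactly the point made in the next paragraph.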

These functions are commonly used in least-squares parameter optimization. They do not include an experimental error estimate as weights for the measurands. In [34], Jellison has discussed why an unweighted least-squares function (which he calls an unbiased function) should not be used as a merit function in regression. Moreover, SE_ΨΔ is not recommended as a merit function (not even with weights) because of its incorrect metric: the sensitivity to Δ decreases as Ψ approaches zero. From the experimental point of view, this is easy to understand: when there is no p-polarized intensity (a dielectric sample measured at the Brewster angle, r_p=0), there is no phase information and hence no sensitivity to the phase difference Δ. (Remark: for this reason, accuracy specifications dΨ and dΔ for ellipsometers are meaningless unless a nominal value pair (Ψ, Δ) is given.)
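The loss of Δ-sensitivity can be made explicit by differentiating Eq. (3) and the co-ordinates N, S, C of Eq. (7) with respect to Δ:

$$\frac{\partial \rho}{\partial \Delta} = i\tan\Psi\, e^{i\Delta} \quad\Rightarrow\quad \left|\frac{\partial \rho}{\partial \Delta}\right| = \tan\Psi \xrightarrow{\;\Psi \to 0\;} 0,$$

$$\left\| \frac{\partial (N, S, C)}{\partial \Delta} \right\| = \left\| \left(0,\ \sin 2\Psi \cos\Delta,\ -\sin 2\Psi \sin\Delta\right) \right\| = \sin 2\Psi \xrightarrow{\;\Psi \to 0\;} 0.$$

In the (Ψ, Δ) metric of Eq. (8), by contrast, a change in Δ always enters with unit weight, independent of Ψ.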

In contrast to SE_ΨΔ, SE_ρ and SE_M do account for this, though in different ways. With SE_ρ, Ψ and Δ are transformed into a vector in the complex plane with modulus tan Ψ and argument Δ. With SE_M, one implicitly interprets 2Ψ and Δ as spherical co-ordinates and uses their Cartesian equivalents N, S, and C. Therefore, in both interpretations (SE_ρ and SE_M), a distance function (metric) is well defined and used correctly. They are, however, defined in different regimes: SE_ρ operates more or less on the Jones matrix (E-fields), while SE_M acts on the Mueller matrix (intensities).

The different metrics mean that the merit functions apply different inherent spectral weights. So which of the merit functions (Eqs. (8)–(10)) is correct? They cannot all be valid at the same time. Only with perfect measurement data and a perfect model (the best-fitting parameters $\hat{{\boldsymbol p}}$ are identical for all wavelengths) does one obtain the same best-fit values with all three merit functions; in this case, different weights do not change the best-fit result. In practice, however, samples (and models) are never perfect.

2.1 A virtual experiment to test the merit functions

A further question then becomes relevant: does any of these merit functions return correct parameter uncertainties, especially in the presence of sample-induced depolarization? To answer this, we applied the different merit functions to simulated measurement data. In this way, all experimental errors are excluded and all parameters are fully under control. The virtual experiment and sample are shown in Fig. 1.

Fig. 1. Left: Sketch of a Mueller-ellipsometer measuring in reflection. Right: Virtual sample: SiO₂ with a Gaussian height distribution on silicon.

We simulated Mueller ellipsometry measurements on a silicon wafer covered with native oxide at an angle of incidence of θ=70°, which is close to the Brewster angle of Si. The dispersion data for Si and SiO₂ were taken from [35] and [14], respectively, for a spectral range of 190–990 nm.

The oxide layer was given a defined thickness variance, and therefore the Mueller matrix for each wavelength was generated in a statistical procedure. We

1. generated a large set (n ∼10⁵) of Gaussian-distributed SiO₂ heights with mean h₀ and variance $\sigma _0^2$, i.e. ${{\textrm{H}}_{{\textrm{SiO}}2}}{\sim }{{\cal N}}({{h_0},\sigma_0^2} )$;
2. calculated the Jones matrix J_i with Eq. (2) for each ${h_i} \in {{\textrm{H}}_{{\textrm{SiO}}2}}$, using the Fresnel equations and the given experimental parameters;
3. built the Mueller matrices M_i from all J_i by applying Eq. (5);
4. calculated the mean Mueller matrix $\bar{{\boldsymbol M}}: = {n^{ - 1}}\sum\nolimits_{i = 1}^n {{{\boldsymbol M}_i}} $ and normalized it to its (1,1)-element: ${\boldsymbol M}: = \bar{{\boldsymbol M}}/{\bar{M}_{1,1}}$.

Physically, this procedure corresponds to a totally incoherent superposition of single beams probing different oxide heights. Note, however, that this virtual experiment is not intended to simulate roughness. Roughness would require the specification of spatial frequencies or a spatial frequency spectrum and would also lead to diffraction.
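Steps 1 to 4 can be sketched numerically as follows. This is a minimal sketch at a single wavelength, assuming NumPy, an Airy-formula Jones matrix for the SiO₂/Si stack (convention N = n − ik), and fixed, approximate refractive indices near 633 nm (N_SiO₂ ≈ 1.457, N_Si ≈ 3.88 − 0.02i) in place of the dispersion data of [35] and [14]. The names `film_rp_rs` and `virtual_mueller` and their parameters are hypothetical.

```python
import numpy as np

# Transition matrix T of Eq. (6) and its inverse
T = np.array([[1, 0, 0, 1], [1, 0, 0, -1], [0, 1, 1, 0], [0, 1j, -1j, 0]])
T_INV = np.linalg.inv(T)


def film_rp_rs(d, wavelength, theta0, n_film, n_sub, n0=1.0):
    """Airy coefficients of ambient | film | substrate (assumed N = n - ik convention)."""
    def q(n):  # N cos(theta) in each medium, from Snell's law
        return np.sqrt(n**2 - (n0 * np.sin(theta0)) ** 2 + 0j)

    q0, q1, q2 = q(n0), q(n_film), q(n_sub)
    rs01, rs12 = (q0 - q1) / (q0 + q1), (q1 - q2) / (q1 + q2)
    rp01 = (n_film**2 * q0 - n0**2 * q1) / (n_film**2 * q0 + n0**2 * q1)
    rp12 = (n_sub**2 * q1 - n_film**2 * q2) / (n_sub**2 * q1 + n_film**2 * q2)
    ph = np.exp(-4j * np.pi * d * q1 / wavelength)
    return ((rp01 + rp12 * ph) / (1 + rp01 * rp12 * ph),
            (rs01 + rs12 * ph) / (1 + rs01 * rs12 * ph))


def virtual_mueller(h0, sigma0, wavelength, theta0, n_film, n_sub, n=50_000, seed=0):
    """Normalized mean Mueller matrix of a film with Gaussian thickness spread."""
    rng = np.random.default_rng(seed)
    heights = rng.normal(h0, sigma0, size=n)                          # step 1
    rp, rs = film_rp_rs(heights, wavelength, theta0, n_film, n_sub)   # step 2, Eq. (2)
    J = np.zeros((n, 2, 2), dtype=complex)
    J[:, 0, 0], J[:, 1, 1] = rp, rs
    # step 3, Eq. (5): kron(J_i, J_i^*) for every sample, then T (...) T^-1
    kron = np.einsum("nij,nkl->nikjl", J, J.conj()).reshape(n, 4, 4)
    M_i = (T @ kron @ T_INV).real
    M_bar = M_i.mean(axis=0)                                          # step 4
    return M_bar / M_bar[0, 0]


M = virtual_mueller(2.0, 0.1, 633.0, np.deg2rad(70.0), 1.457, 3.88 - 0.02j)
```

Because Eq. (5) is linear in J ⊗ J*, averaging the Mueller matrices is equivalent to averaging J_i ⊗ J_i* first; the per-sample form is kept here only to mirror the steps listed above.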

With the generated virtual measurement data, the merit functions (8)–(10) can be tested by solving the problem:

(11)$$\hat{h} = \mathop {\arg \min }\limits_{h \in {{\mathbb R}}} {\textrm{SE}}(h). $$

The results for two examples are summarized in Table 1.

Table 1. Simulations: SiO₂-thicknesses and averaged standard deviations. Nominal and recovered values for different merit functions and two different combinations of (h₀, σ₀). While all merit functions give back the nominal values as the best-fit result, they fail to recover the given uncertainties.

Here, ${\hat{\sigma }_{\textrm{h}}}$ is the square root of the variance of the best-fit result. The variance (or, in the general multivariate case, the covariance matrix) can be calculated as

(12)$${\Sigma _{\boldsymbol p}} = \frac{{{\textrm{SE}}(\hat{{\boldsymbol p}})}}{{N - \textrm{dof}}} \cdot 2{{{\cal H}}^{ - 1}}(\hat{{\boldsymbol p}}), $$

with

(13)$${{{\cal H}}^{(k,l)}}(\hat{{\boldsymbol p}}) = {\left. {\frac{{{\partial^2}{\textrm{SE}}({\boldsymbol p})}}{{\partial {{\boldsymbol p}_k}\partial {{\boldsymbol p}_l}}}} \right|_{{\boldsymbol p} = \hat{{\boldsymbol p}}}}, $$

the Hessian matrix of the merit function at its optimum $\hat{{\boldsymbol p}}$, and dof, the number of degrees of freedom of the model (i.e. the number of fitted parameters). Note that Table 1 lists the values of ${\hat{\sigma }_h} \cdot \sqrt {N - \textrm{dof}} $. While ${\hat{\sigma }_h}$ is the standard deviation of the mean value $\hat{h}$, ${\hat{\sigma }_h} \cdot \sqrt {N - \textrm{dof}} $ is the square root of the averaged variance of a single measurement (at only one wavelength). For simplicity, we call it the averaged standard deviation here. It is important not to confuse the averaged standard deviation with the standard deviation of the mean, which usually decreases with the number of measurements N (e.g. the number of wavelengths used).
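Eqs. (12) and (13) can be evaluated with a finite-difference Hessian. The sketch below assumes NumPy, takes dof to be the number of fitted parameters, and checks the formula on an ordinary straight-line least-squares fit, where it must reproduce the textbook covariance s²(XᵀX)⁻¹. `covariance_from_merit` is a hypothetical helper, not the authors' implementation.

```python
import numpy as np


def covariance_from_merit(se, p_hat, n_meas, step=1e-3):
    """Eqs. (12)-(13): Sigma_p = SE(p_hat) / (N - dof) * 2 * Hessian^-1.

    se     : merit function of a parameter vector
    n_meas : number of configurations N
    dof is taken to be the number of fitted parameters (assumption).
    """
    p_hat = np.atleast_1d(np.asarray(p_hat, dtype=float))
    k = p_hat.size
    hess = np.empty((k, k))
    for a in range(k):
        for b in range(k):
            ea, eb = np.zeros(k), np.zeros(k)
            ea[a], eb[b] = step, step
            # central finite-difference estimate of d^2 SE / (dp_a dp_b), Eq. (13)
            hess[a, b] = (se(p_hat + ea + eb) - se(p_hat + ea - eb)
                          - se(p_hat - ea + eb) + se(p_hat - ea - eb)) / (4 * step**2)
    return se(p_hat) / (n_meas - k) * 2.0 * np.linalg.inv(hess)


# Check on a straight-line fit, where the result must equal s^2 (X^T X)^-1
x = np.linspace(0.0, 1.0, 11)
y = 1.0 + 2.0 * x + 0.01 * np.sin(7.0 * x)
X = np.column_stack([np.ones_like(x), x])
p_fit = np.linalg.lstsq(X, y, rcond=None)[0]


def sse(p):
    return float(np.sum((y - X @ p) ** 2))


cov = covariance_from_merit(sse, p_fit, n_meas=x.size)
sigma_mean = np.sqrt(np.diag(cov))                        # standard deviation of the mean
sigma_avg = sigma_mean * np.sqrt(x.size - p_fit.size)     # averaged standard deviation
```

The last line makes the distinction of the preceding paragraph concrete: `sigma_mean` shrinks as more configurations are added, whereas `sigma_avg` does not.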

Table 1 shows that all merit functions return exactly the correct SiO₂ thickness. As mentioned above, this is not surprising and should not be misinterpreted: the simulation was constructed such that for each wavelength the optimum is reached at the nominal height h₀, and therefore the different inherent spectral weights have no influence on the best-fit result (an example where this does not hold, so that the merit functions give different results, is given in Appendix 1). The picture changes, however, when we look at the uncertainties.

We would expect ${\hat{\sigma }_h} \cdot \sqrt {N - \textrm{dof}} $ to equal ${\sigma _0}$. However, none of the merit functions (Eqs. (8)–(10)) returns the correct result, so there is no point in discussing the differences among their results. One might argue that it is not reasonable to compare these equations without any experimental error estimate. But in this ideal simulation, there are no experimental errors at all, and in particular no spectrally dependent errors.

In our view, a merit function is needed that recovers the (given) sample parameter uncertainties correctly. We now derive a merit function that fulfills this requirement and later extend it to include experimental measurement errors as well.

A final remark on this section: Fig. 2 (left) shows the polarization index

(14)$${P_D} = {{{{[{{\textrm{tr}}({{{\boldsymbol M}^T}{\boldsymbol M}} )- M_{1,1}^2} ]}^{1/2}}} \mathord{\left/ {\vphantom {{{{[{tr({{{\boldsymbol M}^T}{\boldsymbol M}} )- M_{1,1}^2} ]}^{1/2}}} {\left( {\sqrt 3 \cdot {M_{1,1}}} \right)}}} \right.} {\left( {\sqrt 3 \cdot {M_{1,1}}} \right)}}$$

defined by Gil and Bernabeu [36], for the simulated measurement example with h₀=2 nm and σ₀=0.1 nm (^T denotes matrix transposition; see also Appendix 2). The polarization index stays very close to 1 throughout, indicating only slight depolarization that would be very hard to resolve experimentally. However, owing to its impact on the thickness uncertainty, this depolarization should not be neglected, as is common practice in large parts of the community. The thickness uncertainty here is of the order of one atomic layer, which is quite small. Nonetheless, even smaller uncertainties (corresponding to even lower depolarization) are often claimed in the literature (e.g. [7,37]).

Fig. 2. Polarization index (left) and entropy (right, discussion Appendix 2) for the simulated example with a SiO₂ thickness h₀=2 ± 0.1 nm.
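Eq. (14) is essentially a one-liner. This NumPy sketch assumes a real 4 × 4 matrix; by construction it returns 1 for any non-depolarizing matrix of the form of Eq. (7), 0 for the ideal depolarizer diag(1, 0, 0, 0), and a for the partial depolarizer diag(1, a, a, a). The function name is illustrative.

```python
import numpy as np


def polarization_index(M):
    """Eq. (14): P_D = sqrt(tr(M^T M) - M11^2) / (sqrt(3) * M11)."""
    M = np.asarray(M, dtype=float)
    return np.sqrt(np.trace(M.T @ M) - M[0, 0] ** 2) / (np.sqrt(3.0) * M[0, 0])


# Non-depolarizing isotropic matrix of the form of Eq. (7)
psi, delta = 0.3, 1.2
N, S, C = np.cos(2 * psi), np.sin(2 * psi) * np.sin(delta), np.sin(2 * psi) * np.cos(delta)
M_pure = np.array([[1, -N, 0, 0], [-N, 1, 0, 0], [0, 0, C, S], [0, 0, -S, C]])
M_partial = np.diag([1.0, 0.5, 0.5, 0.5])    # partial depolarizer
```

For M_pure, tr(MᵀM) = 2 + 2(N² + S² + C²) = 4, which is why the index is exactly 1 in the non-depolarizing limit and only slightly below 1 for the weak depolarization of Fig. 2.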

3. The new merit function

3.1 Derivation

We now derive a merit function that implicitly uses the measured sample depolarization as an error estimate, thereby overcoming the problems shown in the last section when using Eqs. (8)–(10).

We start with some general, well-known definitions. Suppose N Gaussian-distributed quantities x_m (each of dimension n) have been measured independently, with the uncertainty of each given as a covariance matrix, Σ_x. In an optimization process, the measurands are compared with simulated values x_s(p) to derive the best-fitting model parameter vector $\hat{{\boldsymbol p}}$. The likelihood function can then be formulated as the product of the N probability density functions:

(15)$${{\cal L}}({\boldsymbol p}) = \prod\limits_{k\textrm{ = 1}}^N {{{({2\pi } )}^{ - n/2}}{{|{{{\boldsymbol \Sigma }_{{\boldsymbol x},k}}} |}^{ - 1/2}}\exp \left( { - \frac{1}{2}{{({{{\boldsymbol x}_{{\textrm{m,k}}}} - {{\boldsymbol x}_{{\textrm{s,k}}}}({\boldsymbol p})} )}^T}{\boldsymbol \Sigma }_{{\boldsymbol x},k}^{ - 1}({{{\boldsymbol x}_{{\textrm{m}},{\textrm{k}}}} - {{\boldsymbol x}_{{\textrm{s}},{\textrm{k}}}}({\boldsymbol p})} )} \right)}$$

The best-fitting parameters can be extracted at the maximum of the likelihood function which is at the same position as the minimum of the negative log-likelihood function: ${{\hat{\boldsymbol p}} = }\mathop {\arg \max }\limits_{\boldsymbol p} {{\cal L}}({\boldsymbol p}) = \mathop {\arg \min }\limits_{\boldsymbol p} \ell ({\boldsymbol p})$, with

(16)$$\begin{aligned} \ell ({\boldsymbol p}) &={-} \log ({{{\cal L}}({\boldsymbol p})} )\\ &= \frac{{Nn}}{2}\ln 2\pi + \frac{1}{2}\sum\limits_{k = 1}^N {\ln |{{\Sigma _{x,k}}} |} + \frac{1}{2}\sum\limits_{k = 1}^N {{{({{{\boldsymbol x}_{{\textrm{m,k}}}} - {{\boldsymbol x}_{{\textrm{s,k}}}}({\boldsymbol p})} )}^T}{\boldsymbol \Sigma }_{{\boldsymbol x},k}^{ - 1}({{{\boldsymbol x}_{{\textrm{m,k}}}} - {{\boldsymbol x}_{{\textrm{s,k}}}}({\boldsymbol p})} )} \end{aligned}$$

Here, the covariance matrices are not a subject of optimization (Σ_x independent of p), so the maximum likelihood approach has the same optimum as the χ² function

(17)$${\chi ^2}({\boldsymbol p}) = \sum\limits_{k = 1}^N {{{({{{\boldsymbol x}_{{\textrm{m}},{\textrm{k}}}} - {{\boldsymbol x}_{{\textrm{s}},{\textrm{k}}}}({\boldsymbol p})} )}^{\textrm{T}}}{\boldsymbol \Sigma }_{{\boldsymbol x},k}^{ - 1}({{{\boldsymbol x}_{{\textrm{m}},{\textrm{k}}}} - {{\boldsymbol x}_{{\textrm{s}},{\textrm{k}}}}({\boldsymbol p})} )}, $$

which does not depend on the normalization factor ${({2\pi } )^{ - n/2}}{|{{{\boldsymbol \Sigma }_{{\boldsymbol x},k}}} |^{ - 1/2}} = const$. (Note: For complex variables, the Hermitian transposition (^†) should be used instead of the regular transposition (^T). Furthermore, for circularly symmetric complex random vectors, the exponent differs from that of real vectors by a factor of 2, which requires a slightly different normalization factor: ${\pi ^{ - n}}{|{{{\boldsymbol \Sigma }_{{\boldsymbol x},k}}} |^{ - 1}}$.)
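A direct transcription of Eq. (17) for complex measurands, using the Hermitian transposition mentioned in the note above. This sketch assumes NumPy and stacked arrays (residuals of shape (N, n), covariances of shape (N, n, n)); it solves Σ_k y_k = r_k instead of forming explicit inverses. The function name is illustrative.

```python
import numpy as np


def chi_squared(x_m, x_s, covs):
    """Eq. (17): sum_k r_k^H Sigma_k^-1 r_k with r_k = x_m,k - x_s,k.

    x_m, x_s : arrays of shape (N, n), real or complex
    covs     : array of shape (N, n, n), Hermitian positive definite
    """
    r = np.asarray(x_m) - np.asarray(x_s)
    # Solve Sigma_k y_k = r_k rather than inverting Sigma_k explicitly
    y = np.linalg.solve(covs, r[..., None])[..., 0]
    return float(np.real(np.sum(r.conj() * y)))
```

With identity covariances, the function reduces to the plain sum of squared moduli, i.e. to an unweighted merit function of the type of Eq. (9).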

The question is how to apply Eq. (17) in ellipsometry. Obviously, the summation must run over different configurations, differing in wavelength and/or angle of incidence. Less obvious is what to use for the quantities ${\boldsymbol \Sigma }$, ${{\boldsymbol x}_m}$, and ${{\boldsymbol x}_s}$ in the kernel of ${\chi ^2}$:

(18)$${{\textrm{K}}^2}: = {({{{\boldsymbol x}_{\textrm{m}}} - {{\boldsymbol x}_{\textrm{s}}}} )^{\dagger}}{\boldsymbol \Sigma }_{}^{ - 1}({{{\boldsymbol x}_{\textrm{m}}} - {{\boldsymbol x}_{\textrm{s}}}} )$$

To answer this, we follow the idea that Ossikovski and Arteaga used for the integral decomposition [33]. First, Cloude’s covariance matrix is needed [26]. It can be calculated from a Mueller matrix as

(19)$${\boldsymbol H} = \frac{1}{2}\sum\nolimits_{k,l\textrm{ = 1}}^4 {{M_{k,l}} \cdot {{\boldsymbol \sigma }_k} \otimes {\boldsymbol \sigma }_l^\ast }$$

with the help of the (modified) Pauli matrices ${{\boldsymbol \sigma }_1} = \left( {\begin{array}{cc} 1 &0\\ 0 &1 \end{array}} \right)$, ${{\boldsymbol \sigma }_2} = \left( {\begin{array}{cc} 1 &0\\ 0 &{ - 1} \end{array}} \right)$, ${{\boldsymbol \sigma }_3} = \left( {\begin{array}{cc} 0 &1\\ 1 &0 \end{array}} \right)$ and ${{\boldsymbol \sigma }_4} = \left( {\begin{array}{cc} 0 &i\\ { - i} &0 \end{array}} \right)$.
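A NumPy sketch of Eq. (19), assuming that the σ_l enter complex-conjugated (σ_k ⊗ σ_l*); with unconjugated σ_l, H of a non-depolarizing matrix would generally not be positive semidefinite. For a non-depolarizing isotropic matrix of the form of Eq. (7), H should come out Hermitian with trace 2 and rank one, its only non-zero eigenvalue being 2. `cloude_covariance` is a hypothetical name.

```python
import numpy as np

# (Modified) Pauli matrices as defined in the text
SIGMA = [np.array([[1, 0], [0, 1]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, 1j], [-1j, 0]], dtype=complex)]


def cloude_covariance(M):
    """Eq. (19): H = 1/2 sum_{k,l} M_kl sigma_k (x) sigma_l^*."""
    H = np.zeros((4, 4), dtype=complex)
    for k in range(4):
        for l in range(4):
            H += 0.5 * M[k, l] * np.kron(SIGMA[k], SIGMA[l].conj())
    return H


# Non-depolarizing isotropic example of the form of Eq. (7)
psi, delta = 0.35, 2.2
N, S, C = np.cos(2 * psi), np.sin(2 * psi) * np.sin(delta), np.sin(2 * psi) * np.cos(delta)
M = np.array([[1, -N, 0, 0], [-N, 1, 0, 0], [0, 0, C, S], [0, 0, -S, C]])
H = cloude_covariance(M)
```

In this example, the dominant eigenvector is proportional to (sin Ψ e^(−iΔ), 0, 0, cos Ψ), i.e. to the complex conjugate of the normalized Jones elements (r_p, 0, 0, r_s), consistent with the daggered definition of j used in the text.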

H is Hermitian and, as shown in Cloude’s original paper [26], can be eigen-decomposed as:

(20)$${\boldsymbol H} = {\boldsymbol V}{\boldsymbol \Lambda }{{\boldsymbol V}^{\dagger} } = \sum\nolimits_{k = 1}^4 {{\lambda _k}{{\boldsymbol v}_k}{\boldsymbol v}_k^{\dagger} } = \sum\nolimits_{k = 1}^4 {{{\boldsymbol j}_k}{\boldsymbol j}_k^{\dagger} }$$

with V the eigenvector matrix, Λ the diagonal matrix of the eigenvalues, and ${{\boldsymbol j}_k} = \sqrt {{\lambda _k}} {{\boldsymbol v}_k}$. Each j_k is a vector representation of a Jones matrix ${\boldsymbol J} = \left( {\begin{array}{cc} {{j_1}}&{{j_2}}\\ {{j_3}}&{{j_4}} \end{array}} \right)$: ${\boldsymbol j} = {({j_1},{j_2},{j_3},{j_4})^{\dagger} }$. Ossikovski and Arteaga [33] extracted the dominant j_k (the one with the largest λ_k) from H; without loss of generality, let us call it j₁. They then interpreted j₁ as the measurand and used the reduced covariance matrix ${{\boldsymbol H}_\textrm{1}} = \sum\nolimits_{k = 2}^4 {{{\boldsymbol j}_k}{\boldsymbol j}_k^{\dagger} }$ as an estimate of its statistical uncertainty. With this and the simulated Jones element vector ${{\boldsymbol j}_\textrm{s}}$, expression (18) becomes:

(21)$${\textrm{K}}_1^2 = {({{{\boldsymbol j}_1} - {{\boldsymbol j}_s}} )^{\dagger} }{\boldsymbol {H}}_1^ + ({{{\boldsymbol j}_1} - {{\boldsymbol j}_s}} ). $$

The normalization of the Mueller matrix implies $\sum\nolimits_{k = 1}^4 {{\lambda _k}} = 2$: since tr(A ⊗ B) = tr A · tr B and only σ₁ has a non-zero trace (tr σ₁ = 2), only the M_{1,1} term of Eq. (19) contributes to the trace, giving tr H = 2M_{1,1} = 2, and the trace of a square matrix equals the sum of its eigenvalues. Thus, expression (21) only makes sense if j_s is normalized analogously:

(22)$${{\boldsymbol j}_\textrm{s}}: = \sqrt 2 {{\boldsymbol j}_\textrm{s}}/||{{{\boldsymbol j}_\textrm{s}}} ||. $$

Since H₁ is built from only three j-vectors, it is not of full rank and therefore singular, so the pseudo-inverse ${\boldsymbol H}_1^ + $ is used instead of ${\boldsymbol H}_1^{ - 1}$.
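The extraction of j₁ and H₁ and the evaluation of Eq. (21) can be sketched as follows. This assumes NumPy and a Hermitian pseudo-inverse that discards eigenvalues below a relative threshold (the value 1e-10 is an arbitrary choice); `pinv_hermitian` and `k1_squared` are hypothetical names. Because H₁⁺ annihilates j₁, the result does not depend on the arbitrary global phase that the eigensolver assigns to j₁.

```python
import numpy as np


def pinv_hermitian(A, tol=1e-10):
    """Pseudo-inverse of a Hermitian PSD matrix; eigenvalues below tol*max are dropped."""
    lam, V = np.linalg.eigh(A)
    inv = np.zeros_like(lam)
    keep = lam > tol * lam.max()
    inv[keep] = 1.0 / lam[keep]
    return (V * inv) @ V.conj().T


def k1_squared(H, j_s):
    """Eqs. (20)-(22): dominant j_1, reduced H_1 = sum_{k>=2} j_k j_k^H, and K_1^2."""
    lam, V = np.linalg.eigh(H)                        # ascending eigenvalues
    J = V * np.sqrt(np.clip(lam, 0.0, None))          # columns j_k = sqrt(lambda_k) v_k
    j1 = J[:, -1]                                     # dominant component
    H1 = J[:, :-1] @ J[:, :-1].conj().T               # remaining three components
    j_s = np.sqrt(2.0) * j_s / np.linalg.norm(j_s)    # normalization of Eq. (22)
    d = j1 - j_s
    return float(np.real(d.conj() @ pinv_hermitian(H1) @ d))


# Random positive definite test matrix with trace 2, like a normalized Cloude matrix
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = A @ A.conj().T
H *= 2.0 / np.trace(H).real
j_s = rng.normal(size=4) + 1j * rng.normal(size=4)
k1 = k1_squared(H, j_s)
```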

Following the same argument, one can also extract ${{\boldsymbol j}_{2,\ldots ,4}}$ from H and define H_2,…,4 accordingly. Combining all terms $K_{1,\ldots ,4}^2$ leads to

(23)$${{\textrm{K}}^2} = \frac{1}{3}\sum\nolimits_{k = 1}^4 {{{({{{\boldsymbol j}_k} - {{\boldsymbol j}_s}} )}^{\dagger} }{\boldsymbol {H}}_k^ + ({{{\boldsymbol j}_k} - {{\boldsymbol j}_s}} )}. $$

The reason for the normalization factor 1/3 will become clear when we analyze this expression further. We do this stepwise. First, we expand the expression and apply the pseudo-inversion:

(24)$$\begin{array}{r} {{\textrm{K}}^2} = \frac{1}{3}[{({{{\boldsymbol j}_1} - {{\boldsymbol j}_s}} )^{\dagger} }({0 + \lambda_2^{ - 1}{{\boldsymbol v}_2}{\boldsymbol v}_2^{\dagger} + \lambda_3^{ - 1}{{\boldsymbol v}_3}{\boldsymbol v}_3^{\dagger} + \lambda_4^{ - 1}{{\boldsymbol v}_4}{\boldsymbol v}_4^{\dagger} } )({{{\boldsymbol j}_1} - {{\boldsymbol j}_s}} )\;\\ + {({{{\boldsymbol j}_2} - {{\boldsymbol j}_s}} )^{\dagger} }({\lambda_1^{ - 1}{{\boldsymbol v}_1}{\boldsymbol v}_1^{\dagger} + 0 + \lambda_3^{ - 1}{{\boldsymbol v}_3}{\boldsymbol v}_3^{\dagger} + \lambda_4^{ - 1}{{\boldsymbol v}_4}{\boldsymbol v}_4^{\dagger} } )({{{\boldsymbol j}_2} - {{\boldsymbol j}_s}} )\;\\ + {({{{\boldsymbol j}_3} - {{\boldsymbol j}_s}} )^{\dagger} }({\lambda_1^{ - 1}{{\boldsymbol v}_1}{\boldsymbol v}_1^{\dagger} + \lambda_2^{ - 1}{{\boldsymbol v}_2}{\boldsymbol v}_2^{\dagger} + 0 + \lambda_4^{ - 1}{{\boldsymbol v}_4}{\boldsymbol v}_4^{\dagger} } )({{{\boldsymbol j}_3} - {{\boldsymbol j}_s}} )\;\\ + {({{{\boldsymbol j}_4} - {{\boldsymbol j}_s}} )^{\dagger} }({\lambda_1^{ - 1}{{\boldsymbol v}_1}{\boldsymbol v}_1^{\dagger} + \lambda_2^{ - 1}{{\boldsymbol v}_2}{\boldsymbol v}_2^{\dagger} + \lambda_3^{ - 1}{{\boldsymbol v}_3}{\boldsymbol v}_3^{\dagger} + 0} )({{{\boldsymbol j}_4} - {{\boldsymbol j}_s}} )] \end{array}$$

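Now, one can benefit from the orthogonality of the eigenvectors, ${\boldsymbol v}_k^{\dagger}{{\boldsymbol v}_l} = 0$ for $k \ne l$. It implies ${\boldsymbol H}_k^ + {{\boldsymbol j}_k} = {\boldsymbol 0}$, so all terms containing j_k vanish, and each term $\lambda_l^{ - 1}{{\boldsymbol v}_l}{\boldsymbol v}_l^{\dagger}$ appears in exactly three of the four brackets: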

(25)$$\begin{aligned}{{\textrm{K}}^2} &= \frac{1}{3}[{\boldsymbol j}_s^{\dagger} \left( {3{\kern 1pt} {\kern 1pt} {\kern 1pt} \sum\nolimits_{k = 1}^4 {\lambda_k^{ - 1}{{\boldsymbol v}_k}{\boldsymbol v}_k^{\dagger} } } \right){{\boldsymbol j}_s}]\\ &= {\boldsymbol j}_s^{\dagger} {{\boldsymbol H}^ + }{{\boldsymbol j}_s} \end{aligned}$$

This is a very simple result, and thanks to the normalization of ${{\boldsymbol j}_\textrm{s}}$ (Eq. (22)), the trivial minimum at ${{\boldsymbol j}_\textrm{s}} = {\boldsymbol 0}$ is excluded.
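The step from Eq. (23) to Eq. (25) can be checked numerically for a random positive definite H with trace 2, as in the following sketch. NumPy is assumed; the pseudo-inverse helper and its threshold are illustrative choices, and `k2_eq23`/`k2_eq25` are hypothetical names.

```python
import numpy as np


def pinv_hermitian(A, tol=1e-10):
    """Pseudo-inverse of a Hermitian PSD matrix; eigenvalues below tol*max are dropped."""
    lam, V = np.linalg.eigh(A)
    inv = np.zeros_like(lam)
    keep = lam > tol * lam.max()
    inv[keep] = 1.0 / lam[keep]
    return (V * inv) @ V.conj().T


def k2_eq23(H, j_s):
    """Eq. (23): (1/3) sum_k (j_k - j_s)^H H_k^+ (j_k - j_s), H_k built without j_k."""
    lam, V = np.linalg.eigh(H)
    J = V * np.sqrt(np.clip(lam, 0.0, None))
    total = 0.0
    for k in range(4):
        others = [l for l in range(4) if l != k]
        Hk = J[:, others] @ J[:, others].conj().T
        d = J[:, k] - j_s
        total += np.real(d.conj() @ pinv_hermitian(Hk) @ d)
    return total / 3.0


def k2_eq25(H, j_s):
    """Eq. (25): j_s^H H^+ j_s."""
    return float(np.real(j_s.conj() @ pinv_hermitian(H) @ j_s))


rng = np.random.default_rng(7)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = A @ A.conj().T
H *= 2.0 / np.trace(H).real
j_s = rng.normal(size=4) + 1j * rng.normal(size=4)
j_s *= np.sqrt(2.0) / np.linalg.norm(j_s)   # normalization of Eq. (22)
```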

Because of ${\boldsymbol j}_s^{\dagger} {{\boldsymbol H}^ + }{{\boldsymbol j}_s} = {\textrm{tr}}({{{\boldsymbol j}_s}{\boldsymbol j}_s^{\dagger} {{\boldsymbol H}^ + }} )= {\textrm{tr}}({{{\boldsymbol H}_s}{{\boldsymbol H}^ + }} )$, with the simulated covariance matrix ${{\boldsymbol H}_s} = {{\boldsymbol j}_s}{\boldsymbol j}_s^{\dagger}$, the term can also be interpreted as a comparison between a simulated and a measured Cloude covariance matrix. The quantities are sketched graphically, in a simplified way, in Fig. 3.

Fig. 3. Simplified 2D illustration of the quantities j and H: Cloude’s covariance matrix H represents a 4D-ellipsoid, which is shown here as an ellipse. The simulated test candidate, j_s, is scanned on a 4D-sphere (here a circle) to find the closest distance to the ellipsoid. In the non-depolarizing limit, the ellipsoid degenerates to a line (three eigenvalues are 0, one is 2). Full depolarization results in a sphere (all eigenvalues are equal, 0.5).

Finally, we end up with:

(26)$${\chi ^2}({\boldsymbol p}) = \sum\limits_{\lambda = 1}^N {{\boldsymbol j}_{s,\lambda }^{\dagger} \textrm{(}{\boldsymbol p}\textrm{)}{\boldsymbol H}_\lambda ^ + {{\boldsymbol j}_{s,\lambda }}\textrm{(}{\boldsymbol p}\textrm{)}}, $$

and a likelihood function of:

(27)$${{\cal L}}({\boldsymbol p}) = \prod\limits_{\lambda \textrm{ = 1}}^N {{{\textrm{C}}_\lambda }\exp \left( { - \frac{1}{2}{\boldsymbol j}_{s,\lambda }^{\dagger} ({\boldsymbol p}){\boldsymbol {H}}_\lambda^ + {{\boldsymbol j}_{s,\lambda }}({\boldsymbol p})} \right)}. $$

The values ${{\textrm{C}}_\lambda }$ are constant normalization parameters².
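As a minimal sketch (not the authors' implementation), Eqs. (25) to (27) can be evaluated per wavelength with NumPy's Hermitian pseudo-inverse, `numpy.linalg.pinv(..., hermitian=True)`. We assume that the simulated vectors j_s,λ and the matrices H_λ are available as complex 4-vectors and Hermitian 4 × 4 arrays in the same basis; the function names are hypothetical, and the constants C_λ are dropped because they only shift the log-likelihood. The normalization |j|² = 2 in the example is an assumption chosen so that tr(H) = 2, as for M_1,1-normalized Mueller matrices (Appendix 2).

```python
import numpy as np

def merit_term(j_s: np.ndarray, H: np.ndarray) -> float:
    """K^2 = j_s^dagger H^+ j_s for a single wavelength, cf. Eq. (25)."""
    H_pinv = np.linalg.pinv(H, hermitian=True)          # Moore-Penrose pseudo-inverse
    return float(np.real(np.vdot(j_s, H_pinv @ j_s)))  # np.vdot conjugates j_s

def chi2(j_list, H_list) -> float:
    """Sum of the per-wavelength terms, cf. Eq. (26)."""
    return sum(merit_term(j, H) for j, H in zip(j_list, H_list))

def log_likelihood(j_list, H_list) -> float:
    """ln L of Eq. (27) without the p-independent term sum(ln C_lambda)."""
    return -0.5 * chi2(j_list, H_list)

# Non-depolarizing example: H0 = j0 j0^dagger is singular (eigenvalues 2, 0, 0, 0)
j0 = np.array([1.0, 1.0j, 0.0, 0.0])   # assumed normalization |j0|^2 = 2, so tr(H0) = 2
H0 = np.outer(j0, j0.conj())
print(merit_term(j0, H0))              # close to 1 for the matching candidate

# Numerical check of j^dagger H^+ j = tr(j j^dagger H^+) for a random full-rank H
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
H = A @ A.conj().T
j = rng.normal(size=4) + 1j * rng.normal(size=4)
lhs = merit_term(j, H)
rhs = np.real(np.trace(np.outer(j, j.conj()) @ np.linalg.pinv(H, hermitian=True)))
print(np.isclose(lhs, rhs))            # True
```

The pseudo-inverse simply ignores directions in which H has zero eigenvalues, which is why it suffices for the noise-free simulations.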

With the merit function SE_H = χ² from Eq. (26), we indeed recover the nominal values for our initial examples (see Table 2). We also confirmed that the merit function works equally well for other thicknesses and uncertainties. Moreover, we checked that the results are correct for each single wavelength. From this, we heuristically conclude that the inherent spectral weighting is correct as well.

Table 2. Simulations: SiO₂ thicknesses and averaged standard deviations. Nominal and recovered values agree for the new merit function SE_H.


In Appendix 1, we give an additional example of an absorbing layer.

3.2 Extension to include measurement noise

We have demonstrated the new merit function using simulated data. Now we discuss its application to real data. Here, a practical problem arises: the inversion of Cloude’s covariance matrix in Eq. (26) may cause difficulties. For the simulated data, the Cloude covariance matrix had two eigenvalues equal to zero. We overcame the problem of inverting this singular matrix by using the pseudo-inverse. In practice, and especially in an important application of ellipsometry, namely the characterization of isotropic layers, one often has to deal with eigenvalues close to zero. Such matrices are ill-conditioned for inversion.

Two simple solutions exist: 1) If the sample is known a priori to be isotropic, one can neglect the (zero) off-diagonal elements of the Jones matrix and reduce the 4 × 1 j-vector to a 2 × 1 vector and, equivalently, the 4 × 4 H-matrix to a 2 × 2 matrix. 2) One can apply matrix regularization. For example, one can use Tikhonov regularization [38] and add a small number (e.g. 10⁻⁸) to all eigenvalues before they are inverted. Effectively, this adds some noise to obtain a more stable solution.
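Option 2) can be sketched in a few lines, assuming NumPy; the helper name `tikhonov_inverse` is hypothetical, and ε = 10⁻⁸ is the example value from the text. Adding ε to every eigenvalue before inversion is the same as inverting H + εI.

```python
import numpy as np

def tikhonov_inverse(H: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Invert a Hermitian matrix after adding eps to all of its eigenvalues.

    This equals inv(H + eps * I); eps = 1e-8 is the example value from the text.
    """
    w, V = np.linalg.eigh(H)                  # H = V diag(w) V^dagger
    return (V / (w + eps)) @ V.conj().T       # V diag(1 / (w + eps)) V^dagger

j0 = np.array([1.0, 1.0j, 0.0, 0.0])
H0 = np.outer(j0, j0.conj())                  # singular: eigenvalues 2, 0, 0, 0
H0_inv = tikhonov_inverse(H0)
print(np.isfinite(H0_inv).all())              # True, although H0 is singular
print(np.real(np.vdot(j0, H0_inv @ j0)))      # 2 / (2 + eps), i.e. close to 1
```

Design note: unlike the pseudo-inverse, which ignores components of j_s outside the support of H, the regularized inverse penalizes such components with the large weight 1/ε.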

Both methods are easy to implement and well established; therefore, we do not discuss them any further here. Instead, we propose another method that also allows us to incorporate statistical measurement noise. To derive it, we first take a closer look at the measured Mueller matrices of isotropic samples. It turns out that there are not only two small eigenvalues but sometimes even tiny negative ones. These are unphysical, as Cloude pointed out in [26] (they indicate, e.g., a polarization index larger than one or negative intensities). He therefore proposed what we will call “Cloude-filtering” here: physically realizable Mueller matrices correspond to positive semidefinite Cloude covariance matrices, so the Mueller matrix is transformed into its Cloude covariance matrix, any negative eigenvalues are set to zero, and the corrected covariance matrix is transformed back into a then valid Mueller matrix. Note that the filtered covariance matrix is the positive semidefinite matrix nearest to the original one in the Frobenius norm [39].
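A minimal NumPy sketch of Cloude-filtering follows. It assumes the mapping H = ½ Σ_ik M_ik σ_i ⊗ σ_k* with the Pauli matrices in Stokes order; this choice is consistent with tr(H) = 2M_1,1 (Eq. (34)) and gives a rank-one H for any non-depolarizing Jones matrix. The authors’ Eq. (19) may order or phase the basis differently; we expect this to leave the eigenvalues, and thus the filter, unaffected. All function names are hypothetical.

```python
import numpy as np

# Pauli matrices in Stokes order (I, Ix - Iy, I45 - I-45, IR - IL): assumed convention
SIGMA = [np.eye(2), np.array([[1, 0], [0, -1]]),
         np.array([[0, 1], [1, 0]]), np.array([[0, -1j], [1j, 0]])]
# B[i, k] = sigma_i (x) conj(sigma_k), stored as a complex 4 x 4 x 4 x 4 array
B = np.array([[np.kron(si, np.conj(sk)) for sk in SIGMA] for si in SIGMA])

def mueller_to_cov(M):
    """H = 1/2 sum_ik M_ik sigma_i (x) sigma_k^*, so that tr(H) = 2 M_11."""
    return 0.5 * np.einsum("...ik,ikab->...ab", M, B)

def cov_to_mueller(H):
    """Inverse map M_ik = 1/2 tr(H sigma_i (x) sigma_k^*)."""
    return 0.5 * np.einsum("...ab,ikba->...ik", H, B).real

def cloude_filter(M):
    """Set negative eigenvalues of H to zero and map back to a Mueller matrix."""
    w, V = np.linalg.eigh(mueller_to_cov(M))
    w = np.clip(w, 0.0, None)               # nearest PSD matrix in the Frobenius norm
    H_psd = (V * w[..., None, :]) @ V.conj().swapaxes(-1, -2)
    return cov_to_mueller(H_psd)

M_bad = np.diag([1.0, 1.1, 1.1, 1.1])     # polarization index 1.1 > 1: unphysical
print(np.linalg.eigvalsh(mueller_to_cov(M_bad)))  # approximately [-0.05, -0.05, -0.05, 2.15]
print(cloude_filter(M_bad))               # approximately 1.075 * identity
```

For this example, setting the three eigenvalues of −0.05 to zero leaves 2.15 along the single remaining direction, which maps back to 1.075 times the identity matrix, i.e. an ideal non-depolarizing element after renormalization.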

But a general question remains: How can there be seemingly no (or even “negative”) noise in one or more eigenvector directions when measurements are always noisy?

This can be explained by the fact that a Mueller matrix is not measured directly but calculated from valid and (slightly) invalid raw data (here, from Fourier coefficients describing the polarization ellipse measured at the detector while the analyzer is rotated; see Appendix 3). Invalid data can be caused by statistical noise or by systematic measurement errors (e.g. owing to a non-linear detector response or a wrong detector offset correction). While systematic errors should preferably be addressed by technical (hardware) improvements, we focus on the treatment of statistical noise. The proposed method is then straightforward: it propagates the measurement noise to the Cloude covariance matrix H used in the merit function:

1. Suppose raw data evaluation gives ${\boldsymbol m} = {({{M_{1,2}},\ldots ,{M_{4,4}}} )^{\textrm{T}}}$, the 15 best-fitting Mueller matrix elements as a vector and the associated covariance matrix ${{\boldsymbol \Sigma }_{\boldsymbol m}}$ (see Appendix 3 for the calculation of m and ${{\boldsymbol \Sigma }_{\boldsymbol m}}$).
2. Generate a large set (n of the order of several thousand) of Mueller matrix element vectors $\{ {\boldsymbol m}_1,\ldots ,{\boldsymbol m}_n\} \sim {\cal N}({\boldsymbol m},{{\boldsymbol \Sigma }_{\boldsymbol m}})$ and build the corresponding Mueller matrices $\{ {{\boldsymbol M}_1},\ldots ,{{\boldsymbol M}_n}\} $ from them³.
3. Cloude-filter these Mueller matrices: ${\boldsymbol M}'_{1,\ldots ,n} := {\textrm{CF}}({\boldsymbol M}_{1,\ldots ,n})$.
4. Renormalize each filtered matrix to its (1,1)-element: ${\boldsymbol M}''_k := {\boldsymbol M}'_k/M'_{k;1,1}$ for $k = 1,\ldots ,n$.
5. Build the mean matrix: ${{\boldsymbol M}_{{\textrm{CF}}}}: = {n^{ - 1}}\sum\nolimits_{k = 1}^n {{\boldsymbol M}^{\boldsymbol \prime\prime}}_{k} $.
6. Calculate its Cloude covariance matrix ${{\boldsymbol H}_{{\textrm{CF}}}}$ and use ${\boldsymbol j}_{\textrm{s}}^{\dagger} {\boldsymbol H}_{{\textrm{CF}}}^ + {{\boldsymbol j}_{\textrm{s}}}$ in the merit function (26) or (27), respectively.
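Steps 1 to 6 can be sketched as follows. This is an illustrative NumPy implementation, not the authors’ code: the Mueller/covariance mapping uses the assumed convention H = ½ Σ_ik M_ik σ_i ⊗ σ_k* (consistent with tr(H) = 2M_1,1, Eq. (34)), the function names are hypothetical, the 15 elements of m are assumed to be ordered row-wise from M_1,2 to M_4,4, and the isotropic noise model Σ_m = σ_m² I_15 is the simple choice used later in this section. All samples are processed as one batch, since `numpy.linalg.eigh` accepts stacked matrices.

```python
import numpy as np

# Assumed convention: H = 1/2 sum_ik M_ik sigma_i (x) conj(sigma_k), Pauli matrices in Stokes order
SIGMA = [np.eye(2), np.array([[1, 0], [0, -1]]),
         np.array([[0, 1], [1, 0]]), np.array([[0, -1j], [1j, 0]])]
B = np.array([[np.kron(si, np.conj(sk)) for sk in SIGMA] for si in SIGMA])

def mueller_to_cov(M):
    return 0.5 * np.einsum("...ik,ikab->...ab", M, B)

def cov_to_mueller(H):
    return 0.5 * np.einsum("...ab,ikba->...ik", H, B).real

def cloude_filter(M):
    w, V = np.linalg.eigh(mueller_to_cov(M))
    H_psd = (V * np.clip(w, 0.0, None)[..., None, :]) @ V.conj().swapaxes(-1, -2)
    return cov_to_mueller(H_psd)

def noise_aware_cov(m, Sigma_m, n=5000, seed=0):
    """Steps 2-6: sample, Cloude-filter, renormalize, average, return H_CF."""
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(m, Sigma_m, size=n)            # step 2
    M = np.concatenate([np.ones((n, 1)), samples], axis=1).reshape(n, 4, 4)
    M_f = cloude_filter(M)                                           # step 3
    M_f = M_f / M_f[:, :1, :1]                                       # step 4
    M_cf = M_f.mean(axis=0)                                          # step 5
    return mueller_to_cov(M_cf)                                      # step 6

# Step 1 (hypothetical input): ideal non-depolarizing sample, M = identity,
# with the isotropic noise model Sigma_m = sigma_m^2 * I_15
m = np.eye(4).ravel()[1:]
H_cf = noise_aware_cov(m, (1e-3) ** 2 * np.eye(15))
print(np.linalg.eigvalsh(H_cf))   # one eigenvalue close to 2, three small positive ones
```

In this example, any σ_m > 0 in practice makes all four eigenvalues of H_CF positive, so H_CF can be inverted directly, while H_CF approaches the noise-free covariance matrix as σ_m → 0.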

With this extension, the merit function includes not only depolarization but also information about the statistical measurement noise given by ${{\boldsymbol \Sigma }_{\boldsymbol m}}$. The Cloude covariance matrix ${{\boldsymbol H}_{{\textrm{CF}}}}$ is built from physically valid Mueller matrices only, and hence negative eigenvalues can no longer occur. Eigenvalues of zero (singular matrices) are only possible in the asymptotic limit of vanishing noise.

With decreasing measurement noise, ${{\boldsymbol H}_{{\textrm{CF}}}}$ asymptotically approaches the original H. Experimentally, this can be achieved by increasing the number of measurements N, which lowers the uncertainty of the mean Mueller matrix (${{\boldsymbol \Sigma }_{\boldsymbol m}}\sim 1/\textrm{N}$). In the opposite limiting case, when there is only noise, the method also behaves as desired: it leads to seemingly full depolarization.

Fig. 4. Application of the method to incorporate raw data noise into the data evaluation. Simulated example: a SiO₂ layer on Si with a thickness of 2.0 ± 0.1 nm. Starting from the true uncertainty of 0.1 nm, the recovered uncertainty increases with the raw data noise level. In typical measurements with our device, the noise level is between 10⁻¹⁰ and 10⁻⁹.


We demonstrate the application of this method with our initial example, the 2.0 ± 0.1 nm-thick SiO₂ layer on Si. To this end, we generate different noise levels. For simplicity, they depend on only one scalar parameter, σ_m: we define ${{\boldsymbol \Sigma }_{\boldsymbol m}}({\sigma _{\boldsymbol m}}) = \sigma _{\boldsymbol m}^2{{\textrm{I}}_{15}}$, with I₁₅ being the 15 × 15 identity matrix, and use it for all wavelengths. The quantity of interest is the recovered SiO₂ height uncertainty as a function of the noise level. The result is shown in Fig. 4.

As can be seen, and as described above, the nominal sample uncertainty (here 0.1 nm) is recovered in the no-noise limit, while the recovered uncertainty increases with the raw data noise. Although everything is plausible, we do not and cannot claim that the propagation of the raw data errors to the parameter uncertainty is fully correct. Gaussian error propagation cannot be used here, as the requirement of unconstrained variables is not fulfilled: the eigenvalues are restricted to the interval [0,2]. A more rigorous approach would be to use Bayesian statistics. However, this would be extremely costly, because for each wavelength used, the posterior distribution of each Mueller matrix element would have to be calculated numerically before its expected value could be determined. The proposed method, on the other hand, is computationally cheap. For example, the generation and Cloude-filtering of 5,000 Mueller matrices takes less than 0.2 s on a modern machine (without parallelization). Note that this only needs to be done once, as part of the raw data processing.

Effectively, the method adds noise to the covariance matrix. In this way, the matrix inversion problem with which we started this section is solved by a measurement noise-dependent regularization. The inclusion of statistical measurement noise is an important improvement when aiming at quantitative ellipsometry.

4. Application on measured data

4.1 Data evaluation tools

Before we apply our method to real data in the next subsection, we define some helpful data evaluation tools.

So far, we have derived the tools to process ellipsometric raw data into a covariance matrix that includes the sample’s polarization properties and the statistical noise of the experiment. Furthermore, a χ² function and a likelihood function for data analysis have been defined. With the extension of the last section, these functions are now given by

(28)$${\chi ^2}({\boldsymbol p}) = \sum\limits_{\lambda = 1}^N {{\boldsymbol j}_{{\textrm{s}},\lambda }^{\dagger} \textrm{(}{\boldsymbol p}\textrm{)}{\boldsymbol H}_{{\textrm{CF}},\lambda }^ + {{\boldsymbol j}_{{\textrm{s}},\lambda }}\textrm{(}{\boldsymbol p}\textrm{)}}$$

and

(29)$${{\cal L}}({\boldsymbol p}) = \prod\limits_{\lambda \textrm{ = 1}}^N {{C_{{\textrm{CF}},\lambda }}\exp \left( { - \frac{1}{2}{\boldsymbol j}_{{\textrm{s}},\lambda }^{\dagger} ({\boldsymbol p}){\boldsymbol {H}}_{{\textrm{CF}},\lambda }^ + {{\boldsymbol j}_{{\textrm{s}},\lambda }}({\boldsymbol p})} \right)}. $$

(Please also notice Appendix 4 for information on further weight factors which can be applied in the equations above.)

As shown in Sections 2 and 3, the averaged covariance is of high relevance as it contains the physical parameter uncertainties. Therefore, we define the average likelihood function as

(30)$${\bar{{\cal L}}}({\boldsymbol p}) = {\left[ {\prod\limits_{\lambda \textrm{ = 1}}^N {{{\textrm{C}}_{{\textrm{CF}},\lambda }}\exp \left( { - \frac{1}{2}{\boldsymbol j}_{{\textrm{s}},\lambda }^{\dagger} ({\boldsymbol p}){\boldsymbol {H}}_{{\textrm{CF}},\lambda }^ + {{\boldsymbol j}_{{\textrm{s}},\lambda }}({\boldsymbol p})} \right)} } \right]^{1/N}}. $$

This allows us to

• apply the maximum likelihood method: $\hat{{\boldsymbol p}} = \arg \mathop {\max }\limits_{\boldsymbol p} {\bar{{\cal L}}}({\boldsymbol p})$,
• calculate the posterior distribution (π_o, “out”) from a given prior distribution (π_i, “in”) with the help of Bayesian statistics:
(31)$${\pi _o}({\boldsymbol p}) = \frac{{{\bar{{\cal L}}}({\boldsymbol p}){\pi _i}({\boldsymbol p})}}{{\int {{\bar{{\cal L}}}({\boldsymbol p}){\pi _i}({\boldsymbol p})d{\boldsymbol p}} }},$$
• and define the Bayesian information criterion [40] to compare fit models relative to each other:
(32)$${\textrm{BIC}} ={-} 2\ln ({{\cal L}}(\hat{{\boldsymbol p}})) + k\ln (N) ={-} 2N\ln ({\bar{{\cal L}}}(\hat{{\boldsymbol p}})) + k\ln (N)$$
(models with lower values are preferred; k is the number of free parameters). The information criterion balances fit quality (first summand) against model complexity (second summand) and thus helps to prevent under- and overfitting.

Since the normalization factors C_CF,λ and the covariance matrices H_CF,λ do not depend on the fit parameter vector p, the maximum likelihood method is equivalent to χ² minimization. When the posterior distribution is calculated with the Bayesian approach, the C_CF,λ values are also irrelevant, as they cancel out. The BIC is affected by the C_CF,λ values only in the form of an offset, which is irrelevant when comparing models. So, for our purposes,

(33)$${\textrm{BI}}{{\textrm{C}}_{{\textrm{red}}}} ={-} 2\ln ({{\cal L}}(\hat{{\boldsymbol p}})) + k\ln (N) + 2\sum\nolimits_{\lambda = 1}^N {\ln {{\textrm{C}}_{{\textrm{CF}},\lambda }}} = {\chi ^2}(\hat{{\boldsymbol p}}) + k\ln (N)$$

is sufficient. In the next subsection, we apply these tools to measured data.
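For reference, the step from Eqs. (29) and (30) to Eq. (33) can be written out by taking logarithms; since the C_CF,λ do not depend on p, they contribute only a constant:

$$\begin{aligned} \ln {\cal L}({\boldsymbol p}) &= \sum\nolimits_{\lambda = 1}^N \ln {C_{{\textrm{CF}},\lambda }} - \frac{1}{2}{\chi ^2}({\boldsymbol p}), \qquad \ln \bar{\cal L}({\boldsymbol p}) = \frac{1}{N}\ln {\cal L}({\boldsymbol p}),\\ \hat{\boldsymbol p} &= \arg \mathop {\max }\limits_{\boldsymbol p} \bar{\cal L}({\boldsymbol p}) = \arg \mathop {\min }\limits_{\boldsymbol p} {\chi ^2}({\boldsymbol p}),\\ {\textrm{BIC}} &= {\chi ^2}(\hat{\boldsymbol p}) + k\ln (N) - 2\sum\nolimits_{\lambda = 1}^N \ln {C_{{\textrm{CF}},\lambda }} = {\textrm{BI}}{{\textrm{C}}_{{\textrm{red}}}} - 2\sum\nolimits_{\lambda = 1}^N \ln {C_{{\textrm{CF}},\lambda }}. \end{aligned}$$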

4.2 Examples

To demonstrate how the derived method works on real measurement data, we performed a Mueller ellipsometric measurement on a standard crystalline silicon wafer to determine the thickness of its native oxide layer. As in the simulation examples presented before, a spectral range of 190–1000 nm at N = 940 discrete wavelengths and an angle of incidence of 70° were chosen. Dispersion data were taken from [41] for SiO₂ and from [35] for Si.

4.2.1 Single-layer model

Figure 5 shows the spectral contributions to the minimal χ² value, calculated according to Eq. (28), and their distribution. The χ² values are close to 1 over the whole spectrum, indicating good agreement between model and measurement. In the DUV range, the values increase for the following reasons: the intensity of the ellipsometer’s light source and the detector’s sensitivity are lower than in the VIS and NIR, and short wavelengths are more sensitive to roughness-induced depolarization. This results in higher noise levels. The χ² distribution has a typical shape.

Fig. 5. Best fit residuals: Spectral contribution to χ² (left) and its distribution (right) for an oxide height fit on silicon. The measurement was performed at an angle of incidence of 70°.


We obtain ${\chi ^2}({\hat{p}} )/N$ = 1.210 (BIC_red = 1145.72). This is slightly better than the result obtained with Palik’s dispersion data [14] for fused SiO₂, which gives 1.213 (BIC_red = 1146.90).

4.2.2 Models with additional interlayers

With the help of the Bayesian information criterion, one can test whether it is reasonable to add an interface layer between the oxide and the silicon. We tested an SiO layer [14] and a Bruggeman effective medium approximation (EMA) layer [42]. The latter was chosen as a 50:50 mixture of SiO₂ and Si to represent a first-order approximation of the gradual transition from silicon to its oxide. The SiO layer gave the best result, with BIC_red = 1128.85 compared with BIC_red = 1130.60 for the EMA layer. With an additional free height parameter for an EMA layer at the air/SiO₂ interface, the information criterion gets worse again (BIC_red = 1129.12). In this case, the problem is said to be overfitted, whereas it is underfitted with just one layer. All results are summarized in Table 3.
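The model choice can be reproduced from the reported numbers with a few lines of Python. The dictionary below only restates the BIC_red values quoted in this section; the helper `bic_red` is a hypothetical implementation of Eq. (33), and the model labels paraphrase the text (the last entry is the model with the additional EMA layer at the air/SiO₂ interface).

```python
import math

N = 940  # number of wavelengths in the measured spectra (Section 4.2)

def bic_red(chi2_min: float, k: int, n: int = N) -> float:
    """Reduced Bayesian information criterion of Eq. (33): chi^2(p_hat) + k ln N."""
    return chi2_min + k * math.log(n)

# BIC_red values as quoted in Section 4.2 (lower is better)
reported = {
    "SiO2 [41], single layer": 1145.72,
    "SiO2 Palik [14], single layer": 1146.90,
    "SiO2 [41] / SiO [14]": 1128.85,
    "SiO2 [41] / EMA": 1130.60,
    "with additional air/SiO2 EMA layer": 1129.12,
}
best = min(reported, key=reported.get)
penalty = math.log(N)             # cost of one additional free parameter
print(best, round(penalty, 2))    # SiO2 [41] / SiO [14] 6.85
```

With N = 940, each additional free parameter raises BIC_red by ln 940 ≈ 6.85, so it must lower χ²(p̂) by more than that amount to be favored.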

Table 3. Model comparison: measurements on a silicon wafer analyzed with different models to characterize the native oxide. Best values are in bold. The Bayesian information criterion favors the SiO₂/SiO layer stack with SiO₂ dispersion data from [41].


The results of this brief model comparison are physically plausible: the more recent dielectric parameters for SiO₂ from Rodríguez-de Marcos et al. fit better than the older Palik data, and the interface between SiO₂ and Si is described best with SiO rather than with an (SiO₂/Si) EMA layer.

Now we focus on the thickness results for the different models, which are also given in Table 3. The best one-layer model yields an oxide height of 2.18 nm with an average standard deviation of 0.38 nm. This is a plausible result, especially since the measured depolarization, as indicated by the polarization index and the entropy (Fig. 6), is larger than in the simulation example for h = 2 ± 0.1 nm (Fig. 2). When interface layers are added, however, high (anti-)correlations with large individual height uncertainties occur. So, while the overall height is quite constant, the layer composition is unclear. Moreover, another problem becomes obvious: the uncertainty ranges also include negative layer heights, which are of course unphysical. Hence, the Gaussian approximation (which generally assumes unconstrained parameters) cannot be applied here. Instead, we propose to use Bayesian statistics according to Eq. (31).

Fig. 6. Polarization index (left) and entropy (right, see Appendix 2) calculated from the Mueller matrices measured on a silicon wafer at an angle of incidence of 70°.


We do so by sampling the posterior distribution numerically with Markov chain Monte Carlo (MCMC) using the random walk Metropolis algorithm [43,44]. Details of the technique used here are described in [45,46], along with their references. Since no external a priori information was available, uniform height distributions have been used as prior π_i.
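A generic random-walk Metropolis sampler with a uniform prior restricted to non-negative heights can be sketched as follows. This is not the implementation described in [45,46]: the proposal width, the prior bounds, the burn-in length, and the toy Gaussian log-likelihood (a hypothetical stand-in for ln L̄ of Eq. (30), which would require the full optical model) are illustrative choices.

```python
import numpy as np

def metropolis(log_post, x0, step, n_samples=20000, seed=0):
    """Random-walk Metropolis sampler with an isotropic Gaussian proposal."""
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    lp = log_post(x)
    chain = np.empty((n_samples, x.size))
    for i in range(n_samples):
        proposal = x + step * rng.normal(size=x.size)
        lp_prop = log_post(proposal)
        # accept with probability min(1, pi(proposal) / pi(x)); -inf is always rejected
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = proposal, lp_prop
        chain[i] = x
    return chain

def uniform_prior_posterior(log_lik, upper=10.0):
    """ln(posterior) up to a constant for a uniform prior on 0 <= h <= upper (nm)."""
    def log_post(h):
        if np.any(h < 0.0) or np.any(h > upper):
            return -np.inf
        return log_lik(h)
    return log_post

# Toy stand-in for ln(mean likelihood): Gaussian with centre 2.18 nm and width 0.35 nm
def toy_log_lik(h):
    return -0.5 * float(np.sum(((h - 2.18) / 0.35) ** 2))

chain = metropolis(uniform_prior_posterior(toy_log_lik), x0=[2.0], step=0.3)
samples = chain[2000:, 0]                     # discard a burn-in phase
print(samples.mean(), samples.std())          # roughly 2.18 and 0.35
```

Expected values, standard deviations, or, for two parameters, `np.corrcoef(chain.T)` follow directly from the stored chain; for non-Gaussian posteriors, such summary numbers should be interpreted with care.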

In Fig. 7 and Fig. 8, the posterior distributions are shown for the single-layer model (with SiO₂ dispersion data from [41]) and for the two-layer model (with SiO₂ data from [41] and SiO data from [14]), obtained with the mean likelihood function ${\bar{{\cal L}}}({\boldsymbol p})$ from Eq. (30). For the single-layer model, the distribution is nearly Gaussian. Its expected value and standard deviation (2.18 ± 0.35 nm) are in good agreement with the optimization results given in Table 3. The slight difference in the standard deviation can be explained by the limited number of sample points (here 20,000).

Fig. 7. Posterior distribution for the single-layer model after the calculation of 20,000 sample points with the likelihood function ${\bar{{\cal L}}}({{{h}}_{{\textrm{SiO}}2}})$. Its shape is close to Gaussian. Expected value: 2.18 nm, standard deviation: 0.35 nm.


Fig. 8. Posterior distribution for the two-layer model after the calculation of 20,000 sample points with the mean likelihood function ${\bar{{\cal L}}}({{(}{{{h}}_{{\textrm{SiO}}2}},{{{h}}_{{\textrm{SiO}}}})} )$. Owing to the limitation at h = 0, the layer thicknesses are asymmetrically distributed (see 1D projections (a) and (b)). Furthermore, h_SiO2 and h_SiO are highly anti-correlated (c).


In contrast, for the two-layer model, one obtains a non-symmetric, non-Gaussian posterior distribution for both layer thicknesses (Fig. 8(a) and 8(b)). In addition, a high (anti-)correlation between the two heights can be observed (Fig. 8(c)). It is not trivial to express such a result in numbers. The expected values of the layer heights (E(h_SiO2) = 1.18 nm and E(h_SiO) = 0.82 nm), for example, are of limited relevance only: they do not agree with the most likely values, which we obtained both from the optimization and from the maximum of the posterior distribution. The standard deviations (σ(h_SiO2) = 0.67 nm, σ(h_SiO) = 0.52 nm, correlation coefficient: −0.87) are calculated with respect to these expected values, so they are not very meaningful either. Hence, one has to accept and interpret the whole posterior distribution as the final result.

Instead of ${\bar{{\cal L}}}({\boldsymbol p})$ (Eq. (30)), ${{\cal L}}({\boldsymbol p})$ (Eq. (29)) can be used in the Bayesian formula (Eq. (31)) to calculate the variations of the mean thicknesses over the measurements at N = 940 different wavelengths. The resulting posterior distribution is shown in Fig. 9. Here, a Gaussian approximation appears to be appropriate: the posterior is described by the best-fit values $\hat{{\boldsymbol p}}$, the statistical uncertainties ${\hat{{\boldsymbol \sigma }}_{\boldsymbol p}}$, and the correlations ${\textrm{Corr}}(\hat{{\boldsymbol p}})$, all obtained with the optimization method (Table 3). Keep in mind, however, that the standard deviation of the mean thicknesses is not a measure of the layer thickness uncertainty; it usually decreases with the number of measurements.
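The difference between the two posteriors follows from Eq. (30), ${\cal L} = \bar{\cal L}^N$. Assuming a uniform prior and, as an idealization, a Gaussian-shaped mean likelihood of width σ around the best-fit height $\hat h$:

$$\bar{\cal L}(h) \propto \exp \left( { - \frac{{{{(h - \hat h)}^2}}}{{2{\sigma ^2}}}} \right)\;\; \Rightarrow \;\;{\cal L}(h) = \bar{\cal L}{(h)^N} \propto \exp \left( { - \frac{{{{(h - \hat h)}^2}}}{{2{\sigma ^2}/N}}} \right),\qquad {\sigma _{\cal L}} = \frac{\sigma }{{\sqrt N }}.$$

Inserting, purely for illustration, the single-layer values σ = 0.35 nm and N = 940 gives σ/√N ≈ 0.011 nm, i.e. a width smaller by a factor of √940 ≈ 30.7, which is why this width must not be mistaken for the layer thickness uncertainty.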

Fig. 9. Posterior distribution for the two-layer model after the calculation of 20,000 sample points with the likelihood function ${{\cal L}}({({{{h}}_{{\textrm{SiO}}2}},{{{h}}_{{\textrm{SiO}}}})} )$. Since it is close to Gaussian, its expected values (${{\hat{\boldsymbol p}}}$), standard deviations of the means (${{\hat{\boldsymbol \sigma}}_{\boldsymbol p}}$), and correlation coefficients (${\textrm{Corr}}({\hat{\boldsymbol p}})$) should, and do, agree with the corresponding optimization values.


However, depending on the intended application of the measurement, the mean thickness might be the measurand of interest.

For the sake of completeness: for the measured spectra, we have compared the optimization results achieved with the new merit function with those of the traditional merit functions. The results are given in Appendix 5. In Appendix 4, we propose an additional weight function that accounts for the differences in measurement sensitivity at different wavelengths and angles of incidence.

5. Summary, conclusions, and outlook

In this paper, we have dealt with the statistical uncertainty calculation in Mueller ellipsometry. In ellipsometry, the (co-)variance of the mean best-fit parameters (which decreases with the inverse of the typically large number of used wavelengths) is often misinterpreted and misunderstood as the sample’s parameter uncertainty. Apart from that, we have demonstrated, with the help of simulations, that commonly used methods for the ellipsometric uncertainty calculation fail even in one of the simplest cases, namely the determination of the oxide thickness on silicon. The main problem is the inappropriate treatment of depolarization, which is most often simply neglected.

Based on the work of Cloude [26] and Ossikovski and Arteaga [33], we have algebraically derived a new merit function as the central part of the ellipsometric uncertainty calculation. This new merit function solves the problem of determining measurement uncertainties in SE metrology, and it was tested and validated using simulated, noise-free SE data.

We have also presented an extension of our solution which additionally includes statistical measurement noise. Hence, our extended approach enables the analysis of real-world measurement data, which is always affected by noise. As a welcome side-effect, this numerically regularizes the problem and stabilizes the solution. A demonstration with real data was given.

The choice of an adequate model is a major task in ellipsometry. Here, the Bayesian information criterion is helpful, as it allows different models to be compared with each other. This criterion can reasonably be defined and used in the context of our new approach as well, since it is based on the likelihood function that we use to apply Bayesian statistics. Generally, we recommend Bayesian statistics, especially in cases where Gaussian statistics fail or cannot be applied (e.g. in the case of constrained parameters). Examples of how to use the information criterion and how to calculate the posterior distribution with Bayesian statistics were given.

To sum up, our main conclusion is: Cloude’s covariance matrix can be used as what it is, namely a covariance matrix. With it, sample-induced depolarization can now be considered. This results in a correct weighting of all spectral contributions to the merit function. Standard statistical tools like maximum likelihood, Bayesian information criterion, or Bayesian statistics can be applied.

In this paper, we have implicitly applied many commonly used idealizations, such as the far-field approximation, plane wavefronts, unlimited spectral detector resolution, and perfect polarization optics. Systematic deviations from these ideal cases lead to systematic contributions to the best-fit results and their uncertainties. In a full uncertainty budget, they should be considered as well.

We have also used seemingly contradictory assumptions: we assumed full coherence for the light–matter interaction at distinct layer thicknesses but averaged the resulting Mueller matrices incoherently. The contradiction can be resolved by considering spatial information on both the sample and the light: we assume that the illuminating light is spatially coherent only over a limited area with a sufficiently constant and distinct layer height. Over larger distances on the sample, where the layer thickness may vary, the spatial coherence is assumed to be lost.

Hingerl and Ossikovski have done pioneering work on how to deal with partial coherence in Mueller ellipsometry [47,48]. We will need to incorporate their results when applying our method to inhomogeneous or highly depolarizing rough samples. This will be one of our next goals.

Additionally, for rough sample surfaces, Fresnel’s equations are not applicable anymore. Instead, one must solve Maxwell’s equations numerically to calculate the light–matter interaction. In the past, we successfully used the finite-element solver JCMSuite for this while analyzing periodically structured surfaces [49,50,51]. In the future, we are also going to apply this Maxwell solver to stochastically structured, rough surfaces.

Appendix 1: Remarks on absorbing layers

The thickness and uncertainty determination of a dielectric layer, as shown in the introductory example of an SiO₂ layer on Si, is comparatively simple because the layer does not absorb. The situation is different for a metallic layer. We therefore carried out similar virtual experiments as before, but with an aluminum layer on glass. Again, we chose an angle of incidence of 70° and a spectral range of 190–1000 nm. On the one hand, one expects the mean reconstructed thickness to be systematically shifted toward smaller values, because light probing thinner parts of the Al layer retains a higher intensity than light probing thicker parts. On the other hand, the thickness sensitivity is expected to decrease with increasing absorbance/thickness, which leads us to expect a larger thickness uncertainty. This is indeed what we observe: simulation results are shown in Table 4 for two different nominal Al layer thicknesses and uncertainties. While the uncertainties obtained with SE_H = χ² from Eq. (26) are reasonable and plausible, the other merit functions again give wrong results without any recognizable pattern.

Table 4. Al thicknesses and their uncertainties: nominal and recovered values for different merit functions. Dispersion data for Al and SiO₂ were taken from [14].


Although the deviation is small, experimenters and users should always be aware of the systematic shift and the increased uncertainty when analyzing absorbing layers.

Furthermore, the different merit functions give different best-fit values. This results from the spectrally dependent shift of the best-fit height combined with the different spectral weighting inherent to each merit function (see the explanation in Section 2).

Appendix 2: Polarization criteria

From the definition of Cloude’s covariance matrix (19), one can see:

(34)$${\textrm{tr}}({\boldsymbol H}) = \frac{1}{2} \cdot 4 \cdot {M_{1,1}}. $$

For M_1,1-normalized Mueller matrices, tr(H) = 2 follows. Furthermore, the trace of a square matrix equals the sum of its eigenvalues:

(35)$${\textrm{tr}}({\boldsymbol H}) = \sum\nolimits_{k = 1}^4 {{\lambda _k}}. $$

As shown in [52,53], this allows the polarization index (14) of a Mueller matrix to be expressed in terms of the eigenvalues of its Cloude covariance matrix as

(36)$${P_{\textrm{D}}} = \sqrt {\frac{{{\textrm{tr}}({{{\boldsymbol M}^T}{\boldsymbol M}} )- M_{1,1}^2}}{{3 \cdot M_{1,1}^2}}} = \sqrt {\frac{{4\sum\nolimits_{k = 1}^4 {\lambda _k^2} - {{\left( {\sum\nolimits_{k = 1}^4 {{\lambda_k}} } \right)}^2}}}{{3 \cdot {{\left( {\sum\nolimits_{k = 1}^4 {{\lambda_k}} } \right)}^2}}}} = \sqrt {\frac{{\sum\nolimits_{k = 1}^4 {\lambda _k^2} - 1}}{3}}, $$

where the last equal sign holds for M_1,1-normalized Mueller matrices only.

P_D is widely used (and therefore we use it here also), but it is not the only index to describe the depolarization caused by a sample. The information-based approach leading to the entropy [54]

(37)$$S ={-} \sum\nolimits_{k = 1}^4 {{P_k} \cdot {{\log }_4}{P_k}}$$

(with ${P_k}$, the normalized eigenvalues: ${P_k} = {{{\lambda _k}} \mathord{\left/ {\vphantom {{{\lambda_k}} {\sum\nolimits_{l = 1}^4 {{\lambda_l}} }}} \right.} {\sum\nolimits_{l = 1}^4 {{\lambda _l}} }}$) is also of relevance and probably more fundamental.

As can be seen, both definitions are based on the eigenvalues of Cloude’s covariance matrix. They are scalar quantities in the interval [0,1] that describe how far the shape of the 4D covariance ellipsoid deviates from the ideal sphere (fully depolarized state) on the one hand and from a line (fully polarized state) on the other. (P_D is 1 for fully polarized and 0 for fully depolarized light; for S, it is vice versa.)
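Both indices can be computed directly from the eigenvalues of H. The following NumPy sketch uses the middle expression of Eq. (36), which does not require M_1,1 normalization, and Eq. (37) with the convention 0 · log 0 = 0; the function names are hypothetical, and the eigenvalues are assumed to be non-negative (e.g. after Cloude-filtering).

```python
import numpy as np

def polarization_index(eigvals) -> float:
    """P_D from the eigenvalues of H via the middle expression of Eq. (36)."""
    lam = np.asarray(eigvals, dtype=float)
    s = lam.sum()
    value = (4.0 * np.sum(lam**2) - s**2) / (3.0 * s**2)
    return float(np.sqrt(max(value, 0.0)))   # guard against tiny negative rounding errors

def entropy(eigvals) -> float:
    """Entropy S of Eq. (37) with logarithm base 4; eigenvalues equal to 0 contribute 0."""
    p = np.asarray(eigvals, dtype=float)
    p = p / p.sum()                          # normalized eigenvalues P_k
    p = p[p > 0.0]
    return float(np.sum(p * np.log(1.0 / p)) / np.log(4.0))

print(polarization_index([2, 0, 0, 0]), entropy([2, 0, 0, 0]))   # fully polarized: P_D = 1, S = 0
print(polarization_index([0.5] * 4), entropy([0.5] * 4))         # fully depolarized: P_D = 0, S = 1
```

The two limiting cases correspond to the line and the sphere sketched in Fig. 3.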

Appendix 3: Raw data processing

Our Mueller-ellipsometer (Sentech SE 850) works in the so-called step-scan mode. For different combinations of the polarizer and the two compensators, the analyzer scans the polarization ellipse. Thus, the intensity is measured as a function of the analyzer’s angle ${\theta _A}$:

(38)$$\textrm{I(}{{\theta }_\textrm{A}}\textrm{) = }{\textrm{F}_\textrm{0}} \cdot \textrm{(1 + }{\textrm{F}_\textrm{1}}\textrm{cos2}{{\theta }_\textrm{A}}\textrm{ + }{\textrm{F}_\textrm{2}}\textrm{sin2}{{\theta }_\textrm{A}}\textrm{)}. $$

So, the Fourier coefficients can be expressed as:

(39)$$\begin{array}{c} {\textrm{F}_\textrm{1}}\textrm{ = I(}{\theta_{A}} = 0^\circ )/{\textrm{F}_\textrm{0}}\textrm{ - 1,}\\ {\textrm{F}_\textrm{2}}\textrm{ = I(}{\theta _{A}} = 45^\circ )/{\textrm{F}_\textrm{0}}\textrm{ - 1} \end{array}$$

with

(40)$${\textrm{F}_\textrm{0}}\textrm{ = }{{({\textrm{I(}{\theta_{A}} ={-} 45^\circ ) + \textrm{I(}{\theta_{A}} = 45^\circ )} )} \mathord{\left/ {\vphantom {{({\textrm{I(}{\theta_{A}} ={-} 45^\circ ) + \textrm{I(}{\theta_{A}} = 45^\circ )} )} 2}} \right.} 2}. $$
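Eqs. (38) to (40) amount to the following three-point evaluation. This is a minimal sketch with a hypothetical function name; it assumes noise-free intensities taken exactly at θ_A = 0°, 45°, and −45°, and the synthetic values F0 = 2, F1 = 0.3, F2 = −0.4 in the round-trip check are arbitrary.

```python
import math

def fourier_coefficients(I_0, I_45, I_m45):
    """F0, F1, F2 of Eq. (38) from intensities at analyzer angles 0°, 45°, and -45°."""
    F0 = 0.5 * (I_m45 + I_45)     # Eq. (40): the sin(2 theta_A) terms cancel
    F1 = I_0 / F0 - 1.0           # Eq. (39)
    F2 = I_45 / F0 - 1.0          # Eq. (39)
    return F0, F1, F2

# Round trip with a synthetic signal following Eq. (38)
def intensity(theta_deg, F0=2.0, F1=0.3, F2=-0.4):
    t = math.radians(2.0 * theta_deg)
    return F0 * (1.0 + F1 * math.cos(t) + F2 * math.sin(t))

print(fourier_coefficients(intensity(0), intensity(45), intensity(-45)))  # approximately (2.0, 0.3, -0.4)
```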

The Stokes vector measured in the Mueller-ellipsometric experiment with unpolarized light and normalized intensity is:

(41)$$\left( {\begin{array}{c} \textrm{I}\\ {{\textrm{I}_\textrm{x}} - {\textrm{I}_\textrm{y}}}\\ {{\textrm{I}_{45^\circ }} - {\textrm{I}_{- 45^\circ}}}\\ {{\textrm{I}_\textrm{R}} - {\textrm{I}_\textrm{L}}} \end{array}} \right)\textrm{ = }{{\boldsymbol M}_{\textrm{P2}}} \cdot {{\boldsymbol M}_{\textrm{C2}}} \cdot {\boldsymbol M}\textrm{ } \cdot {{\boldsymbol M}_{\textrm{C1}}} \cdot {{\boldsymbol M}_{\textrm{P1}}}\left( {\begin{array}{c} 1\\ 0\\ 0\\ 0 \end{array}} \right)$$

Therefore, the detectable overall intensity I corresponds formally to the (1,1)-element of the experiment’s Mueller matrix:

(42)$$\textrm{I = }{[{{{\boldsymbol M}_{\textrm{P2}}} \cdot {{\boldsymbol M}_{\textrm{C2}}} \cdot {\boldsymbol M}\textrm{ } \cdot {{\boldsymbol M}_{\textrm{C1}}} \cdot {{\boldsymbol M}_{\textrm{P1}}}} ]_{1,1}}$$

Here, M is the Mueller matrix of the sample. M_P and M_C are those of the analyzer/polarizer and the compensators, respectively. M_P and M_C can be calculated with Eq. (5) from their respective Jones matrices [55]:

(43)$${{\boldsymbol J}_\textrm{P}}\textrm{ = }\left( {\begin{array}{cc} \textrm{1} &\textrm{0}\\ \textrm{0} &\textrm{0} \end{array}} \right)$$

and [56,57]

(44)$${{\boldsymbol J}_\textrm{C}}\textrm{ = }\left( {\begin{array}{cc} \textrm{1}&{\textrm{ - i}\gamma (1 - \exp \textrm{ i}\delta )}\\ {\textrm{i}\gamma (1 - \exp \textrm{ i}\delta )}&{\exp \textrm{ i}\delta } \end{array}} \right)$$

with γ, the optical activity coefficient, and the retardance δ. The rotation of the polarizing elements around their optical axis is mathematically a co-ordinate transformation

(45)$${\boldsymbol M}{(\theta ) = }{\boldsymbol R}{(\theta )} \cdot {\boldsymbol M}{(\theta = 0)} \cdot {{\boldsymbol R}^{\textrm{ - 1}}}{(\theta )}$$

with

(46)$${\boldsymbol R}{(\theta ) = }\left( {\begin{array}{cccc} \textrm{1} &\textrm{0} &\textrm{0} &\textrm{0}\\ \textrm{0}&{\textrm{cos 2}\theta }&{\textrm{ - sin 2}\theta } &\textrm{0}\\ \textrm{0}&{\textrm{sin 2}\theta }&{\textrm{cos 2}\theta } &\textrm{0}\\ \textrm{0} &\textrm{0} &\textrm{0} &\textrm{1} \end{array}} \right). $$

Finally, one can calculate the Fourier coefficients F₁ and F₂ (39) for each configuration (combination of polarizer and compensator settings) and sample analytically from the intensity given by:

(47)$$\begin{aligned} \textrm{I}(\theta_\textrm{A}) &:= \textrm{I}_{\theta_\textrm{P}, \theta_\textrm{C1}, \delta_\textrm{C1}, \gamma_\textrm{C1}, \theta_\textrm{C2}, \delta_\textrm{C2}, \gamma_\textrm{C2}, {\boldsymbol M}}(\theta_\textrm{A})\\ &= \left[ {\boldsymbol M}_\textrm{P2}(\theta_\textrm{A}) \cdot {\boldsymbol M}_\textrm{C2}(\theta_\textrm{C2}, \delta_\textrm{C2}, \gamma_\textrm{C2}) \cdot {\boldsymbol M} \cdot {\boldsymbol M}_\textrm{C1}(\theta_\textrm{C1}, \delta_\textrm{C1}, \gamma_\textrm{C1}) \cdot {\boldsymbol M}_\textrm{P1}(\theta_\textrm{P}) \right]_{1,1} \end{aligned}$$
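To make the forward model concrete, the following Python/NumPy sketch assembles Eqs. (43) to (47). It is an illustration under assumptions, not the authors' code: Eq. (5) is not reproduced in this appendix, so the Jones-to-Mueller conversion uses the common form M = T (J ⊗ J*) T⁻¹ with an assumed matrix `T_JM` (one of several sign conventions for the circular Stokes component), and all function names are hypothetical. The helper `coefficient_vector` illustrates the linearity used in Eq. (48) and the systems built from it: for fixed element settings, I(θ_A) is a fixed weighted sum of m₁, …, m₁₆.

```python
import numpy as np

# ASSUMPTION: stands in for Eq. (5), M = T (J kron J*) T^-1.
# The sign convention of the 4th (circular) Stokes component may differ from the paper.
T_JM = np.array([[1, 0, 0, 1],
                 [1, 0, 0, -1],
                 [0, 1, 1, 0],
                 [0, 1j, -1j, 0]])
T_JM_INV = np.linalg.inv(T_JM)


def jones_to_mueller(J):
    """Real 4x4 Mueller matrix of a non-depolarizing element with Jones matrix J."""
    return (T_JM @ np.kron(J, J.conj()) @ T_JM_INV).real


def rotation(theta):
    """Eq. (46): rotation matrix R(theta); theta in radians."""
    c, s = np.cos(2 * theta), np.sin(2 * theta)
    return np.array([[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]])


def rotated(M0, theta):
    """Eq. (45): M(theta) = R(theta) M(0) R^-1(theta), using R^-1(theta) = R(-theta)."""
    return rotation(theta) @ M0 @ rotation(-theta)


def polarizer(theta):
    """Polarizer or analyzer: Jones matrix of Eq. (43), rotated by theta."""
    return rotated(jones_to_mueller(np.array([[1, 0], [0, 0]], dtype=complex)), theta)


def compensator(theta, delta, gamma):
    """Compensator: Jones matrix of Eq. (44) (retardance delta, optical activity gamma)."""
    e = np.exp(1j * delta)
    J_C = np.array([[1, -1j * gamma * (1 - e)], [1j * gamma * (1 - e), e]])
    return rotated(jones_to_mueller(J_C), theta)


def _outer_factors(theta_A, theta_P, comp1, comp2):
    """Element products to the left and to the right of the sample in Eq. (47)."""
    left = polarizer(theta_A) @ compensator(*comp2)
    right = compensator(*comp1) @ polarizer(theta_P)
    return left, right


def intensity(theta_A, theta_P, comp1, comp2, M):
    """Eq. (47); comp1 and comp2 are (theta, delta, gamma) tuples of the two compensators."""
    left, right = _outer_factors(theta_A, theta_P, comp1, comp2)
    return (left @ M @ right)[0, 0]


def coefficient_vector(theta_A, theta_P, comp1, comp2):
    """Vector a with intensity(...) == a @ M.flatten(), i.e. I is linear in m_1..m_16."""
    left, right = _outer_factors(theta_A, theta_P, comp1, comp2)
    return np.outer(left[0, :], right[:, 0]).flatten()


# Hypothetical example: ideal reflector of Eq. (54), two quarter-wave compensators
M_sample = np.diag([1.0, 1.0, -1.0, -1.0])
args = (np.deg2rad(10), np.deg2rad(45),
        (np.deg2rad(30), np.pi / 2, 0.0), (np.deg2rad(-15), np.pi / 2, 0.0))
print(np.isclose(intensity(*args, M_sample),
                 coefficient_vector(*args) @ M_sample.flatten()))  # True
```

The element settings in the usage lines are arbitrary; any Mueller matrix, physical or not, satisfies the same linear identity.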

Ideally, the analytical terms for F₁ and F₂ equal the measurands:

(48)$$\begin{array}{c} {\textrm{F}_\textrm{1}}\textrm{(}{\boldsymbol M}\textrm{) = }{\textrm{F}_{\textrm{1,meas}}}\\ {\textrm{F}_\textrm{2}}\textrm{(}{\boldsymbol M}\textrm{) = }{\textrm{F}_{\textrm{2,meas}}} \end{array}$$

These equations are lengthy but linear in the Mueller matrix elements m₁, …, m₁₆, so each of them can be written as $\sum\nolimits_{k\textrm{ = 1}}^{\textrm{16}} {{\textrm{a}_k}} {m_k}\textrm{ = b}$. With the usual normalization m₁ = 1, the term a₁m₁ = a₁ becomes a constant and is absorbed into the right-hand side. For a set of n/2 measurements differing in polarizer and compensator settings, the equations can then be written as the linear equation systems

(49)$$\begin{array}{l} {{\boldsymbol A}_\textrm{1}}{\boldsymbol m}\textrm{ = }{{\boldsymbol b}_\textrm{1}}\\ {{\boldsymbol A}_\textrm{2}}{\boldsymbol m}\textrm{ = }{{\boldsymbol b}_\textrm{2}} \end{array}$$

with m = (m₂,…,m₁₆)^T. Both equation systems can be combined into ${\boldsymbol A}\textrm{ = }\left[ {\begin{array}{c} {{{\boldsymbol A}_\textrm{1}}}\\ {{{\boldsymbol A}_\textrm{2}}} \end{array}} \right]$ and ${\boldsymbol b}\textrm{ = }\left[ {\begin{array}{c} {{{\boldsymbol b}_\textrm{1}}}\\ {{{\boldsymbol b}_\textrm{2}}} \end{array}} \right]$ so that they reduce to the following simple system:

(50)$${\boldsymbol A}{\boldsymbol m}\textrm{ = }{\boldsymbol b}$$

with ${\boldsymbol A} \in {{\mathbb R}^{n{ \times 15}}}$ and ${\boldsymbol b} \in {{\mathbb R}^{n{ \times 1}}}$. From this system, the best-fitting Mueller matrix elements as well as their covariance matrix can be derived if the number of (linearly independent) measurands is equal to or larger than the number of free parameters (here 135: 15 Mueller matrix elements plus 120 unique elements of the symmetric 15 × 15 covariance matrix). Usually, the best-fitting m is then calculated with the pseudo-inverse matrix ${{\boldsymbol A}^\textrm{ + }}\textrm{ = (}{{\boldsymbol A}^\textrm{T}}{\boldsymbol A}{\textrm{)}^{\textrm{ - 1}}}{{\boldsymbol A}^\textrm{T}}$ as ${{\boldsymbol m}_{\textrm{bf}}}\textrm{ = }{{\boldsymbol A}^\textrm{ + }}{\boldsymbol b}$.

Note that this is the ordinary least-squares solution ${{\boldsymbol m}_{\textrm{bf}}} = \mathop {\textrm{arg min}}\limits_{\boldsymbol m} {||{{\boldsymbol A}{\boldsymbol m} - {\boldsymbol b}} ||^\textrm{2}}$, which assumes that only b is affected by errors. Here, however, both A and b are perturbed by measurement noise. This is accounted for by applying total least squares (TLS) instead. The optimization problem then reads [58]:

(51)$${{\boldsymbol m}_{\textrm{tls}}} = \mathop {\textrm{arg min}}\limits_{\boldsymbol m} [{{{{{||{{\boldsymbol A}{\boldsymbol m}\textrm{ - }{\boldsymbol b}} ||}^\textrm{2}}} \mathord{\left/ {\vphantom {{{{||{{\boldsymbol A}{\boldsymbol m}\textrm{ - }{\boldsymbol b}} ||}^\textrm{2}}} {({{{||{\boldsymbol m} ||}^\textrm{2}}\textrm{ + 1}} )}}} \right.} {({{{||{\boldsymbol m} ||}^\textrm{2}}\textrm{ + 1}} )}}} ]. $$

There exists an algebraic solution:

(52)$${{\boldsymbol m}_{\textrm{tls}}}\textrm{ = (}{{\boldsymbol A}^\textrm{T}}{\boldsymbol A}{ - \sigma }_{\textrm{min}}^\textrm{2}{\boldsymbol I}{\textrm{)}^{\textrm{ - 1}}}{{\boldsymbol A}^\textrm{T}}{\boldsymbol b}$$

with σ_min, the smallest singular value of the matrix $[{{\boldsymbol A}\textrm{,}{\boldsymbol b}} ]$, and I, the identity matrix. The statistical uncertainty of the solution m_tls can be estimated with the covariance matrix

(53)$${{\boldsymbol \Sigma }_{\boldsymbol m}}\textrm{ = }\frac{\textrm{1}}{{n \cdot \textrm{(}n\textrm{ - 15)}}}{\textrm{(}{{\boldsymbol A}^\textrm{T}}{\boldsymbol A}{ - \sigma }_{\textrm{min}}^\textrm{2}{\boldsymbol I}\textrm{)}^{\textrm{ - 1}}}{\textrm{(}{\boldsymbol A}{{\boldsymbol m}_{\textrm{tls}}}\textrm{ - }{\boldsymbol b}\textrm{)}^\textrm{T}} \cdot \textrm{(}{\boldsymbol A}{{\boldsymbol m}_{\textrm{tls}}}\textrm{ - }{\boldsymbol b}\textrm{)}. $$
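Eqs. (52) and (53) can be evaluated directly. The following NumPy sketch is an illustration on synthetic data, not the authors' implementation: the function names `tls_solve` and `tls_svd` are hypothetical, the number of unknowns (15 in the text) is taken from the column count of A, and the prefactor of Eq. (53) is applied exactly as printed. As a cross-check, the classical SVD form of the TLS solution [58], built from the right singular vector belonging to σ_min, should give the same estimate whenever σ_min is a simple singular value.

```python
import numpy as np


def tls_solve(A, b):
    """TLS estimate of Eq. (52) and covariance of Eq. (53) for A m = b.

    A: (n, p) matrix (p = 15 in the text), b: (n,) vector.
    """
    n, p = A.shape
    Ab = np.column_stack([A, b])
    sigma_min = np.linalg.svd(Ab, compute_uv=False)[-1]  # smallest singular value of [A, b]
    K = A.T @ A - sigma_min**2 * np.eye(p)
    m_tls = np.linalg.solve(K, A.T @ b)                  # Eq. (52)
    r = A @ m_tls - b                                    # residual A m_tls - b
    cov = np.linalg.inv(K) * (r @ r) / (n * (n - p))     # Eq. (53), prefactor as printed
    return m_tls, cov


def tls_svd(A, b):
    """Classical SVD form of TLS, used here only as a cross-check."""
    p = A.shape[1]
    v = np.linalg.svd(np.column_stack([A, b]))[2][-1]   # right singular vector of sigma_min
    return -v[:p] / v[p]


# Synthetic example (hypothetical data): noise on both A and b
rng = np.random.default_rng(1)
A0, m0 = rng.normal(size=(60, 15)), rng.normal(size=15)
A = A0 + 1e-3 * rng.normal(size=A0.shape)
b = A0 @ m0 + 1e-3 * rng.normal(size=60)
m_tls, cov = tls_solve(A, b)
print(np.allclose(m_tls, tls_svd(A, b)))  # True
```

Solving the linear system in Eq. (52) with `numpy.linalg.solve` avoids forming the inverse for the estimate itself; the inverse is only needed for the covariance.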

Appendix 4: Sensitivity weights

The merit function given in Eq. (28) considers depolarization and statistical noise for each single measurement. Until now, when calculating statistical uncertainties of the sample parameters, the contributions of all measurement configurations (differing, e.g., in wavelength or angle of incidence) have been weighted uniformly. In addition, however, they can and should be weighted with external or technically motivated factors, for example, to account for the measurement contrast. To this end, we introduce a measure of the sample's polarizing capabilities and use it as a weight factor for the summands in Eq. (28). We compare the measured (and normalized) Mueller matrix with that of an ideal reflector (Fresnel coefficients r_s = −1, r_p = 1), which follows from Eqs. (2) and (5) as:

(54)$${{\boldsymbol M}_{{\textrm{iR}}}} = \left( {\begin{array}{cccc} 1 &0 &0 &0\\ 0 &1 &0 &0\\ 0 &0 &{ - 1} &0\\ 0 &0 &0 &{ - 1} \end{array}} \right). $$

Thus,

(55)$${w_{\textrm{k}}} = ||{{{\boldsymbol M}_{{\textrm{iR}}}} - {{\boldsymbol M}_k}} ||_{\textrm{F}}^2$$

is a measure of the sample's impact on the light's polarization in configuration k (here, ${||{{\kern 1pt} \cdot {\kern 1pt} } ||_\textrm{F}}$ is the Frobenius norm). In ellipsometry, sensitivity to the sample parameters requires that sample-induced polarization changes can be measured. We assume that measurements with larger w_k have a better contrast and are accompanied by higher sensitivities to the sample parameters. To first order, this assumption holds, and the weighting therefore compensates for the limited technical resolution of the device. The relative and normalized values

(56)$${w_{\textrm{rel,} k}} = {{N \cdot {w_k}} \mathord{\left/ {\vphantom {{N \cdot {w_k}} {\sum\nolimits_{k = 1}^N {{w_k}} }}} \right.} {\sum\nolimits_{k = 1}^N {{w_k}} }}$$

make different measurements comparable. Using

(57)$${\chi ^2}({\boldsymbol p}) = \sum\limits_{\lambda = 1}^N {{w_{{\textrm{rel}},\lambda }}{\boldsymbol j}_{{\textrm{s}},\lambda }^{\dagger} \textrm{(}{\boldsymbol p}\textrm{)}{\boldsymbol H}_{{\textrm{CF}},\lambda }^ + {{\boldsymbol j}_{{\textrm{s}},\lambda }}\textrm{(}{\boldsymbol p}\textrm{)}}$$

for optimization, one gets a slightly modified best-fit result compared to the uniformly weighted case (Eq. (28)). For example, with the single-layer model (dispersion data from [41]) and at an angle of incidence (AoI) of 70°, an oxide height of 2.161 ± 0.405 nm fits best. At AoI = 20°, far away from the Brewster angle, there is nearly no sensitivity, and one obtains 3.8 ± 10.3 nm. A combined analysis gives 2.161 ± 0.406 nm, which is practically identical to the 70° result. Therefore, in practice, the measurement at AoI = 20° can be skipped. As an example, Fig. 10 shows the calculated sensitivity weights for the Mueller matrix spectra measured on our silicon wafer at different angles of incidence.

Fig. 10. Proposed sensitivity weights for a combined analysis of the Mueller matrix spectra measured at six different angles of incidence (AoI) from 20° to 70° on a silicon wafer.


Note: This argument is not limited to our merit function.
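Eqs. (54) to (57) can be sketched in NumPy as follows. This is a hedged illustration, not the authors' implementation: the vectors j_s and matrices H_CF of Eq. (28) are not defined in this appendix, so they enter as generic placeholder arguments; H⁺ is taken as the Moore–Penrose pseudo-inverse (`numpy.linalg.pinv`); and the function names and example matrices are hypothetical.

```python
import numpy as np

# Eq. (54): Mueller matrix of the ideal reflector (r_s = -1, r_p = 1)
M_IR = np.diag([1.0, 1.0, -1.0, -1.0])


def sensitivity_weights(mueller_matrices):
    """Eqs. (55)-(56): w_k = ||M_iR - M_k||_F^2, returned as w_rel,k = N w_k / sum(w)."""
    w = np.array([np.linalg.norm(M_IR - M, "fro") ** 2 for M in mueller_matrices])
    return len(w) * w / w.sum()


def weighted_chi2(w_rel, j_list, H_list):
    """Eq. (57): sum of w_rel * j^dagger H^+ j over configurations (placeholder inputs)."""
    chi2 = 0.0
    for w, j, H in zip(w_rel, j_list, H_list):
        chi2 += w * np.real(np.conj(j) @ np.linalg.pinv(H) @ j)
    return chi2


# Hypothetical normalized Mueller matrices: identity matrix and a complete depolarizer
w_rel = sensitivity_weights([np.eye(4), np.diag([1.0, 0.0, 0.0, 0.0])])
print(w_rel)  # expected values: 16/11 and 6/11
```

The relative weights always sum to N, so a uniformly weighted fit (all w_rel = 1) is recovered when all configurations have the same contrast.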

Appendix 5: Measurements analyzed with different merit functions

We have demonstrated on simulated data that, of the merit functions considered, only the new one accounts correctly for the effect of sample-induced depolarization. Nevertheless, the reader may wish to compare the optimization results obtained with the traditional merit functions (Section 2) on the measured data presented in Section 4.2. These results are listed in Tables 5 and 6 for the single- and two-layer models, respectively.

Table 5. Analysis of the measured data from Section 4.2 with the new and the traditional merit functions for the single-layer model with SiO₂ dispersion data from [41].


Table 6. Analysis of the measured data from Section 4.2 with the new and the traditional merit functions for the two-layer model with SiO₂ dispersion data from [41] and SiO data from [14].


As can be seen, all results are in adequate agreement, except for the thickness uncertainties obtained with (Ψ, Δ)-optimization, which are significantly lower. So, for this example, the differences are manageable, but we cannot seriously assess whether this example is representative.

Appendix 6: Endnotes

¹International vocabulary of metrology (VIM):

Metrological traceability: Property of a measurement result whereby the result can be related to a reference through a documented unbroken chain of calibrations, each contributing to the measurement uncertainty.

²Owing to the normalization, j_s is limited to vectors on a 4D complex sphere ${\Omega _R} = \left\{ {\boldsymbol r} \,\middle|\, ||{\boldsymbol r}|| = \sqrt R \right\}$ with R = 2. The normalization parameter C is then:

$$\begin{aligned} \textrm{C} &= 1 \Big/ \int_{\Omega_2} \exp\left( -\frac{1}{2} {\boldsymbol j}_s^{\dagger} {\boldsymbol H}^{+} {\boldsymbol j}_s \right) \textrm{d}{\boldsymbol j}_s\\ &= 1 \Big/ \int_{\Omega_1} \exp\left( -\frac{1}{2} \sqrt{2}\, {\boldsymbol v}_s^{\dagger} {\boldsymbol V} {\boldsymbol \Lambda}^{+} {\boldsymbol V}^{\dagger} {\boldsymbol v}_s \sqrt{2} \right) \textrm{d}{\boldsymbol v}_s\\ &= 1 \Big/ \int_{\Omega_1} \exp\left( -{\boldsymbol v}_s^{\dagger} {\boldsymbol \Lambda}^{+} {\boldsymbol v}_s \right) \textrm{d}{\boldsymbol v}_s \end{aligned}$$

Hence, C depends only on the eigenvalues of H (eigendecomposition H = VΛV†). We could not find an analytical solution for the 4D case, but C can be calculated numerically. Its value is not relevant in what follows.
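The footnote only states that C can be computed numerically. One simple option, chosen here for illustration and not necessarily what the authors used, is Monte Carlo integration over Ω₁, the unit sphere in C⁴ (equivalently the 7-sphere in R⁸, whose surface area is 2π⁴/Γ(4) = π⁴/3). The sketch takes the eigenvalues of H as hypothetical input, forms Λ⁺ by inverting only the positive eigenvalues, and uses v†Λ⁺v = Σᵢ λᵢ⁺|vᵢ|² for diagonal Λ⁺; the function name and its parameters are hypothetical.

```python
import math

import numpy as np


def normalization_constant(eigvals, n_samples=200_000, seed=0, tol=1e-12):
    """Monte Carlo estimate of C = 1 / int_{Omega_1} exp(-v^H Lambda^+ v) dv.

    Omega_1 is the unit sphere in C^4, i.e. the 7-sphere in R^8.
    """
    lam = np.asarray(eigvals, dtype=float)
    pos = lam > tol
    lam_pinv = np.where(pos, 1.0 / np.where(pos, lam, 1.0), 0.0)  # diagonal of Lambda^+
    rng = np.random.default_rng(seed)
    # Normalized complex Gaussian vectors are uniformly distributed on the unit sphere in C^4
    z = rng.normal(size=(n_samples, 4)) + 1j * rng.normal(size=(n_samples, 4))
    v = z / np.linalg.norm(z, axis=1, keepdims=True)
    integrand = np.exp(-(np.abs(v) ** 2) @ lam_pinv)
    area = 2 * math.pi ** 4 / math.gamma(4)  # surface area of the 7-sphere = pi^4 / 3
    return 1.0 / (area * integrand.mean())


# Hypothetical eigenvalues of H (one of them zero)
print(normalization_constant([0.5, 1.0, 2.0, 0.0]))
```

If all eigenvalues are equal, the integrand is constant on the sphere, which gives a convenient exact check of the implementation.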

³The Cholesky decomposition ${\boldsymbol L}{{\boldsymbol L}^\textrm{T}}\textrm{ = }{{\boldsymbol \Sigma }_{\boldsymbol m}}$ can be used for this. Let ${{\boldsymbol x}_k}$ be vectors with normally distributed random components, each with zero mean and unit variance. Then the vectors ${{\boldsymbol m}_k}: = {\boldsymbol m} + {\boldsymbol L}{{\boldsymbol x}_k}$ are normally distributed with mean m and covariance matrix ${{\boldsymbol \Sigma }_{\boldsymbol m}}$, as desired.
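The sampling step of endnote ³ can be sketched in NumPy as follows. This assumes Σ_m (for example from Eq. (53)) is symmetric positive definite, so that `numpy.linalg.cholesky` succeeds; the function name and the 3 × 3 example covariance are hypothetical.

```python
import numpy as np


def sample_correlated(m, cov, n_samples, seed=None):
    """Draw rows m_k = m + L x_k, where L L^T = cov and x_k has i.i.d. N(0, 1) components."""
    L = np.linalg.cholesky(cov)                 # lower-triangular Cholesky factor
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(size=(n_samples, len(m)))
    return np.asarray(m) + x @ L.T              # row k equals (m + L x_k)^T


# Hypothetical mean vector and covariance matrix
m = np.array([1.0, -0.2, 0.05])
cov = np.array([[1.0, 0.3, 0.0], [0.3, 0.5, 0.1], [0.0, 0.1, 0.2]])
samples = sample_correlated(m, cov, 200_000, seed=0)
# The sample covariance approaches cov as n_samples grows
print(np.round(np.cov(samples, rowvar=False), 2))
```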

Funding

Transmet (2016-6).

Disclosures

The authors declare no conflicts of interest.

References

1. R. M. A. Azzam and N. M. Bashara, Ellipsometry and polarized light (North-Holland Publishing Co, 1977).

2. H. Tompkins and E. A. Irene, Handbook of ellipsometry (William Andrew, 2005).

3. M. Schubert, Infrared ellipsometry on semiconductor layer structures: phonons, plasmons, and polaritons, Springer Tracts in Modern Physics 209 (Springer Science & Business Media, 2004).

4. H. Fujiwara, Spectroscopic ellipsometry: principles and applications (John Wiley & Sons, 2007).

5. M. Losurdo and K. Hingerl, Ellipsometry at the Nanoscale (Springer, 2013).

6. J. J. Gil and R. Ossikovski, Polarized Light and the Mueller Matrix Approach (CRC, 2017).

7. M. Losurdo, M. Bergmair, G. Bruno, D. Cattelan, C. Cobet, A. de Martino, K. Fleischer, Z. Dohcevic-Mitrovic, N. Esser, M. Galliet, R. Gajic, D. Hemzal, K. Hingerl, J. Humlicek, R. Ossikovski, Z. V. Popovic, and O. Saxl, “Spectroscopic ellipsometry and polarimetry for materials and systems analysis at the nanometer scale: state-of-the-art, potential, and perspectives,” J. Nanopart. Res. 11(7), 1521–1554 (2009).

8. M. Losurdo, Defining and Analysing the Optical Properties of Materials at the Nanoscale (Ges. für Mikro- und Nanoelektronik, 2010).

9. BIPM, IEC, IFCC, ILAC, ISO, IUPAC, IUPAP, and OIML, “Evaluation of Measurement Data - Guide to the Expression of Uncertainty in Measurement,” Joint Committee for Guides in Metrology, Technical Report, JCGM 100 (2008).

10. T. A. Germer, H. J. Patrick, R. M. Silver, and B. Bunday, “Developing an uncertainty analysis for optical scatterometry,” Proc. SPIE 7272, 72720T (2009).

11. C. Elster and B. Toman, “Bayesian uncertainty analysis under prior ignorance of the measurand versus analysis using the Supplement 1 to the Guide: a comparison,” Metrologia 46(3), 261–266 (2009).

12. B. Bunday, “HVM metrology challenges towards the 5 nm node,” Proc. SPIE 9778, 97780E (2016).

13. M. N. Polyanskiy, “Refractive index database,” [Online]. Available: https://refractiveindex.info. [Accessed 09 10 2019].

14. E. D. Palik, Handbook of Optical Constants of Solids (Academic, Inc., 1998).

15. BIPM, “The BIPM key comparison database,” [Online]. Available: https://kcdb.bipm.org/. [Accessed 09 10 2019].

16. K. Hasche, P. Thomsen-Schmidt, M. Krumrey, G. Ade, G. Ulm, J. Stuempel, S. Schaedlich, W. Frank, M. Procop, and U. Beck, “Metrological characterization of nanometer film thickness standards for XRR and ellipsometry applications,” Proc. SPIE 5190, 165 (2003).

17. Y. Azuma, P. Barat, G. Bartl, H. Bettin, M. Borys, I. Busch, L. Cibik, G. D’Agostino, K. Fujii, H. Fujimoto, A. Hioki, M. Krumrey, U. Kuetgens, N. Kuramoto, G. Mana, E. Massa, R. Meeß, S. Mizushima, T. Narukawa, A. Nicolaus, A. Pramann, S. A. Rabb, O. Rienitz, C. Sasso, M. Stock, R. D. Vocke Jr, A. Waseda, S. Wundrack, and S. Zakel, “Improved measurement results for the Avogadro constant using a 28Si-enriched crystal,” Metrologia 52(2), 360–375 (2015).

18. J. Ehrstein, C. Richter, D. Chandler-Horowitz, E. Vogel, D. Ricks, C. Young, S. Spencer, S. Shah, D. Maher, B. Foran, A. Diebold, and P. Yee-Hung, “Thickness Evaluation for 2 nm SiO2 Films, a Comparison of Ellipsometric, Capacitance-Voltage and HRTEM Measurements,” AIP Conf. Proc. 683(1), 331–336 (2003).

19. S. Kohli, C. D. Rithner, P. K. Dorhout, A. M. Dummer, and C. S. Menoni, “Comparison of nanometer-thick films by x-ray reflectivity and spectroscopic ellipsometry,” Rev. Sci. Instrum. 76(2), 023906 (2005).

20. I. Busch, Auf dünnen Schichten - Pilotvergleich HfO2 auf Si, private communication (2019).

21. M. P. Seah, “CCQM-K32 key comparison and P84 pilot study: Amount of silicon oxide as a thickness of SiO2 on Si,” Metrologia 45(1A), 08013 (2008).

22. J. M. M. De Nijs and A. van Silfhout, “Systematic and random errors in rotating-analyzer ellipsometry,” J. Opt. Soc. Am. A 5(6), 773–781 (1988).

23. D. H. Goldstein and R. A. Chipman, “Error analysis of a Mueller matrix polarimeter,” J. Opt. Soc. Am. A 7(4), 693–700 (1990).

24. Y. J. Cho, W. Chegal, J. P. Lee, and H. M. Cho, “Universal evaluation of combined standard uncertainty for rotating-element spectroscopic ellipsometers,” Opt. Express 24(23), 26215–26227 (2016).

25. X. Cheng, M. Li, J. Zhou, H. Ma, and Q. Hao, “Error analysis of the calibration of a dual-rotating-retarder Mueller matrix polarimeter,” Appl. Opt. 56(25), 7067–7074 (2017).

26. S. R. Cloude, “Conditions for the physical realisability of matrix operators in polarimetry,” Proc. SPIE 1166, 177–185 (1989).

27. S. Lu and R. Chipman, “Interpretation of Mueller matrices based on polar decomposition,” J. Opt. Soc. Am. A 13(5), 1106–1113 (1996).

28. Z.-F. Xing, “On the Deterministic and Non-deterministic Mueller Matrix,” J. Mod. Opt. 39(3), 461–484 (1992).

29. R. Sridhar and R. Simon, “Normal form for Mueller Matrices in Polarization Optics,” J. Mod. Opt. 41(10), 1903–1915 (1994).

30. R. Ossikovski, “Analysis of depolarizing Mueller matrices through a symmetric decomposition,” J. Opt. Soc. Am. A 26(5), 1109–1118 (2009).

31. C. Fallet, A. Pierangelo, R. Ossikovski, and A. de Martino, “Experimental validation of the symmetric decomposition of Mueller matrices,” Opt. Express 18(2), 831–842 (2010).

32. R. Ossikovski, “Differential matrix formalism for depolarizing anisotropic media,” Opt. Lett. 36(12), 2330–2332 (2011).

33. R. Ossikovski and O. Arteaga, “Integral decomposition and polarization properties of depolarizing Mueller matrices,” Opt. Lett. 40(6), 954–957 (2015).

34. G. E. Jellison, “Use of the biased estimator in the interpretation of spectroscopic ellipsometry data,” Appl. Opt. 30(23), 3354–3360 (1991).

35. C. Herzinger, B. Johs, W. McGahan, J. Woollam, and W. Paulson, “Ellipsometric determination of optical constants for silicon and thermally grown silicon dioxide via a multi-sample, multi-wavelength, multi-angle investigation,” J. Appl. Phys. 83(6), 3323–3336 (1998).

36. J. J. Gil and E. Bernabeu, “Depolarization and Polarization Indices of an Optical System,” Optica Acta: International Journal of Optics 33(2), 185–189 (1986).

37. I. Busch, Y. Azuma, H. Bettin, L. Cibik, P. Fuchs, K. Fujii, and S. Mizushima, “Surface layer determination for the Si spheres of the Avogadro project,” Metrologia 48(2), S62–S82 (2011).

38. A. N. Tikhonov and V. Y. Arsenin, Solutions of Ill-posed Problems (Winston & Sons, 1977).

39. N. J. Higham, “Computing a nearest symmetric positive semidefinite matrix,” Linear Algebra Appl. 103, 103–118 (1988).

40. G. Schwarz, “Estimating the dimension of a model,” Ann. Statist. 6(2), 461–464 (1978).

41. L. V. Rodríguez-de Marcos, J. I. Larruquert, J. A. Méndez, and J. A. Aznárez, “Self-consistent optical constants of SiO2 and Ta2O5 films,” Opt. Mater. Express 6(11), 3622–3637 (2016).

42. D. A. G. Bruggeman, “Berechnung verschiedener physikalischer Konstanten von heterogenen Substanzen. I. Dielektrizitätskonstanten und Leitfähigkeiten der Mischkörper aus isotropen Substanzen,” Ann. Phys. 416(7), 636–664 (1935).

43. C. Sherlock, P. Fearnhead, and G. O. Roberts, “The random walk Metropolis: Linking theory and practice through a case study,” Statist. Sci. 25(2), 172–190 (2010).

44. H. Haario, E. Saksman, and J. Tamminen, “Adaptive proposal distribution for random walk Metropolis algorithm,” Comput. Stat. 14(3), 375–395 (1999).

45. S. Heidenreich, H. Gross, and M. Bär, “Bayesian approach to determine critical dimensions from scatterometric measurements,” Metrologia 55(6), S201–S211 (2018).

46. K. van den Meersche, K. Soetaert, and D. van Oevelen, “xsample(): An R function for sampling linear inverse problems,” J. Stat. Soft. 30, 1–15 (2009).

47. K. Hingerl and R. Ossikovski, “General approach for modeling partial coherence in spectroscopic Mueller matrix polarimetry,” Opt. Lett. 41(2), 219–222 (2016).

48. R. Ossikovski and K. Hingerl, “General formalism for partial spatial coherence in reflection Mueller matrix polarimetry,” Opt. Lett. 41(17), 4044–4047 (2016).

49. “Program package JCMsuite,” JCMwave GmbH, [Online]. Available: http://www.jcmwave.com.

50. M. Wurm, J. Endres, J. Probst, M. Schoengen, A. Diener, and B. Bodermann, “Metrology of nanoscale grating structures by UV scatterometry,” Opt. Express 25(3), 2460–2468 (2017).

51. M. Hammerschmidt, M. Weiser, X. G. Santiago, L. Zschiedrich, B. Bodermann, and S. Burger, “Quantifying parameter uncertainties in optical scatterometry using Bayesian inversion,” Proc. SPIE 10330, 1033004 (2017).

52. J. J. Gil, “Characteristic properties of Mueller matrices,” J. Opt. Soc. Am. A 17(2), 328–334 (2000).

53. R. Ossikovski, “Alternative depolarization criteria for Mueller matrices,” J. Opt. Soc. Am. A 27(4), 808–814 (2010).

54. S. R. Cloude, “Group theory and polarisation algebra,” Optik 75(1), 26–36 (1986).

55. D. H. Goldstein, Polarized Light, 3rd ed. (CRC, 2010).

56. D. E. Aspnes, “Effects of component optical activity in data reduction and calibration of rotating-analyzer ellipsometers,” J. Opt. Soc. Am. 64(6), 812–819 (1974).

57. D. M. Radman and B. D. Cahan, “Effects of component optical activity in data reduction and calibration of rotating-analyzer ellipsometers,” J. Opt. Soc. Am. 71(12), 1546 (1981).

58. I. Markovsky and S. van Huffel, “Overview of total least squares methods,” Signal Processing 87(10), 2283–2302 (2007).

	Nominal		Recovered best fit result
Merit function	$h_{0}$ [nm]	$σ_{0}$ [nm]	$\hat{h}$ [nm]	${\hat{σ}}_{h} \cdot \sqrt{N - dof}$ [nm]
SE_ΨΔ	2	0.1	2	0.00014
SE_ρ	2	0.1	2	0.00024
SE_M	2	0.1	2	0.00030
SE_ΨΔ	5	0.5	5	0.00183
SE_ρ	5	0.5	5	0.00362
SE_M	5	0.5	5	0.00568

	Nominal		Recovered best fit result
Merit function	$h_{0}$ [nm]	$σ_{0}$ [nm]	$\hat{h}$ [nm]	${\hat{σ}}_{h} \cdot \sqrt{N - dof}$ [nm]
SE_H	2	0.1	2	0.1
SE_H	5	0.5	5	0.5

Free model parameters p	Reference for dispersion data	$\frac{χ^{2} (\hat{p})}{N}$	$BIC_{red} (\hat{p})$	$\hat{p}$ [nm]	${\hat{σ}}_{p} \cdot \sqrt{N - dof}$ [nm]	$Corr (\hat{p})$
$(h_{SiO_{2}})$	[14]	1.213	1146.90	2.19	0.39	.
$(h_{SiO_{2}})$	[41]	1.210	1144.18	2.18	0.38	.
$(\begin{matrix} h_{SiO_{2}} \\ h_{SiO} \end{matrix})$	[41]	1.186	1128.85	$(\begin{matrix} 1.81 \\ 0.29 \end{matrix})$	$(\begin{matrix} 2.61 \\ 2.04 \end{matrix})$	$(\begin{array}{cc} . & -0.98 \\ . & . \end{array})$
$(\begin{matrix} h_{SiO_{2}} \\ h_{SiO} \end{matrix})$	[14]	1.186	1128.85	$(\begin{matrix} 1.81 \\ 0.29 \end{matrix})$	$(\begin{matrix} 2.61 \\ 2.04 \end{matrix})$	$(\begin{array}{cc} . & -0.98 \\ . & . \end{array})$
$(\begin{matrix} h_{SiO_{2}} \\ h_{EMA} \end{matrix})$	[41]	1.188	1130.60	$(\begin{matrix} 1.89 \\ 0.26 \end{matrix})$	$(\begin{matrix} 2.10 \\ 1.88 \end{matrix})$	$(\begin{array}{cc} . & -0.98 \\ . & . \end{array})$
$(\begin{matrix} h_{EMA} \\ h_{SiO_{2}} \\ h_{SiO} \end{matrix})$	[41]	1.179	1129.12	$(\begin{matrix} 2.10 \\ 0.29 \\ 0.42 \end{matrix})$	$(\begin{matrix} 26.4 \\ 19.4 \\ 2.7 \end{matrix})$	$(\begin{array}{ccc} . & -0.99 & 0.65 \\ . & . & -0.74 \\ . & . & . \end{array})$
	[41]
	[14]

	Nominal		Recovered best fit result
Merit function	$h_{0}$ [nm]	$σ_{0}$ [nm]	$\hat{h}$ [nm]	${\hat{σ}}_{h} \cdot \sqrt{N - dof}$ [nm]
SE_ΨΔ	2	0.1	1.99813	0.00020
SE_ρ	2	0.1	1.99814	0.00025
SE_M	2	0.1	1.99814	0.00130
SE_H	2	0.1	1.99815	0.10006
SE_ΨΔ	50	1.0	49.9437	0.02089
SE_ρ	50	1.0	49.9339	0.01714
SE_M	50	1.0	49.9342	0.01725
SE_H	50	1.0	49.9286	1.62053

Merit function	${\hat{h}}_{SiO_{2}}$ [nm]	${\hat{σ}}_{h,SiO_{2}} \cdot \sqrt{N - dof}$ [nm]
SE_ΨΔ	2.16	0.15
SE_ρ	2.14	0.33
SE_M	2.15	0.41
SE_H	2.18	0.38


Optics Express

Merit function	${\hat{h}}_{SiO_{2}}$ [nm]	${\hat{σ}}_{h,SiO_{2}} \cdot \sqrt{N - dof}$ [nm]	${\hat{h}}_{SiO}$ [nm]	${\hat{σ}}_{h,SiO} \cdot \sqrt{N - dof}$ [nm]	$Corr ([{\hat{h}}_{SiO_{2}}, {\hat{h}}_{SiO}])$
SE_ΨΔ	1.94	1.48	0.17	1.16	−1.00
SE_ρ	1.84	1.20	0.23	0.88	−0.96
SE_M	1.84	1.74	0.24	1.29	−0.97
SE_H	1.81	2.61	0.29	2.04	−0.99