Quantification and reduction of Poisson-Gaussian mixed noise induced errors in ellipsometry

Bo Jiang; Kai Meng; Kamal Youcef-Toumi

doi:10.1364/OE.432793

1. Introduction

Ellipsometry has been applied in a plethora of industries as a metrology and inspection tool. The technology gains popularity due to its advantages of non-invasiveness, high-speed, and high-resolution. Ellipsometry is a promising metrology tool for the next generation of semiconductor devices [1], and a major tool in new material characterization [2–4]. In addition, the technology has been applied in biomedical detection [5,6] and surface roughness estimation [7]. The working principles of ellipsometry are as follows. An ellipsometer experimentally measures the samples’ changing effects on a light beam’s polarization state, quantified by Mueller matrix or ellipsometric parameters. The experimental results are then fitted to an optical model to extract the sample’s critical dimensions and/or optical properties. The latest development in ellipsometry features spectroscopic Mueller matrix ellipsometry, capable of parallel broadband wavelength scanning for high speed as well as characterizing complex samples exhibiting inhomogeneity and anisotropy. Spectroscopic Mueller matrix ellipsometry is majorly achieved through rotating-component ellipsometers, which are the most popular type of commercial ellipsometry products [8,9]. Recently in academia, the Mueller matrix ellipsometer has been increasingly used to characterize anisotropic materials. In addition, the semiconductor industry is developing towards finer dimension and higher complexity from 2D to 3D, also calling for improved Mueller matrix measurements. Therefore, this paper focuses on spectroscopic Mueller matrix ellipsometry.

Discrete Fourier Transform is the most recognized signal demodulation approach to rotating-component spectroscopic ellipsometry. The light intensity of a rotating-component ellipsometer is a time-varying spectrum, and the time-domain intensity output is a sum of harmonics at any specific wavelength. Signal demodulation involves two steps: solving for the Fourier coefficients from the intensity signal, and in turn solving for the Mueller matrix from the Fourier coefficients.The most popular methods to solve for the Fourier coefficients are Discrete Fourier Transform (DFT) [9,10] and Hadamard transform [8,11], with the same underlying approach. The work in [11] unifies the two methods by establishing the relationship between the Fourier coefficients found by DFT and by Hadamard equations.

Many of the current demodulation methods, such as DFT, are not designed to tackle the signal-dependent noise. Poisson-Gaussian mixed noise exists in an optical system, and introduces errors on measurements. Gaussian noise represents the random and thermal noise. Poisson noise rises from the photon detection being a Poisson process, and its presence means the noise has strong correlation with the signal, and can potentially induce significant errors on the Mueller matrix and in turn the sample measurement. The error propagation analysis and error estimation for both systematic error and random noise have been studied in [12], covering major error sources including setup errors, random errors in experiment, and reconstruction errors. In addition, the paper accounts for other types of random noise than Poisson-Gaussian noise. Excellent works have been devoted to DFT and its advanced variants in optical system: Fourier domain peak phase (FDPP), Fast Peak Locating (FPL), and linear regression (LR) have been applied to estimate the optical path in spectral interferometry in the presence of additive Gaussian noise [13]. However, the above-mentioned methods do not account for the noise at the signal frequency or signal-noise correlation. To achieve Poisson denoising, an efficient way is transforming Poisson to white Gaussian distribution through Anscombe transform. However, signal in rotating-component ellipsometry changes in the temporal domain, and a different Gaussian distribution is generated after transformation for each time point on the signal. The Fourier coefficients are then obtained from all signal points with varying Gaussian distribution properties, and therefore the popular filters for white Gaussian noise such as DFT do not apply. Other works use redundant spatial or temporal information to achieve high imaging performance [14], but the methods do not apply to temporal varying ellipsometry. In addition to signal post-processing, there are studies seeking to reduce the sensitivity to noise through setup optimization, and choosing specific polarization generation and analysis states in order to minimize the measurement variance when the system is subject to Gaussian and/or Poisson noise [15–18]. These works take an important step towards noise mitigation, but is only suitable for discrete measurement method. This paper focuses on continuously-rotating measurement method, having its advantages of being compatible with high-speed and accurate measurement and imposing fewer motion control requirements.

This paper achieves two goals. The first goal is a detailed error characterization and quantification in terms of normalized Mueller matrix calculation in the presence of Poisson-Gaussian noise. Specifically, the distribution of the normalized Mueller matrix elements error is derived to analyze nonlinear propagation occurring at the normalization step. The noise is firstly characterized and quantified through experiments on an in-house setup. Through error propagation, the distribution of normalized Mueller matrix error is then modeled when the system is subject to Poisson-Gaussian noise, and its statistical properties are derived and verified. The model accounts for system parameters including the signal strength, the signal sampling frequency, and the correlation between the noise and signal. The effects of the parameters on the induced errors are also studied.

This paper secondly proposes an alternative approach to DFT in spectroscopic ellipsometry demodulation, in order to reduce the effects of Poisson-Gaussian noise. The major improvement from DFT is that the proposed method accounts for the signal’s probabilistic distribution induced by the Poisson-Gaussian noise. The method then solves for the Fourier coefficients by maximizing the probability of observed signal. The improved performance of the method is demonstrated against DFT in terms of Mueller matrix measurement accuracy and sample dimension reconstruction accuracy.

The rest of this paper develops as follows. Section 2 presents the probabilistic analysis of the induced normalized Mueller matrix error, and the proposed maximum likelihood method of signal demodulation. Section 3 presents the characterization of noise through experiments, verification of the normalized Mueller matrix error analysis through Monte Carlo simulations, and the proposed method’s capability of achieving higher Mueller matrix accuracy as well as higher dimension precision. The paper concludes in Section 4.

2. Methodology

This section firstly quantifies the measurement uncertainty induced by Poisson-Gaussian mixed noise. The probabilistic distributions of the normalized Mueller matrix elements are derived, along with its properties derived as functions of the system and noise parameters. This section secondly proposes an alternative maximum likelihood method to spectroscopic ellipsometry demodulation for improved accuracy.

2.1 Probabilistic analysis of induced normalized Mueller matrix error

In this subsection, the probabilistic analysis of the induced error on normalized Mueller matrix is performed.

A linear transformation can be established between the measured signal and unnomalized Mueller matrix elements. The transformation could be decomposed into two parts: the linear transformation from the signal to the Fourier coefficients, and the linear transformation from the Fourier coefficients to the unnormalized Mueller matrix elements. As to the first part, the output of a rotating-component spectroscopic ellipsometry is a time-varying spectrum resulting from the rotation of designated polarizing components. The light intensity is a sum of harmonics at any specific wavelength. Broadband waveform integration is commonly adopted: the spectrometer measures the integrated intensity over a period of time. An integrated light intensity is given in Eq. (1), where $S_k$ is the $k_{\mathrm {th}}$ integrated intensity, $\Delta t$ is the integration duration, $I_o$ is the DC term, $2N$ is the maximum harmonic order, $\alpha _{2n}$ and $\beta _{2n}$ are the Fourier coefficients associated with order $2n$, and $\omega$ is the rotating components’ fundamental frequency.

(1)$$S_k = \Delta t \cdot I_o + \sum_{n=1}^{N} \frac{\sin(n \omega \Delta t)}{n \omega} \cos[n \omega (2k-1) \Delta t] \alpha_{2n} + \sum_{n=1}^{N} \frac{\sin(n \omega \Delta t)}{n \omega} \sin[n \omega (2k-1) \Delta t] \beta_{2n}$$

Each measured light intensity is therefore a linear function of the DC term and the Fourier coefficients, which are to be determined during signal demodulation. Up to now, the formalism follows the well-defined Hadamard equation method [11]. Define a column vector $\mathbf {\theta }$ consisting of the DC term and the Fourier coefficients as in Eq. (2). Equation (1) can then be rewritten as Eq. (3), where $\phi _k$ is the coefficient vector associated with the $k_{\mathrm {th}}$ integrated intensity.

(2)$$\mathbf{\theta} = [I_o, \alpha_2,\ldots, \alpha_{2N}, \beta_2,\ldots, \beta_{2N}]^T$$

(3)$$S_k = \mathbf{\phi}_k^T \mathbf{\theta}$$

Integrated light intensities are measured consecutively with the same integration time, leading to a system of linear equations as in Eq. (4), where $\mathbf {S}$ is the integrated intensities vector, and $K$ is the number of integrated intensities in one optical period. The elements in coefficient matrix $\mathbf {\Phi }$ involve the rotation speed and sampling frequency.

(4a)$$\mathbf{S}= \begin{bmatrix} S_1 & \cdots & S_k & \cdots & S_K \end{bmatrix} \mathrm{^T}$$

(4b)$$\mathbf{\Phi} = \begin{bmatrix} \mathbf{\phi_1} & \cdots & \mathbf{\phi_k} & \cdots & \mathbf{\phi_K} \end{bmatrix} \mathrm{^T}$$

(4c)$$\mathbf{S} = \mathbf{\Phi} \cdot \mathbf{\theta}$$

A linear relationship can further be established between the unnormalized Mueller matrix elements and the integrated light intensities signal. The unnormalized Mueller matrix elements are solved from the DC term and the Fourier coefficients through a system of linear equations, well established in literature [19]. Here the transformation is written in Eq. (5). The elements of the coefficient matrix $B_1$ involve the optical components’ initial positions and the compensators’ retardance. Equation (6) describes the demodulation, a linear transformation from the integrated intensities to unnormalized Mueller matrix elements. Here the unnormalized Mueller matrix is reshaped to a 16 by 1 column vector $M^u$. The linear regression solution is given in Eq. (6) and (7).

(5)$$\mathbf{\theta} = B_1 \cdot M^u$$

(6)$$M^u = B_2 \cdot S$$

(7)$$B_2 = (B_1^T B_1)^{{-}1} \cdot B_1^T \cdot( \mathbf{\Phi}^T \mathbf{\Phi})^{{-}1} \cdot \mathbf{\Phi}^T$$

An unnormalized Mueller matrix element can be modeled as following a Gaussian distribution. As the number of experiments increases, each integrated intensity follows a Gaussian distribution according to the Central Limit Theorem (CLT). In the case of mixed Poisson-Gaussian noise, the variance of each integrated intensity is linearly related with its magnitude as in Eq. (8), where $\sigma _k^2$ is the variance associated with the $k_{\mathrm {th}}$ integrated intensity. Both coefficients $\gamma _1$ and $\gamma _2$ are constants. The offset $\gamma _2$ represents Gaussian noise. The first order coefficient $\gamma _1$ originates from the photon shot noise. When the spectral irradiance incidents on the photodetector, the generated photoelectrons follow a Poisson distribution. The scaling factor $\gamma _1$ represents the integrated intensity being proportional to the generated photoelectrons [20].

(8)$$\sigma_k^2 = \gamma_1 \cdot S_k + \gamma_2$$

The linear relationship is confirmed through experiments on an in-house setup and presented in the next section. From Eq. (6), an unnormalized Mueller matrix element is a linear summation of integrated intensities. Given the assumption of independent integrated intensities and Gaussian distribution of each integrated intensity, the unnormalized Mueller matrix elements follow Gaussian distributions, as in Eq. (9), where $M_k^u$ is the $k_{\mathrm {th}}$ element in $M^u$, $A_k$ is the $k_{\mathrm {th}}$ row of A, $S_n$ is a vector consisting of the noisy integrated intensity variables.

(9)$$M_k^u =A_k \cdot S_n$$

The properties of error distribution are derived for each normalized Mueller matrix element. The Mueller matrix is only independent of the incidence intensity after normalization. When the matrix is normalized to the first element in the first row as in Eq. (10), the normalized Mueller matrix describes the change in polarization state without accounting for the change in light intensity. In Eq. (10), $M^n$ is the column vector consisting of the normalized Mueller matrix elements, and $M_1^u$ is the first element in $M^u$.

(10)$$M^n = \frac{M^u}{M_1^u}$$

Combined with Eq. (6), the normalized Mueller matrix elements are given in Eq. (11), where $A_1$ is the first row of A, and $A_k$ is the $k_{\mathrm {th}}$ row of A.

(11)$$M_k^n = \frac{A_k \cdot S_n}{A_1 \cdot S_n}$$

Each element in the normalized Mueller matrix therefore follows a ratio distribution from the quotient of two Gaussian variables, or a Cauchy distribution, which does not have a finite mean or variance. However, all normalized Mueller matrix elements pass a Lilliefors test with 1% significance level in simulations, and therefore can be approximated with a Gaussian distribution. This approximation is verified through Monte Carlo simulations in the next section. The properties of the approximated Gaussian distributions are derived as follows. The mean and variance of the numerator and denominator are derived in Eq. (12) and Eq. (13) respectively, where $Q_N$ denotes the numerator of the normalized Mueller matrix elements, $\Delta S$ is the noise vector, $Q_D$ denotes the denominator of the normalized Mueller matrix elements, and $\odot$ represents element-wise multiplication.

(12a)$$Q_N = A_k \cdot S_n = A_k (\mathbf{S}+\Delta S)$$

(12b)$$\mu_{Q_N} = A_k \cdot \mathbf{S}$$

(12c)$$\sigma^{2}_{Q_N} = A_k \odot A_k \cdot \sigma^2_S = A_k \odot A_k \cdot (\gamma_1 \cdot \mathbf{S} + \gamma_2)$$

(13a)$$\mu_{Q_D} = A_1 \cdot \mathbf{S}$$

(13b)$$\sigma^{2}_{Q_D} = A_1 \odot A_1 \cdot (\gamma_1 \cdot \mathbf{S} + \gamma_2)$$

The covariance of the numerator and denominator is calculated in Eq. (14), where $A_{kj}$ is the element of A in the $k_{th}$ row and $j_{th}$ column, similarly $A_{1i}$ is the element of A in the first row and the $i_{th}$ column, $A_{ki}$ is the element of A in the $k_{th}$ row and the $i_{th}$ column, and $\mathcal {N}$ refers to a normal distribution. The simplification in Eq. (14c) is based on independent noise from each integrated light intensity. The expectation in Eq. (14d) is obtained from the property of a Chi-squared distribution.

(14a) $$cov(Q_N,Q_D) = E[(A_1 S_n)(A_k S_n)] -E[A_1 S_n]E[A_k S_n]$$

(14b)$$= E[ (A_1 \mathbf{S})(A_k \mathbf{S}) + (A_1 \mathbf{S})(A_k \Delta S) + (A_1 \Delta S)(A_k \mathbf{S}) + (A_1 \Delta S)(A_k \Delta S)] - (A_1 \mathbf{S})(A_k \mathbf{S})$$

(14c)$$= E[(\sum_{i=1}^n A_{1i}\Delta S_i)(\sum_{j=1}^n A_{kj} \Delta S_j)] = \sum_{i=1}^n E[(A_{1i} \Delta S_i)(A_{ki} \Delta S_i)]$$

(14d)$$= \sum_{i=1}^n A_{1i}A_{ki} \sigma_{\Delta S} E[(\mathcal{N}(0,1))^2]$$

(14e)$$= A_1 \odot A_k (\gamma_1 S + \gamma_2)$$

The mean and variance of the approximated Gaussian distribution of a normalized Mueller matrix element, $E[M_k^n]$ and $\sigma ^2_{M_k^n}$, are then obtained from a second-order Taylor expansion as in Eq. (15), where $\mu _N$ and $\mu _D$ represent the mean of the numerator and denominator respectively.

(15a)$$E[M_k^n] = \frac{\mu_N}{\mu_D} + \frac{\sigma_D^2 \cdot \mu_N}{\mu_D^3} - \frac{cov(Q_N,Q_D)}{\sigma_D^2}$$

(15b)$$\sigma^2_{M_k^n} = \frac{\sigma_D^2 \cdot \mu_N^2}{\mu_D^4} + \frac{\sigma^2_N}{\mu^2_D} - \frac{2 cov(Q_N,Q_D) \cdot \mu_N}{\mu_D^3}$$

Error of the $k_{\mathrm {th}}$ normalized MM element is the difference between $M_k$ and the true value, which is a constant, and therefore is also a Gaussian distribution. The mean and variance of the error are $\mu _e$ and $\sigma _e^2$, and are given in Eq. (16), where $\sigma _D$ represents the standard deviation of the denominator.

(16a)$$\mu_e = \frac{\sigma_D^2\cdot \mu_N}{\mu_D^3} - \frac{cov(Q_N,Q_D)}{\sigma_D^2}$$

(16b)$$\sigma_e^2 = \sigma^2_{M_k^n}$$

Finally, the mean absolute error of a normalized Mueller matrix is a folded normal distribution, and its expectation $E[|\Delta M_k^n|]$ is given by Eq. (17).

(17)$$E[|\Delta M_k^n|] = \sigma_e \sqrt{\frac{2}{\pi}} e^{-(\mu_e^2/2\sigma_e^2)} - \mu_e \left(1-2h(\frac{\mu_e}{\sigma_e})\right)$$

where $h$ is the normal cumulative distribution function.

2.2 Maximum likelihood method in spectroscopic ellipsometry

This subsection derives the maximum likelihood formalism for general rotating-component spectroscopic elllipsometry signal demodulation, as well as its solution as a system of nonlinear equations.

The maximum likelihood method described here solves for the Fourier coefficients, taking into account the probabilistic distribution of the signal. In the existing Hadamard method, to obtain the Fourier coefficients, linear regression minimizes the root mean squared error (RMSE) of the expected integrated intensities. Linear regression is based on the assumption of independent and identical noise distribution at each integrated intensity, which is violated in the case of the ellipsometry system. In practice, the ellipsometry system is subject to dominant Poisson-Gaussian noise, and therefore the noise distribution depends on the signal, and its distribution varies from each integrated intensity to another. Instead of minimizing the RMSE of the expected integrated intensities, this paper finds the Fourier coefficients by maximizing the probability of the signal as in Eq. (18), where $\hat {\mathbf {\theta }}$ is the expected unknown parameters, and $S_k^n$ is the $k_{\mathrm {th}}$ element of $S^n$.

(18)$$\hat{\mathbf{\theta}} = \underset{\mathbf{\theta}}{\mathrm{argmax}} \; P( \{S_1^n,\ldots, S_k^n,\ldots, S_K^n\} | \mathbf{\theta} )$$

The probability of one integrated intensity taking on its observed value is in Eq. (19) when it follows a Gaussian distribution, where $P(S_k^n)$ is the probability of the $k_{\mathrm {th}}$ integrated intensity taking on the value of $S_k^n$, $\sigma _k$ is the variance of the $k_{\mathrm {th}}$ integrated intensity. The probability of all observations assuming independence is shown in Eq. (20), where $p$ is the likelihood function of the all integrated intensities taking on the observed values.

(19)$$P(S_k^n) = \frac{1}{\sqrt{2\pi\sigma_k^2}} \mathrm{exp} \left[\frac{-(S_k^n - S_k)^2}{2 \sigma_k^2} \right]$$

(20)$$p = P( \{S_1^n,\ldots, S_k^n,\ldots, S_K^n\} | \mathbf{\theta} ) = \prod_{k=1}^{K} \frac{1}{\sqrt{2\pi\sigma_k^2}} {\mathrm{exp}} \left[\frac{-(S_k^n-S_k)^2}{2\sigma_k^2} \right]$$

The clean signal is unknown, and the expected signal is used instead as in Eq. (21),where $\hat {S}_k$ is the expected $k_{\mathrm {th}}$ integrated intensities. Taking the natural log of the likelihood function yields the log-likelihood function in Eq. (22).

(21)$$p = \prod_{k=1}^K \left(\frac{1}{2\pi\sigma_k^2} \right)^{\frac{1}{2}} {\mathrm{exp}} \left[\frac{-(\hat{S}_k-S_k^n)^2}{2\sigma_k^2} \right]$$

(22)$$\ln p ={-}\frac{K}{2} \ln(2\pi) - \frac{1}{2} \sum_{k=1}^{K} \ln \sigma_k^2 - \frac{1}{2}\sum_{k=1}^{K}\frac{1}{\sigma_k^2} (\mathbf{\phi}_k^T\mathbf{\theta}-S_k^n)^2$$

Due to the monotonic nature of the natural log function, the objective is to maximize the log-likelihood function. After sign flip, the formalism is equivalent to minimizing the objective function f, defined in Eq. (23a). The optimization problem is shown in Eq. (23). The constraint function $g_k(\theta )$ dictates that all the predicted integrated intensities are non-negative as in Eq. (23c). The optimal condition is described by Eq. (23d) through (23h) from the Karush-Kuhn-Tucker (KKT) conditions, where H is the Hessian matrix, and $\mu _k$ is a slack variable.

(23a)$$f = \sum_{k=1}^{K} \ln \sigma_k^2 + \sum_{k=1}^{K}\frac{1}{\sigma_k^2} (\mathbf{\phi}_k^T\mathbf{\theta}-S_k^n)^2$$

(23b)$$\hat{\mathbf{\theta}} = \underset{\mathbf{\theta}}{\mathrm{argmin}} \; f$$

(23c)$$g_k(\mathbf{\theta}) ={-}\mathbf{\phi}^T\mathbf{\theta} \leq 0 \;\; \forall \; k = 1,\ldots,K$$

(23d)$$\frac{\partial f}{\partial \mathbf{\theta}} + \sum_{k=1}^{K} \frac{\partial g_k}{\partial \mathbf{\theta}} = \mathbf{0}$$

(23e)$$H = \frac{\partial^2 f}{\partial \mathbf{\theta}^2} \; \textrm{is positive definite}$$

(23f)$$\mu_k \geq 0 \;\; \forall \; k = 1,\ldots,k,\ldots,K$$

(23g)$$\mu_k \mathbf{\phi} = 0 \;\; \forall \; k = 1,\ldots,k,\ldots,K$$

(23h)$$-\mathbf{\phi}_k' \mathbf{\theta} \leq 0 \;\; \forall \; k = 1,\ldots,k,\ldots,K$$

After equation manipulation and simplification, Eq. (24) gives the solution to the optimization problem.

(24a)$$\sum_{k=1}^{K} \left[ \frac{\gamma_1+\mathbf{\phi}_k^T\mathbf{\theta}}{\gamma_1\mathbf{\phi}_k^T\mathbf{\theta}+\gamma_2} + \frac{\gamma_2\mathbf{\phi}_k^T\mathbf{\theta}-2\gamma_2 S_k^n-\gamma_1 (S_k^n)^2}{(\gamma_1\mathbf{\phi}_k^T\mathbf{\theta}+\gamma_2)^2} - \mu_k \right] \cdot \phi_k = \mathbf{0}$$

The Hessian matrix H is positive definite with elements $H_{ij}$ given by

(24b)$$\begin{aligned} H_{ij} & = \\& \sum_{k=1}^K \left[ \frac{(\gamma_2-\gamma_1^2)\phi_{ki}\phi_{kj}}{(\gamma_1\mathbf{\phi}_k^T\mathbf{\theta}+\gamma_2)^2} + \frac{\gamma_1\phi_{ki}\phi_{kj}(\gamma_1\mathbf{\phi}_k^T\mathbf{\theta}-2\gamma_2\mathbf{\phi}_k^T\mathbf{\theta}+\gamma_2+4\gamma_2S_k^n+2\gamma_1 (S_k^n)^2)}{(\gamma_1\mathbf{\phi}_k^T\mathbf{\theta}+\gamma_2)^3} \right] \end{aligned}$$

(24c)$$\mu_k \geq 0 \;\; \forall \; k = 1,\ldots,k,\ldots,K$$

(24d)$$\mu_k \mathbf{\phi} = 0 \;\; \forall \; k = 1,\ldots,k,\ldots,K$$

(24e)$$-\mathbf{\phi}_k^T \mathbf{\theta} \leq 0 \;\; \forall \; k = 1,\ldots,k,\ldots,K$$

3. Experiments & Simulations

This section presents experiments and Monte Carlo simulations. The noise is experimentally characterized, the probabilistic analysis of the induced Mueller matrix error is verified through Monte Carlo simulations, and the performance of the proposed maximum likelihood method is evaluated in terms of Mueller matrix errors and dimension errors.

3.1 Experimental noise characterization

To characterize and quantify the noise, measurements are taken on an in-house dual rotating-compensator ellipsometer (DRCE) setup. The setup consists of a light source, a polarizer (P), the first compensator ($\mathrm {C_{1}}$), a sample (S), the second compensator ($\mathrm {C_{2}}$), an analyzer (A), and a detector shown in Fig. 1. During the experiment, the setup is kept stationary in its transmission mode. The spectrometer reads a spectrum at 2068 wavelengths, and 3000 integrated intensities are taken consecutively. The same procedure is repeated for an integration time of 30ms and 100ms, and at several light source intensities. Ideally the integrated intensities are identical given the same wavelength, integration time, and light source intensity. However the integrated intensities involve noise.

Fig. 1. DRCE setup illustration

Download Full Size | PDF

The experiments confirmed dominant Poisson-Gaussian mixed noise, and obtained the distribution properties. Figure 2 plots the signal variance versus its mean and the linear fit. The integrated light intensity depends on the integration time, light spot size (varied by adjusting iris aperture). Given the light source in this setup, the intensity is also different at different wavelengths. Intensities are collected at different combinations of integration time, spot size, and wavelength. For each set of parameters, the variance and mean are calculated from 3000 integrated intensities. The variance is observed to be linearly correlated with the mean. The first-order behavior represents a Poisson process in photon detection. The first-order coefficient is 0.32, representing the property of the CCD detector. The offset magnitude is 51.95 from fitting, owing to a Gaussian process. From experiments, this system is subject to dominant Poisson noise and nondominant Gaussian noise. Higher orders of the variance-mean curve represent other noise sources [12]. In this paper, Poisson-Gaussian is recognized as the dominant random noise in the setup, and the focus of this paper is the study of its effects in signal processing and mitigation. Figure 3 shows the histogram of integrated intensities at 873nm using a 30ms integration time and fixed light source intensity. The integrated intensities follow a Gaussian distribution according to the Central Limit Theorem. In Fig. 4, the noise’s auto-correlation is plotted for a lag of 1 to 100 integration period, and has a maximum magnitude of 0.043, indicating the noise from integrated intensities are independent or very weakly correlated.

Fig. 2. Light intensity variance vs. mean

Download Full Size | PDF

Fig. 3. Integrated intensity histogram at 873nm

Download Full Size | PDF

Fig. 4. Integrated intensities autocorrelation

Download Full Size | PDF

3.2 Monte Carlo simulation verifying Mueller matrix error analysis

In order to verify the probabilistic analysis in Section 2., Monte Carlo simulations are carried out. The expressions of induced normalized Mueller matrix error are verified, as well as the Gaussian distribution approximation of normalized Mueller matrix elements. Monte Carlo simulations are carried out for different values of light intensity, sampling frequency, and noise magnitude. For each set of parameters, the experiment is simulated for 4000 times. For each simulated noise-free integrated intensity, the probabilistic distribution of noise is obtained, where the variance depends on the noise-free integrated intensity. By sampling from the distribution, the noisy integrated intensity is simulated. Simulation results for a grating sample are presented. The $\mathrm {SiO_{2}}$ grating sample has a pitch of 3 $\mu$m, a step height of 1 $\mu$m, and its theoretical Mueller matrix is computed through rigorous coupled-wave analysis (RCWA) at an incident wavelength of 384 nm, an incident angle of 55$^{\circ }$, and an azimuthal angle of 0$^{\circ }$.

The Gaussian approximation of normalized Mueller matrix elements is verified through simulations. All simulations are conducted at fixed nominal normalized MM elements values, obtained from simulation on the above-mentioned grating. Due to presence of Poisson-Gaussian noise, the values of each normalized MM element form a distribution. For example, the fixed nominal value of $M_{33}^n$ is 0.4437, and the simulated measurements form a distribution ranging from -0.5116 to -0.3604. Figure 5(a) shows the histogram of normalized Mueller matrix elements measurements from simulations. The blue bars represent the probability density function from the simulations, and the red lines show the Gaussian fit. Figure 5(b) shows the probability-probability plot (P-P plot) of each MM element against a Gaussian distribution. The blue markers show the data from simulations, and the red line shows the normal distribution reference line. The match further validates the statement that each normalized MM element statistically form a Gaussian distribution. All normalized Mueller matrix elements present a valid Gaussian distribution, except for $M_{11}$, which is at a constant value of 1, due to the normalization with respect to $M_{11}$. The normal probability plot of other Mueller matrix elements can be found in Fig. 8 in the Appendix. The Gaussian approximation is also examined and verified for a smaller number of simulations. For example, the Gaussian approximations pass a Lilliefors test with 2% significance level for 200 simulations.

Fig. 5. $M_{33}^n$ has a normal distribution from Monte Carlo simulations

Download Full Size | PDF

The expression for induced Mueller matrix error is then verified. During each simulation, the mean absolute error of the Mueller matrix elements is calculated by averaging the absolute error over all elements. The mean absolute error is then averaged over all simulations. The effects of three parameters are studied, namely the first-order coefficient between an integrated intensity’s variance and mean $\gamma _1$, the sampling frequency, and the light intensity. The effect of varying coefficient $\gamma _1$ is presented because Poisson noise is the dominant random noise in the system. By inspecting the simulation results, the Monte Carlo simulations ensure convergence. In Fig. 6, each subfigure represents a scenario with a different combination of light intensity and sampling frequency. The light intensity is increased from left to right, with the averaged light intensity $\bar {S}$ obtained by averaging the noise-free integrated intensities. The sampling frequency is increased from top to bottom. Two parameters, optical period and sampling frequency, can both affect the measurement accuracy, and the effects are dependent. By fixing one parameter, the effects of the other can be studied. In the simulations, the optical period is fixed at 36s, and the sampling frequency is varied, and therefore the number of integrated intensities $K$ is varied. In each subfigure, the mean absolute error is plotted with respect to the variance-mean first-order coefficient. The red curves are obtained from the probabilistic analysis, and the blue curves are obtained from Monte Carlo simulations. The match indicates that the probabilistic analysis is verified by Monte Carlo simulation. In all scenarios, the error increases with increased $\gamma _1$ as expected. The error significantly decreases with increased light intensity. However, high intensity is not always feasible. For example, larger electrode gaps are required for the operation of higher power lamps, where the arc is susceptible to wandering [9]. Moreover, to be compatible with high-throughput requirements, the ellipsometer should operate with high speed, and therefore the integration time is shortened, and the integrated intensity magnitude is lowered. Lastly, the signal strength could be further lowered when the sample is absorbing. With respect to the effects of sampling frequency, the error slightly decreases with increasing sampling frequency. However, when operating at higher speed, only fewer integrated intensities can be obtained within one optical period with the same detector sampling capability.

Fig. 6. The induced measurement error increases with $\gamma _1$

Download Full Size | PDF

3.3 Improvements from maximum likelihood method

The improved performance of the proposed maximum likelihood method is demonstrated comparing to the existing DFT method. The Fourier coefficients are solved from the maximum likelihood formalism in MATLAB using fsolve function. The performance indicators are the Mueller matrix mean absolute error and the grating’s reconstructed pitch. Table 1 tabulates the reduction in averaged Mueller matrix mean absolute error by using the maximum likelihood method compared to using DFT in Monte Carlo simulations. In all scenarios, the maximum likelihood method outperforms the linear regression method. Intuitively the maximum likelihood accounts for the varying variance and puts less confidence on the noisier integrated intensities.

Table 1. Percent reduction in Mueller matrix mean absoute error

View Table

The performance of the maximum likelihood method is then evaluated in terms of reconstructed grating pitch. During the reconstruction, the RCWA model is realized with different pitch values, and the closest match to the measurement is used to determine the sample pitch. In this paper, the reconstruction has a resolution of 1nm. Due to the presence of noise, the reconstructed pitch value shows a variation. Figure 7(a) shows the histogram of the reconstructed pitch, where the blue and red bars are the results from the DFT and maximum likelihood method respectively. Figure 7(b) presents another scenario where the light intensity is increased, and therefore the error is reduced. In both cases, the result from the maximum likelihood method is more centered at the nominal value of 3000nm. The maximum likelihood method reduces the standard deviation from 3.9nm to 3.8nm in the first case, and from 2.3nm to 2.1nm in the second case. The result shows that the signal-correlated noise, specifically Poisson-Gaussian noise, can induce large errors on the reconstructed dimension, and the maximum likelihood method mitigates the effect to some extend.

Fig. 7. Histogram of reconstructed pitch from a signal wavelength

Download Full Size | PDF

4. Conclusion

Ellipsometry, as a metrology and inspection tool, plays an important role in many industries. However, the existence of Poisson-Gaussian mixed noise can significantly affect the measurement accuracy in ellipsometry. The noise is characterized and quantified through experiments on an in-house setup, and the linear relationship between the signal variance and mean is quantified. This paper quantifies the induced measurement error by deriving the expression for the expected error of normalized Mueller matrix elements. The expression is verified through Monte Carlo simulations for various scenarios of light intensity, sampling frequency, and the first order coefficient of the integrated intensity’s variance and mean. As expected, the error is reduced with increasing signal strength, increasing sampling frequency, and decreasing first order coefficient of the signal’s variance to mean. The method can be also generalized to other types of random noises such as light source uncertainty using the same procesure. The error quantification serves as a guide in identifying the system’s optimal operating conditions and providing an evaluation to measurements.

This paper then proposes a maximum likelihood estimation approach to signal demodulation in spectroscopic ellipsometry, in order to mitigate the effects of Poisson-Gaussian noise. The method is compared to the existing DFT method, and shows improvements in terms of normalized Mueller matrix accuracy and dimension reconstruction performance. The method could be applied in many applications such as semiconductor metrology and surface roughness estimation and provide more accurate measurements.

Appendix

Fig. 8. Normal probability plot of normalized MM values from simulations

Download Full Size | PDF

Funding

Wuxi Friedrich Measurement and Control Instruments Co., Ltd..

Disclosures

The authors declare no conflicts of interest.

Data Availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon request.

References

1. N. G. Orji, M. Badaroglu, B. M. Barnes, C. Beitia, B. D. Bunday, U. Celano, R. J. Kline, M. Neisser, Y. Obeng, and A. Vladar, “Metrology for the next generation of semiconductor devices,” Nat. Electron. 1(10), 532–547 (2018). [CrossRef]

2. L. Augel, I. Fischer, F. Hornung, M. Dressel, A. Berrier, M. Oehme, and J. Schulze, “Ellipsometric characterization of doped ge 0.95 sn 0.05 films in the infrared range for plasmonic applications,” Opt. Lett. 41(18), 4398–4400 (2016). [CrossRef]

3. R. Synowicki, C. M. Herzinger, J. T. Hall, and A. Malingowski, “Optical constants of electroplated gold from spectroscopic ellipsometry,” Appl. Surf. Sci. 421, 824–830 (2017). [CrossRef]

4. G. Jayaswal, Z. Dai, X. Zhang, M. Bagnarol, A. Martucci, and M. Merano, “Measurement of the surface susceptibility and the surface conductivity of atomically thin mos 2 by spectroscopic ellipsometry,” Opt. Lett. 43(4), 703–706 (2018). [CrossRef]

5. Z. F. Yu, C. W. Pirnstill, and G. L. Coté, “Dual-modulation, dual-wavelength, optical polarimetry system for glucose monitoring,” J. Biomed. Opt. 21(8), 087001 (2016). [CrossRef]

6. A. B. Bulykina, V. A. Ryzhova, and V. V. Korotaev, “In vivo skin surface study by scattered ellipsometry method,” in Optical Methods for Inspection, Characterization, and Imaging of Biomaterials IV, vol. 11060 (International Society for Optics and Photonics, 2019), p. 110601F.

7. B. Fodor, P. Kozma, S. Burger, M. Fried, and P. Petrik, “Effective medium approximation of ellipsometric response from random surface roughness simulated by finite-element method,” Thin Solid Films 617, 20–24 (2016). [CrossRef]

8. D. H. Kim, Y. H. Yun, and K.-N. Joo, “Sparse (spatially phase-retarded spectroscopic ellipsometry) for real-time film analysis,” Opt. Lett. 42(16), 3189–3192 (2017). [CrossRef]

9. H. Tompkins and E. A. Irene, Handbook of ellipsometry (William Andrew, 2005).

10. L. Jin, Y. Iizuka, T. Iwao, E. Kondoh, M. Uehara, and B. Gelloz, “Calibration of the retardation inhomogeneity for the compensator-rotating imaging ellipsometer,” Appl. Opt. 58(33), 9224–9229 (2019). [CrossRef]

11. Y. J. Cho, W. Chegal, and H. M. Cho, “Fourier analysis for rotating-element ellipsometers,” Opt. Lett. 36(2), 118–120 (2011). [CrossRef]

12. X. Chen, S. Liu, H. Gu, and C. Zhang, “Formulation of error propagation and estimation in grating reconstruction by a dual-rotating compensator mueller matrix polarimeter,” Thin Solid Films 571, 653–659 (2014). [CrossRef]

13. C. Li, S. Chen, and Y. Zhu, “Maximum likelihood estimation of optical path length in spectral interferometry,” J. Lightwave Technol. 35(22), 4880–4887 (2017). [CrossRef]

14. C. DeForest, “Noise-gating to clean astrophysical image data,” The Astrophys. J. 838(2), 155 (2017). [CrossRef]

15. X. Li, H. Hu, L. Wu, and T. Liu, “Optimization of instrument matrix for mueller matrix ellipsometry based on partial elements analysis of the mueller matrix,” Opt. Express 25(16), 18872–18884 (2017). [CrossRef]

16. X. Li, F. Goudail, H. Hu, Q. Han, Z. Cheng, and T. Liu, “Optimal ellipsometric parameter measurement strategies based on four intensity measurements in presence of additive gaussian and poisson noise,” Opt. Express 26(26), 34529–34546 (2018). [CrossRef]

17. N. Quan, C. Zhang, and T. Mu, “Optimal configuration of partial mueller matrix polarimeter for measuring the ellipsometric parameters in the presence of poisson shot noise and gaussian noise,” Photonics Nanostructures-Fundamentals Appl. 29, 30–35 (2018). [CrossRef]

18. K. Meng, B. Jiang, C. D. Samolis, M. Alrished, and K. Youcef-Toumi, “Unevenly spaced continuous measurement approach for dual rotating–retarder mueller matrix ellipsometry,” Opt. Express 27(10), 14736–14753 (2019). [CrossRef]

19. R. Collins and J. Koh, “Dual rotating-compensator multichannel ellipsometer: instrument design for real-time mueller matrix spectroscopy of surfaces and films,” J. Opt. Soc. Am. A 16(8), 1997–2006 (1999). [CrossRef]

20. Y. J. Cho, W. Chegal, J. P. Lee, and H. M. Cho, “Universal evaluations and expressions of measuring uncertainty for rotating-element spectroscopic ellipsometers,” Opt. Express 23(12), 16481–16491 (2015). [CrossRef]

		$γ_{1}$ = 0.1	$γ_{1}$ = 0.32	$γ_{1}$ = 0.5	$γ_{1}$ = 1
K = 50	$\bar{S} = 10^{2}$	3.99%	3.15%	2.89%	2.46%
	$\bar{S} = 10^{3}$	3.47%	4.50%	3.19%	3.04%
	$\bar{S} = 10^{4}$	3.55%	3.33%	3.45%	2.93%
K = 70	$\bar{S} = 10^{2}$	3.59%	3.27%	2.46%	2.63%
	$\bar{S} = 10^{3}$	3.48%	3.25%	3.82%	3.28%
	$\bar{S} = 10^{4}$	3.77%	3.72%	3.03%	3.24%

Quantification and reduction of Poisson-Gaussian mixed noise induced errors in ellipsometry

Abstract

1. Introduction

2. Methodology

2.1 Probabilistic analysis of induced normalized Mueller matrix error

2.2 Maximum likelihood method in spectroscopic ellipsometry

3. Experiments & Simulations

3.1 Experimental noise characterization

3.2 Monte Carlo simulation verifying Mueller matrix error analysis

3.3 Improvements from maximum likelihood method

4. Conclusion

Appendix

Funding

Disclosures

Data Availability

References

Data Availability

Cited By

Figures (8)

Tables (1)

Equations (46)

Optics Express