Source mask optimization using the covariance matrix adaptation evolution strategy

Guodong Chen; Guodong Chen; Sikun Li; Sikun Li; Sikun Li; Xiangzhao Wang; Xiangzhao Wang; Xiangzhao Wang

doi:10.1364/OE.410032

1. Introduction

Lithography is one of the key technologies for the very-large-scale integrated circuits (VLSI) manufacturing. Resolution is one of the three key performance metrics for lithography. As the technology node continues to shrink, resolution enhancement techniques (RETs) [1] are indispensable for overcoming the significant optical proximity effects (OPEs) and improving the lithographic imaging quality, such as resolution, image fidelity and process robustness. Source mask optimization (SMO) proves to be an effective resolution enhancement technique for 22nm technology node and beyond [2].

Rosenbluth et al. introduced the concept of SMO in 2001 [3]. Since then, many researches related to SMO have been carried out, involving optimization algorithms and strategies to accelerate the convergence speed and improve the searching ability for global optimum solution, representation methods of the source/mask to balance the freedom degree and efficiency of optimization, and intensity calculation formulations convenient for different optimization objects. Extended SMO methods improve the imaging quality and process robustness by establishing various evaluation metrics, such as the process window (PW), the normalized image log slope (NILS), the edge placement error (EPE), the pattern error (PE), the mask error enhancement factor (MEEF) and so on.

Early SMO was realized by joint optimization of the parametric source and the mask, with the mask modified via optical proximity correction (OPC) or sub-resolution assist feature (SRAF) insertion. For better image fidelity and larger process window, pixelated SMO methods [4–18] are successively proposed to exploit the higher degree of optimization freedom provided by the freeform source and pixelated mask [19–21].

Pixelated SMO methods include gradient-based methods such as gradient descent (GD) [4], steepest descent (SD) [5,6], conjugate gradient (CG) [7,8], augmented Lagrange [9], depth learning method [10], level-set based method [11,12], compressive sensing based method [13,14], and SMO methods using heuristic algorithms like genetic algorithm (GA) [15,16], particle swarm optimization (PSO) [17], and self-adaptive differential evolution (JADE) [18].

In gradient-based SMO methods, using a sigmoid function to approximate the resist effect [4–9] enables a differentiable merit function, which is convenient for calculating the derivate for the source and mask variables and adjusting the variables along the descent direction of the gradient. However, it is difficult and time-consuming to calculate the gradients of a more complex merit function for a lithographic imaging model considering the complicated resist effect, limiting the applicability of the gradient-based SMO methods. Heuristic algorithms are popular, because they are free of gradient calculations, free of the prior knowledge for the specific problem, and search for a suboptimal solution as an alternative global optimal solution. They are compatible with various lithographic imaging models. The existing SMO methods using heuristic algorithms [15–18] pick out the optimal solution by continuously comparing the fitness values of different encoded individuals and generating new populations. Hence, these methods are easy to implement. Nevertheless, these methods lack control of the solution search space and the search step size during the optimization stage, and make little attempt to find out the best direction for the variables’ adjustment, causing an inefficient convergence.

Using a large number of variables to encode the source pixels’ intensity and the mask pixels’ transmittance permits a rather large solution space for SMO, but suffers from an intensive computation load conversely. Several researchers proposed alternative source representation methods to speed up the source optimization process. Wu et al. introduced the Zernike polynomials as basis functions and characterized the source as a weighted superposition of well-chosen Zernike polynomial functions [22]. The source variables corresponds to the coefficients of chosen Zernike polynomial functions, so the number of source variables reduces significantly. However, the continuity of Zernike polynomials forces the optimized source to be concentrated in certain regions without highlighting the important spatial frequency components of the source [23]. Yang et al. proposed a multipole source representation under the prerequisite of small pupil filling ratio [16]. The source consists of a group of circular poles with finite-size. The source is modulated by tuning the poles’ locations and intensities. Artificial restriction on the number of circular poles according to a given pupil filling ratio exactly improves the convergence efficiency, but the small number of poles may limit the global optimization capacity because of insufficient coverage of the solution space composed of locations and intensities. Moreover, it is noted that the optimized sources using the multipole representation and the full pixel representation are quite different.

While the pixelated source provides many degrees of freedom, only part of the source pixels on behalf of specific illumination directions contribute to improving the imaging quality. Most source pixels usually have zero intensity after source optimization. It is inefficient to optimize all pixels’ intensity without taking advantage of the sparse property of the optimal source.

In this paper, we propose a pixelated SMO method using the covariance matrix adaptation evolution strategy (CMA-ES). CMA-ES is an evolutionary strategy based on Gaussian mutation and deterministic selection [24]. It is one of the best choices against ill-conditioned, non-convex and non-separable black-box optimization problems in the continuous domain. CMA-ES adaptively adjusts the solution search space by adapting the covariance matrix using rank-µ-update and rank-1-update mechanism, and controls the global step size related to the scale of the distribution. Good solutions in former generations have higher probabilities to reappear in later generations. To further identify the most advantageous spatial frequency components of the source, we propose a new source representation method. The source comprises a set of ideal point sources with unit intensity and adjustable positions. The number of the ideal point sources is predetermined based on the total number of source pixels and partial coherence factor of the illumination system. According to the position coordinates, the point sources fall into different source pixels. A source pixel with more point sources inside has a higher relative intensity. The positions of the point sources determine the intensity distribution of the pixelated source and affect the imaging performance. Consequently, the source optimization is converted to the adjustment of the point sources’ positions. The source representation method proposed in this paper reduces the number of source variables and helps to highlight the spatial frequency components contributing to better imaging quality. Lithographic imaging systems with complicated resist models limit the application of the gradient based SMO methods, because it is difficult to calculate the analytical derivate of the merit function. By contrast, the CMA-ES based SMO is compatible with any type of resist model and more applicable than gradient based SMO methods.

Simulation results show that the CMA-ES based SMO together with the new source representation improve the optimization capacity and the convergence efficiency.

2. Methodology

2.1 Forward vector imaging formulation

Figure 1 illustrates the schematic of immersion lithography system. An extended light source is placed at the focal plane of the condenser to generate Köhler illumination [25]. The light rays emitted from the source are diffracted when passing through the mask. Only low-frequency diffracted light rays pass the projection lens due to the low-pass filtering effect of the pupil. When the light rays reach the film stack on the wafer, they expose the resist and change the solubility of the resist. After post exposure bake (PEB) and development, a feature is printed on the wafer. The aim of SMO as well as other types of RETs is to make the feature printed on the wafer as close to the target pattern (TP) as possible.

Fig. 1. Schematic of immersion lithography

Download Full Size | PDF

The imaging process in the lithography system belongs to partial coherence imaging. According to Abbe imaging theory [25], the aerial image is calculated as

(1)$$I(\hat{x},\hat{y}) = \int\limits_{ - \infty }^{ + \infty }\!\!\!{\int {J(\hat{f},\hat{g})\left[ {{{\left|{\int\limits_{ - \infty }^{ + \infty }\!\!\! {\int {H(\hat{f} + {{\hat{f}}^{\prime}},\hat{g} + {{\hat{g}}^{\prime}})O({{\hat{f}}^{\prime}},{{\hat{g}}^{\prime}}){e^{ - i2\pi ({{\hat{f}}^{\prime}}\hat{x} + {{\hat{g}}^{\prime}}\hat{y})}}d{{\hat{f}}^{\prime}}d{{\hat{g}}^{\prime}}} } } \right|}^2}} \right]} d\hat{f}d\hat{g}}. $$

In Eq. (1), $(\hat{x},\hat{y})$ are normalized spatial coordinates in the image plane, $(\hat{f},\hat{g})$ are normalized spatial frequencies in the pupil plane, $({\hat{f}^{\prime}},{\hat{g}^{\prime}})$ are normalized spatial frequencies of the mask spectrum. All of them are dimensionless owing to the normalization operation. $J(\hat{f},\hat{g})$ is the intensity of a point source at $(\hat{f},\hat{g})$. $H(\hat{f},\hat{g})$ denotes the pupil function which covers the effects of defocus, aberration and apodization, besides the original low-frequency pass property with $(\hat{f},\hat{g})$ restricted in the unit circle. $O({\hat{f}^{\prime}},{\hat{g}^{\prime}})$ is the spectrum of the mask. The quantity delimited by the square brackets is the coherent image due to a point source of intensity$J(\hat{f},\hat{g})$. With the pixel-based representation of the source and the mask, Eq. (1) can be calculated numerically. Using the illumination cross coefficient (ICC) representation [26], Eq. (1) is transformed to the format of a matrix-vector multiplication as

(2)$$\tilde{I} = ICC \cdot \tilde{J}. $$

The two-dimensional discrete source J is converted to one-dimensional vector $\tilde{J}$ by stacking the source pixels column by column. Each column of ICC corresponds to the coherent image generated by the source pixels of $\tilde{J}$, just appearing in the form of a column vector by stacking the image matrix column by column. The output vector $\tilde{I}$ is just a simple linear weighted superposition of coherent image vectors. The aerial image I is a rearrangement of $\tilde{I}$ according to the dimensional sizes of mask matrix.

Alternatively, by exchanging integral order in Eq. (1), the Hopkins’ formulation for lithographic imaging is obtained [25]. Transmission cross coefficient (TCC) has been extensively exploited in fast aerial image calculation, because it has integrated all the information of the imaging system except that of the mask. TCC is decomposed into multiple kernels via the singular value decomposition (SVD) as

(3)$$TCC({\hat{f}^{\prime}},{\hat{g}^{\prime}};{\hat{f}^{^{\prime\prime}}},{\hat{g}^{^{\prime\prime}}}) \approx \sum\limits_{i = 1}^K {{\lambda _i}{\Phi _i}({{\hat{f}}^{\prime}},{{\hat{g}}^{\prime}})\Phi _i^\ast ({{\hat{f}}^{^{\prime\prime}}},{{\hat{g}}^{^{\prime\prime}}})}. $$

${\Phi _i}$ is the kernel corresponding to the ${i_{th}}$ singular value ${\lambda _i}$, K denotes the number of kernels to approximate TCC and determines the accuracy of aerial image calculation. Both $({\hat{f}^{\prime}},{\hat{g}^{\prime}})$ and $({\hat{f}^{^{\prime\prime}}},{\hat{g}^{^{\prime\prime}}})$ are normalized spatial frequencies of the mask spectrum. The aerial image is calculated as

(4)$$I = IFFT\left\{ {\sum\limits_{i = 1}^K {\sqrt {{\lambda_i}} {\Phi _i}({f^{\prime}},{g^{\prime}})O({f^{\prime}},{g^{\prime}})} } \right\}. $$

For pixel-based SMO, the above two kinds of aerial image calculation formulations are applied in different optimization stages for higher efficiency. The ICC method is efficient during source optimization, and the TCC method is preferable during mask optimization. One important issue is the aerial image intensity normalization with the open-frame intensity [27].

In this paper, the resist effect is approximated with the sigmoid model. If necessary, the sigmoid model can be replaced with a more accurate and physical resist model. The resist image (RI) is given by

(5)$$RI = sig\{I \}\textrm{ = }\frac{\textrm{1}}{{\textrm{1}\textrm{ +} exp[{\textrm{ - }\alpha \textrm{(}\textrm{I - tr}\textrm{)}} ]}}.$$

The parameter $\alpha $ and $tr$ are the steepness and threshold of the resist, respectively. Besides, we use the constant threshold resist model to generate the printed image (PI) on the wafer directly from the comparison between the aerial image and the threshold,

(6)$$PI = \Gamma ({I - tr} ). $$

$\Gamma ({\cdot} )$ is the hard threshold function, which says $\Gamma (x) = 1,\;if\;x > 0$. The parameter $tr$ is the same as above. Unlike the continuously valued RI, PI is a binary wafer image depending on whether the resist material is removed after development. These two models are essentially the same if we use a threshold of 0.5 to truncate the resist image for the printed wafer image. A threshold of 0.5 for development process is a reasonable assumption, because the resist image intensities are as close to 0 or 1 as possible if SMO is well done, far away from the hard threshold used in the constant threshold resist model.

2.2 Encoding and decoding of the source and mask

Encoding of the source and mask is to use a set of variables to characterize the constructive elements for the source and mask, while decoding of the source and mask is to perform some transformations on the variables to recover the source and mask. For pixelated SMO, both the source and the mask are gridded into pixels in the Cartesian system. Let positive odd numbers ${N_S}$ and ${N_M}$ be the lateral dimensions of the source and the mask, respectively. The pixelated source is an ${N_S} \times {N_S}$ matrix, and the pixelated mask is an ${N_M} \times {N_M}$ matrix.

We use $(\hat{f},\hat{g})$ to denote the spatial frequency coordinates of the source pixels. Every source pixel represents a unique illumination direction, and the pixels’ values reveal the illumination directional intensity distribution. Owing to the restriction of cut-off frequency from the pupil, only those source pixels in the unit circle participate in imaging. The discrete values for $\hat{f}$ are from -1 to 1 with an increment of ${2 / {({{N_S} - 1} )}}$, and $\hat{g}$ shares the same discrete values with $\hat{f}$. The spatial frequency coordinates $(\hat{f},\hat{g})$ match with the position indexes $(m,n)$ for each source pixel with the relationship of $m = {{(\hat{f} + 1)({N_S} - 1)} / \textrm{2}} + 1$, $n = {{(\hat{g} + 1)({N_S} - 1)} / \textrm{2}} + 1,\;\textrm{1} \le m \le {N_S},\;1 \le n \le {N_S},\;m \in {\mathbb Z},\;n \in {\mathbb Z}$. To prevent pattern shift caused by a mono-pole source, we use a 4-fold source which is symmetric with respect to the horizontal and vertical axes. In this paper, we use ${N_{pre}}$ point sources with unit intensity and adjustable positions to represent the source. ${N_{pre}}$ is the number of point sources in the first quadrant. We assume that the point sources are ideal and infinitesimal. The point sources in other quadrants are obtained by mirror operation. Figure 2 shows the source representation with a predetermined number of point sources. The circles with gradient colors indicate the blur effect of the source that will be mentioned later.

Fig. 2. Source representation with a predetermined number of ideal point sources

Download Full Size | PDF

The point sources fall into different source pixels. A source pixel with more point sources inside has a higher relative intensity. The positions of the point sources determine the intensity distribution of the pixelated source and affect the imaging performance.

The source is encoded with all the point sources’ polar coordinates as

(7)$${x_S} = ({{\rho_1},\;{\rho_2},\; \cdots ,\;{\rho_{{N_{pre}}}},\;{\theta_1},\;{\theta_2},\; \cdots ,\;{\theta_{{N_{pre}}}}} ). $$

The number of variables for source encoding is${D_S} = 2{N_{pre}}$. $({{\rho_k},\;{\theta_k}} )$ are the polar coordinates for the ${k_{th}}$ point source, $1 \le k \le {N_{pre}}$, $k \in {\mathbb Z}$, ${\rho _k} \in [{{\sigma_{in}},\;{\sigma_{out}}} ]$, ${\sigma _{in}}$ and ${\sigma _{out}}$ are the inner and outer partial coherence factor for the source, ${\theta _k} \in [{0,{\pi / 2}} ]$. Regardless of the ideal point source’s specific location within the pixel, the ${k_{th}}$ point source falls in the source pixel with index

(8)$$({{m_k},{n_k}} )= \left( {round\left( {{\rho_k}\cos {\theta_k} \times \frac{{{N_S}}}{2} + \frac{{{N_S}}}{2}} \right),\;round\left( {{\rho_k}\sin {\theta_k} \times \frac{{{N_S}}}{2} + \frac{{{N_S}}}{2}} \right)} \right). $$

The function round(x) returns the nearest integer of the input real number x. According to the point sources’ distribution in the first quadrant, the number of point sources falling in the pixel with index (m, n) from the full source matrix is expressed as

(9)$$S(m,n) = \sum\limits_{k = 1}^{{N_{pre}}} {\left[ \begin{array}{l} \delta ({m - {m_k},\;n - {n_k}} )+ \delta ({m + {m_k} - {N_S} - 1,\;n - {n_k}} )+ \\ \delta ({m - {m_k},\;n + {n_k} - {N_S} - 1} )+ \delta ({m + {m_k} - {N_S} - 1,\;n + {n_k} - {N_S} - 1} )\end{array} \right]}. $$

(10)$$\delta ({m - {m_k},\;n - {n_k}} )= \left\{ {\begin{array}{cc} 1&{when\;m = {m_m}\;and\;n = {n_k}}\\ 0&{others} \end{array}} \right.. $$

Some source pixels have larger intensities by collecting more point sources, which in turn supports the more positive influence of these pixels on imaging performance than other pixels.

To avoid duplicated count of point sources falling in the middle row or column that act as the symmetry axes, we divide the count number at those locations by 2. In mathematic form, it is written as

(11)$$\begin{array}{l} S({{{({{N_S} + 1} )} / 2},\;:} )= \frac{\textrm{1}}{\textrm{2}} \times S({{{({{N_S} + 1} )} / 2},\;:} )\\ S({:,\;{{({{N_S} + 1} )} / 2}} )= \frac{\textrm{1}}{\textrm{2}} \times S({:,\;{{({{N_S} + 1} )} / 2}} )\end{array}. $$

$S({{{({{N_S} + 1} )} / \textrm{2}},\;:} )$ indicates the number of point sources falling in the middle row of S, $S({:,\;{{({{N_S} + 1} )} / 2}} )$ indicates the number of point sources falling in the middle column of S.

There exists a practical limit to the sharpness of defined illumination directions. R. Socha suggested to roughly account for the finite illuminator resolution by convolving the pixel-based source with a blur function [24]. The source with blur effect is defined as

(12)$${S_{blurred}} = S \otimes PSF. $$

${\otimes} $ is the convolution operator, PSF is a Gaussian function with a support related to the source pixel size, ${S_{blurred}}$ is the source with blur effect. The source blur effect is applied to all the subsequent SMO implementations in this paper. The source is normalized with all pixels’ intensities scaled by the maximum intensity,

(13)$$\hat{S} = \frac{{{S_{blurred}}}}{{max({S_{blurred}})}}. $$

$\hat{S}$ is the decoded source. Keeping the pixels’ maximum intensity unchanged as one helps to avoid sharp spikes that may damage lenses [28]. From the perspective of efficient imaging with ICC method, the normalized source $\hat{S}$ is converted to a N_S²×1 vector SV column-wise.

For source representation with full pixels, the source is encoded with intensities of all the pixels in the concentric ring region with inner and outer radii of ${\sigma _{in}}$ and ${\sigma _{out}}$, respectively. The number of source variables depends on the total number of source pixels in that concentric ring region.

Figure 3 illustrates the mask representation with opaque and transparent pixels. The parameter P denotes the period of the mask pattern.

Fig. 3. Mask representation with opaque and transparent pixels

Download Full Size | PDF

Encoding and decoding of the mask depends on the symmetry of the mask pattern. We take the asymmetric mask and 4-fold symmetric mask for examples to demonstrate the encoding and decoding of the mask.

The asymmetric mask is encoded with all the mask pixels’ transmittance extracted column by column to form a row vector,

(14)$${x_M} = ({{t_1}, \cdots ,{t_{{N_M}}},{t_{1 + {N_M}}}, \cdots ,{t_{2{N_M}}}, \cdots ,{t_{1 + {N_M}({{N_M} - 1} )}}, \cdots ,{t_{{N_M} \times {N_M}}}} ). $$

The number of variables for an asymmetric mask encoding is ${D_M} = N_M^2$. The parameter ${t_j}$ denotes the transmittance of the ${j_{th}}$ mask pixel, ${t_j} \in [0,1],\;1 \le j \le {D_M},\;j \in {\mathbb Z}$. Recovery of the asymmetric mask is very simple, just by reshaping the encoding variables to the mask dimensions,

(15)$$M = reshape(x_M^T,{N_M},{N_M}). $$

The superscript ^T is the transpose operator, and the function reshape rearranges the column vector $x_M^T$ into an N_M×N_M matrix.

For a 4-fold symmetric mask, the encoding and decoding operations are similar to those of a 4-fold symmetric source. We encode the mask with the mask pixels in the first quadrant, and update the pixels in other quadrants using mirror operation and halve the pixels’ transmittance in the middle row and column. The number of variables for an asymmetric mask encoding is ${D_M} = {{{{({1 + {N_M}} )}^2}} / 4}$.

The resulting transmittance matrix M is the recovered mask with continuous transmittance. In this paper, the continuous transmittance mask is just the intermediate object for variable updates. We always use a binary mask for imaging evaluation. Therefore, the last step of mask decoding is to binarize a mask with threshold of 0.5,

(16)$${M_B} = \Gamma ({M\textrm{ - }0.5} ). $$

$\Gamma ({\cdot} )$ is the same hard threshold function as mentioned above. If the transmittance of a mask pixel is larger than 0.5, reset the pixel’s transmittance to one. Otherwise, reset the pixel’s transmittance to zero. The binary mask M_B is the decoded mask.

2.3 Flow of the CMA-ES based SMO

In this paper, the merit function for the SMO process is defined as the square of the Euclidean distance between the resist image (RI) and the target pattern (TP),

(17)$$F\{{{x_S},{x_M}} \}= ||{RI\{{{x_S},{x_M}} \}- TP} ||_2^2. $$

The resist image and the target pattern have the same size as the mask pattern. They are both ${N_M} \times {N_M}$ matrices. Given the encoded ${x_S}$ and ${x_M}$, the source and mask are recovered to calculate RI according to the flow introduced in section 2.1. Other kinds of metrics for imaging quality evaluation are available for different motivations [23,29,30].

The aim of SMO is to minimize the merit function by optimizing the source and mask variables under certain constraints as follows,

(18)$$\{ x_S^\ast ,x_M^\ast \} = \arg \mathop {min}\limits_{{x_S},{x_M}} \;F\{ {x_S},{x_M}\} ,\;\;\;\;\;\;\;\;\;s.t.\left\{ \begin{array}{l} {\sigma_{in}} \le x_S^p \le {\sigma_{out}},\;\;1 \le p \le \frac{{{D_S}}}{2}\\ 0 \le x_S^p \le \frac{\pi }{2},\;\;\frac{{{D_S}}}{2} + 1 \le p \le {D_S}\\ 0 \le x_M^q \le 1,\;\;1 \le q \le {D_M} \end{array} \right.. $$

In the list of constraints, $x_S^p$ refers to the ${p_{th}}$ element of the source variables, $x_M^q$ refers to the ${q_{th}}$ element of the mask variables. $x_S^\ast $ and $x_M^\ast $ are the output optimal source variables and mask variables.

We use CMA-ES as the optimization algorithm to implement the sequential optimization of the source and mask. The flow of CMA-ES based SMO is shown in Fig. 4. The source and the mask are encoded as ${x_S}$ and ${x_M}$, respectively. The source variables are initialized randomly according to their ranges to generate the initial source. The mask variables are initialized corresponding to the pixel-based target pattern, which is directly sampled from the mask pattern. Therefore, the initial mask is the same as the target pattern. Then, set other parameters related to the lithographic system such as the illumination wavelength, numerical aperture, refraction index of the immersion liquid, reduction ratio, aberration and so on. When the initialization is finished, we start the one-loop of sequential optimization of the source and the mask, followed by an additional refinement of the source to match the optimized mask. For the sequential SMO, each stage of the optimization is a greedy search for optimal solutions based on current conditions. After many tests, we find that it is amazing to obtain good enough solutions with only one loop of sequential source and mask optimization owing to the excellent performance of CMA-ES, instead of the need for iterative efforts when using other optimization algorithms.

Fig. 4. The flow chart of the CMA-ES based SMO

Download Full Size | PDF

CMA-ES suits ill-conditioned, non-convex and non-separable black-box optimization problems in the continuous domain due to its invariance properties. CMA-ES use a multivariate normal distribution to characterize the distribution of solutions. By updating the covariance matrix with good solutions found in the populations along with the generations, together with a control of the search step size, CMA-ES helps to find an optimal solution more efficiently. For an optimization problem, CMA-ES only needs the encoding and decoding methods of the solution vector, constraints of the solution variables and the merit function. Among them, the dimension of the solution vector is the most important factor for CMA-ES, because many other parameters inherent to CMA-ES depend on the dimension. As a result, the general optimization flow of CMA-ES usually starts with the construction of the merit function according to a specific problem. The solution vectors are the interfaces to call the merit function. ${x_S}$ and ${x_M}$ are the solution encoding for source optimization and mask optimization via CMA-ES, respectively. The corresponding dimensions of the solution vector are ${D_S}$ and ${D_M}$.

We take the source optimization via CMA-ES for example to describe the optimization flow of CMA-ES, and the same flow goes for the mask optimization by changing the dimension, encoding and decoding of the solution variables.

The dimension N equals ${D_S}$ in the source optimization, and the source is encoded as ${x_S}$. Given N, we can set the internal parameters of CMA-ES. The population size λ is given by

(19)$$\lambda = 4 + floor({3 \times \log (N)} ). $$

The function $floor(x)$ rounds a real number x towards negative infinity. The parent population size µ, i.e. the number of selected individuals to update the mean of the distribution, is given by

(20)$$\mu = floor({\lambda / 2}). $$

Decreasing weights for these µ individuals are given by

(21)$${w_i} = \ln ({{{(\lambda + 1)} / 2}} )- \ln (i),\;i = 1,2, \cdots ,\mu. $$

To make the sum of all the weights be one, the weights are normalized with the summand as

(22)$${W_i} = {{{w_i}} / {\sum\limits_{j = 1}^\mu {{w_j}} }}, $$

with$\sum\limits_{i = 1}^\mu {{W_i}} = 1,\;{W_1} > {W_2} > \cdots > {W_\mu } > 0$. Initialize the covariance matrix as ${C^{(0)}} = {I_{N \times N}}$, the evolution path for the covariance matrix and step-size as$p_c^{(0)} = {\bf 0}$, $p_\sigma ^{(0)} = {\bf 0}$, and choose distribution mean ${m^{(0)}} \in {{\mathbb R}^N}$ and step-size ${\sigma ^{(0)}}$ depending on the problem. The superscript ⁽⁰⁾ refers to the parameters at the generation 0.

The CMA-ES generates a population with λ individuals as the next generation by sampling from a multivariate normal distribution ${\cal N}({{m^{(g)}},\;{\sigma^{{{(g)}^2}}}{C^{(g)}}} )$ of current generation,

(23)$$x_k^{(g + 1)} \sim {\cal N}({{m^{(g)}},\;{\sigma^{{{(g)}^2}}}{C^{(g)}}} ),\;k = 1,2, \ldots ,\lambda. $$

The superscript ^(g) refers to the parameters of the search distribution at the generation g. ${m^{(g)}}$ is the mean at generation g, ${m^{(g)}} \in {{\mathbb R}^N}$, ${C^{(g)}}$ is the covariance matrix at generation g, ${C^{(g)}} \in {{\mathbb R}^{N \times N}}$, ${\sigma ^{(g)}}$ is the step-size at generation g. $x_k^{(g + 1)}$ is the ${k_{th}}$ individual from generation g+1. The symbol “∼” means that $x_k^{(g + 1)}$ is a random variable obeying the multivariate normal distribution.

All the individuals are ranked in ascending order according to their merit function values,

$F(x_{1:\lambda }^{(g + 1)}) \le F(x_{2:\lambda }^{(g + 1)}) \le \cdots \le F(x_{\lambda :\lambda }^{(g + 1)})$, $x_{i:\lambda }^{(g + 1)}$ is the individual with the ${i_{th}}$ smallest merit function values within the population at generation g+1. For each generation, the call times of merit function will increase by λ. This cumulative count number of merit function call is used for adjustment of the covariance matrix and termination judgement. The mean of the distribution is updated with the first µ individuals with the smallest merit function values,

(24)$${m^{(g + 1)}} = \sum\limits_{i = 1}^\mu {{W_i}x_{i:\lambda }^{(g + 1)}} . $$

The evolution path for the step-size is

(25)$$p_\sigma ^{(g + 1)} = ({1 - {c_\sigma }} )p_\sigma ^{(g)} + \sqrt {{c_\sigma }({2 - {c_\sigma }} ){\mu _{eff}}} {C^{{{(g)}^{ - \frac{1}{2}}}}}\frac{{{m^{(g + 1)}} - {m^{(g)}}}}{{{\sigma ^{(g)}}}}. $$

The step-size is adaptively updated by comparing length of the evolution paths $||{p_\sigma^{(g + 1)}} ||$ with its expected length $E||{{\cal N}({\bf 0},{\boldsymbol I})} ||$,

(26)$${\sigma ^{(g + 1)}} = {\sigma ^{(g)}} \times exp \left( {\frac{{{c_\sigma }}}{{{d_\sigma }}}\left( {\frac{{||{p_\sigma^{(g + 1)}} ||}}{{E||{{\cal N}({\bf 0},{\boldsymbol I})} ||}} - 1} \right)} \right). $$

It is with this adaptive adjustment for the search step-size that the optimization efficiency is improved.

The accumulation means a learning strategy taken over a number of generations, using the evolution path to reintroduce the sign information, which is related to the search direction and vital for searching the optimal solution. The evolution path for the covariance matrix is

(27)$$p_c^{(g + 1)} = ({1 - {c_c}} )p_c^{(g)} + {h_\sigma }\sqrt {{c_c}({2 - {c_c}} ){\mu _{eff}}} \frac{{{m^{(g + 1)}} - {m^{(g)}}}}{{{\sigma ^{(g)}}}}, $$

(28)$${h_\sigma } = \left\{ \begin{array}{l} 1,\;\;\;if\;\frac{{||{p_\sigma^{(g + 1)}} ||}}{{\sqrt {1 - {{({1 - {c_\sigma }} )}^{2({g + 1} )}}} }} < \left( {1.4 + \frac{2}{{N + 1}}} \right)E||{{\cal N}({\bf 0},{\boldsymbol I})} ||\\ 0,\;\;\;others \end{array} \right.. $$

The covariance matrix is updated with rank-1 update and rank-µ update mechanisms,

(29)$${C^{(g + 1)}} = \left( {1 + {c_1}\delta ({h_\sigma }) - {c_1} - {c_\mu }\sum\limits_{k = 1}^\lambda {{W_k}} } \right){C^{(g)}} + {c_1}p_c^{(g + 1)}p_c^{{{(g + 1)}^T}} + {c_\mu }\sum\limits_{k = 1}^\mu {{W_k}y_{k:\lambda }^{(g + 1)}y_{k:\lambda }^{{{(g + 1)}^T}}} . $$

On the right side of Eq. (29), the second term adopts the rank-1 update mechanism, and the third term imports the rank-µ update mechanism. On the one hand, rank-1 update exploits the information of correlations among generations by using the evolution path. On the other hand, rank-µ update makes good use of the information from the entire population. Rank-1 update is important in small populations, while rank-µ update is particularly important in large populations. The combination of rank-1 update and rank-µ update makes CMA-ES applicable to optimization problems with both large and small populations. Besides, an explicit control of the covariance matrix works every ${\lambda / {({10N({{c_1} + {c_\mu }} )} )}}$ generations to reduce the numerical effort due to the eigenvalue decomposition. A more brief description about CMA-ES can be found in [24].

The CMA-ES based SO flow will terminate according to several criteria. The parameter maxEval gives the maximum call times of the merit function. The parameter maxCond denotes the maximum condition number of the covariance matrix. The parameter minF denotes the threshold to stop the optimization if an individual’s merit function value is less than the threshold. The parameter nLimit denotes the maximum generations with the best merit function values unchanged. The parameters for termination judgement are selected depending on the specific problem, considering the time consumption and the memory occupation. The optimization process with CMA-ES continues until any of the termination criteria is satisfied.

The above optimization flow using CMA-ES is also applied to the mask optimization. The dimension N equals D_M in the mask optimization, and the mask is encoded as ${x_M}$. The parameters for termination judgement are adjusted accordingly. When the whole SMO flow terminates, the optimal solution $x_S^\ast $ and $x_M^\ast $ are output, and the optimized source $\hat{S}$ and mask M_B are synthesized according to the decoding operations introduced in section 2.2.

3. Simulations and analysis

This Section presents several simulations to verify the optimization capacity and convergence efficiency of the proposed method. In this paper, all of the simulations are implemented on a computer with Intel Core i5-6300HQ CPU, 2.3GHz, and 8GB of RAM.

The simulations exploit the vector imaging theory for the 193nm immersion lithography system. The source of the partially coherent imaging system is pixelated as a 51×51 matrix with the inner partial coherence factor ${\sigma _{in}} = 0.4$ and the outer partial coherence factor ${\sigma _{out}} = 0.95$. The polarization mode of the illumination source is set as tPol, i.e. linear polarization with tangential orientation. The numerical aperture NA is 1.35, the reduction of the projection optics is 4, and the refractive index of the immersion liquid is 1.44. We use a sigmoid function to approximate the resist effect, with the resist steepness $\alpha \textrm{ = }\textrm{85}$ and the resist threshold $tr = 0.24$.

Figure 5 shows the initial mask pattern used in the simulations, which is also the target pattern for SMO. The size of the mask pattern is 1215nm×1215nm. The mask is pixelated as a 81×81 matrix with the pixel size being 15nm×15nm, which is big enough to ensure the manufacturability of the mask. Pixels in the black region are opaque pixels with transmittance of zero, and pixels in the white region are transparent pixels with transmittance of one. Two red dashed lines intersect the boundary of the transparent region at four points T1, T2, M1 and M2. These two cut-lines are located in the relatively sparse region where breaking or shrinking of the line are more likely to emerge. The NILS at the edges is a good evidence for the validity of SMO, because the NILS reveals the slope of the aerial image on the image boundary. The NILS is defined as

(30)$$NILS = \left|{\frac{{width}}{{{I_{edge}}}} \times {{\left. {\frac{{dI}}{{dx}}} \right|}_{edge}}} \right|. $$

The width is the dimension of the intersected segment T1T2 or M1M2 in Fig. 5, and it corresponds to the critical dimension (CD) of the mask pattern. The value of CD is 45nm. ${I_{edge}}$ is the aerial image intensity at the edge, and ${\left. {\frac{{dI}}{{dx}}} \right|_{edge}}$ means the change rate of the aerial image intensity at the edge. $|x |$ returns the absolute value of x.

Fig. 5. The initial mask pattern used for SMO

Download Full Size | PDF

The first simulation is to compare the optimization performances between SMO using CMA-ES and several popular heuristic algorithms such as JADE, PSO, and GA. In this simulation, the source is represented with full pixels limited in the range of ${\sigma _{in}}$ and ${\sigma _{out}}$. Taking account of the 4-fold symmetry of the source, there are 377 source pixels in total in a quadrant. Similarly, there are 1681 mask pixels in a quadrant. The SMO methods are used to optimize the intensity for each source pixel and the transmittance for each mask pixel. SMO is run by iterative source optimization (SO) and mask optimization (MO).

Table 1 lists the parameters for different optimization algorithms for SMO in the simulation. The definitions for the parameters of JADE, PSO and GA can be found in [30], [17] and [16], respectively. The procedures of SO and MO share the parameters that are not distinguished by subscripts SO or MO. For fair comparison of the optimization performances between difference SMO methods, all of the SMO flows start with the same initial source and mask. Figure 6 demonstrates the results of SMO using different optimization algorithms with the full-pixel source representation. Four columns from left to right represent the source, the mask, the resist image (RI), and the comparison between the resist contour (RC) and the target pattern (TP), respectively. RC is the contour of the printed image (PI) defined in Eq. (6), and outlined with red solid lines. TP appears as the background pattern in dark red.

Fig. 6. The results of SMO using different optimization algorithms with the full-pixel source representation. From left to right: the source, the mask, the resist image (RI), the comparison of the resist contour (RC) and the target pattern (TP). From top to bottom: the simulation results with the initial source and mask, the simulation results after SMO using CMA-ES, JADE, PSO, and GA, respectively.

Download Full Size | PDF

Table 1. The parameters for different optimization algorithms for SMO

View Table | View all tables in this article

From visual point of view, it is intuitive that the SMO using CMA-ES obtains a more uniformly distributed resist image in the third column, and outputs a smooth contour which is more consistent with the boundary of the target pattern in the fourth column, than other SMO methods. Looking at the last column, SMO using JADE provides a very close resist contour to that obtained from CMA-ES based SMO, while the results from the other two SMO methods are quite different. Because GA is not very suitable for high-dimensional optimization problems [31] like pixelated SMO. Therefore, even with generous adjustment of the GA algorithm parameters, GA based SMO shows very poor performance, especially in mask optimization with a relatively large optimization dimension. One notable point is that the SMO using CMA-ES offers a much more compact source than other SMO methods. Comparing the four optimized sources in the first column, we find that the pixels with larger intensities are mainly located at the two separated poles in the vertical direction. Even the GA-based SMO with poor optimization performance shows a slight but similar behavior. Unlike the dispersed distribution of source generated from other SMO methods, the source produced via CMA-ES based SMO is more concentrated. The aggregation of some point sources in the same source pixels decreases the source fill ratio. It is reasonable to conclude that CMA-ES based SMO has the ability to capture the most advantageous spatial frequency components of the source for better imaging quality, and suppress the spatial frequency components degrading the imaging quality at the same time. On the contrary, the other three SMO methods fail to distinguish different spatial frequency components sufficiently according to their importance. In other words, CMA-ES based SMO has superior optimization capacity.

We further analyze the simulation results in Fig. 6 quantitatively. Table 2 lists several evaluation metrics about the simulation results corresponding to Fig. 6, including the final merit function values, pattern error (PE), pattern error ratio (PER), normalized image log slope (NILS) after SMO, the runtime of individual SO and MO stages and the whole SMO procedure, and the times for forward imaging evaluations. PE is defined as the Euclidean distance between PI and TP. Because both PI and TP are binary matrixes, PE refers to the number of pixels with errors. PER is the ratio of PE to the total number of pixels for the pixelated mask. The runtime for SMO is slightly longer than the sum of the runtime for individual SO and MO stages, owing to extra calculations for encoding, decoding and assignment operations.

Table 2. Comparison between the results of SMO using different optimization algorithms

View Table | View all tables in this article

The merit function values given by SMO using CMA-ES and JADE are rather close, much smaller than those given by SMO using PSO and GA. Similar data trends are found in PE, confirming the effectiveness of the merit function. The obtained PER with SMO using CMA-ES is less than 1%, which is a common level in other advanced SMO methods. SMO using CMA-ES improves the NILS along the cut-lines, which reveals a better imaging contrast and greater potential for a larger process window. There is no significant difference in runtime of the whole SMO procedure using different optimization algorithms. However, it takes the SMO using CMA-ES much less times of forward imaging evaluation to reach the optimal solution. The eigenvalue decomposition of the covariance matrix is a time-consuming operation in the CMA-ES algorithm, and occupies most of the runtime in SMO using CMA-ES. Nevertheless, the adaptive updates of the covariance matrix and the search step size indeed improve the ability of searching the optimal solution. Compared with the SMO using JADE that has a comparable merit function value, SMO using CMA-ES behaves better in convergence efficiency because of a shorter runtime.

To further identify the most advantageous spatial frequency components of the source, we propose a new source representation method. The source is comprised of a set of ideal point sources with unit intensity and adjustable positions. The source pixels that are more influential to improving the imaging quality tend to have more point sources falling inside and relatively larger intensities. The number of the point sources affects the relative intensity distribution of the source, as well as the convergence efficiency. The variable number for the source represented with full pixels is 377. Using the source representation of point sources, each point source corresponds to two variables as the polar coordinates. Considering the sparsity of the optimal source, it is a natural dimension reduction operation to set the number of point sources less than 188. Through the aggregation of these point sources, the vital spatial frequency components are highlighted. Figure 7 illustrates the results of SMO using CMA-ES with the source represented by three different numbers of point sources, 50, 100, and 150 respectively. From the uniformly distributed resist image and well-matched resist contour with the target pattern, all of the three cases seem to achieve an optimal combination of the source and the mask. Although the initial sources for three cases are quite different, the output optimal sources are similar to each other. They are sparser than the optimized source in Fig. 6(e) obtained with full-pixel source representation.

Fig. 7. Results of SMO using CMA-ES with the source represented by three different numbers of point sources. From left to right: the source, the mask, RI, the comparison of RC and TP. Row #1, #3, #5: the simulation results with the initial source and mask for N_pre=50, 100 and 150, respectively. Row #2, #4, #6: the simulation results with the optimized source and mask for N_pre=50, 100 and 150, respectively.

Download Full Size | PDF

To investigate the effect of the number of point sources and the source representation methods on the optimization performance, we further analyze the simulation results in Fig. 7 quantitatively. Table 3 lists the same evaluation metrics as those in Table 2 about the simulation results from Fig. 7. Besides, the evaluation metrics for SMO based on the full-pixel source representation are copied from Table 2, as a comparison of the source representation methods. Source representations with point sources lead to smaller merit function values than the full-pixel source representation. This new source representation method further reduces the pattern error and increases the NILS values. The numbers of source variables differ among the listed four cases of source representation and affect the runtime for SO significantly. With a smaller number of point sources to represent the source, the reduced dimension of the source variables leads to a relatively faster convergence but limited optimization capacity. The limited number of source points are only enough for figuring out the important spatial frequency components, but fail to produce the required relative intensity distribution across the source plane. Because the relative intensities have something to do with the number of point sources collected in different source pixels. By contrast, a larger number of point sources balances the ability of searching the advantageous spatial frequency components and the production capacity of desired relative intensity distribution. This is supported by the comparison of the PE values after SMO with ${N_{pre}}$ being 50 and 150.

Table 3. Comparison of results with CMA-ES based SMO between different source representations

View Table | View all tables in this article

It is noted that with the increment of the number of point sources, the runtime of SMO did not increase significantly. This phenomenon can be explained for three reasons. First, the incremental number of point sources increases the time consumption in SO stage, but the time increment is slight relative to the total time consumption for SMO, because the SO stage with linear operations is not so time-consuming. Second, the excellent optimization performance of CMA-ES, together with the proposed source representation method, enables similar optimized sources in three cases. The similar optimized sources affect the MO stage slightly, presented as the small difference of runtime for MO stage among three cases. Third, the time consumptions for extra calculations such as encoding, decoding and assignment operations in three cases are almost the same.

To validate the ability of finding out the key spatial frequency components with the proposed source representation method, we use a metric called cosine similarity [32] to quantify the correlation degree between any pair of two sources after SMO. The cosine similarity is defined as

(31)$${C_{1,2}} = \frac{{\sum\limits_u {{S_{1,u}}{S_{2,u}}} }}{{\sqrt {\sum\limits_u {S_{1,u}^2} \sum\limits_u {S_{2,u}^2} } }}.$$

The parameters${S_1}$ and ${S_2}$ refer to two individual sources. The subscript u points to the ${u_{th}}$ element of the source matrix. ${C_{1,2}}$ is the cosine similarity between the two sources. The closer ${C_{1,2}}$ is to one, the more similar source ${S_1}$ is to source ${S_2}$. The closer ${C_{1,2}}$ is to zero, the greater the difference between source ${S_1}$ and source ${S_2}$ is. The cosine similarity among the three optimized sources corresponding to ${N_{pre}}$ = 50, 100 and 150 are calculated and listed in Table 4. In the 3×3 symmetric matric, each entry except the ones on the diagonal has a value close to one, indicating the three optimized sources are highly correlated. In other words, the new source representation method helps to find the critical spatial frequency components of the source and improve the imaging quality.

4. Conclusion

In this paper, we develop a pixelated SMO method based on CMA-ES. The forward vector imaging formulations, the encoding and decoding of the source and the mask, and the flows of SMO using CMA-ES are studied. On the one hand, benefiting from the adaptive adjustment of the solution search space and search step size, CMA-ES outperforms the heuristic algorithms such as JADE, PSO and GA in optimization performance. On the other hand, considering the sparsity of the optimal source, a new source representation with point sources is proposed to further identify the advantageous spatial frequency components of the source for imaging performance improvement. Moreover, the CMA-ES based SMO is more applicable than gradient based SMO methods in lithographic imaging systems with complicated resist models which are not analytically differentiable. Simulation results confirm that the SMO using CMA-ES together with the new source representation improve the optimization capacity and the convergence efficiency of pixelated SMO effectively.

Table 4. The cosine similarity among the optimized sources obtained from CMA-ES based SMO using the new source representation method with different values of N_pre

View Table | View all tables in this article

Funding

National Major Science and Technology Projects of China (2017ZX02101004-002, 2017ZX02101004); Natural Science Foundation of Shanghai (17ZR1434100).

Disclosures

The authors declare no conflicts of interest.

References

1. A. K. Wong, Resolution Enhancement Techniques in Optical Lithography (SPIE, 2001).

2. A. E. Rosenbluth, D. O. Melville, K. Tian, S. Bagheri, J. Tirapu-Azpiroz, K. Lai, A. Waechter, T. Inoue, L. Ladanyi, F. Barahona, K. Scheinberg, M. Sakamoto, H. Muta, E. Gallagher, T. Faure, M. Hibbs, A. Tritchkov, and Y. Granik, “Intensive optimization of masks and sources for 22 nm lithography,” Proc. SPIE 7274, 727409 (2009). [CrossRef]

3. A. E. Rosenbluth, S. J. Bukofsky, M. S. Hibbs, K. Lai, A. F. Molless, R. N. Singh, and A. Wong, “Optimum mask and source patterns to print a given shape,” Proc. SPIE 4346, 486–502 (2001). [CrossRef]

4. X. Ma and G. R. Arce, “Pixel-based simultaneous source and mask optimization for resolution enhancement in optical lithography,” Opt. Express 17(7), 5783–5793 (2009). [CrossRef]

5. Y. Peng, J. Zhang, Y. Wang, and Z. Yu, “Gradient-based source and mask optimization in optical lithography,” IEEE Trans. on Image Process. 20(10), 2856–2864 (2011). [CrossRef]

6. X. Ma, C. Han, Y. Li, L. Dong, and G. R. Arce, “Pixelated source and mask optimization for immersion lithography,” J. Opt. Soc. Am. A 30(1), 112–123 (2013). [CrossRef]

7. N. Jia and E. Y. Lam, “Pixelated source mask optimization for process robustness in optical lithography,” Opt. Express 19(20), 19384–19398 (2011). [CrossRef]

8. J. Li and E. Y. Lam, “Robust source and mask optimization compensating for mask topography effects in computational lithography,” Opt. Express 22(8), 9471–9485 (2014). [CrossRef]

9. J. Li, S. Liu, and E. Y. Lam, “Efficient source and mask optimization with augmented Lagrangian methods in optical lithography,” Opt. Express 21(7), 8076–8090 (2013). [CrossRef]

10. F. Peng and Y. Shen, “Source and mask co-optimization based on depth learning methods,” in “2018 China Semiconductor Technology International Conference (CSTIC),” 1–3 (2018).

11. Y. Shen, “Lithographic source and mask optimization with a narrow-band level-set method,” Opt. Express 26(8), 10065–10078 (2018). [CrossRef]

12. Y. Shen, F. Peng, and Z. Zhang, “Semi-implicit level set formulation for lithographic source and mask optimization,” Opt. Express 27(21), 29659–29668 (2019). [CrossRef]

13. X. Ma, D. Shi, Z. Wang, Y. Li, and G. R. Arce, “Lithographic source optimization based on adaptive projection compressive sensing,” Opt. Express 25(6), 7131–7149 (2017). [CrossRef]

14. X. Ma, Z. Wang, Y. Li, G. R. Arce, L. Dong, and J. Garcia-Frias, “Fast optical proximity correction method based on linear compressive sensing,” Opt. Express 26(11), 14479–14498 (2018). [CrossRef]

15. T. Fühner and A. Erdmann, “Improved mask and source representations for automatic optimization of lithographic process conditions using a genetic algorithm,” Proc. SPIE 5754, 41 (2005). [CrossRef]

16. C. Yang, S. Li, and X. Wang, “Efficient source mask optimization using multipole source representation,” J. Micro/Nanolith. MEMS MOEMS 13(4), 043001 (2014). [CrossRef]

17. L. Wang, S. Li, X. Wang, and C. Yang, “Source mask projector optimization method of lithography tools based on particle swarm optimization algorithm,” Acta Opt. Sin. 37(10), 1022001 (2017). [CrossRef]

18. Y. Mao, S. Li, X. Wang, Y. Wei, and G. Chen, “Multi-parameter joint optimization for lithography based on photoresist topography model,” Acta Opt. Sin. 40(4), 0422002 (2020). [CrossRef]

19. Y. V. Miklyaev, W. Imgrunt, V. S. Pavelyev, D. G. Kachalov, T. Bizjak, L. Aschke, and V. N. Lissotschenko, “Novel continuously shaped diffractive optical elements enable high efficiency beam shaping,” Proc. SPIE 7640, 764024 (2010). [CrossRef]

20. M. Mulder, A. Engelen, O. Noordman, G. Streutker, B. van Drieenhuizen, C. van Nuenen, W. Endendijk, J. Verbeeck, W. Bouman, A. Bouma, R. Kazinczi, R. Socha, D. Jürgens, J. Zimmermann, B. Trauter, J. Bekaert, B. Laenens, D. Corliss, and G. Mclntyre, “Performance of FlexRay: a fully programmable illumination system for generation of freeform sources on high NA immersion systems,” Proc. SPIE 7640, 76401P (2010). [CrossRef]

21. P. Gao, A. Gu, and A. Zakhor, “Optical proximity correction with principal component regression,” Proc. SPIE 6924, 69243N (2008). [CrossRef]

22. X. Wu, S. Liu, J. Li, and E. Y. Lam, “Efficient source mask optimization with Zernike polynomial functions for source representation,” Opt. Express 22(4), 3924–3937 (2014). [CrossRef]

23. A. E. Rosenbluth and N. Seong, “Global optimization of the illumination distribution to maximize integrated process window,” Proc. SPIE 6154, 61540H (2006). [CrossRef]

24. N. Hansen, “The CMA evolution strategy: a tutorial,” Tech. Rep., INRIA, arXiv: 1604.00772, (2016).

25. A. K.-K. Wong, Optical Imaging in Projection Microlithography (SPIE, 2005).

26. J. C. Yu, P. Yu, and H. Y. Chao, “Fast source optimization involving quadratic line-contour objectives for the resist image,” Opt. Express 20(7), 8161–8174 (2012). [CrossRef]

27. Y. Peng, J. Zhang, Y. Wang, and Z. Yu, “High performance source optimization using a gradient-based method in optical lithography,” Proc. ISQED, 108–113 (2010).

28. S. Hsu, Z. Li, L. Chen, K. Gronlund, H. Liu, and R. Socha, “Source-mask co-optimization: optimize design for imaging and impact of source complexity on lithography performance,” Proc. SPIE 7520, 75200D (2009). [CrossRef]

29. Y. Shen, F. Peng, X. Huang, and Z. Zhang, “Adaptive gradient-based source and mask co-optimization with process awareness,” Chin. Opt. Lett. 17(12), 121102 (2019). [CrossRef]

30. J. Zhang and A. C. Sanderson, “JADE: adaptive differential evolution with optional external archive,” IEEE Trans. Evol. Comput. 13(5), 945–958 (2009). [CrossRef]

31. R. H. Abiyev and M. Tunay, “Optimization of high-dimensional functions through hypercube evaluation,” Comput. Intell. Neurosci. 2015, 1–11 (2015). [CrossRef]

32. K. Tian, A. Krasnoperova, D. Melville, A. E. Rosenbluth, D. Gil, J. Tirapu-Azpiroz, K. Lai, S. Bagheri, C. Chen, and B. Morgenfeld, “Benefits and trade-offs of global source optimization in optical lithography,” Proc. SPIE 7274, 72740C (2009). [CrossRef]

Algorithms	Values of the parameters
CMA-ES	$l o o p = 1$ , $σ_{S O}^{(0)} = 0.2$ , $σ_{M O}^{(0)} = 0.05$ , $n L i m i t = 200$ , $m i n F = 10$ , $m a x E v a l = 2000$ , $m a x C o n d = 10^{7}$
JADE	$l o o p = 5$ , $N P = 40$ , $m a x G = 250$ , $μ_{C R} = 0.5$ , $μ_{F} = 0.5$ , $p_{0} = 0.05$ , $c = 0.1$
PSO	$l o o p = 5$ , $N P = 40$ , $m a x G = 500$ , $m a x V_{S O} = π / 2$ , $m a x V_{M O} = 1$ , $m i n W = 0.4$ , $m a x W = 0.9$
GA	$l o o p = 5$ , $N P = 40$ , $m a x G = 500$ , $p_{C} = 0.8$ , $p_{M} = 0.1$

Algorithms		CMA-ES	JADE	PSO	GA
Merit function values		22.487	28.640	208.118	514.650
Pattern error (PE)		26	28	240	667
Pattern error ratio (PER)		0.396%	0.427%	3.658%	10.166%
Normalized image log slope (NILS)	M1	1.167	0.811	1.074	0.924
	M2	1.096	0.776	1.013	0.879
	T1	1.038	0.722	0.745	0.943
	T2	1.111	0.983	1.166	0.883
Runtime (unit: s)	SO	82.179	277.409	295.962	268.604
	MO	404.594	201.719	195.761	168.484
	SMO	497.532	511.517	523.960	468.421
Times of forward imaging evaluation	SO	24213	120000	130631	123240
Times of forward imaging evaluation	MO	20020	100000	102735	102700

Source representations		full pixels	N_pre
Source representations		full pixels	50	100	150
Merit function values		22.487	23.419	18.897	17.816
Pattern error (PE)		26	20	20	12
Pattern error ratio (PER)		0.396%	0.305%	0.305%	0.183%
Normalized image log slope (NILS)	M1	1.167	1.264	1.216	1.409
	M2	1.096	1.181	1.139	1.307
	T1	1.038	1.439	1.089	1.428
	T2	1.111	1.300	1.100	1.261
Runtime (unit: s)	SO	82.179	38.339	54.820	59.706
	MO	404.594	391.351	399.163	410.648
	SMO	497.532	439.632	463.954	480.446
Times of forward imaging evaluation	SO	24213	17170	23807	24213
Times of forward imaging evaluation	MO	20020	20020	20020	20020

	$N_{p r e} = 50$	$N_{p r e} = 100$	$N_{p r e} = 150$
$N_{p r e} = 50$	1	0.9589	0.9663
$N_{p r e} = 100$	0.9589	1	0.9599
$N_{p r e} = 150$	0.9663	0.9599	1

Algorithms	Values of the parameters
CMA-ES	$l o o p = 1$ , $σ_{S O}^{(0)} = 0.2$ , $σ_{M O}^{(0)} = 0.05$ , $n L i m i t = 200$ , $m i n F = 10$ , $m a x E v a l = 2000$ , $m a x C o n d = 10^{7}$
JADE	$l o o p = 5$ , $N P = 40$ , $m a x G = 250$ , $μ_{C R} = 0.5$ , $μ_{F} = 0.5$ , $p_{0} = 0.05$ , $c = 0.1$
PSO	$l o o p = 5$ , $N P = 40$ , $m a x G = 500$ , $m a x V_{S O} = π / 2$ , $m a x V_{M O} = 1$ , $m i n W = 0.4$ , $m a x W = 0.9$
GA	$l o o p = 5$ , $N P = 40$ , $m a x G = 500$ , $p_{C} = 0.8$ , $p_{M} = 0.1$

Source mask optimization using the covariance matrix adaptation evolution strategy

Abstract

1. Introduction

2. Methodology

2.1 Forward vector imaging formulation

2.2 Encoding and decoding of the source and mask

2.3 Flow of the CMA-ES based SMO

3. Simulations and analysis

4. Conclusion

Funding

Disclosures

References

Cited By

Figures (7)

Tables (4)

Equations (31)

Optics Express