Information theoretical approaches in computational lithography

Zhiqiang Wang; Xu Ma; Gonzalo R. Arce; Javier Garcia-Frias

doi:10.1364/OE.26.016736

1. Introduction

To date, computational lithography has been extensively used in the semiconductor industry to improve the imaging performance of optical lithography systems [1,2]. Computational lithography is a set of mathematical and algorithmic approaches that improve the lithography resolution and image fidelity by individually or jointly optimizing the lithography tools, mask and process parameters based on imaging models and process models. Figure 1(a) illustrates a typical optical lithography system. In practical optical lithography systems, partially coherent light sources are used to illuminate the mask on which the circuit layout is carved. The circuit layout is then transferred from the mask onto the wafer by optical projection and photoresist development. As lithography technology enters the sub-wavelength realm, the optical proximity effect severely influences the imaging quality of lithography systems. An example of the optical proximity effect is the distortion of mask patterns, where the aerial images for dense and isolated lines are different. Optical proximity correction (OPC) is thus one of the most important computational lithography techniques used to compensate for image distortion by optimizing the mask patterns [3–6].

Fig. 1 (a) The sketch of an optical lithography system and (b) the pixelated OPC method.

In the last several years, a number of pixelated OPC approaches have been developed to obtain the optimized mask patterns by solving the inverse lithography problem [7–19]. Different from traditional rule-based and edge-based OPC methods, pixelated OPC co-optimizes the transmissivities of all mask pixels. Therefore, pixelated OPC methods dramatically increase the degrees of optimization freedom and achieve higher resolution and image fidelity than rule-based or edge-based OPC methods [2]. It is natural to wonder how information is transferred in lithography systems, and what the theoretical limit of image fidelity achieved by pixelated OPC methods is. However, most current research on computational lithography focuses on developing numerical approaches to optimize the lithography imaging performance, whereas the underlying information transmission mechanism in computational lithography is not yet well understood. Recently, Rieger discussed the analogy between lithography techniques and communication theory [20]. Ma et al. first established an approximate information channel model for coherent optical lithography systems [21], deriving the maximum information transfer and the theoretical limit of image fidelity for coherent lithography systems. However, most practical lithography systems are partially coherent, where the partially coherent illumination consists of a number of coherent source points [22–24]. Thus, the information channel model proposed in [21] is inadequate to characterize partially coherent lithography systems. In addition, the information channel model in [21] does not consider the correlation between different pixels on the print image, although neighboring pixels on the print image are indeed correlated with each other due to the optical proximity effect. Thus, more rigorous models are desired to accurately depict the information transmission in partially coherent lithography systems.

To the best of our knowledge, this paper is the first to study information theoretical approaches for computational lithography in partially coherent lithography systems. It first develops the information channel model based on the statistical relationship between the mask and print images. As shown in Fig. 1(a) and 1(b), the partially coherent lithography system is regarded as an information channel that transfers the layout pattern from mask to wafer. The mask and print images are the input and output signals of the information channel, respectively. A statistical method is used to calculate the probability transfer matrix between a batch of pixels on the mask and print images. Then, we derive the mutual information between the mask and print images.

Another contribution of this paper is to study and analyze the theoretical limit of image fidelity that can be achieved by pixelated OPC in partially coherent lithography systems. The pixelated OPC method encodes the input mask to increase the information transfer accuracy in the channel. The image fidelity is evaluated by the pattern error (PE), which is defined as the square of the Euclidean distance between the actual print image and the target layout [6,25]. The theoretical limit of image fidelity is formulated as a function of the mutual information. Then, a numerical optimization algorithm is applied to solve for the optimal information transfer (OIT), which leads to the best image fidelity. Finally, an application of the proposed information theoretical approaches is discussed. In this application, the optimal probability distribution of the mask pattern, obtained from the information theoretical model, is used to improve the OPC solutions provided by current gradient-based algorithms. The proposed approach is assessed by a set of simulations based on different layout patterns.

The remainder of this paper is organized as follows. Section 2 establishes the information channel model and derives the mutual information between the mask and print images. Section 3 discusses the relationship between mutual information and image fidelity in partially coherent lithography systems. Section 4 proposes an optimization method to solve for the OIT and to obtain the theoretical limit of image fidelity for pixelated OPC techniques. Section 5 discusses the application of the proposed information theoretical approaches. Conclusions are provided in Section 6.

2. Information channel model for partially coherent lithography systems

In this section, we first establish the information channel model for partially coherent lithography systems, and then derive the mutual information between the mask pattern and print images. The aerial image of partially coherent lithography systems can be separated into several images generated by coherent systems. As shown in Fig. 2, the aerial image of partially coherent lithography systems is formulated as

I = \sum_{m} Γ_{m} {| h^{m} (r) \otimes M (r) |}^{2},

where M(r) is the mask pattern, r = (x, y) is the spatial coordinate, Γ_m is the coefficient of the Fourier series expansion of the complex degree of coherence γ(r), and ⊗ is the notation of convolution operation. Assuming that the mask pattern is constrained within a square region defined by x, y ∈ [−D/2, D/2], we can calculate Γ_m as

Γ_{m} = \frac{1}{D^{2}} \int_{A_{γ}} γ (r) exp (j ω_{0} m \cdot r) d r,

where A_γ is a square region with x, y ∈ [−D, D], ω₀ = π/D, m = (m_x, m_y), m_x and m_y are integers, and · is the inner-product. In Eq. (1), h^m is the point spread function of the coherent component corresponding to Γ_m and is given by

h^{m} (r) = h (r) \cdot exp (j ω_{0} m \cdot r) .

In Eq. (3), h(r) is the convolution kernel defined as [26,27]

h (r) = \frac{J_{1} (2 π r NA / λ)}{2 π r NA / λ},

where NA is the numerical aperture, λ is the wavelength of illumination, and J₁(·) is the Bessel function of the first kind. In addition, the effect of the photoresist is modeled by a hard threshold function [6]. The print image of partially coherent lithography systems can be expressed as

Z = Λ \{I - t_{r}\},

where Λ{x} = 1 if x > 0, otherwise Λ{x} = 0, t_r is the threshold of photoresist, and I is defined in Eq. (1).
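The structure of Eqs. (1), (3) and (5) can be illustrated with a small one-dimensional Python sketch: each coherent component convolves the mask with its own modulated kernel, the squared magnitudes are summed with weights Γ_m, and the result is thresholded. The kernel taps in `base`, the weights in `gammas`, the value of `omega0`, and the threshold are illustrative assumptions rather than values from this paper, and the actual model is two-dimensional with the kernel of Eq. (4).

```python
import cmath

def convolve_same(signal, kernel):
    """Zero-padded 'same'-size discrete convolution (odd kernel length)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0j
        for k, h in enumerate(kernel):
            j = i - (k - half)
            if 0 <= j < len(signal):
                acc += h * signal[j]
        out.append(acc)
    return out

def modulated_kernel(base, omega0, m):
    """Eq. (3) in 1D: h^m(r) = h(r) * exp(j * omega0 * m * r), r centered."""
    half = len(base) // 2
    return [b * cmath.exp(1j * omega0 * m * (k - half)) for k, b in enumerate(base)]

def aerial_image(mask, kernels, gammas):
    """Eq. (1) in 1D: I = sum_m Gamma_m * |h^m (*) M|^2."""
    intensity = [0.0] * len(mask)
    for h_m, gamma_m in zip(kernels, gammas):
        field = convolve_same(mask, h_m)
        for i, e in enumerate(field):
            intensity[i] += gamma_m * abs(e) ** 2
    return intensity

def print_image(intensity, t_r):
    """Eq. (5): hard-threshold resist model, Z = 1 where I > t_r."""
    return [1 if v > t_r else 0 for v in intensity]

# Hypothetical kernel taps and coherence coefficients (not from the paper).
base = [0.1, 0.2, 0.4, 0.2, 0.1]
kernels = [modulated_kernel(base, 0.3, m) for m in (-1, 0, 1)]
gammas = [0.25, 0.5, 0.25]
mask = [0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0]
I = aerial_image(mask, kernels, gammas)
Z = print_image(I, t_r=0.5)
```

With a single one-tap kernel `[1.0]` and weight 1, the sketch reduces to an ideal system in which the print image equals the binary mask.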

Fig. 2 The imaging model and information channel model of partially coherent lithography systems. The lithography system is regarded as an information channel, and the mask and print image are its input and output signals, respectively.

In this paper, the partially coherent lithography system is regarded as an information channel, and the mask and print images are its input and output signals, respectively. According to Eq. (1), the optical proximity effect is described by the convolution between the point spread function h^m (r) and the mask M(r). This means that the imaging of one pixel on the mask will be influenced by its surrounding pixels within the region covered by h^m (r). Assume the area covered by h^m (r) is a circle denoted as C_p. Each pixel at coordinate (x, y) on the print image is affected by a set of mask pixels within C_p around the coordinate (x, y). Therefore, the neighboring pixels on the print image are correlated to each other. Accordingly, the information on the mask is transferred to the print image by a batch of pixels together, rather than by independent pixels. Next, we will build up the statistical relationship between a batch of pixels on the mask and print images to take into account the correlation between neighboring image pixels. In Fig. 2, assume region C_p includes K pixels. Let the vector x⃗ = (x₁, x₂, . . ., x_K)^T represent the K mask pixels covered by C_p, where x_i is the value of the ith pixel. For the binary mask, x_i = 0 or 1. The vector y⃗ = (y₁, y₂, . . ., y_K)^T represents the K pixels on the print image corresponding to x⃗, and y_i = 0 or 1. Let N_x and N_y be the number of one-valued pixels in x⃗ and y⃗, respectively. Thus,

N_{x} = \sum_{i = 1}^{K} x_{i}, N_{y} = \sum_{i = 1}^{K} y_{i} .

Assume p_m is the probability that m pixels in x⃗ have value of 1, and q_n is the probability that n pixels in y⃗ have value of 1, i.e.,

p_{m} = P_{r} \{N_{x} = m\}, \quad q_{n} = P_{r} \{N_{y} = n\},

where m, n = 0, 1, . . ., K, and P_r {·} represents the probability of the argument. Define the vectors of probability masses as p⃗ = (p₀, p₁, . . ., p_K)^T and q⃗ = (q₀, q₁, . . ., q_K)^T. Suppose T ∈ ℝ^{(K+1)×(K+1)} is the probability transfer matrix between p⃗ and q⃗, i.e.,

\vec{q} = T \cdot \vec{p} .

In the above equation, the element of T in the (n + 1)th row and the (m + 1)th column is defined as T(n + 1, m + 1) = Pr{N_y = n|N_x = m}, which indicates the probability of N_y = n given N_x = m, where m, n = 0, 1, . . ., K.

In this paper, we use a statistical method to calculate the matrix T based on the layout patterns as shown in Fig. 3. The top and bottom rows in Fig. 3 show the mask patterns and the corresponding print images, respectively. Figures 3(a) and 3(b) illustrate the target of “layout 1” and its pixelated OPC solution, respectively. Figures 3(e) and 3(f) illustrate the target of “layout 2” and its pixelated OPC solution. Here, we use the gradient-based OPC algorithm in [6] to optimize the masks. We go through all of the sub-regions covered by the circle C_p on the mask patterns and their corresponding print images in Fig. 3. Then, we count the frequencies for the events {N_x = m and N_y = n}, where m, n = 0, 1, . . ., K. Let #{N_x = m; N_y = n} be the number of occurrences of the event {N_x = m and N_y = n}. We calculate the probability P_r {N_y = n|N_x = m} as

P_{r} \{N_{y} = n | N_{x} = m\} = \frac{\# \{N_{x} = m; N_{y} = n\}}{\sum_{n' = 0}^{K} \# \{N_{x} = m; N_{y} = n'\}} .

Note that P_r {N_y = n|N_x = m} is the element of T in the (n + 1)th row and the (m + 1)th column.
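The counting procedure of Eq. (9) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the footprint C_p is approximated by a pixelated disk of a given `radius`, the window is moved one pixel at a time over all positions where it fits, and columns for values of N_x that never occur are left at zero. None of these choices is specified in the text, and the function names are hypothetical. The code uses zero-based indices, so `T[n][m]` corresponds to the element in the (n + 1)th row and (m + 1)th column.

```python
def disk_offsets(radius):
    """Pixel offsets (dx, dy) inside a disk; this footprint stands in for C_p."""
    r = int(radius)
    return [(dx, dy) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            if dx * dx + dy * dy <= radius * radius]

def estimate_transfer_matrix(pairs, radius):
    """Estimate T[n][m] = Pr{N_y = n | N_x = m} by counting, as in Eq. (9).

    pairs: list of (mask, print_image) tuples; each image is a 2D list of
    0/1 values, and both images of a pair have the same shape.
    """
    offsets = disk_offsets(radius)
    K = len(offsets)
    counts = [[0] * (K + 1) for _ in range(K + 1)]  # counts[n][m]
    r = int(radius)
    for mask, printed in pairs:
        rows, cols = len(mask), len(mask[0])
        # Visit every window position where the footprint fits entirely.
        for cy in range(r, rows - r):
            for cx in range(r, cols - r):
                n_x = sum(mask[cy + dy][cx + dx] for dx, dy in offsets)
                n_y = sum(printed[cy + dy][cx + dx] for dx, dy in offsets)
                counts[n_y][n_x] += 1
    T = [[0.0] * (K + 1) for _ in range(K + 1)]
    for m in range(K + 1):
        col_total = sum(counts[n][m] for n in range(K + 1))
        if col_total == 0:
            continue  # N_x = m was never observed; column stays zero
        for n in range(K + 1):
            T[n][m] = counts[n][m] / col_total
    return T
```

Each observed column of the estimate sums to one, which is what the normalization in the denominator of Eq. (9) enforces.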

Fig. 3 The masks and print images used to calculate the probability transfer matrix T.

Now, let us consider the specific case in which n pixels in y⃗ have value of 1. Note that y⃗ represents the K pixels on the wafer covered by the circle C_p. There will be $\binom{K}{n}$ distribution patterns of the n one-valued pixels in C_p, where $\binom{K}{n}$ represents the number of n-combinations from a set of K elements (K choose n). Furthermore, the K grouped pixels in C_p could be located in any region of the print image. Assuming that the print image can have arbitrary geometry, on average, all of the $\binom{K}{n}$ distributions have the same probability. In particular, each distribution pattern occurs with probability $P_{r} \{N_{y} = n\} / \binom{K}{n} = q_{n} / \binom{K}{n}$. If we know that the number of one-valued pixels in y⃗ is n, then the conditional probability of each distribution pattern is $1 / \binom{K}{n}$. It is important to remark that this simple uniform distribution model approximately describes the statistical behavior of y⃗ on average, leading to some useful results that will be described in the sequel. Based on the analysis above, we can calculate the entropy of y⃗ as

\begin{aligned} E_{n} (\vec{y}) &= - \sum_{n = 0}^{K} \left[ \frac{P_{r} \{N_{y} = n\}}{\binom{K}{n}} \right] \cdot \log_{2} \left[ \frac{P_{r} \{N_{y} = n\}}{\binom{K}{n}} \right] \cdot \binom{K}{n} \\ &= - \sum_{n = 0}^{K} \sum_{m = 0}^{K} P_{r} \{N_{y} = n, N_{x} = m\} \cdot \left[ \log_{2} P_{r} \{N_{y} = n\} - \log_{2} \binom{K}{n} \right] \\ &= - \sum_{n = 0}^{K} \sum_{m = 0}^{K} P_{r} \{N_{y} = n, N_{x} = m\} \cdot \log_{2} P_{r} \{N_{y} = n\} + \sum_{n = 0}^{K} \sum_{m = 0}^{K} P_{r} \{N_{y} = n, N_{x} = m\} \cdot \log_{2} \binom{K}{n} . \end{aligned}

The conditional entropy of y⃗ given x⃗ is formulated as follows

\begin{aligned} E_{n} (\vec{y} | \vec{x}) &= - \sum_{m = 0}^{K} P_{r} \{N_{x} = m\} \cdot \left[ \sum_{n = 0}^{K} \frac{P_{r} \{N_{y} = n | N_{x} = m\}}{\binom{K}{n}} \cdot \log_{2} \frac{P_{r} \{N_{y} = n | N_{x} = m\}}{\binom{K}{n}} \cdot \binom{K}{n} \right] \\ &= - \sum_{n = 0}^{K} \sum_{m = 0}^{K} P_{r} \{N_{y} = n, N_{x} = m\} \cdot \log_{2} \frac{P_{r} \{N_{y} = n | N_{x} = m\}}{\binom{K}{n}} \\ &= - \sum_{n = 0}^{K} \sum_{m = 0}^{K} P_{r} \{N_{y} = n, N_{x} = m\} \cdot \log_{2} P_{r} \{N_{y} = n | N_{x} = m\} + \sum_{n = 0}^{K} \sum_{m = 0}^{K} P_{r} \{N_{y} = n, N_{x} = m\} \cdot \log_{2} \binom{K}{n} . \end{aligned}

Thus, the mutual information between x⃗ and y⃗ can be calculated as

\begin{aligned} I (\vec{x}; \vec{y}) &= E_{n} (\vec{y}) - E_{n} (\vec{y} | \vec{x}) \\ &= - \sum_{n = 0}^{K} \sum_{m = 0}^{K} P_{r} \{N_{y} = n, N_{x} = m\} \cdot \left( \log_{2} P_{r} \{N_{y} = n\} - \log_{2} P_{r} \{N_{y} = n | N_{x} = m\} \right) \\ &= - \sum_{n = 0}^{K} \sum_{m = 0}^{K} P_{r} \{N_{y} = n | N_{x} = m\} \cdot P_{r} \{N_{x} = m\} \cdot \left( \log_{2} q_{n} - \log_{2} P_{r} \{N_{y} = n | N_{x} = m\} \right) \\ &= - \sum_{n = 0}^{K} \sum_{m = 0}^{K} T_{n m} \cdot p_{m} \cdot \left[ \log_{2} \left( \sum_{u = 0}^{K} T_{n u} \cdot p_{u} \right) - \log_{2} T_{n m} \right], \end{aligned}

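where p_m and p_u are the elements of p⃗ defined in Eq. (7), and T_nm = P_r {N_y = n|N_x = m} and T_nu = P_r {N_y = n|N_x = u} are the elements of T in the (n + 1)th row and in the (m + 1)th and (u + 1)th columns, respectively.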
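The last line of Eq. (12) is straightforward to evaluate numerically. The following Python sketch does so for a transfer matrix stored with zero-based indices, `T[n][m]` = P_r {N_y = n|N_x = m}; the function name and the convention that terms with zero joint probability contribute nothing are assumptions made for the illustration.

```python
import math

def mutual_information(T, p):
    """I(x; y) in bits from T[n][m] = Pr{N_y=n | N_x=m} and p[m] = Pr{N_x=m}."""
    size = len(p)
    # Output distribution q = T p, as in Eq. (8).
    q = [sum(T[n][u] * p[u] for u in range(size)) for n in range(size)]
    total = 0.0
    for n in range(size):
        for m in range(size):
            joint = T[n][m] * p[m]
            if joint > 0.0:  # convention: 0 * log 0 = 0
                # -(log2 q_n - log2 T_nm) = log2(T_nm / q_n)
                total += joint * math.log2(T[n][m] / q[n])
    return total

# A noiseless two-level channel with equiprobable inputs carries 1 bit.
identity = [[1.0, 0.0], [0.0, 1.0]]
print(mutual_information(identity, [0.5, 0.5]))  # 1.0
```

Because the binomial terms of Eqs. (10) and (11) cancel, only T and p⃗ are needed here.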

Note that Eq. (1) is a scalar imaging model, which takes into account only the amplitude of the electromagnetic field, but ignores its vector nature. For immersion lithography with hyper-NA (NA > 1), the vector imaging model is more accurate than the scalar imaging model. According to Abbe’s method, the vector imaging model of a partially coherent lithography system is

I = \frac{1}{J_{sum}} \sum_{x_{s}} \sum_{y_{s}} (J (x_{s}, y_{s}) \sum_{p = x, y, z} {| h_{p}^{x_{s} y_{s}} \otimes (B^{x_{s} y_{s}} ⊙ M) |}^{2}),

where (x_s, y_s) is the coordinate on the source plane, ⊙ is the notation of element-by-element multiplication, M is the mask pattern, $h_{p}^{x_{s} y_{s}}$ represents the point spread function along the p-axis (p = x, y, z) of the lithography system corresponding to the source point (x_s, y_s), $B^{x_{s} y_{s}}$ represents the oblique incidence effect of the light rays on the mask, J(x_s, y_s) is the intensity of the source point at (x_s, y_s), and $J_{sum} = \sum_{x_{s}} \sum_{y_{s}} J (x_{s}, y_{s})$ is the normalization factor. Note that $h_{p}^{x_{s} y_{s}}$ in Eq. (13) corresponds to the point spread function h^m (r) in Eq. (1). In this way, the aforementioned method to establish the information channel model can be straightforwardly applied to the vector imaging model by calculating the probability transfer matrix T and the mutual information between x⃗ and y⃗ according to the statistical methods mentioned above.

3. Relationship between mutual information and image fidelity

In this section, we discuss the relationship between the mutual information in Eq. (12) and the image fidelity of partially coherent lithography systems. According to information theory, the mutual information indicates the rate at which information can be reliably transmitted over the information channel. In the partially coherent lithography system, the information of the layout pattern is transferred from mask to wafer pixel by pixel. For the binary mask, each pixel on the mask carries 1 bit of information. However, the information on the mask cannot be completely transferred due to the band-limitation of lithography systems. Equation (12) provides the mutual information between K pixels on the mask and print images. The average mutual information per pixel is I(x⃗; y⃗)/K. Thus, the partially coherent lithography system can only transmit I(x⃗; y⃗)/K bits of information over one pixel without error, where 0 ≤ I(x⃗; y⃗)/K ≤ 1. That means we have to use at least K/I(x⃗; y⃗) pixels to transfer 1 bit of information. On the print image, the adjacent K/I(x⃗; y⃗) pixels within a square area are grouped together to form a macro pixel. The macro pixel is the smallest unit to transmit 1 bit of information through the lithography system, and thus all of the micro pixels in one macro pixel have the same value. Figures 4(a) and 4(b) provide examples of single pixels and macro pixels, respectively. Assume the side length of a single pixel is a; then the side length of a macro pixel is given by

a^{'} = a \cdot \sqrt{\frac{K}{I (\vec{x}; \vec{y})}} .

Figure 4(c) shows two adjacent macro pixels. The minimum distance between the two adjacent macro pixels is Δd = a, which is the side length of a single pixel. This is because adjacent pixels on the mask can be independent of each other. Each mask pixel will be mapped to a macro pixel on the print image.
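As a quick numerical illustration of Eq. (14), the short Python sketch below computes the macro pixel side length. The function name and the example numbers are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math

def macro_pixel_side(a, K, mutual_info):
    """Eq. (14): a' = a * sqrt(K / I(x; y)), with I in bits per K-pixel batch."""
    if not 0.0 < mutual_info <= K:
        raise ValueError("mutual information must lie in (0, K]")
    return a * math.sqrt(K / mutual_info)

# Hypothetical example: 16 pixels carry 4 bits, so one bit needs 4 pixels,
# i.e. a macro pixel of 2 x 2 single pixels.
print(macro_pixel_side(a=1.0, K=16, mutual_info=4.0))  # 2.0
```

When the mutual information reaches its upper bound of K bits, the macro pixel shrinks to a single pixel.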

Fig. 4 The relationship between mutual information and image fidelity. In the top row, (a), (b) and (c) show the single pixel, macro pixel and two adjacent macro pixels, respectively. In the bottom row, the target layout can be (d) perfectly covered, (e) incompletely covered or (f) overcompletely covered by the macro pixels.

Figures 4(d)–4(f) show the method to evaluate the image fidelity based on mutual information. In this paper, pattern error (PE) is used as the metric to measure the image fidelity of lithography systems. The PE is defined as the square of the Euclidean distance between the actual print image and the target layout [6, 25]. Edge placement error (EPE) is another extensively used metric for lithography image fidelity. The EPE indicates the edge position error of the actual print image with respect to the desired pattern. It has been noted that the PE is approximately proportional to the average EPE for a given print image [25]. In Figs. 4(d)–4(f), the shadow regions illustrate examples of target layouts. As the output of the information channel, the print image consists of a set of macro pixels. Different macro pixels may overlap each other. Thus, we try to use the overlapped macro pixels to cover the underlying target layout. The macro pixels are represented by the squares contained in the red-dotted lines. The difference between the coverage and the target layout indicates the PE of the print image. The pattern error occurs when the target layout is uncovered (incomplete coverage) or the non-pattern area is extra-covered (overcomplete coverage).
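The PE metric defined above can be written in a few lines of Python. The sketch assumes both images are given as equally sized two-dimensional lists; for binary images the result is simply the number of pixels in which the print image and the target differ.

```python
def pattern_error(printed, target):
    """Square of the Euclidean distance between print image and target layout."""
    if len(printed) != len(target) or any(
            len(rz) != len(rt) for rz, rt in zip(printed, target)):
        raise ValueError("images must have the same shape")
    return sum((z - t) ** 2
               for row_z, row_t in zip(printed, target)
               for z, t in zip(row_z, row_t))

target = [[0, 1, 1, 0],
          [0, 1, 1, 0]]
printed = [[0, 1, 0, 0],   # one target pixel left uncovered
           [0, 1, 1, 1]]   # one non-pattern pixel covered
print(pattern_error(printed, target))  # 2
```

The two mismatches in the example correspond to one pixel of incomplete coverage and one pixel of overcomplete coverage.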

Consider first the special case where the side length of the macro pixel, a′, is an integral multiple of the side length of the single pixel a. Now, let us consider two subcases. The first case is a′ ≤ CD, where CD is the critical dimension, and a′ is given in Eq. (14). The critical dimension is defined as the minimum feature size on the layout. Please note that CD is a multiple of a, since the single pixels are the minimum units to compose the layout pattern. As shown in Fig. 4(d), the macro pixels can perfectly cover the target layout without offset. According to Eq. (14), we have

\sqrt{\frac{K}{I (\vec{x}; \vec{y})}} = Π,

where Π is a positive integer. In this case, the minimum PE is 0. The other case is a′ > CD. In order to simplify the analysis, we assume that the target layout consists of lines with width equal to CD. Then, the PE resulting from overcomplete coverage is approximately (a′ − CD) · L_t/(2 · a²), where L_t is the perimeter of the target layout, and a is the side length of the single pixel. On the other hand, the PE resulting from incomplete coverage is equal to the area of the target layout divided by the area of a single pixel, which is denoted by A_t/a². In summary, the minimum PE of the print image can be calculated as follows

{PE}_{\min} = \begin{cases} \min \{(a^{'} - CD) \cdot L_{t} / (2 \cdot a^{2}), A_{t} / a^{2}\} & \text{if } a^{'} > CD \\ 0 & \text{if } a^{'} \leq CD \end{cases},

where min {·, ·} is to select the minimum from the arguments.

Consider next the most general case where the side length of the macro pixel is not an integral multiple of the side length of the single pixel. As shown in Figs. 4(e) and 4(f), the target layout cannot be perfectly covered by the macro pixels, because of the inconsistency between the side lengths of a single pixel and a macro pixel. In order to obtain the minimum PE of the print image, we need to choose one of two options, either incomplete coverage as shown in Fig. 4(e) or overcomplete coverage as shown in Fig. 4(f). Now, let us consider three subcases. The first case is a′ > CD. Based on the analysis mentioned above, the PE resulting from overcomplete coverage is approximately (a′ − CD) · L_t/(2 · a²), while the PE resulting from incomplete coverage is equal to A_t/a². The second case is a′ ≤ CD and (a′ mod a) ≥ a/2, where “mod” denotes the modulo operation, i.e., the remainder of a′ divided by a. In this case, the incomplete coverage will lead to smaller PE than the overcomplete coverage. The minimum PE is [a − (a′ mod a)] · L_t/(2 · a²). The third case is a′ ≤ CD and (a′ mod a) < a/2. In this case, the overcomplete coverage will lead to smaller PE than the incomplete coverage. The minimum PE is (a′ mod a) · L_t/(2 · a²). In summary, the minimum PE of the print image can be calculated as follows

{PE}_{\min} = \begin{cases} \min \{(a^{'} - CD) \cdot L_{t} / (2 \cdot a^{2}), A_{t} / a^{2}\} & \text{if } a^{'} > CD \\ [a - (a^{'} \bmod a)] \cdot L_{t} / (2 \cdot a^{2}) & \text{if } a^{'} \leq CD \text{ and } (a^{'} \bmod a) \geq a / 2 \\ (a^{'} \bmod a) \cdot L_{t} / (2 \cdot a^{2}) & \text{if } a^{'} \leq CD \text{ and } (a^{'} \bmod a) < a / 2 \end{cases} .
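The piecewise bound of Eq. (17) can be transcribed directly into Python, as in the sketch below. The argument names are hypothetical; `perimeter` and `area` stand for L_t and A_t and must be expressed in the same length unit as `a`, `a_prime` and `cd`. When a′ is an exact multiple of a, the remainder is zero and the function returns 0, which reproduces the second branch of Eq. (16).

```python
import math

def pe_min(a_prime, a, cd, perimeter, area):
    """Minimum pattern error according to Eq. (17)."""
    if a_prime > cd:
        over = (a_prime - cd) * perimeter / (2 * a * a)  # overcomplete coverage
        under = area / (a * a)                           # incomplete coverage
        return min(over, under)
    rem = math.fmod(a_prime, a)                          # a' mod a
    if rem >= a / 2:
        return (a - rem) * perimeter / (2 * a * a)
    return rem * perimeter / (2 * a * a)

# Hypothetical layout: a = 1, CD = 3, L_t = 100, A_t = 50 (pixel units).
print(pe_min(2.0, 1.0, 3.0, 100.0, 50.0))   # 0.0  (a' is a multiple of a)
print(pe_min(2.25, 1.0, 3.0, 100.0, 50.0))  # 12.5 (a' mod a < a/2)
print(pe_min(3.5, 1.0, 3.0, 100.0, 50.0))   # 25.0 (a' > CD)
```

The example values show how the bound grows as a′ moves away from an integral multiple of a or exceeds CD.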

Based on the analysis above, in order to find out the theoretical limitation of image fidelity, we should try to make a′ approximate an integral multiple of a. At the same time, a′ should not be much larger than CD. According to Eq. (15), the mutual information should satisfy the following equation to reduce the minimum PE of the print image:

I (\vec{x}; \vec{y}) \approx \frac{K}{Π^{2}},

where Π is any integer smaller than or equal to CD/a.

4. Optimal information transfer and image fidelity limit

In this section, we will solve for the optimal information transfer (OIT), and then calculate the theoretical limit of image fidelity that can be achieved by pixelated OPC. The OIT is defined as the optimal value of the mutual information that results in the minimum PE of the print image. As discussed in Section 3, the OIT should satisfy Eq. (18). According to Eqs. (12) and (18), in order to calculate the OIT, we will find out the optimal distribution of p⃗ to minimize the following cost function:

F (\vec{p}) = \left\{ \sum_{n = 0}^{K} \sum_{m = 0}^{K} T_{n m} \cdot p_{m} \cdot \left[ \log_{2} \left( \sum_{u = 0}^{K} T_{n u} \cdot p_{u} \right) - \log_{2} T_{n m} \right] + \frac{K}{Π^{2}} \right\}^{2},

where the value of Π can be assigned according to the initial p⃗. Specifically, given the initial p⃗, we can calculate the initial value of I(x⃗; y⃗), denoted by I⁰. Then, we substitute I⁰ into Eq. (18) to find the positive integer Π that meets Eq. (18) most closely.
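One possible way to select Π is sketched below in Python. The text does not state how closeness to Eq. (18) is measured, so the sketch assumes the absolute difference |I⁰ − K/Π²| and restricts Π to positive integers no larger than CD/a, as required in Section 3; the function name is hypothetical.

```python
def choose_pi(i0, K, cd, a):
    """Pick the integer Pi in [1, CD/a] whose K / Pi**2 is closest to I0."""
    pi_max = max(1, int(cd // a))
    return min(range(1, pi_max + 1), key=lambda pi: abs(i0 - K / pi ** 2))

# Hypothetical numbers: K = 100 pixels per batch and CD = 5 single pixels.
print(choose_pi(i0=24.0, K=100, cd=5.0, a=1.0))  # 2, since 100 / 2**2 = 25
print(choose_pi(i0=4.2, K=100, cd=5.0, a=1.0))   # 5, since 100 / 5**2 = 4
```

Other distance measures, for example a comparison of sqrt(K/I⁰) with Π, would be equally compatible with the description above and may give a different integer near the midpoints.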

When we solve for the OIT, three constraints on p⃗ need to be taken into account. First, all of the elements in p⃗ are restricted within the range of [0,1], since the probability is always non-negative and smaller than or equal to 1. Second, the summation of all elements in p⃗ is equal to 1, i.e., $\sum_{m = 0}^{K} p_{m} = 1$ . In addition, the optimized print image obtained by computational lithography should be very close to the target layout. Thus, the probability distribution p⃗ should satisfy the following equation:

\tilde{\vec{q}} \approx T \cdot \vec{p},

where $\tilde{\vec{q}}$ is the probability distribution of the target layout to be replicated on the wafer. $\tilde{\vec{q}}$ is estimated from the target layout. This is done by going through all of the sub-regions covered by circle C_p on the target layout, and then counting the frequencies for the events N_y = n, where n = 0, 1, . . ., K. Finally, we calculate the probability of P_r {N_y = n}, which is the (n + 1)th element of $\tilde{\vec{q}}$.
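The estimation of the target distribution amounts to a histogram over window positions, which the following Python sketch illustrates. As assumptions for the illustration, the circle C_p is approximated by a pixelated disk of the given `radius` and the window is moved one pixel at a time over all positions where it fits inside the layout; the function name is hypothetical.

```python
def count_distribution(image, radius):
    """Return [Pr{N = n} for n = 0..K] over all disk-shaped windows of a 0/1 image."""
    r = int(radius)
    offsets = [(dx, dy) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
               if dx * dx + dy * dy <= radius * radius]
    K = len(offsets)
    hist = [0] * (K + 1)
    rows, cols = len(image), len(image[0])
    for cy in range(r, rows - r):
        for cx in range(r, cols - r):
            # Number of one-valued pixels inside the window centered at (cx, cy).
            hist[sum(image[cy + dy][cx + dx] for dx, dy in offsets)] += 1
    total = sum(hist)
    if total == 0:
        raise ValueError("image is smaller than the window")
    return [h / total for h in hist]

layout = [[0, 1, 0, 0],
          [1, 1, 1, 0],
          [0, 1, 0, 0]]
print(count_distribution(layout, 1))  # [0.0, 0.0, 0.5, 0.0, 0.0, 0.5]
```

In the example, the two window positions contain five and two one-valued pixels, respectively, so each of those counts receives probability 0.5.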

According to Eq. (19) and the constraints on p⃗, the optimal probability distribution $\hat{\vec{p}}$ can be calculated by solving the following optimization problem:

\begin{aligned} \hat{\vec{p}} = & \arg \min_{\vec{p}} \left\{ \sum_{n = 0}^{K} \sum_{m = 0}^{K} T_{n m} \cdot p_{m} \cdot \left[ \log_{2} \left( \sum_{u = 0}^{K} T_{n u} \cdot p_{u} \right) - \log_{2} T_{n m} \right] + \frac{K}{Π^{2}} \right\}^{2}, \\ & \text{s.t. } 0 \leq p_{m} \leq 1 \text{ for } \forall m, \ \sum_{m = 0}^{K} p_{m} = 1, \text{ and } \tilde{\vec{q}} \approx T \cdot \vec{p} . \end{aligned}

In other words, we want to obtain the optimal distribution p⃗ under the constraint for the target layout distribution. In order to do so, we convert the second and third constraints into penalty terms and reformulate Eq. (21) as

\hat{\vec{p}} = \arg \min_{\vec{p}} F (\vec{p}), \quad \text{s.t. } 0 \leq p_{m} \leq 1 \text{ for } \forall m,

where the cost function F(p⃗) is defined as

\begin{aligned} F (\vec{p}) = & \left\{ \sum_{n = 0}^{K} \sum_{m = 0}^{K} T_{n m} \cdot p_{m} \cdot \left[ \log_{2} \left( \sum_{u = 0}^{K} T_{n u} \cdot p_{u} \right) - \log_{2} T_{n m} \right] + \frac{K}{Π^{2}} \right\}^{2} \\ & + ω_{1} \cdot \left( \sum_{m = 0}^{K} p_{m} - 1 \right)^{2} + ω_{2} \cdot {‖ \tilde{\vec{q}} - T \vec{p} ‖}_{2}^{2}, \end{aligned}

where ω₁ and ω₂ are the weight coefficients of the penalty terms.
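The penalized cost function of Eq. (23) can be evaluated term by term, as in the Python sketch below. The matrix is stored with zero-based indices, `T[n][m]` = P_r {N_y = n|N_x = m}; the function name, the argument names, and the convention that terms with zero joint probability are skipped are assumptions of this illustration.

```python
import math

def cost(p, T, q_target, pi, w1, w2):
    """Penalized cost F(p) of Eq. (23) for a candidate distribution p."""
    size = len(p)
    K = size - 1
    q = [sum(T[n][u] * p[u] for u in range(size)) for n in range(size)]  # T p
    # Double sum of Eq. (19); by Eq. (12) it equals -I(x; y).
    s = 0.0
    for n in range(size):
        for m in range(size):
            joint = T[n][m] * p[m]
            if joint > 0.0:  # convention: 0 * log 0 = 0
                s += joint * (math.log2(q[n]) - math.log2(T[n][m]))
    info_term = (s + K / pi ** 2) ** 2           # distance to Eq. (18)
    sum_term = w1 * (sum(p) - 1.0) ** 2          # probabilities sum to one
    fit_term = w2 * sum((qt - qn) ** 2 for qt, qn in zip(q_target, q))
    return info_term + sum_term + fit_term

# Toy check with K = 1: a noiseless channel and equiprobable inputs give
# I = 1 = K / Pi**2 for Pi = 1, so every term vanishes.
identity = [[1.0, 0.0], [0.0, 1.0]]
print(cost([0.5, 0.5], identity, [0.5, 0.5], pi=1, w1=1.0, w2=10.0))  # 0.0
```

Keeping the three terms separate makes it easy to monitor which of them dominates during the optimization.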

Next, we use the following parameter transformation to restrict p_m within the range of [0,1]

p_{m} = \frac{1 + cos θ_{m}}{2} .

Based on Eq. (24), Eq. (22) is reformulated as

\hat{\vec{θ}} = \arg \min_{\vec{θ}} F (\vec{θ}),

where θ⃗ = (θ₀, θ₁, . . ., θ_K)^T, with one parameter for each element of p⃗. Before optimization, the parameters are initialized as ${\vec{θ}}^{0} = \cos^{- 1} \{2 \tilde{\vec{q}} - 1\}$, so that the initial p⃗ equals $\tilde{\vec{q}}$ according to Eq. (24). Then, the steepest descent algorithm is applied to solve Eq. (25). In the (i + 1)th iteration, the variables are updated as

{\vec{θ}}^{i + 1} = {\vec{θ}}^{i} - step \cdot \nabla F (\vec{θ}),

where “step” is the step length, and ∇F(θ⃗) is the gradient of the cost function with respect to θ⃗. After each update, we normalize θ⃗ to guarantee that $\sum_{m = 0}^{K} p_{m} = 1$. Appendix A provides the details to derive the gradient ∇F(θ⃗).
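The iteration of Eqs. (24)–(26) can be sketched in Python as follows. Two points in this sketch are assumptions rather than the procedure of this paper: the gradient is approximated by central finite differences instead of the analytic expression of Appendix A, and the normalization step is implemented by rescaling p⃗ to unit sum and mapping it back to θ⃗. All function names are hypothetical.

```python
import math

def cost(p, T, q_target, pi, w1, w2):
    """Penalized cost of Eq. (23); T[n][m] = Pr{N_y=n | N_x=m}."""
    size = len(p)
    q = [sum(T[n][u] * p[u] for u in range(size)) for n in range(size)]
    s = 0.0
    for n in range(size):
        for m in range(size):
            joint = T[n][m] * p[m]
            if joint > 0.0:
                s += joint * (math.log2(q[n]) - math.log2(T[n][m]))
    return ((s + (size - 1) / pi ** 2) ** 2 + w1 * (sum(p) - 1.0) ** 2
            + w2 * sum((a - b) ** 2 for a, b in zip(q_target, q)))

def theta_to_p(theta):
    """Eq. (24): p_m = (1 + cos(theta_m)) / 2 keeps every p_m in [0, 1]."""
    return [(1.0 + math.cos(t)) / 2.0 for t in theta]

def p_to_theta(p):
    """Inverse of Eq. (24), with clamping against rounding errors."""
    return [math.acos(min(1.0, max(-1.0, 2.0 * v - 1.0))) for v in p]

def renormalize(theta):
    """Rescale p to unit sum and map back to theta (assumed normalization)."""
    p = theta_to_p(theta)
    total = sum(p)
    return theta if total == 0.0 else p_to_theta([v / total for v in p])

def steepest_descent(T, q_target, pi, step=0.1, w1=1.0, w2=10.0,
                     iters=200, eps=1e-6):
    theta = p_to_theta(q_target)  # initialization: theta^0 = acos(2 q - 1)
    f = lambda th: cost(theta_to_p(th), T, q_target, pi, w1, w2)
    for _ in range(iters):
        grad = []
        for k in range(len(theta)):
            hi = theta[:]
            lo = theta[:]
            hi[k] += eps
            lo[k] -= eps
            grad.append((f(hi) - f(lo)) / (2 * eps))  # finite difference
        theta = renormalize([t - step * g for t, g in zip(theta, grad)])  # Eq. (26)
    return theta_to_p(theta)

# Hypothetical 3-level channel (K = 2); each column of T sums to one.
T = [[0.8, 0.1, 0.0], [0.2, 0.8, 0.2], [0.0, 0.1, 0.8]]
p_hat = steepest_descent(T, q_target=[0.5, 0.3, 0.2], pi=1)
```

The default values of `step`, `w1`, `w2` and `iters` follow the settings reported for the simulations in this section; a fixed step length is not guaranteed to decrease the cost monotonically, so recording the cost per iteration is advisable.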

After the optimization procedure, we substitute the optimal probability distribution $\hat{\vec{p}}$ into Eq. (12) and calculate the OIT as

\hat{I} = - \sum_{n = 0}^{K} \sum_{m = 0}^{K} T_{n m} \cdot {\hat{p}}_{m} \cdot [{log}_{2} (\sum_{u = 0}^{K} T_{n u} \cdot {\hat{p}}_{u}) - {log}_{2} T_{n m}] .

Next, we use the proposed method to obtain the OITs of “Layout 3” and “Layout 4” in Figs. 5(a) and 5(e). The weight coefficients ω₁ and ω₂ in Eq. (23) are chosen empirically to keep balance among different terms in the cost function. In the following simulations, we set step = 0.1, ω₁ = 1, ω₂ = 10, and the total number of iterations is 200. The convergence curves for the cost function and the mutual information for both layouts are plotted in Fig. 6. Notice that the cost function decreases with the iteration number, while the mutual information gradually approaches the optimal values to satisfy Eq. (18). The resulting OITs for “Layout 3” and “Layout 4” are 4.5982 and 4.5900, respectively. Substituting these OITs into Eqs. (14) and (17), we can calculate the lower bound of PE for the print images. As shown in the second column of Table 1, the limits of image fidelity for “Layout 3” and “Layout 4” are 3.05 and 17.50, respectively.

Fig. 5 Simulations of pixelated OPC algorithm for two layout patterns. In the top row, (a) and (e) show the target patterns for “Layout 3” and “Layout 4”, respectively. (b) and (f) show the OPC masks for “Layout 3” and “Layout 4”, respectively. The bottom row shows the print images of the masks in the top row.

Fig. 6 Convergence curves for the cost function and the mutual information for the two layout patterns.

Table 1. Theoretical limits of image fidelity, minimum PEs obtained by pixelated OPC, and the improved PEs obtained according to the optimal distributions.

In order to verify the theoretical limits of image fidelity, we use the gradient-based pixelated OPC algorithm in [6] to optimize the mask patterns for “Layout 3” and “Layout 4”. In this case, the wavelength is λ = 193 nm, CD = 45 nm, NA = 1.25, and the resist threshold is t_r = 0.19. The light source is an annular illumination with inner and outer partial coherence factors of σ_in = 0.8 and σ_out = 0.975 [2]. In Eq. (1), the lateral sizes of M(r) for “Layout 3” and “Layout 4” are 201 pixels and 261 pixels, respectively. The lateral size of h^m (r) is 21 pixels. In order to use the gradient-based OPC algorithm to optimize the mask, the hard threshold function in Eq. (5) is approximated by the differentiable sigmoid function [5], so that

Z \approx \text{sigmoid} \{I, t_{r}\} = \frac{1}{1 + \exp [- a_{r} (I - t_{r})]},

where a_r is the steepness index of the sigmoid function. Note that a_r and t_r are two important parameters in the sigmoid function characterizing the effect of photoresist development on the print image. The gradient-based OPC methods optimize the mask patterns based on the imaging model and the photoresist model. As an important parameter in the photoresist model, a_r has a significant impact on the optimization results and on the lithography image fidelity. In addition, the value of the step length has a significant influence on the convergence of gradient-based OPC algorithms. In order to find the minimum PE of the print image that can be obtained by the gradient-based OPC algorithm, we repeat the simulations using different combinations of step lengths and a_r. Specifically, we sweep the step length from 0.5 to 4 with an interval of 0.5, and sweep a_r from 10 to 90 with an interval of 5. For each parameter combination, we run the algorithm for 100 iterations, and record the minimum PE value obtained during these iterations. Then, we compare the minimum PE to its theoretical limit derived from the optimal information transfer, thus verifying the proposed information theoretical approaches.
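The sigmoid resist model of Eq. (28) and the parameter grid described above can be sketched in Python as follows. The function and variable names are hypothetical, and the two-branch evaluation of the sigmoid is only a numerical safeguard against overflow; the OPC algorithm of [6] itself is not reproduced here.

```python
import math

def sigmoid_resist(intensity, t_r, a_r):
    """Eq. (28): 1 / (1 + exp(-a_r * (I - t_r))), evaluated without overflow."""
    x = a_r * (intensity - t_r)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

# Parameter grid described in the text.
step_lengths = [0.5 * k for k in range(1, 9)]   # 0.5, 1.0, ..., 4.0
steepness_values = list(range(10, 95, 5))       # 10, 15, ..., 90
grid = [(step, a_r) for step in step_lengths for a_r in steepness_values]

print(len(grid))                        # 136 combinations (8 x 17)
print(sigmoid_resist(0.19, 0.19, 90))   # 0.5 exactly at the threshold
```

Each of the 136 combinations would then be passed to the OPC run, keeping the smallest PE observed over its 100 iterations.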

Figure 5 and Table 1 illustrate the results obtained from the OPC algorithm. For both target layouts, the best parameter combination is “step length”=1.5 and a_r = 90. Figures 5(b) and 5(f) show the optimized masks of “Layout 3” and “Layout 4” under the best parameter combinations. Figures 5(d) and 5(h) are the corresponding print images. Figures 5(c) and 5(g) are the print images obtained when the masks are equal to the target layouts. In Table 1, the second column presents, as explained before, the theoretical limits of image fidelity for both of the layouts. The third column provides the minimum PEs of print images that can be achieved by the gradient-based OPC algorithm. Notice that the theoretical limits are lower than the minimum PEs obtained by the OPC algorithm.

5. Application of the proposed information theoretical approach to improve the performance of OPC methods

As shown in Fig. 5, the minimum PEs obtained by the gradient-based OPC algorithm are larger than the theoretical limits. In this section, we apply the proposed information theoretical approach to improve the OPC solutions shown in Fig. 5. According to Eq. (22), $\hat{\vec{p}}$ represents the optimal probability distribution of the mask pattern, by which we can reach the lower bound of PE. Figures 7(a) and 7(d) illustrate the optimal distributions for “Layout 3” and “Layout 4”, respectively. On the other hand, Figs. 7(b) and 7(e) show the probability distributions of the obtained OPC masks in Figs. 5(b) and 5(f), respectively. Compared to the optimal distributions, the root mean square errors (RMSE) of the distributions of the OPC masks are 0.4997 and 0.2880, respectively. Notice that compared to the distributions of the OPC masks, the optimal distributions have smaller values for p₀, and larger values for p_i when i is small, where p₀ is the probability of opaque regions on the mask, the p_i’s for small i represent the likelihood of sub-resolution assist features (SRAF), and p_i’s for large i represent the likelihood of the main features. This means that, compared to the optimal distributions, the OPC patterns in Fig. 5 have too many opaque areas, and fail to insert enough SRAFs around the main features. This conclusion can be verified in Figs. 5(b) and 5(f), which show that the gradient-based OPC algorithm generates few SRAFs surrounding the main features, and most of the mask areas are opaque.
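The comparison between a mask's distribution and the optimal distribution can be sketched as follows. This is an illustration under stated assumptions: the mapping from mask pixels to symbol indices is defined earlier in the paper and is taken here as a given integer array `symbols`; the function names are hypothetical; and the RMSE is assumed to be the root of the mean squared difference over the K+1 entries, which may differ from the exact normalization used for the values reported above.

```python
import numpy as np

def mask_distribution(symbols, num_symbols):
    """Empirical probability of each symbol index 0..num_symbols-1.

    `symbols` is a hypothetical integer array holding the symbol index
    assigned to each mask location by the channel model.
    """
    counts = np.bincount(np.asarray(symbols).ravel(), minlength=num_symbols)
    return counts / counts.sum()

def rmse(p_ref, p):
    """Root mean square error between two distributions (assumed definition)."""
    p_ref = np.asarray(p_ref, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.sqrt(np.mean((p_ref - p) ** 2)))
```

A smaller RMSE indicates that the mask statistics are closer to the distribution that attains the lower bound of PE.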

Fig. 7 Probability distributions of the $\hat{\vec{p}}$ (left column), OPC masks (middle column) and refined masks (right column). The top and bottom rows illustrate the probability distributions for “Layout 3” and “Layout 4”, respectively.

Next, we propose a method to refine the OPC solutions and improve the image fidelity based on the optimal probability distribution $\hat{\vec{p}}$. Let M′ be the OPC mask obtained by the gradient-based OPC algorithm. The workflow of the proposed method is provided in Table 2. First, the contour of M′ is calculated and denoted as E{M′}. Then, we try to insert SRAFs around the main features to reduce the PE by turning on some of the zero-valued pixels outside E{M′}. Subsequently, we go over all of the one-valued pixels inside E{M′} and turn off some of them to further reduce the PE. The result is the refined mask pattern. Mask manufacturing constraints do not allow SRAFs to be too close to the main features on the mask pattern. Thus, the proposed method inserts SRAFs only within regions that are at least 10 pixels away from the boundaries of the mask patterns produced by OPC. In practice, the distance between SRAFs and main features can be adjusted based on the actual mask fabrication requirements.
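The greedy refinement described above can be sketched in a few lines. This is a simplified illustration rather than the authors' code: `pattern_error` is a hypothetical callable that returns the PE of the print image for a given mask; the contour E{M′} is not computed explicitly (every zero-valued pixel is treated as lying outside it and every one-valued pixel as lying inside it); and `window`, which sets the size of the neighborhood C_p, is a placeholder because that size is not restated here.

```python
import numpy as np

def refine_mask(mask, pattern_error, min_gap=10, window=5, max_bright=50):
    """Greedy refinement sketch: try to add SRAF pixels, then trim main features.

    pattern_error(mask) -> PE is a hypothetical callable (imaging model,
    resist model and PE metric); window and max_bright model the C_p condition.
    """
    m = np.array(mask, dtype=np.uint8)
    bright = np.argwhere(m == 1)          # main-feature pixels of the OPC mask
    dark = np.argwhere(m == 0)            # candidate SRAF locations
    best_pe = pattern_error(m)

    if len(bright) and len(dark):
        # Distance from each dark pixel to the nearest original main-feature pixel
        # (brute force; adequate for small masks only).
        diff = dark[:, None, :] - bright[None, :, :]
        dist = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
        for idx in np.argsort(dist, kind="stable"):   # near pixels first
            if dist[idx] < min_gap:                   # too close to main features
                continue
            y, x = dark[idx]
            y0, x0 = max(y - window, 0), max(x - window, 0)
            if m[y0:y + window + 1, x0:x + window + 1].sum() >= max_bright:
                continue                              # neighborhood already crowded
            m[y, x] = 1
            pe = pattern_error(m)
            if pe > best_pe:
                m[y, x] = 0                           # PE increased: restore
            else:
                best_pe = pe

    for y, x in bright:                               # try turning off main pixels
        m[y, x] = 0
        pe = pattern_error(m)
        if pe > best_pe:
            m[y, x] = 1                               # PE increased: restore
        else:
            best_pe = pe
    return m, best_pe
```

Each candidate flip requires one evaluation of the imaging model, so the cost grows linearly with the number of mask pixels; an update is kept whenever the PE does not increase, as in Table 2.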

Table 2. Proposed method to refine the OPC solutions.

Figure 8 shows the refined mask patterns and their corresponding print images for “Layout 3” and “Layout 4”. Figures 7(c) and 7(f) show the probability distributions of the refined masks, which are shown in Figs. 8(a) and 8(c), respectively. Compared to the optimal distributions, the RMSEs of the distribution of the refined masks are 0.2067 and 0.1387, respectively, lower than the RMSEs for the masks optimized by the OPC method. The PEs for the refined masks are presented in the fourth column in Table 1, and are lower than the PEs resulting from the standard OPC masks. From the comparison between Figs. 5 and 8, we observe that the proposed method can insert many more SRAFs on the masks, and further improve the image fidelity of lithography systems.

Fig. 8 Refined masks and print images obtained by modifying the masks produced by OPC using the information theoretical approach.

It should be noted that the proposed method in Table 2 is a heuristic method, which is likely to converge to a local minimum of pattern error. In addition, the mask patterns in Fig. 8 are complex and include numerous tiny features that are difficult to produce by current mask fabrication techniques. Nevertheless, the results in Fig. 8 demonstrate the potential of the proposed information theoretical approaches to improve the performance of current OPC algorithms, providing intuitions on how to achieve this improvement. In future work, we will extend the proposed information channel model to take into account other realistic influence factors, such as the process variations in lithography systems [28], mask manufacturability issues [29] and so on. We will also try to develop a global OPC optimization method to minimize the pattern error according to the optimal probability distribution of mask patterns.

6. Conclusion

This paper has introduced information theoretical approaches for computational lithography in partially coherent lithography systems. The information channel model was built up, and the mutual information between mask and print images was formulated and studied. Subsequently, the relationship between the mutual information and lithography image fidelity was discussed. Then, we derived the optimal information transfer and the theoretical limit of image fidelity for pixelated OPC techniques. Finally, the proposed information theoretical approaches were utilized to improve the OPC solutions of gradient-based algorithms. The methods proposed in this paper were verified by a set of simulations based on different layout patterns. In our future work, we will extend the proposed information theoretical approaches to take into account other realistic factors that influence lithography, such as process variations and mask manufacturability.

A. Appendix

According to Eq. (23), we define the three terms in the cost function as

F_{1} = \left\{ \sum_{n = 0}^{K} \sum_{m = 0}^{K} T_{n m} \cdot p_{m} \cdot \left[ \log_{2} \left( \sum_{u = 0}^{K} T_{n u} \cdot p_{u} \right) - \log_{2} T_{n m} \right] + \frac{K}{\Pi^{2}} \right\}^{2},

F_{2} = \left( \sum_{m = 0}^{K} p_{m} - 1 \right)^{2},

F_{3} = \left\| \tilde{\vec{q}} - T \vec{p} \right\|_{2}^{2} .
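As a numerical aid, the double sum that appears inside F₁ can be evaluated directly. The sketch below assumes that T is the channel transition matrix with T_{nm} the probability of output symbol n given input symbol m (each column sums to one); under that assumption the double sum equals the negative of the mutual information in bits. The function name is illustrative, terms with zero probability are taken to contribute zero, and the constant term of F₁ is left out.

```python
import numpy as np

def f1_double_sum(T, p):
    """Sum over n, m of T_nm * p_m * (log2 q_n - log2 T_nm), with q = T p.

    Entries with T_nm * p_m == 0 are skipped (0 * log 0 is treated as 0).
    """
    T = np.asarray(T, dtype=float)
    p = np.asarray(p, dtype=float)
    q = T @ p                                   # output distribution q_n
    joint = T * p[None, :]                      # joint[n, m] = T_nm * p_m
    keep = joint > 0
    log_q = np.log2(np.broadcast_to(q[:, None], T.shape)[keep])
    log_T = np.log2(T[keep])
    return float(np.sum(joint[keep] * (log_q - log_T)))
```

For a noiseless binary channel (T equal to the 2×2 identity) with equiprobable inputs the sum is −1, corresponding to one bit of mutual information; for a channel whose output is independent of the input it is 0.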

The partial derivative of F₁ with respect to p_m is

\begin{array}{rcl}
\frac{\partial F_{1}}{\partial p_{m}} & = & 2 \left\{ \sum_{n = 0}^{K} \sum_{m = 0}^{K} T_{n m} \cdot p_{m} \cdot \left[ \log_{2} \left( \sum_{u = 0}^{K} T_{n u} \cdot p_{u} \right) - \log_{2} T_{n m} \right] + \frac{K}{\Pi^{2}} \right\} \\
& & \cdot \sum_{n = 0}^{K} T_{n m} \cdot \left\{ \left[ \log_{2} \left( \sum_{u = 0}^{K} T_{n u} \cdot p_{u} \right) - \log_{2} T_{n m} \right] + p_{m} \cdot \frac{T_{n m}}{\left( \sum_{u = 0}^{K} T_{n u} \cdot p_{u} \right) \ln 2} \right\} \\
& = & 2 \left\{ \sum_{n = 0}^{K} \sum_{m = 0}^{K} T_{n m} \cdot p_{m} \cdot \left( \log_{2} q_{n} - \log_{2} T_{n m} \right) + \frac{K}{\Pi^{2}} \right\} \\
& & \cdot \sum_{n = 0}^{K} T_{n m} \cdot \left\{ \log_{2} q_{n} - \log_{2} T_{n m} + \frac{p_{m} \cdot T_{n m}}{q_{n} \cdot \ln 2} \right\} .
\end{array}

The partial derivative of F₂ with respect to p_m is

\frac{\partial F_{2}}{\partial p_{m}} = 2 \left( \sum_{m = 0}^{K} p_{m} - 1 \right) .

The gradient of F₃ with respect to p⃗ can be calculated as

\left. \nabla F_{3} \right|_{\vec{p}} = 2 T^{T} \cdot T \cdot \vec{p} - 2 T^{T} \cdot \tilde{\vec{q}},

where T^T is the transposition of T.
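The closed-form derivatives of F₂ and F₃ can be checked against central finite differences. The sketch below is illustrative only: the function names are hypothetical, and the vector subtracted in the gradient of F₃ is assumed to be the same $\tilde{\vec{q}}$ that appears in the definition of F₃.

```python
import numpy as np

def f2(p):
    return (p.sum() - 1.0) ** 2

def grad_f2(p):
    # Same value 2 * (sum(p) - 1) for every component p_m.
    return np.full_like(p, 2.0 * (p.sum() - 1.0))

def f3(p, T, q_tilde):
    r = q_tilde - T @ p
    return float(r @ r)

def grad_f3(p, T, q_tilde):
    # 2 T^T T p - 2 T^T q_tilde
    return 2.0 * T.T @ (T @ p) - 2.0 * T.T @ q_tilde

def numerical_grad(f, p, eps=1e-6):
    """Central-difference gradient of a scalar function f at p."""
    g = np.zeros_like(p)
    for i in range(p.size):
        d = np.zeros_like(p)
        d[i] = eps
        g[i] = (f(p + d) - f(p - d)) / (2.0 * eps)
    return g
```

Both functions are quadratic in p⃗, so the central difference agrees with the analytic gradient up to floating-point rounding, which makes this a convenient sanity check before running a gradient-based solver.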

Funding

National Natural Science Foundation of China (NSFC) (61675021); Beijing Natural Science Foundation (4173078); the Fundamental Research Funds for the Central Universities (2018CX01025); the China Scholarship Council (Grant No. 201706035012).

References and links

1. A. K. Wong, Resolution Enhancement Techniques in Optical Lithography (SPIE, 2001).

2. X. Ma and G. R. Arce, Computational Lithography, 1st ed. (John Wiley and Sons, 2010).

3. Y. Liu and A. Zakhor, “Binary and phase shifting mask design for optical lithography,” IEEE Trans. Semicond. Manuf. 5(2), 138–152 (1992).

4. Y. Granik, “Fast pixel-based mask optimization for inverse lithography,” J. Microlith. Microfab. Microsyst. 5(4), 043002 (2006).

5. A. Poonawala and P. Milanfar, “Mask design for optical microlithography – an inverse imaging problem,” IEEE Trans. Image Process. 16(3), 774–788 (2007).

6. X. Ma and G. R. Arce, “Binary mask optimization for inverse lithography with partially coherent illumination,” J. Opt. Soc. Am. A 25(12), 2960–2970 (2008).

7. X. Ma and G. R. Arce, “Generalized inverse lithography methods for phase-shifting mask design,” Opt. Express 15(23), 15066–15079 (2007).

8. N. B. Cobb and Y. Granik, “Dense OPC for 65nm and below,” Proc. SPIE 5992, 599259 (2005).

9. P. M. Martin, C. J. Progler, G. Xiao, R. Gray, L. Pang, and Y. Liu, “Manufacturability study of masks created by inverse lithography technology (ILT),” Proc. SPIE 5992, 599235 (2005).

10. A. Poonawala and P. Milanfar, “OPC and PSM design using inverse lithography: A non-linear optimization approach,” Proc. SPIE 6154, 61543H (2006).

11. A. Poonawala, B. Painter, and C. Kerchner, “Model-based assist feature placement for 32nm and 22nm technology nodes using inverse mask technology,” Proc. SPIE 7488, 748814 (2009).

12. Y. Shen, N. Wong, and E. Y. Lam, “Level-set-based inverse lithography for photomask synthesis,” Opt. Express 17(26), 23690–23701 (2009).

13. N. Jia and E. Y. Lam, “Machine learning for inverse lithography: Using stochastic gradient descent for robust photomask synthesis,” J. Opt. 12(4), 045601 (2010).

14. J. Yu and P. Yu, “Impacts of cost functions on inverse lithography patterning,” Opt. Express 18(8), 23331–23342 (2010).

15. X. Ma and G. R. Arce, “Pixel-based OPC optimization based on conjugate gradients,” Opt. Express 19(3), 2165–2180 (2011).

16. X. Ma, Y. Li, and L. Dong, “Mask optimization approaches in optical lithography based on a vector imaging model,” J. Opt. Soc. Am. A 29(7), 1300–1312 (2012).

17. X. Ma, Z. Song, Y. Li, and G. R. Arce, “Block-based mask optimization for optical lithography,” Appl. Opt. 52(14), 3351–3363 (2013).

18. X. Ma, G. R. Arce, and Y. Li, “Optimal 3D phase-shifting masks in partially coherent illumination,” Appl. Opt. 50(28), 5567–5576 (2011).

19. W. Lv, S. Liu, Q. Xia, X. Wu, Y. Shen, and E. Y. Lam, “Level-set-based inverse lithography for mask synthesis using the conjugate gradient and an optimal time step,” J. Vac. Sci. Technol. B 31(4), 041605 (2013).

20. M. L. Rieger, “Communication theory in optical lithography,” J. Micro/Nanolith. MEMS MOEMS 11(1), 013003 (2012).

21. X. Ma, H. Zhang, Z. Wang, Y. Li, G. R. Arce, J. Garcia-Frias, and L. Zhang, “Information theoretical aspects in coherent optical lithography systems,” Opt. Express 25(23), 29043–29057 (2017).

22. B. E. A. Saleh and M. Rabbani, “Simulation of partially coherent imagery in the space and frequency domains and by modal expansion,” Appl. Opt. 21(15), 2770–2777 (1982).

23. X. Ma, D. Shi, Z. Wang, Y. Li, and G. R. Arce, “Lithographic source optimization based on adaptive projection compressive sensing,” Opt. Express 25(6), 7131–7149 (2017).

24. Z. Song, X. Ma, J. Gao, J. Wang, Y. Li, and G. R. Arce, “Inverse lithography source optimization via compressive sensing,” Opt. Express 22(12), 14180–14198 (2014).

25. W. Lv, Q. Xia, and S. Liu, “Mask-filtering-based inverse lithography,” J. Micro/Nanolith. MEMS MOEMS 12(4), 043003 (2013).

26. M. Born and E. Wolf, Principles of Optics (Cambridge University, 1999).

27. R. Wilson, Fourier Series and Optical Transform Techniques in Contemporary Optics (Wiley, 1995).

28. P. Yu, S. X. Shi, and D. Z. Pan, “True process variation aware optical proximity correction with variational lithography modeling and model calibration,” J. Micro/Nanolith. MEMS MOEMS 6(3), 031004 (2007).

29. S. Jiang, X. Ma, and A. Zakhor, “A recursive cost-based approach to fracturing,” Proc. SPIE 7973, 79732P (2011).

Step 1	Find the contour of M′, which is denoted by E{M′}.
Step 2	Go over all of the zero-valued pixels outside E{M′}, visiting the pixels close to the contour first and those farther away afterwards. Each time, choose one pixel M′(x, y) that satisfies the following condition: the number of bright pixels within the area C_p around M′(x, y) is less than 50. Turn M′(x, y) on and calculate the PE of the print image. If the PE increases, restore M′(x, y) to a dark pixel. Otherwise, keep the update of M′(x, y).
Step 3	Go over all of the one-valued pixels inside E{M′}. Each time, choose one pixel M′(x′, y′) and turn it off. Then, calculate the PE of the print image. If the PE increases, restore M′(x′, y′) to a bright pixel. Otherwise, keep the update of M′(x′, y′).
Step 4	Output the refined mask pattern, which is denoted by M^*.


Optics Express

Layout	Theoretical limit of PE	Minimum PE (gradient-based OPC)	Improved PE (refined mask)
Layout 3	3.05	54	46
Layout 4	17.50	630	599
