Fast diffraction model of lithography mask based on improved pixel-to-pixel generative adversarial network

Junbi Zhang; Xu Ma; Xu Ma

doi:10.1364/OE.489770

1. Introduction

Optical lithography is a key process in integrated circuit (IC) manufacturing. The development of extreme ultraviolet (EUV) lithography with very short illumination wavelength can alleviate the challenge of imaging resolution caused by the continuous reduction of IC feature size [1]. Figure 1 (a) shows the sketch of an EUV lithography system. The plasma material is bombarded by an instantaneous power laser with tens of thousands of watts, and the EUV light with 13.5 nm wavelength is excited [2]. The EUV light rays pass through the illumination system and obliquely strike the reflective mask. The reflected light rays off mask are then collected by the projection optics and produce the aerial image on the wafer. Finally, the photoresist coated on the wafer is developed to form the print image of the mask pattern. Figure 1 (b) shows the cross section of the three-dimensional (3D) mask with a contact hole feature. The mask is composed of the reflective multilayer and absorber layer. The reflective multilayer consists of 40 pairs of Mo/Si layers [3]. The absorber layer is usually made from TaN, and is used to depict the mask feature. The mask will reflect the incident EUV light rays and generates the diffraction near-field (DNF) close to the mask surface [4].

Fig. 1. Illustrations of the (a) EUV lithography system and (b) 3D mask structure

Download Full Size | PDF

In EUV lithography system, the reflective mask is obliquely illuminated, and the thickness of absorber is usually several times larger than the illumination wavelength. Thus, the serious mask 3D effects appear, including the asymmetric shadowing effect, focus shifting and so on [5–7]. These effects bring a great challenge on the computation of mask DNF in EUV lithography. The DNF can be formulated as the multiplication between a set of diffraction matrices and the Jones vector of incident light rays. Considering the polarizations of the light waves, the DNF can be characterized by four complex-valued diffraction matrices. Thus, computing the diffraction matrices becomes the main task of 3D mask modelling.

The rigorous electromagnetic field (EMF) simulation methods can accurately analyze the diffraction behavior of 3D mask by solving the Maxwell’s equations [8–11]. However, these methods are too slow to simulate the large-scale mask patterns. Therefore, various fast 3D mask models were proposed at the cost of reducing accuracy [12–16]. To date, the shrinkage of IC critical dimension makes the mask 3D effects more pronounced, and the crosstalk effect between neighboring mask edges becomes a key problem [17]. It is difficult to simulate the crosstalk effect accurately, especially for the curvilinear mask patterns with freeform feature contours. Moreover, the real part and imaginary part of DNF are coupled together through the amplitude and phase of the electromagnetic fields. However, most of the approximation models predict the real part and imaginary part of DNF separately, without considering the correlation between them.

Recently, deep learning has been applied to 3D mask modelling to seek the balance between calculation accuracy and efficiency. For example, the non-parametric kernel regression and data fitting techniques were used to speed up the 3D mask models in both of deep ultraviolet lithography system [18] and EUV lithography system [19]. However, these methods include the mask decomposition and stitching operations that bring additional runtime cost. Convolutional neural network (CNN) was used to predict the DNF of thick mask [20], but prediction accuracy is limited by the incident angle. In addition, the conditional generative adversarial network and fully convolution network were applied in fast 3D mask models [21,22]. However, the crosstalk effect is hardly to be characterized accurately, and the prediction accuracy remains to be improved.

This paper proposes an improved pixel-to-pixel generative adversarial network (P2P-GAN) to enhance the performance of fast 3D mask model. In order to facilitate the calculation, the four complex-valued diffraction matrices of 3D mask can be decomposed into eight real-valued matrices, corresponding to the real parts and imaginary parts, respectively. Given a mask pattern, the proposed network will predict the eight diffraction matrices simultaneously. The proposed P2P-GAN is mainly composed of a generator and a discriminator. The generator is implemented by U-Net [23], which generates the samples as close as possible to the distribution of real data sets. The discriminator uses the PatchGAN structure to distinguish the real data from the generated data [24].

In order to improve the performance of P2P-GAN, we modify the generator in three folds. Firstly, the deformable convolution (DeConv) is used to capture the edge crosstalk effect [25]. The DeConv kernel can be adaptively modified according to the mask edge distributions in the training stage, thus extracting the edge features. In order to balance the computational accuracy and efficiency, we only use DeConv in the second layer. Secondly, in order to take into account the coupling relationship between the real parts and imaginary parts of the diffraction matrices, a long-short-term-memory (LSTM) module is inserted between the encoder and decoder of U-Net [26]. The LSTM module is used to exchange and fuse the information from different channels. Thirdly, the up-sampling operation in U-Net is usually realized by transpose convolution and interpolation. In order to improve the quality of the generated DNF samples, a subpixel super-resolution (SPSR) method is used in the up-sampling process [27].

In this paper, the supervised learning method is used to train the network parameters. A group of representative curvilinear masks are selected as training samples. The DNFs of the training masks are calculated using the rigorous EMF method. The proposed method in this paper is compared to the rigorous EMF simulator, fully convolution network (FCN) [28], and standard P2P-GAN (SP2P-GAN). We also conduct ablation study to verify the effectiveness of the DeConv, LSTM module and SPSR method. It shows that the proposed model outperforms other fast models in terms of prediction accuracy, and significantly accelerates the computation in contrast with the rigorous model.

The remainder of this paper is organized as follows. Section 2 introduces the data preparation method and the fundamental of SP2P-GAN model. Section 3 puts forward the improved P2P-GAN model for the fast 3D mask effect simulation. Section 4 presents the simulation results and analysis. Section 5 gives the conclusions.

2. Data preparation and standard P2P-GAN

In this section, the construction of DNF data set is introduced. In addition, the fundamental and background of the SP2P-GAN are described.

2.1 DNF data preparation

Considering the polarization of light wave, the DNF of 3D mask is usually represented by the diffraction matrices $\textrm{E}(\textrm{UV}) = {\textrm{E}_{\textrm{real}}}(\textrm{UV}) + i{\textrm{E}_{\textrm{imag}}}(\textrm{UV})$, where $\textrm{U = X}\;or\;\textrm{Y}$ and $\textrm{V = X}\;or\;\textrm{Y}$. The diffraction matrices represent the complex amplitude of mask DNF with U-polarization generated by the unit incident light field with V-polarization, where ${\textrm{E}_{\textrm{real}}}(\textrm{UV})$ and ${\textrm{E}_{\textrm{imag}}}(\textrm{UV})$ represent the real part and imaginary part of E(UV), respectively. Considering the combination of polarization states of U and V, a 3D mask can be represented by four complex-valued diffraction matrices, which can be then divided into eight real-valued matrices of the same size. Figure 2 illustrates an example of the DNF data set. Figure 2(a) shows the target mask ${\mathbf M} \in {{\mathbb R}^{{N_M} \times {N_M}}}$, and Fig. 2(b) to 2(i) show the eight real-valued diffraction matrices of the target mask, which are calculated by the rigorous EMF simulation method. It is noted that the eight diffraction matrices are coupled to each other through the amplitude and phase of the electromagnetic fields. Thus, we can stack the eight real-valued matrices in a certain order to form a data cube ${\mathbf T} \in {{\mathbb R}^{{N_M} \times {N_M} \times 8}}$ as shown in Fig. 2(j).

Fig. 2. Data set of the 3D mask DNFs: (a) target mask; (b) to (i) eight real-valued diffraction matrices calculated by rigorous EMF simulator; (j) data cube T of diffraction matrices combined in sequence.

Download Full Size | PDF

2.2 Architecture of standard P2P-GAN

The classical GAN can only generate samples according to the distribution of target data, as shown in Fig. 3 (a), so it can expand the data set. However, in this paper, we expect to predict the DNF for the target mask, so we need to pose more precise restrictions on the output data. We use labels as constraints to train the network, and use P2P-GAN to realize the automatic image-to-image translation for pixelated layout pattern, as shown in Fig. 3 (b).

Fig. 3. Network structures of: (a) traditional GAN and (b) SP2P-GAN.

Download Full Size | PDF

As shown in Fig. 3 (b), in the training stage, the input to the generator is the target mask M. The input data of the discriminator include the target mask M, the ground-truth DNF from database and the predicted DNF obtained by the generator. In this paper, the generator of P2P-GAN consists of an encoder and a decoder with the same number of layers [29]. The discriminator of P2P-GAN uses the PatchGAN structure [24]. PatchGAN maps the input data to a patch D of size $P \times P$, and each element in D represents a small area of the input data. The value of each element represents the probability that each patch is true. Then, the average over the elements in D is assigned as the final discriminator output.

Let ${\mathbf M}\ast{=} \{{{{\mathbf M}_1},{{\mathbf M}_2},\ldots ,{{\mathbf M}_\textrm{n}}} \}$ represent the set of input masks, where n represents the total number of masks. Let ${\mathbf T}\ast{=} \{{{{\mathbf T}_1},{{\mathbf T}_2},\ldots ,{{\mathbf T}_\textrm{n}}} \}$ represent the set of DNF data cubes. The generator ${\mathbf G}({\cdot} )$ and discriminator ${\mathbf D}({\cdot} )$ can be formulated as:

(1)$$\begin{array}{l} \mathop {\max }\limits_{\mathbf{G}} {\mathrm{\mathbb{E}}_{{{\mathbf{M}}_\textrm{n}}\sim {\mathbf{M}}\ast }}[{\log ({\mathbf D}({{\mathbf M}_\textrm{n}},{\mathbf G}({{\mathbf M}_\textrm{n}})))} ],\\ \mathop {\max }\limits_{\mathbf D} {\mathrm{\mathbb{E}}_{{{\mathbf M}_\textrm{n}}\sim {\mathbf M}\ast ,{{\mathbf T}_\textrm{n}}\sim {{\mathbf T}_\textrm{n}}\ast }}[{\log {\mathbf D}({{\mathbf M}_\textrm{n}},{{\mathbf T}_\textrm{n}})} ]+ {\mathrm{\mathbb{E}}_{{{\mathbf M}_\textrm{n}}\sim {\mathbf M}\ast }}[{\log (1 - {\mathbf D}({{\mathbf M}_\textrm{n}},{\mathbf G}({{\mathbf M}_\textrm{n}})))} ]. \end{array}$$

where ${\boldsymbol x}\sim {\boldsymbol y}$ means that the data x satisfies the distribution of y, and $\mathrm{\mathbb{E}}$ denotes the mathematical expectation.

In addition, in order to reduce the difference between the generator and discriminator, and enhance the network ability to capture the low-frequency information of images, we need to introduce traditional losses, such as the l₁-norm or l₂-norm. However, using l₂-norm will cause the generated image to become more blurred. On the other hand, the l₁-norm could alleviate this problem [30]. So, the l₁-norm is used in this paper:

(2)$${\mathrm{\mathbb{E}}_{{{\mathbf M}_\textrm{n}}\sim {\mathbf M}\ast ,{{\mathbf T}_\textrm{n}}\sim {{\mathbf T}_\textrm{n}}\ast }}[{{{||{{{\mathbf T}_\textrm{n}} - {\mathbf G}({{\mathbf M}_\textrm{n}})} ||}_1}} ],$$

where, ${||\cdot ||_1}$ represents the l₁-norm. As a result, the loss function of P2P-GAN is defined as:

(3)$$\begin{array}{l} loss = \arg \mathop {\min }\limits_{\mathbf G} \mathop {\max }\limits_{\mathbf D} \{{{\mathrm{\mathbb{E}}_{{{\mathbf M}_\textrm{n}}\sim {\mathbf M}\ast ,{{\mathbf T}_\textrm{n}}\sim {{\mathbf T}_\textrm{n}}\ast }}[{\log {\mathbf D}({{\mathbf M}_\textrm{n}},{{\mathbf T}_\textrm{n}})} ]+ {\mathrm{\mathbb{E}}_{{{\mathbf M}_\textrm{n}}\sim {\mathbf M}\ast }}[{\log (1 - {\mathbf D}({{\mathbf M}_\textrm{n}},{\mathbf G}({{\mathbf M}_\textrm{n}})))} ]} \}\\ \;\;\;\;\;\;\;\; + \lambda {\mathrm{\mathbb{E}}_{{{\mathbf M}_\textrm{n}}\sim {\mathbf M}\ast ,{{\mathbf T}_\textrm{n}}\sim {{\mathbf T}_\textrm{n}}\ast }}[{{{||{{{\mathbf T}_\textrm{n}} - {\mathbf G}({{\mathbf M}_\textrm{n}})} ||}_1}} ], \end{array}$$

where $\lambda$ represents the weight of the L1-loss.

3. Improved P2P-GAN model

This section describes the details of the proposed model. The improved P2P-GAN uses U-Net as the generator. U-Net is a commonly used CNN framework, which exploits the layer skip connection structure to achieve better end-to-end prediction. Considering the coupling relationship between the real parts and imaginary parts of the diffraction matrices, we set the output dimension of U-Net to 8 channels, corresponding to the eight real-valued diffraction matrices. The PatchGAN is used as the discriminator to generate a matrix. Each element of the matrix judges the authenticity of the corresponding DNF patch that is inputted to the discriminator. All elements should be normalized and then averaged. The final output represents the authenticity of the entire DNF data. It can be used to improve the clarity and accuracy of the generated results.

3.1 Deformable convolution

In the 3D mask model, the crosstalk effect between mask edges caused by the scattering phenomenon has to be considered [31]. As shown in Fig. 4, the scattering of mask edges can be divided into primary scattering effect (blue solid arrows) and secondary scattering effect (blue dotted arrows). The primary scattering effect refers to the scattering of incident light by the edge of the absorber layer, and also includes the interference effect between the scattered beams. The secondary scattering effect refers to the result that the scattered beams are again scattered by the edges on the adjacent patterns, which leads to the interaction between the edges, namely edge crosstalk. Generally, the secondary scattering effect is much weaker than the primary scattering effect. However, the thickness of absorber layer is much larger than the wavelength and close to the mask critical dimension. Therefore, the secondary scattering is more prone to happen on the EUV lithography mask, which leads to non-negligible crosstalk effect. In order to better simulate the crosstalk effect, we introduce the DeConv to sense the interaction between mask edges.

Fig. 4. Primary and secondary scattering effects at the edges of absorber layer

Download Full Size | PDF

As shown in Fig. 5 (a), the standard convolution kernel can be represented by a square matrix, where the underlying layout is sampled at fixed locations by the kernel. So, the receptive field of the network is limited and hardly to capture the neighboring edges with different scales and curvilinear contours. On the other hand, the DeConv kernel adds different offsets to the sampling positions, as shown in Fig. 5 (b). By training the offset values, the DeConv can automatically adjust the sampling scales and distributions, thus extending the receptive field to capture the crosstalk effect, as shown in Fig. 5 (c).

Fig. 5. Standard convolution and DeConv: (a) the sampling of standard convolution kernel; (b) the sampling of DeConv kernel; and (c) the offsets of sampling positions.

Download Full Size | PDF

It is noted that DeConv will introduce extra parameters and increase the computational complexity of the training process. In order to reduce the computation burden, we only introduce the DeConv in the second convolution layer of U-Net.

3.2 LSTM module

As mentioned above, the real parts and imaginary parts of the diffraction matrices are coupled together through the amplitude and phase of the light field. However, the parallel network will predict the real and imaginary parts separately, and thus destroy the coupling relationship. To solve this problem, we introduce an LSTM module to strengthen the coupling relationship between channels. LSTM module is extensively used to process the sequence data [26]. It has strong long-term memory ability and is suitable for processing the time-series with many data samples. In addition, it can solve the problems of gradient disappearance and explosion in the learning process [26].

As shown in Fig. 6, we insert the LSTM module between the encoder and decoder of U-Net. The output of the encoder is a tensor of size (B_E, C_E, W_E, H_E), where B_E, C_E, W_E, and H_E respectively represent the batch size, channel number, width and height of the output. We convert this tensor to a new one of size (B_E, C_E, ${\textrm{W}_\textrm{E}} \times {\textrm{H}_\textrm{E}}$) and input it into the LSTM module. The LSTM module is used to exchange and fuse information among different channels. Then, the output of LSTM module is resized to a tensor of size (B_E, C_E, W_E, H_E), which is used as the input of the decoder.

Fig. 6. The flowchart of the encoder, LSTM module and decoder

Download Full Size | PDF

The LSTM module is composed of multiple memory cells. The structure of a memory cell is shown in Fig. 7, which includes the forget gate, input gate and output gate [26]. We can describe these three gates respectively through Eqs. (4)–(6). The meanings of mathematical symbols in Fig. 7 and Eqs. (4)–(6) are defined as follows: ${x_t}$ is the input of memory cells at time t; ${\sigma _f}$, ${\sigma _i}$ and ${\sigma _o}$ are the activation functions of the three gates; ${W_f}$, ${W_i}$, ${W_c}$, ${W_o}$, ${V_f}$, ${V_i}$, ${V_c}$ and ${V_o}$ are the weight matrices; ${b_f}$, ${b_i}$, ${b_c}$ and ${b_o}$ are the bias vectors; ${f_t}$, ${i_t}$ and ${o_t}$ are the output values of the three gates, C_t and ${h_t}$ represent the long-term memory and short-term memory at time t.

Fig. 7. The structure of memory cell in LSTM module: the (a) forget gate; (b) input gate and (c) output gate.

Download Full Size | PDF

Figure 7 (a) shows the forget gate, which determines the amount of historical information retained. The forget gate generates an activation value ${f_t}$ of the forgotten information based on the output value ${h_{t - 1}}$ at the previous moment and the current input data ${x_t}$:

(4)$${f_t} = {\sigma _f}({W_f} \cdot {h_{t - 1}} + {V_f} \cdot {x_t} + {b_f}).$$

Figure 7 (b) shows the input gate, which is used to determine the new information stored in the cell state and update the cell state. First, we should calculate the current value $\widetilde {{C_t}}$ of the cell at time t. We need to calculate the activation value ${i_t}$ of the memorized information from ${h_{t - 1}}$ and ${x_t}$. Then, the activation value ${f_t}$ of the forgotten information are multiplied by the cell state ${C_{t - 1}}$, and the current value $\widetilde {{C_t}}$ are multiplied by the activation value ${i_t}$ of the memorized information. Finally, the cell state ${C_t}$ at time t is obtained by adding them up:

(5)$$\begin{aligned} \widetilde {{C_t}} &= \tanh ({W_c} \cdot {h_{t - 1}} + {V_c} \cdot {x_t} + {b_c}),\\ {i_t} &= {\sigma _i}({W_i} \cdot {h_{t - 1}} + {V_i} \cdot {x_t} + {b_i}),\\ {C_t} &= {f_t} \cdot {C_{t - 1}} + {i_t} \cdot \widetilde {{C_t}}. \end{aligned}$$

Figure 7 (c) shows the output gate that determines the output value of the cell. The calculation process is formulated as follows:

(6)$$\begin{aligned} {o_t} &= {\sigma _o}({W_o} \cdot {h_{t - 1}} + {V_o} \cdot {x_t} + {b_o}),\\ {h_t} &= {o_t} \cdot \tanh ({C_t}). \end{aligned}$$

Therefore, according to Eqs. (4)–(6), the input of LSTM module includes the long-term memory C_t-1, short-term memory h_t-1 and the information x_t at the current moment. The output results include the long-term memory C_t and the short-term memory h_t at the current moment.

3.3 Subpixel super-resolution sampling

The traditional up-sampling methods used in decoder include the interpolation, deconvolution and unpooling, as shown in Fig. 8. The interpolation method will blur edge and reduce the accuracy of the generated image. For the deconvolution method, when the convolutional kernel size is not divisible by the stride, it will lead to “uneven overlap” as shown in Fig. 8 (b), so that some colorful blocks in the image are darker than the surrounding blocks. Unpooling method uses 0 to fill in the blank position, as shown in Fig. 8 (c), which will introduce additional noise.

Fig. 8. Three kinds of traditional up-sampling methods: (a) interpolation, (b) deconvolution and (c) unpooling.

Download Full Size | PDF

For those reasons mentioned above, the traditional generator may reduce the resolution of the output image. In order to improve the accuracy of the generated DNF, we use the subpixel super-resolution sampling to replace the traditional up-sampling methods in the decoder [32].

SPSR sampling is a new up-sampling method that is widely used in super-resolution processing. It can reconstruct the high-resolution image from the low-resolution image without introducing additional noise, as shown in Fig. 9. The key step of SPSR is called pixelshuffle, which is to obtain the high-resolution feature images from the low-resolution feature images through the convolution and multi-channel reorganization.

Fig. 9. The flow of SPSR sampling method.

Download Full Size | PDF

When we want to convert the low-resolution feature map of size $W \times H$ into the high-resolution feature map of size $rW \times rH$ through SPSR, we need to first convert the low-resolution feature map into a tensor of size $1 \times 1 \times W \times H$. Then, the tensor is convolved with totally r² kernels. As a result, the output tensor size after convolution is $1 \times {r^2} \times W \times H$. Then, the pixel reorganization is implemented through periodic shuffling, and the dimension of the tensor is adjusted to $1 \times 1 \times rW \times rH$. Finally, this new tensor can be converted into a matrix of size $rW \times rH$, which is the desired high-resolution feature map.

In this model, the SPSR up-sampling process uses a convolutional kernel with the size of $3 \times 3$, and the convolution stride is set to 1 pixel.

3.4 Structure of improved P2P-GAN

Combined with the aforementioned methods, the generator and discriminator of the improved P2P-GAN are shown in Fig. 10 and Fig. 11, respectively. In Fig. 10, different operations are represented by arrows of different colors, and the blue box represents the size of the tensor in the current layer.

Fig. 10. The generator of the improved P2P-GAN by adding the DeConv, LSTM module and SPSR sampling.

Download Full Size | PDF

Fig. 11. The discriminator of the improved P2P-GAN using the PatchGAN structure.

Download Full Size | PDF

As shown in Fig. 11, the discriminator uses the PatchGAN structure with two inputs, one receives the single-channel image, and the other receives the eight-channel data cube. The final output is a tensor with the size of (batch_size, channel, width, height). We convert this tensor into a matrix of size (width, height), which is used to judge the authenticity of the data.

4. Simulation results and analysis

4.1 DNF data set of EUV lithography mask

In this paper, we establish our own dataset for DNF prediction, which contains a total of 120 curvilinear mask patterns with different geometric features. These mask patterns mainly include three types of features: convex corners, concave corners, and edges. The size of each mask pattern is $80\textrm{nm} \times 80\textrm{nm}$. In order to represent the masks by matrices, we convert them into the pixelated patterns with the dimension of $80\textrm{pixels} \times 80\textrm{pixels}$, where the edge length of the pixel is 1 nm. The critical dimensions of those mask patterns are below 20 nm. We randomly allocate the training set and testing set according to the ratio of 2:1. Figure 12 illustrates some examples of the training samples and testing samples.

Fig. 12. Some examples of the mask patterns in (a) the training set and (b) the testing set.

Download Full Size | PDF

In order to generate the labels for the training set, a rigorous simulation method called waveguide (WG) method is used to calculate the DNFs of the training mask patterns [33]. The conditions and parameters required in the WG method are listed in Table 1. In this simulation, we use the on-axis illumination to generate the mask DNFs.

Table 1. Conditions and parameters of rigorous EMF simulation

View Table | View all tables in this article

4.2 Network parameter setting

In the proposed model, the generator consists of 49 layers, where the first 24 layers constitutes the encoder, the next 4 layers constitute the LSTM module, and the last 21 layers constitutes the decoder. The encoder is composed of convolution layers, activation layers and down-sampling layers to extract the geometric features from the mask pattern. The down-sampling is implemented by the convolution instead of traditional pooling operation, since the traditional pooling operation will selectively discard some information from the input feature maps. Moreover, the convolution operation can still extract features during the down-sampling process. It should be noted that in this network, the third layer is DeConv layer, and the other convolution layers use standard convolution.

Layers 25 to 28 constitute the LSTM module, which are alternately composed of LSTM layer and activation layer. Layers 29 to 49 constitute the decoder, which includes the convolution layers, activation layers and up-sampling layers. The purpose of the decoder is to restore the resolution of feature map, and thus obtaining the DNF data. The up-sampling is implemented by the SPSR sampling method, and the layer skip connection is inserted after each up-sampling operation. Layer skip connection can fuse the shallow geometric information with the deep semantic information of the mask pattern, and can improve the network prediction ability. Besides, except for the last convolution layer, the normalization will be performed after each convolution ahead. The generator finally outputs 8-channel data cube.

The discriminator includes 12 layers, consisting of convolution layer, activation layer and normalization layer. Each convolution completes one down-sampling operation. There are two inputs to the discriminator, one is the 8-channel DNF data, and the other is the mask layout. We need to concatenate the input mask with the DNF data along the channel dimension, and then input the data into the discriminator, where the size of the input tensor is (batch_size, 9, 80, 80).

The discriminator outputs a tensor with the size of (batch_size, 1, 8, 8). We can convert this tensor into a matrix of size (8,8), every element of which corresponds to a small area in the DNF data. Finally, the elements in the matrix can be normalized and the mean value can be calculated to judge the authenticity of the inputted DNF.

4.3 Simulation results

In this section, we will compare the proposed improved P2P-GAN method with the rigorous EMF simulation method and some other networks, including FCN, SP2P-GAN, SP2P-GAN with only SPSR sampling, SP2P-GAN with only DeConv, SP2P-GAN with only LSTM module. Among the networks mentioned above, the FCN model was proposed in Ref. [28]. The model uses the dilated convolution to enlarge the receptive field of the network, extract the information of the surrounding environment of the mask patterns, and finally generates the DNF matrices. However, FCN model can only predict one DNF matrix at a time. In addition, we will also conduct ablation experiments to verify the effectiveness of SPSR sampling, deformable convolution and LSTM module. Based on the ground-truth data calculated by the rigorous simulation, the average runtime and relative root mean square error (RRMSE) are used as the metrics to evaluate the network performance.

The average runtime refers to the total runtime of all testing samples divided by the number of testing samples. We need to assess the accuracy of the eight predicted diffraction matrices separately. RRMSE is used to measure the prediction accuracy, which is defined as:

(7)$$\textrm{RRMSE(}{{\mathbf G}_{nf}}\textrm{,}\;{{\mathbf T}_{nf}}\textrm{) = }{{\sqrt {\frac{1}{N}\sum\limits_{i = 1}^N {{{({{G_i} - {T_i}} )}^2}} } } {\bigg /} {{T_{\max }}}},$$

where G_nf represents one of the 8 diffraction matrices calculated by the 3D mask model; T_nf represents the corresponding ground-truth diffraction matrix of the testing mask; ${G_i}$ and ${T_i}$ represent the ith elements in matrix G_nf and matrix T_nf, respectively; N refers to the total number of the elements in the diffraction matrix; ${T_{\max }}$ represents the maximum element value in T_nf.

In addition, all of the networks are built on Windows PC using Pytorch in Python. The computer has a processor of Intel Xeon CPU E5-2620 v4 @ 2.10 GHz, and has a NVIDIA GTX 1080Ti GPU. All of the networks are trained using the Adam optimizer. The number of training epoch of all networks is 500, the batch_size is 1. The learning rate for FCN is 0.0001, and the learning rate for other networks is 0.00005. In the end, the proposed method takes 6560 seconds for network training. It should be noted that the network is trained once, and the well-trained network can be repeatedly used to predict the DNFs of the same type of masks. In addition, the prediction time of the network is much less than that required by the rigorous simulator. Therefore, as the number of testing masks increases, the computational gain obtained by the deep learning methods will be significant.

Figure 13 shows the diffraction matrices calculated by different methods, where the names of methods are listed on top of the figure. On the right side of the figure is the colorbar of DNF matrices. In Fig. 13, the first row shows the mask pattern, which belongs to the test set. And the next eight rows show the eight diffraction matrices in the order of E_real(XX), E_imag(XX), E_real(XY), E_imag(XY), E_real(YX), E_imag(YX), E_real(YY) and E_imag(YY). Because the absolute values of E(XY) and E(YX) are too small, the figure cannot show them clearly in the normal display. So, we increase the values of E(XY) and E(YX) by 10000 times and show them in the figure.

Fig. 13. Diffraction matrices of a 3D mask calculated by different methods.

Download Full Size | PDF

In addition, in order to illustrate the calculation errors of different deep learning methods, we provide the difference plots in the supplemental file, which indicate the differences between the diffraction matrices calculated by the rigorous method and the deep learning methods. It shows that the DNFs predicted by the improved P2P-GAN model are most closest to the ground truth. It is noted that the proposed model can be generalized to other kinds of patterns, if the geometric features in those patterns can be covered by the training dataset. If the new set of patterns are very different from the existing training dataset, we need to retain the model based on the selected samples from the new patterns.

Table 2 lists the average runtimes and RRMSEs obtained by different methods. The second row in Table 2 corresponds to the rigorous EMF method, which is used as the benchmark and ground truth for the predicted results. The third and fourth row show the results of the traditional FCN and SP2P-GAN models. Rows 5-7 show the results of the SP2P-GAN models combined with the SPSR sampling method, DeConv or LSTM module separately, which are used to verify the improvements contributed by each additional feature. Rows 8-10 show the ablation experiments of the proposed models. The eighth row shows the results of the proposed model where only the LSTM module is removed. The ninth row shows the results of the proposed model where only the DeConv is removed. The tenth row shows the results of the proposed model where only SPSR sampling method is replaced by the traditional interpolation method. Row 11 shows the results of the complete model proposed in this paper. We verify the necessity of each module by comparing the performance metrics in Rows 7-10.

Table 2. The performance metrics of different methods

View Table | View all tables in this article

The best results of each diffraction matrix are marked in bold font. Because the values of E(XY) and E(YX) are very small, they usually have little influence on the wafer image. On the other hand, the values of E(XX) and E(YY) will significantly influence the final lithography image on wafer. By observing Table 2, we can find that the improved P2P-GAN obtains the best prediction results of all DNF matrices. In addition, the proposed method accelerates the speed by 128 folds compared to the rigorous EMF simulator. And because the FCN model can only predict one DNF matrix at a time, the speed of the proposed model is more than 10 times faster than the FCN model when predicting eight DNF matrices. Although the speed of the proposed method is slower than some other networks, its runtime is also acceptable for the application in large scale simulation. What is more, the proposed method greatly improves the prediction accuracy compared to other networks. For example, compared with the traditional FCN, the prediction accuracies of the E_real(XX), E_imag(XX), E_real(YY) and E_imag(YY) have been improved by 55.6%, 50.6%, 53.7% and 54.7%, respectively.

Through the ablation experiments in Rows 8-10 we can find that the prediction error of the network has increased after removing any of the SPSR sampling, DeConv and LSTM module. Therefore, these three features are useful and contribute to the improvement of prediction accuracy.

In this study, we used the isolated mask patterns in the simulations. In fact, the proposed model can be extended to the lithography masks with repetitive fine patterns. In this case, the rigorous simulator can be used to calculate the mask DNF data based on the periodic mask boundary condition. In addition, the near-field distribution of a local mask feature is mainly influenced by its adjacent mask features, and the far end of the periodic patterns has little influence on the DNF at the central position. Thus, the impact of repetitive patterns on the DNF depends on the pitch of the periodic patterns.

5. Conclusion

This paper developed an improved P2P-GAN model for the fast and accurate simulation of 3D mask DNF in EUV lithography. First, DeConv was introduced in the generator network to sense the crosstalk effect between mask edges. Then, the LSTM module was exploited to fuse the information between the real parts and the imaginary parts of diffraction matrices. Finally, the SPSR up-sampling method was used to further improve the resolution and accuracy of the predicted DNF data. Additionally, the proposed model can predict all of the eight diffraction matrices simultaneously, which is beneficial to reduce the computation time. The superiority of the proposed model was proven by the comparison in both of computational efficiency and simulation accuracy. The ablation experiments were also conducted to verify the effectiveness of each feature in the proposed method. In the future, we will consider to use different methods, including the attention mechanism to improve the performance of our model. In addition, we can try to improve the generalization ability of the model under different physical conditions.

Funding

State Key Lab of Digital Manufacturing Equipment and Technology (DMETKF2022011).

Acknowledgments

This work is partially supported by State Key Lab of Digital Manufacturing Equipment and Technology (DMETKF2022011).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. B. Wu and A. Kumar, “Extreme ultraviolet lithography: a review,” J. Vac. Sci. Technol. B 25(6), 1743–1761 (2007). [CrossRef]

2. T. Aota and T. Tomie, “Ultimate Efficiency of Extreme Ultraviolet Radiation from a Laser-Produced Plasma,” Phys. Rev. Lett. 94(1), 015004 (2005). [CrossRef]

3. H. Zhang, S. Li, and X. Wang, “Fast Simulation Method of Extreme-Ultraviolet Lithography 3D Mask Based on Variable Separation Degration Method,” Acta Opt. Sin. 37(5), 0505001 (2017). [CrossRef]

4. J. Tirapu-Azpiroz and E. Yablonovitch, “Fast evaluation of photomask near-fields in subwavelength 193-nm lithography,” Proc. SPIE 5377, 1528–1535 (2004). [CrossRef]

5. Y. Wei, Advanced lithography theory and application of VLSI. (Science Press, Beijing, China, 2016).

6. A. Erdmann, D. Xu, P. Evanschitzky, V. Philipsen, V. Luong, and E. Hendrickx, “Characterization and mitigation of 3D mask effects in extreme ultraviolet lithography,” Adv. Opt. Technol. 6(3-4), 187–201 (2017). [CrossRef]

7. Y. Cao, X. Wang, Y. Tu, and P. Bu, “Impact of mask absorber thickness on the focus shift effect in extreme ultraviolet lithography,” J. Vac. Sci. Technol., B: Nanotechnol. Microelectron.: Mater., Process., Meas., Phenom. 30(3), 031602 (2012). [CrossRef]

8. J. Lagrone and T. Hagstrom, “Double absorbing boundaries for finite-difference time-domain electromagnetics,” J. Comput. Phys. 326, 650–665 (2016). [CrossRef]

9. X. Xiang and M. Escuti, “Numerical modeling of polarization gratings by rigorous coupled wave analysis,” Proc. SPIE 9769, 976918 (2016). [CrossRef]

10. H. Mesilhy, P. Evanschitzky, G. Bottiglieri, C. van Lare, E. van Setten, and A. Erdmann, “Investigation of waveguide modes in EUV mask absorbers,” Materials, and Metrology J. Micro/Nanopattern. Mats. Metro. 20(02), 021004 (2021). [CrossRef]

11. G. L. Wojcik, J. Mould, R. A. Ferguson, R. M. Martino, and K. K. Low, “Some image modeling issues for I-line, 5X phase-shifting masks,” Proc. SPIE 2197, 455–465 (1994). [CrossRef]

12. K. Adam, Domain decomposition methods for the electromagnetic simulation of scattering from three-dimensional structures with applications in lithography. University of California:Berkeley, 2001.

13. J. Tirapu-Azpiroz, P. Burchard, and E. Yablonovitch, “Boundary layer model to account for thick mask effects in photolithography,” Proc.SPIE 5040, 1611–1619 (2003). [CrossRef]

14. P. Liu, Y. Cao, L. Chen, G. Chen, M. Feng, J. Jiang, H. Liu, S. Suh, S. Lee, and S. Lee, “Fast and accurate 3D mask model for full-chip OPC and verification,” Proc. SPIE 6520, 65200R (2007). [CrossRef]

15. C. H. Clifford, M. J. Lercel, and A. R. Neureuther, “Fast simulation of buried EUV mask defect interaction with absorber features,” Proc SPIE 6517, 65170A (2007). [CrossRef]

16. Y. T. Cao, X. Wang, A. Erdmann, P. Bu, and Y. Bu, “Analytical model for EUV mask diffraction field calculation,” Proc. SPIE 8171, 81710N (2011). [CrossRef]

17. J. S. Yang and A. R. Neureuther, “Crosstalk Noise Variation Assessment and Analysis for the Worst Process Corner,” International Symposium on Quality Electronic Design IEEE, (2008).

18. X. Ma, X. Zhao, Z. Wang, Y. Li, S. Zhao, and L. Zhang, “Fast lithography aerial image calculation method based on machine learning,” Appl. Opt. 56(23), 6485–6495 (2017). [CrossRef]

19. Z. Li, L. Dong, X. Jing, X. Ma, and Y. Wei, “High-precision lithography thick-mask model based on a decomposition machine learning method,” Opt. Express 30(11), 17680–17697 (2022). [CrossRef]

20. H. Tanabe, S. Sato, and A. Takahashi, “Fast EUV lithography simulation using convolutional neural network,” J. Micro/Nanopattern. Mats. Metro. 20(04), 41202 (2021). [CrossRef]

21. A. Awad, P. Brendel, P. Evanschitzky, D. Woldeamanual, A. S. Rosskopf, and A. Erdmann, “Accurate prediction of EUV lithographic images and 3D mask effects using generative networks,” J. Micro/Nanopattern. Mats. Metro. 20(04), 043201 (2021). [CrossRef]

22. J. Lin, L. Dong, T. Fan, X. Ma, and Y. Wei, “Fast aerial image model for EUV lithography using the adjoint fully convolutional network,” Opt. Express 30(7), 11944–11958 (2022). [CrossRef]

23. O. Ronneberger, P. Fischer, and T. Brox, “U-net: convolutional networks for biomedical image segmentation,” Proc. MIC and CAI, 234–241 (2015).

24. U. Demir and G. Unal, “Patch-Based Image Inpainting with Generative Adversarial Networks,” arXiv, arXiv:1803.07422, (2018). [CrossRef]

25. J. Dai, H. Qi, Y. Xiong, Y. Li, G. Zhang, H. Hu, and Y. Wei, “Deformable Convolutional Networks,” arXiv, arXiv:1703.06211 (2017). [CrossRef]

26. S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Computation 9(8), 1735–1780 (1997). [CrossRef]

27. W. Shi, J. Caballero, F. Huszár, J. Totz, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” Proc. CVPR, 1874–1883 (2016).

28. J. Lin, L. Dong, T. Fan, X. Ma, and Y. Wei, “Fast mask near-field calculation using fully convolution network,” Proc. IWAPS, 1–4 (2020).

29. J. Masci, U. Meier, Dan Cireşan, and J. Schmidhuber, “Stacked convolutional auto-encoders for hierarchical feature extraction,” Proc. Int. Conf. Artif. Neural Netw. (ICANN), pp. 52–59 (2011).

30. P. Isola, J. Zhu, T. Zhou, and A. A. Efros, “Image-to-Image Translation with Conditional Adversarial Networks,” Proc. CVPR, pp. 5967–5976, (2017).

31. P. Liu, X. Xie, W. Liu, and K. Gronlund, “Fast 3D thick mask model for full-chip EUVL simulations,” Proc. SPIE 8679, 86790W (2013). [CrossRef]

32. H. Yang, S. Li, Z. Deng, Y. Ma, B. Yu, and E. F. Y. Young, “GAN-OPC: Mask Optimization With Lithography-Guided Generative Adversarial Nets,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 39(10), 2822–2834 (2020). [CrossRef]

33. K. D. Lucas, H. Tanabe, and A. J. Strojwas, “Efficient and rigorous three-dimensional model for optical lithography simulation,” J. Opt. Soc. Am. A 13(11), 2187–2199 (1996). [CrossRef]

Conditions		Parameters
Illumination system	Wavelength	13.5nm
	Polarization	None
	Angle of incidence	6°
	Sampled point coordinate	(0,0)
EUV lithography mask	Material of reflective multi-layer	Mo/Si
	Complex refractive index of reflective multi-layer	Mo: 0.92108-0.00644j
	Complex refractive index of reflective multi-layer	Si: 0.99932-0.00183j
	Thickness of reflective multi-layer	Mo: 3 nm/Si: 4 nm (40 pairs)
	Material of absorber layer	TaN
	Complex refractive index of absorber layer	0.93025-0.04338j
	Thickness of absorber layer	60nm
	Critical dimension	Below 20nm

Methods	E_real(XX)	E_imag(XX)	E_real(XY)	E_imag(XY)	E_real(YX)
Rigorous EMF method	-	-	-	-	-
FCN	4.77%	5.77%	8.62%	8.16%	9.18%
SP2P-GAN	2.90%	3.75%	7.71%	7.47%	6.37%
SP2P-GAN with only SPSR sampling	2.82%	3.44%	7.96%	7.69%	6.38%
SP2P-GAN with only DeConv	2.50%	3.46%	7.27%	7.42%	5.72%
SP2P-GAN with only LSTM module	2.69%	3.27%	7.65%	7.72%	6.17%
Improved P2P-GAN without LSTM module	2.27%	3.11%	7.28%	8.20%	5.67%
Improved P2P-GAN without DeConv	2.41%	3.00%	8.18%	7.85%	6.46%
Improved P2P-GAN without SPSR sampling	2.35%	3.11%	7.33%	7.56%	5.75%
Improved P2P-GAN (SPSR sampling + DeConv + LSTM module)	2.12%	2.85%	7.22%	7.31%	5.64%
Methods	E_imag(YX)	E_real(YY)	E_imag(YY)	Average runtime
Rigorous EMF methods	-	-	-	3.2s
FCN	7.88%	5.05%	5.76%	0.264s
SP2P-GAN	6.40%	2.96%	3.41%	0.012s
SP2P-GAN with only SPSR sampling	6.40%	2.79%	3.27%	0.01s
SP2P-GAN with only DeConv	5.91%	2.62%	3.17%	0.021s
SP2P-GAN with only LSTM module	6.27%	2.73%	2.99%	0.018s
Improved P2P-GAN without LSTM module	6.20%	2.47%	2.90%	0.02s
Improved P2P-GAN without DeConv	6.16%	2.63%	2.85%	0.017s
Improved P2P-GAN without SPSR sampling	6.06%	2.53%	2.96%	0.028s
Improved P2P-GAN (SPSR sampling + DeConv + LSTM module)	5.81%	2.34%	2.61%	0.025s

Conditions		Parameters
Illumination system	Wavelength	13.5nm
	Polarization	None
	Angle of incidence	6°
	Sampled point coordinate	(0,0)
EUV lithography mask	Material of reflective multi-layer	Mo/Si
	Complex refractive index of reflective multi-layer	Mo: 0.92108-0.00644j
	Complex refractive index of reflective multi-layer	Si: 0.99932-0.00183j
	Thickness of reflective multi-layer	Mo: 3 nm/Si: 4 nm (40 pairs)
	Material of absorber layer	TaN
	Complex refractive index of absorber layer	0.93025-0.04338j
	Thickness of absorber layer	60nm
	Critical dimension	Below 20nm

Methods	E_real(XX)	E_imag(XX)	E_real(XY)	E_imag(XY)	E_real(YX)
Rigorous EMF method	-	-	-	-	-
FCN	4.77%	5.77%	8.62%	8.16%	9.18%
SP2P-GAN	2.90%	3.75%	7.71%	7.47%	6.37%
SP2P-GAN with only SPSR sampling	2.82%	3.44%	7.96%	7.69%	6.38%
SP2P-GAN with only DeConv	2.50%	3.46%	7.27%	7.42%	5.72%
SP2P-GAN with only LSTM module	2.69%	3.27%	7.65%	7.72%	6.17%
Improved P2P-GAN without LSTM module	2.27%	3.11%	7.28%	8.20%	5.67%
Improved P2P-GAN without DeConv	2.41%	3.00%	8.18%	7.85%	6.46%
Improved P2P-GAN without SPSR sampling	2.35%	3.11%	7.33%	7.56%	5.75%
Improved P2P-GAN (SPSR sampling + DeConv + LSTM module)	2.12%	2.85%	7.22%	7.31%	5.64%
Methods	E_imag(YX)	E_real(YY)	E_imag(YY)	Average runtime
Rigorous EMF methods	-	-	-	3.2s
FCN	7.88%	5.05%	5.76%	0.264s
SP2P-GAN	6.40%	2.96%	3.41%	0.012s
SP2P-GAN with only SPSR sampling	6.40%	2.79%	3.27%	0.01s
SP2P-GAN with only DeConv	5.91%	2.62%	3.17%	0.021s
SP2P-GAN with only LSTM module	6.27%	2.73%	2.99%	0.018s
Improved P2P-GAN without LSTM module	6.20%	2.47%	2.90%	0.02s
Improved P2P-GAN without DeConv	6.16%	2.63%	2.85%	0.017s
Improved P2P-GAN without SPSR sampling	6.06%	2.53%	2.96%	0.028s
Improved P2P-GAN (SPSR sampling + DeConv + LSTM module)	5.81%	2.34%	2.61%	0.025s

Fast diffraction model of lithography mask based on improved pixel-to-pixel generative adversarial network

Abstract

1. Introduction

2. Data preparation and standard P2P-GAN

2.1 DNF data preparation

2.2 Architecture of standard P2P-GAN

3. Improved P2P-GAN model

3.1 Deformable convolution

3.2 LSTM module

3.3 Subpixel super-resolution sampling

3.4 Structure of improved P2P-GAN

4. Simulation results and analysis

4.1 DNF data set of EUV lithography mask

4.2 Network parameter setting

4.3 Simulation results

5. Conclusion

Funding

Acknowledgments

Disclosures

Data availability

Supplemental document

References

Supplementary Material (1)

Data availability

Cited By

Figures (13)

Tables (2)

Equations (7)

Optics Express