Multislice forward modeling of coherent surface scattering imaging on surface and interfacial structures

Peco Myint; Miaoqi Chu; Ashish Tripathi; Michael J. Wojcik; Jian Zhou; Mathew J. Cherukara; Suresh Narayanan; Jin Wang; Zhang Jiang

doi:10.1364/OE.481401

1. Introduction

With the continually growing coherent flux from accelerator-based X-ray sources, it has become possible to resolve smaller spatial length scales in shorter experimental times, enabling new surface imaging techniques to emerge. Recent developments in state-of-the-art X-ray focusing techniques provide new insights into materials research [1]. New techniques are being developed to take advantage of the higher brilliance and coherence of X-rays to image or characterize surfaces at nano to sub-nano scales. For imaging small non-crystalline or heterogeneous structures on thick opaque substrates, coherent X-rays are essential to obtain surface scattering images that carry rich spatial information, hence the name coherent surface scattering imaging (CSSI) [2]. Because a detector records only the scattered intensity, phase information is lost. Conventional coherent diffraction imaging (CDI) methods therefore resort to iterative forward and inverse Fourier transformations to reconstruct the real-space information. However, these conventional phase retrieval algorithms do not work straightforwardly for CSSI because of the dynamical scattering that occurs in the grazing-incidence geometry. An appropriate model that can reproduce experimental scattering images in the surface scattering geometry is therefore critical not only to explain experimental scattering patterns, but also to perform reconstructions.

CSSI, as a surface-sensitive technique performed in the grazing-incidence geometry, shares many characteristics with conventional grazing-incidence small-angle X-ray scattering (GISAXS) for probing nanostructures at surfaces or in thin films. The most significant phenomenon is the dynamical scattering effect, which arises from multiple scattering events of the photons due to the strong reflection from the substrate or film surface as well as the large X-ray illumination footprint. This phenomenon cannot be handled by a single forward Fourier transform of the probed structures, as adopted in the kinematic approximation for CDI analysis. A thorough understanding and the ability to reproduce the complex dynamical scattering patterns are necessary to extract accurate structural information. The distorted wave Born approximation (DWBA) is most often employed to deal with this situation [3,4]. In contrast to assuming a constant plane-wave-like incident electric field as in the kinematic approximation, the DWBA takes into account the strong substrate reflection (i.e., the major source of distortion to the incident electric field) by means of a height-dependent electric field along the surface normal direction. Because this field varies only with height, DWBA can only provide structural information statistically averaged in the sample surface plane rather than an absolute structure. In the presence of a heterogeneous structure with in-plane feature sizes comparable to the coherence length of the incident beam, the convenience of the in-plane statistical averaging in the DWBA vanishes. The accuracy of the DWBA deteriorates further if the electric field is strongly distorted by the presence of heavy elements in the nanostructures. Hence arises the need to calculate the absolute location-dependent three-dimensional scattering field.

In this paper, we present an approach based on the multislice method to tackle the above challenges encountered in solving for nanostructures on surfaces and in thin films from coherent grazing-incidence scattering patterns. The multislice method naturally handles the three-dimensional potential field by iteratively computing the scattering from stacked two-dimensional slices and can accurately reproduce dynamical scattering effects. This is done without assuming a closed-form reciprocal-space shape factor, as is often adopted in the DWBA method. In other words, the multislice method treats dynamical scattering by localized scattering potentials by considering free-space propagation within the material slab (through which the wave propagates) separately from the phase and amplitude shifts obtained under the projection approximation. The multislice method was originally proposed by Cowley and Moodie [5] to deal with the scattering of electrons by three-dimensional potential fields. It was later developed by Goodman and Moodie [6] into a numerical implementation on a computer. Multislice has long been used in acoustics and electron beam microscopy and has now been adapted to model the interaction of soft X-rays with an object of arbitrary shape and composition, given knowledge of its optical constants, the wavelength, and the beam profile. Multislice methods have been implemented in numerical calculations of optical interactions in electron [7,8] and X-ray [9–11] optics, but all of them were in transmission geometry. Recently, the multislice approach has been shown to be applicable to an even broader range of phenomena including X-ray reflectivity, where the substrate is part of the sample system. For example, Kenan Li et al. illustrated total external reflection through multislicing [12].
Building on these results, we demonstrate that multislicing in grazing-incidence geometry can be used to study the 3D structure of substrate-supported surface patterns.

$$\begin{aligned} \phi_\omega(X,Y,Z=Z_0+\Delta) = {}& \exp(ik\Delta)\, \mathcal{F}^{-1} \Bigg\{ \exp\Big(-i\Delta\frac{k_X^2+k_Y^2}{2k}\Big) \times \mathcal{F}\Bigg\{ \\ & \exp\Big(\frac{k}{2i} \int_{Z=Z_0}^{Z=Z_0+\Delta} [1-n_\omega^2(X,Y,Z)] \,dZ \Big)\, \phi_\omega(X,Y,Z=Z_0)\Bigg\}\Bigg\} \end{aligned} \tag{1}$$

Our multislice formalism for reflection geometry requires that a three-dimensional object be created, from which two-dimensional slices are made for the wave to propagate through and interact with. Wave propagation through a single slice is described by Eq. (1), which is derived in detail from the Helmholtz equation [13], where $k$, $n_\omega$, $\Delta$, and $\phi _\omega$ represent the wavenumber of the X-ray, the refractive index (of a pixel) at the chosen X-ray wavenumber, the thickness of a single slice along $Z$, and the incoming complex wave, respectively. In Eq. (1), the incoming wave $\phi _\omega$ is first multiplied by a transmission factor, the exponential of the refractive-index term integrated over a slice of thickness $\Delta$, and is then Fourier transformed, multiplied by the free-space propagator, and inverse Fourier transformed to give the exit wave of the slice. By sequentially using the exit wave of one slice as the incoming wave of the next, the final exit wave is obtained. The final exit wave can then be Fourier transformed and its modulus squared to obtain the far-field scattering intensities as one would measure in real experiments. This iterative procedure is illustrated in Fig. 1(a), where each slice can be seen as a projected two-dimensional object consisting of substrate, pattern, and air/vacuum.
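A single application of Eq. (1) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the GPU implementation used in this work: the function name `multislice_step`, the square pixel size `dx`, the grid size, the probe width, and the refractive-index value of the absorbing slice are all hypothetical choices made for the example, and the refractive index is taken to be constant across the slice thickness so that the integral in Eq. (1) reduces to a product. For $n_\omega = 1-\delta+i\beta$ with small $\delta$ and $\beta$, the transmission factor is approximately $\exp(-ik\delta\Delta)\exp(-k\beta\Delta)$, i.e., a phase shift and an attenuation.

```python
import numpy as np

def multislice_step(phi, n_slice, k, delta_z, dx):
    """Propagate a 2D complex wave through one slice, following Eq. (1).

    phi     : 2D complex array, wave entering the slice
    n_slice : 2D complex array, refractive index in the slice (assumed
              constant over the slice thickness delta_z)
    k       : wavenumber, 2*pi/wavelength
    delta_z : slice thickness
    dx      : lateral pixel size (square pixels assumed)
    """
    ny, nx = phi.shape
    # Transmission factor exp((k / 2i) * (1 - n^2) * delta_z)
    transmission = np.exp(k / 2j * (1.0 - n_slice**2) * delta_z)
    # Angular spatial frequencies k_X, k_Y on the FFT grid
    kx = 2 * np.pi * np.fft.fftfreq(nx, d=dx)
    ky = 2 * np.pi * np.fft.fftfreq(ny, d=dx)
    kx2d, ky2d = np.meshgrid(kx, ky)
    # Paraxial free-space propagator over the slice thickness
    propagator = np.exp(-1j * delta_z * (kx2d**2 + ky2d**2) / (2 * k))
    exit_wave = np.fft.ifft2(propagator * np.fft.fft2(transmission * phi))
    return np.exp(1j * k * delta_z) * exit_wave

# Illustrative numbers: only the 7.36 keV energy and the 200 nm slice
# thickness come from the text; everything else is a placeholder.
wavelength = 12.398e-10 / 7.36            # metres
k = 2 * np.pi / wavelength
dx, delta_z, size = 50e-9, 200e-9, 64
coords = (np.arange(size) - size // 2) * dx
xx, yy = np.meshgrid(coords, coords)
probe = np.exp(-(xx**2 + yy**2) / (2 * (0.4e-6) ** 2)).astype(complex)

vacuum = np.ones((size, size), dtype=complex)
absorber = np.full((size, size), 1 - 1e-5 + 1e-6j)   # placeholder index

out_vacuum = multislice_step(probe, vacuum, k, delta_z, dx)
out_absorber = multislice_step(probe, absorber, k, delta_z, dx)

# Far-field intensity of an exit wave: modulus squared of its Fourier transform
far_field = np.abs(np.fft.fftshift(np.fft.fft2(out_vacuum))) ** 2
```

Chaining calls to `multislice_step`, feeding each exit wave into the next slice, gives the sequential propagation described above. A vacuum slice leaves the total intensity unchanged, while a slice with nonzero $\beta$ attenuates it.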

Fig. 1. Schematic diagrams of multislice and the CSSI experimental setup: (a) Diagram explaining multislice, which is also described by Eq. (1). $N$ is the total number of $Z$ slices. $n$ is the refractive index. $\Delta$ is the $Z$ thickness of each slice. $T(X,Y)$ is a 2D transfer function in the $\hat {e}_X$-$\hat {e}_Y$ plane, integrated along $\hat {e}_Z$ over the thickness $\Delta$, containing substrate, pattern, and air/vacuum. $C$ is a complex constant array containing information about the coordinates and X-ray energy. $\phi _0$ is the incident complex (with phase) X-ray probe along $\hat {e}_Z$, with wavenumber $k_i$. $\phi _1$, $\phi _2$, $\phi _3$, and $\phi _N$ are the exit waves at slice 1, slice 2, slice 3, and slice $N$. The final detector image is the free-propagated final exit wave $\phi _N^{\textrm {free propagated}}$, Fourier transformed and modulus squared; thus the phase information is absent, as in experiments. (b) Experiment depicted in the lab frame of reference. (c) Experiment depicted in the sample frame of reference.


The multislice method in transmission geometry has been implemented with hardware acceleration provided by graphics processing units (GPUs) [14]. The advantage of a GPU is that the forward calculations can be sped up by a factor of 60-100 compared to using only a CPU. The prevalent use of GPUs in machine learning has led to the development of automatic differentiation tools such as PyTorch, a Python package for tensor operations and neural network optimization with strong GPU acceleration [15], which is integrated into our multislice-assisted reconstructions, as discussed in Subsection 4.2. PyTorch keeps track of every numerical operation in a computational graph, which allows differentiation with respect to specific parameters in a complex model such as multislice. Automatic differentiation has already been utilized in holographic and ptychographic image reconstruction algorithms [16,17]. For our CSSI geometry, 3D samples with different materials (or refractive indexes) can be created on a GPU by voxelizing a pattern into 3D matrices, with each element representing the integrated refractive index of whatever materials make up the substrate, the pattern, and the air/vacuum above it. By keeping multislice computations entirely within a GPU, one can efficiently manipulate the 3D matrices, for example by rotating, scaling, and extending boundaries, to emulate experimental conditions and perform realistic multislice forward calculations.
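The voxelization step can be illustrated with a small CPU-side sketch in NumPy. Everything here is an assumption made for illustration: the function name `build_volume`, the axis ordering (height, width, length), the grid dimensions, and in particular the three refractive-index values, which are order-of-magnitude placeholders rather than tabulated optical constants. The same array could in principle be created on a GPU with CuPy or PyTorch.

```python
import numpy as np

# Placeholder refractive indices n = 1 - delta + i*beta. These are
# order-of-magnitude stand-ins, NOT tabulated optical constants.
N_VACUUM = 1.0 + 0.0j
N_SUBSTRATE = 1.0 - 1.0e-5 + 1.0e-7j
N_PATTERN = 1.0 - 5.0e-5 + 5.0e-6j

def build_volume(shape, substrate_rows, pattern_box):
    """Return a complex 3D array indexed as (height y, width x, length z).

    shape          : (ny, nx, nz) number of voxels
    substrate_rows : number of rows, counted from y index 0, filled with substrate
    pattern_box    : (height, x_start, x_stop, z_start, z_stop) in voxels,
                     a rectangular pattern sitting directly on the substrate
    """
    volume = np.full(shape, N_VACUUM, dtype=complex)
    volume[:substrate_rows] = N_SUBSTRATE
    height, x0, x1, z0, z1 = pattern_box
    volume[substrate_rows:substrate_rows + height, x0:x1, z0:z1] = N_PATTERN
    return volume

volume = build_volume(shape=(16, 32, 40), substrate_rows=6,
                      pattern_box=(4, 12, 20, 5, 35))

# Before any tilt is applied, a 2D slice perpendicular to z is volume[:, :, iz]
first_slice = volume[:, :, 0]
```

In this sketch each voxel holds a single material. A voxel that straddles a material boundary would instead hold an averaged (integrated) refractive index, as described in the text.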

Micro-scale 3D patterns used to obtain experimental and simulated CSSI images are listed in Table 1. These samples are smaller than the total illumination footprint of the incident probe. All experimental data acquired from samples listed there are explained by the multislice forward model in Section 3. Reconstructions using the multislice model are demonstrated in Section 4 in the form of coherent surface scattering imaging-coherent diffraction imaging (CSSI-CDI).

Table 1. List of samples and specifications


2. Experimental setups

The experiments were conducted with a CSSI prototype at beamline 8-ID of the Advanced Photon Source, Argonne National Laboratory, at an X-ray energy of 7.36 keV. The incident angles vary from around 0.3$^{\circ }$ to 1$^{\circ }$, and dynamical scattering is most pronounced at exit angles below 0.6$^{\circ }$. It is also important to note that the critical angles of total external reflection of gold and silicon, the two most common materials in our samples, determine this dynamical scattering range. For the experiments described here, an incident angle of 0.5$^{\circ }$ was chosen to observe pronounced dynamical scattering. The CSSI geometry enables X-ray penetration into our samples of tens to hundreds of nanometers, yielding high sensitivity to depth and lateral spatial information. Our samples have either uniform or non-uniform depth profiles, which allows us to check whether CSSI experiments and multislice simulations agree. A coherent X-ray beam of $\sim$ 2$\mu$m $\times$ 2$\mu$m FWHM (measured experimentally by knife-edge scans and well approximated by a Gaussian profile) is used so that the full sample resides in the illuminated area for CDI experiments; in the chosen CSSI geometry the X-ray footprint is stretched along the direction of the probe, ensuring that the elongated samples are fully covered by the coherent X-ray beam. The detector used in the experiments is a Lambda-2M from X-Spectrum with a pixel size of 55$\mu$m $\times$ 55$\mu$m, placed at a measured distance of 3.82 m from the sample. The CSSI geometry is portrayed in two schematic diagrams in Fig. 1: (b) shows the CSSI sample in the lab frame of reference (coordinates: $\hat {e}_X$, $\hat {e}_Y$, $\hat {e}_Z$) with an X-ray incident angle $\alpha _i$, whereas (c) shows the sample frame of reference (coordinates: $\hat {x}$, $\hat {y}$, $\hat {z}$), where $\alpha _{i}$, $\alpha _{f}$, and $\psi$ are the X-ray incident angle, the X-ray exit angle (in the $\hat {z}-\hat {y}$ plane), and the X-ray exit angle (in the $\hat {x}-\hat {y}$ plane), respectively. Equation (2) summarizes an important step of the multislice simulation: the index of refraction in the sample frame of reference is transformed into that in the lab frame of reference through yaw and pitch rotations, using the rotation matrices in Eq. (3). It is important to understand the relationship between the two frames of reference because scattering experiments are normally understood in the sample frame, whereas the lab frame is the one used by the coherent imaging community and is more straightforward for setting up the Fresnel propagation simulations in our work. The final scattering pattern on the detector is the same regardless of the choice of frame of reference.

$$n_\omega(X,Y,Z) = R_{\textrm{pitch}}(\alpha_i)\, R_{\textrm{yaw}}(\psi)\, n_\omega(x,y,z) \tag{2}$$

$$R_{\textrm{pitch}}(\alpha_i)=\begin{bmatrix}1 & 0 & 0\\ 0 & \cos \alpha_i & -\sin \alpha_i\\ 0 & \sin \alpha_i & \cos \alpha_i\end{bmatrix},\quad R_{\textrm{yaw}}(\psi)=\begin{bmatrix}\cos \psi & 0 & \sin \psi\\ 0 & 1 & 0\\ -\sin \psi & 0 & \cos \psi\end{bmatrix} \tag{3}$$
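The two rotation matrices of Eq. (3) and their product in Eq. (2) can be written down directly. The sketch below only builds the matrices and applies them to a coordinate vector; the function names `r_pitch` and `r_yaw` are chosen here for illustration, and the angles are the example values of 0.5$^{\circ }$ incidence and -0.5$^{\circ }$ in-plane rotation. Rotating a full voxel volume additionally requires resampling onto the new grid by interpolation, which is not shown.

```python
import numpy as np

def r_pitch(alpha_i):
    """Pitch rotation of Eq. (3): rotation about the first (X) axis."""
    c, s = np.cos(alpha_i), np.sin(alpha_i)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s, c]])

def r_yaw(psi):
    """Yaw rotation of Eq. (3): rotation about the second (Y) axis."""
    c, s = np.cos(psi), np.sin(psi)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

alpha_i = np.deg2rad(0.5)    # incident angle
psi = np.deg2rad(-0.5)       # in-plane rotation angle

# Combined rotation in the order written in Eq. (2)
rotation = r_pitch(alpha_i) @ r_yaw(psi)

# Image of the unit vector along the third axis under the combined rotation
rotated_axis = rotation @ np.array([0.0, 0.0, 1.0])
```

Because both factors are proper rotations, the product is orthogonal with unit determinant, so distances between voxels are preserved by the change of frame.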

The pattern with uniform height (the elongated rod A in Table 1) was made by e-beam lithography, whereas the non-uniform three-dimensional platinum pattern (the FIB structure in Table 1) was made by focused ion beam (FIB) deposition.

3. Fast computing multislice simulations in GPU and experimental validation

The simulations are done in the lab frame of reference, meaning that the X-ray probe travels straight to meet an object that is tilted at the desired X-ray incident angle. The simulations do not require high resolution when slicing the object along the direction of the probe because of the small field of view (i.e., numerical aperture) along the $\hat {e}_Z$ direction in the grazing-incidence geometry. Higher resolution of the sample along the footprint direction can be achieved via in-plane tomographic reconstruction, which is not the focus of this paper and will be discussed elsewhere. The appropriate thickness $\Delta$ of an individual $Z$ slice (along $\hat {e}_Z$) and the sizes of the voxels in the lateral dimensions were inferred from the experimental numerical apertures, which correspond to the maximum exit wave vector $k_{f}$ reached with the flux available in the experiment. The X-ray footprint determines the total number of slices needed for the entire simulation. A finer slice thickness leads to longer computation times without necessarily yielding a finer 3D electric field for the final scattering pattern. A thickness of 200 nm per $Z$ slice is observed to be a good compromise between speed and accuracy for our working X-ray incident angles; in other words, selecting $\Delta \;<$ 200 nm does not yield a significantly different scattering image and only prolongs the simulation. The lateral dimensions of a voxel (in the field-of-view plane perpendicular to the direction of the probe) define the finest angular resolution the simulation can provide. To reproduce the resolution and wavenumber range achieved in experiments, a length scale of 5 nm (0.2 nm$^{-1}$ in wavenumber) per voxel is the minimum requirement, meaning that a very large 3D matrix of refractive indexes is necessary to include the pattern, substrate, air, and enough wave propagation distance. Computing with and manipulating such an enormous 3D matrix is memory-intensive and time-consuming, but it can be done quickly with high-performance GPU code, provided the GPU memory can hold the matrix.
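The numbers quoted in Sections 2 and 3 can be combined into a rough sampling estimate. The short calculation below is a back-of-the-envelope sketch, not the procedure used to set up the simulations: it takes the beam energy, detector pixel size, detector distance, incident angle, beam FWHM, and slice thickness from the text, and assumes that the extent to be sliced is simply the FWHM footprint of the beam on the surface. The actual simulation extent also depends on the sample length and on any padding.

```python
import math

HC_KEV_ANGSTROM = 12.398419843   # h*c in keV * angstrom

# Values quoted in the text
energy_kev = 7.36
pixel_size_m = 55e-6
detector_distance_m = 3.82
incident_angle_deg = 0.5
beam_fwhm_m = 2e-6
slice_thickness_m = 200e-9
voxel_size_nm = 5.0

# Wavelength and wavenumber of the X-rays
wavelength_nm = HC_KEV_ANGSTROM / energy_kev / 10.0
k_per_nm = 2 * math.pi / wavelength_nm

# Angle subtended by one detector pixel and the matching wavevector step
angle_per_pixel_rad = math.atan(pixel_size_m / detector_distance_m)
q_per_pixel = k_per_nm * angle_per_pixel_rad      # small-angle approximation

# The text quotes a 5 nm length scale as 0.2 nm^-1, i.e. 1/d
inverse_voxel_per_nm = 1.0 / voxel_size_nm

# Beam footprint (FWHM) on the surface and the number of slices spanning it
footprint_m = beam_fwhm_m / math.sin(math.radians(incident_angle_deg))
n_slices = math.ceil(footprint_m / slice_thickness_m)
```

Under these assumptions the footprint is of the order of a couple of hundred micrometers, which at 200 nm per slice corresponds to on the order of a thousand slices. Combined with voxels of a few nanometers in the lateral directions, this is what makes the 3D matrix so demanding on GPU memory.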

CuPy [18] and PyTorch [19] are used as fast GPU computation tools to manipulate the multislice object described above and to perform the operations, such as rotations and matrix multiplications, required for multislice wave propagation in reflection geometry. On the GPU, the complex-valued 3D matrix containing the refractive indexes of the sample, substrate, and air is created and then tilted using bilinear interpolation. The 3D matrix is then sliced as shown in Fig. 1(a). The incoming probe $\phi _0$ is defined as a simple Gaussian probe with a FWHM of 2$\mu$m $\times$ 2$\mu$m to match the experimental condition, although in principle a complex-valued incoming probe of any shape or number of modes, obtained experimentally or assumed, could be used. The interaction of the probe with a 2D slice of the 3D matrix is computed by Eq. (1), and the exit wave is used as the incoming wave for the next slice, as portrayed in Fig. 1(a). For the comparison between experimental data and forward simulations of large samples discussed in this section, the CuPy code was written in a memory-efficient way. For the 3D object reconstructions in Section 4, every calculation step has to be tracked for automatic differentiation, which is required to obtain gradients with respect to the parameters; PyTorch is therefore preferred, although it uses more memory. For both the CuPy and PyTorch GPU codes, the whole process, from creating a 3D pattern of 70 $\mu$m (length, $\hat {z}$) $\times$ 4 $\mu$m (width, $\hat {x}$) $\times$ 50 nm (depth, $\hat {y}$) together with substrate and air to computing the final scattering intensity pattern, takes approximately one second on an HPE NVIDIA A100 graphics card with 80 GB of memory. This enables many fast iterations with different object guesses, a process intrinsic to reconstruction algorithms as discussed in Section 4.

Although multislice is computationally more expensive than DWBA, it has the advantage of enabling finer three-dimensional control of the sample’s composition and yielding accurate final scattering patterns. To test whether the multislice forward model can reproduce experimental data, the rod A pattern in Table 1, oriented in two positions, is used. The angles $\alpha$ and $\psi$ are calculated for the simulated scattering pattern so that it can be compared with the experimental data angle by angle.

Multislice simulations of the simple elongated rod A agree with the experimental scattering images, showing dynamical scattering fringes, or beatings, near the sample horizon, as seen in Fig. 2. The X-ray incident angle was chosen to be 0.5$^\circ$ to highlight the dynamical scattering phenomenon. Note that the multislice simulation does not show the sharp drop of scattering intensity at the sample horizon ($\alpha _f = 0$) that is seen in the experiments. This is because the multislice propagation does not include enough empty silicon substrate (due to GPU memory limits) to attenuate intensities below $\alpha _f = 0$. Additionally, the propagation past the gold pattern does not have a significant influence on the scattering intensities above the sample horizon. In Fig. 2 (a.IV) and (b.IV), the corresponding DWBA simulations are also shown, in which the fringes near the sample horizon are not reproduced. For comparison, multi-layered DWBA simulations were performed, in which the electron density of the isolated pattern is vertically discretized within the framework of multi-layered DWBA [4]. The binning of the isolated pattern was set so that refining the resolution beyond 5 nm per pixel no longer changes the calculated scattering pattern. In a conventional DWBA setting such as multi-layered DWBA, the electric field is a one-dimensional quantity that is discretized only in the direction normal to the surface and is therefore constant in the plane. Under coherent beam illumination, however, this assumption is no longer valid, and the interference between the substrate and the structured layers at different in-plane locations needs to be considered in full three dimensions. Therefore, the dynamical scattering fringes near the critical angles observed in the experiments cannot be reproduced by DWBA. These fringes depend on the thickness of the samples and can be accurately predicted by multislice simulations, as also confirmed by SEM or AFM. Multislice again gives accurate scattering patterns when the sample is rotated in the beam, for example for in-plane rotations up to $\psi$ = 0.5$^{\circ }$, as shown in the right column of Fig. 2. This will be useful for the reconstruction of tomography experiments, which aim to recover the information lost to the limited field of view along the footprint direction. In all multislice simulations, the scattering intensities can also be matched to experimental values by using an incident X-ray probe with a known photon flux.

Fig. 2. Elongated rod A pattern scattering images and simulations. Left column (a): the rod is at an X-ray incident angle of 0.5$^\circ$ with zero in-plane rotation angle, as schematically portrayed in (a.I). Right column (b): the rod is at the same incident angle of 0.5$^\circ$ with a -0.5$^\circ$ in-plane rotation angle, as portrayed in (b.I). The gaps in plot (b.II) are due to module gaps on the detector, whereas the gaps are patched in plot (a.II) by taking two images at offset detector locations. In both cases, the multislice model reproduces the experimental scattering images, as seen in (a.III) and (b.III). Plots (a.IV) and (b.IV) show that the corresponding DWBA simulations do not reproduce the fringes from dynamical scattering below $\alpha _f$ $\sim$ 0.2$^\circ$.


4. Reconstruction using the multislice forward model

In utilizing the multislice formalism for image reconstruction, we can consider two regimes: the kinematical regime, for scattering experiments done at angles well above the critical angles (e.g., 0.7$^{\circ }$ and above for samples made of silicon and gold, where dynamical scattering starts to weaken), and the dynamical regime, for those done at angles between 0.3$^{\circ }$ and 0.5$^{\circ }$. To date, most CDI experiments have been performed in the forward transmission geometry, and reconstruction algorithms therefore adopt the kinematical approximation. The multislice model can deal with both the kinematical and dynamical regimes, but the reconstructions here are done in the more complex dynamical regime because the dynamical scattering phenomena can only be simulated using multislice, as discussed in Section 3. In the first subsection, the formalism is used to reconstruct a buried 3D pattern arrangement by minimizing a cost function between a ground truth scattering pattern and the scattering pattern estimated in PyTorch: a simulated toy-model reconstruction of a buried chip pattern. In the second subsection, a single-shot experimental scattering image of the 3D FIB sample is used to perform 3D structural refinement using PyTorch’s optimization, where measurements from atomic force microscopy (AFM) are used as a strong object support to help the optimization.

4.1 CSSI-CDI reconstruction from simulation data: a simulated 3D pattern buried within a silicon substrate

Reconstructions from scattering images acquired at X-ray incident angles near the critical angle can be implemented with the multislice model, which accounts for dynamical scattering. A simple mean square error (MSE) cost function can be written as in Eq. (4) to measure the difference between the ground truth (experimental scattering amplitude data or, in this scenario, a simulated ground truth amplitude, i.e., a simulated scattering pattern with Poisson noise applied and rounded to mimic a physical detector) and an educated multislice simulation guess from PyTorch that is iteratively updated. Such a cost function can be minimized using PyTorch’s automatic differentiation, with the parameters updated using the gradients computed by PyTorch. In other words, it is intrinsically a multi-parameter optimization problem, in which the parameters define the pattern’s shape. By iteratively updating the guessed object used for the multislice simulation and reducing the cost function, i.e., the difference between the scattering of the ground truth and that of the guessed object, the buried 3D object can be reconstructed.

$$\textrm{Loss}(T_{\textrm{predicted}} - T_{\textrm{ground truth}})=\frac{1}{N} \sum _{i=1}^{N}\left(T_{\textrm{predicted}}^i-T_{\textrm{ground truth}}^i\right)^{2}, \quad N = \textrm{total number of pixels} \tag{4}$$
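Eq. (4) and the construction of a noisy ground truth can be sketched in a few lines of NumPy. The function names `mse_loss` and `simulate_detector`, the flat 8 $\times$ 8 intensity pattern, and the count level of 100 photons per pixel are hypothetical choices for illustration only. In the reconstructions described here the same loss would be evaluated on PyTorch tensors so that gradients can be propagated through it.

```python
import numpy as np

def mse_loss(predicted, ground_truth):
    """Mean squared error of Eq. (4), averaged over all pixels."""
    predicted = np.asarray(predicted, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    return float(np.mean((predicted - ground_truth) ** 2))

def simulate_detector(intensity, rng):
    """Apply Poisson counting noise; the result is integer photon counts."""
    return rng.poisson(intensity)

rng = np.random.default_rng(seed=0)

# Hypothetical noiseless intensity pattern (photons per pixel)
clean_intensity = 100.0 * np.ones((8, 8))
counts = simulate_detector(clean_intensity, rng)

# The loss is evaluated on amplitudes, i.e. square roots of intensities
truth_amplitude = np.sqrt(counts)
loss_same = mse_loss(truth_amplitude, truth_amplitude)
loss_clean = mse_loss(np.sqrt(clean_intensity), truth_amplitude)
```

Because of the counting noise, even the noiseless pattern does not reach zero loss against the simulated detector image, which sets a floor on the loss value that an optimization can be expected to reach.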

A more complicated cost function can be written such that 216 parameters define 216 (6 $\times$ 6 $\times$ 6) voxels that describe a buried pattern arrangement (or rather a simple chip pattern, with the reconstruction voxel size as the resolving limit in this particular case). It is important to note that the reconstruction voxel size in real space is coarser than the ultimate experimental resolution or length scale, which is determined by the X-ray photon flux. Each parameter describes whether a voxel is silicon or gold (if all 216 parameters are silicon, we have reflection from bare silicon). A ground truth is first picked: a particular CSSI scattering pattern (with Poisson noise) from a 3D buried chip. Then the cost function is minimized and the parameters are updated, reducing the difference between the ground truth scattering pattern and the guessed scattering patterns. The cost function is written in terms of scattering amplitudes because in actual image reconstructions we only have scattering intensities, with the phase information lost. The optimization is also written such that a filter is applied to the 216 voxel parameters after 20 epochs of optimization, replacing each value by the nearest refractive index of interest; in other words, voxels are represented by an integrated refractive index indicating a mixture of air, silicon, and/or gold, etc. In each iteration, an educated guess for the 216 parameters is made using gradients from automatic differentiation and the ADAM optimizer [20], a buried pattern arrangement is created, and the multislice wave propagation is computed through the created object to obtain the final scattering pattern.
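The structure of this optimization loop, gradient-based Adam updates interleaved with a filter that snaps each parameter to the nearest allowed material value, can be illustrated with a self-contained toy example. Several things here are assumptions and differ from the actual reconstruction: the forward model is a random linear map with orthonormal columns standing in for multislice (chosen only so that the example runs quickly and its gradient can be written by hand instead of using automatic differentiation), the parameters are real occupancies with 0 for silicon and 1 for gold rather than complex refractive indexes, the filter is applied every 20 epochs (one reading of the description above), and the names `snap_to_materials` and `adam_with_snapping` as well as the learning rate are hypothetical.

```python
import numpy as np

def snap_to_materials(params, allowed):
    """Replace every parameter by the nearest allowed material value."""
    allowed = np.asarray(allowed, dtype=float)
    nearest = np.argmin(np.abs(params[..., None] - allowed), axis=-1)
    return allowed[nearest]

def adam_with_snapping(forward, gradient, target, params, allowed,
                       epochs=200, snap_every=20, lr=0.05,
                       beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimise the MSE between forward(params) and target with Adam,
    snapping the parameters to allowed values every `snap_every` epochs."""
    m = np.zeros_like(params)      # running mean of the gradient
    v = np.zeros_like(params)      # running mean of the squared gradient
    history = []
    for epoch in range(1, epochs + 1):
        residual = forward(params) - target
        history.append(float(np.mean(residual ** 2)))
        g = gradient(residual)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** epoch)   # bias-corrected moments
        v_hat = v / (1 - beta2 ** epoch)
        params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
        if epoch % snap_every == 0:
            params = snap_to_materials(params, allowed)
    return params, history

rng = np.random.default_rng(0)
n_voxels = 6 * 6 * 6

# Stand-in forward model: a linear map with orthonormal columns (NOT multislice)
q_matrix, _ = np.linalg.qr(rng.standard_normal((300, n_voxels)))
forward = lambda p: q_matrix @ p
# Gradient of mean(residual**2) with respect to the parameters
gradient = lambda residual: 2.0 / residual.size * (q_matrix.T @ residual)

truth = (rng.random(n_voxels) < 0.3).astype(float)   # 1 = gold, 0 = silicon
target = forward(truth)
start = np.zeros(n_voxels)                            # bare silicon
recovered, history = adam_with_snapping(forward, gradient, target,
                                        start, allowed=[0.0, 1.0])
```

With a real multislice forward model the hand-written `gradient` would be replaced by automatic differentiation, and convergence would depend on the noise level and on the conditioning of the problem rather than being as benign as in this toy case.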

The buried pattern arrangement is about 500 nm wide and 200 nm high, situated inside silicon, 10 nm below the surface. Each reconstruction voxel is about 10 $\mu$m (length) $\times$ 250 nm (width) $\times$ 100 nm (height). The voxels are elongated along the direction of the beam because of the small field of view along the wave propagation direction. It is important to note that these are reconstruction voxels, not the multislice voxels, which are much smaller, as mentioned in Section 3. The sample is set to an X-ray incident angle of $\alpha _{i}$ = 0.5$^\circ$. The scattering image at the bottom left of Fig. 3 is the ground truth against which the guessed scattering of each epoch is compared in the cost function. Starting from a pure silicon substrate without features, the cost function minimization is observed to converge to the ground truth after 1800 multislice propagation iterations, or epochs. Different initial guesses were also tested, and they were all observed to converge to the ground truth. This simple 216-parameter optimization shows that multislice simulations are sensitive to the complex dynamical scattering near the critical angle, and that it is possible to reconstruct a three-dimensional buried pattern from a single scattering pattern if the coherent X-ray flux and the field of view are sufficiently high and there are sufficient computing resources to support the high-resolution reconstruction.

Fig. 3. Reconstruction of a buried 3D structure inside a silicon substrate. The leftmost column is for the ground truth, detailing a 2D cross-section (a), the 3D structure itself, and the computed ground truth scattering (c), to which Poisson noise is applied and which is rounded to the nearest integer value to emulate a detector image. (d, e, f) show the 3D object being updated and its corresponding scattering images at epochs 0, 400, and 1800. (b) shows the loss function as a function of epoch, converging to the lowest cost function value to match the ground truth scattering. All scattering images in the bottom row are on the same scale.


4.2 CSSI-CDI structural refinement from experimental data: a 3D pattern created by the focused ion beam

In this subsection, the parameter optimization of Subsection 4.1 is applied to an experimental scattering image. The experimental scattering image of the FIB 3D structure from Table 1 was taken with an X-ray flux of $5 \times 10^{9}$ photons s$^{-1}$, whereas for the buried 3D structure in Subsection 4.1 the simulated X-ray flux was 100 times higher. The Advanced Photon Source is undergoing an upgrade that will increase its coherent flux by two orders of magnitude, but currently the experimental scattering image of the FIB 3D structure does not have enough X-ray flux to resolve finer length scales along the $\hat {y}$ (height) and $\hat {z}$ (length) directions (i.e., the numerical aperture is flux and geometry limited in a single-exposure setup). Therefore, a strong object support or prior knowledge can be used as a guide for fine-tuning or structural refinement using the PyTorch optimization. The CSSI geometry has asymmetric resolutions along each dimension, and the AFM image of the FIB sample has high sensitivity for height ($\hat {y}$) and length ($\hat {z}$) but has limitations along the width ($\hat {x}$), especially at sharp edges, due to the dull AFM tip profile. Given these constraints, it is demonstrated here that the horizontal beatings seen in the experimental X-ray scattering are due to the tapering profile of the top layer of the FIB pattern, which amounts to a one-dimensional parameter optimization or structural refinement.

In Fig. 4, the ground truth scattering from experiment is shown on the left bottom plot. It is important to note that there are smaller beatings noticed at small $\psi$ angles ($\sim 0.1^{\circ }$) or indicated by pink annotation on d of Figure 4, which represent bigger length scales (i.e. the bottom part of the FIB object which is 0.9$\mu$m). The bigger beatings are indicated by green annotations, and it will be discussed below that they are not entirely due to the smaller top part of the FIB object, which is 0.3$\mu$m. Fringes along arch (annotated by red texts and drawings) are due to both length scales along $\hat {z}$ and $\hat {y}$. To simulate the entire FIB sample with reasonable resolution, 50 x 40 x 14 voxels (28000 parameters) are used to define the 3D shape of the FIB sample and up to 77 GB of GPU memory is occupied during forward calculations and gradient updates; each voxel is about 600 nm (length) $\times$ 7 nm (width) $\times$ 1 nm (height) and each epoch, including forward calculation and gradient update, takes a total of 2.5 s for this sample size. If we start from a perfect two-layers as described in Table 1, the computed scattering image (bottom middle plot of the figure) does not have any bigger beatings as seen in experiment. We do not have enough length scales sensitivity along $\hat {y}$ because the resolution necessary to resolve the details of heights is restricted by vertical numerical aperture which in turn is limited by the X-ray flux for a single exposure reconstruction. Additionally, the information along beam $\hat {z}$ is also limited due to X-ray flux issues. Therefore, the initial starting point for structural fine-tuning is best set with perfect rectangles with heights, length, and width values given by the AFM image. One reason for starting with sharp edges, rather than using AFM profile is to retrieve spatial variation along $\hat {x}$ direction without being influenced by the AFM tip convolution issues. 
The optimization is carried out using the MSE loss function and the Adam optimizer. Every twenty epochs, a shrink-wrap support [21] is applied, which also acts as a nearest-neighbor filter to determine whether a voxel is platinum or air. After 80 epochs, the guessed scattering starts to show the bigger beatings (along the arch), in agreement with the experimental ground truth. This confirms that the tapering profile of the top layer of the FIB sample contributes to the bigger beatings. The goal here is to demonstrate that thousands of parameters can be updated in each epoch through fast computation and optimization on the GPU, as long as the experimental data allow it. In this case, there is only enough flux for physical information along the $\hat {x}$ direction, requiring strong sample support information from AFM measurements for height ($\hat {y}$) and length ($\hat {z}$). Higher X-ray fluxes, which will be available with the Advanced Photon Source Upgrade, will enable better sensitivity to the 3D structure, as demonstrated by the simulation exercise in Subsection 4.1. Since distributed memory algorithms for X-ray wave propagation have been implemented [22], the next critical step is to implement this algorithm across multiple GPUs to enable experimental reconstructions of larger length scales at higher resolution, which require a large field of view or are memory intensive.
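
The periodic support step described above can be sketched on a toy 2D cross-section as follows. This is a simplified stand-in, not the authors' implementation: the 4-neighbor averaging, the 0.5 threshold, the function names, the epoch indexing, and the toy occupancy values are all assumptions, and the actual refinement operates on the 3D voxel grid inside the PyTorch optimization loop.

```python
def neighbor_mean(grid, i, j):
    """Mean of voxel (i, j) and its in-bounds 4-connected nearest neighbors."""
    rows, cols = len(grid), len(grid[0])
    values = [grid[i][j]]
    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < rows and 0 <= nj < cols:
            values.append(grid[ni][nj])
    return sum(values) / len(values)


def shrink_wrap_binarize(grid, threshold=0.5):
    """Return a new grid in which every voxel is 1.0 (platinum) or 0.0 (air).

    Assumption: a voxel is kept as material when the mean over itself and
    its nearest neighbors reaches the threshold.
    """
    return [
        [1.0 if neighbor_mean(grid, i, j) >= threshold else 0.0
         for j in range(len(grid[0]))]
        for i in range(len(grid))
    ]


# Toy occupancy map (hypothetical values): a 3 x 3 block of material with one
# under-dense voxel in its center and one stray voxel in the air region.
occupancy = [
    [0.0, 0.0, 0.0, 0.0, 0.6],
    [0.0, 1.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 0.3, 1.0, 0.0],
    [0.0, 1.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]

cleaned = shrink_wrap_binarize(occupancy)

# Epochs at which the support would be applied in an 80-epoch run,
# assuming epochs are counted from 1.
support_epochs = [epoch for epoch in range(1, 81) if epoch % 20 == 0]

for row in cleaned:
    print(row)
print(support_epochs)
```

On this toy input the filter fills the under-dense center voxel and removes the stray voxel, leaving a clean binary block. Between support steps, the voxel values would be free to vary continuously under the gradient updates.
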

Fig. 4. A 3D CSSI-CDI structural refinement from a single-shot experimental scattering pattern using prior knowledge. (d) shows the ground truth scattering image obtained from the experiment, and (a) shows the measured AFM cross-section profile overlaid on the refined 3D structure from PyTorch. (b, e) show the initial starting point, an idealized 3D structure whose dimensions were taken from AFM measurements, and its computed scattering. The estimated scattering intensities in (e) resemble the ground truth, but the bigger periodicity (green annotations on the ground truth scattering) is missing. (c, f) show the refined 3D structure in the X-Y plane after 80 epochs and its scattering, where the tapering profile at the top is observed to be responsible for the bigger periodicity in the estimated scattering pattern (f). The smaller periodicity is present at both epoch 0 and epoch 80 and matches the ground truth scattering, since it is due to the bigger length scale, i.e., the bottom 0.9 $\mu$m wide structure.


5. Conclusion

Coherent surface scattering imaging (CSSI) utilizes the advantages of the coherent diffractive imaging concept to reconstruct surface and thin-film structures from coherent grazing-incidence scattering. High-flux scattering images taken for tomography will further complement CSSI for full 3D reconstructions with isotropic high resolution in all directions. Whatever solutions there may be on the instrumentation side of experiments, multislice is crucial in providing a holistic forward model for the CSSI technique. One big advantage of the reflection-geometry multislice approach is that it can be applied to any three-dimensional object structure with an inhomogeneous refractive index distribution $n(x, y, z)$ and any incoming X-ray probe shape and phase, as long as sufficient computational resources (memory, computing nodes, etc.) are available. Additionally, the multislice model can simulate ptychography and tomography in both dynamical and kinematic regimes without requiring the plane-wave assumption of the DWBA and without any restrictions on the form of the X-ray probe or its phase. Unlike the DWBA theory, which is only applicable to far-field scattering analysis, multislice as a wave propagation method is also capable of near-field imaging analysis. The next critical step is therefore to implement automatic differentiation and computation of the multislice model across multiple GPUs to simulate a larger field of view and enable experimental reconstructions of larger sample sizes and smaller voxel sizes.

With continuing advances in synchrotron X-ray sources, the coherent X-ray scattering imaging technique will be able to probe smaller length scales at shorter timescales. The resolution needed to differentiate layer by layer, in other words depth sensitivity, can be achieved in experiments by varying the X-ray incident angle and changing the in-plane rotation. Such experimental implementations, along with the multislice forward model, open the door to a myriad of imaging techniques such as CSSI-CDI (reconstruction from a single-shot scattering image of a small sample), CSSI-ptychography (reconstruction from scattering images of overlapping scans on an extended sample), CSSI-tomography (reconstruction from scattering images of a sample at different in-plane angles), and CSSI-laminography (a combination of tomography and ptychography for 3D reconstruction of an extended object).

Funding

U.S. Department of Energy (DE-AC02-06CH11357).

Acknowledgments

P.M., M.C., and Z.J. took the X-ray experimental data. P.M. developed the CuPy and PyTorch codes for forward model calculations and reconstructions based on M.C.’s initial work on multislice simulations. M.J.W. made the rod A sample with e-beam lithography. J.Z. made the focused ion beam sample and performed the atomic force microscopy and scanning electron microscopy measurements on it. All the other co-authors were involved in experiments, discussions, and the preparation of the manuscript. This work is supported by the Advanced Photon Source and Center for Nanoscale Materials, US Department of Energy (DOE), Office of Basic Energy Sciences, Office of Science User Facilities, under Contract No. DE-AC02-06CH11357. This work is also supported by the DOE Early Career Research Program.

Disclosures

The authors declare no conflicts of interest.

Data Availability

Data and codes to produce the results presented in this paper may be obtained from the authors upon reasonable request.

References

1. A. Sakdinawat and D. Attwood, “Nanoscale x-ray imaging,” Nat. Photonics 4(12), 840–848 (2010).

2. T. Sun, Z. Jiang, J. Strzalka, L. Ocola, and J. Wang, “Three-dimensional coherent x-ray surface scattering imaging near total external reflection,” Nat. Photonics 6(9), 586–590 (2012).

3. S. K. Sinha, E. B. Sirota, S. Garoff, and H. B. Stanley, “X-ray and neutron scattering from rough surfaces,” Phys. Rev. B 38(4), 2297–2311 (1988).

4. Z. Jiang, D. R. Lee, S. Narayanan, J. Wang, and S. K. Sinha, “Waveguide-enhanced grazing-incidence small-angle x-ray scattering of buried nanostructures in thin films,” Phys. Rev. B 84(7), 075440 (2011).

5. J. M. Cowley and A. F. Moodie, “The scattering of electrons by atoms and crystals. i. a new theoretical approach,” Acta Crystallogr. 10(10), 609–619 (1957).

6. P. Goodman and A. Moodie, “Numerical evaluations of n-beam wave functions in electron scattering by the multi-slice method,” Acta Crystallogr., Sect. A: Cryst. Phys., Diffr., Theor. Gen. Crystallogr. 30(2), 280–290 (1974).

7. D. J. Smith, “The realization of atomic resolution with the electron microscope,” Rep. Prog. Phys. 60(12), 1513–1580 (1997).

8. H. Brown, P. Pelz, C. Ophus, and J. Ciston, “A python based open-source multislice simulation package for transmission electron microscopy,” Microsc. Microanal. 26(S2), 2954–2956 (2020).

9. A. Hare and G. Morrison, “Near-field soft x-ray diffraction modelled by the multislice method,” J. Mod. Opt. 41(1), 31–48 (1994).

10. Y. Wang, “A numerical study of resolution and contrast in soft x-ray contact microscopy,” J. Microsc. 191(2), 159–169 (1998).

11. P. Li and A. Maiden, “Multi-slice ptychographic tomography,” Sci. Rep. 8(1), 1–10 (2018).

12. K. Li, M. Wojcik, and C. Jacobsen, “Multislice does it all-calculating the performance of nanofocusing x-ray optics,” Opt. Express 25(3), 1831–1846 (2017).

13. D. Paganin, Coherent X-ray optics, 6 (Oxford University Press on Demand, 2006) pp. 99–101.

14. I. Lobato and D. Van Dyck, “Multem: A new multislice program to perform accurate and fast electron diffraction and imaging simulations using graphics processing units with cuda,” Ultramicroscopy 156, 9–17 (2015).

15. A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” openreview.net (2017).

16. M. Du, S. Kandel, J. Deng, X. Huang, A. Demortiere, T. T. Nguyen, R. Tucoulou, V. De Andrade, Q. Jin, and C. Jacobsen, “Adorym: A multi-platform generic x-ray image reconstruction framework based on automatic differentiation,” Opt. Express 29(7), 10000–10035 (2021).

17. S. Kandel, S. Maddali, M. Allain, S. O. Hruszkewycz, C. Jacobsen, and Y. S. Nashed, “Using automatic differentiation as a general framework for ptychographic reconstruction,” Opt. Express 27(13), 18653–18672 (2019).

18. R. Okuta, Y. Unno, D. Nishino, S. Hido, and C. Loomis, “Cupy: A numpy-compatible library for nvidia gpu calculations,” in Proceedings of Workshop on Machine Learning Systems (LearningSys) in The Thirty-first Annual Conference on Neural Information Processing Systems (NIPS), (2017).

19. A. Paszke, S. Gross, F. Massa, et al., “Pytorch: An imperative style, high-performance deep learning library,” Adv. Neural Information Processing Systems 32, 1 (2019).

20. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv, arXiv:1412.6980 (2014).

21. S. Marchesini, H. He, H. N. Chapman, S. P. Hau-Riege, A. Noy, M. R. Howells, U. Weierstall, and J. C. Spence, “X-ray image reconstruction from a diffraction pattern alone,” Phys. Rev. B 68(14), 140101 (2003).

22. S. Ali, M. Du, M. F. Adams, B. Smith, and C. Jacobsen, “Comparison of distributed memory algorithms for x-ray wave propagation in inhomogeneous media,” Opt. Express 28(20), 29590–29618 (2020).

Sample description	Length ($\hat{z}$) [$\mu$m]	Width ($\hat{x}$) [$\mu$m]	Thickness ($\hat{y}$) [nm]
Elongated rod A ($\alpha_{i} = 0.5^{\circ}$, $\psi = 0^{\circ}$); experiment and simulation; pattern material: gold	70	4	50
Elongated rod A ($\alpha_{i} = 0.5^{\circ}$, $\psi = -0.5^{\circ}$); experiment and simulation; pattern material: gold	70	4	50
Buried 3D structure ($\alpha_{i} = 0.5^{\circ}$, $\psi = 0^{\circ}$); simulated reconstruction; pattern materials: gold, titanium, silicon, air	20	0.5	200
FIB deposited 3D structure ($\alpha_{i} = 0.5^{\circ}$, $\psi = 0^{\circ}$), two layers on top of each other while centered; experiment and simulation; pattern material: platinum	70 & 70	0.9 & 0.3	15 & 10
