Abstract
We introduce a simple method to accelerate the convergence of iterative solvers of the frequency-domain Maxwell’s equations for deep-subwavelength structures. Using the continuity equation, the method eliminates the high multiplicity of near-zero eigenvalues of the operator while leaving the operator nearly positive-definite. The impact of the modified eigenvalue distribution on the accelerated convergence is explained by visualizing residual vectors and residual polynomials.
© 2013 Optical Society of America
1. Introduction
To understand electromagnetic (EM) and optical phenomena, it is essential to solve Maxwell’s equations efficiently. In the frequency domain, assuming a time dependence e+iωt and nonmagnetic materials, Maxwell’s equations reduce to
where ε is the electric permittivity (which can be complex); μ0 is the magnetic permeability of vacuum; ω is the angular frequency; E and J are the electric field and electric current source density, respectively.To solve Eq. (1) numerically, one can use a method such as the finite-difference frequency-domain (FDFD) method [1–3] or the finite element method (FEM) [4] to construct a large system of linear equations
where A is a matrix representing the operator [∇× (∇× ) −ω2μ0ε]; x is an unknown column vector representing E; b is a column vector representing −iωμ0J. The matrix A thus constructed is sparse (with only 13 nonzero elements per row when generated by the FDFD method) and typically very large (often with more than 10 million rows and columns for three-dimensional (3D) problems). To solve a system with such a large and sparse matrix, iterative methods are usually preferred to direct methods [5].However, in the “low-frequency regime” where the wavelength is much longer than the grid cell size Δ, it is well-known that convergence is quite slow when the iterative methods are directly applied to solve Eq. (1). The low-frequency regime arises, for example, in nanophotonics [6–8] and geophysics [9–11] where structures have feature sizes that are at deep-subwavelength scale, and it will be defined more rigorously in Sec. 2. The huge null space of the operator ∇× (∇× ) was shown to be the origin of the slow convergence [10,11], and several techniques to improve the convergence speed have been developed.
The first class of techniques is based on the Helmholtz decomposition, which decomposes the E-field as E = Ψ + ∇φ, where Ψ is a divergence-free vector field and φ is a scalar field [9–15]. Because ∇ · Ψ = 0, Eq. (1) is written as
where the operator ∇× (∇× ), which has a huge null space, is replaced with the negative Laplacian −∇2, which is positive-definite for appropriate boundary conditions and thus has the smallest possible null space. However, these techniques either solve an extra equation for the extra unknown φ at every iteration step [9–12], which can be time-consuming, or increase the number of the rows and columns of the matrix by about 33% [13–15], which requires more memory.The second class of techniques utilizes the charge-free condition
The condition (4) holds at every source-free (i.e., J = 0) position, where Eq. (1) can be modified to for an arbitrary constant s; note that the right-hand side is 0 because J = 0. In this class of techniques, Eqs. (1) and (5) are solved at positions with and without sources, respectively.Reference [16] applied the above technique with s = +1 to boundary value problems described in [17] and achieved accelerated convergence. Such boundary value problems satisfied J = 0 everywhere, so Eq. (5) was solved throughout the entire simulation domain.
However, Ref. [16] did not conduct a detailed comparison of convergence speed between different values of s. It also did not report whether its technique leads to accelerated convergence for problems with sources, even though many problems have nonzero electric current sources J inside the simulation domain. Reference [1] applied the technique with s = +1 to problems with sources, but only in order to suppress spurious modes rather than to accelerate convergence.
In this paper, we develop a modification of Eq. (1) that improves convergence speed even if electric current sources J exist inside the simulation domain [18]. Unlike the previous technique that made the modification only at source-free positions, our technique modifies Eq. (1) everywhere including positions with sources. For the modification, we utilize the continuity equation
which can be derived by taking the divergence of Eq. (1). When Eq. (6) is manipulated appropriately and then added to Eq. (1), we obtain for a constant s. The modified equation (7) is the equation to solve in this paper.The solution E-field of Eq. (7) is the same as the solution of the original equation (1) regardless of the value of s, because the solution of Eq. (1) always satisfies Eq. (6). However, the choice of s affects the convergence speed of iterative methods significantly. In this paper, we demonstrate that s = −1 induces faster convergence speed than other values of s by comparing the convergence behavior of iterative methods for s = −1, 0, +1; the latter two values of s are of particular interest, because s = 0 reduces Eq. (7) to the original equation (1) and s = +1 is the value that Ref. [16] used in Eq. (5), which is similar to Eq. (7).
We also show that the difference in convergence behavior results from the different eigenvalue distributions of the operators for different s. There are many general mathematical studies about the dependence of the convergence behavior on the eigenvalue distribution [19–26]. Our aim here is instead to provide an intuitive understanding of the convergence behavior specifi-cally for the operator of Eq. (7). For this purpose, at each iteration step we visualize the residual vector and residual polynomial, which are widely used concepts to explain the convergence behavior of iterative methods [5] and also defined briefly in Sec. 3. As a result, we find that convergence speed deteriorates substantially for s = 0 because the operator has eigenvalues clustered near zero, and for s = +1 because the operator is strongly indefinite.
The rest of this paper is organized as follows. In Sec. 2 we investigate the eigenvalue distribution of the operator in Eq. (7) for s = 0, −1, +1 for a simple homogeneous system. We also define the low-frequency regime rigorously in the section. In Sec. 3, we relate the eigenvalue distribution with the convergence behavior of an iterative method. In Sec. 4, we solve Eq. (7) for a wide range of realistic 3D problems to compare the convergence behavior of an iterative method for the three values of s, and we conclude in Sec. 5.
2. Eigenvalue distribution of the operator for a homogeneous system
In this section, we consider the operator in Eq. (7) for a homogeneous system and show that the properties of the eigenvalue distribution of the operator strongly depend on the value of s. The impact of s on the eigenvalue distribution has been studied in detail in the literature of the deflation method (also known as the penalty method) [27–29]. Here we only highlight those aspects that are important for the present study.
For a homogeneous system where ε is constant, Eq. (7) is simplified to
where the operator is Hermitian for real ε. Because ε is constant in this section, the eigenvalue distribution of T is shifted from the eigenvalue distribution of a Hermitian operator by a constant −ω2μ0ε. In the low-frequency regime such shift is negligible, and thus the eigen-value distribution of T0 approximates that of T very well. Hence, we examine the eigenvalue distribution of T0 below to investigate the eigenvalue distribution of T.In Appendix A, we show that Fke−ik·r with
are the three eigenfunctions of both ∇× (∇× ) and ∇(∇· ) for each wavevector k. We also show in the same appendix that the corresponding three eigenvalues are for ∇×(∇× ), and for ∇(∇· ). Therefore, T0 has as three eigenvalues for each wavevector k.Equation (14) indicates that the eigenvalue distribution of T0 is greatly affected by the value of s. Specifically, the multiplicity of the eigenvalue 0 depends critically on whether s is 0 or not: for s = 0 T0 has a very high multiplicity of the eigenvalue 0 because Eq. (14) has 0 as an eigenvalue for every k, whereas for s ≠ 0 T0 does not have such a high multiplicity of the eigenvalue 0. The definiteness of T0 also depends on the value of s: for s ≤ 0 T0 is positive-semidefinite because the three eigenvalues in Eq. (14) are always nonnegative, whereas for s > 0 T0 is indefinite because Eq. (14) has both positive and negative numbers as eigenvalues. The different properties of the eigenvalue distributions of T0 for s = 0, s < 0, and s > 0 are summarized in Table 1.
The above description of the eigenvalue distributions of T0 should approximately hold for the eigenvalue distributions of T as well in the low-frequency regime, as mentioned in the discussion of Eqs. (9) and (10). Moreover, even though T is a differential operator defined in an infinite space, it turns out that the description also applies to the matrix A discretized from T that is defined in a spatially bounded simulation domain.
To demonstrate, we numerically calculate the eigenvalues of A for a two-dimensional (2D) system shown in Fig. 1, a square domain filled with vacuum. The domain is discretized on a finite-difference grid with Nx × Ny = 50 × 50 cells and cell size Δ = 2 nm. Therefore, the matrix A for each s has 3NxNy = 7500 rows and columns, where the extra factor 3 accounts for the three Cartesian components of the E-field. We choose ω corresponding to the vacuum wavelength λ0 = 1550nm, which puts the system in the low-frequency regime as will be seen at the end of this section. The matrices A are constructed for three values of s: 0, −1, and +1, each of which represents each category of s in Table 1.
The distributions of the numerically calculated eigenvalues of A for s = 0, −1, +1 are shown as three plots in Fig. 2. In each plot, the horizontal axis represents eigenvalues, and it is divided into 41 intervals t−20,...,t0,...,t20 where t0 ∋ 0. The height of the column on each interval corresponds to the number of the eigenvalues in the interval.
The eigenvalue distributions of A shown in Fig. 2 agree well with the description of the eigenvalues of T0 in Table 1: the very tall column on t0 in Fig. 2(a) indicates the very high multiplicity of λ ≃ 0 for s = 0, and the eigenvalues distributed over tj<0 and tj>0 in Fig. 2(c) indicate a strongly indefinite operator for s = +1. In addition, the height of the column on t0 in Fig. 2(a) is about 2500, or one third of the total number of eigenvalues, which agrees with Eq. (14) for s = 0 where one of the three eigenvalues is 0 for each k; the columns on tj>0 are about 1.5 times taller in Fig. 2(b) than in Fig. 2(a), which also agrees with Eq. (14) where the number of |k|2 increases from two for s = 0 to three for s = −1.
We end the section by providing a quantitative definition of the low-frequency regime. Suppose that A0 is the matrix discretized from T0 of Eq. (10). For s = 0, the eigenvalues of A0 range from 0 to , where Δmin is the minimum grid cell size [31]; note that the range agrees with Fig. 2(a). The eigenvalue distribution of A is the shifted eigenvalue distribution of A0 by −ω2μ0ε. The low-frequency regime is where the magnitude of the shift is so small that A has an almost identical eigenvalue distribution as A0. Therefore, the condition for the low-frequency regime is
Equation (15) is consistent with the condition introduced in [10], but here we provide a condition that is based on a more accurate estimate of the maximum eigenvalue of A0. We can rewrite Eq. (15) in terms of the vacuum wavelength λ0 as where εr = ε/ε0 is the relative electric permittivity. The system described in Fig. 1 satisfies Eq. (16), so it is in the low-frequency regime.3. Impact of the eigenvalue distribution on the convergence behavior of GMRES
In this section, we explain how the different eigenvalue distributions for different values of s examined in Sec. 2 influence the convergence behavior of an iterative method to solve Eq. (8).
For each of s = −1, 0, +1, we discretize Eq. (8) using the FDFD method for the system illustrated in Fig. 1 with an x-polarized electric dipole current source placed at the center of the simulation domain. We then solve the discretized equation by an iterative method to observe the convergence behavior. The iterative method to use in this section is the general minimal residual (GMRES) method [33], which is one of the Krylov subspace methods [5]. We use GMRES without restart because the system is sufficiently small.
Like other iterative methods, GMRES generates an approximate solution xm of Ax = b at the mth iteration step. As m increases, xm eventually converges to the exact solution. We assume that convergence is achieved when the residual vector
satisfies ||rm||/||b|| < τ, where ||·|| is the 2-norm and τ is a user-defined small positive number. In practice, τ = 10−6 is sufficiently small for accurate solutions. We use x0 = 0 as an initial guess solution throughout the paper.Figure 3 shows ||rm||/||b|| versus the number m of iteration steps for the three values of s. As can be seen in the figure, the convergence behavior of GMRES is quite different for different s, with s = −1 far more superior than the other two choices of s.
The overall trend of the convergence behavior shown in Fig. 3 is consistent with the mathematical theories of iterative methods. For example, the convergence stagnates initially for s = 0, and according to [21] this is typical behavior of GMRES for a matrix with many eigenvalues close to 0 such as our A for s = 0 (see Fig. 2(a)). Also, the convergence is very slow for s = +1, and Ref. [23] argues that in general the Krylov subspace methods converge much more slowly for indefinite matrices such as our A for s = +1 (see Fig. 2(c)) than for definite matrices. In this section we provide a more intuitive explanation for the convergence behavior by using the residual polynomial.
We first review the residual polynomial of GMRES briefly. Suppose that 𝒫m is the set of all polynomials p̃m of degree at most m such that
For each p̃m ∈ 𝒫m, we can define a column vector At the mth iteration step of GMRES, the residual vector rm of Eq. (17) is the r̃m with the smallest 2-norm [5]. We refer to the p̃m for r̃m = rm as the residual polynomial pm. Therefore, from Eq. (19) we haveBelow, we show how the eigenvalue distribution of A influences pm at each iteration step and hence influences the convergence behavior of GMRES. The matrix A ∈ ℂn×n for our homogeneous system described in Fig. 1 is Hermitian because it is discretized from the Hermitian operator T of Eq. (8). Hence, the eigendecomposition of A is
where with real eigenvalues λi and the corresponding normalized eigenvectors vi, and V† is the conjugate transpose of V; note that V is unitary, i.e., V†V = I. Substituting Eq. (21) in Eq. (19), we obtain because (VΛV†)k = VΛkV† for any nonnegative integer k. We then define a column vector whose ith element, which is referred to as z̃mi below, is the projection of r̃m/||b|| onto the direction of the ith eigenvector vi. Similarly, we also define From Eq. (25) for m = 0 and Eqs. (23) and (24), we obtain which can be written element-by-element as Because ||z̃m|| = ||r̃m||/||b||, GMRES minimizes ||z̃m|| to ||zm|| when it minimizes ||r̃m|| to ||rm|| at the mth iteration step.According to Eq. (27), |z̃mi| is minimized to 0 when p̃m has λi as a root. Thus, the most ideal p̃m has all the n eigenvalues of A as its roots, because it reduces ||z̃m|| to 0. However, p̃m has at most m roots, and m, which is the number of iteration steps, is typically far less than n. Therefore, p̃m needs to have its roots optimally placed near the eigenvalues to minimize ||z̃m||. Hence, the eigenvalue distribution of A greatly influences the convergence behavior of GMRES.
We now seek to understand the convergence behavior of GMRES for the different choices of s. We begin with s = −1. In Fig. 4 we plot rm/||b|| for s = −1 as bar graphs at the first few iteration steps. The horizontal axis in each plot represents eigenvalues. We divide the range of eigenvalues into the same 41 intervals t−20,...,t0,...,t20 used in Fig. 2; note that t0 ∋ 0. The height of the column on each interval is the norm of the projection of rm/||b|| onto the space spanned by the eigenvectors whose corresponding eigenvalues are contained in the interval. More specifically, the height of the column on tj after m iteration steps is
Note that , and thus the sum of the squares of the column heights is a direct measure of convergence.A few properties of rm/||b|| for s = −1 shown in Fig. 4 are readily predicted from the corresponding eigenvalue distribution of the matrix A presented in Fig. 2(b). For instance, A has no eigenvalues in tj<0, and therefore rm/||b|| has components only in tj≥0 throughout the iteration process as demonstrated in Fig. 4. Also, A has very few eigenvalues in t0, and thus r0/||b|| has a very weak component in t0 as can be seen in the m = 0 plot in Fig. 4.
Now, we relate rm/||b|| with the residual polynomial to explain the convergence behavior of GMRES for s = −1. The residual polynomial pm(λ), which is obtained by solving a least squares problem, is also plotted in Fig. 4 at each iteration step. As the iteration proceeds, the residual polynomial in Fig. 4 has more and more roots, but only in tj≥0, because the eigenvalues exist only in tj≥0 and the roots of residual polynomials should stay close to the eigenvalues as mentioned in the discussion following Eq. (27). Also, as Eq. (27) predicts, the columns in each plot of Fig. 4 almost vanish at the roots of the residual polynomial. Therefore, all the columns quickly shrink as the number of the roots of the residual polynomial increases in the iteration process of GMRES. The fast reduction of the column heights provides visualization of the fast convergence of GMRES for s = −1 shown in Fig. 3.
Next, we examine the convergence behavior for s = 0. Figure 5 shows rm/||b|| for s = 0 at the first few iteration steps. Note that r0/||b|| has a tall column on t0 because A has many eigenvalues in t0 as shown in Fig. 2(a). Also, the tall column on t0 persists during the initial period of the iteration process.
To explain the above convergence behavior for s = 0, we show that for a nearly positive-definite matrix the column on t0 is persistent during the initial period of the iteration process of GMRES in general. For that purpose, we compare the three polynomials p̃m ∈ 𝒫m shown in Fig. 6. The three p̃m are chosen as candidates for the residual polynomial pm for a nearly positive-definite matrix, and therefore the roots of the polynomials are placed in tj≥0 according to the discussion following Eq. (27). The three p̃m have the same roots except for their smallest roots: p̃m in Fig. 6(a) does not have its smallest root in t0, whereas p̃m in Figs. 6(b) and 6(c) do. Note that the latter two p̃m can shrink the column on t0 more effectively than the first p̃m according to Eq. (27).
However, the slopes at the roots of the latter two p̃m are steeper than the slopes at the corresponding roots of the first p̃m as shown in Fig. 6. In Appendix B, we prove rigorously that the slopes of p̃m at all roots indeed increase as the smallest root decreases in magnitude. In general, p̃m with steeper slopes at the roots oscillates with larger amplitudes around the horizontal axis because it varies faster around the axis; compare the amplitudes of oscillation in Fig. 6(a) with those in Figs. 6(b) and 6(c). The increased amplitudes of oscillation amplify |z̃mi| overall according to Eq. (27), and thus ||z̃m|| as well.
In other words, shrinking the column on t0 (by placing the smallest root of p̃m in t0) is achieved only at the penalty of amplifying the columns on tj>0. This penalty is too heavy when the columns on tj>0 constitute a considerable portion of ||z̃m||. Therefore, roots of residual polynomials are not placed in t0 until the columns on tj>0 become quite small, which results in the persistence of the column on t0 during the initial period of the iteration process.
Because the height of the column on t0 remains almost the same at the initial iteration steps of GMRES, h00 of Eq. (28), which is the initial height of this column, provides an approximate lower bound of ||zm|| = ||rm||/||b|| during the initial period of the iteration process. A more accurate lower bound is calculated as the norm of r0/||b|| projected onto the eigenspace of the eigenvalue closest to 0. For our example system, for s = 0 the calculated lower bound is 0.707. Note that ||rm||/||b|| for s = 0 indeed stagnates initially at this value in Fig. 3. For s = −1 the calculated lower bound is 4.16 × 10−7, at which ||rm||/||b|| also stagnates as shown in Fig. 3. However, this value is much smaller than the lower bound for s = 0, because for s = −1 the initial height of the column on t0 is almost negligible as shown in the m = 0 plot in Fig. 4. In fact, the value is smaller than the conventional tolerance τ = 10−6 mentioned below Eq. (17), so the stagnation does not deteriorate the convergence speed for s = −1.
Lastly, we examine the convergence behavior for s = +1. Figure 7 shows rm/||b|| for s = +1 at some first (m = 0, 4, 7, 11) and later (m = 120, 140) iteration steps. Because the matrix A for s = +1 has both positive and negative eigenvalues as indicated in Fig. 2(c), rm/||b|| has components in both tj>0 and tj<0, but in the present example the components of rm/||b|| are concentrated in tj<0 initially (m = 0 plot in Fig. 7). Thus GMRES begins with the roots of residual polynomials placed in tj<0 (m = 4 plot in Fig. 7). However, such residual polynomials have large values in tj>0, so they amplify the initially very small components of rm/||b|| in tj>0 according to Eq. (27), and eventually we reach a point where the components of rm/||b|| in tj>0 and tj<0 become comparable (m = 7 plot in Fig. 7). Afterwards, GMRES places the roots of residual polynomials in both tj>0 and tj<0 so that the components of rm/||b|| in both regions are reduced.
We note that the convergence behavior for s = +1 is initially quite similar to that for s = −1 because r0/||b|| for s = +1 has components concentrated in tj<0 and only a very weak component in t0. Therefore, ||rm||/||b|| reduces quickly for s = +1 without stagnation during the initial period of the iteration process as shown in Fig. 3.
During the later period of the iteration process, however, the reduction of ||rm||/||b|| for s = +1 slows down significantly, and eventually s = +1 produces the slowest convergence among the three values of s as shown in Fig. 3. The slow reduction of ||rm||/||b|| is due to the very persistent column on t0: comparing the plots for m = 120 and m = 140 in Fig. 7 shows that the column barely reduces for 20 iteration steps.
We have shown earlier that the column on t0 is quite persistent for a nearly positive-definite matrix. The argument relied on the properties proved in Appendix B about a polynomial p̃m ∈ 𝒫m with only positive roots. We can easily extend the proof in the appendix to p̃m with both positive and negative roots, and then show that the column on t0 is persistent also for a strongly indefinite matrix, which explains the slow convergence for s = +1 described above. However, the explanation is insufficient to explain why the convergence is much slower for s = +1 than for s = 0 as indicated in Fig. 3.
Here, we show that the column on t0 is in fact even more persistent for a strongly indefinite matrix than for a nearly positive-definite matrix. For that purpose, we compare the two polynomials p̃m ∈ 𝒫m shown in Fig. 8. As can be seen from the locations of their roots, they are candidates for the residual polynomials for different matrices: p̃m shown in Fig. 8(a) is appropriate for a nearly positive-definite matrix (referred to as Adef below), and p̃m shown in Fig. 8(b) is appropriate for a strongly indefinite matrix (referred to as Aind below). Moreover, we choose these two p̃m to have the same smallest-magnitude root ζ0 in t0. Being elements of 𝒫m, both p̃m satisfy Eq. (18). Hence, we have |p̃′m (ζ0)| ≃ 1/|ζ0| for both p̃m, where p̃′m is the first derivative of p̃m.
Now, we note that |p̃′m| evaluated at a root of p̃m tends to decrease as the root gets closer to the median of the roots; see Appendix C for a more rigorous explanation. Hence, |p̃′m| ≤ 1/|ζ0| tends to hold at most roots of p̃m for Adef, because ζ0 is one of the farthest roots from the median of the roots. On the other hand, |p̃′m| ≥ 1/|ζ0| tends to hold at most roots of p̃m for Aind, because ζ0 is one of the closest roots to the median of the roots. Therefore, p̃m for Aind has much steeper slopes at most roots than p̃m for Adef in general, and thus has larger amplitudes of oscillation around the horizontal axis, as demonstrated in Fig. 8.
Combined with Eq. (27), the above argument shows that shrinking the column on t0 (by placing the smallest-magnitude root of p̃m in t0) increases ||zm|| much more for a strongly indefinite matrix than for a nearly positive-definite matrix. Therefore, the column on t0 should be much more persistent for a strongly indefinite matrix than for a nearly positive-definite matrix in general, which explains the much slower convergence for s = +1 than for s = 0 in Fig. 3.
In summary of this section, we have shown that s = −1 produces the most superior convergence behavior; s = 0 induces stagnation during the initial period of the iteration process due to the high multiplicity of eigenvalues near zero; s = +1 leads to the slowest convergence overall due to the strongly indefinite matrix. We have provided a graphical explanation of the difference in the convergence behavior of GMRES, for which a strong theoretical basis exists, using a simple system of a homogeneous dielectric medium.
The arguments provided in this section can be easily extended to show that s = −1 is indeed optimal among all values in general. Compared with the case of s = −1, according to Eq. (14), for s > 0 A is always more strongly indefinite and therefore the convergence should be slower; for −1 < s < 0 A has more eigenvalues clustered near zero and thus the initial stagnation period should be longer; for s < −1 A has a wider eigenvalue value range, so the condition number of A should be larger and the convergence should be slower [23]. In other words, s = −1 is the value that leaves A nearly positive-definite while removing the eigenvalues clustered near zero sufficiently without increasing the condition number. With separate numerical experiments we have verified that s = −1 is indeed superior to values other than s = 0 and s = +1 as well.
In the next section we will see that the difference in the convergence behavior for different choices of s is in fact quite general in practical situations.
4. Convergence behavior of QMR for 3D inhomogeneous systems
In this section, we solve Eq. (7) for 3D inhomogeneous systems of practical interest by an iterative method, and demonstrate that s = −1 still induces faster convergence than s = 0 and s = +1. We note that the systems examined in this section are inhomogeneous and have complex ε in general. The analyses in Secs. 2 and 3, therefore, do not hold strictly here. Nevertheless, we will see that the analyses for the homogeneous system in the previous sections provide insight in understanding the convergence behavior for more general systems examined in this section.
The three 3D inhomogeneous systems we consider are illustrated in the first row of Fig. 9. To prevent reflection of EM waves from boundaries, we enclose each system by the stretched-coordinate perfectly matched layer (SC-PML), because SC-PML produces a much better-conditioned matrix than the more commonly used uniaxial PML (UPML) [32]. For each system, we construct three systems of linear equations Ax = b corresponding to s = −1, 0, +1 by the FDFD method. The number of the grid cells Nx, Ny, and Nz and the grid cell sizes Δx, Δy, and Δz of the finite-difference grid used to discretize each system are shown in Table 2. Considering the parameters summarized in Table 3 and the condition (16), all the three systems are in the low-frequency regime.
The constructed systems of linear equations are solved by the quasi-minimal residual (QMR) method [37], which is another Krylov subspace method. GMRES that was used in Sec. 3 to solve a 2D problem is not suitable for 3D problems here because it requires too much memory [5].
The second row of Fig. 9 shows the convergence behavior of QMR for the three systems.
Note that for all three systems the choice of s = −1 results in the fastest convergence, and the choice of s = +1 barely leads to convergence. The three systems shown in Fig. 9 are chosen deliberately to include different materials such as dielectrics and metals and geometries with different degrees of complexity. Therefore, Fig. 9 suggests that the superiority of s = −1 over both s = 0 and s = +1 is quite general.
Even though both the iterative method and the systems in this section are significantly different from those in the previous section, the convergence behaviors are very similar. We explain the resemblance as follows.
The matrix for an inhomogeneous system is actually not much different from that for a homogenous system in many cases. Indeed, most inhomogeneous systems consist of several homogeneous subdomains. Inside each homogeneous subdomain of such an inhomogeneous system, the differential operator in Eq. (7) for the inhomogeneous system is the same as the differential operator (9) for a homogeneous system, whereas at the interfaces between the subdomains it is not. Nevertheless, the number of finite-difference grid points assigned at the interfaces is usually much smaller than that of the grid points assigned inside the homogeneous subdomains. Therefore most rows of the matrix for the inhomogeneous system should be the same as those for a homogeneous system discretized on the same grid.
In addition, the differential operator (9) for a homogeneous system is nearly Hermitian in the low-frequency regime even if ε is complex, because it is approximated well by the Hermitian operator (10).
Hence, the matrix for an inhomogeneous system is actually quite similar to the nearly Hermitian matrix for a homogeneous system. Because QMR reduces to GMRES for Hermitian matrices in exact arithmetic [38], it is reasonable that the convergence behavior of QMR for an inhomogeneous system is similar to that of GMRES for a homogeneous system.
The matrix for an inhomogeneous system deviates more from that for a homogeneous system as the number of homogeneous subdomains increases, because then the number of grid points assigned at the interfaces between homogeneous subdomains increases. Therefore, we can expect that the convergence behavior for an inhomogeneous system would deviate from that for a homogeneous system as the number of homogeneous subdomains increases. Such deviation is demonstrated in Fig. 9(c), where the system has many metallic pillars; note that the convergence behavior for s = −1 is no longer very different from that for s = 0 in this case.
5. Conclusion and final remarks
We have introduced a new method to accelerate the convergence of iterative solvers of the frequency-domain Maxwell’s equations in the low-frequency regime. The method solves a new equation that is modified from the original Maxwell’s equations using the continuity equation.
The operator of the newly formulated equation does not have the high multiplicity of near-zero eigenvalues that makes the convergence stagnate for the original operator. At the same time, the new operator is nearly positive-definite, so it avoids the long-term slow convergence that indefinite operators suffer from.
In this paper, we have considered only nonmagnetic materials (μ = μ0). For magnetic materials (μ ≠ μ0), we note that a similar equation
which can also be formulated from Maxwell’s equations and the continuity equation, can be used instead of Eq. (7) to accelerate the convergence of iterative methods. We leave the discussion on the optimal value of s in this equation for future work.Because our method achieves accelerated convergence by formulating a new equation, it can be easily combined with other acceleration techniques such as preconditioning and better iterative methods.
Appendix A: Eigenvalues and eigenfunctions of ∇×(∇× ) and ∇(∇· )
Using the k-space representations of the operators, in this section we derive the eigenvalues Eq. (12) of ∇×(∇× ) and Eq. (13) of ∇(∇· ) as well as their corresponding eigenfunctions.
Because both ∇× (∇× ) and ∇(∇· ) are translationally invariant, their eigenfunctions have the form [39]
where r represents position, k = x̂kx + ŷky + ẑkz is a real constant wavevector, and is a k-dependent vector.We reformulate the eigenvalue equations ∇×(∇×F) = λF and ∇(∇·F) = λF by substituting Eq. (30) for F. Then, the eigenvalue equation for ∇×(∇× ) is
and the eigenvalue equation for ∇(∇· ) isBy solving Eqs. (31) and (32) for a given k, we obtain
which is Eq. (12), as the eigenvalues of ∇×(∇× ), and which is Eq. (13), as the eigenvalues of ∇(∇· ), and Eq. (30) with as the eigenfunctions corresponding to both Eqs. (33) and (34).We note from Eqs. (33) and (34) that ∇×(∇× ) and ∇(∇· ) are positive-semidefinite and negative-semidefinite, respectively.
Appendix B: Effect of the smallest root of p̃m ∈ 𝒫m on the slopes at the roots
In this section, we show that the slopes at the roots of a polynomial p̃m ∈ 𝒫m with all positive roots become steeper when the smallest root decreases in magnitude. This behavior is illustrated in Fig. 6.
Since p̃m ∈ 𝒫m satisfies the condition (18), it can be factored as
where dm ≤ m is the degree of p̃m and ζi’s are the roots of p̃m. Hence, the slope of p̃m at a root ζk isNow, suppose that 0 < ζ1 < ··· < ζdm. We can easily show that |p̃′m(ζk)| increases for any k when ζ1 decreases toward zero (while remaining positive) as follows. For k = 1, we have
which clearly increases as ζ1 decreases to 0. For k > 1, we have where the parentheses enclose the only quantity that is dependent on ζ1. We can therefore see that |p̃′m (ζk)| increases as ζ1 decreases for k > 1 as well. Therefore, for a given p̃m ∈ 𝒫m whose roots are all positive, the slopes of p̃m at the roots become steeper if the smallest root decreases in magnitude while remaining positive. This situation is illustrated by the transition from Fig. 6(a) to Fig. 6(b).The slopes at the roots also become steeper when the originally positive ζ1 is replaced by a negative value, as long as the negative value is smaller in magnitude than the original ζ1. Replacing the originally positive ζ1 with a negative quantity that is smaller in magnitude is equivalent to first replacing ζ1 with a smaller positive value and then flipping its sign. Because we have already shown above that the slopes get steeper when the originally positive ζ1 is replaced by a smaller positive value, it is sufficient to show that flipping the sign of ζ1 makes the slopes even steeper. For a negative ζ1, the slopes at the roots are
and for k > 1. These slopes are steeper than Eqs. (38) and (39), respectively, which are the slopes for a positive ζ1 with the same magnitude. Therefore, for a given p̃m ∈ 𝒫m whose roots are all positive, the slopes of p̃m at the roots become steeper if the smallest root is replaced by the one that is smaller in magnitude but negative. This situation is illustrated by the transition from Fig. 6(a) to Fig. 6(c).Appendix C: Trend in the slopes of a polynomial at the roots
In this section, we consider a polynomial p with all real roots, and show that the slope of p evaluated at a root closer to the median of the roots tends to be gentler than the slope evaluated at a root farther away from the median of the roots. This behavior is illustrated in Fig. 8.
Consider a polynomial of degree m,
with ζ1 < ··· < ζm. The slope of p at a root ζk isNow, we evaluate |p′(ζk+1)|/|p′(ζk)|. We first consider the case where the roots are evenly spaced, i.e., ζi+1 − ζi = (const.), for which we have
Equation (44) is an increasing function of k for 1 ≤ k ≤ m −1, and it is less than 1 for k < m/2 and greater than 1 for k > m/2. Therefore, |p′(ζk)| is largest for k = 1 and k = m, and it decreases as k becomes closer to k = ⌈(m + 1)/2⌉ and k = ⌊(m + 1)/2⌋, which are the medians of the indices. In other words, for p with evenly spaced roots, the slopes of p get gentler at the roots closer to the median of the roots.It is reasonable to expect that the above trend in the slopes also holds for p with unevenly spaced roots, unless the unevenness is too severe. To verify the expectation, we examine |p′(ζk+1)|/|p′(ζk)| without assuming ζi+1 − ζi = (const.):
Here, the factors within the first (second) brackets are always greater (less) than 1, so the number of factors greater (less) than 1 increases (decreases) for increasing k. Hence, |p′(ζk+1)|/|p′(ζk)| tends to be less than 1 for smaller k, and it tends to be greater than 1 for larger k. This means that as k increases |p′(ζk)| tends to decrease first and then tends to increase. Therefore, even if the roots of p are unevenly spaced, the slopes of p tend to get gentler at the roots closer to the median of the roots.Acknowledgment
We thank Prof. Michael Saunders for helpful discussions about iterative methods. This work was supported by the National Science Foundation (Grant No. DMS-0968809), the AFOSR MURI program (Grant No. FA9550-09-1-0704), and the Interconnect Focus Center, funded under the Focus Center Research Program, a Semiconductor Research Corporation entity. Wonseok Shin also acknowledges the support of Samsung Scholarship.
References and links
1. N. J. Champagne II, J. G. Berryman, and H. M. Buettner, “FDFD: A 3D finite-difference frequency-domain code for electromagnetic induction tomography,” J. Comput. Phys. 170, 830–848 (2001). [CrossRef]
2. G. Veronis and S. Fan, “Overview of simulation techniques for plasmonic devices,” in “Surface Plasmon Nanophotonics,” M. Brongersma and P. Kik, eds. (Springer, 2007), pp. 169–182. [CrossRef]
3. U. S. Inan and R. A. Marshall, Numerical Electromagnetics: The FDTD Method (Cambridge University, 2011). Ch. 14. [CrossRef]
4. J.-M. Jin, The Finite Element Method in Electromagnetics, 2nd ed. (Wiley, 2002).
5. V. Simoncini and D. B. Szyld, “Recent computational developments in Krylov subspace methods for linear systems,” Numer. Linear Algebra Appl. 14, 1–59 (2007). [CrossRef]
6. W. Cai, W. Shin, S. Fan, and M. L. Brongersma, “Elements for plasmonic nanocircuits with three-dimensional slot waveguides,” Adv. Mater. 22, 5120–5124 (2010). [CrossRef] [PubMed]
7. G. Veronis and S. Fan, “Bends and splitters in metal-dielectric-metal subwavelength plasmonic waveguides,” Appl. Phys. Lett. 87, 131102–3 (2005). [CrossRef]
8. L. Verslegers, P. Catrysse, Z. Yu, W. Shin, Z. Ruan, and S. Fan, “Phase front design with metallic pillar arrays,” Opt. Lett. 35, 844–846 (2010). [CrossRef] [PubMed]
9. J. T. Smith, “Conservative modeling of 3-D electromagnetic fields, Part II: Biconjugate gradient solution and an accelerator,” Geophys. 61, 1319–1324 (1996). [CrossRef]
10. G. A. Newman and D. L. Alumbaugh, “Three-dimensional induction logging problems, Part 2: A finite-difference solution,” Geophys. 67, 484–491 (2002). [CrossRef]
11. C. J. Weiss and G. A. Newman, “Electromagnetic induction in a generalized 3D anisotropic earth, Part 2: The LIN preconditioner,” Geophys. 68, 922–930 (2003). [CrossRef]
12. V. L. Druskin, L. A. Knizhnerman, and P. Lee, “New spectral Lanczos decomposition method for induction modeling in arbitrary 3-D geometry,” Geophys. 64, 701–706 (1999). [CrossRef]
13. E. Haber, U. M. Ascher, D. A. Aruliah, and D. W. Oldenburg, “Fast simulation of 3D electromagnetic problems using potentials,” J. Comput. Phys. 163, 150–171 (2000). [CrossRef]
14. J. Hou, R. Mallan, and C. Torres-Verdin, “Finite-difference simulation of borehole EM measurements in 3D anisotropic media using coupled scalar-vector potentials,” Geophys. 71, G225–G233 (2006). [CrossRef]
15. R. Hiptmair, F. Kramer, and J. Ostrowski, “A robust Maxwell formulation for all frequencies,” IEEE Trans. Magn. 44, 682–685 (2008). [CrossRef]
16. K. Beilenhoff, W. Heinrich, and H. Hartnagel, “Improved finite-difference formulation in frequency domain for three-dimensional scattering problems,” IEEE Trans. Microwave Theory Tech. 40, 540–546 (1992). [CrossRef]
17. A. Christ and H. L. Hartnagel, “Three-dimensional finite-difference method for the analysis of microwave-device embedding,” IEEE Trans. Microwave Theory Tech. 35, 688–696 (1987). [CrossRef]
18. At the final stage of our work, we were made aware of a related work by M. Kordy, E. Cherkaev, and P. Wannamaker, “Schelkunoff potential for electromagnetic field: proof of existence and uniqueness” (to be published), where an equation similar to our Eq. (7) with s= −1 was developed.
19. A. Jennings, “Influence of the eigenvalue spectrum on the convergence rate of the conjugate gradient method,” IMA J. Appl. Math. 20, 61–72 (1977). [CrossRef]
20. A. van der Sluis and H. van der Vorst, “The rate of convergence of conjugate gradients,” Numer. Math. 48, 543–560 (1986). [CrossRef]
21. S. L. Campbell, I. C. F. Ipsen, C. T. Kelley, and C. D. Meyer, “GMRES and the minimal polynomial,” BIT Numer. Math. 36, 664–675 (1996). [CrossRef]
22. S. Goossens and D. Roose, “Ritz and harmonic Ritz values and the convergence of FOM and GMRES,” Numer. Linear Algebra Appl. 6, 281–293 (1999). [CrossRef]
23. M. Benzi, G. H. Golub, and J. Liesen, “Numerical solution of saddle point problems,” Acta Numer. 14, 1–137 (2005). Sec. 9.2. [CrossRef]
24. B. Beckermann and A. B. J. Kuijlaars, “Superlinear convergence of conjugate gradients,” SIAM J. Numer. Anal. 39, 300–329 (2001). [CrossRef]
25. B. Beckermann and A. B. J. Kuijlaars, “On the sharpness of an asymptotic error estimate for conjugate gradients,” BIT Numer. Math. 41, 856–867 (2001). [CrossRef]
26. O. Axelsson, “Iteration number for the conjugate gradient method,” Math. Comput. Simulat. 61, 421–435 (2003). [CrossRef]
27. J. P. Webb, “The finite-element method for finding modes of dielectric-loaded cavities,” IEEE Trans. Microwave Theory Tech. 33, 635–639 (1985). [CrossRef]
28. F. Kikuchi, “Mixed and penalty formulations for finite element analysis of an eigenvalue problem in electromagnetism,” Comput. Method Appl. Mech. Eng. 64, 509–521 (1987). [CrossRef]
29. D. A. White and J. M. Koning, “Computing solenoidal eigenmodes of the vector Helmholtz equation: a novel approach,” IEEE Trans. Magn. 38, 3420–3425 (2002). [CrossRef]
30. N. W. Ashcroft and N. D. Mermin, Solid State Physics, 1st ed. (SaundersCollege, 1976), Ch. 8.
31. To obtain , take the first equation of Eq. (4.29) in [32] and then multiply the extra factor μ= μ0to account for the difference between Eq. (4.17a) in [32] and Eq. (1) in the present paper.
32. W. Shin and S. Fan, “Choice of the perfectly matched layer boundary condition for frequency-domain Maxwell’s equations solvers,” J. Comput. Phys. 231, 3406–3431 (2012). [CrossRef]
33. Y. Saad and M. H. Schultz, “GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems,” SIAM J. Sci. Stat. Comp. 7, 856–869 (1986). [CrossRef]
34. P. B. Johnson and R. W. Christy, “Optical constants of the noble metals,” Phys. Rev. B 6, 4370–4379 (1972). [CrossRef]
35. E. D. Palik, ed., Handbook of Optical Constants of Solids (Academic Press, 1985).
36. D. R. Lide, ed., CRC Handbook of Chemistry and Physics, 88th ed. (CRC, 2007).
37. R. Freund and N. Nachtigal, “QMR: a quasi-minimal residual method for non-hermitian linear systems,” Numer. Math. 60, 315–339 (1991). [CrossRef]
38. R. W. Freund, G. H. Golub, and N. M. Nachtigal, “Iterative solution of linear systems,” Acta Numer. 1, 57–100 (1992). Secs. 2.4 and 3.3. [CrossRef]
39. J. W. Goodman, Introduction to Fourier Optics(Roberts & Company Publishers, 2005), 3rd ed. Sec. 2.3.2.