
Accelerated solution of the frequency-domain Maxwell’s equations by engineering the eigenvalue distribution of the operator


Abstract

We introduce a simple method to accelerate the convergence of iterative solvers of the frequency-domain Maxwell’s equations for deep-subwavelength structures. Using the continuity equation, the method eliminates the high multiplicity of near-zero eigenvalues of the operator while leaving the operator nearly positive-definite. The impact of the modified eigenvalue distribution on the accelerated convergence is explained by visualizing residual vectors and residual polynomials.

© 2013 Optical Society of America

1. Introduction

To understand electromagnetic (EM) and optical phenomena, it is essential to solve Maxwell’s equations efficiently. In the frequency domain, assuming a time dependence e^{+iωt} and nonmagnetic materials, Maxwell’s equations reduce to

\[ \nabla \times \nabla \times \mathbf{E} - \omega^2 \mu_0 \varepsilon \mathbf{E} = -i \omega \mu_0 \mathbf{J}, \tag{1} \]
where ε is the electric permittivity (which can be complex); μ0 is the magnetic permeability of vacuum; ω is the angular frequency; E and J are the electric field and electric current source density, respectively.

To solve Eq. (1) numerically, one can use a method such as the finite-difference frequency-domain (FDFD) method [1–3] or the finite element method (FEM) [4] to construct a large system of linear equations

\[ A \mathbf{x} = \mathbf{b}, \tag{2} \]
where A is a matrix representing the operator [∇×(∇× ) − ω²μ₀ε]; x is an unknown column vector representing E; b is a column vector representing −iωμ₀J. The matrix A thus constructed is sparse (with only 13 nonzero elements per row when generated by the FDFD method) and typically very large (often with more than 10 million rows and columns for three-dimensional (3D) problems). To solve a system with such a large and sparse matrix, iterative methods are usually preferred to direct methods [5].
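As a minimal sketch of this workflow (not the FDFD solver used in this paper), the following assembles a sparse stand-in matrix and solves it with a Krylov subspace method via SciPy; the tridiagonal matrix below is an arbitrary, easy-to-assemble substitute for the actual curl-curl matrix of Eq. (2).

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Stand-in sparse system: a diagonally dominant tridiagonal matrix,
# NOT the FDFD curl-curl matrix of Eq. (2); it only illustrates the
# sparse-iterative workflow.
n = 10_000
A = sp.diags([-np.ones(n - 1), 4.0 * np.ones(n), -np.ones(n - 1)],
             [-1, 0, 1], format="csr")
b = np.ones(n)

# Krylov subspace solver; `rtol` is named `tol` in older SciPy.
x, info = spla.gmres(A, b, rtol=1e-8)
print(info == 0, np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```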

However, in the “low-frequency regime” where the wavelength is much longer than the grid cell size Δ, it is well known that convergence is quite slow when iterative methods are directly applied to solve Eq. (1). The low-frequency regime arises, for example, in nanophotonics [6–8] and geophysics [9–11], where structures have feature sizes at deep-subwavelength scale; it will be defined more rigorously in Sec. 2. The huge null space of the operator ∇×(∇× ) was shown to be the origin of the slow convergence [10,11], and several techniques to improve the convergence speed have been developed.

The first class of techniques is based on the Helmholtz decomposition, which decomposes the E-field as E = Ψ + ∇φ, where Ψ is a divergence-free vector field and φ is a scalar field [9–15]. Because ∇ · Ψ = 0, Eq. (1) is written as

\[ -\nabla^2 \boldsymbol{\Psi} - \omega^2 \mu_0 \varepsilon (\boldsymbol{\Psi} + \nabla \varphi) = -i \omega \mu_0 \mathbf{J}, \tag{3} \]
where the operator ∇×(∇× ), which has a huge null space, is replaced with the negative Laplacian −∇², which is positive-definite for appropriate boundary conditions and thus has the smallest possible null space. However, these techniques either solve an extra equation for the extra unknown φ at every iteration step [9–12], which can be time-consuming, or increase the number of rows and columns of the matrix by about 33% [13–15], which requires more memory.

The second class of techniques utilizes the charge-free condition

\[ \nabla \cdot (\varepsilon \mathbf{E}) = 0. \tag{4} \]
The condition (4) holds at every source-free (i.e., J = 0) position, where Eq. (1) can be modified to
\[ \nabla \times \nabla \times \mathbf{E} + s \nabla \big[ \nabla \cdot \big( (\varepsilon/\varepsilon_0) \mathbf{E} \big) \big] - \omega^2 \mu_0 \varepsilon \mathbf{E} = 0 \tag{5} \]
for an arbitrary constant s; note that the right-hand side is 0 because J = 0. In this class of techniques, Eqs. (1) and (5) are solved at positions with and without sources, respectively.

Reference [16] applied the above technique with s = +1 to boundary value problems described in [17] and achieved accelerated convergence. Such boundary value problems satisfied J = 0 everywhere, so Eq. (5) was solved throughout the entire simulation domain.

However, Ref. [16] did not conduct a detailed comparison of convergence speed between different values of s. It also did not report whether its technique leads to accelerated convergence for problems with sources, even though many problems have nonzero electric current sources J inside the simulation domain. Reference [1] applied the technique with s = +1 to problems with sources, but only in order to suppress spurious modes rather than to accelerate convergence.

In this paper, we develop a modification of Eq. (1) that improves convergence speed even if electric current sources J exist inside the simulation domain [18]. Unlike the previous technique that made the modification only at source-free positions, our technique modifies Eq. (1) everywhere including positions with sources. For the modification, we utilize the continuity equation

\[ i \omega \rho + \nabla \cdot \mathbf{J} = 0, \quad \text{or} \quad \nabla \cdot (\varepsilon \mathbf{E}) = \frac{i}{\omega} \nabla \cdot \mathbf{J}, \tag{6} \]
which can be derived by taking the divergence of Eq. (1). When Eq. (6) is manipulated appropriately and then added to Eq. (1), we obtain
\[ \nabla \times \nabla \times \mathbf{E} + s \nabla \big[ \varepsilon^{-1} \nabla \cdot (\varepsilon \mathbf{E}) \big] - \omega^2 \mu_0 \varepsilon \mathbf{E} = -i \omega \mu_0 \mathbf{J} + s \frac{i}{\omega} \nabla \big[ \varepsilon^{-1} \nabla \cdot \mathbf{J} \big] \tag{7} \]
for a constant s. The modified equation (7) is the equation to solve in this paper.
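For clarity, the manipulation referred to above is the following: dividing Eq. (6) by ε, taking the gradient, and multiplying by s yields the identity

\[ s \nabla \big[ \varepsilon^{-1} \nabla \cdot (\varepsilon \mathbf{E}) \big] = s \frac{i}{\omega} \nabla \big[ \varepsilon^{-1} \nabla \cdot \mathbf{J} \big], \]

and adding this identity to Eq. (1) gives Eq. (7).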

The solution E-field of Eq. (7) is the same as the solution of the original equation (1) regardless of the value of s, because the solution of Eq. (1) always satisfies Eq. (6). However, the choice of s affects the convergence speed of iterative methods significantly. In this paper, we demonstrate that s = −1 induces faster convergence speed than other values of s by comparing the convergence behavior of iterative methods for s = −1, 0, +1; the latter two values of s are of particular interest, because s = 0 reduces Eq. (7) to the original equation (1) and s = +1 is the value that Ref. [16] used in Eq. (5), which is similar to Eq. (7).

We also show that the difference in convergence behavior results from the different eigenvalue distributions of the operators for different s. There are many general mathematical studies about the dependence of the convergence behavior on the eigenvalue distribution [19–26]. Our aim here is instead to provide an intuitive understanding of the convergence behavior specifically for the operator of Eq. (7). For this purpose, at each iteration step we visualize the residual vector and residual polynomial, which are widely used concepts to explain the convergence behavior of iterative methods [5] and are also defined briefly in Sec. 3. As a result, we find that convergence speed deteriorates substantially for s = 0 because the operator has eigenvalues clustered near zero, and for s = +1 because the operator is strongly indefinite.

The rest of this paper is organized as follows. In Sec. 2 we investigate the eigenvalue distribution of the operator in Eq. (7) for s = 0, −1, +1 for a simple homogeneous system. We also define the low-frequency regime rigorously in the section. In Sec. 3, we relate the eigenvalue distribution with the convergence behavior of an iterative method. In Sec. 4, we solve Eq. (7) for a wide range of realistic 3D problems to compare the convergence behavior of an iterative method for the three values of s, and we conclude in Sec. 5.

2. Eigenvalue distribution of the operator for a homogeneous system

In this section, we consider the operator in Eq. (7) for a homogeneous system and show that the properties of the eigenvalue distribution of the operator strongly depend on the value of s. The impact of s on the eigenvalue distribution has been studied in detail in the literature of the deflation method (also known as the penalty method) [27–29]. Here we only highlight those aspects that are important for the present study.

For a homogeneous system where ε is constant, Eq. (7) is simplified to

\[ \nabla \times \nabla \times \mathbf{E} + s \nabla (\nabla \cdot \mathbf{E}) - \omega^2 \mu_0 \varepsilon \mathbf{E} = -i \omega \mu_0 \mathbf{J} + s \frac{i}{\omega \varepsilon} \nabla (\nabla \cdot \mathbf{J}), \tag{8} \]
where the operator
\[ T = \nabla \times (\nabla \times \,) + s \nabla (\nabla \cdot \,) - \omega^2 \mu_0 \varepsilon \tag{9} \]
is Hermitian for real ε. Because ε is constant in this section, the eigenvalue distribution of T is shifted from the eigenvalue distribution of a Hermitian operator
\[ T_0 = \nabla \times (\nabla \times \,) + s \nabla (\nabla \cdot \,) \tag{10} \]
by a constant −ω²μ₀ε. In the low-frequency regime such a shift is negligible, and thus the eigenvalue distribution of T0 approximates that of T very well. Hence, we examine the eigenvalue distribution of T0 below to investigate the eigenvalue distribution of T.

In Appendix A, we show that Fk e^{ik·r} with

\[ \mathbf{F}_\mathbf{k} = \begin{bmatrix} k_x \\ k_y \\ k_z \end{bmatrix}, \; \begin{bmatrix} -k_z \\ 0 \\ k_x \end{bmatrix}, \; \begin{bmatrix} k_y \\ -k_x \\ 0 \end{bmatrix} \tag{11} \]

are the three eigenfunctions of both ∇×(∇× ) and ∇(∇· ) for each wavevector k. We also show in the same appendix that the corresponding three eigenvalues are

\[ \lambda = 0, \; |\mathbf{k}|^2, \; |\mathbf{k}|^2 \tag{12} \]

for ∇×(∇× ), and

\[ \lambda = -|\mathbf{k}|^2, \; 0, \; 0 \tag{13} \]

for ∇(∇· ). Therefore, for each wavevector k, T0 has the three eigenvalues

\[ \lambda = -s|\mathbf{k}|^2, \; |\mathbf{k}|^2, \; |\mathbf{k}|^2. \tag{14} \]

Equation (14) indicates that the eigenvalue distribution of T0 is greatly affected by the value of s. Specifically, the multiplicity of the eigenvalue 0 depends critically on whether s is 0 or not: for s = 0, T0 has a very high multiplicity of the eigenvalue 0 because Eq. (14) has 0 as an eigenvalue for every k, whereas for s ≠ 0, T0 has no such high multiplicity of the eigenvalue 0. The definiteness of T0 also depends on the value of s: for s ≤ 0, T0 is positive-semidefinite because the three eigenvalues in Eq. (14) are always nonnegative, whereas for s > 0, T0 is indefinite because Eq. (14) has both positive and negative numbers as eigenvalues. The different properties of the eigenvalue distributions of T0 for s = 0, s < 0, and s > 0 are summarized in Table 1.
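The three eigenvalues in Eq. (14) can be checked numerically from the 3 × 3 k-space representations derived in Appendix A; a minimal sketch in Python follows (the wavevector below is an arbitrary choice).

```python
import numpy as np

# Check Eq. (14) for one wavevector k, using the k-space 3x3
# representations of curl-curl and grad-div from Appendix A.
k = np.array([1.3, -0.7, 2.1])         # arbitrary real wavevector
k2 = k @ k
CC = k2 * np.eye(3) - np.outer(k, k)   # curl-curl:  |k|^2 I - k k^T
GD = -np.outer(k, k)                   # grad-div:  -k k^T

for s in (0.0, -1.0, +1.0):
    T0 = CC + s * GD
    print(s, np.sort(np.linalg.eigvalsh(T0)))
    # expected: {-s|k|^2, |k|^2, |k|^2} up to ordering
```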


Table 1. Properties of the eigenvalue distributions of T0 for different s. Depending on the sign of s, T0 has very different eigenvalue distributions in terms of the multiplicity of the eigenvalue 0 and the definiteness of T0.

The above description of the eigenvalue distributions of T0 should approximately hold for the eigenvalue distributions of T as well in the low-frequency regime, as mentioned in the discussion of Eqs. (9) and (10). Moreover, even though T is a differential operator defined in an infinite space, it turns out that the description also applies to the matrix A discretized from T that is defined in a spatially bounded simulation domain.

To demonstrate, we numerically calculate the eigenvalues of A for the two-dimensional (2D) system shown in Fig. 1, a square domain filled with vacuum. The domain is discretized on a finite-difference grid with Nx × Ny = 50 × 50 cells and cell size Δ = 2 nm. Therefore, the matrix A for each s has 3NxNy = 7500 rows and columns, where the extra factor 3 accounts for the three Cartesian components of the E-field. We choose ω corresponding to the vacuum wavelength λ0 = 1550 nm, which puts the system in the low-frequency regime, as will be seen at the end of this section. The matrices A are constructed for three values of s: 0, −1, and +1, each representing one of the three categories of s in Table 1.
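Before quoting the numerical results, we note that this spectrum can be sketched analytically, assuming (as is the case for a uniform periodic finite-difference grid) that the discrete operators are diagonalized by discrete plane waves, so that Eq. (14) applies with |k|² replaced by the second-order finite-difference symbol:

```python
import numpy as np

# Sketch: the spectrum of Fig. 2, assuming the periodic uniform grid
# is diagonalized by discrete plane waves so that Eq. (14) holds with
# |k|^2 -> (4/Δ^2) [sin^2(k_x Δ/2) + sin^2(k_y Δ/2)].
N, delta = 50, 2.0                       # 50 x 50 cells, Δ = 2 nm
k = 2 * np.pi * np.fft.fftfreq(N, d=delta)
KX, KY = np.meshgrid(k, k, indexing="ij")
k2 = (4 / delta**2) * (np.sin(KX * delta / 2)**2
                       + np.sin(KY * delta / 2)**2)

for s in (0, -1, +1):
    lam = np.concatenate([-s * k2.ravel(), k2.ravel(), k2.ravel()])
    print(f"s = {s:+d}: {lam.size} eigenvalues in "
          f"[{lam.min():.2f}, {lam.max():.2f}] nm^-2")
```

For s = 0 this reproduces the 2500-fold eigenvalue 0 and the maximum eigenvalue 8/Δ² = 2 nm⁻² visible in Fig. 2(a).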


Fig. 1 A 2D square domain filled with vacuum (ε = ε0) for which the eigenvalue distribution of T is calculated numerically for s = 0, −1, +1. The domain is homogeneous in the z-direction, whereas its x- and y-boundaries are subject to periodic boundary conditions. The square domain is discretized on a finite-difference grid with cell size Δ = 2 nm. The domain is composed of 50 × 50 grid cells, which lead to 7500 eigenvalues in total. A vacuum wavelength λ0 = 1550 nm, which puts the system in the low-frequency regime, is assumed for the electric current source to be used in Sec. 3.


The distributions of the numerically calculated eigenvalues of A for s = 0, −1, +1 are shown as three plots in Fig. 2. In each plot, the horizontal axis represents eigenvalues, and it is divided into 41 intervals t−20,...,t0,...,t20 where t0 ∋ 0. The height of the column on each interval corresponds to the number of the eigenvalues in the interval.


Fig. 2 The eigenvalue distribution of A discretized from T for (a) s = 0, (b) s = −1, and (c) s = +1 for the vacuum-filled domain illustrated in Fig. 1. All 7500 eigenvalues λ of A are calculated for each s and categorized into 41 intervals on the horizontal axis, which represents the range of the eigenvalues; the unit of the horizontal axis is nm⁻². The height of the column on each interval represents the number of the eigenvalues in the interval. In (b) and (c), the black dots indicate the eigenvalue distribution for s = 0 shown in (a). The vertical axes are broken due to the extremely tall column at λ ≃ 0 in (a). The local maxima at λ = ±1 nm⁻² are Van Hove singularities [30] arising from the lattice structure imposed by the finite-difference grid.


The eigenvalue distributions of A shown in Fig. 2 agree well with the description of the eigenvalues of T0 in Table 1: the very tall column on t0 in Fig. 2(a) indicates the very high multiplicity of λ ≃ 0 for s = 0, and the eigenvalues distributed over tj<0 and tj>0 in Fig. 2(c) indicate a strongly indefinite operator for s = +1. In addition, the height of the column on t0 in Fig. 2(a) is about 2500, or one third of the total number of eigenvalues, which agrees with Eq. (14) for s = 0, where one of the three eigenvalues is 0 for each k; the columns on tj>0 are about 1.5 times taller in Fig. 2(b) than in Fig. 2(a), which also agrees with Eq. (14), where the number of eigenvalues equal to |k|² increases from two for s = 0 to three for s = −1.

We end the section by providing a quantitative definition of the low-frequency regime. Suppose that A0 is the matrix discretized from T0 of Eq. (10). For s = 0, the eigenvalues of A0 range from 0 to 8/Δmin², where Δmin is the minimum grid cell size [31]; note that this range agrees with Fig. 2(a). The eigenvalue distribution of A is that of A0 shifted by −ω²μ₀ε. The low-frequency regime is where the magnitude of the shift is so small that A has an almost identical eigenvalue distribution to that of A0. Therefore, the condition for the low-frequency regime is

\[ \omega^2 \mu_0 |\varepsilon| \ll 8/\Delta_{\min}^2. \tag{15} \]
Equation (15) is consistent with the condition introduced in [10], but here we provide a condition that is based on a more accurate estimate of the maximum eigenvalue of A0. We can rewrite Eq. (15) in terms of the vacuum wavelength λ0 as
\[ \lambda_0 / \Delta_{\min} \gg \pi \sqrt{|\varepsilon_r| / 2}, \tag{16} \]
where εr = ε/ε0 is the relative electric permittivity. The system described in Fig. 1 satisfies Eq. (16), so it is in the low-frequency regime.
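For reference, the step from Eq. (15) to Eq. (16) substitutes ω = 2πc₀/λ₀ and μ₀|ε| = |εr|/c₀²:

\[ \omega^2 \mu_0 |\varepsilon| = \Big( \frac{2\pi}{\lambda_0} \Big)^2 |\varepsilon_r| \ll \frac{8}{\Delta_{\min}^2} \quad \Longleftrightarrow \quad \frac{\lambda_0}{\Delta_{\min}} \gg \frac{2\pi}{\sqrt{8}} \sqrt{|\varepsilon_r|} = \pi \sqrt{\frac{|\varepsilon_r|}{2}}. \]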

3. Impact of the eigenvalue distribution on the convergence behavior of GMRES

In this section, we explain how the different eigenvalue distributions for different values of s examined in Sec. 2 influence the convergence behavior of an iterative method to solve Eq. (8).

For each of s = −1, 0, +1, we discretize Eq. (8) using the FDFD method for the system illustrated in Fig. 1 with an x-polarized electric dipole current source placed at the center of the simulation domain. We then solve the discretized equation by an iterative method to observe the convergence behavior. The iterative method used in this section is the generalized minimal residual (GMRES) method [33], which is one of the Krylov subspace methods [5]. We use GMRES without restart because the system is sufficiently small.

Like other iterative methods, GMRES generates an approximate solution xm of Ax = b at the mth iteration step. As m increases, xm eventually converges to the exact solution. We assume that convergence is achieved when the residual vector

\[ \mathbf{r}_m = \mathbf{b} - A \mathbf{x}_m \tag{17} \]
satisfies ||rm||/||b|| < τ, where ||·|| is the 2-norm and τ is a user-defined small positive number. In practice, τ = 10⁻⁶ is sufficiently small for accurate solutions. We use x0 = 0 as the initial guess throughout the paper.
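A minimal sketch of this stopping test with SciPy's GMRES follows (a stand-in matrix, not our FDFD system; with no preconditioner, callback_type="pr_norm" reports the solver's relative residual norm, and rtol is named tol in older SciPy).

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Stand-in Hermitian system; record ||r_m||/||b|| at every iteration.
n = 2000
A = sp.diags([-np.ones(n - 1), 4.0 * np.ones(n), -np.ones(n - 1)],
             [-1, 0, 1], format="csr")
b = np.random.default_rng(0).standard_normal(n)

tau = 1e-6                  # tolerance, as in the text
history = []
x, info = spla.gmres(A, b, rtol=tau, restart=n,  # restart=n: no restarts
                     callback=lambda pr_norm: history.append(pr_norm),
                     callback_type="pr_norm")
print(info, len(history), history[-1] < tau)
```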

Figure 3 shows ||rm||/||b|| versus the number m of iteration steps for the three values of s. As can be seen in the figure, the convergence behavior of GMRES is quite different for different s, with s = −1 far superior to the other two choices of s.


Fig. 3 Convergence behavior of GMRES for the vacuum-filled domain illustrated in Fig. 1. Three systems of linear equations discretized from Eq. (8) for s = 0, −1, +1 are solved by GMRES. In the iteration process of GMRES for each s, we plot the relative residual norm ||rm||/||b|| at each iteration step m. Notice that for s = 0 the relative residual norm stagnates initially; for s = −1 it stagnates around m = 100; for s = +1 it does not stagnate, but decreases very slowly. The upper and lower “X” marks on the vertical axis indicate the values around which our theory expects ||rm||/||b|| to stagnate for s = 0 and s = −1, respectively.


The overall trend of the convergence behavior shown in Fig. 3 is consistent with the mathematical theories of iterative methods. For example, the convergence stagnates initially for s = 0, and according to [21] this is typical behavior of GMRES for a matrix with many eigenvalues close to 0 such as our A for s = 0 (see Fig. 2(a)). Also, the convergence is very slow for s = +1, and Ref. [23] argues that in general the Krylov subspace methods converge much more slowly for indefinite matrices such as our A for s = +1 (see Fig. 2(c)) than for definite matrices. In this section we provide a more intuitive explanation for the convergence behavior by using the residual polynomial.

We first review the residual polynomial of GMRES briefly. Suppose that 𝒫m is the set of all polynomials p̃m of degree at most m such that

\[ \tilde{p}_m(0) = 1. \tag{18} \]

For each p̃m ∈ 𝒫m, we can define a column vector

\[ \tilde{\mathbf{r}}_m \equiv \tilde{p}_m(A)\, \mathbf{r}_0. \tag{19} \]

At the mth iteration step of GMRES, the residual vector rm of Eq. (17) is the r̃m with the smallest 2-norm [5]. We refer to the p̃m for which r̃m = rm as the residual polynomial pm. Therefore, from Eq. (19) we have

\[ \mathbf{r}_m = p_m(A)\, \mathbf{r}_0. \tag{20} \]

Below, we show how the eigenvalue distribution of A influences pm at each iteration step and hence influences the convergence behavior of GMRES. The matrix A ∈ ℂ^{n×n} for our homogeneous system described in Fig. 1 is Hermitian because it is discretized from the Hermitian operator T of Eq. (8). Hence, the eigendecomposition of A is

\[ A = V \Lambda V^\dagger, \tag{21} \]
where
\[ \Lambda = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}, \qquad V = \begin{bmatrix} \mathbf{v}_1 & \cdots & \mathbf{v}_n \end{bmatrix} \tag{22} \]
with real eigenvalues λi and the corresponding normalized eigenvectors vi, and V† is the conjugate transpose of V; note that V is unitary, i.e., V V† = I. Substituting Eq. (21) in Eq. (19), we obtain
\[ \tilde{\mathbf{r}}_m = V \tilde{p}_m(\Lambda) V^\dagger \mathbf{r}_0 \tag{23} \]
because (V Λ V†)^k = V Λ^k V† for any nonnegative integer k. We then define a column vector
\[ \tilde{\mathbf{z}}_m \equiv V^\dagger (\tilde{\mathbf{r}}_m / \|\mathbf{b}\|), \tag{24} \]
whose ith element, referred to as z̃mi below, is the projection of r̃m/||b|| onto the direction of the ith eigenvector vi. Similarly, we also define
\[ \mathbf{z}_m \equiv V^\dagger (\mathbf{r}_m / \|\mathbf{b}\|). \tag{25} \]
From Eq. (25) for m = 0 and Eqs. (23) and (24), we obtain
\[ \tilde{\mathbf{z}}_m = \tilde{p}_m(\Lambda)\, \mathbf{z}_0 = \begin{bmatrix} \tilde{p}_m(\lambda_1) & & \\ & \ddots & \\ & & \tilde{p}_m(\lambda_n) \end{bmatrix} \mathbf{z}_0, \tag{26} \]
which can be written element-by-element as
\[ \tilde{z}_{mi} = \tilde{p}_m(\lambda_i)\, z_{0i}. \tag{27} \]
Because ||z̃m|| = ||r̃m||/||b||, GMRES minimizes ||z̃m|| to ||zm|| when it minimizes ||r̃m|| to ||rm|| at the mth iteration step.

According to Eq. (27), |z̃mi| is minimized to 0 when p̃m has λi as a root. Thus, the ideal p̃m has all n eigenvalues of A as its roots, because it reduces ||z̃m|| to 0. However, p̃m has at most m roots, and m, which is the number of iteration steps, is typically far less than n. Therefore, p̃m needs to have its roots optimally placed near the eigenvalues to minimize ||z̃m||. Hence, the eigenvalue distribution of A greatly influences the convergence behavior of GMRES.

We now seek to understand the convergence behavior of GMRES for the different choices of s. We begin with s = −1. In Fig. 4 we plot rm/||b|| for s = −1 as bar graphs at the first few iteration steps. The horizontal axis in each plot represents eigenvalues. We divide the range of eigenvalues into the same 41 intervals t−20,...,t0,...,t20 used in Fig. 2; note that t0 ∋ 0. The height of the column on each interval is the norm of the projection of rm/||b|| onto the space spanned by the eigenvectors whose corresponding eigenvalues are contained in the interval. More specifically, the height of the column on tj after m iteration steps is

\[ h_{mj} = \Big[ \sum_{\lambda_i \in t_j} |z_{mi}|^2 \Big]^{1/2}. \tag{28} \]
Note that \(\big[ \sum_j h_{mj}^2 \big]^{1/2} = \|\mathbf{r}_m\| / \|\mathbf{b}\|\), and thus the sum of the squares of the column heights is a direct measure of convergence.
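A small sketch of the projection in Eq. (28), assuming a Hermitian A small enough for a dense eigendecomposition (the spectrum and vectors below are synthetic):

```python
import numpy as np

# Project a residual vector onto eigenspaces grouped by eigenvalue
# intervals, as in Eq. (28), for a synthetic Hermitian A = V Λ V†.
rng = np.random.default_rng(1)
n = 300
V, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal V
lam = np.sort(rng.uniform(0.0, 2.0, n))           # synthetic spectrum

b = rng.standard_normal(n)
r0 = b.copy()                                     # r_0 = b for x_0 = 0
z = V.T @ (r0 / np.linalg.norm(b))                # z_0, cf. Eq. (25)

edges = np.linspace(lam.min() - 1e-9, lam.max() + 1e-9, 42)  # 41 bins
h = np.array([np.linalg.norm(z[(lam >= lo) & (lam < hi)])
              for lo, hi in zip(edges[:-1], edges[1:])])

# Check: squared column heights sum to ||r_0||^2 / ||b||^2 = 1.
print(np.isclose(np.sum(h**2), 1.0))
```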


Fig. 4 Initial evolution of rm/||b|| for s = −1. Relative residual vectors rm/||b|| are visualized at three iteration steps m = 0, 2, 4. In each plot, the column on each interval represents the norm of rm/||b|| projected onto the sum of the eigenspaces of the eigenvalues contained in the interval. Notice that all the columns almost vanish only after four iteration steps. In the plots for m = 2 and m = 4, the residual polynomials pm are also plotted as solid curves; note that they always satisfy the condition (18).


A few properties of rm/||b|| for s = −1 shown in Fig. 4 are readily predicted from the corresponding eigenvalue distribution of the matrix A presented in Fig. 2(b). For instance, A has no eigenvalues in tj<0, and therefore rm/||b|| has components only in tj≥0 throughout the iteration process as demonstrated in Fig. 4. Also, A has very few eigenvalues in t0, and thus r0/||b|| has a very weak component in t0 as can be seen in the m = 0 plot in Fig. 4.

Now, we relate rm/||b|| with the residual polynomial to explain the convergence behavior of GMRES for s = −1. The residual polynomial pm(λ), which is obtained by solving a least squares problem, is also plotted in Fig. 4 at each iteration step. As the iteration proceeds, the residual polynomial in Fig. 4 has more and more roots, but only in tj≥0, because the eigenvalues exist only in tj≥0 and the roots of residual polynomials should stay close to the eigenvalues as mentioned in the discussion following Eq. (27). Also, as Eq. (27) predicts, the columns in each plot of Fig. 4 almost vanish at the roots of the residual polynomial. Therefore, all the columns quickly shrink as the number of the roots of the residual polynomial increases in the iteration process of GMRES. The fast reduction of the column heights provides visualization of the fast convergence of GMRES for s = −1 shown in Fig. 3.

Next, we examine the convergence behavior for s = 0. Figure 5 shows rm/||b|| for s = 0 at the first few iteration steps. Note that r0/||b|| has a tall column on t0 because A has many eigenvalues in t0 as shown in Fig. 2(a). Also, the tall column on t0 persists during the initial period of the iteration process.


Fig. 5 Initial evolution of rm/||b|| for s = 0. Relative residual vectors rm/||b|| are visualized at three iteration steps m = 0, 2, 4. In each plot, the column on each interval represents the norm of rm/||b|| projected onto the sum of the eigenspaces of the eigenvalues contained in the interval. Notice that most columns almost vanish only after four iteration steps, except for the very persistent column at λ ≃ 0. In the plots for m = 2 and m = 4, the residual polynomials pm are also plotted as solid curves; note that they always satisfy the condition (18).


To explain the above convergence behavior for s = 0, we show that, for a nearly positive-definite matrix, the column on t0 is in general persistent during the initial period of the iteration process of GMRES. For that purpose, we compare the three polynomials p̃m ∈ 𝒫m shown in Fig. 6. The three p̃m are chosen as candidates for the residual polynomial pm for a nearly positive-definite matrix, and therefore the roots of the polynomials are placed in tj≥0 according to the discussion following Eq. (27). The three p̃m have the same roots except for their smallest roots: the p̃m in Fig. 6(a) does not have its smallest root in t0, whereas the p̃m in Figs. 6(b) and 6(c) do. Note that the latter two p̃m can shrink the column on t0 more effectively than the first p̃m according to Eq. (27).


Fig. 6 Impact of the magnitude of the smallest root of a polynomial p̃m ∈ 𝒫m on the oscillation amplitudes of p̃m. Three p̃m of degree 6 are shown. In each figure, a solid line represents a polynomial; an open dot on the horizontal axis indicates the smallest root; solid dots indicate the other roots; dashed lines show the slopes of the polynomial at the roots. The three polynomials have the same roots except for their smallest roots: the smallest root in (a) becomes smaller positive and negative roots in (b) and (c), respectively. Notice that the slopes at all roots in (a) become steeper in (b) and (c) as the smallest root decreases in magnitude, and as a result the amplitudes of oscillation of p̃m around the horizontal axis increase.


However, the slopes at the roots of the latter two p̃m are steeper than the slopes at the corresponding roots of the first p̃m, as shown in Fig. 6. In Appendix B, we prove rigorously that the slopes of p̃m at all roots indeed increase as the smallest root decreases in magnitude. In general, a p̃m with steeper slopes at the roots oscillates with larger amplitudes around the horizontal axis because it varies faster around the axis; compare the amplitudes of oscillation in Fig. 6(a) with those in Figs. 6(b) and 6(c). The increased amplitudes of oscillation amplify |z̃mi| overall according to Eq. (27), and thus ||z̃m|| as well.

In other words, shrinking the column on t0 (by placing the smallest root of p̃m in t0) is achieved only at the penalty of amplifying the columns on tj>0. This penalty is too heavy when the columns on tj>0 constitute a considerable portion of ||z̃m||. Therefore, roots of residual polynomials are not placed in t0 until the columns on tj>0 become quite small, which results in the persistence of the column on t0 during the initial period of the iteration process.

Because the height of the column on t0 remains almost the same at the initial iteration steps of GMRES, h00 of Eq. (28), which is the initial height of this column, provides an approximate lower bound of ||zm|| = ||rm||/||b|| during the initial period of the iteration process. A more accurate lower bound is calculated as the norm of r0/||b|| projected onto the eigenspace of the eigenvalue closest to 0. For our example system, the calculated lower bound for s = 0 is 0.707. Note that ||rm||/||b|| for s = 0 indeed stagnates initially at this value in Fig. 3. For s = −1 the calculated lower bound is 4.16 × 10⁻⁷, at which ||rm||/||b|| also stagnates as shown in Fig. 3. However, this value is much smaller than the lower bound for s = 0, because for s = −1 the initial height of the column on t0 is almost negligible, as shown in the m = 0 plot in Fig. 4. In fact, the value is smaller than the conventional tolerance τ = 10⁻⁶ mentioned below Eq. (17), so the stagnation does not deteriorate the convergence speed for s = −1.

Lastly, we examine the convergence behavior for s = +1. Figure 7 shows rm/||b|| for s = +1 at several early (m = 0, 4, 7, 11) and later (m = 120, 140) iteration steps. Because the matrix A for s = +1 has both positive and negative eigenvalues, as indicated in Fig. 2(c), rm/||b|| has components in both tj>0 and tj<0, but in the present example the components of rm/||b|| are concentrated in tj<0 initially (m = 0 plot in Fig. 7). Thus GMRES begins with the roots of residual polynomials placed in tj<0 (m = 4 plot in Fig. 7). However, such residual polynomials have large values in tj>0, so they amplify the initially very small components of rm/||b|| in tj>0 according to Eq. (27), and eventually we reach a point where the components of rm/||b|| in tj>0 and tj<0 become comparable (m = 7 plot in Fig. 7). Afterwards, GMRES places the roots of residual polynomials in both tj>0 and tj<0 so that the components of rm/||b|| in both regions are reduced.


Fig. 7 Evolution of rm/||b|| for s = +1. Relative residual vectors rm/||b|| are visualized at iteration steps m = 0, 4, 7 in the first row and at m = 11, 120, 140 in the second row. In each plot, the column on each interval represents the norm of rm/||b|| projected onto the sum of the eigenspaces of the eigenvalues contained in the interval. The vertical scale of the plot is magnified as the iteration proceeds. Notice that the column at λ ≃ 0 is very persistent during the later period of the iteration process (m = 120, 140). In the plots for m = 4, 7, 11, the residual polynomials pm are also plotted as solid curves; the residual polynomials are not plotted for m = 120 and m = 140 because they have too many roots.


We note that the convergence behavior for s = +1 is initially quite similar to that for s = −1 because r0/||b|| for s = +1 has components concentrated in tj<0 and only a very weak component in t0. Therefore, ||rm||/||b|| reduces quickly for s = +1 without stagnation during the initial period of the iteration process as shown in Fig. 3.

During the later period of the iteration process, however, the reduction of ||rm||/||b|| for s = +1 slows down significantly, and eventually s = +1 produces the slowest convergence among the three values of s as shown in Fig. 3. The slow reduction of ||rm||/||b|| is due to the very persistent column on t0: comparing the plots for m = 120 and m = 140 in Fig. 7 shows that the column barely reduces for 20 iteration steps.

We have shown earlier that the column on t0 is quite persistent for a nearly positive-definite matrix. The argument relied on the properties proved in Appendix B about a polynomial p̃m ∈ 𝒫m with only positive roots. We can easily extend the proof in the appendix to p̃m with both positive and negative roots, and then show that the column on t0 is persistent also for a strongly indefinite matrix, which explains the slow convergence for s = +1 described above. However, the explanation is insufficient to explain why the convergence is much slower for s = +1 than for s = 0 as indicated in Fig. 3.

Here, we show that the column on t0 is in fact even more persistent for a strongly indefinite matrix than for a nearly positive-definite matrix. For that purpose, we compare the two polynomials p̃m ∈ 𝒫m shown in Fig. 8. As can be seen from the locations of their roots, they are candidates for the residual polynomials for different matrices: the p̃m shown in Fig. 8(a) is appropriate for a nearly positive-definite matrix (referred to as Adef below), and the p̃m shown in Fig. 8(b) is appropriate for a strongly indefinite matrix (referred to as Aind below). Moreover, we choose these two p̃m to have the same smallest-magnitude root ζ0 in t0. Being elements of 𝒫m, both p̃m satisfy Eq. (18). Hence, we have |p̃′m(ζ0)| ≃ 1/|ζ0| for both p̃m, where p̃′m is the first derivative of p̃m.


Fig. 8 Candidates for the residual polynomials for (a) a nearly positive-definite matrix and (b) a strongly indefinite matrix. In each figure, a solid line represents a polynomial p̃m ∈ 𝒫m; an open dot on the horizontal axis indicates the smallest-magnitude root; solid dots indicate the other roots; dashed lines show the slopes of the polynomial at the roots. The two polynomials have the same smallest-magnitude root ζ0, and thus have approximately the same slope −1/ζ0 at their smallest-magnitude roots. Note that for both p̃m the slopes get steeper at the roots farther away from the median of the roots. Hence, the slopes of most dashed lines are gentler than 1/|ζ0| in (a) and steeper than 1/|ζ0| in (b). As a result, the p̃m in (b) has larger amplitudes of oscillation around the horizontal axis than the p̃m in (a).


Now, we note that |p̃′m| evaluated at a root of p̃m tends to decrease as the root gets closer to the median of the roots; see Appendix C for a more rigorous explanation. Hence, |p̃′m| ≤ 1/|ζ0| tends to hold at most roots of the p̃m for Adef, because ζ0 is one of the farthest roots from the median of the roots. On the other hand, |p̃′m| ≥ 1/|ζ0| tends to hold at most roots of the p̃m for Aind, because ζ0 is one of the closest roots to the median of the roots. Therefore, the p̃m for Aind in general has much steeper slopes at most roots than the p̃m for Adef, and thus has larger amplitudes of oscillation around the horizontal axis, as demonstrated in Fig. 8.

Combined with Eq. (27), the above argument shows that shrinking the column on t0 (by placing the smallest-magnitude root of p̃m in t0) increases ||zm|| much more for a strongly indefinite matrix than for a nearly positive-definite matrix. Therefore, the column on t0 should in general be much more persistent for a strongly indefinite matrix than for a nearly positive-definite matrix, which explains the much slower convergence for s = +1 than for s = 0 in Fig. 3.

In summary of this section, we have shown that s = −1 produces the best convergence behavior; s = 0 induces stagnation during the initial period of the iteration process due to the high multiplicity of eigenvalues near zero; s = +1 leads to the slowest convergence overall due to the strongly indefinite matrix. Using a simple system of a homogeneous dielectric medium, we have provided a graphical explanation, for which a strong theoretical basis exists, of the difference in the convergence behavior of GMRES.

The arguments provided in this section can be easily extended to show that s = −1 is indeed optimal among all values of s in general. Compared with the case of s = −1, according to Eq. (14), for s > 0 A is always more strongly indefinite, and therefore the convergence should be slower; for −1 < s < 0, A has more eigenvalues clustered near zero, and thus the initial stagnation period should be longer; for s < −1, A has a wider eigenvalue range, so the condition number of A should be larger and the convergence should be slower [23]. In other words, s = −1 is the value that leaves A nearly positive-definite and sufficiently removes the eigenvalues clustered near zero without increasing the condition number. With separate numerical experiments we have verified that s = −1 is indeed superior to values other than s = 0 and s = +1 as well.

In the next section we will see that the difference in the convergence behavior for different choices of s is in fact quite general in practical situations.

4. Convergence behavior of QMR for 3D inhomogeneous systems

In this section, we solve Eq. (7) for 3D inhomogeneous systems of practical interest by an iterative method, and demonstrate that s = −1 still induces faster convergence than s = 0 and s = +1. We note that the systems examined in this section are inhomogeneous and have complex ε in general. The analyses in Secs. 2 and 3, therefore, do not hold strictly here. Nevertheless, we will see that the analyses for the homogeneous system in the previous sections provide insight in understanding the convergence behavior for more general systems examined in this section.

The three 3D inhomogeneous systems we consider are illustrated in the first row of Fig. 9. To prevent reflection of EM waves from the boundaries, we enclose each system by the stretched-coordinate perfectly matched layer (SC-PML), because SC-PML produces a much better-conditioned matrix than the more commonly used uniaxial PML (UPML) [32]. For each system, we construct three systems of linear equations Ax = b corresponding to s = −1, 0, +1 by the FDFD method. The numbers of grid cells Nx, Ny, Nz and the grid cell sizes Δx, Δy, Δz of the finite-difference grid used to discretize each system are shown in Table 2. Considering the parameters summarized in Table 3 and the condition (16), all three systems are in the low-frequency regime.


Fig. 9 Three inhomogeneous systems for which Eq. (7) is solved for s = −1, 0, +1 by QMR: (a) a slot waveguide bend formed in a thin silver film (Slot), (b) a straight silicon waveguide (Diel), and (c) an array of gold pillars (Array). The figures in the first row describe the three systems. The directions of wave propagation are shown by red arrows, beside which the vacuum wavelengths used are indicated. For all three systems, the waves are excited by electric current sources J strictly inside the simulation domain. The plots in the second row show the convergence behavior of QMR. Note that for all three systems QMR converges fastest for s = −1, whereas it barely converges for s = +1. The relative electric permittivities of the materials used in these systems are εr,silver = −129 − i3.28 [34], εr,silica = 2.085 [35], εr,silicon = 12.09 [35], and εr,gold = −10.78 − i0.79 [36].



Table 2. Specification of the finite-difference grids used for the three systems in Fig. 9. Slot uses a nonuniform grid with smoothly varying grid cell size. The matrix A has 3NxNyNz rows and columns, where the extra factor 3 accounts for the three Cartesian components of the E-field.


Table 3. Parameters used in Eq. (16) for the three systems in Fig. 9. When substituted in Eq. (16), these parameters prove that all the three systems are in the low-frequency regime.

The constructed systems of linear equations are solved by the quasi-minimal residual (QMR) method [37], which is another Krylov subspace method. GMRES, which was used in Sec. 3 to solve a 2D problem, is not suitable for the 3D problems here because it requires too much memory [5].
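A minimal sketch of the QMR call with SciPy follows (the matrix is again a stand-in rather than an FDFD discretization; QMR's callback receives the current iterate, from which the relative residual can be logged; rtol is named tol in older SciPy).

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Stand-in sparse system solved by QMR, logging ||b - A x_k|| / ||b||.
n = 5000
A = sp.diags([-np.ones(n - 1), 4.0 * np.ones(n), -np.ones(n - 1)],
             [-1, 0, 1], format="csr")
b = np.ones(n)

history = []
def log_residual(xk):                  # QMR's callback receives x_k
    history.append(np.linalg.norm(b - A @ xk) / np.linalg.norm(b))

x, info = spla.qmr(A, b, rtol=1e-6, callback=log_residual)
print(info, len(history), history[-1])
```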

The second row of Fig. 9 shows the convergence behavior of QMR for the three systems.

Note that for all three systems the choice of s = −1 results in the fastest convergence, and the choice of s = +1 barely leads to convergence. The three systems shown in Fig. 9 are chosen deliberately to include different materials such as dielectrics and metals and geometries with different degrees of complexity. Therefore, Fig. 9 suggests that the superiority of s = −1 over both s = 0 and s = +1 is quite general.

Even though both the iterative method and the systems in this section are significantly different from those in the previous section, the convergence behaviors are very similar. We explain the resemblance as follows.

The matrix for an inhomogeneous system is actually not much different from that for a homogeneous system in many cases. Indeed, most inhomogeneous systems consist of several homogeneous subdomains. Inside each homogeneous subdomain of such an inhomogeneous system, the differential operator in Eq. (7) for the inhomogeneous system is the same as the differential operator (9) for a homogeneous system, whereas at the interfaces between the subdomains it is not. Nevertheless, the number of finite-difference grid points assigned at the interfaces is usually much smaller than the number of grid points assigned inside the homogeneous subdomains. Therefore most rows of the matrix for the inhomogeneous system should be the same as those for a homogeneous system discretized on the same grid.

In addition, the differential operator (9) for a homogeneous system is nearly Hermitian in the low-frequency regime even if ε is complex, because it is approximated well by the Hermitian operator (10).

Hence, the matrix for an inhomogeneous system is actually quite similar to the nearly Hermitian matrix for a homogeneous system. Because QMR reduces to GMRES for Hermitian matrices in exact arithmetic [38], it is reasonable that the convergence behavior of QMR for an inhomogeneous system is similar to that of GMRES for a homogeneous system.

The matrix for an inhomogeneous system deviates more from that for a homogeneous system as the number of homogeneous subdomains increases, because then the number of grid points assigned at the interfaces between homogeneous subdomains increases. Therefore, we can expect that the convergence behavior for an inhomogeneous system would deviate from that for a homogeneous system as the number of homogeneous subdomains increases. Such deviation is demonstrated in Fig. 9(c), where the system has many metallic pillars; note that the convergence behavior for s = −1 is no longer very different from that for s = 0 in this case.

5. Conclusion and final remarks

We have introduced a new method to accelerate the convergence of iterative solvers of the frequency-domain Maxwell’s equations in the low-frequency regime. The method solves a new equation that is modified from the original Maxwell’s equations using the continuity equation.

The operator of the newly formulated equation does not have the high multiplicity of near-zero eigenvalues that makes the convergence stagnate for the original operator. At the same time, the new operator is nearly positive-definite, so it avoids the long-term slow convergence that indefinite operators suffer from.

In this paper, we have considered only nonmagnetic materials (μ = μ0). For magnetic materials (μ ≠ μ0), we note that a similar equation

\[ \nabla \times \mu^{-1} \nabla \times \mathbf{E} + s \nabla \big[ (\mu \varepsilon)^{-1} \nabla \cdot (\varepsilon \mathbf{E}) \big] - \omega^2 \varepsilon \mathbf{E} = -i \omega \mathbf{J} + s \frac{i}{\omega} \nabla \big[ (\mu \varepsilon)^{-1} \nabla \cdot \mathbf{J} \big], \tag{29} \]
which can also be formulated from Maxwell’s equations and the continuity equation, can be used instead of Eq. (7) to accelerate the convergence of iterative methods. We leave the discussion on the optimal value of s in this equation for future work.

Because our method achieves accelerated convergence by formulating a new equation, it can be easily combined with other acceleration techniques such as preconditioning and better iterative methods.

Appendix A: Eigenvalues and eigenfunctions of ∇×(∇× ) and ∇(∇· )

Using the k-space representations of the operators, in this section we derive the eigenvalues of ∇×(∇× ) in Eq. (12) and of ∇(∇· ) in Eq. (13), as well as their corresponding eigenfunctions.

Because both ∇× (∇× ) and ∇(∇· ) are translationally invariant, their eigenfunctions have the form [39]

\[ \mathbf{F} = \mathbf{F}_\mathbf{k} e^{i \mathbf{k} \cdot \mathbf{r}}, \tag{30} \]
where r represents position, k = x̂kx + ŷky + ẑkz is a real constant wavevector, and Fk = x̂Fkx + ŷFky + ẑFkz is a k-dependent vector.

We reformulate the eigenvalue equations ∇×(∇×F) = λF and ∇(∇·F) = λF by substituting Eq. (30) for F. Then, the eigenvalue equation for ∇×(∇× ) is

\[ \begin{bmatrix} k_y^2 + k_z^2 & -k_x k_y & -k_x k_z \\ -k_y k_x & k_z^2 + k_x^2 & -k_y k_z \\ -k_z k_x & -k_z k_y & k_x^2 + k_y^2 \end{bmatrix} \begin{bmatrix} F_{kx} \\ F_{ky} \\ F_{kz} \end{bmatrix} = \lambda \begin{bmatrix} F_{kx} \\ F_{ky} \\ F_{kz} \end{bmatrix}, \tag{31} \]
and the eigenvalue equation for ∇(∇· ) is
\[ \begin{bmatrix} -k_x^2 & -k_x k_y & -k_x k_z \\ -k_y k_x & -k_y^2 & -k_y k_z \\ -k_z k_x & -k_z k_y & -k_z^2 \end{bmatrix} \begin{bmatrix} F_{kx} \\ F_{ky} \\ F_{kz} \end{bmatrix} = \lambda \begin{bmatrix} F_{kx} \\ F_{ky} \\ F_{kz} \end{bmatrix}. \tag{32} \]

By solving Eqs. (31) and (32) for a given k, we obtain

\[ \lambda = 0, \; |\mathbf{k}|^2, \; |\mathbf{k}|^2, \tag{33} \]
which is Eq. (12), as the eigenvalues of ∇×(∇× ), and
\[ \lambda = -|\mathbf{k}|^2, \; 0, \; 0, \tag{34} \]
which is Eq. (13), as the eigenvalues of ∇(∇· ), and Eq. (30) with
\[ \mathbf{F}_\mathbf{k} = \begin{bmatrix} k_x \\ k_y \\ k_z \end{bmatrix}, \; \begin{bmatrix} -k_z \\ 0 \\ k_x \end{bmatrix}, \; \begin{bmatrix} k_y \\ -k_x \\ 0 \end{bmatrix} \tag{35} \]
as the eigenfunctions corresponding to both Eqs. (33) and (34).

We note from Eqs. (33) and (34) that ∇×(∇× ) and ∇(∇· ) are positive-semidefinite and negative-semidefinite, respectively.

Appendix B: Effect of the smallest root of p̃m ∈ 𝒫m on the slopes at the roots

In this section, we show that the slopes at the roots of a polynomial p̃m ∈ 𝒫m with all positive roots become steeper when the smallest root decreases in magnitude. This behavior is illustrated in Fig. 6.

Since p̃m ∈ 𝒫m satisfies the condition (18), it can be factored as

\[ \tilde{p}_m(\zeta) = \prod_{i=1}^{d_m} \Big( 1 - \frac{\zeta}{\zeta_i} \Big), \tag{36} \]

where dm ≤ m is the degree of p̃m and the ζi are the roots of p̃m. Hence, the slope of p̃m at a root ζk is

\[ \tilde{p}'_m(\zeta_k) = -\frac{1}{\zeta_k} \prod_{i \neq k} \Big( 1 - \frac{\zeta_k}{\zeta_i} \Big). \tag{37} \]

Now, suppose that 0 < ζ1 < ··· < ζdm. We can easily show that |p̃′m(ζk)| increases for any k when ζ1 decreases toward zero (while remaining positive) as follows. For k = 1, we have

\[ |\tilde{p}'_m(\zeta_1)| = \frac{1}{\zeta_1} \Big( 1 - \frac{\zeta_1}{\zeta_2} \Big) \cdots \Big( 1 - \frac{\zeta_1}{\zeta_{d_m}} \Big), \tag{38} \]

which clearly increases as ζ1 decreases to 0. For k > 1, we have

\[ |\tilde{p}'_m(\zeta_k)| = \Big( \frac{\zeta_k}{\zeta_1} - 1 \Big) \Big[ \frac{1}{\zeta_k} \prod_{i \neq 1,k} \Big| 1 - \frac{\zeta_k}{\zeta_i} \Big| \Big], \tag{39} \]

where the parentheses enclose the only quantity that depends on ζ1. We can therefore see that |p̃′m(ζk)| increases as ζ1 decreases for k > 1 as well. Therefore, for a given p̃m ∈ 𝒫m whose roots are all positive, the slopes of p̃m at the roots become steeper if the smallest root decreases in magnitude while remaining positive. This situation is illustrated by the transition from Fig. 6(a) to Fig. 6(b).

The slopes at the roots also become steeper when the originally positive ζ1 is replaced by a negative value, as long as the negative value is smaller in magnitude than the original ζ1. Replacing the originally positive ζ1 with a negative quantity that is smaller in magnitude is equivalent to first replacing ζ1 with a smaller positive value and then flipping its sign. Because we have already shown above that the slopes get steeper when the originally positive ζ1 is replaced by a smaller positive value, it is sufficient to show that flipping the sign of ζ1 makes the slopes even steeper. For a negative ζ1, the slopes at the roots are

\[ |\tilde{p}'_m(\zeta_1)| = \frac{1}{|\zeta_1|} \Big( 1 + \frac{|\zeta_1|}{\zeta_2} \Big) \cdots \Big( 1 + \frac{|\zeta_1|}{\zeta_{d_m}} \Big) \tag{40} \]
and
\[ |\tilde{p}'_m(\zeta_k)| = \Big( \frac{\zeta_k}{|\zeta_1|} + 1 \Big) \Big[ \frac{1}{\zeta_k} \prod_{i \neq 1,k} \Big| 1 - \frac{\zeta_k}{\zeta_i} \Big| \Big] \tag{41} \]
for k > 1. These slopes are steeper than those in Eqs. (38) and (39), respectively, which are the slopes for a positive ζ1 of the same magnitude. Therefore, for a given p̃m ∈ 𝒫m whose roots are all positive, the slopes of p̃m at the roots become steeper if the smallest root is replaced by one that is smaller in magnitude but negative. This situation is illustrated by the transition from Fig. 6(a) to Fig. 6(c).
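The two statements proved above can be checked numerically; the following sketch evaluates Eq. (37) directly (the root sets below are arbitrary choices).

```python
import numpy as np

# Evaluate |p~'(ζ_k)| = (1/|ζ_k|) Π_{i≠k} |1 - ζ_k/ζ_i|  (Eq. (37)).
def abs_slopes(roots):
    roots = np.asarray(roots, dtype=float)
    return np.array([abs(1.0 / zk) *
                     np.prod(np.abs(1.0 - zk / np.delete(roots, i)))
                     for i, zk in enumerate(roots)])

base = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # all positive roots
smaller = base.copy(); smaller[0] = 0.3          # smaller positive ζ_1
negative = base.copy(); negative[0] = -0.3       # same magnitude, negative

print(abs_slopes(base))      # reference slopes
print(abs_slopes(smaller))   # every slope steeper (Eqs. (38)-(39))
print(abs_slopes(negative))  # steeper still (Eqs. (40)-(41))
```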

Appendix C: Trend in the slopes of a polynomial at the roots

In this section, we consider a polynomial p with all real roots, and show that the slope of p evaluated at a root closer to the median of the roots tends to be gentler than the slope evaluated at a root farther away from the median of the roots. This behavior is illustrated in Fig. 8.

Consider a polynomial of degree m,

\[ p(\zeta) = \alpha \prod_{i=1}^{m} (\zeta - \zeta_i), \tag{42} \]
with ζ1 < ··· < ζm. The slope of p at a root ζk is
\[ p'(\zeta_k) = \alpha \prod_{i \neq k} (\zeta_k - \zeta_i). \tag{43} \]

Now, we evaluate |p′(ζk+1)|/|p′(ζk)|. We first consider the case where the roots are evenly spaced, i.e., ζi+1 − ζi = const., for which we have

\[ \frac{|p'(\zeta_{k+1})|}{|p'(\zeta_k)|} = \frac{k! \, (m-k-1)!}{(k-1)! \, (m-k)!} = \frac{k}{m-k}. \tag{44} \]
Equation (44) is an increasing function of k for 1 ≤ km −1, and it is less than 1 for k < m/2 and greater than 1 for k > m/2. Therefore, |p′(ζk)| is largest for k = 1 and k = m, and it decreases as k becomes closer to k = ⌈(m + 1)/2⌉ and k = ⌊(m + 1)/2⌋, which are the medians of the indices. In other words, for p with evenly spaced roots, the slopes of p get gentler at the roots closer to the median of the roots.
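A quick numerical check of Eq. (44) follows (a sketch with α = 1 and unit spacing, which is an arbitrary choice).

```python
import numpy as np

# Verify Eq. (44) for evenly spaced roots, using
# p'(ζ_k) = α Π_{i≠k} (ζ_k - ζ_i) from Eq. (43) with α = 1.
m = 7
roots = np.arange(1.0, m + 1)                  # ζ_1, ..., ζ_m, unit spacing
slopes = np.array([np.prod(roots[k] - np.delete(roots, k))
                   for k in range(m)])
print(np.abs(slopes[1:] / slopes[:-1]))        # 1/6, 2/5, 3/4, 4/3, 5/2, 6
print(np.arange(1, m) / (m - np.arange(1, m))) # k/(m-k): identical
```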

It is reasonable to expect that the above trend in the slopes also holds for p with unevenly spaced roots, unless the unevenness is too severe. To verify this expectation, we examine |p′(ζk+1)|/|p′(ζk)| without assuming ζi+1 − ζi = const.:

\[ \frac{|p'(\zeta_{k+1})|}{|p'(\zeta_k)|} = \frac{\prod_{i \neq k+1} |\zeta_{k+1} - \zeta_i|}{\prod_{i \neq k} |\zeta_k - \zeta_i|} = \prod_{i \neq k, k+1} \frac{|\zeta_{k+1} - \zeta_i|}{|\zeta_k - \zeta_i|} = \prod_{i=1}^{k-1} \frac{\zeta_{k+1} - \zeta_i}{\zeta_k - \zeta_i} \prod_{i=k+2}^{m} \frac{\zeta_i - \zeta_{k+1}}{\zeta_i - \zeta_k} = \Big[ \prod_{i=1}^{k-1} \Big( 1 + \frac{\zeta_{k+1} - \zeta_k}{\zeta_k - \zeta_i} \Big) \Big] \Big[ \prod_{i=k+2}^{m} \Big( 1 - \frac{\zeta_{k+1} - \zeta_k}{\zeta_i - \zeta_k} \Big) \Big]. \tag{45} \]
Here, the factors within the first (second) brackets are always greater (less) than 1, so the number of factors greater (less) than 1 increases (decreases) for increasing k. Hence, |p′(ζk+1)|/|p′(ζk)| tends to be less than 1 for smaller k, and it tends to be greater than 1 for larger k. This means that as k increases |p′(ζk)| tends to decrease first and then tends to increase. Therefore, even if the roots of p are unevenly spaced, the slopes of p tend to get gentler at the roots closer to the median of the roots.

Acknowledgment

We thank Prof. Michael Saunders for helpful discussions about iterative methods. This work was supported by the National Science Foundation (Grant No. DMS-0968809), the AFOSR MURI program (Grant No. FA9550-09-1-0704), and the Interconnect Focus Center, funded under the Focus Center Research Program, a Semiconductor Research Corporation entity. Wonseok Shin also acknowledges the support of Samsung Scholarship.

References and links

1. N. J. Champagne II, J. G. Berryman, and H. M. Buettner, “FDFD: A 3D finite-difference frequency-domain code for electromagnetic induction tomography,” J. Comput. Phys. 170, 830–848 (2001).

2. G. Veronis and S. Fan, “Overview of simulation techniques for plasmonic devices,” in Surface Plasmon Nanophotonics, M. Brongersma and P. Kik, eds. (Springer, 2007), pp. 169–182.

3. U. S. Inan and R. A. Marshall, Numerical Electromagnetics: The FDTD Method (Cambridge University Press, 2011), Ch. 14.

4. J.-M. Jin, The Finite Element Method in Electromagnetics, 2nd ed. (Wiley, 2002).

5. V. Simoncini and D. B. Szyld, “Recent computational developments in Krylov subspace methods for linear systems,” Numer. Linear Algebra Appl. 14, 1–59 (2007).

6. W. Cai, W. Shin, S. Fan, and M. L. Brongersma, “Elements for plasmonic nanocircuits with three-dimensional slot waveguides,” Adv. Mater. 22, 5120–5124 (2010).

7. G. Veronis and S. Fan, “Bends and splitters in metal-dielectric-metal subwavelength plasmonic waveguides,” Appl. Phys. Lett. 87, 131102 (2005).

8. L. Verslegers, P. Catrysse, Z. Yu, W. Shin, Z. Ruan, and S. Fan, “Phase front design with metallic pillar arrays,” Opt. Lett. 35, 844–846 (2010).

9. J. T. Smith, “Conservative modeling of 3-D electromagnetic fields, Part II: Biconjugate gradient solution and an accelerator,” Geophys. 61, 1319–1324 (1996).

10. G. A. Newman and D. L. Alumbaugh, “Three-dimensional induction logging problems, Part 2: A finite-difference solution,” Geophys. 67, 484–491 (2002).

11. C. J. Weiss and G. A. Newman, “Electromagnetic induction in a generalized 3D anisotropic earth, Part 2: The LIN preconditioner,” Geophys. 68, 922–930 (2003).

12. V. L. Druskin, L. A. Knizhnerman, and P. Lee, “New spectral Lanczos decomposition method for induction modeling in arbitrary 3-D geometry,” Geophys. 64, 701–706 (1999).

13. E. Haber, U. M. Ascher, D. A. Aruliah, and D. W. Oldenburg, “Fast simulation of 3D electromagnetic problems using potentials,” J. Comput. Phys. 163, 150–171 (2000).

14. J. Hou, R. Mallan, and C. Torres-Verdin, “Finite-difference simulation of borehole EM measurements in 3D anisotropic media using coupled scalar-vector potentials,” Geophys. 71, G225–G233 (2006).

15. R. Hiptmair, F. Kramer, and J. Ostrowski, “A robust Maxwell formulation for all frequencies,” IEEE Trans. Magn. 44, 682–685 (2008).

16. K. Beilenhoff, W. Heinrich, and H. Hartnagel, “Improved finite-difference formulation in frequency domain for three-dimensional scattering problems,” IEEE Trans. Microwave Theory Tech. 40, 540–546 (1992).

17. A. Christ and H. L. Hartnagel, “Three-dimensional finite-difference method for the analysis of microwave-device embedding,” IEEE Trans. Microwave Theory Tech. 35, 688–696 (1987).

18. At the final stage of our work, we were made aware of a related work by M. Kordy, E. Cherkaev, and P. Wannamaker, “Schelkunoff potential for electromagnetic field: proof of existence and uniqueness” (to be published), where an equation similar to our Eq. (7) with s = −1 was developed.

19. A. Jennings, “Influence of the eigenvalue spectrum on the convergence rate of the conjugate gradient method,” IMA J. Appl. Math. 20, 61–72 (1977).

20. A. van der Sluis and H. van der Vorst, “The rate of convergence of conjugate gradients,” Numer. Math. 48, 543–560 (1986).

21. S. L. Campbell, I. C. F. Ipsen, C. T. Kelley, and C. D. Meyer, “GMRES and the minimal polynomial,” BIT Numer. Math. 36, 664–675 (1996).

22. S. Goossens and D. Roose, “Ritz and harmonic Ritz values and the convergence of FOM and GMRES,” Numer. Linear Algebra Appl. 6, 281–293 (1999).

23. M. Benzi, G. H. Golub, and J. Liesen, “Numerical solution of saddle point problems,” Acta Numer. 14, 1–137 (2005), Sec. 9.2.

24. B. Beckermann and A. B. J. Kuijlaars, “Superlinear convergence of conjugate gradients,” SIAM J. Numer. Anal. 39, 300–329 (2001).

25. B. Beckermann and A. B. J. Kuijlaars, “On the sharpness of an asymptotic error estimate for conjugate gradients,” BIT Numer. Math. 41, 856–867 (2001).

26. O. Axelsson, “Iteration number for the conjugate gradient method,” Math. Comput. Simulat. 61, 421–435 (2003).

27. J. P. Webb, “The finite-element method for finding modes of dielectric-loaded cavities,” IEEE Trans. Microwave Theory Tech. 33, 635–639 (1985).

28. F. Kikuchi, “Mixed and penalty formulations for finite element analysis of an eigenvalue problem in electromagnetism,” Comput. Method Appl. Mech. Eng. 64, 509–521 (1987).

29. D. A. White and J. M. Koning, “Computing solenoidal eigenmodes of the vector Helmholtz equation: a novel approach,” IEEE Trans. Magn. 38, 3420–3425 (2002).

30. N. W. Ashcroft and N. D. Mermin, Solid State Physics, 1st ed. (Saunders College, 1976), Ch. 8.

31. To obtain 8/Δmin², take the first equation of Eq. (4.29) in [32] and then multiply by the extra factor μ = μ0 to account for the difference between Eq. (4.17a) in [32] and Eq. (1) in the present paper.

32. W. Shin and S. Fan, “Choice of the perfectly matched layer boundary condition for frequency-domain Maxwell’s equations solvers,” J. Comput. Phys. 231, 3406–3431 (2012).

33. Y. Saad and M. H. Schultz, “GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems,” SIAM J. Sci. Stat. Comp. 7, 856–869 (1986).

34. P. B. Johnson and R. W. Christy, “Optical constants of the noble metals,” Phys. Rev. B 6, 4370–4379 (1972).

35. E. D. Palik, ed., Handbook of Optical Constants of Solids (Academic Press, 1985).

36. D. R. Lide, ed., CRC Handbook of Chemistry and Physics, 88th ed. (CRC, 2007).

37. R. Freund and N. Nachtigal, “QMR: a quasi-minimal residual method for non-Hermitian linear systems,” Numer. Math. 60, 315–339 (1991).

38. R. W. Freund, G. H. Golub, and N. M. Nachtigal, “Iterative solution of linear systems,” Acta Numer. 1, 57–100 (1992), Secs. 2.4 and 3.3.

39. J. W. Goodman, Introduction to Fourier Optics, 3rd ed. (Roberts & Company, 2005), Sec. 2.3.2.


