Abstract
Mathematical methods that are poorly known in the field of optics are adapted and shown to have striking significance. Orthogonal polynomials are common tools in physics and optics, but problems are encountered when they are used at higher orders. Applications to arbitrarily high orders are shown to be enabled by remarkably simple and robust algorithms that are derived from well known recurrence relations. Such methods are demonstrated for a couple of familiar optical applications where, just as in other areas, there is a clear trend to higher orders.
©2010 Optical Society of America
1. Introduction
Many of the special functions used in optics satisfy three-term recurrence relations. These include trigonometric functions, Bessel functions (including modified Bessel and Hankel functions), and various orthogonal polynomials. Although most of the ideas discussed in what follows apply to any of this class of functions, I concentrate on orthogonal polynomials in particular. Among the better known orthogonal polynomials in optics are the Zernike polynomials [1] (for characterizing a function defined on a disc, possibly modified for cases with non-uniform weighting or for annular domains) and the Laguerre and Hermite polynomials [2] (for characterizing profiles of transversely localized beams). Traditionally, such polynomials were only ever needed to modest orders, say up to ten or so terms. (Note that, in the case of Zernikes, there are distinct polynomials for each azimuthal order, so the total number of terms is often more than thirty, but these hold a more modest number of terms for each azimuthal order.) This has meant that explicit expressions have been widely used for the polynomials, and they have generally been perfectly adequate.
The inclusion of higher-order terms in such decompositions is increasingly being used to boost capabilities. Two considerations limit progress in this direction, however. First, the explicit expressions for the polynomials lead to numerical round-off problems that become catastrophic when the number of terms reaches around 20, a lesson sometimes learned the hard way. Second, computational efficiency (i.e. arithmetic operation count) remains a vital consideration in applications such as optical design: despite the staggering growth of computer power, multi-dimensional global optimization is so challenging that its results will always benefit from efficiency gains. The methods discussed below lead to simple code that is remarkably efficient while also avoiding round-off failures.
Any set of orthogonal polynomials, say {P_n(x)} where P_m(x) is a polynomial of order m, is defined so that

⟨P_m P_n⟩ = 0 whenever m ≠ n, (1.1)

where ⟨ϕ ψ⟩ is typically an integral over the domain of interest of the product of ϕ and ψ. When the basis is normalized by ⟨P_m P_m⟩ = 1, the coefficients in an expansion of the form

f(x) = Σ_n u_n P_n(x) (1.2)

are then determined individually by noticing that ⟨f P_m⟩ = Σ_n u_n ⟨P_n P_m⟩ = u_m, i.e.

u_m = ⟨f P_m⟩. (1.3)

These coefficients are evidently like a spectrum, where the analogue of Parseval’s theorem follows similarly from Eqs. (1.1) and (1.2):

⟨f f⟩ = Σ_n u_n². (1.4)

This is an intuitive conservation law that couples the total energy in the spectrum to the energy in the original function. A result that is of central importance in this work follows upon considering the expansion of the function defined by x P_m(x). On account of Eq. (1.1), ⟨P_m ϕ⟩ vanishes when ϕ is any polynomial of order less than m. It follows that ⟨(x P_m) P_n⟩ = ⟨P_m (x P_n)⟩ vanishes unless |n − m| ≤ 1. This observation can be expressed as
x P_m(x) = q_m P_{m−1}(x) + r_m P_m(x) + s_m P_{m+1}(x), (1.5)

where q_m, r_m, and s_m are constants and s_m ≠ 0. Equation (1.5) can be re-arranged to give the standard form for the three-term recurrence relation that underpins the following sections:

P_{m+1}(x) = (a_m + b_m x) P_m(x) − c_m P_{m−1}(x). (1.6)

Such a relation follows regardless of the normalization. In fact, in place of ⟨P_m P_m⟩ = 1, normalization by peak value is the usual convention for the applications considered in this work. Simple closed forms for a_m, b_m, and c_m are known for all the most commonly used orthogonal polynomials, e.g. see Sec. 22.7 of [3]. Since the well known orthogonal polynomials in optics are among the standard set, the existence of the associated recurrence relations, generating functions, Christoffel-Darboux identities, etc. has often been noted and occasionally rediscovered. In the case of Zernikes, see [4–7] as examples. Some of the associated recurrence relations have been adopted in the field of image processing, as reviewed in [8]. Even so, in most applications of Zernikes, significant effort (and space) is devoted to working with the explicit expressions for the polynomials of moderate orders. It has not been generally appreciated that, in practice, this is a road to grief. With r as the radial coordinate on a unit disc, each element of a Zernike decomposition involves cos(mθ) or sin(mθ) times R_{m+2n}^m(r). These polynomials for m’th azimuthal order are related to the Jacobi polynomials by R_{m+2n}^m(r) = r^m P_n^{(0,m)}(2r² − 1), e.g. see Eq. (3.7) of [4]. The domain of interest is 0 ≤ r ≤ 1. The explicit expression that is the focus of many applications in such work is
R_{m+2n}^m(r) = Σ_{s=0}^{n} (−1)^s [(m + 2n − s)! / (s! (n − s)! (m + n − s)!)] r^{m+2(n−s)}. (1.7)

(This is given by, for example, 22.3.3 and 22.5.2 of [3].) From among the rotationally symmetric subset, R_{20}^0(r) serves as an example: its eleven coefficients alternate in sign and reach magnitudes above 2 × 10^6, even though R_{20}^0(r) itself is bounded by unity on the domain. Six or seven significant digits are evidently lost through heavy cancellation when evaluating Eq. (1.7) for any 0 ≤ r ≤ 1. More generally, the magnitude of the coefficients of R_{2n}^0 grows geometrically with n. This means that, even when using double precision arithmetic, it is impossible to guarantee any accurate decimal places at all in R_{2n}^0 once n exceeds about 20. Accuracy can be seen to suffer long before that point of total catastrophe and, as demonstrated in Fig. 1, this problem is avoided by using Eq. (1.6). The details of this particular recurrence relation are given in Section 4, and the stability of such relations is discussed in, for example, [9]. Another benefit offered by Eq. (1.6) can be appreciated when evaluating a function expressed as in Eq. (1.2) with the sum truncated at, say, n = N. Even when nested according to the Horner scheme, the evaluation of R_{2n}^0 requires about 2n arithmetic operations. In this way, assuming all the coefficients are pre-computed, the evaluation of the full set {R_{2n}^0 : n ≤ N} therefore requires about N² arithmetic operations. When Eq. (1.6) is employed, on the other hand, each of the basis elements is then determined with just five arithmetic operations, so only about 5N are required for the entire set. In either case, forming the linear combination that constitutes f takes another 2N operations, but the main point to be made is that an “N² process” is converted to one of order N. This comes on top of the vital accuracy gain demonstrated in Fig. 1. More importantly, however, the focus of this paper is on three additional steps that were developed decades ago by Clenshaw [10], Smith [11], and Salzer [12], respectively, to extract additional benefits from Eq. (1.6). Their elegant, but little used, ideas are given unified derivations below and are adapted to the context of optics.
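To make the contrast concrete, here is a minimal Python sketch (the function names and structure are mine, not from the paper): the first routine applies a three-term recurrence of the form of Eq. (1.6), using the standard Jacobi-polynomial constants from Sec. 22.7 of [3], while the second codes the explicit sum of Eq. (1.7). The two agree to near machine precision at low orders, but only the recurrence remains trustworthy at high orders.

```python
from math import factorial

def zernike_radial(n_rad, m, r):
    """R_{n_rad}^m(r) with n_rad = m + 2n, via the three-term recurrence of the
    Jacobi polynomials P_n^{(0,m)} in x = 2 r**2 - 1 (A&S 22.7.1);
    stable at arbitrary order."""
    n = (n_rad - m) // 2
    x = 2.0 * r * r - 1.0
    if n == 0:
        return r**m
    p_prev, p = 1.0, 0.5 * ((m + 2) * x - m)       # P_0 and P_1
    for k in range(1, n):
        den = 2.0 * (k + 1) * (k + m + 1) * (2 * k + m)
        a = -m * m * (2 * k + m + 1) / den
        b = (2 * k + m + 1) * (2 * k + m + 2) * (2 * k + m) / den
        c = 2.0 * k * (k + m) * (2 * k + m + 2) / den
        p_prev, p = p, (a + b * x) * p - c * p_prev
    return r**m * p

def zernike_radial_explicit(n_rad, m, r):
    """Textbook explicit sum (cf. Eq. (1.7)); the coefficients are exact, but
    heavy cancellation ruins the result in floating point at high orders."""
    n = (n_rad - m) // 2
    return sum((-1)**s * factorial(n_rad - s)
               / (factorial(s) * factorial(n - s) * factorial(n + m - s))
               * r**(n_rad - 2 * s)
               for s in range(n + 1))
```

Note that the recurrence route also preserves the identity R_n^m(1) = 1 to near machine precision even at order 40, where the explicit sum has lost all of its accurate digits.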
2. Linear combinations of orthogonal polynomials and their derivatives
An obvious application of Eq. (1.6) is for functions defined as a truncated sum of the type in Eq. (1.2), say

f(x) = Σ_{n=0}^{M} u_n P_n(x). (2.1)
The required basis members can first be evaluated sequentially by using Eq. (1.6), and the end result then formed as a linear combination with the coefficients {u_n}. Clenshaw [10] presented a related recurrence relation that allows these two steps to be combined. As proven in Appendix A, by starting with α_{M+1} = 0 and α_{M+2} = 0, and progressively working downward from n = M by using

α_n = u_n + (a_n + b_n x) α_{n+1} − c_{n+1} α_{n+2}, (2.2)

the desired result is given by just f(x) = α_0. This process has all the robustness of Eq. (1.6), and it is even more efficient at evaluating Eq. (2.1). Each cycle of Eq. (2.2) involves three multiplications and three additions, and this must be done M + 1 times to determine the sum of interest, namely α_0. In this way, when a_n, b_n, and c_n are pre-computed, Eq. (2.1) can be evaluated robustly with about 6M operations via a remarkably simple bit of code to perform nothing more than Eq. (2.2). Notice that no single basis member is evaluated along the way; this is an elegant short-circuit from the spectrum to the function value. When nested as a Horner scheme, evaluating a regular polynomial of order M requires only 2M operations. A factor of three in operation count is evidently the minimal cost that must be paid to avoid catastrophic numerical instability. There are significant side-benefits, however, and a sample is commented on in one context in Section 5. Many of them follow from the fact that the rms value of a function is just the square root of the sum of the squared coefficients, as in Eq. (1.4). Clenshaw’s process turns out to open the way to an equally effective scheme for computing derivatives. Although Eq. (1.6) can be differentiated to find a recurrence relation for P′_n(x), a more direct determination of f′(x) can be performed with a simple variant of Eq. (2.2). In fact, the same thing works for higher derivatives. If the j’th derivative of α_n is written as α_n^(j), it is proven in Appendix B that if we start with α_{M+1}^(j) = 0 and α_{M+2}^(j) = 0, then
α_n^(j) = j b_n α_{n+1}^(j−1) + (a_n + b_n x) α_{n+1}^(j) − c_{n+1} α_{n+2}^(j) (2.3)

can be used by working down from n = M to find the desired result: f^(j)(x) = α_0^(j) for any j ≥ 1. Since Eqs. (2.2) and (2.3) have the same structure, it is effectively the same block of highly optimized code that can be employed to implement them both. That the derivatives follow so readily is a more significant benefit than the minor efficiency gain in going from the operation count discussed before Fig. 1 to the slightly lower count for Clenshaw’s process. Something equivalent to Eq. (2.3) was presented by Smith [11], but with minimal derivation. The argument in his paper seems to proceed via differentiation of Eq. (1.6) instead of Eq. (2.2), as done in Appendix B. He presents two intermediate expressions that appear to complicate any interpretation of the process, and inconsistently applies a notational convention for higher derivatives. [For example, Smith presents something akin to Eq. (2.3), but without the factor of j that appears immediately after the equals sign.] The confusion this may have caused could explain the fact that his beautiful result seems to have been ignored in much of the related literature, even in numerical classics like [9]. Similarly, Clenshaw’s result has also been poorly appreciated in physics and optics and, again, it may be no coincidence that I have been unable to find a clear derivation of Eq. (2.2) in the literature. This is what motivated me to develop Appendices A and B. It is also interesting to note that Doha [13] and Barrio [14] both offer some interesting analyses and alternative results, although they are specific to Jacobi polynomials: Doha re-expresses the derivatives as linear combinations (with constant coefficients) of the original basis elements, i.e. in terms of the P_n themselves instead of their derivatives, while Barrio gives the p’th derivative without requiring all lower derivatives. The fact that the derivatives of Jacobi polynomials are once again Jacobi polynomials opens the door for their special results. Thankfully, the simple process described above applies more generally to any polynomials that satisfy Eq. (1.6), and Eq. (2.3) is all that is required in our context to determine derivatives.
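Eqs. (2.2) and (2.3) translate into only a few lines of code. The following Python sketch (variable names are mine) performs a single downward pass that returns both the sum and its first derivative; it is demonstrated here with a Legendre expansion, and Section 4 supplies the analogous recurrence constants for the Zernike case.

```python
def clenshaw_with_derivative(u, a, b, c, x):
    """Evaluate f(x) = sum_{n=0}^{M} u[n] P_n(x) and f'(x) in one downward pass
    (Eqs. (2.2) and (2.3) with j = 1), for any basis with P_0 = 1 and
    P_{n+1}(x) = (a(n) + b(n) x) P_n(x) - c(n) P_{n-1}(x); the recurrence is
    assumed to hold at n = 0 with P_{-1} = 0."""
    M = len(u) - 1
    s1 = s2 = 0.0         # alpha_{n+1}, alpha_{n+2}
    d1 = d2 = 0.0         # their x-derivatives
    for n in range(M, -1, -1):
        A = a(n) + b(n) * x
        d = b(n) * s1 + A * d1 - c(n + 1) * d2    # uses alpha_{n+1} pre-update
        s = u[n] + A * s1 - c(n + 1) * s2
        s1, s2, d1, d2 = s, s1, d, d1
    return s1, d1         # f(x) = alpha_0, f'(x) = alpha_0'

# Demo with Legendre polynomials: P_{n+1} = ((2n+1) x P_n - n P_{n-1})/(n+1)
leg_a = lambda n: 0.0
leg_b = lambda n: (2 * n + 1) / (n + 1)
leg_c = lambda n: n / (n + 1)
f, df = clenshaw_with_derivative([1.0, 2.0, 3.0], leg_a, leg_b, leg_c, 0.5)
```

Notice that no P_n is ever formed: the derivative recurrence only needs the α's of the function pass, which is why both values emerge from the same loop.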
3. Change of basis
When a function is expressed in terms of the basis {P_n(x)} as

f(x) = Σ_{n=0}^{M} u_n P_n(x), (3.1)
consider determining its representation in terms of a second basis, say

f(x) = Σ_{k=0}^{M} v_k Q_k(x). (3.2)

That is, the goal is to determine {v_k} from {u_n}, where each of the polynomial bases satisfies a three-term recurrence relation like Eq. (1.6), say

P_{n+1}(x) = (a_n + b_n x) P_n(x) − c_n P_{n−1}(x), (3.3a)
Q_{k+1}(x) = (A_k + B_k x) Q_k(x) − C_k Q_{k−1}(x). (3.3b)

By using Eqs. (3.3), Ting and Luke [15] derived a five-term recurrence relation for determining the elements of an upper-triangular change-of-basis matrix that multiplies {u_n} to give {v_k}. Just as the methods discussed in the previous section absorbed the expansion coefficients into a single-pass recurrence, Salzer [12] presented an analogous, apparently little-known process to change bases. That is, Salzer’s five-term recurrence absorbs {u_n}, and its output is the elements {v_k} themselves. (Salzer predated Ting and Luke, but the latter’s comments suggest that they misunderstood his method.) Sample applications are presented in Sections 4 and 5 where this method greatly simplifies past and current developments in optics. Salzer’s process is derived in Appendix C in terms of what can be regarded as an (M + 1) × (M + 1) matrix with elements α_n^k [not to be confused with the differentiated α_n^(j) written in Section 2]. It is convenient to introduce some auxiliary matrices in terms of the constants from the recurrence relations of Eqs. (3.3), namely

G_n^k = a_n − b_n A_k / B_k,  H_n^k = b_n / B_k,  K_n^k = b_n C_k / B_k. (3.4)
In some applications, it may be appropriate for these to be pre-computed and stored. It turns out that the matrix of α’s is triangular, with α_n^k = 0 unless 0 ≤ k ≤ M − n. This property can be used to simplify the penultimate equation of Appendix C, namely Eq. (C.6), written here in terms of the auxiliary matrices of Eqs. (3.4) (G_n^k = a_n − b_n A_k/B_k, H_n^k = b_n/B_k, K_n^k = b_n C_k/B_k). At k = 0, one term drops out leaving

α_{n−1}^0 = u_{n−1} − c_n α_{n+1}^0 + G_{n−1}^0 α_n^0 + K_{n−1}^1 α_n^1. (3.5)

A different term drops for 0 < k < M − n, so that

α_{n−1}^k = −c_n α_{n+1}^k + G_{n−1}^k α_n^k + H_{n−1}^{k−1} α_n^{k−1} + K_{n−1}^{k+1} α_n^{k+1}. (3.6)

For k = M − n, three terms vanish leaving only

α_{n−1}^{M−n} = G_{n−1}^{M−n} α_n^{M−n} + H_{n−1}^{M−n−1} α_n^{M−n−1}, (3.7)

and only one term remains when k = M − n + 1, namely

α_{n−1}^{M−n+1} = H_{n−1}^{M−n} α_n^{M−n}. (3.8)

The change of basis is realized by starting at n = M, where the only non-zero term is α_M^0 = u_M, and with the two terms for n = M − 1, namely α_{M−1}^0 = u_{M−1} + G_{M−1}^0 u_M and α_{M−1}^1 = H_{M−1}^0 u_M. From these, Eqs. (3.5)-(3.8) can be used to work down to n = 0. It is shown in the appendix that the desired result is just v_k = α_0^k. Equations (3.5)-(3.8) allow each α_n^k to be determined in no more than seven arithmetic operations. The roughly M²/2 non-zero elements can therefore be determined with about 3.5M² operations. Notice that, as for the processes in Section 2, this change of basis is achieved without evaluating a single member of either basis set. More significantly, there is no need to determine integrals involving the basis functions as you might expect from Eq. (1.3). It is a powerful and efficient shortcut. Its value is enhanced by the fact that the bases need not be orthogonal polynomials; all that is required is that they satisfy a three-term recurrence. For example, the regular monomial basis obviously satisfies Eq. (1.6) with a_n = c_n = 0 and b_n = 1, and this is exploited in Section 5.
4. Applications for Zernike polynomials
As stated in the Introduction, the Zernike polynomials of m’th azimuthal order are related to the Jacobi polynomials by R_{m+2n}^m(r) = r^m P_n^{(0,m)}(2r² − 1). It follows that R_{m+2n}^m(r)/r^m satisfies a recurrence relation like Eq. (1.6). With x = 2r² − 1, this relation has

a_n = −m² (2n + m + 1) / [2 (n + 1) (n + m + 1) (2n + m)],
b_n = (2n + m + 1) (2n + m + 2) / [2 (n + 1) (n + m + 1)],
c_n = n (n + m) (2n + m + 2) / [(n + 1) (n + m + 1) (2n + m)], (4.1)

with P_0^{(0,m)}(x) = 1 and P_1^{(0,m)}(x) = [(m + 2) x − m]/2 to initialize.
(See, for example, 22.7.1 of [3].) Any number of Zernikes can therefore be computed robustly and efficiently by using Eq. (1.6). More importantly, the same is possible for linear combinations of Zernikes, and also of their derivatives, by applying the results of Section 2. Perhaps the most significant observation relates to the problem of changing aperture size. Suppose that the modified aperture is a fraction, say ε, of the first, where typically 0 < ε < 1. This process arises in a variety of contexts including ophthalmology and lithography, and it has been revisited repeatedly over the last decade, see e.g. [16–20]. In all of this work, each azimuthal order can be treated separately for this change of basis. The challenge therefore is, for given values of {u_n}, find {v_n} so that Σ_n v_n R_{m+2n}^m(r) is identical (as a function of r) to Σ_n u_n R_{m+2n}^m(εr). The authors of [16–20] have started from Eq. (1.7) and tackled laborious algebra in order to derive explicit expressions for the elements in the change-of-basis matrix. The problem is that their results take an almost identical form to Eq. (1.7), see e.g. Eq. (19) of [19] or Eq. (29) of [20]. These matrix elements are polynomials in ε and, while the expressions are exact to within round-off errors, they nevertheless fail as catastrophically as Eq. (1.7) when the number of terms grows to 20 or more. To make matters worse, this failure is most extreme for ε near unity, which is precisely where the expressions are typically used. An elegant, specialized alternative is given in [21] where the matrix elements are themselves expressed as differences between two Zernikes. Their impressive derivation relies on two integral relations involving Bessel functions. A similarly robust and efficient scheme follows simply from the more general results given in the previous section, however.
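A brute-force numerical cross-check for this change of basis is easy to set up for the rotationally symmetric (m = 0) terms, where R_{2n}^0(r) is the Legendre polynomial P_n(2r² − 1). The Python sketch below is my own reference construction (not Salzer's single-pass scheme, and unrelated to [16–21]): it converts to monomial coefficients with NumPy, substitutes the affine map of the argument induced by r → εr, and converts back. The detour through the monomial basis is itself only trustworthy at modest orders, which is precisely the limitation that the recurrence-based route removes.

```python
import numpy as np
from numpy.polynomial import polynomial as P
from numpy.polynomial import legendre as L

def scale_m0_zernike(coeffs, eps):
    """Return c' with sum_k c'[k] R_{2k}^0(r) = sum_n coeffs[n] R_{2n}^0(eps*r),
    using R_{2n}^0(r) = P_n(2 r**2 - 1) (Legendre), so that r -> eps*r is the
    affine map x -> s*x + (s - 1), with s = eps**2, on x = 2 r**2 - 1."""
    n_terms = len(coeffs)
    s = eps**2
    p = L.leg2poly(coeffs)            # monomial coefficients in x
    comp = np.zeros(1)                # Horner's scheme in the polynomial ring:
    for c in p[::-1]:                 # comp(x) <- comp(x)*(s*x + s - 1) + c
        comp = P.polyadd(P.polymul(comp, [s - 1.0, s]), [c])
    return L.poly2leg(comp[:n_terms])
```

A quick sanity check: pure defocus obeys R_2^0(εr) = ε² R_2^0(r) + (ε² − 1) R_0^0, and this routine reproduces exactly that.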
This direct application of the results of Section 3 involves finding {v_k} so that

Σ_k v_k Q_k(x) = Σ_n u_n P_n(x), (4.2)

where Q_k(x) and P_n(x) are both built from the Jacobi polynomials P_n^{(0,m)}: Q_k(x) = P_k^{(0,m)}(x), while P_n(x) = P_n^{(0,m)}(ε² x + ε² − 1) accounts for the scaling r → εr (note that x = 2r² − 1 then maps to ε² x + ε² − 1). Both of the recurrence relations in Eqs. (3.3) are therefore given by Eqs. (4.1), but with the one distinction that a_n and b_n are replaced by a_n + (ε² − 1) b_n and ε² b_n due to the scaling of x in the first basis. The end result is then evidently just v_k = ε^m α_0^k, where the factor of ε^m restores the r^m prefactor. In this way, Salzer’s elegant general-purpose routine means that there was no need to wrestle with special functions or explicit high-order expansions. More importantly, the failure of the explicit expressions is thereby avoided. It is also interesting to notice that, for this application, the auxiliary matrices of Eqs. (3.4) are linear functions of ε². This means that α_n^k is a polynomial of order M − n in ε² and, just as in Eq. (B.3), Eqs. (3.5)-(3.8) can be differentiated with respect to ε to obtain a recurrence for determining the derivative of v_k with respect to ε. Such an entity is used for sensitivity analysis in [21].

5. Applications for asphere shape specification
Advances in both fabrication and metrology are enabling the introduction of more complex and precise aspheric optical surfaces. This trend demands larger numbers of terms in the polynomial used in the standard characterization of surface shape. The methods described above are ideally suited to one of the orthogonal bases that were recently introduced for efficient characterization in this context [22]. If c denotes the axial curvature and κ the conic constant, the surface’s sag is written as
z(r) = c r² / [1 + √(1 − (1 + κ) c² r²)] + u⁴ Σ_{m=0}^{M} a_m Q_m^con(u²), (5.1)

where u is just the normalized radial coordinate, i.e. u = r/r_max with 0 ≤ u ≤ 1. A sample of these basis members is plotted in Fig. 2. It so happens that Q_m^con(x) = P_m^{(0,4)}(2x − 1), so the recurrence relation for this set follows from Eqs. (4.1) with m = 4:

a_n = −4 (2n + 5) / [(n + 1) (n + 2) (n + 5)],
b_n = (2n + 5) (n + 3) / [(n + 1) (n + 5)],
c_n = n (n + 3) (n + 4) / [(n + 1) (n + 2) (n + 5)]. (5.2)

The methods of Sections 2 and 3 are ideally matched to working with aspheres in these terms. In this way, the potentially chronic round-off problems are localized to the basis members themselves, and tamed there by using recurrence relations. As discussed in [22], the fact that the coefficients now function like a spectrum has a variety of benefits.
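As a concrete sketch, the following Python code evaluates the conic-departure sum via the downward pass of Eq. (2.2), assuming (per the identification above) that Q_m^con coincides with the Jacobi polynomial P_m^{(0,4)}; the function names and the direct-summation cross-check are mine.

```python
def jacobi04_abc(n):
    """Constants of P_{n+1}^{(0,4)} = (a_n + b_n x) P_n - c_n P_{n-1}
    (A&S 22.7.1 with alpha = 0, beta = 4; note c_0 = 0)."""
    den = 2.0 * (n + 1) * (n + 5) * (2 * n + 4)
    a = -16.0 * (2 * n + 5) / den
    b = (2 * n + 5) * (2 * n + 6) * (2 * n + 4) / den
    c = 2.0 * n * (n + 4) * (2 * n + 6) / den
    return a, b, c

def conic_departure(coeffs, u):
    """u**4 * sum_m coeffs[m] P_m^{(0,4)}(2 u**2 - 1) via the single downward
    pass of Eq. (2.2); no basis member is ever evaluated."""
    x = 2.0 * u * u - 1.0
    s1 = s2 = 0.0
    for n in range(len(coeffs) - 1, -1, -1):
        a, b, _ = jacobi04_abc(n)
        _, _, c_next = jacobi04_abc(n + 1)
        s1, s2 = coeffs[n] + (a + b * x) * s1 - c_next * s2, s1
    return u**4 * s1

def conic_departure_direct(coeffs, u):
    """Cross-check: evaluate each basis member by forward recurrence and sum."""
    x = 2.0 * u * u - 1.0
    p_prev, p, total = 0.0, 1.0, 0.0
    for n in range(len(coeffs)):
        total += coeffs[n] * p
        a, b, c = jacobi04_abc(n)
        p_prev, p = p, (a + b * x) * p - c * p_prev
    return u**4 * total
```

The derivative variant of Eq. (2.3) drops into the same loop, which is what makes slope and curvature evaluations for metrology equally cheap.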
Upon introducing u = r/r_max and x = u², Eq. (5.1) and its derivatives with respect to r take precisely the form of the sums treated in Section 2.
Such entities are fundamental to working with aspheres, and they can now be robustly evaluated to arbitrary orders with remarkable efficiency by using Eqs. (2.2) and (2.3). Scaling the aperture size is another basic process in this context. Because Q_m^con is again a Jacobi polynomial, this can be achieved simply by taking m = 4 in the process described in Section 4. That is, the new coefficients that give exactly the same surface, but now orthogonalized over a different aperture size, can be determined robustly and efficiently via simple code. Perhaps more interestingly, the conventional manner to describe surfaces is by expressing the additive polynomial of Eq. (5.1) as a sum of monomials, say

Σ_{j=0}^{M} β_j u^{4+2j}.
With x = u², this relation can be rewritten as

u⁴ Σ_{j=0}^{M} β_j x^j,

where the β_j are the conventional monomial coefficients. It is once again straightforward to apply the process presented in Section 3 to convert backwards and forwards between these two representations: one of the recurrences is described by Eq. (5.2), while the other has a simpler form, namely

A_k = C_k = 0, B_k = 1.

The fact that two of these coefficients vanish means that terms drop out of Eqs. (3.5)-(3.8), and a slightly optimized form may be implemented for each of these inter-conversions. Of course, the degeneracy of the monomial basis means that the conventional representation is prone to round-off failure. The conversion to and from this basis is therefore recommended only for modest numbers of terms (not many more than a dozen or so). Orthogonal bases will hopefully soon be the norm in this field, and there will be little need for such conversions.

6. Concluding remarks
Four basic processes are described in this work, namely the evaluation of (i) a set of orthogonal polynomials, (ii) a linear combination of them, (iii) a linear combination of their derivatives, and (iv) changing from one such basis to another. These regularly enter optics, but much of the related work was focused on explicit expressions for the polynomials. It has not been widely appreciated that, although they are exact, these expressions fail numerically when the number of terms grows much beyond ten or so. Further, these decompositions also tend to become computationally intensive as the number of terms grows. Robust and efficient alternatives were presented here in each case. None of these alternatives for (ii)-(iv) require the evaluation of a single orthogonal polynomial; they all operate directly upon the coefficients of the expansion and lead to simple code that avoids round-off problems.
Although these methods were introduced decades ago, it appears that they were ahead of their time. They now open the door to benefits delivered by larger numbers of terms in various contexts including modeling based on Gaussian-beam-mode decompositions, various applications of Zernike polynomials, or optical design based on the methods of Section 5. The two-step induction-based process that was introduced in the Appendices clarifies how these methods can be generalized to other contexts. In my view, the work of Clenshaw, Smith, and Salzer certainly deserves to be better appreciated in optics, and more widely known in general. Many of us can then benefit significantly by having faster, more versatile code that does not suffer from catastrophic round-off failure and can proceed to arbitrary orders.
Appendix A: Directly evaluating linear combinations
The three-term recurrence relations that underpin the key results in this work can be expressed as

P_{n+1}(x) = (a_n + b_n x) P_n(x) − c_n P_{n−1}(x), (A.1)

where explicit expressions are in hand for a_n, b_n, and c_n. Once P_0(x) and P_1(x) are specified, this relation can be used to generate those of higher order. Clenshaw [10] pointed out that Eq. (A.1) leads to a fast, robust way to evaluate any sum of the form

S = Σ_{n=0}^{M} u_n P_n(x). (A.2)

The idea is to avoid the evaluation of the elements P_n(x) by progressively eliminating them from the top end of the sum. The three-term recurrence in Eq. (A.1) means that the topmost two terms are therefore progressively modified. After eliminating P_M, P_{M−1}, …, P_{n+1}, where 0 ≤ n < M, the result can be expressed as

S = Σ_{j=0}^{n−1} u_j P_j(x) + α_n P_n(x) + β_n P_{n−1}(x), (A.3)

for some α_n and β_n. It follows trivially from Eqs. (A.1) and (A.3) that

S = Σ_{j=0}^{n−2} u_j P_j(x) + [u_{n−1} + β_n + (a_{n−1} + b_{n−1} x) α_n] P_{n−1}(x) − c_{n−1} α_n P_{n−2}(x). (A.4)

Upon replacing n in Eq. (A.3) with n − 1 and comparing the result with Eq. (A.4), it is then evident that

α_{n−1} = u_{n−1} + β_n + (a_{n−1} + b_{n−1} x) α_n, (A.5)
β_{n−1} = −c_{n−1} α_n. (A.6)

These two relations hold the key to the efficient evaluation of Eq. (A.2) since taking n = 1 in Eq. (A.3) reveals that

S = (u_0 + β_1) P_0(x) + α_1 P_1(x). (A.7)

By starting with α_M = u_M and β_M = 0, Eqs. (A.5) and (A.6) yield α_n and β_n for all n < M, hence S can then be determined simply with Eq. (A.7). Things can be simplified a little further by using Eqs. (A.6) and (A.5), respectively, to eliminate the β’s from Eqs. (A.5) and (A.7):

α_{n−1} = u_{n−1} + (a_{n−1} + b_{n−1} x) α_n − c_n α_{n+1}, (A.8)
S = α_0 P_0(x) + α_1 [P_1(x) − (a_0 + b_0 x) P_0(x)]. (A.9)
Since Eq. (A.5) establishes that α_M = u_M, start Eq. (A.8) with α_{M+1} = α_{M+2} = 0 and work downwards to determine α_M, α_{M−1}, … and, ultimately, α_1 and α_0. The value of the sum in Eq. (A.2) then follows from Eq. (A.9). Keep in mind that P_0(x) and P_1(x) are specified at the outset. Although, as pointed out in [9], it is also possible to proceed from bottom up rather than top down in eliminating the P’s, this is not effective for the optical applications considered here. It is important to note, however, that [9] gives a useful treatment of applications involving trigonometric and Bessel functions as well as a sketch of stability analysis for recurrence-based processes. For many cases of interest, including those considered in Sections 4 and 5, it turns out that P_1(x) = (a_0 + b_0 x) P_0(x), hence the second term of Eq. (A.9) vanishes, and the conventional normalization is P_0(x) = 1. As a result, Eq. (A.9) then reduces to an even more beautiful result:

S = α_0. (A.10)

Appendix B: Directly evaluating derivatives of linear combinations
When P_n(x) in Eq. (A.2) is a polynomial of order n in x and each u_n is independent of x, Smith [11] presented a scheme to compute the derivative of that sum with respect to x. With primes denoting derivatives, we now seek

S′ = Σ_{n=1}^{M} u_n P′_n(x). (B.1)
(Note that P′_0 = 0, so the n = 0 term was dropped.) In this case, the factor multiplying α_n in Eq. (A.8) is a linear function of x, so its derivative is just a constant, say

d/dx (a_{n−1} + b_{n−1} x) = b_{n−1}, (B.2)

and α_n of Appendix A is readily seen to be a polynomial of order M − n in x. Since u_{n−1} is independent of x, differentiating Eq. (A.8) now leads directly to

α′_{n−1} = b_{n−1} α_n + (a_{n−1} + b_{n−1} x) α′_n − c_n α′_{n+1}. (B.3)

This can be initialized with α′_{M+1} = α′_{M+2} = 0, and the result that we seek is then simply

S′ = α′_0. (B.4)

That is, assuming that S was evaluated beforehand so that the α’s are in hand, S′ can be evaluated just as robustly and with much the same number of arithmetic operations. Of course, if P_1(x) ≠ (a_0 + b_0 x) P_0(x), it is the derivative of Eq. (A.9) that yields the desired end result, but such cases aren’t needed in this work. What is more, even when the coefficients in Eq. (A.1) are more general functions of x, it is straightforward to use the derivative of Eq. (A.8) to get a slight generalization of Eq. (B.3). For the most common case laid out above, it is also clear that higher derivatives can be determined in much the same way: if a superscript in parentheses is used to denote higher derivatives, it follows trivially from Eq. (A.8) that

α_{n−1}^(j) = j b_{n−1} α_n^(j−1) + (a_{n−1} + b_{n−1} x) α_n^(j) − c_n α_{n+1}^(j). (B.5)
This is initialized by α_{M+1}^(j) = α_{M+2}^(j) = 0, and the desired higher derivative for our cases is then simply

S^(j) = α_0^(j). (B.6)

Remarkably, nothing more complex than Eq. (B.5) is needed to robustly determine derivatives of any order.

Appendix C: Directly changing to a new basis
The inter-conversion considered in Sec. 3 can be carried out by, much as in Appendix A, progressively eliminating P_n(x) from the top end of the sum in Eq. (3.1). In this case, however, the eliminated part is to be expressed as a linear combination of the Q_k(x) rather than as the nested polynomials of Appendix A. The intermediate result that replaces Eq. (A.3) is written as

S = P_n(x) Σ_k α_n^k Q_k(x) + P_{n−1}(x) Σ_k β_n^k Q_k(x) + Σ_{j=0}^{n−1} u_j P_j(x), (C.1)
where the α’s and β’s are now just constant coefficients. By using Eq. (3.3a), the first term on the right-hand side of Eq. (C.1) can be re-expressed as

P_n Σ_k α_n^k Q_k = (a_{n−1} + b_{n−1} x) P_{n−1} Σ_k α_n^k Q_k − c_{n−1} P_{n−2} Σ_k α_n^k Q_k. (C.2)

In turn, the first term on the last line of Eq. (C.2) can be re-expressed upon using Eq. (3.3b) to see that

x Σ_k α_n^k Q_k = Σ_k [α_n^{k−1}/B_{k−1} − (A_k/B_k) α_n^k + (C_{k+1}/B_{k+1}) α_n^{k+1}] Q_k. (C.3)

Combining Eqs. (C.2) and (C.3) with Eq. (C.1) gives

S = P_{n−1} Σ_k {u_{n−1} δ_k^0 + β_n^k + a_{n−1} α_n^k + b_{n−1} [α_n^{k−1}/B_{k−1} − (A_k/B_k) α_n^k + (C_{k+1}/B_{k+1}) α_n^{k+1}]} Q_k − c_{n−1} P_{n−2} Σ_k α_n^k Q_k + Σ_{j=0}^{n−2} u_j P_j. (C.4)

Upon gathering the terms of Eq. (C.4) that involve P_{n−1} and, separately, those that involve P_{n−2}, the result can be equated to Eq. (C.1) with n replaced by n − 1. The first relation that emerges is simply

α_{n−1}^k = u_{n−1} δ_k^0 + β_n^k + a_{n−1} α_n^k + b_{n−1} [α_n^{k−1}/B_{k−1} − (A_k/B_k) α_n^k + (C_{k+1}/B_{k+1}) α_n^{k+1}], (C.5)
where δ_k^0 is zero for all k except k = 0. The second relation that emerges is just β_{n−1}^k = −c_{n−1} α_n^k. As done for the β’s in Appendix A, this makes it easy to eliminate the β’s from Eq. (C.5) in order to determine an analogue for Eq. (A.8), namely a recurrence relation for the α’s:

α_{n−1}^k = u_{n−1} δ_k^0 − c_n α_{n+1}^k + (a_{n−1} − b_{n−1} A_k/B_k) α_n^k + (b_{n−1}/B_{k−1}) α_n^{k−1} + (b_{n−1} C_{k+1}/B_{k+1}) α_n^{k+1}. (C.6)

Notice that the factors inside the parentheses in Eq. (C.6) involve just the constants from the recurrence relations in Eqs. (3.3). For the cases of interest in the body, we always have P_0(x) = 1 and Q_0(x) = 1. To appreciate the power of Eq. (C.6), notice that just as Eq. (A.3) becomes (A.10) upon taking n = 0, Eq. (C.1) leads to

S = Σ_k α_0^k Q_k(x). (C.7)

Upon comparing Eqs. (3.2) and (C.7), it follows that the coefficients sought here are just v_k = α_0^k. It follows from Eq. (C.1) that α_n^k = 0 whenever either k > M − n or k < 0 so, for example, α_M^k = 0 for all k ≥ 1. With this as a given, Eq. (C.6) can be used, for fixed n, by running over 0 ≤ k ≤ M − n + 1. The process starts at n = M and works down to n = 0 in order to yield the desired end result.

References and links
1. M. Born and E. Wolf, Principles of Optics (Cambridge University Press, 1999), see Sec. 9.2 and Appendix VII.
2. A. E. Siegman, Lasers (University Science Books, 1986), Chaps. 16–17.
3. M. Abramowitz and I. Stegun, Handbook of Mathematical Functions (Dover, 1978), Chap. 22.
4. A. B. Bhatia, E. Wolf, and M. Born, “On the circle polynomials of Zernike and related orthogonal sets,” Proc. Camb. Philos. Soc. 50(1), 40–48 (1954).
5. D. R. Myrick, “A generalization of the radial polynomials of F. Zernike,” J. Soc. Ind. Appl. Math. 14(3), 476–492 (1966).
6. E. C. Kintner, “On the mathematical properties of the Zernike polynomials,” Opt. Acta (Lond.) 23, 679–680 (1976).
7. R. Barakat, “Optimum balanced wave-front aberrations for radially symmetric amplitude distributions: generalizations of Zernike polynomials,” J. Opt. Soc. Am. 70(6), 739–742 (1980).
8. C.-W. Chong, P. Raveendran, and R. Mukundan, “A comparative analysis of algorithms for fast computation of Zernike moments,” Pattern Recognit. 36(3), 731–742 (2003).
9. W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, Numerical Recipes: The Art of Scientific Computing (Cambridge University Press, 1992), Sec. 5.5.
10. C. W. Clenshaw, “A note on the summation of Chebyshev series,” Math. Tables Other Aids Comput. 9, 118–120 (1955), http://www.jstor.org/stable/2002068.
11. F. J. Smith, “An algorithm for summing orthogonal polynomial series and their derivatives with applications to curve-fitting and interpolation,” Math. Comput. 19(89), 33–36 (1965).
12. H. E. Salzer, “A recurrence scheme for converting from one orthogonal expansion into another,” Commun. ACM 16(11), 705–707 (1973).
13. E. H. Doha, “On the coefficients of differentiated expansions and derivatives of Jacobi polynomials,” J. Phys. A: Math. Gen. 35(15), 3467–3478 (2002).
14. R. Barrio and J. M. Peña, “Numerical evaluation of the p’th derivative of Jacobi series,” Appl. Numer. Math. 43(4), 335–357 (2002).
15. B. Y. Ting and Y. L. Luke, “Conversion of polynomials between different polynomial bases,” IMA J. Numer. Anal. 1(2), 229–234 (1981).
16. K. A. Goldberg and K. Geary, “Wave-front measurement errors from restricted concentric subdomains,” J. Opt. Soc. Am. A 18(9), 2146–2152 (2001).
17. J. Schwiegerling, “Scaling Zernike expansion coefficients to different pupil sizes,” J. Opt. Soc. Am. A 19(10), 1937–1945 (2002).
18. C. E. Campbell, “Matrix method to find a new set of Zernike coefficients from an original set when the aperture radius is changed,” J. Opt. Soc. Am. A 20(2), 209–217 (2003).
19. G. M. Dai, “Scaling Zernike expansion coefficients to smaller pupil sizes: a simpler formula,” J. Opt. Soc. Am. A 23(3), 539–543 (2006).
20. H. Shu, L. Luo, G. Han, and J.-L. Coatrieux, “General method to derive the relationship between two sets of Zernike coefficients corresponding to different aperture sizes,” J. Opt. Soc. Am. A 23(8), 1960–1966 (2006).
21. A. J. E. M. Janssen and P. Dirksen, “Concise formula for the Zernike coefficients of scaled pupils,” J. Microlith., Microfab., Microsyst. 5(3), 030501 (2006).
22. G. W. Forbes, “Shape specification for axially symmetric optical surfaces,” Opt. Express 15(8), 5218–5226 (2007), http://www.opticsinfobase.org/oe/abstract.cfm?URI=oe-15-8-5218.