Zernike polynomials provide a well-known orthogonal set of scalar functions over a circular domain and are commonly used to represent wavefront phase or surface irregularity. A related set of orthogonal functions is given here that represents vector quantities, such as mapping distortion or wavefront gradient. These functions are generated from gradients of Zernike polynomials, made orthonormal using the Gram-Schmidt technique. This set provides a complete basis for representing vector fields that can be defined as the gradient of some scalar function. It is then efficient to transform from the coefficients of the vector functions to the scalar Zernike polynomials that represent the function whose gradient was fit. These new vector functions have immediate application for fitting data from a Shack-Hartmann wavefront sensor or for fitting mapping distortion for optical testing. A subsequent paper gives an additional set of vector functions consisting only of rotational terms with zero divergence. The two sets together provide a complete basis that can represent all vector distributions in a circular domain.
©2007 Optical Society of America
1. Introduction
Zernike polynomials [1–3] are commonly used in optical testing, engineering, and analysis, for two reasons. First, Zernike polynomials are orthogonal in a unit circle, which is convenient since many optics are circular in shape. Second, the lower order Zernike polynomials represent typical optical wavefront aberrations such as power, astigmatism, coma, and spherical aberration. Besides direct wavefront measurements, wavefront slopes are often measured as well, e.g. with shearing interferometry, Shack-Hartmann sensors, or a scanning pentaprism test [6]. Various techniques have been developed to convert measured slope data to a wavefront map expressed in terms of Zernike polynomials. Gavrielides developed a set of vector polynomials that are orthogonal to the gradients of Zernike polynomials but not mutually orthogonal. The coefficient for a specific Zernike polynomial representing the wavefront can then be calculated directly by integrating the dot product of the slope and the corresponding vector polynomial. Acosta et al. [8] took a different approach but arrived at similar results. This approach skips the intermediate step of fitting the vector slope data and obtains the wavefront directly. Yet it is desirable to fit measurement data in the measurement space, and for that a set of vector polynomials is needed to fit the vector slope data.
Vector polynomials are also used for quantifying mapping distortion, which is important for accurate measurement of optical surfaces [9] and can be severe due to the use of null optics. Typically, polynomial mapping functions are defined and the coefficients are fit to data using least squares techniques [10, 11].
Although the above problems can be solved using a least squares fit to vector functions that are not orthogonal over the domain, the results are not optimal. A fit to a non-orthogonal basis set can require many more terms than are necessary, and the coefficients themselves may not be meaningful, because the value of any particular coefficient changes as higher order terms are added to the fit. When fitting real data, the use of non-orthogonal basis functions also increases the propagation of noise. If the functions are truly orthogonal, a least squares solution is not necessary: each coefficient can be determined by a much simpler and computationally efficient inner product. Clearly, an orthonormal basis is desired.
In this paper, we present such a set of vector polynomials, which are orthonormal in a unit circle. These polynomials are well suited to fitting slope data. Since they are gradients of linear combinations of Zernike polynomials, it is also straightforward to convert the fitted slope map to the wavefront map expressed in terms of Zernike polynomials.
In Section 2, we present the Zernike notation that we adopted from Noll’s landmark paper [1] and list the gradients of the Zernike polynomials following the recursion relationships presented there. We then use the Zernike gradients as a basis to obtain an orthonormal set of vector polynomials using the Gram-Schmidt method and present the result in Section 3. The mapping from the orthonormal vector polynomials to gradients of scalar functions represented by standard Zernike polynomials is discussed in Section 4.
The vector set is made complete with the addition of a complementary set of vector polynomials with non-zero curl, as presented in a subsequent paper [12]. The addition of this second set of functions provides a complete basis, capable of representing any vector distribution in the circular domain. Applications of the vector polynomials to fitting slope data taken from Shack-Hartmann sensors or other slope measurement devices, and to correcting mapping distortions for null tests of aspheric surfaces, will be presented in subsequent papers as well.
2. Zernike polynomials and their gradients
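There are different numbering schemes for Zernike polynomials. In this paper we adopt Noll’s notation and numbering scheme [1], which defines the polynomials in polar coordinates as
$$
\begin{aligned}
Z_{\mathrm{even}\,j} &= \sqrt{n+1}\,R_n^m(r)\,\sqrt{2}\cos m\theta, & m &\neq 0,\\
Z_{\mathrm{odd}\,j} &= \sqrt{n+1}\,R_n^m(r)\,\sqrt{2}\sin m\theta, & m &\neq 0,\\
Z_j &= \sqrt{n+1}\,R_n^0(r), & m &= 0,
\end{aligned}
\tag{1}
$$
where
$$
R_n^m(r) = \sum_{s=0}^{(n-m)/2} \frac{(-1)^s\,(n-s)!}{s!\,\left[(n+m)/2-s\right]!\,\left[(n-m)/2-s\right]!}\; r^{\,n-2s}
\tag{2}
$$
and
j: the general index of Zernike polynomials
n: the power of the radial coordinate r
m: the multiplication factor of the angular coordinate θ
n and m have the following relations: m≤n and (n - m) is even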
The general index j has no physical meaning, while the indices n and m do. For a given j, there is a unique corresponding pair (n, m), and the parity of j determines the angular dependence of the polynomial. For a given pair (n, m), however, j is ambiguous when m≠0. In some relationships given in the subsequent text, n and m are known, but the corresponding j (and therefore the sine or cosine dependence of the polynomial) depends on other factors. For this reason, we write j(n, m) for the general index of a Zernike polynomial to show that n and m are known and that the actual j will be determined by other conditions. The first 37 polynomials of this numbering scheme are listed in the Appendix, where the relationship between j and (n, m) can be seen as well.
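The index bookkeeping described above can be sketched in a few lines of Python. This is a minimal illustration and not code from the paper; the function names `noll_to_nm`, `angular_kind`, and `nm_to_noll` are hypothetical, and the mapping assumes Noll's convention that even j carries the cosine term and odd j the sine term.

```python
def noll_to_nm(j):
    """Map a Noll index j (1-based) to the radial order n and angular frequency m."""
    if j < 1:
        raise ValueError("Noll indices start at 1")
    n = 0
    while (n + 1) * (n + 2) // 2 < j:   # row n ends at j = (n+1)(n+2)/2
        n += 1
    k = j - n * (n + 1) // 2 - 1        # 0-based position of j inside row n
    # Within a row, m runs 0,2,2,4,4,... (even n) or 1,1,3,3,... (odd n).
    m = 2 * ((k + 1) // 2) if n % 2 == 0 else 2 * (k // 2) + 1
    return n, m


def angular_kind(j):
    """Return 'cos' (even j), 'sin' (odd j), or 'none' when m = 0."""
    _, m = noll_to_nm(j)
    if m == 0:
        return "none"
    return "cos" if j % 2 == 0 else "sin"


def nm_to_noll(n, m, kind):
    """Resolve j(n, m) once the sine/cosine dependence is known."""
    first = n * (n + 1) // 2 + 1        # first Noll index in row n
    for j in range(first, first + n + 1):
        if noll_to_nm(j) == (n, m) and angular_kind(j) == kind:
            return j
    raise ValueError("no Zernike term with these indices")


if __name__ == "__main__":
    for j in (1, 4, 7, 11, 32):
        print(j, noll_to_nm(j), angular_kind(j))
```

Passing the angular dependence explicitly to `nm_to_noll` mirrors the j(n, m) notation: n and m alone do not fix j when m≠0, so the caller has to supply the missing condition.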
As the first step toward establishing an orthonormal basis of vector polynomials, we derive the gradients of the Zernike polynomials. We take the gradient of each Zernike polynomial and apply the recursion relationships from Noll to represent the gradients as linear combinations of lower order Zernike polynomials. The first 37 gradient terms are presented in Table 1. These functions provide a complete basis to represent gradients, but they require further manipulation to create an orthonormal set.
3. An orthonormal set of vector polynomials
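We use linear combinations of the above terms to create an orthogonal set. We define the inner product of two vector polynomials A⃗ and B⃗ defined in a unit circle as
$$
(\vec{A}, \vec{B}) = \frac{1}{\pi}\iint \left(A_x B_x + A_y B_y\right) dx\,dy,
\tag{3}
$$
where the integration is over a unit circle.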
Inner products of the above gradient functions were evaluated, and some results are shown in Table 2 (the table is symmetric about the diagonal, but only non-zero elements under the diagonal are shown). The Zernike gradient polynomials are not orthogonal, because the matrix of inner products listed in Table 2 is not diagonal.
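As a worked illustration of one off-diagonal element, the inner product of ∇Z7 with ∇Z3 can be evaluated by hand. The calculation below assumes the area-normalized inner product (1/π)∬ A⃗·B⃗ dx dy over the unit circle and Noll's Cartesian forms of Z3 and Z7, and it uses (1/π)∬ x² dx dy = (1/π)∬ y² dx dy = 1/4.

```latex
\begin{aligned}
Z_3 &= 2r\sin\theta = 2y, &
\nabla Z_3 &= (0,\; 2),\\
Z_7 &= \sqrt{8}\,(3r^3 - 2r)\sin\theta = \sqrt{8}\,(3x^2y + 3y^3 - 2y), &
\nabla Z_7 &= \sqrt{8}\,\bigl(6xy,\; 3x^2 + 9y^2 - 2\bigr),\\
(\nabla Z_7, \nabla Z_3) &= \frac{1}{\pi}\iint 2\sqrt{8}\,\bigl(3x^2 + 9y^2 - 2\bigr)\,dx\,dy
 = 2\sqrt{8}\left(\tfrac{3}{4} + \tfrac{9}{4} - 2\right) = 4\sqrt{2} \neq 0 .
\end{aligned}
```

Because this element is non-zero, ∇Z7 has a component along ∇Z3 that must be removed before the set can be orthogonal.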
3.1 Orthogonalization of gradient functions
Using the Gram-Schmidt orthogonalization method [13, 14] (a general description of the method can be found in Reference 13, and an optical application in Reference 14), we construct a new set of vector polynomials, denoted S, with the Zernike gradient polynomials as the basis. The gradient of Z1 is zero, so it is not used in the construction of the new set. We choose to index the first polynomial of this new set as 2 to maintain its correspondence with the Zernike polynomials. The first 36 such polynomials are listed in Table 3.
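In general, the S polynomials can be simply expressed in terms of Zernike gradient polynomials:
For all j with n=m,
$$
\vec{S}_j = \frac{1}{\sqrt{2n(n+1)}}\,\nabla Z_j .
\tag{4}
$$
For all j with n≠m,
$$
\vec{S}_j = \frac{1}{\sqrt{4n(n+1)}}\left[\nabla Z_j - \sqrt{\frac{n+1}{n-1}}\;\nabla Z_{j'(n'=n-2,\;m'=m)}\right],
\tag{5}
$$
where j-j′ is even when m≠0.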
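The orthogonalization can be checked numerically for the lowest orders. The sketch below is not the authors' code. It assumes the area-normalized inner product (1/π)∬ A⃗·B⃗ dx dy, writes Z2 to Z8 in Cartesian form by hand, integrates monomials over the unit disk exactly with gamma functions, and then runs classical Gram-Schmidt on the gradients. The result is compared with the closed forms as we read them: a scaling 1/√(2n(n+1)) of ∇Zj for n=m, and (1/√(4n(n+1)))[∇Zj − √((n+1)/(n−1)) ∇Zj′] with j′(n−2, m) for n≠m. All function and variable names are illustrative.

```python
import math


def disk_moment(a, b):
    """Exact integral of x**a * y**b over the unit disk."""
    if a % 2 or b % 2:
        return 0.0
    angular = (2.0 * math.gamma((a + 1) / 2) * math.gamma((b + 1) / 2)
               / math.gamma((a + b) / 2 + 1))
    return angular / (a + b + 2)


def poly_inner(p, q):
    """Integral of p*q over the disk; polynomials are {(a, b): coeff} dicts."""
    return sum(cp * cq * disk_moment(ap + aq, bp + bq)
               for (ap, bp), cp in p.items() for (aq, bq), cq in q.items())


def grad(p):
    """Gradient (d/dx, d/dy) of a polynomial dict."""
    gx = {(a - 1, b): c * a for (a, b), c in p.items() if a}
    gy = {(a, b - 1): c * b for (a, b), c in p.items() if b}
    return gx, gy


def inner(A, B):
    """(A, B) = (1/pi) * integral of A . B over the unit disk."""
    return (poly_inner(A[0], B[0]) + poly_inner(A[1], B[1])) / math.pi


r3, r6, r8 = math.sqrt(3), math.sqrt(6), math.sqrt(8)
ZERNIKE = {  # Noll Zernike polynomials Z2..Z8 in Cartesian form (hand-derived)
    2: {(1, 0): 2.0},
    3: {(0, 1): 2.0},
    4: {(2, 0): 2 * r3, (0, 2): 2 * r3, (0, 0): -r3},
    5: {(1, 1): 2 * r6},
    6: {(2, 0): r6, (0, 2): -r6},
    7: {(2, 1): 3 * r8, (0, 3): 3 * r8, (0, 1): -2 * r8},
    8: {(3, 0): 3 * r8, (1, 2): 3 * r8, (1, 0): -2 * r8},
}
GRADS = {j: grad(p) for j, p in ZERNIKE.items()}
GRAM = {(k, l): inner(GRADS[k], GRADS[l]) for k in GRADS for l in GRADS}


def ip(u, v):
    """Inner product of two fields stored as {j: coefficient on grad Z_j}."""
    return sum(cu * cv * GRAM[k, l] for k, cu in u.items() for l, cv in v.items())


def gram_schmidt():
    """Classical Gram-Schmidt on grad Z_2..grad Z_8, in coefficient space."""
    S = {}
    for j in sorted(GRADS):
        v = {j: 1.0}
        for s in S.values():
            proj = ip({j: 1.0}, s)          # component of grad Z_j along s
            for k, c in s.items():
                v[k] = v.get(k, 0.0) - proj * c
        norm = math.sqrt(ip(v, v))
        S[j] = {k: c / norm for k, c in v.items()}
    return S


def closed_form(n, m):
    """Coefficients on (grad Z_j, grad Z_j') from the closed forms quoted above."""
    if n == m:
        return 1 / math.sqrt(2 * n * (n + 1)), 0.0
    scale = 1 / math.sqrt(4 * n * (n + 1))
    return scale, -scale * math.sqrt((n + 1) / (n - 1))


S = gram_schmidt()
print("Gram-Schmidt S7:", S[7][7], S[7][3])
print("closed form    :", closed_form(3, 1))
```

Working in coefficient space keeps each S polynomial as an explicit combination of Zernike gradients, which is the form in which the closed-form expressions are stated.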
3.3 S as linear combinations of Zernike polynomials
Given that the vector polynomials S are functions of Zernike gradient polynomials and the Zernike gradient polynomials are functions of Zernike polynomials, we can express S in terms of Zernike polynomials as listed in Table 4.
For a given S_j with corresponding indices j(n, m), we define its x and y components as S_jx and S_jy, respectively, i.e.
$$
\vec{S}_j = S_{jx}\,\hat{x} + S_{jy}\,\hat{y} .
\tag{6}
$$
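From observation of the S polynomials up to j = 37, we found that both S_jx and S_jy are linear combinations of at most two Zernike polynomials with indices j′(n-1, m±1), each of which exists only if its own indices satisfy n′≥m′≥0:
$$
\begin{aligned}
S_{jx} &= C_a\,Z_{j_a(n-1,\;m-1)} + C_{a'}\,Z_{j_{a'}(n-1,\;m+1)},\\
S_{jy} &= C_b\,Z_{j_b(n-1,\;m-1)} + C_{b'}\,Z_{j_{b'}(n-1,\;m+1)}.
\end{aligned}
\tag{7}
$$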
For a given j(n, m), a set of rules can be used to determine all the parameters in Eq. (7) and so express S_j as a linear combination of Zernike polynomials. These rules are summarized in Table 5 and are useful for generating the analytical expression of any S polynomial programmatically. They are complex because different combinations of j, n, and m must be handled separately, and most of that complexity comes from the numbering scheme: in Noll’s scheme, even j corresponds to cosine terms and odd j to sine terms, and the two swap order after each m=0 term. The rules are simpler when stated in terms of the sine or cosine dependence alone. If an S polynomial shares its index j with a Zernike polynomial, its x-component is a linear combination of Zernike polynomials with the same sine or cosine dependence as Z_j, and its y-component has the opposite dependence. For example, Z_32 has cosine dependence, so the x-component of S_32 contains Z_24 and Z_26, which both have cosine dependence, while the y-component contains Z_23 and Z_25, which both have sine dependence.
3.4 Plots of vector polynomial functions
Plots of the first 12 S vector polynomials are shown in Table 6.
4. Relating the vector polynomials to gradients of scalar functions
The set of S polynomials fully spans the space of vector distributions V⃗(x, y) over the unit circle for which a scalar function Φ(x, y) exists such that V⃗(x, y)=∇Φ(x, y). It is useful to represent the vector data using the vector polynomials S and to relate them to scalar functions ϕ_i defined by S⃗_i=∇ϕ_i.
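Applying the rules listed in Eqs. (4) and (5), the scalar functions can be calculated as follows.
For all j with n=m,
$$
\phi_j = \frac{1}{\sqrt{2n(n+1)}}\,Z_j .
\tag{8}
$$
For all j with n≠m,
$$
\phi_j = \frac{1}{\sqrt{4n(n+1)}}\left[Z_j - \sqrt{\frac{n+1}{n-1}}\;Z_{j'(n'=n-2,\;m'=m)}\right],
\tag{9}
$$
where j-j′ is even when m≠0.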
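These relationships match those demonstrated for the vector functions listed in Table 3. For example, $\vec{S}_7 = \frac{1}{\sqrt{48}}\left(\nabla Z_7 - \sqrt{2}\,\nabla Z_3\right)$, which leads to $\phi_7 = \frac{1}{\sqrt{48}}\left(Z_7 - \sqrt{2}\,Z_3\right)$.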
Applying these relations, the vector data V⃗(x, y) is decomposed into a linear combination of the orthonormal S polynomials as
$$
\vec{V}(x, y) = \sum_i \alpha_i\,\vec{S}_i(x, y) .
\tag{10}
$$
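Using the definitions of the scalar functions Φ and ϕ_i (V⃗=∇Φ, S⃗_i=∇ϕ_i), we have
$$
\nabla\Phi = \sum_i \alpha_i\,\nabla\phi_i
\quad\Longrightarrow\quad
\Phi = \sum_i \alpha_i\,\phi_i ,
\tag{11}
$$
where the coefficients α_i were found from the vector decomposition in Eq. (10). The scalar function Φ can in turn be represented as a linear combination of standard Zernike polynomials:
$$
\Phi = \sum_i \gamma_i\,Z_i .
\tag{12}
$$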
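The coefficients of these standard Zernike polynomials can be found by
$$
\gamma_j =
\begin{cases}
\dfrac{\alpha_j}{\sqrt{2n(n+1)}} - \dfrac{\alpha_{j'(n'=n+2,\;m'=m)}}{\sqrt{4(n+1)(n+2)}}, & n = m,\\[2ex]
\dfrac{\alpha_j}{\sqrt{4n(n+1)}} - \dfrac{\alpha_{j'(n'=n+2,\;m'=m)}}{\sqrt{4(n+1)(n+2)}}, & n \neq m,
\end{cases}
\tag{13}
$$
where j-j′ is even when m≠0.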
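The conversion from vector coefficients to Zernike coefficients can be sketched as a short routine. This is an illustrative implementation and not the authors' code. It assumes each ϕ_j equals Z_j/√(2n(n+1)) for n=m and (1/√(4n(n+1)))[Z_j − √((n+1)/(n−1)) Z_j′] for n≠m, with j′(n−2, m) of the same parity as j, and it accumulates the contributions of every α_j. The names `noll_to_nm`, `nm_to_noll`, and `alpha_to_gamma` are hypothetical, and the piston term Z1 is skipped because slope data cannot determine it.

```python
import math


def noll_to_nm(j):
    """Noll index j -> (n, m)."""
    n = 0
    while (n + 1) * (n + 2) // 2 < j:
        n += 1
    k = j - n * (n + 1) // 2 - 1
    return n, (2 * ((k + 1) // 2) if n % 2 == 0 else 2 * (k // 2) + 1)


def nm_to_noll(n, m, parity):
    """(n, m) plus the parity of j (0 = cosine, 1 = sine; ignored for m = 0) -> j."""
    first = n * (n + 1) // 2 + 1
    for j in range(first, first + n + 1):
        if noll_to_nm(j)[1] == m and (m == 0 or j % 2 == parity):
            return j
    raise ValueError("invalid (n, m)")


def alpha_to_gamma(alpha):
    """Convert {j: alpha_j} on S_j (j >= 2) to {j: gamma_j} on Z_j.

    Piston (Z_1) is skipped because a gradient fit cannot determine it.
    """
    gamma = {}
    for j, a in alpha.items():
        n, m = noll_to_nm(j)
        if n == m:
            gamma[j] = gamma.get(j, 0.0) + a / math.sqrt(2 * n * (n + 1))
            continue
        gamma[j] = gamma.get(j, 0.0) + a / math.sqrt(4 * n * (n + 1))
        jp = nm_to_noll(n - 2, m, j % 2)    # partner term (n - 2, m), same parity
        if jp != 1:
            # sqrt((n+1)/(n-1)) / sqrt(4n(n+1)) simplifies to 1 / sqrt(4n(n-1))
            gamma[jp] = gamma.get(jp, 0.0) - a / math.sqrt(4 * n * (n - 1))
    return gamma


if __name__ == "__main__":
    # A field equal to sqrt(48) * S_7 is the gradient of Z_7 - sqrt(2) * Z_3.
    print(alpha_to_gamma({7: math.sqrt(48)}))
```

Accumulating from each α_j into both Z_j and its partner gives the same result as evaluating the coefficient formula term by term, but avoids looking up the (n+2, m) partner for every γ_j.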
This procedure is useful for applications such as processing data from a Shack-Hartmann sensor. The centroid data, which are proportional to wavefront slopes, can be fit to the vector S polynomials to give a set of coefficients α_i. These are converted directly to a standard Zernike polynomial representation of the wavefront, with coefficients γ_i.
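A toy version of this slope-fitting workflow is sketched below, under several stated assumptions. The slope data are synthetic (the gradient of 0.05 Z2 + 0.3 Z4 + 0.1 Z6, written out by hand), only S2 to S6 are used, their Cartesian forms are derived here from Noll's Z2 to Z6, and the inner product (1/π)∬ V⃗·S⃗ dx dy is approximated with a midpoint rule on a polar grid. A real sensor would supply slopes at lenslet positions instead of an analytic function.

```python
import math

SQ2 = math.sqrt(2.0)

# Cartesian forms of S_2..S_6 (derived here from Noll's Z_2..Z_6; an assumption).
S = {
    2: lambda x, y: (1.0, 0.0),
    3: lambda x, y: (0.0, 1.0),
    4: lambda x, y: (SQ2 * x, SQ2 * y),
    5: lambda x, y: (SQ2 * y, SQ2 * x),
    6: lambda x, y: (SQ2 * x, -SQ2 * y),
}
# For these low orders grad Z_j = NORM[j] * S_j, so gamma_j = alpha_j / NORM[j].
NORM = {2: 2.0, 3: 2.0, 4: math.sqrt(24), 5: math.sqrt(12), 6: math.sqrt(12)}


def measured_slopes(x, y):
    """Synthetic slope data: gradient of 0.05*Z2 + 0.3*Z4 + 0.1*Z6."""
    gx = 0.05 * 2.0 + 0.3 * 4 * math.sqrt(3) * x + 0.1 * 2 * math.sqrt(6) * x
    gy = 0.3 * 4 * math.sqrt(3) * y - 0.1 * 2 * math.sqrt(6) * y
    return gx, gy


def fit_alpha(field, n_r=200, n_t=64):
    """alpha_j = (V, S_j), approximated by a midpoint rule in r and theta."""
    alpha = dict.fromkeys(S, 0.0)
    dr, dt = 1.0 / n_r, 2 * math.pi / n_t
    for i in range(n_r):
        r = (i + 0.5) * dr
        weight = r * dr * dt / math.pi          # area element divided by pi
        for k in range(n_t):
            t = (k + 0.5) * dt
            x, y = r * math.cos(t), r * math.sin(t)
            vx, vy = field(x, y)
            for j, s in S.items():
                sx, sy = s(x, y)
                alpha[j] += weight * (vx * sx + vy * sy)
    return alpha


alpha = fit_alpha(measured_slopes)
gamma = {j: alpha[j] / NORM[j] for j in alpha}
for j in sorted(gamma):
    print(f"gamma_{j} = {gamma[j]:+.4f}")
```

The simple division by `NORM[j]` is valid here only because the synthetic field contains no terms above n = 2; once higher-order S terms are fitted, each α_j also contributes to the Zernike term with radial order n−2.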
The reverse problem can also be solved: given a scalar function Φ and its Zernike decomposition coefficients γ_i, we can find the α_i from Eq. (13). When Φ is a wavefront, the rms spot radius is
$$
r_{\mathrm{rms}} = 2f\sqrt{\sum_i \alpha_i^2},
$$
where f is the system F number.
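One way to see how the α_i relate to the spot size is through the orthonormality of the S polynomials. The short derivation below is a sketch under stated assumptions: Φ is the wavefront error in units of length, the pupil coordinates are normalized to a unit circle, and the transverse ray error is taken as 2f∇Φ in the usual geometrical approximation.

```latex
\begin{aligned}
\frac{1}{\pi}\iint \lvert\nabla\Phi\rvert^2\,dx\,dy
 &= \Bigl(\sum_i \alpha_i \vec{S}_i,\; \sum_k \alpha_k \vec{S}_k\Bigr)
  = \sum_{i,k} \alpha_i \alpha_k\,(\vec{S}_i, \vec{S}_k)
  = \sum_i \alpha_i^2 ,\\
r_{\mathrm{rms}}
 &= 2f\,\sqrt{\frac{1}{\pi}\iint \lvert\nabla\Phi\rvert^2\,dx\,dy}
  = 2f\,\sqrt{\sum_i \alpha_i^2} .
\end{aligned}
```

Under these assumptions the mean square slope is simply the sum of the squared coefficients, so no integration over the pupil is needed once the α_i are known.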
5. Conclusion
We have derived an orthonormal set of vector polynomials in a unit circle. It has many potential applications, one of which is fitting slope data in optical testing. Each of these polynomials is a linear combination of the gradients of at most two Zernike polynomials, and each can also be expressed as a linear combination of at most four scalar Zernike polynomials. After wavefront slope data, e.g. data taken with a Shack-Hartmann sensor, are fit with the vector polynomials, it is straightforward to convert the fitted slope map to the wavefront map expressed in terms of Zernike polynomials.
References and links
1. R. J. Noll, “Zernike polynomials and atmospheric turbulence,” J. Opt. Soc. Am. 66, 207–211 (1976).
2. M. Born and E. Wolf, Principles of Optics (Pergamon Press, 1980), pp. 464–468.
6. P. C. V. Mallik, C. Zhao, and J. H. Burge, “Measurement of a 2-meter flat using a pentaprism scanning system,” Opt. Eng. 46, 023602 (2007).
8. E. Acosta, S. Bara, M. A. Rama, and S. Rios, “Determination of phase mode components in terms of local wave-front slopes: an analytical approach,” Opt. Lett. 20, 1083–1085 (1995).
9. P. E. Murphy, T. G. Brown, and D. T. Moore, “Interference imaging for aspheric surface testing,” Appl. Opt. 39, 2122–2129 (2000).
10. J. H. Burge, Advanced Techniques for Measuring Primary Mirrors for Astronomical Telescopes, Ph.D. Dissertation, Optical Sciences, University of Arizona (1993).
11. Durango™ Interferometry Software, Diffraction International, Minnetonka, MN.
12. C. Zhao and J. H. Burge, “Orthonormal vector polynomials in a unit circle, Part II: completing the basis set,” to be submitted to Optics Express (2007).
13. T. M. Apostol, Linear Algebra: A First Course, with Applications to Differential Equations (John Wiley & Sons, 1997), pp. 111–114.
14. R. Upton and B. Ellerbroek, “Gram-Schmidt orthogonalization of the Zernike polynomials on apertures of arbitrary shape,” Opt. Lett. 29, 2840–2842 (2004).