Fundamental estimation bounds for polarimetric imagery

Daniel A. LeMaster

doi:10.1364/OE.16.012018

1. Introduction

Passive polarimetric imaging (PI) is a form of remote sensing in which the target scene is characterized in terms of the polarization state of its incoherently reflected or emitted radiation. Polarimetric imaging has been applied to successfully address problems in astrophysics [1], terrestrial remote sensing [2], machine vision [3], and other disciplines. In all such applications, the required polarimetric measurements are achieved by combining multiple imaging channels each with a unique response to polarized light. These channels may be collected in a parallel configuration (i.e. each channel has its own optical path) or in series where one or more polarizing elements are manipulated between channel images. In either case, target polarization state estimates are inherently prone to both channel noise and registration errors. In this paper, we examine the combined effects of misregistration and channel noise in the statistical framework of the Cramer-Rao lower bound (CRLB).

For the present purpose, systems responsive to linear polarization are considered. Persons (et al.) noted that, among common means of collecting polarimetric imagery, translational and rotational registration errors between channels are most prevalent [4]. In this context, Persons and others [5, 6] have made limited attempts to characterize the effects of registration errors on polarimetric imagery using established (but polarization insensitive)registration algorithms. In addition, a recent registration study has also been conducted for the related problem of Mueller matrix imaging [7]. The results of these studies range from successful registration within stated tolerances to complete failure. Each of these efforts is a useful reference point, however, they lack the depth required to make complete comparisons across algorithms or imaging scenarios. Indeed, what is most lacking is a theoretical context in which to place these results.

The first step in providing this context is to identify the problem of estimating the polarimetric signature of a target scene as a joint estimation of the misregistration between channels and of the scene itself. Second, channel noise and misregistration are random variables, therefore, this framework must be statistical in nature. Finally, this framework must account for the influence of deterministic parameters such as channel orientation and spacing as well as polarization diversity in the scene. In this paper, the Cramer-Rao bound is derived for this joint estimation problem and expounded as a theoretical framework that meets all of these criteria.

The Cramer-Rao bound determines the lower limit on the variance of any estimator. The form of the bound is different for biased and unbiased cases. With the exception of section 5, this research focuses mainly on bounds for unbiased estimators. The utility of the CRLB for unbiased estimators is well described by Kay [8]; his arguments may also be tailored to this specific case: if a new estimator is devised, it is more efficient to compare it to a single bound rather than to attempt to simulate results for many different existing estimators. Alternatively, the bound can be used to determine if any existing algorithm meets this best case scenario. If no such algorithm exists, the bound can motivate the search for a better estimator. Similar studies via simulation (e.g. Monte Carlo methods) lack the theoretical foundation which the CRLB readily provides. In the present work, the bound is also shown to provide insight into the relationship between the sensor, target scene, and estimator.

Inspiration for applying this approach to the polarimetric imagery estimation problem comes from two recent works. Robinson and Milanfar [9] applied the Cramer-Rao bound to the problem of estimating the translational misregistration between pairs of (polarization insensitive) images in cases where the underlying scene is not estimated as a parameter. Yetik and Nehorai expanded this work to describe limits for other common misregistration issues (e.g. rotation, affine transformation) and for both feature based and intensity based registration algorithms [10]. Aside from insensitivity to polarization effects, the underlying scene assumption shared by both sets of authors obviates the need for a joint estimator and is therefore inadequate for the present purpose.

The bound derived in this work is most applicable to so called “feature matching, area-based” or “template matching” registration estimators. Common among these methods is that registration is achieved by comparing image intensities without regard to specific objects in the scene. A survey of these methods can be found in [11].

The newly derived bound determines the minimum achievable variance on estimates of the translational misregistration between channels and of the polarization state at each point in the scene. The inclusion of polarization effects and removal of the known image assumption from Robinson and Milanfar leads to a bound calculation that is computationally impractical using direct methods. As such, matrix theory is applied to the newly derived bound so that extraction of the relevant terms becomes tenable. Additionally, many well known polarization insensitive image registration algorithms exhibit bias and, in fact, some biased estimators have been shown to outperform unbiased estimators in theory and in practice. Though no such study has been conducted with polarimetric imagery, a biased estimator form of the CRLB is provided in anticipation of a similar eventual result.

The bound stands on its own as an algorithm evaluation metric, however, it may also be used in a quantitative evaluation of system design criteria. To this end, the bound is used to determine the conditions under which it is necessary to generate the channel registration parameters externally (as opposed to including them in the estimator). Furthermore, the polarization insensitive case is actually a special case of this new bound. As a consequence, Robinson and Milanfar’s bound is generalized to the case where the underlying image is unknown.

Finally, the bound can also be used to suggest optimal channel configurations. Several researchers have shown that optimum configuration (in terms of maximizing SNR) is achieved by distributing the channels evenly across all possible polarization states. In the most recent of these attempts, Tyo improved on this theory by considering the optimization problem in the presence of random channel calibration errors [15]. In perhaps the most directly related research, Tyo also showed that channels that are evenly spaced across all linear polarization states are optimal in a principal components sense [16]. The analysis presented here complements this body of research in that it demonstrates the same conclusion under different sets of assumptions and using different metrics.

1.1. Stokes vector formulation

The Stokes parameters are a standard tool for describing the polarization state of radiation in passive remote sensing literature. The great utility of the Stokes formalism stems from the fact that these parameters can be determined directly from observables. This section provides an introduction to the Stokes representation of polarized radiation to the extent that such knowledge is required for the present purpose. The interested reader can consult Collett [17] for a more detailed description. In the section that follows, the Stokes parameters are used to provide a concise example of how the inclusion of polarization content can frustrate attempts to explain registration estimator behavior using traditional (i.e. polarization insensitive) techniques.

Consider a theoretical optical system consisting of a solid-state detector and two specialized polarization manipulating elements. The first of these two elements, a retarder, introduces a phase delay, ϕ, between the x and y electric field components of the incident signal. The second element, a polarizer, transmits the portion of this field along angle θ (measured with respect to the x axis) and completely attenuates the field everywhere else. The intensity measured at the detector is given by:

I = \frac{1}{2} (S_{0} + S_{1} \cos 2 θ + S_{2} \cos ϕ \sin 2 θ + S_{3} \sin ϕ \sin 2 θ) .

Consequently, the measured intensity is a weighted sum of the Stokes parameters. Physically, S ₀ is the total incident intensity, S ₁ and S ₂ together describe the inclination angle of the polarization ellipse, and S ₃ and S ₀ together describe its ellipticity. The Stokes formalism is also capable of describing partially polarized, broadband radiation whereas a field based representation (e.g. the Jones formalism) is not. As stated previously, system responsiveness to the ellipticity of the incident field is not considered in this research. Consequently, the phase delay parameter ϕ will be assumed to be zero for each imaging channel. In other words, I is not a function of S ₃ and all measurable polarization effects will be due to linear polarization.

After setting ϕ = 0, one must still contend with polarizing optical elements that are less than ideal. That is to say, the preferred signal may be partially attenuated while, at the same time, the remaining signal is not completely suppressed. Consequently, the intensity equation is better represented with weights on the Stokes parameters that are determined via lab calibration.

I_{x} = a_{x 0} S_{0} + a_{x 1} S_{1} + a_{x 2} S_{2}

where x refers to the specific channel in question. For an ensemble of channels, it is useful to define vectors for intensity measurements, I, and Stokes parameters, S, such that I = A S and matrix A consists of the channel Stokes weighting parameters:

A = [\begin{matrix} a_{10} & \dots & a_{12} \\ ⋮ & ⋱ & ⋮ \\ a_{N 0} & \dots & a_{N 2} \end{matrix}]

1.2. A quick example

Building upon the previous section, the results of a simple laboratory experiment are presented in order to illustrate some issues regarding the registration of polarimetric imagery. The underlying assumption in many registration algorithms of the aforementioned “template matching” category (e.g. cross-correlation and variants) is that the images under test contain the same scene content but are translated and corrupted by noise. Other methods in this category, such as registration via mutual information, are perhaps more suited toward the registration of polarimetric imagery in that a statistical dependence between intensity values in each image is assumed rather than a direct correspondence in intensity. To illustrate these differences, the following example was contrived to show how a registration algorithm can utterly fail when strong polarization content violates its underlying assumptions.

Figure 1 contains the result of an attempt to register the three channels of an imaging polarimeter using cross-correlation. The target scene contains two fully polarized bars from a resolution target oriented such that the bars have orthogonal polarization states. The channels are spaced such that each is most sensitive to polarization at 0°, 60°, or −60°. The target polarization is rotated slightly (≈ 10°) out of the sensor reference frame so that each bar appears (to a greater or lesser extent) in each channel. False color and an arrow are used to accentuate the weak signal of the top bar in the 0° channel.

Clearly, these three channels do not contain the same scene content which is in violation of the assumptions built into the cross-correlation algorithm. As a consequence, the bottom bar in channel 0° has been misregistered to the top bars in the 60° and −60° channels. The Stokes parameter images of this scene (figure 2) provide an explanation of this behavior.

Fig. 1. The misregistered polarimetric images

Download Full Size | PDF

Fig. 2. The Stokes parameter images of the bar target

Download Full Size | PDF

Note that S ₁ and S ₂ can take on negative values; the reader should interpret the dark regions in the S ₁ and S ₂ images in this way. This behavior is unlike S ₀, which is strictly positive. The comparatively higher contrast in the S ₁ image occurs because the signal in S ₂ is much weaker than in S ₁. If the sensor were not sensitive to polarization then the captured image in each channel would be the S ₀ image.

Comparing the Stokes images in figure 2 with the intensity equation (1), the bottom bar in the 0° channel image is bright because its preferred polarization state is nearly parallel to the preferred polarization state of the channel. In this region I $I \approx \frac{1}{2} (S_{0} + S_{1})$ where S ₁ is a positive quantity. The small contribution from S ₂ in this channel is ignored. The top bar is orthogonally polarized to channel 0° and in this region the form of the intensity equation is the same but S ₁ is negative. A similar analysis could be conducted for the remaining channels but the point of this section has already been made: the phenomenology of polarization imagery is different than that of traditional intensity imagery and as such, the rules developed for image registration must be reevaluated in this new context.

2. Bound definition and data model

In this section, the Cramer-Rao bound is defined and the components required to calculate the bound are identified. These components are then specified for the specific problem of calculating the bound for a joint registration and Stokes parameter estimator.

2.1. Definition of the Cramer-Rao bound

Let Z be a vector of random variables that is parameterized by vector θ. Define θ̂ to be any unbiased estimate of these parameters and z to be one realization of Z. The Cramer-Rao in-equality provides the lower bound on this estimator’s error covariance matrix in terms of the Fisher Information Matrix (FIM), J:

Cov [\hat{θ}] \geq J^{- 1},

where

J = - E [\frac{\partial}{\partial θ} {(\frac{\partial}{\partial θ} L (θ, z))}^{T}]

and L(θ,z) is the data log-likelihood function [12]. Note that E[…] represents the expected value operation over Z. The minimum variance for the estimate of each parameter in θ is given by the diagonal elements of J ⁻¹. When J is not positive definite the Cramer-Rao bound is not defined.

The concept of a log-likelihood function may require some elaboration. A likelihood function describes how the probability density for a given measurement changes as θ changes. Hence, calculation of the likelihood function requires knowledge of the probability density function for Z, p _θ (z), at each measured z. The log-likelihood function is simply the natural log of the likelihood function:

L (θ, z) = \ln p_{θ} (z) .

2.2. Data model

The bound described in the previous section is defined in terms of a set of random variables Z, parameter vector, θ, and their corresponding log-likelihood function, L(θ,z). The purpose of this section is to define these items for the specific problem of generating polarimetric imagery from noisy, misregistered data.

In this scenario, the collected images are realizations of Z. Consistent with [9] and [10], the images to be registered are modeled as sampled, noisy versions of the continuously varying underlying target scene. Channel-to-channel point spread function variations are assumed to be minimal, sampling is at the Nyquist frequency or better, and noise in the scene is zero-mean, Gaussian and IID. The sampled coordinates in the (arbitrarily selected) first imaging channel, f ₁, are taken to be the reference by which the remaining channels, f ₁ to f_N, are specified. Mathematically, the collected image z_i for channel i is given by:

z_{i} (m_{n}) = f_{i} (m_{n}) + ε_{i} (m_{n}),

where m _n is the 2-dimensional coordinates of a given pixel in the image plane and ε_i(m _n) is a realization of the channel noise. The boundaries of m _n are determined by the region over which the collected images overlap. This overlap is assumed to be square (1 ≤ n ≤ p ² for a p× p image). In a departure from the cited research, the image content in each channel is determined by a shared Stokes parameter mapping rather than a common intensity mapping:

f_{1} (m_{n}) = \sum_{j = 0}^{2} a_{1 j} S_{j} (m_{n})

where v _i is the 2-dimensional translational misregistration between f _i and f ₁.

For each channel, there is a one-to-one correspondence between the 2-dimensional image and a 1-dimensional vector:

z_{i} = {[z_{i} (m_{1}) \dots z_{i} (m_{p^{2}})]}^{T}

which, in turn, allows for a very compact expression of the following results.

As previously stated, the per-pixel channel noise is zero mean, IID and Gaussian; consequently,the data log-likelihood function is given by:

L (θ, z) = \frac{- 1}{{2 σ}^{2}} \sum_{i = 1}^{N} {(z_{i} - f_{i})}^{T} (z_{i} - f_{i}) + ξ

where σ ₂ is the noise variance and ξ is a constant term that is not dependent on θ. It is clear from this equation that the log-likelihood is dependent on the region of overlap defined by the intersection of all images f_i. This region of intersection is, in turn, dependent on the relative misregistration between the images. Following the lead of Yetik, this overlap region is assumed to be constant. The efficacy of this assumption is greatest when the relative misregistration between images is small when compared to the dimensions of the overlap region; it is reasonable to assume that a multi-channel polarimeter will be operating in this regime.

Additionally, calculation of the Fisher information matrix requires an unambiguous ordering in the parameter vector θ. Since the goal is to place a bound on a joint estimator of the translational shifts between images and the values of the Stokes parameters at each pixel in the image, this parameter vector will be very large indeed:

θ = {[v_{2}^{T} \dots v_{N}^{T} S_{0}^{T} \dots S_{2}^{T}]}^{T},

where, analogous to the vectorized form of the channels mean images, f_i, each Stokes parameter vector is given by:

S_{i} = {[S_{i} (m_{1}) \dots S_{i} (m_{p^{2}})]}^{T} .

Scharf [12] provides an expression for the Fisher information matrix for m observations of an multivariate, normally distributed random vector with covariance matrix, R, and mean, f:

J_{ij} = m \frac{\partial f^{T}}{\partial θ_{i}} R^{- 1} \frac{\partial f}{\partial θ_{j}}

As shown in (10), there is one observation (i.e. m = 1) for each of N random images. Each image i has a covariance matrix R=σ ² I and mean f_i. It is straightforward to show that the FIM in this case is simply a sum of N instances of (13):

J_{ij} = \frac{1}{σ^{2}} \sum_{n = 1}^{N} \frac{\partial f_{n}^{T}}{\partial θ_{i}} \frac{\partial f_{n}}{\partial θ_{j}}

3. Fisher information for the joint estimator

The Fisher information matrix from (14) is developed in this section by partitioning J into submatrices. Matrix partitioning is used to exploit the inherent symmetry in the Fisher information matrix and to lend insight into the physical interpretation of J. In [13], Scharf and McWhorter show that a FIM in the form of (14) may be partitioned so that the subset FIMs of the parameter space may be considered individually. In this work, it is useful to partition the parameter space such the sub-FIM V describes correlations amongst the translational registration parameters and the sub-FIM S describes correlations between Stokes parameters. Define:

J = [\begin{matrix} V & H^{T} \\ H & S \end{matrix}]

where H, to use Scharf and McWhorter’s parlance, relates the intercorrelations between the registration and Stokes partitions in θ. In what follows, each of V, H, and S are described in detail.

The first matrix, V, is actually the FIM for an unbiased estimator of the misregistration between channels when the underlying Stokes images are known a priori. The inverse of V is the Cramer-Rao bound for the registration estimator under this “known prior” condition. In the two channel polarization insensitive case, V is the bound from Robinson and Milanfar. V is composed of (N−1)² submatrices of the form:

V_{ij} = {\begin{matrix} \frac{1}{σ^{2}} (\frac{\partial}{\partial v_{i}} f_{i}^{T}) {(\frac{\partial}{\partial v_{j}} f_{j}^{T})}^{T} if i = j, \\ 0_{2 \times 2} if i \neq j \end{matrix}

where 0 _2×2 is a 2×2 zero matrix. Though it does not appear explicitly in (15), it is also useful to define Ṽ with the same form as V but with entries:

\tilde{V_{ij}} = {\begin{matrix} 0_{2 \times 2} if i = j, \\ \frac{1}{σ^{2}} (\frac{\partial}{\partial v_{i}} f_{i}^{T}) {(\frac{\partial}{\partial v_{j}} f_{j}^{T})}^{T} if i \neq j \end{matrix}

The role of Ṽ will be demonstrated in the following section. For now, consider that the form of V demonstrates that errors in the estimates of the misregistration parameters are uncorrelated between channels when perfect knowledge of the underlying scene exists. When this perfect knowledge does not exist, Ṽ will be used to describe the nature of the correlation between channels.

In the lower right quadrant of (15) is the 3p ²×3p ² matrix S. Note that S is the FIM that would be used to estimate the bound on a unbiased Stokes estimator if the relative shifts between the collected images were known a priori. S divides into 3×3 submatrices corresponding to combinations of Stokes parameters (images).

S_{jk} = \frac{1}{σ^{2}} (I_{p^{2} \times p^{2}}) \sum_{i = 1}^{N} a_{ij} a_{ik}

where I _p²×p² is an identity matrix of rank p ². The block symmetry in S allows for substantial further simplification by defining the matrix:

C = [\begin{matrix} c_{00} & \dots & c_{02} \\ ⋮ & ⋱ & ⋮ \\ c_{20} & \dots & c_{22} \end{matrix}], c_{jk} = \sum_{i = 1}^{N} a_{ij} a_{ik}

in which case:

S = \frac{1}{σ^{2}} C \otimes I_{p^{2} \times p^{2}}

where ⊗ is the Kronecker product. This Kronecker product representation significantly simplifies inversion of the S matrix.

S^{- 1} = σ^{2} C^{- 1} \otimes I_{p^{2} \times p^{2}}

In other words, the inversion of the 3p ²×3p ² matrix S is solved by simply inverting the 3×3 matrix C.

Connecting the matrices V and S is the 3p ² ×2(N −1) matrix H. Physically, if H were the zero matrix then the bounds on the registration and Stokes parameters could be determined independently of each other. Proof of this statement is provided in section (4). In the process of defining H, it becomes obvious that this independence condition is never met. H is composed of 3×(N−1) readily identifiable submatrices:

H_{ij} = \frac{a_{ji}}{σ^{2}} {(\frac{\partial}{\partial v_{j}} f_{j}^{T})}^{T} .

Unless the collected image is constant everywhere (i.e. the derivative of the image is zero everywhere) then H is non-zero and the covariance bounds must be determined jointly. More on the relationship between H, V and Ṽ can be found in appendix A.

4. Bound derivation

The Fisher information matrix has been shown to contain [2(N−1)+3p ²]×[2(N −1)+3p ²] entries. To put the enormity of this matrix into perspective, consider the bound calculation for a four channel polarimeter used to estimate the first three Stokes parameters. Assuming a 512×512 overlap region, the corresponding FIM has approximately 6.18×10¹¹ entries. Inversion of such a large matrix is prohibitive. In this section, the partitioning of the Fisher information matrix from the previous section is exploited to make this inversion problem tractable.

4.1. Block matrix inversion

As previously discussed, the variance bound for each parameter in θ is given by the diagonal entries of J ⁻¹. Consequently, computational expense can be significantly reduced by avoiding the unnecessary calculation of many of the off diagonal terms in the inverse. A trivial rearrangement of the partitioned inverse of a block matrix in [18] provides the following:

B_{v} = {(V - H^{T} S^{- 1} H)}^{- 1}

and

B_{S} = S^{- 1} + S^{- 1} (H B_{v} H^{T}) S^{- 1}

where B_ν and B_S, which are submatrices of J⁻¹, identify the bounding covariance matrices on the shift parameters and Stokes images. As an aside, if H = 0 then B_ν = V ⁻¹ and B_S = S ⁻¹, thus demonstrating the physical interpretation of H put forth in the previous section.

Equations (23) and (24) readily demonstrate how uncertainty in the misregistration between channels impacts uncertainty in the Stokes image estimates and vice versa. Similar to the H =0 case, if the underlying Stokes images are known perfectly then the Cramer-Rao bound on the misregistration estimates would simply be B_ν = V ⁻¹. Likewise, if perfect knowledge of the registration parameters existed then B_S = S ⁻¹. Consequently, it is clear that B_ν and B_S are always larger than V ⁻¹ and S ⁻¹ in the absence of perfect knowledge.

4.2. Simplified registration parameter bound

Matrix B_ν is addressed first because it is required to calculate B_S. As preliminary work, note that:

H^{T} S^{- 1} H = σ^{2} H^{T} (C^{- 1} \otimes I_{p^{2} \times p^{2}}) H .

Consequently, H^T S ⁻¹ H can itself be partitioned into a matrix, D, such that:

D_{hk} = σ^{2} \sum_{i = 0}^{2} \sum_{j = 0}^{2} C_{ij}^{- 1} H_{ih}^{T} H_{jk} .

which provides the opportunity to profitably apply equations (50) and (51) from appendix A:

D_{hk} = {\begin{matrix} \sum_{i = 0}^{2} \sum_{j = 0}^{2} C_{ij}^{- 1} a_{hi} a_{hj} V_{hh} if h = k, \\ \sum_{i = 0}^{2} \sum_{j = 0}^{2} C_{ij}^{- 1} a_{hi} a_{kj} \tilde{V_{hk}} if h \neq k \end{matrix}

Furthermore, it is mathematically expedient to add and then subtract V into the definition of D such that:

D = - W_{v} • (V + \tilde{V}) + V

where • is the Hadamard product and all sensor dependent terms have been bundled into:

W_{v} = (I_{(N - 1) \times (N - 1)} - M C^{- T} M^{T}) \otimes 1_{2 \times 2}

The matrix M is formed of the 2 to N rows of A, the matrix of per channel Stokes weighting parameters from (3). The bound on the shift estimates can now be expressed concisely:

B_{v} = {(V - D)}^{- 1} = {[W_{v} • (V + \tilde{V})]}^{- 1}

As anticipated, Ṽ has indeed provided the constituents for the cross-covariance terms in B_ν.

4.3. Simplified Stokes parameter bound

The impetus behind much of the preceding section was to avoid having to manipulate an unwieldy Fisher information matrix directly while still achieving the Cramer-Rao bound on the registration parameters. The goal in this section is to do the same for the bound on the Stokes image estimators, B_S. We propose that the variance bound on the Stokes parameter estimate for any one pixel in the image is of less interest than the average bound across the image. In turn, this calculation is significantly simplified by applying properties of the trace and of the Kronecker product as shown in appendix B, where the bound is derived in detail. What follows are the highlights of this derivation.

Define B_Si, the covariance matrix for an estimator of Stokes image S _i, to be a submatrix of B_S. Equivalently, let Γ_i be the submatrix of S corresponding to the Stokes image S _i:

Γ_{i}^{- 1} = σ^{2} C_{ii}^{- 1} \otimes I_{p^{2} \times p^{2}}

and define:

Φ_{i}^{- 1} = σ^{2} C_{i}^{- 1} \otimes I_{p^{2} \times p^{2}}

where

C_{i}^{- 1} = [C_{i 0}^{- 1} \dots C_{i 2}^{- 1}]

then the average bound for Stokes image S _i is defined to be:

〈 B_{Si} 〉 = \frac{1}{p^{2}} tr (B_{Si})

Expanded out, the trace term is:

tr (B_{Si}) = tr (Γ_{i}^{- 1}) + tr [Φ_{i}^{- 1} (H B_{v} H^{T}) Φ_{i}^{- T}]

and because of the implicit symmetry of V and Ṽ:

tr [Φ_{i}^{- 1} (H B_{v} H^{T}) Φ_{i}^{- T}] = σ^{2} vec {[W_{Si} • (V + \tilde{V})]}^{T} vec (B_{v})

where

W_{Si} = M (C_{i}^{- T} C_{i}^{- 1}) M^{T} \otimes 1_{2 \times 2}

this result can be folded back into (34) to produce:

〈 B_{Si} 〉 = σ^{2} C_{ii}^{- 1} + \frac{σ^{2}}{p^{2}} tr [W_{Si} • (V + \tilde{V}) B_{v}]

Hence, substantial simplification of the bound calculation is achieved. The similarities between ⟨B_Si⟩ and B_ν are clear. The terms in B_S that are not contained in B_Si can be ignored because they do not influence the trace. Also, it is interesting to note how the sensor itself (realized by W_Si and W_ν in B_ν) plays opposing roles in equation (38) analogous to multiplication and division if this were a purely scalar case.

5. Biased estimators

A study of estimator bias in the registration of polarimetric imagery has not been addressed in the literature. However, there are several well known examples of estimator bias in traditional,polarization insensitive registration algorithms. In anticipation of this future work, what follows is a discussion of incorporating bias into the results from sections 4.3 and 4.2.

According to Scharf [12], the Cramer-Rao bound on the covariance matrix of a biased estimator is given by:

Cov [\hat{θ}] \geq Δ^{T} J^{- 1} Δ

where, J, is the Fisher information matrix (identical to the unbiased case), $\hat{θ}$ is the estimate of θ and

Δ = \frac{\partial}{\partial θ} E {[\hat{θ}]}^{T}

is the partial derivative of the expected value of the estimator. The expected value of $\hat{θ}$ is estimator specific (i.e. registration algorithm specific) and, furthermore, the CRLB of a biased estimator may be higher or lower than that of an unbiased estimator. Note that in the unbiased case, Δ = I and equation (39) reduces to equation (4). In what follows, a specific partition of Δ is defined via subscript. For instance,Δ_ν refers to a partition of Δ dealing strictly with estimates of the registration parameters whereas Δ_S₀ refers to estimates of the Stokes image S ₀.

The biased estimator bound for the registration parameters, B̃_ν, is easily achieved by combining (30) with (39):

{\tilde{B}}_{v} = Δ_{v}^{T} {(W_{v} • (V + \tilde{V}))}^{- 1} Δ_{v}

which is made possible by exchanging J ⁻¹ in (39) with its submatrix B_ν (from the unbiased case). In what appears to be a very small step, this equation shows that the CRLB can be decomposed into an scene specific part, V + Ṽ, a sensor specific part W_ν, and an estimator specific part, Δ_ν. This separation may not be complete in that the channel spacing effects V + Ṽ and, more than likely, bias in the estimator will be to some extent affected by scene polarimetric content.

Without the context of a specific registration algorithm, interpretation of the biased CRLB for a Stokes estimator is less straightforward. Combining (35) with (39) reveals that the trace on the biased Stokes bound is:

tr ({\tilde{B}}_{Si}) = tr (Δ_{Si}^{T} Γ_{i}^{- 1} Δ_{Si} + Δ_{Si}^{T} Φ_{i}^{- 1} (H B_{v} H^{T}) Φ_{i}^{- T} Δ_{Si})

which may be reduced somewhat by substituting Δ^T _Si Φ ⁻ _i in for Φ ⁻¹ _i in appendix B and noting that:

Φ_{i}^{- T} Δ_{Si} Δ_{Si}^{T} Φ_{i}^{- 1} = σ^{4} C_{i}^{- T} C_{i}^{- 1} \otimes Δ_{Si} Δ_{Si}^{T}

resulting in:

tr (\tilde{B_{Si}}) = tr (Δ_{Si}^{T} Γ_{i}^{- 1} Δ_{Si}) + σ^{4} tr (H^{T} (C_{i}^{- T} C_{i}^{- 1} \otimes Δ_{Si} Δ_{Si}^{T}) {HB}_{v})

which is, if not quite as simple as the registration parameter case, certainly more computationally efficient than directly inverting the FIM. In terms of interpretation, the trace is composed of the sum of two terms, the first of which depends only on the estimator and the sensor. Later, in section 6.2, it is shown that the equivalent to this term in the unbiased case dominates the solution to the problem of an optimum channel spacing. It will be left to future work to show that this is the case for specific biased estimators.

Though there are currently no studies of bias in the registration of polarimetric imagery, the interested reader will be well served by referring to [9] for an in depth discussion of bias in gradient based registration algorithms for polarization insensitive imagery.

6. Example bound calculations

In the examples that follow, an unbiased estimator is assumed throughout.

6.1. Bounds on polarization insensitive imagery

Bounds on polarization insensitive imagery are presented here as a special case of the results derived in section 4. This case serves as both a practical example of how the Cramer-Rao bound can be applied effectively and as a bridge between this work and the work of Robinson and Milanfar.

We seek the bound on a joint estimator of a scene and the registration parameters amongst N noisy samples of this image. Recall that S ₀ represents total scene intensity, therefore, S ₀ is the only Stokes image of interest. In this scenario, the matrix in equation (19) reduces to a scalar, C = N, because there is only one parameter per pixel to be estimated. The channel transmission coefficient, a _i0, is assumed to be unity for each channel. In that case, M = 1_(N−1)×1. Therefore, each of Wν and W _S₀ are the 2(N−1)×2(N−1) matrices:

{(W_{v})}_{ij} = {\begin{matrix} 1 - \frac{1}{N} if i = j, \\ - \frac{1}{N} if i \neq j \end{matrix}

and

W_{S_{0}} = \frac{σ^{2}}{N^{2}} \otimes 1_{2 (N - 1) \times 2 (N - 1) .}

To reiterate an important point, both V and Ṽ are image dependent. That being said, the overall bound’s behavior with increasing N can be predicted by the behavior of W_ν and W _S₀ for any image. Equation (45) shows that B_ν = 2V ⁻¹ in the two channel case and B_ν = V ⁻¹ in the limit of N. Essentially, the Ṽ matrix is suppressed by the 1/N terms in W_ν as N increases. These suppressed terms represent the decreasing influence of each individual image as the true underlying image emerges. In this limit, the bound parameters for each image asymptotically achieve the bound predicted by Robinson and Milanfar. Assuming a few images do not differ substantially from the rest of the ensemble (e.g. due to a large translational error or parallax) then the results follow a ${(1 - \frac{1}{N})}^{- 1}$ progression.

Likewise, the progression of the estimated image toward the true image can be ascertained from the behavior of ⟨B_S₀⟩. Again, we consider the endpoints. In the two channel case:

〈 B_{S_{0}} 〉 ∣_{N = 2} = \frac{σ^{2}}{p^{2}} + \frac{σ^{2}}{2}

and the large N case:

{〈 B_{S_{0}} 〉 ∣}_{N \to large} = \frac{σ^{2}}{p^{2}} \frac{2 (N - 1)}{N^{2}} + \frac{σ^{2}}{N}

It would appear that the behavior of the this bound is dominated by $\frac{σ^{2}}{N}$ . As shown in figure (3), the broad applicability of the preceding observations can be verified experimentally.

Twenty realizations of two test images, each with unique spatial characteristics, are generated with white Gaussian noise statistics and an uniformly distributed random shift error between ±5 pixels in any direction. The scene average signal-to-noise ratio is 3. The bound on the average pixel value estimates,⟨B_S₀⟩, and the average misregistration bound i.e. tr(B_ν)/(N −1)) are normalized and plotted against the number of frames, N. Normalization is carried out to illustrate the 1/N behavior of the intensity variance bound and the ${(1 - \frac{1}{N})}^{- 1}$ behavior of the shift estimator bound. For comparison, each of these predicted curves are shown as a red dashed line underneath the data.

Fig. 3. Polarization insensitive imagery and its estimation bounds

Download Full Size | PDF

The expected behavior of the bound with increasing N is confirmed for these disparate examples. The deviation of average registration parameter bound in the G.G. Stokes image from the expected trend demonstrates the slight influence of changes in image overlap area on the results.

6.2. The four channel, three Stokes case

Unambiguous determination of a linear polarization state requires three or more polarization imaging channels. In the following section, we demonstrate that a minimum of four channels is required for the joint registration and Stokes parameter estimation problem. In this section, example Cramer-Rao bounds are calculated as a function of polarization channel orientation for three distinct imaging scenarios. The purpose of this exercise is to illustrate the effects of channel orientation on the Cramer-Rao bound and, in this context, to describe the optimum channel configuration.

Along with scene content, the orientation of each polarization channel can be expected to affect the bound of the estimated Stokes parameters. As discussed in the introduction, Tyo has shown that an optimum combination of polarization channels exists in a principal components sense for monochromatic, fully polarized radiation that is uniformly distributed over all states of linear polarization (e.g. not image specific) [16]. Tyo’s optimum combination of channels, in the four channel case, corresponds to a spacing of 45°. Since his work was brought about from different assumptions and desired outcomes, it is of interest to compare this result to the optimum configuration as defined by the Cramer-Rao bound.

Examples from three real-world polarization imagers are selected to represent the diversity of polarization imaging scenarios. The first case is the bar target from figure 2 which was collected in the lab using a three channel imaging polarimeter in the visible regime. The ultraviolet astronomical data of galaxy Markarian 3 comes from the (now decommissioned) Faint Object Camera that was aboard the Hubble Space telescope until 2002. Finally, the machine vision example of a printed circuit board is taken from laboratory data collected in the visible regime (using a different polarimeter than in the the bar target case). The Stokes images for the Markarian 3 and PC board cases are shown in the following figures.

Fig. 4. The Markarian 3 test scene

Download Full Size | PDF

Fig. 5. The printed circuit board test scene

Download Full Size | PDF

The bound characteristics are determined by varying the angular spacing between polarization elements in one degree increments. The separation angle between channels is measured with respect to the first channel, which is held fixed. Each channel is evenly spaced and, in the plots that follow, this channel spacing is used to index the bound results. For example, the reader can infer that a channel spacing of 5° corresponds to an orientation of 5°, 10°, and 15° for the second, third, and fourth channels with respect to the first. Figure (6) shows the average bound on the Stokes parameters for each of the three test cases. Each plot is normalized by σ ⁻² since it may be divided out of equation (38).

As should be expected, the largest variances in Stokes image estimates occur when the polarization channels are closely spaced. Somewhat less expected is the observation that scene content appears to have little influence on individual results for the Stokes estimator in terms of the overall trends. Though there are small scale differences, the Stokes parameter bounds for all images closely follow the evolution of C ⁻¹ _ii with increasing channel spacing. This result is significant for two reasons. First, there appears to be global agreement as to which channel spacing is most preferable, at least for the test cases sampled here. Second, it would appear that each B_Si is very well approximated by direct interrogation of S ₋₁, at least in an average sense. In other words, the Stokes parameter bounds are effectively scene independent with respect to channel spacing. Recall that S ⁻¹ is, by itself, the Cramer-Rao bound on the Stokes estimates when perfect knowledge of the misregistration parameters is available, consequently, these observations apply equally to that scenario.

The bound for the S ₁ and S ₂ estimates meet at Tyo’s predicted optimal channel spacing of 45°, however, S ₁ has a minimum bound at a somewhat closer channel spacing. Consequently, there is no global minimum bound for all Stokes parameters. Rather, the 45° spacing is the point where there is no preferred parameter. This point is significant because, as stated previously, the Stokes parameters are defined with respect to some coordinate system and, as this system changes in relation to the target (e.g. through camera motion) then scene content can shift between S ₁ and S ₂. With this qualifier, a 45° channel spacing can be said to be optimal for a four channel system.

Figure (6) also contains a plot of the average registration parameter bound for each of the three channels. As before, σ ² has been normalized out. Unlike the bounds on the Stokes parameters, these bounds depend both on scene content and channel orientation. Consistent with the Robinson and Milanfar analysis of V in the polarization insensitive two channel case, the difference in bound magnitude correlates with the amount of high spatial frequency content in the test images. The PC board image, with its multi-faceted geometric features, generates the lowest bound while the galaxy Markarian 3, which has largely diffuse features, generates the highest bound.

6.3. The three channel, three Stokes case

The three channel case is important because it is the minimum number of channels required to completely describe linear polarization (i.e. S ₀, S ₁, S ₂). The interested reader may easily verify this three channel prerequisite for themselves by attempting to calculate B_S with only two channels. Three channel polarimeters are also common in practice; all examples in the previous section were originally collected by different three channel systems. In this section, the three channel polarimeter is examined from a joint estimation perspective.

First, consider the bound matrix B_ν for some combination of three polarimeter channels represented by weight matrix A. Immediately, a mathematical difficulty arises:

W_{v} = 0_{4 \times 4}

in which case equation (30) is uninvertible or, in other words, the CRLB for the joint estimator (biased or unbiased) is infinity.

Fig. 6. Bound results for the four channel case

Download Full Size | PDF

At first glance, the result in (49) appears to be inconsistent with the fact that, three channel polarimeters are routinely employed in practice. The key difference, however, is that the data from these systems are not reduced using joint estimation algorithms. Rather, shift estimation and Stokes estimation are treated as separate problems. In other words, external measurement of the registration parameters is always required in the three channel case.

In terrestrial remote sensing applications, it is often the case that much of the scene is dominated by weakly polarized content (an obvious exception is a target scene composed over the water). In this case, the channel-to-channel registration problem for each pair of images is well approximated by the two channel polarization insensitive case described in section (6.1).

Assuming external measurement of the registration parameters by whatever means, there is likely a regime where S ⁻¹ will dominate equation (24), analogous to the four channel joint estimation examples. Following the prescribed method in the previous section, the diagonal terms in J ⁻¹ _S will be jointly minimized for S ₁ and S ₁ when the angular separation between channels is 60°. In this sense, the Cramer-Rao bound for the Stokes estimates can be compared across the three and four channel cases. Note from the table below that S ₀ follows the $\frac{1}{N}$ pattern established for the polarization insensitive case while S ₁ and S ₂ go by $\frac{2}{N}$ .

7. Conclusion

In this paper, the Cramer-Rao lower bound is derived for the problem of jointly estimating Stokes images and random misregistration parameters in the presence of channel noise. Direct inversion of the prohibitively large Fisher information matrix is bypassed by applying matrix theory to express the resulting bounds in a tractable form. The effects of estimator bias on the bound are discussed up to the point where it becomes necessary to specify a specific estimator. The bound is then used to describe three and four channel polarimetric imaging systems as well as the special case of registering N polarization insensitive images. The bound itself is a useful evaluation metric because it incorporates both sensor and estimation algorithm effects and can be used to describe these effects theoretically in a way that pure simulation can not. In addition, the following general conclusions can be drawn from the results:

Table 1. Average bounds on Stokes parameter estimates in the S ⁻¹ dominant regime

View Table

• The minimum achievable estimator variance is scene and channel orientation dependent in the unbiased case. The estimator itself also effects the bound in the biased case.

• The bounds on the registration and Stokes parameters are dependent.

Furthermore, specific conclusions can be drawn from the section 6 case studies:

• In the polarization insensitive case, the Cramer-Rao bound derived by Robinson and Milanfar is shown to be the asymptotic limit of the joint estimation bound as the number of frames approach infinity.

• In the three channel case, the Cramer-Rao bound for any joint estimator is infinity. Consequently, the registration parameters and Stokes images must be estimated separately.

• The form of the bound suggests that the optimum channel spacing is 60° and 45° respectively in the three and four channel cases. Optimum, in this context, refers to a joint minimum bound for S ₁ and S ₂. These arrangements do not guarantee that the bound on the registration estimator is minimized.

This paper opens the possibility for substantial future work. This research could follow the lead of Robinson and Milanfar [9] and address the problem of registration estimator bias for a specific algorithm as it applies to polarimetric imagery. Additionally, Yetik and Nehorai’s work could be expanded to include a polarimetric bound for other types of image deformation (e.g. rotation, shearing, etc.) or bounding estimator error for polarimetric registration algorithms that use feature extraction techniques. Additionally, a search could be conduced for a joint estimator that meets the Cramer-Rao bound. Finally, the idea of representing bounds on image registration as a realization of underlying parameters (in this case, as the weighted sum of the Stokes parameters) may be used to derive bounds for registering multi-modal images if a similar underlying relationship exists.

A. Appendix: More on the V and Ṽ matrices

The matrices V and Ṽ play a central role in the bound calculation. In addition, the V matrix provides the connection between the previous work by Robinson and Milanfar [9] and the generalization provided here. Consequently, some useful properties of these matrices are developed in this section.

First, there is an important connection between the V, Ṽ , and H_ij:

{(H_{ij})}^{T} H_{kj} = \frac{a_{ji} a_{jk}}{σ^{4}} {(\frac{\partial}{\partial v_{j}} f_{j}^{T}) (\frac{\partial}{\partial v_{j}} f_{j}^{T})}^{T} = \frac{a_{ji} a_{jk}}{σ^{2}} V_{jj}

and

{(H_{lj})}^{T} H_{ki} = \frac{a_{jl} a_{ik}}{σ^{4}} {(\frac{\partial}{\partial v_{j}} f_{j}^{T}) (\frac{\partial}{\partial v_{i}} f_{i}^{T})}^{T} = \frac{a_{jl} a_{ik}}{σ^{2}} {\tilde{V}}_{ji}

for all i≠ j.

Second, each v_i is established in the plane of the image. Therefore, a common direction vector, x, for all relevant derivatives can be defined via the chain rule:

\frac{\partial}{\partial v_{i}} f_{i}^{T} = \frac{\partial}{\partial x} f_{i}^{T} .

Consequently, each entry in V and Ṽ is defined in a common coordinate system.

B. Appendix: Detailed derivation of the Stokes parameter bound

The average bound can be determined by calculating the trace of equation (24) via application of the following properties of the trace and the Kronecker product. For the trace [19]:

tr (A + G) = tr (A) + tr (G)

tr (CGD) = tr (DCG)

tr (EF) = vec {(E^{T})}^{T} vec (F)

where A and G are square matrices and C, D, E, and F are any matrices such that CGD and EF are square matrices. Operator tr represents the trace and, for any matrix A, vec(A) is defined to be an ordered stack of the columns of A. For the Kronecker product (also from [19]):

vec (AGC) = (C^{T} \otimes A) vec (G)

(A \otimes C) (G \otimes D) = (AG \otimes CD)

both of which always hold whenever AGC, AG, and CD are defined. Finally, note that, in contrast to regular matrix multiplication:

{(A \otimes G)}^{T} = A^{T} \otimes G^{T}

{(A • G)}^{T} = A^{T} • G^{T}

Armed with the preceding formulae, derivation of the average bound for each Stokes image is straightforward. Starting with the second term in equation (35):

tr [Φ_{i}^{- 1} ({HB}_{v} H^{T}) Φ_{i}^{- T}] = tr [Φ_{i}^{- T} Φ_{i}^{- 1} (H B_{v} H^{T})]

Fortunately, expression H^T (Φ^−T _iΦ⁻¹ _i)H is identical in form to equation (25) and can therefore be simplified in the same manner:

H^{T} (Φ_{i}^{- T} Φ_{i}^{- 1}) H = σ^{4} H^{T} (C_{i}^{- T} C_{i}^{- 1} \otimes I_{p^{2} \times p^{2}}) H

where

W_{Si} = M (C_{i}^{- T} C_{i}^{- 1}) M^{T} \otimes 1_{2 \times 2}

Invoking the implicit symmetry of W_Si, V, and Ṽ:

tr [Φ_{i}^{- 1} (H B_{v} H^{T}) Φ_{i}^{- T}] = σ^{2} vec {[W_{Si} • (V + \tilde{V})]}^{T} vec (B_{v})

This result can be folded back into (34) to produce:

〈 B_{Si} 〉 = \frac{1}{p^{2}} [tr (Γ_{i}^{- 1})] + \frac{σ^{2}}{p^{2}} tr [W_{Si} • (V + \tilde{V}) B_{v}]

which is identical to equation (38).

Acknowledgments

Thanks to R. Martin for gently suggesting this course of research by including it in my qualifying exam, to M. Oxley for introducing the Kronecker product, and to S. Cain for his guidance and perpetual encouragement. Calibration data for the HST imagery was graciously provided by M. Kishimoto. Some of the data presented in this paper were obtained from the Multimission Archive at the Space Telescope Science Institute (MAST). STScI is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS5-26555. The opinions and views expressed by the author are not necessarily those of the United States Air Force or the Department of Defense.

References and links

1. M. Kishimoto, L. E. Kay, R. Antonucci, T. W. Hurt, R. D. Cohen, and J. H. Krolik, “Ultraviolet Imaging Polarimetry of the Seyfert 2 Galaxy Markarian 3,” apj 565, 155–162 (2002).

2. W. G. Egan, “Polarization in remote sensing,” in Polarization and remote sensing, W. G. Egan, ed., Proc. SPIE 1747, 2–48 (1992).

3. S. Lin, K. Yemelyanov, E. Pugh Jr, and N. Engheta, “Separation and contrast enhancement of overlapping cast shadow components using polarization,” Opt. Express 14, 7099–7108 (2006). [CrossRef] [PubMed]

4. C. M. Persons, D. B. Chenault, M. W. Jones, K. D. Spradley, M. G. Gulley, and C. A. Farlow, “Automated registration of polarimetric imagery using Fourier transform techniques,” in Polarization Measurement, Analysis, and Applications V., Goldstein, Dennis H.; Chenault, David B., eds., Proc. SPIE 4819, 107–117 (2002). [CrossRef]

5. X. Wang, S. Yang, J. Ma, and Y. Qiao, “Automated registration of polarimetric image using wavelet transform techniques,” vol. 5832, pp. 695–702 (SPIE, 2005).

6. D. A. LeMaster, “A Comparison of Template Matching Registration Methods for Polarimetric Imagery,” in Aerospace Conference, 2008 IEEE, vol. 1, pp. 1–9 (2008).

7. S. Guyot, M. Anastasiadou, E. Deléchelle, and A. De Martino, “Registration scheme suitable to Mueller matrix imaging for biomedical applications,” Opt. Express 15, 7393–7400(2007). [CrossRef] [PubMed]

8. S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory (Prentice Hall, Englewood Cliffs, New Jersey, 1993).

9. D. Robinson and P. Milanfar, “Fundamental performance limits in image registration.” IEEE Trans. Image Process 13, 1185–1199 (2004). [CrossRef] [PubMed]

10. A. Yetik and I.S. Nehorai, “Performance bounds on image registration,” IEEE Trans. Signal Process 54, 1737–1749 (May 2006). [CrossRef]

11. B. Zitova and J. Flusser, “Image registration methods: a survey,” Image and Vision Computing 21, 977–1000 (2003). [CrossRef]

12. L. L. Scharf, Statistical Signal Processing: detection, estimation, and time series analysis (Addison-Wesley, Reading, Massachusetts, 1991).

13. L. L. Scharf and L. T. McWhorter, “Geometry of the Cramer-Rao bound” Signal Processing 31, 301–311 (1993). [CrossRef]

14. H. Van Trees, Detection, estimation, and modulation theory. Part 1: detection, estimation, and linear modulation theory (Wiley, New York, 2001). [CrossRef]

15. J. Tyo, “Design of optimal polarimeters: maximization of signal-to-noise ratio and minimization of systematic error,” Appl. Opt 41, 619–630 (2002). [CrossRef] [PubMed]

16. J. Tyo, “Optimum linear combination strategy for an N-channel polarization-sensitive imaging or vision system,” J. Opt. Soc. Am. A 15, 359–366 (1998). [CrossRef]

17. E. Collett, Polarized Light: Fundamentals and Applications (Marcel Dekker, Inc., New York, 1992).

18. M. Healy, Matrices for Statistics (Oxford University Press, USA, 1986).

19. A. Graham, Kronecker Products and Matrix Calculus With Applications. (Wiley, New York, 1982).

Parameter	3-channel case	4-channel case
S ₀	$\frac{1}{3} σ^{2}$	$\frac{1}{4} σ^{2}$
S ₁	$\frac{2}{3} σ^{2}$	$\frac{1}{2} σ^{2}$
S ₂	$\frac{2}{3} σ^{2}$	$\frac{1}{2} σ^{2}$

Fundamental estimation bounds for polarimetric imagery

Abstract

1. Introduction

1.1. Stokes vector formulation

1.2. A quick example

2. Bound definition and data model

2.1. Definition of the Cramer-Rao bound

2.2. Data model

3. Fisher information for the joint estimator

4. Bound derivation

4.1. Block matrix inversion

4.2. Simplified registration parameter bound

4.3. Simplified Stokes parameter bound

5. Biased estimators

6. Example bound calculations

6.1. Bounds on polarization insensitive imagery

6.2. The four channel, three Stokes case

6.3. The three channel, three Stokes case

7. Conclusion

A. Appendix: More on the V and Ṽ matrices

B. Appendix: Detailed derivation of the Stokes parameter bound

Acknowledgments

References and links

Cited By

Figures (6)

Tables (1)

Equations (73)

Optics Express