An Ultra Fast Kolmogorov Phase Screen Generator Suitable For Parallel Implementation

Open Access

Abstract

Modelling phase fluctuations due to Kolmogorov turbulence is important in many areas of applied optics, such as simulating adaptive optics configurations, predicting the performance of laser designators and simulating infrared (IR) scenes in the presence of atmospheric turbulence. The computational performance of algorithms implementing this model is an important issue because many situations require a large number of phase screens. For example, in IR scene simulation a different phase screen is required for each pixel in the scene, and in other situations many thousands of phase screens must be calculated to obtain a statistical average. Whilst there have been previous attempts to increase the computational speed of these algorithms, the computation time required for a large number of phase screens remains an issue. In this paper, we apply the linearity of convolution and statistical properties of sums of Gaussian random numbers to improve the performance of the previous best published algorithm by a factor of 60 when implemented in software on a sequential processor. Because the new algorithm is trivially parallelizable, a further factor-of-20 speedup can easily be achieved through a parallel software or hardware implementation.

©2007 Optical Society of America

1. Background

A phase screen generator has been defined as a

“computer program which creates random arrays of phase values on a grid of sample points which have the same statistics as the [wavefront phases distorted due to atmospheric] turbulence …” [1]

Factorization methods of phase screen generation ([2], p. 1428 and [1], p. 105) have computational complexities that increase exponentially with the size of the screen. For this reason more computationally efficient algorithms have been developed. A popular method is that of Harding et al. [3]. It begins with the factorization method to generate a coarse phase screen and then improves the resolution of that screen by interpolating between the points of the base screen. The interpolation step has only linear complexity in the size of the final screen. However, even the method of Harding et al. is computationally expensive. On a Pentium 4 (2.8 GHz) dual core processor running Matlab version 7 (Release 14), this method takes 20 milliseconds to generate a 200×200 phase screen, with the interpolation step accounting for 90% of the execution time. Some applications require an enormous number of phase screens. For example, in the simulation of IR scenes [4] a phase screen is required for each pixel in the simulated image, so for a modest image of 100,000 pixels it would take 2,000 seconds to generate all the phase screens required. There is therefore strong motivation to improve the computational performance of phase screen generation.

In this paper we describe a transformation of the interpolation step in Harding’s method that reduces the software execution time by a factor of 60 and opens the way for a further 20 times acceleration with the use of parallel hardware. The paper is organized as follows. In the next section we briefly review the factorization method for generating a coarse phase screen and the interpolation step which is the key to the speedup of Harding’s algorithm. In section 3 we describe the transformations that we have applied to simplify the computation of the interpolation step. In section 4 we compare the structure function generated using our new method with that obtained by Harding’s method and the ideal structure function.

2. Phase screen generation by factorization and interpolation

The factorization method begins with the theoretical structure function under the Kolmogorov model of turbulence. From this a covariance function is constructed ([5], p. 1773). The covariance function of the phase ψ(r) is sampled to produce an N × N array of values, which is then rearranged as a vector Φ of length N². The product C_S = ΦΦ^T, an N² × N² array, is treated as the covariance for a subsequent Karhunen-Loève decomposition. The matrix is factorized by Singular Value Decomposition

$$ C_S = U \Lambda U^T, $$

to yield a set of N² eigenvectors U = {U_1, U_2, …, U_{N²}} which serve as orthogonal basis functions. The corresponding eigenvalues λ_i are the variances associated with the U_i. The square roots of these eigenvalues are then multiplied by uncorrelated Gaussian random numbers z_i, with zero mean and unit variance, to produce the weights x_i = √λ_i z_i on the U_i. The phase screen is then

$$ \hat{\Phi}_l = U \tilde{x}, $$

with the property that

$$ E[\hat{\Phi}_l \hat{\Phi}_l^T] = E[U \tilde{x} \tilde{x}^T U^T] = U\, E[\tilde{x} \tilde{x}^T]\, U^T = U \Lambda U^T = C_S. $$
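To make the factorization step concrete, the following Python/NumPy sketch (our illustration, not the authors' code) generates one coarse N × N screen from a precomputed covariance matrix. Building the sampled Kolmogorov covariance matrix `cov` from the structure function is assumed and not shown here.

```python
import numpy as np

def factorization_phase_screen(cov, rng=None):
    """Generate one coarse N x N phase screen from an N^2 x N^2 covariance
    matrix C_S via eigen-decomposition (Karhunen-Loeve synthesis)."""
    rng = np.random.default_rng() if rng is None else rng
    n2 = cov.shape[0]
    n = int(round(np.sqrt(n2)))
    # Symmetric factorization C_S = U diag(lambda) U^T
    lam, U = np.linalg.eigh(cov)
    lam = np.clip(lam, 0.0, None)        # guard against tiny negative eigenvalues
    # Weights x_i = sqrt(lambda_i) * z_i with z_i ~ N(0, 1)
    z = rng.standard_normal(n2)
    x = np.sqrt(lam) * z
    # Phase screen vector Phi_hat = U x, reshaped back onto the N x N grid
    return (U @ x).reshape(n, n)
```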

The interpolation step of Harding’s algorithm starts by adding intermediate points to the coarse phase screen and setting them to zero. We denote this upsampled, zero-filled phase screen as Φ_h0. The correct statistical properties are restored to Φ_h0 in two stages. In the first stage, Φ_h0 is convolved with an interpolator matrix, and to the output of this convolution is added a Gaussian random number with zero mean and unit variance multiplied by a constant residual. These operations are described in Eq. 4:

$$ \Phi_{h1} = \Phi_{h0} \otimes I_{4x4} + \varepsilon_{res1} \times x_1, $$

where I_4x4 is the interpolator matrix, x_1 is a matrix of Gaussian random numbers with zero mean and unit variance, and ε_res1 is a residual matrix. The term ε_1 = ε_res1 × x_1 is the same as the term ε_i in Eq. 18 of Harding et al. [3], which they refer to as “a random displacement epsilon”. The constant part of this quantity is ε_res1 and the random part is x_1. The value of I_4x4 is given in Eq. 27 of [3]. Note also that, since all elements in Eq. 4 are matrices, we have dropped the bold nomenclature for matrices.

In the second stage the convolution and addition are repeated:

$$ \Phi_{h2} = \Phi_{h1} \otimes I_{4x4rotated} + \varepsilon_{res2} \times x_2, $$

where x_2 is a second matrix of Gaussian random numbers with zero mean and unit variance, and ε_res2 is another residual matrix. The residual matrices are constant; the derivations of the two residual matrices are given in sections 3 and 4 of [3] (p. 2163), respectively. I_4x4rotated, which is simply the I_4x4 matrix rotated by 45°, is:

$$ I_{4x4rotated} = \begin{bmatrix}
0 & 0 & 0 & 0.0017 & 0 & 0 & 0 \\
0 & 0 & 0.0341 & 0 & 0.0341 & 0 & 0 \\
0 & 0.0341 & 0 & 0.3198 & 0 & 0.0341 & 0 \\
0.0017 & 0 & 0.3198 & 1 & 0.3198 & 0 & 0.0017 \\
0 & 0.0341 & 0 & 0.3198 & 0 & 0.0341 & 0 \\
0 & 0 & 0.0341 & 0 & 0.0341 & 0 & 0 \\
0 & 0 & 0 & 0.0017 & 0 & 0 & 0
\end{bmatrix}. $$
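For concreteness, here is a minimal Python sketch of the two-stage interpolation of Eqs. 4 and 5. It is our illustration rather than the authors' code: the zero-insertion upsampling geometry, the `mode='same'` boundary handling, and the treatment of the residual matrices as elementwise scalings are assumptions, and the kernel and residual values must be taken from Harding et al. [3].

```python
import numpy as np
from scipy.signal import convolve2d

def harding_interpolation(coarse, I_4x4, I_4x4_rot, eps_res1, eps_res2, rng=None):
    """One refinement pass of Harding's interpolation step (Eqs. 4 and 5)."""
    rng = np.random.default_rng() if rng is None else rng
    n = coarse.shape[0]
    # Upsample by inserting zeros between the coarse samples (Phi_h0)
    phi_h0 = np.zeros((2 * n - 1, 2 * n - 1))
    phi_h0[::2, ::2] = coarse
    # Stage 1 (Eq. 4): convolve with I_4x4 and add residual-scaled Gaussian noise
    x1 = rng.standard_normal(phi_h0.shape)
    phi_h1 = convolve2d(phi_h0, I_4x4, mode='same') + eps_res1 * x1
    # Stage 2 (Eq. 5): repeat with the rotated interpolator
    x2 = rng.standard_normal(phi_h1.shape)
    phi_h2 = convolve2d(phi_h1, I_4x4_rot, mode='same') + eps_res2 * x2
    return phi_h2
```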

3. Transformed algorithm

In this section we show how a well-known property of the convolution operator and properties of weighted sums of Gaussian random numbers can be applied to simplify the computation of the interpolation step of Harding’s algorithm.

We first combine Eq. 4 and Eq. 5:

$$ \Phi_{h2} = \left( \Phi_{h0} \otimes I_{4x4} + \varepsilon_{res1} \times x_1 \right) \otimes I_{4x4rotated} + \varepsilon_{res2} \times x_2 $$

We now note that convolution, being a linear operation, distributes over addition [6]. So Eq. 6 can be rearranged as

$$ \Phi_{h2} = \left( \Phi_{h0} \otimes I_{4x4} \right) \otimes I_{4x4rotated} + \varepsilon_{res1} \left( x_1 \otimes I_{4x4rotated} \right) + \varepsilon_{res2} \times x_2 $$

In Eq. 7, the first term on the right-hand side need only be computed once, at the same time as the original coarse phase screen Φ_h0 is created. The term x_1 ⊗ I_4x4rotated might appear to require a complete convolution over the whole refined phase screen, but this is not the case.

Recall that if y is a weighted sum of uncorrelated Gaussian random numbers g_i, each with mean μ_gi and variance σ_gi²,

$$ y = \sum_{i=1}^{n} \alpha_i g_i, $$

then the mean of y (see [7]) is given by

$$ \mu_y = \sum_{i=1}^{n} \alpha_i \mu_{g_i} $$

and the variance of y (see [7]) is given by

$$ \sigma_y^2 = \sum_{i=1}^{n} \alpha_i^2 \sigma_{g_i}^2. $$

The convolution x_1 ⊗ I_4x4rotated can be expanded and expressed as a weighted sum of uncorrelated Gaussian random numbers with zero mean and unit variance. To explain, we represent the convolution kernel I_4x4rotated as a sequence of coefficients α_i, where i ranges from 1 to 49, and examine the single output element x_1(4,4) ⊗ α of the overall convolution x_1 ⊗ I_4x4rotated:

$$ x_1(4,4) \otimes \alpha = x_1(1,1)\,\alpha_1 + x_1(1,2)\,\alpha_2 + x_1(1,3)\,\alpha_3 + \dots + x_1(7,7)\,\alpha_{49} $$

This weighted sum, as explained in Eq. 9 above, can be replaced with a single Gaussian random number with a mean given by $\sum_{i=1}^{n} \alpha_i \mu_i$ and a variance given by $\sum_{i=1}^{n} \alpha_i^2 \sigma_i^2$. Since the Gaussian random numbers in the weighted sum have zero mean, the new Gaussian random number also has zero mean. The Gaussian random numbers in the weighted sum have unit variance, so the variance of the new Gaussian random number is the sum of the squares of the convolution weights, i.e. $\sum_{i=1}^{49} I_{4x4rotated_i}^2$. Thus x_1 ⊗ I_4x4rotated may be replaced by a matrix x_3 of Gaussian random numbers with zero mean and a variance of $\sum_{i=1}^{49} I_{4x4rotated_i}^2$; a short numerical check of this variance argument is sketched after Eq. 11 below. Therefore Eq. 7 now reduces to

$$ \Phi_{h2} = \left( \Phi_{h0} \otimes I_{4x4} \right) \otimes I_{4x4rotated} + \varepsilon_{res1} \times x_3 + \varepsilon_{res2} \times x_2 $$
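As a quick numerical check of the variance argument above (our sketch, not part of the paper), the following Python snippet convolves unit-variance white noise with an arbitrary illustrative kernel standing in for I_4x4rotated, and compares the empirical variance of one interior output sample with the sum of the squared kernel weights.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
# Arbitrary illustrative kernel (values are NOT those of I_4x4rotated).
kernel = np.array([[0.0, 0.1, 0.0],
                   [0.1, 1.0, 0.1],
                   [0.0, 0.1, 0.0]])

# Convolve many independent unit-variance white-noise fields and record the
# same interior output sample each time.
samples = [convolve2d(rng.standard_normal((9, 9)), kernel, mode='same')[4, 4]
           for _ in range(20000)]

print(np.var(samples))       # empirical variance of the convolved sample
print(np.sum(kernel ** 2))   # predicted variance: sum of squared kernel weights
```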

In Eq. 11, further performance gains can be achieved by combining the remaining two matrices of Gaussian random numbers, ε_res1 × x_3 and ε_res2 × x_2. The weighted sum of two Gaussian random numbers is another Gaussian random number with a mean equal to the weighted sum of the two means and a variance equal to the sum of the variances of the two weighted terms [7]. Thus the last two terms of Eq. 11 can be replaced by a single matrix x of Gaussian random numbers with zero mean and variance $(\varepsilon_{res1})^2 \sum_{i=1}^{49} I_{4x4rotated_i}^2 + (\varepsilon_{res2})^2$. The new equation for phase screen generation is

$$ \Phi_{h2} = \left( \Phi_{h0} \otimes I_{4x4} \right) \otimes I_{4x4rotated} + x. $$

Not only have we saved on convolution operations for every phase screen, but we now need to generate only one set of Gaussian random numbers for each point on the final phase screen, as opposed to two per point in the previous algorithm.
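A corresponding Python sketch of the transformed interpolation step (the final equation above) is given below, again as our illustration under the same assumptions as the earlier sketch; the residual matrices are treated as scalars or elementwise arrays.

```python
import numpy as np
from scipy.signal import convolve2d

def transformed_interpolation(coarse, I_4x4, I_4x4_rot, eps_res1, eps_res2, rng=None):
    """Transformed interpolation: one double convolution plus a single
    combined Gaussian noise field, replacing the two noisy stages."""
    rng = np.random.default_rng() if rng is None else rng
    n = coarse.shape[0]
    phi_h0 = np.zeros((2 * n - 1, 2 * n - 1))
    phi_h0[::2, ::2] = coarse                            # zero-filled upsampling
    # Deterministic part: (Phi_h0 * I_4x4) * I_4x4rotated
    smooth = convolve2d(convolve2d(phi_h0, I_4x4, mode='same'),
                        I_4x4_rot, mode='same')
    # Combined noise: zero mean, variance eps1^2 * sum(I_rot^2) + eps2^2
    var = eps_res1 ** 2 * np.sum(I_4x4_rot ** 2) + eps_res2 ** 2
    x = np.sqrt(var) * rng.standard_normal(smooth.shape)
    return smooth + x
```

Only one Gaussian field is drawn per output point, and the double convolution of the deterministic first term is the only convolution that remains.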

4. Validation

An established means of testing the accuracy of generated phase screens is to compare the structure function derived from the generated screens with the ideal structure function [3]. The phase structure function is given by

$$ D_{\Phi}(r) = 2\left[ B_{\Phi}(0) - B_{\Phi}(r) \right], $$

where

$$ B_{\Phi}(r) = \left\langle \Phi(x)\, \Phi(x - r) \right\rangle $$

is the covariance. The ideal structure function is

$$ D_{\Phi_{ideal}}(r) = 6.88\, r^{5/3}, \qquad r = \frac{r_{actual}}{r_0}, \quad \text{[3]} $$

where r_actual is the actual displacement between points in the phase screen and r_0 is the Fried parameter [2].
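As an illustration of how such a check might be coded (our sketch, not code from the paper), the following Python function estimates the structure function along one axis from an ensemble of generated screens; the ensemble array and the grid spacing in units of r_0 in the usage comments are hypothetical inputs.

```python
import numpy as np

def structure_function_1d(screens, max_sep):
    """Estimate D(r) = <[phi(x) - phi(x + r)]^2> along one axis,
    averaged over an ensemble of screens of shape (n_screens, N, N)."""
    d = np.zeros(max_sep)
    for r in range(1, max_sep + 1):
        diffs = screens[:, :, r:] - screens[:, :, :-r]   # separation of r pixels
        d[r - 1] = np.mean(diffs ** 2)
    return d

# Hypothetical usage:
# screens = ...                    # (n_screens, N, N) ensemble from the generator
# dx_over_r0 = 0.1                 # assumed pixel spacing in units of r0
# d_sim = structure_function_1d(screens, 50)
# r = dx_over_r0 * np.arange(1, 51)
# d_ideal = 6.88 * r ** (5.0 / 3.0)
```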

In Fig. 1 we show a slice of the ideal phase structure function compared to the average of an ensemble of 10,000 structure functions computed using our transformed version of Harding’s algorithm. The lower line is a slice of the ideal function, while the top line is a slice of the structure function generated using our transformed algorithm. Because our simulated structure function is very close to the ideal, we show in Fig. 1(b) the difference between them as a fraction of the ideal function. We do not attribute the error in our simulated results to the transformations we have made to Harding’s algorithm; rather, we attribute the errors reported in Fig. 1(b) to approximations introduced in Harding’s original algorithm. Although Harding et al. do not present their results in the same form as Fig. 1(b), our simulations indicate that there is no significant difference between our transformed algorithm and Harding’s original algorithm.

Fig. 1. Comparison of the structure function derived from the transformed algorithm with the ideal function. The x axis is the distance between points in the phase screen, non-dimensionalized by the Fried parameter. (a) shows the ideal structure function (lower trace) compared with the simulated structure function (upper trace). (b) shows the relative error between the ideal and simulated structure functions as a fraction of the ideal. Note that the errors reported in (b) are attributed to the original approximations made by Harding et al. and not to the transformations reported in this paper.

The assertion that our transformed algorithm produces substantially the same results as the original Harding algorithm can be verified by comparing the variance of the computed structure functions with that reported by Harding et al. In Fig. 2 we show the error between the ideal function and an ensemble of 10,000 generated structure functions, expressed in terms of the standard deviation of the ensemble of structure functions. Fig. 2 shows that there is no substantial difference in the normalized error between our transformed algorithm and Harding’s original algorithm.

The standard deviation of the ensemble of structure functions computed is given by

$$ \sigma(r) = \left\{ \frac{1}{N-1} \sum_{i} \left[ D_{\phi}^{(i)}(r) - \frac{1}{N} \sum_{i} D_{\phi}^{(i)}(r) \right]^2 \right\}^{1/2} $$

and the error expressed in standard deviations is

$$ \mathrm{Error\ (standard\ deviations)} = \frac{ D_{\phi_{ideal}}(r) - \frac{1}{N} \sum_{i} D_{\phi}^{(i)}(r) }{ \sigma(r) }. $$
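A short sketch (ours, with hypothetical inputs) of this normalized error, given a per-screen array `d_sim` of structure-function slices and the ideal curve `d_ideal`:

```python
import numpy as np

def error_in_std_devs(d_sim, d_ideal):
    """d_sim: (n_screens, n_separations) per-screen structure functions;
    d_ideal: (n_separations,) ideal curve.  Returns the error at each
    separation in units of the ensemble standard deviation."""
    mean_d = d_sim.mean(axis=0)
    sigma = d_sim.std(axis=0, ddof=1)     # ensemble standard deviation, 1/(N-1)
    return (d_ideal - mean_d) / sigma
```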
Fig. 2. Number of standard deviations from the ideal structure function. (Lower trace) Harding’s original algorithm; (upper trace) transformed algorithm.

5. Conclusion

In this paper we have reduced the software computation time of Harding’s algorithm by a factor of 60 through a transformation of its key computationally intensive steps. The remaining speed limitation (random number generation in software) can be removed through the use of a parallel hardware (FPGA-based) random number generator [8]. For IR scene generation, if each image frame requires 100,000 phase screens of 200×200 resolution, the computation time per scene is reduced from 2,000 seconds to 33 seconds. The addition of parallel hardware random number generation reduces this further to 1.6 seconds.

References and links

1. M. C. Roggemann and B. Welsh, Imaging Through Turbulence (CRC Press, Boca Raton, Fla., 1996).

2. D. L. Fried, “Statistics of a Geometric Representation of Wavefront Distortion,” J. Opt. Soc. Am. 55, 1427–1435 (1965).

3. C. M. Harding, R. A. Johnston, and R. G. Lane, “Fast Simulation of a Kolmogorov Phase Screen,” Appl. Opt. 38, 2161–2170 (1999).

4. V. Sriram and D. Kearney, “High Speed High Fidelity Infrared Scene Simulation using Reconfigurable Computing,” in Proc. IEEE International Conference on Field Programmable Logic and Applications (FPL), Spain, 2006 (IEEE Press).

5. E. P. Wallner, “Optimal wave-front correction using slope measurements,” J. Opt. Soc. Am. 73, 1771–1776 (1983).

6. W. Rugh, “Linear Time-Invariant Systems and Convolution: An Interactive Lecture,” http://www.jhu.edu/signals/lecture1/main.html#spot2.

7. E. Weisstein, “Normal Sum Distribution,” from MathWorld - A Wolfram Web Resource, http://mathworld.wolfram.com/NormalSumDistribution.html.

8. V. Sriram and D. Kearney, “A high throughput area time efficient uniform random number generator based on the TT800 algorithm,” in Proc. IEEE International Conference on Field Programmable Logic and Applications (FPL), Amsterdam, Netherlands, August 2007 (IEEE Press).
