A fast, robust pattern recognition system for low light level image registration and its application to retinal imaging

A. R. Wade; F. W. Fitzke

doi:10.1364/OE.3.000190

1. Introduction

Much of our understanding of the underlying mechanisms of retinal abnormalities has come from investigations of post-mortem material. These have revealed details about the basis of pathologic changes that lead to loss of vision. However, post-mortem investigations are limited in providing information about the functional consequences of pathologies and it is difficult to determine the natural history of change or the effects of interventions. Two recent advances in ophthalmoscopic imaging based on new technological developments allow investigations that can be carried out in the living human eye and can contribute to our understanding of pathologies in a complementary way to the post-mortem studies.

One technique takes advantage of the intrinsic autofluorescence of lipofuscin. This material accumulates in the retinal pigment epithelium and has been implicated in the pathologies associated with hereditary dystrophies and age related macular degeneration. Spectrophotometric measurements in the living eye have indicated that a corresponding autofluorescence can be detected [1,2]. We have developed a method using a confocal laser scanning ophthalmoscope for low light level Imaging Fundus Autofluorescence (IFA). This technique allows non-invasive, high spatial resolution imaging of the distribution of fundus autofluorescence in vivo where the confocal optics effectively isolate the depth contribution of the autofluorescence with a resolution of around 200 microns.

A second technique [3] is based upon an optical attachment to the SLO that narrows the scan from 20 degrees to 2.5 degrees and so provides a very high magnification view of a small portion of the retina. At eccentricities of approx. 2 degrees from the fovea, the cone photoreceptor mosaic can be resolved and we hope to use this technique to monitor photoreceptor health in patients with a variety of degenerative retinal diseases.

The SNR of individual cSLO autofluorescence or high-magnification images is very low. To improve image quality, sequences of images are averaged to reduce noise and make visible the spatial structure present in the image. This requires each image in the sequence to be aligned or registered with the others to compensate for eye movements. Alignment is generally necessary for IFA imaging and is an absolute requirement for photoreceptor imaging where the effects of small eye movements are far more significant. Although several techniques are available for aligning images in this way, their effectiveness can be limited due to the high levels of noise present in the images combined with the lack of high-contrast structure. For these reasons most image registration algorithms are either not effective in correcting for eye movements or operate too slowly to be of routine use.

In this paper, we describe an image alignment system that we have developed specifically for noisy images. The system processes high-resolution SVHS video images at a rate of up to 70 frames per minute and has proven to be robust and tolerant of high noise levels in the raw image frames.

2. Materials and methods

Our high-magnification imaging method has been described previously [3]. For autofluorescence imaging, the patients’ pupils were dilated using 1% Tropicamide and imaged using the 488nm Argon laser of the Zeiss prototype cSLO with a scan angle of 40°. Images were recorded on a S-VHS video system and digitised in groups of 32 frames using a real-time frame grabber (Pulsar, Matrox Electronic Imaging Inc., Montreal, Canada) with an intensity resolution of 10 bits/pixel. Reflectance images of the region of interest were recorded first and a 521nm low-pass filter was then placed in the path of the photodetector to eliminate directly reflected light and pass only the light emitted by the auto-fluorescent substances in the retina. Images obtained with this filter in place are referred to here as autofluorescence (AF) image. All image processing software was written in-house using the Matrox Image processing Library (MIL) supplied with the digitising board and runs under Windows NT4 on a desktop Pentium PC at 133MHz. No additional image processing hardware was used.

The main stages in the image acquisition and processing procedure are summarised in Fig.1.

Fig. 1. Stages in image averaging procedure

Download Full Size | PDF

2.1 Image alignment and averaging

A wide variety of techniques for image alignment exist. [4] Many of them involve finding peaks in the cross-correlation between representation of two images, either in the spatial domain or in some transform that isolates components such as rotation and scaling. To date, most of these techniques have proved to be computationally expensive, although rapid image alignment techniques based upon them will soon fall into the domain of high-end workstations or desktop PCs using dedicated signal-processing hardware. In addition, registering very noisy images is a hard task because of the difficulty in accurately locating reference or ‘landmark’ features from one image to the next and algorithms designed for use with low-noise images often fail in these circumstances.

2.1.1 Pattern matching algorithm

The pattern-matching algorithm we use is part of the Matrox Imaging Library (Matrox Electronic Imaging, Quebec, Ca.) It is a variation of the well-known normalized cross-correlation peak detection algorithm which includes a hierarchical search strategy that dramatically increases its speed of operation.

At its core, a spatial cross-correlation pattern-detector that is searching for a model M in an image I must compute

r = \sum_{x = 0}^{X} \sum_{y = 0}^{Y} M (x, y) I (p - x, q - y)

Where X and Y are the height and width of the model and (p,q) is a position in the image I. This function is evaluated at all locations in I (width P, height Q) and the location which yields the maximum value of r is deemed to be the location of the model in the image. In order to eliminate the effect of linear intensity changes (offset and gain) in the image and model, the MIL algorithm evaluates a slightly more complicated normalised correlation function

r = \frac{X Y \sum I M - \sum I \sum M}{\sqrt{[XY \sum I^{2} - {(\sum I)}^{2}] [XY \sum M^{2} - {(\sum M)}^{2}]}}

With all summations being taken over X and Y in the model and P and Q in the image. In addition, to increase calculation speed the value of r² rather than r is calculated.

For video-sized images, say 768x576 pixels, and a 200x200 pixel model, the calculation of r at every location presents a formidable computational task (of the order of 20 billion operations per image). The MIL algorithm derives its speed from its iterative, hierarchical search strategy. Low-resolution versions of the image and model are generated and the cross correlation is calculated to identify regions likely to contain matches. The resolution is then doubled and the search is performed again, concentrating only on those regions identified in the previous stage. Because halving the horizontal and vertical resolution reduces the search time by a factor of 16, the early stages of the search, which are performed at a resolution of perhaps 1/16 that of the original image, are extremely fast and the entire matching operation takes less than 100ms on our 133MHz Pentium PC. The danger in this approach is that true matches for small or finely structured models may be overlooked in the low-resolution stage of the search. However the algorithm is adaptive in this respect and adjusts its start resolution to take account of the size of the model and characteristics of the image in which it lies. In practice, with both IFA and high-magnification images we have found that the spatial structure of retinal images is such that large, unique features such as major blood vessels, the optic disk and the fovea provide excellent models for the pattern-matching algorithm to work with and searches almost invariably begin at the lowest resolution.

Two further refinements of the MIL algorithm are notable. Firstly, it contains an option to pre-analyse the likely shape of cross-correlation peaks in all images of a given type. This calculation takes approximately 200ms and needs to be performed only at the start of the alignment procedure when the model is allocated. If the likely shape of the correlation peaks is found to be several pixels wide (as is often the case in natural images), the search algorithm may use this information to restrict its search space once it finds a region which lies on the slope of a peak. Secondly, once a peak is found at single-pixel resolution, the algorithm can fit a curve to the peak shape and estimate a true position of the peak that will lie at some non-integer pixel value. The accuracy of this sub-pixel pattern-location depends on the amount of noise in the image and the feature can be disabled when dealing with noisy images.

2.1.2 Image pre-processing

A typical image from a high-magnification imaging session is shown in Fig. 2.

Fig. 2. Raw SLO video frame

Download Full Size | PDF

Raw images such as Fig. 2 contain extremely high levels of noise: a combination of normally-distributed electronic and thermal noise from the SLO photo-diode and Poisson-distributed photon noise. When presented with images of this type, the MIL algorithm fails, either finding false matches to the model or failing to find a match at all. One solution to this problem is to low pass filter the images to attenuate the high spatial frequencies where most of the noise is concentrated.

Our image alignment program begins each search stage by making copies of the frame to be aligned and convoluting it in the spatial domain with a broad (12x12 pixel) uniform averaging function. Smaller convolution kernels can be used for images with a strong signal or prominent landmark features but we have found that the 12x12 kernel guarantees a successful pattern matching operation in nearly all the image sequences so far analysed. Performing the convolution in the Fourier domain would result in a small speed increase and we intend to implement this in future versions of the program. The low-pass filtered copies are passed to the MIL algorithm whilst the actual alignment is carried out on the original frames. In this way, high-frequency information which may be present in the original raw frames is retained in the final, averaged image whilst the noise is attenuated for the purposes of pattern matching. Most of the execution time of the program is spent performing the blurring convolutions. Registration of standard, reflectance SLO images is much faster as these contain far less noise and require no low-pass filtering.

2.1.3 Model allocation

The first stage in the alignment process is the allocation of a model. A model can either be defined by the operator moving a selection box over a suitable region of the reference image or else the program can automatically allocate a model by using a MIL library routine to search the reference image for what it considers to be an distinctive pattern. Models are defined in the low-pass filtered versions of the original frames. The optimum size of the model region depends on the size of the potential ‘landmark’ features but it also influences the speed of the search. Small models set a limit on the minimum resolution at which the search algorithm can start and for this reason large models tend to be found faster than small ones.

We have found that a model size of 200x200 pixels works well with our 40° reflectance fundus images (where, for example, the pigmented macular region is around 150 pixels across) and this is the size of model we use for auto-fluorescence images. With high-magnification images, the choice of landmark features is usually more limited and the model size can be selected by the operator. Occasionally the operator will choose a model which cannot be detected in subsequent images; usually because the subject’s eye moves so far in a drift or saccade that the field of view no longer contains the original model region. For this reason, it is essential that the operator monitors the alignment process until it has finished.

2.1.4 Localised search

To gain a further increase in speed, the model search is split into two stages. Firstly, a small region of the video frame centered on the last-known position of the model is extracted and then a search is made for the model in this area. If no match is found, the entire frame is used. This strategy eliminates the need to search within the whole frame except in rare cases where a large translational movement occurs between frames. A significant improvement in speed results from this method - not only is the search algorithm faster within the small regions but the time taken to low-pass filter these regions is much reduced. Once the model is found in the target image, its position is compared with the original position of the model in the reference image and the program calculates the amount of translation required. The translated, unfiltered image is added to an accumulator buffer and the whole process is repeated until all of the images have been registered. The data in the accumulator buffer is then retrieved and scaled to produce the averaged image.

2.1.5 Rotation

Our method assumes that only translational changes have occurred between the reference and target images. Cross-correlation peak detection in the spatial domain can fail when a rotated version of the original image is presented. The MIL routine tolerates between 10 and 12 degrees of rotation when the model has no obvious rotational symmetry. This has been found to be sufficient to ensure the pattern matching procedure works with all subjects encountered so far. Even greater tolerance to rotation can be achieved if a model with some rotational symmetry (for example, the fovea in an IFA image) is used. The MIL library contains additional routines to find patterns with an unknown amount of rotation as well as translation. However, we have not found that rotational compensation significantly improves the quality of averaged images and the images shown here were aligned using translational displacements only.

3. Results

3.1 Alignment Accuracy

We have tested our alignment routine on sample images with varying amounts of added noise to gauge its accuracy. The results are shown in figure 3. It can be seen that decreasing the signal to noise ratio (SNR), where

SNR = 10 \log_{10} (\frac{σ^{2} (signal)}{σ^{2} (noise)})

leads to an increase in alignment errors. However, the algorithm continues to operate even at extremely low SNRs and at the SNR encountered in a typical autofluorescence image (-15dB) the mean positional error is less than 0.3 pixels. Moreover, increasing the size of the blurring kernel (or equivalently, lowering the low pass filter cutoff frequency) causes only a slight decrease in accuracy. The results shown here are for images with normally-distributed noise added but we have obtained similar results for images with Poisson-distributed noise.

Fig. 3. Effects of increasing blurring kernel size and noise levels on image alignment accuracy. An image set with SNR of -14dB was used to test the effect of blurring kernel size.

Download Full Size | PDF

3.2 Image alignment examples

Fig. 4. Examples of processed images. Top - High-magnification cone photoreceptor imaging. Bottom - IFA imaging

Download Full Size | PDF

Figure 4 illustrates the effectiveness of the averaging procedure. Raw IFA and high-magnification images are shown along with their averaged counterparts. Little or no detail is visible in the raw images but in averaged IFA images discrete regions of high autofluorescence are clearly defined and in averaged high-magnification images individual cone photoreceptors can be resolved.

3.3 Processing time

The most significant effect of automating the image alignment process is the reduction in processing time compared to manual alignment. Manual alignment of 32 IFA frames takes an experienced operator at least 15 minutes and in a clinical situation, time constraints effectively limit the maximum size of the image sequence which it is practical to deal with. Using a Pentium 133MHz processor, the automated system takes approximately 30 seconds to align 32 images held in RAM and the limit to sequence length is determined by available disk storage space. We have successfully aligned sequences of 256 images in under 5 minutes. Photoreceptor imaging requires the alignment of at least 64 frames to obtain good quality images and this would not have been possible without the development of an automatic alignment system. Since the processing time is almost entirely dependent upon CPU speed, upgrading to faster CPU clock speeds brings a proportional increase in alignment speed. Tests with a 400 MHz Pentium have shown a three-fold increase in speed as expected.

4. Discussion

High-magnification and IFA imaging promise to be a powerful tools in the study of a wide range of eye diseases. IFA imaging to date has been limited by the time taken to produce good quality images using the technique of noise reduction by signal averaging. High-magnification photoreceptor imaging is a new technique where the noise levels in individual frames are so high that manual alignment of the required number of frames is impractical. The algorithm described above has enabled us to develop a fast, robust image registration system. The total time taken to generate averaged images from video sequences has been reduced by at least a factor of 20 and work is underway to make this process faster still. In addition to speeding up the process of IFA imaging, the system has enabled us to perform high-magnification photoreceptor imaging where previously the alignment time would have prohibited it.

There are obvious applications to other low-intensity fundus imaging systems such as ICG and fluorescein imaging. It is worth noting that the system spends 90% of its time on low-pass filtering noisy images and the whole-frame pattern search itself takes less than 100ms. When presented with standard reflectance SLO images with little noise, the same pattern-matching algorithm serves as a fast, accurate, global fixation tracking system that currently operates at the rate of approximately 10 frames per second on a Pentium 133MHz system and >25 frames per second on a Pentium 400MHz machine. We are developing a real-time image stabilisation and fixation-tracking system based upon this technology.

References

1. F. C. Delori, “Spectrophotometer for noninvasive measurement of intrinsic fluorescence and reflectance of the ocular fundus,” Appl. Opt. 33, 7439–7452 (1994). [CrossRef] [PubMed]

2. A. von Ruckman, F. W. Fitzke, and A. C. Bird, “Distribution of fundus autofluorescence with a scanning laser ophthalmoscope,” Br. J. Ophthal. 79, 407–412 (1995). [CrossRef]

3. A. R. Wade and F. W. Fitzke, “In-vivo imaging of the human cone photoreceptor mosaic using a confocal LSO,” Lasers and Light in Ophthalmology 1998. (In Press).

4. L. G. Brown, “A survey of image registration techniques,” Computing Surveys 24, 325–376 (1992). [CrossRef]

A fast, robust pattern recognition system for low light level image registration and its application to retinal imaging

Abstract

1. Introduction

2. Materials and methods

2.1 Image alignment and averaging

2.1.1 Pattern matching algorithm

2.1.2 Image pre-processing

2.1.3 Model allocation

2.1.4 Localised search

2.1.5 Rotation

3. Results

3.1 Alignment Accuracy

3.2 Image alignment examples

3.3 Processing time

4. Discussion

References

Cited By

Figures (4)

Equations (3)

Optics Express