Localization-based super resolution microscopy holds superior performances in live cell imaging, but its widespread use is thus far mainly hindered by the slow image analysis speed. Here we show a powerful image analysis method based on the combination of the maximum likelihood algorithm and a Graphics Processing Unit (GPU). Results indicate that our method is fast enough for real-time processing of experimental images even from fast EMCCD cameras working at full frame rate without compromising localization precision or field of view. This newly developed method is also capable of revealing movements from the images immediately after data acquisition, which is of great benefit to live cell imaging.
©2010 Optical Society of America
Fluorescence microscopy offers a sensitive, noninvasive approach to probe cells. With the advent of fluorescence labeling technologies and advanced fluorescence imaging techniques, it is now possible to gain deep insights into dynamic processes in living cells with high spatial and temporal resolution [1,2]. In particular, innovative super resolution imaging techniques, which are mainly derived from fluorescence microscopy, have revolutionized biological imaging by providing the capability to visualize the distribution, transport, and interactions of individual molecules in living cells with unprecedented spatial resolution. Among the innovations, photoactivated localization microscopy (PALM) and stochastic optical reconstruction microscopy (STORM) rely on the detection and localization of single fluorescent probes to reconstruct images the resolution of which is not limited by far-field diffraction but by photon-counting statistics [3,4]. In live cell imaging, one has to make a trade-off among the sensitivity of detection, the imaging speed and the viability of the specimen . Intrinsically wide-field imaging approaches such as PALM and STORM hold great potential for live cell imaging, since they offer simultaneously ultra-sensitive detection, maintain cell viability for extended time periods and image a large field of view at once.
Imaging speed becomes a key issue when multiple dynamic cellular processes are to be monitored. Localization-based super resolution microscopy is relatively slow [3,4]. For example, in live cell PALM, ~30 s are necessary at present to acquire images at a resolution of ~60 nm . However, a complete image analysis, including localizing multiple molecule positions in each image frame and reconstructing the final image from a large number of image frames usually takes even longer than image acquisition . As a result, images can only be visualized long after data acquisition, which is highly undesirable in tracking dynamic processes in living cells.
Recently, Nienhaus and associates introduced a fast image analysis method that relies on an algebraic algorithm called fluoroBancroft [7,8], where parallel data acquisition and analysis is possible for up to 100 molecules per frame with a camera frame time of 100 ms at only a small resolution loss. Further optimization of imaging acquisition speed is desirable because brighter probes and faster cameras [9,10] will become available in the future. These developments will lead to substantially larger numbers of molecules that need to be analyzed in parallel. Ultimately, live cell imaging will benefit most from real-time image analysis of a much larger number of fluorescent molecules without compromising either localization precision or field of view. Moreover, revealing dynamic information for large-scale live cell imaging will be possible immediately after image acquisition. Obviously, this goal is still beyond reach.
To meet the increasing need for fast image analysis in localization-based super resolution microscopy, we have developed a new method, termed MaLiang (after a traditional Chinese folktale “Ma Liang and his Magic Brush”) for “maximum likelihood algorithm encoded on a Graphics Processing Unit (GPU)”. This GPU-based maximum likelihood method exploits the fact that a GPU can compute locations of molecules in parallel [11,12], while the maximum likelihood algorithm guarantees a high localization precision . Compared with the fastest algebraic fluorosBancroft (FB) method reported recently, our method has an ~8-fold speed gain in the full routine for analyzing experimental image frames. Simulation results indicate that this new method can analyze ~20,000 molecules per second and, thus, is capable of processing experimental images from fast EMCCD cameras in real-time. The capability of this method for discovering fast cellular process in living cell imaging is also discussed.
2.1. Description of the maximum likelihood algorithm
A variety of methods are applied in localizing single fluorescent molecules. Similar to Gaussian fitting (GF), the maximum likelihood algorithm is a high precision localization method, as described earlier . This algorithm was combined with a generalized projection gradient method to locate single fluorescent molecule efficiently. Note that the point spread function is generally used to model the emission intensity pattern from individual fluorescent molecule. Theoretically, the intensity in pixel (i, j) is described byEq. (1) in pixel (i, j), is expressed by the symbol qi,j. We can compute the probability of Ii,j = qi,j by the following equationEq. (2), the spatial distribution of the light intensity is well described by a joint probabilityEq. (3) can be maximized by adjusting the parameters in Eq. (1). Note that the parameters should all be above zero. Based on the information described above, we can compute the position (x0, y0) of the fluorescent molecule by solving the following optimization problem15,16] can be used to obtain the optimal values.
2.2. Description of the MaLiang method
The total process for localizing the positions of multiple molecules and reconstructing a final super resolution image is embedded in a combined computational framework of CPU and GPU. As shown in Fig. 1 , three steps are carried out successively for the entire image analysis routine.
2.2.1. Step 1. De-noising
Original images were loaded into the memory of the CPU and then delivered to the GPU hardware. An optimized convolution template for de-noising in the GPU was modified with the following facts: an image filter separated into horizontal (row) and vertical (column) passes is optimal for boosting computing efficiency of the GPU , and an annular averaging filter efficiently eliminate the influence of different background intensity of fluorescent molecules in an image . Therefore, in the convolution template for de-noising, the forms of the averaging and annular averaging filters were modified to have the same sum of two separate filters. The raw image was then convoluted with the two filters respectively, resulting in two corresponding images which were further summed up to give the final de-noising image. Note that the dimension of the averaging filter should be adjusted to match the diameter of the corresponding Airy disk ad should be smaller (2 pixels) than that of the annular filter.
2.2.2. Step 2. Sub-region extraction
This step is used to extract sub-regions for further position localization. Due to a sparse distribution of fluorescent molecules in an image, extracting sub-regions can be performed efficiently on the CPU. For a noise-reduced image, the positions of the pixels where the values are local maxima above a given threshold value were extracted and stored. The threshold value is estimated to be more than 5 the standard deviation of the background noise intensity in the noise-reduced image. A sub-region with 5 x 5 pixels was extracted for the FB method, and a region with 7 x 7 pixels for the MaLiang and GF methods. The different sizes used here were based on the following considerations. For the FB method, it is necessary to separate the signal and noise. A larger sub-region makes such a separation more difficult, thus reducing the localization precision of the molecules. On the other side, MaLiang and GF methods are based on iterative optimization algorithms, therefore it is important to keep the data set that entirely includes the PSF of a single molecule.
2.2.3. Step 3. Localization
Values in each extracted sub-region are sent to the GPU and assigned a thread for further parallel computation. In the thread, the maximum likelihood algorithm was adopted to obtain precise sub-pixel localization of the fluorescent molecule. The number of extracted sub-regions that can be analyzed simultaneously in the GPU is governed by the number of threads, which is determined by the capacity of the GPU graphics card used. Using an NVIDIA GeForce 9800GT graphics card with 1.0 GB memory, 3,000 fluorescent molecules could be simultaneously analyzed and located.
2.3. Experimental and simulated data for image analysis
For a fair comparison between the fluoroBancroft method  and our newly developed GPU-based maximum likelihood algorithm, we used the same PALM image frames as those analyzed by the former method. As Gaussian fitting is more widely used by researchers in this field, we also include a comparison with this method. To verify our results from real PALM image data and to explore the limitations of our method in regard to the spatial and temporal resolution of localization, simulation studies based on the concept described by Ober et al  were also carried out. Parameters were modified to match the experimental images, where the radius of Airy disks equals 260 nm in object space. We note that spots with radii exceeding the average radius of the Airy discs by more than 3.5 × the standard deviation were regarded as overlapping spots and thus rejected.
In the simulations, the pixel size was set to 130 nm, so that the Airy disk occupies 16 pixels. In this case, when the photons and the background noise are ascertained, the localization precision could be optimized according to the theoretical analysis and the simulation by R. E. Thompson et al.  .The background noise was mostly considered as Poissonian noise and the SNR is defined as the ratio of I0 to the standard deviation of I0 + Ib. Here, I0 is the peak signal above the background noise and Ib is the intensity of background noise and was set to 10.
2.4. Hardware platform for image analysis
All programs were written in C under the Microsoft Visual Studio 2005 environment, using an Intel i5-750, 2.67 GHz personal computer with 4.0 GB memory. A fairly cheap (~$150 as of January 2010) NVIDIA GeForce 9800GT graphics card with 1.0 GB memory was used for GPU-based localization computation. Note that the execution times depend on the hardware platform.
2.5. Method for eliminating bad data points
In Figs. 5c and 5f, a simple linear regression method was applied to reject data points that markedly deviated from the mean contour line. Only points are included that deviate not more than four times the standard deviation from the mean contour line. The final images are built up by repeating this process several times. Note that only one linear structure exists in each of the final images.
3. Results and Discussions
3.1. Localization accuracy
Comparisons of the localization accuracy among Gaussian fitting (GF), fluoroBancroft (FB), and MaLiang (MLG) methods are presented in Fig. 2 . The resolution differences among these methods (Figs. 2b-2d) are not easily visualized, indicating that further analysis is required. Based on an approach suggested by Nienhaus et al , where the half width of the distribution of fluorescent molecules around their mean contour line was used for quantitative comparisons, it is found that the localization precision of the MaLiang method is comparable to the GF method and significantly higher than the FB method (Fig. 2f). This finding is supported by simulation (Figs. 2g-2h). Apparently, the higher localization precision of MaLiang over FB comes from the intrinsic performance difference between fitting and algebraic algorithms. The results also indicate that the Poisson noise model used in the MaLiang method is suitable to describe the noise behavior of the real image frames.
3.2. Computational speed of image analysis
Generally, the computation duty increases with the image acquisition rate and the molecules needed to be localized per frame. The computation duty between typical and fast EMCCD is presented in Table 1 . Obviously, the fastest algebraic fluorosBancroft (FB) method reported recently is not capable of real-time processing experimental images from fast EMCCD working at full frame rate.
We evaluated the image analysis speed of the MaLiang method by performing an entire PALM image analysis routine with first 1000 experimental image frames (Table 2 ). The total execution time with the MaLiang method is found to be 1.50 s, an 8-fold speed gain over the FB method (11.83 s) and a 22-fold gain over the GF method (34.02 s). We found that the program written in C runs about 2.5-fold faster than that in Matlab.
The high gain in image analysis speed comes from a combination of three factors: GPU-based parallel localization, appropriate convolution template and optimized number of iterations. First of all, we realized that integrating a fitting-based method into an image analysis method would be essential if high localization precision is desired. The only problem is how to minimize the negative effects from the slow fitting process. GPU-based parallel localization is an ideal solution, since the GPU is capable of processing multiple molecules simultaneously, which is desired by localization-based super resolution microscopy. Simulation results show that it is easy to analyze an image frame with up to 100 molecules (Figs. 3a -3b) without noticeably increasing either localization or total computation time. Moreover, the appropriate form of the convolution template is also helpful for maximizing the speed of GPU-based localization (Section 2.2). Third, although localization precision generally increases with a larger number of iterations in the fitting-based method (maximum likelihood in our case), it is found that ten iteration steps are sufficient for achieving satisfactory localization precision, while >10 iterations can only slightly improve the precision (Fig. 4 ). Therefore, the number of iterations was set to ten for all of the following image analysis.
The speed gain in analyzing experimental data is verified by simulation data (Figs. 3c-3d and Table 2). We found that more than 30 simulated image frames (512 x 512 pixels) with 100 molecules in each frame could be analyzed within 0.11 second, corresponding to an imaging analysis capacity of ~20,000 molecules per second. The gain in the total computation time of the MaLiang method is 11 over the FB and 22 over the GF methods. The speed is fast enough for real-time processing experimental images from typical fast EMCCD cameras, while neither localization precision nor field of view is compromised.
3.3. Capacity for tracking dynamic processes
We checked the capability of our high-precision, high-speed image analysis method for discovering fast cellular process in living cell imaging. Sample drift in the PALM instrument was corrected according to Betzig et al. , using the bead marked with magenta circle in Fig. 5. The drift-based contribution to position errors in PALM data is significantly reduced after sample drift correction (Fig. 6 ). The standard deviation of the bead positions obtained from 8,000 imaging frames is 7.5 nm, so that under the present imaging conditions, intracellular structure motions larger than 20 nm should be resolvable . The motion of the partial F-actin filament in Fig. 5b is found to be < 20 nm (Fig. 5d), indicating that this structure is almost stationary during the long image acquisition process. However, another F-actin filament (Fig. 5e ) was found to display > 50 nm motion (Fig. 5g), as detected between data sets 5 and 7. To quantify the sensitivity toward motion detection between the fast image analysis methods (MaLiang and FB), we calculated the Z-statistics from the positions of molecules for the two successive data sets in Fig. 5f. It is found that the MaLiang method has better performance in discovering such motions than the FB method (Fig. 7 ), which is advantageous for dynamic processes in live cell imaging.
Our MaLiang method, based on the maximum likelihood algorithm and GPU computation, is capable of ultra-fast localization-based super resolution image analysis with high localization precision. Compared with the fast fluoroBancroft method reported recently, our method is about eight times faster in the full routine for analyzing experimental image frames without compromising either localization precision or field of view. The MaLiang method has been shown to have excellent performance in discovering motions in live cell imaging.
This work was supported by the National Natural Science Foundation of China (Grant Nos. 30970691, 30927001, 30925013), the National High Technology Research and Development Program of China (863 Program: 2008AA02Z107), the Scientific Research Foundation for the Returned Overseas Chinese Scholars, State Education Ministry of China, Wuhan National Laboratory for Optoelectronics and the Open Research Fund of State Key Laboratory of Bioelectronics, Southeast University. G.U.N. was supported by the Deutsche Forschungsgemeinschaft and the State of Baden-Württemberg through the Center for Functional Nanostructures (CFN).
References and links
5. H. Shroff, C. G. Galbraith, J. A. Galbraith, and E. Betzig, “Live-cell photoactivated localization microscopy of nanoscale adhesion dynamics,” Nat. Methods 5(5), 417–423 (2008). [CrossRef] [PubMed]
7. P. N. Hedde, J. Fuchs, F. Oswald, J. Wiedenmann, and G. U. Nienhaus, “Online image analysis software for photoactivation localization microscopy,” Nat. Methods 6(10), 689–690 (2009). [CrossRef] [PubMed]
8. S. B. Andersson, “Localization of a fluorescent source without numerical fitting,” Opt. Express 16(23), 18714–18724 (2008). [CrossRef]
9. B. A. Wilt, L. D. Burns, E. T. Wei Ho, K. K. Ghosh, E. A. Mukamel, and M. J. Schnitzer, “Advances in light microscopy for neuroscience,” Annu. Rev. Neurosci. 32(1), 435–506 (2009). [CrossRef] [PubMed]
10. S. A. McKinney, C. S. Murphy, K. L. Hazelwood, M. W. Davidson, and L. L. Looger, “A bright and photostable photoconvertible fluorescent protein,” Nat. Methods 6(2), 131–133 (2009). [CrossRef] [PubMed]
11. J. D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Kruger, A. E. Lefohn, and T. J. Purcell, “A survey of general-purpose computation on graphics hardware,” Comput. Graph. Forum 26(1), 80–113 (2007). [CrossRef]
12. G. Quan, T. Sun, and Y. Deng, “Fast reconstruction method based on common unified device architecture (CUDA) for micro-CT,” J. Innovative Opt. Health Sci. 3(1), 39–43 (2010). [CrossRef]
15. P. H. Calamai and J. J. More, “Projected gradient methods for linearly constrained problems,” Math. Program. 39(1), 93–116 (1987). [CrossRef]
16. D. Q. Mayne and E. Polak, “Feasible directions algorithm for optimization problems with equality and inequality constrains,” Math. Program. 11(1), 67–80 (1976). [CrossRef]
21. E. Betzig, G. H. Patterson, R. Sougrat, O. W. Lindwasser, S. Olenych, J. S. Bonifacino, M. W. Davidson, J. Lippincott-Schwartz, and H. F. Hess, “Imaging intracellular fluorescent proteins at nanometer resolution,” Science 313(5793), 1642–1645 (2006). [CrossRef] [PubMed]