
Parametric comparison between sparsity-based and deep learning-based image reconstruction of super-resolution fluorescence microscopy

Open Access

Abstract

Sparsity-based and deep learning-based image reconstruction algorithms are two promising approaches to accelerate the image acquisition process for localization-based super-resolution microscopy, by allowing a higher density of fluorescing emitters to be imaged in a single frame. Despite the surging popularity, a comprehensive parametric study guiding the practical applications of sparsity-based and deep learning-based image reconstruction algorithms is yet to be conducted. In this study, we examined the performance of sparsity- and deep learning-based algorithms in reconstructing super-resolution images using simulated fluorescent microscopy images. The simulated images were synthesized with varying levels of sparsity and connectivity. We found the deep learning-based VDSR recovers images faster, with a higher recall rate and better localization accuracy. The sparsity-based SPIDER recovers more zero pixels truthfully. We also compared the two algorithms using images acquired from a real super-resolution experiment, yielding results consistent with the evaluation using simulated images. We concluded that VDSR is preferable when accurate emitter localization is needed, while SPIDER is more suitable when evaluation of the number of emitters is critical.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

The past two decades have seen revolutionary breakthroughs in the resolution of fluorescence microscopy, which had been restricted by the diffraction limit of visible light at approximately 250 nm. These breakthroughs have led to answers to many important questions. For example, super-resolution microscopy revealed the detailed structure of focal adhesion on the scale of nanometers [1]. Discoveries like this would not be possible without super-resolution microscopy. Among the super-resolution microscopy techniques developed over the years, single-molecule localization microscopy (SMLM) [2,3] is the most commonly used. To resolve one emitter from another located within a diffraction-limited distance using this method, emitters which stochastically “blink” between the bright (“ON”) and dark (“OFF”) states are employed. Multiple images of the same field are then acquired, and each image contains a paucity of “ON” fluorophores. The accurate position of these few “ON” emitters can then be determined by deconvolution in the frequency domain or 2D profile fitting in the spatial domain, if the point spread function (PSF) of the emitter is given. The super-resolution image is obtained by projecting all the deconvolved or fitted images onto a single plane. However, because low emitter density is required, this deconvolution approach dictates a time-consuming process of acquiring a large number of images, each containing few emitters. In general, the acquisition time ranges from minutes to hours. The long image acquisition time precludes dynamic cellular processes at the timescale of seconds or faster from being studied using super-resolution microscopy. Therefore, algorithms capable of resolving single emitters at higher density, compared to deconvolution/profile fitting based on PSF, are highly desirable. Furthermore, algorithms capable of resolving single emitters at higher density might be applied to enhance fluorescence images in general. This application will be especially helpful for research scientists who are hindered by the lack of access to super-resolution imaging instruments due to the high cost or high skill level required to operate the instrument.

In this study, we examined the performance of two approaches alternative to deconvolution/profile fitting, one based on sparsity and the other on deep learning, by evaluating their capability of resolving single emitters at higher density. Both sparsity-based and deep learning-based algorithms have been used in reconstructing images of fluorescence microscopy, including localization-based super-resolution images [4–6]. The problem of super-resolution image reconstruction is essentially ill-posed for images that contain a high density of emitters, as many high-resolution (HR) images are possible solutions that result in the same low-resolution (LR) image. To identify a likely solution among many possible candidates, sparsity-based approaches introduce constraints on sparsity, assuming the image with the lowest emitter density is the likeliest solution. To obtain the likely solution, deep learning-based approaches compute the product of operator matrices and the low-resolution image, where the values of individual elements, or weights of individual nodes, in the operator matrices are optimized during training. Training is the process during which numerous pairs of corresponding low- and high-resolution images are used to deduce the operator matrix. Recently, many deep learning-based [7–14] and sparsity-based [4,5,15–18] methods have been developed for SMLM. These recent methods have demonstrated potential advantages over conventional, single-molecule fitting methods. The premise of deep learning- and sparsity-based methods is that they might allow more emitters to exist in the “ON” state simultaneously, thereby significantly shortening the time required for imaging in super-resolution experiments, with resolution comparable to single-molecule fitting methods at low emitter densities. For example, Nehme et al. reported a deep learning algorithm in 2018 which could localize emitters at high density (9 μm$^{-2}$) accurately to within 31 nm [7]. Similarly, a sparsity-based algorithm developed by Hugelier et al. in 2016 reached ∼50 nm accuracy in a high-density scenario [4]. Notably, deep learning-based methods have also proven to require fewer computational resources, allowing even smartphones to gain super-resolution capabilities [13]. Yet, programs implementing these two new approaches have not been examined as comprehensively as the ones implementing the conventional single-molecule fitting approach. Notably, while the SMLM 2013/2016 challenges [19,20] offered extensive insight into the capabilities of many SMLM algorithms, no deep learning-based algorithm was evaluated in these reports. Moreover, the challenges did not examine a not-uncommon scenario in which the average emitter density over the whole field is not high, but all emitters are concentrated in a few local regions of the field due to the structural nature of the subcellular organization. In such a case, when all emitters of the “ON” state are simultaneously concentrated in a local region, the emitters appear to be connected against a vastly dark background (see Fig. 1 “connected emitter”). When most emitters are activated simultaneously, this problem can be viewed as a Single Image Super-Resolution (SISR) task. Over the years, many deep learning algorithms have been developed for SISR given their capability of efficiently extracting features that map LR images to HR images [21], yet the accuracy and efficiency of SISR algorithms in reconstructing SMLM images have not been examined or compared to other algorithms in any of the previous studies.
To fill this knowledge gap, we evaluated the performance of sparsity-based and deep learning-based algorithms when adjusting two variables in the image: emitter density and connectivity. The emitter density, or sparsity, is highly correlated with the difficulty of accurately reconstructing high-resolution images, as the presence of more emitters in an image makes the precise localization of each individual emitter more difficult. The connectivity of emitters further imposes challenges to image reconstruction algorithms. This is because the overlapping PSFs of connected emitters result in high emitter density locally, even when the global emitter density of the image is relatively low. However, the effect of connectivity on the accuracy of computational super-resolution algorithms has rarely been explored. This gap might be a result of the stochastic photo-activation of emitters in typical SMLM experiments, which means activated emitters are rarely connected even though all emitters may exist in proximity to each other.

A deep learning-based method, Very Deep Super Resolution (VDSR), designed for SISR image reconstruction, and two sparsity-based methods, SParse Image DEconvolution and Reconstruction (SPIDER) and GrEedy Sparse PhAse Retrieval (GESPAR), were tested in this study. VDSR was chosen because it has been widely applied in enhancing resolution in a variety of applications. While algorithms with better performance on specific tasks have been developed more recently, VDSR remains one of the best performing deep learning-based algorithms overall [6,21] and is likely the most accessible to scientists through MATLAB. It was previously demonstrated that both SPIDER and GESPAR could effectively enhance image resolution provided the true signals (i.e., non-zero pixels) in the images are sufficiently sparse. SPIDER [4] has shown superior performance over FALCON [15], the latter being the best performing compressed sensing (sparsity-constrained deconvolution) algorithm in the SMLM2016 challenge [20]. GESPAR [5] is another well-cited compressed sensing algorithm that was included in the study initially, despite showing less satisfactory performance compared to the other methods.

2. Results

2.1 Simulated fluorescence microscopy images with various degrees of sparsity and pixel connectivity are generated for systematic evaluation

In order to compare the performance and precision of the algorithms in recovering images of higher resolution from lower resolution, we synthesized ground truth images to simulate the scenario where every single fluorophore occupied a single pixel of a prescribed coordinate. These ground truth images served as references against which the images recovered by the different algorithms were evaluated. The size of the ground truth images was 1024 by 1024 pixels. The sparsity level of the ground truth images, namely the percentage of non-zero pixels, ranged from 0.61% to 6.9% (0.61%, 0.86%, 1.2%, 1.7%, 2.4%, 3.4%, 4.9%, 6.9%). For each sparsity level, three types of images were synthesized: the non-adjacent single-emitter type, the single-emitter type, and the connected-emitter type. In the non-adjacent single-emitter images, each non-zero pixel is surrounded by eight neighboring pixels of zero intensity (see Fig. 1 for schematics). In the single-emitter images, non-zero pixels were placed randomly with no restrictions. In the connected-emitter images, each non-zero pixel is adjacent to at least one other non-zero pixel. In total, 24 categories of images were synthesized, with each representing a prescribed combination of sparsity and pixel connectivity. For each category, 10 images were synthesized for testing. The coordinates of the non-zero pixels in the ground truth images were determined by random number generation (see Methods).
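As a concrete illustration of the placement procedure, a minimal MATLAB sketch for generating a non-adjacent single-emitter ground truth image at a given sparsity level is shown below. The function name, the rejection-sampling loop, and the border handling are our own and are not taken from the authors' code (Code 1).

% Minimal sketch: place emitters at a target sparsity level with the
% non-adjacent constraint, i.e. every non-zero pixel is surrounded by
% eight zero-valued neighbors. Border pixels are skipped for simplicity.
function gt = makeNonAdjacentGroundTruth(imgSize, sparsity)
    gt = zeros(imgSize, 'uint8');                 % e.g. imgSize = 1024
    targetCount = round(sparsity * imgSize^2);    % e.g. sparsity = 0.0061
    while nnz(gt) < targetCount
        xy = randi([2 imgSize-1], 1, 2);          % candidate coordinate
        neighborhood = gt(xy(1)-1:xy(1)+1, xy(2)-1:xy(2)+1);
        if ~any(neighborhood(:))                  % reject if any neighbor is already lit
            gt(xy(1), xy(2)) = 255;               % 8-bit emitter
        end
    end
end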


Fig. 1. Ground truth images and corresponding low-resolution images for testing reconstruction algorithms, with schematics for performance evaluation metrics. (A) Representative ground truth images generated at 3 different sparsity levels with different pixel-connectivity conditions are shown. The red arrows in the non-adjacent single emitter images indicate that two neighboring emitters are separated by at least one zero pixel. The red arrows in the single emitter images indicate that two neighboring emitters are adjacent to each other and not separated by zero pixels. (B) Low-resolution images are generated by Gaussian blurring and down-sampling from the corresponding ground truth images shown in (A). (C) The performance metrics recall and localization accuracy are defined as shown in the schematics. The emitters 1 and 2 (orange) in the recovered image are both considered representative of the ground truth emitter A (blue), and emitter 4 is considered representative of ground truth emitter C, because they are within the range of localization accuracy tolerance from emitter A or C, respectively. To calculate the localization accuracy, a virtual emitter (green) is created, and the distance between the virtual emitter and emitter A is defined as the localization accuracy. We note emitter 2 is not considered representative of emitter B, because emitter 2 is closer to ground truth emitter A, even though emitter 2 is also within the range of localization accuracy tolerance from emitter B. Emitter 3 in the recovered image is considered a false positive because it is not within the range of localization accuracy tolerance from any ground truth emitter. (D) The definitions of true emitter, false emitter, and failure are illustrated schematically. A true emitter is a recovered emitter overlapped with a ground truth emitter. A false emitter is a recovered emitter not overlapped with a ground truth emitter. A failure is a ground truth emitter that failed to be recovered. These metrics are more stringent than the ones illustrated in (C), because it was assumed that each true pixel could only be recovered once by an algorithm. The performance metrics true positive rate (TPR) and true negative rate (TNR) are calculated based on the percentage of true emitters recovered and true negative pixels recovered, respectively.


Next, based on the ground truth images, images of lower resolution were generated to simulate images acquired by fluorescence microscopy. The ground truth images were first subjected to a Gaussian filter (σ=1.0077), and the blurred images were down-sampled by a factor of 2. The resulting images, sized 512 by 512 pixels, served as inputs for the sparsity- and deep learning-based methods.
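For reference, the blurring and down-sampling step corresponds to the following minimal MATLAB sketch; the interpolation kernel used for down-sampling is not specified in the text, so imresize's default is assumed, and the file name is hypothetical (noise addition, used in the full pipeline, is described in Methods 4.1).

% Sketch of low-resolution image synthesis: Gaussian blur (sigma = 1.0077)
% followed by 2x down-sampling of the 1024x1024 ground truth image.
gt      = imread('ground_truth_1024.tif');     % hypothetical file name
blurred = imgaussfilt(double(gt), 1.0077);     % approximate the emitter PSF
lowRes  = uint8(imresize(blurred, 0.5));       % 1024x1024 -> 512x512, back to 8-bit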

The recovered images were then examined against the ground truth images to evaluate the performances of the different methods. Five main metrics were used to assess the performance: computational time, recall, localization accuracy, true positive rate (TPR), and true negative rate (TNR) (Fig. 1(C), (D)). The computational time was defined as the mean time to recover an image on a personal computer with 8 GB RAM and a 2.60 GHz Hexa-core processor. Since the deep learning-based VDSR requires a significant period of time for model training before it is capable of recovering images, we report the computational time for VDSR both including and excluding training time. Recall and localization accuracy were defined in a manner similar to those used by Hugelier et al. [2]. To calculate recall and localization accuracy, each recovered emitter was first paired to the closest ground truth emitter within the range of PSF radius (Fig. 1(C)). The recall was then defined as the number of ground truth emitters paired with at least one recovered emitter divided by total ground truth emitters in the image. The example shown in Fig. 1(C) would have a recall of 66.7%. The localization accuracy was defined as the mean distance between all recovered-ground truth emitter pairs. If multiple recovered emitters were paired to the same ground truth emitter, a virtual emitter at the geometrical mean of these recovered emitters would be used to calculate the localization accuracy. To calculate TPR and TNR, pixel-by-pixel comparison was performed on the recovered and ground truth images (Fig. 1(D)). Pixels that were non-zero in both images were true emitters, pixels that were only non-zero in the recovered images were false emitters, and pixels that were only non-zero in the ground truth images were failures. The TPR was defined as the number of true emitters divided by the number of all emitters in the ground truth image. TPR is a metric similar to recall, but imposes a higher penalty if the coordinate of the recovered emitter is not identical to that of the ground truth emitter. The TNR was defined as the number of true negative pixels divided by the number of all zero pixels in the ground truth image. The example shown in Fig. 1(D) would have a TPR of 50% and TNR of 95.7% (22/23). These metrics can be used selectively to guide the image reconstruction of specific measurements. For example, if one is to perform molecule counting [5,6] based on the reconstructed images, the method with a higher sum of recall rate and TNR should be used. If one is to measure the distance of two molecules tagged by different fluorophores but located in a macromolecular complex [1,7], then the method with better localization accuracy should be used.
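To make the pairing-based metrics concrete, a sketch of how recall and localization accuracy could be computed from lists of emitter coordinates is given below. This is our own illustration, not the evaluation script used in the study; the centroid of multiply-paired recovered emitters is used here as the “virtual emitter”.

% Sketch of pairing-based recall and localization accuracy (Fig. 1(C)).
% gtXY and recXY are N-by-2 and M-by-2 emitter coordinates (in pixels);
% tol is the pairing tolerance (the PSF radius).
function [recall, locAcc] = pairMetrics(gtXY, recXY, tol)
    assignedTo = zeros(size(recXY, 1), 1);       % ground truth index for each recovered emitter
    for k = 1:size(recXY, 1)
        d = hypot(gtXY(:,1) - recXY(k,1), gtXY(:,2) - recXY(k,2));
        [dmin, idx] = min(d);
        if dmin <= tol
            assignedTo(k) = idx;                 % pair with the closest ground truth emitter
        end
    end
    pairedGT = unique(assignedTo(assignedTo > 0));
    recall   = numel(pairedGT) / size(gtXY, 1);
    dev = zeros(numel(pairedGT), 1);
    for j = 1:numel(pairedGT)
        v = mean(recXY(assignedTo == pairedGT(j), :), 1);   % "virtual emitter"
        dev(j) = hypot(v(1) - gtXY(pairedGT(j),1), v(2) - gtXY(pairedGT(j),2));
    end
    locAcc = mean(dev);                          % multiply by the pixel size to express in nm
end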

2.2 Deep learning-based image recovery is faster than sparsity-based recovery by at least three orders of magnitude

To provide a practical guide for selecting the optimal image processing method to enhance resolution, the time required to complete the processing for each method was recorded. If a method requires significantly longer time to complete the same task than other methods, this disadvantage should also be taken into account. The recovery of high-resolution images was performed with the same set of images in all 24 categories for each algorithm tested. For VDSR, an additional 240 images were synthesized for the training purpose. Of these 240 training images, 10 images for each combination of sparsity level and connectivity constraint were included. 50 epochs of training were performed over a duration of 26,460 seconds (7.3 hours), reaching a final root-mean-square error (RMSE) of 1042.3. RMSE, defined as $\sqrt{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / n}$, where $y_i$ and $\hat{y}_i$ respectively represent the actual residual image and the network prediction, is a widely used metric to measure the “loss” of the true information by comparing the recovered image and the ground truth. We note that training after the 20th epoch yielded very little improvement in the RMSE of the model. The loss for the validation set followed the same trend and no overfitting was observed. For SPIDER, the value of the sparsity parameter κ was optimized for all sparsity levels in recovering single-emitter images. For GESPAR, no optimization was required, as the recovery was conducted by exhaustive examination of all possible combinations of non-zero pixels present in the image. Because such an exhaustive examination requires a prohibitively large number of iterations, exceeding the memory capacity of the computer, each blurred and down-sampled image was divided into patches of 32 by 32 pixels for the recovery.
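For reference, the reported loss corresponds to the following direct transcription of the formula above, evaluated over all pixels of the residual image and its prediction:

% RMSE between the true residual image y and the network prediction yhat.
rmse = sqrt(mean((double(y(:)) - double(yhat(:))).^2));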

The time to recover all 240 low-resolution images (10 images in each of the 24 categories) was approximately 191 seconds (0.05 hours) for VDSR excluding training time and 515,370 seconds (143 hours) for SPIDER. Of the 240 images tested, GESPAR failed to recover 230 images over hundreds of hours, a duration more than sufficient for the other methods. As a result, only 10 recovered images from the 0.61% non-zero pixel density, non-adjacent single-emitter category were produced. The time for these 10 images to be processed for recovery was approximately 577,000 seconds (160 hours). Comparing the time required for each method, we concluded that VDSR, the deep learning-based method, is superior to SPIDER and GESPAR, the sparsity-based methods, in terms of processing time. On average, it requires 2,150 seconds for SPIDER, 57,700 seconds for GESPAR when successful, 0.796 seconds for VDSR (excluding training time), or 111 seconds for VDSR (including training time) to recover an image of 1024 by 1024 pixels. Excluding training time, VDSR is 2,700 times faster than SPIDER and 72,500 times faster than GESPAR. After considering the training time, VDSR is still 19 times faster than SPIDER and 520 times faster than GESPAR. We concluded that GESPAR is extremely time-consuming, and thus impractical for the task of recovering images of the sizes commonly seen in fluorescence microscopy. As a result, we were only able to evaluate the performance of GESPAR against VDSR and SPIDER in the 0.61% non-zero pixel density, non-adjacent single-emitter category in the following sections.

2.3 VDSR recovers true signals from low resolution images with high recall efficiency

The rate at which emitters in the ground truth image are successfully recovered, or recall, is a commonly used parameter in evaluating the performance of image reconstruction algorithms. A high recall value indicates that a significant portion of the ground truth is faithfully represented by the recovered image. An emitter in the ground truth image is regarded as successfully recalled if it is paired to at least one emitter in the recovered image within the range of localization accuracy tolerance from the ground truth emitter (Fig. 1(C)). The recall was calculated as:

$$\mathrm{Recall} = \frac{\text{number of ground truth emitters paired to recovered emitters}}{\text{number of emitters in ground truth image}}$$

The true positive rate (TPR) is defined as the successful recovery of emitters as a percentage of all ground truth emitters using pixel-by-pixel comparison (Fig. 1(D)). TPR is a metric similar to recall but more stringent, because it incurs a penalty if the recovered emitter is not located at the exact coordinates of the ground truth emitter. TPR was calculated using the formula:

$$\mathrm{TPR} = \frac{\text{number of true emitters in recovered image}}{\text{number of emitters in ground truth image}}$$
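A pixel-wise sketch of TPR, together with the TNR used in Section 2.5, is shown below; gt and rec are assumed to be the ground truth and recovered images, binarized by thresholding at zero.

% Pixel-by-pixel TPR and TNR (Fig. 1(D)).
gtMask  = gt  > 0;                               % ground truth emitters
recMask = rec > 0;                               % recovered emitters
TPR = nnz(gtMask & recMask)   / nnz(gtMask);     % true emitters / all ground truth emitters
TNR = nnz(~gtMask & ~recMask) / nnz(~gtMask);    % true zero pixels / all zero pixels in ground truth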

As the emitter density increases, it is expected that the algorithms become less effective, because pixels containing blurred emitters close to each other may appear as a single emitter. Indeed, both the deep learning-based VDSR and sparsity-based SPIDER demonstrated high recall and TPR at low densities, but their performance decreased monotonically as the emitter density increased (Figs. 2, 3(A), 3(C)).


Fig. 2. Color-coded recovered images compared with the ground truth. Representative recovered images from each algorithm are shown. The color of each pixel indicates whether it was recovered correctly. The schematic definitions of true emitter, false emitter and failure can be found in Fig. 1(D).



Fig. 3. Performance comparison of different algorithms. (A) Recall is defined as the rate of recovered emitters within the range of localization accuracy tolerance from any ground truth emitter (see Fig. 1(C)). (B) The localization accuracy can be evaluated by the mean localization accuracy in recovered images. The localization accuracy indicates how far a recovered emitter deviates from its associated true emitter (see Fig. 1(C)). (C) True positive rate of recovered emitters is defined as the rate of recovered emitters with coordinates identical to those of the corresponding ground truth emitters (see Fig. 1(D)). (D) True negative rate is defined as the rate of zero pixels in the recovered image that are also of zero value in the corresponding ground truth image.


The recall of SPIDER was 96.6% for the lowest density (0.61% non-zero pixels) non-adjacent single-emitter category and dropped to 49.5% for the highest density (6.9% non-zero pixels) single-emitter category. VDSR scored 99.9% and 69.9% on those two categories, respectively. In fact, VDSR performed consistently better than SPIDER on recall across all categories. For the connected-emitter images, SPIDER could only recall less than 34.8% of the true emitters, while VDSR recalled more than 85.0%. The relatively low performance of SPIDER in recovering connected emitters even at low density indicated an additional constraint on its applications beyond sparsity. GESPAR scored 55.5% recall in the 0.61% density, non-adjacent single-emitter category, 41.4% lower than SPIDER and 44.4% lower than VDSR.

Similar to the results regarding recall, VDSR consistently scored higher TPR relative to SPIDER. In particular, VDSR scored 100% in TPR for the lowest density (0.61%), non-adjacent single-emitter category, whereas SPIDER scored 87.0%. For the highest density (6.9%), single-emitter category, where VDSR performed worst in TPR with a score of 45.1%, SPIDER scored merely 21.1% (Fig. 3(C)).

2.4 VDSR recovers true signals from low resolution images with higher localization accuracy

The localization accuracy, measuring the mean distance of recovered emitters to their corresponding ground truth emitters, estimates the similarities between the ground truth and recovered images. The localization accuracy was calculated as the mean deviation between all ground truth emitter-recovered emitter pairs:

$$\mathrm{Localization\ accuracy} = \frac{\text{total deviation distance}}{\text{number of paired emitters}}$$

For localization accuracy, the emitters recovered by SPIDER and deemed as true signals with tolerable location inaccuracy (Fig. 1(C)) deviated 10 to 71 nm from the true emitters in the non-adjacent single-emitter setting, while VDSR in the same setting had an accuracy of 0.1 to 50 nm (Fig. 3(B), Fig. S2). Both methods exhibited good localization accuracy at low non-zero pixel density. By comparison, the localization accuracy of VDSR was 30% to 99% smaller than that of SPIDER, depending on the category of the tested images. Once more, VDSR scored consistently better than SPIDER on localization accuracy among all the categories except the lowest density (0.61%), connected-emitter category. GESPAR in the lowest density (0.61%), non-adjacent single-emitter category had a localization accuracy of 75 nm, 116% less accurate than SPIDER and 58,000% less accurate than VDSR in the same category. We note that SPIDER did exhibit better localization accuracy relative to VDSR in the lowest density (0.61%), connected-emitter category. Despite having a localization accuracy of 4.6 nm (59% smaller than VDSR) in this category, SPIDER only recalled 34.8% of all true emitters (62.6% lower than VDSR). While all the recovered emitters were placed accurately relative to the true emitters, the low recall rate of SPIDER imposes a limitation on image recovery when accurate localization of recovered emitters is critical for data interpretation.

2.5 SPIDER recovers more true zero pixels at high emitter density

True negative rate (TNR) signifies whether the algorithm recovers emitters at locations where they should not be. Since the non-zero pixels do not exceed 7% of the total image area, most pixels in the ground truth and recovered images are zero-valued. As a result, the TNR was expected to be close to 1 in all categories. Both SPIDER and VDSR exhibited almost 100% TNR in the lowest density (0.61%) single-emitter and non-adjacent single-emitter categories. At the highest density, the TNR of SPIDER dropped to 96.5% for non-adjacent single-emitter and 97.7% for single-emitter. VDSR had 91.5% and 97.0% TNR for the same categories (Fig. 3(D)). A closer look at the recovered images revealed that VDSR recovered twice as many emitters as there were in the ground truth images for the highest density, non-adjacent single-emitter category (Fig. S1A). The relatively lower TNR from numerous false emitters reveals a limitation of VDSR when evaluation of the number of emitters is critical. For the images containing connected emitters, the TNR of both SPIDER and VDSR remained above 99.7% across all density levels. GESPAR scored 99.1% TNR in the lowest density, non-adjacent single-emitter category, the worst among the three algorithms we tested. As a corollary, the false positive rate, defined as the percentage of recovered emitters not found within the range of localization accuracy tolerance of any ground truth emitter, was close to 0 for all categories (Fig. S1B). We note that SPIDER produced slightly more false positives than VDSR, suggesting that some emitters recovered by SPIDER deviated so far from the ground truth emitters that they could not be found within the range of localization accuracy tolerance of any ground truth emitter (Fig. 1(C)). This result is not surprising, especially when the ground truth emitters are adjacent. Adjacent ground truth emitters may be eliminated by SPIDER in favor of emitters that were farther apart to promote sparsity, which is a term included in its penalty [2].

2.6 Image reconstruction from a real super-resolution experiment

To evaluate the performance of SPIDER and VDSR in a more realistic context, we applied the algorithms to the raw images of microtubules acquired using STORM [22]. The introduction of sparsity- or deep learning-based algorithms to super-resolution microscopy is motivated by the prospect of reducing the time required for image acquisition. In other words, if there exists an algorithm which permits recovering single emitters from a stack of raw images with higher emitter density, the typically minute- or hour-long image acquisition time may be reduced to seconds. To simulate this scenario, the raw image stack containing 9990 frames, acquired sequentially, was condensed to a new image stack containing 37 frames. Each frame of the new stack was synthesized by projecting 270 consecutive frames in the raw image stack onto the same plane. This condensed stack of images with 270-fold higher emitter density can be regarded as equivalent to a stack of images containing emitters with photochemical kinetics favoring the “ON” (bright) state [23,24]. Experimentally, emitters with a longer “ON” state can be realized by modulating the excitation and spin energy states of the fluorophores, or by changing the binding affinity between the DNA-PAINT strand and its target strand [23,24]. In addition, we also projected all 9990 frames onto one single image to be tested, in order to evaluate the feasibility of reconstructing super-resolution images from single-shot images with emitter density nearly three orders of magnitude higher. The rationale of testing both the condensed stack and the projected image was analogous to the rationale by which we chose to test simulated images of both adjacent emitters and connected emitters in the previous section. The performance was then evaluated by comparing the images recovered by SPIDER and VDSR with those recovered by three well-established super-resolution algorithms, Octane, QuickPALM, and RapidSTORM [25–27], and an algorithm called PeakFit, which is under active development and has shown superior performance over other SMLM algorithms on 2D super-resolution datasets in the SMLM2016 challenge [20]. Four different algorithms were used to avoid introducing bias from a particular algorithm. The raw image stack containing 9990 frames was used for image recovery by Octane, QuickPALM, and RapidSTORM. The super-resolution rendering from these algorithms was provided online as a part of the SMLM2013 results, and we applied PeakFit to the raw image stack to obtain the fourth super-resolution image. The condensed image stack and the projected single image, with 270-fold or 9990-fold higher emitter density respectively, were used for image recovery by SPIDER and VDSR.
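The frame condensation itself was performed with ImageJ projection (see Section 4.4); an equivalent MATLAB sketch is given below. The projection operator is not specified in the text, so a maximum-intensity projection is assumed here, and the file names are hypothetical.

% Sketch of condensing the 9990-frame raw stack into 37 frames by projecting
% 270 consecutive frames per output frame.
rawFile = 'microtubule_stack.tif';               % hypothetical file name
nFrames = numel(imfinfo(rawFile));               % 9990 frames
group   = 270;
for g = 1:floor(nFrames/group)                   % 37 condensed frames
    proj = imread(rawFile, (g-1)*group + 1);
    for f = 2:group
        proj = max(proj, imread(rawFile, (g-1)*group + f));   % maximum-intensity projection
    end
    imwrite(proj, 'condensed_stack.tif', 'WriteMode', 'append');
end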

Compared to the reconstructed images by Octane, QuickPALM, RapidSTORM, and PeakFit, we observed low recall and TPR in the image recovered by SPIDER from the single projected image of microtubules, consistent with the simulation results where connected emitters were present (Fig. 3). The recall, localization accuracy and TPR of SPIDER were significantly higher in the image recovered by SPIDER from the condensed image stack (Fig. 4(B)). Nonetheless, the best performance from SPIDER still fell short compared to that of VDSR. On average, the recall and localization accuracy of VDSR were 31% and 20% higher relative to SPIDER, respectively. The recall and TPR were higher in images recovered from the condensed stack than from the single projected image for both SPIDER and VDSR. The best localization accuracy and TNR, however, were accomplished by VDSR in the image recovered from the single projected image.


Fig. 4. Image recovery from a real super-resolution experiment with performance comparison. (A) The illustration depicts the workflow of performance evaluation based on experimentally acquired images. The following images were synthesized from the raw images for performance evaluation: a new image as the result of the 9990 raw images projected onto one single plane (projected image), and a new condensed image stack containing 37 frames with each frame as the result of 270 consecutive raw images projected onto one plane (condensed stack). Scale: 2 μm. (B) Performance of each metric by SPIDER and VDSR is determined by comparing recovered high-resolution images and super-resolution images reconstructed by the well-established methods Octane, QuickPALM, RapidSTORM and PeakFit. The best performance in each metric is marked in bold. Unit of localization accuracy: nm.


Despite the superior performance of VDSR, we found that VDSR is not capable of recovering images with pixel sizes different from the one used for training. When the LR image size was increased by a factor of two during pre-processing using linear interpolation, changing the pixel size from 200 nm/pixel to 100 nm/pixel, and the preprocessed image was used for reconstruction, we observed highly inaccurate recall by VDSR, giving rise to 4 times as many emitters as in the ground truth image. This indicates a single VDSR network cannot be used to recover HR images with an arbitrary scale factor. On the other hand, SPIDER offers the flexibility of recovering images of various pixel sizes, in contrast to VDSR, which can only perform satisfactorily at the specific pixel size prescribed during training (Fig. S3). Our results imply that multiple VDSR networks need to be trained, depending on the imaging parameters, in order to recover super-resolved images.

3. Discussion

In this work, we have conducted a parametric study to systematically compare the performance of sparsity-based and deep learning-based super-resolution image reconstruction algorithms. The performance was evaluated with both simulated and real microscopic images. The deep learning-based algorithm VDSR has shown superiority in processing time, emitter recall, localization accuracy and TPR. Based on these metrics, VDSR is recommended when accurate localization of recovered emitters is critical and when computational resources are limited. Since the loss evaluated by RMSE for VDSR showed little improvement after the 20th epoch, we recommend training VDSR for 20 epochs instead of 50 to further reduce the training time. The sparsity-based algorithm SPIDER has a higher TNR and, with optimized parameters, never recovers more emitters than there should be. These metrics make SPIDER the more desirable choice between the two when evaluation of the number of emitters is critical. Another sparsity-based method, GESPAR, is significantly more time-consuming than SPIDER and VDSR, and may require much more computing resources if high-resolution images are to be recovered on a comparable timescale. The high demands on time or computing resources thus render GESPAR less favorable, given that the motivation for applying an alternative image reconstruction algorithm is to save time or to reduce operational cost. The parameters of each algorithm were optimized for all settings to avoid overtraining the neural network and to put all algorithms on an equal footing.

To the best of our knowledge, this study is the first parametric study comparing the performance of different algorithms in recovering high-resolution microscopic images from lower resolution. Both sparsity- and deep learning-based algorithms have demonstrated remarkable image reconstruction capabilities in certain conditions. Our results provide a practical guide to establish a customized framework of super-resolution imaging experiments with specific research objectives. For example, if faster image acquisition is desirable, and it is critical to obtain accurate distance measurements between two species of fluorescently-tagged molecules, as in the case of mapping the kinetochore architecture [28] or focal adhesion organization [1], VDSR can be adopted for the task of image reconstruction. If counting the copy numbers of certain molecule species [29,30] in live cells is of critical importance, but the accuracy of their exact positions is secondary, the algorithm with the highest sum of recall and TNR at the prescribed emitter density should be used. Algorithms capable of accurately discerning connected emitters within the same image frame will be most suitable for recording fast cellular processes in subcellular structures consisting of proteins of interest at high density, without resorting to the time-consuming stochastic-based multi-frame acquisition. The prospective applications include integrin dynamics during focal adhesion turnover [31] or tubulin rearrangement during flagellum/cilium motion [32].

The differential strengths of SPIDER and VDSR might stem from the different cost functions, also known as loss functions or penalties, implemented in the algorithms. The cost function implemented in SPIDER is $C_{\mathrm{SPIDER}} = \|\boldsymbol{x} - C\hat{\boldsymbol{y}}\|^2 + \lambda \|\hat{\boldsymbol{y}}\|_0$, and that used by VDSR during training is $C_{\mathrm{VDSR}} = \frac{1}{2}\|\boldsymbol{y} - \hat{\boldsymbol{y}}\|^2$, where $\boldsymbol{x}$ represents the input (low-resolution, PSF-blurred) image, $\boldsymbol{y}$ represents the corresponding ground truth image, $\hat{\boldsymbol{y}}$ represents the recovered image as an estimate of the ground truth, $C$ represents the PSF, and $\lambda$ is the sparsity penalty coefficient for SPIDER. The higher TNR score of SPIDER observed in most conditions can be attributed to the number of emitters being taken into account during the iterative cost function minimization in the SPIDER computation, but not in VDSR training. By including the number of emitters as part of the cost function in a deep learning-based algorithm, one might further improve its performance by increasing the TNR score.
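The two cost functions can be written out directly, as in the sketch below. This is an illustration of the formulas only, not of the SPIDER or VDSR solvers; for simplicity the PSF-blurred estimate is resampled onto the grid of the input image, and lambda, C, x, y, and yhat are assumed to be given.

% Illustration of the two cost functions above.
% x: input low-resolution image, y: ground truth, yhat: current estimate,
% C: PSF kernel, lambda: sparsity penalty coefficient.
Cyhat      = imresize(conv2(double(yhat), C, 'same'), size(x));        % PSF blur, matched to the grid of x
costSPIDER = sum((double(x(:)) - Cyhat(:)).^2) + lambda*nnz(yhat);     % L2 data fit + L0 sparsity penalty
costVDSR   = 0.5*sum((double(y(:)) - double(yhat(:))).^2);             % squared-error loss used in training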

In addition, our study demonstrates that it may be feasible to obtain images with details beyond the diffraction limit computationally with appropriate experimental design to achieve favorable emitter density and connectivity conditions. We note that for VDSR, access to a super-resolution microscope might still be a prerequisite for establishing a training dataset, should simulated training sets prove inadequate to achieve the desired accuracy [33]. However, the possibility of accomplishing super-resolution computationally, once the training is concluded, might still appeal to research scientists who cannot afford the cost of using super-resolution microscopes frequently.

We note that variables such as scaling factors between HR and LR images, pixel sizes, or signal-to-noise ratios (SNR) were not explored, as these variables are beyond the scope of this study, which primarily concerns connectivity. It is a well-known limitation that VDSR is not compatible with arbitrary scale factors in super-resolution image reconstruction; recently, other deep learning models have been developed to mitigate this problem [34]. We note that the precision of emitter locations recovered by VDSR or SPIDER is limited by the pixel size of the reconstructed image. Methods based on deconvolution/fitting with Gaussian or PSF kernels can provide emitter coordinates with precision much finer than the pixel size of the image reconstructed by VDSR or SPIDER, provided the SNR is sufficiently high and the imaging system is accurately characterized [35]. To increase the precision of emitter locations with VDSR or SPIDER, one has to enlarge the matrix size of the reconstructed images by increasing the numbers of rows and columns. Yet this practice does not guarantee better accuracy, as many possible emitter coordinates will result in the same cost function values. It would also be relevant to investigate different noise levels in the LR images to be recovered. Conventionally, fast acquisition of super-resolution images using fluorophores with a short ON state requires a fast frame rate to achieve a short total image acquisition time. With only a few ON emitters in the field, a fast frame rate collects few photons per frame, reducing the SNR to the extent that noise subtraction becomes challenging. In this study, we aimed to explore an alternative scenario in which a different class of fluorophores is used, which by the nature of their photochemistry stay in the “ON” state for a relatively long period of time. As a result, more ON emitters per frame can be recorded, reducing the number of frames required to survey the whole fluorophore population in the field. In this scenario, acquisition time can be shortened without adopting a fast frame rate that would decrease the SNR to the extent that noise subtraction is unreliable. A more comprehensive study should consider including these factors when comparing the performance of algorithms.

The premise of SMLM, by which each emitter is assumed to equally represent a fluorescence-emitting single molecule regardless of fluorescence intensity, establishes the location accuracy of single emitters as an essential performance metric in this study. On the other hand, intensity accuracy, which measures whether emitter intensity is faithfully recovered in reconstructed images, is not a critical metric in the context of SMLM. This is because, while intensity accuracy can provide information about the fraction of time an emitter spends in the ON or OFF states during image acquisition, few known applications demand such information. Therefore, intensity accuracy was not evaluated in this work.

4. Methods

4.1 Dataset preparation

The ground truth images were generated by repeatedly filling non-zero pixels into a 1024 by 1024 zero matrix. For single-emitter and non-adjacent single-emitter images, a pair of random integers (x, y) between 1 and 1024 was generated each time. This new coordinate was discarded if it was adjacent to (for non-adjacent single emitters) or overlapped with (for single emitters) an existing non-zero pixel. If a new coordinate was not discarded, the pixel it represented became a non-zero pixel. For connected-emitter images, two pairs of random integers (x1, y1), (x2, y2) that were 10 pixels apart were generated each time. All pixels between these coordinates became non-zero pixels to ensure the non-zero pixels were adjacent to each other. The random generation process was repeated for each image until the sparsity level for that category was reached. The low-resolution images were generated by Gaussian filtering (σ=1.0077) the ground truth images, adding Gaussian noise (mean=10/255, variance=(5/255)²), followed by down-sampling by a factor of 2. The size of the Gaussian filter was chosen such that the low-resolution images would be representative of fluorescence microscopy images taken with a 63× magnification objective (pixel size = 180.6 nm), 1.4 NA at λ=600 nm. All images were 8-bit.
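A minimal MATLAB sketch of the connected-emitter placement and of the noise model described above follows. The exact segment-drawing routine is not given in the text, so a simple rounded linear interpolation between the two endpoints is assumed, and the function name is our own (the full scripts are provided in Code 1).

% Connected-emitter ground truth followed by the blur / noise / down-sampling pipeline.
gt = makeConnectedGroundTruth(1024, 0.0061);                 % e.g. lowest sparsity level
lr = imgaussfilt(im2double(gt), 1.0077);                     % PSF blur
lr = imnoise(lr, 'gaussian', 10/255, (5/255)^2);             % additive Gaussian noise as in the text
lr = im2uint8(imresize(lr, 0.5));                            % 2x down-sampling, back to 8-bit

function gt = makeConnectedGroundTruth(imgSize, sparsity)
    gt = zeros(imgSize, 'uint8');
    targetCount = round(sparsity * imgSize^2);
    while nnz(gt) < targetCount
        p1 = randi([1 imgSize], 1, 2);                       % first endpoint
        theta = 2*pi*rand;                                   % second endpoint 10 pixels away
        p2 = min(max(round(p1 + 10*[cos(theta) sin(theta)]), 1), imgSize);
        t  = linspace(0, 1, 11).';
        seg = unique(round((1-t).*p1 + t.*p2), 'rows');      % pixels between the two endpoints
        gt(sub2ind([imgSize imgSize], seg(:,1), seg(:,2))) = 255;
    end
end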

4.2 Recovery from low-resolution images

Briefly, the VDSR network consists of 20 2-D convolutional layers, each with 64 3 × 3 kernels and followed by a ReLU layer, except the last convolutional layer, which has a single 3 × 3 × 64 kernel to reconstruct the image. Images were padded with zeros (width = 1) to keep output sizes constant after each convolutional layer. The final layer is a regression layer computing the mean-squared error (MSE) between the residual image and the network output. The weights were randomly initialized with He’s method [36] and the biases were initialized to 0. The network was trained using stochastic gradient descent with an initial learning rate of 0.1 and a momentum of 0.9. The L2-norm with a threshold value of 0.01 was used for gradient clipping.
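The architecture and training options described above can be assembled with MATLAB's Deep Learning Toolbox roughly as follows. This is a sketch consistent with the description, not the authors' exact script; the input layer's patch size follows the next paragraph, and the training arrays are placeholders.

% 19 conv(64, 3x3) + ReLU pairs, then a final single-kernel conv layer and a
% regression (MSE) layer against the residual image. Biases default to zero.
layers = imageInputLayer([41 41 1], 'Normalization', 'none');
for k = 1:19
    layers = [layers
        convolution2dLayer(3, 64, 'Padding', 1, 'WeightsInitializer', 'he')
        reluLayer];
end
layers = [layers
    convolution2dLayer(3, 1, 'Padding', 1, 'WeightsInitializer', 'he')   % final 3x3x64 -> 1 kernel
    regressionLayer];

opts = trainingOptions('sgdm', ...
    'InitialLearnRate', 0.1, ...
    'Momentum', 0.9, ...
    'MaxEpochs', 50, ...
    'GradientThresholdMethod', 'l2norm', ...
    'GradientThreshold', 0.01);
% net = trainNetwork(trainInputs, trainResiduals, layers, opts);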

Prior to the performance evaluation in recovery of high-resolution images, VDSR was trained on a separate training dataset generated with the same settings as described in the “dataset preparation” sub-section. The training images were randomly cropped into patches of 41 by 41 pixels. 128 training patches (20.5% of the total image area) and 16 validation patches (2.6% of the total image area) were produced from each training image. The σ (standard deviation) of the Gaussian PSF and the ‘zoom’ parameter in SPIDER were set to match the conditions under which the dataset was generated. The κ, or sparsity parameter, in SPIDER was optimized from a number of trial runs, such that the algorithm recovers as many emitters as possible at the highest sparsity level without significantly compromising its performance at lower sparsity levels. The GESPAR parameter, the number of emitters to fit in a single patch, was determined automatically during recovery by dividing the total intensity in each image patch by the mean total intensity of a blurred emitter in a medium sparsity level image. A script was then used to automatically recover all low-resolution images using each algorithm. The time each algorithm consumed in recovery was obtained from the timestamps on the recovered images.
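A sketch of the patch preparation is given below, assuming the standard VDSR convention in which the network input is the low-resolution image interpolated back to the high-resolution grid and the regression target is the residual (ground truth minus interpolated input); variable names are our own.

% Random 41x41 training patches with residual targets, from one image pair.
patchSize  = 41;
numPatches = 128;                                          % per training image
lrUp = imresize(im2single(lowRes), 2);                     % LR image brought back to the HR grid
hr   = im2single(gt);                                      % ground truth image
X = zeros(patchSize, patchSize, 1, numPatches, 'single');  % network inputs
Y = zeros(patchSize, patchSize, 1, numPatches, 'single');  % residual targets
for k = 1:numPatches
    r = randi(size(hr,1) - patchSize + 1);
    c = randi(size(hr,2) - patchSize + 1);
    X(:,:,1,k) = lrUp(r:r+patchSize-1, c:c+patchSize-1);
    Y(:,:,1,k) = hr(r:r+patchSize-1, c:c+patchSize-1) - X(:,:,1,k);   % residual image
end
% net = trainNetwork(X, Y, layers, opts);                  % layers/opts from the sketch above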

4.3 Real super-resolution experiment

The super-resolution image set of microtubules was obtained from an open online super-resolution database [19], courtesy of Nicolas Olivier and Suliana Manley at Ecole Polytechnique Fédérale de Lausanne (EPFL). The images were acquired using a 100× objective with 1.46 NA at λ=635 nm. The pixel size was 100 nm/pixel, comparable to the settings used in our simulated low-resolution images.

The size of the raw microtubule images was 128 by 128 pixels. These images were recovered by Octane, QuickPALM, RapidSTORM, and PeakFit. The size of the images recovered by Octane, QuickPALM, and RapidSTORM was 1280 by 1280 pixels. The size of the image recovered by PeakFit was 1024 by 1024 pixels. The projected image and condensed stack were down-sampled from the raw images to 64 by 64 pixels, so that the image quality was consistent with that of the simulated images prepared for SPIDER and VDSR. The size of the images recovered by SPIDER and VDSR was 128 by 128 pixels. To compare the accuracy of reconstruction by Octane, QuickPALM, RapidSTORM, and PeakFit to that of SPIDER and VDSR, the images recovered by Octane, QuickPALM, RapidSTORM, and PeakFit were down-sampled to 128 by 128 pixels, so that all the recovered images were of the same size for performance comparison.

4.4 Software

All computational tasks including image generation, neural network training, and performance evaluation, were performed with MATLAB version R2020a (MathWorks) unless otherwise noted. Projection of microtubule images (Fig. 4(A)) was performed using ImageJ. The super-resolution localization and rendering of microtubule images by PeakFit was performed using the ImageJ plugin GDSC SMLM, developed by A. Herbert at the University of Sussex. The corresponding codes generated in this study are included in the Supplement 1 Code 1 [37].

Funding

National Institute of Biomedical Imaging and Bioengineering (R21EB029677).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper can be recreated from codes included as supplemental.

Supplemental document

See Supplement 1 for supporting content.

References

1. P. Kanchanawong, G. Shtengel, A. M. Pasapera, E. B. Ramko, M. W. Davidson, H. F. Hess, and C. M. Waterman, “Nanoscale architecture of integrin-based cell adhesions,” Nature 468(7323), 580–584 (2010). [CrossRef]  

2. S. Hugelier, M. Sliwa, and C. Ruckebusch, “A perspective on data processing in super-resolution fluorescence microscopy imaging,” J. Anal. Test. 2(3), 193–209 (2018). [CrossRef]  

3. L. Möckl and W. E. Moerner, “Super-resolution microscopy with single molecules in biology and beyond-essentials, current trends, and future challenges,” J. Am. Chem. Soc. 142(42), 17828–17844 (2020). [CrossRef]  

4. S. Hugelier, J. J. de Rooi, R. Bernex, S. Duwé, O. Devos, M. Sliwa, P. Dedecker, P. H. C. Eilers, and C. Ruckebusch, “Sparse deconvolution of high-density super-resolution images,” Sci. Rep. 6(1), 21413 (2016). [CrossRef]  

5. Y. Shechtman, A. Beck, and Y. C. Eldar, “GESPAR: efficient phase retrieval of sparse signals,” IEEE Trans. Signal Process. 62(4), 928–938 (2014). [CrossRef]  

6. J. Kim, J. K. Lee, and K. M. Lee, “Accurate image super-resolution using very deep convolutional networks,” IEEE Trans. Pattern Anal. Mach. Intell. 38(2), 1646–1654 (2015). [CrossRef]  

7. E. Nehme, L. E. Weiss, T. Michaeli, and Y. Shechtman, “Deep-STORM: super-resolution single-molecule microscopy by deep learning,” Optica 5(4), 458 (2018). [CrossRef]  

8. N. Boyd, E. Jonas, H. Babcock, and B. Recht, “DeepLoco: fast 3D localization microscopy using neural networks,” bioRxiv 267096 (2018).

9. W. Ouyang, A. Aristov, M. Lelek, X. Hao, and C. Zimmer, “Deep learning massively accelerates super-resolution localization microscopy,” Nat. Biotechnol. 36(5), 460–468 (2018). [CrossRef]  

10. S. Kumar Gaire, Y. Zhang, H. Li, R. Yu, H. F. Zhang, and L. Ying, “Accelerating multicolor spectroscopic single-molecule localization microscopy using deep learning,” Biomed. Opt. Express 11(5), 2705 (2020). [CrossRef]  

11. P. Zelger, K. Kaser, B. Rossboth, L. Velas, G. J. Schütz, and A. Jesacher, “Three-dimensional localization microscopy using deep learning,” Opt. Express 26(25), 33166 (2018). [CrossRef]  

12. A. Speiser, L.-R. Müller, U. Matti, C. J. Obara, W. R. Legant, J. Ries, J. H. Macke, and S. C. Turaga, “Teaching deep neural networks to localize single molecules for super-resolution microscopy,” (2019).

13. B. Diederich, P. Then, A. Jügler, R. Förster, and R. Heintzmann, “CellSTORM—Cost-effective super-resolution on a cellphone using dSTORM,” PLoS One 14(1), e0209827 (2019). [CrossRef]  

14. P. Zhang, S. Liu, A. Chaurasia, D. Ma, M. J. Mlodzianoski, E. Culurciello, and F. Huang, “Analyzing complex single-molecule emission patterns with deep learning,” Nat. Methods 15(11), 913–916 (2018). [CrossRef]  

15. J. Min, C. Vonesch, H. Kirshner, L. Carlini, N. Olivier, S. Holden, S. Manley, J. C. Ye, and M. Unser, “FALCON: Fast and unbiased reconstruction of high-density super-resolution microscopy data,” Sci. Rep. 4(1), 4577 (2015). [CrossRef]  

16. S. Gazagnes, E. Soubies, and L. Blanc-Feraud, “High density molecule localization for super-resolution microscopy using CEL0 based sparse approximation,” in Proceedings - International Symposium on Biomedical Imaging (IEEE Computer Society, 2017), pp. 28–31.

17. L. Zhu, W. Zhang, D. Elnatan, and B. Huang, “Faster STORM using compressed sensing,” Nat. Methods 9(7), 721–723 (2012). [CrossRef]  

18. A. Bechensteen, L. Blanc-Féraud, and G. Aubert, “New ℓ 2 − ℓ 0 algorithm for single-molecule localization microscopy,” Biomed. Opt. Express 11(2), 1153 (2020). [CrossRef]  

19. D. Sage, H. Kirshner, T. Pengo, N. Stuurman, J. Min, S. Manley, and M. Unser, “Quantitative evaluation of software packages for single-molecule localization microscopy,” Nat. Methods 12(8), 717–724 (2015). [CrossRef]  

20. D. Sage, T. A. Pham, H. Babcock, T. Lukes, T. Pengo, J. Chao, R. Velmurugan, A. Herbert, A. Agrawal, S. Colabrese, A. Wheeler, A. Archetti, B. Rieger, R. Ober, G. M. Hagen, J. B. Sibarita, J. Ries, R. Henriques, M. Unser, and S. Holden, “Super-resolution fight club: assessment of 2D and 3D single-molecule localization microscopy software,” Nat. Methods 16(5), 387–395 (2019). [CrossRef]  

21. W. Yang, X. Zhang, Y. Tian, W. Wang, J. H. Xue, and Q. Liao, “Deep learning for single image super-resolution: a brief review,” IEEE Trans. Multimed. 21(12), 3106–3121 (2019). [CrossRef]  

22. M. J. Rust, M. Bates, and X. Zhuang, “Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM),” Nat. Methods 3(10), 793–796 (2006). [CrossRef]  

23. J. Vogelsang, C. Steinhauer, C. Forthmann, I. H. Stein, B. Person-Skegro, T. Cordes, and P. Tinnefeld, “Make them blink: probes for super-resolution microscopy,” ChemPhysChem 11(12), 2475–2490 (2010). [CrossRef]  

24. R. Jungmann, M. S. Avendaño, J. B. Woehrstein, M. Dai, W. M. Shih, and P. Yin, “Multiplexed 3D cellular super-resolution imaging with DNA-PAINT and Exchange-PAINT,” Nat. Methods 11(3), 313–318 (2014). [CrossRef]  

25. V. Tatavarty, E.-J. Kim, V. Rodionov, and J. Yu, “Investigating sub-spine actin dynamics in rat hippocampal neurons with super-resolution optical imaging,” PLoS One 4(11), e7724 (2009). [CrossRef]  

26. R. Henriques, M. Lelek, E. F. Fornasiero, F. Valtorta, C. Zimmer, and M. M. Mhlanga, “QuickPALM: 3D real-time photoactivation nanoscopy image processing in ImageJ,” Nat. Methods 7(5), 339–340 (2010). [CrossRef]  

27. S. Wolter, A. Löschberger, T. Holm, S. Aufmkolk, M. C. Dabauvalle, S. Van De Linde, and M. Sauer, “RapidSTORM: Accurate, fast open-source software for localization microscopy,” Nat. Methods 9(11), 1040–1041 (2012). [CrossRef]  

28. X. Wan, R. P. O’Quinn, H. L. Pierce, A. P. Joglekar, W. E. Gall, J. G. DeLuca, C. W. Carroll, S.-T. Liu, T. J. Yen, B. F. McEwen, P. T. Stukenberg, A. Desai, and E. D. Salmon, “Protein architecture of the human kinetochore microtubule attachment site,” Cell 137(4), 672–684 (2009). [CrossRef]  

29. L. S. Fischer, C. Klingner, T. Schlichthaerle, M. T. Strauss, R. Böttcher, R. Fässler, R. Jungmann, and C. Grashoff, “Quantitative single-protein imaging reveals molecular complex formation of integrin, talin, and kindlin during cell adhesion,” Nat. Commun. 12(1), 1–10 (2021). [CrossRef]  

30. S. Tal and J. Paulsson, “Evaluating quantitative methods for measuring plasmid copy numbers in single cells,” Plasmid 67(2), 167–173 (2012). [CrossRef]  

31. Y. Chen, A. M. Pasapera, A. P. Koretsky, and C. M. Waterman, “Orientation-specific responses to sustained uniaxial stretching in focal adhesion growth and turnover,” Proc. Natl. Acad. Sci. 110(26), E2352–E2361 (2013). [CrossRef]  

32. S. Park and Y. Chen, Mechanics of Biological Systems (Morgan & Claypool Publishers, 2019). [CrossRef]  

33. L. Barna, B. Dudok, V. Miczán, A. Horváth, Z. I. László, and I. Katona, “Correlated confocal and super-resolution imaging by VividSTORM,” Nat. Protoc. 11(1), 163–183 (2016). [CrossRef]  

34. Y. Fu, J. Chen, T. Zhang, and Y. Lin, “Residual scale attention network for arbitrary scale image super-resolution,” Neurocomputing 427, 201–211 (2021). [CrossRef]  

35. H. Mazidi, T. Ding, A. Nehorai, and M. D. Lew, “Quantifying accuracy and heterogeneity in single-molecule super-resolution microscopy,” Nat. Commun. 11(1), 6353–6411 (2020). [CrossRef]  

36. K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: surpassing human-level performance on imagenet classification,” in Proceedings of the IEEE International Conference on Computer Vision (Institute of Electrical and Electronics Engineers Inc., 2015), pp. 1026–1034.

37. J. Chen and Y. Chen, “Parametric comparison between sparsity-based and deep learning-based image reconstruction of super-resolution fluorescence microscopy supplemental codes,” figshare (2021), https://doi.org/10.6084/m9.figshare.14428886.

Supplementary Material (2)

Code 1: MATLAB codes to generate the simulated images used in the study and to evaluate the performance of the algorithms.
Supplement 1: Supplemental document including a description of supplemental Code 1 and additional figures.
