Spatial resolution, signal-to-noise and information capacity of linear imaging systems


Abstract

A simple model for image formation in linear shift-invariant systems is considered, in which both the detected signal and the noise variance are varying slowly compared to the point-spread function of the system. It is shown that within the constraints of this model, the square of the signal-to-noise ratio is always proportional to the “volume” of the spatial resolution unit. In the case of Poisson statistics, the ratio of these two quantities divided by the incident density of the imaging particles (e.g. photons) represents a dimensionless invariant of the imaging system, which was previously termed the intrinsic imaging quality. The relationship of this invariant to the notion of information capacity of communication and imaging systems, which was previously considered by Shannon, Gabor and others, is investigated. The results are then applied to a simple generic model of quantitative imaging of weakly scattering objects, leading to an estimate of the upper limit for the amount of information about the sample that can be obtained in such experiments. It is shown that this limit depends only on the total number of imaging particles incident on the sample, the average scattering coefficient, the size of the sample and the number of spatial resolution units.

© 2016 Optical Society of America

1. Introduction

Among the most important characteristics of many imaging, communication and measurement systems are their capacity to encode or transmit large quantities of information, resolve sufficiently fine spatial details and provide output with a high signal-to-noise ratio (SNR) at the lowest possible radiation doses or input energy levels [1, 2]. For mainly historical reasons (the abundance of photons in typical visible-light imaging applications), imaging performance characteristics related to the incident particle fluence have not been extensively studied in classical optics. However, such characteristics, and the interplay between them, have attained additional relevance in recent years in the context of biomedical imaging, where the samples are often sensitive to radiation doses [3], in certain astronomical methods where the detectable photon flux can be extremely low [4], as well as in some problems related to the foundations of optics and quantum physics [5, 6]. In X-ray medical imaging, in particular, it is critically important to minimize the radiation dose delivered to the patient, while still being able to obtain 2D or 3D images with sufficient spatial resolution and SNR to detect the features of interest, such as small tumours [7, 8]. In this context, an imaging system (e.g. a CT scanner) must be able to maximize the amount of relevant information that can be extracted from the collected images, while keeping the number of X-ray photons impinging on the patient sufficiently low. The present paper addresses some mathematical properties of generic imaging systems that are likely to be important in the context of designing medical imaging instruments, and may also have relevance to some fundamental aspects of quantum physics and information theory.

We have recently introduced a notion of intrinsic imaging quality, which incorporates both the spatial resolution and the noise propagation properties of an imaging system [6–10]. Note that in our treatment we consider all the components, beginning from the source of imaging particles/radiation and ending with the optional computer-aided post-processing of the images, to be parts of a complete “imaging system”, with all these components potentially affecting the overall imaging quality of the system. Compared to previous publications on this and related topics, Section 2 of this paper extends the notion of intrinsic imaging quality from “flat-field” (featureless) images to the situation where the signal and noise levels can vary across the images, but the rate of variation is slow compared to that of the point-spread function (PSF) of the imaging system. In Section 3, we consider the notion of information capacity of imaging systems, leading to results which are consistent with the ones previously reported on the basis of the classical approach originated by C. Shannon in the context of communication systems [11] and later extended by others to imaging systems [12–17]. These results are also related to the theory of communication channels and modes [18–22], as discussed in the Appendix. In Section 4, we apply the obtained results for the intrinsic imaging quality and information capacity of linear shift-invariant (LSI) systems to the analysis of a simple model of quantitative imaging of weakly scattering samples. In particular, we derive an estimate for the maximum information about the sample that can be extracted in the relevant scattering experiments. We show that such a limit for the extractable information depends only on the total number of imaging particles used (i.e. on the radiation dose), the number of spatial resolution units and the “scattering power” of the sample, which is defined as the product of the average scattering coefficient and the linear size of the sample. The main results of the present paper are summarized in Section 5.

2. Noise and spatial resolution

Consider a simple model for formation of images in an LSI system which involves a stochastic flux of imaging particles incident on a position-sensitive detector, and a linear filter function describing the deterministic post-detection action of the imaging system. Here the fluence of particles (i.e. the particle density) in an image registered by the detector will typically result from interaction of particles emitted by a source with a scattering object (sample) that is being imaged, but we leave the analysis of effects of such interaction until Section 4 below.

Different realizations (instances) of images registered by the detector under the same conditions correspond to different sample functions of the stochastic detected particle fluence distribution $S_{in}(x)$, $x\in\mathbb{R}^n$, where $n$ is the dimensionality of the image. Note that here we generalize the concept of a “detection system” to any positive integer dimension $n$, which can be convenient e.g. for the description of 3D imaging. We will call the expectation value (ensemble average) $\bar S_{in}(x)$ the “detected signal” (which serves as an “input” for the post-detection signal processing system). The difference $\tilde S_{in}(x)=S_{in}(x)-\bar S_{in}(x)$ is the corresponding “noise process”.

The second-order correlation properties of the particle fluence registered by the detector are described by the autocovariance function $\Gamma(x,y)=\overline{\tilde S_{in}(x)\tilde S_{in}(y)}$. We assume that the autocovariance function has the following form:

$$\Gamma(x,y)=\frac{H(x-y)}{H(0)}\,\sigma_{in}^{2}\!\left(\frac{x+y}{2}\right), \qquad (1)$$
where the ratio $H(x-y)/H(0)$ has the meaning of the degree of correlation, and $\sigma_{in}^2(x)=\Gamma(x,x)$ is the noise variance. Note that this model is similar in form to that of the quasi-homogeneous source [2], but while the latter model represents the second-order correlation function of the complex amplitudes, the present model, Eq. (1), is of the fourth order with respect to the complex amplitudes. The key assumption that we make here is that the function $H(x)$ is narrow compared to the detected signal $\bar S_{in}(x)$ and the noise variance $\sigma_{in}^2(x)$, i.e. $\bar S_{in}(x)$ and $\sigma_{in}^2(x)$ are almost constant over distances equal to the width of $H(x)$. This means that the autocovariance function $\Gamma(x,y)$ is quasi-stationary, i.e. it can be written in the form $\Gamma_x(h)=\Gamma(x+h/2,\,x-h/2)=\sigma_{in}^2(x)\,H(h)/H(0)$, with a slowly varying “envelope” determined by $\sigma_{in}^2(x)$ and a rapidly varying component defined by $H(h)$. This allows us to introduce, by means of the Wiener-Khinchin theorem, the corresponding noise power spectrum (which is also a slowly varying function of $x$): $W_x(\xi)=\hat\Gamma_x(\xi)=\sigma_{in}^2(x)\,\hat H(\xi)/H(0)$, where the hat symbol denotes the Fourier transform with respect to $h$.

As the noise power spectrum $W_x(\xi)$ is a non-negative and even function of $\xi$, the same is true for $\hat H(\xi)$. Therefore, there exists a real-valued even function $P(x)$ such that $\hat H(\xi)=|\hat P(\xi)|^2$, and hence $H(x)=\int P(y)\,P(x+y)\,dy$. The function $P(x)$ represents the “native” point-spread function (PSF) of the imaging system, which can reflect both the correlation properties of the incident particle fluence (i.e. the properties of the source and the imaging set-up) and the PSF of the detector. We assume here that $P(x)$ is non-negative. Then, taking into account that $\hat H(0)=\|H\|_1=\|P\|_1^2$ and $H(0)=\|P\|_2^2$, where $\|f\|_p=(\int|f(x)|^p\,dx)^{1/p}$ is the usual $L^p$ norm, we obtain in particular that $W_x(0)=\int\Gamma_x(h)\,dh=\sigma_{in}^2(x)\,\hat H(0)/H(0)=\sigma_{in}^2(x)\,\|P\|_1^2/\|P\|_2^2$.

According to its definition and the properties of the covariance function assumed above, the native PSF $P(x)$ is much narrower than both the detected signal $\bar S_{in}(x)$ and the noise variance $\sigma_{in}^2(x)$. Note that such behavior is “natural” for a well-designed imaging system: it ensures, in particular, that the detector PSF does not significantly smear typical incident signals. Model examples of the native PSF are given by $P_{Gauss}(x)=(2\pi)^{-n/2}\sigma^{-n}\exp[-|x|^2/(2\sigma^2)]$ and $P_{rect}(x)=(2\sigma)^{-n}\chi_{[-\sigma,\sigma]^n}(x)$, where $\chi_{[-\sigma,\sigma]^n}(x)$ is the function equal to one in the cubic area around the origin of coordinates with side length $2\sigma$, and equal to zero outside this area. An autocovariance function of the type described above is obtained, for example, in the case of a filtered Poisson point process [1] when the width of the filter is small compared to the typical variation length of the mean photon fluence.
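
The norm relations $\|H\|_1=\|P\|_1^2$ and $H(0)=\|P\|_2^2$ are straightforward to check numerically. The following minimal Python sketch (an illustration, not part of the derivation) does so for a Gaussian native PSF with $\sigma=1$; the grid parameters are arbitrary:

```python
import numpy as np

# Illustrative check of ||H||_1 = ||P||_1^2 and H(0) = ||P||_2^2 for a
# Gaussian native PSF (sigma = 1); grid parameters are arbitrary choices.
x = np.linspace(-8, 8, 2**14, endpoint=False)
dx = x[1] - x[0]
P = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)     # native PSF P(x)
H = np.correlate(P, P, mode="same") * dx       # H(x) = int P(y) P(x+y) dy

print(np.sum(H) * dx, (np.sum(P) * dx)**2)     # ||H||_1 vs ||P||_1^2: both ~1
print(H[len(H) // 2], np.sum(P**2) * dx)       # H(0) vs ||P||_2^2: both ~0.2821
```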

We want to find out how the output of the imaging system changes after the action of an LSI transformation with a non-negative PSF (filter function) $T(x)$:

$$S(x)=(T*S_{in})(x)=\int T(x-y)\,S_{in}(y)\,dy. \qquad (2)$$
Let us calculate the average signal (i.e. the expectation) and the (noise) variance of the filtered process in Eq. (2) under the assumption that $T(x)$ varies slowly compared to $H(x)$ (i.e. $T(x)$ is almost constant over distances comparable with the correlation length of the input noise), but the width of $T(x)$ is much smaller than the characteristic length of variation of the detected signal $\bar S_{in}(x)$ and noise variance $\sigma_{in}^2(x)$. Under these assumptions, we have:

$$\bar S(x)=\int \bar S_{in}(x-y)\,T(y)\,dy \approx \bar S_{in}(x)\,\|T\|_1, \qquad (3)$$

and

$$\sigma^2(x)=\iint \Gamma(x-y,\,x-y')\,T(y)\,T(y')\,dy\,dy' \approx \sigma_{in}^2(x)\,\frac{\|H\|_1}{H(0)}\,\|T\|_2^2. \qquad (4)$$

Therefore, the following expression can be given for the output SNR:

$$\mathrm{SNR}^2(x)=\frac{\bar S^2(x)}{\sigma^2(x)}=\mathrm{SNR}_{in}^2(x)\,\frac{\|P\|_2^2}{\|P\|_1^2}\,\frac{\|T\|_1^2}{\|T\|_2^2}, \qquad (5)$$

where $\mathrm{SNR}_{in}^2(x)=\bar S_{in}^2(x)/\sigma_{in}^2(x)$.

Having introduced the notion of SNR in the LSI system defined by Eq. (2), we now turn to the notion of spatial resolution. The intrinsic spatial resolution $\Delta x$ of a system described by Eq. (2) can be defined via the variance of its PSF as follows,

$$\Delta x=\left(\frac{4\pi}{n}\,\frac{\int |x|^2\,T(x)\,dx}{\int T(x)\,dx}\right)^{1/2}=\left(\frac{4\pi}{n}\,\frac{\|\,|x|^2 T\,\|_1}{\|T\|_1}\right)^{1/2}, \qquad (6)$$
where the factor $4\pi/n$ is introduced for normalization purposes (see Table 1), and we have assumed for simplicity that $T(x)\ge 0$ is properly centred with respect to $x=0$, i.e. its first integral moment is equal to zero.


Table 1. Spatial resolution associated with different PSFs. All PSFs are normalized such that $\|T(x;\sigma)\|_1=1$.

Following the ideas discussed in [2, 23] and elsewhere, it is also possible to define the spatial resolution of the LSI system by an alternative expression:

$$\Delta_2 x=\left(\|T\|_1^2/\|T\|_2^2\right)^{1/n}. \qquad (7)$$
The results of calculation of $(\Delta x)^2$ and $(\Delta_2 x)^2$ for some popular PSFs shown in Table 1 demonstrate that the spatial resolution defined by Eq. (7) produces similar, and in some cases more natural, results compared to the more conventional definition given by Eq. (6). Note in particular that in the case of our model PSF functions $P_{Gauss}(x)$ and $P_{rect}(x)$, we get $(\Delta_2 x)_{in}=2\sqrt{\pi}\,\sigma$ and $(\Delta_2 x)_{in}=2\sigma$, respectively, which correlates well with the usual understanding of the “width” of these two functions.
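
Both resolution measures are easy to evaluate numerically for the model PSFs considered here. The following sketch (1D case, $\sigma=1$; the grid and the inclusion of an Epanechnikov PSF, used in the discussion below, are illustrative choices) reproduces the values quoted above:

```python
import numpy as np

# Numerical evaluation of Eq. (6) and Eq. (7) in 1D (n = 1, sigma = 1).
x = np.linspace(-10, 10, 200001)
dx = x[1] - x[0]

psfs = {
    "Gauss": np.exp(-x**2 / 2) / np.sqrt(2 * np.pi),
    "rect":  np.where(np.abs(x) <= 1, 0.5, 0.0),
    "Epanechnikov": np.clip(1 - x**2, 0, None),
}

for name, T in psfs.items():
    n1 = np.sum(T) * dx                       # ||T||_1
    n2sq = np.sum(T**2) * dx                  # ||T||_2^2
    var = np.sum(x**2 * T) * dx / n1          # normalized second moment
    d1 = np.sqrt(4 * np.pi * var)             # Delta x, Eq. (6)
    d2 = n1**2 / n2sq                         # Delta_2 x, Eq. (7)
    print(f"{name:13s} Delta x = {d1:.4f}, Delta_2 x = {d2:.4f}")
# Gauss: both equal 2*sqrt(pi) ~ 3.5449; rect: 2.0467 and 2.0000.
```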

Combining Eqs. (5) and (7), we can obtain:

$$\frac{\mathrm{SNR}^2(x)}{(\Delta_2 x)^n}=\frac{\mathrm{SNR}_{in}^2(x)}{(\Delta_2 x)_{in}^{\,n}}, \qquad (8)$$
where $(\Delta_2 x)_{in}=(\|P\|_1^2/\|P\|_2^2)^{1/n}$ is the spatial resolution associated with the native PSF $P(x)$. Equation (8) means that the ratio of the square of the SNR to the volume of the spatial resolution unit is invariant with respect to actions of LSI systems under the assumed conditions. In other words, an exact duality exists between the signal-to-noise ratio and the spatial resolution of the imaging system. It can be verified that any additional convolution with another filter $V(x)$, which is much wider than $T(x)$ but much narrower than the typical variation length of the detected signal, will result in the same equation for the noise variance as Eq. (4), with $T(x)$ replaced by $V(x)$. This can be proved using the facts that the two subsequent convolutions are equivalent to a single convolution of the original image with the PSF $(T*V)(x)$, that $\|T*V\|_1^2=\|T\|_1^2\,\|V\|_1^2$, and that, under the stated assumptions, $\|T*V\|_2^2\approx\|T\|_1^2\,\|V\|_2^2$.

If the detector area $A^n$ is fixed, the spatial resolution “area” and the total number $M_2$ of resolvable effective “pixels” (resolution units) are inversely proportional to each other: $M_2=A^n/(\Delta_2 x)^n$. Then Eq. (8) implies that the product of the number of resolvable spatial units (pixels) and the square of the SNR is also invariant under the action of LSI systems with suitable PSFs:

$$M_2\,\mathrm{SNR}^2(x)=M_{2,in}\,\mathrm{SNR}_{in}^2(x). \qquad (9)$$
Equation (9) shows that, at least under the stated assumptions, the invariant $M_2\,\mathrm{SNR}^2(x)$ does not change after post-processing operations that can be described in the form of a convolution or deconvolution equation.

Note that, while the invariant defined by Eq. (9) is dimensionless, the invariant defined by Eq. (8) has the dimensionality of inverse length taken to the power of $n$. One way to “naturally” normalize this quantity and make it dimensionless is to divide it by $\bar S_{in}(x)$. Indeed, in the case of a filtered Poisson point process with a filter function $T(x)$ which is much narrower than the variation length of the detected signal $\bar S_{in}(x)$ and the noise variance $\sigma_{in}^2(x)$, one has [1, Sect. 11.3.9]: $\bar S(x)=(T*\bar S_{in})(x)\approx\|T\|_1\,\bar S_{in}(x)$ and $\sigma^2(x)=(T^2*\bar S_{in})(x)\approx\|T\|_2^2\,\bar S_{in}(x)$. Then using Eq. (7) we obtain:

$$\frac{\mathrm{SNR}^2(x)}{(\Delta_2 x)^n\,\bar S_{in}(x)}=1. \qquad (10)$$
More generally, the quantity defined in Eq. (8) can be expressed as $\mathrm{SNR}_{in}^2(x)/(\Delta_2 x)_{in}^{\,n}=\bar S_{in}^2(x)\,\|P\|_2^2/(\sigma_{in}^2(x)\,\|P\|_1^2)=\bar S_{in}^2(x)/W_x(0)$, and hence the latter expression gives the “normalization factor” that correctly accounts both for the strength of the detected signal and for its variation length scale. For Poisson processes one has $\bar S_{in}^2(x)/W_x(0)=\bar S_{in}(x)$, which coincides with the normalization factor chosen above for this case.
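
Equation (10) can be confirmed by direct simulation. The sketch below (a toy 1D model; the mean fluence and the rectangular filter widths are arbitrary illustrative values) generates a filtered Poisson process and checks that $\mathrm{SNR}^2(x)/[(\Delta_2 x)\,\bar S_{in}(x)]\approx 1$ independently of the filter width:

```python
import numpy as np

# Toy 1D filtered Poisson process; all parameters are illustrative.
rng = np.random.default_rng(1)
dx, n_pix, s_bar, reps = 0.01, 100, 40.0, 50000
counts = rng.poisson(s_bar * dx, size=(reps, n_pix))   # detected counts
s_in = counts / dx                                     # sampled fluence S_in(x)

for w in (0.2, 0.8):                 # two rectangular filter widths
    k = round(w / dx)
    T = np.full(k, 1.0 / w)          # rect PSF with ||T||_1 = 1
    s_out = s_in[:, :k] @ (T * dx)   # filtered value at one point
    snr2 = s_out.mean()**2 / s_out.var()
    d2 = 1.0 / (np.sum(T**2) * dx)   # Delta_2 x = ||T||_1^2 / ||T||_2^2
    print(w, snr2 / (d2 * s_bar))    # Eq. (10): both ratios ~ 1
```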

We have previously introduced and studied [6–10] the following dimensionless quantity, which is close in form to Eq. (10) and incorporates both the noise propagation and the spatial resolution properties of an LSI system:

$$Q_S^2 \equiv \frac{\mathrm{SNR}^2(x)}{(\Delta x)^n\,\bar S_{in}(x)}. \qquad (11)$$
This quantity was termed the “intrinsic imaging quality” characteristic (per single particle) of the imaging system. Note that although both the SNR and $\bar S_{in}$ can vary across the image, they have been assumed to be “quasi-stationary”, i.e. approximately constant within the width of the filter function of the LSI system. It turns out that under these conditions the imaging quality $Q_S$ is constant across the image. Indeed, substituting Eq. (10) into Eq. (11) and then taking into account the definitions of the spatial resolution from Eqs. (6) and (7), we obtain
$$Q_S^2=\frac{(\Delta_2 x)^n}{(\Delta x)^n}=\frac{\|T\|_1^{2+n/2}}{(4\pi/n)^{n/2}\,\|\,|x|^2 T\,\|_1^{n/2}\,\|T\|_2^2}, \qquad (12)$$
i.e. the intrinsic imaging quality characteristic can be expressed purely in terms of the filter PSF alone. It is easy to verify that the functional on the right-hand side of Eq. (12) is bi-invariant with respect to multiplication of the PSF or its argument by any positive constant. Therefore, $Q_S$ is independent of the height or width of the system's PSF, and depends only on its functional form (see Table 1). It was proven in [9] that this functional is always bounded from above, i.e. the inequality $Q_S^2(T)\le 1/C_n$ holds and is exact for LSI systems with point-spread functions $T(x)$ having finite mathematical expectation, variance and energy, where $C_n=2^n\,\Gamma(n/2)\,n(n+2)/(n+4)^{n/2+1}$ is the Epanechnikov constant [7,9,24]. The maximum, i.e. the equality $Q_S^2(T)=1/C_n$, is achieved on Epanechnikov PSFs $T_E(x)=(1-|x|^2)_+$ [9,24], where the subscript “+” denotes that $T_E(x)=0$ at points where the expression in brackets is negative. Note that $C_1=6\,(\pi/125)^{1/2}\approx 0.95$ and $C_n\to 0$ monotonically as $n\to\infty$. Although the definition of the intrinsic quality was originally introduced for LSI systems [7], we have since extended it to some non-linear systems. One such example, studied in [6], corresponded to the case of the ideal observer SNR [1] in the famous Young double-slit diffraction experiment.
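
The bound and its maximizer can be verified numerically. The following sketch (1D case, arbitrary grid) evaluates $Q_S^2$ for Epanechnikov and Gaussian PSFs and compares the results with $1/C_1$:

```python
import numpy as np
from math import gamma, pi

def qs2_1d(T, x):
    """Q_S^2 of a 1D PSF via Eq. (12): Delta_2 x / Delta x."""
    dx = x[1] - x[0]
    n1, n2sq = np.sum(T) * dx, np.sum(T**2) * dx
    var = np.sum(x**2 * T) * dx / n1
    return (n1**2 / n2sq) / np.sqrt(4 * pi * var)

x = np.linspace(-5, 5, 100001)
T_epan = np.clip(1 - x**2, 0, None)     # Epanechnikov PSF T_E(x)
T_gauss = np.exp(-x**2 / 2)

C1 = 2 * gamma(0.5) * 1 * (1 + 2) / (1 + 4)**1.5   # C_n formula at n = 1
print(1 / C1)                  # ~1.0513
print(qs2_1d(T_epan, x))       # attains the bound: ~1.0513
print(qs2_1d(T_gauss, x))      # Gaussian: 1.0000 <= 1/C_1
```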

We will also need an estimate for the average value of $\mathrm{SNR}^2(x)$ over all spatial resolution units in the image. In the case of incident fluences with Poisson statistics, we obtain from Eq. (10): $\mathrm{SNR}^2(x_m)=(\Delta_2 x)^n\,\bar S_{in}(x_m)=\bar n(x_m)$, which can be identified with the average number of imaging particles registered in the “pixel” (spatial resolution unit) with the center at $x=x_m$. Let us sum this quantity over all points $x_m$, $m=1,2,\dots,M_2$, corresponding to the centers of the output spatial resolution units: $M_2\,\langle\mathrm{SNR}^2\rangle=\sum_{m=1}^{M_2}\mathrm{SNR}^2(x_m)=\sum_{m=1}^{M_2}(\Delta_2 x)^n\,\bar S_{in}(x_m)=\sum_{m=1}^{M_2}\bar n(x_m)=N$, i.e.

$$\langle\mathrm{SNR}^2\rangle=N/M_2, \qquad (13)$$
where N is the average total number of incident particles per registered image and the angular brackets denote the average over all resolution units in the sample.

A useful expression for the intrinsic imaging quality characteristic can be obtained by first noting that $Q_S^2=(\Delta_2 x)^n/(\Delta x)^n=M/M_2$, where $M=A^n/(\Delta x)^n$, and then substituting the expression $M_2=N/\langle\mathrm{SNR}^2\rangle$ obtained from Eq. (13), with the result given by:

$$Q_S^2=(M/N)\,\langle\mathrm{SNR}^2\rangle. \qquad (14)$$
As discussed in the next section, Eq. (14) is somewhat similar in form to the expression for information capacity of imaging systems.

3. Information capacity of imaging systems

Consider a simple “imaging” experiment that involves $N$ incident particles (“imaging quanta”, e.g. photons) and a 2D detector with $M$ pixels (where both $N$ and $M$ are non-negative integer numbers). We would like to calculate first how many distinct “images” can be produced in this situation; in other words, how many distinct sequences $(n_1,n_2,\dots,n_M)$ exist with $\sum_{m=1}^{M}n_m=N$ (all $n_m$ are non-negative integers). Two sequences $(n_1,n_2,\dots,n_M)$ and $(n'_1,n'_2,\dots,n'_M)$ are considered distinct if $n_m\neq n'_m$ for at least one index $m$, $1\le m\le M$. This problem is equivalent to the one about the number of different ways of distributing $N$ indistinguishable (identical) balls into $M$ distinguishable bins. The corresponding total number of different combinations, which we denote by $X(M,N)$, can be found by utilizing the following recursive relationship:

$$X(M,N)=\sum_{n=0}^{N}X(M-1,n). \qquad (15)$$
Equation (15) can be understood by considering the last ($M$-th) bin: obviously, it can contain $n_M=0,1,\dots,$ or $N$ balls, and in each such case the remaining $(N-n_M)$ balls can be distributed arbitrarily between the first $(M-1)$ bins. Comparing Eq. (15) with the known theorem about the “rising sum of binomial coefficients” (see e.g. [25]), which states:
$$\sum_{n=0}^{N}\binom{M+n}{M}=\binom{M+1+N}{M+1}=\binom{M+1+N}{N}, \qquad (16)$$
we can conclude that the quantity
$$X(M,N)=\binom{M-1+N}{M-1}=\frac{(M-1+N)!}{(M-1)!\,N!} \qquad (17)$$
satisfies the recursive relationship of Eq. (15). The correctness of the “normalization” in Eq. (17) is justified by considering the case of $M=1$ bin: obviously, in this case the number of possible different placements of the balls is equal to 1 (they can only all go into the one available bin). This agrees with the fact that, according to Eq. (17), $X(1,N)=1$ for any $N$.

Now let us consider the possibility that some of the particles incident on the object can be absorbed or strongly deflected by the object, etc., and as a result will not be registered by the detector. This means that some of the resultant “images” in this situation can be created with a different number $N'$ of registered quanta, where $N'$ can be any integer satisfying $0\le N'\le N$. Let us calculate the number, $Y(M,N)$, of all distinct images that can be formed this way. Obviously,

$$Y(M,N)=\sum_{N'=0}^{N}X(M,N'). \qquad (18)$$
Knowing the explicit form of $X(M,N)$ given by Eq. (17), we can use Eq. (16) again to evaluate the sum on the right-hand side of Eq. (18), with the result:
$$Y(M,N)=\binom{M+N}{M}=\binom{M+N}{N}=\frac{(M+N)!}{M!\,N!}. \qquad (19)$$
Note, in particular, that in the simplest case of one bin, $M=1$, Eq. (19) gives $Y(1,N)=N+1$. This is the correct result, as the only bin available in this case can contain any number of balls between 0 and $N$.
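
Both the recursion of Eq. (15) and the closed forms of Eqs. (17) and (19) are easy to verify directly for small $M$ and $N$; a minimal sketch:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def X(M, N):
    """Number of ways to place N identical balls in M distinct bins, Eq. (15)."""
    return 1 if M == 1 else sum(X(M - 1, n) for n in range(N + 1))

for M in range(1, 7):
    for N in range(10):
        assert X(M, N) == comb(M - 1 + N, M - 1)                     # Eq. (17)
        assert sum(X(M, k) for k in range(N + 1)) == comb(M + N, N)  # Eqs. (18)-(19)

print(X(2, 3))   # 4 distinct images: (0,3), (1,2), (2,1), (3,0)
```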

A simpler approximation for $Y(M,N)$ in Eq. (19) can be derived for large values of $M$ and $N$. According to the refined version of Stirling's formula for the factorial [26]:

$$n!=\sqrt{2\pi}\;n^{\,n+1/2}\,e^{-n}\,e^{w_n},\qquad n=1,2,\dots, \qquad (20)$$

where $1/(12n+1)<w_n<1/(12n)$.

Let us introduce a notion of “information capacity” $D(M,N)=\log_2 Y(M,N)$, by analogy with the original information capacity definition introduced by Shannon [11] for communication systems and later extended to imaging systems [12–17]. We will also consider the information capacity per single particle: $D_S(M,N)=D(M,N)/N=(1/N)\log_2 Y(M,N)$. From Eqs. (19)-(20) we obtain:

$$D(M,N)=(M+N+0.5)\log_2(M+N)-(M+0.5)\log_2 M-(N+0.5)\log_2 N+O(1), \qquad (21)$$
where $O(1)$ does not exceed a constant of the order of 1 for any non-negative integers $M$ and $N$. It may be interesting to note that the expressions for $Y(M,N)$ and $D(M,N)$ are symmetric with respect to $M$ and $N$, even though by definition all the imaging particles are considered identical (indistinguishable), while all the detector pixels are uniquely identifiable (distinguishable).
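
The quality of the approximation in Eq. (21) can be probed numerically: the difference between the exact $\log_2 Y(M,N)$ and the three leading terms of Eq. (21) stays close to $\log_2\sqrt{2\pi}\approx 1.33$, i.e. it is indeed $O(1)$. A minimal sketch (with arbitrary test values of $M$ and $N$):

```python
from math import comb, log2

def D_exact(M, N):
    return log2(comb(M + N, N))                 # D = log2 Y(M,N), Eq. (19)

def D_lead(M, N):
    # Leading terms of Eq. (21), without the O(1) remainder.
    return ((M + N + 0.5) * log2(M + N)
            - (M + 0.5) * log2(M) - (N + 0.5) * log2(N))

for M, N in [(10, 100), (100, 10**4), (1000, 10**6)]:
    print(M, N, round(D_lead(M, N) - D_exact(M, N), 4))   # ~1.325 each
```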

Usually in practice the number of imaging particles (e.g. photons) in an image is much larger than the number of detector pixels, which corresponds to the case $N\gg M$ in the above equations. In that case Eq. (21) leads to the following expressions:

$$D(M,N)=M\,[\log_2(N/M)+O(1)], \qquad (22)$$
$$D_S(M,N)=(M/N)\,[\log_2(N/M)+O(1)], \qquad (23)$$
when $N/M\to\infty$.

Let us now postulate that two images are distinguishable if and only if the difference between their signal values in at least one pixel exceeds the sum of the standard deviations of the noise in that pixel in the two images (this can easily be modified to include a distinguishability factor $\mathrm{SNR}_{min}$, which can e.g. be equal to 5 if the Rose criterion is applied). In the “noise-free” case considered above, distinguishability was equivalent to a difference in the detected particle numbers of at least one particle in at least one pixel, which formally corresponds e.g. to the standard deviation of noise being equal to 1/2 or less in all pixels of both images.

Consider the case where the detector is not perfect and contains a fixed amount of additive noise in each pixel. Assuming that the level of noise is $\sigma$, two signals, corresponding to $n_1$ and $n_2$ registered particles in a pixel, respectively, can be considered distinguishable e.g. if $|n_1-n_2|\ge 2\sigma$. This is equivalent to $|n_1/(2\sigma)-n_2/(2\sigma)|\ge 1$. In other words, the number of distinguishable images can now be calculated in terms of “bunches” of $2\sigma$ particles, instead of the individual particles (note that the fractions of bunches that correspond to signal increments below the level of distinguishability may lead to a reduction of the total number of full bunches). Therefore, the result obtained above in Eqs. (22)-(23) should remain valid in the case of additive detector noise, with the replacement of $N$ by $N/(2\sigma)$. Note also that $\log_2(N/(2\sigma M))=\log_2(N/M)-\log_2(2\sigma)=\log_2(N/M)+O(1)$, and hence the form of Eqs. (22)-(23) remains unchanged in this case.

Finally, consider the case of a photon-counting detector, which corresponds to Poisson statistics with the expectation and variance both equal to the mean number $\bar n$ of photons per pixel, $\sigma=\bar n^{1/2}$. Let us define two signal levels as distinguishable if they differ at least by the sum of their standard deviations. If the number of photons in a pixel is equal to or smaller than $\bar n$, then there are $\bar n^{1/2}$ distinguishable possible signal levels. For example, for $\bar n=25$, the $\bar n^{1/2}=5$ distinguishable levels (bands) correspond to 25 ± 5, 16 ± 4, 9 ± 3, 4 ± 2, and 1 ± 1 photons. Therefore, the result stated in Eqs. (22) and (23) should remain valid in the case of Poisson statistics, with the replacement of $N$ by $N^{1/2}$. Note also that $M\log_2(N^{1/2}/M)+O(M)=0.5\,M\log_2(N/M)+O(M\log_2 M)$.

Although the above considerations regarding the extension of Eqs. (22) and (23) to the case of noisy images are not rigorous, it turns out that they lead to correct formulae for the information capacity in the case of Poisson (shot) and Gaussian (additive) noise. A rigorous derivation for these cases is given in the Appendix, where more accurate formulae are derived from the results in [18, 20, 22]. Specifically, an estimate for the information capacity in the case of Poisson statistics, when $N/M$ is large, is given by
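
The counting of Poisson-distinguishable levels has a simple closed form: consecutive levels satisfy $\sqrt{n_k}-\sqrt{n_{k-1}}=1$, i.e. $n_k=k^2$, since $k^2-(k-1)^2=2k-1=k+(k-1)$ is exactly the sum of the two standard deviations. A minimal sketch generating the bands quoted above:

```python
from math import isqrt

def poisson_levels(n_max):
    """Signal levels separated by at least the sum of their standard
    deviations under Poisson noise: n_k = k**2, k = 1..sqrt(n_max)."""
    return [k * k for k in range(isqrt(n_max), 0, -1)]

print(poisson_levels(25))   # [25, 16, 9, 4, 1]: sqrt(25) = 5 levels
```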

$$D_P(M,N)=0.5\,M\,[\log_2(N/M)+o(1)], \qquad (24)$$
from which it follows that
$$D_{P,S}(M,N)=0.5\,(M/N)\,[\log_2(N/M)+o(1)], \qquad (25)$$
where the term $o(1)$ tends to zero in the limit as $N/M\to\infty$.

Identifying the individual “pixels” with the isotropic spatial resolution units having the side length given by Eq. (7) and substituting Eq. (13) into Eq. (24), we obtain that in the case of Poisson noise:

$$D_P(M_2,N)=0.5\,M_2\,[\log_2\langle\mathrm{SNR}^2\rangle+o(1)], \qquad (26)$$
where the term $o(1)$ tends to zero in the limit as $N/M_2\to\infty$. The corresponding equation in the case of Gaussian additive noise is:
$$D_G(M_2,N)=0.5\,M_2\,\log_2(\langle\mathrm{SNR}^2\rangle+1). \qquad (27)$$
Previously, expressions for the information capacity of imaging systems with additive noise in a form similar to Eq. (27) have been obtained in [12, 16, 17].

4. Effect of the imaged sample

The above analysis has been carried out without reference to an imaged sample. In the present section we consider one simple generic model which takes into account the interaction of the incident particle fluence with a sample, and its consequences for the information about the sample that can be obtained in an imaging experiment.

Consider an experiment involving interaction of a flux of incident imaging particles with a weakly scattering sample, after which the scattered particles are registered by a position-sensitive detector. We assume for simplicity that the particle fluence over the “incident surface” of the sample can be described by its spatially uniform expectation value $\bar S_0^{(n-1)}$. Note that the latter quantity is expressed in the number of particles per unit “area”, i.e. it has the dimensionality of inverse length to the power of $n-1$, while the detected signal $\bar S_{in}(x)$ introduced in Section 2 was expressed in the number of particles per unit “volume”, i.e. it has the dimensionality of inverse length to the power of $n$. We assume here that the detected scattered fluence $S_{in}(x)$ has a Poisson distribution with the mean $\bar S_{in}(x)=\bar S_0^{(n-1)}\gamma(x)$. This corresponds to the case of so-called “dark-field imaging”, when the primary unscattered beam does not reach the detector. Modification to the case of bright-field imaging is quite straightforward, and the corresponding result is given below. Here $\gamma(x)$ is a deterministic non-negative function (“scattering coefficient”) that describes the interaction of the incident fluence with the sample. This interaction is assumed to be linear with respect to the interaction length, i.e. $\gamma(x)$ has the dimensionality of inverse length. As a model for $\gamma(x)$ one may consider, for example, the linear attenuation, $\mu(x;\lambda)=(4\pi/\lambda)\beta(x;\lambda)$, or refraction, $(2\pi/\lambda)\delta(x;\lambda)$, coefficients in X-ray imaging, where $n=1-\delta+i\beta$ is the complex refractive index of the sample and $\lambda$ is the X-ray wavelength (we omit the argument $\lambda$ in the notation for $\gamma(x)$ for brevity). It can sometimes be helpful to express $\gamma(x)=\sigma\rho(x)$, where $\sigma$ is the scattering cross-section and $\rho(x)$ is the number density of scattering centers (usually, atoms) in the sample. The interaction is also assumed to be weak, so that $\gamma(x)A\ll 1$, where $A$ is the linear size of the sample. In particular, this allows us to ignore the weakening of the incident flux in the course of its propagation through the sample. Finally, the interaction has a “single-scattering” nature, i.e. each incident particle interacts at most with a single scattering center in the sample, as is usually assumed in the first Born approximation or in the so-called “kinematical approximation” in X-ray and electron diffraction [27].

This model is somewhat similar to the one used in diffraction tomography [28], where the first Born approximation is invoked for the description of the interaction of the radiation with the sample. However, unlike diffraction tomography, our simple model ignores the effect of propagation of the scattered radiation to the detector, assuming only that there is a one-to-one correspondence between the resolution volumes in the sample and the effective “detector pixels”. The totality of detector pixels may possibly be aggregated over multiple 2D exposures of the sample. For example, in conventional or diffraction CT imaging, such one-to-one correspondence may be achieved after a reconstruction procedure, e.g. after the application of the inverse Radon transform to the data collected in real detector pixels at different rotational positions of the sample. In the case of weakly absorbing or weakly scattering samples, such a reconstruction procedure is usually linear and can be expressed by a convolution with the corresponding filter function [29], and hence these cases fit well into the LSI approach considered above.

Consider now a post-detection filtering of the Poisson process with a narrow filter function $T(x)$, as in Section 2 above (i.e. the filter function is required to be narrow with respect to the characteristic variation length of the scattering properties across the sample). We then obtain: $\bar S(x)=(T*\bar S_{in})(x)\approx\|T\|_1\,\bar S_0^{(n-1)}\gamma(x)$ and $\sigma^2(x)=(T^2*\bar S_{in})(x)\approx\|T\|_2^2\,\bar S_0^{(n-1)}\gamma(x)$. Accordingly,

$$\mathrm{SNR}^2(x)=\frac{\|T\|_1^2\,\bar S_0^{(n-1)}\gamma(x)}{\|T\|_2^2}=\bar S_0^{(n-1)}\gamma(x)\,(\Delta_2 x)^n. \qquad (28)$$
We can now use the results from the previous sections to characterize the information capacity of an LSI imaging system with a weakly scattering object. In the present context, this can be understood as the capacity of the system to distinguish between different imaged objects.

Using the expression for $\mathrm{SNR}^2(x)$ from Eq. (28) and arguing similarly to the derivation of Eq. (13) above, i.e. averaging over all spatial resolution units into which the sample is split, we can obtain

$$\langle\mathrm{SNR}^2\rangle=\bar\gamma A\;\bar S_0^{(n-1)}A^{n-1}\;\frac{(\Delta_2 x)^n}{A^n}=\frac{\Omega N}{M_2}, \qquad (29)$$
where $\bar\gamma=M_2^{-1}\sum_{m=1}^{M_2}\gamma(x_m)$ is the average value of the linear scattering coefficient in the sample, and $\Omega=\bar\gamma A=(\sigma/A^{n-1})\,N_{sc}$ can be termed the scattering power of the sample, with $N_{sc}$ denoting the total number of scattering centers in the sample and $\sigma/A^{n-1}$ being the relative scattering cross-section. It is easy to verify that in the case of bright-field imaging one obtains
$$\langle\mathrm{SNR}^2_{b.f.}\rangle=\Omega^{(2)}N/M_2, \qquad (30)$$
with $\Omega^{(2)}=\overline{\gamma^2}\,A^2$.
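
Equation (29) can also be read as a dose requirement. As a purely illustrative example (the numbers below are hypothetical), the number of incident particles needed to reach a target average SNR in the dark-field model follows by inverting Eq. (29):

```python
def particles_needed(snr, M2, Omega):
    """Invert Eq. (29): N = <SNR^2> * M2 / Omega (dark-field model)."""
    return snr**2 * M2 / Omega

# Hypothetical example: scattering power Omega = 0.1, 10^6 resolution
# units, target SNR = 5 (Rose criterion).
print(f"{particles_needed(5.0, 10**6, 0.1):.2e}")   # ~2.5e+08 particles
```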

Let us consider for comparison a more specific example of conventional computed tomography (CT) imaging. It was shown in [30] that in the case of filtered back-projection (FBP) reconstruction with nearest-neighbor interpolation, one has:

$$\langle\mathrm{SNR}^2\rangle=\frac{1}{2\pi^2}\,\frac{\overline{\mu^2}\,M_a\,\bar S_0^{(2)}A^4}{M_p^2}=\left(\frac{\pi}{2}\right)^{4/3}\frac{1}{2\pi^2}\,\frac{\Omega^{(2)}N}{M_2^{4/3}}, \qquad (31)$$
where $M_a$, $M_p$ and $M_2=M_a M_p$ are, respectively, the number of projections, the number of pixels in each projection and the total number of resolution units utilized in the whole CT scan. Here we used the well-known optimal relationship $M_a=(\pi/2)\,M_p^{1/2}$ [31] (which leads to isotropic spatial resolution in the reconstructed slices) for the derivation of the final result in Eq. (31). Note that Eq. (31) correctly reflects the well-known fact [3] that, in order to keep the average SNR in the reconstructed CT images constant, the number $N$ of incident particles (and hence the radiation dose to the sample) has to change as the inverse 4th power of the spatial resolution (note that $(\Delta_2 x)\sim 1/M_2^{1/3}$). The cause of the difference between this and the corresponding result for the bright-field scattering model, Eq. (30), which has a different power of $M_2$ in the denominator, lies in the mathematical properties of the inverse Radon transform. The latter is known to amplify the noise in the input signal via the application of the ramp filter in the course of FBP CT reconstruction (this fact is also reflected in the growth of the eigenvalues of the inverse Radon transform with the radial index) [31].
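
The second equality in Eq. (31) is bookkeeping between $M_a$, $M_p$ and $M_2$; a short numerical check of this algebra (with arbitrary values of $M_p$):

```python
from math import pi, isclose

# With Ma = (pi/2) * Mp**0.5 and M2 = Ma * Mp, the factor 1/Mp**2 in
# Eq. (31) equals (pi/2)**(4/3) / M2**(4/3).
for Mp in (10**4, 10**6):
    M2 = (pi / 2) * Mp**0.5 * Mp
    assert isclose(1 / Mp**2, (pi / 2)**(4 / 3) / M2**(4 / 3))
print("Eq. (31) scaling check passed")
```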

Using Eq. (29) in combination with Eq. (26) for the information capacity, we obtain:

$$D_P(M_2,N)=0.5\,M_2\,[\log_2(\Omega N/M_2)+o(1)]. \qquad (32)$$
When the spatial resolution is fixed, the dependence of this information capacity on the number of particles $N$ is straightforward: $D_P(M_2,N)=0.5\,M_2\log_2 N+\mathrm{const}$. When the number of incident particles is fixed, the information capacity is bounded from above by a linear function of the number of spatial resolution units, $D_P(M_2,N)\le M_2\,O(1)$.
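
It may be instructive to tabulate the leading term of Eq. (32) at a fixed dose. The sketch below (with hypothetical values of $N$ and $\Omega$) shows that $0.5\,M_2\log_2(\Omega N/M_2)$ grows with $M_2$ and formally peaks at $M_2=\Omega N/e$; note, however, that Eq. (32) was derived for $\Omega N/M_2\gg 1$, so values near and beyond the formal peak lie outside the range of validity and are shown only for illustration:

```python
from math import log2, e

def D_P(M2, N, Omega):
    """Leading term of Eq. (32) for dark-field imaging."""
    return 0.5 * M2 * log2(Omega * N / M2)

N, Omega = 1e8, 0.1          # hypothetical dose and scattering power
for M2 in (1e4, 1e5, 1e6, Omega * N / e, 1e7):
    print(f"M2 = {M2:10.3g}   D_P = {D_P(M2, N, Omega):10.3g} bits")
```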

Substituting the expression for $\mathrm{SNR}^2(x)$ from Eq. (28), together with the expression $\bar S_{in}(x)=\bar S_0^{(n-1)}\gamma(x)$ for the incident fluence, into the definition of $Q_S^2$ in Eq. (11), we find that the imaging quality here has exactly the same form as in the case with no sample considered in Section 2. In particular, Eq. (12) and the estimate $Q_S^2\le 1/C_n$ remain unchanged. This happens because, according to Eq. (12), the imaging quality can be expressed in terms of the two definitions of the system's spatial resolution alone, and these resolutions do not depend on the scattering power of the sample under the conditions of the simple model considered here.

5. Conclusions

We have considered the relationship between the spatial resolution, signal-to-noise ratio and information capacity of linear shift-invariant imaging systems. Convenient imaging performance characteristics of such systems include the imaging quality $Q_S$ and the information capacity per single imaging particle (e.g. photon), $D_S$. From the results of this paper it is clear that, apart from relatively insignificant factors, the information capacity per single particle and the intrinsic imaging quality have similar mathematical expressions, namely $D_S(M,N)\sim(M/N)\log_2\langle\mathrm{SNR}^2\rangle$ and $Q_S^2(M,N)\sim(M/N)\langle\mathrm{SNR}^2\rangle$, respectively. However, the behavior of these two quantities under linear transformations of the imaging system is different due to the presence of the logarithm in the first expression. It is usually possible to increase the signal-to-noise ratio (SNR) at the expense of spatial resolution (or, equivalently, the total number of spatial resolution units, $M$) and vice versa, using operations like image filtering, denoising or deconvolution. While the intrinsic imaging quality remains essentially invariant under such (linear) transformations, the information capacity can change significantly: for example, a system with a single spatial resolution unit has quite low information capacity per particle, $\sim(\log_2 N)/N$. On the other hand, when the number of particles is fixed, the information capacity per particle can grow without limit as the number of resolution units grows: consider the fact that an unlimited number of new images can be created, in principle, by directing a photon or a fixed number of photons into new pixels (one pixel at a time).

In imaging problems that involve quantitative analysis of samples on the basis of transmitted or, more generally, scattered radiation, the notion of “useful signal” is modified. We have studied one example of this type of problem in Section 4 of the paper. We have found that the information capacity is proportional to the logarithm of the product of the total number of imaging particles and the scattering power of the sample, the latter depending only on the average scattering coefficient and the size of the sample (or, alternatively, on the total number of scattering centers, e.g. atoms, in the sample and the average relative scattering cross-section of each such center). On the other hand, the imaging quality per single particle is not affected by the scattering properties of the sample, and remains the same as in the sample-free case. This means that, in accordance with its name and intended role, the intrinsic imaging quality characteristic depends only on the properties of the imaging system itself, and does not depend on the imaged sample.

Appendix A Information capacity of a simple imaging system in the cases of additive and Poisson detection statistics

Following the ideas by Shannon [11, 18] and others [12, 16], we consider the maximum number of distinct images (“messages”), that can be represented (encoded) by a system which has a given number of “pixels” (spatial resolution units), with an average level of signal-to-noise ratio squared over the image pixels not exceeding a certain value. Note that while in previous publications [11–17] the noise was usually assumed to be additive, here we consider the case of Poisson noise. Setting an upper limit for the average squared SNR in image pixels is then equivalent to limiting the average number of photons per image pixel. As the number of pixels here is considered constant, this in turn is equivalent to setting an upper limit on the total number of particles or on the energy used to form each image.

We begin by considering a communication system with inputs $\lambda_1,\lambda_2,\lambda_3,\dots$ and corresponding outputs $y_1,y_2,y_3,\dots$, and closely follow the development given in Shannon [18]. Further detail may be found in the excellent text on information theory by Cover and Thomas [20]. For a system without memory, Shannon [18] modelled the system by assuming that the inputs $\lambda_1,\lambda_2,\lambda_3,\dots$ are independent realizations of a random variable $\Lambda$, say. The outputs $y_1,y_2,y_3,\dots$ are then independent realizations of a random variable, $Y$ say. In his analysis of a noiseless system, Shannon showed that, for large $M$, there are approximately $2^{MH(\Lambda)}$ distinguishable sequences of the form $\lambda_1,\lambda_2,\dots,\lambda_M$, where $H(\Lambda)$ is the entropy of $\Lambda$. It therefore follows that the average information content of a realization of $\Lambda$ is $H(\Lambda)$. As an example, consider the case where there are $r$ distinct values for $\lambda$ that are equally probable. Then, if $M$ is large, there are approximately $r^M$ distinct sequences $\lambda_1,\lambda_2,\dots,\lambda_M$, each with probability $r^{-M}$. Note that when $r=\bar n^{1/2}=(N/M)^{1/2}$, this is consistent with Eq. (24).
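
For the equiprobable example this is elementary to check: with $H(\Lambda)=\log_2 r$, the $r^M$ sequences are exactly $2^{MH(\Lambda)}$ in number. A minimal sketch (with arbitrary $r$ and $M$):

```python
from math import log2, isclose

r, M = 5, 8                     # arbitrary illustrative values
H = log2(r)                     # entropy of r equiprobable values
assert r**M == round(2**(M * H))        # r^M sequences = 2^(M H)
assert isclose(r**(-M), 2**(-M * H))    # each with probability r^(-M)
print(f"H = {H:.4f} bits, {r**M} sequences")
```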

When there is a one-to-one correspondence between the output of a channel and its input, the channel capacity $C$ is given by

$$C=\max_{\Lambda}H(\Lambda), \qquad (33)$$
where the maximum is taken over all random variables $\Lambda$ and may be subject to constraints, such as a bound on the variance. However, in practice a communication channel will be subject to noise, and this will affect its performance in a negative way. Specifically, the average information content from a single output is the mutual information $I(\Lambda;Y)$, which is defined by
$$I(\Lambda;Y)=H(\Lambda)-H(\Lambda|Y), \qquad (34)$$
where $H(\Lambda|Y)$ is the conditional entropy of $\Lambda$ given $Y$. The channel capacity is now defined by
$$C=\max_{\Lambda}I(\Lambda;Y), \qquad (35)$$
where, as previously, the maximum is taken over all random variables Λ and may be subject to constraints.

We have noted previously that, when $M$ is large, the number of distinguishable input sequences of the form $\lambda_1,\lambda_2,\dots,\lambda_M$ is approximately $2^{MH(\Lambda)}$, and it is only these sequences that have information content. Each of these distinguishable sequences is received as an output sequence $y_1,y_2,\dots,y_M$ and, in the process, may become indistinguishable. Shannon showed that about 1 in $2^{MH(\Lambda|Y)}$ distinguishable input sequences results in a distinguishable output sequence. That is, there are approximately $2^{MI(\Lambda;Y)}$ distinguishable output sequences that correspond to distinguishable input sequences.

We now relate the models for communication systems to images. As previously, we have $M$ pixels and a total of $N$ photons. For each pixel, we have a target number of photons $\lambda_i$, $i=1,\dots,M$, which are independent realizations of a random variable $\Lambda$, and there are associated outputs $y_i$, $i=1,\dots,M$, which are the numbers of photons measured in each pixel. As the mean number of photons per pixel is $\bar n:=N/M$, we assume that

$$E(\Lambda)=\bar n. \qquad (36)$$
It now remains to specify a model of the outputs, and here we shall consider two cases, namely:

  • $y_i$ is an independent realization of a Poisson random variable with intensity $\lambda_i$;
  • $y_i=\lambda_i+z_i$, where $z_i$ is an independent realization of a mean-zero Gaussian random variable $Z$ that is independent of $\Lambda$.

The case of a Poisson distribution is important in a number of applications, including the capacity of a discrete-time Poisson channel, and has received considerable attention in the literature. Although an explicit analytical expression for the channel capacity is not known, a number of upper and lower bounds are derived in Lapidoth and Moser [22]. When the mean number of photons per pixel $\bar n$ is large, these yield

$$C_{Poisson}=\tfrac{1}{2}\log_2(\bar n)+o(1), \qquad (37)$$
where the $o(1)$ term tends to zero in the limit as $\bar n\to\infty$. This is consistent with the results in Section 3.

The channel capacity of a communication channel with independent additive Gaussian noise is well known [11]. In this case we have $Y=\Lambda+Z$, where $\Lambda$ is a random variable representing the signal and $Z$ is an independent mean-zero Gaussian random variable that models the noise. The channel capacity is given by

$$C_{Gauss}=\frac{1}{2}\log_2\!\left(\frac{E(Y^2)}{E(Z^2)}\right)=\frac{1}{2}\log_2\!\left(\frac{E(\Lambda^2)+E(Z^2)}{E(Z^2)}\right). \qquad (38)$$
In order to compare this to the Poisson case, we take the same second-order moments, namely $E(Y^2)=\bar n+\bar n^2$ and $E(Z^2)=\bar n$. This yields
$$C_{Gauss}=\tfrac{1}{2}\log_2(\bar n+1)=\tfrac{1}{2}\log_2(\bar n)+o(1), \qquad (39)$$
which is asymptotically the same as the Poisson case when the average number of photons per pixel is large.
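
The convergence of the two capacities is easy to tabulate: the gap $\tfrac{1}{2}\log_2(1+1/\bar n)$ vanishes as $\bar n\to\infty$. A minimal sketch (with arbitrary values of $\bar n$):

```python
from math import log2

for n_bar in (4, 25, 10**4, 10**8):
    c_gauss = 0.5 * log2(n_bar + 1)     # Eq. (39)
    c_poisson = 0.5 * log2(n_bar)       # leading term of Eq. (37)
    print(n_bar, round(c_gauss, 4), round(c_poisson, 4),
          round(c_gauss - c_poisson, 6))
```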

Acknowledgements

T.E.G. thanks Drs. Harry Quiney, David Paganin, Alexander Kozlov, Saumitra Soha and Andrew Martin for helpful discussions.

References and links

1. H. H. Barrett and K. J. Myers, Foundations of Image Science (Wiley, 2004).

2. J. W. Goodman, Statistical Optics (Wiley, 1985).

3. M. R. Howells, T. Beetz, H. N. Chapman, C. Cui, J. M. Holton, C. J. Jacobsen, J. Kirz, E. Lima, S. Marchesini, H. Miao, D. Sayre, D. A. Shapiro, J. C. H. Spence, and D. Starodub, “An assessment of the resolution limitation due to radiation-damage in x-ray diffraction microscopy,” J. Electron Spectrosc. Relat. Phenom. 170(1-3), 4–12 (2009).

4. A. C. Fabian, K. A. Pounds, and R. D. Blandford, Frontiers of X-ray Astronomy (Cambridge University Press, 2004).

5. R. Bach, D. Pope, S.-H. Liou, and H. Batelaan, “Controlled double-slit electron diffraction,” New J. Phys. 15(3), 033018 (2013).

6. Y. I. Nesterets and T. E. Gureyev, “Young’s double-slit experiment: noise-resolution duality,” Opt. Express 23(3), 3373–3381 (2015).

7. T. E. Gureyev, Y. I. Nesterets, F. de Hoog, G. Schmalz, S. C. Mayo, S. Mohammadi, and G. Tromba, “Duality between noise and spatial resolution in linear systems,” Opt. Express 22(8), 9087–9094 (2014).

8. T. E. Gureyev, S. C. Mayo, Ya. I. Nesterets, S. Mohammadi, D. Lockie, R. H. Menk, F. Arfelli, K. M. Pavlov, M. J. Kitchen, F. Zanconati, C. Dullin, and G. Tromba, “Investigation of imaging quality of synchrotron-based phase-contrast mammographic tomography,” J. Phys. D Appl. Phys. 47(36), 365401 (2014).

9. F. de Hoog, G. Schmalz, and T. E. Gureyev, “An uncertainty inequality,” Appl. Math. Lett. 38, 84–86 (2014).

10. T. E. Gureyev, F. de Hoog, Ya. I. Nesterets, and D. M. Paganin, “On the noise-resolution duality, Heisenberg uncertainty and Shannon’s information,” ANZIAM J. 56, C1–C5 (2015).

11. C. Shannon, “Communication in the presence of noise,” Proc. Inst. Radio Eng. 37, 10–21 (1949).

12. P. B. Felgett and E. H. Linfoot, “On the assessment of optical images,” Phil. Trans. Roy. Soc. A 247(931), 369–407 (1955).

13. D. Gabor, “Light and information,” in Progress in Optics, E. Wolf, ed. (North-Holland, 1961).

14. G. Toraldo di Francia, “Degrees of freedom of an image,” J. Opt. Soc. Am. 59(7), 799–804 (1969).

15. F. Gori and G. Guattari, “Shannon number and degrees of freedom of an image,” Opt. Commun. 7(2), 163–165 (1973).

16. I. J. Cox and C. J. R. Sheppard, “Information capacity and resolution in an optical system,” J. Opt. Soc. Am. A 3(8), 1152–1158 (1986).

17. M. A. Neifeld, “Information, resolution, and space-bandwidth product,” Opt. Lett. 23(18), 1477–1479 (1998).

18. C. E. Shannon, “A mathematical theory of communication,” Bell Syst. Tech. J. 27(3), 379–423 (1948).

19. D. A. B. Miller, “Communicating with waves between volumes: evaluating orthogonal spatial channels and limits on coupling strengths,” Appl. Opt. 39(11), 1681–1699 (2000).

20. T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley, 2003).

21. A. Burvall, P. Martinsson, and A. T. Friberg, “Communication modes in large-aperture approximation,” Opt. Lett. 32(6), 611–613 (2007).

22. A. Lapidoth and S. M. Moser, “On the capacity of the discrete-time Poisson channel,” IEEE Trans. Inf. Theory 55(1), 303–322 (2009).

23. L. Mandel and E. Wolf, “The measures of bandwidth and coherence time in optics,” Proc. Phys. Soc. 80(4), 894–897 (1962).

24. V. A. Epanechnikov, “Non-parametric estimation of a multivariate probability density,” Theory Probab. Appl. 14(1), 153–158 (1969).

25. https://proofwiki.org/wiki/Rising_Sum_of_Binomial_Coefficients

26. P. Stanica, “Good lower and upper bounds on binomial coefficients,” J. Inequal. Pure Appl. Math. 2, 30 (2001).

27. J. M. Cowley, Diffraction Physics (Elsevier, 1995).

28. E. Wolf, “Three-dimensional structure determination of semi-transparent objects from holographic data,” Opt. Commun. 1(4), 153–156 (1969).

29. T. E. Gureyev, Y. I. Nesterets, K. M. Pavlov, and S. W. Wilkins, “Computed tomography with linear shift-invariant optical systems,” J. Opt. Soc. Am. A 24(8), 2230–2241 (2007).

30. Ya. I. Nesterets and T. E. Gureyev, “Noise propagation in x-ray phase-contrast imaging and computed tomography,” J. Phys. D Appl. Phys. 47(10), 105402 (2014).

31. F. Natterer, The Mathematics of Computerized Tomography (Teubner, 1986).
