What are the advantages of ghost imaging? Multiplexing for x-ray and electron imaging

Thomas J. Lane; Thomas J. Lane; Daniel Ratner; Daniel Ratner

doi:10.1364/OE.379503

1. Introduction

There has been recent interest in the use of multiplex sensing techniques in applications at the atomic scale [1–8]. Traditionally, measurements using x-rays or electrons systematically interrogate a system at a single point in space, time, or photon/electron energy at once before moving on to the next measurement. In contrast, multiplex methods measure a combination of space/time/energy points simultaneously. Many repeated multiplexed measurements can convey the same information as a systematic scan if the points combined are chosen wisely. In some cases a multiplex scheme has clear advantages, either fundamental or practical. In other cases, systematic scanning is preferable. The purpose of this paper is to provide a framework for reasoning about which is best in the context of x-ray and electron science.

Through a very general analysis, we are able to conclude that for signals linear in the x-ray or electron intensity:

1. Multiplex scanning enables the experimenter to swap the resolution – in time, space, or energy – of a final detector for the resolution with which the input beam can be modulated or measured.
2. We confirm multiplexing can help overcome specific kinds of noise, a well-known result known as Felgett’s advantage. We show, however, this advantage offers no possibility to acquire more information for a given radiation dose as compared to systematic scanning.
3. Multiplex measurements should yield significant improvements in experiment time, radiation dose used, and signal-to-noise if the sample is known to be sparse in some basis. Multiplexing methods can lead to very general applications of the theory of compressed sensing, which represents an opportunity for imaging at the atomic scale.

Each of these possible advantages will be discussed in detail, with clear applicability conditions. To aid readability, nearly all mathematics has been relegated to the appendices and only results are presented.

To build some intuition for how multiplex sensing techniques work, consider acquiring an x-ray fluorescence image of a sample. We could go about this in the usual way: by rastering a focused x-ray probe over the sample, thereby illuminating one “pixel” at a time, and recording a spectrum at each position. At the end of the experiment, the fluorescence image is simply the set of all acquired spectra, with each spectrum assigned to a spatial location on the image.

Alternatively, we could de-focus the x-ray beam and modulate the transverse intensity distribution with a random mask. This mask might illuminate half of the pixels from our previous raster scan, chosen at random. Then, a measurement acquires the summed spectra from all the illuminated pixels, which by itself contains no spatial information. Imagine, however, we fabricate a large number of such masks and repeat the experiment many times. Every time a specific pixel $i$ is illuminated, it will contribute the $k$ points in its spectrum ${\boldsymbol {\mathbf {x}}}_i$ to the spectral sum; for mask $m$, let’s write that measured sum

(1)$${\boldsymbol{\mathbf{r}}}_m = \sum_{i} a_{mi} \cdot {\boldsymbol{\mathbf{x}}}_i$$

where $a_{mi}$ is 1 if the pixel $i$ is illuminated by the mask $m$, and 0 otherwise. For many repeated measurements this can be nicely written in matrix notation

(2)$$R = A X \,.$$

This form hints that if we build our mask set $A$ in a smart way and take enough measurements, we may be able to easily solve for $X$, the same per-pixel spectra as we acquired in our raster scan.

In the example presented, we specifically designed the masks and therefore knew exactly how to write our matrix $A$, with 1s and 0s for illuminated or shadowed pixels for each mask. The scheme would work just as well, however, if we had random masks we did not choose specifically, so long as we were able to measure and record which pixels were illuminated. This gives us ability to use the randomness inherent in the experiment to our advantage – so long as we can measure that random patterning to high resolution. Figure 1 shows an illustration of all three methods discussed for acquiring a signal: raster scanning, multiplexing with known patterning, and multiplexing using measured patterning. The example envisioned is spatial in nature. The same procedure works just as well, however, in the time or energy (spectral) domains.

Fig. 1. Three different sampling schemes for a single imaging task. Top left: in a raster scan, small regions of the object are illuminated and recorded one at a time. Top right: in a multiplex scan, many selected regions are illuminated and recorded at the same time, and their sum recorded. The patterns of illumination are chosen, so they are known. The procedure is repeated many times and the final object reconstructed mathematically. Bottom: a second example of a multiplex scan, ghost imaging. In a ghost imaging setup, a randomly patterned illumination is used and the sum recorded. The patterning, though random and uncontrolled, is measured via a diagnostic, typically a beam splitter and area detector. The reconstruction procedure is exactly the same as for the multiplex scan.

Download Full Size | PDF

When is multiplexing advantageous? In this paper we provide a framework for analyzing this question, then discuss three specific cases where multiplex sensing may be useful. Our focus is on applications involving the use of x-rays and electrons, where measurements are in many ways expensive: they are often flux-limited, cause irreversible damage to the sample and require time at costly facilities. In some cases, multiplexing will offer a way to alleviate these concerns, but in other situations it may exaggerate them. Our goal is to understand when multiplexing is useful for x-ray and electron experiments and when it should be avoided.

An important note on the use of prior information in imaging experiments is necessary. In general, if one knows something about the signal about to be acquired, for example if it is smooth or always-positive, that fact can be used to improve the measurement process. These priors can be implemented equally well in any sampling scheme, whether rastering or multiplexed. There is one powerful prior, however, for which this equivalence is known to not be true. If the signal is sparse in some representation (i.e. compressible) then multiplex sensing can offer serious advantages over raster scanning [9,10]. Because of this unique feature, sparsity will be the only prior discussed in this paper. We will first consider sampling schemes that do not make use of sparsity, then show how if the signal is known to be sparse, that information can be readily incorporated into the analysis of different imaging schemes.

Our analysis builds on progress from a variety of fields. One specific implementation of multiplex sensing is classical “ghost imaging” (Fig. 1), which refers to a class of experiments where a randomly patterned wavefront is split into two branches [11]. One branch measures the total transmission through a sample of interest, called the bucket signal. The other branch images the wavefront. A moment of thought reveals this process to be a case of spatial multiplexing, identical to the imaging fluorescence spectroscopy scan discussed previously, only using random beam patterning instead of known masks. In the ghost imaging community, multiplexing with known masks is conversely known as “computational ghost imaging” [12]. Similarly, Fourier transform spectroscopy is also a multiplexing scheme, where many wavelengths are measured simultaneously per measurement [13,14]. Our presentation generalizes these examples and many others into a single framework. Our treatment is completely classical in nature, ignoring any information gain that might be achieved via performing measurements with entangled particles. Entanglement has been focus of specific studies in ghost imaging – for perspective see [11] – but is not heavily used in current applications.

While the idea to use multiplexing in x-ray and electron experiments has only recently started to recieve significant attention, demonstrations at synchrotrons [1–3], FELs [7], and electron sources [6] have already been conducted, paving the way for multiplexing to start to impact applied x-ray and electron science.

2. Framework

Consider a measurement process where we perform $m$ measurements with $n$ unknown channels, for example the pixels in an image or amplitudes for given frequencies in a spectrometer. The channel values are the unknowns we wish to infer. We measure the weighted sum of all channels; weights may be zero. Crucially, we assume the system response is linear in all aspects, including as a function of beam intensity and the number/character of the channels. Further, we consider the general case where each channel’s output can be described by a $k$-length vector, for example $k=3$ if a pixel gives a simultaneous scalar output for red, green, and blue when measured, and these colors can be recorded separately. Finally, we assume some additive noise in each measurement.

Our goal is choose a weighting scheme that will allow us to infer each individual channel’s $k$ dimensional response. We write this symbolically as follows. Let $R$ be the $m \times k$ matrix of recorded outcomes, $A$ be the $m \times n$ sensing matrix, and $X$ the $n \times k$ signal of interest. Note that multi-dimensional signals can be written as a flattened one-dimensional signal – in the following formalism, we consider each pixel independent and do not consider shared information between adjacent pixels. In practice, it is possible to extend the method to include shared information, see for example [15], but the details depend on the domain of application. Then we consider a measurement process in the presence of uncorrelated, additive noise $\epsilon$ (an $m \times k$ matrix, but where all $k$ pixels are assumed to have equivalent noise) with finite variance $\sigma ^2$,

(3)$$R = AX + \epsilon \,.$$

The details of this additive noise will be discussed in a moment. We have not considered the case where there are errors in the record of the sensing matrix $A$. Such errors are possible when using random $A$ matrices generated naturally as part of the experiment and then measured by some diagnostic, for example as is done in ghost imaging (see Fig. 1). Sensing matrix errors make our estimates of $X$ too small, an effect known as regression dilution. Regression dilution has been studied in detail in the field of statistics and we refer the interested reader to [16]. We will consider a solution to (3) to be the estimate $\hat {X}$ that minimizes the error function

(4)$$\hat{X} = \mathop{{\arg\,\min}}\limits_{X} || R - A X ||_2^2 \,,$$

appropriate for Gaussian errors on the model predictions. For some experiments, other error models may be more appropriate, e.g. Poisson errors in low-flux situations. A brief comment on extending the results here to those more specific cases is given in Appendix E. In the rest of the text we consider Gaussian errors.

The solution to (4) is given by the ordinary least squares solution,

(5)$$\hat{X} = (A^T A)^{-1} A^T R \,,$$

from which we can compute the total expected error per channel,

(6)$$\mathcal{E} = \sigma \sqrt{ n^{-1} \Big\langle \mathrm{Tr} \big\{ (A^T A)^{-1} \big\} \Big\rangle_A } \,.$$

While the ordinary least squares solutions are well known, for completeness in Appendix A we compactly prove (5) and (6). We are motivated to do so by the fact that in the field of ghost imaging it is still common to use sub-optimal estimators, even though least squares has been employed previously [17–19]. The properties and performance of the traditionally used ghost imaging estimator is discussed in Appendix B and shown numerically in Fig. 2.

Fig. 2. Simulations showing the performance of different sampling schemes: raster, Bernoulli, half-Gaussian, and Hadamard. A 1023-length waveform with iid random samples from $\mathcal {N}(0,1)$ smoothed using a size-15 median filter was used as the signal of interest. This signal was reconstructed using either the ordinary least squares estimator (Eq. (5)) for which the mean-squared error [left] and the dose-error product is shown [middle], or the traditional ghost imaging correlation algorithm (Eq. (19)) [right]. Top: in the absence of noise, Bottom: in the presence of 20% ($\sigma _m = 0.2$) per-measurement additive Gaussian noise, not per-channel noise, highlighting Felgett’s advantage (see Section 4.2). Each simulation was repeated 10 times with new random values for the signal, sensing matrices, and noise, with average results shown. Black line shows $m = n$. Observations: (1) the dose and error predictions of Table 1 are reproduced, (2) the Felgett advantage is clearly visible in the simulation containing noise [bottom left], (3) the performance of the traditional ghost imaging algorithm is inferior to the OLS solution as predicted by theory (Appendix B). Python code to reproduce simulations is available in Code File 1 (Ref. [22]).

Download Full Size | PDF

Table 1. Quantitative comparison between the sampling schemes discussed, produced using Eqs. (6) and (9). Recall $m$ is the number of measurements, $n$ is the number of channels (i.e. unknowns), $a$ is the average dose per channel, $\sigma$ is the standard deviation of the additive Gaussian noise per measurement. The factor $1.66$ in the half Gaussian distribution approximates $\left ( 1 - 2/ \pi \right )^{-1/2}$. Note that all these values assume $A^T A$ is not singular, which can only occur when $m \geq n$. $^\ddagger$Hadamard error estimate derived in [14].

View Table

Equation (6) naturally separates the error contributions from the noise, given by $\sigma$, and the stability of the inversion of $A$ represented by the trace term $\mathrm {Tr} (A^T A)^{-1}$. As we will see, both depend on the choice of $A$, and are therefore determined by the experimental design. The noise factor, $\sigma$, depends also on the noise characteristics of the detection method, which turns out to be important for the comparison between different sampling schemes.

2.1 Noise model and transferred noise

While in general the presented equations are applicable in any case of additive noise, in real experiments the noise can often be broken into three standard types:

1. Quantum Poisson noise
2. Per-channel (e.g. per-pixel) Gaussian noise, often caused by the electronics in a detector
3. Total-detector Gaussian readout noise that is incurred once per measurement.

Considering these three types of noise with variances $\sigma _p^2$, $\sigma _x^2$ and $\sigma _m^2$ respectively, we can approximate the total additive noise $\sigma$ as a function of the magnitudes of the individual contributions and the experimental design,

(7)$$\sigma = \sqrt{ \left\langle \sigma_p^2 \sum_{i=1}^n a_{mi} + \sigma_x^2 \sum_{i=1}^n a_{mi}^2 + \sigma_m^2 \right\rangle_A }$$

where we have assumed each column ${\boldsymbol {\mathbf {a}}}_m$ of the matrix $A$ is identically and independently distributed (Appendix C). We have also assumed that we known nothing about the structure of the sample a priori, and therefore assume a uniform signal intensity purpose of estimating Poisson noise (Appendix C.1).

Typically, one source of noise will be dominant in the system. In that case, the other sources will be negligible and we can approximate,

(8)$$ \sigma \approx \begin{cases} \sigma_p \sqrt{ n \langle a_{mi} \rangle} \ \ & \mathrm{Poisson} \\ \sigma_x \sqrt{ n \langle a_{mi}^2 \rangle} \ & \mathrm{Channel \ Gaussian} \\ \sigma_m \ & \mathrm{Measurement \ Gaussian} \\ \end{cases} $$

Note that for the per-channel cases, the noise scales as $\sqrt {n}$, the square root of the number of channels. This should be intuitive, as adding an additional channel brings with it that channel’s noise contribution. It also brings that channel’s contribution to the summed signal, so these factors cancel with the $n^{-1/2}$ factor in Eq. (6) to make the overall error $\mathcal {E}$ independent of the number of channels.

More interesting is the case where the error is proportional only to the number of measurements. This occurs for example if there is some significant constant readout noise for a detector – then, the number of channels doesn’t matter, only the number of readouts (measurements). In this case, the overall error is reduced by increasing the number of channels and scales as $\mathcal {E} \sim n^{-1/2}$. The advantage comes from each channel contributing additional intensity, and that intensity boosts the signal higher and higher above the constant noise floor. This is well known in the field of spectroscopy, where the $n^{-1/2}$ reduction of the noise as a function of multiple channels is known as the Felgett advantage, one motivation behind Fourier transform and Hadamard spectroscopies. See [14] for a comprehensive study of noise in multiplex spectroscopy.

The question is if this kind of noise is applicable in modern x-ray and electron imaging setups. Current x-ray detectors are able to routinely count individual photons with high quantum efficiency, which effectively eliminates readout noise of this sort [20]. Electron detectors have also progressed into the quantum counting regime recently [21]. The major source of noise in modern setups, therefore, is quantum (Poisson) in nature, and occurs on a per-channel basis. The traditional Felgett advantage does not apply in such situations, and no signal-to-noise advantage is achieved by multiplexing.

2.2 Inversion stability: trace term

Equation (5) assumes that $A^T A$ is non-singular and can be inverted, which in general is not true, especially if $A$ is random. In such cases approximate – but often very good – solutions may be found by solving Eq. (4) via a matrix factorization or a minimization algorithm. Reasonable solutions will only be obtained if $A^T A$ is high rank, and good design of a sampling scheme should ensure that $A^T A$ is non-singular with high probability in the limit of a reasonable number of samples. The schemes we discuss here are all of this variety. In the case of random sensing matrices, however, $A^T A$ may only be strictly non-singular in the limit of a large number of random samples, and errors computed using (6) will be lower bounds.

2.3 Radiation dose

In this model, the total dose on the sample $\mathcal {D}$ for a set of measurements is given by the sum of all the elements of $A$,

(9)$$\mathcal{D} = \left\langle \sum_{ij} |A_{ij}| \right\rangle_A \,.$$

We assume the damage is linear in total dose, and do not consider cases where the damage mechanism changes as a function of fluence, i.e. changes in dose-intensity per unit time or area.

3. The analysis of some examples

We consider four sampling schemes for a demonstration comparison: a raster setup, two random multiplexed schemes, and one deterministic multiplexing scheme. For each, we compute the error of a recovered signal as a function of the number of measurements performed and radiation dose incurred. Results are summarized in Table 1.

3.1 Raster scans

In a raster scan, each pixel is sequentially illuminated with intensity $a$ so that $A \equiv aI$, where $I$ is the identity matrix with $1$ on the diagonals and zeros elsewhere. Usually the number of measurements equals the number of channels ($m = n$). In the presence of noise, it might be productive to repeat measurements. In that case $A$ once again becomes an $m \times n$ matrix where rows of the identity matrix start to repeat.

The solution to the reconstruction problem (Eq. (5)) is simply $\hat {X} = R/a$, reflecting the intuition that in raster scanning no reconstruction is necessary. The radiation dose of the experiment is $am$ while the expected error is $(\sigma \sqrt {n}) / ( a \sqrt {m} )$, showing how increasing the beam intensity overcomes background noise at the cost of possibly damaging the sample.

3.2 Multiplex scans

Any non-raster scan is a multiplex scan. To highlight the differences, we will discuss situations where the sensing matrix $A$ is fairly dense, i.e. about half or more of the elements are non-negligible. Consider the following three concrete multiplex scans:

1. Half Gaussian. $a_{ij} \sim |\mathcal {N} (0, a^2)|$. Each element distributed independently and identically (iid).
2. Bernoulli with $p=1/2$. $a_{ij} = a$ with probability $1/2$ and $0$ otherwise, iid.
3. Hadamard. The Hadamard sequence used in spectroscopy involves constructing a specific $n=2^z-1$ length sequence of $0$s and $1$s (with $z$ a positive integer) that forms an $n \times n$ circulant matrix $S_n$. This matrix is used as a sensing matrix. These matrices are closely related to, but are not exactly, the traditionally defined Hadamard matrices. They are formed by generating a Hadamard matrix of order $n+1$, truncating the first row and column – which contain only 1s – and performing the substitions $1 \to 0$ and $-1 \to +1$. The sequence is deterministic and the circulant property can simplify the experimental implementation. For our discussion, the unframiliar reader need only know that $S_n$ is a deterministic matrix where approximately half of the elements are $1$, the rest are $0$, and the inverse of $S_n$ is given by $2(n+1)^{-1} \cdot (2S_n^T - J_n) \neq S^T$. By $J_n$ we mean an $n \times n$ matrix with all elements $1$ [13,14].

Note the first two cases are random – continuous and binary, respectively. The last is binary and deterministic.

3.3 Comparison

A quantitative comparison between these four sampling schemes is presented in Table 1, containing analytical results, and Fig. 2 which contains numerical results. Some important observations are immediately noticeable:

1. In cases where radiation damage is a concern, rastering always offers the best error-to-damage tradeoff. The increased damage for adding multiplexing channels is proportional to the number of channels $n$, while the error reduction offered by the Felgett advantage scales as at most $\sqrt {n}$, and at worst negligible, see Eq. (8).
2. Hadamard sampling can be considered optimal in terms of maximizing Felgett’s advantage [14]. It is interesting, however, that the performance of a random Bernoulli sampling is nearly identical to the carefully designed Hadamard one.
3. Our results highlight that the performance of all schemes are not the same. The half Gaussian sampling scheme has a higher error for a given dose when compared to the Hadamard or Bernoulli methods, for instance.

4. Experimental opportunities for multiplex sensing

The presented framework gives us the tools for assessing the fundamental capabilities and costs associated with a particular sampling scheme. Often, however, the benefits of multiplexing are practical. The ability to use flexible illumination in either space or time can alleviate engineering challenges. For example, Kabra and colleagues recently used a multiplexing approach to measure the quantum efficiency map of copper photocathodes using the natural fluctuations in the drive laser [15] parasitically. Previously, the cathode would have to have been taken offline and raster-scanned, but using multiplexing the mapping can be done in the background durign normal operations. Next, we discuss situations where multiplexing can give performance advantages due to either practical or fundamental concerns.

4.1 Resolution improvement

In standard, raster-style imaging, the resolution of the measurement is limited by either the probe or detector resolution. For instance, in the x-ray fluorescence imaging application discussed in the introduction, the final resolution of the hyperspectral image obtained will be limited by the x-ray focus size.

Multiplexing enables one to trade the resolution of the probe or detection system for the resolution at which the probe can be patterned or measured. This trade may make it possible to collect data at higher resolution. Alternatively, it may be a simpler or cheaper option than increasing the probe resolution.

As an example, we recently suggested along with our colleagues that by measuring the random temporal fluctuations inherent in the power of an XFEL pulse, pump-probe experiments with time resolution faster than the pulse duration would be possible. A typical XFEL pulse has a duration on the order of 10-100 fs, but its power over time can be readily measured to $\sim 1$ fs resolution using well established diagnostics such as the LCLS’s XTCAV [23]. This should enable x-ray pump-probe measurements to be conducted down to 1 fs resolution. The alternative of decreasing the pump and probe durations – preparing 1 fs x-ray pulses – is possible, but challenging, and requires new hardware to implement [24–26]. Therefore for some experiments time-domain multiplexing should offer an attractive alternative.

This highlights an interesting aspect of multiplex sensing. In cases where an experimental setup naturally results in random fluctuations in the measurement, it may be possible to use that randomness rather than fight it.

4.2 Felgett’s advantage: more signal per measurement

Multiplex scans have multiple channels illuminated at once. Thus, for a system that has a fixed flux-per-channel, multiplexed illumination offers a higher overall flux on the sample. In experiments with per-measurement noise, this enables one to achieve a specific signal-to-noise ratio with fewer measurements. Given the large investments in making brighter x-ray and electron sources and the expense associated with operating those sources, the ability to use more of the incident flux is attractive.

It should be emphasized that the experiment time is reduced precisely because the sample is exposed to more incident radiation. This does not necessarily translate into more information acquired for a given radiation dose. Moreover, the multiplex advantage only exists with per-measurement noise (such as $\sigma _m$ in Eq. (8)), and there is no gain when using photon-counting detectors.

4.3 Signal-to-noise per dose

Can multiplex sensing enable one to obtain more information for a given radiation dose? Many atomic resolution imaging tasks are radiation-damage limited, and the possibility to use multiplex sensing to overcome that limit has been discussed in recent literature [3–5,7]. This discussion has revolved around ghost imaging setups, where it has been suggested that damage might be avoided by decreasing the flux on the object branch and increasing the flux into the reference branch. Since the reference branch radiation does not interact with the object, then it follows damage will be reduced.

This argument needs to consider errors in the measurements performed on both branches. The analysis presented so far corresponds to the case of an infinitely bright reference branch with no error on the matrix $A$. In the case where such error is significant, one expects regression dilution [16] and our work only provides a lower bound error estimate. Here, we consider only the limiting case, where the reference branch is made so bright that the error on the measurement of $A$ is negligible.

Now we only need to consider error on the object branch (as written in Eq. (6)). Even with an ideal detector, there will at least be Poisson noise due to the fact we image with quanta (photons or electrons).

Even in this best case scenario, our analysis shows there is no fundamental way classical ghost imaging – or any multiplexing scheme – can alleviate radiation damage (in the absence of priors, see Section 5). Multiplexing with $n$ channels, corresponding to pixels on the reference branch, reduces the error by at most a factor of $\sqrt {n}$ (Eq. (8)) which occurs in the presence of per-measurement noise. From a signal-to-noise perspective, this is equivalent to reducing the number of measurements, and therefore the radiation dose, by a factor of $n$ (since error scales with the square root of the number of measurements, $\mathcal {E} \sim m^{-1/2}$). The $n$ additional channels, however, increase the dose by a factor of $n$ (Eq. (9)). Thus, the additional dose used to obtain Felgett’s advantage exactly offsets the dose reduction possible by conducting fewer measurements (see Table 1). In the best case scenario, multiplexing does not improve the error-to-dose ratio.

In the case of a perfect detector with only quantum noise, the scaling is even worse – multiplexing does not decrease the error at all, but still incurs the dose penalty. Alone, multiplexing cannot limit radiation damage, only make it worse.

We can understand this intuitively. Consider the limiting case in a ghost imaging setup, where for each ghost image exposure the object branch is so weak that exactly a single photon travels along that branch (zero photons would provide no information, as the bucket would always read zero). That photon is either absorbed by the object or transmitted and detected. An infinitely bright reference branch provides a probability distribution for where the photon interacted with the sample. What is the most informative possible distribution? Clearly, a delta function – telling us exactly where the imaging photon went – would be most useful. This corresponds precisely to a raster scan, where we send the photon in a pre-determined and known direction and measure the result. Ghost imaging imparts at most an equal amount of information per object-branch photon as a raster scan, and in general less.

4.4 An example of applied multiplexing: XFEL self-seeding

We propose an application of ghost imaging to solve a real problem currently facing the LCLS XFEL at SLAC. Many spectroscopy experiments require a high intensity concentrated in a narrow bandwidth of x-ray energies. XFEL self-seeding has been pursued to produce XFEL pulses with a higher spectral brilliance than simple SASE pulses [29–31], but at soft x-ray wavelengths these seeded beams come with a large “pedestal” of undesired wavelengths that dilute the spectral purity of the beam [27]. For spectroscopy experiments that require the smallest bandwidths, this means the beam must still be monochromated, which inevitably reduces the total flux due to losses in the monochromator [32].

Multiplexing provides a possible way to overcome this issue. In a standard soft x-ray absorption spectroscopy experiment, a monochromatic beam, generated either using a monochromator or self-seeding, is passed into the sample and the absorbed fraction of the beam (or some proxy of absorbance) is recorded. The wavelength of the beam is scanned to measure the absorbance at all relevant wavelengths. If the incoming beam is not purely monochromatic, the spectrum obtained is convolved by the beam bandwidth.

If, however, we measure the incident x-ray spectrum across many randomly fluctuating exposures, we can use the multiplexing algorithm (3) to recover the absorption spectrum at the resolution of the incident x-ray spectrum measurement. Seeded or SASE beams with a large bandwidth but high intensity can be used without suffering from the loss of resolution caused by the large bandwidth. Measurement of the XFEL spectrum is easier than controlling it precisely, and in the specific case of soft x-ray self seeding, the desired level of control is not currently possible [27]. A similar application of multiplexing to soft x-ray spectroscopy has recently been experimentally demonstrated to overcome large bandwidths caused by the use of transform-limited attosecond XFEL pulses [33].

To demonstrate the ability of multiplexing to overcome the pedestal present in current soft x-ray self-seeding at LCLS, we have conducted a simulation of the results expected for an absorption spectroscopy experiment performed in both the traditional fashion and using multiplexing, for both SASE and seeded beams. The XFEL spectra used in the simulation are real LCLS spectra recorded as part of self-seeding tests [32]. The sample absorption spectrum is a phantom generated to include two broad features separated by a narrow bandwidth on a sloping baseline, capturing possible challenges associated with different samples [32]. The results, presented in Fig. 3, show that by applying the multiplexing approach, weak spectral features of 200 meV linewidth can be resolved by the multiplexing analysis, but not the standard binning approach. Additionally, we see that the seeded beam, which is similar to a true raster scan, recovers the signal with less error than the broader-bandwith SASE beam. Nonetheless, spectroscopy with the SASE beam on this sample would be possible with the multiplexing analysis combined with a sufficient quantity of data.

Fig. 3. Simulation of a soft x-ray absorption spectroscopy experiment performed at LCLS, using both traditional and multiplexing approaches. Simulations were performed using 175,000 individual XFEL spectra measured at LCLS, either in SASE mode at undulator 19 or after soft x-ray self seeding after undulator 24 [27]. Spectra were randomly shuffled and energy-shifted before use. These spectra ($A$) were multiplied by a putative spectrum ($\mathbf {x}$, top left panel) to produce a measured bucket signal $\mathbf {r} = A \mathbf {x}$. Teal traces show recovery using the mean energy of the individual x-ray spectra, while the blue traces show recovery using least-squares regression as in eqn. 4 along with an additional regularization term $\alpha || \mathbf {x} ||^2_2$ (ridge regression [28]) that stabilizes the solution. The right panels show the mean spectra (bold) and five representative spectra (background) for both the seeded and SASE cases.

Download Full Size | PDF

5. Compressed sensing

Throughout our discussion we have assumed that no prior information is used in the signal reconstruction process. There is, however, one important situation where prior information about the structure of the signal can – in combination with multiplex scanning – dramatically improve the imaging process.

If the signal $X$ to be measured is sparse in some basis, then techniques of compressed sensing can be applied [9,10]. Here we quickly review compressed sensing theory, in order to show how multiplexed measurements can enable a general application of compressed sensing in x-ray and electron science.

Compressed sensing presumes that our signal $X$ of interest has a sparse representation. Concretely, this means that there exists some orthonormal basis $\Psi$ such that $X_s \equiv \Psi ^{-1} X$ is $s$-sparse, meaning it has only $s$ non-negligible ($\approx 0$) entries. If $s \ll n$ we may be able to determine the signal $X$ of interest with order $s$ measurements instead of $n$ (Fig. 4). If we can do so, we can obtain the signal with less time, radiation dose, and error.

Fig. 4. Compressed sensing uses sparsity to enable inference with fewer samples than unknowns. Top: when solving linear systems, if the number of unknown variables $n$ is greater than the number of known measurements $m$, then the problem is underdetermined and no unique solution exists. Middle: if, however, the signal is known to be sparse in some basis $\Psi$, then the number of knowns is effectively reduced and the problem may become solvable. It is essential, however, that the majority of the measurements provide information about the non-sparse elements of the sparse signal. Bottom: if the matrices $A$ and $\Psi$ are CS-incoherent, then this is ensured. If, however, $A$ and $\Psi$ are highly correlated, then there is a high chance any measurement only informs on zero components of the sparse signal and provides no information. The required number of samples in this case is expected to return to order $n$, as this number of samples are required to ensure all components of the sparse signal are measured.

Download Full Size | PDF

The basis $\Psi$ can be chosen to maximize the sparsity. For example, if the signal is composed of a waveform with only a few frequencies, choosing $\Psi$ as a DFT matrix will transform the signal into the frequency domain, where it will be sparse. Further, it is absolutely possible for the signal to be sparse in the original domain of interest (for example, an image with a uniform background), in which case $\Psi = I$ is a reasonable choice.

If we knew precisely which elements in $X_s$ were sparse, the procedure to use would be straightforward. We could simply truncate $X_s$ to the non-negligible components and perform our inference as before with $s$ instead of $n$ unknowns, under the image formation model $R = A \Psi X_s$. We assume a complete column of $k$ elements in $X_s$ is zero or not for the purposes of counting sparse elements. If one element is non-zero, that entire column is counted as one of the $s$ datapoints we have to infer. If, however, we do not know which components will be sparse, it is essential that each measurement provides some new information about each non-sparse component. This is where multiplexed measurements enter the picture. To ensure that each of our measurements are highly likely to give information about all $s$ components, the measurement scheme $A$ must be constructed in a particular way. It turns out random multiplexing satisfies this requirement for any signal and any sparsity basis, as we will now discuss.

In the theory of compressed sensing, the ability of a sensing scheme to yield information about all the non-zero signal components is quantified by the coherence $\mu$

(10)$$\mu( A, \Psi ) = \sqrt{n} \max_{i,j} \frac{{\boldsymbol{\mathbf{a}}}_i \cdot {\boldsymbol{\mathbf{\psi}}}_j}{ ||{\boldsymbol{\mathbf{a}}}_i||_2 \cdot ||{\boldsymbol{\mathbf{\psi}}}_j||_2}$$

which is the correlation between the rows of $A$ and the basis vectors (columns) of $\Psi$. The term coherence should not be confused with the phase relationship between any waveforms, and to distinguish it we will call it CS-coherence.

Given a pair $A, \Psi$, we can replace Eq. (4) with the lasso regression algorithm,

(11)$$\min_{X_s} || R - A \Psi X_s ||_2^2 + \lambda || X_s ||_1.$$

Here, the $|| X_s ||_1$ term enforces signal sparsity and parameter $\lambda$ determined by the noise level $\epsilon$. In general this parameter is unknown, and set using cross-validation or some other estimate based on observed data. If

(12)$$m \geq c \, \mu^2 \cdot s \, \log n$$

for some constant $c$ that depends on $A$, then a central result of compressed sensing theory demonstrates this program recovers the exact signal with probability $1 - O(e^{- \gamma m})$ with $\gamma\;>\;0$ [10]. If $s$ is small, then often $m \ll n$ measurements give an exact reconstruction. Figure 5 shows this scaling is achieved using the lasso algorithm for a simple sparse reconstruction. This reduction in the number of measurements is the advantage compressed sensing offers for x-ray and electron experiments.

Fig. 5. Example of compressed sensing using the multiplexing schemes discussed. A signal (top left) of 1023 samples was constructed by summing 16 cosine functions with random frequencies (chosen uniformly over the band limit but excluding the DC component) and positive amplitudes from $10 \times |\mathcal {N}(0,1)|$, making the signal sparse in the Fourier domain (top right). The signal was then reconstructed from a variable number of samples using either the ordinary least squares algorithm (bottom left) or the lasso algorithm with $\lambda = 10^{-2}$ for all sensing matrices, except the identity case (which has a factor of $\approx 1024 / 2$ smaller amplitude) for which $\lambda = 1 \cdot 10^{-5}$. Values of $\lambda$ were hand-tuned. Shown are the averages of 10 randomized repeats of the simulation process. On the bottom panels, the leftmost black line indicates $m = s \log n$, and the rightmost line is $m = n$. The Hadamard reconstruction requires additional samples because it is partially coherent with the Fourier basis ($\mu = 20.3$), as compared to the incoherent identity, Bernoulli, and half-Gaussian sensing bases ($\mu = 1.4, 3.4, 3.0$ respectively). Simulations were measurement noise-free, variation seen is a function of the randomness of the simulated true signal only. Python code to reproduce simulations is available in Code File 1 (Ref. [22]).

Download Full Size | PDF

Consider instead trying to use a raster scan ($A=I$) to reveal a naturally sparse signal ($\Psi = I$). Here, $A \Psi = I$ and so $\mu ^2 (A, \Psi ) = n$, confirming the expectation that $n$ independent measurements are required to uniquely infer a signal of size $n$.

Since we wish to presume as little as possible about the sample of interest and its sparsity structure, it is natural to ask: are there sensing matrices $A$ that are CS-incoherent with nearly all sparsity representations $\Psi$? It turns out the answer is yes, and randomly multiplexed schemes, such as the half-Gaussian and Bernoulli matrices discussed in this paper, are examples. It can be shown the random nature of such sensing matrices make it almost certain $\mu \ll n$. In other words, signals that are randomly multiplexed are guaranteed to be CS-incoherent. This holds for any sparsity basis $\Psi$ (specifically $\mu ^2\;<\;c \log n$, and often $\mu =$ const.) [9,10]. This theory shows why multiplexing is so useful in compressed sensing applications – random multiplexing enables us to use any sparsity basis of our choice, thereby maximizing sparsity and the associated advantages of compressed sensing.

As an intuitive example, consider an example where the signal of interest consists of an image with only a few localized features on blank background, but the feature locations are unknown (here, $\Psi = I$). Raster scanning for the features will be slow. In fact for $n$ pixels, $s$ of which are non-blank, we get useful information only a fraction ($s/n$) of the time we perform a raster measurement. Further, we have to scan all the pixels to ensure we don’t miss any feature. Instead, consider conducting measurements where each pixel has a $50\%$ probability of being illuminated in each measurement. On average, we obtain information on fraction $s/2$ of the information-containing pixels per measurement, as compared to the average fraction $s/n$ in the raster scan. Since it is likely $n \gg 2$, this is a more efficient measurement scheme. Further, the probability of missing a given feature goes as $2^{-m}$, which means it is unlikely we miss any feature entirely, even for a modest number of measurements ($<\;0.1\%$ for $m=10$).

We emphasize that by reducing the number of samples below $n$, compressed sensing is the only multiplexing approach which reduces the total dose needed to achieve a given signal-to-noise ratio.

6. Summary

We have presented a framework for assessing the performance of different sampling schemes. In addition to the common rastering method, where a single point in space, time or energy is measured, we have considered multiplex schemes where sets of points are measured simultaneously. In specific cases, the latter can offer advantages in experiment time, sample use, and measurement resolution:

• When per-measurement readout noise is dominant, the Felgett advantage improves the signal-to-noise ratio.
• If the source is naturally random, multiplexed measurements using that randomness may be more convenient or yield higher resolution than producing a raster beam.
• If the resolution at which the multiplex beam can be measured or controlled exceeds that of a raster beam, multiplexing will improve the resolution of the final reconstruction.
• Finally, if the acquired image is known to be sparse in some domain, compressed sensing enables experiments to be conduced with fewer measurements, improving experiment time, radiation dose, and signal-to-noise. Random multiplexing allows one to apply compressed sensing generally, to any signal, with any sparsity basis.

We have proven two facts that may be especially counterintuitive for this budding community. First, the only way multiplexing can be used to mitigate radiation damage or perform more rapid experiments is when it is combined with compressed sensing. If the signal is known to be sparse, the expected number of measurements for the multiplexing schemes scales roughly as $s \log n$, with simulations providing a more accurate final assessment. Second, random multiplex patterns are about as good as the carefully programmed Hadamard sequence – whichever is easier to implement experimentally should be used. Finally, we emphasize an old result that is little known in photon science; there is no Felgett advantage when Poisson noise dominates.

The advantages offered, however, are idiosyncratic to each application. When considering a multiplex sampling scheme, we recommend answering four questions first:

1. Is the experimental goal limited by radiation dose, number of measurements, or resolution?
2. Is it possible to program a multiplex scheme or measure some natural or induced randomness in the measurement to implement multiplexing?
3. What noise is present in your signal of interest?
4. Is your expected signal known to be sparse in some basis? If so, what is the expected sparsity?

With the answers to these questions in hand, Eq. (6) provides the final error expected for each possible experimental setup.

Large capital investments are currently being made in x-ray and electron facilities and instrumentation. It is our belief that compressed sensing offers an under-utilized opportunity to push the capabilities of these new radiation sources, making current experiments more efficient and opening new experimental possibilities. Since compressed sensing and multiplexing go hand-in-hand, this in turn means we anticipate an increased use of multiplex sensing in x-ray and electron science in the near future. Future work is necessary to develop general ways to apply compressed sensing to the wide variety of imaging currently being performed with electrons and x-rays.

Even in cases where sparsity cannot be leveraged, we believe multiplexing offers significant advantages for very specific x-ray and electron imaging experiments. Our hope is that the work presented here helps clearly identify those advantageous cases and implement them.

A. OLS proof

For completeness, we quickly sketch proofs of the OLS solution and error analysis. We use vector notation (k=1) for simplicity, but the results are easily extended. To show the optimality of the estimator (5), we wish to minimize (4),

(13)$$\begin{aligned}|| {\boldsymbol{\mathbf{r}}} - A {\boldsymbol{\mathbf{x}}} ||_2^2 =& \left( {\boldsymbol{\mathbf{r}}} - A {\boldsymbol{\mathbf{x}}} \right)^T \left( {\boldsymbol{\mathbf{r}}} - A {\boldsymbol{\mathbf{x}}} \right)\\ =& \, {\boldsymbol{\mathbf{r}}}^T {\boldsymbol{\mathbf{r}}} - 2 {\boldsymbol{\mathbf{x}}}^T A^T {\boldsymbol{\mathbf{r}}} - {\boldsymbol{\mathbf{x}}}^T A^T A {\boldsymbol{\mathbf{x}}} \end{aligned}$$

differentiation with respect to ${\boldsymbol {\mathbf {x}}}$ yields

(14)$$- 2 A^T {\boldsymbol{\mathbf{r}}} + 2 A^T A {\boldsymbol{\mathbf{x}}}$$

which when set equal to $0$ shows that (4) is minimized when ${\boldsymbol {\mathbf {x}}} = (A^T A)^{-1} A^T {\boldsymbol {\mathbf {r}}}$. Note this estimator is defined for non-square $A$; for such matrices $A^{-1}$ nor $(A^{T})^{-1}$ exist. This operator is also known as the Moore-Penrose pseudoinverse and denoted $A^+ \equiv (A^T A)^{-1} A^T$, a compact notation we will use immediately. To derive the expected error of this estimator under the error model (3),

(15)$${\boldsymbol{\mathbf{r}}} = A {\boldsymbol{\mathbf{x}}} + {\boldsymbol{\mathbf{\epsilon}}}$$

we can compute the residual ($\hat { {\boldsymbol {\mathbf {x}}} } - {\boldsymbol {\mathbf {x}}}$). To do so, multiply by $A^+$ and realize $\hat { {\boldsymbol {\mathbf {x}}} } \equiv A^+ {\boldsymbol {\mathbf {r}}}$ and $A^+ A = I$ to obtain

(16)$$\hat{ {\boldsymbol{\mathbf{x}}} } - {\boldsymbol{\mathbf{x}}} = A^+ {\boldsymbol{\mathbf{\epsilon}}}$$

the expected mean squared error is, by definition, the sum of the squared residuals,

(17)$$\begin{aligned}\mathcal{E}^2 =& \frac{1}{n} \left\langle \sum_i \left( \hat{ x_i } - x_i \right)^2 \right\rangle\\ =& \frac{1}{n}\left\langle (A^+ {\boldsymbol{\mathbf{\epsilon}}}) \cdot (A^+ {\boldsymbol{\mathbf{\epsilon}}}) \right\rangle\\ =& \frac{\sigma^2}{n} \left\langle \mathrm{Tr} \left[ A^{+T} A^+ \right] \right\rangle\\ =& \frac{\sigma^2}{n} \left\langle \mathrm{Tr} \left[ (A^T A)^{-1} \right] \right\rangle \end{aligned}$$

where in the last step we have expanded $A^+ = (A^T A)^{-1} A^T$ and used the circular property of the trace ($\mathrm {Tr} ABC = \mathrm {Tr} BCA$) twice.

B. Traditional ghost imaging

A quick remark is in order concerning the “traditional” algorithm for ghost imaging, as our discussion reveals how this method works and describes its performance. In the field of ghost imaging, the signal ${\boldsymbol {\mathbf {x}}}$ is reconstructed via correlation with “bucket” detector measurements $r_i$ corresponding to many illumination patterns ${\boldsymbol {\mathbf {a}}}_i$. Then the image estimate $\hat {{\boldsymbol {\mathbf {x}}}}$ is obtained via correlation

(18)$$\hat{{\boldsymbol{\mathbf{x}}}} - \langle x \rangle_i = \Big\langle {\boldsymbol{\mathbf{a}}}_i \cdot ( r_i - \langle r \rangle_i ) \Big\rangle_i$$

in matrix notation

(19)$$\hat{{\boldsymbol{\mathbf{x}}}} - \langle x \rangle_i = A^T ( {\boldsymbol{\mathbf{r}}} - \langle r \rangle_i ) \,.$$

However the forward model is still ${\boldsymbol {\mathbf {r}}} = A {\boldsymbol {\mathbf {x}}}$. Thus, the traditional ghost imaging algorithm implicitly assumes $A^TA \approx c_1 I + c_2 J$. The interesting observation is that while in general this assumption is not satisfied, we have shown that error-minimizing multiplexing schemes approach this behavior, explaining the success of the traditional ghost imaging algorithm.

That said, the traditional algorithm is inferior in both theory and practice to the ordinary least squares solution given by Eq. (5). The Gauss-Markov theorem proves this so long as the noise is uncorrelated, zero-mean, and has identical variance for all measurement elements.

Note: to obtain properly normalized estimates, like those presented in Fig. 2, one can use

(20)$$\hat{{\boldsymbol{\mathbf{x}}}} = \frac{n A^T}{\mathrm{Tr} A^T A} {\boldsymbol{\mathbf{r}}}$$

where we have assumed $\hat {{\boldsymbol {\mathbf {x}}}}$ and ${\boldsymbol {\mathbf {r}}}$ are centered (mean-subtracted). If we let $\hat {{\boldsymbol {\mathbf {x}}}} = c A^T {\boldsymbol {\mathbf {r}}}$, this is the form for which

(21)$$c = \min_c || \hat{ {\boldsymbol{\mathbf{x}}} } - {\boldsymbol{\mathbf{x}}} ||_2^2$$

Proof. Let $\delta \equiv (c A^T A - I) {\boldsymbol {\mathbf {x}}}$. Then by definition,

(22)$$\begin{aligned}\epsilon =& \min_c \sum_j \left[ \left( c A^T A - I \right) {\boldsymbol{\mathbf{x}}} \right]^2_j\\ =& \min_c \delta \cdot \delta \end{aligned}$$

expand ${\boldsymbol {\mathbf {x}}} = \sum _i b_i \phi _i$ in the basis of the eigenvectors $\phi _i$ of $G \equiv A^T A$ (a Grammian and therefore orthogonal matrix),

(23)$$\begin{aligned}\delta =& \sum_i b_i (cG - I) \phi_i\\ =& \sum_i b_i (c \lambda_i - 1) \phi_i\\ =& \, {\boldsymbol{\mathbf{x}}} \cdot (c {\boldsymbol{\mathbf{\lambda}}} - {\boldsymbol{\mathbf{1}}} ) \end{aligned}$$

now

(24)$$\delta \cdot \delta = ({\boldsymbol{\mathbf{x}}} \cdot {\boldsymbol{\mathbf{x}}}) \left[ (c {\boldsymbol{\mathbf{\lambda}}} - {\boldsymbol{\mathbf{1}}} ) \cdot (c {\boldsymbol{\mathbf{\lambda}}} - {\boldsymbol{\mathbf{1}}} ) \right]$$

which is minimized for $c = n / \sum _i \lambda _i$.

C. Additive error

The noise model we consider combines noise from three sources: unavoidable Poisson (quantum) noise, per-pixel Gaussian (Johnson type) noise, and finally per-measurement Gaussian (detector readout) noise. Each is shown to be well approximated by a Gaussian distribution. In this appendix we derive the expected variances of each of these types of noise individually; the final noise can be written as the sum of their variances. As a result we obtain that the error on the measurement $r_m = {\boldsymbol {\mathbf {a}}}_m \cdot {\boldsymbol {\mathbf {x}}}_m$ that is a mean-zero normal with variance

(25)$$\sigma^2 = \sigma_p^2 \sum_i a_{ni} + \sigma_x^2 \sum_i a_{ni}^2 + \sigma_m^2$$

Here, $\sigma _p$ gives the magnitude of the Poisson noise relative to the per-pixel $\sigma _x$ and per-readout $\sigma _m^2$ Gaussian variances.

C.1. Poisson error

Consider summing photons recorded on a set of pixels, with each pixel’s photon count distributed as $c_i \sim \mathrm {Pois}(a_{mi} \cdot x_{mi})$,

(26)$$r_m = \sum_i c_i \ \ \mathrm{for} \ \ c_i \sim \mathrm{Pois}(a_{mi} \cdot x_{mi})$$

then $r_m \sim \mathrm {Pois}({\boldsymbol {\mathbf {a}}}_m \cdot {\boldsymbol {\mathbf {x}}}_m)$, since the sum of iid Poissons is also a Poisson. The mean and variance of this distribution are both ${\boldsymbol {\mathbf {a}}}_m \cdot {\boldsymbol {\mathbf {x}}}_m$. For large values of this mean, above $\sim 1000$ (photons), a Gaussian approximation is very accurate, and $r_m \sim \mathcal {N}({\boldsymbol {\mathbf {a}}}_m \cdot {\boldsymbol {\mathbf {x}}}_m, {\boldsymbol {\mathbf {a}}}_m \cdot {\boldsymbol {\mathbf {x}}}_m)$. We could equivalently write

(27)$$r_m = {\boldsymbol{\mathbf{a}}}_m \cdot {\boldsymbol{\mathbf{x}}}_m + \epsilon_m$$

with $\epsilon _m \sim \mathcal {N}(0, {\boldsymbol {\mathbf {a}}}_m \cdot {\boldsymbol {\mathbf {x}}}_m)$. We wish, however, to consider the error for a generic measurement system independent of the structure of the sample under study. To do this we assume ${\boldsymbol {\mathbf {x}}}_m \sim \sigma _p^2 {\boldsymbol {\mathbf {1}}}$, with $\sigma _p^2$ a scale factor. Then the Poisson noise can be approximated by $\epsilon _m \sim \mathcal {N}(0, \sigma _p^2 \sum _i a_{mi})$.

C.2. Per-pixel Gaussian error

First, recall that for $\epsilon _i \stackrel {iid}{\sim } \mathcal {N}(0, \sigma _x^2)$ that $S \equiv \sum _i a_i \epsilon _i$ is distributed $S \sim \mathcal {N}(0, \sigma _x^2 \sum _i a_i^2)$. This means that if we have per-pixel errors $[{\boldsymbol {\mathbf {\epsilon }}}_m]_i \stackrel {iid}{\sim } \mathcal {N}(0, \sigma _x^2)$ so that

(28)$$r_m = {\boldsymbol{\mathbf{a}}}_m \cdot ( {\boldsymbol{\mathbf{x}}}_m + {\boldsymbol{\mathbf{\epsilon}}}_m )$$

we can re-write those errors as a per-measurement error

(29)$$r_m = {\boldsymbol{\mathbf{a}}}_m \cdot {\boldsymbol{\mathbf{x}}}_m + \epsilon_m$$

where $\epsilon _m \sim \mathcal {N}(0, \sigma _x^2 \sum _i a_{mi}^2)$.

C.3. Per-measurement Gaussian error

The simplest case is a per-measurement error, where

(30)$$r_m = {\boldsymbol{\mathbf{a}}}_m \cdot {\boldsymbol{\mathbf{x}}}_m + \epsilon_m$$

and $\epsilon _m \sim \mathcal {N}(0, \sigma _m^2)$ is a scalar additive error.

D. IID sensing matrices

Here we derive the error for sensing matrices with iid components. Let $A = (a_{ij})$ be such a matrix. Per Eq. (6), our task is to evaluate the expected value

(31)$$\left\langle \mathrm{Tr} (A^T A)^{-1} \right\rangle$$

Write the ($n \times n$) Grammian matrix $G = A^T A$ for convenience, the elements of which are

(32)$$g_{ij} = \sum_{k=1}^n a_{ki} a_{kj} = \begin{cases} m \langle a_{ii}^2 \rangle \ \ \mathrm{if} \ \ i=j \\ m \langle a_{ij} \rangle^2 \ \ i \neq j \end{cases}$$

matrix $G$ has diagonal elements $d \equiv m \langle a_{ii}^2 \rangle$ and off-diagonal elements $c \equiv m \langle a_{ij} \rangle ^2$. The eigenvalue equation $G \psi = \lambda \psi$ can be written

(33)$$\lambda \psi_1 = d \psi_1 + n c \psi_2 $$

(34)$$\lambda \psi_2 = d \psi_2 + c \psi_1 + (n-1) c \psi_2 $$

for all $n$ degenerate eigenvalues $\lambda$, which correspond to vectors $\psi$ with a single $\psi _1$ element and $n-1$ elements $\psi _2$. The solution to this system gives $\psi _2/\psi _1 = - 1/n$ and more relevantly

(35)$$\lambda = d-c = n \left( \langle a_{ii}^2 \rangle - \langle a_{ij} \rangle^2 \right)$$

and so

(36)$$\left\langle \mathrm{Tr} G^{-1} \right\rangle = n \lambda^{-1} = \frac{n}{m} \left( \langle a_{ii}^2 \rangle - \langle a_{ij} \rangle^2 \right)^{-1}$$

the terms of this expression are real squared values and therefore positive. This shows that Grammian matrices that approximate the identity matrix, with large values on the diagonal and small off-diagonal matrices, have the lowest error, as expected.

For the half-Gaussian distribution with variance $a^2$, $\langle a_{ii}^2 \rangle = a^2$ and $\langle a_{ij} \rangle ^2 = (2/\pi ) a^2$ so $\left \langle \mathrm {Tr} G^{-1} \right \rangle = n / (m a^2) \cdot (1 - 2/\pi )^{-1}$.

For the Bernoulli distribution that gives value $a$ with probability $p$ and $0$ otherwise, $\langle a_{ii}^2 \rangle = p a^2$ and $\langle a_{ij} \rangle ^2 = p^2 a^2$ so $\left \langle \mathrm {Tr} G^{-1} \right \rangle = (n/m) \left [ a^2 p(1-p) \right ]^{-1}$.

E. Generalized linear models

The loss function (4) is appropriate if the errors of the model are Gaussian. In some cases, however, this is not true and superior results can be obtained using a more appropriate model of the noise. One example that will be familiar to those working in x-ray or electron imaging is the low-flux situation, where error is primarily due to Poisson quantum noise, and the error is skewed as a result.

In such situations, it is possible to employ a generalized model to better model the error. Generalized Linear Models (GLMs) are well studied (as in [28]). We only provide a brief overview here for completeness.

Instead of minimizing the least-squares error, as in (4), one can write a more general log-likelihood function

(37)$$\log \mathcal{L}({\boldsymbol{\mathbf{x}}}) = g(A {\boldsymbol{\mathbf{x}}} | {\boldsymbol{\mathbf{r}}}) + \gamma({\boldsymbol{\mathbf{x}}})$$

Here, $g$ is a “link function” that provides a more general model to represent the likelihood of observing data ${\boldsymbol {\mathbf {r}}}$ given input $A{\boldsymbol {\mathbf {x}}}$, and $\gamma$ is a function capturing (all) prior beliefs that weights ${\boldsymbol {\mathbf {x}}}$ according to their believed likelihood before any data are observed.

The desired solution is then the maximum of the log-likelihood

(38)$$\hat{{\boldsymbol{\mathbf{x}}}} = \mathrm{argmax}_{{\boldsymbol{\mathbf{x}}}} \log \mathcal{L}({\boldsymbol{\mathbf{x}}})$$

Equation (4) is a special case where $g$ is the Euclidean distance given by a Gaussian likelihood and no prior beliefs are asserted ($\gamma = \mathrm {const.}$).

Maximum-likelihood theory shows that the asymptotic distribution of $\hat {{\boldsymbol {\mathbf {x}}}}$ is normally distributed around the true value ${\boldsymbol {\mathbf {x}}}^*$,

(39)$$\hat{{\boldsymbol{\mathbf{x}}}} \sim \mathcal{N} ( {\boldsymbol{\mathbf{x}}}^*, I({\boldsymbol{\mathbf{x}}}^*)^{-1} )$$

with covariance given by the inverse of the Fisher Information

(40)$$I({\boldsymbol{\mathbf{x}}})_{ij} = - \left\langle \frac{ \partial^2 \log \mathcal{L} ({\boldsymbol{\mathbf{x}}}) } {\partial {\boldsymbol{\mathbf{x}}}_i \partial {\boldsymbol{\mathbf{x}}}_j } \right\rangle$$

as previously, the expected mean squared error is

(41)$$\mathcal{E}^2 = \frac{1}{n} \left\langle \sum_i \left( \hat{ x_i } - x_i \right)^2 \right\rangle$$

but $\sum _i \left ( \hat { x_i } - x_i \right )^2$ is the trace of the covariance matrix, so

(42)$$\mathcal{E}^2 = \frac{1}{n} \mathrm{Tr} \, I({\boldsymbol{\mathbf{x}}})^{-1}$$

as a concrete and relevant example, consider the aforementioned Poisson regression case. Here, the link function is the exponential and the log likelihood becomes,

(43)$$\log \mathcal{L}({\boldsymbol{\mathbf{x}}}) = \sum_i \left[ - \exp( A_i \cdot {\boldsymbol{\mathbf{x}}} ) + {\boldsymbol{\mathbf{r}}}_i ( A_i \cdot {\boldsymbol{\mathbf{x}}} ) - \log({\boldsymbol{\mathbf{r}}}_i !) \right]$$

where $A_i$ is the $i^{\mathrm {th}}$ row of $A$, corresponding to the $i^{\mathrm {th}}$ measurement. Taking the first derivative gives

(44)$$\frac{ \partial \log \mathcal{L} ({\boldsymbol{\mathbf{x}}}) } {\partial {\boldsymbol{\mathbf{x}}} } = \sum_i \left[ {\boldsymbol{\mathbf{r}}}_i - \exp( A_i \cdot {\boldsymbol{\mathbf{x}}} ) \right] A_i$$

which, when set to zero, gives a convex equation that can be solved numerically (via e.g. gradient ascent) to find the maximum likelihood solution. The second derivative is

(45)$$I({\boldsymbol{\mathbf{x}}}) = \sum_i A_i A_i^T \exp( A_i \cdot {\boldsymbol{\mathbf{x}}} )$$

One can in principle design an optimal sensing matrix $A$ by minimizing the trace of the inverse of this matrix. If we want to consider such a design before any knowledge of the signal ${\boldsymbol {\mathbf {X}}}$ is know, we might approximate $A_i \cdot {\boldsymbol {\mathbf {x}}} \approx \mathrm {const.}$ in which case we obtain the familiar

(46)$$\mathcal{E}^2 = \mathrm{const.} \cdot \frac{1}{n}\mathrm{Tr} (A^T A)^{-1}$$

Funding

This work was funded by the US Department of Energy, Office of Science under contract DE-AC02-76SF00515.

Acknowledgments

The authors thank Gabriel Marcus for assistance assembling the FEL spectra for use in the soft x-ray self seeding simulation. TJL would like to thank Anders Nilsson for generously hosting him at Stockholm University during the time this paper was written.

Disclosures

The authors declare no conflicts of interest.

References

1. B. A. Yorke, G. S. Beddard, R. L. Owen, and A. R. Pearson, “Time-resolved crystallography using the Hadamard transform,” Nat. Methods 11(11), 1131–1134 (2014). [CrossRef]

2. H. Yu, R. Lu, S. Han, H. Xie, G. Du, T. Xiao, and D. Zhu, “Fourier-Transform Ghost Imaging with Hard X Rays,” Phys. Rev. Lett. 117(11), 113901 (2016). [CrossRef]

3. D. Pelliccia, A. Rack, M. Scheel, V. Cantelli, and D. M. Paganin, “Experimental X-Ray Ghost Imaging,” Phys. Rev. Lett. 117(11), 113902 (2016). [CrossRef]

4. A.-X. Zhang, Y.-H. He, L.-A. Wu, L.-M. Chen, and B.-B. Wang, “Tabletop x-ray ghost imaging with ultra-low radiation,” Optica 5(4), 374–377 (2018). [CrossRef]

5. D. Pelliccia, M. P. Olbinado, A. Rack, A. M. Kingston, G. R. Myers, and D. M. Paganin, “Towards a practical implementation of X-ray ghost imaging with synchrotron light,” IUCrJ 5(4), 428–438 (2018). [CrossRef]

6. S. Li, F. Cropp, K. Kabra, T. J. Lane, G. Wetzstein, P. Musumeci, and D. Ratner, “Electron Ghost Imaging,” Phys. Rev. Lett. 121(11), 114801 (2018). [CrossRef]

7. Y. Y. Kim, L. Gelisio, G. Mercurio, S. Dziarzhytski, M. Beye, L. Bocklage, A. Classen, C. David, O. Y. Gorobtsov, R. Khubbutdinov, S. Lazarev, N. Mukharamova, Y. N. Obukhov, B. R. Oesner, K. Schlage, I. A. Zaluzhnyy, G. U. Brenner, R. R. Oehlsberger, J. von Zanthier, W. Wurth, and I. A. Vartanyants, “Ghost Imaging at an XUV Free-Electron Laser,” arXiv:1811.06855v1 (2018).

8. A. M. Kingston, D. Pelliccia, A. Rack, M. P. Olbinado, Y. Cheng, G. R. Myers, and D. M. Paganin, “Ghost tomography,” Optica 5(12), 1516–1520 (2018). [CrossRef]

9. E. J. Candes and M. B. Wakin, “An Introduction To Compressive Sampling,” IEEE Signal Process. Mag. 25(2), 21–30 (2008). [CrossRef]

10. S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing (Springer, 2013).

11. M. J. Padgett and R. W. Boyd, “An introduction to ghost imaging: quantum and classical,” Philos. Trans. R. Soc., A 375(2099), 20160233 (2017). [CrossRef]

12. J. H. Shapiro, “Computational ghost imaging,” Phys. Rev. A 78(6), 061802 (2008). [CrossRef]

13. N. Sloane, “Multiplexing methods in spectroscopy,” Math. Mag. 52(2), 71–80 (1979). [CrossRef]

14. M. Harwit and N. Sloane, Hadamard Transform Optics (Academic, 1979).

15. K. Kabra, S. Li, F. Cropp, T. J. Lane, P. Musumeci, and D. Ratner, “Mapping photocathode quantum efficiency with ghost imaging,” Phys. Rev. Accel. Beam (2019). (to be published).

16. N. Draper and H. Smith, Applied Regression Analysis (John Wiley, 1998).

17. O. Katz, Y. Bromberg, and Y. Silberberg, “Compressive ghost imaging,” arXiv:0905.0321v2 p. 131110 (2009).

18. C. Zhang, S. Guo, J. Cao, J. Guan, and F. Gao, “Object reconstitution using pseudo-inverse for ghost imaging,” Opt. Express 22(24), 30063 (2014). [CrossRef]

19. S. Jiang, X. Li, Z. Zhang, W. Jiang, Y. Wang, G. He, Y. Wang, and B. Sun, “Scan efficiency of structured illumination in iterative single pixel imaging,” Opt. Express 27(16), 22499–22507 (2019). [CrossRef]

20. A. Förster, S. Brandstetter, and C. Schulze-Briese, “Transforming X-ray detection with hybrid photon counting detectors,” Philos. Trans. R. Soc., A 377(2147), 20180241 (2019). [CrossRef]

21. W. Kühlbrandt, “The resolution revolution,” Science 343(6178), 1443–1444 (2014). [CrossRef]

22. T. J. Lane, “Simulations of multiplexing,” https://gist.github.com/tjlane/35a84123e3df73cea014ff082a5ad0a8 (2019).

23. C. Behrens, F. J. Decker, Y. Ding, V. A. Dolgashev, J. Frisch, Z. Huang, P. Krejcik, H. Loos, A. Lutman, T. J. Maxwell, J. Turner, J. Wang, M. H. Wang, J. Welch, and J. Wu, “Few-femtosecond time-resolved measurements of X-ray free-electron lasers,” Nat. Commun. 5(1), 3762 (2014). [CrossRef]

24. J. MacArthur, J. Duris, Z. Huang, and A. Marinelli, “High power sub-femtosecond x-ray pulse study for the lcls,” in Proc. of International Particle Accelerator Conference (IPAC’17), Copenhagen, Denmark, 14-19 May, 2017, (JACoW, Geneva, Switzerland, 2017), no. 8 in International Particle Accelerator Conference, pp. 2848–2850.

25. N. Hartmann, G. Hartmann, R. Heider, M. S. Wagner, M. Ilchen, J. Buck, A. O. Lindahl, C. Benko, J. Grünert, J. Krzywinski, J. Liu, A. A. Lutman, A. Marinelli, T. Maxwell, A. A. Miahnahri, S. P. Moeller, M. Planas, J. Robinson, A. K. Kazansky, N. M. Kabachnik, J. Viefhaus, T. Feurer, R. Kienberger, R. N. Coffee, and W. Helml, “Attosecond time–energy structure of X-ray free- electron laser pulses,” Nat. Photonics pp. 1–7 (2018).

26. S. Serkez, G. Geloni, S. Tomin, G. Feng, E. V. Gryzlova, A. N. Grum-Grzhimailo, and M. Meyer, “Overview of options for generating high-brightness attosecond x-ray pulses at free-electron lasers and applications at the European XFEL,” J. Opt. 20(2), 024005 (2018). [CrossRef]

27. G. Marcus, W. M. Fawley, D. Bohler, Y. Ding, Y. Feng, E. Hemsing, Z. Huang, J. Krzywinski, A. Lutman, and D. Ratner, “Experimental observations of seed growth and accompanying pedestal contamination in a self-seeded, soft x-ray free-electron laser,” Phys. Rev. Accel. Beams 22(8), 080702 (2019). [CrossRef]

28. W. H. Greene, Econometric Analysis (Pearson Education, 2003), 5th ed.

29. G. Geloni, V. Kocharyan, and E. Saldin, “A novel self-seeding scheme for hard x-ray fels,” J. Mod. Opt. 58(16), 1391–1403 (2011). [CrossRef]

30. J. Amann, W. Berg, V. Blank, F.-J. Decker, Y. Ding, P. Emma, Y. Feng, J. Frisch, D. Fritz, J. Hastings, Z. Huang, J. Krzywinski, R. Lindberg, H. Loos, A. Lutman, H.-D. Nuhn, D. Ratner, J. Rzepiela, D. Shu, Y. Shvyd’ko, S. Spampinati, S. Stoupin, S. Terentyev, E. Trakhtenberg, D. Walz, J. Welch, J. Wu, A. Zholents, and D. Zhu, “Demonstration of self-seeding in a hard-x-ray free-electron laser,” Nat. Photonics 6(10), 693–698 (2012). [CrossRef]

31. D. Ratner, R. Abela, J. Amann, C. Behrens, D. Bohler, G. Bouchard, C. Bostedt, M. Boyes, K. Chow, D. Cocco, F. J. Decker, Y. Ding, C. Eckman, P. Emma, D. Fairley, Y. Feng, C. Field, U. Flechsig, G. Gassner, J. Hastings, P. Heimann, Z. Huang, N. Kelez, J. Krzywinski, H. Loos, A. Lutman, A. Marinelli, G. Marcus, T. Maxwell, P. Montanez, S. Moeller, D. Morton, H. D. Nuhn, N. Rodes, W. Schlotter, S. Serkez, T. Stevens, J. Turner, D. Walz, J. Welch, and J. Wu, “Experimental demonstration of a soft x-ray self-seeded free-electron laser,” Phys. Rev. Lett. 114(5), 054801 (2015). [CrossRef]

32. E. Hemsing, G. Marcus, W. M. Fawley, R. W. Schoenlein, R. Coffee, G. Dakovski, J. Hastings, Z. Huang, D. Ratner, T. Raubenheimer, and G. Penn, “Soft x-ray seeding studies for the slac linac coherent light source ii,” Phys. Rev. Accel. Beams 22(11), 110701 (2019). [CrossRef]

33. T. Driver, S. Li, E. G. Champenois, J. Duris, D. Ratner, T. J. Lane, P. Rosenberger, A. Al-Haddad, V. Averbukh, T. Barnard, N. Berrah, C. Bostedt, P. H. Bucksbaum, R. Coffee, L. F. DiMauro, L. Fang, D. Garratt, A. Gatton, Z. Guo, G. Hartmann, D. Haxton, W. Helml, Z. Huang, A. LaForge, A. Kamalov, M. F. Kling, J. Knurr, M.-F. Lin, A. A. Lutman, J. P. MacArthur, J. P. Marangos, M. Nantel, A. Natan, R. Obaid, J. T. O’Neal, N. H. Shivaram, A. Schori, P. Walter, A. Li Wang, T. J. A. Wolf, A. Marinelli, and J. P. Cryan, “Attosecond transient absorption spooktroscopy: a ghost imaging approach to ultrafast absorption spectroscopy,” Phys. Chem. Chem. Phys. (2020).

	Dose ( $m$ meas.)	Error (per channel, $m$ meas.)	Dose-Error Product
Raster	$a m$	$σ / (a \sqrt{m n})$	$σ \sqrt{m / n}$
Half Gauss	$\sqrt{2 / π} \cdot a m n$	$1.66 \cdot σ / (a \sqrt{m})$	$1.32 \cdot σ n \sqrt{m}$
Bernoulli	$(1 / 2) \cdot a m n$	$2 \cdot σ / (a \sqrt{m})$	$σ n \sqrt{m}$
Hadamard $^{‡}$	$(1 / 2) \cdot a m n$	$2 \cdot σ / (a \sqrt{m})$	$σ n \sqrt{m}$

What are the advantages of ghost imaging? Multiplexing for x-ray and electron imaging

Abstract

1. Introduction

2. Framework

2.1 Noise model and transferred noise

2.2 Inversion stability: trace term

2.3 Radiation dose

3. The analysis of some examples

3.1 Raster scans

3.2 Multiplex scans

3.3 Comparison

4. Experimental opportunities for multiplex sensing

4.1 Resolution improvement

4.2 Felgett’s advantage: more signal per measurement

4.3 Signal-to-noise per dose

4.4 An example of applied multiplexing: XFEL self-seeding

5. Compressed sensing

6. Summary

A. OLS proof

B. Traditional ghost imaging

C. Additive error

C.1. Poisson error

C.2. Per-pixel Gaussian error

C.3. Per-measurement Gaussian error

D. IID sensing matrices

E. Generalized linear models

Funding

Acknowledgments

Disclosures

References

Cited By

Figures (5)

Tables (1)

Equations (46)

Optics Express