Multi-Scattering software: part I: online accelerated Monte Carlo simulation of light transport through scattering media

Joakim Jönsson; Edouard Berrocal

doi:10.1364/OE.404005

1. Introduction

Scattering media are innumerable. Human tissues, blood samples, turbid water, beverages such as milk or unfiltered beer, under water environments, smoke, fog, mist, clouds as well as spray systems are only a few examples that could be cited. When probing a scattering medium using a beam of visible light, photons interact several times with the randomly distributed scattering centers. This multiple scattering of light is a complex phenomenon, commonly encountered but rarely desired. In imaging it induces strong blurring on the recorded photographs reducing or even fully suppressing visibility. Despite the large variety of natural and industrial examples of scattering environments (see Fig. 1) biomedical optics and medicine remains the research fields with the strongest interest in the modeling of light transport. The most versatile and flexible way of modeling photon transport through scattering media is by means of the Monte Carlo (MC) method. The main concept behind a MC algorithm is to break down a complex problem into a series of smaller sets of probability calculations which are repeated a large number of times. The different possibilities/paths are then sampled using random numbers and a succession of probability density functions. This random sampling process must be sufficiently repeated in order to reduce statistical fluctuations and obtain converging results. Due to the large improvements in computer performance and the trend towards the adaptation of general-purpose computing on graphics processing units (GPU), modern MC codes can now simulate billions of photon packets within a reasonable time frame. This has led to the possibility of obtaining realistic simulations within complex three-dimensional geometries. In this article we are presenting a multi-purpose, GPU accelerated MC software called $Multi$-$Scattering$, which runs on a server located at Lund University. The core of the code originates from a numerical MC model developed by Berrocal in 2006 [1] where inhomogeneous scattering media can be considered by decomposing the simulated volume into voxels. While originally developed to simulate light propagation through clouds of droplets, the model has been extended here to consider tissue-like media. Other developments include the capability of simulating image formation, as well as the ability of parallel computing using CUDA programming. In addition, the software is highly flexible and user-friendly by means of web page interfaces. It can easily be employed for a variety of studies including: 1) Estimating photon path-length or time-of-flight to obtain quantitative measurements from near-infrared spectroscopy based techniques. 2) Calculating the fluence distributions in tissues for photon delivery used in photodynamic therapy. 3) Developing and optimizing new optical instruments for quantitative measurements of particle/droplet size and concentration. 4) Predicting the contribution and correcting for the effects of multiple light scattering. 5) Testing and comparing optical filtering strategies to improve image contrast. 6) Analyzing visibility issues and quantifying image transfer when imaging through scattering media.

Fig. 1. Examples of scattering media encounter in medicine (e.g. biomedical tissues), in the industry (e.g. beverages, spray systems) as well as in natural environments (e.g. turbid water, fog). Each medium is characterized by its own optical properties.

Download Full Size | PDF

The article is organized as follows: In section 2, a general description of $Multi$-$Scattering$ is given together with the specific features of the software. In section 3, the related terminology and definitions are provided. In section 4 the successive steps implemented in the code are detailed. Section 5 focuses on the parallelization of the code and its acceleration using CUDA. Section 6 shows the results of simulation speed over a set of different hardware and simulation configurations. Finally section 7 provides a typical example of inhomogeneous medium, where light propagation through a portion of a human head is simulated.

1.1 Monte Carlo simulations in biomedical tissues

In nuclear medicine MC simulations were first applied as early as the 1960s. This work included either the estimation of internal dosimetry or the optimization of instruments used for clinical nuclear counting and imaging. Interactions associated with x-rays, gamma rays, electrons, positrons, beta particles, and alpha particles are all being simulated in nuclear medicine.

It was not until 1983 that the first use of MC simulation for the study of light propagation through biomedical tissues was reported by Wilson and Adam [2]. The technique was applied to light dosimetry in order to predict and optimize light delivery for photodynamic therapy. Following this initial work, two notable Ph.D. thesis were written, the first by Prahl [3] in 1988 and the second by Keijzer [4] in 1993. The two went on to co-author several publications [5,6]. In 1989, one of the first comparisons of MC results with results from the diffusion equation was provided by Flock et al. [7], demonstrating the limits of the diffusion theory to accurately predict the fluence distribution through soft tissues. Based on the work of Prahl and Keijzer, Wang et al. introduced the possibility of considering multiple tissue layers, enabling more realistic simulations. This new code, developed in 1994, was named MCML short for Monte Carlo Multi-Layered [8]. Thanks to the detailed description of the code and the open access to the source code, MCLM has become the most popular and widely used model for the simulation of photon transport through skin tissues. In the early 2000’s, following up the multi-layer simulation configuration, Meglinski and Matcher developed a MC code taking into account successive wavelengths to deduce the absorption and reflectance spectra from human skin [9]. This was, then, further adapted by Churmakov for studying the localization of fluorescence within skin tissues [10].

In order to simulate highly heterogeneous scattering media, Boas et al. presented a MC code in 2002 based on the use of cubic voxels instead of infinite layers [11]. By using a three-dimensional matrix consisting of 151x171x232 voxels the authors simulated light propagation in a human head where the temporal evolution of photon migration could be visualized. To obtain even more realistic 3D shapes and contours of complex tissue structures, tetrahedrons were used by Shen and Wang in 2010 as part of a software project called TIM-OS. Similarly, and during the same year, Fang proposed a mesh-based method in a model referred to as MMCM [12].

In 2008, a breakthrough in the field of MC modeling of photon transport resulted in the possibility of significantly increasing the simulation speed by running the MC code on GPUs, as proposed by Alerstam et al. [13]. An approach known as general-purpose computing on graphics processing units (GPGPU). The article demonstrated a computational acceleration of up to three orders of magnitude compared with MC simulation performed on a single standard processor. Based on the same approach, where the code is implemented in CUDA and run on NVIDIA GPUs, Fang and Boas demonstrated in 2009 an acceleration by a factor of 300 [14]. In 2010, the GPU acceleration of the MCML code was demonstrated by Alerstam, Lo et. al [15] where a solution to tackle performance bottleneck caused by the inefficiency of atomic access to the GPU memory was proposed.

In 2011, a GPU accelerated MC simulation, coupled with a webpage interface for online access was proposed by Doronin and Meglinski [16]. Another GPU accelerated MC code, FullMonte, was introduced by Cassidy et al. in 2013 [17] and an optimized version of it has been released in 2019 [18]. Similarly, Fang and Yan [19] have also accelerated the MMC (Mesh-based Monte Carlo) code in 2019 using OpenCL computing framework allowing simulations to run on a wide range of CPUs and GPUs, showing scalability to accommodate increasingly more powerful GPU architectures.

1.2 Monte Carlo simulations in particulate scattering media

Particulate scattering media are defined as macroscopic media composed of randomly positioned particles [20]. They differs from biomedical tissues as the scattering centers originates from isolated particles and not from inhomogeneities of matter structure. Thus, the number density and extinction cross-sections of the scatterers (e.g. solid particles, droplets and/or molecules) should be known to define the extinction coefficient, which is usually not the case for biomedical tissues. In this sub-section, the term particulate media includes most scattering media that are not medical tissues.

The earliest applications of MC simulation of light propagation through particulate scattering media are from the 1960s and were related to the field of Atmospheric research. In 1965, Collins and Wells wrote a confidential report [21] presenting two MC procedures for studying the transport of light through the earth’s atmosphere under various environmental conditions. In 1968, Plass and Kattawar simulated multiple light scattering of visible light by clouds where the scattering phase function was obtained from the Lorenz-Mie theory [22]. Soon after, the same authors generated several articles using their MC code to deduce the radiance of multiple scattered light for a typical haze [23], for an atmosphere-ocean system [24] as well as for investigating time-of-flight Lidar measurements in oceans [25]. In 1973, Funk used MC simulations to determine the effects of multiple scattering on the performance of undersea optical imaging systems [26]. The same year, Bucher reported computer simulations of light pulse propagation through clouds in the context of earth-satellite communication [27].

From 1983 and over a period of 30 years Bruscaglioni and Zaccanti co-published an extensive number of articles on MC simulations for various applications including: the transmission of a light beam through a turbid atmosphere [28], the effect of a turbid medium on the Modulation Transfer Function of an optical system [29], the experimental validation of the MC procedure [30] and the calculations of Lidar returns [31]. In parallel to this work, a MC model called MUSCAT, originally developed by Briton during his Ph.D. thesis in 1989, was capable of dealing with a broad field of applications [32], including visibility predictions in foggy weather. More specifically, in 1994 Rozé et al. [33] have used the code for investigating the sighting distance through fog for real driving situations, considering the presence of street lamps and assuming the road to be perfectly absorbing. In 1991, Roysam et al. presented a MC approach for computationally simulating the propagation of light through smoke mixtures in the context of visualizing lighted exit signs during fire situations [34].

In 2005, Berrocal et al. presented a new MC model for light propagation in highly inhomogeneous polydisperse turbid media with applications in spray diagnostics [35,36]. The simulation in [36] considered 73x73x50 voxels and 25 different scattering phase functions to model an inhomogeneous polydisperse spray system typically used for combustion applications. The code has been validated with experimental results in 2007 [37] and an analysis of the contribution from individual scattering orders was shown in 2009 [38].

This model has been further developed and largely extended as presented in this article. In addition to particulate scattering media, the model can now consider tissue-like media making it truly multi-purpose. To this end, the model has been accelerated via parallel computing and part of the original code has been converted from C programming language to CUDA in order to run the simulations on NVIDIA graphic cards; in a similar fashion to that shown by Alerstam et al. [13]. A detailed description of the model and an analysis of its performance are given in the following sections.

2. Multi-Scattering: a multi-purpose MC software

2.1 General description

$Multi$-$Scattering$ runs MC simulations of photon transport through turbid media from a server located at Lund University. Several web pages have been created to set-up a simulation, display the results as well as to organize, manage and run the simulations. It can be accessed at: https://multi-scattering.com/. Thanks to this web interface, simulation schemes can be rapidly designed and set-up without any prior programming knowledge. In addition, this configuration avoids the need for software installation as well as any compatibility issues with GPU cards (as only Nvidia cards have full support for CUDA). Those user-friendly features allow a quick and easy access to accelerated MC simulations to any researcher. The simulated results can be downloaded and the raw data is directly accessible. A screenshot of the web page displaying an example of output results is given in Fig. 2. Using this output interface, several tabs can be open in parallel facilitating the comparison of resulting images. Note that the display of the results has been optimized for the Firefox and Chrome browsers.

Fig. 2. Screenshot of a simulation output displayed online. In this example, a 600 nm laser pulse of 1 ps duration is being scattered by a cloud of water droplets of 1 $\mu$m diameter, suspended in air. The scattering volume is a homogeneous cube a 50 mm diameter. The spatial intensity distribution on the front (Ymax) face is given with the corresponding temporal profile.

Download Full Size | PDF

Figure 3 illustrates how $Multi$-$Scattering$ is operated. First, a user online account is needed to login as shown in (a). This gives access to several default simulations, which have been pre-configured for various studies as listed in (b). Thus, the settings of a default simulation, which is the closest to a desired case of study, can be copied directly and modified accordingly. Changing the input data can be done via several libraries, shown in (c), where the incident light sources, the duration of the incident pulse, the time gate, the use of masks for filtering photons exiting the medium (e.g. for light detection using optical fibers) and a desired scattering phase function can be chosen. Using those libraries allows to setting-up rapidly a simulation while avoiding the risks of errors by including input data in a wrong format that cannot be processed.

Fig. 3. Screenshot showing how $Multi$-$Scattering$ is organized. (a) is the login interface, (b) is the list of default simulation pre-configured (c) shows the libraries where various input data are stored and can be picked up and (d) is the list of simulation setup by the user with the corresponding status of the simulation progresses.

Download Full Size | PDF

Finally once a simulation is configured it is saved in the user’s list of simulation, as shown in Fig. 3(d), where it is ready to be run. By default, the number of photons sent per simulation occasion is one billion with an upper limit of 10 billion. More photons can be added by incremental steps to already completed simulations. This is a convenient feature as a simulation does not need to be restarted to improve the statistics. Similarly, any simulation can be stopped while running, without any loss of processed data. In this case, the output results are saved with the number of photons launched until the simulation was aborted. Each simulation runs on a single high-performance computer consisting of four Nvidia graphic cards (Nvidia GeForce GTX 1080 Ti). Three computers are available on the server at Lund University allowing for three simulations to run simultaneously without time losses. Depending on the demand for $Multi$-$Scattering$, the number of computers will be increased accordingly. Further increases in simulation speed will also be made possible by upgrading the GPU’s. To reduce the time for readjusting a detection setting and rerunning a given simulation, the software has the capability of up to 20 detection schemes. This allows, a single run, to produce multiple results where photons are sorted out differently, as a function of specified detection characteristics. For each detection case, the optical configuration including the dimension and position of the collecting lens, the duration of the incident pulses, the application of time-gating and the characteristics of a detection mask are specified. Thus, allowing for the results from multiple optical configurations to be directly compared from a single run.

2.2 Features of the simulations

The implementation of MC simulations from $Multi$-$Scattering$ can be summarized as follows: First, photon packets start their journey from a light source. The angular and spatial distribution of the light source can be adjusted according to the purpose of the simulation. Photon paths are traced within a simulated volume where the photon packet undergoes successive scattering/absorbing events until they exit the simulated volume. Random samples are taken from successive probability distributions which govern various interactions and transitions. Once a photon packet exits the simulated volume the parameters of interest are extracted from its traveling history. These steps are repeated for a large number of photons until sufficient statistics are accumulated.

Historically, the high computational requirement has been the main limiting factor against the use of MC simulations for light transport through scattering media. However, this repetition process makes MC simulations highly suitable for parallel processing. It was not until the advent of general-purpose computing on graphics processing units, that MC simulations became possible even for very challenging scattering media. Note that the need for launching more photons increases in situations where only a small fraction of the total simulated light is collected and analyzed. This light collection is usually a lens, an optical fiber or a detector.

The MC method is based on the assumption that scattering events are taking place at random locations in the simulated volume, meaning that the scattering centers do not have a static position. Thus, to produce simulated results that are comparable to experimental data the investigated medium must either be sufficiently turbid, so that the overall scattering process can be assumed random (e.g. medical tissues), or must contain scattering particles that are in motion (e.g. particulate scattering media) and where averaged data is recorded (averaging over long exposures or averaging over a large number of independent snapshots).

In $Multi$-$Scattering$, the entire scattering medium is divided into voxels with uniform dimensions and with each voxel characterized by its optical properties. Currently a maximum of 20x20x20 voxel matrix is allowed on the software version available online. This restriction is due to the complexity of handling input data on the web interface, as the optical properties of each voxel must be set manually. A larger number of voxels will be extended in the future for the reconstruction of more complex inhomogeneous scattering structures involving large amounts of input data. The overall simulated volume is, by default, always a cubic structure in order to simplify the online results visualization and display. However, voxels can be considered "empty" by setting the extinction coefficient to zero ($\mu _e=0$), so any non-cubic shape can be modeled. Each voxel is characterized by its own optical properties including: The extinction coefficient (used to determine the location of scattering/absorbing centers), the albedo (used to reduce photon weight as a function of absorption) and the scattering phase function (used to determine a new direction after a scattering event). Those quantities are defined and detailed in the next section. In addition to multiple voxels, $Multi$-$Scattering$ has various other features, as described in Fig. 4, including the possibility of focusing the incident beam, of analyzing temporally resolved data and of visualizing photon path and energy deposition within the simulated volume.

Fig. 4. Specific features of $Multi$-$Scattering$ where the simulated volume is decomposed into voxels. Medical tissues and particulate scattering media can both be considered, making it a versatile simulation tool accessible online.

Download Full Size | PDF

3. Terminology and definitions

The optical properties characterizing scattering and absorbing media are defined in this section with the terminology used in this article. Those properties are either employed in the MC code as input data or calculated through the simulation.

3.1 Extinction, scattering and absorption

When a collimated light beam of given wavelength and initial direction is crossing a homogeneous medium containing scattering and absorbing centers, the transmitted light intensity $I_t$ can be deduced using the Beer-Lambert law such as:

(1)$$I_t=I_ie^{-\mu_e\cdot l} \qquad \qquad \textrm{where} \qquad \qquad \mu_e=\mu_s+\mu_a$$

Thus, the incident light intensity $I_i$ is exponentially reduced as a function of the length $l$, in mm, traveled by the light through the medium and of the extinction coefficient $\mu _e$, in mm$^{-1}$, which is the sum of the scattering and absorption coefficients, $\mu _s$ and $\mu _a$ respectively. Assuming that the medium contains distinct scattering/absorbing centers, such as particles, droplets or molecules, then:

(2)$$\mu_e=N\cdot\sigma_e \qquad \qquad \textrm{and} \qquad \qquad \sigma_e=\sigma_s+\sigma_a$$

where $\sigma _e$ is the extinction cross-section in mm$^2$ and $N$ is the number density of scattering/absorbing centers in $\#$/mm$^3$. At each interaction between light and a center, the fraction of light that is scattered over the total extinction is called the single scattering albedo, $\omega$, and is calculated such as:

(3)$$\omega = \frac{\mu_s}{\mu_s+\mu_a} = \frac{\sigma_s}{\sigma_s+\sigma_a}$$

For non-absorbing media $\omega =1$ and for highly absorbing media $\omega$ tends to 0. For example, in the visible light spectrum soot particles are mostly absorbing, water droplets are mostly scattering and red blood cells are both scattering and absorbing centers. The mean free path length corresponds to the average distance between two scattering/absorbing events. It is calculated from the inverse of the extinction coefficient:

(4)$$\overline{l}_{fp}=\frac{1}{\mu_e}$$

By dividing the total length $l$ of a turbid medium with its mean free path length $\overline {l}_{fp}$, the average number of scattering events along $l$ is deduced. This average number of scattering events is known as the optical depth and denoted in this article as $OD$ (also commonly denoted as $\tau$ in the literature):

(5)$$OD=\frac{l}{\overline{l}_{fp}}=\mu_e\cdot l$$

As illustrated in Fig. 5 turbid media can be categorized from their optical depth and albedo. Based on the optical depth three scattering regimes can be identified. If the optical depth is smaller than one, $OD \leqslant 1$, then the $single$ $scattering$ $regime$ is assumed where single scattering events are dominant. If the optical depth ranges between $2 \leqslant OD \leqslant 9$, the $intermediate$ $scattering$ $regime$ applies where visibility is reduced. This corresponds to the transition between seeing an object through a scattering medium, to not seeing it. In this regime the dominant scattering order is usually close to the $OD$ value especially for forward scattering phase functions. Note also that the Modulation Transfer Function of an imaging system through scattering media can be calculated in this regime [29]. If now $OD \geqslant 10$, then the $multiple$ $scattering$ $regime$ is assumed. In this case visibility is lost. Retrieving visibility can be done via the use of some filtering strategies such as time-gating [39], Fourier spatial filtering and/or using Structured Illumination based techniques [40]. For $OD \geqslant 15$ retrieving visibility is extremely challenging. In the multiple scattering regime the scattering orders tend to contribute nearly equally, allowing the diffusion approximation to be applicable. This is the case when light is crossing a few millimeters of medical tissues only.

Fig. 5. Classification of turbid media as a function of the optical depth $OD$ and single scattering albedo $\omega$. When $OD \leqslant 1$, then the $single$ $scattering$ $regime$ is assumed where most photons are unscattered (ballistic light). When the optical depth ranges between $2 \leqslant OD \leqslant 9$, the $intermediate$ $scattering$ $regime$ applies involving blurring effects. At $OD \geqslant 10$, the $multiple$ $scattering$ $regime$ is assumed and visibility is lost. Scatterers can be mostly absorbing $\omega \sim 0$ (e.g. soot particles), mostly scattering $\omega \sim 1$ (e.g. water droplets), or both absorbing and scattering $0 < \omega < 1$ (e.g. blood).

Download Full Size | PDF

3.2 The Radiative Transfer Equation

The radiative transfer theory is based on the energy conservation of incoming, outgoing, absorbed and scattered photons within an infinitesimal volume element. The optical properties of the volume element are assumed homogeneous and the central equation describing the average transport of photons through such volume is known as the Radiative Transfer Equation (RTE) (or equation of radiative transfer). An illustration describing the RTE is given in Fig. 6. In the RTE only quantities of power or intensity are considered. Light propagation is envisioned as a photon stream and correlations between the radiation fields such as interference are neglected. Such assumptions can only be assumed if the wavelength of the incident radiation is small when compared to the dimensions of the scattering medium and for well separated scattering centers (independent scattering). For random scattering media, these conditions are respected. The RTE can be described as follows: The change of radiance along an incident direction corresponds to the loss of radiance due to the extinction of the incident light plus the amount of radiance that is scattered from all other directions into the incident direction. As previously defined, light extinction equals the loss of radiance due to scattering of the incident light in all other directions, plus the loss of the radiance due to absorption.

(6)$$\frac{1}{C}\frac{ \partial I(\vec r,\vec s,t)}{ \partial t} = \ - \mu _s I(\vec r,\vec s,t) \ - \mu _a I(\vec r,\vec s,t) \ + \mu _s \int_{4\pi} f(\vec s',\vec s) I (\vec r,\vec s,t) d \Omega '$$

Fig. 6. Visual description of Radiative Transfer Equation given in Eq. (6).

Download Full Size | PDF

The RTE is given above, in Eq. (6), where $t$ is time, $\vec r$ is the vector position, $\vec s'$ is the incident direction of propagation, $f(\vec s',\vec s)$ is the scattering phase function derived from the appropriate scattering theory (e.g. Lorenz-Mie or Rayleigh-Gans theory), $d \Omega$ is the spanning of the solid angle and $C$ is the speed of the light in the surrounding medium. While the RTE is applicable for a wide range of turbid media, analytical solutions are only available in rather simple circumstances where assumptions and simplifications are introduced. For realistic cases of scattering media numerical techniques have been developed and utilized to solve the RTE. The most versatile and widely used numerical solution is based on the statistical MC technique.

3.3 Scattering phase function and anisotropy factor

By definition, the scattering phase function $f(\vec s',\vec s)$ is the angular distribution of light intensity scattered by a scattering center, such as a molecule, particle, cell, or droplet, at a given wavelength. Note that this is a misleading term as $f(\vec s',\vec s)$ has nothing to do with the phase of a light wave. The parameters governing the scattering phase function are the incident light characteristics (wavelength, polarization state, intensity profile), the optical properties of the surrounding medium (external refractive index) and the characteristics of the scattering center (size, shape, refractive index, orientation). The scattering phase function is usually given under its normalized form:

(7)$$\int_{4\pi} f(\vec s',\vec s) d \Omega ' = 1$$

When the scattering phase function is limited to the scattering plane, it is defined as $f(\theta _s)$, where $\theta _s$ is the polar scattering angle as described in Fig. 7 . For axial symmetry the scattering probability is homogeneous over the azimuthal angle $\phi _s$, defined between 0 and 2$\pi$ and the scattering phase function depends only on its scattering angle $\theta _s$ defined between 0 and $\pi$. This is the case for spherical particles/droplets and for scattering phase function derived from the Henyey-Greenstein theory [41]. Three examples of Lorenz-Mie scattering phase function calculated using the computational script provided by Bohren and Huffman in [42] are given in Fig. 8 (logarithmic scale). Here the incident light is assumed unpolarized at wavelength $\lambda = 600$ nm. For the 10 $\mu$m droplet, a highly forward scattering lobe is visible while for the 0.1 $\mu$m case light scatters more homogeneously corresponding to the Rayleigh scattering regime. Scattering phase functions can also be averaged over several wavelengths, when the incident light consists of a broad light spectrum, and/or when considering a distribution of particles. In the case of a homogeneous distribution of spherical particles of diameter $D$, the representative averaged scattering phase function $\overline {f}(\theta _s)$ is calculated such as:

(8)$$\overline{f}(\theta_s)=\frac{\int^\infty_{D=0}n(D)\cdot\sigma_s(D)f(D,\theta_s)\cdot dD} {\int^\infty_{D=0}n(D)\cdot\sigma_s(D)\cdot dD}$$

To quantify the amount of light that is scattered in the forward direction from a scattering phase function, the mean cosine of the scattering angle can be calculated, corresponding to the so-called anisotropy factor $g$:

(9)$$g=\int^{\pi}_{0}f(\theta_s) \cdot cos(\theta_s) \cdot 2 \pi \cdot sin(\theta_s) d\theta_s \qquad \textrm{where} \qquad \int^{\pi}_{0} f(\theta_s) \cdot 2\pi \cdot sin(\theta_s) d\theta_s=1$$

where $-1 \leqslant g \leqslant 1$. Isotropic scattering occurs for $g=0$ while a value near 1 indicates highly forward scattering. The anisotropy factor can also be used to define the scattering phase function via the Henyey-Greenstein equation [41], defined as:

(10)$$f(\theta_s)=\frac{1}{4\pi}\frac{1-g^2}{[1+g^2-2g \cdot cos(\theta_s)]^{3/2}}$$

In $Multi$-$Scattering$ pre-calculated Henyey-Greenstein scattering phase functions can be chosen as input data. Finally, a quantity known as the reduced scattering coefficient is deduced using $g$ such as:

(11)$$\mu_s' =\mu_s\cdot (1-g)$$

Note that $\mu _s'$ aims at reducing the calculation time from MC simulations by extending the mean free path length when forward scattering phase functions are considered. Despite its merits this approximation doesn’t allow for the consideration of complex scattering phase functions and can only be applied within the multiple scattering regime. For those reasons, the reduced scattering coefficient is not used in $Multi$-$Scattering$ and $\mu _s$ is employed by default.

Fig. 7. Directions and angles of a photon being scattered in the local UVW coordinate system. The polar and azimuthal scattering angles $\theta _s$ and $\phi _s$ are shown on the left side while the solid angles $\Omega _s'$ and $\Omega _s$ around the respective incident and scattered vectors are shown on the right side.

Download Full Size | PDF

Fig. 8. Examples of polar scattering phase functions (logarithmic scale) for three sizes of spherical water droplets of refractive index $n_p=1.333-0.0i$ suspended in air of refractive index $n_m=1.000293-0.0i$. The incident light is assumed to be unpolarized and monochromatic at wavelength $\lambda = 600$ nm. As the size of the droplet increases, light is significantly scattered in the forward direction and scattering ripples are appearing.

Download Full Size | PDF

4. Description of the successive steps and algorithm

4.1 General simulation algorithm

The simulation algorithm used in the $Multi$-$Scattering$ software is divided into two sections:

- The $Monte$ $Carlo$ $code$ that tracks photon packets through the simulated scattering medium.

- The $Detection$ $code$ that selects the desired photon exiting the scattering medium and simulates image formation using a ray-tracing approach.

The general algorithm of the software is shown in Fig. 9. When a simulation is initiated, the $Monte$ $Carlo$ $code$ starts and the initial position of photons is defined on a face of the simulated volume using the desired light source matrix and photon direction. Then, the distance to the next scattering/absorption interaction center is calculated. If the interaction center is within the boundaries of the simulated volume, the $Monte$ $Carlo$ $code$ continues tracking photons; otherwise photon packets leaves the simulated scattering volume and the simulation continues to the $Detection$ $code$ as shown on the right hand side of the flow chart in Fig. 9. Then, depending on their properties photons are sorted by applying several filtering conditions, according to the desired detection case. Finally, various detection cases, set by the user prior to starting the simulation, are processed, resulting in the formation of image data. Results can be opened in the output web page where they are displayed and can directly be analyzed.

Fig. 9. Flow chart describing the algorithm of the Multi-Scattering simulation. The algorithm is divided into two successive codes: to the left, the $Monte$ $Carlo$ $code$ and, to the right, the $Detection$ $code$.

Download Full Size | PDF

4.2 Launching photon packets

The simulation starts by launching photon packets from an image. This two-dimensional light source matrix is both scaled and positioned on the surface of the scattering medium as desired. It can either be uploaded by the user or alternatively, selected from pre-defined images proposed in the "Light Source Library". Using this approach any experimental light source can be faithfully simulated, allowing for closer comparisons with experimental results as shown in [43]. Prior to launching photons, the source matrix is normalized and the probability weight of photons for each pixel is calculated accordingly. The pixel from which a photon packet is launched is deduced using two random numbers. By generating two more random numbers, a random point is defined within the boundaries of the chosen pixel. Finally, the absolute starting position on the face of the scattering medium is calculated by taking into consideration the size and position of the source matrix and the dimensions of the scattering medium. In contrast to sequential launching, this random positioning of the photons starting point is beneficial in preserving the intensity profile of the source in situations where a simulation is stopped prior to completion. Using this approach, results from aborted simulations remain meaningful as the entire light source is being statistically reconstructed. Furthermore, thanks to this feature additional photons can be launched to an already completed simulation. Once the starting photon position is chosen, the initial direction of photon propagation must also be defined by setting a value to the polar $\Theta$ and azimuthal $\Phi$ angles in the main coordinate system (XYZ). Depending on the directional features of the light source, two different approaches are available:

1) By setting up a photon dispersion cone angle: In this case photons are given a random direction vector that is uniformly constrained within the cone angle defined by $\theta _{start}$. If this angle equals zero, a perfectly collimated beam is assumed resulting in the initial direction being a vector along one direction X, Y or Z. If, however, the angle is close to 90 degrees then a highly diffusing source is being simulated. Semi-directional light sources, such as LEDs, can easily be defined by setting up $\theta _{start}$ according to the illumination angle specifications.

2) By setting up a focusing distance: In this case, a focal point is defined by the user. This distance can either be positive for a focusing beam or negative for a diverging beam. The initial vector is pointing towards the focus point and photon direction is calculated accordingly. Thus, the incident angle differs significantly depending on photon position on the source matrix. A useful application of this feature is the focusing of a laser beam into medical tissues.

Based on the chosen option and on the inserted input data, the angles $\Theta _i$ and $\Phi _i$ defining the incident directions of a photon packet are deduced accordingly. Finally, an initial weight equal to one, $W_0=1$, is set to a photon packet prior to launching.

4.3 Determination of the free path length

The free path length $l_{fp}$ is the distance of propagation until an interaction event, scattering and/or absorption, occurs. The cumulative distribution function for the mean free path length is derived from the Beer-Lambert law and equals:

(12)$$C(l_{fp})=1-e^{-\mu_e\cdot l_{fp}}$$

Yielding to the probability density function of a photon packet to be scattered and/or absorbed after traveling the distance $l_{fp}$:

(13)$$P(l_{fp})=\mu_e\cdot e^{-\mu_e\cdot l_{fp}}$$

Following the process of random sampling and considering a given random number $\xi _1$ where $\xi _1 \in (0,1)$ and the corresponding distance $l_1$, it is deduced that:

(14)$$\int_0^{l_1}P(l_{fp})\cdot d(l_{fp})=\xi_1$$

Which leads to

(15)$$\int_0^{l_1}\mu_e\cdot e^{-\mu_e\cdot l_{fp}}\cdot d(l_{fp})=\xi_1$$

Finally,

(16)$$1-e^{-\mu_e\cdot l_1}=\xi_1 \qquad \qquad \textrm{and} \qquad \qquad l_1=-\frac{\ln(1-\xi_1)}{\mu_e}$$

However, as $(1-\xi _1)$ equals another random number uniformly distributed between 0 and 1, the free path length $l_{fp}$ between two light-particle interactions can be deduced from any sampled random number $\xi$ as:

(17)$$l_{fp}=-\frac{\ln\xi}{\mu_e}$$

By knowing the distance $l_{fp}$ and the initial direction of propagation, the position of the next scattering/absorbing center is calculated within the absolute (XYZ) coordinate system. If this interaction center is located inside the medium, then a new direction is calculated, otherwise the photon’s journey in the scattering volume is ending.

4.4 New photon weight

Photon packets start with a photon weight equals to unity, $W_0=1$. Once an interaction event occurs, the amount of scattered light must be adjusted accordingly depending on the absorption properties of the medium. Thus, at each successive scattering/absorption center the photon weight is gradually reduced by the fraction of light being absorbed. This loss of energy, named energy deposit, is an important parameter in optical dosimetry. At the interaction $n$ the fraction of the photon packet being absorbed corresponds to:

(18)$$\Delta W_a = W_n \cdot [\frac{\sigma_a}{\sigma_s+\sigma_a}] = W_n \cdot (1-\omega)$$

Where $\sigma _a$ and $\sigma _s$ are the respective absorption and scattering cross-sections of the interaction center, while $\omega$ is the albedo. Finally the new photon weight $W_{n+1}$ is given as:

(19)$$W_{n+1} = W_n - \Delta W_a$$

If the interaction center is non-absorbing, the absorption cross-section is set to zero and the photon weight remains unchanged. At the end of a photon’s journey, the photon weight can be further reduced if the surrounding medium is absorbing. Note that the absorption of the surrounding medium differs from the absorption of the scattering centers. Examples of this would include situations in which individual particles or droplets are suspended in a medium that absorbs the probing light. On the other hand, for tissue media, there is no distinction between scattering centers and surrounding medium. So rather than using absorption coefficients for each, an overall coefficient is applied.

The final reduction in photon weight is calculated as:

(20)$$W_{final} = W_n \cdot e^{-\mu_{am}\cdot l_{total}}$$

Where $\mu _{am}$ is the absorption coefficient of the surrounding medium and $l_{total}$ is the total distance traveled once photons have exited the medium.

4.5 New photon direction

When scattering occurs, a new propagation direction must be specified. The directions of propagation of a photon packet before and after a scattering event are respectively defined by the vectors $\overrightarrow {S}$’ and $\overrightarrow {S}$ within the absolute coordinate system (XYZ). The transformation from $\overrightarrow {S}$’ to $\overrightarrow {S}$ is performed using the polar and azimuthal scattering angles $\theta _s$ and $\varphi _s$ defined within a local coordinate system (UVW) as illustrated in Fig. 10. This transformation is mathematically expressed in Eq. (21) where scattering angles $\theta _s$ and $\varphi _s$ are calculated from the appropriate scattering phase function:

(21)$$\left( \begin{array}{c} S_x \\ S_y\\ S_z\\ \end{array} \right) = \left[ \begin{array}{ccc} \frac{S_x'S_z'}{(1-S_z'^2)^{1/2}} & -\frac{S_y'}{(1-S_z')^{1/2}} & S_x' \\ \frac{S_y'S_z'}{(1-S_z'^2)^{1/2}} & -\frac{S_x'}{(1-S_z')^{1/2}} & S_y' \\ -(1-S_z'^2)^{1/2} & 0 & S_z' \\ \end{array} \right] \left( \begin{array}{c} \sin \theta_s \cos \varphi_s \\ \sin \theta_s \sin \varphi_s \\ \cos \varphi_s \\ \end{array} \right)$$

Fig. 10. To calculate a new photon direction $\vec S$ after a scattering event, a change of coordinate system is used such as: (X,Y,Z)$\Rightarrow$(U,V,W)$\Rightarrow$(X,Y,Z). The scattering center is located at the origin of (U,V,W), the W axis is aligned with the incident propagation direction and the scattering phase function is defined in the (U,W) plane. Finally, the vector $\vec {S}$ is deduced using $\theta _s$ and $\varphi _s$ through the matrix transformation given in Eq. (21).

Download Full Size | PDF

For the simple case of isotropic scattering, the scattering angles are calculated using random numbers $\xi$ such as $\theta _s=\cos ^{-1}(2\xi -1)$ and $\varphi _s=2\pi \xi$. For spherical particles and droplets, the scattering process is derived from the Lorenz-Mie theory (see Fig. 8). In this case, as well as for any axisymmetric scattering phase function, the most straightforward approach of generating the polar angle $\theta _s$ is to use the inverse Cumulative Density Function (CDF) method.

The function CDF($\theta _s$) is calculated by integrating the scattering phase function $f(\theta _s)$ over the solid angle $d\Omega '$. Then, a random variable $\theta _s$ can be generated by taking the inverse transform of a random number uniformly distributed between zero and one, such as: $\theta _{1}=CDF^{-1}\xi _1$ where:

(22)$$CDF(\theta_{1}) = \int_0^{\theta_{1}}f(\theta_s)\cdot d\Omega(\theta_s)$$

When the analytical form of the inverse CDF is not available, the CDF is stored in a lookup table and the inverse transformation is performed at each scattering event using a random number. By considering the scattering phase function to be axisymmetric, the azimuthal angle $\varphi _s$ is deduced from $\varphi _s=2\pi \xi$.

4.6 Photon crossing voxels and exiting the scattering volume

Each time photons scatter from an interaction center, a new free path length $l_{fp}$ is calculated defining the distance traveled by photons prior to the next interaction. This distance must then be compared to the the distance corresponding to the voxel boundary, $l_{exit}$, where photons will exit the voxel if no further interactions occur. Note that the scattering medium can either be defined by a single voxel or by multiple voxels as illustrated in Fig. 11(a). The challenge in determining $l_{exit}$, is figuring out which face of the voxel is concerned. This is done by calculating the three distances $l_x$, $l_y$, and $l_z$ corresponding to the respective distances traveled by the photon packet until the local closest (xz), (yz) or (xy) planes are encountered. The smallest $l$ is used as the length $l_{exit}$ as illustrated in Fig. 11(b). By now comparing $l_{fp}$ with $l_{exit}$, three scenarios can occur:

Fig. 11. (a) Illustration of simulated volumes within the absolute (XYZ) coordinate system; including a single voxel and a 3x3x3 voxel matrix. The exit faces are termed $X_{min}$, $X_{max}$, $Y_{min}$, $Y_{max}$, $Z_{min}$ and $Z_{max}$ respectively. (b) Illustration of a scattering event with the respective polar and azimuthal angles $\Theta$ and $\Phi$ in (XYZ).

Download Full Size | PDF

1) If $l_{fp} < l_{exit}$: Then, the next scattering event will take place inside the voxel and the process will be repeated.

2) If $l_{fp} > l_{exit}$ and photons enter a new voxel: Then, the free path length in the new voxel $l_{fp(new)}$ must be adjusted as a function of the ratio of the extinction coefficients between the "old" and the "new" voxels such that:

(23)$$l_{fp(new)} = (l_{fp}-l_{exit})\cdot \frac{\mu_{e(old)}}{\mu_{e(new)}}$$

Note that in the current model refractive-index-mismatch between adjacent voxels is not considered. Further developments will take into account refractive-index-mismatch in the same fashion as described in MCML [8].

3) If $l_{fp} > l_{exit}$ and the photon packet exit the simulated volume: Then, the photon journey inside the medium is terminated. When photon packets exit the simulated volume, their direction of propagation, total distance traveled, weight and position are known. The $Monte$ $Carlo$ $code$ ends and the photons are processed through a second code referred as the $Detection$ $code$.

4.7 Photon detection and image formation

The $Detection$ $code$ aims at filtering out photons for various detection cases that are set, prior to starting the simulation. Having multiple detection settings presents two main advantages: First a given simulation does not need to be run again for each new detection case, saving computational time. Second, it allows for the easy exploration of multiple detection cases where results can be directly compared to one another. This provides further insight on how photons are propagating through the simulated volume and how they contribute to image formation. Each detection case is defined by a number of criteria that must be fulfilled. The successive detection filtering criteria used for sorting out and selecting the desired photons are described below together with the successive steps characterizing the $Detection$ $code$:

• Scattering order filtering: This is the first filter applied. If this filtering is activated, then only photon packets having undergone a given number of scattering events are detected. Note that this filtering process does not need to be activated and in this case all photons can be considered.

• Mask filtering: A mask, consisting to a binary matrix is applied on the face of the simulated medium. At locations where zero is set, light is blocked, and photons are rejected. On the contrary, at locations where unity is set, light is fully transmitted, and photons are accepted. This filtering process is found to be advantageous for the simulation of optical fibers located on the surface of the simulated medium. An option called "No mask" can be selected by the user where all light exiting the medium is considered. Various masks can be selected from the "Masks Library".

• Temporal pulse: At this stage of the simulation, the temporal profile of the illuminating pulse is considered. By doing this operation at the end of the simulation, instead of the beginning, a variety of pulse durations and shapes can be considered from the "Incident Pulse Library". This is implemented by simply adding a distance to the total path traveled by the exiting photon packet. This added distance is calculated through random sampling using a temporal profile of a given shape and duration that has been pre-selected for each detection case prior to running the simulation.

• Lens filtering: To simulate image formation, a lens is now positioned at a defined distance from the exiting face. The lens is assumed to be a disk of known diameter. Only photon packets reaching the lens are collected and contribute to image formation.

• Time-gating: A temporal gate is applied to photons reaching the collecting lens. The temporal gate is pre-selected from the "Time-Gate Library". The temporal gate is defined by its shape, duration and start. Only photons having a time-of-flight within the temporal gate are selected. Note that the photon weight is modified according to the value of the gate and arrival time. An option called "No Gating" can be selected by the user where all light reaching the lens is considered no matter their time-of-flight.

• Image Formation: After having undergone the successive filtering processes described above the selected photon packets contribute to the formation of an image using a ray-tracing approach. Two images are created during this process: One corresponding to the Fourier plane and one at the image plane. To form the final image, the number of pixels employed and the desired magnification must be specified beforehand.

At the end of the $Detection$ $code$ four types of images are generated referred as: The "Face Image" showing the photon distribution at the face of the simulated medium; the "Lens Image" showing the light intensity spreading over the collecting lens; the "Fourier Image" showing the angular distribution of photon direction; and the "Final Image" showing the image being formed at the image plane.

5. Parallel programming using CUDA

5.1 CUDA and modern GPU’s architecture

Parallel processing can be employed to improve the performance of a heavy computational algorithm. The benefit of accelerating the simulation via parallel processing is not only the significant reduction in turnaround time but also that it allows for a much larger number of photon packets to be tracked and processed. This is an important detail, as it reduces inaccuracy induced by minimal statistics and ensures the convergence of the results toward the exact solution of the RTE. In this work, CUDA is used as the programming interface to parallelize the simulation and to efficiently reduce the execution time. CUDA was created by Nvidia with the intention of using GPUs for general-purpose calculations; calculations that can be performed in parallel. While CPUs usually have a threads counted in tens, GPUs threads are counted in thousands. Transitioning an application from a single thread to two threads splits, at best, the simulation time by half. The large number of threads with modern GPUs opens up the possibility of accelerating the simulations by several orders of magnitude. The reason for the vast difference in the number of threads between CPUs and GPUs stems from the design of the two circuits. As illustrated in Fig. 12 the cores on a GPU are much more tightly packed together with a lower number of control blocks. In general, only one thread can be executed on a single core at a given time while multiple threads can be kept ready by the control block. By grouping cores together with one control block more space becomes available for additional cores. The trade-off of this configuration is that the threads on the GPU cannot work independently. Thus, all the cores in one control group must follow the same flow of instructions. If, for example, the function of one thread is to multiply two numbers, then all threads in the group of cores must also multiply two numbers. However, the benefit of this architecture comes from the fact that each core can access different regions of memory which means that the GPU is highly optimized for calculations using multiple input data and producing multiple output data. Consequently, this design approach works particularly well in favor for MC simulations. By means of thousands of threads, each one tacking an individual photon packet in parallel, the overall simulation time is greatly reduced.

Fig. 12. Simplified representation of the circuit design showing the major differences between a modern CPU on the left and a modern GPU on the right. The GPU has a much more compact layout with cores tightly packed together. The core block executes instructions according to a flow of instruction (program code). The control block selects which thread should be running or if it should be suspended while waiting for memory.

Download Full Size | PDF

5.2 Optimizing the code for running on the GPU

To make a code run efficiently on the GPU a few important aspects must be considered. First, all data that is processed on the GPU originate usually from a code running on the CPU. Thus, the flow of operation relies on the preparation of some data by the CPU that will later be processed by the GPU. Once this operation is accomplished, the prepared data is transferred from the CPU’s memory to the GPU’s global memory. The CPU instructs the GPU to process the data and finally the resulting data can be transferred back to the memory of the CPU as illustrated in Fig. 13(a). The time needed for transferring memory back and forth should not be neglected, unless the time required to process the data on the GPU is significantly longer (by several orders of magnitude). The amount of time lost due to memory transfers and readying the GPU could potentially negate all the speed benefits from the GPU. As shown in Fig. 13(b) some of the slow memory transfers can be avoided by organizing the code smartly. This optimization utilizes a feature of the GPU which allows for memory transfers at the same time as a task is being computed. To reduce the total simulation time further, CPU idle time can be used to process the resulting data or to prepare the next task while the GPU is busy. $Multi$-$Scattering$ is optimized by first creating two queues, or streams as they are called in CUDA, for the GPU. It then immediately gives each queue a task to process a number of photon packets. The CPU code will then alternate between the two queues preforming a synchronized wait for it to finish. Once finished more photons are sent to the idle queue so that the GPU always remains busy. Instead of transferring a lot of data back and forth, almost all of the MC code is run by the GPU including the generation of the final results. Furthermore, the results from the image formation are matrices of photon packet counts. These matrices are also stored in the GPUs global memory since modern GPUs have almost the same amount of memory available as is typically installed alongside the CPU. Occasionally multiple threads will try to write data to the same memory segment simultaneously. With the help of atomic operations the race conditions are avoided and data loss in those situations are prevented.

Fig. 13. The simulation time can be reduced by reorganizing tasks and memory transfers. In (a), even if Task 1 & Task 2 are running on the GPU to its full capacity, efficiency can be lost due to delays (waiting for the next task and for memory to be copied). Those delays can be suppressed by adequately rearranging memory transfer as shown in (b).

Download Full Size | PDF

5.3 Generation of random numbers

MC simulations are based on the use of a very large series of random numbers. To create and generate random numbers on a computer, pseudo-random number generators (PRNG) are most often used. There are many existing varieties of PRNG algorithms to choose from, each defined by their own series length and limitations affecting computational cost. A common issue with PRNGs is that the series of random numbers eventually repeats itself due to memory limitations in storing the state of the generator. A state using more memory would in principle result in a larger period. To initialize the state of a PRNG a seed is used. Initializing the generator with the same seed leads to an identical series of random numbers. Using different seeds is thus important when repeating the same simulation in order to increase the statistics. To boost the performance of an application with multi-threading it is beneficial to assign one PRNG per thread. When selecting a random number generator for MC simulations it is important to consider the following aspects: 1) The period of the PRNG must be larger than the number of random numbers needed. 2) When multiple PRNGs are used in parallel the series of numbers generated must be uncorrelated. 3) Random numbers must be generated fast enough to avoid slowing down the MC algorithm.

In $Multi$-$Scattering$ simulations, several random numbers are generated for different purposes during the tracking of each photon packet:

• Two random numbers are used to select the pixel position from the light source matrix.

• Two additional random numbers are used to select the starting position within the pixel.

• For non-collimated beams - diverging, converging or diffused light sources - three additional random numbers are needed to define the incident direction within a cone angle.

• One random number is employed to determine the distance travelled prior to the first scattering/absorption event.

• At each scattering event three random numbers are employed: Two for the determination of the new photon direction of propagation and one for determining the position of the next scattering event.

• Finally, when considering a light pulse of given duration, one random number is used to define at which location within the pulse the tracked photon packet is generated.

Consider a high optical depth such as, $OD=100$, and launching $nb_{photon}=10^{10}$ photons, the amount of random numbers needed would reach:

(24)$$nb_{\xi}\approx nb_{photon}\cdot(2+2+1+3\cdot OD+1)\approx 3\cdot 10^{12}$$

$Multi$-$Scattering$ is designed to use the cuRAND library part of the of the official CUDA toolkit. With the use of XORWOW a period greater than $2^{190}\approx 1.5\cdot 10^{57}$ is guaranteed. Running a simulation on the GPU with $Multi$-$Scattering$ will result in the creation of multiple PRNGs. First, each GPU will have their own starting generator initialized from a unique seed. Then each thread will be assigned their own sequence number. This will allow all the threads to track multiple photon packets in parallel without any risk of running out of random numbers. By initializing the generator with a unique seed, from the cuRAND library, a unique starting state of the PRNG is obtained resulting in uncorrelated sequences of random numbers between the threads.

6. Simulation speed and acceleration with GPUs

6.1 Description of the simulations

The scattering medium considered here is a cube of $50$ mm side, similar to the simulation configuration illustrated in Fig. 2. The incident light is assumed to be unpolarized and monochromatic at a wavelength $\lambda = 600$ nm. The scatterers are non-absorbing spherical water droplets with a refractive index of $n_p=1.333-0.0i$. They are suspended in air that has a refractive index $n_m=1.000293-0.0i$. Two sizes have been considered resulting in two scattering phase functions calculated from the Lorenz-Mie algorithm which is built into the software. This leads to scattering phase functions with the following properties:

• Fairly isotropic scattering from 0.1 $\mu$m diameter scatterers, shown in Fig. 8(a), resulting to an anisotropy factor factor $g= 0.05$.

• Highly forward scattering from 10 $\mu$m diameter scatterers, shown in Fig. 8(c), resulting to an anisotropy factor factor $g= 0.86$.

In addition to the scattering phase function, other parameters have been varied, such as: the optical depth ranging from 2 up to 500 (corresponding to $\mu _e=0.04$ mm$^{-1}$ and $\mu _e=10$ mm$^{-1}$ respectively) and the number of voxels composing the scattering medium ranging from 1 up to 1000 voxels. The main goal of these simulations is to compare the computing time and performance of using a CPU alone versus a single GPU unit, versus four GPU units. Three identical computers forming a server are used in this study to produce the "GPU results". The hardware of each computer consists of an Intel CPU (Core i7-7700K) and four GPUs from Nvidia (GeForce GTX 1080Ti). For the case of the "CPU results", the simulations are run using another computer having a modern CPU processor from AMD (Ryzen Threadripper 2950X 16-Core) where 32 threads can be exploited. For each simulation case, 1 billion photon packets have been launched.

6.2 Simulation results

The increase in simulation time as a function of the optical depth is shown in Fig. 14. Fig. 14(a) shows the simulation time for a single CPU thread takes between 30 and 90 minutes when varying the optical depth from 2 to 10. Those long processing times are usually not affordable, especially when various parameters need to be investigated. Using now 32 threads of the AMD processor, the simulation time can be reduced by more than one order of magnitude, as shown from the 20x zoomed results in Fig. 14(b). In this case, the simulations can be computed in less than 6 minutes at optical depths $OD \leq 10$. Despite this significant improvement, the running time of the simulation would become significantly larger for higher $OD$ and would not allow multiple users to access the software due to long queuing delays. It is worth mentioning here that the 32 CPU threads are limited to 16 threads running simultaneously while the other 16 threads are kept available to make a rapid transition when a running thread is stalled due to delays in the data being transferred to the cache. This feature explains why the time improvement with 32 CPU threads over a single thread, is significantly less than 32x. From those results, it can be concluded that the execution of the MC simulations using a modern CPU is too time consuming for many simulation cases; especially those related to light propagation through medical tissues where the optical depth can easily reach $OD=100$.

Fig. 14. Comparison of the simulation execution time as a function of the optical depth for a single CPU thread (in black), 32 CPU thread (in red), a single GPU unit (in blue) and four GPU units (in yellow). Two sizes of scatterer are considered: The continuous lines correspond to 0.1 $\mu$m scatterers ($g= 0.05$) and the dashed lines to 10 $\mu$m scatterers ($g= 0.86$).

Download Full Size | PDF

Fig. 14(c) shows that a single GPU is a far more suitable computing platform, completing these tasks in 15 to 30 seconds. Running the same simulations with four GPUs, which are available on a single computer, all simulations are completed in less than 10 seconds at $OD \leq 10$. This fast turnaround time enables far more challenging simulations, such as higher optical depths, and allows multiple users to access $Multi$-$Scattering$ with short queuing delays.

The results given in Fig. 15 show the relative gains in speed for the case given in Fig. 14. In (a) and (b) the relative increase in comparison to a single CPU thread and to 32 threads are given respectively. Additionally, Fig. 15(c) shows now the execution times for higher optical depths, $50\leq OD \leq 500$. The computational expense of running this case would be prohibitively high for conventional CPU’s, so it was only investigated with the four GPU configuration. Results show that simulation times for that arrangement are on the order of minutes. We observe here that the scattering phase function has a noticeable effect on the simulation time. For instance, at $OD = 500$ the execution time increases from 21 to 61 minutes if the scatterers are reduced from 10 to 0.1 $\mu \textrm {m}$. These differences increase with the OD as more isotropic scattering photons tend to spread out within the medium, resulting in a larger number of scattering/absorbing interactions.

Fig. 15. Simulation speed as a function of the optical depth comparing a single CPU thread in (a) with 32 CPU threads in (b). Those results are extracted from the data given in Fig. 14. In (c) the simulation execution time is given for the case of highly scattering media where the $OD$ ranges from 50 to 500. In this case only the GPU results are shown and the time is indicated in minutes. The continuous lines correspond to 0.1 $\mu$m scatterers ($g= 0.05$) and the dashed lines to 10 $\mu$m scatterers ($g= 0.86$).

Download Full Size | PDF

For constructing inhomogeneous turbid media and providing realistic 3D structures multiple voxels are needed. However, when the simulated medium is composed of multiple voxels the execution time is increased. This is due to the additional computing costs needed for calculating the locations at the boundaries where photons transit from one voxel to another. As seen in Fig. 16(a) the total simulation time doubles at $OD=10$ when using $10^3$ voxels instead of a single voxel. When using multiple voxels, each one can either share the same scattering phase function CDF as the others, or use its unique scattering CDF. In the latest case, the demand in memory increases significantly, affecting the overall simulation time. Nevertheless, using multiple individual scattering CDF instead of a single scattering CDF has a minor impact on the overall simulation time as seen in Fig. 16.

Fig. 16. Effect of multiple voxels on simulation speed. The range of voxels goes from a single voxel up to $x=10^3$ voxels. In (a) the optical depth is $OD=10$ and in (b) $OD=100$. Each voxel with a unique scattering CDF (purple lines where the total number of CDF equals $x^3$). Every voxel shares the same scattering CDF (green lines where only one CDF is used).

Download Full Size | PDF

In Fig. 16(b), with $OD=100$, the simulation time increases from 1.4 minutes with one cell to 2 minutes with 1000 cells for the 10 $\mu \textrm {m}$ scatterers (dashed lines). For the the 0.1 $\mu \textrm {m}$ scatterers (solid lines) the simulation time increases from 5.6 minutes with a single cell to 7 minutes with 1000 cells. Those results demonstrate an acceptable increase of the simulation time when multiple voxels are used in $Multi$-$Scattering$ software.

7. Example of a simulation in a 3D-inhomogeneous medium

7.1 Description of the simulation

In this simulation a piece of a human head is considered involving a 3D-inhomogeneous scattering volume. This example was chosen due to both its relevance in the field of medicine and its challenging optical density. In order to construct a realistic 3D model of a human head, input data from various sources have been gathered. The voxelized model describing the tissue structures was reproduced from the MRI brain atlas found in the Neurodevelopmental MRI database [44]. This data set was recently used by Tran et al. [45], where tools for optimizing 3D meshing are presented. A section of the atlas of the human head considered in this study is shown in Fig. 17. The spatial resolution consists of 1 mm$^3$ voxel volume from a portion of the brain at the top of the head. By maintaining the spatial resolution and extracting 20x20x20 voxels, an overall cubic volume of 8000 mm$^3$ has been created for the simulation. Second, the optical properties for each type of brain tissue has been defined using the Chinese Visible Human (CVH) data set described by Lanhui Wu et al. in [46]. Five types of tissue are considered corresponding to the scalp, skull, cerebrospinal fluid (CSF), grey matter and white matter. The optical properties of those tissues are given on the right side of Fig. 17 and correspond to an illumination wavelength of $\lambda =800$nm. Note that those values will vary from person to person and as a function of age.

Fig. 17. The simulated volume consists of 20x20x20 voxels, each $1mm^3$ voxel with the optical properties of one of the five tissue types presented in the table.

Download Full Size | PDF

From the anisotropy factor given in [46] for each tissue layer, the Henyey-Greenstein equation (Eq. 10)) has been used to deduce the corresponding scattering phase functions. Finally, the scattering phase function, as well as the scattering and absorption coefficients of each tissue were stored as input data for the simulation. The incident light source used here is a top hat collimated beam with a diameter of 1 mm, entering from the center of the top of the volume. To simulate light detection, two optical fibers with source-detector separation distance of 3 mm and 8 mm were added and photon path of the light being detected has been recorded. A total number of $20 \cdot 10^9$ photon packets have been launched in the simulation corresponding to 3.3 hours computing time on a single computer (Note that $\mu _s$ is used here and not $\mu _s'$). The corresponding value for Fig. 15(c) would be 10 minutes, when launching one billion photons.

7.2 Simulation results

The results presented here comes from a single simulation defined by the optical properties shown in the table of Fig. 17. Some of the imaging results of the simulation are presented in Fig. 18. It shows the distribution of photons exiting the simulated volume. The average number of scattering events is $S_{nb}=351$. The inhomogeneous structure of the brain is visible from the 3D image, in which we observe the greatest amount of light exiting through the CSF, and the least via the white matter.

Fig. 18. 3D views showing how light exits the simulated volume. The incident beam is located at the top of the volume. The locations of the different tissue layers are visible. It seen here that absorption is high in the gray matter and low in the CSF.

Download Full Size | PDF

In Fig. 19 the distributions of scattering events inside the simulated volume are presented. The data corresponds to the overall locations and distribution of 1 million scattering events with each event projected on the XZ plane. It shows that most scattering events occur within the scalp region while only a few scattering events are visible in the CSF. Some more scattering events are apparent in the white matter at the bottom of the image. Positioning an optical fiber, 1mm in diameter, and collecting light at a distance $d$ from the incident illumination beam, the path of the detected photons represents a "banana shape" as shown in (b) and (c). By increasing $d$, photon pathways reach further into the head tissues. This leads to a reduction of collected photons while the average number of scattering events increases. Those values are indicated in the respective images. Results of simulations such as the one presented here, can be complementary to numerous biomedical techniques. One example would be diffuse correlation spectroscopy, which is used to identify blood flow and functional dynamics within tissue as well as time-of-flight measurements to determine the mean free path length within tissues. Other applications include accurate predictions of photon energy deposition for accurate treatment and detection of tumor cells.

Fig. 19. Results showing scattering events inside the head sample. Each pixel corresponds to the sum of scattering events along the depth of the simulated volume. In (a) all photons are shown while the trajectory of photons collected by a 1 mm diameter fiber located at $d=5$ mm and $d=8$ mm are shown in (b) and (c) respectively. The number of recorded photon packets ($\#_{\textrm {photon}}$) and the average number of scattering events ($S_{\textrm {nb}}$) are also shown.

Download Full Size | PDF

8. Conclusions

A computational tool designed for multi-purpose Monte Carlo simulations of photon transport through scattering media, $Multi$-$Scattering$, has been presented. This software package is ready for use by researchers who are not experts programmers, but wish to numerically model these systems and obtain graphical results of various detection cases. It is freely accessible online at: https://multi-scattering.com/.

Thanks to its extensive user interface, and to the integration of the Lorenz-Mie theory, simulations can be created easily, as well as duplicated and modified quickly, making $Multi$-$Scattering$ the perfect numerical platform for studying the effects of various light propagation situations and detection parameters. The software includes several libraries of input data, greatly simplifying the setup for numerous test cases. We have shown that with the GPU acceleration implemented, simulations for a single voxel with and optical depth of 10, run in less than 10 seconds, and a 1000 voxel case in less than 30 seconds. For the challenging case of an optical depth of 100 and 1000 voxels, the simulation was computed in 7 minutes. These rapid processing times allow the server to be shared among several users with acceptable queuing delays. The validation of the software will be shown in Part II of this article series. It will compare the simulation results from Multi-Scattering to a comprehensive set of experimental data including conditions with different wavelengths, optical depths and with particles of different sizes of polystyrene microspheres. Future improvements to the platform will include the option of running a single simulation on three computers in parallel, allowing the use of 12 GPU’s for a given run. Additional computers will also be added to the server, making $Multi$-$Scattering$ one of the world’s fastest online Monte Carlo simulation of photon transport through scattering media.

Funding

H2020 European Research Council (638546).

Acknowledgment

The authors would like to thank Andrew Corber from the National Research Council Canada who kindly gave some of his time for reading and correcting the article.

Disclosures

The authors declare no conflicts of interest.

References

1. E. Berrocal, “Multiple scattering of light in optical diagnostics of dense sprays and other complex turbid media,” Ph.D. thesis, Cranfield University (2006).

2. B. C. Wilson and G. Adam, “A Monte Carlo model for the absorption and flux distributions of light in tissue,” Med. Phys. 10(6), 824–830 (1983). [CrossRef]

3. S. A. Prahl, “Light transport in tissue,” Ph.D. thesis, University of Texas at Austin (1988).

4. M. Keijzer, “Light transport for medical laser treatments,” Ph.D. thesis, Technische Univ., Delft (Netherlands). (1993).

5. M. Keijzer, S. L. Jacques, S. A. Prahl, and A. J. Welch, “Light distributions in artery tissue: Monte Carlo simulations for finite-diameter laser beams,” Lasers Surg. Med. 9(2), 148–154 (1989). [CrossRef]

6. S. A. Prahl, “A Monte Carlo model of light propagation in tissue,” in Dosimetry of Laser Radiation in Medicine and Biology, vol. 10305G. J. Mueller, D. H. Sliney, and R. F. Potter, eds., International Society for Optics and Photonics (SPIE, 1989), pp. 105–114.

7. S. T. Flock, M. S. Patterson, B. C. Wilson, and D. R. Wyman, “Monte Carlo modeling of light propagation in highly scattering tissues. I. model predictions and comparison with diffusion theory,” IEEE Trans. Biomed. Eng. 36(12), 1162–1168 (1989). [CrossRef]

8. L. Wang, S. L. Jacques, and L. Zheng, “MCML-Monte Carlo modeling of light transport in multi-layered tissues,” Comput. Methods Programs Biomed. 47(2), 131–146 (1995). [CrossRef]

9. I. V. Meglinski and S. J. Matcher, “Quantitative assessment of skin layers absorption and skin reflectance spectra simulation in the visible and near-infrared spectral regions,” Physiol. Meas. 23(4), 741–753 (2002). [CrossRef]

10. D. Y. Churmakov, I. V. Meglinski, and D. A. Greenhalgh, “Amending of fluorescence sensor signal localization in human skin by matching of the refractive index,” J. Biomed. Opt. 9(2), 339–346 (2004). [CrossRef]

11. D. A. Boas, J. P. Culver, J. J. Stott, and A. K. Dunn, “Three dimensional Monte Carlo code for photon migration through complex heterogeneous media including the adult human head,” Opt. Express 10(3), 159–170 (2002). [CrossRef]

12. Q. Fang, “Mesh-based Monte Carlo method using fast ray-tracing in Plücker coordinates,” Biomed. Opt. Express 1(1), 165–175 (2010). [CrossRef]

13. E. Alerstam, T. Svensson, and S. Andersson-Engels, “Parallel computing with graphics processing units for high-speed Monte Carlo simulation of photon migration,” J. Biomed. Opt. 13(6), 060504 (2008). [CrossRef]

14. Q. Fang and D. A. Boas, “Monte Carlo simulation of photon migration in 3D turbid media accelerated by Graphics Processing Units,” Opt. Express 17(22), 20178–20190 (2009). [CrossRef]

15. E. Alerstam, W. C. Y. Lo, T. D. Han, J. Rose, S. Andersson-Engels, and L. Lilge, “Next-generation acceleration and code optimization for light transport in turbid media using GPUs,” Biomed. Opt. Express 1(2), 658–675 (2010). [CrossRef]

16. A. Doronin and I. Meglinski, “Online object oriented Monte Carlo computational tool for the needs of biomedical optics,” Biomed. Opt. Express 2(9), 2461–2469 (2011). [CrossRef]

17. J. Cassidy, L. Lilge, and V. Betz, “FullMonte: a framework for high-performance Monte Carlo simulation of light through turbid media with complex geometry,” in Biomedical Applications of Light Scattering VII, vol. 8592A. P. Wax and V. Backman, eds., International Society for Optics and Photonics (SPIE, 2013), pp. 41–54.

18. T. Young-Schultz, S. Brown, L. Lilge, and V. Betz, “FullMonteCUDA: a fast, flexible, and accurate GPU-accelerated Monte Carlo simulator for light propagation in turbid media,” Biomed. Opt. Express 10(9), 4711–4726 (2019). [CrossRef]

19. Q. Fang and S. Yan, “Graphics processing unit-accelerated mesh-based Monte Carlo photon transport simulations,” J. Biomed. Opt. 24(11), 1–6 (2019). [CrossRef]

20. M. I. Mishchenko, L. Liu, D. W. Mackowski, B. Cairns, and G. Videen, “Multiple scattering by random particulate media: exact 3D results,” Opt. Express 15(6), 2822–2836 (2007). [CrossRef]

21. D. G. Collins and M. B. Wells, “Monte Carlo codes for the study of light transport in the atmosphere,” (1965).

22. G. N. Plass and G. W. Kattawar, “Monte Carlo calculations of light scattering from clouds,” Appl. Opt. 7(3), 415–419 (1968). [CrossRef]

23. G. W. Kattawar and G. N. Plass, “Radiance and polarization of multiple scattered light from haze and clouds,” Appl. Opt. 7(8), 1519–1527 (1968). [CrossRef]

24. G. N. Plass and G. W. Kattawar, “Radiative transfer in an Atmosphere–Ocean system,” Appl. Opt. 8(2), 455–466 (1969). [CrossRef]

25. G. W. Kattawar and G. N. Plass, “Time of flight lidar measurements as an ocean probe,” Appl. Opt. 11(3), 662–666 (1972). [CrossRef]

26. C. J. Funk, “Multiple scattering calculations of light propagation in ocean water,” Appl. Opt. 12(2), 301–313 (1973). [CrossRef]

27. E. A. Bucher, “Computer simulation of light pulse propagation for communication through thick clouds,” Appl. Opt. 12(10), 2391–2400 (1973). [CrossRef]

28. P. Bruscaglioni, A. Ismaelli, G. Zaccanti, and L. Pantani, “Modified Monte-Carlo Method To Evaluate Multiple Scattering Effects On Lightbeam Transmission Through A Turbid Atmosphere,” in Max Born Centenary Conf, vol. 0369M. J. Colles and D. W. Swift, eds., International Society for Optics and Photonics (SPIE, 1983), pp. 164–173.

29. P. Bruscaglioni, P. Donelli, A. Ismaelli, and G. Zaccanti, “A numerical procedure for calculating the effect of a turbid medium on the MTF of an optical system,” J. Mod. Opt. 38(1), 129–142 (1991). [CrossRef]

30. P. Donelli, P. Bruscaglioni, A. Ismaelli, and G. Zaccanti, “Experimental validation of a Monte Carlo procedure for the evaluation of the effect of a turbid medium on the point spread function of an optical system,” J. Mod. Opt. 38(11), 2189–2201 (1991). [CrossRef]

31. P. Bruscaglioni, A. Ismaelli, and G. Zaccanti, “Monte-Carlo calculations of LIDAR returns: Procedure and results,” Appl. Phys. B 60(4), 325–329 (1995). [CrossRef]

32. J.-P. Briton, B. Maheu, G. Gréhan, and G. Gouesbet, “Monte Carlo Simulation of Multiple Scattering in arbitrary 3-D geometry,” Part. Part. Syst. Charact. 9(1-4), 52–58 (1992). [CrossRef]

33. C. Rozé, B. Maheu, G. Gréhan, and J. Ménard, “Evaluations of the sighting distance in a foggy atmosphere by Monte Carlo simulation,” Atmos. Environ. 28(5), 769–775 (1994). Conference on visibility and fine particles. [CrossRef]

34. B. Roysam, A. Cohen, P. Getto, and P. Boyce, “A numerical approach to the computation of light propagation through turbid media: Application to the evaluation of lighted exit signs,” in Conference Record of the 1991 IEEE Industry Applications Society Annual Meeting, (1991), pp. 1876–1882 vol.2.

35. E. Berrocal, D. Y. Churmakov, V. P. Romanov, M. C. Jermy, and I. V. Meglinski, “Crossed source–detector geometry for a novel spray diagnostic: Monte Carlo simulation and analytical results,” Appl. Opt. 44(13), 2519–2529 (2005). [CrossRef]

36. E. Berrocal, I. Meglinski, and M. Jermy, “New model for light propagation in highly inhomogeneous polydisperse turbid media with applications in spray diagnostics,” Opt. Express 13(23), 9181–9195 (2005). [CrossRef]

37. E. Berrocal, D. L. Sedarsky, M. E. Paciaroni, I. V. Meglinski, and M. A. Linne, “Laser light scattering in turbid media Part I: Experimental and simulated results for the spatial intensity distribution,” Opt. Express 15(17), 10649–10665 (2007). [CrossRef]

38. E. Berrocal, D. L. Sedarsky, M. E. Paciaroni, I. V. Meglinski, and M. A. Linne, “Laser light scattering in turbid media Part II: Spatial and temporal analysis of individual scattering orders via Monte Carlo simulation,” Opt. Express 17(16), 13792–13809 (2009). [CrossRef]

39. L. Wang, P. P. Ho, C. Liu, G. Zhang, and R. R. Alfano, “Ballistic 2-D imaging through scattering walls using an ultrafast optical Kerr gate,” Science 253(5021), 769–771 (1991). [CrossRef]

40. E. Berrocal, S.-G. Pettersson, and E. Kristensson, “High-contrast imaging through scattering media using structured illumination and Fourier filtering,” Opt. Lett. 41(23), 5612–5615 (2016). [CrossRef]

41. L. G. Henyey and J. L. Greenstein, “Diffuse radiation in the galaxy,” Astrophys. J. 93, 70–83 (1941). [CrossRef]

42. C. Bohren and D. Huffman, Absorption and Scattering of Light by Small Particles (Wiley, 1983).

43. E. Berrocal, I. V. Meglinski, D. A. Greenhalgh, and M. A. Linne, “Image transfer through the complex scattering turbid media,” Laser Phys. Lett. 3(9), 464–467 (2006). [CrossRef]

44. C. E. Sanchez, J. E. Richards, and C. R. Almli, “Age-specific MRI templates for pediatric neuroimaging,” Dev. Neuropsychol. 37(5), 379–399 (2012). PMID: 22799759. [CrossRef]

45. A. P. Tran, S. Yan, and Q. Fang, “Improving model-based functional near-infrared spectroscopy analysis using mesh-based anatomical and light-transport models,” Neurophotonics 7(01), 1–18 (2020). [CrossRef]

46. L. Wu, Y. Lin, and T. Li, “Effect of human brain edema on light propagation: A Monte Carlo modeling based on the Visible Chinese Human Dataset,” IEEE Photonics J. 9(5), 1–10 (2017). [CrossRef]

Multi-Scattering software: part I: online accelerated Monte Carlo simulation of light transport through scattering media

Abstract

1. Introduction

1.1 Monte Carlo simulations in biomedical tissues

1.2 Monte Carlo simulations in particulate scattering media

2. Multi-Scattering: a multi-purpose MC software

2.1 General description

2.2 Features of the simulations

3. Terminology and definitions

3.1 Extinction, scattering and absorption

3.2 The Radiative Transfer Equation

3.3 Scattering phase function and anisotropy factor

4. Description of the successive steps and algorithm

4.1 General simulation algorithm

4.2 Launching photon packets

4.3 Determination of the free path length

4.4 New photon weight

4.5 New photon direction

4.6 Photon crossing voxels and exiting the scattering volume

4.7 Photon detection and image formation

5. Parallel programming using CUDA

5.1 CUDA and modern GPU’s architecture

5.2 Optimizing the code for running on the GPU

5.3 Generation of random numbers

6. Simulation speed and acceleration with GPUs

6.1 Description of the simulations

6.2 Simulation results

7. Example of a simulation in a 3D-inhomogeneous medium

7.1 Description of the simulation

7.2 Simulation results

8. Conclusions

Funding

Acknowledgment

Disclosures

References

Cited By

Figures (19)

Equations (24)

Optics Express