3D-deep optical learning: a multimodal and multitask reconstruction framework for optical molecular tomography

Shuangchen Li; Shuangchen Li; Beilei Wang; Beilei Wang; Jingjing Yu; Jingjing Yu; Dizhen Kang; Xuelei He; Xuelei He; Hongbo Guo; Hongbo Guo; Hongbo Guo; Xiaowei He; Xiaowei He; Xiaowei He

doi:10.1364/OE.490139

1. Introduction

Optical molecular tomography (OMT) has considerable potential in various small animal model-based studies, such as the early detection and diagnosis of tumors, analysis of pathological tumor characteristics, and efficacy evaluation of anticancer drug discovery [1,2]. The inverse inference of the 3D distribution of luminous sources in complex biological tissues based on acquired optical images has always been a topic of considerable research interest in OMT [3], including bioluminescence tomography (BLT) [4] and fluorescence molecular tomography imaging (FMT) [5].

The major challenge to high-quality reconstruction is the ill-posedness due to high photon scattering in tissue [6]. To date, a priori information is typically required to approximate the photon propagation represented by radiative transfer equations more accurately [7,8], such as the topography of the surface optical power distribution and anatomical information provided by computed tomography (CT) or magnetic resonance imaging (MRI) [9,10]. Furthermore, benefiting from the compressive sensing (CS) theory [11], numerous iterative optimization algorithms have been designed to characterize the results under sparse prior assumptions and improve reconstruction efficiency, precision, and robustness, including regularization strategies [12–14], greedy pursuit [15], and sparse Bayesian learning [16]. However, their performance in various biomedical applications is hindered by deviations in the approximated photon propagation model, high computational cost, and handcrafted parameters [17].

Over the past few years, research has indicated that deep learning (DL) has considerable potential for improving the quality and efficiency of image reconstruction [18,19]. Existing data-driven DL methods for OMT are implemented by directly learning nonlinear mapping from the vector/projection of boundary optical measurement to the luminous source distribution, which provides a new approach for OMT with more accurate localization of the luminous source. For example, based on finite element analysis (FEA) [20], Gao et al. first proposed an inverse problem simulation (IPS) method for BLT [21] and extended a series of improved methods, such as the K-nearest neighbor based locally connected network in FMT [22] and a multilayer fully connected neural network in Cerenkov luminescence tomography (CLT) [23]. To further resolve the resolution limited by FEA, Zhang et al. proposed a series of voxel-based solutions, including 3D deep encoder–decoder networks [24] and a more advanced 3D fusion dual-sampling deep network FMT reconstruction method (UHR-DeepFMT) [25], to realize a higher spatial resolution reconstruction of FMT. These end-to-end DL methods are effective for solving complex optical inverse problems and realizing the reconstruction of specified objects with higher efficiency and accuracy.

However, the destruction of spatial information during data preprocessing and the inadequate consideration of spatial correlation between the inputs and labels cause these reconstruction methods to inaccurately identify the 3D structure of the imaged object and aggravate the ill-posedness of the OMT. Additionally, it is difficult for a network constrained only by the luminous source distribution to properly understand the relationship between optical measurements and luminous sources in a huge solution space. This makes the existing OMT reconstruction approaches poorly adaptable to the spectra of different optical probes. Therefore, these deep learning architectures are often limited to pretrained imaging environments [26], including a standard imaging structure and specific optical probe. To date, as no effective nonrigid registration algorithm that can achieve alignment across different tissue models exists, the reconstruction capability of these deep learning architectures for different tissue conditions in practical applications remains a concern, such as the recognition of diverse imaged objects and the adaptability to the spectra of various optical probes [27–29].

Recently, in the field of diffuse optical tomography (DOT) [30], Feng et al. proposed an MRI-guided near-infrared spectral tomography (NIRST) reconstruction algorithm (Z-net) based on deep convolutional networks for diffusion to reconstruct functional images of breast tumors [31]. This approach demonstrates the value of combining multimodal image reconstruction, which can improve the quality of the reconstructed images, particularly for DL-based image recovery. However, owing to the differences in imaging scenarios, directly applying this reconstruction framework to BLT and FMT is difficult. First, Z-net research only considers a single 2D tomographic image. Unlike mammary glands with regular shapes, the imaging objects of BLT and FMT are aimed at organisms with more complex shapes and structures, such as orthotopic glioma in brain tissue. To obtain a more accurate spatial pattern of the lesion area, considering the integrity of the optical measurements and the image structure is necessary. Second, DOT identifies the lesion area by distinguishing the differences in optical parameters between the heterogeneous tissue and the background. The MRI images of the Z-net fusion show clear structural features of heterogeneous tissues. However, the research targets of BLT and FMT are always early and homogeneous tumors that are difficult to detect using conventional imaging methods (CT / MRI) [2]. Thus, the difference between the research target and application background hinders the development of a Z-net reconstruction framework for BLT and FMT.

In this study, to overcome the aforementioned limitations, we analyzed the imaging principle of OMT, and a new mapping model was established to represent the backward propagation of photons. Moreover, based on this model, a multimodal and multitask reconstruction framework–3D-deep optical learning (3DOL), for optical molecular tomography was proposed. Particularly, through a modularized design and specific optimization process, various types of a priori information were introduced to infer the photon density distribution (optical field) in the organism and the spatial morphology of the luminous source, which realized universal compatibility with diverse imaged objects and generalization to various spectra in the first near-infrared window (NIR-I, 620–900 nm) by enormously mitigating the ill-posedness in OMT.

Unlike existing DL approaches, 3DOL accepts inputs from multi-modality (optical and CT) data to identify different imaged objects and splits OMT into multi-tasks (optical field recovery and luminous source reconstruction) to implicitly infer the photon propagation process. Additionally, the powerful capabilities of convolutional neural networks (CNNs) [32] to represent advanced image features and recurrent neural networks (RNNs) [33] to aggregate sequence data were combined to capture fused embeddings from multi-mode original volumetric tomography scans in 3D space. Furthermore, multiscale feature maps were adopted to improve the quality of the recovered optical field, and the geometry of the imaged object was used a priori to speed up the convergence rate of the parameters. Finally, a Laplacian operator with learnable weights was designed under the guidance of field theory to obtain a reliable luminous source distribution from an inhomogeneous optical field at an extremely low computational cost.

The contributions of this study are as follows.

1. The bio-optical image formation process was systematically analyzed and inversely solved to propose a novel reconstruction mapping model that divides the imaging process of OMT into optical field recovery and luminous source reconstruction tasks.

2. The anatomy of the imaging object (provided by CT) and boundary optical measurement were fused, and the spatial relationship of image features was established through the cascade architecture of the CNN-RNN so that the network can recognize different imaging objects.

3. The inference of the optical field and the introduction of a physical model between the optical field and the luminous source help the network to correctly understand the transmission mode of photons, which enables the network to adapt to spectral changes over a wider range.

The remainder of the paper is structured as follows. Section 2 introduces the construction ideas and basic modules of 3DOL; Section 3 elaborates on the details of the experimental setting; Section 4 presents the experimental results; and Section 5 provides a basic summary and discussion of 3DOL.

2. Method

The basic concept of 3DOL is to inversely deduce the generation process of optical images [34]. Particularly, Fig. 1(a) illustrates the procedure of optical image acquisition as follows: Step.1: Photons propagate from the luminous source $q$ to form an optical field $\phi$ based on the optical properties of the different anatomical tissues $\Omega$. Step.2: Some photons penetrate the surface of the medium to form observable boundary optical measurement $\phi _{b}$ related to the geometry of the objects $R$. Step.3: $\phi _{b}$ is captured as an optical image using a highly sensitive camera. Instead of directly mapping $f(\theta |\mathcal {G}(\phi _{b})\to q)$ in conventional end-to-end deep reconstruction approaches, where $\mathcal {G}$ is the dimension reduction operator, 3DOL learns the following mapping during the training process:

(1)$$f\left(\theta|\left\{\phi_{b},\Omega, R\right\}\to\phi\to \Theta\right)$$

where $\Omega$ and $R$ are the anatomical structure and geometry of the imaged object acquired using CT, respectively. This sophisticated mapping enables 3DOL to perceive photon propagation more accurately and understand the relationship between boundary optical measurement, optical fields, and luminous sources. Based on the mapping model in Eq. (1), optical images should be denoised [35] and registered to $R$ in advance to acquire $\phi _{b}$ [36], as steps 4 and 5. Subsequently, 3DOL decompose the inverse problem of OMT into two tasks, optical field recovery and luminous source reconstruction.

Fig. 1. Process of (a) acquiring an optical image and (b) inverse reconstruction in OMT.

Download Full Size | PDF

(i). Recover $\phi$ from $\{\phi _{b}, \Omega, R\}$ as in step 6 in Fig. 1(b). In 3DOL architecture, the pipeline achieves optical-field recovery through a four-stage framework. (1) Fixed the number of 2D-scan slices of $\{\phi _{b}$, $\Omega$, $R\}$ and rescaled them to the same size. (2) Multimodal fused features at different scales are extracted and fused from slices of $\phi _{b}$ and $R$ using a bimodal (optical and CT) cascade feature module (BCFM). (3) The fused sequential spatial feature module (FSM) integrates all the features in the high-dimensional feature space and stepwisely recovers $\phi$. (4) The boundary-constraint optimization module (BOM) provides an enhanced constraint referring to $R$ to refine the recovered $\phi$.

(ii). Design a learnable Laplace operator in a physics-inspired condensed module (PCM) to segment the mask representation of the source domain from the recovered $\phi$ and reconstruct $\Theta$ as step 7 in Fig. 1(b). Fig. 2(a) illustrates the detailed procedures of 3DOL to complete the above tasks.

Fig. 2. Schematic of the 3DOL architecture, including (a) Overview of 3DOL, (b) Bimodal (optical and CT) Cascade Feature Module (BCFM), (c) Fused sequential-spatial-features Module (FSM), (d) Boundary-constraint optimization module (BOM), and (e) physics-inspired condensed module (PCM).

Download Full Size | PDF

2.1 Optical field recovery

The cooperation of BCFM, FSM,and BOM restores the optical field inside the imaged object, as shown in Fig. 2(b)-(d), respectively.

Bimodal (optical and CT) cascade feature module (BCFM): The tomographic scan slices of $\Omega$ and $\phi _{b}$ on a fixed size together serve as inputs of BCFM, which is designed to avoid the excessive loss of details caused by 3D-downsampling operations in directly processing boundary optical measurement with over-sparse information density [37]. As indicated by the green backgrounds in Fig. 2(a), considering the different physical characters of the two modalities (optical and CT), BCFM adopts two groups of ResNet-like weights sharing encoders [38] with 3*3 kernels to extract, refine, and compress the features of each 2D-scan slice from the two modalities parallel and sequentially. Moreover, Fig. 2(b) illustrates the detail of processing a pair of slices from optical and CT. First, an embedding vector (outline, geometry, tissue distribution, grayscale, etc.) representing the morphology of the medium based on $\Omega$ (typically segmented by CT) is used to distinguish different imaged objects. Second, additional hidden optical features are extracted from $\phi _{b}$ (gradient component, refractive index, incident and exit directions, etc.) and further fused with the medium-embedded representation and optical feature maps with the same scale by the residual connection to implicitly infer the optical properties of the medium. Particularly, a dual-pooling operation with 2*2, combined with maximum and average pooling, was applied to reduce the information loss during sampling. Finally, the fused features are the outputs of BCFM, which provide more priori information for recovering the optical field.

Fused sequential-spatial-features module (FSM): Considering that the features extracted by BCFM are 2D fused, FSM was applied to fit the sequential slice features obtained from BCFM to 3D-space, and deliver them to the decoder to stepwisely recover $\phi$, as shown in the blue background of Fig. 2(a) and detailed in Fig. 2(c). In FSM, inspired by the outstanding representation performance of RNNs for sequential temporal inputs, a bidirectional gate recurrent unit (GRU) based on the axial spatial correlation of 3D objects was employed to integrate the information of tomographic scans from various depths to add global feature representations of imaged objects to multimodal features [39]. Compared with long short-term memory (LSTM) networks [40], GRU entails fewer parameters but can achieve similar performance. This architecture allows the context of the imaged object to be retained in memory, focusing on details and keeping in mind its global appearance. Moreover, the decoder of FSM encourages optical and CT fusion feature reuse and enhances feature mapping by concatenating low- and high-level features by residual connections, which fully integrates the physical significance of the original inputs and mitigates the ill-posedness of OMT.

For multimodal volumetric inputs, the architecture of BCFM-FSM can effectively focus on the global structural information of 3D objects and solve the difficulties of model training caused by large volumetric input dimensions.

Boundary-constraint optimization module (BOM): BOM incorporates $R$ and divides the whole voxel space into two domains: solution and background, which restricts the updating direction of parameters and the recovered domain of $\phi$ inside the imaged object instead of the global space to accelerate the convergence of the network. As shown in Fig. 2(d), BOM adds the $R$ to the output of FSM:

(2)$$E_{BOM}=E_{FSM}\odot R(r)$$

(3)$$R(r)=\left\{\begin{matrix} 1 & r\in \Omega\\ 0 & r\notin \Omega \end{matrix}\right.$$

where $\odot$ is the Hadamard product, $E$ is the module output, and $r$ is the location in 3D-space. The effect of BOM is reflected in backpropagation, and the gradient $G(\nabla _{\theta })$ for the optimization target is as follows [41]:

(4)$$G(\nabla_{\theta}) = \frac{\partial \left\|E_{FSM}\left(\theta^{k-1}|r\right)\odot R\left(r\right),\phi\left(r\right) \right\|}{\partial \theta^{k-1}}$$

where $\phi$ represents the ground truth of the optical field and $\theta$ is the parameter of 3DOL. Eq. (4) indicates that BOM ensures the accuracy of the parameter update direction by restricting the solution domain as follows:

(5)$$\begin{aligned}\theta^{k}&=\theta^{k-1}-\alpha G(\nabla_{\theta})\\ &= \left\{\begin{matrix} \theta^{k-1}-\alpha\frac{\partial \left\|E_{FSM}\left(\theta^{k-1}|r\right),\phi\left(r\right) \right\|}{\partial \theta^{k-1}} & r\in \Omega\\ \theta^{k-1} & r\notin \Omega \end{matrix}\right. \end{aligned}$$

Learning Objective: Toward fulfilling the goal of optical field recovery, the learning objective was designed to minimize the sum of $huber$ [42] and Jensen–Shannon divergence (JS) [43]:

(6)$$L_{optical}=huber(E_{BOM}, \phi) + JS(E_{BOM}, \phi)$$

where $\phi$ denotes the ground truth of the optical field used in simulations. Moreover, $huber$ improves the robustness of 3DOL to the inputs and sufficiently considers the peak value of the field energy. $JS$ divergence ensures that the predicted and true fields have the same distribution.

2.2 Luminous source reconstruction

PCM, as shown in Fig. 2(e), realizes the luminous source reconstruction and quantifies the photon density inside the source domain to reflect the distribution of probe concentration.

Physics-inspired condensed module (PCM): Inspired by field theory and considering the approximation error $\varepsilon$, PCM models the relationship between $\phi$ and $\Theta$ simply by the spatial second-order derivative as follows [44]:

(7)$$\Theta\left(r\right) =\nabla^{2}\phi + \varepsilon$$

According to spectral graph theory, for the discrete matrix $\phi$, the spatial second-order derivative represented by Eq. (7) is equal to the convolution of the Laplace operator with $\phi$ [45]. Considering the nonuniform optical field distribution, the PCM is designed to combine the unknown weights $w$ with the Laplace operator $L$ to further transform it into a learnable Laplace operator, as follows:

(8)$$\Theta\left(r\right) =w\otimes L\ast \phi + \varepsilon$$

Considering the mask images of $\Theta$ as labels, PCM is committed to selecting the correct domain and indicating the clearest boundary of $\Theta$. The PCM output passes through a truncated tanh ($T_{tanh}$) [46] that is normalized as

(9)$$T_{tanh}=relu\left(tanh\left(x\right)\right)= \left\{\begin{matrix} \frac{e^{x}-e^{{-}x}}{e^{x}+e^{{-}x}} & x > 0 \\ 0 & x\le 0 \end{matrix}\right.$$

Equation (8) and Eq. (9) describe the architecture of PCM, which is an extremely lightweight module with a 3D convolution operation activated by the $T_{tanh}$ function (including a 3 $\ast$ 3 $\ast$ 3 convolution kernel and a unit bias term), as shown in Fig. 2(e). Moreover, Fig. 3 demonstrates the visualization process of PCM, which intuitively reflects its reconstruction theory.

Learning Objective: The learning objective of the luminous source reconstruction task was designed as the addition of mean square error ($MSE$) [47] with intersection over union ($IOU$) [48]:

(10)$$L_{source}=MSE(E_{PCM}, \Theta) + 1 - IOU(E_{PCM}, \Theta)$$

where $q$ denotes the ground truth for the luminous source used in the simulations. Moreover, $MSE$ plays a role similar to that of $huber$ for the lightweight module PCM but with a lower computational cost. $IOU$ ensures maximum coincidence between the segmentation result and the source mask.

Fig. 3. Visualization of Physics-inspired Condensed Module (PCM).

Download Full Size | PDF

2.3 Implementation and optimization procedure

3DOL was trained separately based on the two tasks. In the initial training stage, the output of the BOM is meaningless and cannot provide useful information for the PCM. Thus, an optical field with 10% Gaussian noise was adopted as the input for PCM training, and its epoch was only one-fourth that of the other three modules. Finally, four well-trained modules were cascaded for testing, as shown in Fig. 2.

The Adam algorithm was used as the optimizer in 3DOL [49]. The training parameters were: epochs=64, batch size=16, and learning rate=1e-5. 3DOL was implemented using the Tensorflow 1.5.0 backend in Python 3.5 on a computer with a total memory of 160G of dual-Tesla A 100(80G).

3. Experiment setting

3.1 Imaging device

The Optical and CT images were collected using a customized imaging system, as shown in Fig. 4(a). The imaging system consists of four parts: (1) Micro-CT system: including an X-ray detector (1512N-C90-HRCC, Dexela, USA) and X-ray generator (L9181-02 MT2195, HAMAMATSU PHOTONICS, CHN); (2) Optical acquisition system: including a thermoelectric cooling electron multiplying charge coupled device (EMCCD) camera (iXonEM+888, ANDOR, UK) and a Schneider 3CCD Fixed-focus Lens (Xenon 0.95/17); (3) An 808nm CW laser source (CrystaLaser, Reno, NV, USA, and Model NO.CL671-050-O) with power of 12 W. (4) The mechanical control module, which included a 360-degree motorized rotation stage (RAK100, Zolix, CHN), a motorized lifting stage along the z-axis, three translation stages (Zolix Instruments Co., Beijing, China), and a controller (Zolix Instruments Co. Beijing, China). The optical/CT imaging system was placed in a dark room to avoid external light interference and acquire weak optical signals. Realistic scenarios are shown in Fig. 4(b).

Fig. 4. Optical and CT imaging dual-mode system and optical parameter setting. (a) and (b) are 3D models and solid drawings of the imaging system. (c) Optical parameters of different tissues at 620–900 nm.

Download Full Size | PDF

3.2 Optical parameters

The optical energy change of different spectral bands in biological tissues depends primarily on the absorption coefficient $\mu _{a}$ and the scattering coefficient $\mu _{s}$. $\mu _{s}$ is generally expressed as a function of wavelength [50]:

(11)$$\mu_{s}\left(\lambda\right)=a\ast \lambda^{{-}b}mm^{{-}1}$$

where $a$ and $b$ are relevant tissue constants, and $\mu _{a}$ is expressed as the weighted sum of oxygen-hemoglobin ($HbO_{2}$), deoxy-hemoglobin ($Hb$), and water content in tissues.

(12)$$\mu_{a}\left(\lambda\right)=S_{B}X\mu_{aHb}\left(\lambda\right) + S_{B}\left(1-X\mu_{aHbO_{2}}\right)\left(\lambda\right) + S_{W}\mu_{aw}\left(\lambda\right)$$

where $\mu _{aHbO_{2}}$, $\mu _{aHb}$ and $\mu _{aw}$ are the absorption coefficients of oxy-hemoglobin, deoxy-hemoglobin, and water, respectively, [51] and $X$, $S_{B}$ and $S_{W}$ are empirically weighted factors.

The corresponding optical parameters of NIR-I from 620nm to 900nm are shown in Fig. 4(c).

3.3 Dataset

The dataset of 3DOL contains eleven different imaged models, including three commonly digital imaged objects [52], imaging objects used in IPS and UHR-DeepFMT [21] [25], a poly-formaldehyde cube, three vivo mouse body models, and two vivo mouse head models. Meanwhile, the digital imaged objects are as shown in Fig. 5: 1. a simulated homogeneous adipose cube. 2.a standard simulated mouse head with a size of 18 mm $\times$ 20.8 mm $\times$ 30 mm (discretized into three tissues: adipose, skull, and brain). 3.a standard simulated mouse body with a size of 38mm $\times$ 20.8mm $\times$ 35mm (discretized into six tissues: muscle, liver, kidneys, stomach, lungs, and heart).

Fig. 5. Imaged objects setting of simulation experiment, including (a) homogeneous adipose cube, (b) head model with three tissues (adipose, skull, and brain), and (c) body model with six tissues (muscle, liver, kidneys, stomach, lungs, and heart)

Download Full Size | PDF

Monte Carlo eXtreme (MCX) was used to simulate the optical field at the 680 nm emission wavelength and further generate boundary optical measurement [53]. 5400 cases of single-luminous source simulations with random positions, radii (3-5 voxels), and three different shapes (sphere, cylinder, and cube) were adopted to build a single-source set, and two single sources were randomly located to construct 5400 cases of the dual-source set.

For poly-formaldehyde cube and each vivo mouse model. The anatomical structures and geometries of the imaged objects were captured by a micro-CT. The optical images were collected using an electron-multiplying charge-coupled device (EMCCD) camera, and further convert it into boundary optical measurement by combining CT data, which as test set in in vivo experiment.

Finally, a total of 10800 cases of data were randomly split at 3D level: 90% for training and 10% for validation. 3753 test data were generated for simulation experiments as test set. By observing the loss curves of the training and validation sets during the training process, the final network determined for testing was prevented from overfitting or underfitting. All the inputs and labels were resized to 90 $\times$ 104 $\times$ 70 voxels, which ensured consistency in the data dimensions and enabled effective processing and analysis.

3.4 Simulation experiments

Three groups of experiments are designed to verify the performance of 3DOL.

Group 1-Ablation experiment: As a multi-module cascaded neural network, verifying the effectiveness of each module is necessary. Ablation studies of each module were conducted using the dual-source reconstruction of three different objects by removing or replacing one specific module of 3DOL. The spherical targets were centered at (25, 52, 35) and (65, 52, 35), respectively.

Group 2-Validation experiments at different spectra: In practical application, different optical probes have different emission spectra; however, the networks are always trained at a specific wavelength. Better wavelength generalization ability can avoid retraining the networks when the wavelength changes. The structure of 3DOL indicates that it has a certain learning ability of the optical field and the photon propagation property and has the potential to realize generalization to a wider range of wavelengths. To assess its adaptation to spectra in NIR-I, we randomly generated 50 samples of luminous sources for each digital object and further simulated the test cases at wavelengths between 620 and 900 nm (at 20 nm intervals). These cases were tested using 3DOL trained at 680 nm.

Group 3-Performance validation under different datasets: To demonstrate the performance of 3DOL under various datasets and different wavelengths over other networks, 50 samples of luminous sources for IPS and UHR-DeepFMT were randomly generated, and these data were simulated for test cases at wavelengths between 620 and 900 nm (at a 20 nm interval). The comparison methods, IPS and UHR-DeepFMT were trained and tested using their respective datasets.

3.5 Phantom and in vivo experiments

To demonstrate the universality of 3DOL in a real environment, experiments were conducted on phantom and in vivo models.

In the phantom experiment, a custom-made polyformaldehyde cube (25 $mm$ in side length) with a hole (1.5mm in diameter) was used. The hole was filled with a 2$\mu L$ luminescent liquid (Glow Products, Victoria, Canada), which acted as a luminous source in the phantom. The luminescent liquid emitted NIR-I light with a peak wavelength of approximately 650 nm.

In the in vivo experiments using 4–6-week-old female BALB/C nude mice, two groups of mouse models were established. All experimental procedures were approved by the Animal Ethics Committee of the Northwest University of China. All animal procedures were performed under isoflurane gas anesthesia (3% isoflurane-air mixture) to minimize suffering.

The first group of in vivo experiments was based on source-implanted mouse models, in which a small catheter (5 mm high, 1.5 mm diameter) filled with 6$\mu L$ luminous liquid (Glow Products, Victoria, Canada) as the luminous source was sewn into the upper abdomen, chest, and lower abdomen of three different mice (signed as mouse-1, mouse-2 and mouse-3, respectively). The anatomical structures and geometric information of the source-implanted mouse models were acquired by CT. For subsequent processing, the CT data were segmented into different tissues, including the muscle, heart, liver, lungs, kidneys, and stomach.

The second group of in vivo experiments was conducted to verify the effectiveness of 3DOL for different optical probes. In this study, two optical probes with different spectra were used: a fluorescent probe (TF-IRDye800, 774 nm) [54] and green fluorescent protein (GFP, 671 nm) [55]. In the preparation stage, $1\times 10^{6}$ U87MG cells were injected into the brains of two mice to establish an orthotopic glioma. After 11 days, optical images (fluorescent and bioluminescent) were collected. First, TF-IRDye800 was injected into mice to acquire fluorescence images, and then U87MG cells were labeled with GFP to acquire bioluminescence images. The heads of the two tumor-bearing mice (signed as mouse-4 and mouse-5) were segmented into three tissues (adipose, skull, and brain) using CT data. Additionally, to determine the actual source region, T2-weighted MR images (M3TM, Aspect Imaging, Israel) were acquired with the following parameters: TR 6000 ms, TE 50 ms, slice thickness 0.7 mm, and slice spacing 0.2 mm.

For the raw data of the CT volume acquisition process, after tube warm-up, X-ray calibration, CT attenuation correction, and other necessary preparations, the tube voltage and current were set to 40 kVp and 300 mA, respectively. The rotating stage was rotated $360^{\circ }$ at $1^{\circ }$ intervals to capture X-ray projection images. Before optical acquisition, the EMCCD camera coupled with a bandpass filter was cooled to $-80^{\circ }C$ to reduce the effects of thermal noise. In the optical image acquisition process, a short exposure time (20 s), high shift speed (12.9$\mu$s), low-speed readout rate (1 MHz at 16-bit), and four $\times$ four binning steps were established to increase the signal-to-background ratio.

Specifically, IPS and UHR-DeepFMT were adopted as comparison methods. Considering the requirements of the dataset, we trained the IPS and UHR-DeepFMT separately for the polyformaldehyde cube and each mouse for verification.

3.6 Evaluation index

For quantitative evaluation, six common indices were used to evaluate two different tasks, among which the root mean square error (RMSE), Jensen-Shannon divergence (JS), and cosine (COS) were used to evaluate the optical field recovery task. Intersection over union (IOU), location error (LE), and signal-to-noise ratio (SNR) were also used to evaluate the luminous source reconstruction task.

The RMSE evaluates the optical field recovery tasks by the peak value difference between the recovered field $\phi _{r}$ and the ground truth $\phi$, is defined as

(13)$$RMSE=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\phi_{r}\left(i\right)-\phi\left(i\right)\right)}$$

JS was adopted to evaluate the distribution difference of the optical field recovery tasks between the recovered and true fields.

(14)$$JS=\left\|\phi_{r}log\frac{\phi_{r}}{\phi}+\phi log\frac{\phi}{\phi_{r}}\right\|_{1}$$

The cosine evaluates the similarity between the recovered optical field and the ground truth by measuring their angles:

(15)$$COS=\frac{\phi_{r}^{T}\phi}{\|\phi_{r}\|\|\phi\|}$$

To verify the structural recovery capability of the luminous source, the IOU was adopted to denote the similarity between the true source region $q_{r}$ and the reconstructed source region $\Theta _{t}$:

(16)$$IOU=\frac{\Theta_{r}\cap \Theta_{t}}{\Theta_{r}\cup \Theta_{t}}$$

LE is the Euclidean distance between the reconstructed ($x_{r}$, $y_{r}$, $z_{r}$) and real source barycenter ($x_{0}$, $y_{0}$, $z_{0}$), which was used to evaluate the localization accuracy of the reconstruction results.

(17)$$LE=\sqrt{\left(x_{r} - x_{0}\right )^{2}+\left(y_{r} - y_{0}\right )^{2}+\left(z_{r} - z_{0}\right )^{2}}$$

SNR, expressed in decibels (dB), measures how well the reconstructed image is distinguished from the background noise; the higher the SNR, the better the image produced:

(18)$$SNR = \lg \left\{\frac{\left(\Theta_{t}\right)^{2}}{\left(\Theta_{r}-\Theta_{t}\right)^{2}}\right\}$$

Higher COS, IOU, and SNR indicate better results, whereas lower LE, RMSE, and JS denote better results. The best indices are highlighted in bold, while the worst are underlined in the quantitative results tables.

4. Results

4.1 Loss curve analysis

Figure 6 the results of the loss ablation experiment. During the task of recovering the optical field, the sum of Huber loss and JS loss is the learning objective. The Huber loss alone is capable of reaching convergence at the highest speed. However, it converges at a higher RMSE and a lower COS compared to the sum of the JS and Huber loss. The JS loss converged better than the Huber loss at a slower speed. Moreover, the sum of Huber and JS loss realizes the best convergence of RMSE, JS, and COS with a moderate speed, as shown in Fig. 6(a-c). For the source-reconstruction task shown in Fig. 6(d-f), the NMSE loss realized the convergence of LE, IOU, and SNR but with a slower speed than the sum of NMSE and IOU loss. When the IOU alone acts as a loss function, the values of LE and SNR decline rapidly, whereas the IOU increases rapidly in the first 200 iterations. However, these indexes become worse in the remaining iterations, indicating the IOU loss result in over-fitting under this parameter setting. Therefore, the loss function of JS + Huber and MSE + IOU are reasonable and better choice for the corresponding tasks.

Fig. 6. The ablation study for the loss function. (a-c) are the curves of the RMSE, JS, and COS indexes for the optical field recovery task, respectively. (d-f) are the curves for LE, IOU, and SNR indexes, respectively.

Download Full Size | PDF

4.2 Network model evaluation

To evaluate the operational efficiency of the network, we compared IPS, UHR-DeepFMT, and 3DOL regarding the test time, number of parameters, output image dimensions, and network efficiency, as presented in Table 1. Evidently, the lightweight fully connected structure of IPS has the shortest test time; however, it tends to introduce a large number of parameters in the calculation process. The accuracy of the reconstruction result is constrained by the mesh density, which results in the lowest network efficiency among all algorithms.

Table 1. The evaluation of spatial, temporal and output resolution of different reconstruction network models.

View Table | View all tables in this article

UHR-DeepFMT has the fewest parameters, and its relatively complex framework makes its test time longer than that of IPS. Compared with IPS, the imaging results of voxel-based UHR-DeepFMT are not constrained by the mesh. But the point density of the output image of UHR-DeepFMT is insufficient, which results in a lower network efficiency than 3DOL and limits its application in complex imaging environments.

Compared to IPS and UHR-DeepFMT, 3DOL is a voxel based reconstruction framework with a moderate number of parameters. Although requiring the longest test time, 3DOL achieves the highest network efficiency (approximately 483 times that of IPS and 2.3 times that of UHR-DeepFMT) and is not limited by the mesh accuracy.

4.3 Numerical simulations results

In Group 1, we explored the effectiveness of each module in 3DOL. The ground truth of the optical field and luminous source distribution are shown in Fig. 7(a1-c1) and (d1-f1), respectively.

Fig. 7. Results of ablation experiments. The ground truths of the optical field and luminous source are given by (a1-c1) and (d1-f1), respectively. (a2-f2) are results of removing anatomy provided by CT in BCFM; (a3-f3) are results of replacing GRU with MLP in FSM; (a4-f4) are results of deleting BOM; (a5-f5) are results of replacing PCM with a numerical solution; (a6-f6) are results of a complete 3DOL.

Download Full Size | PDF

First, as illustrated in Fig. 7(a2-c2), the recovered optical fields of 3DOL without anatomical information are over-sparse and close to the object surface, which is far from the ground truth. Similar results were obtained for the reconstructed sources (Fig. 7(d2-f2)). Since boundary optical measurement alone cannot accurately reflect variations in the structure of imaged objects (cubes, heads, and bodies), finding a compromise solution that can satisfy different structures in a unique dataset is difficult.

Secondly, the bidirectional GRU in 3DOL was replaced by multilayer perception (MLP) [56], the results of which are shown in Figs. 7(a3-f3). Obvious artifacts exist along the slice stacking direction in the recovered optical field, which severely deteriorates the source reconstruction result, particularly for the cube case. MLP can only image objects by identifying the differences (energy and geometry distributions) between 2D slices, which cannot fully reflect the full view of a 3D object. For example, unlike the head and body, the geometric information provided by each slice in a cube is the same, leading to worse results compared to the cases in the head and body.

Third, the BOM was removed from the 3DOL. As shown in Fig. 7(a4-c4), the energy intensity of the recovered optical field is lower than the ground truth, and a few low-energy artifacts appear between the two sources, which further causes the deviation of morphological recovery between dual sources, as shown in Fig. 7(d4-f4). 3DOL without BOM exhibits more uncertainty owing to a larger redundant solution domain, which makes it difficult for the parameters to converge completely under fixed iteration settings.

Fourth, the PCM is replaced by a numerical solution of the simplified photon propagation equation (Diffusion Equation), which is a widely used approximation in OMT [57,58]. The results are presented in Fig. 7(a5-f5). The simplified photon propagation equation introduces inevitable model errors, and its accuracy is limited by the mesh density and shape, resulting in the reconstructed sources being oversmooth and deviating in morphology.

Finally, in Fig. 7(a6-f6), the results indicate that the complete 3DOL can image all objects simultaneously and achieve the recovery of optical field density and energy distribution and high-precision luminous source reconstruction in morphology and spatial positions.

Table 2. Quantitative analysis of the ablation experiment for the simulations.

View Table | View all tables in this article

Table 2 lists the quantitative results of the ablation studies, which are consistent with the visual evaluation. The complete 3DOL had the lowest RMSE, JS, and LE and the highest IOU, SNR, and COS in most cases. It can be found that BCFM, FSM, and BOM are indispensable for the recovery of optical field in the scene with multi-imaged objects; deleting or replacing one of them will directly make the network difficult to extract 3D features of imaged objects and result in high RMSE (>0.018) and JS (> 0.2) and low COS (< 0.75). Moreover, 3DOL with BOM deletion can reconstruct partial regions of the source (LE < 0.6mm, IOU > 0.6, and SNR > 36dB) since the PCM is designed based on a physical model that has excellent robustness to low-energy disturbances in the optical field.

In Group 2, Fig. 8 shows the optical field recovery and luminous source reconstruction results of 3DOL trained under a wavelength of 680 nm to deal with diverse spectra in the test cases under different spectra in NIR-I. Specifically, for optical field recovery, RMSE remains flat at wavelengths between 620n–900 nm. JS shows an obvious increment at wavelengths lower than 680 nm; however, it remains acceptable (average < 0.13) and superior when the wavelength is higher than 700 nm (average < 0.025). Moreover, the COS obtains a small increase (from 0.94 to 0.955 in average) when the wavelength rises from 620 nm to 660 nm, and then the COS stays stable between 660–900 nm. The lower and stable RMSE represents that 3DOL can recover the high-energy area of the optical field. However, at wavelengths between 620–680 nm, the declining JS and ascending COS coefficients demonstrate that the optical field recovered by 3DOL has some low-energy artifacts due to the change in the spectrum.

Fig. 8. Statistical results of the robustness of 3DOL at different emission wavelengths for the two tasks. (a-c) are the box-and-whisker plot and the trendline of RMSE, JS, and COS for optical field recovery; (d-f) are the box-and-whisker plot and the trendline of LE, IOU, and SNR for luminous source reconstruction.

Download Full Size | PDF

Moreover, for source reconstruction, the LE coefficient is significantly stable, whereas the IOU is the highest below 680 nm and slightly decreases when generalized to other wavelengths. Additionally, there is only negligible fluctuation in SNR at 620–900 nm, and it reaches its highest value at 680 nm. Although JS and COS fluctuated significantly, the LE, IOU, and SNR for source reconstruction remained stable (0.1 mm < LE < 0.3 mm, 0.65 < IOU < 0.71, and 34 dB < SNR < 38 dB on average), which is attributed to the low-energy artifacts being filtered by the PCM.

Generally, 3DOL has excellent robustness to emission wavelengths between 620–900 nm. Although the spectral changes lead to low-energy artifacts in the optical field recovered by 3DOL, in the process of luminous source reconstruction, the artifacts are filtered by the PCM, which makes 3DOL perform stably in the luminous source reconstruction task.

In Group 3, Fig. 9 are the results of compatibility of 3DOL trained under hybrid datasets of IPS and UHR-DeepFMT. Its generalization ability to wavelengths in NIR-I is compared with IPS and UHR-DeepFMT’s.

Fig. 9. Comparative results of Group 3. (a-c) and (d-f) are the results of comparative experiments of 3DOL (brown area) with UHR-DeepFMT (blue area) and IPS (green area) in different emission wavelengths, respectively

Download Full Size | PDF

First, Fig. 9(a-c) compares the robustness of IPS and 3DOL to different wavelengths in NIR-I. The IOU, LE, and SNR of IPS were unstable. Although the lowest LE (approximately 0.1 mm), highest IOU (> 0.9), and SNR (> 35 dB) at 680 nm in IPS are beyond expectations, their strongly discretized distribution indicates poor performance at all wavelengths. Additionally, the sharp fluctuations in IPS indices along with the change in wavelength are unacceptable, particularly the poor performance at wavelengths of 620, 640, and 660 nm. The results indicate that IPS trained with the datasets at a certain wavelength cannot be generalized to other wavelengths. This is because the fully connected network of IPS tends to introduce numerous parameters, resulting in low fitting ability and poor robustness.

Second, as shown in Fig. 9(d-f), the performance of UHR-DeepFMT is even better than that of 3DOL at 680 nm, which realizes more stable and accurate reconstruction (with lower LE, higher IOU, and SNR). However, the generalization ability of UHR-DeepFMT remains weaker than 3DOL’s. Specifically, when generalized to other wavelengths, although UHR-DeepFMT could realize stable position accuracy at wavelengths higher than 660 nm, its IOU fluctuated with the change in wavelength, and the SNR always performed worse than 3DOL. Particularly at wavelengths below 640 nm, these indices show a noticeable deterioration, which is a trend similar to that of the IPS’s. This is attributed to the fact that the IPS and UHR-DeepFMT cannot adjust to changes in the energy distribution of the boundary optical measurement, which are caused by the significantly increased absorption coefficients at wavelengths under 640 nm.

Finally, as illustrated in Fig. 9, compared to IPS and UHR-DeepFMT, although trained under a hybrid dataset, 3DOL realizes accurate reconstruction and can be stably generalized to different wavelengths in NIR-I. The sophisticated mapping modes expressed in Eq. (1) bring powerful regularization effects to the reconstruction of OMT. Moreover, the cascaded design of BCFM and FSM enables 3DOL to accurately understand the 3D structure of imaging objects from multimodal inputs, and the introduction of an optical field recovery task enables 3DOL to correctly understand the relationship between optical boundary measurement, optical field, and luminous source. The above factors make 3DOL recognize the differences between IPS and UHR-DeepFMT datasets (including different imaged objects, luminescence mechanism, and photon propagation model) and achieve stable reconstruction within a wide spectral range.

4.4 Phantom and in vivo results

Since obtaining the optical field in a real situation is difficult, only the LE, IOU, and SNR indicators are listed in Table 3 and Table 4.

Table 3. Quantitative results of the phantom and source implanted experiments.

View Table | View all tables in this article

Table 4. Quantitative results of the orthotopic glioma experiments.

View Table | View all tables in this article

Particularly, Fig. 10 demonstrates the comparative reconstruction results of phantom and source-implanted mouse models. Since IPS and UHR-DeepFMT cannot recover the optical fields, only the reconstructed sources are displayed. First, the sources reconstructed by IPS were over-sparsed in both the phantom and mouse models, as shown in Fig. 10(a1)-(d1). Particularly in mouse models, the reconstructed luminous sources are discretely distributed and show obvious artifacts outside the boundaries of the true regions.

Fig. 10. Results of the polyformaldehyde cube (the first column) and the source-implanted mouse models (mouse-1, 2, and 3 are respectively corresponding to the last three columns). (a1-d1) and (a2-d2) are the reconstructed luminous sources by IPS and UHR-DeepFMT, respectively. (a3-d3) and (a4-d4) are the recovered optical fields and reconstructed luminous sources by 3DOL, respectively. The highlighted red regions represent the real implant sources.

Download Full Size | PDF

Second, the sources reconstructed by UHR-DeepFMT tend to display a long cylindrical shape, and the energy is distributed almost uniformly, which results from the interpolation between slices with large spacing. In the phantom experiment, there was an artifact with high energy located far from the ground truth, which demonstrated deteriorated reconstruction performance. In addition, there was an angular deviation between the reconstruction and ground truth in the mouse model experiments.

Third, for 3DOL, although the spatial shape of the real luminous sources had a wide difference compared with the training dataset’s, the morphology and location of the reconstructed regions were per the ground truth, demonstrating its superior ability for source reconstruction compared with IPS and UHR-DeepFMT.

The quantitative metrics listed in Table 3 are in accordance with the visualized results. Specifically, IPS tends to obtain lower LE values, but the IOU is far from satisfactory, and its SNRs are at a lower level. UHR-DeepFMT always obtains the highest LE and the lowest IOU. Additionally, the indices of IPS and UHR-DeepFMT show obvious fluctuations when reconstructing different objects, although they are trained separately for each imaged object. 3DOL obtains very stable and excellent results for different imaged objects with IOU > 0.5, LE < 0.3mm and SNR > 20.

Figure 11 shows the visualized results of orthotopic glioma models with different optical probes. First, the results of IPS were oversparse and morphologically close to the tetrahedral unit of the mesh for both bioluminescence and fluorescence images. In mouse 4, IPS reconstructed obvious artifacts outside the truth region for the fluorescence image with the lowest IOU and lowest SNR. This is because the emission wavelength of TF-IRDye800 is farther away from the training simulation wavelength (680 nm) than GFP; however, IPS has poor generalization ability to different wavelengths, leading to deteriorating performance in fluorescence images.

Fig. 11. Reconstruction results of orthotopic glioma experiments. (a1) and (c1) are bioluminescence images for mouse-4 and mouse-5. (b1) and (d1) are fluorescence images for mouse-4 and mouse-5. (a2-d2) and (a4-d4) are the luminous sources reconstructed by IPS and UHR-DeepFMT, respectively. (a6-d6) and (a7-d7) are recovered optical fields and reconstructed luminous sources on the 35th slice of 3DOL, respectively. (a3-d3), (a5-d5), and (a8-d8) are the respective luminous sources reconstructed by 3DOL, IPS, and UHR-DeepFMT, where regions highlighted by the green dotted area are reconstructed regions and the red dotted circles portray the true luminous source provided by MRI.

Download Full Size | PDF

Second, the luminous source regions reconstructed by UHR-DeepFMT overlap part of the real region and deviate inward, with a large portion outside the real boundary. Additionally, compared to GFP, TF-IRDye800 is laser excited, which results in autofluorescence and significantly reduces the reconstruction performance [59], with higher LE, lower IOU, and lower SNR. Specifically, the reconstructed region deviated significantly to the right-hand side of the ground truth, which may be because the laser was closer to the right side, resulting in more autofluorescence.

Finally, although 3DOL was trained on a hybrid dataset, the reconstructed sources for both GFP and TF-IRDye800 showed clear boundaries and morphological similarities with the ground truth. Particularly, there is background noise in the recovered optical field owing to autofluorescence in the fluorescence images; however, 3DOL can finally realize source reconstruction with fewer artifacts with the lowest LE, highest IOU, and highest SNR.

5. Discussion

OMT is a promising technique for preclinical studies; however, the limited universality of the existing approaches for different imaged objects and various optical probes poses key challenges that hinder its practical application. To overcome these limitations, we examined the imaging principle of OMT to abstract a special mapping relationship. Subsequently, a multimodal and multitask reconstruction framework for optical molecular tomography (3DOL) was proposed to concretize this mapping in 3D space by adding an anatomical structure as a priori and dividing OMT into two tasks (optical field recovery and luminous source reconstruction). This deep reconstruction approach is universally compatible with various imaged objects and is highly robust to the emission spectra in NIR-I.

The construction of 3DOL follows the idea of modularization with a creative structural design that conforms to multitask and multimodality DL frameworks to reconstruct large-scale 3D images effectively. Through the incorporation of sequential 2D multimodality tomographic scan slices, a bidirectional GRU was employed to integrate the features of sequential slices to achieve considerable improvement in the volumetric image quality and recover more details of the optical field. Furthermore, 3DOL learned a variable-weight Laplace operator and reconstructed the luminous source from the recovered optical field based on field theory. Particularly, the geometry of the imaged object was applied to accelerate network convergence and further improve the reconstruction performance.

The effectiveness of each module in 3DOL was verified through an ablation study in the simulation experiment. The results indicated that the performance of 3DOL depends on the collective cooperation of all modules. Furthermore, a series of experiments demonstrated that 3DOL could overcome the bottlenecks faced by OMT. Specifically, 1. 3DOL can simultaneously image different objects. 2. The trained 3DOL under 680 nm can generalize the parameters to the entire NIR-I spectra (620–900 nm). 3. 3DOL can be trained with a hybrid dataset that combines IPS and UHR-deepFMT datasets and realizes reconstruction with high image resolution; 4. The advantages of 3DOL in the simulation were further validated using phantom and in vivo experiments, which demonstrated that 3DOL is a practical deep reconstruction network.

Generally, 3DOL has the advantages of universality and scalability, and its crafted structure and unique optimization scheme have significant potential for development in OMT. We believe that the new concept introduced by 3DOL will benefit various preclinical applications of OMT and promote the development of OMT in theoretical research. However, 3DOL has some limitations. For example, as the large number of parameters in GRU takes up considerable memory, we must squeeze the batch size, which causes a long training time of approximately 48 h. However, once the network is well trained, the testing process can be completed in less than 1 s. Additionally, since there is no public dataset in OMT, it is difficult to conduct test to validate the statistical difference of 3DOL in real situation. In the future, we will be devoteded to developing more advanced RNN to improve training effectiveness, and worked on building in-vivo optical mouse model dataset to support the statistical test of methods.

6. Conclusion

We overcame the bottleneck of universal OMT for diverse imaged objects and various optical probes by realizing a brand-new mapping model through a multimodal and multitask reconstruction framework, 3DOL. Particularly, 3DOL recovers the optical field from optical and CT tomographic images and reconstructs the luminous source in space. The results of the numerical simulation and in vivo experiments verified the rationality of 3DOL architecture, the universality of 3DOL under different tissue conditions, and its applicability in practical situations. As a preliminary exploration of the universal OMT approach, we believe that this study pushes OMT to a new level and has broad prospects for applications.

Funding

National Natural Science Foundation of China (12271434, 61901374, 61906154, 61971350, 62271394).

Acknowledgments

The authors are grateful to the CAS Key Laboratory of Molecular Imaging for in vivo the experiment data.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. R. Weissleder and M. Nahrendorf, “Advancing biomedical imaging,” Proc. Natl. Acad. Sci. 112(47), 14424–14428 (2015). [CrossRef]

2. K. Wang, C. Chi, Z. Hu, M. Liu, H. Hui, W. Shang, D. Peng, S. Zhang, J. Ye, H. Liu, and J. Tian, “Optical molecular imaging frontiers in oncology: the pursuit of accuracy and sensitivity,” Engineering 1(3), 309–323 (2015). [CrossRef]

3. S. R. Arridge and J. C. Schotland, “Optical tomography: forward and inverse problems,” Inverse problems 25(12), 123010 (2009). [CrossRef]

4. C. Darne, Y. Lu, and E. M. Sevick-Muraca, “Small animal fluorescence and bioluminescence tomography: a review of approaches, algorithms and technology update,” Phys. Med. Biol. 59(1), R1–R64 (2014). [CrossRef]

5. V. Ntziachristos, C.-H. Tung, C. Bremer, and R. Weissleder, “Fluorescence molecular tomography resolves protease activity in vivo,” Nat. Med. 8(7), 757–761 (2002). [CrossRef]

6. G. Wang, Y. Li, and M. Jiang, “Uniqueness theorems in bioluminescence tomography,” Med. Phys. 31(8), 2289–2299 (2004). [CrossRef]

7. M. Schweiger, S. Arridge, M. Hiraoka, and D. Delpy, “The finite element method for the propagation of light in scattering media: boundary and source conditions,” Med. Phys. 22(11), 1779–1792 (1995). [CrossRef]

8. D. Yang, X. Chen, Z. Peng, X. Wang, J. Ripoll, J. Wang, and J. Liang, “Light transport in turbid media with non-scattering, low-scattering and high absorption heterogeneities based on hybrid simplified spherical harmonics with radiosity model,” Biomed. Opt. Express 4(10), 2209–2223 (2013). [CrossRef]

9. W. Huang, K. Wang, Y. An, H. Meng, Y. Gao, Z. Xiong, H. Yan, Q. Wang, X. Cai, X. Yang, T. Jie, and S. Zhang, “In vivo three-dimensional evaluation of tumour hypoxia in nasopharyngeal carcinomas using fmt-ct and msot,” Eur. J. Nucl. Med. Mol. Imaging 47(5), 1027–1038 (2020). [CrossRef]

10. W. Ren, L. Li, J. Zhang, M. Vaas, J. Klohs, J. Ripoll, M. Wolf, R. Ni, and M. Rudin, “Non-invasive visualization of amyloid-beta deposits in alzheimer amyloidosis mice using magnetic resonance imaging and fluorescence molecular tomography,” Biomed. Opt. Express 13(7), 3809–3822 (2022). [CrossRef]

11. E. J. Candes, J. K. Romberg, and T. Tao, “Stable signal recovery from incomplete and inaccurate measurements,” Comm. Pure Appl. Math. 59(8), 1207–1223 (2006). [CrossRef]

12. H. Zhang, X. He, J. Yu, X. He, H. Guo, and Y. Hou, “L1-l2 norm regularization via forward-backward splitting for fluorescence molecular tomography,” Biomed. Opt. Express 12(12), 7807–7825 (2021). [CrossRef]

13. Y. Chen, W. Li, M. Du, L. Su, H. Yi, F. Zhao, K. Li, L. Wang, and X. Cao, “Elastic net-based non-negative iterative three-operator splitting strategy for cerenkov luminescence tomography,” Opt. Express 30(20), 35282–35299 (2022). [CrossRef]

14. H. Guo, J. Yu, X. He, H. Yi, Y. Hou, and X. He, “Total variation constrained graph manifold learning strategy for cerenkov luminescence tomography,” Opt. Express 30(2), 1422–1441 (2022). [CrossRef]

15. S. Zhang, X. Ma, Y. Wang, M. Wu, H. Meng, W. Chai, X. Wang, S. Wei, and J. Tian, “Robust reconstruction of fluorescence molecular tomography based on sparsity adaptive correntropy matching pursuit method for stem cell distribution,” IEEE Trans. Med. Imaging 37(10), 2176–2184 (2018). [CrossRef]

16. L. Yin, K. Wang, T. Tong, Q. Wang, Y. An, X. Yang, and J. Tian, “Adaptive grouping block sparse bayesian learning method for accurate and robust reconstruction in bioluminescence tomography,” IEEE Trans. Biomed. Eng. 68(11), 3388–3398 (2021). [CrossRef]

17. P. Zhang, C. Ma, F. Song, G. Fan, Y. Sun, Y. Feng, X. Ma, F. Liu, and G. Zhang, “A review of advances in imaging methodology in fluorescence molecular tomography,” Phys. Med. Biol. 67(10), 10TR01 (2022). [CrossRef]

18. L. Huang, H. Chen, Y. Luo, Y. Rivenson, and A. Ozcan, “Recurrent neural network-based volumetric fluorescence microscopy,” Light: Sci. Appl. 10(1), 62 (2021). [CrossRef]

19. G. Wang, J. C. Ye, and B. De Man, “Deep learning for tomographic image reconstruction,” Nat. Mach. Intell. 2(12), 737–748 (2020). [CrossRef]

20. A. X. Cong and G. Wang, “A finite-element-based reconstruction method for 3d fluorescence tomography,” Opt. Express 13(24), 9847–9857 (2005). [CrossRef]

21. Y. Gao, K. Wang, Y. An, S. Jiang, H. Meng, and J. Tian, “Nonmodel-based bioluminescence tomography using a machine-learning reconstruction strategy,” Optica 5(11), 1451–1454 (2018). [CrossRef]

22. H. Meng, Y. Gao, X. Yang, K. Wang, and J. Tian, “K-nearest neighbor based locally connected network for fast morphological reconstruction in fluorescence molecular tomography,” IEEE Trans. Med. Imaging 39(10), 3019–3028 (2020). [CrossRef]

23. Z. Zhang, M. Cai, Y. Gao, X. Shi, X. Zhang, Z. Hu, and J. Tian, “A novel cerenkov luminescence tomography approach using multilayer fully connected neural network,” Phys. Med. Biol. 64(24), 245010 (2019). [CrossRef]

24. L. Guo, F. Liu, C. Cai, J. Liu, and G. Zhang, “3d deep encoder–decoder network for fluorescence molecular tomography,” Opt. Lett. 44(8), 1892–1895 (2019). [CrossRef]

25. P. Zhang, G. Fan, T. Xing, F. Song, and G. Zhang, “Uhr-deepfmt: ultra-high spatial resolution reconstruction of fluorescence molecular tomography based on 3-d fusion dual-sampling deep neural network,” IEEE Trans. Med. Imaging 40(11), 3217–3228 (2021). [CrossRef]

26. D. Li, C. Chen, J. Li, and Q. Yan, “Reconstruction of fluorescence molecular tomography based on graph convolution networks,” J. Opt. 22(4), 045602 (2020). [CrossRef]

27. J.-B. Li, H.-W. Liu, T. Fu, R. Wang, X.-B. Zhang, and W. Tan, “Recent progress in small-molecule near-ir probes for bioimaging,” Trends in chemistry 1(2), 224–234 (2019). [CrossRef]

28. Z. Hu, W.-H. Chen, J. Tian, and Z. Cheng, “Nirf nanoprobes for cancer molecular imaging: approaching clinic,” Trends Mol. Med. 26(5), 469–482 (2020). [CrossRef]

29. B. Wang, S. Li, L. Zhang, J. Li, Y. Zhao, J. Yu, X. He, H. Guo, and X. He, “A review of methods for solving the optical molecular tomography,” J. Appl. Phys. 133(13), 130701 (2023). [CrossRef]

30. D. A. Boas, D. H. Brooks, E. L. Miller, C. A. DiMarzio, M. Kilmer, R. J. Gaudette, and Q. Zhang, “Imaging the body with diffuse optical tomography,” IEEE Signal Process. Mag. 18(6), 57–75 (2001). [CrossRef]

31. J. Feng, W. Zhang, Z. Li, K. Jia, S. Jiang, H. Dehghani, B. W. Pogue, and K. D. Paulsen, “Deep-learning based image reconstruction for mri-guided near-infrared spectral tomography,” Optica 9(3), 264–267 (2022). [CrossRef]

32. K. O’Shea and R. Nash, “An introduction to convolutional neural networks,” arXiv, arXiv:1511.08458 (2015). [CrossRef]

33. W. Zaremba, I. Sutskever, and O. Vinyals, “Recurrent neural network regularization,” arXiv, arXiv:1409.2329 (2014). [CrossRef]

34. S. Li, J. Yu, X. He, H. Guo, and X. He, “Voxdmrn: a voxelwise deep max-pooling residual network for bioluminescence tomography reconstruction,” Opt. Lett. 47(7), 1729–1732 (2022). [CrossRef]

35. A. Alfalou and C. Brosseau, “Recent advances in optical image processing,” Prog. Opt. 60, 119–262 (2015). [CrossRef]

36. X. Chen, X. Gao, D. Chen, X. Ma, X. Zhao, M. Shen, X. Li, X. Qu, J. Liang, J. Ripoll, and J. Tian, “3d reconstruction of light flux distribution on arbitrary surfaces from 2d multi-photographic images,” Opt. Express 18(19), 19876–19893 (2010). [CrossRef]

37. X. Xiang, Y. Tian, V. Rengarajan, L. D. Young, B. Zhu, and R. Ranjan, “Learning spatio-temporal downsampling for effective video upscaling,” Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XVIII pp. 162–181 (2022).

38. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” Proceedings of the IEEE conference on computer vision and pattern recognition pp. 770–778 (2016).

39. J. Chung, C. Gulcehre, K. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv, arXiv:1412.3555 (2014). [CrossRef]

40. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation 9(8), 1735–1780 (1997). [CrossRef]

41. B. J. Wythoff, “Backpropagation neural networks: a tutorial,” Chemom. Intell. Lab. Syst. 18(2), 115–155 (1993). [CrossRef]

42. C. Yi and J. Huang, “Semismooth newton coordinate descent algorithm for elastic-net penalized huber loss regression and quantile regression,” Journal of Computational and Graphical Statistics 26(3), 547–557 (2017). [CrossRef]

43. B. Fuglede and F. Topsoe, “Jensen-shannon divergence and hilbert space embedding,” International Symposium onInformation Theory, 2004. ISIT 2004. Proceedings. p. 31 (2004).

44. E. Lescano and D. Marqués, “Second order higher-derivative corrections in double field theory,” J. High Energ. Phys. 2017(6), 104–129 (2017). [CrossRef]

45. D. Deutsch and P. Candelas, “Boundary effects in quantum field theory,” Phys. Rev. D 20(12), 3063–3080 (1979). [CrossRef]

46. H. Rezatofighi, N. Tsoi, J. Gwak, A. Sadeghian, I. Reid, and S. Savarese, “Generalized intersection over union: A metric and a loss for bounding box regression,” Proceedings of the IEEE/CVF conference on computer vision and pattern recognition pp. 658–666 (2019).

47. D. Yarotsky, “Error bounds for approximations with deep relu networks,” Neural Networks 94, 103–114 (2017). [CrossRef]

48. D. M. Allen, “Mean square error of prediction as a criterion for selecting variables,” Technometrics 13(3), 469–475 (1971). [CrossRef]

49. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv, arXiv:1412.6980 (2014). [CrossRef]

50. G. Alexandrakis, F. R. Rannou, and A. F. Chatziioannou, “Tomographic bioluminescence imaging by use of a combined optical-pet (opet) system: a computer simulation feasibility study,” Phys. Med. Biol. 50(17), 4225–4241 (2005). [CrossRef]

51. W.-F. Cheong, S. A. Prahl, and A. J. Welch, “A review of the optical properties of biological tissues,” IEEE J. Quantum Electron. 26(12), 2166–2185 (1990). [CrossRef]

52. B. Dogdas, D. Stout, A. F. Chatziioannou, and R. M. Leahy, “Digimouse: a 3d whole body mouse atlas from ct and cryosection data,” Phys. Med. Biol. 52(3), 577–587 (2007). [CrossRef]

53. S. Yan and Q. Fang, “Hybrid mesh and voxel based monte carlo algorithm for accurate and efficient photon transport modeling in complex bio-tissues,” Biomed. Opt. Express 11(11), 6262–6270 (2020). [CrossRef]

54. E. L. Rosenthal, J. M. Warram, E. De Boer, T. K. Chung, M. L. Korb, M. Brandwein-Gensler, T. V. Strong, C. E. Schmalbach, A. B. Morlandt, G. Agarwal, Y. E. Hartman, W. R. Carroll, J. S. Richman, L. K. Clemons, L. M. Nabell, and K. R. Zinn, “Safety and tumor specificity of cetuximab-irdye800 for surgical navigation in head and neck cancerfluorescent-labeled antibodies in human trials,” Clin. Cancer Res. 21(16), 3658–3666 (2015). [CrossRef]

55. M. Chalfie, Y. Tu, G. Euskirchen, W. W. Ward, and D. C. Prasher, “Green fluorescent protein as a marker for gene expression,” Science 263(5148), 802–805 (1994). [CrossRef]

56. M. W. Gardner and S. Dorling, “Artificial neural networks (the multilayer perceptron)–a review of applications in the atmospheric sciences,” Atmos. Environ. 32(14-15), 2627–2636 (1998). [CrossRef]

57. X. Zhang, X. Cao, P. Zhang, F. Song, J. Zhang, L. Zhang, and G. Zhang, “Self-training strategy based on finite element method for adaptive bioluminescence tomography reconstruction,” IEEE Trans. Med. Imaging 41(10), 2629–2643 (2022). [CrossRef]

58. R. Elaloufi, R. Carminati, and J.-J. Greffet, “Time-dependent transport through scattering media: from radiative transfer to diffusion,” J. Opt. A: Pure Appl. Opt. 4(5), 355S103 (2002). [CrossRef]

59. C. Qin, J. Feng, S. Zhu, X. Ma, J. Zhong, P. Wu, Z. Jin, and J. Tian, “Recent advances in bioluminescence tomography: methodology and system as well as application,” Laser Photonics Rev. 8(1), 94–114 (2014). [CrossRef]

Method	Parameters scale	Testing duration(s)	Output dimension	Network efficiency
IPS	$\underline{52244970}$	0.134( $\pm$ 0.008)	$\underline{3265 points}$	$\underline{6.249 E - 5}$
UHR-DeepFMT	4912912	0.949( $\pm$ 0.021)	65536 points	1.334E-2
3DOL	21690664	$\underline{1.315 (\pm 0.009)}$	655200 points	3.021E-2

Imaged object	Method	Luminous source			Optical field
Imaged object	Method	LE(mm)	IOU	SNR(dB)	RMSE	JS	COS
cube	Del CT inputs	$\underline{1.4368}$	$\underline{0.0017}$	$\underline{6.4442}$	$\underline{0.1174}$	$\underline{0.6890}$	$\underline{0.0087}$
	Replace GRU with MLP	0.5996	0.1591	19.3474	0.0553	0.4782	0.4167
	Del BOM	0.4753	0.6014	27.5594	0.0226	0.2035	0.7486
	Replace PCM with Numerical solution	0.2981	0.4961	28.2124	0.0093	0.0560	0.9463
	Complete 3DOL	0.1738	0.8343	36.9698	0.0093	0.0560	0.9463
brain	Del CT inputs	0.3305	$\underline{0.0455}$	$\underline{9.7773}$	$\underline{0.0375}$	$\underline{0.6630}$	$\underline{0.0324}$
	Replace GRU with MLP	0.2065	0.6462	28.1159	0.0180	0.2088	0.7190
	Del BOM	$\underline{0.5927}$	0.5711	32.3251	0.0232	0.2201	0.7297
	Replace PCM with Numerical solution	0.3380	0.5580	30.1149	0.0075	0.0413	0.9587
	Complete 3DOL	0.1434	0.8751	37.6389	0.0075	0.0413	0.9587
body	Del CT inputs	0.2047	$\underline{0.0283}$	$\underline{10.7330}$	$\underline{0.0487}$	$\underline{0.6685}$	$\underline{0.0374}$
	Replace GRU with MLP	0.2636	0.3364	23.4374	0.0305	0.4456	0.3654
	Del BOM	$\underline{0.4361}$	0.5955	31.3606	0.0219	0.2458	0.6688
	Replace PCM with Numerical solution	0.0895	0.4218	28.3057	0.0100	0.0638	0.9351
	Complete 3DOL	0.0857	0.8244	37.0914	0.0100	0.0638	0.9351

Imaged object	Method	LE	IOU	SNR(dB)
polyformaldehyde cube	IPS	0.2146	$\underline{0.4307}$	21.7862
	UHR-DeepFMT	$\underline{0.9783}$	0.6374	$\underline{21.5238}$
	3DOL	0.2087	0.6385	21.8625
mouse-1 (upper abdomen)	IPS	0.2683	$\underline{0.2705}$	$\underline{16.4255}$
	UHR-DeepFMT	$\underline{0.3728}$	0.2861	19.0865
	3DOL	0.315	0.5903	23.3078
mouse-2 (chest)	IPS	0.5114	$\underline{0.2503}$	$\underline{13.2358}$
	UHR-DeepFMT	$\underline{0.7145}$	0.2589	18.4093
	3DOL	0.2205	0.5924	25.2501
mouse-3 (lower abdomen)	IPS	0.5913	0.3208	$\underline{16.9699}$
	UHR-DeepFMT	$\underline{0.7833}$	$\underline{0.3048}$	23.2983
	3DOL	0.1636	0.6232	25.3348

Imaged object	Method	LE	IOU	SNR(dB)
mouse-4(GFP)	IPS	$\underline{0.5976}$	$\underline{0.3249}$	28.8585
	UHR-DeepFMT	0.3261	0.5913	$\underline{17.9910}$
	3DOL	0.0781	0.8455	36.8815
mouse-4(TF-IRDye800)	IPS	0.3130	$\underline{0.1847}$	$\underline{7.6728}$
	UHR-DeepFMT	$\underline{0.5976}$	0.2492	12.7911
	3DOL	0.1206	0.6534	17.1278
mouse-5(GFP)	IPS	0.4093	0.4917	$\underline{23.2107}$
	UHR-DeepFMT	$\underline{0.5990}$	$\underline{0.4008}$	25.8641
	3DOL	0.1802	0.6649	29.6218
mouse-5(TF-IRDye800)	IPS	0.5811	0.3317	18.4812
	UHR-DeepFMT	$\underline{0.6185}$	$\underline{0.2568}$	$\underline{16.4045}$
	3DOL	0.4228	0.5136	26.9783

Method	Parameters scale	Testing duration(s)	Output dimension	Network efficiency
IPS	$\underline{52244970}$	0.134( $\pm$ 0.008)	$\underline{3265 points}$	$\underline{6.249 E - 5}$
UHR-DeepFMT	4912912	0.949( $\pm$ 0.021)	65536 points	1.334E-2
3DOL	21690664	$\underline{1.315 (\pm 0.009)}$	655200 points	3.021E-2

3D-deep optical learning: a multimodal and multitask reconstruction framework for optical molecular tomography

Abstract

1. Introduction

2. Method

2.1 Optical field recovery

2.2 Luminous source reconstruction

2.3 Implementation and optimization procedure

3. Experiment setting

3.1 Imaging device

3.2 Optical parameters

3.3 Dataset

3.4 Simulation experiments

3.5 Phantom and in vivo experiments

3.6 Evaluation index

4. Results

4.1 Loss curve analysis

4.2 Network model evaluation

4.3 Numerical simulations results

4.4 Phantom and in vivo results

5. Discussion

6. Conclusion

Funding

Acknowledgments

Disclosures

Data availability

References

Data availability

Cited By

Figures (11)

Tables (4)

Equations (18)

Optics Express