
100 kHz CH2O imaging realized by lower speed planar laser-induced fluorescence and deep learning


Abstract

This paper reports an approach to interpolate planar laser-induced fluorescence (PLIF) images of CH2O between consecutive experimental frames by means of computational imaging realized with a convolutional neural network (CNN). Such a deep-learning-based method can achieve higher temporal resolution for 2D visualization of intermediate species in combustion based on high-speed experimental images. The capability of the model was tested by generating 100 kHz PLIF images through interpolating single and multiple frames into sequences of experimental images of lower frequencies (50, 33, 25 and 20 kHz). Results show that the prediction indices, including intersection over union (IoU), peak signal to noise ratio (PSNR), structural similarity index (SSIM), and the time-averaged correlation coefficient at various axial positions, achieve acceptable accuracy. This work sheds light on the utilization of CNN-based models for optical flow computation and image sequence interpolation, and also provides an efficient off-line model as an alternative pathway to overcome the experimental challenges of state-of-the-art ultra-high-speed PLIF techniques, e.g., to further increase the repetition rate and save data transfer time.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Formaldehyde (CH2O) is a critical intermediate species which is produced in the low-temperature oxidation region and then consumed in the subsequent combustion process [1–3]. CH2O mostly appears in the preheat region and is thus usually used as an indicator for such preheat zones [4]. The measurement of CH2O is significant for resolving the spatial and temporal evolution of the flame preheat zone [5], which can provide further insights into the turbulence-chemistry interaction [6–8] and is also useful for model validation. Taking methane flames as an example, the chemical reactions start with an attack on the fuel molecule by OH, O and H radicals, which produces the methyl radical (CH3). An oxygen atom can then combine with a methyl radical to produce H and CH2O, i.e., CH3 + O $\to $ CH2O + H. This direct pathway from methyl to formaldehyde is the dominant reaction in the preheat zone. Formaldehyde in turn reacts with OH, O and H radicals in a thin layer where heat is released [9]. Therefore, CH2O is usually regarded as a marker for the preheat zone, while in combustion diagnostics the spatial overlap of CH2O and OH corresponds to the heat release zone or flame front, given that it is challenging to directly detect the product of the CH2O and OH reaction, i.e., HCO, due to its low concentration. The edge of the CH2O distribution in planar laser-induced fluorescence (PLIF) images can also moderately represent turbulent structures of the flame front; thus the distribution of CH2O is important data for combustion model validation. Recent development of laser diagnostic techniques enables the visualization of combustion products with high spatio-temporal resolution [10–12]. For example, PLIF adopting the burst-mode laser system with prolonged sequences [13,14] has been reported to be successful in capturing the distribution of CH2O at various ultra-high frequencies [10,15–19]. These previous works have demonstrated the advantages of PLIF in being non-intrusive and offering high spatio-temporal resolution and high precision. However, to achieve consecutive high-speed PLIF diagnostics, a complicated laser imaging system is always needed, whose delicate operation and high experimental cost constrain the adoption of this high-speed imaging technique. Moreover, the repetition-rate restriction of lasers and cameras has been another limitation of PLIF applications [20].

Alternatively, the measurement frequency can potentially be increased by temporally interpolating computational PLIF images between the measured ones of lower speed [21–23]. As a promising method in computational imaging, deep learning algorithms, the convolutional neural network (CNN) in particular, have been widely applied in image classification [24], segmentation [25], object detection [26], high-resolution image generation [27,28] and frame sequence interpolation [29,30]. The CNN inherits the concept and advantages of the standard artificial neural network and is typically designed for 2D image processing [31,32]. A CNN involves forward computation [31], updates of the weights (learnable coefficients) with optimization algorithms [32], overfitting mitigation [33], and structure pruning to alleviate the computational load [34]. These strategies make the CNN a powerful nonlinear model that outperforms other machine learning methods in image feature capturing and reconstruction [35]. Beyond the classical implementation on single images, CNN structures also perform well on frame sequence interpolation, e.g., for resolving the subtle movement of fast-moving objects. These models predict the consecutive movement between input frames and then interpolate the intermediate frames in-between by simulating the subtle variation of the imaged objects, hence artificially increasing the measurement frequency and reducing the requirements on the camera setup while capturing high-speed motions. For example, a robust network presented by Niklaus et al. employs a spatially adaptive CNN to capture the local motion between the input frames and the intermediate frames; an optimized version of this adaptive CNN model with separable 1D convolution kernels and fewer weights was reported in [30]. Liu et al. trained a model, which combines traditional optical flow and a CNN, to synthesize intermediate frames from input sequences with high efficiency and precision [36]. These previous works on frame sequence interpolation using CNNs inspire similar applications of CNN models in high-speed flow diagnostics using planar imaging techniques, to moderately overcome the repetition-rate limit of the optical system. However, considering the complex spatial structures of highly turbulent flows [37], a careful assessment of such an application of CNNs in turbulent flows must be conducted.

Therefore, the main aim of this work is to explore the possibility of using a specified CNN model to predict planar image sequences of higher frequency. Taking CH2O PLIF sequences collected in a turbulent flame as an example, high-frequency frame sequences of 100 kHz are predicted from those of 50 kHz, 33 kHz, 25 kHz, and 20 kHz, respectively. The details of the CNN model, the process of data training and prediction, as well as the assessment of prediction accuracy are presented and discussed.

2. Methodology

2.1 Experimental setup

A hybrid porous-plug/jet burner with a centered nozzle of 1.5 mm in diameter was adopted for generating a premixed methane/air jet flame. The details of the burner, also named the Lund University Piloted Jet burner, can be found in references [38,39]. This jet flame was stabilized by a reacting co-flow with a diameter of 61 mm. The reacting co-flow was generated from a premixed methane/air flat flame with an equivalence ratio of 0.9 and a flow speed of 0.3 m/s. To regulate the gas flow rate, eight mass flow controllers (Bronkhorst) calibrated at 300 K with a precision better than 98.5% were utilized. The speed of the jet flow is 66 m/s, corresponding to an exit Reynolds number of 5796 and a turbulent Reynolds number of 95 at y/d = 30. With regard to the laser-camera setup, as shown in Fig. 1, a 355 nm laser beam produced by a third harmonic generator (KDP type) of the 1064 nm laser was employed to excite CH2O. The laser beam was then formed into a laser sheet of 24 mm in height by a cylindrical lens (f = –75 mm) and a spherical lens (f = +300 mm). The resulting pulse energy of the 355 nm laser was around 25 mJ/pulse. An intensifier (LaVision HS-IRO) with a 40 ns exposure gate and a high-speed CMOS camera (Photron Fastcam SA-Z) were combined to record the CH2O PLIF images. The intensifier was set in the middle of its gain range. In front of the intensifier and camera, an objective lens (Nikon f#1.2, f = 50 mm) and a GG385 long-pass filter were mounted, leading to a resolution of 31 $\mu $m/pixel. The parameters of the laser imaging system are listed in Table 1. The CH2O PLIF data were recorded at a frequency of 100 kHz. In this experiment, 55 segments of PLIF sequences containing 16,972 images were recorded in total. More details of the experimental setup can be found elsewhere [10].


Fig. 1. The experimental setup for high-speed (100 kHz) CH2O PLIF imaging in a turbulent flame.


Table 1. Parameters of laser imaging system

2.2 Description of the dataset

As the purpose of this study is to accelerate CH2O PLIF imaging, the initial 100 kHz experimental dataset was down-sampled to 50 kHz, 33 kHz, 25 kHz, and 20 kHz, respectively, as input to train the CNN-based PLIF interpolation network. The consecutive frames in each dataset were then organized as shorter image sequences in the order of ${I_0}$, ${I_t}$, and ${I_1}$, corresponding to the start frame, ground truth frame, and end frame. The regularized interpolation timing $t \in ({0,\; 1} )$ between the start and end frames controls the timing of the interpolated frames. For example, when t = 0.5, only one frame is interpolated in-between the two input consecutive images, which doubles the imaging frequency. With the same definition, when t = 0.33 and 0.66, two frames are inserted in-between two input images at their respective timings, which triples the measurement frequency. The specification of the datasets and the corresponding interpolation timings is shown in Table 2; a sketch of how such training triplets can be assembled is given after the table.


Table 2. Specification of dataset
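
To make the dataset organization concrete, the following minimal Python sketch (function and variable names are hypothetical; the authors' actual data pipeline is not published) assembles such (${I_0}$, ${I_t}$, ${I_1}$) triplets from a 100 kHz frame stack for a given down-sampling factor:

import numpy as np

def make_triplets(frames, step):
    """Organize a 100 kHz frame stack into (I0, It, I1) training triplets.

    frames : np.ndarray of shape (N, H, W), the experimental 100 kHz sequence
    step   : down-sampling factor; step=2 keeps every other frame as input
             (a 50 kHz pair), and the skipped frames become ground truth
    """
    triplets = []
    for i in range(0, len(frames) - step, step):
        I0, I1 = frames[i], frames[i + step]
        for k in range(1, step):          # skipped frames serve as ground truth
            t = k / step                  # regularized timing in (0, 1)
            triplets.append((I0, frames[i + k], I1, t))
    return triplets

# step = 2 doubles the frequency (t = 0.5); step = 3 triples it (t = 1/3, 2/3)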

2.3 PLIF interpolation network

2.3.1 Interpolation strategy and network structure

This interpolation approach is based on the assumption that the motion of imaged objects in two consecutive frames can be reflected by optical flow [29,40,41]; such an estimated flow is then combined with appropriate interpolation methods to simulate the un-imaged intermediate frames [23,29,42,43].

Interpolation strategy. Let ${I_0}$ and ${I_1}$ represent two consecutive frames, the regularized $t \in ({0,\; 1} )$ represent the interpolation timing, and ${\hat{I}_t}$ denote the interpolated frame; the estimated optical flows from timing t to 0 and t to 1 are described as ${\hat{F}_{t \to 0}}$ and ${\hat{F}_{t \to 1}}$, and the optical flows from timing 0 to 1 and 1 to 0 are expressed as ${F_{0 \to 1}}$ and ${F_{1 \to 0}}$, respectively. The key point of this method is to estimate the optical flows ${\hat{F}_{t \to 0}}$ and ${\hat{F}_{t \to 1}}$, and then use them to form the interpolated frame ${\hat{I}_t}$. In practice, the intermediate frame ${\hat{I}_t}$ is not measured, so there is no direct way to estimate the optical flow fields ${\hat{F}_{t \to 0}}$ and ${\hat{F}_{t \to 1}}$. However, an interpolation approach was proposed by Jiang et al. [29], which fuses the interpolation timing t and the optical flows ${F_{0 \to 1}}$ and ${F_{1 \to 0}}$ as follows:

$${\hat{F}_{t \to 0}} ={-} ({1 - t} )t{F_{0 \to 1}} + {t^2}{F_{1 \to 0}}$$
$${\hat{F}_{t \to 1}} = {({1 - t} )^2}{F_{0 \to 1}} - ({1 - t} )t{F_{1 \to 0}}$$
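
A minimal sketch of this fusion step follows (NumPy-style broadcasting; the flow array layout (2, H, W) is an assumption):

def intermediate_flow(F01, F10, t):
    """Fuse the bidirectional flows into the flows at timing t, Eqs. (1)-(2).

    F01, F10 : arrays of shape (2, H, W), the optical flows 0->1 and 1->0
    t        : regularized interpolation timing in (0, 1)
    """
    F_t0 = -(1.0 - t) * t * F01 + t * t * F10           # Eq. (1)
    F_t1 = (1.0 - t) ** 2 * F01 - (1.0 - t) * t * F10   # Eq. (2)
    return F_t0, F_t1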

Correspondingly, let $g({{I_0},\; {{\hat{F}}_{t \to 0}}} )$ and $g({{I_1},\; {{\hat{F}}_{t \to 1}}} )$ represent the interpolated frames synthesized from the two directions, respectively, where $ g({\cdot,\cdot } )$ is a backward warping function which can be implemented by bilinear interpolation [29,36,44]. In image-based interpolation methods, there is an underlying temporal-consistency principle: the closer the interpolation timing t is to frame ${I_0}$, the more ${I_0}$ contributes to the interpolated frame ${\hat{I}_t}$, and vice versa. In addition, the consecutive motion causes variation of the pixel intensity, which means that a pixel may be partially occluded or newly appear in the interpolated frame. The above issues are summarized as the pixel occlusion problem and addressed by the visible map [29]. The visible map V is defined as a matrix consisting of elements from 0 to 1, where 0 means pixels in the interpolated frame are totally occluded by the motion of other pixels, while 1 represents no occlusion. Thus, together with the consideration of the visibility maps ${V_{t \leftarrow 0}}$ and ${V_{t \leftarrow 1}}$, the interpolated frame can be represented as:

$${\hat{I}_t} = {Z^{ - 1}} \odot [{({1 - t} ){V_{t \leftarrow 0}} \odot g({{I_0},\,{{\hat{F}}_{t \to 0}}} )+ t{V_{t \leftarrow 1}} \odot g({{I_1},\,{{\hat{F}}_{t \to 1}}} )} ]$$
where the timing factor t is introduced to combine the temporal consistency and $Z = ({1 - t} ){V_{t \leftarrow 0}}\; \, + \; \,t{V_{t \leftarrow 1}}$ is a normalization factor.
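
For concreteness, a minimal PyTorch sketch of the backward warping function $g({\cdot,\cdot})$ and of the fusion in Eq. (3) is given below. This is a sketch, not the authors' implementation; the tensor layout and the use of grid_sample for the bilinear sampling are assumptions:

import torch
import torch.nn.functional as F

def backward_warp(img, flow):
    # g(I, F): sample I at positions displaced by flow, bilinearly
    # img: (B, C, H, W); flow: (B, 2, H, W) in pixel units
    B, _, H, W = img.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    gx = xs.to(img) + flow[:, 0]                     # displaced x coordinates
    gy = ys.to(img) + flow[:, 1]                     # displaced y coordinates
    grid = torch.stack((2 * gx / (W - 1) - 1,        # normalize to [-1, 1]
                        2 * gy / (H - 1) - 1), dim=-1)
    return F.grid_sample(img, grid, mode="bilinear", align_corners=True)

def fuse(I0, I1, F_t0, F_t1, V_t0, V_t1, t):
    # Eq. (3): visibility-weighted fusion of the two warped frames
    Z = (1 - t) * V_t0 + t * V_t1                    # normalization factor
    return ((1 - t) * V_t0 * backward_warp(I0, F_t0)
            + t * V_t1 * backward_warp(I1, F_t1)) / Z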

Improved visible map. In motion-based interpolation methods, the movement of objects between frames always causes occlusion and intensity variation. Thus, the visible map V is introduced as a weighting coefficient of the interpolated frames to control the in-between morphological variation of the objects [43,45]. In this paper, rather than using constant, manually set, or linearly computed visible maps, a nonlinear, trainable and self-adaptive visible map is proposed to address the occlusion problem, as well as to afford flexible variation of the targets in interpolated frames. This is achieved by the following equations:

$${V_{t \leftarrow 1}} = \textrm{sigmoid}[{g({{I_0},\,{{\hat{F}}_{t \to 0}}} )- g({{I_1},\,{{\hat{F}}_{t \to 1}}} )} ]$$
$${V_{t \leftarrow 0}} = 1 - {V_{t \leftarrow 1}}$$
where $\textrm{sigmoid}(x) = 1/({1 + {e^{ - x}}})$ is widely utilized as an activation function in deep learning to introduce non-linearity. By combining the coarsely estimated bidirectional frames $g({{I_0},\; {{\hat{F}}_{t \to 0}}} )$ and $g({{I_1},\; {{\hat{F}}_{t \to 1}}} )$ with the non-linear activation function $\textrm{sigmoid}({\cdot} )$, the elements of the visible map can be updated during the training process as the parameters of the network are globally optimized.
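
In code, Eqs. (4) and (5) reduce to a few differentiable operations (a sketch; g0 and g1 denote the coarsely warped frames $g({I_0},{\hat{F}_{t \to 0}})$ and $g({I_1},{\hat{F}_{t \to 1}})$):

import torch

def visible_maps(g0, g1):
    # Eqs. (4)-(5): the sigmoid keeps the maps in (0, 1) and differentiable,
    # so they are refined as the network weights are globally optimized
    V_t1 = torch.sigmoid(g0 - g1)
    V_t0 = 1.0 - V_t1
    return V_t0, V_t1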

PLIF interpolation network. This network is based on the Super SloMo architecture [29]. Following the interpolation strategy described above, estimating the bidirectional optical flow between the two input frames is the first and critical step for the later computation [29]. In this study, a branch of the CNN, a U-net structure [25] presented in the dashed block in Fig. 2, was trained to estimate the in-between flow. More precisely, the U-net structure was first adopted to approximate the optical flows ${F_{0 \to 1}}$ and ${F_{1 \to 0}}$ from the input frames ${I_0}$ and ${I_1}$. Then, these two bidirectional flows were used to compute the intermediate items $\textrm{g}({{I_1},{{\hat{F}}_{t \to 1}}} )$, ${\hat{F}_{t \to 1}}$, ${\hat{F}_{t \to 0}}$, and $\textrm{g}({{I_0},{{\hat{F}}_{t \to 0}}} )$, through the computation procedures illustrated above. Together with the original frames ${I_1}$ and ${I_0}$, these quantities were fed into the second U-net structure, i.e., the arbitrary-timing flow interpolation net, to further refine the flows ${\hat{F}_{t \to 1}}$ and ${\hat{F}_{t \to 0}}$, as well as the nonlinear learnable visible maps ${V_{t \leftarrow 0}}$ and ${V_{t \leftarrow 1}}$. Eventually, these refined items are combined through Eq. (3) to complete the interpolation. In this study, the strategy was merged into an entire CNN structure and formed as an end-to-end model, shown in Fig. 2.


Fig. 2. Structure of the PLIF interpolation network based on the Super SloMo architecture.


The U-net is an encoder-decoder network [25]. The basic computing unit of the U-net structure in this paper is the hierarchy [29], which is composed of two convolutional layers and one Leaky ReLU layer. There are six hierarchies in the encoding part; at the end of each hierarchy except the last one, an average pooling layer with a stride of two is adopted to decrease the spatial dimension. Meanwhile, there are five hierarchies in the decoder part; at the beginning of each hierarchy, a bilinear up-sampling layer with a factor of two is added to expand the spatial dimension. As was mentioned in [27], large filters should be used in the initial layers of a flow-computation CNN to capture wide-range motion. Thus, 7 ${\times} $ 7 convolutional kernels are used in the first hierarchy and 5 ${\times} $ 5 kernels in the second hierarchy. For the rest of the entire network, 3 ${\times} $ 3 convolutional kernels are employed.
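
A sketch of one such hierarchy in PyTorch follows; the channel counts and the Leaky ReLU slope are assumptions, and a Leaky ReLU is placed after each convolution:

import torch.nn as nn

def hierarchy(in_ch, out_ch, kernel):
    # one hierarchy: two convolutions with Leaky ReLU activations
    pad = kernel // 2
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel, padding=pad), nn.LeakyReLU(0.1),
        nn.Conv2d(out_ch, out_ch, kernel, padding=pad), nn.LeakyReLU(0.1),
    )

# encoder: 7x7 kernels in the first hierarchy, 5x5 in the second, 3x3 after,
# with stride-2 average pooling between hierarchies; the two stacked
# gray-scale input frames give 2 input channels
enc1 = hierarchy(2, 32, 7)
pool = nn.AvgPool2d(kernel_size=2, stride=2)
enc2 = hierarchy(32, 64, 5)
# decoder hierarchies are instead preceded by bilinear up-sampling
up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)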

2.3.2 Loss function

The design of the loss function is of great significance for a neural network, as it measures the difference between the predicted results and the ground truth and drives the convergence of the model. In this research, the loss function is a combination of four loss terms, including the reconstruction loss ${l_r}$, perceptual loss ${l_p}$, warping loss ${l_w}$, and smoothness loss ${l_s}$ [29]. Following the weighting strategy in [29] that keeps the balance of convergence, the weights of the loss terms are set as ${\lambda _r}$ = 10, ${\lambda _p}$ = 0.1, ${\lambda _w}$ = 1, ${\lambda _s}$ = 100:

$$Loss = {\lambda _r}{l_r} + {\lambda _p}{l_p} + {\lambda _w}{l_w} + {\lambda _s}{l_s}$$
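
In a training loop, Eq. (6) is a single weighted sum (a sketch; the four terms are assumed to be computed as in Eqs. (7)-(10) below):

def total_loss(l_r, l_p, l_w, l_s):
    # Eq. (6) with the weights adopted in this work
    return 10.0 * l_r + 0.1 * l_p + 1.0 * l_w + 100.0 * l_s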

Given a predicted frame ${\hat{I}_t}$ and its ground truth ${I_t}$, the reconstruction loss ${l_r}$ calculates the pixel-to-pixel ${L_1}$ distance to quantify the quality of the image reconstruction:

$${l_r} = \frac{1}{{m \times n}}\mathop \sum \nolimits_{i = 1}^{m \times n} |{{{\hat{I}}_t} - {I_t}} |$$
where m and n represent the numbers of columns and rows of the image.

Although the ${L_1}$ loss quantifies the difference between prediction and ground truth, it may still cause blurring artifacts in the predicted images. According to [46], an extra loss term, named the perceptual loss, which compares the difference of features extracted by a well-pretrained network, can be beneficial for preserving the details in reconstructed images [30,46]. Thus, an additional perceptual loss ${l_p}$ was introduced, comparing the features extracted by the fourth layer of a well-tuned VGG-16 network, following [29]. It is worth noting that the VGG-16 is designed with thirteen convolutional layers and three fully connected layers for image classification [47]. Its well-pretrained convolutional layers essentially act as powerful feature extractors, and such pretrained layers provide a stable initialization and result in better convergence in later training [46]. The mathematical description of the perceptual loss can be written as:

$${l_p} = \frac{1}{{m \times n}}\mathop \sum \nolimits_{i = 1}^{m \times n} |{\phi ({{\hat{I}}_t}) - \phi ({{I_t}} )} {|^2}$$
where $\phi ({\cdot} )$ denotes the higher-dimensional features extracted by the fourth layer of the pretrained VGG-16 network [47].
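
A sketch of such a feature extractor with torchvision follows; the exact truncation point (here after the conv4_3 layer) and the replication of the single-channel PLIF frames to three channels are assumptions, and the ImageNet input normalization is omitted for brevity:

import torch
import torchvision

# truncate a pretrained VGG-16 after conv4_3 (index 21 of .features)
vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features[:22].eval()
for p in vgg.parameters():
    p.requires_grad_(False)          # the extractor itself stays frozen

def perceptual_loss(I_pred, I_true):
    # Eq. (8): squared distance between deep features of the two frames;
    # gray-scale frames are repeated to the 3 channels VGG-16 expects
    f_pred = vgg(I_pred.repeat(1, 3, 1, 1))
    f_true = vgg(I_true.repeat(1, 3, 1, 1))
    return torch.mean((f_pred - f_true) ** 2)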

In addition to the straightforward comparison between the interpolated frame and the ground truth, a warping loss ${l_w}$ is also introduced to quantify the quality of the intermediate optical flow, which is defined as:

$$\begin{aligned} {l_w} &= \frac{1}{{m \times n}}\mathop \sum \nolimits_{i = 1}^{m \times n} [|{{I_0} - g({{I_1},\; {F_{0 \to 1}}} )} |+ |{{I_1} - g({{I_0},\; {F_{1 \to 0}}} )} |\\ &\quad + |{{I_t} - g({{I_0},\,{{\hat{F}}_{t \to 0}}} )} |+ |{{I_t} - g({{I_1},\,{{\hat{F}}_{t \to 1}}} )} |] \end{aligned}$$

Finally, a smoothness loss ${l_s}$ is added to enforce smooth flow variation among neighboring pixels [36]:

$${l_s} = |{\nabla {F_{0 \to 1}}} |+ |{\nabla {F_{1 \to 0}}} |\,$$
where $\nabla F$ is the total variation of the x and y components of the optical flows.
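
As a sketch, this total-variation term can be written with finite differences (the flow tensor layout (B, 2, H, W) is an assumption):

import torch

def smoothness_loss(F01, F10):
    # Eq. (10): mean absolute finite differences of both flow fields
    def tv(flow):
        return (torch.mean(torch.abs(flow[..., 1:, :] - flow[..., :-1, :]))
                + torch.mean(torch.abs(flow[..., :, 1:] - flow[..., :, :-1])))
    return tv(F01) + tv(F10)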

All the strategies mentioned above are merged into the training of the PLIF interpolation network, which is described as a flowchart in Fig. 3.


Fig. 3. Flowchart of training the CNN model for the PLIF interpolation strategy.


2.4 Indices for quantitative evaluation

To quantitatively evaluate the accuracy of the model, four indices, i.e., intersection over union (IoU), peak signal to noise ratio (PSNR), structural similarity index (SSIM), and correlation coefficient, were adopted in this research. IoU directly distinguishes the regions of signal appearance in an image pair; SSIM is a standard and widely used index in image processing to compare image similarity in structure, contrast, and intensity; PSNR is another index, complementary to SSIM, which measures the ratio between the peak intensity and the mean squared error of the image pair. Moreover, the correlation coefficient is a common method to determine the mathematical correlation of two sets of data. These indices were used jointly to evaluate the performance of the model from different perspectives. Detailed definitions of these indices are given in the following subsections.

2.4.1 Intersection over union

For the prediction ${\hat{I}_t}$ and its ground truth ${I_t}$, the corresponding binary images ${\hat{I}_{tb}}$ and ${I_{tb}}$ can be acquired based on a binarization threshold; in this study, the threshold is the mean value of the image intensity. Then, the intersection over union (IoU) can be calculated by overlapping the two binary images, as shown in Eq. (11). IoU is used to quantify the prediction accuracy of signal occurrence. Specifically, the difference between the prediction and the ground truth can be identified, as shown by the red region in the last column of Fig. 4. It should be noted that sparse noise remains after binarization; in order to decrease the influence of this noise while keeping the major parts of the CH2O clusters, a 10 ${\times} $ 10 pixel (0.31 mm ${\times} $ 0.31 mm) smoothing window was applied to the binary images before calculating the IoU.

$$IoU = \frac{{{{\hat{I}}_{tb}} \cap {I_{tb}}}}{{{{\hat{I}}_{tb}} \cup {I_{tb}}}}$$
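
A minimal Python sketch of this metric follows; re-thresholding the smoothed binary mask at 0.5 is an assumption, since the paper does not state how the smoothed mask is re-binarized:

import numpy as np
from scipy.ndimage import uniform_filter

def iou(pred, truth, win=10):
    # Eq. (11): IoU of mean-thresholded binary images; a win x win smoothing
    # window suppresses sparse binarization noise before the overlap is taken
    def binarize(img):
        mask = (img > img.mean()).astype(float)
        return uniform_filter(mask, size=win) > 0.5
    pb, tb = binarize(pred), binarize(truth)
    return np.logical_and(pb, tb).sum() / np.logical_or(pb, tb).sum()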


Fig. 4. Frequency doubling by interpolating a single frame. (a) 100 kHz PLIF predicted by 50 kHz PLIF; (b) 50 kHz PLIF predicted by 25 kHz PLIF; (c) 33 kHz PLIF predicted by 16.5 kHz PLIF. ${I_t}$ is the ground truth image, while ${\hat{I}_{t}}$ is the interpolated prediction.


Fig. 5. Correlation coefficients of experimental PLIF images as a function of the frame interval.


2.4.2 Peak signal to noise ratio

Peak signal to noise ratio (PSNR) is a widely used indicator to evaluate the pixel-by-pixel difference between two images. First, the mean square error (MSE) of the prediction ${\hat{I}_t}$ and its ground truth ${I_t}$ is computed by the following equation:

$$MSE = \frac{1}{{m \times n}}\mathop \sum \nolimits_{i = 1}^{m \times n} {({{{\hat{I}}_t} - {I_t}} )^2}$$
then, the PSNR can be computed as:
$$PSNR = 10 \times {\log _{10}}\left( {\frac{{MA{X^2}}}{{MSE}}} \right)$$
where MAX = 255 represents the maximum gray level of the images.
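
A direct transcription of Eqs. (12) and (13) (assuming 8-bit gray-scale images):

import numpy as np

def psnr(pred, truth, max_val=255.0):
    # Eqs. (12)-(13): peak signal to noise ratio in dB
    mse = np.mean((pred.astype(float) - truth.astype(float)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)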

2.4.3 Structure similarity index

Furthermore, the SSIM was also adopted to quantify the regional structural similarity of two images [48,49], which can be described as:

$$SSIM({{{\hat{I}}_t},{I_t}} )= \frac{{2{\mu _{{{\hat{I}}_t}}}{\mu _{{I_t}}} + {C_1}}}{{{\mu _{{{\hat{I}}_t}}}^2 + {\mu _{{I_t}}}^2 + {C_1}}} \times \frac{{2{\sigma _{{{\hat{I}}_t}{I_t}}} + {C_2}}}{{{\sigma _{{{\hat{I}}_t}}}^2 + {\sigma _{{I_t}}}^2 + {C_2}}}$$
where $\mu $ and $\sigma $ represent the means and variances of ${\hat{I}_t}$ and ${I_t}$, and ${\sigma _{{{\hat{I}}_t}{I_t}}}$ denotes their covariance. Following [48], $C_1 = (0.01L)^2 = 6.5025$ and $C_2 = (0.03L)^2 = 58.5225$ are constants introduced to keep the calculation stable, where L = 255 is the image gray level. In practice, the SSIM is first calculated on sub-windows, of 11 ${\times} $ 11 pixels (0.34 mm ${\times} $ 0.34 mm) in particular, and then averaged over the full image to form a global value [48].
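
This windowed-and-averaged SSIM is available off the shelf; a sketch with scikit-image, whose default constants K1 = 0.01 and K2 = 0.03 reproduce the C1 and C2 quoted above for L = 255:

from skimage.metrics import structural_similarity

def ssim_index(pred, truth):
    # Eq. (14) on 11 x 11 sub-windows, averaged over the full image
    return structural_similarity(truth, pred, win_size=11, data_range=255)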

2.4.4 Correlation coefficient

As a simple and efficient index to evaluate the correlation of two vectors, the correlation coefficient was adopted to quantitatively measure the accuracy of the time-averaged prediction at various flame heights [50]. The correlation coefficient $R({\hat{v},v} )$ is defined as:

$$R({\hat{v},v} )= \frac{{\langle \hat{v},v\rangle }}{{\|\hat{v}\|\,\|v\|}}$$
where the vectors $\hat{v}$ and v denote the signal intensity at a specific flame height of the time-averaged prediction and ground truth, respectively, and $\langle \cdot,\cdot \rangle $ denotes the inner product.
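
Per flame height, this is a normalized inner product of the two row profiles (a sketch for 1D NumPy vectors):

import numpy as np

def correlation(v_hat, v):
    # Eq. (15): cosine-type correlation of prediction and ground truth profiles
    return np.dot(v_hat, v) / (np.linalg.norm(v_hat) * np.linalg.norm(v))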

In this research, the IoU, PSNR, and SSIM were adopted to evaluate the performance of the model on the full image scale, while the correlation coefficient was used to quantify the accuracy of the time-averaged prediction at various flame heights. With regard to the range of these four indices, IoU, SSIM, and correlation coefficient all range from 0 to 1, where 0 denotes that the prediction is completely different from the experiment, while 1 indicates that the prediction fully agrees with the experimental result. Meanwhile, the PSNR ranges from 0 to $\infty $, with higher PSNR representing higher prediction accuracy. According to previous research, an IoU and SSIM of 0.65 and above [49,51], and/or a PSNR of 20 dB and above [48], suggest that the model has reconstructed the images with high similarity.

Figure 5 and Table 3 show the correlation coefficients of experimental PLIF frames at different frame intervals, with results averaged over 1,000 image pairs. The correlations of the consecutive frames and of the frames with longer intervals are all at a relatively low level. Specifically, even though the image pair at measurement timings of 0 $\mu $s and 10 $\mu $s looks very similar, its SSIM value is only 0.23. This suggests that for the SSIM to reach 0.7–0.8, an image pair needs to be very similar in structure and intensity distribution.


Table 3. The average correlation coefficient of experimental PLIF frames at different frame intervals

3. Results and discussion

The CNN model was first trained to predict a single frame between two input PLIF images, which doubles the frequency of the input PLIF sequence, e.g., 50 kHz to 100 kHz, 25 kHz to 50 kHz, and 16.5 kHz to 33 kHz. Then, the capability of the model to further increase the measuring frequency was also explored, e.g., 33 kHz to 100 kHz, 25 kHz to 100 kHz, and 20 kHz to 100 kHz. The computation was completed on an NVIDIA Tesla V100 GPU; the model training took 85 hours, while interpolation took less than 200 ms per frame.

3.1 Frequency doubling by interpolating single frame

For frequency doubling, the interpolation timing t, as defined in Section 2.3.1, is 0.5. Three case scenarios were studied in this research, i.e., 50 kHz to 100 kHz, 25 kHz to 50 kHz, and 16.5 kHz to 33 kHz. Figure 4 shows the gray-scale images of the inputs ${I_0}$ and ${I_1}$, the ground truth ${I_t}$ and the prediction ${\hat{I}_t}$, together with the binary images ${I_{tb}}$ and ${\hat{I}_{tb}}$, as well as the IoU image. It can be seen from the IoU image in Fig. 4 that the prediction agrees with most of the measured CH2O distribution, except for a minor area close to the margin of the CH2O cluster. It is also shown in Table 4 that the IoU ranges from 0.83 to 0.86, representing a relatively high degree of coincidence between prediction and measurement. Meanwhile, the high PSNR and SSIM also suggest a high accuracy of the proposed model.


Table 4. Indices for the prediction results in Fig. 4

In addition to the instantaneous results, statistical data calculated from 1,000 image pairs were also acquired and are presented in Table 5 and Fig. 6. It can be found from Table 5 that, for all three indices, the statistical mean for the 16.5 kHz to 33 kHz case is the lowest among the three cases studied, while those for the 50 kHz to 100 kHz case are the highest. Figure 6 shows the probability density function (PDF) curves of IoU and SSIM for the three cases studied. Consistent with the results in Table 5, Fig. 6 shows that the IoU was distributed at a relatively higher level for the 50 kHz to 100 kHz case, compared with the other two cases. Meanwhile, the distribution curve of SSIM is also narrower and presents a higher peak for the 50 kHz to 100 kHz case. These results suggest that the performance of the model depends on the initial frequency of the data; especially when it comes to highly turbulent flames, a higher input frequency is required for the model to reconstruct the rapid temporal variation of the flow/flame.


Fig. 6. PDFs of (a) IoU and (b) SSIM for the frequency doubling performance of the model under conditions of 50 kHz to 100 kHz, 25 kHz to 50 kHz, and 16.5 kHz to 33 kHz. Each curve was calculated from 1,000 image pairs.


Table 5. Averaged indices for the prediction of various PLIF frequency

Moreover, the time averaged results for 100 consecutive ground truth and predicted PLIF frames, as well as their profiles extracted at the axial position of 10 mm, are shown in Fig. 7. The time averaged predicted and measured PLIF sequences present a high degree of agreement in structure and intensity, and the mean profiles show a maximum deviation of 2.9%, 6.5%, and 8.2% for the 50 kHz to 100 kHz, 25 kHz to 50 kHz, and 16.5 kHz to 33 kHz cases, respectively. Furthermore, the correlation coefficient was used to evaluate the similarity of the time averaged prediction and ground truth at various axial positions, as shown in Table 6. It can be seen that, for all three cases studied, the correlation coefficient is the highest at axial positions around 9 mm – 12 mm, probably because this is the region with the highest signal to noise ratio in the PLIF images, which facilitates feature extraction and thereby enhances the optical flow estimation. Meanwhile, by comparing the three case scenarios, it can be found that the 50 kHz to 100 kHz case results in the highest correlation coefficients across axial positions, which again demonstrates that the model has more potential in capturing the highly dynamic motion of turbulent flow/combustion from higher frequency PLIF sequences.


Fig. 7. Time averaged results of the predicted and measured (ground truth) PLIF images under (a) 50 kHz to 100 kHz, (b) 25 kHz to 50 kHz and (c) 16.5 kHz to 33 kHz. Each map was acquired from 100 predicted frames. All mean profiles are extracted at the axial position of 10 mm.


Table 6. Correlation coefficient for time averaged prediction results on various axial position

3.2 Frequency multiplication by interpolating multiple frames

The frequency of PLIF sequences can be further multiplied by interpolating more frames, with various interpolation timings t, e.g., t = 0.33, 0.66 for frequency tripling from 33 kHz to 100 kHz; t = 0.25, 0.5, 0.75 for frequency quadrupling from 25 kHz to 100 kHz; and t = 0.2, 0.4, 0.6, 0.8 for frequency quintupling from 20 kHz to 100 kHz. These three interpolation models were trained in this study, with the results shown in Fig. 8, Fig. 9, and Fig. 10, and the corresponding precision indices summarized in Table 7, Table 8, and Table 9, respectively. It can be seen from Fig. 8 to Fig. 10 that the interpolated frames agree with the experimental data over most of the CH2O clusters, indicating that the proposed model captured the motion of the CH2O clusters and simulated their temporal variation. The similarity indices, including IoU, PSNR, and SSIM in Tables 7, 8, and 9, are mainly around 0.78, 28.00 dB, and 0.80, respectively, further demonstrating the relatively high degree of similarity in pixel intensity distribution.


Fig. 8. Frequency tripling by interpolating two frames, i.e. from 33 kHz to 100 kHz. (a) the experimental 100 kHz PLIF sequence; (b) the predicted intermediate PLIF frames; (c) and (d) are binary sequence of experimental and predicted PLIF; (e) IoU of experimental and prediction results. ${{I}_{t}}$ is the ground truth image, while ${\hat{I}_{t}}$ is the prediction image interpolated.


Fig. 9. Frequency quadrupling by interpolating three frames, i.e. from 25 kHz to 100 kHz. (a) the experimental 100 kHz PLIF sequence; (b) the predicted intermediate PLIF frames; (c) and (d) are binary sequence of experimental and predicted PLIF; (e) IoU of experimental and prediction results. ${{I}_{t}}$ is the ground truth image, while ${\hat{I}_{t}}$ is the prediction image interpolated.


Fig. 10. Frequency quintupling by interpolating four frames, i.e. from 20 kHz to 100 kHz. (a) the experimental 100 kHz PLIF sequence; (b) the predicted intermediate PLIF frames; (c) and (d) are binary sequence of experimental and predicted PLIF; (e) IoU of experimental and prediction results. ${{I}_{t}}$ is the ground truth image, while ${\hat{I}_{t}}$ is the prediction image interpolated.


Table 7. Indices for the prediction results in Fig. 8


Table 8. Indices for the prediction results in Fig. 9


Table 9. Indices for the prediction results in Fig. 10

The statistical results of the indices calculated from 1,000 interpolated frames for the frequency multiplication tests are presented in Table 10 and Fig. 11. It can be seen from Table 10 that the IoU, PSNR, and SSIM center around 0.76, 27.00 dB, and 0.82, respectively, suggesting that the model is capable of predicting multiple intermediate frames to form a higher frequency PLIF sequence with acceptable accuracy. Additionally, the PDF curves of IoU and SSIM in Fig. 11, together with the mean indices in Table 10, reveal a declining trend in precision with an increasing number of interpolated frames between two consecutive CH2O PLIF images; this might result from the instantaneous fluctuation and variation of the highly turbulent flow/flame.


Fig. 11. PDFs of (a) IoU and (b) SSIM for the frequency tripling, quadrupling, and quintupling performance of the model under conditions of 33 kHz to 100 kHz, 25 kHz to 100 kHz, and 20 kHz to 100 kHz. Each curve was calculated from 1,000 image pairs.


Table 10. Averaged indices for the various predictions

Furthermore, the time averaged results of 100 consecutive PLIF frames were calculated and are shown in Fig. 12, for both experimental and predicted images, under frequency tripling, quadrupling, and quintupling conditions. Their profiles extracted at the axial position of 10 mm are also shown in Fig. 12. The time averaged CH2O distribution shows a small bias between ground truth and predicted (interpolated) results, with a maximum deviation within 6.2%, 8.3%, and 9.1% for frequency tripling, quadrupling, and quintupling, respectively. Besides, the correlation coefficient at various axial positions between the time averaged ground truth and predicted frames is shown in Table 11. Consistent with the correlation comparison in Section 3.1, more accurate predictions occurred in the region where CH2O is clustered, e.g., axial positions from 9 mm – 12 mm in this case, rather than at higher or lower axial positions where CH2O is distributed as discrete, independent spots. Moreover, Table 11 also shows the impact of the interpolation steps on prediction accuracy. Specifically, with an increasing number of interpolation steps, the overall level of the correlation coefficient decreases, indicating a lower prediction accuracy.


Fig. 12. Time averaged results of the predicted and measured (ground truth) PLIF images for (a) 33 kHz to 100 kHz PLIF; (b) 25 kHz PLIF to 100 kHz PLIF; (c) 20 kHz PLIF to 100 kHz PLIF. Each map was acquired from 100 predicted frames. All mean profiles are extracted at the axial position of 10 mm.


Table 11. Correlation coefficient for time averaged prediction results on various axial positions

To summarize and compare the influence of the initial PLIF frequency and the interpolation steps on prediction precision, similarity indices calculated over 1,000 interpolation results for both single-frame interpolation (frequency doubling) and multiple-frame interpolation (frequency multiplication) are shown in Fig. 13. For single-frame prediction, the precision tends to drop as the initial frequency of the PLIF sequence diminishes; the statistical results indicate that, compared with the 50 kHz to 100 kHz interpolation, IoU, PSNR, and SSIM decrease by 6.9%, 10.1%, and 8.0% for the 25 to 50 kHz case, and by 12.9%, 3.4% and 4.7% for the 16.5 to 33 kHz case, respectively. For multiple-frame prediction, with an increasing number of interpolated steps, the mean values of the indices IoU, PSNR, and SSIM decreased by 2.5%, 3.8%, and 3.4% for frequency tripling, and by 7.3%, 3.5%, and 5.9% for frequency quadrupling, compared with the frequency doubling case.


Fig. 13. The mean values of the indices (a) IoU, (b) PSNR, and (c) SSIM as a function of the number of frames interpolated. All results were calculated over 1,000 interpolation results.


The difference in model performance under various conditions may be explained by the characteristic time scale of the turbulent flow. As mentioned above, the speed of the jet flow in the experiment is 66 m/s, which corresponds to a Taylor time scale of around 100 $\mu s$ at y/d = 30. The Taylor time scale indicates the characteristic time over which a small vortex correlates with itself, or in this case, the time over which flame structures are correlated with themselves. Here, a Taylor time scale of 100 $\mu s$ is considered a reasonable measure of the flame self-correlation time because, as can be seen from Fig. 10, within around 50 $\mu s$ the flame has moved upwards by half of the frame length. In this research, experimental PLIF at 16.5 kHz, 20 kHz, 25 kHz, 33 kHz and 50 kHz was studied, and the corresponding intervals were 60.6, 50, 40, 30.3, and 20 $\mu s$. For the cases with intervals approaching 100 $\mu s$, the interpolation results tend to be worse because it is more difficult for the model to capture the spatial variation of flame structures with weak correlation. The relationship between the Taylor time scale and the imaging frequency is only one aspect of evaluating the interpolation quality. In practice, there are many other influencing factors, e.g., flow speed, turbulence and field of view. But all these factors can be evaluated through the correlation indices between consecutive experimental frames. That is to say, if the IoU, PSNR, and SSIM of consecutive experimental frames are higher than a certain minimal level, a reasonable interpolation can be made. One can compare Table 3 of Section 2.4 with the interpolation results presented in this research to find a minimal correlation index between consecutive experimental data for this model to reconstruct high quality interpolated PLIF frames.

However, this discussion was made under the condition that the model was trained with 55 segments of PLIF sequences with 16,972 images in total. The performance of deep learning models highly depends on the size of the dataset, especially for image-based tasks, since a larger dataset contains more variation patterns that force the model to become more adaptive through training, thereby expanding its applicability and improving its accuracy. For example, in standard image training tasks, e.g., classification [47] or detection [51], tens of thousands of images are often utilized to guarantee high accuracy and adaptability. Therefore, it is expected that, with abundant high frequency experimental PLIF frames as training data, the model could better capture the temporal-spatial variation among consecutive frames even when they are weakly correlated with each other.

4. Conclusion

In this research, a Super SloMo based PLIF interpolation network was adopted to predict the intermediate frames between two consecutive CH2O PLIF frames, hence increasing the frequency of the PLIF sequence and reducing the requirements on the experimental setup for ultrahigh-speed PLIF. This was achieved by training the proposed CNN based model to extract and refine the optical flow between the input frames, and then using the predicted optical flow, together with the improved visible map, to synthesize the intermediate frames. The model was trained for predicting single and multiple frames, and the results show that:

  • (1) For the frequency doubling of a PLIF sequence, e.g., interpolating a single frame into a 50 kHz PLIF sequence, the prediction could achieve a high precision above 0.86, 32.88 dB, and 0.88 for IoU, PSNR, and SSIM. Time averaged results for both ground truth and prediction show a similar CH2O intensity and distribution, with the prediction bias of the mean profile extracted at the axial position of 10 mm less than 2.9%. Meanwhile, the correlation coefficient for the time averaged predicted PLIF sequence in the clustered region of CH2O (axial positions from 6 to 15 mm) mainly concentrates above 0.80.
  • (2) The precision declines with decreasing experimental PLIF frequency (increasing interval time). Compared with the 50 kHz to 100 kHz prediction, IoU, PSNR, and SSIM drop by 6.9% and 10.1%, 8.0% and 12.9%, and 3.4% and 4.7% for the 25 kHz to 50 kHz and 16.5 kHz to 33 kHz predictions, respectively.
  • (3) For the frequency multiplication of a PLIF sequence, the average prediction accuracy is 0.78, 28.49 dB, and 0.85 for IoU, PSNR, and SSIM under two-frame interpolation. For the interpolation of more PLIF frames, increasing the number of interpolated frames (interval time) or decreasing the PLIF frequency causes a decline of the prediction accuracy. The IoU, PSNR, and SSIM decline by 2.5% and 3.8%, 3.4% and 7.3%, and 3.5% and 5.9% for three- and four-frame interpolation, respectively, when compared with the two-frame interpolation.
  • (4) The prediction precision was also found to be influenced by the relationship between the interpolation interval and the characteristic time scale of the turbulent flow itself. The model performance deteriorates when the interpolation interval approaches the characteristic time scale of the turbulent flow, due to the weak correlation between the input consecutive frames.

This work provides a promising approach that employs a well-trained off-line convolutional neural network to achieve imaging acceleration by interpolating multiple frames. The proposed method is capable of overcoming the limitations of experimental setups for high-speed laser-camera systems, and has potential for revealing the ultra-rapid evolution of the distribution of instantaneous species in turbulent flames with high spatio-temporal resolution.

Funding

China Postdoctoral Science Foundation (2016M600313); Shanghai Sailing Program (19YF1423400); National Natural Science Foundation of China (52006137).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. L. Paxton, A. Giusti, E. Mastorakos, and F. N. Egolfopoulos, “Assessment of experimental observables for local extinction through unsteady laminar flame calculations,” Combust. Flame 207, 196–204 (2019). [CrossRef]  

2. U. Retzer, R. Pan, T. Werblinski, F. J. T. Huber, M. N. Slipchenko, T. R. Meyer, L. Zigan, and S. Will, “Burst-mode OH/CH2O planar laser-induced fluorescence imaging of the heat release zone in an unsteady flame,” Opt. Express 26(14), 18105–18114 (2018). [CrossRef]  

3. J. A. Wagner, S. W. Grib, J. W. Dayton, M. W. Renfro, and B. M. Cetegen, “Flame stabilization analysis of a premixed reacting jet in vitiated crossflow,” Proc. Combust. Inst. 36(3), 3763–3771 (2017). [CrossRef]  

4. Z. Wang, P. Stamatoglou, Z. Li, M. Alden, and M. Richter, “Ultra-high-speed PLIF imaging for simultaneous visualization of multiple species in turbulent flames,” Opt. Express 25(24), 30214–30228 (2017). [CrossRef]  

5. I. A. Mulla, A. Dowlut, T. Hussain, Z. M. Nikolaou, S. R. Chakravarthy, N. Swaminathan, and R. Balachandran, “Heat release rate estimation in laminar premixed flames using laser-induced fluorescence of CH2O and H-atom,” Combust. Flame 165, 373–383 (2016). [CrossRef]  

6. L. Yang, W. Weng, Y. Zhu, Y. He, Z. Wang, and Z. Li, “Investigation of Dilution Effect on CH4/Air Premixed Turbulent Flame Using OH and CH2O Planar Laser-Induced Fluorescence,” Energies 13, (2020).

7. M. Hajilou, M. Q. Brown, M. C. Brown, and E. Belmont, “Investigation of the structure and propagation speeds of n-heptane cool flames,” Combust. Flame 208, 99–109 (2019). [CrossRef]  

8. W. Weng, E. Nilsson, A. Ehn, J. Zhu, Y. Zhou, Z. Wang, Z. Li, M. Aldén, and K. Cen, “Investigation of formaldehyde enhancement by ozone addition in CH4/air premixed flames,” Combust. Flame 162(4), 1284–1293 (2015). [CrossRef]  

9. S. Turns, An Introduction to Combustion: Concepts and Applications (McGraw-Hill, 2000).

10. Z. Wang, P. Stamatoglou, B. Zhou, M. Aldén, X.-S. Bai, and M. Richter, “Investigation of OH and CH2O distributions at ultra-high repetition rates by planar laser induced fluorescence imaging in highly turbulent jet flames,” Fuel 234, 1528–1540 (2018). [CrossRef]  

11. N. Jiang, P. S. Hsu, S. W. Grib, and S. Roy, “Simultaneous high-speed imaging of temperature, heat-release rate, and multi-species concentrations in turbulent jet flames,” Opt. Express 27(12), 17017–17026 (2019). [CrossRef]  

12. M. S. Bak and M. A. Cappelli, “Successive laser ablation ignition of premixed methane/air mixtures,” Opt. Express 23(11), A419–A427 (2015). [CrossRef]  

13. J. P. Crimaldi, “Planar laser induced fluorescence in aqueous flows,” Exp. Fluids 44(6), 851–863 (2008). [CrossRef]  

14. B. Thurow, N. Jiang, and W. Lempert, “Review of ultra-high repetition rate laser diagnostics for fluid dynamic measurements,” Meas.Sci. Technol. 24(1), 012002 (2013). [CrossRef]  

15. J. Sjöholm, J. Rosell, B. Li, M. Richter, Z. Li, X.-S. Bai, and M. Aldén, “Simultaneous visualization of OH, CH, CH2O and toluene PLIF in a methane jet flame with varying degrees of turbulence,” Proc. Combust. Inst. 34(1), 1475–1482 (2013). [CrossRef]  

16. B. O. Ayoola, R. Balachandran, J. H. Frank, E. Mastorakos, and C. F. Kaminski, “Spatially resolved heat release rate measurements in turbulent premixed flames,” Combust. Flame 144(1-2), 1–16 (2006). [CrossRef]  

17. S. D. Hammack, C. D. Carter, A. W. Skiba, C. A. Fugger, J. J. Felver, J. D. Miller, J. R. Gord, and T. Lee, “20 kHz CH2O and OH PLIF with stereo PIV,” Opt. Lett. 43(5), 1115–1118 (2018). [CrossRef]  

18. J. R. Osborne, S. A. Ramji, C. D. Carter, S. Peltier, S. Hammack, T. Lee, and A. M. Steinberg, “Simultaneous 10 kHz TPIV, OH PLIF, and CH2O PLIF measurements of turbulent flame structure and dynamics,” Exp. Fluids 57(5), 65 (2016). [CrossRef]  

19. A. W. Skiba, C. D. Carter, S. D. Hammack, J. D. Miller, J. R. Gord, and J. F. Driscoll, “The influence of large eddies on the structure of turbulent premixed flames characterized with stereo-PIV and multi-species PLIF at 20 kHz,” Proc. Combust. Inst. 37(2), 2477–2484 (2019). [CrossRef]  

20. F. Xing, Y. Huang, M. Zhao, and J. Zhao, “The Brief Introduction of Different Laser Diagnostics Methods Used in Aeroengine Combustion Research,” J. Sensors 2016, 1–13 (2016).

21. B. Pan and L. Tian, “Advanced video extensometer for non-contact, real-time, high-accuracy strain measurement,” Opt. Express 24(17), 19082–19093 (2016). [CrossRef]  

22. S. Meyer, O. Wang, H. Zimmer, M. Grosse, and A. Sorkine-Hornung, “Phase-based frame interpolation for video,” in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1410–1418 (2015).

23. U. S. Kim and M. H. Sunwoo, “New Frame Rate Up-Conversion Algorithms With Low Computational Complexity,” IEEE Trans. Circuits Syst. Video Technol. 24(3), 384–393 (2014). [CrossRef]  

24. Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE 86(11), 2278–2324 (1998). [CrossRef]  

25. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, Springer International Publishing, 234–241 (2015).

26. S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” arXiv:1506.01497 (2015).

27. J. Caballero, C. Ledig, A. Aitken, A. Acosta, J. Totz, Z. Wang, and W. Shi, “Real-Time Video Super-Resolution with Spatio-Temporal Networks and Motion Compensation,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2848–2857 (2017).

28. D. Chen, X. Sang, W. Peng, X. Yu, and H. C. Wang, “Multi-parallax views synthesis for three-dimensional light-field display using unsupervised CNN,” Opt. Express 26(21), 27585–27598 (2018). [CrossRef]  

29. H. Jiang, D. Sun, V. Jampani, M. Yang, E. Learned-Miller, and J. Kautz, “Super SloMo: High Quality Estimation of Multiple Intermediate Frames for Video Interpolation,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 9000–9008 (2018).

30. S. Niklaus, L. Mai, and F. Liu, “Video Frame Interpolation via Adaptive Separable Convolution,” in 2017 IEEE International Conference on Computer Vision (ICCV), 261–270 (2017).

31. G. Barbastathis, A. Ozcan, and G. Situ, “On the use of deep learning for computational imaging,” Optica 6(2019).

32. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]  

33. S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” arXiv:1502.03167 (2015).

34. Y. He, X. Zhang, and J. Sun, “Channel Pruning for Accelerating Very Deep Neural Networks,” arXiv:1707.06168 (2017).

35. Y. Jin, W. Zhang, Y. Song, X. Qu, Z. Li, Y. Ji, and A. He, “Three-dimensional rapid flame chemiluminescence tomography via deep learning,” Opt. Express 27(19), 27308–27334 (2019). [CrossRef]  

36. Z. Liu, R. A. Yeh, X. Tang, Y. Liu, and A. Agarwala, “Video Frame Synthesis Using Deep Voxel Flow,” in 2017 IEEE International Conference on Computer Vision (ICCV), 4473–4481 (2017).

37. B. Coriton, A. M. Steinberg, and J. H. Frank, “High-speed tomographic PIV and OH PLIF measurements in turbulent reactive flows,” Exp. Fluids 55(6), 1743 (2014). [CrossRef]  

38. B. Zhou, J. Kiefer, J. Zetterberg, Z. Li, and M. Aldén, “Strategy for PLIF single-shot HCO imaging in turbulent methane/air flames,” Combust. Flame 161(6), 1566–1574 (2014). [CrossRef]  

39. B. Zhou, C. Brackmann, Z. Li, M. Aldén, and X.-S. Bai, “Simultaneous multi-species and temperature visualization of premixed flames in the distributed reaction zone regime,” Proc. Combust. Inst. 35(2), 1409–1416 (2015). [CrossRef]  

40. L. Xu, J. Jia, and Y. Matsushita, “Motion detail preserving optical flow estimation,” IEEE Trans. Pattern Anal. Mach. Intell. 34(9), 1744–1757 (2012). [CrossRef]  

41. S. Wei and J. U. Kang, “Optical flow optical coherence tomography for determining accurate velocity fields,” Opt. Express 28(17), 25502–25527 (2020). [CrossRef]  

42. B. Choi, J. Han, C. Kim, and S. Ko, “Motion-Compensated Frame Interpolation Using Bilateral Motion Estimation and Adaptive Overlapped Block Motion Compensation,” IEEE Trans. Circuits Syst. Video Technol. 17(4), 407–416 (2007). [CrossRef]  

43. S. G. Jeong, C. Lee, and C. S. Kim, “Motion-compensated frame interpolation based on multihypothesis motion estimation and texture optimization,” IEEE Trans. Image Process. 22(11), 4497–4509 (2013). [CrossRef]  

44. T. Zhou, S. Tulsiani, W. Sun, J. Malik, and A. A. Efros, “View Synthesis by Appearance Flow,” in Computer Vision – ECCV 2016, Springer International Publishing, 286–301 (2016).

45. C. Rhemann, C. Rother, J. Wang, M. Gelautz, P. Kohli, and P. Rott, “A perceptually motivated online benchmark for image matting,” in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 1826–1833 (2009).

46. J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual Losses for Real-Time Style Transfer and Super-Resolution,” in Computer Vision – ECCV 2016, Springer International Publishing, 694–711 (2016).

47. K. Simonyan and A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition,” arXiv e-prints, arXiv:1409.1556 (2014).

48. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. Image Process. 13(4), 600–612 (2004). [CrossRef]  

49. W. Zhang, X. Dong, C. Liu, G. J. Nathan, B. B. Dally, A. Rowhani, and Z. Sun, “Generating planar distributions of soot particles from luminosity images in turbulent flames using deep learning,” Appl. Phys. B 127(2021).

50. J. Yang, M. Xu, D. L. S. Hung, Q. Wu, and X. Dong, “Influence of swirl ratio on fuel distribution and cyclic variation under flash boiling conditions in a spark ignition direct injection gasoline engine,” Energy Convers. Manage. 138, 565–576 (2017). [CrossRef]  

51. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 779–788 (2016).
