Real-time audio detection and regeneration of moving sound source based on optical flow algorithm of laser speckle images

Nan Wu; S. Haruyama

doi:10.1364/OE.383442

1. Introduction

Optical sound detection is an appealing research topic because of its broad application prospects, such as remote monitoring and rescue [1,2]. One approach is to detect sound with laser speckle images. The principle of the laser speckle method is simple: when coherent light is reflected by an optically rough surface, a high-contrast grainy speckle pattern can be observed with an imaging device because of the interference of the many reflected light waves [3]. A major property of the speckle pattern is that its motion is very sensitive to the motion of the object [4,5]: the captured speckle pattern shows significant displacement even when the object moves only slightly. Based on this property, sound vibrations can be detected with speckle images, and the audio signal can be recovered by extracting the motion of the captured speckle image sequence. Compared with other methods, such as interferometric or holographic measurement [6,7], the laser speckle method has a simple structure and low hardware cost and can achieve remote sound detection. Several previous studies have recovered sound with laser speckle, mainly focusing on remote monitoring applications. In [8], the authors proposed a remote sound extraction system based on laser speckle and showed that speech or heartbeats can be recorded at distances of up to 100 meters. In [9], the authors proposed an intensity-variance-based method that recovers sound from the gray-value variations of appropriately selected pixels in the laser speckle patterns. In these studies, a short video is usually recorded first and then analyzed to restore the audio signal.
Although these works successfully regenerated sound from laser speckle images, neither real-time sound detection and regeneration nor detection of a moving sound source has been considered, which greatly limits the potential applications of this technology.

In this manuscript, a real-time sound detection and regeneration system based on laser speckle images is proposed. Unlike previous work, the proposed system for the first time takes into consideration the real-time processing and regeneration of the audio signal from a moving sound source. In our system, the displacement of the captured speckle images is calculated at high speed immediately after capture, instead of storing the patterns in the computer for later analysis. Thus, the system can output audio signals in real time while sampling. To achieve this, only a small part of the imaging sensor is used to capture the speckle patterns. In this way, a high camera sampling rate can be achieved even with a common industrial camera, and the computation time is reduced because of the small image size. Moreover, an optical flow algorithm is adopted to obtain the displacement between two frames in a short time. Together, these two measures enable real-time processing speed and sub-pixel accuracy. In addition, denoising algorithms are proposed to correct the calculation noise in real time. This not only improves the accuracy of the results but also enables the system to regenerate the audio signal of moving sound sources. Compared with previous systems, our system works more like a microphone than a recorder, which gives it a wider range of potential applications, such as meeting scenarios.

The paper is structured as follows. Section 2 first introduces the flowchart of our system and then explains the optical flow method and the denoising algorithms applied to the sampled signal. Section 3 presents the experimental results, including results for different signal amplitudes and camera defocusing and results for moving sound source detection. Finally, Section 4 concludes the paper.

2. Methodologies

2.1 Farneback optical flow algorithm

According to previous research, when a vibrating object is illuminated with a coherent laser source, the captured speckle pattern vibrates periodically in one direction [10]. Several earlier studies recovered the audio signal from the gray-value variation of selected pixels [11]. The advantage of this approach is its low computational load, which makes real-time calculation possible. However, the gray-value method requires the gray value to be linearly distributed over a certain pixel range along the direction of vibration, so the quality of the result cannot be guaranteed when the amplitude of the audio signal changes. We therefore decided to regenerate the audio signal from the motion of the speckle sequence. Cross-correlation between images has been widely used to calculate speckle motion [12,13], but it is difficult for the cross-correlation method to achieve high calculation speed and sub-pixel accuracy at the same time. In addition, the speckle image size in our system is set to be very small, which reduces the available image information and makes most feature-point methods [14] unsuitable for our situation.

For the reasons stated above, the Farneback optical flow algorithm, proposed by Gunnar Farneback in 2003, is employed to analyze the speckle motion [15]. In the algorithm, each image is regarded as a 2D function $f(x, y)$. Specifically, by fitting the gray values of each pixel and its neighbors, a quadratic polynomial expansion in the coordinate $(x, y)$ of the pixel of interest can be expressed as:

(1)$$f(\textbf{x}) = {\textbf{x}^T}\textbf{Ax} + {\textbf{b}^T}\textbf{x} + c. $$

where $\textbf{x}$ represents the coordinate $(x, y)$, $\textbf{A} = \left( {\begin{array}{cc} {{r_4}}&{\frac{{{r_6}}}{2}}\\ {\frac{{{r_6}}}{2}}&{{r_5}} \end{array}} \right)$, $\textbf{b} = \left( {\begin{array}{c} {{r_2}}\\ {{r_3}} \end{array}} \right)$, $c = {r_1}$, and ${r_1}$ to ${r_6}$ are the coefficients of the quadratic polynomial fit. When the image undergoes a global shift $\textbf{d}$, the new signal can be expressed as:

(2)$$\begin{aligned} f^{\prime}(\textbf{x}) &= f(\textbf{x} - \textbf{d}) \\ &= {(\textbf{x} - \textbf{d})^T} \textbf{A}(\textbf{x} - \textbf{d}) + {\textbf{b}^T}(\textbf{x} - \textbf{d}) + c \\ &= {\textbf{x}^T} \textbf{Ax} + {(\textbf{b} - 2\textbf{Ad})^T} \textbf{x} + {\textbf{d}^T} \textbf{Ad} - {\textbf{b}^T} \textbf{d} + c \\ &= {\textbf{x}^{T}} \textbf{A}^{\prime} \textbf{x} + {\textbf{b}^{{\prime}T}} \textbf{x} + c^{\prime}. \end{aligned}$$

The optical flow method assumes that the brightness of a given point does not change between the two images. Equating the coefficients of the two polynomial forms in Eq. (2) then gives:

(3)$$\textbf{A}^{\prime} = \textbf{A}. $$

(4)$$\textbf{b}^{\prime} = \textbf{b} - 2\textbf{Ad}. $$

(5)$$c^{\prime} = {\textbf{d}^T}\textbf{Ad} - {\textbf{b}^T}\textbf{d} + c. $$

According to Eq. (4), the displacement $\textbf{d}$ can be solved as:

(6)$$\textbf{d} ={-} \frac{1}{2}{\textbf{A}^{ - 1}}(\textbf{b}^{\prime} - \textbf{b}). $$
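As a numerical check of Eqs. (1) to (6), the following minimal sketch fits the quadratic polynomial to two synthetic patches and recovers the shift between them. It is an illustration under stated assumptions, not the implementation used in the system: the helper names (`fit_quadratic`, `sample_shifted_quadratic`, `estimate_displacement`), the 9 x 9 patch size, the patch-centred coordinates, and the test coefficients are all hypothetical, and the patches are exactly quadratic, which real speckle images are not.

```python
import numpy as np


def fit_quadratic(patch):
    """Least-squares fit of Eq. (1) to a patch; returns (A, b, c)."""
    h, w = patch.shape
    # Coordinates centred on the patch (an illustrative choice).
    y, x = np.mgrid[0:h, 0:w].astype(float)
    x = x - (w - 1) / 2.0
    y = y - (h - 1) / 2.0
    # Basis ordered to match r1..r6: 1, x, y, x^2, y^2, xy.
    basis = np.stack([np.ones_like(x), x, y, x**2, y**2, x * y], axis=-1)
    r, *_ = np.linalg.lstsq(basis.reshape(-1, 6), patch.reshape(-1), rcond=None)
    r1, r2, r3, r4, r5, r6 = r
    A = np.array([[r4, r6 / 2.0], [r6 / 2.0, r5]])
    b = np.array([r2, r3])
    return A, b, r1


def sample_shifted_quadratic(A, b, c, d, size=9):
    """Sample f(x - d) on a size x size grid, with f as in Eq. (1)."""
    y, x = np.mgrid[0:size, 0:size].astype(float)
    u = x - (size - 1) / 2.0 - d[0]
    v = y - (size - 1) / 2.0 - d[1]
    return (A[0, 0] * u**2 + 2.0 * A[0, 1] * u * v + A[1, 1] * v**2
            + b[0] * u + b[1] * v + c)


def estimate_displacement(patch_prev, patch_next):
    """Apply Eq. (6) to the coefficients fitted on two patches."""
    A1, b1, _ = fit_quadratic(patch_prev)
    A2, b2, _ = fit_quadratic(patch_next)
    # Eq. (3) states A' = A; averaging the two fits is an assumption
    # made here to tolerate small fitting differences.
    A = 0.5 * (A1 + A2)
    return -0.5 * np.linalg.solve(A, b2 - b1)


# Hypothetical test values: a known polynomial and a known sub-pixel shift.
A_true = np.array([[0.8, 0.1], [0.1, 0.5]])
b_true = np.array([0.3, -0.2])
c_true = 10.0
d_true = np.array([0.25, -0.40])

frame0 = sample_shifted_quadratic(A_true, b_true, c_true, np.zeros(2))
frame1 = sample_shifted_quadratic(A_true, b_true, c_true, d_true)
d_est = estimate_displacement(frame0, frame1)
print("estimated displacement:", d_est)
```

Because both patches are exact quadratics, the recovered displacement should agree with `d_true` up to floating-point error. Note that Eq. (6) requires $\textbf{A}$ to be invertible, which is why the test matrix is chosen to be non-singular.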

The above description is the basic idea of the Farneback optical flow algorithm. In practice, a weighted estimation over a neighborhood of the pixel of interest is performed to reduce noise and obtain a reliable result. Figure 1 shows two speckle images and the optical flow result between them. Since the algorithm calculates the displacement pixelwise, a dense optical flow field that represents the displacement between two frames can be obtained even when the image size is very small, as shown in Fig. 1(c).

Fig. 1. Two speckle images and the optical flow field between them. (a) Former frame. (b) Later frame. (c) Optical flow.

Name	Description
Visualization 1	Original audio file
Visualization 2	Regenerated audio file

Optics Express