
Speedy light focusing through scattering media by a cooperatively FPGA-parameterized genetic algorithm

Open Access

Abstract

We developed an accelerated Genetic Algorithm (GA) system, based on the cooperation of a field-programmable gate array (FPGA) and optimized algorithm parameters, that enables fast light focusing through scattering media. Starting from the search space, which influences the convergence of optimization algorithms, we manipulated the mutation rate, which defines the number of mutated pixels on the spatial light modulator, to accelerate the GA process. We found that an enhanced decay ratio of the mutation rate leads to much faster convergence of the GA. A convergence-efficiency function was defined to gauge the tradeoff between the processing time and the enhancement of the focal spot. This function allowed us to adopt a shorter iteration number for the GA that still achieves applicable light focusing. Furthermore, the accelerated GA configuration was programmed in an FPGA to boost processing speed at the hardware level. It can focus light through scattering media within a few seconds, 150 times faster than the PC-based GA. The processing cycle could be further shortened to the millisecond level with advanced FPGA processor chips. This study makes the evolution-based optimization approach adaptable to dynamic scattering media, showing the capability to tackle wavefront shaping in biological materials.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Imaging through scattering media has attracted great research attention since it offers advantages for biomedical diagnosis and therapy. Light transmitted through randomly scattering materials, like biological tissues, experiences multiple scattering processes, which limits the penetration depth into the tissues. The ballistic photons that carry image information decay exponentially during the scattering process. A conventional microscope can gather the ballistic photons when the penetration distance is smaller than the scattering mean free path, still acquiring a diffraction-limited image. As the propagation distance into the media increases, the transmitted field becomes completely randomized because the scattered photons far outnumber the ballistic photons. To tackle this high degree of randomization, researchers have attempted to make the scattered photons interfere constructively. In this context, wavefront shaping technologies have been developed to modulate the incident wavefront to form a light focus behind the scattering media. Three common approaches are the transmission matrix (TM) [1–5], optical phase conjugation (OPC) [6–9], and feedback-based iterative wavefront shaping [10], which includes the partitioning algorithm (PA), continuous sequential algorithm (CSA) [11], Hadamard algorithm [12], simulated annealing algorithm [13], particle swarm algorithm (PSA) [14,15], four-element division [16], genetic algorithm (GA) [17–20], and natural gradient method [21,22]. The OPC and TM approaches may offer fast retrieval of the optimized fields, but their significant vulnerability to noise makes them unsuitable for dynamic imaging. In contrast, the GA can achieve much higher enhancement of light focusing under low signal-to-noise ratio (SNR) conditions than other methods due to its superior immunity to noisy environments.
However, the GA is a random-search-based optimization that reaches the optimized solution only after many iterations of operators including selection, crossover, and mutation. This computational complexity makes the GA perform very well but also makes it time-consuming: implemented on a personal computer, the GA typically requires at least tens of minutes (or even several hours).

In this study, we aim to improve the processing speed of the GA by optimizing its parameters and implementing it at the hardware level. By analyzing the influence of the decay ratio of the mutation rate on the convergence of the GA, we could shorten the convergence time significantly with steeper decay ratios. We further introduced a convergence-efficiency function to balance the processing time against the enhancement of light focusing. The optimized GA parameters were then adopted into a highly integrated FPGA configuration to build the FPGA-based GA system. The FPGA has an outstanding speed advantage because of properties such as large cache memory, specialized circuits, and parallel processing. In addition, we introduced a digital micromirror device (DMD) that can achieve a fast frame rate with the binary amplitude modulation method [23]. Compared to the PC-based GA, this system dramatically reduces the processing time of the GA: light focusing through the scattering media was achieved within a few seconds, 150 times faster than the PC-based GA configuration. With further hardware upgrades, we believe the proposed accelerated GA approach could reach the millisecond level, enabling us to tackle various dynamic scattering scenarios.

2. Rapid convergence of genetic algorithm with optimized parameters

2.1 Theory behind convergence of the GA

The genetic algorithm is an optimization method that searches for the best solutions in a manner modeled on natural evolution. The GA focuses the scattered light onto one output channel behind the scattering media as the focal spot by optimizing the input modulation optical field, or input modes, to construct in-phase interference. The overall processing procedure of the GA is described in [18]. In general, the GA starts by generating an initial population of parent masks with random fields (amplitude or phase). Each mask stands for one candidate solution of the optimization. When the population of parent masks is projected onto the scattering media, a cost function is used to evaluate the fitness of each mask. In our case, the cost function ranks the intensity values measured at the output spot on a detector behind the scattering media. According to the ranking of the current population, two higher-ranked parent masks are selected to breed a new offspring mask. This process is called the crossover. The newly formed offspring mask is mutated by randomly changing its field values with a certain probability. In practice, the mutation is realized by randomly changing pixels on the spatial light modulator. Afterward, offspring masks are continuously generated from the higher-ranked parent masks and projected onto the scattering media, and the corresponding intensity values are ranked again. The best-ranked half of the offspring masks replaces the worst half of the population of parent masks to form a new population. The algorithm then iterates the above procedure until the intensity values measured at the output spot saturate, at which point the corresponding offspring mask is taken as the optimized mask, or field distribution.
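The iterative procedure above (rank, select, crossover, mutate, replace) can be condensed into a short sketch. The Python snippet below is a minimal illustration, not the authors' MATLAB or FPGA implementation: the fitness function is a stand-in for the measured focal-spot intensity (here, the overlap of a binary mask with one row of a random "transmission" vector), and all names such as `ga_iteration` and `tm_row` are hypothetical.

```python
import random

SEGMENTS = 32 * 32   # binary-amplitude modulation segments per mask
POP_SIZE = 32        # masks per population, as in the simulations

def fitness(mask, tm_row):
    # Stand-in for the detector reading: overlap with a random transmission row.
    return sum(m * t for m, t in zip(mask, tm_row))

def crossover(parent_a, parent_b):
    # Each segment of the offspring is taken from one of the two parents.
    return [a if random.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]

def mutate(mask, rate):
    # Flip each binary segment with probability `rate` (the mutation rate R).
    return [1 - m if random.random() < rate else m for m in mask]

def ga_iteration(population, tm_row, rate):
    ranked = sorted(population, key=lambda m: fitness(m, tm_row), reverse=True)
    top = ranked[: POP_SIZE // 2]          # higher-ranked parents survive
    offspring = [mutate(crossover(*random.sample(top, 2)), rate)
                 for _ in range(POP_SIZE // 2)]
    # Offspring replace the worst half of the parent population.
    return top + offspring

random.seed(0)
tm_row = [random.random() for _ in range(SEGMENTS)]
population = [[random.randint(0, 1) for _ in range(SEGMENTS)]
              for _ in range(POP_SIZE)]
for _ in range(50):
    population = ga_iteration(population, tm_row, 0.06)
best = max(fitness(m, tm_row) for m in population)
```

Because the top half of the population is always retained, the best fitness is monotonically non-decreasing across iterations, mirroring the saturation behavior described above.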

In order to greatly reduce the time needed to reach saturation, or convergence, while maintaining the highest intensity at the focal spot, the mutation rate R, defined as the percentage of the field changed in a newly generated mask, is a critical parameter. At each iteration of the GA, the mutation rate decides the diversity of the newly generated population, or equivalently the size of the search space. For example, when the mutation rate R is set high, the newly formed masks generate bigger intensity variations at the output spot, which helps the algorithm jump out of possible local optimums but might take too long to reach saturation. Conversely, if $R$ is set relatively low, the searching process is faster but might suffer from ‘genetic drift’ and never reach the highest intensity. To accelerate the searching process while avoiding being trapped in local optimums, we define a decay ratio of the mutation rate:

$$\Upsilon = |{{R_k} - {R_{k + 1}}} |$$
from which we can set a mutation rate for each iteration. The decay ratio is the difference between the mutation rates of the ${k^{th}}$ and ${(k + 1)^{th}}$ iterations. A higher decay ratio combined with a reasonable initial mutation rate can lead to rapid convergence by quickly reducing the size of the search space while preventing trapping in potential local optimums. As shown in Fig. 1, the optimization process of the GA is visualized with a 2D Schaffer function, which includes many local optimums, local minimums, and the satisfactory solution at the central circle. At the beginning of the GA, the mutation rate is set somewhat higher, for example 5%∼15%, to maintain a relatively large search space. Consequently, a higher diversity of masks can be found in the initial population (larger area occupied by the initial population in Fig. 1). Afterward, the mutation rate keeps decreasing with a high decay ratio $\Upsilon $. The search space for the newly formed offspring masks shrinks rapidly, as the red circles show, accelerating the convergence toward the satisfactory solution. To evaluate this strategy, we conduct numerical simulations and experimental assessments in the following sections.


Fig. 1. Illustration of the effect of a high decay ratio. The 2D Schaffer function, commonly used for testing black-box optimization algorithms, is plotted as multiple concentric circles. The orange areas indicate local optimums, the gray areas represent local minimums, and the satisfactory solution is located at the central point of the circles. For the GA, the diversity of a population determines how vast the search space can be, shown as the area of the red circle. Green hexagons and blue hexagons represent offspring and parent masks, respectively. With a high decay ratio, the mutation rate decreases quickly along the iterations and reduces the size of the search space. This process improves the possibility of reaching the satisfactory solution without falling into local optimums, which makes the convergence of the GA faster.


2.2 Numerical simulation

To investigate the decay ratio of the mutation rate, we conducted numerical simulations using MATLAB (2019b) on a computer equipped with Windows 10, an Intel Core i9-9900K CPU @ 3.60 GHz (8 cores), and 64.0 GB RAM. The distribution of input modulation modes (segments) in the mask defines the incident field. We set the number of input modulation segments to 1024, which means that $2 \times 2$ pixels are grouped as one modulation segment, corresponding to a mask with $32 \times 32$ modulation segments and $64 \times 64$ pixels. We adopted binary amplitude modulation [23] by setting each segment to either 1 or 0. The scattering media is modeled by a Gaussian random matrix of size $64 \times 64$ pixels. The size of one population in the GA is set to 32, meaning there are 32 masks in each generated population. The cost function is the ranking of intensity values at the targeted output channel behind the scattering media, corresponding to the fitness of the 32 masks. Following a previous study [18], the mutation rate $R$ for each iteration is defined as:

$$R = ({R_0} - {R_{end}}) \times \exp ( - k/D) + {R_{end}}$$
where ${R_0}$ is the initial mutation rate and ${R_{end}}$ is the ending mutation rate. $D$ is the decay factor that decides the decay ratio of the mutation rate; the decay factor is inversely proportional to the decay ratio, i.e., $\Upsilon \propto (1/D)$. To compare the effect of the decay ratio, we adopt decay factors of $D = 80, 400, 1000$ to represent high, medium, and low decay ratios, respectively. The other parameters are as follows: initial mutation rate ${R_0} = 0.06$; ending mutation rate ${R_{end}} = 0.012$; total iteration number $N = 2000$. The performance of the GA with different decay ratios is evaluated by comparing the normalized convergence, defined as the percentage of the enhancement at each iteration relative to the highest enhancement when the satisfactory solution is approached. Here, the enhancement $\xi$ is defined as the ratio of the intensity value at the output spot after each iteration to the averaged intensity value before optimization. In addition, the satisfactory solution (optimized field distribution) is determined when the optimization of the GA saturates.
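For concreteness, the exponential schedule of Eq. (2) can be evaluated directly. The short Python sketch below (an illustration, not the authors' code) compares the three decay-factor settings used in the simulations; a smaller decay factor $D$ (higher decay ratio) drives $R$ toward $R_{end}$ much sooner.

```python
import math

def mutation_rate(k, R0=0.06, R_end=0.012, D=80):
    # Exponential mutation-rate schedule of Eq. (2).
    return (R0 - R_end) * math.exp(-k / D) + R_end

# Compare the three decay-factor settings at the 500th iteration.
for D in (80, 400, 1000):
    print(f"D={D:4d}  R(0)={mutation_rate(0, D=D):.3f}  "
          f"R(500)={mutation_rate(500, D=D):.4f}")
```

With $D = 80$ the rate is essentially at $R_{end}$ by the 500th iteration, whereas with $D = 1000$ it is still above 0.04, consistent with the curves in Fig. 2(a).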

The curves in Fig. 2(a) illustrate the mutation rates versus the iterations for the high, medium, and low decay ratios. For the high decay ratio, the mutation rate decreases drastically to the ending mutation rate near the 500th iteration. In contrast, the mutation rate for the medium decay ratio reaches the ending mutation rate at around the 2000th iteration. For the low decay ratio, the mutation rate does not reach the ending mutation rate by the 2000th iteration, and more iterations might be needed to achieve the satisfactory solution.


Fig. 2. Numerical simulation of the GA with different decay ratios. (a) Mutation rates in each iteration of the GA with decay factors equal to 80, 400, and 1000, corresponding to high, medium, and low decay ratios. (b) The normalized convergence of GA and corresponding enhancement. The three stages of convergence rates are depicted by black dotted lines.


Corresponding to the mutation rates in Fig. 2(a), we ran 50 simulations; the average performance of the GA is shown in Fig. 2(b). The right axis indicates the enhancement at the focal spot at each iteration, and the left axis shows the normalized convergence. The satisfactory solution appears at around the 1900th iteration for the high decay ratio, with a final enhancement of $96 \pm 6$. The converging speeds of the normalized convergences are high at the beginning, as depicted by the first part of the black dotted line: the normalized convergence reaches around 20% within only about 50 iterations. After that, the speeds show different trends for the three cases. For the high decay ratio, the speed is maintained until the normalized convergence reaches 66% at the 200th iteration, defined as the first stage. From the 200th iteration, the speed slows down and the normalized convergence reaches 88% at the 500th iteration, defined as the second stage. After the 500th iteration, the speed decreases further until saturation at around the 2000th iteration, defined as the third stage. In contrast, for the medium and low decay ratios the first stage shrinks while the second stage lengthens. The normalized convergence reaches 70% at about the 700th iteration for the medium decay ratio, while it arrives at 50% at the 570th iteration for the low decay ratio. In the third stage, the normalized convergences reach 98% and 86% for the medium and low decay ratios, with corresponding enhancements of $95 \pm 5$ and $81 \pm 3$, respectively. The results in Fig. 2 show that a steeper decay of the mutation rate leads the GA to converge faster. According to Fig. 1, the faster convergence results from the rapid shrinking of the search space. Although the decay ratio, or decay factor, has been mentioned in the literature, its effect on the convergence of the GA is presented in detail here for the first time.

2.3 Experimental validation for optimized parameters

To investigate the effect of the decay ratio in an experimental environment, we conducted experiments with the setup illustrated in Fig. 3. A 532nm coherent laser beam (Genesis MX 532, Coherent Inc.) is expanded and then illuminates a DMD (DLP Discovery 4100 with DLPLCRC Modulation Board DLPLCRC410EVM and DMD board, DLPLCR70EVM, Texas Instruments). The incident angle on the DMD surface is adjusted so that the micromirrors act as a blazed grating, concentrating the optical energy in the reflection direction [24]. The surface of the DMD is imaged onto the scattering media (ground glass diffuser, DG10-120, Thorlabs) by a 4-f imaging system with a demagnification of 4, determined by lenses L2 and L3 with focal lengths of 300mm and 75mm, respectively. A 90:10 beam splitter (BS) is placed behind the scattering media, from which the transmitted light, 90% of the scattered light, is received by a photodetector (APD130A2, Thorlabs) with a $100\mu m$ iris for the intensity evaluation. The reflected light, the remaining 10% of the scattered light, is captured as a sampled beam by a CCD camera (CS2100M-USB, Thorlabs). The analog voltage signal acquired at the detector is transferred to a data acquisition (DAQ) box (DT9834) with 16-bit resolution over a 20V range (-10V ∼ +10V). The DAQ is connected to the PC by USB; the same computing device as in the numerical simulation above is used. In addition, the DMD is also connected to a PC and controlled by a MATLAB control module (LO4100, LinOptx LLC).


Fig. 3. Experiment setup. DMD: digital micromirror device (DLP4100, TI). M1, M2, M3: mirrors. L1: lens (focal length = 300 mm). L2: lens (focal length = 300 mm). L3: lens (focal length = 75 mm). MS: microscope (M-5X, Newport). P: pinhole. SM: ground glass diffuser (DG10-120, Thorlabs). BS: beam splitter 90:10. CCD: CCD camera (CS2100M, Thorlabs). APD: Si avalanche photodetector (APD130A2, Thorlabs). DAQ: data acquisition box (DT9834). ADC: analog-to-digital converter (LinOptx Digitizer v1.0). FPGA: Virtex-5 (Xilinx). The coherent 532 nm laser beam is expanded by a microscope and collimated by L1. The DMD is imaged onto the front surface of the scattering media by a 4-f system. A photodetector and a camera collect the scattered light behind the scattering media. The voltage signal from the photodetector is split; one branch connects to the PC control module (DAQ and PC), shown as a green dotted line, for the experimental validation of the GA. After the GA validation, the signals are directed to the FPGA control module (ADC and FPGA), shown as a red dotted line, for testing the FPGA configuration. The experiment setup thus has two functions: the first is to validate the effect of the decay ratio in an experimental environment, and the second is to test the design of the FPGA program and visualize its performance later.


The decay factors were selected as 80, 400, and 1000 to represent the high, medium, and low decay ratios, the same as in the simulations. The DMD's full screen of $1024 \times 768$ pixels was grouped into larger segments to create $32 \times 32$ input modulation segments as one mask. The laser power was set to 45mW. The normalized convergence of the GA and the enhancement of the focal spot for the different decay ratios are shown in Fig. 4(b). The experimental results generally follow the trends simulated in Fig. 2(b).


Fig. 4. Experimental results for the three decay ratios. The parameters of the GA are the same as in the numerical simulation. Normalized convergence and corresponding enhancement during the measurement. The black dotted lines depict the three stages of the converging speed for each decay-ratio setting.


In the experiments, at the end of the first stage, about 30 iterations, the normalized convergence reaches 40% for all three decay ratios. Afterward, the convergence curves start to separate. For the high decay ratio, the speed decreases in the second stage, between the 230th and 500th iterations, and the normalized convergence reaches 90% at the end of the stage. The speed falls further in the third stage, while the GA converges to the satisfactory solution at the 2000th iteration with an enhancement of 43. For the medium decay ratio, the speed decreases in the second stage (from the 30th to the 580th iteration), reaching around 83% of the normalized convergence at the end of the stage, and approaches about 95% at the 2000th iteration. For the low decay ratio, the normalized convergence reaches 64% at the end of the second stage. The speed declines further in the third stage, and 90% convergence of the GA is achieved at the 2000th iteration. In the experiments, when an extremely low decay ratio was adopted, convergence might not occur even after a vast number of iterations. In that case, the relatively high mutation rate causes slow convergence because it is challenging to find the optimized masks in a large search space.

2.4 Time duration versus operational enhancement

As illustrated in Fig. 2 and Fig. 4, the GA processing saturates and the enhancement reaches its maximum after a certain number of iterations. With different selections of parameters, the converging speeds decrease over the course of the run, as defined by the three stages. The decreasing converging speed raises an issue of time efficiency, because in the last stage the GA achieves only a small improvement in enhancement yet consumes much more time. In applications with dynamic scattering, the short decorrelation time of the scattering events does not allow the algorithm to run for too long. Instead, a specific stopping time should be determined to end the optimization after a certain number of iterations, at which point the optimized mask is displayed on the DMD to form the light focusing behind the scattering media. Hence, the tradeoff between the operational enhancement and the iteration number should be evaluated to determine this stopping time. For this purpose, we define a convergence-efficiency function $\eta$:

$$\eta = {\xi _k}/{\xi _{{N_g}}} - k/{N_g}$$
where $0 \le {\xi _k}/{\xi _{{N_g}}} \le 1$ and $0 \le k/{N_g} \le 1$. ${N_g}$ represents the total number of iterations needed to achieve the optimization and acquire a satisfactory solution, and k indicates the ${k^{th}}$ iteration during the GA processing, where $k \le {N_g}$. ${\xi _k}/{\xi _{{N_g}}}$ is the normalized convergence during the operation, as defined above. Both ${\xi _k}/{\xi _{{N_g}}}$ and $k/{N_g}$ are normalized to the condition when the satisfactory solution is achieved, so the range of $\eta$ is $[{ - 1,1} ]$. The convergence-efficiency function $\eta$ evaluates the difference between the ratio of enhancement and the ratio of iteration number (time duration) relative to the satisfactory solution; it represents the time efficiency of the optimization process. Hence, the maximum value of $\eta$ defines the ending point of the GA processing, at which the acquired optimized mask should be used to form the light focusing.
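The tradeoff captured by Eq. (3) is straightforward to compute from any recorded enhancement curve. The Python sketch below is illustrative only: the enhancement curve is synthetic (a fast rise followed by saturation), not measured data, and the function names are hypothetical.

```python
def convergence_efficiency(enhancements):
    # eta_k = xi_k / xi_{N_g} - k / N_g (Eq. 3), taking the final
    # enhancement as the satisfactory solution xi_{N_g}.
    N_g = len(enhancements)
    xi_final = enhancements[-1]
    return [xi / xi_final - (k + 1) / N_g for k, xi in enumerate(enhancements)]

# Synthetic enhancement curve: fast rise, then saturation (illustration only).
curve = [100.0 * (1.0 - 0.995 ** (k + 1)) for k in range(2000)]
eta = convergence_efficiency(curve)
k_stop = eta.index(max(eta)) + 1  # iteration at which to stop the GA
```

The peak of `eta` marks the iteration where further enhancement no longer justifies the extra time, which is exactly the stopping rule used in the text.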

The convergence-efficiency functions for the simulations and the experiments conducted in the above sections were evaluated and are shown in Fig. 5. The convergence-efficiency functions increase from near zero to their peak values and then monotonically decrease for the three decay ratios ($D = 80,400,1000$). For the numerical simulations, when the decay ratio is high (D = 80), the measured $\eta$ reaches a maximum of about 0.63 at the 500th iteration, then falls back to about 0 at the 2000th iteration. For the medium and low decay ratios, the maxima of $\eta$ appear at around the 630th and 300th iterations, with corresponding values of approximately 0.38 and 0.22, respectively. The experimentally measured $\eta$ generally agrees with the simulation results. For the high decay ratio, the maximum value is located near 0.6 at the 500th iteration. For the medium decay ratio, the measured $\eta$ is about 0.39 at the 730th iteration, slightly higher than the simulation. For the low decay ratio, the experimental value is lower than the simulated one, with a maximum of around 0.2. Both the simulations and the experiments show that a higher decay ratio $\Upsilon $ of the mutation rate brings a better balance between the iteration number and the enhancement.


Fig. 5. Convergence-efficiency functions for the simulations and experiments. Red, blue, and green dots correspond to the high, medium, and low decay ratios, respectively, for the simulation results. Red, blue, and green lines represent the experimentally measured convergence-efficiency functions for the high, medium, and low decay ratios. The black dotted lines indicate the maximal points of the functions in the experiments and simulations.


For the low decay ratio, the convergence-efficiency function degrades into the negative region after 1500 iterations because the enhancement increases too slowly. In addition, the maximum of $\eta$ appears at a different iteration for each of the three decay ratios. For the low decay ratio, the convergence flattens drastically after the initial rising stage, leading to an early turning point of $\eta$. Beyond the GA, the convergence-efficiency function $\eta$ could also be employed to analyze this tradeoff for various other optimization algorithms.

3. FPGA parameterized genetic algorithm

The simulation and experimental evaluations show that, within only about 500 iterations, the normalized convergence reaches around 88% with a high decay ratio, i.e., a decay factor of D = 80. To accelerate the GA processing further, minimizing the time duration of each iteration becomes the other critical factor.

Based on the accessories available on the DLPLCRC410EVM board, we designed an FPGA-based GA architecture for the Virtex-5 FPGA chip, as shown in Fig. 6. A PLL (phase-locked loop) module provides the timing control of the FPGA program, including the clocks of the internal state-machine logic and the state machines of other hardware such as the DDR memory (Crucial PC2-6400 2GB DIMM 800MHz DDR2 SDRAM) and the ADC (analog-to-digital converter, LinOptx Digitizer v1.0). The top-level state machine controls the overall logic of the program and cooperates with the lower-level state machines, which implement the logic of the genetic algorithm and the DDR and ADC data communication. The hardware operations were designed with careful consideration of the timing constraints so that the DMD display interface, the DDR write/read interface, and the ADC read interface cooperate with the procedures of the GA.


Fig. 6. Architecture of the FPGA design. The PLL generates 50 MHz clocks for the top-level state machine and 200 MHz for the lower-level state machines. The dual-port BRAM is an internal memory space that stores one DMD mask. The DMD interface controls the display of masks once the full mask of size $1024 \times 768$ is loaded. The ADC interface acquires digitized data in high-speed sampling mode. The DDR interface controls the write/read operations between the FPGA and the DDR memory.


In the FPGA design, we introduced a Trivium stream cipher to generate random number vectors [25] for the randomizations in the GA processing. The workflow of the FPGA program is shown in Fig. 7. The internal frequency of the lower-level state machine is 200MHz, with one clock cycle of 5ns. The number of modulation segments (the size of the masks) is set to $64 \times 64 = 4096$: the $1024 \times 768$ pixels on the DMD are divided into $64 \times 64$ segments, each of $16 \times 12$ pixels. This is the maximum number of modulation segments that we could realize at the 200MHz frequency, because a larger number means a longer processing time within one cycle. In our implementation, the time for processing $64 \times 64$ modulation segments is around $4.9\textrm{ns}$, very close to one clock cycle (5ns). Note that there is a trade-off between the processing time and the modulation size. According to a previous study [23], the enhancement of the focal spot is proportional to the modulation size. But in the FPGA configuration, the commands are processed through time cycles, and consuming fewer time cycles means faster processing, which is one of the primary goals of designing an FPGA system. In our FPGA configuration, additional functions are inserted to guarantee the fixed mutation rate and crossovers, in other words, to prevent the modulation size from being disturbed. Those functions occupy the major share of the processing time within one time cycle, thus confining the applicable modulation size.


Fig. 7. Workflow of the generation of one offspring mask. At the beginning of the program, the initial parent population is generated in $43\mathrm{\mu s}$. The FPGA program starts to iterate after initialization, whereupon each offspring mask is constructed from parent masks. The new offspring mask is displayed on the DMD and written to the DDR; in the meantime, digitized data from the ADC is acquired and ranked. The process for one offspring mask costs around $500\mathrm{\mu s}$ to complete. One iteration processing the population of 16 is then finished in 8 ms.


The GA starts with the generation of a population of 16 parent masks, which are written into the DDR memory for storage. The ranking process is triggered when the masks are displayed on the DMD and the corresponding intensity values are captured by the photodetector. The analog voltage values from the photodetector are then converted to digital values by the ADC.

After the initialization stage, the program continues to generate the next population of 16 offspring masks from the high-ranked parent masks read from the DDR memory. In our FPGA configuration, one batch of 128-bit vector data is the essential data transfer unit between the buffers, the internal memory, and the DDR; we can only manipulate one 128-bit vector per clock cycle. Thus, the construction of one $1024 \times 768$ mask costs $8 \times 768$ clock cycles, equal to 30720ns. Since DDR operations are ‘burst’ transfers, in which a column of data is delivered in several parts rather than sent across the bus at once, reading data from the DDR runs at a lower frequency (50MHz) than the internal frequency of the board (200MHz).
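The cycle count for assembling one mask follows directly from the bus width, and the arithmetic can be checked in a few lines (illustrative Python, not FPGA code):

```python
CLOCK_NS = 5                      # 200 MHz internal clock => 5 ns per cycle
ROWS, COLS, BUS_BITS = 768, 1024, 128

# 1024 pixels per row / 128-bit transfers = 8 transfers per row.
cycles_per_mask = (COLS // BUS_BITS) * ROWS
nanoseconds = cycles_per_mask * CLOCK_NS   # matches the 30720 ns in the text
```

This confirms the stated figure: $8 \times 768 = 6144$ cycles, or 30720 ns at 5 ns per cycle.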

The implementations of the crossovers and mutations in the GA were also processed on 128-bit vectors. Here, the mutation rate represents the probability of mutation. Since there are $64 \times 64$ modulation segments in total, the 128-bit vector is divided into 8 pieces of 16-bit vectors. As a result, the crossovers are processed by reading two 16-bit vectors from two parent masks and combining them into one 16-bit vector in the offspring mask. We generated a virtual counter related to the iteration number to determine whether a 16-bit vector should be mutated by the random number vectors. Thanks to the parallel processing nature of the FPGA, we can apply mutations and crossovers to the 8 pieces of 16-bit vectors at the same time. The processed vectors are transferred to the DDR or DMD buffer immediately. In contrast, applying such calculations or data transfers in software is mostly sequential: the commands are processed one by one from first to last. Here, the parallel processing nature of the FPGA accelerates the procedure by at least 8 times. To maintain the $16 \times 12$-pixel segments, the mutations must be kept the same for $1024/128 \times 12 = 96$ cycles. Because of the DDR reading operation, it takes $420\mathrm{\mu s}$ to generate one complete offspring mask after the processing of the crossovers and the mutations. When an offspring mask is completely written to the DMD buffer, the DMD starts to display the mask immediately. In the meantime, the generated offspring mask is written to the DDR memory as a new parent mask for the next iteration. After that, the digitized data from the ADC is accumulated over $31\mathrm{\mu s}$; the ADC converts analog voltage values in the range of 0∼3.3V to 10-bit digital data every $2.3\mathrm{\mu s}$. A $10\mathrm{\mu s}$ delay is set to allow the ranking process to evaluate the fitness of the voltage values produced by the offspring mask.
Above is the complete procedure for one offspring mask. For a population size of 16, this process is conducted 16 times to form one complete iteration of the GA. In the end, one iteration takes around 8ms.
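The lane-parallel crossover and mutation on 16-bit pieces can be mimicked in software with bitwise operations. The Python sketch below is only a model of the idea: Python loops are of course sequential, whereas on the FPGA the eight lanes are processed in the same clock cycle, and the flip masks come from the Trivium stream rather than from `random`; the crossover cut point and sparse flip mask here are arbitrary illustrative choices.

```python
import random

def crossover_16(a, b, cut=8):
    # Offspring word: high bits from parent a, low `cut` bits from parent b.
    low = (1 << cut) - 1
    return (a & ~low & 0xFFFF) | (b & low)

def mutate_16(word, flip):
    # Flip selected bits by XOR with a (pseudo-random) flip mask.
    return (word ^ flip) & 0xFFFF

random.seed(1)
parent_a = [random.getrandbits(16) for _ in range(8)]  # one 128-bit transfer =
parent_b = [random.getrandbits(16) for _ in range(8)]  # eight 16-bit lanes
# Sparse flip masks (only bits 0 and 6 eligible) stand in for a low mutation rate.
flips = [random.getrandbits(16) & 0x0041 for _ in range(8)]
offspring = [mutate_16(crossover_16(a, b), f)
             for a, b, f in zip(parent_a, parent_b, flips)]
```

On the FPGA, the list comprehension's body corresponds to combinational logic replicated eight times, which is where the "at least 8 times" acceleration over sequential software comes from.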

Since the FPGA operates on binary data, the exponential function in Eq. (2) is difficult to implement directly and would require additional function blocks. Considering that a linear update is more efficient to process in an FPGA program, we modified the exponential mutation-rate schedule of Eq. (2) to a linear form:

$$R = \begin{cases} R_0 - c \times (k - 1), & R > R_{end}\\ R_{end}, & R \le R_{end} \end{cases}$$
$c$ is the linear decay factor of the mutation rate, which determines the decay ratio, and $k$ is the iteration index. To match the high decay ratio (low $D$) obtained in the previous section, the parameters in Eq. (4) were set to $c = 3.66 \times {10^{ - 4}}$, ${R_0} = 0.061$, and ${R_{end}} = 0.012$, and the number of iterations was set to 2000. The experimental setup is shown in Fig. 3, where the red dotted line indicates the connection of the ADC, FPGA, and DMD. The sampled images are captured by the CCD for visualization and verification. The voltage values from the photodetector are digitized by the DAQ and sent to the PC for evaluation; the same photodetector signals are also transferred to the FPGA as feedback. A typical experimental result of the FPGA program is presented in Fig. 8, showing the voltage values recorded via the DAQ during the run. The program starts at 0 s and ends at 14 s. The convergence exhibits three stages, a steep rise, an approaching stage, and gradual saturation, similar to the numerical simulations and experimental results shown in Fig. 2(b) and Fig. 4(b). The green dotted line marks 4 s at the 500th iteration, the end of the second stage, corresponding to the maximal point of the convergence-efficiency function at the 500th iteration in Fig. 5. The enhancement reaches 91 at the 2000th iteration, and the normalized convergence reaches 88% at the 500th iteration. The insets in Fig. 8 show the image before optimization (the speckle distribution behind the scattering media) and the focal spots at the 500th and 2000th iterations, respectively. The FPGA program takes about 14 s to finish 2000 iterations and only around 4 s to reach 88% of full convergence.
In contrast, the PC-based GA takes around 1200 ms to complete one iteration, while our FPGA program takes only 8 ms; the configured FPGA program thus accelerates GA processing by about 150 times.
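For reference, the linear schedule of Eq. (4) with the parameters above can be sketched as follows (the function name is ours):

```python
def mutation_rate(k, r0=0.061, r_end=0.012, c=3.66e-4):
    """Linear mutation-rate schedule of Eq. (4): R decays from r0 by c
    per iteration k and is clamped at the floor r_end."""
    return max(r0 - c * (k - 1), r_end)
```

With these values the rate reaches its floor after about $(R_0 - R_{end})/c \approx 134$ iterations, so well over 100 iterations of exploratory search occur before the ending mutation rate takes over.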


Fig. 8. Process of focusing light through scattering media. The process of the cooperatively FPGA-parameterized GA is recorded by the CCD and DAQ. Red dots show the original voltage data from the DAQ. Black lines indicate the three stages of convergence speed. Orange dotted line: end point of the first stage of the convergence. Green dotted line: end point of the second stage, at the 500th iteration. The three inset patterns show CCD images at the start of the program, at the 500th iteration, and at the 2000th iteration, respectively.


To evaluate the repeatability and stability of the cooperatively FPGA-parameterized GA, we moved the experimental setup from an optical table to a breadboard on an office desk and modified the FPGA program to repeat the GA process every 500 iterations. The voltage data recorded via the DAQ over 40 s cover ten repetitions, as shown in Fig. 9(a). The insets show the normalized CCD images taken at 0 s and 3.91 s, corresponding to the speckle pattern before beam shaping and the focal point formed at the 500th iteration, respectively. The cross sections of the intensity distributions of the focal spots for the ten repetitions are shown in Fig. 9(b). The peak intensities fluctuate around 200 with a deviation of ${\pm} 10$, demonstrating the robustness of the cooperatively FPGA-parameterized GA. The system is therefore ready to focus light through dynamic scattering media with decorrelation times of a few seconds. The recorded movie is presented in the Supplementary Material (see Visualization 1).


Fig. 9. Focusing light iteratively by the cooperatively FPGA-parameterized GA. (a) Voltage data recorded by the DAQ; the GA is processed 10 times. The inset figures display normalized CCD images captured at 0 s and 3.91 s. (b) Cross sections of the focal spot at 4 s of each repetition. The inset figure shows the CCD image for the last repetition, corresponding to the red rectangle area.


4. Discussion

In this study, we proposed enhancing the convergence of the genetic algorithm (GA) with a sufficient initial mutation rate and a higher decay ratio of the mutation rate. The initial mutation rate itself influences the convergence speed in an inversely proportional relationship; however, too low an initial mutation rate can cause 'genetic drift' and make it difficult for the GA to reach an optimal solution in a noisy environment. Our study showed that a higher decay ratio $\Upsilon$ of the mutation rate makes the GA converge faster, but the decay ratio cannot be set too high: if it is, the mutation rate drops within a few iterations, leaving the search insufficient to find optimized solutions. In our implementation, we chose values of the decay ratio $\Upsilon$ and the initial mutation rate such that at least 100 iterations were guaranteed before reaching the ending mutation rate. In general, the optimal choice of $\Upsilon$ varies case by case, so parameter analysis is necessary before configuring the FPGA design. To determine the optimized parameter combination in our study, we ran simulations and experiments with different parameters, including different combinations of the initial and ending mutation rates. When the noise was significant in simulations or experiments, the combination (${R_0} = 0.06$, ${R_{end}} = 0.012$) showed relatively stable performance against noise in our system.

In wavefront shaping applications, FPGAs have been used to form focal spots with the continuous algorithm [26] and the TM approach [27]. Our amplitude-based FPGA TM with $32 \times 32$ dimensions could be evaluated within 150 ms. However, a higher enhancement factor from a higher-dimensional TM costs much more time, and the TM approach is susceptible to environmental noise. The continuous algorithm can rapidly generate a focal spot, but low SNR strongly reduces the focusing power. In contrast, the GA is a well-known optimization approach that performs well even under intense noise. To evaluate the GA and TM performance in the same optical system, we developed both an FPGA-TM configuration and an FPGA-GA configuration, placed the optical system on an office desk, and turned on the room light to simulate the noisy, unstable environment typical of the dynamic scattering regime. The results showed that the TM approach was more sensitive to noise, sometimes failing to generate a focal spot at all, whereas the FPGA-GA configuration achieved light focusing stably over multiple tests, as shown in Fig. 9. We therefore devoted our effort to the FPGA-GA configuration, aiming at applications with rigorous stability requirements. Our FPGA configuration could also be combined with approaches based on advanced algorithms [28] and artificial intelligence [29] to further improve the processing speed against dynamic scattering. On the other hand, gradient-assisted methods can achieve fast light focusing [21,22]. Currently, the primary difficulty of implementing a gradient-assisted method on the FPGA is the conversion of the computation functions; doing so to further increase the speed is one of our future development plans.
Although the phase-only modulation provided by the DMD could achieve a much higher relative enhancement than binary modulation, the actual optical power at the focal point would be much lower than with binary modulation. Considering this, along with the modulation size the FPGA time cycle can handle, we selected the binary modulation method with $64 \times 64$ modulation segments as a trade-off between time and efficiency.
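As context (not stated in this article; standard estimates from the wavefront-shaping literature, e.g. Refs. [12,23]), the expected enhancement for $N$ controlled segments is roughly

$$\eta_{phase} \approx \frac{\pi}{4}(N - 1) + 1, \qquad \eta_{binary} \approx \frac{N}{2\pi},$$

so binary amplitude modulation trades relative enhancement for on/off control that the DMD and FPGA can drive at full speed.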

In the implementation of the cooperatively FPGA-parameterized GA, the number of input modulation segments, i.e., the size of the incident field, is set to $64 \times 64$. A larger incident field would yield a higher enhancement of the formed focal spot. However, the timing limitations of the current FPGA chip on the DLPLCRC410EVM board, for example the data writing and reading between the DDR and the FPGA, prevent us from increasing the number of modulation segments without extending the duration of each iteration. It currently takes 14 s to run 2000 iterations of the designed FPGA-based GA program. The DDR data communication on the board runs at 50 MHz, slower than the 200 MHz internal state machine, and this frequency mismatch is the primary timing limitation of our current design. In addition, the on-chip memory of the Virtex-5 on the DLPLCRC410EVM board is only 16.4 Mb. The current design uses a population size of 16 with masks of $1024 \times 768$ pixels, requiring at least $1024 \times 768 \times 16 \times 2 \approx 25.1\,\textrm{Mb}$ of memory. Therefore, the attached DDR must be used to store the generated random masks, which stretches one iteration to around 8 ms, much longer than the 5 ns of one clock cycle. Fortunately, Xilinx now supplies FPGA chips (Virtex UltraScale+) with up to 500 Mb of on-chip storage, which is more than sufficient to store the masks of our current design. With abundant on-chip memory, the FPGA could process all data writing and reading internally. We compared the two FPGA chips in terms of the timing workflow for generating one set of 128-bit data of the offspring mask, as shown in Fig. 10. The internal clock is 200 MHz, meaning one clock cycle lasts 5 ns.
For the Virtex-5 FPGA with DDR support on the DLPLCRC410EVM, shown in the red rectangle, the FPGA takes 20 ns to read one set of 128-bit data from the DDR at 50 MHz. Due to the burst behavior of the DDR, it takes 80 ns in total to generate one set of 128-bit offspring data. For the Virtex UltraScale+, shown in the dotted rectangle, no DDR is needed because the on-chip memory is sufficient: it takes 5 ns to read 128-bit data from internal memory, and only 10 ns to generate a 128-bit offspring vector. If employed, GA processing based on the Virtex UltraScale+ chip could run at least eight times faster than the current Virtex-5 design; in other words, 500 iterations, reaching 88% convergence, would take 500 ms or less. The internal processing program of the FPGA-based GA could also be improved with newer FPGA chips. We believe the GA can be further accelerated, even with more modulation modes toward a higher enhancement factor, and that the cooperatively FPGA-parameterized GA on advanced FPGA chips has great potential for imaging through dynamic scattering media.
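The memory shortfall and the projected eight-fold speedup quoted above follow from simple arithmetic (a sketch; the variable names are ours):

```python
# Memory budget: parent + offspring populations of binary DMD masks.
POP, W, H = 16, 1024, 768          # population size; DMD mask dimensions
mask_bits = W * H                   # one binary mask = 786,432 bits
total_bits = mask_bits * POP * 2    # ~25.1 Mb, versus 16.4 Mb of Virtex-5
                                    # on-chip memory -- hence the external DDR

# Timing per 128-bit offspring chunk (ns), per Fig. 10.
virtex5_ns = 80     # 50 MHz DDR read (20 ns) plus DDR burst overhead
ultrascale_ns = 10  # 5 ns internal read + 5 ns generation
speedup = virtex5_ns / ultrascale_ns   # 8.0: the "at least eight times"
```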


Fig. 10. Flowchart for constructing 128-bit data of an offspring mask with the Virtex-5 and the Virtex UltraScale+. The internal clock frequency is 200 MHz, i.e., a 5 ns clock cycle. Red rectangle: for the current Virtex-5 and DDR communication system, it takes 4 clock cycles (20 ns) to read 128-bit parent data from the DDR and 80 ns to generate 128-bit offspring data. Gray rectangle: the parent data can be stored in internal memory, which reads out 128 bits per clock cycle, so constructing 128-bit offspring data requires 10 ns.


5. Conclusion

We have proposed a method to achieve fast light focusing through scattering media using a genetic algorithm accelerated by optimized parameters and an FPGA-based hardware implementation. We found that a high decay ratio of the mutation rate greatly improves convergence. By measuring the convergence-efficiency function, which evaluates the trade-off between the iteration number (duration) and the enhancement of the focal spot, we identified the optimal operating point of the GA at 500 iterations and 88% convergence. Furthermore, the FPGA program was developed to accelerate the GA cooperatively at the hardware level with the optimized decay ratio, implementing the hardware controls (DMD, ADC, DDR) and the algorithm logic simultaneously. The measured convergence-efficiency function allowed us to establish that the optimization was achieved within 4 s. The designed GA program could be further accelerated to the millisecond level by using advanced FPGA chips that provide up to 500 Mb of on-chip memory. We believe our cooperatively FPGA-parameterized GA system can pave the way for broad applications in biological imaging, where dynamic correction by real-time wavefront shaping is a must.

Funding

R&D Funding from LinOptx; National Natural Science Foundation of China (61675140); Science Specialty Program of Sichuan University (2020SCUNL210).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. S. M. Popoff, G. Lerosey, R. Carminati, M. Fink, A. C. Boccara, and S. Gigan, “Measuring the Transmission Matrix in Optics: An Approach to the Study and Control of Light Propagation in Disordered Media,” Phys. Rev. Lett. 104(10), 100601 (2010). [CrossRef]  

2. S. M. Popoff, G. Lerosey, M. Fink, A. C. Boccara, and S. Gigan, “Controlling light through optical disordered media: Transmission matrix approach,” New J. Phys. 13(12), 123021 (2011). [CrossRef]  

3. J. Yoon, K. Lee, J. Park, and Y. Park, “Measuring optical transmission matrices by wavefront shaping,” Opt. Express 23(8), 10158–10167 (2015). [CrossRef]  

4. M. Kim, W. Choi, Y. Choi, C. Yoon, and W. Choi, “Transmission matrix of a scattering medium and its applications in biophotonics,” Opt. Express 23(10), 12648–12668 (2015). [CrossRef]  

5. D. B. Conkey, A. M. Caravaca-Aguirre, and R. Piestun, “High-speed scattering medium characterization with application to focusing light through turbid media,” Opt. Express 20(2), 1733–1740 (2012). [CrossRef]  

6. Z. Yaqoob, D. Psaltis, M. S. Feld, and C. Yang, “Optical phase conjugation for turbidity suppression in biological samples,” Nat. Photonics 2(2), 110–115 (2008). [CrossRef]  

7. M. Cui and C. Yang, “Implementation of a digital optical phase conjugation system and its application to study the robustness of turbidity suppression by phase conjugation,” Opt. Express 18(4), 3444–3455 (2010). [CrossRef]  

8. C. L. Hsieh, Y. Pu, R. Grange, G. Laporte, and D. Psaltis, “Imaging through turbid layers by scanning the phase conjugated second harmonic radiation from a nanoparticle,” Opt. Express 18(20), 20723–20731 (2010). [CrossRef]  

9. Y. Liu, C. Ma, Y. Shen, J. Shi, and L. V. Wang, “Focusing light inside dynamic scattering media with millisecond digital optical phase conjugation,” Optica 4(2), 280–288 (2017). [CrossRef]  

10. I. M. Vellekoop, “Feedback-based wavefront shaping,” Opt. Express 23(9), 12189–12206 (2015). [CrossRef]

11. Z. Fayyaz, N. Mohammadian, M. Tabar, R. Manwar, and M. Avanaki, “A comparative study of optimization algorithms for wavefront shaping,” J. Innov. Opt. Health Sci. 12(04), 1942002 (2019). [CrossRef]  

12. I. M. Vellekoop and A. P. Mosk, “Phase control algorithms for focusing light through turbid media,” Opt. Commun. 281(11), 3071–3080 (2008). [CrossRef]  

13. Z. Wu, J. Luo, Y. Feng, X. Guo, Y. Shen, and Z. Li, “Controlling 1550-nm light through a multimode fiber using a Hadamard encoding algorithm,” Opt. Express 27(4), 5570–5580 (2019). [CrossRef]  

14. Z. Fayyaz, N. Mohammadian, F. Salimi, A. Fatima, M. R. R. Tabar, and M. R. N. Avanaki, “Simulated annealing optimization in wavefront shaping controlled transmission,” Appl. Opt. 57(21), 6233–6242 (2018). [CrossRef]  

15. H. L. Huang, Z. Y. Chen, C. Z. Sun, J. L. Liu, and J. X. Pu, “Light focusing through scattering media by particle swarm optimization,” Chin. Phys. Lett. 32(10), 104202 (2015). [CrossRef]  

16. Q. Feng, B. Zhang, Z. Liu, C. Lin, and Y. Ding, “Research on intelligent algorithms for amplitude optimization of wavefront shaping,” Appl. Opt. 56(12), 3240–3244 (2017). [CrossRef]  

17. L. Fang, C. Zhang, H. Zuo, J. Zhu, and L. Pang, “Four element division algorithm to focus coherent light through a turbid medium,” Chin. Opt. Lett. 15(10), 102901 (2017). [CrossRef]  

18. D. B. Conkey, A. N. Brown, A. M. Caravaca Aguirre, and R. Piestun, “Genetic algorithm optimization for focusing through turbid media in noisy environments,” Opt. Express 20(5), 4840–4849 (2012). [CrossRef]  

19. X. L. Zhang and P. Kner, “Binary wavefront optimization using a genetic algorithm,” J. Opt. 16(12), 125704 (2014). [CrossRef]  

20. D. X. Wu, J. W. Luo, Z. H. Li, and Y. C. Shen, “A thorough study on genetic algorithms in feedback-based wavefront shaping,” J. Innov. Opt. Health Sci. 12(04), 1942004 (2019). [CrossRef]  

21. J. M. Yang, Q. Z. He, L. X. Liu, Y. Qu, R. J. Shao, B. W. Song, and Y. Y. Zhao, “Anti-scattering light focusing by fast wavefront shaping based on multi-pixel encoded digital-micromirror device,” Light: Sci. Appl. 10(1), 149 (2021). [CrossRef]  

22. Y. Y. Zhao, Q. Z. He, S. N. Li, and J. M. Yang, “Gradient-assisted focusing light through scattering media,” Opt. Lett. 46(7), 1518–1521 (2021). [CrossRef]  

23. D. Akbulut, T. J. Huisman, E. G. van Putten, W. L. Vos, and A. P. Mosk, “Focusing light through random photonic media by binary amplitude modulation,” Opt. Express 19(5), 4017–4029 (2011). [CrossRef]  

24. D. F. Wang, E. H. J. Zhou, J. Brake, H. W. Ruan, M. S. Jang, and C. H. Yang, “Focusing through dynamic tissue with millisecond digital optical phase conjugation,” Optica 2(8), 728–735 (2015). [CrossRef]  

25. B. Blochet, L. Bourdieu, and S. Gigan, “Focusing light through dynamical samples using fast continuous wavefront optimization,” Opt. Lett. 42(23), 4994–4997 (2017). [CrossRef]  

26. A. M. Caravaca-Aguirre, E. Niv, D. B. Conkey, and R. Piestun, “Real-time resilient focusing through a bending multimode fiber,” Opt. Express 21(10), 12881–12887 (2013). [CrossRef]  

27. C. De Cannière, “Trivium: A stream cipher construction inspired by block cipher design principles,” in Information Security, S. K. Katsikas, J. López, M. Backes, S. Gritzalis, and B. Preneel, eds. (2006).

28. C. M. Woo, H. H. Li, Q. Zhao, and P. X. Lai, “Dynamic mutation enhanced particle swarm optimization for optical wavefront shaping,” Opt. Express 29(12), 18420–18426 (2021). [CrossRef]  

29. Y. Q. Luo, S. X. Yan, H. H. Li, P. X. Lai, and Y. J. Zheng, “Towards smart optical focusing: deep learning-empowered dynamic wavefront shaping through nonstationary scattering media,” Photon. Res. 9(8), B262–B278 (2021). [CrossRef]  

Supplementary Material (1)

Visualization 1: A focal spot is generated ten times every four seconds by our integrated system.




Figures (10)

Fig. 1.
Fig. 1. Description of the effect of high decay ratio. The 2D Schaffer function that is used for testing black-box optimization algorithms is plotted as multiple concentric circles. The orange areas indicate local optimums, the gray areas represent local minimums, and the satisfactory solution is located at the central point of the circle. For the GA, the diversity of a population determines how vast the search space could be, which is shown as the area of the red circle. Green hexagons and blue hexagons represent offspring and parent masks, respectively. With high decay ratio, the mutation rate decreases quickly along with iterations and reduces the size of the search space. This process improves the possibility of reaching satisfactory solution without falling into local optimums, which thus makes the convergence of GA faster.
Fig. 2.
Fig. 2. Numerical simulation of the GA with different decay ratios. (a) Mutation rates in each iteration of the GA with decay factors equal to 80, 400, and 1000, corresponding to high, medium, and low decay ratios. (b) The normalized convergence of GA and corresponding enhancement. The three stages of convergence rates are depicted by black dotted lines.
Fig. 3.
Fig. 3. Experiment setup. DMD: digital micromirror device (DLP4100, TI). M1, M2, M3: mirror. L1: lens (focal length = 300 mm). L2: lens (focal length = 300 mm). L3: lens (focal length = 75 mm). MS: microscope (M-5X, Newport). P: pinhole. SM: ground glass diffuser (DG10-120, Thorlabs). BS: beam splitter 90:10. CCD: CCD camera (CS2100M, Thorlabs). APD: Si avalanche photodetector (APD130A2, Thorlabs). DAQ: data transition box (DT9834). ADC: analog to digital convert (LinOptx Digitizer v1.0). FPGA: Virtex-5 (Xilinx). The coherent 532 nm laser beam is expanded by a microscope and collimated by L1. The DMD is imaged to the front surface of scattering media by a 4-f system. A photodetector and a camera collect the scattered light behind scattering media. The voltage signal from the photodetector is split; one connects to the PC control module (DAQ and PC) shown as a green dotted line for the experimental validation of the GA. After the GA validation, the signals are directed to the FPGA control module (ADC and FPGA) shown as a red dotted line for testing the FPGA configuration. The experiment setup has two functions; the first is to validate the effect of decay ratio in an experimental environment. The second is to test the design of the FPGA program and visualize the performance later.
Fig. 4.
Fig. 4. Experimental effect for three decay ratios. The parameters of the GA are same as numerical simulation. Normalized convergence and corresponding enhancement during the measurement. The black dotted lines depict three stages of improved speed for each decay ratio setting.
Fig. 5.
Fig. 5. Convergence-efficiency functions for the simulations and experiments. Red, blue, and green dots correspond to high, medium, and low decay ratios, respectively, for simulation results. Red, blue and green lines represent the experimentally measured convergence-efficiency functions for high, medium and low decay ratios. The black dotted line indicates the maximal point of function in the experiments and simulations.
Fig. 6.
Fig. 6. Architecture in the FPGA design. PLL generates 50 MHz clocks for top-level state machine, 200 MHz for lower-level state machine. Dual port BRAM is an internal memory space that stores one DMD mask. DMD interface controls the displaying of masks when the full mask in size of $1024 \times 768$ is already loaded. ADC interface acquires digitalized data in high-speed sampling mode. DDR interface controls the write/read operations between FPGA and DDR memory.
Fig. 7.
Fig. 7. Workflow of generation of one offspring mask. In the beginning of program, it generates initialization parent population in $43\mathrm{\mu s}$. The FPGA program starts to iterate after initialization, from where the offspring mask is constructed based on parent masks. The new offspring mask would be displayed on DMD and written to DDR, in the meanwhile, digitalized data from ADC is acquired and ranked. The process for one offspring mask cost around $500\mathrm{\mu s}$ to complete. One iteration for processing of population of 16 is then finished in 8 ms.

Equations (4)

$$\Upsilon = |R_k - R_{k + 1}|$$
$$R = (R_0 - R_{end}) \times \exp(-k/D) + R_{end}$$
$$\eta = \frac{\xi_k / \xi_{N_g}}{k / N_g}$$
$$R = \begin{cases} R_0 - c \times (k - 1), & R > R_{end}\\ R_{end}, & R \le R_{end} \end{cases}$$