
Monocular kilometer-scale passive ranging by point-spread function engineering


Abstract

Standard imaging systems are designed for 2D representation of objects, while information about the third dimension remains implicit, as imaging-based distance estimation is a difficult challenge. Existing long-range distance estimation technologies mostly rely on active emission of a signal, which, as a subsystem, constitutes a significant portion of the complexity, size and cost of the active-ranging apparatus. Despite the appeal of alleviating the requirement for signal emission, passive distance estimation methods are essentially nonexistent for ranges greater than a few hundred meters. Here, we present monocular long-range, telescope-based passive ranging, realized by integration of point-spread-function engineering into a telescope, extending the scale of point-spread-function-engineering-based ranging to distances where it has never been tested before. We provide experimental demonstrations of the optical system in a variety of challenging imaging scenarios, including adverse weather conditions, dynamic targets and scenes of diverse textures, at distances extending beyond 1.7 km. We conclude with a brief quantification of the effect of atmospheric turbulence on estimation precision, which becomes a significant error source in long-range optical imaging.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

From microscopy to astronomy, depth estimation is a challenging task in optics due to the difficulty of encoding three-dimensional (3D) information in a two-dimensional (2D) image. Spanning scales from nanometers to light-years, the requirement for depth estimation, namely, ranging, covers a variety of applications and has yielded numerous imaging schemes. At the scale of tens of meters to kilometers, ranging is especially essential in the fields of autonomous vehicles, defense, and aircraft landing systems. Most common techniques make use of active measurements. Such measurements, e.g. radar or LIDAR, rely upon detection of a signal returning from the ranged objects. This requires emission of intense signals, often making the measurement system bulky and expensive, or unsuitable for some applications due to safety or source-detectability constraints. Additionally, active ranging that is based on individual lines of sight (e.g. LIDAR) requires scanning and may be rendered unreliable in the presence of even scarce amounts of dust particles in the air [1], due to back-scattering.

Passive ranging methods, on the other hand, benefit from relying solely on the incoming optical signal. Traditional methods such as depth from focus/defocus [2], or triangulation by stereo imaging [3], are usually limited in their application to rather short distances. In fact, to the best of our knowledge, such methods have not been applied at distances far surpassing 100-200 meters [4], although efforts in this direction have been made [5]. A fundamental limitation common to these methods stems from their dependence on highly accurate feature detection or blur estimation, which become unreliable at larger distances, where atmospheric turbulence (AT) comes into play. Another drawback associated with the focus/defocus approaches is that they require multiple frames under different camera settings, which usually involves mechanically moving parts. Stereo vision relies on two or more viewpoints [6], with challenges including the need for precise feature detection, the bulkiness of an apparatus with two optical systems, and stringent cross-channel registration requirements. A different approach for ranging at such scales makes use of signal degradation due to long propagation through the atmosphere – the effects of absorption, scattering and turbulence increase with distance, and thus hold useful information [7,8]. The downside of such approaches is that, due to unknown atmospheric properties, they require a priori information, such as the properties of the imaged scene or the existence of a reference object at a known distance.

Another approach for distance estimation is wavefront shaping, specifically, point-spread-function (PSF) engineering; it is implemented by modifying the light path to the sensor in order to change the optical system’s PSF [9–11], namely, the image that a point source generates in the detector plane. The modified PSF contains information on the curvature of the incoming wavefront, encoding the distance of the object. The versatility of the wavefront shaping approach lies in the ability to design the PSF per application – the PSF can capture distance for three-dimensional imaging [11–14], extend imaging depth of field [15], encode color [16–19], correct aberrations [20,21] and more.

Here, we present a monocular approach for long-distance ranging, based on PSF engineering in a telescope. Combining optical wavefront shaping with decoding algorithms enables us to demonstrate single-shot ranging of moving objects and real-world scenes at distances surpassing 1.7 km. We provide quantitative results obtained by analytical as well as deep neural-network based image analysis, and discuss the sources of error that bound system performance.

2. Materials and methods

2.1 Wavefront shaping and cepstrum analysis

Our optical system is based on a commercial telescope, augmented by a 4-f extension providing easy access to the Fourier plane. There, a diffractive optical element modulates the wavefront, consequently encoding depth information into the PSF (Fig. 1). Extraction of PSF information from a general, unknown continuous image is a difficult challenge due to the convoluted process of image formation. We explore two depth-decoding techniques: a center-of-mass (CoM) based analytical approach and a neural-net-based approach. The decoding method is inspired by previous related work by Berlich et al. [14], where the double-helix PSF [10,22] was used to estimate distance at short ranges (∼1 m). With the double-helix PSF, the camera measures a convolution between a sharp image and a PSF comprising two lobes whose orientation is determined by the distance of the imaged object (Fig. 1). This image can be regarded as one subject to an echo – a translated replica of the base signal. The decoding challenge is to recover the direction of the echo from the measured image, which encodes the object distance. A useful mathematical tool to identify echoes in a signal is the complex cepstrum [23] (denoted from now on simply as cepstrum), $\mathcal{C}$, defined as:

$$\mathcal{C} = \mathcal{F}^{-1}\!\left(\ln\left(\mathcal{F}(I)\right)\right)\tag{1}$$
where I is an image, $\mathrm{{\cal F}}$ is the Fourier-transform operator, and ln denotes the natural logarithm operator. Given an image containing a single echo of translation $\boldsymbol{{x_t}}$, the cepstrum would contain a train of impulses following the same direction as $\boldsymbol{{x_t}}$ [24]. Upon finding the direction of the impulses in the cepstrum, the distance of the object in the image can be resolved. In practice, typically only two lobes from the train are visible. In the analytical approach, distance estimation is achieved by locating each lobe’s center-of-mass, calculating lobe orientation and comparing with a distance-to-angle calibration curve. In the neural net-based decoding approach, a neural net is trained to receive the cepstrum and output object distance (see 2.4.2 and SI for details).
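For concreteness, the following numpy sketch implements the cepstrum of Eq. (1) and a naive reading of the echo direction. It uses the log-magnitude (real-cepstrum) variant and a simple ring search at an assumed lobe radius; both are simplifications of the full procedure detailed in Sec. 2.4.1, and the function names and parameter values are illustrative rather than those used in the paper.

```python
import numpy as np

def complex_cepstrum_2d(img, eps=1e-12):
    """Cepstrum of Eq. (1): C = F^-1( ln( F(I) ) ).
    The complex logarithm requires phase handling in general; here the simpler
    log-magnitude (real-cepstrum) variant is used, which is sufficient to
    reveal the echo-induced impulses (a simplifying assumption)."""
    F = np.fft.fft2(img)
    C = np.fft.ifft2(np.log(np.abs(F) + eps))
    return np.fft.fftshift(C.real)

def echo_angle(cep, r_lobe, half_width=3):
    """Angle of the brightest off-center impulse lying on a ring of radius
    ~r_lobe pixels (the expected lobe separation). Illustrative only."""
    h, w = cep.shape
    y, x = np.mgrid[:h, :w]
    r = np.hypot(y - h / 2, x - w / 2)
    ring = np.abs(r - r_lobe) < half_width            # binary ring mask
    masked = np.where(ring, cep, -np.inf)
    y0, x0 = np.unravel_index(np.argmax(masked), cep.shape)
    return np.degrees(np.arctan2(y0 - h / 2, x0 - w / 2))
```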


Fig. 1. Optical setup and analysis pipeline. A. Optical setup, comprising a telescope and a subsequent 4-f system; WF – incoming light wavefront, T – telescope, BP – bandpass filter, L – lens, PM – phase mask. B. Simulated PSFs of the optical system at different distances. C. Analysis pipeline of distance estimation: (1) The cepstrum transform (CT) is applied on an ROI in the image. (2) The cepstrum is filtered to improve lobe detection. (3) A window is chosen around the maximal-value pixel, and center-of-mass (CoM) localization is performed inside it. (4) The angle of the CoM is obtained and used to estimate distance, based on a calibration curve (details in Materials and Methods). D. Illustration of the alternative depth-decoding process by a neural network.


Note that the cepstrum is affected by scene content; this can be understood by examining two extreme cases – the first is an object containing no features at all – e.g. a plain wall or clear sky. Any modification to the PSF of such a scene would introduce no change to the measured image, thus we will not observe any impulses in the cepstrum. The second extreme case is a signal that contains repetitive features in a distinct direction, such as a fence or window shutters. Such features in the image appear as an echo which will dominate the PSF-related impulse and could lead to incorrect distance estimation.

2.2 Optical system

We implemented a PSF-engineering optical system comprising a telescope objective lens and a 4-f extension that provides access to the Fourier plane, where a diffractive optical element modulates the wavefront. A large light-gathering area is essential in order to maximize the information carried by the wavefront, thus we used an off-the-shelf telescope (Astromaster 102AZ, aperture diameter: 102 mm, f = 660 mm) as the objective lens. We removed the eyepiece module and placed an optical 4-f extension instead, held onto the telescope by a custom-made 3D-printed mount. In the 4-f extension, we used a pair of CCTV lenses (Kowa LM50FC24M, f = 50 mm), as we found standard achromatic lenses to suffer from a significant FOV-dependent aberration, which appears to be Petzval field curvature. In the Fourier plane of the extension, we placed a photolithography-etched phase mask. Additionally, we placed a spectral bandpass filter (Chroma AT565/30) to optimize PSF-engineering performance. Images were captured by a FLIR Grasshopper3 camera (GS3-U3-120S6M-C).

2.3 Phase mask design

The phase mask of choice was inspired by previous work by Berlich et al. [14], where the double-helix PSF [10] was used to estimate distance in a macroscale scene. The goal PSF comprises two lobes (ideally, two delta functions), rotating about their middle point as a function of object distance (Fig. 1.B). To produce such a PSF, a proper phase mask needs to be designed, manufactured and placed in the optical system. To design the phase mask, a synthetic set of 2D realizations of the desired PSF was generated, with each realization corresponding to a defined object distance. In line with the PSF requirement, each realization comprised a pair of 2D Gaussians separated by a fixed distance and oriented at an angle depending on object distance. Specifically, the orientation angle changed linearly with one over the distance, which proved to produce the most efficient results, as expected from the properties of optical wavefront propagation. After generating the synthetic 2D PSFs and corresponding object distances, Vectorial Implementation of Phase Retrieval [25] was used to obtain the pupil phase that generates a PSF similar to the synthetic set. The obtained pupil phase describes the phase mask required for the desired PSF (the resulting phase is color-coded in the top-right circle in Fig. 1.A). Based on the optimized phase design, a quartz phase mask was fabricated by photolithography, as described previously [19,26].
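The synthetic PSF stack used as the design target can be sketched as follows; the lobe separation, grid size, Gaussian width and distance range below are placeholders, and the angle scaling is only illustrative of the one-over-distance rule rather than the actual design values.

```python
import numpy as np

def target_psf(z, grid=64, sep=10.0, sigma=1.5, z_ref=300.0):
    """Synthetic two-lobe PSF for object distance z [m].
    The lobe-pair orientation varies linearly with 1/z, per the design rule in
    the text; sep, sigma and z_ref are illustrative values, not the paper's."""
    theta = np.pi * (z_ref / z)                 # angle proportional to 1/z
    yy, xx = np.mgrid[:grid, :grid] - grid / 2
    psf = np.zeros((grid, grid))
    for s in (+1, -1):                          # two lobes mirrored about the center
        cx = s * 0.5 * sep * np.cos(theta)
        cy = s * 0.5 * sep * np.sin(theta)
        psf += np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * sigma ** 2))
    return psf / psf.sum()

# Stack of target PSFs that would be fed to the phase-retrieval design step (e.g. VIPR [25]).
distances = np.linspace(300, 2000, 12)
targets = np.stack([target_psf(z) for z in distances])
```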

2.4 Image processing

2.4.1 Complex cepstrum – CoM-based analysis

The cepstrum domain is degraded by the unknown object feature content, imaging noise and the shape of the PSF, which comprises two extended lobes rather than a pair of highly localized impulses. This required us to perform a series of steps in order to accurately locate the impulses in the cepstrum, as shown in Fig. 1.C. The process of obtaining the lobe angles is as follows: Initially, we apodize the ROI prior to cepstrum calculation, by multiplying the ROI by a Hann windowing function. This is done to reduce artificial vertical and horizontal lines in the cepstrum, which we wish to avoid.

Next, the cepstrum was calculated according to Eq. (1). Cepstrums from multiple images were then averaged, if needed (e.g. for system calibration). At this point, the cepstrum generally comprises the two lobes, in addition to lines that intersect the center in cases where continuous lines appear in the raw image. The goal is to find a proper window around one of the lobes, and to perform a CoM localization inside it. To avoid the center-intersecting lines, we multiplied the cepstrums by a weight matrix W, which enhances regions that have strong gradients in the radial direction, while suppressing regions of strong tangential gradients. To generate W, the cepstrum is blurred (Gaussian filtering with σ = 2 for all analyses except the drone, for which σ = 3) and the gradients (both magnitude, ${G_m}$, and direction, ${G_\varphi }$) are calculated. ${G_m}$ is smoothed by a median filter, and the projection of the gradient on the radial direction is raised to a high power α, to obtain our weight function:

$$W_\rho(\rho,\varphi) = \cos^{\alpha}\!\left(\varphi - G_\varphi(\rho,\varphi)\right)\tag{2}$$
where $(\rho ,\varphi )$ are the radial and angular polar coordinates of the matrix, and ${W_\rho }$ is a weight matrix based on the gradients’ radial projection. Throughout the analysis, we used α = 4. The motivation for using ${W_\rho }$ is to suppress regions where intense gradients are mainly tangential, as in the case of the center-intersecting lines. The final weight matrix W is obtained by multiplying the gradient magnitude matrix ${G_m}$ by ${W_\rho }$, to maintain regions containing strong gradients in the radial direction, and applying a Gaussian blur. Once W is generated, it multiplies the original cepstrum (element-wise), and the product is further blurred. The result is multiplied by a binary map of a ring around the center, to avoid any high-intensity pixels that are not at the correct distance from the cepstrum center. Finally, the maximal pixel $({x_0},{y_0})$ is chosen as the center of the window. The ring-shaped map is defined to be 6 pixels wide around the expected radius of the lobe, which is constant for all distances as determined by the PSF. For CoM calculation, the original cepstrum is smoothed by a median filter, then a window about $({x_0},{y_0})$ is cropped. Inside it, a threshold is applied to reduce the sensitivity of the localization to the choice of window position [27]. Finally, the CoM is calculated inside the window, yielding a value of θ, the angle at which the lobe is centered; if the CoM is more than 1 pixel away from the window center, the window center is updated according to the result and the CoM is recalculated in the new window.
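A compact sketch of this lobe-finding procedure is given below; it uses the stated σ and α values but otherwise illustrative window, ring and threshold choices, and omits border handling and the window-update step, so it is a simplified stand-in for the paper's implementation rather than a reproduction of it.

```python
import numpy as np
from scipy import ndimage

def lobe_angle(cep, r_lobe, alpha=4, win=9, ring_width=6):
    """CoM-based lobe-angle estimate following Sec. 2.4.1 (simplified sketch)."""
    h, w = cep.shape
    yc, xc = h / 2, w / 2
    yy, xx = np.mgrid[:h, :w]
    phi = np.arctan2(yy - yc, xx - xc)            # polar angle of each pixel
    rad = np.hypot(yy - yc, xx - xc)

    # Gradients of a blurred cepstrum: magnitude Gm and direction Gphi.
    blurred = ndimage.gaussian_filter(cep, sigma=2)
    gy, gx = np.gradient(blurred)
    Gm = ndimage.median_filter(np.hypot(gy, gx), size=3)
    Gphi = np.arctan2(gy, gx)

    # Radial-projection weight of Eq. (2); even alpha suppresses tangential gradients.
    Wrho = np.cos(phi - Gphi) ** alpha
    W = ndimage.gaussian_filter(Gm * Wrho, sigma=2)

    # Restrict the peak search to a ring at the expected lobe radius.
    ring = np.abs(rad - r_lobe) < ring_width / 2
    score = ndimage.gaussian_filter(W * cep, sigma=1) * ring
    y0, x0 = np.unravel_index(np.argmax(score), score.shape)

    # Thresholded center-of-mass in a window around the peak (no border handling here).
    sm = ndimage.median_filter(cep, size=3)
    patch = sm[y0 - win:y0 + win + 1, x0 - win:x0 + win + 1]
    patch = np.clip(patch - np.percentile(patch, 75), 0, None)   # illustrative threshold
    cy, cx = ndimage.center_of_mass(patch)
    return np.degrees(np.arctan2(y0 - win + cy - yc, x0 - win + cx - xc))
```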

2.4.2 Complex cepstrum – neural-network-based analysis

In addition to the analytical approach, we applied a convolutional neural network to decode distance from the cepstrum. Network training was performed using cepstrums of simulated images, which were generated in four steps: 1. convolving a sharp telescope image with a PSF obtained from a revised imaging model (corresponding to a known distance), 2. applying an apodization to the convolved image to suppress the cross-shaped lines in the cepstrums, 3. calculating the cepstrum; and finally, 4. removing the cepstrum center to solely address the side lobes, which contain the meaningful information. Critically, for the network to correctly estimate distance from real data, high consistency must be achieved between the simulated PSF and the actual PSF of the optical system. The latter is expected to differ from the theoretical one (obtained from the phase-mask design) due to phase-mask manufacturing errors, misalignments and optics-induced aberrations. The degraded experimental PSF requires a revised optical system characterization, which we obtain via a phase retrieval (PR) process [28] (specifically, the Misell algorithm [29]).

For the PR process, the PSF was measured at 12 known distances, by imaging a bright point-like light source under minimal turbulence conditions. To better account for nonzero turbulence, 100 frames were acquired per distance, producing multiple slightly perturbed realizations of the optical system’s PSF. In brief, the PR process involves iterative estimations of the wavefront at the pupil plane of the 4-f extension (the plane of the phase mask) and at the image plane. Using the optical model and measured PSFs, the phase at the pupil plane is iteratively updated until it converges to the true pupil phase of the optical system, which fully describes the PSF. Per iteration, a random set of frames (one frame per distance, out of the 100 acquired) was used, in order to average out the effect of random turbulence on the retrieved phase. We have found, both in simulations and in experiments, that as long as turbulence severity is below a certain level, PR converges to a stable solution. More details on PR are provided in the SI.
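A schematic, simplified scalar version of such a multi-distance PR loop is sketched below; the propagation model (a single Fourier transform plus a known per-distance defocus phase) and the phase-averaging update are assumptions for illustration, and the random per-iteration frame selection described above is omitted. The actual implementation is described in the SI.

```python
import numpy as np

def retrieve_pupil_phase(psf_stack, defocus_stack, aperture, n_iter=200):
    """Misell-style phase retrieval (schematic): recover a shared pupil phase from
    PSFs measured at several known distances. `defocus_stack` holds the known
    defocus phase map of each measurement, `aperture` is the binary pupil support,
    and the PSFs are assumed background-subtracted, normalized and stored in
    unshifted FFT ordering. A simplified scalar sketch, not the paper's code."""
    phase = np.zeros(aperture.shape)
    amp = aperture.astype(float)
    for _ in range(n_iter):
        updates = []
        for psf, defoc in zip(psf_stack, defocus_stack):
            pupil = amp * np.exp(1j * (phase + defoc))
            field = np.fft.fft2(pupil)                           # pupil -> image plane
            field = np.sqrt(psf) * np.exp(1j * np.angle(field))  # enforce measured amplitude
            back = np.fft.ifft2(field)                           # image -> pupil plane
            updates.append(np.angle(back) - defoc)               # remove the known defocus
        # Circular mean of the per-distance phase estimates, restricted to the support.
        phase = np.angle(np.mean(np.exp(1j * np.array(updates)), axis=0)) * aperture
    return phase
```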

The neural net includes a feature detection part and a regression part. The former is used to obtain higher-level features of the input cepstrum, e.g. lobe positions. It consists of four convolution blocks, each made up of three consecutive convolution operations, each followed by batch normalization and a ReLU nonlinearity, with max pooling at the end. The outputs of these four blocks have 32, 64, 128 and 256 channels, respectively. A global pooling is then applied to reduce both the height and width to 1. The regression part is a fully connected layer with a sigmoid activation function, aiming to predict distance from the higher-level features. We apply the relative L1 norm as the loss function:

$$\mathrm{Loss} = \frac{1}{N}\sum_{i = 1}^{N} \frac{|\widehat{y}_i - y_i|}{y_i + \varepsilon}\tag{3}$$
where ${\widehat y_i},{y_i}$, and $N$ are the network prediction, ground truth, and batch size (128), respectively, and $\varepsilon = 1$ is a regularization constant.
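A PyTorch sketch of this architecture and loss follows. The single-channel input, the use of average pooling for the global-pooling step, and the normalization of target distances to the sigmoid output range are assumptions not specified in the text; class and function names (ConvBlock, CepstrumRegressor, relative_l1) are illustrative.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Three conv+BN+ReLU layers followed by max pooling (per Sec. 2.4.2)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        layers = []
        for i in range(3):
            layers += [nn.Conv2d(c_in if i == 0 else c_out, c_out, 3, padding=1),
                       nn.BatchNorm2d(c_out), nn.ReLU(inplace=True)]
        layers.append(nn.MaxPool2d(2))
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)

class CepstrumRegressor(nn.Module):
    """Feature extractor (4 blocks, 32/64/128/256 channels) + global pooling
    + fully connected layer with sigmoid, predicting a normalized distance."""
    def __init__(self):
        super().__init__()
        chans = [1, 32, 64, 128, 256]
        self.features = nn.Sequential(*[ConvBlock(chans[i], chans[i + 1])
                                        for i in range(4)])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(256, 1), nn.Sigmoid())

    def forward(self, x):
        return self.head(self.pool(self.features(x)))

def relative_l1(pred, target, eps=1.0):
    """Relative L1 loss of Eq. (3)."""
    return torch.mean(torch.abs(pred - target) / (target + eps))
```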

2.5 Data acquisition

2.5.1 Car distance measurement

Data was acquired on March 1st, 2022, at 16:00. Temperatures were around 17.5°C, with winds of around 10 km/h. The exposure time was 120 ms, to account for low-light conditions as the car was driving westward at sunset, with 200 ms between frames.

2.5.2 Drone distance measurement

Data was acquired on April 19th, 2022, near Haifa, in the morning. The weather was hot (∼31°C) with moderate winds (∼13 km/h). Exposure times were between 6.5 ms and 11 ms, depending on the sky brightness due to clouds. Viewed from the front, the size of the drone is 31.8 × 26.7 cm². Outliers were excluded using the mean absolute deviation (Matlab isoutlier function default); no more than 10 outliers were excluded per distance.

2.5.3 Static scene – clear day

Data was acquired on the 25th of March 2022 at noon. Temperatures were around 12°C, with winds of 18 km/h. Exposure times were determined by object brightness (albedo and illumination). Exact values are specified in Table S1. Outliers were excluded using the mean absolute deviation, and are shown in Fig. 3.C.

2.5.4 Static scene – dusty weather

Data was acquired on the 25th of April 2022 in the afternoon. The temperature was 21.7°C with low winds (<5 km/h). The exposure time was 60 ms, with a gain of 5 dB.

2.5.5 Turbulence measurements

Data was acquired on April 18th, 2022, at time intervals of approximately one hour, from 10:00 to 20:00. Temperatures that day ranged from 18.7 to 34°C, with wind speeds peaking at 34 km/h.

2.6 System calibration

Calibration in our context is the process of establishing the relation between the lobe angle in the cepstrum and object distance. It was done by imaging a set of scenes containing objects whose distance is well known across the sensor (e.g. large walls with a planar depth map). For calibration, a large number of frames was acquired per scene. Each scene was divided into a grid of partially overlapping ROIs, with a single ground truth (GT) distance assigned to each ROI. Next, cepstrums were calculated per ROI, per frame, and then averaged in non-consecutive groups of 5 to improve the contrast prior to lobe seeking. After obtaining the mean PSF angle per ROI in each of the scenes, a calibration between angle and distance was fitted using the ROI lobe angles and GT distances (see next paragraphs). Three fitted parameters per ROI determine the local relation between PSF angle and distance. As a final step, second-degree 2D polynomials were fitted to each of the parameters across the FOV, to generate FOV-dependent calibration curves.

The relation between the PSF angles in the cepstrum and distance, per ROI, was established experimentally by fitting local curves following the theoretical relation between these quantities. This relation is based on the premise that the angle of the PSF is inversely proportional to the distance, according to our phase-mask design.

This premise leads to a generalized relation between the angle $\theta $ and distance $z$,

$$\theta = a + \frac{b}{z - c}\tag{4}$$
where $a,b,c$ are the fit parameters: a represents the lobe angle at $z \to \infty $, b defines the sensitivity of the angle to a change in distance, and c is the distance corresponding to the asymptote $\theta \to \infty $. Rewriting for z, we get:
$$z = c + \frac{b}{\theta - a}\tag{5}$$
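In practice, the per-ROI calibration can be obtained by a standard nonlinear fit of Eq. (4) and inverted via Eq. (5); the following sketch uses scipy.optimize.curve_fit with illustrative initial guesses, and omits the subsequent FOV-dependent polynomial fitting of the parameters.

```python
import numpy as np
from scipy.optimize import curve_fit

def angle_model(z, a, b, c):
    """Eq. (4): theta = a + b / (z - c)."""
    return a + b / (z - c)

def fit_calibration(z_gt, theta_meas):
    """Fit the per-ROI calibration parameters (a, b, c). Initial guesses are
    illustrative and would need tuning to the actual angle/distance ranges."""
    p0 = (np.median(theta_meas), 1e4, 0.0)
    (a, b, c), _ = curve_fit(angle_model, z_gt, theta_meas, p0=p0, maxfev=10000)
    return a, b, c

def angle_to_distance(theta, a, b, c):
    """Eq. (5): invert the calibration to estimate distance from a lobe angle."""
    return c + b / (theta - a)
```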

3. Results

3.1 Dynamic object distance estimation

To demonstrate the applicability of our approach to ranging a moving object, we imaged a car driving along a straight road extending nearly 800 m. Per frame, an ROI confined within the car boundaries was chosen. The ROI size was chosen to be 247 × 247 pixels, based on the apparent size of the car at its farthest position from the camera. As a rule of thumb, cepstrum analysis performed on larger ROIs reduces noise and leads to improved precision. We generated a calibration function relating the cepstrum lobe angle, which is the PSF angle, to distance, prior to the acquisition. The calibration was made in a field-of-view (FOV) dependent manner (see 2.6 for further details), to account for a minor FOV dependence of the PSF which was observed. We suspect this effect originates from the telescope lens, as we are pushing the viewing angles beyond the design FOV of the telescope. Using the calibration, the PSF angle of an ROI inside the car was used to estimate the distance of the car per frame, using both our analytical CoM method and the neural-net method. For validation, we evaluated the ground truth (GT) distance of the car per frame by applying prior knowledge, namely, knowing the “track width” of the car (defined as the distance between the back wheels) and that it moves away from the observer in a smooth fashion. Using the track width, the apparent magnification was calculated per frame, from which the distance was derived; a minimal sketch of this relation is given below. The error in GT values is estimated at 1.5%. Details of the GT evaluation process and a discussion of errors are in the SI.
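The GT relation itself is a simple pinhole-magnification calculation; the sketch below assumes the telescope's nominal 660 mm focal length, ignores the unit-magnification 4-f relay, and takes the sensor pixel pitch as a user-supplied parameter rather than asserting the camera's value.

```python
def distance_from_track_width(track_width_m, width_px, pixel_um, focal_mm=660.0):
    """Pinhole-model GT estimate: the apparent rear-track width (in pixels) gives
    the magnification, hence the distance z ~ f * W / w_sensor. The 660 mm focal
    length is the telescope's nominal value; the pixel pitch and the neglect of
    the (unit-magnification) 4-f relay are simplifying assumptions."""
    w_sensor_mm = width_px * pixel_um * 1e-3    # apparent width on the sensor [mm]
    return track_width_m * focal_mm / w_sensor_mm   # distance in meters
```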

The results in Fig. 2 show that the passive-ranging estimation agrees with the GT, while the absolute error increases with distance. Throughout the motion, the single-shot error (deviation from GT) was calculated; the average error was found to be 3.4% of GT for the CoM analysis and 6.0% for the neural net, and typically remains below 10% for the duration of the motion. We observe a decrease in precision with increasing distance. This is due to the reduced sensitivity of wavefront curvature to distance, and plausibly due to an increased contribution of AT as light travels a larger distance through the air.


Fig. 2. Results of distance estimation of a driving car. A.-C. (Right) Example frames. The analyzed ROI is highlighted in blue. (Top left) The ROI. (Bottom left) A close-up of the cepstrum of the ROI. The center is darkened for improved visibility. The angle obtained from CoM localization of the lobe is expressed by a line from the center. D. PSF-based estimations compared against a curve generated by applying priors on the data. The results can be seen in detail in Visualization 1. E. Snapshots of a flying drone at different distances. F. The results of drone distance estimation. GT is obtained from the drone’s built-in GPS (assumed precision of ∼5 m). The standard deviation of distance estimation is based on estimation from 100 individual frames.


We also demonstrated dynamic aerial-object ranging, by measuring the distance to a drone (DJI Mavic Mini) at several distances of up to 1.08 km. Per distance, 100 frames were acquired. At large distances, the object was smaller than the PSF lobe separation, thus it appeared as two objects, shifted at the PSF angle corresponding to the distance (Fig. 2.G). By employing the cepstrum approach, the PSF angles, and hence the distances, were extracted.

3.2 Static object distance estimation

Another appealing use of our method is passive, single-shot ranging of static objects. To demonstrate this application, we acquired images centered on 27 different objects (e.g. trees, walls, cars) spread across a large view (Fig. 3.A). Per object, 100 frames were acquired for the purpose of precision analysis. Range estimation was performed using the analytical method, and the results were evaluated by estimation precision and by the difference from the GT values. The GT distance to objects closer than 1,000 m was measured using a laser rangefinder. Beyond the applicable range of the laser rangefinder, GPS was used for distance determination. We estimate the precision of the rangefinder to be ∼1 m, based on a discretization behavior it exhibited, and the precision of our GPS-based determination method to be 5 m. Exposure times were determined per frame to maximize image contrast while avoiding saturation (Table S1). Per frame, the cepstrum was calculated from an ROI of 401 × 401 pixels, and the orientation of the PSF was extracted. Of the 27 scenes, a subset was used for calibration (every 3rd scene by order of acquisition, not displayed in Fig. 3), while the other scenes were used to test the ranging performance. For each calibration scene, the average cepstrum angle of 100 frames was used to generate a calibration curve (Fig. S4).


Fig. 3. Distance estimation of static scenes. A. A color image of the view containing the targets used for distance estimation. The ROI of each target is marked by a square. B. Results of distance estimation per object, compared to the GT values from the rangefinder or GPS. Error bars indicate the standard deviation in distance estimation, from 100 frames. C. Zoom-ins on the ROIs, and their cepstrums. Left images – ROI of each object, as acquired by the PSF-engineering system (first frame is shown). Right images – the average cepstrum of 100 frames, with blue lines indicating the estimated lobe angle obtained per frame; yellow lines mark excluded outliers. Note that the averaged cepstrum is symmetric, so the area covered by the lines is mirrored around the center of the cepstrum. D. Example scene acquired in dusty atmosphere. E. The image of the scene as acquired by the PSF-engineering setup. ROI zoom-ins and their corresponding cepstrums are shown to the right. GT values of the ROIs are 517 m (orange) and 1,130 m (blue), measured with the rangefinder on a clear day. F. Depth map of the scene obtained by patch-based cepstrum analysis. The pixels of the depth map are smaller than the corresponding ROI sizes due to the partial overlap of ROIs.


To improve the calibration process, we reduced the effect of unknown scene features on the cepstrum, before lobe determination. We did so by measuring in parallel an unmodulated, focused image, calculating the average cepstrum of 10 such frames, and subtracting it from the calibration cepstrums. Note that this parallel acquisition was performed only during the calibration step; all subsequent range estimations were done using a single modulated image. Additionally, cepstrums in the calibration data were averaged (in non-consecutive groups of 5) to improve the contrast before finding the lobe angles. A calibration function was then fitted, relating lobe angle to GT distance, which was eventually used for distance estimation.

The results of the 18 scenes comprising the test data are presented in Fig. 3 and Table S1. We observed that the lobes in the cepstrum remain clearly visible regardless of distance, but are strongly affected by the texture of the scene. The precision of the estimation is reduced as distance increases, mostly due to the decreased sensitivity of the wavefront curvature to distance. We observe bias in several objects; this is primarily due to the contribution of the object features to the cepstrum, which may “pull” the lobes’ CoM in some direction. The effect is most significant in images that contain repetitive features or continuous lines, such as tiled walls or windows. Generally, if there is a repeating feature in a scene (or a continuous line) whose repetition angle is close to the direction of the PSF orientation, it will bias the CoM localization, affecting the estimated lobe angle. This can be observed in scene K, where a vertical line in the scene overshadows the near-vertical orientation of the cepstrum lobes. In other cases, the cepstral contribution of lines can be ignored if it is sufficiently separated from the PSF-induced lobes in the cepstrum (e.g. scenes M, Q).

Continuous lines in the image yield lines that cross the center of the cepstrum and decay in intensity as the radius increases. The appearance of such lines is not uncommon in urban scenes, so we handle these cases algorithmically to detect only the lobes (see 2.4.1). The results show consistency of the angle-distance relation, which is maintained over a large range. The precision of the results, calculated from the standard deviation of the angle estimations, is given by the error bars in Fig. 3.B. Another interesting scene is L, containing a concrete wall with essentially no features. This is an extreme example demonstrating the limitation of PSF engineering for ranging objects with little or no texture, as hardly any features are seen in the cepstrum.

A major advantage of our approach over active range-finding is robustness to back-scatter from nearby scatterers, e.g. dust particles. To validate this, we examined the performance of the system under a severe presence of dust in the air. We acquired images of a scene spanning a broad range of distances, and analyzed the depth map of the scene. We did so by analyzing partially overlapping ROIs (ROI size of 401 × 401 pixels, with a shift of 125 pixels between adjacent ROI centers), as sketched below. To account for the poor contrast caused by the dust, cepstrums from 5 consecutive frames were averaged before lobe finding. Once found, the PSF angles were translated to distance using a calibration curve. The cepstrums are displayed in Visualization 2. Lastly, a median filter was applied to eliminate outliers and obtain a depth map of the scene (Fig. 3.F). Importantly, under such a heavy presence of dust in the air, the laser rangefinder failed to measure any target farther than the tree to the left (reading 458 m), repeatedly returning values of 20-30 m (Visualization 3). Similar results showing robustness to dust were obtained imaging a driving truck (Visualization 4).
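A minimal sketch of this patch-based depth-map assembly, assuming an externally supplied per-cepstrum distance estimator (e.g. built from the lobe_angle and angle_to_distance sketches above), could look as follows; only the ROI size and step follow the values given in the text.

```python
import numpy as np
from scipy import ndimage

def _cepstrum(img, eps=1e-12):
    # Log-magnitude cepstrum, as in the earlier sketch of Eq. (1).
    return np.fft.fftshift(np.fft.ifft2(np.log(np.abs(np.fft.fft2(img)) + eps)).real)

def depth_map(frames, estimate_distance, roi=401, step=125, n_avg=5):
    """Patch-based depth map (dusty-scene analysis): average the cepstrums of
    n_avg frames per ROI, estimate a distance per patch with a user-supplied
    `estimate_distance(cepstrum)` callable, then median-filter the map."""
    h, w = frames[0].shape
    ys = list(range(0, h - roi + 1, step))
    xs = list(range(0, w - roi + 1, step))
    dmap = np.zeros((len(ys), len(xs)))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            ceps = [_cepstrum(f[y:y + roi, x:x + roi]) for f in frames[:n_avg]]
            dmap[i, j] = estimate_distance(np.mean(ceps, axis=0))
    return ndimage.median_filter(dmap, size=3)   # suppress per-patch outliers
```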

3.3 Atmospheric turbulence and precision

Fluctuations in the temperature and pressure of the air, namely, AT, induce random changes in the refractive index of the medium, distorting the wavefront. The effect of AT is clearly visible in long-distance (generally >100 m) imaging, both in traditional in-focus imaging and in our PSF-engineered setup. In traditional imaging, the contribution of AT consists of local translations across the image, accompanied by blur due to the degraded PSF (Visualization 5). In our optical setup, the degraded wavefront (thus PSF) may also lead to false distance estimation.

We believe that the main contribution to distance estimation error due to AT arises from a wavefront curvature aberration (defocus); other aberrations deform the lobes, introducing noise by lowering contrast. The error introduced by AT is the one we are most concerned with, and we find it to be the bottleneck of system performance. In order to evaluate the effect of turbulence on distance estimation, we imaged a point light source from a distance of 670 m. The acquisition was made on a hot day, when peak turbulence was expected to be severe. Each hour, 1,000 frames were acquired under multiple exposure and gain settings, and in each frame the angle of the PSF was calculated from the CoM positions of the two PSF lobes. The effect of AT is apparent when imaging such a solitary point source, which scintillates due to turbulence (Fig. 4.A).


Fig. 4. Turbulence effect on precision. A. The effect of turbulence on the PSF. PSFs from a point source acquired under mild (light blue, top row) and intense (red, bottom row) turbulence conditions (data from 20:00 and 12:00, respectively). The first 3 frames of each dataset were colored and overlaid to emphasize the effect. The average of all 1,000 frames is shown in grayscale in the rightmost image. B. (Top) Density plot of the angle-estimation results from 1,000 frames at different times of the day. (Bottom) The standard deviation of the estimated angle, ${\sigma _\theta }$, per acquisition. C. The expected range of the system due to AT, at different turbulence conditions (defined by ${\sigma _\theta }$). D. Normalized temporal correlations of angle estimation, at the different times of the day (legend numbers denote acquisition times).


At around noon, when AT is at its peak, the effect on the angles was evident. This can be seen in the increased spread of the PSF angles, reaching a standard deviation of PSF angles (${\sigma _\theta }$) of almost 6.5° (Fig. 4.B). As AT-based error can be characterized by ${\sigma _\theta }$, it can be translated to precision in distance estimation using the system calibration. Figure 4.C shows the system’s single-shot ranging capabilities for a given tolerance of ranging precision (5% or 10%). Distance-determination precision can be improved by repeating the estimation over multiple frames; however, it is important to note that sufficient time is required between frames to obtain independence between the measurements, due to the finite correlation time of AT. We measured this effect by calculating the temporal correlation of the error at the different levels of AT (Fig. 4.D), and found that the correlation time is between 20 and 70 ms, defined as the lag at which the normalized correlation decreases to 0.5. Importantly, as one could expect, strong turbulence tends to feature shorter correlation times. The implication is that under strong AT, snapshots acquired with short time separation will improve ranging precision more than in the case of weak AT. This partially counteracts the limited precision. During the measurements, we did not observe any effect of exposure time or gain levels on the precision, reinforcing the assumption that in our acquisition regime, AT is the main cause of error, rather than limited photon count.
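The correlation-time estimate can be reproduced with a simple autocorrelation of the per-frame angle series, as in the following sketch; the biased autocorrelation estimator and the uniform frame-interval assumption are simplifications of the actual analysis.

```python
import numpy as np

def correlation_time(theta_series, dt, level=0.5):
    """Normalized temporal autocorrelation of the PSF-angle series and the lag
    at which it drops to `level` (0.5 in the text). dt is the frame interval in
    seconds; a simple biased autocorrelation estimate is used here."""
    x = np.asarray(theta_series, dtype=float)
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # lags 0..N-1
    ac = ac / ac[0]
    below = np.where(ac < level)[0]
    tau = below[0] * dt if below.size else np.inf       # first crossing of `level`
    return tau, ac
```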

4. Discussion

We have presented and experimentally validated long-distance passive ranging by PSF engineering. We have addressed the two most significant challenges of the method, which are 1) the dependence of distance-estimation quality on object features and 2) degradation due to AT. Regarding the former, we have employed specific filtration in the cepstrum domain to help avoid gross estimation errors. Despite this, in special cases (e.g. no texture or a repetitive one) the proposed method is still susceptible to errors. Importantly, such patterns are rarely found in typical real-world scenes, especially for dynamic objects, which are often the target for ranging applications (cars, planes, people, animals). The second challenge, AT, is the one we found to be the most influential on ranging performance. AT can be viewed as an inherent source of noise to the wavefront, which would degrade any passive ranging approach. Nevertheless, our results show that even under severe turbulence, good estimation precision (<5%) can be maintained at distances of several hundreds of meters. Clearly, under low turbulence conditions, the precision improves significantly, and in any case it can be further improved by employing multi-frame analysis. We speculate that turbulence is the main reason for the lack of passive rangers at long distances, as it precludes achieving the pixel-scale accuracy often necessary for such instruments [2].

Several aspects of our methodology and implementation could be further optimized. One of them is system construction; our embodiment was based on a custom 3D-printed mount, which connects the optics to the telescope. The use of PLA for this integral mechanical part has probably made the system subject to mild internal translations, requiring occasional recalibration. On several occasions we observed that the calibration had drifted, leading to biased results, especially when calibration could not be performed on-site. This meant the telescope needed to be mounted on and removed from a tripod and transported between sites, potentially affecting the calibration. Clearly, stronger materials should be used for the setup, although a more robust alternative would be to replace the 4-f system with a large phase mask placed in front of the aperture. Such a modification, however, comes with the challenge of producing a large phase mask; this may be impractical using commonly used fabrication methods such as nanolithography. Nevertheless, recent developments in diffractive optical element fabrication already enable simple custom-made phase masks on large scales, based on micron-scale 3D printing [30]. The use of a large mask at the objective lens would also enable better flexibility in the choice of magnification (thus FOV) than shown in this work.

As for data analysis, we have employed an algorithmic approach based on simple CoM analysis, and applied a neural network that was trained to handle cepstrums. While the analysis scheme is relatively simple, it can be significantly improved by incorporation of object (feature) detection, prior assumptions, usage of temporal information, Kalman filtering (for moving objects) and more. Also, one of the main advantages of PSF engineering is the flexibility in designing the system’s PSF. The analysis pipeline can improve results significantly by incorporating an initial optimal PSF-design step [26]. The PSF can feature improved compatibility with certain objects of interest (e.g. in an urban surrounding containing horizontal and vertical lines, the PSF may benefit from extending in the diagonal directions), or increased robustness to AT [31].

5. Conclusions

As experimentally demonstrated, PSF engineering can be employed for passive, prior-free ranging over long distances, exceeding 1.5 km. The current thorough demonstration of the method’s validity holds the potential for a variety of technological extensions and applications; notably, application-based PSF design, namely, algorithmically learning the optimal PSF for different ranging scenarios [20,21,26,32], is an exciting territory to be explored in this context. Atmospheric turbulence has been found to be the prime source of error, and can be dealt with by means of multi-frame analysis, constrained by precision and temporal-resolution requirements.

Funding

Israel Innovation Authority (69791); Israel Science Foundation (450/18); Horizon 2020 Framework Programme (802567).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. T. G. Phillips, N. Guenther, and P. R. McAree, “When the Dust Settles: The Four Behaviors of LiDAR in the Presence of Fine Airborne Particulates,” J. Field Robot. 34(5), 985–1009 (2017). [CrossRef]

2. Y. Y. Schechner and N. Kiryati, “Depth from defocus vs. stereo: how different really are they?” in Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No.98EX170) (IEEE Comput. Soc, n.d.), 2, pp. 1784–1786.

3. A. O’Riordan, T. Newe, G. Dooly, and D. Toal, “Stereo Vision Sensing: Review of existing systems,” in 2018 12th International Conference on Sensing Technology (ICST) (IEEE, 2018), pp. 178–184.

4. F. Blais, “Review of 20 years of range sensor development,” J. Electron. Imaging 13(1), 231 (2004). [CrossRef]  

5. B. Vanek, A. Zarandy, A. Hiba, P. Bauer, T. Tsuchiya, S. Nagai, and R. Mori, “Technical Note D4.7: Preliminary Validation Results of Vision-Based Navigation Techniques” (2018).

6. A. Saxena, J. Schulte, and A. Y. Ng, “Depth estimation using monocular and stereo cues,” in IJCAI Int. Jt. Conf. Artif. Intell., pp. 2197–2203 (2007).

7. S. G. Narasimhan and S. K. Nayar, “Vision and the Atmosphere,” Int. J. Comput. Vis. 48(3), 233–254 (2002). [CrossRef]  

8. C. Wu, J. Ko, J. Coffaro, D. A. Paulson, J. R. Rzasa, L. C. Andrews, R. L. Phillips, R. Crabbs, and C. C. Davis, “Using turbulence scintillation to assist object ranging from a single camera viewpoint,” Appl. Opt. 57(9), 2177 (2018). [CrossRef]  

9. G. E. Johnson, E. R. Dowski, and W. T. Cathey, “Passive ranging through wave-front coding: information and application,” Appl. Opt. 39(11), 1700 (2000). [CrossRef]  

10. A. Greengard, Y. Y. Schechner, and R. Piestun, “Depth from diffracted rotation,” Opt. Lett. 31(2), 181 (2006). [CrossRef]  

11. Y. Shechtman, S. J. Sahl, A. S. Backer, and W. E. Moerner, “Optimal Point Spread Function Design for 3D Imaging,” Phys. Rev. Lett. 113(13), 133902 (2014). [CrossRef]  

12. A. Levin, R. Fergus, F. Durand, and W. T. Freeman, “Image and depth from a conventional camera with a coded aperture,” ACM Trans. Graph. 26(3), 70 (2007). [CrossRef]  

13. S. Quirin and R. Piestun, “Depth estimation and image recovery using broadband, incoherent illumination with engineered point spread functions [Invited],” Appl. Opt. 52(1), A367 (2013). [CrossRef]  

14. R. Berlich, A. Bräuer, and S. Stallinga, “Single shot approach for three-dimensional imaging with double-helix point spread functions,” in Imaging and Applied Optics 2016 (OSA, 2016), p. CTh1D.4.

15. S. Tucker, W. T. Cathey, and E. Dowski Jr., “Extended depth of field and aberration control for inexpensive digital microscope systems,” Opt. Express 4(11), 467 (1999). [CrossRef]  

16. J. Broeken, B. Rieger, and S. Stallinga, “Simultaneous measurement of position and color of single fluorescent emitters using diffractive optics,” Opt. Lett. 39(11), 3352 (2014). [CrossRef]  

17. Y. Shechtman, L. E. Weiss, A. S. Backer, M. Y. Lee, and W. E. Moerner, “Multicolour localization microscopy by point-spread-function engineering,” Nat. Photonics 10(9), 590–594 (2016). [CrossRef]  

18. E. Hershko, L. E. Weiss, T. Michaeli, and Y. Shechtman, “Multicolor localization microscopy and point-spread-function engineering by deep learning,” Opt. Express 27(5), 6158 (2019). [CrossRef]  

19. N. Opatovski, Y. Shalev Ezra, L. E. Weiss, B. Ferdman, R. Orange-Kedem, and Y. Shechtman, “Multiplexed PSF Engineering for Three-Dimensional Multicolor Particle Tracking,” Nano Lett. 21(13), 5888–5895 (2021). [CrossRef]  

20. V. Sitzmann, S. Diamond, Y. Peng, X. Dun, S. Boyd, W. Heidrich, F. Heide, and G. Wetzstein, “End-to-end optimization of optics and image processing for achromatic extended depth of field and super-resolution imaging,” ACM Trans. Graph. 37(4), 1–13 (2018). [CrossRef]  

21. X. Dun, H. Ikoma, G. Wetzstein, Z. Wang, X. Cheng, and Y. Peng, “Learned rotationally symmetric diffractive achromat for full-spectrum computational imaging,” Optica 7(8), 913 (2020). [CrossRef]  

22. S. R. P. Pavani and R. Piestun, “High-efficiency rotating point spread functions,” Opt. Express 16(5), 3484 (2008). [CrossRef]  

23. D. G. Childers, D. P. Skinner, and R. C. Kemerait, “The cepstrum: A guide to processing,” Proc. IEEE 65(10), 1428–1443 (1977). [CrossRef]  

24. J. K. Lee, M. Kabrisky, M. E. Oxley, S. K. Rogers, and D. W. Ruck, “The complex cepstrum applied to two-dimensional images,” Pattern Recognit. 26(10), 1579–1592 (1993). [CrossRef]  

25. B. Ferdman, E. Nehme, L. E. Weiss, R. Orange, O. Alalouf, and Y. Shechtman, “VIPR: vectorial implementation of phase retrieval for fast and accurate microscopic pixel-wise pupil estimation,” Opt. Express 28(7), 10179 (2020). [CrossRef]  

26. E. Nehme, D. Freedman, R. Gordon, B. Ferdman, L. E. Weiss, O. Alalouf, T. Naor, R. Orange, T. Michaeli, and Y. Shechtman, “DeepSTORM3D: dense 3D localization microscopy and PSF design by deep learning,” Nat. Methods 17(7), 734–740 (2020). [CrossRef]  

27. H. C. van Assen, M. Egmont-Petersen, and J. H. C. Reiber, “Accurate object localization in gray level images using the center of gravity measure: accuracy versus precision,” IEEE Trans. Image Process. 11(12), 1379–1384 (2002). [CrossRef]  

28. J. R. Fienup, “Reconstruction of an object from the modulus of its Fourier transform,” Opt. Lett. 3(1), 27–29 (1978). [CrossRef]  

29. Y. Xu, Q. Ye, A. Hoorfar, and G. Meng, “Extrapolative phase retrieval based on a hybrid of PhaseCut and alternating projection techniques,” Opt. Lasers Eng. 121, 96–103 (2019). [CrossRef]  

30. R. Orange-Kedem, E. Nehme, L. E. Weiss, B. Ferdman, O. Alalouf, N. Opatovski, and Y. Shechtman, “3D printable diffractive optical elements by liquid immersion,” Nat. Commun. 12, 3067 (2021). [CrossRef]  

31. X. Liu and J. Pu, “Investigation on the scintillation reduction of elliptical vortex beams propagating in atmospheric turbulence,” Opt. Express 19(27), 26444 (2011). [CrossRef]  

32. E. Nehme, B. Ferdman, L. E. Weiss, T. Naor, D. Freedman, T. Michaeli, and Y. Shechtman, “Learning Optimal Wavefront Shaping for Multi-Channel Imaging,” IEEE Trans. Pattern Anal. Mach. Intell. 43(7), 2179–2192 (2021). [CrossRef]  

Supplementary Material (6)

NameDescription
Supplement 1       SI
Visualization 1       Distance estimation of the car from Fig. 2, per frame.
Visualization 2       ROIs and cepstrums used for ranging the dusty-air scene of Fig. 3.
Visualization 3       Ranging by the rangefinder, in the dusty-air scene. It successfully ranges a few targets that are < 500 m away, and for the rest, it reads 26.8 m, as it is obscured by the dust. On a clear day, the rangefinder easily ranges the closer targets, and occ
Visualization 4       Truck distance estimation, with simultaneous readings from the rangefinder. Data was acquired on the 10th of April, at 08:00. Temperatures were around 19.5°C and winds of around 10 km/h. Exposure was 15 ms, with 200 ms between frames. A CCD camera (N
Visualization 5       Example of the effect of turbulence. A school banner is imaged by the telescope from 1,650 m away (sharp PSF, without PSF engineering), in days of low and high atmospheric turbulence.
