
Bio-inspired foveal super-resolution method for multi-focal-length images based on local gradient constraints

Open Access

Abstract

Most existing super-resolution (SR) imaging systems, inspired by the bionic compound eye, utilize image registration and reconstruction algorithms to overcome the angular resolution limitations of individual imaging systems. This article introduces a multi-aperture multi-focal-length imaging system and a multi-focal-length image super-resolution algorithm, mimicking the foveal imaging of the human eye. Experimental results demonstrate that with the proposed imaging system and an SR imaging algorithm inspired by the human visual system, the proposed method can enhance the spatial resolution of the foveal region by up to 4 × compared to the original acquired image. These findings validate the effectiveness of the proposed imaging system and computational imaging algorithm in enhancing image texture and spatial resolution.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The angular resolutions of most commercial imaging systems are limited primarily by pixelation effects rather than by diffraction or optical aberrations. Therefore, super-resolution (SR) techniques have begun to receive attention as methods to obtain images with higher spatial resolutions. Existing image SR technology can be mainly classified into multi-image SR (MISR) techniques [1] and single-image SR (SISR) techniques [2].

When the information frequency of the target scene is higher than the sampling frequency of a single detector, the high-frequency content of the scene degenerates into aliasing, and high-resolution (HR) imaging cannot be achieved. Combined with an MISR imaging algorithm, the aliasing information contained in the multi-frame image data collected by a camera array can be reconstructed as high-frequency information of the target scene [3-12]. The maximum likelihood estimation SR method based on L2 regularization (L2TVSR) [13] can quickly restore the high-frequency information of images. The SR method based on bilateral total variation regularization (L1BTVSR) [14] can preserve the edge information of images. The SR method based on the Lucy–Richardson method (LRSR) [8] extends the LR denoising algorithm to image restoration. Single-frame SR methods based on deep learning have a wide range of applications and do not necessarily require complex, specially designed optical imaging systems, needing only images acquired by commercial cameras for SR imaging (such as the blind SR generative adversarial network, BSRGAN) [15-20]. However, single-frame SR algorithms require large amounts of prior information: training image datasets are needed to learn the image degradation models. Therefore, due to the limitations of image datasets, the tasks that SISR methods can handle may be limited to certain application scenarios (such as face SR). To overcome these limitations, researchers use high-resolution reference (HRef) images to enhance the generalization performance of the model. Accordingly, self-supervised dual zoomed SR (selfDZSR) was proposed as a dual-camera fusion super-resolution technique based on self-supervised learning [21]. This technique combines the wide field of view of short-focus cameras with the high-resolution characteristics of long-focus cameras, optimizing the imaging quality of low-resolution (LR) short-focus images by referencing the high-resolution details in long-focus images.

The primary advantage of camera arrays is their ability to overcome the limitations of pixelation effects and increase the angular resolution of images, enabling clearer and more detailed image capture. Inspired by insects, traditional camera arrays have lenses and detectors evenly distributed across the field of view. They achieve super-resolution images by reconstructing non-redundant information from multiple frames of the same object. However, in many applications, high angular resolution in imaging is not the only requirement; a wide field of view is also necessary to provide context and situational awareness [10,22-24].

In comparison to traditional imaging systems, foveal imaging systems offer a broader field of view while preserving high resolution through a non-uniform sampling density, enriching images with richer contextual information and situational awareness. This is akin to the human eye, as illustrated in Fig. 1. In 1990, Tistarelli et al. [25] introduced a foveal imaging charge coupled device (CCD) sensor. It was segmented into 30 concentric rings, with pixel sizes increasing from 30 $\mu$m to 412 $\mu$m radially from the center to the periphery in a polar coordinate fashion. Subsequently, in 2000, Sandini et al. [26] achieved foveal imaging with 8,013 pixels on a complementary metal oxide semiconductor (CMOS) sensor. Compared to CCD, CMOS offers more controllable data storage and easier interfacing with a microprocessor, making it more suitable for simulating the acquisition of foveal information. In 2014, Gao et al. [27] employed a non-uniform lens array arranged in a logarithmic polar configuration to capture light on a photosensitive material; this strategy shifts the complexity to the design of the lens array, enhancing its adaptability to various scenarios. Subsequently, in 2016, G. Carles et al. [10] introduced foveal imaging by combining prisms and camera arrays, achieving high-resolution central imaging and low-redundancy peripheral imaging, and using a camera-array super-resolution technique to obtain high-resolution imaging at the fovea. Then, in 2017, a 3D-printed 2$\times$2 array of microlenses with different focal lengths was positioned directly on the CMOS sensor to achieve foveal imaging [28]. This approach allows rapid design, since the four lenses are printed simultaneously without further assembly.


Fig. 1. Foveal vision of the human eye. (a) The spatial resolution of human vision increases gradually from the periphery toward the center. (b) This visual property arises from the higher density of cone cells in the foveal region of the human eye. It helps people observe the surrounding environment while reducing the effort needed to obtain detailed information about it.


However, most commercial imaging systems currently adopt uniform sampling across the entire field of view, so implementing foveal imaging requires a more complex lens array design. Inspired by [10], this article proposes an easily assembled alternative: by varying the focal lengths of the lenses in a camera array, super-resolution imaging is achieved at the fovea. Additionally, this paper proposes a MISR algorithm based on multi-focal-length (MFL) images. This study successfully implements the mechanism of foveal imaging and significantly enhances the angular resolution of the foveal region by using the MFL image acquisition method, achieving the expected results. The method overcomes the limitations of a single commercial imaging sensor produced with modern manufacturing technology and thus has important application significance.

The contributions of this work are as follows.

1. For images captured by cameras with different focal lengths at different spatial scales, this paper proposes an SR method based on the local gradient constraint, which imitates the foveal imaging process. Simultaneously, while acquiring a large-field-of-view image, this method enables high-resolution imaging of the central area.

2. A multi-aperture, multi-focus imaging model is presented. In order to solve the problem of blurring that may occur when using multiple frames of images with different focal lengths for SR reconstruction, we propose a local gradient constraint method to ensure the details and texture of the reconstructed image.

3. Based on the proposed MFL image SR technology, a multi-focal-length SR imaging (MFLSRI) prototype was established and image data captured in indoor and outdoor scenes were used for experimental verification.

The structure of the remainder of this paper is as follows. Section 2 describes the principles of the bio-inspired MFLSRI system. Section 3 introduces the proposed MFLSRI method based on the local gradient constraint. Section 4 presents the experimental results of the proposed method and compares it with other state-of-the-art (SOTA) SR methods. Lastly, Section 5 summarizes our work.

2. Principles of the imaging system

2.1 Camera arrays

Most camera arrays are designed as imaging devices that use identical detectors and lenses to acquire sub-images of the same target scene, which is the case for the existing MISR imaging methods that use camera arrays. For the system as a whole, the frequencies must satisfy $v_{\text {D}} > v_{\text {A}}/2 > v_{\text {N}}$, where $v_{\text {A}}$ is the effective sampling angular frequency of the array, $v_{\text {N}}$ is the Nyquist frequency of a single camera, and $v_{\text {D}}$ is the optical cut-off frequency of the system: the array's effective Nyquist frequency must exceed that of a single camera for the reconstruction to add resolution, while it remains bounded by the diffraction limit. The optical cut-off frequency is calculated as $v_{\text {D}} = \frac {f}{1.22 F \lambda _{\text {O}}}$, where $f$ and $F$ denote the focal length and f-number, respectively. The wavelength of visible light, denoted as $\lambda _{\text {O}}$, is taken as 550 nm in this article. Furthermore, the Nyquist frequency is defined as $v_{\text {N}} = \frac {f}{2p}$, where $p$ represents the pixel size.

Mathematically, the camera array must consist of $M\times M$ cameras with $M>\operatorname{int}\left( {{v}_{\text {D}}}/{{v}_{\text {N}}} \right)$, where the optical cut-off frequency ${{v}_{\text {D}}}$ of a single lens is much higher than the Nyquist frequency ${{v}_{\text {N}}}$ of a single focal plane array (FPA). From a simple geometric perspective, the pixel count of the reconstructed HR image is $M^{2} \ \times$ that of a single camera, so the theoretical spatial resolution is expected to increase by $M \ \times$. The successful implementation of MISR does not depend on a high pixel count alone: it also requires sub-pixel shifts between the array images and the presence of aliasing artifacts within them. In theory, the SR technique can push the spatial resolution of the reconstructed image up to the diffraction limit of the lens.
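As a rough numerical check of these conditions, the short sketch below evaluates them for one lens–detector pairing (a minimal illustration in Python; the helper names are ours, and the lens and pixel values are those of the prototype described in Section 2.2):

```python
import math

def optical_cutoff_freq(f_mm, F, wavelength_mm=550e-6):
    """Angular optical cut-off frequency v_D = f / (1.22 F lambda), in cycles/rad."""
    return f_mm / (1.22 * F * wavelength_mm)

def nyquist_freq(f_mm, p_mm):
    """Angular Nyquist frequency of a single camera, v_N = f / (2 p), in cycles/rad."""
    return f_mm / (2.0 * p_mm)

# Prototype values from Section 2.2: a 16 mm f/2.0 lens and 6.9-micron pixels.
v_D = optical_cutoff_freq(16.0, 2.0)   # ~1.19e4 cycles/rad
v_N = nyquist_freq(16.0, 6.9e-3)       # ~1.16e3 cycles/rad

# Aliasing (and hence an SR gain) exists only if v_D > v_N; pushing all the way to
# the diffraction limit with identical cameras would require M > int(v_D / v_N).
M_min = math.floor(v_D / v_N) + 1
print(f"v_D = {v_D:.0f} cyc/rad, v_N = {v_N:.0f} cyc/rad, minimum M = {M_min}")
```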

2.2 Proposed imaging system

To facilitate the creation of an easily assembled foveal SR imaging system, this paper explores the concept of implementing variable spatial resolution imaging as opposed to conventional lens stacking. This approach aims to optimize the trade-off between spatial resolution and field of view within the system. According to the characteristics of human eye imaging, the sampling frequency of the target area in the center of the field of view is increased to reduce the number of required lenses. Simultaneously, through the SR reconstruction algorithm, the image resolution of the central area can be close to the diffraction limit of the lens. The prototype of the imaging system is shown in Fig. 2(b). The prototype contains three different types of lenses and nine high-speed imaging FPAs, forming nine cameras in total. The different colors in Fig. 2(b) represent different lenses: green marks four lenses with focal lengths of $f_{16}$= 16 mm (Basler, C125-1620-5M-P, f/2.0); blue marks four lenses with focal lengths of $f_{25}$= 25 mm (Basler, C125-2522-5M-P, f/2.0); and a lens with a focal length of $f_{35}$= 35 mm (Edmund, HR 85-869, f/1.8) is in the center of the camera arrays. All cameras are equipped with the same FPA sensor (Sony IMX287), the pixel size is p = 6.9 μm, and the sensor resolution is $720 \times 540$.


Fig. 2. The proposed SR imaging process of the MFLSRI system, the schematic diagram illustrating the configuration of the MFLSRI system, and the curves depicting the imaging angular resolution. (a) Schematic diagram of the super-resolution reconstruction principle of the proposed MFLSRI system. Image pixels captured by the array cameras are mapped onto a high-resolution grid of the reference image, distance-weighted interpolation is performed on the grid, and the pixel values are then aggregated within the grid. (b) MFLSRI system. The green, blue, and yellow lenses are 16-, 25-, and 35-mm-focal-length lenses, respectively. (c) Angular resolution comparison of different types of lenses and the proposed bio-inspired MFLSRI system. Green, blue, and yellow are the sampling frequencies of the 16-, 25-, and 35-mm-focal-length cameras, respectively; red indicates the sampling frequency of the proposed MFLSRI system.


The field of view of a single camera in the prototype imaging system and the reconstructed foveal imaging field of view are shown in Fig. 2(a). As a lens with a larger focal length results in a smaller field of view, the closer to the central area, the greater the amount of image overlap, resulting in higher spatial resolution through computational reconstruction. In Fig. 2(a), the peripheral field of view (represented by the green points) is equivalent to the field of view of the image captured by the 16-mm-focal-length lens; the size of the transitional field of view (represented by the blue points) is equal to the size of the field of view of the 25-mm-focal-length lens; and the central field of view (represented by the yellow points) is equal in size to the field of view of the 35-mm-focal-length lens. We established a high-resolution grid based on the coordinate system of the reference image and projected all sample points onto this grid using a registration method. In this paper, the registration method employed is the scale-invariant feature transform (SIFT [29]). In Fig. 2(a), each red coordinate point is derived by applying Euclidean distance interpolation to the adjacent sample points, resulting in the reconstructed high-resolution image.
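The gridding step described above can be sketched as follows (a minimal illustration, assuming registration, e.g., via SIFT homographies, has already mapped every low-resolution pixel to an (x, y, value) sample in the reference coordinate frame; the function name, the choice of k nearest neighbours, and the epsilon guard are ours):

```python
import numpy as np
from scipy.spatial import cKDTree

def grid_samples(xs, ys, vals, hr_shape, k=4, eps=1e-6):
    """Scatter registered samples onto a high-resolution grid, weighting the
    k nearest samples of each grid node by inverse Euclidean distance."""
    vals = np.asarray(vals, dtype=float)
    tree = cKDTree(np.column_stack([xs, ys]))
    H, W = hr_shape
    gy, gx = np.mgrid[0:H, 0:W]
    nodes = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)
    dist, idx = tree.query(nodes, k=k)
    w = 1.0 / (dist + eps)                         # inverse-distance weights
    hr = (w * vals[idx]).sum(axis=1) / w.sum(axis=1)
    return hr.reshape(H, W)
```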

To explore the performance of this system in depth, a mathematical model was established. According to the definition of angular resolution, the field-of-view angles and the sample count $N$ within the field of view are used in this study to compute the system's Nyquist angular resolution, given by the formula:

$$\delta _{\text{N}}=4\sqrt{{{\theta}_{\text{h}}}{{\theta}_{\text{v}}}/N},$$

Specifically, ${{\theta }_{\text {h}}}$ and ${{\theta }_{\text {v}}}$ denote the half-angles for the horizontal field of view (HFOV) and the vertical field of view (VFOV). The imaging principle of a single camera is shown in Fig. 3. Considering an imaging system composed of $K$ cameras, each with varying focal lengths, the formula to determine $N$ within its field of view is:

$$N=\frac{4\tan{{\theta}_{\text{h}}}\tan{{\theta}_{\text{v}}}}{{{p}^{2}}}\sum_{k=1}^{K}{{{f}^{2}_{k}}}.$$
where $\sum \limits_{k=1}^{K}{{{f}^{2}_{k}}}$ denotes the sum of the squared focal lengths of all cameras. For clarity and simplicity of notation, we will henceforth denote it as $\sum {{{f}^{2}_{k}}}$. Combining Eq. (1) and Eq. (2), it is feasible to compute the Nyquist angular resolution of an individual camera as well as to project the Nyquist angular resolution of the complete MFLSRI system. Moreover, under the small-angle approximation $\tan \theta \approx \theta$, the expression for $\delta _{\text {N}}$ reduces to:
$${{\delta }_{\text{N}}}=2p/\sqrt{\sum{f_{k}^{2}}}.$$


Fig. 3. The imaging principle of a single camera. From the image, it can be inferred that when the HFOV, focal length, and pixel size are known, the number of sampling points in the horizontal direction can be determined.


For $K=1$, Eq. (3) can determine the Nyquist angle resolution of a camera with focal length $f$. Therefore, in comparison to a camera with focal length $f$, the spatial resolution enhancement factor $m$ for the MFLSRI system is given by:

$$m=\frac{2p/f\ }{2p/\sqrt{\sum{f_{k}^{2}}}\ }=\sqrt{\sum{f_{k}^{2}}/{{f}^{2}}}.$$

Equation (3) allows theoretical computation of the sampling rate for a single aperture and for the imaging system in this study, as depicted in Fig. 2(c). Moreover, to achieve SR with the MFLSRI system, the three conditions required for camera arrays to implement MISR must be met. First, the Nyquist angular resolution must be greater than the optical cut-off angular resolution of the lens; the second and third conditions require the array images to exhibit sub-pixel displacements and aliasing artifacts. Given the intrinsic design of the MFLSRI system, the second requirement is inherently met, and the third underscores the justification for using multi-focal-length SR.

Hence, in the MFLSRI system, the maximum optical cut-off angular resolution of the lens assembly must be smaller than the minimum Nyquist angular resolution of the camera assembly. For example, within the lens array, the 16 mm focal length lens has the largest optical cut-off angular resolution, calculated as $1/{{v}_{\text {D}}}$ = 0.084 mrad. Among all the cameras, the one with a 35 mm focal length has the smallest Nyquist angular resolution, 0.40 mrad. Thus, the designed MFLSRI system is capable of achieving SR at the center. Using Eq. (4), the SR factor for the relevant regions can be determined. For instance, relative to a camera with a 16 mm focal length, the MFLSRI system achieves SR factors of 2, 3.7, and 4.3 in the peripheral zone (where only the 16 mm lenses overlap), the transitional region (where the 16 mm and 25 mm lenses overlap), and the central section (where all lenses overlap), respectively. Relative to a camera with a 35 mm focal length, the MFLSRI system achieves an SR factor of 2 in the central zone. A more comprehensive set of calculation results is presented in Table 1.
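These numbers follow directly from Eqs. (3) and (4); the sketch below reproduces them (a minimal check in Python; the per-region lens counts follow the overlap description above, and the function names are ours):

```python
import math

p = 6.9e-3  # pixel pitch in mm

def nyquist_angle_mrad(focal_lengths_mm):
    """Eq. (3): delta_N = 2p / sqrt(sum f_k^2), returned in mrad."""
    return 1e3 * 2 * p / math.sqrt(sum(f**2 for f in focal_lengths_mm))

def sr_factor(focal_lengths_mm, f_ref_mm):
    """Eq. (4): resolution gain relative to a single camera of focal length f_ref."""
    return math.sqrt(sum(f**2 for f in focal_lengths_mm)) / f_ref_mm

peripheral = [16] * 4                      # only the four 16 mm cameras overlap
transition = [16] * 4 + [25] * 4           # 16 mm and 25 mm cameras overlap
central    = [16] * 4 + [25] * 4 + [35]    # all nine cameras overlap

print(sr_factor(peripheral, 16))   # ~2.0
print(sr_factor(transition, 16))   # ~3.7
print(sr_factor(central, 16))      # ~4.3
print(sr_factor(central, 35))      # ~2.0
print(nyquist_angle_mrad([35]))    # ~0.39 mrad for a single 35 mm camera
```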


Table 1. Specific imaging parameters of proposed device

3. Method

3.1 Image SR model based on multiple apertures and focal lengths

We assume that there exists a spatially variable-resolution image $\boldsymbol {Y}$. Different shifting, blurring, downsampling, and cropping operations on $\boldsymbol {Y}$ yield nine different low-resolution images ${{\boldsymbol {X}}_{i}}$:

$$\boldsymbol{X}_i=\boldsymbol{D}_{i}\boldsymbol{B}_{i}\boldsymbol{M}_{i}\boldsymbol{T}_{i}\boldsymbol{Y}+n,i\in 1,2,\ldots ,9,$$
where $\boldsymbol {D}$ represents the downsampling matrix, $\boldsymbol {B}$ represents the blur matrix, $\boldsymbol {M}$ represents the shift matrix, $\boldsymbol {T}$ represents the cropping matrix, and $n$ represents the noise. The subscript $i$ represents the $i$-th frame low-resolution image.

Letting $\boldsymbol {H}_i=\boldsymbol {D}_i\boldsymbol {B}_i\boldsymbol {M}_i\boldsymbol {T}_i$, Eq. (5) can be written as

$$\boldsymbol{X}_i=\boldsymbol{H}_i\boldsymbol{Y}+n,$$
and the data fidelity term of the SR algorithm can be written as
$$\mathop{\text{arg min}}\limits_{\boldsymbol{Y}}\left\| \boldsymbol{HY}-\boldsymbol{X} \right\|,$$
where $\boldsymbol {X}=\left [ \boldsymbol {X}_1,\boldsymbol {X}_2,\ldots,\boldsymbol {X}_9 \right ]$ and $\boldsymbol {H}=\left [ \boldsymbol {H}_1,\boldsymbol {H}_2,\ldots,\boldsymbol {H}_9 \right ]$.
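As a rough functional sketch of one degradation operator $\boldsymbol{H}_i=\boldsymbol{D}_i\boldsymbol{B}_i\boldsymbol{M}_i\boldsymbol{T}_i$ (applied right to left: crop, shift, blur, downsample), the snippet below composes the four operations; the crop box, shift amounts, blur width, and scale are illustrative placeholders rather than the calibrated values of the prototype:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, shift as subpixel_shift

def apply_H(Y, crop_box, dx, dy, blur_sigma, scale):
    """Simulate X_i = D_i B_i M_i T_i Y for one camera of the array."""
    r0, r1, c0, c1 = crop_box
    x = Y[r0:r1, c0:c1]                        # T_i : crop to this camera's field of view
    x = subpixel_shift(x, (dy, dx), order=3)   # M_i : sub-pixel shift
    x = gaussian_filter(x, blur_sigma)         # B_i : optical / sensor blur
    return x[::scale, ::scale]                 # D_i : downsampling
```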

In SR methods, reconstructing an SR image is an ill-posed problem. When relying exclusively on the data fidelity term, SR methods often fail to achieve high-quality reconstruction: a given set of low-resolution images may be consistent with many different SR images. To address this, an effective prior regularization term should be added to constrain the solution of the optimization problem according to the actual imaging conditions. Local gradient constraints offer particular advantages for multi-focal-length image SR, which is the focus of this paper [30]. In addition, because a certain amount of noise is introduced during SR reconstruction, the method in this paper also uses a total variation (TV) regularization term based on the 1-norm for image denoising. The multi-focal-length SR model proposed in this paper is therefore

$$\boldsymbol{\hat{Y}}=\mathop{\text{arg min}}\limits_{\boldsymbol{Y}}\left\| \boldsymbol{HY}-\boldsymbol{X} \right\| _2+\lambda LGC\left( \boldsymbol{Y} \right) +\gamma \left\| \nabla \boldsymbol{Y} \right\| _1.$$

The first term is the data fidelity term, the second term is the local gradient constraint term, and the third term is the 1-norm-based TV denoising term. $\boldsymbol {\hat {Y}}$ is the solution that approximates $\boldsymbol {Y}$, $\lambda$ and $\gamma$ are the regularization parameters used to penalize these two terms, and $\nabla$ represents the gradient operator. After $t$ iterations, the SR image result ${{\boldsymbol {\hat {Y}}}^{t}}$ is obtained. It should be noted that $\boldsymbol {\hat {Y}}^0$ is a HR image derived from the sampling point interpolation technique outlined in Section 2.

According to the model of the MFLSRI system proposed in this paper, the images acquired in practice are low-resolution images with different sharpnesses and focal lengths. As in image fusion, blurrier images can dilute the contribution of sharper ones, much like diffusion in chemistry, and thereby degrade the quality of the reconstructed SR image. Although images captured at short focal lengths are much blurrier than those captured at long focal lengths, according to the theoretical calculations in Section 2, every image should still contribute spatial resolution gains to the reconstruction. Drawing on panchromatic sharpening and image fusion techniques [31-34], the gradients of images with different focal lengths can be assumed to be locally linearly related, as shown in Fig. 4. Figure 4(a) shows images of the same scene taken at three focal lengths with the imaging device. Zooming in on local details of the three images shows that the longer the focal length, the clearer the image and the stronger the edge contrast. Figure 4(b) plots the gradients of the three images along line 322; the gradients of the three focal lengths exhibit an obvious local linear relationship. We therefore infer that, during the iterative process, there is a local linear relationship between the gradient $\nabla {{\boldsymbol {Y}}^{t}}$ of the $t$-th iteration result and the gradient $\nabla {{\boldsymbol {Y}}^{t+1}}$ of the $(t+1)$-th iteration result. Moreover, as reported in the literature [30,35], gradient constraints in image reconstruction help preserve the edge structure of the image. To enhance edge preservation in the SR image, local gradients are employed to constrain the gradient of the SR image. The local gradient constraint term is as follows:

$$LGC\left( \boldsymbol{Y} \right) =\mathop{\text{arg min}}\limits_{\boldsymbol{Y}}\left\| \nabla \boldsymbol{Y}-\nabla \boldsymbol{Y}_{LGC} \right\| ,$$
where $\nabla \boldsymbol {Y}$ represents the gradient of the ideal super-resolved image. The constrained gradient $\nabla \boldsymbol {Y}_{LGC}^{t}$ used at the $t$-th iteration is
$$\nabla \boldsymbol{Y}_{LGC}^{t}=\boldsymbol{A}^{t-1}\nabla \boldsymbol{\hat{Y}}^{t}+\boldsymbol{C}^{t-1}.$$


Fig. 4. Linear relationship of the image on the gradient between different focal lengths. (a) Local details of the same scene (inside the dotted frame) were captured with lenses of different focal lengths; the focal lengths of the three images are 35 mm, 25 mm, and 16 mm, from left to right. (b) Gradient scanning of three images with different focal lengths at line 322, where red, blue, and green correspond to the gradients of the red, blue, and green horizontal lines in the three images in (a) respectively.


$\nabla {{\boldsymbol {\hat {Y}}}^{t-1}}$ represents the gradient map of the SR image at the $(t-1)$-th iteration. Following the maximum-value mapping used in image fusion [36], when $t$ = 1, $\nabla {{\boldsymbol {Y}}_{LGC}}=\max \left ( \nabla \left ( \boldsymbol {H}_{i}^{T}{{\boldsymbol {X}}_{i}} \right ) \right )$ ($i\in 1,2,\ldots,9$); that is, each low-resolution image is projected onto the high-resolution grid and the maximum gradient value at each position is selected to generate $\nabla \boldsymbol {Y}_{\text {LGC}}^{1}$. In Eq. (10), $\boldsymbol {A}$ and $\boldsymbol {C}$ are coefficient matrices defining the local linear relationship between $\nabla \boldsymbol {\hat {Y}}$ and $\nabla {{\boldsymbol {Y}}_{LGC}}$. For the $k$-th pixel in the image, this local linear relationship, with coefficients ${{a}_{k}}$ and ${{c}_{k}}$, is

$$\nabla {{y}_{LGCi}}={{a}_{k}}\nabla {{\hat{y}}_{i}}+{{c}_{k}},\forall i\in {{w}_{k}},$$
where ${{w}_{k}}$ denotes the image block centered on the $k$-th pixel. For any pixel $i\in {{w}_{k}}$, $\nabla {{y}_{LGCi}}$ and $\nabla {{\hat {y}}_{i}}$ are the gradient values of the $i$-th pixel on the corresponding images, and ${{a}_{k}}$ and ${{c}_{k}}$ are constant coefficients of the linear constraint over the local region ${{w}_{k}}$. Eq. (11) is similar to the guided filter, but the local gradient constraint term sharpens images rather than smoothing them: sharpening is performed along the gradient of the image edges. Our solution of Eq. (11) follows the guided-filter strategy. Minimizing the objective with respect to ${{a}_{k}}$ and ${{c}_{k}}$,
$$\mathop{\text{arg min}}\limits_{a_{k}^{t},c_{k}^{t}}\,\sum_{i\in {{w}_{k}}}{{{\left( \nabla y_{LGCi}^{t}-a_{k}^{t}\nabla \hat{y}_{i}^{t-1}-c_{k}^{t} \right)}^{2}}},$$
when solved, yields
$$a_{k}^{t}=\frac{\frac{1}{\left| w \right|}\sum_{i\in {{w}_{k}}}{\nabla y_{LGCi}^{t}\nabla \hat{y}_{i}^{t-1}}-\mu \left( \nabla y_{LGCi}^{t} \right)\mu \left( \nabla \hat{y}_{i}^{t-1} \right)}{{{\sigma }^{2}}\left( \nabla \hat{y}_{i}^{t-1} \right)+\varepsilon },$$
$$c_{k}^{t}=\mu \left( \nabla y_{LGCk}^{t} \right)-a_{k}^{t}\mu \left( \nabla \hat{y}_{k}^{t-1} \right),$$
where $\mu$ and ${{\sigma }^{2}}$ represent the mean and variance over the window, respectively, and $\left | w \right |$ is the number of pixels in ${{w}_{k}}$. Because each pixel in Eq. (11) is contained in all of the overlapping windows that cover it, the value of $\nabla {{y}_{LGCi}}$ differs across the local patches $w$; for the different ${{w}_{k}}$, we obtain $\left | w \right |$ values of $\nabla {{y}_{LGCi}}$. Taking their average, we have
$$\nabla {{y}_{LGCi}}={{\bar{a}}_{i}}\nabla {{\hat{y}}_{i}}+{{\bar{c}}_{i}},$$
where ${{\bar {a}}_{i}}=\frac {1}{\left | w \right |}\sum \nolimits _{k\in {{w}_{i}}}^{{}}{{{a}_{k}}}$ and ${{\bar {c}}_{i}}=\frac {1}{\left | w \right |}\sum \nolimits _{k\in {{w}_{i}}}^{{}}{{{c}_{k}}}$.
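Following the guided-filter strategy referenced above, the local coefficients of Eqs. (13)-(15) can be computed with box filters. The sketch below is a minimal illustration (the window size and epsilon are ours, and `grad_hat` / `grad_lgc` stand for $\nabla\hat{\boldsymbol{Y}}^{t-1}$ and $\nabla\boldsymbol{Y}_{LGC}^{t}$):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lgc_gradient(grad_hat, grad_lgc, win=7, eps=1e-4):
    """Fit grad_lgc ~ a * grad_hat + c over win x win windows (Eqs. (12)-(14)),
    average the coefficients (Eq. (15)), and return the constrained gradient."""
    mean = lambda z: uniform_filter(z, size=win)
    mu_x, mu_y = mean(grad_hat), mean(grad_lgc)
    cov_xy = mean(grad_hat * grad_lgc) - mu_x * mu_y
    var_x = mean(grad_hat * grad_hat) - mu_x * mu_x
    a = cov_xy / (var_x + eps)                 # Eq. (13)
    c = mu_y - a * mu_x                        # Eq. (14)
    a_bar, c_bar = mean(a), mean(c)            # averaged coefficients, Eq. (15)
    return a_bar * grad_hat + c_bar
```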

The images may be noisy at the time of acquisition due to hardware, the environment, and other factors, and additional noise is introduced during the iterative process under local gradient constraints. We therefore introduce the L1 total variation (L1TV) term [37] to denoise and smooth the image; the L1TV denoising term is given in Eq. (16):

$$\mathop {\text{arg min}} _{\boldsymbol{Y}}\left\| \nabla \boldsymbol{Y} \right\| _1.$$

3.2 Solving the image SR model algorithm

The multi-focal-length image SR model proposed in this paper is obtained by combining Eq. (7), Eq. (9), and Eq. (16) as follows

$$\hat{\boldsymbol{Y}} =\mathop{\text{arg min}}\limits_{\boldsymbol{Y}}\|\boldsymbol{H} \boldsymbol{Y}-\boldsymbol{X}\|_{2}+\lambda\left\|\nabla \boldsymbol{Y}-\nabla \boldsymbol{Y}_{L G C}\right\|_{2} +\gamma\|\nabla \boldsymbol{Y}\|_{1} .$$

As Eq. (17) is a typical least absolute shrinkage and selection operator (LASSO) problem [38], we use the alternating direction method of multipliers (ADMM) to solve it. Introducing the auxiliary variable $\boldsymbol {u}$, we have

$$\boldsymbol{\hat{Y}}=\mathop{\text{min}}\limits_{\boldsymbol{Y}}\left\| \boldsymbol{HY}-\boldsymbol{X} \right\| _2+\lambda \left\| \nabla \boldsymbol{Y}-\nabla \boldsymbol{Y}_{LGC} \right\| _2 +\gamma \left\| \boldsymbol{u} \right\| _{1}\quad \text{s.t.}\ \boldsymbol{u}=\nabla \boldsymbol{Y},$$
which, written in augmented Lagrangian form, is
$$\boldsymbol{\hat{Y}}=\mathop{\text{min}}\limits_{\boldsymbol{Y}}\left\| \boldsymbol{HY}-\boldsymbol{X} \right\| _2+\lambda \left\| \nabla \boldsymbol{Y}-\nabla \boldsymbol{Y}_{LGC} \right\| _2 +\gamma \left\| \boldsymbol{u} \right\| _1+\frac{\alpha}{2}\left\| \boldsymbol{u}-\nabla \boldsymbol{Y}+\frac{\boldsymbol{z}}{\alpha} \right\| _2.$$

Differentiating Eq. (19) with respect to $\boldsymbol {Y}$ gives

$$\frac{\partial \boldsymbol{\hat{Y}}}{\partial \boldsymbol{Y}}=\boldsymbol{H}^T\left( \boldsymbol{HY}-\boldsymbol{X} \right) +\lambda \nabla ^T\left( \nabla \boldsymbol{Y}-\nabla \boldsymbol{Y}_{LGC} \right) -\alpha \nabla ^T\left( \boldsymbol{u}-\nabla \boldsymbol{Y}+\frac{\boldsymbol{z}}{\alpha} \right) .$$

We then use gradient descent on Eq. (20) to obtain $\boldsymbol {\hat{Y}}^{t+1}$,

$$\boldsymbol{\hat{Y}}^{t+1}=\boldsymbol{\hat{Y}}^t-\frac{\partial \boldsymbol{\hat{Y}}^t}{\partial \boldsymbol{Y}}.$$

For $\boldsymbol {u}$, we use the shrink method to solve

$$\boldsymbol{u}^{t+1}=\max \left( \left| \boldsymbol{v}^{t+1} \right|-\frac{\gamma}{\alpha ^t},0 \right) sign\left( \boldsymbol{v}^{t+1} \right) ,$$
where $\boldsymbol {v}^{t+1}=\nabla \boldsymbol {\hat {Y}}^{t+1}-\frac {\boldsymbol {z}^t}{\alpha ^t}$.

Solving for $\boldsymbol {z}$ we obtain

$$\boldsymbol{z}^{t+1}=\boldsymbol{z}^t+\alpha ^t\left( \boldsymbol{u}^{t+1}-\nabla \boldsymbol{\hat{Y}}^{t+1} \right) .$$

Finally, solving for $\alpha$ we obtain

$$\alpha ^{t+1}=\alpha ^t-\rho ,$$
where $\rho$ is the iteration step size.

The complete image SR reconstruction algorithm is summarized in Table 2.
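To show how the updates of Eqs. (20)-(24) fit together, the following schematic loop mirrors the structure of Table 2 (a minimal sketch: `H`, `HT`, and `grad_lgc_fn` are assumed callables for the forward operator, its back-projection, and the constrained gradient of Eq. (10); the step size, weights, penalty floor, and stopping rule are illustrative, not the tuned values used in the experiments):

```python
import numpy as np

def grad(Y):
    """Forward-difference image gradient, stacked as (2, H, W)."""
    return np.stack([np.roll(Y, -1, axis=1) - Y, np.roll(Y, -1, axis=0) - Y])

def div(G):
    """Discrete divergence; the transpose of grad is -div."""
    gx, gy = G
    return (gx - np.roll(gx, 1, axis=1)) + (gy - np.roll(gy, 1, axis=0))

def soft_shrink(v, tau):
    """Eq. (22): component-wise soft thresholding."""
    return np.maximum(np.abs(v) - tau, 0.0) * np.sign(v)

def mfl_sr(Y0, X, H, HT, grad_lgc_fn, lam=0.05, gamma=0.01,
           alpha=1.0, rho=0.01, step=0.5, iters=50, tol=2e-5):
    """Sketch of the local-gradient-constrained SR solver (Eqs. (17)-(24))."""
    Y, u = Y0.copy(), grad(Y0)
    z = np.zeros_like(u)
    for _ in range(iters):
        g_lgc = grad_lgc_fn(Y)
        # Eq. (20): gradient of the augmented Lagrangian w.r.t. Y  (grad^T = -div)
        dY = (HT([h - x for h, x in zip(H(Y), X)])
              - lam * div(grad(Y) - g_lgc)
              + alpha * div(u - grad(Y) + z / alpha))
        Y_new = Y - step * dY                                    # Eq. (21)
        u = soft_shrink(grad(Y_new) - z / alpha, gamma / alpha)  # Eq. (22)
        z = z + alpha * (u - grad(Y_new))                        # Eq. (23)
        alpha = max(alpha - rho, 1e-3)                           # Eq. (24), floored to stay positive
        if np.mean(np.abs(Y_new - Y)) < tol:                     # simple convergence proxy
            return Y_new
        Y = Y_new
    return Y
```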


Table 2. Framework of the proposed local-gradient-constraint-based SR algorithm

4. Experiment results and discussion

To prove the effectiveness of the proposed system and algorithm, we used ISO 12233 resolution test chart images captured by the prototype to measure the modulation transfer function (MTF) and the improvement in spatial resolution [39,40]. A higher MTF value indicates greater resolving capability of the optical system or imaging sensor. Field experiments were also conducted to verify the feasibility and effectiveness of the proposed system and SR method. To verify the performance of the proposed SR imaging method, the SR-RAW dataset [41] was used for simulation experiments, and the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) were used to evaluate the simulation results. The image SR calculations and other post-processing were run on a computer with an AMD Ryzen 7 3700 processor and 32 GB of memory. The SR methods used for comparison were LRSR [8], L1BTVSR [14], L2TVSR [13], BSRGAN [19], and selfDZSR [21]. For the MISR methods, the maximum number of iterations was set to 50, and the convergence error was $\varepsilon =\max \left ( mean\left ( {{\boldsymbol {Y}}^{t+1}}-{{\boldsymbol {Y}}^{t}} \right ) \right )=2\times {{10}^{-5}}$; the iterations were stopped when either the iteration limit or the convergence error was reached, yielding the SR image. Importantly, the compared MISR methods adopt the same registration approach and imaging model as this study, and their parameters were chosen based on iterative PSNR tests on the simulation dataset. Based on these convergence tests, L1BTVSR and L2TVSR were assigned regularization weights of 0.06 and 0.19, respectively. BSRGAN operates as an SISR approach (its input is the image with the maximum focal length) and was evaluated against SOTA methods in the real-world SR domain. For selfDZSR, the maximum- and minimum-focal-length images were used as the reference and target images, respectively, so that it serves as a deep-learning-based multi-focal-length SR baseline. Each deep-learning method used the test models provided by its authors.

4.1 Experimental results

To evaluate the improvement effect of the proposed SR imaging method on the image spatial resolution, indoor and outdoor experiments were performed using the prototype system. Quantitative and qualitative evaluations were conducted on the SR results. The resolution of the reconstructed SR image was tested using the ISO 12233 resolution test chart. The prototype system was used to capture 287 sets of multi-focal images, each consisting of nine images, for a total of 2583 low-resolution images.

Figure 5 shows the images acquired using the proposed system prototype and the reconstructed foveal SR image, as well as local details of each image. Figure 5(b), Fig. 5(c), and Fig. 5(d) are zoomed-in views of the LR images at their respective positions in Fig. 5(a). Figure 5(e), Fig. 5(f), and Fig. 5(g) show detailed views of the MFLSRI image at the positions corresponding to the solid boxed areas. The spatial resolution of the MFLSRI image is significantly improved compared with those of the other images. In particular, the spatial resolution of the highest-resolution image in the low-resolution group, i.e., the 35-mm-focal-length image (Fig. 5(d)), is approximately 4 line pairs per millimeter (LP/mm), whereas the spatial resolution of the SR image reaches 6 LP/mm; that is, the proposed device achieves 1.5 $\times$ the highest spatial resolution among the low-resolution images. The spatial resolution of the 16-mm-focal-length image in the LR group is approximately 1.5 LP/mm, so the SR image provides 4 $\times$ the spatial resolution of the 16-mm-focal-length image, which is close to the diffraction limit of the 16-mm-focal-length lens. Due to factors such as registration accuracy, optical distortion, and data redundancy, the actual SR factor did not reach the theoretical value of 4.3 $\times$ derived in Section 2. For comparison, a traditional camera array would need at least 16 identical cameras imaging the same target to achieve more than 4 $\times$ spatial resolution, whereas the device proposed in this paper needs only nine cameras with different focal lengths to achieve this spatial resolution while retaining the field of view of the 16 mm lens. In other words, the proposed device offers low cost, small size, and fast post-processing.


Fig. 5. The central foveal SR image obtained by the imaging device described in this paper, compared with images of other focal lengths in terms of spatial resolution. (a) Depiction of the central fovea SR image. The region bounded by the red dashed box represents the central view. Between this and the green dashed box lies the transitional view, with the area outside being the peripheral view. (b), (c), and (d) individually present detailed views of LR images within the blue, green, and red solid boundaries, corresponding to focal lengths of 16 mm, 25 mm, and 35 mm, respectively. (e), (f), and (g) are SR images mirroring the exact locations depicted in (b), (c), and (d).


The results of using the prototype system to capture images of vehicles on the road and perform SR reconstruction on the captured images are shown in Fig. 6, with further analysis of the SR results for the car body and tires. Figure 6(b) is the original image captured by the 35-mm-focal-length lens, and Fig. 6(c) is the bicubic-interpolation enlargement of Fig. 6(b). In both Fig. 6(d) and Fig. 6(e), tire details are noticeably restored in the SR images produced by the LRSR and L1BTVSR methods, yet both images suffer from substantial noise. The L2TVSR result exhibits minor noise on the white vehicle body, whereas BSRGAN fails to faithfully reproduce the tire details. Although selfDZSR enhances the target resolution using a reference image, it does not recover the fine details of the wheel hub. Compared with the other SR methods, the SR image obtained by the proposed method not only restores the detailed and high-frequency information of the image but also achieves the best visual effect.


Fig. 6. The SR results of images captured by the MFLSRI system. (a) Images acquired during the field tests. (b) Image acquired by the 35-mm-focal-length lens; (c) bicubic interpolation image of (b); and SR images obtained using (d) LRSR, (e) L1BTVSR, (f) L2TVSR, (g) BSRGAN, (h) selfDZSR, and (i) the proposed method. The magnification is 4$\times$.


For the images captured by the prototype system, a foveal SR ground-truth image cannot be obtained directly by any existing imaging system. Hence, no-reference image quality assessment methods were employed to evaluate the effectiveness of the SR methods: the blind/referenceless image spatial quality evaluator (BRISQUE) [42], the natural image quality evaluator (NIQE) [43], SMD2 [44], and contrast enhancement image quality (CEIQ) [45]. Table 3 shows the average no-reference image evaluation metrics computed over the SR images corresponding to the 287 sets of captured images; the optimal results are marked in bold and sub-optimal results in italics. On NIQE and SMD2, the proposed method achieved near-best scores and was the best among the MISR methods. Although the data-driven nature of BSRGAN and selfDZSR notably enhances the measured image quality, this does not mean they faithfully reproduce image details, as shown in Fig. 6.


Table 3. Average no-reference image metrics calculated using the SR results of all captured multi-focal-length image groups

Figure 7 shows the low-resolution and SR images obtained from the test chart. Figure 7(a) shows the ground truth (GT) image, Fig. 7(b) is the 35-mm-focal-length image from the original image group, and Fig. 7(c) is the bicubic-interpolation enlargement of Fig. 7(b). The spatial resolution of the highest-resolution image in the low-resolution group is approximately 4 LP/mm, and the images produced by LRSR, L1BTVSR, L2TVSR, and the proposed method reach 6 LP/mm. The LRSR and L1BTVSR images exhibit some noise on the number "6". The BSRGAN and selfDZSR images do not reach 6 LP/mm, and the BSRGAN image even shows serious artifacts. The L2TVSR image achieves a good balance between noise and high-frequency detail, but the method in this paper uses gradient constraints to make the details more prominent. The proposed method therefore balances resolution improvement with the suppression of noise and artifacts, outperforming the other competing methods.


Fig. 7. SR results of ISO 12233 resolution test chart images. The magnification is 4$\times$. (a) GT image; (b) Image acquired by 35-mm-focal-length lens; (c) bicubic interpolation image of (b); and SR images obtained using (d) LRSR, (e) L1BTVSR, (f) L2TVSR, (g) BSRGAN, (h) selfDZSR, and (i) the proposed method.


To evaluate each SR method quantitatively, the slanted-edge method was used to calculate the MTF of the test chart images and to obtain the MTF20 value (the frequency at which the MTF decays to 20%) for each SR method, as shown in Table 4. Table 4 shows that the image resolution achieved by the proposed SR algorithm is significantly higher than those of the other multi-frame SR algorithms but lower than that of BSRGAN. The MTF20 of the proposed method is 1.4 $\times$ that of the LR image. Interestingly, although the spatial resolution of the BSRGAN image is not actually improved and severe artifacts appear, its MTF20 value is much higher than those of the multi-frame SR images. Considering the principle of the slanted-edge MTF measurement, this indicates that BSRGAN's processing of edges is extremely aggressive. However, the essence of SR is to improve the spatial resolution of the image and recover more high-frequency information; the BSRGAN result is therefore distorted, which is a characteristic of deep-learning SR methods.
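For reference, once a slanted-edge MTF curve has been computed (e.g., with an external tool), MTF20 can be read off by simple linear interpolation; the sketch below is a minimal illustration and is not the measurement procedure used in the experiments:

```python
import numpy as np

def mtf20(freqs, mtf):
    """Frequency at which the MTF first decays to 0.20 (linear interpolation)."""
    freqs, mtf = np.asarray(freqs, float), np.asarray(mtf, float)
    below = np.where(mtf <= 0.2)[0]
    if below.size == 0:
        return freqs[-1]          # MTF never drops to 20% over the measured range
    i = below[0]
    if i == 0:
        return freqs[0]
    t = (0.2 - mtf[i - 1]) / (mtf[i] - mtf[i - 1])
    return freqs[i - 1] + t * (freqs[i] - freqs[i - 1])
```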


Table 4. Image MTF measures calculated using the SR results shown in Fig. 7

4.2 Simulation

To reproduce the characteristics of the multi-focal-length imaging system as closely as possible in the simulation experiments, multi-focal-length images captured of the same scene in the SR-RAW dataset were used as the ground truth images. The multi-focal-length images in the dataset were down-sampled, blurred, and displaced to simulate the imaging model of the multi-focal-length system, generating the low-resolution images that the simulated device would capture. The resolution of the ground truth image was 4 $\times$ that of the simulated low-resolution images. In this degradation, the blur kernel was a Gaussian function with a mean of 0 and standard deviation of 1. To simulate the sub-pixel displacements between multi-focal-length images collected through different apertures, all images in each group except the one with the largest focal length were given random horizontal and vertical displacements with absolute values smaller than 1 pixel.
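A minimal sketch of this degradation pipeline is given below (the random seed, interpolation order, and decimation style are ours; it mirrors the imaging model of Eq. (5) rather than reproducing the exact preprocessing of the experiments):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, shift as subpixel_shift

rng = np.random.default_rng(0)

def simulate_lr(gt, scale=4, sigma=1.0, is_reference=False):
    """Random sub-pixel shift (|shift| < 1 px, skipped for the reference frame),
    Gaussian blur (std 1), and downsampling by `scale` of a ground-truth image."""
    x = gt.astype(float)
    if not is_reference:
        dy, dx = rng.uniform(-1.0, 1.0, size=2)
        x = subpixel_shift(x, (dy, dx), order=3)
    x = gaussian_filter(x, sigma)
    return x[::scale, ::scale]
```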

Figures 8 and 9 show the simulation results for image groups 105 and 151 in the SR-RAW dataset. These image groups each contain one 49-mm-focal-length image, four 35-mm-focal-length images, and four 24-mm-focal-length images. Because the resolution of the ground-truth image is 4 $\times$ that of the low-resolution images, the magnification of the SR reconstruction was set to four in the simulations, so that the ground-truth and SR images have the same size and full-reference image evaluation can be performed. The SR reconstruction results in Figs. 8 and 9 show that the LRSR images clearly contain more noise, and the edges of the L1BTVSR and L2TVSR images are blurred. The SR images obtained by the proposed method outperform those of L1BTVSR and L2TVSR in terms of object edges and detailed texture. The BSRGAN images show optimized edge details, which is characteristic of deep-learning SR methods; however, the recovered high-frequency texture alone cannot determine the merit of an SR method, and the recovered text in Fig. 8(g) and Fig. 9(g) contains more artifacts. The selfDZSR images exhibit minimal noise and artifacts but lag behind the other MISR algorithms in detail restoration. Overall, the method proposed in this paper is superior to the other methods in terms of image detail recovery, edge preservation, and denoising.


Fig. 8. SR images of image No. 105 in the SR-RAW dataset [41] at a magnification of 4$\times$. (a) Ground truth; (b) 49-mm-focal-length image; (c) bicubic interpolation of (b); and SR images reconstructed using (d) LRSR, (e) L1BTVSR, (f) L2TVSR, (g) BSRGAN, (h) selfDZSR, and (i) the proposed method. Close-ups from (a-i) are represented by (j-r) respectively.



Fig. 9. SR images of image No. 151 in the SR-RAW dataset [41] at a magnification of 4$\times$. (a) Ground truth; (b) 49-mm-focal-length image; (c) bicubic interpolation of (b); and SR images reconstructed using (d) LRSR, (e) L1BTVSR, (f) L2TVSR, (g) BSRGAN, (h) selfDZSR, and (i) the proposed method. Close-ups from (a-i) are represented by (j-r) respectively.


Tables 5 and 6 show the PSNR and SSIM of the SR processing results of 12 sets of data in the SR-RAW dataset. The optimal values appear in bold and suboptimal values in italics.


Table 5. PSNRs of the SR processing results of 12 sets of data in the SR-RAW dataset [41]


Table 6. SSIMs of the SR processing results of 12 sets of data in the SR-RAW dataset [41]

In most cases, the PSNR and SSIM of the SR results produced by the proposed method are better than those obtained using the other methods, and the average value is the best. The above experimental results demonstrate that the method in this paper can provide better SR reconstruction results, especially in terms of the PSNR and SSIM, compared to the other SR methods.
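For completeness, these full-reference metrics can be computed with standard library routines; the sketch below uses scikit-image, with an 8-bit data range assumed on our part:

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(gt, sr):
    """PSNR and SSIM of an SR result against the ground truth (8-bit color images assumed)."""
    psnr = peak_signal_noise_ratio(gt, sr, data_range=255)
    ssim = structural_similarity(gt, sr, data_range=255, channel_axis=-1)
    return psnr, ssim
```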

The running time of each algorithm and the number of iterations to reach convergence are shown in Table 7, and the minimum convergence time is marked in bold. Although the method proposed in this paper is not the fastest approach, it can provide the best PSNR and SSIM, balancing the reconstruction effect and computing power consumption.


Table 7. Running times of the algorithms (where the convergence error is set as shown, the maximum number of iterations is set to 50, and the input is nine 540$\times$720 images)

4.3 Ablation experiment

The algorithm in this study integrates local gradient constraint (LGC) and L1TV denoising terms into the conventional imaging model to enhance the quality of image reconstruction. This section performs ablation experiments to assess the effect of each prior term on image reconstruction, considering four combinations of prior terms:

  • 1. Solely the data consistency constraint term;
  • 2. The data consistency constraint combined with the L1TV denoising term;
  • 3. The data consistency constraint combined with the LGC term;
  • 4. The data consistency constraint combined with both the LGC and L1TV denoising terms.

Nine sets of images were randomly selected from the SR-RAW dataset for the ablation experiments. Each experiment ran for 20 iterations, with the other parameter settings kept consistent with the previous experiments. The reference ground truth image is shown in Fig. 10(a), and Fig. 10(b) depicts the low-resolution image at a 49 mm focal length. Reconstruction results using the above combinations of prior terms are shown in Fig. 10(c) to (f). Relying solely on the data consistency term to solve the MISR problem is not practical. The L1TV term yields images with lower noise but can smooth away details. The local gradient constraint term markedly improves image details, because it emphasizes the agreement between the image gradients and the expected gradients during reconstruction; while it enhances details, clarity, and the SR effect, it may also increase noise. This study therefore combines the local gradient constraint and the L1TV prior to strike a balance between gradient enhancement and image smoothing. The SR results obtained with both the LGC and L1TV priors strengthen image details while suppressing noise, and Table 8 shows that the proposed approach effectively removes noise and enhances image details through this combination. These comparative experiments provide useful guidance for further improvements to the imaging method and for selecting imaging strategies in specific applications.


Fig. 10. Ablation experiment results for image No. 148 in the SR-RAW dataset [41].



Table 8. Average of PSNR and SSIM for 9 validation image datasets

5. Conclusion

Inspired by the large field of view and central high-resolution imaging mechanism of the human visual system, a foveal SR imaging optical system was proposed, and a prototype imaging system was built for demonstration. The prototype was verified through simulations, indoor experiments, and field tests. The multi-focal-length images collected by the system were post-processed with the SR algorithm proposed in this paper to reconstruct a large-field-of-view, high-resolution foveal image. For these multi-focal-length images, an SR imaging method based on the local gradient constraint was proposed to achieve foveal SR imaging. The proposed system and algorithm were simulated using the public SR-RAW dataset, and the feasibility of the SR algorithm was verified: the average PSNR of the results obtained by the proposed method was approximately 0.32 dB higher than the sub-optimal value, confirming the effectiveness of the method. The prototype system was verified experimentally using the ISO 12233 resolution test chart, and field tests were conducted. Using the local-gradient-constraint-based SR algorithm to reconstruct the collected images, the spatial resolution in the foveal area of the SR result was improved by nearly 4 $\times$ compared to that of the image with the minimum focal length, which is close to the diffraction limit of the corresponding lens, and by approximately 1.5 $\times$ relative to the image with the maximum focal length. The field test results confirmed that the proposed bio-inspired imaging system and SR method provided better results than the other SR methods in terms of MTF, spatial resolution, and other metrics, while yielding sub-optimal NIQE values, confirming the effectiveness of the proposed method in practical applications. In conclusion, this paper provides an effective solution for multi-aperture, multi-focal-length SR imaging.

For future applications of the proposed method, certain limitations persist, particularly the computational cost of generating SR images, so improving calculation efficiency is paramount. The current approach of interpolating the projected array-image pixels on a uniform high-resolution grid is inefficient, because the sampling density in the peripheral field of view is much lower than in the central field of view, and interpolation there therefore wastes computation.

To address this challenge, future work will focus on refining the algorithm. Specifically, the aim is to extend the prior knowledge of HR images from the central to the peripheral field of view, thereby reducing computational redundancy. This strategic adjustment is anticipated to enhance SR imaging performance while alleviating the computational burden.

Funding

Natural Science Foundation of Fujian Province (2023J01130137).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data may be obtained from the authors upon reasonable request.

References

1. Y. Mengbei, W. Hongjuan, L. Mengyang, et al., “Overview of research on image super-resolution reconstruction,” in IEEE Int. Conf. Inf. Commun. Softw. Eng.(ICICSE), (IEEE, CTU, CN., 2021), pp. 131–135.

2. S. M. A. Bashir, Y. Wang, M. Khan, et al., “A comprehensive review of deep learning-based single image super-resolution,” PeerJ Comput. Sci. 7, e621 (2021). [CrossRef]  

3. J. Tanida, T. Kumagai, K. Yamada, et al., “Thin observation module by bound optics (TOMBO): concept and experimental verification,” Appl. Opt. 40(11), 1806–1813 (2001). [CrossRef]  

4. R. Hartley and A. Zisserman, Multiple view geometry in computer vision (Cambridge university press, 2003).

5. Y. Kitamura, R. Shogenji, K. Yamada, et al., “Reconstruction of a high-resolution image on a compound-eye image-capturing system,” Appl. Opt. 43(8), 1719–1727 (2004). [CrossRef]  

6. M. P. Christensen, V. Bhakta, D. Rajan, et al., “Adaptive flat multiresolution multiplexed computational imaging architecture utilizing micromirror arrays to steer subimager fields of view,” Appl. Opt. 45(13), 2884–2892 (2006). [CrossRef]  

7. M. Somayaji, M. P. Christensen, E. Faramarzi, et al., “Field test of panoptes-based adaptive computational imaging system prototype,” in Imag. Appl. Opt. Tech. Dig., (Optica Publishing Group, 2011), p. CPDP3.

8. G. Carles, J. Downing, and A. R. Harvey, “Super-resolution imaging using a camera array,” Opt. Lett. 39(7), 1889–1892 (2014). [CrossRef]  

9. G. Carles, G. Muyo, N. Bustin, et al., “Compact multi-aperture imaging with high angular resolution,” J. Opt. Soc. Am. A 32(3), 411–419 (2015). [CrossRef]  

10. G. Carles, S. Chen, N. Bustin, et al., “Multi-aperture foveated imaging,” Opt. Lett. 41(8), 1869–1872 (2016). [CrossRef]  

11. X. Liu, L. Chen, W. Wang, et al., “Robust multi-frame super-resolution based on spatially weighted half-quadratic estimation and adaptive btv regularization,” IEEE Trans. on Image Process. 27(10), 4971–4986 (2018). [CrossRef]  

12. A. García-Díaz, R. Mendez-Rial, and A. Souto-López, “Embedded video rate super-resolution in the infrared with a low-cost multi-aperture camera,” in Unconventional Opt. Imag., (2018).

13. M. Elad and A. Feuer, “Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images,” IEEE Trans. on Image Process. 6(12), 1646–1658 (1997). [CrossRef]  

14. S. Farsiu, M. D. Robinson, M. Elad, et al., “Fast and robust multiframe super resolution,” IEEE Trans. on Image Process. 13(10), 1327–1344 (2004). [CrossRef]  

15. M. Haris, M. R. Widyanto, and H. Nobuhara, “Inception learning super-resolution,” Appl. Opt. 56(22), 6043–6048 (2017). [CrossRef]  

16. J. Cai, H. Zeng, H. Yong, et al., “Toward real-world single image super-resolution: A new benchmark and a new model,” in Proc. IEEE/CVF Int. Conf. Comput. Vision (ICCV), (2019), pp. 3086–3095.

17. X. Xu, Y. Ma, and W. Sun, “Towards real scene super-resolution with raw images,” in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recog. (CVPR), (2019), pp. 1723–1731.

18. X. Ji, Y. Cao, Y. Tai, et al., “Real-world super-resolution via kernel estimation and noise injection,” in proc. IEEE/CVF conf. comput. vision pattern recog. works. (CVPRW), (2020), pp. 466–467.

19. K. Zhang, J. Liang, L. Van Gool, et al., “Designing a practical degradation model for deep blind image super-resolution,” in Proc. IEEE/CVF Int. Conf. Comput. Vision (ICCV), (2021), pp. 4791–4800.

20. H. Chen, X. He, L. Qing, et al., “Real-world single image super-resolution: A brief review,” Inf. Fusion 79, 124–145 (2022). [CrossRef]  

21. Z. Zhang, R. Wang, H. Zhang, et al., “Self-supervised learning for real-world super-resolution from dual zoomed observations,” in Proc. Eur. Conf. Comput. Vis. (ECCV), (Springer, 2022), pp. 610–627.

22. S. L. Matthews, A. Uribe-Quevedo, and A. Theodorou, “Rendering optimizations for virtual reality using eye-tracking,” in Proc. Symp. Virtual Augment. Real.(SVR), (IEEE, 2020), pp. 398–405.

23. Q. Hao, Y. Tao, J. Cao, et al., “Retina-like imaging and its applications: A brief review,” Appl. Sci. 11(15), 7058 (2021). [CrossRef]  

24. F. Huang, H. Ren, X. Wu, et al., “Flexible foveated imaging using a single risley-prism imaging system,” Opt. Express 29(24), 40072–40090 (2021). [CrossRef]  

25. M. Tistarelli and G. Sandini, “On the estimation of depth from motion using an anthropomorphic visual sensor,” in Proc. Eur. Conf. Comput. Vis. (ECCV), (Springer, 1990), pp. 209–225.

26. G. Sandini, P. Questa, D. Scheffer, et al., “A retina-like CMOS sensor and its applications,” in Proc. 2000 IEEE Sensor Array Multichannel Signal Process. Workshop (SAM 2000), (IEEE, 2000), pp. 514–519.

27. H. Gao, Q. Hao, X. Jin, et al., “Circuit design for the retina-like image sensor based on space-variant lens array,” in 2013 Int. Conf. Opt. Instrum. Technol. (OIT 2013), vol. 9045 (SPIE, 2013), pp. 402–409.

28. S. Thiele, K. Arzenbacher, T. Gissibl, et al., “3D-printed eagle eye: Compound microlens system for foveated imaging,” Sci. Adv. 3(2), e1602655 (2017). [CrossRef]  

29. D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vision 60(2), 91–110 (2004). [CrossRef]  

30. Q. Song, R. Xiong, D. Liu, et al., “Fast image super-resolution via local adaptive gradient field sharpening transform,” IEEE Trans. on Image Process. 27(4), 1966–1980 (2018). [CrossRef]  

31. K. He, J. Sun, and X. Tang, “Guided image filtering,” IEEE Trans. Pattern Anal. Mach. Intell. 35(6), 1397–1409 (2013). [CrossRef]  

32. X. Yan, H. Qin, J. Li, et al., “Multi-focus image fusion using a guided-filter-based difference image,” Appl. Opt. 55(9), 2230–2239 (2016). [CrossRef]  

33. X. Fu, Z. Lin, Y. Huang, et al., “A variational pan-sharpening with local gradient constraints,” in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recog. (CVPR), (2019), pp. 10265–10274.

34. H. Lu, Y. Yang, S. Huang, et al., “A unified pansharpening model based on band-adaptive gradient and detail correction,” IEEE Trans. on Image Process. 31, 918–933 (2022). [CrossRef]  

35. X. Liu, D. Zhai, R. Chen, et al., “Depth super-resolution via joint color-guided internal and external regularizations,” IEEE Trans. on Image Process. 28(4), 1636–1645 (2019). [CrossRef]  

36. S. Li, X. Kang, and J. Hu, “Image fusion with guided filtering,” IEEE Trans. on Image Process. 22(7), 2864–2875 (2013). [CrossRef]  

37. D. N. H. Thanh, N. N. Hien, S. Prasath, et al., “Adaptive total variation l1 regularization for salt and pepper image denoising,” Optik 208, 163677 (2020). [CrossRef]  

38. M. Yang, “Total variation regularization and fast algorithms based on alternating direction method,” in Chinese Control Conf. (CCC), (IEEE, 2014), pp. 4866–4869.

39. H. Hwang, Y.-W. Choi, S. Kwak, et al., “Mtf assessment of high resolution satellite images using iso 12233 slanted-edge method,” in Proc. SPIE Int. Soc. Opt. Eng., vol. 7109 (SPIE, 2008), pp. 34–42.

40. M. Ye, B. Wang, M. Uchida, et al., “Focus tuning by liquid crystal lens in imaging system,” Appl. Opt. 51(31), 7630–7635 (2012). [CrossRef]  

41. X. Zhang, Q. Chen, R. Ng, et al., “Zoom to learn, learn to zoom,” in Proc. IEEE/CVF Conf. Comput. Vision Pattern Recog. (CVPR), (2019), pp. 3762–3770.

42. A. Mittal, A. K. Moorthy, and A. C. Bovik, “No-reference image quality assessment in the spatial domain,” IEEE Trans. on Image Process. 21(12), 4695–4708 (2012). [CrossRef]  

43. A. Mittal, R. Soundararajan, and A. C. Bovik, “Making a ‘completely blind’ image quality analyzer,” IEEE Signal Process. Lett. 20(3), 209–212 (2013). [CrossRef]

44. Y. Li, N. Chen, and J. Zhang, “Fast and high sensitivity focusing evaluation function,” Appl. Research Comput. 27, 1534–1536 (2010).

45. Y. Fang, K. Ma, Z. Wang, et al., “No-reference quality assessment of contrast-distorted images based on natural scene statistics,” IEEE Signal Process. Lett. 22(7), 838–842 (2014). [CrossRef]  

Data availability

Data may be obtained from the authors upon reasonable request.

Figures (10)

Fig. 1. Foveal vision of the human eye. (a) The spatial resolution of human vision gradually increases from the periphery toward the center. (b) This property arises from the higher density of cone cells in the foveal region of the eye; it helps people survey the surrounding environment while making it easier to obtain detailed information about the region of interest.
Fig. 2. The proposed SR imaging process of the MFLSRI system, the schematic configuration of the MFLSRI system, and the imaging angular-resolution curves. (a) Schematic diagram of the super-resolution reconstruction principle of the proposed MFLSRI system: image pixels captured by the array cameras are mapped onto the high-resolution grid of the reference image, distance-weighted interpolation is performed on the grid, and the pixel values are then aggregated within each grid cell (a minimal sketch of this mapping step is given after the figure list). (b) The MFLSRI system; the green, blue, and yellow lenses are 16-, 25-, and 35-mm-focal-length lenses, respectively. (c) Angular-resolution comparison of the different lens types and the proposed bio-inspired MFLSRI system; green, blue, and yellow indicate the sampling frequencies of the 16-, 25-, and 35-mm-focal-length cameras, respectively, and red indicates the sampling frequency of the proposed MFLSRI system.
Fig. 3. The imaging principle of a single camera. When the HFOV, focal length, and pixel size are known, the number of sampling points in the horizontal direction can be determined.
Fig. 4. Linear relationship between the gradients of images at different focal lengths. (a) Local details of the same scene (inside the dotted frame) captured with lenses of different focal lengths; the focal lengths of the three images are 35 mm, 25 mm, and 16 mm, from left to right. (b) Gradient scans of the three images at line 322, where red, blue, and green correspond to the gradients along the red, blue, and green horizontal lines in the three images in (a), respectively.
Fig. 5. Comparison of the spatial resolution of the central-fovea SR image obtained by the proposed imaging device with images at other focal lengths. (a) Depiction of the central-fovea SR image: the region bounded by the red dashed box is the central view, the region between it and the green dashed box is the transitional view, and the area outside is the peripheral view. (b), (c), and (d) present detailed views of the LR images within the blue, green, and red solid boundaries, corresponding to focal lengths of 16 mm, 25 mm, and 35 mm, respectively. (e), (f), and (g) are SR images at the same locations as (b), (c), and (d).
Fig. 6. SR results for images captured by the MFLSRI system. (a) Images acquired during the field tests; (b) image acquired by the 35-mm-focal-length lens; (c) bicubic interpolation of (b); and SR images obtained using (d) LRSR, (e) L1BTVSR, (f) L2TVSR, (g) BSRGAN, (h) selfDZSR, and (i) the proposed method. The magnification is 4$\times$.
Fig. 7. SR results for ISO 12233 resolution test chart images. The magnification is 4$\times$. (a) GT image; (b) image acquired by the 35-mm-focal-length lens; (c) bicubic interpolation of (b); and SR images obtained using (d) LRSR, (e) L1BTVSR, (f) L2TVSR, (g) BSRGAN, (h) selfDZSR, and (i) the proposed method.
Fig. 8. SR images of image No. 105 in the SR-RAW dataset [41] at a magnification of 4$\times$. (a) Ground truth; (b) 49-mm-focal-length image; (c) bicubic interpolation of (b); and SR images reconstructed using (d) LRSR, (e) L1BTVSR, (f) L2TVSR, (g) BSRGAN, (h) selfDZSR, and (i) the proposed method. Close-ups of (a)-(i) are shown in (j)-(r), respectively.
Fig. 9. SR images of image No. 151 in the SR-RAW dataset [41] at a magnification of 4$\times$. (a) Ground truth; (b) 49-mm-focal-length image; (c) bicubic interpolation of (b); and SR images reconstructed using (d) LRSR, (e) L1BTVSR, (f) L2TVSR, (g) BSRGAN, (h) selfDZSR, and (i) the proposed method. Close-ups of (a)-(i) are shown in (j)-(r), respectively.
Fig. 10. Ablation experiment results for image No. 148 in the SR-RAW dataset [41].
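
The grid-mapping and distance-weighted aggregation step summarized in the Fig. 2(a) caption can be illustrated with the short Python sketch below. It is a minimal example under assumed names and parameters (the function `splat_to_hr_grid`, the inverse-distance weight, and the toy coordinates are all illustrative), not the authors' implementation; it only shows LR samples, already registered into the HR reference frame, being splatted onto the HR grid and normalized by the accumulated weights.

```python
import numpy as np

def splat_to_hr_grid(lr_points, lr_values, hr_shape, eps=1e-6):
    """Distance-weighted aggregation of registered LR samples onto an HR grid.

    lr_points : (N, 2) array of sample coordinates (row, col) already mapped
                into the HR reference frame, in HR-pixel units.
    lr_values : (N,) array of the corresponding pixel intensities.
    hr_shape  : (rows, cols) of the HR grid.
    """
    acc = np.zeros(hr_shape, dtype=np.float64)    # weighted intensity sum
    wsum = np.zeros(hr_shape, dtype=np.float64)   # accumulated weights per cell
    for (r, c), v in zip(lr_points, lr_values):
        r0, c0 = int(np.floor(r)), int(np.floor(c))
        # distribute each sample to its four neighbouring HR grid nodes
        for rr in (r0, r0 + 1):
            for cc in (c0, c0 + 1):
                if 0 <= rr < hr_shape[0] and 0 <= cc < hr_shape[1]:
                    d = np.hypot(r - rr, c - cc)
                    w = 1.0 / (d + eps)           # inverse-distance weight
                    acc[rr, cc] += w * v
                    wsum[rr, cc] += w
    return acc / np.maximum(wsum, eps)            # normalized HR estimate

# toy usage: four registered samples splatted onto an 8 x 8 HR grid
pts = np.array([[1.2, 1.7], [1.8, 2.1], [5.4, 5.9], [6.1, 6.2]])
vals = np.array([0.2, 0.4, 0.9, 1.0])
hr = splat_to_hr_grid(pts, vals, (8, 8))
```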

Tables (8)

Table 1. Specific imaging parameters of the proposed device

Table 2. Framework of the proposed local-gradient-constraint-based SR algorithm

Table 3. Average no-reference image metrics calculated using the SR results of all captured multi-focal-length image groups

Table 4. Image MTF measure calculated using the SR results shown in Fig. 7

Table 5. PSNRs of the SR processing results of 12 sets of data in the SR-RAW dataset [41]

Table 6. SSIMs of the SR processing results of 12 sets of data in the SR-RAW dataset [41]

Table 7. Running time of the algorithms (convergence error as shown, maximum of 50 iterations, and nine 540 × 720 input images)

Table 8. Average of PSNR and SSIM for 9 validation image datasets

Equations (24)

$\delta_N = 4\theta_h \theta_v / N,$
$N = \frac{4 \tan\theta_h \tan\theta_v}{p^2} \sum_{k=1}^{K} f_k^2.$
$v_N = \left( 2p / f_k \right)^2.$
$m = \frac{\left( 2p / f \right)^2}{\left( 2p / f_k \right)^2} = f_k^2 / f^2.$
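
As a quick numerical reading of the sampling relations above, the following sketch evaluates the total sampling count $N$, the mean angular sampling interval $\delta_N$, and the relative resolution gain $m$ for the 16/25/35 mm lenses. The half-field-of-view and pixel-pitch values are assumptions chosen only for illustration, not the calibrated parameters of the MFLSRI device.

```python
import numpy as np

# Illustrative parameters (assumed, not the paper's calibrated values)
theta_h = np.deg2rad(10.0)          # half field of view, horizontal (rad)
theta_v = np.deg2rad(7.5)           # half field of view, vertical (rad)
p = 3.45e-3                         # pixel pitch in mm
f = np.array([16.0, 25.0, 35.0])    # focal lengths in mm

# Total number of sampling points contributed by the lens set:
# N = (4 tan(theta_h) tan(theta_v) / p^2) * sum_k f_k^2
N = 4.0 * np.tan(theta_h) * np.tan(theta_v) / p**2 * np.sum(f**2)

# Mean angular sampling interval over the shared field of view
delta_N = 4.0 * theta_h * theta_v / N

# Resolution gain of each lens relative to the shortest focal length
m = f**2 / f[0]**2

print(f"N = {N:.3e} samples, delta_N = {delta_N:.3e}, m = {m}")
```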
$X_i = D_i B_i M_i T_i Y + n, \quad i \in \{1, 2, \ldots, 9\},$
$X_i = H_i Y + n,$
$\arg\min_{Y} \left\| HY - X \right\|,$
$\hat{Y} = \arg\min_{Y} \left\| HY - X \right\|^2 + \lambda \, LGC(Y) + \gamma \left\| Y \right\|_1.$
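
The observation model above treats each LR capture as a chain of linear operators (geometric mapping, blur, decimation) applied to the latent HR image, plus noise. The sketch below simulates one such chain; the sub-pixel shifts, Gaussian blur width, decimation factor, and noise level are illustrative assumptions, and scipy's `shift`/`gaussian_filter` merely stand in for the true $T_i$, $M_i$, $B_i$, and $D_i$.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, shift

def observe(Y, dy, dx, sigma, s, noise_std=0.0, rng=None):
    """One simulated LR observation X_i = D_i B_i M_i T_i Y + n.

    T_i/M_i : sub-pixel translation (dy, dx), standing in for the
              geometric mapping between apertures,
    B_i     : Gaussian blur of width sigma,
    D_i     : decimation by an integer factor s,
    n       : additive Gaussian noise.
    """
    warped = shift(Y, (dy, dx), order=3, mode="nearest")   # T_i / M_i
    blurred = gaussian_filter(warped, sigma)               # B_i
    decimated = blurred[::s, ::s]                          # D_i
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        decimated = decimated + rng.normal(0, noise_std, decimated.shape)
    return decimated

# toy usage: nine shifted LR views of a random HR image
Y = np.random.rand(256, 256)
X = [observe(Y, dy=i / 3, dx=i / 4, sigma=1.2, s=4, noise_std=0.01)
     for i in range(9)]
```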
$LGC(Y) = \arg\min_{Y} \left\| Y - Y_{LGC} \right\|,$
$Y_{LGC}^{t} = A^{t-1} \hat{Y}^{t} + C^{t-1}.$
$y_{LGC,i} = a_k \hat{y}_i + c_k, \quad \forall i \in w_k,$
$\arg\min_{a_k^{t}, c_k^{t}} \sum_{i \in w_k} \left( y_{LGC,i}^{t} - a_k^{t} \hat{y}_i^{t-1} - c_k^{t} \right)^2,$
$a_k^{t} = \frac{\frac{1}{|w|} \sum_{i \in w_k} y_{LGC,i}^{t} \hat{y}_i^{t-1} - \mu\left( y_{LGC,i}^{t} \right) \mu\left( \hat{y}_i^{t-1} \right)}{\sigma^2\left( \hat{y}_i^{t-1} \right) + \varepsilon},$
$c_k^{t} = \mu\left( y_{LGC,k}^{t} \right) - a_k^{t} \mu\left( \hat{y}_k^{t-1} \right),$
$y_{LGC,i} = \bar{a}_i \hat{y}_i + \bar{c}_i,$
$\arg\min_{Y} \left\| Y \right\|_1.$
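
The window coefficients $a_k$ and $c_k$ above have the closed form of the standard guided filter [31], which can be evaluated with box filters. The following sketch is a generic guided-filter implementation under its own assumed window radius and regularizer $\varepsilon$, using the current SR estimate as the guide; it is not the paper's exact construction of $Y_{LGC}$.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(guide, src, radius=4, eps=1e-3):
    """Standard guided filtering [31]: src is smoothed while inheriting the
    local linear structure of guide (per-window y = a * guide + c)."""
    size = 2 * radius + 1
    mean = lambda x: uniform_filter(x, size=size, mode="reflect")

    mu_g, mu_s = mean(guide), mean(src)
    cov_gs = mean(guide * src) - mu_g * mu_s     # cross term in a_k
    var_g = mean(guide * guide) - mu_g * mu_g    # sigma^2 in the denominator

    a = cov_gs / (var_g + eps)                   # a_k per window
    c = mu_s - a * mu_g                          # c_k per window
    a_bar, c_bar = mean(a), mean(c)              # average over overlapping windows
    return a_bar * guide + c_bar                 # y_i = a_bar_i * guide_i + c_bar_i

# toy usage
y_hat = np.random.rand(128, 128)        # current SR estimate (guide)
ref_detail = np.random.rand(128, 128)   # detail/reference layer to constrain
y_lgc = guided_filter(y_hat, ref_detail)
```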
$\hat{Y} = \arg\min_{Y} \left\| HY - X \right\|^2 + \lambda \left\| Y - Y_{LGC} \right\|^2 + \gamma \left\| Y \right\|_1.$
$\hat{Y} = \min_{Y} \left\| HY - X \right\|^2 + \lambda \left\| Y - Y_{LGC} \right\|^2 + \gamma \left\| u \right\|_1, \quad \text{s.t.}\ u = Y,$
$\hat{Y} = \min_{Y} \left\| HY - X \right\|^2 + \lambda \left\| Y - Y_{LGC} \right\|^2 + \gamma \left\| u \right\|_1 + \frac{\alpha}{2} \left\| u - Y + \frac{z}{\alpha} \right\|^2.$
$\frac{\partial \hat{Y}}{\partial Y} = H^{T}\left( HY - X \right) + \lambda^{T}\left( Y - Y_{LGC} \right) + \alpha^{T}\left( u - Y + \frac{z}{\alpha} \right).$
$\hat{Y}^{t+1} = \hat{Y}^{t} - \frac{\partial \hat{Y}^{t}}{\partial Y}.$
$u^{t+1} = \max\left( \left| v^{t+1} \right| - \frac{\gamma}{\alpha^{t}}, 0 \right) \operatorname{sign}\left( v^{t+1} \right),$
$z^{t+1} = z^{t} + \alpha^{t}\left( u^{t+1} - \hat{Y}^{t+1} \right).$
$\alpha^{t+1} = \alpha^{t} \rho,$
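
Read together, the last few equations describe a split-variable iteration: a gradient step on the data-fidelity and LGC terms, a soft-thresholding (shrinkage) step for the L1 term, a multiplier update, and a gradually growing penalty $\alpha$. A compact sketch of such a loop is given below; the operator $H$ and its adjoint, the explicit step size, and all parameter values are assumptions added for illustration and do not reproduce the authors' released code.

```python
import numpy as np

def soft_threshold(v, tau):
    """u = max(|v| - tau, 0) * sign(v)."""
    return np.maximum(np.abs(v) - tau, 0.0) * np.sign(v)

def sr_admm(X, H, Ht, y_lgc, lam=0.1, gam=0.01, alpha=1.0, rho=1.2,
            step=0.1, iters=50):
    """Sketch of alternating updates for
    min_Y ||HY - X||^2 + lam*||Y - Y_LGC||^2 + gam*||u||_1, s.t. u = Y."""
    Y = Ht(X)                      # crude initialization from the adjoint
    u = Y.copy()
    z = np.zeros_like(Y)
    for _ in range(iters):
        # gradient of the smooth terms plus the augmented penalty term
        grad = Ht(H(Y) - X) + lam * (Y - y_lgc) - alpha * (u - Y + z / alpha)
        Y = Y - step * grad        # Y^{t+1} = Y^t - step * (dJ/dY)
        u = soft_threshold(Y - z / alpha, gam / alpha)   # L1 proximal step
        z = z + alpha * (u - Y)    # multiplier (dual) update
        alpha = alpha * rho        # tighten the penalty each iteration
    return Y

# toy usage with H as the identity (denoising-style sanity check)
img = np.random.rand(64, 64)
out = sr_admm(img, H=lambda y: y, Ht=lambda x: x, y_lgc=img)
```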