
Snapshot polarimetric diffuse-specular separation

Open Access

Abstract

We present a polarization-based approach to perform diffuse-specular separation from a single polarimetric image, acquired using a flexible, practical capture setup. Our key technical insight is that, unlike previous polarization-based separation methods that assume completely unpolarized diffuse reflectance, we use a more general polarimetric model that accounts for partially polarized diffuse reflections. We capture the scene with a polarimetric sensor and produce an initial analytical diffuse-specular separation that we further pass into a deep network trained to refine the separation. We demonstrate that our combination of analytical separation and deep network refinement produces state-of-the-art diffuse-specular separation, which enables image-based appearance editing of dynamic scenes and enhanced appearance estimation.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The appearance of an object depends on light interactions with its materials and geometry. Many common materials can be accurately modeled as a combination of specular reflections and a diffuse component. Numerous computer vision and graphics algorithms are greatly simplified under the assumption of only diffuse (or specular) reflectance. For this reason, several approaches have been developed to separate the diffuse and specular components of surface appearance. Diffuse-specular separation is inherently ill-posed and challenging. As a result, most existing work leverages multiple input images, either exploiting their spatial frequency response [2] or properties of diffuse chromaticity [3]. Some approaches attempt to perform single-image diffuse-specular separation; these methods are by definition ill-posed and require strong assumptions that severely limit their practical applicability [4,5].

Polarization cues are commonly used to aid diffuse-specular separation, as specular reflections remain polarized when lit by polarized illumination, while diffuse reflection loses most of its polarization due to subsurface scattering. To exploit this, two (or more) images are captured while rotating a linear polarizing filter in front of the camera. When this filter is set perpendicular or parallel to the light polarization, the observed image respectively blocks or retains specular reflections, providing an approximate separation.

However, many methods based on this observation make simplifying assumptions about polarimetric reflectance. In particular, they assume the diffuse component to be purely unpolarized [1]. The acquired measurements are also assumed to have no sensor saturation. Neither of these assumptions is true when acquiring single-shot polarimetric reflectance for most real-world objects, resulting in imperfect separation (Fig. 1(c)). These techniques require capturing multiple images with different rotations of the polarizer, which is impractical for hand-held setups or when capturing dynamic scenes.


Fig. 1. Separating diffuse and specular reflectance from a scene such as (a) is an ill-posed problem. We propose a single-shot solution to this problem that utilizes polarization cues and data-driven priors. Our capture setup (b) comprises a polarimetric camera and a polarized point source. Existing polarization-based analytical methods such as (c) [1] assume the specular component is completely polarized while the diffuse component is completely unpolarized. We demonstrate that this assumption does not hold for realistic scenes, resulting in imperfect separation. Training feed-forward networks to directly perform this separation from polarized measurements performs better (d) but still fails to suppress all artifacts. Our approach (e) incorporates polarization cues along with a data-driven neural model and provides accurate diffuse-specular separation.


In this work, we introduce a neural polarimetric diffuse-specular separation method that correctly handles partially polarized diffuse reflectance and requires only a single polarimetric capture. We capture the scene with a polarimetric sensor and analytically recover an initial estimate of the diffuse-specular components. We then refine these estimates using a feed-forward deep network trained on a large-scale synthetic dataset of complex materials. We adopt the polarimetric Bidirectional Reflectance Distribution Function (pBRDF) model of [6] to derive the initial analytical separation and to produce synthetic data to train our model.

We demonstrate that our two-stage initial separation + network refinement step (Fig. 1(e)) outperforms directly learning the separation from polarized measurements (d) and leads to state-of-the-art diffuse-specular results on a wide range of both synthetic test examples and real images. We make the following contributions:

  • We develop a state-of-the-art diffuse-specular separation method that combines a physics-based analytic initialization with a deep feed-forward neural network refinement stage. To our knowledge, this is the first snapshot polarimetric diffuse-specular separation approach that accounts for accurate pBRDFs and works with a single, uncalibrated lighting polarizer orientation. Existing approaches either assume an unpolarized diffuse component or require multiple measurements with calibrated rotations of the imaging and lighting polarizers.
  • Our practical single-capture setup enables real-time polarimetric acquisition. We showcase diffuse-specular separation and image-based appearance editing of real dynamic scenes. We also show that our diffuse-specular separation can be employed as a preprocessing step to enhance existing appearance acquisition.

2. Related work

Intensity-based separation. Many intensity-based methods for diffuse-specular separation are based on the dichromatic reflectance model [7], which postulates that the specular component has a constant color throughout the image while the diffuse color varies spatially. Using this, separation can be achieved either by solving a PDE [3,8], leveraging the dark channel prior [5], or using non-negative matrix factorization techniques [9]. Liu et al. [10] propose a two-step approach to refine an initial over-saturated image through chromaticity propagation using linear programming. Souza et al. [11] propose a real-time specular highlight removal technique using pixel clustering, while Guo et al. [12] remove specularities using a low-rank reflection model. More recently, deep learning was used to remove specularities in an image using either a CNN [13], a GAN [14], or an image-to-image translation approach [15].

Another closely related body of work is intrinsic image decomposition [16], where most methods build on the assumptions of Retinex theory [17] that natural images have smooth shading and piece-wise constant reflectance. Many methods aim to solve these constraints using either optimization [18–20] or deep learning approaches [21].

Polarization-based separation. Diffuse-specular separation using polarization has been considered since the late 1980s [23,24], based on the observation that diffuse reflection tends to result in unpolarized reflectance while specular reflections tend to polarize the light toward the normal to the plane of incidence. Nayar et al. [25] further extend the method by using polarization to determine the color of the specular component and solve for the diffuse color using neighboring pixels. Debevec et al. [1] proposed to acquire the reflectance of human faces using polarization. They note that specular reflections from human skin preserve polarization, while scattering interactions in the epidermal and dermal layers depolarize diffuse reflectance. Similarly, Riviere et al. [26] and Deschaintre et al. [27] use a polarizing filter to estimate surface reflectance properties, including diffuse and specular maps. These methods make the simplifying assumption that reflected light’s polarization is entirely due to specular reflections, while diffuse reflections are unpolarized.

The unpolarized diffuse assumption fails to account for certain effects in natural images; to address this, the polarimetric BRDF model [6] was recently proposed to model polarization in both specular and diffuse reflections. Baek et al. [6] also separate diffuse and specular components by capturing multiple measurements with varying imaging and lighting polarizers. Ding et al. [22] propose simple arithmetic operations for separating out unpolarized diffuse, polarized diffuse, and polarized specular components by capturing two polarization images with orthogonal lighting polarizers. These approaches require capturing multiple images by rotating a calibrated polarizer on the light source, prohibiting their application in real-world dynamic scenes. Our proposed method handles diffuse polarization and only requires a single polarization image with an uncalibrated lighting polarizer. Table 1 summarizes the key distinctions of our approach from existing polarimetric diffuse-specular separation methods.


Table 1. Salient features differentiating our approach from existing polarimetric diffuse-specular separation techniques.

Snapshot polarimetric camera. Polarization has been successfully used for multiple tasks related to 3D geometry and surface reflectance, including multi-view stereo [28–31], Simultaneous Localization and Mapping (SLAM) [32], shape estimation [22,33–35], reflectance acquisition [36], and reflection removal [37]. In these works, the polarimetric acquisitions are typically performed using an intensity-based camera capturing multiple images while rotating a linear polarizer in front of the camera, unfortunately prohibiting fast or real-time acquisition.

The advent of single-shot polarization sensors, such as the Sony IMX250MZR (monochrome) and IMX250MYR (color) [38], has made polarimetric acquisition faster and more practical. These sensors have a grid of polarizers oriented at different angles attached to the CMOS sensor, enabling the capture of polarimetric cues at the expense of spatial resolution. Various techniques have been proposed for polarization and color demosaicking of the raw sensor measurements [39,40]. In this work, we use the Sony IMX250MYR color polarized sensor for snapshot polarimetric acquisition, enabling us to acquire polarization cues of real-world dynamic scenes at 50 frames per second.

3. Problem formulation

3.1 Theory of polarized reflectance

Stokes vector. The polarization state of a light ray incident on or exiting a point ${\mathbf {x}}$ at incoming or outgoing angle ${\omega }$ can be represented as a Stokes vector ${\mathbf {s}}({\mathbf {x}}, {\omega }) \in \mathbb {R}^{4}$, which we denote $\left [ s_0, s_1, s_2, s_3 \right ]$ [41]. Each $s_i$ represents a different amount of radiance: unpolarized ($s_0$), linearly polarized along the vertical/horizontal axes ($s_1$) or the diagonal axes ($s_2$), and circularly polarized light ($s_3$). When no circular polarization is present, the Stokes vector can be represented as

$${\mathbf{s}} = \tau \begin{bmatrix} 1 & \beta\cos(2\phi) & \beta \sin(2\phi) & 0 \end{bmatrix}^{T} \;,$$
where $\tau$ is the unpolarized radiance, $\beta \! \in \! \left [ 0, 1 \right ]$ the degree of polarization, and $\phi$ the angle of polarization. These parameters $\tau$, $\beta$ and $\phi$ relate to the Stokes vector components $s_0$, $s_1$ and $s_2$ as
$$\tau = s_0 \text{ , } \; \beta = \frac{\sqrt{s_1^{2}+s_2^{2}}}{s_0} \text{ , and } \; \phi = \frac{1}{2}\tan^{-1} \left(\frac{s_2}{s_1}\right) \; .$$
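For concreteness, the conversion from Stokes components to these cues can be sketched in a few lines of NumPy; using `arctan2` instead of the plain arctangent of Eq. (2) is our choice for numerical robustness, not part of the original formulation.

```python
import numpy as np

def stokes_to_cues(s):
    """Convert a linear Stokes vector [s0, s1, s2, s3] into (tau, beta, phi), cf. Eq. (2).

    tau  : unpolarized (total) radiance
    beta : degree of linear polarization in [0, 1]
    phi  : angle of linear polarization in radians
    """
    s0, s1, s2 = s[0], s[1], s[2]
    tau = s0
    beta = np.sqrt(s1**2 + s2**2) / s0
    phi = 0.5 * np.arctan2(s2, s1)  # arctan2 resolves the quadrant of (s1, s2)
    return tau, beta, phi
```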

Mueller matrices. To model surface interactions of polarized light, we follow [6] and use Mueller matrices ${\mathbf {H}} \in \mathbb {R}^{4\times 4}$. For a given inbound Stokes vector ${\mathbf {s}}_\mathrm {in}$, the outgoing Stokes vector after interaction is given by ${\mathbf {s}}^{\mathrm {out}} = {\mathbf {H}} {\mathbf {s}}^{\mathrm {in}}$.

Polarized point illumination. We consider the scene to be lit by a completely linearly polarized light source at location ${\mathbf {l}}$. The light source emits light with polarization angle $\nu ^{l}$ and intensity $\tau ^{l}$ onto a scene point ${\mathbf {x}}$ visible to the camera. The input Stokes vector can be described as

$${\mathbf{s}}^{\mathrm {in}} = \tau^{l}\begin{bmatrix} 1 & \cos(2\nu^{l}) & \sin(2\nu^{l}) & 0 \end{bmatrix}^{T} \;.$$

Polarimetric BRDF. According to the polarimetric BRDF model [6], surface interactions are divided into diffuse and specular components. The diffuse reflectance at point ${\mathbf {x}}$ on the material is modelled as a Mueller matrix, ${\mathbf {H}}^{\mathrm d}({\mathbf {x}}, {\omega }_i, {\omega }_o)$, which combines the effects of Fresnel transmission into the material, de-polarization, and Fresnel transmission out of the material. The specular polarimetric BRDF, ${\mathbf {H}}^{\mathrm s}({\mathbf {x}}, {\omega }_i, {\omega }_o)$, is modelled as the Fresnel reflection at a specular microfacet on the surface.

We consider the scene to be composed of opaque dielectric materials with both diffuse and specular components. Since the dependence of polarimetric reflectance on the index of refraction is weak [35], it is assumed to be $1.5$ for all materials.

3.2 Stokes vector of exitant light

For diffuse interactions, the Stokes vector of the light exiting the surface is the product of the diffuse Mueller matrix ${\mathbf {H}}^{\mathrm d}$ and the input Stokes vector ${\mathbf {s}}^{\mathrm {in}}$, expressed as

$${\mathbf{s}}^{\mathrm d}={\mathbf{H}}^{\mathrm d} {\mathbf{s}}^{\mathrm {in}}.$$

From Eq. (1), we describe the diffuse exitant Stokes vector ${\mathbf {s}}^{\mathrm d}$ using the polarimetric cues $\tau ^{\mathrm {d}}$, $\beta ^{\mathrm d}$ and $\phi ^{\mathrm d}$ as

$${\mathbf{s}}^{\mathrm d} = \tau^{\mathrm d}\begin{bmatrix} 1 & \beta^{\mathrm d}\cos(2\phi^{\mathrm d}) & \beta^{\mathrm d}\sin(2\phi^{\mathrm d}) & 0 \end{bmatrix}^{T} \;.$$

Using ${\mathbf {H}}^{\mathrm d}$ from the pBRDF model and ${\mathbf {s}}^{\mathrm {in}}$ from Eq. (3), we derive the polarimetric cues of the diffuse Stokes vector as

$$\tau^{\mathrm d} = I^{\mathrm d}\tau^{l}T_o^{+}\left(T_i^{+}+T_i^{-}\gamma_{o+l}\right) \text{ , } \; \; \beta^{\mathrm d} = \frac{T_o^{-}}{T_o^{+}} \; \text{ , and } \; \phi^{\mathrm d} = -\phi_o \;,$$
where $T_o^{+}$ and $T_o^{-}$ are calculated from the exitant Fresnel transmission coefficients, $T^{+}_i$ and $T^{-}_i$ are calculated from the incident Fresnel transmission coefficients, $\phi _\mathrm {o}$ is the exitant azimuth angle w.r.t. the surface normal, $I^{\mathrm d}$ is the diffuse shading term, and $\gamma _{o+l} = \cos \left (2\left (\phi _\mathrm {o}+\nu ^{l}\right )\right )$. Please refer to the Supplemental Document for the complete derivation and description of each term. Similarly, for specular interactions, the exitant Stokes vector is obtained as
$${\mathbf{s}}^{\mathrm s} = {\mathbf{H}}^{\mathrm s} {\mathbf{s}}^{\mathrm {in}} = \tau^{\mathrm s}\begin{bmatrix} 1 & \beta^{\mathrm s}\cos(2\phi^{\mathrm s}) & \beta^{\mathrm s}\sin(2\phi^{\mathrm s}) & 0 \end{bmatrix}^{T} \;.$$

We derive the specular polarimetric cues using ${\mathbf {H}}^{\mathrm s}$ from the pBRDF model [6] and $\nu ^{l}$ from Eq. (3) as

$$\begin{aligned} \tau^{\mathrm s} &= I^{\mathrm s}\tau^{l}\left(R^{+}+R^{-}\gamma_{i+l}\right) \;, \; \; \beta^{\mathrm s} = 1 \;, \\ \phi^{\mathrm s} &= -\tan^{-1}\left(\frac{R^{-}\chi_o+R^{+}\chi_o\gamma_{i+l} +R^{\times}\gamma_o\chi_{i+l}}{R^{-}\gamma_o+R^{+}\gamma_o\gamma_{i+l} +R^{\times}\chi_o\chi_{i+l}}\right) \;, \end{aligned}$$
where $R^{+}$, $R^{-}$ and $R^{\times }$ are calculated from the Fresnel reflection coefficients, $\varphi _\mathrm {i}$ and $\varphi _\mathrm {o}$ are the incident and exitant azimuth angles w.r.t. the half angle, $I^{\mathrm s}$ is the specular shading term, $\chi _{i+l} = \sin \left (2\left (\varphi _\mathrm {i}+\nu ^{l}\right )\right )$, $\gamma _{i+l} = \cos \left (2\left (\varphi _\mathrm {i}+\nu ^{l}\right )\right )$, $\chi _{o} = \sin \left (2\varphi _\mathrm {o}\right )$, and $\gamma _{o} = \cos \left (2\varphi _\mathrm {o}\right )$. Please refer to the Supplemental Document for the complete derivation and description of each term.

Since the incident light is completely linearly polarized, the specular reflection is completely linearly polarized as well, i.e. $\beta ^{\mathrm s}=1$, so we can drop $\beta ^{\mathrm s}$ from Eq. (7).

As light transport is linear, the net exitant Stokes vector is the sum of the diffuse (Eq. (4)) and specular (Eq. (7)) Stokes vectors,

$${\mathbf{s}}^{\mathrm {mix}} = {\mathbf{s}}^{\mathrm d} + {\mathbf{s}}^{\mathrm s} = \begin{bmatrix} \tau^{d} + \tau^{s} \\ \tau^{d}\beta^{d}\cos(2\phi^{d})+\tau^{s}\cos(2\phi^{s})\\ \tau^{d}\beta^{d}\sin(2\phi^{d})+\tau^{s}\sin(2\phi^{s})\\ 0\\ \end{bmatrix} .$$

From Eq. (2), the resulting Stokes vector ${\mathbf {s}}^{\mathrm {mix}}({\mathbf {x}}, {\mathbf {o}})$ can be described with $\tau ^{\mathrm {mix}}$, $\beta ^{\mathrm {mix}}$, and $\phi ^{\mathrm {mix}}$ as

$$\begin{aligned} \tau^{\mathrm{mix}} &= \tau^{\mathrm d} + \tau^{\mathrm s} \;, \\ \beta^{\mathrm{mix}} &= \dfrac{\sqrt{(\beta^{\mathrm d}\tau^{\mathrm d}+\tau^{\mathrm s})^{2}-2\beta^{\mathrm d}\tau^{\mathrm d}\tau^{\mathrm s}\sin^{2}(\phi^{\mathrm d}-\phi^{\mathrm s})}}{\tau^{\mathrm s}+\tau^{\mathrm d}} \;, \\ \phi^{\mathrm{mix}} &= \tan^{-1}\left( \dfrac{\tau^{\mathrm d}\beta^{\mathrm d}\sin(2\phi^{\mathrm d})+\tau^{\mathrm s}\sin(2\phi^{\mathrm s})}{\tau^{\mathrm d}\beta^{\mathrm d}\cos(2\phi^{\mathrm d})+\tau^{\mathrm s}\cos(2\phi^{\mathrm s})}\right) \;. \end{aligned}$$

The parameters of the exitant Stokes vector ${\mathbf {s}}^{\mathrm {mix}}$ can be used directly to form an initial estimate of the diffuse-specular separation, as explained in Sec. 4.2.
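To make the superposition concrete, the following sketch composes a mixed Stokes vector from assumed diffuse and specular cues and recovers the mixed parameters of Eq. (10); the numerical values are purely illustrative.

```python
import numpy as np

def cues_to_stokes(tau, beta, phi):
    """Build a linear Stokes vector from (tau, beta, phi), cf. Eq. (1)."""
    return tau * np.array([1.0, beta * np.cos(2 * phi), beta * np.sin(2 * phi), 0.0])

# Illustrative per-pixel components: partially polarized diffuse, fully polarized specular
s_d = cues_to_stokes(tau=0.8, beta=0.1, phi=np.deg2rad(30.0))
s_s = cues_to_stokes(tau=0.5, beta=1.0, phi=np.deg2rad(-20.0))

s_mix = s_d + s_s                                 # Eq. (9): Stokes vectors add linearly
tau_mix = s_mix[0]                                # Eq. (10)
beta_mix = np.sqrt(s_mix[1]**2 + s_mix[2]**2) / s_mix[0]
phi_mix = 0.5 * np.arctan2(s_mix[2], s_mix[1])
```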

3.3 Polarimetric camera measurements

The diffuse and specular light exiting the scene is incident on the camera. Polarization-based methods capture $K$ different polarization angles $\nu ^{c}_k$, with $k=1,2 \dots K$, where $K \geq 2$. This capture can either be done at once by a polarization-sensitive sensor or by mounting a linear polarizer on the camera lens and capturing $K$ images, rotating the polarizer between each capture. The polarized camera radiance measurements are given by $m({\mathbf {x}}, \nu ^{c}_k) = {\mathbf {p}}_k \cdot {\mathbf {s}}^{\mathrm {mix}}({\mathbf {x}},{\mathbf {o}})$, where $\mathbf {p}_k$ corresponds to the $k^{\text th}$ polarization angle, given by:

$${\mathbf{p}}_k = \frac{1}{2}\begin{bmatrix} 1 & \cos(2\nu^{c}_k) & \sin(2\nu^{c}_k) & 0 \end{bmatrix}^{T}.$$

Defining $\mathbf {P}({\boldsymbol {\nu }}^{c}) = [\mathbf {p}_1, \mathbf {p}_2,\ldots, \mathbf {p}_K]^{T}$ allows us to describe all polarization measurements as

$$\mathbf{m}({\mathbf{x}},{\boldsymbol{\nu}}^{c}) = \mathbf{P}({\boldsymbol{\nu}}^{c}) \cdot \mathbf{s}^{\mathrm {mix}}({\mathbf{x}},{\mathbf{o}}) \; ,$$
where ${\boldsymbol {\nu }}^{c} = [\nu ^{c}_1, \nu ^{c}_2, \dots, \nu ^{c}_K]$.

Given the fixed dynamic range of the sensor, these measurements may be clipped to a maximum value, here assumed to be 1. Thus the saturated measurements captured can be denoted as

$$\mathbf{m}^{sat}({\mathbf{x}},{\boldsymbol{\nu}}^{c}) = \min(\mathbf{m}({\mathbf{x}}, {\boldsymbol{\nu}}^{c}),1) \:.$$
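A minimal sketch of this measurement model, assuming the four analyzer angles of our sensor and radiance normalized so that the clipping level is 1, could look as follows (`s_mix` stands for the exitant Stokes vector of Eq. (9) at one pixel):

```python
import numpy as np

def analyzer_matrix(nu_c):
    """Stack the analyzer vectors p_k of Eq. (11) into P(nu^c), with shape (K, 4)."""
    nu_c = np.asarray(nu_c, dtype=float)
    return 0.5 * np.stack(
        [np.ones_like(nu_c), np.cos(2 * nu_c), np.sin(2 * nu_c), np.zeros_like(nu_c)],
        axis=-1,
    )

def measure(s_mix, nu_c, saturate=True):
    """Polarized radiance measurements of Eq. (12), optionally clipped as in Eq. (13)."""
    m = analyzer_matrix(nu_c) @ s_mix
    return np.minimum(m, 1.0) if saturate else m

nu_c = np.deg2rad([0.0, 45.0, 90.0, 135.0])  # the four filter orientations of the sensor
m_sat = measure(s_mix, nu_c)                 # s_mix: exitant Stokes vector at one pixel
```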

3.4 Beyond common approximations

We move beyond two assumptions commonly made in the literature, namely unpolarized diffuse reflectance and the absence of sensor saturation, which allows our method to be applied to single-shot LDR images.

Unpolarized Diffuse Reflectance. Existing polarimetric separation works such as [1] assume that diffuse reflectance is always unpolarized, i.e. $\beta ^{d} \approx 0$. However, regions with a large angle between the surface normal and the viewing direction break this assumption, which leads to leaking of the (typically colored) diffuse component into the specular map estimation, as visible in Fig. 2(b).


Fig. 2. Our Pipeline: We capture a single image with the polarimetric camera under fixed polarized point lighting (left). We perform demosaicing, compute Stokes vector and obtain an initial polarization-based analytical separation (middle). This separation has artifacts due to sensor saturation and ill-posed recovery. These maps are used as inputs to a feed-forward neural refinement step, producing the final separated images (right). Red arrows depict specular bleeding into diffuse and green arrows depict diffuse bleeding into specular.


No saturation. The captured image is often assumed to have no saturated pixels, i.e. $\mathbf {m}^{\mathrm {sat}} \approx \mathbf {m}$. This is a plausible assumption for diffuse reflection in a well-exposed image. However, for regions with sharp specularities, this assumption does not accurately model reality. As depicted in Fig. 2(b), realistic single-shot captures have saturated pixels at specular highlights, resulting in artifacts when relying on this assumption for diffuse-specular separation.

4. Our approach

The aim of our method is to obtain per-pixel unpolarized diffuse and specular intensities $\tau ^{\mathrm d}$ and $\tau ^{\mathrm s}$ from a single LDR image captured by a polarimetric camera, first using an analytical separation, followed by a refinement step.

4.1 Image acquisition and preprocessing

We consider a scene lit by a single LED light equipped with a linear polarizer. Our approach works with an arbitrary light position and polarization angle. The scene is viewed by a camera with a 4 MP color polarimetric sensor whose pixels carry color (R, G, B) and polarizing filters ($90^{\circ }$, $45^{\circ }$, $135^{\circ }$ and $0^{\circ }$) in a mosaic pattern, spatially multiplexing the polarization angles as shown in Fig. 2(a).

We apply standard Bayer demosaicing to the raw image to obtain the three color channels for each polarization angle. Every 2×2 sub-array of pixels thus yields color values corresponding to the four orientations of the imaging polarizer. We therefore rearrange the demosaiced 4 MP RGB image into four 1 MP three-channel images, each corresponding to a different angle of the imaging polarizer. Following the notation introduced in Sec. 3.3, the preprocessed data can be denoted as

$$\mathbf{m}({\mathbf{x}}, {\boldsymbol{\nu}}^{c}) \; , \quad {\boldsymbol{\nu}}^{c} = \left[0^{{\circ}},45^{{\circ}},90^{{\circ}}, 135^{{\circ}}\right] \;.$$
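The rearrangement can be sketched as below; the assignment of mosaic positions to analyzer angles within the 2×2 block is an assumption and must be matched to the actual sensor layout.

```python
import numpy as np

def split_polarization_mosaic(rgb):
    """Split a demosaiced (H, W, 3) image carrying a 2x2 polarizer mosaic into four
    quarter-resolution RGB images, one per analyzer angle (keys in degrees)."""
    return {
        90:  rgb[0::2, 0::2],   # assumed position of the 90-degree filter
        45:  rgb[0::2, 1::2],
        135: rgb[1::2, 0::2],
        0:   rgb[1::2, 1::2],
    }
```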

4.2 Initial analytical separation

For a location on the surface ${\mathbf {x}}$, the linear relationship between the measurements $\mathbf {m}$ and Stokes vector $\mathbf {s}$ is described by Eq. (12). Using Eq. (11), we construct $\mathbf {P}({{\boldsymbol {\nu }}}^{c})$ and then solve for the mixed exitant Stokes vector using

$$\begin{aligned} \mathbf{\hat{s}}({\mathbf{x}},{\mathbf{o}}) = \mathbf{P}({{\boldsymbol{\nu}}}^{c})^{+} \cdot \mathbf{m}({\mathbf{x}},{{\boldsymbol{\nu}}}^{c}) \;, \end{aligned}$$
where $^{+}$ denotes the pseudo-inverse. When the measurements are unsaturated, i.e. $\mathbf {m}^{\mathrm {sat}} = \mathbf {m}$, solving Eq. (15) results in the exact Stokes vector, i.e. $\mathbf {\hat {s}}({\mathbf {x}},{\mathbf {o}}) = \mathbf {s}^{\mathrm {mix}}({\mathbf {x}},{\mathbf {o}})$. When the diffuse component is assumed to be completely unpolarized ($\beta ^{d} = 0$), we can approximate the specular exitant Stokes vector from Eq. (9) as
$$\begin{aligned} {\mathbf{s}}^{s}({\mathbf{x}}, {\mathbf{o}}) \approx \begin{bmatrix} \tau^{d} + \tau^{s} & \tau^{s}\cos(2\phi^{s}) & \tau^{s}\sin(2\phi^{s}) & 0 \end{bmatrix}^{T} \;. \end{aligned}$$

We estimate an initial diffuse and specular intensity as

$$\tau^{\mathrm d} \approx \hat{s}_0 - \sqrt{\hat{s}_1^{2}+\hat{s}_2^{2}} \triangleq \hat{\tau}^{\mathrm d} \;,$$
$$\tau^{\mathrm s} \approx \sqrt{\hat{s}_1^{2}+\hat{s}_2^{2}} \triangleq \hat{\tau}^{\mathrm s} \;.$$
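A compact NumPy sketch of this analytical separation, reusing the `analyzer_matrix` helper from the earlier measurement-model sketch, could look as follows:

```python
import numpy as np

def initial_separation(m, nu_c):
    """Initial analytical separation of Eqs. (15), (17) and (18).

    m    : (K,) or (K, N) polarized measurements for K analyzer angles
    nu_c : (K,) analyzer angles in radians
    """
    P = analyzer_matrix(nu_c)                 # Eq. (11), stacked into P(nu^c)
    s_hat = np.linalg.pinv(P) @ m             # Eq. (15): pseudo-inverse recovery
    pol = np.sqrt(s_hat[1]**2 + s_hat[2]**2)  # polarized part of the radiance
    tau_d_hat = s_hat[0] - pol                # Eq. (17): initial diffuse intensity
    tau_s_hat = pol                           # Eq. (18): initial specular intensity
    return tau_d_hat, tau_s_hat
```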

This diffuse-specular separation is exact under the assumptions described in Sec. 3.4. However, since a real observed image contains both saturation and a polarized signal in diffuse regions (see Fig. 2(b)), this formulation yields artifacts in its separation. In the next section, we fix this issue using a neural-network-based refinement step.

4.3 Neural network refinement of the separation

The recovery of polarized diffuse and specular components from the mixed exitant Stokes vector is an ill-posed problem. We therefore propose to solve it with learned priors, using a deep feed-forward network trained on a synthetic dataset.

Input. We feed our neural networks DiffUNet and SpecUNet the following maps derived from the previously estimated mixed Stokes vector:

  • 1. The Initial Diffuse Intensity $\hat {\tau }^{\mathrm d}$ described in Eq. (17).
  • 2. The Initial Specular Intensity $\hat {\tau }^{\mathrm s}$ described in Eq. (18). From Eq. (2), the $\hat {\tau }^{\mathrm s}$ input can also be interpreted as the degree of polarization scaled by the total intensity, i.e., $\hat {\tau }^{\mathrm s}=\beta s_0$. To prevent issues due to the high radiance values typically produced by specular highlights, we compress this input with $\log (1 + \hat {\tau }^{\mathrm s})$.
  • 3. The Mixed Polarization Angle $\phi ^{\mathrm {mix}}$ described in Eq. (10), which corresponds to either $\phi ^{\mathrm d}$ or $\phi ^{\mathrm s}$ in diffuse- or specular-dominated regions, respectively. We also compress its dynamic range to $[-1, 1]$ with $\cos (2\phi ^{\mathrm {mix}})$.

Architecture. The input maps described above are concatenated along the channel dimension and fed to two separate U-Net-based feed-forward networks [42], DiffUNet and SpecUNet, which are independently trained to output the unpolarized diffuse intensity $\tau ^{\mathrm d}$ and the unpolarized specular intensity $\tau ^{\mathrm s}$, respectively.
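The following sketch shows how the refinement input could be assembled; the single-channel-per-map layout and the placeholder `UNet` class are our assumptions, not the exact implementation.

```python
import torch

def build_refinement_input(tau_d_hat, tau_s_hat, phi_mix):
    """Stack the three input maps of Sec. 4.3 into a (3, H, W) tensor:
    initial diffuse intensity, log-compressed initial specular intensity,
    and cos(2 * phi_mix) as a bounded encoding of the mixed polarization angle."""
    return torch.stack([
        tau_d_hat,
        torch.log1p(tau_s_hat),
        torch.cos(2.0 * phi_mix),
    ], dim=0)

# DiffUNet and SpecUNet follow a standard U-Net [42]; `UNet` here is a placeholder class.
# diff_unet = UNet(in_channels=3, out_channels=1)  # predicts tau_d
# spec_unet = UNet(in_channels=3, out_channels=1)  # predicts log(1 + tau_s)
```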

DiffUNet Loss. The output of DiffUNet, $U^{\mathrm d}$, is trained to match the unclipped ground-truth diffuse intensity $\tau ^{\mathrm d}$. Our training loss combines an L1 loss and a spatial gradient loss, described as follows:

$$L^{d}(U^{d}, \tau^{d}) = \left\| U^{d} - \tau^{d} \right\|_1 + 5 \left\|\nabla_{{\mathbf{x}}}U^{d} - \nabla_{{\mathbf{x}}}\tau^{d} \right\|_1 \;.$$

SpecUNet Loss. The unclipped specular component $\tau ^{\mathrm s}$ has high dynamic range (HDR) and can go from zero to very high values at specular highlights, causing instability and divergence during training. To stabilize the training process, SpecUNet $U^{\mathrm s}$ is trained to output $\log (1+\tau ^{\mathrm s})$ using an L1 loss, with an additional L1 regularization on the spatial gradient, similar to Eq. (19).

The initial specular intensity $\hat {\tau }^{\mathrm s}$ suffers from diffuse bleeding due to the imperfect analytical separation. To encourage SpecUNet to fix this issue, we reuse the $L^{\mathrm d}$ loss and compute a proxy diffuse from the estimated specular component, which we formulate as

$$U^{\mathrm {s \rightarrow d}} = \tau^{\mathrm {mix}} - \left( \exp U^{\mathrm s} - 1 \right) \; ,$$
where $\tau ^{\mathrm {mix}}$ is the input unseparated image. SpecUNet is trained using the sum of the previously mentioned losses,
$$\begin{aligned}L^s(U^s,\tau^s, \tau^\mathrm{mix}) &= \|U^s - (\log(1+\tau^s)) \|_1\\ &+ 50 \|\nabla_{\mathbf{x}} U^s - \nabla_{\mathbf{x}}(\log(1+\tau^s)) \|_1\\ &+L^d(U^\mathrm{s \rightarrow d}, \tau^d) \;.\end{aligned}$$
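A PyTorch sketch of both losses is given below; the finite-difference gradient operator is an assumed implementation detail, while the weights follow Eqs. (19) and (21).

```python
import torch
import torch.nn.functional as F

def grad_l1(a, b):
    """L1 distance between finite-difference image gradients of a and b."""
    dx = (a[..., :, 1:] - a[..., :, :-1]) - (b[..., :, 1:] - b[..., :, :-1])
    dy = (a[..., 1:, :] - a[..., :-1, :]) - (b[..., 1:, :] - b[..., :-1, :])
    return dx.abs().mean() + dy.abs().mean()

def diff_loss(u_d, tau_d):
    """Eq. (19): L1 plus weighted spatial-gradient L1 for DiffUNet."""
    return F.l1_loss(u_d, tau_d) + 5.0 * grad_l1(u_d, tau_d)

def spec_loss(u_s, tau_s, tau_mix, tau_d):
    """Eq. (21): log-domain terms for SpecUNet plus the proxy-diffuse term of Eq. (20)."""
    log_tau_s = torch.log1p(tau_s)
    proxy_diffuse = tau_mix - torch.expm1(u_s)  # Eq. (20): tau_mix - (exp(u_s) - 1)
    return (F.l1_loss(u_s, log_tau_s)
            + 50.0 * grad_l1(u_s, log_tau_s)
            + diff_loss(proxy_diffuse, tau_d))
```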

Details of the neural network refinement can be found in the supplementary material.

4.4 Training data generation

Random scene generation. To obtain realistic variations of surfaces in the dataset, we use the dataset of random 3D shapes introduced by Xu et al. [43]. 3D shapes are obtained by applying random height-field perturbations to primitive shapes such as cylinders, ellipsoids, and cubes, and then combining multiple primitives in random orientations. Each face of every 3D primitive is assigned a random base color and roughness value. Figure 3 shows an example of the generated random shapes. We generate 500 such random shapes.


Fig. 3. Results on synthetic test scenes. The ground truth unseparated, diffuse, and specular intensities are shown in (a). Separating the diffuse and specular components using traditional analytical separation (b) causes the diffuse component to leak into the specular result and vice-versa. A CNN-based approach on the polarized images (c) alleviates some of these errors but still suffers from artifacts. Our result (d) shows much better diffuse and specular components, closest to the ground truth (a), as shown by the absolute error maps. The diffuse and specular maps are tonemapped with $\gamma=2.2$ for visualization. Red arrows depict specular bleeding into diffuse and green arrows depict diffuse bleeding into specular.


Polarimetric rendering. Each random shape is viewed from 20 different viewpoints, generating 10,000 data points. We consider a single completely polarized point light source and vary its position and polarization angle for each data point, for which we render the exitant Stokes vector incorporating the pBRDF model (Eq. (9)) at a resolution of 1 MP. We then emulate the clipped measurements of the polarimetric camera using Eqs. (12) and (13).

Please refer to the supplementary material for additional details of the training data generation.

5. Results and evaluation

Evaluation Datasets. To evaluate our method, we demonstrate results on two synthetic datasets and one real captured dataset. The first synthetic dataset consists of a held-out set of random 3D objects as described in Sec. 4.4. The second synthetic dataset comprises 3D objects reconstructed using Photometric Stereo [44], to which we apply diffuse albedos and roughness values randomly sampled in $[0.2,0.5]$. We also demonstrate results on real objects captured with our capture setup described in Sec. 4.1.

Ablation study. In Table 2, we demonstrate the contribution of different aspects of our approach by comparing the PSNRs obtained when removing parts of our method on the synthetic test datasets. We first note that using a single shared refinement UNet for both outputs (1) performs worse, as balancing the diffuse (Eq. (19)) and specular (Eq. (21)) losses is hard; this potentially explains its slight quality degradation on specular maps. Second, removing the polarization angle input (2) from the refinement network leads to worse performance. The polarization angles give cues about the diffuse and specular components and also depend on the illumination polarization, which appears to be useful for the refinement task. Lastly, providing the Stokes vector maps ($s_0$, $s_1$, and $s_2$) (3) instead of the initial analytical separation results in slightly worse performance, as the network has to learn to perform both the separation and the refinement.


Table 2. Ablation study demonstrating the strengths of different components of our approach. Metrics shown are PSNR (in dB) on (a) the random objects dataset (see Sec. 4.4) and (b) 3D scenes [44]. A single UNet that outputs both specular and diffuse (1) offers a strong baseline, but has difficulty with the specular map of dataset (a). Removing the polarization angle input from the refinement network (2) and replacing the inputs with Stokes vectors (3) both provide slightly lower test PSNR.

Comparisons with Existing Separation Techniques. In Fig. 4, we compare our estimated specular intensity with those estimated by existing polarimetric diffuse-specular techniques on a rendered scene. Debevec et al. [1] assume the diffuse component is unpolarized. This assumption results in bleeding of the diffuse color into the estimated specular map. Ding et al. [22] model the partial polarization of the diffuse component, resulting in a more accurate specular estimation, but require capturing two polarization images with known rotations of the lighting polarizer. Our method only requires a single polarization image captured with an unknown lighting polarizer angle. Our analytical separation provides an initial estimate of the specular component. The refinement network SpecUNet then removes the diffuse bleeding from the initial estimate, resulting in a specular estimate comparable to that of Ding et al. from a single polarization image and an unknown polarizer angle.

Existing approaches to polarimetric diffuse-specular separation require accurate calibration of the lighting polarizer orientation, while our approach is agnostic to the lighting polarizer angle. To analyse the robustness to lighting polarizer calibration, we rendered the Random Objects dataset (Sec. 4.4) with perturbed lighting polarizer orientations. The perturbations to the lighting polarizer follow a Gaussian distribution with varying standard deviation. In Fig. 5, we show the mean PSNR of the diffuse and specular images estimated by our method and existing methods. Debevec et al. [1] require the lighting polarizer to be exactly co-aligned and cross-aligned with the imaging polarizer; as a result, their separation performance degrades with increasing polarizer mismatch. Ding et al. [22] require capturing two polarization images with orthogonal lighting polarizer orientations. Due to the random perturbations, the two lighting polarizer orientations are not perfectly orthogonal, leading to a slight decrease in performance. In contrast, our method is agnostic to perturbations in the lighting polarizer angle, and its performance does not vary with calibration mismatches of the lighting polarizer. Moreover, our approach consistently achieves higher mean PSNR than the existing approaches.


Fig. 4. Comparisons with existing diffuse-specular separation approaches on a synthetic scene. Debevec et al. [1] do not handle diffuse polarization and exhibit artefacts due to diffuse bleeding into the specular component (green arrow). Ding et al. [22] provide accurate specular estimation but require capturing multiple polarization images by rotating the lighting polarizer to known angles. Our initial analytical separation assumes unpolarized diffuse reflectance, leading to diffuse bleeding artefacts (green arrow). SpecUNet then refines the initial separation and produces an accurate specular component that is comparable to the Ding et al. reconstruction, but with a single, unknown lighting polarizer angle.


Baselines. In Table 3, we compare our method with the following baseline approaches.

  • 1. Analytical approach on polarized images. We use an analytical method to separate our polarization measurements into diffuse and specular components based on the algorithm detailed in Sec. 4.2, effectively implementing a method without data-driven priors, such as [1].
  • 2. Direct CNN-based approach on a single unseparated image. We train a network that outputs the separated diffuse and specular intensities from a regular unpolarized image. We use the same network architecture (except the input layer) as our separation refinement network (Sec. 4.3), trained on the same scenes. This intensity-based baseline performs diffuse-specular separation without polarization cues.
  • 3. Direct CNN-based approach on polarized images. We train a network that directly outputs the separated diffuse and specular intensities from the demosaicked polarized sensor measurements $\textbf {m}({\boldsymbol {\nu }})$, effectively removing our initial analytical separation and estimating the separation in a single step.


Table 3. Top: Salient features differentiating our approach from the baselines (see text for their description). Our approach exploits polarization cues to get initial analytic separation followed by CNN-based refinement. Bottom: PSNR of estimated diffuse and specular on our test set with held-out random objects dataset and a 3D scene dataset. Our separation gives consistently better PSNR values than direct CNN methods and traditional analytical separation.

Comparisons with Baselines. Table 3 shows that our separation gives consistently better PSNR values than the other methods, both on randomly generated shapes and more structured scenes. We note that a simple feed-forward approach (1) is good at recovering the diffuse component, but performs poorly on the specular component. Adding cues such as multiple inputs (2) further improves the results.

Figure 3 shows qualitative results on two synthetic 3D scenes. Our method shows a much better specular result devoid of residual diffuse light, as shown by the absolute error maps.

In Fig. 6, we further demonstrate diffuse-specular separation on images of real objects, captured with our setup lit by a single polarized point light (Fig. 1(b)). The results from a traditional polarization-based separation technique (Fig. 6(b)) have artifacts due to sensor saturation (red arrows) and diffuse residual in the specular component (green arrows), particularly present at the edges of objects. Our approach (c) provides the cleanest diffuse and specular separation. However, our image formation model assumes only direct specular and diffuse reflectance and does not model inter-reflections. The second row of Fig. 7 shows a related failure, where reflections of the hands holding the object are visible on its body in the specular component.


Fig. 5. Comparisons with existing approaches upon varying error in calibration of the lighting polarizer. We render synthetic scenes with random perturbations to the lighting polarizer’s orientation. Existing approaches require calibration of the lighting polarizer and their performance degrades on increasing the standard deviation of the perturbations. Our approach is agnostic to the lighting polarizer calibration and results in higher mean PSNR for the estimated diffuse and specular components.


Dynamic scenes. Our capture setup (Fig. 1(b)) enables polarimetric acquisition of dynamic scenes at interactive rates of around 50 Hz. This acquisition rate is much faster than allowed by existing techniques that require rotating the camera polarizer and/or the lighting polarizer. Our pipeline involves lightweight linear operations followed by a feed-forward network inference on a single polarization image. The inference time is 1.4 seconds per frame, which we believe can be sped up with a more optimized implementation. In Fig. 7, we demonstrate diffuse-specular separation of a plastic strawberry rotating at roughly 30 rpm, captured at a 50 Hz frame rate. Please refer to our supplementary video for results on more scenes.


Fig. 6. Comparison of our diffuse-specular separation with baselines on real objects, captured with our setup (polarized point light and a polarimetric camera). The unseparated image is shown on the left in (a). Separating the diffuse and specular components using a traditional polarimetric separation (b) leads to clear leaks of diffuse light into the specular result. A CNN trained for this separation directly on polarized images performs better, yet still suffers from artifacts. Our separation (c) shows a much cleaner specular component and a more accurate diffuse component. Red arrows depict specular bleeding into diffuse and green arrows depict diffuse bleeding into specular. Please refer to Visualization 1 for video results on dynamic scenes.


Appearance editing application. We showcase examples of image editing in Fig. 8 by (d) turning the appearance metallic, (e) increasing the specular roughness, and (f) adding texture to the diffuse component without affecting specular highlights. For the metallic appearance, we apply the hue and saturation of the diffuse component $\tau^{\mathrm d}$ to the specular intensity $\tau^{\mathrm s}$ and combine them as $45\tau^{\mathrm s}+\tau^{\mathrm d}$. For the matte appearance, we apply a Gaussian blur ($\sigma =10$) to the specular component.
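As an illustration, both edits could be sketched as follows; the HSV-based hue/saturation transfer and the SciPy Gaussian blur are assumed implementation details, not the exact operations used for Fig. 8.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from matplotlib.colors import rgb_to_hsv, hsv_to_rgb

def metallic_edit(tau_d, tau_s):
    """Metallic edit (Fig. 8(d)): tint the specular intensity with the diffuse
    hue/saturation and recombine as 45 * tinted_specular + diffuse."""
    hsv = rgb_to_hsv(np.clip(tau_d, 0.0, 1.0))
    hsv[..., 2] = np.clip(tau_s.mean(axis=-1), 0.0, 1.0)  # keep diffuse hue/sat, use specular value
    tinted_spec = hsv_to_rgb(hsv)
    return 45.0 * tinted_spec + tau_d

def matte_edit(tau_d, tau_s, sigma=10.0):
    """Roughness edit (Fig. 8(e)): Gaussian-blur the specular component before recombining."""
    return tau_d + gaussian_filter(tau_s, sigma=(sigma, sigma, 0))
```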


Fig. 7. Diffuse-specular separation of dynamic scenes. Our single-shot capture setup enables the acquisition of dynamic scenes at 50 Hz and provides separated diffuse (left) and specular (right) components from the video capture for three different frames (c). Raw image from the sensor (a) and the unseparated image (b) for the first frame shown for reference. Please refer to Visualization 1 for the full separated diffuse and specular videos.


Enhanced material estimation. In Fig. 9, we show how our diffuse-specular separation can be used to enhance material estimation methods. Here, we train two versions of the state-of-the-art material capture network of Deschaintre et al. [45], one using unseparated images as input, and the other on separated diffuse and specular inputs. Using our separated diffuse and specular components as inputs to the second network significantly outperforms the first network. For example, the inference from unseparated inputs includes flatter normals and colored specular. As a result, the relighting of the scene using our separated inputs looks more realistic.


Fig. 8. Image-based appearance editing. From an input image (a), our method estimates a diffuse-specular separation (b-c) that can be used to perform appearance editing such as changing the material to a metal (d), increasing its roughness (e), and texture editing without affecting the specular component (f)


Enhanced Geometry Estimation. In Fig. 10, we employ a state-of-the-art uncalibrated geometry estimation approach [46] to output albedo and surface normals from 6 images, each with a different lighting direction. When using the unseparated image as input to the model provided by the authors, the estimated albedo and surface normals have artifacts at specular regions, as shown by red arrows. We integrate the surface normals and show the resulting 3D surface. We further apply the albedo onto this surface, which we relight under new lighting (shown in inset). Using the diffuse component estimated by our approach gives a much cleaner estimated albedo and surface normals, resulting in improved mesh reconstruction and relighting.


Fig. 9. Our diffuse-specular separation can be used to enhance material estimation methods, illustrated on real data. The material parameters estimated from separated components (using our method and [45] retrained to take separated inputs) are consistently better than unseparated (unpolarized) inputs, both with single and multiple light positions. For example, unseparated inputs lead the model to make wrong inferences that assume a colored specular map and flat normals. The estimates using our separation result in more realistic renderings under novel lighting.



Fig. 10. Diffuse-specular separation enhances geometry estimation: We use state-of-the-art deep uncalibrated geometry estimation network [46] to estimate surface normals and albedo from captures lit by an LED with varying light position. Using the unseparated image (top) results in artifacts at the specular highlights (red arrows). Using the separated diffuse from our approach (bottom) results in better albedo and normal estimation, yielding a better mesh reconstruction and relighting.


6. Conclusion

We propose a method for single-shot diffuse-specular separation under spatially-varying, unknown dielectric BRDFs. The key to our method is to refine an initial analytical estimate using a feed-forward deep network trained on synthetic scenes rendered with complex materials. Our method yields state-of-the-art results and can be applied to real dynamic scenes. Furthermore, our method provides drop-in replacements that improve several downstream tasks, such as material estimation and editing.

Despite providing state-of-the-art results, our method bears some limitations. In particular, only direct reflections are considered in this work; extending it to indirect illumination would improve robustness to inter-reflections. Furthermore, extending the lighting model to area lights would allow the method to be used with a wider range of setups. We hope our fast and robust method helps democratize the use of polarimetric cameras for appearance acquisition and image-based editing tasks in a wide spectrum of cases.

Funding

National Science Foundation (1652633, 1730574).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. P. Debevec, T. Hawkins, C. Tchou, H.-P. Duiker, W. Sarokin, and M. Sagar, “Acquiring the reflectance field of a human face,” in Proceedings of the 27th annual conference on Computer graphics and interactive techniques, (2000), pp. 145–156.

2. S. K. Nayar and M. Gupta, “Diffuse structured light,” in 2012 IEEE International Conference on Computational Photography (ICCP), (IEEE, 2012), pp. 1–11.

3. S. P. Mallick, T. Zickler, P. Belhumeur, and D. Kriegman, “Dichromatic separation: specularity removal and editing,” in ACM SIGGRAPH 2006 Sketches, (2006), pp. 166–es.

4. H.-L. Shen, H.-G. Zhang, S.-J. Shao, and J. H. Xin, “Chromaticity-based separation of reflection components in a single image,” Pattern Recognition 41(8), 2461–2469 (2008). [CrossRef]  

5. H. Kim, H. Jin, S. Hadap, and I. Kweon, “Specular reflection separation using dark channel prior,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (2013), pp. 1460–1467.

6. S.-H. Baek, D. S. Jeon, X. Tong, and M. H. Kim, “Simultaneous acquisition of polarimetric svbrdf and normals,” ACM Trans. Graph. 37(6), 1–15 (2018). [CrossRef]  

7. S. A. Shafer, “Using color to separate reflection components,” Color Res. Appl. 10(4), 210–218 (1985). [CrossRef]  

8. S. P. Mallick, T. Zickler, P. N. Belhumeur, and D. J. Kriegman, “Specularity removal in images and videos: A pde approach,” in European Conference on Computer Vision, (Springer, 2006), pp. 550–563.

9. Y. Akashi and T. Okatani, “Separation of reflection components by sparse non-negative matrix factorization,” in Asian Conference on Computer Vision, (Springer, 2014), pp. 611–625.

10. Y. Liu, Z. Yuan, N. Zheng, and Y. Wu, “Saturation-preserving specular reflection separation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2015), pp. 3725–3733.

11. A. C. Souza, M. C. Macedo, V. P. Nascimento, and B. S. Oliveira, “Real-time high-quality specular highlight removal using efficient pixel clustering,” in 2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), (IEEE, 2018), pp. 56–63.

12. J. Guo, Z. Zhou, and L. Wang, “Single image highlight removal with a sparse and low-rank reflection model,” in Proceedings of the European Conference on Computer Vision (ECCV), (2018), pp. 268–283.

13. Z. Wu, C. Zhuang, J. Shi, J. Xiao, and J. Guo, “Deep specular highlight removal for single real-world image,” in SIGGRAPH Asia 2020 Posters, (2020), pp. 1–2.

14. J. Lin, M. E. A. Seddik, M. Tamaazousti, Y. Tamaazousti, and A. Bartoli, “Deep multi-class adversarial specularity removal,” in Scandinavian Conference on Image Analysis, (Springer, 2019), pp. 3–15.

15. S. Wu, H. Huang, T. Portenier, M. Sela, D. Cohen-Or, R. Kimmel, and M. Zwicker, “Specular-to-diffuse translation for multi-view reconstruction,” in Proceedings of the European conference on computer vision (ECCV), (2018), pp. 183–200.

16. H. Barrow, J. Tenenbaum, A. Hanson, and E. Riseman, “Recovering intrinsic scene characteristics,” Comput. Vis. Syst 2, 2 (1978).

17. E. H. Land and J. J. McCann, “Lightness and retinex theory,” J. Opt. Soc. Am. 61(1), 1–11 (1971). [CrossRef]  

18. Q. Chen and V. Koltun, “A simple model for intrinsic image decomposition with depth cues,” in Proceedings of the IEEE International Conference on Computer Vision, (IEEE, 2013), pp. 241–248.

19. S. Bell, K. Bala, and N. Snavely, “Intrinsic images in the wild,” ACM Trans. Graph. 33(4), 1–12 (2014). [CrossRef]  

20. J. T. Barron and J. Malik, “Shape, illumination, and reflectance from shading,” TPAMI 37(8), 1670–1687 (2015). [CrossRef]  

21. T. Zhou, P. Krahenbuhl, and A. A. Efros, “Learning data-driven reflectance priors for intrinsic image decomposition,” in Proceedings of the IEEE International Conference on Computer Vision, (2015), pp. 3469–3477.

22. Y. Ding, Y. Ji, M. Zhou, S. B. Kang, and J. Ye, “Polarimetric helmholtz stereopsis,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, (2021), pp. 5037–5046.

23. L. B. Wolff, “Using polarization to separate reflection components,” in Proceedings CVPR’89: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, (IEEE, 1989), pp. 363–369.

24. L. B. Wolff, “Material classification and separation of reflection components using polarization/radiometric information,” in Proceedings of a workshop on Image understanding workshop, (1989), pp. 232–244.

25. S. K. Nayar, X.-S. Fang, and T. Boult, “Removal of specularities using color and polarization,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, (IEEE, 1993), pp. 583–590.

26. J. Riviere, I. Reshetouski, L. Filipi, and A. Ghosh, “Polarization Imaging Reflectometry in the Wild,” ACM Trans. Graph. 36(6), 1–14 (2017). [CrossRef]  

27. V. Deschaintre, Y. Lin, and A. Ghosh, “Deep polarization imaging for 3d shape and svbrdf acquisition,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2021), pp. 15567–15576.

28. Z. Cui, J. Gu, B. Shi, P. Tan, and J. Kautz, “Polarimetric multi-view stereo,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (2017), pp. 1558–1567.

29. J. Zhao, Y. Monno, and M. Okutomi, “Polarimetric multi-view inverse rendering,” arXiv preprint arXiv:2007.08830 (2020).

30. Y. Fukao, R. Kawahara, S. Nobuhara, and K. Nishino, “Polarimetric normal stereo,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2021), pp. 682–690.

31. T. Ichikawa, M. Purri, R. Kawahara, S. Nobuhara, K. Dana, and K. Nishino, “Shape from sky: Polarimetric normal recovery under the sky,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2021), pp. 14832–14841.

32. L. Yang, F. Tan, A. Li, Z. Cui, Y. Furukawa, and P. Tan, “Polarimetric dense monocular slam,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (2018), pp. 3857–3866.

33. W. A. Smith, R. Ramamoorthi, and S. Tozza, “Linear depth estimation from an uncalibrated, monocular polarisation image,” in European Conference on Computer Vision, (Springer, 2016), pp. 109–125.

34. A. Kadambi, V. Taamazyan, B. Shi, and R. Raskar, “Polarized 3d: High-quality depth sensing with polarization cues,” in Proceedings of the IEEE International Conference on Computer Vision, (2015), pp. 3370–3378.

35. Y. Ba, A. Gilbert, F. Wang, J. Yang, R. Chen, Y. Wang, L. Yan, B. Shi, and A. Kadambi, “Deep shape from polarization,” arXiv e-prints (2019).

36. S.-H. Baek, T. Zeltner, H. Ku, I. Hwang, X. Tong, W. Jakob, and M. H. Kim, “Image-based acquisition and modeling of polarimetric reflectance,” ACM Trans. Graph. 39(4), 139 (2020). [CrossRef]  

37. R. Li, S. Qiu, G. Zang, and W. Heidrich, “Reflection separation via multi-bounce polarization state tracing,” in European Conference on Computer Vision, (Springer, 2020), pp. 781–796.

38. “Sony polarization image sensors,” https://www.sony-semicon.co.jp/e/products/IS/industry/product/polarization.html (2021). Accessed: 2021-09-25.

39. M. Morimatsu, Y. Monno, M. Tanaka, and M. Okutomi, “Monochrome and color polarization demosaicking using edge-aware residual interpolation,” in 2020 IEEE International Conference on Image Processing (ICIP), (IEEE, 2020), pp. 2571–2575.

40. S. Qiu, Q. Fu, C. Wang, and W. Heidrich, “Linear polarization demosaicking for monochrome and colour polarization focal plane arrays,” in Computer Graphics Forum, (Wiley Online Library, 2021).

41. E. Collett, Polarized Light: Fundamentals and Applications (CRC Press, United States, 1992).

42. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention, (Springer, 2015), pp. 234–241.

43. Z. Xu, K. Sunkavalli, S. Hadap, and R. Ramamoorthi, “Deep image-based relighting from optimal sparse samples,” ACM Transactions on Graphics (ToG) 37, 1–13 (2018). [CrossRef]  

44. Y. Xiong, A. Chakrabarti, R. Basri, S. J. Gortler, D. W. Jacobs, and T. Zickler, “From shading to local shape,” TPAMI (2014).

45. V. Deschaintre, M. Aittala, F. Durand, G. Drettakis, and A. Bousseau, “Flexible svbrdf capture with a multi-image deep network,” in Computer Graphics Forum, vol. 38 (Wiley Online Library, 2019), pp. 1–13.

46. D. Lichy, J. Wu, S. Sengupta, and D. W. Jacobs, “Shape and material capture at home,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2021), pp. 6123–6133.

Supplementary Material (2)

Supplement 1: In this document, we provide detailed derivation of our forward model and demonstrate additional results.
Visualization 1: In this video, we compare our approach with baselines and demonstrate results on dynamic scenes.
