Optica Publishing Group

Semi-automated infrared simulation on real urban scenes based on multi-view images

Open Access

Abstract

The development of modern infrared applications requires simulating thermal representations of targets of interest. However, generating geometric models for simulation has been laborious, time-consuming work, which greatly limits practical real-world applications. To reduce the man-in-the-loop requirements, we devise a method that directly and semi-automatically simulates the infrared signatures of real urban scenes. From raw meshes generated by multi-view stereo, we automatically produce a simplified watertight model through piecewise-planar 3D reconstruction. The model surface is subdivided into quality mesh elements to which material attributes are attached. For each element, the heat balance equation is solved so as to render the whole scene by synthesizing the radiance distribution in the infrared waveband. The credibility and effectiveness of our method are confirmed by comparing simulation results with measured real-world data. Our experiments on various types of buildings and a large-scale scene show that the proposed pipeline simulates meaningful infrared scenes while being robust and scalable.

© 2016 Optical Society of America

1. Introduction

The simulation of thermal infrared (IR) signatures of man-made and natural targets has received increasing interest in recent years. It is mostly applied to assess the benefits of new infrared imaging technology in both military and civil applications, such as target detection, precision guidance, and night navigation. Physical experiments are expensive, subject to the limitations of meteorological environments and observation conditions, and cannot be used directly to design or evaluate other applications. Simulation, by contrast, is considerably easier to control and comes at a reduced cost. At the same time, more scenarios can be investigated than by conducting field trials in the real world. IR simulation of real-world scenarios is comprehensive work that involves computer graphics, thermal modeling, and other multi-disciplinary theories.

As thermal modeling is very sensitive to the shape and quality of the calculation elements, well-designed geometric models are a necessity [1]. To generate a 3D target model on the terrain surface, two approaches are frequently used. If a matched template for the target can be found in an existing model library, a 3D model can be constructed directly by modifying a set of parameters. Otherwise, for complicated targets, Computer-Aided Design (CAD) programs (e.g., 3D Studio Max, SketchUp, Blender, or Polytrans) are needed to manually build the sophisticated structures. This manual work comes at the cost of a significant amount of processing time. Moreover, it needs detailed a priori information about the targets that is sometimes difficult to obtain. The whole workflow is therefore greatly affected by the reliability of the data source, a limitation that is especially severe in sensitive areas. Producing a quality geometric model of a large-scale urban scene usually takes knowledgeable CAD users weeks or even months [2]. Because of these difficulties of modeling real-world sites, many researchers construct synthetic scene models for IR modeling; such models, however, are hard to put into practical applications.

In the last two decades, a considerable number of 3D reconstruction approaches have been developed, aiming at automatically modeling large-scale urban scenes. Most of these approaches, however, are designed to deal with Light Detection and Ranging (LiDAR) point cloud data obtained from laser scanners mounted on airborne platforms or ground vehicles. Acquiring LiDAR data usually requires expensive equipment, and in complex scenes it cannot cover all sides of buildings or may suffer severe occlusions [3]. Recent advances in multi-view stereo (MVS) workflows (Acute3D, Autodesk, Pix4D) allow the generation of accurate raw geometry from images of a wide variety of scenes. Although MVS meshes are in general less accurate than LiDAR points at similar resolution, they provide more structural information and are enriched with high-resolution textures. Modeling from multi-view images is economical and easy to handle; even a hand-held camera can be used to acquire high-quality image sets. It provides more flexibility and significantly improves the efficiency of constructing 3D models of large-scale real-world scenes.

MVS algorithms, however, typically produce overly complex meshes that are cumbersome for storage, indexing, transmission, and rendering. Such meshes contain many geometric and topological defects, so a posteriori simplification requires increased robustness. Furthermore, regular mesh elements have to be regenerated from the simplified geometric models in order to accurately compute the temperature distribution on target surfaces. Mesh generation based on the advancing-front technique is widely used in thermal calculation. As solving a fine thermal model for each surface element is quite sensitive to the geometric topology (element size and edge ratios), this technique produces quality meshes with good shapes that are dense near boundaries and sparse in smooth areas.

Thermal modeling is a complex process, with a variety of factors affecting the infrared signatures of real-world objects. Many researchers [4–7] have studied the influence of factors such as meteorological conditions, heat-source distribution, and the physical properties (emissivity, absorptivity, reflectivity, etc.) of the materials composing the target. IR signatures also vary dynamically as a result of internal and external influences on the object, resulting from a heat balance comprising insolation, internal heat sources, aerodynamic heating (for airborne objects), conduction, convection, and radiation. To accurately model an object's signatures in a simulation, the energy balance equation is established from the first law of thermodynamics, describing the whole energy exchange process with these factors involved. In the last two decades, several thermal modeling systems have been developed by governments and research institutions. Representative systems include Vega, developed by MultiGen-Paradigm; SE-Workbench (Synthetic Environment Workbench) [8], developed by the French company OKTAL-SE; MuSES (Multi-Service Electro-optic Signature) [9], developed by ThermoAnalytics; and DIRSIG (Digital Imaging and Remote Sensing Image Generation) [10], developed by the Rochester Institute of Technology. In spite of these systems' different application focus areas, their underlying physics-based approaches are similar.

However, the applicability of these systems is still limited by their need for complicated processing: 3D target models have to be created and assigned material characteristics manually. Current methods of generating scenes for IR simulation are quite labor-intensive and time-consuming, especially for large-scale real-world scenes. There is therefore a strong desire for a more automatic framework that is more flexible in application and requires less human interaction.

1.1. Related works

Over the last two decades, both 3D reconstruction and IR simulation of real-world urban scenes have been active research topics, and a large number of approaches have been developed. In this section, we review previous works most closely related to our method, covering the two main facets of our problem: geometric model reconstruction and infrared signature simulation for urban scenes.

Urban model reconstruction

In our approach, reconstruction amounts to automatically turning the raw input data into a geometric model composed of watertight buildings in a large-scale urban scene. A possible classification of existing approaches distinguishes between two main surface reconstruction methods: smooth and primitive-based [11].

Smooth reconstruction approaches generate a C1-continuous surface (a surface whose derivative is continuous) using implicit or explicit representations. Implicit methods recover the surface from defect-laden point sets and involve denoising and filtering of outliers [12–14]. Explicit methods commonly reconstruct surfaces using Delaunay triangulations when the sampling is noise-free and dense enough; the underlying idea is that points close on the surface should also be close in space [15]. For thermal modeling and calculation, model surfaces must ultimately be subdivided into quality mesh elements according to their geometric structure and material characteristics. State-of-the-art Structure from Motion (SfM) and Multi-View Stereo (MVS) methods [16] have produced extremely compelling smooth surface reconstructions on a wide variety of scenes. However, such meshes are usually not sharp at edges, material boundaries, or transitions between semantic components.

Primitive-based surface reconstruction methods have become popular in recent years as a relevant alternative to smooth reconstruction. They are well suited to large-scale urban scenes, where buildings are commonly composed of canonical parts, e.g., planar components. They can generate watertight geometric models with sharp edges and greatly improve computational efficiency, and the model surface can be further divided into regular, quality mesh elements of user-defined size. Primitive-based reconstruction mainly relies on two input data sources: multi-view stereo (MVS) images and LiDAR point sets.

3D LiDAR point sets are generated directly, and with high accuracy, by laser scanners on airborne equipment and ground vehicles. Because of the data acquisition strategy, existing airborne-based methods [17–19] rarely go beyond a 2.5D representation (polygonal models composed of complex roofs and strictly vertical facades), while ground-based methods fail to reconstruct roof structures. Even though meshes and point sets derived by MVS are generally less accurate than LiDAR data, they contain details on vertical components such as facades [20]. 3D information obtained from both LiDAR point sets and MVS images can be used to extract planar primitives. The Manhattan World assumption [21], proposed by Coughlan and Yuille, constrains plane exploration to three orthogonal directions. Vanegas et al. reconstructed urban models from LiDAR point sets under this restrictive assumption [22]. Chauve et al. produced a concise and idealized 3D representation from unstructured point data of complex, cluttered real-world scenes [23]; missing structures were recovered by ghost primitives, but at the risk of reduced surface reliability. Zhou and Neumann discovered global regularities among planes [24]. Lafarge and Verdie proposed an automated reconstruction pipeline to generate multiple coherent level-of-detail (LOD) models from image-based meshes [25].

Infrared simulation

With the increasing demand for infrared technology in civil and military affairs, a reliable simulation system is required for research on the thermal infrared (IR) signatures of man-made and natural targets in real-world scenarios. It offers the possibility of generating realistic representations of infrared scenes at different times and under different meteorological and viewing conditions. In recent years, a series of special-purpose and general IR simulation systems have been developed.

The infrared modeling and analysis (IRMA) signature prediction model was one of the first high-resolution, physics-based IR target and background signature models developed for tactical weapon applications. It was originally developed in 1980 by the Munitions Directorate of the Air Force Research Laboratory (AFRL/MN) [26]. A first-principles model is one that calculates the surface temperature distribution starting directly from the established equations of local thermodynamic equilibrium (LTE), taking into account the various factors that affect object temperatures in the real world. In 1987, the physically reasonable infrared signature model (PRISM) was developed as a first-principles, semi-empirical model [27]; through continuous improvement it became a standard tool for modeling infrared signatures. Its intended replacement, the multi-service electro-optic signature model (MuSES), was developed by ThermoAnalytics in 1998; it combines rapid-prototyping thermal design with infrared signature prediction, offering increased capability and ease of use. The Georgia Tech Research Institute proposed the Georgia Tech signature model (GTSIG) in 1990, which offered IR signature prediction by creating thermal meshes and difference equations based on first principles [28]. In addition, the Rochester Institute of Technology presented the digital imaging and remote sensing scene generation (DIRSIG) model in 1991; it effectively models broadband, multispectral, and hyperspectral imagery using a suite of first-principles radiation propagation modules [29]. A number of commercial simulation tools have also been developed and are widely used for secondary development; representative systems include the American MultiGen-Paradigm's Vega Prime, the British INSYS's CAMEO-SIM, the French OKTAL-SE's WorkBench-IR, and the Canadian Davis Engineering's ShipIR [30].

In summary, even though a variety of methods have been proposed to solve each component of our problem individually, there is a strong need to integrate automatic reconstruction technology into an IR simulation system. The resulting IR simulation process is more efficient and applicable to data measured on large-scale urban scenes in the real world.

1.2. Contributions

To our knowledge, our approach is the first to combine automatic urban reconstruction and thermal modeling into IR simulation. The input geometric data are triangle meshes, which usually contain noise, facet density variations, multi-scale components, and errors resulting from the multi-view stereo process. Our algorithm digests such defect-laden meshes into an optimized watertight geometric model. The model preserves semantic structure at a user-defined scale and is then subdivided into quality elements for thermal calculation. To accurately and conveniently calculate the dynamic thermal radiation distribution, this paper adopts a fast method for solving the heat transfer model on these regularized surface elements. The whole framework provides an efficient solution for IR simulation and validation of real-world scenarios that significantly pushes the boundaries of the state of the art. An overview of our method is shown in Fig. 1.


Fig. 1 An overview of the proposed simulation pipeline: (a) A sequence of multi-view images acquired by a hand-held camera; (b) Mesh model generated by using MVS; (c) Superfacets clustered through region growing; (d) Superfacets labeled with different classes of interest; (e) Plane proxies extracted from superfacets; (f) Optimized geometric model produced via surface reconstruction; (g) Subdivided element meshes of model surface; (h) Model with assigned material attributes; (i) Simulated infrared scene.


A major contribution of our work is a semi-automated IR simulation pipeline that departs from existing work. It is capable of directly and semi-automatically generating quality geometric models and simulating infrared signatures of real-world scenarios. Our algorithm begins by simplifying the input data into 3D-primitive arrangements. The space decomposition approaches used in [23, 25] are closest to ours but have limitations. The algorithm in [23] operates on 3D point clouds that are geometrically less structured, and it works well only when all planes are perfectly detected from dense points; it cannot be applied directly to MVS meshes, which require high robustness against geometric and topological defects. In [25], plane proxies are globally regularized with a hierarchical organization of canonical geometric relationships. Even though this greatly reduces the number of initial planes in the space partition, the global splitting process over-partitions space that is not traversed by the splitting plane. We therefore apply two specific technical ingredients to enhance robustness to input mesh defects and scalability: (i) a two-level hierarchy in the abstraction of plane proxies; and (ii) constrained binary splitting in the space decomposition.

The remainder of this paper is organized as follows. In Section 2, we generate a watertight geometric model from image-based meshes via semantic classification, primitive abstraction, and surface reconstruction. In Section 3, the reconstructed surface is subdivided into element meshes, and different material attributes are attached; thermal models are established to calculate the radiation distribution for different elements, and the physical processes affecting the IR signatures are described. In Section 4, we experimentally validate our approach on various types of small-scale scenes and a very large-scale real-world scene. In Section 5, concluding remarks are made.

2. Geometric model generation

The input datasets of our generation algorithm are triangle meshes reconstructed from multi-view images of real-world scenes. Recent multi-view stereo methods such as CMP-MVS [31] produce dense meshes, enriched with high resolution textures. Following the sequential approach (semantics-then-geometry) proposed in [25], our method contains three main steps: semantic classification, primitive abstraction, and surface reconstruction.

2.1. Semantic classification

The goal of semantic classification is to separate each individual urban object from the others and label it. Objects are grouped into four classes: roof, facade, ground, and vegetation. The classification relies on an unsupervised Markov random field (MRF).

To save computation time, we first segment the input data into superfacets composed of connected triangle facets. Superfacets are generated by clustering connected facets with similar geometric properties through a region-growing process. Many criteria have been proposed in the mesh segmentation literature [32, 33] to evaluate this similarity. Nevertheless, most of these techniques cannot be adapted to meshes generated by multi-view stereo because of the defects mentioned in Section 1. Following the mesh segmentation methods used in [23, 25, 34], curvature analysis and plane estimation are two effective ways to address this problem. The final goal of semantic classification is to extract plane proxies from superfacets with high planarity and large areas, which can be labeled as roof or facade. As shown in Fig. 2, a two-level hierarchy algorithm is designed to generate and classify superfacets using different region-growing strategies.


Fig. 2 Classification of superfacets into semantic groups: (a) Input mesh; (b) Clustered superfacets (based on curvature information); (c) Aggregated superfacets (according to the rules of our hierarchy algorithm); (d) Classified superfacets with labels of semantic groups. Color code used here: roof (blue), facade (yellow), and ground (brown).


At layer 1, we aim at obtaining initial superfacets for labeling among the four classes. Because plane estimation is time-consuming and some objects such as vegetation have curved surfaces, we compute the shape operator matrix [35] for each triangle facet over a local spherical neighborhood of radius rm, and compare the Frobenius norms of the matrices against a threshold tm during clustering.

To measure the likelihood of a superfacet belonging to one of the four categories, feature attributes must be defined. The attributes proposed in LiDAR-based methods [3, 37] classify points into three classes: ground, building, and tree. We use the geometric attributes introduced in [25] to further separate roof and facade within the building class. These attributes take values in [0, 1] and are accumulated for each superfacet, weighted by the areas of its facets.

  • - The elevation attribute αe is defined as a function of the relative height hi of the facet centroid along the vertical axis of the scene:
    αe(fi) = ((hi − hmin) / (hmax − hmin))^(1/2)
    where [hmin, hmax] denotes the height range of all facet centroids located within a local spherical neighborhood of radius Rf. Rf is set by default to half the height range of the whole scene.
  • - The planarity attribute αp denotes the coplanarity of the superfacet containing facet fi. We use the values λmin, λmid, and λmax to determine the “degree of coplanarity” [36]; αp = 1 indicates a perfectly coplanar superfacet:
    αp(fi) = 1 − 3|λmin|^(1/2) / (|λmin|^(1/2) + |λmid|^(1/2) + |λmax|^(1/2))
    where λmin, λmid, and λmax denote the eigenvalues of the covariance matrix computed over all facets of the superfacet containing fi. Each eigenvalue measures the variance of the superfacet along the corresponding eigenvector.
  • - The horizontality attribute αh measures the deviation of the unit normal ni of facet fi with respect to the vertical axis of the scene:
    αh(fi) = |ni · nz|
    where nz denotes the unit normal along the vertical axis of the scene.
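As a concrete illustration, the three attributes above can be computed with a few lines of NumPy. The data layout (facet centroids as rows, unit normals as vectors) is an assumption made for this sketch, not the authors' implementation:

```python
# Sketch of the per-facet attributes alpha_e, alpha_p, alpha_h.
# Function names and the input layout are illustrative assumptions.
import numpy as np

def elevation_attr(h_i, h_min, h_max):
    """alpha_e = ((h_i - h_min) / (h_max - h_min))^(1/2)."""
    return np.sqrt((h_i - h_min) / (h_max - h_min))

def planarity_attr(facet_centroids):
    """alpha_p from the eigenvalues of the covariance of facet centroids."""
    pts = np.asarray(facet_centroids, dtype=float)
    cov = np.cov(pts.T)
    lam = np.sort(np.abs(np.linalg.eigvalsh(cov)))  # lam_min <= lam_mid <= lam_max
    s = np.sqrt(lam).sum()
    return 1.0 - 3.0 * np.sqrt(lam[0]) / s

def horizontality_attr(n_i, n_z=(0.0, 0.0, 1.0)):
    """alpha_h = |n_i . n_z| for a unit facet normal n_i."""
    return abs(float(np.dot(n_i, n_z)))
```

A perfectly coplanar patch yields αp = 1 (the smallest eigenvalue vanishes), and a horizontal facet yields αh = 1.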

Finally, the semantic classification becomes a typical multi-label problem. We compute an assignment of a label li to each superfacet i ∈ S such that the joint labeling l minimizes an energy function U(l). The function consists of two terms:

U(l) = Σi∈S Ud(li) + γ Σ{i,j}∈E Us(li, lj)

where E denotes the set of adjacent superfacet pairs.

The consistency term Ud(li) introduced in [25] measures the cost of the label li at superfacet i. It is computed as follows, with ᾱ = 1 − α:

Ud(li) = Ai · (1 − αp αh ᾱe)   if li = ground
Ud(li) = Ai · (1 − ᾱp αh)      if li = vegetation
Ud(li) = Ai · (1 − αp ᾱh)      if li = facade
Ud(li) = Ai · (1 − αp αh αe)   if li = roof

where Ai denotes the area of superfacet i.

The smoothness term Us(li, lj) represents the pairwise interaction cost between adjacent superfacets i and j. Two superfacets are defined to be adjacent when they share at least one edge:

Us(li, lj) = Cij · |ni · nj| · δ(li, lj)
where δ(li, lj) is an indicator function whose value is 0 if superfacets i and j are assigned the same label and 1 otherwise; ni and nj are the unit normals of superfacets i and j, and Cij denotes the total length of the shared edge between them. An optimal solution to this energy minimization problem is found with the α-expansion algorithm [38] based on graph cuts.
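A minimal sketch of evaluating U(l) for a candidate labeling may clarify the two terms. The dictionary records for superfacets are illustrative placeholders; in practice the minimum is found with α-expansion graph cuts rather than by direct evaluation:

```python
# Evaluate the labeling energy U(l) from the text on toy data.
# Superfacet records and the edge list are illustrative assumptions.
import numpy as np

def data_term(area, a_p, a_h, a_e, label):
    """Ud(li), with bar(alpha) = 1 - alpha."""
    cost = {
        "ground":     1 - a_p * a_h * (1 - a_e),
        "vegetation": 1 - (1 - a_p) * a_h,
        "facade":     1 - a_p * (1 - a_h),
        "roof":       1 - a_p * a_h * a_e,
    }[label]
    return area * cost

def smooth_term(c_ij, n_i, n_j, l_i, l_j):
    """Us(li, lj) = Cij * |ni . nj| * delta(li, lj)."""
    if l_i == l_j:
        return 0.0
    return c_ij * abs(float(np.dot(n_i, n_j)))

def energy(superfacets, edges, labeling, gamma=1.0):
    """U(l) = sum of data terms + gamma * sum of pairwise terms."""
    u = sum(data_term(s["area"], s["a_p"], s["a_h"], s["a_e"], labeling[i])
            for i, s in enumerate(superfacets))
    u += gamma * sum(smooth_term(c, superfacets[i]["n"], superfacets[j]["n"],
                                 labeling[i], labeling[j])
                     for i, j, c in edges)
    return u
```

For example, a flat, horizontal, high-elevation superfacet (αp = αh = αe = 1) has zero data cost for the roof label and maximal cost for ground.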

Under the assumption of our method, buildings are commonly composed of planar components. For better plane estimation of the semantic structure, layer 2 of our hierarchy algorithm generates a new set of superfacets by clustering the superfacets labeled roof or facade. Denote by Pi and Pj the local fitting planes of superfacets i and j, by ci and cj their centroids, by ni and nj their unit normals, and by d(c, P) the orthogonal distance between a point c and a proxy P. The process consists of two steps, each implemented iteratively, under the constraints of an orientation tolerance θs, a planarity tolerance ts, and a distance tolerance ds.

  1. Superfacets with high planarity and sufficiently large area (we impose αp > ts and area A larger than Asmax) are selected as the initial region-growing seeds. Neighboring superfacets with the same label are integrated into a seed to form a new one if the following conditions are met. The local fitting plane P of the seed superfacet is recalculated after each growing operation.
    • - Position: the distance from ci to P is less than ds, d(ci, P) ≤ ds;
    • - Normal: the angle between ni and the normal np of P is less than θs, |ni · np| ≥ cos(θs);
    • - Planarity: the planarity αp of the new superfacet generated after growing is more than ts.
  2. After the first step has finished, a large number of small superfacets remain, especially in weakly reconstructed structures. Their areas are below the threshold (we impose area A smaller than Asmax), and they need to be clustered further to estimate their potential local fitting planes. Adjacent superfacets i and j can be merged if they meet the following conditions:
    • - Area: both the areas of superfacets i and j are less than Asmax;
    • - Position: the sum of distance from ci to Pj and from cj to Pi is less than 2ds, |d(ci, Pj) + d(cj, Pi)| ≤ 2ds;
    • - Normal: the angle between ni and nj is less than θs, |ni · nj| ≥ cos(θs);
    • - Planarity: the planarity αp of the new superfacet generated after growing is more than ts.
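The step-2 merge test can be sketched directly from the conditions above. The plane representation (a point plus a unit normal) and the record layout are assumptions of this sketch; the planarity of the merged superfacet would be re-checked after the merge:

```python
# Step-2 merge test for two small adjacent superfacets (sketch).
# Thresholds a_max (Asmax), d_s, theta_s and data layout are assumed.
import numpy as np

def point_plane_dist(c, plane):
    """Orthogonal distance d(c, P); plane = (point_on_plane, unit_normal)."""
    p0, n = plane
    return abs(float(np.dot(np.subtract(c, p0), n)))

def can_merge(s_i, s_j, a_max, d_s, theta_s):
    """Area, position, and normal conditions from step 2."""
    if s_i["area"] >= a_max or s_j["area"] >= a_max:
        return False                                    # area condition
    d = point_plane_dist(s_i["c"], s_j["plane"]) + \
        point_plane_dist(s_j["c"], s_i["plane"])
    if d > 2 * d_s:
        return False                                    # position condition
    n_i, n_j = s_i["plane"][1], s_j["plane"][1]
    return bool(abs(float(np.dot(n_i, n_j))) >= np.cos(theta_s))  # normal
```

Two coplanar patches with parallel normals pass the test; a patch offset along its normal by more than ds fails the position condition.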

2.2. Primitive abstraction

As the input to primitive abstraction, the superfacets have been clustered twice by our two-level hierarchy algorithm. Connected superfacets labeled roof or facade are divided into groups regarded as individual building components. For each component, we compute in parallel the local fitting planes of superfacets that have high planarity and large areas (we impose planarity αp > tp and area A larger than Ap). A curved surface whose curvature is near zero is regarded as a plane in our method; otherwise it is represented by piecewise planes. These planes are estimated using principal component analysis (PCA) and are referred to as primitive proxies. As proxies extracted from defective mesh areas may be unreliable, we refine all plane proxies by altering their orientations and positions under global geometric regularities. Vegetation objects (e.g., trees and bushes) cannot be well reconstructed in MVS meshes because of their complicated detail structures, and their shapes cannot be correctly approximated with planar proxies; we therefore do not extract proxies from superfacets labeled as vegetation.
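PCA plane fitting of a superfacet reduces to an eigendecomposition of the covariance matrix, as in the following minimal sketch (real input would be facet centroids with area weights, which are omitted here):

```python
# Plane-proxy estimation by PCA over superfacet points (sketch).
import numpy as np

def fit_plane_pca(points):
    """Return (centroid, unit_normal). The normal is the eigenvector of
    the covariance matrix with the smallest eigenvalue, i.e. the
    direction of least variance."""
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)
    cov = np.cov((pts - centroid).T)
    w, v = np.linalg.eigh(cov)      # eigenvalues in ascending order
    normal = v[:, 0]                # direction of least variance
    return centroid, normal / np.linalg.norm(normal)
```

The sign of the returned normal is arbitrary; a consistent orientation (e.g., pointing away from the building interior) would be fixed in a later step.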

To use a normalized description, we establish a local coordinate system for each building, with nz representing the direction of the vertical axis Z. The unit vector nx of axis X is obtained by projecting the normal np of the largest non-horizontal proxy P (its associated superfacet has the largest area, and |np · nz| ≤ cos(θp)) onto the horizontal plane. The unit vector ny of axis Y is calculated using the right-hand rule. We can then express a plane normal n in spherical coordinates (θ(n), φ(n)):

θ(n) = arccos(n · nz)
φ(n) = arctan((n · ny) / (n · nx))
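The two angles can be computed as follows. This sketch uses the global axes as the local frame purely for illustration, and uses the quadrant-safe arctan2 instead of a plain arctan, which is a minor implementation choice beyond what the formulas state:

```python
# Spherical-coordinate description of a proxy normal in a local
# frame (nx, ny, nz); the frame below is an illustrative default.
import numpy as np

def spherical_angles(n, nx=(1.0, 0, 0), ny=(0, 1.0, 0), nz=(0, 0, 1.0)):
    """theta = arccos(n . nz); phi = arctan of (n . ny) over (n . nx)."""
    n = np.asarray(n, dtype=float)
    theta = np.arccos(np.clip(np.dot(n, nz), -1.0, 1.0))
    phi = np.arctan2(np.dot(n, ny), np.dot(n, nx))  # quadrant-safe
    return theta, phi
```

A vertical normal gives θ = 0, and a normal along the local X axis gives θ = π/2, φ = 0.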

Building on the algorithms proposed in [39] and [24], we follow the detection-then-regularization strategy of [25] to reduce the error accumulated in a greedy graph-regularization process. Combining these three approaches, we introduce five general relationships between pairs of plane proxies (P1, P2), governed by an orientation tolerance θp or a distance tolerance dp.

  • - Parallelism: P1 and P2 are parallel if |n1 · n2| ≥ cos(θp);
  • - Orthogonality: P1 and P2 are orthogonal if |n1 · n2| ≤ sin(θp);
  • - V-symmetry: P1 and P2 are V-symmetric if |θ(n1) − θ(n2)| ≤ θp;
  • - H-symmetry: P1 and P2 are H-symmetric if |φ(n1) − φ(n2)| ≤ θp;
  • - Coplanarity: P1 and P2 are coplanar if they are parallel and |d(c1, P2) + d(c2, P1)| ≤ 2dp.

Here P1 and P2 are two plane proxies with respective unit normal n1 and n2, and c1 and c2 are the centroids of their supporting superfacets.

The first four relationships concern proxy orientations. The notions of V-symmetry and H-symmetry capture the common observation that structures of man-made buildings are often symmetric, sharing similar slope and azimuth values. Coplanarity is a specific relationship between parallel proxy pairs with an additional distance constraint. Proxies are clustered according to these relationships to detect global regularities.
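The five relationships translate into simple boolean predicates on proxy pairs. In this sketch a proxy is a (centroid, unit normal) pair, and the orthogonality test compares against sin(θp) so that the inter-plane angle lies within θp of 90 degrees; both choices are assumptions of the sketch:

```python
# The five pairwise proxy relationships as predicates (sketch).
import numpy as np

def _angles(n, nz=(0, 0, 1.0)):
    """Polar angle theta and azimuth phi of a unit normal."""
    theta = np.arccos(np.clip(np.dot(n, nz), -1.0, 1.0))
    phi = np.arctan2(n[1], n[0])
    return theta, phi

def parallel(p1, p2, theta_p):
    return abs(float(np.dot(p1[1], p2[1]))) >= np.cos(theta_p)

def orthogonal(p1, p2, theta_p):
    return abs(float(np.dot(p1[1], p2[1]))) <= np.sin(theta_p)

def v_symmetric(p1, p2, theta_p):
    return abs(_angles(p1[1])[0] - _angles(p2[1])[0]) <= theta_p

def h_symmetric(p1, p2, theta_p):
    return abs(_angles(p1[1])[1] - _angles(p2[1])[1]) <= theta_p

def coplanar(p1, p2, theta_p, d_p):
    if not parallel(p1, p2, theta_p):
        return False
    c1, n1 = p1
    c2, n2 = p2
    d = abs(float(np.dot(np.subtract(c1, c2), n2))) + \
        abs(float(np.dot(np.subtract(c2, c1), n1)))
    return d <= 2 * d_p
```

Two horizontal proxies at the same height are parallel and coplanar, while a horizontal and a vertical proxy are orthogonal.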

We first put proxies into clusters of parallel proxies under the orientation tolerance θp and then calculate their average normal direction. Coplanar proxies are extracted from parallel proxy clusters under the distance tolerance dp, and we compute the centroid position of each coplanar proxy cluster. V-symmetry groups and H-symmetry groups are clustered from parallel proxy clusters under the orientation tolerance θp, and an average θ angle and φ angle are calculated for each group. All parameter averages are weighted by the area of each proxy, defined as the total area of its associated superfacets. A relationship graph is constructed in the following steps, with each node representing a parallel cluster. Solid and dashed edges between two nodes respectively denote the orthogonality and the two symmetry relationships under the orientation tolerance θp.

  1. Sort all parallel proxy clusters in descending order of area. The initial node is the parallel proxy cluster with the largest area.
  2. Find all parallel proxy clusters that are orthogonal to the initial node to form new nodes and edges. New nodes are constructed in descending order of area, and their associated edges are indexed by construction order.
  3. The node with the smallest index is selected as the new initial node. Repeat step 2 until nodes and edges have been constructed for all orthogonal pairs of parallel proxy clusters.
  4. V-symmetry and H-symmetry pairs are finally connected with dashed edges.

The global regularization of proxies can be divided into two steps: orientation alignment and placement alignment. The process propagates through the relationship graph in the order of the edge indices, altering the target node of each orthogonal pair of source and target nodes, since proxies extracted from large areas are more reliable. The whole proxy regularization process is illustrated in Fig. 3.


Fig. 3 Proxy regularization: (a) For each individual building component, we construct the local coordinate system. (b) 22 plane proxies are extracted from the superfacets and then divided into (c) 7 parallel proxy clusters and (d) 22 coplanar proxy sets. (e) A relationship graph is constructed, where each node represents a parallel cluster, and the edge between two nodes represents their geometric relationships, such as orthogonality (purple), V-symmetry (blue), and H-symmetry (yellow). These proxies are regularized according to their geometric constraints that are illustrated in the graph.


Orientation alignment

The orientation of the target node is altered by constraining its normal to match: (i) orthogonality with respect to its source node; (ii) V-symmetry if its V-symmetry group has more than one node; and (iii) H-symmetry if its H-symmetry group has more than one node. The process of orientation alignment is illustrated in Fig. 4 and can be divided into the following cases:

  • - Case A: the orthogonality, V-symmetry, and H-symmetry relationships are all matched. Ranked by robustness from high to low, these are orthogonality, V-symmetry, and H-symmetry. Since a new orientation can be calculated uniquely whenever any two of these relationships are matched, Case A reduces to Case B. If no solution can be obtained, we do not align the orientation of the target node.
  • - Case B: Orthogonality relationship and V-symmetry relationship are matched.
  • - Case C: Orthogonality relationship and H-symmetry relationship are matched.
  • - Case D: H-symmetry relationship and V-symmetry relationship are matched.
  • - Case E: Only the orthogonality relationship is matched.
  • - Case F: Only the H-symmetry relationship is matched.
  • - Case G: Only the V-symmetry relationship is matched.


Fig. 4 Orientation alignment of proxies: purple, blue, and yellow circles represent the orthogonality, V-symmetry, and H-symmetry relationships respectively. Only when two circles intersect can the new orientation (red arrow) be uniquely calculated, as the direction toward the intersection point. If only one circle exists, the old orientation (gray arrow) is projected orthogonally onto the circle.


From Case A to Case D, a new orientation can be uniquely calculated under the two matched relationship constraints. If no solution can be obtained, we do not align the orientation of the target node. From Case E to Case G, only one relationship is matched, and we align the orientation of the target node under that single constraint, choosing the new orientation with the smallest deviation from the initial one. In all cases, to avoid large deviations, we only alter the initial orientation n2 when the new normal n′2 satisfies |n′2 · n2| ≥ cos(θp).
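The single-constraint cases (E–G) can be sketched as follows: the target normal is projected orthogonally onto the constraint circle and accepted only if the deviation stays within θp. The function names and the tolerance value are illustrative, not taken from the paper's implementation.

```python
import numpy as np

def project_to_constraint_circle(n, axis):
    """Project unit normal n onto the circle of directions orthogonal to
    `axis` (e.g. the orthogonality constraint of Case E)."""
    n_proj = n - np.dot(n, axis) * axis   # drop the component along axis
    return n_proj / np.linalg.norm(n_proj)

def align_orientation(n, axis, theta_p_deg=10.0):
    """Snap n to satisfy the constraint, but keep the old normal when the
    correction exceeds the deviation tolerance theta_p."""
    n_new = project_to_constraint_circle(n, axis)
    if abs(np.dot(n_new, n)) >= np.cos(np.radians(theta_p_deg)):
        return n_new
    return n  # deviation too large: leave the orientation unchanged
```

The two-constraint cases (A–D) follow the same pattern but intersect two circles on the unit sphere instead of projecting onto one.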

Placement alignment

After orientation alignment, we refresh the orientations of all proxies. As shown in Fig. 5, for each cluster of coplanar proxies, we translate the centroid of each proxy along the new orientation, so that proxies in the same coplanar cluster finally share a unique supporting plane.
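A possible sketch of this step: compute a common plane offset for the cluster (the area-weighted average used here is our assumption, not necessarily the paper's choice) and slide each centroid along the shared normal onto that plane.

```python
import numpy as np

def align_placement(centroids, areas, normal):
    """Translate each proxy centroid along the shared (already aligned)
    normal so that all proxies of one coplanar cluster lie on a single
    supporting plane. Area-weighted offset choice is an assumption."""
    offsets = centroids @ normal                 # signed plane offsets
    d = np.average(offsets, weights=areas)       # common plane offset
    return centroids + np.outer(d - offsets, normal)
```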


Fig. 5 Placement alignment of proxies.


2.3. Surface reconstruction

Building components are composed of connected superfacets labeled as facade or roof. The 3D arrangement of their supporting planes captures the entire geometric structure. As shown in Fig. 6, a watertight polygonal surface free of self-intersections can be reconstructed by assembling these planes via a decomposition of 3D space. This preserves the semantic structures and completes the missing parts of buildings in a plausible manner.


Fig. 6 2D illustration of surface reconstruction: (a) Complete arrangement partition as in [25]; (b) Constraint arrangement partition of our method; (c) Anchors that are marked as inside (red) or outside (black) and their containing cells; (d) Combination of inside labelling cells. Solid line indicates splitting plane, red dot indicates inside anchor and black dot indicates outside anchor.


Space decomposition

For large scale scenes composed of a variety of buildings, even when space decomposition is restricted to individual building components, direct use of the complete arrangement may lead to a complex computational task. Previous works based on space subdivision attempted to reduce the complexity by aligning planes to three orthogonal directions, referred to as the Manhattan World assumption [21]; by global regularization of arrangements with a hierarchical organization of canonical geometric relationships [25]; or by binary space partition, which only splits the cells intersected by the supporting plane of the arrangement [23]. These approaches are either limited by the underlying assumption or generate large amounts of unnecessary splitting cells. Although primitive abstraction yields many plane proxies, only a few of them contribute to the final reconstructed surface. We combine the methods of [23] and [25] to reduce the number of planes used in space decomposition while avoiding over-partitioning of the space cells.

We build an initial space cell composed of six oriented facets from the building component’s bounding box, based on the axis directions of its local coordinate system. We then generate anchor points uniformly sampled along each axis direction and determine whether each anchor lies inside or outside the building component. We cast five rays to the anchor from the four corners and the centroid of the bounding box’s upper facet. For each ray, we count its intersections with the meshes of the component. If more rays have an odd number of intersections than an even number, the anchor is marked as inside; otherwise it is marked as outside. All meshes and anchors of the component are assigned integers denoting the index of the cell to which they belong.
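The parity vote for one anchor can be sketched as below; the intersection counts of the five rays are assumed to be computed elsewhere by a ray–mesh intersection routine.

```python
def classify_anchor(intersection_counts):
    """Majority vote over the rays cast to an anchor: a ray crossing the
    component's meshes an odd number of times suggests the anchor is
    inside; several rays make the vote robust to mesh defects."""
    odd = sum(1 for c in intersection_counts if c % 2 == 1)
    even = len(intersection_counts) - odd
    return "inside" if odd > even else "outside"
```

For example, counts of [1, 1, 3, 0, 2] give three odd rays against two even ones, so the anchor is voted inside despite two rays slipping through holes in the mesh.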

From each coplanar proxy cluster obtained in Section 2.2, we obtain a specific supporting plane. We construct the 3D arrangement for each component from these planes, ranking them in descending order of the area of their associated coplanar proxy cluster. Binary space partition then proceeds by gradually inserting supporting planes into space cells. Instead of global splitting as in [25], we only split the cells intersected by the plane being inserted. A cell is intersected by the supporting plane when any of its contained meshes meets one of the following three conditions:

  1. the supporting plane of its associated superfacet is in the same coplanar proxy cluster with the inserted plane.
  2. a superfacet adjacent to its associated superfacet contains the supporting plane which is in the same coplanar proxy cluster with the inserted plane.
  3. the number of meshes exceeds the threshold Nm. These meshes are both under the orientation tolerance θp and the distance tolerance dp to the inserted plane.

Cells intersected by the splitting plane are then subdivided into two child cells. The cell indexes of anchors and meshes are updated after each insertion. As shown in Fig. 6, constraining the expansion of the space decomposition greatly enhances the robustness of the reconstruction, as it greatly reduces the number of final splitting cells while avoiding the production of empty cells (cells containing no anchor).
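Condition (3) of the cell-splitting test above might be checked as in the following sketch; the function name and threshold values are placeholders rather than the paper's settings.

```python
import numpy as np

def meets_condition_3(mesh_normals, mesh_centroids, plane_n, plane_d,
                      theta_p_deg=10.0, d_p=0.5, N_m=20):
    """A mesh counts toward the threshold when it is both nearly parallel
    to the inserted plane (within theta_p) and close to it (within d_p);
    the cell is flagged for splitting when more than N_m such meshes exist."""
    cos_tol = np.cos(np.radians(theta_p_deg))
    close = np.abs(mesh_centroids @ plane_n - plane_d) <= d_p
    parallel = np.abs(mesh_normals @ plane_n) >= cos_tol
    return int(np.count_nonzero(close & parallel)) > N_m
```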

Labeling of space cells

For each building component, space decomposition yields a binary tree in which each node represents a space cell. We formulate the surface extraction problem as an optimal inside/outside binary labeling of these cells. The labeling problem is handled with the cell-adjacency graph G = {C, F}, where C = {c1,...,cp} are the cells of the component and the edges F = {f1,...,fq} denote the facets linking adjacent cells. By minimizing the cost E via a min-cut formulation, we extract a surface S as the union of all oriented facets from an inside cell to an outside one. Cin and Cout denote the groups of inside and outside cells respectively. The cost function E is introduced in [25] as below:

E(S) = \sum_{c_i \in C_{\mathrm{out}}} V_{c_i}\, p(c_i) + \sum_{c_i \in C_{\mathrm{in}}} V_{c_i}\, \bigl(1 - p(c_i)\bigr) + \beta \sum_{f_j \in S} A_{f_j}
where Vci denotes the volume of cell ci, p(ci) is the function estimating the label probability with respect to the ratio of its inside/outside anchors, and Afj denotes the area of edge facet fj. The first two terms are the data terms of E, while the third acts as a smoothness term weighted by β. An efficient max-flow algorithm [40] is used to find the cut with minimal cost. The function p(ci) lies in the range [0, 1] and measures the coherence of assigning an inside label to cell ci with ratio rin:
p(c_i) = \frac{(2 r_{\mathrm{in}} - 1)\,\lvert 2 r_{\mathrm{in}} - 1 \rvert^{\alpha} + 1}{2}
where rin denotes the fraction of inside anchors among all anchors in the cell, and α controls the data sensitivity of p. When α > 0, the result favours surfaces with small area; conversely, when α < 0, cells whose labels are inconsistent with rin are strongly penalized.
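For reference, p(ci) can be written directly from the formula above; note that α > 0 is assumed in this sketch (a negative α makes the expression singular at rin = 0.5), and the default α value is ours, not the paper's.

```python
def label_probability(r_in, alpha=0.5):
    """p(c_i) from the inside-anchor ratio r_in in [0, 1]; alpha > 0
    controls the data sensitivity (alpha = 0.5 is an example value)."""
    t = 2.0 * r_in - 1.0
    return (t * abs(t) ** alpha + 1.0) / 2.0
```

As expected, p(0) = 0, p(0.5) = 0.5, and p(1) = 1, so fully inside (or outside) cells pay no data cost for the matching label.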

By merging the surface facets that belong to the same plane, as shown in Fig. 6, the complete polygonal surface of each building component is generated. According to the semantic classification labels of the superfacets associated with these supporting planes, we coarsely classify the surface facets into two material groups, roof and facade, and save the model in the standard Wavefront OBJ format (a geometry definition file format first developed by Wavefront Technologies for its Advanced Visualizer animation package). Vegetation objects are not approximated with planar proxies, for the reason described in Section 2.2, and are omitted from our generated geometric models.

3. Infrared signature modeling

3.1. Mesh regeneration

Thermal calculation solves the heat-exchange equations over the elements of an object's surface. Even though the reconstructed geometric model contains planar surfaces and sharp edges, it does not yet fulfill our post-processing requirements. The polygonal surface model therefore needs to be further divided into quality elements, with the physical attributes of their materials attached. These elements are triangle meshes and should meet the following requirements: (i) mesh sizes should not be too large (not exceeding the maximum value specified by the user); (ii) elements should be as near to equilateral as possible; (iii) without very small angles; and (iv) denser near structural edges.

Meshes were regenerated by using NETGEN, an open-source software package that allows triangle surface meshing [41]. It automatically changes the resolution of meshes near boundaries depending on the user-specified grading value, given the maximum mesh size.

According to the physical principles of infrared radiation, differences in the physical properties of surface materials (emissivity, absorptivity, reflectivity, etc.) lead to distinct infrared signatures, even between neighboring mesh elements with identical geometric parameters. As shown in Fig. 7, for every mesh element we refined the initial coarse roof/facade classification according to the multi-view images, assigning materials with the necessary physical attributes. The geometric model of the large scale scene is finally generated and saved by extending the standard OBJ file with the physical properties of the materials.


Fig. 7 Mesh regeneration: (a) Input mesh; (b) Simplified polygonal model; (c) Subdivided mesh elements with assigned material.


3.2. The radiance balance equation

In the process of IR signature simulation, objects are considered to be stacks of layers, with the surface layer exposed to the environment. The stack is assumed to consist of N layers of equal depth dl along the normal direction of the surface mesh element. The temperature of the surface layer results from a number of heat exchange mechanisms, while the inside layers only undergo heat conduction.

As shown in Fig. 8, these effects take place within the object itself as well as between the object and its exterior environment, i.e., radiation exchange, convection heat exchange, latent heat exchange, and heat conduction inside the object [42]. The dynamic distribution of surface temperature Ts can be calculated by solving differential equation of the form [43]:

\rho\, d_l\, C_p \frac{dT_s}{dt} = R + C + L
where
ρ: density of the material (kg/m³)
Cp: specific heat (at constant pressure) of the material (J/kg·K)
C: convection heat flux (W/m²)
L: latent heat flux (W/m²)


Fig. 8 Schematic view of the heat exchange processes.


The radiation term R comprises solar radiation Esun, atmospheric radiation Esky, and the long-wave radiation emitted from the surface layer. Solar radiation is the most important external heat source for city scenes. As the amount of solar energy absorbed by an object's surface depends on the changing position of the sun, several pre-processing steps are necessary. During daytime, solar radiation is modeled according to the position of the sun (elevation and azimuth), the location of the object, the geometry of the surface, and atmospheric conditions (air temperature, cloud cover, and other factors). Given the geographic coordinates (longitude and latitude) of the city, shadow maps can be calculated for any day and time to determine which surface elements are illuminated; during night time this calculation is not required. The shadowing between objects is thus taken into account in the radiance balance equation:

R = \alpha E_{sun} + \varepsilon E_{sky} - \varepsilon \sigma T_s^4
where
α: the absorption coefficient in the short-wave band
ε: the emissivity in the long-wave band
σ: the Stefan-Boltzmann constant

The convection term C is divided into natural and forced convection. Natural convection is caused by the temperature difference between the ambient air and the object surface at the boundary layer, resulting in a constant circulation of air. Forced convection occurs when there is forced movement of air, such as wind:

C = (h_1 + h_2 v_{air})(T_{air} - T_s)
where
h1: exchange coefficient for free convection (W/m²·K)
h2: exchange coefficient for forced convection (J/m³·K)
vair: wind speed (m/s)
Ts: object surface temperature (K)
Tair: air temperature (K)

The latent heat exchange is written analogously. Both evaporation and condensation effects are included:

L = r\,(h_1 + h_2 v_{air})\bigl(e(T_{air}, h_r) - e(T_s, 100\%)\bigr)
where
r: ratio of convection to latent exchange coefficients (K/hPa)
e: the atmospheric water vapor pressure (hPa)
hr: the relative humidity (%)

The coefficients h1 and h2, which appear in both equations for C and L, are chosen to be the same. It is assumed that air which is in contact with the surface layer has the same temperature as the surface and that it is saturated with moisture hr = 100%.

The problem then becomes a system of N coupled differential equations describing the heat fluxes within the object and across its surface along the direction of the surface normal. Since the multi-layer structure is homogeneous, with no variation of material properties within the surface plane, a one-dimensional finite-difference model suffices to solve the heat conduction. Mathematically, it takes the form:

  • 1st layer:
    \rho\, d_l\, C_p \frac{dT_s}{dt} = \alpha E_{sun} + \varepsilon E_{sky} - \varepsilon \sigma T_s^4 + (h_1 + h_2 v_{air})(T_{air} - T_s) + r\,(h_1 + h_2 v_{air})\bigl(e(T_{air}, h_r) - e(T_s, 100\%)\bigr) + \frac{k}{d_l}(T_2 - T_1)
  • nth layer:
    \rho\, d_l\, C_p \frac{dT_n}{dt} = \frac{k}{d_l}(T_{n-1} - 2T_n + T_{n+1})
  • Nth layer:
    \rho\, d_l\, C_p \frac{dT_N}{dt} = \frac{k}{d_l}(T_{N-1} - 2T_N + T_C)
    where Tn is the temperature of the nth layer (2 ≤ n ≤ N), and dl and k are the thickness and the thermal conductivity of each layer. As a boundary condition, the temperature of the deepest (Nth) layer is set to a constant value, usually the indoor temperature TC. These equations are solved with a stepping algorithm of time increment Δt, which should be small for numerical stability. The algorithm starts with a reasonable initial temperature setting (T1, ..., TN) for each layer, whose quality affects the convergence time toward measured surface temperatures. The detailed computing steps can be found in [44].
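A minimal explicit-Euler sketch of one step of this scheme follows; the array layout, function name, and the pattern of passing pre-computed external fluxes R, C, L are our assumptions, not the paper's exact algorithm.

```python
import numpy as np

def step_layers(T, dt, dl, rho, cp, k, R, C, L, T_indoor):
    """Advance the N-layer temperature stack by one time step dt.
    The surface layer receives the external fluxes R + C + L plus
    conduction from layer 2; interior layers only conduct; the layer
    below the deepest one is pinned to the indoor temperature."""
    flux = np.empty_like(T)
    flux[0] = R + C + L + k / dl * (T[1] - T[0])
    flux[1:-1] = k / dl * (T[:-2] - 2.0 * T[1:-1] + T[2:])
    flux[-1] = k / dl * (T[-2] - 2.0 * T[-1] + T_indoor)
    return T + dt * flux / (rho * dl * cp)
```

With all external fluxes zero and a uniform stack at the indoor temperature, a step leaves the temperatures unchanged, which is a quick sanity check on the discretization.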

The thermal energy emitted from the surface layer can be calculated by Planck’s equation. The hemispheric spectral radiance emitted from a surface element i is given by:

E_i = \varepsilon_i \int_{\lambda_1}^{\lambda_2} \frac{c_1 \lambda^{-5}}{\exp\!\left(\dfrac{c_2}{\lambda T_i}\right) - 1}\, d\lambda, \qquad i = 1, 2, \ldots, N
where εi is the emissivity of the material; λ1 and λ2 are the wavelength range, here λ1 = 7.5 μm, λ2 = 14 μm; c1 = 3.74 × 10−16 W · m2, and c2 = 1.4388 × 10−2 m · K.
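The band integral can be evaluated numerically; below is a midpoint-rule sketch using the constants above (the function name and subdivision count are illustrative choices).

```python
import math

C1 = 3.74e-16    # first radiation constant (W·m²)
C2 = 1.4388e-2   # second radiation constant (m·K)

def band_radiance(T, eps=1.0, lam1=7.5e-6, lam2=14e-6, n=2000):
    """Integrate Planck's law over [lam1, lam2] with the midpoint rule
    to obtain the band-limited emitted power of a surface at T kelvin."""
    dlam = (lam2 - lam1) / n
    total = 0.0
    for i in range(n):
        lam = lam1 + (i + 0.5) * dlam
        total += C1 * lam ** -5 / math.expm1(C2 / (lam * T))
    return eps * total * dlam
```

As expected, the band radiance grows with temperature and scales linearly with the emissivity.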

3.3. Synthesis and rendering

The zero-distance radiation intensity Ei of surface element i is determined by Planck's law as in Eq. (18). The distribution of grey scale values in the IR scene depends on the radiation intensity differences across the scene. Let Tmax and Tmin be the maximum and minimum temperatures of the scene, and let Emax and Emin be the corresponding radiation intensities computed by Eq. (18) (here εi = 1). The surface element i with intensity Ei is then assigned the grey scale Gi given by:

G_i = \frac{E_i - E_{\min}}{E_{\max} - E_{\min}} \times 255

The Gouraud shading method was used in the rendering process. First, surface elements in the scene are represented by triangle meshes with three vertices. The color of each vertex is calculated by averaging the grey scales of the elements meeting at that vertex. We then linearly interpolate the resulting vertex colors across the pixels covered by each triangle. Finally, all surface elements in the scene are rendered with smooth transitions along their edges.
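The grey-scale mapping and the vertex averaging that seeds Gouraud shading can be sketched as follows; the function names and data layout are ours.

```python
def grey_level(E, E_min, E_max):
    """Map an element's radiance to an 8-bit grey value, per Eq. (19)."""
    return round(255.0 * (E - E_min) / (E_max - E_min))

def vertex_greys(elem_greys, faces, n_vertices):
    """Average the grey values of the elements meeting at each vertex,
    giving the per-vertex colors that Gouraud shading interpolates."""
    sums = [0.0] * n_vertices
    counts = [0] * n_vertices
    for g, face in zip(elem_greys, faces):
        for v in face:
            sums[v] += g
            counts[v] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]
```

The per-pixel interpolation itself is performed by the rasterizer during rendering.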

4. Experimental validation

In this section, we demonstrate the feasibility of the proposed approach by applying these techniques to datasets of various shapes and sizes. The input meshes of these datasets are reconstructed from multi-view images by using the state-of-the-art dense-MVS method CMPMVS [31].

Given the limitations of shooting conditions and occlusions when photographing from the ground, it is hard to obtain images that cover all sides of building facades and roofs. The reconstructed mesh models of large scale scenes therefore usually suffer from noise, holes, and nonuniform densities. Referring to typical building styles, we first construct two scaled-down scenes in the laboratory and then build a small test scene in the real world to validate the accuracy and reliability of the complete pipeline from model generation to infrared simulation. Finally, we use a large scale real-world scene to validate the scalability and performance of our approach. The experimental results are analyzed in the following parts.

4.1. Accuracy of generated model

Our model generation algorithm is implemented in C++ using the Computational Geometry Algorithm Library (CGAL), which provides all the needed basic geometric operations. An α-expansion and a max-flow library are also used for labeling superfacets and space cells respectively. The parameters of our algorithm and their values used in the experiment are summarized in Table 1. For the scalability of the application, our user-specified unit length is the average edge length of the input mesh.


Table 1. Parameters Used in Geometric Model Reconstruction

We evaluate the accuracy of the reconstructed surface against the input mesh; the results on these experimental datasets are shown in Fig. 9. In Table 2, we compare the root mean square error (RMS) [45] and the complexity of the generated models against the LOD generation approach [25]. The first two datasets, museum and laboratory building, are small-scale plastic models of real buildings on the campus of Beihang University, Beijing. The third dataset is a man-made real-world scene composed of three brick buildings with board roofs, laid on cement ground. For the museum dataset, the two methods have comparable RMS values, but ours produces fewer facets thanks to the constraint on space partition, and thus has a lower computation cost. The slopes at the back of the museum are correctly reconstructed by orientation alignment. For the laboratory building dataset, the hierarchical superfacet clustering algorithm generates more reliable segmentation from defect-laden mesh areas, enabling more accurate and fewer extracted plane proxies; our method therefore produces fewer facets and a lower RMS value. For the brick buildings dataset, our algorithm successfully reconstructs the detailed structures of the roof, producing more facets as well as a lower RMS value. The experimental results show that our method offers comparable approximation accuracy, fewer facets, and better preservation of structures.


Fig. 9 Model generation accuracy and structure awareness. From top to bottom are the results of three datasets: (a) Classical museum; (b) Laboratory building; (c) Man-made brick buildings. The first column is the input mesh reconstructed by MVS. The second and third columns show the geometric model and its Hausdorff distance to the input mesh, generated by our method and by the LOD algorithm in [25] respectively. The Hausdorff distance is displayed with a color scale from blue (low) to red (high).



Table 2. Output Complexity and Accuracy

4.2. Reliability of infrared simulation

Validation is an important step in model development. For thermal IR, it is rather difficult to achieve. Most objects available for validation are quite complex and many of their specific material parameters are not precisely known [46].

We design a test scene (the brick buildings dataset) to validate the reliability of our infrared simulation model. The scene is not overly complex in geometry and is built with materials whose parameters can be found in the literature. The facades of the buildings are made of 230 mm thick brick, while the roofs are made of 10 mm thick board. The 3D geometric model of the test scene was generated as illustrated in Section 4.1. Its surface is further subdivided into triangle meshes with the material attributes of brick or board attached.

The physical experiment was conducted in Beijing, China (longitude 116.33°E, latitude 39.73°N) on April 26th, 2013. As shown in Fig. 10, a TASI-60 temperature meter was used to acquire the distribution of radiation temperatures across (some of) the buildings' surface facets. A FLIR Systems A615 camera provided the thermal IR images of the scene in the long-wave IR (7.5μm – 14μm) band. Experimental data were measured and recorded every 30 minutes, including temperatures of the test facets, IR images from a fixed perspective, and meteorological data such as wind speed, relative humidity, and ambient temperature. The parameters used in the physical experiment are described in Section 3.2; their data sources and values are summarized in Table 3.


Fig. 10 Left: Meshes attached with material characteristics. Right: Man-made buildings on the test field. Solid dots and dashed dots with numbers indicate thermistor locations on the front facets and back facets respectively. Thermistor 1 (south wall facet), 2 (east wall facet), 3 (north wall facet), 4 (west wall facet), 5 (south roof facet), 6 (east roof facet), 7 (north roof facet), and 8 (west roof facet).



Table 3. Parameters Used in Infrared Simulation

Validation of simulated surface temperature

The measured temperatures (black curves) and the temperatures predicted by our approach (red curves) for the eight thermistor-instrumented facets are shown in Fig. 11. From 11:00 to 12:00 the sky was cloud-covered, and the decreased solar irradiance caused weaker heating of the surfaces; there is a pronounced drop in the measured temperature of the south wall (1 in Fig. 10) and south roof (5 in Fig. 10) facets during this time. After 12:00 the cloud cover decreased, leading to a significant temperature increase on the south and west facets lit by the sun. Drifting clouds caused the notched characteristics of the measured temperature curves. Because our approach is not designed to exploit measured solar radiation for IR modeling, the computed results cannot follow the measured temperatures at the level of these notched details. Better predictions are obtained for the east and north facets of both wall (2 and 3 in Fig. 10) and roof (6 and 7 in Fig. 10), which were not directly lit by sunlight during this period and are therefore less affected by changes in cloud cover. For all wall and roof facets, our algorithm produces smooth temperature curves that agree well with the trend of the measured curves.


Fig. 11 Measured temperatures (black curves) and temperatures calculated by our approach for all test facets (1–8 in Fig. 10), from 11:00 to 16:00 (UTC+8) on April 26th, 2013.


The deviation between the computed results and the measured data mainly comes from: (i) the discrepancy between material parameters taken from the literature and their actual values; (ii) MODTRAN [47], used for the radiance and transmission computations, which makes it difficult to model real cloud cover conditions; (iii) most facets having a temperature distribution with non-vanishing variance, while the computed temperatures are area-averaged mean values.

Effects of geometric model’s accuracy

We aim at reliably simulating IR signatures of real-world urban scenes. The deviation between the geometric model and its corresponding real-world object affects the final simulation result. This deviation mainly comes from: (i) the surface of a real-world object is usually uneven and cannot be finely reproduced by a geometric model; (ii) the direction and size of the semantic components of the generated geometric model (e.g., facade and roof) may deviate from the real object.

For the balance between computational efficiency and accuracy, deviation of the first kind is inevitable when simulating large-scale scenes. Deviation of the second kind varies with the method used to generate the geometric models. In existing infrared simulation systems, most geometric models are manually drawn with CAD tools [29] from relevant aerial or street-view imagery. Such data may not cover all sides of the target object, leaving model designers uncertain about its direction and size. The input to our model generation approach consists of mesh models reconstructed from multi-view images, which provide more reliable information on the geometric sizes and spatial relationships of target objects in three dimensions. As shown in Fig. 9 and Table 2, the geometric models generated by our approach match the original reconstructed meshes well. Solar radiation is the most important external heat source in Eq. (12), and the direction of the surface normal mainly affects the amount of solar radiation received; a deviation of the surface normal between the generated model and the real object thus finally leads to a deviation of surface temperature. As shown in Fig. 12, we add four different offsets to the original directions of the test facets and compare the deviations between measured temperatures and predicted results for the four new directions. To evaluate the temperature deviations, we set a relevance criterion for meteorological disturbances: because the measurement accuracy of the thermistor is ±1K and that of the thermal camera is ±2K, we set two fixed relevance lines at ±2K.


Fig. 12 Deviation of temperature between the calculated and measured results for different direction offsets of the test facets (1–8 in Fig. 10), from 11:00 to 16:00 (UTC+8) on April 26th, 2013. H indicates the offset of the horizontal azimuth angle (“+” clockwise, “−” counterclockwise); V indicates the offset of the pitch angle (“+” away from the ground, “−” toward the ground). The two dashed lines at ±2K indicate fixed relevance criteria.


For all facades and roofs, compared with the four assumed direction offsets, the original directions of the test facets produce the smallest deviations between predicted and measured temperatures. For the north (7 in Fig. 10) and south (5 in Fig. 10) roofs, whose slope angle is small, the two types of 10-degree direction offsets and the original direction produce close temperature results. For all facades and vertical roofs, when illuminated by the sun, the temperature deviations caused by offsets of the horizontal azimuth angle are smaller than those caused by offsets of the pitch angle; a 10-degree pitch offset may lead to more than 2K of surface temperature deviation. When not illuminated by the sun, the two types of 10-degree offsets and the original direction produce close temperature results, with surface temperature deviations both below 2K.

Validation of simulated infrared image

Finally, according to Eq. (19), we calculate the radiation distribution of the whole scene in the long-wave IR (7.5μm – 14μm) band and render it into images from the same perspective as the recorded IR images.

As shown in Fig. 13, we compare the simulation results obtained with the geometric model processed by our approach and with the raw mesh reconstructed by multi-view stereo. In both simulated scenes, the infrared signatures of roofs and facades match the real infrared image well. Although the dense reconstructed meshes display more detail on facades and roofs than our simplified model, those details contain more noise and cannot accurately model the real structures of the object surfaces. While the reconstructed model has around 1 M triangle facets, our method produces 52 K triangle facets with the average edge length increased 6.7-fold, which greatly reduces the computation cost and renders the flat object surfaces smoothly.


Fig. 13 The comparison of simulation result with different geometric models: (a) Infrared image taken by a FLIR A615 camera; (b) Simulated scene with our geometric model; (c) Simulated scene with mesh reconstructed by multi-view stereo.


The simulated scene images and the corresponding real IR images from 11:00 to 16:00 are shown in Fig. 14. As a result of the changing incident direction and intensity of solar radiance, the grey-scale correlation among object surfaces differs at different times in the scene. IR shadows are calculated from the sun position and the geometric relationships between objects. Unlike shadows in the visible band, a typical effect in the IR range is the history influence of shadows: areas previously shadowed vary continuously in radiance distribution, owing to the capacity of the material to store energy (see Fig. 14(a–l)). The comparisons show that our simulated images match the recorded IR images well in terms of the distribution and correlation of thermal radiance intensity across the scene.


Fig. 14 Various simulated and real IR images of scenes at different times on April 26th, 2013. The first row (subfigure (a) – (c)) and the third row (subfigure (g) – (i)) are simulated scene images from 11:00 to 16:00 in long wave IR (7.5μm – 14μm) band. The second row (subfigure (d) – (f)) and the fourth row (subfigure (j) – (l)) are recorded IR images from 11:00 to 16:00 in the same waveband.


4.3. Scalability and performance

The three datasets (museum, laboratory, and brick buildings) used in the previous validation experiments are small scale scenes, ranging from individual buildings to simple building blocks. The complexities of their input mesh models range from 100 K to 1 M triangle faces. The experimental results of the three small scale scenes, from model generation to infrared simulation, are illustrated in Fig. 15.


Fig. 15 Model generation and infrared simulation on small scale scenes. From top to bottom: museum, laboratory building, and man-made brick buildings. The first column shows the input mesh models. The second column shows the optimized geometric models generated by our method; the semantic structures of the input mesh are well preserved. The third column shows the subdivided mesh elements with material characteristics attached. The fourth column displays the simulated scenes in the long-wave IR (7.5μm – 14μm) band; IR signatures of walls and roofs, as well as IR shadows, are realistically calculated and rendered.


Our approach can also handle large scale real-world scenes whose reconstructed models comprise several million triangle faces. The results for a complicated urban scene covering 350 × 350 m² of Zhuhai Airport, China, are illustrated in Fig. 16. The input mesh (3.4 M triangle faces) was reconstructed from 71 images taken from a training aircraft. The generated model comprises 2 K triangle faces, excluding ground faces.


Fig. 16 Model generation and infrared simulation on a large scale scene. In subfigure (b) – (e), the red squares mark the area corresponding to the red border image in subfigure (a) and give detailed views at the same perspective.


The airport scene comprises three main types of components: the terminal building, the control tower, and lounge bridges. The aerial images were taken around the region of the control tower, which is marked with a red square in Fig. 16, while the terminal building and some lounge bridges usually appear near the image edges. The terminal building is only partly reconstructed in the input mesh, and both the terminal building and the lounge bridges contain many defect-laden meshes. For the terminal building, our approach generates detailed structures and preserves the main structures in defect areas. Only one lounge bridge is generated, as the other two do not yield enough supporting plane proxies. The top of the control tower is actually composed of curved facades with near-zero curvature; these are well approximated as planar surfaces, and their eight different orientations are correctly aligned. The main structures of the control tower are successfully generated and well separated from adjacent trees. Trees and bushes in the scene are not generated, since our model generation algorithm omits vegetation.

The model surface is then subdivided into 400 K triangle facets at the specified element resolution. Compared with the input mesh, the number of subdivided facets is 7.5 times smaller, while their average edge length is 4 times larger. Material attributes are attached to these surface elements. The results show that, at this level of element resolution, the simulated infrared scene achieves a good visual effect. From the reconstructed MVS mesh (Fig. 16(b)) to the generated model (Fig. 16(c)), the process takes around 160 minutes (15 minutes for superfacet classification, about 1 minute for proxy abstraction, and 144 minutes for surface reconstruction). From the generated model (Fig. 16(c)) to the subdivided elements (Fig. 16(d)), it takes around 48 minutes. In total, model generation for IR simulation costs around 208 minutes. All timings are measured on a laptop with an Intel Core i5 clocked at 2.5 GHz and 8 GB RAM. This significantly reduces the time required by current manual methods of creating scene models for IR simulation.

For all small scale scenes and the large scale scene, the experimental results match our initial goal of reliably and semi-automatically simulating urban scenes in the IR band: the geometric models are accurately and automatically generated, and the IR signatures are efficiently and realistically simulated.

5. Conclusion

This paper proposed a semi-automated approach to simulating the IR signatures of city scenes from multi-view visible images. In the IR scene generation process, the application of multi-view reconstruction and mesh approximation technologies significantly reduces the man-in-the-loop requirements for several aspects of geometric model construction.

For a real scene, a 3D mesh model can be conveniently reconstructed with state-of-the-art multi-view stereo methods. We digest the reconstructed meshes into an optimized watertight geometric model and then subdivide it into quality mesh elements with material characteristics attached. Our reconstruction algorithm is shown to be robust to defect-laden meshes through regularized optimizations combined with constraints on the space partition, which generates well-behaved surfaces.

We establish a radiance balance equation based on the principle of energy equilibrium for every surface mesh element. Using the finite difference method, the temperature distributions are efficiently calculated. The scene is finally rendered into images based on the distribution of radiance values from a user-specified perspective. Comparisons of temperatures and images between the simulation results and measured data verify that our method realistically simulates the IR characteristics of real-world city scenes.
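The per-element temperature calculation described above — an explicit finite-difference update of a wall discretized into layers, with a surface heat balance on the outer layer — can be sketched as follows. This is a minimal illustration, not the paper's implementation: all material values and convection coefficients below are assumed placeholders, and the latent-heat term L is omitted for brevity.

```python
SIGMA = 5.67e-8  # Stefan-Boltzmann constant, W/(m^2 K^4)

def surface_flux(T_s, T_air, E_sun, E_sky, alpha, eps, h1, h2, v_air):
    """Net surface flux: absorbed solar/sky radiation R plus convection C.
    The latent-heat term L is omitted in this sketch for brevity."""
    R = alpha * E_sun + eps * E_sky - eps * SIGMA * T_s ** 4
    C = (h1 + h2 * v_air) * (T_air - T_s)
    return R + C

def step_wall(T, dt, dl, rho, cp, k, T_air, E_sun, E_sky,
              alpha=0.6, eps=0.9, h1=5.7, h2=3.8, v_air=1.0, T_inner=293.0):
    """One explicit finite-difference step for a wall discretized into
    len(T) layers of thickness dl (T[0] is the sunlit outer layer).
    Defaults are illustrative assumptions, not the paper's parameters."""
    cap = rho * dl * cp  # heat capacity per unit area of one layer
    Tn = list(T)
    # Outer layer: surface heat balance plus conduction from layer 1
    Tn[0] = T[0] + dt / cap * (
        surface_flux(T[0], T_air, E_sun, E_sky, alpha, eps, h1, h2, v_air)
        + k / dl * (T[1] - T[0]))
    # Interior layers: pure conduction between neighboring layers
    for n in range(1, len(T) - 1):
        Tn[n] = T[n] + dt / cap * (k / dl) * (T[n - 1] - 2 * T[n] + T[n + 1])
    # Innermost layer exchanges heat with a fixed indoor temperature
    N = len(T) - 1
    Tn[N] = T[N] + dt / cap * (k / dl) * (T[N - 1] - 2 * T[N] + T_inner)
    return Tn
```

Stepping this update over a day of meteorological inputs yields the surface temperature history of one element; the explicit scheme only requires the time step to be small enough for stability.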

Currently, for geometric model reconstruction, we only use planar primitives to approximate meshes, and surfaces are reconstructed at the LOD2 level. We do not generate detailed objects on roofs or facades with planar proxies due to the limited resolution of MVS meshes (typically only a few triangle facets for such objects, e.g., windows). At this level of reconstruction, multiple material properties in different regions of the same plane cannot be automatically segmented and processed. In future work, we wish to introduce more regular shapes into primitive abstraction, such as cylinders and spheres, to obtain more accurate reconstructed surfaces of non-planar structures. We will also try to introduce parametric 3D models to represent different species of vegetation. Furthermore, we want to devise a photo-consistent framework that exploits the color attributes of the multi-view stereo images to reliably reconstruct repetitive patterns on facades and roofs, as well as to segment different land cover regions. More physical experiments will be conducted on real-world city scenes to achieve a better understanding of their IR characteristics and to verify more thermal models of typical architectural styles.

Acknowledgments

This work is supported by the Innovation Foundation of AVIC under Grant No. CXY2012BH02, China Scholarship Council, National Natural Science Foundation of China (61271023), Program for New Century Excellent Talents in University (NCET-13-0020), and Fundamental Research Funds for the Central Universities (YWF-16-BJ-Y-28). We appreciate Zhaoying Liu of Beijing University of Technology and Xiyu Yu of University of Technology Sydney for their comments on this paper. We are grateful for the detailed and useful comments from the reviewers and from the editor.

References and links

1. C. Nelsson, P. Hermansson, S. Nyberg, A. Persson, R. Persson, S. Sjökvist, and T. Winzell, “Optical signature modeling at FOI,” Proc. SPIE 6395, 639508 (2006). [CrossRef]  

2. S. E. Paul, “Subpixel temperature estimation from single-band thermal infrared imagery,” Ph.D. thesis, Rochester Institute of Technology (2012).

3. M. Li, L. Nan, N. Smith, and P. Wonka, “Reconstructing building mass models from UAV images,” Comput. Graph-UK. 54, 84–93 (2016). [CrossRef]  

4. A. Reinov, Y. Bushlin, A. Lessin, and D. Clement, “Dew, dust, and wind influencing thermal signatures of objects,” Proc. SPIE 6941, 69410U (2008). [CrossRef]  

5. S. Pallotta, X. Briottet, C. Miesch, and Y. Kerr, “Sensor radiance physical model for rugged heterogeneous surfaces in the 3–14 μm region,” Opt. Express 14(6), 2130–2150 (2006). [CrossRef]   [PubMed]  

6. J. S. Tyo, B. M. Ratliff, J. K. Boger, W. T. Black, D. L. Bowers, and M. P. Fetrow, “The effects of thermal equilibrium and contrast in LWIR polarimetric images,” Opt. Express 15(23), 15161–15167 (2007). [CrossRef]   [PubMed]  

7. H. Ren, R. Liu, G. Yan, Z.-L. Li, Q. Qin, Q. Liu, and F. Nerry, “Performance evaluation of four directional emissivity analytical models with thermal SAIL model and airborne images,” Opt. Express 23(7), A346–A360 (2015). [CrossRef]   [PubMed]  

8. J. Latger, T. Cathala, N. Douchin, and A. Le Goff, “Simulation of active and passive infrared images using the SE-WORKBENCH,” Proc. SPIE 6543, 654302 (2007). [CrossRef]  

9. K. Johnson, A. Curran, D. Less, D. Levanen, E. Marttila, T. Gonda, and J. Jones, “MuSES: A new heat and signature management design tool for virtual prototyping,” in Proceedings of the 9th Annual Ground Target Modelling & Validation Conference (Citeseer, 1998).

10. J. R. Schott, S. D. Brown, R. V. Raqueno, H. N. Gross, and G. Robinson, “An advanced synthetic image generation model and its application to multi/hyperspectral algorithm development,” Can. J. Rem. Sens. 25(2), 99–111 (1999). [CrossRef]  

11. F. Lafarge and P. Alliez, “Surface reconstruction through point set structuring,” Comput. Graph. Forum 32(2), 225–234 (2013). [CrossRef]  

12. M. Kazhdan, M. Bolitho, and H. Hoppe, “Poisson surface reconstruction,” in Proceedings of the Fourth Eurographics Symposium on Geometry Processing (Eurographics Association, 2006), 61–70.

13. A. Hornung and L. Kobbelt, “Robust reconstruction of watertight 3D models from non-uniformly sampled point clouds without normal information,” in Proceedings of Symposium on Geometry Processing (Eurographics Association, 2006), 41–50.

14. V. Lempitsky and Y. Boykov, “Global optimization for shape fitting,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2007), 1–8.

15. P. Labatut, J.-P. Pons, and R. Keriven, “Robust and efficient surface reconstruction from range data,” Comput. Graph. Forum 28(8), 2275–2290 (2009). [CrossRef]  

16. P. Musialski, P. Wonka, D. Aliaga, M. Wimmer, L. Van Gool, and W. Purgathofer, “A survey of urban reconstruction,” Comput. Graph. Forum 32(6), 146–177 (2013). [CrossRef]  

17. Q.-Y. Zhou and U. Neumann, “Fast and extensible building modeling from airborne LiDAR data,” in Proceedings of the 16th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM, 2008), 1–8.

18. C. Poullis and S. You, “Automatic reconstruction of cities from remote sensor data,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2009), 2775–2782.

19. H. Yu, E. Li, W. Gong, and S. Han, “Structured image reconstruction for three-dimensional ghost imaging lidar,” Opt. Express 23(11), 14541–14551 (2015). [CrossRef]   [PubMed]  

20. F. Lafarge, “Some new research directions to explore in urban reconstruction,” in Proceedings of Joint Urban Remote Sensing Event (IEEE, 2015), 1–4.

21. J. M. Coughlan and A.L. Yuille, “The Manhattan world assumption: Regularities in scene statistics which enable Bayesian inference,” in Conference on Neural Information Processing Systems (MIT Press, 2000), 845–851.

22. C. A. Vanegas, D. G. Aliaga, and B. Benes, “Automatic extraction of Manhattan-world building masses from 3D laser range scans,” IEEE T. Vis. Comput. Gr. 18(10), 1627–1637 (2012). [CrossRef]  

23. A.-L. Chauve, P. Labatut, and J.-P. Pons, “Robust piecewise-planar 3D reconstruction and completion from large-scale unstructured point data,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2010), 1261–1268.

24. Q.-Y. Zhou and U. Neumann, “2.5D building modeling by discovering global regularities,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2012), 326–333.

25. Y. Verdie, F. Lafarge, and P. Alliez, “LOD Generation for Urban Scenes,” ACM T. Graphic. 34(3), 30 (2015). [CrossRef]  

26. M. A. Owens, M. R. Wellfare, J. Foster, J. S. Watson, D. A. Vechinski, M. Richards, and V. Underwood, “IRMA 5.0 MultiSensor signature prediction model,” Proc. SPIE 4209, 249–267 (1999). [CrossRef]  

27. S. P. Sullivan and W. R. Reynolds, “Validation of the physically reasonable infrared signature model (PRISM),” Proc. SPIE 0890, 104–110 (1988). [CrossRef]  

28. S. E. Lane, C. S. Nichols, A. M. Spencer, and J. M. Cathcart, “OREOS: a new EO-IR modeling and simulation tool for US Coast Guard search and rescue applications,” Proc. SPIE 8713, 87130R (2013). [CrossRef]  

29. S. R. Lach, S. D. Brown, and J. P. Kerekes, “Semi-automated DIRSIG scene modeling from 3D LiDAR and passive imaging sources,” Proc. SPIE 6214, 62140I (2006). [CrossRef]  

30. N. Li, Z. Lv, S. Wang, G. Gong, and L. Ren, “A real-time infrared radiation imaging simulation method of aircraft skin with aerodynamic heating effect,” Infrared Phys. Techn. 71, 533–541 (2015). [CrossRef]  

31. M. Jancosek and T. Pajdla, “Multi-view reconstruction preserving weakly-supported surfaces,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2011), 3121–3128.

32. M. Attene, S. Katz, M. Mortara, G. Patané, M. Spagnuolo, and A. Tal, “Mesh segmentation – A comparative study,” in Proceedings of IEEE International Conference on Shape Modeling and Applications (IEEE, 2006), 7.

33. A. Shamir, “A survey on mesh segmentation techniques,” Comput. Graph. Forum 27(6), 1539–1556 (2008). [CrossRef]  

34. F. Lafarge, R. Keriven, and M. Brédif, “Combining meshes and geometric primitives for accurate and semantic modeling,” in The British Machine Vision Conference (BMVA, 2009), 38.1–38.11.

35. D. Cohen-Steiner and J. Morvan, “Restricted Delaunay triangulations and normal cycle,” in Proceedings of the Nineteenth Annual Symposium on Computational Geometry (ACM, 2003), 312–321.

36. B. Heckel, A. E. Uva, B. Hamann, and K. I. Joy, “Surface reconstruction using adaptive clustering methods,” in Geometric Modelling (Springer, 2001), 199–218. [CrossRef]  

37. Y. Qin, S. Li, T. T. Vu, Z. Niu, and Y. Ban, “Synergistic application of geometric and radiometric features of LiDAR data for urban land cover mapping,” Opt. Express 23(11), 13761–13775 (2015). [CrossRef]   [PubMed]  

38. Y. Boykov, O. Veksler, and R. Zabih, “Fast approximate energy minimization via graph cuts,” IEEE T. Pattern Anal. 23(11), 1222–1239 (2001). [CrossRef]  

39. Y. Li, X. Wu, Y. Chrysathou, A. Sharf, D. Cohen-Or, and N. J. Mitra, “Globfit: Consistently fitting primitives by discovering global relations,” ACM T. Graphic. 30(4), 52 (2011). [CrossRef]  

40. Y. Boykov and V. Kolmogorov, “An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision,” IEEE T. Pattern Anal. 26(9), 1124–1137 (2004). [CrossRef]  

41. J. Schöberl, “NETGEN An advancing front 2D/3D – mesh generator based on abstract rules,” Comput. Vis. Sci. 1(1), 41–52 (1997). [CrossRef]  

42. X. Xiong, F. Zhou, X. Bai, and X. Yu, “IR characteristic simulation of city scenes based on radiosity model,” Proc. SPIE 8907, 890727 (2013). [CrossRef]  

43. B. Bartos and K. Stein, “FTOM-2D: a two-dimensional approach to model the detailed thermal behavior of nonplanar surfaces,” Proc. SPIE 9653, 96530G (2015). [CrossRef]  

44. Z. Wang, Z. Jiang, S. Liu, and Q. Peng, “New model for realistic IR image rendering of city buildings,” Proc. SPIE 5405, 177–188 (2004). [CrossRef]  

45. N. Aspert, D. Santa Cruz, and T. Ebrahimi, “MESH: measuring errors between surfaces using the Hausdorff distance,” in Proceedings of IEEE International Conference on Multimedia and Expo (IEEE, 2002), 5705–5708.

46. A. Malaplate, P. Grossmann, and F. Schwenger, “CUBI: a test body for thermal object model validation,” Proc. SPIE 6543, 654305 (2007). [CrossRef]  

47. MODTRAN, “MODTRAN atmospheric radiative transfer model,” http://www.modtran.org.

48. T. Foken and C. J. Nappo, Micrometeorology (Springer, 2008), Chap. 4.

49. F. D. Lapierre and M. Acheroy, “Modelisation of convective heat transfer using novel adaptive parametric models and novel empirical models obtained from measured surface temperatures,” in 5th International IR Target, Background Modelling & Simulation (ITBMS) Workshop (2009).

50. Solar Air Conditioning Tech Group: solar-ac@yahoogroups.com, “Absorptivity & Emissivity table 1 plus others,” http://www.solarmirror.com/fom/fom-serve/cache/43.html.

51. J. A. Clarke, P. P. Yaneske, and A. A. Pinney, “The Harmonisation of Thermal Properties of Building Materials,” http://www.esru.strath.ac.uk.



Figures (16)

Fig. 1 An overview of the proposed simulation pipeline: (a) A sequence of multi-view images acquired by a hand-held camera; (b) Mesh model generated by using MVS; (c) Superfacets clustered through region growing; (d) Superfacets labeled with different classes of interest; (e) Plane proxies extracted from superfacets; (f) Optimized geometric model produced via surface reconstruction; (g) Subdivided element meshes of the model surface; (h) Model with assigned material attributes; (i) Simulated infrared scene.

Fig. 2 Classification of superfacets into semantic groups: (a) Input mesh; (b) Clustered superfacets (based on curvature information); (c) Aggregated superfacets (according to the rules of our hierarchy algorithm); (d) Classified superfacets with labels of semantic groups. Color code: roof (blue), facade (yellow), and ground (brown).

Fig. 3 Proxy regularization: (a) For each individual building component, we construct a local coordinate system. (b) 22 plane proxies are extracted from the superfacets and then divided into (c) 7 parallel proxy clusters and (d) 22 coplanar proxy sets. (e) A relationship graph is constructed, where each node represents a parallel cluster and each edge represents a geometric relationship between two nodes, such as orthogonality (purple), V-symmetry (blue), or H-symmetry (yellow). These proxies are regularized according to the geometric constraints illustrated in the graph.

Fig. 4 Orientation alignment of proxies: purple, blue, and yellow circles represent the relationships of orthogonality, V-symmetry, and H-symmetry, respectively. Only when two circles intersect can the new orientation (red arrow) be uniquely calculated as the direction toward the intersection point. If only one circle exists, the old orientation (gray arrow) is projected orthogonally onto the circle.

Fig. 5 Placement alignment of proxies.

Fig. 6 2D illustration of surface reconstruction: (a) Complete arrangement partition as in [25]; (b) Constrained arrangement partition of our method; (c) Anchors marked as inside (red) or outside (black) and their containing cells; (d) Combination of inside-labeled cells. Solid lines indicate splitting planes; red dots indicate inside anchors and black dots indicate outside anchors.

Fig. 7 Mesh regeneration: (a) Input mesh; (b) Simplified polygonal model; (c) Subdivided mesh elements with assigned materials.

Fig. 8 Schematic view of the heat exchange processes.

Fig. 9 Model generation accuracy and structure awareness. From top to bottom are the results of three datasets: (a) Classical museum; (b) Laboratory building; (c) Man-made brick buildings. The first column is the input mesh reconstructed by MVS. The second and third columns show the geometric model and its Hausdorff distance to the input mesh, generated by our method and by the LOD algorithm in [25], respectively. The Hausdorff distance is displayed with a color scale from blue to red, representing values from low to high.

Fig. 10 Left: Meshes attached with material characteristics. Right: Man-made buildings on the test field. Solid dots and dashed dots with numbers indicate thermistor locations on the front facets and back facets, respectively. Thermistor 1 (south wall facet), 2 (east wall facet), 3 (north wall facet), 4 (west wall facet), 5 (south roof facet), 6 (east roof facet), 7 (north roof facet), and 8 (west roof facet).

Fig. 11 Measured temperatures (black curves) and temperatures calculated by our approach for all test facets (1–8 in Fig. 10), from 11:00 to 16:00 (UTC+8) on April 26th, 2013.

Fig. 12 Deviation of temperature between the calculated and measured results for different direction offsets of the test facets (1–8 in Fig. 10), from 11:00 to 16:00 (UTC+8) on April 26th, 2013. H indicates the offset of the horizontal azimuth angle (“+” for the clockwise direction, “−” for the counterclockwise direction); V indicates the offset of the pitch angle (“+” for the direction away from the ground, “−” for the direction facing the ground). The two dashed lines are set at ±2 K and indicate fixed relevance criteria.

Fig. 13 Comparison of simulation results with different geometric models: (a) Infrared image taken by a FLIR A615 camera; (b) Simulated scene with our geometric model; (c) Simulated scene with the mesh reconstructed by multi-view stereo.

Fig. 14 Simulated and real IR images of scenes at different times on April 26th, 2013. The first row (subfigures (a)–(c)) and the third row (subfigures (g)–(i)) are simulated scene images from 11:00 to 16:00 in the long-wave IR (7.5 μm – 14 μm) band. The second row (subfigures (d)–(f)) and the fourth row (subfigures (j)–(l)) are recorded IR images from 11:00 to 16:00 in the same waveband.

Fig. 15 Model generation and infrared simulation on small scale scenes.

Fig. 16 Model generation and infrared simulation on a large scale scene.

Tables (3)

Table 1 Parameters Used in Geometric Model Reconstruction

Table 2 Output Complexity and Accuracy

Table 3 Parameters Used in Infrared Simulation

Equations (19)


$$\alpha_e(f_i) = \left(\frac{h_i - h_{\min}}{h_{\max} - h_{\min}}\right)^{1/2}$$

$$\alpha_p(f_i) = 1 - \frac{3\,|\lambda_{\min}|^{1/2}}{|\lambda_{\min}|^{1/2} + |\lambda_{\mathrm{mid}}|^{1/2} + |\lambda_{\max}|^{1/2}}$$

$$\alpha_h(f_i) = |\mathbf{n}_i \cdot \mathbf{n}_z|$$

$$U(l) = \sum_{i \in S} U_d(l_i) + \gamma \sum_{\{i,j\} \in E} U_s(l_i, l_j)$$

$$U_d(l_i) = A_i \cdot \begin{cases} 1 - \alpha_p \alpha_h \bar{\alpha}_e & \text{if } l_i = \text{ground} \\ 1 - \bar{\alpha}_p \alpha_h & \text{if } l_i = \text{vegetation} \\ 1 - \alpha_p \bar{\alpha}_h & \text{if } l_i = \text{facade} \\ 1 - \alpha_p \alpha_h \alpha_e & \text{if } l_i = \text{roof} \end{cases}$$

$$U_s(l_i, l_j) = C_{ij}\,|\mathbf{n}_i \cdot \mathbf{n}_j|\,\delta(l_i, l_j)$$

$$\theta(\mathbf{n}) = \arccos(\mathbf{n} \cdot \mathbf{n}_z)$$

$$\varphi(\mathbf{n}) = \arctan\!\left(\frac{\mathbf{n} \cdot \mathbf{n}_y}{\mathbf{n} \cdot \mathbf{n}_x}\right)$$

$$E(S) = \sum_{c_i \in C_{\mathrm{out}}} V_{c_i}\, p(c_i) + \sum_{c_i \in C_{\mathrm{in}}} V_{c_i}\,(1 - p(c_i)) + \beta \sum_{f_j \in S} A_{f_j}$$

$$p(c_i) = \frac{(2 r_{\mathrm{in}} - 1)\,|2 r_{\mathrm{in}} - 1|^{\alpha} + 1}{2}$$

$$\rho\, d_l\, C_p\, \frac{dT_s}{dt} = R + C + L$$

$$R = \alpha E_{\mathrm{sun}} + \varepsilon E_{\mathrm{sky}} - \varepsilon \sigma T_s^4$$

$$C = (h_1 + h_2 v_{\mathrm{air}})(T_{\mathrm{air}} - T_s)$$

$$L = r\,(h_1 + h_2 v_{\mathrm{air}})\bigl(e(T_{\mathrm{air}}, h_r) - e(T_s, 100\%)\bigr)$$

$$\rho\, d_l\, C_p\, \frac{dT_s}{dt} = \alpha E_{\mathrm{sun}} + \varepsilon E_{\mathrm{sky}} - \varepsilon \sigma T_s^4 + (h_1 + h_2 v_{\mathrm{air}})(T_{\mathrm{air}} - T_s) + r\,(h_1 + h_2 v_{\mathrm{air}})\bigl(e(T_{\mathrm{air}}, h_r) - e(T_s, 100\%)\bigr) + \frac{k}{d_l}(T_2 - T_1)$$

$$\rho\, d_l\, C_p\, \frac{dT_n}{dt} = \frac{k}{d_l}(T_{n-1} - 2T_n + T_{n+1})$$

$$\rho\, d_l\, C_p\, \frac{dT_N}{dt} = \frac{k}{d_l}(T_{N-1} - 2T_N + T_C)$$

$$E_i = \varepsilon_i \int_{\lambda_1}^{\lambda_2} \frac{c_1 \lambda^{-5}}{\exp\!\left(\dfrac{c_2}{\lambda T_i}\right) - 1}\, d\lambda, \quad i = 1, 2, \ldots, N$$

$$G_i = \frac{E_i - E_{\min}}{E_{\max} - E_{\min}} \times 255$$
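The last two equations — the in-band Planck integration giving each element's radiant exitance $E_i$ and the linear mapping to 8-bit gray levels $G_i$ — can be sketched in Python as follows. This is an illustrative sketch only: the integration step count is an assumption, and a simple midpoint rule stands in for whatever quadrature the authors used.

```python
import math

# First and second radiation constants of Planck's law (SI units)
C1 = 3.7418e-16  # W·m^2
C2 = 1.4388e-2   # m·K

def band_exitance(T, eps=1.0, lam1=7.5e-6, lam2=14e-6, steps=200):
    """In-band radiant exitance of a surface at temperature T (K):
    eps * integral of c1*lam^-5 / (exp(c2/(lam*T)) - 1) over [lam1, lam2],
    evaluated here with a midpoint rule (step count is an assumption)."""
    dlam = (lam2 - lam1) / steps
    total = 0.0
    for i in range(steps):
        lam = lam1 + (i + 0.5) * dlam
        total += C1 * lam ** -5 / math.expm1(C2 / (lam * T)) * dlam
    return eps * total

def to_gray(E_values):
    """Linearly map per-element exitances E_i to 8-bit gray levels G_i."""
    E_min, E_max = min(E_values), max(E_values)
    if E_max == E_min:
        return [0 for _ in E_values]
    return [round((E - E_min) / (E_max - E_min) * 255) for E in E_values]
```

Applying `band_exitance` to every element's computed temperature and then `to_gray` to the resulting exitances produces the gray-level image of the scene in the chosen waveband.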