High-definition real-time depth-mapping TV camera: HDTV Axi-Vision Camera

We have developed a field-worthy, high-definition, real-time depth-mapping television camera called the HDTV Axi-Vision Camera. The camera simultaneously captures an ordinary HDTV color image and a depth image of the scene, with more than 1280×720 pixels at a frame rate of 29.97 Hz or with 853×480 pixels at 59.94 Hz. We improved the sensitivity and resolution of the depth-mapping camera, which raised the number of detectable pixels per unit time to about five times that of the prototype camera. Short video clips demonstrate how the camera's depth information can be used to create virtual images in actual television program production.
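The two operating modes trade spatial resolution against frame rate. A quick way to compare them is by raw pixel throughput. The sketch below counts active pixels only and ignores blanking intervals, which is an assumption made for illustration. The names `MODES` and `pixel_rate` are hypothetical.

```python
# Pixel throughput (pixels per second) of the two depth-mapping modes
# quoted in the abstract. Blanking intervals are ignored (assumption).

MODES = {
    "1280x720 @ 29.97 Hz": (1280, 720, 29.97),
    "853x480 @ 59.94 Hz": (853, 480, 59.94),
}


def pixel_rate(width: int, height: int, frame_rate_hz: float) -> float:
    """Return active pixels per second for one frame format."""
    return width * height * frame_rate_hz


if __name__ == "__main__":
    for name, (w, h, f) in MODES.items():
        print(f"{name}: {pixel_rate(w, h, f) / 1e6:.1f} Mpixel/s")
```

Both modes come to roughly 25 to 28 million pixels per second: about 27.6 million for 1280×720 at 29.97 Hz and about 24.5 million for 853×480 at 59.94 Hz.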

Fig. 1. Video clip from a Japan Broadcasting Corporation (NHK) program produced with the HDTV Axi-Vision Camera. The high-definition program, “50th Anniversary: Today is the birthday of TV. Grand finale,” was broadcast live by NHK. The depth information was used to composite a moving computer-graphics image with the image of a singer in real time.
Fig. 2. Principle of acquiring depth information with intensity-modulated illumination and an ultrafast camera shutter based on an image intensifier.
Fig. 3. Configuration of the HDTV Axi-Vision Camera.
Fig. 4. Photograph of the HDTV Axi-Vision Camera.
Fig. 5. Quantum efficiency of the photocathode of the image intensifier.
Fig. 6. Transmittance of the dichroic prism and the optical filter.
Fig. 7. LED array units: (a) geometry, (b) spatial distribution of optical power.
Fig. 8. Output image signal as a function of distance between the object and camera.
Fig. 9. Depth resolution as a function of object distance.
Fig. 10. Relationship between object reflectivity and depth resolution.
Fig. 11. Video clip of depth-keying examples: (a) color image, (b) depth image, (c) objects in the farthest range only, (d) objects in the middle range only, (e) objects in the nearest range only.
Fig. 12. Video clip of a virtual studio synthesized by combining live images with prerecorded scenes.
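The captions of Figs. 1, 11, and 12 describe two uses of the depth map. The first is depth keying, which keeps only objects inside a chosen distance range. The second is compositing computer graphics in front of or behind a performer. The sketch below shows both operations under stated assumptions: the color and depth images are pixel-aligned, depth is given per pixel in metres, and the CG layer carries its own per-pixel depth. It is not the production system used for the clips. The names `depth_key` and `z_composite` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B)
Image = List[List[Pixel]]
DepthMap = List[List[float]]   # distance from the camera, metres (assumed)


def depth_key(color: Image, depth: DepthMap, near: float, far: float,
              background: Pixel = (0, 0, 0)) -> Image:
    """Keep pixels whose depth lies in [near, far); replace all others."""
    return [
        [c if near <= z < far else background for c, z in zip(crow, zrow)]
        for crow, zrow in zip(color, depth)
    ]


def z_composite(live: Image, live_depth: DepthMap,
                cg: Image, cg_depth: DepthMap) -> Image:
    """For each pixel, show whichever source is nearer to the camera."""
    return [
        [lc if lz <= gz else gc
         for lc, lz, gc, gz in zip(lrow, lzrow, grow, gzrow)]
        for lrow, lzrow, grow, gzrow in zip(live, live_depth, cg, cg_depth)
    ]


if __name__ == "__main__":
    live = [[(255, 0, 0), (0, 255, 0), (0, 0, 255)]]
    depth = [[1.0, 2.5, 4.0]]
    print(depth_key(live, depth, near=2.0, far=3.0))   # middle range only
    cg = [[(9, 9, 9)] * 3]
    cg_depth = [[2.0, 2.0, 2.0]]                      # CG plane at 2 m
    print(z_composite(live, depth, cg, cg_depth))
```

Selecting the nearest, middle, or farthest range, as in Fig. 11(c) to (e), amounts to calling `depth_key` with different `near` and `far` bounds.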

Table 1. Specifications of the HDTV Axi-Vision Camera

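$$I_{+}(t_s, d) = \frac{\sigma}{(4\pi d^2)^2}\, s \left(t_s - \frac{2d}{\nu}\right),$$
$$I_{-}(t_s, d) = \frac{\sigma}{(4\pi d^2)^2}\, s \left[\frac{T}{2} - \left(t_s - \frac{2d}{\nu}\right)\right],$$
$$d = \frac{1}{2}\nu\left[t_s - \frac{T}{2}\left(\frac{R}{1+R}\right)\right],$$
$$R = \frac{I_{+}}{I_{-}}.$$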
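The key property of the ratio R is that the reflectivity σ, the ramp slope s, and the geometric factor 1/(4πd²)² all cancel, so depth depends only on timing. The sketch below evaluates the forward model for I₊ and I₋, forms R, and inverts it with the expression for d. It assumes ν is the speed of light. The modulation period T = 100 ns and the shutter time t_s = T/4 are hypothetical example values. The function names are illustrative, not from the source.

```python
import math

NU = 2.998e8  # speed of light (m/s), standing in for the symbol nu


def intensities(d, t_s, T, sigma=1.0, s=1.0, nu=NU):
    """Forward model: (I_plus, I_minus) for an object at distance d (m).

    I_plus uses the rising-ramp term (t_s - 2d/nu); I_minus uses the
    falling-ramp term T/2 - (t_s - 2d/nu). Both share the factor
    sigma / (4*pi*d**2)**2 * s.
    """
    geometry = sigma / (4.0 * math.pi * d ** 2) ** 2 * s
    delay_term = t_s - 2.0 * d / nu  # shutter time minus round-trip delay
    if not 0.0 < delay_term < T / 2.0:
        raise ValueError("object lies outside the range covered by the ramp")
    return geometry * delay_term, geometry * (T / 2.0 - delay_term)


def depth_from_ratio(R, t_s, T, nu=NU):
    """Inverse model: d = (nu/2) * [t_s - (T/2) * R / (1 + R)]."""
    return 0.5 * nu * (t_s - 0.5 * T * R / (1.0 + R))


if __name__ == "__main__":
    T = 100e-9      # hypothetical modulation period (s)
    t_s = T / 4.0   # hypothetical shutter timing (s)
    for d_true in (0.5, 1.0, 2.0, 3.0):
        for sigma in (0.05, 0.8):  # dark and bright objects
            i_plus, i_minus = intensities(d_true, t_s, T, sigma=sigma)
            d_est = depth_from_ratio(i_plus / i_minus, t_s, T)
            print(f"d = {d_true} m, sigma = {sigma}: recovered {d_est:.6f} m")
```

Because σ cancels in R, dark and bright objects at the same distance yield the same recovered depth in this idealized, noise-free model.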
$$d = \frac{\lambda}{8}\left(\frac{1-R}{1+R}\right),$$
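This special case follows algebraically from the general expression for d if the shutter time is set to t_s = T/4 and λ is read as νT, the distance light travels in one modulation period. That interpretation of t_s and λ is inferred from the algebra rather than stated in the text above:

$$
\begin{aligned}
d &= \frac{\nu}{2}\left[\frac{T}{4} - \frac{T}{2}\,\frac{R}{1+R}\right]
   = \frac{\nu T}{8}\left[1 - \frac{2R}{1+R}\right] \\
  &= \frac{\nu T}{8}\,\frac{1+R-2R}{1+R}
   = \frac{\lambda}{8}\,\frac{1-R}{1+R}, \qquad \lambda = \nu T .
\end{aligned}
$$

Under this reading, R falls from 1 at d = 0 to 0 at d = λ/8, which bounds the unambiguous depth range at λ/8.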
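$$\overline{n_{\mathrm{pe}}} = \frac{\eta E A_p \tau}{\varepsilon m^2},$$
$$\left(\frac{S}{N}\right)_{\mathrm{pe}} = \frac{\overline{n_{\mathrm{pe}}}}{\sigma_{\mathrm{pe}}} = \sqrt{\overline{n_{\mathrm{pe}}}} = \sqrt{\frac{\eta E A_p \tau}{\varepsilon m^2}}.$$
$$\left(\frac{S}{N}\right)_{\mathrm{phosphor}} = \left(\frac{S}{N}\right)_{\mathrm{pe}} \frac{1}{\sqrt{N_f}} = \sqrt{\frac{\eta E A_p \tau}{\varepsilon m^2 N_f}}.$$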
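The noise chain above has three steps. First, compute the mean photoelectron count per pixel. Second, the shot-noise (Poisson) limit gives S/N as the square root of that count. Third, the intensifier's noise factor N_f lowers S/N by a further factor of √N_f. The sketch below mirrors those three steps. Every numeric value in it is hypothetical and none is taken from the camera specification, including the 850 nm wavelength used to derive the photon energy ε and the parameter values in the example.

```python
import math

# Physical constants used only to build a hypothetical photon energy.
PLANCK_H = 6.626e-34  # J*s
LIGHT_C = 2.998e8     # m/s


def mean_photoelectrons(eta, E, A_p, tau, eps, m):
    """Mean photoelectron count per pixel: eta*E*A_p*tau / (eps*m**2)."""
    return eta * E * A_p * tau / (eps * m ** 2)


def snr_photoelectron(n_pe):
    """Poisson (shot-noise) limit: n_pe / sqrt(n_pe) = sqrt(n_pe)."""
    return math.sqrt(n_pe)


def snr_phosphor(n_pe, N_f):
    """S/N at the phosphor screen, reduced by the intensifier noise factor N_f."""
    return snr_photoelectron(n_pe) / math.sqrt(N_f)


if __name__ == "__main__":
    wavelength = 850e-9                    # hypothetical illumination wavelength (m)
    eps = PLANCK_H * LIGHT_C / wavelength  # photon energy (J)
    n = mean_photoelectrons(eta=0.1, E=1e-3, A_p=(10e-6) ** 2,
                            tau=1 / 60, eps=eps, m=1.0)  # hypothetical values
    print(f"mean photoelectrons per pixel: {n:.0f}")
    print(f"(S/N)_pe = {snr_photoelectron(n):.1f}")
    print(f"(S/N)_phosphor with N_f = 2: {snr_phosphor(n, 2.0):.1f}")
```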
$$E \approx \frac{\rho T_L I}{4 F_N^2 S_I d^2},$$
$$\left(\frac{S}{N}\right)_{\mathrm{phosphor}} \propto \frac{\sqrt{\eta T_L I A_p \tau}}{d}.$$
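Combining the last two relations gives a practical rule: because the irradiance E falls as 1/d² and S/N grows as √E, S/N falls as 1/d. Doubling the object distance therefore halves S/N, and four times the illumination is needed to restore it. The sketch below encodes that scaling. All function names and example parameter values are hypothetical.

```python
import math


def photocathode_irradiance(rho, T_L, I, F_N, S_I, d):
    """E ~ rho*T_L*I / (4*F_N**2*S_I*d**2), following the expression above."""
    return rho * T_L * I / (4.0 * F_N ** 2 * S_I * d ** 2)


def relative_snr(I, d, I_ref, d_ref):
    """S/N relative to a reference point, using (S/N) proportional to sqrt(I)/d."""
    return math.sqrt(I / I_ref) * (d_ref / d)


def illumination_for_equal_snr(d, d_ref, I_ref):
    """Illumination that keeps S/N unchanged when the object moves from d_ref to d."""
    return I_ref * (d / d_ref) ** 2


if __name__ == "__main__":
    # Doubling the object distance at fixed illumination halves S/N (0.5)...
    print(relative_snr(I=1.0, d=2.0, I_ref=1.0, d_ref=1.0))
    # ...and four times the illumination restores it (4.0).
    print(illumination_for_equal_snr(d=2.0, d_ref=1.0, I_ref=1.0))
```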
