Khalid Omer and Meredith Kupinski, "Mid-fusion of road scene polarization images on pretrained RGB neural networks," J. Opt. Soc. Am. A 38, 515-525 (2021)
This work presents a mid-fusion pipeline that can increase the detection performance of a convolutional neural network (RetinaNet) by including polarimetric images even though the network is trained on a large-scale database containing RGB and monochromatic images (Microsoft COCO). Here, the average precision (AP) for each object class quantifies performance. The goal of this work is to evaluate the usefulness of polarimetry for object detection and recognition of road scenes and determine the conditions that will increase AP. Shadows, reflections, albedo, and other object features that reduce RGB image contrast also decrease the AP. This work demonstrates specific cases for which the AP increases using linear Stokes and polarimetric flux images. Images are fused during the neural network evaluation pipeline, which is referred to as mid-fusion. Here, the AP of polarimetric mid-fusion is greater than the RGB AP in 54 out of 80 detection instances. The recall values for cars and buses are similar for RGB and polarimetry, but values increase from 36% to 38% when using polarimetry for detecting people. Videos of linear Stokes images for four different scenes are collected at three different times of the day for two driving directions. Despite this limited dataset and the use of a pretrained network, this work demonstrates selective enhancement of object detection through mid-fusion of polarimetry to neural networks trained on RGB images.
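As a rough illustration of the metric named above, the following sketch computes AP for a single object class as the area under an interpolated precision-recall curve. The abstract does not say which AP variant or which detection-to-ground-truth matching rule was used, so the all-point interpolation, the function name `average_precision`, and its inputs are assumptions made for illustration only.

```python
def average_precision(scores, is_true_positive, num_ground_truth):
    """Area under the interpolated precision-recall curve for one class.

    scores            -- detection confidences (hypothetical input)
    is_true_positive  -- True where the detection matched a ground-truth box
    num_ground_truth  -- number of ground-truth objects of this class
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")

    # Rank detections from most to least confident.
    ranked = sorted(zip(scores, is_true_positive),
                    key=lambda d: d[0], reverse=True)

    precisions, recalls = [], []
    tp = fp = 0
    for _, hit in ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)

    # Assumed interpolation: make precision non-increasing in recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum precision over each increment in recall.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


# Three detections, two ground-truth objects, one false positive in between.
ap = average_precision([0.9, 0.8, 0.7], [True, False, True], 2)
```

In this toy case the first detection contributes precision 1 over half of the recall range, and the remaining half is covered at precision 2/3, giving an AP of 5/6.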
Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.
Table 1.
Details on Road Scenes Imaged with a LUCID Vision Lab Triton 5.0 MP Polarized Camera (with Sony IMX250MZR Sensor), Mounted on the Roof of an Automobile
Scene             Date       Driving Direction  GPS Coordinates  Local Time (MST)
Arterial road     3-16-2020  East                                7:22, 12:15, 17:53
Arterial road     3-16-2020  West                                7:19, 12:09, 17:50
Residential road  3-16-2020  East                                7:26, 11:59, 17:45
Residential road  3-16-2020  West                                7:25, 11:58, 17:58
Downtown street   3-16-2020  East                                7:45, 12:36, 18:15
Downtown street   3-16-2020  West                                7:42, 12:32, 18:10
Parking lot       3-15-2020  N/A                                 7:47, 12:12, 17:39
In total, this work analyzes approximately an hour of video footage.
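The linear Stokes images mentioned in the abstract are derived from intensities measured behind polarizers at different orientations. The sketch below shows one common way to form S0, S1, and S2 from a raw mosaic frame of a division-of-focal-plane sensor such as the one in the camera described above. The 2x2 superpixel layout (90 and 45 degrees on the top row, 135 and 0 degrees on the bottom row), the function name `linear_stokes_from_mosaic`, and the choice to estimate S0 from all four orientations are assumptions, not details taken from the paper; no interpolation or calibration is applied.

```python
def linear_stokes_from_mosaic(raw, layout=((90, 45), (135, 0))):
    """Compute S0, S1, S2 at superpixel resolution from a raw mosaic frame.

    raw    -- 2D list of pixel intensities with even height and width
    layout -- assumed polarizer angles (degrees) within each 2x2 superpixel
    """
    rows, cols = len(raw), len(raw[0])
    if rows % 2 or cols % 2:
        raise ValueError("frame height and width must be even")

    # Map each polarizer angle to its (row, column) offset in the superpixel.
    offset = {layout[r][c]: (r, c) for r in range(2) for c in range(2)}

    s0, s1, s2 = [], [], []
    for r in range(0, rows, 2):
        row0, row1, row2 = [], [], []
        for c in range(0, cols, 2):
            i = {angle: raw[r + dr][c + dc]
                 for angle, (dr, dc) in offset.items()}
            # S0: total flux, estimated from both orthogonal pairs.
            row0.append((i[0] + i[45] + i[90] + i[135]) / 2)
            # S1: horizontal minus vertical linear polarization.
            row1.append(i[0] - i[90])
            # S2: +45 degrees minus 135 degrees linear polarization.
            row2.append(i[45] - i[135])
        s0.append(row0)
        s1.append(row1)
        s2.append(row2)
    return s0, s1, s2


# One superpixel: I90 = 10, I45 = 30, I135 = 20, I0 = 40.
s0, s1, s2 = linear_stokes_from_mosaic([[10, 30], [20, 40]])
```

For this single superpixel the result is S0 = 50, S1 = 30, and S2 = 10. Passing a different `layout` adapts the sketch to a sensor whose polarizer arrangement differs from the one assumed here.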
Table 2.
Parking Lot Scene AP Values for Three Times of Day and Four Backbone Image Types
Backbone Set
Time [MST]
Car
Person
Truck
Fire Hydrant
Traffic Light
07:47
86%
97%
70%
18%
55%
12:12
84%
42%
15%
73%
17:39
85%
43%
3%
07:47
83%
46%
47%
26%
48%
12:12
77%
21%
15%
48%
17:39
78%
41%
0%
07:47
86%
100%
74%
31%
58%
12:12
85%
48%
15%
75%
17:39
85%
37%
10%
07:47
86%
100%
73%
29%
58%
12:12
85%
47%
15%
73%
17:39
85%
39%
11%
07:47
86%
97%
75%
28%
56%
12:12
85%
47%
16%
71%
17:39
85%
42%
4%
Boldfaced values denote AP performance that exceeds AP(${S_0}$). At 7:47 MST, the polarimetry value exceeds AP(${S_0}$) in three out of four object categories. At 12:12 MST, the polarimetry value exceeds AP(${S_0}$) for all object categories, except for backbone set $\{{S_0},{S_1},{S_2}\}$. At 17:39 MST, the polarimetry value exceeds AP(${S_0}$) in one out of three object categories.
Table 3.
Eastbound Downtown Scene AP Values for Three Times of Day and Four Backbone Image Types
Backbone Set
Time [MST]
Bus
Car
Fire Hydrant
Person
Stop Sign
Traffic Light
Truck
7:45
3%
70%
57%
57%
23%
32%
12:36
59%
93%
14%
27%
6%
18:15
13%
51%
57%
22%
40%
8%
35%
7:45
51%
47%
41%
44%
12:36
64%
81%
12%
33%
18:15
12%
33%
46%
20%
23%
7:45
3%
70%
57%
59%
25%
31%
12:36
60%
93%
13%
27%
6%
18:15
13%
50%
56%
23%
28%
8%
37%
7:45
3%
70%
58%
58%
24%
31%
12:36
60%
93%
13%
27%
6%
18:15
13%
50%
57%
24%
28%
8%
36%
7:45
3%
70%
60%
58%
26%
31%
12:36
59%
93%
13%
28%
6%
18:15
13%
50%
57%
22%
26%
9%
37%
Boldfaced values denote AP performance that exceeds AP(${S_0}$). At 7:45 MST, at least one polarimetric backbone set exceeds the RGB AP for four out of six possible object categories. At 12:36 MST, at least one backbone set exceeds AP(${S_0}$) in two out of five possible object categories. At 18:15 MST, at least one polarimetric backbone set exceeds AP(${S_0}$) in three out of seven possible object categories.
Table 4.
Westbound Downtown Scene AP Values for Three Times of Day and Four Backbone Image Types
Backbone Set
Time [MST]
Car
Fire Hydrant
Parking Meter
Person
Traffic Light
Truck
07:42
95%
85%
18%
73%
4%
12:32
86%
80%
37%
98%
18:10
90%
65%
22%
63%
50%
07:42
82%
75%
0%
63%
14%
12:32
88%
73%
39%
73%
18:10
82%
67%
15%
27%
36%
07:42
95%
85%
12%
73%
4%
12:32
86%
77%
38%
98%
18:10
91%
65%
24%
66%
52%
07:42
95%
85%
13%
74%
4%
12:32
86%
80%
37%
99%
18:10
91%
69%
23%
66%
52%
07:42
95%
89%
18%
74%
5%
12:32
86%
80%
38%
99%
18:10
91%
69%
23%
65%
51%
Boldfaced values denote AP performance that exceeds AP(${S_0}$). At 7:42 MST, the polarimetry value exceeds AP(${S_0}$) when detecting fire hydrants, traffic lights, and trucks. At 12:32 MST, the polarimetry value exceeds AP(${S_0}$) when detecting cars, traffic lights, and trucks. Lastly, at 18:10 MST, the polarimetry value matches or exceeds AP(${S_0}$) when detecting cars, fire hydrants, people, traffic lights, and trucks.
Table 5.
Eastbound Arterial Road AP Values for Three Times of Day and Four Backbone Image Types
Backbone Set
Time [MST]
Bike
Car
Fire Hydrant
Motorcycle
Person
Truck
07:22
58%
18%
12:15
69%
21%
34%
88%
17:53
19%
62%
100%
07:22
40%
7%
12:15
58%
11%
34%
87%
17:53
20%
63%
100%
07:22
57%
13%
12:15
68%
19%
35%
88%
17:53
17%
58%
100%
07:22
56%
13%
12:15
69%
20%
31%
89%
17:53
18%
62%
100%
07:22
57%
13%
12:15
67%
20%
27%
90%
17:53
18%
60%
100%
Boldfaced values denote AP performance that exceeds AP(${S_0}$). At 7:22 MST, the polarimetry value does not exceed AP(${S_0}$) for any detection category. Additionally, at 12:15 MST, at least one polarimetric backbone set exceeds AP(${S_0}$) for one out of four possible detected categories. Lastly, at 17:53 MST, the polarimetric backbone set AP exceeds AP(${S_0}$) for two out of three possible detected categories.
Table 6.
Westbound Arterial Road AP Values for Three Times of Day and Four Backbone Image Types
Backbone Set
Time [MST]
Bus
Car
Stop Sign
Person
07:19
58%
70%
12:09
67%
67%
25%
80%
17:50
35%
52%
33%
22%
07:19
53%
58%
12:09
51%
73%
25%
73%
17:50
28%
52%
33%
15%
07:19
57%
69%
12:09
69%
66%
25%
77%
17:50
33%
50%
33%
24%
07:19
59%
69%
12:09
69%
68%
25%
80%
17:50
32%
52%
33%
23%
07:19
56%
69%
12:09
66%
68%
25%
80%
17:50
34%
53%
33%
23%
Boldfaced values denote AP performance that exceeds AP(${S_0}$). At least one polarimetric backbone set AP at 7:19 MST exceeds AP(${S_0}$) when detecting buses. At 12:09 MST, the polarimetry value exceeds AP(${S_0}$) in half of the object categories. The polarimetry AP at 17:50 MST exceeds AP(${S_0}$) when detecting cars and people.
Table 7.
Eastbound Residential Road AP Values for Three Times of Day and Four Backbone Image Types
Backbone Set
Time [MST]
Car
Chair
Fire Hydrant
Person
Truck
07:26
57%
68%
36%
11:59
62%
17%
17:45
55%
71%
82%
49%
07:26
43%
35%
49%
11:59
54%
11%
17:45
48%
15%
73%
57%
07:26
59%
68%
38%
11:59
60%
19%
17:45
57%
77%
82%
48%
07:26
59%
69%
38%
11:59
61%
19%
17:45
57%
100%
82%
53%
07:26
60%
72%
38%
11:59
62%
17%
17:45
58%
69%
92%
52%
Boldfaced values denote AP performance that exceeds AP(${S_0}$). At least one polarimetric backbone set AP at 7:26 MST exceeds AP(${S_0}$) when detecting cars, fire hydrants, and trucks. At 11:59 MST, the polarimetry value exceeds AP(${S_0}$) when detecting people. The polarimetry AP at 17:45 MST exceeds AP(${S_0}$) for detecting cars, chairs, fire hydrants, and trucks.
Table 8.
Westbound Residential Road AP Values for Three Times of Day and Four Backbone Image Types
Backbone Set
Time [MST]
Bus
Car
Truck
07:25
63%
63%
11:58
96%
77%
14%
17:58
98%
27%
63%
07:25
49%
50%
11:58
60%
65%
26%
17:58
94%
26%
77%
07:25
63%
69%
11:58
96%
77%
13%
17:58
98%
28%
70%
07:25
63%
68%
11:58
96%
78%
14%
17:58
98%
28%
65%
07:25
64%
68%
11:58
97%
77%
13%
17:58
98%
28%
70%
Boldfaced values denote AP performance that exceeds AP(${S_0}$). At least one polarimetric backbone set AP at 7:25 MST exceeds AP(${S_0}$) when detecting cars and trucks. At 11:58 MST, the polarimetry value exceeds AP(${S_0}$) for bus and truck detection. The polarimetry value exceeds AP(${S_0}$) at 17:58 MST for car and truck detection.
Table 9.
and Backbone Set Recall Values Averaged Across All Locations and Times of Day
Backbone Set    Car    Bus    Person
                92%    30%    36%
                88%    28%    28%
                90%    30%    38%
                92%    30%    38%
                91%    30%    38%
The polarimetric backbone set recall values for buses and cars do not exceed ${S_0}$ but increased from 36% to 38% for detecting people.
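The comparison stated in this note can be checked directly from the recall values listed for Table 9. The sketch below assumes that the first row of the table is the ${S_0}$ reference, which matches the 36% figure for people, and that the remaining four rows are the other backbone sets. The row labels were not recoverable from the table, so the variable and function names here are hypothetical.

```python
# Recall values (percent) in the order they appear in Table 9.
# Assumption: index 0 is the S0 reference row; the rest are other backbone sets.
recall = {
    "Car":    [92, 88, 90, 92, 91],
    "Bus":    [30, 28, 30, 30, 30],
    "Person": [36, 28, 38, 38, 38],
}


def best_gain_over_reference(values):
    """Largest recall among the non-reference rows minus the reference recall."""
    reference, others = values[0], values[1:]
    return max(others) - reference


gains = {cls: best_gain_over_reference(v) for cls, v in recall.items()}
# gains == {"Car": 0, "Bus": 0, "Person": 2}
```

Under these assumptions the best non-reference recall ties the reference for cars and buses and is two percentage points higher for people, in agreement with the note.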