Abstract
The codebook background subtraction approach is widely used in computer vision applications. One of its distinguishing features is the cylinder color model used to cope with illumination changes. The performance of this approach depends strongly on the color model. However, we have found that this color model is valid only if the spectrum components of the light source change in the same proportion, which is not true in many practical cases. In those cases, the performance of the approach degrades significantly. To tackle this problem, we propose an arbitrary cylinder color model with a highly efficient updating strategy. This model uses cylinders whose axes need not pass through the origin, so that the cylinder color model is extended to much more general cases. Experimental results show that, with no loss of real-time performance, the proposed model reduces the wrong classification rate of the cylinder color model by more than fifty percent.
© 2014 Optical Society of America
1. Introduction
Background subtraction is a fundamental task in many computer vision applications, such as video surveillance, activity recognition, human motion analysis, and object tracking. The purpose of a background subtraction algorithm is to distinguish moving objects from the scene. Its basic principle can be formulated as building a model of the background and comparing this model with the current frame in order to identify non-stationary or new objects.
Natural scenes pose many challenges to background modeling, including light changes, waving trees, shadows, rippling water, snow, and rain. A large number of research papers have been published over the past decade dealing with these challenges (e.g., [1–21]). Among these, the codebook model [1] and its enhancements (e.g., [2, 14–21]) perform better than many state-of-the-art methods in terms of both segmentation accuracy and processing speed.
The codebook is a compact and compressed model capable of modeling the background over long periods. One of its distinguishing features is the cylindrical color (CY) model used to cope with illumination changes. This model was developed from the observation that, under lighting variation, pixel values are mostly distributed in an elongated shape along an axis passing through the origin of the RGB color space. The CY model, or its variants (e.g., [18, 19, 21]), has been widely used.
However, we have found that the CY model mentioned above is valid only if the spectrum components of the light source change in the same proportion. This is not true in many practical cases, where the spectrum components of the light source do not vary in proportion. In these cases, the CY model is inaccurate and much less efficient.
To tackle the problem mentioned above, we propose an arbitrary cylinder based color (AC) model with a highly efficient updating strategy. This model uses cylinders whose axes need not pass through the origin, so that the CY model is extended to more general cases. With no loss of real-time performance, the AC model reduces the wrong classification rate of the CY model by more than 50%. Furthermore, it should be pointed out that the CY model is only a special case of the AC model.
Another, minor, contribution of this paper lies in a modified forgetting and strengthening strategy. The original codebook adopts a conservative update policy that does not allow the creation of new codewords outside the learning stage. To deal with new backgrounds, many researchers have introduced the concept of a foreground model (e.g., [2, 18, 20]). In their works, the foreground model and the background model are treated separately. Since the background and foreground models are similar in representation, we merge the two models and remove some redundant operations to accelerate calculations.
The original codebook background subtraction algorithm is reviewed in Section 2. The proposed AC model is introduced in Section 3. Its construction details are given in Section 4. The AC model based codebook background subtraction algorithm is summarized in Section 5. In Section 6, experiments are conducted to compare the performance of the proposed AC model with that of the CY model. Comparisons with two other typical generalizations of the CY model [18, 21] are also given. Finally, the paper is concluded in Section 7.
2. Original codebook
Original codebook background subtraction is a pixel-based approach. Each pixel is modeled by a codebook containing one or more codewords. Each codeword c_i is represented by an RGB vector v_i and a 6-tuple ⟨Ǐ_i, Î_i, f_i, λ_i, p_i, q_i⟩, where Ǐ_i and Î_i are the minimum and the maximum brightness, respectively; f_i is the frequency with which the codeword has occurred; λ_i is the maximum negative run-length, i.e., the longest time interval during which the codeword is not accessed; and p_i and q_i are the first and the last access times of the codeword, respectively. A codebook for each pixel can be constructed as follows:
- 1) Find a codeword in the codebook matching the current pixel value based on two conditions: a) the color distortion between the pixel value and the codeword's RGB vector is below the color threshold, and b) the brightness of the pixel falls within the range derived from the codeword's stored minimum and maximum brightness.
- 2) If no match can be found, label the pixel as foreground and create a new codeword initialized from the current pixel value and brightness, with frequency 1 and both access times set to the current frame.
- 3) Otherwise, label the pixel as background and update the matched codeword: average the pixel value into its RGB vector, extend its brightness range, increment its frequency, refresh its maximum negative run-length, and set its last access time to the current frame.
For each codeword in the codebook, if its maximum negative run-length exceeds a given threshold (typically set to half the number of training frames), delete the codeword from the codebook.
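For concreteness, the two matching conditions of step 1 can be sketched in Python as follows. The brightness bounds follow the commonly cited formulation of [1, 2]; the alpha/beta values and the threshold `eps` are assumed typical settings, not the ones used in this paper.

```python
import numpy as np

def colordist(x, v):
    """Perpendicular color distortion of pixel x from the codeword
    vector v (CY model: the cylinder axis passes through the origin)."""
    x2 = float(np.dot(x, x))
    proj2 = float(np.dot(x, v)) ** 2 / float(np.dot(v, v))
    return np.sqrt(max(x2 - proj2, 0.0))

def brightness_ok(I, I_min, I_max, alpha=0.5, beta=1.2):
    """Brightness condition: I must fall inside bounds derived from the
    stored min/max brightness (alpha and beta are assumed values)."""
    return alpha * I_max <= I <= min(beta * I_max, I_min / alpha)

def matches(x, v, I_min, I_max, eps=10.0):
    """Conditions a) and b) of step 1 combined."""
    I = float(np.linalg.norm(x))
    return colordist(x, v) <= eps and brightness_ok(I, I_min, I_max)
```

A pixel for which no codeword passes both tests is labeled foreground (step 2).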
3. Arbitrary cylinder color model
It was first reported in [1] that, under lighting variation, pixel values are mostly distributed in an elongated shape along an axis passing through the origin of the RGB color space. In that experiment, pixels were sampled from an image sequence of a color chart. The illumination was changed by decreasing or increasing the light strength to make the pixel values darker or brighter. Based on this observation, the authors developed the CY model. This color model, or its variants (e.g., [18, 19, 21]), has been widely used.
However, this model implicitly assumes that the spectrum components of the light source change in the same proportion, which is not true in many practical cases. For example, suppose there are three light sources: a pure red one, a pure green one, and a pure blue one. If we only decrease or increase the brightness of the red light, the pixel values can never move along an axis passing through the origin.
Such complex illumination conditions exist in many environments as illustrated in Fig. 1. The test sequence in the first row was drawn from an IP camera at a junction of three roads. The test sequence in the second row was captured by a camera near a harbor. And the test sequence in the third row was downloaded from the PETS2007 website. Pictures in the first column are screenshots. Pixels for observation are marked by small crosses centered in red boxes.
In these scenarios, both direct light (e.g., from the sun or light bulbs) and diffuse light contribute to the background illumination, but the variations of the individual light sources are not in proportion. The sampled pixel value distributions are shown in the second column of Fig. 1. It can be seen that the distributions do not lie along axes through the origin.
If we use the CY model to represent the background pixel value distribution, as illustrated in the third column of Fig. 1, many small cylinders must be used. This makes the computation much slower. In addition, since those cylinders cover much unwanted space, inaccuracy would occur. If we use cylinders whose axes do not pass through the origin, as shown in the last column of Fig. 1, a single cylinder is enough to model the background, and the result of background subtraction is much more accurate.
Based on the discussion above, we developed the AC color model to extend the CY model to more general cases. As shown in Fig. 2, this model is represented by its two controlling points, which we denote A and B.
For an observed pixel value x_t, the color distortion is described by two parameters (α, δ). Parameter α is the distance from A to the projection of x_t onto the axis AB, which can be calculated by

α = (x_t − A) · (B − A) / ‖B − A‖,

in which · is the dot product and ‖·‖ is the norm operator. Parameter δ is the perpendicular distance from x_t to the axis AB, which can be calculated by

δ = √(‖x_t − A‖² − α²).

The incoming pixel x_t is said to be within the AC model (i.e., it is classified as background) if it meets the following conditions:

δ ≤ ε_c and −ε_i ≤ α ≤ ‖B − A‖ + ε_i,

where ε_c (the color threshold) and ε_i (the intensity threshold) are positive numbers.

4. AC model construction
In this section, we deal with how the proposed AC model is initialized, updated, and forgotten (or strengthened).
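Before detailing these steps, note that the membership test of Section 3 translates directly into code. In the minimal sketch below, A and B denote the controlling points and eps_c, eps_i the color and intensity thresholds; the guard for a degenerate (zero-length) segment is our own addition, not part of the paper.

```python
import numpy as np

def ac_classify(x, A, B, eps_c, eps_i):
    """Return True if pixel x lies inside the arbitrary cylinder
    defined by controlling points A and B (i.e., x is background)."""
    x, A, B = (np.asarray(p, dtype=float) for p in (x, A, B))
    axis = B - A
    L = np.linalg.norm(axis)
    if L == 0.0:
        # degenerate segment: fall back to a plain distance test
        return np.linalg.norm(x - A) <= max(eps_c, eps_i)
    # alpha: signed distance from A to the projection of x onto line AB
    alpha = float(np.dot(x - A, axis)) / L
    # delta: perpendicular distance from x to line AB
    delta = np.sqrt(max(float(np.dot(x - A, x - A)) - alpha * alpha, 0.0))
    return delta <= eps_c and -eps_i <= alpha <= L + eps_i
```

Note that placing A at the origin recovers an origin-anchored cylinder, consistent with the claim that the CY model is a special case of the AC model.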
4.1. AC model initialization
Suppose x_t is the observed pixel value. If a new AC model has to be established, it is initialized from x_t.
4.2. Updating the AC model over time
Suppose the background pixel values lie in the neighborhood of the line segment AB. We can obtain the equation of AB by least-squares estimation from the sampled pixel values. However, this approach requires storing a large number of pixel values, and the calculation is time-consuming. To simplify the calculations, we make the approximation that the sampled pixel values are evenly distributed along the axis of the AC model.
Let the new pixel value be x_t. The updated line segment must lie in the plane defined by A, B, and x_t. Hence, the updating problem can be reduced from 3D space to 2D space. The 2D coordinate system is defined as follows and shown in Fig. 3.
- 1) Let the point A be the origin.
- 2) Let the direction of the x-axis be along the vector B − A.
- 3) Let the direction of the y-axis be along the vector (x_t − A) − αu, where u = (B − A)/‖B − A‖ and α is the distance from A to the projection of x_t onto the axis AB.
The points A, B, and x_t can be represented in the 2D system as (0, 0), (d, 0), and (α, δ), respectively, where d = ‖B − A‖ is the distance from A to B. In the 2D coordinate system, the new line segment can be estimated from the original endpoints and the new point together. Let the new line equation be

y = kx + b.

The parameters k and b can be obtained as the least-squares solution over the points (0, 0), (d, 0), and (α, δ). In order to determine the endpoints of the new line segment, we calculate the projections of the points A, B, and x_t onto the newly estimated line, and choose the two most distant points among the projections as the new endpoints A′ and B′. The projection (x_p, y_p) of a point (x_0, y_0) onto the line y = kx + b is

x_p = (x_0 + k(y_0 − b))/(1 + k²), y_p = k x_p + b.

Finally, we must transform A′ and B′ from the 2D coordinate system back to the 3D coordinate system. The unit vector in 3D space representing the direction of the x-axis of the 2D space is

u = (B − A)/‖B − A‖,

and the unit vector representing the direction of the y-axis is

w = ((x_t − A) − αu)/‖(x_t − A) − αu‖.

Hence, a 2D point (x, y) is mapped back to the 3D point A + xu + yw, which yields the endpoints of the new line segment in the 3D coordinate system. This updating process can be accelerated by adopting a look-up table method: quantities that depend only on the discretized inputs are pre-computed and stored in a table, and looking them up substantially reduces the computation time.
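The update step above can be sketched as follows. An unweighted three-point least-squares fit (via `np.polyfit`) stands in for the paper's closed-form solution, whose exact weighting by the codeword's sample count is not reproduced here; the guard for a new point lying exactly on the axis is our own addition.

```python
import numpy as np

def update_ac(A, B, x):
    """One update of the AC axis: fit a line to A, B, and the new pixel x
    in the 2D plane they span, then map the extreme projections back to 3D."""
    A, B, x = (np.asarray(p, dtype=float) for p in (A, B, x))
    d = np.linalg.norm(B - A)
    u = (B - A) / d                        # 2D x-axis direction in 3D
    alpha = float(np.dot(x - A, u))        # coordinate of x along the axis
    perp = (x - A) - alpha * u
    delta = np.linalg.norm(perp)
    if delta == 0.0:                       # x already on the axis: just extend
        lo, hi = min(0.0, alpha), max(d, alpha)
        return A + lo * u, A + hi * u
    w = perp / delta                       # 2D y-axis direction in 3D
    pts = np.array([[0.0, 0.0], [d, 0.0], [alpha, delta]])
    k, b = np.polyfit(pts[:, 0], pts[:, 1], 1)   # line y = k*x + b
    # project the three 2D points onto the fitted line
    xp = (pts[:, 0] + k * (pts[:, 1] - b)) / (1.0 + k * k)
    yp = k * xp + b
    # the two extreme projections become the new endpoints
    i, j = int(np.argmin(xp)), int(np.argmax(xp))
    return A + xp[i] * u + yp[i] * w, A + xp[j] * u + yp[j] * w
```

For example, updating the segment from (0, 0, 0) to (10, 0, 0) with the pixel (5, 3, 0) lifts both endpoints toward the new sample while keeping the axis length.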
It should be noted that if only one light source exists, the axis of the AC model points to the origin. This is the same as the CY model. Therefore, the CY model could be considered as a special case of the AC model.
4.3. Forgetting and strengthening process
If a codeword is rarely used, it should be discarded (or forgotten). Conversely, if a codeword is used constantly, it should be strengthened.
The original codebook adopts a conservative update policy that does not allow the creation of new codewords outside the learning stage. In order to deal with new backgrounds, researchers have introduced the concept of a foreground model (e.g., [2, 18, 20]). In their works, the foreground model and the background model are treated separately. However, the representations of the two models are identical, so it is reasonable to merge them and remove some redundant operations in order to accelerate calculations.
In the proposed algorithm, we use only two parameters to control the forgetting and strengthening processes. The parameter f represents the number of times a model has been updated since its establishment, and λ represents the number of times a model has not been matched since its latest update. The corresponding forgetting and strengthening criteria are given below.
- 1) If a codeword is added to the codebook for the first time, let f = 1 and λ = 0.
- 2) If a codeword is matched by an observed pixel value, let f ← f + 1 and λ ← 0.
- 3) For each codeword, let λ ← λ + 1. A codeword whose λ exceeds the non-occurrence threshold is forgotten (deleted), while a codeword whose f exceeds the occurrence threshold is kept as a stable background codeword.
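The three rules reduce to simple bookkeeping over the two counters. In this sketch, `T_forget` is a hypothetical name standing in for the non-occurrence threshold; note that the matched codeword's λ ends each frame at 1 because rule 3 ages every codeword.

```python
def step_counters(codebook, matched_index, T_forget=200):
    """Apply rules 1-3 to a codebook, a list of {'f': ..., 'lam': ...} dicts.
    matched_index is the index of the matched codeword, or None."""
    if matched_index is None:
        codebook.append({"f": 1, "lam": 0})       # rule 1: new codeword
    else:
        cw = codebook[matched_index]
        cw["f"] += 1                               # rule 2: strengthen
        cw["lam"] = 0
    for cw in codebook:
        cw["lam"] += 1                             # rule 3: age every codeword
    # forgetting: drop codewords that have gone unmatched for too long
    codebook[:] = [cw for cw in codebook if cw["lam"] <= T_forget]
    return codebook
```

Keeping both counters in one structure is what lets the merged foreground/background model avoid the duplicated bookkeeping of separate models.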
5. AC model based codebook
A summary of the proposed AC model based codebook background subtraction algorithm is given below. The codeword used in this paper has the form ⟨A, B, f, λ⟩, where A and B are the two controlling points of the AC model (Section 3) and f and λ are the update counters of Subsection 4.3.
- 1) For an observed pixel value, find a matching codeword in the codebook, i.e., one whose AC model satisfies the conditions of Section 3.
- 2) If no matching codeword is found, label the pixel as foreground and create a new codeword, initialized as in Subsection 4.1.
- 3) Otherwise, label the pixel as background and update the matched codeword as in Subsection 4.2.
- 4) For each codeword, apply the forgetting and strengthening rules of Subsection 4.3.
6. Experiments
In this section, we compare the performance of the proposed AC model with that of the CY model and two other generalizations of the CY model [18, 21]: the cone (CO) color model and the box (BO) color model. All color models are implemented in the same codebook framework, and the thresholds used for color, intensity, and recurrence are identical for all of them.
All models are tested on a variety of challenging indoor and outdoor environments, including a subway hall, a bridge, a harbor, a road junction, and a city center. The first sequence (subway hall) was extracted from the PETS2007 public data set (data set 8, camera 2). The second and third sequences (bridge and harbor) were captured by HD cameras. The fourth sequence was captured with an IP camera. The last sequence (city center) was extracted from the PETS2009 public data set (city center, view 2). All test sequences are resampled to a resolution of 500×300 pixels.
The difficulties for foreground detection in these scenes are caused by dynamic backgrounds, local and global illumination changes, training without a clear background, camouflaged foreground objects, and video recording noise.
Several metrics are used to assess performance: the true positive number (TP), the false positive number (FP), the true negative number (TN), the false negative number (FN), and the time cost (TC) in milliseconds. All ground truth segmentations were produced manually.
6.1. Parameter settings
The matching and updating schemes for the CY and AC models can be found in Section 2 and Subsection 4.2, respectively. The matching and updating schemes for the CO and BO models can be found in [18, 21]. All color models adopt the same forgetting and strengthening strategy given in Subsection 4.3. The parameters used in the tests are given in Table 1 and Table 2.
These parameters are the color threshold, the intensity threshold, the occurrence threshold, the non-occurrence threshold, and the updating rate.
The parameter α is the learning rate, θ_color is the threshold for the color direction error, β and γ together determine the upper and lower bounds of intensity variations, and ε is the detection threshold.
6.2. Qualitative comparisons
Figure 4 shows examples of background subtraction for one typical frame of each sequence. Pictures in the first column are screenshots. Pictures in the second column are the ground truths. The segmentation results using the CY model and the AC model are shown in the third and fourth columns, respectively. The segmentation results using the CO model and the BO model are shown in the fifth and sixth columns, respectively. It can be seen that the AC model performs much better than the CY model and the other models, especially in outdoor environments with complex illumination conditions.
6.3. Quantitative evaluations
Calculations are performed on a 3.2 GHz Intel i5 processor. The algorithms are implemented in the LabVIEW environment. We selected 5 frames from each sequence, giving 25 frames for evaluation.
Table 3 records the TP, the FN, the FP, the TN, and the TC for each of the 25 frames. The first column indicates the frame number of each chosen frame. The scores in columns 2-6 are for the AC model and the scores in columns 7-11 are for the CY model. The last row of this table shows the average value for each metric.
Table 4 records the TP, the FN, the FP, the TN, and the TC for the CO and the BO color models. The scores in columns 2-6 are for the CO model and the scores in columns 7-11 are for the BO model. The last row of this table shows the average value for each metric.
The sum of the FN and the FP indicates the number of wrong classifications. It is evident that the AC model reduces the wrong classification rate of the CY model by more than 50%, while the time costs of the two models are nearly identical.
The detection rate of the CO color model (or the BO model) is only slightly higher than that of the AC model. However, the wrong classification rate of the CO model is more than three times that of the AC model, and that of the BO model is more than five times that of the AC model.
7. Conclusions
We have found that the CY model used in codebook background subtraction is inaccurate in many practical cases; it is valid only if the spectrum components of the light source change in the same proportion. Both theoretical analysis and experimental results support this point. To extend the CY model to more general cases, we proposed the AC model.
First, we represent the AC model by the two endpoints of its axis. Then, a highly efficient updating scheme is developed by formulating the update as a linear regression problem. To accelerate the updating process, a look-up table method is proposed. We also simplify the forgetting and strengthening process to reduce time costs.
With no loss of real-time performance, the proposed model reduces the wrong classification rate of the CY model by more than 50%. It should be pointed out that the CY model is only a special case of the AC model. Hence, we expect that the proposed AC model could be a better substitute for the CY model in all codebook background subtraction applications.
Acknowledgments
We are grateful to the reviewers for their constructive suggestions. We are also very grateful for helpful discussions with P. P. Shi (Xidian University) on the AC model updating strategy used in this paper.
References and links
1. K. Kim, T. H. Chalidabhongse, D. Harwood, and L. Davis, “Background modeling and subtraction by codebook construction,” in Proceedings of IEEE Conference on Image Processing (IEEE, 2004), pp. 3061–3064.
2. K. Kim, T. H. Chalidabhongse, D. Harwood, and L. Davis, “Real-time foreground-background segmentation using codebook model,” Real-time Imaging 11(3), 172–185 (2005). [CrossRef]
3. A. Elgammal, R. Duraiswami, D. Harwood, and L. S. Davis, “Background and foreground modeling using nonparametric kernel density estimation for visual surveillance,” Proc. IEEE 90(7), 1151–1163 (2002). [CrossRef]
4. L. Y. Li, W. M. Huang, I. Y. H. Gu, and Q. Tian, “Statistical modeling of complex backgrounds for foreground object detection,” IEEE Trans. Image Process. 13(11), 1459–1472 (2004). [CrossRef] [PubMed]
5. D. S. Lee, “Effective Gaussian mixture learning for video background subtraction,” IEEE Trans. Pattern Anal. Mach. Intell. 27(5), 827–832 (2005). [CrossRef] [PubMed]
6. Z. Zivkovic and F. van der Heijden, “Efficient adaptive density estimation per image pixel for the task of background subtraction,” Pattern Recognit. Lett. 27(7), 773–780 (2006). [CrossRef]
7. M. Heikkilä and M. Pietikäinen, “A texture-based method for modeling the background and detecting moving objects,” IEEE Trans. Pattern Anal. Mach. Intell. 28(4), 657–662 (2006). [CrossRef] [PubMed]
8. L. Maddalena and A. Petrosino, “A self-organizing approach to background subtraction for visual surveillance applications,” IEEE Trans. Image Process. 17(7), 1168–1177 (2008). [CrossRef] [PubMed]
9. D. M. Tsai and S. C. Lai, “Independent component analysis-based background subtraction for indoor surveillance,” IEEE Trans. Image Process. 18(1), 158–167 (2009). [CrossRef] [PubMed]
10. O. Barnich and M. Van Droogenbroeck, “ViBe: a universal background subtraction algorithm for video sequences,” IEEE Trans. Image Process. 20(6), 1709–1724 (2011). [CrossRef] [PubMed]
11. S. Kwak, G. Bae, and H. Byun, “Moving-object segmentation using a foreground history map,” J. Opt. Soc. Am. A 27(2), 180–187 (2010). [CrossRef] [PubMed]
12. C. Cuevas, R. Mohedano, and N. García, “Adaptable Bayesian classifier for spatiotemporal nonparametric moving object detection strategies,” Opt. Lett. 37(15), 3159–3161 (2012). [CrossRef] [PubMed]
13. A. Elkabetz and Y. Yitzhaky, “Background modeling for moving object detection in long-distance imaging through turbulent medium,” Appl. Opt. 53(6), 1132–1141 (2014). [CrossRef] [PubMed]
14. Y. B. Li, F. Chen, W. L. Xu, and Y. T. Du, “Gaussian-based codebook model for video background subtraction,” Adv. Nat. Comput. 2, 762–765 (2006). [CrossRef]
15. M. H. Sigari and M. Fathy, “Real-time background modeling/subtraction using two-layer codebook model,” in Proceedings of the International Multi-Conference of Engineers and Computer Scientists (IAENG, 2008), pp. 717–720.
16. A. Ilyas, M. Scuturici, and S. Miguet, “Real time foreground-background segmentation using a modified codebook model,” in Proceedings of IEEE Conference on Advanced Video and Signal Based Surveillance (IEEE, 2009), pp. 454–459. [CrossRef]
17. M. J. Wu and X. R. Peng, “Spatio-temporal context for codebook-based dynamic background subtraction,” AEU, Int. J. Electron. Commun. 64(8), 739–747 (2010). [CrossRef]
18. J. M. Guo, Y. F. Liu, C. H. Hsia, M. H. Shih, and C. S. Hsu, “Hierarchical method for foreground detection using codebook model,” IEEE Trans. Circ. Syst. Video Tech. 21(6), 804–815 (2011). [CrossRef]
19. I. T. Sun, S. C. Hsu, and C. L. Huang, “A hybrid codebook background model for background subtraction,” in Workshop of IEEE Conference on Signal Processing Systems (IEEE, 2011), pp. 96–101. [CrossRef]
20. M. Shah, J. Deng, and B. Woodford, “Enhanced codebook model for real-time background subtraction,” Neural Information Processing 3, 449–458 (2011).
21. Q. Tu, Y. Xu, and M. Zhou, “Box-based codebook model for real-time objects detection,” in Proceedings of IEEE Conference on Intelligent Control and Automation (IEEE, 2008), pp. 7621–7625.