Optica Publishing Group

Rolling-shutter-effect camera-based visible light communication using RGB channel separation and an artificial neural network

Open Access

Abstract

We propose and demonstrate a light-panel and rolling-shutter-effect (RSE) camera-based visible light communication (VLC) system using Z-score normalization, red/green/blue (RGB) color channel separation, and a 1-D artificial neural network (ANN). The proposed scheme can mitigate the high inter-symbol interference (ISI) generated by the RSE pattern due to the low pixel-per-bit and the high noise-ratio (NR) of the display contents.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Visible light communication (VLC) uses the visible light region of the electromagnetic spectrum to provide optical wireless transmission [1–5]. It is electromagnetic-interference (EMI) free and license-free. Besides, it can also relieve pressure on the congested radio-frequency (RF) communication spectrum. Hence, VLC is considered a complementary technique to RF communications. Employing the mobile-phone embedded complementary metal oxide semiconductor (CMOS) camera for VLC, also known as optical camera communication (OCC), has received much attention in both academia and industry. Recently, the standard IEEE 802.15.7, “Short-Range Optical Wireless Communications”, added three physical (PHY) types (PHY IV, V and VI) related to OCC. These three PHY types specify operating data rates from a few bit/s to hundreds of bit/s [6]. One implementation of OCC uses the region-of-interest (RoI) technique to detect a specific region of a frame and reduce the computational burden. It can provide transmission over > 100 m at a few bit/s for vehicle-to-infrastructure (V2I) or vehicle-to-vehicle (V2V) communications [7,8]. Several RoI tracking techniques for OCC have been proposed, including the Cam-Shift algorithm with the Kalman filter [9] and a Particle-Filter algorithm based on the Cam-Shift algorithm [10]. These schemes were effective in mitigating different outdoor environmental interferences, and achieved tracking errors at the centimeter level. OCC using a camera with tailor-made pixels for imaging and high-speed detection has also been proposed [11]. Another implementation of OCC uses the rolling shutter effect (RSE) of the CMOS camera [12–15]. During the RSE operation, the CMOS image sensor does not acquire the whole optical signal at the same time; instead, each image frame is generated row-by-row from the captured pixels. Dark and bright fringes are captured in each frame, corresponding to the light “OFF” and “ON” states.
The RSE based OCC system can offer a much higher data rate than the RoI scheme, and it is suitable for indoor environments delivering real-time positioning and price information to users in bus terminals, shopping malls, etc. As RSE based OCC is an asynchronous transmission with a time-gap during frame-to-frame processing, special packet-based transmission algorithms are required [13–15]. As mentioned before, each image frame is filled at different times due to the row-by-row exposure delay. This results in high inter-symbol interference (ISI) when the VLC link is operated at a high data rate (i.e., fewer pixel rows per bit). Moreover, the transmission performance is also highly affected by the advertisement contents displayed on the LED light panel [16]. Recently, a Logistic Regression machine learning (LRML) algorithm was proposed to improve the RSE pattern decoding performance [17]. Besides, a 2-D convolutional neural network (CNN) [18] was also proposed to enhance RSE pattern decoding; however, it needs to process entire 2-D frame images, which may be complicated and time consuming.

In this work, we propose and experimentally demonstrate a light emitting diode (LED) light panel and RSE camera based VLC system using Z-score normalization, red/green/blue (RGB) color channel separation, and a 1-D artificial neural network (ANN). The proposed scheme mitigates the high ISI generated by the RSE pattern due to the low pixel-per-bit and the high noise-ratio (NR) of the display contents on the LED light panel. Here, only a 1-D ANN is needed to retrieve the RSE pattern, which can be simpler than the CNN scheme reported in [18]. The proposed scheme operates at a data rate of 1.2 kbit/s over a free-space distance of 2.5 m. This is higher than the previously proposed LED-panel based OCC using the LRML scheme, which operated at 0.78 kbit/s over a transmission distance of 1.5 m [17], and the grayscale-value-distribution with LRML scheme, which operated at 1.02 kbit/s over 1.5 m [19]. It also outperforms the LED panel based OCC system using the frequency-shift-keying (FSK) format, which operated at 3 bit/s over 2 m [20].

2. Neural network algorithm and experiment

Figure 1(a) shows the experimental setup of the RSE based VLC system. An arbitrary waveform generator (AWG, Tektronix AFG3252C) is attached to a 22 W LED display light panel (Li-Cheng Corp.) to modulate the backlight LED using on-off keying (OOK) modulation. In a practical implementation, an Arduino microprocessor can be used to control the VLC data emitted by the LED panel, as illustrated in [20]. The Rx is the embedded CMOS camera of a Redmi Note 7 smartphone. The resolution, frame rate, exposure time and ISO are 1920 × 1080 pixels, 30 fps, 1/5000 s and 550, respectively. Figure 1(b) shows the flow diagram of the RSE decoding process. The VLC packet is first read in; then the light panel area is located. The maximum grayscale value in each pixel row is calculated to build a grayscale value matrix for each image frame. Due to the frame-to-frame processing time-gap of the CMOS image sensor, we have to make sure a complete VLC packet is received in an image frame. This is confirmed by identifying two headers, and the payload data is recorded between the two headers. Then the grayscale value matrix of the payload is separated into R, G, and B color channel matrices, and Z-score normalization is applied to the three matrices: the mean of all the averaged grayscale values is subtracted from each averaged grayscale value, and the result is divided by the standard deviation of all the averaged grayscale values. This produces a bipolar matrix with zero average for the subsequent ANN operation. After the ANN operation, the BER is measured based on bit-by-bit comparison.
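The per-frame steps above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: the synthetic frame below fakes an RSE fringe pattern with 10 pixel rows per bit, and the function names are our own.

```python
import numpy as np

def row_traces(frame):
    """frame: (rows, cols, 3) RGB image -> (rows, 3) per-row max grayscale."""
    return frame.max(axis=1).astype(float)

def zscore(trace):
    """Z-score each color channel: subtract its mean, divide by its std."""
    return (trace - trace.mean(axis=0)) / trace.std(axis=0)

# Synthetic frame: alternating OFF/ON fringes, 10 pixel rows per bit.
frame = np.zeros((120, 16, 3), dtype=np.uint8)
frame[(np.arange(120) // 10) % 2 == 1] = 200

# Three bipolar, zero-mean 1-D traces for the subsequent ANN stage.
r, g, b = zscore(row_traces(frame)).T
```

The normalization yields the bipolar, zero-average input the paper requires before the ANN operates on the R, G and B traces.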

Fig. 1. (a) Proposed RSE based VLC system. (b) Flow diagram of the mechanism of RSE decoding. (c) Proposed ANN with an input layer with 3 elements (R, G, B), a hidden layer with 4 neurons, and an output layer.

Figure 1(c) shows the proposed ANN architecture. It consists of an input layer with 3 elements (R, G, B), a hidden layer with 4 neurons, and an output layer. The activation function is the feed-forward sigmoid function, as shown in Eq. (1),

$${z_j} = \sigma (\sum\limits_{i = 1}^3 {{w_{ji}}{x_i} + {w_{j0}}} )$$
where j is the index of the hidden neuron, xi is the R, G, B input, wji is the weight of the color channel and wj0 is the input bias. The weights can be obtained by minimizing the cross entropy loss function shown in Eq. (2), where tn is the target logic and Pn is the output layer probability.
$$\textrm{E(w)} =- \sum\limits_{n = 1}^N {[{t_n}\ln {P_n} + (1 - {t_n})\ln (1 - {P_n})]} + \alpha ||w ||_2^2$$
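Equations (1) and (2) can be sketched numerically as follows. This is a minimal illustration with assumed weight shapes and an assumed regularization strength alpha, not the trained model from the experiment:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def hidden(x, W, b):
    """Eq. (1): z_j = sigma(sum_i w_ji * x_i + w_j0), W has shape (4, 3)."""
    return sigmoid(W @ x + b)

def loss(p, t, w, alpha=1e-4):
    """Eq. (2): cross entropy over N samples plus L2 penalty alpha*||w||^2."""
    eps = 1e-12  # guard against log(0)
    ce = -np.sum(t * np.log(p + eps) + (1 - t) * np.log(1 - p + eps))
    return ce + alpha * np.sum(w ** 2)

# Zero weights and bias give sigmoid(0) = 0.5 for every hidden neuron.
z = hidden(np.zeros(3), np.zeros((4, 3)), np.zeros(4))
```

A perfect prediction (p equal to the target t) drives the cross-entropy term to zero, leaving only the L2 penalty, which is what the LBFGS optimizer then balances against the data fit.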

To reduce the error function, the Limited-memory Broyden–Fletcher–Goldfarb–Shanno (LBFGS) optimizer [21] is used, as shown in Eq. (3). It can locate the minimum of the loss function using a limited amount of computer memory.

$${W_{k + 1}} = {W_k} - H_k^{ - 1}{g_k},\quad \textrm{where}\ {g_k} = \nabla E({W_k})$$

The LBFGS method uses second-order optimization based on the Hessian matrix Hk, so it can reduce the error at a faster rate. To improve the training speed, LBFGS uses a diagonal matrix as the initial Hessian matrix, and then updates the inverse Hessian with sk and yk, as shown in Eq. (4). The error function gradient is calculated by backpropagation.

$${s_k} \leftarrow {W_{k + 1}} - {W_k};\quad {y_k} \leftarrow {g_{k + 1}} - {g_k};\quad H_{k + 1}^{ - 1} = (I - \frac{{{s_k}y_k^T}}{{y_k^T{s_k}}})H_k^{ - 1}(I - \frac{{{y_k}s_k^T}}{{y_k^T{s_k}}}) + \frac{{{s_k}s_k^T}}{{y_k^T{s_k}}}$$
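The inverse-Hessian update in Eq. (4) can be verified directly: the defining property of the (L)BFGS update is the secant condition H⁻¹ₖ₊₁ yₖ = sₖ. Below is a sketch with random stand-in vectors for the weight and gradient differences, not values from the actual training run:

```python
import numpy as np

def bfgs_inverse_update(H_inv, s, y):
    """Eq. (4): H_{k+1}^{-1} = V H_k^{-1} V^T + s s^T / (y^T s)."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H_inv @ V.T + rho * np.outer(s, s)

rng = np.random.default_rng(0)
s, y = rng.normal(size=5), rng.normal(size=5)
if y @ s <= 0:   # curvature condition required for a valid update
    y = -y

# Start from the identity (a diagonal initial Hessian, as in the text).
H_inv = bfgs_inverse_update(np.eye(5), s, y)
```

Because V annihilates y (Vᵀy = 0 by construction), the updated matrix maps yₖ exactly onto sₖ while remaining symmetric.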

Here, the frame rate of the camera is 30 fps. Among the 30 image frames acquired for the data set, the first 3 image frames are used for training and the remaining 27 frames are used for testing. During the training process, the maximum grayscale value of each pixel row in the VLC packet is extracted as the training data set. As there are hundreds of pixel rows in the training frames, there are hundreds of training data points. The binary numbers 0 and 1 are the target values representing logics 0 and 1. Given the initial parameters, the training data set and the target values, the cross entropy loss function described above can be calculated. For the LBFGS optimizer explained above, the data obtained from the first 3 image frames are substituted at each iteration to minimize the error. After the error function converges to an acceptable value, the optimized parameters for the ANN model are obtained. Here, we arbitrarily select 3 different display contents with NRs as defined in [16]. Figures 2(a)–2(c) show the loss function errors against iterations at NR = 0%, 40% and 70%, respectively. Since the noise at NR = 0% is small, the function converges to a lower error of 0.2124 after 10 iterations. At the high NRs of 40% and 70%, the function converges to a higher error of 0.375 after 20 iterations.
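The training setup above maps closely onto scikit-learn's MLPClassifier: one hidden layer of 4 neurons, logistic (sigmoid) activation, the LBFGS solver, and an L2 term (alpha). This is a sketch under assumptions, not the authors' implementation; the synthetic data below stands in for the per-row (R, G, B) Z-scores, with the small training slice playing the role of the first 3 frames:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
bits = rng.integers(0, 2, size=600)                  # target logic levels
X = bits[:, None] + 0.3 * rng.normal(size=(600, 3))  # noisy RGB Z-scores

clf = MLPClassifier(hidden_layer_sizes=(4,), activation='logistic',
                    solver='lbfgs', alpha=1e-4, max_iter=200)
clf.fit(X[:60], bits[:60])          # "first 3 frames" as training data
acc = clf.score(X[60:], bits[60:])  # remaining frames for testing
```

With well-separated synthetic classes the small 3-4-1 network converges in a handful of LBFGS iterations, consistent with the convergence behaviour reported in Fig. 2.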

Fig. 2. Loss function errors against iterations at NRs of (a) 0%, (b) 40% and (c) 70%.

3. Results and discussions

The primary function of the LED light panel is to display different advertisement contents. When the secondary VLC function modulates the light panel, the displayed contents produce different signal-to-noise ratios (SNRs) for the VLC signal, affecting the transmission performance. Figure 3(a) shows the three display contents with NR = 0%, 40% and 70%, respectively. Figure 3(b) shows the experimental pixel-per-bit of the RSE pattern at different data rates and different VLC transmission distances satisfying the forward error correction (FEC, i.e. BER ≤ 3.8 × 10−3) threshold. When the data rate or transmission distance increases, the pixel-per-bit decreases. This means that at a high data rate, fewer pixel rows represent one bit. As a result, demodulating the RSE pattern becomes more difficult.
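The pixel-per-bit trend can be seen with back-of-envelope arithmetic. A rolling-shutter sensor exposes rows at a roughly fixed rate, so the number of pixel rows per bit is approximately row_rate / data_rate. The row rate below is an assumed value derived from the nominal frame geometry, not a measured parameter of the Redmi Note 7 sensor (real sensors include blanking intervals):

```python
# Assumed row readout rate: 1080 rows per frame at 30 fps.
ROW_RATE = 30 * 1080

def pixels_per_bit(data_rate_bps, row_rate=ROW_RATE):
    """Approximate number of pixel rows representing one OOK bit."""
    return row_rate / data_rate_bps

# Pixel-per-bit shrinks as the data rate grows, as in Fig. 3(b).
ppb = [pixels_per_bit(r) for r in (720, 960, 1200)]
```

Under this assumption the three data rates used in the experiment map to roughly 45, 34 and 27 rows per bit, which illustrates why decoding gets harder at 1200 bit/s.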

Fig. 3. (a) Three display contents with NR = 0%, 40% and 70%. (b) Experimental pixel-per-bit of the RSE pattern at different data rates and different VLC transmission distances.

Figure 4(a) illustrates an example of the NR = 70% RSE patterns of the R, G, B channels after Z-score normalization. As the NR of the display content is high (NR = 70%), the R, G, and B RSE patterns are highly distorted, and the logic bits cannot be identified by drawing the decision threshold at Z-score = 0. When we apply the proposed ANN scheme (pink solid line) and the LRML scheme (gray solid line) reported in [17], we can observe that the extinction ratio (ER) of the RSE patterns is significantly enhanced in both cases, as shown in Fig. 4(b). However, a closer look at the RSE pattern, as revealed in Fig. 4(c), shows that the ANN scheme can correctly predict the logic probability at the decision threshold of 0.5, while the LRML scheme cannot. As the ANN performs classification by making non-linear decisions, it can make better decisions when processing non-linear RSE patterns.

Fig. 4. (a) An example of RSE patterns of the R, G, B channels after Z-score normalization. (b), (c) Output probability by using the proposed ANN and LRML schemes.

The BER measurements against different NR display contents and free-space transmission distances for the proposed ANN scheme and the LRML scheme are shown in Figs. 5(a)–5(f). Both schemes perform similarly and can make the correct data logic estimation at a low data rate of 720 bit/s with a transmission distance of 3 m, even at NR = 70%, as presented in Figs. 5(a) and 5(d). When the data rate is increased to 960 bit/s, both schemes provide a BER satisfying the FEC threshold at NR = 30% with a transmission distance of 3 m, as shown in Figs. 5(b) and 5(e). When the data rate is further increased to 1200 bit/s, the ANN scheme provides a BER satisfying the FEC threshold at NR = 30% with a transmission distance of 2.5 m, while the LRML scheme can only provide the FEC-required BER at a transmission distance of 2 m, as shown in Figs. 5(c) and 5(f). Higher data rates are possible by using a camera with a higher frame rate or a higher-resolution image sensor. Longer transmission distances are possible by increasing the dimensions or the emitted power of the LED display panel.
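The bit-by-bit BER measurement against the FEC threshold can be sketched as below. The bit streams here are synthetic stand-ins, not recorded experimental data:

```python
import numpy as np

FEC_THRESHOLD = 3.8e-3  # BER limit for the FEC used in the paper

def ber(tx, rx):
    """Bit error ratio via element-wise comparison of transmitted/received bits."""
    tx, rx = np.asarray(tx), np.asarray(rx)
    return np.mean(tx != rx)

tx = np.zeros(10_000, dtype=int)
rx = tx.copy()
rx[:20] ^= 1                      # flip 20 of 10 000 bits -> BER = 2e-3
ok = ber(tx, rx) <= FEC_THRESHOLD # within the FEC-correctable range
```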

Fig. 5. The BER measurements against different NR display contents and free-space transmission distances between the (a)-(c) LRML scheme and (d)-(f) the proposed ANN scheme.

4. Conclusion

We proposed and demonstrated a VLC system using Z-score normalization, RGB color channel separation, and a 1-D ANN. The proposed scheme mitigates the high ISI generated by the RSE pattern due to the low pixel-per-bit and the high NR of the display contents. Experiments were performed to compare the proposed ANN scheme with the previous LRML scheme. Results showed that both schemes can make the correct data logic estimation at a low data rate of 720 bit/s with a transmission distance of 3 m, even at NR = 70%. When the data rate was increased to 1200 bit/s, the ANN scheme provided a BER satisfying the FEC threshold at NR = 30% with a transmission distance of 2.5 m.

Funding

Ministry of Science and Technology, Taiwan (MOST-107-2221-E-009-118-MY3, MOST-108-2218-E-009-031, MOST-109-2221-E-009-155-MY3) and an ITRI subcontract project.

Disclosures

The authors declare no conflicts of interest.

References

1. H. L. Minh, D. O’Brien, G. Faulkner, L. Zeng, K. Lee, D. Jung, Y. J. Oh, and E. T. Won, “100-Mb/s NRZ visible light communications using a post-equalized white LED,” IEEE Photonics Technol. Lett. 21(15), 1063–1065 (2009). [CrossRef]  

2. Z. Wang, C. Yu, W. D. Zhong, J. Chen, and W. Chen, “Performance of a novel LED lamp arrangement to reduce SNR fluctuation for multi-user visible light communication systems,” Opt. Express 20(4), 4564–4573 (2012). [CrossRef]  

3. H. H. Lu, Y. P. Lin, P. Y. Wu, C. Y. Chen, M. C. Chen, and T. W. Jhang, “A multiple-input-multiple-output visible light communication system based on VCSELs and spatial light modulators,” Opt. Express 22(3), 3468–3474 (2014). [CrossRef]  

4. B. Janjua, H. M. Oubei, J. R. Durán Retamal, T. K. Ng, C. T. Tsai, H. Y. Wang, Y. C. Chi, H. C. Kuo, G. R. Lin, J. H. He, and B. S. Ooi, “Going beyond 4 Gbps data rate by employing RGB laser diodes for visible light communication,” Opt. Express 23(14), 18746–18753 (2015). [CrossRef]  

5. C. W. Chow, C. H. Yeh, Y. Liu, Y. Lai, L. Y. Wei, C. W. Hsu, G. H. Chen, X. L. Liao, and K. H. Lin, “Enabling techniques for optical wireless communication systems,” Proc. OFC2020, paper M2F.1.

6. “IEEE Standard for Local and metropolitan area networks–Part 15.7: Short-Range Optical Wireless Communications,” IEEE Std 802.15.7-2018 (Revision of IEEE Std 802.15.7-2011), pp. 1–407 (2019), doi: 10.1109/IEEESTD.2019.8697198.

7. P. Luo, M. Zhang, Z. Ghassemlooy, H. L. Minh, H. M. Tsai, X. Tang, L. C. Png, and D. Han, “Experimental demonstration of RGB LED-based optical camera communications,” IEEE Photonics J. 7(5), 1–12 (2015). [CrossRef]  

8. C. W. Chow, R. J. Shiu, Y. C. Liu, Y. Liu, and C. H. Yeh, “Non-flickering 100 m RGB visible light communication transmission based on a CMOS image sensor,” Opt. Express 26(6), 7079–7084 (2018). [CrossRef]  

9. M. Huang, W. Guan, Z. Fan, Z. Chen, J. Li, and B. Chen, “Improved target signal source tracking and extraction method based on outdoor visible light communication using a Cam-Shift algorithm and Kalman filter,” Sensors 18(12), 4173 (2018). [CrossRef]  

10. Z. Liu, W. Guan, and S. Wen, “Improved target signal source tracking and extraction method based on outdoor visible light communication using an improved particle filter algorithm based on Cam-Shift algorithm,” IEEE Photonics J. 11(6), 1–20 (2019). [CrossRef]  

11. I. Takai, S. Ito, K. Yasutomi, K. Kagawa, M. Andoh, and S. Kawahito, “LED and CMOS image sensor based optical wireless communication system for automotive applications,” IEEE Photonics J. 5(5), 6801418 (2013). [CrossRef]  

12. C. Danakis, M. Afgani, G. Povey, I. Underwood, and H. Haas, “Using a CMOS camera sensor for visible light communication,” Proc. OWC’12, 1244–1248.

13. C. W. Chow, C. Y. Chen, and S. H. Chen, “Visible light communication using mobile-phone camera with data rate higher than frame rate,” Opt. Express 23(20), 26080–26085 (2015). [CrossRef]  

14. C. W. Chow, C. Chen, and S. Chen, “Enhancement of signal performance in LED visible light communications using mobile phone camera,” IEEE Photonics J. 7(5), 1–7 (2015). [CrossRef]  

15. K. Liang, C. W. Chow, Y. Liu, and C. H. Yeh, “Thresholding schemes for visible light communications with CMOS camera using entropy-based algorithms,” Opt. Express 24(22), 25641–25646 (2016). [CrossRef]  

16. C. W. Chow, R. J. Shiu, Y. C. Liu, C. H. Yeh, X. L. Liao, K. H. Lin, Y. C. Wang, and Y. Y. Chen, “Secure mobile-phone based visible light communications with different noise-ratio light-panel,” IEEE Photonics J. 10(2), 1–6 (2018). [CrossRef]  

17. Y. C. Chuang, C. W. Chow, Y. Liu, C. H. Yeh, X. L. Liao, K. H. Lin, and Y. Y. Chen, “Using logistic regression classification for mitigating high noise-ratio advisement light-panel in rolling-shutter based visible light communications,” Opt. Express 27(21), 29924–29929 (2019). [CrossRef]  

18. L. Liu, R. Deng, and L. Chen, “47-kbit/s RGB-LED-based optical camera communication based on 2D-CNN and XOR-based data loss compensation,” Opt. Express 27(23), 33840–33846 (2019). [CrossRef]  

19. K. L. Hsu, Y. C. Wu, Y. C. Chuang, C. W. Chow, Y. Liu, X. L. Liao, K. H. Lin, and Y. Y. Chen, “CMOS camera based visible light communication (VLC) using grayscale value distribution and machine learning algorithm,” Opt. Express 28(2), 2427–2432 (2020). [CrossRef]  

20. C. W. Chow, R. J. Shiu, Y. C. Liu, X. L. Liao, K. H. Lin, Y. C. Wang, and Y. Y. Chen, “Using advertisement light-panel and CMOS image sensor with frequency-shift-keying for visible light communication,” Opt. Express 26(10), 12530–12535 (2018). [CrossRef]  

21. C. M. Bishop, Pattern Recognition and Machine Learning (Springer, 2006).
