## Abstract

In this paper, a deep learning-based detection scheme is proposed for visible light communication (VLC) systems using generalized spatial modulation (GenSM). In the proposed detection scheme, a deep neural network consisting of several neural layers is applied to detect the received signals. By integrating the signal processing modules of the conventional detection schemes into one deep neural network, the proposed scheme is able to extract the information bits from the received signals efficiently. After offline training, the proposed detection scheme can serve as a promising detection method for the VLC system with GenSM. Simulation results validate that the proposed detection scheme achieves better detection error performance than conventional detection schemes at acceptable complexity.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

With the widespread deployment of energy-efficient light emitting diodes (LEDs) in homes, offices, and streetlights, visible light communication (VLC) is anticipated to be an important part of next-generation wireless communication networks [1]. Unlike radio frequency communication, VLC supports wireless communication by modulating the data onto the intensity of the light emitted from the LEDs [2]. At the receiver, the signals are detected by photo-diodes (PDs) [3].

Recently, the generalized spatial modulation (GenSM) technique has been introduced into VLC systems [4]. Unlike conventional multiple-input multiple-output (MIMO) VLC systems [5], in the VLC-GenSM system not all the LEDs are activated at the same time to transmit data [6]. By the principle of GenSM, both the intensity of the light transmitted from the activated LEDs and the indices of the activated LEDs can be modulated to convey data. Related work has shown that VLC systems using GenSM are more robust to channel correlation in indoor environments [7]. As a result, VLC with GenSM has attracted numerous research efforts, including error performance analyses [8], dimming control assistance [9], and channel adaptation [10].

In related research, several detection schemes have been proposed for VLC systems using GenSM. In [8], a joint maximum-likelihood (ML) detection scheme is proposed, which takes both the LED activation patterns and the intensity levels into consideration. It searches all the possible LED activation patterns and intensity levels jointly to find the most likely transmitted signals. The joint ML scheme achieves the optimal error performance and can therefore be regarded as the benchmark for other detection schemes. A two-step equalization-based detection scheme is proposed in [11], where a zero-forcing (ZF) equalization module is used to alleviate the interference from different LEDs. The detector regards the positions of the equalized signals with the maximum values as the estimated indices of the activated LEDs. Then, the detector decodes the information bits from the intensity levels on the estimated activated LEDs.
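The exhaustive search performed by a joint ML detector can be illustrated with a minimal sketch. It is not the implementation of [8]. It assumes a linear model $\mathbf{y} = \gamma \mathbf{H} \mathbf{x} + \boldsymbol{\omega}$ with Gaussian noise, equally spaced unipolar PAM levels $2Iq/(Q+1)$, and a search over all $\binom{N}{N_{\mathrm{a}}}$ activation patterns (a practical system would restrict the search to the patterns actually used). The function names `pam_levels`, `candidates`, and `joint_ml_detect`, as well as the example channel matrix, are hypothetical.

```python
from itertools import combinations, product


def pam_levels(Q, I=1.0):
    # ASSUMPTION: equally spaced unipolar levels 2*I*q/(Q+1), q = 1..Q.
    return [2.0 * I * q / (Q + 1) for q in range(1, Q + 1)]


def candidates(N, Na, Q, I=1.0):
    """Yield (active indices, intensities, length-N symbol) for every candidate."""
    levels = pam_levels(Q, I)
    for idx in combinations(range(N), Na):
        for amps in product(levels, repeat=Na):
            x = [0.0] * N
            for n, a in zip(idx, amps):
                x[n] = a
            yield idx, amps, x


def joint_ml_detect(y, H, gamma, Na, Q, I=1.0):
    """Return the (indices, intensities) pair minimising ||y - gamma*H*x||^2."""
    M, N = len(H), len(H[0])
    best, best_dist = None, float("inf")
    for idx, amps, x in candidates(N, Na, Q, I):
        dist = 0.0
        for i in range(M):
            pred = gamma * sum(H[i][j] * x[j] for j in range(N))
            dist += (y[i] - pred) ** 2
        if dist < best_dist:
            best, best_dist = (idx, amps), dist
    return best


# Hypothetical 4x4 channel matrix, used only to exercise the search.
H = [[1.0, 0.2, 0.1, 0.0],
     [0.2, 1.0, 0.2, 0.1],
     [0.1, 0.2, 1.0, 0.2],
     [0.0, 0.1, 0.2, 1.0]]
levels = pam_levels(4)
x_true = [0.0, levels[0], 0.0, levels[1]]       # LEDs 1 and 3 active (0-based)
y = [sum(h * x for h, x in zip(row, x_true)) for row in H]  # noiseless, gamma = 1
print(joint_ml_detect(y, H, 1.0, Na=2, Q=4))
```

The number of candidates grows as $\binom{N}{N_{\mathrm{a}}} Q^{N_{\mathrm{a}}}$ in this sketch, which is why the cost of the exhaustive search rises quickly with the PAM order.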

Deep learning, which takes advantage of deep neural networks (DNNs) to learn representations of data efficiently, has powered many aspects of modern society, including computer vision, object recognition, and wireless communication [12]. Recent studies have brought deep learning to VLC systems and successfully solved several problems in VLC. In [13], a deep learning-based method is proposed to detect atmospheric turbulence in outdoor VLC systems, which achieves much better detection accuracy than conventional methods. In [14], a long short-term memory neural network is utilized to combat the nonlinear distortion in VLC systems. In [15], a deep learning-based collaborative constellation design method is proposed for VLC systems, which achieves remarkably lower complexity than conventional methods. The number of neurons in the network can be adaptively adjusted to fine-tune the trade-off between performance and complexity. These studies have validated the power of deep learning in VLC systems and motivated us to build a deep learning-based detection framework for VLC systems with GenSM.

In this paper, we propose a DNN-based framework, termed DNN-GenSM, to solve the signal detection problem of the VLC system using GenSM. The proposed DNN-GenSM scheme integrates the equalization module, the decoding module, and the bit combiner module of the conventional detectors into one deep neural network, which contains one input layer, one equalization layer, three hidden layers, one output layer, and one hard decision layer. By utilizing this DNN structure, the DNN-GenSM detector is able to extract the information bits from the received GenSM symbols efficiently. After offline training, the DNN-GenSM detector can serve as a promising detection method for the VLC system with GenSM. Simulation results validate that the proposed detection scheme achieves a better trade-off between detection error performance and computational complexity.

The rest of the paper is organized as follows. In Section 2, the optical and communication model of the indoor VLC system with GenSM is presented. In Section 3, the proposed DNN-GenSM detection scheme is explained in detail. Simulation results and corresponding discussions are provided in Section 4. Finally, concluding remarks are given in Section 5.

## 2. System model

We consider an indoor multi-LED VLC system using GenSM, the geometric setup of which is shown in Fig. 1. In a room of size $(a \times b \times c)$, $N$ LEDs are equally spaced on the ceiling and point straight down to provide both illumination and communication services. The LEDs are denoted as $L_j~(j = 1, 2, \ldots , N)$. A user device of height $e$ is placed on the floor and is equipped with $M$ PDs pointing straight up. The PDs are represented as $D_i~(i=1,2,\ldots ,M)$. The distance, the angle of irradiance, and the angle of incidence between the $j$-th LED $L_j$ and the $i$-th PD $D_i$ are denoted as $d_{i,j}$, $\phi _{i,j}$, and $\theta _{i,j}$, respectively.

In VLC, the information bits required by the user are carried by the optical pulses emitted from the LEDs. The block diagram of the VLC system is shown in Fig. 2. Since the GenSM technique is used, only $N_{\mathrm {a}}$ LEDs are activated at the same time to transmit the $Q$-level pulse amplitude modulation (PAM) symbols, and the indices of the activated LEDs can be utilized to convey extra data as well [4]. The intensity levels of the PAM symbols can be expressed as

$$I_q = \frac{2 I q}{Q+1}, \quad q = 1, 2, \ldots, Q,$$

where $I$ denotes the mean optical power emitted. For each GenSM symbol, $S_1 = N_{\mathrm {a}} \log _2(Q)$ bits can be transmitted through the PAM symbols and $S_2 = \lfloor \log _2 \binom {N}{N_{\mathrm {a}}} \rfloor$ bits can be conveyed by the indices of the activated LEDs, where $\binom {\cdot }{\cdot }$ represents the binomial coefficient. As a result, $S = S_1 + S_2$ bits can be transmitted on each GenSM symbol.

At the transmitter, the information bits $\mathbf {b} = [b_1, b_2, \ldots , b_S]^{\mathrm {T}} \in \mathbb {B}^S$ are first split into two parts, i.e., $\mathbf {b}_1 \in \mathbb {B}^{S_1}$ and $\mathbf {b}_2 \in \mathbb {B}^{S_2}$. $\mathbf {b}_1$ is used to determine the transmitted PAM symbols on the activated LEDs, and $\mathbf {b}_2$ is used to decide the indices of the activated LEDs. Then, the GenSM symbols are generated according to the chosen indices of the activated LEDs and the PAM symbols. We denote the transmitted GenSM symbols as $\mathbf {x} = [x_1, x_2, \ldots , x_N]^{\mathrm {T}} \in \mathbb {R}^N$. Since the GenSM technique is used, only $N_{\mathrm {a}}$ elements of $\mathbf {x}$ are non-zero, and the remaining $(N-N_{\mathrm {a}})$ elements are zeros. Therefore, the transmitted symbols have the structure of

$$\mathbf{x} = [\ldots, 0, x_{n_1}, 0, \ldots, 0, x_{n_2}, 0, \ldots, 0, x_{n_{N_{\mathrm{a}}}}, 0, \ldots]^{\mathrm{T}},$$

where $n_p \in [1,N]~(p = 1,2,\ldots ,N_{\mathrm {a}})$ denotes the position of the $p$-th activated LED. After adding the direct current (DC) bias, the GenSM symbols are emitted from the LEDs and transmitted through the optical channel.

At the receiver, the signals are received by the PDs of the user device. We denote the received signals after DC bias removal as $\mathbf {y} = [y_1, y_2, \ldots , y_M]^{\mathrm {T}} \in \mathbb {R}^M$, which can be expressed as

$$\mathbf{y} = \gamma \mathbf{H} \mathbf{x} + \boldsymbol{\omega},$$

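where $\gamma$ represents the responsivity of the PDs, $\boldsymbol {\omega } \sim \mathcal {N}(\mathbf {0}, \sigma ^2 \mathbf {I}_M)$ denotes the additive white Gaussian noise (AWGN), and $\mathbf {H} \in \mathbb {R}^{M \times N}$ is the optical channel matrix, which can be further given as

$$\mathbf{H} = \begin{bmatrix} h_{1,1} & \cdots & h_{1,N} \\ \vdots & \ddots & \vdots \\ h_{M,1} & \cdots & h_{M,N} \end{bmatrix},$$

where $h_{i,j}$ denotes the channel gain between the $j$-th LED $L_j$ and the $i$-th PD $D_i$.

## 3. Proposed deep learning-based detection scheme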

In this section, we explain the framework of the proposed DNN-GenSM scheme in detail, the block diagram of which is given in Fig. 3. The proposed scheme integrates the equalization module, the decoding module, and the bit combiner module of the conventional detection schemes into one deep neural network to improve the detection performance. The proposed DNN-GenSM detector consists of one input layer, one equalization layer, three hidden layers, one output layer, and one hard decision layer. The principles and functions of these neural layers are explained in the following paragraphs.

**Input layer:** The input layer is used to prepare the input for the whole neural network. The input layer consists of $M$ neurons, whose values are taken from the received signals $\mathbf {y}$ at the $M$ PDs after DC bias removal.

**Equalization layer:** The equalization layer is used to alleviate the impact of the optical path loss and the interference from multiple LEDs. The weights of this layer are fixed and are not updated during the training procedure. In the proposed DNN-GenSM detector, the weights are chosen based on the ZF principle and are given by $\frac {1}{\gamma } \mathbf {H}^{\dagger }$, where $\mathbf {H}^{\dagger }$ is the pseudo-inverse of the optical channel matrix $\mathbf {H}$. Therefore, the output of the equalization layer $\hat {\mathbf {x}}^{(1)} \in \mathbb {R}^M$ can be given as

$$\hat{\mathbf{x}}^{(1)} = \frac{1}{\gamma} \mathbf{H}^{\dagger} \mathbf{y}.$$

**Hidden layers:** The hidden layers are utilized to extract the inherent useful information from the signals efficiently. In the proposed DNN-GenSM detector, we adopt three fully connected hidden layers with $F_1$, $F_2$, and $F_3$ neurons, respectively. The widely used rectified linear unit (ReLU) function $f_{\mathrm {ReLU}}(\alpha ) = \max (0,\alpha )$ is used as the activation function. We denote the outputs of the hidden layers as $\hat {\mathbf {x}}^{(2)} \in \mathbb {R}^{F_1}$, $\hat {\mathbf {x}}^{(3)} \in \mathbb {R}^{F_2}$, and $\hat {\mathbf {x}}^{(4)} \in \mathbb {R}^{F_3}$, respectively, which can be derived as

$$\hat{\mathbf{x}}^{(l+1)} = f_{\mathrm{ReLU}}\left(\mathbf{W}_l \hat{\mathbf{x}}^{(l)} + \mathbf{k}_l\right), \quad l = 1, 2, 3,$$

where $\mathbf{W}_l$ and $\mathbf{k}_l$ denote the weights and the bias of the $l$-th hidden layer, and the ReLU function is applied element-wise.

**Output layer:** The output layer, which is a fully connected layer with $S$ neurons, is used to derive the roughly estimated bit information. We adopt the Sigmoid function $f_{\mathrm {Sigmoid}}(\alpha ) = 1 / (1 + \exp (-\alpha ))$ as the activation function for this layer in order to map the elements of the output vector to the interval $(0,1)$. The weights and the bias are denoted as $\mathbf {W}_4 \in \mathbb {R}^{S \times F_3}$ and $\mathbf {k}_4 \in \mathbb {R}^S$. Thus, the roughly estimated bit information $\hat {\mathbf {x}}^{(5)} = [\hat {x}^{(5)}_1, \hat {x}^{(5)}_2, \ldots , \hat {x}^{(5)}_S]^{\mathrm {T}} \in \mathbb {R}^S$ can be expressed as

$$\hat{\mathbf{x}}^{(5)} = f_{\mathrm{Sigmoid}}\left(\mathbf{W}_4 \hat{\mathbf{x}}^{(4)} + \mathbf{k}_4\right).$$

**Hard decision layer:** Finally, the hard decision layer is utilized to obtain the estimated bit information $\hat {\mathbf {b}} = [\hat {b}_1, \hat {b}_2, \ldots , \hat {b}_S]^{\mathrm {T}}$. Each element $\hat {b}_s~(s = 1, 2, \ldots , S)$ is set to $1$ if the corresponding element $\hat {x}^{(5)}_s$ is greater than or equal to $0.5$, and to $0$ otherwise. Mathematically, the estimated bit information can be derived by

$$\hat{b}_s = \begin{cases} 1, & \hat{x}^{(5)}_s \geq 0.5, \\ 0, & \hat{x}^{(5)}_s < 0.5, \end{cases} \quad s = 1, 2, \ldots, S.$$
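The chain of layers described above can be summarized in a minimal forward-pass sketch. It uses only the Python standard library rather than the PyTorch implementation used in the simulations. The argument `W_eq` stands for the fixed ZF weights $\frac{1}{\gamma}\mathbf{H}^{\dagger}$ and is assumed to be computed beforehand. The layer widths, the random (untrained) weights, and all function names are hypothetical placeholders, so the printed bits carry no meaning beyond showing the data flow.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def relu(a):
    return max(0.0, a)


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def dense(W, k, v, act):
    """Fully connected layer: act(W v + k), applied element-wise."""
    return [act(z + b) for z, b in zip(matvec(W, v), k)]


def dnn_gensm_forward(y, W_eq, hidden, out):
    """Equalization -> three ReLU layers -> Sigmoid output -> hard decision."""
    x = matvec(W_eq, y)                 # fixed equalization layer, no bias
    for W, k in hidden:                 # trainable hidden layers
        x = dense(W, k, x, relu)
    W4, k4 = out
    soft = dense(W4, k4, x, sigmoid)    # rough bit estimates in (0, 1)
    bits = [1 if s >= 0.5 else 0 for s in soft]
    return soft, bits


def random_layer(n_out, n_in, rng):
    """Placeholder weights; a trained network would load learned values."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


rng = random.Random(0)
M, F1, F2, F3, S = 4, 8, 8, 8, 6        # hypothetical sizes
W_eq = [[1.0 if i == j else 0.0 for j in range(M)] for i in range(M)]  # stand-in
hidden = [random_layer(F1, M, rng), random_layer(F2, F1, rng),
          random_layer(F3, F2, rng)]
out = random_layer(S, F3, rng)
soft, bits = dnn_gensm_forward([0.8, 0.0, 0.0, 1.2], W_eq, hidden, out)
print(bits)
```

Because the equalization weights are passed in separately from the trainable layers, only `hidden` and `out` would need to be updated during training in this sketch.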

For mobile VLC users, the network requires retraining whenever the channel state information (CSI) or the signal-to-noise ratio (SNR) changes. Since the structure of the network remains unchanged, we can use the data memory to save the optimal network parameters $\boldsymbol {\theta }^*$ for each specific place and SNR level after offline training. Therefore, when the mobile user enters a place for which the network has already been trained, the optimal network parameters $\boldsymbol {\theta }^*$ can be obtained directly from the data memory to avoid retraining. As a result, no further retraining is needed once the network has been trained over the whole moving range of the mobile user.
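The parameter-reuse idea for mobile users can be sketched as a simple lookup table keyed by place and SNR level. This is only an illustration under stated assumptions: the class name `ParameterStore`, the use of a place identifier and a rounded SNR in dB as the key, and the `train_fn` callback are all hypothetical, and the stored object stands in for the trained parameters $\boldsymbol{\theta}^*$.

```python
class ParameterStore:
    """In-memory cache of trained parameters, keyed by (place, SNR level)."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(place, snr_db):
        # ASSUMPTION: SNR levels are distinguished at 1 dB granularity.
        return (place, round(snr_db))

    def get_or_train(self, place, snr_db, train_fn):
        """Return cached parameters, calling train_fn() only on a cache miss."""
        key = self._key(place, snr_db)
        if key not in self._store:
            self._store[key] = train_fn()   # offline training happens here
        return self._store[key]


def dummy_training():
    # Placeholder for offline training; returns a stand-in parameter object.
    return {"W": [[0.0]], "k": [0.0]}


store = ParameterStore()
theta_a = store.get_or_train("desk_1", 50.0, dummy_training)  # trains once
theta_b = store.get_or_train("desk_1", 50.2, dummy_training)  # served from memory
print(theta_a is theta_b)
```

With such a table, the training cost is paid once per visited place and SNR level, and later visits reduce to a dictionary lookup.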

## 4. Simulation results

We consider the indoor VLC system defined in Section 2. $N=4$ LEDs are placed on the ceiling, and a user device with $M=4$ PDs is placed on the floor. According to the principle of GenSM, $N_{\mathrm {a}} = 2$ LEDs are activated simultaneously. The proposed DNN-GenSM detector is implemented on the PyTorch platform [18]. Mini-batch training is utilized to accelerate the training process, with $1.0 \times 10^2$ GenSM symbols in each mini-batch. The training data set and the testing data set contain $2.0 \times 10^5$ and $1.0 \times 10^4$ randomly generated GenSM symbols, respectively. The detailed simulation parameters are given in Table 1.
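To make the simulated configuration concrete, the following sketch computes the number of bits per GenSM symbol for $N = 4$ and $N_{\mathrm{a}} = 2$ and maps one bit vector to a transmitted symbol. Several choices are assumptions rather than details from the paper: the PAM order $Q = 4$, the intensity levels $2Iq/(Q+1)$, the ordering of the bit vector as PAM bits followed by index bits, the natural binary labelling of the PAM levels, and the use of the first $2^{S_2}$ activation patterns in lexicographic order. The function names are hypothetical.

```python
from itertools import combinations
from math import comb, floor, log2


def bits_per_symbol(N, Na, Q):
    """Return (S1, S2, S): PAM bits, index bits, and total bits per symbol."""
    S1 = Na * int(log2(Q))
    S2 = floor(log2(comb(N, Na)))
    return S1, S2, S1 + S2


def to_int(bits):
    """Interpret a list of bits (MSB first) as an integer; empty list gives 0."""
    value = 0
    for b in bits:
        value = 2 * value + b
    return value


def gensm_map(bits, N, Na, Q, I=1.0):
    """Map S information bits to a length-N GenSM symbol (assumed mapping)."""
    S1, S2, S = bits_per_symbol(N, Na, Q)
    if len(bits) != S:
        raise ValueError(f"expected {S} bits, got {len(bits)}")
    b1, b2 = bits[:S1], bits[S1:]            # ASSUMPTION: PAM bits come first
    patterns = list(combinations(range(N), Na))[: 2 ** S2]
    active = patterns[to_int(b2)]            # indices of the activated LEDs
    m = int(log2(Q))                         # bits per PAM symbol
    x = [0.0] * N
    for p, n in enumerate(active):
        q = to_int(b1[p * m:(p + 1) * m]) + 1    # level index q in 1..Q
        x[n] = 2.0 * I * q / (Q + 1)             # ASSUMPTION: level 2Iq/(Q+1)
    return x


print(bits_per_symbol(4, 2, 4))
print(gensm_map([0, 1, 1, 0, 1, 0], N=4, Na=2, Q=4))
```

For $N = 4$ and $N_{\mathrm{a}} = 2$ there are $\binom{4}{2} = 6$ activation patterns, of which only $2^{S_2} = 4$ are used, so two patterns remain idle under this mapping.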

#### 4.1 MSE loss analyses

First, we analyze the MSE loss of the DNN-GenSM detector. The MSE losses at different training SNRs are illustrated in Fig. 4. It can be seen from the figure that the MSE loss decreases rapidly as training proceeds. It can also be seen that the DNN-GenSM detector trained at a higher SNR has a much lower MSE loss. In order to obtain stable parameter estimates, different SNR levels may require different numbers of iterations. The results show that the DNN-GenSM detector provides satisfactory detection performance within 50 training epochs, which enables fast deployment of the proposed detector.

#### 4.2 Bit error rate (BER) analyses

Then, we analyze the BER performance of the proposed DNN-GenSM scheme. The joint ML detection scheme [8], and the two-step detection schemes using ZF equalization and minimum mean square error (MMSE) equalization [11] are also simulated as benchmarks. The DNN-GenSM detection scheme is trained for $50$ epochs at the SNR of $40$ dB, $50$ dB, and $60$ dB, respectively. The simulation results are provided in Fig. 5.

It can be seen from Fig. 5 that the BERs decrease as the SNR increases for all the simulated detectors. Among the simulated detectors, the joint ML detector enjoys the best BER performance, and the two-step detector with ZF equalization has the worst BER performance. The two-step detector with MMSE equalization obtains slightly better BER performance than that with ZF equalization, mainly because the MMSE algorithm exploits knowledge of the noise variance. The proposed DNN-GenSM detector enjoys near-ML BER performance, which is much better than that of the two-step detectors. This is because, by using the DNN structure, the proposed DNN-GenSM detector is able to extract the bit information of the PAM symbols and the indices of the activated LEDs from the received signals efficiently. It can also be seen that the DNN-GenSM detectors trained at the SNRs of $40$ dB, $50$ dB, and $60$ dB each exhibit their best BER performance at the corresponding testing SNR. The DNN-GenSM detector trained at the SNR of $40$ dB achieves better BER performance in the low-to-medium SNR region, because it is much easier for the neural network to learn the statistics of the noise at a low training SNR. On the other hand, the detector trained at $\mathrm {SNR} = 60$ dB can more easily learn the statistics of the data symbols, since the impact of the noise is less severe. The detector trained at the SNR of $50$ dB captures both the statistics of the noise and those of the data symbols. Therefore, it achieves satisfactory BER performance over the whole simulated SNR region.

#### 4.3 Computational complexity analyses

Finally, we investigate the computational complexity of the proposed DNN-GenSM detector. We consider the complexity of the online detection process, and the complexity of the offline training process is not taken into account. This is because, once the detector has been trained, it can perform signal detection for a long time without further retraining. The complexity is measured by the number of floating-point operations (flops), including addition, subtraction, multiplication, division, and other operations [19]. The equalization layer takes $2M^2$ flops. The three hidden layers take $2MF_1$, $2F_1 F_2$, and $2 F_2 F_3$ flops, respectively. The output layer takes $2F_3 S$ flops. Finally, the hard decision layer takes $S$ flops.

The total complexities of the DNN-GenSM detector and the joint ML detector are summarized in Table 2. The complexity of the proposed DNN-GenSM detector and that of the state-of-the-art detectors with different PAM levels $Q$ are further compared in Fig. 6. The figure shows that the two-step detector with ZF enjoys the lowest computational complexity, and that the complexity of the two-step detector with MMSE is slightly higher. The proposed DNN-GenSM detector has slightly higher computational complexity than the two-step detectors, but much lower complexity than the joint ML detector, especially at high PAM levels. For instance, when $16$-PAM is adopted, the joint ML detector needs about $4.5 \times 10^4$ flops, whereas the proposed DNN-GenSM detector needs only $2.0 \times 10^3$ flops. This verifies that the proposed detector can achieve near-ML error performance with greatly reduced complexity.
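The per-layer flop counts stated above can be added up with a short helper. The layer widths used in the example ($F_1 = F_2 = F_3 = 16$) are hypothetical, since the values of Table 1 are not reproduced here, so the resulting total is illustrative and is not meant to match the figure quoted for the simulated network. The value $S = 10$ follows from $N = 4$, $N_{\mathrm{a}} = 2$, and $16$-PAM.

```python
def dnn_gensm_flops(M, F1, F2, F3, S):
    """Sum the per-layer flop counts given in the text for online detection."""
    counts = {
        "equalization": 2 * M * M,
        "hidden_1": 2 * M * F1,
        "hidden_2": 2 * F1 * F2,
        "hidden_3": 2 * F2 * F3,
        "output": 2 * F3 * S,
        "hard_decision": S,
    }
    counts["total"] = sum(counts.values())
    return counts


# Hypothetical layer widths; S = 2*log2(16) + floor(log2(C(4,2))) = 8 + 2 = 10.
for name, value in dnn_gensm_flops(M=4, F1=16, F2=16, F3=16, S=10).items():
    print(f"{name:>14}: {value}")
```

In this sketch the PAM order enters only through $S$, that is, through the output and hard decision layers, so the total grows with $\log_2 Q$ rather than with the number of candidate symbols.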

## 5. Conclusion

In this paper, a deep learning-based detection scheme, termed DNN-GenSM, has been proposed for VLC systems with GenSM. In the proposed DNN-GenSM detector, a deep neural network consisting of several neural layers is used to extract the information bits from the received signals effectively and efficiently. The superior performance of the proposed DNN-GenSM detector has been verified through computer simulations. The results indicate that the proposed deep learning-based detection framework can serve as a promising candidate for signal detection in multi-LED VLC systems.

## Funding

National Natural Science Foundation of China (61871255); Natural Science Foundation of Guangdong Province (2015A030312006); National Key Research and Development Program of China (2017YFE0113300); Fok Ying Tung Education Foundation.

## Disclosures

The authors declare no conflicts of interest.

## References

**1.** H. Haas, L. Yin, Y. Wang, and C. Chen, “What is LiFi?” J. Lightwave Technol. **34**(6), 1533–1544 (2016).

**2.** D. Tsonev, S. Videv, and H. Haas, “Towards a 100 Gb/s visible light wireless access network,” Opt. Express **23**(2), 1627–1637 (2015).

**3.** Y. Chen, A. E. Kelly, and J. H. Marsh, “Improvement of indoor VLC network downlink scheduling and resource allocation,” Opt. Express **24**(23), 26838–26850 (2016).

**4.** C. R. Kumar and R. K. Jeyachitra, “Power efficient generalized spatial modulation MIMO for indoor visible light communications,” IEEE Photonics Technol. Lett. **29**(11), 921–924 (2017).

**5.** J. Wang, J. Dai, R. Guan, L. Jia, Y. Wang, and M. Chen, “Channel capacity and receiver deployment optimization for multi-input multi-output visible light communications,” Opt. Express **24**(12), 13060–13074 (2016).

**6.** M. D. Renzo, H. Haas, A. Ghrayeb, S. Sugiura, and L. Hanzo, “Spatial modulation for generalized MIMO: Challenges, opportunities, and implementation,” Proc. IEEE **102**(1), 56–103 (2014).

**7.** T. Fath and H. Haas, “Performance comparison of MIMO techniques for optical wireless communications in indoor environments,” IEEE Trans. Commun. **61**(2), 733–742 (2013).

**8.** T. Ozbilgin and M. Koca, “Optical spatial modulation over atmospheric turbulence channels,” J. Lightwave Technol. **33**(11), 2313–2323 (2015).

**9.** T. Wang, F. Yang, L. Cheng, and J. Song, “Spectral-efficient generalized spatial modulation based hybrid dimming scheme with LACO-OFDM in VLC,” IEEE Access **6**, 41153–41162 (2018).

**10.** K. Xu, H. Yu, and Y. Zhu, “Channel-adapted spatial modulation for massive MIMO visible light communications,” IEEE Photonics Technol. Lett. **28**(23), 2693–2696 (2016).

**11.** C. He, T. Thomas, Q. Wang, and J. Armstrong, “Performance comparison between spatial multiplexing and spatial modulation in indoor MIMO visible light communication systems,” in IEEE International Conference on Communications (ICC), (2016).

**12.** Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature **521**(7553), 436–444 (2015).

**13.** J. Li, M. Zhang, D. Wang, S. Wu, and Y. Zhan, “Joint atmospheric turbulence detection and adaptive demodulation technique using the CNN for the OAM-FSO communication,” Opt. Express **26**(8), 10494–10508 (2018).

**14.** X. Lu, C. Lu, W. Yu, L. Qiao, S. Liang, A. P. T. Lau, and N. Chi, “Memory-controlled deep LSTM neural network post-equalizer used in high-speed PAM VLC system,” Opt. Express **27**(5), 7822–7833 (2019).

**15.** M. Le-Tran and S. Kim, “Deep learning-based collaborative constellation design for visible light communication,” IEEE Commun. Lett. (to be published).

**16.** T. Wang, F. Yang, C. Pan, L. Cheng, and J. Song, “Spectral-efficient hybrid dimming scheme for indoor visible light communication: A subcarrier index modulation based approach,” J. Lightwave Technol. **37**(23), 5756–5765 (2019).

**17.** P. Chvojka, S. Zvanovec, P. A. Haigh, and Z. Ghassemlooy, “Channel characteristics of visible light communications within dynamic indoor environment,” J. Lightwave Technol. **33**(9), 1719–1725 (2015).

**18.** A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” in *Neural Information Processing Systems (NIPS)*, (2017).

**19.** T. Wang, S. Liu, F. Yang, J. Wang, J. Song, and Z. Han, “Generalized spatial modulation-based multi-user and signal detection scheme for terrestrial return channel with NOMA,” IEEE Trans. Broadcast. **64**(2), 211–219 (2018).