Optica Publishing Group

Design and experimental demonstration of underwater wireless optical communication system based on semantic communication paradigm

Open Access

Abstract

Underwater wireless optical communication (UWOC) has been widely studied as a key technology for ocean exploration and exploitation. However, current UWOC systems neglect the semantic information of transmitted symbols, leading to unnecessary consumption of communication resources on non-essential data. In this paper, we propose and demonstrate a deep-learning-based underwater wireless optical semantic communication (UWOSC) system for image transmission. By utilizing a deep residual convolutional neural network, semantic information can be extracted and mapped into the transmitted symbols. Moreover, we design a channel model based on a long short-term memory network and employ a two-phase training strategy to ensure that the system matches the underwater channel. To evaluate the performance of the proposed UWOSC system, we conduct a series of experiments on an emulated UWOC experimental platform, in which the effects of different turbidity levels and bandwidth compression ratios are investigated. Experimental results show that the UWOSC system outperforms conventional communication schemes, particularly in challenging channel environments and at low bandwidth compression ratios.

© 2024 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Underwater wireless communication technology is developing rapidly with the increasing exploration and exploitation of the ocean [1]. Compared to the limited data rate of underwater acoustic communication and the significant attenuation of radio frequency communication, underwater wireless optical communication (UWOC) utilizes blue/green optical waves in the underwater transparent "window" (450–550 nm) to achieve high-speed data transmission over a medium range, thus attracting immense attention [2,3].

Recently, various advanced communication techniques have been studied to improve data rates, increase transmission distance, and enhance robustness to complex underwater environments in UWOC systems. For instance, Chen et al. used a 520-nm laser diode (LD) and 32 quadrature amplitude modulation (32-QAM) single-carrier signals to implement a 56-m/3.31-Gbps UWOC system [4]. Zhang et al. investigated the performance of UWOC systems with low-density parity-check (LDPC) codes as channel coding in various seawater environments [5]. Shi et al. developed a data-driven neural network-based self-equalization model for dynamic pre-equalization of signals in different bandwidth-limited and nonlinearity cases [6]. To the best of our knowledge, previous studies have employed techniques that gradually approach the Shannon limit. However, transmitting the vast amount of data generated by sensors, hydrophones, and cameras in the underwater internet of things requires even higher data rates [7], which poses a significant challenge to the conventional communication paradigm.

Semantic communication (SC), as a new communication paradigm, concentrates on how to precisely convey the desired meaning behind the transmitted symbols, instead of merely focusing on how to accurately transmit the symbols themselves [8]. Compared to conventional communication, SC effectively extracts semantic information from the source based on the specific communication task, while also sensing and matching the channel environment. Semantic information refers to the meaningful information that captures the essence of the world [9]. By intelligently allocating communication resources according to the channel environment and the importance of semantic information, SC is expected to significantly enhance the efficiency and robustness of the communication process [10]. Based on advanced artificial intelligence (AI) techniques, SC systems for text, image, and video transmission have been designed [11–14]. These studies demonstrate that SC can transmit minimal data while maintaining semantic integrity and remains robust even under challenging channel conditions. However, current SC systems are designed for ideal transmission scenarios, disregarding the impacts of real channel environments and imperfect devices, which compromises their practicality in real-world applications. Moreover, the performance of SC systems has mainly been demonstrated by simulation; experimental verification is lacking.

In this paper, we design a novel end-to-end underwater wireless optical semantic communication (UWOSC) system for underwater image transmission based on the SC paradigm. Specifically, we design a deep residual convolutional neural network-based transceiver and a long short-term memory-based underwater wireless optical channel model. Additionally, a two-phase training strategy is employed to optimize the system. Subsequently, we evaluate the transmission performance of the proposed UWOSC system through experiments conducted on an emulated UWOC experimental platform. Experimental results show that the UWOSC system generalizes well to untrained channel environments. Compared to conventional communication schemes, UWOSC demonstrates enhanced robustness in extremely harsh environments, which is essential for maintaining transmission stability in open and complex underwater environments. Moreover, the proposed UWOSC system achieves higher-quality transmission while consuming the same communication resources, and its advantage is particularly evident when communication resources are limited.

The rest of the paper is organized as follows. In Section 2, we first describe the UWOSC system model, then introduce the neural network (NN) architectures of the transceiver and channel model, and finally propose a two-phase training strategy. Section 3 introduces the experimental platform and setup. The results and discussion are presented in Section 4, followed by the conclusion in Section 5.

2. Principles

2.1 System model

Given the immense potential of deep learning (DL) techniques in handling intelligent tasks, we formulate the whole UWOSC system as a DL-based end-to-end framework, which consists of a transmitter, an underwater wireless optical channel, and a receiver, as shown in Fig. 1. The transceiver’s modules are all implemented by NNs. We utilize a data-driven NN-based channel model to simulate the channel response, which allows the transceiver NNs to propagate gradients and thus update their weights. In particular, we assume that the system transmits an underwater image ${S} \in {R}^{h \times w \times n}$, where $h$, $w$, and $n$ represent the height, width, and number of channels of the image, respectively.


Fig. 1. The framework of proposed DL-based UWOSC system composed of a transmitter, underwater wireless optical channel, and a receiver.


At the transmitter, the semantic encoder module obtains the spatial topology of the input image, extracts semantic information and maps it into symbols ${Y} \in {R}^{o}$, where $o$ denotes the number of symbols. The symbols ${Y}$ can be represented as:

$${Y}={f}_{{\varphi}}\left({S}\right)$$
where ${f}_{{\varphi }}(\cdot )$ is the semantic encoder function with parameters ${\varphi }$. Notably, the bandwidth compression ratio (BCR) is defined as $\frac {o}{h \times w \times n}$, where $o$ represents the channel bandwidth and ${h \times w \times n}$ represents the source bandwidth [12]. A smaller BCR means that fewer symbols are transmitted for the same source. In the up sample module, the symbols are first resampled and pulse-shaped to obtain samples ${\tilde {Z}} \in {R}^{L}$, where $L = o \times m - m + l$ denotes the number of samples, and $m$ and $l$ denote the number of resampled points and the length of the pulse-shaping filter, respectively. Then, to satisfy the average transmit power constraint, the samples are power-normalized according to:
$${{Z}}=\sqrt{{L} P}\frac{{\tilde{Z}}}{\sqrt{{\tilde{Z}}^*{\tilde{Z}}}}$$
where ${\tilde {Z}}^*$ is the transpose of ${\tilde {Z}}$ and $P$ is the average power constraint.
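As a sanity check on the up sample module, the sample count $L = o \times m - m + l$ and the power constraint of Eq. (2) can be reproduced in a few lines of NumPy. The symbol count, resampling factor, pulse-shaping filter, and $P$ below are illustrative choices, not the paper's configuration:

```python
import numpy as np

# Illustrative sizes (not from the paper): o symbols, m-fold resampling,
# and an l-tap pulse-shaping filter.
o, m, l, P = 64, 4, 17, 1.0

rng = np.random.default_rng(0)
Y = rng.uniform(-1.0, 1.0, size=o)      # symbols from the semantic encoder

# Zero-insertion resampling: place each symbol m samples apart.
up = np.zeros((o - 1) * m + 1)
up[::m] = Y

# Pulse shaping by full convolution with an l-tap filter (a Hann window
# stands in for the unspecified pulse shape).
g = np.hanning(l)
Z_tilde = np.convolve(up, g)            # length L = o*m - m + l
L = o * m - m + l

# Eq. (2): scale so that the average sample power equals P.
Z = np.sqrt(L * P) * Z_tilde / np.sqrt(Z_tilde @ Z_tilde)
print(np.mean(Z ** 2))                  # ≈ P
```

The `((o - 1) * m + 1)`-point zero-stuffed sequence convolved in "full" mode with an $l$-tap filter yields exactly $o \times m - m + l$ samples, matching the stated $L$.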

The power-normalized samples ${Z}$ are transmitted through the underwater wireless optical channel. The absorption and scattering effects of the marine environment, together with imperfect photoelectric devices, severely impair the optical signal [15]. We assume the impaired samples obtained at the receiver are ${\hat {Z}}\in {R}^{L}$.

At the receiver, the down sample module first convolves the received samples ${\hat {Z}}$ with a matched filter, and then downsamples them to obtain symbols ${\hat {Y}} \in {R}^{o}$. Afterwards, the semantic decoder module recovers the semantic information, which is utilized to reconstruct the image ${\hat {S}} \in {R}^{h \times w \times n}$. The reconstructed image can be represented as:

$${\hat{S}}={f}_{{\psi}}\left({\hat{Y}}\right)$$
where ${f}_{{\psi }}(\cdot )$ is the semantic decoder function with parameters ${\psi }$.
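The matched filtering and downsampling in the down sample module can be sketched in NumPy as follows; the pulse shape and group-delay handling are illustrative assumptions, since the paper does not specify the filter:

```python
import numpy as np

# Illustrative sizes (not from the paper): o symbols, m-fold resampling,
# and an l-tap pulse-shaping filter.
o, m, l = 64, 4, 17
rng = np.random.default_rng(1)

Z_hat = rng.normal(size=o * m - m + l)   # received (channel-impaired) samples
g = np.hanning(l)                        # assumed pulse shape

# Matched filtering (time-reversed filter), then m-fold downsampling.
filtered = np.convolve(Z_hat, g[::-1])   # length L + l - 1
delay = l - 1                            # peak of the filter cascade g * g[::-1]
Y_hat = filtered[delay::m][:o]           # one recovered symbol per m samples
```

Sampling at offset `l - 1` compensates the combined delay of the shaping and matched filters, so the $k$-th output aligns with the $k$-th transmitted symbol instant.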

The task of the proposed UWOSC system is to reconstruct images in real underwater channel environments. First, an accurate channel model needs to be established. Assume that real transmitted samples $d$ and received samples $\hat {d}$ are collected from the UWOC system platform. The NN-based channel model can be trained by using the following mean squared error (MSE) loss function:

$${L}_1=\frac{1}{N}\sum_{j=1}^{N}\left({{f}_{{\phi}}\left({d}\right)}_j-{{\hat{d}}}_j\right)^2$$
where ${f}_{{\phi }}(\cdot )$ is the underwater wireless optical channel response, characterized by parameters ${\phi }$; therefore, ${{f}_{{\phi }}\left ({d}\right )}$ represents the predicted samples. $N$ denotes the number of samples involved in each computation. Afterwards, the transmitter and receiver are jointly optimized based on the trained channel model. It is worth emphasizing that during the training phase of the transceiver NNs, the receiver processes the samples predicted by the channel model; when the transceiver NNs are deployed in the UWOSC system, the receiver processes the actual received samples. In the training phase, the MSE between the original image and the reconstructed image is used as the loss function, denoted as:
$${L}_2=\frac{1}{h \times w \times n}\sum_{j=1}^{h \times w \times n}\left({S}_j-\hat{{S}}_j\right)^2$$

2.2 System design

2.2.1 Transceiver model design

Considering the superiority of convolutional neural networks (CNNs) in the field of image processing, we utilize deep CNNs for semantic encoding and semantic decoding. The up sample module and the down sample module are implemented with the same DL library. The framework diagrams of the NN-based transmitter and receiver are illustrated in Fig. 2(a) and (d). To extract the image’s semantic information, we design the encoder block comprising a convolution layer, a generalized divisive normalization (GDN) layer, and an activation function layer, as shown in Fig. 2(b). The convolution layer is parameterized by ${k}\times {k}\times {c}\mid {s}\mid {p}$, where $k$, $c$, $s$, and $p$ are the kernel size, number of kernels, stride, and padding size, respectively. The GDN layer is a normalization layer appropriate for image reconstruction [16]. The activation function layer introduces a nonlinear transformation to the NN. Based on the encoder block, we design the encoder residual block shown in Fig. 2(c) to extract more comprehensive information without encountering the issue of gradient vanishing. Since the semantic decoder module realizes the inverse function of the semantic encoder module, the decoder block and decoder residual block shown in Fig. 2(e) and (f) invert the operations of the encoder block and encoder residual block. Specifically, the IGDN layer and transpose convolution layer are employed as replacements for the GDN layer and convolution layer, respectively.
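As an illustration, a minimal PyTorch sketch of the encoder block and encoder residual block might look as follows. The GDN layer here is a simplified stand-in for the full formulation of [16], and all layer sizes are hypothetical rather than the paper's Table 1 configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGDN(nn.Module):
    """Simplified GDN: y_i = x_i / sqrt(beta_i + sum_j gamma_ij * x_j^2).
    A stand-in for the fully parameterized GDN of [16]."""
    def __init__(self, channels):
        super().__init__()
        self.beta = nn.Parameter(torch.ones(channels))
        self.gamma = nn.Parameter(
            0.1 * torch.eye(channels).view(channels, channels, 1, 1))

    def forward(self, x):
        # The divisive denominator is a 1x1 convolution over squared activations.
        norm = F.conv2d(x ** 2, self.gamma.abs(), self.beta.abs())
        return x / torch.sqrt(norm + 1e-6)

class EncoderBlock(nn.Module):
    """Conv (k x k x c | s | p) -> GDN -> activation, as in Fig. 2(b)."""
    def __init__(self, in_ch, out_ch, k=3, s=2, p=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=k, stride=s, padding=p)
        self.gdn = SimpleGDN(out_ch)
        self.act = nn.PReLU()

    def forward(self, x):
        return self.act(self.gdn(self.conv(x)))

class EncoderResidualBlock(nn.Module):
    """Two stride-1 encoder blocks with a skip connection (Fig. 2(c) sketch)."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            EncoderBlock(ch, ch, s=1), EncoderBlock(ch, ch, s=1))

    def forward(self, x):
        return x + self.body(x)

x = torch.randn(2, 3, 64, 64)            # a batch of 64x64 RGB patches
z = EncoderBlock(3, 16)(x)               # stride 2 halves the spatial size
r = EncoderResidualBlock(16)(z)          # shape-preserving residual block
```

The decoder-side blocks would mirror this structure with `nn.ConvTranspose2d` and an inverse GDN, per Fig. 2(e) and (f).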


Fig. 2. (a) Overview of the transmitter NN architecture. (b) The architecture of the encoder block. Conv: convolution layer; GDN: generalized divisive normalization layer. (c) The architecture of the encoder residual block. (d) Overview of the receiver NN architecture. (e) The architecture of the decoder block. TransConv: transpose convolution layer; IGDN: inverse generalized divisive normalization layer. (f) The architecture of the decoder residual block.


At the transmitter, the input image is first pixel-normalized, ensuring that each pixel value lies in $\left [0,1\right ]$. After a sequence of encoder blocks and encoder residual blocks, the image’s semantic information is extracted and transformed into symbols. The number of kernels of the last encoder block determines the BCR. Note that the activation function of the last block is hardtanh, which constrains the amplitude of the symbols within the range $\left [-1,1\right ]$ to mitigate an excessive peak-to-average power ratio. The activation function employed in the other blocks is the parametric rectified linear unit (PReLU), which enhances the generalization capability of the NN and expedites convergence [17]. Then, the reshape layer reshapes the symbols into a one-dimensional vector. The subsequent up sample module converts it into power-normalized samples, which are transmitted over the channel.

At the receiver, the received samples are first transformed into symbols through matched filtering and downsampling in the down sample module. Subsequently, the symbols are reshaped into a three-dimensional matrix, which is processed by the decoder blocks and decoder residual blocks to reconstruct the image. The activation function utilized in all blocks, except the last one, is PReLU. To limit the value of each pixel to $\left [0,1\right ]$, a sigmoid function layer is employed in the final block. Finally, the pixel denormalization layer rescales the pixel values back to their initial range.

2.2.2 Channel model design

For UWOC systems, the optical signals can be regarded as time-series data. Long short-term memory (LSTM), as a prototypical recurrent neural network, exhibits exceptional efficacy in handling long temporal sequences and effectively addresses the issues of gradient vanishing and exploding [18]. The structure of an LSTM unit at time $t$ is shown in Fig. 3, which contains a forget gate $F_{t}$, an input gate $I_{t}$, an output gate $O_{t}$, and a temporal cell state $\hat {C_{t}}$. The gates and the temporal cell state serve as control units that selectively pass valuable information while filtering out irrelevant information. Suppose the input of an LSTM unit includes the previous cell state $C_{t-1}$, the previous hidden memory $H_{t-1}$, and the current input $D_{t}$. The current cell state $C_{t}$ and the current hidden memory $H_{t}$ can be obtained according to the following equations [19]:

$$F_{t}=sigmoid(w_{F}\cdot[H_{t-1},D_{t}]+b_{F})$$
$$I_{t}=sigmoid(w_{I}\cdot[H_{t-1},D_{t}]+b_{I})$$
$$O_{t}=sigmoid(w_{O}\cdot[H_{t-1},D_{t}]+b_{O})$$
$$\hat{C_{t}}={{tanh}}(w_{C}\cdot[H_{t-1},D_{t}]+b_{C})$$
$$C_{t}=F_{t}\odot C_{t-1}+I_{t}\odot \hat{C_{t}}$$
$$H_{t}=O_{t}\odot tanh(C_{t})$$
where $w_{F}$, $w_{I}$, $w_{O}$, $w_{C}$, $b_{F}$, $b_{I}$, $b_{O}$, and $b_{C}$ denote the weight matrices and bias vectors of the forget gate, input gate, output gate, and temporal cell state, respectively. The $\odot$ operator and the $\cdot$ operator denote element-wise multiplication and matrix multiplication, respectively.
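The gate updates of Eqs. (6)–(11) can be traced directly in NumPy; the dimensions and random weights below are purely illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(D_t, H_prev, C_prev, w, b):
    """One LSTM unit update, Eqs. (6)-(11); each w[g] has shape (n_h, n_h + n_d)."""
    x = np.concatenate([H_prev, D_t])        # concatenation [H_{t-1}, D_t]
    F_t = sigmoid(w["F"] @ x + b["F"])       # forget gate, Eq. (6)
    I_t = sigmoid(w["I"] @ x + b["I"])       # input gate, Eq. (7)
    O_t = sigmoid(w["O"] @ x + b["O"])       # output gate, Eq. (8)
    C_hat = np.tanh(w["C"] @ x + b["C"])     # temporal cell state, Eq. (9)
    C_t = F_t * C_prev + I_t * C_hat         # element-wise products, Eq. (10)
    H_t = O_t * np.tanh(C_t)                 # Eq. (11)
    return H_t, C_t

# Toy dimensions (illustrative only): 3 hidden units, 2-dimensional input.
n_h, n_d = 3, 2
rng = np.random.default_rng(0)
w = {g: rng.normal(scale=0.5, size=(n_h, n_h + n_d)) for g in "FIOC"}
b = {g: np.zeros(n_h) for g in "FIOC"}
H, C = np.zeros(n_h), np.zeros(n_h)
for D in rng.normal(size=(5, n_d)):          # run 5 time steps
    H, C = lstm_step(D, H, C, w, b)
```

Since each gate output lies in $(0,1)$ and $\tanh$ is bounded by 1, the hidden memory $H_t$ always stays inside $(-1,1)$, which is part of what makes the recurrence numerically stable over long sequences.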


Fig. 3. The structure of an LSTM unit at time t.


Motivated by the impressive performance of LSTM, we employ it to model the underwater wireless optical channel and simulate the propagation of optical signals. The framework diagram of the channel model, depicted in Fig. 4, is composed of an input layer, an LSTM module, a fully connected module, and an output layer. $T$ time-step optical samples are sent to the network as input vectors. The LSTM module consists of multiple layers, each containing $T$ LSTM units. The fully connected module comprises multiple fully connected layers, each equipped with a leaky rectified linear activation function to improve its nonlinear capability. After passing through these modules, the corresponding optical samples are output in time order.
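A minimal PyTorch sketch of this architecture might look as follows; the hidden size, number of layers, and fully connected widths are illustrative placeholders, not the configuration reported in Table 1:

```python
import torch
import torch.nn as nn

class LSTMChannelModel(nn.Module):
    """Sketch of the Fig. 4 channel model: stacked LSTM layers followed by
    fully connected layers with LeakyReLU. All sizes are illustrative."""
    def __init__(self, hidden=64, lstm_layers=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden,
                            num_layers=lstm_layers, batch_first=True)
        self.fc = nn.Sequential(
            nn.Linear(hidden, hidden), nn.LeakyReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        # x: (batch, T, 1) transmitted samples -> (batch, T, 1) predicted
        # channel-impaired samples, emitted in time order.
        h, _ = self.lstm(x)
        return self.fc(h)

model = LSTMChannelModel()
d = torch.randn(8, 100, 1)       # a batch of T = 100-step sample sequences
d_hat = model(d)                 # predicted received samples, same shape
```

Because the whole model is differentiable, gradients from the receiver loss can flow back through it to the transmitter NNs, which is exactly what the end-to-end training in Section 2.2.3 requires.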


Fig. 4. Framework diagram of the LSTM-based channel model with input layer, LSTM module, fully connected module and output layer.


2.2.3 Training algorithm

The proposed UWOSC system is trained with the objective of optimizing communication resource utilization by effectively transmitting and receiving the image’s semantic information, while ensuring robustness against channel impairments. To this end, the NN-based transmitter, channel, and receiver are jointly trained, and different channel conditions are considered. We apply a two-phase training approach to optimize the UWOSC system, as depicted in Algorithm 1.

In the first phase, the LSTM networks employed for modeling the channels are trained. To characterize the channels under different conditions, each channel condition corresponds to one NN. Suppose a total of $m$ channel conditions are considered. For each network, the following operations are performed: First, the NN parameters are initialized. Then, we build the dataset, which consists of randomly generated samples and the corresponding received channel-impaired samples, and divide it into a training set and a validation set. Afterwards, a minibatch of the training set is fed into the NN, which generates predicted samples. The loss is calculated based on Eq. (4), and the NN parameters are updated iteratively using the adaptive moment estimation (ADAM) optimization method [20]. The training process continues until the maximum number of iterations is reached.

In the second phase, the NN-based transceiver and channel are trained jointly. It is worth mentioning that the LSTM networks trained in the first phase are frozen in this phase, i.e., their parameters remain unaltered. To enable the transceiver to perform semantic encoding and decoding efficiently in different channel environments, the following operations are carried out: First, the NN parameters of the transmitter and receiver are initialized. Then, a batch of images is processed by the semantic encoder module and up sample module of the transmitter to obtain time-series samples. Subsequently, we randomly select one network from the $m$ trained channel networks. The received channel-impaired samples are predicted by the selected channel network. Then, the down sample module and semantic decoder module of the receiver utilize the predicted samples to reconstruct an estimate of the original image. Finally, the NN-based transceiver is optimized by the ADAM method, where the loss is computed according to Eq. (5). Similarly, the training process continues until the maximum number of iterations is reached.
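The two-phase procedure can be summarized in a compact PyTorch sketch; the tiny stand-in networks, synthetic channel data, and hyperparameters below are illustrative only, not the paper's models or settings:

```python
import torch
import torch.nn as nn

# Tiny stand-in networks (illustrative only): any modules with compatible
# input/output shapes would serve for this sketch.
transmitter = nn.Sequential(nn.Flatten(), nn.Linear(64, 32))   # image -> samples
receiver = nn.Sequential(nn.Linear(32, 64), nn.Sigmoid())      # samples -> image
channels = [nn.Linear(32, 32) for _ in range(4)]               # m = 4 channel NNs
mse = nn.MSELoss()

# Phase 1: train each channel NN on (transmitted, received) pairs, Eq. (4).
for ch in channels:
    opt = torch.optim.Adam(ch.parameters(), lr=1e-3)
    for _ in range(50):
        d = torch.randn(16, 32)                        # stand-in transmitted samples
        d_hat = 0.8 * d + 0.05 * torch.randn_like(d)   # synthetic "measured" samples
        opt.zero_grad()
        loss = mse(ch(d), d_hat)
        loss.backward()
        opt.step()
    for p in ch.parameters():                          # freeze for phase 2
        p.requires_grad_(False)

# Phase 2: jointly train the transceiver through a randomly drawn frozen channel.
opt = torch.optim.Adam(
    [*transmitter.parameters(), *receiver.parameters()], lr=1e-4)
for _ in range(100):
    S = torch.rand(16, 1, 8, 8)                        # stand-in image batch
    Z = transmitter(S)                                 # encode + sample
    ch = channels[torch.randint(len(channels), ()).item()]
    S_hat = receiver(ch(Z)).view_as(S)                 # decode predicted samples
    opt.zero_grad()
    loss = mse(S_hat, S)                               # Eq. (5)
    loss.backward()
    opt.step()
```

Freezing the channel NNs in phase 2 keeps the learned channel response fixed while still letting gradients flow through it to the transmitter, mirroring the description above.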

3. Experimental setup

Figure 5 shows the block diagram and experimental setup for comparing the image transmission performance of the proposed UWOSC system with that of the conventional UWOC system. At the transmitter, the same number of transmitted samples generated by the two different schemes are loaded into an arbitrary waveform generator (AWG) operating at a sampling rate of 50 kSa/s. The signals and the DC bias are combined through a bias tee to drive the 450 nm blue LD with a typical output power of 1600 mW. The LD is equipped with a lens that focuses the light. To emulate the underwater environment, we use a $0.1\times 0.2\times 1$ $m^3$ water tank wrapped with black stickers. The black stickers reduce the interference of light reflected from the side walls. At the receiver, the 450 nm beam is captured and converted into electrical signals by a commercially available avalanche photodiode (APD). A 3.5 GHz bandwidth oscilloscope with a sampling rate of 1 GSa/s is used to store the signals. Then the digital signals are processed by the corresponding schemes.


Fig. 5. Block diagram and experimental setup for emulated underwater wireless optical communication



Algorithm 1. Training of the whole UWOSC system.

The experimental purpose is to compare the transmission performance of the two schemes in underwater environments with varying turbidity levels and under different BCRs. We first add 10 liters of pure water to the water tank. A total of 10 different turbidity levels are emulated by adding 10 mg of ${Mg(OH)}_2$ to the water each time. For each channel environment, we consider three cases with BCR values of 1/8, 1/12, and 1/16. The performance is evaluated by calculating the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) [21]. A higher PSNR represents a smaller MSE of corresponding pixels in the two images. A higher SSIM indicates greater similarity between images in terms of brightness, contrast, and structure.
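The PSNR metric reduces to a one-line formula over the per-pixel MSE; a minimal NumPy sketch (with a purely illustrative noisy image) is:

```python
import numpy as np

def psnr(s, s_hat, max_val=1.0):
    """Peak signal-to-noise ratio in dB for images with pixels in [0, max_val]."""
    mse = np.mean((s - s_hat) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# Illustrative example: a random "image" and a lightly perturbed copy.
rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
noisy = np.clip(img + 0.01 * rng.normal(size=img.shape), 0.0, 1.0)
print(round(psnr(img, noisy), 1))   # higher dB means smaller per-pixel MSE
```

SSIM, by contrast, compares local luminance, contrast, and structure statistics between windows of the two images, which is why it tracks perceptual similarity more closely than the purely pixel-wise PSNR.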

With respect to the proposed UWOSC scheme, the NN structures and parameters of the UWOSC system are listed in Table 1. Before being deployed in a real communication system, the NN-based transceiver needs to be trained using the designed two-phase training method. First, we train the corresponding NNs for four different channel environments with impurity concentrations of 3, 4, 5, and 6 mg/L, respectively. For each channel environment, we collect data on the experimental platform to build the dataset, which consists of 1.2 million training samples and 60,000 validation samples. We set the initial learning rate to 0.08 and decrease it by a factor of 0.8 every 10 epochs. The batch size and the maximum number of training epochs are set to 300,000 and 50, respectively. Afterwards, the NN-based transceiver is trained. The adopted dataset is the enhancing underwater visual perception (EUVP) dataset [22], which consists of 16,930 underwater images captured by seven different cameras. The dataset is divided into a training set, a validation set, and a test set. Considering the inconsistent sizes of the dataset images, we randomly crop each image into a patch of size $64\times 64$. The learning rate, the batch size, and the maximum number of training epochs are set to $10^{-4}$, 200, and 400, respectively. After the training process, we randomly select an image from the test set and transmit it to evaluate the performance.


Table 1. Detailed NN structures and parameters of the UWOSC system

For comparison, we conduct experiments employing conventional communication schemes. At the transmitter, the image is encoded into bits by the most widely used image compression algorithm, Joint Photographic Experts Group (JPEG). We select the rate-1/2 LDPC code (a near-optimal channel code) for channel coding. 8-QAM, 16-QAM, and 32-QAM are used as the modulation formats. The number of modulation symbols is the channel bandwidth involved in calculating the BCR. Then, the symbols are resampled, pulse-shaped, and power-normalized in the same way as in the proposed UWOSC scheme. At the receiver, the image is reconstructed through the inverse processes corresponding to the transmitter.

4. Experimental results and discussion

Figure 6 shows the PSNR and SSIM performance of the proposed UWOSC scheme and the conventional communication schemes as a function of impurity concentration for BCR values of 1/8, 1/12, and 1/16, respectively. The UWOSC scheme performs well at the untrained impurity concentrations (i.e., when the impurity concentration is 0, 1, 2, 7, 8, or 9 mg/L), indicating that UWOSC is capable of operating efficiently in untrained channel environments. From Fig. 6(a) to (f), it can be observed that under the same BCR, the proposed UWOSC outperforms the conventional communication schemes in all channel environments. It is worth stating that the conventional schemes fail to convert the bits obtained through source decoding into a picture when the impurity concentration exceeds a certain threshold. The constellations at points A1, A2, A3, B1, B2, B3, C1, C2, and C3 correspond to the critical points at which the three conventional schemes can still reconstruct the images under the three BCRs, respectively. The results show that under the three BCRs, 32-QAM, 16-QAM, and 8-QAM become invalid when the impurity concentration reaches 4, 7, and 8 mg/L, respectively. This means that a conventional communication scheme with a lower modulation order is more robust to the channel environment. However, even with an impurity concentration as high as 9 mg/L, the UWOSC scheme operates effectively and reconstructs high-quality images under all three BCRs. The results demonstrate the superior adaptability of the UWOSC system to open and complex underwater environments.


Fig. 6. Transmission performance comparison of UWOSC with conventional communication schemes. PSNR versus impurity concentration when (a) BCR=1/8, (c) BCR=1/12 and (e) BCR=1/16. SSIM versus impurity concentration when (b) BCR=1/8, (d) BCR=1/12 and (f) BCR=1/16.


Furthermore, the smaller the BCR (i.e., the fewer symbols transmitted), the more obvious the superiority of the UWOSC scheme. To compare the performance of the different schemes, we select JPEG+LDPC+8QAM as the baseline scheme and calculate the relative performance improvement of the other schemes under the three BCRs when the impurity concentration is 0 mg/L. Table 2 summarizes the results. When the BCR is reduced successively from 1/8 to 1/12 and then to 1/16, the PSNR performance improvement of the UWOSC scheme increases from 11.59% to 20.48% and further to 26.92%. However, the performance improvement of the other conventional communication schemes is not significant. This result demonstrates the superior efficiency of the UWOSC scheme in compressing source redundancy compared to conventional communication schemes. Even with a limited number of transmitted symbols, UWOSC exhibits exceptional capability in reconstructing high-quality images. The performance improvement of the UWOSC scheme in terms of SSIM is more pronounced than in terms of PSNR. As the BCR decreases, the SSIM performance improvement of the UWOSC scheme grows from 13.97% to 31.32% and then to 51.90%. This demonstrates that the UWOSC scheme captures image features better than conventional communication schemes, thus effectively preserving the brightness, contrast, and structure of the image to match human visual perception.


Table 2. The performance improvement values of other communication schemes in terms of PSNR and SSIM compared to the baseline scheme (JPEG+LDPC+8QAM) under three BCR conditions when the impurity concentration is 0 mg/L.

Finally, Fig. 7 presents a visual comparison of the reconstructed images for both the conventional communication schemes and the UWOSC scheme under the three BCRs in channel environments with impurity concentrations of 0 and 9 mg/L. The PSNR and SSIM values are reported below each image. It is evident that a reduction in BCR leads to colored blocking artifacts in the images produced by the conventional schemes, while the visual quality of the images produced by the UWOSC scheme consistently remains excellent. When the impurity concentration reaches 9 mg/L, the conventional communication schemes fail to accomplish the communication task. However, the perceptual quality of the reconstructed images obtained by the UWOSC scheme is comparable to that obtained at an impurity concentration of 0 mg/L.


Fig. 7. Reconstructed images when the impurity concentrations are 0 and 9 mg/L. PSNR and SSIM are used to evaluate image quality.


The above experiments have verified the superiority of the UWOSC system. However, several directions remain to be investigated. First, the practicality of the UWOSC system is demonstrated by its successful operation in untrained channel environments. However, the current work is still at the laboratory stage, and deploying the NN-based transceiver on real marine devices holds significant research value. Second, thanks to the globally optimizable end-to-end architecture and the two-phase training strategy, the UWOSC system can robustly handle extremely harsh channel environments. However, we solely focus on the impact of water turbidity in this paper. It is worth investigating more intricate channel environments that encompass turbulence, intense ambient light, and other factors. In our future work, we plan to introduce deep transfer learning into the UWOSC system, where the transceiver NNs will be further trained under varying channel conditions. Lastly, the designed UWOSC system utilizes AI techniques to efficiently compress information by extracting semantic information from images, thereby achieving high-quality image transmission even with limited resources. Nevertheless, the performance of the UWOSC system is expected to be further enhanced through the utilization of advanced AI technologies.

5. Conclusion

In this paper, we propose and experimentally demonstrate a UWOSC system based on the semantic communication paradigm for transmitting images. By modeling the NN-based transceiver and channel, as well as implementing a two-phase training strategy, the system can extract and recover semantic information that is appropriate for transmission over underwater channels. Experimental results demonstrate that the UWOSC scheme outperforms conventional communication schemes, particularly in low-BCR regimes and extremely turbid environments. This robustness of the UWOSC scheme to complex and dynamic underwater environments is crucial for extending the transmission range of UWOC systems. Moreover, the UWOSC system significantly reduces the amount of transmitted data by efficiently extracting semantic information, which is expected to meet the future demand for image transmission in the underwater internet of things. In future work, the UWOSC system can be enhanced by considering more comprehensive channel environments and adopting advanced AI techniques.

Funding

National Natural Science Foundation of China (62371058); National Key Research and Development Program of China (No. 2013CB329205).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. Z. Zeng, S. Fu, H. Zhang, et al., “A survey of underwater optical wireless communications,” IEEE Commun. Surv. Tutorials 19(1), 204–238 (2017). [CrossRef]  

2. X. Sun, C. H. Kang, M. Kong, et al., “A review on practical considerations and solutions in underwater wireless optical communication,” J. Lightwave Technol. 38(2), 421–431 (2020). [CrossRef]  

3. N. Saeed, A. Celik, T. Y. Al-Naffouri, et al., “Underwater optical wireless communications, networking, and localization: A survey,” Ad Hoc Networks 94, 101935 (2019). [CrossRef]  

4. X. Chen, W. Lyu, Z. Zhang, et al., “56-m/3.31-Gbps underwater wireless optical communication employing Nyquist single carrier frequency domain equalization with noise prediction,” Opt. Express 28(16), 23784–23795 (2020). [CrossRef]

5. J. Zhang, Y. Yang, Z. Gao, et al., “Performance analysis of LDPC codes for wireless optical communication systems in different seawater environments,” in 2018 Asia Communications and Photonics Conference (ACP), (2018), pp. 1–3.

6. J. Shi, W. Niu, Z. Li, et al., “Optimal adaptive waveform design utilizing an end-to-end learning-based pre-equalization neural network in a UVLC system,” J. Lightwave Technol. 41(6), 1626–1636 (2023). [CrossRef]

7. M. Jahanbakht, W. Xiang, L. Hanzo, et al., “Internet of underwater things and big marine data analytics—a comprehensive survey,” IEEE Commun. Surv. Tutorials 23(2), 904–956 (2021). [CrossRef]  

8. C. E. Shannon, “A mathematical theory of communication,” The Bell Syst. Tech. J. 27(3), 379–423 (1948). [CrossRef]  

9. P. Basu, J. Bao, M. Dean, et al., “Preserving quality of information by using semantic relationships,” in 2012 IEEE International Conference on Pervasive Computing and Communications Workshops, (2012), pp. 58–63.

10. K. Niu, J. Dai, S. Yao, et al., “A paradigm shift toward semantic communications,” IEEE Commun. Mag. 60(11), 113–119 (2022). [CrossRef]  

11. H. Xie, Z. Qin, G. Y. Li, et al., “Deep learning enabled semantic communication systems,” IEEE Trans. Signal Process. 69, 2663–2675 (2021). [CrossRef]  

12. E. Bourtsoulatze, D. Burth Kurka, and D. Gündüz, “Deep joint source-channel coding for wireless image transmission,” IEEE Trans. Cogn. Commun. Netw. 5(3), 567–579 (2019). [CrossRef]  

13. D. Huang, F. Gao, X. Tao, et al., “Toward semantic communications: Deep learning-based image semantic coding,” IEEE J. Select. Areas Commun. 41(1), 55–71 (2023). [CrossRef]  

14. S. Wang, J. Dai, Z. Liang, et al., “Wireless deep video semantic transmission,” IEEE J. Select. Areas Commun. 41(1), 214–229 (2023). [CrossRef]  

15. Y. Zhao, P. Zou, W. Yu, et al., “Two tributaries heterogeneous neural network based channel emulator for underwater visible light communication systems,” Opt. Express 27(16), 22532–22541 (2019). [CrossRef]  

16. J. Ballé, V. Laparra, E. P. Simoncelli, et al., “Density modeling of images using a generalized normalization transformation,” arXiv, arXiv:1511.06281 (2016). [CrossRef]  

17. K. He, X. Zhang, S. Ren, et al., “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in 2015 IEEE International Conference on Computer Vision (ICCV), (2015), pp. 1026–1034.

18. S. Deligiannidis, A. Bogris, C. Mesaritakis, et al., “Compensation of fiber nonlinearities in digital coherent systems leveraging long short-term memory neural networks,” J. Lightwave Technol. 38(21), 5991–5999 (2020). [CrossRef]  

19. S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput. 9(8), 1735–1780 (1997). [CrossRef]  

20. D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv, arXiv:1412.6980 (2015). [CrossRef]  

21. Z. Wang, A. Bovik, H. Sheikh, et al., “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]  

22. M. J. Islam, Y. Xia, and J. Sattar, “Fast underwater image enhancement for improved visual perception,” IEEE Robot. Autom. Lett. 5(2), 3227–3234 (2020). [CrossRef]  

Figures (7)

Fig. 1. The framework of the proposed DL-based UWOSC system, composed of a transmitter, an underwater wireless optical channel, and a receiver.
Fig. 2. (a) Overview of the transmitter NN architecture. (b) The architecture of the encoder block. Conv: convolution layer; GDN: generalized divisive normalization layer. (c) The architecture of the encoder residual block. (d) Overview of the receiver NN architecture. (e) The architecture of the decoder block. TransConv: transpose convolution layer; IGDN: inverse generalized divisive normalization layer. (f) The architecture of the decoder residual block.
Fig. 3. The structure of an LSTM unit at time t.
Fig. 4. Framework diagram of the LSTM-based channel model with input layer, LSTM module, fully connected module, and output layer.
Fig. 5. Block diagram and experimental setup for emulated underwater wireless optical communication.
Fig. 6. Transmission performance comparison of UWOSC with conventional communication schemes. PSNR versus impurity concentration when (a) BCR=1/8, (c) BCR=1/12, and (e) BCR=1/16. SSIM versus impurity concentration when (b) BCR=1/8, (d) BCR=1/12, and (f) BCR=1/16.
Fig. 7. Reconstructed images when the impurity concentrations are 0 and 9 mg/L. PSNR and SSIM are used to evaluate image quality.

Tables (3)

Algorithm 1. Train the whole UWOSC system.
Table 1. Detailed NN structure and parameters of the UWOSC system.
Table 2. The performance improvement values of other communication schemes in terms of PSNR and SSIM compared to the baseline scheme (JPEG+LDPC+8QAM) under three BCR conditions when the impurity concentration is 0 mg/L.

Equations (11)

$$Y = f_{\varphi}(S)$$
$$Z = \sqrt{LP}\,\frac{\tilde{Z}}{\sqrt{\tilde{Z}^{*}\tilde{Z}}}$$
$$\hat{S} = f_{\psi}(\hat{Y})$$
$$L_1 = \frac{1}{N}\sum_{j=1}^{N}\left(f_{\phi}(d)_j - \hat{d}_j\right)^2$$
$$L_2 = \frac{1}{h \times w \times n}\sum_{j=1}^{h \times w \times n}\left(S_j - \hat{S}_j\right)^2$$
$$F_t = \mathrm{sigmoid}\left(w_F[H_{t-1}, D_t] + b_F\right)$$
$$I_t = \mathrm{sigmoid}\left(w_I[H_{t-1}, D_t] + b_I\right)$$
$$O_t = \mathrm{sigmoid}\left(w_O[H_{t-1}, D_t] + b_O\right)$$
$$\hat{C}_t = \tanh\left(w_C[H_{t-1}, D_t] + b_C\right)$$
$$C_t = F_t \odot C_{t-1} + I_t \odot \hat{C}_t$$
$$H_t = O_t \odot \tanh(C_t)$$
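The LSTM gate updates above can be sketched in a few lines of NumPy. This is a minimal illustration of the forget/input/output-gate equations and the cell- and hidden-state updates, not the paper's trained channel model; the weight shapes, random initialization, and toy input sequence are assumptions for demonstration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(D_t, H_prev, C_prev, w, b):
    """One LSTM update: gates act on the concatenation [H_{t-1}, D_t]."""
    x = np.concatenate([H_prev, D_t])        # [H_{t-1}, D_t]
    F_t = sigmoid(w["F"] @ x + b["F"])       # forget gate
    I_t = sigmoid(w["I"] @ x + b["I"])       # input gate
    O_t = sigmoid(w["O"] @ x + b["O"])       # output gate
    C_hat = np.tanh(w["C"] @ x + b["C"])     # candidate cell state
    C_t = F_t * C_prev + I_t * C_hat         # cell-state update (elementwise)
    H_t = O_t * np.tanh(C_t)                 # hidden-state output
    return H_t, C_t

# Toy dimensions (assumed): scalar distorted sample D_t, 4-D hidden state.
rng = np.random.default_rng(0)
hidden, inp = 4, 1
w = {k: rng.standard_normal((hidden, hidden + inp)) for k in "FIOC"}
b = {k: np.zeros(hidden) for k in "FIOC"}

H, C = np.zeros(hidden), np.zeros(hidden)
for d in [0.1, -0.3, 0.5]:                   # feed a short received sequence
    H, C = lstm_step(np.array([d]), H, C, w, b)
```

Because the output gate lies in (0, 1) and tanh is bounded, the hidden state stays inside (-1, 1), which is what makes the unit stable over long received-symbol sequences.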