Dense D2C-Net: dense connection network for display-to-camera communications

Nilesh Maharjan; Lakpa Dorje Tamang; Byung Wook Kim

doi:10.1364/OE.498067

1. Introduction

Optical camera communication (OCC) devices use commonly accessible components such as light-emitting diodes (LEDs) and commercial off-the-shelf image sensors to detect spatial and temporal fluctuations in light intensity, which allows them to operate in the visible light spectrum (380–700 nm) [1,2]. As part of optical wireless communications [3–5], OCC has notable benefits such as substantial amounts of unregulated spectrum, link-level security, and easy availability. Display-to-camera communication (D2C) [6–8] is a type of OCC that allows a display screen to transmit data to a camera, displaying patterns that are invisible to the human eye. The popularity of mobile phone cameras and display screens has led to a growing interest in D2C technology. For instance, D2C has become a practical choice for content-oriented display technology such as digital signage, which has resulted in a wide range of applications for D2C systems in information technology [9,10].

D2C is an appealing candidate for short-range machine-to-machine communication over a display-to-camera link [7–15]. The transmitter in a D2C communication system provides digital multimedia content, such as images and videos, and embedded data via electronic displays. Since the embedded data may alter an image’s appearance and underlying statistics, embedding should consider mitigation against impacting the integrity of the displayed content. In particular, the hidden data should be imperceptible to the human eye because visible image modifications reduce its quality and aesthetic value. On the receiving end, the camera captures any visual content displayed on the screen, followed by decoding the embedded data. During transmission in a wireless D2C link, the embedded data must resist signal processing distortions from the channel.

Recent research on data embedding in display images has focused on spatial and frequency domains as well as two-dimensional barcodes like QR codes [6–10]. However, the minimal display area prevents transmission of a significant amount of information, and extensive effort is necessary to capture the codes. Motivated by ideas for facilitating high-capacity data embedding, better synchronization, and accurate code extraction on the receiver side, researchers began to investigate specially designed two-dimensional color barcodes [7,8] that allow smooth transmission of enormous amounts of information. These approaches do not permit data concealment in visually recognized multimedia content. Spatial data-embedding techniques were introduced [11,12] to address this limitation, enabling simultaneous data transfer across optical wireless channel conditions without affecting multimedia visual quality while also considering geometric attacks and the viewing experience. This allows concurrent data transmission through optical wireless channel conditions without impairing the visual quality of the content. Nevertheless, spatially embedded data are sensitive to noise from optical wireless channels, necessitating advanced modulation schemes for reliable data communication. To incorporate the desired data, these techniques use the spectral component of a digital image [13–15] and inject small perturbations in selected coefficients. Because this approach uses robust coefficients of a spectral domain image, it is less prone to distortion during transmission across an optical wireless channel even in the presence of noise.

Recent breakthroughs in artificial intelligence (AI) have resulted in widespread industrial adoption, with machine learning systems surpassing humans in a variety of tasks if provided with sufficient data. Specifically, the deep convolutional neural network (DCNN) [16–18], a multilayer feed-forward network developed from various deep learning (DL) models, has been widely used for image steganalysis [19,20] and image watermarking [21–24]. In previous work, cover dependent deep hiding (DDH) networks are utilized to enclose full-scale color images inside another image [25–27]. In another study, the universal deep hiding (UDH) network architecture separates the encoding of the hidden picture from that of the cover image [28]. Also, DCNNs are employed to combat objects’ low resolution, noisy representation, and the possibility of color distortions in the captured images [30–35]. Recent Deep D2C-Net [35] studies explored the possibilities of increasing the network’s feature extraction capabilities to demonstrate robust performance in a D2C environment using hybrid layers.

However, the Deep D2C-Net architecture has a weakness in merging feature maps effectively enough to retain important features from both cover image and secret data. This shortcoming can result in information loss and degraded quality in the embedded image. In practical applications that require high visual quality and a low bit error rate (BER), the alignment between transmitter and receiver when they are not parallel to each other can result in a higher BER, which may pose a challenge to achieving optimal performance. To address these shortcomings and to increase the visual quality of the encoded images, we developed a novel Dense D2C-Net that enables real-time data encoding and decoding from display images. Using feature reuse techniques in combination with hybrid layers obtained by concatenation of feature maps of the cover image and secret data, the proposed Dense D2C-Net architecture can produce feature maps with high spatial resolution. This architecture preserves the spatial information of the image throughout the network, and makes it possible to learn complex and non-linear relationships between the input image and the secret information. The structure consists of multiple layers in which only the Y components of the input YUV image are utilized to embed secret data in order to minimize distortion in the embedded image. The Y channel is robust against image processing techniques, including compression, when a higher density of bits is allocated to the Y component in comparison to the U and V components [36]. The decoder receives the data-embedded images captured by the camera and runs them through a series of 2D convolutional layers to extract the hidden data. Based on our experiments under a variety of environmental parameters (including capture angle, transmission distance, camera resolution, and ambient light), the proposed scheme can outperform current state-of-the-art DCNN-based data-embedding and extraction approaches by providing an excellent peak signal-to-noise ratio (PSNR) and good BER performance for a short-distance D2C link.

The following contributions from this study improve visual quality and BER in a D2C environment.

• The Dense D2C-Net architecture enables feature reuse, providing each layer of the cover image access to features from the previous layers, enabling the network to learn complicated representations of input images, which helps improve the visual quality of the image.
• The data embedding process is conducted on the Y components of an image, which contains the image’s luminance information, rather than on chrominance information (U and V components). Because the luminance component demonstrates greater resilience to image processing methods like compression, using the Y components is robust in the D2C communication environment.
• The real-world experiments with several environmental parameters (e.g., transmission distance D, capture angle θ, camera resolution, and ambient light) show that the proposed scheme overcomes existing state-of-the-art DCNN-based embedding and extraction approaches by providing excellent BER performance for a short-distance D2C link.

The rest of this paper is structured as follows. A brief overview of previous research on steganography is given in Section 2. In Section 3, the novel Dense D2C-Net model's elaborate encoder-decoder network architecture is proposed. Analysis and evaluation of the findings from several experiments are presented in Section 4. Finally, Section 5 offers concluding remarks.

2. Related works

AI has recently made significant progress, which has greatly increased our understanding of how to bridge the gap between human and machine capabilities. To do this, several DL techniques have been used to build usable representations from the given data to address real-world issues. If given sufficient data, DL models can learn to solve real-world issues more accurately than humans. From the various DL models, researchers have shown a great deal of interest in using DCNNs to embed and extract data via fully automated D2C links because of their superiority in extracting key characteristics from raw data. These methods consider visual imperceptibility of information concealed in an image as well as resistance against image distortions.

To hide data in large amounts, Baluja [25] introduced the first DL-based architecture that encloses a full-scale color image inside another image of the same size. However, the concealed images are slightly noticeable in residual images from the resulting embedded images. Also, the architecture makes use of three networks: preparation, concealment, and revelation, commonly known as the DDH network, which consumes a lot more GPU memory and requires longer embedding time [26]. Later studies demonstrated that the preparation network is not required, and concealment can be coupled with revelation to create a single network [27].

Furthermore, Zhang et al. [28] suggested a novel UDH architecture that is fundamentally distinct from DDH in that it separates the encoding of the hidden picture from that of the cover image. The results show that secret images are encoded into repeating high-frequency components, which makes it easier to understand the encoding process. UDH also increases versatility by creating a container allowing the encoded secret image to be combined directly with any random cover image.

Research by Zhang and colleagues [29] suggested the SteganoGAN model, which adapts well-known DL frameworks like ResNet [30] and DenseNet [31] for encoding text into images. However, data retrieval is constrained by poor resolution and noisy representations. To replicate data loss owing to the corrosive nature of the optical wireless channel and the phase noise brought on by the camera's out-of-focus effect, a D2C model trained with noise layers must be included. DCNN-based encoder and decoder models must be trained with noise layers; otherwise, communication performance over an actual D2C channel cannot be guaranteed. A comparable study conducted by Zhu et al. [32] introduced a DCNN collection that introduces various types of noise between encoding and decoding to increase robustness, but they only focused on the set of corruptions that would occur through digital image manipulation (e.g., JPEG compression and cropping). The model produced information with amazing visual quality, but it ignores the potential for color distortions in the collected images brought on by illumination changes in the optical channel.

Color distortions typically have a big influence on embedded data, and are a danger to proper data acquisition. The StegaStamp model suggested by Tancik et al. [33] uses a U-Net framework [34] to encode text information inside digital images. In an optical wireless connection, those researchers successfully demonstrated real-time decoding of hyperlinks from physically printed photographs. To address channel distortions in multimedia content, the model included pixel-wise perturbations that account for variations made to individual pixels in an image, and spatial perturbations that explain changes made to the overall spatial arrangement of pixels. However, the display quality of multimedia content was degraded, resulting in a trade-off between visual quality and communication performance. Communication performance suffers significantly when the distance between transmitter and receiver is great, or when images are captured at different resolutions and under various lighting conditions. This may be explained by a decoding network's poor feature extraction capabilities; hence, the capabilities must be increased to demonstrate robust performance in the D2C environment.

Another Deep D2C-Net uses hybrid layers in the encoder in which concurrent feature maps of the intended data and cover images are concatenated in feed-forward fashion [35], improving visual quality and communication performance. Although the method provides relatively better visual quality and BER performance than previous DCNN-based algorithms, it has two significant shortcomings. The first is insufficient merging of feature maps, which is necessary to preserve important features in both cover image and secret image. This leads to loss of information, and potentially degrades the quality of the embedded image. If the alignment between D2C transmitter and receiver is not parallel, corrupted bits (incorrectly decoded from the embedded image) dramatically increase during transmission. Therefore, to show robust communication performance in a more realistic situation, a more advanced DCNN network structure and a new approach to embedding data in images are required.

3. Proposed system

The D2C scenario for this study included a transmitter that displays an image combined with hidden binary data to deliver visual information via electronic display. In particular, the hidden information must be invisible to the naked eye while maintaining accuracy in the content displayed. The camera captures the visual information and subsequently decodes the data. Given that a wireless D2C link is used for transmission, the embedded data must be resistant to various forms of wireless optical channel signal processing distortions. Figure 1 demonstrates an overview of the Dense D2C-Net system. A binary data vector, $\boldsymbol{b}$, and a digital cover image, ${\boldsymbol{I}_{\boldsymbol{o}}} \in {\boldsymbol{R}^{\boldsymbol{H} \times \boldsymbol{W} \times {\mathbf{C}}}}$ (H, W, and C are height, width, and the number of channels for ${\boldsymbol{I}_{\boldsymbol{o}}}$) are input into the Dense D2C-Net encoder at the transmitter. Before transmitting $\boldsymbol{b}$ to the Dense D2C-Net encoder $\boldsymbol{b}$ is reshaped and upsampled to obtain upsampled data $\boldsymbol{d}$ in a 2D space with dimensions H ${\times} $ W ${\times} $ 3. The goal of upsampling is to make the feed-forward process easier by concatenating the feature maps of both the cover image and the input data. The goal of upsampling is to make the feed-forward process easier by concatenating the feature maps of both the cover image and the input data. After the training session, the encoder, $\boldsymbol{\varepsilon }({{\boldsymbol{I}_{\boldsymbol{o}}},\, \boldsymbol{d}} )$, produces a data-embedded image, ${\boldsymbol{I}_{\boldsymbol{E}}}$, with the same shape as ${\boldsymbol{I}_{\boldsymbol{o}}}$ and which is subsequently presented on screen for visualization while the data are collected by a camera for decoding.

Fig. 1. Overview of the proposed Dense D2C-Net.

Download Full Size | PDF

The acquired image in a real-world D2C communications scenario is subject to various distortions from several signal processing procedures that occur in the optical wireless channel. These disturbances tend to impact both the spatial and spectral domains of the transmitted ${\boldsymbol{I}_{\boldsymbol{E}}}$, therefore degrading the embedded data. As a result, many randomized noise layers are inserted during the training process to represent the influence of the optical channel on the transmitted picture, and to adjust for distortions. The noise layers are responsible for reducing spatial distortions such as additive noise, color changes due to variations in brightness and contrast, and image blurring as well as spectrum distortions resulting from the wireless transmission channel, such as those caused by JPEG compression. Similarly, geometric distortions due to unstable capture locations and camera orientation, such as rotation, scaling, and translation, occur in picture ${\boldsymbol{I}_{\boldsymbol{C}}}$ acquired at the receiver. As a result, a geometric correction technique is used to generate a corrected image, ${\boldsymbol{I}_{\boldsymbol{G}}}$, which is then decoded for data retrieval. $\boldsymbol{\delta }({{{\hat{\boldsymbol{I}}}_{\boldsymbol{E}}}} )$ is a trained decoder that generates output data $\hat{\boldsymbol{b}}$ by following a robust decoding method. Dense D2C-Net’s encoder and decoder are trained end-to-end with two main objectives: to decrease image reconstruction loss ${L_I}$ (the difference between the cover image and the encoded image) and data reconstruction loss ${L_D}$ (loss between input and output data).

Section 3.1 describes reshaping and upsampling the binary input data. In Section 3.2, the Dense D2C-Net encoder and decoder network structures are briefly explained. The noise layers utilized in optical wireless communication are explained in Section 3.3. Then, the training procedures of a fully end-to-end encoding and decoding network are described.

3.1 Reshaping and upsampling the input data

Figure 2 illustrates reshaping and upsampling the input binary data. As can be seen, 1D input binary data, $\boldsymbol{b} \in {\{ 0,1\} ^M}$ of length M, are initially fed into a fully connected (FC) layer with dimensions of 3072 ${\times} $ 1. We observed that adding this preprocessing to the message improves convergence [33], and allows training the network to an adequate degree of accuracy. The result of the FC layer, ${\boldsymbol{b}_{\boldsymbol{F}}}$, which is a 1D vector of length 3072, is reshaped to produce a 2D matrix with dimensions 32${\times} $32${\times} $3. Following reshaping, the 32${\times} $32${\times} $3 matrix is upsampled by interpolating it to obtain $\boldsymbol{d}$ with dimensions of H${\times} $W${\times} $3 (the same height and width as the cover image). It should be noted that reshaping and upsampling spatially duplicate the 1D message data in a 2D space. The intermediate representations of ${\boldsymbol{I}_{\boldsymbol{o}}}$ and $\boldsymbol{d}$ generated by their respective convolutional series are then combined to construct a hybrid layer. As a result, the textual information in $\boldsymbol{d}$ is embedded in the spatial position of ${\boldsymbol{I}_{\boldsymbol{o}}}$ in each hybrid layer of the Dense D2C-Net encoder via feature map concatenation.

Fig. 2. Reshaping and upsampling process.

Download Full Size | PDF

3.2 Encoding and decoding networks

As shown in Fig. 1, upsampled binary data $\boldsymbol{d}$ and the cover image are input for the Dense D2C-Net encoder network, which outputs data-embedded image ${\boldsymbol{I}_{\boldsymbol{E}}}$ for display. The hybrid layers are made by combining feature maps from both cover image ${\boldsymbol{I}_{\boldsymbol{o}}}$ and input binary data $\boldsymbol{d}$. Each convolutional layer in the cover image path receives input not only from the previous layer but also from all previous layers, creating a dense connection with all preceding layers. Densely-connected skip connection facilitates production of high-quality images during reconstruction by helping to preserve the features from earlier layers. In addition, a skip connection is inserted between the cover image and the ninth convolutional layer (Conv 9), which further enhances the information flow and helps the network learn both high-level and low-level features. As a result, the network can maintain prominent features like visual patterns and textures in the cover image. Furthermore, the feature maps of $\boldsymbol{d}$ are combined as an input of the convolution layer where the cover image was used as an input. The procedure is repeated up to the seventh hybrid layer, after which the output is sent to the final three 2D convolutional layers to produce ${\boldsymbol{I}_{\boldsymbol{E}}}$. Each hybrid layer incorporates $\boldsymbol{d}$ into ${\boldsymbol{I}_{\boldsymbol{o}}}$ through feature map concatenation and undergoes continuous end-to-end training. This enables the Dense D2C-Net encoder to learn how to incorporate the data into the cover image to minimize visible artifacts in ${\boldsymbol{I}_{\boldsymbol{E}}}$. Note that the YUV color space is designed to separate color information (chrominance) from brightness (luminance); the Y channel represents luminance, and the U and V channels are chrominance components. Here, the luminance component is more robust against image processing operations such as compression, in which higher density allocation of bits is provided for the Y component compared to U and V components [36]. As a result, even if the image is subjected to various processing operations, the embedded data in the Y component are more likely to remain intact and retrievable. Additionally, the Y channel is less sensitive to changes in color and saturation. Therefore, the proposed scheme uses Y channel of the cover image as an input to the network to have less impact on the visual quality of the image.

On the receiver side, the camera detects the data-embedded image from the electronic display. In our system, the captured image with embedded data is denoted ${\boldsymbol{I}_{\boldsymbol{C}}}$. The spatial domain of ${\boldsymbol{I}_{\boldsymbol{C}}}$ is geometrically distorted because the display content is transmitted via optical wireless channel, which continues to significantly alter the corners. To adapt geometric correction techniques to account for the distortions, before sending the image for decoding, we employ an image matching technique, oriented FAST and rotated BRIEF (ORB), which detects key points in the image and assigns to each key point an orientation such as left-facing or right-facing depending on how the levels of intensity change around that key point. These key points are then converted to binary feature vectors that are rotationally invariant and contain direction information. This information is utilized in the transformation of captured image ${\boldsymbol{I}_{\boldsymbol{C}}}$ into ${\boldsymbol{I}_{\boldsymbol{G}}}$. Also, the region of interest for the target image is defined by these key points. Thereafter, the image is sent directly to the decoder after geometric correction is finished, where the embedded data are extracted.

The Dense D2C-Net makes use of a DCNN-based intelligent decoder to implement end-to-end information recovery from the received signal. The data can be recovered as reliably as possible, even in a diverse array of less-than-ideal circumstances in the optical wireless channel [37], thanks to the DL-based decoder's ability to learn the intricate relationships between the received signal and the transmitted sequences of information. To counteract the effects of channel distortion, the decoder learns the intrinsic representations of the data from the received image, which is distorted in various ways through multiple noisy layers. The trained model achieves robust bit recovery performance in real-world environments because it closely resembles the underlying channel distortions that take place in any D2C system.

As shown in Fig. 1, the decoder comprises a single FC layer for classification and eight 2D convolutional layers for feature extraction. Since the input data consist of M bits, this classification layer uses a total of M binary classifiers, each of which recovers one bit (0 or 1). The final FC layer's classifier count corresponds to the total number of recovered data bit streams provided by $\hat{\boldsymbol{b}} \in {\{ 0,1\} ^M}$. Tables 1 and 2 contain information on the filter sizes as well as the input and output dimensions of the Dense D2C-Net encoder and decoder network.

Table 1. Encoder network details.

View Table | View all tables in this article

Table 2. Decoder network details

View Table | View all tables in this article

3.3 Robustness against the optical wireless channel

In D2C systems, the received data can be distorted because of several conditions when delivered over an optical wireless channel. The position and orientation of the camera, environmental factors like ambient lighting and background illumination, and several important signal processing elements like analog-to-digital (A/D) and digital-to-analog (D/A) conversions, significantly increase the risk of a degraded received signal. This deterioration of the received signal eventually results in the loss of embedded data. Therefore, a D2C system should be built considering the intrinsic features of the optical wireless channel to compensate for the effect of the channel and to overcome channel distortion.

To account for the impact of channel distortion in D2C systems, a robust noise model is included in the Dense D2C-Net encoder-decoder system. In our method, we employ several stochastic noise layers during the training session to both emulate the physical distortions of the acquired image and overcome the terrible impacts of the transmission channel on the embedded data. Data-embedded image ${\boldsymbol{I}_{\boldsymbol{E}}}$ is subjected to the following random sets of changes before being sent to the decoder:

(1)$$\widehat {{\textrm{I}_\textrm{E}}}\, = \, {I_E}\, + \, {B_L}({G\, } )+ \, {C_T}\, + \, N\, + \, {C_{JPEG\; }},$$

where ${B_L}(G )$ is Gaussian blur, ${C_T}$ is color transformation, N is additive Gaussian noise, and ${C_{JPEG}}$ is the JPEG compression layer. The series of modified images that simultaneously pass through several noise layers are shown in Fig. 3. As seen in Eq. (1), data-embedded image ${\boldsymbol{I}_{\boldsymbol{E}}}$ is first subjected to Gaussian blur, then random color transformation, Gaussian noise, then JPEG compression. Depending on how the wireless channel environment influences the transmitted image, the network's noise layers can be modified. When a camera takes a picture, a large D, a mismatched capture angle, and uneven camera movement cause undesirable image blur. Generally, to imitate the wireless communication channel, this image blur is approximated as phase noise of additive white Gaussian noise. Based on this, we transform data-embedded image ${\boldsymbol{I}_{\boldsymbol{E}}}$ and apply a Gaussian blur kernel in the noise layer with a zero mean and randomly sampled standard deviation.

Color distortions may appear in the image whenever it is captured from the display. For this, a color transformation strategy described in [35] is utilized to tackle changes to the hue component under which the photograph will look faded or washed out when taken under different lighting conditions. In addition, the additive Gaussian noise model is utilized to consider additional noise such as photon noise and shot noise [35]. Lastly, a differentiable JPEG approximation [38] is used where JPEG quality is uniformly sampled throughout to correct for image compression issues in high-frequency regions of the image [34]. To deliver the encoded data from the numerous distortions that might happen in the optical wireless channel, stochastic noise layers are added during training. This noise layers generalize the essential features of the optical wireless channel, making them adaptable to a variety of environmental conditions. Finally, a Dense D2C-Net model that is robust against optical channel distortion can be obtained by implementing all these sets of transformations in the training session.

Fig. 3. Sequence of noise layers between the Dense D2C-Net encoder and decoder

Download Full Size | PDF

3.4 Training

We trained the novel Dense D2C-Net in an end-to-end manner, which involves simultaneously training the encoder and decoder networks. The MS COCO dataset's 105,000 cover photos, which had been resized to the desired dimensions for ${\boldsymbol{I}_{\boldsymbol{o}}}$, served as the model's training data. To evaluate the model, we utilized 200 bits of binary data and 1000 test photos for the input; each test image was randomly selected from one of the possible classes in the Linnaeus dataset [39]. This dataset includes a total of five classes, each of which comprises 400 test images at 256 ${\times} $256 pixels. The network parameters shown in Table 3 were used to repeatedly tune the encoding and decoding networks. For training, Adam optimizer was utilized due to its superior computational efficiency and strong performance while training with large datasets. Additionally, all convolutional layers in the encoding and decoding networks used a rectified linear unit (ReLU) activation function, except for the last layer. As a result of its unsaturation property and low likelihood of vanishing gradients, the ReLU achieves faster convergence.

Table 3. Network Parameters

View Table | View all tables in this article

To achieve convergence during training, we created two loss functions, shown in Fig. 4, as part of our overall network loss. First, we imposed mean square error (MSE) loss ${L_I}$ between the cover picture and data-embedded image to visualize their proximity. Second, cross-entropy loss, ${L_D}$, was modified to compare binary data that had been encoded and decoded. Therefore, as shown in Eq. (2), the training objective was to reduce total loss ${L_T}$:

(2)$$\begin{array}{c}{L_I} = \frac{1}{{H \times W \times C}}\parallel {{\mathbf{I}}_{\mathbf{o}}} - \mathrm{\varepsilon }({{{\mathbf{I}}_{\mathbf{o}}},{\mathbf{d}}} ){\parallel ^2},\\{L_D} = \, \textrm{CE}({\mathrm{\delta }({\mathrm{\varepsilon }({{\boldsymbol{I}_{\boldsymbol{o}}},\boldsymbol{d}} )} ),\boldsymbol{b}} ),\\{L_T} = \textrm{Minimize}({{L_I} + {L_D}} ),\end{array}$$

Fig. 4. Image and data reconstruction loss of the Dense D2C-Net.

Download Full Size | PDF

To minimize the generalization error, a small batch size was maintained (at 4), resulting in a total of 26,250 steps being completed to process the entire batch, which constituted an epoch. The model was trained for 140,000 steps at a learning rate of 10⁻⁴. As shown in Fig. 5 (a) and (b), we can see the Dense D2C-Net learning curves evaluated using the training dataset. The figures demonstrate that the suggested model's training was completed up to the point at which visual and communication performance reached saturation.

Fig. 5. Learning curves during training the Dense D2C-Net: (a) encoded image loss, and (b) bit accuracy.

Download Full Size | PDF

4. Experiment analysis and evaluation

In this section, we demonstrate the performance of the proposed scheme in various D2C scenarios. For the performance comparison, PSNR and achievable data rate (ADR) were measured. In addition, BER is presented based on the different experimental environments, such as changing the angle and distance between the display screen and camera, adjusting display screen brightness under normal and ambient lighting, and varying the resolution of the camera.

In our experiment, the data-embedded image was shown on a digital screen, and the display was captured using a smartphone camera. The experimental setup for D2C communication is shown in Fig. 6, where the distance between the display screen and camera is denoted by D, and θ represents the camera’s orientation to the display. Table 4 provides a list of the parameters used in the experiment. The Samsung display had a resolution of 2150 ${\times} $ 1920 and a display rate (${R_D}$) of 60 Hz. We utilized an iPhone 11 with a camera capture rate (${R_C}$) set to 1080p HD at 60 frames per second for the receiver. The model was trained to embed 200 bits of binary data into 256 ${\times} $ 256 pixels images. The experiments were carried out indoors in an environment with typical ambient light from the ceiling.

Fig. 6. Experimental setup for the display-to-camera communication experiments.

Download Full Size | PDF

Table 4. Experimental Parameters

View Table | View all tables in this article

4.1 PSNR and BER performance comparisons

Table 5 presents the PSNR and BER performance comparisons for Dense D2C-Net versus several state-of-the-art data-embedding and steganography approaches based on DCNNs [29,32,33,35]. We evaluated communication performance by embedding 200 binary data bits within a single cover picture at 256 × 256 pixels by using 1000 cover images randomly selected from the Linnaeus dataset [39]. Distance D and angle θ between the transmitter and receiver were respectively fixed at 10 cm and 90° for practical testing. We can see that the HiDDeN model [32] resulted in a PSNR of 37.84 dB; however, the BER was not satisfactory over the optical wireless channel. Even though several noise layers were used to offset the effects of signal processing artifacts from the channel, the embedded data were still vulnerable to noise produced by the display device. When capturing the image, lens distortions and lighting fluctuations frequently caused noise such as color distortions and Moire patterns [40]. SteganoGAN (Dense) [29] outperformed the other schemes in PSNR, which can be attributed to DenseNet's propensity for propagating feature mappings from earlier layers to later layers. Despite its PSNR effectiveness, SteganoGAN’s BER performance degraded significantly due to the absence of a noise layer during its training process. This steganography technique yielded image quality perceptible to the human eye and is not a viable option for D2C applications due to the extent of data degradation caused by noise on the channel. If the model is not trained with noise layers, it becomes vulnerable to multiple types of transmission media attacks in real-world D2C scenarios.

Table 5. PSNR and BER performance comparisons for the proposed system with state-of-the-art DCNN-based steganography techniques.

View Table | View all tables in this article

It is worth noting that the StegaStamp [33] technique exhibited exceptional BER performance. In order to achieve that, the encoder and decoder networks in this technique consider a collection of differentiable image perturbations (including pixel-wise and spatial variations) during the training process of the model. To provide real-world resilience with transmitted data, a pipeline model of noise layers is created specifically to account for perspective warping, blurring, color modification, noise, and JPEG compression. However, this comes at the expense of the data-embedded image's visual quality, where a significant PSNR reduction was seen. The U-Net architecture used in the StegaStamp encoder network has a limited ability to learn complex features and patterns in data. Thus, it is unable to create images of sufficient quality for the human eye under the conditions required to ensure reliable communication performance.

Compared to StegaStamp [33] and Deep D2C-Net [35], the proposed Dense D2C-Net improved the visual quality of the data-embedded image considerably, resulting in a sufficient increase in PSNR. Concurrent feature maps are generated by a convolutional series of upsampled data and the Y channel of the cover image in a hybrid fashion. The Y channel is employed because it is robust against image processing techniques such as compression, and is less sensitive to changes in color and saturation. Alongside this, each convolutional layer is connected to every other layer in a feed-forward fashion in the cover image block, creating dense connections that reuse feature maps throughout the network. This preserves low-level features of the encoding network when proceeding through deeper layers, and allows gradients to propagate more easily through the network, improving training convergence.

Figure 7 shows randomly selected samples of cover pictures and the data-embedded images obtained by using StegaStamp [33], Deep D2C-Net [35], and the proposed Dense D2C-Net. We can see that the Dense D2C-Net images are more visually imperceptible by the human eye than StegaStamp and (by a small margin) Deep D2C-Net. The low-level features are effectively conveyed to the final layers due to the concatenation of feature maps produced by the convolution series with preceding layers and from the upsampled data with a hybrid layer. In this way, the cover image's rich spatial information can be preserved in the data-embedded image, thus maximizing the PSNR. In addition, the data-embedded image's visual quality is precisely preserved by the skip connection that was added between the cover image and final layers.

Fig. 7. Randomly selected cover images: top row, from the Linneaus dataset [39]; second row, data-embedded images from StegaStamp [33]; third row, data-embedded images from Deep D2C-Net [35]; and bottom row, data-embedded images from Dense D2C-Net.

Download Full Size | PDF

4.2 Achievable data rate

Figure 8 compares the ADR performance of Dense D2C-Net to the state-of-the-art DCNN steganography methods. The total number of data bits to be embedded was 200. To calculate the ADR, the values for distance (D) and angle (θ) between transmitter and receiver were held constant at 10 cm and 90°, respectively. The 90° angle is the ideal alignment of the display screen to the camera.

Fig. 8. Performance comparison of ADR of Dense D2C-Net with other models.

Download Full Size | PDF

ADR is a function of the maximum data rate (${R_{max}}$) and the BER, representing the amount of data that can be transmitted over the D2C link at D = 10 cm and θ = 90°, per unit of time:

(3)$$\textrm{ADR} = ({1 - BER} )\times {R_{max}},$$

where ${R_{max}}$ is determined by multiplying the total number of input bits by the transmitter's display rate.

To account for the impact of the optical wireless channel's corrosive features and the phase noise caused by the camera's out-of-focus effect, it is necessary to integrate a D2C model that has been trained with noise layers. Since the noise layers are not introduced while training the SteganoGAN (Dense) [29] and HiDDeN [32] models, the BER is poor, and ADR is inadequate. Dense D2C-Net evidently performed better by approximately 80% and 50% compared to SteganoGAN (Dense) and HiDDeN, respectively, and was comparable to StegaStamp [33] and Deep D2C-Net [35].

4.3 Receiver orientation, angle and distance

To examine the impacts of geometric distortion on captured images based on the camera’s position, the display content was captured from various distances and angles. Figure 9 displays the geometrically warped images from the various camera orientations. The misaligned θ and a large D between display and camera result in noise related to the environmentally unfriendly property of the optical wireless channel in which phase noise produced by the camera’s out-of-focus depth of field is visible. The captured image is typically distorted by this sort of noise, which eventually degrades communication performance. As seen in Table 5, StegaStamp and Deep D2C-Net were chosen as the standard for comparison in this experiment owing to the almost identical BER under ideal display camera alignment. Furthermore, the displayed image was captured with three distinct values for θ (30°, 90°, and 120°), with D changing from 15 cm to 30 cm.

Fig. 9. Example images captured by the camera: (a) at D = 15 cm and θ = 30°, (b) at D = 15 cm and θ = 90°, (c) at D = 15 cm and θ = 120°, and (d) at D = 30 cm and θ = 90°.

Download Full Size | PDF

Figure 10 illustrates the BER performance of Dense D2C-Net compared to StegaStamp and Deep D2C-Net when varying distance D and angle θ between the display screen and camera. As seen in the figure, BER increases as D increases. The increased distance between the display and the camera worsens the degree of image blur, and introduces severe geometric distortions in the captured image. Moreover, when the viewing angle between the display screen and camera was varied, the appearance of the image changed. This is because the image is composed of pixels that emit light in a certain direction, and when viewed from an angle, the emitted light is not perpendicular to the receiver, causing color shift or distortion. Therefore, the changes in distance and angle between the display screen and camera results in degradation of decoding performance. We can see that the BER is the best when θ = 90° because the emitted light directly aligns with the camera, resulting in accurate representation of the visual information. This experiment showed that Dense D2C-Net outperformed both Deep D2C-Net and StegaStamp in terms of BER for various distances and angles. This is due to feature maps of each convolutional layer in the cover image block being connected to every other layer in a feed-forward fashion. This leads to the formation of dense connections that reuse features throughout the network, particularly allowing the network to learn patterns at different levels of granularity and to capture both local and global features. In the cover image block, the Y channel is utilized to embed the data because it is subjected to fewer image processing operations. In addition, the robust end-to-end network architecture allows the Dense D2C-Net encoder’s hybrid layers to minimize BER by lowering training losses alongside the decoding network. The Dense D2C-Net’s decoder also includes a single FC layer to learn the non-linear combinations of features, as well as very deep convolutional layers to detect high-level features from the collected image. Unlike this, the decoder for StegaStamp is composed of three FC layers that learn non-linear combinations of the features from the previous convolutional layers. Due to this structural distinction, Dense D2C-Net can offer more insightful and invariant feature space representatives, irrespective of the degree of image blur. As a result, after the decoder has extracted enough features from Dense D2C-Net, the data retrieval performance is resistant to changes in the camera's capture direction.

Fig. 10. BER performance of Deep D2C-Net, StegaStamp and Dense D2C-Net when varying distance and angle

Download Full Size | PDF

4.4 Display brightness and ambient lighting

In D2C communications, variations in the display brightness and ambient light when capturing images can significantly impact the reliability of communication. Therefore, the impact of changing the display brightness on the BER of Dense D2C-Net should be analyzed for data transmission in a secure and reliable manner. In this experiment, images were captured with a smartphone camera from the display screen at varied brightness levels in the range [20, 100]. The camera was utilized with and without a flash to examine the impact of ambient light on the image taken. Figure 11 shows two distinct levels of intensity in the images captured under normal light and ambient light. Figure 12 compares the BER performance of Dense D2C-Net, Deep D2C-Net, and StegaStamp under increasing display brightness levels for both lighting situations. Throughout this experiment, D and θ were held constant at 15 cm and 90°, respectively. It is apparent that all three methods exhibited similar behavior in that the BER was reduced as the display brightness increased and eventually converged to zero.

Fig. 11. Captured images under different lighting conditions: (a) normal lighting, and (b) ambient lighting with flash.

Download Full Size | PDF

Fig. 12. BER performance of Dense D2C-Net, Deep D2C-Net, and StegaStamp when varying screen brightness under different lighting conditions.

Download Full Size | PDF

When the display's brightness was low, decoding became significantly difficult, and the system was more prone to error. On the other hand, when images were captured with ambient light, pixel intensity rose, making it difficult for the decoder to distinguish the transmitted digital content. Therefore, a little increase in the BER could be seen when the display content was captured with the flash. Based on the overall characteristics of the BER curve, we conclude that Dense D2C-Net outperformed Deep D2C-Net and StegaStamp in all working conditions, regardless of brightness level. The Dense D2C-Net architecture demonstrates that the cover image block's convolutional layers are interlinked with one another in a feed-forward manner, which results in the formation of dense connections between the feature maps of each layer. This results in better gradient flow and preservation of the image’s spatial information throughout the network. In the cover image block, the Y channel was utilized to embed data in the cover image block because it is less sensitive to changes in color and saturation. Dense D2C-Net's training process optimizes the network hyperparameters to such a degree that it achieves minimum global loss. This optimization ensures that the network provides stronger BER performance than StegaStamp even in real-world wireless communication scenarios where effects from ambient light can occur. Furthermore, the StegaStamp decoder employs a weak technique for learning non-linear feature combinations from the captured image, attributed to its use of multiple fully connected layers, which compromises BER performance. The experiment demonstrated that the Dense D2C-Net model can deliver reliable communication in a real-world D2C environment, despite changes in brightness level and under ambient light.

4.5 Resolution of the receiver camera

Figure 13 presents comparisons of BER performance from Dense D2C-Net, Deep D2C-Net, and StegaStamp when varying the resolution of the camera under different lighting conditions. We chose three distinct resolutions for this experiment, ranging from low resolution (LR) at 720${\times} $480 pixels to high resolution (HR) at 1920 ${\times} $1080, with D and θ set to 15 cm and 90°, respectively. The images were taken under two lighting scenarios (ambient light and normal light), as shown in Fig. 11. We can see from Fig. 13 that the BER approached zero when the display was captured with HR values by all three models. On the other hand, when the camera recorded the display at the LR setting, the error level increased considerably. When an image is displayed on a screen with low resolution, it is distorted, fuzzy, and lacks clarity, which makes it harder to interpret and comprehend. This limits the decoder’s capacity to extract data from the image, making it more prone to data loss and producing images with a higher BER. Because Dense D2C-Net utilizes the Y channel to embed binary data, which is subjected to fewer image processing operations and less loss of color and saturation, it consistently outperformed Deep D2C-Net and StegaStamp in terms of BER performance on resolutions ranging from low to high. The encoder with hybrid layers and dense connections in the cover image block in Dense D2C-Net effectively embeds data in the Y channel, even in LR images, enabling the decoder network to successfully recover the data. The Dense D2C-Net decoder, which extracts salient features from the captured image by sending them through numerous convolutional layers, is also responsible for the better BER performance. As seen from the experiment, since the Dense D2C-Net is less sensitive to the receiving camera’s resolution, it is a better fit for the D2C context.

Fig. 13. BER performance of Dense D2C-Net, Deep D2C-Net, and StegaStamp by varying the resolution of the camera under different lighting conditions.

Download Full Size | PDF

5. Conclusion

In this study, we introduce an unobtrusive D2C communications technique based on a novel Dense D2C-Net. The Dense D2C-Net encoder processes a cover image through multiple convolutional layers with dense connections for feature reuse to maintain visual quality. The Y channel is chosen for embedding the data owing to its resilience against image distortion that results from compression and color variations, ensuring robustness in the optical wireless channel. Hybrid layers that combine feature maps from both the cover image and input binary data were designed to efficiently hide the embedded data. Since the received signal is susceptible to distortions arising from the optical wireless channel, several noise layers, which effectively counteract the impact of channel distortions on the transmitted data, are added after the encoder structure. At the decoder, output data are extracted from the data-embedded image using several 2D convolutional layers. The Dense D2C-Net encoder and decoder were iteratively trained using an end-to-end structure and were tested in real-world scenarios with an electronic display and a smartphone camera that captured data-embedded images through an optical wireless channel. Compared with existing DL-based approaches, the proposed Dense D2C-Net outperformed other schemes in terms of image quality and communication performance. By exploring the capabilities of the proposed Dense D2C-Net, we demonstrated successful data embedding and extraction for a D2C system using images displayed in various experimental environments, including changing the angle and distance of the display screen and the receiver camera, adjusting the display screen brightness under normal and ambient lighting, and varying the resolution of the camera.

Funding

National Research Foundation of Korea (2022R1A2B5B01001543).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are available in Ref. [39].

References

1. X. Hu, P. Zhang, Y. Sun, X. Deng, Y. Yang, and L. Chen, “High-Speed Extraction of Regions of Interest in Optical Camera Communication Enabled by Grid Virtual Division,” Sensors 22(21), 8375 (2022). [CrossRef]

2. P. Luo, M. Zhang, H. L. Minh, H. M. Tsai, X. Tang, L. C. Png, and D. Han, “Experimental demonstration of RGB LED-based optical camera communications,” IEEE Photonics J. 7(5), 1–12 (2015). [CrossRef]

3. H. Hass, J. Elmirghani, and I. White, “Optical wireless communication,” Philos. Trans. R. Soc., A 378(2169), 20200051 (2020). [CrossRef]

4. A. Al-Kinani, C. Wang, L. Zhou, and W. Zhang, “Optical wireless communication channel measurements and models,” IEEE Commun. Surv. Tutorials 20(3), 1939–1962 (2018). [CrossRef]

5. Z. Ghassemlooy, S. Zvanovec, M. A. Khalighi, W. O. Popoola, and J. Perez, “Optical wireless communication systems,” Optik 151, 1–6 (2017). [CrossRef]

6. S.U. Maheswari and D. J. Hemanth, “Frequency domain QR code based image steganography using Fresnelet transform,” AEU-International Journal of Electronics and Communications 69(2), 539–544 (2015). [CrossRef]

7. Q. Wang, M. Zhou, K. Ren, T. Lei, J. Li, and Z. Wang, “Rain Bar: Robust application-driven visual communication using color barcodes,” 2015 IEEE 35th International Conference on Distributed Computing Systems, (2015), pp. 537–546.

8. T. Hao, Z. Ruogu, and G. Xing, “COBRA: Color barcode streaming for smartphone systems,” Proceedings of the 10th international conference on Mobile systems, applications, and services (MobiSys ‘12). Association for Computing Machinery, (2012), pp. 85–98.

9. P. Singh, B.W. Kim, and Sung-Yoon Jung, “Performance Analysis of Display Field Communication with Advanced Receivers,” ICT Express 7(3), 392–397 (2021). [CrossRef]

10. X. Bao, J. Pan, Z. Cai, J. Li, X. Huang, R. Chen, and J. Fang, “Real-time display camera communication system based on LED displays and smartphones,” Opt. Express 29(15), 23558–23568 (2021). [CrossRef]

11. K. Jo, M. Gupta, and S. K. Nayar, “DisCo: Display-camera communication using rolling shutter sensors,” ACM Trans. Graph. 35(5), 1–13 (2016). [CrossRef]

12. T. Li, C. An, A. T. Campell, and X. Zhou, “HiLight: Hiding bits in pixel translucency changes,” SIGMOBILE Mob. Comput. Commun. Rev. 18(3), 62–70 (2015). [CrossRef]

13. L. D. Tamang and B. W. Kim, “Exponential data embedding scheme for display to camera communications,” 2020 International Conference on Information and Communication Technology Convergence (ICTC), (2020), pp. 1570–1573.

14. B. W. Kim, H. Kim, and S. Jung, “Display field communication: fundamental design and performance analysis,” J. Lightwave Technol. 33(24), 5269–5277 (2015). [CrossRef]

15. R. Mushu, T. Wada, K. Mukumoto, and H. Okada, “A proposal of information embedding scheme based on discrete cosine transform in parallel transmission visible light communications,” 2018 IEEE 7th Global Conference on Consumer Electronics (GCCE), (2018), pp. 175–176.

16. M. Liu, J. Shi, Z. Li, C. Li, J. Zhu, and S. Liu, “Towards better analysis of deep convolutional neural networks,” IEEE Trans. Visual. Comput. Graphics 23(1), 91–100 (2017). [CrossRef]

17. W. Rawat and Z. Wang, “Deep convolutional neural networks for image classification: A comprehensive review,” Neural Computation 29(9), 2352–2449 (2017). [CrossRef]

18. W. G. Hatcher and W. Yu, “A survey of deep learning: Platforms, applications and emerging research trends,” IEEE Access 6, 24411–24432 (2018). [CrossRef]

19. M. Plachta, M. Krzemień, K. Szczypiorski, and A. Janicki, “Detection of image steganography using deep learning and ensemble classifiers,” Electronics 11(10), 1565 (2022). [CrossRef]

20. M. Guarascio, M. Zuppelli, N. Cassavia, L. Caviglione, and G. Manco, “Revealing MageCart-like threats in favicons via artificial intelligence,” 17th International Conference on Availability, Reliability and Security, 2022, pp. 1–7.

21. S. M. Mun, S.H. Nam, H. Jang, D. Kim, and H.K. Lee, “Finding robust domain from attacks: A learning framework for blind watermarking,” Neurocomputing. 337, 191–202 (2019). [CrossRef]

22. M. Ahmadi, A. Norouzi, N. Karimi, S. Samavi, and A. Emami, “ReDMark: Framework for residual diffusion watermarking based on deep networks,” Expert Systems with Applications 146, 113157 (2020). [CrossRef]

23. P. T. Yu, H. H. Tsai, and J. S. Lin, “Digital watermarking based on neural networks for color images,” Signal processing 81(3), 663–671 (2001). [CrossRef]

24. D. Li, L. Deng, B. B. Gupta, H. Wang, and C. Choi, “A novel CNN based security guaranteed image watermarking generation scenario for smart city applications,” Inf. Sci. 479, 432–447 (2019). [CrossRef]

25. S. Baluja, “Hiding images in plain sight: Deep steganography,” Advances in neural information processing system. 30, 2066–2076 (2017).

26. P. Wu, Y. Yang, and X. Li, “Stegnet: Mega image steganography capacity with deep convolutional network,” Future Internet 10(6), 54 (2018). [CrossRef]

27. S. Baluja, “Hiding images within images,” IEEE Trans. Pattern Anal. Mach. Intell. 42(7), 1685–1697 (2020). [CrossRef]

28. C. Zhang, P. Benz, A. Karjauv, G. Sun, and I. S. Kweon, “Udh: Universal deep hiding for steganography, watermarking, and light field messaging,” Advances in Neural Information Processing Systems. 33, 10223–10234 (2020). [CrossRef]

29. K. Zhang, A. Cuesta-Infante, and K. Veeramachaneni, “SteganoGAN: Pushing the limits of image steganography,” arXiv, arXiv:1901.03892 (2019).

30. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), pp. 770–778.

31. G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), pp. 2261–2269.

32. J. Zhu, R. Kaplan, J. Johnson, and L. Fei-Fei, “HiDDeN: Hiding data with deep networks,” arXiv, arXiv:1807.09937 (2018). [CrossRef]

33. M. Tancik, B. Mildenhall, and R. Ng, “StegaStamp: Invisible hyperlinks in physical photographs,” arXiv, arXiv:1904.05343v2 (2020). [CrossRef]

34. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” arXiv, arXiv:1505.04597 (2015). [CrossRef]

35. L.D. Tamang and B.W. Kim, “Deep D2C-Net: Deep learning-based display-to-camera communications,” Opt. Express 29(8), 11494–11511 (2021). [CrossRef]

36. S. D. Lin, S. C. Shie, and J. Y. Guo, “Improving the robustness of DCT-based image watermarking against JPEG compression,” Computer Standards & Interfaces 32(1-2), 54–60 (2010). [CrossRef]

37. S. Zheng, S. Chen, and X. Yang, “DeepReceiver: a deep learning-based intelligent receiver for wireless communications in the physical layer,” arXiv, arXiv:2003.14124 (2020). [CrossRef]

38. R. Shin and D. Song, “JPEG-resistant adversarial images,” In NIPS 2017 Workshop on Machine Learning and Computer Security 1, 8 (2017).

39. G. Chaladze and L. Kalatozishcili, “Linnaeus 5 dataset for machine learning,” Chaladze (2017), http://chaladze.com/l5/.

40. K. Patel, H. Han, A.K. Jain, and G. Ott, “Live face video vs. spoof face video: Use of Moire patterns to detect replay video attacks,” International Conference on Biometrics (ICS), (2015), pp. 98–105.

Layers	Input	Filter	Kernel size	Output dimensions
Conv 1	(256, 256, 1)	32	3 $\times 3$	(256, 256, 32)
Conv A	(256, 256, 3)	32	3 $\times 3$	(256, 256, 32)
Conv 2	Hybrid layer 1	32	3 $\times 3$	(256, 256, 32)
Conv B	Conv A	32	3 $\times 3$	(256, 256, 32)
Conv 3	Hybrid layer 2 + Conv 1	32	3 $\times 3$	(256, 256, 32)
Conv C	Conv B	32	3 $\times 3$	(256, 256, 32)
Conv 4	Hybrid layer 3 + Conv 1 + Conv 2	32	3 $\times 3$	(256, 256, 32)
Conv D	Conv C	32	3 $\times 3$	(256, 256, 32)
Conv 5	Hybrid layer 4 + Conv 1 + Conv 2 + Conv 3	32	3 $\times 3$	(256, 256, 32)
Conv E	Conv D	32	3 $\times 3$	(256, 256, 32)
Conv 6	Hybrid layer 5 + Conv 1 + Conv 2 + Conv 3 + Conv 4	32	3 $\times 3$	(256, 256, 32)
Conv F	Conv E	32	3 $\times 3$	(256, 256, 32)
Conv 7	Hybrid layer 6 + Conv 1 + Conv 2 + Conv 3 + Conv 4 + Conv 5	16	3 $\times 3$	(256, 256, 16)
Conv G	Conv F	16	3 $\times 3$	(256, 256, 16)
Conv 8	Hybrid layer 7	16	1 $\times 1$	(256, 256, 16)
Conv 9	Conv 8	16	1 $\times 1$	(256, 256, 16)
Conv 10	Conv 9	3	1 $\times 1$	(256, 256, 3)

Layers	Input	Filter	Kernel size	Output dimensions
Conv 11	(256, 256, 3)	32	3 $\times 3$	(256, 256, 32)
Conv 12	Conv 11	32	3 $\times 3$	(256, 256, 32)
Conv 13	Conv 12	16	3 $\times 3$	(256, 256, 16)
Conv 14	Conv 13	16	3 $\times 3$	(256, 256, 16)
Conv 15	Conv 14	8	3 $\times 3$	(256, 256, 8)
Conv 16	Conv 15	8	3 $\times 3$	(256, 256, 8)
Conv 17	Conv 16	3	3 $\times 3$	(256, 256, 3)
Conv 18	Conv 17	1	1 $\times 1$	(256, 256, 1)
FC layer	Conv 18		-	(1, M)

Parameters	Value
Training dataset	MS COCO
Learning rate	10⁻⁴
Training steps	140,000
Batch size	4
Test images	Linnaeus dataset
Loss function	Cross entropy
Activation function	ReLU
Optimizer	Adam Optimizer

Parameter	Description
Display	Samsung monitor, resolution 2150 × 1920 pixels, with a display rate of 60 Hz
Receiver camera	iPhone 11, capture rate: 1080p HD at 60 fps
Data file	Small file, data underlying a plot in a figure
Number of data bits	200
Environment	Indoor
Lighting conditions	Normal and ambient

Method	PSNR [dB]	BER
HiDDeN [32]	37.84	0.336
SteganoGAN (Residual) [29]	39.31	0.410
SteganoGAN (Dense) [29]	41.26	0.461
StegaStamp [33]	21.79	0
Deep D2C-Net [35]	31.12	0
Dense D2C-Net [proposed]	32.00	0

Dense D2C-Net: dense connection network for display-to-camera communications

Abstract

1. Introduction

2. Related works

3. Proposed system

3.1 Reshaping and upsampling the input data

3.2 Encoding and decoding networks

3.3 Robustness against the optical wireless channel

3.4 Training

4. Experiment analysis and evaluation

4.1 PSNR and BER performance comparisons

4.2 Achievable data rate

4.3 Receiver orientation, angle and distance

4.4 Display brightness and ambient lighting

4.5 Resolution of the receiver camera

5. Conclusion

Funding

Disclosures

Data availability

References

Data availability

Cited By

Figures (13)

Tables (5)

Equations (3)

Optics Express