
Visual tracking for mobile optical wireless communications


Abstract

Acquisition, tracking, and pointing (ATP) mechanisms are generally adopted for optical wireless communications (OWCs) to maintain the strict alignment required for reliable communication. ATP mechanisms conventionally employ beacon lights to determine the orientation of the remote optical terminal and typically cannot maintain steady alignment under mobile conditions. A novel visual tracking approach is proposed to address this issue by establishing a shape beacon at the OWC terminals. The shape beacon can be sensed from virtually all directions with a camera; hence, visual tracking is an acceptable ATP solution for mobile OWCs. Results of numerical analysis indicate that visual tracking is suitable for mobile OWCs as well as conventional OWCs, with acceptable accuracy. Experiments were performed to demonstrate the visual tracking procedures using the vehicle itself as the shape beacon.

© 2020 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Optical wireless communication (OWC) refers to transmission in unguided propagation media using optical carriers in the visible, infrared, or ultraviolet bands. It is a line-of-sight (LOS) technology that propagates modulated light to transmit data between stations under stationary or mobile conditions. The notion of OWC includes free space optical (FSO) communication, which primarily employs lasers to transmit data, and visible light communication (VLC), which frequently employs light-emitting diodes (LEDs). OWC has attracted considerable attention because it can potentially provide transmissions at extremely high data rates between two terminals separated by a distance varying from nanometers to thousands of kilometers, and it possesses multiple advantages such as license-free band use, spatial reusability, security, and immunity to electromagnetic interference. OWC has found applicability in numerous use cases such as high-speed trains, unmanned aerial vehicles (UAVs), building-to-building networks, satellites, chip-to-chip networks, indoor and outdoor local- and wide-area networks, inter-vehicular and vehicle-to-infrastructure communications, underwater communications, and deep-space communications [1,2].

Acquisition, tracking, and pointing (ATP) mechanisms in OWC systems can avoid or reduce pointing errors by continuously measuring system-wide performance metrics, typically based on the received beacon signal, and then adjusting correction elements such as gimbals, mirrors, or adaptive optics [2]. Existing ATP mechanisms primarily employ beacon lights to determine the orientation of the remote optical terminal [2–5]. A quadrant photodetector (QPD) is widely used to track a beacon light by comparing the output signals gathered from its quadrants: when the laser beam is well aligned, the beam spot falls at the center of the QPD, implying that the output voltage levels from all quadrants are similar. A photo-sensor such as a position sensing diode (PSD) can also detect a laser beam and calculate its position based on the part of the photodiode’s surface area on which the laser is incident [4]. Even a camera has been employed as a light spot position sensor for coarse tracking and initial alignment [3]. Laser beams are typically employed as beacon lights for long- and medium-distance communications, e.g., ground-to-air [2,4], whereas LEDs with a focusing lens can be employed as beacon lights for short-distance communications, e.g., ground-to-train [2,5]. Conventional OWC systems were not designed to accommodate user mobility because strict alignment conditions are required for reliable communication. As related technologies advance, OWC systems are expected to be applied in broader scopes and markets, especially in mobile systems, e.g., ground-to-air [3], ground-to-train [4,5], and vehicle-to-everything (V2X) communications, requiring new ATP designs. The ATP system presented in [3] tracks the trajectory of the UAV aided by its GPS (global positioning system) position and uses a beacon laser with a beam width wide enough to compensate for the GPS error, so that the UAV is always illuminated by the beacon. However, a wide beacon beam decreases the pointing accuracy. Moreover, because the GPS module limits the update frequency of the UAV's position information, the UAV may occasionally miss the beacon. The existing ground-to-train schemes are application-specific and not suitable for other mobile scenarios. Rotating head designs could provide lighter and smaller gimbals for mobile OWC [6,7]; however, their methods of sensing position and alignment, which rely on motion calculation aided by GPS and radio communications, are neither accurate nor reliable according to their simulation results. They could not maintain continuous alignment even though they employed wide-beam transceivers.

With the development of artificial intelligence (AI) technology, deep learning has already been introduced into optical communication systems to analyze system performance degradation due to fiber impairments [8] and to realize intelligent dimming in VLC systems [9]. In addition to machine learning, computer vision is another major AI category, and it has been introduced to realize intelligent adaptive transmission for OWC systems [10]. However, none of these efforts has involved applying AI technologies to ATP systems. In this paper, we consider a visual tracking approach in which metrics based on target imaging, rather than the received beacon signal, are used to steer the gimbals and/or mirrors to aim at the target. A traditional beacon uses a directional light source, which can only be detected within an extremely limited angle. The proposed method adopts a new shape beacon, which can be captured at a wide angle and is hence suitable for universal mobile applications.

2. Visual tracking mechanism

There are different scenarios in mobile OWCs, for example, UAV-to-UAV, UAV-to-station, vehicle-to-vehicle, and vehicle-to-infrastructure communications. A challenge in these applications is signal acquisition and tracking between two targets with rapidly changing relative positions. Without loss of generality, herein we focus mainly on vehicle-to-infrastructure optical communication to illustrate the proposed methods.

Figure 1 displays vehicle-to-infrastructure and UAV-to-station optical communication scenarios. A fixed terminal can be used as the master OWC terminal to communicate with slave OWC terminals located on a vehicle or UAV. A traditional OWC terminal typically includes an ATP system and an optical antenna plus transceiver, as indicated in Fig. 2(a). The tasks of a traditional ATP system include pointing the transmitter in the direction of a receiver, acquiring the incoming light signal, and maintaining the OWC link by tracking the position of the peer OWC terminal. An ATP system includes units such as a beacon, gimbal, beacon sensor, and embedded computer. The beacon is usually a narrow light beam or laser intended to be detected by the sensor of the peer terminal; as previously stated, a QPD is the most widely used beacon sensor. A gimbal is a platform able to perform pan and tilt movements controlled by motors. The computer is connected to the sensor and gimbal and executes an ATP program that points at and tracks the peer OWC terminal by controlling the action of the platform. The two ATP systems of the peer terminals keep their optical antennas aligned so that the terminals can continuously exchange data through their transceivers. Our ATP system differs from existing ones in that we use a special beacon and a camera as the sensor. The key characteristic of the proposed visual tracking method is the use of a shape beacon, i.e., an object with a specific shape that is easily recognized from all directions. In fact, one can simply use the vehicle itself as a shape beacon: it is large, unique, and requires no additional objects. Alternatively, adding an object, e.g., a ball, can facilitate visual acquisition and tracking; a ball is an effective shape beacon because it appears the same from any angle. However, an added shape beacon is typically small compared with the vehicle itself and could be difficult to capture and identify correctly if a camera with a large field of view (FOV) is used. A camera with a smaller FOV is better for tracking smaller shape beacons; however, it is liable to lose sight of the beacon. If the beacon moves out of sight, the camera must spend time scanning to recapture it.

Fig. 1. Two mobile optical communication scenarios.

Fig. 2. (a) Block diagram of OWC terminal; (b) Front view of optical antenna.

Considering the vehicle-to-infrastructure (or UAV-to-station) scenario, the optical antenna of the stationary master terminal at the infrastructure must point to the antenna of the mobile slave terminal on the vehicle to transmit data effectively through a light beam. For an OWC terminal, the lens of the beacon sensor (the camera lens in our method) is usually packaged with the optical antenna and installed on the same gimbal, as illustrated in Fig. 2(b), such that its viewing direction remains aligned with that of the optical antenna. The master terminal can acquire the azimuth and elevation values, denoted by (a, e), of the vehicle by pointing to it and calculating, and then send this information via a low-speed communication channel, e.g., a radio channel or a one-way optical channel with an omnidirectional optical antenna at the receiver side. The slave terminal uses (a, e) to steer its gimbal and point to the master terminal.
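As a minimal illustration of this last step (our sketch, not from the paper), if both gimbals share a common azimuth reference, the slave can point back along the reversed line of sight by rotating the azimuth by 180° and negating the elevation:

```python
# Minimal sketch (an assumption for illustration, not the paper's code):
# given the azimuth and elevation (a, e) measured by the master toward the
# slave, the slave points back along the reversed line of sight.

def reciprocal_pointing(azimuth_deg: float, elevation_deg: float) -> tuple[float, float]:
    """Return the (azimuth, elevation) the slave should command its gimbal to,
    assuming both gimbals share the same azimuth reference (e.g., true north)."""
    back_azimuth = (azimuth_deg + 180.0) % 360.0  # flip the horizontal direction
    back_elevation = -elevation_deg               # looking down becomes looking up
    return back_azimuth, back_elevation

# Example: the master looks down at the vehicle at azimuth 30 deg, elevation
# -15 deg; the slave should point to azimuth 210 deg, elevation +15 deg.
print(reciprocal_pointing(30.0, -15.0))  # (210.0, 15.0)
```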

Using computer vision technology, the beacon image can be identified and marked with a rectangular box. Upon receiving each image, the computer executes the visual tracking program, which includes three procedures (a code sketch of the resulting loop follows the list).

  • 1. Scan and capture the shape beacon and mark it with a rectangular frame.
  • 2. Calculate the position shift between the aiming point and a specific point (here, the center) of the identified rectangular frame.
  • 3. Drive the gimbal to decrease the value of the position shift.
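
The three procedures form a simple closed loop. A minimal sketch follows (our reconstruction; the camera and gimbal objects and detect_beacon are hypothetical placeholders, with detection elaborated in Section 4):

```python
# Sketch of the three-procedure visual tracking loop (our reconstruction;
# camera, gimbal, and detect_beacon are hypothetical placeholders).
def tracking_loop(camera, gimbal, detect_beacon):
    while True:
        frame = camera.read_frame()
        box = detect_beacon(frame)           # procedure 1: capture and mark the beacon
        if box is None:
            gimbal.scan()                    # beacon not in view: keep scanning
            continue
        x, y, w, h = box
        dx = (x + w / 2) - frame.width / 2   # procedure 2: shift between the box
        dy = (y + h / 2) - frame.height / 2  # center and the aiming point
        gimbal.step(dx, dy)                  # procedure 3: drive the gimbal to
                                             # decrease the position shift
```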

3. Numerical analysis of pointing error

Figure 3 is a principle sketch of the imaging system. Assuming the slave terminal is mounted at the upper center of a car and the master terminal looks down at it, the aiming point should coincide with the central point of the beacon image. The camera can only perceive an area with a height of H and a width of V, i.e., its FOV. The digital image is composed of h × v pixels. From Fig. 3, the following formulas can be obtained from basic optics, where ε is the position shift at the image side due to the pointing error, α is the aiming angle error, D is the distance between the camera and the target (shape beacon), f is the focal length of the lens, and d is the distance between the lens and the image:

$$ \frac{1}{D} + \frac{1}{d} = \frac{1}{f}$$
$$ \alpha = {\tan ^{ - 1}}\left( {\frac{\varepsilon }{d}} \right) = {\tan ^{ - 1}}\left( {\frac{k}{D}} \right)$$
where k is the position shift at the object side, i.e.:
$$ k = \frac{{\varepsilon H}}{h} = \frac{{\varepsilon V}}{v}$$
In addition,
$$ AOV = {\tan ^{ - 1}}\left( {\frac{H}{D}} \right)$$
where AOV is the angle of view.
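
For convenience, Eqs. (1)–(4) can be transcribed directly into a short script for checking the numbers in the next two paragraphs (a sketch; variable names follow the symbols defined above):

```python
# Direct transcription of Eqs. (1)-(4) (a sketch; symbols follow the text:
# D object distance, f focal length, d image distance, eps/h relative
# position shift on the image, H height of the FOV at the object,
# k object-side position shift).
import math

def image_distance(D, f):
    """Eq. (1), the thin-lens equation: 1/D + 1/d = 1/f."""
    return 1.0 / (1.0 / f - 1.0 / D)

def pointing_error(eps_over_h, H, D):
    """Eqs. (2) and (3): alpha = atan(k / D) with k = (eps / h) * H."""
    k = eps_over_h * H           # object-side shift, Eq. (3)
    return math.atan(k / D)      # aiming-angle error in radians

def angle_of_view(H, D):
    """Eq. (4): AOV = atan(H / D)."""
    return math.atan(H / D)

# For D >> f, the image plane sits essentially at the focal length:
print(image_distance(1000.0, 0.040))             # ~0.040 m
# The example from the text: eps/h = 0.02, H = 10 m, D = 1000 m
# gives k = 0.2 m and alpha ~ 0.2 mrad.
print(pointing_error(0.02, 10.0, 1000.0) * 1e3)  # ~0.2 (mrad)
```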

Fig. 3. Principle sketch of the imaging system.

If ε/h = 0.02 and V < H = 10 m, then α is less than 0.2 mrad when D is greater than 1000 m; such ATP accuracy even meets the requirements of interstellar communication [11].

In vehicle-to-infrastructure communication cases, the distance could cover 10 to 1000 m. Figures 4(a), 4(b), and 4(c) present the numerical results calculated according to the above formulas. Figure 4(a) displays α versus D for different values of k. If α < 1 mrad (i.e., < 0.057°) is demanded, k should be kept under 0.01 m for D > 10 m. If H = 10 m, which means that a vehicle with a height of approximately 2 m appears large enough on the image to be identified, then ε/h must be no more than 0.001 to ensure α < 1 mrad. ε/h ≤ 0.001 indicates that the camera's vertical resolution should be considerably more than 1000 pixels. To relax the camera resolution requirement, the requirement on α should be relaxed at shorter distances. Fortunately, OWC systems do tolerate a relatively larger α at shorter distances.
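The resolution requirement above can be verified with the same formulas (a sketch using only the numbers stated in the text):

```python
# Quick check of the camera-resolution requirement (a sketch; all numbers
# are the ones stated in the text).
import math

alpha_max = 1e-3   # required aiming accuracy: 1 mrad
D_min = 10.0       # shortest working distance in meters (worst case)
H = 10.0           # FOV height at the object, m

# Inverting Eq. (2): k = D * tan(alpha). The constraint is tightest at the
# shortest distance, so k must stay below about 0.01 m.
k_max = D_min * math.tan(alpha_max)
print(f"k_max  = {k_max:.4f} m")    # ~0.01 m

# Inverting Eq. (3): eps/h = k / H, so the relative image shift must stay
# below 0.001, i.e., the vertical resolution h must be well above 1000 px.
print(f"eps/h <= {k_max / H:.4f}")  # 0.001
```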

Fig. 4. Numerical and experimental results.

Figure 4(b) indicates that the AOV decreases rapidly as D increases if H is fixed to 3 m, 10 m, or 100 m; if f is fixed instead, the AOV remains virtually constant. Figure 4(c) indicates that f must vary with D if H is fixed. If f is fixed to 40 mm, then H remains less than 100 m within the distance coverage (D < 1000 m); however, H should be greater than 3 m or the vehicle image may not be captured completely, and thus D should be greater than 30 m. To reduce the ATP system hardware cost, a fixed-focus lens is preferred; f = 40 mm is suitable for a distance coverage between 30 m and 1000 m. If a row of master terminals is deployed alongside the road and the distance gap between two adjacent terminals is less than 970 m, then continuous communication is practicable by adding a handoff mechanism. The AOV is approximately 5° when f = 40 mm. Performance improves when two cameras are used: one with a smaller AOV for tracking and another with a larger AOV for capturing. Figures 4(b) and 4(c) indicate that a camera with f = 2 mm (i.e., AOV = 60°) is suitable for capturing the shape beacon for D < 100 m.
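
These focal-length numbers can be cross-checked with simple geometry (a sketch; the sensor height is not stated in the paper, so we infer roughly 3.5 mm from AOV ≈ 5° at f = 40 mm, which is also consistent with AOV = 60° at f = 2 mm):

```python
# Cross-check of the fixed-focus numbers (a sketch; the sensor height h_s
# is an inferred assumption, not a value given in the text).
import math

h_s = 0.040 * math.tan(math.radians(5.0))  # inferred sensor height, ~3.5 mm

def aov_deg(f):
    """AOV = atan(h_s / f), following the paper's Eq. (4) convention."""
    return math.degrees(math.atan(h_s / f))

def fov_height(f, D):
    """Scene height H covered at distance D: H ~ D * h_s / f for D >> f."""
    return D * h_s / f

print(aov_deg(0.040))             # ~5 deg  (tracking camera)
print(aov_deg(0.002))             # ~60 deg (capturing camera)
print(fov_height(0.040, 30.0))    # ~2.6 m at D = 30 m: near the 3 m lower bound
print(fov_height(0.040, 1000.0))  # ~87 m at D = 1000 m: under the 100 m limit
```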

4. Experimental results

Visual tracking experiments were performed inside a room; a toy car was employed as the vehicle to be recognized. As indicated in Fig. 5, a digital camera was mounted on a pan-tilt platform. The platform could rotate from 0° to 355° in the horizontal direction, with a rotation speed of 9°/s. It could also rotate in the vertical direction, with a rotation range of −60° to +10° and a rotation speed of 4°/s. A laptop computer was connected to the camera and platform, executing a visual tracking program to acquire the real-time image, identify it, and then control the action of the platform.

Fig. 5. Experiment setup.

We used OpenCV (www.opencv.org) to assist in capturing the shape beacon. First, a median filter, a simple and practical nonlinear algorithm, was used to reduce picture noise. The recognition of a shape beacon is based on machine learning, which requires building a classifier for the problem. We used Haar features and the AdaBoost algorithm architecture to build the classifier. Building this classifier required two data sets: one containing positive samples and the other containing negative samples. The number of sample images in each set had to be sufficiently large, e.g., more than 5000. Sample images were obtained by searching the Internet and by taking pictures ourselves. The images first required normalization, i.e., they were adjusted to the same size and then converted to BMP format. Figure 6 shows four pictures; the left two are from the positive set and the right two are from the negative set. Each image in the positive set contained the object to be tracked, manually marked with the smallest rectangle containing its complete contour. Conversely, no image in the negative set contained the object to be tracked; it was better to use pictures with different backgrounds in the negative set. After obtaining the positive and negative sample sets, we imported them into the machine-learning framework of OpenCV and configured the hierarchical structure and operation time of the training. To balance accuracy and time cost, 15 computation layers and 2 weak classifiers were adopted. After training the classifier with the data sets, we could use it to identify the model car. As can be observed from the results in Fig. 7, when the shooting angle of the car was changed, the computer continued to identify and mark the car correctly.
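
The detection step can be sketched with OpenCV's Haar-cascade pipeline (a minimal reconstruction; the file name car_cascade.xml stands for the trained classifier, and the detectMultiScale tuning values are illustrative assumptions, not the paper's settings):

```python
# Minimal detection sketch with OpenCV's Haar-cascade pipeline (our
# reconstruction; "car_cascade.xml" is a placeholder for the classifier
# trained from the positive/negative sets described above).
import cv2

cascade = cv2.CascadeClassifier("car_cascade.xml")

def detect_beacon(frame):
    """Return the bounding box (x, y, w, h) of the shape beacon, or None."""
    denoised = cv2.medianBlur(frame, 5)                 # median filter for noise
    gray = cv2.cvtColor(denoised, cv2.COLOR_BGR2GRAY)
    boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
    if len(boxes) == 0:
        return None
    # If several candidates appear, keep the largest (the vehicle dominates).
    return max(boxes, key=lambda b: b[2] * b[3])
```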

Fig. 6. Pictures in the positive and negative sets.

Fig. 7. Correctly identified and marked car.

After identifying the shape beacon and marking it with a rectangular frame, the position shift between the aiming point and the central point of the rectangular frame could be obtained. Let Δx and Δy be the horizontal and vertical components of the position shift, and let e be the accepted pointing error in pixels. Each time the position shift value was updated (at a rate of 30 times per second), if Δx > e or Δx < −e, the computer sent a horizontal rotation instruction to the platform, ordering it to rotate clockwise or anticlockwise, respectively; if −e < Δx < e, it instructed the platform to pause. Similarly, if Δy > e or Δy < −e, the computer sent a vertical rotation instruction to the platform, ordering it to rotate upward or downward; if −e < Δy < e, it instructed the platform to pause. In this manner, the camera could track the toy car with a limited pointing error. The pointing error e should be preset carefully: if e is set overly small, the tracking system can become unstable. In our experiment, we set e = 10 pixels and the tracking system was stable. When the car moved slowly, the offsets Δx and Δy changed randomly, yet all values remained confined within e (see Fig. 4(d); we recorded 84 pairs of offsets, each pair corresponding to a point on the rectangular coordinate system). When the car moved on, the camera could continue tracking. However, if it moved excessively fast, i.e., at a speed the platform could not follow, the offsets Δx and Δy could grow owing to the limited rotation speed of the platform; the ATP system then re-entered the capturing procedure once the car moved slowly again.
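
This deadband control logic can be sketched as follows (our reconstruction; the platform method names are hypothetical placeholders for the pan-tilt control commands, and the up/down sign convention depends on the camera orientation):

```python
# Sketch of the deadband ("bang-bang") gimbal control described above (our
# reconstruction; rotate_horizontal / rotate_vertical / pause_* stand in
# for the actual pan-tilt platform interface).
E = 10  # accepted pointing error in pixels, as in the experiment

def steer(platform, dx, dy, e=E):
    """dx, dy: horizontal/vertical position shift in pixels (updated 30x/s)."""
    if dx > e:
        platform.rotate_horizontal("clockwise")
    elif dx < -e:
        platform.rotate_horizontal("anticlockwise")
    else:
        platform.pause_horizontal()     # -e < dx < e: inside the deadband

    if dy > e:
        platform.rotate_vertical("down")  # sign convention is an assumption
    elif dy < -e:
        platform.rotate_vertical("up")
    else:
        platform.pause_vertical()
```

Setting e too small makes the platform chatter around the setpoint, which is why the experiment settled on e = 10 pixels for stable tracking.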

5. Conclusion

A shape beacon can be viewed from a wide angle; hence, visual tracking can be an effective ATP solution for mobile OWC. Theoretical analysis indicates that visual tracking can be sufficiently accurate to meet the requirements of OWC systems employing narrow beams, making it suitable for universal mobile OWC such as vehicle-to-infrastructure applications. Employing a narrow beam increases the received power density and improves the link margin under diverse weather conditions. Our low-cost, proof-of-principle experiments demonstrated the visual tracking process for vehicle-to-infrastructure optical communications using the vehicle itself as the shape beacon, and showed that our method can maintain steady alignment under mobile conditions. Future work could involve performing experiments outdoors, employing real vehicles or UAVs carrying active OWC terminals.

Disclosures

The authors declare no conflicts of interest.

References

1. M. A. Khalighi and M. Uysal, “Survey on free space optical communication: a communication theory perspective,” IEEE Commun. Surv. Tutorials 16(4), 2231–2258 (2014). [CrossRef]  

2. Y. Kaymak, R. Rojas-Cessa, J. Feng, N. Ansari, M. Zhou, and T. Zhang, “A survey on acquisition, tracking, and pointing mechanisms for mobile free-space optical communications,” IEEE Commun. Surv. Tutorials 20(2), 1104–1123 (2018). [CrossRef]

3. A. Carrasco-Casado, E. Oton, R. Vergaz, M. A. Geday, J. M. Sanchez-Pena, and J. M. Oton, “Low-impact air-to-ground free-space optical communication system design and first results,” in Proc. IEEE Int. Conf. Space Opt. Syst. Appl. (ICSOS), 2011, pp. 109–112.

4. M. K. Al-Akkoumi, H. Refai, and J. J. Sluss Jr., “A tracking system for mobile FSO,” in Proc. Lasers Appl. Sci. Eng., San Jose, CA, USA, 2008, Art. no. 68770O.

5. S. Haruyama, H. Urabe, T. Shogenji, S. Ishikawa, M. Hiruta, F. Teraoka, T. Arita, H. Matsubara, and S. Nakagawa, “New ground-to-train high-speed free-space optical communication system with fast handover mechanism,” in Proc. Opt. Fiber Commun. Conf., Los Angeles, CA, USA, 2011, pp. 1–3.

6. M. Khan and M. Yuksel, “Maintaining a free-space-optical communication link between two autonomous mobiles,” in Proc. IEEE Wireless Commun. Netw. Conf. (WCNC), Istanbul, Turkey, 2014, pp. 3154–3159.

7. M. Khan and M. Yuksel, “Autonomous alignment of free-space-optical links between UAVs,” in Proc. 2nd Int. Workshop Hot Topics Wireless, Paris, France, 2015, pp. 36–40.

8. B. Karanov, M. Chagnon, F. Thouin, T. A. Eriksson, H. Bülow, D. Lavery, P. Bayvel, and L. Schmalen, “End-to-end deep learning of optical fiber communications,” J. Lightwave Technol. 36(20), 4843–4855 (2018). [CrossRef]  

9. H. Lee, I. Lee, T. Q. S. Quek, and S. H. Lee, “Binary signaling design for visible light communication: a deep learning framework,” Opt. Express 26(14), 18131–18142 (2018). [CrossRef]  

10. Z. Huang, L. Zu, Z. Zhou, X. Tang, and Y. Ji, “Computer-vision–based intelligent adaptive transmission for optical wireless communication,” Opt. Express 27(6), 7979–7987 (2019). [CrossRef]  

11. J. Tan, Y. Wang, M. Zhang, J. Liu, D. Liu, and J. Tang, “All-optical transparent forwarding relay system for interstellar optical communication networks,” IEEE J. Quantum Electron. 54(2), 1–7 (2018). [CrossRef]
