Slim and robust eye tracker on eyeglass temples with NIR patterned mirrors

Open Access

Abstract

Eye trackers play a crucial role in the development of future display systems, such as head-mounted displays and augmented reality glasses. However, ensuring robustness and accuracy in gaze estimation poses challenges, particularly with limited space available for the transmitter and receiver components within these devices. To address the issues, we propose what we believe is a novel eye tracker design mounted on foldable temples, which not only supports accurate gaze estimation but also provides slim form-factor and unobstructed vision. Our temple-mounted eye tracker utilizes a near-infrared imaging system and incorporates a patterned near-infrared mirror for calibration markers. We present wearable prototypes of the eye tracker and introduce a unique calibration and gaze extraction algorithm by considering the mirror's spatial reflectance distribution. The accuracy of gaze extraction is evaluated through tests involving multiple users with realistic scenarios. We conclude with an evaluation of the results and a comprehensive discussion on the applicability of the temple-mounted eye tracker.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The upcoming era of virtual and augmented reality (VR/AR) devices has brought attention to the field of eye tracking, as it plays a crucial role in enhancing user experiences and interactions within these immersive environments [1–8]. The visual system, being one of the dominant human senses, necessitates the integration of eye-tracking capabilities into near-eye devices such as head-mounted displays, eyeglasses, and contact lenses [9,10]. These wearable and head-mounted devices impose strict constraints on size, weight, and system integration, driving the adoption of eye trackers as both input and output devices within limited space [11–13]. Research on enhancing the accuracy of eye-tracking algorithms has also been reported [14,15].

To address the challenges associated with integrating eye trackers into wearable devices, particularly glasses-type devices, we propose a novel eye tracker design that positions the eye-tracking sensors on the temples while the lenses are coated with a near-infrared (NIR) mirror and patterns. The temple-mounted eye tracker overcomes the limitations posed by glasses-frame deformation by employing a pattern analysis process, ensuring robust performance in various frame-deformation situations. Furthermore, the eye tracker's placement on the temples allows it to be positioned beyond the user's visual field, providing unobstructed visual experiences.

In traditional tabletop gaze trackers, the user's gaze is ascertained by holistically employing data from the upper body, face, and eyes. Such multifaceted information is instrumental in extracting the gaze, even amidst shifts in the relative positioning between the tracker and the user. Conversely, the constrained space in wearable devices restricts eye trackers to rely predominantly on light reflection from the user's eyes and pupils. Consequently, wearable device gaze trackers operate under the presumption that the relative position between the user and the camera remains constant, thereby determining the genuine gaze and eye position. However, in wearable devices, the relative position between the transceiver and the eye may change due to frequent device usage, wearing status, or hardware deformation. This vulnerability is amplified when the wearable eye tracker is designed with a flexible hinge and frame, akin to everyday glasses, to ensure user comfort. The relative position change between the transmitter and receiver necessitates a new type of calibration.

In this study, we present our eye tracker design for glasses-type devices, where eye-tracking sensors are strategically placed on the temples, and the lenses feature a coating of NIR mirrors and patterns. Thanks to the pattern analysis process, our proposed system demonstrates robust performance even under glasses-frame deformation. We evaluate the eye tracker's performance in various frame-deformation scenarios, and our experimental results validate its ability to provide accurate eye-tracking performance. Additionally, user studies are conducted to assess the system's effectiveness across different users and re-wearing procedures. The work comprehensively describes the temple-mounted eye tracker, experimental results, and concluding remarks. The subsequent sections are organized to discuss related work and the proposed system, including the patterned NIR mirror and temple-mounted eye tracker, followed by experiments and conclusions.

2. Related work

Eye trackers have been extensively studied as core interaction devices for wearable devices, enabling accurate and efficient estimation of a person's gaze. To ensure the reliability and stability of eye trackers, many studies have focused on placing the receiver where it can capture the human eye directly. While such placements aid accurate gaze estimation, they often obstruct the user's field of view or limit the overall device design. These limitations become particularly critical in wearable devices that resemble regular eyeglasses in form factor.

One approach to address the issues is to integrate the eye tracker's components into another device element [16]. While the arrangement with MEMS devices proves beneficial, it has a significant drawback that hinders its application to self-luminous panels such as micro-LED and OLED, considered promising candidates for AR/VR displays. Consequently, integrating the eye tracker into the display becomes impractical. Another proposed method suggests adding the transmitters’ input grating adjacent to the display's input grating [17]. However, it requires a complex structure with bandpass filters and switchable gratings, which can reduce the display's waveguide's light efficiency and degrade the device's overall optical quality. An alternative study proposes locating the eye tracker's transmitter and receiver around the temples of glasses by employing an infrared reflecting surface with a metasurface [18]. While the configuration with the metasurface is efficient among the proposed methods, it presents challenges regarding the complex fabrication process for large-scale metasurfaces on eyeglasses and compromises the device's efficiency and captured image quality.

In addition to component placement and performance, calibration is a crucial parameter in eye tracking, particularly in head-mounted displays (HMDs). Due to variations in facial shape, including eye position, optical axis, and nose height, eye trackers in HMDs typically require a calibration procedure to establish the relationship between the system hardware and the user's gaze prior to use [19]. However, conventional eye tracker calibration assumes a fixed relative positional relationship between the user and the wearable device. In reality, due to user movements or changes in device position, maintaining a fixed position for the eye tracker relative to the eyes becomes challenging. A study has proposed a calibration method to account for relative positional changes between the user and the wearable device, enabling stable gaze estimation while users employ the wearable eye tracker by compensating for device slippage [20].

Nonetheless, the studies still assume that the receiver of the eye tracker is rigidly fixed to the wearable device. When considering alternative scenarios where the eyeglass frame features a foldable hinge, and the sensors are placed on the temple sides, the receiver of the eye tracker becomes displaced relative to the frame of the wearable device. Consequently, changes in the angle of the temples necessitate additional system calibration for eye trackers mounted on the temples. In this study, we propose a novel temple-mounted eye tracker design that addresses the abovementioned challenges. By positioning the eye-tracking sensors on the temples of glasses-type devices, we offer a solution that provides robust and accurate gaze estimation, even in the presence of temple angle variations. Our proposed system addresses the limitations of existing approaches, providing a more flexible and practical solution for wearable eye-tracking applications.

3. Temple-mounted eye tracker

3.1 System configuration

The proposed temple-mounted eye tracker system configuration, as depicted in Fig. 1(a), consists of a transmitter (Tx) comprising a near-infrared (NIR) LED, a receiver (Rx) consisting of a small form-factor NIR camera, and a patterned NIR mirror used as a reflector. Initially, NIR light is emitted from the LED, and the NIR mirror reflects the Tx's optical signal, directing it toward the user's eyes. As the NIR light interacts with the eye, it creates glints and features that enable the eye tracker to track gaze movements. The receiver then captures the reflected signal.

Fig. 1. (a) System configuration for temple-mounted eye tracker. The Tx and Rx mean an NIR transmitter and a receiver, respectively. (b) Illustration of coated areas (reflecting inclined NIR and transmitting visible light) and uncoated areas (transmitting visible light).

The NIR mirror is designed to exhibit high reflectance, specifically at the NIR wavelength and the desired range of angle of incidence as shown in Fig. 1(b). In contrast, visible light is transmitted through the mirror, ensuring the user's outside view remains clear and unobstructed. The NIR mirror incorporates invisible patterns to facilitate system calibration according to different angles of the eyeglasses’ temples. The lenses are coated with a patterned NIR coating, as illustrated in Fig. 1. During calibration, the system spatially analyzes the selected patterned area, which appears as a dark region in the Rx's image. The proposed system configuration enables accurate gaze estimation by utilizing the reflective properties of the patterned NIR mirror, the glints and features generated by the interaction of NIR light with the eye, and the analysis of the patterned NIR coating on the lenses. The proposed temple-mounted eye tracker offers a practical and effective solution for capturing eye movements and providing precise gaze estimation even when the angle of the eyeglasses’ temples changes.

3.2 Patterned NIR mirror

A vapor deposition process is employed to apply a reflective coating with high reflectance at the target wavelength and angle of incidence, which significantly impacts the optical efficiency of the eye-tracking system. Additionally, the coating should have a high transmission rate for visible light at normal incidence to ensure an unobstructed vision experience for the user. By exploiting the difference in wavelengths between the NIR Tx and visible light, the coating effectively separates functionality as a mirror for NIR light and a transparent glass for visible light. The cutoff wavelength of the coating is set to 840 nm, while the Tx operates at a wavelength of 940 nm with a target angle of incidence of 45 degrees. Given that the cutoff wavelength can shift with varying incidence angles, it is crucial to maintain a sufficiently high reflection rate around the target wavelength to ensure uniform optical efficiency across the Tx's field of view. Moreover, the coating effectively suppresses the reflection of normal incidence visible light to maintain clarity in the real world and prevent color distortions and rainbow effects. Figure 2 illustrates the high reflectance of the reflective coating at a 45-degree angle of incidence for NIR light while maintaining a relatively high transmittance for visible light at normal incidence.
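As a rough illustration of this angular dependence, the blue-shift of a dielectric stack's cutoff edge is often approximated by the first-order relation λ(θ) ≈ λ0 · sqrt(1 − (sin θ / n_eff)²). The sketch below simply evaluates this textbook formula; the 900 nm normal-incidence edge and the effective index n_eff = 1.8 are assumed values chosen for illustration, not parameters reported for the coating in this work.

```python
import numpy as np

def shifted_cutoff(lambda0_nm, theta_deg, n_eff=1.8):
    """First-order estimate of the blue-shift of a dielectric stack's
    cutoff edge with angle of incidence (n_eff is an assumed value)."""
    theta = np.radians(theta_deg)
    return lambda0_nm * np.sqrt(1.0 - (np.sin(theta) / n_eff) ** 2)

# Illustration only: an edge near 900 nm at normal incidence moves to
# shorter wavelengths as the incidence angle increases toward 45 degrees.
for theta in (0, 15, 30, 45):
    print(f"{theta:2d} deg -> {shifted_cutoff(900.0, theta):.0f} nm")
```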

Fig. 2. Reflectance and transmittance of the mirror coating with Tx’s wavelength range.

To create the pattern on the glass substrate, a masking process is utilized following the shaping and polishing of the substrate. The masking is achieved using two layers: low-temperature masking ink as the first layer, followed by alcohol-resistant ink as the top layer. The masking process defines a region on the glass surface where the coating is not deposited, allowing for the creation of patterns. After masking, the glass substrate undergoes evaporation processes to apply the coating, resulting in a total thickness of approximately 2.5 µm. Subsequently, the patterns are formed by peeling off the coating in the masked area, utilizing the height differences between the masked and non-masked regions. The resulting patterns are negative patterns on the glass surface that prevent the reflection of the Tx's optical signal. Figures 3(a) and (b) represent two types of spectacle lenses and their corresponding calibration patterns used in the wearable prototype of the temple-mounted eye trackers. Each pattern is specifically designed to accurately calculate the angle between the temples and frames while also minimizing the computational load required for pattern recognition algorithms.

Fig. 3. (a) An example of lens with ArUco marker and (b) checkers as negative patterns, (c) a rendered image of ArUco marker and (d) checkers with human head model and eye by a virtual Rx, (e) an example of pupil and pattern detection for ArUco marker with axes and (f) for checker patterns. Note that detection results for the squares inside of the rectangle frame are not plotted in the figure.

To facilitate calibration, ArUco markers and checkers are incorporated into the proposed patterns, strategically positioned near the periphery of the NIR mirror to avoid obscuring eye features and glints. Furthermore, to ensure robust operation with respect to the user's eye position, the pattern and Rx positions are determined using 3D graphics software. Rendered images in Figs. 3(c) and (d) show the lens and patterns with an eye in the field of view, illustrating bright eye images with dark patterns.

The masking technique proposed in this study allows for the printing of patterns with a tolerance of approximately 10 µm. The ArUco markers have a size of 5 mm with a 0.8 mm square feature width. Each checker square has a width of 1 mm, and the eight checkers are surrounded by a 9 mm × 4 mm rectangular frame. While any well-known fiducial markers, such as ARTag [21] and AprilTag [22], can be adopted as calibration markers if camera calibration is possible, the maximum spatial frequency of the patterns should account for the tolerance of the patterning process. Current semiconductor processes, including various etching techniques, ensure high accuracy suitable for mass production. However, the pattern printing process proposed in this study offers a more straightforward and significantly more cost-effective approach.

4. Eye tracking algorithm

4.1 User calibration

In order to accurately estimate the gaze direction based on the detected pupil position, it is necessary to establish a user-dependent mapping function that relates the pupil parameters to the corresponding gaze vector. A nine-point calibration session is conducted before the eye-tracking session to obtain this mapping. The procedure adapts the eye-tracking system to the specific characteristics of each user. During the calibration session, the subject is instructed to fixate their gaze on nine displayed targets in a predetermined order. Each target corresponds to one of nine display regions, which evenly divide the screen into a 3-by-3 grid of cells. As the subject sequentially fixates on each target, the eye-tracking system captures and records the subject's gaze movements.

To ensure accurate calibrations, a sufficient amount of data is collected for each target. More than 200 frames of the subject's eye movements are typically recorded before progressing to the next target. The procedure confirms that adequate eye movement data is captured and allows for robust calibration of the eye-tracking system. Throughout the calibration process, the eye tracker measures and records the relevant pupil parameters, such as pupil position and features, along with the known gaze vectors associated with each target. The data establish the user-dependent mapping function that accurately maps the detected pupil parameters to the corresponding gaze direction. By conducting the nine-point calibration session, we can effectively calibrate each user's eye-tracking system. The calibration step considers the unique characteristics of the user's pupils and gaze behavior, allowing for accurate and precise estimation of the gaze direction during subsequent eye tracking sessions.

4.2 Eye tracking algorithm with pattern calibration

In this study, we have implemented an eye-tracker that comprises several vital processes: marker detection, viewpoint compensation, pupil tracking, and gaze estimation (as depicted in Fig. 4). The relationship between glasses-frame deformation, the user's pupil position, and their gaze point is established through a general user calibration using nine fixed points. The functional modules involved in the eye-tracking system are described in detail below.

Fig. 4. The process of the proposed eye tracking algorithm. (a) An image of the user’s eye is taken for eye tracking. (b) ArUco markers in the eye image are detected. (c) Using the reference marker position, the image is warped so that the detected markers align with the reference viewpoint; the dashed line shows the gap between marker positions. (d) The position of the user’s pupil is detected using the warped eye image. (e) Using the user calibration data, the gaze vector is estimated.

The initial step of the eye-tracking process is marker detection. Our system analyzes negative patterns in the image and calculates the center position and transformation matrix of these patterns. Unlike conventional fiducial markers, the markers utilized in our proposed device do not possess an outline due to their inverted brightness. Consequently, we introduce a pattern detector designed explicitly for negative ArUco markers or checkers, employing a component analysis method. The detector first selects pixels that are expected to belong to the patterns in the receiver's (Rx) image and groups them into connected components. It then identifies markers by considering the size and center position of each group, in addition to assessing its similarity to the original pattern.
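A minimal sketch of this component-analysis idea is given below, assuming an 8-bit grayscale Rx image and the OpenCV library mentioned later in this section; the intensity threshold, area limits, and similarity score are illustrative values, not the parameters used in our implementation.

```python
import cv2
import numpy as np

def find_negative_markers(eye_img, template, min_area=80, max_area=4000, score_thr=0.6):
    """Detect dark (negative) calibration patterns in a bright NIR eye image:
    threshold dark pixels, group them into connected components, filter by
    size, and keep groups whose appearance correlates with the reference
    pattern. All thresholds here are illustrative assumptions."""
    _, dark = cv2.threshold(eye_img, 60, 255, cv2.THRESH_BINARY_INV)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(dark, connectivity=8)

    candidates = []
    th, tw = template.shape
    for i in range(1, n):                         # label 0 is the background
        x, y, w, h, area = stats[i]
        if not (min_area <= area <= max_area):
            continue
        crop = cv2.resize(eye_img[y:y + h, x:x + w], (tw, th))
        # normalized cross-correlation between the candidate and the pattern
        score = cv2.matchTemplate(crop, template, cv2.TM_CCOEFF_NORMED)[0, 0]
        if score > score_thr:
            candidates.append((tuple(centroids[i]), float(score)))
    return candidates
```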

The viewpoint compensation process is initiated after identifying the closest matching pattern. Firstly, we compute the level of glasses-frame deformation using perspective projection matrices, leveraging the differences between the detected and reference markers. The position of the reference marker is user-dependent, and thus, we obtain it from a training dataset collected during the user calibration session. The process warps the observed image onto the reference viewpoint, minimizing the effects of deformation. The image warping process unfolds as follows, predominantly utilizing the OpenCV library without resorting to deep learning-based algorithms. Initially, a marker is identified within the reference image, and its corners and distinct features are extracted. A similar procedure is then applied to detect the marker from the incoming image and log its features. By comparing these parameters, an optimal perspective transformation bridges the reference and the current image. Building on the pre-existing marker data, the homography of the new marker is deduced. The extraction of this homography predominantly leverages Zhang's SVD decomposition algorithm [23]. Subsequently, a warping computation aligns the perspective of the current image with that of the reference image.
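The compensation step can be sketched with the OpenCV homography API as follows; the function name, corner arrays, and RANSAC threshold are illustrative assumptions rather than the exact implementation used here.

```python
import cv2
import numpy as np

def compensate_viewpoint(curr_img, curr_corners, ref_corners):
    """Warp the current eye image onto the reference viewpoint.
    curr_corners / ref_corners: (N, 2) arrays of matching marker corners
    detected in the incoming and reference images (N >= 4)."""
    H, _ = cv2.findHomography(curr_corners.astype(np.float32),
                              ref_corners.astype(np.float32),
                              method=cv2.RANSAC, ransacReprojThreshold=2.0)
    h, w = curr_img.shape[:2]
    # Map the current view into the reference view so that the user
    # calibration collected at the reference temple angle stays valid.
    return cv2.warpPerspective(curr_img, H, (w, h))
```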

The warped image is then fed into the feature extractor, which is the pupil tracking system. Pupil data is constructed based on the pupil's center position and ellipse parameters. The pupil detector is constructed by modifying a well-known neural network architecture called U-Net [24,25]. We tailor the network by employing three contracting and expansive blocks, which are shallower than the original U-Net architecture. The modification is implemented to accommodate the smaller input size of the eye-tracker, which is set at 160 × 120 pixels (as shown in Fig. 5). The network is trained to segment the pupil region using the input data. As a post-processing step, pixels with high confidence are selected and grouped based on their distances. The group with the largest number of members is considered the pupil region. The outline of the region is computed using a Canny edge detector, and the ellipse parameters are obtained using an ellipse fitter.
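The post-processing chain described above (confidence thresholding, keeping the largest connected group, Canny edge detection, and ellipse fitting) can be sketched as follows; the 0.5 confidence threshold is an assumed value.

```python
import cv2
import numpy as np

def pupil_ellipse_from_confidence(conf_map, conf_thr=0.5):
    """Post-process a pupil-segmentation confidence map (values in [0, 1],
    shape 120 x 160) into ellipse parameters: threshold, keep the largest
    connected group, outline it with Canny, and fit an ellipse."""
    mask = (conf_map > conf_thr).astype(np.uint8) * 255

    # keep only the largest connected group of high-confidence pixels
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    if n < 2:
        return None
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    pupil = np.where(labels == largest, 255, 0).astype(np.uint8)

    # outline via Canny edge detection, then a least-squares ellipse fit
    edges = cv2.Canny(pupil, 50, 150)
    ys, xs = np.nonzero(edges)
    if len(xs) < 5:                      # fitEllipse needs at least 5 points
        return None
    pts = np.stack([xs, ys], axis=1).astype(np.float32)
    (cx, cy), (major, minor), angle = cv2.fitEllipse(pts)
    return (cx, cy), (major, minor), angle
```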

Fig. 5. The neural network architecture of the pupil tracker. The eye image has a size of 160 × 120. Each box denotes a multi-channel feature map, with the filter size at the bottom of the box. The left four blocks are contracting blocks, and the right three are expansive blocks. The yellow boxes show convolutional layers with a 3 × 3 kernel, and the orange boxes max pooling with a 2 × 2 kernel. Blue boxes denote up-convolution-and-concatenation layers. The purple box denotes the convolutional softmax layer.
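For reference, a minimal PyTorch sketch of a shallow U-Net-style segmenter of this kind for a 160 × 120 single-channel eye image is shown below; the channel widths and exact block arrangement are assumptions made for illustration, not the trained network used in this work.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    """Two 3x3 conv + ReLU layers; padding preserves the spatial size."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True))

class ShallowUNet(nn.Module):
    """Shallow U-Net-style pupil segmenter for a 1 x 120 x 160 NIR eye image,
    with three pooling stages and three expansive blocks.
    Channel widths (16/32/64/128) are assumed, not taken from the paper."""
    def __init__(self, n_classes=2):
        super().__init__()
        self.enc1, self.enc2, self.enc3 = conv_block(1, 16), conv_block(16, 32), conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)
        self.bottom = conv_block(64, 128)
        self.up3, self.dec3 = nn.ConvTranspose2d(128, 64, 2, stride=2), conv_block(128, 64)
        self.up2, self.dec2 = nn.ConvTranspose2d(64, 32, 2, stride=2), conv_block(64, 32)
        self.up1, self.dec1 = nn.ConvTranspose2d(32, 16, 2, stride=2), conv_block(32, 16)
        self.head = nn.Conv2d(16, n_classes, 1)      # per-pixel class scores

    def forward(self, x):
        e1 = self.enc1(x)                 # 120 x 160
        e2 = self.enc2(self.pool(e1))     # 60 x 80
        e3 = self.enc3(self.pool(e2))     # 30 x 40
        b = self.bottom(self.pool(e3))    # 15 x 20
        d3 = self.dec3(torch.cat([self.up3(b), e3], dim=1))
        d2 = self.dec2(torch.cat([self.up2(d3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))
        return torch.softmax(self.head(d1), dim=1)   # pupil-confidence map

# Sanity check: ShallowUNet()(torch.rand(1, 1, 120, 160)).shape == (1, 2, 120, 160)
```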

Subsequently, the gaze estimation process follows the pupil detection step. The gaze estimator functions as a mapping function that projects the pupil data to the user's gaze point. As the estimator is user-dependent, a training dataset needs to be acquired during the user calibration session. Our gaze estimator employs a shallow network consisting of three dense layers. The network exhibits a high sensitivity to the training dataset compared to conventional methods such as non-linear regression and support vector machines [2628]. To address this sensitivity, a data quality checker is introduced during the user calibration session. The checker monitors the pupil data stream and approves samples as a training dataset if the data exhibits stable variance and sufficient duration. It also filters out data collected during eye blinking or unnecessary eye movements. The approach guarantees that each session continuously provides refined samples for training. The samples in the training dataset are further processed using marker detection and viewpoint compensation techniques. Due to slight variations in individual marker positions relative to the reference marker position, the pupil data obtained from the rectified images is fed into the gaze estimator for training. Those pre-processing steps enhance the robustness of our gaze estimator to viewpoint-compensated images. Figure 6 illustrates the impact of image warping on eye tracking by changing temples’ angle gradually. The reference points were set at θ equals 0°.
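The gaze estimator and the calibration-time quality check described above can be sketched as follows, assuming a five-dimensional pupil feature (center plus ellipse parameters) and a 2D on-screen gaze target; the layer widths and variance threshold are illustrative, while the 200-frame minimum follows the user-calibration procedure in Section 4.1.

```python
import torch
import torch.nn as nn
import numpy as np

class GazeEstimator(nn.Module):
    """Three dense layers mapping pupil data (center + ellipse parameters)
    to a 2D gaze point. The 5-D input (cx, cy, major, minor, angle) and the
    hidden width are assumptions for illustration."""
    def __init__(self, in_dim=5, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2))

    def forward(self, pupil_feat):
        return self.net(pupil_feat)

def accept_fixation(pupil_centers, min_frames=200, max_std_px=2.0):
    """Data-quality check during user calibration: accept a window of pupil
    samples only if it is long enough and its variance is stable (rejecting
    blinks and stray movements). The pixel threshold is an assumed value."""
    pts = np.asarray(pupil_centers, dtype=np.float32)
    if len(pts) < min_frames:
        return False
    return bool(np.all(pts.std(axis=0) < max_std_px))
```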

Fig. 6. Comparison between original and compensated pupil image for three temple angles (gaze direction remains same).

Without image-warping compensation, the eye-tracking calibration information generated at 0° cannot be effectively utilized due to variations in pupil center positions for the same gaze position across the different temple angles. The misalignment introduces significant errors in gaze estimation. However, when image-warping compensation is applied, the pupil center positions remain relatively consistent across the different temple angles. Consequently, the calibration performed at 0° can be effectively utilized for different temple angle cases without noticeable errors. The proposed calibration demonstrates the robustness of our eye-tracking algorithm in mitigating the impact of glasses-frame deformation. By implementing image-warping compensation, our eye-tracking system successfully rectifies the observed images to align with the reference viewpoint, allowing for accurate gaze estimation across different temple angles. Proposed procedures enhance the reliability and robustness of our eye-tracking algorithm, making it suitable for applications involving glasses-frame deformation.

Our pupil segmentation module is derived from a weight-optimized version of SegNet. When executed 100 times on an RTX3090 with GPU acceleration, using the model saved in ONNX format, the average processing time was recorded as 0.0039 seconds.
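The latency measurement can be reproduced with a sketch like the one below, assuming the model has been exported to ONNX and is run through onnxruntime with CUDA acceleration; the file name and input shape are placeholders.

```python
import time
import numpy as np
import onnxruntime as ort

# Timing sketch matching the protocol described above: run the exported
# segmenter 100 times and average the latency. "pupil_segmenter.onnx" is a
# placeholder name, and the NCHW input shape is an assumption.
sess = ort.InferenceSession(
    "pupil_segmenter.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
inp_name = sess.get_inputs()[0].name
x = np.random.rand(1, 1, 120, 160).astype(np.float32)

sess.run(None, {inp_name: x})            # warm-up run, excluded from timing
t0 = time.perf_counter()
for _ in range(100):
    sess.run(None, {inp_name: x})
print("avg latency:", (time.perf_counter() - t0) / 100, "s")
```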

5. User study

5.1 Experiment method

The experiment was conducted to compare the eye-tracking accuracy of the proposed algorithm, as described in Section 4, with a conventional algorithm. The experiment aimed to demonstrate the proposed algorithm's improved robustness in handling changes in temple angles. The proposed eye tracker was implemented on eyeglasses, as shown in Fig. 7(a). The Tx and Rx modules (Pupil Labs, 192 × 192 pixels, max FPS = 200 Hz) were fixed on the right temple of the glasses. The patterned NIR mirror was cut to fit into the glasses, as depicted in Figs. 7(b) and (c). Informed written consent was obtained from the 8 participants recruited for the user study, all Asian males wearing glasses and affiliated with Samsung Research.

Fig. 7. (a) Wearable prototype with a user, (b) actual lens of the proposed NIR coating with ArUco marker and (c) checkers. (Both figures are captured under NIR lighting)

A 23-inch monitor (LS23C570, Samsung) with a resolution of 1920 × 1080 pixels and a refresh rate of 60 Hz was used to display the calibration target points. The data collection process involved two conditions. In the first condition, the subjects wore the glasses in a regular manner, while in the second condition, the temple-mounted eye-tracker was outwardly flexed and re-worn to produce displaced eye-tracking images intentionally. The user calibration matrix was established using only the frames collected under the first condition. The eye-tracking accuracy of both algorithms was then compared under both conditions.

As the starting point for the experiment, we adopted a user-centric approach that prioritizes individual comfort. Recognizing that everyone has different preferences when wearing glasses, we expressed the unfolding angle of the temples as an angle relative to each wearer. Our experiments began by identifying a comfortable temple angle for each participant and establishing it as the reference for all further adjustments. This methodology ensured that our results were rooted in the realistic and varied experiences of wearers.

During the data collection, solid circular targets with a radius of 10 pixels were sequentially presented in nine positions arranged in a 3 × 3 grid. The horizontal distance between the targets was 480 pixels, and the vertical distance was 270 pixels. Prior to data collection, the subjects were instructed to sit down, wear the glasses, and fixate on the center of the monitor, which was placed at a distance of 60 cm from the subjects. Also, during the data collection process, the subjects were instructed to maintain their gaze on each target until it changed to the next position. The operator adjusted the height of the chin-rest to align the eye position with the center of the monitor. To ensure accurate eye feature extraction, the position of the glasses on the head was adjusted to prevent overlap between the marker and the eye in the output image.
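For clarity, the nine target positions follow from the stated geometry (1920 × 1080 display, 480 px horizontal and 270 px vertical spacing); the sketch below enumerates one consistent layout, assuming the 3 × 3 grid is centered on the screen, and the presentation order is likewise an assumption.

```python
# Nine calibration target positions on the 1920 x 1080 display: a 3 x 3 grid
# with 480 px horizontal and 270 px vertical spacing, assumed to be centered
# on the screen (center target at the screen center).
W, H = 1920, 1080
DX, DY = 480, 270

targets = [(W // 2 + ix * DX, H // 2 + iy * DY)
           for iy in (-1, 0, 1) for ix in (-1, 0, 1)]
# [(480, 270), (960, 270), (1440, 270),
#  (480, 540), (960, 540), (1440, 540),
#  (480, 810), (960, 810), (1440, 810)]
```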

5.2 Experiment results

We compared the eye-tracking accuracy of the proposed algorithm in Section 4 with the conventional algorithm by means of a user study in order to demonstrate that the proposed algorithm is more robust to temple-angle changes. Figure 8(a) shows the pattern recognition and eye-tracking results. Patterns are outlined in white, and the segmented pupil area is represented by a red circle with three Cartesian axes. The estimated 3D eye model is shown as a blue circle, and the green lines represent the estimated gaze vectors. From the pattern and the pupil ellipse, a 3D eye model is estimated frame by frame. The gaze vector is then calculated as the line from the center of the eyeball to the center of the pupil.

Fig. 8. (a) Pattern recognition and eye tracking results. (b) Gaze vector errors with and without proposed compensations based on relative temple angles. (c) Gaze extraction results for three different wearing conditions.

The average eye-tracking accuracy of the two algorithms for all subjects under each condition is shown in Fig. 8(b) and Fig. 9. Error bars indicate standard error. The mean absolute error (MAE) of the gaze angle difference between the ground-truth target position and the estimated gaze position was used as the error metric. In the regular condition, the averaged eye-tracking errors were similar for the two algorithms (Conventional: 3.61, Proposed: 3.80). Changing the temple angle in the flexed condition degraded the eye-tracking accuracy of both algorithms. However, the averaged error of the conventional algorithm became almost double that of the proposed algorithm (Conventional: 15.49, Proposed: 7.91). This trend appeared similarly for all subjects, as shown in Table 1.
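The error metric can be computed as in the sketch below, assuming the estimated and ground-truth gaze directions are expressed as 3D vectors; this is a generic implementation of the angular MAE, not the authors' exact evaluation code.

```python
import numpy as np

def angular_mae(est_vecs, gt_vecs):
    """Mean absolute angular error (in degrees) between estimated and
    ground-truth gaze vectors. Both inputs are (N, 3) arrays; they are
    normalized here before the angle is taken."""
    est = est_vecs / np.linalg.norm(est_vecs, axis=1, keepdims=True)
    gt = gt_vecs / np.linalg.norm(gt_vecs, axis=1, keepdims=True)
    cosang = np.clip(np.sum(est * gt, axis=1), -1.0, 1.0)
    return float(np.degrees(np.arccos(cosang)).mean())
```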

Fig. 9. Eye tracking error comparison in the two test conditions (error bars indicate standard error).

Table 1. Eye-tracking error for conditions and subjects

5.3 Discussion

In the experimental results, the proposed algorithm exhibited superior eye-tracking accuracy compared to the conventional algorithm, as evidenced by the lower mean absolute error (MAE). The proximity of our IR camera to the subject's eye played a significant role, as even minor temple angle changes of less than 5 degrees resulted in noticeable changes in pupil pixel positions. To mitigate the impact of such angle changes, our proposed algorithm employed image warping to align the eye frame onto a reference viewpoint based on the marker position. While the proposed algorithm showed promising initial results, there is still room for improvement, especially considering the increased error observed in the flexed condition.

One primary source of error lies in the detection of ArUco markers. Due to the reflective coating limitations, we did not include a border around the markers in our implementation. The absence of a border can lead to less stable marker detection, mainly due to the overlap between the markers and the eyebrows. Our marker detection algorithm sometimes overestimates or underestimates the marker size and, in some cases, fails to detect the markers altogether. Enhancements and optimizations of the marker design and detection algorithm would significantly improve the accuracy of the proposed algorithm. Another source of error is the slippage of the eyeglasses during the change in temple angle condition as shown in Fig. 8(c). While intentionally flexing the temple outward to collect eye frames under different conditions, we could not maintain the eyeglasses’ consistent position and pose in the vertical direction. Our algorithm is robust to temple twists in the vertical or horizontal direction but does not explicitly account for eyeglass slippage. The errors could be effectively reduced by implementing a slip-proof algorithm, such as a pupil center and corneal reflections algorithm, to ensure more stable eyeglass positioning [29].

In systems such as gaze tracking, accuracy is sometimes considered to represent all of the system's performance. However, multiple factors interplay in this evaluation, potentially overshadowing the true essence of an eye tracker's capabilities. The role of preprocessing gaze data, for instance, is critical. It is shaped by the system's inherent characteristics and the hardware's condition—such as the image sensor's FPS—which affects how the gaze information is processed. In some scenarios, high-fps data might be a boon, while in others, like specific AR/VR contexts, it could be counterproductive. External variables, including a user's age, ambient light conditions, and the presence of a dominant eye, can also skew the fundamental tracking accuracy. These factors highlight the significance of scenario-specific data preprocessing. Directly contrasting a user's observed gaze point with their intended target may oversimplify our system's performance. While examining the temporal standard deviation of accuracy might illuminate the system's stability, it also risks revealing preprocessing trade-offs.

In exploring eye-tracking systems, we were prompted to consider the implications and challenges of marker-free methods. While the notion of an accurate eye tracker without markers is attractive, its practical execution presents challenges. Alterations in the eyeglass temples, such as folding or unfolding, induce significant shifts in the camera system. Compensating for these shifts without markers requires an external standard, possibly sensors capturing facial expressions or 3D head information. However, the efficacy of such indirect sources of information, given the present state of advanced eye trackers, remains to be determined. An alternative strategy would involve tracking tangible physical structures other than our proposed patterns on the reflective coating. A camera with a wide field of view might capture both the user's eyes and key facial features, providing insights into the temple's folded state. Yet, this method introduces its own set of complications. The vast diversity in eyeglass frame designs, especially those where aesthetics play a critical role, necessitates unique algorithms for each frame. A singular design requires a dedicated deep-learning network. Through these deliberations, we aim to provide a holistic understanding of the complexities and potential trajectories in developing marker-less eye-tracking systems that are incorporated within eyeglass temples.

Furthermore, it is essential to acknowledge the limitations of our study. The experiment was conducted with a relatively small sample size of 8 participants, all Asian males wearing glasses and affiliated with Samsung Research. Therefore, the generalizability of the findings to a broader population may be limited. Future studies should include a more diverse sample, encompassing individuals of different genders, ethnicities, and eyeglass types, to assess the algorithm's performance across a broader range of demographics. Additionally, further investigations could explore the impact of different environmental conditions, such as variations in lighting and background clutter, on the accuracy and robustness of the proposed algorithm. In conclusion, the proposed algorithm demonstrated improved eye-tracking accuracy and robustness in handling temple angle changes. However, there are opportunities for further improvement, particularly in marker detection and eyeglass slip-proof algorithms. Addressing these areas of concern would enhance the performance and applicability of the proposed eye-tracking system.

6. Conclusion

The manuscript presents the development of a temple-mounted eye tracker designed for glasses with foldable temples, aiming to provide unobstructed vision and accurate gaze estimation. The proposed system incorporates a NIR mirror with high transmittance for visible light, enabling the eye tracking system to be located on the temples. Wearable prototypes were implemented and evaluated by adjusting the opening angle of the temples. Through user studies, we demonstrated the system's ability to consistently capture reliable and stable gaze-tracking information from users with different face widths. The real-time pattern-matching algorithm employed in the system effectively calibrated gaze estimations, resulting in significantly reduced gaze errors compared to conventional methods. The research opens up possibilities for practical and seamless gaze tracking on eyeglasses, with potential applications in augmented reality, virtual reality, and human-computer interaction fields.

Acknowledgment

The research was supported by Samsung Research, Samsung Electronics Co., Ltd.

Disclosures

YJ, SS, BK, DYK, JCC: Samsung Electronics (F,E,P), KK, GY: Samsung Electronics (F,E)

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. W. Fuhl, T. C. Kubler, T. Santini, et al., “Automatic generation of saliency-based areas of interest for the visualization and analysis of eye-tracking data,” in VMV, (2018), pp. 47–54.

2. Q. Zhao and C. Koch, “Learning a saliency map using fixated locations in natural scenes,” J. Vis. 11(3), 9 (2011). [CrossRef]

3. Y. Li, P. Xu, D. Lagun, et al., “Towards measuring and inferring user interest from gaze,” in Proceedings of the 26th International Conference on World Wide Web Companion, (2017), pp. 525–533.

4. T. Toyama, T. Kieninger, F. Shafait, et al., “Gaze guided object recognition using a head-mounted eye tracker,” in Proceedings of the Symposium on Eye Tracking Research and Applications, (2012), pp. 91–98.

5. B. Guenter, M. Finch, S. Drucker, et al., “Foveated 3d graphics,” ACM Trans. Graph. 31(6), 1–10 (2012). [CrossRef]  

6. A. Patney, M. Salvi, J. Kim, et al., “Towards foveated rendering for gaze-tracked virtual reality,” ACM Trans. on Graph. (TOG) 35(6), 1–12 (2016). [CrossRef]  

7. L. L. Di Stasi, M. B. McCamy, A. Catena, et al., “Microsaccade and drift dynamics reflect mental fatigue,” Eur. J. Neurosci. 38(3), 2389–2398 (2013). [CrossRef]  

8. J. Zheng, K. Chan, and I. Gibson, “Virtual reality,” IEEE Potentials 17(2), 20–23 (1998). [CrossRef]  

9. L.-H. Lee and P. Hui, “Interaction methods for smart glasses: A survey,” IEEE Access 6, 28712–28732 (2018). [CrossRef]  

10. A. Olwal, K. Balke, D. Votintcev, et al., “Wearable subtitles: Augmenting spoken communication with lightweight eyewear for all-day captioning,” in Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology, (2020), pp. 1108–1120.

11. V. Tanriverdi and R. J. Jacob, “Interacting with eye movements in virtual environments,” in Proceedings of the SIGCHI conference on Human Factors in Computing Systems, (2000), pp. 265–272.

12. T. Piumsomboon, G. Lee, R. W. Lindeman, et al., “Exploring natural eye-gaze-based interaction for immersive virtual reality,” in 2017 IEEE symposium on 3D user interfaces (3DUI), (IEEE, 2017), pp. 36–39.

13. M. Kyto, B. Ens, T. Piumsomboon, et al., “Pinpointing: Precise head-and eye-based target selection for augmented reality,” in Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, (2018), pp. 1–14. [CrossRef]  

14. F. Lu, T. Okabe, Y. Sugano, et al., “A head pose-free approach for appearance-based gaze estimation,” In BMVC, (2011), pp. 1–11.

15. K.-H. Tan, D. J. Kriegman, and N. Ahuja, “Appearance-based eye gaze estimation,” in Sixth IEEE Workshop on Applications of Computer Vision, 2002.(WACV 2002). Proceedings., (IEEE, 2002), pp. 191–195.

16. J. Meyer, T. Schlebusch, W. Fuhl, et al., “A novel camera-free eye tracking sensor for augmented reality based on laser scanning,” IEEE Sensors J. 20(24), 15204–15212 (2020). [CrossRef]  

17. S. Robbins and I. A. Nguyen, “Waveguide eye tracking employing switchable diffraction gratings,” (2016). US Patent 9,494,799.

18. J.-H. Song, J. van de Groep, S. J. Kim, et al., “Non-local metasurfaces for spectrally decoupled wavefront manipulation and eye tracking,” Nat. Nanotechnol. 16(11), 1224–1230 (2021). [CrossRef]  

19. T. Santini, W. Fuhl, and E. Kasneci, “Calibme: Fast and unsupervised eye tracker calibration for gaze-based pervasive human-computer interaction,” in Proceedings of the 2017 chi conference on human factors in computing systems, (2017), pp. 2594–2605.

20. T. Santini, D. C. Niehorster, and E. Kasneci, “Get a grip: Slippage-robust and glint-free gaze estimation for real-time pervasive head-mounted eye tracking,” in Proceedings of the 11th ACM symposium on eye tracking research & applications, (2019), pp. 1–10.

21. M. Fiala, “Artag, a fiducial marker system using digital techniques,” in 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 2 (IEEE, 2005), pp. 590–596.

22. E. Olson, “Apriltag: A robust and flexible visual fiducial system,” in 2011 IEEE international conference on robotics and automation, (IEEE, 2011), pp. 3400–3407.

23. Z. Zhang and A. R. Hanson, “3D reconstruction based on homography mapping,” Proc. ARPA96, 1007–1012 (1996).

24. O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, (Springer, 2015), pp. 234–241.

25. S. Y. Han, H. J. Kwon, Y. Kim, et al., “Noise-robust pupil center detection through cnn-based segmentation with shape-prior loss,” IEEE Access 8, 64739–64749 (2020). [CrossRef]  

26. P. Blignaut, K. Holmqvist, M. Nystrom, et al., “Improving the accuracy of video-based eye tracking in real time through post-calibration regression,” Curr. Trends Eye Track. Res. 1, 77–100 (2014). [CrossRef]  

27. Z. Zhu and Q. Ji, “Robust real-time eye detection and tracking under variable lighting conditions and various face orientations,” Comput. Vis. Image Underst. 98(1), 124–154 (2005). [CrossRef]  

28. J. Mompeán, J. L. Aragón, P. M. Prieto, et al., “Design of an accurate and high-speed binocular pupil tracking system based on gpgpus,” The J. Supercomput. 74(5), 1836–1862 (2018). [CrossRef]  

29. E. D. Guestrin and M. Eizenman, “General theory of remote gaze estimation using the pupil center and corneal reflections,” IEEE Trans. Biomed. Eng. 53(6), 1124–1133 (2006). [CrossRef]  
