Passive indoor visible light-based fall detection using neural networks

Abstract

In this paper, a passive visible light sensing (VLS) fall detection system based on luminaires is proposed that uses neural networks to learn the state (i.e., upright or prone) of a target (e.g., a person). The proposed method measures the channel impulse response (CIR) between different source-receiver pairs in a passive scenario, where the user does not hold a device or sensor. The CIR measurements are collected in a realistically modeled room, and neural networks are employed to learn the relationship between the CIR measurements and the states of the target at randomly selected positions in the room. The performance evaluation of the system shows that an accuracy of more than 97% is attainable by utilizing a large number of data samples and a high brightness factor of the luminaires. The robustness of the proposed method is validated using a tilted state, which is labeled with the same class as the upright state but is not used to train the network. One of the key applications of fall detection is patient monitoring in the healthcare domain. The correct prediction of the prone state is particularly critical in such scenarios since emergency situations may arise from a fall.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Visible light positioning (VLP) systems [1,2] have attracted much attention recently due to the common availability of luminaires in the indoor environment. VLP techniques can be broadly classified into active and passive localization techniques based on the involvement of a user in the localization process.

In active localization, the user participates in the localization process by carrying a mobile device or receiver, and transmitted signals are measured at the receiver [1,3]. The energy in the line-of-sight (LoS) components of the detected signals plays an important role in positioning. However, a direct LoS is not necessarily required in passive localization scenarios [2,4,5], where the user does not hold any device, in contrast to more conventional active localization.

The detection of a human fall in a passive scenario can be considered a complementary application of passive localization techniques [4–6] since it does not require additional hardware resources but rather employs signal measurements that are already available in passive VLP methods. The authors in [7,8] provide a comprehensive overview of fall detection techniques. A majority of approaches require the use of devices worn by an individual to detect the fall. These devices include cameras, sensors attached to the user’s body (e.g., wrist, waist, or ankle), remote sensing receivers, or a combination of the aforementioned devices. The method proposed in [9] uses a purpose-built motion depth sensor to detect a human fall, where the data from the sensor are used to compute the velocity of the object. An anomalous change in the computed velocity is used to classify the fall. The methods in [10,11] use cameras to obtain images or video of a person in order to detect the fall. The useful features required to classify the fall are extracted from the images captured by the camera. In [12], an infrared-based motion detector is used to detect a fall. The method in [13] employs an array of ultrasonic sensors to perform gesture analysis of human posture in order to detect the fall.

Several fall detection systems discussed in [7,8,14] are based on wearable sensors, cameras, ambient sensors, or a hybrid combination of these. For instance, a recent work in [15] employs a camera that gathers real-time video of the shadows cast on a wall by the users and applies convolutional neural networks to count the number of people in the area and detect their activities. These earlier fall detection approaches have some drawbacks. The use of cameras for fall detection requires monitoring of user movement in the indoor area, which has privacy implications. Wearable sensor-based methods require the user to wear sensors in order to collect signal measurements. Radio frequency (RF)-based approaches [16] sense the movement of a user using radio waves in the indoor area in order to detect a fall; however, RF-based methods are prone to interference since radio waves can travel through walls and opaque objects, in contrast to light rays.

In this paper, a fall detection system is proposed that uses existing luminaires in a room in order to classify the state (i.e., upright or prone) of a target (i.e., a person) in an indoor environment. Unlike the visible light positioning work in [4–6], this work presents the first example of a visible light sensing (VLS) application. Here we consider a realistic room model (unlike the simplified single-bounce model in [4,5]) that captures a complex indoor environment with multi-order reflections and wavelength-dependent characteristics of sources and surfaces. In order to classify the state of the target, channel impulse response (CIR) measurements between source-receiver pairs are obtained for the upright and prone states of the target at uniformly distributed locations in the room. Neural networks are then applied to learn the relationship between the CIRs and the states of the target. The proposed method does not require the user to wear a sensor or hold a device in order to detect the fall, in contrast to many earlier methods [7,8,17]. Furthermore, the proposed method leverages existing lighting infrastructure, preserves the privacy of the user, and offers low complexity in predicting the fall due to the collection of CIR measurements in a short period of time, i.e., on the order of microseconds. It is important to note that, in order to detect the fall, the target is assumed to be present inside the room. Secondly, the proposed method uses a snapshot of CIR measurements between different source-receiver pairs in order to detect the fall. One of the important applications of fall detection techniques is to detect falls in healthcare settings since such events can result in serious injuries or even death in the case of a delayed emergency response.

The remainder of the paper is organized as follows. Sec. 2 describes the modeling of a realistic room scenario. Sec. 3 presents the performance evaluation of the proposed method. Finally, the paper is concluded in Sec. 4.

2. System model

An example of a passive scenario is shown in Fig. 1, where all three states (upright, tilted, and prone) of a user, assumed to be in the room a priori, are shown along with multi-order reflections of light rays between a source-receiver pair. The light rays $w_1$, $w_2$, $w_3$, and $w_4$, undergoing 2nd-order, 1st-order, 1st-order, and 3rd-order reflections, respectively, between the source $s_i$ and receiver $r_j$ are also shown in the figure. This concept is realized by modeling a room in Zemax OpticStudio [18] as described in Sec. 2.1.

Fig. 1. An example room scenario showing all three states (upright, tilted, and prone) of a user in the room. The rays $w_1$, $w_2$, $w_3$, and $w_4$ show 2nd-order, 1st-order, 1st-order, and 3rd-order reflections respectively. The silhouettes of a person are reproduced from [19].

2.1 Realistic room model

A realistic room environment similar to the one in [6] is modeled in Zemax OpticStudio and is shown in Fig. 2, with the sources and receivers co-located on the ceiling. The figure shows examples of the upright, tilted, and prone states of the target at several random locations, where all possible cases of the tilted state (north, east, south, and west directions) and prone state (horizontal (x-axis) and vertical (y-axis) directions) are shown. It is important to note that the tilted state is not used in the training process, but rather is used only to evaluate the robustness of the proposed fall detection method.

Fig. 2. Zemax model of the room containing co-located sources and receivers on the ceiling, furniture, and target with upright, tilted, and prone states. The target is located at (1.94 m, 2.01 m, 1.6 m) in the upright position and the room dimensions are 5 m $\times$ 5 m $\times$ 3 m. The tilted (north, east, south, and west) and prone (horizontal and vertical) states at random locations are also shown in (a) 3D view and (b) top view of the room.

The sources are assumed to have a Lambertian radiation pattern [20] given as

$$R\left( \varphi \right) = \frac{{m + 1}}{{2\pi }}{{\xi P_{\mathrm{max}}}}{\cos ^{m}}\left( \varphi \right)$$
where $m$, $\xi$, and $P_{\mathrm {max}}$ are the Lambertian index, brightness factor, and total output power of the source, respectively, and $\varphi$ is the angle formed between the normal of the source and the ray emerging from the source. Moreover, the sources are also assumed to have a variable flux distribution against the wavelengths of visible light (i.e., a photopic source in Zemax [18]). The receivers have an area $A_r$ and field-of-view (FOV) $\Psi$. It is important to note that the CIR measurement obtained between a source-receiver pair is directly proportional to $A_r$. A larger $\Psi$ allows the receiver to capture more detail in the CIR waveform. The effect of receiver parameters on the channel model has been investigated in [21]. Furthermore, coating materials with measured wavelength-dependent characteristics are used to coat the different surfaces in the room, including the furniture [22–24]. The CAD objects corresponding to the furniture are modeled using Blender [25].
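As an illustration of Eq. (1), the following minimal Python sketch evaluates the Lambertian radiant intensity of a source; the numerical parameter values are placeholders for illustration, not the values listed in Table 1.

```python
import numpy as np

def lambertian_radiant_intensity(phi, m=1.0, xi=1.0, p_max=1.0):
    """Radiant intensity R(phi) of a Lambertian source as in Eq. (1).

    phi   : angle (rad) between the source normal and the emitted ray
    m     : Lambertian index
    xi    : brightness factor in [0.1, 1]
    p_max : total output power of the source (W)
    """
    return (m + 1) / (2 * np.pi) * xi * p_max * np.cos(phi) ** m

# Example: intensity 30 degrees off-normal for a source at half brightness.
print(lambertian_radiant_intensity(np.deg2rad(30.0), m=1.0, xi=0.5))
```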

Consider the CIR $h_{{s_i}{r_j}}\left (\tau \right )$ between source $s_i$ and receiver $r_j$ that can be approximated as

$$h_{{s_i}{r_j}}\left(\tau\right) \approx \sum_{k = 1}^{{N_p}} {{I_k}\delta \left( {\tau - {\tau_k}} \right)}$$
where $N_p$ is the total number of rays detected at the receiver, $I_k$ is the intensity of the last segment of the detected ray, and $\tau _k$ is the time corresponding to the total path length of the detected ray. Examples of CIR waveforms $h_{{s_5}{r_5}}\left (\tau \right )$ corresponding to the upright, tilted, and prone states (west direction in the case of the tilted state and horizontal direction, i.e., along the x-axis, in the case of the prone state) between source $s_5$ and receiver $r_5$ when the target is located near the center of the room (see Fig. 2) are shown in Fig. 3(a). The CIRs can be well approximated as being time-limited, i.e., $\tau \in [0,\tau _{\mathrm {max}}]$, where $\tau _{\mathrm {max}}$ can be chosen such that sufficient detail of the multi-order reflections is captured and is limited in practice by the storage available at the receiver.
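For concreteness, a sampled approximation of the CIR in Eq. (2) can be assembled from the rays reported by the ray tracer as in the sketch below; the sampling interval and the ray data format are assumptions made for illustration only.

```python
import numpy as np

C = 3e8  # speed of light (m/s)

def cir_from_rays(intensities, path_lengths, tau_max=100e-9, dt=1e-10):
    """Build a sampled CIR: each detected ray contributes an impulse of
    intensity I_k at delay tau_k = (total path length) / c, as in Eq. (2).

    intensities  : detected-ray intensities I_k
    path_lengths : total path lengths of the detected rays (m)
    tau_max      : maximum CIR duration retained (s)
    dt           : sampling interval of the stored CIR (s), an assumption
    """
    n_samples = int(round(tau_max / dt))
    h = np.zeros(n_samples)
    for i_k, tau_k in zip(intensities, np.asarray(path_lengths) / C):
        if tau_k < tau_max:
            h[min(int(tau_k / dt), n_samples - 1)] += i_k
    return h
```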

Fig. 3. (a) Channel impulse response (CIR) $h_{{s_5}{r_5}}\left (\tau \right )$ between source $s_5$ and receiver $r_5$ when the target is located at (1.94 m, 2.01 m, 1.6 m) as illustrated in Fig. 2 for upright, tilted, and prone states (note that the tilted and prone states are not shown in Fig. 2 for this target location). The values of the parameters used in the simulation are listed in Table 1. (b) Difference of CIRs: upright and prone states, and tilted and prone states.

The CIR can be measured between a source-receiver pair by considering a single pair active at a time, with the measurement process repeated for all pairs. The measured CIR $g_{{s_i}{r_j}}\left (\tau \right )$ between source $s_i$ and receiver $r_j$ can be modeled as

$$g_{{s_i}{r_j}}\left(\tau\right) = h_{{s_i}{r_j}}\left(\tau\right) + n_{{s_i}{r_j}}\left(\tau\right)$$
where $n_{{s_i}{r_j}}\left (\tau \right )$ is additive white Gaussian noise (AWGN) with zero mean and variance $\sigma ^{2}$. Furthermore, the noise is considered to be independent amongst the receivers. The measured CIRs between $N_s$ sources and $N_r$ receivers corresponding to a single state of the target can be gathered in set ${{\cal G}}$ as
$${{\cal G}} = \left\{ {\begin{array}{*{20}{c}} {{g_{{s_1}{r_1}}}\left( \tau \right), \ldots ,{g_{{s_1}{r_{{N_r}}}}}\left( \tau \right)}\\ \vdots \\ {{g_{{s_{{N_s}}}{r_1}}}\left( \tau \right), \ldots ,{g_{{s_{{N_s}}}{r_{{N_r}}}}}\left( \tau \right)} \end{array}} \right\}$$

The CIRs between source $s_i$ and all the receivers can be obtained by turning on each source one at a time, with the measurement process repeated for all the remaining sources. The source-receiver pairs can be controlled from a central backbone network, potentially realized via Power over Ethernet.
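A minimal sketch of the measurement model in Eqs. (3) and (4), assuming the noiseless CIRs are available as a NumPy array, is given below; averaging over $N_a$ repeated acquisitions (introduced in Sec. 3) is included here for completeness.

```python
import numpy as np

def measure_cirs(true_cirs, sigma, n_a=1, rng=None):
    """Measurement model g = h + n of Eq. (3), with zero-mean AWGN of
    variance sigma^2, independent across source-receiver pairs. Averaging
    n_a repeated acquisitions reduces the effective noise variance.

    true_cirs : array of shape (N_s, N_r, L) holding sampled h_{s_i r_j}(tau)
    sigma     : noise standard deviation
    n_a       : number of acquisitions averaged per source-receiver pair
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, sigma, size=(n_a,) + true_cirs.shape)
    return (true_cirs[None] + noise).mean(axis=0)  # the set G of Eq. (4)
```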

It is important to note that the CIR is obtained for a target of fixed height and given reflectivity values. The peak in the CIR due to the presence of the target in the upright state occurs earlier in time compared to the rest of the CIR waveform, as shown in Figs. 3(a) and 3(b). The reflections from the sides of the target occur later in time and are also smaller in amplitude. Higher-order bounces occur much later in time due to their large path lengths and become submerged in the remainder of the CIR waveform. The figure also shows the CIRs due to the tilted and prone states, where the peaks due to these states occur relatively later in time compared to the upright state. Therefore, small changes in height and choice of clothing do not considerably affect the position and amplitude of the peaks corresponding to the different states in the CIR waveform. The robustness of the proposed method is numerically evaluated in Sec. 3.

The simulation results presented employ Zemax OpticStudio in order to model the room scenario. The use of ray tracing for room modeling has been widely investigated in the past [26–28]. For example, [28] showed that the CIR computed via a Zemax model and that of a comparable experimental setup have a root-mean-square error within 2%.

2.2 Neural network classification

Neural networks are used to classify the state of the target (i.e., upright or prone) in the room. Figure 4 shows the complete system model, including the feed-forward neural network (FNN) architecture [29] that is employed for classification of the state of the target.

The CIRs are measured in a similar fashion to that in [6]. Each measured CIR $g_{{s_i}{r_j}}\left (\tau \right ) \in {{\cal G}}$ in (4) is divided into $N_b$ time bins, where the time bins are defined as

$$\mathbb{T}_n = \left\{\begin{array}{ll} \left[(n-1)\tau_b, n\tau_b \right), & \textrm{for } n=1,2,\ldots,N_b-1\\ \left[(N_b-1)\tau_b, \tau_{\mathrm{max}} \right), & \textrm{for } n=N_b \end{array}\right.$$
where $\tau _b =\lceil \tau _{\mathrm {max}} /N_b\rceil$ denotes the bin size, $\tau _{\mathrm {max}}$ denotes the maximum duration of the CIR, and $\lceil \cdot \rceil$ is the ceiling function. It is important to note that all bins are of equal size when $\tau _{\mathrm {max}} /N_b$ is an integer. The $n^{\mathrm {th}}$ feature of $g_{{s_i}{r_j}}\left (\tau \right )$ is defined as $b_{{s_i}{r_j},n} = \int _{{\mathbb {T}_n}} {{g_{{s_i}{r_j}}}\left ( \tau \right )\,d\tau }$. The $N_b$ features from all CIRs are accumulated in an $N_b \times N_s \times N_r$ dimensional feature vector (data sample) as
$${\textbf{b}} = {\left[ {\begin{array}{*{20}{c}} {{b_1}} & {{b_2}} & \cdots & {{b_{{N_b} \times {N_s} \times {N_r}}}} \end{array}} \right]^{T}}$$
where the subscripts of the elements in ${\textbf {b}}$ are simplified to a single index running over its dimensions.
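The time-binning of Eqs. (5) and (6) can be sketched as follows, assuming each measured CIR is stored as uniformly sampled values with sampling interval `dt` (an assumption made for illustration); the integral over each bin is approximated by a Riemann sum.

```python
import numpy as np

def bin_cir(g, tau_max=100e-9, n_b=100, dt=1e-10):
    """N_b features of one measured CIR as in Eq. (5): the n-th feature is
    the integral of g(tau) over the n-th time bin, approximated here by a
    Riemann sum over the stored samples."""
    edges = np.linspace(0.0, tau_max, n_b + 1)            # bin boundaries
    t = np.arange(len(g)) * dt                            # sample times
    idx = np.clip(np.searchsorted(edges, t, side="right") - 1, 0, n_b - 1)
    return np.bincount(idx, weights=g * dt, minlength=n_b)

def feature_vector(measured_cirs, **kwargs):
    """Stack the N_b features of all N_s x N_r CIRs into one sample b, Eq. (6)."""
    n_s, n_r, _ = measured_cirs.shape
    return np.concatenate([bin_cir(measured_cirs[i, j], **kwargs)
                           for i in range(n_s) for j in range(n_r)])
```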

Fig. 4. Block diagram showing a complete system architecture along with feed-forward neural networks for classifying output state of a target.

Define ${{\cal T}}_{U,P}$ as the training data set containing samples corresponding to the upright and prone states only, and ${{\cal T}}_{U,T,P}$ as the training data set containing samples corresponding to all three states, i.e., upright, tilted, and prone. The validation data set is defined as ${{\cal V}}_{U,T,P}$ and contains samples corresponding to all three states. In order to train the network, two independent cases of network training are considered that depend on the type of training data used, as described in Sec. 3.

Define $\Gamma _{U,P}$ as the scenario in which the network is trained using ${{\cal T}}_{U,P}$ and validated with ${{\cal V}}_{U,T,P}$. Similarly, $\Gamma _{U,T,P}$ refers to network training with ${{\cal T}}_{U,T,P}$ and validation using ${{\cal V}}_{U,T,P}$. Note that the validation data set ${{\cal V}}_{U,T,P}$ contains all three target states in both cases of network training, and that there is no overlap between the training and validation samples.

The samples in ${{\cal T}}_{U,P}$, ${{\cal T}}_{U,T,P}$, and ${{\cal V}}_{U,T,P}$ are obtained using (4) and (6) and are preprocessed before being fed to the FNN architecture, as indicated in Fig. 4. The vector elements of the samples in ${{\cal T}}_{U,P}$, ${{\cal T}}_{U,T,P}$, and ${{\cal V}}_{U,T,P}$ follow the notation in (6) with relabeled subscripts.

It is important to note that the samples in ${{\cal T}}_{U,P}$, ${{\cal T}}_{U,T,P}$, and ${{\cal V}}_{U,T,P}$ are obtained with the target located at randomly distributed locations in the room, as described in Sec. 3. For preprocessing, the samples in the data sets ${{\cal T}}_{U,P}$ and ${{\cal V}}_{U,T,P}$ corresponding to the first case of network training $\Gamma _{U,P}$ are normalized as $\bar {{{\cal T}}}_{U,P} = \left \{ {{{\bar {\textbf {t}}}_{i({U,P})}} = \frac {{{{\textbf {t}}_{i({U,P})}}}}{{{M_{{{\cal T}}_{U,P}}}}}|i = 1,2, \ldots,{|{{\cal T}}_{U,P}|}} \right \}$ and $\bar {{{\cal V}}}_{U,T,P} = \left \{ {{{\bar{\textbf v}}_{j({U,T,P})}} = \frac {{{\textbf {v}_{j({U,T,P})}}}}{{{M_{{{\cal T}}_{U,P}}}}}|j = 1,2, \ldots,{|{{\cal V}}_{U,T,P}|}} \right \}$, respectively, where ${\textbf {t}}_{i({U,P})}$ denotes the $i^{\mathrm {th}}$ sample vector in ${{\cal T}}_{U,P}$, ${\bar {\textbf {t}}}_{i({U,P})}$ denotes the normalized $i^{\mathrm {th}}$ sample vector in $\bar {{{\cal T}}}_{U,P}$, $\textbf {v}_{j({U,T,P})}$ denotes the $j^{\mathrm {th}}$ sample vector in ${{\cal V}}_{U,T,P}$, ${\bar{\textbf v}}_{j({U,T,P})}$ denotes the normalized $j^{\mathrm {th}}$ sample vector in $\bar {{{\cal V}}}_{U,T,P}$, $M_{{{\cal T}}_{U,P}} = \mathop {\max }_{1 \le l \le {N_b} \times {N_s} \times {N_r},\,1 \le i \le {|{{\cal T}}_{U,P}|}} t_{l(U,P)}^{\left ( i \right )}$ is the maximum element over all vectors in ${{\cal T}}_{U,P}$, and the operator $|\cdot |$ denotes the cardinality of a set. The normalized data sets $\bar {{{\cal T}}}_{U,T,P}$ and $\bar {{{\cal V}}}_{U,T,P}$ are obtained analogously for the second case of network training $\Gamma _{U,T,P}$.
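A compact sketch of this normalization step is given below; both the training and validation samples are scaled by the maximum element of the training set only, as defined above.

```python
import numpy as np

def normalize_by_training_max(train_samples, val_samples):
    """Normalize training and validation samples by M_T, the maximum element
    over all training sample vectors. Inputs are arrays of shape
    (num_samples, N_b * N_s * N_r)."""
    m_t = train_samples.max()
    return train_samples / m_t, val_samples / m_t
```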

The normalized training and validation data sets are fed to the FNN architecture through an input layer, as illustrated in Fig. 4. This is followed by $N_l$ hidden layers and finally an output layer that predicts the output state $\hat {u}$ of the target. The parameter values used to model the FNN architecture are described in Sec. 3.

3. Numerical results

3.1 Simulation environment

The values of parameters used in defining the simulation environment for ray tracing in Zemax are listed in Table 1, which corresponds to the room model shown in Fig. 2.

Table 1. Simulation Parameters

The number of sources and receivers is $N_s = N_r = 9$, and they are co-located on the ceiling. In indoor environments, luminaires are typically installed to provide even illumination coverage over the whole room. Similarly, the number and FOV of the receivers can also be selected so that they cover the entire room. Though in this paper we only present results for a single illumination arrangement, our simulations have shown that, so long as the room is adequately covered by the illuminators and the receivers, good fall detection can be realized. Additionally, we consider a single target with given dimensions and optical reflectivities for all fall detection results in this section. Our numerical simulations also indicate that performance does not depend strongly on the reflectivity of the object sides but is dominated by reflections from the top surface of the target. The FNN may need to be retrained in the case of multiple targets with different heights; this is left as future work.

In order to collect the training (${{\cal T}}_{U,P}$ or ${{\cal T}}_{U,T,P}$) and validation (${{\cal V}}_{U,T,P}$) samples, the target is placed at randomly selected positions in the room for the upright, tilted, and prone states, and the CIRs are measured between all source-receiver pairs for the target in each state. The maximum time duration of the CIR is chosen as $\tau _{\mathrm {max}} = 100$ ns in order to capture sufficient detail from the multi-order reflections, as evident from Fig. 3(a). The 100 ns time duration for CIR acquisition corresponds to a total path length of around 30 m when a ray is traced from a source to a receiver with multi-order reflections, which is sufficient given the dimensions of the room shown in Table 1.

The tilted state is simulated randomly along four directions (north, east, south, and west) and the prone state is modeled randomly along the horizontal and vertical directions (x and y axes). The total number of locations considered is 1000, with each position containing upright, tilted, and prone states, where the tilted and prone states are divided equally among their respective directions, i.e., 250 per direction for the tilted state and 500 per direction for the prone state. The network is trained using the two types of training data, i.e., $\Gamma _{U,P}$ and $\Gamma _{U,T,P}$, separately in order to compare the performance between them as described in Sec. 2.2.

Define the state of the target by $u$. For classification, the samples corresponding to the target states are labeled as follows: the upright state by $u=1$ and the prone state by $u=2$. The tilted state is a transition between the upright and prone states and is assigned $u=1$. The target states are thus divided into non-prone ($u=1$) and prone ($u=2$) classes, where the non-prone class consists of the upright and tilted states, and the tilted state is only used to evaluate the robustness of the proposed method.
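The labeling and the two training scenarios can be made concrete with the short sketch below; the `(feature_vector, state)` sample format is an assumption used here for illustration, not the data structure used in the paper.

```python
STATES = ("upright", "tilted", "prone")

def assign_label(state):
    """Non-prone (upright or tilted) maps to u = 1, prone maps to u = 2."""
    return 2 if state == "prone" else 1

def build_training_set(samples, include_tilted=False):
    """Select training samples for the two scenarios: Gamma_{U,P} excludes
    tilted samples, Gamma_{U,T,P} keeps all three states. `samples` is a
    list of (feature_vector, state) pairs; validation keeps all states."""
    keep = STATES if include_tilted else ("upright", "prone")
    return [(b, assign_label(s)) for b, s in samples if s in keep]
```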

The noise model for signal-to-noise ratio (SNR) is considered similar to the one described in [30] and is given as

$$\mathrm{SNR} = \frac{\alpha^{2}\left(\xi P_{\mathrm{max}}\right)^{2}}{\sigma^{2}}$$
where $\alpha$ is the responsivity of the receiver measured in A/W, $P_{\mathrm {max}}$ is the maximum optical power emitted by the source in Watts, $\xi \in \left [0.1,1\right ]$ is the brightness factor of the source, and the noise is considered independent among all the source-receiver pairs. Define $N_a$ as the total number of CIR acquisitions corresponding to a single source-receiver pair. The SNR can be improved by increasing $N_a$ and then averaging the measurements [6,31]. Notice that increasing $N_a$ improves the SNR but results in increased latency for CIR acquisitions.
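For simulation purposes, Eq. (7) can be inverted to obtain the noise standard deviation corresponding to a target SNR, as in the sketch below; the default responsivity value is a placeholder rather than a value taken from Table 1.

```python
import numpy as np

def noise_sigma(snr_db, alpha=0.5, xi=1.0, p_max=1.0):
    """Solve Eq. (7) for sigma given a target SNR.

    snr_db : signal-to-noise ratio in dB
    alpha  : receiver responsivity (A/W), placeholder value
    xi     : brightness factor of the source
    p_max  : maximum optical power of the source (W)
    """
    return alpha * xi * p_max / np.sqrt(10 ** (snr_db / 10))
```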

The accuracy (%) of correctly predicting the target state is calculated as

$$\mathrm{Accuracy} = \frac{1}{{{|{{\cal V}}_{U,T,P}|}}}\sum_{j = 1}^{{|{{\cal V}}_{U,T,P}|}} {\mathbf{1}\left( {\hat u_j = u_j} \right)}$$
where $\mathbf {1}\left (\cdot \right )$ is an indicator function that outputs 1 when the predicted and actual class values match and 0 otherwise, and $|{{\cal V}}_{U,T,P}|$ is the total number of validation samples. The variables $\hat {u}_j$ and $u_j$ denote the predicted and actual class values, respectively, for the $j^{\mathrm {th}}$ sample in the validation data set ${{\cal V}}_{U,T,P}$. In order to avoid overfitting, the number of layers and the number of neurons in each layer are chosen after experimentation as $N_l = 3$ hidden layers with 500 neurons each, and all the training samples are used as a single batch while training the network [29]. Table 2 shows the values of the hyperparameters used to train the FNN, which were chosen after extensive simulation.
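A minimal sketch of the classifier stage is shown below using scikit-learn's MLPClassifier as a stand-in for the FNN; the paper specifies $N_l = 3$ hidden layers of 500 neurons each and full-batch training, while the remaining hyperparameters of Table 2 are not reproduced here, so library defaults are assumed.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def train_and_evaluate(train_x, train_y, val_x, val_y):
    """Train a feed-forward network with three hidden layers of 500 neurons
    on the full training set as a single batch, then return the validation
    accuracy of Eq. (8) (fraction of correctly predicted states)."""
    fnn = MLPClassifier(hidden_layer_sizes=(500, 500, 500),
                        batch_size=len(train_x), max_iter=2000)
    fnn.fit(train_x, train_y)
    return np.mean(fnn.predict(val_x) == val_y)
```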

Table 2. Hyperparameters and settings of FNN

3.2 Accuracy against the increasing time bin size $\tau _b$

The performance is evaluated by changing the time bin size $\tau _b$ used in the time-binning process of the CIRs. A large $\tau _b$ reduces the complexity of the analog-to-digital converter and the storage requirements at the receiver. Figure 5 shows the accuracy against increasing $\tau _b$ (or, equivalently, a decreasing number of bins $N_b$). The different curves in the figure correspond to different values of the brightness factor $\xi$ of the sources. The curves are obtained by averaging accuracy values over 30 separate experimental runs, using random training and validation samples in each run. The solid curves correspond to the first case of network training $\Gamma _{U,P}$ with $|{{\cal T}}_{U,P}| = 1600$ and $|{{\cal V}}_{U,T,P}| = 400$, and the dotted curves correspond to the second case of network training $\Gamma _{U,T,P}$ with $|{{\cal T}}_{U,T,P}| = 2400$ and $|{{\cal V}}_{U,T,P}| = 600$. It is important to note that the number of samples in the validation set ${{\cal V}}_{U,T,P}$ is divided approximately evenly among the upright, tilted, and prone states. The number of CIR acquisitions averaged for a single source-receiver pair, as defined earlier, is $N_a = 150$. This results in increased system latency for collecting the measurements in (4), i.e., from 900 ns for $N_a = 1$ to 135 $\mu$s for $N_a = 150$ for a CIR of length $\tau _{\mathrm {max}} = 100$ ns and $N_s = 9$ sources, since turning on one source at a time enables CIR acquisition at all the receivers. However, CIR measurements between all source-receiver pairs can still be obtained within a fraction of a second.
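The latency figures quoted above follow directly from $N_s$, $N_a$, and $\tau_{\mathrm{max}}$, assuming one $\tau_{\mathrm{max}}$-long acquisition per source activation, as illustrated below.

```python
def acquisition_latency(n_s=9, n_a=150, tau_max=100e-9):
    """Total time to acquire the measurement set in (4): the N_s sources are
    turned on one at a time, each activation lasts tau_max, and the process
    is repeated N_a times for averaging."""
    return n_s * n_a * tau_max

print(acquisition_latency(n_a=1))    # approx. 9.0e-07 s (900 ns)
print(acquisition_latency(n_a=150))  # approx. 1.35e-04 s (135 microseconds)
```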

The performance improves with increasing brightness factor $\xi$ of the luminaires, as expected. It can be inferred from the figure that the improvement in accuracy from including the tilted state in the training is modest, which shows the robustness of the proposed approach. However, this improvement comes at the cost of an increased training set size, i.e., $|{{\cal T}}_{U,T,P}| = 2400$.

Fig. 5. Accuracy against the increasing time bin size $\tau _b$ of the CIRs (or equivalently decreasing no. of bins $N_b$) for different values of $\xi$ of the luminaires. The number of training and validation samples for $\Gamma _{U,P}$ are $|{{\cal T}}_{U,P}| = 1600$ and $|{{\cal V}}_{U,T,P}| = 400$ respectively, and for $\Gamma _{U,T,P}$ are $|{{\cal T}}_{U,T,P}| = 2400$ and $|{{\cal V}}_{U,T,P}| = 600$ respectively.

3.3 Accuracy against the increasing brightness factor $\xi$

The performance is also evaluated with respect to the increasing brightness factor $\xi$ of the luminaires, as shown in Fig. 6. The different curves in the figure correspond to different values of $N_a$ used for averaging multiple CIR acquisitions. The curves are obtained by averaging accuracy values over 5 independent experimental runs using random training and validation samples in each run for the two separate cases of network training, i.e., $\Gamma _{U,P}$ and $\Gamma _{U,T,P}$. The number of training and validation samples for both training cases is the same as that used in Fig. 5. The time bin size used for binning the CIRs is $\tau _b = 1$ ns (or $N_b = 100$), which corresponds to the highest-bandwidth sampling shown in Fig. 5. The comparison of the curves shows that the performance of the system can be greatly improved by using CIRs averaged over a large $N_a$. The performance also improves with increasing brightness of the luminaires. Furthermore, the addition of the tilted state in training (dotted curves), i.e., $\Gamma _{U,T,P}$, shows comparable performance to using only the upright and prone states (solid curves), i.e., $\Gamma _{U,P}$, especially for $N_a = 150$. This demonstrates that the system can robustly distinguish the tilted state from the prone state even though the tilted state is excluded from training the network.

3.4 Accuracy against the increasing number of training samples

Figure 7 shows the accuracy against the increasing number of training samples for different brightness factors $\xi$ of the luminaires. The time bin size used for binning the CIRs is $\tau _b = 1$ ns ($N_b = 100$). The number of CIR acquisitions used for averaging is $N_a = 150$. The accuracy curves are averaged over 5 separate experimental runs. Figure 7(a) shows network training $\Gamma _{U,P}$ with $|{{\cal V}}_{U,T,P}| = 400$ samples and Fig. 7(b) shows $\Gamma _{U,T,P}$ with $|{{\cal V}}_{U,T,P}| = 600$ samples. The training and validation samples are chosen randomly during each experimental run, and the number of elements in ${{\cal V}}_{U,T,P}$ is divided approximately equally among the three states. As inferred from the figure, high accuracy can be achieved using a large training set with the luminaires set at full brightness. The comparison of Figs. 7(a) and 7(b) shows that the performance of network training $\Gamma _{U,P}$ is comparable to that of $\Gamma _{U,T,P}$. Secondly, the performance saturates after $|{{\cal T}}_{U,P}| = |{{\cal T}}_{U,T,P}| = 800$ samples in both figures. This indicates that the network can be trained with fewer samples instead of using the complete training data set.

3.5 Confusion matrix

Figure 8 shows the confusion matrices for one experimental run of network training using training and validation data that are uniformly distributed in the room. Figure 8(a) uses $|{{\cal T}}_{U,P}| = 1600$ and $|{{\cal V}}_{U,T,P}| = 400$ samples while Fig. 8(b) uses $|{{\cal T}}_{U,T,P}| = 2400$ and $|{{\cal V}}_{U,T,P}| = 600$ samples. The figure depicts the state prediction accuracy for both the upright (or tilted) and prone states of the target. The time bin size is $\tau _b = 1$ ns ($N_b = 100$) and the number of CIR acquisitions used for averaging is $N_a = 200$. The brightness factor of the luminaires is set at $\xi = 1$. An accuracy of more than 97% can be observed in Fig. 8(a) when only the upright and prone states are used in training the network, i.e., $\Gamma _{U,P}$. On the other hand, an accuracy greater than 98% can be observed in Fig. 8(b) when all three states are used in network training, i.e., $\Gamma _{U,T,P}$. The comparison of Figs. 8(a) and 8(b) shows that the system is robust in predicting the state of the target even when it is tilted.

The correct prediction of the prone state is of utmost importance in fall detection techniques. Apart from accuracy values, sensitivity and specificity measures are typically used to assess the performance of fall detection systems. The sensitivity measure reflects the correct prediction of true positives, i.e., how well the system performs in correctly predicting a fall, and the specificity measure reflects the correct prediction of true negatives, i.e., the robustness of the system in rejecting incorrectly predicted falls [7]. The sensitivity and specificity values can be calculated as

$$\begin{aligned} \mathrm{Sensitivity} &= \frac{N_{TP}}{N_{TP} + N_{FN}} \\ \mathrm{Specificity} &= \frac{N_{TN}}{N_{TN} + N_{FP}} \end{aligned}$$
where $N_{TP}$, $N_{TN}$, $N_{FP}$, and $N_{FN}$ represent the numbers of true positives, true negatives, false positives, and false negatives, respectively. The prone state ($u=2$) is considered the positive class and the upright (or tilted) state ($u=1$) the negative class. The sensitivity, specificity, and accuracy values corresponding to $\Gamma _{U,P}$ and $\Gamma _{U,T,P}$ for the experimental run in Fig. 8 are shown in Table 3.
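A short sketch computing these two measures from predicted and actual labels, treating the prone class ($u = 2$) as positive, is given below.

```python
def sensitivity_specificity(predicted, actual, positive=2):
    """Sensitivity and specificity as in Eq. (9); the prone state (u = 2) is
    the positive class and the non-prone state (u = 1) is the negative class."""
    tp = sum(p == positive and a == positive for p, a in zip(predicted, actual))
    fn = sum(p != positive and a == positive for p, a in zip(predicted, actual))
    tn = sum(p != positive and a != positive for p, a in zip(predicted, actual))
    fp = sum(p == positive and a != positive for p, a in zip(predicted, actual))
    return tp / (tp + fn), tn / (tn + fp)
```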

The comparison of the performance measure values between $\Gamma _{U,P}$ and $\Gamma _{U,T,P}$ shows the robustness of the fall detection system in predicting the prone state of the target even when the tilted state is excluded from network training, i.e., $\Gamma _{U,P}$.

Fig. 6. Accuracy against the increasing brightness factor $\xi$ of the luminaires for different averaging values $N_a$ of CIR acquisition. The time bin size is $\tau _b = 1$ ns. The number of training and validation samples for $\Gamma _{U,P}$ are $|{{\cal T}}_{U,P}| = 1600$ and $|{{\cal V}}_{U,T,P}| = 400$ respectively, and for $\Gamma _{U,T,P}$ are $|{{\cal T}}_{U,T,P}| = 2400$ and $|{{\cal V}}_{U,T,P}| = 600$ respectively.

Fig. 7. Accuracy against the increasing number of training samples for different brightness factor ($\xi$) of the luminaires. The time bin size is $\tau _b = 1$ ns and number of CIR acquisitions for averaging are $N_a = 150$. (a) $\Gamma _{U,P}$ with $|{{\cal V}}_{U,T,P}| = 400$ samples and (b) $\Gamma _{U,T,P}$ with $|{{\cal V}}_{U,T,P}| = 600$ samples.

Fig. 8. Confusion matrix showing prediction accuracy for upright/tilted ($u=1$) and prone ($u=2$) states. The time bin size is $\tau _b = 1$ ns ($N_b = 100$) and number of CIR acquisitions for averaging are $N_a = 200$. The brightness factor of the luminaires is $\xi = 1$. Network training (a) $\Gamma _{U,P}$ with $|{{\cal T}}_{U,P}| = 1600$ and $|{{\cal V}}_{U,T,P}| = 400$ and (b) $\Gamma _{U,T,P}$ with $|{{\cal T}}_{U,T,P}| = 2400$ and $|{{\cal V}}_{U,T,P}| = 600$.

Table 3. Sensitivity and Specificity values for experimental run (Fig. 8)

The proposed method achieves an accuracy greater than 97% and relieves the user from carrying a device or wearing a sensor, in contrast to most of the fall detection methods surveyed in [7,8,14], which require wearable sensors, camera-based sensors, ambient sensors, or a combination of them in order to detect falls. The ambient-sensor-based methods surveyed in [8] sense changes in the indoor environment, e.g., using radio frequency signals such as WiFi [16]. The method in [16] uses the channel state information of WiFi signals in order to detect the fall, and the reported accuracy is around 93%. The reported accuracy in [33] is greater than 95%; however, that method uses geophone sensors in the indoor area to detect the floor vibrations that occur due to the fall. The sensitivity and specificity of the fall detection method in [14] are 96%; however, the method requires the user to wear an accelerometer. Notice that in this work, an implicit assumption is that there is a person in the room. In a practical deployment, occupancy sensing must precede the fall detection stage, and visible light sensing approaches for occupancy sensing are interesting future directions for research.

4. Conclusions

A visible light sensing fall detection system is presented in a realistically modeled indoor environment that uses neural networks in order to predict the output state of a target. A set of channel impulse response (CIR) measurements between different source-receiver pairs in the room is obtained, corresponding to the upright or prone states of the target. The employed neural network architecture learns the relationship between a set of CIR measurements and the target states. The proposed method does not require any sensor tags attached to the user and predicts the fall under the assumption that the user is present inside the room. Though the results presented consider the use of visible light in order to detect a fall, the system can also be deployed using infrared (IR) source-receiver pairs to enable operation when the room is not illuminated.

The variation of accuracy with the time bin size of the CIRs shows that, with some trade-off in accuracy, a low-complexity, lower-bandwidth receiver can be used. Similarly, a trade-off between accuracy and luminaire brightness can be made, since the main purpose of the luminaires is to provide illumination in the indoor environment and their brightness is under the control of the user.

The proposed method learns the details in the CIR waveforms in order to differentiate between the prone and non-prone states. The robustness of the proposed method is also evaluated when one of the non-prone states (tilted) is not included in network training. An accuracy of more than 97% is reported when a large number of training samples is used with the luminaires set at full brightness. It is important to note that the proposed approach assumes that a person is present inside the room, in contrast to occupancy sensing approaches, which detect whether a person is present in the room. Our future work includes investigating passive visible light occupancy sensors to count the number of individuals in an indoor environment and integrating them with this fall detection approach. Our future work for the fall detection algorithm also focuses on using dynamic falling motion models of a person in the indoor area and collecting a series of measurements in order to track their state. Additionally, the room can be made more complex by including additional furniture and objects as well as alternate luminaire arrangements. Beyond this initial proof-of-concept study, future directions include a more accurate rendering of individuals of different sizes, a comprehensive study of the impact of different materials on system performance, and experimental verification. Furthermore, the measurement snapshot approach proposed here can be expanded to consider multiple snapshots of CIR measurements corresponding to the dynamic variation of the states of a person, along with prior information, in order to detect their fall. This approach may allow for more accurate fall detection over a variety of target sizes by tracking the dynamic evolution of the CIR during the falling process.

Funding

Natural Sciences and Engineering Research Council of Canada.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. T. Do and M. Yoo, “An in-depth survey of visible light communication based positioning systems,” Sensors 16(5), 678 (2016). [CrossRef]  

2. J. Singh and U. Raza, “Passive visible light positioning systems: An overview,” in Proc. Work. Light Up IoT, (Association for Computing Machinery, London, 2020), pp. 48–53.

3. M. T. Taylor and S. Hranilovic, “Angular diversity approach to indoor positioning using visible light,” in Globecom Work. (GC Wkshps), 2013 IEEE, (2013), pp. 1093–1098.

4. K. Majeed and S. Hranilovic, “Passive indoor localization for visible light communication systems,” in 2018 IEEE Glob. Commun. Conf., (Abu Dhabi, 2018), pp. 1–6.

5. K. Majeed and S. Hranilovic, “Performance bounds on passive indoor positioning using visible light,” J. Lightwave Technol. 38(8), 2190–2200 (2020). [CrossRef]  

6. K. Majeed and S. Hranilovic, “Passive indoor visible light positioning system using deep learning,” IEEE Internet Things J. 8(19), 14810–14821 (2021). [CrossRef]  

7. R. W. Broadley, J. Klenk, S. B. Thies, L. P. Kenney, and M. H. Granat, “Methods for the real-world evaluation of fall detection technology: A scoping review,” Sensors 18(7), 2060 (2018). [CrossRef]  

8. X. Wang, J. Ellul, and G. Azzopardi, “Elderly fall detection systems: A literature survey,” Front. Robot. AI 7, 71 (2020). [CrossRef]  

9. Y. Nizam, M. N. H. Mohd, and M. M. A. Jamil, “Human fall detection from depth images using position and velocity of subject,” Procedia Comput. Sci. 105, 131–137 (2017). [CrossRef]  

10. X. Wang and K. Jia, “Human fall detection algorithm based on YOLOv3,” 2020 IEEE 5th Int. Conf. Image, Vis. Comput. ICIVC 2020 pp. 50–54 (2020).

11. J. Zhang, C. Wu, and Y. Wang, “Human fall detection based on body posture spatio-temporal evolution,” Sensors 20(3), 946 (2020). [CrossRef]  

12. Q. Guan, C. Li, X. Guo, and B. Shen, “Infrared signal based elderly fall detection for in-home monitoring,” in Proc. 9th Int. Conf. Intell. Human-Machine Syst. Cybern. (IHMSC), vol. 1 (2017), pp. 373–376.

13. Y. T. Chang and T. K. Shih, “Human fall detection based on event pattern matching with ultrasonic array sensors,” Ubi-Media 2017 - Proc. 10th Int. Conf. Ubi-Media Comput. Work. with 4th Int. Work. Adv. E-Learning 1st Int. Work. Multimed. IoT Networks, Syst. Appl. pp. 8–11 (2017).

14. O. Aziz, M. Musngi, E. J. Park, G. Mori, and S. N. Robinovitch, “A comparison of accuracy of fall detection algorithms (threshold-based vs. machine learning) using waist-mounted tri-axial accelerometer signals from a comprehensive set of falls and non-fall trials,” Med. Biol. Eng. Comput. 55(1), 45–55 (2017). [CrossRef]  

15. P. Sharma, M. Aittala, Y. Y. Schechner, A. Torralba, G. W. Wornell, W. T. Freeman, and F. Durand, “What You Can Learn by Staring at a Blank Wall,” arXiv:2108.13027 [cs.CV] (2021).

16. S. Palipana, D. Rojas, P. Agrawal, and D. Pesch, “FallDeFi: Ubiquitous fall detection using commodity Wi-Fi devices,” Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 1(4), 1–25 (2018). [CrossRef]  

17. M. Gietzelt, J. Spehr, Y. Ehmen, S. Wegel, F. Feldwieser, M. Meis, M. Marschollek, K. Wolf, E. Steinhagen-Thiessen, and M. Gövercin, “GAL@Home: A feasibility study of sensor-based in-home fall detection,” Z. Gerontol. Geriatr. 45(8), 716–721 (2012). [CrossRef]  

18. Zemax OpticStudio, “Optical system design software,” Zemax (2020).

19. Vector Clipart, “Standing person silhouette,” (2020).

20. F. R. Gfeller and U. Bapst, “Wireless in-house data communication via diffuse infrared radiation,” Proc. IEEE 67(11), 1474–1486 (1979). [CrossRef]  

21. J. M. Kahn and J. R. Barry, “Wireless infrared communications,” Proc. IEEE 85(2), 265–298 (1997). [CrossRef]  

22. S. K. Meerdink, S. J. Hook, D. A. Roberts, and E. A. Abbott, “The ECOSTRESS spectral library version 1.0,” Remote Sens. Environ. 230, 111196 (2019). [CrossRef]  

23. A. M. Baldridge, S. J. Hook, C. I. Grove, and G. Rivera, “The ASTER spectral library version 2.0,” Remote Sens. Environ. 113(4), 711–715 (2009). [CrossRef]  

24. K. Lee, H. Park, and J. R. Barry, “Indoor channel characteristics for visible light communications,” IEEE Commun. Lett. 15(2), 217–219 (2011). [CrossRef]  

25. Blender, “Open source 3D creation software,” (2020).

26. J. R. Barry, J. M. Kahn, W. J. Krause, E. A. Lee, and D. G. Messerschmitt, “Simulation of multipath impulse response for indoor wireless optical channels,” IEEE J. Sel. Areas Commun. 11(3), 367–379 (1993). [CrossRef]  

27. F. Miramirkhani and M. Uysal, “Channel modeling and characterization for visible light communications,” IEEE Photonics J. 7(6), 1–16 (2015). [CrossRef]  

28. H. B. Eldeeb, M. Uysal, S. M. Mana, P. Hellwig, J. Hilt, and V. Jungnickel, “Channel modelling for light communications: Validation of ray tracing by measurements,” in 2020 12th Int. Symp. Commun. Syst. Networks Digit. Signal Process., (Porto, 2020), pp. 1–6.

29. I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, 2016).

30. J. Grubor, S. Randel, K. D. Langer, and J. W. Walewski, “Broadband information broadcasting using LED-based interior lighting,” J. Lightwave Technol. 26(24), 3883–3892 (2008). [CrossRef]  

31. J. J. Goldberger and J. Ng, Practical Signal and Image Processing in Clinical Cardiology (Springer, London, 2010), 1st ed.

32. The MathWorks Inc., “Deep learning toolbox,” (2020).

33. Y. Huang, W. Chen, H. Chen, L. Wang, and K. Wu, “G-Fall: Device-free and training-free fall detection with geophones,” in 2019 16th Annu. IEEE Int. Conf. Sensing, Commun. Netw., (IEEE, 2019), pp. 1–9.
