Abstract
The Received Signal Strength Indication (RSSI) obtained by Bluetooth can be used to estimate the proximity and duration of an individual’s exposure to patients diagnosed with COVID-19. This is because a stronger RSSI generally indicates closer proximity. This proximity information can be used alongside timestamp data to estimate the distance between individuals over time. However, due to the nature of radiofrequency signals, fluctuations in the RSSI make it difficult to associate a measured RSSI with an exact distance. Therefore, this project aims to determine whether two devices are closer than 6 feet apart by mapping RSSI values to distance classes using deep neural networks. It is also shown that there is an RSSI threshold value which can minimise the number of false positives.
Keywords: Contact Tracing, Bluetooth Devices, Machine Learning, COVID-19, Mobile App, Database, Privacy, Deep Neural Network, Google TensorFlow, Python
Introduction
Project Description
This project addresses a crucial task in contact tracing: predicting whether two devices are closer than 6 feet apart. To correctly alert people of possible transmissions, an algorithm must be able to process the given RSSI values and determine whether the two devices were close enough for a possible transmission. If an inaccurate or faulty algorithm is implemented, many people could be infected or inconvenienced.
In this project, repeated measurements of RSSI between two devices were collected at varying distances and in varying scenarios. The collected data was then used to assess whether an effective machine learning model for estimating distances is possible. The procedure of the experiment was as follows:
- Data was collected in two different settings: a) two Raspberry Pis in the same room and b) two Pis in different rooms. When the two Pis were in the same room, data was collected at distances of 3 feet, 6 feet, and 9 feet. When the two Pis were in different rooms, data was collected at distances of 3 feet and 9 feet.
- The dataset was enlarged by a factor of 100 through bootstrapping: resampling the whole dataset with replacement.
- An initial Keras Sequential model with one hidden dense layer of 64 units and the Adam optimiser was built.
- Hyperparameter tuning, the process of testing different parameters to find the best-performing model, was run on the dense layer size, the number of dense layers, and the optimiser.
- The optimal model (one dense layer of 128 units with the Adam optimiser) was trained for 250 epochs.
- Early stopping (halting training when the validation accuracy stops improving, so that the model from the epoch with the highest accuracy is kept) was implemented on the optimal model.
- The RSSI threshold value was estimated from the data collected with the devices 6 feet apart and tested on the Experiment I dataset (see Table I).
Background Information
Contact Tracing:
Contact tracing is used to slow the spread of infectious diseases. In general, contact tracing involves identifying index patients (initial disease carriers) and the people they came in contact with (contacts). [1] This includes asking people with positive diagnoses to isolate and their contacts to quarantine temporarily. [2]
While contact tracing is useful for preventing the spread of infectious diseases, traditional contact tracing can pose challenges in accuracy, efficiency, and privacy. [1] For example, manual contact tracing depends on a person’s ability to recall everyone they were in contact with over a certain period. [1] Also, people may be unable to reveal their private location data because of public policy. These difficulties in current contact tracing systems call for an automated, privacy-preserving contact tracing system. Today, most cell phones are equipped with Bluetooth modules that can advertise their presence through an anonymous signal. Using these phones as automated tracing devices could address most of these issues. [3]
False Positives / False Negatives:
Both false positives and false negatives must be considered. False positives are of concern because they may lead people to quarantine unnecessarily. False negatives are of concern because they mean that infected people may unknowingly spread the infection. [1]
Bluetooth Advertisement:
Bluetooth is a wireless technology standard used for exchanging data between fixed and mobile devices over short distances and for building personal area networks (PANs). It uses short-wavelength UHF radio waves in the industrial, scientific and medical (ISM) radio bands, from 2.402 GHz to 2.480 GHz. Bluetooth advertisements are recommended for contact tracing because these advertisement “chirps” can be privacy-preserving and anonymous. [3]
Machine learning:
Machine Learning is the study of computer algorithms that improve automatically through experience. Machine learning algorithms build a mathematical model based on sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to do so. After the model is trained, it can be evaluated using separate “test” and “validation” datasets. [6]
Assumptions:
It was assumed that people were in an indoor space with consistent advertising and scanning devices. The effects of weather conditions, clothing, or movement of the Pis were not addressed. Interference from other signals, such as Wi-Fi routers or other Bluetooth devices, was also not accounted for. These assumptions are important because different conditions (for example, indoor vs. outdoor, or sunny vs. cloudy) may affect the RSSI data even when it is collected over the same distance.
Hypotheses
- Using RSSI data values, it is possible to develop an effective deep neural network to classify distances between two Raspberry Pis with above 90% accuracy. Effectively recognising possible transmissions is vital to contact tracing. For that reason, a neural network that distinguishes safe distances from unsafe ones is highly relevant to projects such as the Private Automated Contact Tracing currently being conducted by the Massachusetts Institute of Technology, Google, and Apple. [7] A neural network can help identify possible transmissions with high speed, at low cost, and across millions of devices simultaneously. These identification results can be used to alert people who might have had close contact with a person who has tested positive. The model requires the most investigation because there are many parameters that can be fine-tuned.
- There is an effective RSSI threshold value which enables the Raspberry Pi to conclude, with 95% accuracy, that the other person is either too far away or separated by a wall or barrier. Finding an effective RSSI threshold that automatically disregards a certain range of data as too far away or behind an obstacle could be valuable in the long term. Because the disregarded data no longer needs to be processed by the model, the threshold allows for higher efficiency with larger datasets. Data collection requires the most investigation, as the distance(s) at which the data is collected must be determined.
Experiments and Data Collection
TABLE I. Experiment Overview
| Exp. # | Hypothesis | Reason | Repetition |
|---|---|---|---|
| 1: RSSI Data Collection Experiment | Using RSSI data values, it is possible to develop an effective deep neural network to classify distances between two Raspberry Pis with above 90% accuracy. | The RSSI data is needed to train the neural network and increase the accuracy of classification. | 4 |
| 2: Disregarding RSSI Threshold Experiment | There is an effective RSSI value threshold which enables the Raspberry Pi to conclude if the person is either too far away or separated by a wall or barrier with 95% accuracy. | The RSSI values at 6 feet must be collected to determine the threshold that disregards data most accurately. | 1 |
Plan and Execution
For the RSSI Data Collection Experiment, data was collected in the four scenarios listed below. In total, 2,000 data points were collected: 500 RSSI values for each scenario. Throughout this experiment, the two Raspberry Pis were stationary on the floor with minimal surrounding movement, and the same advertising and scanning Pis were used.
- Data collection with Pis 3 feet apart in an indoor open space.
- Data collection with Pis 9 feet apart in an indoor open space.
- Data collection with Pis 3 feet apart with an interior wall in between.
- Data collection with Pis 9 feet apart with an interior wall in between.
This data was used to help differentiate two people in the same room from two people in different rooms. The goal was to minimise the error of the detection system so that it does not classify two people as too close when they are separated by a wall. This data essentially serves as the control data.
TABLE II. Experiment I Information
| Scenario # | Conditions | Target Classification |
|---|---|---|
| 1: open 3 feet | 3 feet apart; indoor open space; 500 data points | 1 |
| 2: open 9 feet | 9 feet apart; indoor open space; 500 data points | 0 |
| 3: wall 3 feet | 3 feet apart; interior wall in between; 500 data points | 0 |
| 4: wall 9 feet | 9 feet apart; interior wall in between; 500 data points | 0 |
Fig. 2. A target classification of 1 means that the Pis were close enough for a possible transmission, while a target classification of 0 means that the Pis were too far apart or separated by a wall.
Bootstrapping was used to increase the number of data points without having to repeat the data collection process thousands of times. Resampling the entire dataset with replacement (the replacement parameter set to true) generates new data that follows the distribution of the original sample. Each resampled value is a copy of an original measurement. Bootstrapping was applied to the data in each scenario by a factor of 100, which gave a total of 200,000 data points (2,000 × 100). Bootstrapping this dataset was important because the distance model gave higher accuracies with more data points in the training dataset.
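The per-scenario bootstrapping described above can be sketched with pandas. This is a minimal illustration, not the project's code. The column names `scenario`, `rssi`, and `target`, the helper `bootstrap_per_scenario`, and the RSSI readings are all hypothetical: five made-up values per scenario stand in for the 500 that were collected.

```python
import pandas as pd


def bootstrap_per_scenario(df: pd.DataFrame, factor: int = 100, seed: int = 0) -> pd.DataFrame:
    """Resample each scenario's rows with replacement to `factor` times its size."""
    parts = [
        # frac > 1 is only allowed with replace=True; every new row copies an original row.
        group.sample(frac=factor, replace=True, random_state=seed)
        for _, group in df.groupby("scenario")
    ]
    return pd.concat(parts, ignore_index=True)


# Hypothetical readings (not the measured data): 5 per scenario instead of 500.
raw = pd.DataFrame({
    "scenario": ["open3"] * 5 + ["open9"] * 5 + ["wall3"] * 5 + ["wall9"] * 5,
    "rssi": [-52, -55, -54, -58, -53,
             -63, -66, -64, -61, -65,
             -71, -74, -70, -73, -72,
             -79, -77, -80, -76, -78],
    "target": [1] * 5 + [0] * 15,
})

boot = bootstrap_per_scenario(raw)
print(len(raw), len(boot))  # 20 2000
print(boot.groupby("scenario").size().to_dict())
```

With the real data (500 readings per scenario), the same call would give 50,000 rows per scenario and 200,000 in total. Every bootstrapped row duplicates an original reading, so identical values can end up in more than one of the later data splits.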
After bootstrapping, the data was split into training, validation, and testing datasets. 64% of the data was allocated to the training dataset, 16% to the validation dataset, and 20% to the testing dataset.
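One way to get the 64/16/20 proportions is to hold out 20% of the data for testing and then hold out 20% of the remainder for validation (0.8 × 0.8 = 0.64 and 0.8 × 0.2 = 0.16). The sketch below shuffles row indices with NumPy. It is an assumption about how the split could be done, not the project's code, and the helper name `split_indices` is hypothetical.

```python
import numpy as np


def split_indices(n_rows: int, test_frac: float = 0.2, val_frac: float = 0.2, seed: int = 0):
    """Shuffle row indices and split them into train/validation/test index arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_rows)
    n_test = int(round(test_frac * n_rows))             # 20% of all rows
    n_val = int(round(val_frac * (n_rows - n_test)))    # 20% of the remaining 80%
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test


train, val, test = split_indices(200_000)
print(len(train), len(val), len(test))  # 128000 32000 40000
```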
For Experiment II, 650 data points were collected from a single scenario. The Raspberry Pis were placed 6 feet apart on the ground in an open indoor space. Throughout the experiment, the Pis were kept stationary on the ground with minimal surrounding movement. The same advertising and scanning Pis used in Experiment I were used again.
Data Relevance
Collecting RSSI values at distances of less than 6 feet and more than 6 feet allows for the creation of a classification neural network. The effect of an interior wall on the RSSI values was also considered: the wall data from 3 feet and from 9 feet was added to class 0 (Table II). As a result, class 1 consisted of the RSSI values collected with the Pis less than 6 feet apart in an open space. Class 0 consisted of the RSSI values collected with the Pis more than 6 feet apart or separated by a barrier (in this case an interior wall). Wall data was incorporated into the data collection process to minimise the detection error of the model, so that it does not classify two people as too close when they are separated by a clear barrier.
Using this categorised data, the model should be able to differentiate RSSI values that indicate close contact from those that do not. If the model achieves at least 90% accuracy, the first hypothesis is likely to be true. By collecting RSSI values with the Raspberry Pis 6 feet apart, a disregarding threshold value to be used alongside the model could be estimated. If the threshold value disregards low RSSI values with an accuracy of at least 95%, the second hypothesis is likely to be true.
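The labelling rule of Table II can be written as a small function. The helper `target_class` is hypothetical, not code from the project. Treating exactly 6 feet as "not close" is also an assumption, because no 6-foot data was labelled in Experiment I.

```python
def target_class(distance_ft: float, wall_between: bool, cutoff_ft: float = 6.0) -> int:
    """1 = possible transmission (closer than the cutoff with no wall), 0 otherwise."""
    return int(distance_ft < cutoff_ft and not wall_between)


# The four Experiment I scenarios: (distance in feet, interior wall in between?)
scenarios = {
    "open 3 feet": (3, False),
    "open 9 feet": (9, False),
    "wall 3 feet": (3, True),
    "wall 9 feet": (9, True),
}
labels = {name: target_class(d, wall) for name, (d, wall) in scenarios.items()}
print(labels)  # {'open 3 feet': 1, 'open 9 feet': 0, 'wall 3 feet': 0, 'wall 9 feet': 0}
```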
Examples
Experiment I:
TABLE III. Distribution Graphs
[Figure: distribution graphs of the original and bootstrapped RSSI data for the Open 3 Feet, Open 9 Feet, Wall 3 Feet, and Wall 9 Feet scenarios.]
Experiment II:
Fig. 4. Experiment II RSSI Value Distribution collected at 6 feet in an open space.
Analysis
Description
Following the TensorFlow tutorial on classifying structured data with feature columns [4], a feed-forward sequential deep neural network was developed to classify whether two Raspberry Pis were less than 6 feet or more than 6 feet apart. The model was created in Python using Keras, TensorFlow, and pandas, with Google Colaboratory as the online IDE. The goal for this algorithm was to surpass 90% accuracy on the test dataset.
Fig. 5. Experiment I Initial Model Structure for Proximity Classification.
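Below is a minimal Keras sketch of the kind of feed-forward classifier described above. The same builder covers both the initial model (one dense layer of 64 units) and the optimal model reported in the results (128 units, 0.25 dropout). Several details are assumptions rather than details given in this paper: the single RSSI input feature, the ReLU activations, the sigmoid output, and the binary cross-entropy loss. `build_model` is a hypothetical helper.

```python
import tensorflow as tf


def build_model(num_dense_layers: int = 1, units: int = 64,
                dropout: float = 0.0, optimizer: str = "adam") -> tf.keras.Model:
    """Feed-forward binary classifier: RSSI in, probability of 'within 6 feet' out."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(1,)))  # assumed: one RSSI feature per example
    for _ in range(num_dense_layers):
        model.add(tf.keras.layers.Dense(units, activation="relu"))
        if dropout > 0:
            model.add(tf.keras.layers.Dropout(dropout))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))  # class 1 = within 6 feet
    model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=["accuracy"])
    return model


initial_model = build_model(num_dense_layers=1, units=64)
optimal_model = build_model(num_dense_layers=1, units=128, dropout=0.25)
print(initial_model.count_params(), optimal_model.count_params())  # 193 385
```

Varying `num_dense_layers`, `units`, and `optimizer` ("adam" or "rmsprop") produces the configurations compared during hyperparameter testing.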
After the initial model was trained for 40 epochs, it achieved 74% accuracy. Hyperparameter testing was then done to fine-tune the model’s parameters and search for better-performing model structures. The parameters tested were the number of dense layers, the layer size, and the optimiser. To further increase the accuracy, the optimal model was trained for 250 epochs with early stopping.
With the 650 data points collected in Experiment II, the disregarding threshold was calculated as the minimum RSSI value observed at 6 feet. The reasoning is that an RSSI value lower than the lowest 6-foot value indicates that the Pis are most likely farther than 6 feet apart. This method also effectively covers the wall data (the wall3feet and wall9feet datasets), as those RSSI values are dramatically lower than the Experiment II data.
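The 250-epoch training run with early stopping can be sketched with Keras's `EarlyStopping` callback. The callback monitors `val_accuracy` with `restore_best_weights=True`, which keeps the weights from the epoch with the highest validation accuracy and so matches the idea of stopping at the highest accuracy. The patience of 20 epochs, the toy data, and the activations are assumptions, not values from this project.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data (not the collected RSSI values); 1 = within 6 feet.
x_train = np.array([[-50.0], [-53.0], [-55.0], [-70.0], [-74.0], [-78.0]])
y_train = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
x_val = np.array([[-52.0], [-72.0]])
y_val = np.array([1.0, 0.0])

# Same shape as the optimal model: one dense layer of 128 units, 0.25 dropout, Adam.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.25),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stop once validation accuracy has not improved for 20 epochs, then roll back
# to the weights from the best epoch.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_accuracy", mode="max", patience=20, restore_best_weights=True
)

history = model.fit(
    x_train, y_train,
    validation_data=(x_val, y_val),
    epochs=250,
    callbacks=[early_stop],
    verbose=0,
)
print("epochs run:", len(history.history["val_accuracy"]))
```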
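The threshold rule can be expressed directly. The threshold is the minimum 6-foot RSSI, and a reading is disregarded only if it is strictly below that value. The accuracy is the share of disregarded readings that belong to class 0. The helper names and all numbers below are hypothetical illustrations, not the collected data.

```python
import numpy as np


def estimate_threshold(six_feet_rssi) -> float:
    """Disregarding threshold: the lowest RSSI observed with the Pis 6 feet apart."""
    return float(np.min(six_feet_rssi))


def evaluate_threshold(rssi, target, threshold):
    """Count readings below the threshold and the share of them that are class 0."""
    rssi = np.asarray(rssi)
    target = np.asarray(target)
    disregarded = rssi < threshold
    n_disregarded = int(disregarded.sum())
    if n_disregarded == 0:
        return 0, float("nan")
    return n_disregarded, float((target[disregarded] == 0).mean())


# Hypothetical values for illustration only.
six_feet = [-60, -63, -67, -65, -62]
rssi = [-55, -58, -70, -80, -66, -75]
target = [1, 1, 0, 0, 0, 0]

threshold = estimate_threshold(six_feet)
print(threshold, evaluate_threshold(rssi, target, threshold))  # -67.0 (3, 1.0)
```

A disregarded class 1 reading would be a missed close contact, which would lower the accuracy returned by `evaluate_threshold`.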
Results
TABLE IV. Evaluation Accuracy
| | RSSI Deep Neural Network | Disregarding Threshold |
|---|---|---|
| Final Accuracy | 86.28% | 100% |
Result Description for Hypothesis 1:
After running hyperparameter testing, the optimal model was found to use one dense layer with a layer size of 128, a dropout rate of 0.25, and the Adam optimiser. The final test accuracy of the model was 86.28%. The full hyperparameter testing results and the optimal model graphs are shown below.
Result Description for Hypothesis 2:
The minimum RSSI value of the Experiment II data was -67. Applying this threshold to the entire Experiment I dataset flagged 52,909 data points as too low. All 52,909 of these data points were in class 0, so the threshold achieved 100% accuracy in disregarding insignificant data: no class 1 (close-contact) data point was disregarded.
Hypothesis I Results:
Hyperparameter Testing Tables:
Optimiser: adam (accuracy)

| Hidden layer size | 1 Dense Layer | 2 Dense Layers | 3 Dense Layers |
|---|---|---|---|
| 64 | 0.8328 | 0.8606 | 0.8596 |
| 128 | 0.8624 | 0.8583 | 0.8309 |
| 256 | 0.8574 | 0.8608 | 0.8597 |

Optimiser: rmsprop (accuracy)

| Hidden layer size | 1 Dense Layer | 2 Dense Layers | 3 Dense Layers |
|---|---|---|---|
| 64 | 0.8356 | 0.8319 | 0.8350 |
| 128 | 0.8342 | 0.7592 | 0.7942 |
| 256 | 0.8346 | 0.7937 | 0.8248 |
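The search space of the hyperparameter testing tables has 2 optimisers × 3 layer sizes × 3 layer counts, giving 18 configurations. Picking the best configuration takes only a few lines of Python. The accuracies below are copied from the tables. In an actual search, each value would come from training and evaluating a model, a step that is omitted here.

```python
from itertools import product

# Accuracies copied from the hyperparameter testing tables,
# keyed by (optimiser, hidden layer size, number of dense layers).
reported = {
    ("adam", 64, 1): 0.8328, ("adam", 64, 2): 0.8606, ("adam", 64, 3): 0.8596,
    ("adam", 128, 1): 0.8624, ("adam", 128, 2): 0.8583, ("adam", 128, 3): 0.8309,
    ("adam", 256, 1): 0.8574, ("adam", 256, 2): 0.8608, ("adam", 256, 3): 0.8597,
    ("rmsprop", 64, 1): 0.8356, ("rmsprop", 64, 2): 0.8319, ("rmsprop", 64, 3): 0.8350,
    ("rmsprop", 128, 1): 0.8342, ("rmsprop", 128, 2): 0.7592, ("rmsprop", 128, 3): 0.7942,
    ("rmsprop", 256, 1): 0.8346, ("rmsprop", 256, 2): 0.7937, ("rmsprop", 256, 3): 0.8248,
}

# The search space: every combination of optimiser, layer size, and layer count.
grid = list(product(["adam", "rmsprop"], [64, 128, 256], [1, 2, 3]))
assert set(grid) == set(reported), "every grid point should have a reported accuracy"

best = max(grid, key=reported.get)
print(best, reported[best])  # ('adam', 128, 1) 0.8624
```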
Fig. 9. Experiment I Optimal Model Structure after Hyperparameter Testing
Fig. 10. Full model training with the optimal parameters: 250 epochs, Adam optimiser, hidden layer size of 128, and 1 dense layer.
Fig. 11. Highest accuracy for optimal model after implementing early stopping.
Conclusions
Hypothesis Evaluation
According to the Evaluation Accuracy table (Table IV), the final accuracy of the model was 86.28%. The first hypothesis was indeterminate, as the accuracy achieved was below 90%.
According to Table IV, the final accuracy of the disregarding threshold on the RSSI data was 100%. Therefore, the second hypothesis was supported, as the accuracy was above 95%.
Noteworthy Conclusions
- A deep neural network is able to classify distance from Bluetooth RSSI signals with 86.28% test accuracy.
- If two people (devices) are in different rooms separated by a wall, a neural network is able to exclude this case from scenarios involving actual contact, even when the people are within 6 feet of each other.
- RSSI value thresholds and neural networks can effectively reduce the number of false-positive cases.
General Lessons Learned
It is difficult to increase the accuracy of models that use only Bluetooth RSSI values. Since RSSI values fluctuate even when the advertising and scanning devices are stationary, it is difficult to draw reliable conclusions about the distance between them. This may explain the oscillations in the validation evaluation graphs. Either more RSSI data would need to be collected or more features would need to be added.
Next Steps
In terms of data, external factors such as temperature, humidity, and the presence of other radiofrequency signals could be incorporated into the model. The effects of exterior walls on the RSSI values could also be tested. If exterior walls attenuate the signal more than interior walls, it could be assumed that the model and threshold value would also correctly classify RSSI values measured through exterior walls as insignificant.
In terms of modeling methods, distance prediction formulas based on RSSI values could be applied to further increase the accuracy of the classification model. Additional hyperparameter testing on the optimal model could also be run.
For the practical aspect of real-world contact tracing, differences in RSSI measurements under various conditions could be considered. These include device orientation, device location on the human body, the model of the Bluetooth module, and surrounding articles of clothing.
Using the model, a mobile application that alerts the user to a potential disease transmission could be developed. SQL databases to hold Bluetooth RSSI information and user IDs could be created, making storing and processing data more organised and efficient. Lastly, privacy algorithms could be implemented to preserve anonymity and ensure that only necessary information is shared.
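One widely used formula of this kind is the log-distance path loss model, d = d0 × 10^((RSSI0 − RSSI) / (10n)). Here RSSI0 is the RSSI measured at a reference distance d0, and n is the path-loss exponent. The exponent is 2 in free space, and indoor environments usually need a separately calibrated value. The sketch below assumes this model. The reference RSSI of -60 at 1 foot and the exponent of 2 are placeholders that would need to be calibrated, for example from the 3-, 6-, and 9-foot measurements.

```python
def rssi_to_distance(rssi: float, rssi_at_ref: float,
                     path_loss_exponent: float = 2.0, ref_distance_ft: float = 1.0) -> float:
    """Invert the log-distance path loss model to estimate distance in feet."""
    return ref_distance_ft * 10 ** ((rssi_at_ref - rssi) / (10 * path_loss_exponent))


# Placeholder calibration (hypothetical): -60 at 1 foot, free-space exponent 2.
for reading in (-60, -70, -80):
    print(reading, round(rssi_to_distance(reading, rssi_at_ref=-60.0), 2))
```

Under these placeholder values, every 20-unit drop in RSSI corresponds to a tenfold increase in estimated distance. The estimate could be compared with the 6-foot cutoff or supplied to the neural network as an additional feature.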
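A minimal sketch of such a database uses Python's built-in `sqlite3` module. Several details are assumptions for illustration: the table name `encounters`, its columns, and storing anonymous peer IDs instead of real identities. The query applies this project's -67 disregarding threshold, so only readings at or above it are kept as possible contacts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.execute(
    """
    CREATE TABLE encounters (
        id INTEGER PRIMARY KEY,
        user_id TEXT NOT NULL,          -- the scanning user's own (local) ID
        peer_anon_id TEXT NOT NULL,     -- anonymous ID advertised by the other device
        rssi INTEGER NOT NULL,
        observed_at TEXT NOT NULL       -- ISO 8601 timestamp
    )
    """
)

# Hypothetical sightings logged by one device.
rows = [
    ("user-1", "anon-a", -55, "2021-04-01T10:00:00"),
    ("user-1", "anon-a", -60, "2021-04-01T10:05:00"),
    ("user-1", "anon-b", -80, "2021-04-01T10:06:00"),
    ("user-1", "anon-c", -67, "2021-04-01T10:07:00"),
]
conn.executemany(
    "INSERT INTO encounters (user_id, peer_anon_id, rssi, observed_at) VALUES (?, ?, ?, ?)",
    rows,
)

THRESHOLD = -67  # readings below this are disregarded as too far away or behind a wall
contacts = conn.execute(
    """
    SELECT peer_anon_id, COUNT(*) AS sightings
    FROM encounters
    WHERE rssi >= ?
    GROUP BY peer_anon_id
    ORDER BY peer_anon_id
    """,
    (THRESHOLD,),
).fetchall()
print(contacts)  # [('anon-a', 2), ('anon-c', 1)]
```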
References
[1] “Digital Contact Tracing for Pandemic Response.” 2020. doi:10.1353/book.75831.
[2] “COVID-19 Contact Tracing.” Centers for Disease Control and Prevention. https://www.cdc.gov/coronavirus/2019-ncov/daily-life-coping/contact-tracing.html.
[3] “Bluetooth.” Wikipedia. September 08, 2020. https://en.wikipedia.org/wiki/Bluetooth.
[4] “Classify Structured Data with Feature Columns: TensorFlow Core.” TensorFlow. https://www.tensorflow.org/tutorials/structured_data/feature_columns.
[5] Purdue Writing Lab. “Writing Report Abstracts // Purdue Writing Lab.” Purdue Writing Lab. https://owl.purdue.edu/owl/subject_specific_writing/professional_technical_writing/technical_reports_and_report_abstracts/index.html.
[6] Hao, Karen. “What Is Machine Learning?” MIT Technology Review, 5 Apr. 2021. www.technologyreview.com/2018/11/17/103781/what-is-machine-learning-we-drew-you-another-flowchart/.
[7] “Private Automated Contact Tracing.” PACT. pact.mit.edu/.
About the Author
Edward Jung is a 15-year-old student from Irvine, United States. He is fascinated by machine learning and its potential to provide effective solutions to real-world problems. He is also interested in other areas of STEM, particularly medicine, where he hopes to save lives through his research. Edward’s other interests include competitive programming, critical thinking, and playing basketball.