Deep Learning for Safe Driving

Protecting Drivers from Themselves


A small camera is placed in your vehicle on the passenger-side dashboard.


Start the device and drive as normal while Soteria monitors your behavior.


Soteria gives you gentle reminders when it detects distracted driving.


Use Soteria's dashboard to see when and where you were distracted.


Distracted driving is one of the leading causes of automobile accidents, which can cause losses that even money cannot replace. At Soteria, we provide a new way to prevent this risk with a revolutionary end-to-end product.

Watch the video below to see the product in action. Notice how the device detects distracted driving and notifies the driver in real-time!

How It Works

Soteria uses a combination of the internet of things (IoT) and machine learning to detect distracted driving. The IoT component consists of a Raspberry Pi unit connected to the cloud, while the machine learning component is a convolutional neural network (CNN).

To truly combat the issue of distracted driving, Soteria gives users feedback while they are driving. Soteria is the first product that attempts to prevent distracted driving proactively. We do so by placing a small, in-car camera in the user's vehicle.

By placing a device in the user's vehicle, we are able to capture images of the driver over fixed intervals (e.g. every 15 seconds). Images and GPS coordinates are then passed to our cloud servers which are running our state-of-the-art model. Should the model detect distracted driving, the driver will be notified via a chime through the car stereo system (similar to how modern cars warn drivers if the seatbelt is not fastened). Click play below to hear the warning chime.
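The device-side flow described above can be sketched as follows. This is an illustrative snippet, not the production code: the field names, bucket name, and key format are hypothetical. On the actual device, the image is uploaded to S3 and the message is published over MQTT (via Paho-MQTT) at each capture interval.

```python
import json
import time
import uuid


def build_trip_message(lat, lon, bucket="soteria-images"):
    """Build the payload for one capture interval.

    The image itself is uploaded to S3 under a unique key; the message
    sent to the backend carries only that key plus the GPS fix and a
    timestamp. All names here are illustrative placeholders.
    """
    s3_key = "{}/{}.jpg".format(bucket, uuid.uuid4())
    payload = {
        "s3_key": s3_key,
        "lat": lat,
        "lon": lon,
        "captured_at": time.time(),
    }
    return s3_key, json.dumps(payload)
```

On the device, a loop would call this every fixed interval (e.g. 15 seconds), upload the captured frame to the returned key, and publish the JSON string to the backend topic.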

Product Architecture

Click here for the detailed architecture.


Our model "learned" what distracted driving looks like from images obtained from a Kaggle competition sponsored by State Farm Insurance. The goal of the model is to accurately classify unseen images into the following 10 classes:

  • 0: safe driving
  • 1: texting - right
  • 2: talking on the phone - right
  • 3: texting - left
  • 4: talking on the phone - left
  • 5: operating the radio
  • 6: drinking
  • 7: reaching behind
  • 8: hair and makeup
  • 9: talking to passenger
Each input image is resized to 224 by 224 pixels for model training and testing. Basic image processing techniques are then applied, such as histogram equalization across all three channels and random rotation of the images by up to 10 degrees.
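As a rough sketch of this preprocessing, here is a pure-NumPy version. The production pipeline likely used OpenCV or Keras utilities instead; random rotation is omitted for brevity, and the nearest-neighbour resize is a simple stand-in for a proper interpolating resize.

```python
import numpy as np


def equalize_channel(ch):
    # Classic histogram equalization on a single uint8 channel.
    hist = np.bincount(ch.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0].min()
    denom = cdf[-1] - cdf_min
    if denom == 0:  # constant channel: nothing to equalize
        return ch
    lut = ((cdf - cdf_min) * 255.0 / denom).clip(0, 255).astype(np.uint8)
    return lut[ch]


def resize_nearest(img, out_h=224, out_w=224):
    # Nearest-neighbour resize via index mapping.
    h, w = img.shape[:2]
    rows = (np.arange(out_h) * h) // out_h
    cols = (np.arange(out_w) * w) // out_w
    return img[rows][:, cols]


def preprocess(img):
    # img: (H, W, 3) uint8 frame from the in-car camera.
    img = resize_nearest(img)
    return np.stack([equalize_channel(img[..., c]) for c in range(3)], axis=-1)
```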


Our initial approach involved training a CNN from "scratch", but we quickly realized that using a pre-trained network might be a better option given our limited dataset. Specifically, we utilized a technique known as transfer learning, where a pre-trained network supplies the initial weights and is then further trained to learn the idiosyncrasies of our data. The well-known VGG-16 pre-trained network was used as our starting point. In our VGG-16 model, we perform global average pooling (GAP) just before the final output layer. This gives the convolutional neural network localization ability despite being trained only on image-level labels.
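The GAP step can be sketched in NumPy. The shapes below mirror VGG-16's final 7×7×512 convolutional block and our 10 classes, but the weights are random placeholders rather than trained values:

```python
import numpy as np


def gap_scores(feature_maps, weights, bias):
    # feature_maps: (H, W, C) output of the last conv layer.
    # Global average pooling collapses each map to a single number,
    # then a dense layer maps the pooled vector to class scores.
    pooled = feature_maps.mean(axis=(0, 1))  # shape (C,)
    return weights @ pooled + bias           # shape (n_classes,)


# Toy shapes matching VGG-16's final conv block and 10 output classes.
rng = np.random.default_rng(42)
fmaps = rng.standard_normal((7, 7, 512))
W = rng.standard_normal((10, 512))
b = np.zeros(10)
scores = gap_scores(fmaps, W, b)
```

Because the output layer sits directly on top of the pooled feature maps, each class score is a linear function of spatial feature-map averages, which is what makes the localization trick in the next section possible.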

Class Activation Maps for Different Classes

A class activation map for a particular category indicates the discriminative image regions the CNN used to identify that category. Below are the class activation maps for safe driving and for texting with the right hand. Even within a single class, different regions of the image are activated depending on the driver's position.
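Given the GAP architecture, a class activation map is simply the weighted sum of the final convolutional feature maps, using the output-layer weights for the class of interest. A minimal sketch with random placeholder values in place of trained weights:

```python
import numpy as np


def class_activation_map(feature_maps, class_weights):
    # feature_maps: (H, W, C) last conv-layer activations.
    # class_weights: (C,) row of the GAP-to-output weight matrix
    # for one class. The weighted sum over channels highlights the
    # image regions that drove that class's score.
    return np.tensordot(feature_maps, class_weights, axes=([2], [0]))


rng = np.random.default_rng(0)
fmaps = rng.standard_normal((7, 7, 512))
w_texting = rng.standard_normal(512)   # hypothetical "texting - right" row
cam = class_activation_map(fmaps, w_texting)
```

The resulting low-resolution map (7×7 here) is upsampled to the input image size and overlaid as a heatmap to produce visualizations like those shown above.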

Model Accuracy

The training data did not generalize well as the images were captured in a simulated environment. "Real" images showed much more variation in lighting, driver actions, and clarity. In many images, even a human would have difficulty classifying the image due to ambiguity.

Below is the confusion matrix for a subset of test data. The accuracy of the model for this subset of test data was 56.29%. The model finds it difficult to differentiate between the safe driving, talking to passenger, and hair & makeup classes.

  • 0: safe driving
  • 1: texting - right
  • 2: talking on the phone - right
  • 3: texting - left
  • 4: talking on the phone - left
  • 5: operating the radio
  • 6: drinking
  • 7: reaching behind
  • 8: hair and makeup
  • 9: talking to passenger
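The accuracy and confusion matrix above can be computed with a few lines of NumPy. The helper below is a generic sketch; the toy labels are illustrative, and the figures reported here came from our actual test set:

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes=10):
    # Rows index the true class, columns the predicted class.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


def accuracy(cm):
    # Correct predictions sit on the diagonal.
    return cm.trace() / cm.sum()


# Toy example: 0 = safe driving, 8 = hair and makeup, 9 = talking to passenger.
y_true = [0, 0, 9, 8, 0]
y_pred = [0, 9, 9, 0, 0]
cm = confusion_matrix(y_true, y_pred)
```

Off-diagonal mass between rows/columns 0, 8, and 9 is exactly the safe-driving / hair-and-makeup / talking-to-passenger confusion discussed above.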


Aggregated View of All Trips

The map below depicts areas where the model detected distracted driving.

Areas with higher frequencies of distraction are colored red.

Detailed View of Trips



  • + What is Soteria?

  • Soteria prevents distracted driving with a combination of the internet of things and machine learning.
  • + What does the name mean?

  • In Greek mythology, Soteria was the goddess or spirit of safety and salvation, deliverance, and preservation from harm.
  • + How did the project come about?

The project began as a Kaggle competition, but the team realized that the project could be much more than just another neural network. We explored different options and determined what we feel is the most impactful way of utilizing the model: real-time feedback.
  • + What tools did you use?

Python was the primary language of the project. Some of the core packages we relied on were NumPy, Theano, Keras, OpenCV, Paho-MQTT, PiCamera, Tableau, and Python-GPS. Additional work and explorations were done in TensorFlow, Lua, Caffe, Google Charts, and the AWS C SDK.


  • + Who does Soteria help?

  • Soteria's in-vehicle device and data analytics can help anyone with a personal and/or financial interest in friends, family, employees or customers who drive vehicles. From parents, to business owners, to insurance providers, Soteria can help improve driving safety and mitigate risk.
  • + How can Soteria benefit individuals?

  • Imagine you're a parent with a teen driver on your insurance policy. You hope that they're driving safely but hope does not prevent/detect distracted driving. Soteria's in-car device combined with web-based analytics can give you accurate feedback on your teen's driving safety.
  • + How can Soteria benefit business vehicle owners?

  • Many businesses own one or more company vehicles. These vehicles are driven by employees but are a company liability. Soteria's in-vehicle device and web analytics can give you the peace of mind that your assets are in good hands and being driven responsibly. If you discover distracted driving, you can review the photo evidence and intervene as needed.


  • + What is a neural network?

Neural networks are a machine learning technique in which the model loosely mimics the architecture of the human brain. They are very good at approximating nonlinear functions and are the backbone of recent advances in both image recognition and speech recognition.
  • + What is deep learning?

Deep learning is largely a rebranding of neural networks. With the advent of big data and improvements in computational power, neural networks grew deeper and became capable of learning increasingly complex problems.
  • + Why did you use Keras?

There are many deep learning libraries and we explored several of them. We considered other frameworks such as Torch and Caffe but found prototyping and implementation were easier in Keras. Keras runs on top of either Theano or TensorFlow, which gave us the ability to switch backends with no loss of work.
  • + What are the limitations of the model?

  • The model has many limitations due to our limited training data. Lighting conditions, camera angle, vehicle type, and driver characteristics are factors that could impact the accuracy of the model. As we continue to collect data, we will be able to train the model on a greater variety of data.
  • + Why did you use VGG-16?

  • We initially explored creating our model from "scratch" (i.e. with randomly assigned model weights) but quickly realized our training set was limited. In the deep learning world, 20,000 images is a rather small dataset. After switching to transfer learning, we saw a dramatic improvement in model performance. We considered other pre-trained models such as VGG-19, GoogLeNet, and ResNet-50 but VGG-16 produced our best results.


  • + How did you make the device?

  • There were several iterations which varied slightly but the device consists of a Raspberry Pi, PiCamera, custom case, WiFi USB dongle, GPS USB dongle, and auxiliary cable. It is mounted in the vehicle using a phone mount and bungee cords.
  • + What is Internet of Things (IoT)?

IoT refers to everyday objects (e.g. light bulbs) becoming connected to the internet. IoT allows for increased control and connectivity across people's lives.
  • + When does the model notify the driver?

  • If the probability of distraction is greater than 95%, the model will notify the driver within 1-3 seconds.
  • + What kind of backend server do you use?

  • We use lightweight Amazon Web Services (AWS) Elastic Cloud Compute (EC2) instances to run the model. The EC2 instances are loaded with the required Python packages, the model, and the AWS Internet of Things (IoT) credentials.
  • + How do you store the data?

  • Images are stored in AWS Simple Storage Service (S3) while trip data is stored in AWS SimpleDB.
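The notification rule mentioned above (chime when the probability of distraction exceeds 95%) can be sketched as a small decision function. Here "probability of distraction" is taken to be 1 − P(safe driving); treat that aggregation, and the function name, as illustrative assumptions rather than the production logic:

```python
def should_chime(class_probs, threshold=0.95):
    # class_probs: softmax output over the 10 classes,
    # where index 0 is "safe driving". We treat the distraction
    # probability as 1 - P(safe); this mapping is an assumption.
    p_distracted = 1.0 - class_probs[0]
    return p_distracted > threshold
```

For example, a prediction of 2% safe driving and 90% texting would trigger the chime, while an 80% safe-driving prediction would not.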


  • + Is my data safe?

Soteria stores data securely on AWS Simple Storage Service (S3) and AWS SimpleDB. Messages between the device and the backend server contain only a unique ID referencing the S3 key.
  • + Will you share/sell my data to other companies?

  • We will not share any individual user information.

Next Steps

  • + When can we expect this device to hit market?

Currently the team at Soteria is perfecting our model and hardware. We are considering patent options and will pursue commercial release pending interest from investors.
  • + What future updates will you make to the model?

We plan on exploring several different updates. One update that looks promising is human pose estimation. As more and more drivers start using the device, we will be able to improve the model as both the volume and the variety of data increase.
  • + What updates will you make to the device?

  • Configurable tolerance thresholds, a feedback mechanism, and device optimization are some of the planned updates. Configurable tolerance thresholds will allow users to set the sensitivity of the model to control the frequency of the feedback. A feedback mechanism will allow users to give the model feedback to ensure the model is learning the correct information.
  • + What updates will you make to the analytics platform?

  • We understand that even the best dataset is useless if it cannot be turned into actionable information. We plan on improving our web analytics platform by expanding the backend to support both a higher volume as well as a higher velocity of data. Similar to Amazon's Echo product, we plan on incorporating user feedback to provide a more personalized product. Did the model get an image wrong? Well then you can reclassify it and our product will learn from your input.
  • + What about integrations with other products?

  • As Soteria grows, we will definitely integrate with other products such as Automatic, Apple Watch, and Amazon Alexa.

Nate Black

Hetal Chandaria

Ben Spooner

Howard Wen