Smart Cap is an assistant for the visually impaired that narrates a description of the scene in front of the wearer, using pictures captured by a webcam.
There are about 285 million visually impaired people in the world, and they cannot experience the world the way sighted people can. Smart Cap aims to provide this missing experience by using state-of-the-art deep learning techniques from Microsoft Cognitive Services for image classification and tagging.
Smart Cap aims to bring the world to the visually impaired as a narrative. The narrative is generated by converting the scene in front of the person into text that describes the important objects in it. Example descriptions include “a group of people playing a game of baseball”, “a yellow truck parked next to the car”, and “a bowl of salad on a table”. In the first prototype, a one-line description along with some keywords is played as audio to the user; later versions will add a more detailed description.
The architecture of the system includes Amazon Alexa, the DragonBoard™ 410c, and the Microsoft Computer Vision API.
A webcam retrofitted into a regular cap is connected to the DragonBoard 410c, which captures an image from the webcam and sends it to the Microsoft Computer Vision API for recognition. The response is then inserted into a DynamoDB table. When the user asks Alexa to describe the scene, the Alexa Skills Kit triggers an AWS Lambda function that fetches the data from DynamoDB, and the latest description is played as an audio response on the Amazon Echo device.
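The device-side half of this pipeline could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the API region, subscription key, table name, and key attributes (`device_id`, `timestamp`) are all placeholders, and the sketch assumes the webcam frame has already been written to disk (for example with a capture tool such as fswebcam).

```python
# Device-side sketch: send a captured frame to the Computer Vision
# "describe" endpoint, then store the resulting caption in DynamoDB.
# Endpoint version, key, table name, and attribute names are assumptions.
import json
import time
import urllib.request

VISION_URL = "https://westus.api.cognitive.microsoft.com/vision/v3.2/describe"  # assumed region/version
VISION_KEY = "YOUR-SUBSCRIPTION-KEY"  # placeholder


def describe_image(image_bytes: bytes) -> dict:
    """POST raw image bytes to the Computer Vision 'describe' endpoint."""
    req = urllib.request.Request(
        VISION_URL,
        data=image_bytes,
        headers={
            "Ocp-Apim-Subscription-Key": VISION_KEY,
            "Content-Type": "application/octet-stream",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def parse_caption(response: dict):
    """Pull the top caption and the tag list out of a 'describe' response."""
    desc = response["description"]
    caption = desc["captions"][0]["text"] if desc["captions"] else ""
    return caption, desc.get("tags", [])


def store_description(caption: str, tags: list) -> None:
    """Insert the latest description into DynamoDB (schema is assumed)."""
    import boto3  # imported here; assumed installed on the DragonBoard
    table = boto3.resource("dynamodb").Table("SmartCapDescriptions")
    table.put_item(Item={
        "device_id": "smartcap-01",     # assumed partition key
        "timestamp": int(time.time()),  # assumed sort key
        "caption": caption,
        "tags": tags,
    })


if __name__ == "__main__":
    with open("frame.jpg", "rb") as f:
        caption, tags = parse_caption(describe_image(f.read()))
    store_description(caption, tags)
```

Keeping a timestamp as the sort key means the newest description for a device can later be fetched with a single descending-order query.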
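The Lambda function behind the Alexa skill might look like the sketch below. Again this is an illustration under assumptions: it presumes a DynamoDB table named `SmartCapDescriptions` keyed by a `device_id` partition key and a numeric `timestamp` sort key, and it returns the plain-text response envelope that the Alexa Skills Kit expects.

```python
# Lambda-side sketch: fetch the newest scene description from DynamoDB
# and return it as an Alexa plain-text speech response.
# Table name and key schema are assumptions, not the project's real values.


def build_alexa_response(text: str) -> dict:
    """Wrap plain text in the Alexa Skills Kit response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }


def lambda_handler(event, context):
    """Handle the skill request: query the newest item and speak its caption."""
    import boto3  # available by default in the AWS Lambda Python runtime
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("SmartCapDescriptions")
    # ScanIndexForward=False returns items newest-first by the sort key.
    result = table.query(
        KeyConditionExpression=Key("device_id").eq("smartcap-01"),
        ScanIndexForward=False,
        Limit=1,
    )
    items = result.get("Items", [])
    text = items[0]["caption"] if items else "I have not seen anything yet."
    return build_alexa_response(text)
```

Returning a fallback sentence when the table is empty keeps the skill from failing silently before the first frame has been processed.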