Proof of Concept for Creating a Mixed Reality Application for AAC Users

Author(s): Adrienne Hembrick, Swati Rampalli

Mentor(s): Craig Yu, Computer Science

Abstract

Augmentative and alternative communication (AAC) is a communication mechanism for people with complex communication needs; existing AAC devices are forms of assistive technology whose hardware and software can supplement or entirely replace natural speech. Augmented reality (AR), in which a user's visual perception is supplemented with computer-generated sensory content, is increasingly used to support assistive technology, for example in rehabilitation therapies for motor and visual impairments. However, while immersive technologies such as augmented and virtual reality have greatly supported individuals with disabilities, current AAC devices lack the contextual intelligence to suggest appropriate conversation choices or phrases based on a user's environment. This is particularly concerning in emergency situations, where real-time communication is critical: AAC users must not only contend with reduced communication accommodations but also navigate a crisis with heightened emotions. This motivates an AI-driven AAC system that is aware of the environmental situation and its demands. Major fields of computer science research, such as human-computer interaction and computer vision, are being leveraged to support context-aware, adaptive technologies that empower people with communication impairments. We discuss a proof of concept for a mixed reality application that does not limit AAC users' vocabulary, using object detection, conversation retrieval, and user interface display on the Microsoft HoloLens to create a context-based, self-learning AAC system.

Video Transcript

[introduction slide] My name is Adrienne and I am from Virginia Commonwealth University, and my name is Swati and I am from the University of Minnesota Twin Cities. We are both working under the guidance of Dr. Craig Yu to create a proof of concept for a mixed reality application that will assist AAC users with everyday communication.

[motivation]

To understand the motivation behind this work: augmentative and alternative communication (AAC) devices are forms of assistive technology that provide a communication mechanism for people with speech disabilities or other complex communication needs, containing hardware and software that can supplement or entirely replace natural speech. Augmented reality (AR), which essentially overlays virtual objects onto the physical world, is increasingly integrated into assistive technology, a prominent example being rehabilitation therapies for people with motor and visual impairments. However, while immersive learning is rising in its ability to support people with disabilities, current AAC devices do not carry the contextual intelligence to prompt appropriate conversation choices based on a user's environment. This is especially critical in emergency situations, where real-time communication matters most: AAC users must not only contend with reduced communication accommodations but also navigate a crisis with heightened emotions. This motivates an AI-driven AAC system that is aware of the environment and can empower those with communication impairments without limiting their vocabulary in demanding situations.

[objective]

Our work aims to provide a proof of concept for a mixed reality application that performs object detection within a grocery store environment to assist people with complex communication needs in everyday conversations. The principal device is the Microsoft HoloLens, on whose interface context-based sentences appear for users to select. The pipeline of this proof of concept involves the following stages: object recognition, conversation retrieval, user interface display, and text-to-speech generation. Given the context of a grocery store, object recognition detects objects commonly found there (e.g., apples, milk). After the determined keywords, or object names, are retrieved, several sentences related to those keywords are pulled from a sentence database and assigned individual weights so that sentences used more often are ranked with higher priority. The probability of a sentence being retrieved is determined with a Naïve Bayes approach. The three sentences with the highest probabilities are sent to the Microsoft HoloLens user interface for display, where the user can select the sentence they would like read out loud.
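As an illustration of the conversation-retrieval step, the sketch below ranks candidate sentences for a detected keyword using a Naïve Bayes-style score in which usage counts act as the prior. The sentence database, usage counts, and smoothing choice are hypothetical placeholders for illustration, not the actual database used in the project.

```python
from collections import Counter

# Hypothetical in-memory sentence database: each candidate sentence is paired
# with a usage count that serves as its prior weight (used more -> ranked higher).
SENTENCE_DB = [
    ("Where can I find the apples?", 12),
    ("How much does this milk cost?", 8),
    ("I would like to buy some sugar.", 5),
    ("Is this apple on sale today?", 3),
]

def rank_sentences(keyword, database=SENTENCE_DB, top_k=3):
    """Score each sentence with a Naive Bayes-style posterior:
    P(sentence | keyword) is proportional to P(keyword | sentence) * P(sentence)."""
    total_uses = sum(count for _, count in database)
    scored = []
    for sentence, count in database:
        words = Counter(w.strip(".,?!").lower() for w in sentence.split())
        # Likelihood of the keyword given the sentence, with Laplace smoothing.
        likelihood = (words[keyword.lower()] + 1) / (sum(words.values()) + len(words))
        # Prior from usage counts, so frequently selected sentences rank higher.
        prior = count / total_uses
        scored.append((likelihood * prior, sentence))
    scored.sort(reverse=True)
    return [sentence for _, sentence in scored[:top_k]]

# Example: top-3 suggestions when the detector reports an "apple".
print(rank_sentences("apple"))
```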

[object recognition]

Object recognition was performed with the expectation that our application would be able to accurately detect common grocery store items, such as apples and sugar. It was carried out in three different attempts, each involving training the YOLOv3 model on a dataset of grocery store items. YOLOv3 is a popular object detection model that we chose specifically for its faster speed during real-time detection and its limited spatial constraints relative to our dataset. The first two attempts utilized a Python library called ImageAI but yielded considerably low accuracy; the first attempt in particular produced unpromising results and was left out of the accuracy comparison. The final attempt replicated the GrocerEye project, which provides weights for a pretrained model and yielded notably high accuracies across the object classes. The dataset used with this model was the Freiburg Groceries dataset, consisting of 25 classes of commonly found supermarket items. The differences across approaches can likely be attributed to the following: the greater variation in dataset images (e.g., lighting, packaging, angle) demanded more training time with the ImageAI library, which conflicted with the GPU limits of Google Colab. For this reason, ImageAI was deemed unfit to accommodate the dataset size and variation needed to make the model more robust.
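To make the detection step concrete, below is a minimal inference sketch using ImageAI's custom YOLOv3 detector (the ImageAI 2.x custom detection API). The weight, configuration, and image file names are illustrative placeholders, and the GrocerEye replication used its own pretrained weights rather than this exact setup.

```python
from imageai.Detection.Custom import CustomObjectDetection

# Load a custom-trained YOLOv3 detector (file names below are illustrative).
detector = CustomObjectDetection()
detector.setModelTypeAsYOLOv3()
detector.setModelPath("grocery_yolov3.h5")       # trained weights
detector.setJsonPath("detection_config.json")    # class/anchor configuration saved during training
detector.loadModel()

# Run detection on a single frame and keep reasonably confident hits.
detections = detector.detectObjectsFromImage(
    input_image="shelf_photo.jpg",
    output_image_path="shelf_photo_detected.jpg",
    minimum_percentage_probability=40,
)

for item in detections:
    # The class name becomes the keyword passed to conversation retrieval.
    print(item["name"], item["percentage_probability"], item["box_points"])
```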

[user interface]

On the front end is the user interface, which consists of a menu of dialogue options that, when selected, execute a text-to-speech task thread. The user interface was created using Unity 2018 and the Mixed Reality Toolkit (MRTK) provided by Microsoft, and it was tested with a combination of the Unity editor's game view and the HoloLens Emulator. To use the interface, the HoloLens detects the user's palm, which brings up the dialogue menu. The user then uses their other hand to select their preferred dialogue option, and the HoloLens speaks the selected phrase out loud.
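On the device, selection and speech are handled in Unity with MRTK. Purely as an illustrative stand-in for that selection-to-speech step, the sketch below uses the pyttsx3 Python library to speak a chosen dialogue option; it is not part of the actual HoloLens build.

```python
import pyttsx3  # offline text-to-speech library, used here only as a stand-in for the HoloLens TTS

def speak_selected_sentence(sentence):
    """Speak the dialogue option the user selected from the menu."""
    engine = pyttsx3.init()
    engine.say(sentence)
    engine.runAndWait()

# Example: the user taps the first suggested sentence on the menu.
speak_selected_sentence("Where can I find the apples?")
```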

[discussion/future work]

We have several future goals as we continue our research. One is to fully connect the back end, the object recognition, to the front-end user interface. This also involves creating a self-learning AAC system for conversation retrieval based on sentence popularity and usage, adapting the user interface so that the dialogue options change based on the items detected in the environment, and making the UI design more accessible and user friendly.
