MIT Develops Wearable AI System That Can Detect the Tone of a Conversation

Researchers from the Massachusetts Institute of Technology's (MIT) Computer Science and Artificial Intelligence Laboratory (CSAIL), in partnership with MIT's Institute for Medical Engineering and Science, have developed a wearable artificial intelligence system that can predict whether a conversation is happy, sad or neutral.

The system, which is to be presented at the Association for the Advancement of Artificial Intelligence (AAAI) conference in San Francisco, determines the overall tone of a conversation by analyzing audio, text transcriptions and the physiological signals of its participants.

"Imagine if, at the end of a conversation, you could rewind it and see the moments when the people around you felt the most anxious," said Tuka Alhanai, a graduate student at MIT and co-author of a related paper, in a press release. "Our work is a step in this direction, suggesting that we may not be that far away from a world where people can have an AI social coach right in their pocket."

For their new system, the researchers used the Samsung Simband to capture high-resolution physiological waveforms from 31 conversations lasting several minutes each. These waveforms were then used to measure features such as heart rate, movement, blood pressure, blood flow and skin temperature. The system also used the captured audio data and text transcripts to analyze each speaker's tone, pitch, energy and vocabulary.
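To make "measuring features" from raw recordings more concrete, here is a minimal Python sketch that reduces each recorded channel to a few summary statistics and computes a root-mean-square (RMS) energy value for an audio frame. The channel names, the choice of statistics and the helper names (`summarize_channel`, `audio_energy`, `extract_features`) are illustrative assumptions; the article does not describe the researchers' actual feature pipeline at this level of detail.

```python
import math
import statistics


def summarize_channel(samples):
    """Reduce one raw signal channel to simple summary statistics.

    Hypothetical feature set chosen for illustration: mean, population
    standard deviation, minimum and maximum.
    """
    return {
        "mean": statistics.fmean(samples),
        "std": statistics.pstdev(samples),
        "min": min(samples),
        "max": max(samples),
    }


def audio_energy(samples):
    """Root-mean-square energy of an audio frame, a common loudness proxy."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def extract_features(channels):
    """Flatten per-channel statistics into one feature dictionary.

    `channels` maps an illustrative channel name (e.g. "heart_rate")
    to a list of numeric samples.
    """
    features = {}
    for name, samples in channels.items():
        for stat, value in summarize_channel(samples).items():
            features[f"{name}_{stat}"] = value
    return features
```

For example, `extract_features({"heart_rate": [72.0, 75.0, 78.0, 74.0]})` yields a mean heart rate of 74.75 along with its spread and range. A real system would use far richer features, but the idea of turning long waveforms into a compact vector a classifier can consume is the same.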

The researchers then trained two algorithms on the collected data. The first classified the overall nature of each conversation as either "happy" or "sad," while the second classified each five-second block of every conversation as positive, negative or neutral.
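The second task, labeling every five-second block, can be sketched as two steps: cut the signal into non-overlapping five-second windows, then assign each window a label. The sketch below assumes a single signal sampled at a fixed rate and uses a simple nearest-centroid classifier in place of the deep-learning model the researchers describe; the function names, the two per-window features and the centroid values are all hypothetical.

```python
import math
import statistics

WINDOW_SECONDS = 5  # block length described in the article


def split_into_windows(samples, sample_rate_hz, window_seconds=WINDOW_SECONDS):
    """Split a signal into non-overlapping fixed-length windows.

    Trailing samples that do not fill a complete window are dropped.
    """
    size = int(sample_rate_hz * window_seconds)
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, size)]


def window_features(window):
    """Two illustrative features per window: mean and standard deviation."""
    return (statistics.fmean(window), statistics.pstdev(window))


def nearest_centroid(features, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


def label_windows(samples, sample_rate_hz, centroids):
    """Assign one label (e.g. positive, negative or neutral) per window."""
    return [
        nearest_centroid(window_features(w), centroids)
        for w in split_into_windows(samples, sample_rate_hz)
    ]
```

In practice the centroids would be learned from labeled training windows, for instance by averaging the feature vectors of all windows that share a label. A deep-learning model replaces that fixed rule with a learned, far more flexible decision function, but the windowing step stays the same.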

The first algorithm determined the overall tone of a conversation with 83 percent accuracy. The deep-learning techniques used in the system also classified the mood of each five-second interval with an average accuracy approximately 18 percent above chance, a result 7.5 percent better than existing approaches.
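"Above chance" depends on which baseline counts as chance and on whether the gap is measured in percentage points or in relative terms; the article specifies neither. The minimal Python sketch below computes accuracy alongside two common baselines (uniform random guessing and always predicting the most frequent label) and reports the gain both ways. It is an illustration of the arithmetic only, not the researchers' evaluation code.

```python
from collections import Counter


def accuracy(predicted, actual):
    """Fraction of segments whose predicted label matches the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def uniform_chance(num_classes):
    """Expected accuracy of guessing uniformly at random among the classes."""
    return 1 / num_classes


def majority_baseline(actual):
    """Accuracy of always predicting the most frequent true label."""
    most_common_count = Counter(actual).most_common(1)[0][1]
    return most_common_count / len(actual)


def gain_over_chance(acc, chance):
    """Return (gain in percentage points, relative gain in percent)."""
    return (acc - chance) * 100, (acc - chance) / chance * 100
```

With three classes, uniform chance is one in three. A model that labels four of six segments correctly scores two in three, which is about 33 percentage points above uniform chance but 100 percent better in relative terms, so the same result can be reported very differently depending on the convention.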

As a next step, the researchers aim to improve the algorithm's emotional granularity so that it can label moments as boring, tense or excited, rather than just "positive" or "negative."