2018-04-06 Researchers develop device that can ‘hear’ your internal voice, The Guardian
New headset can listen to internal vocalisation and speak to the wearer while appearing silent to the outside world
by Samuel Gibbs
MIT’s AlterEgo headset can ‘hear’ internalised voices and speak to the user through a bone-conduction system. Photograph: Lorrie Lejeune/MIT
Researchers have created a wearable device that can read people’s minds when they use an internal voice, allowing them to control devices and ask queries without speaking.
The device, called AlterEgo, can transcribe words that wearers verbalise internally but do not say out loud, using electrodes attached to the skin.
“Our idea was: could we have a computing platform that’s more internal, that melds human and machine in some ways and that feels like an internal extension of our own cognition?” said Arnav Kapur, who led the development of the system at MIT’s Media Lab.
Kapur describes the headset as an “intelligence-augmentation” or IA device, and was presented at the Association for Computing Machinery’s Intelligent User Interface conference in Tokyo. It is worn around the jaw and chin, clipped over the top of the ear to hold it in place. Four electrodes under the white plastic device make contact with the skin and pick up the subtle neuromuscular signals that are triggered when a person verbalises internally. When someone says words inside their head, artificial intelligence within the device can match particular signals to particular words, feeding them into a computer.
The computer can then respond through the device using a bone conduction speaker that plays sound into the ear without the need for an earphone to be inserted, leaving the wearer free to hear the rest of the world at the same time. The idea is to create a outwardly silent computer interface that only the wearer of the AlterEgo device can speak to and hear.
“We basically can’t live without our cellphones, our digital devices. But at the moment, the use of those devices is very disruptive,” said Pattie Maes, a professor of media arts and sciences at MIT. “If I want to look something up that’s relevant to a conversation I’m having, I have to find my phone and type in the passcode and open an app and type in some search keyword, and the whole thing requires that I completely shift attention from my environment and the people that I’m with to the phone itself.”
Maes and her students, including Kapur, have been experimenting with new form factors and interfaces to provide the knowledge and services of smartphones without the intrusive disruption they currently cause to daily life.
The AlterEgo device managed an average of 92% transcription accuracy in a 10-person trial with about 15 minutes of customising to each person. That’s several percentage points below the 95%-plus accuracy rate that Google’s voice transcription service is capable of using a traditional microphone, but Kapur says the system will improve in accuracy over time. The human threshold for voice word accuracy is thought to be around 95%.
Kapur and team are currently working on collecting data to improve recognition and widen the number of words AlterEgo can detect. It can already be used to control a basic user interface such as the Roku streaming system, moving and selecting content, and can recognise numbers, play chess and perform other basic tasks.
The eventual goal is to make interfacing with AI assistants such as Google’s Assistant, Amazon’s Alexa or Apple’s Siri less embarrassing and more intimate, allowing people to communicate with them in a manner that appears to be silent to the outside world – a system that sounds like science fiction but appears entirely possible.
The only downside is that users will have to wear a device strapped to their face, a barrier smart glasses such as Google Glass failed to overcome. But experts think the technology has much potential, not only in the consumer space for activities such as dictation but also in industry.
“Wouldn’t it be great to communicate with voice in an environment where you normally wouldn’t be able to?” said Thad Starner, a computing professor at Georgia Tech. “You can imagine all these situations where you have a high-noise environment, like the flight deck of an aircraft carrier, or even places with a lot of machinery, like a power plant or a printing press.”
Starner also sees application in the military and for those with conditions that inhibit normal speech.