MIT researchers have developed a computer interface that can transcribe words that a person verbalizes internally but does not actually speak aloud.
The system consists of a wearable device and an associated computing system. Electrodes in the device pick up neuromuscular signals in the jaw and face that are triggered by internal verbalizations — saying words or phrases “in your head” — but are undetectable to the human eye. The signals are fed to a machine-learning system that has been trained to correlate particular signals with particular words.
The device also includes a pair of bone-conduction headphones, which transmit vibrations through the bones of the face to the inner ear. Because they don’t obstruct the ear canal, the headphones allow the system to convey information to the user without interrupting a conversation or otherwise interfering with the user’s auditory experience.
The device is thus part of a complete silent-computing system that lets the user undetectably pose and receive answers to difficult computational problems. In one of the researchers’ experiments, for instance, subjects used the system to silently report opponents’ moves in a chess game and just as silently receive computer-recommended responses.
“The motivation for this was to build an IA device — an intelligence-augmentation device,” says Arnav Kapur, a graduate student at the MIT Media Lab, who led the development of the new system. “Our idea was: Could we have a computing platform that’s more internal, that melds human and machine in some ways and that feels like an internal extension of our own cognition?”
“We basically can’t live without our smartphones, our digital devices,” says Pattie Maes, Kapur’s thesis advisor and a professor of media arts and sciences. “But at the moment, the use of those devices is very disruptive. If I want to look something up that’s relevant to a conversation I’m having, I have to find my smartphone and type in the passcode and open an app and type in some search keyword, and the whole thing requires that I completely shift attention from my environment and the people that I’m with to the phone itself. So, my students and I have for a very long time been experimenting with new form factors and new types of experience that enable people to still benefit from all the wonderful knowledge and services that these devices give us, but do it in a way that lets them remain in the present.”
The researchers described their device in a paper they presented at the Association for Computing Machinery’s (ACM) Intelligent User Interface conference. Kapur is first author on the paper, Maes is the senior author, and they’re joined by Shreyas Kapur, an undergraduate majoring in electrical engineering and computer science at MIT.
The concept that internal verbalizations have physical correlates has been around since the 19th century, and it was seriously investigated in the 1950s. One of the goals of the speed-reading movement of the 1960s was to eliminate internal verbalization, or “subvocalization,” as it’s known.
However, subvocalization as a computer interface is largely unexplored. The researchers’ first step was to determine which locations on the face are the sources of the most reliable neuromuscular signals. So they conducted experiments in which the same subjects were asked to subvocalize the same series of words four times, with an array of 16 electrodes at different facial locations each time.
The researchers wrote code to analyze the resulting data and found that signals from seven particular electrode locations were consistently able to distinguish subvocalized words. In the conference paper, the researchers report a prototype of a wearable silent-speech interface, which wraps around the back of the neck like a telephone headset and has tentacle-like curved appendages that touch the face at seven locations on either side of the mouth and along the jaws.
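The screening idea — test each candidate electrode on its own and keep only the locations whose signals reliably distinguish words — can be sketched in a few lines. Everything below is illustrative: the synthetic signal model, the nearest-centroid classifier, and the accuracy threshold are assumptions for the sketch, not the researchers’ actual pipeline.

```python
import random

random.seed(0)

WORDS = ["add", "multiply"]

def record_trial(electrode, word):
    # Hypothetical signal model: in this toy setup, electrodes 0-6 carry
    # word information and the remaining channels are pure noise.
    base = (1.0 if word == "add" else -1.0) if electrode < 7 else 0.0
    return base + random.gauss(0, 0.5)

def electrode_accuracy(electrode, trials=200):
    # Score one electrode with a nearest-centroid classifier:
    # fit per-word centroids on one batch, test on a fresh batch.
    train = [(record_trial(electrode, w), w) for w in WORDS for _ in range(trials)]
    centroids = {w: sum(x for x, lw in train if lw == w) / trials for w in WORDS}
    test = [(record_trial(electrode, w), w) for w in WORDS for _ in range(trials)]
    correct = sum(1 for x, w in test
                  if min(WORDS, key=lambda c: abs(x - centroids[c])) == w)
    return correct / len(test)

# Keep electrodes whose single-channel accuracy clearly beats chance.
reliable = [e for e in range(16) if electrode_accuracy(e) > 0.75]
print(reliable)  # under this synthetic model: the informative electrodes 0-6
```

With real recordings the features would be multichannel time series rather than a scalar per trial, but the selection logic — rank locations by how well their signals alone separate the vocabulary — is the same.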
But in current experiments, the researchers are getting comparable results using only four electrodes along one jaw, which should lead to a less obtrusive wearable device.
Once they had selected the electrode locations, the researchers began collecting data on a few computational tasks with limited vocabularies — about 20 words each. One was arithmetic, in which the user would subvocalize large addition or multiplication problems; another was a chess application, in which the user would report moves using the standard chess numbering system.
Then, for each application, they used a neural network to find correlations between particular neuromuscular signals and particular words. Like most neural networks, the one the researchers used is arranged into layers of simple processing nodes, each of which is connected to several nodes in the layers above and below. Data is fed into the bottom layer, whose nodes process it and pass it to the next layer, whose nodes process it and pass it to the layer after that, and so on. The output of the final layer is the result of some classification task.
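That layer-by-layer flow can be illustrated with a toy feed-forward classifier in plain Python. The weights, the three-word vocabulary, and the interpretation of the inputs as per-electrode signal features are all invented for illustration; the actual network is trained on real neuromuscular recordings.

```python
import math

def dense(v, weights, biases):
    # One layer: each output node sums its weighted inputs plus a bias.
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, biases)]

def relu(v):
    return [max(0.0, x) for x in v]

def softmax(v):
    # Turn the final layer's scores into class probabilities.
    exps = [math.exp(x - max(v)) for x in v]
    return [e / sum(exps) for e in exps]

# Toy network: 4 input features (say, per-electrode signal energies),
# one hidden layer of 3 nodes, 3 output classes (a tiny vocabulary).
W1 = [[0.5, -0.2, 0.1, 0.0],
      [0.0, 0.3, -0.4, 0.2],
      [0.1, 0.1, 0.1, 0.1]]
b1 = [0.0, 0.1, -0.1]
W2 = [[1.0, 0.0, 0.5],
      [0.0, 1.0, -0.5],
      [0.5, 0.5, 0.0]]
b2 = [0.0, 0.0, 0.0]

VOCAB = ["one", "plus", "two"]

def classify(features):
    hidden = relu(dense(features, W1, b1))    # bottom layer feeds the next
    probs = softmax(dense(hidden, W2, b2))    # final layer yields class scores
    return VOCAB[probs.index(max(probs))]

print(classify([0.9, 0.1, 0.2, 0.4]))  # prints "one"
```

In the real system the weights are learned from labeled subvocalization data, so that the highest-scoring output node corresponds to the word the user silently said.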
The basic configuration of the researchers’ system includes a neural network trained to identify subvocalized words from neuromuscular signals, but it can be customized to a particular user through a process that retrains just the last two layers.
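Personalization of this kind is commonly sketched as freezing the early layers — treating them as a fixed feature extractor — and retraining only the top of the network on a user’s calibration data. The sketch below simplifies further to a single trainable output layer (the paper retrains the last two); the frozen extractor, the calibration data, and the learning rate are all hypothetical.

```python
import math

# Frozen "early layers": a fixed feature extractor that is never updated
# during personalization. Here it's a hypothetical stand-in function.
def frozen_features(signal):
    return [math.tanh(signal[0] + signal[1]), math.tanh(signal[0] - signal[1])]

# Per-user calibration data: (raw signal, class label). Invented values.
user_data = [([1.0, 0.2], 0), ([0.9, 0.1], 0),
             ([-0.8, 0.3], 1), ([-1.1, -0.2], 1)]

# Trainable head: logistic regression on top of the frozen features.
w = [0.0, 0.0]
b = 0.0

def predict(feats):
    z = sum(wi * f for wi, f in zip(w, feats)) + b
    return 1.0 / (1.0 + math.exp(-z))  # probability of class 1

# A few gradient-descent passes update only w and b; the frozen
# extractor is untouched, which is what makes calibration fast.
lr = 0.5
for _ in range(200):
    for signal, label in user_data:
        feats = frozen_features(signal)
        err = predict(feats) - label
        for i in range(len(w)):
            w[i] -= lr * err * feats[i]
        b -= lr * err

# After calibration, the personalized head separates this user's classes.
for signal, label in user_data:
    print(round(predict(frozen_features(signal))), label)
```

Retraining only the top layers needs far less data than training from scratch, which is consistent with the roughly 15-minute calibration session reported in the usability study.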
Practical Applications of Such a Device
Using the prototype wearable interface, the researchers conducted a usability study in which 10 subjects spent about 15 minutes each customizing the arithmetic application to their own neurophysiology, then spent another 90 minutes using it to execute computations. In that study, the system had an average transcription accuracy of about 92 percent.
But, Kapur says, the system’s performance should improve with more training data, which could be collected during its ordinary use. Although he hasn’t crunched the numbers, he estimates that the better-trained system he uses for demonstrations has an accuracy rate greater than 92 percent.
In ongoing work, the researchers are collecting a wealth of information on more elaborate conversations, in the hope of building applications with far more expansive vocabularies.
“We are in the middle of collecting data, and the results look good,” Kapur says. “I think we will achieve full conversation some day.”
Thad Starner, a professor in Georgia Tech’s College of Computing, thinks there is a lot more potential in such a system: “Like, say, controlling the airplanes on the tarmac at Hartsfield Airport here in Atlanta. You’ve got jet noise all around you; you’re wearing these big ear-protection things — wouldn’t it be great to communicate with voice in an environment where you normally wouldn’t be able to? You can imagine all these situations where you have a high-noise environment, like the flight deck of an aircraft carrier, or even places with a lot of machinery, like a power plant or a printing press. This is a system that would make sense, especially because oftentimes in these types of situations people are already wearing protective gear. For instance, if you’re a fighter pilot, or if you’re a firefighter, you’re already wearing these masks.”
“The other thing where this is extremely useful is special ops,” Starner adds. “There are a lot of places where it’s not a noisy environment but a silent environment. A lot of the time, special-ops folks use hand gestures, but you can’t always see those. Wouldn’t it be nice to have silent speech for communication between these people? The last one is people who have disabilities where they can’t vocalize normally. For example, Roger Ebert could no longer speak because he lost his jaw to cancer. Could he do this sort of silent speech and then have a synthesizer that would speak the words?”