Listening in: researchers record conversation by filming chip bag

By Martin Laine     Aug 5, 2014 in Technology
A team of researchers has developed an algorithm that can be used to reconstruct a private conversation just by filming an object in the same room — in one case, a potato chip bag.
“When sound hits an object, it causes the object to vibrate,” said Abe Davis, a graduate student at the Massachusetts Institute of Technology and lead author of the study, in an article on the MIT news website. “The motion of this vibration creates a very subtle signal that’s usually invisible to the naked eye. People didn’t realize the information was there.”
The team was made up of researchers from MIT, Microsoft, and Adobe. Their paper will be presented at the SIGGRAPH 2014 conference, which begins next week in Vancouver, BC. The international conference highlights innovations in computer graphics and interactive techniques, according to its website.
In one experiment, a potato chip bag was filmed through soundproof glass while someone 15 feet away recited the nursery rhyme “Mary Had a Little Lamb.” Similar experiments were carried out on aluminum foil, the surface of a glass of water, and the leaves of a potted plant.
To extract audio from video, the frame rate, that is, the number of frames captured per second, has to be high enough to pick up the movement of the object being filmed. While there are commercial high-speed cameras that can film at 100,000 to 200,000 frames per second, the researchers generally used a camera that could film at 2,000 to 6,000 frames per second. They also experimented with an ordinary digital camera.
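A rough way to see why frame rate matters is the standard sampling rule of thumb: a signal sampled at a fixed rate can only represent frequencies up to half that rate. The sketch below applies that rule to the frame rates mentioned above. It assumes each frame counts as a single sample of the vibration, which is a simplification the article does not confirm, and the function name `nyquist_limit_hz` is made up for illustration.

```python
def nyquist_limit_hz(frames_per_second):
    """Highest frequency representable when sampling once per frame (assumed model)."""
    if frames_per_second <= 0:
        raise ValueError("frame rate must be positive")
    return frames_per_second / 2.0


# Frame rates taken from the article.
for fps in (60, 2_000, 6_000, 100_000):
    print(f"{fps:>7} fps -> up to {nyquist_limit_hz(fps):,.0f} Hz")
```

Under this simple model, a 60 frames-per-second camera falls well short of the frequency range of speech. The article does not say how the researchers worked with the ordinary camera, so the bound should be read as an illustration of the one-sample-per-frame assumption only.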
The video is then converted into a color-coded graphic that can display movements as small as five-thousandths of a pixel. After the footage is run through a series of filters, a pattern of speech emerges that can then be reconstructed as audio.
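To make the idea of sub-pixel motion concrete, here is a toy sketch, not the researchers' algorithm, which the article describes only in outline. It assumes a noise-free, one-dimensional row of pixels containing a single blurred edge, and it estimates how far the edge has moved from the change in total brightness relative to the first frame. All names and constants are hypothetical, apart from the five-thousandths-of-a-pixel amplitude and the 2,000 frames-per-second rate, which come from the article.

```python
import math

RAMP_WIDTH = 4.0    # assumed width of the blurred edge, in pixels
EDGE_CENTER = 8.5   # assumed rest position of the edge, in pixels
ROW_LENGTH = 16     # assumed number of pixels in the synthetic row


def render_row(displacement):
    """Brightness (0..1) of each pixel for a soft edge shifted by `displacement` pixels."""
    center = EDGE_CENTER + displacement
    return [min(1.0, max(0.0, (x - center) / RAMP_WIDTH + 0.5))
            for x in range(ROW_LENGTH)]


def estimate_displacements(frames):
    """Estimate per-frame edge displacement from total brightness, relative to frame 0."""
    reference = sum(frames[0])
    # With the constants above, four pixels sit on the ramp and each changes by
    # d / RAMP_WIDTH, so moving the edge right by d lowers total brightness by d.
    return [reference - sum(frame) for frame in frames]


FPS = 2000          # a frame rate within the range the article mentions
TONE_HZ = 100       # assumed test tone
AMPLITUDE = 0.005   # five-thousandths of a pixel, the figure quoted in the article

true_motion = [AMPLITUDE * math.sin(2 * math.pi * TONE_HZ * k / FPS)
               for k in range(200)]
frames = [render_row(d) for d in true_motion]
recovered = estimate_displacements(frames)
worst_error = max(abs(r - t) for r, t in zip(recovered, true_motion))
print(f"largest error: {worst_error:.2e} pixels")
```

The recovered sequence is a waveform sampled at the frame rate, which is the kind of signal that could be played back as sound. Real footage contains sensor noise and far more complex motion, which is presumably why the actual method relies on a series of filters.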
It stands to reason that the higher the frame rate, the clearer the audio will be. But even the relatively slow 60 frames per second of an ordinary digital camera can be used to determine the gender of a speaker, the number of speakers in a room, and even the identity of a speaker, provided a sample of that person's speech is available for comparison.
“This is new and refreshing,” said Alexei Efros, an associate professor at UC Berkeley. “It’s the kind of stuff no other group would do right now. I’m sure there will be applications that nobody will expect. I think the hallmark of good science is when you do something just because it’s cool and then somebody turns around and uses it for something you never imagined.”