Neural network learns to recognize sounds

MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) researchers have developed techniques to improve sound recognition for computers, which could enhance machine learning.

12/17/2016


The researchers’ neural network was fed video from 26 terabytes of video data downloaded from the photo-sharing site Flickr. Researchers found the network can interpret natural sounds in terms of image categories. For instance, the network might determineIn recent years, computers have gotten remarkably good at recognizing speech and images: Think of the dictation software on most cellphones, or the algorithms that automatically identify people in photos posted to Facebook.

Recognition of natural sounds such as crowds cheering or waves crashing has lagged behind. That's because most automated recognition systems, whether they process audio or visual information, are the result of machine learning, in which computers search for patterns in huge compendia of training data. Usually, the training data has to be first annotated by hand, which is prohibitively expensive for all but the highest-demand applications.

Sound recognition may be catching up, however, thanks to researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL). At the Neural Information Processing Systems conference next week, they will present a sound-recognition system that outperforms its predecessors but didn't require hand-annotated data during training.

Instead, the researchers trained the system on video. First, existing computer vision systems that recognize scenes and objects categorized the images in the video. The new system then found correlations between those visual categories and natural sounds.

"Computer vision has gotten so good that we can transfer it to other domains," said Carl Vondrick, an MIT graduate student in electrical engineering and computer science and one of the paper's two first authors. "We're capitalizing on the natural synchronization between vision and sound. We scale up with tons of unlabeled video to learn to understand sound."

The researchers tested their system on two standard databases of annotated sound recordings, and it was between 13 and 15% more accurate than the best-performing previous system. On a data set with 10 different sound categories, it could categorize sounds with 92% accuracy, and on a data set with 50 categories it performed with 74% accuracy. On those same data sets, humans are 96 and 81% accurate, respectively.

"Even humans are ambiguous," said Yusuf Aytar, the paper's other first author and a postdoc in the lab of MIT professor of electrical engineering and computer science Antonio Torralba. Torralba is the final co-author on the paper.

"We did an experiment with Carl," Aytar said. "Carl was looking at the computer monitor, and I couldn't see it. He would play a recording and I would try to guess what it was. It turns out this is really, really hard. I could tell indoor from outdoor, basic guesses, but when it comes to the details: 'Is it a restaurant?' those details are missing. Even for annotation purposes, the task is really hard."

Complementary modalities

Because it takes far less power to collect and process audio data than it does to collect and process visual data, the researchers envision that a sound-recognition system could be used to improve the context sensitivity of mobile devices.

When coupled with GPS data, for instance, a sound-recognition system could determine that a cellphone user is in a movie theater and that the movie has started, and the phone could automatically route calls to a prerecorded outgoing message. Similarly, sound recognition could improve the situational awareness of autonomous robots.

"For instance, think of a self-driving car," Aytar said. "There's an ambulance coming, and the car doesn't see it. If it hears it, it can make future predictions for the ambulance—which path it's going to take—just purely based on sound."

Visual language

The researchers' machine-learning system is a neural network, so called because its architecture loosely resembles that of the human brain. A neural net consists of processing nodes that, like individual neurons, can perform only rudimentary computations but are densely interconnected. Information such as the pixel values of a digital image is fed to the bottom layer of nodes, which processes it and feeds it to the next layer, which processes it and feeds it to the next layer, and so on. The training process continually modifies the settings of the individual nodes, until the output of the final layer reliably performs some classification of the data.

Vondrick, Aytar, and Torralba first trained a neural net on two large, annotated sets of images: one, the ImageNet data set, contains labeled examples of images of 1,000 different objects; the other, the Places data set created by Torralba's group, contains labeled images of 401 different scene types, such as a playground, bedroom, or conference room.

Once the network was trained, the researchers fed it the video from 26 terabytes of video data downloaded from the photo-sharing site Flickr. "It's about 2 million unique videos," Vondrick said. "If you were to watch all of them back to back, it would take you about two years." Then they trained a second neural network on the audio from the same videos. The second network's goal was to correctly predict the object and scene tags produced by the first network.

The result was a network that could interpret natural sounds in terms of image categories. For instance, it might determine that the sound of birdsong tends to be associated with forest scenes and pictures of trees, birds, birdhouses, and bird feeders.

Benchmarking

To compare the sound-recognition network's performance to that of its predecessors, however, the researchers needed a way to translate its language of images into the familiar language of sound names. So they trained a simple machine-learning system to associate the outputs of the sound-recognition network with a set of standard sound labels.

For that, the researchers did use a database of annotated audio—one with 50 categories of sound and about 2,000 examples. Those annotations had been supplied by humans. But it's much easier to label 2,000 examples than to label 2 million. And the MIT researchers' network, trained first on unlabeled video, significantly outperformed all previous networks trained solely on the 2,000 labeled examples.

Massachusetts Institute of Technology (MIT)

www.mit.edu 

- Edited by Chris Vavra, production editor, Control Engineering, CFE Media,  cvavra@cfemedia.com. See more Control Engineering machine vision stories.



Engineers' Choice Awards
The Engineers' Choice Awards highlight some of the best new control, instrumentation and automation products as chosen by Control Engineering subscribers.
System Integrator Giants
The System Integrator Giants program lists the top 100 system integrators among companies listed in CFE Media's Global System Integrator Database.
System Integrator of the Year
Each year, a panel of Control Engineering and Plant Engineering editors and industry expert judges select the System Integrator of the Year Award winners in three categories.
How to Maximize Factory Automation Efficiency with Low Cost Machine Vision
This eGuide illustrates solutions, applications and benefits of machine vision systems.
Wireless Reliability in Harsh Environments
Learn how to increase device reliability in harsh environments and decrease unplanned system downtime.
Human Factors and the Impact on Plant Safety
This eGuide contains a series of articles and videos that considers theoretical and practical; immediate needs and a look into the future.
August 2018
Augmented reality and virtual reality education, autotuning PID control, cybersecurity advice, educating engineers
July 2018
Ladder logic best practices and object-oriented programming, safety instrumented systems, enclosure design issues and challenges, process control advice
June 2018
Discrete and process sensor fundamentals, autotuning controls, system integrator roundtable
Edge Computing
This article collection contains several articles on how today's technologies heap benefits onto an edge-computing architecture such as faster computing, better networking, more memory, smarter analytics, cloud-based intelligence, and lower costs.
Data Center Design
Data centers, data closets, edge and cloud computing, co-location facilities, and similar topics are among the fastest-changing in the industry.
PLCs
Programmable logic controllers (PLCs) represent the logic (decision) part of the control loop of sense, decide, and actuate. Featured articles in this digital report compare PLCs and programmable automation controllers (PACs), industrial PCs, and robotic controllers.
SIDB

Find and connect with the most suitable service provider for your unique application. Start searching the Global System Integrator Database Now!

August 2018
SCADA standardization, capital expenditures, data-driven drilling and execution
June 2018
Machine learning, produced water benefits, progressive cavity pumps
April 2018
ROVs, rigs, and the real time; wellsite valve manifolds; AI on a chip; analytics use for pipelines
John O. Ayuk, PE, CFSE, PMP, CAP
Automation Engineer; Wood Group
Doug Baker
System Integrator; Cross Integrated Systems Group
Jose S. Vasquez, Jr.
Jose S. Vasquez, Jr.
Fire & Life Safety Engineer; Technip USA Inc.
Data Centers: Impacts of Climate and Cooling Technology
This course focuses on climate analysis, appropriateness of cooling system selection, and combining cooling systems.
Safety First: Arc Flash 101
This course will help identify and reveal electrical hazards and identify the solutions to implementing and maintaining a safe work environment.
Critical Power: Hospital Electrical Systems
This course explains how maintaining power and communication systems through emergency power-generation systems is critical.
Engineers' Choice Awards
The Engineers' Choice Awards highlight some of the best new control, instrumentation and automation products as chosen by Control Engineering subscribers.
System Integrator Giants
The System Integrator Giants program lists the top 100 system integrators among companies listed in CFE Media's Global System Integrator Database.
System Integrator of the Year
Each year, a panel of Control Engineering and Plant Engineering editors and industry expert judges select the System Integrator of the Year Award winners in three categories.
How to Maximize Factory Automation Efficiency with Low Cost Machine Vision
This eGuide illustrates solutions, applications and benefits of machine vision systems.
Wireless Reliability in Harsh Environments
Learn how to increase device reliability in harsh environments and decrease unplanned system downtime.
Human Factors and the Impact on Plant Safety
This eGuide contains a series of articles and videos that considers theoretical and practical; immediate needs and a look into the future.
August 2018
Augmented reality and virtual reality education, autotuning PID control, cybersecurity advice, educating engineers
July 2018
Ladder logic best practices and object-oriented programming, safety instrumented systems, enclosure design issues and challenges, process control advice
June 2018
Discrete and process sensor fundamentals, autotuning controls, system integrator roundtable
Edge Computing
This article collection contains several articles on how today's technologies heap benefits onto an edge-computing architecture such as faster computing, better networking, more memory, smarter analytics, cloud-based intelligence, and lower costs.
Data Center Design
Data centers, data closets, edge and cloud computing, co-location facilities, and similar topics are among the fastest-changing in the industry.
PLCs
Programmable logic controllers (PLCs) represent the logic (decision) part of the control loop of sense, decide, and actuate. Featured articles in this digital report compare PLCs and programmable automation controllers (PACs), industrial PCs, and robotic controllers.
SIDB

Find and connect with the most suitable service provider for your unique application. Start searching the Global System Integrator Database Now!

August 2018
SCADA standardization, capital expenditures, data-driven drilling and execution
June 2018
Machine learning, produced water benefits, progressive cavity pumps
April 2018
ROVs, rigs, and the real time; wellsite valve manifolds; AI on a chip; analytics use for pipelines
John O. Ayuk, PE, CFSE, PMP, CAP
Automation Engineer; Wood Group
Doug Baker
System Integrator; Cross Integrated Systems Group
Jose S. Vasquez, Jr.
Jose S. Vasquez, Jr.
Fire & Life Safety Engineer; Technip USA Inc.
Data Centers: Impacts of Climate and Cooling Technology
This course focuses on climate analysis, appropriateness of cooling system selection, and combining cooling systems.
Safety First: Arc Flash 101
This course will help identify and reveal electrical hazards and identify the solutions to implementing and maintaining a safe work environment.
Critical Power: Hospital Electrical Systems
This course explains how maintaining power and communication systems through emergency power-generation systems is critical.
Engineers' Choice Awards
The Engineers' Choice Awards highlight some of the best new control, instrumentation and automation products as chosen by Control Engineering subscribers.
System Integrator Giants
The System Integrator Giants program lists the top 100 system integrators among companies listed in CFE Media's Global System Integrator Database.
System Integrator of the Year
Each year, a panel of Control Engineering and Plant Engineering editors and industry expert judges select the System Integrator of the Year Award winners in three categories.
How to Maximize Factory Automation Efficiency with Low Cost Machine Vision
This eGuide illustrates solutions, applications and benefits of machine vision systems.
Wireless Reliability in Harsh Environments
Learn how to increase device reliability in harsh environments and decrease unplanned system downtime.
Human Factors and the Impact on Plant Safety
This eGuide contains a series of articles and videos that considers theoretical and practical; immediate needs and a look into the future.
August 2018
Augmented reality and virtual reality education, autotuning PID control, cybersecurity advice, educating engineers
July 2018
Ladder logic best practices and object-oriented programming, safety instrumented systems, enclosure design issues and challenges, process control advice
June 2018
Discrete and process sensor fundamentals, autotuning controls, system integrator roundtable
Edge Computing
This article collection contains several articles on how today's technologies heap benefits onto an edge-computing architecture such as faster computing, better networking, more memory, smarter analytics, cloud-based intelligence, and lower costs.
Data Center Design
Data centers, data closets, edge and cloud computing, co-location facilities, and similar topics are among the fastest-changing in the industry.
PLCs
Programmable logic controllers (PLCs) represent the logic (decision) part of the control loop of sense, decide, and actuate. Featured articles in this digital report compare PLCs and programmable automation controllers (PACs), industrial PCs, and robotic controllers.
SIDB

Find and connect with the most suitable service provider for your unique application. Start searching the Global System Integrator Database Now!

August 2018
SCADA standardization, capital expenditures, data-driven drilling and execution
June 2018
Machine learning, produced water benefits, progressive cavity pumps
April 2018
ROVs, rigs, and the real time; wellsite valve manifolds; AI on a chip; analytics use for pipelines
John O. Ayuk, PE, CFSE, PMP, CAP
Automation Engineer; Wood Group
Doug Baker
System Integrator; Cross Integrated Systems Group
Jose S. Vasquez, Jr.
Jose S. Vasquez, Jr.
Fire & Life Safety Engineer; Technip USA Inc.
Data Centers: Impacts of Climate and Cooling Technology
This course focuses on climate analysis, appropriateness of cooling system selection, and combining cooling systems.
Safety First: Arc Flash 101
This course will help identify and reveal electrical hazards and identify the solutions to implementing and maintaining a safe work environment.
Critical Power: Hospital Electrical Systems
This course explains how maintaining power and communication systems through emergency power-generation systems is critical.
click me