ALERT! AI Can Understand Images and Incorporate Sounds Through Computer Vision
Imagine feeding a set of photos into a computer that automatically picks the right background sound for each slide. It would be like producing a hassle-free audio-visual presentation (AVP) in minutes. This type of artificial intelligence now exists, and Disney Research is developing it.
A picture of a cow can be paired with a moo, and a picture of a pig with its squeal. These are just a few of the capabilities of the new AI being developed by researchers at Disney Research.
"Videos with audio tracks provide us with a natural way to learn correlations between sounds and images," said Jean-Charles Bazin, an associate research scientist at Disney Research. "Video cameras equipped with microphones capture synchronized audio and visual information. In principle, every video frame is a possible training example."
The team's system can determine the appropriate sound for each photo or video frame it is shown. This capability could help people with visual impairments follow what is happening on screen. The researchers said the project was challenging, so they studied collections of videos to improve the system.
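Bazin's remark that every video frame is a possible training example can be illustrated with a small sketch. The Python below is not Disney Research's method. It only shows, under simple assumptions, how each frame of a video could be paired with the audio recorded around the same moment. The function name pair_frames_with_audio, its parameters, and the one-second window are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def pair_frames_with_audio(
    num_frames: int,
    fps: float,
    audio: Sequence[float],
    sample_rate: int,
    window_s: float = 1.0,
) -> List[Tuple[int, List[float]]]:
    """Pair each frame index with the audio samples around its timestamp.

    Hypothetical helper: the window length and the centering rule are
    assumptions, not details taken from the Disney Research system.
    """
    half = int(window_s * sample_rate / 2)  # half-window in samples
    pairs = []
    for i in range(num_frames):
        # Audio sample index that corresponds to this frame's timestamp.
        center = int(round(i / fps * sample_rate))
        # Clamp the window to the bounds of the recording.
        start = max(0, center - half)
        end = min(len(audio), center + half)
        pairs.append((i, list(audio[start:end])))
    return pairs


if __name__ == "__main__":
    # Toy data: 4 seconds of "audio" at 10 samples/s, video at 2 frames/s.
    toy_audio = [float(n) for n in range(40)]
    for frame, clip in pair_frames_with_audio(4, 2.0, toy_audio, 10):
        print(frame, len(clip))
```

Each resulting (frame, audio clip) pair could then serve as one candidate training example. Deciding which pairs are actually related, and which sounds are extraneous, is the hard part the researchers describe.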
Markus Gross, vice president of Disney Research, said that the images in a video and the sounds that accompany them can be ambiguous, since not every sound in a recording relates to what is shown. "By figuring out a way to filter out these extraneous sounds, our research team has taken a big step toward an array of new applications for computer vision," he added.
By using a computer algorithm to "learn" the proper sounds for a given frame, the team screened out the extraneous sounds and kept the one that suits the image. The researchers are still refining the system.
The team presented the system at a European Conference on Computer Vision (ECCV) workshop in Amsterdam.