The social media giant is looking for a way to harness the enormous potential of audio.
ICML, the world’s largest machine learning research conference, has just ended, and it was the perfect opportunity for Facebook to present its latest work in the field. The social media giant published 30 scientific papers, one of which is of particular interest to us: an artificial intelligence capable of separating up to five voices on the same microphone while reducing background noise.
Facebook’s model can separate up to five voices and isolate background noise
This innovation opens up great opportunities. Think in particular of voice assistants, which still sometimes struggle to understand us: with such technology, exchanges between a human and an intelligent assistant like Siri or Alexa could become much more fluid. The model could also improve audio quality for people with hearing aids. Beyond separating voices, the model unveiled by Facebook can also isolate background noise. Imagine all the possible applications of such technology, especially for recordings.
As researchers at FAIR (Facebook Artificial Intelligence Research) Israel, Facebook’s artificial intelligence laboratory, explain, they have developed a new neural network architecture that operates directly on the raw audio waveform. The model does not need to be told the total number of speakers in advance: the AI automatically detects how many different people are talking and can separate up to five voices on the same audio track.
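One way to handle an unknown speaker count is to run a model with more output channels than speakers and keep only the channels that carry real signal. The sketch below illustrates that selection heuristic with toy data; the function names and the silence threshold are illustrative stand-ins, not Facebook’s actual API.

```python
# Hypothetical sketch: a 5-output separation model fed a 2-speaker mixture
# would ideally produce 2 active channels and 3 near-silent ones. We keep
# only the channels whose energy exceeds a small silence threshold.
import numpy as np

def energy(track):
    """Mean power of an audio track."""
    return float(np.mean(track ** 2))

def pick_active_sources(separated, silence_threshold=1e-4):
    """Keep only output channels that carry real signal."""
    return [t for t in separated if energy(t) > silence_threshold]

# Toy stand-in for the model's 5 output channels (1 second at 16 kHz).
rng = np.random.default_rng(0)
outputs = [
    rng.standard_normal(16000) * 0.1,  # speaker 1
    rng.standard_normal(16000) * 0.1,  # speaker 2
    np.zeros(16000),                   # silent channel
    np.zeros(16000),                   # silent channel
    np.zeros(16000),                   # silent channel
]
voices = pick_active_sources(outputs)
print(len(voices))  # 2
```

The real model is trained end to end, but this kind of activity check conveys why a fixed-output architecture can still serve a variable number of speakers.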
As we can read on the Facebook AI website, this method does indeed seem revolutionary. Using the WSJ0-2mix and WSJ0-3mix datasets, the researchers achieved an improvement in SI-SNR (scale-invariant signal-to-noise ratio, a common measure of separation quality) of more than 1.5 decibels over the best previous models. For now, the technology works under studio conditions; the next step is to test its performance in real-world settings.
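The SI-SNR metric mentioned above compares a separated signal against the clean reference after projecting out any overall scaling, so that simply amplifying the output cannot inflate the score. A minimal sketch of the standard formula (not Facebook’s own evaluation code):

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between an estimated and a reference signal."""
    estimate = estimate - np.mean(estimate)
    target = target - np.mean(target)
    # Project the estimate onto the target to isolate its "clean" component.
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = alpha * target
    e_noise = estimate - s_target
    return 10 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))

rng = np.random.default_rng(0)
clean = rng.standard_normal(8000)
noisy = clean + 0.1 * rng.standard_normal(8000)
print(si_snr(noisy, clean))  # around 20 dB for noise at 1/10 amplitude
```

A 1.5 dB gain on this scale means the residual error energy drops by roughly 30%, which is why the result is considered a substantial step over the prior state of the art.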
The potential of audio
Thanks to artificial intelligence, audio seems to have real potential. A few months ago, Google also presented a model to improve the audio quality of Google Duo calls: WaveNetEQ, a machine learning model that fills in short gaps when audio data gets lost along the way. Handy, when you consider that 99% of calls suffer some loss of audio data.
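To see what WaveNetEQ improves on, it helps to look at the classical baseline it replaces: when a packet is lost, simply repeat the previous frame with some attenuation. The sketch below is that deliberately naive heuristic, purely for illustration; WaveNetEQ instead generates the missing audio with a learned model.

```python
# Simplistic packet-loss concealment: repeat the last good frame, attenuated.
# This is NOT WaveNetEQ, only the kind of heuristic it improves upon.
def conceal(frames):
    """frames: list of sample lists, with None marking lost packets."""
    out, last = [], None
    for frame in frames:
        if frame is None and last is not None:
            frame = [s * 0.5 for s in last]  # repeat previous frame, halved
        elif frame is None:
            frame = []  # nothing to repeat at the very start of the stream
        out.append(frame)
        last = frame
    return out

repaired = conceal([[0.2, 0.4], None, [0.1, 0.3]])
print(repaired[1])  # [0.1, 0.2]
```

Repetition like this sounds robotic on longer gaps, which is exactly where a generative model that predicts plausible speech continuation makes the difference.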
Twitter has been offering an audio message feature to iOS users for the past few weeks. With this new option it is finally possible to share much more detailed comments, since audio clips can last up to 140 seconds; once that limit is reached, a new message can even be added below the original, in a thread. Audio is becoming more and more important across the media, as the rise of podcasts in particular shows.