LightBlog

Monday, March 1, 2021

Google Duo uses a new codec for better call quality over poor connections

While U.S. carriers are busy marketing their new 5G networks, the reality is that the vast majority of people won’t experience the advertised speeds. There are still many parts of the U.S. — and around the world — where data speeds are slow, so to compensate, services like Google Duo use compression techniques to efficiently deliver the best possible video and audio experience. Google is now testing a new audio codec that aims to substantially improve audio quality on poor network connections.

In a blog post, the Google AI team details its new high-quality, very low-bitrate speech codec, named "Lyra." Like traditional parametric codecs, Lyra's basic architecture involves extracting distinctive speech attributes (also known as "features") in the form of log mel spectrograms, which are then compressed, transmitted over the network, and recreated on the other end by a generative model. Unlike more traditional parametric codecs, however, Lyra uses a new high-quality generative audio model that can not only extract critical parameters from speech but also reconstruct speech from minimal amounts of data. This generative model builds on Google's previous work on WaveNetEQ, the generative model-based packet-loss-concealment system currently used in Google Duo.
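The parametric pipeline described above can be sketched in a few lines of code. Everything here is illustrative: the log-mel extraction is a toy version, the quantizer stands in for Lyra's real compression stage, and the generative decoder is omitted entirely (a real decoder would reconstruct a waveform from the recovered features).

```python
import numpy as np

def log_mel_features(signal, n_fft=320, hop=160, n_mels=40):
    """Toy log-mel extraction: frame, windowed FFT magnitude, crude mel-like
    pooling of FFT bins into bands, then log compression."""
    frames = [signal[i:i + n_fft] for i in range(0, len(signal) - n_fft, hop)]
    mags = np.abs(np.fft.rfft(np.stack(frames) * np.hanning(n_fft), axis=1))
    bands = np.array_split(mags, n_mels, axis=1)  # stand-in for a mel filterbank
    mel = np.stack([b.mean(axis=1) for b in bands], axis=1)
    return np.log(mel + 1e-6)

def quantize(features, bits=8):
    """Stand-in for the compression stage: pack features into `bits` per value."""
    lo, hi = features.min(), features.max()
    q = np.round((features - lo) / (hi - lo) * (2**bits - 1)).astype(np.uint8)
    return q, lo, hi

def dequantize(q, lo, hi, bits=8):
    return q.astype(np.float32) / (2**bits - 1) * (hi - lo) + lo

# Encoder side: extract features, compress, and "transmit" them.
sr = 16000
t = np.arange(sr) / sr
speech = np.sin(2 * np.pi * 220 * t)   # stand-in for one second of speech
feats = log_mel_features(speech)
q, lo, hi = quantize(feats)

# Decoder side: a generative model would synthesize speech from these
# features; here we only recover the features themselves.
recovered = dequantize(q, lo, hi)
print(q.nbytes, "bytes transmitted for", len(speech) / sr, "s of audio")
```

The key point the sketch illustrates is that only compact features cross the network, not the waveform itself; the heavy lifting happens in the decoder-side generative model.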

Lyra's basic architecture. Source: Google

Google says this approach puts Lyra on par with the state-of-the-art waveform codecs used in many streaming and communication platforms today. The advantage of Lyra over those waveform codecs, according to Google, is that it doesn't send the signal sample-by-sample, which requires a higher bitrate (and thus more data). To overcome the computational complexity of running a generative model on-device, Google says Lyra uses a "cheaper recurrent generative model" that works "at a lower rate" but generates multiple signals in different frequency ranges in parallel, which are later combined "into a single output signal at the desired sample rate." Running this generative model on a mid-range device in real time yields a processing latency of 90ms, which Google says is "in line with other traditional speech codecs."
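The multi-band trick can be illustrated with plain signal processing: each band is produced at a fraction of the output sample rate, then upsampled, shifted to its frequency range, and summed into one full-rate signal. The band synthesis below (zero-insertion upsampling plus cosine modulation) is a toy filterbank, not Lyra's actual model, and the random band signals stand in for the model's parallel outputs.

```python
import numpy as np

def upsample_band(band, factor, band_index, sr_out):
    """Zero-stuff a low-rate band signal to the full rate, then modulate it
    up to the frequency range that band is responsible for."""
    up = np.zeros(len(band) * factor)
    up[::factor] = band                     # naive upsampling (zero insertion)
    t = np.arange(len(up)) / sr_out
    center = band_index * sr_out / (2 * factor)  # 0, 2k, 4k, 6k Hz here
    return up * np.cos(2 * np.pi * center * t)

# Suppose the model generates 4 bands in parallel, each at 1/4 the output rate.
sr_out, factor = 16000, 4
n = sr_out // factor                        # one second of low-rate samples
rng = np.random.default_rng(0)
bands = [rng.standard_normal(n) for _ in range(factor)]  # stand-in model outputs

# Combine the parallel band signals into a single full-rate output signal.
output = sum(upsample_band(b, factor, i, sr_out) for i, b in enumerate(bands))
```

Because each band runs at a quarter of the output rate, the per-band generation cost drops accordingly, which is the point of Google's "lower rate" parallel design.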

Paired with the AV1 codec for video, Google says video chats can take place even over an ancient 56kbps dial-up modem. That's because Lyra is designed to operate in heavily bandwidth-constrained environments, at bitrates as low as 3kbps. According to Google, Lyra easily outperforms the royalty-free, open-source Opus codec, as well as codecs like Speex, MELP, and AMR, at very low bitrates. Google provided the following speech samples; except for the audio encoded with Lyra, each sample suffers from degraded quality at these very low bitrates.

Clean Speech: Original, Opus@6kbps, Lyra@3kbps, Speex@3kbps

Noisy Environment: Original, Opus@6kbps, Lyra@3kbps, Speex@3kbps
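Taking the article's figures at face value, the dial-up claim comes down to simple arithmetic: 3kbps of Lyra audio leaves most of a 56kbps line for AV1 video. The transport-overhead allowance below is an illustrative assumption, not a number from Google.

```python
# Back-of-the-envelope bandwidth budget for a video call on a 56kbps line.
modem_kbps = 56.0      # classic dial-up modem line rate
lyra_kbps = 3.0        # Lyra's target speech bitrate (from the article)
overhead_kbps = 5.0    # assumed allowance for IP/transport overhead (illustrative)

video_budget_kbps = modem_kbps - lyra_kbps - overhead_kbps
print(f"Remaining for AV1 video: {video_budget_kbps:.0f} kbps")
```

Even with a generous overhead allowance, the audio stream consumes only a small fraction of the link, which is what makes very-low-bandwidth video calls plausible at all.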

Google says it trained Lyra "with thousands of hours of audio with speakers in over 70 languages using open-source audio libraries and then verifying the audio quality with expert and crowdsourced listeners." The new codec is already rolling out in Google Duo to improve call quality on very low-bandwidth connections. While Lyra is currently aimed at speech, Google is exploring how to make it a general-purpose audio codec.

The post Google Duo uses a new codec for better call quality over poor connections appeared first on xda-developers.



