After the release of the codec specification in March 2018 (“frozen bitstream” and reference decoder), the next step was to show that AV1 could be used in production. The decoder had to be made fast enough on commodity hardware, and hardware vendors had to integrate AV1 support into their chips, starting with the base profile. Advanced profiles (SVC, …) and modes (lossless, real-time) would follow. For the real-time and SVC modes, a dedicated subgroup was created to continue the work beyond the codec specification itself: integration with the Real-time Transport Protocol (i.e. writing an AV1 RTP payload specification) and usage of SVC in conjunction with media servers (RTP header extensions, …) both needed to happen. On Halloween 2018, CoSMo demoed live the first AV1 RTP integration in WebRTC. It did not support SVC and was not real-time. On June 26, 2019, Cisco demonstrated live from New York the first real-time AV1 RTP integration in WebRTC, through a modified version of its flagship product, Webex. This marks a new step in the evolution of AV1, one that happened 12 months earlier than anybody thought it would.
I. Generic notes about recent Codec History & different ways to go at it.
The Alliance for Open Media (AOM) was created to ensure the genesis of a new, royalty-free codec called AV1, based on the initial effort of three codec projects: Google’s VP10, Mozilla/Xiph’s Daala and Cisco’s Thor.
Royalties were not the only problem. IP claims and the formation of “IP pools” have plagued the industry for quite some time, and the situation became so egregious with H.265 that the only chairman MPEG has ever had explicitly denounced it in a dedicated blog post. Jonathan Rosenberg, Cisco CTO at the time, also attacked it directly (in 2015, when the pool situation started to go bad, and again in March 2018, when AV1 was released), citing it as the main reason to just skip H.265. More on Cisco later.
Traditionally, codecs have been about keeping the quality while achieving better compression. Movies and broadcast programmes were almost always pre-recorded, and there was plenty of time to find the right settings to achieve the best quality for a given, fixed physical medium size (CD, DVD, Blu-ray Disc, …). The traditional benchmarks therefore focus on two use cases: the smallest possible file at a given quality factor, or the highest quality achievable for a fixed size. Most of the time, this involves two passes: a first pass that extracts statistics on the content, and a second pass that actually encodes. It is generally assumed that, from one generation to the next, you should expect a 20 to 30% reduction in size at a given fixed quality, in exchange for increased complexity and CPU footprint.
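The two-pass idea can be illustrated with a toy bit allocator. This is only a sketch under simplifying assumptions, not how libaom or any real encoder does rate control: the per-frame "complexity" scores and the function names are hypothetical, and the second pass simply spends a fixed bit budget in proportion to the statistics gathered by the first.

```python
from typing import List


def first_pass(frame_complexities: List[float]) -> float:
    """Pass 1: gather statistics only (here, the total complexity of the clip)."""
    return sum(frame_complexities)


def second_pass(frame_complexities: List[float], total_bits: int) -> List[int]:
    """Pass 2: spend a fixed bit budget in proportion to each frame's complexity."""
    total_complexity = first_pass(frame_complexities)
    return [int(total_bits * c / total_complexity) for c in frame_complexities]


# Hypothetical complexity scores: two static frames, a scene cut, then some motion.
complexities = [1.0, 1.0, 6.0, 2.0]
allocation = second_pass(complexities, total_bits=100_000)
print(allocation)
```

The point of the first pass is that the encoder knows about the expensive scene cut before it spends any bits, which a single-pass, real-time encoder cannot do.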
Nowadays, codecs have many usages, and each use case has different expectations. There are different angles that need to be optimised at the same time, and priorities need to be set.
- Smaller Size
  - We want to use the minimum amount of bandwidth or storage space.
- Better Quality
  - We want to support higher input resolutions (4k, 8k, …).
  - We want to support higher input frame rates (60fps, 120fps, …).
  - We want to support better color spaces (HDR, 8/10/12 bits, …).
  - We want the encoding/decoding not to reduce quality (subjective metrics, VMAF, …).
  - We want a lossless mode (screen sharing). This is one of the new codec modes that emerged in the past years.
- Decode Speed
  - We always want to decode in real-time (30fps).
Those above are the traditional goals of a codec. Very early on, the need was recognised for better codecs for video sent over a network rather than burned onto physical media, and additional angles were optimised.
MPEG made the first request for a scalable video codec in 2003, which eventually found its way into H.264 Annex G, at a time when H.222 (MPEG-2 TS), H.263, and H.320 were ruling the video world.
- Better Resilience to Network Conditions, and Adaptability
  - We want the video stream to behave well in the presence of packet loss, jitter and latency.
  - We want the video stream to automatically adapt its bitrate, resolution, frame rate, and bit depth when need be, to ensure playback continuity.
Eventually, real-time video became important enough to justify its own requirements and codec mode.
- Better Speed
  - We want to encode as fast as possible, sometimes as fast as real-time (30/60fps).
- Better Latency
  - We need a real-time mode, where the time it takes for a video frame to go from capture to display should be less than 300ms in good network conditions.
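To make the two real-time constraints above concrete, here is a small sketch that checks them separately: the encoder must finish each frame within one frame interval (about 33ms at 30fps), and the whole capture-to-display pipeline must stay under the 300ms target. The stage names and timings are hypothetical, chosen only for illustration.

```python
from typing import Dict


def frame_interval_ms(fps: float) -> float:
    """Time available per frame if the encoder is to keep up with the source."""
    return 1000.0 / fps


def encoder_keeps_up(encode_ms: float, fps: float) -> bool:
    """Throughput check: one frame must be encoded before the next one arrives."""
    return encode_ms <= frame_interval_ms(fps)


def within_latency_budget(stages_ms: Dict[str, float], budget_ms: float = 300.0) -> bool:
    """Latency check: the sum of all stages, capture to display, must fit the budget."""
    return sum(stages_ms.values()) <= budget_ms


# Hypothetical per-stage timings, in milliseconds.
pipeline = {
    "capture": 20,
    "encode": 30,
    "network": 100,
    "jitter_buffer": 60,
    "decode": 10,
    "render": 20,
}

print(encoder_keeps_up(pipeline["encode"], fps=30))  # 30ms fits in a 33.3ms interval
print(encoder_keeps_up(pipeline["encode"], fps=60))  # 30ms does not fit in 16.7ms
print(within_latency_budget(pipeline))               # 240ms total fits under 300ms
```

The two checks are independent: an encoder can have excellent throughput and still add too much latency, for instance if it buffers many frames before emitting output.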
II. Timeline of AV1
In January 2018, Apple joined AOMedia.
In March 2018, AOMedia announced the release of AV1 along with its (excruciatingly slow) reference implementation: libaom.
By September 2018, Chrome 70 and Firefox Nightly had added some kind of support for decoding and playing AV1, and YouTube and Netflix had started offering limited support, test videos, and so on.
In October 2018, CoSMo Software announced the first AV1 integration in RTP and WebRTC. Not real-time, no SVC support.
In December 2018, the AOMedia-sponsored dav1d decoder was released. It quickly became the de facto standard decoder, and is included e.g. in Firefox 67, ….
In January 2019, CoSMo Software joined AOMedia.
In March 2019, dav1d Version 0.2 was released.
In March 2019, Samsung joined AOMedia.
On April 8th, 2019, at NAB, Intel and Netflix, both founding members of AOMedia, announced their collaboration around the SVT-AV1 open-source codec that Intel had released in February 2019, boasting great throughput rates but with latency that still left room for improvement.
On 18 April 2019, Allegro DVT announced its AL-E210 multi-format video encoder hardware IP, the first publicly announced hardware AV1 encoder.
In May 2019, dav1d Version 0.3 was announced, with further optimisations demonstrating performance 2 to 5 times faster than libaom.
On May 28th, 2019, Realtek announced the RTD2893, its first integrated circuit with AV1 decoding, up to 8K. On June 17th 2019, it announced the RTD1311 SoC for set-top boxes with an integrated AV1 decoder.
On June 26th, 2019, Cisco gave a public demo of the first real-time AV1 integration in RTP and WebRTC. It did not support SVC. Unlike OpenH264, it will not be open-sourced anytime soon.
On July 9th, 2019, at CommConUK, CoSMo Software released a demo of real-time AV1 integration in RTP and WebRTC, answering Cisco’s. It also did not support SVC. Availability in their streaming service MilliCast.com was announced for IBC, along with other AV1 goodness.
III. What does that mean for me, and what’s next.
It means that the AV1 main profile is ready for production, and is actually already integrated in some services. While AOMedia is still working on the specification of AV1, especially when it comes to SVC and its interaction with media servers, the simpler uses of AV1 that do not require SVC are perfectly viable today.
If you are in this space and do not have an AV1 project ongoing, you’re late. The state of the art, as set by the Cisco and CoSMo demos, is now: 720p at 30fps, encoded in real-time on a MacBook Pro, with the same quality as H.264 High Profile but using 30% less bandwidth.
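To put the 30% figure in perspective, here is a back-of-the-envelope calculation. The 2000 kbps H.264 bitrate for a 720p, 30fps stream is an assumed value for illustration, not a number from either demo.

```python
def reduced_bitrate_kbps(baseline_kbps: float, saving: float = 0.30) -> float:
    """Bitrate after applying a fractional saving relative to the baseline codec."""
    return baseline_kbps * (1.0 - saving)


def megabytes_per_hour(kbps: float) -> float:
    """Data volume of one hour of streaming at a constant bitrate (1 MB = 10^6 bytes)."""
    bytes_per_second = kbps * 1000 / 8
    return bytes_per_second * 3600 / 1e6


h264_kbps = 2000.0  # assumed H.264 bitrate for 720p at 30fps
av1_kbps = reduced_bitrate_kbps(h264_kbps)

print(f"H.264: {h264_kbps:.0f} kbps, {megabytes_per_hour(h264_kbps):.0f} MB/hour")
print(f"AV1:   {av1_kbps:.0f} kbps, {megabytes_per_hour(av1_kbps):.0f} MB/hour")
```

Under that assumption, each viewer-hour drops from roughly 900 MB to roughly 630 MB, a saving that scales linearly with audience size for a streaming service.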
Just like Cisco, you can bootstrap directly from H.264 or, if you cannot wait, do like Netflix and Google and use VP9 as a stepping stone on the way to AV1. The SVC structure in AV1 (k-SVC) has been greatly influenced by Google’s work on bandwidth usage optimisation with layered codecs in VP9 for Stadia, so going from VP9 to AV1 later should be relatively easy. It is likely that VP9 profile 2 (up to 10-bit I420 input) was motivated by the same project. It is also an intermediary step toward HDR, which would require additional work to support not only 12-bit images but also I422 and I444 input.
The next steps will be commodity and mobile hardware acceleration, and SVC implementations. While the AOMedia real-time codec group is hard at work preparing a specification for the RTP payload and SVC support, it is not clear exactly when they will deliver. Given the current race to make announcements, and the now public positioning of several players toward AV2, it would be great to deliver those two specifications (payload, header extensions) rapidly, so that interoperable solutions can be produced quickly. So far, all the successful implementations have been kept closed.
If you’re looking for packages or expertise for your AV1 project, you can contact CoSMo Software.