CoSMo provided the H264 simulcast implementation to Chrome and Safari (based on an earlier patch by HighFive, kudos to them). We helped Intel and Apple work together to put H265 in libwebrtc. As AOMedia members, we were also among the first to have a real-time implementation of AV1 in libwebrtc, and we have been speaking regularly about it at different conferences. Today, some of this work is becoming available in consumer versions of the browsers. Let us walk you through enabling it and taking it out for a ride.
If you are only interested in enabling the codecs, you can just skip the following sections and go directly to the bottom of this blog post.
Why are Apple and Google making decisions that appear to contradict each other when it comes to codecs or WebRTC?
If you want to generate traffic, take all the announcements, point to the inconsistencies, and claim that there is a codec war, or secret roadmaps. If you can bring it to the conspiracy-theory level and create a big enough controversy, you have generated all the buzz you could hope for.
More seriously, first, a little word about the perceived inconsistencies of these decisions. People see Apple or Google as single entities with one unified goal. The reality is that, beyond a certain size, any corporation is divided into teams or units with different goals, and internal politics emerge.
Let’s take Apple for example: you have the VOD side and the RTC side. On the VOD side you have the king technology, HLS, which has reigned without mercy over the streaming world for most of the past decade. Even though it is **NOT** a standard, it is used everywhere, by almost everyone, and is mandated by the App Store to stream anything to an Apple device. The entire AppleTV ecosystem is based on it. Lots of revenue is based on it. This technology is based on the usual codecs, specifically H264 and HEVC / H265 for video. Those are mandatory to use if you want to be compliant and for your app to get into the store. That gives Apple huge leverage. The counterpart is that Apple provides hardware support for H265 in all its devices. Needless to say, latency and bandwidth management (or security, which HLS delegates to the underlying transport) are not a focus for this side of Apple; the focus is more on what they call quality, and on resolution. In the HLS world, quality means adding buffers everywhere.
Now, it is worth stopping here for a minute to address the licensing problem. The H265 licensing situation is a mess. However, as with any codec, if you leverage a hardware implementation, the burden of the license is on the hardware provider. That is a key point that a lot of people in the WebRTC industry forget or underestimate.
Let me provide a telling example.
When we provided the H264 simulcast implementation to libwebrtc as a patch, it took nine months to get accepted by Google. Apple actually adopted it before Google did. The (official) reason? Legal review. Even with a much simpler license landscape for H264 than for H265, the legal review for H264 simulcast took Google nine months (and maybe six for Apple). Admittedly, there was more than the codec to be validated (RTP, simulcast itself), but still.
Most people in the WebRTC ecosystem know that today H264 in WebRTC is only supported on Android on a limited number of devices that have hardware acceleration. One of the reasons is that shipping it with a software implementation would make the browser vendors liable. Windows Firefox users have been prompted to download the OpenH264 DLL the first time they used that codec. That is because, for Firefox not to be liable, they:
- cannot compile the codec implementation (which Cisco does for everyone with OpenH264),
- but also cannot ship it. The end users need to install it themselves. Since legally the only binding action on a web page is a click in a prompt, there you go.
On the RTC side of things at Apple, you have FaceTime and now Safari. For them, using a hardware-accelerated encoder is always good (battery life has a great impact on UX), but it should not come at the cost of latency. Those goals are antagonistic to those of the much bigger HLS/MPEG-DASH/CMAF team within Apple. For example, as of today, none of the hardware encoders have a true real-time mode, even if a private API called VTC is used, and should be made public soon, with among other things a 0-frame buffer. They are fighting an uphill battle as, outside of the VoIP/WebRTC world, WebRTC is perceived as a low-quality solution that is barely good enough for 1-1 social calls and not much more. The fact that HLS / MPEG-DASH directly generates a lot of revenue, while Safari and FaceTime do not, makes their fight even more difficult. Several things have helped their cause: the growth of WebRTC usage over the past 10 years; the pressure from Cisco originally (a big part of the Cisco/Apple partnership was about enabling the same experience with WebRTC that FaceTime or a native call could provide, and it led to the opening of the H264 hardware acceleration API and ReplayKit, among other things); and then the pressure from all the other big players with a product depending on WebRTC (first app in the App Store: FB Messenger; second app in the App Store: WhatsApp …). The current work-from-home situation also provides extra pressure for vendors to support WebRTC.
With all this in mind, let’s revisit some of Apple decisions.
Why add H265 in WebRTC? The question is more, why not? The code had already been made available by Intel. Apple already had H265 hardware acceleration. It does not reduce the capability of those who cannot support it, but it allows peers with the capability to have an improved experience. It’s only a win. It also helped internally to bring the two groups together around a common asset. In practice, it took less than two days for one very jet-lagged Apple engineer and the main coder behind the implementation at Intel to get the code into libwebrtc-in-webkit and have a working version. It was not a big effort.
Of course, Apple never takes a decision about WebRTC without asking Google / Mozilla / Microsoft about it, because they cannot afford to maintain too big a fork, and because the web platform is consensus based. Microsoft already had hardware-only support for H265, and Google was not opposed to the patch as long as it was hardware based, replicating more or less what they had done with H264 on Android.
Why care about AOMedia and AV1 if you’re betting on H265?
First, those are not mutually exclusive. The decision to use H265 in multiple Apple products was taken a long time ago, while AV1 is more recent, and already much more efficient. As far as the codec is concerned, AV1 has been out for years now, and AOMedia is already discussing AV2. While Apple, as usual, did not comment on why they joined AOMedia, the individual they sent belongs to the HLS group and only asked questions about CMAF packaging of the AV1 bitstream, seemingly indicating that it has nothing to do with WebRTC for the time being.
For now, the Safari WebRTC team’s decision to support H265 and the Apple HLS team’s involvement in AOMedia do not seem linked at all.
What is interesting about AOMedia is that membership comes with some very interesting protective measures when it comes to IP. This has been a problem plaguing the streaming industry for a long time, and the state of H265 licensing is but one example of it. It is possible that Apple joining AOMedia was in part motivated by the legal protection that AOMedia provides, both in terms of legal due diligence on codecs and through the litigation protection fund for members.
What about Google then?
Google is also a big corporation, with the same problems. If you look at WebRTC, you have the core WebRTC team in Stockholm, the Hangouts/Meet team, the WebRTC network team in Seattle, the WebRTC Chrome team in Mountain View, the Stadia team, and the Duo team in Seattle. There are two PMs, Niklas and Huib, and then the founding fathers, Serge and Justin. That of course overlaps with the Chrome team (if only for the implementation, build system, test system, QA …), the YouTube team (which owns the codec development), and the Stadia client. Lots of stakeholders, with sometimes different focuses, roadmaps, and timelines.
For YouTube, the 2-pass version of the encoder is the most important. Since they own the codec team, that can make the real-time aspects of codec development a secondary goal at times. That being said, in our experience, libaom had a real-time mode way before SVT-AV1 (the other contender to become the official reference code base for AV2 moving forward) did, so in practice there has been no problem for us during the AV1-in-WebRTC project.
The Google representative at AOMedia (a founding member) is from the YouTube team. However, in the real-time group inside the codec group, which discusses the RTP payload and SVC for MANEs and SFUs specifically, the Google WebRTC team is represented by no fewer than three engineers and one engineering manager.
Since so many products depend on WebRTC, the two bigger groups (Chrome and YouTube) have to take WebRTC into account. When working on enabling real-time AV1 in WebRTC in Chrome, all those groups had to be involved; it seemed to be a first for those specific individuals, but business as usual otherwise. libaom had to add a real-time mode, which was done in April 2019. The default libaom support in Chrome was non-real-time, and decoder only, which makes sense for YouTube but is not appropriate for WebRTC. The libaom version had to be updated, and support for the encoder added in Chrome, which was done in March 2020. Then the WebRTC team had to add the RTP payload support, which took roughly five months, between November 2019 and April 2020. Then we jumped in to prepare an SFU and the tests. SVC support should land soon and is more or less the last remaining big feature before declaring beta status, at which point we need to test, test, and test some more to find corner cases and make sure the spec is complete.
The only way to be faster is to have a product that sits on the side. At Google, as far as WebRTC and communications are concerned, that is Duo. Duo is native only, has its own infrastructure, and can afford, to a certain extent, to release features without depending on any other Google group, or on Chrome (or the standards committees), to agree to them and implement them. That’s how Duo was the first product to release true end-to-end encryption, and how Duo is the first one to release AV1 support. It explains why Duo is always first and the other Google products catch up later.
ENABLING H265 in SAFARI (TECH PREVIEW)
Every now and then, I plant Easter eggs in my blog posts. It allows me to differentiate from other bloggers who copy content, proxying one’s quotes and sources without giving any credit. With Apple news this is especially effective, as there is much less info out there, and most WebRTC bloggers do not bother reading WebKit commits and tickets.
Once upon a time, I blogged about the first screen sharing support in Safari, or about the new WebDriver API in Safari for testing WebRTC features. Of course, those were not usable as is, and you needed some pretty low-level command-line magic to make them work, and/or to recompile WebKit nightly.
This time again, I pointed to Safari Tech Preview 104, and very quickly (time is of the essence to capture the light), the info spread around, based on a one-liner in the release notes. Only those who actually tested realised that the support had been added but was not enabled.
This time, the Easter egg goes to voluntas, the main developer of SORA, one of the best WebRTC SFUs out there.
So, if you want to enable H265 in Safari, you will need to get Safari Tech Preview 105 or newer and enable it through the Develop menu, under “Experimental Features” and then the WebRTC-prefixed options. The first results show a drastic reduction in CPU consumption, as expected.
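If you want to verify programmatically that the preview really exposes the codec, the standard capability API is enough. Below is a minimal sketch, assuming H265 shows up in `RTCRtpSender.getCapabilities('video')` with a `video/H265` mime type once the flag is on; the helper functions are ours, not part of any WebRTC API, and the pure-JavaScript core keeps the snippet runnable outside a browser.

```javascript
// Sketch: list the video codecs a browser advertises, and check for H265.
// listVideoCodecs / supportsH265 are hypothetical helpers of ours.
function listVideoCodecs(capabilities) {
  if (!capabilities || !capabilities.codecs) return [];
  // Dedupe on mimeType: a codec can appear several times with different
  // sdpFmtpLine parameters (profiles, packetization modes, ...).
  return [...new Set(capabilities.codecs.map((c) => c.mimeType))];
}

function supportsH265(capabilities) {
  return listVideoCodecs(capabilities).some((m) => /video\/h265/i.test(m));
}

// In Safari Tech Preview (with the experimental flag enabled), you would run:
//   const caps = RTCRtpSender.getCapabilities('video');
//   console.log(listVideoCodecs(caps), supportsH265(caps));
```

If the flag is off, H265 should simply be absent from the list and the check degrades gracefully to `false`.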
HOW TO ENABLE Real-Time AV1 in CHROME
libwebrtc and Chrome are notoriously difficult to compile. Asking people who want to benchmark or do black-box testing to compile them themselves is unrealistic. That applies to many individuals currently working on the AV1 payload specification, who still need to make sure things run the way they should.
To mitigate this problem and make the AV1 implementation easier to test, CoSMo is preparing pre-compiled native example apps for everyone (peerconnection_client, AppRTCMobile) that run on Mac, Windows, Linux, iOS, and Android. They come in two flavours: 1-to-1 in P2P, and 1-to-1 through an SFU. The code is also open source, for the more advanced coders out there to draw inspiration from.
While libwebrtc comes with AV1 enabled by default (for desktop platforms), Chrome does not yet. Here too, CoSMo is providing custom builds of Chrome on Windows, Mac, and Linux for people to test their apps. We provide the necessary patches for AppRTCMobile (macOS) and Chrome (desktop) for now, and plan to add support in the Obj-C and Android bindings unless Google beats us to it 🙂
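Once you are running a build that advertises AV1, the page still has to actually pick it during negotiation. Here is a minimal sketch, assuming AV1 is listed as `video/AV1` in the sender capabilities and that the build supports `RTCRtpTransceiver.setCodecPreferences`; `preferCodec` is a hypothetical helper of ours, not a WebRTC API.

```javascript
// Sketch: reorder a codec capability list so a given mime type comes first,
// suitable for passing to transceiver.setCodecPreferences().
// preferCodec is a hypothetical helper, not part of any WebRTC API.
function preferCodec(codecs, mimeType) {
  const wanted = (c) => c.mimeType.toLowerCase() === mimeType.toLowerCase();
  const preferred = codecs.filter(wanted);
  const rest = codecs.filter((c) => !wanted(c));
  // Preferred entries first; relative order is otherwise preserved.
  return [...preferred, ...rest];
}

// In a browser with an AV1-enabled build, you would run something like:
//   const pc = new RTCPeerConnection();
//   const transceiver = pc.addTransceiver('video');
//   const caps = RTCRtpSender.getCapabilities('video');
//   transceiver.setCodecPreferences(preferCodec(caps.codecs, 'video/AV1'));
```

If AV1 is not in the capability list, `preferCodec` returns the list unchanged, so the same code is safe to ship to browsers without the codec.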
Once the underlying implementation provides SVC support, the SFU code will be updated to support AV1 SVC as well.
All of that (and a lot more) is explained in the wiki section of the corresponding project: