WebRTC 1.0 Simulcast Hackathon @ IETF 104

Finally finding the time to go back to the root of this blog: the WebRTC standard. In this post we will revisit what simulcast is and why you want it (product view), before going into the implementation status scorecards, for both browser vendors and the open-source SFUs out there, that could be established after a fast and furious hackathon in Prague two weeks ago. WebRTC hackathons before the IETF meetings are becoming a tradition, and as more and more people come together to participate, the value for any developer of joining keeps growing. We hope to see more participants next time (free food, no registration fee, all technical questions answered by the world experts, what more could we ask for?)

Simulcast in WebRTC 1.0 refers to sending different resolutions of the same media source to a relaying media server (SFU). The SFU can then relay the most adequate resolution on a per-viewer basis, depending on the viewer's bandwidth, screen size, hardware capacity, …

The SFU can also dynamically adjust which resolution is being sent, should the conditions on the receiving side change. This is most often the case with bandwidth over the public internet, which can vary a lot, or when one goes from ethernet to wifi, or from one mobile antenna to the next.

This comes at some extra burden on the sender's CPU and bandwidth usage. Practically, spatial resolution goes down by half in each dimension (x, y) at a time, which reduces the number of pixels in each resolution to a quarter of the previous one. With the CPU footprint and bandwidth being roughly proportional to the number of pixels, you end up using 25% more CPU or bandwidth for an extra resolution, then an extra 6.25% for the next resolution, 1.6% for the next, and so on and so forth. The additional cost is considered marginal compared to the added user-experience value for the viewers. Moreover, in streaming use cases, the computer streaming out is usually a high-end studio computer where some composition is happening in real time, or at least a desktop-level computer.
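As a back-of-the-envelope illustration (my own sketch, not a number from the hackathon), the extra cost is just a geometric series over the added layers:

```typescript
// Rough estimate of the extra cost of simulcast layers, assuming each extra
// layer halves width and height (so carries 1/4 of the pixels of the layer
// above it) and that cost scales linearly with pixel count.
function extraCostOfSimulcast(layers: number): number {
  let extra = 0;
  for (let i = 1; i < layers; i++) {
    extra += Math.pow(0.25, i); // 25%, 6.25%, ~1.6%, ...
  }
  return extra;
}

console.log(extraCostOfSimulcast(3)); // ≈ 0.3125 → ~31% on top of the full-resolution stream
```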

Traditionally, at least in the WebRTC ecosystem, the term "simulcast" is used when changing the spatial resolution by using separate encoders generating separate bitstreams. This is independent of the media transport protocol used in WebRTC: (S)RTP.

For modifying temporal resolution, the approach is slightly different and is based on layered codec technologies called SVC, where S stands for scalable; the process is then called "temporal scalability". It consists of using RTP headers, instead of the bitstream, to tag video frames. Changing the temporal resolution (e.g. from 30 fps to 15 fps) is then just a matter of dropping packets, which can be done extremely fast (milliseconds) in the media server. Most browser vendors implement both simulcast and temporal scalability for VP8, and simulcast for H.264. Temporal scalability for H.264 is in progress (patch review).
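Conceptually, the SFU side of temporal scalability is just a per-packet filter. Here is a hedged sketch, assuming the temporal layer ID has already been parsed out of the VP8 payload descriptor (TID field) or an RTP header extension; the types are illustrative, not a real SFU API:

```typescript
// Sketch of temporal-layer filtering in an SFU. 'temporalLayerId' is assumed
// to have been parsed beforehand; real SFUs read it from the VP8 payload
// descriptor or a framemarking RTP header extension.
interface ParsedRtpPacket {
  temporalLayerId: number; // 0 = base layer, 1 = adds FPS/2, 2 = adds full FPS
  payload: Uint8Array;
}

function shouldForward(packet: ParsedRtpPacket, maxTemporalLayer: number): boolean {
  // Dropping every packet above the selected layer halves (or quarters) the
  // frame rate without ever touching the encoded bitstream.
  return packet.temporalLayerId <= maxTemporalLayer;
}
```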

Moreover, most browsers supporting VP9 (and AV1) also implement full SVC support, which means that there is only one bitstream, and switching between spatial and temporal resolutions in the SFU is quasi-instantaneous. (Note: AV1 is implemented as a decoder only for now.)

Thanks to those, the SFU can be smart about the way it manages outgoing bandwidth by choosing which stream to relay. It should also be smart about how it manages the incoming bandwidth, which is a little trickier. Let's say, for the sake of the argument, that you are using three resolutions. If the bandwidth between the sender and the media server were to be reduced, what should happen?

The natural answer is: drop the highest resolution, as years of Skype usage have accustomed us to that behavior. However, smarter bandwidth allocation could be used depending on your use case. Some might want to protect the highest-resolution stream by default, shutting down the lowest resolution(s) if that allows staying within the available bandwidth. Some might decide based on which resolutions are being watched at that time: kill the streams with no viewers, or with the fewest viewers, …
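As a purely illustrative example of such a policy (not taken from any actual SFU), one could keep dropping the least-watched encoding until the remaining ones fit the estimated uplink budget:

```typescript
// Illustrative sender-side layer-dropping policy; names and logic are my own
// sketch, not part of any real SFU's API.
interface SimulcastLayer {
  rid: string;
  bitrate: number;     // bits per second currently used by this encoding
  viewerCount: number; // how many viewers are subscribed to it right now
}

function layersToKeep(layers: SimulcastLayer[], budgetBps: number): SimulcastLayer[] {
  // Keep the most-watched layers first, then drop from the tail until the
  // total bitrate fits within the estimated uplink budget.
  const kept = [...layers].sort((a, b) => b.viewerCount - a.viewerCount);
  while (kept.length > 1 && kept.reduce((sum, l) => sum + l.bitrate, 0) > budgetBps) {
    kept.pop();
  }
  return kept;
}
```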

That is why the notion of simulcast in WebRTC 1.0 is entangled with the notion of "multiparty signaling". While simulcast can be achieved with separate peer connections, one per resolution, that does not allow for smart bandwidth management on the sender side, as bandwidth management, a.k.a. congestion control, is done on a per-peer-connection basis. Simulcast has been possible for more than a year now, as long as you were using separate peer connections, hacking your way around the signaling (mangling SDP), and some other workarounds. Full-fledged simulcast, bringing better connectivity (fewer ports used), full bandwidth management on the sender side, and so on and so forth, requires a single peer connection, using some new APIs, corresponding capable codecs, … This has only come along very recently in browsers.
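For reference, the new API in question is addTransceiver() with sendEncodings. A minimal sketch of requesting three spec-compliant simulcast layers on a single peer connection could look like the following (the rid names, bitrate caps and scaling factors are arbitrary illustrative values, not recommendations):

```typescript
// Minimal sketch: spec-compliant simulcast on a single RTCPeerConnection.
// The rid values, bitrate caps and scaling factors are illustrative only.
async function startSimulcast(): Promise<RTCPeerConnection> {
  const pc = new RTCPeerConnection();
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const [track] = stream.getVideoTracks();

  pc.addTransceiver(track, {
    direction: 'sendonly',
    sendEncodings: [
      { rid: 'h', maxBitrate: 1500000 },                          // full resolution
      { rid: 'm', maxBitrate: 500000, scaleResolutionDownBy: 2 },  // half width/height
      { rid: 'l', maxBitrate: 150000, scaleResolutionDownBy: 4 },  // quarter width/height
    ],
  });

  const offer = await pc.createOffer(); // the SDP carries the a=rid / a=simulcast lines
  await pc.setLocalDescription(offer);
  return pc;
}
```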

The Hackathon

So here is the challenge: parts of simulcast in WebRTC 1.0, like the sender side and the corresponding signaling, are standardised; other parts, everything about the SFU and the receiving side, are not. The specifications go across at least two standardization groups: the W3C, which has a test suite, and the IETF, which does not, but for which the WebRTC ecosystem has created a special testing infrastructure called KITE.

As leaders of WebRTC testing and members of all the standards committees, and in collaboration with Google, which is co-chairing most of the working groups, CoSMo went ahead and organized a hackathon to give all the stakeholders visibility on the current state of affairs, to eventually bring simulcast to maturity faster, and (finally, those slides were written in 2015!) to call WebRTC 1.0 "DONE".

The hackathon at IETF 104 was the biggest ever, as people realized the value of working together on some subjects. I suspect that the free food helped too. While we had done WebRTC hackathons before, around identity and security in London, and around bandwidth management in Bangkok last year for example, this was the best organized and biggest WebRTC hackathon to date. 19 individuals registered for WebRTC, with 13 listing ONLY WebRTC. All the main browser vendors were represented: MS, Google, Mozilla, Apple. Many open-source SFU tech leads were on-site (Meetecho, Medooze, …) while many others, like MediaSoup, had prepared tests to be run by us. Two W3C staff members came to help as well, as getting visibility on what's left to be done is their biggest concern today.

The Results

The details of each bug found, filed, or fixed, the new tests, and all the detailed metrics can be found on the wiki and the corresponding presentation, but suffice it to say, having browsers, SFU devs, and KITE experts around the table was really efficient. Meetecho has written a great blog post about it. I'm going to illustrate it with a few handpicked items.

The W3C and the browser vendors were really interested in seeing a global browser status card, and in setting up automated testing to be able to check their progress (and avoid regressions) from that day forward. A specific version of an SFU written on top of Medooze, with a "lean-and-mean, only-accept-spec" mode, had already been provided to all browser vendors last year, and e.g. Apple had been using it a lot to come up with their first simulcast implementation. The above table required interaction between all of us, plus some of the specification writers in the case of bandwidth estimation. Moreover, the results have been vetted by each browser vendor as being true to the best of their knowledge, to avoid bias. The result is a reference status table that the W3C and others can use to plan and roadmap their transition to a spec-compliant simulcast world.

For the SFU vendors, there is a need to be more pragmatic. You cannot be more catholic than the pope, and if the browsers do not implement simulcast, why should they? Most of them then tend to implement what the browsers have implemented, and lag behind a little bit. This is especially true of commercial SFUs or services. While we originally included some commercial SFU results, we decided to remove them from this table. For the reason cited above, they do not support simulcast, and that should be expected. Colouring them in red, as they do not respect the spec today, makes them feel like they're being told their baby is ugly. Since this is not a judgement piece but a factual, compliance piece, and since we would not be able to double-check their claims ourselves anyway, we felt it was more reasonable to just take them out.

Open-source SFUs are more prone to implementing the latest tech, and more open to constructive criticism, while still needing to be pragmatic. The result is that they support several flavours of simulcast, often in parallel, to support all browser vendors, hoping that they'd get their s%^&* together fast and converge quickly to spec-compliant implementations.

In the above table, everything that is important to support any given flavour of simulcast today is indicated, and the items important for being spec-compliant are coloured. It is more or less ordered from the most compliant (left) to the least compliant (right).

Now, there were quite a few nasty bugs found during that hackathon, and I would like to focus on two which I think would have been almost impossible to find without the holy trinity (browser, SFU, KITE) around the table, showing the value for all of participating in those hackathons.

Choose a simulcast stream (High, Medium or Low) and a temporal layer (original FPS, FPS/2, FPS/4), and you can visually compare the sent (top line) and received (bottom line) width, height, FPS and kbps. People familiar with my blog and webrtcHacks will recognize the UI, which has been used many times in the past: to test VP9 SVC in Chrome, to test VP9 SVC and simulcast for Medooze and Janus, and to test the first implementation of H.264 simulcast in Safari by Apple.

In this example, which we open-sourced, we use a single browser in loopback mode, and we implicitly check that simulcast is working by selecting the layers. The example is fully automated with KITE, and can then be run across any desktop and mobile browsers (plus some more), to check for regressions.
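Under the hood, that kind of check boils down to comparing outbound and inbound RTP stats via the standard getStats() API. A rough sketch follows; which stat fields are actually populated varies per browser and version, so treat it as illustrative only:

```typescript
// Sketch: compare sent vs. received resolution and frame rate on a loopback
// call. Field availability (frameWidth, frameHeight, framesPerSecond) differs
// across browsers and versions.
async function compareSentReceived(sender: RTCPeerConnection, receiver: RTCPeerConnection) {
  const outStats = await sender.getStats();
  const inStats = await receiver.getStats();

  outStats.forEach((report) => {
    if (report.type === 'outbound-rtp' && report.kind === 'video') {
      console.log('sent', report.frameWidth, report.frameHeight, report.framesPerSecond);
    }
  });
  inStats.forEach((report) => {
    if (report.type === 'inbound-rtp' && report.kind === 'video') {
      console.log('received', report.frameWidth, report.frameHeight, report.framesPerSecond);
    }
  });
}
```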

The second bug is funnier, once you have found it. It so happens that Chrome allocates bandwidth to simulcast streams implicitly based on the order you feed them to the API, assuming the first one is the highest resolution and the last one is the lowest. If you feed the streams in, e.g., the reverse order, it ignores the indications provided and allocates most of the bandwidth to the first, lowest-resolution, stream :)
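To make the pitfall concrete, here is a hedged sketch of the kind of call that triggered it, using the same API as the earlier simulcast sketch (illustrative rid names; the actual reproduction was Lorenzo's EchoTest, not this snippet). Per the spec, the scaling parameters should be authoritative, but the position in the array was what counted:

```typescript
// Encodings listed lowest-first: per the spec the rid/scaleResolutionDownBy
// values should drive the allocation, but the bug meant Chrome budgeted
// bandwidth by array position, giving most of it to 'l' and starving 'h'.
function addReversedSimulcast(pc: RTCPeerConnection, track: MediaStreamTrack): void {
  pc.addTransceiver(track, {
    direction: 'sendonly',
    sendEncodings: [
      { rid: 'l', scaleResolutionDownBy: 4 }, // first → got most of the bandwidth
      { rid: 'm', scaleResolutionDownBy: 2 },
      { rid: 'h' },                           // last → starved
    ],
  });
}
```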

Kudos to Lorenzo for finding this one and for making a manual test, which CoSMo then automated with KITE for Apple. As you can see above, Meetecho has the equivalent of the loopback test we spoke about earlier with Medooze, in the form of the EchoTest plugin demo. The UI gives you indications about which layers are being received and what bandwidth is being used.

You can see on this screen capture of the running test that the bandwidth allocation is NOT what you would expect (on the left).

(Below) KITE allows for automation of this test, result gathering, and reporting.