Why are you so mean? WebRTC NV / ORTC APIs are too hard!

I. Introduction

I saw a lot of reactions to the ORTC announcement by (small) webRTC solution vendors that the new API, wether the webRTC NV or the ORTC one (they share a common inspiration after all) was too complicated. People to start wondering about the reasons why the standard committee / MS was doing that, and why the Australian Government was renaming their research centers “Data61” (here), would that be because area 61 actually exists but is in Australia? Was NSA involved? … and I stop reading the conspiracy theories at that point 🙂

II. Standard Committees: Cathedral or Bazaar?

(For those too young to know which book I’m referring to, I recommend finding yourself a copy and read it.)

The W3C and the IETF are open consortia. Anybody can join and participate. Joining an IETF mailing list actually makes you a member! As far as the W3C is concerned, a membership is involved, but small start up only pay a couple of thousand US Dollars for the first two years. Those two are the two entities involved in the specification of the core of webRTC. There are others that are also worth mentioning like IMTC or 3GPP that are interesting depending on your use of webRTC (interoperability between VoIP and webrtc and Mobile, respectively). In the case of W3C and IETF, the mailing lists are public and not limited to members, so anybody can go there and ask questions, provide feedback, and interact in any way with the members that will eventually make the decision. That feedback from users of the technology is very important for us to make the right decision, and I encourage everybody to go there and exchange. 

III. Use case and feedback

Like any other software, defining a JS API for the browser is about defining the right use case. Whatever you define it will in turn impact the API surface. In the case of webRTC, the original use case is a 1:1 call with audio and video, and that use case was implemented as appRTC. For a very long time appRTC was the reference for bug reports, tests, interoperability between browsers, etc. In turn the Peer Connection API has been tailored to make that use case dead simple. Most of the underlying machinery (all of ICE, the encryption, the codecs, ….)  was hidden within the browsers, and most of the parameters were hidden in SDP. It made writing a webpage that could do a video call a 10 lines homework a student could do.

Use cases evolve as one understanding of a technology improve, as reflected by the corresponding document that is used as an informational reference for both webRTC and RTCweb. Looking at the document tracker, you can see that no less than 16 revisions exist before it was stabilized early this year. If your use case is not in this list, it is very likely that webrtc 1.0 (due sometime around Xmass 2015, if we’re all good boys/girls) will not support it. However, you can voice your need and try to have your use case taken into account for the next version of webRTC (no, not webrtc 1.1, no, not webrtc 2.0, no, not ORTC, just ……. webRTC NV for next version).

Some thought that a 1:1 use case was too simple: peer connection would be too big a black box, and shoehorning all parameters in an SDP blob was just adding complexity and dependencies. Vote happened, decision was made, peerconnection was here to stay. The ones in disagreement created a Community Group, with no standardization power, named ORTC to prepare what could be the base for specifications the day people would want to do things differently, if ever.

IV. WTF happened with all those new API you’re throwing at us?

As usual in the web, people use API in way they were designed for, and it’s awesome. When they do, things break, and/or we get feedback about things that are not working because people assumed it was working in a different way it actually is (or different browsers implement it in a slightly different way). The bug-or-feature discussions happens next, we take note, and put it in the agenda for the next meeting if enough people are interested in it. This time, we were facing very clear and convergent cases.

1:1 is bo~ring!

First, 1:1 is boring and most people are expecting multiparty calls, simulcast or even smarter simulcast using SVC codecs (264, vp9, …). There are slight differences there and the order in which those have been mentioned is not random.

Supporting multiparty calls is having the capacity to have several people join the same conversation, wether in p2p or not. While you can do that with multiple peer connections the underlying assumption is that you want to do it with a single peer connection, to leverage synchronization of streams, common bandwidth adaptation, port optimizations, ….. The problem here is more about how to signal this case between browsers, and gave birth to the infamous Plan B and Unified plan. The former was implemented in chrome for very long, the later is the official spec, but is only fully implemented in Firefox today. Those media streams can be completely independent, i.e. they can come from different sources.

Simulcast is about sending a media stream from a single source using different resolutions. The main usage here is to choose which resolution you are going to use depending on external factors like the resolutions of the remote peer’s screen, the bandwidth of the remote peer, …. While you can implement simulcast using a multiparty implementation as above, you would be losing the information about the relation between the media streams, namely that they all come from the same source, and that one is a decimated/scaled down version of the other. The multiparty implementation would treat all stream equally and in bad network conditions would reduce the resolution of all the streams, killing the purpose. Simulcast usually comes with smart Bandwidth adaptation algorithm that knows he needs to keep the lower resolution stream untouched, and just adapt the highest resolution stream first when bandwidth goes down. Simulcast is most important in use case that involve a media server. In simulcast, the media streams come from the same source, but are independent in the sense that they each can be decoded and rendered/played separately.

SVC codecs allow for yet another level of greatness. SVC will encode the lowest resolution media stream as a normal stream, that can be decoded on its own, and will then encode only the difference between the higher resolution and the base resolution in subsequent streams. The advantage here are multiple: lower bandwidth (low frequency information is not duplicated across streams), better resilience, ….. SVC codecs are especially useful in cases that involve a media server. In this case, the media streams come from the same source, and are NOT independent, except for the lowest resolution stream. The subsequent streams need to have all the lower resolution streams available to be rendered/played.

People are jumping the fence because they have unanswered needs

People are today modifying the SDP on the fly to be able to have access to properties, or capacities of peer-connection internal objects, or to be able to set those properties, or parameters. Several underlying object were modified this way: the ICE agent, the encryption (DTLS), the codec choice, the bandwidth, …

If the use case is valid (and more often than not they are) adding a JS API that  does what people were doing by manipulating the SDP is the right thing to do. We slowly replace an opaque, not specified, API by a specified, JS API with JSON objects. It does not give more work to the developers, since they were doing it already, even though they will have to take the opportunity to refactor and clean their code.

V. Here is why

It so happen that some of the API proposed by the ORTC group would answer both the multiparty/simulcast/SVC problems and the SDP munging problems. They are being slowly integrated in the webRTC specification when and where they make sense (except for Microsoft, which just implements it all his way and dumps it on an unexpecting audience). The time to bring them in webRTC 1.0 specs was shortened by the fact that those had been though about for quite some time now, and overlapping members had worked on both webrtc and ortc and could bridge the gap.

Most of the new API you have seen coming out of the last meeting were APIs that would just provide a good way to achieve what people where trying to achieve by manipulating the SDP, *AND* could be integrated before the end of the year not to push further webRTC 1.0. The other changes are related to paving the way to simulcast, but I already spoke about that in a previous post.

Because the APIs are more granular instead of being tailored for a 1:1 case, it makes writing the 1:1 case with those API look overly complicated in contrast. I do not believe it to be really a problem, as it is always easy to go from granular to simple. Within a few weeks, you will have webrtc-on-ORTC shims, and your website will work exactly the same (as long as you don t need video), or you can keep ignoring Edge all together. There are quite a few things that are overly complicated to do in webRTC today that will be easily doable with the new APIs. No regression in any case, just possible improvements. I expect the same thing to happen for the latest additions to webRTC 1.0 API set. Eventually webRTC and ORTC should also converge. 

I hope that this post brought some light on the decision process followed by W3C. The core of it is feedback from users, and timeline considerations, so once again, if you have a use case, or a question, voice them on the w3c’s public-webrtc mailing list (not the discuss-webrtc mailing list).

 

Creative Commons License
This work by Dr. Alexandre Gouaillard is licensed under a Creative Commons Attribution 4.0 International License.

This blog is not about any commercial product or company, even if some might be mentioned or be the object of a post in the context of their usage of the technology. Most of the opinions expressed here are those of the author, and not of any corporate or organizational affiliation.