Why are you so mean? WebRTC NV / ORTC APIs are too hard!

I. Introduction

I saw a lot of reactions to the ORTC announcement by (small) webRTC solution vendors saying that the new API, whether the webRTC NV one or the ORTC one (they share a common inspiration after all), was too complicated. People started wondering why the standard committee / MS was doing that, and why the Australian Government was renaming their research centers “Data61” (here). Would that be because Area 61 actually exists, but is in Australia? Was the NSA involved? … and I stopped reading the conspiracy theories at that point 🙂

II. Standard Committees: Cathedral or Bazaar?

(For those too young to know which book I’m referring to, I recommend finding yourself a copy and reading it.)

The W3C and the IETF are open consortia. Anybody can join and participate. Joining an IETF mailing list actually makes you a member! As far as the W3C is concerned, a membership fee is involved, but small start-ups pay only a couple of thousand US dollars for the first two years. These are the two entities involved in the specification of the core of webRTC. Others are also worth mentioning, like the IMTC or 3GPP, which are interesting depending on your use of webRTC (interoperability between VoIP and webrtc, and mobile, respectively). In the case of the W3C and the IETF, the mailing lists are public and not limited to members, so anybody can go there and ask questions, provide feedback, and interact in any way with the members that will eventually make the decisions. That feedback from users of the technology is very important for us to make the right decisions, and I encourage everybody to go there and exchange.

III. Use cases and feedback

Like any other software, defining a JS API for the browser is about defining the right use case. Whatever you define will in turn shape the API surface. In the case of webRTC, the original use case is a 1:1 call with audio and video, and that use case was implemented as appRTC. For a very long time appRTC was the reference for bug reports, tests, interoperability between browsers, etc. In turn, the Peer Connection API has been tailored to make that use case dead simple. Most of the underlying machinery (all of ICE, the encryption, the codecs, …) was hidden within the browsers, and most of the parameters were hidden in the SDP. It made writing a webpage that could do a video call a ten-line homework assignment a student could do.
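
To make this concrete, here is roughly what that ten-line homework looks like. This is a minimal sketch of the caller side using the promise-based flavor of the API; sendToPeer() is a hypothetical signaling function, remoteVideo an existing video element, and answer/candidate handling from the remote peer plus all error handling are omitted.

```javascript
// Minimal 1:1 call, caller side (sketch). sendToPeer() and remoteVideo
// are assumed to exist elsewhere in the page.
const pc = new RTCPeerConnection();
pc.onicecandidate = (e) => { if (e.candidate) sendToPeer({ candidate: e.candidate }); };
pc.ontrack = (e) => { remoteVideo.srcObject = e.streams[0]; };

navigator.mediaDevices.getUserMedia({ audio: true, video: true })
  .then((stream) => {
    stream.getTracks().forEach((track) => pc.addTrack(track, stream));
    return pc.createOffer();
  })
  .then((offer) => pc.setLocalDescription(offer))
  .then(() => sendToPeer({ sdp: pc.localDescription })); // everything else lives in that SDP blob
```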

Use cases evolve as one’s understanding of a technology improves, as reflected by the corresponding document that is used as an informational reference for both webRTC and RTCweb. Looking at the document tracker, you can see that no fewer than 16 revisions existed before it stabilized early this year. If your use case is not in this list, it is very likely that webrtc 1.0 (due sometime around Xmas 2015, if we’re all good boys/girls) will not support it. However, you can voice your need and try to have your use case taken into account for the next version of webRTC (no, not webrtc 1.1; no, not webrtc 2.0; no, not ORTC; just … webRTC NV, for Next Version).

Some thought that a 1:1 use case was too simple: peer connection would be too big a black box, and shoehorning all parameters into an SDP blob was just adding complexity and dependencies. A vote happened, a decision was made, and peerconnection was here to stay. Those in disagreement created a Community Group, with no standardization power, named ORTC, to prepare what could be the base for specifications the day people would want to do things differently, if ever.

IV. WTF happened with all those new APIs you’re throwing at us?

As usual on the web, people use APIs in ways they were not designed for, and it’s awesome. When they do, things break, and/or we get feedback about things that are not working because people assumed the API worked differently than it actually does (or different browsers implement it in slightly different ways). The bug-or-feature discussion happens next, we take note, and put it on the agenda for the next meeting if enough people are interested in it. This time, we were facing very clear and convergent cases.

1:1 is bo~ring!

First, 1:1 is boring, and most people are expecting multiparty calls, simulcast, or even smarter simulcast using SVC codecs (H.264, VP9, …). There are slight differences there, and the order in which those have been mentioned is not random.

Supporting multiparty calls means having the capacity for several people to join the same conversation, whether in p2p or not. While you can do that with multiple peer connections, the underlying assumption is that you want to do it with a single peer connection, to leverage synchronization of streams, common bandwidth adaptation, port optimizations, … The problem here is more about how to signal this case between browsers, and it gave birth to the infamous Plan B and Unified Plan. The former has been implemented in Chrome for a very long time; the latter is the official spec, but it is only fully implemented in Firefox today. Those media streams can be completely independent, i.e. they can come from different sources.
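
As a sketch of that multi-stream case, here is what sending two independent sources (a camera and a screen capture) over a single peer connection could look like, assuming a browser that supports adding multiple tracks; how the resulting offer describes the two streams is exactly where Plan B and Unified Plan diverge.

```javascript
// Two independent sources multiplexed over one RTCPeerConnection (sketch).
const pc = new RTCPeerConnection();

Promise.all([
  navigator.mediaDevices.getUserMedia({ audio: true, video: true }),
  navigator.mediaDevices.getDisplayMedia({ video: true }), // screen capture
]).then(([camera, screen]) => {
  camera.getTracks().forEach((t) => pc.addTrack(t, camera));
  screen.getTracks().forEach((t) => pc.addTrack(t, screen));
  // The single SDP offer now has to describe both streams: this is the
  // part that Plan B and Unified Plan signal differently.
  return pc.createOffer();
}).then((offer) => pc.setLocalDescription(offer));
```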

Simulcast is about sending a media stream from a single source at different resolutions. The main usage here is to let the receiving side choose which resolution to use depending on external factors, like the resolution of the remote peer’s screen, the bandwidth of the remote peer, … While you could implement simulcast using a multiparty implementation as above, you would be losing the information about the relation between the media streams, namely that they all come from the same source and that one is a decimated/scaled-down version of another. The multiparty implementation would treat all streams equally, and in bad network conditions would reduce the resolution of all the streams, defeating the purpose. Simulcast usually comes with a smart bandwidth adaptation algorithm that knows it needs to keep the lower-resolution stream untouched, and to adapt the highest-resolution stream first when bandwidth goes down. Simulcast is most important in use cases that involve a media server. In simulcast, the media streams come from the same source, but are independent in the sense that each can be decoded and rendered/played separately.
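
Here is a sketch of what simulcast looks like with the newer sender-oriented APIs, using the sendEncodings shape that came out of the ORTC work; the parameter names follow the current drafts and may differ from what actually ships in a given browser.

```javascript
// One source, three simulcast encodings at decreasing resolutions (sketch).
const pc = new RTCPeerConnection();

navigator.mediaDevices.getUserMedia({ video: true }).then((stream) => {
  pc.addTransceiver(stream.getVideoTracks()[0], {
    direction: 'sendonly',
    sendEncodings: [
      { rid: 'hi' },                            // full resolution
      { rid: 'mid', scaleResolutionDownBy: 2 }, // half resolution
      { rid: 'lo',  scaleResolutionDownBy: 4 }, // quarter resolution
    ],
  });
  // Each encoding can be decoded independently; a media server simply
  // forwards whichever one fits a given receiver's conditions.
});
```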

SVC codecs allow for yet another level of greatness. SVC encodes the lowest-resolution media stream as a normal stream that can be decoded on its own, and then encodes only the difference between the higher resolution and the base resolution in subsequent streams. The advantages here are multiple: lower bandwidth (low-frequency information is not duplicated across streams), better resilience, … SVC codecs are especially useful in cases that involve a media server. In this case, the media streams come from the same source and are NOT independent, except for the lowest-resolution stream. The subsequent streams need all the lower-resolution streams to be available in order to be rendered/played.
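
That dependency between layers is visible in the ORTC-style encoding parameters, where an enhancement layer explicitly references the encodings it builds on. A sketch (the names follow the ORTC draft, and `sender` is assumed to be an ORTC RTCRtpSender wrapping a video track):

```javascript
// SVC-style layering with ORTC-flavored encoding parameters (sketch).
const parameters = {
  encodings: [
    { encodingId: 'base' },                                        // decodable on its own
    { encodingId: 'mid', dependencyEncodingIds: ['base'] },        // needs 'base'
    { encodingId: 'hi',  dependencyEncodingIds: ['base', 'mid'] }, // needs both
  ],
};
// sender is assumed to be an ORTC RTCRtpSender; send() starts the media
// flow with the given parameters.
sender.send(parameters);
```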

People are jumping the fence because they have unanswered needs

People are today modifying the SDP on the fly to get access to properties or capacities of peer-connection internal objects, or to set those properties or parameters. Several underlying objects are modified this way: the ICE agent, the encryption (DTLS), the codec choice, the bandwidth, …

If the use case is valid (and more often than not it is), adding a JS API that does what people were doing by manipulating the SDP is the right thing to do. We slowly replace an opaque, unspecified API with a specified JS API based on JSON-like objects. It does not give more work to the developers, since they were doing it already, even though they will want to take the opportunity to refactor and clean their code.
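
As an illustration (a sketch, not any particular vendor’s code), here is the same goal, capping the video bitrate, done the old way by munging the SDP and the new way through the sender’s parameters; the munging version is simplified, and real code would have to respect SDP line ordering.

```javascript
// Old way: rewrite the SDP blob to cap video bandwidth (fragile and
// unspecified; simplified here, the b= line must actually follow c=).
function capBandwidthInSdp(sdp, kbps) {
  return sdp.replace(/(m=video.*\r\n)/, `$1b=AS:${kbps}\r\n`);
}

// New way: a specified JS API on the sender (assumes a browser that
// implements RTCRtpSender.getParameters/setParameters).
async function capBandwidth(sender, bps) {
  const params = sender.getParameters();
  if (!params.encodings || !params.encodings.length) params.encodings = [{}];
  params.encodings[0].maxBitrate = bps;
  await sender.setParameters(params);
}
```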

V. Here is why

It so happens that some of the APIs proposed by the ORTC group would answer both the multiparty/simulcast/SVC problems and the SDP munging problems. They are being slowly integrated into the webRTC specification when and where they make sense (except for Microsoft, which just implements it all its way and dumps it on an unsuspecting audience). The time to bring them into the webRTC 1.0 specs was shortened by the fact that they had been thought about for quite some time now, and overlapping members had worked on both webrtc and ortc and could bridge the gap.

Most of the new APIs you have seen coming out of the last meeting were APIs that would just provide a good way to achieve what people were trying to achieve by manipulating the SDP, *AND* could be integrated before the end of the year so as not to push webRTC 1.0 further back. The other changes are related to paving the way for simulcast, but I already spoke about that in a previous post.

Because the APIs are more granular instead of being tailored for the 1:1 case, writing the 1:1 case with those APIs looks overly complicated in contrast. I do not believe this is really a problem, as it is always easy to go from granular to simple. Within a few weeks, you will have webrtc-on-ORTC shims, and your website will work exactly the same (as long as you don’t need video), or you can keep ignoring Edge altogether. There are quite a few things that are overly complicated to do in webRTC today that will be easily doable with the new APIs. No regression in any case, just possible improvements. I expect the same thing to happen for the latest additions to the webRTC 1.0 API set. Eventually webRTC and ORTC should also converge.

I hope that this post shed some light on the decision process followed by the W3C. The core of it is feedback from users, plus timeline considerations, so once again, if you have a use case or a question, voice them on the W3C’s public-webrtc mailing list (not the discuss-webrtc mailing list).


Creative Commons License
This work by Dr. Alexandre Gouaillard is licensed under a Creative Commons Attribution 4.0 International License.

This blog is not about any commercial product or company, even if some might be mentioned or be the object of a post in the context of their usage of the technology. Most of the opinions expressed here are those of the author, and not of any corporate or organizational affiliation.

ORTC in Edge – Are you ready for the tsunami?

My twitter is on fire! This is one of the most eagerly awaited pieces of news in the ecosystem, and for many, but not all, it comes way earlier than expected.

I. History and context

Earlier this year, speaking at NTT’s webrtc conference in Tokyo, I had listed ORTC only as a possibility for 2015 (slides #17 and #30). Later on, at the San Francisco webrtc meetup hosted by TokBox in June, I repeated this prediction, which fit what most were thinking back then (around 1:45). The release of the getUserMedia API in Edge earlier in May should have been a hint. I was wrong.

As August and September passed by, more and more hints in the ecosystem pointed to a fast(er) release of ORTC in Edge. A lot of questions were still open: which codec will be used, how close to the webrtc NV API will the ORTC API be, etc. Actually, some of those are still unanswered today. By September, it was clear everything was done and ready, which made some devs happy, and some standard committee members less happy. It should now start to make sense why I chose to spend one week in Redmond before the W3C meeting. 🙂

II. The facts

On September 18, 2015, Microsoft announced the availability of ORTC for Microsoft Insider Program members. They announced it in several places, but mainly on the Windows/Edge and Office/Skype blogs. On the official Microsoft page, a demo with Twilio was accessible. The same day, blog posts on webrtchacks and other very visible venues started to appear. Very clearly, some were in the know, as the body language of many at the Kranky Geek show had hinted when the subject came up.

Both &yet’s simpleRTC (with its audio-only, chrome-only-for-now demo) and Twilio’s ortc-adapter were, when this article was written, already providing ways to interoperate between webrtc and ORTC.

As announced by Bernard Aboba at the Kranky Geek show, and as I reported on the corresponding discuss-webrtc mailing list thread, MS plans to support its own flavor of H.264 SVC first, called H.264UC. For the hardcore media developers who already know about H.264 AVC/SVC and want to see the difference with H.264UC, here you go. MS then plans to support plain H.264 (AVC) for interoperability with Firefox (today) and Chrome later. As far as H.264 support for webrtc in Chrome is concerned, the Android and iOS versions of the libwebrtc library already support it through the OSes’ APIs; only desktop versions of Chrome do not support it, and a bug is open and already assigned. In brief, they are working on it as we speak, and patches are already available.

[Figure: current Skype design – ORTC]

In the meantime, MS plans to use a media server to transcode between the different flavors of H.264 out there for Skype, as illustrated above.

III. Consequences and discussion

A. Short term

First and foremost, go read the webrtchacks article to get a quick overview of what you need to do to have a simple 1:1 call. Get an aspirin, come back here. 🙂

The ORTC API is more complex and, being lower level, requires you to define many more things than the webrtc API would for the same use case. The advantages of those APIs only become obvious for more complicated cases like codec manipulation, connections with media servers, multiparty with one peer connection, simulcast, SVC codecs, … For most, it’s just going to be superfluous nonsense. In that case, go with a shim like Twilio’s ortc-adapter and you will be taken care of. It’s very likely that most vendors will update their JS SDKs in the next weeks to do just that as a first step. Note: that will only work for audio, so prepare some aspirin for the web developers who will have to handle all the browsers and all the media cases 🙂
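
To give a feel for it, here is a heavily condensed sketch of one call leg with ORTC objects; the signaling channel, capability negotiation, certificates and all error handling are assumed to exist elsewhere (remoteIceParameters, remoteDtlsParameters, localCertificate, localAudioTrack and sendParams are placeholders).

```javascript
// ORTC flavor of a call leg (sketch): the machinery PeerConnection hides
// is now yours to wire up explicitly.
const gatherer = new RTCIceGatherer({ gatherPolicy: 'all', iceServers: [] });
const ice = new RTCIceTransport();
const dtls = new RTCDtlsTransport(ice, [localCertificate]);

// After exchanging ICE/DTLS parameters over your own signaling channel:
ice.start(gatherer, remoteIceParameters, 'controlling');
dtls.start(remoteDtlsParameters);

// One RTP sender per track; sendParams must be derived from the codec
// capabilities both sides exchanged (the part SDP used to carry for you).
const sender = new RTCRtpSender(localAudioTrack, dtls);
sender.send(sendParams);
```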

If you are using an IE plugin, you need to make sure your market needs it. The enterprise market might, as it is captive to older versions of IE, but social websites and common websites should not care. Moreover, the two main plugins out there are still stuck with older versions of the specs (before promises and other things were added, roughly half a year ago), and show no sign of preparing for the 30 new objects and APIs that ORTC and webrtc NV are bringing.

B. Medium/long term

There are two main questions you need to ask yourself for the long term.

1. Am I limited in any way by the webrtc API?

If you already have everything you want, just use a shim; you’re good. If you are limited in terms of network control (security or bandwidth), codecs, number of streams, etc., then ORTC or webrtc NV will bring something for you. If you’re using a media server, chances are also high that you will benefit from the new APIs.

2. How convergent are webrtc and ORTC, really?

That’s the key question. If I wait, will this interoperability problem get easier or disappear? Most JS developers are OK with some degree of incompatibility anyway, but incompatibility on the wire is not something you will be able to deal with at the JS level, so one has a real problem here.

First demos and examples seem to indicate that interop, as far as signaling, the API, and audio media are concerned, is already a reality. Until MS implements plain H.264, interoperability for video will remain a question mark.

Finally, I am not aware of anybody having looked at the webrtc / webrtc NV / ORTC APIs to check how far apart they really are. My feeling is that they are close enough for some JS magic to make the differences disappear, but again, I have not checked yet.
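
The kind of JS magic I have in mind is, at its simplest, feature detection plus a wrapper; ortcToPeerConnectionShim() below is hypothetical, standing in for what libraries like ortc-adapter do.

```javascript
// Sketch: expose one factory regardless of the flavor the browser speaks.
function createConnection(config) {
  if (window.RTCPeerConnection) {
    return new RTCPeerConnection(config);    // webrtc 1.0 flavor
  }
  if (window.RTCIceGatherer) {
    return ortcToPeerConnectionShim(config); // ORTC flavor (Edge); hypothetical wrapper
  }
  throw new Error('Neither webrtc nor ORTC is supported here');
}
```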

IV. Conclusion

Since August, it was obvious to many that this was coming. In my most recent (public) presentation, my last slide was clear (here, slide #12): ORTC is a great API, and it’s needed for a lot of use cases, but many vendors are not ready for the tsunami and the amount of work needed to support all that.

My prediction was, and still is, that the webrtc PaaS ecosystem will shrink by 25% (in number of companies), as the smaller ones are already very, very lean, in a market where traction is still the exception and funding can be hard to come by. The IE plugin vendors will be even more impacted, as the market just got smaller, and they will need different sources of traction to have any hope of raising money. We’re living in interesting times.

Creative Commons License
This work by Dr. Alexandre Gouaillard is licensed under a Creative Commons Attribution 4.0 International License.

This blog is not about any commercial product or company, even if some might be mentioned or be the object of a post in the context of their usage of the technology. Most of the opinions expressed here are those of the author, and not of any corporate or organizational affiliation.

Will Simulcast be in webrtc 1.0?

On September 9 and 10, a joint interim meeting of the W3C WebRTC and Device APIs groups took place at Microsoft HQ in Redmond.

Interim meetings, also called face-to-face meetings (when they are not happening in China …), are smaller, more focused meetings that take place between large all-hands meetings, with a specific agenda, to make things happen faster. IETF meetings (three a year) are long and can be draining. W3C meetings, well, “meeting” without an ‘s’ actually, since only the TPAC is technical, are also long, but more importantly they only happen once a year, and one might want to make decisions faster than that, especially when the plate is full with wanted webrtc 1.0 features. Those interim meetings allow for feedback and decisions that are difficult, if not impossible, to reach by e-mail.

Earlier this year, the webRTC working group came to the end of its allocated time and requested an extension from the W3C. The extension was granted, under the condition that the convergence between the webrtc working group’s work and the ORTC community group’s work become more of a reality. Erik L. (Hookflash) became a chair of the new WebRTC group, and the convergence between the specs was accelerated.

This particular meeting was dedicated to refreshing the list of features that we should get into 1.0 or leave for later, and to having a clear view before TPAC (end of October in Japan), so that the meeting during TPAC could just be a validation of decisions and we could end up with a stable list of features for 1.0.

Lots of new APIs had been experimented with within the ORTC group (which is comprised of many webrtc group members anyway), and Peter T. from Google, for example, had come up with many proposals about how to design those new APIs. Those were not really controversial, were reasonably small, and things went well. Most of the topics and corresponding presentations are now accessible here.

Then we spoke about simulcast.

Simulcast was … an interesting discussion. How to do simulcast on the wire was not so much a problem; almost everybody agreed on what needed to be done (even if MS and Cisco went at it a bit). What the ultimate goal should be was also not really a subject of discussion: everybody agreed that we should aim for a JS API that would allow not only simulcast but SVC later, and that the new APIs approved a little earlier would do the job. The major concern was the time and amount of effort needed to get this into webrtc 1.0 without delaying the already overdue release.

There were two aspects to this. One, as simulcast touches almost every layer from the wire to the signaling, the amount of work to test and debug it would be bigger than usual, and Google was concerned about being able to do it before the end of the year. Second, but related, was the discussion about how to signal simulcast.

Here, we were not discussing Plan B vs Unified Plan, which touches on how to signal multiple streams in a single peer connection, but discussing simulcast, which is a subcase of multiple streams where the streams are not independent from each other but are variations (different resolution, different frame rate) of the same source. The key point here is that you do not want the browser to treat those streams independently. For example, you do NOT want the browser’s bandwidth congestion control or the codec’s bandwidth adaptation algorithm to separately modify each stream’s resolution. You want the highest resolution to be fully dropped first to accommodate bandwidth, or e.g. the video to be switched off while the audio remains. This is not possible with multiple independent streams; it is possible (conceptually) with simulcast, as you know the dependency between the streams.

The main question about simulcast was whether webrtc should reuse the work done within the IETF MMUSIC group. That group is in charge of the SDP format, and is currently working on a draft specifically for simulcast. For some, it would seem odd not to use this and to duplicate the work. Mozilla especially, who had already implemented it in Firefox, was a big supporter of using it. For others, given the history of the MMUSIC group, that draft might still take too long to mature into a spec. Google was especially reluctant to add another normative dependency to the webrtc spec without having control over, or even visibility into, the potential delay that would result from doing so. The focus was on the 1.0 deadline: do not postpone it, especially not by an undefined delay. The fact that the next IETF meeting happens AFTER the W3C TPAC brought the likelihood of this issue being addressed in time for 1.0 down to zero.
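
For the curious, the MMUSIC draft expresses simulcast as extra SDP attributes on the media section, roughly along these lines (illustrative only; the exact syntax changed between draft revisions):

```
m=video 49300 RTP/SAVPF 97
a=rid:hi send
a=rid:mid send
a=rid:lo send
a=simulcast:send hi;mid;lo
```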

As everybody agreed on what the right thing to do was, and as the only remaining unknown was the capacity of the IETF group to deliver, two members of the W3C webrtc working group who are also chairs of the IETF rtcweb group, namely Cullen J. from Cisco and Peter H. from Google, reached out to their counterparts at MMUSIC to probe MMUSIC’s interest in having an interim meeting early enough to allow this specific spec to surface and remove that concern from the table. As of today, the answer from MMUSIC is positive, and they committed to accelerating the process on the SDP simulcast spec (here).

The road to having simulcast in webrtc 1.0 is still perilous. First, MMUSIC needs to converge on the sdp-simulcast spec, hopefully with feedback from the webrtc group. Next, the W3C webrtc group has agreed to have a virtual meeting before TPAC (last week of October), with feedback from MMUSIC in hand, to decide whether simulcast is a go for 1.0. Finally, TPAC should be the place for first implementation feedback and a final decision about inclusion in webrtc 1.0. All that in 5 weeks. 😉



Amazing Applications using WebRTC (1)

[This article has a follow-up, written in February 2016, here.]

Last week I gave a training at Cisco, followed by a hackathon on OpenWebRTC. My friend and colleague at the W3C and IETF, Dan Burnett, was handling the standards part of the training (and god knows he is good at it), while I was handling the practical part: implementations, stacks, debugging browsers, … It was a great session; the duality between theory and practice (yes, this is what the spec says; now, it is not exactly implemented like that …) was exciting, and the audience was very knowledgeable as well, each on some part of the big puzzle that a WebRTC solution is. Great experience.

At the end of the training, I was asked a question I was unprepared for: what are the most amazing webrtc apps or solutions? Hmm, tough one. First because it’s about taste: some people might be amazed by great user interfaces (and would love Telenor Digital‘s appear.in, for example), while others would be amazed by a company that solved a technical problem they have been unable to address so far (Ericsson’s Bowser and eFace2Face‘s cordova plugin both address the problem of missing webrtc support on iOS, where the webrtcinwebkit project tries to address the same problem at the source).

The researcher in me is always amazed by people who have a vision and can not only see beyond what is possible today, but make it happen.

For that reason I love the presentations by Feross. The guy is passionate, he has a vision, he makes it happen, and he has that rare capacity to transmit his passion to his audience. In one (yes, only one; for more, you can follow him on twitter) of his side projects, he is basically trying to bring BitTorrent to the web and to the browsers. I had the chance to work with him for half a day in Singapore when he came to speak at JSConf.Asia, and to listen to him again at the webrtc meetup in SF lately. Those were really enjoyable experiences. I’m really looking forward to listening to him again at the Kranky Geek event (#webrtclive) later this month.

When I was working at Temasys I had the chance to work with Thomas Gorrissen, who is the best JS dev on that side of the world, organizer of JSConf.Asia, and UI/UX expert for Sequoia Capital, among (many) other things. So I did not have to worry too much about my JS code, which is good, because C++ devs make poor JS devs at best 🙂 My only claim to fame here is to have made him excited enough about WebRTC that he would consider working on it. My dream JS team would be him, Feross, and Iñaki (see below).

There are some engineers that stand out in the ecosystem as being part of small teams and still making a difference. I have a lot of respect for the work of Philipp Hancke and Iñaki Baz Castillo, for example, and I’m humbled by Victor Pascual Avila’s and Dan Burnett’s knowledge of the different APIs and technologies in use. So I strongly recommend following them if you’re not already doing so. I think Iñaki nailed it for all of us: “I was told it was impossible, so I did it.” My feeling, exactly.

The advantage of WebRTC is that it’s improving so fast, and it’s so easy to start using, that I’m discovering cool stuff every day. The project that excites me most this week is a global p2p network project by David Dias that leverages webrtc. Yes, it is close to what Feross does, and I actually see both projects as complementary, but what is nice about David’s project is that his blog posts are almost complete research papers. That means you have all the information from start to finish, with an overview of the entire field of p2p communication (back to KaZaA), with citations, drawings, and so on and so forth. It’s refreshing to see something well explained and reproducible. It also allows someone new to the field (like me), but with enough scientific/technological knowledge, to build up enough knowledge on p2p technologies, and on the signing and security of the Distributed Hash Tables used in cryptocurrencies like Bitcoin, for example. I think he is on to something, and I wouldn’t be surprised if a US-based company poached him from Portugal.

If you know of any crazy project that tries to use webrtc in ways that nobody else does, please let me know, I need my weekly fix.

Creative Commons License
This work by Dr. Alexandre Gouaillard is licensed under a Creative Commons Attribution 4.0 International License.

This blog is not about any commercial product or company, even if some might be mentioned or be the object of a post in the context of their usage of the technology. Most of the opinions expressed here are those of the author, and not of any corporate or organizational affiliation.