WebRTC was a magical word for the better part of the 2010s, starting, if we had to put a date on it, with the 2011 Google I/O presentation. From conversations as early as 2018, and many small signs (such as the m80 release notes dropping support for the official mobile release), it was clear that, for Google at least, the WebRTC star was already a thing of the past. Still, more people depend on WebRTC, or want to adopt it, today than ever. What are the options out there? How should one prepare for WebRTC in 2020?
WebRTC implementation? Or maybe RTCWEB? What’s the difference?
The word WebRTC is used to refer both to the official JS APIs in the browsers and to the official implementation of the entire stack. The former is standardised as part of HTML5. The latter is code resulting from the GIPS and ON2 acquisitions, open-sourced and under the sometimes frustrating governance of Google. It implements everything, from the WebRTC API all the way down (codec, RTP, encryption, ICE, network, …), including de facto everything needed to be interoperable on-the-wire, i.e. the RTCWEB specs. For some time now, people have called the implementation libwebrtc, which helps remove the ambiguity between the standard document and its implementation by Google.
Where can I find a WebRTC implementation to … draw inspiration from?
If you are looking for a WebRTC implementation, you have to look at the browser side, or at anything that implements HTML5.
Firefox has its own implementation of WebRTC (the API), most of it directly in JS, on top of different components and libraries, most often from open-source projects. Firefox's ICE implementation, for example, is inherited from the reSIProcate project, and its media engine (codec + RTP) is borrowed from libwebrtc.
Safari has also borrowed libwebrtc for its RTCWEB implementation, with a lot of optimisations on the codec side to support hardware acceleration, and Apple APIs in general, as much as possible. The WebRTC JS API implementation was bootstrapped from contributions by the WebRTC-in-WebKit project, led by Ericsson Research, Igalia, Centricular, and myself.
If you go deep enough, you will realise that there are maybe only 2 or 3 C/C++ libraries out there for each of ICE, SSL, and SRTP, and 1 for SCTP (needed by the data channels), so eventually everybody uses the same ones.
Do I really need all of that? What is the minimum I need?
Some distinctions need to be made.
If you want to implement the HTML5 WebRTC JS API, for example in React Native, Qt QML, or another JS-compatible framework, to be able to reuse existing JS libraries or complete websites, you are likely to need all of that. However, this is usually not the case, and if this is what you want, some projects or vendors have you covered (React Native, Qt QML).
If you just want to be WebRTC compatible or compliant, i.e. to interoperate with web apps running in browsers, whether you are a native app or a gateway, you do NOT need to implement WebRTC, only RTCWEB.
In other words, you can use whatever API you want, you just need to be sure that what you send (on-the-wire) to the browsers is compliant. In certain cases, the specifications themselves recognise the difference between browsers and gateways, and lower the expectations for gateways (e.g. the ICE part can be much simpler when you know you’re on a public IP).
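To make the gateway case concrete, here is a minimal Python sketch of the ICE-related SDP lines a gateway on a public IP might advertise. It assumes the "much simpler" ICE above means the lite mode of RFC 8445, which is my reading rather than something spelled out here. The priority formula and the host type preference of 126 come from that RFC; the function names, the address 203.0.113.10 and the port 10000 are purely illustrative. This is not a complete SDP: a real answer also needs at least `a=ice-ufrag`, `a=ice-pwd` and a DTLS fingerprint.

```python
from typing import List

# Values from RFC 8445: recommended type preference for host candidates,
# and the maximum local preference (fine when there is a single interface).
HOST_TYPE_PREFERENCE = 126
MAX_LOCAL_PREFERENCE = 65535


def candidate_priority(type_pref: int, local_pref: int, component_id: int) -> int:
    """priority = 2^24 * type_pref + 2^8 * local_pref + (256 - component_id)."""
    return (type_pref << 24) + (local_pref << 8) + (256 - component_id)


def host_candidate_line(ip: str, port: int, component_id: int = 1) -> str:
    """Build the SDP attribute for a single UDP host candidate."""
    priority = candidate_priority(
        HOST_TYPE_PREFERENCE, MAX_LOCAL_PREFERENCE, component_id
    )
    foundation = 1  # arbitrary here: there is only one candidate to describe
    return (
        f"a=candidate:{foundation} {component_id} udp {priority} "
        f"{ip} {port} typ host"
    )


def gateway_ice_attributes(ip: str, port: int) -> List[str]:
    """ICE lines for a gateway that already knows it is publicly reachable.

    a=ice-lite belongs at session level and a=candidate at media level;
    they are returned together only to keep the sketch short.
    """
    return ["a=ice-lite", host_candidate_line(ip, port)]


if __name__ == "__main__":
    # Hypothetical public address and port, for illustration only.
    for line in gateway_ice_attributes("203.0.113.10", 10000):
        print(line)
```

The point is what is missing: no STUN or TURN gathering and no server-reflexive or relay candidates, just one host candidate, while the browser on the other side does the heavy lifting.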
Is libwebrtc the only library around? Is it the best?
libwebrtc is famous because it was the original one, it is maintained by an army of 100+ Google engineers (plus all the Chromium infrastructure and tooling team), and it is used by most browsers. While WebRTC was not fully standardised, the fact that it represented most of the code in Chrome (but not all) was really appealing, as the target was moving fast. That is likely to be less and less the case from this year on, as the WebRTC 1.0 spec has just reached Candidate Recommendation level, the last level before becoming an official standard.
libwebrtc is a client-side implementation, which is less than ideal for server-side WebRTC implementations. Most if not all of the open-source SFUs, and many closed-source ones, have their own stack, all different from each other, although interoperable on-the-wire. So, when you hear that there is only one implementation out there, libwebrtc, it’s far from the truth. One could take any WebRTC media server and have a fully compliant implementation, or the Firefox source code, etc. I’m not saying it’s easy, I’m saying that there are many more options than people usually think. I’m also saying that, in most cases, you do not care about the WebRTC JS API and just need to implement RTCWEB.
A few years ago I made a list of the implementations / media servers out there. It also includes the signalling-only (p2p media) ones, but it is still useful to illustrate how many choices you have, and as a starting point.
Nowadays, you have multiple choices. If you’re looking for something lightweight for embedded devices, Janus and PIPE come to mind, both written in C. If you’re looking for an implementation in Java, libjitsi is a good start and is used in many derivative products like Red5 or Ant Media. There is a Go implementation called Pion, and a Python implementation named aiortc. The de facto standard WebRTC SFU used by W3C and many IETF members in general, and Google and Apple in particular, for compliance testing is called Medooze. This is also the SFU used to power the media path of the Millicast streaming platform.
Do not be afraid to mix different implementations in your system. Actually, you should, as the same RTCWEB implementation is not going to be good for both the server and the client side. If you only use web apps, the browsers take care of it for you. Otherwise, libwebrtc seems to be the most popular choice client-side nowadays, but a lot of people, depending on their preferences, also use e.g. GStreamer.
Trivia / Fun facts to close this post.
Facebook owns the most popular apps on the market. For those curious about which WebRTC stack they use, it is well known that FB Messenger uses the libwebrtc stack. The fact that they forked it in 2013 and never quite got around to updating it has generated a certain number of bugs that have entertained the browser community for some years now.
WhatsApp is based on the lesser-known PJMedia, part of the PJSIP library. PJMedia packs a full media engine (voice only) in 56 KB! Of course, lots has been added on top of it, including but not limited to end-to-end encryption (more on that in a later post). How they dealt with the GPL license is unknown.