This is a translated, adapted version of an original post by NTT’s Iwase Yoshimasa, available here, with the author’s agreement. As the ecosystem moves quickly, some updates were added in blue and in italic.
This post describes the current state (as of September 2016) of the MCU and SFU media servers used in WebRTC solutions. I hope it will serve as a quick reference for those wanting to learn more about the concepts and the available projects. The details of each product are not covered here, but a link to each product is provided so you can read further if you want. Moreover, we almost exclusively cover standalone media servers, and do not touch on WebRTC CPaaS or PaaS offerings.
One can divide WebRTC system architectures roughly into two types:
- Those that do not terminate the encryption or access the media
- P2P architecture
- using a TURN server
- [Alex Note] Those supporting PERC in the future.
- Those that do
- (VoIP-WebRTC interoperability servers also belong here, but are not covered in this post)
As usual, choosing the best solution depends heavily on the use case, and no architecture is considered best overall.
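To make that trade-off concrete, here is a rough, back-of-the-envelope sketch of per-client media stream counts under the three common topologies. The function name and the party size of 6 are my own illustration, not from the original post; real loads also depend on codecs, simulcast, and bitrates.

```python
# Illustrative per-client stream counts for a call with n participants,
# under the three common WebRTC topologies. Back-of-the-envelope only.

def streams_per_client(n: int, topology: str) -> tuple:
    """Return (upstream, downstream) media stream counts for one client."""
    if topology == "mesh":   # P2P full mesh: one stream to and from every peer
        return n - 1, n - 1
    if topology == "sfu":    # SFU routes streams: send once, receive every peer
        return 1, n - 1
    if topology == "mcu":    # MCU mixes streams: one composite stream each way
        return 1, 1
    raise ValueError(topology)

for topo in ("mesh", "sfu", "mcu"):
    up, down = streams_per_client(6, topo)
    print(f"{topo}: {up} up / {down} down")
# mesh: 5 up / 5 down
# sfu: 1 up / 5 down
# mcu: 1 up / 1 down
```

The numbers show why no single architecture wins: mesh costs the client the most, an MCU costs the server the most (it decodes and re-encodes everything), and an SFU sits in between.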
Main media servers (in alphabetical order)
TL;DR: most of the content to follow is summarised in this table. Make sure to read the comments on the right side.

Intel Collaboration Suite for WebRTC
It includes client SDKs (JS / Android / iOS / Windows) and server SDKs (SFU / MCU / SIP gateway), providing almost everything you need, already packaged. Rather than developing everything from scratch, Intel based the MCU / SFU on Licode.
Incidentally, it was recently mentioned as a core technology in the partnership between Intel and SK Telecom, South Korea’s telecommunications giant.
The Janus core is a WebRTC “gateway”; it has been developed on top of libsrtp and libnice (implementations of the SRTP and ICE protocols also used by Google and Mozilla). By adding a variety of plugins, you can achieve different functions or use cases, for example an SFU. As I wrote in a previous article, it is used by Slack. It is implemented in C. The license was originally AGPL, but was changed to GPLv3 after a discussion with Dr. Alex and Oleg (the creator of coturn) on the discuss-webrtc mailing list.
Acquired by Atlassian, it is an SFU written in Java. Atlassian allowed the project to remain open source (even though they changed the license) and development is continuing. From the start, Jitsi has used XMPP for signalling with clients, and its own XMPP extension, COLIBRI, to communicate with the signaling server. It also provides a REST API.
The entire team is very close to the IETF standards and to the browsers; it was the fastest to implement simulcast (and maybe the only one so far). Emil, the ex-CEO and founder of Jitsi and tech lead for the video bridge, is also the author of Trickle ICE.
While Atlassian is obviously using it in their products, others like http://talky.io/ have used it for different use cases.
Originally designed as an MCU, Kurento itself is implemented in C++. There are SDKs for JS, Node, and Java, which developers use to drive Kurento. The tutorial on how the Kurento media server can be managed with Node.js (Kurento + WebRTC + Node.js) is very detailed. At present, it can also behave as an SFU.
[Alex Note]: bought by Twilio on September 20th.
There is also a managed service based on Kurento called NUBOMEDIA; people who do not want to operate their own servers can use either that or elasticRTC.
Although it was originally only an MCU, it can now also behave as an SFU. Licode itself is implemented in C++. As described above, it has been used by Intel as the base for their media server logic.
The project being a little older than the others, there is no GitHub repository, only a SourceForge one.
Developed by Dialogic, it is a commercial media server. The fixed layout is painful, but that is a historical limitation inherited from RFC 5707. If you have implemented support for RFC 5707, you can probably control it. It should be noted that, in Japan, SoftBank Corporation is using it for a very large-scale install base. (Reference)
The only SFU “made in Japan” (by Shiguredo) among those mentioned so far. Unlike other SFUs, in addition to functioning as a video router, it implements a resource optimisation feature (by means of snapshots) and many other unique features. Refer to the Shiguredo WebRTC SFU Sora development logs for the other advanced features.
While this post is about media servers, it is worth reminding the audience that WebRTC does not only achieve communication through media servers; there are, of course, also forms of communication that do not pass through a media server (P2P / TURN).
The problem with P2P (a.k.a. full mesh) is that it does not scale very well on the client side, i.e. the number of people in a given conversation is limited. If you implement things smartly, according to Mr. Philipp Hancke, you should be able to handle about 8 audio+video streams on a normal PC. In addition (this is my own opinion), if the communication is not two-way, or if you do not render all the video streams, you can handle more than that.
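To see why mesh stops scaling, here is a quick sketch. The helper names are mine, and the 8-stream budget is simply the figure attributed to Philipp Hancke above; each participant in a full mesh receives one stream from every other peer, and the call as a whole carries n(n-1)/2 peer connections.

```python
# Back-of-the-envelope full-mesh arithmetic (illustrative assumptions only).

def mesh_connections(n: int) -> int:
    """Total peer connections in a full mesh of n participants."""
    return n * (n - 1) // 2

def max_mesh_party(stream_budget: int) -> int:
    """Largest party size if one client can decode `stream_budget`
    incoming audio+video streams (each client receives n - 1 of them)."""
    return stream_budget + 1

print(max_mesh_party(8))     # 9: party size where each client decodes 8 streams
print(mesh_connections(9))   # 36: peer connections across that whole call
```

So an 8-stream client budget caps a symmetric mesh call at roughly 9 participants, which matches the intuition that mesh is fine for small rooms and falls over quickly beyond that.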