Using #webrtc as a replacement for rtmp

Such is the title of one of the latest blog posts by Wowza. While it is a very interesting question, I believe the post repeats clichés that the streaming ecosystem has been carrying about WebRTC and that are no longer true. I do not believe Wowza is knowingly deceiving people, and I see their point; I simply believe that recent advances in the WebRTC protocols make most of their statements inaccurate. This post is an attempt to document the statements that can be proven wrong today: a fact-check, of sorts.

I. It’s not one use case, it’s two.

As a general comment: while the title frames the question as purely RTMP vs. WebRTC, many of the statements deal with complete streaming systems that involve recording, storage, and HLS, which is quite a different matter.

Let’s separate two use cases:

a – you cannot afford any delay; you want as-real-time-as-possible, for all the use cases Wowza presented in a previous post: betting and gambling, gaming, VR, interactive chat, the adult industry, and so on. In this use case, the end-to-end delay is critical, and it includes the encoding, the chunking/packetization, and everything else on the sender side.

b – you can afford some delay, either on the order of cable television (5s) or of pre-recorded content serving (up to 90+s). In this case you might even be able to pre-process your media, doing the encoding, chunking, and uploading to the CDN long before it is streamed to an actual viewer. Has anybody seen that 1999 Batman recently? In that case, all the time-sensitive magic happens between the storage (CDN) and the player on the receiving side.

Both RTMP and WebRTC can address cases a and b. One can stream directly from the capturer to the viewer without transcoding for the lowest possible latency (case a), but one can also add a recording unit to the list of “clients” to record and store the media, and later serve it directly from storage. Conceptually, there is no difference between the behaviour of RTMP and WebRTC here.

HLS, in contrast, is by design too slow to deliver streams with a latency under 2 to 5 seconds. I had the chance to talk with the Apple HLS team at WWDC this week, and they confirmed that this is by design. HLS deals with most of the difficult problems by delegating them to the transport (HTTP) or to the buffer, which in all cases adds delay. The “specification” for HLS makes it clear: the goal was to achieve scalability and to leverage the internet's caching infrastructure.
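To put rough numbers on “too slow by design”, here is a back-of-the-envelope Python sketch. It assumes the client follows the RFC 8216 guidance (section 6.3.3) not to start playback from a segment that begins less than three target durations from the end of the playlist; the encoder/packager and network delays are placeholder assumptions, not measurements.

```python
from dataclasses import dataclass


@dataclass
class HlsLiveLatency:
    """Rough lower bound on glass-to-glass latency for a live HLS stream."""

    target_duration_s: float  # value of EXT-X-TARGETDURATION
    encode_package_s: float = 0.5  # assumed encoder + segmenter delay
    network_s: float = 0.2  # assumed upload + CDN + download delay

    def playlist_offset_s(self) -> float:
        # RFC 8216, 6.3.3: the client SHOULD NOT choose a segment that starts
        # less than three target durations from the end of the playlist.
        return 3 * self.target_duration_s

    def estimate_s(self) -> float:
        # A segment is only listed once it is complete, so the playback
        # position trails the live edge by at least the offset above,
        # plus whatever the encoding pipeline and the network add.
        return self.playlist_offset_s() + self.encode_package_s + self.network_s


if __name__ == "__main__":
    for duration in (6.0, 2.0, 1.0):
        latency = HlsLiveLatency(target_duration_s=duration)
        print(f"{duration:.0f} s segments -> ~{latency.estimate_s():.1f} s latency")
```

Even with aggressive 1-second segments, this estimate stays in the multi-second range, while the buffering rule itself is part of how HLS achieves its scalability.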

II. If you want to slow down the streams…

“If you would like to delay the playback time, or try to synchronize playback across multiple devices, you may want to capture with WebRTC, but use HTTP Live Streaming (HLS) for playback, using metadata and timecode to control the time you want referenced from playback.”

This is really only a problem in case b, pre-recorded content delivered over WebRTC. Moreover, it is not a protocol limitation: WebRTC carries all the timestamps, and a client web app can use them to sync if need be. There is no real difference from HLS here; much depends on the features of the client app rather than on the protocol itself. (Homework: read chapter 10 of the HLS specification, check that security is not mandatory, and realise that HLS streaming security depends on the client app and on a centralised infrastructure…)
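For the curious, here is a minimal Python sketch of how a receiver (or a media server) can map RTP timestamps to the sender's wall clock, using the NTP/RTP timestamp pair that RTCP Sender Reports carry for the same instant (RFC 3550, section 6.4.1), and then schedule playout at a chosen delay. It assumes a 90 kHz video clock and receivers whose clocks are synchronized with the sender's; the class and function names are hypothetical illustrations of what a receiving stack does internally, not a browser API.

```python
from dataclasses import dataclass

VIDEO_CLOCK_HZ = 90_000  # RTP clock rate typically used for video payloads


@dataclass
class SenderReport:
    """The timestamp pair an RTCP Sender Report carries for one instant."""

    ntp_time_s: float  # sender wall-clock time, in seconds
    rtp_timestamp: int  # 32-bit RTP timestamp sampled at that same instant


def rtp_to_wallclock(rtp_ts: int, sr: SenderReport, clock_hz: int = VIDEO_CLOCK_HZ) -> float:
    """Map an RTP timestamp to the sender's wall clock using the latest SR."""
    # RTP timestamps are 32-bit and wrap around, so take the signed difference.
    delta = (rtp_ts - sr.rtp_timestamp) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 1 << 32
    return sr.ntp_time_s + delta / clock_hz


def playout_time(rtp_ts: int, sr: SenderReport, delay_s: float) -> float:
    """Wall-clock time at which to render a frame for a viewer with a chosen delay."""
    # Viewers using the same delay render the same frame at the same moment,
    # assuming their clocks are synchronized (e.g. via NTP).
    return rtp_to_wallclock(rtp_ts, sr) + delay_s
```

Picking a larger `delay_s` for a given viewer is all it takes to deliberately slow a stream down; nothing in the protocol prevents it.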

III. WebRTC does not scale beyond 1,000?

“Currently, WebRTC is very limited in its ability to scale past a few thousand instances without a vast (and expensive) network of live-repeating servers that can handle the load. Because WebRTC utilizes peering networks, there still has to be a nearby node to help distribute the stream to other local hosts—and peering across a global network can be incredibly difficult.”

Lack of scalability is really the biggest cliché about WebRTC, but in everybody's defence, it was practically true until very recently. WebRTC has its roots in the VoIP world, and its first use cases were video conferencing, video chat, and unified communications. From the start, it was about real time. While the video conferencing industry and the telcos had experience with real-time media, they had none with large audiences. Take Jitsi, one of the best open-source WebRTC media servers out there, as an example: they presented benchmark results showing that once you reach 30 users in the same conference, a normal server becomes saturated. The thing is, we have known for a long time that the average number of people in a video chat or conference is closer to 4. Scalability was never a bottleneck for those who created WebRTC in the first place, as a single server was always plenty to support their worst-case scenario. The scalability limit of one server (roughly 1,000 streams) then became the scalability limit of the protocol. In practice, nothing bigger had been proven stable, because nobody had ever tried.
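The arithmetic behind that 30-user saturation point is easy to sketch. Assuming a selective forwarding unit (SFU) where every participant sends one stream and receives everyone else's (no last-N or active-speaker filtering), the server forwards N * (N - 1) streams. This Python sketch illustrates that assumption only; it is not Jitsi's benchmark methodology.

```python
def sfu_stream_counts(participants: int) -> tuple[int, int]:
    """Streams an SFU receives and forwards when everyone sees everyone."""
    incoming = participants  # one upstream stream per participant
    outgoing = participants * (participants - 1)  # each receives all others
    return incoming, outgoing


def max_participants(outgoing_budget: int) -> int:
    """Largest conference whose forwarded streams fit in the server budget."""
    n = 1
    while sfu_stream_counts(n + 1)[1] <= outgoing_budget:
        n += 1
    return n


if __name__ == "__main__":
    for n in (4, 30):
        print(n, "participants ->", sfu_stream_counts(n)[1], "forwarded streams")
    print("max participants for ~1,000 streams:", max_participants(1000))
```

A typical 4-person call needs only 12 forwarded streams, while 30 people need 870, close to the roughly 1,000-stream budget of one server.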

Now this is not true anymore. Several companies have arranged (or claim to have arranged) clusters of WebRTC servers so as to accommodate either bigger conferences in the video conferencing use case (more than 30 people in a conference) or bigger audiences in the one-to-many use case, as has been done in the VoIP field for decades. Vidyo is most famous for its cascading technology (video presentation in 2017), Jitsi is rumoured to be about to release a large-conference cluster solution, and others, like Red5 Pro Cluster and LiveSwitch, claim to have equivalent solutions. The difficulty here is not cascading the media, but making sure that the mechanisms that handle bad networks (bandwidth fluctuations, jitter, and packet loss, via RTCP, NACK, PLI, RTX, RED, and FEC) still work well with multiple hops between the media producer and the viewers, and with so many viewers.
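To see why cascading changes the picture, here is a hypothetical Python sketch that sizes a tree of identical media servers, each able to forward to at most `fanout` downstream connections (servers or viewers). It is a capacity-only model: it deliberately ignores per-hop latency and the RTCP feedback issues described in the paragraph above, which are the hard part.

```python
def cascade_tiers(viewers: int, fanout: int) -> list[int]:
    """Number of servers per tier, from the origin down to the edge tier."""
    if viewers < 1 or fanout < 2:
        raise ValueError("need at least one viewer and a fan-out of at least 2")
    tiers = []
    demand = viewers  # downstream connections the current tier must serve
    while True:
        servers = -(-demand // fanout)  # ceiling division
        tiers.append(servers)
        if servers == 1:
            break
        demand = servers  # this tier must itself be fed by the tier above
    tiers.reverse()  # origin first
    return tiers


if __name__ == "__main__":
    plan = cascade_tiers(viewers=1_000_000, fanout=1_000)
    print("servers per tier:", plan)
    print("server hops from publisher to viewer:", len(plan))
    print("total servers:", sum(plan))
```

With a budget of about 1,000 streams per server, one million viewers need only two tiers (one origin feeding 1,000 edge servers), so the number of hops grows logarithmically with the audience size.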

Most recently, CoSMo, as WebRTC experts, teamed up with the media streaming experts at Xirsys to develop such a WebRTC cascading technology for streaming, which we call “milicast”, that can do just that: WebRTC sub-second latency at a scale of 1M viewers. We have tested it using the industry-validated, Google-sponsored KITE testing engine, and it is already used in production by several clients in time-sensitive verticals like the adult “camming” industry. Don't trust us; don't trust SpankChain or Xirsys; don't trust Google; don't trust anyone! As Linus Torvalds said, “Talk is cheap. Show me the code.” Contact Xirsys to set up a demo and check both latency and scalability for yourself.

Note that RTMP has exactly the same limitation: a single RTMP server is limited in the number of viewers that can feed from it. The only difference is that people cracked the problem long ago, using ingress servers and egress servers (possibly with a lot of other infrastructure in between) to work around that single-server limit.

IV. WebRTC not broadcast-quality?

“Today, you can't reliably stream broadcast-quality video through a WebRTC infrastructure. The WebRTC protocol is currently limited to supporting VP8 and H.264 video…”

This one is just plain false. In many ways.

First, at its peak, RTMP was only capable of using H.264. In that regard, WebRTC is in no way worse than RTMP.

Second, VP8 and H.264 are only the MANDATORY-TO-IMPLEMENT codecs in WebRTC; nothing prevents browser vendors from adding more. Ericsson had a WebRTC stack with H.265 as early as 2013, Google has supported VP9 for years, and Firefox has followed. All of them are founding members of the Alliance for Open Media, which is creating the new AV1 codec…

“…which means larger file sizes will bog down the network, not to mention burn up processors when ingesting and attempting to send a large file in its entirety.”

The author is likely comparing WebRTC with HLS here, not with RTMP; HLS mandates H.265/HEVC for encoding larger files, leveraging H.265's higher compression rate.

Moreover, with WebRTC or RTMP, the encoding is done on the sender side. One never sends the raw or pre-encoded file over the network, and there is no transcoding.

“To support 4K or broadcast-quality 1080p60 resolutions, you'll need to be able to transcode for playback on a variety of devices, while sending the highest-quality source to your transcoder.”

Here again, this is a problem specific to HLS (or MPEG-DASH, or any file-based system), and it does not apply to WebRTC or RTMP.

The problem being addressed here is the following: if I have different viewers with different playback capabilities in terms of bandwidth, hardware, and display size, how can I serve each of them the best-quality stream they can handle from a single source?

In file-based systems, you encode different chunks for the different resolutions to be served. Depending on the available bandwidth, display size, and hardware, the player chooses which chunk to download. This is usually called transcoding because, in those cases, the media source encodes once and pushes the encoded media to a server, which transcodes the input stream into several streams at different resolutions.

WebRTC has the equivalent mechanism, just in real time. Here again, several resolutions of the same source media are encoded and sent to the server, and the player (receiving side) tells the server, in real time, which resolution it wants to receive. The simple version of this is achieved with simulcast (multiple separate encoders with different settings); the best version is achieved with SVC codecs. VP8 has supported simulcast and half-SVC for a very long time, and CoSMo has just provided Google with a patch for the same feature with H.264, which is likely to be adopted by Apple and Mozilla right away.
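Per-viewer layer selection in a simulcast-aware server, like the rendition choice a file-based player makes per chunk, boils down to a small decision function. The Python sketch below is a simplified, hypothetical version: the layer names (`q`, `h`, `f`), heights, and bitrates are assumptions, and real servers add hysteresis and congestion-control feedback that this ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    rid: str  # simulcast stream identifier (hypothetical names below)
    height: int  # encoded resolution height in pixels
    bitrate_kbps: int  # assumed target bitrate of this encoding


def pick_layer(layers: list[Layer], available_kbps: int, display_height: int) -> Layer:
    """Choose the best layer that fits the viewer's bandwidth and display."""
    fitting = [
        layer
        for layer in layers
        if layer.bitrate_kbps <= available_kbps and layer.height <= display_height
    ]
    if fitting:
        return max(fitting, key=lambda layer: (layer.height, layer.bitrate_kbps))
    # Nothing fits: fall back to the cheapest layer rather than stalling.
    return min(layers, key=lambda layer: layer.bitrate_kbps)


LAYERS = [
    Layer("q", 180, 150),
    Layer("h", 360, 500),
    Layer("f", 720, 1_500),
]

if __name__ == "__main__":
    print(pick_layer(LAYERS, available_kbps=2_000, display_height=1080).rid)
    print(pick_layer(LAYERS, available_kbps=800, display_height=1080).rid)
    print(pick_layer(LAYERS, available_kbps=2_000, display_height=240).rid)
```

The source encodes every layer once; the server only re-runs this choice for each viewer as their bandwidth estimate changes, so no transcoding is involved.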


WebRTC has evolved very quickly in the past years. Today it has the capacity to replace Flash with even lower latency. You can always slow things down and add recording, storage, and HLS transcoding as extra legs, but you will never be able to speed up HLS or current streaming solutions to WebRTC latency.

WebRTC needed some work at the system level to be able to replace a full streaming infrastructure, especially in terms of scalability, but if the milicast platform from Xirsys and CoSMo and its ecosystem (OBS-studio-webrtc) are any indication, the solution is already out there.

WebRTC already has the latest codecs implemented (VP9), codecs much better than anything RTMP ever had, and on par in compression ratio with what HLS has. Given the extremely fast update cycle of browsers (6 weeks), it will get the next codecs faster than other protocols (AV1 support in Firefox since version 59).

Don’t wait, don’t base judgement on faith, try it yourself.

2 thoughts on “Using #webrtc as a replacement for rtmp”

  1. Hi Dr. Alex,
    you said: “Moreover, this is not a protocol limitation, as webrtc includes all timestamps and a client web app can use them to sync if need be.”
    How can a web application work with WebRTC/RTP timestamps? Do you mean the web app can send the WebRTC server something during the signaling stage to start playback with a particular delay?

    1. First, if it is pre-recorded content, you can delay at the source. Of course, that will not be sufficient if you want different delays for different viewers. One does NOT have access to the RTP timestamps in JS, but you can use the normal video element APIs, if I remember correctly.
