In a public comment on Millicast’s recent post about simulcast, Chris Allen, CEO of Infrared5, mentioned that they have been supporting ABR with WebRTC in their Red5 Pro product for a long time. While his claim is valid, and many in the streaming industry use a variation of what they do, there are two very important distinctions that need to be made between ABR and simulcast. We made the distinction about latency in our presentation at Streaming Media West last year, though possibly too quickly, and we never really explained the distinction about end-to-end encryption, so we thought we should dedicate a full post to both this time around. WebRTC with simulcast is the only way to achieve the lowest latency possible and real end-to-end security, with more flexibility than DRM can provide.
The main goal of using WebRTC in streaming is optimally low latency. In Millicast’s recent post about simulcast, we explained how latency is mainly related to chunk size, why WebRTC is optimal in that regard, and why it has the lowest possible latency of all the UDP-based media transports. That covers packetization and transport, but encoding also adds to the overall latency.
While WebRTC end-to-end provides optimally low latency, any extra encoding, transcoding, or even just transmuxing or other media operation on the path degrades that latency. Most of the latency and CPU footprint come from the encoder/decoder part, as illustrated by this paper by Intel. Let’s count the number of encoding/decoding pairs on the media path when using WebRTC end-to-end on one hand, and when using server-side ABR on the other.
In both simulcast and ABR, one encodes the same media source at several spatial resolutions. In simulcast, the multiple encodings are done client-side; in ABR, they are done server-side.
Encoding multiple streams in parallel does not increase the latency, but it does increase the bandwidth usage and the CPU footprint: by 25% for a single additional stream, and by a theoretical 33% for an infinite number of additional streams. In simulcast, both of those overheads are shouldered by the sender; in server-side ABR, by the server.
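The 25% and 33% figures follow from a geometric series. Here is a minimal sketch, assuming that each additional simulcast layer halves both the width and the height of the previous one (so it carries a quarter of the pixels) and that encoding cost is proportional to pixel count. Both assumptions, and the function name, are ours and are for illustration only:

```python
def simulcast_cpu_overhead(extra_layers, scale_per_layer=2):
    """Extra encoding cost of the additional layers, relative to the
    full-resolution stream alone.

    Assumptions (illustrative, not measured): each extra layer divides width
    and height by `scale_per_layer`, and cost is proportional to pixel count.
    """
    pixel_ratio = 1.0 / (scale_per_layer ** 2)  # 1/4 of the pixels per step down
    # Sum of the geometric series r + r^2 + ... + r^n
    return sum(pixel_ratio ** k for k in range(1, extra_layers + 1))


if __name__ == "__main__":
    for n in (1, 2, 3, 10):
        print(f"{n} extra layer(s): +{simulcast_cpu_overhead(n):.2%} CPU")
    # The series converges to r / (1 - r) = (1/4) / (3/4) = 1/3
    print(f"limit: +{(1 / 4) / (1 - 1 / 4):.2%} CPU")
```

Under these assumptions, two extra layers (full, half, and quarter resolution) already cost 1/4 + 1/16, a little over 31%, which is close to the one-third limit.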
It’s all about compromise. ABR could be, and certainly is, sold as a way to reduce the CPU footprint and the bandwidth usage on the sender side, which it does, at the cost of almost doubling the latency and preventing the use of end-to-end encryption.
Moreover, with today’s desktop computers, the additional CPU footprint is negligible. Every day, gamers play games and stream them in real time using software like OBS Studio, without the streaming part noticeably impacting the performance of the game.
In server-side ABR, one needs to encode the stream once (at high resolution) to send it to the server, where it must be decoded before it can be re-encoded at different resolutions. With WebRTC end-to-end, from glass to glass, you only ever encode and decode once. With server-side ABR, you do that twice. However good your server-side ABR implementation is, it can never be as fast as WebRTC end-to-end with simulcast.
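The counting argument can be sketched as a toy model. The stage names and the two pipelines below are our own simplification of the paths described above, not a measurement or a description of any specific product:

```python
# WebRTC end-to-end with simulcast: the sender encodes every layer in parallel
# (counted once, since parallel encodes do not add latency) and the server
# only forwards packets without touching the media payload.
WEBRTC_SIMULCAST = ["capture", "encode", "forward", "decode", "render"]

# Server-side ABR: the server decodes the single incoming stream and
# re-encodes it at several resolutions before sending it on.
SERVER_SIDE_ABR = [
    "capture", "encode", "decode", "encode", "forward", "decode", "render",
]


def codec_pairs(pipeline):
    """Number of encode/decode pairs a frame crosses from glass to glass."""
    encodes = pipeline.count("encode")
    decodes = pipeline.count("decode")
    if encodes != decodes:
        raise ValueError("every encode needs a matching decode")
    return encodes


if __name__ == "__main__":
    print("WebRTC end-to-end:", codec_pairs(WEBRTC_SIMULCAST))
    print("Server-side ABR:  ", codec_pairs(SERVER_SIDE_ABR))
```

Whatever the actual per-stage cost is on a given machine, the second pipeline always pays for one more encode/decode pair than the first.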
As experts and visionaries in the corresponding technologies, working with and for most of the browser vendors and major actors, we at CoSMo and Millicast have been aware of those limitations for several years. We were part of the decision, taken at the W3C technical plenary meeting in Sapporo in 2015, to include simulcast in WebRTC 1.0 rather than wait!
Instead of shipping an early implementation that would require a lot of workarounds and would need to be rewritten once the browser implementations matured, we decided to invest in helping the browsers get there faster. It is no surprise that Apple mentioned us, and only us, in their most recent Safari WebRTC blog post, or that our KITE technology is used to test the WebRTC implementations of all browsers on a daily basis, with results reported on the official WebRTC website.
Security and Privacy
Another reason, beyond uncompromising latency, NOT to allow re-encoding is security. Since the Snowden revelations, the world has known about all kinds of national agencies spying on everything that is in transit on the wire, and the Internet Engineering Task Force has taken a very strong stance on security. If everything can be captured once it leaves my computer, and if my Internet service provider or my CDN can be forced to provide access to my data without my consent and without informing me, the only way to protect myself is to encrypt everything that goes on the wire. Welcome Telegram, WhatsApp, Signal, and a whole new generation of communication tools that implement end-to-end encryption with a two-key system in which the service provider only ever has one key and cannot provide access to unencrypted content even when legally asked to. They protect their customers’ privacy.
That implies a new trust model in which no server or connection is trusted, and none should have access to the raw frames. You cannot do transcoding in the cloud, since the cloud shouldn’t have access to your raw content in the first place.
DRM, and the corresponding W3C Encrypted Media Extensions (EME), is different in nature. With end-to-end encryption, the end user or their organization controls the encryption with their own keys. With DRM/EME, the keys are provided by the media distribution service. For free content or ad-based monetisation, one might not care, but for paid or sensitive/regulated content, the consequences of giving external third parties access to your raw content can be dire.
While WebRTC 1.0 does not include end-to-end encryption, the subject was brought into the discussion once it became clear that the original WebRTC P2P use case was not scalable. It was simply left to be addressed in the next version of WebRTC (WebRTC NV) so as not to delay WebRTC 1.0.
The IETF has been working on a specification called Privacy Enhanced RTP Conferencing (PERC). Its biggest known implementation to date provides SEC-level compliant double encryption to the top 25 banks in the world, and is sold and operated by Symphony Communications. Its design and implementation were done by your humble servant: CoSMo.
Several W3C members are working on new browser APIs that would allow applications to manipulate encoded frames, which in turn would allow end-to-end encryption. Meanwhile, an evolution of PERC that is protocol independent (so that it can run over, e.g., QUIC) and more bandwidth efficient already exists for those using only native SDKs.
We hope that this post made the differences between WebRTC end-to-end, including simulcast, and server-side ABR clear. There will be a follow-up post on small but important distinctions about encoders and decoders, but we did not want to make this post too long. We also hope that the subtle but decisive difference when it comes to securing content was adequately illustrated.
We would like to thank Chris Allen again for his original comment. Just like their comment on a previous post, however wrong the comment, it provided us with insight into what people in the streaming industry might not yet be aware of when it comes to WebRTC, and the opportunity to write a nice blog post about it. For that, we would like to thank them.