The cat is out of the bag. Today, November 19th, is Google's official launch date for their cloud gaming service, Stadia. Beyond the excitement it brings to the gamer in me, the satisfaction of seeing a large-scale #WebRTC streaming service launch is overwhelming.
With our Millicast.com service, we have been pushing the boundaries of WebRTC for streaming, and have at times faced some incredulity. We have documented a few key problems and differentiating solutions (here, and here for example), but it has been an uphill battle. With Google bringing it out in the open, people will no longer challenge the approach and will come to WebRTC for streaming with a completely different mindset. Good times.
For those who are not aware, Justin Uberti, the WebRTC expert behind the Stadia design, has given some public presentations about the underlying technology stack. One was for the very technical community of the IETF (the organization which defines and standardises the technologies that power the internet itself) and can be downloaded here. The most recent was given last Friday in San Francisco at the Google office, and provides a lot more detail for a less technical audience. What we loved is that he touched on many things we had detailed in our blog posts before.
More often than not, gaming requires less than 1 s of latency, and in certain cases as low as 150 ms, to perform well.

Traditional Streaming Technologies are not appropriate for Real-Time
As discussed in our blog posts here and there, whether it is for cloud gaming or for real-time streaming in general, a typical video streaming set-up just won't do.



The difference is at the system level, not at the component or server level, which is a paradigm shift for the ecosystem.
Traditionally, streaming technologies chain or cluster servers, each working almost independently. As shown above, to achieve real-time streaming you need to move from a component-by-component approach to a systemic, end-to-end approach.
The media transport (WebRTC here) needs to be end-to-end, as illustrated above. The network management also needs to be end-to-end, as illustrated below. Each component's buffer size and bandwidth target needs to be adapted to the entire path. For bandwidth, for example, you are only as capable as your weakest link, and everything in the chain needs to adapt to that (using, e.g., BBR congestion control), as sketched below.
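As a toy illustration of that idea (not Stadia's actual logic; the hop names and numbers are made up), the end-to-end target is simply the minimum of the per-hop estimates, and every element on the chain is driven from that single value:

```typescript
// Hypothetical sketch: the end-to-end bitrate target is capped by the
// weakest hop on the path, so every component adapts to the same number.
interface Hop {
  name: string;
  estimatedBandwidthKbps: number; // e.g. from BBR or RTCP feedback
}

// Made-up hops and estimates, for illustration only.
const path: Hop[] = [
  { name: "encoder -> edge", estimatedBandwidthKbps: 12000 },
  { name: "edge -> relay", estimatedBandwidthKbps: 8000 },
  { name: "relay -> viewer", estimatedBandwidthKbps: 3500 },
];

// You are only as capable as your weakest link.
const endToEndTargetKbps = Math.min(...path.map(h => h.estimatedBandwidthKbps));

// The encoder bitrate, server pacing, and buffer sizes should all be driven
// from this single end-to-end value, not from each component's local view.
console.log(`end-to-end target: ${endToEndTargetKbps} kbps`); // 3500
```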

Codec advantage
Most of the gaming and streaming ecosystem is stuck with H.264, for multiple reasons, especially on mobile. H.265, mainly because of licensing and legal matters, is not a very practical alternative. However, many are already taking advantage of the enhanced coding efficiency provided by more recent codecs.
Netflix, for example, mentioned at a live streaming conference last year that they had switched to VP9, achieving a 25% reduction in bandwidth usage (on average, across resolutions and titles). In practice that means that if your data plan used to allow 4 hours of video, it now allows more than 5 hours at the same quality!
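To make that back-of-the-envelope arithmetic explicit (assuming the full 25% average saving applies to your stream):

```latex
t_{\text{new}} = \frac{t_{\text{old}}}{1 - 0.25} = \frac{4\,\text{h}}{0.75} \approx 5.3\,\text{h}
```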
Beyond coding efficiency, more recent codecs also include mechanisms for much better network resilience (little to no buffering on a bad network) and for real-time adaptability, instead of ABR, which translates into added latency. That is not a problem for the Netflixes and YouTubes of the world, which serve pre-recorded content, but when you go real-time, as in gaming or live event streaming, it makes a world of difference, as the sketch below illustrates.
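To make the distinction concrete, here is a hypothetical sketch contrasting the two models; the encoder interface, bitrate ladder, and headroom factor are made up for illustration:

```typescript
// ABR-style: pick a pre-encoded rendition once per ~2-6 s segment, which is
// where the added latency comes from. The ladder is sorted ascending (kbps).
function pickRendition(throughputKbps: number, ladder: number[]): number {
  const fitting = ladder.filter(r => r <= throughputKbps);
  return fitting.length > 0 ? fitting[fitting.length - 1] : ladder[0];
}

// Real-time style: continuously retarget the live encoder on every
// congestion signal, reacting within milliseconds instead of segments.
interface LiveEncoder {
  setTargetBitrate(kbps: number): void; // hypothetical interface
}

function onCongestionSignal(encoder: LiveEncoder, estimateKbps: number): void {
  encoder.setTargetBitrate(Math.floor(estimateKbps * 0.9)); // keep 10% headroom
}
```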
It is well known that Stadia is using VP9 with those extra extensions, and that Google is leveraging that production experience to design the real-time mode of AV1 (* some support for real-time AV1 has since been added to Chrome 80).

End-to-end stream and parameter monitoring
Monitoring is key to assessing the state of your system. Beyond monitoring, the capacity to evaluate the subjective quality of your stream in real time opens the way to reacting, and fixing the quality if it drops, also in real time.
Right now, during the research discussions about AV2, the coding ecosystem is already entertaining the idea of dropping the traditional objective metrics (PSNR) in favour of subjective metrics that better reflect what the viewer perceives as the quality of the stream. In the context of AV2, the proposed subjective metric is VMAF, by Netflix. Of course, those metrics are usually not real-time; they are generally used to benchmark codecs and their implementations in the lab or during development (only decoders are standardised; there can be a multitude of encoders with different latency, memory footprint, and CPU footprint profiles).
At CoSMo we have developed a real-time equivalent of VMAF to be able to evaluate the quality of streams and react to it in operation. The corresponding algorithm is called NARVAL and more details can be seen here.
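NARVAL itself is not public, so here is only a rough sketch of the kind of feedback loop a real-time quality metric enables; the scoring function below is a naive stand-in (mean absolute pixel difference), not the actual algorithm:

```typescript
// Map the mean absolute pixel difference to a VMAF-like 0..100 scale,
// where 100 means the frames are identical. Naive stand-in only.
function naiveScore(reference: Uint8Array, received: Uint8Array): number {
  let diff = 0;
  for (let i = 0; i < reference.length; i++) {
    diff += Math.abs(reference[i] - received[i]);
  }
  return 100 - (diff / reference.length) * (100 / 255);
}

const QUALITY_FLOOR = 70; // hypothetical threshold

function monitorFrame(
  reference: Uint8Array,
  received: Uint8Array,
  adapt: (currentScore: number) => void,
): void {
  const score = naiveScore(reference, received);
  if (score < QUALITY_FLOOR) {
    // React while the session is live, e.g. lower the encoder bitrate
    // or move the viewer to a healthier edge.
    adapt(score);
  }
}
```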

The Future

What is coming for WebRTC
As discussed in our previous blog post, since this was presented in 2018, the QUIC part of WebRTC NV has been renamed WebTransport, and AV1 is now being discussed as part of the WebCodecs work, which aims at letting people bring their own codec to the web, possibly written in WASM (WebAssembly).
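For the curious, here is what decoding AV1 through WebCodecs could look like; the API was still being designed at the time of writing, so the exact shape may change, and the codec string below (AV1 Main profile, level 2.0, 8-bit) is just one example:

```typescript
// Sketch of AV1 decoding with the proposed WebCodecs API.
const decoder = new VideoDecoder({
  output: (frame: VideoFrame) => {
    // Hand the decoded frame to a canvas / WebGL renderer, then release it.
    frame.close();
  },
  error: (e: DOMException) => console.error("decode error:", e),
});

decoder.configure({ codec: "av01.0.04M.08" });

// Feed compressed chunks as they arrive from the transport (e.g. WebTransport).
function onEncodedData(data: ArrayBuffer, timestampUs: number, isKey: boolean): void {
  decoder.decode(new EncodedVideoChunk({
    type: isKey ? "key" : "delta",
    timestamp: timestampUs,
    data,
  }));
}
```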


