#webrtc.org now officially reporting test results with KITE

One week ago, the results of the KITE daily runs were made available on webrtc.org. This marks the end of a first phase, and shows that automated WebRTC testing in desktop and mobile browsers is doable today. This blog post reflects on the path taken to get here, the ongoing maturation of WebRTC implementations, and KITE as a tool for the RTC industry to achieve end-to-end testing, as well as load testing and benchmarking.

I. Context of Real-Time Communication Testing

a. Testing RTC before WebRTC

Testing real-time media has always been hard. Generating artificial content and mimicking user behaviour, with all the possible variations and sometimes with enough load, is challenging on its own. With previous technology stacks (Flash/RTMP, VoIP, …) it was possible to take some shortcuts and make some assumptions, but eventually you would end up with a tool that could be used with one solution, from one vendor, and not with others. Testing was possible, benchmarking was difficult, and comparisons between solutions and vendors were impossible.

b. WebRTC can make it slightly worse.

With WebRTC, it became worse. The number of possible client types (browsers / native, desktop / mobile, IoT, …) increased. Since the signalling is not fixed, the number of possible implementations of even the simplest case grew exponentially. Finally, not everybody agreed on the use case anyway. Should we test 1-1 calls only? Should we test multi-party calls? Are we in a video conference use case (many individuals, deemed equals, participating with audio and video), a presentation / training use case (many individuals, but one presenting and the rest part of an interactive audience), or a broadcast / streaming use case (one to many, the many not having any interaction with the broadcaster)?
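To give a feel for how quickly the matrix grows, here is a minimal sketch that simply counts combinations. The browser, platform and use case lists are illustrative assumptions, not the configurations used in the daily runs, and `matrix_size` is a hypothetical helper.

```python
from itertools import product

# Hypothetical axes of a WebRTC test matrix; the values are illustrative only.
browsers = ["Chrome", "Firefox", "Safari", "Edge"]
platforms = ["desktop", "mobile"]
use_cases = ["1-1 call", "video conference", "presentation", "broadcast"]


def matrix_size(*axes):
    """Number of distinct configurations when every axis value is combined."""
    size = 1
    for axis in axes:
        size *= len(axis)
    return size


# Each client configuration is a (browser, platform) pair.
clients = list(product(browsers, platforms))

# A 1-1 call needs a caller and a callee, so we count ordered pairs of clients.
one_to_one_pairs = list(product(clients, repeat=2))

print(len(clients))                               # client configurations
print(len(one_to_one_pairs))                      # caller/callee combinations
print(matrix_size(clients, clients, use_cases))   # same, times the use cases
```

Adding a single axis (browser version, operating system, signalling flavour) multiplies the total again, which is why even the simplest case becomes hard to cover exhaustively.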

c. Down that path, a need for Load Testing

Whatever the (business logic) use case, there was also a need to do some load testing. In a normal test, the complexity resides in reproducing the logic of the use case, with all the possible corner cases, the behaviour of the users, and the variety of client types / operating systems / signalling / …. The complexity comes from the size of the testing matrix. In load testing, the complexity comes from the number of clients / calls you are trying to have your platform or product handle. You could use a simple client configuration (Chrome 63 on Windows 10), but 50 thousand times over. This often ends up being more complicated than normal testing.

In our daily runs for the browser vendors, we usually have only 16 desktop and mobile browser configurations involved. For a long time, the corresponding Selenium nodes were set up and managed manually. If you want to test your product with 1,000 streams, and are in a video conference use case, as Jitsi did in their great blog post, you might require only 30 clients. However, if you want to test a one-to-many broadcast use case with 50 thousand viewers, you need 50 thousand concurrent nodes. That obviously will not happen with a manually set up grid. This will be the topic of a dedicated subsequent blog post.
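The difference between the two cases comes down to simple arithmetic. The sketch below assumes a simplified model that is not taken from the Jitsi post: in a conference, every participant sends one stream to the server and receives one from each other participant, while in a broadcast every viewer is its own client. The function names are hypothetical.

```python
import math


def conference_streams(participants: int) -> int:
    """Streams handled by the server for one conference (assumed model):
    one uplink per participant, plus one downlink per other participant."""
    uplinks = participants
    downlinks = participants * (participants - 1)
    return uplinks + downlinks  # equals participants ** 2


def clients_for_conference_streams(target_streams: int) -> int:
    """Smallest number of participants whose conference reaches the target."""
    n = math.isqrt(target_streams)
    if n * n < target_streams:
        n += 1
    return n


def clients_for_broadcast(viewers: int, publishers: int = 1) -> int:
    """In a one-to-many broadcast, each viewer needs its own client."""
    return publishers + viewers


print(conference_streams(30))                 # streams with 30 participants
print(clients_for_conference_streams(1000))   # participants for 1,000 streams
print(clients_for_broadcast(50_000))          # clients for 50,000 viewers
```

Under this model the stream count grows with the square of the number of conference participants, so a few dozen clients are enough, whereas the broadcast case grows linearly with the audience and needs one node per viewer.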

II. State of the Art, and beyond

a. State of the Art as of late 2017

Some vendors implemented specific testing solutions for their own products, like I did while at Temasys with the “Puppet Master” (awarded Best Tool at RTC world and Expo 2014), like Jitsi did with their Hammer, like Meetecho did for their Janus gateway with Jattack, and so on (for a much more complete list and detailed analysis, please refer to our IEEE publication). The multiplication of such tools shows that there is a real need, and that this need is not being addressed globally by any of the existing attempts.

b. Genesis

When I presented the KITE design at the W3C TPAC Meeting in Lisbon in 2017, I truly believed it was the right design, but I thought it would be something we would either do as a group, as a community, or that I would slowly implement in my own time. I had great hopes of making it happen within the scope of IMTC, as their WebRTC group co-chair. I believed their flagship testing event, SuperOp, was the perfect venue for such a tool, and that the mindset was already in place to welcome interoperability testing. It proved more complicated than expected, and I failed to make it happen there.

Little did I know that internally Google was on the verge of starting a huge push on testing, from a lot of different angles. Our goals were aligned, and we started working together on making KITE happen, with great support from the entire browser vendor community, and some more.

c. KITE Development

Most of KITE development happens offline. When a feature is deemed complete by Google, it is pushed to the public repository, whose code is available to all under a permissive license. We also operate the Selenium / Appium / “Secret Sauce” infrastructure that runs daily tests on a variety of browser configurations, and we maintain the corresponding “Dashboards”. Last week, a simplified dashboard was made available directly on webrtc.org; it links to our more complex but more complete KITE dashboard at CoSMo.

III. KITE contribution to technologies

a. Before KITE

Setting up desktop and mobile browsers to be able to test a WebRTC app automatically is not easy. The WebRTC standard is not final yet, and its implementations differ from one browser to the next. The WebDriver specification does not even include any provision for handling permission prompts. To be able to test WebRTC, we also need to be able to use generated media. Some browsers have proprietary workarounds for that, some others just don’t have anything. Scratch that. Did not have anything!
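To illustrate what such vendor-specific workarounds look like, here is a minimal sketch of the WebDriver capabilities one could send to Chrome and Firefox to get fake capture devices and skip the permission prompt. The Chrome switches and Firefox preferences are the ones documented by the respective vendors, but their behaviour may differ between versions. The function names are hypothetical, and this is not presented as KITE's actual configuration.

```python
import json


def chrome_fake_media_capabilities() -> dict:
    """Capabilities asking Chrome for fake devices and no permission prompt."""
    return {
        "browserName": "chrome",
        "goog:chromeOptions": {
            "args": [
                # Replace camera and microphone with generated test media.
                "--use-fake-device-for-media-stream",
                # Auto-accept the getUserMedia permission prompt.
                "--use-fake-ui-for-media-stream",
            ]
        },
    }


def firefox_fake_media_capabilities() -> dict:
    """Capabilities asking Firefox for fake streams and no permission prompt."""
    return {
        "browserName": "firefox",
        "moz:firefoxOptions": {
            "prefs": {
                # Use generated media instead of real capture devices.
                "media.navigator.streams.fake": True,
                # Skip the capture permission prompt.
                "media.navigator.permission.disabled": True,
            }
        },
    }


if __name__ == "__main__":
    for caps in (chrome_fake_media_capabilities(), firefox_fake_media_capabilities()):
        print(json.dumps(caps, indent=2))
```

The same intent (generated media, no prompt) needs a different mechanism in each browser, which is exactly the kind of divergence a test author had to know about.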

b. The Vision behind KITE – from a technology perspective

That’s one of the reasons we were so motivated by this project in the first place. It allowed us to identify and surface, right now, the pieces that were missing to mature WebRTC into a production-ready technology, much faster than the original way of doing things would have. It really reflects CoSMo’s aim: getting a better WebRTC, faster.

In the process of just setting up KITE with standard configurations, we worked with the browser teams, WebDriver teams, and WebRTC teams of all the major browser vendors, and collaborated with them within the corresponding IETF and W3C working groups to remove vendor-specific workarounds and converge on a standardised approach, based on WebDriver, to test WebRTC.

This translates directly into great improvements for web devs. No more need to handle magic flags through configuration for Firefox, command-line arguments for Chrome, a WebKit rebuild with some developer flag on, or macOS command lines to enable Safari functionalities. No need to even know about all that. It makes the browser vendors’ life easier, but above all it makes the web developers’ life easier as well, and it allows better testing, hardening the technology in the process.

c. Practical contributions to technology advances.

As a result of this global effort, Apple recently contributed the first implementation of permission prompt support in WebDriver. We (as a group) are pushing to get this API into the specs, and to get it adopted in all the browsers’ WebDriver implementations, with Chrome likely being the first to implement it, in ChromeDriver. Unless Firefox beats them to it? 😀

At CoSMo, we have extended the Appium and Firefox WebDriver implementations so that WebRTC testing becomes possible on Firefox for Android. We are in the process of contributing the changes back. For now, as far as we know, we are the only ones to have the same tests running through Selenium on desktop and mobile browsers (Chrome / Android, Firefox / Android, Safari / iOS).

Those who follow our Twitter feed (@agouaillard) know that we have recently been moving into a new space and upgrading our clusters, so the last daily run of KITE is from February 6, but you should see the runs resuming today on webrtc.org. They will include the mobile browsers as well.

IV. The road to Production-Ready KITE

a. The WebDriver maturation phase is done.

As illustrated before, the KITE project has been used so far to give vendors visibility on the pieces missing to mature the technology. The tests could be extremely simple; we just needed one that would run across all the configurations. Having the configurations, all supporting automated WebRTC testing, and being able to instrument them, was the goal. We feel we have reached the stage where this is stable enough. There is still an effort to be made to push the technology forward to the point where the stable versions of the browsers and the corresponding WebDriver implementations support it all, but that is an ongoing effort.

b. WebRTC Maturation phase is ongoing.

The second phase, as a group project, consists of writing better and more dedicated tests to help mature ongoing implementation efforts. Bringing better WebRTC, faster. Among other things, we are right now focusing on writing specific multi-stream (in a single peer connection) tests to help speed up the transition from Plan B to Unified Plan, which in turn will pave the way to simulcast. Those tests will be contributed to the open source repository for anybody to draw inspiration from.

c. KITE: A tool for the RTC industry

CoSMo is developing commercial modules to transparently support Electron- and Qt-based clients. If you have a Qt or an Electron client, you can contact us to give it a try (and if you don’t, you can still contact us to get one developed for you 😀 ).

CoSMo is also polishing a dedicated selenium grid manager which serves two purposes:

  1. Allow one to set up an on-premises or hosted grid to be used by KITE without any prior knowledge of Selenium / WebRTC / browsers / …,
  2. Allow for dynamic grids of extreme sizes, as needed by KITE in load testing mode. If you have a PaaS and would like us to stress test it, you can contact us.

As an example, we were recently involved in a case where we stressed a broadcasting PaaS with 50,000 concurrent users first, followed by an additional 10,000 users every minute for as long as it did not crash (they did not have any users on their platform then). That allowed us to identify several high-severity bugs in both the platform and the corresponding SDKs/clients. More on this in a separate post.
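The load profile described above is a step ramp, which can be sketched as follows. This is only an illustration of the schedule (an initial plateau, then a fixed increment per interval), not the code of the grid manager; `target_users` and its parameters are hypothetical names.

```python
def target_users(
    elapsed_seconds: float,
    initial_users: int = 50_000,
    step_users: int = 10_000,
    step_interval_seconds: float = 60.0,
) -> int:
    """Number of concurrent users the ramp aims for at a given time.

    Starts at `initial_users`, then adds `step_users` each time a full
    `step_interval_seconds` has elapsed.
    """
    if elapsed_seconds < 0:
        raise ValueError("elapsed_seconds must be non-negative")
    completed_steps = int(elapsed_seconds // step_interval_seconds)
    return initial_users + step_users * completed_steps


# Print the target for the first five minutes of the ramp.
for minute in range(6):
    print(minute, target_users(minute * 60))
```

In a real run there is no fixed end: the ramp simply keeps going until the platform under test stops accepting users.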

V. Conclusion

KITE is now stable enough for people to start using it in production. The main source code is open source and free. The documentation should be complete enough for anybody to manually replicate the daily runs shown on webrtc.org today. We’re happy to help with extending KITE, writing specific tests, and operating KITE. Contact us.
