Lies, damned lies, and #webrtc statistics!

People recurrently argue on the various webrtc mailing lists and social sites. Some questions are still left open: which MCU/SFU is better, which PaaS is better, should one work on webRTC in Safari? In Edge? Usually some kind of statistics is used to back the argument. While it is well known that you can make statistics say whatever you want them to, I do not always see a reason to argue. Indeed, most of the time people argue about different use cases that are not opposed to each other, or implicitly define different scopes that do not overlap. I thought it would be good to clarify things a little bit by presenting several trustworthy sources of data for browser usage and webrtc support, and by defining a few usual use cases for illustration.

The goal of this post is not to make you an expert in statistical quality tools like the 3 sigma rule used in particle physics, or the intricacies of the p-values used in biology and bioinformatics.

The goal is simply to point to some data sources, and to document some of their most common biases and the ways they are misused, so that one can independently evaluate the claims being made.

Browser Usage statistics

One of the most documented, but also most difficult to understand (and most misused) metrics is the global browser market share.

Conceptually, there are two approaches: per user, and per usage.

If you go per user, as netmarketshare [2] does, anybody who surfs the web counts for one, even my mom, who owns an old windows computer and goes on the web once a month without really knowing it. She counts for as much as me, who is wired 24/7. As illustrated below, if your priorities are decided based on addressable market size, you would go for chrome first, then IE, Firefox, and Safari or Edge.

netmarketshares

If you go per usage, as StatCounter [3] does, heavy internet users account for more. Of course, the two metrics give quite different results, as illustrated below. Beware: the time axes differ, with the graph above starting in Sep 2015 and the graph below at a date in 2008. Also, the graph above covers desktop browsers only, while the graph below includes both desktop and mobile, which explains e.g. the much better score of safari below. You should start getting a feeling for how the scope can influence the conclusion. If you were to set your priorities depending on the addressable market as measured below, you would work on chrome first, then safari, firefox, ….
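To make the difference between the two counting principles concrete, here is a minimal Python sketch. The visit log, the user names and the page-view numbers are all invented for illustration, and this is not how either site actually computes its figures; it only contrasts "one user, one vote" with "one page view, one vote".

```python
from collections import Counter

# Hypothetical monthly log: (user, browser, page_views). All values are invented.
visits = [
    ("mom", "IE", 5),
    ("dad", "IE", 10),
    ("me", "Chrome", 900),
    ("colleague", "Firefox", 85),
]


def share_per_user(log):
    """Each user counts once, however much they browse."""
    counts = Counter(browser for _, browser, _ in log)
    total = sum(counts.values())
    return {browser: 100 * n / total for browser, n in counts.items()}


def share_per_usage(log):
    """Each page view counts once, so heavy users weigh more."""
    counts = Counter()
    for _, browser, views in log:
        counts[browser] += views
    total = sum(counts.values())
    return {browser: 100 * n / total for browser, n in counts.items()}


per_user = share_per_user(visits)
per_usage = share_per_usage(visits)
print("per user: ", per_user)
print("per usage:", per_usage)
```

With these made-up numbers, IE leads when counting per user (50%), while chrome leads when counting per usage (90%): same log, opposite priorities.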

There are a few interesting conclusions to draw from the comparison of both datasets.

First, StatCounter has traditionally given higher scores to chrome, while netmarketshare gives higher scores to IE. Intuitively, since IE comes preinstalled on PCs, netmarketshare reflects the PC/Linux/Mac distribution. The only conclusion we can draw is that there are more PCs than anything else in the world, and that the vast majority of PC users stick to IE.

Then, it seems that heavy users prefer chrome, as per the StatCounter results.

Finally, whatever metric you take, it’s clear that IE+Edge is losing market share while chrome is gaining market share.

OK, this is all well and interesting, theoretically, but what about webRTC? Not all browsers support webRTC, and even those that do have only implemented feature <name your feature of choice> in version Y, so how do I know my addressable market in that case? And should I care?

A metric for webRTC support?

This question can only be answered if you have two things: a metric, and the corresponding data.

What’s the reference?

A lot of people who are developing web apps will be interested in webRTC support among browsers. If you want to be thorough, let me stop you right away: this is a question you cannot answer. The reason is simple: the specs are not final.

web app view

What does “webrtc support” mean then? We see different interpretations there, each taking a different reference point:
1 – the latest specs
2 – the latest chrome stable
3 – the latest firefox stable
4 – the latest adapter.js
5 – my PaaS vendor will take care of that for me, thanks.
6 – I’m mobile/Desktop only, I don’t care (not web app dev)
7 – I’m using ORTC/Edge anyway, I don’t care

To be real, #1 is not going to work until the specs are final, and fully implemented by all browsers.

#2 was very popular, especially early on when chrome was implementing a lot more than the other browsers. Jitsi Meet, for example, only supported chrome early on for lack of simulcast support in Firefox.

#3 was very popular with those who wanted an H.264-only app early on, like Cisco’s Spark.

Now that the specs and the firefox and chrome implementations are converging, #4 is the preferred approach. Support for other browsers (Edge, safari) and APIs (ORTC) is also being merged in, which makes adapter.js an almost mandatory companion to the browsers for those who want to use the webRTC API across all of them. Most browser vendors are supporting and enhancing adapter.js, and the corresponding support is manually reported, with an arbitrary granularity, at www.iswebrtcreadyyet.com

iswebrtcreadyyet

More interestingly, for the point I’m trying to make, #5 is a perfectly good answer, just as #6 and #7 are. We can start seeing here a fragmentation of the use cases, which in turn brings people to reach different conclusions given the same question, without any of those conclusions being wrong. No need to argue there.

Where is the data?

Most of the big PaaS vendors have telemetry and logging in place and can query their logs (see a great presentation here for examples using tokbox and appear.in data), and callstats.io helps you implement the equivalent if you are not using a PaaS vendor. For an individual without access to such data, however, it can be quite difficult to get a clear view before putting a web app in production and starting to collect one’s own data.

Even then, you are biased toward what your web app supports. If you look at the color coded diagram below, you will see that today the browsers that have both a big global market share and support for webrtc (in green) are limited to Chrome, Firefox, and Opera (with a loose definition of “big market share”).

Browsers color coded by webrtc support

It is very tempting to include that assumption in your web app, by showing a message when one of your users uses safari or edge to suggest downloading, say, chrome. Very quickly, safari or edge will disappear from your logs. If you then use the resulting logs to justify a posteriori not supporting safari or edge, you have created a circular bias.

It does not matter if other solutions exist to support webrtc in safari or edge, and it does not matter if in the meantime safari or edge has hypothetically added native support for webrtc. Your application is filtering that browser out. [The same could be said for IE, which does not support the webRTC API natively.]
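The circular bias can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: the incoming visitor mix and the list of redirected browsers are invented, and real logs would of course be messier (some redirected users do come back with another browser).

```python
# Hypothetical visitor mix reaching the landing page, in percent. Invented numbers.
incoming = {"Chrome": 60, "Firefox": 15, "Safari": 15, "Edge": 10}

# Browsers the app sends to a "please download another browser" page (assumption).
redirected = {"Safari", "Edge"}


def logged_share(visitors, redirected_browsers):
    """Browser share as seen in the app logs, i.e. after the redirect filter."""
    kept = {b: n for b, n in visitors.items() if b not in redirected_browsers}
    total = sum(kept.values())
    return {b: 100 * kept.get(b, 0) / total for b in visitors}


logs = logged_share(incoming, redirected)
for browser, share in logs.items():
    print(f"{browser}: {incoming[browser]}% of visitors, {share:.0f}% of logged sessions")
```

In this toy example safari and edge make up a quarter of the visitors but 0% of the logged sessions, so the logs can only ever confirm the decision that produced them.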

To go back to the question of where the data is: Arnaud Budkiewicz, founder of Bistri, understood the need for the ecosystem to come together and consolidate usage data. He started www.webrtcstats.com. Now that some vendors like callstats.io have the critical mass to deliver global data, and some PaaS like TokBox or Twilio have become big enough, it would be interesting to see them contribute their data. Of course, the data could be seen as sensitive, but the only thing that the ecosystem really needs is percentages of usage.

What about stats or number for my market or my use case?

That’s really the billion-dollar question. Most of the data we have out there reflects the global internet, or comes from browser and PaaS vendors, which is more or less the same: it tends to reflect a web-only approach (no native mobile, no native desktop, no IoT, …). It also represents the global consumer market, and not the enterprise market or specific verticals.

While this is good to have, a lot of app developers would like to have stats by vertical (games, healthcare, ….) or to see the difference between enterprise and consumer markets. To make it worse, in most of the discussions I saw online, it is implicitly assumed that people speak about the web in general. Of course. What else?

Well, what else?

global view

Illustrated above is a simplified view of most of the pieces that can compose a webrtc solution, and that are being called “webrtc” by some. You will recognise the web app and browser side of things on the top left, but also the mobile native side in the middle, and native desktop / IoT on the right hand side. Also worth noting is the infrastructure / back end side of things, which needs to interoperate with those clients and support the (IETF’s) webRTC specs as well.

Whether you’re a mobile dev, a desktop dev, a back-end dev or an IoT dev, you do not use JS, and you do not care about the webrtc JS APIs. Lucky you, some will say (ironically): you are less impacted by browser releases. But then you have to handle the whole stack by yourself. You are not less “webrtc” than web apps. Actually, in most cases you will end up using the same stack as the browser (webrtc.org), or a repackaged version of the browser, be it webview, node.js, electron, or something else.

All of the above are technical differences, what about the business point of view?

Specific verticals can have a very different OS or browser distribution. Guess what, in the education and healthcare verticals, they love their Macs with safari on them. Here again, there is no data, so anybody can make any statement. Less than ideal.

Enterprise customers.

I was speaking with the tokbox CEO and CTO one day about how they were dealing with IE and Safari. Their answer was easy: they usually don’t (even though they have a plugin for IE embedded in their platform). The reason was equally simple: most of their clients have the liberty of downloading another browser.

I agree with my friend Tsahi, who noted in many posts that if you have to download anything, you’d better download a better browser, or an app, before you think about a plugin. Here the tech fragmentation is less important, because you have the capacity to jump out of one box (non-webrtc-supporting browser) to another one (webrtc-supporting browser, or native desktop app) at will. Fair.

However, that supposes that you have that freedom. In many enterprise contexts, you do not. Here most of us still agree.

Where we disagree, for lack of data, is on the importance of that use case. Of course, pointing to the public web browser market share is a poor approximation of the enterprise market, but that’s almost the only available data.

Recently, at WebRTC Expo in NYC, I presented the statistics for all of the Citrix GoTo products. Citrix GoTo products require the installation of a native app. The web app then uses webrtc for some things when it is present, but always connects to the app client. It therefore supports all browsers, and does not ask users to download another browser. The resulting statistics are thus not subject to the circular bias described earlier in this post.

What to take away from this? First, almost 80% of all users are on windows 7.x! Among those (not shown here), 48% are using chrome and 48% a flavor of IE. The second most important platform is Mac OS (13%). On that OS, the majority of users (69%, not shown here) are using the latest safari available for their version of Mac OS. Windows 10 is almost nonexistent, and the edge bucket does not even register as a tick (0.01%).

If you were to define your priorities from this, what would they be? chrome on windows 7 38%, IE on windows 7 38%, safari on Mac 9%, ……
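These priorities are simply the product of the OS share and the browser share within that OS. A short Python sketch of the arithmetic, using only the figures quoted above (with "almost 80%" rounded to 0.80, which is an assumption on my side):

```python
# Figures quoted in the text; "almost 80%" is taken as 0.80 (assumption).
windows7_share = 0.80   # share of all users on windows 7.x
chrome_on_win7 = 0.48   # share of windows 7 users on chrome
ie_on_win7 = 0.48       # share of windows 7 users on a flavor of IE
mac_share = 0.13        # share of all users on Mac OS
safari_on_mac = 0.69    # share of Mac users on the latest safari

# Overall share = share of the OS * share of the browser within that OS.
priorities = {
    "chrome on windows 7": windows7_share * chrome_on_win7,
    "IE on windows 7": windows7_share * ie_on_win7,
    "safari on Mac": mac_share * safari_on_mac,
}

for name, fraction in sorted(priorities.items(), key=lambda item: -item[1]):
    print(f"{name}: {fraction:.0%}")
```

The three buckets come out at roughly 38%, 38% and 9% of all users, which is where the priority list above comes from.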

Case Study

Around May 2016, the following was posted on a social website. It is quite representative of the biases and implicit assumptions mentioned above. There is nothing specific about this post, and there are many more like it; I just picked it because it is concise and representative.

Things are going wild on the twitter #webrtc tag. Not a day without someone writing about Apple and WebRTC. […]

I’ve seen quite some usage numbers in the last three years. There is a clear trend: Chrome (and mobile clients based on the webrtc library) dominates WebRTC with a market share of more than 90%.

Safari is usually redirected to a “Please download Chrome or Firefox” page. Users that can download another browser will do it and then learn that they have to use that browser for the service.

So we are talking about roughly 5% of the /initial/ users that are affected by Safari not supporting WebRTC. While 90% of the regular users are affected by issues with Chrome. Not sure about your priorities, but it is pretty clear to me where I have to spend my time… 

Assumption #1: my users are on the public internet. Clearly the 5% of safari users correlates with the stats we have from other sites about global market shares. While this is a correct assumption for most of the big PaaS and some vendors, it’s clearly not the case for all. In that example, it’s very possible that the people cited working on Safari are in a situation where Safari is more important to them. As we saw before, there are quite a few of them.

Assumption #2: chrome + webrtc native clients represent 90% of users. While there are some stats about webrtc browser market shares and, to some extent, webrtc support, there are to my knowledge no stats about the native clients. A PaaS offering both a web SDK and a native SDK would know what stack is used, but there is no way to know what all the apps are using internally. I would challenge the 90%. However, whatever the right number is, it is clear from the stats we have that, as far as webrtc is concerned, chrome is ubiquitous, and bugs in chrome impact more people than bugs in safari, or edge, or … anything else.

Assumption #3: You can’t do two things at a time. There is no reason why you have to choose between reporting bugs on chrome and reporting bugs on any other browsers, or even working on fixing them.

Fippo, one of the best JS webRTC engineers out there, shows that by being the best bug reporter for chrome (by far), attending the firefox stand-up every week (and hanging out in their IRC channel), and also being the main developer of a webRTC-on-ORTC shim, among other things. What percentage of users is impacted by Edge and will benefit from that ORTC shim? According to both main sites cited above, around 5%. Same as Safari. In a specific market, it might very well be even less.

He also made a proud post about a bug that affects a fraction of TURN server users with firefox on linux: “the fraction of a fraction of a fraction of a pie. But if the pie is big enough …”. That shows that you can have reasons beyond the raw public market distribution, and it directly contradicts the quote I started this paragraph with.

Conclusion

I hope this post helped clarify a few things. I tried to provide different possible perspectives on what webRTC could be for different people. I obviously failed at listing them all, but I hope I provided enough possibilities to show that there is more than one “right” approach to webRTC.

I also hope I stressed enough the need to share some stats, perhaps through webrtcstats, perhaps another way, but in a verifiable and “query-able” way, to help everybody make data-driven decisions rather than point-of-view or opinion-driven decisions.

Beyond the technical points, I’d like to stress three more things.

First, people are free to work on what they want during their free (or paid) time. The fact that they do something different than you is not a bad thing and there is no reason to be upset about it. It’s much more positive to show the good reasons why you do things, than to point at the perceived bad reasons others do what they do. It’s (mainly) a free world out there.

Second, everybody who writes a blog, or writes on the public internet and reaches a certain fame, has IMHO an implicit duty to be as accurate, fair and respectful as possible. Most of the time “I don’t understand why they do that” does not mean “what they do makes no sense”. It might not make sense to you, with the knowledge you currently have, but it does not mean that it does not make sense for any given point of view, given info you might not have, forever. Usually it just means “I did not understand.”

While everybody is entitled to his or her opinion, I believe that fact-checking is an important part of writing blog posts about technology. Sometimes, reaching out with questions before writing anything already prevents a lot of inaccuracies from slipping in. Once it’s out on the public internet, it’s difficult to fix. We’re far away from science, but in a true peer-review process, each statement is either backed up by a citation from a trustworthy source, or provably true.

Finally, the quest for truth and accuracy is paved with errors. Being wrong, accepting it, and not making the error again is part of the learning process. I consider it a great thing to stand corrected. It means I learned something and made progress. It happens to me very often (I have a big mouth), see here and here for the latest examples, and I always thank those who help me improve. I do not see this happening too often in the webRTC ecosystem. If you ask a question, and get an answer, any answer, thank the individual who answered. If you made a mistake, or are being perceived as making a mistake (whatever you think about it), apologise. Those who know will admire your class, those who don’t will accept your apology.

“Sorry seems to be the hardest word”.

[1] – https://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statistics
[2] – http://www.netmarketshare.com
[3] – http://gs.statcounter.com/#desktop+mobile-browser-ww-monthly-201510-201512-bar

This work by Dr. Alexandre Gouaillard is licensed under a Creative Commons Attribution 4.0 International License.

This blog is not about any commercial product or company, even if some might be mentioned or be the object of a post in the context of their usage of the technology. Most of the opinions expressed here are those of the author, and not of any corporate or organizational affiliation.
