Dean Bubley's Disruptive Wireless: Voice-only applications: The Cinderella of WebRTC

I'm currently working on a major 2014 update of my WebRTC report & forecasts. As part of this, I'm reviewing all the major use-cases for the technology that have emerged so far, and those that are predicted in coming years. This spans enterprise, telecom, consumer web, mobile and assorted domains. (Pre-register interest in the report by email, or via the info request form on the right)

One thing I've noticed is that there's a surprising lack of voice-only use-cases for WebRTC. Video accounts for a far higher proportion of focus than could reasonably be expected. Indeed, when everyone lazily refers to WebRTC as "Skype in the browser", they almost always invoke an image of video chat or conferencing. You almost never hear terms like "VoIP in the browser" or "Vonage/Viber/etc in the browser".

Now certainly, there are various WebRTC audioconferencing products out there (eg Uberconference, Drum, Iotum), while Vonage itself had one of the first mobile WebRTC apps deployed last year. And a number of internal contact-centre solutions use a browser dashboard and a headset instead of a traditional telephony platform. Twilio CX is a good example of this. Plivo and Tropo also have voice-centric cloud platforms. Telefonica's Tuenti WebRTC app is currently VoIP-only too, as is Movirtu's Cloudphone extension for 2G/3G voice.

But these are exceptions. Most vendors, most apps and most WebRTC cloud API platforms are video-centric. It could be that video is just a harder problem for most developers, so WebRTC is a much bigger step forward. And of course, video is "shiny" and demo-friendly in a way that audio often isn't. It often needs bigger and better network boxes too, as well as smarts in mixing or transcoding that aren't (yet) as easy to commoditise or open-source at scale. And it needs lots of bandwidth, so understandably many people are keen to encourage its growth.

All these are fair enough, and I agree with them. WebRTC will indeed catalyse huge growth in video uses and usage. But it doesn't get around the fact that the vast bulk of today's realtime communications is voice-centric. Apart from the history, there's a practical reason - a lot of people cannot or will not use video in many cases - either because it's dangerous (eg driving/walking), invasive, distracting or uncomfortable. And if a given person is willing and able to use video, say, 30% of the time, that implies (assuming each person's availability is independent of the other's) that any two people can both use it just 0.3 × 0.3 = 0.09, or 9% of the time. Groups of three, four or more people can do so even more rarely, unless it's a pre-arranged conference.
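To put rough numbers on that, here is a minimal back-of-envelope sketch. It assumes every participant is independently able to use video the same fraction of the time, which is a simplification, and the 30% figure is purely illustrative. The function name `joint_availability` is my own invention for this example.

```python
def joint_availability(p: float, participants: int) -> float:
    """Chance that every participant can use video at the same moment.

    Assumes (simplistically) that each person is available for video
    with the same probability p, independently of everyone else.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if participants < 1:
        raise ValueError("need at least one participant")
    return p ** participants


# Illustrative only: 30% individual availability, groups of 2 to 4 people.
for n in (2, 3, 4):
    print(f"{n} people: {joint_availability(0.30, n):.1%}")
```

On those assumptions, two people overlap 9% of the time, three people 2.7%, and four people under 1%. That is why spontaneous multi-party video looks so unlikely without a pre-arranged slot.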

"Spontaneous video chat" is therefore very rare, unless preceded by a voice/IM session that escalates. (A video call to a ready-and-waiting customer service agent like Amazon Mayday is a special case).

So it seems strange that so few WebRTC applications and services are targeted at the audio-only, or even audio-primary marketplace.

I have another theory about this. I'm starting to wonder if people just don't quite feel comfortable "talking into a browser", unless it involves video as well. The browser is such an image-centric application, that audio-only communications websites often feel a bit weird.

This is true for one-way audio too - we still see most audio/music use in dedicated applications - Spotify, Pandora and iTunes are still mostly used as desktop applications, even if they have some browser capability too. Web-based conferencing/collaboration software often pops open a separate "phone" window, if it doesn't just avoid VoIP altogether. Such tools also often assume use of a headset.

Some of this is a legacy of browsers needing plug-ins for audio communications - Flash, Silverlight and so on. (For music, it's also about DRM). There certainly have been a number of Flash-based PC VoIP websites in the past, but none has really emerged as a major winner.

Playing amateur psychologist here, I wonder if this is because audio-only applications (VoIP or WebRTC) have three modes of operation:

Using an external microphone, typically a headset
Using the integral microphone of a PC
Using a phone (smartphone or deskphone)

Talking into a headset or handset somehow feels quite "natural" to us. In particular, the speaker and microphone are in the right places, relative to our ears and mouth, so we don't feel like we're shouting - and can indeed modulate our speech down to a whisper if we want. We've also been habituated to it by 100 years of telephony.

It's the middle category that's the problem here - talking into a PC or large tablet's integral mic, and listening to the other person via the device's speakers. Without video cues as well, it feels like you're either talking to an inanimate object, or hearing a "disembodied voice". If you can see the other person, you are able to suspend disbelief that you're conversing with a lump of glass and metal. You're not sure how loud you sound to the other person, and there's also the risk of feedback. You're also aware of the fact that you're broadcasting your conversation to your surroundings - especially if you're speaking unnaturally loudly to make sure the mic picks up your words.

(To take this to the extreme, imagine having a voice-only conversation through your TV. Weird idea, isn't it?)

Now it could be argued that this will just drive video adoption further and faster, with WebRTC as another catalyst. And in some instances, it will indeed do that. But as already mentioned, video isn't always an option. In business, video can compromise confidentiality (eg the whiteboard behind you), or massively increase costs in a contact-centre scenario. For telcos, it brings a whole host of QoS and application issues, and increases transmission volumes multiple times, for an (at best) much more modest uplift in revenue.

WebRTC-powered video will absolutely have many use cases, but it equally certainly will never be ubiquitous or the default mode for all human communications. Therefore, there seems to be a significant gap for commercial (or open-source) solutions to enable more pure-audio WebRTC than is currently seen.

To my mind, the opportunities will emerge in three areas:

Use-cases where it can be assumed that headsets will already be in use, such as contact centres. Otherwise we face the irony and clunkiness of a plug-in free WebRTC service requiring the user to find and plug in a physical device.
Mobile use-cases, where WebRTC audio is baked into an app. This is most obviously suited to telephony-style applications where the phone is held to an ear, but for some reason we don't seem to mind speaking into a smartphone or tablet from a distance quite as much as a PC. (Perhaps because it's held closer, typically).
"Integral microphone & speaker" use-cases with much better-designed UIs, that help overcome the cognitive dissonance of shouting at a machine. This might also mean WebRTC in standalone native applications, as an alternative to the browser.

I'm not a designer or ergonomicist, but observation of other PC-based VoIP services might yield some clues. For example, it helps a bit if the service or website displays a static photo or avatar of the other person - your brain starts perceiving "he/she" as the target of your speech, rather than "it". There's also mileage in displaying other visual cues - perhaps a dial-pad, or a graphic of a microphone, or even an audio level meter so you don't feel the need to shout.

For audio-WebRTC to succeed, there will also be other considerations. For example, Skype maintains a little control-pad hovering in the foreground on a PC screen, even if the main app is behind something else (eg a presentation, or a notepad).

Some of the proposed use-cases for WebRTC are going to need some more thought here - for example "click to speak" to a call-centre, or perhaps communications integrated directly into emails or adverts.

Overall, I think that pure WebRTC-VoIP applications are something of a forgotten domain at the moment. Let's see if good design can provide a glass slipper for this Cinderella.

Details of Disruptive Analysis' WebRTC research & update service are available here. Please also get in touch via the request form, or information AT disruptive-analysis DOT com about the forthcoming update release.

Tuesday, August 05, 2014
