
Monday, February 17, 2014

Decoding Apple's VoIP, WebRTC, UC and VoLTE strategy

Like everyone else in the mobile industry, I'm curious about Apple's future direction and possible launches and strategic intentions. In particular, I'm interested in its involvement in voice, video and messaging-based communications. We've had both iMessage and FaceTime video-calling for a while... but what about voice? And what about APIs? WebRTC? And what about the impact on telcos' services? And will it get explicitly involved in enterprise comms & UC?  [Quick sidenote: I'm speaking at this hosted-UC vendor event in Frankfurt this week]

First, let's recap. There are (broadly) three sorts of voice communications:

a) Standalone "classic" phone calls, or something very functionally-close to primary telephony. This is basic "Person A calls Person B for X minutes" (or goes to voicemail). Telephony is the bulk of "voice" today, to the extent that people often wrongly use the two terms interchangeably. VoLTE fits here as well. Traditional business PBX systems fit here too, mostly.

b) Alternative forms of standalone voice communications that are not really "phone calls". Classically push-to-talk (walkie-talkie) service has been a good telco example, or conferencing - but there are also new types of realtime audio communication from the developer community. Encrypted secure calls with a dedicated app fit here, as do some forms of enterprise mobile UC. One I heard about recently was "networked jamming" for band-members playing or singing in different places. Arguably Skype falls here rather than (a), as calls are often prefaced by an IM session and then "escalated" to voice - a very different user-interaction model from a normal interruptive phone call. Apple's Siri is also clearly a non-telephony voice application, albeit with one party as a robot. This domain is growing fairly fast.

c) Embedded voice communications, in which speech/audio gets embedded into a website or application. This is where the action - and future disruption - is mostly going to come from. Well-known examples include voice-chat inside multi-player games, IM/voice hybrids (although these have gone out of fashion somewhat), call-me buttons in websites and a broad array of API- and WebRTC-based applications that are emerging, often with video as well. This model is likely to extend to mobile voice-enhanced applications in the near future - imagine a taxi app with a "speak to the driver" button, rather than sending an SMS with a phone number. In the enterprise space, "proper" collaboration via UC, plus concepts like Hypervoice mostly fit here as well. 

A similar split of use-cases applies to both messaging and video, for example with SMS and Apple iMessage being in equivalent category a), Whatsapp & LINE in b), and Facebook messaging originally as c) but now moving towards b) a bit as well. For video, Skype, Tango and Apple FaceTime are in a), assorted telepresence and CCTV apps in b), and most of the WebRTC and Flash-based video in c), especially in browsers but increasingly in apps as well.

Notably, Apple has shown fairly little overt interest in category (c), embedded communications, to date. There are no easy iMessage or FaceTime Video APIs to incorporate them into apps, nor WebRTC support in Safari for websites. However, Apple has now finally joined the W3C's WebRTC working group, so perhaps we'll see some more concrete moves, as it realises that alternate 3rd-party approaches are putting the technology on iPhones, iPads & Macs anyway.

Google, by contrast, focuses more on b) and c) for messaging, voice and video (eg with Hangouts and its WebRTC evangelism), while assorted others can also be plotted on a (rough, first-pass, to-be-refined-comments-welcome) 3x3 chart:

(As an aside, this gives another clear view of where RCS goes wrong - trying to do too many things rather than focusing on actual user requirements. It's also worth noting that category (c) embedded-comms platforms are usually those that have evolved from successful products first.)

But back to voice communications...

FaceTime Audio

Although it has gathered surprisingly little attention, Apple launched FaceTime Audio with iOS7 in September 2013. Despite the name being similar to the video product, it is specifically an audio-only voice calling product, that is very similar to a traditional phone call, with a similar UI/UX. It fits firmly into category a), although at present it is only available on LTE-enabled iPhones/iPads via cellular, or older devices like iPhone 4 via WiFi.

At the moment, FaceTime Audio is almost but not quite seamless. It has a separate icon to "ordinary" phone calls, and a separate ringtone. There also seems to be a bit of a lag from swiping the lock screen to audio actually starting, when answering a call. But it's quite close - because it is integrated into the main contact and dialler UI, it's even possible to use it by mistake instead of a normal phone call. I've had a couple of calls via FaceTime Audio and been impressed by the clarity, and its good functioning over (probably fairly uncongested) LTE in central London. But it's not quite ready for primetime yet, given the relatively low numbers of 4G users so far, and the patchy nature of LTE coverage in much of the world.

In other words, it isn't quite as much an in-your-face slap to the telcos as iMessage was with SMS. It's also not usable with the sizeable base of iPhone 4/4S users unless they're on WiFi. As with iMessage, it only works between Apple users - otherwise it defaults to a normal circuit call (typically itself using CSFB, circuit-switched fallback). I suspect Apple is treating it as a large-scale beta at the moment, and monitoring both user behaviour and the app's performance in real-world conditions.

Apple & VoLTE

The interesting question arises later this year (or maybe later if my doomsday predictions prove accurate) when more operators, especially in the US, start rolling out VoLTE. I'd say there's at least a 70% chance that the iPhone 6 and iOS 8 won't support VoLTE at all, but that's possibly a function of pressure from AT&T, Verizon, China Mobile and a couple of others. One of the key variables will be whether real-world VoLTE works as well as FaceTime Audio, as well as more strategic issues that Apple needs to consider around IMS. Personally, I expect Apple to procrastinate as long as possible over VoLTE especially if it involves any compromises in terms of user experience or its own control of its users. 

What I do expect is that if FaceTime Audio gets favourable feedback over the next few months, it will be pushed higher in the stack towards becoming the default category-a telephony experience. That will be especially true for markets with decent LTE networks and sufficient iPhone user base to make FT-to-FT calls a decent probability. It may also be dependent on Apple indicating to users that it's consuming data allowance, and on niggling aspects of user experience, like the lag in answering, being fixed. As with VoLTE, it's also dependent on coverage, although Apple tends to make WiFi use easy on its devices.

One possible scenario is that iPhones become FaceTime Audio-primary, with either VoLTE or CSFB as a fallback, either if coverage is poor or for iOS-to-non-iOS customers. The interesting thing there is it implies a double fallback - there always needs to be CS telephony if there's no LTE coverage. (Although I suspect Apple will be more willing and able to use decent HSPA for VoIP than the operators).
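To make the double fallback concrete, here is a minimal sketch of the selection order this scenario implies. It is purely illustrative: the names `CallContext`, `CallPath` and `choose_call_path` are my own invention, not anything Apple has published, and the conditions are simplified assumptions about how such a decision might be made.

```python
from dataclasses import dataclass
from enum import Enum


class CallPath(Enum):
    FACETIME_AUDIO = "FaceTime Audio"
    VOLTE = "VoLTE"
    CIRCUIT_SWITCHED = "circuit-switched (via CSFB when camped on LTE)"


@dataclass
class CallContext:
    # All fields are hypothetical inputs for illustration only.
    callee_on_ios: bool      # the other party can receive FaceTime Audio
    data_bearer_ok: bool     # usable LTE, WiFi or decent HSPA for VoIP
    lte_coverage: bool       # device currently has LTE coverage
    volte_supported: bool    # device and operator both support VoLTE


def choose_call_path(ctx: CallContext) -> CallPath:
    """Pick a call path in the order: FaceTime Audio, VoLTE, circuit-switched."""
    # Primary: FaceTime Audio, but only iOS-to-iOS and over a usable data bearer.
    if ctx.callee_on_ios and ctx.data_bearer_ok:
        return CallPath.FACETIME_AUDIO
    # First fallback: VoLTE, which needs LTE coverage plus operator support.
    if ctx.volte_supported and ctx.lte_coverage:
        return CallPath.VOLTE
    # Second fallback: classic circuit-switched telephony is always the backstop.
    return CallPath.CIRCUIT_SWITCHED
```

For example, a call to a non-iOS user on a VoLTE-capable network with LTE coverage would take the VoLTE branch, while the same call with no LTE coverage would drop to circuit-switched.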

One other interesting question is whether Apple might be able to improve/hack the radio aspects of all of this. The main problem with CSFB is the long call setup times - the network has to push the connection down from LTE to 2G/3G when the user makes a call, which takes some considerable time. Yet plausibly, Apple might be able to pre-empt this if the OS notices the user composing a phone number, or looking at the call register - perhaps only if there is no concurrent data traffic to disrupt. 

Similarly, if the user is already doing an intensive data task, maybe it might allow them to stop the phone shunting the connection down to 3G, just to receive an inbound call. It's wrong to imagine that all phone calls are more important than all data applications and should automatically have the right to override an ongoing 4G data session.
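The two ideas above (pre-empting the fallback when a call looks imminent, and letting the user protect a data session from an inbound call) can be sketched as a pair of simple rules. This is a thought experiment only: `DeviceState` and both functions are hypothetical names of mine, and nothing here reflects how iOS or any baseband actually behaves.

```python
from dataclasses import dataclass


@dataclass
class DeviceState:
    # Hypothetical signals the OS might have available.
    dialler_open: bool               # user is composing a phone number
    viewing_call_log: bool           # user is looking at the call register
    active_data_session: bool        # any concurrent data traffic
    intensive_data_task: bool        # e.g. a large download or video stream
    user_allows_interruption: bool   # user preference: calls may override data


def should_preempt_fallback(state: DeviceState) -> bool:
    """Move to 2G/3G early if a call looks imminent and no data would be disrupted."""
    call_looks_imminent = state.dialler_open or state.viewing_call_log
    return call_looks_imminent and not state.active_data_session


def should_fall_back_for_inbound_call(state: DeviceState) -> bool:
    """Allow an inbound call to pull the connection off LTE unless the user objects."""
    if state.intensive_data_task and not state.user_allows_interruption:
        return False  # keep the 4G data session; the call does not win by default
    return True
```

The point of the second rule is the design choice rather than the code: the decision about whether a call outranks a data session sits with the user, not with the network.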

In other words, Apple might try to reinvent and enhance category-a primary telephony, using a combination of FaceTime, CSFB, VoLTE etc, in order to make the experience of calling better. It could develop FaceTime Audio with "interruption controls" for the user, rejig the awful voicemail experience (remember the original Visual Voicemail?) and try to tune the device-based user-experience of telephony, which is something that GSMA, OMA and others have woefully failed to attempt.

In many ways, iMessage is "like SMS, but better". I can imagine FaceTime Audio being positioned as "like voice calls, but better" in future too. (VoLTE is pretty much just PSTN-for-IP in terms of UI/UX, except where vendors try to blend it with video in a "communicator" product, or, laughably, couple it to RCS).

In this way, Apple could be the company that stems the tide of (some) users from clunky-old telephony to categories b & c, especially "nearby" substitutes like Skype.

To sum up, I expect Apple to monitor how FaceTime Audio works in practice, and then push it towards being the primary telephony engine for iOS 8 if it performs in a way that's better for the end-user. CSFB will be the main fallback, although there is a slim chance of using VoLTE at the end of 2014. I could also possibly imagine an FT-A/IMS gateway and transcoder of some type.

Non-telephony voice

How will Apple play directly in category (b), the non-telephony standalone voice comms category? I suspect that Siri will remain the centrepiece here, acting as a gateway to various forms of cloud-based comms interaction. We already see Siri as assistant; I wouldn't be surprised to see it evolve towards concierge- or interpreter-type roles.


As for WebRTC or other routes to category-c embedded voice and video, I think that Apple is probably acutely aware of its "unofficial" appearance on various of its devices already, despite no explicit support. WebRTC is being enabled either via browser plug-ins (yes, I know WebRTC isn't supposed to need them, but they're emerging anyway), or mobile SDKs from the likes of Tokbox, Twilio et al. I can't see these being blocked by Apple, despite Google's fingerprints all over WebRTC, because it is also supported by pretty much all telcos, all network vendors and most of the IT industry already. Moreover, early signs are that WebRTC will drive a lot of new user-satisfying applications, or enhancements of existing ones. I'd imagine Apple has taken a close look at Amazon Mayday as a case-study, too.

I don't think that Apple is able to create its own WebRTC competitor (perhaps unlike Microsoft), because it doesn't have the starting-point assets in productivity software, full-scale conferencing, UC, telco infrastructure, contact centres and the like. It could (indeed arguably should) release FaceTime audio/video APIs for native iOS app developers. But given that Apple has itself been the main culprit driving the dagger into Flash, it must also realise that the browser/PC use of embedded comms will only go WebRTC's way in future, especially given its still-small share of PC installed base, and the fact that Safari doesn't reach onto Windows devices. (In fact, I'd be surprised if more than 50% of Mac users still treat Safari as their primary browser, rather than Chrome or Firefox).

Given its new membership of the W3C WG, it wouldn't entirely surprise me if a future Apple iOS used WebRTC APIs or something very close to them, to give developers some way to embed FaceTime Audio and/or Video into apps. (It also wouldn't surprise me to see it acquire one of the cloud comms/SDK players too). There are some open questions over codecs (Apple likes H.264 for video, but has been silent about the "done deal" of Opus and G.711 for audio) but I don't see that as a showstopper. I also suspect Apple is slightly worried about what might happen when Google (inevitably in my view) puts WebRTC APIs right into Android, meaning that developers could not develop feature-equivalent iOS apps without relying on 3rd party APIs.

As always, deeper analysis of all trends in WebRTC is available to purchasers & subscribers of Disruptive Analysis' WebRTC strategy report & updates. Details here

(Sidenote here: I wonder if Rakuten's CEO or its investment bankers have heard of WebRTC. $900m seems like an awful lot to pay for Viber, given the likely future direction of mobile VoIP)

UC, Unified Communications

Before I started focusing more on mobile, I used to spend more time as an enterprise comms and networking analyst. WebRTC and my Future of Voice research has taken me back into that sphere more deeply in recent years.

Clearly, Apple has been making a more concerted effort in the enterprise recently, with a dedicated developer programme, and rather more overt marketing and positioning of iOS for business/government users. It is no doubt aware that many large companies are moving to iPhones, as well as iOS devices being likely a popular choice for BYOD programmes. iPads are also gaining strong adoption across the business landscape, for diverse use-cases.

However, as yet Apple has shied away from anything resembling a full UC strategy, instead leaving that space for its developers to exploit. But if FaceTime Audio and Video become more-used by employees, might that change? In particular, there may be issues around call-recording that emerge in some sectors like finance.

There is also a B2C angle here, especially where iPhone users interact with a business that also uses Apple devices - Microsoft appears to be grooming Skype & Lync as a way to enable direct connection from customer to business without a 1-800 or similar mechanism.

I don't really see Apple wanting to compete head-on with Cisco or Microsoft or Avaya or BroadSoft and peers in this area. Apart from anything else, the hardware margins aren't there. But I can certainly imagine an attempt to blend or gateway FaceTime (and maybe Siri for cloud voice like recording) with select partners. Unlikely to happen too quickly though - but I'm keeping a close eye.


Overall, I think the next 12 months will yield much greater clarity on Apple's stance on voice communications, as well as video or messaging. I wouldn't be surprised to see WebRTC (or almost-WebRTC) emerge as an important part of the overall experience, but I think FaceTime Audio is the bit which will suddenly be noticed by customers. VoLTE - if and when Apple implements it in late 2014 or more likely 2015 - will probably be pushed down to a supporting role, when neither FaceTime nor embedded communications is appropriate, similar to SMS's secondary role to iMessage and push notifications today. Siri might be pressed into broader service as a general gateway to "cloud voice" functions, while enterprise UC will still probably not be tackled directly by Apple on a standalone basis - although elements like conferencing might be carved off to compete more with Google.


AlanL said...

> One I heard about recently was "networked jamming" for band-members playing or singing in different places.

Really? I imagine getting millisecond latency + adequate audio quality must be quite the challenge

Dean Bubley said...

Hi Alan

Haven't seen it demo'd myself, but it's certainly been discussed as a possibility in various places.

Found this for example: https://github.com/komasshu-skyway-sample/drumsessions

And also a reference to jamming on the Hydrogenaudio wiki page for the Opus codec: http://wiki.hydrogenaudio.org/index.php?title=Opus



Anonymous said...


Last November I reached out to Apple to see if they could support an insurance industry application for video to support claims processing. One of the basic requirements was to support recording of the video at a gateway or media service. Their engineers told me that this was not possible and that FT "was never intended to be an enterprise application." Wow, my customer was ready to write a 7 figure check and was basically told, no thanks.


Unknown said...

Hi Dean,
Insightful, very interesting analysis as always. Being a newcomer to the mobile space (1yr), and coming from the traditional UC/enterprise comm industry as well (Avaya, Lucent.... that old AT&T spinoff) it has been quite surprising to see mobile telcos struggling with the ideas we've been pushing with enterprise folks for years. The parallels are amazing. And yet telcos are literally years away from implementing something similar to, say, Siemens (oops... Unify now) Project Ansible.
So funny to think that an industry that is in shambles (Cisco and Microsoft mopping up) still has vitality to produce products that mobile only dreams of (maybe a hint of what will happen to telcos that don't get it right... disruptive innovation will come soon and knock on their door).

Anyway, pleasure reading you. Cheers from South America,


Toby Allen said...

VoLTE is one of those interesting cases. Large network investment for effectively no direct change in end customer experience. I think Apple will implement it as the carriers demand and the carriers will only implement it as spectrum demands force them to reduce the 3G CS spectrum usage.

Niels said...

This is quite a good, comprehensive overview. Do you see any way Apple could play a role in this space? Apple's business in its long history has been proven mainly to copy, make-it-proprietary and commercialize successful concepts in an early stage. This specific strategy won't work in an open, global and collaborative evolution of technology which basically relies on vendor-independent standardization.
In my understanding, Apple with its parasitizing business model may be ruled-out here. Any comments?

Duncan said...

No mention of BlackBerry BBM? With cross platform voice available now and video coming soon, this has to be a contender, especially for enterprise.

Dean Bubley said...

Chris - Interesting feedback, thanks. That doesn't mean Apple isn't working on something in private though, I guess. Could be they'll do "FaceTime for Enterprise" (or even a separate comms suite) at some point - who knows?

Fernando - mobile companies have long struggled with enterprise. Ironically, their old issues with legacy-PBX integration are now changing to a new problem with how to integrate mobile-based UC with enterprise IT & cloud applications. There are also always problems with purely network-based call control vs. on-premise/corporate-controlled policies.

Toby - I'm not so sure. Depends if Apple can implement VoLTE in such a way that there are no "generalised" IMS & policy tentacles through the rest of the OS and comms stack. Silo'd VoLTE is a possibility, without a generic IMS App Framework on the device, perhaps

Niels - fairly provocative choice of words. I wouldn't describe Apple as "parasitising". In what specific area do you see vendor-independent standardising being important? There is a lot of value in comms islands (voice, video or as Whatsapp just proved, messaging). Sometimes these use underlying standard enablers, sometimes not.

Duncan - I've referenced BBM before, but it's definitely waning in importance. The chart isn't intended to be comprehensive (eg no Cisco Jabber or Snapchat either)

Fazal Majid said...

One major use case you do not list for voice is conference calling. Polycom got it mostly right at the edge with their echo cancellation, but conferencing services are still dismal in terms of usability and quality.