I've been using my voice-fragmentation bubble chart in presentations for a couple of years now. It points out that "voice" is not the same as "telephony", despite the fact that in the past almost all voice-based communications have been made up of standalone phone calls.
In future, we will see an increasing amount of voice capability become embedded into other applications and services, often using different technology and user interaction models to a traditional "call". The best example of this has been in-game voice chat between players, which bears little resemblance to a phone call, and adds extra features like stereo/3D-audio capability so users can tell if a team-mate is behind them or in front. There has also been a rise in new forms of audioconferencing which are not strictly phone calls, as well as a broader (but still quite slow) move towards application-embedded telephony using various API platforms.
[NEW REPORT ON WEBRTC - CLICK HERE]
This is important to understand, especially for telecom operators wanting to retain relevance in voice communications. Many in the industry use the two terms (voice & telephony) interchangeably, and believe that operators have a cornerstone role in voice communications overall, rather than just telephony. This is even reflected in the misnaming of VoLTE, which should really be called ToLTE (telephony over LTE).
The big question is whether the standalone phone call - and the bulk of the global trillion-dollar telephony market - will be threatened by contextualisation as well as competition.
This is an important distinction:
- Competition for telephony comes from other cheaper/better standalone calling services and applications used for similar purposes as a phone call. This includes most call-oriented VoIP services such as Viber and some (but not all) use of Skype. To the user, the experience is similar - person A calls person B, perhaps with a bit of pre-negotiation via IM/presence about convenient timing. The intent & context is similar, although the mechanics and features might differ a bit. This is direct competition, often based on free/freemium models. In some cases such as SkypeOut, we see hybrids.
- Contextualisation for telephony is more subtle. This is where certain conversations are siphoned off from being standalone calls, into being "in-app" or "in-context". In some cases the ultimate user-interaction is similar enough to a phone call to permit APIs to be used (from a telco or a 3rd party like Voxeo or Twilio) to embed calling capability. In others (such as gaming, or an intercom-type function), a phone call is not a good enough "raw ingredient". A critical element here is whether the voice part remains a "service" (billed, linked to a subscription etc) or whether it is just a feature or function. This is an area where WebRTC will have a major impact, as it allows easier contextualisation of voice into web and app activities.
There are also a couple of other related trends:
- Upgrading phone calls involves adding video or some other feature, but leaving the call as a standalone activity (eg much of Skype video fits here). Sometimes this is grouped together with contextualisation - for example, in-app or in-website video calling. Again, WebRTC will have a major impact here.
- Substituting phone calls is the use of another mechanism to deliver the same intention & outcome instead of a call. For example, sending an SMS or IM to say "I'm running late" rather than calling, or using an app to book a taxi - rather than calling a dispatch office. There is also a broad trend of people disliking phone calls and other forms of voice altogether, and seeking a non-voice alternative.
To date, contextual voice has been relatively restricted. Some niches (games, conferencing) are large and specialised enough to justify well-integrated solutions. In the past, we've had some voice-chat between IM buddies, which is somewhere between competition and substitution, depending on precisely how and when it is used. But we haven't seen much true "social voice", for example - until Facebook's recent forays into adding VoIP in North America.
Messaging is some way ahead of voice in terms of both contextualisation and competition. There's already a ton of competitive standalone-messaging services like WhatsApp and iMessage, which are having an impact on SMS. We also see a huge and growing use of app push-notifications, which are clearly contextual, but one-way only. And, increasingly, we're seeing messaging linked to existing social networks (Facebook, Twitter, LinkedIn) or embedded into customer-service websites (live-agent chat). But while we've seen a lot of A2P SMS, we haven't seen a lot of P2A2P SMS - ie personal SMSs driven from within other apps or websites.
(Sidenote - messaging also has a critical non-voice-like characteristic, in that the same message often gets delivered via multiple paths, such as in-app, plus via SMS or email or push notification. This means a lot of double-counting in the stats too.)
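To make the double-counting point concrete, here is a minimal sketch in Python. It assumes a hypothetical delivery log in which every record carries a shared `message_id` and a `channel`; those field names, the sample data and the `count_messages` helper are invented for illustration and do not come from any real messaging platform.

```python
from collections import defaultdict

# Hypothetical delivery log: one record per delivery attempt, not per message.
deliveries = [
    {"message_id": "m1", "channel": "in-app"},
    {"message_id": "m1", "channel": "push"},
    {"message_id": "m1", "channel": "sms"},
    {"message_id": "m2", "channel": "in-app"},
    {"message_id": "m3", "channel": "email"},
    {"message_id": "m3", "channel": "push"},
]


def count_messages(records):
    """Return (raw delivery count, distinct message count, channels per message)."""
    channels_by_message = defaultdict(set)
    for record in records:
        # Group every delivery path under the message it belongs to.
        channels_by_message[record["message_id"]].add(record["channel"])
    return len(records), len(channels_by_message), dict(channels_by_message)


raw, unique, by_message = count_messages(deliveries)
print(f"{raw} deliveries, {unique} distinct messages")
```

In this toy data the raw delivery count is double the number of distinct messages, which is the kind of inflation to watch for when comparing per-channel messaging statistics.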
This is all happening sooner for text-type communications, because it has been comparatively easy to contextualise messaging for a while. Adding an IM function to an app or website is fairly simple technically for developers, and relatively undemanding of both network and device. It is also much easier to use the application's own identity-space for messaging, or indeed none at all.
That said, there is essentially no need for P2A2P SMS - it would be silly to use (paid-for, subscribed, number-based) SMS between users of a game or social app, for example, although "SMS-out" is more common.
Voice (and especially video) is much harder than messaging to contextualise and add into apps or websites. In particular, voice comes in two main forms: circuit telephony (anchored in a telco) and various classes of VoIP. This explodes into a whole range of complex issues around user interaction, billing model, network quality, codecs, firewall traversal and so forth.
So we have had (broadly) two approaches to voice-contextualisation in the past:
- API-led approaches to CS call control, such as adding a "call me" button on a website, which initiates a standard PSTN call to a phone
- Custom VoIP implementations such as that seen in web-conferencing, gaming, healthcare (eg nurse-call) and so on.
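In general the API-type approaches have not been "symmetrical", eg using an ordinary phone call within an app used by two users. You wouldn't see a multiplayer mobile karaoke app, with embedded phone calls used to transport the singing. They have tended to be used for contact-centre style use-cases, such as customer service (call-me) or automated surveys.
Conversely, the custom VoIP approach has been expensive and confined to areas where developers are communications specialists, using SIP or web plug-ins like Flash.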
WebRTC democratises the creation of contextualised VoIP-style voice - certainly on the web via browsers, and increasingly mobile via 3rd-party SDKs. Disruptive Analysis believes that this will ultimately prove to be more important than new "standalone calling" use-cases for WebRTC, although the ease of creating those may mean they happen first.
WebRTC is likely to be the main enabler of "social voice", rather than some new form of telephony API exploiting carrier VoIP or VoLTE. Those come with the baggage of real phone numbers, whereas the whole point of communicating inside a social network is about using that community's ID and naming/reachability conventions. We already see the first of these with Twelephone.
That said, there may still be an argument for vertical, custom-integrated contextual VoIP as well. Facebook doesn't need to use WebRTC, although its use of web-based HTTP messaging for chat suggests that would make sense. Similarly, Microsoft is connecting its enterprise Lync system to Skype, to enable contextual in-app customer service and other B2C communications functions. That might transition to WebRTC (or CU-RTC-Web) but doesn't need to.
For operators, the real issue is this: in the past, the voice marketplace has almost entirely been made up of standalone telephone services & calls. That has faced competition on both price and features from Skype already.
But the real battle is ahead, as standalone calling starts to get nibbled away by context-based voice, for which a broad range of creation options exist - WebRTC being the most notable and fast-moving. It is far from clear that telephony - CS or telco VoIP - is the right raw ingredient. SMS certainly hasn't been the driver for in-app or in-website messaging, so it seems unreasonable to assert that telephony will have any more success.
Standalone calls are not going to disappear. But it's much less clear that the rump of out-of-context telephony will be worth more than a tiny fraction of today's market.
Fantastic analysis – spot on as usual!
Good technology ...
What about flash or java based webphones such as mizu phone?
(www.mizu-voip.com/Software/WebPhone.aspx)