In the last few weeks I've been doing a lot of work on voice communications (and messaging / video / context):
- I attended Enterprise Connect in Orlando, discussing collaboration, UCaaS, cPaaS, WebRTC and related themes
- I spoke at a private workshop for a Tier-1 operator group's internal team of communications-service experts
- I've helped a client devise a strategy around the new European eCall in-vehicle emergency-call standard
- I've been writing a report on VoLTE adoption and impact, for my Future of the Network research stream published by STL Partners / Telco 2.0 (Subscribe! Link here)
A common, overarching theme is starting to form for me. The future sources of value in voice are all about SPs / vendors asking the right questions when they design new services and solutions.
Historically, most value in voice communications has come from telephony (Sidenote: voice is 1000 applications/functions. Phone calls are merely one of these). And in particular, the revenue has stemmed from answering the following:
- Who is calling?
- Where are they?
- Who is being called?
- Where are they?
- How long did they speak for?
- Plus (sometimes):
  - When did they call?
  - What networks were they on?
  - Was the call high-quality? (drops, glitches etc)
  - Is it an emergency?
Clearly, the answers to these questions are worth a lot of money: many billions of dollars. But equally clearly, they don't seem to be enough to protect the industry from competition and substitution from other voice-comms providers, or alternative ways of conducting conversations and transactions. As a result, voice telephony services are (mostly) being bundled as flat-rate offers into data-led bundles for consumers, or perhaps per-month/per-seat fees for unified comms (or SIP trunks) for business.
In other words, current voice revenues are being delivered based on answering fewer questions than in the past. Unsurprisingly, this is not helping to defend the voice business.
The current "mainstream" telecoms industry seems to be focused only on adding a few more questions to the voice roster:
- Is it VoIP / VoLTE / VoWiFi? (Answer = sometimes, but "so what" for the customer?)
- Can we use it to drag through RCS? (Answer = No)
- How can we reduce the costs of implementation? (Answer = maybe NFV/cloud)
- Are there special versions for emergencies? (Answer = yes, eg MCPTT and eCall)
- Is there a role for CSPs in business UCaaS? (Answer = yes, but it's hard to differentiate against Microsoft, Cisco, RingCentral, Vonage and 100 others)
- What do we do about Amazon Echo? (Answer = "Errrrmmmm... chatbots?")
Fixed and cable operators are in a slightly better position - they have long had hybrid business models partnering with PBX/UC vendors for businesses and can monetise various solutions, especially where they bundle with enterprise connectivity. For fixed home telephony, most operators have long viewed basic calls as a commodity, and are either protected by regulators via line-rental and emergency-call requirements, or can outsource provision to third parties.
In my view, there are many other questions that can be asked and answered - and that is where the value lies for the future of voice communications. None are easy to achieve, but then they wouldn't be valuable if they were:
- Why is the call occurring? (To buy something, ask a question, catch up with a friend, arrange a meeting or 100 other underlying purposes)
- Where is the call being made and received (physically)? For instance indoors, in a noisy bar, on a beach with crashing waves, in a car, in a location with eavesdroppers?
- Is the communication embedded in an app, website or business process?
- Is the call part of an ongoing (multi-occasion) conversation or relationship?
- Is a "call" the right format, with interruptive ringing and no pre-announcement? Is a push-to-talk, one-way, "whisper mode", broadcast, team or other form more appropriate?
- Are both/all parties human, or is a machine involved as well?
- What device(s) are being used? (eg headset, car, wearable, TV, Echo, whiteboard?)
- Who gets to record the call, and own/delete/transcribe the recording?
- Are the call records secure, and can they be tampered with?
- What's the most effective style of the call? (Business-like, genial, brusque, get-to-the-point-quickly etc)
- What languages and accents are being spoken? Can these be adjusted for better understanding? What about background noise - is that helpful or hindering?
- Can the call add/drop other parties? Are these pre-arranged, or can they be suggested by the system in context?
- Are the participants displaying emotion? (Happiness, anger, eagerness, impatience, boredom etc). How can this be measured, and if necessary, managed?
- Is there a role for ultrasound and/or data-over-sound signalling before or during the call?
- How can the call be better scheduled / postponed / rescheduled?
- Is a normal phone number the best "identifier"? What about a different number, or a social / enterprise / gaming / secure identity?
- Are there multiple networks involved/available for connection, or just one? What happens when there are multiple choices of access or transit providers? What happens where the last 10m is over WiFi or Bluetooth beyond the SP's visibility?
- Is encryption needed? Whose?
- What solutions are needed to meet the needs of specific vertical-markets or other user groups? (Banking, healthcare, hospitality, gaming etc)
- What are the desired/undesired psychological effects of the communications event? How can the user interface and experience be improved?
- Did the call meet the underlying objectives of all parties? How could a similar call be improved the next time?
- How do we track, monetise and bill any of this?
I'm seeing various answers to some of these questions - for example, contact-centre solutions seem to be most advanced on some of the emotional analysis, language-detection and other aspects. There are some interesting human-driven psychology considerations being built into new codec designs like EVS (eg uncomfortable silences between words). MVNOs and cPaaS players are doing cool things to "program" telephony for different applications and devices. The notion of "hypervoice" was a good start, but hasn't had the traction it deserved (link). Machine-learning is being applied to help answer some of these questions - most obviously with Alexa/Siri/Assistant voice products, but also behind the scenes in some UC and contact-centre applications.
But we still lack any consistent recognition that voice is "more than calls". 99% of effort still seems to go on "person A calls person B for X minutes". Very little is being done around intention and purpose - ask a CSP "Why do people make phone calls?" and most can't give a list of the top-10 uses for a "minute". Most people still use "voice" and "telephony" synonymously - a sure-fire indicator they don't understand the depth of possibility here. And we still get hung up on replacing voice with video (they have a Venn overlap, but most uses are still voice-centric or video-centric).
Until both the telco and traditional enterprise solutions marketplaces expand their views of voice (and entrench that vision among employees, vendors and partners), we should continue to expect Internet- and IoT-based innovators to accelerate past the humble, 140yr-old phone call. Start asking the right questions, and look for ways to provide answers.