Future of Voice: Taking Voice beyond Ordinary Telephony

Masterclasses by Dean Bubley & Martin Geddes

Small-group collaborative workshops.
Next events: US East Coast, Spring 2012

London & Private events - please inquire

Wednesday, October 05, 2011

My suspicions on iPhone delays - chipset integration complexity?

EDIT 1am 6/10/11: RIP Steve Jobs

 So.

Now we know that we're going to get an iPhone 4S, and not an iPhone 5 for now. I'm quite glad that I haven't been speculating about what might or might not be in an iPhone 5, as there are plenty of analysts and other commentators with egg on their faces today.

And so now there are lots of people who are disappointed in the outcome:

"Is that all? It's the same device with a better camera and that voice recognition thingy. Where's our big screen, our NFC, our LTE - what happened to all the leaked designs and mockups of cases for a new device? Why did it take them so long to do this, surely this isn't 16 months work, these changes are minor!"

I suspect that the answer lies under the hood. What Apple has actually been doing is working on a new hardware platform which will probably endure for several generations of its devices. That has likely taken a *huge* amount of work: chipset and hardware level integration is massively complex and needs lots of fine-tuning. It's quite possible to have to start again several times if the outcomes aren't perfect - something that Apple has the unique luxury of doing as it's not really that pressured cashflow-wise to get something out, even if it's compromised. It's worth remembering that the original iPhone only came out when Steve Jobs thought it was up to standard, and I'd imagine that Tim Cook will take the same stance on the '5. Then think of all the other promising devices over the years that have had issues like awful battery life, or which crash all the time - or even Apple putting out the iPhone 4 before getting the antenna properly tested and sorted.

It's worth looking at what's in an iPhone. Most critically, the baseband chipset which is the "modem" which connects to the cellular network. (There's a separate story with the apps processor as well, see below).

In most of the iPhones to date, the 2G/3G baseband has come from Infineon (now owned by Intel), with three generations of silicon. The CDMA-based version of the iPhone 4 was the first to have an alternative chip, from Qualcomm. In theory, this chip could have been used to give the "world-phone" UMTS/CDMA ability touted for the 4S - but it was never actually switched on and used in this way.

While the baseband for the 4S hasn't been announced, I strongly suspect it's also a Qualcomm product, as they pretty much monopolise everything CDMA-related, and so dual-mode GSM/CDMA devices are pretty much a done-deal for the Big Q.

Most critically, not only does Infineon/Intel not support CDMA (critical to keep Verizon and others on-board), it does not currently have an LTE chipset, although it is working on one for mid-2012 volume shipments. Crucially though, that still doesn't support CDMA, which will likely still be needed for a few years as Verizon and others will not have full LTE coverage and enough capacity until 2013-2014 at the earliest and possibly much later. And of course, regular readers will know that I don't expect VoLTE to be fully ready for prime-time for a long while either, so VZW iPhones will need at least CDMA 1x support for the foreseeable future.

And then there's the processor. The iPad has already been using the A5 chip, so it is natural that the company would like to migrate it down into the phone. The A4 was used in both versions of the iPhone 4, with previous phones thought to be using a Samsung chip. But a tablet has a big form-factor and battery, so there doesn't need to be such tight integration between the processor and modem - the baseband can be done as a "module" - which in fact works well, because the iPad has to come in WiFi-only versions anyway. But iPhones all need baseband and processor, fitted into much tighter constraints of space and battery size.

So Apple historically has had:

  • A4 + Infineon (iPhone 4 GSM)
  • A4 + Qualcomm (iPhone 4 CDMA)
  • A4 or A5 + optional Infineon (iPad & iPad 2)
  • Samsung + Infineon (iPhone 3GS)

But Apple won't want to support multiple hardware platforms unnecessarily if it can avoid it, as it wants to keep up its margins in the face of competition and will want scale benefits and performance optimisation. So there are probably four important hardware integration and evolution exercises that have been occurring at Apple:

  • Move the GSM/UMTS platform from Infineon to Qualcomm, with lots of integration with the radio and other bits of the hardware
  • Add UMTS capability to the CDMA versions of the device - lots of integration, again. [Note: it's more important to add UMTS to CDMA than vice-versa for outbound roaming, although I'm sure Vodafone likes the idea of being able to roam onto Verizon]
  • Shift the iPhone platform from A4 to A5 processor & future descendants AND integrate this with the new (presumably Qualcomm) baseband
(There's also a vague chance that it's migrating away from the Samsung apps processor in newer versions of the 3GS, I guess - especially given the current IPR war between the two companies)


But more importantly I'm expecting that it is also....
  • (I'm really hypothesising here) Developing a single LTE platform consisting of A5 (or A6 etc) processor and a CDMA/UMTS/LTE baseband, usable in iPhone, iPad and iImagineSomethingElse. This is probably a *huge* development project which probably faces a ton of horribleness in everything from power consumption to radio performance. I'd guess there's perhaps a fallback plan of going to separate UMTS/LTE and CDMA/LTE platforms if it faces insuperable problems, especially given the range of frequencies that will need to be supported. I suspect Apple would rather have two or three variants of the same core platform, rather than totally divergent solutions.
I'd guess that just doing the first three has involved a Herculean effort, which we now see the results of in the 4S. If that means that Apple has to disappoint some of its more ardent fans clamouring for new stuff... well, I reckon they realised that doing everything in one go was impossible, and decided to take the pain now.

It's also possible that Apple sticks with another iteration of the current 4S platform for another year, before adding in a "perfect" LTE option in 2013. My prediction from June 2010 was that Apple support for LTE was most likely in 2012 or 2013 (I'm glad I dodged the bullet on the 5% 2011 chance). An October 2012 launch would make sense - and would also fit in with future timelines of both Qualcomm and Intel (and possibly others like nVidia).

Edit: there's also a chance Apple will do something truly disruptive with its LTE implementation, and move to a full dual-radio SVLTE approach, keeping telephony on circuit-based radio connections rather than relying on VoIP on LTE or the uber-clunky, worse-than-useless circuit-switched fallback option. Interestingly, Huawei has now started floating dual-radio as a possible option (ZTE has for some time). Hat-tip to Zahid for spotting this, I'll follow up another time in depth - and it's also covered in the Future of Voice workshops' section on LTE.

To sum up - in my view, the iPhone 4S is all about the hardware platform shift. Stuff like Siri is window-dressing in comparison, to give the fans at least something visible. LTE support was completely unrealistic (as I've said before) given the other more important and urgent changes going on with the platform. It's also another reason why fripperies like NFC have been kicked further down the road - especially as I imagine Apple knows very well that it's being overhyped.

This might be disappointing for some, and could possibly give Microsoft and Nokia an opportunity to profit from a temporary lull in external iPhone evolution, but it's likely set the scene for continued growth and profitability from Apple's mobile devices for a few more years.

Tuesday, October 04, 2011

3GPP gets serious about the Future of Voice beyond telephony?


Some of the themes that Martin Geddes and I cover in our Future of Voice workshops (sign up now for Oct 27th!) are that:
  • Voice is much more than the basic 100-year old product called "telephony", or even the 30-year old product called "telephony while walking about"
  • There is a need for more cleverness in acoustics, especially for Mobile VoIP & VoLTE
  • In the network, there is a need for more "cure" as well as "prevention" when it comes to QoS and QoE, e.g. packet-loss concealment for when packets do (inevitably) get dropped
I've also been quite vocal for some time that bodies like 3GPP and GSMA have largely ignored these issues for now. The very term VoLTE (voice on LTE) tells you all you need to know - it's basically just Telephony 1.1 intended for LTE, not a generalised Voice platform technology. It's ToLTE, not VoLTE. There is also not much in the specs about acoustic requirements on devices, nor about ways to manage the user experience when problems occur.
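
To make the "cure" idea concrete, here is a minimal sketch of the crudest form of packet-loss concealment: when a frame goes missing, replay the last good frame at a progressively lower volume rather than dropping to dead silence. Everything in it is an assumption for illustration (the function name `conceal_losses`, frames as lists of integer samples, the `decay` factor); real codecs use far more sophisticated concealment than this.

```python
from typing import List, Optional, Sequence


def conceal_losses(
    frames: Sequence[Optional[List[int]]],
    frame_len: int,
    decay: float = 0.5,
) -> List[List[int]]:
    """Fill lost frames (None) with a fading copy of the last good frame.

    Hypothetical helper: frame_len is only used to emit silence when a
    loss occurs before any good frame has arrived.
    """
    output: List[List[int]] = []
    last_good: Optional[List[int]] = None
    gain = 1.0
    for frame in frames:
        if frame is not None:
            # Good frame: pass it through and reset the fade.
            last_good, gain = frame, 1.0
            output.append(list(frame))
        else:
            # Lost frame: fade further with each consecutive loss.
            gain *= decay
            if last_good is None:
                output.append([0] * frame_len)
            else:
                output.append([int(s * gain) for s in last_good])
    return output


if __name__ == "__main__":
    received = [[100, -100], None, None, [40, 40]]
    print(conceal_losses(received, frame_len=2))
```

The fade matters: repeating a frame at full volume for a long burst of loss sounds robotic, so the sketch lets consecutive losses decay towards silence.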

Certainly, I've not seen anything from these types of organisations asking more fundamental questions like "What is voice communications anyway?", or "Why exactly do people make phone calls, and how is that changing?". Asking these questions and digesting the answers would drive a better understanding of exactly what mobile operators should be doing with voice services to protect against inevitable revenue erosion in the coming years.

These types of things are not unimportant. There are technical, commercial and human-behavioural reasons to consider - and it is too late now to start thinking about merely replicating "old telephony" on LTE (or more generally), when others are defining the new user experiences and sources of value in voice communications. The right time to have started work on "vanilla mobile VoIP telephony" was four years ago - the world has moved on since then rather a lot, and fixing yesterday's problems (badly) is an exceptionally risky way to deal with a critical market transition.

For example, an increasing proportion of users now feel that unsolicited phone calls - even in business - are rude. It's a source of irritation - why is this person phoning me when I'm busy?! They resent being made to interrupt their day - or their more important data applications - for intrusive and unwanted calls that force them to divert 100% of their cognitive load (and device activities) to something that is perhaps of no value. The notion that "voice always comes first" harks back 100 years, to the days when voice services were so expensive that all calls were de-facto important, and you dropped everything when the phone rang, with a live operator connecting the call. Those days are gone - this is why the escalation method via IM or SMS ("OK for a call now?") is becoming more prevalent.

And then there's a whole plethora of non-telephony voice apps - I've seen a Skype slide with a VoIP-based baby monitor, for example. That's not a phone call (it's built into dedicated hardware for a start), and neither is in-game voice chat or a hundred other innovations with mashups and interesting hybrid forms of voice. 

Think of it in human speech terms - only a certain proportion of vocal activity is in the form of two people having a protracted, two-way conversation. There is singing, arguing, presenting, mumbling, rapping, acting, announcing, mimicking, being part of multiple conversations, interrupting and numerous other "use cases" of speech besides a conversation session. The same is true of electronically-communicated speech.

Browsing through the new work items for 3GPP R11, there are some interesting items, notably "Extensions of Acoustic Test Specifications", and "Codec for Enhanced Voice Services".

I'm not going to do a full analysis of these, but here's a couple of choice quotes from those:

  • “Enhanced quality for mixed content and music in conversational applications (for example, in-call music), leading to improved user experience for cases when selection of dedicated 3GPP audio codecs is not possible”
  • “Robustness to packet loss and delay jitter, leading to optimized behaviour in IP application environments like MTSI within the EPS.”
  • “Acoustics and speech processing in terminals have a strong impact on the perceived quality of voice services. The audio test specifications in TS 26.131 and TS 26.132 do not completely reflect all aspects influencing user experience”
The good news is that (a) the 3GPP recognises that what we have today isn't the last word in voice communications, and (b) that all of this is perhaps only 2 years late, rather than 3-4 years as we've seen in the past (maybe someone's been reading this blog or attending some of my conference speeches). 

But the bad news is that the context in which all this is taking place is itself evolving. The goalposts are moving very quickly, with problems in multiple parts of the ecosystem - the core, the radio, the device, interconnection points, BSS/OSS and so on. 

Not only that, but the very notion of "quality" is itself shifting quickly from the narrow network observable of QoS, to a much richer notion of QoE. If mobile operators are to retain or grow their hundreds of billions of dollars of telephony revenue in the IP era, they will need to demonstrate value at the experience level, which isn't necessarily measurable in terms of 3-9's or 5-9's of availability.

Technically, there will be much more variable loss and delay to deal with because of faster networks and high-definition audio, with complex interactions between flows at every layer. It's not the absolute loss and jitter that is going to be the problem, but their variability over time.
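
A rough sketch of why an average can hide the problem: the snippet below compares a steady flow and a bursty flow with the same mean delay variation. The smoothing in `smoothed_jitter` follows the 1/16 gain used for the interarrival jitter estimate in RFC 3550; the helper names, the window size and the sample numbers are all my own assumptions for illustration.

```python
import statistics
from typing import List, Sequence


def transit_deltas(send_ms: Sequence[float], recv_ms: Sequence[float]) -> List[float]:
    """Absolute change in one-way transit time between consecutive packets."""
    transit = [r - s for s, r in zip(send_ms, recv_ms)]
    return [abs(b - a) for a, b in zip(transit, transit[1:])]


def smoothed_jitter(deltas: Sequence[float]) -> float:
    """Running jitter estimate with a 1/16 gain, as in RFC 3550."""
    jitter = 0.0
    for d in deltas:
        jitter += (d - jitter) / 16.0
    return jitter


def windowed_spread(deltas: Sequence[float], window: int) -> float:
    """Standard deviation of the mean delay variation across windows.

    Hypothetical metric: a high value means jitter comes in bursts.
    """
    means = [
        statistics.mean(deltas[i:i + window])
        for i in range(0, len(deltas) - window + 1, window)
    ]
    return statistics.pstdev(means) if len(means) > 1 else 0.0


if __name__ == "__main__":
    steady = [5, 5, 5, 5, 5, 5, 5, 5]      # constant 5 ms variation
    bursty = [0, 0, 0, 0, 10, 10, 10, 10]  # same mean, arrives in a burst
    for name, deltas in (("steady", steady), ("bursty", bursty)):
        print(name, statistics.mean(deltas), windowed_spread(deltas, window=4))
```

Both flows average 5 ms, but only the bursty one shows any spread between windows, which is the sort of behaviour a fixed-size jitter buffer copes with badly.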

We are also seeing the extension of voice to new acoustic environments, such as "ambient voice" in the home, devices with different form factors (e.g. smartphones with speakers on the back vs front), in-vehicle systems, "social voice" based around integrating the TV experience with voice communications and so forth. Consider what happens when you hit the 'speakerphone' button in some of these contexts - should this be a receiving device problem, or should there be a signal to the sending device to also do microphone audio processing differently?

The example 3GPP cites of hold music is a good one. Things like pre-recorded announcements might also be better sent using TCP and then played, rather than a single non-reliable audio stream. So future audio processing of, say, an interaction with a call centre might use a number of different audio processing strategies at different times in the interaction, depending whether you are talking to the IVR, person, voice authentication system, hold music, etc. [Credit to Martin Geddes for this point]
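
As a thought experiment, the call-centre example could be modelled as a lookup from "what is on the line right now" to a handling strategy. This is purely a sketch: the segment names, the `AudioStrategy` fields and every value in the table are hypothetical choices of mine, not anything taken from the 3GPP work items.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class AudioStrategy:
    transport: str       # how the audio is delivered (hypothetical labels)
    codec_profile: str   # what the codec is tuned for
    conceal_loss: bool   # whether to patch over dropped packets


# Assumed mapping, for illustration only.
STRATEGIES: Dict[str, AudioStrategy] = {
    "live_agent": AudioStrategy("realtime-unreliable", "speech", True),
    "ivr_prompt": AudioStrategy("reliable-prefetch", "speech", False),
    "announcement": AudioStrategy("reliable-prefetch", "speech", False),
    "hold_music": AudioStrategy("realtime-unreliable", "mixed-content", True),
    "voice_auth": AudioStrategy("realtime-unreliable", "speech-minimal-processing", False),
}


def strategy_for(segment: str) -> AudioStrategy:
    """Pick a strategy, falling back to ordinary conversational handling."""
    return STRATEGIES.get(segment, STRATEGIES["live_agent"])


def plan_call(segments: Sequence[str]) -> List[Tuple[str, AudioStrategy]]:
    """Pair each stage of an interaction with its handling strategy."""
    return [(segment, strategy_for(segment)) for segment in segments]


if __name__ == "__main__":
    call = ["ivr_prompt", "voice_auth", "hold_music", "live_agent"]
    for segment, strategy in plan_call(call):
        print(f"{segment:12s} -> {strategy}")
```

The point of the table is simply that a single call can switch strategy several times; pre-recorded material tolerates a reliable, fetch-then-play path, while live speech does not.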

The bottom line is that voice - and future telephony - is not a "solved" problem, especially with LTE.

It's good that 3GPP is aware of some of the issues and is addressing them, but Martin and I are unconvinced that the scale of the problem has yet been fully grasped, nor how compressed the timelines are before third parties solve them (better) first. We'd like to see much more open and clear discussion of future voice use-cases, and a clear roadmap to go from Telephony to The Future of Voice.

And with that - a final plug for the next Future of Voice workshop on October 27th in central London, where we cover all of these themes and many more. It will be an excellent introduction to all these areas, with plenty of scope for networking and collaborative discussion and problem-solving. Details are here.

Use the promo code VIP25 for a discount on the pricing if booked online, or email information AT disruptive-analysis DOT com for more information or invoice-based payment options.
 