
Tuesday, April 19, 2016

TelcoFuturism: Will AI & machine-learning kill the need for network QoS?

Following on from my introductory post about TelcoFuturism (link), this is a forward-looking "what if?" scenario. It arises from one impending technology intersection - the crossover between network policy-management, real-time applications (especially voice & video) and machine-learning/artificial intelligence (AI).

One of the biggest clichés in telecoms is that every new network technology allows the creation of special "quality of service" characteristics that potentially enable new, revenue-generating, differentiated services. But while QoS and application-based traffic-engineering is certainly useful in some contexts - for example, managed IPTV on home broadband lines, or prioritisation of specific data on enterprise networks - its applicability to a wider audience remains unproven.

In particular, end-to-end QoS on the public Internet, paid for by application or content providers and enforced by DPI and in-network policy engines, remains a fantasy. Not only does Net Neutrality legislation prohibit it in many cases, but the concept is an undesirable and unworkable fallacy to begin with.

App-specific QoS doesn't work technically on most shared networks (ask my colleague Martin Geddes, who'll enlighten you about the maths of contention-management). There's no way to coordinate it all the way from server to user access. While CDNs and maybe future mobile edge nodes might help a bit, that's only a mid-point, for certain applications. On mobile devices, the user is regularly using one of millions of 3rd-party WiFi access points, over which the app-provider has no control, and usually no knowledge. The billing and assurance systems aren't good enough to charge for QoS and ensure it was delivered as promised. Different apps behave differently on different devices and OSs, and there are no native APIs for developers to request network QoS anyway. And increasing use of end-to-end encryption makes it very hard to separate out the packets for each application without a man-in-the-middle.

There's also another big problem: network quality and performance isn't just about throughput, packet-loss, latency or jitter. It's also about availability - is the network working at all? Or has someone cut a fibre, misconfigured a switch, or just not put radio coverage in the valley or tunnel or basement you're in? If you fall off 4G coverage back to 3G or 2G, no amount of clever policy-management is going to paper over the cracks. What's the point of five-nines reliability, if it only applies 70% of the time?

Another overlooked part of QoS management is security. Can DDoS overload the packet-scheduling so that even the "platinum-class" apps won't get through? Does the QoS/policy infrastructure change or expand the attack surface? Do the compromises needed to match encryption + QoS introduce new vulnerabilities? Put simply, is it worth tolerating occasionally-glitchy applications, in order to reduce the risks of "existential failure" from outages or hacks? 

There are plenty of other "gotchas" about the idea of paid QoS, especially on mobile. I discussed them in a report last year (link) about "non-neutral" business models, where I forecast that this concept would have a very low revenue opportunity.

There's also another awkwardness: app developers generally don't care about network QoS enough to pay for more of it, especially at large-enough premiums to justify telcos' extra cost and pain of more infrastructure and IT (and lawyers).

While devs might want to measure network throughput or latency, the general tendency is to work around the limitations, not pay to fix them. That's partly because the possibility isn't there today, but also because they don't want to negotiate with 1000 carriers around the world with different pricing schemes and tax/regulatory environments (not to mention the 300 million WiFi owners already mentioned). Most would also balk at paying for networks' perceived failings, or possibly to offset rent-seeking or questionable de-prioritisation. Startups probably don't have the money, anyway. 

Moreover - and to the core of this post - in most cases, it's better to use software techniques to "deal with" poor network quality, or avoid it. We already see a whole range of clever "adaptive" techniques employed, ranging from codecs that change their bit-rate and fidelity, through to forward error-correction, or pre-caching of data in advance where possible. A video call might drop back to voice-only, or even messaging as a fallback. Then there's a variety of ways of repairing damage, such as packet-loss concealment for VoIP. In some cases, the QoS-mitigation goes up to the UI layer of the app: "The person you're talking to has a poor connection - would you like to leave a voicemail instead?"
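To make the adaptive-codec idea concrete, here's a minimal sketch of the kind of control loop such an app might run. The bitrate ladder, loss/latency thresholds and function names are all illustrative assumptions, not taken from any real codec or SDK:

```python
# Toy adaptive-bitrate controller: step quality down when the network
# degrades, probe back up when it recovers. All thresholds are illustrative.

BITRATES_KBPS = [8, 16, 24, 48]  # e.g. narrowband voice up to wideband

def choose_bitrate(current_idx: int, packet_loss: float, rtt_ms: float) -> int:
    """Return the new bitrate index given recently measured loss and RTT."""
    if packet_loss > 0.05 or rtt_ms > 400:      # degraded: step down a rung
        return max(current_idx - 1, 0)
    if packet_loss < 0.01 and rtt_ms < 150:     # healthy: probe upward
        return min(current_idx + 1, len(BITRATES_KBPS) - 1)
    return current_idx                          # otherwise hold steady

idx = 3                                         # start at highest quality
for loss, rtt in [(0.00, 90), (0.08, 300), (0.06, 450), (0.00, 100)]:
    idx = choose_bitrate(idx, loss, rtt)
    print(BITRATES_KBPS[idx])                   # 48, 24, 16, then back up to 24
```

Real media engines (Opus, WebRTC's bandwidth estimators) do something far more sophisticated, but the principle is the same: the endpoints adapt, rather than asking the network to change.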

And this is where machine-learning and AI comes in. Because no matter how fast network technology is evolving - NFV & SDN, 5G, "network-slicing" or anything else - the world of software and cognitive intelligence is evolving faster still. 

I think that machine-learning and (eventually) AI will seriously damage the future prospects for monetising network QoS. As Martin points out regularly, you can't "put quality back into the network" once it's lost. But you can put quality, cognitive smarts or mitigation into the computation and app-logic at each end of the connection - and that's what's already occurring, and it's about to accelerate further.

At the moment, most of the software mitigation techniques are static point solutions - codecs built-into the media engines, for instance. But the next generation is more dynamic. An early example is that of enterprise SD-WAN technology, which can combine multiple connections and make decisions about which application data to send down which path. It's mostly being used to combine cheap commodity Internet access connections, to reduce the need to spend much more on expensive managed MPLS WANs. In some cases, it's cheaper and more reliable to buy three independent Internet connections, mark and send the same packets down all of them simultaneously, and just use whichever arrives first at the other end to minimise latency. As I wrote recently (link), SD-WAN allows the creation of "Quasi-QoS".
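The "send down all three, use whichever arrives first" trick can be sketched very simply. This is a simulation with made-up latency figures, not a real SD-WAN implementation - a real box would do this with per-packet sequence numbers on live sockets:

```python
# Toy model of packet duplication across multiple access links: each packet
# is sent down every path, and the receiver keeps the earliest arrival per
# sequence number, discarding the duplicates.

def first_arrivals(packets, path_latencies_ms):
    """packets: list of (seq, send_time_ms); return, per sequence number,
    the earliest arrival time across all paths."""
    arrivals = {}
    for seq, sent_at in packets:
        for latency in path_latencies_ms:
            t = sent_at + latency
            if seq not in arrivals or t < arrivals[seq]:
                arrivals[seq] = t
    return arrivals

# Three independent Internet connections with different one-way delays:
paths = [35.0, 20.0, 60.0]
pkts = [(0, 0.0), (1, 20.0), (2, 40.0)]
print(first_arrivals(pkts, paths))   # the 20ms path wins for every packet
```

The point is that the latency experienced end-to-end is the *minimum* across the links, which is why bonding cheap connections can beat one expensive managed one.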

Furthermore, an additional layer of intelligence and analytics allows the SD-WAN controller (sitting in the cloud) to learn which connections tend to be best, and under which conditions. The software can also learn how to predict warning-signs of problems and what the best fixes are. Potentially it could also signal to the app, to allow preventative measures to be taken - although this will obviously depend on the timescales involved (it won't be able to cope with millisecond transients, for instance).
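The "learning which connections tend to be best" part doesn't require anything exotic - even a simple moving estimate per path gets you a long way. This sketch uses an exponentially-weighted average of observed loss; the smoothing factor, path names and the "score = loss" model are all illustrative assumptions:

```python
# Toy SD-WAN controller: keep an exponentially-weighted moving average of
# each path's recent packet loss, and steer traffic to the best-scoring path.

class PathLearner:
    def __init__(self, paths, alpha=0.3):
        self.alpha = alpha
        self.loss_est = {p: 0.0 for p in paths}   # learned loss estimate

    def observe(self, path, measured_loss):
        # EWMA update: blend the latest sample with the historical estimate
        est = self.loss_est[path]
        self.loss_est[path] = (1 - self.alpha) * est + self.alpha * measured_loss

    def best_path(self):
        return min(self.loss_est, key=self.loss_est.get)

ctl = PathLearner(["mpls", "inet1", "inet2"])
for path, loss in [("inet1", 0.10), ("inet1", 0.12), ("mpls", 0.01), ("inet2", 0.02)]:
    ctl.observe(path, loss)
print(ctl.best_path())   # the path with the lowest learned loss
```

A production controller would condition on time of day, traffic class and jitter as well, but the shape of the logic - observe, update, re-rank - is the same.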

But that is just the start, and is still just putting intelligence into the network, albeit an overlay.

What happens when the applications themselves get smarter? Many are already "network-aware" - they know if they're connected via WiFi or 4G, for example, and adapt their behaviour to optimise for cost, bandwidth or other variables. They may be instrumented to monitor quality and self-adapt, warn the user, or come up with mitigation strategies. They have access to location, motion-sensor and other APIs, that could inform them about which network path to choose.

Even that is still not really "learning" or AI, though. Now consider the next stage - perhaps a VoIP application spots glitches, but rather than an inelegant drop, it subtly adds an extra "um" or "err" in your voice (or just a beep) to buy itself an extra 200ms while the network catches up? Perhaps it is possible to send voice-recognised words and tone to a voice-regenerating engine at the far end, rather than the modulated waveforms of your actual speech?
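The "buy yourself 200ms with a filler" idea is essentially a playout buffer that substitutes something plausible when the next real frame hasn't arrived. Here's a deliberately simplified sketch - one frame per tick, with a text placeholder standing in for generated filler audio:

```python
# Toy playout buffer: play one frame per tick; when the buffer runs dry,
# emit a filler ("um", comfort noise, a beep) instead of glitching, and let
# the delayed real audio catch up afterwards.
from collections import deque

def playout(arrivals, ticks, filler="<um>"):
    """arrivals: {tick: [frames arriving at that tick]}."""
    buf = deque()
    played = []
    for t in range(ticks):
        buf.extend(arrivals.get(t, []))
        played.append(buf.popleft() if buf else filler)
    return played

# Frame f2 is delayed by a network glitch; the gap is papered over:
print(playout({0: ["f1"], 3: ["f2", "f3"]}, 5))
# → ['f1', '<um>', '<um>', 'f2', 'f3']
```

A real implementation would operate on audio samples with time-stretching and packet-loss concealment rather than discrete "filler frames", but the user-facing effect is the same: an awkward pause instead of a dropped call.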

Or look forward another few years, and perhaps imagine that you have a "voice bot" that can take over the conversation on your behalf, within certain conversational or ethical guidelines. Actually, perhaps you could call it an "ambassador" - representing your views and empowered to take action in your absence if necessary. If two people in a trusted relationship can send their ambassadors to each other's phones, the computers can take over if there's a network problem. Your "mini-me" would be an app on your friend's or client's device and create "the illusion of realtime communications".
Obviously it would need training, trust and monitoring, but in some cases it might even generate better results. "Siri, please negotiate my mobile data plan renewal for the best price, using my voice". "Cortana, please ask this person out on a date, less awkwardly than I normally do" (OK, maybe not that one...)

Investment banks already use automated trading systems, so there are already examples of important decisions being made robotically. If the logic and computation can be extended locally to "the other end" - with appropriate security and record-keeping - then the need for strict network QoS might be reduced. 

Machine-learning may also be useful to mitigate risks from network unavailability, or security exploits. If the app knows from past experience that you're about to drive through a coverage blackspot, it can act accordingly in advance. The OS could suggest an alternative app or method for achieving your underlying goal or outcome - whether that is communication or transaction - like a SatNav suggesting a new route when you miss a turn.
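Learning blackspots "from past experience" could be as basic as keeping per-location coverage statistics. A minimal sketch, with made-up grid-cell names and an arbitrary 50% failure threshold:

```python
# Toy blackspot learner: record whether each visit to a coarse location cell
# had coverage, and flag cells that fail often enough to act on in advance
# (pre-cache data, warn the user, switch apps).
from collections import defaultdict

class BlackspotMap:
    def __init__(self, threshold=0.5):
        self.stats = defaultdict(lambda: [0, 0])   # cell -> [failures, visits]
        self.threshold = threshold

    def record(self, cell, had_coverage):
        fails, visits = self.stats[cell]
        self.stats[cell] = [fails + (0 if had_coverage else 1), visits + 1]

    def is_blackspot(self, cell):
        fails, visits = self.stats[cell]
        return visits > 0 and fails / visits >= self.threshold

spots = BlackspotMap()
for _ in range(4):
    spots.record("tunnel_a", had_coverage=False)   # drops every time
spots.record("high_st", had_coverage=True)

# Before the route enters tunnel_a, the app can take preventative action:
print(spots.is_blackspot("tunnel_a"), spots.is_blackspot("high_st"))  # True False
```

The interesting part isn't the statistics - it's that the mitigation happens entirely on the device, with no cooperation from the network at all.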

For some applications, maybe the network is only used as a secondary approach, for error-correction or backup. In essence, it takes the idea of "edge computing" to its ultimate logical extension - the "edge" moves right out to the other user's device or gateway, beyond the network entirely. (This isn't conceptually much different to a website's JavaScript apps running in your browser)

Obviously, this approach isn't going to work ubiquitously. Network QoS will still be needed for transmitting unpredictable real-time data, or dealing with absolutely mission-critical applications. Heavy-lifting will still need to be done in the cloud - whether that's a Google search, or a realtime lookup in a sales database. Lightweight IoT devices can't support heavy local computing while maintaining low power consumption. But clever application design, plus cognitively-aware systems, can reduce the reliance on the access network in many cases. It could be argued that this is just a lower quality threshold, but at a certain point that coincides with what is routinely available from a normal Internet connection, or perhaps two or three bonded or load-balanced together.

But overall, just as we expect to see robots taking over from humans in "automatable jobs", so too will we see computation and AI taking over from networks in dealing with "automatable data". The basis for the network "translocating" data becomes less of an issue, if the same data (or a first-approximation) can be generated locally to begin with.

Tuesday, March 22, 2016

Is SD-WAN a Quasi-QoS overlay for enterprise, independent of telcos & NFV?

In the last two weeks I’ve been at two events: EnterpriseConnect in Orlando (EC16), and NetEvents in Rome. The former is a midsize trade-show, mostly UC/cloud-comms providers and vendors pitching to business users. The latter is smaller: vendors and a few SPs briefing and debating in front of technology journalists and analysts, mostly about enterprise networks, or the carrier networks needed to support them.

An interesting divide is emerging. Both events involved a huge focus on cloud – especially for communications apps and security functions. But it is mostly only the “traditional” carriers and their major vendors which are really discussing “proper” NFV and SDN as a platform for delivering new customer-facing services to businesses. For other enterprise vendors and service providers, NFV is not even on the radar screen as an acronym.

EC16 was dominated by major players in telephony and collaboration – vendors like Cisco, Avaya and Microsoft talking about cloud-based evolutions of their UC and conferencing tools; UCaaS providers like 8x8 and RingCentral with their own hosted platforms, or others based on BroadSoft. WebRTC, contact centres and cPaaS made a good showing as well. A few traditional telcos were there too, such as Verizon which has an LTE-based UC solution, and Sprint talking about its partnership with DialPad (formerly Switch.co). Slack wasn’t there, but other workstream-style messaging and collaboration tools were pretty ubiquitous, usually with a heavy mobile bias.

There was also a decent turnout of comm-centric vendors that make SBCs, UC/telephony servers and related infrastructure elements – Oracle, Dialogic, Metaswitch, Sonus, BroadSoft and peers. But while these were definitely talking about virtualisation, it was mostly not in the guise of NFV as perceived by the telecoms industry. There wasn’t much discussion of MANO and service-chaining, unless I specifically asked about them in meetings. Their use-cases for virtualisation were all much more pragmatic, aimed at non-telco UCaaS providers, or in-house deployments by enterprises in private-cloud or hybrid cloud/on-premise configurations.

The general assumption was that enterprises will continue to buy their collaboration apps/services separately to buying their network connectivity. Even where a UCaaS provider also sells access or SIP trunking, they’re not likely to be tightly coupled. There might be some "dimensioning" to ensure sufficiently-reliable performance, a separate MPLS connection entirely, or some tweaking of prioritisation to the UC provider's cloud. But there was no sense that a customer-facing UC server would be “just another VNF” hosted in the telco’s infrastructure alongside its vIMS and vEPC.

In a nutshell – corporate telephony and collaboration and contact centres are not really seen as “network functions”, any more than SAP or Office or other line-of-business apps are. (It’s worth noting that security functions like VPNs and firewalls are more aligned with NFV, as they are often integral with access connectivity). There's no real "telco cloud" either, except as an equivalent of an ISP-cloud or SaaS-cloud.

Maybe this will change in future as we see more telco "distributed cloud" and "fog computing" architectures emerge - for example, the Mobile Edge Computing initiative. But to be honest, I've got my doubts about that as well - a topic for another post, though.

However another form of software-based infrastructure for enterprise got airtime at both events: SD-WAN (software-defined WAN). I am starting to think that SD-WAN may actually reduce the potential for some proposed NFV business models, because it could put a new layer of abstraction between telco networks and corporate applications and communications.

In essence, SD-WAN allows the creation of “Quasi-QoS” by various methods. Perhaps the most important is the blending together of multiple access connections to a company’s sites – MPLS, vanilla Internet (perhaps x2 or x3), LTE and so on – and then load-balancing, bonding or using them for backup or differential routing of traffic. There are also approaches involving hacking TCP in some way, or proprietary schemes for packet classification and scheduling. Typically, SD-WAN will involve some sort of server or dedicated box at each customer site.
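The differential-routing piece of Quasi-QoS is essentially a per-traffic-class preference list with failover. A minimal sketch, with illustrative link names and traffic classes (not any vendor's actual policy model):

```python
# Toy differential-routing policy: latency-sensitive traffic prefers the
# managed MPLS link, cost-sensitive bulk traffic prefers cheap Internet,
# and either class fails over to whatever is still up.

PREFERENCES = {
    "voice": ["mpls", "inet1", "inet2"],   # mission-critical: managed first
    "bulk":  ["inet1", "inet2", "mpls"],   # cost-sensitive: Internet first
}

def pick_link(traffic_class, link_up):
    """Return the first preferred link that is currently up, else None."""
    for link in PREFERENCES[traffic_class]:
        if link_up.get(link):
            return link
    return None

status = {"mpls": True, "inet1": True, "inet2": True}
print(pick_link("voice", status))          # mpls
status["mpls"] = False                     # fibre cut!
print(pick_link("voice", status))          # fails over to inet1
```

Note that nothing here requires cooperation from the access providers - which is exactly why this overlay sits awkwardly alongside telco NFV ambitions.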

The following illustration, from SD-WAN vendor VeloCloud, is given as an example.

This could put mission-critical applications onto managed connections, with less-demanding traffic onto Internet access. Or it could be “Internet-primary” and use more expensive connections only when congestion seems to be causing problems. It can also link into major IaaS and cloud platforms (Amazon, Google, Microsoft etc) in different locations and with large-scale connections. Many other use-cases and permutations are feasible as well, especially when linked with UCaaS or other SaaS offers.
In other words, SD-WAN could be described as “Arbitrage-as-a-Service” or “Managed Least-Cost-Routing”. Where the SD-WAN is offered by a company which isn’t one of the access providers, it is essentially “OTT-QoS” – although I think that “Quasi-QoS” sounds better.

I see this as a conspicuous threat to various forms of NFV-based enterprise service, especially what gets called NFVaaS or NaaS. By putting an overlay around access connections, SD-WAN reduces telcos' ability to offer extra capabilities from within their own infrastructure.

My colleague Martin Geddes has been scathing about this type of “QoS” (link) – noting that the underlying “network science” doesn’t allow for performance to be accurately predicted or guaranteed, without some very clever maths in the access network boxes. The “failure modes” can be ugly, as sudden sub-second spikes and buffering issues can occur and disappear randomly, trashing sensitive applications before the SD-WAN can respond.

My sense is that while that might be technically true, the real-world problems are more prosaic. Either a fully-dimensioned MPLS connection is too costly, or something fails completely because a fibre is cut, or a network node crashes and reboots. Or, as is the case here, the economics are so compelling that it's cheaper to just buy two redundant connections rather than optimise one.

The bottom line is that SD-WAN is potentially a game-changer - and it potentially undermines the NFV argument, not just for UC services, but perhaps other functions too. While some vendors are working with telcos to offer hybrid solutions, that's because of customer pull. This isn't to say that it invalidates everything proposed by NFV believers - far from it, in fact - but it does act as a counterbalance to the view that virtualisation is all telcos need to dominate enterprise connectivity and communications. 

SD-WAN entrenches the idea that enterprise communications and apps are decoupled from access. It also empowers Internet-based UCaaS providers to offer SLAs and QoS guarantees without owning access connectivity themselves - for example, Vonage works with VeloCloud, and Star2Star has its own connectivity boxes that optimise "vanilla Internet" access.