Speaking Engagements & Private Workshops - Get Dean Bubley to present or chair your event

Need an experienced, provocative & influential telecoms keynote speaker, moderator/chair or workshop facilitator?
To discuss Dean Bubley's appearance at a specific event, contact information AT disruptive-analysis DOT com

Monday, October 30, 2017

Debunking the Network QoS myth

Every few years, the network industry - vendors, operators & industry bodies - suffers a mass delusion: that there is a market for end-to-end network QoS for specific applications. The idea is that end-users, application developers - or ideally both - will pay telcos for prioritised/optimised connections of specific "quality", usually defined in terms of speed, latency & jitter (variability).

I've watched it for at least a dozen years, usually in 3-year waves:
  • We had enterprise networks promising differentiated classes of service on VPNs or the corporate connection to the Internet. Avoid the impact of the marketing department watching cat videos!
  • We had countless failed iterations of the "turbo boost" button for broadband, fixed or mobile.
  • We had the never-realised "two-sided markets" for broadband, featuring APIs that developers would use to pay for "guaranteed QoS".
  • We had numerous cycles of pointless Net Neutrality arguments, talking about "paid prioritisation", a strawman of massive proportions. (Hint: no content or app developer has ever had lobbyists pleading for their right to buy QoS, only telcos asking to be able to sell it. Compare with, say, campaigns for marijuana decriminalisation).
  • We currently have 5G "network slicing" concepts, promising that future MNOs will be able to "sell a slice" to an enterprise, a car manufacturer, a city or whatever.
  • My long-standing colleague & interlocutor Martin Geddes is pitching a concept of app-focused engineering of networks, including stopping "over-delivering" broadband to best-efforts applications, thus forcing them to predict and right-size their demands on the network.
In my view, most of these attempts will fail, especially when applied to last-mile Internet access technologies, and above all to wireless/mobile access. There isn't, and never will be, a broad and open market for "end-to-end network QoS" for Internet applications. Network-aware applications are advancing much faster than application-aware networks. (See this paper I wrote 2 years ago - link).

Where QoS works is where one organisation controls both ends of a connection AND also tightly-defines and controls the applications:
  • A fixed-broadband provider can protect IP telephony & IPTV on home broadband between central office & the home gateway.
  • An enterprise can build a private network & prioritise its most important application(s), plus maybe a connection to a public cloud or UCaaS service.
  • Mobile operators can tune a 4G network to prioritise VoLTE.
  • Telco core and transport networks can apply differential QoS to particular wholesale customers, or to their own various retail requirements (eg enterprise users' data vs. low-end consumers, or cell-site timing signals and backhaul vs. user data). 
  • Industrial process & control systems use a variety of special realtime connection protocols and networks. Vendors of "OT" (operational technology) tend to view IT/telecoms and TCP/IP as quaint. The IT/OT boundary is the real "edge".
Typically these efforts are costly and complex (MNOs frequently described VoLTE as one of their hardest-ever implementation projects), they make it hard to evolve the application rapidly because of network dependencies and testing requirements, and they often have very limited or negative ROI. More importantly, none of them involves prioritising chunks of the public Internet - the telco-utopian "but Netflix will pay" story.
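The mechanics inside a single controlled domain are well understood - essentially class-based scheduling at the congestion points. As a toy sketch (the traffic classes and packet labels below are mine for illustration, not any vendor's API), strict-priority queuing looks like this:

```python
import heapq
from itertools import count

class PriorityScheduler:
    """Strict-priority egress queue: lower class number drains first;
    FIFO order is preserved within a class via a monotonic counter."""
    def __init__(self):
        self._heap = []
        self._seq = count()

    def enqueue(self, packet, traffic_class):
        heapq.heappush(self._heap, (traffic_class, next(self._seq), packet))

    def dequeue(self):
        return heapq.heappop(self._heap)[2] if self._heap else None

sched = PriorityScheduler()
sched.enqueue("web-page chunk", traffic_class=3)     # best-effort data
sched.enqueue("VoLTE frame", traffic_class=1)        # protected voice bearer
sched.enqueue("cat video segment", traffic_class=3)  # best-effort data
first = sched.dequeue()
print(first)  # the voice frame jumps the queue
```

The hard part, as the VoLTE experience shows, isn't the queuing itself - it's mapping real applications onto classes, end to end, and keeping that mapping correct as both evolve.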

There are a number of practical reasons why paid-QoS is a myth. And there's also a growing set of reasons why it won't exist (for the most part) in future either, as new techniques are applied to deal with variable/unpredictable networks.

An incomplete list of reasons why Internet-access QoS isn't a "market" includes:
  • Coverage. Networks aren't - and won't be - completely ubiquitous. Self-driving cars need to be able to work offline, whether in a basement car-park, during a network outage in a hurricane, or in the middle of a forest. The vehicle won't ask the cloud for permission to brake, even if it's got promised millisecond latency. Nobody pays for 99.99% access only 80% of the time.
  • The network can't accurately control or predict wireless effects at micro-scale, i.e. RF absorption or interference. It can minimise the damage (e.g. with MIMO and multiple antennas) or anticipate problems (e.g. a forecast of rain implies degraded mmWave signals).
  • End-user connections to applications generally go via local WiFi or LAN connections, which service providers cannot monitor or control.
  • No application developer wants to cut QoS deals with 800 different global operators, with different pricing & capabilities. (Or worse, 800 million different WiFi owners).
  • 5G, 4G, 3G and zero-G all coexist. There is no blanket coverage. Nobody will pay for slicing or QoS (if it works) on the small islands of 5G surrounded by an ocean of lesser networks.

  • "Applications" are usually mashups of dozens of separate components created by different companies. Ads, 3rd-party APIs, cloud components, JavaScript, chunks of data from CDNs, security layers and so on. Trying to map all of these to separate (but somehow linked) quality agreements is a combinatorial nightmare.
  • Devices and applications have multiple features and functions. A car manufacturer wouldn't want one slice, but ten - engine telemetry, TV for the kids in the back seat, assisted-driving, navigation, security updates, machine-vision uploads and so on all have very different requirements and business models.
  • Lots of IoT stuff is latency-insensitive. For an elevator maintenance company, a latency of a week is fine for spotting that the doors are sticking a bit, so that an engineer's visit can be scheduled a month earlier than planned.
  • I don't know exactly how "serverless computing" works, but I suspect it - and future software/cloud iterations - will take us even further from apps asking the network for permission or quality on the fly.
  • Multiple networks are becoming inevitable, whether they are bonded (eg SD-WANs or Apple's use of Multipath TCP), used in tandem for different functions (4G + SigFox combo chips), meshed in new ways, or linked to some sort of arbitrage function (multi-IMSI MVNOs, or dual-SIM/radio devices). See also my piece on "Quasi-QoS" from last year (link).

  • Wider use of VPNs, proxies and encryption will mean the network can't unilaterally make decisions on Internet QoS, even if the laws allow it.
  • Increasing use of P2P technologies (or D2D devices) which don't involve service providers' control infrastructure at all.
  • Network APIs would probably have to be surfaced to developers via OS/browser functions. Which then means getting Apple, Google, Microsoft et al to act as some sort of "QoS storefront". Good luck with that.
  • No developer will pay for QoS when "normal" service is running fine. And when it isn't, the network has a pricing/delivery challenge when everyone tries to get premium QoS during congestion simultaneously. (I wrote about this in 2009 - link).
  • Scaling the back-end systems for application/network QoS, to perhaps billions of transactions per second, is a non-starter. (Or wishful thinking, if you're a vendor).
  • There's probably some extra horribleness from Europe's GDPR privacy regulations and information-collection consent, since per-user QoS decisions count as data "processing". I'll leave that one to the lawyers, though.
  • It's anyone's guess what new attack-surfaces emerge from a more QoS-ified Internet. I can think of a few.
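Several of the points above - bonded or tandem networks, arbitrage, quasi-QoS - amount to endpoints routing *around* quality problems rather than buying guarantees. A minimal, illustrative sketch of that idea: race connection attempts over multiple paths and keep whichever answers first. (The local listener here merely stands in for a second access network; real multipath stacks are vastly more sophisticated.)

```python
import queue
import socket
import threading

def race_connect(addresses, timeout=2.0):
    """Try several networks/paths in parallel, happy-eyeballs style,
    and keep whichever connects first."""
    results = queue.Queue()

    def attempt(addr):
        try:
            results.put((addr, socket.create_connection(addr, timeout=timeout)))
        except OSError:
            results.put((addr, None))

    for addr in addresses:
        threading.Thread(target=attempt, args=(addr,), daemon=True).start()
    for _ in addresses:
        addr, sock = results.get(timeout=timeout + 1)
        if sock:
            return addr, sock
    raise OSError("all paths failed")

# Demo: one dead path, one live local listener standing in for "the other network"
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
live = listener.getsockname()
dead = ("127.0.0.1", 9)  # discard port, almost certainly closed

winner, sock = race_connect([dead, live])
print(winner == live)  # the working path wins, no network API consulted
sock.close()
listener.close()
```

The endpoint never asks any operator for quality: it measures, races and switches - which is exactly why a per-operator QoS storefront has no customer.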
But the bigger issue here is that application and device developers generally don't know or care how networks work, and (in general) have no willingness to pay. Yes, there's a handful of exceptions - mobile operators wanting timing sync for their femtocells, for example. Safety-critical communications obviously need quality guarantees, but don't use the public Internet. Again, these link back to predictable applications and a willingness to engineer the connection specifically for them.

But the usually-cited examples, such as videoconferencing providers, IoT specialists, car companies, AR/VR firms and so on are not a viable market for Internet QoS. They have other problems to solve, and many approaches to delivering "outcomes" for their users.

A key issue is that "network performance" is not considered separately and independently. Many developers balance network usage against other variables, such as battery life and power consumption. They also think about other constraints - CPU and screen limitations, user behaviour and psychology, the costs of cloud storage/compute, device OS variations and updates, and so on. So, for instance, an app might choose a given video codec based on what it estimates about available network bandwidth, plus what it knows about the user, battery and so on. It's a multi-variable problem, not just "how can the network offer better quality".
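To make that concrete, here is a deliberately simplified, hypothetical sketch of such a decision. The codec names, thresholds and inputs are illustrative, not any real app's logic:

```python
def choose_video_config(est_bandwidth_kbps, battery_pct, cpu_cores, on_wifi):
    """Toy multi-variable decision: the codec/resolution choice weighs the
    app's own network estimate against battery, CPU and data cost."""
    # More efficient codecs save bandwidth but cost more CPU and battery
    use_efficient_codec = battery_pct > 30 and cpu_cores >= 4
    codec = "vp9" if use_efficient_codec else "h264"

    # Resolution from the app's *own* bandwidth estimate, not a network promise
    if est_bandwidth_kbps < 500:
        resolution = "360p"
    elif est_bandwidth_kbps < 1500:
        resolution = "720p"
    else:
        resolution = "1080p"

    # Metered mobile data: cap quality regardless of raw throughput
    if not on_wifi and resolution == "1080p":
        resolution = "720p"
    return codec, resolution

cfg = choose_video_config(2000, battery_pct=15, cpu_cores=8, on_wifi=False)
print(cfg)  # battery-saving h264 at capped 720p, despite ample bandwidth
```

Note that a network QoS API would address only one of the four inputs - and the one the app can already estimate for itself.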




Linked to this is analytics, machine learning and AI. There are huge advances in tuning applications (or connection-managers) to deal with network limitations, whether that relates to performance, cost or battery use. Applications can watch rates of packet throughput and drops from both ends, and make decisions how to limit the impact of congestion. (see also this link to an earlier piece I wrote on AI vs. QoS). 
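The classic endpoint-side version of this is AIMD-style rate adaptation, sketched below with illustrative thresholds and step sizes: the sender backs off sharply when it observes loss, and probes gently upward when the path looks clean.

```python
def adapt_bitrate(current_kbps, loss_rate, floor=100, ceiling=4000):
    """AIMD-style sender adaptation: multiplicative decrease on observed
    loss, additive increase when the path is clean. All constants are
    illustrative, not any product's tuning."""
    if loss_rate > 0.02:  # treat >2% packet loss as congestion
        return max(floor, int(current_kbps * 0.7))
    return min(ceiling, current_kbps + 50)

# Simulate four measurement intervals, with a loss spike in the third
rate = 1000
for observed_loss in [0.0, 0.0, 0.05, 0.0]:
    rate = adapt_bitrate(rate, observed_loss)
print(rate)  # 1000 -> 1050 -> 1100 -> 770 -> 820
```

The endpoints do this continuously, from direct observation - with no network API, no payment and no permission required.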

Self-driving vehicles use onboard image-recognition. Data (real-world sensed data and "training" data) gets uploaded to the cloud, and algorithms downloaded. The collision-avoidance system will recognise a risk locally, in microseconds.



Applications can also focus resources on the most important aspects: I saw a videoconferencing developer last week describe using AI to spot "points of interest" such as a face, and prioritise "face packets" over "background packets" within the app. Selective forwarding units (SFUs) act as video switches which are network-aware, device-aware, cost-aware and "importance-aware" - for example, favouring the main "dominant" speaker.
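An SFU's forwarding decision can be sketched in a few lines. The layer bitrates and the 30% budget for non-dominant speakers below are invented for illustration, not any product's behaviour:

```python
def pick_simulcast_layer(subscriber_downlink_kbps, is_dominant_speaker):
    """Per subscriber, choose which simulcast layer of a sender's video
    to forward. Layer bitrates (kbps) are illustrative."""
    layers = {"low": 150, "mid": 500, "high": 1500}
    budget = subscriber_downlink_kbps
    if not is_dominant_speaker:
        budget *= 0.3  # non-dominant speakers only merit a thumbnail
    best = "low"
    for name, kbps in layers.items():  # dicts preserve insertion order
        if kbps <= budget:
            best = name  # highest layer the budget affords
    return best

print(pick_simulcast_layer(2000, is_dominant_speaker=True))   # full quality
print(pick_simulcast_layer(2000, is_dominant_speaker=False))  # mid layer
```

Again, "quality" here is decided application-side, per recipient, per moment - a granularity no operator-facing QoS contract could express.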
 
Another comms developer (from Facebook, which has 400 million monthly users of voice/video chat) talked about the variables it collects about calls to optimise quality and the user-experience "outcome": network conditions, battery level before & after, duration, device type & CPU patterns, codec choice and much more. I suspect they will also soon be able to work out how happy or annoyed the participants are, based on emotional analysis. I asked what Facebook wanted from network APIs and capabilities - hoping for a QoS reference - and got a blank look. It's not even on the radar.

At another event, GE's Minds and Machines, I heard how the "edge" nodes run a cut-down version of GE's Predix software, which can work without the cloud-based mothership when offline - essential when you consider the node could be on a locomotive in a desert, or on a plane at 35,000ft.



The simple truth is that there is no "end to end QoS" for Internet applications. Nobody controls every step from a user's retina to a server, for generic "permissionless innovation" applications and services. Paid prioritisation is a nonsense concept - the Net Neutrality crowd should stop using that strawman.

Yes, there's a need for better QoS (or delta-Q or performance management or slicing or whatever other term you want to use) in the middle of networks, and for very specific implementations like critical communications for public safety. 

The big unknown is specific, big, mostly mono-directional flows of "content", such as streaming video. There could be an argument for Netflix, YouTube and their peers, given that they already pay CDNs, although that's a flawed analogy on many levels. But I suspect any QoS payments to non-neutral networks would be (more than?) offset by reverse payments from those networks to the video players: if telcos charge Netflix for QoS, it wouldn't surprise me to see Netflix charge telcos for access. It's unclear whether the net result is zero, positive or negative.

But for the wider Public Internet, for consumer mobile apps or enterprise cloud? Guaranteed (or paid) QoS is a myth, and a damaging one. Yes, better-quality, better-managed networks are desirable. Yes, internal core-network use of better performance-management, slicing and other techniques will be important for telcos. Private wireless or fixed broadband networks, where the owner controls the apps and devices, might be an opportunity too.

But the concept of general, per-app QoS-based Internet access remains a dud. Both network innovation and AI are taking it ever-further from reality. Some developers may work to mark certain packets to assist routing - but they won't be paying SPs for an abstract notion of "quality". The notion of an "application outcome" is itself a wide and moving target, which the network industry only sees through a very narrow lens.
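Packet marking of that kind already exists: an application can set the DSCP bits on its own sockets as a free, advisory hint to routers' queues - precisely not a paid, guaranteed end-to-end contract. A minimal Linux-flavoured sketch, using EF, the class conventionally used for voice:

```python
import socket

EF = 46  # Expedited Forwarding: the DSCP class conventionally used for voice

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# DSCP occupies the top 6 bits of the IP TOS byte; ECN uses the bottom 2
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, EF << 2)
dscp = sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS) >> 2
print(dscp)  # 46
sock.close()
```

Whether any router along the path honours the marking - and most on the public Internet strip or ignore it - is entirely outside the application's control, which rather makes the point.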

1 comment:

Andrew von Nagy said...

Hi Dean, good article as usual. I have been looking at Martin Geddes' work, along with Predictable Network Solutions, very carefully, because I think it has immediate impact and utility in "characterizing" a network path's quality over time. As a long-time IT professional who specializes in networking and Wi-Fi, I have seen how horrible network monitoring, measurement, and insights are to achieve. We have been stuck looking at interface counters, bandwidth, utilization, and related buffer and packet-loss statistics on individual nodes in isolation. There is a real issue with being able to understand how a network performs in aggregate, end to end, and then to quickly diagnose issues to understand where/why/when a problem occurred. Oftentimes with modern networks everything appears to be running fine, but a user-experience issue manifests, sporadically, and is difficult to track down.

That's what I like about their approach with delta-Q... give me an operational tool to characterize and baseline a network's (or network path's) quality. Then at least I know what to expect. More importantly, we can then start to bridge the application-development and IT-networking worlds with concrete information about what applications should be able to expect from the network. If I have concrete data about the delay and loss characteristics of a network path or traffic path, then I can have an informed discussion with an application developer who is writing a new app, or who is troubleshooting a user-experience issue with an existing application already deployed. We have such metrics for other IT systems such as databases and storage. It's well past time we are able to characterize network performance, end to end, in an operational fashion.

Will we try to engineer a level of quality... absolutely. But there will be limits of what is realistic based on effort and value - the old 80/20 rule.

Will we try to sell a level of quality... as you stated, probably not. It's too complex and fragmented, with many modern applications traversing multiple network operators end to end. It could be achievable for specific use-cases in private networks, and that holds tremendous value for businesses. But I doubt it will develop into a marketplace in any fashion, as you have stated.

Thanks for your analysis.

Cheers,
Andrew von Nagy
@revolutionwifi