A Deeply Reinforced Internet Control Plane

If there were ever a problem suited to reinforcement learning, it is network optimization: many combinations, changing conditions, the need for continuous learning, and optimization through trial and error, over time, to achieve long-term quality and profitability objectives. Today’s leading edge of machine learning (ML) is deep reinforcement learning, the combination of reinforcement learning and deep learning.

Not only is network optimization a continuous learning problem as traffic flows change, but so is defense against cyberattacks: attacks on networking infrastructure, and attacks on whatever is attached to that infrastructure. The worst-kept intelligence secret is that the world is in a permanent state of non-peace. There may not be bombs dropping everywhere, but the major technology powers are cyberattacking each other, at will, and to date, without significant consequence. The major powers are not in a state of “war”, but their digital attacks on economic, military, and political targets mean they are also not in a state of peace. They are in a state of non-peace, neither peace nor war. Lest we be lulled into thinking AI/ML will just be used as technology for network optimization, it will also be used in the service of cyberattacks, constantly learning from responses to cyberattacks and looking for new attack angles and vulnerabilities.

Future Internet control plane leadership will include not only the ability to optimize resources, deliver application-specific performance objectives, and ease the management of complexity, but also the ability to secure against both malevolent and non-malevolent vulnerabilities, importantly including state-sponsored cyberattacks.

AI/ML has made significant progress in the last few years due to massive compute/storage scalability, data out the yin-yang for those that have it, cloud services, open source software, and silicon acceleration: not just graphics processing units or chips with a few AI cores, but now wafer-scale chips with 400,000 AI cores. There is more dramatic change coming as researchers and practitioners take on AI/ML’s hardest problems.

Unsupervised learning draws inference from data with no labels or descriptions. Supervised learning, where many of the best results have come from, involves learning from labeled/described data. Historically, labeling has been a manual effort. Just as the conversation about digital sweatshops, armies of low-paid data labelers, is emerging, attention has turned to other approaches. Weakly supervised learning tries to leverage labels that are already in the data; for example, social media image processing can be improved by using associated hashtags, even if they are noisy. Semi-supervised learning uses a teacher set of labeled data that is then used to drive learning on a much larger dataset, ultimately refined against the original labeled set, providing similar accuracy to other approaches but with much greater efficiency / fewer parameters. There is also self-supervised learning, where the structure is inherent within the dataset; examples include one pattern overlaid on another and language classification tasks.
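To make the semi-supervised, teacher/student idea a bit more concrete, here is a minimal pseudo-labeling sketch in Python. The model choice, confidence threshold, and data splits are my own illustrative assumptions, not a recipe from any particular paper.

```python
# Minimal semi-supervised "teacher/student" sketch using pseudo-labeling.
# Assumes a small labeled set (X_lab, y_lab) and a large unlabeled set X_unlab.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pseudo_label_train(X_lab, y_lab, X_unlab, confidence=0.9):
    # 1. Train a "teacher" on the small labeled set.
    teacher = RandomForestClassifier(n_estimators=100, random_state=0)
    teacher.fit(X_lab, y_lab)

    # 2. Label the large unlabeled set, keeping only confident predictions.
    probs = teacher.predict_proba(X_unlab)
    keep = probs.max(axis=1) >= confidence
    pseudo_y = teacher.classes_[probs.argmax(axis=1)][keep]

    # 3. Train a "student" on labeled + pseudo-labeled data, refined by the
    #    original labeled set being included in the mix.
    X_all = np.vstack([X_lab, X_unlab[keep]])
    y_all = np.concatenate([y_lab, pseudo_y])
    student = RandomForestClassifier(n_estimators=100, random_state=0)
    student.fit(X_all, y_all)
    return student
```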

In Internet/networking/security use cases, perhaps one takeaway for those creating protocols and especially telemetry data is to make sure that the data is created with labels and structure that can be easily adapted to a range of supervised learning techniques. Though reinforcement learning is a slightly different animal, deep reinforcement learning is the combination of deep learning and reinforcement learning, so we want to have both in a deeply reinforced Internet control plane.
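As a purely hypothetical illustration of telemetry that is “born labeled”, a record might carry its labels and structure explicitly so it drops straight into supervised learning pipelines. The field names below are my own assumptions, not any standard.

```python
# Hypothetical structured telemetry record; field names are illustrative,
# not drawn from any standard.
from dataclasses import dataclass, asdict
import json, time

@dataclass
class FlowTelemetry:
    timestamp: float          # when the sample was taken
    src_prefix: str           # aggregated source, not a raw address
    dst_prefix: str
    app_class: str            # label: "voice", "video", "bulk", "unknown"
    bytes_tx: int
    latency_ms: float
    packet_loss_pct: float
    sla_violated: bool        # label usable directly as a training target

record = FlowTelemetry(time.time(), "10.0.0.0/24", "198.51.100.0/24",
                       "video", 1_250_000, 37.5, 0.2, False)
print(json.dumps(asdict(record)))
```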

Reinforcement learning is targeted at problems where you want to optimize decision making over a long sequence of actions, often in a time series. Reinforcement learning has been used in chess, Go, and similar games. In an Internet/networking context, what we are going to want to do is get the best resource optimization over a time series, and also play a game of “chess” with those launching cyberattacks at the infrastructure. I have always simplified security as ensuring that the right things are using the right resources, in the right ways, and, for profit-making companies, in the most profitable ways. As we see policy-based SD-WAN/SD-BRANCH and cyberattack security come together, this has never been more true.
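For those who have not seen it spelled out, the core reinforcement learning loop is small. Here is a toy Q-learning sketch framed as path selection; the states, actions, dynamics, and parameters are all illustrative assumptions.

```python
# Toy Q-learning loop framed as choosing between two network paths.
# States, actions, dynamics, and hyperparameters are made up for illustration.
import random
from collections import defaultdict

ACTIONS = ["path_a", "path_b"]           # hypothetical candidate paths
Q = defaultdict(float)                   # Q[(state, action)] value estimates
alpha, gamma, epsilon = 0.1, 0.95, 0.1   # learning rate, discount, exploration

def choose_action(state):
    if random.random() < epsilon:                        # explore
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])     # exploit

def q_update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

def toy_env_step(state, action):
    # Toy dynamics: path_a is fast unless congested; path_b is steady but slower.
    if action == "path_a":
        latency = random.gauss(60 if state == "congested" else 30, 5)
    else:
        latency = random.gauss(45, 5)
    reward = -latency                                    # lower latency, higher reward
    next_state = random.choice(["normal", "congested"])
    return reward, next_state

state = "normal"
for _ in range(1000):                                    # trial and error
    action = choose_action(state)
    reward, next_state = toy_env_step(state, action)
    q_update(state, action, reward, next_state)
    state = next_state
```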

Learning through trial and error is the basic paradigm for reinforcement learning. Do something, do something else, see which generates the best “reward”, and learn from it. How rewards are designed becomes extremely important in this process. Anyone who has ever been involved in sales incentive plan design has thought long and hard about how rewards impact outcomes. On one hand it is simple: “salespeople are coin operated”, so the saying goes. On the other hand, it can sometimes be difficult to design rewards that achieve the outcome you want with so many conflicting goals and motivations, for example a salesperson asking themselves whether they should spend their time ramping up on selling something new, or stick to what they know best. One reaction to that problem is to provide a huge incentive to do the new thing. That adds expense and may end up hurting total revenue goals. While I am inadvertently making the case for reinforcement learning being used in sales compensation design, I’m just using it as an example that many people can relate to. In a network, there may be competing resource requirements for services of different value, or other constraints, and the reward has to make those trade-offs explicit, as in the sketch below.
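In a network setting, that reward design exercise might look like an explicit weighting of competing objectives. A minimal sketch, with terms and weights that are purely illustrative assumptions:

```python
# Hypothetical reward shaping for a network agent: reward high-value traffic
# meeting its SLA, penalize SLA misses, transit cost, and security incidents.
# The weights are illustrative, not tuned values.
def reward(sla_met_value, sla_missed_value, link_cost, security_incidents,
           w_sla=1.0, w_miss=2.0, w_cost=0.1, w_sec=10.0):
    return (w_sla * sla_met_value
            - w_miss * sla_missed_value
            - w_cost * link_cost
            - w_sec * security_incidents)

# Example: $100 of traffic served in SLA, $20 missed, 50 units of transit cost,
# no security incidents: 100 - 40 - 5 - 0 = 55.0
print(reward(100.0, 20.0, 50.0, 0))
```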

One of the challenges with a trial-and-error approach to learning is the error part of the learning, doubly so with critical infrastructure. To address this issue, simulation will get a close look. In some decades simulation is the golden-haired child of the sciences; in others it is a pariah. With the need to get things like autonomous vehicle models more robust as soon as possible, without fatalities, simulation is going to get more attention. I suspect it will also come into vogue for IT infrastructure modeling, and specifically for Internet control plane applications of AI/ML.
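What might a simulated playground for that trial and error look like? The toy sketch below uses a Gymnasium-style reset()/step() interface; the traffic model and reward are assumptions made up for illustration, so an agent can make its mistakes here rather than on live infrastructure.

```python
# Toy simulated link environment with a reset()/step() interface.
# The demand model and reward are illustrative assumptions.
import random

class ToyLinkEnv:
    def __init__(self, capacity=100.0):
        self.capacity = capacity
        self.demand = 0.0

    def reset(self):
        self.demand = random.uniform(10, 90)
        return self.demand

    def step(self, allocated_bandwidth):
        served = min(self.demand, allocated_bandwidth)
        dropped = self.demand - served
        overprovision = max(0.0, allocated_bandwidth - self.demand)
        reward = served - 2.0 * dropped - 0.5 * overprovision
        self.demand = max(0.0, self.demand + random.gauss(0, 10))  # demand drift
        done = False
        return self.demand, reward, done

env = ToyLinkEnv()
obs = env.reset()
obs, r, done = env.step(allocated_bandwidth=50.0)
```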

In the original incarnation of SDN (software-defined networking) there was a significant focus on how to commoditize networking equipment. With SD-WAN/SD-BRANCH/SEDANs, we are seeing a kind of SDN2, a controller-centric paradigm aimed at secure, enterprise-driven, policy/intent-based, application-performance-centric networking. On the infrastructure side of networks, the underlay, there has been some controller-based networking as well. Perhaps this SDN2 lays the groundwork for massively scaled, centralized deep reinforcement learning. Will we see router line cards with AI chips on them as well? Perhaps. Though it was not too long ago that some were complaining that even IPv6 processing added a lot of extra work to packet processing. I don’t think we’ll be seeing wafer-scale AI subsystems on router line cards this year. Maybe next year 😉 The more important question is where training should be done: at a local level or a global level.
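One hypothetical answer to the local-versus-global question is to train locally at each controller and aggregate globally, in the spirit of federated averaging. The sketch below, in plain NumPy, is an architectural assumption for illustration, not a description of any product.

```python
# Sketch of local-vs-global training: each site trains on its own telemetry and
# a central controller averages the model weights (federated-averaging style).
# The linear model, data, and weighting scheme are illustrative assumptions.
import numpy as np

def local_train(weights, X, y, lr=0.01, epochs=20):
    w = weights.copy()
    for _ in range(epochs):                      # simple linear-model gradient descent
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def global_aggregate(local_weights, sample_counts):
    counts = np.asarray(sample_counts, dtype=float)
    return np.average(np.stack(local_weights), axis=0, weights=counts)

# Two hypothetical sites with their own telemetry-derived features/targets.
rng = np.random.default_rng(0)
w_global = np.zeros(3)
site_data = [(rng.normal(size=(200, 3)), rng.normal(size=200)),
             (rng.normal(size=(500, 3)), rng.normal(size=500))]

for _ in range(5):                               # a few aggregation rounds
    locals_, counts = [], []
    for X, y in site_data:
        locals_.append(local_train(w_global, X, y))
        counts.append(len(y))
    w_global = global_aggregate(locals_, counts)
```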

The one caveat on all this, of course, is how the Internet ecosystem responds to responsible AI, and if there is anything to be learned from Facebook’s trials and tribulations, it is better to get ahead of the curve before it bites you. Even if a USD 5 billion fine is a slap on the wrist for Facebook, there are plenty of tech companies that don’t have that kind of cash lying around, and Facebook’s fine created ripples through the valley. Tech, and especially big tech, has enough black eyes already from the perspective of those living outside Silicon Valley, and increasingly, even from the left wing of politics that tech has historically been so strongly aligned with. Tech needs to get it together on responsible AI: explainability, fairness, and privacy.

Putting “layer 8” (politics) aside, from a purely technological perspective the game in networking may once have been all about feeds and speeds to keep pace with hypergrowth and the broadband buildout; increasingly it will be about operational excellence: productivity, security, end-user experience/personalization/private networks, and responsibility. AI/ML is going to play a significant role in this sea change.

AI/ML may well determine the next generation of winners and losers, both on the network infrastructure side of the industry and on the technology supplier side. It may also determine geopolitical winners and losers. If you do not have an AI/ML strategy, get one. Let me deeply reinforce that: GET ONE! Your business and nation may depend on you doing so.
