If we think of traffic engineering (TE) as the set of mechanisms, from real-time to long-term, by which networks achieve performance objectives and customer SLAs in the presence of network failures and topology changes, then we might assert there are three forms of TE:
- Throw huge amounts of bandwidth at it
- Throw less bandwidth at it and have a light-weight approach to TE (perhaps including ECMP)
- Throw the least amount of bandwidth at it and have a fine-grained approach to TE
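To make the lightweight option a little more concrete, ECMP generally works by hashing fields of a flow (commonly the 5-tuple) so that every packet of that flow takes the same one of several equal-cost next hops. The following is a minimal sketch of that idea only. The `Flow` type, the `pick_next_hop` function, and the link names are hypothetical, and SHA-256 is used purely for convenience; real routers use their own, vendor-specific hash functions.

```python
import hashlib
from typing import NamedTuple, Sequence


class Flow(NamedTuple):
    """Hypothetical 5-tuple identifying a flow (illustration only)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int


def pick_next_hop(flow: Flow, next_hops: Sequence[str]) -> str:
    """Map a flow onto one of several equal-cost next hops.

    Hashing the 5-tuple keeps all packets of one flow on the same path,
    while different flows are spread across the available paths.
    """
    if not next_hops:
        raise ValueError("at least one next hop is required")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer, then reduce it
    # modulo the number of equal-cost paths.
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical equal-cost links and an example HTTPS flow (protocol 6 = TCP).
next_hops = ["link-a", "link-b", "link-c"]
flow = Flow("10.0.0.1", "10.0.1.1", 40000, 443, 6)
chosen = pick_next_hop(flow, next_hops)
print(f"{flow} -> {chosen}")
```

Note that nothing in this sketch looks at link utilization: the spread is statistical, which is part of why this approach is lightweight and also why it still depends on having bandwidth headroom.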
The IETF Segment Routing working group recently adopted draft-dong-spring-sr-for-enhanced-vpn, which deals with service-aware segment identifiers that can reserve network resources and build virtual networks. Both that draft and a referenced TEAS (Traffic Engineering Architecture & Signaling) document, https://datatracker.ietf.org/doc/draft-ietf-teas-enhanced-vpn/, a) deal in part with how to address 5G capabilities such as network slicing and b) make reference to a hybrid control plane mode that combines a centralized controller with a distributed control plane.
Which leaves us where we often find ourselves in networking: one size does not fit all. IP network architects need some understanding of the strategy and value proposition of the network, whether that is guaranteed per-service SLAs in a for-profit context, or simply high availability and productivity in an enterprise context.
Throughout the history of networking, there has always been an implied tradeoff between throwing bandwidth at the problem and performing more complex optimizations, with their associated operations overhead. More simply: CAPEX vs OPEX. In the above diagram, I attempt to capture this a little. The idea is that the more a network manager tries to finely control bandwidth, and to constrain bandwidth growth by optimizing the network as much as possible, the more operational expense is incurred through added complexity in configuration, engineering, and recovery from failure.
The chart is very coarse and illustrative, and could be improved in many ways. Throwing bandwidth at the problem is not necessarily OPEX-free: some engineers believe there is a cost associated with each link that has to be managed. Hence, the move from current high-speed technology to the next new high-speed technology can yield both CAPEX (price-performance) and OPEX (fewer ports to manage / bit per watt / …) gains. Another improvement to the chart might be to show SR-TE as having less of an OPEX burden than IP/MPLS TE, but I did not want to get into that debate in this article, because there is a more important point to be made.
The more important point is whether it is still true that a more complex, finer-grained approach to TE incurs more OPEX than other approaches. As the above illustrative chart shows, what the industry is certainly hoping to do in next-generation networks is bring down some of the operational complexity and cost of adopting sophisticated approaches to TE, by mitigating the increased complexity with automation, telemetry, and controllers.
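The shape of this argument can be sketched as a toy cost model. Everything in it is an assumption made for illustration: the unitless bandwidth and complexity figures for the three forms of TE are invented, costs are taken to be linear, and automation is modeled crudely as a flat fractional discount on OPEX. It is not derived from real network data and says nothing about actual cost ratios.

```python
def total_cost(bandwidth_units: float,
               te_complexity: float,
               automation: float = 0.0,
               capex_per_unit: float = 1.0,
               opex_per_complexity: float = 1.0) -> float:
    """Toy model: CAPEX grows with bandwidth, OPEX with TE complexity.

    `automation` is a hypothetical factor in [0, 1] giving the fraction
    of complexity-driven OPEX that automation, telemetry, and
    controllers are assumed to remove.
    """
    if not 0.0 <= automation <= 1.0:
        raise ValueError("automation must be between 0 and 1")
    capex = bandwidth_units * capex_per_unit
    opex = te_complexity * opex_per_complexity * (1.0 - automation)
    return capex + opex


# Hypothetical (bandwidth, complexity) pairs for the three forms of TE.
# More bandwidth goes with less TE complexity, and vice versa.
approaches = {
    "huge bandwidth, no TE": (10.0, 1.0),
    "less bandwidth, lightweight TE": (7.0, 3.0),
    "least bandwidth, fine-grained TE": (4.0, 8.0),
}

for automation in (0.0, 0.5):
    print(f"automation = {automation}")
    for name, (bandwidth, complexity) in approaches.items():
        cost = total_cost(bandwidth, complexity, automation)
        print(f"  {name}: {cost}")
```

With these made-up inputs, fine-grained TE is the most expensive option without automation and the least expensive once half of its OPEX is automated away. Different assumptions would give different rankings, which is the one-size-does-not-fit-all point.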
Now, an IP architect still needs to have her business hat on and understand which approach to TE is required to deliver the brand promise of the company she works for. But if the answer is IP/MPLS TE, SR-TE, or VPN+ with 5G network slicing, then telemetry-based automation and controllers should make that decision a little simpler and less costly. That should be the goal, at least. Of course, automation can be applied to any form of networking (no TE, lightweight TE, or heavyweight TE), yielding benefits in all cases. But automation, AI/ML/analytics, and controllers will be most top of mind for IP architects where there is the most complexity, and where the problem is hardest for humans to calculate, whether over the long run or in real time.
Note: Thanks to Bruno Risjman for suggestions on the graphics.