You cannot eliminate complexity for the SAME capability & certainty

The common saying in networking, “you cannot eliminate complexity, you can only move it around”, is one to live by. It provides insight into the combinatorial complexity of engineering tradeoffs. I find it useful to think about the issue in a slightly modified way: “You cannot eliminate complexity for the SAME capability (& certainty)”.

Information theory points to the idea of entropy, the ‘average level of “information”, “surprise”, or “uncertainty” inherent in the variable’s possible outcomes’. This leads to the insight that when you add information, you increase certainty and reduce uncertainty/surprise. As one example of the term being used in networking, MPLS has the concept of “entropy” labels, which add more information to the MPLS data plane for the purpose of better ECMP load balancing.
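To make the entropy idea concrete, here is a minimal Python sketch of the standard Shannon entropy formula, measured in bits. The function name and the example distributions are illustrative assumptions, not something taken from a networking standard.

```python
import math

def shannon_entropy(probabilities):
    """Return the Shannon entropy, in bits, of a discrete probability distribution."""
    # Outcomes with zero probability contribute nothing, so they are skipped.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

# Illustrative distributions (assumptions chosen for this example)
fair_coin = shannon_entropy([0.5, 0.5])    # maximum uncertainty for two outcomes
biased_coin = shannon_entropy([0.9, 0.1])  # more is known, so less surprise
certain = shannon_entropy([1.0])           # outcome fully known, no surprise

print(fair_coin, biased_coin, certain)
```

The more the distribution is skewed toward one outcome, the lower the entropy, which is the sense in which added information reduces surprise.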


One of the common industry debates is which routing protocol is better than another, and which is more “scalable”. This is almost always an apples-to-oranges comparison. You can legitimately ask the question “does a routing protocol implementation, on my supplier of choice, meet my needs, and have I previously operationalized it?”. That is a slightly different, and more pragmatic, question than the abstract one of whether one protocol is better than another, which may hide assumptions about what is needed in the network design/architecture.

If we return to the more “abstract” questions we can observe that different protocols can do different things. Then we might ask, why is this so? Sometimes the answer is just a matter of doing the engineering work. Regularly, it is a question of whether the information is available to do those different things, with a reasonable level of confidence.

Let’s take a router with 24 ports attached to servers and 4 ports attached to other routers. A packet arrives on server port 1. What should the router do? Well, the router could just randomly select any one of the network/router-connected ports (or one of the other 23 server ports for that matter). If that is how decisions were made in networking, then packet loss would be high, due to either endless looping or packets being dropped by the network when they expire. Loop mitigation is done with techniques like TTL (time to live), whereas loop avoidance is done by adding information about which routing choices will not lead to looping packets (according to the best current information). That information is automatically generated and shared across networks using routing protocols. Net-net, routing information reduces the surprise/uncertainty of forwarding a packet on a selected port.
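As a rough back-of-the-envelope illustration of the router example above, the sketch below treats an uninformed router as picking uniformly at random among the candidate output ports. That uniform model, and the names used, are assumptions made for illustration only.

```python
import math

SERVER_PORTS = 24  # ports attached to servers (from the example above)
ROUTER_PORTS = 4   # ports attached to other routers

def uniform_choice_bits(candidates):
    """Uncertainty, in bits, of a uniform random choice among `candidates` options."""
    return math.log2(candidates)

# The packet arrived on server port 1, so the candidate outputs are the
# other 23 server ports plus the 4 router-facing ports.
no_info = uniform_choice_bits(SERVER_PORTS - 1 + ROUTER_PORTS)

# With routing information that names a single next hop, only one candidate remains.
with_route = uniform_choice_bits(1)

print(f"no routing information: {no_info:.2f} bits")
print(f"single known next hop:  {with_route:.2f} bits")
```

Under these assumptions the uninformed choice carries a little under 5 bits of uncertainty, and the routing information removes all of it.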

Different routing protocols distribute different information. In a vanilla BGP network, only a single best routing option is advertised, and in traditional BGP, the advertisement of new information for the same destination/network prefix is an implicit withdrawal of the previous information. BGP evolution, including ADD-PATH, can add some additional information, but if we compare traditional BGP to OSPF/IS-IS we see a big difference in information. OSPF/IS-IS advertises information about all routers/links in the network, and therefore any router can understand all the different paths through the network. This additional information can be used for capabilities such as ECMP (equal-cost multipath). Segment Routing and RSVP-TE can explicitly establish different paths through the network, leveraging the information that OSPF/IS-IS advertises. OSPF-TE/IS-IS TE can also advertise link information such as traffic engineering metric, maximum bandwidth, maximum reservable bandwidth, and unreserved bandwidth.

Full topology and additional traffic engineering information may mean OSPF/IS-IS are managing much more information than BGP. This will come at some increased complexity, but that is like saying a Ferrari is more complex than a Honda Civic. If you just need a Honda Civic, great, why take on the cost of something you don’t need? However, if you need to do a non-trivial amount of traffic engineering, then maybe you need the Ferrari and some really good mechanics/automation systems to keep it on the road, depending on how it is driven (there is always the option not to use all the capabilities).
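To illustrate how full topology information enables a capability like ECMP, here is a minimal sketch of a shortest-path computation that keeps every equal-cost first hop. The four-router topology, the link costs, and the function name are hypothetical, and real OSPF/IS-IS implementations are considerably more involved.

```python
import heapq

# Hypothetical topology with symmetric, positive link costs (illustrative only).
LINKS = {
    "A": {"B": 10, "C": 10},
    "B": {"A": 10, "D": 10},
    "C": {"A": 10, "D": 10},
    "D": {"B": 10, "C": 10},
}

def ecmp_next_hops(links, source):
    """Run Dijkstra from `source`; return {destination: set of equal-cost first hops}."""
    dist = {source: 0}
    next_hops = {source: set()}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > dist[node]:
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, link_cost in links[node].items():
            new_cost = cost + link_cost
            # A direct neighbor of the source is its own first hop;
            # otherwise inherit the first hops used to reach `node`.
            hops = {neighbor} if node == source else next_hops[node]
            if new_cost < dist.get(neighbor, float("inf")):
                dist[neighbor] = new_cost
                next_hops[neighbor] = set(hops)
                heapq.heappush(queue, (new_cost, neighbor))
            elif new_cost == dist[neighbor]:
                next_hops[neighbor] |= hops  # another equal-cost path
    return next_hops

routes = ecmp_next_hops(LINKS, "A")
print(routes["D"])  # both B and C are equal-cost first hops toward D
```

Router A can only discover that both B and C lead to D at equal cost because it holds the whole link database. A router that is told only a single best path per destination would not have the information needed to make that choice locally.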


Before leaving readers with the idea that OSPF/IS-IS are Ferraris and BGP is not, let us flesh out the story a little bit more. OSPF and IS-IS were designed as interior gateway protocols, somewhat assuming they would not be used for connecting thousands of different networks from thousands of different entities, all doing their own network administration. Therefore policy has traditionally received more emphasis in BGP than in OSPF/IS-IS. BGP policy management has been, and can be, an area of differentiation between router suppliers, for example. If getting the policy right is your most important design need, then BGP is possibly the Ferrari in this context. To have that policy capability, BGP adds information both in configuration and in the routing protocol itself, communities for example. Can BGP policies become complex?

Router suppliers invest significant portions of their R&D in scaling. For the largest of routers, it is an ongoing activity. The scale of software/data structures, the performance of CPUs, the amount of memory, etc. are all things that tend to improve over time and lead to more scale. At any given point in time, a specific routing protocol may not “scale” for a given set of capabilities/information, on a specific implementation. Technology advancement and available implementations make a difference. While this will always be part of the equation, an equally important part is the relationship between scale, complexity, and information. If you want the network to do more, it will often require more information (or some other dimension of complexity), and that may have implications for scale. Given equally smart engineers, different protocols are going to manage similar amounts of information for a given capability. Different capabilities require different dimensions of complexity, for example, more information/state.

A good way to compare routing protocols is by the capabilities they support. Those capabilities will have a relationship with complexity, and more than likely, a similar level of complexity for a similar capability.
