Work in Progress. Far from complete or polished.
We Automate So We Can Grow
The Internet has leveraged automation from its inception. The synchronization across distributed routers of a common view of network routes/paths/topology is a completely automated process – no human intervention is required, nor is any centralized control mechanism. The Internet would likely not exist today if not for this characteristic of IP networks. The limited manpower available to bootstrap the Internet, the distributed/multiple administrative entities, and the rich multipath connectivity would all have inhibited growth without the automation provided by routing protocols/the IP control plane.
A fully automated distributed control plane frees network managers from the grunt work of creating and maintaining routing tables and adapting to changes in topology. This frees them to focus on other, higher-value activities, which have led to today's scope of diverse applications and services, and to the cannibalization of previous ways of delivering a number of existing services, for example voice and increasingly video. Not to mention how quickly IP networks adapted to the new need for work-from-home group video conferencing during the 2020 COVID-19 global pandemic.
While the IP control plane is a wonder of automation, nuances, quirks, and challenges notwithstanding, there is much about IP networks that is not automated. There remains much grunt work in running IP networks, and aspirations can be difficult to fully realize. IP networks support a large scope of different applications and services because they are flexible and fungible. That same flexibility and fungibility creates a large scope of optionality. In other words, there are many tradeoffs and choices to be made. The industry is reacting to this option space in different ways: software-defined networking, analytics, streaming telemetry, programmatic interfaces, AI/ML, new approaches to IP control and forwarding planes, converging L2 & L3 VPNs on BGP, RIFT, SD-WAN/SASE, and more. These now become part of the domain of IP architects, engineers, and operations professionals. This is all emerging at the time of the most disruptive force in IT in decades, arguably since client/server: cloud.
There have been two fundamental IT disruptions caused by cloud: customer experience and operational excellence. IP architects, engineers, and network managers are challenged and inspired to meet the expectations of the cloud era; an era where off-box / out-of-band signaling plays an important role in automating the totality of the network operations environment; where the operations professional is not a CLI expert, but a software developer.
While cloud forces networking professionals to consider how IP networks take the next step forward, the fundamentals of IP network architecture have not changed, because they are derived from automated distributed control plane architecture: hierarchy, addressing, summarization, redundancy, protocol overhead, convergence time, and network stability in the face of change. All these critical aspects of IP network architecture remain.
IP networks are at a moment of great innovation. Cloud hyperscalers are now key influencers in network directions. SP networks are going through significant change. Enterprise networks are going through the biggest change in many decades. IP architects and engineers must understand the consequences of all these forces, while still paying attention to what has always been important in IP networks. Automation is an opportunity to relieve IP professionals of the remaining grunt work in designing, planning, and operating IP networks so their experience and intellectual capital can be invested in achieving tradeoffs and aspirations that are currently challenging to realize.
So much of today’s automation conversation is about orchestration, workflows, abstractions, and the like. All important. IP networks are unlike any other layer of the network, though. Therefore, IP architects, engineers, and managers have a special role to play in the automation journey.
Automation has always been a central characteristic of IP networks, and continued automation, following IP principles, with a deep understanding of network tradeoffs, will be a continuing source of network growth, and value-add for individuals, enterprises, governments, and society as a whole. Automation is also a catalyst for the next phase of growth by IP networking professionals.
We automate so we can grow.
Tradeoffs in IP Network Architecture
There are many ways to talk about the tradeoffs in network architecture. The discussion that follows draws on the SOS (state, optimization, and surface) model discussed in the Network Collective series, reframes it a little, and expands on it. A hat tip to that work, which views network design decisions through the lens of complexity. That lens is illuminating. There is no doubt that state is an issue the industry has discussed extensively over a long period of time. Surface/interdependencies is the issue of a failure in one component bringing down others. Optimization is what we want in networks, especially at a time when SP revenues are not growing and cloud hyperscalers are focused on operational excellence. In between a network where there are no optimization tools in the toolbox and all you can do is add more bandwidth, and a network which is so fixed and rigid that no change is possible, lies a highly-optimized network responding to changing conditions while maintaining the network's mission. Capturing that in a holistic way is one of the challenges of the time we live in. There is probably no perfect model for this, and anything approaching a holistic model will take time and iteration. This section attempts to take a step in that direction.
As it is called to simplify, automate, and optimize networks, the industry is exploring tradeoffs such as:
- Putting information in IP packet headers versus the control plane
- Augmenting the distributed IP control plane with centralized analysis and global optimizations
- Reducing the number of control plane protocols
- Source-based routing
- Converging all VPN services on a single protocol (BGP)
- Moving the network edge out to servers, and the applications/containers running on them
- Network fabric as a black box with a simple interface (RIFT)
- Creating overlay networks that span multiple service providers and access modes, without much interaction with the underlying infrastructure
Automated Distributed Control Plane
IP networks use an automated distributed control plane.
- Distributed: runs on multiple IP routers (usually all)
- Control plane: synchronizes a common view across routers of network routes/topology
- Automated: once configured, synchronization of the common view occurs without human intervention
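As a minimal sketch of what "automated synchronization" means, the toy distance-vector exchange below (hypothetical routers and link costs, not any real protocol) has each router repeatedly share its best-known distances with its direct neighbors until every table stops changing:

```python
# A toy distance-vector exchange: each router repeatedly updates its distances
# from its direct neighbors until no table changes. Router names and the
# 3-node topology are hypothetical.
INF = float("inf")

# Direct links and their costs (undirected).
links = {("A", "B"): 1, ("B", "C"): 1}

routers = {"A", "B", "C"}
neighbors = {r: {} for r in routers}
for (a, b), cost in links.items():
    neighbors[a][b] = cost
    neighbors[b][a] = cost

# Each router starts knowing only how to reach itself.
dist = {r: {d: (0 if d == r else INF) for d in routers} for r in routers}

changed = True
while changed:  # fully automated: the loop ends when all tables agree
    changed = False
    for r in routers:
        for n, cost in neighbors[r].items():
            for d in routers:
                candidate = cost + dist[n][d]
                if candidate < dist[r][d]:
                    dist[r][d] = candidate
                    changed = True

print(dist["A"]["C"])  # A reaches C through B: 2
```

Real IGPs add neighbor discovery, failure detection, and loop mitigation, but the core idea is the same: the common view emerges from repeated local exchanges, with no human or central controller involved.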
There are multiple considerations in achieving desired scale, reliability, capabilities, and economics. IP Network Architecture touches on all of these.
IPv4, IP/MPLS, Segment Routing MPLS, Segment Routing IPv6, IPv6
“IP” networks come with different forwarding planes: the protocol formats used to move traffic from a source to a destination. While there are significant differences in this aspect of IP networks, there are many similarities with respect to the control plane, and that is why all these networks can rightly be considered IP routing.
Regardless of whether a network manager is operating an IPv4, IPv6, IP/MPLS, SR MPLS, or SRv6 network, there will be some similar considerations:
- Partitioning the network into core, distribution, and access
- Summarizing routes to reduce the size of routing tables
- Having an approach to addressing that supports summarization
- Assessing the impact of redundancy options on control plane complexity
- Reducing routing protocol overhead
- Improving convergence times
- Improving network stability in normal operations and in response to topology changes
Whether a network is IPv4, IPv6, IP/MPLS, SR MPLS, or SRv6, OSPF, IS-IS, and/or static routes will be used as interior routing protocols and BGP4 as the exterior routing protocol. Many of the same considerations apply to all these networks, even if some add control plane protocols and considerations of their own. Some of what network architects have learned in any one of these IP network types is applicable to all of them. The one thing they all have in common is the use of IP addressing and IP routing protocols.
To be sure, there are implications of using MPLS labels vs IPv4 headers, or segment labels for that matter. Those implications are relevant to the network architect. We start our journey here, looking at an IPv4 network, because many of the considerations will be the same for other flavors of IP networking.
IP Network Architecture Summary
In IP networks, some of the key architecture concerns are:
- Partitioning the network
- Managing the size of the routing table
- Redundancy of routes and routers
Partitioning the Network
- Core, distribution, and access layers take a large problem and break it down into smaller problems, each with its own role/responsibilities.
- The core should be focused on moving as many packets as possible, as fast as possible, through as few hops as possible, with the least complexity possible. Policy should normally be avoided. Full reachability is common.
- Distribution should reduce the size of routing tables by route summarization to core and access.
- Access routers only need enough information to get to distribution routers & they also need to implement customer-facing functions.
Managing the size of the routing table
- No router should have more information than it needs.
- Route summarization is one of the keys to managing the size of the routing table.
- Route summarization is often implemented in distribution routers.
- The ability to do route summarization depends on addressing.
- Topology-based addressing (addressing based on the attached router) is one of the most effective approaches.
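For illustration, Python's standard `ipaddress` module can show how contiguous access subnets (hypothetical prefixes here) summarize into a single advertisement:

```python
import ipaddress

# Four hypothetical access subnets hanging off one distribution router.
access = [ipaddress.ip_network(n) for n in
          ("10.1.0.0/24", "10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24")]

# collapse_addresses merges contiguous prefixes into the fewest covering routes.
summary = list(ipaddress.collapse_addresses(access))
print(summary)  # [IPv4Network('10.1.0.0/22')]
```

Four routes become one, which is exactly the reduction in routing-table size that summarization at the distribution layer provides, and it only works because the addressing plan made the prefixes contiguous.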
Redundancy of routes and routers
- Helps route around failures.
- Redundancy can introduce its own complexity and confusion, sometimes requiring routing advertisement configuration.
- A partial mesh can be better than a full mesh.
- Link redundancy can sometimes be achieved by Optical and/or Ethernet switches/layers.
A common approach to addressing big problems is breaking them into smaller problems. The term “layers” means many things in a networking context, including the layers of a protocol stack. IP network architecture has traditionally dealt with hierarchical routing using three layers:
- Core: focused on high-performance forwarding
- Distribution: focused on effective/efficient route advertisement
- Access: focused on customer-facing features
Having reliable links/physical equipment & interfaces is important for overall network health. IP Network Architecture is first and foremost about achieving stable topologies, because when the IP routing topology does not converge, network health disintegrates / “melts”.
In traditional IP networks, when routing protocols are converging, routing loops occur, and therefore no routing protocol can provide correct forwarding information while converging / in a state of transition. See “Segment Routing” for further discussion of mitigation for this issue.
Convergence time is influenced strongly by 1) the number of routers and 2) the amount of routing information each has to process. Summarization is one approach to reducing the amount of routing information each router needs to process. Large chassis designs traditionally reduced the number of routers in a network. See “Spine and Leaf” for a discussion of approaches to routing in those architectures.
Good topology design, addressing schemes, & summarization are the basics of ensuring that the impact of a topology change is constrained and that routers have the minimum amount of routing information possible.
Through summarization, an access link error should not impact the core, and a core link error should have marginal impact on access routers.
In traditional IP networks, all traffic was aggregated up and through the core, so the most important role of the core was forwarding as many packets as possible, as fast as possible. Cloud computing, peer-to-peer, traffic offload, peering trends, person-to-person communication, and other trends have changed traffic patterns to the extent that all traffic may no longer go through the core. However, network managers will need to judge for themselves what traffic patterns they are anticipating, how easy it is for different types of routers to accommodate those flows, and what their core needs are.
Core should be able to route to any device in the network, and therefore should have full “reachability”. Summary routes are used to reduce the size of the core routing table. Default routes can be used for reaching external destinations, for example Internet hosts. Not using default routes for internal destinations may improve core redundancy while reducing suboptimal routing and routing loops.
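The interplay between summary routes and a default route comes down to longest prefix match. A small sketch, with hypothetical prefixes and next-hop names:

```python
import ipaddress

# A toy core routing table: summary routes for internal destinations plus a
# default route for external (Internet) destinations. Next-hop names are
# hypothetical labels, not real configuration.
table = {
    ipaddress.ip_network("10.1.0.0/22"): "to-distribution-1",
    ipaddress.ip_network("10.2.0.0/22"): "to-distribution-2",
    ipaddress.ip_network("0.0.0.0/0"):  "to-internet-border",
}

def lookup(dst: str) -> str:
    """Longest prefix match: the most specific matching route wins."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in table if addr in net]
    return table[max(matches, key=lambda n: n.prefixlen)]

print(lookup("10.2.1.9"))      # internal: matches the 10.2.0.0/22 summary
print(lookup("198.51.100.7"))  # external: falls through to the default
```

Because the internal summaries are more specific than 0.0.0.0/0, internal traffic never rides the default route, which is the property that avoids the suboptimal routing and loop risks the paragraph above describes.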
- Isolating the impact of topology changes
- traffic aggregation
- route aggregation/summarization
In reality, there may be multiple aggregation points in a network, especially in some of the newer 5G networks with front-, mid-, and backhaul. Generally though, traffic aggregation is often driven by equipment economics and scalability: is it more cost effective to have a separate aggregation point for “low” speed links, or is it more cost effective and scalable to have all low-speed links come into a single aggregation point? Often, high-capacity routers are not cost-effective at aggregating much lower-speed links.
That said, points of aggregation create points in the network where the distribution layer can exist, summarizing routes to reduce the routing information other routers need to hold.
- On-ramp for traffic
- Access control
- Some edge functions
Packet filtering plays multiple roles, including not forwarding traffic originating from outside the attached local area network (or other device-facing link) and protecting devices attached to the access router from being attacked from outside/inside the network. Common filters include:
- Preventing packets with a source address outside the attached subnet (anti-spoofing)
- Preventing packets that have a broadcast address as a source (255.255.255.255 & the segment broadcast address).
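A sketch of those two filters as a single ingress check, using hypothetical documentation prefixes:

```python
import ipaddress

def permit_inbound(src: str, attached_subnet: str) -> bool:
    """Hypothetical access-router ingress check for the two filters above."""
    net = ipaddress.ip_network(attached_subnet)
    addr = ipaddress.ip_address(src)
    if addr == ipaddress.ip_address("255.255.255.255"):
        return False              # limited broadcast used as a source
    if addr == net.broadcast_address:
        return False              # segment broadcast used as a source
    return addr in net            # drop spoofed (off-subnet) sources

print(permit_inbound("192.0.2.10", "192.0.2.0/24"))   # True: legitimate host
print(permit_inbound("203.0.113.5", "192.0.2.0/24"))  # False: spoofed source
print(permit_inbound("192.0.2.255", "192.0.2.0/24"))  # False: broadcast source
```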
Other access functions:
- Quality of service tagging
- Tunnel termination
- Traffic measurement
- Policy-based routing
It is often recommended that policy actions be taken outside the core. Anything that impacts packet processing capacity and/or adds latency is discouraged from being implemented in the core. In addition, a policy error in the core could impact the entire network, whereas an edge policy error will often be constrained in its impact. Generally speaking, it is recommended that policy be placed in the access layer or at the border between the access and distribution layers.
Destination-based routing: the most common routing in IP networks, what is the best effort / least cost path to a destination.
Policy-based Routing: routing based on fields other than destination address. Create a filter, build a policy, implement a policy. On Cisco routers, policy created with “route maps” and implemented with interface commands.
See introduction to IP Addressing.
It is difficult to change addresses after they are allocated and used, because many devices may need changing as a result. So upfront planning and architecture consideration is important. Address plans can impact network stability because they determine the number of routes propagated when topology changes. Summarization is a key tool in managing network stability when topology changes occur.
- Distance topology changes travel.
- Routing table size.
If the distribution layer aggregates all the access subnets into one summary route, and advertises that summary route to the core, then when an individual access subnet goes down, the core routing table does not have to change.
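This works because the flapping subnet is contained in the configured summary, which a quick check confirms (prefixes here are hypothetical):

```python
import ipaddress

summary = ipaddress.ip_network("10.1.0.0/22")    # configured at distribution
flapping = ipaddress.ip_network("10.1.2.0/24")   # an access subnet going up/down

# The core only ever sees the summary, and the flapping subnet sits inside it,
# so the core routing table is identical before and after the flap.
print(flapping.subnet_of(summary))  # True
```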
The approach to addressing that provides the most opportunity for summarization is topological addressing, that is, addressing based on the attached router. It may be possible to combine topological addressing with departmental addressing, for example, using the last two octets as department ranges. This may combine some good qualities of topological addressing with the ability to quickly recognize the department associated with an address. It may also cut down on opportunities to summarize.
There may also be tradeoffs between addressing for summarization and growth, and conserving address space.
A network without redundancy will have many single points of failure, disconnecting hosts from services they depend on, when a failure occurs. Alternative paths around points of failure are provided by redundancy. Redundancy can also cause problems.
Some principles that can be adopted for redundancy:
- Only use redundant paths when the normal path is broken.
Redundant core design
Most core routers will have complete routing information, so routing loops are not likely, though suboptimal routes are possible.
- Hop count reduction
- Available path reduction
- Number of failures that can be handled
Ring designs are frequently used and can be created using a true ring-based networking technology (though few of these are any longer used above the optical domain) or by simply having two links from each router to other routers, constructed in the pattern of a ring. A ring topology provides two paths to any given destination from every device, a bounded number of hops before and after a link failure, and losing any two links isolates at least one section of the network.
A full mesh design, with a link from every router to every other router, provides the most redundancy. This usually works well in the core, where there is sufficient fiber / connectivity. In a four-router core, three links would have to fail before any destination became unreachable. In other layers of the network, for example access, additional fibers may need to be pulled for a full mesh to be possible. Network managers may decide that the economic cost of pulling the fibers, especially over long distances, is not worth the additional redundancy.
A full mesh entails the expense of additional links and care when making changes. Traffic engineering / link sizing may also become more complex/confusing if it cannot be easily determined which path traffic normally takes. For these reasons, network managers sometimes implement the compromise of a partial mesh. For example, a partial mesh of six routers with four links on two routers and two links on the remaining four may create a scenario of 3 hops in normal operation and 4 hops when a link fails. As a partial mesh grows, it is believed that hop counts tend to stay low.
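Exact hop counts depend on which links are chosen; a network manager can check a candidate partial mesh with a simple breadth-first search. The six-router mesh below is hypothetical, with four links on two routers and two links on each of the remaining four:

```python
from collections import deque

def hops(adj, src, dst):
    """Shortest hop count between two routers via breadth-first search."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for n in adj[node]:
            if n not in seen:
                seen.add(n)
                queue.append((n, d + 1))
    return None  # unreachable

def diameter(edges):
    """Worst-case hop count across all router pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    nodes = list(adj)
    return max(hops(adj, a, b) for a in nodes for b in nodes if a != b)

# Hypothetical six-router partial mesh: A and B carry four links each,
# the remaining routers two each.
mesh = [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
        ("B", "C"), ("B", "E"), ("B", "F"), ("D", "F")]

print(diameter(mesh))                                  # worst case, all links up
print(diameter([e for e in mesh if e != ("A", "D")]))  # worst case after A-D fails
```

For this particular topology the worst case grows from 2 hops to 3 when a link fails; the useful point is that the bound can be computed for any candidate design before committing to it.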
Dual homing to the core is an option for distribution routers. The doubling of paths also doubles the possible routes, which increases convergence time. Various approaches to reducing the number of routes are available, including conditional advertisement and floating static routes. Dual homing of distribution routers can lead to scenarios where core transit traffic ends up passing through a distribution router if a link between two core routers fails. If anticipated and planned for, this is ok. If it is not desirable, then route advertisement configurations can be made to prevent this. Routing advertisement of reachability across both of the dual-homed links can also lead to “routing information leaks”, instabilities, and increased convergence times.
Access routers can also be dual-homed, or there can be a single link from each access router towards the distribution layer, with links connecting the access routers. One way to avoid some of the complications of dual homing is to only advertise routes learned below a router in the hierarchy.
Each route source has a default administrative distance, which can usually be overridden through configuration. On many implementations, static routes have a low administrative distance, for example 1 (with connected being 0). By configuring a static route with a high administrative distance, for example 200 or above, a “floating static route” is created – one which is only used as a backup route or a conditionally advertised route.
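The selection logic can be sketched in a few lines; the administrative distance values below mirror common defaults but vary by implementation:

```python
# Route sources competing for the same prefix; the lowest administrative
# distance wins. The AD values (OSPF 110, floating static 200) mirror common
# defaults on some platforms and are used here purely as assumptions.
def best_source(candidates):
    """Pick the (source, AD) pair with the lowest administrative distance."""
    return min(candidates, key=lambda c: c[1])[0] if candidates else None

rib = [("ospf", 110), ("floating-static", 200)]
print(best_source(rib))   # the dynamic route wins while it exists

rib = [c for c in rib if c[0] != "ospf"]   # OSPF withdraws the route
print(best_source(rib))   # the floating static "floats up" as the backup
```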
Redundancy can also be created below the IP layer, at the Ethernet and/or Optical layers, using switches. Physical layer redundancy can be easier to implement and easier to maintain/manage, depending on the roles and responsibilities of each layer. Physical layer redundancy below the IP layer does not provide redundancy for a router failing.
Routing table size is one of the keys. Summarization is the primary way of controlling it, and summarization is highly dependent on addressing.
Routing Protocol Overhead
- Unicast instead of multicast (though may require extra configuration)
- Multicast instead of broadcast
- Incremental change updates instead of full updates
- Reduced frequency: minutes is better than seconds.
Area borders are where summarization is done.
OSPF has a two-level hierarchy: a core area, and areas off that core. So it is not ideal for a three-layer hierarchy.
Example: Area 0, all core routers. Distribution routers in other areas.
- Small Area 0
- Only one path to each distribution router, so no sub-optimal routes
- Distribution layer and remote site redundant links in the same area
- Core routing table will be small as summarization done at core/distribution border
- Summarization at the core when ideally it would be done in distribution layer
- Router scaling and redundancy
Another option is to have the links from distribution routers to the core included in Area 0. If there is only one link from each distribution router to the core, that will avoid distribution routers becoming transit nodes for the core. So topology is important if distribution links are part of Area 0.
In general, it is best to keep dual homed links from an access router, within the same area.
With the exception of stubby areas, external routes are flooded throughout the network. So best to minimize external routes with OSPF.
In OSPF, conditional advertisement is only for default route, unlike BGP.
Area 0 cannot be a stub. Three types of stub areas in OSPF:
- Stubby: Default route is used to reach externals. External routes are not advertised to or from stubby areas.
- Not-so-stubby areas (NSSAs): External routes can be generated from the area, but are not advertised into the area unless generated from the area. Used for areas that don’t need external routes but do generate them.
- Totally stubby: Default route is used for all routes, internal and external. Generally only used when there is one area boundary.
If an area has significant traffic with externals, and has a small number of routers, then leave it as a normal area. Flooding the externals into the area is probably not an issue with a small number of routers.
An area without much contact with external networks might be a candidate for a stub.
Areas where there is not much chance of suboptimal routing can be stub areas.
For two routers to be OSPF adjacent they must have matching Hello interval, Dead interval, wait time, and link type. MTU must also be compatible – some routers limit the MTU to the smallest MTU. In addition, on a LAN, non-DR routers may not become fully adjacent; they get routing information about each other from the DR.
Try to dual home a remote into routers in the same area.
OSPF always prefers intra-area links to inter-area links.
- Cisco OSPF Design Guide
- Cisco Live: OSPF in modern deployments (includes OSPFv3)
- OSPF Design
- Designing scalable OSPF design
Intermediate Systems – routers
- Flexible timers, fast convergence, good handling of IP instability
- Changes in IP routing do not impact main function: Connectionless Network Service (CLNS)
- L1 routing within areas
- L2 between areas
- L1 routers only understand the topology of their area.
- Two L2 routers cannot be separated by a L1 router.
If L2 routers are split through a failure, L1 links will not be used to repair, so provide enough redundancy between L2 routers.
Minimize routers that are both L1 and L2
IP network summarization only occurs on L2, so place L2 where summarization will take place.
If all routers in a single L1 area, then all routing is optimal, but no summarization.
Try to avoid having too many access routers attached to any distribution router, as this makes administration more difficult.
If traffic goes from access to access router commonly, it may be optimal to have distribution layer be L1 only.
- How common is access to access router flows, and how important is suboptimal routing
- How many access routers will connect, after growth, to any given distribution layer
- How many distribution layer routers will there eventually be
If you don’t know the answers to these questions, run only L1 in the distribution layer and reduce the number of routers running both L1 & L2. This will result in summarization in the core, but stability, network size, and traffic have to be considered.
For common internal destinations / services, optimal routing is probably important, so consider running L2 (&L1).
Link state flooding
- Flooding causes partial SPF to run
- SPF runs eat CPU & memory so you want to watch
- Any time a link between two routers or a router fails, a full SPF is run
- Link-state packet age-out causes the originating router to re-flood, which leads to a full SPF on every router. By default this happens every 20 minutes, which can be significant for a large number of routes.
A switch can corrupt an LSP, leading the originator to re-flood, leading to another corrupt LSP, leading to a storm. Routers can be configured to ignore LSP errors, which is not good, but maybe better than a storm.
Overflowing the Database
If the database on a small router overflows, it sets the overflow bit in its advertisements and the other routers know it has an incomplete topology, therefore they do not rely on paths going through that router.
- Default interface cost: 10
- Internal: 0-63
- External: 64-127
Small ranges lead to SPF efficiency, but not a big space to play with – an important consideration in IS-IS networks so that the total path metric does not become a limit on hop count.
Neighbor relationship not formed:
- Misconfigured NSAPs
- Integrated IS-IS across a point-to-point link and IP addresses not on the same subnet
Hierarchy, addressing, summarization, and redundancy are essential components of design.
BGP typically carries a large number of routes compared to IGPs, and is used to overcome IGP scaling limits, even in the best network design.
Policies are hard to define and enforce with IGPs.
BGP has ways of coping with instability when joining networks with complex policies: communities, AS_PATH filters, local preference, Multi-Exit Discriminator (MED), and route dampening – a route is suppressed if it is known to change regularly over time.
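As an illustration of route dampening, the sketch below applies an exponentially decaying penalty; the half-life and suppress/reuse thresholds are common platform defaults, used here purely as assumptions:

```python
import math

# Hypothetical flap-dampening parameters: each flap adds 1000 to the penalty,
# the penalty halves every 15 minutes, routes are suppressed above 2000 and
# reused again below 750. These values are assumptions for illustration.
HALF_LIFE = 15 * 60   # seconds

def decayed(penalty: float, seconds: float) -> float:
    """Penalty remaining after an interval of exponential decay."""
    return penalty * math.exp(-seconds * math.log(2) / HALF_LIFE)

penalty = 3 * 1000                      # three quick flaps
print(penalty > 2000)                   # route crosses the suppress threshold
print(decayed(penalty, 45 * 60))        # penalty left after three half-lives
print(decayed(penalty, 45 * 60) < 750)  # route drops below the reuse threshold
```

The effect is that a stable route is advertised normally, while a route that flaps repeatedly is held back until it has demonstrated stability for a while.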
- Routes learned from an eBGP peer are propagated to all peers.
- Routes learned from an iBGP peer are propagated to eBGP peers.
- Routes originated locally are propagated to all peers.
- As routes learned from iBGP peers are not sent to other iBGP peers, a logical full mesh is needed to ensure consistent routing information.
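The cost of that logical full mesh grows quadratically with the number of iBGP speakers. A quick comparison, with hypothetical router counts:

```python
# Session counts for a hypothetical AS with n iBGP speakers.
def full_mesh_sessions(n: int) -> int:
    return n * (n - 1) // 2   # every speaker peers with every other speaker

def route_reflector_sessions(n: int) -> int:
    return n - 1              # one reflector peering with each client

for n in (10, 50, 100):
    print(n, full_mesh_sessions(n), route_reflector_sessions(n))
```

At 100 routers the full mesh needs 4,950 sessions versus 99 through a single route reflector, which is why reducing iBGP neighbors is one of the scaling levers discussed here.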
BGP cannot detect loops within an AS, only in eBGP routes; therefore you cannot redistribute iBGP routes into the IGP/distribution layer.
Problems with a large number of neighbors:
- BGP updates (one for each neighbor, each time a prefix changes)
- Information loss from aggregation
- Scaling BGP policies
- Scaling iBGP mesh
- Route flaps
Peer groups can be used to reduce updates.
iBGP neighbors can be reduced by confederations and route reflectors.
Route reflectors allow staged migration, configuring one router at a time.
TCP used for BGP sessions.
BGP can conditionally advertise.
Load sharing on inbound side:
- Little control over what other routers do, but…
- Prepending entries to AS Path
- Set MED outbound (only use when dual-homed to same ISP/AS)
- Set communities on outbound advertisements (may need to make an arrangement with ISP)
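To see why prepending works, reduce best-path selection to the AS_PATH-length tie-breaker (real BGP evaluates several attributes before this step; the ASNs below are hypothetical, from the documentation range):

```python
# BGP best-path selection reduced to the AS_PATH-length step only.
# Real selection has many earlier tie-breakers (local preference, origin, ...).
def best_path(paths):
    return min(paths, key=lambda p: len(p["as_path"]))

# A remote router sees our prefix via two ISPs. We (AS 64496) prepended our
# own ASN twice on the advertisement toward ISP-1 to make that path longer.
paths = [
    {"via": "ISP-1", "as_path": [64500, 64496, 64496, 64496]},
    {"via": "ISP-2", "as_path": [64501, 64496]},
]
print(best_path(paths)["via"])  # the shorter AS_PATH via ISP-2 wins
```

Because the remote router prefers the shorter AS_PATH, inbound traffic shifts toward ISP-2 even though we control only our own outbound advertisements, which is the "little control, but..." point above.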