How does BGP routing work?

BGP stands for Border Gateway Protocol. It is the routing protocol that interconnects autonomous systems (AS). eBGP (external BGP) is used between AS’s, while iBGP (internal BGP) is used within a single AS.

These basic concepts are explained in our article “What is BGP?”.

Let’s now take a closer look at how BGP actually works.

BGP is the protocol used in the backbone of the Internet. It allows organizations that have their own AS (typically Internet Service Providers and large organizations) to interconnect with others. This type of interconnection between AS’s is called a peering.

The basics of BGP peering

The Tier-1 club

When an AS is set up, it peers with other AS’s to announce its IP prefixes (the IP subnets it owns). Those AS’s announce the prefixes to their own peers in turn, and so on. In this way, newly announced prefixes get propagated around the Internet.
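
To make the notion of a prefix concrete, here is a minimal sketch using Python's standard `ipaddress` module. The prefix 192.0.2.0/24 is a documentation-only range chosen for illustration, and the helper `covers` is a hypothetical name, not part of any BGP tooling.

```python
import ipaddress

# Documentation-only prefix, standing in for a prefix an AS might announce.
prefix = ipaddress.ip_network("192.0.2.0/24")


def covers(prefix, address):
    """Return True if the address falls inside the announced prefix."""
    return ipaddress.ip_address(address) in prefix


print(prefix.num_addresses)             # 256
print(covers(prefix, "192.0.2.42"))     # True
print(covers(prefix, "198.51.100.7"))   # False
```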

Owning an AS does not mean, though, that it is automatically reachable globally! Among the roughly 100,000 AS’s, only about twenty can reach every Internet destination without purchasing transit from any other AS. They form the so-called Tier-1 club.

The BGP routes

Unlike many other routing protocols, BGP has no peer discovery process: each peering must be configured manually.

Each BGP speaker, which is called a “peer”, exchanges routing information with its neighboring peers in the form of network prefix announcements.

The information in these prefix announcements is sufficient to construct a graph of AS connectivity, as illustrated below.

As you can see, communication between two prefixes can often occur through different paths.

For example, Prefix J from AS 1559 can reach Prefix G from AS 257 via either AS 20 or AS 13936.
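
The idea of several paths through the AS graph can be sketched in a few lines of Python. The four links below are an assumption inferred from the sentence above (the full graph from the figure is not reproduced), and `simple_paths` is a hypothetical helper, not something BGP itself runs.

```python
def simple_paths(graph, source, target, path=None):
    """Enumerate every loop-free path from source to target."""
    path = (path or []) + [source]
    if source == target:
        return [path]
    found = []
    for neighbor in sorted(graph.get(source, ())):
        if neighbor not in path:  # never visit the same AS twice
            found.extend(simple_paths(graph, neighbor, target, path))
    return found


# Assumed links: AS 1559 connects to AS 257 through AS 20 or AS 13936.
links = [(1559, 20), (1559, 13936), (20, 257), (13936, 257)]

graph = {}
for a, b in links:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

paths = simple_paths(graph, 1559, 257)
print(paths)  # [[1559, 20, 257], [1559, 13936, 257]]
```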

So how do routers choose between different possible routes?

The routing decision

The BGP AS path

BGP does not work like traditional routing protocols, which make routing decisions based on metrics such as distance or cost (for example, bandwidth). Instead, BGP uses various attributes to route traffic.

The prime attribute of BGP is called “AS path”. This is a list of AS numbers describing the inter-AS path to a destination. The AS path is so critical to the function of BGP that the protocol is often referred to as a Path Vector routing protocol.

The figure above shows how the AS path is propagated.

The AS 1 peer sends its prefix to the AS 6 and AS 5 peers (AS path [1]), which in turn send the prefix list respectively to AS 3 (AS path [6, 1]) and AS 2 (AS path [5, 1]) peers. AS 2 peer propagates this prefix list to AS 4 peer (AS Path [2, 5, 1]). Finally, AS 4 peer propagates the AS 1 peer prefix to AS 3 peer (AS Path [4, 2, 5, 1]).

As a result, AS 3 can reach the AS 1 prefix through AS path [6, 1] as well as AS path [4, 2, 5, 1].
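
The propagation described above can be reproduced with a small simulation. This is a simplified sketch: the `announces_to` table encodes only the announcements mentioned in the text, every learned path is forwarded (a real BGP speaker advertises only its best path), and the function name `propagate` is hypothetical.

```python
from collections import deque

# Who announces to whom, as described in the text (one direction only).
announces_to = {1: [6, 5], 6: [3], 5: [2], 2: [4], 4: [3]}


def propagate(origin, announces_to):
    """Return {asn: [AS paths learned]} for a prefix originated by `origin`."""
    learned = {}
    queue = deque((peer, [origin]) for peer in announces_to.get(origin, []))
    while queue:
        asn, as_path = queue.popleft()
        if asn in as_path:  # loop prevention: own AS number already in the path
            continue
        learned.setdefault(asn, []).append(as_path)
        for peer in announces_to.get(asn, []):
            # Each AS prepends its own number before passing the prefix on.
            queue.append((peer, [asn] + as_path))
    return learned


learned = propagate(1, announces_to)
print(learned[3])  # [[6, 1], [4, 2, 5, 1]]
```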

The BGP routing decision process

From the previous example, you might think that the path chosen between the AS 1 and AS 3 peers will be the one via AS 6, because it is the shortest.

Well, that can be the case, but it is not a strict rule! In fact, the best path is chosen based on policies, which are configured through prefix filters, by announcing specific routes, or by manipulating BGP attributes.

When a destination is reachable from two different paths, BGP selects the best path by sequentially evaluating the path attributes:

  • Weight
  • Local preference
  • Locally originated route
  • AS path length
  • Origin code
  • MED (Multi Exit Discriminator)
  • eBGP path over iBGP path
  • Shortest IGP path to BGP next hop
  • Oldest path
  • Router ID
  • Neighbor IP address.

The main point here is not to go into all details of these attributes, but to understand the basic principle of the routing decision process.

Going back to the example above, if the “weight” attribute of the route with AS path [4, 2, 5, 1] is greater than that of the route with AS path [6, 1], then the longer path is chosen. If the “weight” attribute is equal for both paths, then the next attribute (local preference) is evaluated, and so on.
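
The principle of sequential evaluation can be sketched as a tuple comparison. This is a deliberately reduced model covering only four of the attributes listed above (weight, local preference, AS path length, MED); the `Route` class, its default values and the function names are assumptions made for the illustration, not a real router's implementation.

```python
from dataclasses import dataclass


@dataclass
class Route:
    as_path: list
    weight: int = 0        # higher is preferred (default is hypothetical)
    local_pref: int = 100  # higher is preferred (default is hypothetical)
    med: int = 0           # lower is preferred


def preference_key(route):
    # Tuples compare element by element, so a later attribute is only
    # consulted when all earlier ones are tied.
    return (-route.weight, -route.local_pref, len(route.as_path), route.med)


def best_path(routes):
    return min(routes, key=preference_key)


short = Route(as_path=[6, 1])
long = Route(as_path=[4, 2, 5, 1])
long_with_weight = Route(as_path=[4, 2, 5, 1], weight=200)

print(best_path([short, long]).as_path)              # [6, 1]
print(best_path([short, long_with_weight]).as_path)  # [4, 2, 5, 1]
```

With equal weight and local preference, the shorter AS path wins; raising the weight of the longer route is enough to override that.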

So, in short, by using BGP attributes, you can make sure your traffic will transit through your preferred AS’s, based for example on non-technical parameters like financial agreements you may have with other AS owners.

How and when is BGP routing data exchanged?

BGP uses the TCP transport protocol to transfer data. This provides reliable delivery of the BGP updates. BGP uses TCP port 179 for this.

It uses the Finite State Machine (FSM) model to maintain a table of all BGP peers and their operational status.
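
As a rough illustration of the state machine, the sketch below walks one peer through the six session states defined in the BGP specification (Idle, Connect, Active, OpenSent, OpenConfirm, Established). The event names, the reduced transition table and the peer address are simplifying assumptions; the real FSM has many more events and timers.

```python
# Simplified transition table: (current state, event) -> next state.
TRANSITIONS = {
    ("Idle", "start"): "Connect",
    ("Connect", "tcp_established"): "OpenSent",
    ("Connect", "tcp_failed"): "Active",
    ("Active", "tcp_established"): "OpenSent",
    ("OpenSent", "open_received"): "OpenConfirm",
    ("OpenConfirm", "keepalive_received"): "Established",
    ("Established", "keepalive_received"): "Established",
}


def next_state(state, event):
    # In this sketch, any event without an entry drops the session to Idle.
    return TRANSITIONS.get((state, event), "Idle")


def run(events, state="Idle"):
    for event in events:
        state = next_state(state, event)
    return state


# Table of peers and their operational status (address is illustrative).
peers = {"192.0.2.1": "Idle"}
handshake = ["start", "tcp_established", "open_received", "keepalive_received"]
peers["192.0.2.1"] = run(handshake)
print(peers)  # {'192.0.2.1': 'Established'}
```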

Unlike many other routing protocols, BGP does not send periodic updates of routing data. Instead, it sends updates only when changes occur on the network, for example because of session resets, link failures, or policy changes.

Finally, BGP periodically sends keep-alive messages to check that the session with the peer is still alive.
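
To give a feel for what travels over that TCP connection, here is a sketch that builds and parses a keep-alive message. It assumes the BGP-4 message header layout (a 16-byte marker of all ones, a 2-byte length and a 1-byte type, with type 4 for KEEPALIVE); the function names are hypothetical, and nothing is actually sent over the network.

```python
import struct

BGP_PORT = 179          # TCP port used by BGP
MARKER = b"\xff" * 16   # header marker: 16 bytes, all ones
HEADER_LENGTH = 19      # marker (16) + length (2) + type (1)
KEEPALIVE = 4           # message type code for KEEPALIVE


def build_keepalive():
    """A KEEPALIVE is just the 19-byte header, with no payload."""
    return struct.pack("!16sHB", MARKER, HEADER_LENGTH, KEEPALIVE)


def parse_header(data):
    """Return (length, message type) from the first 19 bytes of a message."""
    marker, length, msg_type = struct.unpack("!16sHB", data[:HEADER_LENGTH])
    if marker != MARKER:
        raise ValueError("unexpected marker")
    return length, msg_type


message = build_keepalive()
print(len(message))           # 19
print(parse_header(message))  # (19, 4)
```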

What can go wrong with BGP?

First, we have seen that BGP peering is configured manually. Human configuration is prone to errors or, worse, open to malicious attacks.

As an example, do you remember the IBM cloud outage back in June 2020? This was due to a BGP hijacking!

Another recent example from December 2020 is the Google Euro-Cloud outage due to an incorrect Access Control List configuration, which led the BGP routing protocol to withdraw the europe-west2-a availability zone from the rest of the Google backbone network.

Secondly, and this is certainly one of the major challenges for BGP, processing updates of large routing tables can be a problem for some routers.

Each router needs to store a local database of all prefixes announced by each routing peer. A router has a finite capacity to process updates and once the update rate exceeds its local processing capability, then the router will start to queue up unprocessed updates. In the worst case, the router will start to lag in real time, so that the information a BGP speaker is propagating reflects a past local topology, not necessarily the current local topology. At its most benign, the router will advertise ‘ghost’ routes where the prefix is no longer reachable, yet the out-of-sync router will continue to advertise reachability.
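
The queuing effect can be illustrated with a toy model. The arrival numbers and the capacity of 10 updates per tick are made up for the example, and `backlog_over_time` is a hypothetical helper; it only shows how a backlog builds up once the update rate exceeds the processing capacity, and how slowly it drains afterwards.

```python
def backlog_over_time(arrivals_per_tick, capacity_per_tick):
    """Track how many updates are still waiting after each tick."""
    backlog = 0
    history = []
    for arrivals in arrivals_per_tick:
        # Process up to `capacity_per_tick` updates; the rest stays queued.
        backlog = max(0, backlog + arrivals - capacity_per_tick)
        history.append(backlog)
    return history


# A burst of 20 updates per tick against a capacity of 10 per tick.
history = backlog_over_time([5, 5, 20, 20, 5, 5], capacity_per_tick=10)
print(history)  # [0, 0, 10, 20, 15, 10]
```

While the backlog is non-zero, whatever the router advertises is based on updates it has not yet caught up with.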

The following graph shows the evolution of the IPv4 BGP routing table size since the very beginning of BGP:

As you can see, even with the exhaustion of available IPv4 addresses, the size of routing tables keeps growing dramatically!

The APNIC organization has published the following detailed article on this topic.

Not only is the average BGP routing table size increasing, the frequency of BGP updates is rising as well. In another article, APNIC uses a specific AS (AS 131072) to measure the number of routing table updates per year. Starting at 300,000 updates in 2009 (about 30 updates per hour), it exceeded 800,000 updates in 2020 (about 90 updates per hour).
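
The hourly figures follow from a simple division, sketched below. The calculation assumes 365-day years and ignores leap years, so the results only need to agree with the rounded values quoted above.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def updates_per_hour(updates_per_year):
    return updates_per_year / HOURS_PER_YEAR


print(round(updates_per_hour(300_000)))  # 34
print(round(updates_per_hour(800_000)))  # 91
```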

Takeaway

BGP is an important part of the Internet foundations.

As it must be configured manually, it is prone to human error as well as security attacks.

Furthermore, the evolution of networks makes it more and more susceptible to instabilities and service disruptions.

In a global digital services context, monitoring BGP behavior in terms of path discovery, performance, and path changes becomes critical!

If you would like to learn more about how you can monitor network performance from the internet to your AS (or your cloud provider’s AS) or from your AS to different digital assets, I recommend that you read this article.