Facebook: An innovative data center network topology


Aerial view of Facebook's data center in Altoona, Iowa

Facebook's data centers receive billions of user requests every day, and the number keeps growing as the company adds members and launches new features. That growth is good for Facebook, but it is a challenge for Facebook's network staff. For example, a data center topology that met requirements five months ago is now overwhelmed.

So in addition to building large data centers, like this one in Altoona, Iowa, Facebook engineers are constantly optimizing data center network design. That said, "tweaking" or "optimizing" may not be the right word for the ideas the engineers came up with and implemented in the Altoona data center. It is more like they rewrote the network design guidelines.

The old Facebook network

Before the Altoona data center was built, Facebook engineers arranged the server racks in the data center into clusters similar to the architecture shown in Figure A. In a real environment, Facebook would have hundreds of racks instead of just three. The figure also shows the top-of-rack (TOR) switches for each rack, which act as intermediaries between the servers and the upstream aggregation switches.

Figure A: Top of Rack (TOR) - Network Connection Architecture

This architecture worked well, but it presented several challenges for Facebook engineers. "First, the size of the cluster was limited by the port density of the cluster switches," explains Alexey Andreyev, a network engineer at Facebook. "To build the largest clusters, we needed the largest network equipment, which was available from a limited number of vendors. Also, needing so many ports in a device was inconsistent with the desire to provide the highest bandwidth infrastructure. Even more difficult was finding the optimal long-term balance between cluster size, rack bandwidth, and bandwidth outside the cluster."

Fabric: A new network topology

Seeing those billions of requests per day as an incentive, engineers decided to move away from the complex, bandwidth-hungry top-down network hierarchy in favor of a new design called Fabric. The slide in Figure B depicts the new cluster of server racks, called a pod. A single pod includes 48 racks, and each rack's top-of-rack switch connects to the pod's four fabric switches. "Each top-of-rack switch currently has four 40G uplinks, providing a total of 160G of bandwidth capacity for a rack of 10G-connected servers."
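The uplink arithmetic in the quote above can be checked with a minimal sketch. The uplink count and link speeds come from the article; the number of servers per rack is not stated, so `servers_per_rack` is a hypothetical parameter used only for illustration.

```python
# Figures taken from the article.
UPLINKS_PER_TOR = 4       # 40G uplinks on each top-of-rack switch
UPLINK_GBPS = 40
SERVER_PORT_GBPS = 10     # servers attach to the TOR at 10G


def tor_uplink_capacity_gbps(uplinks=UPLINKS_PER_TOR, uplink_gbps=UPLINK_GBPS):
    """Total bandwidth from one top-of-rack switch up to the fabric switches."""
    return uplinks * uplink_gbps


def oversubscription_ratio(servers_per_rack, server_gbps=SERVER_PORT_GBPS):
    """Server-facing bandwidth divided by uplink bandwidth for one rack.

    servers_per_rack is an assumption; the article does not give a rack size.
    A ratio of 1.0 or less means the rack's uplinks are not oversubscribed.
    """
    downlink_gbps = servers_per_rack * server_gbps
    return downlink_gbps / tor_uplink_capacity_gbps()


if __name__ == "__main__":
    print(tor_uplink_capacity_gbps())   # 4 * 40 = 160
    print(oversubscription_ratio(16))   # 160 / 160 = 1.0
```

Under these assumptions, 160G of uplink capacity covers sixteen 10G servers at line rate; a denser rack would raise the ratio above 1.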

Figure B

This design approach has the following advantages:

• Easy to deploy pods with 48 nodes

• Scalability is simplified and unlimited

• Each pod is identical and uses the same connection

The next step is to connect all the fabric switches -- the slide in Figure C describes how this is done. Andreyev says this part is relatively simple (which makes it hard to imagine how things used to be).

Figure C

Andreyev explained that Facebook engineers adhered to this 48-node rule when adding spine switches. "To implement connectivity across the entire building, we created four separate 'planes' of spine switches, each of which can scale up to 48 independent devices. Each fabric switch in each pod is connected to each spine switch in the local plane."

The numbers Andreyev mentioned next are staggeringly large. "Together, the pods and planes form a modular network topology capable of accommodating hundreds of thousands of servers connected with 10G, scaling to multiple petabits of bisection bandwidth, and providing non-oversubscribed rack-to-rack performance for our data center buildings."
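To make the pod-and-plane structure concrete, here is a rough model that counts links and compares a pod's downlink and uplink bandwidth. The 48 racks per pod, four planes, 48 spine switches per plane, and 40G inter-switch links are figures from the article. The number of pods, the `FabricModel` class itself, and the reading that each pod has one fabric switch per plane with a single 40G link per switch pair are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FabricModel:
    """Hypothetical counting model of the pod/plane topology."""
    pods: int                   # assumption: the article gives no pod count
    racks_per_pod: int = 48     # from the article
    planes: int = 4             # assumed: one fabric switch per pod per plane
    spines_per_plane: int = 48  # upper bound quoted in the article
    link_gbps: int = 40         # all inter-switch links are 40G

    def tor_to_fabric_links(self) -> int:
        # Every top-of-rack switch has one uplink to each fabric switch in its pod.
        return self.pods * self.racks_per_pod * self.planes

    def fabric_to_spine_links(self) -> int:
        # Every fabric switch connects to every spine switch in its own plane.
        return self.pods * self.planes * self.spines_per_plane

    def pod_downlink_gbps(self) -> int:
        # Bandwidth entering a pod's fabric switches from its racks.
        return self.racks_per_pod * self.planes * self.link_gbps

    def pod_uplink_gbps(self) -> int:
        # Bandwidth leaving a pod's fabric switches toward the spine planes.
        return self.planes * self.spines_per_plane * self.link_gbps


if __name__ == "__main__":
    model = FabricModel(pods=10)
    print(model.tor_to_fabric_links(), model.fabric_to_spine_links())
    print(model.pod_downlink_gbps(), model.pod_uplink_gbps())
```

With planes fully populated at 48 spine switches, the model gives each pod as much bandwidth toward the spines as it receives from its racks, which is consistent with the non-oversubscribed rack-to-rack claim in the quote.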

Network Operations

From the top-of-rack switches to the edge of the network, the Fabric design uses Layer 3 throughout, supports IPv4 and IPv6, and uses equal-cost multi-path (ECMP) routing. Andreyev added: "To prevent occasional 'elephant flows' from taking up a large share of bandwidth and degrading end-to-end path performance, we made the network multi-speed, with 40G links between all switches, while servers connect through 10G ports on the top-of-rack switches. We also have server-side mechanisms so that in the event of a problem, we can route around the fault."
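The idea behind ECMP can be sketched as hashing a flow's identifying fields and using the result to pick one of several equal-cost next hops. This is a generic illustration, not Facebook's implementation: real switches use hardware hash functions, and the switch names, the example flow, and the use of SHA-256 here are all assumptions.

```python
import hashlib


def ecmp_next_hop(flow, next_hops):
    """Pick one equal-cost next hop for a flow by hashing its 5-tuple.

    flow is (src_ip, dst_ip, protocol, src_port, dst_port). Every packet of
    the same flow hashes to the same next hop, which keeps packets in order
    but also means one large flow stays pinned to a single path.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


if __name__ == "__main__":
    # Hypothetical names for the four fabric switches reachable from one TOR.
    fabric_switches = ["fabric-1", "fabric-2", "fabric-3", "fabric-4"]
    flow = ("10.0.0.1", "10.0.1.1", 6, 49152, 443)
    print(ecmp_next_hop(flow, fabric_switches))
```

Because a flow is pinned to one path, a single 10G server flow can occupy at most a quarter of a 40G inter-switch link, which is one way to read the multi-speed design described in the quote.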

Physical layout

Andreyev wrote that the new building layout shown in Figure D is not much different from Facebook's previous design. One difference is that Fabric's new spine and edge switches are placed on the first floor between Data Hall X and Data Hall Y, and the network connection to the outside world (minimum point of entry, or MPOE) spans the area of the spine and edge switches.

Figure D

Overcoming challenges

Facebook engineers appear to have overcome the challenges they faced. Hardware limitations are no longer an issue, and both the number of distinct components and the overall complexity have been reduced. Andreyev said the technical team followed the "KISS (keep it simple, stupid)" principle. He added at the end of the article: "Our new fabric is not an exception to this approach. Although this topology is large and complex, it is actually a highly modular system with many repeated parts. It is easy to automate and deploy, and it is simpler to operate than a smaller batch of custom clusters."
