Actual combat case: Explosion! Accessing a certain IP in the intranet caused the entire network loop to crash. The root cause was this lazy configuration...

Background

The client company is a retail company with about 200 employees. The company's network is a classic full-gigabit three-layer network architecture, with an export iKuai soft router + an intranet switch. The general topology is shown in the figure below:

Typical topology

The network segment planning is as follows:

Wireless network segment: VLAN10-172.16.10.0/24
Monitoring network segment: VLAN20-172.16.20.0/24
Office network segment: VLAN30-172.16.30.0/24
Management network segment: VLAN100-192.168.90.0/24 (including all switches, routers, and servers)

The current problem is: the company's IT often finds that the CPU of the iKuai export router often soars to more than 90%, accessing pages is particularly slow, and at the same time, the terminals below have problems such as lag and delay when accessing the Internet, and the entire network has collapsed.

Existing analysis

Restarting the router and core switch did not work, but the problem persisted.
Traffic statistics of the router LAN port show that the bandwidth of the Gigabit link is fully utilized, but strangely, there is no large traffic on the WAN port?

Unplug the core switch, find a PC to connect directly to the iKuai router, the speed will be so fast!

The above tests show that there should be no major problems with routers, switches and other devices. When encountering this kind of high throughput statistics problem, it is speculated that there is a loop in the network, causing a broadcast storm to consume all the bandwidth and destroy the address table?

Troubleshooting Analysis

Step 1: Confirm the problem

We use the VLAN30 office area PC to ping Baidu IP:

You can see that the packet loss is quite serious.

Step 2: Confirm the loop problem in the network

The most common loops we encounter are Layer 2 loops, which have two main manifestations:

Broadcast storm: causes the port bandwidth to be fully loaded and the address table to be destroyed;
Multicast/broadcast flooding: If too many DHCP, ARP, and other messages flow into the CPU, CPU performance will also be reduced.

As shown in the topology above, because only the WAN and LAN ports of the router are in use, from the previous traffic statistics analysis, "there is no large traffic on the WAN port, but the traffic on the LAN port is abnormally large". We only need to check the data statistics on the uplink interface of the core switch to confirm whether this large traffic is broadcast & multicast!

The command is as follows:

 <核心交换机>dis int g0/0/1释义：查看上联口数据统计，每隔5s敲一次确认报文变化量

From the above figure, we can see that when the problem reappears, the multicast and broadcast packets on the core switch uplink GE0/0/1 interface increase very slowly within 5 seconds, which basically eliminates the possible existence of Layer 2 loops and packet flooding.

But please note: if you look closely at the statistics of the core switch uplink interface, 35,000 unicast packets are sent and received per second within 5 seconds, which is equivalent to 7,000 bidirectional packets sent and received per second. This amount is amazing! How can unicast packets have such a large throughput when the entire network is paralyzed?

The next step is to capture the data flow analysis.

Step 3: Intranet trunk data flow analysis

We mirror the flow of the upstream egress routing interface on the core switch:

The captured packets are as follows:

The analysis from this flow is as follows:

This flow is a UDP flow from the source VLAN20 monitoring network to the destination 172.16.40.0/24. According to statistics, the throughput of these flows is as high as 1Gbps, which is the main reason for the full Gigabit link bandwidth and the main reason for the router to be fully loaded with sending and receiving packets, resulting in a continuous high CPU. So why do these "abnormal flows" appear between the core switch and the router and flood? Let's take a look at the MAC address of the data packet and compare it:

Let’s look at the first one in chronological order:

Look at the second one in chronological order:

The analysis shows that this UDP packet circulates between the switch and the router, that is, SW—>R—>SW—>R... It keeps circulating until TTL=0 and then the UDP packet is discarded!

Conclusion:

There is a routing loop between the router and the switch. The destination network segment accessed by the monitoring network is 172.16.40.0/24. But strangely, this network segment does not exist in the intranet. Even if the monitoring device tries to access this network segment, it should be sent out from the WAN port of the router. Why does it bounce back? It must be related to the configuration!

Step 4: Check the routing tables of core switches and routers

The core switch routing table is as follows:

The core switch configuration is standard and there is no problem. Let’s take a look at the iKuai routing table:

For the sake of convenience, the IT staff configured the return route on iKuai as 172.16.0.0/16 (including all network segments of the intranet) with the next hop as the core. This is a disaster! Once the intranet accesses 172.16.1.X, 172.16.2.X, and 172.16.200.X, the trunk route will be looped and the link bandwidth will be exploded!

Summary and solutions

The summary is as follows

The project intranet has three VLAN segments, namely 172.16.10/20/30.X;
A core switch has all 0 routes and the next hop points to the iKuai egress router to ensure access to the external network. This configuration is fine.
To save trouble, iKuai's return static route is configured as one: the destination 172.16.0.0/16 next hop points to the core. Under this configuration, a routing loop occurs when accessing an IP that is not in the intranet segment and is in the 172.16.0.0/16 segment. For example, when accessing 172.16.1.1, the path tracking is as follows:

Solution

The return route must include details, do not use large network segments! As follows:

<<: Will there be any problems if the algorithms on both ends of a switch link aggregation are inconsistent?

>>: One minute to show you: the complete process from entering a URL to displaying the page

RackNerd: AMD Ryzen packages are available, starting at $18.88 per year for 1GB memory packages, San Jose/New York data centers

Blog

What is DHCP Option 43? Have you learned it?

[Black Friday] RAKsmart: VPS flash sale from $0.99/month, cloud server flash sale from $1.99/month, dedicated server flash sale from $30/month

Blog

Outstanding Network Virtualization Solutions in 2021

Blog

5G concepts are performing well. Who will become the best among the strong?

Blog

DogYun National Day promotion: 30% off on Elastic Cloud/20% off on Classic Cloud, 10 yuan free for every 100 yuan top-up, 100 yuan off for dedicated servers

Blog

What happens if you apply for 8G memory on a machine with 4GB physical memory?

Blog

Recommend

F5 President Fan Zhonglin reveals the story behind the acquisition of Nginx. What will F5 do after the perfect complement?

[51CTO.com original article] On March 11, 2019, F...

Huawei releases next-generation machine vision cameras and new products for 2020

[Hangzhou, China, May 25, 2020] Today, the Huawei...

What exactly does edge computing mean?

The word "edge" has been given a new de...

The preliminary round of the 10th "H3C Cup" National College Students Digital Technology Competition 2020 was successfully concluded

[51CTO.com original article] On November 14, 2020...

Sharktech cloud server 35% off annual payment starting at $33, 2G memory/40G hard drive/4TB traffic/multiple computer rooms available

Sharktech, also known as SK or Shark Data Center,...

NDRC: Strengthen new infrastructure such as 5G and industrial Internet to promote integrated innovation of technologies such as AI

At 10 a.m. on March 23, 2020, the Joint Preventio...

The United States has approved 6G trials. Is this a far-sighted move or a desperate attempt?

Recently, US President Trump announced the approv...

13 key technical differences between SD-WAN providers

Choosing the right software-defined WAN vendor ca...

Actual combat case: Explosion! Accessing a certain IP in the intranet caused the entire network loop to crash. The root cause was this lazy configuration...

Background

Existing analysis

Troubleshooting Analysis

Summary and solutions

RackNerd: AMD Ryzen packages are available, starting at $18.88 per year for 1GB memory packages, San Jose/New York data centers

What is DHCP Option 43? Have you learned it?

Network charges are more affordable and 5G demand is gradually released

Let's talk about UPNP and DLNA protocols

2017 is the turning point year with three main investment themes: 5G, Internet of Things and optical communications

[Black Friday] RAKsmart: VPS flash sale from $0.99/month, cloud server flash sale from $1.99/month, dedicated server flash sale from $30/month

Outstanding Network Virtualization Solutions in 2021

5G concepts are performing well. Who will become the best among the strong?

DogYun National Day promotion: 30% off on Elastic Cloud/20% off on Classic Cloud, 10 yuan free for every 100 yuan top-up, 100 yuan off for dedicated servers

What happens if you apply for 8G memory on a machine with 4GB physical memory?

Recommend

F5 President Fan Zhonglin reveals the story behind the acquisition of Nginx. What will F5 do after the perfect complement?

Huawei releases next-generation machine vision cameras and new products for 2020

What exactly does edge computing mean?

The preliminary round of the 10th "H3C Cup" National College Students Digital Technology Competition 2020 was successfully concluded

The relationship and difference between URL, URI and URN

Hostaris: £18/year-2*AMD Epyc 7401P/2GB/50G NVMe/4TB/Germany data center

Current Affairs | How many cards does the US have left to crush China’s 5G?

Are you ready for 5G? Five new applications you'll see

Talk about the communication protocol I2C subsystem Hs Mode

Servzen: $2.49/month OpenVZ-1GB/20GB/1Gbps unlimited traffic/Netherlands data center

Several thinking patterns that need to be changed in the 6G era

Sharktech cloud server 35% off annual payment starting at $33, 2G memory/40G hard drive/4TB traffic/multiple computer rooms available

NDRC: Strengthen new infrastructure such as 5G and industrial Internet to promote integrated innovation of technologies such as AI

The United States has approved 6G trials. Is this a far-sighted move or a desperate attempt?

13 key technical differences between SD-WAN providers