As a data center network grows, more network devices must be added and cascaded in multiple layers. Today's data centers usually have a tree topology: a few devices with large forwarding capacity sit at the core, and several layers of devices hang below them (extra layers are needed when port counts run short). Dozens or even hundreds of network devices end up cascaded together. When a fault occurs, quickly finding the faulty device is a persistent headache for network operations and maintenance staff. Data center network equipment is deployed redundantly, so once the faulty device is found and isolated, service can be restored and the root cause can be investigated afterwards without time pressure. Finding one faulty device among hundreds, however, is not easy. Network faults are usually reported first by the application side, and troubleshooting starts from there. Application staff often describe only a symptom, such as an application that cannot be accessed. They do not say which addresses cannot reach which other addresses, and sometimes the information they give is wrong, which greatly delays fault location. Most of the time spent locating a problem goes into sorting out the symptoms. So how can a data center network be troubleshot quickly? This article offers an answer.
If you wait to analyze a network fault from the symptoms reported by the application side, you are already too late, and you are easily misled. Application staff report only what they see, which is likely a local symptom that does not reflect the state of the whole network. You therefore have to rely on yourself: monitor the network well, discover problems through monitoring, quickly find the faulty device, and then isolate it or fix the fault. Early network monitoring mainly watched device logs and port traffic. More often than not this information was insufficient, and problems were not discovered in time. Many network equipment vendors claim that their device logs are very complete, but in practice some corner cases or software bugs still produce no log output when a fault occurs. The fault then has to be located with traffic statistics. Network staff ask the application staff about the symptoms, identify on site some IP addresses that are losing packets or are unreachable, and then collect traffic statistics on every device that the affected traffic passes through in order to find the faulty one. Because the network is a tree with many devices at each layer, the number of statistics to collect is large. Moreover, not every device supports statistics on every kind of traffic signature. Where a device does not, the statistics are incomplete, which makes the faulty device harder to find. This is how network operations teams have gotten by for years. The older troubleshooting methods do work, but they are inefficient, take a long time to locate faults, and have a large impact on the business.
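The hop-by-hop traffic statistics described above can be sketched as follows. This is a minimal illustration, not a vendor procedure: the device names, the `locate_faulty_hop` function, and the 1% loss threshold are all assumptions, and the counters are assumed to be the number of packets of the affected flow that each device forwarded. A device that cannot count the flow is given as `None`, which is why it stays on the suspect list instead of being cleared.

```python
def locate_faulty_hop(path, forwarded, sent, loss_threshold=0.01):
    """Return the devices suspected of dropping a flow, or [] if no loss is seen.

    path:      device names in forwarding order (hypothetical names)
    forwarded: device -> packets of the flow it forwarded, or None when the
               device does not support counting this flow
    sent:      packets of the flow sent by the source
    """
    upstream = sent      # last trustworthy packet count seen so far
    suspects = []        # devices passed since then that have no counter
    for device in path:
        count = forwarded.get(device)
        if count is None:
            # No statistics here: the device can be neither cleared nor blamed.
            suspects.append(device)
            continue
        if upstream > 0 and (upstream - count) / upstream > loss_threshold:
            # Loss occurred somewhere between the last good counter and here.
            return suspects + [device]
        upstream = count
        suspects = []
    return []


path = ["leaf1", "agg1", "core1", "agg2", "leaf2"]
forwarded = {"leaf1": 1000, "agg1": 1000, "core1": None, "agg2": 400, "leaf2": 400}
print(locate_faulty_hop(path, forwarded, sent=1000))
```

In this made-up example the core device has no counter for the flow, so the result names both `core1` and `agg2`: the statistics narrow the search but cannot single out one device.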
Today's network monitoring centers on data flows: specific flows in the network are monitored so that when one is interrupted, the location of the fault can be found immediately. Several emerging network monitoring methods, also known as network visualization technologies, deserve mention here, because they are the most effective means of rapid troubleshooting.
With the above network monitoring methods, it is not hard to be the first to detect a fault, and the whole process can be automated. When a fault is detected, the monitoring server automatically sends an isolation command to take the faulty device out of service, and traffic recovers automatically over the redundant path. In this way the fault can be located, the faulty device isolated, and the business restored before the application side even reports a problem. This greatly shortens fault analysis time and limits the impact on the business; the business side may not notice the fault at all. How well monitoring technologies such as INT and ERSPAN work in practice is still unknown. They have come into discussion only recently and still need to be proven in production. sFlow and NetStream are relatively mature, but they are not yet widely used for network troubleshooting and deserve more promotion for that purpose. With these monitoring technologies, network faults can be cleared quickly, which matters a great deal to data center operations and maintenance and greatly improves its efficiency.
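The automated decision described above might look roughly like the sketch below. It is only an illustration under stated assumptions: every device is assumed to report a timestamp each time it forwards a monitored flow, and the function names, the 5-second timeout, and the rule that at least two flows must implicate a device are hypothetical. Sending the actual isolation command is vendor-specific and is deliberately left out.

```python
def find_silent_device(path, last_seen, now, timeout_s):
    """Return the first device on the path that has stopped reporting the flow."""
    for device in path:
        ts = last_seen.get(device)
        if ts is None or now - ts > timeout_s:
            return device
    return None


def devices_to_isolate(flows, now, timeout_s=5.0, min_flows=2):
    """Pick devices implicated by at least min_flows interrupted flows.

    flows: flow id -> (path, last_seen), where last_seen maps a device name
           to the time it last reported forwarding that flow.
    """
    votes = {}
    for path, last_seen in flows.values():
        device = find_silent_device(path, last_seen, now, timeout_s)
        if device is not None:
            votes[device] = votes.get(device, 0) + 1
    # Requiring several flows to agree avoids isolating a device over one idle flow.
    return sorted(d for d, n in votes.items() if n >= min_flows)


flows = {
    "f1": (["leaf1", "agg1", "core1"], {"leaf1": 100, "agg1": 90, "core1": 90}),
    "f2": (["leaf2", "agg1", "core2"], {"leaf2": 100, "agg1": 91}),
    "f3": (["leaf3", "agg2"], {"leaf3": 100, "agg2": 100}),
}
print(devices_to_isolate(flows, now=100))
```

Here two independent flows both go silent at `agg1`, so it is the only candidate for isolation; a real deployment would also need to confirm that a redundant path exists before isolating anything.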