TCP retransmission problem troubleshooting ideas and practices

1. About TCP retransmission

TCP retransmission is a normal mechanism that ensures reliable data delivery. In a LAN, network quality is generally good, so retransmissions caused by the network itself should be extremely rare. On the Internet or a metropolitan area network, the paths are complex (think of a city's underground pipe network or its tangle of utility poles), network quality is not guaranteed, and retransmissions are more likely.

TCP retransmission is not necessarily a network-level problem. It can also occur because the receiving end does not exist, the receiver's receive buffer is full, or the application has abnormal connections that were not closed properly.

2. TCP/IP related

To troubleshoot network problems, you need to understand how TCP/IP works; the truth is in each packet. The following are several key parameters related to TCP retransmission.

2.1 Parameters when establishing a TCP connection


2.2 TCP retransmission types

Timeout retransmission

When a segment is sent, a timer is started. If no ACK arrives before the timer expires, the segment is resent, and this repeats until an ACK is received or the retransmission limit is reached.
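The timer behaviour can be sketched in a few lines. This is only an illustration: the doubling of the timer after each expiry is the backoff rule from RFC 6298, and the starting value, cap, and retry limit below are assumed example numbers, not values taken from any particular host.

```python
def retransmit_schedule(initial_rto_ms=200, max_rto_ms=120_000, max_retries=5):
    """Return the timer value (in ms) armed before each retransmission.

    All three parameters are illustrative assumptions; real stacks derive
    the initial value from measured round-trip times.
    """
    waits = []
    rto = initial_rto_ms
    for _ in range(max_retries):
        waits.append(rto)
        # Back off: double the timer after every expiry, up to the cap.
        rto = min(rto * 2, max_rto_ms)
    return waits


if __name__ == "__main__":
    schedule = retransmit_schedule()
    print("timers (ms):", schedule)
    print("total wait before giving up (ms):", sum(schedule))
```

With these example numbers the sender waits 200, 400, 800, 1600, and 3200 ms in turn, which is why a connection to an unreachable peer can appear to hang for a long time before it fails.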

Fast Retransmit

When the receiver gets a segment whose sequence number is out of order, it repeatedly sends the ACK for the data it is still expecting. If the sender then receives three consecutive duplicate ACKs with the same acknowledgment number, the fast retransmit mechanism is triggered and the segment that the ACK asks for is resent without waiting for the timer to expire.
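A minimal sketch of the sender-side counting follows. It assumes the RFC 5681 convention that the first ACK for a given number is the original and only the repeats count as duplicates; the function name and the sample ACK numbers are made up for illustration.

```python
def fast_retransmit_triggers(acks, dup_threshold=3):
    """Return the ACK numbers at which a sender would fast-retransmit.

    `acks` is the sequence of acknowledgment numbers seen by the sender.
    The first occurrence of a number is the original ACK; each immediate
    repeat is counted as a duplicate.
    """
    triggers = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == dup_threshold:
                # Third duplicate: resend the segment starting at `ack`.
                triggers.append(ack)
        else:
            last_ack = ack
            dup_count = 0
    return triggers


if __name__ == "__main__":
    # Hypothetical ACK stream: the segment starting at 2000 was lost.
    print(fast_retransmit_triggers([1000, 2000, 2000, 2000, 2000, 3000]))
```

Only two duplicates are not enough to trigger a resend in this sketch, which is why a loss near the end of a transfer often falls back to timeout retransmission instead.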


3. Common problems and solutions

3.1 TCP retransmission on a single machine or on the machines of a single application

The server or port that the machine connects to may be unreachable.

Troubleshooting ideas


3.2 TCP retransmission on multiple machines or multiple applications simultaneously

The cause may be network jitter.

Troubleshooting ideas

1. Check the monitoring probes for the network region and the alarms on network equipment to see whether there is regional network jitter.
2. If the regional network is fine, use the methods from the other common problems in this section to narrow the scope of the investigation.

3.3 Bandwidth is saturated

Troubleshooting ideas

1. View host monitoring
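If no host monitoring is at hand, link utilization can be estimated from two samples of the interface byte counters. The sketch below assumes a Linux host (it parses the `/proc/net/dev` format) and that the link speed is known and passed in by the caller; both function names are hypothetical.

```python
def parse_net_dev(text):
    """Parse /proc/net/dev content into {interface: (rx_bytes, tx_bytes)}."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, data = line.split(":", 1)
        fields = data.split()
        if len(fields) < 16:
            continue  # not a complete counter line
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def utilization(before, after, interval_s, link_mbps):
    """Return {interface: (rx_fraction, tx_fraction)} of link capacity used.

    `before` and `after` are results of parse_net_dev taken `interval_s`
    seconds apart; `link_mbps` is the assumed link speed in Mbit/s.
    """
    capacity_bytes = link_mbps * 1_000_000 / 8 * interval_s
    result = {}
    for iface, (rx1, tx1) in after.items():
        if iface not in before:
            continue
        rx0, tx0 = before[iface]
        result[iface] = ((rx1 - rx0) / capacity_bytes,
                         (tx1 - tx0) / capacity_bytes)
    return result
```

In use, read `/proc/net/dev` twice a few seconds apart and pass both parsed snapshots to `utilization`; a fraction that stays close to 1.0 in either direction suggests that the retransmissions come from a saturated link.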

3.4 Uncommon Problems

1. Packet checksum failures caused by a faulty network device port or optical module.
2. Jitter during network route convergence.
3. Bugs in the host network driver, in network devices, and so on.

4. How to monitor

Use tsar --tcp -C to monitor the retran attribute of TCP, that is, the number of retransmissions.

    tsar --tcp -C | sed 's/:/_/g;s/=/ /g' | xargs -n 2

Interested readers can also run a monitoring script that collects TCP status data in a form suitable for open-falcon.
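As one possible shape for such a script (this is a sketch, not the author's script), the code below computes a retransmission percentage from two snapshots of the `Tcp:` counters in `/proc/net/snmp` on Linux and wraps it in the field layout that open-falcon's push interface expects. The metric name `tcp.retrans.percent` and the function names are hypothetical.

```python
import time


def parse_tcp_counters(snmp_text):
    """Parse the two 'Tcp:' lines of /proc/net/snmp into {name: int}."""
    tcp_lines = [line.split()[1:]
                 for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    if len(tcp_lines) != 2:
        raise ValueError("expected one header line and one value line for Tcp")
    names, values = tcp_lines
    return {name: int(value) for name, value in zip(names, values)}


def retrans_percent(before, after):
    """Retransmitted segments as a percentage of segments sent in between."""
    sent = after["OutSegs"] - before["OutSegs"]
    retrans = after["RetransSegs"] - before["RetransSegs"]
    return 100.0 * retrans / sent if sent > 0 else 0.0


def falcon_metric(endpoint, value, step=60, timestamp=None):
    """Build one open-falcon style data point (metric name is made up)."""
    return {
        "endpoint": endpoint,
        "metric": "tcp.retrans.percent",  # hypothetical metric name
        "timestamp": int(time.time() if timestamp is None else timestamp),
        "step": step,
        "value": value,
        "counterType": "GAUGE",
        "tags": "",
    }
```

To use it, read `/proc/net/snmp` once per `step` seconds, feed consecutive snapshots to `retrans_percent`, and send a JSON list of the resulting data points to the local open-falcon agent.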


5. Case practice

(1) Capture packets on the machine that is seeing packet loss and retransmission, then analyze them with Wireshark. Because retransmission does not happen all the time, the capture command must be left running continuously in order to catch the retransmitted packets. Open the tcpdump output in Wireshark and enter tcp.analysis.retransmission in the display filter bar to get the following result:


Figure 1 shows that the server has retransmitted three times.

(2) Because there are many packets, use Wireshark's Follow TCP Stream feature to isolate the TCP stream involved in the retransmission.


Figure 2: Follow --> TCP Stream retrieves the packets related to the retransmission.


Figure 3 shows the request and response of the client and server.

(3) Analyze the retransmission

The following points need explanation:

No. 67-68: for some reason the client does not receive the expected data and sends duplicate ACKs to the server. See fast retransmit in section 2.2.

No. 68-69: the time difference is 200 ms (note the Time column; the other packets are less than 1 ms apart). The server waits for the timeout and then retransmits.

No. 73-74: the client sends a FIN packet and actively closes the connection.
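Spotting a gap like the one between No. 68 and No. 69 by eye is tedious in a long capture. The sketch below scans a list of (packet number, timestamp in seconds) pairs, for example exported from the Time column, and reports consecutive packets that are at least a threshold apart. The function name is hypothetical and the sample timestamps are invented for illustration; they are not the values from this capture.

```python
def find_gaps(packets, threshold_s=0.2):
    """Return (prev_no, next_no, gap_seconds) for consecutive packets
    whose timestamps are at least `threshold_s` apart.

    `packets` is a list of (packet_number, time_in_seconds) tuples in
    capture order.
    """
    gaps = []
    for (prev_no, prev_t), (next_no, next_t) in zip(packets, packets[1:]):
        gap = next_t - prev_t
        if gap >= threshold_s:
            gaps.append((prev_no, next_no, gap))
    return gaps


if __name__ == "__main__":
    # Invented timestamps: sub-millisecond spacing except for one pause.
    sample = [(66, 1.0000), (67, 1.0004), (68, 1.0008),
              (69, 1.2010), (70, 1.2013)]
    for prev_no, next_no, gap in find_gaps(sample):
        print(f"No. {prev_no} -> No. {next_no}: {gap * 1000:.1f} ms")
```

A pause close to a round timer value in an otherwise tightly spaced stream is a hint that the sender was waiting for its retransmission timer rather than reacting to duplicate ACKs.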

This problem occurred only once and has not been reproduced, so packet capture and analysis did not lead to a clear conclusion.

6. Summary

This article summarizes how I handled the TCP retransmission problems I encountered at work, focusing on the general approach and the concrete practices. It contains little theory; if you are interested, read further material to gain a deeper understanding of how TCP works.
