Chapter 5 — Network Performance

Network problems are the easiest bottleneck to miss. When CPU and memory look healthy, many administrators declare the server fine — but a 200ms DNS lookup on every database query, or a NIC saturated by a background rsync, can make an application feel broken without leaving any obvious trace in the usual tools. This chapter covers how to systematically rule network in or out, and how to find and fix it when it's the culprit.

What this chapter covers: Bandwidth vs latency — two separate problems. Diagnosing with ping, mtr, and traceroute. Reading ss (the modern netstat). TCP connection states — ESTABLISHED, TIME_WAIT, CLOSE_WAIT and what each means. Scenario 1: app is slow but CPU/mem are idle. Scenario 2: thousands of TIME_WAIT connections. Scenario 3: NIC saturation — finding the process and throttling it. /proc/net/dev for NIC errors and drops.

Bandwidth vs Latency — Two Different Problems

🚿

Bandwidth (Throughput)

The maximum data transfer rate: how wide the pipe is. Link speeds are quoted in bits per second (Mbps, Gbps), while many tools report bytes per second (MB/s); 1 Gbps is 125 MB/s. Symptoms when limited: large file transfers are slow, video streams buffer, bulk API calls take long. Diagnosed with: iftop, nethogs, /proc/net/dev. The NIC itself has a hard limit (1 Gbps, 10 Gbps etc.).

⚡

Latency (Round-Trip Time)

The time for a single packet to travel and return — how fast the pipe responds. Measured in milliseconds. Symptoms when high: interactive apps feel laggy, many small API calls are slow even though bandwidth is fine, database queries with many round-trips are sluggish. Diagnosed with: ping, mtr, curl timing.

📦

Packet Loss

A percentage of packets that never arrive. Even 1% loss causes TCP to retransmit, adding latency and throttling throughput dramatically. Symptoms: connections work but are unreliable and slow, timeouts appear intermittently. Diagnosed with: mtr (shows per-hop loss%), ping with count.

🔍

DNS Resolution Time

Often invisible in monitoring but catastrophic in impact. If your app resolves a hostname on every request and DNS takes 200ms, that's 200ms of latency on every operation. Symptoms: app slow, server-to-server calls slow, intermittent timeouts. Diagnosed with: dig +stats, strace on a running process.
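To make the latency and DNS points concrete: a fixed per-round-trip delay is multiplied by the number of sequential round trips a request needs. The sketch below is plain arithmetic; the function name `total_wait_ms` and the figure of 40 queries are illustrative assumptions, not output from any tool.

```shell
#!/bin/sh
# total_wait_ms ROUND_TRIPS RTT_MS
# Prints the time one request spends purely waiting on the network when it
# needs ROUND_TRIPS sequential round trips of RTT_MS milliseconds each.
total_wait_ms() {
    awk -v n="$1" -v rtt="$2" 'BEGIN { printf "%.1f\n", n * rtt }'
}

# 40 sequential database queries at 12 ms round-trip time
total_wait_ms 40 12      # prints 480.0

# The same 40 queries when each one also pays a 200 ms DNS lookup
total_wait_ms 40 212     # prints 8480.0
```

The second call shows why a slow resolver can dominate response time even when bandwidth and raw latency look fine.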

Connectivity Diagnostics — ping, traceroute, mtr

# ping: basic connectivity and round-trip time
$ ping -c 10 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=116 time=12.4 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=116 time=11.9 ms
...
64 bytes from 8.8.8.8: icmp_seq=8 ttl=116 time=245.1 ms   ← spike
...
--- 8.8.8.8 ping statistics ---
10 packets transmitted, 10 received, 0% packet loss, time 9013ms
rtt min/avg/max/mdev = 11.9/38.2/245.1/69.4 ms
# avg 38ms but max 245ms: one packet had a large spike. mdev (deviation) of 69ms is high.
# This jitter pattern suggests congestion somewhere in the path, not a constant problem.

# traceroute: which hop in the path is slow?
$ traceroute -n 8.8.8.8     # -n skips reverse DNS for faster output
 1  192.168.1.1     1.2 ms    1.1 ms    1.0 ms   ← your router
 2  10.0.0.1        4.8 ms    4.9 ms    4.7 ms   ← ISP edge
 3  * * *                                        ← hop drops ICMP (normal)
 4  72.14.215.100   185.0 ms  182.0 ms  190.0 ms ← slow to answer probes
 5  8.8.8.8         12.3 ms   12.1 ms   12.4 ms  ← destination is fast
# Hop 4 reports ~185ms, but hop 5 behind it answers in 12ms, so hop 4 is NOT adding
# latency to forwarded traffic. Routers build ICMP replies on a slow, rate-limited
# control path. A real bottleneck at hop 4 would raise the latency of hop 5 and of
# every later hop as well. Judge the path by the destination and by trends that persist.
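The jitter reading above can be automated by comparing mdev with the average. This is a minimal sketch: the function name `ping_jitter` and the "mdev above half the average" cut-off are assumptions chosen for illustration, not a standard threshold.

```shell
#!/bin/sh
# ping_jitter: read ping output on stdin and print avg, mdev and a verdict.
# Works on the summary line, e.g. "rtt min/avg/max/mdev = 11.9/38.2/245.1/69.4 ms".
ping_jitter() {
    awk '/min\/avg\/max/ {
        split($0, halves, " = ")      # halves[2] = "11.9/38.2/245.1/69.4 ms"
        split(halves[2], v, "/")      # v[2] = avg, v[4] = "69.4 ms"
        avg  = v[2]
        mdev = v[4] + 0               # "+ 0" drops the trailing " ms"
        verdict = (mdev > 0.5 * avg) ? "jittery" : "steady"
        printf "avg=%s mdev=%s %s\n", avg, mdev, verdict
    }'
}

# Typical use (needs network access, so it is left as a comment):
#   ping -c 10 8.8.8.8 | ping_jitter
```

Fed the summary line from the example above, it reports `avg=38.2 mdev=69.4 jittery`.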

mtr — the best tool for diagnosing path problems

mtr (Matt's Traceroute) combines ping and traceroute into a live view that shows round-trip time and packet loss at every hop simultaneously. It's the single most useful tool for diagnosing whether a network problem is in your server, your network, or somewhere on the internet path.

$ mtr --report --report-cycles 20 -n 8.8.8.8
# --report: print a summary after 20 cycles instead of live display
# -n: no reverse DNS (faster)
Start: 2025-06-14T15:22:10+0100
HOST: myserver        Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. 192.168.1.1       0.0%    20    1.2   1.1   1.0   1.5   0.1
  2. 10.0.0.1          0.0%    20    4.8   4.7   4.6   5.1   0.2
  3. ???             100.0%    20    0.0   0.0   0.0   0.0   0.0
  4. 72.14.215.100     0.0%    20   12.0  12.1  11.8  12.6   0.2
  5. 8.8.8.8           0.0%    20   12.3  12.2  12.0  12.8   0.2
# Hop 3 shows 100% loss, but hops 4 and 5 are fine with 0% loss.
# This means hop 3 just doesn't respond to ICMP probes. It IS forwarding packets.
# Genuine packet loss would show at hop 4 or 5, not just hop 3.

100% loss at an intermediate hop is not packet loss — many routers de-prioritise or block ICMP TTL-exceeded messages (what traceroute/mtr uses) while still forwarding packets normally. If all subsequent hops are reachable with 0% loss, the "100%" hop is just filtering probes. Only worry if loss appears at your destination or persists across multiple subsequent hops.
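Since the destination's loss figure is the one that matters, a report can be reduced to that single number. A small sketch, assuming the `mtr --report` layout shown above, where the third column of each hop line is Loss%; the function name `mtr_dest_loss` is made up for this example.

```shell
#!/bin/sh
# mtr_dest_loss: read `mtr --report` output on stdin and print the Loss%
# of the last hop (the destination), without the percent sign.
mtr_dest_loss() {
    awk '$1 ~ /^[0-9]+\./ { loss = $3 }        # remember Loss% of each hop line
         END { sub(/%/, "", loss); print loss }'
}

# Typical use (needs network access, so it is left as a comment):
#   mtr --report --report-cycles 20 -n 8.8.8.8 | mtr_dest_loss
```

For the report above this prints `0.0`, even though hop 3 shows 100%.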

Testing DNS resolution time

# How long does a DNS lookup take?
$ dig google.com | grep "Query time"
;; Query time: 2 msec      # Good: local DNS cache hit
$ dig google.com | grep "Query time"
;; Query time: 342 msec    # Bad: slow or overloaded DNS resolver

# Test a specific DNS server directly (bypasses system resolver)
$ dig @8.8.8.8 google.com | grep "Query time"

# For HTTP apps, break down where the time goes with curl:
$ curl -o /dev/null -s -w "DNS: %{time_namelookup}s\nConnect: %{time_connect}s\nTTFB: %{time_starttransfer}s\nTotal: %{time_total}s\n" https://example.com
DNS: 0.342s      ← DNS lookup taking 342ms: this is the problem
Connect: 0.344s
TTFB: 0.489s
Total: 0.490s
# The curl timing breakdown instantly shows which phase is slow.
# Here DNS (name lookup) is taking 342ms; everything else is fine.
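For a monitoring script, the same "Query time" line can be turned into a pass/fail check. A minimal sketch: the helper names `dns_query_ms` and `dns_is_slow` and the 50 ms threshold in the usage comment are assumptions for illustration.

```shell
#!/bin/sh
# dns_query_ms: read dig output on stdin and print the query time in ms.
dns_query_ms() {
    awk '/Query time:/ { print $4 }'   # line looks like ";; Query time: 342 msec"
}

# dns_is_slow THRESHOLD_MS: read dig output on stdin; exit status 0 when the
# lookup took longer than THRESHOLD_MS, non-zero otherwise.
dns_is_slow() {
    ms=$(dns_query_ms)
    [ -n "$ms" ] && [ "$ms" -gt "$1" ]
}

# Typical use (needs a resolver, so it is left as a comment):
#   dig db-server.internal | dns_is_slow 50 && echo "resolver is slow"
```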

ss — The Modern netstat

ss (socket statistics) replaced netstat as the recommended tool for inspecting network connections. It's faster, more informative, and available on all modern Linux systems. The flags are similar to netstat but the output is richer.

# The most useful ss commands

# What's listening on which port, and which process owns it?
$ ss -tulpn
Netid State  Recv-Q Send-Q Local Address:Port  Peer Address:Port Process
tcp   LISTEN 0      128    0.0.0.0:22          0.0.0.0:*         users:(("sshd",pid=891))
tcp   LISTEN 0      511    0.0.0.0:80          0.0.0.0:*         users:(("nginx",pid=1234))
tcp   LISTEN 0      128    127.0.0.1:5432      0.0.0.0:*         users:(("postgres",pid=2341))
# -t TCP  -u UDP  -l listening  -p show process  -n no DNS lookup

# Summary statistics: all TCP states at a glance
$ ss -s
Total: 4821
TCP:   4701 (estab 241, closed 4120, orphaned 0, timewait 4112)
# 4112 TIME_WAIT out of 4701 total TCP sockets: high, worth investigating (Scenario 2)

# All established connections with process info
$ ss -tp state established
Recv-Q Send-Q Local Address:Port  Peer Address:Port  Process
0      0      10.0.0.5:44321      10.0.0.10:5432     users:(("python3",pid=8821))
0      0      10.0.0.5:44322      10.0.0.10:5432     users:(("python3",pid=8821))
# python3 (PID 8821) has two connections to PostgreSQL (port 5432)

# Count connections per state (ss prints abbreviated, hyphenated names: ESTAB, TIME-WAIT, CLOSE-WAIT)
$ ss -tan | awk 'NR>1 {print $1}' | sort | uniq -c | sort -rn
   4112 TIME-WAIT
    241 ESTAB
      8 LISTEN

# Find connections to/from a specific port or address
$ ss -tp dst 10.0.0.10:5432   # connections TO this PostgreSQL server
$ ss -tp sport :80            # connections whose LOCAL port is 80 (clients being served by the web server)

Recv-Q and Send-Q in ss output: For LISTEN sockets, Recv-Q is the number of connections waiting to be accepted (should be near 0; a large value means the application isn't calling accept() fast enough). For established sockets, Send-Q is data buffered waiting to be sent to the remote end — a large Send-Q means the remote side is reading slowly or the connection is congested.
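The LISTEN-socket rule above lends itself to a quick filter. This is a sketch that assumes the column order of `ss -ltn` (State, Recv-Q, Send-Q, Local Address:Port); on LISTEN sockets Send-Q holds the configured backlog limit, which is printed alongside for context. The function name `listen_backlog` is illustrative.

```shell
#!/bin/sh
# listen_backlog: read `ss -ltn` output on stdin and print the listening
# sockets whose accept queue (Recv-Q) is not empty.
listen_backlog() {
    awk '$1 == "LISTEN" && $2 > 0 { print $4, "queued=" $2, "limit=" $3 }'
}

# Typical use on a live system:
#   ss -ltn | listen_backlog
```

No output means every listener is keeping up with incoming connections.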

TCP Connection States

LISTEN Normal

Server socket waiting for incoming connections. Should exist for every service port. The Recv-Q shows the backlog queue — connections waiting to be accepted by the application.

ESTABLISHED Normal

Active two-way connection. Both sides can send data. The number of established connections reflects your actual active users or service-to-service connections right now.

TIME_WAIT Watch

Connection has been closed. The kernel holds the socket for 60 seconds (2×MSL) to absorb any late-arriving packets. Normal in small numbers; thousands indicates high connection turnover or missing keep-alive.

CLOSE_WAIT App Bug

Remote end sent FIN (closed its side), but the local application hasn't called close() yet. A large and growing CLOSE_WAIT count is almost always an application bug — sockets not being closed after use.

SYN_SENT / SYN_RECV

TCP handshake in progress. SYN_SENT = local side waiting for remote to respond. SYN_RECV = server received SYN, waiting for ACK. Many SYN_RECV can indicate a SYN flood attack.

FIN_WAIT1 / FIN_WAIT2

Connection teardown in progress — local side initiated the close. Brief transitional states. Many FIN_WAIT2 with no progression to TIME_WAIT can indicate the remote end is not responding to the close sequence.
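The state descriptions above can be folded into a small report. A sketch only: it assumes the state spellings that `ss` prints (ESTAB, TIME-WAIT, CLOSE-WAIT, SYN-RECV), and the hint texts and the function name `tcp_state_report` are this example's own.

```shell
#!/bin/sh
# tcp_state_report: read `ss -tan` output on stdin and print one line per
# TCP state, most frequent first, with a hint for the states worth watching.
tcp_state_report() {
    awk 'NR > 1 { n[$1]++ }                      # skip the header line
         END {
             for (s in n) {
                 hint = ""
                 if (s == "CLOSE-WAIT") hint = " (application not closing sockets?)"
                 if (s == "TIME-WAIT")  hint = " (high connection turnover?)"
                 if (s == "SYN-RECV")   hint = " (slow clients or SYN flood?)"
                 printf "%d %s%s\n", n[s], s, hint
             }
         }' | sort -rn
}

# Typical use on a live system:
#   ss -tan | tcp_state_report
```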

iftop and nethogs — Bandwidth by Connection and by Process

iftop shows which source→destination pairs are consuming bandwidth. It answers: "which remote host am I talking to most?" Useful when you suspect a specific remote host is involved.

iftop -i eth0 — specify interface
iftop -n — don't resolve hostnames
iftop -P — show ports
iftop -B — show bytes not bits

Keys in interactive mode: t toggle TX/RX display · s show source · d show destination · p show ports · q quit

nethogs shows which process is consuming bandwidth. It is the network equivalent of iotop and answers: "which application on this server is sending/receiving the most data?"

nethogs eth0 — monitor one interface
nethogs -d 1 — update every 1 second
nethogs -t — trace mode: plain-text output suitable for scripts

Keys in interactive mode: m cycle display units (KB/MB/B) · r sort by received · s sort by sent · q quit

nethogs requires root, or the CAP_NET_ADMIN and CAP_NET_RAW capabilities set on its binary.

# nethogs output: which process is using bandwidth right now?
$ nethogs -d 1 eth0
NetHogs version 0.8.5
  PID USER    PROGRAM            DEV    SENT   RECEIVED
 8821 backup  /usr/bin/rsync     eth0   94.2   0.1 MB/s
  891 www     /usr/sbin/nginx    eth0    8.4   2.1 MB/s
 2341 mysql   /usr/sbin/mysqld   eth0    0.8   0.3 MB/s
# rsync (backup) is sending 94 MB/s, a large share of the 1 Gbps NIC limit
# nginx is doing 8 MB/s, normal for a web server
# On a 1Gbps NIC: 125 MB/s max. rsync is consuming 75% of available bandwidth.
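The 75% figure comes from converting bytes per second to bits per second and dividing by the link speed. A short sketch of that arithmetic; the function name `link_util_pct` is illustrative.

```shell
#!/bin/sh
# link_util_pct RATE_MB_PER_S LINK_MBIT
# Prints the share of the link (in percent, rounded) used by a transfer
# running at RATE_MB_PER_S megabytes per second on a LINK_MBIT link.
link_util_pct() {
    awk -v r="$1" -v link="$2" 'BEGIN { printf "%.0f\n", r * 8 / link * 100 }'
}

link_util_pct 94.2 1000    # rsync on a 1 Gbps NIC: prints 75
link_util_pct 8.4 1000     # nginx on the same NIC:  prints 7
```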

/proc/net/dev — NIC Errors and Drops

NIC-level errors are distinct from application-level bandwidth saturation. Hardware errors (CRC failures, framing errors) indicate a physical problem — bad cable, faulty NIC, misconfigured duplex. Drops indicate the kernel couldn't process packets fast enough.

$ cat /proc/net/dev
Inter-|   Receive                                                  |  Transmit
 face |bytes  packets errs drop fifo frame compressed multicast|bytes  packets errs drop fifo colls carrier compressed
  eth0: 82341M 61234K    0    0    0     0          0     4821K 52341M 48923K    0    0    0     0       0          0
  eth1: 12341M 18234K  142  891    0    12          0         0  8234M 12821K    0    0    0     0       0          0
# (counters are shortened with K/M suffixes here for readability; the real file prints raw integers)
# eth0: clean, zero errors and drops on both receive and transmit
# eth1: 142 receive errors + 891 drops + 12 frame errors: investigate this NIC/cable

# More readable with ip -s link:
$ ip -s link show eth1
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
    RX: bytes  packets  errors  dropped  missed  mcast
    12341M     18234K   142     891      0       0
    TX: bytes  packets  errors  dropped  carrier collisions
    8234M      12821K   0       0        0       0

Field	Meaning	Non-zero means…
RX errors	Packets received with hardware errors (CRC, framing, length)	Physical problem: bad cable, NIC fault, duplex mismatch. Replace cable first.
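RX dropped	Received packets discarded by the kernel before processing	Often the kernel could not keep up with the NIC (buffer or backlog exhaustion), though some drivers also count filtered or unsupported-protocol frames here. Check the supported maximum with `ethtool -g eth0`, then raise the ring buffer, for example `ethtool -G eth0 rx 4096`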
RX missed	Packets missed by the NIC hardware before they reached the ring buffer	NIC is hardware-saturated. Interrupt coalescing or RSS (Receive Side Scaling) may help.
TX carrier	Lost carrier signal during transmit — link went down mid-send	Physical link instability — cable, switch port, or NIC issue
TX collisions	Ethernet collisions (half-duplex only)	Should be zero on modern full-duplex links. Non-zero = duplex mismatch with switch.
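The counters in the table can be checked across all interfaces at once. This is a sketch that assumes the standard `/proc/net/dev` layout with raw integer counters (two header lines, then one line per interface); the function name `nic_errors` is made up for the example.

```shell
#!/bin/sh
# nic_errors [FILE]: print the interfaces that have non-zero error or drop
# counters. FILE defaults to /proc/net/dev.
nic_errors() {
    awk 'NR > 2 {
        sub(/:/, " ")     # split "eth0:123" and "eth0: 123" the same way
        # after the name: $2..$9 are receive fields, $10..$17 transmit fields
        if ($4 + $5 + $12 + $13 > 0)
            printf "%s rx_errs=%d rx_drop=%d tx_errs=%d tx_drop=%d\n", $1, $4, $5, $12, $13
    }' "${1:-/proc/net/dev}"
}

# Typical use on a live system:
#   nic_errors
```

Because the counters are cumulative since boot, running it twice a few minutes apart shows whether errors are still accumulating.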

Scenario 1 — App Is Slow but CPU and Memory Are Idle

Confirm with vmstat that it's not disguised I/O wait. Network blocking shows as idle CPU (not wa), making it look like the system is doing nothing when it's actually waiting on network responses.

$ vmstat 1 5
 r  b  swpd  free  si  so  bi  bo  wa  us  sy  id
 0  0     0  8.2G   0   0   0   0   0   5   2  93
# id=93: CPU is 93% idle. No I/O wait. No swap. System appears completely idle
# but the app is slow. The bottleneck is not a local resource: it's network or
# external service latency (database server, API, DNS).

Test DNS resolution time — this is the most commonly missed culprit.

$ time dig db-server.internal A
;; Query time: 380 msec
;; SERVER: 10.0.0.53#53
# DNS is taking 380ms. If the app resolves "db-server.internal" on every request,
# that's 380ms of latency before a single byte of database query is sent.
# Fix: add db-server.internal to /etc/hosts, or fix the internal DNS server.
$ grep db-server /etc/hosts
# quick fix: static resolution
$ echo "10.0.0.10 db-server.internal" | sudo tee -a /etc/hosts

Use curl timing to break down an HTTP request into phases.

$ curl -o /dev/null -s -w \
  "DNS lookup: %{time_namelookup}s\nTCP connect: %{time_connect}s\nSSL handshake: %{time_appconnect}s\nTime to first byte: %{time_starttransfer}s\nTotal: %{time_total}s\n" \
  https://api.example.com/health
DNS lookup: 0.002s          ← fast, cached
TCP connect: 0.015s         ← 13ms to connect, reasonable
SSL handshake: 1.240s       ← 1.2 seconds for TLS: this is the bottleneck
Time to first byte: 1.250s
Total: 1.252s
# TLS handshake taking 1.2 seconds. Possible causes:
# - Server is doing slow certificate validation (OCSP stapling not configured)
# - Client-side certificate revocation check over slow network
# - Session resumption not working (full handshake every time)
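curl's timers are cumulative: each value is measured from the start of the request, which is why "TCP connect: 0.015s" above means 13ms of connecting. Subtracting neighbouring values gives the time spent in each phase. A minimal sketch, assuming an HTTPS request (for plain HTTP `time_appconnect` is 0 and the TLS column is meaningless); the function name `curl_phases` and the phase labels are illustrative.

```shell
#!/bin/sh
# curl_phases NAMELOOKUP CONNECT APPCONNECT STARTTRANSFER TOTAL
# Takes curl's cumulative timers (seconds) and prints per-phase durations.
curl_phases() {
    awk -v dns="$1" -v conn="$2" -v tls="$3" -v ttfb="$4" -v total="$5" 'BEGIN {
        printf "dns=%.3f tcp=%.3f tls=%.3f server=%.3f transfer=%.3f\n",
            dns, conn - dns, tls - conn, ttfb - tls, total - ttfb
    }'
}

# The values from the example above:
curl_phases 0.002 0.015 1.240 1.250 1.252
# prints: dns=0.002 tcp=0.013 tls=1.225 server=0.010 transfer=0.002
```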

Check for CLOSE_WAIT accumulation — a sign the app isn't closing connections.

$ ss -tan | awk 'NR>1 {print $1}' | sort | uniq -c
    820 CLOSE-WAIT
     12 ESTAB
      4 LISTEN
# 820 CLOSE-WAIT (ss spells the state with a hyphen): the remote side has closed
# these connections but the app hasn't. The app is holding onto dead connections.
# Each leaked socket ties up a file descriptor and often a pool slot or worker,
# so new requests wait. This directly causes slow responses.
# Fix: find the bug in the application code that's not calling close() after use.

# Find which local process owns the CLOSE_WAIT sockets
# (the state filter is written close-wait; the sed strips the per-socket fd so lines group by process):
$ ss -tanp state close-wait | awk 'NR>1 {print $NF}' | sed 's/,fd=[0-9]*//' | sort | uniq -c | sort -rn | head -5
    820 users:(("node",pid=8821))

Check NIC errors — hardware problems can cause intermittent slowness.

$ ip -s link show eth0 | grep -A1 "RX:"
    RX: bytes  packets  errors  dropped
    82341M     61234K   1842    234
# 1842 receive errors and 234 drops: a hardware-level problem on this NIC.
# Check the cable and the switch port, then verify the negotiated speed and duplex:
$ ethtool eth0 | grep -E "Speed|Duplex"
    Speed: 100Mb/s   ← should be 1000Mb/s: the link negotiated at the wrong speed
    Duplex: Half     ← should be Full; half duplex explains the errors and collisions

The curl timing breakdown is one of the most productive 30-second investments when diagnosing slow HTTP applications. DNS → Connect → TLS → TTFB each map to a specific layer you can investigate independently.

Scenario 2 — Thousands of TIME_WAIT Connections

Understand why TIME_WAIT exists before deciding to fight it. TIME_WAIT is the TCP protocol's safety mechanism — after a connection closes, the kernel holds the port combination for 60 seconds to absorb any late-arriving packets that were delayed in transit. It prevents a new connection on the same port from receiving old data. Removing it entirely is dangerous.

Determine whether it's actually causing a problem. TIME_WAIT is only a problem if you're running out of ephemeral ports — the pool of local ports used for outgoing connections.

# What's the ephemeral port range?
$ cat /proc/sys/net/ipv4/ip_local_port_range
32768   60999
# 28,232 available local ports for outgoing connections.
# If roughly that many TIME_WAIT sockets exist to the SAME destination address and port,
# new connections to that destination fail with "Cannot assign requested address".

# Are we actually failing to create new connections?
$ dmesg -T | grep -i "port\|connect\|socket"
$ netstat -s | grep -i "failed\|refused\|exhausted"
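The headroom check is simple arithmetic on the two numbers from `ip_local_port_range`. A sketch, with the function name `port_headroom` and the sample count of 8,000 chosen for illustration; keep in mind that the limit applies per destination address and port, so a count taken across all destinations overstates the pressure.

```shell
#!/bin/sh
# port_headroom LOW HIGH TIME_WAIT_COUNT
# Prints the size of the ephemeral port range and the share of it that
# TIME_WAIT_COUNT sockets would occupy if they all pointed at one destination.
port_headroom() {
    awk -v lo="$1" -v hi="$2" -v tw="$3" 'BEGIN {
        ports = hi - lo + 1
        printf "ports=%d time_wait=%d used=%.0f%%\n", ports, tw, tw / ports * 100
    }'
}

port_headroom 32768 60999 8000
# prints: ports=28232 time_wait=8000 used=28%

# On a live system the range can be read directly:
#   port_headroom $(cat /proc/sys/net/ipv4/ip_local_port_range) 8000
```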

If TIME_WAIT is genuinely causing port exhaustion — tune carefully.

# Option 1: Enable TIME_WAIT socket reuse (safe: only affects outgoing connections)
$ sudo sysctl -w net.ipv4.tcp_tw_reuse=1
# Allows reusing TIME_WAIT sockets for new outgoing connections when safe.
# Relies on TCP timestamps (net.ipv4.tcp_timestamps=1, the default).
# This is the recommended setting for servers making many outgoing connections.

# Option 2: Widen the ephemeral port range
$ sudo sysctl -w net.ipv4.ip_local_port_range="1024 65535"
# Gives ~64,000 local ports instead of ~28,000. Simple, but make sure no service
# listens on a port inside the new range (5432, 8080, ...), or reserve those ports
# with net.ipv4.ip_local_reserved_ports.

# Option 3: Enable keep-alive on HTTP connections (fix the root cause)
# In nginx, ensure keep-alive is not disabled:
keepalive_timeout 65;   # in nginx.conf http {} block
# With keep-alive, the same TCP connection handles multiple HTTP requests,
# dramatically reducing connection turnover and TIME_WAIT accumulation.

# Make sysctl changes permanent:
$ echo "net.ipv4.tcp_tw_reuse=1" | sudo tee -a /etc/sysctl.d/99-network.conf
$ sudo sysctl -p /etc/sysctl.d/99-network.conf

Do not use tcp_tw_recycle. This parameter was removed in Linux kernel 4.12 because it broke connections from clients behind NAT (a common scenario with load balancers and mobile networks). If you see advice recommending it for TIME_WAIT, the advice is outdated and potentially harmful.

8,000 TIME_WAIT sockets with a 28,000-port range is perfectly healthy — you have headroom. TIME_WAIT only warrants action when either the count approaches your port range limit, or you're seeing actual "cannot assign requested address" errors in application logs or dmesg.

Scenario 3 — One Process Is Saturating the NIC

Confirm NIC saturation and identify the interface.

# Watch interface throughput in real time
# (\$2 and \$10 are escaped so the shell started by watch passes them through to awk)
$ watch -n 1 'awk "/eth0/{print \"RX: \" \$2/1048576 \" MB total | TX: \" \$10/1048576 \" MB total\"}" /proc/net/dev'

# Or more elegantly with ip:
$ ip -s -s link show eth0 | grep -A4 "TX:"
# Run twice 1 second apart to see the delta (bytes/second)

# iftop shows bandwidth by connection: which remote host is involved?
$ iftop -i eth0 -n -B -P
              12.5MB      25.0MB      37.5MB      50MB      62.4MB
10.0.0.5  =>  192.168.50.20:873     89.4MB   91.2MB   87.8MB
          <=                        0.12MB   0.14MB   0.13MB
# Port 873 = rsync. With -B the rates are in bytes: ~90 MB/s (about 720 Mbit/s)
# going to 192.168.50.20, the backup server.
# That is roughly 70% of a 1 Gbps link.
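The "run twice and take the delta" step can be scripted. This sketch reads the transmit byte counter that the kernel exposes under `/sys/class/net/<iface>/statistics/`; the function names `bytes_to_mbit` and `tx_rate_mbit` are illustrative, and the interface name is whatever applies on your system.

```shell
#!/bin/sh
# bytes_to_mbit BEFORE AFTER SECONDS
# Converts two byte-counter readings taken SECONDS apart into Mbit/s.
bytes_to_mbit() {
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (b - a) * 8 / s / 1000000 }'
}

# tx_rate_mbit IFACE [SECONDS]
# Samples the interface's tx_bytes counter twice and prints the send rate.
tx_rate_mbit() {
    iface=$1
    secs=${2:-1}
    counter=/sys/class/net/$iface/statistics/tx_bytes
    before=$(cat "$counter")
    sleep "$secs"
    after=$(cat "$counter")
    bytes_to_mbit "$before" "$after" "$secs"
}

# 12,500,000 bytes sent in one second is 100 Mbit/s:
bytes_to_mbit 0 12500000 1     # prints 100.0

# Typical use on a live system:
#   tx_rate_mbit eth0 5
```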

Confirm the responsible process with nethogs.

$ nethogs -d 1 eth0
  PID USER    PROGRAM           SENT        RECEIVED
 9821 backup  /usr/bin/rsync    94.2 MB/s   0.1 MB/s
  891 www     nginx              8.4 MB/s   2.1 MB/s
# Confirmed: PID 9821 (rsync backup) is the culprit

Throttle rsync's bandwidth directly. rsync has a built-in bandwidth limit flag — if you have control over how it's invoked, this is the cleanest solution.

# Kill the current rsync and restart it with a bandwidth limit
$ kill 9821
$ rsync -av --bwlimit=20480 /data /mnt/backup   # limit to 20 MB/s (20480 KB/s)

For processes without built-in throttling — use trickle or tc.

# trickle: simple per-process bandwidth shaping (no root required)
# apt install trickle   or   yum install trickle
$ trickle -u 20480 rsync -av /data /mnt/backup
# -u 20480 = limit upload to 20480 KB/s (20 MB/s)
# -d 20480 = limit download
# Works by preloading a library that intercepts socket calls, so no kernel changes
# are needed, but statically linked programs are not affected

# tc (traffic control): kernel-level interface shaping (affects ALL outgoing traffic on the interface)
# Add a queuing discipline limiting the interface to 100 Mbit total (requires root)
$ sudo tc qdisc add dev eth0 root tbf rate 100mbit burst 32kbit latency 400ms
$ tc qdisc show dev eth0            # verify it's applied
$ sudo tc qdisc del dev eth0 root   # remove the limit when done
# tc affects the entire interface: use trickle or rsync --bwlimit for per-process control

Long-term fix: schedule bandwidth-heavy jobs during off-peak hours.

# crontab -e: run backup at 2am with bandwidth limit
0 2 * * * /usr/bin/rsync -av --bwlimit=51200 /data /mnt/backup >> /var/log/backup.log 2>&1
# --bwlimit=51200 = 50 MB/s at 2am when the server is quiet
# Even at 50 MB/s, a 1 TB backup completes in roughly 5.5 hours
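Before picking a limit it helps to estimate how long the job will run, so it finishes inside the quiet window. A sketch of the arithmetic, assuming `--bwlimit` is given in units of 1024 bytes per second (rsync's default unit) and sizes in decimal gigabytes; the function name `transfer_hours` is illustrative.

```shell
#!/bin/sh
# transfer_hours SIZE_GB BWLIMIT_KIB_PER_S
# Prints the hours needed to move SIZE_GB gigabytes (10^9 bytes each)
# at a sustained rate of BWLIMIT_KIB_PER_S KiB/s.
transfer_hours() {
    awk -v gb="$1" -v kib="$2" 'BEGIN { printf "%.1f\n", gb * 1e9 / (kib * 1024) / 3600 }'
}

transfer_hours 1000 51200    # 1 TB at --bwlimit=51200: prints 5.3
transfer_hours 1000 20480    # 1 TB at --bwlimit=20480: prints 13.2
```

At the 20 MB/s limit used earlier, a 1 TB backup would no longer fit into a single night, which is worth knowing before the job is scheduled.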

If the high-bandwidth process is a production service rather than a backup job, investigate whether it's doing unnecessary data transfer (missing caching, missing compression, pulling full datasets when it only needs diffs) before throttling it — throttling a production service degrades its performance for users.

Quick Reference — Chapter 5 Commands

Command	Purpose	Key flags / notes
ping -c 10 host	Basic connectivity test and round-trip time. Watch for packet loss and jitter (mdev).	High mdev = jitter = congestion somewhere in path
mtr --report -n host	Combined traceroute + ping — shows loss% at every hop over multiple cycles	`--report-cycles 20` for 20 probes · 100% loss at intermediate hop is usually normal
dig hostname	DNS lookup — check "Query time:" for DNS latency	`dig @8.8.8.8 host` to test a specific resolver
curl -w "..." url	Break HTTP request into phases: DNS / Connect / TLS / TTFB / Total	Use the timing format string from Scenario 1 step 3
ss -tulpn	Listening ports with owning process — the first check for "what's running"	`-t` TCP · `-u` UDP · `-l` listening · `-p` process · `-n` no DNS
ss -s	Socket summary: totals plus TCP counts for estab, closed, orphaned and timewait	Does not list CLOSE_WAIT; use the per-state count in the next row. Many TIME_WAIT = high connection turnover.
ss -tan | awk 'NR>1 {print $1}' | sort | uniq -c	Count connections per state	Many CLOSE-WAIT = app bug. Use `ss -tan state close-wait` to filter to one state (ss spells states in lowercase with hyphens)
ip -s link show eth0	NIC statistics including RX/TX errors, drops, missed packets	Non-zero errors = physical problem. Non-zero drops = ring buffer overflow.
ethtool eth0	NIC link status — speed and duplex negotiation	Speed should be 1000Mb/s+, Duplex should be Full on modern links
iftop -i eth0 -n	Live bandwidth by source→destination connection pair	`-B` bytes · `-P` show ports · `-n` no DNS
nethogs eth0	Live bandwidth by process — the iotop equivalent for network	`-d 1` update every 1s · requires root
rsync --bwlimit=20480	Limit rsync bandwidth to 20 MB/s (20480 KB/s)	Built-in to rsync — cleanest solution when rsync is the culprit
trickle -u 20480 cmd	Per-process bandwidth cap for any command — no root needed	`-u` upload limit · `-d` download limit (in KB/s)
sysctl net.ipv4.tcp_tw_reuse=1	Allow reuse of TIME_WAIT sockets for new outgoing connections	Persist via `/etc/sysctl.d/` · safe, unlike the removed tcp_tw_recycle