Chapter 2 — CPU Bottlenecks

High CPU load is one of the most visible performance problems — the fans spin up, response times climb, and htop fills with red bars. But not all CPU consumption is a problem. A machine compiling code or transcoding video at 100% is doing exactly what it should. This chapter is about telling the two apart, and acting safely when intervention is needed.

What this chapter covers: CPU utilisation vs load average — why they differ. Per-core breakdown with mpstat. Finding CPU-hungry processes. Scenario 1: dozens of the same process — how to count, identify, and safely resolve. Scenario 2: one process at 100% — runaway vs legitimate. Zombie processes. Adjusting priority with nice/renice. CPU affinity with taskset. Context switching and CPU steal.

CPU Utilisation vs Load Average — The Key Difference

Chapter 1 covered load average. CPU utilisation is a different measurement — and the relationship between them tells you a great deal about what's actually happening.

CPU Utilisation (%)

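What percentage of CPU time is being used right now. top shows it on the %Cpu(s) line, split into fields such as us, sy, id and wa, and htop shows it as coloured per-core bars. On a single core, 100% means that core is fully occupied and cannot give more. On the aggregate line, 100% means every core is.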

Load Average

The average number of processes that are runnable or in uninterruptible sleep (usually waiting on disk or NFS I/O), over 1, 5, and 15 minutes. Because it includes I/O-waiting processes that use no CPU at all, load can be high while the CPU is idle.

The telling combination

High load + high CPU% → CPU bottleneck.
High load + low CPU% + high wa → disk/I/O bottleneck, not CPU.
Low load + slow app → look at network or locking.
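The three rules above can be written as a small decision function. This is a sketch, not a diagnostic tool. `classify_load` is a hypothetical name. The thresholds are assumptions loosely matched to the alert levels in the table below: load above the core count counts as high, 80% busy, and 10% iowait. Tune them for your own systems.

```shell
#!/usr/bin/env bash
# classify_load LOAD1 CORES BUSY_PCT IOWAIT_PCT
# Hypothetical helper: maps the "telling combination" rules to a verdict.
# All thresholds are assumptions; adjust them for your environment.
classify_load() {
  awk -v load="$1" -v cores="$2" -v busy="$3" -v wa="$4" 'BEGIN {
    high_load = (load > cores)   # more runnable/blocked tasks than cores
    if (high_load && busy >= 80)
      print "CPU bottleneck"
    else if (high_load && wa >= 10)
      print "disk/I/O bottleneck"
    else if (!high_load)
      print "load is low: if the app is slow, look at network or locking"
    else
      print "inconclusive: high load without high CPU or iowait"
  }'
}

# Usage with live values: 1-minute load from /proc/loadavg, cores from nproc;
# busy and iowait percentages read off top, mpstat or vmstat:
#   classify_load "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)" 95 2
```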

The CPU utilisation breakdown in top

When you press 1 in top, or look at the per-core bars in htop, you see CPU time split into several components. These tell you where the time is going:

$ top      # then press 1 to show per-core lines
%Cpu0  : 45.2 us, 12.1 sy,  0.0 ni, 40.2 id,  2.1 wa,  0.0 hi,  0.4 si,  0.0 st
%Cpu1  :  2.1 us,  1.0 sy,  0.0 ni, 96.0 id,  0.5 wa,  0.0 hi,  0.4 si,  0.0 st
%Cpu2  : 98.0 us,  1.5 sy,  0.0 ni,  0.0 id,  0.5 wa,  0.0 hi,  0.0 si,  0.0 st
# Cpu2 at 98% user: one process is saturating a single core
# Cpu0 shows 12.1% kernel time (sy): worth watching. Sustained values above ~20%
# usually point to a driver, NFS, or a syscall-heavy application

Field	Name	What it means	Alert when…
us	User	CPU time in user-space processes (your applications)	High is normal for CPU-bound apps; watch for unexpected processes
sy	System	CPU time in kernel-space (system calls, drivers)	Above 20% sustained — driver issue, NFS, or syscall-heavy app
ni	Nice	CPU time used by niced (low-priority) processes	Rarely a concern on its own
id	Idle	CPU time doing nothing	If high but system is slow, the bottleneck is elsewhere (disk, network)
wa	I/O wait	CPU idle while waiting for disk I/O to complete	Above 10–15% sustained — disk bottleneck, not CPU
hi	Hardware IRQ	Servicing hardware interrupts (NIC packets, disk)	Above 5% — very high network or disk interrupt rate
si	Software IRQ	Kernel software interrupt processing	High on a busy network server — normal; high on a quiet box — investigate
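st	Steal	CPU time stolen by the hypervisor (VMs only)	Above 5% on a VM: the hypervisor is not giving your vCPUs all the time they want (busy host or, on burstable instance types, exhausted CPU credits); contact your cloud provider or resize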
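top derives these fields from the cumulative counters on the `cpu` line of /proc/stat. The sketch below samples that line twice and turns the differences into percentages. This is useful when top is not available or you want the numbers inside a script. It assumes a kernel that exposes the steal column (Linux 2.6.11 or later). It ignores the guest columns, because the kernel already counts guest time inside user time. `cpu_breakdown` is a hypothetical name.

```shell
#!/usr/bin/env bash
# cpu_breakdown [INTERVAL]: share of CPU time per field over INTERVAL seconds,
# all cores combined, from two samples of the "cpu" line in /proc/stat.
cpu_breakdown() {
  local interval=${1:-1} before after
  before=$(grep '^cpu ' /proc/stat)
  sleep "$interval"
  after=$(grep '^cpu ' /proc/stat)
  awk -v a="$before" -v b="$after" 'BEGIN {
    split(a, x); split(b, y)
    # Fields 2-9: user nice system idle iowait irq softirq steal (clock ticks)
    for (i = 2; i <= 9; i++) { d[i] = y[i] - x[i]; total += d[i] }
    if (total == 0) total = 1    # avoid dividing by zero on a very short interval
    printf "us=%.1f sy=%.1f ni=%.1f id=%.1f wa=%.1f hi=%.1f si=%.1f st=%.1f\n",
      100 * d[2] / total, 100 * d[4] / total, 100 * d[3] / total, 100 * d[5] / total,
      100 * d[6] / total, 100 * d[7] / total, 100 * d[8] / total, 100 * d[9] / total
  }'
}

# Usage: cpu_breakdown 2    # one line of percentages that add up to about 100
```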

Process States — The S Column in htop

Every process is always in one of a small number of states. The S column in htop and ps tells you the current state — and the pattern of states across all your processes reveals what the system is waiting on.

Running / Runnable (R)

Process is either on a CPU right now or queued waiting for one. More R-state processes than CPU cores = CPU saturation.

Sleeping, interruptible (S)

Process is waiting for an event (keyboard input, network data, a timer). Will wake immediately when the event arrives. Normal and healthy.

Disk Wait, uninterruptible (D)

Blocked on I/O — disk read, NFS response, or similar. Cannot be interrupted or killed until the I/O completes. Many D-state processes = I/O bottleneck.

Zombie (Z)

Process has exited but its parent hasn't called wait() to collect the exit code. Uses no CPU and almost no memory: only its process-table entry remains. kill -9 does nothing because the process is already dead; the fix is in the parent.

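Stopped (T)

Suspended by SIGSTOP, or by SIGTSTP (Ctrl+Z in a terminal). Will not run until it receives SIGCONT. Sometimes left behind by debug sessions or background jobs. A lowercase t means the process was stopped by a debugger or tracer.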

The D-state trap: A process in D state cannot be killed. It holds its PID, any file locks it owns, and any resources it had open — and it will stay that way until the I/O it's waiting on completes (or times out). If you have many D-state processes, the disk or NFS mount is the problem, not the processes. Fix the I/O, and they'll wake up.
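To see the pattern of states at a glance, and which processes are stuck in D state, you can read the state letter straight from /proc/PID/stat. This is the same source ps and htop use. The bash sketch below uses hypothetical helper names. /proc/PID/wchan (the kernel function a process is sleeping in) may read as 0 unless you are root, depending on kernel settings.

```shell
#!/usr/bin/env bash
# proc_state PID: the one-letter state (R, S, D, Z, T, ...) from /proc/PID/stat.
# The command name in field 2 may contain spaces, so strip everything up to the
# last ") " before splitting.
proc_state() {
  local s
  { read -r s < "/proc/$1/stat"; } 2>/dev/null || return 1
  s=${s##*) }
  printf '%s\n' "${s%% *}"
}

# state_summary: how many processes are in each state, most common first.
state_summary() {
  local d
  for d in /proc/[0-9]*; do
    proc_state "${d#/proc/}"
  done | sort | uniq -c | sort -rn
}

# list_dstate: PID, kernel wait channel and command name of every D-state process.
list_dstate() {
  local d pid
  for d in /proc/[0-9]*; do
    pid=${d#/proc/}
    [[ $(proc_state "$pid") == D ]] || continue
    printf '%s %s %s\n' "$pid" "$(cat "$d/wchan" 2>/dev/null)" "$(cat "$d/comm" 2>/dev/null)"
  done
}
```

With procps installed, `ps -eo state= | sort | uniq -c` produces the same summary in one line.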

Per-Core Breakdown with mpstat

htop shows per-core bars visually. mpstat gives you the numbers. This helps when you need to identify exactly which core is saturated, or when you want output you can log, compare over time, or paste into a ticket.

$ mpstat -P ALL 1 3      # -P ALL = every CPU, 1 = interval in seconds, 3 = number of reports
14:35:01  CPU  %usr %nice  %sys %iowait  %irq %soft %steal  %idle
14:35:02  all  30.2   0.0   3.2     8.1   0.0   0.3    0.0   58.2
14:35:02    0  12.0   0.0   2.1    18.0   0.0   0.2    0.0   67.7
14:35:02    1  98.0   0.0   1.5     0.0   0.0   0.0    0.0    0.5
14:35:02    2   8.0   0.0   3.2    12.0   0.0   0.4    0.0   76.4
14:35:02    3   2.8   0.0   6.1     2.4   0.0   0.6    0.0   88.1
# CPU 1 is at 98% user: one single-threaded process is saturating it entirely
# The 'all' line averages to ~30%, which masks the real problem if you only look there

Always look at per-core stats, not just the average. On a 16-core machine, a single-threaded runaway process shows up as only about 6% average CPU utilisation (100% ÷ 16 ≈ 6.25%), which is easy to miss in the top-line average, while it saturates one core completely. mpstat -P ALL 1, pressing 1 in top, or htop's per-core bars reveal it instantly.
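If mpstat is not installed, you can build the same per-core view from the cpuN lines in /proc/stat. This bash sketch (hypothetical function name) counts everything except idle and iowait as busy, in line with treating iowait as idle time. It prints cores busiest first.

```shell
#!/usr/bin/env bash
# per_core_busy [INTERVAL]: busy % per core over INTERVAL seconds, busiest first.
per_core_busy() {
  local interval=${1:-1} before after
  before=$(grep '^cpu[0-9]' /proc/stat)
  sleep "$interval"
  after=$(grep '^cpu[0-9]' /proc/stat)
  { printf '%s\n' "$before"; echo '--'; printf '%s\n' "$after"; } |
  awk '$1 == "--" { second = 1; next }
       {
         total = 0
         for (i = 2; i <= 9; i++) total += $i    # user .. steal
         idle = $5 + $6                           # idle + iowait
       }
       !second { t0[$1] = total; i0[$1] = idle; next }
       {
         dt = total - t0[$1]; di = idle - i0[$1]
         printf "%s %.1f\n", $1, (dt > 0 ? 100 * (dt - di) / dt : 0)
       }' |
  sort -k2,2nr
}

# Usage: per_core_busy 1 | head -3    # the three busiest cores
```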

Finding CPU-Hungry Processes

Once you know the system is CPU-bound, the next step is identifying which process is responsible. Several tools approach this from different angles:

# Sort ps output by CPU, highest first. This is a snapshot, and ps %CPU is the
# process's CPU time divided by its elapsed time (a lifetime average), not the current rate.
$ ps aux --sort=-%cpu | head -10
USER       PID %CPU %MEM     VSZ    RSS TTY  STAT START   TIME COMMAND
www-data  4821 97.8  0.5  520000  85000 ?    R    14:12   3:42 python3 /opt/app/worker.py
mysql     1234  4.2  5.1 2500000 840000 ?    S    09:00  13:16 mysqld
# How long has the top process been running?
$ ps -p 4821 -o pid,etime,pcpu,pmem,comm
  PID     ELAPSED %CPU %MEM COMMAND
 4821       03:47 97.8  0.5 python3
# ELAPSED is wall-clock time (03:47 = 3 min 47 s); TIME above is 3:42 of CPU time.
# They nearly match: it has used a full core almost the entire time it has existed.
# Alternative: watch a specific process live
$ top -p 4821

# pidstat: per-process CPU history (from the sysstat package)
$ pidstat -u 1 5
14:35:01   UID   PID   %usr %system   %CPU  CPU  Command
14:35:02    33  4821   97.0     0.5   97.5    1  python3
14:35:03    33  4821   98.0     0.3   98.3    1  python3
# The CPU column shows which core it ran on: CPU 1, matching mpstat
# %usr vs %system: 97% user means the time is spent in application code, not the kernel
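Because ps %CPU is a lifetime average, a process that has only just started spinning can still show a low number. To measure what one PID is using right now without installing sysstat, you can sample its utime and stime counters (fields 14 and 15 of /proc/PID/stat) over an interval. This is a bash sketch with hypothetical helper names. The fallback of 100 ticks per second is an assumption, used only if getconf is missing.

```shell
#!/usr/bin/env bash
# proc_cpu_ticks PID: user + system CPU time consumed so far, in clock ticks.
proc_cpu_ticks() {
  local s
  { read -r s < "/proc/$1/stat"; } 2>/dev/null || return 1
  s=${s##*) }     # drop "PID (comm) "; comm may contain spaces
  set -- $s       # now $1 = state (field 3), so field N is $((N - 2))
  echo $(( ${12} + ${13} ))   # utime (field 14) + stime (field 15)
}

# pid_cpu_now PID [INTERVAL]: CPU% over the interval, where 100 = one full core.
pid_cpu_now() {
  local pid=$1 interval=${2:-1} hz t1 t2
  hz=$(getconf CLK_TCK 2>/dev/null || echo 100)
  t1=$(proc_cpu_ticks "$pid") || return 1
  sleep "$interval"
  t2=$(proc_cpu_ticks "$pid") || return 1
  awk -v d="$((t2 - t1))" -v hz="$hz" -v iv="$interval" \
    'BEGIN { printf "%.1f\n", 100 * d / hz / iv }'
}

# Usage: pid_cpu_now 4821 5    # average over five seconds
```

As in top, the result can exceed 100 for a multithreaded process, since utime and stime cover all of its threads.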

Scenario 1 — Dozens of the Same Process

Count exactly how many there are. The number matters — 8 Python workers for a web app is normal; 400 is not.

$ pgrep -c python3
47
# -c = count. 47 python3 processes are running right now.
$ pgrep -c -u www-data python3
47
# All 47 belong to www-data: likely your web app user, not root

Find out what they're actually running — are they all the same script, or different things?

$ ps -C python3 -o args= | sort | uniq -c | sort -rn
     47 python3 /opt/app/worker.py --queue celery
# All 47 run the same script with the same arguments: a worker pool pattern.
# ps -C selects processes by command name; -o args= prints each full command line
# without a header. uniq -c counts identical lines; sort -rn puts the highest count first.
# Or list the first few PIDs with their full command lines:
$ pgrep -a python3 | head -5
4821 python3 /opt/app/worker.py --queue celery
4822 python3 /opt/app/worker.py --queue celery

Find the parent process — who spawned all these workers? The parent's PID is the key to controlling the pool.

$ ps aux --forest | grep -A 50 celery | head -20
www-data  4800  0.5  0.3 200000 45000 ?  S  14:11  0:01 celery worker -A myapp --concurrency=50
www-data  4821 12.1  0.5 520000 85000 ?  S  14:12  0:03  \_ python3 /opt/app/worker.py
www-data  4822  9.8  0.5 518000 83000 ?  S  14:12  0:02  \_ python3 /opt/app/worker.py
# The parent is PID 4800 (celery) with --concurrency=50, close to the 47 workers counted earlier.
# --forest draws the tree. This is normal behaviour for this configuration.
# Alternative: find the parent PID of any child process
$ ps -p 4821 -o ppid=
 4800
# Or count a parent's children directly: pgrep -c -P 4800
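When the tree is too large to read, you can count processes of one name per parent PID instead. That answers the same question with one line per parent. With procps you could run `ps -C python3 -o ppid= | sort | uniq -c`. The bash sketch below (hypothetical helper name) reads /proc directly. Note that /proc/PID/comm holds at most 15 characters of the command name.

```shell
#!/usr/bin/env bash
# count_by_parent NAME: how many processes with command name NAME each parent owns.
count_by_parent() {
  local name=$1 d comm s
  for d in /proc/[0-9]*; do
    { read -r comm < "$d/comm"; read -r s < "$d/stat"; } 2>/dev/null || continue
    [[ $comm == "$name" ]] || continue
    s=${s##*) }   # drop "PID (comm) "
    set -- $s     # $1 = state, $2 = parent PID
    echo "$2"
  done | sort | uniq -c | sort -rn
}

# Usage: count_by_parent python3
#   prints "<count> <parent PID>" per parent; inspect a parent with ps -p PID -o args=
```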

Decide: is this a normal worker pool or a runaway?

Normal signs: all workers belong to one parent, the parent is a known service (celery, gunicorn, uwsgi, apache), the count matches the configured concurrency level, workers are in S state (sleeping, waiting for work).

Runaway signs: workers are all in R state simultaneously, the count is growing over time (watch -n 1 'pgrep -c python3'), there's no obvious parent managing them, or the parent is a plain shell script.
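You can script the "count is growing" check instead of eyeballing it with watch. The sketch below uses hypothetical helper names. It matches on the command name in /proc/PID/comm, which is the name pgrep matches against by default. It samples the count several times and reports whether it went up.

```shell
#!/usr/bin/env bash
# count_named NAME: number of processes whose command name is exactly NAME.
count_named() {
  local d comm n=0
  for d in /proc/[0-9]*; do
    { read -r comm < "$d/comm"; } 2>/dev/null || continue
    [[ $comm == "$1" ]] && n=$((n + 1))
  done
  echo "$n"
}

# watch_growth NAME [SAMPLES] [INTERVAL]: print the count per sample, then a verdict.
watch_growth() {
  local name=$1 samples=${2:-5} interval=${3:-1} first last i
  first=$(count_named "$name")
  last=$first
  printf '%s %s\n' "$(date +%T)" "$first"
  for ((i = 1; i < samples; i++)); do
    sleep "$interval"
    last=$(count_named "$name")
    printf '%s %s\n' "$(date +%T)" "$last"
  done
  if (( last > first )); then
    echo "GROWING: $first -> $last"
  else
    echo "NOT GROWING: $first -> $last"
  fi
}

# Usage: watch_growth python3 10 1
```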

If it's a runaway — stop the parent first, not the children. Killing children while the parent is alive just causes it to respawn them.

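# SIGTERM the parent: lets it shut down gracefully and clean up its children
$ kill 4800
# (If the parent is a systemd service, use systemctl stop <unit> instead;
#  killing it directly may just trigger an automatic restart.)
# Wait 10–15 seconds, then verify the workers are gone:
$ pgrep -c python3
0
# If SIGTERM is still ignored after 15 seconds, escalate to SIGKILL:
$ kill -9 4800
# Then clean up any orphaned children. This matches every python3 process owned
# by www-data, so review the list first with pgrep -a -u www-data python3:
$ pkill -9 -u www-data python3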

If it's a legitimate pool that's overloaded — don't kill it. Reduce the concurrency setting in the service configuration (e.g., celery's --concurrency, gunicorn's --workers) and reload the service. Killing workers under load will drop in-flight requests and may corrupt queued tasks.

If you're unsure whether the process count is normal, check the service's documentation or configuration file for its concurrency setting. A well-configured service should document how many workers it's expected to spawn.

Scenario 2 — One Process at 100% CPU

Find the process and note its PID, user, and command.

$ ps aux --sort=-%cpu | head -3
USER    PID %CPU %MEM    VSZ   RSS TTY   STAT START  TIME COMMAND
deploy 9142 99.5  0.8 180000 32000 pts/0 R    15:04  8:32 /usr/bin/python3 -c "while True: pass"
# TIME shows 8:32 of accumulated CPU time. ELAPSED would confirm the wall-clock duration.

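Compare elapsed (wall-clock) time with accumulated CPU time. If they are nearly equal, the process has kept a core busy for almost its whole life. That is expected for a known CPU-bound job and suspicious for anything else, so context decides.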

$ ps -p 9142 -o pid,etime,cputime,pcpu,comm
  PID     ELAPSED     TIME %CPU COMMAND
 9142       08:41 00:08:32 99.5 python3
# Elapsed: 8m41s. CPU time: 8m32s. Near-identical: this process has been using
# a full core for almost its entire lifetime. Very likely an infinite loop.
# For a legitimate job (e.g. video encoding that has run for an hour):
 9000    01:02:15 01:00:44 97.3 ffmpeg
# Also near 100%, but the context (ffmpeg, a known job) makes this expected.
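To check every process at once instead of one PID, you can compute the same elapsed-versus-CPU comparison from /proc. utime and stime (fields 14 and 15 of /proc/PID/stat) give CPU time. starttime (field 22), together with /proc/uptime, gives elapsed time. The bash sketch below (hypothetical name; the 60-second and 90% defaults are arbitrary assumptions) lists processes that have been CPU-bound for most of their life. It only shortlists candidates. As the ffmpeg example shows, a high ratio alone does not make a process a runaway.

```shell
#!/usr/bin/env bash
# cpu_bound_candidates [MIN_AGE_SECONDS] [MIN_PERCENT]
cpu_bound_candidates() {
  local min_age=${1:-60} min_pct=${2:-90} hz up d s
  hz=$(getconf CLK_TCK 2>/dev/null || echo 100)
  read -r up _ < /proc/uptime          # seconds since boot
  for d in /proc/[0-9]*; do
    { read -r s < "$d/stat"; } 2>/dev/null || continue
    printf '%s %s\n' "${d#/proc/}" "${s##*) }"   # PID, then stat fields from state on
  done | awk -v hz="$hz" -v up="$up" -v age="$min_age" -v pct="$min_pct" '{
    # $2 is stat field 3, so field N is $(N - 1): utime $13, stime $14, starttime $21
    elapsed = up - $21 / hz
    cpu = ($13 + $14) / hz
    if (elapsed > 0 && elapsed >= age && 100 * cpu / elapsed >= pct)
      printf "%s elapsed=%.0fs cpu=%.0fs ratio=%.1f%%\n", $1, elapsed, cpu, 100 * cpu / elapsed
  }'
}

# Usage: cpu_bound_candidates 300 95
#   then inspect each PID with ps -p PID -o pid,etime,cputime,args
```

A multithreaded process can show a ratio above 100%, because CPU time from all its threads is summed.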

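Attach strace to see whether the process is making system calls at all. A process that makes almost none is running purely in user space. That is what a tight loop looks like, but pure number-crunching can look the same, so read the result together with the command line.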

$ strace -c -p 9142
# Let it run for about 5 seconds, then press Ctrl+C for the summary
# (or: timeout -s INT 5 strace -c -p 9142). Attaching needs ptrace permission
# (root, or the same user subject to kernel.yama.ptrace_scope) and slows the target.
strace: Process 9142 attached
^C
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- --------
  0.00    0.000000           0         2           read
  0.00    0.000000           0         1           write
# Only 3 syscalls in 5 seconds: the process is running pure user-space code.
# Nothing is being read or written. Combined with the command line
# (python3 -c "while True: pass"), this is a runaway.
# A CPU-bound job that processes data typically shows a steady stream of I/O syscalls:
 99.80    8.432100        1234      6832           read     ← reading input data
  0.10    0.012000          80       150           write    ← writing output

If it's a legitimate but disruptive job — reduce its priority with renice instead of killing it. This lets the job complete while giving other processes more CPU time.

# Renice to +10 (lower priority: other processes get CPU preference)
$ renice +10 -p 9142
9142 (process ID) old priority 0, new priority 10
# The process still runs, but the scheduler deprioritises it when others need CPU.
# Verify in htop: the NI column will show 10.

If it's genuinely a runaway — send SIGTERM first, then SIGKILL. Always try SIGTERM first — it lets the process clean up (close files, flush buffers, release locks). Only use SIGKILL if SIGTERM is ignored.

$ kill 9142                 # SIGTERM: please shut down gracefully
$ sleep 5 && ps -p 9142     # wait 5 seconds and check whether it's gone
    PID TTY          TIME CMD
   9142 pts/0    00:08:41 python3      ← still listed after 5s: escalate
$ kill -9 9142              # SIGKILL: forced termination, no cleanup
$ ps -p 9142
    PID TTY          TIME CMD          ← header only: the process is gone
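The SIGTERM, wait, SIGKILL sequence can be wrapped into one helper. This is a bash sketch; `is_alive` and `graceful_kill` are hypothetical names. It treats a zombie as gone, because a killed child of your own shell stays in Z state until the shell reaps it. If the process is still alive after SIGKILL, it is almost certainly in D state, and the fix is the I/O it is waiting on.

```shell
#!/usr/bin/env bash
# is_alive PID: true if PID exists and is not a zombie (Z) or dead (X).
is_alive() {
  local s
  { read -r s < "/proc/$1/stat"; } 2>/dev/null || return 1
  s=${s##*) }
  [[ ${s%% *} != Z && ${s%% *} != X ]]
}

# graceful_kill PID [TIMEOUT]: SIGTERM, wait up to TIMEOUT seconds, then SIGKILL.
graceful_kill() {
  local pid=$1 timeout=${2:-15} waited=0
  is_alive "$pid" || return 0
  if ! kill -TERM "$pid" 2>/dev/null; then
    ! is_alive "$pid"     # already gone: success; still there: no permission
    return
  fi
  while (( waited < timeout )); do
    sleep 1
    waited=$((waited + 1))
    if ! is_alive "$pid"; then
      echo "PID $pid exited after SIGTERM"
      return 0
    fi
  done
  echo "PID $pid ignored SIGTERM for ${timeout}s; sending SIGKILL"
  kill -KILL "$pid" 2>/dev/null
  sleep 1
  ! is_alive "$pid"       # still alive here usually means D state
}

# Usage: graceful_kill 9142 10
```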

Before killing any unfamiliar process, check what it is: ls -la /proc/9142/exe shows the real binary path, and cat /proc/9142/cmdline | tr '\0' ' ' shows the full command including all arguments.
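The two checks above can be combined into one helper that also shows the owner and working directory. This is a sketch, and the function name is hypothetical. Reading another user's /proc/PID/exe and cwd links normally requires root, so those lines may come back empty.

```shell
#!/usr/bin/env bash
# whatis_pid PID: binary, full command line, owner and working directory.
whatis_pid() {
  local p=/proc/$1
  [[ -d $p ]] || { echo "no such process: $1" >&2; return 1; }
  echo "exe:     $(readlink "$p/exe" 2>/dev/null)"
  echo "cmdline: $(tr '\0' ' ' 2>/dev/null < "$p/cmdline")"
  echo "owner:   $(stat -c %U "$p" 2>/dev/null)"
  echo "cwd:     $(readlink "$p/cwd" 2>/dev/null)"
}

# Usage: whatis_pid 9142
```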

Zombie Processes

A zombie (state Z) is a process that has finished executing but hasn't been fully removed from the process table, because its parent hasn't called wait() to collect its exit status. The zombie itself is harmless: it uses no CPU, holds no open files, and keeps only a small process-table entry. Its main cost is the PID it occupies.

Zombie Process Lifecycle

  Parent process              Child process
        │                           │
        ├──── fork() ──────────────►│
        │                           │ (child does work)
        │                           │
        │                           │ exit()  ← child finishes
        │                           │
        │   Parent must call wait() to reap the child.
        │   Until then, the child is a ZOMBIE (Z state).
        │
        ├── wait() called ──────► Child fully removed from process table ✓
        │
  If the parent dies before calling wait():
        └── The zombie is reparented to init/systemd → reaped automatically ✓

  If the parent is alive but buggy (never calls wait()):
        └── Zombies accumulate. They use almost no resources, but each one holds a PID.
            The ceiling is /proc/sys/kernel/pid_max: 32768 by default (also the maximum
            on 32-bit systems), though many 64-bit distributions raise it to 4194304.
            Fill the available PIDs with zombies and no new processes can start.

# Find zombie processes (STAT starts with Z)
$ ps aux | awk '$8 ~ /^Z/'
www-data 12041  0.0  0.0   0   0 ?  Z  15:32  0:00 [defunct]
www-data 12042  0.0  0.0   0   0 ?  Z  15:32  0:00 [defunct]
# How many zombies in total?
$ ps aux | awk '$8 ~ /^Z/' | wc -l
2
# Find the parent of a zombie (PPID)
$ ps -p 12041 -o ppid=
12000
# 1–2 zombies: normal, ignore them.
# 50+ zombies and growing: a bug in the parent process (PID 12000).
# You CANNOT kill a zombie with kill -9: it is already dead.
# Fix: stop or restart the PARENT. Its zombies are then reparented to init/systemd,
# which reaps them. Send SIGTERM to the parent first:
$ kill 12000

1–5 zombies on a busy server is normal — they typically clear within a few seconds as the parent gets around to calling wait(). Only investigate if the count is large and stable (the parent is stuck), or if it's growing steadily (a bug causing the parent to never reap its children).
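To see this behaviour safely, you can create a zombie on purpose. In the sketch below (hypothetical helper names), an inner shell starts a short-lived child and then exec()s into `sleep`. Since sleep never calls wait(), the finished child stays in Z state until that parent exits and init/systemd reaps it. The lister reads /proc and does the same job as the ps | awk filter above.

```shell
#!/usr/bin/env bash
# make_demo_zombie [SECONDS]: prints the PID of a parent that will hold one
# zombie child for roughly SECONDS seconds.
make_demo_zombie() {
  # The child "sleep 0.2" exits quickly; its parent has become "sleep SECONDS"
  # via exec, and sleep never reaps children. stdout goes to /dev/null so that
  # a caller using $(...) does not wait for the background job.
  bash -c 'sleep 0.2 & exec sleep "$1"' _ "${1:-5}" > /dev/null &
  echo "$!"
}

# list_zombies: one line per zombie with its parent PID.
list_zombies() {
  local d s
  for d in /proc/[0-9]*; do
    { read -r s < "$d/stat"; } 2>/dev/null || continue
    s=${s##*) }
    set -- $s    # $1 = state, $2 = parent PID
    if [[ $1 == Z ]]; then
      printf 'zombie %s parent %s\n' "${d#/proc/}" "$2"
    fi
  done
}

# Usage:
#   parent=$(make_demo_zombie 10); sleep 1; list_zombies   # a zombie whose parent is $parent
#   Once $parent exits, the zombie is reparented and reaped.
```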

Adjusting Priority — nice and renice

The nice value tells the Linux scheduler how much to favour or deprioritise a process when competing for CPU time. It does not limit CPU usage — on an otherwise idle machine, a niced process still runs at full speed. The effect is only felt when there's competition.

Nice values range from -20 (highest priority) through 0 (the default) to +19 (lowest priority).

# Launch a new process at low priority (nice +10)
$ nice -n 10 python3 heavy_script.py
# Launch at the lowest priority: use for batch jobs, backups, builds
$ nice -n 19 make -j8
# Adjust the priority of an already-running process (renice)
$ renice +10 -p 4821        # lower the priority of PID 4821
$ renice +10 -u www-data    # lower the priority of ALL processes owned by www-data
# Verify the change with ps:
$ ps -p 4821 -o pid,ni,comm
  PID  NI COMMAND
 4821  10 python3
# Raising priority (a negative or lower nice value) requires root:
$ renice -5 -p 4821

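Only root (strictly, a process with CAP_SYS_NICE) can set negative nice values, that is, raise priority above the default. Any user can lower their own processes' priority by raising the nice value toward +19. Once it is raised, though, they cannot lower it back without root, unless an RLIMIT_NICE limit allows it. You cannot renice another user's processes without root.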
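You can read a process's nice value straight from /proc/PID/stat (field 19) to confirm that nice or renice took effect. One detail is worth knowing: `nice -n` adds to the caller's current nice value rather than setting an absolute one. A command launched with `nice -n 10` from an already-niced shell therefore ends up above 10 (capped at 19). This is a sketch, and `nice_of` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# nice_of PID: current nice value (-20..19) from field 19 of /proc/PID/stat.
nice_of() {
  local s
  { read -r s < "/proc/$1/stat"; } 2>/dev/null || return 1
  s=${s##*) }   # drop "PID (comm) "
  set -- $s     # $1 is field 3, so field 19 is ${17}
  echo "${17}"
}

# Usage:
#   nice -n 10 sleep 30 &
#   nice_of $!      # this shell's nice value + 10, capped at 19
```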

CPU Affinity — taskset

By default the kernel scheduler can run any process on any available CPU core. CPU affinity lets you pin a process to specific cores — useful when you want to isolate a CPU-intensive job so it doesn't interfere with latency-sensitive processes on other cores.

# Launch a process pinned to CPU core 0 only
$ taskset -c 0 python3 heavy_script.py
# Pin to cores 0 and 1 only (comma-separated list, or a range with a dash)
$ taskset -c 0,1 python3 heavy_script.py
$ taskset -c 0-3 make -j4        # use the first 4 cores for a build
# Change the affinity of a running process (PID 4821)
$ taskset -cp 2,3 4821
pid 4821's current affinity list: 0-7
pid 4821's new affinity list: 2,3
# -p affects only the thread with that ID; add -a to cover every thread of a
# multithreaded process:
$ taskset -a -cp 2,3 4821
# Check the current affinity of a running process
$ taskset -cp 4821
pid 4821's current affinity list: 2,3

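Practical use case: you want a production web server to keep cores 0–3 and need to run a CPU-intensive batch job without it competing for those cores. Pin the batch job to cores 4–7 with taskset -c 4-7 batch_job. Pin the web server to 0–3 as well, otherwise the scheduler can still place its threads on 4–7. The web server then keeps its cores to itself. The two still share memory bandwidth and last-level cache, so isolation is good rather than absolute. If hyperthreading is on, check lscpu -e first, because logical CPU numbers do not always map to separate physical cores.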
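The kernel exposes each process's current affinity in the Cpus_allowed_list line of /proc/PID/status. It uses the same list syntax taskset accepts, for example 0-3,6. The sketch below reads that line and expands it into individual core numbers. This is handy for checking that a pinned batch job and a pinned web server really do not overlap. Both function names are hypothetical, and the stride syntax taskset also accepts (such as 0-7:2) is not handled.

```shell
#!/usr/bin/env bash
# allowed_cpus PID: the affinity list as the kernel reports it, e.g. "0-3,6".
allowed_cpus() {
  awk '$1 == "Cpus_allowed_list:" { print $2 }' "/proc/$1/status"
}

# expand_cpu_list LIST: turn "0-3,6" into "0 1 2 3 6".
expand_cpu_list() {
  local -a parts out=()
  local part i
  IFS=, read -ra parts <<< "$1"
  for part in "${parts[@]}"; do
    if [[ $part == *-* ]]; then
      for ((i = ${part%-*}; i <= ${part#*-}; i++)); do out+=("$i"); done
    else
      out+=("$part")
    fi
  done
  echo "${out[*]}"
}

# Usage: expand_cpu_list "$(allowed_cpus 4821)"
```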

Context Switching

Every time the kernel switches from running one process to another, it performs a context switch: it saves the outgoing process's state and loads the incoming one's. Each switch is cheap, on the order of microseconds (more when caches go cold), so a few thousand per second per core is normal. Rates in the tens of thousands per core, especially non-voluntary ones, mean the system is spending meaningful time on overhead rather than useful work.

# The vmstat cs column shows context switches per second
$ vmstat 1 5
 r  b  swpd  free  buff  cache  si  so  bi  bo    in     cs  us sy id wa
 2  0     0  4.1G  201M   8.2G   0   0   0  12   412   1821  12  3 84  1
 8  0     0  4.0G  201M   8.2G   0   0   0   8   890   9200  45  8 46  1
12  0     0  3.9G  201M   8.2G   0   0   0  14  1100  24000  60 15 24  1
# cs jumps to 24,000/sec while r (runnable processes: running plus waiting) climbs to 12.
# If r stays above the number of cores, this is CPU saturation: too many threads
# competing for too few cores.
# Per-process context switching with pidstat
$ pidstat -w -p 4821 1 5
14:42:01   PID  cswch/s  nvcswch/s  Command
14:42:02  4821    240.0     8120.0  python3
# cswch/s = voluntary switches (the process gave up the CPU, e.g. to wait for I/O; healthy)
# nvcswch/s = non-voluntary (the kernel preempted it, e.g. its time slice expired; high = contention)
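pidstat takes these per-process numbers from the voluntary_ctxt_switches and nonvoluntary_ctxt_switches counters in /proc/PID/status. Sampling them yourself gives a lightweight cross-check for the strace test in Scenario 2. A pure user-space spin loop makes almost no voluntary switches, while a process that blocks on I/O or timers makes many. This is a bash sketch with hypothetical helper names. For a multithreaded process, the counters in /proc/PID/status may reflect only the main thread; per-thread values live under /proc/PID/task/TID/status.

```shell
#!/usr/bin/env bash
# ctxt_counts PID: "voluntary nonvoluntary" cumulative context-switch counts.
ctxt_counts() {
  awk '$1 == "voluntary_ctxt_switches:"    { v = $2 }
       $1 == "nonvoluntary_ctxt_switches:" { n = $2 }
       END { if (v == "") exit 1; print v, n }' "/proc/$1/status" 2>/dev/null
}

# ctxt_rates PID [INTERVAL]: switches per second over the interval.
ctxt_rates() {
  local pid=$1 interval=${2:-1} a b
  a=$(ctxt_counts "$pid") || return 1
  sleep "$interval"
  b=$(ctxt_counts "$pid") || return 1
  awk -v a="$a" -v b="$b" -v iv="$interval" 'BEGIN {
    split(a, x); split(b, y)
    printf "voluntary/s=%.1f nonvoluntary/s=%.1f\n", (y[1] - x[1]) / iv, (y[2] - x[2]) / iv
  }'
}

# Usage: ctxt_rates 4821 5
```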

Kill vs Throttle — Choosing the Right Response

Throttle (renice or taskset) when:

The process is doing legitimate work (backup, build, encoding)
Killing it would mean losing progress or corrupting output
The process belongs to a production service that must keep running
The problem is timing — the job ran at peak hours when it shouldn't
The impact is "annoying but manageable" — not an outage
You can talk to the person who started it and co-ordinate

Kill (SIGTERM first, then SIGKILL) when:

The process is confirmed to be an infinite loop / runaway
It is actively causing an outage for other users or services
strace shows no productive work — pure CPU spin
The process belongs to a test/dev environment, not production
It's already consuming all available CPU and growing
The process is owned by a user who isn't reachable for co-ordination

Always SIGTERM before SIGKILL. SIGTERM gives the process a chance to close files, flush buffers, release database connections, and delete temp files. SIGKILL bypasses all of that; it is the equivalent of pulling the power cable. A database killed this way has to run crash recovery on restart, and with unsafe durability settings it can lose or corrupt data. For a web service, it means dropped connections. Reserve SIGKILL for when SIGTERM has demonstrably failed.

Quick Reference — Chapter 2 Commands

Command	Purpose	Key flags / notes
mpstat -P ALL 1	Per-core CPU breakdown every second	Press `1` in top for the same view interactively; htop shows per-core bars by default
ps aux --sort=-%cpu	Snapshot of all processes sorted by CPU usage (highest first)	`| head -10` to limit output · %CPU is a lifetime average
ps aux --forest	Process tree: shows parent/child relationships	Combine with `| grep -A 20 processname`
ps -p PID -o etime,cputime,pcpu,comm	Elapsed time vs accumulated CPU time for one process	Match these two to spot runaway processes
pgrep -c name	Count processes matching name	`-u user` filter by user · `-l` list PIDs with names
pgrep -a name	List PIDs and full command lines matching name	Better than `ps | grep`: no grep process in results
strace -p PID -c	Attach to running process, summarise system calls	Run for 5–10 seconds then Ctrl+C for the summary
pidstat -u 1	Per-process CPU stats updated every second	`-w` for context switches · `-p PID` for one process
renice +10 -p PID	Reduce a running process's CPU priority	Range +1 to +19 lowers priority; negative values need root
nice -n 10 cmd	Launch a command at reduced priority	`-n 19` for lowest priority batch jobs
taskset -c 0,1 cmd	Launch command pinned to specific CPU cores	`taskset -cp 2,3 PID` to change affinity of running process
kill PID	Send SIGTERM — graceful shutdown request	Always try this first. Wait 10–15 seconds before escalating.
kill -9 PID	Send SIGKILL — forced, immediate termination	Last resort. No cleanup. Can't be caught or ignored.
pkill -P PID	Send SIGTERM to all children of a parent process	`-9` for SIGKILL · `-u user name` by user and name
ps aux | awk '$8 ~ /^Z/'	List zombie processes	Find parent with `ps -p ZOMBIEPID -o ppid=` then restart parent