Chapter 2 — CPU Bottlenecks
High CPU load is one of the most visible performance problems — the fans spin up, response times climb, and htop fills with red bars. But not all CPU consumption is a problem. A machine compiling code or transcoding video at 100% is doing exactly what it should. This chapter is about telling the two apart, and acting safely when intervention is needed.
What this chapter covers: CPU utilisation vs load average — why they differ. Per-core breakdown with mpstat. Finding CPU-hungry processes. Scenario 1: dozens of the same process — how to count, identify, and safely resolve. Scenario 2: one process at 100% — runaway vs legitimate. Zombie processes. Adjusting priority with nice/renice. CPU affinity with taskset. Context switching and CPU steal.
CPU Utilisation vs Load Average — The Key Difference
Chapter 1 covered load average. CPU utilisation is a different measurement — and the relationship between them tells you a great deal about what's actually happening.
CPU Utilisation (%)
What percentage of CPU time is being used right now. top shows it in the %Cpu(s) line (us, sy, id, wa); htop shows it as per-core bars. A value of 100% means a core is fully occupied and cannot give more.
Load Average
Average number of processes wanting to run (or blocked on I/O) over 1, 5, and 15 minutes. Includes I/O-waiting processes that use no CPU at all. Load can be high while CPU is idle.
The telling combination
High load + high CPU% → CPU bottleneck.
High load + low CPU% + high wa → disk/I/O bottleneck, not CPU.
Low load + slow app → look at network or locking.
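The three combinations above can be turned into a quick triage helper. This is a minimal sketch, not a standard tool: the function name classify_bottleneck and the cut-offs (load above the core count, 80% busy, 10% I/O wait) are assumptions chosen for illustration and should be tuned to your own systems.

```shell
# classify_bottleneck LOAD CORES BUSY_PCT IOWAIT_PCT
#   LOAD       1-minute load average
#   CORES      number of CPU cores
#   BUSY_PCT   CPU busy percentage (100 minus id)
#   IOWAIT_PCT the wa value from top
# Thresholds are illustrative assumptions, not fixed rules.
classify_bottleneck() {
    awk -v load="$1" -v cores="$2" -v busy="$3" -v wa="$4" 'BEGIN {
        high_load = (load > cores)
        if (high_load && busy >= 80)    print "cpu"
        else if (high_load && wa >= 10) print "io"
        else if (!high_load)            print "elsewhere"
        else                            print "unclear"
    }'
}
```

On a live system the first two arguments can come from `cut -d' ' -f1 /proc/loadavg` and `nproc`; the busy and wa figures are read from top. "elsewhere" corresponds to the low-load case (network or locking), and "unclear" means load is high but neither CPU nor I/O wait explains it.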
The CPU utilisation breakdown in top
When you press 1 in top or look at the per-core bars in htop, you see multiple CPU components. Understanding these tells you where CPU time is going:
$ top
%Cpu0 : 45.2 us, 12.1 sy, 0.0 ni, 40.2 id, 2.1 wa, 0.0 hi, 0.4 si, 0.0 st
%Cpu1 : 2.1 us, 1.0 sy, 0.0 ni, 96.0 id, 0.5 wa, 0.0 hi, 0.4 si, 0.0 st
%Cpu2 : 98.0 us, 1.5 sy, 0.0 ni, 0.0 id, 0.5 wa, 0.0 hi, 0.0 si, 0.0 st
| Field | Name | What it means | Alert when… |
| --- | --- | --- | --- |
| us | User | CPU time in user-space processes (your applications) | High is normal for CPU-bound apps; watch for unexpected processes |
| sy | System | CPU time in kernel-space (system calls, drivers) | Above 20% sustained — driver issue, NFS, or syscall-heavy app |
| ni | Nice | CPU time used by niced (low-priority) processes | Rarely a concern on its own |
| id | Idle | CPU time doing nothing | If high but system is slow, the bottleneck is elsewhere (disk, network) |
| wa | I/O wait | CPU idle while waiting for disk I/O to complete | Above 10–15% sustained — disk bottleneck, not CPU |
| hi | Hardware IRQ | Servicing hardware interrupts (NIC packets, disk) | Above 5% — very high network or disk interrupt rate |
| si | Software IRQ | Kernel software interrupt processing | High on a busy network server — normal; high on a quiet box — investigate |
| st | Steal | CPU time stolen by the hypervisor (VMs only) | Above 5% on a VM — your host is overloaded; contact your cloud provider |
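The alert thresholds in the table can be checked mechanically. The sketch below reads top's %Cpu lines on standard input and prints any sy, wa, hi or st value above the table's limits (20, 10, 5 and 5 percent). The function name flag_cpu_fields is hypothetical, and the parsing assumes the field layout shown in the sample output above.

```shell
# flag_cpu_fields: read top's "%Cpu..." lines on stdin and report
# fields above the alert thresholds from the table
# (sy > 20, wa > 10, hi > 5, st > 5).
flag_cpu_fields() {
    awk '/^%Cpu/ {
        gsub(/[,:]/, " ")          # turn "12.1 sy," into "12.1 sy"
        cpu = $1
        sub(/^%/, "", cpu)         # "%Cpu0" -> "Cpu0"
        for (i = 2; i < NF; i++) {
            v = $i + 0
            name = $(i + 1)
            if ((name == "sy" && v > 20) ||
                (name == "wa" && v > 10) ||
                (name == "hi" && v > 5)  ||
                (name == "st" && v > 5))
                printf "%s: %s %.1f%%\n", cpu, name, v
        }
    }'
}
```

Feed it top's batch output, for example `top -b -n 1 | flag_cpu_fields`. Whether your top prints one line per core in batch mode depends on its version and saved configuration, so treat that part as an assumption and check the man page.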
Process States — The S Column in htop
Every process is always in one of a small number of states. The S column in htop and ps tells you the current state — and the pattern of states across all your processes reveals what the system is waiting on.
R
Running / Runnable
Process is either on a CPU right now or queued waiting for one. Many R-state processes = CPU saturation.
S
Sleeping (interruptible)
Process is waiting for an event (keyboard input, network data, a timer). Will wake immediately when the event arrives. Normal and healthy.
D
Disk Wait (uninterruptible)
Blocked on I/O — disk read, NFS response, or similar. Cannot be interrupted or killed until the I/O completes. Many D-state processes = I/O bottleneck.
Z
Zombie
Process has exited but its parent hasn't called wait() to collect the exit code. Uses no CPU or memory. kill -9 does nothing because the process is already dead; the fix is in the parent.
T
Stopped
Suspended by SIGSTOP or Ctrl+Z. Will not run until it receives SIGCONT. Sometimes left behind by debug sessions or background jobs.
The D-state trap: A process in D state cannot be killed. It holds its PID, any file locks it owns, and any resources it had open — and it will stay that way until the I/O it's waiting on completes (or times out). If you have many D-state processes, the disk or NFS mount is the problem, not the processes. Fix the I/O, and they'll wake up.
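To see the pattern of states across the whole system, count processes by the first letter of their state. A minimal sketch, assuming procps ps; the helper name count_states is made up for this example.

```shell
# count_states: read one STAT value per line (as printed by
# `ps -eo stat=`) and count processes by their main state letter.
# Modifiers such as "s", "+" or "<" after the first letter are ignored.
count_states() {
    awk '{ n[substr($1, 1, 1)]++ }
         END { for (s in n) print s, n[s] }' | sort
}
```

Run it as `ps -eo stat= | count_states`. A handful of R and mostly S is healthy; a large R count points at CPU saturation, and a large D count points at the disk or an NFS mount.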
Per-Core Breakdown with mpstat
htop shows per-core bars visually. mpstat gives you the numbers — useful when you want to identify which specific core is being saturated, or when you're working over SSH without a graphical display.
$ mpstat -P ALL 1 3
14:35:01 CPU %usr %nice %sys %iowait %irq %soft %steal %idle
14:35:02 all 30.2 0.0 3.2 8.1 0.0 0.3 0.0 58.2
14:35:02 0 12.0 0.0 2.1 18.0 0.0 0.2 0.0 67.7
14:35:02 1 98.0 0.0 1.5 0.0 0.0 0.0 0.0 0.5
14:35:02 2 8.0 0.0 3.2 12.0 0.0 0.4 0.0 76.4
14:35:02 3 2.8 0.0 6.1 2.4 0.0 0.6 0.0 88.1
Always look at per-core stats, not just the average. A single-threaded runaway process on a 16-core machine shows up as only about 6% average CPU (1/16 ≈ 6.25%), which is easy to miss in the top-line figure, while it fully saturates one core. mpstat -P ALL 1, pressing 1 in top, or the per-core bars in htop reveal it instantly.
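Picking the saturated core out of mpstat output can be scripted. The sketch below finds the CPU and %idle columns from the header line, so it does not depend on how many columns your sysstat version prints. The name hot_cores and the default 5% idle cut-off are assumptions for illustration.

```shell
# hot_cores [IDLE_LIMIT]: read `mpstat -P ALL` output on stdin and
# list cores whose %idle is below IDLE_LIMIT (default 5).
hot_cores() {
    awk -v limit="${1:-5}" '
        /%idle/ {
            # Header line: remember where the CPU and %idle columns are
            for (i = 1; i <= NF; i++) {
                if ($i == "CPU")   cpu_col = i
                if ($i == "%idle") idle_col = i
            }
            next
        }
        cpu_col && $cpu_col ~ /^[0-9]+$/ && ($idle_col + 0) < (limit + 0) {
            printf "core %s: %.1f%% idle\n", $cpu_col, $idle_col
        }
    '
}
```

Typical use is `mpstat -P ALL 1 3 | hot_cores`. The "all" row is skipped on purpose, since the average is exactly what hides a single busy core.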
Finding CPU-Hungry Processes
Once you know the system is CPU-bound, the next step is identifying which process is responsible. Several tools approach this from different angles:
$ ps aux --sort=-%cpu | head -10
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
www-data 4821 97.8 0.5 520000 85000 ? R 14:12 3:42 python3 /opt/app/worker.py
mysql 1234 4.2 5.1 2500000 840000 ? S 09:00 45:23 mysqld
$ ps -p 4821 -o pid,etime,pcpu,pmem,comm
PID ELAPSED %CPU %MEM COMMAND
4821 03:47 97.8 0.5 python3
$ top -p 4821
$ pidstat -u 1 5
14:35:01 UID PID %usr %system %CPU CPU Command
14:35:02 1000 4821 97.0 0.5 97.5 1 python3
14:35:03 1000 4821 98.0 0.3 98.3 1 python3
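One detail worth knowing when comparing these tools: the %CPU that ps prints is accumulated CPU time divided by elapsed time, an average over the process's whole life, whereas top and pidstat show the current rate. The sketch below reproduces the ps figure from the etime and cputime columns. The helper names to_seconds and cpu_ratio are hypothetical.

```shell
# to_seconds TIME: convert ps time formats ([[dd-]hh:]mm:ss) to seconds.
to_seconds() {
    echo "$1" | awk '{
        days = 0; t = $1
        if (index(t, "-")) { split(t, p, "-"); days = p[1]; t = p[2] }
        n = split(t, f, ":")
        s = 0
        for (i = 1; i <= n; i++) s = s * 60 + f[i]
        print days * 86400 + s
    }'
}

# cpu_ratio ELAPSED CPUTIME: lifetime CPU usage as a whole percentage.
# Multi-threaded processes can exceed 100.
cpu_ratio() (
    elapsed=$(to_seconds "$1")
    cpu=$(to_seconds "$2")
    if [ "$elapsed" -le 0 ]; then echo 0; exit 0; fi
    echo $(( cpu * 100 / elapsed ))
)
```

For a live process, pass it the output of `ps -p PID -o etime=` and `ps -p PID -o cputime=`. A long-running daemon can show a low lifetime figure in ps while burning a full core right now, which is why a second look with top -p or pidstat is worthwhile.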
Scenario 1 — Dozens of the Same Process
1. Count exactly how many there are. The number matters: 8 Python workers for a web app is normal; 400 is not.
$ pgrep -c python3
47
$ pgrep -c -u www-data python3
47
2. Find out what they're actually running. Are they all the same script, or different things?
# Column 12 of ps aux is the first argument after python3, i.e. the script path
$ ps aux | grep python3 | grep -v grep | awk '{print $12}' | sort | uniq -c | sort -rn
47 /opt/app/worker.py
$ ps aux | grep python3 | grep -v grep | head -5
www-data 4821 12.1 0.5 520000 85000 ? S 14:12 0:03 python3 /opt/app/worker.py --queue celery
www-data 4822 9.8 0.5 518000 83000 ? S 14:12 0:02 python3 /opt/app/worker.py --queue celery
3. Find the parent process. Who spawned all these workers? The parent's PID is the key to controlling the pool.
$ ps aux --forest | grep -A 50 celery | head -20
www-data 4800 0.5 0.3 200000 45000 ? S 14:11 0:01 celery worker -A myapp --concurrency=50
www-data 4821 12.1 0.5 520000 85000 ? S 14:12 0:03 \_ python3 /opt/app/worker.py
www-data 4822 9.8 0.5 518000 83000 ? S 14:12 0:02 \_ python3 /opt/app/worker.py
$ ps -p 4821 -o ppid=
4800
4. Decide: is this a normal worker pool or a runaway?
Normal signs: all workers belong to one parent, the parent is a known service (celery, gunicorn, uwsgi, apache), the count matches the configured concurrency level, workers are in S state (sleeping, waiting for work).
Runaway signs: workers are all in R state simultaneously, the count is growing over time (watch -n 1 'pgrep -c python3'), there's no obvious parent managing them, or the parent is a plain shell script.
5. If it's a runaway, stop the parent first, not the children. Killing children while the parent is alive just causes it to respawn them.
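$ kill 4800                      # SIGTERM to the parent; it should shut down its workers
$ pgrep -c python3               # confirm the workers are gone
0
# Only if the parent ignored SIGTERM and workers are still running:
$ kill -9 4800
$ pkill -9 -u www-data python3   # matches every python3 owned by www-data, so check for unrelated jobs first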
6. If it's a legitimate pool that's overloaded, don't kill it. Reduce the concurrency setting in the service configuration (e.g., celery's --concurrency, gunicorn's --workers) and reload the service. Killing workers under load will drop in-flight requests and may corrupt queued tasks.
If you're unsure whether the process count is normal, check the service's documentation or configuration file for its concurrency setting. A well-configured service should document how many workers it's expected to spawn.
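Steps 3 and 4 can be combined into one summary: group the workers by parent and count how many are in R state. A single parent with mostly sleeping workers looks like a managed pool; many parents, or every worker running at once, looks like a runaway. This is a sketch only, and pool_summary is a hypothetical helper name.

```shell
# pool_summary: read "PPID STAT" pairs (one worker per line) and print,
# per parent, the number of workers and how many are in R state.
pool_summary() {
    awk '{ total[$1]++; if ($2 ~ /^R/) running[$1]++ }
         END {
             for (p in total)
                 printf "ppid=%s workers=%d running=%d\n", p, total[p], running[p] + 0
         }' | sort
}
```

With procps ps, `ps -C python3 -o ppid=,stat= | pool_summary` feeds it the python3 workers from this scenario. Run it a few times a second or two apart: a worker count that keeps climbing is the strongest runaway sign.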
Scenario 2 — One Process at 100% CPU
1. Find the process and note its PID, user, and command.
$ ps aux --sort=-%cpu | head -3
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
deploy 9142 99.5 0.8 180000 32000 pts/0 R 15:04 8:32 /usr/bin/python3 -c "while True: pass"
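2. Compare elapsed time with CPU time. If the two roughly match, the process has been on a CPU for almost its entire life. That is expected for a compute job such as a video transcode, and suspicious for anything that should spend most of its time waiting for work.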
$ ps -p 9142 -o pid,etime,cputime,pcpu,comm
PID ELAPSED TIME %CPU COMMAND
9142 08:41 00:08:32 99.5 python3
# For comparison, a legitimate transcode job (PID 9000) shows the same near-match:
9000 01:02:15 01:00:44 97.3 ffmpeg
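3. Attach strace to see what the process is doing right now. Is it making system calls (reading, writing, talking to the network) or spinning in user space with almost none? A near-empty summary is typical of a tight loop, though a pure number-crunching job can look similar, so weigh it against what the command is supposed to do.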
$ strace -p 9142 -c -e trace=all
strace: Process 9142 attached
^C
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- --------
0.00 0.000000 0 2 read
0.00 0.000000 0 1 write
# Only 3 syscalls in 5 seconds: the process is running pure user-space code
# in a tight loop and is not reading or writing anything.
# This is a runaway.
# For comparison, a legitimate job such as the ffmpeg process above (PID 9000):
99.8 8.432100 1234 6832 read ← reading input data
0.1 0.012000 80 150 write ← writing output
4. If it's a legitimate but disruptive job, reduce its priority with renice instead of killing it. This lets the job complete while giving other processes more CPU time.
$ renice +10 -p 9142
9142 (process ID) old priority 0, new priority 10
5. If it's genuinely a runaway, send SIGTERM first, then SIGKILL. SIGTERM lets the process clean up (close files, flush buffers, release locks); use SIGKILL only if SIGTERM is ignored.
$ kill 9142
$ sleep 5 && ps -p 9142
PID TTY TIME CMD
9142 pts/0 00:08:41 python3 ← still running after 5s
$ kill -9 9142
$ ps -p 9142
PID TTY TIME CMD
← header only: the process is gone
Before killing any unfamiliar process, check what it is: ls -la /proc/9142/exe shows the real binary path, and cat /proc/9142/cmdline | tr '\0' ' ' shows the full command including all arguments.
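The SIGTERM-then-SIGKILL sequence from step 5 can be wrapped in a small helper so the waiting period is never skipped. A minimal sketch under stated assumptions: the names is_running and stop_gracefully are made up, the 10-second default is arbitrary, and the zombie check relies on procps ps accepting -o stat= -p.

```shell
# is_running PID: true while the PID exists and is not a zombie.
# Assumes you have permission to signal the process.
is_running() (
    kill -0 "$1" 2>/dev/null || exit 1
    state=$(ps -o stat= -p "$1" 2>/dev/null | tr -d ' ')
    case $state in
        Z*) exit 1 ;;   # already dead, just not reaped yet
    esac
    exit 0
)

# stop_gracefully PID [TIMEOUT]: send SIGTERM, wait up to TIMEOUT
# seconds (default 10), and send SIGKILL only if it is still running.
stop_gracefully() (
    pid=$1
    timeout=${2:-10}
    if ! kill -TERM "$pid" 2>/dev/null; then
        echo "cannot signal $pid (no such process, or no permission)"
        exit 1
    fi
    waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if ! is_running "$pid"; then
            echo "$pid exited after SIGTERM"
            exit 0
        fi
        sleep 1
        waited=$(( waited + 1 ))
    done
    if is_running "$pid"; then
        echo "$pid ignored SIGTERM for ${timeout}s, sending SIGKILL"
        kill -KILL "$pid" 2>/dev/null
    fi
    exit 0
)
```

For the runaway in this scenario that would be `stop_gracefully 9142 10`. Keeping the escalation inside one function makes it harder to reach for kill -9 out of habit.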
Zombie Processes
A zombie (state Z) is a process that has finished executing but hasn't been fully removed from the process table because its parent hasn't called wait() to collect its exit status. The zombie itself is harmless — it uses no CPU, no memory, and no file descriptors. It occupies only a PID slot.
Zombie Process Lifecycle
Parent process Child process
│ │
├──── fork() ──────────►│
│ │ (child does work)
│ │
│ │ exit() ← child finishes
│ │
│ Parent must call wait() to reap the child.
│ Until then: child becomes a ZOMBIE (Z state).
│
├── wait() called ──────► Child fully removed from process table ✓
│
If parent dies before calling wait():
└── Zombie gets reparented to init/systemd → reaped automatically ✓
If parent is alive but buggy (never calls wait()):
└── Zombies accumulate. They use no CPU or memory, but each one
holds a PID. The traditional default limit (kernel.pid_max) is
32768, which is also the ceiling on 32-bit systems; fill those
slots with zombies and no new processes can start.
# Match any STAT beginning with Z (it can also appear as Z+ or Zs)
$ ps aux | awk '$8 ~ /^Z/'
www-data 12041 0.0 0.0 0 0 ? Z 15:32 0:00 [python3] <defunct>
www-data 12042 0.0 0.0 0 0 ? Z 15:32 0:00 [python3] <defunct>
$ ps aux | awk '$8 ~ /^Z/' | wc -l
2
$ ps -p 12041 -o ppid=
12000
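# Last resort: stop the parent. Once it exits, init/systemd adopts and reaps the zombies.
# If the parent is a managed service, restart it through its service manager instead.
$ kill 12000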
1–5 zombies on a busy server is normal — they typically clear within a few seconds as the parent gets around to calling wait(). Only investigate if the count is large and stable (the parent is stuck), or if it's growing steadily (a bug causing the parent to never reap its children).
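When the count is large, the useful question is which parent is failing to reap. The sketch below groups zombies by parent PID; zombie_parents is a hypothetical helper name, and the input format assumes `ps -eo pid=,ppid=,stat=` from procps.

```shell
# zombie_parents: read "PID PPID STAT" lines and print each parent PID
# followed by the number of zombie children it has not reaped.
zombie_parents() {
    awk '$3 ~ /^Z/ { n[$2]++ }
         END { for (p in n) print p, n[p] }' | sort -n
}
```

Run it as `ps -eo pid=,ppid=,stat= | zombie_parents`, then inspect the parent with `ps -p PPID -o pid,etime,args`. One parent holding nearly all the zombies is the process to fix or restart.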
Adjusting Priority — nice and renice
The nice value tells the Linux scheduler how much to favour or deprioritise a process when competing for CPU time. It does not limit CPU usage — on an otherwise idle machine, a niced process still runs at full speed. The effect is only felt when there's competition.
Nice scale: -20 (highest priority) · -10 · 0 (default) · +10 · +19 (lowest priority)
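$ nice -n 10 python3 heavy_script.py    # start a command at lower priority
$ nice -n 19 make -j8                   # lowest priority, for batch builds
$ renice +10 -p 4821                    # lower the priority of a running process
$ ps -p 4821 -o pid,ni,comm             # verify the new nice value
PID NI COMMAND
4821 10 python3
$ renice +10 -u www-data                # every process owned by a user
$ sudo renice -n -5 -p 4821             # raise priority: negative values need root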
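Only root can set negative nice values (raising priority above default). Any user can lower their own processes' priority (raise the nice value toward +19), but by default the change is one-way: once raised, an unprivileged user cannot lower the nice value again. You cannot renice another user's processes without root.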
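To get a feel for how much a nice value matters under contention, the sketch below estimates how two CPU-bound processes would split a single core. It assumes the common rule of thumb that each nice step changes a task's scheduler weight by roughly 25%, which the Linux fair scheduler's weight table follows closely; real results vary with kernel version, cgroups and workload. cpu_share is a hypothetical helper, not an existing command.

```shell
# cpu_share NICE_A NICE_B: approximate percentage of one core that
# process A receives when competing with process B, both CPU-bound.
# Assumption: weight = 1024 / 1.25^nice (rule-of-thumb approximation).
cpu_share() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        wa = 1024 * exp(-a * log(1.25))
        wb = 1024 * exp(-b * log(1.25))
        printf "%.0f\n", 100 * wa / (wa + wb)
    }'
}
```

For example, `cpu_share 0 10` estimates the share a default-priority process keeps when competing with a job at nice 10. On an idle machine none of this applies: the niced job simply gets the whole core.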
CPU Affinity — taskset
By default the kernel scheduler can run any process on any available CPU core. CPU affinity lets you pin a process to specific cores — useful when you want to isolate a CPU-intensive job so it doesn't interfere with latency-sensitive processes on other cores.
$ taskset -c 0 python3 heavy_script.py      # launch pinned to core 0
$ taskset -c 0,1 python3 heavy_script.py    # cores 0 and 1
$ taskset -c 0-3 make -j4                   # a range: cores 0 through 3
$ taskset -cp 2,3 4821                      # re-pin a running process
pid 4821's current affinity list: 0-7
pid 4821's new affinity list: 2,3
$ taskset -cp 4821                          # show the current affinity
pid 4821's current affinity list: 2,3
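Practical use case: You have a production web server on cores 0–3 and want to run a CPU-intensive batch job without it competing for those cores. Pin the batch job to cores 4–7 with taskset -c 4-7 batch_job. The batch job can then never take CPU time on cores 0–3, so the web server keeps its cores however busy the job gets. The two still share memory bandwidth, caches and disk, and the web server itself is only confined to cores 0–3 if you pin it too.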
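Without -c, taskset takes a hexadecimal bitmask instead of a core list, with bit N standing for core N. The sketch below converts a list such as 4-7 into that mask, which helps when reading affinity masks in logs or older scripts. cores_to_mask is a hypothetical helper name.

```shell
# cores_to_mask LIST: convert a taskset-style core list ("0-3", "2,3",
# "0,2-3") into the hexadecimal mask form (bit N set = core N allowed).
cores_to_mask() (
    mask=0
    for part in $(echo "$1" | tr ',' ' '); do
        first=${part%-*}      # "4-7" -> 4, "2" -> 2
        last=${part#*-}       # "4-7" -> 7, "2" -> 2
        i=$first
        while [ "$i" -le "$last" ]; do
            mask=$(( mask | (1 << i) ))
            i=$(( i + 1 ))
        done
    done
    printf '0x%x\n' "$mask"
)
```

`taskset 0xf0 batch_job` is then equivalent to `taskset -c 4-7 batch_job`. The list form is easier to read, so prefer -c in anything a colleague will maintain.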
Context Switching
Every time the kernel switches from running one process to another, it performs a context switch — saving the outgoing process's state and loading the incoming one's. Context switches are cheap individually but add up: a system with thousands of context switches per second per core is spending meaningful time on overhead rather than useful work.
$ vmstat 1 5
r b swpd free buff cache si so bi bo in cs us sy id wa
2 0 0 4.1G 201M 8.2G 0 0 0 12 412 1821 12 3 84 1
8 0 0 4.0G 201M 8.2G 0 0 0 8 890 9200 45 8 46 1
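12 0 0 3.9G 201M 8.2G 0 0 0 14 1100 24000 60 15 24 1
# r = runnable processes, cs = context switches per second. Across these samples cs climbs
# from about 1,800 to 24,000 as r grows from 2 to 12, and sy rises from 3% to 15%:
# the kernel is spending more of its time switching between tasks.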
$ pidstat -w -p 4821 1 5
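14:42:01 UID PID cswch/s nvcswch/s Command
14:42:02 1000 4821 240.0 8120.0 python3
# cswch/s = voluntary switches (the process blocked or slept);
# nvcswch/s = involuntary switches (the scheduler preempted it).
# A high nvcswch/s means the process is competing with others for CPU.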
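Because vmstat reports context switches for the whole machine, dividing by the core count gives a figure that can be compared between servers of different sizes. A minimal sketch: the helper name cs_per_core is an assumption, and it locates the cs column from vmstat's header rather than assuming a fixed position.

```shell
# cs_per_core CORES: read vmstat output on stdin and print the
# context-switch rate per core for each sample line.
cs_per_core() {
    awk -v cores="$1" '
        $1 == "r" {
            # Column header line: find where "cs" is
            for (i = 1; i <= NF; i++) if ($i == "cs") col = i
            next
        }
        col && $col ~ /^[0-9]+$/ { printf "%d\n", $col / cores }
    '
}
```

Typical use is `vmstat 1 5 | cs_per_core "$(nproc)"`. Remember that the first vmstat sample is an average since boot, so judge the trend from the later lines.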
Kill vs Throttle — Choosing the Right Response
Throttle (renice, reschedule, or co-ordinate) when:
- The process is doing legitimate work (backup, build, encoding)
- Killing it would mean losing progress or corrupting output
- The process belongs to a production service that must keep running
- The problem is timing: the job ran at peak hours when it shouldn't
- The impact is "annoying but manageable", not an outage
- You can talk to the person who started it and co-ordinate
Kill when:
- The process is confirmed to be an infinite loop / runaway
- It is actively causing an outage for other users or services
- strace shows no productive work, only a pure CPU spin
- The process belongs to a test/dev environment, not production
- It's already consuming all available CPU and growing
- The process is owned by a user who isn't reachable for co-ordination
Always SIGTERM before SIGKILL. SIGTERM gives the process a chance to close files, flush buffers, release database connections, and delete temp files. SIGKILL bypasses all of that — it's the equivalent of pulling the power cable. For a database, SIGKILL can mean corruption. For a web service, it means dropped connections. Reserve it for when SIGTERM has demonstrably failed.
Quick Reference — Chapter 2 Commands
| Command | Purpose | Key flags / notes |
| --- | --- | --- |
| mpstat -P ALL 1 | Per-core CPU breakdown every second | Press 1 in top for the same view interactively; htop shows per-core bars by default |
| ps aux --sort=-%cpu | Snapshot of all processes sorted by CPU usage (highest first) | Pipe to head -10 to limit output |
| ps aux --forest | Process tree showing parent/child relationships | Pipe to grep -A 20 processname to focus on one service |
| ps -p PID -o etime,cputime,pcpu,comm | Elapsed time vs accumulated CPU time for one process | A near match means the process has been CPU-bound for its whole life |
| pgrep -c name | Count processes matching name | -u user to filter by user · -l to list PIDs with process names |
| pgrep -a name | List PIDs and full command lines matching name | Better than piping ps into grep: no grep process in the results |
| strace -p PID -c | Attach to running process, summarise system calls | Run for 5–10 seconds then Ctrl+C for the summary |
| pidstat -u 1 | Per-process CPU stats updated every second | -w for context switches · -p PID for one process |
| renice +10 -p PID | Reduce a running process's CPU priority | Range +1 to +19 lowers priority; negative values need root |
| nice -n 10 cmd | Launch a command at reduced priority | -n 19 for lowest priority batch jobs |
| taskset -c 0,1 cmd | Launch command pinned to specific CPU cores | taskset -cp 2,3 PID to change affinity of running process |
| kill PID | Send SIGTERM — graceful shutdown request | Always try this first. Wait 10–15 seconds before escalating. |
| kill -9 PID | Send SIGKILL — forced, immediate termination | Last resort. No cleanup. Can't be caught or ignored. |
| pkill -P PID | Send SIGTERM to all children of a parent process | -9 for SIGKILL · -u user name to match by user and name |
| ps aux \| awk '$8 ~ /^Z/' | List zombie processes | Find the parent with ps -p ZOMBIEPID -o ppid= then restart the parent |