Chapter 7 — System-Wide Tuning & Kernel Parameters
The Linux kernel exposes hundreds of tunable parameters that control how it manages memory, handles network connections, and allocates resources between processes. Most of the time defaults are fine. But when defaults aren't fine — a high-traffic web server, a database under load, or a system constantly throwing "too many open files" — understanding how to find, change, and (critically) revert these parameters safely is essential.
sysctl — Reading and Writing Kernel Parameters
Kernel tunables are exposed as files under /proc/sys/. sysctl is a convenient interface for reading and writing them. Changes made at runtime take effect immediately but are lost on reboot unless written to a config file. This is actually useful: a bad change won't survive a reboot, giving you a natural safety net.
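The runtime-first, persist-later workflow described above might look like the following sketch. The helper name `persist_sysctl` and the file name `99-local-tuning.conf` are hypothetical, and the target directory is a parameter so the function can be tried against a scratch directory before it is pointed at /etc/sysctl.d/.

```shell
# Try a value at runtime first (needs root, lost on reboot):
#   sudo sysctl -w vm.swappiness=10
#   sysctl vm.swappiness        # verify the new value

# Hypothetical helper: append "key = value" to a sysctl.d fragment.
# Usage: persist_sysctl KEY VALUE [DIR]   (DIR defaults to /etc/sysctl.d)
persist_sysctl() {
    local key=$1 value=$2 dir=${3:-/etc/sysctl.d}
    local conf="$dir/99-local-tuning.conf"    # assumed file name
    mkdir -p "$dir" || return 1
    printf '%s = %s\n' "$key" "$value" >> "$conf" || return 1
    echo "$conf"                              # report which file was written
}

# Once the runtime value has proven itself (run as root):
#   persist_sysctl vm.swappiness 10
#   sysctl --system             # reload the sysctl config files
```

Keeping local changes in one clearly named fragment makes them easy to find and delete later.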
Each path under /proc/sys/ maps to a sysctl parameter name: drop the /proc/sys/ prefix and replace the slashes with dots. So /proc/sys/vm/swappiness ↔ vm.swappiness, and /proc/sys/net/ipv4/tcp_fin_timeout ↔ net.ipv4.tcp_fin_timeout.
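The path-to-name mapping can be sketched as two small shell functions. The function names are made up for illustration, and the sketch assumes no path component itself contains a dot (some network interface names do, which this simple translation would mangle).

```shell
# Convert a /proc/sys path to its sysctl name.
path_to_sysctl() {
    printf '%s\n' "${1#/proc/sys/}" | tr '/' '.'
}

# Convert a sysctl name back to its /proc/sys path.
sysctl_to_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

path_to_sysctl /proc/sys/net/ipv4/tcp_fin_timeout   # prints net.ipv4.tcp_fin_timeout
sysctl_to_path vm.swappiness                        # prints /proc/sys/vm/swappiness
```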
Key Parameters to Know
vm.swappiness (default 60): How aggressively to swap. 0–10 for servers with plenty of RAM that must avoid swap latency. 60 for general use. 0 does not disable swap; it just deprioritises it strongly.
vm.dirty_ratio (default 20): Max percentage of RAM that can be dirty (unwritten to disk) before processes are throttled to wait for writes. Too low = frequent pauses; too high = large write spikes on flush.
vm.dirty_background_ratio (default 10): When background flushing starts. Set lower than dirty_ratio. The kernel starts writing dirty pages quietly in the background once this is exceeded.
vm.vfs_cache_pressure (default 100): How aggressively to reclaim inode and dentry cache. 50 = keep cache longer (good for file-heavy workloads); 200 = reclaim more aggressively.
net.core.somaxconn (default 128 before Linux 5.4, 4096 since): Upper bound on the listen queue depth. A web server under a high connection rate needs this raised (1024–65535). A larger backlog passed to listen() is silently capped at this value, and once the queue is full new connections are dropped.
net.ipv4.ip_local_port_range (default 32768–60999): Ephemeral port range for outgoing connections. Widen to 1024–65535 on servers making many outgoing connections (proxies, databases) to relieve port exhaustion.
net.ipv4.tcp_fin_timeout (default 60): How long an orphaned socket stays in FIN_WAIT2, in seconds. Reducing to 15–30 frees sockets faster under heavy connection churn.
net.ipv4.tcp_tw_reuse (default 0; newer kernels default to 2, which enables reuse for loopback traffic only): Allow TIME_WAIT sockets to be reused for outgoing connections. Set to 1 on busy outbound services. Generally safe when TCP timestamps are enabled. Do not confuse it with tcp_tw_recycle, which broke clients behind NAT and was removed in Linux 4.12.
net.core.rmem_max / net.core.wmem_max: Max receive / send socket buffer size. Raise on high-bandwidth hosts (10 GbE+) to allow larger windows, for example net.core.rmem_max=16777216.
fs.file-max (default ~800,000+): System-wide maximum number of open file descriptors across all processes. Rarely the bottleneck, because the per-process ulimit usually hits first. Check with cat /proc/sys/fs/file-nr (used / unused / max).
fs.inotify.max_user_watches (default 8192 on older kernels; newer ones scale it with available memory): Maximum inotify watches per user. IDEs, build tools, and development servers (webpack, vite) watch thousands of files. If exhausted: "ENOSPC: System limit for number of file watchers reached." Fix: raise to 524288.
fs.inotify.max_user_instances (default 128): How many inotify instances a single user can have open. Raise to 256–512 on developer workstations.
kernel.pid_max (default 32768): Maximum PID number. On a server spawning many short-lived processes the PID space wraps around quickly; raise to 4194304 (the maximum on 64-bit systems) to avoid PID collision edge cases.
kernel.panic (default 0): Seconds to wait before auto-rebooting after a kernel panic. 0 = don't reboot (keeps the panic on screen for investigation). Servers often set 10 to auto-recover from transient kernel crashes.
kernel.shmmax (default varies): Maximum size of a single System V shared memory segment in bytes. PostgreSQL releases before 9.3 need this to be larger than shared_buffers; later releases allocate most shared memory with mmap and rarely hit it. Common fix for "FATAL: could not create shared memory segment."
kernel.nmi_watchdog (default 1): Hard-lockup detector driven by non-maskable interrupts from a CPU performance counter. Set 0 only if needed to free that counter for specific profiling tools.
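Before changing any of these, it helps to record what the running kernel actually uses, since defaults vary by kernel version and distribution. The following is a minimal sketch; the function name `show_tunables` and the choice of parameters are illustrative, and the root directory is a parameter only so the function can be exercised against a test tree.

```shell
# Print "name = value" for a handful of the parameters discussed above.
# Usage: show_tunables [ROOT]   (ROOT defaults to /proc/sys)
show_tunables() {
    local root=${1:-/proc/sys} name path
    for name in vm.swappiness vm.dirty_ratio vm.dirty_background_ratio \
                net.core.somaxconn net.ipv4.tcp_fin_timeout \
                fs.file-max fs.inotify.max_user_watches kernel.pid_max; do
        # Translate the dotted name into a path below ROOT.
        path="$root/$(printf '%s' "$name" | tr '.' '/')"
        if [ -r "$path" ]; then
            printf '%s = %s\n' "$name" "$(cat "$path")"
        else
            printf '%s = (not available)\n' "$name"
        fi
    done
}

show_tunables
```

Inside containers some network parameters may be reported as not available, because they are not exposed in every namespace.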
ulimit — Per-Process Resource Limits
While sysctl fs.file-max is a system-wide ceiling, ulimit controls per-process (and per-user-session) limits. They operate in layers:
- Kernel max: fs.file-max is the absolute ceiling for the entire system. The combined open file descriptors of all processes cannot exceed it.
- Hard limit: Set in /etc/security/limits.conf or /etc/security/limits.d/. Only root can raise the hard limit. Unprivileged users can only lower the hard limit or raise their soft limit up to the hard limit.
- Soft limit: The actual working limit enforced on the process. A process can raise its own soft limit up to the hard limit without root. This is what applications typically hit when they report "too many open files."
- systemd override: For services managed by systemd, LimitNOFILE= in the unit file overrides limits.conf entirely. This is the correct way to raise limits for modern services, because limits.conf is not read by systemd services by default.
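The soft/hard relationship is easy to observe from an unprivileged shell. This sketch assumes the hard open-files limit of the current session is at least 256, which is true on typical installations; the function names are illustrative.

```shell
# Print the soft and hard open-file limits of the current shell.
show_nofile_limits() {
    printf 'soft=%s hard=%s\n' "$(ulimit -Sn)" "$(ulimit -Hn)"
}

# In a subshell, lower the soft limit and then raise it again.
# Neither step needs root as long as the value stays at or below the hard limit.
demo_soft_limit() (
    ulimit -Sn 256
    echo "lowered to $(ulimit -Sn)"
    ulimit -Sn "$(ulimit -Hn)"
    echo "raised to $(ulimit -Sn)"
)

show_nofile_limits
demo_soft_limit
```

Because the demonstration runs in a subshell, the limits of the calling shell are left untouched.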
Persistent limits — /etc/security/limits.d/
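A limits.d fragment uses four columns: domain, type, item, value. The sketch below generates one for the open-files limit; the helper name, the `90-USER-nofile.conf` file name, and the `appuser` account are all hypothetical, and the directory is a parameter so the output can be inspected in a scratch location first.

```shell
# Hypothetical helper: write an open-files fragment for one user.
# Usage: write_nofile_limits USER SOFT HARD [DIR]
#        (DIR defaults to /etc/security/limits.d)
write_nofile_limits() {
    local user=$1 soft=$2 hard=$3 dir=${4:-/etc/security/limits.d}
    mkdir -p "$dir" || return 1
    cat > "$dir/90-$user-nofile.conf" <<EOF
# <domain>  <type>  <item>   <value>
$user  soft  nofile  $soft
$user  hard  nofile  $hard
EOF
}

# As root, for an assumed account called appuser:
#   write_nofile_limits appuser 65536 131072
# Then log in again as that user and check with: ulimit -Sn; ulimit -Hn
```

These files are applied by PAM when a session starts, so existing logins keep their old limits until the user logs in again.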
systemd service limits
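For a systemd-managed service, the limit lives in a drop-in file next to the unit. The following sketch writes such a drop-in by hand, which is roughly what `systemctl edit` produces interactively; the helper name and the `myapp.service` unit are assumptions for illustration.

```shell
# Hypothetical helper: create a drop-in that sets LimitNOFILE for one unit.
# Usage: write_nofile_dropin UNIT VALUE [ROOT]
#        (ROOT defaults to /etc/systemd/system)
write_nofile_dropin() {
    local unit=$1 value=$2 root=${3:-/etc/systemd/system}
    local dir="$root/$unit.d"
    mkdir -p "$dir" || return 1
    printf '[Service]\nLimitNOFILE=%s\n' "$value" > "$dir/override.conf"
}

# As root, for an assumed unit called myapp.service:
#   write_nofile_dropin myapp.service 65536
#   systemctl daemon-reload && systemctl restart myapp.service
#   systemctl show myapp.service -p LimitNOFILE   # what systemd will apply
#   cat /proc/"$(systemctl show myapp.service -p MainPID --value)"/limits
```

Checking /proc/PID/limits of the restarted service confirms that the running process actually received the new limit.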
tuned-adm — Pre-Packaged Tuning Profiles
tuned is a daemon that applies a set of kernel parameters, CPU governor settings, and disk scheduler settings as a named profile. Rather than researching and setting dozens of individual sysctl values, you pick a profile that matches your workload type. It is installed by default on RHEL/CentOS/Fedora and available on Debian/Ubuntu.
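A typical session asks tuned for its recommendation, applies it, and confirms the result. The wrapper below is a sketch under the assumption that tuned is installed and the caller has root privileges; the function name is made up, while the `recommend`, `profile`, and `active` subcommands are the standard tuned-adm ones.

```shell
# Apply whatever profile tuned recommends, then report the active one.
apply_recommended_profile() {
    local rec
    command -v tuned-adm >/dev/null 2>&1 || {
        echo "tuned-adm not installed" >&2
        return 1
    }
    rec=$(tuned-adm recommend) || return 1
    echo "recommended: $rec"
    tuned-adm profile "$rec" || return 1
    tuned-adm active
}

# Other useful commands:
#   tuned-adm list                              # show all available profiles
#   tuned-adm profile throughput-performance    # pick a profile explicitly
```

Treat the recommendation as a starting point: it is derived from the detected hardware and virtualisation type, not from the workload.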
sar — Historical System Activity Data
sar (System Activity Reporter) is part of the sysstat package. When sysstat is installed and its collection service is running, it records CPU, memory, disk, and network statistics every 10 minutes to files in /var/log/sysstat/. This lets you look back at what was happening hours or days ago — invaluable when someone reports "the server was slow this morning" and you weren't there.
Check whether collection is enabled with grep ENABLED /etc/default/sysstat. On Debian/Ubuntu it ships with ENABLED="false"; change it to "true" and restart the service. Without this, the collection timers are installed but do nothing.
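Historical data is stored in one file per day of the month, named saDD. The helper below, whose name is hypothetical, builds that path so it can be handed to `sar -f`; it defaults to the Debian/Ubuntu directory and accepts another one (RHEL-family systems use /var/log/sa).

```shell
# Build the path of the sysstat data file for a given day of the month.
# Usage: sa_file DAY [DIR]   (DIR defaults to /var/log/sysstat)
sa_file() {
    # Strip one leading zero so that "08" is not parsed as octal by printf.
    printf '%s/sa%02d\n' "${2:-/var/log/sysstat}" "${1#0}"
}

# "The server was slow this morning" on the 13th:
#   sar -u -f "$(sa_file 13)" -s 08:00:00 -e 10:00:00   # CPU, 08:00 to 10:00
#   sar -r -f "$(sa_file 13)"                           # memory, whole day
#   sar -n DEV -f "$(sa_file 13 /var/log/sa)"           # network, RHEL path
```

The `-s` and `-e` options narrow the report to a time window, which keeps the output readable when only one incident matters.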
Scenario 1 — "Too Many Open Files" Errors
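A reasonable first step is to compare how many descriptors the affected process has open with the limits it is actually running under. The following sketch reads both from /proc; the function name is illustrative, and inspecting another user's process generally requires root.

```shell
# Print the open-descriptor count and the open-files limits for a PID.
# Usage: fd_usage PID [PROC]   (PROC defaults to /proc)
fd_usage() {
    local pid=$1 proc=${2:-/proc} open soft hard
    [ -r "$proc/$pid/limits" ] || { echo "cannot read limits for $pid" >&2; return 1; }
    # Each entry in the fd directory is one open descriptor.
    open=$(ls "$proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
    # The "Max open files" row holds the soft limit, then the hard limit.
    soft=$(awk '/^Max open files/ { print $4 }' "$proc/$pid/limits")
    hard=$(awk '/^Max open files/ { print $5 }' "$proc/$pid/limits")
    printf 'pid=%s open=%s soft=%s hard=%s\n' "$pid" "$open" "$soft" "$hard"
}

# For an assumed process named nginx:
#   fd_usage "$(pgrep -o nginx)"
#   cat /proc/sys/fs/file-nr      # rule out the system-wide ceiling
```

If the open count sits close to the soft limit, raise the limit where the process gets it from: LimitNOFILE= for a systemd service, or a limits.d entry for a login session.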
Scenario 2 — A sysctl Change Made Things Worse
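One way to make a runtime experiment reversible without a reboot is to record the old value before touching it. This is a minimal sketch: the function name and the snapshot path are hypothetical, and the root directory is a parameter only so the function can be tested against a fake tree.

```shell
# Save the current value of one parameter as a "key = value" line.
# Usage: snapshot_sysctl KEY OUTFILE [ROOT]   (ROOT defaults to /proc/sys)
snapshot_sysctl() {
    local key=$1 out=$2 root=${3:-/proc/sys} path
    path="$root/$(printf '%s' "$key" | tr '.' '/')"
    [ -r "$path" ] || { echo "cannot read $path" >&2; return 1; }
    printf '%s = %s\n' "$key" "$(cat "$path")" >> "$out"
}

# Before experimenting (as root; path and values are illustrative):
#   snapshot_sysctl vm.dirty_ratio /root/sysctl-before.conf
#   sysctl -w vm.dirty_ratio=5
# To roll back without rebooting:
#   sysctl -p /root/sysctl-before.conf
# To find a persisted copy of the setting that also needs removing:
#   grep -r "vm.dirty_ratio" /etc/sysctl.d/ /etc/sysctl.conf
```

The snapshot uses the same key = value syntax as the sysctl config files, which is why `sysctl -p` can load it directly.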
Changes made with sysctl -w survive until the next reboot, but no longer. If you make a bad change and can't figure out the correct value, rebooting restores whatever your config files specify. Make changes to the runtime kernel first; only write to config files once you've confirmed the change is an improvement.
Quick Reference: Chapter 7 Commands
| Command | Purpose | Notes |
|---|---|---|
| sysctl -a \| grep '^vm\.' | List all virtual memory kernel parameters | Use sysctl parameter.name to read one value. Dot notation maps to /proc/sys/ path. |
| sysctl -w vm.swappiness=10 | Change a kernel parameter at runtime (lost on reboot) | Verify with sysctl vm.swappiness. Safe — rebooting undoes it unless in a config file. |
| sysctl --system | Reload all config files from /etc/sysctl.d/ (simulates boot) | sysctl -p /etc/sysctl.d/file.conf to reload just one file |
| cat /proc/PID/limits | Show a running process's actual resource limits | More reliable than ulimit -a which shows the current shell, not the target process |
| ulimit -n | Show soft open-files limit for current shell session | ulimit -Hn for hard limit · ulimit -a for all limits · ulimit -n 65536 to raise |
| cat /proc/sys/fs/file-nr | System-wide: used / unused / max open file descriptors | Rarely the bottleneck — per-process ulimit hits first |
| systemctl edit service | Create a drop-in override for a systemd unit (add LimitNOFILE=) | Follow with systemctl daemon-reload && systemctl restart service |
| tuned-adm recommend | Ask tuned what profile best suits this hardware | tuned-adm profile throughput-performance to apply · tuned-adm active to check current |
| sar -u 1 5 | CPU utilisation — sample every 1 second, 5 times | sar -r memory · sar -b I/O · sar -n DEV network · sar -f /var/log/sysstat/sa13 historical |
| grep -r "param" /etc/sysctl.d/ | Find which config file sets a particular kernel parameter | Run before modifying to know where to clean up. Also check /etc/sysctl.conf. |