
Chapter 7 — System-Wide Tuning & Kernel Parameters

The Linux kernel exposes hundreds of tunable parameters that control how it manages memory, handles network connections, and allocates resources between processes. Most of the time the defaults are fine. Sometimes they aren't: a high-traffic web server, a database under load, or a system that keeps throwing "too many open files". In those cases you need to know how to find, change, and (critically) safely revert these parameters.

What this chapter covers:

- sysctl: reading, changing, and persisting kernel parameters
- The four parameter namespaces: vm.*, net.*, fs.*, kernel.*
- ulimit: per-process resource limits and the soft/hard limit distinction
- Persistent limits via /etc/security/limits.d/ and systemd LimitNOFILE
- tuned-adm profiles for pre-packaged tuning
- sar for viewing historical system data
- Scenario 1: diagnosing and fixing "too many open files"
- Scenario 2: reverting a sysctl change that made things worse

sysctl — Reading and Writing Kernel Parameters

Kernel tunables are exposed as files under /proc/sys/. sysctl is a convenient interface for reading and writing them. Changes made at runtime take effect immediately but are lost on reboot unless written to a config file. This is actually useful: a bad change won't survive a reboot, giving you a natural safety net.

# ── Reading parameters ──────────────────────────────────────────
$ sysctl vm.swappiness                          # read one parameter
vm.swappiness = 60
$ sysctl -a                                     # list ALL parameters (very long output)
$ sysctl -a | grep tcp_mem                      # filter for TCP memory settings
$ sysctl -a 2>/dev/null | grep '^vm\.' | sort   # all vm.* params, sorted

# Read directly from /proc/sys (same result)
$ cat /proc/sys/vm/swappiness
60

# ── Writing parameters at runtime (requires root) ───────────────
# These take effect immediately but are LOST on reboot
$ sysctl -w vm.swappiness=10
vm.swappiness = 10

# Or write directly to /proc/sys
$ echo 10 > /proc/sys/vm/swappiness

# Verify the change applied
$ sysctl vm.swappiness
vm.swappiness = 10

# ── Making changes persistent ────────────────────────────────────
# /etc/sysctl.conf: the traditional file (works, but gets cluttered)
# /etc/sysctl.d/:   drop-in directory (preferred, keeps changes organised)
$ cat /etc/sysctl.d/99-mytuning.conf
vm.swappiness = 10
net.core.somaxconn = 1024
fs.file-max = 2097152
# (check /proc/sys/fs/file-max first: on many systemd-based systems it is
#  already far higher, and this line would lower it)

# Reload config files (applies persistent config to the running kernel)
$ sysctl --system    # reads /etc/sysctl.d/, /run/sysctl.d/, /usr/lib/sysctl.d/ and similar, then /etc/sysctl.conf
$ sysctl -p /etc/sysctl.d/99-mytuning.conf    # reload just one file

Naming convention: The directory path under /proc/sys/ maps to the sysctl parameter name — dots replace slashes. So /proc/sys/vm/swappiness ↔ vm.swappiness, and /proc/sys/net/ipv4/tcp_fin_timeout ↔ net.ipv4.tcp_fin_timeout.
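The mapping is mechanical, so it is easy to script when you need to move between the two forms. The sketch below assumes bash and coreutils `tr`. `sysctl_path` and `sysctl_name` are hypothetical helper names, not part of procps. It does not handle interface names that themselves contain a dot (such as VLAN interfaces), where a blind dot-to-slash swap produces the wrong path.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: convert between sysctl names and /proc/sys paths.
# Naive mapping: every "." becomes "/" and vice versa.

# vm.swappiness -> /proc/sys/vm/swappiness
sysctl_path() {
  printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

# /proc/sys/net/ipv4/tcp_fin_timeout -> net.ipv4.tcp_fin_timeout
sysctl_name() {
  local rel=${1#/proc/sys/}          # strip the /proc/sys/ prefix
  printf '%s\n' "$(printf '%s' "$rel" | tr / .)"
}
```

For example, `cat "$(sysctl_path net.core.somaxconn)"` reads the same value as `sysctl net.core.somaxconn`.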

Key Parameters to Know

vm.swappiness (default 60)
How aggressively to swap. 0–10 for servers with plenty of RAM that must avoid swap latency. 60 for general use. 0 does not disable swap — it just deprioritises it strongly.

vm.dirty_ratio (default 20)
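Maximum percentage of available memory (free plus reclaimable pages, not total RAM) that can be dirty, meaning modified but not yet written to disk, before processes that are writing get throttled and must wait. Too low means frequent pauses; too high means large write spikes when the kernel flushes.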

vm.dirty_background_ratio (default 10)
The dirty-memory percentage at which background writeback starts; keep it lower than dirty_ratio. Once it is exceeded, kernel flusher threads write dirty pages out in the background without blocking the processes doing the writing.

vm.vfs_cache_pressure (default 100)
How aggressively to reclaim inode and dentry cache. 50 = keep cache longer (good for file-heavy workloads); 200 = reclaim more aggressively.

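net.core.somaxconn (default 4096; 128 before kernel 5.4)
Maximum length of a listening socket's accept queue. The backlog an application passes to listen() is silently capped at this value, so a web server under a high connection rate may need it raised (up to 65535). When the queue is full, new connections are dropped and clients see retries or timeouts.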

net.ipv4.ip_local_port_range (default 32768–60999)
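Ephemeral port range for outgoing connections. Widening it (for example to 1024–65535) helps servers that make many outgoing connections, such as proxies and database clients, avoid port exhaustion. Make sure the range does not overlap ports your own services need to listen on.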

net.ipv4.tcp_fin_timeout (default 60)
How long, in seconds, an orphaned connection stays in FIN_WAIT2 before the kernel drops it. Reducing it to 15–30 frees sockets faster under heavy connection churn.

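net.ipv4.tcp_tw_reuse (default 2 on current kernels, meaning loopback only; 0 on older kernels)
Allows TIME_WAIT sockets to be reused for new outgoing connections when TCP timestamps make it safe. Set it to 1 to enable this for all traffic on busy outbound services. It is generally safe. The old tcp_tw_recycle is a different matter: it broke clients behind NAT and was removed in kernel 4.12.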

net.core.rmem_max / wmem_max
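Maximum socket receive / send buffer size an application can request explicitly (SO_RCVBUF / SO_SNDBUF). TCP autotuning is capped separately by the third value of net.ipv4.tcp_rmem / tcp_wmem. On high-bandwidth hosts (10 GbE+), raise both to allow larger windows, e.g. net.core.rmem_max=16777216.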

fs.file-max (default scales with RAM; many systemd-based systems raise it to 9223372036854775807)
System-wide maximum number of open file handles across all processes. It is rarely the bottleneck because the per-process ulimit usually hits first. Check usage with cat /proc/sys/fs/file-nr (allocated / unused / max).

fs.inotify.max_user_watches (default 8192 on older kernels; since 5.11 it scales with RAM)
Maximum inotify watches per user. IDEs, build tools, and development servers (webpack, vite) watch thousands of files. If exhausted: "ENOSPC: System limit for number of file watchers reached." Fix: raise to 524288.

fs.inotify.max_user_instances (default 128)
How many inotify instances a single user can have open. Raise to 256–512 on developer workstations.

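kernel.pid_max (default 32768, more on machines with many CPUs)
Maximum PID number, which also caps how many processes and threads can exist at once. On a server spawning many short-lived processes, PIDs wrap around and get reused quickly. Raising it to 4194304 (the maximum on 64-bit systems) makes reuse rarer, which helps scripts and tools that track processes by PID. Recent systemd releases already set this value on 64-bit systems.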

kernel.panic (default 0)
Seconds to wait before auto-rebooting after a kernel panic. 0 = don't reboot (shows the panic for investigation). Servers often set 10 to auto-recover from transient kernel crashes.

kernel.shmmax (default varies)
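Maximum size of a single System V shared memory segment, in bytes. PostgreSQL 9.2 and older required it to be larger than shared_buffers, which made it the classic fix for "FATAL: could not create shared memory segment." This rarely needs changing today for two reasons. PostgreSQL 9.3+ allocates most of its shared memory with mmap. Kernels since 3.16 also default to an effectively unlimited value.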

kernel.nmi_watchdog (default 1)
Lockup detector that uses a hardware performance counter (PMU) to catch CPUs stuck with interrupts disabled. Set it to 0 only when a profiling tool needs the counter it occupies.
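Before changing any of these, it is worth recording the current values so you know exactly what to revert to. The sketch below assumes bash; `snapshot_sysctl` is a hypothetical helper, and the `SYSCTL_ROOT` override exists only so it can be tested against a fake directory. It prints lines in the same `name = value` format that /etc/sysctl.d/ files use, so the output can be re-applied later with `sysctl -p FILE`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print current values of the named parameters in
# sysctl.conf format so they can be restored later with `sysctl -p FILE`.
# SYSCTL_ROOT defaults to /proc/sys; override it only for testing.
snapshot_sysctl() {
  local root=${SYSCTL_ROOT:-/proc/sys} name path value
  for name in "$@"; do
    path="$root/$(printf '%s' "$name" | tr . /)"
    if [ ! -r "$path" ]; then
      printf 'skipping %s: %s not readable\n' "$name" "$path" >&2
      continue
    fi
    # Multi-value parameters (e.g. ip_local_port_range) are tab-separated in /proc.
    value=$(tr '\t' ' ' < "$path")
    printf '%s = %s\n' "$name" "$value"
  done
}
```

For example, running `snapshot_sysctl vm.swappiness vm.dirty_ratio > /root/sysctl-before.conf` before a change gives you a file that `sysctl -p /root/sysctl-before.conf` can apply to restore those values.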

# Check what the kernel is actually using right now for open files
$ cat /proc/sys/fs/file-nr
3712    0    9223372036854775807
# Column 1: allocated file handles in use system-wide (3,712)
# Column 2: allocated but unused handles (always 0 on modern kernels; ignore)
# Column 3: system maximum (fs.file-max)

# Check the conntrack table (if iptables/nftables connection tracking is in use)
$ cat /proc/sys/net/netfilter/nf_conntrack_count    # currently tracked connections
$ cat /proc/sys/net/netfilter/nf_conntrack_max      # maximum allowed
# When count reaches max, new connections are dropped and the kernel logs
# "nf_conntrack: table full, dropping packet"
# Fix: sysctl -w net.netfilter.nf_conntrack_max=524288
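To turn those raw counters into a warning, compare the current count against the maximum. The sketch below assumes bash, and for the conntrack wrapper, that the nf_conntrack module is loaded. `table_usage` and `check_conntrack` are hypothetical helpers. The 80% default threshold is an arbitrary example, not a kernel recommendation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print how full a kernel table is and warn above a threshold.
# Usage: table_usage LABEL CURRENT MAX [THRESHOLD_PERCENT]
# Returns 1 when usage is at or above the threshold.
table_usage() {
  local label=$1 current=$2 max=$3 threshold=${4:-80}
  local pct=$(( current * 100 / max ))    # integer percentage, rounded down
  printf '%s: %d of %d (%d%%)\n' "$label" "$current" "$max" "$pct"
  if [ "$pct" -ge "$threshold" ]; then
    printf 'WARNING: %s is at or above %d%% of its limit\n' "$label" "$threshold" >&2
    return 1
  fi
}

# Example wiring for the conntrack table (these files exist only when nf_conntrack is loaded).
check_conntrack() {
  local dir=/proc/sys/net/netfilter
  [ -r "$dir/nf_conntrack_count" ] || { echo "nf_conntrack not loaded" >&2; return 0; }
  table_usage conntrack "$(cat "$dir/nf_conntrack_count")" "$(cat "$dir/nf_conntrack_max")"
}
```

For example, `table_usage conntrack 450 500` prints `conntrack: 450 of 500 (90%)`, writes a warning to stderr, and returns a non-zero status that a cron job or monitoring check can act on.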

ulimit — Per-Process Resource Limits

While sysctl fs.file-max is a system-wide ceiling, ulimit controls per-process (and per-user-session) limits. They operate in layers:

Kernel max: fs.file-max is the ceiling on open file handles for the entire system. Separately, fs.nr_open (kernel default 1048576) caps how high any process's open-files hard limit can be set.
Hard limit: set in /etc/security/limits.conf or /etc/security/limits.d/. Only root can raise a hard limit. An unprivileged process can either lower its hard limit (irreversibly) or raise its soft limit up to the hard limit.
Soft limit: the limit actually enforced on the process. A process can raise its own soft limit up to the hard limit without root. This is what applications typically hit when they report "too many open files."
systemd services: pam_limits applies limits.conf when a PAM login session starts. systemd services do not go through PAM by default, so limits.conf has no effect on them. Set LimitNOFILE= in the unit file instead; this is the correct way to raise limits for modern services.

# ulimit in the current shell session
$ ulimit -a        # show all limits for the current shell
$ ulimit -n        # show soft open-files limit
1024
$ ulimit -Hn       # show hard open-files limit
1048576

# Raise only the soft limit for the current shell (up to the hard limit, no root needed).
# Use -S: in bash, plain `ulimit -n 65536` sets BOTH soft and hard limits,
# and an unprivileged shell cannot raise the hard limit back afterwards.
$ ulimit -Sn 65536

# Other useful limits
$ ulimit -u        # max user processes
$ ulimit -v        # max virtual memory (kB)
$ ulimit -s        # stack size (kB)

# Read a running process's actual limits (independent of your current shell)
$ cat /proc/8821/limits
Limit                     Soft Limit           Hard Limit           Units
Max open files            1024                 1048576              files
Max processes             128453               128453               processes
Max address space         unlimited            unlimited            bytes
# This process's soft open-files limit is still 1024. Limits are inherited at
# launch, so raising the shell's limit later does not affect it.
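Scripts often need the soft and hard values out of /proc/PID/limits. The sketch below assumes the column layout shown above: limit name, then soft, then hard, then units. `nofile_limits` is a hypothetical helper, and accepting a file path as well as a PID is only there to make it easy to test.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "SOFT HARD" for the "Max open files" limit.
# Accepts a PID or a path to a limits file.
nofile_limits() {
  local src=$1
  case $src in
    */*) ;;                         # already a path
    *)   src=/proc/$src/limits ;;   # treat the argument as a PID
  esac
  # Fields: $1=Max $2=open $3=files $4=soft $5=hard $6=units
  awk '/^Max open files/ { print $4, $5 }' "$src"
}
```

For example, `read -r soft hard < <(nofile_limits 8821)` puts the two values into shell variables.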

Persistent limits — /etc/security/limits.d/

# /etc/security/limits.d/99-myapp.conf
# Format: domain  type  item  value
# domain: username, @groupname, or * (all users; wildcard and group
#         limits are not applied to root)

# Raise open file limits for the app user
appuser     soft  nofile  65536
appuser     hard  nofile  65536

# Raise for all users in the www-data group
@www-data   soft  nofile  32768
@www-data   hard  nofile  32768

# Raise the process limit for all users (for servers with many workers)
*           soft  nproc   65536
*           hard  nproc   65536

# IMPORTANT: pam_limits applies these only to NEW login sessions.
# A running process does not pick up changes until it restarts.

# To verify that a new login sees the new limits:
$ su - appuser -c "ulimit -n"
65536
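pam_limits only reports mistakes in these files (such as a misspelled item) in the system log, and only at login time. A quick syntax check before you rely on the file can save a confusing debugging session. The sketch below assumes bash and the four-column format above. `check_limits_file` is a hypothetical helper, and it recognises only a subset of valid items; limits.conf(5) has the full list.

```shell
#!/usr/bin/env bash
# Hypothetical checker for /etc/security/limits.d/*.conf syntax.
# Checks column count, type (soft/hard/-) and a subset of common items.
# Returns 0 if no problems were found.
check_limits_file() {
  local file=$1 lineno=0 errors=0
  local line domain type item value extra
  while IFS= read -r line || [ -n "$line" ]; do
    lineno=$((lineno + 1))
    line=${line%%#*}                                 # drop comments
    read -r domain type item value extra <<< "$line"
    [ -z "$domain" ] && continue                     # blank or comment-only line
    if [ -z "$value" ] || [ -n "$extra" ]; then
      echo "$file:$lineno: expected 4 columns: domain type item value"
      errors=$((errors + 1))
      continue
    fi
    case $type in
      soft|hard|-) ;;
      *) echo "$file:$lineno: type must be soft, hard or -, got '$type'"
         errors=$((errors + 1)) ;;
    esac
    # Subset of the items documented in limits.conf(5).
    case $item in
      nofile|nproc|core|data|fsize|memlock|stack|cpu|as|rss|locks|sigpending|msgqueue|nice|rtprio|priority|maxlogins|maxsyslogins) ;;
      *) echo "$file:$lineno: unrecognised item '$item' (see limits.conf(5))"
         errors=$((errors + 1)) ;;
    esac
  done < "$file"
  [ "$errors" -eq 0 ]
}
```

Running `check_limits_file /etc/security/limits.d/99-myapp.conf` prints one line per problem and exits non-zero if it found any.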

systemd service limits

# For services managed by systemd, limits.conf is NOT read.
# Set limits in the service unit file (or a drop-in override).

# Option 1: add to /etc/systemd/system/myapp.service
[Service]
LimitNOFILE=65536
LimitNPROC=65536

# Option 2: create a drop-in override (better: doesn't touch the original unit)
$ systemctl edit myapp       # opens an editor for a drop-in override
# Contents of the drop-in (saved as /etc/systemd/system/myapp.service.d/override.conf):
[Service]
LimitNOFILE=65536

# Reload systemd and restart the service for the changes to take effect
$ systemctl daemon-reload
$ systemctl restart myapp

# Verify the service has the new limit
$ systemctl show -p LimitNOFILE myapp
LimitNOFILE=65536
$ cat /proc/$(systemctl show -p MainPID --value myapp)/limits | grep "open files"
Max open files            65536                65536                files
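`systemctl edit` is interactive, which is awkward in provisioning scripts. Drop-ins are plain files in /etc/systemd/system/UNIT.service.d/, so a script can write one directly. The sketch below assumes bash. `write_nofile_dropin`, the drop-in file name `50-nofile.conf`, and the `ROOT` prefix (used only for testing) are hypothetical choices. You still need `systemctl daemon-reload` and a restart afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a systemd drop-in that sets LimitNOFILE for a unit,
# without opening an editor. Pass the full unit name, e.g. myapp.service.
# ROOT is empty in real use; set it only for testing.
write_nofile_dropin() {
  local unit=$1 limit=$2
  local dir="${ROOT:-}/etc/systemd/system/${unit}.d"
  mkdir -p "$dir"
  cat > "$dir/50-nofile.conf" <<EOF
[Service]
LimitNOFILE=$limit
EOF
  echo "$dir/50-nofile.conf"    # report the file written
}
```

A typical invocation would be `write_nofile_dropin myapp.service 65536 && systemctl daemon-reload && systemctl restart myapp`.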

tuned-adm — Pre-Packaged Tuning Profiles

tuned is a daemon that applies a set of kernel parameters, CPU governor settings, and disk scheduler settings as a named profile. Rather than researching and setting dozens of individual sysctl values, you pick a profile that matches your workload type. It is installed by default on RHEL/CentOS/Fedora and available on Debian/Ubuntu.

throughput-performance

Maximises CPU throughput and I/O bandwidth. Sets CPU governor to performance, disables power saving, tunes network buffers for bulk data. Best for: batch jobs, database servers, data processing.

latency-performance

Minimises response latency. Keeps CPUs out of deep power-saving states (C-states), uses the performance governor, and lowers swappiness and dirty-page thresholds. Best for: trading systems, real-time applications, gaming servers.

balanced

Moderate tuning that balances power efficiency and performance. It is a common default, although the profile tuned actually picks depends on the hardware (check with tuned-adm recommend). Best for: general-purpose servers, development machines.

powersave

Minimises energy consumption. CPU governor set to powersave, reduces disk write frequency. Best for: idle servers, edge devices, cost-sensitive deployments.

virtual-guest

Tuned for running inside a hypervisor. Disables unnecessary host-level tuning, reduces overhead for virtualised I/O. Best for: any VM or VPS.

network-latency

Builds on latency-performance and adds network-oriented settings, such as disabling transparent huge pages and enabling socket busy polling. Best for: latency-sensitive network services such as web/API servers.

# Install tuned if not present
$ apt install tuned          # Debian/Ubuntu
$ dnf install tuned          # RHEL/Fedora
$ systemctl enable --now tuned

# Working with profiles
$ tuned-adm active           # show current profile
Current active profile: balanced
$ tuned-adm list             # all available profiles
$ tuned-adm recommend        # recommend a profile for this hardware
virtual-guest

# Apply a profile (takes effect immediately, survives reboot)
$ tuned-adm profile throughput-performance
Switching to profile 'throughput-performance'

# See what a profile does
$ tuned-adm profile_info throughput-performance
$ cat /usr/lib/tuned/throughput-performance/tuned.conf   # stock profile; path varies by tuned version, custom profiles live under /etc/tuned/

# Revert to the previous profile (or any other), or unload tuning entirely
$ tuned-adm profile balanced
$ tuned-adm off
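`tuned-adm profile_info` describes a profile. The most direct way to see what applying one actually changed on your machine is to snapshot `sysctl -a` before and after, then compare. The sketch below assumes bash and awk; `sysctl_diff` is a hypothetical helper. It only covers sysctl values, not the CPU governor or disk settings that tuned also manages, and it ignores parameters that exist in only one snapshot.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare two `sysctl -a` snapshots and print every
# parameter whose value changed, as "name: old -> new".
# Typical use:
#   sysctl -a 2>/dev/null > before.txt
#   tuned-adm profile throughput-performance
#   sysctl -a 2>/dev/null > after.txt
#   sysctl_diff before.txt after.txt
sysctl_diff() {
  awk '
    # Split "name = value" on the first " = " only; values may contain tabs.
    function parse(line) {
      i = index(line, " = ")
      if (i == 0) return 0
      key = substr(line, 1, i - 1)
      val = substr(line, i + 3)
      return 1
    }
    NR == FNR { if (parse($0)) old[key] = val; next }   # first file: remember values
    parse($0) && (key in old) && old[key] != val {
      print key ": " old[key] " -> " val
    }
  ' "$1" "$2"
}
```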

sar — Historical System Activity Data

sar (System Activity Reporter) is part of the sysstat package. When sysstat is installed and its collection service is running, it records CPU, memory, disk, and network statistics every 10 minutes by default. The data goes to files under /var/log/sysstat/ (Debian/Ubuntu) or /var/log/sa/ (RHEL-family systems). This lets you look back at what was happening hours or days ago, which is invaluable when someone reports "the server was slow this morning" and you weren't there.

# Install (if not present)
$ apt install sysstat        # Debian/Ubuntu
$ dnf install sysstat        # RHEL/Fedora
$ systemctl enable --now sysstat

# ── Live sampling ───────────────────────────────────────────────
$ sar -u 1 5        # CPU utilisation: sample every 1s, 5 times
$ sar -r 1 5        # memory utilisation: 1s interval, 5 samples
$ sar -b 1 5        # I/O statistics: transfers, reads, writes per second
$ sar -n DEV 1 5    # network: packets and bytes per interface per second

# ── Historical data from today ──────────────────────────────────
$ sar -u                      # CPU history from midnight to now (default)
$ sar -r                      # memory history from midnight to now
$ sar -u -s 08:00 -e 12:00    # CPU between 08:00 and 12:00 today
Linux 6.1.0 (myserver)    06/14/2026    _x86_64_    (8 CPU)

08:00:01 AM  CPU  %user  %nice  %system  %iowait  %steal  %idle
08:10:01 AM  all   12.3    0.0      4.1     28.4     0.0   55.2
08:20:01 AM  all   14.1    0.0      5.2     31.7     0.0   49.0
# High %iowait between 08:00 and 08:20: something was hammering the disk.
# Correlate with sar -b and sar -d to identify which disk.

# ── Historical data from a previous day ─────────────────────────
$ sar -u -f /var/log/sysstat/sa13   # CPU data from the 13th of this month
$ ls /var/log/sysstat/              # see what data files exist
sa12  sa13  sa14

# Memory: key columns in sar -r output
# %memused   percentage of RAM in use (recent sysstat versions exclude buffers,
#            cache and slab from "used"; check man sar for your version)
# kbbuffers  memory used as kernel buffers
# kbcached   memory used as page cache (reclaimable)
# %commit    memory committed to processes, as a percentage of RAM + swap
$ sar -r -s 02:00 -e 06:00    # investigate a memory spike at 3am
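When you are scanning a full day of `sar -u` output for the bad interval, filtering by a threshold is quicker than reading every line. The sketch below assumes bash and awk; `high_iowait` is a hypothetical filter. It finds the %iowait column from the header rather than hard-coding a position. The time takes one field with a 24-hour clock and two fields (time plus AM/PM) with a 12-hour clock.

```shell
#!/usr/bin/env bash
# Hypothetical filter: read `sar -u` output on stdin and print the timestamp
# and %iowait of every interval above THRESHOLD percent.
# Example: sar -u -f /var/log/sysstat/sa13 | high_iowait 20
high_iowait() {
  awk -v t="${1:?usage: high_iowait THRESHOLD}" '
    /^Average/ { next }                  # summary line has a different layout
    /%iowait/ {                          # header: remember which column holds %iowait
      for (i = 1; i <= NF; i++) if ($i == "%iowait") col = i
      next
    }
    col && $col ~ /^[0-9.]+$/ && $col + 0 > t + 0 {
      ts = $1
      if ($2 == "AM" || $2 == "PM") ts = ts " " $2
      print ts, $col
    }
  '
}
```

Against the sample output above, `high_iowait 30` would print only the 08:20:01 AM interval.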

sysstat not collecting? Check grep ENABLED /etc/default/sysstat — on Debian/Ubuntu it ships with ENABLED="false" and you must change it to "true" and restart the service. Without this, the collection timers are installed but do nothing.

Scenario 1 — "Too Many Open Files" Errors

Find the process and check its actual open file count vs its limit.

# Find the PID of the service
$ pgrep -a myapp
8821 /usr/bin/myapp --config /etc/myapp/config.toml

# How many file descriptors is it actually using right now?
$ ls /proc/8821/fd | wc -l
1021

# What is its soft limit?
$ cat /proc/8821/limits | grep "open files"
Max open files            1024                 1048576              files

# The process has 1,021 FDs open against a soft limit of 1,024.
# It is about to hit the wall, or is already hitting it.
# The hard limit is 1,048,576, so there is plenty of room to raise the soft limit.
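The two checks above can be combined into a single number worth alerting on. The sketch below assumes bash, GNU find, and permission to read the target's /proc entries (usually root or the same user). `fd_usage` is a hypothetical helper, and its optional PROC_ROOT argument exists only for testing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report open FDs against the soft "Max open files" limit.
# Usage: fd_usage PID [PROC_ROOT]   (PROC_ROOT defaults to /proc)
fd_usage() {
  local pid=$1 proc=${2:-/proc}
  local count soft
  # One entry per open descriptor in /proc/PID/fd.
  count=$(( $(find "$proc/$pid/fd" -mindepth 1 -maxdepth 1 | wc -l) ))
  # The soft limit is the 4th field of the "Max open files" line.
  soft=$(awk '/^Max open files/ { print $4 }' "$proc/$pid/limits")
  [ -n "$soft" ] || { echo "cannot read limits for $pid" >&2; return 1; }
  echo "$count/$soft ($(( count * 100 / soft ))%)"
}
```

For the process above, `fd_usage 8821` would report 1021 of 1024, or 99%.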

Understand what those file descriptors are — are there any leaks?

# Break down open FDs by type (column 5 of lsof output is TYPE; NR > 1 skips the header)
$ lsof -p 8821 | awk 'NR > 1 {print $5}' | sort | uniq -c | sort -rn
    847 IPv4     ← 847 open TCP connections
     98 REG      ← 98 regular files
     44 FIFO     ← 44 pipes
     32 unix     ← 32 Unix domain sockets

# 847 TCP connections from one process is a lot.
# Check their states: are they all active, or is there a CLOSE_WAIT buildup?
$ lsof -p 8821 | grep IPv4 | awk '{print $NF}' | sort | uniq -c | sort -rn
    763 (ESTABLISHED)
     84 (CLOSE_WAIT)

# CLOSE_WAIT means the peer has closed its end but the application has not
# called close() on its socket. 84 of them point to a code-level bug.
# Fix the bug long-term, but raise the limit short-term.
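lsof is not always installed on minimal servers or in containers. You can approximate the same breakdown from the symlinks in /proc/PID/fd, whose targets look like `socket:[12345]`, `pipe:[67890]`, `anon_inode:[eventfd]`, or a file path. The sketch below assumes bash; `fd_types` is a hypothetical helper. Unlike lsof, it cannot tell TCP sockets from Unix sockets, which would need a further lookup with a tool such as `ss`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count a process's open descriptors by target type,
# using only /proc (no lsof). Usage: fd_types PID [PROC_ROOT]
# Output: "type count" lines, most common first.
fd_types() {
  local dir="${2:-/proc}/$1/fd" fd target
  for fd in "$dir"/*; do
    [ -L "$fd" ] || continue
    target=$(readlink "$fd")
    case $target in
      socket:*)     echo socket ;;
      pipe:*)       echo pipe ;;
      anon_inode:*) echo anon_inode ;;
      *)            echo file ;;     # regular files, devices, directories
    esac
  done | LC_ALL=C sort | uniq -c | LC_ALL=C sort -k1,1nr -k2,2 | awk '{ print $2, $1 }'
}
```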

Immediate relief — if this is a systemd service, add LimitNOFILE to the unit.

# Create a drop-in override
$ systemctl edit myapp
# Add the following in the editor that opens:
[Service]
LimitNOFILE=65536

# Apply and restart
$ systemctl daemon-reload
$ systemctl restart myapp

# Verify the new limit is in effect
$ cat /proc/$(systemctl show -p MainPID --value myapp)/limits | grep "open files"
Max open files            65536                65536                files

If this is a non-systemd process or a user-launched application — use limits.d/

# Add a drop-in limits file (takes effect on that user's next login)
$ cat > /etc/security/limits.d/99-myapp.conf << 'EOF'
appuser soft nofile 65536
appuser hard nofile 65536
EOF

# Test as that user (in a new login session)
$ su - appuser -c "ulimit -n"
65536

Also check the system-wide ceiling (rarely the bottleneck, but worth verifying).

$ cat /proc/sys/fs/file-nr
9841    0    9223372036854775807
# 9,841 open file handles system-wide against a virtually unlimited ceiling. Not the issue.

# If you ever do hit the system-wide limit (very unusual on modern systems),
# raise it; only do this if the current maximum is lower than the new value:
$ sysctl -w fs.file-max=2097152
$ echo "fs.file-max = 2097152" > /etc/sysctl.d/99-fs-limits.conf

The most common "too many open files" cause is not a limit that is too low — it's a leak. Connection pools not being returned, file handles not being closed, or CLOSE_WAIT accumulation. Raising the limit is the right first step to restore service, but investigate the root cause so you're not raising limits indefinitely.

Scenario 2 — A sysctl Change Made Things Worse

Confirm the current (bad) value and find the default to revert to.

$ sysctl vm.dirty_ratio
vm.dirty_ratio = 3

# What is the kernel default? The safest source is the kernel documentation
# (Documentation/admin-guide/sysctl/ in the kernel source tree).
# For common parameters, the defaults are well known:
#   vm.dirty_ratio            → 20
#   vm.dirty_background_ratio → 10
#   vm.swappiness             → 60
#   net.core.somaxconn        → 4096 (128 before kernel 5.4)
#
# You can also check an unmodified system running the same kernel version.
# Files such as /usr/lib/sysctl.d/50-default.conf hold distribution or systemd
# settings, which may differ from the kernel's built-in defaults.

Revert the runtime value immediately — no reboot needed.

# Set the runtime value back to the default
$ sysctl -w vm.dirty_ratio=20
vm.dirty_ratio = 20

# Verify it applied
$ sysctl vm.dirty_ratio
vm.dirty_ratio = 20

# The running kernel is now using the reverted value.
# Applications see the improved behaviour immediately; no restart needed.

Remove or correct the persistent config file so it doesn't come back after reboot.

# Find which config file set this (check every directory sysctl --system reads)
$ grep -r "dirty_ratio" /etc/sysctl.conf /etc/sysctl.d/ /run/sysctl.d/ /usr/local/lib/sysctl.d/ /usr/lib/sysctl.d/ 2>/dev/null
/etc/sysctl.d/99-custom.conf:vm.dirty_ratio=3

# Remove the bad line (the pattern tolerates spaces around "=")
$ sed -i '/^[[:space:]]*vm\.dirty_ratio[[:space:]]*=/d' /etc/sysctl.d/99-custom.conf

# Or, if reverting the entire file, remove it altogether
$ rm /etc/sysctl.d/99-custom.conf

# Verify nothing in the config files would re-apply the bad value on reboot
$ grep -r "dirty_ratio" /etc/sysctl.conf /etc/sysctl.d/ /run/sysctl.d/ /usr/local/lib/sysctl.d/ /usr/lib/sysctl.d/ 2>/dev/null
# (no output): the bad value is gone from all config files

Test the revert by re-applying every config file the way the boot process does.

# Re-apply all sysctl config files (approximates what happens at boot)
$ sysctl --system
* Applying /usr/lib/sysctl.d/50-default.conf ...
* Applying /etc/sysctl.d/10-network.conf ...
* Applying /etc/sysctl.d/99-custom.conf ...     ← no longer sets dirty_ratio
* Applying /etc/sysctl.conf ...
# Note: this only applies values that appear in the files. A parameter changed
# at runtime but absent from every file keeps its runtime value until reboot.

# Verify the final state is what you want
$ sysctl vm.dirty_ratio vm.dirty_background_ratio
vm.dirty_ratio = 20
vm.dirty_background_ratio = 10
# Both back to defaults. A reboot would produce the same result.

If you want to tune this correctly — understand what the parameters actually control before changing them.

# Monitor dirty memory in real time to understand your workload before tuning
$ watch -n 1 'grep -E "Dirty|Writeback" /proc/meminfo'
Dirty:             45312 kB    ← currently dirty, not yet written
Writeback:           256 kB    ← currently being written to disk

# If Dirty rarely exceeds a few hundred MB, vm.dirty_ratio=20 is already fine.
# Only tune dirty_ratio downward if Dirty regularly reaches tens of GB and causes
# unacceptable pause latency when the kernel forces a flush.
# A safer approach than lowering dirty_ratio: use ionice to deprioritise
# I/O-heavy background processes rather than tightening the global dirty limit.
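`watch` shows the instantaneous value. For tuning you usually want the peak over a representative period. The sketch below assumes bash; `dirty_peak` is a hypothetical helper, and its optional MEMINFO argument exists only for testing. Leave it running through a busy period, for example `dirty_peak 3600 1` for an hour at one-second intervals.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sample the Dirty line of /proc/meminfo SAMPLES times,
# INTERVAL seconds apart, and print the highest value seen (in kB).
# Usage: dirty_peak SAMPLES INTERVAL [MEMINFO]
dirty_peak() {
  local samples=$1 interval=$2 meminfo=${3:-/proc/meminfo}
  local peak=0 kb n
  for (( n = 0; n < samples; n++ )); do
    kb=$(awk '$1 == "Dirty:" { print $2 }' "$meminfo")
    if (( kb > peak )); then peak=$kb; fi
    # No sleep after the final sample.
    if (( n + 1 < samples )); then sleep "$interval"; fi
  done
  echo "$peak"
}
```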

The key safety property of sysctl: changes made with sysctl -w survive until the next reboot, but no longer. If you make a bad change and can't figure out the correct value, rebooting restores whatever your config files specify. Make changes to the runtime kernel first; only write to config files once you've confirmed the change is an improvement.
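That advice can be made mechanical, in the spirit of iptables-apply: apply a change, then roll it back automatically unless you confirm it within a time limit. The sketch below assumes bash and root; `trial_sysctl` is a hypothetical helper, and the `SYSCTL_ROOT` override exists only for testing. It writes only the runtime value, never the config files, so even if the script is killed the change disappears at the next boot.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set a kernel parameter at runtime and restore the old
# value unless the change is confirmed within TIMEOUT seconds.
# Usage: trial_sysctl NAME VALUE [TIMEOUT]
trial_sysctl() {
  local name=$1 new=$2 timeout=${3:-60}
  local path="${SYSCTL_ROOT:-/proc/sys}/$(printf '%s' "$name" | tr . /)"
  local old answer
  old=$(cat "$path") || return 1
  printf '%s\n' "$new" > "$path" || return 1
  echo "$name: $old -> $new (type 'yes' within ${timeout}s to keep)"
  # read fails on timeout or end of input; either way we roll back.
  if read -r -t "$timeout" answer && [ "$answer" = "yes" ]; then
    echo "$name: keeping $new (runtime only; add it to /etc/sysctl.d/ to persist)"
  else
    printf '%s\n' "$old" > "$path"
    echo "$name: rolled back to $old"
    return 1
  fi
}
```

For example, `trial_sysctl vm.dirty_ratio 10 120` gives you two minutes to check the effect before the old value comes back on its own.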

Quick Reference — Chapter 7 Commands

Command	Purpose	Notes
sysctl -a | grep '^vm\.'	List all virtual memory kernel parameters	Use `sysctl parameter.name` to read one value. Dot notation maps to the `/proc/sys/` path.
sysctl -w vm.swappiness=10	Change a kernel parameter at runtime (lost on reboot)	Verify with `sysctl vm.swappiness`. Safe — rebooting undoes it unless in a config file.
sysctl --system	Reload all config files from /etc/sysctl.d/ (simulates boot)	`sysctl -p /etc/sysctl.d/file.conf` to reload just one file
cat /proc/PID/limits	Show a running process's actual resource limits	More reliable than `ulimit -a` which shows the current shell, not the target process
ulimit -n	Show soft open-files limit for current shell session	`ulimit -Hn` for hard limit · `ulimit -a` for all limits · `ulimit -Sn 65536` to raise the soft limit only (plain `ulimit -n 65536` sets both soft and hard in bash)
cat /proc/sys/fs/file-nr	System-wide: used / unused / max open file descriptors	Rarely the bottleneck — per-process ulimit hits first
systemctl edit service	Create a drop-in override for a systemd unit (add LimitNOFILE=)	Follow with `systemctl daemon-reload && systemctl restart service`
tuned-adm recommend	Ask tuned what profile best suits this hardware	`tuned-adm profile throughput-performance` to apply · `tuned-adm active` to check current
sar -u 1 5	CPU utilisation — sample every 1 second, 5 times	`sar -r` memory · `sar -b` I/O · `sar -n DEV` network · `sar -f /var/log/sysstat/sa13` historical
grep -r "param" /etc/sysctl.d/	Find which config file sets a particular kernel parameter	Run before modifying to know where to clean up. Also check `/etc/sysctl.conf`.