Performance and Profiling
Chapter 6 — Performance and Profiling
Bash is not Python. It is not Go. But it is also not as slow as most people assume — when written well. The performance gaps that matter are almost always attributable to a small number of identifiable patterns: unnecessary forking, reading files line by line through subshells, and ignoring built-ins that do the same job faster. This chapter gives you the tools to measure first and the knowledge to fix the right things.
1 — Measuring: time, TIMEFORMAT, and date +%s%N
The time built-in
# time wraps any simple command or { compound command; } time find /usr -name '*.so' >/dev/null # real 0m0.412s # user 0m0.089s # sys 0m0.323s # TIMEFORMAT controls the output format (only affects the bash built-in time, # not /usr/bin/time) TIMEFORMAT='%R seconds real' time { sleep 0.3; } # 0.300 seconds real # Capture the output — time writes to stderr TIMEFORMAT='%R' elapsed=$( { time my_function; } 2>&1 >/dev/null ) # Note: >/dev/null discards stdout of my_function; 2>&1 captures stderr (time output) # TIMEFORMAT tokens: # %R elapsed real time (seconds, 3 decimal places) # %U user CPU time # %S system CPU time # %P CPU percentage (%U + %S) / %R * 100
High-resolution timestamps in a loop
# date +%s%N gives nanoseconds since epoch (GNU date only) t0=$(date +%s%N) # ... work ... t1=$(date +%s%N) printf '%.3f ms\n' "$(( (t1 - t0) / 1000000 ))" # Portable microsecond timer via EPOCHREALTIME (bash 5.0+) # No fork — just a variable read t0="$EPOCHREALTIME" # ... work ... t1="$EPOCHREALTIME" # EPOCHREALTIME is a float like 1718000000.123456 # Use awk or bc for the subtraction (bash can't do float arithmetic) awk "BEGIN { printf \"%.6f s\n\", $t1 - $t0 }" # Micro-benchmark helper — runs CMD N times and reports average bench() { local n="${1}"; shift local i TIMEFORMAT='%R' local total total=$( { time { for (( i=0; i<n; i++ )); do "$@" >/dev/null 2&1; done; }; } 2>&1 ) awk "BEGIN { printf \"%d runs: %.3fs total, %.4fs avg\n\", $n, $total, $total/$n }" } bench 100 date +%s # 100 runs: 0.312s total, 0.0031s avg bench 100 printf '%(%s)T' -1 # 100 runs: 0.001s total, 0.0000s avg
2 — The Cost of Forks
Every $(…) command substitution and every | pipeline forks a subshell. On a modern system each fork costs roughly 0.5–3 ms. That sounds tiny, but a loop running 1 000 iterations with one subshell each costs 0.5–3 seconds doing nothing but forking.
| Operation | Typical cost | Notes |
|---|---|---|
Fork + exec ($(date)) | 1–3 ms | Creates child process, execs binary |
Subshell only ($(printf…)) | 0.3–1 ms | Fork without exec — printf is a built-in |
$((…)) arithmetic | <1 µs | No fork — evaluated in-process |
${var//pat/rep} substitution | <1 µs | No fork — built-in pattern engine |
read -r from file | ~5 µs/line | Syscall per character by default; use mapfile |
mapfile -t arr < file | ~0.5 µs/line | Reads in larger chunks |
Pipe (cmd | cmd) | 0.2–1 ms setup | Two forks for the pipeline itself |
# Concrete comparison: 1000 timestamps # Slow: fork+exec per iteration TIMEFORMAT='%R' time { for (( i=0; i<1000; i++ )); do ts=$(date +%s) # fork + exec /usr/bin/date done } # ~2.8s # Fast: built-in printf format (bash 4.2+) time { for (( i=0; i<1000; i++ )); do printf -v ts '%(%s)T' -1 # no fork at all done } # ~0.004s — 700× faster
3 — Replacing External Commands with Built-ins
A systematic audit of your hot path often reveals external commands that Bash can replace with zero forks.
| External command | Bash built-in replacement | Notes |
|---|---|---|
echo "text" | printf '%s\n' "text" | Both are built-ins; prefer printf for portability |
$(date +%s) | printf -v v '%(%s)T' -1 | Bash 4.2+; $EPOCHSECONDS in 5.0+ |
$(date +%Y-%m-%d) | printf -v v '%(%Y-%m-%d)T' -1 | strftime format via printf |
$(basename "$f") | ${f##*/} | Parameter expansion |
$(dirname "$f") | ${f%/*} | Caveat: returns $f if no slash |
$(echo "$s" | tr a-z A-Z) | ${s^^} | Bash 4.0+ case conversion |
$(echo "$s" | tr A-Z a-z) | ${s,,} | Bash 4.0+ |
$(expr "$n" + 1) | $(( n + 1 )) or (( n++ )) | No fork for arithmetic |
n=$(wc -l < file) | mapfile -t a < file; n=${#a[@]} | No fork; trades memory for speed |
$(cat file) | $(< file) | Bash opens the file directly — no cat fork |
grep -c '' file | see mapfile above | Or use awk in one pass |
$(pwd) | $PWD | Always in sync with current directory |
$(whoami) | $USER or ${USER:-$(id -un)} | $USER set by PAM/login |
$(hostname) | $HOSTNAME | Bash sets this at startup |
$(printf '%s' "$v" | wc -c) | ${#v} | Character count (not bytes for multibyte) |
# Pattern expansion as sed/grep replacement # Remove leading whitespace s=" hello" s="${s#"${s%%[! ]*}"}" # trim leading spaces — no fork # Check if string contains a substring — no grep if [[ "$haystack" == *"$needle"* ]]; then echo "found"; fi # Check if string starts with prefix [[ "$s" == error:* ]] # Replace first occurrence — no sed s="${s/foo/bar}" # Replace all occurrences s="${s//foo/bar}" # Strip ANSI escape codes (a common sed job) — pure bash regex clean="$line" while [[ "$clean" =~ $'\e'\[[0-9;]*m ]]; do clean="${clean//"${BASH_REMATCH[0]}"}" done
4 — Reading Files Efficiently
# Slowest: subshell per line while IFS= read -r line; do processed+=( $(transform "$line") ) # fork per line! done < input.txt # Faster: read entire file into array with mapfile, then loop in bash mapfile -t lines < input.txt for line in "${lines[@]}"; do # pure-bash transformation here processed+=( "${line^^}" ) done # Fastest for text processing: delegate the whole job to a single awk/sed pass # One fork, no per-line overhead, purpose-built for text awk '{ print toupper($0) }' input.txt > output.txt # Reading a small file into a variable — $(< file) vs $(cat file) content=$(< config.txt) # Bash opens fd directly — no cat fork content=$(cat config.txt) # fork + exec cat — avoidable # mapfile with a callback (-C -c): process every N lines without building the # entire array in memory first process_chunk() { # $1 = index of first new element, $2 = value printf 'Processing index %d: %s\n' "$1" "$2" } mapfile -t -C process_chunk -c 1 arr < big_file.txt
5 — PS4 Profiling: Line-Level Tracing
Setting PS4 to include timestamps and enabling set -x turns every executed line into a timed trace. This is crude but requires no external tools and works on any Bash version.
# Minimal: show line numbers PS4='+${BASH_SOURCE}:${LINENO}: ' set -x # ... script body ... set +x # Timed: prefix each line with microseconds elapsed since start # Requires bash 5.0+ for EPOCHREALTIME _PROF_START="$EPOCHREALTIME" PS4='+ $(awk "BEGIN{printf \"%8.4f\",'"$EPOCHREALTIME"'-'"$_PROF_START"'}")s ${BASH_SOURCE##*/}:${LINENO}: ' # Caution: the awk in PS4 forks on every traced line — only enable for short # sections or you'll skew the results significantly. # Better approach: redirect xtrace to a dedicated FD, analyse after exec 19>/tmp/trace_$$ BASH_XTRACEFD=19 PS4='+ $EPOCHREALTIME ${BASH_SOURCE##*/}:${LINENO}: ' set -x # ... script body ... set +x exec 19>&- # Analyse the trace: find lines that took the most time # Each line of the trace looks like: + 1718034567.123456 script.sh:42: command awk ' /^\+/ { split($2, a, ".") ts = a[1] "." a[2] if (prev_ts != "") { elapsed = ts - prev_ts printf "%8.4fs %s\n", elapsed, prev_line } prev_ts = ts prev_line = substr($0, index($0,$3)) } ' /tmp/trace_$$ | sort -rn | head -20
BASH_XTRACEFD (Bash 4.1+) redirects set -x
output to a specific file descriptor instead of stderr. This lets you capture trace
data without polluting the script's error stream, making it safe to enable in
production for a single slow function.
6 — Function-Level Profiling with DEBUG Trap
# The DEBUG trap fires before every simple command. # Combine with BASH_SOURCE / FUNCNAME / LINENO for call-graph data. declare -A _PROF_ENTER # function → entry timestamp declare -A _PROF_TOTAL # function → cumulative ms declare -A _PROF_CALLS # function → call count _prof_debug() { local fn="${FUNCNAME[1]:-main}" local now="$EPOCHREALTIME" # Record entry on first call in this function frame [[ -z "${_PROF_ENTER[$fn]:-}" ]] && _PROF_ENTER["$fn"]="$now" } _prof_return() { local fn="${FUNCNAME[1]:-main}" local now="$EPOCHREALTIME" local entry="${_PROF_ENTER[$fn]:-}" [[ -z "$entry" ]] && return local ms ms=$(awk "BEGIN{printf \"%d\", ($now-$entry)*1000}") (( _PROF_TOTAL["$fn"]+=ms )) || true (( _PROF_CALLS["$fn"]++ )) || true unset '_PROF_ENTER[$fn]' } prof_start() { trap '_prof_debug' DEBUG trap '_prof_return' RETURN } prof_report() { printf '\n%-30s %8s %8s %10s\n' "Function" "Calls" "Total ms" "Avg ms" printf '%s\n' "$(printf '%.0s-' {1..60})" for fn in "${!_PROF_TOTAL[@]}"; do printf '%-30s %8d %8d %10.2f\n' \ "$fn" \ "${_PROF_CALLS[$fn]}" \ "${_PROF_TOTAL[$fn]}" \ "$(( _PROF_TOTAL["$fn"] * 100 / _PROF_CALLS["$fn"] ))e-2" done | sort -k3 -rn }
7 — Hot-Path Anti-Patterns and Fixes
Anti-pattern 1: Forking inside a loop
# Bad: one fork per iteration for f in "${files[@]}"; do ext=$(echo "$f" | sed 's/.*\.//') # fork + exec echo + fork + exec sed base=$(basename "$f") # fork + exec basename done # Good: parameter expansion — zero forks for f in "${files[@]}"; do ext="${f##*.}" base="${f##*/}" done
Anti-pattern 2: Useless use of cat
# Bad: cat is just a conduit cat file.txt | grep "pattern" cat file.txt | wc -l # Good: redirect directly grep "pattern" file.txt wc -l < file.txt
Anti-pattern 3: Redundant subshells in conditionals
# Bad: grep returns an exit code — no need to capture output if [[ $(grep -c "pattern" file) -gt 0 ]]; then echo found; fi # Good: test the exit code directly if grep -q "pattern" file; then echo found; fi # Bad: wc in arithmetic if (( $(wc -l < file) > 100 )); then echo big; fi # Good: mapfile + array length if you need the data anyway, # or awk for a single-pass threshold check (no array allocation) awk 'END { exit (NR > 100 ? 0 : 1) }' file && echo big
Anti-pattern 4: Concatenating strings with $(...)
# Bad: three forks to build a path string logfile="$(dirname "$script")/logs/$(basename "$script" .sh)_$(date +%Y%m%d).log" # Good: parameter expansion + built-in printf strftime dir="${script%/*}" name="${script##*/}"; name="${name%.sh}" printf -v stamp '%(%Y%m%d)T' -1 logfile="${dir}/logs/${name}_${stamp}.log"
Anti-pattern 5: Piping to while read
# Bad: the while loop runs in a subshell — variables set inside are lost grep "ERROR" app.log | while IFS= read -r line; do (( errors++ )) # invisible to parent done echo "errors: $errors" # always 0 # Good: redirect with process substitution — loop runs in current shell while IFS= read -r line; do (( errors++ )) || true done < <(grep "ERROR" app.log) echo "errors: $errors" # correct # Even better: let awk count in one pass — no while loop at all errors=$(grep -c "ERROR" app.log)
8 — When to Stop Using Bash
Recognising the boundary where Bash becomes the wrong tool is as important as knowing how to optimise it. These signals suggest it is time to reach for Python, Go, or another language:
| Signal | Reason to switch |
|---|---|
| Processing more than ~100k lines of text per run | awk/Python will be 10–100× faster with less code |
| Nested data structures (JSON, YAML, XML) | Bash has no native JSON parser; jq is a fork per call |
| More than ~500 lines of logic | No type system, no unit-testable modules, hard to maintain |
| Floating-point arithmetic throughout | Bash has no floats; every float operation needs awk or bc |
| HTTP client / API integration | Python requests or Go net/http are far more ergonomic |
| Stateful concurrency (shared queues, worker pools) | No threads; file-based synchronisation is brittle at scale |
| Security-sensitive string parsing | Too easy to introduce injection; use a real parser |
Exercises
Exercise 1 — Fork audit
The following function has at least six unnecessary forks. Identify each one
and rewrite the function to eliminate all of them using only Bash built-ins.
Then benchmark both versions with bench 200 original_func and
bench 200 fast_func and report the speedup ratio.
original_func() { local filepath="$1" local dir=$(dirname "$filepath") local base=$(basename "$filepath") local ext=$(echo "$base" | sed 's/.*\.//') local stem=$(basename "$base" ".$ext") local upper=$(echo "$stem" | tr 'a-z' 'A-Z') local ts=$(date '+%Y%m%d_%H%M%S') echo "${dir}/${upper}_${ts}.${ext}" }
# Forks in original_func: # 1. $(dirname) → ${filepath%/*} # 2. $(basename) → ${filepath##*/} # 3. $(echo | sed) to get ext → ${base##*.} # 4. $(basename base .ext) to get stem → ${base%.*} # 5. $(echo | tr) to uppercase → ${stem^^} # 6. $(date) → printf -v ts '%(%Y%m%d_%H%M%S)T' -1 fast_func() { local filepath="$1" local dir="${filepath%/*}" local base="${filepath##*/}" local ext="${base##*.}" local stem="${base%.*}" local upper="${stem^^}" local ts; printf -v ts '%(%Y%m%d_%H%M%S)T' -1 printf '%s/%s_%s.%s\n' "$dir" "$upper" "$ts" "$ext" } # Benchmark helper from section 1 bench() { local n="$1"; shift local i TIMEFORMAT='%R' local total total=$( { time { for (( i=0; i<n; i++ )); do "$@" >/dev/null 2&1 done; }; } 2>&1 ) awk "BEGIN { printf \"%-20s %d runs: %.3fs (%.4f avg)\n\", \"$1\", $n, $total, $total/$n }" } bench 200 original_func "/var/log/app/server.log" bench 200 fast_func "/var/log/app/server.log" # Typical output: # original_func 200 runs: 2.461s (0.0123 avg) # fast_func 200 runs: 0.003s (0.0000 avg) ~800× faster
Exercise 2 — PS4 trace analyser
Write a script profile_me.sh that:
- Opens FD 19 to
/tmp/trace_$$.log - Sets
BASH_XTRACEFD=19and a timestampedPS4 - Enables
set -xthen runs a deliberately slow function (at least 5 different operations including adatecall, agrep, and awc) - Disables tracing and closes FD 19
- Parses the trace log with awk to produce a table of the top 10 slowest lines (line content + elapsed time in ms since previous traced line)
#!/usr/bin/env bash set -uo pipefail TRACE_LOG="/tmp/trace_$$.log" trap 'rm -f "$TRACE_LOG"' EXIT # ---- Slow function under test ---- slow_work() { local ts ts=$(date '+%Y-%m-%d %H:%M:%S') # fork 1 local count count=$(grep -c 'root' /etc/passwd) # fork 2 local lines lines=$(wc -l < /etc/passwd) # fork 3 local upper upper=$(echo "$ts" | tr 'a-z' 'A-Z') # fork 4+5 local host host=$(hostname) # fork 6 printf '%s | root lines: %d/%d | host: %s\n' \ "$upper" "$count" "$lines" "$host" } # ---- Enable tracing to FD 19 ---- exec 19>"$TRACE_LOG" BASH_XTRACEFD=19 PS4='+ $EPOCHREALTIME ${BASH_SOURCE##*/}:${LINENO}: ' set -x slow_work set +x exec 19>&- unset BASH_XTRACEFD # ---- Analyse trace log ---- printf '\nTop 10 slowest lines:\n' awk ' /^\+/ { # Fields: + TIMESTAMP SOURCE:LINE: rest... ts = $2 rest = "" for (i = 3; i <= NF; i++) rest = rest (i==3 ? "" : " ") $i if (prev_ts != "") { elapsed_ms = (ts - prev_ts) * 1000 printf "%8.2f ms %s\n", elapsed_ms, prev_rest } prev_ts = ts prev_rest = rest } ' "$TRACE_LOG" | sort -rn | head -10
Exercise 3 — File-reading speed comparison
Write a script that reads a moderately large file (generate one with
seq 1 50000 > /tmp/test.txt) using four methods and times each:
while IFS= read -r line; do ... done < file— count lines with a bash countermapfile -t arr < file— then${#arr[@]}wc -l < fileawk 'END{print NR}' file
Print a formatted comparison table showing elapsed time and lines-per-second for each method. All four must produce the same count.
#!/usr/bin/env bash set -euo pipefail FILE=/tmp/bench_read_$$.txt seq 1 50000 > "$FILE" trap 'rm -f "$FILE"' EXIT TIMEFORMAT='%R' declare -a labels times counts # Method 1: while read c1=0 t1=$( { time { while IFS= read -r _; do (( c1++ )) || true; done < "$FILE" }; } 2>&1 ) # Method 2: mapfile declare -a arr t2=$( { time { mapfile -t arr < "$FILE"; }; } 2>&1 ) c2="${#arr[@]}" unset arr # Method 3: wc -l t3=$( { time { c3=$(wc -l < "$FILE"); }; } 2>&1 ) # Method 4: awk t4=$( { time { c4=$(awk 'END{print NR}' "$FILE"); }; } 2>&1 ) # Verify all agree for c in "$c1" "$c2" "$c3" "$c4"; do [[ "$c" == "$c1" ]] || { echo "Count mismatch!" >&2; exit 1; } done # Print table printf '\n%-25s %10s %8s %15s\n' "Method" "Lines" "Time(s)" "Lines/sec" printf '%s\n' "$(printf '%.0s-' {1..62})" while IFS='|' read -r label t c; do awk "BEGIN { t = $t; c = $c lps = (t > 0) ? int(c/t) : 999999 printf \"%-25s %10d %8.4f %15d\n\", \"$label\", c, t, lps }" done <<< $"while read|$t1|$c1 mapfile|$t2|$c2 wc -l|$t3|$c3 awk NR|$t4|$c4"
Exercise 4 — Hot-path rewrite
The script below processes a log file and is painfully slow on large inputs.
Identify every performance problem, then rewrite it as a single
awk program that produces identical output.
Verify correctness on a 10,000-line synthetic log, then benchmark both versions.
#!/usr/bin/env bash # slow_stats.sh — count ERROR/WARN/INFO lines and show top 3 error messages errors=0; warns=0; infos=0 declare -A msg_count while IFS= read -r line; do level=$(echo "$line" | cut -d' ' -f2) msg=$(echo "$line" | cut -d' ' -f3-) case "$level" in ERROR) (( errors++ )); (( msg_count["$msg"]++ )) || true ;; WARN) (( warns++ )) ;; INFO) (( infos++ )) ;; esac done < "$1" printf 'ERROR: %d WARN: %d INFO: %d\n' "$errors" "$warns" "$infos" printf 'Top 3 error messages:\n' for msg in "${!msg_count[@]}"; do printf '%d %s\n' "${msg_count[$msg]}" "$msg" done | sort -rn | head -3
# Problems in slow_stats.sh: # 1. while IFS= read loop — slow for large files (syscall per character) # 2. Two $(echo | cut) forks per line — 2N forks total # 3. The while|pipe anti-pattern: if it were cmd | while, vars would be lost # (here it's a redirect, so that's not the issue — but the forks are) # Solution: one awk pass does everything — zero extra forks # fast_stats.awk (inline here as a bash here-doc) fast_stats() { awk ' { level = $2 msg = substr($0, index($0, $3)) if (level == "ERROR") { errors++; msg_count[msg]++ } else if (level == "WARN") { warns++ } else if (level == "INFO") { infos++ } } END { printf "ERROR: %d WARN: %d INFO: %d\n", errors, warns, infos print "Top 3 error messages:" # Sort msg_count by value descending — build array of "count msg" lines n = 0 for (m in msg_count) lines[n++] = sprintf("%06d %s", msg_count[m], m) # Simple insertion sort on lines[] for (i = 1; i < n; i++) { key = lines[i]; j = i - 1 while (j >= 0 && lines[j] < key) { lines[j+1] = lines[j]; j-- } lines[j+1] = key } for (i = n-1; i >= n-3 && i >= 0; i--) { cnt = lines[i]+0; sub(/^[0-9]+ /, "", lines[i]) printf "%d %s\n", cnt, lines[i] } } ' "$1" } # Generate a synthetic 10k-line log for testing gen_log() { local levels=( ERROR WARN INFO INFO INFO ) local msgs=( "disk full" "timeout" "connection refused" "auth failed" "ok" ) for (( i=0; i<10000; i++ )); do printf '2024-01-01 %s %s\n' \ "${levels[RANDOM%5]}" "${msgs[RANDOM%5]}" done } LOGFILE=/tmp/test_log_$$.txt trap 'rm -f "$LOGFILE"' EXIT gen_log > "$LOGFILE" # Verify identical output diff <(bash slow_stats.sh "$LOGFILE") <(fast_stats "$LOGFILE") \ && echo "Outputs match" || echo "MISMATCH" # Benchmark TIMEFORMAT='%R' printf 'slow_stats: '; time { bash slow_stats.sh "$LOGFILE" >/dev/null; } printf 'fast_stats: '; time { fast_stats "$LOGFILE" >/dev/null; }