Performance and Profiling

Chapter 6 — Performance and Profiling

Bash is not Python. It is not Go. But it is also not as slow as most people assume — when written well. The performance gaps that matter are almost always attributable to a small number of identifiable patterns: unnecessary forking, reading files line by line through subshells, and ignoring built-ins that do the same job faster. This chapter gives you the tools to measure first and the knowledge to fix the right things.

1 — Measuring: `time`, `TIMEFORMAT`, and `date +%s%N`

The `time` built-in

# time wraps any simple command or { compound command; }
time find /usr -name '*.so' >/dev/null
# real    0m0.412s
# user    0m0.089s
# sys     0m0.323s

# TIMEFORMAT controls the output format (only affects the bash built-in time,
# not /usr/bin/time)
TIMEFORMAT='%R seconds real'
time { sleep 0.3; }    # 0.300 seconds real

# Capture the output — time writes to stderr
TIMEFORMAT='%R'
elapsed=$( { time my_function; } 2>&1 >/dev/null )
# Note: >/dev/null discards stdout of my_function; 2>&1 captures stderr (time output)

# TIMEFORMAT tokens:
#   %R  elapsed real time (seconds, 3 decimal places)
#   %U  user CPU time
#   %S  system CPU time
#   %P  CPU percentage  (%U + %S) / %R * 100

High-resolution timestamps in a loop

# date +%s%N gives nanoseconds since epoch (GNU date only)
t0=$(date +%s%N)
# ... work ...
t1=$(date +%s%N)
printf '%.3f ms\n' "$(( (t1 - t0) / 1000000 ))"

# Portable microsecond timer via EPOCHREALTIME (bash 5.0+)
# No fork — just a variable read
t0="$EPOCHREALTIME"
# ... work ...
t1="$EPOCHREALTIME"
# EPOCHREALTIME is a float like 1718000000.123456
# Use awk or bc for the subtraction (bash can't do float arithmetic)
awk "BEGIN { printf \"%.6f s\n\", $t1 - $t0 }"

# Micro-benchmark helper — runs CMD N times and reports average
bench() {
  local n="${1}"; shift
  local i
  TIMEFORMAT='%R'
  local total
  total=$( { time { for (( i=0; i<n; i++ )); do "$@" >/dev/null 2&1; done; }; } 2>&1 )
  awk "BEGIN { printf \"%d runs: %.3fs total, %.4fs avg\n\", $n, $total, $total/$n }"
}

bench 100 date +%s       # 100 runs: 0.312s total, 0.0031s avg
bench 100 printf '%(%s)T' -1  # 100 runs: 0.001s total, 0.0000s avg

2 — The Cost of Forks

Every $(…) command substitution and every | pipeline forks a subshell. On a modern system each fork costs roughly 0.5–3 ms. That sounds tiny, but a loop running 1 000 iterations with one subshell each costs 0.5–3 seconds doing nothing but forking.

Operation	Typical cost	Notes
Fork + exec (`$(date)`)	1–3 ms	Creates child process, execs binary
Subshell only (`$(printf…)`)	0.3–1 ms	Fork without exec — printf is a built-in
`$((…))` arithmetic	<1 µs	No fork — evaluated in-process
`${var//pat/rep}` substitution	<1 µs	No fork — built-in pattern engine
`read -r` from file	~5 µs/line	Syscall per character by default; use `mapfile`
`mapfile -t arr < file`	~0.5 µs/line	Reads in larger chunks
Pipe (`cmd \| cmd`)	0.2–1 ms setup	Two forks for the pipeline itself

# Concrete comparison: 1000 timestamps

# Slow: fork+exec per iteration
TIMEFORMAT='%R'
time {
  for (( i=0; i<1000; i++ )); do
    ts=$(date +%s)      # fork + exec /usr/bin/date
  done
}
# ~2.8s

# Fast: built-in printf format (bash 4.2+)
time {
  for (( i=0; i<1000; i++ )); do
    printf -v ts '%(%s)T' -1   # no fork at all
  done
}
# ~0.004s — 700× faster

3 — Replacing External Commands with Built-ins

A systematic audit of your hot path often reveals external commands that Bash can replace with zero forks.

External command	Bash built-in replacement	Notes
`echo "text"`	`printf '%s\n' "text"`	Both are built-ins; prefer printf for portability
`$(date +%s)`	`printf -v v '%(%s)T' -1`	Bash 4.2+; `$EPOCHSECONDS` in 5.0+
`$(date +%Y-%m-%d)`	`printf -v v '%(%Y-%m-%d)T' -1`	strftime format via printf
`$(basename "$f")`	`${f##*/}`	Parameter expansion
`$(dirname "$f")`	`${f%/*}`	Caveat: returns `$f` if no slash
`$(echo "$s" \| tr a-z A-Z)`	`${s^^}`	Bash 4.0+ case conversion
`$(echo "$s" \| tr A-Z a-z)`	`${s,,}`	Bash 4.0+
`$(expr "$n" + 1)`	`$(( n + 1 ))` or `(( n++ ))`	No fork for arithmetic
`n=$(wc -l < file)`	`mapfile -t a < file; n=${#a[@]}`	No fork; trades memory for speed
`$(cat file)`	`$(< file)`	Bash opens the file directly — no cat fork
`grep -c '' file`	see `mapfile` above	Or use awk in one pass
`$(pwd)`	`$PWD`	Always in sync with current directory
`$(whoami)`	`$USER` or `${USER:-$(id -un)}`	$USER set by PAM/login
`$(hostname)`	`$HOSTNAME`	Bash sets this at startup
`$(printf '%s' "$v" \| wc -c)`	`${#v}`	Character count (not bytes for multibyte)

# Pattern expansion as sed/grep replacement

# Remove leading whitespace
s="   hello"
s="${s#"${s%%[! ]*}"}"   # trim leading spaces — no fork

# Check if string contains a substring — no grep
if [[ "$haystack" == *"$needle"* ]]; then echo "found"; fi

# Check if string starts with prefix
[[ "$s" == error:* ]]

# Replace first occurrence — no sed
s="${s/foo/bar}"

# Replace all occurrences
s="${s//foo/bar}"

# Strip ANSI escape codes (a common sed job) — pure bash regex
clean="$line"
while [[ "$clean" =~ $'\e'\[[0-9;]*m ]]; do
  clean="${clean//"${BASH_REMATCH[0]}"}"
done

4 — Reading Files Efficiently

# Slowest: subshell per line
while IFS= read -r line; do
  processed+=( $(transform "$line") )   # fork per line!
done < input.txt

# Faster: read entire file into array with mapfile, then loop in bash
mapfile -t lines < input.txt
for line in "${lines[@]}"; do
  # pure-bash transformation here
  processed+=( "${line^^}" )
done

# Fastest for text processing: delegate the whole job to a single awk/sed pass
# One fork, no per-line overhead, purpose-built for text
awk '{ print toupper($0) }' input.txt > output.txt

# Reading a small file into a variable — $(< file) vs $(cat file)
content=$(< config.txt)    # Bash opens fd directly — no cat fork
content=$(cat config.txt)  # fork + exec cat — avoidable

# mapfile with a callback (-C -c): process every N lines without building the
# entire array in memory first
process_chunk() {
  # $1 = index of first new element, $2 = value
  printf 'Processing index %d: %s\n' "$1" "$2"
}
mapfile -t -C process_chunk -c 1 arr < big_file.txt

5 — PS4 Profiling: Line-Level Tracing

Setting PS4 to include timestamps and enabling set -x turns every executed line into a timed trace. This is crude but requires no external tools and works on any Bash version.

# Minimal: show line numbers
PS4='+${BASH_SOURCE}:${LINENO}: '
set -x
# ... script body ...
set +x

# Timed: prefix each line with microseconds elapsed since start
# Requires bash 5.0+ for EPOCHREALTIME
_PROF_START="$EPOCHREALTIME"
PS4='+ $(awk "BEGIN{printf \"%8.4f\",'"$EPOCHREALTIME"'-'"$_PROF_START"'}")s ${BASH_SOURCE##*/}:${LINENO}: '

# Caution: the awk in PS4 forks on every traced line — only enable for short
# sections or you'll skew the results significantly.

# Better approach: redirect xtrace to a dedicated FD, analyse after
exec 19>/tmp/trace_$$
BASH_XTRACEFD=19
PS4='+ $EPOCHREALTIME ${BASH_SOURCE##*/}:${LINENO}: '
set -x

# ... script body ...

set +x
exec 19>&-

# Analyse the trace: find lines that took the most time
# Each line of the trace looks like: + 1718034567.123456 script.sh:42: command
awk '
  /^\+/ {
    split($2, a, ".")
    ts = a[1] "." a[2]
    if (prev_ts != "") {
      elapsed = ts - prev_ts
      printf "%8.4fs  %s\n", elapsed, prev_line
    }
    prev_ts = ts
    prev_line = substr($0, index($0,$3))
  }
' /tmp/trace_$$ | sort -rn | head -20

BASH_XTRACEFD (Bash 4.1+) redirects set -x output to a specific file descriptor instead of stderr. This lets you capture trace data without polluting the script's error stream, making it safe to enable in production for a single slow function.

6 — Function-Level Profiling with `DEBUG` Trap

# The DEBUG trap fires before every simple command.
# Combine with BASH_SOURCE / FUNCNAME / LINENO for call-graph data.

declare -A _PROF_ENTER   # function → entry timestamp
declare -A _PROF_TOTAL   # function → cumulative ms
declare -A _PROF_CALLS   # function → call count

_prof_debug() {
  local fn="${FUNCNAME[1]:-main}"
  local now="$EPOCHREALTIME"
  # Record entry on first call in this function frame
  [[ -z "${_PROF_ENTER[$fn]:-}" ]] && _PROF_ENTER["$fn"]="$now"
}

_prof_return() {
  local fn="${FUNCNAME[1]:-main}"
  local now="$EPOCHREALTIME"
  local entry="${_PROF_ENTER[$fn]:-}"
  [[ -z "$entry" ]] && return
  local ms
  ms=$(awk "BEGIN{printf \"%d\", ($now-$entry)*1000}")
  (( _PROF_TOTAL["$fn"]+=ms )) || true
  (( _PROF_CALLS["$fn"]++   )) || true
  unset '_PROF_ENTER[$fn]'
}

prof_start() {
  trap '_prof_debug' DEBUG
  trap '_prof_return' RETURN
}

prof_report() {
  printf '\n%-30s %8s %8s %10s\n' "Function" "Calls" "Total ms" "Avg ms"
  printf '%s\n' "$(printf '%.0s-' {1..60})"
  for fn in "${!_PROF_TOTAL[@]}"; do
    printf '%-30s %8d %8d %10.2f\n' \
      "$fn" \
      "${_PROF_CALLS[$fn]}" \
      "${_PROF_TOTAL[$fn]}" \
      "$(( _PROF_TOTAL["$fn"] * 100 / _PROF_CALLS["$fn"] ))e-2"
  done | sort -k3 -rn
}

7 — Hot-Path Anti-Patterns and Fixes

Anti-pattern 1: Forking inside a loop

# Bad: one fork per iteration
for f in "${files[@]}"; do
  ext=$(echo "$f" | sed 's/.*\.//')   # fork + exec echo + fork + exec sed
  base=$(basename "$f")                  # fork + exec basename
done

# Good: parameter expansion — zero forks
for f in "${files[@]}"; do
  ext="${f##*.}"
  base="${f##*/}"
done

Anti-pattern 2: Useless use of cat

# Bad: cat is just a conduit
cat file.txt | grep "pattern"
cat file.txt | wc -l

# Good: redirect directly
grep "pattern" file.txt
wc -l < file.txt

Anti-pattern 3: Redundant subshells in conditionals

# Bad: grep returns an exit code — no need to capture output
if [[ $(grep -c "pattern" file) -gt 0 ]]; then echo found; fi

# Good: test the exit code directly
if grep -q "pattern" file; then echo found; fi

# Bad: wc in arithmetic
if (( $(wc -l < file) > 100 )); then echo big; fi

# Good: mapfile + array length if you need the data anyway,
# or awk for a single-pass threshold check (no array allocation)
awk 'END { exit (NR > 100 ? 0 : 1) }' file && echo big

Anti-pattern 4: Concatenating strings with `$(...)`

# Bad: three forks to build a path string
logfile="$(dirname "$script")/logs/$(basename "$script" .sh)_$(date +%Y%m%d).log"

# Good: parameter expansion + built-in printf strftime
dir="${script%/*}"
name="${script##*/}"; name="${name%.sh}"
printf -v stamp '%(%Y%m%d)T' -1
logfile="${dir}/logs/${name}_${stamp}.log"

Anti-pattern 5: Piping to while read

# Bad: the while loop runs in a subshell — variables set inside are lost
grep "ERROR" app.log | while IFS= read -r line; do
  (( errors++ ))   # invisible to parent
done
echo "errors: $errors"   # always 0

# Good: redirect with process substitution — loop runs in current shell
while IFS= read -r line; do
  (( errors++ )) || true
done < <(grep "ERROR" app.log)
echo "errors: $errors"   # correct

# Even better: let awk count in one pass — no while loop at all
errors=$(grep -c "ERROR" app.log)

8 — When to Stop Using Bash

Recognising the boundary where Bash becomes the wrong tool is as important as knowing how to optimise it. These signals suggest it is time to reach for Python, Go, or another language:

Signal	Reason to switch
Processing more than ~100k lines of text per run	awk/Python will be 10–100× faster with less code
Nested data structures (JSON, YAML, XML)	Bash has no native JSON parser; `jq` is a fork per call
More than ~500 lines of logic	No type system, no unit-testable modules, hard to maintain
Floating-point arithmetic throughout	Bash has no floats; every float operation needs awk or bc
HTTP client / API integration	Python requests or Go net/http are far more ergonomic
Stateful concurrency (shared queues, worker pools)	No threads; file-based synchronisation is brittle at scale
Security-sensitive string parsing	Too easy to introduce injection; use a real parser

The hybrid pattern: the sweet spot is often a thin Bash wrapper that orchestrates the environment (PATH, config files, working directory) and hands off the heavy lifting to a Python or Go binary. The shell excels at wiring; compiled languages excel at computing.

Exercises

Exercise 1 — Fork audit

The following function has at least six unnecessary forks. Identify each one and rewrite the function to eliminate all of them using only Bash built-ins. Then benchmark both versions with bench 200 original_func and bench 200 fast_func and report the speedup ratio.

original_func() {
  local filepath="$1"
  local dir=$(dirname "$filepath")
  local base=$(basename "$filepath")
  local ext=$(echo "$base" | sed 's/.*\.//')
  local stem=$(basename "$base" ".$ext")
  local upper=$(echo "$stem" | tr 'a-z' 'A-Z')
  local ts=$(date '+%Y%m%d_%H%M%S')
  echo "${dir}/${upper}_${ts}.${ext}"
}

# Forks in original_func:
# 1. $(dirname)   → ${filepath%/*}
# 2. $(basename)  → ${filepath##*/}
# 3. $(echo | sed) to get ext → ${base##*.}
# 4. $(basename base .ext) to get stem → ${base%.*}
# 5. $(echo | tr) to uppercase → ${stem^^}
# 6. $(date) → printf -v ts '%(%Y%m%d_%H%M%S)T' -1

fast_func() {
  local filepath="$1"
  local dir="${filepath%/*}"
  local base="${filepath##*/}"
  local ext="${base##*.}"
  local stem="${base%.*}"
  local upper="${stem^^}"
  local ts; printf -v ts '%(%Y%m%d_%H%M%S)T' -1
  printf '%s/%s_%s.%s\n' "$dir" "$upper" "$ts" "$ext"
}

# Benchmark helper from section 1
bench() {
  local n="$1"; shift
  local i
  TIMEFORMAT='%R'
  local total
  total=$( { time { for (( i=0; i<n; i++ )); do
    "$@" >/dev/null 2&1
  done; }; } 2>&1 )
  awk "BEGIN { printf \"%-20s %d runs: %.3fs (%.4f avg)\n\", \"$1\", $n, $total, $total/$n }"
}

bench 200 original_func "/var/log/app/server.log"
bench 200 fast_func     "/var/log/app/server.log"
# Typical output:
# original_func        200 runs: 2.461s (0.0123 avg)
# fast_func            200 runs: 0.003s (0.0000 avg)  ~800× faster

Exercise 2 — PS4 trace analyser

Write a script profile_me.sh that:

Opens FD 19 to /tmp/trace_$$.log
Sets BASH_XTRACEFD=19 and a timestamped PS4
Enables set -x then runs a deliberately slow function (at least 5 different operations including a date call, a grep, and a wc)
Disables tracing and closes FD 19
Parses the trace log with awk to produce a table of the top 10 slowest lines (line content + elapsed time in ms since previous traced line)

#!/usr/bin/env bash
set -uo pipefail

TRACE_LOG="/tmp/trace_$$.log"
trap 'rm -f "$TRACE_LOG"' EXIT

# ---- Slow function under test ----
slow_work() {
  local ts
  ts=$(date '+%Y-%m-%d %H:%M:%S')                   # fork 1
  local count
  count=$(grep -c 'root' /etc/passwd)                # fork 2
  local lines
  lines=$(wc -l < /etc/passwd)                        # fork 3
  local upper
  upper=$(echo "$ts" | tr 'a-z' 'A-Z')                 # fork 4+5
  local host
  host=$(hostname)                                       # fork 6
  printf '%s | root lines: %d/%d | host: %s\n' \
    "$upper" "$count" "$lines" "$host"
}

# ---- Enable tracing to FD 19 ----
exec 19>"$TRACE_LOG"
BASH_XTRACEFD=19
PS4='+ $EPOCHREALTIME ${BASH_SOURCE##*/}:${LINENO}: '
set -x

slow_work

set +x
exec 19>&-
unset BASH_XTRACEFD

# ---- Analyse trace log ----
printf '\nTop 10 slowest lines:\n'
awk '
/^\+/ {
  # Fields: + TIMESTAMP SOURCE:LINE: rest...
  ts   = $2
  rest = ""
  for (i = 3; i <= NF; i++) rest = rest (i==3 ? "" : " ") $i

  if (prev_ts != "") {
    elapsed_ms = (ts - prev_ts) * 1000
    printf "%8.2f ms  %s\n", elapsed_ms, prev_rest
  }
  prev_ts   = ts
  prev_rest = rest
}
' "$TRACE_LOG" | sort -rn | head -10

Exercise 3 — File-reading speed comparison

Write a script that reads a moderately large file (generate one with seq 1 50000 > /tmp/test.txt) using four methods and times each:

while IFS= read -r line; do ... done < file — count lines with a bash counter
mapfile -t arr < file — then ${#arr[@]}
wc -l < file
awk 'END{print NR}' file

Print a formatted comparison table showing elapsed time and lines-per-second for each method. All four must produce the same count.

#!/usr/bin/env bash
set -euo pipefail

FILE=/tmp/bench_read_$$.txt
seq 1 50000 > "$FILE"
trap 'rm -f "$FILE"' EXIT

TIMEFORMAT='%R'
declare -a labels times counts

# Method 1: while read
c1=0
t1=$( { time {
  while IFS= read -r _; do (( c1++ )) || true; done < "$FILE"
}; } 2>&1 )

# Method 2: mapfile
declare -a arr
t2=$( { time { mapfile -t arr < "$FILE"; }; } 2>&1 )
c2="${#arr[@]}"
unset arr

# Method 3: wc -l
t3=$( { time { c3=$(wc -l < "$FILE"); }; } 2>&1 )

# Method 4: awk
t4=$( { time { c4=$(awk 'END{print NR}' "$FILE"); }; } 2>&1 )

# Verify all agree
for c in "$c1" "$c2" "$c3" "$c4"; do
  [[ "$c" == "$c1" ]] || { echo "Count mismatch!" >&2; exit 1; }
done

# Print table
printf '\n%-25s %10s %8s %15s\n' "Method" "Lines" "Time(s)" "Lines/sec"
printf '%s\n' "$(printf '%.0s-' {1..62})"
while IFS='|' read -r label t c; do
  awk "BEGIN {
    t = $t; c = $c
    lps = (t > 0) ? int(c/t) : 999999
    printf \"%-25s %10d %8.4f %15d\n\", \"$label\", c, t, lps
  }"
done <<< $"while read|$t1|$c1
mapfile|$t2|$c2
wc -l|$t3|$c3
awk NR|$t4|$c4"

Exercise 4 — Hot-path rewrite

The script below processes a log file and is painfully slow on large inputs. Identify every performance problem, then rewrite it as a single awk program that produces identical output. Verify correctness on a 10,000-line synthetic log, then benchmark both versions.

#!/usr/bin/env bash
# slow_stats.sh — count ERROR/WARN/INFO lines and show top 3 error messages
errors=0; warns=0; infos=0
declare -A msg_count
while IFS= read -r line; do
  level=$(echo "$line" | cut -d' ' -f2)
  msg=$(echo "$line" | cut -d' ' -f3-)
  case "$level" in
    ERROR) (( errors++ )); (( msg_count["$msg"]++ )) || true ;;
    WARN)  (( warns++  )) ;;
    INFO)  (( infos++  )) ;;
  esac
done < "$1"
printf 'ERROR: %d  WARN: %d  INFO: %d\n' "$errors" "$warns" "$infos"
printf 'Top 3 error messages:\n'
for msg in "${!msg_count[@]}"; do
  printf '%d %s\n' "${msg_count[$msg]}" "$msg"
done | sort -rn | head -3

# Problems in slow_stats.sh:
# 1. while IFS= read loop — slow for large files (syscall per character)
# 2. Two $(echo | cut) forks per line — 2N forks total
# 3. The while|pipe anti-pattern: if it were cmd | while, vars would be lost
#    (here it's a redirect, so that's not the issue — but the forks are)
# Solution: one awk pass does everything — zero extra forks

# fast_stats.awk (inline here as a bash here-doc)
fast_stats() {
  awk '
  {
    level = $2
    msg   = substr($0, index($0, $3))
    if      (level == "ERROR") { errors++; msg_count[msg]++ }
    else if (level == "WARN")  { warns++  }
    else if (level == "INFO")  { infos++  }
  }
  END {
    printf "ERROR: %d  WARN: %d  INFO: %d\n", errors, warns, infos
    print "Top 3 error messages:"
    # Sort msg_count by value descending — build array of "count msg" lines
    n = 0
    for (m in msg_count) lines[n++] = sprintf("%06d %s", msg_count[m], m)
    # Simple insertion sort on lines[]
    for (i = 1; i < n; i++) {
      key = lines[i]; j = i - 1
      while (j >= 0 && lines[j] < key) { lines[j+1] = lines[j]; j-- }
      lines[j+1] = key
    }
    for (i = n-1; i >= n-3 && i >= 0; i--) {
      cnt = lines[i]+0; sub(/^[0-9]+ /, "", lines[i])
      printf "%d %s\n", cnt, lines[i]
    }
  }
  ' "$1"
}

# Generate a synthetic 10k-line log for testing
gen_log() {
  local levels=( ERROR WARN INFO INFO INFO )
  local msgs=( "disk full" "timeout" "connection refused" "auth failed" "ok" )
  for (( i=0; i<10000; i++ )); do
    printf '2024-01-01 %s %s\n' \
      "${levels[RANDOM%5]}" "${msgs[RANDOM%5]}"
  done
}

LOGFILE=/tmp/test_log_$$.txt
trap 'rm -f "$LOGFILE"' EXIT
gen_log > "$LOGFILE"

# Verify identical output
diff <(bash slow_stats.sh "$LOGFILE") <(fast_stats "$LOGFILE") \
  && echo "Outputs match" || echo "MISMATCH"

# Benchmark
TIMEFORMAT='%R'
printf 'slow_stats: '; time { bash slow_stats.sh "$LOGFILE" >/dev/null; }
printf 'fast_stats: '; time { fast_stats "$LOGFILE" >/dev/null; }