Chapter 5 — Concurrency and Parallelism
Bash has no threads, but it has processes, and processes are cheap enough on Linux that genuine parallel execution is well within reach of a shell script. This chapter covers the full spectrum: basic background jobs, precise waiting semantics, bounded job pools, race-condition avoidance, and GNU parallel for when you need industrial-strength throughput.
1 — Background Jobs and wait
The basics

    # & sends a command to the background; $! is its PID
    sleep 2 & PID1=$!
    sleep 3 & PID2=$!

    # wait PID: block until that specific process exits, return its exit status
    wait "$PID1"; echo "sleep 2 exited: $?"
    wait "$PID2"; echo "sleep 3 exited: $?"

    # wait (no args): block until ALL background jobs finish
    for host in server{1..8}; do
        ping -c1 -W1 "$host" &>/dev/null &
    done
    wait
    echo "All pings done"
wait -n: first to finish (Bash 4.3+)

    # wait -n returns as soon as any one background job exits.
    # Its exit status is that of the job that completed.
    # (urls is assumed to be an array of URLs defined earlier.)
    declare -a pids=()
    for url in "${urls[@]}"; do
        curl -sfo "/tmp/dl_$$_${#pids[@]}" "$url" &
        pids+=( $! )
    done

    # Harvest results as they complete
    for (( i = 0; i < ${#pids[@]}; i++ )); do
        wait -n
        echo "A job finished with status $?"
    done

    # wait -n -p VAR (Bash 5.1+) also stores the PID of the completed job.
    # Use it in place of the plain wait -n inside the loop above; once every
    # job has been reaped there is nothing left to wait for and it returns 127.
    #     wait -n -p finished_pid
    #     echo "PID $finished_pid just completed with status $?"
Capturing exit statuses from background jobs

    # Pattern: store PIDs, then wait for each and record exit codes
    declare -A job_pids   # job_name → PID
    declare -A job_exit   # job_name → exit code

    run_job() {
        local name="$1"; shift
        "$@" &
        job_pids["$name"]=$!
    }

    harvest_jobs() {
        local name pid
        for name in "${!job_pids[@]}"; do
            pid="${job_pids[$name]}"
            wait "$pid"
            job_exit["$name"]=$?
        done
    }

    run_job backup   rsync -a /data/ /backup/
    run_job compress gzip -k /tmp/report.csv
    run_job notify   curl -s "$WEBHOOK" -d '{"text":"starting"}'

    harvest_jobs

    for name in "${!job_exit[@]}"; do
        printf '%s: exit %d\n' "$name" "${job_exit[$name]}"
    done
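The same pattern can be tried without touching real data. The following is a minimal sketch, assuming harmless `sh -c` stand-ins for the real commands; the job names `fast`, `slow`, and `broken` and the helper `start_job` are hypothetical. It wraps `wait` in an `if` so that a failing job would not abort a script running under `set -e`.

```bash
#!/usr/bin/env bash
# Sketch: record each background job's exit status by name.
# Job names and the sh -c stand-ins are hypothetical placeholders.

declare -A job_pids=()   # job name -> PID
declare -A job_exit=()   # job name -> exit status

start_job() {
    local name="$1"; shift
    "$@" &
    job_pids["$name"]=$!
}

start_job fast   sh -c 'exit 0'
start_job slow   sh -c 'sleep 0.2; exit 0'
start_job broken sh -c 'sleep 0.1; exit 3'

# wait PID returns that job's status even if the job finished earlier,
# because Bash remembers the status of its own children.
for name in "${!job_pids[@]}"; do
    if wait "${job_pids[$name]}"; then
        job_exit["$name"]=0
    else
        job_exit["$name"]=$?   # status of the wait in the if condition
    fi
done

failed=()
for name in "${!job_exit[@]}"; do
    [[ ${job_exit[$name]} -eq 0 ]] || failed+=( "$name" )
done
printf 'failed jobs: %s\n' "${failed[*]:-none}"
```

Because every status is stored under the job's name, the final report does not depend on the order in which the jobs happened to finish.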
2 — Bounded Parallelism: Job Pools
Firing all jobs at once can saturate CPU and I/O. A bounded pool keeps at most N jobs running at a time; new ones start as old ones finish.
Simple semaphore with wait -n

    parallel_run() {
        # parallel_run MAX_JOBS CMD [ARGS...]
        # Reads newline-delimited work items from stdin and runs CMD ITEM for each,
        # keeping at most MAX_JOBS running simultaneously.
        local max="$1"; shift
        local running=0
        local item
        while IFS= read -r item; do
            "$@" "$item" &
            (( running++ ))
            if (( running >= max )); then
                wait -n          # block until any one job finishes
                (( running-- ))
            fi
        done
        wait    # drain remaining jobs
    }

    process_file() {
        # Example worker: compress a file
        gzip -k "$1" && printf 'compressed %s\n' "$1"
    }

    find /data -name '*.log' | parallel_run 4 process_file
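One way to convince yourself that such a pool really respects its limit is to have each worker report how many workers are alive at that moment. The sketch below assumes Bash 4.3+ for `wait -n`; `run_pool`, `probe_worker`, and the marker directory are hypothetical names introduced only for this check.

```bash
#!/usr/bin/env bash
# Sketch: verify that a wait -n pool never exceeds its limit.
# run_pool, probe_worker and the marker directory are hypothetical names.

scratch=$(mktemp -d)
mkdir "$scratch/active"

run_pool() {
    # run_pool MAX CMD [ARGS...]: run CMD ITEM for each stdin line,
    # with at most MAX jobs alive at once (needs wait -n, Bash 4.3+).
    local max="$1"; shift
    local running=0 item
    while IFS= read -r item; do
        "$@" "$item" &
        running=$(( running + 1 ))
        if (( running >= max )); then
            wait -n
            running=$(( running - 1 ))
        fi
    done
    wait
}

probe_worker() {
    # Drop a marker while running and log how many markers exist right now.
    local id="$1"
    : > "$scratch/active/$id"
    find "$scratch/active" -type f | wc -l >> "$scratch/observed"
    sleep 0.1
    rm -f "$scratch/active/$id"
}

printf '%s\n' 1 2 3 4 5 6 7 8 | run_pool 3 probe_worker

jobs_seen=$(( $(wc -l < "$scratch/observed") ))
peak=$(( $(sort -n "$scratch/observed" | tail -n 1) ))
echo "jobs: $jobs_seen, peak concurrency: $peak"
rm -rf "$scratch"
```

Each worker removes its marker before it exits, and the pool only launches a replacement after `wait -n` has returned, so the observed peak should stay at or below the limit of 3.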
FD-based semaphore (works in Bash 4.x without wait -n)

    # A counting semaphore using a FIFO and FD slots.
    # Each "token" is a byte in the FIFO; a worker reads one to acquire,
    # writes one back when done.
    sem_init() {
        # sem_init N SEMAPHORE_FD_VAR
        local n="$1" fifo i __sem_fd
        fifo=$(mktemp -u)
        mkfifo "$fifo"
        exec {__sem_fd}<>"$fifo"   # open for read+write so the FIFO stays open
        rm "$fifo"
        # Pre-fill with N tokens (one byte each)
        for (( i = 0; i < n; i++ )); do
            printf 'x' >&"$__sem_fd"
        done
        # Hand the FD number back through the named variable. printf -v is used
        # instead of a nameref (local -n), which would require Bash 4.3.
        printf -v "$2" '%d' "$__sem_fd"
    }

    sem_acquire() { IFS= read -r -n1 _ <&"$1"; }   # blocks until a token is available
    sem_release() { printf 'x' >&"$1"; }           # returns a token

    # Usage
    SEM_FD=
    sem_init 4 SEM_FD
    for item in "${items[@]}"; do
        sem_acquire "$SEM_FD"    # blocks if 4 jobs already running
        {
            process_file "$item"
            sem_release "$SEM_FD"
        } &
    done
    wait
    exec {SEM_FD}>&-
3 — Collecting Output Safely
Background jobs write to the same stdout as the parent, so their output can interleave. The cleanest fix is to give each job its own output file, ideally inside a temporary directory that a trap removes, and to print the files once all jobs have finished.

    # Anti-pattern: interleaved output
    for host in a b c d; do
        { echo "=== $host ==="; ssh "$host" 'uptime'; } &   # lines from different hosts intermix
    done
    wait

    # Better: each job writes to its own temp file
    declare -A tmpfiles
    for host in a b c d; do
        tmpfiles["$host"]=$(mktemp)
        { echo "=== $host ==="; ssh "$host" 'uptime'; } > "${tmpfiles[$host]}" &
    done
    wait
    # Print results in submission order
    for host in a b c d; do
        cat "${tmpfiles[$host]}"
        rm -f "${tmpfiles[$host]}"
    done

    # Best: mktemp in a trap-guarded tmpdir so cleanup is guaranteed.
    # The variable is called outdir because assigning to TMPDIR would also
    # change where every later mktemp call creates its files.
    outdir=$(mktemp -d)
    trap 'rm -rf "$outdir"' EXIT
    for host in a b c d; do
        { ssh "$host" 'uptime'; } > "${outdir}/${host}" &
    done
    wait
    for host in a b c d; do
        printf '=== %s ===\n' "$host"
        cat "${outdir}/${host}"
    done
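The per-job file approach can be exercised locally without SSH. This is a minimal sketch in which `fake_probe`, the host names, and the delays are hypothetical stand-ins chosen so that the jobs finish in the reverse of their submission order.

```bash
#!/usr/bin/env bash
# Sketch: jobs finish in reverse order, yet output is printed in
# submission order. fake_probe is a hypothetical stand-in for ssh + uptime.

outdir=$(mktemp -d)

fake_probe() {
    local host="$1" delay="$2"
    sleep "$delay"
    printf '=== %s ===\n' "$host"
    printf 'load ok on %s\n' "$host"
}

hosts=( alpha beta gamma )
delays=( 0.3 0.2 0.1 )     # later hosts finish sooner

for i in "${!hosts[@]}"; do
    fake_probe "${hosts[$i]}" "${delays[$i]}" > "$outdir/$i.out" &
done
wait

# Fan-in by index, which restores submission order
report=$(for i in "${!hosts[@]}"; do cat "$outdir/$i.out"; done)
printf '%s\n' "$report"
rm -rf "$outdir"
```

Naming the files by index rather than by host keeps the fan-in loop independent of what the items look like, which helps when items contain slashes or spaces.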
4 — Race Conditions and Shared Resources
Background jobs share the parent's open file descriptors and environment but run in separate processes with separate address spaces. The classic race conditions are: concurrent writes to a shared file and read-modify-write on a counter.
Atomic appends with flock

    # flock -x LOCKFILE CMD: exclusive lock around CMD
    LOG=/tmp/parallel.log

    safe_log() {
        # Append under an exclusive lock; flock serialises concurrent writers
        flock -x "${LOG}.lock" \
            printf '[%s] %s\n' "$(date +%T)" "$*" >> "$LOG"
    }

    for i in {1..20}; do
        { safe_log "job $i started"; sleep 0.1; safe_log "job $i done"; } &
    done
    wait
Shared counters via a locked temp file

    # Shared mutable state between processes requires a file + lock.
    # In-memory variables are NOT shared: each child has its own copy.
    COUNTER_FILE=$(mktemp)
    echo 0 > "$COUNTER_FILE"

    counter_inc() {
        (
            flock -x 9
            local n
            n=$(< "$COUNTER_FILE")
            echo $(( n + 1 )) > "$COUNTER_FILE"
        ) 9>"${COUNTER_FILE}.lock"
    }

    # cat is required here: a bare "< file" command prints nothing; the
    # $(< file) shortcut only works directly inside a command substitution.
    counter_get() { cat "$COUNTER_FILE"; }

    for i in {1..50}; do
        { counter_inc; } &
    done
    wait

    echo "Final count: $(counter_get)"   # 50, not a random number
    rm -f "$COUNTER_FILE" "${COUNTER_FILE}.lock"
The subshell variable trap

    # Variables set inside & subshells are invisible to the parent
    result=""
    { result="hello"; } &
    wait
    echo "'$result'"    # '' (the assignment happened in a child process)

    # Solution: communicate via a temp file or a pipe
    tmpf=$(mktemp)
    { echo "hello" > "$tmpf"; } &
    wait
    result=$(< "$tmpf")
    echo "'$result'"    # 'hello'
    rm "$tmpf"
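A pipe works as well as a temp file when each job produces a small result. The following sketch assumes a hypothetical worker, `square_worker`, and reads the results back through process substitution so that the `while` loop runs in the parent shell and its variables survive.

```bash
#!/usr/bin/env bash
# Sketch: collect results from background jobs through a pipe.
# square_worker is a hypothetical worker function.

square_worker() {
    # One short line per result; writes smaller than PIPE_BUF are atomic,
    # so lines from different jobs do not get mixed together.
    printf '%d %d\n' "$1" $(( $1 * $1 ))
}

declare -A squares=()
while read -r n sq; do
    squares["$n"]=$sq
done < <(
    for n in 1 2 3 4 5; do
        square_worker "$n" &
    done
    wait
)

total=0
for n in "${!squares[@]}"; do
    total=$(( total + ${squares[$n]} ))
done
echo "sum of squares: $total"
```

Each line carries its own key, so the parent can rebuild the mapping regardless of the order in which the jobs wrote to the pipe.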
5 — Signal Propagation to Children
When your script receives SIGINT or SIGTERM, its background jobs do not automatically die. Ctrl-C delivers SIGINT to the whole foreground process group, but a non-interactive Bash starts & jobs with SIGINT ignored, and a SIGTERM sent with kill reaches only the script's own PID. A trap in the parent handles the parent's signal only; it forwards nothing to the children. Be explicit.

    declare -a CHILD_PIDS=()

    cleanup() {
        printf '\nInterrupted, killing children\n' >&2
        # Kill all children in our tracking array
        local pid
        for pid in "${CHILD_PIDS[@]}"; do
            kill "$pid" 2>/dev/null
        done
        wait
        exit 130
    }
    trap cleanup INT TERM

    for item in "${items[@]}"; do
        slow_process "$item" &
        CHILD_PIDS+=( $! )
    done
    wait

    # Alternative: kill the entire process group
    cleanup_group() {
        # The group signal also reaches this script, so ignore it here
        # to avoid re-entering the handler
        trap '' INT TERM
        # kill -- -$$ sends SIGTERM to every process in process group $$;
        # that is this script's group only if the script is the group leader
        kill -- -$$ 2>/dev/null
        exit 130
    }
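The effect of such a handler can be checked from the outside. In this sketch a hypothetical `supervisor.sh` is written to a temp directory, started in the background, and sent SIGTERM once it has published its children's PIDs; the file names and the polling loop are assumptions made for the demonstration.

```bash
#!/usr/bin/env bash
# Sketch: confirm that a TERM trap in a parent script kills its children.
# supervisor.sh and the pid file are hypothetical names.

demo=$(mktemp -d)

cat > "$demo/supervisor.sh" <<'EOF'
#!/usr/bin/env bash
pidfile="$1"
child_pids=()
cleanup() {
    kill "${child_pids[@]}" 2>/dev/null
    wait
    exit 130
}
trap cleanup INT TERM
for _ in 1 2 3; do
    sleep 30 &
    child_pids+=( $! )
done
# Publish the PIDs atomically so the observer never reads a partial file
printf '%s\n' "${child_pids[@]}" > "$pidfile.tmp"
mv "$pidfile.tmp" "$pidfile"
wait
EOF

bash "$demo/supervisor.sh" "$demo/pids" &
supervisor=$!

# Poll (up to about 5 s) until the supervisor has published its children
for _ in $(seq 50); do
    [[ -f "$demo/pids" ]] && break
    sleep 0.1
done
mapfile -t children < "$demo/pids"

kill -TERM "$supervisor"
status=0
wait "$supervisor" || status=$?

survivors=0
for pid in "${children[@]}"; do
    kill -0 "$pid" 2>/dev/null && survivors=$(( survivors + 1 ))
done
echo "supervisor exit: $status, surviving children: $survivors"
rm -rf "$demo"
```

The supervisor waits for its children inside the handler before exiting, so by the time the outer `wait` returns there should be nothing left for `kill -0` to find.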
6 — GNU Parallel
GNU parallel is the right tool when you need configurable parallelism, progress bars, retry logic, argument templating, or cross-host execution. It is not a Bash built-in but is available on all major distributions and is worth knowing.
Basic usage

    # Run one job per CPU core (default)
    parallel gzip ::: *.log

    # Explicit job count
    parallel -j8 gzip ::: *.log

    # Read arguments from stdin
    find /data -name '*.csv' | parallel -j4 process_csv {}

    # {} is the input item; {.} strips extension; {/} basename; {//} dirname
    find /data -name '*.flac' | \
        parallel ffmpeg -i {} -q:a 2 '{.}.mp3'

    # Multiple input sources
    parallel echo '{1} x {2}' ::: a b c ::: 1 2
    # a x 1   a x 2   b x 1   b x 2   c x 1   c x 2
    # (one combination per line; add --keep-order for a fixed order)
Advanced options

    # --bar: progress bar   --eta: time estimate   --joblog: machine-readable log
    parallel --bar --joblog /tmp/jobs.log -j4 process_file ::: "${files[@]}"

    # --retries N: retry failed jobs up to N times
    # ({/} names the output file after the URL's basename; {} is the URL itself)
    parallel --retries 3 -j4 curl -sfo '{/}' '{}' ::: "${urls[@]}"

    # --keep-order: print results in submission order (not completion order)
    parallel --keep-order -j8 sha256sum ::: "${files[@]}"

    # --delay S: stagger job starts by S seconds (rate limiting)
    parallel --delay 0.5 -j4 curl -s ::: "${api_calls[@]}"

    # --halt now,fail=1: stop all jobs as soon as one fails
    parallel --halt now,fail=1 -j8 validate_file ::: "${files[@]}"

    # Run one command on each remote host via SSH
    # (--nonall runs the command once per host and takes no input arguments)
    parallel --nonall --sshlogin server1,server2,server3 uptime

    # Export a shell function for parallel to use
    my_func() { echo "Processing: $1"; }
    export -f my_func
    parallel -j4 my_func ::: "${items[@]}"
Reading the joblog

    # The joblog TSV format:
    #   Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
    awk -F'\t' 'NR>1 && $7 != 0 { print "FAILED:", $NF }' /tmp/jobs.log

    # List only failed jobs for re-submission.
    # With no command given, parallel runs each input line as a command;
    # passing {} instead would quote the whole line into a single word.
    awk -F'\t' 'NR>1 && $7 != 0 { print $NF }' /tmp/jobs.log |
        parallel --retries 5 -j2
7 — Patterns: Fan-out / Fan-in
A common parallel pipeline: fan-out splits a work queue among N workers; fan-in collects results into a single ordered stream. Both stages can be implemented in pure Bash.

    #!/usr/bin/env bash
    # Fan-out/fan-in: process files in parallel, collect results in order
    set -euo pipefail

    JOBS=${JOBS:-$(( $(nproc) * 2 ))}
    WORKDIR=$(mktemp -d)
    trap 'rm -rf "$WORKDIR"' EXIT

    # --- Fan-out: submit all jobs, write output to numbered temp files ---
    seq_num=0
    declare -a pids=() seqfiles=()

    while IFS= read -r -d '' file; do
        outfile="${WORKDIR}/${seq_num}.out"
        seqfiles+=( "$outfile" )

        # Bounded: wait if at capacity
        if (( ${#pids[@]} >= JOBS )); then
            wait -n || true   # a failed worker must not abort the script under set -e
            # Remove any finished PIDs from the array
            live=()
            for p in "${pids[@]}"; do
                kill -0 "$p" 2>/dev/null && live+=( "$p" )
            done
            pids=( "${live[@]}" )
        fi

        # Dispatch worker
        { sha256sum "$file"; } > "$outfile" &
        pids+=( $! )
        # Not (( seq_num++ )): that returns status 1 when seq_num is 0,
        # which would terminate the script under set -e
        seq_num=$(( seq_num + 1 ))
    done < <(find /usr/lib -name '*.so' -print0)

    wait    # drain remaining

    # --- Fan-in: concatenate results in submission order ---
    for outfile in "${seqfiles[@]}"; do
        cat "$outfile"
    done
8 — xargs -P: Simple Parallelism Without Bash Loops

    # xargs -P N runs up to N processes simultaneously.
    # Combined with -I{} (one input item per process), this is a simple job pool.
    find /data -name '*.gz' -print0 |
        xargs -0 -P8 -I{} gunzip -k {}

    # With a shell function: export the function, then call bash -c
    process_item() {
        convert "$1" -resize '800x600>' "${1%.png}_thumb.png"
    }
    export -f process_item

    find /photos -name '*.png' -print0 |
        xargs -0 -P"$(nproc)" -I{} bash -c 'process_item "$@"' _ {}

    # xargs -P exit-code caveat: the status is only an aggregate. GNU xargs exits
    # 123 if any invocation exited with status 1-125, without saying which one.
    # Workaround: have each failing job append its item to a shared file.
    # (validate_csv must be exported with export -f, like process_item above.)
    FAIL_FILE=$(mktemp)
    find /data -name '*.csv' -print0 |
        xargs -0 -P4 -I{} bash -c 'validate_csv "$1" || printf "%s\n" "$1" >> "$2"' _ {} "$FAIL_FILE"

    # -s tests for a non-empty file, so failures must write something;
    # touch alone would leave the file empty
    if [[ -s "$FAIL_FILE" ]]; then
        echo "Some files failed validation" >&2
        rm -f "$FAIL_FILE"
        exit 1
    fi
    rm -f "$FAIL_FILE"
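The difference between the aggregate exit status and per-item tracking is easy to see on a toy input. This sketch assumes a hypothetical validator, `check_item`, that rejects any item whose name starts with `bad`; the item names are made up for the demonstration.

```bash
#!/usr/bin/env bash
# Sketch: xargs -P reports failure only in aggregate, so failing items
# are recorded in a file. check_item and its "bad" rule are hypothetical.

check_item() {
    # Pretend validation: items starting with "bad" fail
    [[ $1 != bad* ]]
}
export -f check_item

fail_log=$(mktemp)

# Plain run: the exit status says that something failed, not what
xargs_status=0
printf '%s\0' good1 bad1 good2 bad2 |
    xargs -0 -P2 -I{} bash -c 'check_item "$1"' _ {} || xargs_status=$?

# Tracked run: each failing job appends its own item to fail_log
printf '%s\0' good1 bad1 good2 bad2 |
    xargs -0 -P2 -I{} bash -c 'check_item "$1" || printf "%s\n" "$1" >> "$2"' _ {} "$fail_log"

failed_items=$(sort "$fail_log")
echo "aggregate xargs status: $xargs_status"
printf 'failed items:\n%s\n' "$failed_items"
rm -f "$fail_log"
```

The list is sorted before use because parallel jobs append in completion order, which can differ from run to run.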
| Technique | Bash version | Best for | Limitation |
|---|---|---|---|
| & + wait | Any | Fire-and-forget, known small set | No built-in rate limiting |
| wait -n pool | 4.3+ | Bounded concurrency, large input | Cannot tell which PID finished without wait -p (5.1+) |
| FIFO semaphore | 4.1+ | Bounded concurrency on 4.x | Slightly more setup code |
| xargs -P | Any | Simple one-liner parallelism | No per-job exit code tracking |
| GNU parallel | Any (external) | Production pipelines, SSH, retries | External dependency |
Exercises
Exercise 1 — Parallel URL checker
Write a script check_urls.sh that reads URLs from a file (one per line)
and checks each with curl -sIo /dev/null -w '%{http_code}'.
Run at most 8 checks concurrently. Print a summary line per URL:
200 OK https://... or 404 FAIL https://....
Output must appear in the original URL order regardless of completion order.
Non-200 codes should also be written to failed_urls.txt.

    #!/usr/bin/env bash
    set -euo pipefail

    URL_FILE="${1:?Usage: $0 URL_FILE}"
    MAX_JOBS=8
    WORKDIR=$(mktemp -d)
    trap 'rm -rf "$WORKDIR"' EXIT

    declare -a urls
    mapfile -t urls < "$URL_FILE"
    : > failed_urls.txt    # start each run with an empty failure list

    running=0
    for i in "${!urls[@]}"; do
        url="${urls[$i]}"
        outf="${WORKDIR}/${i}"
        {
            # On a connection failure curl itself prints 000 for %{http_code};
            # || true only keeps its non-zero exit from tripping set -e
            # (|| echo "000" would produce 000000)
            code=$(curl -sIo /dev/null -w '%{http_code}' --max-time 10 "$url" 2>/dev/null || true)
            printf '%s %s\n' "${code:-000}" "$url" > "$outf"
        } &
        (( ++running ))
        if (( running >= MAX_JOBS )); then
            wait -n
            (( running-- ))
        fi
    done
    wait

    # Fan-in: print in order, collect failures
    for i in "${!urls[@]}"; do
        line=$(< "${WORKDIR}/${i}")
        code="${line%% *}"
        if [[ $code == "200" ]]; then
            printf '%s OK %s\n' "$code" "${urls[$i]}"
        else
            printf '%s FAIL %s\n' "$code" "${urls[$i]}"
            printf '%s\n' "${urls[$i]}" >> failed_urls.txt
        fi
    done
Exercise 2 — Locked shared counter
Write a script that launches 100 background subshells, each calling a
counter_inc function that increments a shared counter stored in a file.
Use flock to prevent lost updates. After all jobs complete, assert the
final counter value is exactly 100 and print PASS or FAIL. Then repeat the test
without the lock and show that the unprotected version produces a value
less than 100 (a classic race).

    #!/usr/bin/env bash
    set -uo pipefail

    run_test() {
        local use_lock="$1"
        local cfile
        cfile=$(mktemp)
        local lfile="${cfile}.lock"
        echo 0 > "$cfile"

        inc_locked() {
            (
                flock -x 9
                n=$(<"$1")
                printf '%d\n' $(( n + 1 )) > "$1"
            ) 9>"$2"
        }

        inc_unlocked() {
            n=$(<"$1")
            printf '%d\n' $(( n + 1 )) > "$1"
        }

        for _ in {1..100}; do
            if [[ $use_lock == "yes" ]]; then
                inc_locked "$cfile" "$lfile" &
            else
                inc_unlocked "$cfile" &
            fi
        done
        wait

        local final
        final=$(<"$cfile")
        rm -f "$cfile" "$lfile"

        if (( final == 100 )); then
            printf '[lock=%s] final=%d PASS\n' "$use_lock" "$final"
        else
            printf '[lock=%s] final=%d FAIL (race condition demonstrated)\n' \
                "$use_lock" "$final"
        fi
    }

    run_test yes   # should always print PASS
    run_test no    # will usually print FAIL (race)
Exercise 3 — Bounded image converter
Write convert_all.sh SRCDIR DSTDIR that converts every
.png in SRCDIR to an 800px-wide JPEG in DSTDIR using ImageMagick's
convert command. Constraints:
- Run at most $(nproc) conversions concurrently
- Trap SIGINT/SIGTERM and kill all running children before exiting
- Print a progress line [N/TOTAL] converting FILE before each job starts
- At the end print how many succeeded and how many failed (non-zero exit)

    #!/usr/bin/env bash
    set -uo pipefail

    SRCDIR="${1:?Usage: $0 SRCDIR DSTDIR}"
    DSTDIR="${2:?Usage: $0 SRCDIR DSTDIR}"
    MAX=$(nproc)
    WORKDIR=$(mktemp -d)
    mkdir -p "$DSTDIR"

    declare -a CHILD_PIDS=()

    cleanup() {
        local p
        for p in "${CHILD_PIDS[@]}"; do kill "$p" 2>/dev/null; done
        wait
        exit 130    # the EXIT trap below removes WORKDIR
    }
    trap 'rm -rf "$WORKDIR"' EXIT
    trap cleanup INT TERM

    mapfile -t -d '' files < <(find "$SRCDIR" -maxdepth 1 -name '*.png' -print0)
    total="${#files[@]}"
    done_count=0
    ok=0; fail=0
    running=0

    for i in "${!files[@]}"; do
        src="${files[$i]}"
        base=$(basename "$src" .png)
        dst="${DSTDIR}/${base}.jpg"
        statusf="${WORKDIR}/${i}"
        (( done_count++ ))
        printf '[%d/%d] converting %s\n' "$done_count" "$total" "$base"
        {
            if convert "$src" -resize '800x>' "$dst" 2>/dev/null; then
                echo ok > "$statusf"
            else
                echo fail > "$statusf"
            fi
        } &
        CHILD_PIDS+=( $! )
        (( ++running ))
        if (( running >= MAX )); then
            wait -n; (( running-- ))
        fi
    done
    wait

    # Use if/else here: [[ ... ]] && (( ok++ )) || (( fail++ )) would also bump
    # fail on the first success, because (( ok++ )) returns status 1 when ok is 0
    for i in "${!files[@]}"; do
        status=$(< "${WORKDIR}/${i}")
        if [[ $status == ok ]]; then
            (( ++ok ))
        else
            (( ++fail ))
        fi
    done

    printf '\nDone: %d succeeded, %d failed\n' "$ok" "$fail"
    (( fail == 0 )) || exit 1
Exercise 4 — GNU parallel deep dive
Using GNU parallel, write a one-liner (or short pipeline) that:
- Finds all .log files under /var/log
- Counts the number of lines containing ERROR in each file (using grep -c)
- Runs up to 6 jobs concurrently
- Keeps output in the original file order
- Writes a joblog to /tmp/grep_jobs.log
- Retries failed jobs up to 2 times
Then write a second pipeline that reads the joblog and prints the filenames of any jobs that ultimately failed (non-zero exit after retries).

    # Part 1: parallel grep with all requirements
    find /var/log -name '*.log' -print0 |
        parallel \
            --null \
            --keep-order \
            -j6 \
            --joblog /tmp/grep_jobs.log \
            --retries 2 \
            'grep -Hc ERROR {} || [ $? -eq 1 ]'

    # Explanation of each flag:
    #   --null          : input items are NUL-delimited (matches find -print0)
    #   --keep-order    : output in submission order, not completion order
    #   -j6             : at most 6 concurrent jobs
    #   --joblog        : write TSV job log to this file
    #   --retries 2     : retry up to 2 times on non-zero exit
    #   grep -H         : prefix each count with its file name
    #   || [ $? -eq 1 ] : grep exits 1 when there are no matches; treat that as
    #                     success but keep real errors (exit 2) as failures.
    #                     A blanket || true would hide them from --retries
    #                     and from the joblog.

    # Part 2: report ultimately-failed jobs from the joblog
    # Joblog columns (tab-separated):
    #   Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
    awk -F'\t' '
        NR == 1 { next }              # skip header
        $7 != 0 {                     # Exitval column
            cmd = $9
            # Strip the fixed command text around the file name
            sub(/^grep -Hc ERROR /, "", cmd)
            sub(/ \|\| \[ .*$/, "", cmd)
            print cmd
        }
    ' /tmp/grep_jobs.log

    # Quick inspection of the raw failed commands with awk + cut:
    awk -F'\t' 'NR>1 && $7!=0' /tmp/grep_jobs.log | cut -f9