Chapter 5 — Concurrency and Parallelism

Bash has no threads, but it has processes, and processes are cheap enough on Linux that genuine parallel execution is well within reach of a shell script. This chapter covers the full spectrum: basic background jobs, precise waiting semantics, bounded job pools, race-condition avoidance, and GNU parallel for when you need industrial-strength throughput.

1 — Background Jobs and `wait`

The basics

# & sends a command to the background; $! is its PID
sleep 2 &
PID1=$!
sleep 3 &
PID2=$!

# wait PID — block until that specific process exits, return its exit status
wait $PID1; echo "sleep 2 exited: $?"
wait $PID2; echo "sleep 3 exited: $?"

# wait (no args) — block until ALL background jobs finish
for host in server{1..8}; do
  ping -c1 -W1 "$host" &>/dev/null &
done
wait
echo "All pings done"

`wait -n` — first to finish (Bash 4.3+)

# wait -n returns as soon as any one background job exits
# The exit status is that of the completed job
declare -a pids
for url in "${urls[@]}"; do
  curl -sfo "/tmp/dl_$$_${#pids[@]}" "$url" &
  pids+=( $! )
done

# Harvest results as they complete
for (( i=0; i<${#pids[@]}; i++ )); do
  wait -n
  echo "A job finished with status $?"
done

# wait -n -p VAR (Bash 5.1+): also stores the PID of the completed job
wait -n -p finished_pid
echo "PID $finished_pid just completed"
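
The semantics above can be checked with a small self-contained sketch. It assumes Bash 4.3 or later; the two subshells, their delays, and their exit codes are made up purely for illustration.

```shell
#!/usr/bin/env bash
# Two illustrative jobs: the slower one exits 3, the faster one exits 7.
( sleep 0.6; exit 3 ) &
slow_pid=$!
( sleep 0.1; exit 7 ) &
fast_pid=$!

# wait -n returns as soon as the first job exits and reports that job's status.
wait -n
first_status=$?

# Waiting on the remaining PID then yields the slower job's status.
wait "$slow_pid"
second_status=$?

printf 'first finished with %d, then %d\n' "$first_status" "$second_status"
```

Because the faster job exits first, `wait -n` reports its status even though it was started second.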

Capturing exit statuses from background jobs

# Pattern: store PIDs, then wait for each and record exit codes
declare -A job_pids   # job_name → PID
declare -A job_exit   # job_name → exit code

run_job() {
  local name="$1"; shift
  "$@" &
  job_pids["$name"]=$!
}

harvest_jobs() {
  local name pid
  for name in "${!job_pids[@]}"; do
    pid="${job_pids[$name]}"
    wait "$pid"
    job_exit["$name"]=$?
  done
}

run_job backup    rsync -a /data/  /backup/
run_job compress  gzip -k /tmp/report.csv
run_job notify    curl -s "$WEBHOOK" -d '{"text":"starting"}'
harvest_jobs

for name in "${!job_exit[@]}"; do
  printf '%s: exit %d\n' "$name" "${job_exit[$name]}"
done
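
Once the statuses are harvested, a script usually wants a single verdict. The following is a minimal sketch of that step, not a definitive implementation: the job names and the commands `true`, `false`, and `sh -c 'exit 3'` are placeholders standing in for real work.

```shell
#!/usr/bin/env bash
declare -A job_pids=() job_exit=()

run_job() {
  local name="$1"; shift
  "$@" &
  job_pids["$name"]=$!
}

# Placeholder jobs with known exit codes 0, 1 and 3.
run_job good  true
run_job bad   false
run_job worse sh -c 'exit 3'

failed=0
for name in "${!job_pids[@]}"; do
  # wait PID returns that job's exit status, even if it finished long ago.
  if wait "${job_pids[$name]}"; then
    job_exit["$name"]=0
  else
    job_exit["$name"]=$?   # in the else branch, $? is still wait's status
    (( ++failed ))
  fi
done

printf '%d of %d jobs failed\n' "$failed" "${#job_pids[@]}"
```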

2 — Bounded Parallelism: Job Pools

Firing all jobs at once can saturate CPU, memory, and I/O. A bounded pool keeps at most N jobs running at a time; a new one starts as soon as an old one finishes.

Simple semaphore with `wait -n`

parallel_run() {
  # parallel_run MAX_JOBS CMD [ARGS...]
  # Reads newline-delimited work items from stdin and runs CMD ITEM for each,
  # keeping at most MAX_JOBS running simultaneously.
  local max="$1"; shift
  local running=0
  local item
  while IFS= read -r item; do
    "$@" "$item" &
    (( running++ ))
    if (( running >= max )); then
      wait -n    # block until any one job finishes
      (( running-- ))
    fi
  done
  wait   # drain remaining jobs
}

process_file() {
  # Example worker: compress a file
  gzip -k "$1" && printf 'compressed %s\n' "$1"
}

find /data -name '*.log' | parallel_run 4 process_file
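
One way to convince yourself that the pool really is bounded is to measure it. The sketch below repeats the pool so that it runs on its own and adds a hypothetical `probe_worker` that drops a marker file while it is alive and records how many markers it can see; `probe_dir` and the worker are illustration-only names, and Bash 4.3+ is assumed for `wait -n`.

```shell
#!/usr/bin/env bash
# The pool from above, repeated so this snippet is self-contained.
parallel_run() {
  local max="$1"; shift
  local running=0 item
  while IFS= read -r item; do
    "$@" "$item" &
    (( ++running ))
    if (( running >= max )); then
      wait -n
      (( running-- ))
    fi
  done
  wait
}

probe_dir=$(mktemp -d)   # hypothetical scratch directory for the probe

probe_worker() {
  # Mark this worker as live, record how many markers exist, then clean up.
  touch "$probe_dir/live.$1"
  local live=( "$probe_dir"/live.* )   # always matches at least our own marker
  printf '%d\n' "${#live[@]}" >> "$probe_dir/samples"
  sleep 0.2
  rm -f "$probe_dir/live.$1"
}

seq 1 10 | parallel_run 3 probe_worker

peak=$(sort -n "$probe_dir/samples" | tail -n1)
samples=$(wc -l < "$probe_dir/samples")
rm -rf "$probe_dir"
printf 'peak concurrency observed: %d (limit 3)\n' "$peak"
```

A marker exists only while its worker is alive, so the largest sample can never exceed the number of workers that were actually running.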

FD-based semaphore (works on Bash 4.1 and 4.2, which lack `wait -n`)

# A counting semaphore built on a FIFO held open on one file descriptor.
# Each "token" is a byte in the FIFO; a worker reads one to acquire,
# writes one back when done.

sem_init() {
  # sem_init N SEMAPHORE_FD_VAR
  local n="$1" fifo fd i
  fifo=$(mktemp -u)
  mkfifo "$fifo"
  exec {fd}<>"$fifo"          # open for read+write so the FIFO stays open
  rm "$fifo"                  # the open descriptor keeps the pipe alive
  printf -v "$2" '%d' "$fd"   # hand the FD number back (no nameref, so 4.1 suffices)
  # Pre-fill with N tokens (one byte each)
  for (( i=0; i<n; i++ )); do
    printf 'x' >&"$fd"
  done
}

sem_acquire() { IFS= read -r -n1 _ <&"$1"; }   # blocks until a token is available
sem_release() { printf 'x' >&"$1"; }            # returns a token

# Usage
SEM_FD=
sem_init 4 SEM_FD

for item in "${items[@]}"; do
  sem_acquire "$SEM_FD"   # blocks if 4 jobs already running
  {
    process_file "$item"
    sem_release "$SEM_FD"
  } &
done
wait
exec {SEM_FD}>&-

3 — Collecting Output Safely

Background jobs write to the same stdout as the parent, so their output can interleave. The cleanest fix is to give each job its own output file and print the files in a fixed order once all jobs have finished.

# Anti-pattern: interleaved output
for host in a b c d; do
  { echo "=== $host ==="; ssh "$host" 'uptime'; } &    # lines from different hosts intermix
done
wait

# Better: each job writes to its own temp file
declare -A tmpfiles
for host in a b c d; do
  tmpfiles["$host"]=$(mktemp)
  { echo "=== $host ==="; ssh "$host" 'uptime'; } > "${tmpfiles[$host]}" &
done
wait

# Print results in submission order
for host in a b c d; do
  cat "${tmpfiles[$host]}"
  rm -f "${tmpfiles[$host]}"
done

# Best: mktemp in a trap-guarded temp dir so cleanup is guaranteed
# (lowercase name: TMPDIR itself is read by mktemp and many other tools)
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

for host in a b c d; do
  { ssh "$host" 'uptime'; } > "${tmpdir}/${host}" &
done
wait
for host in a b c d; do
  printf '=== %s ===\n' "$host"
  cat "${tmpdir}/${host}"
done

4 — Race Conditions and Shared Resources

Background jobs inherit copies of the parent's environment and its open file descriptors, but they run in separate processes with separate address spaces. The classic races are concurrent writes to a shared file and read-modify-write on a shared counter.

Atomic appends with `flock`

# flock -x LOCKFILE CMD — exclusive lock around CMD
LOG=/tmp/parallel.log

safe_log() {
  # Append atomically — flock serialises concurrent writers
  flock -x "${LOG}.lock" \
    printf '[%s] %s\n' "$(date +%T)" "$*" >> "$LOG"
}

for i in {1..20}; do
  { safe_log "job $i started"; sleep 0.1; safe_log "job $i done"; } &
done
wait

Shared counters via a locked temp file

# Shared mutable state between processes requires a file + lock.
# In-memory variables are NOT shared — each child has its own copy.

COUNTER_FILE=$(mktemp)
echo 0 > "$COUNTER_FILE"

counter_inc() {
  (
    flock -x 9
    local n
    n=$(< "$COUNTER_FILE")
    echo $(( n + 1 )) > "$COUNTER_FILE"
  ) 9>"${COUNTER_FILE}.lock"
}

counter_get() { cat "$COUNTER_FILE"; }   # a bare `< file` prints nothing outside $(...)

for i in {1..50}; do
  { counter_inc; } &
done
wait
echo "Final count: $(counter_get)"   # 50 — not a random number
rm -f "$COUNTER_FILE" "${COUNTER_FILE}.lock"

The subshell variable trap

# Variables set inside & subshells are invisible to the parent
result=""
{ result="hello"; } &
wait
echo "'$result'"   # '' — the assignment happened in a child process

# Solution: pass the value back through a temp file, a pipe, or a FIFO
tmpf=$(mktemp)
{ echo "hello" > "$tmpf"; } &
wait
result=$(< "$tmpf")
echo "'$result'"   # 'hello'
rm "$tmpf"
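
A pipe works just as well as a temp file and needs no cleanup. The sketch below assumes Bash 4.1+ for the `{var}` redirection syntax; `slow_sum` is a hypothetical worker invented for this example.

```shell
#!/usr/bin/env bash
# slow_sum is a hypothetical worker: it pauses, then prints the sum of its arguments.
slow_sum() {
  local total=0 n
  sleep 0.2
  for n in "$@"; do (( total += n )); done
  printf '%d\n' "$total"
}

# Process substitution starts slow_sum in a child process right away;
# {sum_fd} asks Bash to pick a free descriptor for the read end of the pipe.
exec {sum_fd}< <(slow_sum 1 2 3)

parent_work="parent kept running"   # the parent is free to do other things here

# read blocks until the child has written its line; the value now lives in the parent.
IFS= read -r -u "$sum_fd" result
exec {sum_fd}<&-

echo "$parent_work; child computed $result"
```

The child still cannot assign to the parent's variables; it only writes bytes, and the parent decides which variable receives them.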

5 — Signal Propagation to Children

When your script receives SIGINT or SIGTERM, its background jobs do not die automatically. Ctrl-C sends SIGINT to the whole foreground process group, but a non-interactive Bash starts `&` jobs with SIGINT ignored, and a SIGTERM sent with `kill` reaches only the script's own PID. A trap in the parent runs only in the parent, so the handler has to signal the children itself. Be explicit.

declare -a CHILD_PIDS

cleanup() {
  printf '\nInterrupted — killing children\n' >&2
  # Kill all children in our tracking array
  local pid
  for pid in "${CHILD_PIDS[@]}"; do
    kill "$pid" 2>/dev/null
  done
  wait
  exit 130
}
trap cleanup INT TERM

for item in "${items[@]}"; do
  slow_process "$item" &
  CHILD_PIDS+=( $! )
done
wait

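# Alternative: kill the entire process group
cleanup_group() {
  # kill -- -$$ signals every process in the group whose ID is $$, which
  # assumes the script is the group leader (true when it is launched from
  # an interactive shell). Ignore TERM first: the script is in that group too.
  trap '' TERM
  kill -- -$$ 2>/dev/null
  exit 130
}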
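
The SIGTERM case can be observed from the outside. In the sketch below an inner `bash -c` script stands in for "your script"; it starts one background `sleep`, traps TERM, and kills that job before exiting with 143 (128 + 15, the conventional status for SIGTERM). The hand-off file `pid_file` and the inner script are assumptions made for the demonstration.

```shell
#!/usr/bin/env bash
pid_file=$(mktemp)   # hypothetical hand-off file for the inner job's PID

# Inner script: start a job, install the trap, publish the job's PID, wait.
bash -c '
  sleep 30 &
  child=$!
  trap "kill $child 2>/dev/null; wait; exit 143" TERM
  echo "$child" > "$1"
  wait
' _ "$pid_file" &
script_pid=$!

# The PID is published only after the trap is installed, so this is race-free.
until [[ -s $pid_file ]]; do sleep 0.05; done
child_pid=$(<"$pid_file")

kill -TERM "$script_pid"
wait "$script_pid"
script_status=$?

if kill -0 "$child_pid" 2>/dev/null; then child_alive=yes; else child_alive=no; fi
rm -f "$pid_file"
printf 'script exited %d, child alive: %s\n' "$script_status" "$child_alive"
```

Without the trap line, the inner script would die from SIGTERM and the `sleep` would keep running as an orphan.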

6 — GNU Parallel

GNU parallel is the right tool when you need configurable parallelism, progress bars, retry logic, argument templating, or cross-host execution. It is not a Bash built-in but is available on all major distributions and is worth knowing.

Basic usage

# Run one job per CPU core (default)
parallel gzip ::: *.log

# Explicit job count
parallel -j8 gzip ::: *.log

# Read arguments from stdin
find /data -name '*.csv' | parallel -j4 process_csv {}

# {} is the input item; {.} strips extension; {/} basename; {//} dirname
find /data -name '*.flac' | \
  parallel ffmpeg -i {} -q:a 2 '{.}.mp3'

# Multiple input sources
parallel echo '{1} x {2}' ::: a b c ::: 1 2
# a x 1   a x 2   b x 1   b x 2   c x 1   c x 2

Advanced options

# --bar: progress bar  --eta: time estimate  --joblog: machine-readable log
parallel --bar --joblog /tmp/jobs.log -j4 process_file ::: "${files[@]}"

# --retries N: retry failed jobs up to N times
# (a command that uses a replacement string such as {/} must also name {} itself)
parallel --retries 3 -j4 curl -sfo '{/}' '{}' ::: "${urls[@]}"

# --keep-order: print results in submission order (not completion order)
parallel --keep-order -j8 sha256sum ::: "${files[@]}"

# --delay S: stagger job starts by S seconds (rate limiting)
parallel --delay 0.5 -j4 curl -s ::: "${api_calls[@]}"

# --halt now,fail=1: stop all jobs as soon as one fails
parallel --halt now,fail=1 -j8 validate_file ::: "${files[@]}"

# Run the same command once on each remote host via SSH
parallel --nonall --sshlogin server1,server2,server3 uptime

# Export a shell function for parallel to use
my_func() { echo "Processing: $1"; }
export -f my_func
parallel -j4 my_func ::: "${items[@]}"

Reading the joblog

# The joblog TSV format:
# Seq  Host  Starttime  JobRuntime  Send  Receive  Exitval  Signal  Command
awk -F'\t' 'NR>1 && $7 != 0 { print "FAILED:", $NF }' /tmp/jobs.log
# Re-run only the failed command lines (with no command template,
# parallel executes each input line as a command)
awk -F'\t' 'NR>1 && $7 != 0 { print $NF }' /tmp/jobs.log |
  parallel --retries 5 -j2
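
To try the filter without running any jobs, you can feed it a hand-written log. The sample below is made up: it only mirrors the column layout listed above and is not output from a real run. Checking the Signal column in addition to Exitval is an extra precaution added in this sketch.

```shell
#!/usr/bin/env bash
# Made-up joblog with the same nine tab-separated columns; real logs come from --joblog.
joblog=$(mktemp)
{
  printf 'Seq\tHost\tStarttime\tJobRuntime\tSend\tReceive\tExitval\tSignal\tCommand\n'
  printf '1\t:\t1700000000.000\t0.120\t0\t0\t0\t0\tgzip a.log\n'
  printf '2\t:\t1700000000.010\t0.095\t0\t0\t1\t0\tgzip b.log\n'
  printf '3\t:\t1700000000.020\t0.300\t0\t0\t0\t15\tgzip c.log\n'
} > "$joblog"

# Treat a job as failed if it exited non-zero OR was terminated by a signal.
failed=$(awk -F'\t' 'NR > 1 && ($7 != 0 || $8 != 0) { print $9 }' "$joblog")
rm -f "$joblog"

printf '%s\n' "$failed"
```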

7 — Patterns: Fan-out / Fan-in

A common parallel pipeline: fan-out splits a work queue among N workers; fan-in collects results into a single ordered stream. Both stages can be implemented in pure Bash.

#!/usr/bin/env bash
# Fan-out/fan-in: process files in parallel, collect results in order
set -euo pipefail

JOBS=${JOBS:-$(( $(nproc) * 2 ))}
WORKDIR=$(mktemp -d)
trap 'rm -rf "$WORKDIR"' EXIT

# --- Fan-out: submit all jobs, write output to numbered temp files ---
seq_num=0
declare -a pids=() seqfiles=()   # start as empty arrays

while IFS= read -r -d '' file; do
  outfile="${WORKDIR}/${seq_num}.out"
  seqfiles+=( "$outfile" )

  # Bounded: wait if at capacity
  if (( ${#pids[@]} >= JOBS )); then
    wait -n || true   # a failed worker must not trip set -e here
    # Remove any finished PIDs from the array
    declare -a live=()
    for p in "${pids[@]}"; do
      kill -0 "$p" 2>/dev/null && live+=( "$p" )
    done
    pids=( "${live[@]}" )
  fi

  # Dispatch worker
  { sha256sum "$file"; } > "$outfile" &
  pids+=( $! )
  (( ++seq_num ))   # pre-increment: (( seq_num++ )) returns status 1 at 0 and trips set -e
done < <(find /usr/lib -name '*.so' -print0)

wait   # drain remaining

# --- Fan-in: concatenate results in submission order ---
for outfile in "${seqfiles[@]}"; do
  cat "$outfile"
done

8 — `xargs -P`: Simple Parallelism Without Bash Loops

# xargs -P N runs up to N processes simultaneously
# Combined with -n1 or -I{} (one item per process), this is a simple job pool

find /data -name '*.gz' -print0 |
  xargs -0 -P8 -I{} gunzip -k {}

# With a shell function: export the function, then call bash -c
process_item() {
  convert "$1" -resize '800x600>' "${1%.png}_thumb.png"
}
export -f process_item

find /photos -name '*.png' -print0 |
  xargs -0 -P$(nproc) -I{} bash -c 'process_item "$@"' _ {}

# xargs -P exit-code caveat: GNU xargs exits 123 if any invocation exited
# with status 1-125, but it does not say which items failed.
# Workaround: have each failing job append its item to a shared file
# (validate_csv must be exported with export -f, as above)
FAIL_FILE=$(mktemp)
find /data -name '*.csv' -print0 |
  xargs -0 -P4 -I{} bash -c 'validate_csv "$1" || printf "%s\n" "$1" >> "$2"' _ {} "$FAIL_FILE"
if [[ -s "$FAIL_FILE" ]]; then
  echo "Some files failed validation:" >&2
  cat "$FAIL_FILE" >&2
  rm -f "$FAIL_FILE"
  exit 1
fi
rm -f "$FAIL_FILE"
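
The failure-file idea can be exercised without any real data. In this sketch `check_item` is a hypothetical validator that rejects names starting with "bad", and the item names are invented; it assumes an xargs that supports `-0`, `-P`, and `-I` (GNU findutils does).

```shell
#!/usr/bin/env bash
fail_log=$(mktemp)

# check_item is a hypothetical validator: names starting with "bad" fail.
check_item() { [[ $1 != bad* ]]; }
export -f check_item

# Each failing worker appends its own item as one short line.
printf '%s\0' ok1 bad1 ok2 bad2 ok3 |
  xargs -0 -P2 -I{} bash -c 'check_item "$1" || printf "%s\n" "$1" >> "$2"' _ {} "$fail_log"

# Completion order is not deterministic, so sort before reporting.
failed_items=$(sort "$fail_log")
rm -f "$fail_log"

if [[ -n $failed_items ]]; then
  printf 'failed:\n%s\n' "$failed_items"
fi
```

Recording the item itself, rather than just touching a flag file, gives you a non-empty file to test and a list to retry.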

Technique	Bash version	Best for	Limitation
`&` + `wait`	Any	Fire-and-forget, known small set	No built-in rate limiting
`wait -n` pool	4.3+	Bounded concurrency, large input	PID of the finished job needs `-p` (5.1+)
FIFO semaphore	4.1+	Bounded concurrency on 4.x	Slightly more setup code
`xargs -P`	Any	Simple one-liner parallelism	No per-job exit code tracking
GNU parallel	Any (external)	Production pipelines, SSH, retries	External dependency

Exercises

Exercise 1 — Parallel URL checker

Write a script check_urls.sh that reads URLs from a file (one per line) and checks each with curl -sIo /dev/null -w '%{http_code}'. Run at most 8 checks concurrently. Print a summary line per URL: 200 OK https://... or 404 FAIL https://.... Output must appear in the original URL order regardless of completion order. Non-200 codes should also be written to failed_urls.txt.

#!/usr/bin/env bash
set -euo pipefail

URL_FILE="${1:?Usage: $0 URL_FILE}"
MAX_JOBS=8
WORKDIR=$(mktemp -d)
trap 'rm -rf "$WORKDIR"' EXIT

declare -a urls pids
mapfile -t urls < "$URL_FILE"

running=0
for i in "${!urls[@]}"; do
  url="${urls[$i]}"
  outf="${WORKDIR}/${i}"
  {
    code=$(curl -sIo /dev/null -w '%{http_code}' --max-time 10 "$url" 2>/dev/null) || true   # curl itself prints 000 when it gets no response
    printf '%s %s\n' "$code" "$url" > "$outf"
  } &
  pids+=( $! )
  (( ++running ))
  if (( running >= MAX_JOBS )); then
    wait -n
    (( running-- ))
  fi
done
wait

# Fan-in: print in order, collect failures
: > failed_urls.txt   # start empty so reruns do not accumulate stale entries
for i in "${!urls[@]}"; do
  line=$(< "${WORKDIR}/${i}")
  code="${line%% *}"
  if [[ $code == "200" ]]; then
    printf '%s OK   %s\n' "$code" "${urls[$i]}"
  else
    printf '%s FAIL %s\n' "$code" "${urls[$i]}"
    printf '%s\n' "${urls[$i]}" >> failed_urls.txt
  fi
done

Exercise 2 — Locked shared counter

Write a script that launches 100 background subshells, each calling a counter_inc function that increments a shared counter stored in a file. Use flock to prevent lost updates. After all jobs complete, assert the final counter value is exactly 100 and print PASS or FAIL. Then repeat the test without the lock and show that the unprotected version produces a value less than 100 (a classic race).

#!/usr/bin/env bash
set -uo pipefail

run_test() {
  local use_lock="$1"
  local cfile
  cfile=$(mktemp)
  local lfile="${cfile}.lock"
  echo 0 > "$cfile"

  inc_locked() {
    (
      flock -x 9
      n=$(<"$1")
      printf '%d\n' $(( n + 1 )) > "$1"
    ) 9>"$2"
  }

  inc_unlocked() {
    n=$(<"$1")
    printf '%d\n' $(( n + 1 )) > "$1"
  }

  for _ in {1..100}; do
    if [[ $use_lock == "yes" ]]; then
      inc_locked "$cfile" "$lfile" &
    else
      inc_unlocked "$cfile" &
    fi
  done
  wait

  local final
  final=$(<"$cfile")
  rm -f "$cfile" "$lfile"

  if (( final == 100 )); then
    printf '[lock=%s] final=%d PASS\n' "$use_lock" "$final"
  else
    printf '[lock=%s] final=%d FAIL (race condition demonstrated)\n' \
      "$use_lock" "$final"
  fi
}

run_test yes   # should always print PASS
run_test no    # will usually print FAIL (race)

Exercise 3 — Bounded image converter

Write convert_all.sh SRCDIR DSTDIR that converts every .png in SRCDIR to an 800px-wide JPEG in DSTDIR using ImageMagick's convert command. Constraints:

Run at most $(nproc) conversions concurrently
Trap SIGINT/SIGTERM and kill all running children before exiting
Print a progress line [N/TOTAL] converting FILE before each job starts
At the end print how many succeeded and how many failed (non-zero exit)

#!/usr/bin/env bash
set -uo pipefail

SRCDIR="${1:?Usage: $0 SRCDIR DSTDIR}"
DSTDIR="${2:?Usage: $0 SRCDIR DSTDIR}"
MAX=$(nproc)
WORKDIR=$(mktemp -d)
mkdir -p "$DSTDIR"

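declare -a CHILD_PIDS=()

cleanup() {
  local p
  for p in "${CHILD_PIDS[@]}"; do
    # $p is the { ... } subshell; convert runs as its child, so signal both
    # (pkill comes from procps)
    pkill -TERM -P "$p" 2>/dev/null
    kill "$p" 2>/dev/null
  done
  wait
  rm -rf "$WORKDIR"
  exit 130
}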
trap 'rm -rf "$WORKDIR"' EXIT
trap cleanup INT TERM

mapfile -t -d '' files < <(find "$SRCDIR" -maxdepth 1 -name '*.png' -print0)
total="${#files[@]}"
done_count=0
ok=0; fail=0
running=0

for i in "${!files[@]}"; do
  src="${files[$i]}"
  base=$(basename "$src" .png)
  dst="${DSTDIR}/${base}.jpg"
  statusf="${WORKDIR}/${i}"

  (( done_count++ ))
  printf '[%d/%d] converting %s\n' "$done_count" "$total" "$base"

  {
    if convert "$src" -resize '800x>' "$dst" 2>/dev/null; then
      echo ok > "$statusf"
    else
      echo fail > "$statusf"
    fi
  } &
  CHILD_PIDS+=( $! )
  (( ++running ))
  if (( running >= MAX )); then
    wait -n; (( running-- ))
  fi
done
wait

for i in "${!files[@]}"; do
  status=$(< "${WORKDIR}/${i}")
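  # if/else, not A && B || C: (( ok++ )) returns status 1 when ok is 0,
  # which would make the || branch count the same file as a failure too
  if [[ $status == ok ]]; then
    ok=$(( ok + 1 ))
  else
    fail=$(( fail + 1 ))
  fi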
done
printf '\nDone: %d succeeded, %d failed\n' "$ok" "$fail"
(( fail == 0 )) || exit 1

Exercise 4 — GNU parallel deep dive

Using GNU parallel, write a one-liner (or short pipeline) that:

Finds all .log files under /var/log
Counts the number of lines containing ERROR in each file (using grep -c)
Runs up to 6 jobs concurrently
Keeps output in the original file order
Writes a joblog to /tmp/grep_jobs.log
Retries failed jobs up to 2 times

Then write a second pipeline that reads the joblog and prints the filenames of any jobs that ultimately failed (non-zero exit after retries).

# Part 1: parallel grep with all requirements
count_errors() {
  # grep exits 1 when nothing matches; only 2 and above is a real error
  grep -cH "ERROR" "$1" || [ $? -eq 1 ]
}
export -f count_errors

find /var/log -name '*.log' -print0 |
  parallel \
    --null \
    --keep-order \
    -j6 \
    --joblog /tmp/grep_jobs.log \
    --retries 2 \
    count_errors {}

# Explanation of each flag:
#   --null        : input items are NUL-delimited (matches find -print0)
#   --keep-order  : output in submission order, not completion order
#   -j6           : at most 6 concurrent jobs
#   --joblog      : write TSV job log to this file
#   --retries 2   : retry up to 2 times on non-zero exit
#   -H            : prefix each count with its filename (FILE:COUNT)
#   || [ $? -eq 1 ] : "no matches" (exit 1) is a count of 0, not a failure;
#                   unreadable files (exit 2) still fail and reach the joblog

# Part 2: report ultimately-failed jobs from the joblog
# Joblog columns (tab-separated):
#   Seq Host Starttime JobRuntime Send Receive Exitval Signal Command
awk -F'\t' '
  NR == 1 { next }        # skip header
  $7 != 0 {               # Exitval column
    cmd = $NF
    # Extract filename: everything after the last space in the command
    n = split(cmd, parts, " ")
    print parts[n]
  }
' /tmp/grep_jobs.log

# The same report as an awk+cut+sed pipeline, for quick inspection:
awk -F'\t' 'NR>1 && $7!=0' /tmp/grep_jobs.log |
  cut -f9 |
  sed 's/.*[[:space:]]//'