Job Control and Signals

📡 Intermediate Topic 3 — Job Control and Signals

Every process in Linux lives inside a web of signals — the operating system's notification mechanism for events ranging from "the user pressed Ctrl+C" to "your parent process died" to "please flush your buffers and exit gracefully." Most scripts ignore this entirely and work fine for simple tasks. But the moment your script spawns background jobs, holds open files, manages a lock, or does anything that matters, you need to understand signals and job control: how to catch them, forward them to child processes, clean up properly, and implement graceful shutdown. This chapter covers the full picture.

1 — What Are Signals?

A signal is an asynchronous notification sent to a process by the kernel, another process, or the process itself. When a signal arrives, the process's current execution is interrupted and a signal handler runs — either a default kernel action or a custom function you register with trap. Signals have numbers, but are almost always referred to by name.
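
The flow described above can be sketched in a few lines. This is a minimal illustration, assuming bash 4 or later (for `BASHPID`); the handler name `on_usr1` and the variable `got_signal` are made up for the example. The script registers a handler and then signals itself:

```shell
#!/usr/bin/env bash
# Register a handler for SIGUSR1, then have the shell signal itself.
got_signal=0
on_usr1() { got_signal=1; }     # hypothetical handler name
trap on_usr1 USR1

kill -USR1 "$BASHPID"           # BASHPID is the PID of the current bash process
# The handler runs once the kill builtin returns, before the next command.
echo "handler ran: $got_signal"
trap - USR1                     # restore the default action
```

Without the `trap` line, the default action for SIGUSR1 (terminate) would end the script at the `kill`.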

Signal delivery model:

  Sender                         Kernel                             Target process
  ──────                         ──────                             ──────────────
  kill -TERM 1234      ──────▶  marks signal pending     ──────▶  interrupted
  Ctrl+C in terminal   ──────▶  SIGINT to foreground group         handler runs
  terminal closed      ──────▶  SIGHUP to the session's shell      (or default action)

Default actions for unhandled signals:

  Terminate        kill the process                  (SIGTERM, SIGHUP, SIGPIPE…)
  Terminate+core   kill and write a core dump        (SIGSEGV, SIGABRT, SIGQUIT…)
  Stop             pause the process (resumable)     (SIGTSTP, SIGSTOP)
  Continue         resume a stopped process          (SIGCONT)
  Ignore           silently discard                  (SIGCHLD, SIGWINCH)

Signal reference table

Signal	Number	Default action	Common cause / meaning
SIGHUP	1	Terminate	Terminal closed; also used to ask daemons to reload config
SIGINT	2	Terminate	Ctrl+C — interactive interrupt
SIGQUIT	3	Core dump	Ctrl+\ — quit and dump core
SIGKILL	9	Terminate	Unconditional kill — cannot be caught, blocked, or ignored
SIGUSR1	10	Terminate	Application-defined; use it for custom notifications
SIGUSR2	12	Terminate	Application-defined; second user signal
SIGPIPE	13	Terminate	Write to a pipe with no reader (broken pipe)
SIGALRM	14	Terminate	Timer set with `alarm()` expired
SIGTERM	15	Terminate	Polite termination request — the default signal sent by `kill`
SIGCHLD	17	Ignore	Child process stopped or terminated
SIGCONT	18	Continue	Resume a stopped process
SIGSTOP	19	Stop	Pause — cannot be caught or ignored
SIGTSTP	20	Stop	Ctrl+Z — terminal stop (can be caught, unlike SIGSTOP)
SIGWINCH	28	Ignore	Terminal window resized

🐧 Listing signals on your system

kill -l                    # list all signal names
kill -l TERM              # get number for SIGTERM  → 15
kill -l 9                 # get name for signal 9   → KILL
trap -l                   # same list, printed by the trap builtin (pseudo-signals such as EXIT are not shown)
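
A related use of `kill -l` is decoding an exit status. The sketch below assumes bash's `kill` builtin, which also accepts the exit status of a signal-killed process (128 + signal number); the variable names are illustrative:

```shell
#!/usr/bin/env bash
# A child killed by a signal reports exit status 128 + signal number.
status=0
bash -c 'kill -TERM $$' || status=$?

# kill -l also accepts such an exit status and prints the signal name.
sig_name=$(kill -l "$status")
echo "status=$status signal=$sig_name"
```

This is handy in log messages, where "killed by TERM" is easier to read than a bare 143.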

2 — `trap` In Depth

trap registers a handler — a shell command or function — to run when a specific signal or event is received. It is the primary mechanism for cleanup, graceful shutdown, and debugging in bash scripts.

🐧 trap syntax and basic usage

# Syntax:  trap 'COMMAND' SIGNAL [SIGNAL ...]
# The command is a string — evaluated when the signal fires, not when trap runs

# ── Catch Ctrl+C ─────────────────────────────────────────────
trap 'echo "Caught Ctrl+C — cleaning up"; exit 1' INT

# ── Run cleanup on any exit — the most important trap ─────────
trap 'rm -f /tmp/mylock.$$' EXIT
# EXIT fires on: normal exit, exit N, error with set -e, signals
# It does NOT fire on SIGKILL (nothing can catch that)

# ── Reset a trap to its default behaviour ─────────────────────
trap - INT             # reset SIGINT to default (terminate)

# ── Ignore a signal ───────────────────────────────────────────
trap '' HUP            # ignore SIGHUP (e.g. when run via nohup)
trap '' PIPE           # ignore broken pipe — useful in producers

# ── View currently registered traps ───────────────────────────
trap -p                # print all active traps
trap -p EXIT           # print just the EXIT trap
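
To see the EXIT trap fire on an explicit `exit N`, the following sketch runs a tiny script in a child shell and captures both its output and its status. The message text and variable names are placeholders:

```shell
#!/usr/bin/env bash
# Run a small script in a child shell and capture what it prints and returns.
status=0
output=$(bash -c 'trap "echo cleanup ran" EXIT; echo working; exit 3') || status=$?

printf '%s\n' "$output"
echo "exit status: $status"
```

The trap's own `echo` does not change the status the child exits with, so the caller still sees the value passed to `exit`.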

Bash pseudo-signal events

In addition to real signals, bash supports four special event names usable with trap:

Event	When it fires
`EXIT`	When the shell exits for any reason (not SIGKILL). Runs after the last command.
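`ERR`	When a command exits with a non-zero status that is not being tested (not in an `if`/`while` condition, an `&&`/`||` list, or after `!`): the same conditions as `set -e`, whether or not `set -e` is on. Useful for centralised error logging.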
`DEBUG`	Before every simple command. Used for tracing and step-by-step debuggers.
`RETURN`	When a function or sourced file returns.
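
As a small illustration of the `DEBUG` event, the sketch below records each command just before it runs. The array name `trace` is an assumption for the example; `BASH_COMMAND` is bash's own variable holding the command about to execute:

```shell
#!/usr/bin/env bash
# Record each simple command just before bash runs it.
trace=()
trap 'trace+=( "$BASH_COMMAND" )' DEBUG
x=1
y=$(( x + 1 ))
trap - DEBUG        # this command is traced too, then tracing stops

printf 'traced: %s\n' "${trace[@]}"
```

The same idea, with `echo` in place of the array append, gives a lightweight alternative to `set -x` that you can switch on for just part of a script.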

The canonical EXIT trap pattern

🐧 Clean, robust cleanup with EXIT trap

#!/usr/bin/env bash
set -euo pipefail

# Declare resources upfront so cleanup is always safe to call
WORK_DIR=""      # not named TMPDIR: mktemp and other tools read that variable
LOCKFILE=""
CHILD_PID=""

cleanup() {
    local rc=$?        # capture exit code before any cleanup commands change it
    # Remove temp dir if it was created
    [[ -n "$WORK_DIR" && -d "$WORK_DIR" ]] && rm -rf "$WORK_DIR"
    # Release lock if held
    [[ -n "$LOCKFILE" && -f "$LOCKFILE" ]] && rm -f  "$LOCKFILE"
    # Kill background child if still running
    [[ -n "$CHILD_PID" ]] && kill "$CHILD_PID" 2>/dev/null || true
    exit "$rc"          # preserve the original exit code
}
trap cleanup EXIT

# ── Script body ───────────────────────────────────────────────
WORK_DIR=$(mktemp -d)

# Create the lock atomically (noclobber fails if the file already exists)
# and record it only once this script owns it, so cleanup never removes
# another instance's lock
lock=/var/run/myscript.lock
( set -o noclobber; : > "$lock" ) 2>/dev/null || { echo "Already running" >&2; exit 1; }
LOCKFILE=$lock

# Start a background worker
sleep 60 &
CHILD_PID=$!

# ... script body ...
echo "Working in $WORK_DIR"

# cleanup() runs automatically on exit, error, or Ctrl+C

Always capture $? as the very first thing inside the cleanup function — subsequent commands will overwrite it. Then call exit "$rc" at the end to preserve the original exit code for callers.
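
The difference is easy to demonstrate. In this sketch the two handlers (hypothetical names `bad_cleanup` and `good_cleanup`) are identical except for when they read `$?`; each runs in a subshell that exits with status 7:

```shell
#!/usr/bin/env bash
# Two EXIT handlers that differ only in when they read $?.
bad_cleanup()  { rm -f "$scratch"; exit $?; }                  # $? is now rm's status
good_cleanup() { local rc=$?; rm -f "$scratch"; exit "$rc"; }  # $? saved first

scratch=$(mktemp)
bad_status=0
( trap bad_cleanup EXIT; exit 7 ) || bad_status=$?

scratch=$(mktemp)
good_status=0
( trap good_cleanup EXIT; exit 7 ) || good_status=$?

echo "bad=$bad_status good=$good_status"
```

The first handler reports success to the caller even though the script failed, because `rm -f` succeeded and overwrote the status.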

The ERR trap — centralised error handling

🐧 ERR trap for logging failures with context

#!/usr/bin/env bash
set -Eeuo pipefail   # -E (errtrace): functions inherit the ERR trap

on_error() {
    local rc=$?
    local line=$1
    # BASH_COMMAND holds the command that failed
    printf '\033[31m[ERROR]\033[0m Line %d: command "%s" exited with code %d\n' \
        "$line" "$BASH_COMMAND" "$rc" >&2
    # Print a mini stack trace
    local i
    printf '  Stack:\n' >&2
    for (( i=1; i < ${#FUNCNAME[@]}; i++ )); do
        printf '    [%d] %s() at %s line %d\n' \
            "$i" "${FUNCNAME[$i]}" "${BASH_SOURCE[$i]}" "${BASH_LINENO[$((i-1))]}" >&2
    done
}
# Pass $LINENO as an argument so on_error knows where it was called
trap 'on_error $LINENO' ERR

inner() { false; }   # a function that fails
outer() { inner; }   # calls inner
outer
[ERROR] Line 21: command "false" exited with code 1
  Stack:
    [1] inner() at script.sh line 21
    [2] outer() at script.sh line 22
    [3] main() at script.sh line 23

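The ERR trap does not fire when the failing command's status is being tested (e.g. if cmd; then, cmd || true, or ! cmd). It fires only when a command fails and the failure would propagate, which is the same condition that triggers set -e. Functions, command substitutions, and subshells inherit the ERR trap only when set -E (errtrace) is enabled.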
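
A quick way to check these rules is to count how often the trap fires. The sketch below runs inside a command substitution so that its `set +e` and its trap do not leak into the surrounding script; the variable names are illustrative:

```shell
#!/usr/bin/env bash
# Count how often the ERR trap fires for tested and untested failures.
fired=$(
    set +e                               # keep going after the bare `false`
    count=0
    trap 'count=$(( count + 1 ))' ERR
    false || true                        # tested with ||  : no ERR
    if false; then :; fi                 # tested by if    : no ERR
    ! false                              # negated with !  : no ERR
    false                                # untested        : ERR fires
    echo "$count"
)
echo "ERR fired $fired time(s)"
```

Only the last, untested `false` reaches the trap.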

3 — Sending Signals: `kill`, `pkill`, `killall`

🐧 Sending signals to processes

# ── kill — send a signal to a PID or job spec ─────────────────
kill      1234           # send SIGTERM (default) to PID 1234
kill -TERM  1234         # same, explicit
kill -15    1234         # same, by number
kill -KILL  1234         # SIGKILL: last resort; cannot be caught, blocked, or ignored
kill -HUP   1234         # SIGHUP — often means "reload config" for daemons
kill -0     1234         # test if process exists (no signal sent, just check)
kill -0     1234 2>/dev/null && echo "process is alive"

# Send to multiple PIDs
kill -TERM 1234 5678 9012

# Send to a job by job spec (only works in interactive shells or with set -m)
sleep 100 &
kill -TERM %1            # %1 = job 1
kill -TERM %sleep        # %sleep = job named sleep

# Send to a process group (negative PID = whole group)
kill -TERM -1234         # send SIGTERM to all processes in group 1234
kill -TERM -$$           # send SIGTERM to the script's own process group (only if the script is the group leader, i.e. its PGID equals $$)

# ── pkill — send signal by process name or attribute ──────────
pkill myapp              # SIGTERM all processes named "myapp"
pkill -KILL myapp        # force-kill
pkill -u "$USER" python3 # only kill python3 owned by current user
pkill -f 'worker.py --queue high'  # match full command line (not just name)
pkill -P $$              # kill all direct children of the current shell

# ── killall — kill by exact name (GNU/Linux) ──────────────────
killall firefox          # kill all processes named exactly "firefox"
killall -e myapp         # -e = exact match (don't truncate at 15 chars)
killall -w myapp         # -w = wait until all processes die

# ── Graceful terminate, then force if needed ──────────────────
kill_gracefully() {
    local pid=$1 timeout=${2:-10}
    kill -TERM "$pid" 2>/dev/null || return 0  # already gone
    local i
    for (( i=0; i < timeout; i++ )); do
        kill -0 "$pid" 2>/dev/null || return 0  # gone
        sleep 1
    done
    kill -KILL "$pid" 2>/dev/null || true
    echo "Process $pid did not stop gracefully; SIGKILL sent" >&2
}

kill_gracefully 1234 5   # give 5 seconds before SIGKILL

4 — Job Control: `bg`, `fg`, `jobs`, `disown`

Job control lets the shell manage multiple processes, suspending and resuming them as needed. It's enabled by default in interactive shells; in scripts, you need set -m to enable it. Each job has a job number (like %1) as well as a PID.

Job states:

  Running ──Ctrl+Z──▶ Stopped ──bg──▶ Running (background)
  Running (background) ──fg──▶ Running (foreground)
  Running (background) ──exits──▶ Done (reported at next prompt)

Job spec notation:

  %1         job number 1
  %% or %+   most recently started/resumed job
  %-         second most recent job
  %sleep     job whose command starts with "sleep"
  %?log      job whose command contains "log"

🐧 Interactive job control

# Start a background job
sleep 100 &
[1] 4872

# List all jobs
jobs
[1]+  Running    sleep 100 &

jobs -l                   # show PIDs too
[1]+  4872 Running    sleep 100 &

jobs -p                   # show only PIDs

# Bring job to foreground (reclaims terminal)
fg %1
# Now press Ctrl+Z to suspend it
[1]+  Stopped    sleep 100

# Send it to background, still running
bg %1
[1]+ sleep 100 &

# ── disown — detach a job from the shell ──────────────────────
# Without disown: when the shell gets SIGHUP (e.g. the terminal closes), it sends SIGHUP to all its jobs
long_running_process &
disown                   # disown %% (most recent job)
# Process now survives terminal close; shell no longer tracks it

disown %1               # disown specific job
disown -h %1            # keep the job in the jobs table, but don't send it SIGHUP when the shell gets one
disown -a               # disown all background jobs

Job control in scripts with `set -m`

🐧 Enabling job control in non-interactive scripts

#!/usr/bin/env bash
# Job control is off by default in scripts — enable it explicitly
set -m

sleep 30 &
JOB_PID=$!
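# Look up the job number for this PID (jobs -l prints "[N]+  PID  state  command")
JOB_NUM=$(jobs -l | awk -v pid="$JOB_PID" '$2 == pid { gsub(/[^0-9]/, "", $1); print $1 }')

# With set -m, you can use job specs in kill/fg/bg
kill -STOP "%$JOB_NUM"   # pause the job
kill -CONT "%$JOB_NUM"   # resume it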

# More commonly in scripts: just use PIDs directly
kill -TERM "$JOB_PID"

# Process groups: with set -m, each background job gets its own process group
# Kill the whole group (including any children the job spawned):
kill -TERM -"$JOB_PID"   # negative PID = process group

5 — `wait`: Collecting Background Jobs

wait pauses the script until background jobs complete, and crucially returns their exit codes. Without wait, background job failures are silently ignored.

🐧 wait, wait PID, and wait -n

# ── wait with no arguments — wait for ALL background jobs ─────
sleep 2 &
sleep 3 &
wait                     # blocks until both sleeps finish
echo "all done"

# ── wait PID — wait for a specific process ────────────────────
sleep 5 &
pid=$!
wait "$pid"
rc=$?
echo "sleep exited with: $rc"

# ── Capturing exit codes from parallel jobs ───────────────────
do_work() {
    local id=$1
    echo "Worker $id starting"
    sleep $(( RANDOM % 3 + 1 ))
    (( id == 2 )) && return 1   # worker 2 deliberately fails
    echo "Worker $id done"
}

pids=()
for i in 1 2 3; do
    do_work "$i" &
    pids+=( $! )
done

failed=0
for pid in "${pids[@]}"; do
    if ! wait "$pid"; then
        echo "PID $pid failed" >&2
        (( failed++ ))
    fi
done
(( failed > 0 )) && { echo "$failed job(s) failed"; exit 1; }
echo "All workers succeeded"
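
When jobs have meaningful names, a bare PID in the failure message is not very helpful. The sketch below assumes bash 4 or later (associative arrays) and uses made-up task names; it maps each PID to a name so failures can be reported by name:

```shell
#!/usr/bin/env bash
# Map each PID to a task name so failures can be reported by name.
declare -A task_of=()                 # PID -> task name (bash 4+)
for name in fetch build deploy; do
    ( [[ $name != build ]] ) &        # stand-in job: only "build" fails
    task_of[$!]=$name
done

failed=()
for pid in "${!task_of[@]}"; do
    wait "$pid" || failed+=( "${task_of[$pid]}" )
done
echo "failed tasks: ${failed[*]}"
```

Because `wait "$pid"` is the left side of `||`, a failing job does not abort the loop even under `set -e`.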

`wait -n` — act as soon as any job finishes (bash 4.3+; `-p` needs 5.1+)

🐧 wait -n and wait -p — first-completed patterns

#!/usr/bin/env bash
# wait -n returns as soon as ANY one background job finishes,
# with that job's exit code (bash 4.3+).
# -p VAR additionally stores the finished job's PID in VAR (bash 5.1+).
set -euo pipefail

# Start several jobs
for url in \
    https://api.example.com/health \
    https://db.example.com/health \
    https://cache.example.com/health
do
    curl -sf "$url" > /dev/null &
done

# Reap jobs one at a time, in completion order, counting failures
failed=0
while true; do
    rc=0
    wait -n -p finished_pid 2>/dev/null || rc=$?
    # When no background jobs are left, wait -n returns 127 and
    # leaves finished_pid unset
    [[ -n "${finished_pid:-}" ]] || break
    if (( rc != 0 )); then
        echo "PID $finished_pid failed (exit $rc)" >&2
        failed=$(( failed + 1 ))
    fi
done
echo "$failed health check(s) failed"

# Simpler pattern: bounded parallelism with wait -n
# (process_item is a placeholder for your own function)
MAX_JOBS=4
running=0

for item in a b c d e f g h; do
    if (( running >= MAX_JOBS )); then
        wait -n || true              # a failed job must not abort the loop under set -e
        running=$(( running - 1 ))
    fi
    process_item "$item" &
    running=$(( running + 1 ))       # not (( running++ )): that returns 1 when running is 0
done
wait   # wait for remaining jobs

wait -n requires bash 4.3+. The -p var option to store the finished PID requires bash 5.1+. Check with echo $BASH_VERSION. For older bash, track PIDs in an array and poll with kill -0.
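
Rather than reading `$BASH_VERSION` by eye, a script can pick a strategy at run time. This is a sketch only: the helper name `bash_at_least` and the `strategy` strings are assumptions for the example, while `BASH_VERSINFO` is bash's own array of version components:

```shell
#!/usr/bin/env bash
# Succeeds when the running bash is at least version MAJOR.MINOR.
bash_at_least() {
    local major=$1 minor=$2
    (( BASH_VERSINFO[0] > major || ( BASH_VERSINFO[0] == major && BASH_VERSINFO[1] >= minor ) ))
}

if bash_at_least 5 1; then
    strategy="wait -n -p"            # can report which PID finished
elif bash_at_least 4 3; then
    strategy="wait -n"               # first-finished, but no PID
else
    strategy="poll with kill -0"     # fallback for older bash
fi
echo "bash $BASH_VERSION: use $strategy"
```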

6 — Graceful Shutdown Patterns

A graceful shutdown means: stop accepting new work, finish what's in progress, clean up resources, and exit with an accurate status code. The challenge is that SIGTERM can arrive at any moment — you can't predict whether it hits a sleep, a curl, or a database commit. The patterns below handle this robustly.

Signal forwarding — don't leave children orphaned

🐧 Forwarding signals to child processes

#!/usr/bin/env bash
set -euo pipefail
# Problem: when the shell receives SIGTERM, child processes keep running
# unless you explicitly forward the signal

child_pid=""

forward_signal() {
    local sig=$1
    # Kill the child with the same signal we received
    [[ -n "$child_pid" ]] && kill -"$sig" "$child_pid" 2>/dev/null || true
}

trap 'forward_signal TERM' TERM
trap 'forward_signal INT'  INT
trap 'forward_signal HUP'  HUP

# Launch the real worker
./my_worker &
child_pid=$!

# Wait for it, but resume waiting if interrupted by a signal
# (a trapped signal makes wait return early with a status above 128)
rc=0
while true; do
    rc=0
    wait "$child_pid" || rc=$?
    # Child still alive: wait was only interrupted, so wait again
    kill -0 "$child_pid" 2>/dev/null || break
done
exit "$rc"   # pass the worker's exit status on to the caller
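
The forwarding idea can be tried end to end without a real worker. The sketch below is one possible variant, not the only way to do it: it uses a flag set by the handler (instead of `kill -0`) to decide whether `wait` was merely interrupted. The inline worker that exits with status 42 on SIGTERM, the delayed `kill` that plays the role of an external sender, and all names are assumptions for the demonstration:

```shell
#!/usr/bin/env bash
# Forward SIGTERM to a child and report the child's real exit status.
self=$BASHPID
child=""
got_signal=0
forward_term() {
    got_signal=1
    [[ -n "$child" ]] && kill -TERM "$child" 2>/dev/null || true
}
trap forward_term TERM

# Hypothetical worker: loops until it gets SIGTERM, then exits with status 42.
bash -c 'trap "exit 42" TERM; while :; do sleep 0.1; done' &
child=$!

( sleep 1; kill -TERM "$self" ) &    # stands in for an external `kill -TERM`
killer=$!

rc=0
while true; do
    got_signal=0
    rc=0
    wait "$child" || rc=$?           # returns >128 if a trapped signal interrupts it
    (( got_signal )) || break        # no interruption: rc is the child's status
done
trap - TERM
wait "$killer" 2>/dev/null || true
echo "worker exit status: $rc"
```

The first `wait` is interrupted by the trapped signal; the second collects the status the worker chose when it shut down.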

The shutdown flag pattern — controlled loop termination

🐧 Using a flag variable for clean loop exit

#!/usr/bin/env bash
set -euo pipefail

SHUTDOWN=0

on_shutdown() {
    echo "Shutdown signal received — finishing current item..." >&2
    SHUTDOWN=1
}
trap on_shutdown TERM INT HUP

# Main processing loop — check the flag at the top of each iteration
while (( SHUTDOWN == 0 )); do
    # Fetch next item from queue (this might block)
    item=$(fetch_next_item) || break

    # Process — we finish this even if shutdown was requested during fetch
    process_item "$item"
done

echo "Worker exited cleanly"

# ── SIGUSR1 for operational control ──────────────────────────
# Use SIGUSR1/SIGUSR2 for custom actions like dumping stats or rotating logs
STATS_REQUESTED=0

dump_stats() {
    STATS_REQUESTED=1
}
trap dump_stats USR1

# In the main loop, check and handle the flag:
if (( STATS_REQUESTED )); then
    print_stats
    STATS_REQUESTED=0
fi
# Request stats from another terminal:  kill -USR1 <script_pid>
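
The SIGUSR1 fragment above can be exercised in a self-contained form. In this sketch the script signals itself while handling item "b", standing in for an operator's `kill -USR1`; the item list and the `reports` array are illustrative:

```shell
#!/usr/bin/env bash
# The handler only sets a flag; the loop does the reporting at a safe point.
stats_requested=0
processed=0
reports=()
trap 'stats_requested=1' USR1

for item in a b c d; do
    processed=$(( processed + 1 ))
    # Stand-in for an operator running `kill -USR1 <pid>` during item "b"
    if [[ $item == b ]]; then kill -USR1 "$BASHPID"; fi

    if (( stats_requested )); then
        reports+=( "processed=$processed" )
        stats_requested=0
    fi
done
trap - USR1
echo "reports: ${reports[*]}"
```

Keeping the handler down to a single assignment means the real work always happens between items, never in the middle of one.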

Daemon-style restart-on-exit wrapper

🐧 Supervising a child process with restart logic

#!/usr/bin/env bash
# supervisor.sh — restart a command if it exits unexpectedly
set -uo pipefail

CMD=( "$@" )              # the command to supervise
MAX_RESTARTS=${MAX_RESTARTS:-5}
RESTART_DELAY=${RESTART_DELAY:-3}
QUIT=0
restarts=0

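# On a shutdown signal: remember it and pass SIGTERM on to the child
trap 'QUIT=1; [[ -n "${child:-}" ]] && kill -TERM "$child" 2>/dev/null' TERM INT HUP

while (( QUIT == 0 && restarts < MAX_RESTARTS )); do
    echo "[supervisor] Starting: ${CMD[*]}"

    "${CMD[@]}" &
    child=$!

    # Wait for child; resume after signal interruption.
    # rc is captured inside the loop: $? after "done" would be the loop's status.
    while true; do
        rc=0
        wait "$child" 2>/dev/null || rc=$?
        kill -0 "$child" 2>/dev/null || break
    done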

    (( QUIT )) && break     # we initiated the shutdown, don't restart

    (( restarts++ ))
    echo "[supervisor] Exited ($rc) — restart $restarts/$MAX_RESTARTS in ${RESTART_DELAY}s"
    sleep "$RESTART_DELAY"
done

(( restarts >= MAX_RESTARTS )) && echo "[supervisor] Max restarts reached. Giving up."
echo "[supervisor] Exiting"

7 — Trap Gotchas and Best Practices

🐧 Common trap mistakes and how to avoid them

# ── Gotcha 1: trap is NOT inherited by subshells ──────────────
trap 'echo "parent cleanup"' EXIT
(
    # This subshell does NOT inherit the EXIT trap — runs with no trap
    echo "in subshell"
)
# The parent's EXIT trap still fires when the parent exits

# ── Gotcha 2: trap is NOT inherited by child processes ─────────
# A script called by your script starts with no trap handlers of its own.
# Only ignored signals (trap '' SIG) stay ignored in the child; handlers
# are never passed on, and export -f exports functions, not traps.

# ── Gotcha 3: trap string vs function name timing ─────────────
file="initial"
# BAD: the string is evaluated NOW — $file captures "initial"
trap "rm -f $file"   EXIT   # ← double quotes: expands immediately

# GOOD: single quotes — $file evaluated when the trap fires
trap 'rm -f "$file"' EXIT   # ← single quotes: evaluates at trap time

# BEST: use a function — most readable and testable
my_cleanup() { rm -f "$file"; }
trap my_cleanup EXIT         # no quotes needed for function name

# ── Gotcha 4: stacking traps — each trap call REPLACES the previous ──
trap 'echo first'  EXIT
trap 'echo second' EXIT      # replaces the first trap!
# Only "second" will print — not "first"

# Fix: chain in a single trap, or use a cleanup function that accumulates
CLEANUP_TASKS=()
add_cleanup() { CLEANUP_TASKS+=( "$1" ); }
run_cleanups() {
    local rc=$?
    local task
    for task in "${CLEANUP_TASKS[@]}"; do
        eval "$task" || true   # run each, don't stop on failure
    done
    exit "$rc"
}
trap run_cleanups EXIT

# Now register individual cleanup actions at any point in the script
add_cleanup 'rm -f /tmp/myfile'
add_cleanup 'echo "Done cleaning up"'

# ── Gotcha 5: SIGKILL can never be caught ─────────────────────
trap 'echo caught' KILL     # has no effect: SIGKILL (like SIGSTOP) cannot be trapped
# This is by design — SIGKILL is the OS's emergency stop
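
Gotcha 3 is the easiest of these to verify for yourself. In the sketch below each `$( )` runs in its own subshell, so the EXIT trap set inside it fires when that subshell ends and its output is captured; the variable names are illustrative:

```shell
#!/usr/bin/env bash
# Each $( ) runs in its own subshell, so its EXIT trap fires when it ends.
early=$( file=initial; trap "echo removing $file" EXIT; file=final )
late=$(  file=initial; trap 'echo removing $file' EXIT; file=final )

echo "double quotes: $early"
echo "single quotes: $late"
```

With double quotes the trap keeps the value `file` had when the trap was registered; with single quotes it sees the value at exit time.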

8 — Quick Reference

trap syntax

Form	Effect
`trap 'cmd' SIG`	Run cmd when SIG is received
`trap fn SIG`	Call function fn when SIG is received
`trap 'cmd' SIG1 SIG2`	Same command for multiple signals
`trap '' SIG`	Ignore signal SIG
`trap - SIG`	Reset signal to default behaviour
`trap -p`	Print all active traps
`trap 'cmd' EXIT`	Run on any shell exit
`trap 'cmd' ERR`	Run when a command fails and its status is not being tested (same rules as `set -e`)
`trap 'cmd' DEBUG`	Run before every simple command

Job control commands

Command	Effect
`cmd &`	Run cmd in background; sets $! to its PID
`jobs`	List background jobs with job numbers
`jobs -l`	List jobs with PIDs
`fg %N`	Bring job N to foreground
`bg %N`	Resume stopped job N in background
`disown %N`	Detach job N (survives terminal close)
`disown -h %N`	Stop job receiving SIGHUP (but keep in jobs table)
`wait`	Wait for all background jobs
`wait $pid`	Wait for a specific PID; returns its exit code
`wait -n`	Wait for any one job to finish (bash 4.3+)
`wait -n -p var`	Wait for any job; store its PID in var (bash 5.1+)
`kill -0 $pid`	Test if process exists (no signal sent)
`pkill -P $$`	Kill all direct children of this shell

✏️ Exercises

These exercises focus on writing scripts that behave correctly under interruption — the real test of signal handling. Run them in a terminal and test by pressing Ctrl+C or running kill -TERM <pid> from another window.

Exercise 1

Write a script called safe_processor.sh that reads lines from a file (passed as an argument), processes each one with a simulated 1-second operation, and — using a trap and shutdown flag — finishes the current item before exiting cleanly when it receives SIGTERM or SIGINT. It should print how many items it processed and whether it exited early.

Hint: set SHUTDOWN=0 and trap TERM and INT to set it to 1. The while loop checks (( SHUTDOWN == 0 )). Use sleep 1 to simulate processing. The trap handler should not call exit directly — let the loop terminate naturally.

Sample Solution

#!/usr/bin/env bash
# safe_processor.sh  FILE
set -uo pipefail

FILE="${1:?Usage: safe_processor.sh FILE}"
[[ -f "$FILE" ]] || { echo "Error: '$FILE' not found" >&2; exit 1; }

SHUTDOWN=0
processed=0
total=$(wc -l < "$FILE")

on_signal() {
    echo ""
    echo "Signal received — finishing current item and shutting down..." >&2
    SHUTDOWN=1
}
trap on_signal TERM INT

echo "Processing $total items from $FILE (PID: $$)"
echo "Send SIGTERM with:  kill -TERM $$"

while (( SHUTDOWN == 0 )) && IFS= read -r line; do
    printf "  Processing: %s ... " "$line"
    sleep 1   # simulate work
    printf "done\n"
    (( processed++ ))
done < "$FILE"

# Report result
echo
if (( SHUTDOWN )); then
    printf "\033[33mEarly exit:\033[0m processed %d of %d items\n" \
        "$processed" "$total"
    exit 130     # 128 + signal number; 130 matches SIGINT (2), a SIGTERM (15) exit would be 143
else
    printf "\033[32mComplete:\033[0m processed all %d items\n" "$processed"
fi

Exercise 2

Write a script called parallel_jobs.sh that takes a list of commands (one per line from stdin or a file) and runs them in parallel with a configurable concurrency limit (MAX_JOBS environment variable, default 3). It should collect each job's exit code, report which commands succeeded and which failed, and exit with a non-zero code if any job failed. Use wait $pid to collect exit codes.

Hint: maintain a pids array mapping PID to command string. Use jobs -p | wc -l or a counter variable to track running jobs. When the count reaches MAX_JOBS, call wait -n (or loop over pids checking kill -0) before launching the next. At the end, wait for all remaining PIDs.

Sample Solution

#!/usr/bin/env bash
# parallel_jobs.sh  [FILE]   (reads commands from FILE or stdin)
set -uo pipefail

MAX_JOBS=${MAX_JOBS:-3}

# pids[PID]="command string"
declare -A pids=()
declare -A failed_cmds=()
running=0

reap_one() {
    # Wait for any one job to finish; record failures
    local pid rc
    for pid in "${!pids[@]}"; do
        if ! kill -0 "$pid" 2>/dev/null; then
            # Process is gone — collect exit code
            wait "$pid" && rc=0 || rc=$?
            if (( rc != 0 )); then
                failed_cmds["$pid"]="${pids[$pid]} (exit $rc)"
                printf '\033[31m[FAIL]\033[0m %s\n' "${pids[$pid]}" >&2
            else
                printf '\033[32m[OK]\033[0m   %s\n' "${pids[$pid]}"
            fi
            unset 'pids[$pid]'
            (( running-- ))
            return
        fi
    done
    # All still running — sleep briefly and retry
    sleep 0.1
}

total=0

while IFS= read -r cmd; do
    [[ -z "$cmd" || "$cmd" == '#'* ]] && continue
    # Wait until a slot opens
    while (( running >= MAX_JOBS )); do
        reap_one
    done
    # Launch
    eval "$cmd" &
    pids[$!]="$cmd"
    (( running++ ))
    (( total++ ))
done < "${1:-/dev/stdin}"   # read commands from FILE if given, otherwise stdin

# Wait for remaining jobs
while (( running > 0 )); do
    reap_one
done

# Summary
nfailed=${#failed_cmds[@]}
printf '\n%d jobs total — %d failed\n' "$total" "$nfailed"
(( nfailed > 0 )) && exit 1 || exit 0

# Usage:
# echo -e "sleep 1\nsleeep 1\nsleep 2" | MAX_JOBS=2 ./parallel_jobs.sh

Exercise 3

Write a script called monitored_run.sh that runs a command passed as arguments, sets a timeout using SIGALRM or a background sleep-and-kill pattern, and produces a structured report: command, start time, end time, duration in seconds, exit code, and whether it was killed due to timeout. The timeout should be configurable via a TIMEOUT environment variable (default: 10 seconds).

Hint: the pure-bash timeout pattern is: start the command as a background job, also start a sleep $TIMEOUT & job, then wait -n for whichever finishes first — if the sleep wins, kill the command. Store both PIDs. The SECONDS built-in variable tracks elapsed time since the script started.

Sample Solution

#!/usr/bin/env bash
# monitored_run.sh  COMMAND [ARGS...]
set -uo pipefail

TIMEOUT=${TIMEOUT:-10}
(( $# > 0 )) || { echo "Usage: monitored_run.sh COMMAND [ARGS]" >&2; exit 1; }

cmd_str="$*"
start_time=$(date '+%Y-%m-%d %H:%M:%S')
start_sec=$SECONDS

# Start the target command
"$@" &
cmd_pid=$!

# Start the timeout watchdog
sleep "$TIMEOUT" &
sleep_pid=$!

timed_out=0
rc=0

# Wait for whichever finishes first
# We poll rather than use wait -n for broader compatibility
while true; do
    # Check if the command is done
    if ! kill -0 "$cmd_pid" 2>/dev/null; then
        kill "$sleep_pid" 2>/dev/null || true
        wait "$cmd_pid" && rc=0 || rc=$?
        break
    fi
    # Check if timeout elapsed
    if ! kill -0 "$sleep_pid" 2>/dev/null; then
        timed_out=1
        kill -TERM "$cmd_pid" 2>/dev/null || true
        sleep 1
        kill -KILL "$cmd_pid" 2>/dev/null || true
        wait "$cmd_pid" 2>/dev/null || true   # reap the child so no zombie is left behind
        rc=124   # GNU timeout convention
        break
    fi
    sleep 0.1
done

end_time=$(date '+%Y-%m-%d %H:%M:%S')
duration=$(( SECONDS - start_sec ))

# Structured report
printf '\n╔══ Run Report ═══════════════════════════════\n'
printf '  Command   : %s\n'   "$cmd_str"
printf '  Start     : %s\n'   "$start_time"
printf '  End       : %s\n'   "$end_time"
printf '  Duration  : %ds\n'  "$duration"
printf '  Exit code : %d\n'  "$rc"
if (( timed_out )); then
    printf '  Timeout   : \033[31mYES — killed after %ds\033[0m\n' "$TIMEOUT"
else
    printf '  Timeout   : \033[32mno\033[0m\n'
fi
printf '╚═════════════════════════════════════════════\n'

exit "$rc"

Exercise 4

Write a script called cleanup_demo.sh that demonstrates the stackable cleanup pattern from section 7. It should create a temp directory, a lock file, and a background process during its startup, registering each as a separate cleanup task using an add_cleanup function. It should then deliberately fail (using false or a bad command) to show that all cleanup tasks still run, in registration order. Print each cleanup step as it runs.

Hint: implement the CLEANUP_TASKS array and run_cleanups function from the gotchas section. Make the cleanup functions print a message like [cleanup] removing /tmp/demo.XYZ before doing the actual removal. Use set -e so the deliberate failure triggers the EXIT trap.

Sample Solution

#!/usr/bin/env bash
# cleanup_demo.sh — stackable cleanup pattern
set -euo pipefail

# ── Cleanup stack ─────────────────────────────────────────────
CLEANUP_TASKS=()

add_cleanup() {
    CLEANUP_TASKS+=( "$1" )
}

run_cleanups() {
    local rc=$?
    echo ""
    printf '\033[36m[cleanup]\033[0m Running %d cleanup tasks...\n' \
        "${#CLEANUP_TASKS[@]}" >&2
    local task
    for task in "${CLEANUP_TASKS[@]}"; do
        eval "$task" || printf '\033[31m[cleanup]\033[0m Task failed: %s\n' \
            "$task" >&2
    done
    printf '\033[36m[cleanup]\033[0m Done. Original exit code: %d\n' \
        "$rc" >&2
    exit "$rc"
}

trap run_cleanups EXIT

# ── Startup: acquire resources ────────────────────────────────
echo "[1] Creating temp directory..."
WORK_DIR=$(mktemp -d)   # not named TMPDIR: mktemp and other tools read that variable
echo "    Created: $WORK_DIR"
add_cleanup "printf '[cleanup] removing temp dir %s\n' \"$WORK_DIR\" >&2; rm -rf \"$WORK_DIR\""

echo "[2] Creating lock file..."
LOCKFILE=/tmp/cleanup_demo_$$.lock
touch "$LOCKFILE"
echo "    Created: $LOCKFILE"
add_cleanup "printf '[cleanup] releasing lock %s\n' \"$LOCKFILE\" >&2; rm -f \"$LOCKFILE\""

echo "[3] Starting background process..."
sleep 60 &
BG_PID=$!
echo "    PID: $BG_PID"
add_cleanup "printf '[cleanup] stopping background PID %d\n' $BG_PID >&2; kill $BG_PID 2>/dev/null || true"

# ── Deliberate failure ────────────────────────────────────────
echo
echo "[!] About to fail deliberately..."
false   # set -e causes EXIT trap to fire here

# This line never runs
echo "This should not print"
