Chapter 12 — Integrating Bash with Other Languages
Bash's real power in a polyglot world is not what it can do alone — it is how cleanly it can orchestrate everything else. Knowing when to hand off to Python, Node, Ruby, or awk, how to pass structured data across that boundary, and how to use another language as a persistent coprocess rather than a repeatedly-forked subprocess are the skills that separate throw-it-together shell scripts from production-grade automation. This final chapter covers every major integration pattern and ends with an honest decision framework for when to stop using Bash at all.
1 — The Fork-per-Call Tax
Every $(python3 ...) inside a loop pays a full fork+exec cost plus interpreter startup, typically 5 to 30 ms on a modern Linux system. For one-off calls this is irrelevant. For ten thousand iterations it adds up to roughly 50 to 300 seconds of wall-clock time you did not budget for.
    # Expensive: one Python process per line
    while IFS= read -r line; do
        result=$(python3 -c "import sys; print(sys.stdin.read().upper())" <<< "$line")
        printf '%s\n' "$result"
    done < bigfile.txt

    # Better: one process, all lines
    python3 -c "import sys; sys.stdout.write(sys.stdin.read().upper())" \
        < bigfile.txt

    # Best for bidirectional work: a persistent coprocess (see Section 2)
| Pattern | Forks | Use when |
|---|---|---|
| One-shot call $(python3 script.py) | 1 | Called once or rarely; simplest code |
| Batch stdin→stdout pipe | 1 | Large input, stateless transformation |
| Coprocess (persistent) | 1 | Many calls in a loop; stateful interpreter session |
| Background server + socket | 1 + overhead | Multiple shell scripts sharing one interpreter |
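The per-call cost is easy to measure on your own machine before deciding whether it matters. The following is a minimal sketch, not a benchmark: the sample size N and the helper name per_call_ms are illustrative assumptions, and it assumes GNU date (for the %N nanosecond format).

```shell
#!/usr/bin/env bash
# Sketch: estimate the average cost of spawning python3 once.
# N and per_call_ms are illustrative names; `date +%s%N` assumes GNU date.
N=20

per_call_ms() {
    local start end i
    start=$(date +%s%N)
    for (( i = 0; i < N; i++ )); do
        python3 -c 'pass'            # do nothing: we only pay for startup
    done
    end=$(date +%s%N)
    echo $(( (end - start) / N / 1000000 ))
}

ms=$(per_call_ms)
projected_s=$(( ms * 10000 / 1000 ))
printf 'python3 startup: about %d ms per call\n' "$ms"
printf 'projected cost of 10000 calls: about %d s\n' "$projected_s"
```

The measurement includes the cost of the loop itself and of the two date calls, so treat the result as an order-of-magnitude figure.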
2 — Persistent Coprocesses
The coprocess pattern (introduced in Chapter 4) becomes especially valuable when the "other language" needs to be initialised once — loading libraries, connecting to a database, compiling a regex — and then serve many requests.
Python coprocess
    #!/usr/bin/env bash
    # A persistent Python worker that handles JSON requests over stdin/stdout

    # ── The Python worker script ─────────────────────────────────────
    cat > /tmp/worker.py <<'PYEOF'
    import sys, json, hashlib, re

    # One-time initialisation: compile regex, load lookup tables, etc.
    PATTERN = re.compile(r'\b\w{4,}\b')

    for raw in sys.stdin:
        raw = raw.strip()
        if not raw:
            continue
        try:
            req = json.loads(raw)
            op = req.get('op')
            if op == 'hash':
                result = hashlib.sha256(req['data'].encode()).hexdigest()
            elif op == 'words':
                result = PATTERN.findall(req['text'])
            elif op == 'upper':
                result = req['text'].upper()
            else:
                result = None
            print(json.dumps({'ok': True, 'result': result}), flush=True)
        except Exception as e:
            print(json.dumps({'ok': False, 'error': str(e)}), flush=True)
    PYEOF

    # ── Start the coprocess ──────────────────────────────────────────
    coproc PY (python3 -u /tmp/worker.py)
    # -u: unbuffered stdout. Critical: without it (or flush=True) replies sit
    # in Python's buffer and the shell's read blocks forever.

    py_call() {
        # Send a JSON request; read the JSON response
        local request="$1" response
        printf '%s\n' "$request" >&"${PY[1]}"
        IFS= read -r response <&"${PY[0]}"
        printf '%s' "$response"
    }

    # ── Use it in a loop: only one Python process throughout ─────────
    while IFS= read -r line; do
        # Escape backslashes and double quotes by hand; control characters
        # still need a real encoder such as jq (see Section 3).
        req=$(printf '{"op":"hash","data":"%s"}' \
            "$(printf '%s' "$line" | sed 's/\\/\\\\/g; s/"/\\"/g')")
        resp=$(py_call "$req")
        printf '%s\t%s\n' "$line" "$resp"
    done < input.txt

    # Shut down the worker
    kill "$PY_PID" 2>/dev/null; wait "$PY_PID" 2>/dev/null
Node.js coprocess
    cat > /tmp/worker.js <<'JSEOF'
    const readline = require('readline');
    const crypto = require('crypto');

    const rl = readline.createInterface({ input: process.stdin });
    rl.on('line', (raw) => {
      try {
        const req = JSON.parse(raw);
        let result;
        if (req.op === 'md5') result = crypto.createHash('md5').update(req.data).digest('hex');
        else if (req.op === 'upper') result = req.text.toUpperCase();
        else if (req.op === 'slugify') result = req.text.toLowerCase().replace(/\s+/g, '-');
        process.stdout.write(JSON.stringify({ ok: true, result }) + '\n');
      } catch (e) {
        process.stdout.write(JSON.stringify({ ok: false, error: e.message }) + '\n');
      }
    });
    JSEOF

    coproc JS (node /tmp/worker.js)

    js_call() {
        printf '%s\n' "$1" >&"${JS[1]}"
        read -r REPLY <&"${JS[0]}"
        printf '%s' "$REPLY"
    }

    js_call '{"op":"slugify","text":"Hello World"}'
    # {"ok":true,"result":"hello-world"}
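A coprocess call blocks forever if the worker crashes or never flushes its reply. One way to guard against that is a read timeout. The sketch below assumes Bash 4 or later and python3 on the PATH; the names call_worker, worker_reply, and TIMEOUT_S are illustrative and the worker is a throwaway uppercasing loop.

```shell
#!/usr/bin/env bash
# Sketch: a coprocess call that fails fast instead of hanging.
# call_worker, worker_reply and TIMEOUT_S are illustrative names.
TIMEOUT_S=5

coproc W (python3 -u -c '
import sys
for line in sys.stdin:
    print(line.rstrip("\n").upper(), flush=True)
')
worker_pid=$W_PID      # saved because bash unsets W_PID once the worker exits

call_worker() {
    # Leaves the reply in the global variable worker_reply.
    worker_reply=
    printf '%s\n' "$1" >&"${W[1]}" || return 1
    IFS= read -r -t "$TIMEOUT_S" worker_reply <&"${W[0]}" || return 1
}

if call_worker "hello coprocess"; then
    printf 'reply: %s\n' "$worker_reply"
else
    printf 'worker did not answer within %s s\n' "$TIMEOUT_S" >&2
fi

kill "$worker_pid" 2>/dev/null
wait "$worker_pid" 2>/dev/null || true
```

Returning the reply through a variable rather than through $(...) keeps the call in the current shell, which avoids one fork per request.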
3 — Passing Data Structures via JSON
JSON is the universal interchange format between Bash and other languages. jq is the standard tool for reading and writing it from shell scripts.
    # ── Reading JSON in Bash ─────────────────────────────────────────
    json='{"name":"alice","roles":["admin","user"],"meta":{"age":30}}'

    # Extract a scalar
    name=$(jq -r '.name' <<< "$json")        # -r = raw (no quotes)
    age=$(jq -r '.meta.age' <<< "$json")
    role0=$(jq -r '.roles[0]' <<< "$json")

    # Read a JSON array into a Bash array
    mapfile -t roles < <(jq -r '.roles[]' <<< "$json")
    printf 'role: %s\n' "${roles[@]}"
    # role: admin
    # role: user

    # Iterate over JSON object keys
    while IFS='=' read -r key val; do
        printf '%s → %s\n' "$key" "$val"
    done < <(jq -r '.meta | to_entries[] | "\(.key)=\(.value)"' <<< "$json")
    # age → 30

    # ── Building JSON from Bash variables ────────────────────────────
    # WRONG: string interpolation breaks on special characters
    printf '{"name":"%s"}' "$name"   # breaks if name contains " or \

    # RIGHT: let jq do the escaping via --arg
    payload=$(jq -n \
        --arg name "$name" \
        --argjson age "$age" \
        '{"name":$name,"age":$age}')

    # Build a JSON array from a Bash array
    tags=(production europe tier-1)
    json_tags=$(printf '%s\n' "${tags[@]}" | jq -R '.' | jq -s '.')
    # → ["production","europe","tier-1"]

    # Build a JSON object from an associative array
    declare -A env_vars=([HOST]=localhost [PORT]=8080 [DB]=mydb)
    json_obj=$(for k in "${!env_vars[@]}"; do
        jq -n --arg k "$k" --arg v "${env_vars[$k]}" '{"key":$k,"value":$v}'
    done | jq -s 'from_entries')

    # ── Streaming JSON (NDJSON / JSON Lines) ─────────────────────────
    # Each line is a self-contained JSON object: safe to pipe, grep, tail
    jq -c '.[]' big_array.json | while IFS= read -r obj; do
        name=$(jq -r '.name' <<< "$obj")
        printf 'Processing %s\n' "$name"
    done
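To see concretely why interpolation is unsafe, it helps to feed both variants a value containing a quote and a backslash. The sketch below uses python3's json module as the encoder instead of jq, purely so that it runs on machines without jq; json_string is a hypothetical helper name, and jq --arg remains the approach this chapter recommends.

```shell
#!/usr/bin/env bash
# Sketch: naive interpolation versus real JSON encoding.
# json_string is a hypothetical helper; it encodes one shell string as a JSON string.
json_string() {
    JSON_VALUE="$1" python3 -c 'import json, os; print(json.dumps(os.environ["JSON_VALUE"]))'
}

name='Alice "Al" O\Neil'

naive=$(printf '{"name":"%s"}' "$name")                  # not valid JSON
safe=$(printf '{"name":%s}' "$(json_string "$name")")    # quotes and backslash escaped

printf 'naive: %s\n' "$naive"
printf 'safe:  %s\n' "$safe"
```

The value travels through an environment variable rather than being spliced into the Python source, so no shell or Python quoting rules apply to it.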
4 — Embedding Other Languages Inline
For short, self-contained logic that does not warrant a separate file, embed the other language directly in the shell script using a heredoc. The heredoc keeps related code together and avoids the file-management overhead of temporary scripts.
    # ── Inline Python ────────────────────────────────────────────────
    result=$(python3 <<'EOF'
    import json, sys
    data = [{"id": i, "val": i**2} for i in range(1, 6)]
    print(json.dumps(data))
    EOF
    )
    echo "$result"
    # [{"id": 1, "val": 1}, {"id": 2, "val": 4}, ...]

    # ── Passing Bash variables into inline Python ────────────────────
    threshold=42
    label="my label"

    # Method 1: environment variables (safe for any value)
    export THRESHOLD="$threshold"
    export LABEL="$label"
    python3 <<'EOF'
    import os
    print(f"threshold={os.environ['THRESHOLD']}, label={os.environ['LABEL']}")
    EOF

    # Method 2: unquoted heredoc. Bash expands $vars before Python sees them.
    # ONLY safe when the values are validated (no quotes, no backslashes)
    [[ $threshold =~ ^[0-9]+$ ]] || { echo "bad threshold" >&2; exit 1; }
    python3 <<EOF
    print("threshold is ${threshold}")
    EOF

    # ── Inline Ruby ──────────────────────────────────────────────────
    csv_data="alice,30,engineer
    bob,25,designer"
    # The program goes in -e so that stdin stays free for the data:
    # a heredoc and a here-string cannot both feed stdin.
    ruby -rcsv -e '
    STDIN.each_line do |line|
      row = CSV.parse_line(line.chomp)
      puts "#{row[0].upcase} (#{row[2]}), age #{row[1]}"
    end
    ' <<< "$csv_data"

    # ── Inline Perl ──────────────────────────────────────────────────
    # Perl shines for complex text transformations
    perl -00 -ne 'print if /ERROR/i' /var/log/app.log
    # -00: paragraph mode (blank line = record separator)
    # -n: loop over input; -e: inline code

    # ── Inline awk: already in most scripts, mentioned for completeness
    # -F, is required for comma-separated input
    awk -F, 'NR>1{sum+=$3} END{printf "total: %.2f\n", sum}' data.csv
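Positional arguments offer a further way to pass values into an inline script without exporting anything. When python3 is given - as the script name it reads the program from stdin and places the remaining arguments in sys.argv. A minimal sketch, with illustrative values:

```shell
#!/usr/bin/env bash
# Sketch: pass shell values to an inline Python script as arguments.
# `python3 -` reads the program from stdin; the rest lands in sys.argv[1:].
threshold=42
label='my "quoted" label'

summary=$(python3 - "$threshold" "$label" <<'EOF'
import sys
threshold = int(sys.argv[1])
label = sys.argv[2]
print(f"{label}: doubled threshold is {threshold * 2}")
EOF
)
printf '%s\n' "$summary"
```

Because the heredoc is quoted, Bash never expands anything inside the Python source, and the values arrive as ordinary strings whatever characters they contain. The trade-off is that stdin is occupied by the program, so this form suits scripts that do not also need to read data from stdin.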
5 — Shell as Glue: Orchestrating Multi-Language Pipelines
Bash's most natural role in a polyglot system is as the top-level orchestrator: each stage of a pipeline is handled by the language best suited to it, and Bash wires them together.
    #!/usr/bin/env bash
    # A real-world ETL pipeline:
    #   1. Fetch raw data          (curl)
    #   2. Validate and transform  (Python)
    #   3. Aggregate stats         (awk)
    #   4. Format report           (jq)
    #   5. Upload result           (curl)
    set -euo pipefail

    API_URL="https://api.example.com/events"
    TOKEN=$(cat ~/.config/myapp/token)
    DATE=$(date -u '+%Y-%m-%d')
    WORKDIR=$(mktemp -d)    # not TMPDIR: mktemp and other tools read that variable
    trap 'rm -rf "$WORKDIR"' EXIT

    # Stage 1: fetch
    curl -sf -H "Authorization: Bearer $TOKEN" \
        "$API_URL?date=$DATE" > "$WORKDIR/raw.json"

    # Stage 2: validate and normalise (Python is best for this)
    # The script arrives on fd 3 so that stdin stays free for the data;
    # a heredoc on stdin would be overridden by the < redirection.
    python3 /dev/fd/3 3<<'PY' < "$WORKDIR/raw.json" > "$WORKDIR/clean.ndjson"
    import sys, json, datetime
    for event in json.load(sys.stdin):
        if not event.get('timestamp') or not event.get('user_id'):
            continue
        event['ts'] = datetime.datetime.fromisoformat(
            event['timestamp']).strftime('%Y-%m-%dT%H:%M:%SZ')
        print(json.dumps(event))
    PY

    # Stage 3: aggregate per-user event counts (awk is ideal)
    jq -r '"\(.user_id)\t\(.event_type)"' "$WORKDIR/clean.ndjson" \
        | sort \
        | awk -F'\t' '{counts[$1"/"$2]++} END{for(k in counts) print k, counts[k]}' \
        > "$WORKDIR/counts.txt"

    # Stage 4: build final JSON report (jq)
    jq -Rn '[inputs | split(" ") | {key: .[0], count: (.[1] | tonumber)}]' \
        "$WORKDIR/counts.txt" > "$WORKDIR/report.json"

    # Stage 5: upload
    curl -sf -X POST \
        -H "Authorization: Bearer $TOKEN" \
        -H "Content-Type: application/json" \
        --data-binary "@$WORKDIR/report.json" \
        "$API_URL/reports"

    printf 'Report uploaded for %s\n' "$DATE"
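When stages are joined with | rather than through intermediate files, the orchestrator also needs to know which stage failed. Bash records the exit status of every pipeline member in the PIPESTATUS array. The sketch below uses three stand-in functions (fetch, transform, report) that are placeholders, not part of the pipeline above.

```shell
#!/usr/bin/env bash
# Sketch: identify the failing stage of a pipeline with PIPESTATUS.
# fetch, transform and report are placeholder stages.
fetch()     { printf 'a\nb\nc\n'; }
transform() { cat; return 3; }          # passes data through, then reports failure
report()    { wc -l > /dev/null; }

fetch | transform | report
codes=("${PIPESTATUS[@]}")              # must be copied immediately after the pipeline

names=(fetch transform report)
for i in "${!codes[@]}"; do
    (( codes[i] == 0 )) || printf 'stage %s failed with exit code %d\n' \
        "${names[i]}" "${codes[i]}" >&2
done
```

Without set -o pipefail the pipeline's own exit status is that of the last stage only, so the failure in the middle would go unnoticed; pipefail makes the pipeline fail, and PIPESTATUS tells you where.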
6 — Calling Bash from Other Languages
The integration runs both ways. Sometimes your Python, Node, or Go program needs to shell out, and knowing what Bash sees on the other end helps you design the interface cleanly.
    #!/usr/bin/env bash
    # bin/process_order.sh: designed to be called from Python/Node/Go
    #
    # Good script interface design (what your script should offer callers):
    # - Reads config from env vars (easy for any caller)
    # - Takes structured input via stdin (JSON or line-delimited)
    # - Produces structured output on stdout
    # - Uses exit codes consistently (0 = success, non-zero = error)
    # - Writes human-readable errors to stderr only, so callers can log or ignore them
    set -euo pipefail

    # Config from environment
    DB_HOST="${DB_HOST:?DB_HOST required}"
    DB_PORT="${DB_PORT:-5432}"

    # Input: JSON on stdin
    order=$(cat)
    order_id=$(jq -r '.id' <<< "$order")
    amount=$(jq -r '.amount' <<< "$order")

    # ... processing ...

    # Output: structured JSON on stdout
    jq -n --arg id "$order_id" --arg status "processed" \
        '{"order_id":$id,"status":$status}'
    # ── Calling the script from Python ───────────────────────────────
    import subprocess, json, os

    order = {"id": "ORD-001", "amount": 99.99}
    result = subprocess.run(
        ["./bin/process_order.sh"],
        input=json.dumps(order),
        capture_output=True, text=True,
        env={**os.environ, "DB_HOST": "localhost"},
        check=True   # raises CalledProcessError on non-zero exit
    )
    response = json.loads(result.stdout)
    print(response["status"])   # → processed
    // ── Calling from Node.js ─────────────────────────────────────────
    const { execFile } = require('child_process');

    const order = JSON.stringify({ id: 'ORD-002', amount: 49.99 });
    const child = execFile('./bin/process_order.sh', [],
      { env: { ...process.env, DB_HOST: 'localhost' } },
      (err, stdout, stderr) => {
        if (err) { console.error('Script failed:', stderr); process.exit(1); }
        const resp = JSON.parse(stdout);
        console.log(resp.status);
      }
    );
    // The script reads its input from stdin, so send the order and close the pipe.
    child.stdin.end(order);
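The stdout/stderr/exit-code contract can be checked from the shell side before any caller is written. The sketch below is a self-contained stand-in for a real script: handle_request and the exit code 2 for bad input are illustrative conventions, and the JSON is built with printf only because the single interpolated value is an integer.

```shell
#!/usr/bin/env bash
# Sketch: the caller-facing contract in miniature.
# handle_request and the "2 = bad input" convention are illustrative.
handle_request() {
    local input
    input=$(cat)
    if [[ -z $input ]]; then
        printf 'error: empty request on stdin\n' >&2   # diagnostics go to stderr
        return 2
    fi
    # %d with an integer cannot break the JSON; use jq --arg for strings.
    printf '{"status":"processed","bytes":%d}\n' "${#input}"
}

# Success: stdout carries only the JSON, exit status is 0.
out=$(printf 'abc' | handle_request 2>/dev/null)
rc=$?

# Failure: nothing useful on stdout, non-zero exit status.
err_rc=0
: | handle_request > /dev/null 2>&1 || err_rc=$?

printf 'ok: rc=%d out=%s\n' "$rc" "$out"
printf 'empty input: rc=%d\n' "$err_rc"
```

A caller in any language can then rely on three things: parse stdout only when the exit status is 0, treat stderr as free-form text, and map specific non-zero codes to specific failures.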
7 — When to Stop Using Bash
The ability to hand off cleanly is as important as the ability to integrate. Continuing to grow a Bash script past its natural limits produces code that is harder to test, harder to reason about, and harder to hand to a colleague. Recognise the signals early.
| Signal | What it means in practice | Language to reach for |
|---|---|---|
| You need real data structures (trees, graphs, objects) | Associative arrays and parallel indexed arrays no longer model the problem cleanly | Python, Go, Ruby |
| Error handling is more than \|\| exit 1 | You need try/catch, error types, retry logic with backoff | Python, Go |
| You are writing >200 lines of logic | A colleague cannot understand the script without running it | Python (fastest rewrite) |
| You need HTTP or a database | curl + jq works up to a point; auth, retries, and pooling quickly exceed it | Python (requests/httpx), Go |
| You need concurrency beyond & + wait | Proper thread pools, async I/O, or actor models | Go, Python (asyncio) |
| You want to unit-test the logic, not the plumbing | BATS covers the shell surface; complex logic belongs in a testable unit | Python, Go, Node |
| Someone else will maintain this in 6 months | The safe default is Python: it is preinstalled on most Linux servers and widely readable | Python |
| You need a CLI with subcommands, flags, help text | Writing a clean CLI in Bash is possible but argparse / cobra are far faster | Python (Click/Typer), Go (cobra) |
The handoff pattern — keep the shell entry point
    #!/usr/bin/env bash
    # bin/run.sh: thin shell wrapper; all logic is in Python
    # This is the right way to hand off: keep one entry point,
    # have it exec into the real implementation.
    set -euo pipefail

    # Activate a local virtualenv first (unless one is already active), so
    # that the python3 lookup below resolves to the virtualenv's interpreter
    if [[ ! -f "${VIRTUAL_ENV:-}/bin/activate" && -f .venv/bin/activate ]]; then
        source .venv/bin/activate
    fi

    # Guard: require Python 3.10+
    py=$(command -v python3) || { echo "python3 not found" >&2; exit 1; }
    ver=$("$py" -c "import sys; print('%d%02d' % sys.version_info[:2])")
    (( ver >= 310 )) || { echo "Python 3.10+ required (got $("$py" --version))" >&2; exit 1; }

    # exec replaces this shell with Python: no double-process overhead
    exec "$py" -m myapp "$@"
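The effect of exec can be observed directly: the process ID printed by the shell and the one printed by the program it execs into are the same. A small sketch, assuming python3 is on the PATH:

```shell
#!/usr/bin/env bash
# Sketch: exec replaces the shell process rather than starting a child.
# The inner bash prints its own PID, then execs python3, which prints its PID.
mapfile -t pids < <(bash -c 'echo $$; exec python3 -c "import os; print(os.getpid())"')

printf 'shell PID:  %s\n' "${pids[0]}"
printf 'python PID: %s\n' "${pids[1]}"
```

Both lines show the same number, which is why signals sent to the wrapper's PID and the exit status seen by the wrapper's parent belong to the Python program itself.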
8 — Practical Patterns Summary
| Task | Best tool | Notes |
|---|---|---|
| File/process orchestration | Bash | Its strongest domain |
| Text transformation (simple) | awk / sed | Already in every pipeline |
| JSON parsing/building | jq | Use --arg for safe value injection |
| Complex text / regex | Python / Perl | Inline heredoc for short snippets |
| Maths beyond integer arithmetic | Python / bc | bc for simple; Python for floats/complex |
| HTTP requests | curl + jq | Python requests/httpx for anything non-trivial |
| YAML / TOML / XML | yq / python3 | Never parse with grep/sed |
| Stateful computation in a loop | Python coprocess | One process, many requests via stdin/stdout |
| Parallel I/O-bound work | GNU parallel / xargs -P | Built-in parallelism without code complexity |
| Long-running service / daemon | Python / Go + systemd | Use shell only for the entry-point wrapper |
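As an illustration of the xargs -P row, the sketch below runs up to four jobs at a time. The job is a trivial stand-in (squaring a number); in practice it would be a download, a conversion, or a remote call. The -P option is available in GNU and BSD xargs.

```shell
#!/usr/bin/env bash
# Sketch: bounded parallelism with xargs -P.
# Each input line becomes $1 of a small sh -c job; "_" fills $0.
squares=$(printf '%s\n' 1 2 3 4 5 6 \
    | xargs -P 4 -n 1 sh -c 'echo $(( $1 * $1 ))' _ \
    | sort -n)

printf '%s\n' "$squares"
```

Jobs finish in no particular order, so anything that depends on ordering has to sort afterwards or tag each output line with its input.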
Exercises
Exercise 1 — JSON-speaking Python coprocess
Write a complete Bash script that uses a persistent Python coprocess to perform three operations without spawning more than one Python process:
- validate_email ADDRESS: returns {"ok":true} or {"ok":false,"reason":"..."}
- slugify TEXT: lowercases, replaces spaces and non-alphanumeric characters with hyphens, collapses multiple hyphens
- word_freq TEXT: returns a JSON object of word → count, sorted by count descending, top 5 only
The shell script should read lines from stdin in the format OP ARG, call the appropriate operation via the coprocess, and print the result. Handle coprocess startup failure and include a clean shutdown on EXIT.
    #!/usr/bin/env bash
    set -euo pipefail

    command -v python3 >/dev/null || { echo "python3 not found" >&2; exit 1; }

    # ── Python worker ────────────────────────────────────────────────
    WORKER=$(mktemp --suffix=.py)    # --suffix is a GNU mktemp option

    # One EXIT handler for everything: a second `trap ... EXIT` would
    # replace the first rather than add to it.
    cleanup() {
        if [[ -n "${WORKER_PID:-}" ]]; then
            kill "$WORKER_PID" 2>/dev/null || true
            wait "$WORKER_PID" 2>/dev/null || true
        fi
        rm -f "$WORKER"
    }
    trap cleanup EXIT

    cat > "$WORKER" <<'PY'
    import sys, json, re
    from collections import Counter

    EMAIL_RE = re.compile(r'^[^@\s]+@[^@\s]+\.[^@\s]{2,}$')

    def validate_email(addr):
        if EMAIL_RE.match(addr):
            return {"ok": True}
        return {"ok": False, "reason": "does not match email pattern"}

    def slugify(text):
        # Every run of non-alphanumeric characters becomes a single hyphen
        s = re.sub(r'[^a-z0-9]+', '-', text.lower()).strip('-')
        return {"ok": True, "result": s}

    def word_freq(text):
        words = re.findall(r'\b\w+\b', text.lower())
        top = Counter(words).most_common(5)
        return {"ok": True, "result": {w: c for w, c in top}}

    DISPATCH = {"validate_email": validate_email,
                "slugify": slugify,
                "word_freq": word_freq}

    for raw in sys.stdin:
        raw = raw.strip()
        if not raw:
            continue
        try:
            req = json.loads(raw)
            fn = DISPATCH.get(req["op"])
            resp = fn(req["arg"]) if fn else {"ok": False, "reason": "unknown op"}
        except Exception as e:
            resp = {"ok": False, "reason": str(e)}
        print(json.dumps(resp), flush=True)
    PY

    # ── Start coprocess ──────────────────────────────────────────────
    coproc PY (python3 -u "$WORKER")
    WORKER_PID=${PY_PID:-}     # saved: bash unsets PY_PID once the worker exits
    [[ -n "$WORKER_PID" ]] || { echo "Failed to start worker" >&2; exit 1; }

    # ── Shell dispatch ───────────────────────────────────────────────
    py_call() {
        local op="$1" arg="$2" req resp
        # -c keeps the request on one line; the worker reads line by line
        req=$(jq -cn --arg op "$op" --arg arg "$arg" '{"op":$op,"arg":$arg}')
        printf '%s\n' "$req" >&"${PY[1]}"
        IFS= read -r resp <&"${PY[0]}" || { echo "worker exited unexpectedly" >&2; exit 1; }
        printf '%s\n' "$resp"
    }

    # ── Read OP ARG from stdin ───────────────────────────────────────
    while IFS= read -r line; do
        [[ -z "$line" ]] && continue
        op="${line%% *}"
        arg="${line#* }"
        py_call "$op" "$arg"
    done
Exercise 2 — Multi-language ETL pipeline
Write a Bash script that implements a five-stage pipeline processing a CSV log file of web requests (timestamp,method,path,status,bytes,duration_ms):
- Validate (Python inline): skip rows where status is not a 3-digit integer, duration is negative, or path contains ..
- Enrich (awk): add a seventh field category based on status (2xx=ok, 3xx=redirect, 4xx=client_error, 5xx=server_error)
- Aggregate (awk): count requests and sum bytes by category
- Format (jq): emit a JSON summary object with date, total_requests, and a by_category breakdown
- Save (Bash): write the JSON to reports/YYYY-MM-DD.json, creating the directory if needed
All five stages should be connected in a single pipeline with no temporary files. Include a sample input generator at the top for testing.
    #!/usr/bin/env bash
    set -euo pipefail

    DATE=$(date -u '+%Y-%m-%d')
    REPORT_DIR="${REPORT_DIR:-reports}"

    # ── Generate sample input ────────────────────────────────────────
    sample_csv() {
        printf 'timestamp,method,path,status,bytes,duration_ms\n'
        printf '2026-06-10T10:00:01Z,GET,/api/users,200,1234,42\n'
        printf '2026-06-10T10:00:02Z,POST,/api/orders,201,890,118\n'
        printf '2026-06-10T10:00:03Z,GET,/static/img,304,0,5\n'
        printf '2026-06-10T10:00:04Z,GET,/../etc/passwd,400,200,-1\n'  # invalid: path + neg duration
        printf '2026-06-10T10:00:05Z,GET,/api/products,404,320,30\n'
        printf '2026-06-10T10:00:06Z,GET,/api/orders,500,450,9999\n'
        printf '2026-06-10T10:00:07Z,DELETE,/api/users/1,204,0,55\n'
        printf 'bad,row,data\n'                                        # invalid: too few fields
    }

    # Each program is passed as an argument (-c or the awk/jq program text),
    # never as a heredoc: stdin must stay free to carry the data between stages.

    # Stage 1: Validate (Python), skipping the header and bad rows
    validate() {
        python3 -c '
    import csv, sys
    rows = csv.reader(sys.stdin)
    next(rows, None)                      # skip header
    for row in rows:
        if len(row) < 6:
            continue
        path, status, dur = row[2], row[3], row[5]
        if not (status.isdigit() and len(status) == 3):
            continue
        try:
            if int(dur) < 0:
                continue
        except ValueError:
            continue
        if ".." in path:
            continue
        print(",".join(row[:6]))
    '
    }

    # Stage 2: Enrich (awk), appending the category as field 7
    enrich() {
        awk -F, '{
            status = int($4)
            if      (status >= 200 && status < 300) cat = "ok"
            else if (status >= 300 && status < 400) cat = "redirect"
            else if (status >= 400 && status < 500) cat = "client_error"
            else                                    cat = "server_error"
            print $0 "," cat
        }'
    }

    # Stage 3: Aggregate (awk), count and byte sum per category
    aggregate() {
        awk -F, '{ counts[$7]++; bytes[$7] += $5 }
            END  { for (c in counts) print c "," counts[c] "," bytes[c] }'
    }

    # Stage 4: Format (jq)
    format_json() {
        jq -Rn --arg date "$DATE" '
            [ inputs | split(",")
              | { category: .[0], requests: (.[1]|tonumber), bytes: (.[2]|tonumber) } ]
            | { date: $date,
                total_requests: (map(.requests) | add),
                by_category: (map({(.category): {requests: .requests, bytes: .bytes}}) | add) }'
    }

    run_pipeline() { validate | enrich | aggregate | format_json; }

    # ── Main ─────────────────────────────────────────────────────────
    # Stage 5: Save (Bash)
    mkdir -p "$REPORT_DIR"
    sample_csv | run_pipeline > "$REPORT_DIR/$DATE.json"
    printf 'Report written to %s/%s.json\n' "$REPORT_DIR" "$DATE"
    cat "$REPORT_DIR/$DATE.json"
Exercise 3 — Bash script designed to be called from Python
Design and write a Bash script bin/sys_info.sh that is intended to be called from Python using subprocess.run. The script should:
- Accept a --format flag: json (default) or text
- Collect: hostname, kernel version, uptime in seconds, load averages (1/5/15 min), total/used/free memory in MB, and disk usage for / (total/used/free in GB)
- With --format json, output a single clean JSON object using jq -n --arg/--argjson (never string interpolation)
- With --format text, output a human-readable summary
- Exit 2 for unknown --format values
- Write all errors to stderr; keep stdout clean for machine consumption
Then write a Python snippet that calls the script, parses the JSON, and prints a one-line health summary.
    #!/usr/bin/env bash
    # bin/sys_info.sh
    set -euo pipefail

    FORMAT=json
    while (( $# )); do
        case "$1" in
            --format)
                (( $# >= 2 )) || { printf '%s\n' '--format requires a value' >&2; exit 2; }
                FORMAT="$2"; shift 2 ;;
            *) printf 'Unknown option: %s\n' "$1" >&2; exit 2 ;;
        esac
    done
    [[ $FORMAT == json || $FORMAT == text ]] || {
        printf 'Unknown format: %s (use json or text)\n' "$FORMAT" >&2
        exit 2
    }

    # ── Collect data (Linux: reads /proc) ────────────────────────────
    HOSTNAME_VAL=$(hostname -f 2>/dev/null || hostname)
    KERNEL=$(uname -r)
    UPTIME_S=$(awk '{printf "%d", $1}' /proc/uptime)

    # Load averages
    read -r LOAD1 LOAD5 LOAD15 _rest < /proc/loadavg

    # Memory in MB (from /proc/meminfo, which reports kB)
    memval() { awk -v k="$1" '$1==k{printf "%d", int($2/1024)}' /proc/meminfo; }
    MEM_TOTAL=$(memval MemTotal:)
    MEM_AVAIL=$(memval MemAvailable:)
    MEM_USED=$(( MEM_TOTAL - MEM_AVAIL ))

    # Disk usage for / in GB (-BG is a GNU df option; -P keeps one line per filesystem)
    read -r DISK_TOTAL DISK_USED DISK_FREE < <(
        df -BG -P / | awk 'NR==2{gsub(/G/,""); print $2, $3, $4}')

    # ── Output ───────────────────────────────────────────────────────
    if [[ $FORMAT == json ]]; then
        jq -n \
            --arg hostname "$HOSTNAME_VAL" \
            --arg kernel "$KERNEL" \
            --argjson uptime_s "$UPTIME_S" \
            --argjson load1 "$LOAD1" \
            --argjson load5 "$LOAD5" \
            --argjson load15 "$LOAD15" \
            --argjson mem_total "$MEM_TOTAL" \
            --argjson mem_used "$MEM_USED" \
            --argjson mem_free "$MEM_AVAIL" \
            --argjson disk_total "$DISK_TOTAL" \
            --argjson disk_used "$DISK_USED" \
            --argjson disk_free "$DISK_FREE" \
            '{hostname:$hostname,kernel:$kernel,uptime_s:$uptime_s,
              load:{one:$load1,five:$load5,fifteen:$load15},
              memory:{total_mb:$mem_total,used_mb:$mem_used,free_mb:$mem_free},
              disk:{total_gb:$disk_total,used_gb:$disk_used,free_gb:$disk_free}}'
    else
        printf 'Host   : %s (%s)\n' "$HOSTNAME_VAL" "$KERNEL"
        printf 'Uptime : %d s\n' "$UPTIME_S"
        printf 'Load   : %s %s %s\n' "$LOAD1" "$LOAD5" "$LOAD15"
        printf 'Memory : %d / %d MB\n' "$MEM_USED" "$MEM_TOTAL"
        printf 'Disk / : %d / %d GB\n' "$DISK_USED" "$DISK_TOTAL"
    fi
    # Python caller
    import subprocess, json

    result = subprocess.run(
        ["./bin/sys_info.sh", "--format", "json"],
        capture_output=True, text=True, check=True
    )
    info = json.loads(result.stdout)
    mem_pct = info["memory"]["used_mb"] / info["memory"]["total_mb"] * 100
    disk_pct = info["disk"]["used_gb"] / info["disk"]["total_gb"] * 100
    print(
        f"{info['hostname']} "
        f"load={info['load']['one']} "
        f"mem={mem_pct:.0f}% "
        f"disk={disk_pct:.0f}%"
    )
Exercise 4 — The handoff: rewrite a Bash function in Python
The following Bash function has grown beyond Bash's comfortable limits.
It parses a YAML-ish config file, resolves variable references
(${VAR} within values), validates required keys, and returns
results as environment variables. It is 60 lines, has two known edge-case
bugs, and cannot be unit-tested without mocking the filesystem.
    load_config() {
        local file="$1"; local -A cfg
        while IFS='=' read -r k v; do
            [[ $k =~ ^[[:space:]]*# ]] && continue
            [[ -z $k ]] && continue
            cfg["${k// /}"]="${v}"
        done < "$file"
        for k in DB_HOST DB_PORT APP_NAME; do
            [[ -v cfg[$k] ]] || { echo "missing: $k" >&2; return 1; }
        done
        for k in "${!cfg[@]}"; do
            local val="${cfg[$k]}"
            val="${val//\$\{DB_HOST\}/${cfg[DB_HOST]:-}}"
            val="${val//\$\{DB_PORT\}/${cfg[DB_PORT]:-}}"
            export "$k"="$val"
        done
    }
Rewrite this as bin/load_config.sh: a thin Bash wrapper that calls a Python implementation, passes the config file path as an argument, and prints export statements that the caller evals to set the variables in the calling shell. The Python implementation should: parse the file properly (handling whitespace, inline comments, multi-word values), resolve ${VAR} references in definition order (a variable can reference another defined earlier in the same file), validate required keys, and emit export KEY='VALUE' lines using shlex.quote so the output is safe to eval regardless of value content.
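The safety property that shlex.quote provides can be checked in isolation. The sketch below pushes a deliberately hostile value through the same emit-then-eval round trip; DEMO_KEY and CFG_VALUE are illustrative names, and the value contains a single quote, double quotes, and a command substitution that must not run.

```shell
#!/usr/bin/env bash
# Sketch: shlex.quote makes an arbitrary value safe to eval.
# DEMO_KEY and CFG_VALUE are illustrative names.
value="it's a \$(echo injected) value with \"quotes\""

line=$(CFG_VALUE="$value" python3 -c '
import os, shlex
print("export DEMO_KEY=" + shlex.quote(os.environ["CFG_VALUE"]))
')

printf 'emitted: %s\n' "$line"
eval "$line"
printf 'after eval: %s\n' "$DEMO_KEY"
```

After the eval, DEMO_KEY holds the original text byte for byte: the $(...) is never executed because shlex.quote wraps the whole value in single quotes and escapes the embedded one.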
    #!/usr/bin/env bash
    # bin/load_config.sh
    # Usage: eval "$(load_config.sh path/to/config)"
    set -euo pipefail

    CONFIG_FILE="${1:?Usage: load_config.sh CONFIG_FILE}"
    [[ -f "$CONFIG_FILE" ]] || { printf 'File not found: %s\n' "$CONFIG_FILE" >&2; exit 1; }

    # Delegate entirely to Python; its stdout is safe to eval.
    # No "--" before the file name: with -c, everything after the program
    # text goes into sys.argv, so a "--" would become sys.argv[1].
    exec python3 -c '
    import sys, re, shlex

    REQUIRED = {"DB_HOST", "DB_PORT", "APP_NAME"}
    REF_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

    # ── Parse ────────────────────────────────────────────────────────
    cfg = {}     # key -> raw value
    order = []   # keys in order of first definition
    with open(sys.argv[1]) as fh:
        for lineno, line in enumerate(fh, 1):
            line = line.rstrip("\n")
            line = re.sub(r"\s*#.*$", "", line)   # strip inline comments
            line = line.strip()
            if not line:
                continue
            if "=" not in line:
                print(f"# warning: line {lineno} skipped (no =)", file=sys.stderr)
                continue
            k, _, v = line.partition("=")
            k = k.strip()
            v = v.strip()
            if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", k):
                print(f"# warning: invalid key {k!r} on line {lineno}", file=sys.stderr)
                continue
            cfg[k] = v
            if k not in order:
                order.append(k)

    # ── Validate ─────────────────────────────────────────────────────
    # Errors are emitted as shell code so that the eval in the caller fails.
    missing = REQUIRED - set(cfg)
    if missing:
        for m in sorted(missing):
            print(f"echo Missing required key: {m} >&2", file=sys.stdout)
        print("exit 1")
        sys.exit(0)

    # ── Resolve variable references (single pass in definition order) ─
    resolved = {}
    for k in order:
        def replacer(m, _r=resolved):
            return _r.get(m.group(1), m.group(0))
        resolved[k] = REF_RE.sub(replacer, cfg[k])

    # ── Emit safe export statements ──────────────────────────────────
    for k in order:
        print(f"export {k}={shlex.quote(resolved[k])}")
    ' "$CONFIG_FILE"
    # Example usage in another script:
    #   eval "$(bin/load_config.sh config/app.conf)"
    #   echo "Connecting to $DB_HOST:$DB_PORT"
    #
    # Example config/app.conf:
    #   DB_HOST = db.internal   # primary database
    #   DB_PORT = 5432
    #   APP_NAME = my-service
    #   DB_URL = postgres://${DB_HOST}:${DB_PORT}/mydb
    #   GREETING = Hello from ${APP_NAME}