Chapter 12 — Integrating Bash with Other Languages

Bash's real power in a polyglot world is not what it can do alone — it is how cleanly it can orchestrate everything else. Knowing when to hand off to Python, Node, Ruby, or awk, how to pass structured data across that boundary, and how to use another language as a persistent coprocess rather than a repeatedly-forked subprocess are the skills that separate throw-it-together shell scripts from production-grade automation. This final chapter covers every major integration pattern and ends with an honest decision framework for when to stop using Bash at all.

1 — The Fork-per-Call Tax

Every $(python3 ...) inside a loop pays a full fork+exec cost plus interpreter startup, typically 5–30 ms on a modern Linux system. For one-off calls this is irrelevant. For ten thousand iterations it adds up to 50–300 seconds of wall-clock time you did not budget for.

# Expensive: one Python process per line
while IFS= read -r line; do
  result=$(python3 -c "import sys; print(sys.stdin.read().upper())" <<< "$line")
  printf '%s\n' "$result"
done < bigfile.txt

# Better: one process, all lines
python3 -c "import sys; sys.stdout.write(sys.stdin.read().upper())" \
  < bigfile.txt

# Best for bidirectional work: a persistent coprocess (see Section 2)

Pattern	Forks	Use when
One-shot call `$(python3 script.py)`	1	Called once or rarely; simplest code
Batch stdin→stdout pipe	1	Large input, stateless transformation
Coprocess (persistent)	1	Many calls in a loop; stateful interpreter session
Background server + socket	1 + overhead	Multiple shell scripts sharing one interpreter
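
The gap between the first two rows is easy to measure on your own machine. The following is a minimal sketch, assuming python3 is on PATH; the 50-line sample file and the function names per_line and batch are illustrative only. Timings depend heavily on the machine, so none are quoted here.

```shell
#!/usr/bin/env bash
# Sketch: compare fork-per-line with a single batch process.
# Assumes python3 is on PATH; the 50-line sample is arbitrary.
set -euo pipefail

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
for ((i = 1; i <= 50; i++)); do printf 'line %d\n' "$i"; done > "$sample"

per_line() {
  local line
  while IFS= read -r line; do
    python3 -c 'import sys; sys.stdout.write(sys.stdin.read().upper())' <<< "$line"
  done < "$sample"
}

batch() {
  python3 -c 'import sys; sys.stdout.write(sys.stdin.read().upper())' < "$sample"
}

# Both strategies must produce identical output
slow_out=$(per_line)
fast_out=$(batch)
[[ $slow_out == "$fast_out" ]] && echo "outputs match"

# Compare wall-clock time; TIMEFORMAT controls what the time keyword prints
TIMEFORMAT='%R s'
printf 'per-line: '; time per_line > /dev/null
printf 'batch:    '; time batch    > /dev/null
```

Raising the line count makes the difference grow roughly linearly for per_line, while batch stays close to a single interpreter startup.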

2 — Persistent Coprocesses

The coprocess pattern (introduced in Chapter 4) becomes especially valuable when the "other language" needs to be initialised once — loading libraries, connecting to a database, compiling a regex — and then serve many requests.

Python coprocess

#!/usr/bin/env bash
# A persistent Python worker that handles JSON requests over stdin/stdout

# ── The Python worker script ─────────────────────────────────────
cat > /tmp/worker.py <<'PYEOF'
import sys, json, hashlib, re

# One-time initialisation: compile regex, load lookup tables, etc.
PATTERN = re.compile(r'\b\w{4,}\b')

for raw in sys.stdin:
    raw = raw.strip()
    if not raw:
        continue
    try:
        req = json.loads(raw)
        op  = req.get('op')

        if op == 'hash':
            result = hashlib.sha256(req['data'].encode()).hexdigest()
        elif op == 'words':
            result = PATTERN.findall(req['text'])
        elif op == 'upper':
            result = req['text'].upper()
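        else:
            # Unknown operations are reported as errors, not as ok with a null result
            raise ValueError(f'unknown op: {op!r}')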

        print(json.dumps({'ok': True,  'result': result}), flush=True)
    except Exception as e:
        print(json.dumps({'ok': False, 'error':  str(e)}),  flush=True)
PYEOF

# ── Start the coprocess ──────────────────────────────────────────
coproc PY (python3 -u /tmp/worker.py)
# -u: unbuffered stdout. Without it (or the flush=True used in the worker),
# replies sit in Python's output buffer and the read in py_call blocks forever.

py_call() {
  # Send a JSON request; read the JSON response
  local request="$1" response
  printf '%s\n' "$request" >&"${PY[1]}"
  IFS= read -r response <&"${PY[0]}"
  printf '%s' "$response"
}

# ── Use it in a loop — only one Python process throughout ─────────
while IFS= read -r line; do
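  # Build the request with jq so that quotes, backslashes, and control
  # characters in $line are escaped correctly; -c keeps it on one line
  req=$(jq -cn --arg data "$line" '{op:"hash",data:$data}')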
  resp=$(py_call "$req")
  printf '%s\t%s\n' "$line" "$resp"
done < input.txt

# Shut down the worker
kill "$PY_PID" 2>/dev/null; wait "$PY_PID" 2>/dev/null

Node.js coprocess

cat > /tmp/worker.js <<'JSEOF'
const readline = require('readline');
const crypto   = require('crypto');
const rl = readline.createInterface({ input: process.stdin });

rl.on('line', (raw) => {
  try {
    const req = JSON.parse(raw);
    let result;
    if      (req.op === 'md5')    result = crypto.createHash('md5').update(req.data).digest('hex');
    else if (req.op === 'upper')  result = req.text.toUpperCase();
    else if (req.op === 'slugify') result = req.text.toLowerCase().replace(/\s+/g, '-');
    else throw new Error(`unknown op: ${req.op}`);
    process.stdout.write(JSON.stringify({ ok: true, result }) + '\n');
  } catch(e) {
    process.stdout.write(JSON.stringify({ ok: false, error: e.message }) + '\n');
  }
});
JSEOF

coproc JS (node /tmp/worker.js)

js_call() {
  printf '%s\n' "$1" >&"${JS[1]}"
  read -r REPLY <&"${JS[0]}"
  printf '%s' "$REPLY"
}

js_call '{"op":"slugify","text":"Hello World"}'
# {"ok":true,"result":"hello-world"}
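
Both call helpers above block indefinitely if the worker dies or never answers. The sketch below shows one way to fail fast instead, assuming python3 is on PATH. The names WORKER and worker_call, the trivial uppercase worker, and the five-second timeout are all illustrative choices, not part of the scripts above.

```shell
#!/usr/bin/env bash
# Sketch: a coprocess call helper that fails fast instead of hanging.
# Assumes python3 is on PATH; worker_call and the 5 s timeout are illustrative.

coproc WORKER { python3 -u -c '
import sys
for line in sys.stdin:
    print(line.rstrip("\n").upper(), flush=True)
'; }

worker_call() {
  local request=$1 reply
  # Bash unsets WORKER_PID after it has reaped an exited coprocess
  [[ -n ${WORKER_PID:-} ]] || { echo "worker is not running" >&2; return 1; }
  printf '%s\n' "$request" >&"${WORKER[1]}" || return 1
  # -t 5: give up after five seconds rather than blocking forever
  IFS= read -r -t 5 reply <&"${WORKER[0]}" || {
    echo "worker did not answer in time" >&2
    return 1
  }
  printf '%s\n' "$reply"
}

first=$(worker_call "hello")
second=$(worker_call "still the same process")
printf '%s\n%s\n' "$first" "$second"

# Shut down: remember the PID first, because Bash may unset WORKER_PID
pid=$WORKER_PID
kill "$pid" 2>/dev/null
wait "$pid" 2>/dev/null || true
```

A timeout only detects the problem; whether to restart the worker or abort the script is a separate decision that depends on how much state the worker holds.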

3 — Passing Data Structures via JSON

JSON is the universal interchange format between Bash and other languages. jq is the standard tool for reading and writing it from shell scripts.

# ── Reading JSON in Bash ─────────────────────────────────────────
json='{"name":"alice","roles":["admin","user"],"meta":{"age":30}}'

# Extract a scalar
name=$(jq -r '.name'         <<< "$json")     # -r = raw (no quotes)
age=$(jq  -r '.meta.age'     <<< "$json")
role0=$(jq -r '.roles[0]'    <<< "$json")

# Read a JSON array into a Bash array
mapfile -t roles < <(
  jq -r '.roles[]' <<< "$json"
)
printf 'role: %s\n' "${roles[@]}"
# role: admin
# role: user

# Iterate over JSON object keys
while IFS='=' read -r key val; do
  printf '%s → %s\n' "$key" "$val"
done < <(jq -r '.meta | to_entries[] | "\(.key)=\(.value)"' <<< "$json")
# age → 30

# ── Building JSON from Bash variables ────────────────────────────
# WRONG: string interpolation breaks on special characters
printf '{"name":"%s"}' "$name"    # breaks if name contains " or \

# RIGHT: let jq do the escaping via --arg
payload=$(jq -n \
  --arg   name  "$name"   \
  --argjson age "$age"    \
  '{"name":$name,"age":$age}')

# Build a JSON array from a Bash array
tags=(production europe tier-1)
json_tags=$(printf '%s\n' "${tags[@]}" | jq -R '.' | jq -s '.')
# → ["production","europe","tier-1"]

# Build a JSON object from an associative array
declare -A env_vars=([HOST]=localhost [PORT]=8080 [DB]=mydb)
json_obj=$(for k in "${!env_vars[@]}"; do
  jq -n --arg k "$k" --arg v "${env_vars[$k]}" '{"key":$k,"value":$v}'
done | jq -s 'from_entries')

# ── Streaming JSON (NDJSON / JSON Lines) ─────────────────────────
# Each line is a self-contained JSON object — safe to pipe, grep, tail
jq -c '.[]' big_array.json | while IFS= read -r obj; do
  name=$(jq -r '.name' <<< "$obj")
  printf 'Processing %s\n' "$name"
done
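
The loop above starts one jq per object, which brings back the fork-per-call cost from Section 1. When only a few scalar fields are needed, a single jq process can emit them tab-separated and let read split them. This is a sketch assuming jq is installed; the sample records are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: extract several fields per record with a single jq process.
# Assumes jq is installed; the sample records are made up for illustration.
set -euo pipefail

records='[{"name":"alice","age":30},{"name":"bob","age":25}]'

summary=()
# @tsv escapes embedded tabs and newlines, so each record stays on one line
while IFS=$'\t' read -r name age; do
  summary+=("$name is $age")
done < <(jq -r '.[] | [.name, .age] | @tsv' <<< "$records")

printf '%s\n' "${summary[@]}"
```

One caveat: tab counts as IFS whitespace, so consecutive tabs collapse and an empty field shifts the ones after it. Emitting a placeholder for optional fields inside the jq filter avoids that.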

4 — Embedding Other Languages Inline

For short, self-contained logic that does not warrant a separate file, embed the other language directly in the shell script using a heredoc. The heredoc keeps related code together and avoids the file-management overhead of temporary scripts.

# ── Inline Python ────────────────────────────────────────────────
result=$(python3 <<'EOF'
import json, sys

data = [{"id": i, "val": i**2} for i in range(1, 6)]
print(json.dumps(data))
EOF
)
echo "$result"
# [{"id": 1, "val": 1}, {"id": 2, "val": 4}, ...]

# ── Passing Bash variables into inline Python ─────────────────────
threshold=42
label="my label"

# Method 1: environment variables (safe for any value)
export THRESHOLD="$threshold"
export LABEL="$label"
python3 <<'EOF'
import os
print(f"threshold={os.environ['THRESHOLD']}, label={os.environ['LABEL']}")
EOF

# Method 2: unquoted heredoc — Bash expands $vars before Python sees them
# ONLY safe when the values are validated (no quotes, no backslashes)
[[ $threshold =~ ^[0-9]+$ ]] || { echo "bad threshold" >&2; exit 1; }
python3 <<EOF
print("threshold is ${threshold}")
EOF

# ── Inline Ruby ──────────────────────────────────────────────────
csv_data="alice,30,engineer
bob,25,designer"

# A heredoc script would occupy stdin, so pass the program with -e
# and keep stdin free for the data
ruby -rcsv -e '
STDIN.each_line do |line|
  row = CSV.parse_line(line.chomp)
  puts "#{row[0].upcase} (#{row[2]}), age #{row[1]}"
end
' <<< "$csv_data"

# ── Inline Perl ──────────────────────────────────────────────────
# Perl shines for complex text transformations
perl -00 -ne 'print if /ERROR/i' /var/log/app.log
# -00: paragraph mode (blank line = record separator)
# -n: loop over input; -e: inline code

# ── Inline awk: already in most scripts, mentioned for completeness
awk -F, 'NR>1 {sum += $3} END {printf "total: %.2f\n", sum}' data.csv
# -F,: split on commas; NR>1 skips the header row
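
A heredoc script occupies the interpreter's stdin, so the same command cannot also receive data through a `<` or `<<<` redirection. Besides environment variables, one way around this is to pass values as arguments after `-`. The sketch below assumes python3 is on PATH; the label and threshold values are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: a heredoc script occupies stdin, so pass values as arguments instead.
# "python3 -" reads the program from stdin; everything after "-" lands in sys.argv.
set -euo pipefail

label='my "quoted" label'
threshold=42

report=$(python3 - "$label" "$threshold" <<'EOF'
import sys

label, threshold = sys.argv[1], int(sys.argv[2])
print(f"{label}: threshold doubled is {threshold * 2}")
EOF
)
printf '%s\n' "$report"
# my "quoted" label: threshold doubled is 84
```

Because the heredoc delimiter is quoted, Bash expands nothing inside the script, and the arguments arrive byte-for-byte regardless of quotes or backslashes in their values.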

5 — Shell as Glue: Orchestrating Multi-Language Pipelines

Bash's most natural role in a polyglot system is as the top-level orchestrator: each stage of a pipeline is handled by the language best suited to it, and Bash wires them together.

#!/usr/bin/env bash
# A real-world ETL pipeline:
# 1. Fetch raw data (curl)
# 2. Validate and transform (Python)
# 3. Aggregate stats (awk)
# 4. Format report (jq)
# 5. Upload result (curl)
set -euo pipefail

API_URL="https://api.example.com/events"
TOKEN=$(cat ~/.config/myapp/token)
DATE=$(date -u '+%Y-%m-%d')
TMPDIR=$(mktemp -d)
trap 'rm -rf "$TMPDIR"' EXIT

# Stage 1: fetch
curl -sf -H "Authorization: Bearer $TOKEN" \
  "$API_URL?date=$DATE" > "$TMPDIR/raw.json"

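# Stage 2: validate and normalise (Python is best for this)
# The heredoc occupies stdin, so the input file is passed as an argument
python3 - "$TMPDIR/raw.json" > "$TMPDIR/clean.ndjson" <<'PY'
import sys, json
from datetime import datetime, timezone

with open(sys.argv[1]) as fh:
    events = json.load(fh)

for event in events:
    if not event.get('timestamp') or not event.get('user_id'):
        continue
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 onwards
    ts = datetime.fromisoformat(event['timestamp'].replace('Z', '+00:00'))
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)   # assumption: naive means UTC
    event['ts'] = ts.astimezone(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
    print(json.dumps(event))
PY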

# Stage 3: aggregate per-user event counts (awk is ideal)
# Tab-separated throughout, so spaces inside values cannot split a field.
# Sorting happens after awk because "for (k in counts)" has no defined order.
jq -r '[.user_id, .event_type] | @tsv' "$TMPDIR/clean.ndjson" \
  | awk -F'\t' '{counts[$1 "/" $2]++}
    END{for (k in counts) print k "\t" counts[k]}' \
  | sort \
  > "$TMPDIR/counts.txt"

# Stage 4: build final JSON report (jq)
jq -Rn '[inputs | split("\t") |
    {key: .[0], count: (.[1] | tonumber)}]' \
  "$TMPDIR/counts.txt" > "$TMPDIR/report.json"

# Stage 5: upload
curl -sf -X POST \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  --data-binary "@$TMPDIR/report.json" \
  "$API_URL/reports"

printf 'Report uploaded for %s\n' "$DATE"
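
With pipefail set, a pipeline fails when any stage fails, but the exit status alone does not say which stage it was. Bash records every stage's exit code in the PIPESTATUS array. The sketch below uses three stand-in functions (fetch, transform, report) rather than the real stages above, and simulates a failure in the middle one.

```shell
#!/usr/bin/env bash
# Sketch: find out which stage of a pipeline failed.
# fetch, transform, and report are stand-ins for real pipeline stages.
set -uo pipefail   # -e is left off so the script can inspect the failure itself

fetch()     { printf '3\n1\n2\n'; }
transform() { sort -n; return 7; }     # simulate a failing middle stage
report()    { wc -l > /dev/null; }

fetch | transform | report
codes=("${PIPESTATUS[@]}")   # copy at once: the next command overwrites PIPESTATUS

stages=(fetch transform report)
failed=""
for i in "${!codes[@]}"; do
  if (( codes[i] != 0 )); then
    failed="${stages[i]} (exit ${codes[i]})"
    break
  fi
done

echo "first failing stage: ${failed:-none}"
# first failing stage: transform (exit 7)
```

In a script that runs under set -e, the same check can live in an `if ! pipeline; then ... fi` block, as long as PIPESTATUS is copied as the first statement of the branch.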

6 — Calling Bash from Other Languages

The integration runs both ways. Sometimes your Python, Node, or Go program needs to shell out, and knowing what Bash sees on the other end helps you design the interface cleanly.

# ── What your script should export for callers ───────────────────
# Good script interface design:
#   - Reads config from env vars (easy for any caller)
#   - Takes structured input via stdin (JSON or line-delimited)
#   - Produces structured output on stdout
#   - Uses exit codes consistently (0 = success, non-zero = error)
#   - Writes human-readable errors to stderr only, so stdout stays parseable

#!/usr/bin/env bash
# bin/process_order.sh  — designed to be called from Python/Node/Go
set -euo pipefail

# Config from environment
DB_HOST="${DB_HOST:?DB_HOST required}"
DB_PORT="${DB_PORT:-5432}"

# Input: JSON on stdin
order=$(cat)
order_id=$(jq -r '.id'     <<< "$order")
amount=$(  jq -r '.amount' <<< "$order")

# ... processing ...

# Output: structured JSON on stdout
jq -n --arg id "$order_id" --arg status "processed" \
  '{"order_id":$id,"status":$status}'

# ── Calling the script from Python ───────────────────────────────
import subprocess, json, os

order = {"id": "ORD-001", "amount": 99.99}

result = subprocess.run(
    ["./bin/process_order.sh"],
    input=json.dumps(order),
    capture_output=True,
    text=True,
    env={**os.environ, "DB_HOST": "localhost"},
    check=True            # raises CalledProcessError on non-zero exit
)

response = json.loads(result.stdout)
print(response["status"])   # → processed

# ── Calling from Node.js ──────────────────────────────────────────
const { execFile } = require('child_process');
const order = JSON.stringify({ id: 'ORD-002', amount: 49.99 });

const child = execFile('./bin/process_order.sh', [],
  { env: { ...process.env, DB_HOST: 'localhost' } },
  (err, stdout, stderr) => {
    if (err) { console.error('Script failed:', stderr); process.exit(1); }
    const resp = JSON.parse(stdout);
    console.log(resp.status);
  }
);

// The script reads the order from stdin; end() sends it and signals EOF
child.stdin.end(order);
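
The interface rules listed at the top of this section can be exercised from Bash itself before any other language is involved. The sketch below uses a hypothetical function, double_it, as a stand-in for a real script: data goes to stdout, diagnostics go to stderr, and the exit code distinguishes bad input from success. The choice of exit code 2 for input errors is a convention assumed here, not something the scripts above define.

```shell
#!/usr/bin/env bash
# Sketch of the interface rules above: data on stdout, diagnostics on stderr,
# distinct exit codes. double_it is a hypothetical stand-in for a real script.
set -uo pipefail

double_it() {
  local n
  IFS= read -r n || true
  if [[ ! $n =~ ^(0|[1-9][0-9]*)$ ]]; then
    printf 'double_it: not an integer: %s\n' "$n" >&2   # diagnostics: stderr
    return 2                                            # input error
  fi
  # printf is safe here only because n was validated as an integer
  printf '{"result":%d}\n' "$(( n * 2 ))"               # data: stdout only
}

errfile=$(mktemp)
trap 'rm -f "$errfile"' EXIT

good_out=$(double_it <<< "21"  2>"$errfile"); good_rc=$?
bad_out=$(double_it  <<< "abc" 2>"$errfile"); bad_rc=$?
bad_err=$(< "$errfile")

printf 'good: rc=%d stdout=%s\n' "$good_rc" "$good_out"
printf 'bad:  rc=%d stdout=%s stderr=%s\n' "$bad_rc" "${bad_out:-<empty>}" "$bad_err"
```

A caller in any language sees the same three channels: it parses stdout only when the exit code is zero and logs stderr otherwise.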

7 — When to Stop Using Bash

The ability to hand off cleanly is as important as the ability to integrate. Continuing to grow a Bash script past its natural limits produces code that is harder to test, harder to reason about, and harder to hand to a colleague. Recognise the signals early.

Signal	What it means in practice	Language to reach for
You need real data structures (trees, graphs, objects)	Associative arrays and parallel indexed arrays no longer model the problem cleanly	Python, Go, Ruby
Error handling is more than `|| exit 1`	You need try/catch, error types, retry logic with backoff	Python, Go
You are writing >200 lines of logic	A colleague cannot understand the script without running it	Python (fastest rewrite)
You need HTTP or a database	`curl` + `jq` works up to a point; auth, retries, and pooling quickly exceed it	Python (requests/httpx), Go
You need concurrency beyond `&` + `wait`	Proper thread pools, async I/O, or actor models	Go, Python (asyncio)
You want to unit-test the logic, not the plumbing	BATS covers the shell surface; complex logic belongs in a testable unit	Python, Go, Node
Someone else will maintain this in 6 months	The safe default is Python: it is available on most servers and widely readable	Python
You need a CLI with subcommands, flags, help text	Writing a clean CLI in Bash is possible but `argparse` / `cobra` are far faster	Python (Click/Typer), Go (cobra)

The handoff pattern — keep the shell entry point

#!/usr/bin/env bash
# bin/run.sh — thin shell wrapper; all logic is in Python
# This is the right way to hand off: keep one entry point,
# have it exec into the real implementation.
set -euo pipefail

# Activate a local virtualenv first, so that python3 resolves inside it
if [[ -z "${VIRTUAL_ENV:-}" && -f .venv/bin/activate ]]; then
  source .venv/bin/activate
fi

# Guard: require Python 3.10+
py=$(command -v python3) || { echo "python3 not found" >&2; exit 1; }
"$py" -c 'import sys; sys.exit(sys.version_info < (3, 10))' || {
  echo "Python 3.10+ required (got $("$py" --version 2>&1))" >&2; exit 1
}

# exec replaces this shell with Python, so no extra shell process lingers
exec "$py" -m myapp "$@"
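
The effect of exec is easy to observe: the process ID stays the same because no child is created. The following sketch compares a plain nested call with an exec'd one, using only bash itself. The trailing `:` in the first case is there to stop Bash from optimising the last command of a `-c` string into an implicit exec.

```shell
#!/usr/bin/env bash
# Sketch: exec replaces the current process, so the PID does not change.
set -euo pipefail

# Without exec: the inner shell is a child with its own PID
mapfile -t forked < <(bash -c 'echo "$$"; bash -c "echo \$\$"; :')

# With exec: the inner shell takes over the outer shell's PID
mapfile -t replaced < <(bash -c 'echo "$$"; exec bash -c "echo \$\$"')

[[ ${forked[0]}   != "${forked[1]}"   ]] && echo "nested call: two processes"
[[ ${replaced[0]} == "${replaced[1]}" ]] && echo "exec: one process"
```

For a wrapper script this means signals, exit codes, and process supervisors all deal with the real program directly rather than with an intermediate shell.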

8 — Practical Patterns Summary

Task	Best tool	Notes
File/process orchestration	Bash	Its strongest domain
Text transformation (simple)	awk / sed	Already in every pipeline
JSON parsing/building	jq	Use `--arg` for safe value injection
Complex text / regex	Python / Perl	Inline heredoc for short snippets
Maths beyond integer arithmetic	Python / bc	bc for simple; Python for floats/complex
HTTP requests	curl + jq	Python requests/httpx for anything non-trivial
YAML / TOML / XML	yq / python3	Never parse with grep/sed
Stateful computation in a loop	Python coprocess	One process, many requests via stdin/stdout
Parallel I/O-bound work	GNU parallel / xargs -P	Built-in parallelism without code complexity
Long-running service / daemon	Python / Go + systemd	Use shell only for the entry-point wrapper
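
As an illustration of the parallel I/O row, here is a minimal sketch using xargs -P, which both GNU and BSD xargs provide although POSIX does not require it. The check_host function is hypothetical and only sleeps; the host names and the limit of four concurrent jobs are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: run an I/O-bound task over many inputs, four at a time, with xargs -P.
# check_host is a hypothetical stand-in; replace the sleep with real work.
set -euo pipefail

check_host() {
  sleep 0.2                      # pretend to wait on the network
  printf '%s ok\n' "$1"
}
export -f check_host             # make the function visible to the child bash

hosts=(alpha beta gamma delta epsilon zeta eta theta)

# NUL-delimited input with -0 is safe for any value; -n 1 passes one per job
results=$(printf '%s\0' "${hosts[@]}" |
  xargs -0 -n 1 -P 4 bash -c 'check_host "$1"' _ | sort)

printf '%s\n' "$results"
```

Jobs finish in no particular order, which is why the output is sorted before use. Each job still costs one bash process, so this pattern suits work where waiting dominates, not tight CPU-bound loops.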

Exercises

Exercise 1 — JSON-speaking Python coprocess

Write a complete Bash script that uses a persistent Python coprocess to perform three operations without spawning more than one Python process:

validate_email ADDRESS — returns {"ok":true} or {"ok":false,"reason":"..."}
slugify TEXT — lowercases, replaces spaces and non-alphanumeric characters with hyphens, collapses multiple hyphens
word_freq TEXT — returns a JSON object of word → count, sorted by count descending, top 5 only

The shell script should read lines from stdin in the format OP ARG, call the appropriate operation via the coprocess, and print the result. Handle coprocess startup failure and include a clean shutdown on EXIT.

#!/usr/bin/env bash
set -euo pipefail

# ── Python worker ────────────────────────────────────────────────
WORKER=$(mktemp --suffix=.py)
trap 'rm -f "$WORKER"' EXIT

cat > "$WORKER" <<'PY'
import sys, json, re
from collections import Counter

EMAIL_RE = re.compile(r'^[^@\s]+@[^@\s]+\.[^@\s]{2,}$')

def validate_email(addr):
    if EMAIL_RE.match(addr):
        return {"ok": True}
    return {"ok": False, "reason": "does not match email pattern"}

def slugify(text):
    # Every run of non-alphanumeric characters (spaces included) becomes one hyphen
    s = re.sub(r'[^a-z0-9]+', '-', text.lower()).strip('-')
    return {"ok": True, "result": s}

def word_freq(text):
    words = re.findall(r'\b\w+\b', text.lower())
    top = Counter(words).most_common(5)
    return {"ok": True, "result": {w: c for w, c in top}}

DISPATCH = {"validate_email": validate_email,
            "slugify":        slugify,
            "word_freq":      word_freq}

for raw in sys.stdin:
    raw = raw.strip()
    if not raw:
        continue
    try:
        req  = json.loads(raw)
        fn   = DISPATCH.get(req["op"])
        resp = fn(req["arg"]) if fn else {"ok": False, "reason": "unknown op"}
    except Exception as e:
        resp = {"ok": False, "reason": str(e)}
    print(json.dumps(resp), flush=True)
PY

# ── Start coprocess ──────────────────────────────────────────────
coproc PY (python3 -u "$WORKER")
[[ -n "${PY_PID:-}" ]] || { echo "Failed to start worker" >&2; exit 1; }
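# One EXIT trap for everything: a second "trap ... EXIT" replaces the first,
# which would leave the worker file behind
cleanup() {
  local pid=${PY_PID:-}
  if [[ -n $pid ]]; then
    kill "$pid" 2>/dev/null || true
    wait "$pid" 2>/dev/null || true
  fi
  rm -f "$WORKER"
}
trap cleanup EXIT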

# ── Shell dispatch ────────────────────────────────────────────────
py_call() {
  local op="$1" arg="$2"
  local req
  # -c is essential: the worker reads exactly one request per line
  req=$(jq -cn --arg op "$op" --arg arg "$arg" '{"op":$op,"arg":$arg}')
  printf '%s\n' "$req" >&"${PY[1]}"
  local resp
  read -r resp <&"${PY[0]}"
  printf '%s\n' "$resp"
}

# ── Read OP ARG from stdin ────────────────────────────────────────
while IFS= read -r line; do
  [[ -z "$line" ]] && continue
  op="${line%% *}"
  arg="${line#* }"
  py_call "$op" "$arg"
done

Exercise 2 — Multi-language ETL pipeline

Write a Bash script that implements a five-stage pipeline processing a CSV log file of web requests (timestamp,method,path,status,bytes,duration_ms):

Validate (Python inline): skip rows where status is not a 3-digit integer, duration is negative, or path contains ..
Enrich (awk): add a seventh field category based on status (2xx=ok, 3xx=redirect, 4xx=client_error, 5xx=server_error)
Aggregate (awk): count requests and sum bytes by category
Format (jq): emit a JSON summary object with date, total_requests, and a by_category breakdown
Save (Bash): write the JSON to reports/YYYY-MM-DD.json, creating the directory if needed

All five stages should be connected in a single pipeline with no temporary files. Include a sample input generator at the top for testing.

#!/usr/bin/env bash
set -euo pipefail

DATE=$(date -u '+%Y-%m-%d')
REPORT_DIR="${REPORT_DIR:-reports}"

# ── Generate sample input ────────────────────────────────────────
sample_csv() {
  printf 'timestamp,method,path,status,bytes,duration_ms\n'
  printf '2026-06-10T10:00:01Z,GET,/api/users,200,1234,42\n'
  printf '2026-06-10T10:00:02Z,POST,/api/orders,201,890,118\n'
  printf '2026-06-10T10:00:03Z,GET,/static/img,304,0,5\n'
  printf '2026-06-10T10:00:04Z,GET,/../etc/passwd,400,200,-1\n'  # invalid: path + neg duration
  printf '2026-06-10T10:00:05Z,GET,/api/products,404,320,30\n'
  printf '2026-06-10T10:00:06Z,GET,/api/orders,500,450,9999\n'
  printf '2026-06-10T10:00:07Z,DELETE,/api/users/1,204,0,55\n'
  printf 'bad,row,data\n'  # invalid: too few fields
}

# ── Pipeline ─────────────────────────────────────────────────────
run_pipeline() {
  local input="$1"

  # Stage 1: Validate (Python) ── skip header and bad rows
  # Stage 2: Enrich (awk)       ── add category field
  # Stage 3: Aggregate (awk)    ── count + sum by category
  # Stage 4: Format (jq)        ── produce JSON

  # Each program is passed as an argument: a heredoc would replace stdin
  # and cut the stage off from the pipe that feeds it
  python3 -c '
import sys, csv

r = csv.reader(sys.stdin)
next(r, None)  # skip header
for row in r:
    if len(row) < 6:
        continue
    ts, method, path, status, byt, dur = row[:6]
    if not (status.isdigit() and len(status) == 3):
        continue
    try:
        if int(dur) < 0:
            continue
    except ValueError:
        continue   # duration is not an integer
    if ".." in path:
        continue
    print(",".join(row[:6]))
' < "$input" \
  | awk -F, '
{
    status = int($4)
    if      (status >= 200 && status < 300) cat = "ok"
    else if (status >= 300 && status < 400) cat = "redirect"
    else if (status >= 400 && status < 500) cat = "client_error"
    else                                    cat = "server_error"
    print $0 "," cat
}' \
  | awk -F, '
{
    cat = $7
    counts[cat]++
    bytes[cat] += $5
}
END {
    for (c in counts)
        print c "," counts[c] "," bytes[c]
}' \
  | jq -Rn --arg date "$DATE" '
[ inputs | split(",") |
  { category: .[0], requests: (.[1]|tonumber), bytes: (.[2]|tonumber) } ] |
{ date: $date,
  total_requests: (map(.requests) | add),
  by_category: (map({(.category): {requests:.requests, bytes:.bytes}}) | add)
}'
}

# ── Main ─────────────────────────────────────────────────────────
INPUT=$(mktemp)
trap 'rm -f "$INPUT"' EXIT
sample_csv > "$INPUT"

mkdir -p "$REPORT_DIR"
run_pipeline "$INPUT" > "$REPORT_DIR/$DATE.json"
printf 'Report written to %s/%s.json\n' "$REPORT_DIR" "$DATE"
cat "$REPORT_DIR/$DATE.json"

Exercise 3 — Bash script designed to be called from Python

Design and write a Bash script bin/sys_info.sh that is intended to be called from Python using subprocess.run. The script should:

Accept a --format flag: json (default) or text
Collect: hostname, kernel version, uptime in seconds, load averages (1/5/15 min), total/used/free memory in MB, and disk usage for / (total/used/free in GB)
With --format json, output a single clean JSON object using jq -n --arg/--argjson (never string interpolation)
With --format text, output a human-readable summary
Exit 2 for unknown --format values
Write all errors to stderr; keep stdout clean for machine consumption

Then write a Python snippet that calls the script, parses the JSON, and prints a one-line health summary.

#!/usr/bin/env bash
# bin/sys_info.sh
set -euo pipefail

FORMAT=json
while (( $# )); do
  case "$1" in
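    # A bare --format leaves FORMAT empty, which the check below rejects with exit 2
    --format) FORMAT="${2:-}"; shift $(( $# > 1 ? 2 : 1 )) ;;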
    *) printf 'Unknown option: %s\n' "$1" >&2; exit 2 ;;
  esac
done

[[ $FORMAT == json || $FORMAT == text ]] || {
  printf 'Unknown format: %s (use json or text)\n' "$FORMAT" >&2; exit 2
}

# ── Collect data ─────────────────────────────────────────────────
HOSTNAME_VAL=$(hostname -f 2>/dev/null || hostname)
KERNEL=$(uname -r)
UPTIME_S=$(awk '{printf "%d", $1}' /proc/uptime)

# Load averages
read -r LOAD1 LOAD5 LOAD15 _rest < /proc/loadavg

# Memory in MB (from /proc/meminfo)
memval() { awk -v k="$1" '$1==k{printf "%d", int($2/1024)}' /proc/meminfo; }
MEM_TOTAL=$(memval MemTotal:)
MEM_AVAIL=$(memval MemAvailable:)
MEM_USED=$(( MEM_TOTAL - MEM_AVAIL ))

# Disk usage for / in GB (-P keeps each filesystem on a single line)
read -r DISK_TOTAL DISK_USED DISK_FREE < <(
  df -P -BG / | awk 'NR==2{gsub(/G/,""); print $2, $3, $4}')

# ── Output ───────────────────────────────────────────────────────
if [[ $FORMAT == json ]]; then
  jq -n \
    --arg     hostname   "$HOSTNAME_VAL"  \
    --arg     kernel     "$KERNEL"       \
    --argjson uptime_s   "$UPTIME_S"    \
    --argjson load1      "$LOAD1"       \
    --argjson load5      "$LOAD5"       \
    --argjson load15     "$LOAD15"      \
    --argjson mem_total  "$MEM_TOTAL"   \
    --argjson mem_used   "$MEM_USED"    \
    --argjson mem_free   "$MEM_AVAIL"   \
    --argjson disk_total "$DISK_TOTAL"  \
    --argjson disk_used  "$DISK_USED"   \
    --argjson disk_free  "$DISK_FREE"   \
    '{hostname:$hostname,kernel:$kernel,uptime_s:$uptime_s,
      load:{one:$load1,five:$load5,fifteen:$load15},
      memory:{total_mb:$mem_total,used_mb:$mem_used,free_mb:$mem_free},
      disk:{total_gb:$disk_total,used_gb:$disk_used,free_gb:$disk_free}}'
else
  printf 'Host    : %s (%s)\n'     "$HOSTNAME_VAL" "$KERNEL"
  printf 'Uptime  : %d s\n'         "$UPTIME_S"
  printf 'Load    : %s %s %s\n'     "$LOAD1" "$LOAD5" "$LOAD15"
  printf 'Memory  : %d / %d MB\n'  "$MEM_USED" "$MEM_TOTAL"
  printf 'Disk /  : %d / %d GB\n'  "$DISK_USED" "$DISK_TOTAL"
fi

# Python caller
import subprocess, json

result = subprocess.run(
    ["./bin/sys_info.sh", "--format", "json"],
    capture_output=True, text=True, check=True
)
info = json.loads(result.stdout)

mem_pct = info["memory"]["used_mb"] / info["memory"]["total_mb"] * 100
disk_pct = info["disk"]["used_gb"]  / info["disk"]["total_gb"]   * 100

print(
    f"{info['hostname']}  "
    f"load={info['load']['one']}  "
    f"mem={mem_pct:.0f}%  "
    f"disk={disk_pct:.0f}%"
)

Exercise 4 — The handoff: rewrite a Bash function in Python

The following Bash function has grown beyond Bash's comfortable limits. It parses a YAML-ish config file, resolves variable references (${VAR} within values), validates required keys, and returns results as environment variables. It is 60 lines, has two known edge-case bugs, and cannot be unit-tested without mocking the filesystem.

load_config() {
  local file="$1"; local -A cfg
  while IFS='=' read -r k v; do
    [[ $k =~ ^[[:space:]]*# ]] && continue
    [[ -z $k ]] && continue
    cfg["${k// /}"]="${v}"
  done < "$file"
  for k in DB_HOST DB_PORT APP_NAME; do
    [[ -v cfg[$k] ]] || { echo "missing: $k" >&2; return 1; }
  done
  for k in "${!cfg[@]}"; do
    local val="${cfg[$k]}"
    val="${val//\$\{DB_HOST\}/${cfg[DB_HOST]:-}}"
    val="${val//\$\{DB_PORT\}/${cfg[DB_PORT]:-}}"
    export "$k"="$val"
  done
}

Rewrite this as bin/load_config.sh: a thin Bash wrapper that calls a Python implementation, passes the config file path as an argument, and prints shell code that the caller evals to set the variables in the calling shell. The Python implementation should: parse the file properly (handling whitespace, inline comments, multi-word values), resolve ${VAR} references in definition order (a variable can reference any variable defined earlier in the same file), validate required keys, and emit export KEY='VALUE' lines using shlex.quote so the output is safe to eval regardless of value content.

#!/usr/bin/env bash
# bin/load_config.sh
# Usage: eval "$(load_config.sh path/to/config)"
set -euo pipefail

CONFIG_FILE="${1:?Usage: load_config.sh CONFIG_FILE}"
[[ -f "$CONFIG_FILE" ]] || { printf 'File not found: %s\n' "$CONFIG_FILE" >&2; exit 1; }

# Delegate entirely to Python; its stdout is safe to eval
exec python3 -c '
import sys, re, shlex

REQUIRED = {"DB_HOST", "DB_PORT", "APP_NAME"}
REF_RE   = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

# ── Parse ────────────────────────────────────────────────────────
cfg    = {}   # key -> raw value (preserving order, Python 3.7+)
order  = []   # insertion order

with open(sys.argv[1]) as fh:
    for lineno, line in enumerate(fh, 1):
        line = line.rstrip("\n")
        line = re.sub(r"(^|\s+)#.*$", "", line)   # strip comments; a "#" inside a value is kept
        line = line.strip()
        if not line:
            continue
        if "=" not in line:
            print(f"# warning: line {lineno} skipped (no =)", file=sys.stderr)
            continue
        k, _, v = line.partition("=")
        k = k.strip()
        v = v.strip()
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", k):
            print(f"# warning: invalid key {k!r} on line {lineno}", file=sys.stderr)
            continue
        cfg[k] = v
        if k not in order:
            order.append(k)

# ── Validate ─────────────────────────────────────────────────────
missing = REQUIRED - set(cfg)
if missing:
    for m in sorted(missing):
        print(f"echo Missing required key: {m} >&2", file=sys.stdout)
    print("exit 1")
    sys.exit(0)

# ── Resolve variable references (single pass in definition order) ─
resolved = {}
for k in order:
    def replacer(m, _r=resolved):
        return _r.get(m.group(1), m.group(0))
    resolved[k] = REF_RE.sub(replacer, cfg[k])

# ── Emit safe export statements ───────────────────────────────────
for k in order:
    print(f"export {k}={shlex.quote(resolved[k])}")
' "$CONFIG_FILE"
# No "--" before the path: with -c, Python passes it through as sys.argv[1]

# Example usage in another script:
#   eval "$(bin/load_config.sh config/app.conf)"
#   echo "Connecting to $DB_HOST:$DB_PORT"
#
# Example config/app.conf:
#   DB_HOST = db.internal       # primary database
#   DB_PORT = 5432
#   APP_NAME = my-service
#   DB_URL = postgres://${DB_HOST}:${DB_PORT}/mydb
#   GREETING = Hello from ${APP_NAME}
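
To see why shlex.quote matters when output is fed to eval, the sketch below pushes a deliberately hostile value through the same quote-then-eval path. It assumes python3 is on PATH; the variable names HOSTILE and GREETING and the marker file are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: why shlex.quote matters when the output is fed to eval.
# The hostile value is illustrative; assumes python3 is on PATH.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
marker="$workdir/injected"      # this file must NOT exist afterwards

# A value that tries to break out of single quotes and run a command
export HOSTILE="x'; touch '$marker'; echo '"

line=$(python3 -c '
import os, shlex
print("export GREETING=" + shlex.quote(os.environ["HOSTILE"]))
')

eval "$line"

[[ $GREETING == "$HOSTILE" ]] && echo "value survived intact"
[[ ! -e $marker ]]            && echo "no command was injected"
```

Wrapping the value in plain single quotes instead of calling shlex.quote would let the embedded quote end the string early, and eval would then run the touch command.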