Working with JSON and jq

🔧 Intermediate Topic 7 — Working with JSON and jq

JSON is the dominant data interchange format for APIs, config files, and tool output. Many CLI tools can emit JSON (often via a flag such as --output json), and most REST APIs speak it. jq is the standard command-line JSON processor: a full filter language that can extract, transform, reshape, and build JSON without leaving the terminal. This chapter covers the core jq filter syntax, shell variable injection, building JSON from shell data, and the full pattern of calling REST APIs with curl and processing their responses. By the end you'll be able to treat any JSON-speaking API as a first-class data source in your scripts.

Install jq: apt install jq / brew install jq / dnf install jq. Verify with jq --version. All examples in this chapter assume jq 1.6 or later.

1 — jq Basics: Identity, Field Access, and Array Indexing

Every jq program is a filter applied to a JSON input. The input flows in from stdin (or a file), passes through the filter, and the result goes to stdout. The simplest filter is . — the identity — which pretty-prints the input unchanged.

🔧 Field access, array indexing, and basic navigation

# Sample JSON for this section:
# {
#   "name": "Alice",
#   "age": 32,
#   "active": true,
#   "scores": [95, 87, 91],
#   "address": { "city": "London", "zip": "EC1A 1BB" },
#   "tags": ["admin", "editor"]
# }

# ── Identity — pretty-print ────────────────────────────────────
echo '{"name":"Alice","age":32}' | jq '.'

# ── Object field access ────────────────────────────────────────
jq '.name'             data.json    → "Alice"
jq '.age'              data.json    → 32
jq '.address.city'     data.json    → "London"

# ── Strip quotes from string output ───────────────────────────
jq -r '.name'           data.json    → Alice     (no quotes — use in scripts)

# ── Array indexing ────────────────────────────────────────────
jq '.scores[0]'         data.json    → 95
jq '.scores[-1]'        data.json    → 91   (last element)
jq '.scores[1:3]'       data.json    → [87, 91]
jq '.tags[]'            data.json    → "admin"
                                           "editor"  (iterator — one per line)

# ── Missing keys and the optional operator ? ──────────────────
jq '.missing'           data.json    → null    (a missing key is not an error)
jq '.name.first?'       data.json    → (no output; ? suppresses "Cannot index string")
jq '.missing // "N/A"' data.json    → "N/A"   (alternative operator)
jq '.age // 0'          data.json    → 32

# ── Keys and values ───────────────────────────────────────────
jq 'keys'               data.json    → ["active","address","age","name","scores","tags"]
jq 'keys[]'             data.json    → one key per line
jq 'has("age")'         data.json    → true
jq 'type'               data.json    → "object"
jq '.scores | type'     data.json    → "array"
jq '.scores | length'  data.json    → 3

# ── Assign to bash variables ───────────────────────────────────
name=$(jq -r '.name' data.json)
city=$(jq -r '.address.city' data.json)
echo "$name lives in $city"
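The examples in this section read data.json, which the text only shows as a comment. A setup sketch to run in a scratch directory: it writes that sample to data.json (overwriting any file of that name) and spot-checks a few of the filters above.

```shell
# Write the section's sample document to data.json so the examples run as-is
cat > data.json <<'EOF'
{
  "name": "Alice",
  "age": 32,
  "active": true,
  "scores": [95, 87, 91],
  "address": { "city": "London", "zip": "EC1A 1BB" },
  "tags": ["admin", "editor"]
}
EOF

# Spot-check a few of the filters shown above
jq -r '.name' data.json            # Alice
jq '.scores[-1]' data.json         # 91
jq -c '.scores[1:3]' data.json     # [87,91]
jq '.missing // "N/A"' data.json   # "N/A"
```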

2 — Pipe, Comma, and Constructing New JSON

The jq pipe | passes the output of one filter as input to the next — exactly like the shell pipe, but within a single jq program. The comma operator , runs two filters on the same input and outputs both results. Square brackets [] and curly braces {} construct new arrays and objects from the current input.

🔧 Pipe, comma, and building new structures

# ── Pipe: chain filters ───────────────────────────────────────
jq '.address | .city'          data.json   → "London"
jq '.scores | .[0]'            data.json   → 95
jq '.scores | length'          data.json   → 3
jq '.scores | add'             data.json   → 273  (sum of array)
jq '.scores | add / length'   data.json   → 91   (average)
jq '.scores | max'             data.json   → 95
jq '.scores | min'             data.json   → 87
jq '.scores | sort | reverse' data.json   → [95, 91, 87]

# ── Comma: multiple outputs from one input ────────────────────
jq '.name, .age'               data.json
→ "Alice"
   32

# ── Array construction: collect results into an array ─────────
jq '[.name, .age]'             data.json   → ["Alice", 32]
jq '[.scores[] | . * 2]'      data.json   → [190, 174, 182]

# ── Object construction: build a new object ───────────────────
jq '{user: .name, location: .address.city}' data.json
→ { "user": "Alice", "location": "London" }

# Shorthand when key name matches field name:
jq '{name, age}'               data.json
→ { "name": "Alice", "age": 32 }

# ── Object addition: merge objects (right side wins on conflicts) ──
# Combine a new {name} object with all fields of .address:
jq '{name} + .address'         data.json
→ { "name": "Alice", "city": "London", "zip": "EC1A 1BB" }

# ── to_entries / from_entries — iterate over key-value pairs ──
jq '.address | to_entries'     data.json
→ [{"key":"city","value":"London"}, {"key":"zip","value":"EC1A 1BB"}]

jq '.address | to_entries[] | "\(.key)=\(.value)"' data.json
→ "city=London"
   "zip=EC1A 1BB"

# ── String interpolation with \() ─────────────────────────────
jq -r '"Hello \(.name), age \(.age)"' data.json
→ Hello Alice, age 32
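from_entries, which appears in the quick reference, reverses to_entries, and jq's with_entries(f) builtin is shorthand for to_entries | map(f) | from_entries. A short sketch using the address object from the sample (inlined so it runs on its own); the addr_ prefix is just an illustrative choice.

```shell
addr='{"city":"London","zip":"EC1A 1BB"}'

# Round trip: object -> [{key,value}] -> object (unchanged)
jq -c 'to_entries | from_entries' <<< "$addr"
# {"city":"London","zip":"EC1A 1BB"}

# with_entries(f) == to_entries | map(f) | from_entries: prefix every key
jq -c 'with_entries(.key |= "addr_" + .)' <<< "$addr"
# {"addr_city":"London","addr_zip":"EC1A 1BB"}
```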

3 — `select`, `map`, and Working with Arrays of Objects

The most common real-world pattern: you have a JSON array of objects (API responses, log entries, config items) and you need to filter, transform, or aggregate it. select(condition) keeps items that pass the test; map(filter) applies a transformation to every element.

🔧 Filtering and transforming arrays of objects

# Sample: GitHub-style array of repos
# [
#   {"name":"alpha","stars":142,"lang":"Python","archived":false},
#   {"name":"beta", "stars":38, "lang":"Go",    "archived":false},
#   {"name":"gamma","stars":891,"lang":"Python","archived":true},
#   {"name":"delta","stars":12, "lang":"Rust",  "archived":false}
# ]

# ── select: filter items by condition ─────────────────────────
jq '.[] | select(.stars > 100)'         repos.json
→ alpha (142) and gamma (891)

jq '.[] | select(.lang == "Python")'     repos.json
jq '.[] | select(.archived == false)'    repos.json
jq '.[] | select(.archived | not)'       repos.json  # same

# Combine conditions with the and / or keywords (negate with not)
jq '.[] | select(.lang == "Python" and .archived == false)' repos.json

# ── Collect filtered results into an array ────────────────────
jq '[.[] | select(.stars > 50)]'         repos.json

# ── map: apply filter to every element ───────────────────────
jq 'map(.name)'                            repos.json
→ ["alpha","beta","gamma","delta"]

jq 'map({name, stars})'                   repos.json
→ [{"name":"alpha","stars":142}, ...]

jq 'map(select(.archived == false)) | map(.name)' repos.json
→ ["alpha","beta","delta"]

# Equivalent shorthand:
jq '[.[] | select(.archived | not) | .name]'   repos.json

# ── map_values: transform values of an object ─────────────────
jq '.address | map_values(ascii_upcase)'  data.json
→ {"city":"LONDON","zip":"EC1A 1BB"}

# ── sort_by, group_by, unique_by ──────────────────────────────
jq 'sort_by(.stars)'                      repos.json   # ascending
jq 'sort_by(.stars) | reverse'           repos.json   # descending
jq 'sort_by(.name)'                       repos.json   # alphabetical

jq 'group_by(.lang)'                      repos.json
→ [[{Go items}], [{Python items}], [{Rust items}]]

jq 'unique_by(.lang) | map(.lang)'        repos.json
→ ["Go","Python","Rust"]

# ── any / all ─────────────────────────────────────────────────
jq 'any(.[]; .stars > 500)'              repos.json   → true
jq 'all(.[]; .archived == false)'        repos.json   → false

# ── flatten ───────────────────────────────────────────────────
jq '. | flatten'         <<< '[[1,2],[3,[4,5]]]'   → [1,2,3,4,5]
jq '. | flatten(1)'      <<< '[[1,2],[3,[4,5]]]'   → [1,2,3,[4,5]]
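Like data.json, repos.json exists only as a comment. This sketch writes the sample out (again, best run in a scratch directory) and combines select, map and group_by on it; the per-language star total is an extra illustrative query, not one from the text above.

```shell
# Write the sample repo list to repos.json (same data as the comment above)
cat > repos.json <<'EOF'
[
  {"name":"alpha","stars":142,"lang":"Python","archived":false},
  {"name":"beta", "stars":38, "lang":"Go",    "archived":false},
  {"name":"gamma","stars":891,"lang":"Python","archived":true},
  {"name":"delta","stars":12, "lang":"Rust",  "archived":false}
]
EOF

# Names of active Python repos, as a compact array
jq -c '[.[] | select(.lang == "Python" and (.archived | not)) | .name]' repos.json
# ["alpha"]

# Star totals per language: group, then build one object per group
jq -c 'group_by(.lang) | map({lang: .[0].lang, stars: (map(.stars) | add)})' repos.json
# [{"lang":"Go","stars":38},{"lang":"Python","stars":1033},{"lang":"Rust","stars":12}]
```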

4 — Advanced Filters: `reduce`, Paths, and Shell Variable Injection

Shell variables in jq — the right way

Never interpolate shell variables directly into jq filters. jq ".name == \"$name\"" breaks on special characters and is a code injection risk. Always pass shell values via --arg (string) or --argjson (JSON value).

🔧 Passing shell variables into jq safely

# ── --arg NAME VALUE — injects a shell string as $NAME ────────
lang="Python"
jq --arg lang "$lang" \
   '.[] | select(.lang == $lang) | .name' repos.json

threshold=100
# --arg makes it a string; compare with tonumber to use as int
jq --arg t "$threshold" \
   '.[] | select(.stars > ($t | tonumber))' repos.json

# ── --argjson NAME JSON — injects a real JSON value as $NAME ──
jq --argjson t "$threshold" \
   '.[] | select(.stars > $t)' repos.json   # $t is already int — no conversion

jq --argjson active "false" \
   '.[] | select(.archived == $active)' repos.json

# ── --slurpfile NAME FILE — reads every JSON value in FILE into the array $NAME
jq -n --slurpfile cfg config.json '$cfg[0].timeout'    # -n: no stdin needed

# ── --rawfile NAME FILE — reads FILE as one raw string into $NAME ─
jq -n --rawfile tmpl template.txt '{template: $tmpl}'

# ── env object — access environment variables ─────────────────
jq -n 'env.HOME'                            → "/home/user"
jq -n 'env | keys | length'               → count of env vars
export API_KEY="secret123"                 # env only sees exported variables
jq -n '{"key": env.API_KEY}'              → {"key":"secret123"}

# ── -n flag — null input (no stdin needed) ────────────────────
# Useful when building JSON from scratch or from --arg values
jq -n --arg name "Alice" --argjson age 32 \
   '{name: $name, age: $age}'
→ {"name":"Alice","age":32}
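To see why the warning above matters, here is a small sketch with a made-up value containing a double quote and a backslash: splicing it into the filter text breaks jq's parser, while --arg passes it through as data and jq encodes it correctly.

```shell
# A value containing a double quote and a backslash
name='Al"ice \ Smith'

# Unsafe: the value becomes part of the program text, which no longer parses
jq -n "{name: \"$name\"}" 2>/dev/null || echo "interpolation failed"

# Safe: --arg binds the value as a string; jq escapes it on output
jq -nc --arg name "$name" '{name: $name}'
# {"name":"Al\"ice \\ Smith"}
```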

reduce and foreach

🔧 reduce for accumulation, paths for deep navigation

# ── reduce: fold an array into a single value ─────────────────
# reduce EXPR as $var (INIT; ACCUMULATOR)

# Sum of scores:
jq 'reduce .scores[] as $s (0; . + $s)'  data.json    → 273

# Sum of stars across all repos:
jq 'reduce .[] as $r (0; . + $r.stars)'  repos.json   → 1083

# Build a lookup map: name → stars
jq 'reduce .[] as $r ({}; . + {($r.name): $r.stars})' repos.json
→ {"alpha":142,"beta":38,"gamma":891,"delta":12}

# ── paths: navigate deeply nested structures ─────────────────
# paths outputs every path in the document as an array,
# including intermediate objects/arrays and array indices
jq -c '[paths]'         data.json
→ [["name"],["age"],["active"],["scores"],["scores",0],["scores",1],...]

# getpath / setpath / delpaths
jq 'getpath(["address","city"])'         data.json   → "London"
jq 'setpath(["address","country"]; "UK")' data.json
jq 'delpaths([["age"],["scores"]])'       data.json

# ── walk: recursively transform every node ────────────────────
# Lowercase all string values in an entire document:
jq 'walk(if type == "string" then ascii_downcase else . end)' data.json

# ── limit: stop iteration early ───────────────────────────────
jq '[limit(3; .[])]'    repos.json             → first 3 items
jq 'first(.[] | select(.stars > 100))' repos.json  → first match
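The heading also names foreach, which the examples above don't show. Where reduce emits only the final accumulator, foreach emits the state after every element, which makes running totals straightforward. A sketch over the sample scores (inlined):

```shell
# Running total: foreach EXPR as $x (INIT; UPDATE) emits each new state
jq -c '[foreach .[] as $s (0; . + $s)]' <<< '[95, 87, 91]'
# [95,182,273]

# Optional third argument (EXTRACT) shapes each emitted value
jq -c '[foreach .[] as $s (0; . + $s; {score: $s, total: .})]' <<< '[95, 87, 91]'
# [{"score":95,"total":95},{"score":87,"total":182},{"score":91,"total":273}]

# reduce over the same input yields only the final state
jq 'reduce .[] as $s (0; . + $s)' <<< '[95, 87, 91]'
# 273
```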

5 — Building JSON from Shell Data

Often you need to go the other direction: take shell variables and system data and assemble them into a JSON payload — for an API call, a log entry, or a config file. jq -n with --arg/--argjson is the safe, composable way to do this.

🔧 Constructing JSON payloads from shell variables

# ── Simple object from shell variables ───────────────────────
hostname=$(hostname)
uptime_secs=$(awk '{print int($1)}' /proc/uptime)
timestamp=$(date -Is)

jq -n \
  --arg     host    "$hostname"     \
  --argjson uptime  "$uptime_secs"  \
  --arg     ts      "$timestamp"    \
  '{
    "hostname": $host,
    "uptime_seconds": $uptime,
    "timestamp": $ts
  }'

# ── Build a JSON array from a bash array ──────────────────────
services=( nginx postgres redis )
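# Method 1: printf one item per line; -R reads each line as a JSON string, -s collects them
printf '%s\n' "${services[@]}" | jq -R '.' | jq -s '.'
→ ["nginx","postgres","redis"]
# (single process: printf '%s\n' "${services[@]}" | jq -nR '[inputs]')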

# Method 2: build directly with --arg and string split
svc_list="nginx,postgres,redis"
jq -n --arg svcs "$svc_list" '$svcs | split(",")'
→ ["nginx","postgres","redis"]

# ── Collect system metrics into a JSON report ─────────────────
system_json() {
    local host cpu_idle mem_free mem_total disk_pct
    host=$(hostname -f)
    cpu_idle=$(top -bn1 | awk '/Cpu/ {print $8+0}')
    mem_free=$(awk '/MemAvailable/{print $2}' /proc/meminfo)
    mem_total=$(awk '/MemTotal/{print $2}' /proc/meminfo)
    disk_pct=$(df -P / | awk 'NR==2{print $5+0}')   # -P: never wrap long device names

    jq -n \
      --arg     host      "$host"       \
      --argjson cpu_idle  "$cpu_idle"  \
      --argjson mem_free  "$mem_free"  \
      --argjson mem_total "$mem_total" \
      --argjson disk_pct  "$disk_pct"  \
      --arg     ts        "$(date -Is)" \
      '{
        hostname:       $host,
        timestamp:      $ts,
        cpu_idle_pct:   $cpu_idle,
        memory: {
          total_kb:     $mem_total,
          available_kb: $mem_free,
          used_pct:     (100 - ($mem_free * 100 / $mem_total) | round)
        },
        disk_root_pct:  $disk_pct
      }'
}

system_json | tee metrics.json
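Both bash-array methods earlier in this block break on some inputs: splitting on commas fails when an item contains a comma, and one-item-per-line fails on embedded newlines. jq 1.6 and later accept --args, which binds the remaining command-line arguments to $ARGS.positional as strings, so each array element arrives intact. A sketch (the service names are illustrative):

```shell
services=( nginx "my service" "a,b" )

# Everything after --args is a positional string argument, not an input file
jq -cn '$ARGS.positional' --args "${services[@]}"
# ["nginx","my service","a,b"]

# Named values and positional arguments can be combined in one call
jq -cn --arg host "$(hostname)" '{host: $host, services: $ARGS.positional}' \
   --args "${services[@]}"
```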

Updating existing JSON

🔧 The |= update operator and field manipulation

# |= updates a field in place (reads current value as .)
jq '.age |= . + 1'                    data.json   # increment age
jq '.name |= ascii_upcase'            data.json   # uppercase name
jq '.scores |= map(. + 5)'            data.json   # add 5 to every score
jq '.tags |= . + ["reviewer"]'        data.json   # append to array
jq '.address.country = "UK"'          data.json   # add new field
jq 'del(.age)'                         data.json   # remove field
jq 'del(.address.zip)'                data.json   # remove nested field

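# ── Merge two JSON objects ────────────────────────────────────
# Shallow merge with + (top-level keys only; right wins on conflict):
jq -s '.[0] + .[1]' base.json override.json

# Deep merge with * (recursive for nested objects; right wins):
jq -s '.[0] * .[1]' base.json override.json

# The same recursion written by hand, a template for custom merge rules.
# $a/$b are value parameters, so they keep their values inside reduce:
jq -s '
  def deep_merge($a; $b):
    if ($a | type) == "object" and ($b | type) == "object"
    then reduce ($b | keys[]) as $k ($a; .[$k] = deep_merge($a[$k]; $b[$k]))
    else $b
    end;
  deep_merge(.[0]; .[1])
' base.json override.json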

# ── Edit a JSON config file in-place ──────────────────────────
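tmp=$(mktemp config.json.XXXXXX)      # temp file in the same directory as the target
jq '.database.port = 5433' config.json > "$tmp" && mv "$tmp" config.json

jq has no in-place edit flag: write to a temp file, then move it over the original. The final mv is an atomic rename only when both files are on the same filesystem, which is why the temp file is created next to config.json rather than in /tmp. Note that mktemp creates the file with mode 0600, so the replaced file loses its original permissions unless you restore them.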
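The temp-file pattern is worth wrapping once. A sketch of a helper (the name jq_inplace is hypothetical, not a jq feature) that creates the temp file beside the target, replaces the target only if jq succeeds, and cleans up otherwise. It does not restore the original file mode, which mktemp sets to 0600.

```shell
# jq_inplace FILE FILTER [JQ_OPTIONS...]   (hypothetical helper)
# Runs jq on FILE and replaces FILE only if jq exits successfully.
jq_inplace() {
    local file="$1" filter="$2"; shift 2
    local tmp
    # Temp file in the target's directory, so mv is a same-filesystem rename
    tmp=$(mktemp "$(dirname -- "$file")/.$(basename -- "$file").XXXXXX") || return 1
    if jq "$@" "$filter" "$file" > "$tmp"; then
        mv -- "$tmp" "$file"
    else
        rm -f -- "$tmp"      # leave the original untouched on failure
        return 1
    fi
}

# Usage:
# jq_inplace config.json '.database.port = 5433'
# jq_inplace config.json '.database.host = $h' --arg h db.internal
```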

6 — Calling REST APIs with `curl` + `jq`

The standard pattern: curl fetches the JSON response; jq extracts what you need. Topic 8 covers curl in depth — here we focus on the processing side and the patterns that make API scripts robust.

🔧 curl + jq patterns for REST APIs

# ── Basic GET → extract field ─────────────────────────────────
curl -s https://api.github.com/repos/jqlang/jq \
  | jq -r '.stargazers_count'

# ── POST with JSON body ───────────────────────────────────────
payload=$(jq -n --arg user "alice" --arg pass "$PASSWORD" \
           '{username: $user, password: $pass}')

curl -s -X POST \
     -H 'Content-Type: application/json' \
     -d "$payload" \
     https://api.example.com/auth/login \
  | jq -r '.token'

# ── Check HTTP status + parse body ───────────────────────────
response=$(curl -s -w '\n%{http_code}' https://api.example.com/users)
status=${response##*$'\n'}     # text after the last newline: the status code
body=${response%$'\n'*}        # everything before it: the response body

if [[ "$status" != "200" ]]; then
    echo "API error $status: $(jq -r '.message // "unknown"' <<< "$body")" >&2
    exit 1
fi
jq '.' <<< "$body"

# ── Paginated API — collect all pages ─────────────────────────
fetch_all_pages() {
    local url="$1"
    local page=1
    local all_items='[]'

    while true; do
        local data
        data=$(curl -s "${url}?page=${page}&per_page=100")
        local count; count=$(jq 'length' <<< "$data")
        (( count == 0 )) && break
        all_items=$(jq -s '.[0] + .[1]' <<< "${all_items}"$'\n'"${data}")
        (( count < 100 )) && break
        (( page++ ))
    done
    printf '%s' "$all_items"
}

# ── GitHub: list all repos and their star counts ──────────────
github_repos() {
    local org="$1"
    local auth=()
    # An array keeps the header as one argument; a plain string would be word-split
    [[ -n "${GITHUB_TOKEN:-}" ]] && auth=( -H "Authorization: Bearer $GITHUB_TOKEN" )

    # This endpoint cannot sort by stars, so sort in jq instead
    curl -s "${auth[@]}" \
         "https://api.github.com/orgs/${org}/repos?per_page=100" \
    | jq -r 'sort_by(.stargazers_count) | reverse | .[]
             | [.name, .stargazers_count, .language // "N/A"] | @tsv'
}
# @tsv formats an array as a tab-separated line — perfect for shell processing
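fetch_all_pages above re-parses the whole accumulated array on every page, so the total work grows roughly with the square of the number of pages. A sketch of an alternative: print each page as it arrives and let a single jq -s 'add' concatenate them at the end. fetch_page and fetch_all_pages_stream are hypothetical names, and fetch_page is an offline stand-in for the real curl call (5 fake items, pages of 2) so the loop can run anywhere.

```shell
# fetch_page PAGE PER_PAGE: hypothetical stand-in for
#   curl -s "${url}?page=${page}&per_page=${per_page}"
# Serves 5 fake items ({"id":0} .. {"id":4}) in pages of PER_PAGE.
fetch_page() {
    local page="$1" per_page="$2"
    jq -cn --argjson p "$page" --argjson n "$per_page" \
       '[range(($p - 1) * $n; [$p * $n, 5] | min) | {id: .}]'
}

# Print each page as it arrives, then concatenate all pages with one jq call
fetch_all_pages_stream() {
    local per_page=2 page=1 data count
    while true; do
        data=$(fetch_page "$page" "$per_page")
        count=$(jq 'length' <<< "$data")
        (( count == 0 )) && break
        printf '%s\n' "$data"          # earlier pages are never re-parsed
        (( count < per_page )) && break
        page=$(( page + 1 ))
    done | jq -cs 'add // []'          # -s collects the page arrays; add joins them
}

fetch_all_pages_stream
# [{"id":0},{"id":1},{"id":2},{"id":3},{"id":4}]
```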

Output formats: @base64, @uri, @csv, @tsv, @html, @sh

🔧 jq format strings for output encoding

# @tsv — format array as TSV row (great for shell consumption)
jq -r '.[] | [.name, .stars, .lang] | @tsv'  repos.json
→ alpha	142	Python
   beta	38	Go

# @csv — format array as CSV row (quoted if needed)
jq -r '.[] | [.name, .stars, .lang] | @csv'  repos.json
→ "alpha",142,"Python"

# @sh — shell-quote values (safe to eval)
jq -r '.name | @sh'                          data.json
→ 'Alice'   (safe even if name contained spaces or quotes)

# @base64 / @base64d
jq -rn '"hello world" | @base64'
→ aGVsbG8gd29ybGQ=

jq -rn '"aGVsbG8gd29ybGQ=" | @base64d'
→ hello world

# @uri — URL-encode a string
jq -rn '"hello world" | @uri'
→ hello%20world

# @html — HTML-escape
jq -rn '"<b>bold</b> & \" quote" | @html'
→ &lt;b&gt;bold&lt;/b&gt; &amp; &quot; quote

# ── Read @tsv output back into a while loop ───────────────────
while IFS=$'\t' read -r name stars lang; do
    printf '%-15s %5s stars  (%s)\n' "$name" "$stars" "$lang"
done < <(jq -r '.[] | [.name, .stars, .lang] | @tsv' repos.json)
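Each format can also prefix a string literal, as in @sh "...": jq then applies the format only to the interpolated \(...) parts and leaves the literal text alone. A sketch that builds a shell command and a URL this way; the file name, query and example.com URL are placeholders.

```shell
# Quote only the interpolated value; the literal "rm -- " text is kept as-is
jq -rn --arg f "my file's name.txt" '@sh "rm -- \($f)"'
# rm -- 'my file'\''s name.txt'

# Percent-encode only the query value, not the URL around it
jq -rn --arg q "a b&c" '@uri "https://example.com/search?q=\($q)"'
# https://example.com/search?q=a%20b%26c
```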

7 — Streaming, Multiple Documents, and Compact Output

🔧 Handling multiple JSON documents and large files

# ── -s / --slurp: read all input into one array ───────────────
# Useful when input is multiple JSON documents (one per line = JSONL)
jq -s '.' file1.json file2.json    # → [doc1, doc2]
jq -s '.[0] + .[1]' a.json b.json  # → arrays concatenated (objects: shallow-merged)

# ── JSONL (JSON Lines) — one JSON object per line ─────────────
# Common in log files, streaming APIs, database exports
# {"ts":"2026-06-09","level":"ERROR","msg":"disk full"}
# {"ts":"2026-06-09","level":"INFO", "msg":"restarted"}

# Process JSONL: each line is a separate document
jq '.level'                            app.jsonl   # prints each level
jq 'select(.level == "ERROR")'          app.jsonl   # filter errors only
jq -r '[.ts, .level, .msg] | @tsv'    app.jsonl   # tabular output

# Count entries per level:
jq -s 'group_by(.level) | map({(.[0].level): length}) | add' app.jsonl

# ── Convert JSON array to JSONL ───────────────────────────────
jq -c '.[]'         repos.json     # -c = compact, one object per line

# ── Convert JSONL to JSON array ───────────────────────────────
jq -s '.'           app.jsonl

# ── --stream for very large files (no slurp into memory) ──────
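# --stream outputs [path,value] events, one at a time; closing events are
# one-element [path] arrays, so keep only the two-element events
jq --stream 'select(length == 2 and .[0][-1] == "name") | .[1]' huge.json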

# ── -c compact: no whitespace — for writing back JSON files ───
jq -c '.database.port = 5433' config.json  # compact single-line output

# ── Combine multiple JSON files ───────────────────────────────
# Merge all *.json files in a directory into one array
jq -s '.' *.json

# Add a source filename to each document
for f in *.json; do
    jq --arg src "$f" '. + {_source: $src}' "$f"
done | jq -s '.'   # collect into one array
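The slurp-based count above (jq -s 'group_by(...)') holds every line in memory at once. With -n, the inputs builtin reads the remaining documents one at a time, so reduce can aggregate a large JSONL file without loading it whole. A sketch using the two sample lines above, written to a scratch app.jsonl:

```shell
cat > app.jsonl <<'EOF'
{"ts":"2026-06-09","level":"ERROR","msg":"disk full"}
{"ts":"2026-06-09","level":"INFO", "msg":"restarted"}
EOF

# -n: start from null; inputs: stream each JSONL document in turn.
# .[$e.level] += 1 works on a fresh key because null + 1 is 1.
jq -cn 'reduce inputs as $e ({}; .[$e.level] += 1)' app.jsonl
# {"ERROR":1,"INFO":1}
```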

8 — Quick Reference

Filter / flag	What it does	Example
`.`	Identity / pretty-print	jq '.' file.json
`.key`	Object field access	.name → "Alice"
`.a.b`	Nested field	.address.city
`.arr[N]`	Array index (negative OK)	.scores[-1]
`.arr[]`	Iterate array	one value per output
`// default`	Alternative if null/false	.x // "N/A"
`|`	Pipe: chain filters	.arr | length
`,`	Multiple outputs	.a, .b
`[f]`	Collect outputs into array	[.[] | select(…)]
`{k: f}`	Build new object	{name, city: .address.city}
`select(cond)`	Keep if condition is true	select(.age > 18)
`map(f)`	Apply filter to every element	map(.name)
`sort_by(f)`	Sort array by key	sort_by(.stars) | reverse
`group_by(f)`	Group array by key	group_by(.lang)
`unique_by(f)`	Deduplicate by key	unique_by(.lang)
`to_entries`	Object → [{key,value}]	to_entries[] | .key
`from_entries`	[{key,value}] → object	map({key: .name, value: .stars}) | from_entries
`reduce A as $x (I; E)`	Fold/accumulate	reduce .[] as $n (0; .+$n)
`.k |= f`	Update in place	.age |= . + 1
`del(.k)`	Remove field	del(.address.zip)
`@tsv @csv @sh @base64 @uri`	Output format encodings	[.a,.b] | @tsv
`-r`	Raw string output (no quotes)	jq -r '.name'
`-c`	Compact output (no whitespace)	jq -c '.[]'
`-n`	Null input (build from scratch)	jq -n '{a: 1}'
`-s`	Slurp all input into array	jq -s '.' *.json
`--arg N V`	Inject shell string as $N	--arg lang "$lang"
`--argjson N V`	Inject JSON value as $N	--argjson t 100
`-e`	Exit 1 if last output is null/false	useful in if statements
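The -e row is easiest to see in a condition. With -e, jq's exit status comes from the last output: 0 if it is neither false nor null, 1 if it is false or null, and 4 if there was no output at all. A sketch using a small user.json written by the block itself (the file and role name are illustrative):

```shell
# A small sample file for the checks below
printf '%s\n' '{"name":"Alice","active":true,"tags":["admin","editor"]}' > user.json

# -e: exit 0 if the last output is neither false nor null
if jq -e '.active' user.json > /dev/null; then
    echo "user is active"
fi

# any(...) yields true or false, which -e turns into exit status 0 or 1
if jq -e --arg role admin 'any(.tags[]; . == $role)' user.json > /dev/null; then
    echo "has role admin"
fi

# A missing key yields null, so -e exits with status 1
jq -e '.missing' user.json > /dev/null || echo "status $?"
# prints: status 1
```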

✏️ Exercises

All exercises use real-world patterns you'll encounter when scripting against APIs and JSON data stores.

Exercise 1

Write a script called json_report.sh that reads a JSONL file of server log entries (each line: {"ts":"…","host":"…","level":"INFO|WARN|ERROR","msg":"…","duration_ms":N}) and produces a summary report showing: total entries, counts per level, the top 5 slowest requests (host + msg + duration), and average duration per log level. Accept the file path as an argument; default to stdin.

Hint: use jq -s to slurp all lines into an array. Use group_by(.level) for per-level stats. For top-5 slowest use sort_by(.duration_ms) | reverse | .[0:5]. Use reduce or map(.duration_ms) | add / length for averages. Format the final output with multiple jq calls or a single large program with def.

Sample Solution

#!/usr/bin/env bash
# json_report.sh [JSONL_FILE]
set -euo pipefail

FILE="${1:--}"   # default to stdin

# No file and nothing piped in: print sample JSONL and exit (save it with ./json_report.sh > sample.jsonl)
if [[ "$FILE" == "-" && -t 0 ]]; then
    cat <<'EOF'
{"ts":"2026-06-09T10:00:01","host":"web-1","level":"INFO", "msg":"GET /api/users",   "duration_ms":42}
{"ts":"2026-06-09T10:00:02","host":"web-2","level":"ERROR","msg":"GET /api/orders",  "duration_ms":1204}
{"ts":"2026-06-09T10:00:03","host":"web-1","level":"WARN", "msg":"POST /api/upload", "duration_ms":891}
{"ts":"2026-06-09T10:00:04","host":"web-3","level":"INFO", "msg":"GET /api/users",   "duration_ms":38}
{"ts":"2026-06-09T10:00:05","host":"web-2","level":"ERROR","msg":"DELETE /api/item",  "duration_ms":3201}
{"ts":"2026-06-09T10:00:06","host":"web-1","level":"INFO", "msg":"GET /healthz",     "duration_ms":5}
{"ts":"2026-06-09T10:00:07","host":"web-3","level":"WARN", "msg":"PUT /api/config",  "duration_ms":654}
EOF
    exit 0
fi

# All analysis in one jq program
jq -rs '
  # Summary stats
  {
    total: length,
    levels: (group_by(.level) | map({
      level:   .[0].level,
      count:   length,
      avg_ms:  (map(.duration_ms) | add / length | round),
      max_ms:  (map(.duration_ms) | max)
    })),
    slowest: (sort_by(.duration_ms) | reverse | .[0:5] | map({
      host,
      msg,
      level,
      duration_ms
    })),
    overall_avg_ms: (map(.duration_ms) | add / length | round),
    overall_max_ms: (map(.duration_ms) | max)
  }
' "$FILE" | jq -r '
  "═══════════════════════════════════════",
  "  LOG REPORT SUMMARY",
  "═══════════════════════════════════════",
  "  Total entries:  \(.total)",
  "  Overall avg:    \(.overall_avg_ms)ms",
  "  Overall max:    \(.overall_max_ms)ms",
  "",
  "  BY LEVEL:",
  (.levels[] | "  \(.level)\t count=\(.count)\t avg=\(.avg_ms)ms\t max=\(.max_ms)ms"),
  "",
  "  TOP 5 SLOWEST REQUESTS:",
  (.slowest[] | "  [\(.level)] \(.host) \(.msg) — \(.duration_ms)ms")
'

Exercise 2

Write a script called json_diff.sh that compares two JSON files and reports: keys present in file 1 but not file 2, keys present in file 2 but not file 1, and keys present in both but with different values. Work at the top level only (no deep recursion required). Accept both file paths as arguments.

Hint: use jq -s to load both files into a two-element array. Use keys to get the key sets; jq's array subtraction is a set difference, so $ka - $kb gives the keys only in file 1. Avoid contains for membership tests: on strings it matches substrings, so ["foobar"] | contains(["foo"]) is true. For value comparison, iterate the keys that appear in both and compare with .[0][$k] != .[1][$k].

Sample Solution

#!/usr/bin/env bash
# json_diff.sh FILE1 FILE2
set -euo pipefail

F1="${1:?Usage: json_diff.sh FILE1 FILE2}"
F2="${2:?file2 required}"

for f in "$F1" "$F2"; do
    [[ -f "$f" ]] || { printf 'Not found: %s\n' "$f" >&2; exit 1; }
done

jq -rs \
  --arg f1 "$F1" \
  --arg f2 "$F2" \
  '
  .[0] as $a | .[1] as $b |
  ($a | keys) as $ka |
  ($b | keys) as $kb |

  # Keys only in file 1 / only in file 2 (array subtraction = set difference)
  ($ka - $kb) as $only_a |
  ($kb - $ka) as $only_b |
  # Keys in both but with different values
  ($ka - $only_a | map(select($a[.] != $b[.]))) as $changed |

  "Comparing \($f1) vs \($f2)\n",

  if ($only_a | length) > 0 then
    "Only in \($f1):",
    ($only_a[] | "  - \(.)  = \($a[.])")
  else "Only in \($f1): (none)" end,
  "",

  if ($only_b | length) > 0 then
    "Only in \($f2):",
    ($only_b[] | "  + \(.)  = \($b[.])")
  else "Only in \($f2): (none)" end,
  "",

  if ($changed | length) > 0 then
    "Changed values:",
    ($changed[] | "  ~ \(.)\n      was: \($a[.])\n      now: \($b[.])")
  else "Changed values: (none)" end
' "$F1" "$F2"

Exercise 3

Write a function called gh_repo_stats() that calls the GitHub API to fetch statistics about a given owner/repo: stars, forks, open issues, language, last push date, and the 5 most recent open issues (title + created_at + labels). If GITHUB_TOKEN is set in the environment, use it for authentication. Format the output as a readable summary. Handle API errors gracefully — if the repo doesn't exist or the rate limit is hit, print a helpful message to stderr and return 1.

Hint: make two curl calls — one to /repos/owner/repo and one to /repos/owner/repo/issues?state=open&per_page=5&sort=created. Check the HTTP status code with -w '%{http_code}'. For the issues list, use jq -r '.[] | [.number, .title, .created_at, ([.labels[].name] | join(","))] | @tsv'.

Sample Solution

gh_repo_stats() {
    local repo="${1:?Usage: gh_repo_stats owner/repo}"
    local api="https://api.github.com"
    local auth=()
    [[ -n "${GITHUB_TOKEN:-}" ]] && \
        auth=( -H "Authorization: Bearer $GITHUB_TOKEN" )

    _gh_get() {
        local url="$1"
        local out
        out=$(curl -s -w '\n%{http_code}' \
                   -H 'Accept: application/vnd.github+json' \
                   "${auth[@]}" "$url")
        local status="${out##*$'\n'}"
        local body="${out%$'\n'*}"

        if [[ "$status" == "404" ]]; then
            printf 'Error: repository "%s" not found\n' "$repo" >&2
            return 1
        elif [[ "$status" == "403" ]]; then
            local msg; msg=$(jq -r '.message' <<< "$body")
            printf 'Error 403: %s\n' "$msg" >&2
            return 1
        elif [[ "$status" != "200" ]]; then
            printf 'HTTP error %s for %s\n' "$status" "$url" >&2
            return 1
        fi
        printf '%s' "$body"
    }

    # Fetch repo metadata
    local info; info=$(_gh_get "${api}/repos/${repo}") || return 1

    # Fetch recent open issues
    local issues; issues=$(_gh_get \
        "${api}/repos/${repo}/issues?state=open&per_page=5&sort=created&direction=desc"\
    ) || return 1

    # Print summary
    jq -r '
      "╔══════════════════════════════════════════╗",
      "║  " + .full_name + " "*( 40 - (.full_name|length)) + "║",
      "╚══════════════════════════════════════════╝",
      "  Stars:       \(.stargazers_count)",
      "  Forks:       \(.forks_count)",
      "  Open issues: \(.open_issues_count)",
      "  Language:    \(.language // "N/A")",
      "  Last push:   \(.pushed_at | split("T")[0])",
      "  Description: \(.description // "(none)")"
    ' <<< "$info"

    printf '\n  RECENT OPEN ISSUES:\n'
    jq -r '
      if length == 0 then "  (none)"
      else
        .[] | "  #\(.number) \(.title)\n        Created: \(.created_at | split("T")[0])  Labels: \([.labels[].name] | if length > 0 then join(", ") else "none" end)"
      end
    ' <<< "$issues"
}

# Usage:
# gh_repo_stats jqlang/jq
# GITHUB_TOKEN=ghp_... gh_repo_stats PhilipOsztromok/learning_blog

Exercise 4

Write a script called json_to_env.sh that takes a JSON object (from a file or stdin) and outputs its key-value pairs as shell export statements, suitable for use with eval or source. Nested objects should be flattened with underscore-joined key names (e.g. {"db":{"host":"localhost"}} → export DB_HOST='localhost'). Arrays should be joined with commas. Keys should be uppercased. Handle values that contain single quotes safely.

Hint: write a small recursive def that walks objects while carrying the key path, treating scalars and arrays as leaves (paths(scalars) on its own also descends into arrays and would yield keys like ALLOWED_HOSTS_0). Join each path with _ and uppercase it with ascii_upcase, join array values with ",", and use the @sh format to quote the value safely.

Sample Solution

#!/usr/bin/env bash
# json_to_env.sh [FILE]  — or pipe JSON on stdin
# Outputs: export KEY='value' lines for all scalar leaf values
set -euo pipefail

INPUT="${1:--}"

jq -r '
  # Walk objects recursively, carrying the key path; scalars and arrays are leaves.
  # keys is sorted, so the exports come out in alphabetical order.
  def leaves($prefix):
    if type == "object" then
      keys[] as $k | .[$k] | leaves($prefix + [$k])
    else
      [$prefix, .]
    end;

  leaves([])
  | (.[0] | join("_") | ascii_upcase) as $key
  | (.[1] | if type == "array" then map(tostring) | join(",") else tostring end) as $val
  | "export \($key)=\($val | @sh)"
' "$INPUT"

# Example input:
# {
#   "db": {"host": "localhost", "port": 5432, "name": "myapp"},
#   "app": {"debug": false, "workers": 4},
#   "allowed_hosts": ["example.com", "api.example.com"]
# }
#
# Output:
# export ALLOWED_HOSTS='example.com,api.example.com'
# export APP_DEBUG='false'
# export APP_WORKERS='4'
# export DB_HOST='localhost'
# export DB_NAME='myapp'
# export DB_PORT='5432'
#
# Usage:
# eval "$(./json_to_env.sh config.json)"
# echo "$DB_HOST"  → localhost
