Structured Data: CSV, TSV, and INI
📊 Intermediate Topic 6 — Structured Data: CSV, TSV, and INI
Shell scripts constantly consume and produce structured text: exports from databases, configuration files, reports. The tools are right there — awk, IFS splitting, pattern matching — but the edge cases bite hard. A CSV file with quoted fields containing commas will silently corrupt your data if you split naively. An INI file with inline comments or trailing spaces will hand you the wrong value. This chapter covers the full picture: how the formats actually work, where the traps are, and the robust patterns used in production scripts.
1 — CSV Fundamentals: Why It's Harder Than It Looks
CSV sounds simple — values separated by commas. The problem is that real CSV (RFC 4180) allows fields to contain commas, newlines, and double-quote characters, all of which break naive parsing. A field that contains a comma must be wrapped in double-quotes. A double-quote inside a quoted field must be escaped as two consecutive double-quotes ("").
# Input: name,city,note
# Alice,"New York",no comma
# Bob,"Paris, France",has comma
while IFS=',' read -r name city note; do
echo "city=$city"
done < data.csv
# Bob's city = '"Paris' ← WRONG
# note = ' France"' ← WRONG
# RFC 4180 rules:
# 1. Fields MAY be quoted with "
# 2. A quoted field may contain , \n "
# 3. A " inside quotes = ""
# 4. Leading/trailing spaces inside
# quotes are part of the value
#
# "Alice","New York","says ""hi"""
# → Alice | New York | says "hi"
#
# Need a state-machine parser.
# awk's FPAT is the cleanest option.
awk FPAT approach or pipe through a purpose-built tool like csvkit (csvcut, csvstat). For data you generate yourself — logs, reports — you can choose to avoid values that need quoting, or use TSV instead.2 — Robust CSV Parsing with awk and FPAT
GNU awk (gawk) provides a special variable FPAT — Field PATtern — which instead of defining the separator defines what a field looks like. Setting it to a pattern that matches either a quoted field or an unquoted field gives you a proper CSV parser in one line.
# FPAT pattern breakdown:
# ([^,]*) — any unquoted field (zero or more non-comma chars)
# | — OR
# ("([^"]|"")*") — a double-quoted field (handles "" escape inside)
FPAT_CSV='([^,]*)|("([^"]|"")*")'
# ── Print every field on a separate line ──────────────────────
gawk -v FPAT='([^,]*)|("([^"]|"")*")' '
{ for (i=1; i<=NF; i++) print i, $i }
' data.csv
# ── Strip surrounding quotes from a field value ───────────────
# Fields from FPAT still include the outer quotes — strip them:
csv_unquote() {
# Strips surrounding quotes and unescapes "" → "
awk 'BEGIN {
s = ARGV[1]
if (substr(s,1,1) == "\"") s = substr(s, 2, length(s)-2)
gsub(/""/, "\"", s)
print s
}' "$1"
}
# ── Practical: process a CSV with header row ──────────────────
# File: employees.csv
# id,name,department,salary
# 1,Alice,"Engineering",95000
# 2,Bob,"Sales, East",72000
# 3,"O'Brien, Carol","HR",68000
gawk -v FPAT='([^,]*)|("([^"]|"")*")' '
NR == 1 { next } # skip header
{
id = $1
name = $2; gsub(/^"|"$/, "", name) # strip quotes
dept = $3; gsub(/^"|"$/, "", dept)
sal = $4
printf "%-4s %-20s %-20s %s\n", id, name, dept, sal
}
' employees.csv
# ── Filter: only Engineering employees ────────────────────────
gawk -v FPAT='([^,]*)|("([^"]|"")*")' '
NR == 1 { print; next }
{
dept = $3; gsub(/^"|"$/, "", dept)
if (dept == "Engineering") print
}
' employees.csv
# ── Extract a single column by header name ────────────────────
gawk -v FPAT='([^,]*)|("([^"]|"")*")' \
-v col="salary" '
NR == 1 {
for (i=1; i<=NF; i++) {
h = $i; gsub(/^"|"$/, "", h)
if (h == col) colidx = i
}
next
}
colidx { v = $colidx; gsub(/^"|"$/, "", v); print v }
' employees.csv
FPAT is a GNU awk extension — not available in POSIX awk or mawk. Check with gawk --version. On systems where only awk is available, fall back to a Python one-liner or csvkit.Building a reusable csv_parse() library function
# csv_split ROW → prints fields one per line, with quotes stripped
# Pipe its output to mapfile to get an array
csv_split() {
gawk 'BEGIN {
FPAT = "([^,]*)|(\\"([^\\"]|\\"\\")* \\")"
$0 = ARGV[1]
for (i=1; i<=NF; i++) {
v = $i
if (substr(v,1,1) == "\"") v = substr(v, 2, length(v)-2)
gsub(/""/, "\"", v)
print v
}
}' "$1"
}
# Usage: turn a CSV row into a bash array
row='42,"Smith, John","New York","says ""hello"""'
mapfile -t fields < <(csv_split "$row")
echo "ID: ${fields[0]}" → 42
echo "Name: ${fields[1]}" → Smith, John
echo "City: ${fields[2]}" → New York
echo "Note: ${fields[3]}" → says "hello"
# Process a whole file row by row
process_csv() {
local file="$1"
local header_done=0
local row
while IFS= read -r row; do
(( header_done == 0 )) && { header_done=1; continue; }
mapfile -t f < <(csv_split "$row")
# Now f[0], f[1], f[2]... hold the clean field values
printf 'id=%s name=%s\n' "${f[0]}" "${f[1]}"
done < "$file"
}
3 — Generating CSV Output
Writing CSV is easier than reading it — you control the quoting. The safest rule: always quote fields that may contain commas, double-quotes, or newlines; double up any literal double-quotes inside them. For simple data you know is clean, quoting only when necessary is fine.
# ── csv_field: quote a single value for CSV output ────────────
csv_field() {
local val="$1"
# Always quote — simplest and safest
val="${val//\"/\"\"}" # escape " → ""
printf '"%s"' "$val"
}
# ── csv_row: turn args into a CSV line ────────────────────────
csv_row() {
local first=1
for arg in "$@"; do
(( first )) || printf ','
csv_field "$arg"
first=0
done
printf '\n'
}
# ── Generate a CSV report ─────────────────────────────────────
{
csv_row "hostname" "cpu_pct" "mem_pct" "disk_pct" "timestamp"
while IFS= read -r host; do
# Collect metrics for $host (simplified)
cpu=$(ssh "$host" "top -bn1 | awk '/Cpu/ {print 100-\$8}'" 2/dev/null)
ts=$(date '+%Y-%m-%d %H:%M:%S')
csv_row "$host" "${cpu:-N/A}" "N/A" "N/A" "$ts"
done < hosts.txt
} > report.csv
# ── Append to an existing CSV safely ──────────────────────────
csv_append() {
local file="$1"; shift
if [[ ! -f "$file" ]]; then
# Create with header on first call if header provided as first arg
touch "$file"
fi
csv_row "$@" >> "$file"
}
# ── Column extraction without gawk (safe only for simple CSV) ─
# cut -d, -f2 — works only when no quoted commas exist
cut -d',' -f2 employees.csv
# With gawk FPAT — handles quoted commas
gawk -v FPAT='([^,]*)|("([^"]|"")*")' '{v=$2; gsub(/^"|"$/,"",v); print v}' employees.csv
4 — TSV: Tab-Separated Values
TSV uses a tab character as the delimiter. Tabs almost never appear in real data, which means quoting rules are usually unnecessary and parsing with IFS=$'\t' is safe. Many tools (sort, join, awk) integrate cleanly with tab-delimited data. When you control the format, TSV is often the better choice over CSV for shell processing.
# ── Reading TSV ───────────────────────────────────────────────
while IFS=$'\t' read -r id name dept salary; do
printf '%-4s %-20s %-15s %s\n' "$id" "$name" "$dept" "$salary"
done < employees.tsv
# ── Writing TSV ───────────────────────────────────────────────
tsv_row() {
local IFS=$'\t'
printf '%s\n' "$*"
}
tsv_row "id" "name" "department" "salary"
tsv_row "1" "Alice" "Engineering" "95000"
# ── Converting CSV to TSV ─────────────────────────────────────
# Simple (only safe when no embedded commas/newlines in fields)
tr ',' $'\t' < simple.csv > simple.tsv
# Robust (uses gawk FPAT to parse, then re-joins with tab)
gawk -v FPAT='([^,]*)|("([^"]|"")*")' '
{
for (i=1; i<=NF; i++) {
v = $i
gsub(/^"|"$/, "", v) # strip surrounding quotes
gsub(/""/, "\"", v) # unescape doubled quotes
printf "%s%s", (i>1?"\t":""), v
}
print ""
}
' input.csv > output.tsv
# ── Sort TSV by column ────────────────────────────────────────
sort -t$'\t' -k3,3 employees.tsv # sort by col 3 (alphabetic)
sort -t$'\t' -k4,4n -r employees.tsv # sort by col 4 (numeric, desc)
# ── join two TSV files on a common key ────────────────────────
# employees.tsv: id name dept_id
# departments.tsv: dept_id dept_name budget
# Both must be sorted on the join field first
join -t$'\t' -1 3 -2 1 \
<(sort -t$'\t' -k3,3 employees.tsv) \
<(sort -t$'\t' -k1,1 departments.tsv)
5 — INI and Config File Parsing
INI files are ubiquitous: /etc/mysql/my.cnf, ~/.gitconfig, php.ini, smb.conf, countless application configs. The format has no strict standard but the conventions are consistent: [section] headers, key=value or key = value pairs, and # or ; line comments. The traps are whitespace around the =, inline comments, multi-line values, and section awareness.
# Sample config.ini:
# [database]
# host = db.example.com
# port = 5432
# name = myapp_prod # production DB
#
# [cache]
# host = redis.example.com
# port = 6379
# ── Get a single value: ini_get FILE SECTION KEY ──────────────
ini_get() {
local file="$1" section="$2" key="$3"
awk -v s="$section" -v k="$key" '
/^\[/ {
gsub(/[\[\]]/, "")
cur = $0
next
}
cur == s && /=/ {
# Split on first = only
eq = index($0, "=")
key_part = substr($0, 1, eq-1)
val_part = substr($0, eq+1)
# Trim whitespace
gsub(/^[[:space:]]+|[[:space:]]+$/, "", key_part)
gsub(/^[[:space:]]+/, "", val_part)
# Strip inline comment (# or ;)
sub(/[[:space:]]+[#;].*$/, "", val_part)
# Trim trailing whitespace
gsub(/[[:space:]]+$/, "", val_part)
if (key_part == k) { print val_part; exit }
}
' "$file"
}
db_host=$(ini_get config.ini database host)
db_port=$(ini_get config.ini database port)
db_name=$(ini_get config.ini database name)
echo "Connecting to $db_host:$db_port/$db_name"
# ── Load an entire section into bash variables ─────────────────
ini_load_section() {
local file="$1" section="$2" prefix="${3:-}"
local k v
while IFS='=' read -r k v; do
# Strip whitespace and inline comments
k="${k// /}"
v="${v#"${v%%[![:space:]]*}"}" # ltrim
v="${v%%[[:space:]]*#*}" # strip inline comment
v="${v%"${v##*[![:space:]]}"}" # rtrim
[[ -z "$k" ]] && continue
declare -g "${prefix}${k}=${v}"
done < <(awk -v s="$section" '
/^\[/ { gsub(/[\[\]]/,""); cur=$0; next }
cur==s && /=/ { print }
cur==s && /^\s*($|[#;])/ { next }
' "$file")
}
# Loads database.host, database.port, etc. as shell vars with prefix
ini_load_section config.ini database db_
echo "host=$db_host port=$db_port"
Writing and updating INI values
# ini_set FILE SECTION KEY VALUE
# Updates the key if it exists; inserts it if it doesn't
ini_set() {
local file="$1" section="$2" key="$3" value="$4"
if grep -q "^\[${section}\]" "$file"; then
# Section exists — try to update existing key
if awk -v s="$section" -v k="$key" '
/^\[/ { cur=substr($0,2,length($0)-2) }
cur==s && $0 ~ "^[[:space:]]*"k"[[:space:]]*=" { found=1 }
END { exit !found }
' "$file"; then
# Key exists — update it with sed
sed -i "/^\[${section}\]/,/^\[/" \
-e "/^\[${section}\]/,/^\[/ s|^\([[:space:]]*${key}[[:space:]]*=\).*|\1 ${value}|" \
"$file"
else
# Key missing — append to section
sed -i "/^\[${section}\]/a ${key} = ${value}" "$file"
fi
else
# Section missing — append at end of file
printf '\n[%s]\n%s = %s\n' "$section" "$key" "$value" >> "$file"
fi
}
ini_set config.ini database port 5433
ini_set config.ini database timeout 30 # new key
ini_set config.ini newapp debug true # new section + key
# ── List all sections in a file ───────────────────────────────
ini_sections() {
grep -oP '(?<=^\[)[^\]]*' "$1"
}
# ── List all keys in a section ────────────────────────────────
ini_keys() {
awk -v s="$2" '
/^\[/ { gsub(/[\[\]]/,""); cur=$0; next }
cur==s && /=/ {
eq=index($0,"="); k=substr($0,1,eq-1)
gsub(/^[[:space:]]+|[[:space:]]+$/,"",k)
print k
}
' "$1"
}
ini_sections config.ini → database
cache
ini_keys config.ini database → host
port
name
6 — Key=Value and .env Files
The simpler cousin of INI: a flat file of KEY=VALUE pairs with no sections. Docker Compose, systemd unit EnvironmentFile=, and countless deployment tools use this format. Bash can source it directly — but only after sanitising it, because a malicious or malformed .env file can execute arbitrary code on source.
# ── UNSAFE: direct source runs any code in the file ───────────
# source .env ← NEVER do this with untrusted files
# . .env ← same risk
# ── Safe: parse and export only valid KEY=VALUE lines ─────────
load_env() {
local envfile="${1:-.env}"
[[ -f "$envfile" ]] || { echo "$envfile not found" >&2; return 1; }
local key val line
while IFS= read -r line; do
# Skip blanks and comments
[[ "$line" =~ ^[[:space:]]*($|#) ]] && continue
# Must match KEY=VALUE with valid identifier key
[[ "$line" =~ ^([A-Za-z_][A-Za-z0-9_]*)=(.*)$ ]] || continue
key="${BASH_REMATCH[1]}"
val="${BASH_REMATCH[2]}"
# Strip optional surrounding quotes
[[ "$val" =~ ^\"(.*)\"$ ]] && val="${BASH_REMATCH[1]}"
[[ "$val" =~ ^\'(.*)\'$ ]] && val="${BASH_REMATCH[1]}"
# Strip inline comment (only outside quotes)
val="${val%%[[:space:]]*\#*}"
export "${key}=${val}"
done < "$envfile"
}
load_env .env.production
echo "DB_HOST=$DB_HOST"
# ── source is fine when you wrote the file yourself ───────────
# In CI or deploy scripts, sourcing a file you generated is OK
# Just never source files from user input or external sources
# ── Generate a .env file ──────────────────────────────────────
write_env() {
local file="$1"; shift
# Args: KEY VALUE KEY VALUE ...
{
printf '# Generated %s\n' "$(date -Is)"
while (( $# >= 2 )); do
k="$1" v="$2"; shift 2
# Quote if value contains spaces, #, or special chars
[[ "$v" =~ [[:space:]#\"] ]] && v='"'"${v//\"/\\\"}"'"'
printf '%s=%s\n' "$k" "$v"
done
} > "$file"
chmod 600 "$file" # env files often contain secrets
}
write_env .env.staging \
DB_HOST staging-db.internal \
DB_PORT 5432 \
APP_SECRET "my secret value" \
DEBUG false
7 — Generating Structured Reports
A common pattern in ops scripts: collect data from multiple sources, assemble it into a structured table, and emit it as either a human-readable aligned table or a machine-readable CSV. The key is separating data collection from formatting.
#!/usr/bin/env bash
# disk_report.sh — show disk usage for a list of paths
set -euo pipefail
FORMAT="${FORMAT:-table}" # table | csv | tsv
PATHS=( "${@:-/}" )
# ── Column definitions: name, width, awk-expression ───────────
declare -A COL_W=( [mount]=20 [total]=8 [used]=8 [avail]=8 [pct]=6 )
# ── Collect data from df ───────────────────────────────────────
# df -h outputs: Filesystem Size Used Avail Use% Mounted-on
collect_data() {
df -h "${PATHS[@]}" 2>/dev/null | tail -n +2 | \
awk '{print $6, $2, $3, $4, $5}' # mount total used avail pct
}
# ── Render as aligned table ────────────────────────────────────
render_table() {
local w_m=${COL_W[mount]} w_t=${COL_W[total]}
local w_u=${COL_W[used]} w_a=${COL_W[avail]} w_p=${COL_W[pct]}
local sep; sep=$(printf '%*s' $(( w_m + w_t + w_u + w_a + w_p + 5 )) '' | tr ' ' '─')
printf "\033[1m%-${w_m}s %${w_t}s %${w_u}s %${w_a}s %${w_p}s\033[0m\n" \
MOUNT TOTAL USED AVAIL PCT
printf '%s\n' "$sep"
while read -r mount total used avail pct; do
# Colour-code by usage percentage
pct_n=${pct//%/}
local colour='\033[0m'
(( pct_n >= 90 )) && colour='\033[31m'
(( pct_n >= 75 && pct_n < 90 )) && colour='\033[33m'
printf "%-${w_m}s %${w_t}s %${w_u}s %${w_a}s ${colour}%${w_p}s\033[0m\n" \
"$mount" "$total" "$used" "$avail" "$pct"
done < <(collect_data)
}
# ── Render as CSV ─────────────────────────────────────────────
render_csv() {
csv_row mount total used avail pct
while read -r mount total used avail pct; do
csv_row "$mount" "$total" "$used" "$avail" "$pct"
done < <(collect_data)
}
# ── Dispatch ──────────────────────────────────────────────────
case "$FORMAT" in
table) render_table ;;
csv) render_csv ;;
tsv) render_csv | tr ',' $'\t' ;; # quick for clean data
*) echo "Unknown FORMAT: $FORMAT" >&2; exit 1 ;;
esac
# Usage:
# ./disk_report.sh /var /home /tmp
# FORMAT=csv ./disk_report.sh / /var
8 — Quick Reference
| Task | Tool / Pattern | Notes |
|---|---|---|
| Parse CSV with quoted fields | gawk FPAT='([^,]*)|("([^"]|"")*")' | GNU awk only |
| Simple CSV split (no quotes) | IFS=',' read -r f1 f2 f3 | Only safe when no quoted commas |
| Extract CSV column by name | gawk -v FPAT=... -v col="name" 'NR==1{...} {print $colidx}' | |
| Sort TSV by column N | sort -t$'\t' -kN,N | Add n flag for numeric sort |
| Join two TSV files | join -t$'\t' -1 N -2 M file1 file2 | Files must be pre-sorted |
| Read INI value | ini_get FILE SECTION KEY | awk-based function |
| Update INI value | ini_set FILE SECTION KEY VALUE | Uses sed in-place |
| Load .env safely | Regex-validate keys, then export "$key=$val" | Never source untrusted files |
| Quote a CSV field | val="${val//\"/\"\"}"; printf '"%s"' "$val" | Always safe to over-quote |
| CSV → TSV (no embedded commas) | tr ',' $'\t' | |
| CSV → TSV (with quoted commas) | gawk FPAT=... + gsub + printf with \t |
csvkit (pip install csvkit) — it provides csvcut, csvjoin, csvstat, csvsql, and handles encoding, BOM markers, and edge cases you haven't thought of yet. For Python users, pandas and its read_csv() are the standard. Shell CSV parsing is the right tool for quick one-offs and environments where you can't install extra packages.✏️ Exercises
Each exercise builds a utility you'd actually reach for. Test with edge cases: values containing commas, empty fields, sections with missing keys, files with Windows-style CRLF line endings.
csv_summary() that accepts a CSV file path and prints a summary of its structure: number of rows (excluding header), number of columns, each column name with its index, and for numeric columns the min, max, and average. Use gawk with FPAT so it handles quoted fields correctly./^-?[0-9.]+$/. Track running sum, min, and max per column in arrays. Print the summary in the END block.csv_summary() {
local file="${1:?csv_summary: file required}"
[[ -f "$file" ]] || { echo "Not found: $file" >&2; return 1; }
gawk '
BEGIN { FPAT = "([^,]*)(\"([^\"]|\"\")*\")" }
function strip(s) {
gsub(/^[[:space:]"]+|[[:space:]"]+$/, "", s)
return s
}
NR == 1 {
ncols = NF
for (i=1; i<=NF; i++) hdr[i] = strip($i)
next
}
{
rows++
for (i=1; i<=NF; i++) {
v = strip($i)
if (v ~ /^-?[0-9]+(\.[0-9]+)?$/) {
is_num[i] = 1
sum[i] += v
if (!(i in mn) || v+0 < mn[i]) mn[i] = v+0
if (!(i in mx) || v+0 > mx[i]) mx[i] = v+0
}
}
}
END {
print "File: " FILENAME
print "Rows: " rows
print "Columns: " ncols
print ""
printf "%-4s %-20s %-10s %s\n", "IDX", "COLUMN", "TYPE", "STATS"
printf "%s\n", "─────────────────────────────────────────────────────"
for (i=1; i<=ncols; i++) {
if (is_num[i]) {
avg = (rows > 0) ? sum[i] / rows : 0
printf "%-4d %-20s %-10s min=%-8g max=%-8g avg=%.2f\n",
i, hdr[i], "numeric", mn[i], mx[i], avg
} else {
printf "%-4d %-20s %-10s\n", i, hdr[i], "text"
}
}
}
' "$file"
}
# Test:
printf 'id,name,score,city\n1,Alice,92,"New York"\n2,Bob,78,"Paris"\n3,Carol,85,London\n' > /tmp/test.csv
csv_summary /tmp/test.csv
csv_filter.sh that filters rows from a CSV file based on a column name and a value. Usage: csv_filter.sh FILE COLUMN VALUE. It should print the header row, then all rows where the named column exactly matches VALUE (case-insensitive). Handle quoted fields correctly and strip surrounding quotes when comparing. Support an optional --contains flag that does a substring match instead of an exact match.--contains flag in a variable passed via -v. Use tolower() in awk for case-insensitive comparison.#!/usr/bin/env bash
# csv_filter.sh [--contains] FILE COLUMN VALUE
set -euo pipefail
MATCH_TYPE="exact"
if [[ "${1:-}" == "--contains" ]]; then
MATCH_TYPE="contains"
shift
fi
FILE="${1:?Usage: csv_filter.sh [--contains] FILE COLUMN VALUE}"
COL="${2:?column name required}"
VAL="${3:?value required}"
[[ -f "$FILE" ]] || { echo "Error: $FILE not found" >&2; exit 1; }
gawk -v col="$COL" -v val="$VAL" -v mtype="$MATCH_TYPE" '
BEGIN {
FPAT = "([^,]*)(\"([^\"]|\"\")*\")"
colidx = 0
val_l = tolower(val)
}
function strip(s) {
gsub(/^[[:space:]"]+|[[:space:]"]+$/, "", s)
gsub(/""/, "\"", s)
return s
}
NR == 1 {
for (i=1; i<=NF; i++) {
h = strip($i)
if (tolower(h) == tolower(col)) { colidx = i }
}
if (!colidx) {
print "Error: column \"" col "\" not found" > "/dev/stderr"
exit 1
}
print # always print header
next
}
{
if (!colidx) next
field = strip($colidx)
field_l = tolower(field)
if (mtype == "exact" && field_l == val_l) print
else if (mtype == "contains" && index(field_l, val_l)) print
}
' "$FILE"
lib_config.sh designed to be sourced by other scripts. It should provide: config_load FILE (loads all sections into CONF[section.key] associative array), config_get SECTION KEY [DEFAULT] (retrieves a value, returns DEFAULT if missing), config_require SECTION KEY (dies with an error if the key is absent), and config_dump (prints all loaded values sorted). Include a self-test that runs when the file is executed directly.declare -gA CONF to create a global associative array. The key format is "${section}.${key}". In config_get, use "${CONF[$section.$key]:-$default}". The ${BASH_SOURCE[0]} == $0 trick from Topic 4 triggers the self-test.#!/usr/bin/env bash
# lib_config.sh — INI config library
# Source this file; if executed directly it runs self-tests
declare -gA CONF=()
_CONF_LOADED=""
# config_load FILE — parse INI into CONF[section.key]
config_load() {
local file="${1:?config_load: file required}"
[[ -f "$file" ]] || { printf 'config_load: %s not found\n' "$file" >&2; return 1; }
CONF=() # reset
_CONF_LOADED="$file"
local cur_section=""
local line k v
while IFS= read -r line; do
# Remove CR (handle Windows CRLF)
line="${line//$'\r'/}"
# Skip blank lines and comments
[[ "$line" =~ ^[[:space:]]*($|[#;]) ]] && continue
# Section header
if [[ "$line" =~ ^\[([A-Za-z0-9_.-]+)\] ]]; then
cur_section="${BASH_REMATCH[1]}"
continue
fi
# key = value (only inside a section)
[[ -z "$cur_section" ]] && continue
[[ "$line" =~ ^[[:space:]]*([A-Za-z0-9_./-]+)[[:space:]]*=[[:space:]]*(.*) ]] || continue
k="${BASH_REMATCH[1]}"
v="${BASH_REMATCH[2]}"
# Strip inline comment
v="${v%%[[:space:]][#;]*}"
# Strip trailing whitespace
v="${v%"${v##*[![:space:]]}"}"
# Strip surrounding quotes
[[ "$v" =~ ^\"(.*)\"$ ]] && v="${BASH_REMATCH[1]}"
[[ "$v" =~ ^\'(.*)\'$ ]] && v="${BASH_REMATCH[1]}"
CONF["${cur_section}.${k}"]="$v"
done < "$file"
}
# config_get SECTION KEY [DEFAULT]
config_get() {
local sec="$1" key="$2" default="${3:-}"
local ckey="${sec}.${key}"
if [[ -v "CONF[$ckey]" ]]; then
printf '%s' "${CONF[$ckey]}"
else
printf '%s' "$default"
fi
}
# config_require SECTION KEY — exits with error if missing
config_require() {
local sec="$1" key="$2"
[[ -v "CONF[${sec}.${key}]" ]] || {
printf 'config_require: [%s] %s is required but not set\n' "$sec" "$key" >&2
return 1
}
printf '%s' "${CONF[${sec}.${key}]}"
}
# config_dump — print all loaded values
config_dump() {
printf '# Config loaded from: %s\n' "$_CONF_LOADED"
local k
for k in $(printf '%s\n' "${!CONF[@]}" | sort); do
printf '%-40s = %s\n' "$k" "${CONF[$k]}"
done
}
# ── Self-test (runs only when executed, not sourced) ───────────
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
tmpf=$(mktemp /tmp/lib_config_test.XXXXXX.ini)
trap "rm -f '$tmpf'" EXIT
cat > "$tmpf" <<'EOF'
[database]
host = db.example.com # primary DB
port = 5432
name = "myapp prod"
[cache]
host = redis.example.com
port = 6379
EOF
pass=0; fail=0
assert_eq() {
if [[ "$1" == "$2" ]]; then
printf '\033[32m✓\033[0m %s\n' "$3"; (( pass++ ))
else
printf '\033[31m✗\033[0m %s (got "%s", want "%s")\n' "$3" "$1" "$2"; (( fail++ ))
fi
}
config_load "$tmpf"
assert_eq "$(config_get database host)" "db.example.com" "host without inline comment"
assert_eq "$(config_get database port)" "5432" "port as string"
assert_eq "$(config_get database name)" "myapp prod" "quoted value"
assert_eq "$(config_get cache host)" "redis.example.com" "cache host"
assert_eq "$(config_get database missing default_val)" \
"default_val" "missing key returns default"
assert_eq "$(config_get nosec nokey fallback)" \
"fallback" "missing section returns default"
echo; config_dump
printf '\nResults: %d passed, %d failed\n' "$pass" "$fail"
(( fail == 0 ))
fi
csv_join.sh that performs a left join on two CSV files, matching rows by a common key column. Usage: csv_join.sh FILE1 COL1 FILE2 COL2. The output should have all columns from FILE1, followed by all non-key columns from FILE2. Rows in FILE1 with no match in FILE2 should still appear, with empty strings for the FILE2 columns. Handle quoted fields throughout.-v. To find column indices, process NR==1 of each file separately using BEGINFILE or by running two awk programs.#!/usr/bin/env bash
# csv_join.sh FILE1 COL1 FILE2 COL2
# Left join: all rows from FILE1, matched with FILE2 on key columns
set -euo pipefail
F1="${1:?Usage: csv_join.sh FILE1 COL1 FILE2 COL2}"
C1="${2:?col1 required}"
F2="${3:?file2 required}"
C2="${4:?col2 required}"
for f in "$F1" "$F2"; do
[[ -f "$f" ]] || { printf 'Not found: %s\n' "$f" >&2; exit 1; }
done
gawk \
-v f1="$F1" -v c1="$C1" \
-v f2="$F2" -v c2="$C2" \
'
BEGIN {
FPAT = "([^,]*)(\"([^\"]|\"\")*\")"
FS = ","
}
function strip(s, v) {
v = s
gsub(/^[[:space:]"]+|[[:space:]"]+$/, "", v)
gsub(/""/, "\"", v)
return v
}
function csv_field(s, v) {
# Always quote output for safety
v = s; gsub(/"/, "\"\"", v)
return "\"" v "\""
}
# ── Pass 1: load FILE2 into memory ─────────────────────────
BEGINFILE {
if (FILENAME == f2) in_f2 = 1
else in_f2 = 0
}
in_f2 && FNR == 1 {
for (i=1; i<=NF; i++) {
h = strip($i)
if (tolower(h) == tolower(c2)) { f2_key_idx = i }
f2_hdr[i] = h
}
f2_ncols = NF
next
}
in_f2 {
key = strip($f2_key_idx)
row = ""
for (i=1; i<=NF; i++) {
if (i == f2_key_idx) continue # exclude key col from f2 output
row = row (row ? SUBSEP : "") strip($i)
}
f2_data[key] = row
next
}
# ── Pass 2: process FILE1, join with FILE2 data ─────────────
!in_f2 && FNR == 1 {
for (i=1; i<=NF; i++) {
h = strip($i)
if (tolower(h) == tolower(c1)) f1_key_idx = i
f1_hdr[i] = h
}
f1_ncols = NF
# Print merged header
out = ""
for (i=1; i<=f1_ncols; i++) out = out (out?",":"") csv_field(f1_hdr[i])
for (i=1; i<=f2_ncols; i++) {
if (i == f2_key_idx) continue
out = out "," csv_field(f2_hdr[i])
}
print out
next
}
!in_f2 {
key = strip($f1_key_idx)
out = ""
for (i=1; i<=f1_ncols; i++) out = out (out?",":"") csv_field(strip($i))
if (key in f2_data) {
n = split(f2_data[key], parts, SUBSEP)
for (i=1; i<=n; i++) out = out "," csv_field(parts[i])
} else {
# No match: fill with empty fields
empties = f2_ncols - 1 # -1 for the key col we excluded
for (i=1; i<=empties; i++) out = out ","
}
print out
}
' "$F2" "$F1" # F2 first so it is loaded before F1 is processed
# Example:
# employees.csv: id,name,dept_id
# departments.csv: dept_id,dept_name,budget
# ./csv_join.sh employees.csv dept_id departments.csv dept_id