Advanced Array Patterns
Chapter 3 — Advanced Array Patterns
Bash arrays go far deeper than iteration and length. This chapter covers the patterns that turn arrays into real data structures: sparse indexing, simulated multidimensional layouts, associative-array techniques, set algebra, in-place sorting, and serialisation strategies that survive process boundaries. All examples assume Bash 4.2 or later.
1 — Indexed Array Internals: Sparse Arrays
Bash indexed arrays are sparse — indices need not be contiguous, and a large highest index does not allocate intermediate memory. This is different from arrays in most languages and enables some useful patterns but also some surprising traps.
# Sparse assignment — gaps are fine arr[0]=first arr[100]=hundredth echo "${#arr[@]}" # 2 — counts ELEMENTS, not highest index echo "${!arr[@]}" # 0 100 — the actual indices (keys) # Length vs highest index trap arr=( a b c ) arr[99]=z echo "${#arr[@]}" # 4 — four elements echo "${arr[-1]}" # z — negative indices count from highest key (bash 4.3+) # Iterating sparse arrays safely — always use ${!arr[@]} for i in "${!arr[@]}"; do printf '[%d] = %s\n' "$i" "${arr[$i]}" done # Dangerous: for (( i=0; i<${#arr[@]}; i++ )) skips sparse gaps # Safe: iterate keys with ${!arr[@]} # Compact a sparse array back to contiguous 0-based compact=( "${arr[@]}" ) # re-index as 0,1,2,... — values preserved, gaps removed
Unsetting and re-packing
arr=( a b c d e ) unset 'arr[2]' # removes index 2 — array is now sparse: 0,1,3,4 echo "${arr[@]}" # a b d e — gap is invisible in values echo "${!arr[@]}" # 0 1 3 4 # Remove element by value — filter into a new array array_remove_value() { local -n __arr="$1" local __target="$2" local -a __new=() local __elem for __elem in "${__arr[@]}"; do [[ $__elem != "$__target" ]] && __new+=( "$__elem" ) done __arr=( "${__new[@]}" ) } fruits=( apple banana cherry banana ) array_remove_value fruits banana printf '%s\n' "${fruits[@]}" # apple cherry
2 — Simulating 2D Arrays
Bash has no multidimensional arrays. The standard workaround encodes two indices into a single key using a separator that cannot appear in the data.
Associative array with compound key
declare -A grid # Store grid[row,col] using comma as separator grid_set() { local -n __g="$1" __g["${2},${3}"]="$4" } grid_get() { local -n __g="$1" printf '%s' "${__g["${2},${3}"]}" } grid_set grid 0 0 'top-left' grid_set grid 0 1 'top-right' grid_set grid 1 0 'bottom-left' grid_get grid 0 1 # top-right # Iterate all cells: parse the compound key for key in "${!grid[@]}"; do row="${key%%,*}" col="${key##*,}" printf 'grid[%s][%s] = %s\n' "$row" "$col" "${grid[$key]}" done
Maps of arrays via name-prefix convention
When you need a true map-of-arrays (e.g. a graph's adjacency list), use a naming convention to synthesise array names dynamically.
# Adjacency list: graph_edges_NODE is an array of neighbours graph_add_edge() { local from="$1" to="$2" local arrname="graph_edges_${from}" declare -ga "$arrname" # declare global array if not yet existing local -n __edges="$arrname" __edges+=( "$to" ) } graph_neighbours() { local arrname="graph_edges_${1}" local -n __edges="$arrname" printf '%s\n' "${__edges[@]}" } graph_add_edge A B graph_add_edge A C graph_add_edge B D graph_neighbours A # B C graph_neighbours B # D
3 — Associative Array Techniques
Default values and the -v test
declare -A counts # Increment a counter — works even if key doesn't exist yet counter_inc() { local -n __map="$1" local __key="$2" (( __map["$__key"]++ )) || true # || true: (()) exits 1 when result is 0 } while IFS= read -r word; do counter_inc counts "$word" done < words.txt # Print sorted by count descending for word in "${!counts[@]}"; do printf '%d\t%s\n' "${counts[$word]}" "$word" done | sort -rn # Test if a key exists (vs testing if value is empty) [[ -v 'counts[hello]' ]] && echo "key exists" # Note: quote the subscript to prevent glob expansion of the key
Copying and merging associative arrays
# Deep copy — no built-in; iterate and assign assoc_copy() { local -n __src="$1" __dst="$2" local __k for __k in "${!__src[@]}"; do __dst["$__k"]="${__src[$__k]}" done } # Merge b into a — b wins on key conflicts assoc_merge() { assoc_copy "$2" "$1" # copy src (b) into dst (a) — overwrites existing keys } declare -A defaults=( [color]=blue [size]=medium [weight]=light ) declare -A overrides=( [color]=red [extra]=yes ) declare -A config assoc_copy defaults config assoc_merge config overrides # config: color=red size=medium weight=light extra=yes
Inverting a map
assoc_invert() { local -n __src="$1" __dst="$2" local __k for __k in "${!__src[@]}"; do __dst["${__src[$__k]}"]="$__k" done } declare -A code_to_name=( [US]='United States' [DE]=Germany [JP]=Japan ) declare -A name_to_code assoc_invert code_to_name name_to_code echo "${name_to_code[Germany]}" # DE
4 — Set Operations
Treating arrays as mathematical sets — union, intersection, and difference — is a common need when comparing lists of servers, features, or packages.
Union
array_union() { # Stores union of arrays $1 and $2 into array $3 local -n __a="$1" __b="$2" __out="$3" declare -A __seen local __e __out=() for __e in "${__a[@]}" "${__b[@]}"; do if [[ ! -v '__seen[$__e]' ]]; then __out+=( "$__e" ) __seen["$__e"]=1 fi done }
Intersection
array_intersect() { local -n __a="$1" __b="$2" __out="$3" declare -A __in_b local __e __out=() for __e in "${__b[@]}"; do __in_b["$__e"]=1; done for __e in "${__a[@]}"; do [[ -v '__in_b[$__e]' ]] && __out+=( "$__e" ) done }
Difference (A minus B)
array_diff() { local -n __a="$1" __b="$2" __out="$3" declare -A __in_b local __e __out=() for __e in "${__b[@]}"; do __in_b["$__e"]=1; done for __e in "${__a[@]}"; do [[ ! -v '__in_b[$__e]' ]] && __out+=( "$__e" ) done } # Demonstration installed=( curl wget git vim tmux ) required=( git vim jq fzf ) declare -a missing common all_tools array_diff required installed missing array_intersect required installed common array_union required installed all_tools printf 'Missing: %s\n' "${missing[@]}" # jq fzf printf 'Common: %s\n' "${common[@]}" # git vim printf 'All: %s\n' "${all_tools[@]}" # git vim jq fzf curl wget tmux
declare -A inside a function: the
-A inside the set-operation functions is local because
declare inside a function creates a local variable by default.
The -g flag would make it global — do not add it here.
5 — In-Place Sorting
Bash provides no built-in sort for arrays. The idiomatic solution routes values through the external sort command; for pure-Bash sorting (no fork), use a simple insertion sort or use readarray + sort.
Sort via sort command — simple and fast
array_sort() { local -n __arr="$1"; shift # Pass remaining args to sort (e.g. -r -n -u) local -a __sorted mapfile -t __sorted < <(printf '%s\n' "${__arr[@]}" | sort "$@") __arr=( "${__sorted[@]}" ) } names=( Charlie Alice Bob Delta Alice ) array_sort names # ascending lexicographic printf '%s\n' "${names[@]}" # Alice Alice Bob Charlie Delta array_sort names -ru # reverse, unique printf '%s\n' "${names[@]}" # Delta Charlie Bob Alice nums=( 10 3 200 45 7 ) array_sort nums -n # numeric sort printf '%s\n' "${nums[@]}" # 3 7 10 45 200
Sort by a derived key (Schwartzian transform in Bash)
# Sort files by their size without calling stat repeatedly sort_files_by_size() { local -n __arr="$1" local -a __tagged __sorted local __f # Tag each filename with its size: "SIZE\tFILENAME" for __f in "${__arr[@]}"; do __tagged+=( "$(stat -c '%s' "$__f" 2>/dev/null || echo 0)$'\t'$__f" ) done # Sort numerically, then strip the tag mapfile -t __sorted < <( printf '%s\n' "${__tagged[@]}" | sort -n | cut -f2 ) __arr=( "${__sorted[@]}" ) } files=( /etc/hostname /etc/passwd /etc/shells ) sort_files_by_size files printf '%s\n' "${files[@]}"
Pure-Bash insertion sort (no fork)
array_isort() { # In-place insertion sort — O(n²), use only for small arrays (<100 elements) local -n __a="$1" local __i __j __key for (( __i=1; __i < "${#__a[@]}"; __i++ )); do __key="${__a[$__i]}" __j=$(( __i - 1 )) while (( __j >= 0 )) && [[ "${__a[$__j]}" > "$__key" ]]; do __a[$(( __j + 1 ))]="${__a[$__j]}" (( __j-- )) || true done __a[$(( __j + 1 ))]="$__key" done } words=( banana apple cherry date elderberry ) array_isort words printf '%s\n' "${words[@]}" # apple banana cherry date elderberry
6 — Array Serialisation and Deserialisation
Arrays exist only within a process. To pass them across process boundaries — into subshells, to other scripts, or into files — you must serialise them. There is no single universally correct format; choose based on what the data can contain.
Method 1: declare -p round-trip
# declare -p produces valid Bash syntax that can be re-sourced arr=( "hello world" "foo's bar" 'line1\nline2' ) declare -p arr declare -a arr=([0]="hello world" [1]="foo's bar" [2]=$'line1\\nline2') # Save to file and restore in another script declare -p arr > /tmp/arr.bash # In another script: source /tmp/arr.bash # arr is now restored with identical contents # For associative arrays — identical pattern declare -A map=( [key1]=val1 [key2]="with spaces" ) declare -p map > /tmp/map.bash
source serialised data from an untrusted origin.
declare -p output is executable shell code.
Use NUL-delimited or JSON serialisation instead when the data comes from outside.
Method 2: NUL-delimited (safe for arbitrary content)
# Serialise: print each element followed by NUL array_save_nul() { local -n __a="$1" local __file="$2" printf '%s\0' "${__a[@]}" > "$__file" } # Deserialise: read NUL-delimited elements array_load_nul() { local -n __a="$1" local __file="$2" __a=() while IFS= read -r -d '' __elem; do __a+=( "$__elem" ) done < "$__file" } data=( "element with spaces" $'\twith\ttabs' "line1 line2" ) array_save_nul data /tmp/data.nul declare -a restored array_load_nul restored /tmp/data.nul printf '[%s]\n' "${restored[@]}" # identical to data
Method 3: passing arrays across subshells via environment
# Subshells inherit variables but NOT array contents directly. # Use declare -p to export, then eval to import. export_array() { local name="$1" local -n __ref="$1" # Export as a string variable containing the declare -p output export "_EXPORTED_${name}"="$(declare -p "$name")" } import_array() { local name="$1" local varname="_EXPORTED_${name}" eval "${!varname}" # safe: we control the source (declare -p output) } myarr=( one two three ) export_array myarr # In a child process (subshell, bash -c, etc.): bash -c 'source /dev/stdin <<<"${_EXPORTED_myarr}"; printf "%s\n" "${myarr[@]}"'
7 — Stack and Queue Patterns
# Stack (LIFO) — push to end, pop from end stack=() stack_push() { local -n _s="$1"; _s+=( "$2" ); } stack_pop() { local -n _s="$1" (( "${#_s[@]}" > 0 )) || { echo "stack empty" >&2; return 1; } # Return value via a nameref local -n _retval="$2" _retval="${_s[-1]}" unset '_s[-1]' } stack_peek() { local -n _s="$1" _r="$2"; _r="${_s[-1]}"; } stack_push stack alpha stack_push stack beta stack_push stack gamma declare top stack_pop stack top; echo "popped: $top" # gamma stack_pop stack top; echo "popped: $top" # beta # Queue (FIFO) — push to end, dequeue from front queue=() queue_enqueue() { local -n _q="$1"; _q+=( "$2" ); } queue_dequeue() { local -n _q="$1" _r="$2" (( "${#_q[@]}" > 0 )) || { echo "queue empty" >&2; return 1; } _r="${_q[0]}" _q=( "${_q[@]:1}" ) # slice off first element — O(n) } # Note: queue_dequeue is O(n) due to the array re-assignment. # For high-throughput queues, track a head index instead: _queue_head=0 queue_dequeue_fast() { local -n _q="$1" _r="$2" (( _queue_head < "${#_q[@]}" )) || { echo "queue empty" >&2; return 1; } _r="${_q[$_queue_head]}" unset '_q[$_queue_head]' (( _queue_head++ )) || true }
Exercises
Exercise 1 — Frequency map and top-N
Write a function top_n RETARRAY N FILE that reads words from FILE
(one per line), counts their frequency using an associative array, and stores
the top N words (sorted by descending count) in RETARRAY as elements of the form
COUNT:WORD. Use only Bash built-ins plus sort — no awk,
uniq, or cut.
top_n() { local -n __ret="$1" local __n="$2" local __file="$3" declare -A __freq local __word [[ -f "$__file" ]] || { printf 'top_n: file not found: %s\n' "$__file" >&2; return 1; } while IFS= read -r __word; do [[ -z "$__word" ]] && continue (( __freq["$__word"]++ )) || true done < "$__file" # Build "COUNT WORD" lines, sort numerically descending, take top N local -a __lines for __word in "${!__freq[@]}"; do __lines+=( "${__freq[$__word]} $__word" ) done mapfile -t __ret < <( printf '%s\n' "${__lines[@]}" | sort -rn | head -n "$__n" | while read -r __cnt __w; do printf '%s:%s\n' "$__cnt" "$__w" done ) } # Test declare -a result top_n result 5 /usr/share/dict/words # or any word-per-line file printf '%s\n' "${result[@]}"
Exercise 2 — Symmetric difference and subset test
Using the set-operation functions from section 4 as a starting point, implement:
array_symdiff A B RESULT— elements in A or B but not botharray_is_subset A B— returns 0 if every element of A is in B, else 1
Then demonstrate both with a server-role example:
desired=(nginx certbot logrotate) and
installed=(nginx curl wget logrotate).
array_symdiff() { local -n __a="$1" __b="$2" __out="$3" declare -A __in_a __in_b local __e __out=() for __e in "${__a[@]}"; do __in_a["$__e"]=1; done for __e in "${__b[@]}"; do __in_b["$__e"]=1; done for __e in "${__a[@]}"; do [[ ! -v '__in_b[$__e]' ]] && __out+=( "$__e" ) done for __e in "${__b[@]}"; do [[ ! -v '__in_a[$__e]' ]] && __out+=( "$__e" ) done } array_is_subset() { local -n __a="$1" __b="$2" declare -A __in_b local __e for __e in "${__b[@]}"; do __in_b["$__e"]=1; done for __e in "${__a[@]}"; do [[ -v '__in_b[$__e]' ]] || return 1 done return 0 } # Demonstration desired=( nginx certbot logrotate ) installed=( nginx curl wget logrotate ) declare -a diff array_symdiff desired installed diff printf 'Symmetric diff: %s\n' "${diff[@]}" # certbot curl wget if array_is_subset desired installed; then echo "All desired packages are installed" else echo "Some desired packages are missing" declare -a missing array_diff desired installed missing # from section 4 printf 'Install: %s\n' "${missing[@]}" fi
Exercise 3 — Serialisation round-trip
Write two functions array_to_json ARRNAME and
json_to_array JSON ARRNAME that serialise an indexed array to a
JSON array string and back. Requirements:
- Values may contain double quotes, backslashes, and newlines — escape them correctly
- No external tools except
printf(for serialisation); usejqonly for deserialisation - Demonstrate a round-trip: original array → JSON string → restored array, verifying each element matches
array_to_json() { local -n __a="$1" local __out='[' __sep='' __e __escaped for __e in "${__a[@]}"; do # Escape: backslash first, then double-quote, then control chars __escaped="${__e//\\/\\\\}" # \ → \\ __escaped="${__escaped//\"/\\\"}" # " → \" # Replace literal newlines and tabs with JSON escapes __escaped="${__escaped//$'\n'/\\n}" __escaped="${__escaped//$'\t'/\\t}" __out+="${__sep}\"${__escaped}\"" __sep=',' done printf '%s]\n' "$__out" } json_to_array() { local __json="$1" local -n __a="$2" # jq -r outputs one element per line; handle newlines in values via NUL mapfile -t -d '' __a < <(printf '%s' "$__json" | jq -r '.[] | . + " "') # Strip the trailing empty element mapfile adds [[ "${__a[-1]}" == '' ]] && unset '__a[-1]' } # Round-trip test original=( "simple" 'with "quotes"' $'with\nnewline' $'back\\slash' ) json=$(array_to_json original) echo "JSON: $json" declare -a restored json_to_array "$json" restored ok=true for (( i=0; i < "${#original[@]}"; i++ )); do if [[ "${original[$i]}" != "${restored[$i]}" ]]; then printf 'MISMATCH at [%d]\n orig: %q\n rest: %q\n' \ "$i" "${original[$i]}" "${restored[$i]}" ok=false fi done $ok && echo "Round-trip OK"
Exercise 4 — Dependency resolver
Using the adjacency-list pattern from section 2, implement a topological sort
function topo_sort GRAPH_PREFIX NODES_ARRAY RESULT_ARRAY that:
- Takes a name prefix for adjacency arrays (e.g.
deps_so thatdeps_nginx=(openssl pcre)means nginx depends on openssl and pcre) - Performs a depth-first topological sort (dependencies before dependents)
- Detects cycles and prints an error, returning exit code 1
- Stores the sorted install order in RESULT_ARRAY
topo_sort() { local __prefix="$1" local -n __nodes="$2" __result="$3" declare -A __visited # 0=unvisited 1=in-progress 2=done __result=() __dfs() { local __node="$1" case "${__visited[$__node]:-0}" in 2) return 0 ;; # already processed 1) printf 'topo_sort: cycle at %s\n' "$__node" >&2; return 1 ;; esac __visited["$__node"]=1 # mark in-progress # Visit dependencies first local __depname="${__prefix}${__node}" if [[ -v "$__depname" ]]; then local -n __deps="$__depname" local __dep for __dep in "${__deps[@]}"; do __dfs "$__dep" || return 1 done fi __visited["$__node"]=2 __result+=( "$__node" ) } local __n for __n in "${__nodes[@]}"; do __dfs "$__n" || return 1 done } # Define dependency graph via name-prefix arrays deps_nginx=( openssl pcre zlib ) deps_certbot=( openssl python3 ) deps_python3=( zlib ) deps_openssl=( zlib ) deps_zlib=() deps_pcre=() packages=( nginx certbot ) declare -a install_order topo_sort deps_ packages install_order printf 'Install order:\n' printf ' %s\n' "${install_order[@]}" # zlib openssl pcre nginx python3 certbot