Advanced Array Patterns

Chapter 3 — Advanced Array Patterns

Bash arrays go far deeper than iteration and length. This chapter covers the patterns that turn arrays into real data structures: sparse indexing, simulated multidimensional layouts, associative-array techniques, set algebra, in-place sorting, and serialisation strategies that survive process boundaries. All examples assume Bash 4.2 or later.

1 — Indexed Array Internals: Sparse Arrays

Bash indexed arrays are sparse — indices need not be contiguous, and a large highest index does not allocate intermediate memory. This is different from arrays in most languages and enables some useful patterns but also some surprising traps.

# Sparse assignment — gaps are fine
arr[0]=first
arr[100]=hundredth
echo "${#arr[@]}"    # 2 — counts ELEMENTS, not highest index
echo "${!arr[@]}"   # 0 100 — the actual indices (keys)

# Length vs highest index trap
arr=( a b c )
arr[99]=z
echo "${#arr[@]}"    # 4  — four elements
echo "${arr[-1]}"    # z  — negative indices count from highest key (bash 4.3+)

# Iterating sparse arrays safely — always use ${!arr[@]}
for i in "${!arr[@]}"; do
  printf '[%d] = %s\n' "$i" "${arr[$i]}"
done

# Dangerous: for (( i=0; i<${#arr[@]}; i++ )) skips sparse gaps
# Safe: iterate keys with ${!arr[@]}

# Compact a sparse array back to contiguous 0-based
compact=( "${arr[@]}" )   # re-index as 0,1,2,... — values preserved, gaps removed

Unsetting and re-packing

arr=( a b c d e )
unset 'arr[2]'          # removes index 2 — array is now sparse: 0,1,3,4
echo "${arr[@]}"        # a b d e  — gap is invisible in values
echo "${!arr[@]}"       # 0 1 3 4

# Remove element by value — filter into a new array
array_remove_value() {
  local -n __arr="$1"
  local __target="$2"
  local -a __new=()
  local __elem
  for __elem in "${__arr[@]}"; do
    [[ $__elem != "$__target" ]] && __new+=( "$__elem" )
  done
  __arr=( "${__new[@]}" )
}

fruits=( apple banana cherry banana )
array_remove_value fruits banana
printf '%s\n' "${fruits[@]}"   # apple  cherry

2 — Simulating 2D Arrays

Bash has no multidimensional arrays. The standard workaround encodes two indices into a single key using a separator that cannot appear in the data.

Associative array with compound key

declare -A grid

# Store grid[row,col] using comma as separator
grid_set() {
  local -n __g="$1"
  __g["${2},${3}"]="$4"
}
grid_get() {
  local -n __g="$1"
  printf '%s' "${__g["${2},${3}"]}"
}

grid_set grid 0 0 'top-left'
grid_set grid 0 1 'top-right'
grid_set grid 1 0 'bottom-left'

grid_get grid 0 1   # top-right

# Iterate all cells: parse the compound key
for key in "${!grid[@]}"; do
  row="${key%%,*}"
  col="${key##*,}"
  printf 'grid[%s][%s] = %s\n' "$row" "$col" "${grid[$key]}"
done

Maps of arrays via name-prefix convention

When you need a true map-of-arrays (e.g. a graph's adjacency list), use a naming convention to synthesise array names dynamically.

# Adjacency list: graph_edges_NODE is an array of neighbours
graph_add_edge() {
  local from="$1" to="$2"
  local arrname="graph_edges_${from}"
  declare -ga "$arrname"    # declare global array if not yet existing
  local -n __edges="$arrname"
  __edges+=( "$to" )
}

graph_neighbours() {
  local arrname="graph_edges_${1}"
  local -n __edges="$arrname"
  printf '%s\n' "${__edges[@]}"
}

graph_add_edge A B
graph_add_edge A C
graph_add_edge B D
graph_neighbours A    # B  C
graph_neighbours B    # D

3 — Associative Array Techniques

Default values and the `-v` test

declare -A counts

# Increment a counter — works even if key doesn't exist yet
counter_inc() {
  local -n __map="$1"
  local __key="$2"
  (( __map["$__key"]++ )) || true   # || true: (()) exits 1 when result is 0
}

while IFS= read -r word; do
  counter_inc counts "$word"
done < words.txt

# Print sorted by count descending
for word in "${!counts[@]}"; do
  printf '%d\t%s\n' "${counts[$word]}" "$word"
done | sort -rn

# Test if a key exists (vs testing if value is empty)
[[ -v 'counts[hello]' ]] && echo "key exists"
# Note: quote the subscript to prevent glob expansion of the key

Copying and merging associative arrays

# Deep copy — no built-in; iterate and assign
assoc_copy() {
  local -n __src="$1" __dst="$2"
  local __k
  for __k in "${!__src[@]}"; do
    __dst["$__k"]="${__src[$__k]}"
  done
}

# Merge b into a — b wins on key conflicts
assoc_merge() {
  assoc_copy "$2" "$1"   # copy src (b) into dst (a) — overwrites existing keys
}

declare -A defaults=( [color]=blue [size]=medium [weight]=light )
declare -A overrides=( [color]=red [extra]=yes )
declare -A config
assoc_copy  defaults  config
assoc_merge config    overrides
# config: color=red size=medium weight=light extra=yes

Inverting a map

assoc_invert() {
  local -n __src="$1" __dst="$2"
  local __k
  for __k in "${!__src[@]}"; do
    __dst["${__src[$__k]}"]="$__k"
  done
}

declare -A code_to_name=( [US]='United States' [DE]=Germany [JP]=Japan )
declare -A name_to_code
assoc_invert code_to_name name_to_code
echo "${name_to_code[Germany]}"   # DE

4 — Set Operations

Treating arrays as mathematical sets — union, intersection, and difference — is a common need when comparing lists of servers, features, or packages.

Union

array_union() {
  # Stores union of arrays $1 and $2 into array $3
  local -n __a="$1" __b="$2" __out="$3"
  declare -A __seen
  local __e
  __out=()
  for __e in "${__a[@]}" "${__b[@]}"; do
    if [[ ! -v '__seen[$__e]' ]]; then
      __out+=( "$__e" )
      __seen["$__e"]=1
    fi
  done
}

Intersection

array_intersect() {
  local -n __a="$1" __b="$2" __out="$3"
  declare -A __in_b
  local __e
  __out=()
  for __e in "${__b[@]}"; do __in_b["$__e"]=1; done
  for __e in "${__a[@]}"; do
    [[ -v '__in_b[$__e]' ]] && __out+=( "$__e" )
  done
}

Difference (A minus B)

array_diff() {
  local -n __a="$1" __b="$2" __out="$3"
  declare -A __in_b
  local __e
  __out=()
  for __e in "${__b[@]}"; do __in_b["$__e"]=1; done
  for __e in "${__a[@]}"; do
    [[ ! -v '__in_b[$__e]' ]] && __out+=( "$__e" )
  done
}

# Demonstration
installed=( curl wget git vim tmux )
required=( git vim jq fzf )

declare -a missing common all_tools
array_diff      required  installed  missing
array_intersect required  installed  common
array_union     required  installed  all_tools

printf 'Missing: %s\n'   "${missing[@]}"    # jq  fzf
printf 'Common:  %s\n'   "${common[@]}"     # git  vim
printf 'All:     %s\n'   "${all_tools[@]}"  # git  vim  jq  fzf  curl  wget  tmux

Using a temporary declare -A inside a function: the -A inside the set-operation functions is local because declare inside a function creates a local variable by default. The -g flag would make it global — do not add it here.

5 — In-Place Sorting

Bash provides no built-in sort for arrays. The idiomatic solution routes values through the external sort command; for pure-Bash sorting (no fork), use a simple insertion sort or use readarray + sort.

Sort via `sort` command — simple and fast

array_sort() {
  local -n __arr="$1"; shift
  # Pass remaining args to sort (e.g. -r -n -u)
  local -a __sorted
  mapfile -t __sorted < <(printf '%s\n' "${__arr[@]}" | sort "$@")
  __arr=( "${__sorted[@]}" )
}

names=( Charlie Alice Bob Delta Alice )
array_sort names           # ascending lexicographic
printf '%s\n' "${names[@]}"
# Alice Alice Bob Charlie Delta

array_sort names -ru        # reverse, unique
printf '%s\n' "${names[@]}"
# Delta Charlie Bob Alice

nums=( 10 3 200 45 7 )
array_sort nums -n          # numeric sort
printf '%s\n' "${nums[@]}"
# 3 7 10 45 200

Sort by a derived key (Schwartzian transform in Bash)

# Sort files by their size without calling stat repeatedly
sort_files_by_size() {
  local -n __arr="$1"
  local -a __tagged __sorted
  local __f
  # Tag each filename with its size: "SIZE\tFILENAME"
  for __f in "${__arr[@]}"; do
    __tagged+=( "$(stat -c '%s' "$__f" 2>/dev/null || echo 0)$'\t'$__f" )
  done
  # Sort numerically, then strip the tag
  mapfile -t __sorted < <(
    printf '%s\n' "${__tagged[@]}" | sort -n | cut -f2
  )
  __arr=( "${__sorted[@]}" )
}

files=( /etc/hostname /etc/passwd /etc/shells )
sort_files_by_size files
printf '%s\n' "${files[@]}"

Pure-Bash insertion sort (no fork)

array_isort() {
  # In-place insertion sort — O(n²), use only for small arrays (<100 elements)
  local -n __a="$1"
  local __i __j __key
  for (( __i=1; __i < "${#__a[@]}"; __i++ )); do
    __key="${__a[$__i]}"
    __j=$(( __i - 1 ))
    while (( __j >= 0 )) && [[ "${__a[$__j]}" > "$__key" ]]; do
      __a[$(( __j + 1 ))]="${__a[$__j]}"
      (( __j-- )) || true
    done
    __a[$(( __j + 1 ))]="$__key"
  done
}

words=( banana apple cherry date elderberry )
array_isort words
printf '%s\n' "${words[@]}"
# apple banana cherry date elderberry

6 — Array Serialisation and Deserialisation

Arrays exist only within a process. To pass them across process boundaries — into subshells, to other scripts, or into files — you must serialise them. There is no single universally correct format; choose based on what the data can contain.

Method 1: `declare -p` round-trip

# declare -p produces valid Bash syntax that can be re-sourced
arr=( "hello world" "foo's bar" 'line1\nline2' )
declare -p arr
declare -a arr=([0]="hello world" [1]="foo's bar" [2]=$'line1\\nline2')

# Save to file and restore in another script
declare -p arr > /tmp/arr.bash
# In another script:
source /tmp/arr.bash
# arr is now restored with identical contents

# For associative arrays — identical pattern
declare -A map=( [key1]=val1 [key2]="with spaces" )
declare -p map > /tmp/map.bash

Security: never source serialised data from an untrusted origin. declare -p output is executable shell code. Use NUL-delimited or JSON serialisation instead when the data comes from outside.

Method 2: NUL-delimited (safe for arbitrary content)

# Serialise: print each element followed by NUL
array_save_nul() {
  local -n __a="$1"
  local __file="$2"
  printf '%s\0' "${__a[@]}" > "$__file"
}

# Deserialise: read NUL-delimited elements
array_load_nul() {
  local -n __a="$1"
  local __file="$2"
  __a=()
  while IFS= read -r -d '' __elem; do
    __a+=( "$__elem" )
  done < "$__file"
}

data=( "element with spaces" $'\twith\ttabs' "line1
line2" )
array_save_nul data /tmp/data.nul

declare -a restored
array_load_nul restored /tmp/data.nul
printf '[%s]\n' "${restored[@]}"   # identical to data

Method 3: passing arrays across subshells via environment

# Subshells inherit variables but NOT array contents directly.
# Use declare -p to export, then eval to import.

export_array() {
  local name="$1"
  local -n __ref="$1"
  # Export as a string variable containing the declare -p output
  export "_EXPORTED_${name}"="$(declare -p "$name")"
}

import_array() {
  local name="$1"
  local varname="_EXPORTED_${name}"
  eval "${!varname}"   # safe: we control the source (declare -p output)
}

myarr=( one two three )
export_array myarr
# In a child process (subshell, bash -c, etc.):
bash -c 'source /dev/stdin <<<"${_EXPORTED_myarr}"; printf "%s\n" "${myarr[@]}"'

7 — Stack and Queue Patterns

# Stack (LIFO) — push to end, pop from end
stack=()

stack_push() { local -n _s="$1"; _s+=( "$2" ); }
stack_pop()  {
  local -n _s="$1"
  (( "${#_s[@]}" > 0 )) || { echo "stack empty" >&2; return 1; }
  # Return value via a nameref
  local -n _retval="$2"
  _retval="${_s[-1]}"
  unset '_s[-1]'
}
stack_peek() { local -n _s="$1" _r="$2"; _r="${_s[-1]}"; }

stack_push stack alpha
stack_push stack beta
stack_push stack gamma
declare top
stack_pop  stack top; echo "popped: $top"   # gamma
stack_pop  stack top; echo "popped: $top"   # beta

# Queue (FIFO) — push to end, dequeue from front
queue=()
queue_enqueue() { local -n _q="$1"; _q+=( "$2" ); }
queue_dequeue() {
  local -n _q="$1" _r="$2"
  (( "${#_q[@]}" > 0 )) || { echo "queue empty" >&2; return 1; }
  _r="${_q[0]}"
  _q=( "${_q[@]:1}" )   # slice off first element — O(n)
}

# Note: queue_dequeue is O(n) due to the array re-assignment.
# For high-throughput queues, track a head index instead:
_queue_head=0
queue_dequeue_fast() {
  local -n _q="$1" _r="$2"
  (( _queue_head < "${#_q[@]}" )) || { echo "queue empty" >&2; return 1; }
  _r="${_q[$_queue_head]}"
  unset '_q[$_queue_head]'
  (( _queue_head++ )) || true
}

Exercises

Exercise 1 — Frequency map and top-N

Write a function top_n RETARRAY N FILE that reads words from FILE (one per line), counts their frequency using an associative array, and stores the top N words (sorted by descending count) in RETARRAY as elements of the form COUNT:WORD. Use only Bash built-ins plus sort — no awk, uniq, or cut.

top_n() {
  local -n __ret="$1"
  local __n="$2"
  local __file="$3"
  declare -A __freq
  local __word

  [[ -f "$__file" ]] || { printf 'top_n: file not found: %s\n' "$__file" >&2; return 1; }

  while IFS= read -r __word; do
    [[ -z "$__word" ]] && continue
    (( __freq["$__word"]++ )) || true
  done < "$__file"

  # Build "COUNT WORD" lines, sort numerically descending, take top N
  local -a __lines
  for __word in "${!__freq[@]}"; do
    __lines+=( "${__freq[$__word]} $__word" )
  done

  mapfile -t __ret < <(
    printf '%s\n' "${__lines[@]}" |
    sort -rn |
    head -n "$__n" |
    while read -r __cnt __w; do
      printf '%s:%s\n' "$__cnt" "$__w"
    done
  )
}

# Test
declare -a result
top_n result 5 /usr/share/dict/words   # or any word-per-line file
printf '%s\n' "${result[@]}"

Exercise 2 — Symmetric difference and subset test

Using the set-operation functions from section 4 as a starting point, implement:

array_symdiff A B RESULT — elements in A or B but not both
array_is_subset A B — returns 0 if every element of A is in B, else 1

Then demonstrate both with a server-role example: desired=(nginx certbot logrotate) and installed=(nginx curl wget logrotate).

array_symdiff() {
  local -n __a="$1" __b="$2" __out="$3"
  declare -A __in_a __in_b
  local __e
  __out=()
  for __e in "${__a[@]}"; do __in_a["$__e"]=1; done
  for __e in "${__b[@]}"; do __in_b["$__e"]=1; done
  for __e in "${__a[@]}"; do
    [[ ! -v '__in_b[$__e]' ]] && __out+=( "$__e" )
  done
  for __e in "${__b[@]}"; do
    [[ ! -v '__in_a[$__e]' ]] && __out+=( "$__e" )
  done
}

array_is_subset() {
  local -n __a="$1" __b="$2"
  declare -A __in_b
  local __e
  for __e in "${__b[@]}"; do __in_b["$__e"]=1; done
  for __e in "${__a[@]}"; do
    [[ -v '__in_b[$__e]' ]] || return 1
  done
  return 0
}

# Demonstration
desired=( nginx certbot logrotate )
installed=( nginx curl wget logrotate )

declare -a diff
array_symdiff desired installed diff
printf 'Symmetric diff: %s\n' "${diff[@]}"
# certbot  curl  wget

if array_is_subset desired installed; then
  echo "All desired packages are installed"
else
  echo "Some desired packages are missing"
  declare -a missing
  array_diff desired installed missing   # from section 4
  printf 'Install: %s\n' "${missing[@]}"
fi

Exercise 3 — Serialisation round-trip

Write two functions array_to_json ARRNAME and json_to_array JSON ARRNAME that serialise an indexed array to a JSON array string and back. Requirements:

Values may contain double quotes, backslashes, and newlines — escape them correctly
No external tools except printf (for serialisation); use jq only for deserialisation
Demonstrate a round-trip: original array → JSON string → restored array, verifying each element matches

array_to_json() {
  local -n __a="$1"
  local __out='[' __sep='' __e __escaped
  for __e in "${__a[@]}"; do
    # Escape: backslash first, then double-quote, then control chars
    __escaped="${__e//\\/\\\\}"         # \ → \\
    __escaped="${__escaped//\"/\\\"}"   # " → \"
    # Replace literal newlines and tabs with JSON escapes
    __escaped="${__escaped//$'\n'/\\n}"
    __escaped="${__escaped//$'\t'/\\t}"
    __out+="${__sep}\"${__escaped}\""
    __sep=','
  done
  printf '%s]\n' "$__out"
}

json_to_array() {
  local __json="$1"
  local -n __a="$2"
  # jq -r outputs one element per line; handle newlines in values via NUL
  mapfile -t -d '' __a < <(printf '%s' "$__json" | jq -r '.[] | . + " "')
  # Strip the trailing empty element mapfile adds
  [[ "${__a[-1]}" == '' ]] && unset '__a[-1]'
}

# Round-trip test
original=(
  "simple"
  'with "quotes"'
  $'with\nnewline'
  $'back\\slash'
)

json=$(array_to_json original)
echo "JSON: $json"

declare -a restored
json_to_array "$json" restored

ok=true
for (( i=0; i < "${#original[@]}"; i++ )); do
  if [[ "${original[$i]}" != "${restored[$i]}" ]]; then
    printf 'MISMATCH at [%d]\n  orig: %q\n  rest: %q\n' \
      "$i" "${original[$i]}" "${restored[$i]}"
    ok=false
  fi
done
$ok && echo "Round-trip OK"

Exercise 4 — Dependency resolver

Using the adjacency-list pattern from section 2, implement a topological sort function topo_sort GRAPH_PREFIX NODES_ARRAY RESULT_ARRAY that:

Takes a name prefix for adjacency arrays (e.g. deps_ so that deps_nginx=(openssl pcre) means nginx depends on openssl and pcre)
Performs a depth-first topological sort (dependencies before dependents)
Detects cycles and prints an error, returning exit code 1
Stores the sorted install order in RESULT_ARRAY

topo_sort() {
  local __prefix="$1"
  local -n __nodes="$2" __result="$3"
  declare -A __visited   # 0=unvisited 1=in-progress 2=done
  __result=()

  __dfs() {
    local __node="$1"
    case "${__visited[$__node]:-0}" in
      2) return 0 ;;                          # already processed
      1) printf 'topo_sort: cycle at %s\n' "$__node" >&2; return 1 ;;
    esac
    __visited["$__node"]=1   # mark in-progress

    # Visit dependencies first
    local __depname="${__prefix}${__node}"
    if [[ -v "$__depname" ]]; then
      local -n __deps="$__depname"
      local __dep
      for __dep in "${__deps[@]}"; do
        __dfs "$__dep" || return 1
      done
    fi

    __visited["$__node"]=2
    __result+=( "$__node" )
  }

  local __n
  for __n in "${__nodes[@]}"; do
    __dfs "$__n" || return 1
  done
}

# Define dependency graph via name-prefix arrays
deps_nginx=( openssl pcre zlib )
deps_certbot=( openssl python3 )
deps_python3=( zlib )
deps_openssl=( zlib )
deps_zlib=()
deps_pcre=()

packages=( nginx certbot )
declare -a install_order
topo_sort deps_ packages install_order
printf 'Install order:\n'
printf '  %s\n' "${install_order[@]}"
# zlib openssl pcre nginx python3 certbot