WARN Monitor — System Architecture

California layoff-tracking data pipeline · serverless & git-native
Architecture · System Design · Workflow

How California WARN notices become a live dashboard

A fully automated, serverless pipeline that watches the state's Employment Development Department (EDD) feed, detects genuinely new layoff filings, renders interactive charts, emails subscribers, and self-publishes to GitHub Pages twice a day, with no database and no backend.

5
Pipeline stages
11
Interactive charts
2
Runs per day
2014
History since
9
Data artifacts
0
Servers to run

🏛️ System architecture

Six layers, from the upstream EDD source down to the people who read the data, plus a self-contained email-signup loop (Apps Script Web App and Google Sheet).
📄
EDD WARN XLSX
live spreadsheet · source of truth
📚
EDD WARN PDFs
historical 2014 → present
GitHub Actions
cron 00:00 & 12:00 UTC · manual
🧩
warn_publish.py — orchestrator
runs monitor → diff → history → charts → build → notify
Data layer · git-versioned JSON (no database)
meta.json · warn_latest.json · warn_snapshot.json · warn_cumulative.json · notified_keys.json · warn_all_years.json · changelog.jsonl · charts_manifest.json · diff_report.md
🌐
GitHub Pages
docs/index.html dashboard
{ }
data.json
public read-only API
✉️
warn_notify
Gmail SMTP · To + BCC
👀
Public viewers
browse charts & table
📥
Subscribers
email alerts on new filings
🧾
Apps Script Web App
subscribe.gs endpoint
📊
Google Sheet
{timestamp, name, email}
Legend: Source · Scheduler · Engine · Data store · Output · Service · Consumer

⚙️ The 5-stage pipeline

The stages below trace a single run in exactly the sequence warn_publish.run() executes.
📄
CA EDD
XLSX + PDFs
📥
warn_monitor
download · parse · detect
1
🔍
warn_diff
report + changelog
2
🗂️
warn_history
merge 2014→now
3
📈
warn_charts
11 Plotly charts
4
🚀
build_site
assemble + push
5
🌐
index.html
GitHub Pages
{ }
data.json
public API
✉️
warn_notify
if new > 0
📥
Subscribers
BCC ≤ 90 / batch
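
The stage order above can be sketched as a small driver. This is a minimal illustration, not the project's actual warn_publish.run(): the stage callables, the shared `state` dict, and its `new_notices` key are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List

# Stage names taken from the diagram above; the order is the numbered sequence 1-5.
STAGE_ORDER = ["warn_monitor", "warn_diff", "warn_history", "warn_charts", "build_site"]


def run_pipeline(
    stages: Dict[str, Callable[[dict], None]],
    notify: Callable[[dict], None],
) -> List[str]:
    """Run the five stages in order, then notify only if new notices exist."""
    # Hypothetical shared state: each stage reads and mutates this dict.
    state: dict = {"new_notices": []}
    executed: List[str] = []
    for name in STAGE_ORDER:
        stages[name](state)
        executed.append(name)
    # The "if new > 0" gate from the diagram: no new notices, no email.
    if len(state["new_notices"]) > 0:
        notify(state)
        executed.append("warn_notify")
    return executed
```

Passing the stages in as callables keeps the ordering logic testable without touching the network, the file system, or SMTP.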

🔀 Data flow & change detection

How a downloaded spreadsheet becomes durable, churn-proof data — the cumulative store and the notified-keys ledger are the two anti-fragility tricks.
🌍
EDD server
HTTP GET
🏷️
meta.json
ETag · Last-Modified · hash
⬇️
download_xlsx()
304? skip · else fetch
🧮
parse_warn_xlsx()
2 sheet formats · normalise
🚦
detect_changes()
new vs ledger only
📒
notified_keys.json
cumulative alert ledger
🟢
warn_latest.json
current EDD contents
📸
warn_snapshot.json
previous run (rotated)
♻️
update_cumulative()
union · latest wins
🗃️
warn_cumulative.json
every notice ever seen
🌐
Dashboard
KPIs + table
📈
Charts
Plotly divs
🗂️
warn_all_years.json
+ historical PDFs
Legend: Network · Pure logic · Persisted store · Rendered output
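
The diagram lists a hash alongside ETag and Last-Modified in meta.json. One plausible use, sketched here as an assumption rather than the project's confirmed behaviour, is a content-hash check for the case where the server returns a full 200 response with identical bytes. The `hash`, `etag`, and `last_modified` key names are hypothetical.

```python
import hashlib
from typing import Optional


def content_hash(data: bytes) -> str:
    """SHA-256 hex digest of the downloaded spreadsheet bytes."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, meta: dict) -> bool:
    """True when the freshly downloaded bytes match the hash stored last run."""
    return meta.get("hash") == content_hash(data)


def updated_meta(
    meta: dict,
    data: bytes,
    etag: Optional[str] = None,
    last_modified: Optional[str] = None,
) -> dict:
    """Return a copy of meta refreshed with the validators from this fetch."""
    new_meta = dict(meta)  # copy so the caller's dict is not mutated
    new_meta["hash"] = content_hash(data)
    if etag:
        new_meta["etag"] = etag
    if last_modified:
        new_meta["last_modified"] = last_modified
    return new_meta
```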

🗓️ CI workflow — one scheduled run

The GitHub Actions job monitor.yml, end to end, including the two decision branches that gate email + commit.
Trigger — cron 0 0,12 * * *
twice daily UTC · or workflow_dispatch
📦
Checkout (fetch-depth 0)
full history · GH_REPO_TOKEN
🐍
Setup Python 3.12 + deps
pip cache · black · pytest
🧪
pytest -v --cov
logic tests gate the run
⚙️
warn_publish.py --no-push
the 5-stage pipeline
🛑
304 Not Modified?
rebuild from cumulative store
📧
new notices > 0?
email → record keys on success
💾
git commit & push (if diff)
"auto: WARN data update [skip ci]"
🚀
GitHub Pages redeploys
dashboard live in ~1 min
Legend: Trigger · Build step · Gate / test · Publish

Key design decisions
🏷️

ETag / 304 caching

Every fetch sends If-None-Match + If-Modified-Since from meta.json. A 304 short-circuits the download — bandwidth and churn avoided while the dashboard still rebuilds from local data.
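
A conditional fetch along these lines might look as follows. This is a sketch under stated assumptions: the `etag` and `last_modified` keys in meta.json are guessed names, and `http_get` is a hypothetical injected callable returning `(status, body)` so the logic can be exercised without a network call.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical transport signature: (url, headers) -> (status_code, body_bytes)
HttpGet = Callable[[str, Dict[str, str]], Tuple[int, bytes]]


def conditional_headers(meta: dict) -> Dict[str, str]:
    """Build If-None-Match / If-Modified-Since from validators saved last run."""
    headers: Dict[str, str] = {}
    if meta.get("etag"):
        headers["If-None-Match"] = meta["etag"]
    if meta.get("last_modified"):
        headers["If-Modified-Since"] = meta["last_modified"]
    return headers


def fetch_if_changed(url: str, meta: dict, http_get: HttpGet) -> Optional[bytes]:
    """Return the new body, or None when the server answers 304 Not Modified."""
    status, body = http_get(url, conditional_headers(meta))
    if status == 304:
        return None  # nothing to download; caller rebuilds from local data
    if status != 200:
        raise RuntimeError(f"unexpected HTTP status {status}")
    return body
```

A `None` return is the signal to skip parsing and rebuild the dashboard from the stored data.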

📒

Anti-churn alert ledger

The EDD feed flip-flops its record count between two versions across consecutive fetches. "New" is measured against the cumulative notified_keys.json ledger — not the prior run — so each notice can fire at most one email, ever.
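
The difference between diffing against the prior run and diffing against a cumulative ledger can be shown with a toy feed that alternates between two versions. The key strings and function names below are illustrative only; how the project derives a notice key is not specified here.

```python
from typing import Iterable, List, Set


def detect_new(current_keys: Iterable[str], ledger: Set[str]) -> List[str]:
    """Keys in the current fetch that have never been alerted on."""
    return sorted(set(current_keys) - ledger)


def naive_new(current_keys: Iterable[str], previous_keys: Iterable[str]) -> List[str]:
    """Prior-run diff, shown only for contrast: it re-alerts on flip-flops."""
    return sorted(set(current_keys) - set(previous_keys))


# Toy feed: version B is version A with one record missing.
version_a = ["k1", "k2", "k3"]
version_b = ["k1", "k2"]

ledger: Set[str] = set()
alerts: List[List[str]] = []
for fetch in (version_a, version_b, version_a, version_b):
    new = detect_new(fetch, ledger)
    alerts.append(new)
    ledger.update(new)  # in the real flow this happens only after a successful send

# With the ledger, only the first fetch produces alerts.
# With the naive diff, "k3" would look new again on the third fetch.
```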

🗃️

Cumulative store

EDD re-exports silently drop earlier notices. warn_cumulative.json is a union of every notice ever observed (latest wins on conflict), so a filing never vanishes from the dashboard once it has appeared.
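
A union with "latest wins" can be expressed as a dictionary merge keyed by notice. This sketch assumes each store maps a notice key to a record dict; the key format and the `employees` field are hypothetical.

```python
from typing import Dict


def update_cumulative(cumulative: Dict[str, dict], latest: Dict[str, dict]) -> Dict[str, dict]:
    """Union of both stores; on a key collision the latest record replaces the old one."""
    merged = dict(cumulative)  # copy so the input store is not mutated
    merged.update(latest)      # new keys are added, existing keys are overwritten
    return merged


cumulative = {"acme|2024-01-05": {"employees": 10}, "beta|2024-02-01": {"employees": 50}}
# The latest export dropped the acme notice and revised the beta count.
latest = {"beta|2024-02-01": {"employees": 55}, "gamma|2024-03-09": {"employees": 7}}
merged = update_cumulative(cumulative, latest)
```

The dropped notice survives in `merged`, the revised record carries the newer value, and the input stores are left unchanged.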

✉️

Send-then-record + BCC batching

Keys are recorded only after the email actually sends, so a failed send retries next run instead of being lost. Subscribers are BCC'd in batches of ≤ 90 to stay under Gmail's per-message recipient cap.
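
Both ideas fit in a few lines. In this sketch `send_batch` is a hypothetical injected callable standing in for the SMTP send, and the ledger is modelled as an in-memory set rather than notified_keys.json.

```python
from typing import Callable, Iterable, List, Set

MAX_BCC = 90  # batch size quoted above


def bcc_batches(recipients: List[str], size: int = MAX_BCC) -> List[List[str]]:
    """Split the subscriber list into consecutive chunks of at most `size`."""
    return [recipients[i:i + size] for i in range(0, len(recipients), size)]


def notify_and_record(
    new_keys: Iterable[str],
    recipients: List[str],
    ledger: Set[str],
    send_batch: Callable[[List[str]], None],
) -> bool:
    """Send every batch first; record keys in the ledger only if all sends succeed."""
    try:
        for batch in bcc_batches(recipients):
            send_batch(batch)
    except Exception:
        # Ledger untouched: the same keys still count as new on the next run.
        return False
    ledger.update(new_keys)
    return True
```

One trade-off of this simple version: if a later batch fails after an earlier one succeeded, the retry on the next run re-sends to the earlier recipients too. Tracking per-batch success would avoid that at the cost of more state.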

🗄️

Git as the database

No DB, no server. State lives in versioned JSON committed back to the repo; every run is an auditable diff. [skip ci] on auto-commits prevents infinite workflow loops.
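
For git diffs to stay meaningful, the JSON written on each run needs to be byte-stable when the data has not changed. One way to get that, offered as an assumption about how such a store could be written rather than a description of this project's code, is deterministic serialisation:

```python
import json
from typing import Any


def dump_state(obj: Any) -> str:
    """Serialise with sorted keys and fixed indentation so equal data gives equal bytes."""
    return json.dumps(obj, indent=2, sort_keys=True, ensure_ascii=False) + "\n"


def needs_commit(old_text: str, obj: Any) -> bool:
    """True only when the serialised state differs from what is already on disk."""
    return old_text != dump_state(obj)
```

With this in place, a run that observes no change produces no diff, so the "commit if diff" step has nothing to push.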

🧩

Single-file deployable site

Charts are pre-rendered to self-contained Plotly divs and inlined into one index.html — the dashboard is a static asset GitHub Pages serves with zero runtime dependencies.
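
Assembling pre-rendered fragments into one page can be as simple as string concatenation. The sketch below uses only the standard library; `build_page` and its arguments are hypothetical, and it assumes each chart arrives as an HTML `<div>` fragment (for example from Plotly's `Figure.to_html(full_html=False)`, which returns a fragment rather than a full document).

```python
from html import escape
from typing import List


def build_page(title: str, chart_divs: List[str]) -> str:
    """Inline already-rendered chart fragments into a single static HTML document."""
    body = "\n".join(chart_divs)  # fragments are trusted, pre-rendered HTML
    return (
        "<!doctype html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head>\n"  # title is escaped, fragments are not
        f"<body>\n{body}\n</body></html>\n"
    )


page = build_page("WARN <Monitor>", ['<div id="c1"></div>', '<div id="c2"></div>'])
```

The result is one file that can be written to docs/index.html and served as-is.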