First receipt
Make a real decision in Studio (e.g., approve a fine-tune). Watch the receipt land. Hash it. Verify the signature with the public key.
Evidence: one signed receipt, ready to hand off.
You're past the 30-minute story. Level 200 adds five things: the Suitcase Lab, which is the fastest path to a working Beacon; the nine 20-minute lab variants that compose a half-day workshop; the Policy-as-Code emission that turns checklists into versioned controls; the receipt API for wiring your own AI calls into Beacon; and the full 100-case failure deep-dive with framework cross-links.
A laptop-first Beacon stack. Beacon core, Studio, synth-traffic replay, and an optional mock DNS service run together through Docker Compose. Studio shows live receipts replayed from the 100 real AI failure cases.
    # 1. One-liner
    git clone https://github.com/bobrapp/aigovops-beacon
    cd aigovops-beacon
    docker compose -f deploy/lab.yml up -d
    open http://localhost:8788

    # 2. Sanity check
    curl -s http://localhost:8787/healthz
    # {"ok":true,"version":"2.2.0","receipts":47,"keys":1}
| Container | Port | Role |
|---|---|---|
| beacon | 8787 | Receipt ingest, signing key, inventory |
| studio | 8788 | Web UI — see receipts, score frameworks, export bundles |
| synth-traffic | — | Replays docs/data/ai_failures_top100.json as DNS/SNI events |
| mock-dns | 5353/udp | Optional. Pretends to be a corp resolver; logs queries |
Studio is now showing receipts replayed live. Click any receipt → Verify → green check.
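If you want to script the sanity check rather than read the curl output by eye, a minimal Python sketch follows. It assumes the `/healthz` body has the shape shown in the sanity check (`ok`, `version`, `receipts`, `keys`). The rule that at least one key must be loaded is an assumption for illustration, not documented Beacon behavior, and `parse_health` and `check_beacon` are hypothetical helper names.

```python
import json
import urllib.request

HEALTH_URL = "http://localhost:8787/healthz"  # endpoint from the sanity check


def parse_health(body: str) -> dict:
    """Parse a /healthz response body and raise if the stack looks unready."""
    status = json.loads(body)
    if not status.get("ok"):
        raise RuntimeError(f"Beacon reports not ok: {status}")
    # Assumption: a usable lab has at least one signing key loaded.
    if status.get("keys", 0) < 1:
        raise RuntimeError("Beacon has no signing key loaded")
    return status


def check_beacon(url: str = HEALTH_URL, timeout: float = 3.0) -> dict:
    """Fetch /healthz from a running lab and validate the response."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_health(resp.read().decode("utf-8"))
```

With the lab containers running, `check_beacon()` returns the parsed status dictionary or raises with a short reason.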
Each variant is a self-contained block. Pick four for a 90-minute workshop. Pick all nine for a half-day. The 30-minute Level 100 flow maps to Labs 1, 4, and 7. Labs 8 and 9 are v2.3 additions.
Lab 1. Make a real decision in Studio (e.g., approve a fine-tune). Watch the receipt land. Hash it. Verify the signature with the public key.
Evidence: one signed receipt, ready to hand off.
Lab 2. Load the unpacked extension. Visit chatgpt.com, claude.ai, gemini.google.com, copilot.microsoft.com. Discovery tab fills with hostnames.
Evidence: discovery rows tied to user identity.
Lab 3. Drop a DNS query CSV in the watched folder. Run the tail script. AI hostnames light up in Studio's Discovery tab.
Evidence: every AI hostname resolved by any device.
Lab 4. Pick a framework. Walk the 23-control checklist. Mark each ✓ / ✗ / N/A. Export the receipt bundle. Open the PDF.
Evidence: a signed bundle an auditor can reproduce.
Lab 5. Pick a case (e.g., iTutorGroup hiring discrimination, #4). Click Replay as decision. Beacon emits the receipt as if you'd reviewed it pre-deploy.
Evidence: a counterfactual receipt. YES-Ship, Steady, or Recover?
Lab 6. Add the MCP config snippet to Claude Desktop. Six tools appear: record_decision, verify_receipt, query_inventory, score_framework, bundle_for_auditor, replay_case.
Evidence: every agent action gets an audit trail.
Lab 7. Pick a date range, framework, scope. Download the tarball. Hand it to a partner team. They run beacon verify bundle.tar.gz with no other access.
Evidence: portable verification (green check, no network).
Lab 8. Copy docs/walkthrough/ to USB. Open index.html in any browser: no network, no install. 12 screens animate the full flow. Use ←/→/Space.
Evidence: governance in a single click, no infrastructure.
Lab 9. Run the hosted MCP on a free Render dyno. Connect Claude Desktop to your MCP /sse URL. Ask the restricted Worker to bundle 30 days for an EU AI Act audit.
Evidence: an autonomous agent whose tools are exactly the six Beacon tools.
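The DNS lab above (drop a query CSV, watch AI hostnames appear) reduces to a hostname match over log rows. The sketch below illustrates that idea only and is not the lab's actual tail script. The CSV column name `query` and the helper names are assumptions, and the hostname set holds just the four sites named in the browser-extension lab.

```python
import csv
import io

# The four hostnames named in the extension lab; a real deployment would
# use a maintained list.
AI_HOSTNAMES = {
    "chatgpt.com",
    "claude.ai",
    "gemini.google.com",
    "copilot.microsoft.com",
}


def is_ai_hostname(name: str) -> bool:
    """True if name equals a known AI hostname or is a subdomain of one."""
    name = name.strip().rstrip(".").lower()  # tolerate trailing-dot FQDNs
    return any(name == h or name.endswith("." + h) for h in AI_HOSTNAMES)


def ai_queries(csv_text: str) -> list:
    """Return the CSV rows whose 'query' column (assumed name) is an AI host."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if is_ai_hostname(row["query"])]
```

Matching on a dot boundary means `api.claude.ai` counts as a hit while a lookalike such as `notclaude.ai` does not.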
Beacon's "Make it Policy as Code" path emits gate YAML, OPA .rego, a Governance Decision Record, and a signed bundle, then opens a pull request against your governance repo. Checklists become versioned controls. Meeting notes become artifacts.
    kind: GovernanceGate
    apiVersion: aigovops.foundation/v1
    metadata:
      name: high-risk-ai-release
    spec:
      framework: nist-ai-rmf
      system: hr-screen-gpt
      requires:
        - human_approval
        - signed_receipt
        - evidence_bundle
      block_on:
        - missing_human_approval
        - stale_receipt: 30d
      emit:
        - governance_decision_record
        - signed_bundle
    package aigovops.beacon.high_risk_ai_release

    default allow := false

    allow if {
        input.framework == "nist-ai-rmf"
        input.system == "hr-screen-gpt"
        has_human_approval
        has_signed_receipt
        has_evidence_bundle
    }

    has_human_approval if {
        some i
        input.attestations[i].kind == "human_approval"
        input.attestations[i].verified == true
    }

    has_signed_receipt if {
        some i
        input.receipts[i].signature_ed25519
        input.receipts[i].verified == true
    }

    has_evidence_bundle if {
        input.bundle.manifest_sha256
        input.bundle.verify_passed == true
    }
The YAML and the Rego are the same artifact in two languages. The YAML is what the governance team reviews. The Rego is what the CI/CD gate runs. The signed receipt is what the auditor verifies, months or years later, without needing Beacon installed.
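To see which input document the gate accepts, it can help to mirror the Rego rules in ordinary code. The Python below is a reading aid under stated assumptions, not a replacement for OPA: the field names come from the Rego above, while `EXAMPLE_INPUT` and its placeholder values are hypothetical. The Rego shown does not reference `stale_receipt`, so this sketch does not either.

```python
def has_human_approval(inp: dict) -> bool:
    """At least one attestation is a verified human approval."""
    return any(
        a.get("kind") == "human_approval" and a.get("verified") is True
        for a in inp.get("attestations", [])
    )


def has_signed_receipt(inp: dict) -> bool:
    """At least one receipt carries an Ed25519 signature and is verified.

    Rego passes any defined, non-false value; this sketch is slightly
    stricter and also rejects an empty signature string.
    """
    return any(
        bool(r.get("signature_ed25519")) and r.get("verified") is True
        for r in inp.get("receipts", [])
    )


def has_evidence_bundle(inp: dict) -> bool:
    """The bundle has a manifest hash and passed verification."""
    bundle = inp.get("bundle", {})
    return bool(bundle.get("manifest_sha256")) and bundle.get("verify_passed") is True


def allow(inp: dict) -> bool:
    """Mirror of the Rego `allow` rule: every condition must hold."""
    return (
        inp.get("framework") == "nist-ai-rmf"
        and inp.get("system") == "hr-screen-gpt"
        and has_human_approval(inp)
        and has_signed_receipt(inp)
        and has_evidence_bundle(inp)
    )


# Hypothetical input with placeholder values, shaped after the Rego rules.
EXAMPLE_INPUT = {
    "framework": "nist-ai-rmf",
    "system": "hr-screen-gpt",
    "attestations": [{"kind": "human_approval", "verified": True}],
    "receipts": [{"signature_ed25519": "<signature>", "verified": True}],
    "bundle": {"manifest_sha256": "<digest>", "verify_passed": True},
}
```

Removing the attestation, or setting `verify_passed` to `False`, makes `allow` return `False`, which matches `default allow := false` in the policy.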
Wire a real model invocation into Beacon by posting receipt data to /api/v1/receipts with model metadata, environment, prompt, result, latency, and token counts. The Beacon server signs the receipt and appends it to the NDJSON log.
    curl -s http://127.0.0.1:8787/api/v1/receipts \
      -H "Content-Type: application/json" \
      -H "X-Beacon-User-Sub: oidc|alice" \
      -H "X-Beacon-User-Email: alice@example.org" \
      -H "X-Beacon-OIDC-Issuer: https://accounts.example.org" \
      -d '{
        "vendor": "OpenAI",
        "model": "gpt-4o-mini",
        "version": "2024-07-18",
        "environment": "production",
        "event_type": "invocation",
        "prompt": "summarize the policy",
        "result": "…",
        "latency_ms": 312,
        "tokens": {"in": 21, "out": 47}
      }'
Beacon canonicalizes the receipt (sorted keys, no whitespace, UTF-8) and signs the bytes with its per-instance Ed25519 key. The response includes the receipt sequence number, the entry hash, and the signature — same shape as the in-browser sandbox in Level 100.
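The canonicalization rule above (sorted keys, no whitespace, UTF-8) can be approximated with the Python standard library. This is a sketch under assumptions: Beacon's exact byte-level rules, such as number formatting and escaping, are not specified here and may differ, and SHA-256 for the entry hash is an assumption. Signature verification is left out because it needs the instance's public key and an Ed25519 library.

```python
import hashlib
import json


def canonical_bytes(receipt: dict) -> bytes:
    """Serialize with sorted keys, no insignificant whitespace, UTF-8."""
    text = json.dumps(
        receipt,
        sort_keys=True,          # stable key order at every nesting level
        separators=(",", ":"),   # no spaces after commas or colons
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    )
    return text.encode("utf-8")


def entry_hash(receipt: dict) -> str:
    """Hex digest of the canonical bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(canonical_bytes(receipt)).hexdigest()
```

Two receipts with the same fields in a different key order produce identical bytes, and so the same hash. That property is what lets a verifier recompute the hash independently.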
The pattern that works in production:
Post to /api/v1/receipts after each call: fire-and-forget, with a local retry queue if Beacon is unreachable.
To verify a bundle with the Node CLI:

    cd server
    node src/cli.js verify ~/.beacon/bundles/bundle-<timestamp>
Or, for the lab-style audit log that built this very page, no Node needed:
    pip install cryptography
    python -m src.audit_log verify
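The production pattern described earlier (post after each model call, fire-and-forget, retry locally when Beacon is unreachable) might look like the sketch below. `ReceiptEmitter` and `post_receipt` are hypothetical helpers rather than part of Beacon. The queue is in memory only, so receipts are lost if the process exits; a durable queue would need disk storage. The identity headers from the curl example can be passed through `headers`.

```python
import json
import urllib.request
from collections import deque
from typing import Callable, Optional

RECEIPTS_URL = "http://127.0.0.1:8787/api/v1/receipts"  # from the curl example


def post_receipt(payload: dict, url: str = RECEIPTS_URL,
                 headers: Optional[dict] = None, timeout: float = 2.0) -> None:
    """POST one receipt as JSON; network failures surface as OSError subclasses."""
    all_headers = {"Content-Type": "application/json", **(headers or {})}
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=all_headers,
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout):
        pass  # response body ignored in this fire-and-forget sketch


class ReceiptEmitter:
    """Fire-and-forget emitter with a bounded in-memory retry queue."""

    def __init__(self, send: Callable[[dict], None] = post_receipt,
                 max_queue: int = 1000):
        self._send = send
        # When the queue is full, the oldest pending receipt is dropped.
        self._pending: deque = deque(maxlen=max_queue)

    def emit(self, payload: dict) -> bool:
        """Queue the receipt, then try to flush; never raises on network errors."""
        self._pending.append(payload)
        return self.flush()

    def flush(self) -> bool:
        """Send queued receipts in order, stopping at the first failure."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except OSError:
                return False  # Beacon unreachable; keep the queue for later
            self._pending.popleft()
        return True

    @property
    def pending(self) -> int:
        return len(self._pending)
```

Calling `emit` after each model invocation keeps the application path free of network exceptions. Receipts queued during an outage go out, in order, on the next successful call.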
Every public AI failure in the dataset, mapped to the framework controls and the Beacon artifact that would have caught it. Filter by act, framework, harm type, or sector. Click any row to expand. Each row's Beacon artifact hint cross-links back to the Level 200 section where that artifact lives.
Source: docs/data/ai_failures_top100.json. Methodology mirrors the home-page failures browser.
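The same filtering can be done offline against the dataset file. The sketch below assumes the file is a JSON array of case objects; the field names used in the usage note (`sector`, `frameworks`) are hypothetical, so check them against the actual file before relying on them.

```python
import json
from pathlib import Path

DATASET = Path("docs/data/ai_failures_top100.json")  # path given as the source


def load_cases(path: Path = DATASET) -> list:
    """Load the dataset; assumes a top-level JSON array of case objects."""
    return json.loads(path.read_text(encoding="utf-8"))


def filter_cases(cases: list, **criteria: str) -> list:
    """Keep the cases that match every criterion.

    A criterion matches when the case field equals the wanted value or,
    for list-valued fields, contains it.
    """
    def matches(case: dict, key: str, wanted: str) -> bool:
        value = case.get(key)
        if isinstance(value, list):
            return wanted in value
        return value == wanted

    return [c for c in cases if all(matches(c, k, v) for k, v in criteria.items())]
```

For example, `filter_cases(load_cases(), sector="hiring", frameworks="nist-ai-rmf")` would return the hiring cases mapped to that framework, provided the dataset uses those field names.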
Pick five technical cases and five policy-facing cases from the deep-dive above. For each, identify the missing Beacon artifact and the framework controls that apply. Use the worksheet to capture your analysis — bring it to your team's next AI governance review.
| Audience | Best examples | Main prompt |
|---|---|---|
| Technical teams | Boeing 737 MAX, Tesla Autopilot, Knight Capital, Cruise robotaxi, deployment failures | Where would you instrument Beacon first? |
| Policy teams | Robodebt, UK A-levels, Apple Card, Detroit FR, iTutorGroup | What evidence should have existed before harm? |
1. Which command brings up the Suitcase Lab?
2. Which lab variant is the "shortest 30-minute workshop" path?
3. What does the "Make it Policy as Code" path emit?
4. The Receipt API endpoint is —
5. The Beacon artifact most likely to have caught Robodebt is —
6. Why is the offline USB walkthrough (Lab 8) useful?
You've completed Level 200 when you can say all four statements clearly.
You can ship Beacon yourself. Spin up the Suitcase Lab for stakeholders. Pick lab variants for any audience. Emit Policy-as-Code that turns checklists into versioned controls. And cross-reference any failure to the Beacon artifact that would have caught it.
Want to verify how this lab was built? Every commit is in a signed Ed25519 audit log.