Suitcase + Policy-as-Code + 100-failure deep-dive

Ship Beacon yourself.

You're past the 30-minute story. This level covers the Suitcase Lab (the fastest path to a working Beacon), the nine lab variants that compose a half-day workshop, the Policy-as-Code emission that turns checklists into versioned controls, the receipt API for wiring your own AI calls into Beacon, and the full 100-case failure deep-dive with framework cross-links.

By the end of this level you'll be able to:

Spin up a complete Beacon stack on your laptop in 60 seconds.
Choose lab variants that compose a working 90-minute workshop.
Emit Policy-as-Code (gate YAML + OPA Rego) from a checklist outcome.
Wire a real model invocation into Beacon via the receipt API.
Cross-reference any AI failure to the framework controls and Beacon artifact that would have caught it.

Suitcase Lab — laptop, 60 seconds.

A laptop-first Beacon stack. Beacon core, Studio, synth-traffic replay, and an optional mock DNS service run together through Docker Compose. Studio shows live receipts replayed from the 100 real AI failure cases.

# 1. Clone and start the lab stack
git clone https://github.com/bobrapp/aigovops-beacon
cd aigovops-beacon
docker compose -f deploy/lab.yml up -d
open http://localhost:8788   # macOS; on Linux use xdg-open

# 2. Sanity check
curl -s http://localhost:8787/healthz
# {"ok":true,"version":"2.2.0","receipts":47,"keys":1}
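If you script the lab setup, a small poller can wait for the health endpoint before anyone opens Studio. A minimal sketch using only the Python standard library, assuming /healthz returns a JSON object with an "ok" field as in the sample response above; is_healthy and wait_for_beacon are illustrative names, not part of Beacon.

```python
import json
import time
import urllib.request


def is_healthy(body: str) -> bool:
    """Return True if a /healthz response body is a JSON object with ok: true."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("ok") is True


def wait_for_beacon(url: str = "http://localhost:8787/healthz",
                    timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll the health endpoint until it reports ok or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_healthy(resp.read().decode("utf-8")):
                    return True
        except OSError:
            pass  # containers may still be starting (URLError is an OSError)
        time.sleep(interval_s)
    return False
```

Call wait_for_beacon() right after docker compose up; it returns False instead of raising if the stack never comes up, so a setup script can print a clear message.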

What's actually running

Container	Port	Role
beacon	8787	Receipt ingest, signing key, inventory
studio	8788	Web UI — see receipts, score frameworks, export bundles
synth-traffic	—	Replays docs/data/ai_failures_top100.json as DNS/SNI events
mock-dns	5353/udp	Optional. Pretends to be a corp resolver; logs queries

Studio is now showing receipts replayed live. Click any receipt → Verify → green check.

Nine lab variants, 20 minutes or less each.

Each variant is a self-contained block. Pick four or five for a 90-minute workshop, or all nine for a half day. The 30-minute Level 100 flow maps to Labs 1, 4, and 7. Labs 8 and 9 are v2.3 additions.

Lab 1

First receipt

15 min · docker compose up

Make a real decision in Studio (e.g., approve a fine-tune). Watch the receipt land. Hash it. Verify the signature with the public key.

Evidence: one signed receipt, ready to hand off.

Lab 2

Discover models in your browser

20 min · MV3 extension

Load the unpacked extension from chrome://extensions (Developer mode on, then Load unpacked). Visit chatgpt.com, claude.ai, gemini.google.com, and copilot.microsoft.com. The Discovery tab fills with hostnames.

Evidence: discovery rows tied to user identity.

Lab 3

Tail a DNS log

15 min · scripts/tail_dns.py

Drop a DNS query CSV in the watched folder. Run the tail script. AI hostnames light up in Studio's Discovery tab.

Evidence: every AI hostname resolved by any device.
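The matching behind Labs 2 and 3 comes down to comparing resolved hostnames against a list of known AI endpoints. A minimal sketch, assuming a DNS CSV with timestamp, client, and query columns and a hand-maintained suffix list; neither the column names nor the list is taken from scripts/tail_dns.py, whose real format may differ.

```python
import csv
import io
from typing import Iterable, Iterator

# Illustrative suffix list only; maintain your own for real monitoring.
AI_HOST_SUFFIXES = (
    "chatgpt.com",
    "openai.com",
    "claude.ai",
    "anthropic.com",
    "gemini.google.com",
    "copilot.microsoft.com",
)


def is_ai_host(hostname: str) -> bool:
    """True if hostname equals, or is a subdomain of, a known AI suffix."""
    host = hostname.strip().rstrip(".").lower()  # DNS logs often keep the trailing dot
    return any(host == s or host.endswith("." + s) for s in AI_HOST_SUFFIXES)


def ai_hits(rows: Iterable[dict]) -> Iterator[dict]:
    """Yield the rows whose 'query' column names an AI hostname."""
    for row in rows:
        if is_ai_host(row.get("query", "")):
            yield row


sample = io.StringIO(
    "timestamp,client,query\n"
    "2024-05-01T09:00:00Z,laptop-17,api.openai.com.\n"
    "2024-05-01T09:00:02Z,laptop-17,example.org\n"
    "2024-05-01T09:00:05Z,phone-3,claude.ai\n"
)
for hit in ai_hits(csv.DictReader(sample)):
    print(hit["client"], hit["query"])
```

Suffix matching on a dot boundary keeps lookalike domains such as notclaude.ai from being flagged.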

Lab 4

Score a framework

20 min · nist-ai-rmf-1.0

Pick a framework. Walk the 23-control checklist. Mark each ✓ / ✗ / N/A. Export the receipt bundle. Open the PDF.

Evidence: a signed bundle an auditor can reproduce.
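One plausible way to turn the ✓ / ✗ / N/A marks into a single number is to leave N/A controls out of the denominator. A minimal sketch; the scoring rule is an assumption for illustration, not Beacon's documented formula, and the control IDs are examples in the style of NIST AI RMF subcategories.

```python
from collections import Counter
from typing import Mapping

PASS, FAIL, NA = "pass", "fail", "n/a"


def score_checklist(marks: Mapping[str, str]) -> dict:
    """Summarize a checklist where each control is marked pass, fail, or n/a.

    Coverage is passed / applicable, where applicable excludes n/a controls.
    """
    counts = Counter(marks.values())
    unknown = set(counts) - {PASS, FAIL, NA}
    if unknown:
        raise ValueError(f"unexpected marks: {sorted(unknown)}")
    applicable = counts[PASS] + counts[FAIL]
    return {
        "passed": counts[PASS],
        "failed": counts[FAIL],
        "not_applicable": counts[NA],
        "coverage": counts[PASS] / applicable if applicable else None,
        "failed_controls": sorted(c for c, m in marks.items() if m == FAIL),
    }


# Example marks; the IDs are illustrative.
marks = {"GOVERN 1.1": PASS, "MAP 1.1": PASS, "MEASURE 2.3": FAIL, "MANAGE 4.1": NA}
print(score_checklist(marks))
```

Reporting the failed controls by name matters more to an auditor than the ratio itself.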

Lab 5

Replay a failure

15 min · Studio › Failures

Pick a case (e.g., iTutorGroup hiring discrimination, #4). Click Replay as decision. Beacon emits the receipt as if you'd reviewed it pre-deploy.

Evidence: a counterfactual receipt. Would the verdict have been YES-Ship, Steady, or Recover?

Lab 6

MCP attest

20 min · Claude Desktop

Add the MCP config snippet to Claude Desktop. Six tools appear: record_decision, verify_receipt, query_inventory, score_framework, bundle_for_auditor, replay_case.

Evidence: every agent action gets an audit trail.
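Claude Desktop reads MCP servers from the mcpServers object in claude_desktop_config.json (on macOS, in ~/Library/Application Support/Claude/). A minimal sketch that merges a Beacon entry into that file without disturbing other servers; the server name, the node command, and the script path are hypothetical placeholders, so substitute the snippet Beacon actually ships.

```python
import json
from pathlib import Path

# Hypothetical launch command for Beacon's MCP server; replace with the real snippet.
BEACON_ENTRY = {
    "command": "node",
    "args": ["/path/to/beacon-mcp-server.js"],
}


def add_beacon_server(config: dict, name: str = "beacon") -> dict:
    """Return a copy of a Claude Desktop config with a Beacon MCP server added."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers[name] = BEACON_ENTRY
    updated["mcpServers"] = servers
    return updated


def update_config_file(path: Path) -> None:
    """Merge the Beacon entry into an existing (or new) config file in place."""
    current = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_beacon_server(current), indent=2))
```

Restart Claude Desktop after updating the file so it picks up the new server and lists its tools.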

Lab 7

Bundle for auditor

15 min · Studio › Bundle

Pick a date range, framework, and scope. Download the tarball. Hand it to a partner team. They run beacon verify bundle.tar.gz with no other access.

Evidence: portable verification — green check, no network.
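Offline bundle verification largely comes down to recomputing file digests and comparing them with a manifest, alongside checking signatures. A minimal sketch of the digest half using only the standard library, assuming a hypothetical manifest.json of the form {"files": {relative_path: sha256_hex}}; Beacon's real bundle layout may differ, and this is not a substitute for beacon verify.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_manifest(bundle_dir: Path, manifest_name: str = "manifest.json") -> list[str]:
    """Return a list of problems; an empty list means every listed digest matched."""
    manifest = json.loads((bundle_dir / manifest_name).read_text())
    problems = []
    for rel_path, expected in manifest["files"].items():
        target = bundle_dir / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_file(target) != expected:
            problems.append(f"digest mismatch: {rel_path}")
    return problems
```

Nothing here touches the network, which is what lets a partner team check the bundle with no access to your Beacon instance.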

Lab 8

Offline walkthrough on a USB stick

10 min · v2.3

Copy docs/walkthrough/ to USB. Open index.html in any browser — no network, no install. 12 screens animate the full flow. Use ←/→/Space.

Evidence: governance in a single click, no infrastructure.

Lab 9

Hosted MCP + restricted agent

20 min · v2.3

Run the hosted MCP server on a free Render instance. Connect Claude Desktop to your MCP /sse URL. Ask the restricted Worker to bundle 30 days of receipts for an EU AI Act audit.

Evidence: an autonomous agent whose tools are exactly the six Beacon tools.

Workshop combinations

30 minutes — Lab 1 (first receipt) → Lab 4 (score a framework) → Lab 7 (bundle for auditor).
90 minutes — Labs 1, 2, 4, 5, 7.
Half day — All nine, in the order shown.
Tabletop, no admin rights — Lab 8 (offline USB) only.

Policy-as-Code lab.

Beacon's "Make it Policy as Code" path emits gate YAML, an OPA .rego policy, a Governance Decision Record, and a signed bundle, then opens a pull request against your governance repo. Checklists become versioned controls. Meeting notes become artifacts.

Gate YAML (governance gate spec)

kind: GovernanceGate
apiVersion: aigovops.foundation/v1
metadata:
  name: high-risk-ai-release
spec:
  framework: nist-ai-rmf
  system: hr-screen-gpt
  requires:
    - human_approval
    - signed_receipt
    - evidence_bundle
  block_on:
    - missing_human_approval
    - stale_receipt: 30d
  emit:
    - governance_decision_record
    - signed_bundle

OPA Rego (enforcement)

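package aigovops.beacon.high_risk_ai_release

# Enables the "if" keyword on OPA 0.59+; accepted as a no-op on OPA 1.x.
import rego.v1

# Deny unless every requirement in the gate holds.
default allow := false

allow if {
    input.framework == "nist-ai-rmf"
    input.system == "hr-screen-gpt"
    has_human_approval
    has_signed_receipt
    has_evidence_bundle
}

# requires: human_approval (a verified attestation of kind human_approval)
has_human_approval if {
    some i
    input.attestations[i].kind == "human_approval"
    input.attestations[i].verified == true
}

# requires: signed_receipt (at least one verified, Ed25519-signed receipt)
has_signed_receipt if {
    some i
    input.receipts[i].signature_ed25519
    input.receipts[i].verified == true
}

# requires: evidence_bundle (manifest digest present and bundle verification passed)
has_evidence_bundle if {
    input.bundle.manifest_sha256
    input.bundle.verify_passed == true
}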

The YAML and the Rego express the same gate in two languages. The YAML is what the governance team reviews. The Rego is what the CI/CD gate runs. The signed receipt is what the auditor verifies, months or years later, without needing Beacon installed.
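To see what the gate decides before wiring OPA into CI, the same rules can be prototyped in a few lines of Python. A minimal sketch, assuming the input shape the Rego reads (attestations, receipts, bundle) plus a hypothetical issued_at timestamp on each receipt to model the YAML's stale_receipt: 30d rule, which the Rego above does not encode; it returns named block reasons rather than a bare boolean.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_RECEIPT_AGE = timedelta(days=30)  # block_on: stale_receipt: 30d


def _fresh_signed_receipt(r: dict, now: datetime) -> bool:
    # issued_at is a hypothetical ISO 8601 field that includes a UTC offset,
    # e.g. "2025-01-15T10:00:00+00:00".
    issued = r.get("issued_at")
    if not (r.get("signature_ed25519") and r.get("verified") is True and issued):
        return False
    return now - datetime.fromisoformat(issued) <= MAX_RECEIPT_AGE


def gate_decision(inp: dict, now: Optional[datetime] = None) -> tuple[bool, list[str]]:
    """Evaluate the high-risk-ai-release gate; return (allow, block_reasons)."""
    now = now or datetime.now(timezone.utc)
    reasons = []
    if inp.get("framework") != "nist-ai-rmf" or inp.get("system") != "hr-screen-gpt":
        reasons.append("out_of_scope")
    if not any(a.get("kind") == "human_approval" and a.get("verified") is True
               for a in inp.get("attestations", [])):
        reasons.append("missing_human_approval")
    if not any(_fresh_signed_receipt(r, now) for r in inp.get("receipts", [])):
        reasons.append("missing_or_stale_receipt")
    bundle = inp.get("bundle", {})
    if not (bundle.get("manifest_sha256") and bundle.get("verify_passed") is True):
        reasons.append("missing_evidence_bundle")
    return (not reasons, reasons)
```

Passing now explicitly keeps the staleness check deterministic in tests; in CI the default uses the current UTC time.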

Receipt API lab.

Wire a real model invocation into Beacon by posting receipt data to /api/v1/receipts with model metadata, environment, prompt, result, latency, and token counts. The Beacon server signs the receipt and appends it to the NDJSON log.

curl -s http://127.0.0.1:8787/api/v1/receipts \
  -H "Content-Type: application/json" \
  -H "X-Beacon-User-Sub: oidc|alice" \
  -H "X-Beacon-User-Email: alice@example.org" \
  -H "X-Beacon-OIDC-Issuer: https://accounts.example.org" \
  -d '{
    "vendor": "OpenAI",
    "model": "gpt-4o-mini",
    "version": "2024-07-18",
    "environment": "production",
    "event_type": "invocation",
    "prompt": "summarize the policy",
    "result": "…",
    "latency_ms": 312,
    "tokens": {"in": 21, "out": 47}
  }'
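The same request from Python, using only the standard library. A minimal sketch: build_receipt_request mirrors the curl call above (same endpoint, identity headers, and JSON body) and post_receipt sends it; both function names are illustrative, not a Beacon SDK.

```python
import json
import urllib.request

BEACON_URL = "http://127.0.0.1:8787/api/v1/receipts"


def build_receipt_request(receipt: dict, user_sub: str, user_email: str,
                          oidc_issuer: str, url: str = BEACON_URL) -> urllib.request.Request:
    """Build the POST request the curl example sends, without sending it."""
    return urllib.request.Request(
        url,
        data=json.dumps(receipt).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Beacon-User-Sub": user_sub,
            "X-Beacon-User-Email": user_email,
            "X-Beacon-OIDC-Issuer": oidc_issuer,
        },
    )


def post_receipt(req: urllib.request.Request, timeout_s: float = 2.0) -> dict:
    """Send the request and decode Beacon's JSON response."""
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating construction from sending makes the request easy to inspect in tests and easy to hand to a background sender.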

What gets signed

Beacon canonicalizes the receipt (sorted keys, no insignificant whitespace, UTF-8) and signs those bytes with its per-instance Ed25519 key. The response includes the receipt sequence number, the entry hash, and the signature, in the same shape as the in-browser sandbox in Level 100.
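The canonicalize-then-sign step can be reproduced with the Python cryptography package (the same package the audit-log verification step installs). A minimal sketch, assuming that json.dumps with sorted keys and compact separators matches Beacon's canonical form and that the entry hash is SHA-256 over the canonical bytes; check both against a real receipt before relying on them.

```python
import hashlib
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def canonicalize(receipt: dict) -> bytes:
    """Sorted keys, compact separators, non-ASCII kept as UTF-8."""
    return json.dumps(receipt, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def sign_receipt(receipt: dict, key: Ed25519PrivateKey) -> dict:
    """Hash and sign the canonical bytes (entry-hash definition is assumed)."""
    body = canonicalize(receipt)
    return {"entry_hash": hashlib.sha256(body).hexdigest(), "signature": key.sign(body).hex()}


def verify_receipt(receipt: dict, signature_hex: str, public_key: Ed25519PublicKey) -> bool:
    """Re-canonicalize and check the signature; any edit to the receipt fails."""
    try:
        public_key.verify(bytes.fromhex(signature_hex), canonicalize(receipt))
        return True
    except InvalidSignature:
        return False


key = Ed25519PrivateKey.generate()  # stands in for Beacon's per-instance key
receipt = {"vendor": "OpenAI", "model": "gpt-4o-mini", "latency_ms": 312}
signed = sign_receipt(receipt, key)
print(verify_receipt(receipt, signed["signature"], key.public_key()))  # True
print(verify_receipt({**receipt, "latency_ms": 313}, signed["signature"], key.public_key()))  # False
```

Because the verifier re-canonicalizes, key order and whitespace in the stored JSON do not matter; only the values do.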

Wiring it into a production AI call

The pattern that works in production:

Wrap the model client (OpenAI, Anthropic, etc.) in a thin observer that captures vendor / model / version / latency / token counts.
POST to /api/v1/receipts after each call — fire-and-forget, with a local retry queue if Beacon is unreachable.
Never block the user-facing path on Beacon. The receipt is observational; the model call is independent.
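A minimal sketch of that pattern in Python: a wrapper times any model call and hands the receipt to a background thread, so the caller never waits on Beacon. The send callable, the retry policy, and the assumption that the wrapped call returns a dict with a "tokens" key are all illustrative; a production version would also spool the queue to disk so receipts survive a restart.

```python
import queue
import threading
import time
from typing import Callable


class ReceiptObserver:
    """Record a receipt per model call and deliver it off the request path."""

    def __init__(self, send: Callable[[dict], None], max_attempts: int = 5,
                 backoff_s: float = 1.0, max_queue: int = 10_000):
        self._send = send  # e.g. a function that POSTs one receipt to /api/v1/receipts
        self._q: "queue.Queue[tuple[dict, int]]" = queue.Queue(maxsize=max_queue)
        self._max_attempts = max_attempts
        self._backoff_s = backoff_s
        threading.Thread(target=self._drain, daemon=True).start()

    def _enqueue(self, receipt: dict, attempt: int) -> None:
        try:
            self._q.put_nowait((receipt, attempt))
        except queue.Full:
            pass  # drop rather than block the caller; a disk spool would go here

    def observe(self, vendor: str, model: str, call: Callable[[], dict]) -> dict:
        """Run the model call, queue a receipt, and return the result unchanged."""
        start = time.perf_counter()
        result = call()
        self._enqueue({
            "vendor": vendor,
            "model": model,
            "event_type": "invocation",
            "latency_ms": round((time.perf_counter() - start) * 1000),
            "tokens": result.get("tokens", {}),  # assumes a dict-shaped result
        }, attempt=1)
        return result

    def flush(self) -> None:
        """Block until every queued receipt is delivered or given up on."""
        self._q.join()

    def _drain(self) -> None:
        while True:
            receipt, attempt = self._q.get()
            try:
                self._send(receipt)
            except Exception:
                if attempt < self._max_attempts:
                    time.sleep(self._backoff_s * attempt)  # linear backoff
                    self._enqueue(receipt, attempt + 1)
            finally:
                self._q.task_done()
```

Any exception from send triggers a retry, so a Beacon outage costs retries in the background thread, never latency in the user-facing call; flush() is meant for tests and orderly shutdown.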

Verify a bundle from the command line

cd server
node src/cli.js verify ~/.beacon/bundles/bundle-<timestamp>

Or, to verify the audit log that records how this lab page was built (no Node needed):

pip install cryptography
python -m src.audit_log verify
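Verifiers for append-only logs typically walk the entries in order and confirm each one links to the one before it. A minimal sketch of that chain check over an NDJSON log, assuming hypothetical seq, prev_hash, and entry_hash fields and an all-zero genesis hash; it covers the hash chain only, not the Ed25519 signatures, and is not the implementation behind src.audit_log.

```python
import hashlib
import json
from typing import Iterable

GENESIS = "0" * 64  # assumed prev_hash of the first entry


def entry_hash(entry: dict) -> str:
    """SHA-256 over the canonical entry, excluding its own entry_hash field."""
    body = {k: v for k, v in entry.items() if k != "entry_hash"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify_chain(lines: Iterable[str]) -> list[str]:
    """Return problems found in an NDJSON hash chain; an empty list means intact."""
    problems, prev, expected_seq = [], GENESIS, 1
    for raw in lines:
        if not raw.strip():
            continue  # tolerate blank lines
        entry = json.loads(raw)
        seq = entry.get("seq")
        if seq != expected_seq:
            problems.append(f"seq {seq}: expected {expected_seq}")
        if entry.get("prev_hash") != prev:
            problems.append(f"seq {seq}: prev_hash does not match previous entry")
        if entry.get("entry_hash") != entry_hash(entry):
            problems.append(f"seq {seq}: entry_hash mismatch")
        prev, expected_seq = entry.get("entry_hash"), expected_seq + 1
    return problems
```

Editing any past entry changes its hash, so tampering shows up either at that entry or at the next entry's prev_hash link.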

The 100-case deep-dive.

Every public AI failure in the dataset, mapped to the framework controls and the Beacon artifact that would have caught it. Filter by act, framework, harm type, or sector. Click any row to expand. Each row's Beacon artifact hint cross-links back to the Level 200 section where that artifact lives.
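The same filters can be applied to the raw dataset outside the browser. A minimal sketch, assuming docs/data/ai_failures_top100.json is a JSON array of case objects; the field names used in the filters (sector, frameworks) are hypothetical, so inspect the file for its real keys first.

```python
import json
from pathlib import Path


def load_cases(path: Path = Path("docs/data/ai_failures_top100.json")) -> list:
    """Load the dataset, assumed here to be a JSON array of case objects."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def matches(case: dict, field: str, wanted: str) -> bool:
    """Equality for scalar fields, membership for list fields such as frameworks."""
    value = case.get(field)
    if isinstance(value, list):
        return wanted in value
    return value == wanted


def filter_cases(cases: list, **filters: str) -> list:
    """Keep cases that satisfy every keyword filter; no filters keeps everything."""
    return [c for c in cases if all(matches(c, f, w) for f, w in filters.items())]
```

For example, filter_cases(load_cases(), sector="Employment", frameworks="nist-ai-rmf") would narrow the list to hiring cases mapped to NIST AI RMF, if the dataset uses those keys.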

Source: docs/data/ai_failures_top100.json. Methodology mirrors the home-page failures browser.

Pick your cases — Level 200 worksheet.

Pick five technical cases and five policy-facing cases from the deep-dive above. For each, identify the missing Beacon artifact and the framework controls that apply. Use the worksheet to capture your analysis — bring it to your team's next AI governance review.

Suggested splits

Audience	Best examples	Main prompt
Technical teams	Boeing 737 MAX, Tesla Autopilot, Knight Capital, Cruise robotaxi, deployment failures	Where would you instrument Beacon first?
Policy teams	Robodebt, UK A-levels, Apple Card, Detroit facial recognition, iTutorGroup	What evidence should have existed before harm?

Completion standard — advanced.

You've completed Level 200 when you can say all four statements clearly.

Self-attestation

1. I can launch the Suitcase Lab and confirm it's signing receipts.
2. I can choose lab variants for a specific workshop length and audience.
3. I can emit Policy-as-Code (gate YAML + Rego) from a checklist outcome and explain what it enforces.
4. I can cross-reference any case in the 100-failure dataset to the framework controls and the Beacon artifact that would have caught it.

Level 200 complete.

You can ship Beacon yourself. Spin up the Suitcase Lab for stakeholders. Pick lab variants for any audience. Emit Policy-as-Code that turns checklists into versioned controls. And cross-reference any failure to the Beacon artifact that would have caught it.

Want to verify how this lab was built? Every commit is in a signed Ed25519 audit log.

