Level 200
Suitcase + Policy-as-Code + 100-failure deep-dive

Ship Beacon yourself.

You're past the 30-minute story. Level 200 covers five things: the Suitcase Lab, which is the fastest path to a working Beacon; the nine lab variants (10 to 20 minutes each) that compose a half-day workshop; the Policy-as-Code emission that turns checklists into versioned controls; the receipt API for wiring your own AI calls into Beacon; and the full 100-case failure deep-dive with framework cross-links.

By the end of this level you'll be able to:

  1. Spin up a complete Beacon stack on your laptop in 60 seconds.
  2. Choose four lab variants that compose a working 90-minute workshop.
  3. Emit Policy-as-Code (gate YAML + OPA Rego) from a checklist outcome.
  4. Wire a real model invocation into Beacon via the receipt API.
  5. Cross-reference any AI failure to the framework controls and Beacon artifact that would have caught it.
Section 1 of 8

Suitcase Lab — laptop, 60 seconds.

A laptop-first Beacon stack. Beacon core, Studio, synth-traffic replay, and an optional mock DNS service run together through Docker Compose. Studio shows live receipts replayed from the 100 real AI failure cases.

# 1. Clone, start the stack, open Studio
git clone https://github.com/bobrapp/aigovops-beacon
cd aigovops-beacon
docker compose -f deploy/lab.yml up -d
open http://localhost:8788   # macOS; on Linux use xdg-open

# 2. Sanity check
curl -s http://localhost:8787/healthz
# {"ok":true,"version":"2.2.0","receipts":47,"keys":1}
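The stack can take a few seconds to answer after `docker compose up -d` returns, so a scripted demo may want to wait for the health endpoint instead of calling curl once. A minimal Python sketch, assuming only the /healthz URL and the "ok" field shown above; the function names, retry count, and delay are illustrative and not part of Beacon:

```python
import json
import time
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8787/healthz"  # from the sanity check above


def fetch_health(url: str = HEALTH_URL) -> dict:
    """GET the health endpoint and parse the JSON body."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_ready(fetch: Callable[[], dict] = fetch_health,
                     attempts: int = 30, delay_s: float = 2.0) -> dict:
    """Poll until the payload reports ok=true, or raise after `attempts` tries."""
    last_error = None
    for _ in range(attempts):
        try:
            health = fetch()
            if health.get("ok") is True:
                return health
        except (OSError, ValueError) as exc:  # connection refused, bad JSON
            last_error = exc
        time.sleep(delay_s)
    raise TimeoutError(f"Beacon not ready after {attempts} attempts: {last_error}")
```

Calling `wait_until_ready()` with no arguments polls the local lab stack; passing a different `fetch` callable makes the loop easy to exercise without a running container.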

What's actually running

Container | Port | Role
beacon | 8787 | Receipt ingest, signing key, inventory
studio | 8788 | Web UI: see receipts, score frameworks, export bundles
synth-traffic | - | Replays docs/data/ai_failures_top100.json as DNS/SNI events
mock-dns | 5353/udp | Optional. Pretends to be a corp resolver; logs queries

Studio is now showing receipts replayed live. Click any receipt → Verify → green check.

Section 2 of 8

Nine 20-minute lab variants.

Each variant is a self-contained block. Pick four for a 90-minute workshop. Pick all nine for a half-day. The 30-minute Level 100 flow maps to Labs 1, 4, and 7. Labs 8 and 9 are v2.3 additions.

Lab 1

First receipt

15 min · `docker compose up`

Make a real decision in Studio (e.g., approve a fine-tune). Watch the receipt land. Hash it. Verify the signature with the public key.

Evidence: one signed receipt, ready to hand off.

Lab 2

Discover models in your browser

20 min · MV3 extension

Load the unpacked extension. Visit chatgpt.com, claude.ai, gemini.google.com, copilot.microsoft.com. Discovery tab fills with hostnames.

Evidence: discovery rows tied to user identity.

Lab 3

Tail a DNS log

15 min · scripts/tail_dns.py

Drop a DNS query CSV in the watched folder. Run the tail script. AI hostnames light up in Studio's Discovery tab.

Evidence: every AI hostname resolved by any device.
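To make the matching step in Lab 3 concrete, here is a rough Python sketch of filtering a DNS query CSV for AI hostnames. It is not the contents of scripts/tail_dns.py. The hostname list is borrowed from Lab 2, and the CSV column name "query" is an assumption about the log layout.

```python
import csv
from typing import Iterable, Iterator, TextIO

# Hostnames taken from Lab 2; a real deployment would use a longer list.
AI_HOSTS = ("chatgpt.com", "claude.ai", "gemini.google.com", "copilot.microsoft.com")


def is_ai_host(name: str, known: Iterable[str] = AI_HOSTS) -> bool:
    """True if `name` equals a known host or is a subdomain of one."""
    name = name.strip().rstrip(".").lower()
    return any(name == h or name.endswith("." + h) for h in known)


def ai_queries(log: TextIO, column: str = "query") -> Iterator[dict]:
    """Yield CSV rows whose `column` value is an AI hostname.

    The column name "query" is an assumption about the CSV layout.
    """
    for row in csv.DictReader(log):
        if is_ai_host(row.get(column) or ""):
            yield row
```

Matching on a leading dot keeps lookalike domains out: a name such as notclaude.ai does not count as a subdomain of claude.ai, while api.chatgpt.com does count for chatgpt.com.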

Lab 4

Score a framework

20 min · nist-ai-rmf-1.0

Pick a framework. Walk the 23-control checklist. Mark each ✓ / ✗ / N/A. Export the receipt bundle. Open the PDF.

Evidence: a signed bundle an auditor can reproduce.
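A checklist outcome is a mapping from control to mark, so it is easy to summarize before export. The Python sketch below is one possible tally, not Studio's scoring logic. The mark names "pass", "fail", and "na" stand in for ✓ / ✗ / N/A, and the score formula (passed divided by applicable controls) is an assumption.

```python
from collections import Counter
from typing import Mapping

VALID_MARKS = {"pass", "fail", "na"}  # stand-ins for the three checklist marks


def summarize(checklist: Mapping[str, str]) -> dict:
    """Count marks and compute passed / applicable (N/A controls excluded)."""
    unknown = {m for m in checklist.values() if m not in VALID_MARKS}
    if unknown:
        raise ValueError(f"unknown marks: {sorted(unknown)}")
    counts = Counter(checklist.values())
    applicable = counts["pass"] + counts["fail"]
    return {
        "passed": counts["pass"],
        "failed": counts["fail"],
        "not_applicable": counts["na"],
        # None when every control is N/A, to avoid dividing by zero
        "score": counts["pass"] / applicable if applicable else None,
        "failing_controls": sorted(c for c, m in checklist.items() if m == "fail"),
    }
```

The `failing_controls` list is the part worth carrying forward: those are the controls a governance gate would need to block on.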

Lab 5

Replay a failure

15 min · Studio › Failures

Pick a case (e.g., iTutorGroup hiring discrimination, #4). Click Replay as decision. Beacon emits the receipt as if you'd reviewed it pre-deploy.

Evidence: counterfactual receipt — YES-Ship, Steady, or Recover?

Lab 6

MCP attest

20 min · Claude Desktop

Add the MCP config snippet to Claude Desktop. Six tools appear: record_decision, verify_receipt, query_inventory, score_framework, bundle_for_auditor, replay_case.

Evidence: every agent action gets an audit trail.

Lab 7

Bundle for auditor

15 min · Studio › Bundle

Pick a date range, framework, scope. Download the tarball. Hand it to a partner team. They run beacon verify bundle.tar.gz with no other access.

Evidence: portable verification — green check, no network.

Lab 8

Offline walkthrough on a USB stick

10 min · v2.3

Copy docs/walkthrough/ to USB. Open index.html in any browser — no network, no install. 12 screens animate the full flow. Use ←/→/Space.

Evidence: governance in a single click, no infrastructure.

Lab 9

Hosted MCP + restricted agent

20 min · v2.3

Run the hosted MCP on a free Render instance. Connect Claude Desktop to your MCP /sse URL. Ask the restricted Worker to bundle 30 days for an EU AI Act audit.

Evidence: an autonomous agent whose tools are exactly the six Beacon tools.

Workshop combinations

  • 30 minutes — Lab 1 (first receipt) → Lab 4 (score a framework) → Lab 7 (bundle for auditor).
  • 90 minutes — Labs 1, 2, 4, 5, 7.
  • Half day — All nine, in the order shown.
  • Tabletop, no admin rights — Lab 8 (offline USB) only.
Section 3 of 8

Policy-as-Code lab.

Beacon's Make it Policy as Code path emits gate YAML, OPA .rego, a Governance Decision Record, and a signed bundle — then opens a pull request against your governance repo. Checklists become versioned controls. Meeting notes become artifacts.

Gate YAML (governance gate spec)

kind: GovernanceGate
apiVersion: aigovops.foundation/v1
metadata:
  name: high-risk-ai-release
spec:
  framework: nist-ai-rmf
  system: hr-screen-gpt
  requires:
    - human_approval
    - signed_receipt
    - evidence_bundle
  block_on:
    - missing_human_approval
    - stale_receipt: 30d
  emit:
    - governance_decision_record
    - signed_bundle

OPA Rego (enforcement)

package aigovops.beacon.high_risk_ai_release

default allow := false

allow if {
    input.framework == "nist-ai-rmf"
    input.system == "hr-screen-gpt"
    has_human_approval
    has_signed_receipt
    has_evidence_bundle
}

has_human_approval if {
    some i
    input.attestations[i].kind == "human_approval"
    input.attestations[i].verified == true
}

has_signed_receipt if {
    some i
    input.receipts[i].signature_ed25519
    input.receipts[i].verified == true
}

has_evidence_bundle if {
    input.bundle.manifest_sha256
    input.bundle.verify_passed == true
}

The YAML and the Rego express the same gate in two languages. The YAML is what the governance team reviews. The Rego is what the CI/CD gate runs; the excerpt above covers the three requires checks and does not show the stale_receipt rule. The signed receipt is what the auditor verifies, months or years later, without needing Beacon installed.
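When drafting input documents for the gate, it can help to check them locally before involving OPA. The Python sketch below mirrors the Rego rules shown above, using the same input field names. It is an approximation for experimenting, not a replacement for OPA, and it covers only the rules in the excerpt (so no stale_receipt check). The helper `_holds` is a hypothetical name that imitates Rego's "defined and not false" behaviour for a bare reference.

```python
_MISSING = object()


def _holds(doc: dict, key: str) -> bool:
    """Approximate a bare Rego reference: defined and not false."""
    value = doc.get(key, _MISSING)
    return value is not _MISSING and value is not False


def has_human_approval(inp: dict) -> bool:
    return any(a.get("kind") == "human_approval" and a.get("verified") is True
               for a in inp.get("attestations", []))


def has_signed_receipt(inp: dict) -> bool:
    return any(_holds(r, "signature_ed25519") and r.get("verified") is True
               for r in inp.get("receipts", []))


def has_evidence_bundle(inp: dict) -> bool:
    bundle = inp.get("bundle", {})
    return _holds(bundle, "manifest_sha256") and bundle.get("verify_passed") is True


def allow(inp: dict) -> bool:
    """Default deny: every condition must hold, as in the Rego allow rule."""
    return (inp.get("framework") == "nist-ai-rmf"
            and inp.get("system") == "hr-screen-gpt"
            and has_human_approval(inp)
            and has_signed_receipt(inp)
            and has_evidence_bundle(inp))
```

An empty input returns False, matching `default allow := false`. Flipping a single attestation's `verified` to false is enough to close the gate.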

Section 4 of 8

Receipt API lab.

Wire a real model invocation into Beacon by posting receipt data to /api/v1/receipts with model metadata, environment, prompt, result, latency, and token counts. The Beacon server signs the receipt and appends it to the NDJSON log.

curl -s http://127.0.0.1:8787/api/v1/receipts \
  -H "Content-Type: application/json" \
  -H "X-Beacon-User-Sub: oidc|alice" \
  -H "X-Beacon-User-Email: alice@example.org" \
  -H "X-Beacon-OIDC-Issuer: https://accounts.example.org" \
  -d '{
    "vendor": "OpenAI",
    "model": "gpt-4o-mini",
    "version": "2024-07-18",
    "environment": "production",
    "event_type": "invocation",
    "prompt": "summarize the policy",
    "result": "…",
    "latency_ms": 312,
    "tokens": {"in": 21, "out": 47}
  }'

What gets signed

Beacon canonicalizes the receipt (sorted keys, no whitespace, UTF-8) and signs the bytes with its per-instance Ed25519 key. The response includes the receipt sequence number, the entry hash, and the signature — same shape as the in-browser sandbox in Level 100.
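The canonicalization rule (sorted keys, no whitespace, UTF-8) can be sketched in a few lines of Python. This shows the general technique and may differ in detail from Beacon's own canonicalizer. Using SHA-256 for the entry hash is an assumption, and the signing step is left out so the example needs only the standard library.

```python
import hashlib
import json


def canonical_bytes(receipt: dict) -> bytes:
    """Serialize with sorted keys and no insignificant whitespace, as UTF-8."""
    text = json.dumps(receipt, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def entry_hash(receipt: dict) -> str:
    """Hex digest of the canonical bytes (SHA-256 is an assumption here)."""
    return hashlib.sha256(canonical_bytes(receipt)).hexdigest()
```

The point of canonicalizing first is that two receipts with the same content but different key order produce identical bytes, so they hash and verify identically. With the `cryptography` package, those same bytes are what would be passed to an Ed25519 private key's `sign()` and to the public key's `verify()`.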

Wiring it into a production AI call

The pattern that works in production:

  1. Wrap the model client (OpenAI, Anthropic, etc.) in a thin observer that captures vendor / model / version / latency / token counts.
  2. POST to /api/v1/receipts after each call — fire-and-forget, with a local retry queue if Beacon is unreachable.
  3. Never block the user-facing path on Beacon. The receipt is observational; the model call is independent.
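The three steps above can be sketched as a small Python observer. This is an illustration under stated assumptions, not Beacon's client library: `ReceiptObserver` and `post_receipt` are hypothetical names, the retry queue lives in memory only, and a receipt is dropped after `max_retries` failed attempts. The X-Beacon-User-* headers from the curl example are omitted for brevity.

```python
import json
import queue
import threading
import time
import urllib.request
from typing import Any, Callable

RECEIPTS_URL = "http://127.0.0.1:8787/api/v1/receipts"  # endpoint from the curl example


def post_receipt(receipt: dict, url: str = RECEIPTS_URL) -> None:
    """POST one receipt as JSON; raises OSError if Beacon is unreachable."""
    req = urllib.request.Request(
        url,
        data=json.dumps(receipt).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5):
        pass


class ReceiptObserver:
    """Wraps a model call, then queues a receipt for a background sender."""

    def __init__(self, send: Callable[[dict], None] = post_receipt,
                 max_retries: int = 3, retry_delay_s: float = 1.0):
        self._send = send
        self._max_retries = max_retries
        self._retry_delay_s = retry_delay_s
        self._pending: "queue.Queue[dict]" = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def _drain(self) -> None:
        """Background loop: send each queued receipt, retrying on failure."""
        while True:
            receipt = self._pending.get()
            for _ in range(self._max_retries):
                try:
                    self._send(receipt)
                    break
                except OSError:
                    time.sleep(self._retry_delay_s)
            self._pending.task_done()

    def observe(self, call: Callable[[], Any], **metadata: Any) -> Any:
        """Run `call`, record latency, enqueue a receipt, return the result."""
        start = time.monotonic()
        result = call()
        latency_ms = int((time.monotonic() - start) * 1000)
        self._pending.put({**metadata, "event_type": "invocation",
                           "latency_ms": latency_ms})
        return result

    def flush(self) -> None:
        """Block until queued receipts are sent or given up on (tests, shutdown)."""
        self._pending.join()
```

A caller would wrap its model client call in a lambda and pass the metadata it already knows, for example `observer.observe(lambda: client_call(prompt), vendor="OpenAI", model="gpt-4o-mini")`, where `client_call` is whatever function invokes the model. The model call returns as soon as it finishes; the POST happens on the background thread.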

Verify a bundle from the command line

cd server
node src/cli.js verify ~/.beacon/bundles/bundle-<timestamp>

Or, for the lab-style audit log that built this very page, no Node needed:

pip install cryptography
python -m src.audit_log verify
Section 5 of 8

The 100-case deep-dive.

Every public AI failure in the dataset, mapped to the framework controls and the Beacon artifact that would have caught it. Filter by act, framework, harm type, or sector. Click any row to expand. Each row's Beacon artifact hint cross-links back to the Level 200 section where that artifact lives.

Source: docs/data/ai_failures_top100.json. Methodology mirrors the home-page failures browser.
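The same filtering can be done offline against the JSON file. The Python sketch below assumes the file is a JSON array of objects. The schema is not documented here, so any field names passed as filters (for example `sector` or `frameworks`) are hypothetical and should be checked against the actual file.

```python
import json
from typing import Any, Iterable, List


def load_cases(path: str = "docs/data/ai_failures_top100.json") -> List[dict]:
    """Load the dataset; assumes a top-level JSON array of case objects."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def filter_cases(cases: Iterable[dict], **criteria: Any) -> List[dict]:
    """Keep cases matching every criterion.

    A criterion matches if it equals the field value, or is a member
    of the field when the field is a list.
    """
    def matches(case: dict, key: str, wanted: Any) -> bool:
        value = case.get(key)
        if isinstance(value, list):
            return wanted in value
        return value == wanted

    return [c for c in cases
            if all(matches(c, k, v) for k, v in criteria.items())]
```

For instance, `filter_cases(load_cases(), sector="hiring")` would return the hiring-related cases if the file uses a `sector` field with that value.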

Section 6 of 8

Pick your cases — Level 200 worksheet.

Pick five technical cases and five policy-facing cases from the deep-dive above. For each, identify the missing Beacon artifact and the framework controls that apply. Use the worksheet to capture your analysis — bring it to your team's next AI governance review.

Suggested splits

Audience | Best examples | Main prompt
Technical teams | Boeing 737 MAX, Tesla Autopilot, Knight Capital, Cruise robotaxi, deployment failures | Where would you instrument Beacon first?
Policy teams | Robodebt, UK A-levels, Apple Card, Detroit FR, iTutorGroup | What evidence should have existed before harm?
Section 7 of 8

Check yourself — advanced.

1. Which command brings up the Suitcase Lab?

docker compose -f deploy/lab.yml up -d. Beacon core, Studio, synth-traffic, and mock-dns all come up together.

2. Which lab variant is the "shortest 30-minute workshop" path?

Labs 1, 4, and 7. The 30-minute path is the through-line of the Level 100 flow: make a decision, score it, hand off the bundle.

3. What does the "Make it Policy as Code" path emit?

Gate YAML, OPA Rego, a Governance Decision Record (GDR), and a signed bundle. The four artifacts compose the builder-to-auditor bridge: the YAML is reviewable, the Rego is enforceable, the GDR is documentation, and the bundle is portable evidence.

4. The Receipt API endpoint is —

/api/v1/receipts: versioned, plural, namespaced under /api. The request uses headers like X-Beacon-User-Sub and X-Beacon-OIDC-Issuer to bind the actor.

5. The Beacon artifact most likely to have caught Robodebt is —

Robodebt was a legality failure — the algorithm lacked authority. A governance gate would have forced the legal review BEFORE the algorithm ran against citizens.

6. Why is the offline USB walkthrough (Lab 8) useful?

Sometimes you need to demonstrate Beacon in a room with no internet, on a laptop with no admin rights. The USB walkthrough proves the model still works there.
Section 8 of 8

Completion standard — advanced.

You've completed Level 200 when you can say all four statements clearly.

Self-attestation

Level 200 complete.

You can ship Beacon yourself. Spin up the Suitcase Lab for stakeholders. Pick lab variants for any audience. Emit Policy-as-Code that turns checklists into versioned controls. And cross-reference any failure to the Beacon artifact that would have caught it.

Want to verify how this lab was built? Every commit is in a signed Ed25519 audit log.

How I Built This →