Reference Architecture Blueprint · v1.0 · advisory under the Foundation
Map every governance control to a CI/CD stage. Sign the receipt. Hand the auditor a bundle.
A reference architecture for embedding AI governance — automated evidence collection, cryptographic audit logging, data & model lineage, and Policy-as-Code guardrails — into a modern GitHub-native CI/CD pipeline. One shared evidence model that policy and engineering teams both write against. Auditor-ready by design.
- Ed25519 + RFC 8785 JCS
- OVERT 1.0 envelope
- SLSA v1.0 provenance
- in-toto attestations
- OpenLineage 1.x
- OPA / Conftest
- NIST AI RMF · ISO 42001 · EU AI Act
Six principles this blueprint is built on
1 · Evidence is the product
Every governance decision emits a signed receipt. If there is no receipt, it did not happen. The build artifact, the model, the policy decision, and the deployment are all peers — each gets a signed attestation.
2 · One shared schema
Policy team and engineering team write against the same OVERT envelope. Controls reference the receipt fields they verify. No second spreadsheet, no quarterly reconciliation.
3 · Policy lives in the repo
Rego, gate YAML, and crosswalks are committed code. Changes go through pull request, dual-control approval, and the same CI as application code.
4 · Pipelines are the perimeter
The only path to production is the signed pipeline. Out-of-band deploys are detected by drift checks and rejected by admission controllers.
5 · Lineage is automatic
Datasets, prompts, models, and outputs are linked by hash from collection through inference. Auditors can replay any decision back to its inputs.
6 · Humans hold legitimacy
Agents do the bureaucracy. Exceptions, key rotations, and trust-tier promotions require a named human approver — and that approval is itself a receipt.
The pipeline, end to end
Three lifecycle stages — Design, Deployment, Operations — sit on top of four shared planes: the Evidence Plane (signed receipts), the Policy Plane (OPA + gate YAML), the Lineage Plane (OpenLineage graph), and the Trust Plane (KMS, OIDC, transparency log).
flowchart TB
classDef design fill:#e6f3f4,stroke:#01696f,color:#01314c,stroke-width:1.5px
classDef deploy fill:#fef6e0,stroke:#b87b00,color:#4a3000,stroke-width:1.5px
classDef ops fill:#e8f5ea,stroke:#2ecc71,color:#0d3d1a,stroke-width:1.5px
classDef plane fill:#01314c,stroke:#01696f,color:#fff,stroke-width:1.5px
subgraph DESIGN["DESIGN STAGE"]
direction TB
D1[Use-case intake<br/>risk classification]
D2[Data sheet +<br/>model card draft]
D3[Threat model<br/>red-team plan]
D4[Design review<br/>signed approval]
D1 --> D2 --> D3 --> D4
end
subgraph DEPLOY["DEPLOYMENT STAGE"]
direction TB
P1[Source commit<br/>signed by dev]
P2[Build + SBOM<br/>SLSA provenance]
P3[Eval suite<br/>bias · safety · perf]
P4[Policy gate<br/>OPA / Conftest]
P5[Signed release<br/>cosign + attest]
P6[Admission control<br/>verify in cluster]
P1 --> P2 --> P3 --> P4 --> P5 --> P6
end
subgraph OPS["OPERATIONS STAGE"]
direction TB
O1[Inference observed<br/>OVERT receipt]
O2[Drift + safety<br/>monitors]
O3[Incident detect<br/>+ kill switch]
O4[Continuous<br/>re-evaluation]
O1 --> O2 --> O3 --> O4 --> O1
end
DESIGN ==> DEPLOY ==> OPS
subgraph PLANES["SHARED PLANES"]
direction LR
EV[(Evidence Plane<br/>signed receipts<br/>Merkle anchored)]
PL[(Policy Plane<br/>Rego · gate YAML<br/>crosswalks)]
LN[(Lineage Plane<br/>OpenLineage<br/>graph)]
TR[(Trust Plane<br/>KMS · OIDC<br/>transparency log)]
end
DESIGN -.-> EV & PL & LN & TR
DEPLOY -.-> EV & PL & LN & TR
OPS -.-> EV & PL & LN & TR
class D1,D2,D3,D4 design
class P1,P2,P3,P4,P5,P6 deploy
class O1,O2,O3,O4 ops
class EV,PL,LN,TR plane
Design Stage
Before any code is written: classify the risk, document the intent, sign the design.
What happens
- Use-case intake. A YAML use-case manifest is created in the governance repo (one PR per use case). Risk tier is set by a classifier policy (EU AI Act Annex III / NIST AI RMF Map function).
- Data sheet & model card. Templates auto-populate from upstream inventory; the author fills gaps. Hashes of training data and base model are pinned.
- Threat model & red-team plan. Generated from the risk tier: high-risk use cases get STRIDE plus AI-specific threats (prompt injection, poisoning, confabulation) and a mandatory red-team budget.
- Design review. Named approvers (product, security, privacy, ethics for high risk) sign a design.approved receipt. This receipt becomes the parent of every downstream attestation.
Evidence collected
| Artifact | Format | Receipt event |
|---|---|---|
| Use-case manifest | YAML in repo, hashed | design.usecase.registered |
| Risk classification | OPA decision log | design.risk.classified |
| Data sheet | Markdown + SHA-256 | design.datasheet.published |
| Model card | Markdown + SHA-256 | design.modelcard.published |
| Threat model | Markdown + STRIDE table | design.threatmodel.completed |
| Design approval | Signed PR review + Ed25519 | design.approved |
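The "Markdown + SHA-256" rows above amount to hashing the published file and recording the digest in a receipt. A minimal sketch, assuming the receipt body reuses the subject.digest shape from the profile extensions; the function name pin_artifact, the file path, and the sample content are hypothetical, and signing is left out:

```python
import hashlib


def pin_artifact(name: str, content: bytes, event_type: str) -> dict:
    """Return an unsigned receipt body that pins an artifact by its SHA-256 digest."""
    digest = hashlib.sha256(content).hexdigest()
    return {
        "event_type": event_type,
        "subject": {"name": name, "digest": {"sha256": digest}},
    }


# Hypothetical data sheet content; in practice this would be read from the repo.
datasheet = b"# Data sheet\n\nSource: internal CRM export\n"
receipt = pin_artifact("datasheets/crm.md", datasheet, "design.datasheet.published")
print(receipt["subject"]["digest"]["sha256"])
```

Any later edit to the file changes the digest, so a downstream receipt that cites the old digest no longer matches the artifact.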
Controls satisfied
- NIST AI RMF: MAP-1.1, MAP-2.1, MAP-3.4, MAP-4.1, GOVERN-1.2
- ISO/IEC 42001: 6.1.2 (AI risk assessment), 6.1.4 (AI system impact assessment), A.6.1.2 (objectives for responsible development)
- EU AI Act: Art. 9 (risk management system), Art. 11 + Annex IV (technical documentation), Art. 27 (FRIA for high-risk)
Deployment Stage
The CI/CD pipeline is the only path to production. Every step is a signed attestation; every artifact carries its provenance.
flowchart LR
classDef step fill:#fef6e0,stroke:#b87b00,color:#4a3000,stroke-width:1.5px
classDef evidence fill:#01314c,stroke:#01696f,color:#fff
classDef gate fill:#fde2e0,stroke:#c0392b,color:#5a1410,stroke-width:1.5px
DEV[Developer<br/>signed commit<br/>gitsign] --> BUILD[GitHub Actions<br/>OIDC to cloud<br/>hermetic build]
BUILD --> SBOM[SBOM<br/>CycloneDX]
BUILD --> PROV[SLSA provenance<br/>in-toto v1]
BUILD --> EVAL[Model eval suite<br/>bias · safety · perf · red-team]
EVAL --> ATT[Eval attestation<br/>signed predicate]
SBOM --> GATE
PROV --> GATE
ATT --> GATE
GATE{Policy gate<br/>OPA / Conftest<br/>+ Beacon checklist}
GATE -- pass --> SIGN[cosign sign<br/>image + attestations]
GATE -- fail --> BLOCK[blocked + receipt<br/>gate.failed]
SIGN --> REG[(OCI registry<br/>+ Rekor log)]
REG --> ADM[Admission controller<br/>verify signatures<br/>verify attestations]
ADM -- pass --> PROD[Production]
ADM -- fail --> QUAR[Quarantine + receipt]
class DEV,BUILD,SBOM,PROV,EVAL,ATT,SIGN,REG,ADM,PROD step
class GATE gate
class BLOCK,QUAR evidence
Pipeline stages & evidence
| Stage | Tooling | Attestation |
|---|---|---|
| Commit | gitsign + Sigstore | Signed commit, signed tag |
| Build | GH Actions, hermetic runner, OIDC | SLSA v1.0 build provenance |
| SBOM | Syft / cdxgen | CycloneDX 1.6 + sig |
| Test & eval | pytest, DeepEval, Garak, OWASP LLM | in-toto eval predicate |
| Policy gate | OPA / Conftest + Beacon checklist runner | gate.evaluated receipt |
| Sign | cosign keyless (Fulcio + Rekor) | Sigstore bundle |
| Admit | Kyverno / Gatekeeper / Connaisseur | Admission decision receipt |
Policy-as-Code anatomy
Every gate is two files committed in the governance repo:
- gate.<org>.v1.yaml: the declarative half of the matched pair. It states which receipts to look at, which evaluations to run, and what thresholds count as a pass.
- <org>_audit.rego: the executable counterpart. Same logic, machine-checked. Conftest runs both against the bundle.
When a control fires, the gate emits a gate.evaluated receipt referencing the inputs by hash. Auditors replay any decision deterministically.
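To make the "inputs by hash" idea concrete, here is a minimal Python sketch of a gate evaluation. It is not the Rego or the gate YAML format itself: the GATE structure, the metric names, the thresholds, and the inputs field are hypothetical stand-ins, and the decision block borrows its field names from the profile extensions.

```python
import hashlib
import json


def sha256_of(obj) -> str:
    """Hash a JSON-serialisable object using sorted keys and compact separators."""
    data = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(data).hexdigest()


# Hypothetical declarative gate: metric name -> (comparison, threshold).
GATE = {
    "safety.refusal_rate": ("min", 0.95),
    "bias.demographic_parity_gap": ("max", 0.05),
}


def evaluate_gate(gate: dict, evals: dict) -> dict:
    """Compare eval results against thresholds and return an unsigned receipt body."""
    failed = []
    for metric, (kind, threshold) in gate.items():
        value = evals.get(metric)
        if value is None:
            failed.append(metric)  # missing evidence counts as a failure
        elif kind == "min" and value < threshold:
            failed.append(metric)
        elif kind == "max" and value > threshold:
            failed.append(metric)
    return {
        "event_type": "gate.evaluated",
        # Inputs are referenced by hash so the decision can be replayed later.
        "inputs": {"gate_hash": sha256_of(gate), "evals_hash": sha256_of(evals)},
        "decision": {
            "result": "pass" if not failed else "fail",
            "rules_evaluated": len(gate),
            "rules_failed": len(failed),
        },
        "failed_rules": sorted(failed),
    }


good = evaluate_gate(GATE, {"safety.refusal_rate": 0.97, "bias.demographic_parity_gap": 0.02})
bad = evaluate_gate(GATE, {"safety.refusal_rate": 0.90})
print(good["decision"]["result"], bad["decision"]["result"])
```

Because the function is pure and its inputs are hashed, running it again on the same gate and eval files yields the same receipt body, which is the property a deterministic replay depends on.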
Controls satisfied
- NIST AI RMF: MEASURE-2.3, MEASURE-2.7, MEASURE-2.11, MANAGE-1.3, MANAGE-2.2
- ISO/IEC 42001: 8.2 (AI risk assessment), 8.3 (AI risk treatment), 8.4 (AI system impact assessment), A.6.2.4 (verification & validation), A.8.2 (system documentation)
- EU AI Act: Art. 15 (accuracy, robustness, cybersecurity), Art. 17 (QMS), Art. 18 (documentation keeping), Art. 43 (conformity assessment)
- SLSA v1.0: Build Levels 2–3 (hosted, hardened, isolated build)
Operations Stage
In production, the receipts never stop. Every inference, every drift event, every kill-switch firing produces evidence that closes the loop back to design.
What runs in production
- OVERT-shaped inference receipts. The runtime SDK (or sidecar proxy) emits one inference.observed receipt per call: user, vendor, model, version, prompt-hash, result-hash, latency, signature.
- Hourly Merkle anchor. All receipts in the hour are hashed into a Merkle tree; the root is published to a transparency log (Rekor or internal) and optionally to S3 / IPFS for off-org witnesses.
- Drift & safety monitors. Continuous evaluation against the same suites used in deployment. Threshold breach → monitor.threshold.breached receipt → automatic ticket + policy re-eval.
- Incident & kill switch. Severity-tagged events trigger pre-approved runbooks. Disabling a tool, blocking a source, or pausing an agent is itself a signed receipt.
- Continuous re-evaluation. Quarterly (or trigger-based) re-run of the deployment policy gate against fresh production data. Drift in real behavior reopens the design loop.
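The hourly Merkle anchor can be sketched as follows. This is an illustration rather than the Beacon anchor service: it borrows the leaf and node prefixes from RFC 6962 but pads odd levels by duplicating the last node, a simplification a production tree should avoid. The function name and the sample receipts are hypothetical.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list) -> str:
    """Return the hex Merkle root over a list of serialized receipts (bytes)."""
    if not leaves:
        raise ValueError("no receipts to anchor")
    # Prefix leaves with 0x00 and interior nodes with 0x01 so that a leaf
    # can never be reinterpreted as an interior node.
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # simplification: duplicate the last node
        level = [
            _h(b"\x01" + level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


# Hypothetical serialized receipts for one hour.
hour = [b'{"id":"r1"}', b'{"id":"r2"}', b'{"id":"r3"}']
root = merkle_root(hour)
tampered = merkle_root([b'{"id":"r1"}', b'{"id":"rX"}', b'{"id":"r3"}'])
print(root != tampered)
```

Publishing only the root commits to every receipt in the hour: changing, dropping, or reordering any receipt produces a different root.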
Evidence collected
| Event | Source | Receipt event |
|---|---|---|
| Model call | SDK / proxy | inference.observed |
| Tool call | Agent runtime | agent.tool.called |
| Retrieval | RAG layer | agent.retrieval.hit |
| Drift detected | Monitor | monitor.drift.detected |
| Safety violation | LLM firewall | guardrail.violated |
| Kill switch | Runbook | incident.killswitch.fired |
| Hourly anchor | Beacon anchor svc | bundle.anchored |
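A sketch of what an SDK might do for the "Model call" row: hash the prompt and result instead of storing them, then fill in the envelope fields. The function names and the latency_ms field are assumptions (the source lists latency but not its field name), and the id and signature fields are omitted.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash text so the receipt proves what was sent without storing it."""
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def inference_receipt(*, user_sub, vendor, model, version, prompt, result,
                      latency_ms, ts_utc, parent_receipt_id):
    """Build an unsigned inference.observed receipt body."""
    return {
        "ts_utc": ts_utc,
        "user": {"sub": user_sub},
        "vendor": vendor,
        "model": model,
        "version": version,
        "prompt_hash": content_hash(prompt),
        "result_hash": content_hash(result),
        "latency_ms": latency_ms,  # hypothetical field name
        "event_type": "inference.observed",
        "parent_receipt_id": parent_receipt_id,
    }


r = inference_receipt(
    user_sub="svc-checkout",  # hypothetical caller identity
    vendor="anthropic",
    model="claude-sonnet-4-5",
    version="2026-04-01",
    prompt="Summarise this ticket",
    result="Customer reports a login failure.",
    latency_ms=412,
    ts_utc="2026-05-23T18:04:22.318Z",
    parent_receipt_id="01HXYZ8K3F2N5Q1R7S9V3W4Y69",
)
print(r["prompt_hash"])
```

Storing hashes rather than raw text keeps personal data out of the evidence store while still letting an auditor confirm that a retained prompt matches the receipt.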
Controls satisfied
- NIST AI RMF: MEASURE-2.8, MEASURE-3.1, MEASURE-4.1, MANAGE-2.3, MANAGE-4.1, MANAGE-4.3
- ISO/IEC 42001: 9.1 (monitoring & measurement), 10.1 (continual improvement), A.6.2.6 (operation & monitoring)
- EU AI Act: Art. 14 (human oversight), Art. 19 (automatically generated logs), Art. 26 (deployer obligations), Art. 72 (post-market monitoring), Art. 73 (incident reporting)
The shared evidence model
One envelope binds policy and engineering. Beacon receipts are OVERT 1.0 envelopes with the aigovops-beacon.v1 profile. Policy team writes controls that reference these fields; engineering writes SDKs that emit them. Same schema, two teams, zero translation overhead.
Envelope (OVERT 1.0, normative)
{
  "id": "01HXYZ8K3F2N5Q1R7S9V3W4Y6B",
  "ts_utc": "2026-05-23T18:04:22.318Z",
  "user": {
    "sub": "build-runner@github",
    "email": "actions@github.com",
    "oidc_issuer": "https://token.actions.githubusercontent.com"
  },
  "vendor": "anthropic",
  "model": "claude-sonnet-4-5",
  "version": "2026-04-01",
  "prompt_hash": "sha256:9f2c…",
  "result_hash": "sha256:7b1a…",
  "event_type": "gate.evaluated",
  "environment": "cloud_saas",
  "evidence_id": "01HXYZ8K3F2N5Q1R7S9V3W4Y6A",
  "parent_receipt_id": "01HXYZ8K3F2N5Q1R7S9V3W4Y69",
  "signature": {
    "alg": "Ed25519",
    "key_fpr": "SHA256:9p2x…",
    "sig_b64": "MEUCIQ…",
    "canonical_form": "json/c14n-rfc8785"
  }
}
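The canonical_form field implies that the bytes being signed are a canonical JSON serialization. A minimal sketch of that step, under two assumptions not confirmed by the source: the signature covers every field except signature itself, and sorted-key compact JSON is an acceptable stand-in for RFC 8785. Real JCS also fixes number formatting and sorts keys by UTF-16 code units, so use a proper JCS library for production receipts.

```python
import hashlib
import json


def canonical_bytes(envelope: dict) -> bytes:
    """Serialize the envelope without its signature, keys sorted, no whitespace."""
    unsigned = {k: v for k, v in envelope.items() if k != "signature"}
    text = json.dumps(unsigned, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return text.encode("utf-8")


def receipt_digest(envelope: dict) -> str:
    """Digest of the canonical form; this is what a signer would sign."""
    return "sha256:" + hashlib.sha256(canonical_bytes(envelope)).hexdigest()


a = {"id": "1", "event_type": "gate.evaluated", "signature": {"sig_b64": "x"}}
b = {"signature": {"sig_b64": "y"}, "event_type": "gate.evaluated", "id": "1"}
print(canonical_bytes(a))
print(receipt_digest(a) == receipt_digest(b))
```

Key order and the signature value do not affect the digest, so two parties that serialize the same receipt differently still agree on what was signed.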
Profile extensions (aigovops-beacon.v1)
{
  "profile": "aigovops-beacon.v1",
  "control_refs": [
    "nist-ai-rmf:MEASURE-2.7",
    "iso-42001:A.6.2.5",
    "eu-ai-act:Art.15"
  ],
  "subject": {
    "name": "ghcr.io/aigovops/foo:v1.2.3",
    "digest": {
      "sha256": "abc123…"
    }
  },
  "lineage": {
    "openlineage_run_id": "f3b1…",
    "parent_datasets": [
      "sha256:111…",
      "sha256:222…"
    ],
    "parent_models": [
      "sha256:333…"
    ]
  },
  "decision": {
    "result": "pass",
    "rules_evaluated": 47,
    "rules_failed": 0,
    "exception_id": null
  }
}
How both teams use it
| Field | Policy team reads | Eng team writes |
|---|---|---|
| event_type | Triggers control evaluation | SDK sets per call site |
| subject.digest | What was governed | Build emits content hash |
| lineage.parent_* | Traceability check | OpenLineage hook |
| control_refs | Crosswalk lookup | Pipeline template prefills |
| decision.result | Gate verdict | OPA writes from Rego |
| signature | Verifies authenticity | KMS / cosign signs |
Receipt chains = audit trails
Receipts link via parent_receipt_id. A finished bundle for one model deployment looks like:
design.approved
└─ build.completed
└─ eval.completed
└─ gate.evaluated (pass)
└─ bundle.signed
└─ admission.allowed
└─ inference.observed × N
└─ monitor.drift.detected
└─ gate.evaluated (re-eval)
An auditor follows the chain backwards from any production call to the design approval — and every step is cryptographically verifiable in seconds.
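Following the chain backwards is a simple loop over parent_receipt_id. A minimal sketch, assuming receipts are held in a dict keyed by id and that the root receipt has a null parent; the short ids and the function name are hypothetical, and signature checks are left out.

```python
def walk_to_root(receipts: dict, start_id: str) -> list:
    """Return the event types from start_id back to the root of the chain."""
    path = []
    seen = set()
    current = start_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle at receipt {current}")
        if current not in receipts:
            raise KeyError(f"missing receipt {current}")
        seen.add(current)
        receipt = receipts[current]
        path.append(receipt["event_type"])
        current = receipt.get("parent_receipt_id")
    return path


# Hypothetical bundle with shortened ids.
bundle = {
    "r1": {"event_type": "design.approved", "parent_receipt_id": None},
    "r2": {"event_type": "build.completed", "parent_receipt_id": "r1"},
    "r3": {"event_type": "gate.evaluated", "parent_receipt_id": "r2"},
    "r4": {"event_type": "inference.observed", "parent_receipt_id": "r3"},
}
chain = walk_to_root(bundle, "r4")
print(" <- ".join(chain))
```

A missing parent or a cycle raises instead of returning a partial path, so a broken chain cannot pass silently as a complete audit trail.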
Infrastructure requirements
What you actually need to stand up. Two tiers: a minimum viable governance plane that fits in a single GitHub org plus one cloud account, and a scaled tier for multi-tenant enterprise use.
Tier 1 · Minimum viable
- Source: GitHub org with branch protection, required reviews, signed commits enforced.
- Build: GitHub-hosted Actions runners; OIDC federation to cloud (no long-lived secrets).
- Signing: Sigstore keyless (Fulcio + Rekor public good) or cloud KMS (AWS KMS / GCP KMS / Azure Key Vault).
- Artifact registry: GHCR or any OCI registry that stores referrers.
- Evidence store: S3 / GCS bucket with object lock (WORM) + Beacon NDJSON receipts.
- Policy engine: OPA / Conftest in CI; Kyverno or Gatekeeper at admission.
- Lineage: OpenLineage emitting to Marquez (self-hosted, single VM).
- Transparency: Public Rekor or self-hosted; Merkle anchors to S3.
Tier 2 · Scaled enterprise
- Source: Multiple orgs + GitHub Enterprise audit log streamed to SIEM.
- Build: Self-hosted ARC runners in isolated VPC; reproducible builds.
- Signing: HSM-backed KMS with split-key custody; private Fulcio CA.
- Artifact registry: Harbor / Artifactory with vulnerability scanning.
- Evidence store: Dedicated WORM bucket per business unit + cross-region replication; 7-year retention.
- Policy engine: Centralized OPA cluster + bundle service; Styra or in-house control plane.
- Lineage: Managed OpenLineage backend (e.g., Astronomer, DataHub) with cross-team graph.
- Transparency: Private Rekor instance + dual anchoring to public chain for high-risk systems.
- SIEM / SOAR: Receipt stream feeds detection + runbooks for incident reporting (EU AI Act Art. 73).
Minimum compute footprint (Tier 1)
The governance plane itself is small: one t3.small for Marquez, one t3.micro for the Beacon receipt service, one S3 bucket, one KMS key. The expensive part is the eval suite — budget GPU time per model release, not per receipt.
How technical & policy teams collaborate
Same repo. Same pull-request workflow. Same evidence model. Different files.
Repository layout
aigovops-beacon/
├── crosswalks/ # policy team owns
│ ├── nist-ai-rmf.yaml
│ ├── iso-42001.yaml
│ └── eu-ai-act.yaml
├── policies/ # joint ownership
│ ├── gates/ # YAML (policy team)
│ └── rego/ # Rego (eng team)
├── usecases/ # product owners
│ └── *.yaml
├── modelcards/ # ML team
├── datasheets/ # data team
├── attestations/ # generated, signed
└── .github/workflows/
├── design-review.yml
├── build-sign-attest.yml
└── evidence-bundle.yml
RACI by activity
| Activity | Policy | Eng | Product | Audit |
|---|---|---|---|---|
| Crosswalk authoring | R/A | C | I | C |
| Rego implementation | C | R/A | I | I |
| Use-case intake | C | I | R/A | I |
| Model card | C | R/A | C | I |
| Design approval | A | R | R | I |
| Pipeline gate config | C | R/A | I | C |
| Exception grant | A | R | R | I |
| Incident response | C | R/A | C | I |
| Audit bundle export | R | R | I | A |
The four collaboration rituals
Crosswalk PRs
Policy team opens a PR mapping a new regulatory clause to existing receipt fields. CI fails if the clause references a field that does not exist in the OVERT envelope or profile — forcing schema evolution to happen with both teams in the room.
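The CI check described here could look roughly like the sketch below. The crosswalk is shown as a plain dict of clause to dotted field paths, which is an assumed shape rather than the actual crosswalk YAML format; the field decision.human_override is an invented example of a field the schema does not define.

```python
def field_paths(obj: dict, prefix: str = "") -> set:
    """Collect dotted paths for every key in a nested dict."""
    paths = set()
    for key, value in obj.items():
        path = prefix + key
        paths.add(path)
        if isinstance(value, dict):
            paths |= field_paths(value, path + ".")
    return paths


def unknown_fields(crosswalk: dict, schema_example: dict) -> dict:
    """Map each clause to the referenced fields that the schema does not define."""
    known = field_paths(schema_example)
    problems = {}
    for clause, fields in crosswalk.items():
        missing = [f for f in fields if f not in known]
        if missing:
            problems[clause] = missing
    return problems


schema_example = {
    "event_type": "gate.evaluated",
    "subject": {"name": "", "digest": {"sha256": ""}},
    "decision": {"result": "pass", "rules_failed": 0},
}
crosswalk = {
    "eu-ai-act:Art.15": ["decision.result", "subject.digest"],
    "eu-ai-act:Art.14": ["decision.human_override"],  # not in the schema yet
}
problems = unknown_fields(crosswalk, schema_example)
print(problems)
```

In CI, a non-empty result would fail the job, which is what pushes the schema change and the crosswalk change into the same review.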
Matched-pair commits
Every gate YAML change ships in the same PR as its Rego counterpart. Conftest verifies they agree on a synthetic test bundle before merge.
Exception receipts
When the gate fails and business needs the release, a named approver opens a signed exception.granted receipt with TTL and conditions. Auditors see every exception by query.
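Checking whether an exception is still in force can be sketched as below. The approver and ttl_seconds fields are assumptions about how the TTL and named approver might be recorded; the source does not define them. The timestamp format follows ts_utc in the envelope.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(ts: str) -> datetime:
    """Parse an envelope timestamp such as 2026-05-23T18:04:22.318Z."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def exception_active(receipt: dict, now: datetime) -> bool:
    """True only for an exception.granted receipt with an approver and unexpired TTL."""
    if receipt.get("event_type") != "exception.granted":
        return False
    if not receipt.get("approver"):  # hypothetical field: a named human
        return False
    granted = parse_ts(receipt["ts_utc"])
    expires = granted + timedelta(seconds=receipt["ttl_seconds"])  # hypothetical field
    return granted <= now < expires


exc = {
    "event_type": "exception.granted",
    "ts_utc": "2026-05-23T18:04:22.318Z",
    "approver": "jane.doe@example.com",
    "ttl_seconds": 3600,
}
inside = datetime(2026, 5, 23, 18, 30, tzinfo=timezone.utc)
after = datetime(2026, 5, 23, 19, 30, tzinfo=timezone.utc)
print(exception_active(exc, inside), exception_active(exc, after))
```

A gate that consults this check lets a failed release through only while the exception is live, and the expiry needs no manual cleanup.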
Quarterly bundle review
Policy and eng leads jointly export the previous quarter's bundle, run the auditor verification CLI, and review any drift between intended controls and observed receipts.
Auditor-ready documentation, by construction
The auditor never asks "do you have evidence?" — they ask "verify this hash." Three artifacts make that possible.
The bundle
bundle.tar.gz contains every receipt for a system over a window, the crosswalk used, the policies that ran, the data + model cards, and a manifest with Merkle roots. Generated by aigovops-beacon export; size is typically < 50 MB per quarter per system.
The verifier
A small Go program (beacon-verify, roughly 200 lines of source) that auditors run themselves. It rehashes every receipt, checks every signature against the published key transparency log, walks the Merkle tree, and prints a pass/fail line per control. No SaaS, no login, and no need to trust the operator's own tooling.
The crosswalk
For each control in NIST AI RMF / ISO 42001 / EU AI Act, the crosswalk YAML lists the receipts that satisfy it and the queries to run. The auditor reads one document and gets pointers into the bundle for everything they need.
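A rough illustration of how a crosswalk can drive a per-control report. It assumes the crosswalk maps each control to the receipt event types that satisfy it, and treats a control as satisfied when every listed event type appears in the bundle; the real crosswalk format and query language are not specified in the source, and signature verification is out of scope here.

```python
def control_report(crosswalk: dict, receipts: list) -> list:
    """Return one 'PASS'/'FAIL' line per control, based on which events are present."""
    seen = {r["event_type"] for r in receipts}
    lines = []
    for control in sorted(crosswalk):
        missing = [e for e in crosswalk[control] if e not in seen]
        if missing:
            lines.append(f"FAIL {control} missing: {', '.join(missing)}")
        else:
            lines.append(f"PASS {control}")
    return lines


# Hypothetical crosswalk and bundle contents.
xwalk = {
    "nist-ai-rmf:MEASURE-2.7": ["gate.evaluated"],
    "eu-ai-act:Art.73": ["incident.killswitch.fired"],
}
bundle_receipts = [
    {"event_type": "design.approved"},
    {"event_type": "gate.evaluated"},
]
for line in control_report(xwalk, bundle_receipts):
    print(line)
```

Presence of an event type is the weakest useful check; a fuller verifier would also inspect decision.result and the signatures on each matching receipt.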
What the auditor sees on day one
- Download bundle.tar.gz and beacon-verify.
- Run beacon-verify bundle.tar.gz. Exit code 0 means all signatures and Merkle proofs check.
- Open crosswalk-report.html. Each control row links to the exact receipts that satisfy it.
- Sample-test by control: pick three receipts, replay the policy decision deterministically, confirm verdict.
- Walk lineage from any production inference back to the design approval. Same answer every time.
Total time to first opinion: hours, not weeks.
Starter pipeline — drop-in GitHub Actions
A minimal but complete workflow you can paste into your repo to get to "signed evidence in CI" in under an hour. Full templates are in the artifacts folder.
# .github/workflows/govern.yml
# Assumes syft, conftest, cosign, and aigovops-beacon are already on the
# runner's PATH (install steps omitted) and that commit-signing keys are trusted.
name: AI governance pipeline
on: [push, pull_request]
permissions:
  contents: read
  id-token: write        # OIDC for keyless signing
  attestations: write
  packages: write
env:
  # Assumed image name; GHCR requires a lowercase repository path.
  IMG: ghcr.io/${{ github.repository }}:${{ github.sha }}
jobs:
  govern:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      - name: Verify signed commits
        run: git verify-commit HEAD
      - name: Build + SBOM
        run: |
          docker build -t "$IMG" .
          syft "$IMG" -o cyclonedx-json > sbom.cdx.json
      - name: Run eval suite
        run: |
          python -m evals.run --suite safety,bias,perf \
            --out evals.json
      - name: Policy gate (Conftest + Beacon checklist)
        run: |
          conftest test policies/gates/ \
            --policy policies/rego/ \
            --data evals.json sbom.cdx.json
          aigovops-beacon checklist run \
            --pack crosswalks/nist-ai-rmf.yaml \
            --evidence . \
            --emit-receipt gate.evaluated
      - name: Sign image + attest
        run: |
          # cosign signs what is in the registry, so log in and push first.
          echo "${{ secrets.GITHUB_TOKEN }}" | \
            docker login ghcr.io -u "${{ github.actor }}" --password-stdin
          docker push "$IMG"
          # Keyless signing is the default in cosign 2.x; --yes skips the prompt.
          cosign sign --yes "$IMG"
          cosign attest --yes --predicate sbom.cdx.json \
            --type cyclonedx "$IMG"
          cosign attest --yes --predicate evals.json \
            --type https://aigovops.org/eval/v1 "$IMG"
      - name: Emit + anchor evidence bundle
        run: |
          aigovops-beacon export \
            --window pr-$GITHUB_SHA \
            --out bundle.tar.gz
          aigovops-beacon anchor bundle.tar.gz \
            --transparency rekor
      - uses: actions/upload-artifact@v4
        with:
          name: evidence-bundle
          path: bundle.tar.gz
Take it with you
- govern.yml: Drop-in GitHub Actions workflow.
- receipt.schema.json: JSON Schema for the shared evidence model.
- example.rego: Sample Rego gate referencing receipt fields.
- gate.example.yaml: Matched-pair YAML for the Rego policy.
- crosswalk.nist-ai-rmf.yaml: NIST AI RMF control → receipt mapping.
- blueprint.md: This blueprint as a single Markdown file.