Reference Architecture Blueprint · v1.0 · advisory under the Foundation

Map every governance control to a CI/CD stage. Sign the receipt. Hand the auditor a bundle.

A reference architecture for embedding AI governance — automated evidence collection, cryptographic audit logging, data & model lineage, and Policy-as-Code guardrails — into a modern GitHub-native CI/CD pipeline. One shared evidence model that policy and engineering teams both write against. Auditor-ready by design.

Ed25519 + RFC 8785 JCS
OVERT 1.0 envelope
SLSA v1.0 provenance
in-toto attestations
OpenLineage 1.x
OPA / Conftest
NIST AI RMF · ISO 42001 · EU AI Act

Six principles this blueprint is built on

1 · Evidence is the product

Every governance decision emits a signed receipt. If there is no receipt, it did not happen. The build artifact, the model, the policy decision, and the deployment are all peers — each gets a signed attestation.

2 · One shared schema

Policy team and engineering team write against the same OVERT envelope. Controls reference the receipt fields they verify. No second spreadsheet, no quarterly reconciliation.

3 · Policy lives in the repo

Rego, gate YAML, and crosswalks are committed code. Changes go through pull request, dual-control approval, and the same CI as application code.

4 · Pipelines are the perimeter

The only path to production is the signed pipeline. Out-of-band deploys are detected by drift checks and rejected by admission controllers.

5 · Lineage is automatic

Datasets, prompts, models, and outputs are linked by hash from collection through inference. Auditors can replay any decision back to its inputs.

6 · Humans hold legitimacy

Agents do the bureaucracy. Exceptions, key rotations, and trust-tier promotions require a named human approver — and that approval is itself a receipt.

The pipeline, end to end

Three lifecycle stages — Design, Deployment, Operations — sit on top of four shared planes: the Evidence Plane (signed receipts), the Policy Plane (OPA + gate YAML), the Lineage Plane (OpenLineage graph), and the Trust Plane (KMS, OIDC, transparency log).

flowchart TB
    classDef design fill:#e6f3f4,stroke:#01696f,color:#01314c,stroke-width:1.5px
    classDef deploy fill:#fef6e0,stroke:#b87b00,color:#4a3000,stroke-width:1.5px
    classDef ops fill:#e8f5ea,stroke:#2ecc71,color:#0d3d1a,stroke-width:1.5px
    classDef plane fill:#01314c,stroke:#01696f,color:#fff,stroke-width:1.5px

    subgraph DESIGN["DESIGN STAGE"]
      direction TB
      D1["Use-case intake<br/>risk classification"]
      D2["Data sheet +<br/>model card draft"]
      D3["Threat model<br/>red-team plan"]
      D4["Design review<br/>signed approval"]
      D1 --> D2 --> D3 --> D4
    end

    subgraph DEPLOY["DEPLOYMENT STAGE"]
      direction TB
      P1["Source commit<br/>signed by dev"]
      P2["Build + SBOM<br/>SLSA provenance"]
      P3["Eval suite<br/>bias · safety · perf"]
      P4["Policy gate<br/>OPA / Conftest"]
      P5["Signed release<br/>cosign + attest"]
      P6["Admission control<br/>verify in cluster"]
      P1 --> P2 --> P3 --> P4 --> P5 --> P6
    end

    subgraph OPS["OPERATIONS STAGE"]
      direction TB
      O1["Inference observed<br/>OVERT receipt"]
      O2["Drift + safety<br/>monitors"]
      O3["Incident detect<br/>+ kill switch"]
      O4["Continuous<br/>re-evaluation"]
      O1 --> O2 --> O3 --> O4 --> O1
    end

    DESIGN ==> DEPLOY ==> OPS

    subgraph PLANES["SHARED PLANES"]
      direction LR
      EV[("Evidence Plane<br/>signed receipts<br/>Merkle anchored")]
      PL[("Policy Plane<br/>Rego · gate YAML<br/>crosswalks")]
      LN[("Lineage Plane<br/>OpenLineage<br/>graph")]
      TR[("Trust Plane<br/>KMS · OIDC<br/>transparency log")]
    end

    DESIGN -.-> EV & PL & LN & TR
    DEPLOY -.-> EV & PL & LN & TR
    OPS -.-> EV & PL & LN & TR

    class D1,D2,D3,D4 design
    class P1,P2,P3,P4,P5,P6 deploy
    class O1,O2,O3,O4 ops
    class EV,PL,LN,TR plane

Design Stage

Before any code is written: classify the risk, document the intent, sign the design.

What happens

Use-case intake. A YAML use-case manifest is created in the governance repo (one PR per use case). Risk tier is set by a classifier policy (EU AI Act Annex III / NIST AI RMF Map function).
Data sheet & model card. Templates auto-populate from upstream inventory; the author fills gaps. Hashes of training data and base model are pinned.
Threat model & red-team plan. Generated from the risk tier — high-risk gets STRIDE + AI-specific (prompt injection, poisoning, confabulation) plus a mandatory red-team budget.
Design review. Named approvers (product, security, privacy, ethics for high risk) sign a design.approved receipt. This receipt becomes the parent of every downstream attestation.
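
The hash pinning and parent linkage described above can be sketched in a few lines of Python. This is a minimal illustration, not the Beacon SDK: `make_receipt`, the `r-001` style ids, and the manifest bytes are hypothetical, and signing is left out entirely. Only the field names (`id`, `ts_utc`, `event_type`, `parent_receipt_id`, `subject.digest`) come from the envelope and profile in this blueprint.

```python
import hashlib
from datetime import datetime, timezone
from typing import Optional


def make_receipt(receipt_id: str, event_type: str, name: str,
                 content: bytes, parent_receipt_id: Optional[str]) -> dict:
    """Build an unsigned receipt that pins one artifact by its SHA-256 digest."""
    ts = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    return {
        "id": receipt_id,
        "ts_utc": ts.replace("+00:00", "Z"),
        "event_type": event_type,
        "parent_receipt_id": parent_receipt_id,
        "subject": {
            "name": name,
            "digest": {"sha256": hashlib.sha256(content).hexdigest()},
        },
    }


# Hypothetical manifest bytes; in practice this is the YAML file from the PR.
manifest = b"usecase: support-triage\nrisk_tier: high\n"

# The design approval is the root of the chain, so it has no parent.
approved = make_receipt("r-001", "design.approved",
                        "usecases/support-triage.yaml", manifest, None)

# A downstream receipt points back at the approval through parent_receipt_id.
build = make_receipt("r-002", "build.completed",
                     "ghcr.io/aigovops/foo:v1.2.3", b"image bytes",
                     approved["id"])
```

Because the digest is computed from the artifact bytes, any later edit to the manifest produces a different hash and no longer matches the approved receipt.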

Evidence collected

Artifact	Format	Receipt event
Use-case manifest	YAML in repo, hashed	`design.usecase.registered`
Risk classification	OPA decision log	`design.risk.classified`
Data sheet	Markdown + SHA-256	`design.datasheet.published`
Model card	Markdown + SHA-256	`design.modelcard.published`
Threat model	Markdown + STRIDE table	`design.threatmodel.completed`
Design approval	Signed PR review + Ed25519	`design.approved`

Controls satisfied

NIST AI RMF: MAP-1.1, MAP-2.1, MAP-3.4, MAP-4.1, GOVERN-1.2
ISO/IEC 42001: 6.1.2 (AI risk assessment), 6.1.4 (AI system impact assessment), A.6.1.2 (objectives for responsible development)
EU AI Act: Art. 9 (risk management system), Art. 11 + Annex IV (technical documentation), Art. 27 (FRIA for high-risk)

Deployment Stage

The CI/CD pipeline is the only path to production. Every step is a signed attestation; every artifact carries its provenance.

flowchart LR
    classDef step fill:#fef6e0,stroke:#b87b00,color:#4a3000,stroke-width:1.5px
    classDef evidence fill:#01314c,stroke:#01696f,color:#fff
    classDef gate fill:#fde2e0,stroke:#c0392b,color:#5a1410,stroke-width:1.5px

    DEV["Developer<br/>signed commit<br/>gitsign"] --> BUILD["GitHub Actions<br/>OIDC to cloud<br/>hermetic build"]
    BUILD --> SBOM["SBOM<br/>CycloneDX"]
    BUILD --> PROV["SLSA provenance<br/>in-toto v1"]
    BUILD --> EVAL["Model eval suite<br/>bias · safety · perf · red-team"]
    EVAL --> ATT["Eval attestation<br/>signed predicate"]
    SBOM --> GATE
    PROV --> GATE
    ATT --> GATE
    GATE{"Policy gate<br/>OPA / Conftest<br/>+ Beacon checklist"}
    GATE -- pass --> SIGN["cosign sign<br/>image + attestations"]
    GATE -- fail --> BLOCK["blocked + receipt<br/>gate.failed"]
    SIGN --> REG[("OCI registry<br/>+ Rekor log")]
    REG --> ADM["Admission controller<br/>verify signatures<br/>verify attestations"]
    ADM -- pass --> PROD[Production]
    ADM -- fail --> QUAR["Quarantine + receipt"]

    class DEV,BUILD,SBOM,PROV,EVAL,ATT,SIGN,REG,ADM,PROD step
    class GATE gate
    class BLOCK,QUAR evidence

Pipeline stages & evidence

Stage	Tooling	Attestation
Commit	gitsign + Sigstore	Signed commit, signed tag
Build	GH Actions, hermetic runner, OIDC	SLSA v1.0 build provenance
SBOM	Syft / cdxgen	CycloneDX 1.6 + sig
Test & eval	pytest, DeepEval, Garak, OWASP LLM	in-toto eval predicate
Policy gate	OPA / Conftest + Beacon checklist runner	`gate.evaluated` receipt
Sign	cosign keyless (Fulcio + Rekor)	Sigstore bundle
Admit	Kyverno / Gatekeeper / Connaisseur	Admission decision receipt

Policy-as-Code anatomy

Every gate is two files committed in the governance repo:

gate.<org>.v1.yaml: the declarative half of the matched pair. It names which receipts to inspect, which evaluations to run, and which thresholds count as a pass.
<org>_audit.rego: the executable half. Same logic, machine-checked. Conftest runs both against the bundle.

When a control fires, the gate emits a gate.evaluated receipt referencing the inputs by hash. Auditors replay any decision deterministically.
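
To make the replay claim concrete, here is a rough Python sketch of a threshold gate. It is not the Rego or the Beacon checklist runner: the `GATE` thresholds, the metric names, and the `inputs` and `failed_rules` fields are invented for illustration. The point is that the decision depends only on the hashed inputs, so running it again on the same inputs yields the same receipt body.

```python
import hashlib
import json

# Hypothetical thresholds, standing in for the contents of a gate YAML file.
GATE = {
    "safety_pass_rate": {"op": "min", "value": 0.98},
    "bias_gap": {"op": "max", "value": 0.05},
}


def canonical_hash(obj) -> str:
    """Hash a JSON-serializable object in a stable key order."""
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def evaluate_gate(gate: dict, evals: dict) -> dict:
    """Compare eval metrics to thresholds and return a gate.evaluated body."""
    failed = []
    for metric, rule in sorted(gate.items()):
        observed = evals.get(metric)
        if observed is None:
            failed.append(metric)  # missing evidence counts as a failure
        elif rule["op"] == "min" and observed < rule["value"]:
            failed.append(metric)
        elif rule["op"] == "max" and observed > rule["value"]:
            failed.append(metric)
    return {
        "event_type": "gate.evaluated",
        # Inputs are referenced by hash, not embedded (hypothetical field).
        "inputs": {"gate": canonical_hash(gate), "evals": canonical_hash(evals)},
        "decision": {
            "result": "fail" if failed else "pass",
            "rules_evaluated": len(gate),
            "rules_failed": len(failed),
            "failed_rules": failed,
        },
    }
```

No timestamp or random id enters the decision, which is what lets an auditor recompute it from the bundle and compare.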

Controls satisfied

NIST AI RMF: MEASURE-2.3, MEASURE-2.7, MEASURE-2.11, MANAGE-1.3, MANAGE-2.2
ISO/IEC 42001: 8.4 (AI system impact assessment), A.6.2 (AI system life cycle), A.6.2.4 (verification & validation), A.8.2 (system documentation)
EU AI Act: Art. 15 (accuracy, robustness, cybersecurity), Art. 17 (QMS), Art. 18 (documentation keeping), Art. 43 (conformity assessment)
SLSA v1.0: Build Levels 2–3 (hosted, hardened, isolated build)

Operations Stage

In production, the receipts never stop. Every inference, every drift event, every kill-switch firing produces evidence that closes the loop back to design.

What runs in production

OVERT-shaped inference receipts. The runtime SDK (or sidecar proxy) emits one inference.observed receipt per call: user, vendor, model, version, prompt-hash, result-hash, latency, signature.
Hourly Merkle anchor. All receipts in the hour are hashed into a Merkle tree; the root is published to a transparency log (Rekor or internal) and optionally to S3 / IPFS for off-org witnesses.
Drift & safety monitors. Continuous evaluation against the same suites used in deployment. Threshold breach → monitor.threshold.breached receipt → automatic ticket + policy re-eval.
Incident & kill switch. Severity-tagged events trigger pre-approved runbooks. Disabling a tool, blocking a source, or pausing an agent is itself a signed receipt.
Continuous re-evaluation. Quarterly (or trigger-based) re-run of the deployment policy gate against fresh production data. Drift in real behavior reopens the design loop.
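
The hourly anchor can be illustrated with a small Merkle-root computation. The blueprint does not say which tree construction Beacon uses, so this sketch assumes the RFC 6962 shape (leaf prefix 0x00, node prefix 0x01, split at the largest power of two), which is the family of trees used by transparency logs. `receipt_bytes` and `hourly_anchor` are hypothetical helper names.

```python
import hashlib
import json
from typing import List


def receipt_bytes(receipt: dict) -> bytes:
    """Serialize a receipt with sorted keys so its leaf hash is stable."""
    return json.dumps(receipt, sort_keys=True, separators=(",", ":")).encode("utf-8")


def merkle_root(entries: List[bytes]) -> bytes:
    """Merkle tree hash in the RFC 6962 style (assumed, not confirmed by the source)."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return hashlib.sha256(b"\x00" + entries[0]).digest()
    k = 1
    while k * 2 < n:  # largest power of two strictly less than n
        k *= 2
    left = merkle_root(entries[:k])
    right = merkle_root(entries[k:])
    return hashlib.sha256(b"\x01" + left + right).digest()


def hourly_anchor(receipts: List[dict]) -> str:
    """Return the hex root that would be published for one hour of receipts."""
    return merkle_root([receipt_bytes(r) for r in receipts]).hex()
```

Changing, dropping, or reordering any receipt in the hour changes the root, so a published root commits to the full set.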

Evidence collected

Event	Source	Receipt event
Model call	SDK / proxy	`inference.observed`
Tool call	Agent runtime	`agent.tool.called`
Retrieval	RAG layer	`agent.retrieval.hit`
Drift detected	Monitor	`monitor.drift.detected`
Safety violation	LLM firewall	`guardrail.violated`
Kill switch	Runbook	`incident.killswitch.fired`
Hourly anchor	Beacon anchor svc	`bundle.anchored`

Controls satisfied

NIST AI RMF: MEASURE-2.8, MEASURE-3.1, MEASURE-4.1, MANAGE-2.3, MANAGE-4.1, MANAGE-4.3
ISO/IEC 42001: 9.1 (monitoring & measurement), 10.1 (continual improvement), A.6.2.6 (operation & monitoring)
EU AI Act: Art. 14 (human oversight), Art. 19 (automatically generated logs), Art. 26 (deployer obligations), Art. 72 (post-market monitoring), Art. 73 (incident reporting)

The shared evidence model

One envelope binds policy and engineering. Beacon receipts are OVERT 1.0 envelopes with the aigovops-beacon.v1 profile. Policy team writes controls that reference these fields; engineering writes SDKs that emit them. Same schema, two teams, zero translation overhead.

Envelope (OVERT 1.0, normative)

{
  "id": "01HXYZ8K3F2N5Q1R7S9V3W4Y6B",
  "ts_utc": "2026-05-23T18:04:22.318Z",
  "user": {
    "sub": "build-runner@github",
    "email": "actions@github.com",
    "oidc_issuer": "https://token.actions.githubusercontent.com"
  },
  "vendor": "anthropic",
  "model": "claude-sonnet-4-5",
  "version": "2026-04-01",
  "prompt_hash": "sha256:9f2c…",
  "result_hash": "sha256:7b1a…",
  "event_type": "gate.evaluated",
  "environment": "cloud_saas",
  "evidence_id": "01HXYZ8K3F2N5Q1R7S9V3W4Y6A",
  "parent_receipt_id": "01HXYZ8K3F2N5Q1R7S9V3W4Y69",
  "signature": {
    "alg": "Ed25519",
    "key_fpr": "SHA256:9p2x…",
    "sig_b64": "MEUCIQ…",
    "canonical_form": "json/c14n-rfc8785"
  }
}
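
The `canonical_form` field implies that the signature is computed over a canonical serialization of the envelope. The sketch below approximates that step with the Python standard library. Two assumptions are worth flagging: that the signed bytes are the envelope with its `signature` member removed (the source does not state this), and that sorted keys with compact separators are close enough to RFC 8785 for this example. Full JCS also fixes number formatting and sorts keys by UTF-16 code units, which `json.dumps` does not guarantee.

```python
import hashlib
import json


def canonical_bytes(envelope: dict) -> bytes:
    """Serialize the envelope without its signature, keys sorted, no whitespace."""
    unsigned = {k: v for k, v in envelope.items() if k != "signature"}
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def envelope_digest(envelope: dict) -> str:
    """Digest of the canonical bytes, in the sha256:<hex> style of the envelope."""
    return "sha256:" + hashlib.sha256(canonical_bytes(envelope)).hexdigest()
```

Two producers that emit the same fields in a different key order get the same digest, which is what allows a verifier to rehash a receipt independently of the SDK that wrote it.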

Profile extensions (aigovops-beacon.v1)

{
  "profile": "aigovops-beacon.v1",
  "control_refs": [
    "nist-ai-rmf:MEASURE-2.7",
    "iso-42001:A.6.2.4",
    "eu-ai-act:Art.15"
  ],
  "subject": {
    "name": "ghcr.io/aigovops/foo:v1.2.3",
    "digest": {
      "sha256": "abc123…"
    }
  },
  "lineage": {
    "openlineage_run_id": "f3b1…",
    "parent_datasets": [
      "sha256:111…",
      "sha256:222…"
    ],
    "parent_models": [
      "sha256:333…"
    ]
  },
  "decision": {
    "result": "pass",
    "rules_evaluated": 47,
    "rules_failed": 0,
    "exception_id": null
  }
}

How both teams use it

Field	Policy team reads	Eng team writes
`event_type`	Triggers control evaluation	SDK sets per call site
`subject.digest`	What was governed	Build emits content hash
`lineage.parent_*`	Traceability check	OpenLineage hook
`control_refs`	Crosswalk lookup	Pipeline template prefills
`decision.result`	Gate verdict	OPA writes from Rego
`signature`	Verifies authenticity	KMS / cosign signs

Receipt chains = audit trails

Receipts link via parent_receipt_id. A finished bundle for one model deployment looks like:

design.approved
  └─ build.completed
       └─ eval.completed
            └─ gate.evaluated (pass)
                 └─ bundle.signed
                      └─ admission.allowed
                           └─ inference.observed × N
                                └─ monitor.drift.detected
                                     └─ gate.evaluated (re-eval)

An auditor follows the chain backwards from any production call to the design approval — and every step is cryptographically verifiable in seconds.
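
Walking the chain backwards is a simple loop over `parent_receipt_id`. The sketch below assumes receipts have already been loaded into a dict keyed by `id`; the function name `chain_to_root` is hypothetical, and signature checks are out of scope here.

```python
from typing import Dict, List


def chain_to_root(receipts: Dict[str, dict], start_id: str) -> List[str]:
    """Return the event types from start_id back to the root receipt."""
    path: List[str] = []
    seen = set()
    current = start_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at receipt {current}")
        if current not in receipts:
            raise KeyError(f"receipt {current} is missing from the bundle")
        seen.add(current)
        receipt = receipts[current]
        path.append(receipt["event_type"])
        current = receipt.get("parent_receipt_id")
    return path
```

A missing parent or a cycle raises instead of returning a partial path, since a broken chain is itself an audit finding.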

Infrastructure requirements

What you actually need to stand up. Two tiers: a minimum viable governance plane that fits in a single GitHub org plus one cloud account, and a scaled tier for multi-tenant enterprise use.

Tier 1 · Minimum viable

Source: GitHub org with branch protection, required reviews, signed commits enforced.
Build: GitHub-hosted Actions runners; OIDC federation to cloud (no long-lived secrets).
Signing: Sigstore keyless (Fulcio + Rekor public good) or cloud KMS (AWS KMS / GCP KMS / Azure Key Vault).
Artifact registry: GHCR or any OCI registry that stores referrers.
Evidence store: S3 / GCS bucket with object lock (WORM) + Beacon NDJSON receipts.
Policy engine: OPA / Conftest in CI; Kyverno or Gatekeeper at admission.
Lineage: OpenLineage emitting to Marquez (self-hosted, single VM).
Transparency: Public Rekor or self-hosted; Merkle anchors to S3.

Tier 2 · Scaled enterprise

Source: Multiple orgs + GitHub Enterprise audit log streamed to SIEM.
Build: Self-hosted ARC runners in isolated VPC; reproducible builds.
Signing: HSM-backed KMS with split-key custody; private Fulcio CA.
Artifact registry: Harbor / Artifactory with vulnerability scanning.
Evidence store: Dedicated WORM bucket per business unit + cross-region replication; 7-year retention.
Policy engine: Centralized OPA cluster + bundle service; Styra or in-house control plane.
Lineage: Managed OpenLineage backend (e.g., Astronomer, DataHub) with cross-team graph.
Transparency: Private Rekor instance + dual anchoring to public chain for high-risk systems.
SIEM / SOAR: Receipt stream feeds detection + runbooks for incident reporting (EU AI Act Art. 73).

Minimum compute footprint (Tier 1)

The governance plane itself is small: one t3.small for Marquez, one t3.micro for the Beacon receipt service, one S3 bucket, one KMS key. The expensive part is the eval suite — budget GPU time per model release, not per receipt.

How technical & policy teams collaborate

Same repo. Same pull-request workflow. Same evidence model. Different files.

Repository layout

aigovops-beacon/
├── crosswalks/             # policy team owns
│   ├── nist-ai-rmf.yaml
│   ├── iso-42001.yaml
│   └── eu-ai-act.yaml
├── policies/               # joint ownership
│   ├── gates/              # YAML (policy team)
│   └── rego/               # Rego (eng team)
├── usecases/               # product owners
│   └── *.yaml
├── modelcards/             # ML team
├── datasheets/             # data team
├── attestations/           # generated, signed
└── .github/workflows/
    ├── design-review.yml
    ├── build-sign-attest.yml
    └── evidence-bundle.yml

RACI by activity

Activity	Policy	Eng	Product	Audit
Crosswalk authoring	R/A	C	I	C
Rego implementation	C	R/A	I	I
Use-case intake	C	I	R/A	I
Model card	C	R/A	C	I
Design approval	A	R	R	I
Pipeline gate config	C	R/A	I	C
Exception grant	A	R	R	I
Incident response	C	R/A	C	I
Audit bundle export	R	R	I	A

The four collaboration rituals

Crosswalk PRs

Policy team opens a PR mapping a new regulatory clause to existing receipt fields. CI fails if the clause references a field that does not exist in the OVERT envelope or profile — forcing schema evolution to happen with both teams in the room.
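
A CI check of this kind could look roughly like the following. The crosswalk shape used here (a mapping from control id to a list of dotted field paths) is an assumption, since the actual crosswalk YAML format is not shown, and `field_paths` and `unknown_fields` are hypothetical names.

```python
from typing import Dict, List, Set


def field_paths(obj: dict, prefix: str = "") -> Set[str]:
    """Collect dotted paths for every key in a sample receipt."""
    paths: Set[str] = set()
    for key, value in obj.items():
        path = f"{prefix}{key}"
        paths.add(path)
        if isinstance(value, dict):
            paths |= field_paths(value, path + ".")
    return paths


def unknown_fields(crosswalk: Dict[str, List[str]],
                   schema_sample: dict) -> Dict[str, List[str]]:
    """Map each control to the fields it references that the schema lacks."""
    known = field_paths(schema_sample)
    problems: Dict[str, List[str]] = {}
    for control, fields in crosswalk.items():
        missing = sorted(f for f in fields if f not in known)
        if missing:
            problems[control] = missing
    return problems
```

In CI, a non-empty result would fail the job and list the offending controls, which turns a schema gap into a reviewable PR comment instead of a later audit surprise.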

Matched-pair commits

Every gate YAML change ships in the same PR as its Rego counterpart. Conftest verifies they agree on a synthetic test bundle before merge.

Exception receipts

When the gate fails and the business still needs the release, a named approver issues a signed exception.granted receipt with a TTL and conditions. Auditors can list every exception with a single query.
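
One way a gate might honor such an exception is to check that it names an approver and is still inside its TTL. The `approver` and `ttl_seconds` fields below are hypothetical, as the blueprint does not define the exception receipt's fields. Only `event_type` and `ts_utc` come from the envelope.

```python
from datetime import datetime, timedelta, timezone


def exception_is_active(receipt: dict, now: datetime) -> bool:
    """True if the receipt is a named, unexpired exception.granted."""
    if receipt.get("event_type") != "exception.granted":
        return False
    if not receipt.get("approver"):  # an exception needs a named human
        return False
    granted = datetime.fromisoformat(receipt["ts_utc"].replace("Z", "+00:00"))
    expires = granted + timedelta(seconds=receipt["ttl_seconds"])
    return granted <= now < expires


# Hypothetical exception valid for 24 hours.
example = {
    "event_type": "exception.granted",
    "ts_utc": "2026-05-23T18:04:22.318Z",
    "approver": "jane.doe@example.com",
    "ttl_seconds": 86400,
}
```

Passing `now` in explicitly keeps the check deterministic, so a replay during an audit can use the timestamp of the original gate run.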

Quarterly bundle review

Policy and eng leads jointly export the previous quarter's bundle, run the auditor verification CLI, and review any drift between intended controls and observed receipts.

Auditor-ready documentation, by construction

The auditor never asks "do you have evidence?" — they ask "verify this hash." Three artifacts make that possible.

The bundle

bundle.tar.gz contains every receipt for a system over a window, the crosswalk used, the policies that ran, the data + model cards, and a manifest with Merkle roots. Generated by aigovops-beacon export; size is typically < 50 MB per quarter per system.

The verifier

A small Go program of about 200 lines (beacon-verify) that auditors run themselves. It rehashes every receipt, checks every signature against the published key transparency log, walks the Merkle tree, and prints a pass/fail line per control. No SaaS, no login, no trust required.
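
The last step, one pass/fail line per control, can be sketched as follows. This is written in Python for consistency with the other examples here, not in Go, and it covers only the reporting step: signature and Merkle checks are assumed to have already succeeded. The rule that a control with no receipts fails is an assumption, and `control_verdicts` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List


def control_verdicts(controls: List[str], receipts: List[dict]) -> Dict[str, str]:
    """Summarize receipts into one verdict per control listed in the crosswalk."""
    results_by_control = defaultdict(list)
    for receipt in receipts:
        result = receipt.get("decision", {}).get("result")
        for ref in receipt.get("control_refs", []):
            results_by_control[ref].append(result)

    verdicts: Dict[str, str] = {}
    for control in controls:
        results = results_by_control.get(control, [])
        if not results:
            verdicts[control] = "fail (no evidence)"
        elif all(r == "pass" for r in results):
            verdicts[control] = "pass"
        else:
            verdicts[control] = "fail"
    return verdicts
```

Iterating over the crosswalk's controls, not over the receipts, is what surfaces a control that has no evidence at all.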

The crosswalk

For each control in NIST AI RMF / ISO 42001 / EU AI Act, the crosswalk YAML lists the receipts that satisfy it and the queries to run. The auditor reads one document and gets pointers into the bundle for everything they need.

What the auditor sees on day one

Download bundle.tar.gz and beacon-verify.
Run beacon-verify bundle.tar.gz — exit code 0 means all signatures and Merkle proofs check.
Open crosswalk-report.html — each control row links to the exact receipts that satisfy it.
Sample-test by control: pick three receipts, replay the policy decision deterministically, confirm verdict.
Walk lineage from any production inference back to the design approval. Same answer every time.

Total time to first opinion: hours, not weeks.

Starter pipeline — drop-in GitHub Actions

A minimal but complete workflow you can paste into your repo to get to "signed evidence in CI" in under an hour. Full templates are in the artifacts folder.

# .github/workflows/govern.yml
name: AI governance pipeline
on: [push, pull_request]

permissions:
  contents: read
  id-token: write       # OIDC for keyless signing
  attestations: write
  packages: write

jobs:
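  govern:
    runs-on: ubuntu-latest
    env:
      # Assumed image reference; adjust to your registry. GHCR paths must be lowercase.
      IMG: ghcr.io/${{ github.repository }}:${{ github.sha }}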
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }

      - name: Verify signed commits
        run: git verify-commit HEAD

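      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build, push + SBOM
        run: |
          # Assumes syft is already on the runner's PATH.
          docker build -t "$IMG" .
          docker push "$IMG"   # cosign signs the image in the registry, not the local copy
          syft "$IMG" -o cyclonedx-json > sbom.cdx.json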

      - name: Run eval suite
        run: |
          python -m evals.run --suite safety,bias,perf \
            --out evals.json

      - name: Policy gate (Conftest + Beacon checklist)
        run: |
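          # Pass each data file with its own --data flag; a bare second path
          # would be treated as another input file to test.
          conftest test policies/gates/ \
            --policy policies/rego/ \
            --data evals.json --data sbom.cdx.json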
          aigovops-beacon checklist run \
            --pack crosswalks/nist-ai-rmf.yaml \
            --evidence . \
            --emit-receipt gate.evaluated

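      - name: Install cosign
        uses: sigstore/cosign-installer@v3

      - name: Sign image + attest
        run: |
          # cosign 2.x signs keylessly by default; --yes skips the confirmation prompt in CI.
          cosign sign --yes "$IMG"
          cosign attest --yes --predicate sbom.cdx.json \
            --type cyclonedx "$IMG"
          cosign attest --yes --predicate evals.json \
            --type https://aigovops.org/eval/v1 "$IMG"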

      - name: Emit + anchor evidence bundle
        run: |
          aigovops-beacon export \
            --window pr-$GITHUB_SHA \
            --out bundle.tar.gz
          aigovops-beacon anchor bundle.tar.gz \
            --transparency rekor

      - uses: actions/upload-artifact@v4
        with:
          name: evidence-bundle
          path: bundle.tar.gz

Take it with you

govern.yml

receipt.schema.json

example.rego

gate.example.yaml

crosswalk.nist-ai-rmf.yaml

blueprint.md