Reference Architecture Blueprint · v1.0 · advisory under the Foundation

Map every governance control to a CI/CD stage. Sign the receipt. Hand the auditor a bundle.

A reference architecture for embedding AI governance — automated evidence collection, cryptographic audit logging, data & model lineage, and Policy-as-Code guardrails — into a modern GitHub-native CI/CD pipeline. One shared evidence model that policy and engineering teams both write against. Auditor-ready by design.

  • Ed25519 + RFC 8785 JCS
  • OVERT 1.0 envelope
  • SLSA v1.0 provenance
  • in-toto attestations
  • OpenLineage 1.x
  • OPA / Conftest
  • NIST AI RMF · ISO 42001 · EU AI Act

Six principles this blueprint is built on

1 · Evidence is the product

Every governance decision emits a signed receipt. If there is no receipt, it did not happen. The build artifact, the model, the policy decision, and the deployment are all peers — each gets a signed attestation.

2 · One shared schema

Policy team and engineering team write against the same OVERT envelope. Controls reference the receipt fields they verify. No second spreadsheet, no quarterly reconciliation.

3 · Policy lives in the repo

Rego, gate YAML, and crosswalks are committed code. Changes go through pull request, dual-control approval, and the same CI as application code.

4 · Pipelines are the perimeter

The only path to production is the signed pipeline. Out-of-band deploys are detected by drift checks and rejected by admission controllers.

5 · Lineage is automatic

Datasets, prompts, models, and outputs are linked by hash from collection through inference. Auditors can replay any decision back to its inputs.

6 · Humans hold legitimacy

Agents do the bureaucracy. Exceptions, key rotations, and trust-tier promotions require a named human approver — and that approval is itself a receipt.

The pipeline, end to end

Three lifecycle stages — Design, Deployment, Operations — sit on top of four shared planes: the Evidence Plane (signed receipts), the Policy Plane (OPA + gate YAML), the Lineage Plane (OpenLineage graph), and the Trust Plane (KMS, OIDC, transparency log).

flowchart TB
    classDef design fill:#e6f3f4,stroke:#01696f,color:#01314c,stroke-width:1.5px
    classDef deploy fill:#fef6e0,stroke:#b87b00,color:#4a3000,stroke-width:1.5px
    classDef ops fill:#e8f5ea,stroke:#2ecc71,color:#0d3d1a,stroke-width:1.5px
    classDef plane fill:#01314c,stroke:#01696f,color:#fff,stroke-width:1.5px

    subgraph DESIGN["DESIGN STAGE"]
      direction TB
      D1[Use-case intake<br/>risk classification]
      D2[Data sheet +<br/>model card draft]
      D3[Threat model<br/>red-team plan]
      D4[Design review<br/>signed approval]
      D1 --> D2 --> D3 --> D4
    end

    subgraph DEPLOY["DEPLOYMENT STAGE"]
      direction TB
      P1[Source commit<br/>signed by dev]
      P2[Build + SBOM<br/>SLSA provenance]
      P3[Eval suite<br/>bias · safety · perf]
      P4[Policy gate<br/>OPA / Conftest]
      P5[Signed release<br/>cosign + attest]
      P6[Admission control<br/>verify in cluster]
      P1 --> P2 --> P3 --> P4 --> P5 --> P6
    end

    subgraph OPS["OPERATIONS STAGE"]
      direction TB
      O1[Inference observed<br/>OVERT receipt]
      O2[Drift + safety<br/>monitors]
      O3[Incident detect<br/>+ kill switch]
      O4[Continuous<br/>re-evaluation]
      O1 --> O2 --> O3 --> O4 --> O1
    end

    DESIGN ==> DEPLOY ==> OPS

    subgraph PLANES["SHARED PLANES"]
      direction LR
      EV[(Evidence Plane<br/>signed receipts<br/>Merkle anchored)]
      PL[(Policy Plane<br/>Rego · gate YAML<br/>crosswalks)]
      LN[(Lineage Plane<br/>OpenLineage<br/>graph)]
      TR[(Trust Plane<br/>KMS · OIDC<br/>transparency log)]
    end

    DESIGN -.-> EV & PL & LN & TR
    DEPLOY -.-> EV & PL & LN & TR
    OPS -.-> EV & PL & LN & TR

    class D1,D2,D3,D4 design
    class P1,P2,P3,P4,P5,P6 deploy
    class O1,O2,O3,O4 ops
    class EV,PL,LN,TR plane

01 · Design Stage

Before any code is written: classify the risk, document the intent, sign the design.

What happens

  1. Use-case intake. A YAML use-case manifest is created in the governance repo (one PR per use case). Risk tier is set by a classifier policy (EU AI Act Annex III / NIST AI RMF Map function).
  2. Data sheet & model card. Templates auto-populate from upstream inventory; the author fills gaps. Hashes of training data and base model are pinned.
  3. Threat model & red-team plan. Generated from the risk tier: high-risk use cases get STRIDE plus AI-specific threats (prompt injection, data poisoning, confabulation) and a mandatory red-team budget.
  4. Design review. Named approvers (product, security, privacy, ethics for high risk) sign a design.approved receipt. This receipt becomes the parent of every downstream attestation.
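
Step 1 can be sketched as a small classifier over the use-case manifest. This is an illustration only: the manifest fields (`domain`, `affects_individuals`, `fully_automated`), the tier names, and the short domain list are hypothetical placeholders, not the Annex III text. In the blueprint the real classifier is a policy committed to the governance repo.

```python
# Hypothetical sketch of step 1: derive a risk tier from a use-case manifest.
# Field names, tier names, and the domain set are assumptions for illustration.
HIGH_RISK_DOMAINS = {"employment", "credit_scoring", "education", "law_enforcement"}


def classify_risk(manifest: dict) -> dict:
    """Return a payload shaped like a design.risk.classified decision."""
    reasons = []
    if manifest.get("domain") in HIGH_RISK_DOMAINS:
        reasons.append(f"domain '{manifest['domain']}' is on the high-risk list")
    if manifest.get("affects_individuals") and manifest.get("fully_automated"):
        reasons.append("fully automated decisions about individuals")
    tier = "high" if reasons else "limited"
    return {"event_type": "design.risk.classified", "tier": tier, "reasons": reasons}


decision = classify_risk({
    "name": "resume-screener",
    "domain": "employment",
    "affects_individuals": True,
    "fully_automated": False,
})
print(decision["tier"], decision["reasons"])
```

Recording the reasons next to the tier keeps the classification explainable when a reviewer later asks why a use case was routed to the high-risk path.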

Evidence collected

| Artifact            | Format                      | Receipt event                |
|---------------------|-----------------------------|------------------------------|
| Use-case manifest   | YAML in repo, hashed        | design.usecase.registered    |
| Risk classification | OPA decision log            | design.risk.classified       |
| Data sheet          | Markdown + SHA-256          | design.datasheet.published   |
| Model card          | Markdown + SHA-256          | design.modelcard.published   |
| Threat model        | Markdown + STRIDE table     | design.threatmodel.completed |
| Design approval     | Signed PR review + Ed25519  | design.approved              |
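
The "Markdown + SHA-256" rows above can be illustrated with a short hashing helper. This is a sketch under assumptions: the `artifact` and `artifact_hash` field names are hypothetical, and the payload is unsigned; only the `sha256:` prefix style is taken from the envelope example in this document.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path: Path) -> str:
    """Stream the file through SHA-256 and return it in 'sha256:<hex>' form."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return "sha256:" + h.hexdigest()


def publish_event(path: Path, event_type: str) -> dict:
    # 'artifact' and 'artifact_hash' are hypothetical field names.
    return {
        "event_type": event_type,
        "artifact": path.name,
        "artifact_hash": content_hash(path),
    }


with tempfile.TemporaryDirectory() as d:
    sheet = Path(d) / "datasheet.md"
    sheet.write_bytes(b"# Data sheet\n")
    event = publish_event(sheet, "design.datasheet.published")
    print(event)
```

Hashing the exact bytes committed to the repo means any later edit to the data sheet produces a different hash, which is what lets downstream receipts pin a specific version.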

Controls satisfied

  • NIST AI RMF: MAP-1.1, MAP-2.1, MAP-3.4, MAP-4.1, GOVERN-1.2
  • ISO/IEC 42001: 6.1.2 (AI risk assessment), 6.1.4 (AI system impact assessment), A.6.1.2 (objectives for responsible development)
  • EU AI Act: Art. 9 (risk management system), Art. 11 + Annex IV (technical documentation), Art. 27 (FRIA for high-risk)

02 · Deployment Stage

The CI/CD pipeline is the only path to production. Every step is a signed attestation; every artifact carries its provenance.

flowchart LR
    classDef step fill:#fef6e0,stroke:#b87b00,color:#4a3000,stroke-width:1.5px
    classDef evidence fill:#01314c,stroke:#01696f,color:#fff
    classDef gate fill:#fde2e0,stroke:#c0392b,color:#5a1410,stroke-width:1.5px

    DEV[Developer<br/>signed commit<br/>gitsign] --> BUILD[GitHub Actions<br/>OIDC to cloud<br/>hermetic build]
    BUILD --> SBOM[SBOM<br/>CycloneDX]
    BUILD --> PROV[SLSA provenance<br/>in-toto v1]
    BUILD --> EVAL[Model eval suite<br/>bias · safety · perf · red-team]
    EVAL --> ATT[Eval attestation<br/>signed predicate]
    SBOM --> GATE
    PROV --> GATE
    ATT --> GATE
    GATE{Policy gate<br/>OPA / Conftest<br/>+ Beacon checklist}
    GATE -- pass --> SIGN[cosign sign<br/>image + attestations]
    GATE -- fail --> BLOCK[blocked + receipt<br/>gate.failed]
    SIGN --> REG[(OCI registry<br/>+ Rekor log)]
    REG --> ADM[Admission controller<br/>verify signatures<br/>verify attestations]
    ADM -- pass --> PROD[Production]
    ADM -- fail --> QUAR[Quarantine + receipt]

    class DEV,BUILD,SBOM,PROV,EVAL,ATT,SIGN,REG,ADM,PROD step
    class GATE gate
    class BLOCK,QUAR evidence

Pipeline stages & evidence

| Stage       | Tooling                                   | Attestation                 |
|-------------|-------------------------------------------|-----------------------------|
| Commit      | gitsign + Sigstore                        | Signed commit, signed tag   |
| Build       | GH Actions, hermetic runner, OIDC         | SLSA v1.0 build provenance  |
| SBOM        | Syft / cdxgen                             | CycloneDX 1.6 + sig         |
| Test & eval | pytest, DeepEval, Garak, OWASP LLM        | in-toto eval predicate      |
| Policy gate | OPA / Conftest + Beacon checklist runner  | gate.evaluated receipt      |
| Sign        | cosign keyless (Fulcio + Rekor)           | Sigstore bundle             |
| Admit       | Kyverno / Gatekeeper / Connaisseur        | Admission decision receipt  |

Policy-as-Code anatomy

Every gate is two files committed in the governance repo:

  • gate.<org>.v1.yaml: the declarative half of the matched pair. It names which receipts to look at, which evaluations to run, and what thresholds count as a pass.
  • <org>_audit.rego — the executable counterpart. Same logic, machine-checked. Conftest runs both against the bundle.

When a control fires, the gate emits a gate.evaluated receipt referencing the inputs by hash. Auditors replay any decision deterministically.
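
The threshold logic a gate encodes can be sketched in a few lines. This is not the Beacon or Conftest implementation: the `GATE` structure, the metric names, and the `inputs_hash` and `failed_rules` fields are hypothetical. Only the `decision` fields (`result`, `rules_evaluated`, `rules_failed`) mirror the profile shown later in this document.

```python
import hashlib
import json

# Hypothetical gate definition; in the blueprint this lives in gate.<org>.v1.yaml.
GATE = {
    "thresholds": [
        {"metric": "safety.refusal_rate", "op": "min", "value": 0.95},
        {"metric": "bias.demographic_parity_gap", "op": "max", "value": 0.05},
    ]
}


def lookup(metrics: dict, dotted: str):
    """Resolve 'a.b' against nested dicts; missing keys return None."""
    node = metrics
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def evaluate_gate(gate: dict, evals: dict) -> dict:
    failed = []
    for rule in gate["thresholds"]:
        value = lookup(evals, rule["metric"])
        if rule["op"] == "min":
            ok = value is not None and value >= rule["value"]
        else:
            ok = value is not None and value <= rule["value"]
        if not ok:
            failed.append(rule["metric"])
    # Reference the inputs by hash so the decision can be replayed later.
    inputs = json.dumps({"gate": gate, "evals": evals}, sort_keys=True).encode()
    return {
        "event_type": "gate.evaluated",
        "inputs_hash": "sha256:" + hashlib.sha256(inputs).hexdigest(),
        "decision": {
            "result": "fail" if failed else "pass",
            "rules_evaluated": len(gate["thresholds"]),
            "rules_failed": len(failed),
        },
        "failed_rules": failed,
    }


evals = {"safety": {"refusal_rate": 0.97}, "bias": {"demographic_parity_gap": 0.08}}
receipt = evaluate_gate(GATE, evals)
print(receipt["decision"], receipt["failed_rules"])
```

Because the function is pure and the inputs are hashed in sorted-key form, running it again on the same gate and eval results yields the same verdict and the same `inputs_hash`, which is the property deterministic replay depends on.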

Controls satisfied

  • NIST AI RMF: MEASURE-2.3, MEASURE-2.7, MEASURE-2.11, MANAGE-1.3, MANAGE-2.2
  • ISO/IEC 42001: 8.3 (AI risk treatment), 8.4 (AI system impact assessment), A.6.2.4 (verification & validation), A.8.2 (system documentation)
  • EU AI Act: Art. 15 (accuracy, robustness, cybersecurity), Art. 17 (QMS), Art. 18 (documentation keeping), Art. 43 (conformity assessment)
  • SLSA v1.0: Build Levels 2–3 (hosted, hardened, isolated build)

03 · Operations Stage

In production, the receipts never stop. Every inference, every drift event, every kill-switch firing produces evidence that closes the loop back to design.

What runs in production

  1. OVERT-shaped inference receipts. The runtime SDK (or sidecar proxy) emits one inference.observed receipt per call: user, vendor, model, version, prompt-hash, result-hash, latency, signature.
  2. Hourly Merkle anchor. All receipts in the hour are hashed into a Merkle tree; the root is published to a transparency log (Rekor or internal) and optionally to S3 / IPFS for off-org witnesses.
  3. Drift & safety monitors. Continuous evaluation against the same suites used in deployment. Threshold breach → monitor.threshold.breached receipt → automatic ticket + policy re-eval.
  4. Incident & kill switch. Severity-tagged events trigger pre-approved runbooks. Disabling a tool, blocking a source, or pausing an agent is itself a signed receipt.
  5. Continuous re-evaluation. Quarterly (or trigger-based) re-run of the deployment policy gate against fresh production data. Drift in real behavior reopens the design loop.
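
The hourly anchor in step 2 reduces to computing one root hash over the hour's receipts. The sketch below assumes SHA-256 with leaf and node prefixes in the style of RFC 6962, and carries an unpaired node up unchanged; the tree construction Beacon actually uses is not specified here, so treat these choices as assumptions.

```python
import hashlib


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list) -> str:
    """Return the hex Merkle root over a list of receipt byte strings.

    Leaves are hashed with a 0x00 prefix and interior nodes with a 0x01
    prefix, so a leaf can never be confused with an interior node.
    """
    if not leaves:
        return _h(b"").hex()
    level = [_h(b"\x00" + leaf) for leaf in leaves]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(_h(b"\x01" + level[i] + level[i + 1]))
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # unpaired node is carried up unchanged
        level = nxt
    return level[0].hex()


receipts = [b'{"id":"a"}', b'{"id":"b"}', b'{"id":"c"}']
root = merkle_root(receipts)
tampered = merkle_root([receipts[0], b'{"id":"x"}', receipts[2]])
print(root)
print("tamper detected:", root != tampered)
```

Publishing only the root keeps the transparency log small while still committing to every receipt in the hour: changing, adding, or removing any receipt changes the root.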

Evidence collected

| Event            | Source            | Receipt event             |
|------------------|-------------------|---------------------------|
| Model call       | SDK / proxy       | inference.observed        |
| Tool call        | Agent runtime     | agent.tool.called         |
| Retrieval        | RAG layer         | agent.retrieval.hit       |
| Drift detected   | Monitor           | monitor.drift.detected    |
| Safety violation | LLM firewall      | guardrail.violated        |
| Kill switch      | Runbook           | incident.killswitch.fired |
| Hourly anchor    | Beacon anchor svc | bundle.anchored           |
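
The model-call row can be illustrated with a wrapper that records hashes of the prompt and result rather than the text itself. This is a sketch, not the Beacon SDK: `observe`, `fake_model`, and the `latency_ms` field name are hypothetical, the receipt is left unsigned, and `uuid4` stands in for the ULID-style ids used in the envelope example.

```python
import hashlib
import time
import uuid
from datetime import datetime, timezone


def sha256_ref(text: str) -> str:
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def observe(call, prompt: str, *, user: dict, vendor: str, model: str,
            version: str, parent_receipt_id: str):
    """Run call(prompt) and return (result, unsigned inference.observed receipt)."""
    start = time.perf_counter()
    result = call(prompt)
    latency_ms = round((time.perf_counter() - start) * 1000, 3)
    now = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
    receipt = {
        "id": uuid.uuid4().hex,  # stand-in for a ULID-style id
        "ts_utc": now.replace("+00:00", "Z"),
        "user": user,
        "vendor": vendor,
        "model": model,
        "version": version,
        "prompt_hash": sha256_ref(prompt),
        "result_hash": sha256_ref(result),
        "event_type": "inference.observed",
        "parent_receipt_id": parent_receipt_id,
        "latency_ms": latency_ms,  # hypothetical field name
    }
    return result, receipt


def fake_model(prompt: str) -> str:
    """Stand-in for a real vendor SDK call."""
    return prompt.upper()


result, receipt = observe(
    fake_model, "hello",
    user={"sub": "svc-a"}, vendor="example", model="demo", version="0",
    parent_receipt_id="01HXYZ8K3F2N5Q1R7S9V3W4Y69",
)
print(receipt["event_type"], receipt["ts_utc"])
```

Storing hashes instead of raw text keeps prompts and outputs out of the evidence store while still letting an auditor confirm that a retained transcript matches what was logged.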

Controls satisfied

  • NIST AI RMF: MEASURE-2.8, MEASURE-3.1, MEASURE-4.1, MANAGE-2.3, MANAGE-4.1, MANAGE-4.3
  • ISO/IEC 42001: 9.1 (monitoring & measurement), 10.1 (continual improvement), A.6.2.6 (operation & monitoring)
  • EU AI Act: Art. 14 (human oversight), Art. 19 (automatically generated logs), Art. 26 (deployer obligations), Art. 72 (post-market monitoring), Art. 73 (incident reporting)

The shared evidence model

One envelope binds policy and engineering. Beacon receipts are OVERT 1.0 envelopes with the aigovops-beacon.v1 profile. Policy team writes controls that reference these fields; engineering writes SDKs that emit them. Same schema, two teams, zero translation overhead.

Envelope (OVERT 1.0, normative)

{
  "id": "01HXYZ8K3F2N5Q1R7S9V3W4Y6B",
  "ts_utc": "2026-05-23T18:04:22.318Z",
  "user": {
    "sub": "build-runner@github",
    "email": "actions@github.com",
    "oidc_issuer": "https://token.actions.githubusercontent.com"
  },
  "vendor": "anthropic",
  "model": "claude-sonnet-4-5",
  "version": "2026-04-01",
  "prompt_hash": "sha256:9f2c…",
  "result_hash": "sha256:7b1a…",
  "event_type": "gate.evaluated",
  "environment": "cloud_saas",
  "evidence_id": "01HXYZ8K3F2N5Q1R7S9V3W4Y6A",
  "parent_receipt_id": "01HXYZ8K3F2N5Q1R7S9V3W4Y69",
  "signature": {
    "alg": "Ed25519",
    "key_fpr": "SHA256:9p2x…",
    "sig_b64": "MEUCIQ…",
    "canonical_form": "json/c14n-rfc8785"
  }
}
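
Before an envelope is signed, it has to be reduced to one unambiguous byte string. The sketch below approximates RFC 8785 (JCS) using only the standard library and assumes the `signature` member is excluded from the signed bytes, which this document does not state. It is adequate only for envelopes made of strings, integers, booleans, nulls, arrays, and objects; full JCS also prescribes float formatting and UTF-16 key ordering, so a dedicated JCS library is the safer choice in practice.

```python
import json


def canonical_bytes(envelope: dict) -> bytes:
    """Approximate JCS: sorted keys, no insignificant whitespace, UTF-8 output.

    Assumption: the 'signature' member is dropped before canonicalizing.
    """
    unsigned = {k: v for k, v in envelope.items() if k != "signature"}
    return json.dumps(
        unsigned, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


a = {"id": "01H", "event_type": "gate.evaluated", "signature": {"alg": "Ed25519"}}
b = {"event_type": "gate.evaluated", "id": "01H"}

print(canonical_bytes(a))
print(canonical_bytes(a) == canonical_bytes(b))
```

Two envelopes that differ only in key order or in the presence of a signature produce identical bytes, so signer and verifier agree on exactly what was signed.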

Profile extensions (aigovops-beacon.v1)

{
  "profile": "aigovops-beacon.v1",
  "control_refs": [
    "nist-ai-rmf:MEASURE-2.7",
    "iso-42001:A.6.2.4",
    "eu-ai-act:Art.15"
  ],
  "subject": {
    "name": "ghcr.io/aigovops/foo:v1.2.3",
    "digest": {
      "sha256": "abc123…"
    }
  },
  "lineage": {
    "openlineage_run_id": "f3b1…",
    "parent_datasets": [
      "sha256:111…",
      "sha256:222…"
    ],
    "parent_models": [
      "sha256:333…"
    ]
  },
  "decision": {
    "result": "pass",
    "rules_evaluated": 47,
    "rules_failed": 0,
    "exception_id": null
  }
}

How both teams use it

| Field            | Policy team reads           | Eng team writes            |
|------------------|-----------------------------|----------------------------|
| event_type       | Triggers control evaluation | SDK sets per call site     |
| subject.digest   | What was governed           | Build emits content hash   |
| lineage.parent_* | Traceability check          | OpenLineage hook           |
| control_refs     | Crosswalk lookup            | Pipeline template prefills |
| decision.result  | Gate verdict                | OPA writes from Rego       |
| signature        | Verifies authenticity       | KMS / cosign signs         |

Receipt chains = audit trails

Receipts link via parent_receipt_id. A finished bundle for one model deployment looks like:

design.approved
  └─ build.completed
       └─ eval.completed
            └─ gate.evaluated (pass)
                 └─ bundle.signed
                      └─ admission.allowed
                           └─ inference.observed × N
                                └─ monitor.drift.detected
                                     └─ gate.evaluated (re-eval)

An auditor follows the chain backwards from any production call to the design approval — and every step is cryptographically verifiable in seconds.
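
Walking the chain backwards is a simple loop over `parent_receipt_id`. The sketch below uses an in-memory dict as a stand-in for the evidence store and short hypothetical ids; it checks only the links, and signature verification would be a separate step.

```python
def walk_to_root(receipts: dict, start_id: str) -> list:
    """Follow parent_receipt_id links from start_id back to the root.

    receipts maps receipt id -> receipt dict. Raises KeyError on a missing
    parent and ValueError on a cycle, since either breaks the audit trail.
    """
    chain, seen, current = [], set(), start_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle at {current}")
        if current not in receipts:
            raise KeyError(f"missing receipt {current}")
        seen.add(current)
        receipt = receipts[current]
        chain.append(receipt["event_type"])
        current = receipt.get("parent_receipt_id")
    return chain


store = {
    "r1": {"event_type": "design.approved", "parent_receipt_id": None},
    "r2": {"event_type": "build.completed", "parent_receipt_id": "r1"},
    "r3": {"event_type": "gate.evaluated", "parent_receipt_id": "r2"},
    "r4": {"event_type": "inference.observed", "parent_receipt_id": "r3"},
}
print(" <- ".join(walk_to_root(store, "r4")))
```

Treating a missing parent as an error rather than as the end of the chain matters: a silently truncated chain would look like a valid trail that simply never reached a design approval.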

Infrastructure requirements

What you actually need to stand up. Two tiers: a minimum viable governance plane that fits in a single GitHub org plus one cloud account, and a scaled tier for multi-tenant enterprise use.

Tier 1 · Minimum viable

  • Source: GitHub org with branch protection, required reviews, signed commits enforced.
  • Build: GitHub-hosted Actions runners; OIDC federation to cloud (no long-lived secrets).
  • Signing: Sigstore keyless (Fulcio + Rekor public good) or cloud KMS (AWS KMS / GCP KMS / Azure Key Vault).
  • Artifact registry: GHCR or any OCI registry that stores referrers.
  • Evidence store: S3 / GCS bucket with object lock (WORM) + Beacon NDJSON receipts.
  • Policy engine: OPA / Conftest in CI; Kyverno or Gatekeeper at admission.
  • Lineage: OpenLineage emitting to Marquez (self-hosted, single VM).
  • Transparency: Public Rekor or self-hosted; Merkle anchors to S3.

Tier 2 · Scaled enterprise

  • Source: Multiple orgs + GitHub Enterprise audit log streamed to SIEM.
  • Build: Self-hosted ARC runners in isolated VPC; reproducible builds.
  • Signing: HSM-backed KMS with split-key custody; private Fulcio CA.
  • Artifact registry: Harbor / Artifactory with vulnerability scanning.
  • Evidence store: Dedicated WORM bucket per business unit + cross-region replication; 7-year retention.
  • Policy engine: Centralized OPA cluster + bundle service; Styra or in-house control plane.
  • Lineage: Managed OpenLineage backend (e.g., Astronomer, DataHub) with cross-team graph.
  • Transparency: Private Rekor instance + dual anchoring to public chain for high-risk systems.
  • SIEM / SOAR: Receipt stream feeds detection + runbooks for incident reporting (EU AI Act Art. 73).

Minimum compute footprint (Tier 1)

The governance plane itself is small: one t3.small for Marquez, one t3.micro for the Beacon receipt service, one S3 bucket, one KMS key. The expensive part is the eval suite — budget GPU time per model release, not per receipt.

How technical & policy teams collaborate

Same repo. Same pull-request workflow. Same evidence model. Different files.

Repository layout

aigovops-beacon/
├── crosswalks/             # policy team owns
│   ├── nist-ai-rmf.yaml
│   ├── iso-42001.yaml
│   └── eu-ai-act.yaml
├── policies/               # joint ownership
│   ├── gates/              # YAML (policy team)
│   └── rego/               # Rego (eng team)
├── usecases/               # product owners
│   └── *.yaml
├── modelcards/             # ML team
├── datasheets/             # data team
├── attestations/           # generated, signed
└── .github/workflows/
    ├── design-review.yml
    ├── build-sign-attest.yml
    └── evidence-bundle.yml

RACI by activity

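| Activity             | Policy | Eng | Product | Audit |
|----------------------|--------|-----|---------|-------|
| Crosswalk authoring  | R/A    | C   | I       | C     |
| Rego implementation  | C      | R/A | I       | I     |
| Use-case intake      | C      | I   | R/A     | I     |
| Model card           | C      | R/A | C       | I     |
| Design approval      | A      | R   | R       | I     |
| Pipeline gate config | C      | R/A | I       | C     |
| Exception grant      | A      | R   | R       | I     |
| Incident response    | C      | R/A | C       | I     |
| Audit bundle export  | R      | R   | I       | A     |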

The four collaboration rituals

Crosswalk PRs

Policy team opens a PR mapping a new regulatory clause to existing receipt fields. CI fails if the clause references a field that does not exist in the OVERT envelope or profile — forcing schema evolution to happen with both teams in the room.
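
The CI check described here could look roughly like the following. The crosswalk layout (a `fields` list per control) and the `KNOWN_FIELDS` set are hypothetical; in practice the field list would be derived from the OVERT envelope and profile schema, and the crosswalk would be loaded from its YAML file.

```python
# Hypothetical field list; a real check would generate it from the schema.
KNOWN_FIELDS = {
    "event_type", "subject.digest", "control_refs", "decision.result",
    "lineage.parent_datasets", "lineage.parent_models", "signature",
}


def unknown_fields(crosswalk: dict) -> list:
    """Return (control, field) pairs that reference fields outside the schema."""
    problems = []
    for control, spec in crosswalk.items():
        for field in spec.get("fields", []):
            if field not in KNOWN_FIELDS:
                problems.append((control, field))
    return problems


crosswalk = {
    "eu-ai-act:Art.15": {"fields": ["decision.result", "subject.digest"]},
    "nist-ai-rmf:MEASURE-2.7": {"fields": ["decision.result", "eval.red_team_score"]},
}
problems = unknown_fields(crosswalk)
for control, field in problems:
    print(f"{control}: unknown field {field}")
# A CI step would exit non-zero whenever problems is non-empty.
```

Failing the PR on an unknown field turns a schema gap into a visible conversation between the two teams instead of a control that silently never matches any receipt.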

Matched-pair commits

Every gate YAML change ships in the same PR as its Rego counterpart. Conftest verifies they agree on a synthetic test bundle before merge.

Exception receipts

When the gate fails and the business still needs the release, a named approver issues a signed exception.granted receipt with a TTL and conditions. Auditors can list every exception with a single query.
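
A TTL only helps if something enforces it. The sketch below shows one way a gate could decide whether an exception still applies; the `approver`, `granted_at`, `ttl_hours`, and `conditions` field names are hypothetical, as is the sample approver address.

```python
from datetime import datetime, timedelta, timezone


def exception_active(exception: dict, now: datetime) -> bool:
    """True while the exception names an approver and is within its TTL."""
    if not exception.get("approver"):
        return False
    granted = datetime.fromisoformat(exception["granted_at"])
    return now < granted + timedelta(hours=exception["ttl_hours"])


exc = {
    "event_type": "exception.granted",
    "approver": "jane.doe@example.com",
    "granted_at": "2026-05-23T18:00:00+00:00",
    "ttl_hours": 72,
    "conditions": ["canary only"],
}
now = datetime(2026, 5, 25, 12, 0, tzinfo=timezone.utc)
print(exception_active(exc, now))
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check replayable: an auditor can re-run it with the timestamp of the original gate decision.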

Quarterly bundle review

Policy and eng leads jointly export the previous quarter's bundle, run the auditor verification CLI, and review any drift between intended controls and observed receipts.

Auditor-ready documentation, by construction

The auditor never asks "do you have evidence?" — they ask "verify this hash." Three artifacts make that possible.

The bundle

bundle.tar.gz contains every receipt for a system over a window, the crosswalk used, the policies that ran, the data + model cards, and a manifest with Merkle roots. Generated by aigovops-beacon export; size is typically < 50 MB per quarter per system.

The verifier

A small Go program of about 200 lines (beacon-verify) that auditors run themselves. It rehashes every receipt, checks every signature against the published key transparency log, walks the Merkle tree, and prints a pass/fail line per control. No SaaS, no login, no trust required.

The crosswalk

For each control in NIST AI RMF / ISO 42001 / EU AI Act, the crosswalk YAML lists the receipts that satisfy it and the queries to run. The auditor reads one document and gets pointers into the bundle for everything they need.
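
The crosswalk lookup amounts to indexing receipts by the controls they reference. A minimal sketch, assuming receipts carry a `control_refs` list as in the profile above and that the list of required controls comes from the crosswalk; the function names and short ids are hypothetical.

```python
from collections import defaultdict


def receipts_by_control(receipts: list) -> dict:
    """Index receipt ids by each control reference they carry."""
    index = defaultdict(list)
    for r in receipts:
        for ref in r.get("control_refs", []):
            index[ref].append(r["id"])
    return dict(index)


def coverage(required: list, receipts: list) -> dict:
    """Map each required control to its receipt ids; an empty list marks a gap."""
    index = receipts_by_control(receipts)
    return {control: index.get(control, []) for control in required}


receipts = [
    {"id": "r1", "control_refs": ["nist-ai-rmf:MEASURE-2.7", "eu-ai-act:Art.15"]},
    {"id": "r2", "control_refs": ["eu-ai-act:Art.15"]},
]
report = coverage(["eu-ai-act:Art.15", "eu-ai-act:Art.14"], receipts)
for control, ids in report.items():
    print(control, ids or "GAP")
```

Reporting gaps explicitly is the useful part: a control with no receipts is a finding the team can see before the auditor does.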

What the auditor sees on day one

  1. Download bundle.tar.gz and beacon-verify.
  2. Run beacon-verify bundle.tar.gz — exit code 0 means all signatures and Merkle proofs check.
  3. Open crosswalk-report.html — each control row links to the exact receipts that satisfy it.
  4. Sample-test by control: pick three receipts, replay the policy decision deterministically, confirm verdict.
  5. Walk lineage from any production inference back to the design approval. Same answer every time.

Total time to first opinion: hours, not weeks.

Starter pipeline — drop-in GitHub Actions

A minimal but complete workflow you can paste into your repo to get to "signed evidence in CI" in under an hour. Full templates are in the artifacts folder.

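# .github/workflows/govern.yml
name: AI governance pipeline
# Pull requests from forks get no OIDC token and a read-only GITHUB_TOKEN,
# so the push, sign, and attest steps only succeed for branches in this repo.
on: [push, pull_request]

permissions:
  contents: read
  id-token: write       # OIDC for keyless signing
  attestations: write
  packages: write

env:
  # Assumed image name; GHCR requires the repository path to be lowercase.
  IMG: ghcr.io/${{ github.repository }}:${{ github.sha }}

jobs:
  govern:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }

      # Assumption: syft, conftest, and aigovops-beacon are already on the runner
      # (install them in an earlier step or bake them into the runner image).
      - uses: sigstore/cosign-installer@v3

      - name: Verify signed commits
        # Fails unless the signers' keys (or gitsign) are configured on the runner.
        run: git verify-commit HEAD

      - name: Build, push + SBOM
        run: |
          echo "${{ secrets.GITHUB_TOKEN }}" | \
            docker login ghcr.io -u "${{ github.actor }}" --password-stdin
          docker build -t "$IMG" .
          docker push "$IMG"
          # Sign the immutable digest rather than the mutable tag.
          echo "IMG_REF=$(docker inspect --format '{{index .RepoDigests 0}}' "$IMG")" >> "$GITHUB_ENV"
          syft "$IMG" -o cyclonedx-json > sbom.cdx.json

      - name: Run eval suite
        run: |
          python -m evals.run --suite safety,bias,perf \
            --out evals.json

      - name: Policy gate (Conftest + Beacon checklist)
        run: |
          # --data is repeated once per file; a bare second path would be
          # treated as another file to test.
          conftest test policies/gates/ \
            --policy policies/rego/ \
            --data evals.json \
            --data sbom.cdx.json
          aigovops-beacon checklist run \
            --pack crosswalks/nist-ai-rmf.yaml \
            --evidence . \
            --emit-receipt gate.evaluated

      - name: Sign image + attest
        run: |
          # cosign 2.x is keyless by default; --yes skips the interactive prompt.
          cosign sign --yes "$IMG_REF"
          cosign attest --yes --predicate sbom.cdx.json \
            --type cyclonedx "$IMG_REF"
          cosign attest --yes --predicate evals.json \
            --type https://aigovops.org/eval/v1 "$IMG_REF"

      - name: Emit + anchor evidence bundle
        run: |
          aigovops-beacon export \
            --window pr-$GITHUB_SHA \
            --out bundle.tar.gz
          aigovops-beacon anchor bundle.tar.gz \
            --transparency rekor

      - uses: actions/upload-artifact@v4
        with:
          name: evidence-bundle
          path: bundle.tar.gz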