ADR-0028: Almathal AI Governance Posture

Status: Accepted

Date: 2026-06-25

Context

Almathal operates three categories of AI agents as part of its platform:

The Stitcher — an LLM that fills Seams, generates UI, and handles migration steps requiring LLM involvement. Invoked on every generation and migration.
The AI Reviewer — an LLM that performs Output Review after generation, producing advisory warnings about anti-patterns, security concerns, and consistency issues.
Curation Pipeline agents — LLMs that draft Manifests, evaluate Adapter and Module candidates, and propose Capability namespace additions. Operate continuously in the background.

As of mid-2026, the EU AI Act’s most substantive provisions are fully enforced, ISO/IEC 42001 is an active procurement criterion in regulated industries, and the NIST AI Risk Management Framework is effectively mandatory for US federal contractors and their suppliers. Organizations that deploy AI without a governance framework face active scrutiny from regulators, auditors, and procurement teams.

Almathal’s target customers are enterprise teams in exactly these regulated environments. The platform must be able to demonstrate its own AI governance posture to procurement teams, not merely help customers generate code that satisfies their governance requirements. The two obligations are distinct and both apply.

AI governance must be built in from the beginning, not retrofitted. Every design decision that makes governance harder to add later is a decision against the platform’s enterprise positioning.

Decision

Agent Autonomy Classification

Each AI agent in the platform is classified on a five-tier autonomy scale consistent with Singapore’s IMDA Model AI Governance Framework for agentic AI (the most specific available framework for agent autonomy as of 2026):

Agent	Autonomy Level	Classification	Rationale
Stitcher (standard Seams)	Level 3	High autonomy within constraints	Operates without human intervention during stitching; constrained by Seam contracts, Build Verification, and deterministic rails
Stitcher (seam_contract_change / api_breaking_change steps)	Level 2	Conditional autonomy	Requires customer-provided context at migration plan approval before acting; output is subject to additional verification
AI Reviewer	Level 2	Conditional autonomy	Advisory only; findings require human review before any action is taken; does not block pipeline
Curation Pipeline agents (low-risk Manifests)	Level 3	High autonomy within constraints	Agent-ratified for defined low-risk categories per ADR-0018; human sampling post-hoc
Curation Pipeline agents (high-risk Manifests)	Level 1	Human-supervised	Human Reviewer ratification required before admission

Level 3 agents operate autonomously within hard constraints. This is the core principle: autonomy is permitted only inside deterministic rails. The Stitcher may produce any code within a Seam contract; it may not produce code outside the declared output paths, may not use Adapters not listed in the Seam contract, and may not bypass Build Verification. These are enforced mechanically, not via prompt instruction. Telling an agent “follow our constraints” in a prompt is probabilistic; wiring a validator that blocks non-compliant output is deterministic.
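The classification table can be held as data so that the platform, rather than a prompt, decides when human input is needed. The following is a minimal sketch under stated assumptions: the agent keys, the enum member names, and the rule that Level 1 and Level 2 agents need human input before acting are illustrative readings of the table, not part of the platform. Only the three levels the table actually uses are named.

```python
from enum import IntEnum


class AutonomyLevel(IntEnum):
    # Only the levels that appear in the classification table are modeled.
    HUMAN_SUPERVISED = 1
    CONDITIONAL = 2
    HIGH_WITHIN_CONSTRAINTS = 3


# Hypothetical keys, one per row of the table.
AGENT_AUTONOMY = {
    "stitcher.standard_seam": AutonomyLevel.HIGH_WITHIN_CONSTRAINTS,
    "stitcher.seam_contract_change": AutonomyLevel.CONDITIONAL,
    "stitcher.api_breaking_change": AutonomyLevel.CONDITIONAL,
    "ai_reviewer": AutonomyLevel.CONDITIONAL,
    "curation.low_risk_manifest": AutonomyLevel.HIGH_WITHIN_CONSTRAINTS,
    "curation.high_risk_manifest": AutonomyLevel.HUMAN_SUPERVISED,
}


def autonomy_level(agent_key: str) -> AutonomyLevel:
    """Look up an agent's level; unclassified agents fail closed."""
    try:
        return AGENT_AUTONOMY[agent_key]
    except KeyError:
        raise ValueError(f"unclassified agent: {agent_key}") from None


def needs_human_input(agent_key: str) -> bool:
    """Assumed rule: Levels 1 and 2 involve a human before any action."""
    return autonomy_level(agent_key) <= AutonomyLevel.CONDITIONAL
```

Raising on an unknown key means a newly added agent cannot run until someone classifies it, which keeps the table authoritative.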

Decision Logging Requirements

Every AI agent invocation must produce a structured log entry. Logging cannot be disabled for individual invocations; it is a platform invariant. The log is append-only and tamper-evident.
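One common way to make an append-only log tamper-evident is a hash chain, where each record's hash covers the previous record's hash. The sketch below illustrates that technique only; this ADR does not say which mechanism the platform uses, and the class and function names are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "sha256:" + "0" * 64


def _digest(prev_hash: str, entry: dict) -> str:
    # Canonical JSON so the same entry always hashes to the same value.
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":"))
    raw = (prev_hash + payload).encode("utf-8")
    return "sha256:" + hashlib.sha256(raw).hexdigest()


class HashChainedLog:
    """In-memory illustration; a real store would persist each record."""

    def __init__(self) -> None:
        self._records = []  # list of (entry, entry_hash) pairs

    def append(self, entry: dict) -> str:
        prev = self._records[-1][1] if self._records else GENESIS_HASH
        entry_hash = _digest(prev, entry)
        # Store a copy so later changes to the caller's dict do not leak in.
        self._records.append((json.loads(json.dumps(entry)), entry_hash))
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered record breaks it."""
        prev = GENESIS_HASH
        for entry, entry_hash in self._records:
            if _digest(prev, entry) != entry_hash:
                return False
            prev = entry_hash
        return True
```

A chain like this detects edits only if the latest hash is also recorded somewhere the log writer cannot alter, so the anchoring location matters as much as the chain itself.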

Minimum required fields per Stitcher invocation:

{
  "invocation_id": "uuid",
  "timestamp_utc": "ISO 8601",
  "agent_type": "stitcher",
  "agent_autonomy_level": 3,
  "model_id": "claude-sonnet-4-6",
  "model_version": "20251001",
  "platform_version": "0.1.0",
  "triggered_by": {
    "spec_id": "build-7a3f8c2e",
    "seam_id": "seam:rag-chatbot/document-ingestion",
    "step_type": "code_generation"
  },
  "input_summary": {
    "seam_contract_hash": "sha256:...",
    "variation_point_values_hash": "sha256:...",
    "context_tokens": 4200
  },
  "output_summary": {
    "files_produced": ["frontend/src/pages/DocumentList.tsx"],
    "output_tokens": 1800,
    "verification_passed": true,
    "retry_count": 0
  },
  "decision_rationale": "Filled seam entity_to_ui for entity Document using adapters react@18.3, tanstack-query@5.x, shadcn/ui@latest. Implemented pagination, filtering, and delete_action per seam contract.",
  "session_context": {
    "user_id": "user-abc123",
    "session_id": "session-xyz789",
    "machine_id": "container-f8a2b1c3",
    "ip_hash": "sha256:..."
  }
}

Decision rationale is required. The decision_rationale field captures why the Stitcher made its choices at a summary level — which adapters it used, which behaviors it implemented, which interpretation it applied when the Seam contract was ambiguous. This field is what makes the audit trail readable by governance teams, not just engineers.

Session context is required. User ID, session ID, and machine/container ID are logged with every invocation. These enable governance teams to answer: who initiated this generation, on what infrastructure, and when.

The same logging requirements apply to the AI Reviewer and Curation Pipeline agents, with agent-appropriate field names.
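Because the required fields are a platform invariant, an entry can be checked mechanically before it is written. The sketch below checks the Stitcher entry shown above; the function names are hypothetical, and ip_hash is left out of the required set because the text does not state whether it is mandatory.

```python
# Dotted paths into the Stitcher log entry shown in the example.
REQUIRED_PATHS = (
    "invocation_id",
    "timestamp_utc",
    "agent_type",
    "agent_autonomy_level",
    "model_id",
    "model_version",
    "platform_version",
    "triggered_by.spec_id",
    "triggered_by.seam_id",
    "triggered_by.step_type",
    "input_summary.seam_contract_hash",
    "output_summary.files_produced",
    "output_summary.verification_passed",
    "output_summary.retry_count",
    "decision_rationale",
    "session_context.user_id",
    "session_context.session_id",
    "session_context.machine_id",
)


def _lookup(entry: dict, dotted: str):
    """Return the value at a dotted path, or None if any segment is absent."""
    node = entry
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def missing_fields(entry: dict) -> list:
    """List required paths that are absent or blank.

    Values such as 0 and False are valid (retry_count, verification_passed),
    so only None and whitespace-only strings count as missing.
    """
    missing = []
    for path in REQUIRED_PATHS:
        value = _lookup(entry, path)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(path)
    return missing
```

A writer that refuses any entry for which `missing_fields` is non-empty turns "decision rationale is required" into a checked property rather than a convention.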

Deterministic Constraints on the Stitcher

The Stitcher’s autonomy is explicitly bounded by deterministic constraints that cannot be overridden by prompt instruction. These constraints are enforced at the platform level:

Output path constraints. The Stitcher may only write to paths declared in the Seam contract’s output_files. Any attempt to write to other paths is blocked.
Adapter usage constraints. The Stitcher must use the Adapters listed in must_use_adapters. Using a different Adapter is a verification failure.
Behavior constraints. The Seam contract’s must_implement list declares required behaviors. Build Verification checks for their presence.
Build Verification gate. The Stitcher’s output is mechanically verified (compile, lint, tests) before delivery. The Stitcher cannot bypass this gate.
Retry bound. The Stitcher may retry a failed Seam at most N times (platform-configurable, default 3). After N retries, the generation hard-fails. The Stitcher cannot self-extend its retry budget.
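The output path and Adapter constraints above can be sketched as a validator that runs on the Stitcher's output after generation. The field names output_files and must_use_adapters come from this ADR; the dataclasses, the `violations` function, and the rule that absolute or parent-relative paths are always rejected are assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class SeamContract:
    output_files: frozenset
    must_use_adapters: frozenset


@dataclass(frozen=True)
class StitcherOutput:
    files_written: tuple
    adapters_used: frozenset


def violations(contract: SeamContract, output: StitcherOutput) -> list:
    """Return human-readable constraint violations; empty means compliant."""
    problems = []
    allowed = {str(PurePosixPath(p)) for p in contract.output_files}
    for raw in output.files_written:
        path = PurePosixPath(raw)
        # Reject escapes outright instead of trying to resolve them.
        escapes = path.is_absolute() or ".." in path.parts
        if escapes or str(path) not in allowed:
            problems.append(f"undeclared output path: {raw}")
    for adapter in sorted(contract.must_use_adapters - output.adapters_used):
        problems.append(f"required adapter not used: {adapter}")
    for adapter in sorted(output.adapters_used - contract.must_use_adapters):
        problems.append(f"adapter not in seam contract: {adapter}")
    return problems
```

The check takes no input from the prompt or from the model, so nothing the Stitcher generates can change what counts as compliant.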

These constraints implement the harness engineering principle: agents perform better and more reliably within strict architectural boundaries enforced by deterministic validators, not by prompt-level trust.
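The retry bound is one place where this principle is easy to see in code: the loop is owned by the harness, and the agent has no handle on the budget. The sketch below assumes that "N retries" means one initial attempt plus up to N further attempts; `fill_seam`, `GenerationHardFail`, and the callback signatures are hypothetical.

```python
from typing import Callable, Tuple

DEFAULT_MAX_RETRIES = 3  # platform default stated in this ADR


class GenerationHardFail(Exception):
    """Raised when the retry budget is exhausted."""


def fill_seam(
    generate: Callable[[int], str],
    verify: Callable[[str], bool],
    max_retries: int = DEFAULT_MAX_RETRIES,
) -> Tuple[str, int]:
    """Run generate/verify until verification passes or the budget runs out.

    Returns the verified output and the retry_count to log. max_retries is
    set by the platform; neither callback can alter the loop bound.
    """
    for retry_count in range(max_retries + 1):
        output = generate(retry_count)
        if verify(output):
            return output, retry_count
    raise GenerationHardFail(f"seam failed after {max_retries} retries")
```

The returned count maps directly onto the retry_count field of the invocation log entry.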

Model Version Tracking

The model ID, model version, and platform version are recorded in every agent invocation log entry and in every generated app’s audit trail. Specifically:

The Spec records the model ID and version used by the Stitcher at generation time, alongside the Archetype version and Adapter versions. A generated app is reproducible only when the same model version is available.
The audit trail for every migration records the model used for any requires_stitcher: true steps.
When the platform’s configured Stitcher model changes (upgraded version, different provider), the change is recorded in the platform version history and in subsequent invocation logs.
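The tracking rules above amount to two small operations: record a change whenever the configured Stitcher model differs from the previous one, and compare the model recorded in a Spec with the one currently configured. This is a minimal sketch; `ModelRef`, `PlatformVersionHistory`, and `same_model_as_spec` are hypothetical names, and the Spec is modeled as a plain dict carrying model_id and model_version.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class ModelRef:
    model_id: str
    model_version: str


@dataclass
class PlatformVersionHistory:
    # Each change is (platform_version, previous model, new model).
    changes: List[Tuple[str, ModelRef, ModelRef]] = field(default_factory=list)

    def set_stitcher_model(
        self, platform_version: str, current: ModelRef, new: ModelRef
    ) -> ModelRef:
        """Record the switch only when the model actually changes."""
        if new != current:
            self.changes.append((platform_version, current, new))
        return new


def same_model_as_spec(spec: dict, configured: ModelRef) -> bool:
    """True when the Spec was generated with the configured model."""
    recorded = ModelRef(spec["model_id"], spec["model_version"])
    return recorded == configured
```

A False result from `same_model_as_spec` does not block regeneration in this sketch; it only signals that the output may differ from the original build.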

Model version pinning (generating with the same model version as a prior build) is a future capability documented in Future Considerations.

Observability Layer

An observability UI is a v1 deliverable. The data foundation it depends on must be in place for MVP.

MVP foundation (required for MVP launch):

All agent invocation log entries written to a structured, queryable store (not just flat logs)
Audit trail per generated app, retrievable by build ID, user, and timestamp range
API surface for programmatic access to the audit trail (for procurement team integrations)
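The MVP items above can be illustrated with a small queryable store. The sketch uses SQLite from the Python standard library purely as an example; the table layout, function names, and the mapping of build ID to triggered_by.spec_id are assumptions. It also assumes every timestamp_utc uses the same UTC ISO 8601 format, so that string comparison matches chronological order.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS invocations (
    invocation_id TEXT PRIMARY KEY,
    timestamp_utc TEXT NOT NULL,
    build_id      TEXT NOT NULL,
    user_id       TEXT NOT NULL,
    entry_json    TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_build ON invocations (build_id, timestamp_utc);
CREATE INDEX IF NOT EXISTS idx_user ON invocations (user_id, timestamp_utc);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record(conn: sqlite3.Connection, entry: dict) -> None:
    """Index the queryable fields and keep the full entry alongside them."""
    conn.execute(
        "INSERT INTO invocations VALUES (?, ?, ?, ?, ?)",
        (
            entry["invocation_id"],
            entry["timestamp_utc"],
            entry["triggered_by"]["spec_id"],
            entry["session_context"]["user_id"],
            json.dumps(entry, sort_keys=True),
        ),
    )
    conn.commit()


def audit_trail(conn, build_id=None, user_id=None, start_utc=None, end_utc=None):
    """Return entries matching the given filters, oldest first.

    The time range is half-open: start_utc <= timestamp_utc < end_utc.
    """
    clauses, params = [], []
    for column, op, value in (
        ("build_id", "=", build_id),
        ("user_id", "=", user_id),
        ("timestamp_utc", ">=", start_utc),
        ("timestamp_utc", "<", end_utc),
    ):
        if value is not None:
            clauses.append(f"{column} {op} ?")
            params.append(value)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    sql = f"SELECT entry_json FROM invocations{where} ORDER BY timestamp_utc"
    return [json.loads(row[0]) for row in conn.execute(sql, params)]
```

An HTTP endpoint for procurement integrations could wrap `audit_trail` directly, since the filter arguments already match the three retrieval keys named above.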

V1 observability layer (builds on MVP foundation):

A dedicated observability UI accessible to IT and procurement teams (not just engineers)
Dashboard views: agent invocations over time, model usage distribution, verification pass/fail rates, retry rates
NIST AI RMF function mapping visible in the UI (per ADR-0029)
Filterable by: user, team, archetype, model, time period
Exportable to CSV/PDF for audit submission
Role-based access: engineers see generation details; IT/procurement teams see governance summaries; platform admins see everything
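The export and role-based access items above can be combined: the role decides which columns an export contains. The sketch below is illustrative only. The role keys and the column lists per role are assumptions about what "generation details" and "governance summaries" might include, and PDF export is out of scope here.

```python
import csv
import io


def flatten(entry: dict, prefix: str = "") -> dict:
    """Flatten nested objects into dotted column names."""
    flat = {}
    for key, value in entry.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


# Hypothetical column sets per role.
_GOVERNANCE_COLUMNS = [
    "timestamp_utc",
    "agent_type",
    "agent_autonomy_level",
    "model_id",
    "model_version",
    "output_summary.verification_passed",
    "output_summary.retry_count",
]
ROLE_COLUMNS = {
    "it_procurement": _GOVERNANCE_COLUMNS,
    "engineer": _GOVERNANCE_COLUMNS + [
        "triggered_by.seam_id",
        "output_summary.files_produced",
        "decision_rationale",
    ],
}


def export_csv(entries: list, role: str) -> str:
    """Render entries as CSV, limited to the columns the role may see."""
    rows = [flatten(e) for e in entries]
    if role == "platform_admin":
        columns = sorted({key for row in rows for key in row})
    else:
        columns = ROLE_COLUMNS[role]  # unknown roles raise KeyError
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Filtering at export time, rather than in the UI, means a CSV handed to an auditor cannot contain fields the exporting role was never allowed to see.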

Certification Roadmap

The platform targets the following certifications in priority order:

SOC 2 Type II — first priority; required for most enterprise procurement conversations. Build audit logging and access controls from day one so the SOC 2 audit is evidence of normal operations, not a retrofit.
ISO/IEC 42001 — the AI management system standard. Required for regulated sector procurement. Augment Code already holds this certification; it is becoming a supplier criterion. Target: within 12 months of MVP launch.
NIST AI RMF alignment documentation — not a certification but a documented alignment between the platform’s governance posture and the RMF’s Govern/Map/Measure/Manage functions. Required for US federal and defense-adjacent procurement. Target: present at v1 launch.
EU AI Act compliance documentation — required for any customer with EU exposure. The platform’s AI agents (Stitcher, AI Reviewer, Curation agents) must be classified under the Act’s risk categories and governed accordingly. The autonomy classification in this ADR is the starting point. Target: before any EU customer goes live.
Industry-specific certifications (HIPAA, FedRAMP, SOC 2 for healthcare, etc.) — as customer segments demand. Not on the MVP or v1 roadmap; addressed when the first customer in that segment is in procurement.

The compliance posture must future-proof against stricter AI governance requirements. The EU AI Act was the opening movement; national and industry-specific frameworks will follow. Building governance infrastructure now is building a moat, not just a compliance checkbox.

Rationale

Governance from day one is the only viable posture for an enterprise-targeting AI platform in 2026. Retrofitting governance onto a system not designed for it produces weak, expensive governance. Building it in produces governance as a byproduct of normal operations — audit logs that exist because the system needs them, not because an auditor is coming.

Deterministic constraints over prompt instructions is the operational principle that makes Level 3 autonomy trustworthy. Telling an agent “follow our coding standards” in a prompt is fundamentally different from wiring a linter that blocks the PR when standards are violated. The first approach relies on probabilistic compliance; the second enforces deterministic constraints. The same applies to the Stitcher: Seam contracts, output path constraints, and Build Verification are the deterministic harness the Stitcher operates within.

Decision rationale logging makes the audit trail useful to governance teams rather than just engineers. An audit trail that records what the Stitcher produced but not why is insufficient for EU AI Act traceability requirements. The rationale field closes this gap.

Observability layer as a v1 deliverable reflects the reality that governance infrastructure is a feature, not an afterthought. Enterprise buyers will ask to see it in procurement conversations. IT and compliance teams need access to it. Building it into v1 rather than backlogging it is both correct product strategy and correct compliance strategy.

Future Considerations

Model version pinning. The ability to generate with a specific model version (not just record which version was used) makes it possible for a stored Spec to produce identical or equivalent output when regenerated. This requires the platform to maintain access to prior model versions, which is provider-dependent. Deferred pending model provider agreements.

User-selectable Stitcher model (see also FE-MIGE-005). Future capability allowing enterprise teams to designate which LLM serves as the Stitcher for their organization. The selected model is recorded in all invocation logs. Not planned for MVP or v1; noted here because the logging schema must accommodate a user_specified_model field without breaking changes.

Cross-agent governance. When the Curation Pipeline agents and the Stitcher operate in the same generation workflow (e.g., an agent proposes a new Adapter at the same time the Stitcher references it), the governance record must link the agent invocations so the full lineage is traceable. The current schema treats each invocation independently; a future schema links them via a workflow_id.
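To show what linking by workflow_id might look like, the sketch below groups log entries into per-workflow lineages. It is speculative: workflow_id does not exist in the current schema, so entries without it are kept separate, and the function name is hypothetical.

```python
from collections import defaultdict


def group_by_workflow(entries: list):
    """Group invocation IDs by workflow_id, oldest first within each group.

    Returns (linked, unlinked): a dict of workflow_id to invocation IDs, and
    a list of invocation IDs from entries that carry no workflow_id.
    """
    linked = defaultdict(list)
    unlinked = []
    for entry in sorted(entries, key=lambda e: e["timestamp_utc"]):
        workflow_id = entry.get("workflow_id")  # absent in today's schema
        if workflow_id is None:
            unlinked.append(entry["invocation_id"])
        else:
            linked[workflow_id].append(entry["invocation_id"])
    return dict(linked), unlinked
```

Treating the field as optional lets existing entries remain valid, which is the same non-breaking property the schema needs for other future fields.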

Agentic AI governance alignment. As NIST’s initiative to develop standards for autonomous AI agents (launched February 2026) matures, Almathal’s autonomy classification and constraint model will need to be re-evaluated against the final standard. The current classification is based on Singapore’s IMDA framework as the most specific available. This ADR should be reviewed when NIST publishes its autonomous agent guidance.

References

ADR-0012: Sanity Checks at Five Pipeline Stages — establishes Build Verification as a deterministic gate on Stitcher output.
ADR-0018: LLM-Drafted Human-Directed Manifest Authoring — establishes the Curation Pipeline agent autonomy model.
ADR-0027: Migration Transformation Mechanism — establishes that migration steps requiring the Stitcher are logged per this ADR’s requirements.
ADR-0029: Customer AI Governance Outputs — the customer-facing counterpart to this ADR.
Architecture Overview — the Stitcher and AI Reviewer are defined here.
Governance → Curation Pipeline — Curation Pipeline agent governance is detailed here.