ONE BINARY. ONE PROCESS TO AUDIT.
A Claude agent in production calls a tool that touches a customer database. In a stock deployment nothing inspects that call before it runs. systemprompt.io is the in-process control plane that does.
What is in the binary
A platform engineer rolling out Claude across teams ends up wiring six things together by hand: an OAuth server, an RBAC layer, an MCP host that supervises agent processes, a secret store that keeps API keys out of model prompts, an audit pipeline a SOC 2 auditor will accept, and a way to ship skills without asking each user to clone a repo. Six vendors, six failure modes, six upgrade cycles. systemprompt.io is one Rust binary compiled from the same Cargo workspace:
- Identity. An OAuth2 server with PKCE (S256) under crates/entry/api/src/routes/oauth, WebAuthn passkeys, and JWT claims defined in shared/models/src/auth/claims.rs.
- RBAC and rate limits. enforce_rbac_from_registry validates the JWT, checks OAuth2 scopes, and runs before every governed handler. Rate limit multipliers (Admin 10x, User 1x, Service 5x, A2a 5x, Mcp 5x, Anonymous 0.5x, burst 3x default) live in profile/rate_limits.rs.
- Agent lifecycle. The agent_orchestration module owns process supervision: orchestrator/, lifecycle/, reconciler.rs, process.rs, port_manager.rs, monitor.rs, event_bus.rs.
- Secrets. Per-profile secrets configured in profile/secrets.rs are injected server-side into MCP tool backends. They are not serialised into model context.
- Audit. Tool executions are persisted to mcp_tool_executions with input, output, status, user_id, session_id, and trace_id before the response returns to the caller.
- Skill distribution. Versioned skills are exported as Markdown with YAML frontmatter through the sync export pipeline.
One cargo build --release. One artifact. One PID to monitor. The same binary runs on a laptop, in a Kubernetes pod, and inside an air-gapped subnet. No sidecars, no outbound calls, no second service to upgrade in lockstep.
- enforce_rbac_from_registry — Validates the JWT, extracts user claims, checks OAuth2 scopes against the required permission, and returns 403 before the handler runs. Six permission tiers with hierarchy (Admin 100 implies all lower tiers).
- agent_orchestration — Supervises MCP server processes. The reconciler drives desired state to actual state, the port manager assigns isolated ports, the monitor health-checks each process, and the event bus emits lifecycle events.
- Server-side secrets — Profile secrets are merged into MCP tool calls inside the binary. The model sees the tool result, never the credential. Validation mode (strict, warn, skip) is per-profile in profile/secrets.rs.
Keep Okta. Keep Vault. Keep Splunk.
The security team has already bought identity, secrets, and SIEM. They will not approve a deployment that asks them to rip any of it out. systemprompt.io does not require them to. The Extension trait in crates/shared/extension/src/traits.rs exposes default-impl override points for AI providers (llm_providers), jobs (jobs), schemas (schemas), routes (router), and tool providers (tool_providers). Extensions compose additively at startup via inventory::submit!, so anything a user extension does not override stays populated by the core's own extension registrations.
Integrate. Identity is not an Extension trait override. It is OAuth2 federation configured at runtime: point the binary at Okta or Auth0 as the issuer and the external JWT flows into enforce_rbac_from_registry unchanged. Forward analytics out to Datadog or Splunk with a scheduled Job that reads the analytics schema and pushes, or with a HookEvent handler. Pull credentials from Vault the same way: a Job impl that writes into the server-side secret store. Identity, secrets, and telemetry stay in the systems the security team already audits. RBAC, rate limits, and the ten lifecycle hooks still run on every request inside the same process.
Replace. For air-gapped or greenfield deployments, the binary ships the built-ins: OAuth2 with PKCE (S256) validated in authorize/validation.rs, JWT generation in oauth/services/generation.rs, WebAuthn passkeys, profile-defined rate limits in rate_limits.rs, and per-request cost attribution in microdollars via analytics/repository/costs.rs. One artifact. One dependency: PostgreSQL.
- Identity: federated or built-in — Federate to Okta or Auth0 over OAuth2, or run the built-in server with PKCE (S256) and WebAuthn. Either way the JWT lands in enforce_rbac_from_registry before the handler runs.
- Secrets and telemetry: forward or self-host — Pull credentials from Vault or AWS Secrets Manager via a Job impl, or use server-side secret injection from profile/secrets.rs. Forward analytics out through a scheduled Job or HookEvent handler, or query costs.rs directly in Postgres.
- Governance is not optional — Whichever identity and telemetry path you pick, every governed handler still passes through enforce_rbac_from_registry and the rate limit multipliers in rate_limits.rs. There is no bypass path.
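The Vault pull described above can be sketched as a scheduled Job that writes into the server-side secret store. Everything in this sketch is an illustrative stand-in: the real Job trait, secret store, and their signatures live in the systemprompt.io crates and will differ. The point is the shape, not the API.

```rust
use std::collections::HashMap;

/// Stand-in for the server-side secret store the binary injects from.
/// The model only ever sees tool results, never these values.
#[derive(Default)]
struct SecretStore {
    secrets: HashMap<String, String>,
}

impl SecretStore {
    fn put(&mut self, key: &str, value: &str) {
        self.secrets.insert(key.to_string(), value.to_string());
    }
    fn get(&self, key: &str) -> Option<&String> {
        self.secrets.get(key)
    }
}

/// Hypothetical shape of a scheduled background job.
trait Job {
    fn name(&self) -> &'static str;
    fn run(&self, store: &mut SecretStore);
}

/// A job that pulls credentials from Vault into the server-side store.
struct VaultSecretSync;

impl Job for VaultSecretSync {
    fn name(&self) -> &'static str {
        "vault_secret_sync"
    }
    fn run(&self, store: &mut SecretStore) {
        // A real deployment would call the Vault HTTP API here;
        // the fetch is stubbed to keep the sketch self-contained.
        let (key, value) = ("CUSTOMER_DB_TOKEN", "s.example-token");
        store.put(key, value);
    }
}

fn main() {
    let mut store = SecretStore::default();
    let job = VaultSecretSync;
    job.run(&mut store);
    assert_eq!(store.get("CUSTOMER_DB_TOKEN").unwrap(), "s.example-token");
    println!("job {} synced 1 secret", job.name());
}
```

The same Job shape carries the Datadog or Splunk forwarder: read the analytics schema on a schedule, push, record the high-water mark.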
How it talks to the rest of the stack
An engineer reading the architecture wants to know exactly where the binary ends and the rest of the world begins. The boundary is concrete. AI model calls, identity providers, SIEM sinks, MCP-speaking agents, and operator browsers sit outside. Everything inside the binary terminates at one of four named surfaces.
AI providers. ProviderFactory::create in provider_factory.rs matches a string ("anthropic", "openai", "gemini") to a concrete AiProvider implementation, with optional custom endpoint and web search. Cost is attributed per request in microdollars in costs.rs.
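The string-to-provider dispatch reduces to a match. This is a minimal sketch of the pattern, not the real ProviderFactory::create signature, which also takes a custom endpoint and an optional database pool.

```rust
// Illustrative only: the real AiProvider is a trait object wired to
// HTTP clients; an enum stands in for it here.
#[derive(Debug, PartialEq)]
enum Provider {
    Anthropic,
    OpenAi,
    Gemini,
}

/// Map a profile-configured provider name to a concrete provider.
fn create(name: &str) -> Result<Provider, String> {
    match name {
        "anthropic" => Ok(Provider::Anthropic),
        "openai" => Ok(Provider::OpenAi),
        "gemini" => Ok(Provider::Gemini),
        other => Err(format!("unknown provider: {other}")),
    }
}

fn main() {
    assert_eq!(create("anthropic"), Ok(Provider::Anthropic));
    assert!(create("llama").is_err());
}
```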
Identity. The OAuth2 discovery document is served from discovery.rs at /.well-known/oauth-authorization-server. Federated tokens are validated by enforce_rbac_from_registry; nothing else gets to call the model.
Telemetry. Analytics events implement the ToSse trait in infra/events/src/sse.rs, which gives a stable JSON shape that Splunk, Datadog, or ELK ingest without a custom parser.
Agents. Claude Desktop, Claude Code, and any MCP client connect over MCP to the servers the binary supervises. Per-server health checks live under mcp/services/monitoring; A2A traffic terminates in agent/services/a2a_server. Every tool call, whichever client originated it, hits the same RBAC middleware.
- ProviderFactory::create — Matches a provider name string to an AiProvider implementation for Anthropic, OpenAI, or Gemini. Accepts a custom endpoint and an optional database pool. Cost is recorded uniformly through analytics/repository/costs.rs.
- OAuth2 discovery — /.well-known/oauth-authorization-server is served by discovery.rs. Federated JWTs flow into the same enforce_rbac_from_registry path the built-in OAuth server uses.
- ToSse + MCP monitoring — Analytics events serialise through the ToSse trait in infra/events/src/sse.rs. The mcp/services/monitoring module health-checks each MCP server. Every tool call is governed by mcp/middleware/rbac.rs.
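The "stable JSON shape" idea behind ToSse can be sketched as a trait that pins the serialised field layout. The trait name comes from the source; its actual methods in infra/events/src/sse.rs may differ, and serde is omitted here to keep the sketch dependency-free.

```rust
/// Hypothetical shape of the SSE serialisation contract.
trait ToSse {
    fn event_name(&self) -> &'static str;
    fn to_sse_json(&self) -> String;
}

struct ToolExecuted {
    tool: String,
    status: String,
}

impl ToSse for ToolExecuted {
    fn event_name(&self) -> &'static str {
        "tool_executed"
    }
    fn to_sse_json(&self) -> String {
        // A fixed field order is what lets Splunk, Datadog, or ELK
        // extract fields without a custom parser.
        format!(
            "{{\"event\":\"{}\",\"tool\":\"{}\",\"status\":\"{}\"}}",
            self.event_name(),
            self.tool,
            self.status
        )
    }
}

fn main() {
    let ev = ToolExecuted { tool: "db_write".into(), status: "ok".into() };
    assert!(ev.to_sse_json().contains("\"tool\":\"db_write\""));
}
```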
Where the policy actually runs
An agent is about to call a write tool against a customer database. The question a security lead cares about is not "is there governance" but "what code, in what process, on what line, decides whether that call runs". The answer is enforce_rbac_from_registry in mcp/middleware/rbac.rs. It validates the JWT, extracts the user claims, runs validate_scopes_for_permissions, and returns an error if the scopes do not satisfy the required permission. Nothing reaches the tool runtime around it.
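The tier hierarchy behind that decision is simple to state in code. Admin at 100 and Anonymous at 10 are the values given in this document; the intermediate tier values below are illustrative placeholders, not the real constants in permission.rs.

```rust
/// Permission tiers. Admin (100) and Anonymous (10) match the text;
/// the values for Mcp, A2a, Service, and User are placeholders.
#[derive(Clone, Copy)]
enum Tier {
    Anonymous = 10,
    Mcp = 30,
    A2a = 40,
    Service = 50,
    User = 60,
    Admin = 100,
}

/// A caller's tier satisfies a required tier when it is equal or
/// higher: Admin (100) implies every lower tier.
fn satisfies(caller: Tier, required: Tier) -> bool {
    caller as u32 >= required as u32
}

fn main() {
    assert!(satisfies(Tier::Admin, Tier::User));
    assert!(!satisfies(Tier::Anonymous, Tier::Service));
}
```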
Ahead of and around that check sit two other named pieces. Rate limits come from profile/rate_limits.rs, where the tier multipliers are real numbers in source: Admin 10x, User 1x, Service 5x, A2a 5x, Mcp 5x, Anonymous 0.5x, with a default burst multiplier of 3. The lifecycle hook surface is the HookEvent enum in shared/models/src/services/hooks.rs, with exactly ten variants: PreToolUse, PostToolUse, PostToolUseFailure, SessionStart, SessionEnd, UserPromptSubmit, Notification, Stop, SubagentStart, SubagentStop.
Policy values live in YAML profiles loaded by the binary at startup, so a security team changes a rate limit or a permission tier without touching Rust. The enforcement code lives in the binary, in-process, on the same call path as the tool. There is no sidecar to misconfigure into a no-op.
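A profile fragment for the multipliers above might look like this. The multiplier values are the ones quoted in this document; the key names are assumptions about the YAML schema, not the actual schema.

```yaml
# Illustrative profile fragment; key names are assumed.
rate_limits:
  burst_multiplier: 3
  tiers:
    admin: 10.0
    user: 1.0
    service: 5.0
    a2a: 5.0
    mcp: 5.0
    anonymous: 0.5
```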
- enforce_rbac_from_registry — Defined in mcp/middleware/rbac.rs. Validates the JWT, extracts claims, checks audience, runs validate_scopes_for_permissions. A request without a matching scope returns an error before any handler code runs.
- rate_limits.rs tier multipliers — Per-endpoint base rates multiplied by tier: Admin 10x, User 1x, Service 5x, A2a 5x, Mcp 5x, Anonymous 0.5x. Burst multiplier default is 3. All values are profile YAML, not hardcoded constants.
- HookEvent: 10 variants — PreToolUse, PostToolUse, PostToolUseFailure, SessionStart, SessionEnd, UserPromptSubmit, Notification, Stop, SubagentStart, SubagentStop. Defined in shared/models/src/services/hooks.rs.
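The ten variants translate directly into an enum. This sketch lists exactly the variants named above; the real HookEvent in shared/models/src/services/hooks.rs may carry payload data on each variant.

```rust
/// The ten lifecycle hook points, as enumerated in the text.
#[derive(Debug, Clone, Copy, PartialEq)]
enum HookEvent {
    PreToolUse,
    PostToolUse,
    PostToolUseFailure,
    SessionStart,
    SessionEnd,
    UserPromptSubmit,
    Notification,
    Stop,
    SubagentStart,
    SubagentStop,
}

impl HookEvent {
    const ALL: [HookEvent; 10] = [
        HookEvent::PreToolUse,
        HookEvent::PostToolUse,
        HookEvent::PostToolUseFailure,
        HookEvent::SessionStart,
        HookEvent::SessionEnd,
        HookEvent::UserPromptSubmit,
        HookEvent::Notification,
        HookEvent::Stop,
        HookEvent::SubagentStart,
        HookEvent::SubagentStop,
    ];
}

fn main() {
    // PreToolUse is the natural point to veto a write tool call
    // before it reaches the customer database.
    assert_eq!(HookEvent::ALL.len(), 10);
    assert!(HookEvent::ALL.contains(&HookEvent::PreToolUse));
}
```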
The narrow waist
The architect drawing the box diagram needs one place where every AI request from every agent crosses a controlled line. That line is the binary. The Extension trait in shared/extension/src/traits.rs is the boundary type: a single Rust trait that an extension implements to contribute routes, jobs, schemas, migrations, AI providers, tool providers, page prerenderers, roles, required assets, and config sections. What an extension does not override falls through to the in-binary defaults.
Two deployment shapes use the same trait. Run the binary with every default on, and one artifact provides identity, RBAC, MCP supervision, secrets, audit, and skill distribution against one Postgres. Override llm_providers and router, add a custom analytics job, and the same artifact becomes a thin policy layer that hands identity to Okta and telemetry to Datadog while still running RBAC and the lifecycle hooks on every call.
The codebase that backs this is exercised by integration tests under crates/tests/integration, load tests under crates/tests/loadtest, fuzz tests under crates/tests/fuzz, and benchmarks under crates/tests/bench. The Rust runtime has no GC pauses to schedule around. Performance numbers belong on a benchmark page, not here.
- Internal or federated, same trait — Use built-in OAuth2 and built-in analytics, or override llm_providers, router, and analytics jobs to forward to Okta and Datadog. Both deployments share the same Extension trait surface.
- Tested under load and fuzz — Integration, load, fuzz, and benchmark suites live under crates/tests. The runtime is Rust, with no garbage collector pauses on the request path.
- Full stack or thin policy layer — Greenfield deployments use the built-ins. Existing stacks override only the surfaces they want to keep. The Extension trait is the contract for both.
Extend at compile time
The team adopting this does not want a SaaS dashboard with limited webhooks. They want to add a route, a job, a schema, and a custom AI provider, and ship it as part of the same binary their security team already approved. systemprompt.io is a Cargo dependency. The integrator adds it to Cargo.toml, implements the Extension trait, runs cargo build --release, and ships one artifact. Proprietary logic compiles into the binary, not into a third-party service.
The Extension trait in shared/extension/src/traits.rs exposes methods across every domain: router() for HTTP routes, schemas() and migrations() for database tables, jobs() for background tasks, llm_providers() for AI model routing, tool_providers() for MCP servers, page_prerenderers() for static page generation, config_prefix() for YAML namespaces, roles() for RBAC, required_assets() for static files. Each method has a has_*() predicate and a default. The registry that loads them lives in typed_registry.rs.
Version pinning is the integrator's call. The artifact compiled today runs the same way next year unless someone bumps the dependency and recompiles. There is no forced upgrade path on a production deployment.
- Cargo dependency, not SaaS — systemprompt.io is added as a crate dependency. cargo build --release produces a binary that contains the core plus the integrator's extensions. No upstream service runs the workload.
- Extension trait surface — router, schemas, migrations, jobs, llm_providers, tool_providers, page_prerenderers, config_prefix, roles, required_assets. Each with a has_* predicate and a default. Defined in shared/extension/src/traits.rs.
- Reference extensions — The systemprompt-template repository ships extension crates that exercise the trait against real HTTP routes, jobs, and schemas. Read them as the contract.
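The default-with-override pattern looks roughly like this. The Route type, method bodies, and has_* shape are illustrative; the real Extension trait in shared/extension/src/traits.rs covers far more surfaces and different signatures, so read the systemprompt-template crates as the actual contract.

```rust
/// Illustrative stand-in for a registered HTTP route.
struct Route {
    path: &'static str,
}

/// Sketch of the extension pattern: default impls mean an extension
/// overrides only what it needs; everything else falls through to
/// the core registrations.
trait Extension {
    fn name(&self) -> &'static str;
    fn router(&self) -> Vec<Route> {
        Vec::new()
    }
    fn has_router(&self) -> bool {
        !self.router().is_empty()
    }
}

/// A hypothetical integrator extension contributing one route.
struct BillingExtension;

impl Extension for BillingExtension {
    fn name(&self) -> &'static str {
        "billing"
    }
    fn router(&self) -> Vec<Route> {
        vec![Route { path: "/billing/invoices" }]
    }
}

fn main() {
    let ext = BillingExtension;
    assert!(ext.has_router());
    assert_eq!(ext.router()[0].path, "/billing/invoices");
}
```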
The data and the binary stay on your side
A compliance officer asks where every prompt, every tool call, and every model response physically resides. The honest answer in most AI deployments is "I do not know, because the vendor's processor list is twelve names long." With systemprompt.io the answer is "in the Postgres instance the binary connects to". The binary runs on the operator's hardware. No outbound telemetry path is compiled in.
Audit lineage is a row in mcp_tool_executions: name, server, input, output, status, user_id, session_id, trace_id. AI request cost, tokens, model, and provider land in the schema declared by ai/schema/ai_requests.sql. The trace modules in infra/logging/src/trace stitch identity, permission decision, tool call, and AI request into one trace_id. Permission tiers are defined in permission.rs with a hierarchy (Admin 100 implies all lower tiers, Anonymous 10 is the floor).
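A lineage query against that schema might look like the following. The columns listed in the text (user_id, session_id, trace_id, status) are taken as given; the join key and the cost column name are assumptions about the actual DDL.

```sql
-- Illustrative: reconstruct one request's lineage by trace_id.
SELECT t.name, t.status, t.user_id, t.session_id,
       a.model, a.provider, a.cost_microdollars
FROM mcp_tool_executions t
JOIN ai_requests a ON a.trace_id = t.trace_id
WHERE t.trace_id = $1;
```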
Skills, configuration, and profile data export through the sync export pipeline in app/sync/src/export as Markdown with YAML frontmatter and YAML config files. Anonymous user data is purged on a schedule by cleanup_anonymous_users.rs. Session retention is governed by the session repository under mcp/repository/session. The binary, the extension crates, and the database schema are the entire artifact. There is no vendor in the loop after handover.
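An exported skill file, under the stated format of Markdown with YAML frontmatter, would look something like this. The frontmatter keys shown are hypothetical beyond name and version; only the container format comes from the source.

```markdown
---
name: db-runbook
version: 1.2.0
---

# DB runbook

Steps the agent follows when a customer database write is requested.
```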
- No outbound telemetry — There is no compiled-in callback to a systemprompt.io endpoint. The binary connects to the operator's Postgres and to the AI providers configured in the profile. Nothing else.
- trace_id lineage in Postgres — mcp_tool_executions stores tool call I/O with user_id, session_id, and trace_id. ai_requests.sql stores cost in microdollars, tokens, model, and provider. The trace modules stitch them together.
- Open-format export and retention — Skills export through app/sync/src/export as Markdown with YAML frontmatter. cleanup_anonymous_users.rs purges anonymous data on schedule. Session retention lives in mcp/repository/session/mod.rs.
Founder-led. Self-service first.
No sales team. No demo theatre. The template is free to evaluate — if it solves your problem, we talk.
Who we are
One founder, one binary, full IP ownership. Every line of Rust, every governance rule, every MCP integration — written in-house. Two years of building AI governance infrastructure from first principles. No venture capital dictating roadmap. No advisory board approving features.
How to engage
Evaluate
Clone the template from GitHub. Run it locally with Docker or compile from source. Full governance pipeline.
Talk
Once you have seen the governance pipeline running, book a meeting to discuss your specific requirements — technical implementation, enterprise licensing, or custom integrations.
Deploy
The binary and extension code run on your infrastructure. Perpetual licence, source-available under BSL-1.1, with support and update agreements tailored to your compliance requirements.
Compile it. Run it. Read the audit table.
Clone the template, run cargo build --release, and put a real RBAC check in front of a real Claude agent. The artifact is yours, the database is yours, the trace_ids land in your Postgres.