Agents that learn from their own metrics
An agent cannot improve if it cannot see its own performance. systemprompt.io writes every tool call to PostgreSQL and exposes the same data back to the agent as MCP tools, so it can query its own error rate, cost, and latency and adjust without a human in the loop.
MCP Calls → PostgreSQL → CLI Queries
An agent that does not log what it just did cannot tell you, or itself, what happened later. systemprompt.io writes every MCP tool call, every AI request, and every engagement event as a structured row in PostgreSQL before the response returns to the caller. The LogEntry struct carries eight correlation fields on every record (id, user_id, session_id, task_id, trace_id, context_id, client_id, timestamp) so a row in any table joins back to the originating request. MCP executions record tool name, server name, input payload, output payload, status, execution time in milliseconds, and error message. AI requests record provider, model, input tokens, output tokens, cost in microdollars, latency, and status.
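The eight correlation fields can be pictured as a plain struct. This is an illustrative sketch only, not the actual LogEntry definition, which carries richer timestamp and metadata types:

```rust
/// Sketch of the eight correlation fields described above; field names
/// follow the article, but the types here are simplified assumptions.
#[derive(Debug, Clone, Default)]
pub struct LogEntry {
    pub id: String,
    pub user_id: Option<String>,
    pub session_id: Option<String>,
    pub task_id: Option<String>,
    pub trace_id: Option<String>,
    pub context_id: Option<String>,
    pub client_id: Option<String>,
    pub timestamp: u64, // epoch millis here; the real type is likely a DateTime
}

/// A row joins back to its originating request through trace_id;
/// two rows with the same trace_id belong to the same request.
pub fn same_request(a: &LogEntry, b: &LogEntry) -> bool {
    a.trace_id.is_some() && a.trace_id == b.trace_id
}
```

Because every table carries the same eight fields, the same join logic works whether the rows are MCP executions, AI requests, or engagement events.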
The EngagementEvent row holds 17 behavioural metric fields per page interaction (time_on_page_ms, time_to_first_interaction_ms, time_to_first_scroll_ms, max_scroll_depth, scroll_velocity_avg, scroll_direction_changes, click_count, mouse_move_distance_px, keyboard_events, copy_events, focus_time_ms, blur_count, tab_switches, visible_time_ms, hidden_time_ms, is_rage_click, is_dead_click) alongside identity columns and a reading_pattern classification. The AnalyticsEventType enum defines seven event categories (PageView, PageExit, LinkClick, Scroll, Engagement, Conversion, Custom), each assigned a navigation, interaction, engagement, or conversion bucket by the category() method.
The data lives in your PostgreSQL instance, in tables you can read with SQL, export to a warehouse, or stream to a SIEM. CreateAnalyticsEventBatchInput handles batch ingestion for high-throughput writers. Session records carry 24 fields covering device type, browser, OS, country, region, city, UTM parameters, referrer source, landing page, and bot detection flags. Your data, your database, your compliance boundary.
- MCP tool logging — Every mcp_tool_execution row records tool_name, server_name, input, output, status, execution_time_ms, error_message, trace_id, task_id, and context_id. Queryable via systemprompt infra logs trace, filterable by tool, server, status, and time range.
- A2A invocation tracking — Agent-to-agent calls fan out as ContextStateEvent variants on a shared context, joining tool executions, task status changes, and artifacts to the agent that produced them. Multi-agent workflows trace end to end through trace_id and context_id.
- Engagement events — EngagementEvent stores 17 behavioural metric fields per interaction (scroll depth, velocity, direction changes, rage clicks, dead clicks, copy events, tab switches, visible vs hidden time) plus a reading_pattern classification. Seven AnalyticsEventType variants bucket into four categories via category().
- events.rs AnalyticsEventType enum and EngagementEventData struct with 17 fields
- engagement.rs EngagementEvent struct with 17 behavioural metric fields plus identity and reading_pattern columns
- log_entry.rs LogEntry struct with 8 identity fields per event
- context.rs ContextStateEvent enum with 9 variants
- tool_queries.rs Tool execution queries with time, name, server, and status filters
- types.rs CreateSessionParams with 24 tracked fields per session
Self-Aware Agents
An agent that cannot read its own metrics cannot adapt. The entry point is AgentAnalyticsRepository::get_stats, which takes a start, an end, and an optional agent name filter and returns total agents, total tasks, completed tasks, failed tasks, and average execution time in milliseconds for that window. get_ai_stats returns total AI requests and total cost in microdollars over the same window. get_tasks_for_trends returns the per-task started_at, status, and execution_time_ms rows so a caller can compute its own trends. The agent name filter uses ILIKE pattern matching, so an agent can scope a query to itself or to a peer.
Cost analytics live next door in CostAnalyticsRepository: get_summary returns total requests, total cost, and total tokens; get_breakdown_by_model, get_breakdown_by_provider, and get_breakdown_by_agent return the same numbers split by dimension; get_costs_for_trends returns time-series points. Costs are stored in microdollars (millionths of a dollar) so aggregation does not accumulate floating-point error. The AiRequestStats struct rolls up total_requests, input tokens, output tokens, total cost, and average latency, with per-provider and per-model breakdowns in ProviderStatsRow and ModelStatsRow.
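The microdollar convention is easy to sketch: sum integer microdollars exactly, and convert to dollars only at display time. The function names below are illustrative, not the CostAnalyticsRepository API:

```rust
/// Summing integer microdollars (millionths of a dollar) is exact,
/// so aggregation over many rows accumulates no floating-point error.
pub fn total_cost_microdollars(rows: &[i64]) -> i64 {
    rows.iter().sum()
}

/// Conversion to dollars happens only when a human needs to read it.
pub fn format_dollars(microdollars: i64) -> String {
    format!("${}.{:06}", microdollars / 1_000_000, microdollars % 1_000_000)
}
```

The same idea underlies every breakdown method: the split by model, provider, or agent changes only the GROUP BY dimension, never the arithmetic.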
These methods are not only CLI surfaces. They are exposed as MCP tools an agent can call mid-task. An agent watching its own error rate climb above a threshold can switch tools. An agent watching cost per request climb can switch to a cheaper model. The loop is execute, measure, adapt, and the measure step is a SQL query the agent runs against its own history.
- Analytics as MCP tools — AgentAnalyticsRepository exposes get_stats(), get_ai_stats(), and get_tasks_for_trends(). CostAnalyticsRepository exposes get_summary() plus four breakdown methods. The same methods back the systemprompt analytics agents stats CLI command and the MCP tools an agent invokes at runtime.
- Performance introspection — AiRequestStats aggregates total_requests, total_input_tokens, total_output_tokens, total_cost_microdollars, and avg_latency_ms with per-provider and per-model breakdowns. ILIKE pattern matching scopes a query to one agent name or a family of names.
- Adaptive behaviour — After a measurement, an agent can switch its instructions via SkillInjector, reload prior turns via ContextService::load_conversation_history, or hand off to another agent over A2A. The adjustment runs inside the same task that read the metric.
- stats_queries.rs AgentAnalyticsRepository::get_stats (L8-56) returns AgentStatsRow, get_ai_stats (L58-78) returns AgentAiStatsRow, get_tasks_for_trends (L80-127) returns Vec<AgentTaskRow>
- costs.rs CostAnalyticsRepository with 5 breakdown methods in microdollars
- models.rs AiRequestStats, ProviderStatsRow, ModelStatsRow structs
- analytics/ CLI analytics commands: agents, costs, conversations, tools, traffic
- detail_queries.rs Agent detail queries for per-agent analysis
Beyond Pageviews
Counting pageviews tells you nothing about who is actually reading. BehavioralBotDetector::analyze runs nine checks against a session and adds a fixed point value to a running score for each check that fires. The checks and their weights are defined in the scoring module: check_high_request_count adds 30 when requests exceed 50; check_ghost_session adds 35 when a session has no landing page, no entry URL, and zero requests after 30 seconds; check_high_page_coverage adds 25 when a session touches more than 60% of site pages; check_outdated_browser adds 25 when the user agent reports Chrome or Firefox below version 120; check_sequential_navigation and check_multiple_fingerprint_sessions add 20 each; check_no_javascript_events adds 20 when a session makes three or more requests with no JavaScript analytics events; and check_regular_timing and check_high_pages_per_minute add 15 each. When the running score reaches BEHAVIORAL_BOT_THRESHOLD (30), the result flips to suspicious and the joined signals become the behavioral_bot_reason written by update_behavioral_detection.
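The additive scheme can be sketched with a few of the nine checks. The weights and threshold match the scoring module described above, but the SessionSummary shape and check bodies are simplified illustrations, not the detector's real inputs:

```rust
pub const BEHAVIORAL_BOT_THRESHOLD: u32 = 30;

/// Simplified session shape for illustration; the real detector
/// reads full session rows from PostgreSQL.
pub struct SessionSummary {
    pub request_count: u32,
    pub has_landing_page: bool,
    pub age_seconds: u64,
    pub page_coverage_pct: f32,
}

/// Each check that fires adds its fixed weight; the joined reason
/// strings become the behavioral_bot_reason audit trail.
pub fn analyze(s: &SessionSummary) -> (u32, Vec<&'static str>) {
    let mut score = 0;
    let mut reasons = Vec::new();
    if s.request_count > 50 {
        score += 30;
        reasons.push("high_request_count");
    }
    if !s.has_landing_page && s.request_count == 0 && s.age_seconds > 30 {
        score += 35;
        reasons.push("ghost_session");
    }
    if s.page_coverage_pct > 60.0 {
        score += 25;
        reasons.push("high_page_coverage");
    }
    (score, reasons)
}

pub fn is_suspicious(score: u32) -> bool {
    score >= BEHAVIORAL_BOT_THRESHOLD
}
```

Note how the weights interact with the threshold: one heavy check flags a session on its own, while two light checks must fire together.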
AnomalyDetectionService maintains configurable thresholds for three default metrics: requests_per_minute (warning at 15, critical at 30), session_count_per_fingerprint (warning at 5, critical at 10), and error_rate (warning at 0.1, critical at 0.25). check_anomaly returns an AnomalyCheckResult with three severity levels (Normal, Warning, Critical). check_trend_anomaly compares the latest value against the rolling average within a configurable window and fires Critical at 3x above average, Warning at 2x. Thresholds update at runtime via update_threshold.
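The trend rule is small enough to sketch directly. The real service reads its window and thresholds from configuration, so treat this as an illustration of the 2x / 3x comparison only:

```rust
#[derive(Debug, PartialEq)]
pub enum Severity {
    Normal,
    Warning,
    Critical,
}

/// Compare the latest value against the rolling average of a window:
/// Critical at 3x the average, Warning at 2x, as described above.
pub fn check_trend_anomaly(history: &[f64], latest: f64) -> Severity {
    if history.is_empty() {
        return Severity::Normal; // no baseline yet, nothing to compare
    }
    let avg = history.iter().sum::<f64>() / history.len() as f64;
    if avg > 0.0 && latest >= 3.0 * avg {
        Severity::Critical
    } else if avg > 0.0 && latest >= 2.0 * avg {
        Severity::Warning
    } else {
        Severity::Normal
    }
}
```

A relative rule like this catches a metric that triples against its own baseline even when the absolute value would never trip a fixed threshold.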
Reading pattern classification sorts users into five buckets (bounce, skimmer, scanner, reader, engaged), stored as the reading_pattern field on each EngagementEvent. Session records track request count, task count, AI request count, message count, and bot classification flags (is_bot, is_scanner, is_behavioral_bot) with behavioral_bot_reason for audit. get_session_velocity returns requests per second for throttling decisions on the live session.
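The five reading_pattern buckets can be sketched as a classifier over two of the engagement metrics. The thresholds below are invented for illustration; they are not the values the product uses:

```rust
/// Illustrative classifier over time_on_page_ms and max_scroll_depth
/// (0.0 to 1.0). Bucket names match the article; the cut-offs are
/// hypothetical placeholders.
pub fn classify_reading_pattern(time_on_page_ms: u64, max_scroll_depth: f32) -> &'static str {
    match (time_on_page_ms, max_scroll_depth) {
        (t, _) if t < 5_000 => "bounce",                 // left almost immediately
        (t, d) if t < 30_000 && d < 0.5 => "skimmer",    // brief, shallow scroll
        (t, _) if t < 30_000 => "scanner",               // brief, but scrolled deep
        (t, _) if t < 120_000 => "reader",               // sustained attention
        _ => "engaged",                                  // long, deep session
    }
}
```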
- Behavioural bot scoring — BehavioralBotDetector::analyze runs nine checks against a session and accumulates a score. A single check at or above the BEHAVIORAL_BOT_THRESHOLD of 30 flips the session to suspicious, so a high request count (30 points) or a ghost session (35 points) flags on its own, while regular timing (15) plus high pages per minute (15) accumulate to the threshold together.
- Interaction signals — Rage clicks, dead clicks, scroll velocity, direction changes, copy events, tab switches, mouse distance. AnomalyDetectionService ships three default metrics with configurable warning and critical thresholds, plus check_trend_anomaly that fires Critical at 3x and Warning at 2x the rolling average.
- Session intelligence — 24 fields per session including device type, browser, OS, country, UTM parameters, referrer, and landing page. Bot classification flags (is_bot, is_scanner, is_behavioral_bot) carry a behavioral_bot_reason string for audit.
- behavioral_detector/mod.rs BehavioralBotDetector with 9 checks, scoring constants, and thresholds
- behavioral_detector/checks.rs 9 check methods, including high_request_count, page_coverage, sequential_navigation, and timing variance
- behavioral_detector/types.rs SignalType enum with 9 variants, BehavioralAnalysisResult struct
- anomaly_detection.rs AnomalyDetectionService with 3 default metrics, check_anomaly(), check_trend_anomaly()
- behavioral.rs mark_as_behavioral_bot(), update_behavioral_detection() repository functions
- queries.rs Session queries: find_by_id, velocity, behavioral analysis, endpoint sequences
- events.rs EngagementEventData with reading_pattern field
The Loop Closes
The loop only closes if a measurement can change behaviour without a redeploy. ContextService::load_conversation_history reconstructs the full message history plus serialised artifacts for a context from PostgreSQL, so an agent picking up an in-flight task starts from the same state the previous turn ended on. The ContextStateEvent enum defines exactly nine lifecycle variants that fire on the shared context as work progresses: ToolExecutionCompleted, TaskStatusChanged, ArtifactCreated, SkillLoaded, ContextCreated, ContextUpdated, ContextDeleted, Heartbeat, and CurrentAgent. Each variant carries a context_id and timestamp, so one agent's output becomes another agent's input on the same logical thread.
SkillInjector::inject_for_tool swaps an agent's instructions at runtime. The method takes an optional skill_id, the base prompt, and a RequestContext; if a skill_id is present it loads the skill body via SkillService::load_skill and appends it under a "Writing Guidance" header before returning the enhanced prompt. A failed load logs a warning and returns the base prompt unchanged, so a missing skill never breaks a task. inject_with_metadata returns the same enhanced prompt plus the skill's metadata (name, description, enabled status, assigned agents, tags) for the caller to record. SkillIngestionService scans skill directories on startup, parses the YAML frontmatter, strips it from the Markdown body, and upserts the result to PostgreSQL, so editing a skill on disk and rerunning ingestion is the whole change flow.
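The fallback contract is the important part: a loaded skill is appended under the Writing Guidance header, and any failure degrades to the base prompt. A minimal sketch, with the exact header formatting, error type, and signature assumed rather than taken from the real API:

```rust
/// Sketch of the inject_for_tool fallback behaviour. The real method
/// takes a RequestContext and loads the body via SkillService; here
/// the load result is passed in directly for illustration.
pub fn inject_for_tool(
    base_prompt: &str,
    skill_body: Option<Result<String, String>>, // None = no skill_id supplied
) -> String {
    match skill_body {
        // Skill loaded: append it under the guidance header.
        Some(Ok(body)) => format!("{base_prompt}\n\n## Writing Guidance\n\n{body}"),
        // Failed load (a warning would be logged) or no skill_id:
        // the task proceeds with the base prompt unchanged.
        Some(Err(_)) | None => base_prompt.to_string(),
    }
}
```

The asymmetry is deliberate: a skill can only add to the prompt, never break the task that asked for it.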
ExecutionTrackingService exposes twelve methods covering the agent lifecycle (track_understanding, track_planning, track_planning_async, track_skill_usage, track_tool_execution, track_completion, complete, complete_planning, fail, fail_step, fail_in_progress_steps, get_steps_by_task). Five step content types (understanding, planning, skill_usage, tool_execution, completion) give a row-level view of what an agent is doing at any moment. Configuration changes through the CLI take effect on the next task without restart.
- Shared contexts — ContextService::load_conversation_history reconstructs full message history and serialised artifacts from PostgreSQL. Nine ContextStateEvent variants (ToolExecutionCompleted, TaskStatusChanged, ArtifactCreated, SkillLoaded, ContextCreated, ContextUpdated, ContextDeleted, Heartbeat, CurrentAgent) propagate state changes across agents on the same context.
- Dynamic skills — SkillInjector::inject_for_tool takes an optional skill_id and a base prompt, loads the skill body via SkillService::load_skill, and returns the prompt with the skill appended under a Writing Guidance header. inject_with_metadata returns the same prompt plus name, description, tags, and assignment data. SkillIngestionService upserts skills from disk to PostgreSQL on startup.
- Execution tracking — ExecutionTrackingService exposes twelve methods tracking five step types: understanding, planning, skill_usage, tool_execution, and completion. Async tracking variants exist for planning steps that run off the request path.
- context.rs (models) ContextStateEvent enum with 9 lifecycle event variants
- context.rs (service) ContextService with load_conversation_history()
- skill_injector.rs SkillInjector with inject_for_tool() and inject_with_metadata()
- ingestion.rs SkillIngestionService with directory scanning and YAML parsing
- execution_tracking.rs ExecutionTrackingService with 12 methods and 5 step content types
- skill.rs Skill model with id, name, description, instructions, tags, category
Debug Any Request
An auditor at 3 a.m. needs to answer one question: what did the agent actually do, and why. TraceQueryService exposes more than 25 methods for answering it. get_all_trace_data runs eight queries in parallel via tokio::try_join! and returns log events, AI request events, MCP execution events, execution step events, the three matching summaries, and the associated task ID, all keyed off a single trace_id. find_ai_request_for_audit cascades from request_id to task_id to trace_id with partial prefix matching, so a four-character prefix lands on the right row.
Audit depth comes from three more methods. list_audit_messages reconstructs the AI conversation from ai_request_messages: role, content, and sequence number for every message in order. list_audit_tool_calls returns the tool name and input payload for every call in execution order. list_linked_mcp_calls joins mcp_tool_executions to ai_request_tool_calls so each call carries the MCP server that handled it, its status, and its execution time. AuditLookupResult ties provider, model, input tokens, output tokens, cost in microdollars, latency, task_id, and trace_id into a single row a SOC 2 auditor can read directly.
Log search and filtering round out the surface. search_logs supports pattern matching with time and level filters. count_logs_by_level and top_modules return summary views. list_traces takes a TraceListFilter with six parameters: limit, since, agent, status, tool, and has_mcp. The same tables back both the agent's own queries and the human auditor's, so the lineage from user click to agent response to tool execution to cost attribution lives in one place.
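The prefix cascade is easy to sketch in isolation. The real lookup runs as SQL over the audit tables; the in-memory slices here are purely illustrative:

```rust
/// Sketch of the cascading prefix lookup: try request IDs first,
/// then task IDs, then trace IDs, matching on prefix so a short
/// fragment is enough to land on the right row.
pub fn find_by_prefix<'a>(
    prefix: &str,
    request_ids: &'a [&'a str],
    task_ids: &'a [&'a str],
    trace_ids: &'a [&'a str],
) -> Option<(&'static str, &'a str)> {
    for (kind, ids) in [
        ("request", request_ids),
        ("task", task_ids),
        ("trace", trace_ids),
    ] {
        if let Some(id) = ids.iter().copied().find(|id| id.starts_with(prefix)) {
            return Some((kind, id));
        }
    }
    None
}
```

The fixed cascade order means a prefix that is ambiguous across tables still resolves deterministically: a request match always wins over a task match, and a task match over a trace match.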
- End-to-end tracing — TraceQueryService::get_all_trace_data runs eight concurrent queries via tokio::try_join! to reconstruct a request lifecycle from one trace_id. find_ai_request_for_audit cascades from request_id to task_id to trace_id with partial prefix matching, so a four-character prefix is enough to land on a request.
- Compliance ready — AuditLookupResult ties provider, model, input tokens, output tokens, cost in microdollars, latency, task_id, and trace_id into one row. list_audit_messages reconstructs the full AI conversation from ai_request_messages. list_linked_mcp_calls joins mcp_tool_executions to ai_request_tool_calls so each tool call carries its server, status, and execution time.
- Debug any issue — TraceListFilter takes six parameters (limit, since, agent, status, tool, has_mcp). search_logs supports pattern matching with time and level filters. count_logs_by_level and top_modules return summary views from the same tables the agents read.
- service.rs TraceQueryService with 25+ methods including get_all_trace_data()
- audit_queries.rs Cascading audit lookup, conversation reconstruction, linked MCP calls
- mcp_trace_queries.rs MCP execution queries with linked AI requests and task artifacts
- tool_queries.rs Tool execution listing with time, name, server, and status filters
- models.rs TraceListFilter, AuditLookupResult, TraceEvent, and 20+ model structs
- log_entry.rs LogEntry with 8 correlation fields and structured metadata
Founder-led. Self-service first.
No sales team. No demo theatre. The template is free to evaluate — if it solves your problem, we talk.
Who we are
One founder, one binary, full IP ownership. Every line of Rust, every governance rule, every MCP integration — written in-house. Two years of building AI governance infrastructure from first principles. No venture capital dictating roadmap. No advisory board approving features.
How to engage
Evaluate
Clone the template from GitHub. Run it locally with Docker or compile from source. Full governance pipeline.
Talk
Once you have seen the governance pipeline running, book a meeting to discuss your specific requirements — technical implementation, enterprise licensing, or custom integrations.
Deploy
The binary and extension code run on your infrastructure. Perpetual licence, source-available under BSL-1.1, with support and update agreements tailored to your compliance requirements.