Prelude
The systemprompt.io Control Center tracks six performance metrics that quantify how you work with Claude Code. These are not vanity numbers. Each metric is derived from deterministic event data captured during every Claude Code session, giving you an accurate picture of your interaction patterns, efficiency, and growth over time.
If you have ever looked at the metrics cards in the Control Center and wondered what exactly they measure, how they are calculated, or what a good score looks like, this guide covers all of it.
The Problem
Claude Code sessions generate enormous amounts of data. Every file read, every edit, every bash command, every prompt you submit, every error that occurs. All of it is captured as structured events. Without metrics, that data is just noise. You cannot tell whether today was more productive than yesterday, whether your error rate is climbing, or whether you are actually using the tools available to you.
The six metrics in the Control Center distill that raw event data into actionable numbers. They answer specific questions: How fast am I iterating? How clean is my execution? Am I parallelising effectively? How much data is flowing through my sessions? Am I using the full toolkit? Am I delegating work to subagents?
The Journey
APM (Actions Per Minute) {#apm}
APM measures the rate of meaningful interactions between you and Claude Code per minute of session time. It counts every tool execution (file reads, edits, searches, bash commands) and every prompt you submit.
Formula:
APM = (tool_uses + prompts) / session_duration_minutes
Data source: plugin_session_summaries.tool_uses, plugin_session_summaries.prompts, and session duration calculated from started_at / ended_at timestamps. Field semantics follow the hook events documented in the Claude Code hooks reference, as of 2026-04.
Why it matters. APM measures your iteration speed with AI. Higher APM indicates tighter feedback loops: you give instructions, review results, and course-correct rapidly. In StarCraft terms, this is your raw actions per minute. A session where you submit 5 prompts and Claude executes 45 tool calls over 10 minutes gives you an APM of 5.0. Professional StarCraft players average 300+ APM; effective Claude Code users typically range 5–30 APM depending on task complexity.
How to interpret your APM:
- Below 5. May indicate long-running autonomous tasks or idle time within sessions. Not necessarily bad if Claude is executing complex multi-step operations.
- 5 to 20. Typical range for interactive development sessions. You are actively steering Claude and reviewing outputs.
- Above 20. Rapid interactive sessions with tight feedback loops. Common during debugging, refactoring, or exploratory coding where you are giving frequent corrections.
EAPM (Effective Actions Per Minute) {#eapm}
EAPM is APM with errors removed. It counts only successful tool executions and prompts, excluding failed tool calls.
Formula:
EAPM = (tool_uses + prompts - errors) / session_duration_minutes
Data source: the same fields as APM, plus plugin_session_summaries.errors. Field semantics follow the hook events documented in the Claude Code hooks reference, as of 2026-04.
Why it matters. EAPM separates signal from noise. If you have 50 APM but 20 of those actions are errors, your EAPM of 30 reveals your true productive pace. In StarCraft, this is the difference between spam-clicking and meaningful actions. A high APM with low EAPM suggests you may need to adjust your prompts or tool configurations to reduce errors.
How to interpret the gap between APM and EAPM:
- EAPM close to APM. Clean execution. Your sessions run with minimal errors, meaning prompts are clear and tool configurations are correct.
- EAPM significantly lower than APM. High error rate worth investigating. Common causes include misconfigured tools, ambiguous prompts that lead Claude down wrong paths, or working in environments with flaky dependencies.
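Both formulas translate directly into code. The sketch below is a minimal illustration, assuming a session summary carrying the counter fields and timestamps named in the data sources above; it is not the Control Center's actual implementation.

```python
from datetime import datetime

def apm_and_eapm(tool_uses: int, prompts: int, errors: int,
                 started_at: datetime, ended_at: datetime) -> tuple[float, float]:
    """Compute APM and EAPM for one session, per the formulas above."""
    minutes = (ended_at - started_at).total_seconds() / 60
    if minutes <= 0:
        return 0.0, 0.0
    apm = (tool_uses + prompts) / minutes
    eapm = (tool_uses + prompts - errors) / minutes
    return apm, eapm

# The worked example from the APM section: 5 prompts, 45 tool calls, 10 minutes,
# here with 8 of those tool calls assumed to have failed.
start = datetime(2026, 4, 1, 9, 0)
end = datetime(2026, 4, 1, 9, 10)
apm, eapm = apm_and_eapm(tool_uses=45, prompts=5, errors=8,
                         started_at=start, ended_at=end)
print(apm, eapm)  # 5.0 APM; EAPM drops to 4.2 with 8 errors
```

The gap between the two return values is exactly the error overhead discussed in the next section.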
Concurrency {#concurrency}
Concurrency measures the number of Claude Code sessions running simultaneously. It is calculated by counting overlapping session time windows: sessions whose started_at to ended_at ranges overlap.
Formula:
Peak = maximum number of sessions with overlapping time ranges
Average = time-weighted mean of concurrent session count over the day
Data source: plugin_session_summaries.started_at and plugin_session_summaries.ended_at, calculated using a sweep-line algorithm (as described in the sweep-line algorithm article on Wikipedia), as of 2026-04.
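The sweep-line calculation can be sketched in a few lines. This is an illustrative version, assuming sessions arrive as (started_at, ended_at) pairs in epoch seconds; it is not the server-side implementation.

```python
def concurrency_stats(sessions: list[tuple[float, float]]) -> tuple[int, float]:
    """Peak and time-weighted average concurrency via a sweep line.

    Each session is a (started_at, ended_at) pair in epoch seconds.
    A close event sorts before an open event at the same timestamp, so
    back-to-back sessions do not count as concurrent.
    """
    events = []
    for start, end in sessions:
        events.append((start, +1))  # session opens
        events.append((end, -1))    # session closes
    events.sort()

    peak = 0
    active = 0
    weighted = 0.0  # integral of the active-session count over time
    prev_t = events[0][0] if events else 0.0
    for t, delta in events:
        weighted += active * (t - prev_t)
        active += delta
        peak = max(peak, active)
        prev_t = t
    span = events[-1][0] - events[0][0] if events else 0.0
    average = weighted / span if span > 0 else 0.0
    return peak, average

# Three sessions: two fully overlap, a third starts as the first two end.
peak, avg = concurrency_stats([(0, 60), (0, 60), (60, 120)])
print(peak, avg)  # peak 2; average (2*60 + 1*60) / 120 = 1.5
```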
Why it matters. Concurrency shows how effectively you parallelise AI work. Running multiple Claude Code sessions on different tasks simultaneously is like managing multiple bases in StarCraft. It multiplies your output. Peak concurrency of 4 means you had 4 AI assistants working in parallel at one point during the day.
How to interpret your concurrency:
- 1. Serial work. One session at a time. This is fine for focused, complex tasks but leaves throughput on the table.
- 2 to 3. Moderate parallelism. You are running a couple of sessions simultaneously, perhaps one for a main task and another for a side task or code review.
- 4 or higher. Power user territory. You are using multiple sessions effectively, delegating independent tasks to separate Claude Code instances and managing them concurrently.
Throughput {#throughput}
Throughput measures the total data volume exchanged between you and Claude Code. This includes all content sent to the AI (prompts, file contents, tool inputs) and all content received (responses, tool outputs, generated code).
Formula:
Total = content_input_bytes + content_output_bytes
Rate = total_bytes / total_active_session_seconds
Data source: plugin_session_summaries.content_input_bytes and plugin_session_summaries.content_output_bytes, byte-count fields emitted by the hook events documented in the Claude Code hooks reference, as of 2026-04.
Why it matters. Throughput reflects the volume of work being processed. High throughput means large files are being read, significant code is being generated, or complex analyses are being performed. In StarCraft terms, this is your resource gathering rate. It measures how much material flows through your AI pipeline. A session generating 2 MB of output is doing substantially more work than one generating 50 KB.
How to interpret your throughput:
- Displayed as total bytes (KB/MB) and rate (bytes per second).
- Higher throughput generally correlates with more productive sessions, but context matters. A refactoring session that touches many files will naturally have higher throughput than a focused debugging session.
- Sudden throughput drops may indicate sessions that stall or hit context limits.
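The total and rate formulas are straightforward; the sketch below adds a hypothetical KB/MB formatter for display, which is an assumption about presentation rather than the Control Center's actual formatting code.

```python
def throughput(input_bytes: int, output_bytes: int,
               active_seconds: float) -> tuple[int, float]:
    """Total bytes exchanged and the bytes-per-second rate, per the formulas above."""
    total = input_bytes + output_bytes
    rate = total / active_seconds if active_seconds > 0 else 0.0
    return total, rate

def human_bytes(n: float) -> str:
    """Format a byte count as B/KB/MB/GB (illustrative display helper)."""
    for unit in ("B", "KB", "MB", "GB"):
        if n < 1024:
            return f"{n:.0f} {unit}" if unit == "B" else f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} TB"

# A 30-minute session: 1.2 MB sent, 800 KB received.
total, rate = throughput(1_200_000, 800_000, active_seconds=1800)
print(human_bytes(total))  # 1.9 MB
```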
Tool Diversity {#tool-diversity}
Tool diversity is the number of unique tools used across all sessions in a day. Tools include Read (file reading), Edit (file modification), Bash (command execution), Grep (code search), Glob (file search), Write (file creation), and Agent (subagent delegation).
Formula:
Tool Diversity = COUNT(DISTINCT tool_name) WHERE event_type = 'PostToolUse'
Data source: plugin_usage_events.tool_name filtered to successful tool executions; PostToolUse event names are documented in the Claude Code hooks reference, as of 2026-04.
Why it matters. Tool diversity indicates how versatile your AI usage is. Using only Read and Bash is like building only one unit type in StarCraft. It works, but you are not using the full toolkit. Users who employ Read, Edit, Grep, Glob, Bash, Write, and Agent have a richer interaction pattern and typically complete more complex tasks. A diversity of 7 (all tools) suggests sophisticated multi-step workflows.
How to interpret your tool diversity:
- 1 to 2. Basic usage. You may be using Claude Code primarily for reading files or running commands. Consider whether Edit, Grep, or Glob could speed up your workflow.
- 3 to 4. Moderate diversity. You are using a reasonable subset of the available tools.
- 5 or higher. Advanced usage. You are using the full Claude Code toolkit, including search tools and subagent delegation.
Multitasking Score {#multitasking}
The multitasking score is a composite metric (0–100) measuring how effectively you delegate and parallelise work. It factors in subagent spawning (Claude creating helper agents) and session concurrency.
Formula:
Multitasking = min(100, ((subagent_spawns × 2 + peak_concurrency × 3) / session_count) × 10)
Data source: plugin_session_summaries.subagent_spawns, peak concurrent sessions, and total session count; subagent lifecycle events are documented in the Claude Code sub-agents reference, as of 2026-04.
Why it matters. The multitasking score captures your ability to run AI at scale. Spawning subagents means you are letting Claude break complex tasks into parallel subtasks, the AI equivalent of army splitting in StarCraft. Combined with session concurrency, this shows whether you are using AI as a single worker or as a coordinated team.
How to interpret your multitasking score:
- 0 to 20. Sequential, single-task usage. You are working with one Claude session at a time without subagent delegation. This is fine for simple tasks.
- 20 to 50. Moderate delegation. You are either running concurrent sessions or triggering subagent spawns, but not both heavily.
- 50 or higher. Heavy parallelism and delegation. You are running multiple concurrent sessions and using subagents within those sessions. This is the pattern of users who treat Claude Code as a team rather than a single assistant.
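The scoring formula can be sketched as code. This assumes the division by session_count applies before the ×10 scaling, which matches a left-to-right reading of the formula above; the weights and cap come straight from that formula.

```python
def multitasking_score(subagent_spawns: int, peak_concurrency: int,
                       session_count: int) -> float:
    """Composite 0-100 multitasking score, per the formula above."""
    if session_count == 0:
        return 0.0
    raw = (subagent_spawns * 2 + peak_concurrency * 3) / session_count * 10
    return min(100.0, raw)  # capped at 100

# A day with 4 subagent spawns and peak concurrency of 2 across 3 sessions.
score = multitasking_score(subagent_spawns=4, peak_concurrency=2, session_count=3)
print(round(score, 1))  # (8 + 6) / 3 * 10 = 46.7
```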
Metric Combinations: Reading the Full Picture
Individual metrics tell you something useful. Metrics read together tell you what is actually happening in your workflow.
Speed vs Accuracy: APM + EAPM
The gap between APM and EAPM is your error rate. If your APM is 15 and your EAPM is 12, you have a 20% error rate. Track this ratio over time rather than absolute values.
| Pattern | APM | EAPM | Ratio | Diagnosis |
|---|---|---|---|---|
| Clean execution | 12 | 11.5 | 96% | Prompts are clear, tools are configured correctly |
| Moderate noise | 18 | 13 | 72% | Some errors, likely from exploratory work or unfamiliar codebases |
| High error rate | 25 | 10 | 40% | Significant retry overhead. Check for misconfigured tools, ambiguous prompts, or flaky test suites |
Data source: first-party session samples from the Control Center described above, as of 2026-04.
When the ratio drops below 70%, investigate your error sources. The most common causes are: running bash commands that fail due to missing dependencies, file edits that conflict with concurrent changes, and grep patterns that match nothing. Each of these wastes a tool call and time.
Efficiency vs Scale: EAPM + Concurrency
High EAPM with concurrency of 1 means you are fast but sequential. Low EAPM with concurrency of 4 means you are parallelising but each session is sluggish. The combination reveals your actual throughput capacity.
A user with an EAPM of 10 and concurrency of 3 is effectively producing 30 effective actions per minute across their workspace. Compare that to an EAPM of 20 with concurrency of 1: technically faster per session, but lower total output.
Breadth vs Depth: Tool Diversity + Throughput
Low tool diversity with high throughput suggests you are doing repetitive bulk operations (reading many files, running many bash commands). High tool diversity with moderate throughput suggests complex multi-step workflows that use the full toolkit.
| Diversity | Throughput | Pattern |
|---|---|---|
| 2 (Read + Bash) | High (5 MB+) | Bulk analysis or log review |
| 4 (Read + Edit + Bash + Grep) | Moderate (500 KB–2 MB) | Standard development workflow |
| 6+ (all tools) | Moderate to High | Complex refactoring or multi-repository work |
| 7 (all tools including Agent) | Any | Advanced orchestration with subagent delegation |
Data source: first-party session samples from the Control Center described above, mapped to tools listed in the Claude Code hooks reference, as of 2026-04.
Delegation Effectiveness: Multitasking + Concurrency
The multitasking score combines subagent usage and session concurrency. But the two components tell different stories. High subagent spawns within a single session means Claude is breaking down complex tasks internally. High concurrency without subagents means you are manually managing parallel sessions.
The most effective pattern is both: multiple concurrent sessions where Claude also spawns subagents within those sessions. This represents full utilisation of Claude Code's parallelism capabilities.
Session Profiles: What Good Looks Like
Different types of work produce different metric signatures. Comparing your metrics to these profiles helps you understand whether your numbers are typical or indicate something worth adjusting.
The Debugger
Tight feedback loops, lots of reads and searches, few edits until the fix is found.
| Metric | Typical Range |
|---|---|
| APM | 8–20 |
| EAPM | 6–18 |
| Concurrency | 1 |
| Throughput | 200 KB–1 MB |
| Tool Diversity | 4–5 (Read, Grep, Bash, Edit, Glob) |
| Multitasking | 0–15 |
Data source: first-party session samples from the Control Center described above, as of 2026-04.
The Refactorer
High throughput from touching many files, moderate APM, high tool diversity.
| Metric | Typical Range |
|---|---|
| APM | 10–25 |
| EAPM | 9–23 |
| Concurrency | 1–2 |
| Throughput | 1–10 MB |
| Tool Diversity | 5–7 |
| Multitasking | 10–40 |
Data source: first-party session samples from the Control Center described above, as of 2026-04.
The Architect
Multiple concurrent sessions, subagent delegation, exploring different approaches in parallel.
| Metric | Typical Range |
|---|---|
| APM | 5–15 per session |
| EAPM | 4–14 per session |
| Concurrency | 3–6 |
| Throughput | 2–15 MB total |
| Tool Diversity | 6–7 |
| Multitasking | 40–100 |
Data source: first-party session samples from the Control Center described above, as of 2026-04.
The Reviewer
Low APM, high read-to-edit ratio, focused on understanding code rather than changing it.
| Metric | Typical Range |
|---|---|
| APM | 2–8 |
| EAPM | 2–7 |
| Concurrency | 1 |
| Throughput | 500 KB–3 MB |
| Tool Diversity | 3–4 (Read, Grep, Glob, Bash) |
| Multitasking | 0–5 |
Data source: first-party session samples from the Control Center described above, as of 2026-04.
Improving Your Metrics
Metrics are descriptive, not prescriptive. Chasing higher numbers for their own sake is counterproductive. But if you notice patterns that suggest inefficiency, here are specific actions tied to each metric.
Raising EAPM (reducing errors)
- Write more specific prompts. Instead of "fix the tests", say "fix the failing test in `test_auth.rs` by updating the mock to return a 200 status". Specific prompts reduce the chance of Claude taking wrong turns.
- Pre-check your environment. Many errors come from missing dependencies, wrong working directories, or stale build artifacts. Running a quick build or test before starting a Claude session eliminates these.
- Use `/compact` to manage context. When sessions get long, Claude's context fills up and error rates increase. Compacting the conversation keeps the AI focused on relevant context.
Raising Concurrency
- Identify independent tasks. If you have a feature to build and tests to write, those can run in separate sessions. If you have two unrelated bugs, each gets its own session.
- Use worktrees. Git worktrees let multiple Claude Code sessions operate on the same repository without file conflicts. Each session gets its own working copy.
- Start with two sessions. The jump from 1 to 2 concurrent sessions is the biggest productivity gain. Do not try to manage 5 sessions on day one.
Raising Tool Diversity
- Use Grep instead of Bash grep. Claude's built-in Grep tool is faster and provides structured output. If you see Claude running `grep -r`, your tool diversity is artificially low.
- Use Glob for file discovery. Instead of `find . -name "*.rs"`, let Claude use the Glob tool. It is faster and counts toward diversity.
- Delegate with Agent. If a task has independent subtasks (research one component, modify another), tell Claude to use subagents. This raises both tool diversity and multitasking score.
Raising Throughput
Throughput is primarily a function of task complexity. You do not need to artificially inflate it. But if throughput is consistently low, you may be under-utilising Claude:
- Let Claude read more context. Instead of pasting snippets, let Claude read the full files. More input context generally leads to better output.
- Ask for complete implementations. Instead of asking Claude to outline an approach, ask it to write the full code. This generates more output and typically saves you time.
Raising Multitasking Score
The multitasking score rewards both subagent delegation and session concurrency. If your score is consistently below 20, try these approaches:
- Use plan mode for complex tasks. When Claude enters plan mode and then executes, it naturally spawns subagents for parallel research. A single planning session can generate 3-6 subagent spawns, which directly lifts the multitasking score.
- Tell Claude to delegate. Explicitly instruct Claude to use the Agent tool for independent subtasks. "Research the authentication module in one agent while refactoring the database layer in another" triggers parallel work that both completes faster and improves your score.
- Run concurrent sessions on independent tasks. If you are working on a feature branch and also need to review a colleague's PR, those are separate sessions. The concurrency multiplier in the multitasking formula rewards this.
- Combine subagents with concurrency. The highest multitasking scores come from users running 2-3 concurrent sessions where each session also delegates to subagents. This is the "team of teams" pattern.
Reducing Throughput Waste
High throughput is not always productive throughput. Watch for these wasteful patterns:
- Reading the same files repeatedly. If Claude reads `src/main.rs` five times in one session, those bytes count toward throughput but add no value after the first read. Use `/compact` to keep context focused and reduce re-reads.
- Verbose Bash output. A `cargo build` that outputs 200 lines of dependency resolution inflates throughput without adding useful information. Consider redirecting verbose output: `cargo build 2>&1 | tail -20`.
- Large file writes that get immediately overwritten. If Claude writes a 500-line file and then rewrites it after your correction, both writes count toward throughput. Provide clear requirements upfront to reduce rewrites.
Common Pitfalls
Metrics can mislead if you read them without context. These are the patterns that most commonly cause developers to draw wrong conclusions.
High APM from Error Loops
A session where Claude attempts a Bash command, fails, retries with a slight variation, fails again, and repeats can produce an APM of 30 or higher. This looks productive on paper. In reality, the session is stuck in a retry loop. Always check the EAPM/APM ratio alongside raw APM. If the ratio is below 60%, the high APM is noise, not signal.
Inflated Throughput from Context Reloading
When a session hits the context limit and Claude compacts the conversation, subsequent file reads are "new" from the metrics perspective even though the same files were read earlier. A long session that compacts twice can show 5 MB of throughput when the actual unique data processed was 2 MB. This is not a bug in the metrics. It accurately reflects the data that flowed through the session. But it should not be compared directly to a short session that never compacted.
Low Concurrency Is Not Always Bad
A developer working on a complex, tightly-coupled refactoring across 15 files cannot safely run concurrent sessions. The files conflict. The changes depend on each other. Concurrency of 1 is correct here. Do not force parallelism on tasks that are inherently sequential. The metric is diagnostic, not prescriptive.
Tool Diversity Ceiling
There are only 7 core tools (Read, Edit, Bash, Grep, Glob, Write, Agent). A diversity score of 7 is the maximum. Once you hit 5-6 regularly, further improvement is marginal. Focus on other metrics instead.
Gaming Metrics vs Genuine Improvement
It is possible to inflate every metric artificially. Run empty Bash commands to raise APM. Spawn unnecessary subagents to raise multitasking. Read large files you do not need to inflate throughput. None of this makes you more productive. The metrics exist to reveal patterns in genuine work. If you change your behaviour to please the metrics rather than to improve your workflow, the metrics lose their diagnostic value.
Metrics for Teams
Individual metrics tell you about one developer's interaction patterns. Team-level aggregates tell you about workflow health, bottleneck distribution, and adoption maturity. If you lead a team using Claude Code, here is how to read the aggregate data.
Team Aggregation
The Control Center can display metrics across all team members. The useful aggregates are:
- Median EAPM/APM ratio across the team. This shows overall prompt quality and environment health. If one developer has a ratio of 95% and another has 40%, the second developer likely has environment issues (broken tests, missing dependencies, misconfigured tools) that are worth investigating together.
- Concurrency distribution. How many team members are running concurrent sessions? If everyone is at concurrency 1, the team is not running parallel workflows. A team-wide session on worktrees and concurrent Claude Code usage can shift this.
- Tool diversity spread. If most of the team uses 3-4 tools and one person uses 7, that person has found workflows worth sharing. If everyone is at 2 (Read + Bash), the team is under-utilising Claude Code's capabilities.
Using Metrics in Retrospectives
Metrics belong in retrospectives, not performance reviews. The goal is to surface workflow patterns, not to rank individuals. Productive retrospective questions:
- "Our team median EAPM/APM ratio dropped from 85% to 70% this sprint. What changed in our environment?" (Maybe a dependency broke, or the test suite became flaky.)
- "Two team members have concurrency above 3 while the rest are at 1. What are they doing differently?" (Maybe they discovered worktrees, or they split their tickets into parallelisable chunks.)
- "Our total throughput doubled this sprint but our EAPM stayed flat. Are we processing more data or just retrying more?" (Check the error rate to distinguish.)
Never use metrics to compare individual output. A developer working on a hard, unfamiliar problem will naturally have lower EAPM than one making routine changes. The metrics reflect task difficulty as much as developer capability.
Identifying Team Bottlenecks
When team throughput plateaus despite growing session counts, look for these bottlenecks:
- Serial review dependencies. If one developer's work blocks another's review, concurrency for the team is artificially capped. Automated PR reviews with Claude Code GitHub Actions can unblock this.
- Shared resource contention. If multiple sessions need to modify the same files or run tests on the same database, concurrent sessions interfere with each other. Worktrees and isolated test environments fix this.
- Knowledge silos. If only one team member has high tool diversity (because they are the only one who knows how to use subagents or MCP tools), the team's aggregate potential is constrained. Pair sessions and shared CLAUDE.md configurations distribute knowledge.
Real-World Metric Profiles
Abstract ranges are useful but concrete examples are better. These are actual metric snapshots from real Claude Code sessions.
A Debugging Session
Task: Tracking down a race condition in session lifecycle management. The bug only reproduced under concurrent load.
| Metric | Value | Notes |
|---|---|---|
| APM | 14.2 | Rapid iteration: read logs, hypothesise, test, repeat |
| EAPM | 11.8 | 83% ratio. Several failed grep patterns before finding the right log entries |
| Concurrency | 1 | Sequential by necessity. The bug was in concurrency handling, so running concurrent sessions would have confused the investigation |
| Throughput | 1.4 MB | Moderate. Mostly log file reads and code inspection |
| Tool Diversity | 5 | Read, Grep, Bash, Edit, Glob. No subagents needed |
| Multitasking | 0 | Single session, no delegation |
| Duration | 38 minutes | |
Data source: first-party session recorded by the Control Center described above, as of 2026-04.
What the metrics reveal: Clean debugging session. The 83% EAPM ratio shows some wasted effort (the failed grep patterns) but is within the healthy range. High APM with tool diversity of 5 is characteristic of the Debugger profile. The fix was a 3-line change to add a mutex guard.
A Large Refactoring Session
Task: Extracting a shared module from three Rust extensions into a common library. Touched 23 files across 4 directories.
| Metric | Value | Notes |
|---|---|---|
| APM | 18.7 | High. Many file reads, edits, and build checks |
| EAPM | 17.1 | 91% ratio. Very clean execution because the pattern was well understood |
| Concurrency | 2 | Main session for refactoring, second session running tests continuously |
| Throughput | 8.3 MB | High. 23 files read and modified, multiple full builds |
| Tool Diversity | 7 | All tools used including Agent for parallel exploration |
| Multitasking | 45 | 2 concurrent sessions + subagent delegation for finding all import paths |
| Duration | 1 hour 12 minutes | |
Data source: first-party session recorded by the Control Center described above, as of 2026-04.
What the metrics reveal: Textbook Refactorer profile. The 91% EAPM ratio is excellent for a task this large and indicates clear understanding of the target architecture before starting. Concurrency of 2 (refactoring + continuous testing) is the optimal pattern for this kind of work: catch breakages immediately instead of discovering them at the end. The throughput of 8.3 MB reflects the scale of the change.
Before and After: Environment Fix
Before (broken test suite, one week):
- Average APM: 12.4
- Average EAPM: 6.1 (49% ratio)
- Average errors per session: 8.3
- Root cause: Flaky integration test that failed 40% of the time. Claude retried the test suite on every failure.
After (test suite fixed, following week):
- Average APM: 10.8 (slightly lower, less frantic)
- Average EAPM: 9.9 (92% ratio)
- Average errors per session: 1.1
- The fix: Isolated the flaky test into its own test target so it did not block the main suite.
What the metrics reveal: The APM actually decreased after the fix, which might look like a productivity decline. But EAPM nearly doubled. The developer was spending less total time but accomplishing more. This is a case where the EAPM/APM ratio told the true story while raw APM was misleading.
How Data Is Collected
All metrics are derived from hook events that Claude Code sends to systemprompt.io. Every tool execution, prompt submission, error, and session lifecycle event (start, stop) is captured as a deterministic event with precise timestamps and byte counts. No estimation or sampling is involved; every metric is calculated from actual observed events.
When you install the hooks into your Claude Code configuration, they fire HTTP requests on each lifecycle event. These events are stored as raw records and then aggregated into the metrics you see in the Control Center.
Event Types
The following event types feed into the metrics calculations:
- `PostToolUse`. A tool was successfully executed (Read, Edit, Bash, Grep, Glob, Write, Agent). This is the primary event for APM, EAPM, and tool diversity calculations.
- `PostToolUseFailure`. A tool execution failed. Used to calculate the error count that separates EAPM from APM.
- `UserPromptSubmit`. You submitted a prompt to Claude. Counted in both APM and EAPM.
- `SessionStart` / `SessionEnd` / `Stop`. Session lifecycle events. Used to calculate session duration, concurrency, and to delineate session boundaries.
- `SubagentStart` / `SubagentStop`. Subagent lifecycle events. Used in the multitasking score calculation to track how many subagents were spawned.
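The event-to-metric mapping can be sketched as a single aggregation pass. This is an illustrative sketch, assuming each raw event is a dict with `event_type` and `tool_name` fields; the actual record shape in plugin_usage_events is an assumption here.

```python
def aggregate_events(events: list[dict]) -> dict:
    """Tally raw hook events into the inputs the metric formulas consume."""
    tool_uses = prompts = errors = subagent_spawns = 0
    tools_seen: set[str] = set()
    for e in events:
        kind = e["event_type"]
        if kind == "PostToolUse":          # successful tool execution
            tool_uses += 1
            tools_seen.add(e["tool_name"])
        elif kind == "PostToolUseFailure": # failed tool execution
            errors += 1
        elif kind == "UserPromptSubmit":   # prompt from the user
            prompts += 1
        elif kind == "SubagentStart":      # subagent spawned
            subagent_spawns += 1
    return {
        "tool_uses": tool_uses,
        "prompts": prompts,
        "errors": errors,
        "subagent_spawns": subagent_spawns,
        "tool_diversity": len(tools_seen),
    }

events = [
    {"event_type": "UserPromptSubmit"},
    {"event_type": "PostToolUse", "tool_name": "Read"},
    {"event_type": "PostToolUse", "tool_name": "Grep"},
    {"event_type": "PostToolUseFailure", "tool_name": "Bash"},
    {"event_type": "PostToolUse", "tool_name": "Read"},
]
counts = aggregate_events(events)
print(counts)
```

Note that the failed Bash call raises the error count but does not add Bash to the diversity set, matching the "successful tool executions" filter in the tool diversity data source.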
Daily Reports
These metrics are aggregated into daily summaries at 11 PM UTC. The aggregation process calculates totals, averages, and peaks across all sessions for the day. An AI-generated analysis accompanies each daily summary, providing insights about patterns, identifying skill gaps, and offering recommendations for improving your workflow.
Historical data is preserved indefinitely, allowing you to track trends over weeks and months. The Control Center displays both the current day's live metrics and historical daily summaries for comparison.
Tracking Progress Over Time
The Control Center preserves daily summaries indefinitely. This history is where the real value lives. A single day's metrics are a snapshot. A week of daily summaries shows a trend.
What to look for weekly
Review your metrics at the end of each week. Focus on three questions:
- Is my EAPM/APM ratio improving? A rising ratio means fewer errors per session. This is the most directly actionable trend because it responds to better prompt quality and environment hygiene.
- Is my concurrency stable or growing? If you are stuck at concurrency 1 week after week, you have not adopted parallel workflows. Even moving to 2 concurrent sessions doubles your effective capacity.
- Has my tool diversity changed? If you added a new tool to your workflow (for example, starting to use Agent for subagent delegation), it should show up as a bump in tool diversity. If it does not, the tool is not being triggered.
Setting personal baselines
After two weeks of data, calculate your averages for each metric. These become your personal baselines. Deviations from baseline are more meaningful than absolute numbers.
For example, if your baseline EAPM is 12 and you see a day at 5, something changed. Maybe you worked in an unfamiliar codebase, or your test suite was broken, or you spent the day on a task that required more reading than writing. The metric flags the deviation; you supply the context.
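A baseline check like this is easy to script. The sketch below assumes daily EAPM values as a plain list; the two-standard-deviation threshold is an illustrative choice, not a Control Center rule.

```python
from statistics import mean, stdev

def flag_deviation(history: list[float], today: float,
                   threshold: float = 2.0) -> bool:
    """Flag a day whose metric sits more than `threshold` standard
    deviations from the personal baseline built from prior days."""
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return today != baseline
    return abs(today - baseline) / spread > threshold

# Two weeks of daily EAPM values, then a day at 5.
eapm_history = [12, 11, 13, 12, 14, 12, 11, 13, 12, 12, 13, 11, 12, 14]
print(flag_deviation(eapm_history, today=5))   # flagged: well below baseline
print(flag_deviation(eapm_history, today=12))  # not flagged: near baseline
```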
Comparing across projects
Different projects produce different metric profiles. A greenfield project typically shows higher throughput and higher tool diversity (lots of file creation and search). A maintenance project shows lower throughput but potentially higher EAPM (focused, precise edits). Do not compare metrics across fundamentally different project types.
Cost Signals Alongside Performance Metrics
Performance metrics tell you how you are working. Cost per action tells you what that work is costing. The API price per million tokens varies by model, so the same APM profile produces very different bills depending on which Claude model your sessions are hitting. The published API prices are available on the Anthropic pricing page.
| Model | Input ($/Mtok) | Output ($/Mtok) | Where to confirm |
|---|---|---|---|
| Claude Opus (latest) | see page | see page | anthropic.com/pricing |
| Claude Sonnet (latest) | see page | see page | anthropic.com/pricing |
| Claude Haiku (latest) | see page | see page | anthropic.com/pricing |
Data source: Anthropic API pricing page, as of 2026-04. Prices and model names change; always confirm on the live page. Prompt caching and batch API discounts are documented in the Anthropic prompt caching guide and the Anthropic batch API docs.
Throughput in the Control Center is measured in bytes. To convert bytes into token counts for cost estimation, a common rule of thumb documented by Anthropic's tokenisation reference is roughly 4 characters per token for English text and source code. A session with 2 MB of throughput is therefore on the order of 500,000 tokens split between input and output. Multiply by the relevant price in the table above to estimate the session's API cost, then cross-check against your Anthropic console usage report.
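That arithmetic can be wrapped in a small estimator. The 80/20 input/output split and the $3/$15 per-million-token prices below are illustrative placeholders, not current Anthropic prices; pull real numbers from the live pricing page.

```python
def estimate_session_cost(throughput_bytes: int,
                          input_fraction: float,
                          input_price_per_mtok: float,
                          output_price_per_mtok: float,
                          chars_per_token: float = 4.0) -> float:
    """Rough API cost in dollars for one session, from its byte throughput.

    Uses the ~4-characters-per-token rule of thumb; the input/output split
    and prices are caller-supplied assumptions.
    """
    tokens = throughput_bytes / chars_per_token
    input_tokens = tokens * input_fraction
    output_tokens = tokens - input_tokens
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000

# The 2 MB example above (~500k tokens), assuming an 80/20 input/output
# split and illustrative prices of $3 in / $15 out per million tokens.
cost = estimate_session_cost(2_000_000, input_fraction=0.8,
                             input_price_per_mtok=3.0,
                             output_price_per_mtok=15.0)
print(round(cost, 2))  # 2.7
```

Treat the result as an order-of-magnitude figure and cross-check it against your console usage report, as noted above.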
For Anthropic API availability and incident history that may explain anomalies in your metrics (spikes in errors or latency), check the Anthropic status page.
Mapping Metrics to Observability Dashboards
The six Control Center metrics are one view. If you already run an observability stack, you can correlate them with standard telemetry pipelines. The mapping below uses only tools with published documentation, so every link points to a primary source.
| Control Center metric | Analogous observability signal | Open-source tooling and spec |
|---|---|---|
| APM | Request rate (RPS) for LLM API calls | Prometheus counter and rate() docs |
| EAPM | Success-only request rate (errors excluded) | OpenTelemetry semantic conventions for errors |
| Concurrency | Active in-flight requests / span concurrency | OpenTelemetry tracing spec, Grafana time-series panels |
| Throughput | Bytes-in / bytes-out per service | Prometheus histograms, rendered with Grafana bar gauge |
| Tool Diversity | Cardinality of operation names in traces | OpenTelemetry span attributes |
| Multitasking Score | Parent/child span fan-out in distributed traces | OpenTelemetry context and propagation |
Data source: tool mappings built from the Prometheus querying docs, the OpenTelemetry specification, and the Grafana documentation, as of 2026-04.
The mapping is deliberately one-directional. The Control Center metrics aggregate Claude Code hook events server-side; Prometheus and OpenTelemetry are where you would build parallel dashboards on top of your own application telemetry if you want to correlate Claude Code activity with production system behaviour.
Detailed Metric Documentation
Each metric has a dedicated reference page with additional examples, edge cases, and technical details:
- APM, Actions Per Minute
- EAPM, Effective Actions Per Minute
- Concurrency, Parallel Session Count
- Throughput, Data Volume
- Tool Diversity, Unique Tools Used
- Multitasking Score, Delegation and Parallelism
The Takeaway
The six metrics in the Control Center give you a complete, deterministic picture of how you interact with Claude Code. APM and EAPM measure your speed and accuracy. Concurrency and multitasking score measure your ability to parallelise. Throughput measures the volume of work flowing through your sessions. Tool diversity measures the breadth of your toolkit usage.
None of these metrics have a universally "correct" value. A focused debugging session will naturally have different metric profiles than a large refactoring session. The value is in tracking your own patterns over time, identifying sessions where metrics deviate from your norm, and understanding why.
Read them in combination, not isolation. A high APM with low EAPM means errors are eating your speed. High concurrency with low tool diversity means you are running parallel sessions but each one is doing simple work. High throughput with low EAPM means large volumes of data are flowing but much of the work is being retried.
Use the daily AI-generated insights as a starting point, then drill into the metrics that matter most for your workflow. If you are consistently seeing a gap between APM and EAPM, focus on reducing errors. If your concurrency is always 1, consider whether some tasks could run in parallel. If your tool diversity is low, explore whether Grep or Glob could replace manual file navigation.
The data is there. The metrics make it legible.