Prelude
The systemprompt.io Control Center tracks six performance metrics that quantify how you work with Claude Code. These are not vanity numbers. Each metric is derived from deterministic event data captured during every Claude Code session, giving you an accurate picture of your interaction patterns, efficiency, and growth over time.
If you have ever looked at the metrics cards in the Control Center and wondered what exactly they measure, how they are calculated, or what a good score looks like, this guide covers all of it.
The Problem
Claude Code sessions generate enormous amounts of data. Every file read, every edit, every bash command, every prompt you submit, every error that occurs. All of it is captured as structured events. Without metrics, that data is just noise. You cannot tell whether today was more productive than yesterday, whether your error rate is climbing, or whether you are actually using the tools available to you.
The six metrics in the Control Center distill that raw event data into actionable numbers. They answer specific questions: How fast am I iterating? How clean is my execution? Am I parallelising effectively? How much data is flowing through my sessions? Am I using the full toolkit? Am I delegating work to subagents?
The Journey
APM (Actions Per Minute) {#apm}
APM measures the rate of meaningful interactions between you and Claude Code per minute of session time. It counts every tool execution (file reads, edits, searches, bash commands) and every prompt you submit.
Formula:
APM = (tool_uses + prompts) / session_duration_minutes
Data source: plugin_session_summaries.tool_uses, plugin_session_summaries.prompts, and session duration calculated from started_at / ended_at timestamps. Field semantics follow the hook events documented in the Claude Code hooks reference, as of 2026-04.
Why it matters. APM measures your iteration speed with AI. Higher APM indicates tighter feedback loops: you give instructions, review results, and course-correct rapidly. In StarCraft terms, this is your raw actions per minute. A session where you submit 5 prompts and Claude executes 45 tool calls over 10 minutes gives you an APM of 5.0. Professional StarCraft players average 300+ APM; effective Claude Code users typically range 5–30 APM depending on task complexity.
How to interpret your APM:
- Below 5. May indicate long-running autonomous tasks or idle time within sessions. Not necessarily bad if Claude is executing complex multi-step operations.
- 5 to 20. Typical range for interactive development sessions. You are actively steering Claude and reviewing outputs.
- Above 20. Rapid interactive sessions with tight feedback loops. Common during debugging, refactoring, or exploratory coding where you are giving frequent corrections.
EAPM (Effective Actions Per Minute) {#eapm}
EAPM is APM with errors removed. It counts only successful tool executions and prompts, excluding failed tool calls.
Formula:
EAPM = (tool_uses + prompts - errors) / session_duration_minutes
Data source: the same fields as APM, plus plugin_session_summaries.errors. Field semantics follow the hook events documented in the Claude Code hooks reference, as of 2026-04.
Why it matters. EAPM separates signal from noise. If you have 50 APM but 20 of those actions are errors, your EAPM of 30 reveals your true productive pace. In StarCraft, this is the difference between spam-clicking and meaningful actions. A high APM with low EAPM suggests you may need to adjust your prompts or tool configurations to reduce errors.
How to interpret the gap between APM and EAPM:
- EAPM close to APM. Clean execution. Your sessions run with minimal errors, meaning prompts are clear and tool configurations are correct.
- EAPM significantly lower than APM. High error rate worth investigating. Common causes include misconfigured tools, ambiguous prompts that lead Claude down wrong paths, or working in environments with flaky dependencies.
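Both formulas translate directly into code. The sketch below is a minimal illustration, assuming a session summary carrying the counter fields and timestamps named in the data sources above; it is not the Control Center's actual implementation.

```python
from datetime import datetime

def apm_and_eapm(tool_uses: int, prompts: int, errors: int,
                 started_at: datetime, ended_at: datetime) -> tuple[float, float]:
    """Compute APM and EAPM for one session, per the formulas above."""
    minutes = (ended_at - started_at).total_seconds() / 60
    if minutes <= 0:
        return 0.0, 0.0
    apm = (tool_uses + prompts) / minutes
    eapm = (tool_uses + prompts - errors) / minutes
    return apm, eapm

# The worked example from the APM section: 5 prompts, 45 tool calls, 10 minutes,
# here with 8 of those tool calls assumed to have failed.
start = datetime(2026, 4, 1, 9, 0)
end = datetime(2026, 4, 1, 9, 10)
apm, eapm = apm_and_eapm(tool_uses=45, prompts=5, errors=8,
                         started_at=start, ended_at=end)
print(apm, eapm)  # 5.0 APM; EAPM drops to 4.2 with 8 errors
```

The gap between the two return values is exactly the error overhead discussed in the next section.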
Concurrency {#concurrency}
Concurrency measures the number of Claude Code sessions running simultaneously. It is calculated by counting overlapping session time windows: sessions whose started_at to ended_at ranges overlap.
Formula:
Peak = maximum number of sessions with overlapping time ranges
Average = time-weighted mean of concurrent session count over the day
Data source: plugin_session_summaries.started_at and plugin_session_summaries.ended_at, calculated using a sweep-line algorithm (as described in the sweep-line algorithm article on Wikipedia), as of 2026-04.
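The sweep-line calculation can be sketched in a few lines. This is an illustrative version, assuming sessions arrive as (started_at, ended_at) pairs in epoch seconds; it is not the server-side implementation.

```python
def concurrency_stats(sessions: list[tuple[float, float]]) -> tuple[int, float]:
    """Peak and time-weighted average concurrency via a sweep line.

    Each session is a (started_at, ended_at) pair in epoch seconds.
    A close event sorts before an open event at the same timestamp, so
    back-to-back sessions do not count as concurrent.
    """
    events = []
    for start, end in sessions:
        events.append((start, +1))  # session opens
        events.append((end, -1))    # session closes
    events.sort()

    peak = 0
    active = 0
    weighted = 0.0  # integral of the active-session count over time
    prev_t = events[0][0] if events else 0.0
    for t, delta in events:
        weighted += active * (t - prev_t)
        active += delta
        peak = max(peak, active)
        prev_t = t
    span = events[-1][0] - events[0][0] if events else 0.0
    average = weighted / span if span > 0 else 0.0
    return peak, average

# Three sessions: two fully overlap, a third starts as the first two end.
peak, avg = concurrency_stats([(0, 60), (0, 60), (60, 120)])
print(peak, avg)  # peak 2; average (2*60 + 1*60) / 120 = 1.5
```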
Why it matters. Concurrency shows how effectively you parallelise AI work. Running multiple Claude Code sessions on different tasks simultaneously is like managing multiple bases in StarCraft. It multiplies your output. Peak concurrency of 4 means you had 4 AI assistants working in parallel at one point during the day.
How to interpret your concurrency:
- 1. Serial work. One session at a time. This is fine for focused, complex tasks but leaves throughput on the table.
- 2 to 3. Moderate parallelism. You are running a couple of sessions simultaneously, perhaps one for a main task and another for a side task or code review.
- 4 or higher. Power user territory. You are using multiple sessions effectively, delegating independent tasks to separate Claude Code instances and managing them concurrently.
Throughput {#throughput}
Throughput measures the total data volume exchanged between you and Claude Code. This includes all content sent to the AI (prompts, file contents, tool inputs) and all content received (responses, tool outputs, generated code).
Formula:
Total = content_input_bytes + content_output_bytes
Rate = total_bytes / total_active_session_seconds
Data source: plugin_session_summaries.content_input_bytes and plugin_session_summaries.content_output_bytes, byte-count fields emitted by the hook events documented in the Claude Code hooks reference, as of 2026-04.
Why it matters. Throughput reflects the volume of work being processed. High throughput means large files are being read, significant code is being generated, or complex analyses are being performed. In StarCraft terms, this is your resource gathering rate. It measures how much material flows through your AI pipeline. A session generating 2 MB of output is doing substantially more work than one generating 50 KB.
How to interpret your throughput:
- Displayed as total bytes (KB/MB) and rate (bytes per second).
- Higher throughput generally correlates with more productive sessions, but context matters. A refactoring session that touches many files will naturally have higher throughput than a focused debugging session.
- Sudden throughput drops may indicate sessions that stall or hit context limits.
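The total and rate formulas are straightforward; the sketch below adds a hypothetical KB/MB formatter for display, which is an assumption about presentation rather than the Control Center's actual formatting code.

```python
def throughput(input_bytes: int, output_bytes: int,
               active_seconds: float) -> tuple[int, float]:
    """Total bytes exchanged and the bytes-per-second rate, per the formulas above."""
    total = input_bytes + output_bytes
    rate = total / active_seconds if active_seconds > 0 else 0.0
    return total, rate

def human_bytes(n: float) -> str:
    """Format a byte count as B/KB/MB/GB (illustrative display helper)."""
    for unit in ("B", "KB", "MB", "GB"):
        if n < 1024:
            return f"{n:.0f} {unit}" if unit == "B" else f"{n:.1f} {unit}"
        n /= 1024
    return f"{n:.1f} TB"

# A 30-minute session: 1.2 MB sent, 800 KB received.
total, rate = throughput(1_200_000, 800_000, active_seconds=1800)
print(human_bytes(total))  # 1.9 MB
```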
Tool Diversity {#tool-diversity}
Tool diversity is the number of unique tools used across all sessions in a day. Tools include Read (file reading), Edit (file modification), Bash (command execution), Grep (code search), Glob (file search), Write (file creation), and Agent (subagent delegation).
Formula:
Tool Diversity = COUNT(DISTINCT tool_name) WHERE event_type = 'PostToolUse'
Data source: plugin_usage_events.tool_name filtered to successful tool executions; PostToolUse event names are documented in the Claude Code hooks reference, as of 2026-04.
Why it matters. Tool diversity indicates how versatile your AI usage is. Using only Read and Bash is like building only one unit type in StarCraft. It works, but you are not using the full toolkit. Users who employ Read, Edit, Grep, Glob, Bash, Write, and Agent have a richer interaction pattern and typically complete more complex tasks. A diversity of 7 (all tools) suggests sophisticated multi-step workflows.
How to interpret your tool diversity:
- 1 to 2. Basic usage. You may be using Claude Code primarily for reading files or running commands. Consider whether Edit, Grep, or Glob could speed up your workflow.
- 3 to 4. Moderate diversity. You are using a reasonable subset of the available tools.
- 5 or higher. Advanced usage. You are using the full Claude Code toolkit, including search tools and subagent delegation.
Multitasking Score {#multitasking}
The multitasking score is a composite metric (0–100) measuring how effectively you delegate and parallelise work. It factors in subagent spawning (Claude creating helper agents) and session concurrency.
Formula:
Multitasking = min(100, ((subagent_spawns × 2 + peak_concurrency × 3) / session_count) × 10)
Data source: plugin_session_summaries.subagent_spawns, peak concurrent sessions, and total session count; subagent lifecycle events are documented in the Claude Code sub-agents reference, as of 2026-04.
Why it matters. The multitasking score captures your ability to run AI at scale. Spawning subagents means you are letting Claude break complex tasks into parallel subtasks, the AI equivalent of army splitting in StarCraft. Combined with session concurrency, this shows whether you are using AI as a single worker or as a coordinated team.
How to interpret your multitasking score:
- 0 to 20. Sequential, single-task usage. You are working with one Claude session at a time without subagent delegation. This is fine for simple tasks.
- 20 to 50. Moderate delegation. You are either running concurrent sessions or triggering subagent spawns, but not both heavily.
- 50 or higher. Heavy parallelism and delegation. You are running multiple concurrent sessions and using subagents within those sessions. This is the pattern of users who treat Claude Code as a team rather than a single assistant.
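The scoring formula can be sketched as code. This assumes the division by session_count applies before the ×10 scaling, which matches a left-to-right reading of the formula above; the weights and cap come straight from that formula.

```python
def multitasking_score(subagent_spawns: int, peak_concurrency: int,
                       session_count: int) -> float:
    """Composite 0-100 multitasking score, per the formula above."""
    if session_count == 0:
        return 0.0
    raw = (subagent_spawns * 2 + peak_concurrency * 3) / session_count * 10
    return min(100.0, raw)  # capped at 100

# A day with 4 subagent spawns and peak concurrency of 2 across 3 sessions.
score = multitasking_score(subagent_spawns=4, peak_concurrency=2, session_count=3)
print(round(score, 1))  # (8 + 6) / 3 * 10 = 46.7
```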
Metric Combinations: Reading the Full Picture
Individual metrics tell you something useful. Metrics read together tell you what is actually happening in your workflow.
Speed vs Accuracy: APM + EAPM
The gap between APM and EAPM is your error rate. If your APM is 15 and your EAPM is 12, you have a 20% error rate. Track this ratio over time rather than absolute values.
| Pattern | APM | EAPM | Ratio | Diagnosis |
|---|---|---|---|---|
| Clean execution | 12 | 11.5 | 96% | Prompts are clear, tools are configured correctly |
| Moderate noise | 18 | 13 | 72% | Some errors, likely from exploratory work or unfamiliar codebases |
| High error rate | 25 | 10 | 40% | Significant retry overhead. Check for misconfigured tools, ambiguous prompts, or flaky test suites |
Data source: first-party session samples from the Control Center described above, as of 2026-04.
When the ratio drops below 70%, investigate your error sources. The most common causes are: running bash commands that fail due to missing dependencies, file edits that conflict with concurrent changes, and grep patterns that match nothing. Each of these wastes a tool call and time.
Efficiency vs Scale: EAPM + Concurrency
High EAPM with concurrency of 1 means you are fast but sequential. Low EAPM with concurrency of 4 means you are parallelising but each session is sluggish. The combination reveals your actual throughput capacity.
A user with an EAPM of 10 and concurrency of 3 is effectively producing 30 effective actions per minute across their workspace. Compare that to an EAPM of 20 with concurrency of 1: technically faster per session, but lower total output.
Breadth vs Depth: Tool Diversity + Throughput
Low tool diversity with high throughput suggests you are doing repetitive bulk operations (reading many files, running many bash commands). High tool diversity with moderate throughput suggests complex multi-step workflows that use the full toolkit.
| Diversity | Throughput | Pattern |
|---|---|---|
| 2 (Read + Bash) | High (5 MB+) | Bulk analysis or log review |
| 4 (Read + Edit + Bash + Grep) | Moderate (500 KB–2 MB) | Standard development workflow |
| 6+ (all tools) | Moderate to High | Complex refactoring or multi-repository work |
| 7 (all tools including Agent) | Any | Advanced orchestration with subagent delegation |
Data source: first-party session samples from the Control Center described above, mapped to tools listed in the Claude Code hooks reference, as of 2026-04.
Delegation Effectiveness: Multitasking + Concurrency
The multitasking score combines subagent usage and session concurrency. But the two components tell different stories. High subagent spawns within a single session means Claude is breaking down complex tasks internally. High concurrency without subagents means you are manually managing parallel sessions.
The most effective pattern is both: multiple concurrent sessions where Claude also spawns subagents within those sessions. This represents full utilisation of Claude Code's parallelism capabilities.
Session Profiles: What Good Looks Like
Different types of work produce different metric signatures. Comparing your metrics to these profiles helps you understand whether your numbers are typical or indicate something worth adjusting.
The Debugger
Tight feedback loops, lots of reads and searches, few edits until the fix is found.
| Metric | Typical Range |
|---|---|
| APM | 8–20 |
| EAPM | 6–18 |
| Concurrency | 1 |
| Throughput | 200 KB–1 MB |
| Tool Diversity | 4–5 (Read, Grep, Bash, Edit, Glob) |
| Multitasking | 0–15 |
Data source: first-party session samples from the Control Center described above, as of 2026-04.
The Refactorer
High throughput from touching many files, moderate APM, high tool diversity.
| Metric | Typical Range |
|---|---|
| APM | 10–25 |
| EAPM | 9–23 |
| Concurrency | 1–2 |
| Throughput | 1–10 MB |
| Tool Diversity | 5–7 |
| Multitasking | 10–40 |
Data source: first-party session samples from the Control Center described above, as of 2026-04.
The Architect
Multiple concurrent sessions, subagent delegation, exploring different approaches in parallel.
| Metric | Typical Range |
|---|---|
| APM | 5–15 per session |
| EAPM | 4–14 per session |
| Concurrency | 3–6 |
| Throughput | 2–15 MB total |
| Tool Diversity | 6–7 |
| Multitasking | 40–100 |
Data source: first-party session samples from the Control Center described above, as of 2026-04.
The Reviewer
Low APM, high read-to-edit ratio, focused on understanding code rather than changing it.
| Metric | Typical Range |
|---|---|
| APM | 2–8 |
| EAPM | 2–7 |
| Concurrency | 1 |
| Throughput | 500 KB–3 MB |
| Tool Diversity | 3–4 (Read, Grep, Glob, Bash) |
| Multitasking | 0–5 |
Data source: first-party session samples from the Control Center described above, as of 2026-04.
Improving Your Metrics
Metrics are descriptive, not prescriptive. Chasing higher numbers for their own sake is counterproductive. But if you notice patterns that suggest inefficiency, here are specific actions tied to each metric.
Raising EAPM (reducing errors)
- Write more specific prompts. Instead of "fix the tests", say "fix the failing test in `test_auth.rs` by updating the mock to return a 200 status". Specific prompts reduce the chance of Claude taking wrong turns.
- Pre-check your environment. Many errors come from missing dependencies, wrong working directories, or stale build artifacts. Running a quick build or test before starting a Claude session eliminates these.
- Use `/compact` to manage context. When sessions get long, Claude's context fills up and error rates increase. Compacting the conversation keeps the AI focused on relevant context.
Raising Concurrency
- Identify independent tasks. If you have a feature to build and tests to write, those can run in separate sessions. If you have two unrelated bugs, each gets its own session.
- Use worktrees. Git worktrees let multiple Claude Code sessions operate on the same repository without file conflicts. Each session gets its own working copy.
- Start with two sessions. The jump from 1 to 2 concurrent sessions is the biggest productivity gain. Do not try to manage 5 sessions on day one.
Raising Tool Diversity
- Use Grep instead of Bash grep. Claude's built-in Grep tool is faster and provides structured output. If you see Claude running `grep -r`, your tool diversity is artificially low.
- Use Glob for file discovery. Instead of `find . -name "*.rs"`, let Claude use the Glob tool. It is faster and counts toward diversity.
- Delegate with Agent. If a task has independent subtasks (research one component, modify another), tell Claude to use subagents. This raises both tool diversity and multitasking score.
Raising Throughput
Throughput is primarily a function of task complexity. You do not need to artificially inflate it. But if throughput is consistently low, you may be under-utilising Claude:
- Let Claude read more context. Instead of pasting snippets, let Claude read the full files. More input context generally leads to better output.
- Ask for complete implementations. Instead of asking Claude to outline an approach, ask it to write the full code. This generates more output and typically saves you time.
Raising Multitasking Score
The multitasking score rewards both subagent delegation and session concurrency. If your score is consistently below 20, try these approaches:
- Use plan mode for complex tasks. When Claude enters plan mode and then executes, it naturally spawns subagents for parallel research. A single planning session can generate 3-6 subagent spawns, which directly lifts the multitasking score.
- Tell Claude to delegate. Explicitly instruct Claude to use the Agent tool for independent subtasks. "Research the authentication module in one agent while refactoring the database layer in another" triggers parallel work that both completes faster and improves your score.
- Run concurrent sessions on independent tasks. If you are working on a feature branch and also need to review a colleague's PR, those are separate sessions. The concurrency multiplier in the multitasking formula rewards this.
- Combine subagents with concurrency. The highest multitasking scores come from users running 2-3 concurrent sessions where each session also delegates to subagents. This is the "team of teams" pattern.
Reducing Throughput Waste
High throughput is not always productive throughput. Watch for these wasteful patterns:
- Reading the same files repeatedly. If Claude reads `src/main.rs` five times in one session, those bytes count toward throughput but add no value after the first read. Use `/compact` to keep context focused and reduce re-reads.
- Verbose Bash output. A `cargo build` that outputs 200 lines of dependency resolution inflates throughput without adding useful information. Consider redirecting verbose output: `cargo build 2>&1 | tail -20`.
- Large file writes that get immediately overwritten. If Claude writes a 500-line file and then rewrites it after your correction, both writes count toward throughput. Provide clear requirements upfront to reduce rewrites.
Common Pitfalls
Metrics can mislead if you read them without context. These are the patterns that most commonly cause developers to draw wrong conclusions.
High APM from Error Loops
A session where Claude attempts a Bash command, fails, retries with a slight variation, fails again, and repeats can produce an APM of 30 or higher. This looks productive on paper. In reality, the session is stuck in a retry loop. Always check the EAPM/APM ratio alongside raw APM. If the ratio is below 60%, the high APM is noise, not signal.
Inflated Throughput from Context Reloading
When a session hits the context limit and Claude compacts the conversation, subsequent file reads are "new" from the metrics perspective even though the same files were read earlier. A long session that compacts twice can show 5 MB of throughput when the actual unique data processed was 2 MB. This is not a bug in the metrics. It accurately reflects the data that flowed through the session. But it should not be compared directly to a short session that never compacted.
Low Concurrency Is Not Always Bad
A developer working on a complex, tightly-coupled refactoring across 15 files cannot safely run concurrent sessions. The files conflict. The changes depend on each other. Concurrency of 1 is correct here. Do not force parallelism on tasks that are inherently sequential. The metric is diagnostic, not prescriptive.
Tool Diversity Ceiling
There are only 7 core tools (Read, Edit, Bash, Grep, Glob, Write, Agent). A diversity score of 7 is the maximum. Once you hit 5-6 regularly, further improvement is marginal. Focus on other metrics instead.
Gaming Metrics vs Genuine Improvement
It is possible to inflate every metric artificially. Run empty Bash commands to raise APM. Spawn unnecessary subagents to raise multitasking. Read large files you do not need to inflate throughput. None of this makes you more productive. The metrics exist to reveal patterns in genuine work. If you change your behaviour to please the metrics rather than to improve your workflow, the metrics lose their diagnostic value.
Metrics for Teams
Individual metrics tell you about one developer's interaction patterns. Team-level aggregates tell you about workflow health, bottleneck distribution, and adoption maturity. If you lead a team using Claude Code, here is how to read the aggregate data.
Team Aggregation
The Control Center can display metrics across all team members. The useful aggregates are:
- Median EAPM/APM ratio across the team. This shows overall prompt quality and environment health. If one developer has a ratio of 95% and another has 40%, the second developer likely has environment issues (broken tests, missing dependencies, misconfigured tools) that are worth investigating together.
- Concurrency distribution. How many team members are running concurrent sessions? If everyone is at concurrency 1, the team is not running parallel workflows. A team-wide session on worktrees and concurrent Claude Code usage can shift this.
- Tool diversity spread. If most of the team uses 3-4 tools and one person uses 7, that person has found workflows worth sharing. If everyone is at 2 (Read + Bash), the team is under-utilising Claude Code's capabilities.
Using Metrics in Retrospectives
Metrics belong in retrospectives, not performance reviews. The goal is to surface workflow patterns, not to rank individuals. Productive retrospective questions:
- "Our team median EAPM/APM ratio dropped from 85% to 70% this sprint. What changed in our environment?" (Maybe a dependency broke, or the test suite became flaky.)
- "Two team members have concurrency above 3 while the rest are at 1. What are they doing differently?" (Maybe they discovered worktrees, or they split their tickets into parallelisable chunks.)
- "Our total throughput doubled this sprint but our EAPM stayed flat. Are we processing more data or just retrying more?" (Check the error rate to distinguish.)
Never use metrics to compare individual output. A developer working on a hard, unfamiliar problem will naturally have lower EAPM than one making routine changes. The metrics reflect task difficulty as much as developer capability.
Identifying Team Bottlenecks
When team throughput plateaus despite growing session counts, look for these bottlenecks:
- Serial review dependencies. If one developer's work blocks another's review, concurrency for the team is artificially capped. Automated PR reviews with Claude Code GitHub Actions can unblock this.
- Shared resource contention. If multiple sessions need to modify the same files or run tests on the same database, concurrent sessions interfere with each other. Worktrees and isolated test environments fix this.
- Knowledge silos. If only one team member has high tool diversity (because they are the only one who knows how to use subagents or MCP tools), the team's aggregate potential is constrained. Pair sessions and shared CLAUDE.md configurations distribute knowledge.
Real-World Metric Profiles
Abstract ranges are useful but concrete examples are better. These are actual metric snapshots from real Claude Code sessions.
A Debugging Session
Task: Tracking down a race condition in session lifecycle management. The bug only reproduced under concurrent load.
| Metric | Value | Notes |
|---|---|---|
| APM | 14.2 | Rapid iteration: read logs, hypothesise, test, repeat |
| EAPM | 11.8 | 83% ratio. Several failed grep patterns before finding the right log entries |
| Concurrency | 1 | Sequential by necessity. The bug was in concurrency handling, so running concurrent sessions would have confused the investigation |
| Throughput | 1.4 MB | Moderate. Mostly log file reads and code inspection |
| Tool Diversity | 5 | Read, Grep, Bash, Edit, Glob. No subagents needed |
| Multitasking | 0 | Single session, no delegation |
| Duration | 38 minutes | |
Data source: first-party session recorded by the Control Center described above, as of 2026-04.
What the metrics reveal: Clean debugging session. The 83% EAPM ratio shows some wasted effort (the failed grep patterns) but is within the healthy range. High APM with tool diversity of 5 is characteristic of the Debugger profile. The fix was a 3-line change to add a mutex guard.
A Large Refactoring Session
Task: Extracting a shared module from three Rust extensions into a common library. Touched 23 files across 4 directories.
| Metric | Value | Notes |
|---|---|---|
| APM | 18.7 | High. Many file reads, edits, and build checks |
| EAPM | 17.1 | 91% ratio. Very clean execution because the pattern was well understood |
| Concurrency | 2 | Main session for refactoring, second session running tests continuously |
| Throughput | 8.3 MB | High. 23 files read and modified, multiple full builds |
| Tool Diversity | 7 | All tools used including Agent for parallel exploration |
| Multitasking | 45 | 2 concurrent sessions + subagent delegation for finding all import paths |
| Duration | 1 hour 12 minutes | |
Data source: first-party session recorded by the Control Center described above, as of 2026-04.
What the metrics reveal: Textbook Refactorer profile. The 91% EAPM ratio is excellent for a task this large and indicates clear understanding of the target architecture before starting. Concurrency of 2 (refactoring + continuous testing) is the optimal pattern for this kind of work: catch breakages immediately instead of discovering them at the end. The throughput of 8.3 MB reflects the scale of the change.
Before and After: Environment Fix
Before (broken test suite, one week):
- Average APM: 12.4
- Average EAPM: 6.1 (49% ratio)
- Average errors per session: 8.3
- Root cause: Flaky integration test that failed 40% of the time. Claude retried the test suite on every failure.
After (test suite fixed, following week):
- Average APM: 10.8 (slightly lower, less frantic)
- Average EAPM: 9.9 (92% ratio)
- Average errors per session: 1.1
- The fix: Isolated the flaky test into its own test target so it did not block the main suite.
What the metrics reveal: The APM actually decreased after the fix, which might look like a productivity decline. But EAPM nearly doubled. The developer was spending less total time but accomplishing more. This is a case where the EAPM/APM ratio told the true story while raw APM was misleading.
How Data Is Collected
All metrics are derived from hook events that Claude Code sends to systemprompt.io. Every tool execution, prompt submission, error, and session lifecycle event (start, stop) is captured as a deterministic event with precise timestamps and byte counts. No estimation or sampling is involved; every metric is calculated from actual observed events.
When you install the hooks into your Claude Code configuration, they fire HTTP requests on each lifecycle event. These events are stored as raw records and then aggregated into the metrics you see in the Control Center.
Event Types
The following event types feed into the metrics calculations:
- `PostToolUse`. A tool was successfully executed (Read, Edit, Bash, Grep, Glob, Write, Agent). This is the primary event for APM, EAPM, and tool diversity calculations.
- `PostToolUseFailure`. A tool execution failed. Used to calculate the error count that separates EAPM from APM.
- `UserPromptSubmit`. You submitted a prompt to Claude. Counted in both APM and EAPM.
- `SessionStart` / `SessionEnd` / `Stop`. Session lifecycle events. Used to calculate session duration, concurrency, and to delineate session boundaries.
- `SubagentStart` / `SubagentStop`. Subagent lifecycle events. Used in the multitasking score calculation to track how many subagents were spawned.
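The event-to-metric mapping can be sketched as a single aggregation pass. This is an illustrative sketch, assuming each raw event is a dict with `event_type` and `tool_name` fields; the actual record shape in plugin_usage_events is an assumption here.

```python
def aggregate_events(events: list[dict]) -> dict:
    """Tally raw hook events into the inputs the metric formulas consume."""
    tool_uses = prompts = errors = subagent_spawns = 0
    tools_seen: set[str] = set()
    for e in events:
        kind = e["event_type"]
        if kind == "PostToolUse":          # successful tool execution
            tool_uses += 1
            tools_seen.add(e["tool_name"])
        elif kind == "PostToolUseFailure": # failed tool execution
            errors += 1
        elif kind == "UserPromptSubmit":   # prompt from the user
            prompts += 1
        elif kind == "SubagentStart":      # subagent spawned
            subagent_spawns += 1
    return {
        "tool_uses": tool_uses,
        "prompts": prompts,
        "errors": errors,
        "subagent_spawns": subagent_spawns,
        "tool_diversity": len(tools_seen),
    }

events = [
    {"event_type": "UserPromptSubmit"},
    {"event_type": "PostToolUse", "tool_name": "Read"},
    {"event_type": "PostToolUse", "tool_name": "Grep"},
    {"event_type": "PostToolUseFailure", "tool_name": "Bash"},
    {"event_type": "PostToolUse", "tool_name": "Read"},
]
counts = aggregate_events(events)
print(counts)
```

Note that the failed Bash call raises the error count but does not add Bash to the diversity set, matching the "successful tool executions" filter in the tool diversity data source.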
Daily Reports
These metrics are aggregated into daily summaries at 11 PM UTC. The aggregation process calculates totals, averages, and peaks across all sessions for the day. An AI-generated analysis accompanies each daily summary, providing insights about patterns, identifying skill gaps, and offering recommendations for improving your workflow.
Historical data is preserved indefinitely, allowing you to track trends over weeks and months. The Control Center displays both the current day's live metrics and historical daily summaries for comparison.
Tracking Progress Over Time
The Control Center preserves daily summaries indefinitely. This history is where the real value lives. A single day's metrics are a snapshot. A week of daily summaries shows a trend.
What to look for weekly
Review your metrics at the end of each week. Focus on three questions:
- Is my EAPM/APM ratio improving? A rising ratio means fewer errors per session. This is the most directly actionable trend because it responds to better prompt quality and environment hygiene.
- Is my concurrency stable or growing? If you are stuck at concurrency 1 week after week, you have not adopted parallel workflows. Even moving to 2 concurrent sessions doubles your effective capacity.
- Has my tool diversity changed? If you added a new tool to your workflow (for example, starting to use Agent for subagent delegation), it should show up as a bump in tool diversity. If it does not, the tool is not being triggered.
Setting personal baselines
After two weeks of data, calculate your averages for each metric. These become your personal baselines. Deviations from baseline are more meaningful than absolute numbers.
For example, if your baseline EAPM is 12 and you see a day at 5, something changed. Maybe you worked in an unfamiliar codebase, or your test suite was broken, or you spent the day on a task that required more reading than writing. The metric flags the deviation; you supply the context.
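A baseline check like this is easy to script. The sketch below assumes daily EAPM values as a plain list; the two-standard-deviation threshold is an illustrative choice, not a Control Center rule.

```python
from statistics import mean, stdev

def flag_deviation(history: list[float], today: float,
                   threshold: float = 2.0) -> bool:
    """Flag a day whose metric sits more than `threshold` standard
    deviations from the personal baseline built from prior days."""
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        return today != baseline
    return abs(today - baseline) / spread > threshold

# Two weeks of daily EAPM values, then a day at 5.
eapm_history = [12, 11, 13, 12, 14, 12, 11, 13, 12, 12, 13, 11, 12, 14]
print(flag_deviation(eapm_history, today=5))   # flagged: well below baseline
print(flag_deviation(eapm_history, today=12))  # not flagged: near baseline
```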
Comparing across projects
Different projects produce different metric profiles. A greenfield project typically shows higher throughput and higher tool diversity (lots of file creation and search). A maintenance project shows lower throughput but potentially higher EAPM (focused, precise edits). Do not compare metrics across fundamentally different project types.
Cost Signals Alongside Performance Metrics
Performance metrics tell you how you are working. Cost per action tells you what that work is costing. The API price per million tokens varies by model, so the same APM profile produces very different bills depending on which Claude model your sessions are hitting. The published API prices are available on the Anthropic pricing page.
| Model | Input ($/Mtok) | Output ($/Mtok) | Where to confirm |
|---|---|---|---|
| Claude Opus (latest) | see page | see page | anthropic.com/pricing |
| Claude Sonnet (latest) | see page | see page | anthropic.com/pricing |
| Claude Haiku (latest) | see page | see page | anthropic.com/pricing |
Data source: Anthropic API pricing page, as of 2026-04. Prices and model names change; always confirm on the live page. Prompt caching and batch API discounts are documented in the Anthropic prompt caching guide and the Anthropic batch API docs.
Throughput in the Control Center is measured in bytes. To convert bytes into token counts for cost estimation, a common rule of thumb documented by Anthropic's tokenisation reference is roughly 4 characters per token for English text and source code. A session with 2 MB of throughput is therefore on the order of 500,000 tokens split between input and output. Multiply by the relevant price in the table above to estimate the session's API cost, then cross-check against your Anthropic console usage report.
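That arithmetic can be wrapped in a small estimator. The 80/20 input/output split and the $3/$15 per-million-token prices below are illustrative placeholders, not current Anthropic prices; pull real numbers from the live pricing page.

```python
def estimate_session_cost(throughput_bytes: int,
                          input_fraction: float,
                          input_price_per_mtok: float,
                          output_price_per_mtok: float,
                          chars_per_token: float = 4.0) -> float:
    """Rough API cost in dollars for one session, from its byte throughput.

    Uses the ~4-characters-per-token rule of thumb; the input/output split
    and prices are caller-supplied assumptions.
    """
    tokens = throughput_bytes / chars_per_token
    input_tokens = tokens * input_fraction
    output_tokens = tokens - input_tokens
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000

# The 2 MB example above (~500k tokens), assuming an 80/20 input/output
# split and illustrative prices of $3 in / $15 out per million tokens.
cost = estimate_session_cost(2_000_000, input_fraction=0.8,
                             input_price_per_mtok=3.0,
                             output_price_per_mtok=15.0)
print(round(cost, 2))  # 2.7
```

Treat the result as an order-of-magnitude figure and cross-check it against your console usage report, as noted above.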
For Anthropic API availability and incident history that may explain anomalies in your metrics (spikes in errors or latency), check the Anthropic status page.
Mapping Metrics to Observability Dashboards
The six Control Center metrics are one view. If you already run an observability stack, you can correlate them with standard telemetry pipelines. The mapping below uses only tools with published documentation, so every link points to a primary source.
| Control Center metric | Analogous observability signal | Open-source tooling and spec |
|---|---|---|
| APM | Request rate (RPS) for LLM API calls | Prometheus counter and rate() docs |
| EAPM | Success-only request rate (errors excluded) | OpenTelemetry semantic conventions for errors |
| Concurrency | Active in-flight requests / span concurrency | OpenTelemetry tracing spec, Grafana time-series panels |
| Throughput | Bytes-in / bytes-out per service | Prometheus histograms, rendered with Grafana bar gauge |
| Tool Diversity | Cardinality of operation names in traces | OpenTelemetry span attributes |
| Multitasking Score | Parent/child span fan-out in distributed traces | OpenTelemetry context and propagation |
Data source: tool mappings built from the Prometheus querying docs, the OpenTelemetry specification, and the Grafana documentation, as of 2026-04.
The mapping is deliberately one-directional. The Control Center metrics aggregate Claude Code hook events server-side; Prometheus and OpenTelemetry are where you would build parallel dashboards on top of your own application telemetry if you want to correlate Claude Code activity with production system behaviour.
Detailed Metric Documentation
Each metric has a dedicated reference page with additional examples, edge cases, and technical details:
- APM, Actions Per Minute
- EAPM, Effective Actions Per Minute
- Concurrency, Parallel Session Count
- Throughput, Data Volume
- Tool Diversity, Unique Tools Used
- Multitasking Score, Delegation and Parallelism
The Takeaway
The six metrics in the Control Center give you a complete, deterministic picture of how you interact with Claude Code. APM and EAPM measure your speed and accuracy. Concurrency and multitasking score measure your ability to parallelise. Throughput measures the volume of work flowing through your sessions. Tool diversity measures the breadth of your toolkit usage.
None of these metrics have a universally "correct" value. A focused debugging session will naturally have different metric profiles than a large refactoring session. The value is in tracking your own patterns over time, identifying sessions where metrics deviate from your norm, and understanding why.
Read them in combination, not isolation. A high APM with low EAPM means errors are eating your speed. High concurrency with low tool diversity means you are running parallel sessions but each one is doing simple work. High throughput with low EAPM means large volumes of data are flowing but much of the work is being retried.
Use the daily AI-generated insights as a starting point, then drill into the metrics that matter most for your workflow. If you are consistently seeing a gap between APM and EAPM, focus on reducing errors. If your concurrency is always 1, consider whether some tasks could run in parallel. If your tool diversity is low, explore whether Grep or Glob could replace manual file navigation.
The data is there. The metrics make it legible.