SOC Metrics & KPIs Quick Reference
How to use this sheet
Formulas use standard statistical notation. Benchmarks are drawn from Ponemon, SANS, Gartner, and Verizon DBIR research. Level designations (L1–L5) refer to SOC maturity, not analyst tier.
1. Core Time Metrics
| Metric | Full Name | Formula | Industry Benchmark | L4 SOC Target | Primary Data Source | Common Pitfalls |
| MTTD | Mean Time to Detect | Σ(Detection_time − Breach_start_time) / n | 194 days (Ponemon 2023) | < 24 hours | SIEM first-alert timestamp vs. threat intel confirmed IOC time | Starting the clock when the alert fires, not when the attacker entered, understates the true detection gap |
| MTTA | Mean Time to Acknowledge | Σ(Acknowledge_time − Alert_created_time) / n | 4–8 hours (industry) | < 15 minutes | Ticket system timestamps (alert created → analyst assigns) | Analysts mark "acknowledged" immediately to game the metric; use SLA compliance rate instead |
| MTTR | Mean Time to Respond | Σ(Response_action_time − Alert_created_time) / n | 15–24 hours (SANS 2023) | < 4 hours | Ticket system: alert created → first remediation action recorded | Conflates acknowledgement with effective response; track both separately |
| MTTC | Mean Time to Contain | Σ(Contain_time − Breach_confirmed_time) / n | 63 days (Ponemon) | < 2 hours (for detected threats) | IR platform: incident created → host/account isolated | Only start clock at confirmed incident, not every alert |
| Dwell Time | Attacker persistence duration | Detection_date − First_evidence_of_compromise | 16 days (Mandiant 2023) | < 1 day | Threat intel + forensic timeline vs. SIEM first-alert | Requires post-incident forensics to measure accurately; rarely measured real-time |
| Alert Backlog Age | Oldest unreviewed alert age | NOW() − oldest_unreviewed_alert_timestamp | < 24 hours healthy | < 4 hours | SIEM/SOAR queue | Growing backlog indicates capacity problem, not analyst performance problem |
MTTD vs. Dwell Time Distinction
MTTD = average time from intrusion to your SOC detecting it, in practice computed from SIEM first-alert timestamps. Dwell Time = how long the attacker was actually present before detection, reconstructed from forensic evidence and sometimes discovered months later during IR. The two are rarely equal: the SIEM clock starts at the first alert, while forensics reveal the true start of compromise.
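The distinction can be made concrete with a short sketch (timestamps and field names are hypothetical): MTTD computed from SIEM first-alert times will look far better than the dwell time reconstructed from forensic first evidence.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records: SIEM first-alert time vs. forensic first evidence.
incidents = [
    {"first_evidence": datetime(2024, 1, 1, 8, 0),   # from post-IR forensics
     "first_alert":    datetime(2024, 1, 10, 9, 0),  # SIEM timestamp
     "detected":       datetime(2024, 1, 10, 9, 30)},
    {"first_evidence": datetime(2024, 2, 3, 12, 0),
     "first_alert":    datetime(2024, 2, 4, 6, 0),
     "detected":       datetime(2024, 2, 4, 7, 0)},
]

# SIEM-based MTTD in hours: detection confirmed minus first alert (what most SOCs report)
mttd_hours = mean(
    (i["detected"] - i["first_alert"]).total_seconds() / 3600 for i in incidents
)
# Dwell time per incident in days: detection minus forensic first evidence of compromise
dwell_days = [
    (i["detected"] - i["first_evidence"]).total_seconds() / 86400 for i in incidents
]
print(f"MTTD: {mttd_hours:.2f} h; dwell times: {[round(d, 1) for d in dwell_days]} days")
```

Here the SIEM-based MTTD is under an hour, while the forensic dwell times run to days, which is why the two must be reported separately.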
2. Detection Quality Metrics
| Metric | Formula | Definition | Good Threshold | Concerning Threshold | Data Source |
| True Positive Rate (Recall / Sensitivity) | TP / (TP + FN) | % of real attacks that generated an alert | > 90% | < 70% | Red team/purple team exercises; confirmed incidents |
| False Positive Rate (FPR) | FP / (FP + TN) | % of benign events that triggered alert | < 5% | > 20% | Analyst disposition tags in ticketing system |
| Precision (Positive Predictive Value) | TP / (TP + FP) | % of alerts that are real threats | > 50% | < 20% | Closed ticket disposition (True/False positive) |
| F1 Score | 2 × (Precision × Recall) / (Precision + Recall) | Harmonic mean balancing precision + recall | > 0.70 | < 0.50 | Derived from precision/recall |
| Detection Coverage % | Techniques_with_validated_rules / Total_ATT&CK_techniques × 100 | ATT&CK framework coverage breadth | > 60% for top-30 techniques | < 30% | Navigator layer export vs. SIEM rule inventory |
| Rule Health Ratio | Rules_fired_in_30d / Total_deployed_rules × 100 | % of rules generating at least one alert | > 70% | < 40% | SIEM rule analytics; rule management platform |
| Stale Rule Rate | Rules_not_fired_in_90d / Total_rules × 100 | Rules that have not fired in the recent window | < 15% | > 35% | SIEM audit; detection engineering backlog |
| Rule Noise Score | FP_count_per_rule / Total_alerts_per_rule | Per-rule false positive density | < 0.30 | > 0.70 | SIEM + ticket disposition correlation |
The Precision-Recall Tradeoff
High-sensitivity rules catch more attacks (high recall) but also alert on more benign events (low precision = alert fatigue). High-precision rules generate fewer false positives but miss more real attacks (lower recall). Target: Precision > 50% while keeping Recall > 85% for critical technique coverage.
Confusion Matrix Reference
| | Actual Threat | Actual Benign |
| Alert Fired | True Positive | False Positive |
| No Alert | False Negative | True Negative |
- TP → Correctly detected attack → Close as Confirmed Incident
- FP → Alert on benign activity → Tune rule; reduces analyst trust
- FN → Missed attack → Worst outcome; only found via IR or red team
- TN → No alert on benign activity → Expected; not directly measurable at scale
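The quality metrics above can be computed directly from disposition counts; a minimal sketch (the counts are illustrative, with FN supplied by red-team exercises as the table notes):

```python
# Detection-quality metrics from ticket dispositions plus red-team results.
# TP/FP come from analyst disposition tags; FN requires red/purple-team data.
def detection_quality(tp: int, fp: int, fn: int) -> dict:
    precision = tp / (tp + fp) if tp + fp else 0.0   # PPV: % of alerts that are real
    recall = tp / (tp + fn) if tp + fn else 0.0      # TPR: % of real attacks alerted
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # harmonic mean of P and R
    return {"precision": precision, "recall": recall, "f1": f1}

# Example: 45 confirmed incidents, 55 false positives, 5 missed attacks (red team)
m = detection_quality(tp=45, fp=55, fn=5)
print(m)  # precision 0.45 (below the >50% target), recall 0.90, f1 0.60
```

Note how a SOC can sit comfortably above the recall target while still failing the precision target, which is exactly the tradeoff described above.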
3. Alert Volume Metrics & Benchmarks
| Metric | Formula | Industry Benchmark | Healthy Range | Red Flag |
| Alerts per Analyst per Day | Total_daily_alerts / Analyst_FTE_count | 100–500 (raw SIEM) | < 50 (post-triage, actionable) | > 100 actionable = burnout risk |
| Alert-to-Incident Ratio | Total_alerts / Confirmed_incidents | 100:1 to 1000:1 | < 200:1 with good tuning | > 500:1 = rule quality crisis |
| Escalation Rate T1→T2 | T2_escalations / T1_alerts_closed × 100 | 10–20% | 5–15% | > 30% = T1 undertrained; < 5% = T1 over-closing |
| Auto-Close Rate (SOAR) | SOAR_auto_closed / Total_alerts × 100 | 20–40% (mature SOAR) | > 25% for low-severity alerts | > 60% auto-close may hide real threats |
| FP Rate by Category | FP_in_category / Total_in_category × 100 | Varies by rule type | < 20% per category | > 40% = rule needs immediate tuning |
| Mean Alerts per Incident | Total_alerts / Total_incidents | 50–200 | < 100 for correlated SIEM | Rising trend = detection fragmentation |
| SOAR Playbook Completion Rate | Completed_playbooks / Triggered_playbooks × 100 | > 85% | > 90% | < 75% = playbook errors or data quality issues |
| Analyst Alert Handling Time | Σ(time_per_alert) / alert_count | 5–15 min/alert | < 8 min for L1 triage | > 20 min = complexity issue or missing tools |
Alert Fatigue Warning Signs
- Analysts bulk-closing alerts without review (disposition = "Not investigated")
- MTTA worsening week-over-week despite same staffing
- Escalation rate dropping as alert volume rises (analysts skipping escalations)
- High sick leave / turnover rate in analyst staff
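The MTTA warning sign above lends itself to an automated check; a minimal sketch (weekly values and the three-week window are hypothetical choices):

```python
# Hypothetical weekly median MTTA values in minutes, oldest first.
weekly_mtta = [12, 14, 17, 22]

def mtta_worsening(series, weeks=3):
    """Flag alert-fatigue risk when MTTA rises for `weeks` consecutive weeks."""
    recent = series[-(weeks + 1):]
    return all(later > earlier for earlier, later in zip(recent, recent[1:]))

if mtta_worsening(weekly_mtta):
    print("Warning: MTTA worsening week-over-week; check capacity, not analysts")
```

A strictly rising series trips the flag; a single down week resets it, which keeps the check from alerting on normal variance.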
4. SOC Maturity Level Metric Thresholds
| Maturity Level | Description | MTTD Target | MTTA Target | MTTR Target | ATT&CK Coverage | FP Rate | Automation |
| L1 — Ad Hoc | No formal processes; reactive | > 30 days | > 24 hours | > 72 hours | < 15% | > 50% | None |
| L2 — Developing | Basic SIEM; some documented procedures | 7–30 days | 4–24 hours | 24–72 hours | 15–30% | 30–50% | Basic alerting |
| L3 — Defined | SIEM + SOAR; defined playbooks; threat intel feed | 1–7 days | 1–4 hours | 4–24 hours | 30–60% | 15–30% | Playbook automation |
| L4 — Managed | Proactive hunting; continuous tuning; metrics-driven | < 24 hours | < 15 min | < 4 hours | 60–80% | 5–15% | High SOAR automation |
| L5 — Optimizing | Autonomous detection; ML; red team integration | < 1 hour | < 5 min | < 1 hour | > 80% | < 5% | Near-full automation |
Maturity Model Caveat
L5 is aspirational for almost all organizations. Most mature enterprise SOCs operate at L3–L4. Do not sacrifice detection quality (recall) to achieve L4 speed targets; a fast MTTR on a false positive is worthless.
5. Analyst Productivity Metrics
What TO Measure
| Metric | Why It Matters | How to Measure |
| Alert handling time (median, not mean) | Identifies complexity spikes; use median to avoid outlier skew | Ticket system time-in-state |
| Escalation quality (T2 agrees with T1 escalation?) | Measures triage accuracy without incentivizing under-escalation | T2 disposition vs. T1 escalation flag |
| Detection rule contribution (rules authored per analyst) | Tracks capability growth | Detection engineering backlog |
| Playbook feedback rate | Analysts identifying process gaps | SOAR feedback tags |
| Training completion + certification pace | Skills development | LMS / certification tracker |
| MTTA variance (are analysts slower at end of shift?) | Staffing model insight | Time-series analysis of MTTA by hour |
What NOT TO Measure (Avoid Perverse Incentives)
Metrics That Harm SOC Culture
| Avoid This Metric | Why It Backfires | Better Alternative |
| Alerts closed per day (raw volume) | Incentivizes bulk-closing without investigation | Alerts investigated with documented disposition |
| Zero false positives per analyst | Analysts become over-conservative; miss real threats | Team-level FP rate with blameless review |
| MTTA as individual KPI | Analysts mark alerts acknowledged immediately without reading | Pair MTTA with first-meaningful-action timestamp |
| No escalations = good analyst | Suppresses escalation of ambiguous threats | Track escalation accuracy, not rate |
| Lines of KQL written per week | Incentivizes verbose, low-quality rules | Rule effectiveness (TPR per rule) |
| Presence/hours online | Doesn't reflect investigation quality | Outcome-based metrics above |
6. Executive Dashboard Template
Board Level (Quarterly)
| Metric | Visualization | Audience Framing |
| Breach risk score trend | RAG status gauge | "Are we better or worse than last quarter?" |
| MTTD vs. industry benchmark | Bar chart with benchmark line | "How fast do we find attackers?" |
| Critical incidents (confirmed) | Count + trend | "How many real attacks this quarter?" |
| Security control effectiveness | % controls passing | "Are our investments working?" |
| Regulatory compliance posture | % of in-scope controls compliant | "Are we meeting obligations?" |
CISO Level (Monthly)
| Metric | Visualization | Notes |
| MTTD / MTTA / MTTR (trending) | Line charts, 13-month view | Show improvement trajectory |
| ATT&CK coverage heatmap | Navigator export | Show gaps vs. threat actor profiles |
| Top 5 attack techniques observed | Bar chart | Tie to threat landscape |
| Incident volume by severity | Stacked bar | S1/S2 incidents are CISO-level items |
| Detection rule health | Fired % / stale % | Rule debt indicator |
| SOAR automation rate | Trend line | ROI of SOAR investment |
| Analyst capacity vs. alert volume | Dual axis chart | Staffing justification data |
SOC Manager Level (Weekly)
| Metric | Visualization | Action Trigger |
| Alert backlog age (oldest open) | Single number + trend | > 24h = staffing/process issue |
| Analyst MTTA by shift | Heatmap | Identifies shift-specific delays |
| FP rate by rule (top 10 noisiest) | Table | Tune immediately if > 40% |
| Escalation rate T1→T2 | Trend | Alert if drops sharply |
| SOAR playbook failure rate | Bar chart per playbook | Fix broken playbooks |
| New threat intel integrations | Count | Track intel coverage expansion |
| Open incidents by severity/age | Kanban-style table | S1/S2 aging > 4h = escalation |
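The action triggers in the weekly table can be checked mechanically; a sketch with hypothetical severity thresholds (only the S1 > 4h and S2 > 24h triggers come from this sheet, so the S3/S4 limits below are placeholders):

```python
from datetime import datetime, timedelta

# S1/S2 thresholds from the dashboard's action triggers; S3/S4 are assumed placeholders.
SLA_HOURS = {"S1": 4, "S2": 24, "S3": 72, "S4": 168}

def aging_breaches(open_incidents, now):
    """Return IDs of open incidents whose age exceeds their severity's threshold."""
    breaches = []
    for inc in open_incidents:
        limit = timedelta(hours=SLA_HOURS[inc["severity"]])
        if now - inc["opened"] > limit:
            breaches.append(inc["id"])
    return breaches

now = datetime(2024, 5, 1, 12, 0)
incidents = [
    {"id": "INC-1", "severity": "S1", "opened": datetime(2024, 5, 1, 6, 0)},  # 6h old
    {"id": "INC-2", "severity": "S2", "opened": datetime(2024, 5, 1, 0, 0)},  # 12h old
]
print(aging_breaches(incidents, now))  # ['INC-1']: the S1 is past its 4h trigger
```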
7. Metrics Pitfalls — Top 10 Mistakes & Fixes
| # | Mistake | Why It Hurts | Fix |
| 1 | Measuring MTTD only from first alert, not first evidence | Severely understates true detection gap; hides dwell time | Supplement with post-incident forensic timeline |
| 2 | Using mean instead of median for time metrics | Outlier major incidents (30-day dwell) skew mean massively | Report P50, P90, P99 percentiles |
| 3 | Counting all alerts equally (SOAR auto-closes mixed with analyst reviews) | Inflates "alerts handled" count; hides analyst load problem | Separate automated dispositions from analyst-reviewed |
| 4 | Reporting raw ATT&CK technique coverage % without sub-technique detail | One high-level rule might claim coverage of 50 sub-techniques | Score at sub-technique level; require validated detections |
| 5 | Not tracking false negatives | FN rate is invisible without red team/purple team exercises | Run quarterly adversary simulations; track missed detections |
| 6 | Treating metric improvement as the goal (Goodhart's Law) | When a measure becomes a target, it ceases to be a good measure | Rotate metrics annually; use leading + lagging indicators |
| 7 | Defining MTTR to include investigation time, not just remediation | Inflates MTTR; obscures whether response is fast once a decision is made | Split: MTTI (investigate) + MTTC (contain) + MTTR (remediate) |
| 8 | Not segmenting metrics by asset criticality | P3 server incident treated same as P1 crown-jewel server | Weight metrics by asset criticality tier |
| 9 | Publishing analyst-level metrics publicly in the team | Creates unhealthy competition; degrades team knowledge sharing | Use team-level metrics; use individual metrics only in 1:1s |
| 10 | Ignoring trend direction in favor of absolute values | A MTTD of 12 hours looks good if benchmark is 194 days; bad if it was 6 hours last month | Always show 13-month trend alongside current value |
8. KPI Formula Reference
| KPI | Formula | Unit | Notes |
| MTTD | Σ(t_detect − t_intrusion) / n | Hours | t_intrusion from forensics or IOC first-seen |
| MTTA | Σ(t_ack − t_alert_created) / n | Minutes | Ticket system timestamps |
| MTTR | Σ(t_remediate − t_incident_open) / n | Hours | IR platform timestamps |
| MTTC | Σ(t_contain − t_confirmed) / n | Hours | Start at incident confirmation, not alert |
| Dwell Time | t_detection − t_first_evidence | Days | Post-incident forensics required |
| TPR (Recall) | TP / (TP + FN) | % | Requires red team data for FN |
| FPR | FP / (FP + TN) | % | Ticket disposition tags |
| Precision (PPV) | TP / (TP + FP) | % | Ticket disposition tags |
| F1 Score | 2 × P × R / (P + R) | 0–1 | P = Precision, R = Recall (TPR) |
| F-Beta Score | (1+β²) × P × R / (β²×P + R) | 0–1 | β>1 favors recall; β<1 favors precision |
| ATT&CK Coverage | Rules_validated / Total_sub-techniques × 100 | % | Navigator layer export |
| Alert-to-Incident | Total_alerts / Confirmed_incidents | Ratio | Lower is better (less noise) |
| Auto-Close Rate | SOAR_auto_closed / Total_alerts × 100 | % | Only count low-severity categories |
| Escalation Accuracy | T2_confirmed_T1_escalations / T1_escalations × 100 | % | T2 agreement with T1 call |
| Rule Health | Rules_fired_30d / Total_rules × 100 | % | Stale = not fired in 90d |
| Analyst Load | Total_actionable_alerts / Analyst_FTE | Alerts/analyst/day | Target < 50 actionable |
| SOAR ROI | Hours_saved_by_automation × Analyst_hourly_rate | Currency | Document per playbook |
| Breach Cost Avoidance | Industry_avg_breach_cost × Breaches_prevented | Currency | Requires incident classification |
| SLA Compliance | Incidents_resolved_within_SLA / Total_incidents × 100 | % | Track by severity tier |
| Coverage Gap Rate | Techniques_with_zero_detections / Total_ATT&CK_techniques × 100 | % | Techniques with zero detections |
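The F1 and F-Beta rows translate directly to code; a short sketch showing how β > 1 shifts the score toward recall:

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-Beta per the table: (1+β²)·P·R / (β²·P + R). β>1 favors recall; β<1 precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# For SOC detection, missed attacks (FN) usually cost more than alert noise, so a
# recall-weighted F2 may suit detection rules better than plain F1.
p, r = 0.45, 0.90  # illustrative precision/recall values
print(f"F1 = {f_beta(p, r):.2f}, F2 = {f_beta(p, r, beta=2):.2f}")  # F1 = 0.60, F2 = 0.75
```

The same rule scores 0.60 on F1 but 0.75 on F2, because F2 rewards its high recall despite the noisy precision.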
Percentile vs. Mean for Time Metrics
P50 (Median) = 50% of incidents resolved within this time ← Primary KPI
P90 = 90% of incidents resolved within this time ← SLA basis
P99 = 99% of incidents resolved within this time ← Outlier tracking
Mean = Useful only when distribution is symmetric ← Often misleading
Report P50 and P90 as your headline numbers. Use mean only in internal statistical analysis.
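The percentile guidance can be demonstrated with Python's statistics module (the sample MTTR values are hypothetical; one 30-day outlier is included deliberately):

```python
from statistics import mean, quantiles

# Hypothetical MTTR samples in hours; the 720h (30-day) outlier skews the mean.
mttr_hours = [2, 3, 3, 4, 4, 5, 6, 8, 10, 720]

# quantiles(n=100, method="inclusive") returns the 1st..99th percentile cut points
pct = quantiles(mttr_hours, n=100, method="inclusive")
p50, p90, p99 = pct[49], pct[89], pct[98]
print(f"P50={p50:.1f}h P90={p90:.1f}h P99={p99:.1f}h mean={mean(mttr_hours):.1f}h")
```

On this sample the median is 4.5 hours while the mean is 76.5 hours: the single outlier drags the mean an order of magnitude away from the typical incident, which is exactly why P50/P90 belong in the headline.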
9. Measurement Cadence Reference
| Metric Type | Review Cadence | Owner | Escalation Trigger |
| MTTA / Alert backlog age | Daily (automated dashboard) | SOC Manager | Backlog age > 24h; MTTA > 1h |
| MTTR / Incident age | Daily for S1/S2; weekly for S3/S4 | Shift Lead | S1 > 4h unresolved; S2 > 24h |
| FP rate by rule | Weekly | Detection Engineer | Any rule > 40% FP |
| ATT&CK coverage | Monthly | Detection Engineering Lead | Gap in top-30 techniques |
| MTTD / Dwell Time | Monthly (post-IR review) | SOC Manager + CISO | Rising trend over 3 months |
| Analyst productivity | Monthly (1:1 review) | SOC Manager | Individual metric outliers > 2σ |
| Executive dashboard | Monthly/Quarterly | CISO | Board-level reporting |
| Maturity assessment | Annually | SOC Director | Score regression from prior year |
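The "> 2σ" escalation trigger for analyst productivity can be sketched as follows (analyst names and values are hypothetical; per section 5, use such results only in 1:1s, never publicly):

```python
from statistics import mean, stdev

# Hypothetical monthly median alert-handling times (minutes) per analyst.
handling_time = {"a1": 7.2, "a2": 6.8, "a3": 7.5, "a4": 6.9,
                 "a5": 7.1, "a6": 7.4, "a7": 6.7, "a8": 14.0}

def outliers_2sigma(metrics: dict) -> list:
    """Names whose value deviates more than 2 sample std devs from the team mean."""
    values = list(metrics.values())
    mu, sigma = mean(values), stdev(values)
    return [name for name, v in metrics.items() if abs(v - mu) > 2 * sigma]

print(outliers_2sigma(handling_time))  # ['a8']
```

A flagged analyst is a conversation starter (complex queue, missing tooling, training gap), not a performance verdict; note also that with very small teams a single extreme value inflates σ enough to hide itself, so the check works best with team-sized samples.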