SOC Metrics & KPIs Quick Reference

How to use this sheet

Formulas use standard statistical notation. Benchmarks are drawn from Ponemon, SANS, Gartner, and Verizon DBIR research. Level designations (L1–L5) refer to SOC maturity, not analyst tier.


1. Core Time Metrics

| Metric | Full Name | Formula | Industry Benchmark | L4 SOC Target | Primary Data Source | Common Pitfalls |
|---|---|---|---|---|---|---|
| MTTD | Mean Time to Detect | Σ(Detection_time − Breach_start_time) / n | 194 days (Ponemon 2023) | < 24 hours | SIEM first-alert timestamp vs. threat-intel-confirmed IOC time | Starting the clock when the alert fires, not when the attacker entered, understates the true detection gap |
| MTTA | Mean Time to Acknowledge | Σ(Acknowledge_time − Alert_created_time) / n | 4–8 hours (industry) | < 15 minutes | Ticket system timestamps (alert created → analyst assigned) | Analysts mark alerts "acknowledged" immediately to game the metric; use SLA compliance rate instead |
| MTTR | Mean Time to Respond | Σ(Response_action_time − Alert_created_time) / n | 15–24 hours (SANS 2023) | < 4 hours | Ticket system: alert created → first remediation action recorded | Conflates acknowledgement with effective response; track both separately |
| MTTC | Mean Time to Contain | Σ(Contain_time − Breach_confirmed_time) / n | 63 days (Ponemon) | < 2 hours (for detected threats) | IR platform: incident created → host/account isolated | Start the clock only at a confirmed incident, not at every alert |
| Dwell Time | Attacker persistence duration | Detection_date − First_evidence_of_compromise | 16 days (Mandiant 2023) | < 1 day | Threat intel + forensic timeline vs. SIEM first alert | Requires post-incident forensics to measure accurately; rarely measured in real time |
| Alert Backlog Age | Oldest unreviewed alert age | NOW() − oldest_unreviewed_alert_timestamp | < 24 hours healthy | < 4 hours | SIEM/SOAR queue | A growing backlog indicates a capacity problem, not an analyst performance problem |

MTTD vs. Dwell Time Distinction

MTTD = time from intrusion to your SOC detecting it. Dwell Time = time attacker was present before any detection — sometimes discovered months later during IR. They are rarely equal. Dwell time requires IR forensics; MTTD comes from your SIEM timestamps.
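The gap between the two numbers is easy to demonstrate. The sketch below uses a hypothetical record layout (field names are ours, not from any specific SIEM or IR schema) to contrast a naive MTTD, with the clock started at the first SIEM alert, against the forensic figure started at first evidence of compromise:

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records: the three timestamps come from different systems.
incidents = [
    {"first_evidence": datetime(2024, 1, 1),   # forensic timeline (IR)
     "first_alert":    datetime(2024, 1, 10),  # SIEM first-alert timestamp
     "detected":       datetime(2024, 1, 10)}, # SOC confirms detection
    {"first_evidence": datetime(2024, 2, 1),
     "first_alert":    datetime(2024, 2, 25),
     "detected":       datetime(2024, 2, 26)},
]

# Naive MTTD: clock starts at the first SIEM alert (the common pitfall).
naive_mttd = mean((i["detected"] - i["first_alert"]).days for i in incidents)

# Forensic MTTD / dwell time: clock starts at first evidence of compromise.
dwell_days = [(i["detected"] - i["first_evidence"]).days for i in incidents]
forensic_mttd = mean(dwell_days)

print(naive_mttd, forensic_mttd)  # 0.5 vs 17.0 days for the same two incidents
```

Same incidents, same detections: the alert-based clock reports half a day while the forensic clock reports 17 days, which is why the two metrics must be kept separate.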


2. Detection Quality Metrics

| Metric | Formula | Definition | Good Threshold | Concerning Threshold | Data Source |
|---|---|---|---|---|---|
| True Positive Rate (Recall / Sensitivity) | TP / (TP + FN) | % of real attacks that generated an alert | > 90% | < 70% | Red/purple team exercises; confirmed incidents |
| False Positive Rate (FPR) | FP / (FP + TN) | % of benign events that triggered an alert | < 5% | > 20% | Analyst disposition tags in ticketing system |
| Precision (Positive Predictive Value) | TP / (TP + FP) | % of alerts that are real threats | > 50% | < 20% | Closed-ticket disposition (true/false positive) |
| F1 Score | 2 × (Precision × Recall) / (Precision + Recall) | Harmonic mean balancing precision and recall | > 0.70 | < 0.50 | Derived from precision/recall |
| Detection Coverage % | Techniques_with_validated_rules / Total_ATT&CK_techniques × 100 | ATT&CK framework coverage breadth | > 60% of top-30 techniques | < 30% | Navigator layer export vs. SIEM rule inventory |
| Rule Health Ratio | Rules_fired_in_30d / Total_deployed_rules × 100 | % of rules generating at least one alert | > 70% | < 40% | SIEM rule analytics; rule management platform |
| Stale Rule Rate | Rules_not_fired_in_90d / Total_rules × 100 | % of rules that have not fired in 90 days | < 15% | > 35% | SIEM audit; detection engineering backlog |
| Rule Noise Score | FP_count_per_rule / Total_alerts_per_rule | Per-rule false positive density | < 0.30 | > 0.70 | SIEM + ticket disposition correlation |

The Precision-Recall Tradeoff

High-sensitivity rules catch more attacks (high recall) but also fire on more benign events (low precision, which drives alert fatigue). High-precision rules generate fewer false alerts but miss more real threats (lower recall). Target: Precision > 50% while keeping Recall > 85% for critical technique coverage.

Confusion Matrix Reference

               │  ACTUAL THREAT   │  ACTUAL BENIGN  │
ALERT FIRED    │  True Positive   │  False Positive │
NO ALERT       │  False Negative  │  True Negative  │
  • TP → Correctly detected attack → Close as Confirmed Incident
  • FP → Alert on benign activity → Tune rule; reduces analyst trust
  • FN → Missed attack → Worst outcome; only found via IR or red team
  • TN → No alert on benign activity → Expected; not directly measurable at scale
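The four cells map directly onto the quality metrics in the table above. A minimal sketch (the function name and the example counts are illustrative):

```python
def detection_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Detection quality metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)         # TPR: share of real attacks alerted on
    fpr = fp / (fp + tn)            # share of benign events that alerted
    precision = tp / (tp + fp)      # share of alerts that were real threats
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "fpr": fpr, "precision": precision, "f1": f1}

# Example month: 90 detected attacks, 10 missed, 60 noisy alerts, 840 quiet benign events.
m = detection_metrics(tp=90, fp=60, fn=10, tn=840)
print(m)  # recall 0.9, precision 0.6, f1 0.72 (inside the "good" thresholds)
```

Note that the FN count, and therefore recall, only exists if red/purple team results feed the disposition data; without them the matrix has a blind column.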

3. Alert Volume Metrics & Benchmarks

| Metric | Formula | Industry Benchmark | Healthy Range | Red Flag |
|---|---|---|---|---|
| Alerts per Analyst per Day | Total_daily_alerts / Analyst_FTE_count | 100–500 (raw SIEM) | < 50 (post-triage, actionable) | > 100 actionable = burnout risk |
| Alert-to-Incident Ratio | Total_alerts / Confirmed_incidents | 100:1 to 1000:1 | < 200:1 with good tuning | > 500:1 = rule quality crisis |
| Escalation Rate T1→T2 | T2_escalations / T1_alerts_closed × 100 | 10–20% | 5–15% | > 30% = T1 undertrained; < 5% = T1 over-closing |
| Auto-Close Rate (SOAR) | SOAR_auto_closed / Total_alerts × 100 | 20–40% (mature SOAR) | > 25% for low-severity alerts | > 60% auto-close may hide real threats |
| FP Rate by Category | FP_in_category / Total_in_category × 100 | Varies by rule type | < 20% per category | > 40% = rule needs immediate tuning |
| Mean Alerts per Incident | Total_alerts / Total_incidents | 50–200 | < 100 for correlated SIEM | Rising trend = detection fragmentation |
| SOAR Playbook Completion Rate | Completed_playbooks / Triggered_playbooks × 100 | > 85% | > 90% | < 75% = playbook errors or data quality issues |
| Analyst Alert Handling Time | Σ(time_per_alert) / alert_count | 5–15 min/alert | < 8 min for L1 triage | > 20 min = complexity issue or missing tools |

Alert Fatigue Warning Signs

  • Analysts bulk-closing alerts without review (disposition = "Not investigated")
  • MTTA worsening week-over-week despite same staffing
  • Escalation rate dropping as alert volume rises (analysts skipping escalations)
  • High sick leave / turnover rate in analyst staff
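The second warning sign is easy to automate. A sketch of a week-over-week trend check (the function name and the three-week window are our illustrative choices, not an established standard):

```python
def mtta_worsening(weekly_mtta_minutes: list[float], weeks: int = 3) -> bool:
    """Flag possible alert fatigue: MTTA rose for `weeks` consecutive weeks."""
    recent = weekly_mtta_minutes[-(weeks + 1):]
    if len(recent) < weeks + 1:
        return False  # not enough history to judge a trend
    return all(a < b for a, b in zip(recent, recent[1:]))

print(mtta_worsening([12.0, 11.0, 13.0, 15.0, 19.0]))  # True: three straight rises
print(mtta_worsening([12.0, 14.0, 13.0, 15.0, 19.0]))  # False: dipped mid-window
```

A check like this belongs on the daily dashboard; paired with flat staffing numbers, a True result points at fatigue rather than workload.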

4. SOC Maturity Level Metric Thresholds

| Maturity Level | Description | MTTD Target | MTTA Target | MTTR Target | ATT&CK Coverage | FP Rate | Automation |
|---|---|---|---|---|---|---|---|
| L1 — Ad Hoc | No formal processes; reactive | > 30 days | > 24 hours | > 72 hours | < 15% | > 50% | None |
| L2 — Developing | Basic SIEM; some documented procedures | 7–30 days | 4–24 hours | 24–72 hours | 15–30% | 30–50% | Basic alerting |
| L3 — Defined | SIEM + SOAR; defined playbooks; threat intel feed | 1–7 days | 1–4 hours | 4–24 hours | 30–60% | 15–30% | Playbook automation |
| L4 — Managed | Proactive hunting; continuous tuning; metrics-driven | < 24 hours | < 15 min | < 4 hours | 60–80% | 5–15% | High SOAR automation |
| L5 — Optimizing | Autonomous detection; ML; red team integration | < 1 hour | < 5 min | < 1 hour | > 80% | < 5% | Near-full automation |

Maturity Model Caveat

L5 is aspirational for almost all organizations; most mature enterprise SOCs operate at L3–L4. Do not sacrifice detection quality (recall) to hit L4 speed targets: a fast MTTR on a false positive is worthless.


5. Analyst Productivity Metrics

What TO Measure

| Metric | Why It Matters | How to Measure |
|---|---|---|
| Alert handling time (median, not mean) | Identifies complexity spikes; median avoids outlier skew | Ticket system time-in-state |
| Escalation quality (does T2 agree with the T1 escalation?) | Measures triage accuracy without incentivizing under-escalation | T2 disposition vs. T1 escalation flag |
| Detection rule contribution (rules authored per analyst) | Tracks capability growth | Detection engineering backlog |
| Playbook feedback rate | Analysts identifying process gaps | SOAR feedback tags |
| Training completion + certification pace | Skills development | LMS / certification tracker |
| MTTA variance (are analysts slower at end of shift?) | Staffing model insight | Time-series analysis of MTTA by hour |
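The last row (MTTA variance by hour) needs no special tooling. A sketch, assuming acknowledge delays have already been tagged with an hour-of-shift value (the function name and input shape are illustrative):

```python
from collections import defaultdict
from statistics import median

def mtta_by_hour(acks: list[tuple[int, float]]) -> dict[int, float]:
    """Median acknowledge delay (minutes) bucketed by hour of shift.

    `acks` holds (hour_of_shift, delay_minutes) pairs exported from the
    ticket system; median per bucket keeps one slow alert from skewing an hour.
    """
    buckets: dict[int, list[float]] = defaultdict(list)
    for hour, delay in acks:
        buckets[hour].append(delay)
    return {h: median(v) for h, v in sorted(buckets.items())}

# Start of shift vs. hour 7: a widening gap suggests fatigue or a staffing dip.
print(mtta_by_hour([(0, 4.0), (0, 6.0), (7, 18.0), (7, 22.0)]))  # {0: 5.0, 7: 20.0}
```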

What NOT TO Measure (Avoid Perverse Incentives)

Metrics That Harm SOC Culture

| Avoid This Metric | Why It Backfires | Better Alternative |
|---|---|---|
| Alerts closed per day (raw volume) | Incentivizes bulk-closing without investigation | Alerts investigated with documented disposition |
| Zero false positives per analyst | Analysts become over-conservative and miss real threats | Team-level FP rate with blameless review |
| MTTA as individual KPI | Analysts mark alerts acknowledged immediately without reading | Pair MTTA with first-meaningful-action timestamp |
| No escalations = good analyst | Suppresses escalation of ambiguous threats | Track escalation accuracy, not rate |
| Lines of KQL written per week | Incentivizes verbose, low-quality rules | Rule effectiveness (TPR per rule) |
| Presence/hours online | Doesn't reflect investigation quality | Outcome-based metrics above |

6. Executive Dashboard Template

Board Level (Quarterly)

| Metric | Visualization | Audience Framing |
|---|---|---|
| Breach risk score trend | RAG status gauge | "Are we better or worse than last quarter?" |
| MTTD vs. industry benchmark | Bar chart with benchmark line | "How fast do we find attackers?" |
| Critical incidents (confirmed) | Count + trend | "How many real attacks this quarter?" |
| Security control effectiveness | % controls passing | "Are our investments working?" |
| Regulatory compliance posture | % controls in scope | "Are we meeting obligations?" |

CISO Level (Monthly)

| Metric | Visualization | Notes |
|---|---|---|
| MTTD / MTTA / MTTR (trending) | Line charts, 13-month view | Show improvement trajectory |
| ATT&CK coverage heatmap | Navigator export | Show gaps vs. threat actor profiles |
| Top 5 attack techniques observed | Bar chart | Tie to threat landscape |
| Incident volume by severity | Stacked bar | S1/S2 incidents are CISO-level items |
| Detection rule health | Fired % / stale % | Rule debt indicator |
| SOAR automation rate | Trend line | ROI of SOAR investment |
| Analyst capacity vs. alert volume | Dual-axis chart | Staffing justification data |

SOC Manager Level (Weekly)

| Metric | Visualization | Action Trigger |
|---|---|---|
| Alert backlog age (oldest open) | Single number + trend | > 24h = staffing/process issue |
| Analyst MTTA by shift | Heatmap | Identifies shift-specific delays |
| FP rate by rule (top 10 noisiest) | Table | Tune immediately if > 40% |
| Escalation rate T1→T2 | Trend | Alert if it drops sharply |
| SOAR playbook failure rate | Bar chart per playbook | Fix broken playbooks |
| New threat intel integrations | Count | Track intel coverage expansion |
| Open incidents by severity/age | Kanban-style table | S1/S2 aging > 4h = escalation |

7. Metrics Pitfalls — Top 10 Mistakes & Fixes

| # | Mistake | Why It Hurts | Fix |
|---|---|---|---|
| 1 | Measuring MTTD only from first alert, not first evidence | Severely understates the true detection gap; hides dwell time | Supplement with post-incident forensic timeline |
| 2 | Using mean instead of median for time metrics | Outlier major incidents (e.g., 30-day dwell) skew the mean massively | Report P50, P90, P99 percentiles |
| 3 | Counting all alerts equally (SOAR auto-closes mixed with analyst reviews) | Inflates the "alerts handled" count; hides the analyst load problem | Separate automated dispositions from analyst-reviewed |
| 4 | Reporting raw ATT&CK technique coverage % without sub-technique detail | One high-level rule might claim coverage of 50 sub-techniques | Score at sub-technique level; require validated detections |
| 5 | Not tracking false negatives | FN rate is invisible without red/purple team exercises | Run quarterly adversary simulations; track missed detections |
| 6 | Treating metric improvement as the goal (Goodhart's Law) | When a measure becomes a target, it ceases to be a good measure | Rotate metrics annually; use leading + lagging indicators |
| 7 | Including investigation time in MTTR (not just remediation) | Inflates MTTR; obscures whether response is fast once a decision is made | Split: MTTI (investigate) + MTTC (contain) + MTTR (remediate) |
| 8 | Not segmenting metrics by asset criticality | A P3 server incident is treated the same as a P1 crown-jewel server | Weight metrics by asset criticality tier |
| 9 | Publishing analyst-level metrics publicly in the team | Creates unhealthy competition; degrades team knowledge sharing | Use team-level metrics; use individual metrics only in 1:1s |
| 10 | Ignoring trend direction in favor of absolute values | An MTTD of 12 hours looks good against a 194-day benchmark, but bad if it was 6 hours last month | Always show a 13-month trend alongside the current value |

8. KPI Formula Reference Card

| KPI | Formula | Unit | Notes |
|---|---|---|---|
| MTTD | Σ(t_detect − t_intrusion) / n | Hours | t_intrusion from forensics or IOC first-seen |
| MTTA | Σ(t_ack − t_alert_created) / n | Minutes | Ticket system timestamps |
| MTTR | Σ(t_remediate − t_incident_open) / n | Hours | IR platform timestamps |
| MTTC | Σ(t_contain − t_confirmed) / n | Hours | Start at incident confirmation, not alert |
| Dwell Time | t_detection − t_first_evidence | Days | Post-incident forensics required |
| TPR (Recall) | TP / (TP + FN) | % | Requires red team data for FN |
| FPR | FP / (FP + TN) | % | Ticket disposition tags |
| Precision (PPV) | TP / (TP + FP) | % | Ticket disposition tags |
| F1 Score | 2 × P × R / (P + R) | 0–1 | P = Precision, R = Recall (TPR) |
| F-Beta Score | (1 + β²) × P × R / (β² × P + R) | 0–1 | β > 1 favors recall; β < 1 favors precision |
| ATT&CK Coverage | Rules_validated / Total_sub-techniques × 100 | % | Navigator layer export |
| Alert-to-Incident | Total_alerts / Confirmed_incidents | Ratio | Lower is better (less noise) |
| Auto-Close Rate | SOAR_auto_closed / Total_alerts × 100 | % | Only count low-severity categories |
| Escalation Accuracy | T2_confirmed_T1_escalations / T1_escalations × 100 | % | T2 agreement with the T1 call |
| Rule Health | Rules_fired_30d / Total_rules × 100 | % | Stale = not fired in 90d |
| Analyst Load | Total_actionable_alerts / Analyst_FTE | Alerts/analyst/day | Target < 50 actionable |
| SOAR ROI | Hours_saved_by_automation × Analyst_hourly_rate | Currency | Document per playbook |
| Breach Cost Avoidance | Industry_avg_breach_cost × Breaches_prevented | Currency | Requires incident classification |
| SLA Compliance | Incidents_resolved_within_SLA / Total_incidents × 100 | % | Track by severity tier |
| Coverage Gap Rate | Techniques_with_zero_detections / Total_ATT&CK_techniques × 100 | % | Techniques with zero detections |
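The F-beta row generalizes F1 and is worth a concrete check. A short sketch (the function name is ours; the formula is the one in the card, and the inputs are section 2's target precision and recall):

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score: beta > 1 weights recall higher, beta < 1 weights precision."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision 0.5, recall 0.9 (the section 2 targets):
print(round(f_beta(0.5, 0.9), 3))            # 0.643  (plain F1)
print(round(f_beta(0.5, 0.9, beta=2), 3))    # 0.776  (F2 rewards the high recall)
print(round(f_beta(0.5, 0.9, beta=0.5), 3))  # 0.549  (F0.5 penalizes the low precision)
```

For missed-attack-averse SOC detection work, a beta above 1 (recall-weighted) is usually the more defensible scoring choice.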

Percentile vs. Mean for Time Metrics

P50 (Median) = 50% of incidents resolved within this time     ← Primary KPI
P90           = 90% of incidents resolved within this time     ← SLA basis
P99           = 99% of incidents resolved within this time     ← Outlier tracking
Mean          = Useful only when distribution is symmetric     ← Often misleading
Report P50 and P90 as your headline numbers. Use mean only in internal statistical analysis.
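The skew is easy to demonstrate with the standard library. In this sketch (resolution times are illustrative), a single 30-day incident among routine closures drags the mean an order of magnitude above the median:

```python
from statistics import mean, quantiles

# Resolution times in hours; one 30-day (720 h) incident among routine closures.
times = [2, 3, 3, 4, 4, 5, 6, 8, 12, 720]

pct = quantiles(times, n=100, method="inclusive")  # pct[49] = P50, pct[89] = P90
print(f"P50={pct[49]:.1f}h  P90={pct[89]:.1f}h  mean={mean(times):.1f}h")
# The mean (76.7 h) is ~17x the median (4.5 h): one outlier dominates it.
```

Reporting the 76.7-hour mean as "MTTR" would misrepresent a queue where half of all incidents close within 4.5 hours; the P50/P90 pair tells the true story.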


9. Measurement Cadence Reference

| Metric Type | Review Cadence | Owner | Escalation Trigger |
|---|---|---|---|
| MTTA / Alert backlog age | Daily (automated dashboard) | SOC Manager | Backlog age > 24h; MTTA > 1h |
| MTTR / Incident age | Daily for S1/S2; weekly for S3/S4 | Shift Lead | S1 > 4h unresolved; S2 > 24h |
| FP rate by rule | Weekly | Detection Engineer | Any rule > 40% FP |
| ATT&CK coverage | Monthly | Detection Engineering Lead | Gap in top-30 techniques |
| MTTD / Dwell Time | Monthly (post-IR review) | SOC Manager + CISO | Rising trend over 3 months |
| Analyst productivity | Monthly (1:1 review) | SOC Manager | Individual metric outliers > 2σ |
| Executive dashboard | Monthly/Quarterly | CISO | Board-level reporting |
| Maturity assessment | Annually | SOC Director | Score regression from prior year |