Chapter 11: Evaluation & Metrics¶
Learning Objectives¶
By the end of this chapter, you will be able to:
- Calculate key SOC performance metrics (MTTD, MTTR, MTTA, alert volume)
- Evaluate AI/ML system effectiveness (precision, recall, ROI)
- Design balanced scorecards for SOC performance
- Measure detection coverage using MITRE ATT&CK mapping
- Apply continuous improvement methodologies to SOC operations
Prerequisites¶
- Chapter 1: Understanding of SOC functions and challenges
- Chapter 9: ML evaluation metrics (precision, recall)
- Basic statistics knowledge
Key Concepts¶
Key Performance Indicator (KPI) • Mean Time to Detect (MTTD) • Mean Time to Respond (MTTR) • Detection Coverage • Alert Fatigue • SOC Maturity Model
Curiosity Hook: The Dashboard That Changed Everything¶
SOC Manager's Monthly Review (Before Metrics): "We're doing fine. We handle alerts. No major breaches."
After Implementing Metrics: - MTTD: 45 days (industry average: 7 days) - Alert FP Rate: 65% (wasting 13 hours/day on false positives) - Detection Coverage: 40% of applicable ATT&CK techniques (major gaps) - Analyst Turnover: 40%/year (burnout from alert fatigue)
Executive Response: "We need to improve immediately."
6 Months Later: - MTTD: 45 days → 12 days (73% improvement) - FP Rate: 65% → 30% (better tuning, ML triage) - Coverage: 40% → 75% (new detection rules) - Turnover: 40% → 15% (automation reduced burnout)
Lesson: You can't improve what you don't measure. Metrics drive accountability and progress.
11.1 SOC Performance Metrics¶
Time-Based Metrics¶
1. Mean Time to Detect (MTTD)
Example: - Attacker gained access: 2026-02-01 10:00 - SOC detected intrusion: 2026-02-15 14:00 - MTTD = 14 days, 4 hours
Target: Industry average is 7-21 days for advanced threats. Mature SOCs: <7 days.
Improvement Strategies: - Better detection coverage (close ATT&CK gaps) - Proactive threat hunting - Behavioral analytics (UEBA) for early indicators
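The MTTD arithmetic above can be sketched in a few lines of Python; the timestamp format and the single-incident list are assumptions for illustration:

```python
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M"

def detection_delay_days(compromised: str, detected: str) -> float:
    """Days between initial compromise and SOC detection for one incident."""
    delta = datetime.strptime(detected, FMT) - datetime.strptime(compromised, FMT)
    return delta.total_seconds() / 86400

# MTTD is the mean delay across incidents; here, only the chapter's example
incidents = [("2026-02-01 10:00", "2026-02-15 14:00")]
mttd = mean(detection_delay_days(c, d) for c, d in incidents)
print(f"MTTD: {mttd:.2f} days")  # → MTTD: 14.17 days
```

In practice the list would be fed from incident records, and the mean would be tracked per month to spot trends.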
2. Mean Time to Acknowledge (MTTA)
Calculation: MTTA = (sum of time from alert generation to analyst acknowledgment) / number of alerts
Target: <5 minutes for high-severity, <15 minutes for medium
Factors: - SOC staffing and shift coverage - Alert volume (overload increases MTTA) - Automation (pre-triage reduces MTTA)
3. Mean Time to Respond (MTTR)
Example: - Detection: 2026-02-15 14:00 - Containment (system isolated): 2026-02-15 16:30 - MTTR = 2.5 hours
Target: - Critical incidents: <2 hours - High: <4 hours - Medium: <24 hours
Improvement Strategies: - SOAR automation (auto-isolation, auto-blocking) - Clear runbooks and playbooks - Cross-functional coordination (SOC + IT + Legal)
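Because the targets above differ by severity, MTTR is best computed per severity band as well as overall. A minimal sketch; all durations except the 2.5-hour example are hypothetical:

```python
from collections import defaultdict
from statistics import mean

# Minutes from detection to containment; every figure except the first
# (the 2.5-hour example above) is hypothetical
incidents = [
    ("critical", 150),
    ("critical", 60),
    ("high", 210),
    ("medium", 600),
]

by_severity = defaultdict(list)
for severity, minutes in incidents:
    by_severity[severity].append(minutes)

# Overall MTTR plus per-severity MTTR, so each target band
# (<2 h critical, <4 h high, <24 h medium) can be checked independently
overall_hours = mean(m for _, m in incidents) / 60
print(f"Overall MTTR: {overall_hours:.2f} h")
for severity, minutes in by_severity.items():
    print(f"  {severity}: {mean(minutes) / 60:.2f} h")
```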
4. Dwell Time
Different from MTTD: - MTTD: Time to detect - Dwell Time: Time attacker remains in environment (includes MTTD + MTTR + investigation/eradication)
Industry Benchmark: 21 days (2023 Mandiant M-Trends report)
Goal: Reduce dwell time to <7 days
Volume-Based Metrics¶
1. Alert Volume
Example: - Daily alerts: 1,200 - Analyst capacity: 40 alerts/day per analyst × 3 analysts = 120 alerts/day - Problem: 10x overload → Missed threats, burnout
Optimal Range: - 80-120% of analyst capacity (allows surge handling) - If >150%: Tune rules, implement ML triage, add staff
2. False Positive Rate
Example: - Total alerts: 1,000 - False positives: 650 - FP Rate = 65%
Target: <20% (mature SOCs aim for <10%)
Impact: - At a 65% FP rate, 78 of the 120 alerts triaged daily (3 analysts × 40 each) are noise; at 10 minutes of manual triage per alert, that is 13 hours/day wasted
Improvement: - Rule tuning (add allowlists, adjust thresholds) - ML-powered triage (auto-close high-confidence FPs)
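The FP-rate math can be reproduced directly; the triage-time assumptions (120 alerts triaged/day, 10 minutes each) follow the chapter's earlier capacity figures:

```python
# FP rate from alert counts, plus the triage hours burned on noise; figures
# follow the chapter (65% FP rate, 3 analysts triaging 120 alerts/day,
# ~10 minutes of manual triage per alert)
total_alerts = 1_000
false_positives = 650
fp_rate = false_positives / total_alerts
print(f"FP rate: {fp_rate:.0%}")  # → FP rate: 65%

alerts_triaged_per_day = 120        # 3 analysts × 40 alerts/day capacity
triage_minutes_per_alert = 10
wasted_hours = alerts_triaged_per_day * fp_rate * triage_minutes_per_alert / 60
print(f"Hours/day lost to noise: {wasted_hours:.0f}")  # → Hours/day lost to noise: 13
```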
3. True Positive Rate (Detection Rate)
Example: - Total alerts: 1,000 - Confirmed incidents: 50 - TP Rate = 5%
Caveat: TP rate needs careful interpretation: - A low TP rate usually signals noisy, overly aggressive rules (too many false positives) - A high TP rate can mean high-quality detections that fire only on real threats, or rules tuned so narrowly that real threats go undetected
Context matters: Combine with FP rate and missed incidents.
Coverage Metrics¶
1. Detection Coverage
Example: - Applicable techniques for your environment: 150 - Techniques with active detections: 90 - Coverage = 60%
Target: >70% for critical techniques (Initial Access, Execution, Persistence, Privilege Escalation, Lateral Movement, Exfiltration)
Measurement: - Map detection rules to ATT&CK techniques - Use frameworks like MITRE ATT&CK Navigator - Identify gaps and prioritize coverage expansion
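The map-and-divide procedure can be sketched as a coverage calculation; the technique IDs are real ATT&CK IDs, but the rule names and the five-technique universe are illustrative, not a real rule set:

```python
# Coverage = techniques with at least one active detection / applicable
# techniques for this environment
applicable = {"T1059", "T1078", "T1003", "T1021", "T1041"}

# Hypothetical detection rules mapped to the technique(s) they cover
rule_to_techniques = {
    "ps_encoded_command": ["T1059"],    # suspicious PowerShell
    "lsass_memory_access": ["T1003"],   # credential dumping
    "rdp_lateral_movement": ["T1021"],  # remote services
}

covered = {t for techs in rule_to_techniques.values() for t in techs}
coverage = len(covered & applicable) / len(applicable)
print(f"Coverage: {coverage:.0%}")  # → Coverage: 60%
```

The uncovered set (`applicable - covered`) is exactly the gap list to prioritize in Step 4.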
2. Test Coverage
Example: - Total detection rules: 200 - Tested in past 90 days: 120 - Test Coverage = 60%
Target: >80% (purple team exercises, Atomic Red Team)
Why Important: Untested detections may fail silently.
11.2 AI/ML System Metrics¶
Model Performance (Recap from Ch. 9)¶
Precision, Recall, F1-Score (apply to ML-based detections)
Example: ML Alert Triage Model - Precision: 85% (of flagged alerts, 85% are true threats) - Recall: 90% (of all threats, model catches 90%) - F1-Score: 87.4%
Monitoring: - Track monthly (detect model drift) - Alert if F1 drops >5% (retrain needed)
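These figures are easy to verify from raw confusion counts; the counts below are chosen to reproduce the stated rates and are not from a real model:

```python
# Precision, recall, and F1 from raw confusion counts; counts chosen so the
# rates match the chapter's example (85% precision, 90% recall)
tp, fp, fn = 153, 27, 17  # 170 real threats, 180 alerts flagged

precision = tp / (tp + fp)                          # 0.85
recall = tp / (tp + fn)                             # 0.90
f1 = 2 * precision * recall / (precision + recall)
print(f"F1: {f1:.1%}")  # → F1: 87.4%

# Drift check from the monitoring guidance: flag if F1 falls >5% below baseline
baseline_f1 = 0.874
if f1 < baseline_f1 * 0.95:
    print("F1 dropped >5% vs. baseline: schedule retraining")
```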
Automation ROI¶
Time Savings:
Example: - Manual triage: 10 min/alert - Automated triage: 1 min/alert - Alerts/day: 500 - Time saved: 9 min × 500 = 4,500 min/day = 75 hours/day - Monthly time saved: 75 × 30 = 2,250 hours - Analyst cost: $50/hour - Monthly savings: 2,250 × $50 = $112,500 - Automation cost: $10,000/month - Net ROI: $102,500/month ($1.23M/year)
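The chain of arithmetic above can be restated as a short script; every figure is the example's assumption, not a benchmark:

```python
# Automation ROI, step by step, using the chapter's illustrative figures
manual_min, auto_min = 10, 1
alerts_per_day = 500
hours_saved_daily = (manual_min - auto_min) * alerts_per_day / 60  # 75.0
hours_saved_monthly = hours_saved_daily * 30                       # 2250.0
analyst_rate = 50          # $/hour
automation_cost = 10_000   # $/month
net_monthly = hours_saved_monthly * analyst_rate - automation_cost
print(f"Net ROI: ${net_monthly:,.0f}/month")  # → Net ROI: $102,500/month
```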
Accuracy vs. Speed Trade-off:
Example: - Auto-closed: 400/day - Incorrectly closed (missed threats): 8/day - Accuracy: 98%
Target: >95% accuracy for auto-closure
Risk Mitigation: - Periodic review of auto-closed alerts (sample 5% weekly) - Analyst override mechanism ("reopen this alert")
11.3 Balanced Scorecard Approach¶
What Is a Balanced Scorecard?¶
Balanced Scorecard: A strategic framework measuring performance across multiple dimensions (not just one metric).
SOC Scorecard Dimensions:
1. Detection Effectiveness - MTTD - Detection coverage (ATT&CK %) - True positive rate
2. Response Efficiency - MTTR - MTTA - Automation rate (% incidents auto-remediated)
3. Operational Health - Alert volume (vs. capacity) - False positive rate - Analyst satisfaction (survey-based) - Turnover rate
4. Business Alignment - Incidents affecting critical assets - Compliance posture (audit findings) - Stakeholder satisfaction (IT, executives)
Example Scorecard¶
| Dimension | Metric | Current | Target | Status |
|---|---|---|---|---|
| Detection | MTTD | 12 days | 7 days | ⚠️ Needs Improvement |
| | Coverage | 75% | 80% | ⚠️ Needs Improvement |
| | TP Rate | 8% | 10% | ⚠️ Needs Improvement |
| Response | MTTR | 3 hours | 2 hours | ⚠️ Needs Improvement |
| | MTTA | 4 min | 5 min | ✅ On Target |
| | Automation | 60% | 70% | ⚠️ Needs Improvement |
| Operations | Alert Volume | 800/day | 600/day | ⚠️ Needs Improvement |
| | FP Rate | 30% | 20% | ⚠️ Needs Improvement |
| | Analyst Satisfaction | 7/10 | 8/10 | ⚠️ Needs Improvement |
| Business | Critical Asset Incidents | 2/month | 1/month | ⚠️ Needs Improvement |
| | Compliance Findings | 0 | 0 | ✅ On Target |
Overall Assessment: Operational but needs optimization in detection and response efficiency.
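One subtlety the scorecard hides is metric direction: MTTD improves downward while coverage improves upward, so the status check must know which way "better" points. A minimal sketch using a few rows of the example:

```python
# Deriving the Status column mechanically: each metric declares whether lower
# or higher is better; rows mirror part of the example scorecard
metrics = [
    # (name, current, target, lower_is_better)
    ("MTTD (days)", 12, 7, True),
    ("Coverage (%)", 75, 80, False),
    ("MTTA (min)", 4, 5, True),
    ("FP Rate (%)", 30, 20, True),
]

statuses = {}
for name, current, target, lower_is_better in metrics:
    on_target = current <= target if lower_is_better else current >= target
    statuses[name] = "On Target" if on_target else "Needs Improvement"
    print(f"{name:16} {current:>3} (target {target:>3})  {statuses[name]}")
```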
11.4 Detection Coverage Measurement¶
MITRE ATT&CK Mapping¶
Step 1: Identify Applicable Techniques - Not all ATT&CK techniques apply to your environment - Example: Cloud-only organization → Skip on-prem Active Directory techniques
Step 2: Map Detections to Techniques
Step 3: Visualize Coverage Use ATT&CK Navigator (web-based tool) to create heatmaps: - Green: Technique covered + tested - Yellow: Covered but untested - Red: No detection
Step 4: Prioritize Gaps - Focus on high-impact techniques (Initial Access, Credential Access, Lateral Movement) - Use threat intel: What techniques are adversaries actively using?
Detection Depth¶
Simple Coverage: Binary (covered or not)
Detection Depth: How many detections per technique?
Example: T1003.001 (LSASS Memory) with three layered detections:
1. EDR behavioral detection (process accessing LSASS)
2. SIEM correlation (known dumping tools: Mimikatz, ProcDump)
3. File monitoring (LSASS dump files in temp directories)
Depth = 3 detections
Why Important: Redundancy. If one detection fails (misconfiguration, evasion), others may succeed.
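Counting depth is a one-liner once rules are mapped to techniques; the rule names here are hypothetical, built on the LSASS example above:

```python
from collections import Counter

# Detection depth = number of independent detections per technique
rule_to_technique = {
    "edr_lsass_handle_access": "T1003.001",   # EDR behavioral detection
    "siem_known_dump_tools": "T1003.001",     # SIEM correlation rule
    "file_lsass_dump_in_temp": "T1003.001",   # file monitoring
    "ps_encoded_command": "T1059.001",
}

depth = Counter(rule_to_technique.values())
print(depth["T1003.001"])  # → 3
```

Techniques with depth 1 are single points of failure and good candidates for an additional, independent detection.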
11.5 Continuous Improvement¶
PDCA Cycle (Plan-Do-Check-Act)¶
Plan: Identify improvement opportunity (e.g., reduce MTTD from 12 to 7 days)
Do: Implement changes (deploy new detection rules, conduct threat hunts)
Check: Measure results (did MTTD decrease?)
Act: Standardize if successful, or adjust if unsuccessful
Repeat: Continuous iteration
Regular Review Cadence¶
Daily: - Alert volume, MTTA (operational monitoring) - Critical incident status
Weekly: - FP rate, TP rate - Detection rule performance (which rules fire most? Accuracy?) - Automation metrics (time saved, accuracy)
Monthly: - MTTD, MTTR, Dwell Time - Detection coverage (gaps identified?) - Analyst satisfaction and turnover - SOC scorecard review with management
Quarterly: - Purple team exercises (test coverage) - Strategic planning (budget, staffing, tool evaluation) - Lessons learned from major incidents
Benchmarking¶
Internal Benchmarking: - Compare month-over-month: "Is our MTTR improving?"
External Benchmarking: - Compare to industry peers: "Our MTTD is 12 days vs. industry average 21 days → We're better than average"
Sources: - Verizon DBIR (Data Breach Investigations Report) - Mandiant M-Trends - SANS SOC Survey - Industry ISACs (sector-specific benchmarks)
Caution: Context matters. A financial services SOC may have different targets than a healthcare SOC (regulatory requirements, threat landscape).
11.6 Metrics Pitfalls¶
Pitfall 1: Metric Manipulation (Goodhart's Law)¶
Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure."
Example: - SOC KPI: "Reduce alert volume by 50%" - Analyst action: Disable half of detection rules - Result: Alert volume down, but security degraded
Mitigation: - Use balanced scorecards (multiple metrics) - Monitor leading indicators (detection coverage) and lagging indicators (incidents)
Pitfall 2: Vanity Metrics¶
Vanity Metric: Numbers that look impressive but don't drive actionable insights.
Example: - "We processed 10 million logs today!" (So what? Did you detect threats?)
Better Metric: - "We detected 5 incidents, contained in <2 hours, zero data loss"
Pitfall 3: Ignoring Context¶
Example: - SOC A: MTTR = 30 minutes - SOC B: MTTR = 4 hours - Conclusion: "SOC A is better"
Missing Context: - SOC A incidents: Mostly low-severity phishing - SOC B incidents: Complex APT intrusions requiring forensics
Mitigation: Segment metrics by severity and incident type.
Interactive Element¶
MicroSim 11: SOC Metrics Dashboard
Simulate SOC operations and see real-time impact of decisions on MTTD, MTTR, FP rate, and analyst burnout.
Common Misconceptions¶
Misconception: Lower Alert Volume Always Means Better Security
Reality: Reducing alerts by disabling detections degrades security. The goal is to reduce noise (FPs) while maintaining or improving signal (TPs).
Misconception: MTTR Is the Only Metric That Matters
Reality: MTTR without MTTD context is incomplete. Fast response (low MTTR) is useless if you detect threats 60 days late (high MTTD). Balanced approach is key.
Misconception: Metrics Are Only for Reporting to Management
Reality: Metrics are operational tools. Analysts use them to prioritize work, identify bottlenecks, and improve processes. Management reporting is a secondary benefit.
Practice Tasks¶
Task 1: Calculate MTTR¶
Incidents: - Incident 1: Detected at 10:00, contained at 10:45 (45 min) - Incident 2: Detected at 14:00, contained at 18:00 (4 hours = 240 min) - Incident 3: Detected at 08:00, contained at 09:30 (90 min)
Question: What is the MTTR?
Answer
MTTR = (45 + 240 + 90) / 3 = 375 / 3 = 125 minutes (2 hours 5 minutes)
Analysis: Incident 2 is an outlier (4 hours). Investigate why it took so long: - Complex investigation? - Delayed escalation? - Lack of automation?
Task 2: Evaluate Detection Coverage¶
Your Environment: - Total applicable ATT&CK techniques: 120 - Techniques with detections: 84
Question: a) What is your detection coverage %? b) Is this acceptable?
Answers
a) Coverage = 84 / 120 = 70%
b) Acceptable but needs improvement: - 70% is decent for mid-maturity SOC - Target: >80% for critical techniques (Initial Access, Execution, Lateral Movement, Exfiltration) - Action: Prioritize gap analysis for the 36 uncovered techniques, focusing on high-impact ones
Task 3: Identify Vanity Metric¶
Which of these is a vanity metric? a) "We blocked 1 million threats this month" b) "Our MTTD decreased from 21 days to 10 days" c) "We improved detection coverage from 60% to 75%"
Answer
a) "We blocked 1 million threats this month" is a vanity metric.
Why: - Lacks context: What were the threats? Severity? Were they already blocked by other controls? - Doesn't indicate SOC effectiveness (firewall may have blocked 999,000; SOC contributed minimally) - Sounds impressive but doesn't measure SOC performance
b) and c) are actionable metrics: - MTTD reduction: Measurable improvement in detection speed - Coverage increase: Tangible gap closure, reduces blind spots
Exam Prep & Certifications¶
Relevant Certifications
The topics in this chapter align with the following certifications:
- CompTIA Security+ — Domains: Security Operations, Security Architecture
- CompTIA CySA+ — Domains: Security Operations, Threat Management
- GIAC GCIH — Domains: Detection, Automation
- CISSP — Domains: Security Operations, Software Development Security
Self-Assessment Quiz¶
Question 1: What does MTTD measure?
Options:
a) Mean Time to Deploy new detection rules b) Mean Time to Detect an incident after initial compromise c) Mean Time to Document incident reports d) Mean Time to Disable user accounts
Show Answer
Correct Answer: b) Mean Time to Detect an incident after initial compromise
Explanation: MTTD tracks detection speed—the time between attacker's initial entry and SOC detection.
Question 2: A SOC has 1,000 alerts. 700 are false positives. What is the FP rate?
Options:
a) 30% b) 50% c) 70% d) 100%
Show Answer
Correct Answer: c) 70%
Explanation: FP Rate = 700 / 1,000 = 70%
Question 3: What is the purpose of a balanced scorecard?
Options:
a) To measure only financial metrics b) To track performance across multiple dimensions (detection, response, operations, business) c) To replace all other metrics with a single score d) To calculate analyst bonuses
Show Answer
Correct Answer: b) To track performance across multiple dimensions (detection, response, operations, business)
Explanation: Balanced scorecards prevent over-optimization of a single metric by measuring holistic performance.
Question 4: What does detection coverage measure?
Options:
a) Percentage of alerts that are true positives b) Percentage of ATT&CK techniques with active detections c) Percentage of systems with EDR deployed d) Percentage of analysts who passed certification exams
Show Answer
Correct Answer: b) Percentage of ATT&CK techniques with active detections
Explanation: Detection coverage maps detections to ATT&CK techniques, identifying gaps in monitoring.
Question 5: What is Goodhart's Law in the context of SOC metrics?
Options:
a) Metrics should always increase over time b) When a measure becomes a target, it ceases to be a good measure c) Metrics are more important than actual security d) All metrics should be publicly reported
Show Answer
Correct Answer: b) When a measure becomes a target, it ceases to be a good measure
Explanation: If you over-optimize for one metric (e.g., "reduce alerts"), people game the system (disable detections), degrading actual security.
Question 6: Why is continuous retraining important for ML-based detections?
Options:
a) It's not important; ML models are static once deployed b) The threat landscape evolves, causing model drift and degraded performance c) Retraining increases alert volume d) Analysts need something to do
Show Answer
Correct Answer: b) The threat landscape evolves, causing model drift and degraded performance
Explanation: Attackers change TTPs; data distributions shift. Models trained on old data become less effective. Regular retraining maintains accuracy.
Summary¶
In this chapter, you learned:
- SOC performance metrics: MTTD, MTTR, MTTA, dwell time, alert volume, FP rate
- Coverage metrics: Detection coverage (ATT&CK %), test coverage
- AI/ML metrics: Precision, recall, F1-score, automation ROI
- Balanced scorecard: Multi-dimensional performance measurement (detection, response, operations, business)
- Continuous improvement: PDCA cycle, regular review cadence, benchmarking
- Metrics pitfalls: Goodhart's Law, vanity metrics, ignoring context
Next Steps¶
- Next Chapter: Chapter 12: Governance, Privacy & Risk - Learn compliance, ethical considerations, and risk management
- Practice: Build a SOC scorecard for your environment using the Metrics Dashboard MicroSim
- Implement: Start tracking MTTD, MTTR, and FP rate monthly
- Benchmark: Compare your metrics to industry reports (Verizon DBIR, Mandiant M-Trends)
Chapter 11 Complete | Next: Chapter 12 →