Chapter 11: Evaluation & Metrics¶
Learning Objectives¶
By the end of this chapter, you will be able to:
- Calculate key SOC performance metrics (MTTD, MTTR, MTTA, alert volume)
- Evaluate AI/ML system effectiveness (precision, recall, ROI)
- Design balanced scorecards for SOC performance
- Measure detection coverage using MITRE ATT&CK mapping
- Apply continuous improvement methodologies to SOC operations
Prerequisites¶
- Chapter 1: Understanding of SOC functions and challenges
- Chapter 9: ML evaluation metrics (precision, recall)
- Basic statistics knowledge
Key Concepts¶
Key Performance Indicator (KPI) • Mean Time to Detect (MTTD) • Mean Time to Respond (MTTR) • Detection Coverage • Alert Fatigue • SOC Maturity Model
Curiosity Hook: The Dashboard That Changed Everything¶
SOC Manager's Monthly Review (Before Metrics): "We're doing fine. We handle alerts. No major breaches."
After Implementing Metrics: - MTTD: 45 days (industry average: 7 days) - Alert FP Rate: 65% (wasting 13 hours/day on false positives) - Detection Coverage: 40% of applicable ATT&CK techniques (major gaps) - Analyst Turnover: 40%/year (burnout from alert fatigue)
Executive Response: "We need to improve immediately."
6 Months Later: - MTTD: 45 days → 12 days (73% improvement) - FP Rate: 65% → 30% (better tuning, ML triage) - Coverage: 40% → 75% (new detection rules) - Turnover: 40% → 15% (automation reduced burnout)
Lesson: You can't improve what you don't measure. Metrics drive accountability and progress.
11.1 SOC Performance Metrics¶
Time-Based Metrics¶
1. Mean Time to Detect (MTTD)
Example: - Attacker gained access: 2026-02-01 10:00 - SOC detected intrusion: 2026-02-15 14:00 - MTTD = 14 days, 4 hours
Target: Industry average is 7-21 days for advanced threats. Mature SOCs: <7 days.
Improvement Strategies: - Better detection coverage (close ATT&CK gaps) - Proactive threat hunting - Behavioral analytics (UEBA) for early indicators
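The MTTD arithmetic above can be sketched in a few lines of Python; the timestamp format and the single-incident list are assumptions for illustration:

```python
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M"

def detection_delay_days(compromised: str, detected: str) -> float:
    """Days between initial compromise and SOC detection for one incident."""
    delta = datetime.strptime(detected, FMT) - datetime.strptime(compromised, FMT)
    return delta.total_seconds() / 86400

# MTTD is the mean delay across incidents; here, only the chapter's example
incidents = [("2026-02-01 10:00", "2026-02-15 14:00")]
mttd = mean(detection_delay_days(c, d) for c, d in incidents)
print(f"MTTD: {mttd:.2f} days")  # → MTTD: 14.17 days
```

In practice the list would be fed from incident records, and the mean would be tracked per month to spot trends.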
2. Mean Time to Acknowledge (MTTA)
Calculation: MTTA = (sum of time from alert generation to analyst acknowledgment) / number of alerts
Target: <5 minutes for high-severity, <15 minutes for medium
Factors: - SOC staffing and shift coverage - Alert volume (overload increases MTTA) - Automation (pre-triage reduces MTTA)
3. Mean Time to Respond (MTTR)
Example: - Detection: 2026-02-15 14:00 - Containment (system isolated): 2026-02-15 16:30 - MTTR = 2.5 hours
Target: - Critical incidents: <2 hours - High: <4 hours - Medium: <24 hours
Improvement Strategies: - SOAR automation (auto-isolation, auto-blocking) - Clear runbooks and playbooks - Cross-functional coordination (SOC + IT + Legal)
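Because the targets above differ by severity, MTTR is best computed per severity band as well as overall. A minimal sketch; all durations except the 2.5-hour example are hypothetical:

```python
from collections import defaultdict
from statistics import mean

# Minutes from detection to containment; every figure except the first
# (the 2.5-hour example above) is hypothetical
incidents = [
    ("critical", 150),
    ("critical", 60),
    ("high", 210),
    ("medium", 600),
]

by_severity = defaultdict(list)
for severity, minutes in incidents:
    by_severity[severity].append(minutes)

# Overall MTTR plus per-severity MTTR, so each target band
# (<2 h critical, <4 h high, <24 h medium) can be checked independently
overall_hours = mean(m for _, m in incidents) / 60
print(f"Overall MTTR: {overall_hours:.2f} h")
for severity, minutes in by_severity.items():
    print(f"  {severity}: {mean(minutes) / 60:.2f} h")
```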
4. Dwell Time
Different from MTTD: - MTTD: Time to detect - Dwell Time: Time attacker remains in environment (includes MTTD + MTTR + investigation/eradication)
Industry Benchmark: 21 days (2023 Mandiant M-Trends report)
Goal: Reduce dwell time to <7 days
Volume-Based Metrics¶
1. Alert Volume
Example: - Daily alerts: 1,200 - Analyst capacity: 40 alerts/day per analyst × 3 analysts = 120 alerts/day - Problem: 10x overload → Missed threats, burnout
Optimal Range: - 80-120% of analyst capacity (allows surge handling) - If >150%: Tune rules, implement ML triage, add staff
2. False Positive Rate
Example: - Total alerts: 1,000 - False positives: 650 - FP Rate = 65%
Target: <20% (mature SOCs aim for <10%)
Impact: - At a 65% FP rate, 78 of the 120 alerts triaged daily (3 analysts × 40 each) are noise; at 10 minutes of manual triage per alert, that is 13 hours/day wasted
Improvement: - Rule tuning (add allowlists, adjust thresholds) - ML-powered triage (auto-close high-confidence FPs)
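The FP-rate math can be reproduced directly; the triage-time assumptions (120 alerts triaged/day, 10 minutes each) follow the chapter's earlier capacity figures:

```python
# FP rate from alert counts, plus the triage hours burned on noise; figures
# follow the chapter (65% FP rate, 3 analysts triaging 120 alerts/day,
# ~10 minutes of manual triage per alert)
total_alerts = 1_000
false_positives = 650
fp_rate = false_positives / total_alerts
print(f"FP rate: {fp_rate:.0%}")  # → FP rate: 65%

alerts_triaged_per_day = 120        # 3 analysts × 40 alerts/day capacity
triage_minutes_per_alert = 10
wasted_hours = alerts_triaged_per_day * fp_rate * triage_minutes_per_alert / 60
print(f"Hours/day lost to noise: {wasted_hours:.0f}")  # → Hours/day lost to noise: 13
```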
3. True Positive Rate (Detection Rate)
Example: - Total alerts: 1,000 - Confirmed incidents: 50 - TP Rate = 5%
Caveat: TP rate needs careful interpretation: - A low TP rate usually signals noisy, overly aggressive rules (too many false positives) - A high TP rate can mean high-quality detections that fire only on real threats, or rules tuned so narrowly that real threats go undetected
Context matters: Combine with FP rate and missed incidents.
Coverage Metrics¶
1. Detection Coverage
Example: - Applicable techniques for your environment: 150 - Techniques with active detections: 90 - Coverage = 60%
Target: >70% for critical techniques (Initial Access, Execution, Persistence, Privilege Escalation, Lateral Movement, Exfiltration)
Measurement: - Map detection rules to ATT&CK techniques - Use frameworks like MITRE ATT&CK Navigator - Identify gaps and prioritize coverage expansion
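The map-and-divide procedure can be sketched as a coverage calculation; the technique IDs are real ATT&CK IDs, but the rule names and the five-technique universe are illustrative, not a real rule set:

```python
# Coverage = techniques with at least one active detection / applicable
# techniques for this environment
applicable = {"T1059", "T1078", "T1003", "T1021", "T1041"}

# Hypothetical detection rules mapped to the technique(s) they cover
rule_to_techniques = {
    "ps_encoded_command": ["T1059"],    # suspicious PowerShell
    "lsass_memory_access": ["T1003"],   # credential dumping
    "rdp_lateral_movement": ["T1021"],  # remote services
}

covered = {t for techs in rule_to_techniques.values() for t in techs}
coverage = len(covered & applicable) / len(applicable)
print(f"Coverage: {coverage:.0%}")  # → Coverage: 60%
```

The uncovered set (`applicable - covered`) is exactly the gap list to prioritize in Step 4.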
2. Test Coverage
Example: - Total detection rules: 200 - Tested in past 90 days: 120 - Test Coverage = 60%
Target: >80% (purple team exercises, Atomic Red Team)
Why Important: Untested detections may fail silently.
11.2 AI/ML System Metrics¶
Model Performance (Recap from Ch. 9)¶
Precision, Recall, F1-Score (apply to ML-based detections)
Example: ML Alert Triage Model - Precision: 85% (of flagged alerts, 85% are true threats) - Recall: 90% (of all threats, model catches 90%) - F1-Score: 87.4%
Monitoring: - Track monthly (detect model drift) - Alert if F1 drops >5% (retrain needed)
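These figures are easy to verify from raw confusion counts; the counts below are chosen to reproduce the stated rates and are not from a real model:

```python
# Precision, recall, and F1 from raw confusion counts; counts chosen so the
# rates match the chapter's example (85% precision, 90% recall)
tp, fp, fn = 153, 27, 17  # 170 real threats, 180 alerts flagged

precision = tp / (tp + fp)                          # 0.85
recall = tp / (tp + fn)                             # 0.90
f1 = 2 * precision * recall / (precision + recall)
print(f"F1: {f1:.1%}")  # → F1: 87.4%

# Drift check from the monitoring guidance: flag if F1 falls >5% below baseline
baseline_f1 = 0.874
if f1 < baseline_f1 * 0.95:
    print("F1 dropped >5% vs. baseline: schedule retraining")
```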
Automation ROI¶
Time Savings:
Example: - Manual triage: 10 min/alert - Automated triage: 1 min/alert - Alerts/day: 500 - Time saved: 9 min × 500 = 4,500 min/day = 75 hours/day - Monthly time saved: 75 × 30 = 2,250 hours - Analyst cost: $50/hour - Monthly savings: 2,250 × $50 = $112,500 - Automation cost: $10,000/month - Net ROI: $102,500/month ($1.23M/year)
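The chain of arithmetic above can be restated as a short script; every figure is the example's assumption, not a benchmark:

```python
# Automation ROI, step by step, using the chapter's illustrative figures
manual_min, auto_min = 10, 1
alerts_per_day = 500
hours_saved_daily = (manual_min - auto_min) * alerts_per_day / 60  # 75.0
hours_saved_monthly = hours_saved_daily * 30                       # 2250.0
analyst_rate = 50          # $/hour
automation_cost = 10_000   # $/month
net_monthly = hours_saved_monthly * analyst_rate - automation_cost
print(f"Net ROI: ${net_monthly:,.0f}/month")  # → Net ROI: $102,500/month
```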
Accuracy vs. Speed Trade-off:
Example: - Auto-closed: 400/day - Incorrectly closed (missed threats): 8/day - Accuracy: 98%
Target: >95% accuracy for auto-closure
Risk Mitigation: - Periodic review of auto-closed alerts (sample 5% weekly) - Analyst override mechanism ("reopen this alert")
11.3 Balanced Scorecard Approach¶
What Is a Balanced Scorecard?¶
Balanced Scorecard: A strategic framework measuring performance across multiple dimensions (not just one metric).
SOC Scorecard Dimensions:
1. Detection Effectiveness - MTTD - Detection coverage (ATT&CK %) - True positive rate
2. Response Efficiency - MTTR - MTTA - Automation rate (% incidents auto-remediated)
3. Operational Health - Alert volume (vs. capacity) - False positive rate - Analyst satisfaction (survey-based) - Turnover rate
4. Business Alignment - Incidents affecting critical assets - Compliance posture (audit findings) - Stakeholder satisfaction (IT, executives)
Example Scorecard¶
| Dimension | Metric | Current | Target | Status |
|---|---|---|---|---|
| Detection | MTTD | 12 days | 7 days | ⚠️ Needs Improvement |
| | Coverage | 75% | 80% | ⚠️ Needs Improvement |
| | TP Rate | 8% | 10% | ⚠️ Needs Improvement |
| Response | MTTR | 3 hours | 2 hours | ⚠️ Needs Improvement |
| | MTTA | 4 min | 5 min | ✅ On Target |
| | Automation | 60% | 70% | ⚠️ Needs Improvement |
| Operations | Alert Volume | 800/day | 600/day | ⚠️ Needs Improvement |
| | FP Rate | 30% | 20% | ⚠️ Needs Improvement |
| | Analyst Satisfaction | 7/10 | 8/10 | ⚠️ Needs Improvement |
| Business | Critical Asset Incidents | 2/month | 1/month | ⚠️ Needs Improvement |
| | Compliance Findings | 0 | 0 | ✅ On Target |
Overall Assessment: Operational but needs optimization in detection and response efficiency.
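One subtlety the scorecard hides is metric direction: MTTD improves downward while coverage improves upward, so the status check must know which way "better" points. A minimal sketch using a few rows of the example:

```python
# Deriving the Status column mechanically: each metric declares whether lower
# or higher is better; rows mirror part of the example scorecard
metrics = [
    # (name, current, target, lower_is_better)
    ("MTTD (days)", 12, 7, True),
    ("Coverage (%)", 75, 80, False),
    ("MTTA (min)", 4, 5, True),
    ("FP Rate (%)", 30, 20, True),
]

statuses = {}
for name, current, target, lower_is_better in metrics:
    on_target = current <= target if lower_is_better else current >= target
    statuses[name] = "On Target" if on_target else "Needs Improvement"
    print(f"{name:16} {current:>3} (target {target:>3})  {statuses[name]}")
```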
11.4 Detection Coverage Measurement¶
MITRE ATT&CK Mapping¶
Step 1: Identify Applicable Techniques - Not all ATT&CK techniques apply to your environment - Example: Cloud-only organization → Skip on-prem Active Directory techniques
Step 2: Map Detections to Techniques
Step 3: Visualize Coverage Use ATT&CK Navigator (web-based tool) to create heatmaps: - Green: Technique covered + tested - Yellow: Covered but untested - Red: No detection
Step 4: Prioritize Gaps - Focus on high-impact techniques (Initial Access, Credential Access, Lateral Movement) - Use threat intel: What techniques are adversaries actively using?
Detection Depth¶
Simple Coverage: Binary (covered or not)
Detection Depth: How many detections per technique?
Example: T1003.001 (LSASS Memory) with three layered detections:
1. EDR behavioral detection (process accessing LSASS)
2. SIEM correlation (known dumping tools: Mimikatz, ProcDump)
3. File monitoring (LSASS dump files in temp directories)
Depth = 3 detections
Why Important: Redundancy. If one detection fails (misconfiguration, evasion), others may succeed.
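Counting depth is a one-liner once rules are mapped to techniques; the rule names here are hypothetical, built on the LSASS example above:

```python
from collections import Counter

# Detection depth = number of independent detections per technique
rule_to_technique = {
    "edr_lsass_handle_access": "T1003.001",   # EDR behavioral detection
    "siem_known_dump_tools": "T1003.001",     # SIEM correlation rule
    "file_lsass_dump_in_temp": "T1003.001",   # file monitoring
    "ps_encoded_command": "T1059.001",
}

depth = Counter(rule_to_technique.values())
print(depth["T1003.001"])  # → 3
```

Techniques with depth 1 are single points of failure and good candidates for an additional, independent detection.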
11.5 Continuous Improvement¶
PDCA Cycle (Plan-Do-Check-Act)¶
Plan: Identify improvement opportunity (e.g., reduce MTTD from 12 to 7 days)
Do: Implement changes (deploy new detection rules, conduct threat hunts)
Check: Measure results (did MTTD decrease?)
Act: Standardize if successful, or adjust if unsuccessful
Repeat: Continuous iteration
Regular Review Cadence¶
Daily: - Alert volume, MTTA (operational monitoring) - Critical incident status
Weekly: - FP rate, TP rate - Detection rule performance (which rules fire most? Accuracy?) - Automation metrics (time saved, accuracy)
Monthly: - MTTD, MTTR, Dwell Time - Detection coverage (gaps identified?) - Analyst satisfaction and turnover - SOC scorecard review with management
Quarterly: - Purple team exercises (test coverage) - Strategic planning (budget, staffing, tool evaluation) - Lessons learned from major incidents
Benchmarking¶
Internal Benchmarking: - Compare month-over-month: "Is our MTTR improving?"
External Benchmarking: - Compare to industry peers: "Our MTTD is 12 days vs. industry average 21 days → We're better than average"
Sources: - Verizon DBIR (Data Breach Investigations Report) - Mandiant M-Trends - SANS SOC Survey - Industry ISACs (sector-specific benchmarks)
Caution: Context matters. A financial services SOC may have different targets than a healthcare SOC (regulatory requirements, threat landscape).
11.6 Metrics Pitfalls¶
Pitfall 1: Metric Manipulation (Goodhart's Law)¶
Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure."
Example: - SOC KPI: "Reduce alert volume by 50%" - Analyst action: Disable half of detection rules - Result: Alert volume down, but security degraded
Mitigation: - Use balanced scorecards (multiple metrics) - Monitor leading indicators (detection coverage) and lagging indicators (incidents)
Pitfall 2: Vanity Metrics¶
Vanity Metric: Numbers that look impressive but don't drive actionable insights.
Example: - "We processed 10 million logs today!" (So what? Did you detect threats?)
Better Metric: - "We detected 5 incidents, contained in <2 hours, zero data loss"
Pitfall 3: Ignoring Context¶
Example: - SOC A: MTTR = 30 minutes - SOC B: MTTR = 4 hours - Conclusion: "SOC A is better"
Missing Context: - SOC A incidents: Mostly low-severity phishing - SOC B incidents: Complex APT intrusions requiring forensics
Mitigation: Segment metrics by severity and incident type.
Interactive Element¶
MicroSim 11: SOC Metrics Dashboard
Simulate SOC operations and see real-time impact of decisions on MTTD, MTTR, FP rate, and analyst burnout.
Common Misconceptions¶
Misconception: Lower Alert Volume Always Means Better Security
Reality: Reducing alerts by disabling detections degrades security. The goal is to reduce noise (FPs) while maintaining or improving signal (TPs).
Misconception: MTTR Is the Only Metric That Matters
Reality: MTTR without MTTD context is incomplete. Fast response (low MTTR) is useless if you detect threats 60 days late (high MTTD). Balanced approach is key.
Misconception: Metrics Are Only for Reporting to Management
Reality: Metrics are operational tools. Analysts use them to prioritize work, identify bottlenecks, and improve processes. Management reporting is a secondary benefit.
Practice Tasks¶
Task 1: Calculate MTTR¶
Incidents: - Incident 1: Detected at 10:00, contained at 10:45 (45 min) - Incident 2: Detected at 14:00, contained at 18:00 (4 hours = 240 min) - Incident 3: Detected at 08:00, contained at 09:30 (90 min)
Question: What is the MTTR?
Answer
MTTR = (45 + 240 + 90) / 3 = 375 / 3 = 125 minutes (2 hours 5 minutes)
Analysis: Incident 2 is an outlier (4 hours). Investigate why it took so long: - Complex investigation? - Delayed escalation? - Lack of automation?
Task 2: Evaluate Detection Coverage¶
Your Environment: - Total applicable ATT&CK techniques: 120 - Techniques with detections: 84
Question: a) What is your detection coverage %? b) Is this acceptable?
Answers
a) Coverage = 84 / 120 = 70%
b) Acceptable but needs improvement: - 70% is decent for mid-maturity SOC - Target: >80% for critical techniques (Initial Access, Execution, Lateral Movement, Exfiltration) - Action: Prioritize gap analysis for the 36 uncovered techniques, focusing on high-impact ones
Task 3: Identify Vanity Metric¶
Which of these is a vanity metric? a) "We blocked 1 million threats this month" b) "Our MTTD decreased from 21 days to 10 days" c) "We improved detection coverage from 60% to 75%"
Answer
a) "We blocked 1 million threats this month" is a vanity metric.
Why: - Lacks context: What were the threats? Severity? Were they already blocked by other controls? - Doesn't indicate SOC effectiveness (firewall may have blocked 999,000; SOC contributed minimally) - Sounds impressive but doesn't measure SOC performance
b) and c) are actionable metrics: - MTTD reduction: Measurable improvement in detection speed - Coverage increase: Tangible gap closure, reduces blind spots
Exam Prep & Certifications¶
Relevant Certifications
The topics in this chapter align with the following certifications:
- CompTIA Security+ — Domains: Security Operations, Security Architecture
- CompTIA CySA+ — Domains: Security Operations, Threat Management
- GIAC GCIH — Domains: Detection, Automation
- CISSP — Domains: Security Operations, Software Development Security
Self-Assessment Quiz¶
Question 1: What does MTTD measure?
Options:
a) Mean Time to Deploy new detection rules b) Mean Time to Detect an incident after initial compromise c) Mean Time to Document incident reports d) Mean Time to Disable user accounts
Show Answer
Correct Answer: b) Mean Time to Detect an incident after initial compromise
Explanation: MTTD tracks detection speed—the time between attacker's initial entry and SOC detection.
Question 2: A SOC has 1,000 alerts. 700 are false positives. What is the FP rate?
Options:
a) 30% b) 50% c) 70% d) 100%
Show Answer
Correct Answer: c) 70%
Explanation: FP Rate = 700 / 1,000 = 70%
Question 3: What is the purpose of a balanced scorecard?
Options:
a) To measure only financial metrics b) To track performance across multiple dimensions (detection, response, operations, business) c) To replace all other metrics with a single score d) To calculate analyst bonuses
Show Answer
Correct Answer: b) To track performance across multiple dimensions (detection, response, operations, business)
Explanation: Balanced scorecards prevent over-optimization of a single metric by measuring holistic performance.
Question 4: What does detection coverage measure?
Options:
a) Percentage of alerts that are true positives b) Percentage of ATT&CK techniques with active detections c) Percentage of systems with EDR deployed d) Percentage of analysts who passed certification exams
Show Answer
Correct Answer: b) Percentage of ATT&CK techniques with active detections
Explanation: Detection coverage maps detections to ATT&CK techniques, identifying gaps in monitoring.
Question 5: What is Goodhart's Law in the context of SOC metrics?
Options:
a) Metrics should always increase over time b) When a measure becomes a target, it ceases to be a good measure c) Metrics are more important than actual security d) All metrics should be publicly reported
Show Answer
Correct Answer: b) When a measure becomes a target, it ceases to be a good measure
Explanation: If you over-optimize for one metric (e.g., "reduce alerts"), people game the system (disable detections), degrading actual security.
Question 6: Why is continuous retraining important for ML-based detections?
Options:
a) It's not important; ML models are static once deployed b) The threat landscape evolves, causing model drift and degraded performance c) Retraining increases alert volume d) Analysts need something to do
Show Answer
Correct Answer: b) The threat landscape evolves, causing model drift and degraded performance
Explanation: Attackers change TTPs; data distributions shift. Models trained on old data become less effective. Regular retraining maintains accuracy.
Summary¶
In this chapter, you learned:
- SOC performance metrics: MTTD, MTTR, MTTA, dwell time, alert volume, FP rate
- Coverage metrics: Detection coverage (ATT&CK %), test coverage
- AI/ML metrics: Precision, recall, F1-score, automation ROI
- Balanced scorecard: Multi-dimensional performance measurement (detection, response, operations, business)
- Continuous improvement: PDCA cycle, regular review cadence, benchmarking
- Metrics pitfalls: Goodhart's Law, vanity metrics, ignoring context
Next Steps¶
- Next Chapter: Chapter 12: Governance, Privacy & Risk - Learn compliance, ethical considerations, and risk management
- Practice: Build a SOC scorecard for your environment using the Metrics Dashboard MicroSim
- Implement: Start tracking MTTD, MTTR, and FP rate monthly
- Benchmark: Compare your metrics to industry reports (Verizon DBIR, Mandiant M-Trends)
Chapter 11 Complete | Next: Chapter 12 →