Chapter 11: Evaluation & Metrics - Quiz
Instructions
Test your knowledge of SOC metrics (MTTD/MTTA/MTTR/MTTC), ML performance metrics (precision/recall/F1/ROC/AUC), confusion matrices, balanced scorecards, coverage metrics, alert fatigue, and dwell time.
Question 1: What does MTTD (Mean Time to Detect) measure?
A) Time from incident creation to analyst acknowledgment
B) Average time from when an incident occurs to when it is detected by security controls
C) Time to resolve an incident
D) Time to deploy detection rules
Answer
Correct Answer: B) Average time from when an incident occurs to when it is detected
Explanation:
MTTD (Mean Time to Detect):
- Definition: Average time between incident occurrence and detection
- Start: Attacker action begins
- End: Security control generates alert
- Goal: Minimize detection lag
Example Calculation:
Incident 1:
- Attacker starts: 10:00 AM
- EDR detects malware: 10:05 AM
- MTTD: 5 minutes
Incident 2:
- Phishing email sent: 2:00 PM
- User reports: 2:30 PM
- MTTD: 30 minutes
Incident 3:
- Lateral movement: 3:00 PM
- SIEM alerts: 3:45 PM
- MTTD: 45 minutes
Average MTTD = (5 + 30 + 45) / 3 = 26.7 minutes
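The three-incident average above can be reproduced in Python. A minimal sketch; the timestamps mirror the example, the year and shared date are assumptions, and the same pattern computes MTTA or MTTR:

```python
from datetime import datetime

# (attack start, detection time) pairs from the three example incidents
incidents = [
    (datetime(2026, 2, 1, 10, 0), datetime(2026, 2, 1, 10, 5)),   # EDR: 5 min
    (datetime(2026, 2, 1, 14, 0), datetime(2026, 2, 1, 14, 30)),  # user report: 30 min
    (datetime(2026, 2, 1, 15, 0), datetime(2026, 2, 1, 15, 45)),  # SIEM: 45 min
]

# MTTD = mean of (detected - occurred) across incidents, in minutes
lags = [(detected - occurred).total_seconds() / 60 for occurred, detected in incidents]
mttd_minutes = sum(lags) / len(lags)
print(f"MTTD: {mttd_minutes:.1f} minutes")  # MTTD: 26.7 minutes
```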
Industry Benchmarks:
- Excellent: < 1 hour
- Good: 1-4 hours
- Average: 4-24 hours
- Poor: > 24 hours
Factors Affecting MTTD:
- Detection Coverage: More sensors → Faster detection
- Rule Quality: Well-tuned rules → Fewer blind spots
- Threat Intelligence: IOC matching → Immediate detection
- Behavioral Analytics: UEBA → Detects novel attacks
Improving MTTD:
1. Deploy EDR on all endpoints (real-time detection)
2. Integrate threat intelligence feeds (instant IOC matching)
3. Implement UEBA (detect anomalies, not just signatures)
4. Reduce alert noise (analysts focus on real threats)
5. Automate enrichment (faster validation)
MTTD vs Dwell Time:
- MTTD: Detection lag only
- Dwell Time: Full time attacker remains undetected (includes MTTD + investigation + containment)
Reference: Chapter 11, Section 11.1 - MTTD
Question 2: What does MTTA (Mean Time to Acknowledge) measure?
A) Time from incident occurrence to detection
B) Average time from alert generation to analyst acknowledgment
C) Time to contain an incident
D) Time to eradicate threats
Answer
Correct Answer: B) Average time from alert generation to analyst acknowledgment
Explanation:
MTTA (Mean Time to Acknowledge):
- Definition: Average time from alert firing to analyst acknowledgment
- Start: SIEM/EDR generates alert
- End: Analyst opens/acknowledges alert
- Goal: Minimize alert queue time
Example Calculation:
Alert 1:
- Alert fires: 10:05 AM
- Analyst acknowledges: 10:12 AM
- MTTA: 7 minutes
Alert 2:
- Alert fires: 11:00 AM
- Analyst acknowledges: 11:25 AM
- MTTA: 25 minutes
Alert 3:
- Alert fires: 2:00 PM
- Analyst acknowledges: 2:03 PM
- MTTA: 3 minutes
Average MTTA = (7 + 25 + 3) / 3 = 11.7 minutes
Industry Benchmarks:
- Excellent: < 5 minutes
- Good: 5-15 minutes
- Average: 15-30 minutes
- Poor: > 30 minutes
Factors Affecting MTTA:
- Alert Volume: 500 alerts/day → Longer queue times
- Prioritization: Auto-scoring → High-priority alerts acknowledged faster
- Staffing: 24/7 coverage → Faster acknowledgment
- Alert Fatigue: High FP rate → Analysts desensitized, slower response
Improving MTTA:
1. Alert Prioritization: ML scoring → Analysts focus on high-risk alerts first
2. Reduce False Positives: Tuning → Less alert noise
3. Auto-Enrichment: SOAR → Alerts pre-enriched, faster triage
4. Adequate Staffing: 24/7 coverage → No queue buildup
5. Dashboards: Real-time alert queue visibility
MTTA Impact:
Scenario 1: Long MTTA (45 minutes)
- Ransomware alert fires at 10:00 AM
- Analyst acknowledges at 10:45 AM
- By then: Ransomware has encrypted 50 systems
- Impact: High damage
Scenario 2: Short MTTA (3 minutes)
- Ransomware alert fires at 10:00 AM
- Analyst acknowledges at 10:03 AM
- Immediate containment: Only 2 systems affected
- Impact: Minimal damage
Reference: Chapter 11, Section 11.2 - MTTA
Question 3: What does MTTR (Mean Time to Respond) measure?
A) Time to detect an incident
B) Average time from detection to incident containment/remediation
C) Time to acknowledge an alert
D) Time to generate a report
Answer
Correct Answer: B) Average time from detection to incident containment/remediation
Explanation:
MTTR (Mean Time to Respond/Remediate):
- Definition: Average time from detection to containment/resolution
- Start: Alert acknowledged (investigation begins)
- End: Incident contained/remediated
- Goal: Minimize damage window
Example Calculation:
Incident 1:
- Alert acknowledged: 10:05 AM
- Incident contained: 11:00 AM
- MTTR: 55 minutes
Incident 2:
- Alert acknowledged: 2:00 PM
- Incident contained: 4:30 PM
- MTTR: 150 minutes
Incident 3:
- Alert acknowledged: 9:00 PM
- Incident contained: 9:20 PM
- MTTR: 20 minutes
Average MTTR = (55 + 150 + 20) / 3 = 75 minutes
Industry Benchmarks:
- Excellent: < 1 hour
- Good: 1-4 hours
- Average: 4-24 hours
- Poor: > 24 hours
Factors Affecting MTTR:
- Runbooks: Clear procedures → Faster response
- Automation: SOAR playbooks → Immediate containment
- Analyst Skill: Experienced analysts → Efficient investigation
- Tool Integration: Seamless EDR/firewall integration → Quick actions
MTTR Phases:
Phase 1: Investigation (30% of MTTR)
- Scope incident (how many systems affected?)
- Root cause analysis
Phase 2: Containment (40% of MTTR)
- Isolate affected systems
- Disable compromised accounts
- Block malicious IPs
Phase 3: Eradication (20% of MTTR)
- Remove malware
- Close backdoors
- Patch vulnerabilities
Phase 4: Validation (10% of MTTR)
- Verify threat eliminated
- Confirm systems clean
Improving MTTR:
1. Playbooks/Runbooks: Documented procedures reduce decision time
2. SOAR Automation: Auto-isolation, auto-blocking → Faster containment
3. Analyst Training: Skill development → Efficient investigation
4. Tool Integration: One-click containment (EDR isolation from SIEM)
5. Pre-Approved Actions: Pre-authorize common responses (block known-bad IPs)
MTTR Impact Example:
Ransomware Incident:
Long MTTR (6 hours):
- 10:00 AM: Ransomware detected
- 4:00 PM: Contained
- Damage: 200 systems encrypted, $2M recovery cost
Short MTTR (30 minutes):
- 10:00 AM: Ransomware detected
- 10:30 AM: Contained
- Damage: 5 systems encrypted, $50K recovery cost
ROI of MTTR Reduction: $1.95M saved
Reference: Chapter 11, Section 11.3 - MTTR
Question 4: What does MTTC (Mean Time to Contain) measure?
A) Time to create alerts
B) Average time from detection to when the threat is fully contained (stopped from spreading)
C) Time to close tickets
D) Time to communicate with stakeholders
Answer
Correct Answer: B) Average time from detection to when threat is fully contained
Explanation:
MTTC (Mean Time to Contain):
- Definition: Average time from detection to containment (stopping threat spread)
- Start: Alert detected
- End: Threat contained (can't spread further)
- Difference from MTTR: MTTC = containment only; MTTR = containment + eradication + recovery
Example:
Ransomware Incident:
- 10:00 AM: Ransomware detected on WKS-001
- 10:15 AM: Analyst investigates, finds lateral movement to FILE-SRV
- 10:30 AM: Both systems isolated (network disconnected)
- MTTC: 30 minutes (threat contained, cannot spread)
- 11:00 AM: Malware removed (eradication)
- 12:00 PM: Systems restored from backup (recovery)
- MTTR: 2 hours (full resolution)
MTTC (30 min) < MTTR (2 hours)
Why MTTC Matters:
Critical Insight: Containment prevents additional damage
Fast Containment (MTTC 15 min):
- Ransomware on 1 system
- Isolated before spread
- Impact: 1 system
Slow Containment (MTTC 2 hours):
- Ransomware spreads to 50 systems
- Impact: 50 systems
MTTC is often more important than MTTR for limiting blast radius
Containment Strategies:
1. Network Isolation
- Disconnect from network (EDR, firewall ACL)
- MTTC: < 5 minutes (automated)
2. Account Disabling
- Disable compromised AD account
- MTTC: < 10 minutes
3. IP Blocking
- Block C2 IP at firewall/proxy
- MTTC: < 5 minutes (automated)
4. Process Termination
- Kill malicious process via EDR
- MTTC: < 1 minute (automated)
MTTC Automation:
SOAR Playbook: Ransomware Auto-Containment
Trigger: EDR ransomware alert
Actions:
1. Isolate host (0-2 minutes) ← CONTAINMENT
2. Disable user account (2-3 minutes) ← CONTAINMENT
3. Block C2 IPs (3-4 minutes) ← CONTAINMENT
4. Create IR ticket (4-5 minutes)
5. Notify analyst (5 minutes)
Automated MTTC: 5 minutes (vs. 30-60 minutes manual)
Industry Benchmarks:
- Excellent: < 15 minutes
- Good: 15-60 minutes
- Average: 1-4 hours
- Poor: > 4 hours
Reference: Chapter 11, Section 11.4 - MTTC
Question 5: What is a confusion matrix and what does it show?
A) A matrix that confuses analysts
B) A table showing True Positives, False Positives, True Negatives, and False Negatives for ML model evaluation
C) A network routing table
D) A compliance checklist
Answer
Correct Answer: B) A table showing TP, FP, TN, FN for ML model evaluation
Explanation:
Confusion Matrix:
- Purpose: Visualize ML classification performance
- Structure: 2x2 table (for binary classification)
- Contents: Actual vs Predicted class counts
Confusion Matrix Structure:
                    Predicted MALWARE    Predicted BENIGN
Actually MALWARE    TP = 90              FN = 10               (100 total malware)
Actually BENIGN     FP = 20              TN = 880              (900 total benign)
Total               110 predicted malware  890 predicted benign
Definitions:
- TP (True Positive): Correctly predicted malware (90)
- FP (False Positive): Benign incorrectly predicted as malware (20)
- FN (False Negative): Malware incorrectly predicted as benign (10)
- TN (True Negative): Correctly predicted benign (880)
Derived Metrics:
Accuracy = (TP + TN) / Total
= (90 + 880) / 1000 = 97%
Precision = TP / (TP + FP)
= 90 / (90 + 20) = 81.8%
"Of predicted malware, how many were actually malware?"
Recall = TP / (TP + FN)
= 90 / (90 + 10) = 90%
"Of actual malware, how many did we detect?"
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
= 2 × (0.818 × 0.90) / (0.818 + 0.90) = 0.857 = 85.7%
False Positive Rate = FP / (FP + TN)
= 20 / (20 + 880) = 2.2%
False Negative Rate = FN / (TP + FN)
= 10 / (90 + 10) = 10%
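All of these derived metrics follow from the four counts, so a small helper can compute them in one place. A sketch using the numbers above (the function name is illustrative):

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive standard evaluation metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),   # false positive rate
        "fnr": fn / (tp + fn),   # false negative rate
    }

m = classification_metrics(tp=90, fp=20, fn=10, tn=880)
print(f"Accuracy {m['accuracy']:.1%}, Precision {m['precision']:.1%}, "
      f"Recall {m['recall']:.1%}, F1 {m['f1']:.1%}")
# Accuracy 97.0%, Precision 81.8%, Recall 90.0%, F1 85.7%
```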
SOC Example: Phishing Detection Model
Test Set: 500 emails (100 phishing, 400 legitimate)
Confusion Matrix:
                       Predicted PHISHING    Predicted LEGITIMATE
Actually PHISHING      TP = 92               FN = 8                 (100)
Actually LEGITIMATE    FP = 15               TN = 385               (400)
Metrics:
- Accuracy: (92 + 385) / 500 = 95.4%
- Precision: 92 / (92 + 15) = 86%
- Recall: 92 / (92 + 8) = 92%
- F1: 88.9%
Interpretation:
- 92% of phishing emails detected (good recall)
- 86% of phishing predictions are correct (good precision)
- 8 phishing emails missed (FN) - need improvement
- 15 legitimate emails flagged as phishing (FP) - acceptable
Multi-Class Confusion Matrix:
Alert Severity Classification (Low/Medium/High/Critical)
                    Predicted Low   Predicted Medium   Predicted High   Predicted Critical
Actually Low        850             40                 5                0
Actually Medium     30              180                15               0
Actually High       5               25                 65               5
Actually Critical   0               2                  8                40
Shows: Model confuses High and Critical (8 Critical predicted as High)
Action: Retrain with more Critical examples
Using Confusion Matrix:
1. Identify Weaknesses:
- High FN? Model missing threats → Improve recall
- High FP? Too many false alarms → Improve precision
2. Tune Thresholds:
- Lower threshold → More detections (higher recall, lower precision)
- Raise threshold → Fewer false alarms (higher precision, lower recall)
3. Compare Models:
- Model A: Precision 90%, Recall 70%
- Model B: Precision 80%, Recall 90%
- Choose based on priority (FP vs FN tolerance)
Reference: Chapter 11, Section 11.5 - Confusion Matrix
Question 6: What is the difference between precision and recall, and when would you prioritize each?
A) Precision and recall are the same
B) Precision = accuracy of positive predictions, Recall = coverage of actual positives. Prioritize precision to reduce FPs, recall to reduce FNs
C) Precision is always more important
D) Only accuracy matters
Answer
Correct Answer: B) Precision = accuracy of positives, Recall = coverage. Prioritize precision for FPs, recall for FNs
Explanation:
Precision:
- Formula: Precision = TP / (TP + FP)
- Question: "Of all predicted positives, how many were correct?"
- Focus: Minimizing false positives
- Trade-off: Can miss threats (low recall) to avoid false alarms
Recall (Sensitivity):
- Formula: Recall = TP / (TP + FN)
- Question: "Of all actual positives, how many did we detect?"
- Focus: Minimizing false negatives
- Trade-off: May have more false alarms (low precision) to catch all threats
When to Prioritize Precision:
Use Case: Auto-Blocking System
Scenario: SOAR auto-blocks IPs flagged by ML model
Priority: HIGH PRECISION (minimize FPs)
Reason:
- False Positive = Blocking legitimate IP → Business disruption
- Cost of FP: High (downtime, customer impact)
- Cost of FN: Medium (analyst can catch manually)
Strategy:
- High confidence threshold (>90%) for auto-blocking
- Precision: 95%, Recall: 70%
- Accept missing some threats to avoid blocking legitimate traffic
Use Case: Alert Generation (Not Auto-Blocking)
Scenario: Model generates alerts for analyst review
Priority: MEDIUM PRECISION, MEDIUM RECALL (balanced)
Reason:
- False Positive = Analyst wastes time (acceptable in moderation)
- False Negative = Missed threat (unacceptable)
- Cost of FP: Low (analyst time)
- Cost of FN: High (breach)
Strategy:
- Moderate threshold (>70%)
- Precision: 80%, Recall: 85%
- Balance FP and FN
When to Prioritize Recall:
Use Case: Critical Infrastructure Protection
Scenario: Nuclear plant malware detection
Priority: HIGH RECALL (catch all threats)
Reason:
- False Negative = Missed malware → Catastrophic failure
- False Positive = Investigation time (acceptable)
- Cost of FN: Catastrophic
- Cost of FP: Low (analyst time)
Strategy:
- Low confidence threshold (>50%)
- Precision: 60%, Recall: 98%
- Accept high FP rate to catch nearly all threats
Use Case: Initial Screening
Scenario: First-stage malware scan before deeper analysis
Priority: HIGH RECALL
Reason:
- Stage 1: Cast wide net (high recall)
- Stage 2: Human analyst reviews flagged items (filters FPs)
- Cost of FN: High (missed malware)
- Cost of FP: Low (analyst reviews)
Strategy:
- Recall: 95%, Precision: 65%
- Send all suspicious items to analyst
Precision vs Recall Trade-off:
Threshold Adjustment:
Threshold: 0.9 (very strict)
- Only flag if 90%+ confident
- Precision: 95% (very few FPs)
- Recall: 60% (miss 40% of threats)
- Use: Auto-blocking, low FP tolerance
Threshold: 0.7 (moderate)
- Flag if 70%+ confident
- Precision: 85%
- Recall: 80%
- Use: Balanced alert generation
Threshold: 0.5 (aggressive)
- Flag if 50%+ confident
- Precision: 70% (many FPs)
- Recall: 95% (catch almost all threats)
- Use: Critical systems, can't afford to miss threats
Real-World Example:
Phishing Email Detection:
Option A: High Precision
- Threshold: 95%
- Precision: 98% (only 2% of flagged emails are legitimate)
- Recall: 75% (miss 25% of phishing)
- Action: Auto-delete flagged emails
- Justification: Can't risk deleting legitimate emails
Option B: High Recall
- Threshold: 60%
- Precision: 70% (30% of flagged emails are legitimate)
- Recall: 95% (catch 95% of phishing)
- Action: Move to "Suspected Phishing" folder for user review
- Justification: User can manually check, better than missing phishing
F1 Score (Balanced Metric):
When you need balance between Precision and Recall:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
Scenario: General malware detection
- Can't afford too many FPs (analyst fatigue)
- Can't afford too many FNs (missed malware)
- Optimize for F1 score (harmonic mean of precision and recall)
Example:
Model A: Precision 90%, Recall 70%, F1 = 78.8%
Model B: Precision 80%, Recall 85%, F1 = 82.4% ← Better balanced
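A quick sanity check of both F1 values; the helper below is a minimal sketch of the harmonic-mean formula:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

model_a = f1(0.90, 0.70)  # ≈ 0.788
model_b = f1(0.80, 0.85)  # ≈ 0.824
print(f"Model A F1: {model_a:.1%}, Model B F1: {model_b:.1%}")
# Model A F1: 78.8%, Model B F1: 82.4%
```

Model B wins on F1 despite lower precision, because the harmonic mean penalizes the larger gap between Model A's precision and recall.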
Question 7: What does ROC (Receiver Operating Characteristic) curve show?
A) Return on investment
B) Trade-off between True Positive Rate and False Positive Rate across different thresholds
C) Network bandwidth
D) Analyst productivity
Answer
Correct Answer: B) Trade-off between TPR and FPR across different thresholds
Explanation:
ROC Curve:
- Purpose: Visualize classifier performance across all thresholds
- X-axis: False Positive Rate (FPR)
- Y-axis: True Positive Rate (TPR / Recall)
- Interpretation: Higher curve = better model
ROC Metrics:
True Positive Rate (TPR) = Recall = TP / (TP + FN)
- "What % of actual threats did we detect?"
False Positive Rate (FPR) = FP / (FP + TN)
- "What % of benign items did we incorrectly flag?"
ROC Curve Example:
Malware Detector Confidence Thresholds:
Threshold 0.9 (strict):
- TPR: 60% (detect 60% of malware)
- FPR: 1% (1% of benign files flagged)
- Point: (0.01, 0.60)
Threshold 0.7:
- TPR: 80%
- FPR: 5%
- Point: (0.05, 0.80)
Threshold 0.5:
- TPR: 90%
- FPR: 15%
- Point: (0.15, 0.90)
Threshold 0.3 (aggressive):
- TPR: 98%
- FPR: 40%
- Point: (0.40, 0.98)
Plot these points to create ROC curve
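The area under a curve through these four operating points can be approximated with the trapezoidal rule. A sketch, where the (0, 0) and (1, 1) endpoints are an assumption (a real ROC sweep would include them by construction):

```python
# (FPR, TPR) operating points from the thresholds above, plus assumed endpoints
points = [(0.0, 0.0), (0.01, 0.60), (0.05, 0.80),
          (0.15, 0.90), (0.40, 0.98), (1.0, 1.0)]

# Trapezoidal rule: sum of segment widths times average segment heights
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(f"Approximate AUC: {auc:.2f}")  # roughly 0.94, in the "excellent" band
```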
ROC Interpretation:
Perfect Classifier:
- Curve hugs top-left corner
- TPR = 100%, FPR = 0%
- Unrealistic in practice
Good Classifier:
- Curve bows toward top-left
- High TPR, low FPR
- Example: (FPR: 5%, TPR: 90%)
Random Classifier:
- Diagonal line (no better than guessing)
- TPR = FPR
- Example: (FPR: 50%, TPR: 50%)
Bad Classifier:
- Curve below diagonal (worse than random)
- Should invert predictions
AUC (Area Under Curve):
AUC Score: 0 to 1
AUC = 1.0: Perfect classifier
AUC = 0.9-0.99: Excellent
AUC = 0.8-0.89: Good
AUC = 0.7-0.79: Fair
AUC = 0.5: Random (no skill)
AUC < 0.5: Worse than random
Example:
Malware Detector AUC = 0.92 → Excellent performance
Using ROC for Threshold Selection:
Scenario: Choose threshold for malware detection
Option 1: High TPR, High FPR (Threshold 0.3)
- TPR: 98% (catch almost all malware)
- FPR: 40% (high false alarm rate)
- Use Case: Critical systems, can't afford to miss malware
- Trade-off: Analyst investigates many false alarms
Option 2: Medium TPR, Low FPR (Threshold 0.7)
- TPR: 80% (catch most malware)
- FPR: 5% (low false alarm rate)
- Use Case: Balanced approach
- Trade-off: Miss 20% of malware
Option 3: Low TPR, Very Low FPR (Threshold 0.9)
- TPR: 60% (catch only high-confidence malware)
- FPR: 1% (very low false alarms)
- Use Case: Auto-blocking (can't afford FPs)
- Trade-off: Miss 40% of malware
Comparing Models with ROC:
Model A AUC: 0.85
Model B AUC: 0.92
Model C AUC: 0.78
Ranking: Model B > Model A > Model C
Conclusion: Deploy Model B (best overall performance)
ROC Limitations:
1. Class Imbalance:
- ROC can be optimistic with imbalanced data (rare threats)
- Precision-Recall curve may be more informative
2. Single Threshold:
- Shows all thresholds, but you must choose one
- Business requirements dictate threshold choice
3. Equal Cost Assumption:
- Assumes FP and FN costs are equal
- In security, FN (missed threat) often more costly than FP
SOC Example:
UEBA Anomaly Detector:
ROC Analysis:
- AUC: 0.88 (good performance)
- Operating Point: FPR 10%, TPR 85%
- 10% FP rate = 50 false alerts/day (acceptable)
- 85% TPR = Detect most insider threats
Decision:
- Deploy at threshold 0.65
- Monitor FP rate weekly
- Retrain if performance degrades
Reference: Chapter 11, Section 11.7 - ROC/AUC
Question 8: What is dwell time and why is it a critical security metric?
A) Time analysts spend at their desks
B) Average time an attacker remains undetected in the environment from initial compromise to discovery
C) Time to respond to alerts
D) Time to generate reports
Answer
Correct Answer: B) Average time attacker remains undetected from compromise to discovery
Explanation:
Dwell Time:
- Definition: Time from initial compromise to detection/eradication
- Start: Attacker gains initial access
- End: Organization detects and removes attacker
- Goal: Minimize (shorter dwell time = less damage)
Dwell Time Calculation:
Incident Timeline:
- Feb 1, 10:00 AM: Phishing email clicked (initial access)
- Feb 15, 2:00 PM: Unusual data exfiltration detected
- Feb 15, 4:00 PM: Incident confirmed and attacker ejected
Dwell Time = Feb 15 4:00 PM - Feb 1 10:00 AM = 14 days, 6 hours
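The timeline arithmetic can be sketched with Python's datetime; the year is an assumption, since the example gives only month and day:

```python
from datetime import datetime

compromise = datetime(2026, 2, 1, 10, 0)    # phishing email clicked
eradication = datetime(2026, 2, 15, 16, 0)  # incident confirmed, attacker ejected

dwell = eradication - compromise
print(f"Dwell time: {dwell.days} days, {dwell.seconds // 3600} hours")
# Dwell time: 14 days, 6 hours
```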
Industry Benchmarks:
2024 IBM Cost of Data Breach Report:
- Global Average Dwell Time: 277 days (9+ months!)
- Best-in-Class: < 30 days
- Excellent: < 7 days
- Good: < 24 hours
- Optimal: < 1 hour
Why Dwell Time Matters:
Short Dwell Time (< 1 day):
Impact:
- Limited lateral movement (1-2 systems)
- Minimal data exfiltration
- Ransomware contained before encryption
- Low recovery cost
Example:
- Day 0: Attacker gains access
- Day 0 + 6 hours: EDR detects, SOC contains
- Damage: 1 compromised workstation, no data loss
- Cost: $10,000 (cleanup, user re-imaging)
Long Dwell Time (> 100 days):
Impact:
- Extensive lateral movement (50+ systems)
- Persistent backdoors established
- Significant data exfiltration
- Ransomware encrypts critical systems
- High recovery cost
Example:
- Day 0: Attacker gains access (phishing)
- Day 30: Lateral movement to servers
- Day 60: Persistent backdoors deployed
- Day 90: Credential harvesting, data exfiltration
- Day 120: Ransomware deployed
- Day 120 + 2 hours: Detected (too late)
- Damage: 200 systems encrypted, 500GB data stolen
- Cost: $5,000,000 (ransom, recovery, regulatory fines, lawsuits)
Factors Affecting Dwell Time:
1. Detection Capabilities:
Poor Detection:
- Signature-only (no behavioral analytics)
- No threat intelligence
- Limited log coverage
- Result: Long dwell time (months)
Strong Detection:
- EDR on all endpoints
- UEBA for anomalies
- Threat intelligence integration
- Result: Short dwell time (hours-days)
2. Threat Type:
Commodity Malware:
- Dwell Time: Hours (noisy, easy to detect)
- Example: Mass ransomware campaign
Sophisticated APT:
- Dwell Time: Months (stealthy, living off the land)
- Example: Nation-state espionage
3. Response Speed:
Fast Response (SOAR automation):
- Alert → Auto-containment: 5 minutes
- Dwell Time: < 1 hour
Slow Response (manual):
- Alert → Analyst triage: 2 hours
- Triage → Escalation: 4 hours
- Investigation → Containment: 12 hours
- Dwell Time: 18+ hours
Reducing Dwell Time:
1. Improve MTTD (Mean Time to Detect):
- Deploy EDR, UEBA, threat intelligence
- Reduce detection lag from months to minutes
2. Improve MTTA (Mean Time to Acknowledge):
- Auto-prioritization
- 24/7 SOC coverage
- Reduce alert queue time
3. Improve MTTR (Mean Time to Respond):
- SOAR automation
- Clear runbooks
- Reduce containment time
Formula: Dwell Time = MTTD + MTTA + MTTR (investigation + containment)
Dwell Time vs. Time-to-Objectives:
Attacker Objectives Timeline:
- Hour 1: Initial access (phishing)
- Hour 2: Establish persistence
- Hour 6: Lateral movement begins
- Day 1: Domain admin credentials stolen
- Day 3: Data exfiltration starts
- Day 7: Ransomware deployed
Defender Goal: Detect and contain before key objectives
- Detect within Hour 1 → Prevent persistence
- Detect within Day 1 → Prevent data exfil
- Detect after Day 7 → Already lost (ransomware deployed)
Measuring Dwell Time:
Challenge: Hard to measure until breach is detected
Approaches:
1. Post-Incident Analysis:
- Forensics determines initial compromise date
- Calculate: (Detection date - Compromise date)
2. Purple Team Exercises:
- Red team simulates attack
- Measure how long until blue team detects
- Realistic dwell time estimate
3. Indicators:
- MTTD trends (if MTTD decreasing, dwell time likely decreasing)
- Threat hunting findings (proactive detection reduces dwell time)
Reference: Chapter 11, Section 11.8 - Dwell Time
Question 9: What is alert fatigue and how do you measure it?
A) Analysts being tired from working long hours
B) Desensitization from excessive alerts (especially false positives), measured by response times, closure rates, and alert volume
C) Fatigued hardware
D) Alert fatigue is not measurable
Answer
Correct Answer: B) Desensitization from excessive alerts, measured by response times, closure rates, and alert volume
Explanation:
Alert Fatigue:
- Definition: Desensitization to alerts due to high volume and false positive rate
- Cause: Too many alerts, too many false positives
- Impact: Analysts ignore/delay critical alerts, miss real threats
Alert Fatigue Metrics:
1. Alert Volume:
Alerts per day:
- < 100: Manageable
- 100-300: Moderate (monitor FP rate)
- 300-500: High (tuning needed)
- > 500: Critical (severe fatigue risk)
Example:
SOC receives 800 alerts/day
- 3 analysts on shift
- 267 alerts/analyst/day
- 8-hour shift: 33 alerts/hour = 1 alert every 2 minutes
- Result: No time for deep investigation, alert fatigue
2. False Positive Rate:
FP Rate = False Positives / Total Alerts
Thresholds:
- < 10%: Excellent
- 10-20%: Acceptable
- 20-40%: Concerning
- > 40%: Critical (severe fatigue)
Example:
500 alerts/day, 60% FP rate
- 300 alerts/day are false positives
- Analysts waste time on noise
- Real threats buried in noise
3. Mean Time to Acknowledge (MTTA) Trend:
Alert Fatigue Indicator: MTTA increasing over time
Week 1: MTTA = 5 minutes
Week 4: MTTA = 12 minutes
Week 8: MTTA = 25 minutes ← Alert fatigue
Cause: Analysts deprioritizing alerts (assume FP)
4. Alert Closure Rate:
Closure Rate = Closed Alerts / Total Alerts
Fatigue Indicator: Low closure rate
Healthy: 95% closed (analysts investigate all)
Fatigued: 70% closed (analysts ignore 30%)
Example:
500 alerts generated
350 alerts closed (70%)
150 alerts ignored/aged out (30%) ← Potential real threats missed
5. Analyst Turnover:
Fatigue Impact: Burnout, high turnover
Healthy: < 10% annual turnover
Concerning: 20-30% turnover
Critical: > 30% turnover
Exit Interview Themes:
- "Too many alerts"
- "Can't keep up"
- "Work feels meaningless (all false positives)"
6. Time-to-Triage Trend:
Average time spent per alert:
No Fatigue: 10 minutes (thorough investigation)
Moderate Fatigue: 5 minutes (rushed triage)
Severe Fatigue: 2 minutes (rubber-stamping, minimal investigation)
Risk: Cursory review misses nuanced threats
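The volume, FP-rate, and closure-rate indicators above can be rolled into a simple health check. A minimal sketch; the thresholds come from the bands in this answer, and the function name is illustrative:

```python
def fatigue_indicators(total_alerts: int, false_positives: int, closed: int) -> list[str]:
    """Flag alert-fatigue warning signs from daily SOC counts."""
    warnings = []
    if total_alerts > 500:  # "critical" volume band
        warnings.append(f"critical alert volume ({total_alerts}/day)")
    fp_rate = false_positives / total_alerts
    if fp_rate > 0.40:      # "critical" FP band
        warnings.append(f"critical FP rate ({fp_rate:.0%})")
    closure_rate = closed / total_alerts
    if closure_rate < 0.95:  # below "healthy" closure rate
        warnings.append(f"low closure rate ({closure_rate:.0%})")
    return warnings

# Numbers from the closure-rate example: 500 alerts/day, 60% FP, 350 closed
print(fatigue_indicators(total_alerts=500, false_positives=300, closed=350))
# ['critical FP rate (60%)', 'low closure rate (70%)']
```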
Alert Fatigue Impact:
Scenario: Critical Alert Missed
Environment:
- 600 alerts/day, 50% FP rate
- Analysts fatigued, MTTA = 30 minutes
Incident:
- 10:00 AM: Critical ransomware alert fires
- 10:30 AM: Analyst acknowledges (delayed due to queue)
- Analyst assumes FP (based on high FP rate), closes without investigation
- 12:00 PM: Ransomware encrypts 50 systems
Root Cause: Alert fatigue → Critical alert dismissed as FP
Reducing Alert Fatigue:
1. Tune Detection Rules:
Before: Generic "failed login" rule
- 400 alerts/day (80% FP - service account password rotation lag)
After: Tuned rule
- Exclude known service accounts
- Alert only on > 10 failures in 5 minutes
- 50 alerts/day (20% FP)
Impact: 87.5% alert reduction
2. Alert Prioritization:
ML Alert Scoring:
- Critical (score > 90): 10 alerts/day → Immediate investigation
- High (70-90): 30 alerts/day → Investigate same day
- Medium (50-70): 100 alerts/day → Review if time permits
- Low (< 50): 200 alerts/day → Auto-close or log only
Analyst Focus: Top 40 high-priority alerts (vs. 340 total)
Result: Reduced fatigue, higher-quality investigations
3. Automation (SOAR):
Auto-Close False Positives:
- Legitimate service account failures → Auto-close
- Known-good IPs flagged → Auto-close
- Result: 40% alert reduction
Auto-Enrichment:
- Threat intel, user context pre-populated
- Analyst spends 2 min vs 10 min per alert
4. Deduplication:
Before: 50 alerts for same incident (1 malware on 1 system)
After: 1 aggregated incident ticket
Result: 98% noise reduction
5. Threshold Tuning:
Data Access Alert:
Before: Alert on > 10 files/hour (500 alerts/day, 70% FP)
After: Alert on > 100 files/hour (50 alerts/day, 20% FP)
Result: 90% reduction, lower FP rate
Monitoring Alert Fatigue Dashboard:
Weekly SOC Health Metrics:
Alert Volume:
- This Week: 2,450 alerts
- Last Week: 2,100 alerts
- Trend: ↑ 16% (investigate spike)
False Positive Rate:
- This Week: 35%
- Last Week: 30%
- Trend: ↑ (tune rules)
MTTA:
- This Week: 18 minutes
- Last Week: 12 minutes
- Trend: ↑ (alert fatigue indicator)
Closure Rate:
- This Week: 82%
- Last Week: 88%
- Trend: ↓ (analysts overwhelmed)
Action Required: Tune high-volume, high-FP rules to reduce fatigue
Reference: Chapter 11, Section 11.9 - Alert Fatigue
Question 10: What is a balanced scorecard in SOC metrics?
A) A physical balance scale
B) A framework measuring SOC performance across multiple dimensions (detection, response, efficiency, quality) not just single metrics
C) A financial report
D) Balanced scorecards are not used in SOCs
Answer
Correct Answer: B) Framework measuring performance across multiple dimensions (detection, response, efficiency, quality)
Explanation:
Balanced Scorecard:
- Purpose: Holistic SOC performance measurement
- Approach: Multiple categories, not just one metric
- Benefit: Prevents over-optimization of single metric at expense of others
SOC Balanced Scorecard Dimensions:
1. Detection Effectiveness:
Metrics:
- Mean Time to Detect (MTTD): 45 minutes (Target: < 1 hour) ✅
- Detection Coverage: 85% of MITRE ATT&CK (Target: > 80%) ✅
- Threat Hunting Findings: 12/quarter (Target: > 10) ✅
- UEBA Anomaly Detection Rate: 15/month
Goal: Catching threats quickly and comprehensively
2. Response Efficiency:
Metrics:
- Mean Time to Acknowledge (MTTA): 8 minutes (Target: < 10 min) ✅
- Mean Time to Respond (MTTR): 65 minutes (Target: < 2 hours) ✅
- Mean Time to Contain (MTTC): 20 minutes (Target: < 30 min) ✅
- Incident Escalation Rate: 15% (Target: 10-20%) ✅
Goal: Responding quickly and efficiently
3. Quality & Accuracy:
Metrics:
- False Positive Rate: 18% (Target: < 20%) ✅
- True Positive Rate: 88% (Target: > 85%) ✅
- Alert Precision: 82% (Target: > 80%) ✅
- Missed Incidents (False Negatives): 2/quarter (Target: < 5) ✅
Goal: Accurate detections, minimal noise
4. Operational Efficiency:
Metrics:
- Alerts per Analyst per Day: 45 (Target: < 50) ✅
- Alert Handling Capacity: 300 alerts/day (team of 5)
- Automation Rate: 60% (Target: > 50%) ✅
- Cost per Alert Processed: $12 (Target: < $15) ✅
Goal: Sustainable workload, efficient operations
5. Team Health:
Metrics:
- Analyst Turnover: 12% annual (Target: < 15%) ✅
- Analyst Satisfaction: 78% (Target: > 75%) ✅
- Training Hours per Analyst: 40 hours/year (Target: > 32) ✅
- Shift Coverage: 98% (Target: > 95%) ✅
Goal: Healthy, skilled, engaged team
6. Coverage & Visibility:
Metrics:
- Endpoint Coverage: 98% (EDR deployed) (Target: > 95%) ✅
- Log Source Coverage: 85% of critical assets (Target: > 80%) ✅
- Cloud Visibility: 90% (Target: > 85%) ✅
- Network Traffic Visibility: 75% (Target: improve to 85%) ⚠️
Goal: Comprehensive visibility across environment
Balanced Scorecard Example:
SOC Quarterly Scorecard - Q1 2026
Overall Score: 87/100 (Good)
Dimension Scores:
1. Detection Effectiveness: 92/100 ✅ Excellent
2. Response Efficiency: 88/100 ✅ Good
3. Quality & Accuracy: 85/100 ✅ Good
4. Operational Efficiency: 82/100 ✅ Good
5. Team Health: 90/100 ✅ Excellent
6. Coverage & Visibility: 83/100 ⚠️ Needs improvement
Strengths:
- Excellent detection speed (MTTD: 45 min)
- Strong team morale (90% satisfaction)
- Low FP rate (18%)
Weaknesses:
- Network visibility gaps (75% vs 85% target)
- MTTR slightly high (65 min vs 60 min target)
Action Items:
1. Deploy network TAPs to improve visibility (Q2 2026)
2. Optimize ransomware response playbook to reduce MTTR
3. Maintain current detection and team health trends
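The 87/100 overall score is consistent with an unweighted mean of the six dimension scores. A sketch assuming equal weights; a real scorecard might weight dimensions differently:

```python
dimension_scores = {
    "Detection Effectiveness": 92,
    "Response Efficiency": 88,
    "Quality & Accuracy": 85,
    "Operational Efficiency": 82,
    "Team Health": 90,
    "Coverage & Visibility": 83,
}

# Unweighted mean across dimensions
overall = sum(dimension_scores.values()) / len(dimension_scores)
print(f"Overall SOC score: {overall:.0f}/100")  # Overall SOC score: 87/100
```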
Why Balanced Scorecard Matters:
Pitfall: Over-Optimizing Single Metric
Scenario 1: Optimize only for MTTA
- Action: Set auto-acknowledge on all alerts
- Result: MTTA = 1 second ✅
- Side Effect: No actual investigation, missed threats ❌
Scenario 2: Optimize only for FP rate
- Action: Set very high alert threshold (only 100% confidence)
- Result: FP rate = 0% ✅
- Side Effect: Recall drops to 30%, miss most threats ❌
Scenario 3: Optimize only for alert volume reduction
- Action: Disable all noisy rules
- Result: 50 alerts/day (very manageable) ✅
- Side Effect: Detection coverage drops to 40% ❌
Balanced Approach:
Balanced scorecard prevents gaming:
- Can't reduce FP rate by reducing detection coverage
- Can't improve MTTA by skipping investigation
- Must balance competing priorities
Example:
- Reduce FP rate: 30% → 18% ✅
- Maintain Detection Coverage: 85% → 85% ✅
- Maintain MTTD: 45 min → 45 min ✅
- Result: Genuine improvement without sacrificing other areas
Scorecard Reporting:
Executive Dashboard (Monthly):
- Overall SOC Health: 87/100 (↑ 5 points from last month)
- Detection: 92/100 ✅
- Response: 88/100 ✅
- Quality: 85/100 ✅
- Efficiency: 82/100 ✅
- Team: 90/100 ✅
- Coverage: 83/100 ⚠️
Narrative:
"SOC performance improved this quarter, driven by detection rule tuning (FP rate down 12 points) and SOAR deployment (MTTR down 20%). Network visibility remains below target; it is being addressed with TAP deployment in Q2."
Question 11: What is detection coverage and how is it measured?
A) Physical coverage of security cameras B) Percentage of MITRE ATT&CK techniques and tactics that the SOC has detection rules for C) Network bandwidth D) Coverage is not measurable
Answer
Correct Answer: B) Percentage of MITRE ATT&CK techniques the SOC has detection rules for
Explanation:
Detection Coverage: - Definition: Proportion of attack techniques that the SOC can detect - Framework: Typically measured against MITRE ATT&CK - Goal: Maximize coverage to reduce blind spots
Measuring Detection Coverage:
ATT&CK-Based Coverage:
MITRE ATT&CK:
- Total Techniques (Enterprise): ~200 techniques
- Your Detection Rules: 165 rules mapped to techniques
- Unique Techniques Covered: 140
- Coverage: 140 / 200 = 70%
Coverage by Tactic:
Initial Access (9 techniques):
- Covered: 7 (78%)
- Gaps: T1200 (Hardware Additions), T1091 (Replication Through Removable Media)
Execution (12 techniques):
- Covered: 11 (92%)
- Gap: T1059.008 (Network Device CLI)
Persistence (19 techniques):
- Covered: 14 (74%)
- Gaps: 5 techniques
Privilege Escalation (13 techniques):
- Covered: 10 (77%)
... (for all 14 tactics)
Overall Coverage: 70% (140/200 techniques)
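The arithmetic behind these coverage figures can be sketched in Python. The tactic totals are the illustrative ones above, not a complete ATT&CK inventory:

```python
# Sketch: ATT&CK detection coverage per tactic and overall, using the
# illustrative counts from the example above.
tactics = {
    "Initial Access": (7, 9),
    "Execution": (11, 12),
    "Persistence": (14, 19),
    "Privilege Escalation": (10, 13),
}
for name, (covered, total) in tactics.items():
    print(f"{name}: {covered}/{total} covered ({covered/total:.0%})")

# Overall coverage counts unique techniques, not rules: several rules can
# map to the same technique, which is why 165 rules cover only 140 techniques.
unique_covered, total_techniques = 140, 200
print(f"Overall: {unique_covered}/{total_techniques} = "
      f"{unique_covered/total_techniques:.0%}")
```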
Coverage Heat Map:
MITRE ATT&CK Navigator:
- Green: Detected (high confidence)
- Yellow: Partially detected (behavioral only)
- Red: No detection
Example:
T1003.001 (LSASS Dumping): Green (detected by EDR + SIEM)
T1055 (Process Injection): Yellow (partial behavioral detection)
T1574 (DLL Hijacking): Red (no coverage - blind spot!)
Coverage Quality Levels:
Level 1: No Detection (0 points)
- No rule exists
Level 2: Theoretical Detection (1 point)
- Rule exists but never tested
Level 3: Tested Detection (2 points)
- Rule tested in lab, not production-validated
Level 4: Production-Validated (3 points)
- Detected real attack or purple team exercise
Level 5: Auto-Response (4 points)
- Detection + automated containment
Weighted Coverage = (Total Points) / (Max Possible Points)
Example Coverage Calculation:
Technique T1003.001 (LSASS Dumping):
- Detection Rule: Yes (SIEM + EDR)
- Tested: Yes (purple team exercise)
- Production Validated: Yes (detected real attack)
- Auto-Response: No (requires approval gate)
- Score: 3/4 points (75%)
Technique T1059.001 (PowerShell):
- Detection Rule: Yes
- Tested: Yes
- Production Validated: Yes
- Auto-Response: Yes (SOAR kills suspicious PowerShell)
- Score: 4/4 points (100%)
Technique T1574 (DLL Hijacking):
- Detection Rule: No
- Score: 0/4 points (0%)
Overall Coverage = Average of all technique scores
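A minimal sketch of the weighted-coverage calculation, using the three worked techniques above. A full inventory would score every Enterprise technique, with uncovered ones at 0 points:

```python
# Sketch: weighted coverage using the 0-4 point quality levels defined above.
LEVEL_POINTS = {
    "none": 0,           # Level 1: no rule exists
    "theoretical": 1,    # Level 2: rule exists, never tested
    "tested": 2,         # Level 3: lab-tested only
    "validated": 3,      # Level 4: production-validated
    "auto_response": 4,  # Level 5: detection + automated containment
}

# Technique scores follow the worked example above
technique_levels = {
    "T1003.001": "validated",      # LSASS dumping: validated, no auto-response
    "T1059.001": "auto_response",  # PowerShell: SOAR kills suspicious processes
    "T1574": "none",               # DLL hijacking: blind spot
}

points = sum(LEVEL_POINTS[lvl] for lvl in technique_levels.values())
max_points = 4 * len(technique_levels)
print(f"Weighted coverage: {points}/{max_points} = {points/max_points:.0%}")
```

Note how the weighting separates "a rule exists" from "a rule works": the three techniques are 2/3 covered by a binary count, but only 7/12 points by quality.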
Coverage Gaps Analysis:
Priority 1 Gaps (High Risk, No Coverage):
- T1574 (DLL Hijacking) - No detection ← Add DLL load-path monitoring
- T1021.001 (RDP) - Limited detection ← Enhance logging
Priority 2 Gaps (Medium Risk):
- T1055 (Process Injection) - Partial detection ← Improve behavioral rules
Priority 3 Gaps (Low Risk):
- T1200 (Hardware Additions) - Not applicable (data center controls)
Improving Coverage:
1. Map Existing Rules to ATT&CK:
- Inventory all detection rules
- Tag with ATT&CK technique IDs
- Identify covered techniques
2. Identify Gaps:
- Techniques with 0 rules = blind spots
- Prioritize by threat intel (what attackers actually use)
3. Deploy New Detections:
- Purple team: Test new rules
- Validate in production
- Update coverage score
4. Continuous Improvement:
- Quarterly coverage review
- New ATT&CK techniques → Assess coverage
- Emerging threats → Add detections
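Steps 1 and 2 above reduce to a set operation: the gap list is the techniques you care about minus the techniques your rules cover. A sketch, with hypothetical rule names and a hypothetical priority list:

```python
# Sketch: mapping detection rules to ATT&CK techniques and finding gaps.
# Rule names are hypothetical; a real inventory would come from your
# SIEM/EDR rule export, tagged with technique IDs.
rule_inventory = {
    "win_lsass_access": ["T1003.001"],
    "ps_encoded_command": ["T1059.001", "T1027"],
    "rdp_anomalous_login": ["T1021.001"],
}

# Techniques threat intel says attackers in your sector actually use
# (an assumed list for illustration)
priority_techniques = {"T1003.001", "T1059.001", "T1574", "T1055"}

covered = {t for techs in rule_inventory.values() for t in techs}
gaps = priority_techniques - covered

print(f"Covered techniques: {sorted(covered)}")
print(f"Priority blind spots: {sorted(gaps)}")
```

The same set difference, run against the full ATT&CK technique list instead of a priority subset, yields the complete gap inventory for the quarterly review.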
Coverage Reporting:
SOC Coverage Report - Q1 2026
Overall Coverage: 82% (164/200 techniques)
By Tactic:
- Initial Access: 78%
- Execution: 92% ✅
- Persistence: 74% ⚠️
- Privilege Escalation: 77%
- Defense Evasion: 68% ⚠️ (priority for improvement)
- Credential Access: 85%
- Discovery: 90% ✅
- Lateral Movement: 88% ✅
- Collection: 75%
- Exfiltration: 80%
- Command & Control: 86%
- Impact: 95% ✅
Top 5 Priority Gaps:
1. T1027 (Obfuscated Files or Information) - No detection
2. T1055 (Process Injection) - Weak detection
3. T1070 (Indicator Removal) - No detection
4. T1078 (Valid Accounts) - Limited detection
5. T1497 (Virtualization/Sandbox Evasion) - No detection
Action Plan:
- Deploy T1027 detection: Entropy analysis (Q2)
- Improve T1055: Enhanced EDR behavioral rules (Q2)
- Add T1070: File deletion monitoring (Q3)
Question 12: What are leading vs lagging indicators in SOC metrics?
A) Leading indicators predict future performance, lagging indicators measure past performance B) They are the same thing C) Leading indicators are for executives, lagging for analysts D) Indicators don't matter
Answer
Correct Answer: A) Leading indicators predict future, lagging indicators measure past
Explanation:
Leading Indicators (Predictive): - Definition: Metrics that predict future SOC performance - Purpose: Early warning of problems - Actionable: Can intervene before issues escalate
Lagging Indicators (Historical): - Definition: Metrics that measure past performance - Purpose: Assess what happened - Less Actionable: Can't change past, but can learn from it
SOC Leading Indicators:
1. Alert Volume Trend:
Leading Indicator: Alert volume increasing 20%/week
Prediction: Analysts will be overwhelmed in 2-3 weeks
Action: Tune high-volume rules NOW before fatigue sets in
Example:
Week 1: 300 alerts/day
Week 2: 360 alerts/day (+20%)
Week 3: 432 alerts/day (+20%)
Week 4 Projection: 518 alerts/day (unsustainable)
Preventive Action: Tune rules in Week 2 to prevent Week 4 crisis
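The projection above is simple compound growth. A sketch, with a hypothetical sustainable triage capacity of 450 alerts/day as the trigger threshold:

```python
# Sketch: projecting alert volume under sustained 20% week-over-week growth,
# the leading-indicator math behind the example above.
baseline = 300   # alerts/day observed in week 1
growth = 0.20    # observed week-over-week growth rate
capacity = 450   # hypothetical sustainable triage capacity (alerts/day)

for week in range(1, 5):
    volume = baseline * (1 + growth) ** (week - 1)
    flag = "  <-- exceeds capacity, tune rules before this point" if volume > capacity else ""
    print(f"Week {week}: {volume:.0f} alerts/day{flag}")
```

Running this reproduces the 300 → 360 → 432 → 518 progression; the value of the leading indicator is that the capacity breach is visible in week 2, while there is still time to act.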
2. False Positive Rate Increasing:
Leading Indicator: FP rate rising (15% → 25%)
Prediction: Alert fatigue, analyst turnover
Action: Tune detection rules before fatigue sets in
3. MTTA Increasing:
Leading Indicator: MTTA rising (10 min → 20 min over 2 weeks)
Prediction: Analysts overwhelmed or fatigued
Action: Investigate workload, add staffing, or reduce alert volume
4. Detection Coverage Gaps:
Leading Indicator: 30% of ATT&CK techniques not covered
Prediction: Blind spots will be exploited
Action: Prioritize closing critical gaps before attack
5. Analyst Satisfaction Declining:
Leading Indicator: Quarterly survey shows satisfaction drop (85% → 70%)
Prediction: Turnover risk in 6-12 months
Action: Address morale issues, reduce alert fatigue
6. Threat Intel: Emerging Campaign:
Leading Indicator: New ransomware campaign targeting your industry
Prediction: Attack likely incoming
Action: Deploy detections, patch vulnerabilities NOW
SOC Lagging Indicators:
1. Mean Time to Detect (MTTD):
Lagging Indicator: MTTD = 2 hours (measures past incidents)
Insight: Historical detection speed
Limitation: Can't change past incidents
Use: Benchmark, trend analysis, goal setting
2. Incident Count:
Lagging Indicator: 15 confirmed incidents last month
Insight: Historical attack volume
Limitation: Reactive (incidents already happened)
Use: Assess security posture, budget justification
3. Dwell Time:
Lagging Indicator: Average dwell time 14 days
Insight: How long attackers went undetected in past incidents
Limitation: Only known after incident resolved
Use: Demonstrate need for better detection
4. Breach Cost:
Lagging Indicator: Ransomware incident cost $500K
Insight: Financial impact of past breach
Limitation: Too late to prevent this breach
Use: Justify SOC budget increases
5. False Negative Rate:
Lagging Indicator: Missed 5 incidents (discovered via threat hunting)
Insight: Historical detection gaps
Limitation: Incidents already occurred
Use: Improve detections for future
Using Leading & Lagging Together:
Example: Preventing Alert Fatigue
Leading Indicators (Predictive):
- Alert volume: ↑ 30% month-over-month
- MTTA: ↑ from 10 min to 18 min
- FP rate: ↑ from 15% to 28%
- Analyst survey: Satisfaction ↓ from 85% to 72%
Prediction: Alert fatigue crisis in 4-6 weeks
Preventive Actions:
1. Tune high-volume rules (reduce 30% alert volume)
2. Deploy ML scoring (prioritize high-confidence alerts)
3. SOAR auto-enrichment (reduce triage time)
4. Add temp analyst coverage (reduce workload)
Result: Crisis averted before burnout/turnover
Lagging Indicators (Validation):
After 1 month:
- Alert volume: ↓ to 280/day (28% reduction) ✅
- MTTA: ↓ to 12 min ✅
- FP rate: ↓ to 18% ✅
- Analyst satisfaction: ↑ to 80% ✅
Lagging indicators confirm that the actions driven by the leading indicators worked.
Balanced Dashboard:
Leading Indicators (Left Column):
- Alert Volume Trend: ↑ 15% ⚠️ (watch closely)
- FP Rate Trend: ↓ 5% ✅ (improvement)
- MTTA Trend: Stable ✅
- Coverage Gaps: 18% ⚠️ (close gaps)
- Analyst Morale: 82% ✅
Lagging Indicators (Right Column):
- MTTD Last Month: 45 min ✅
- Incidents Detected: 12 ✅
- Incidents Missed: 1 ✅
- MTTR Last Month: 75 min ✅
Actionable Insights:
- Alert volume rising → Tune rules this week
- Coverage gaps → Prioritize 5 critical techniques
- Overall performance good (lagging indicators green)
Reference: Chapter 11, Section 11.12 - Leading vs Lagging Indicators
Score Interpretation¶
- 10-12 correct: Excellent! You have strong command of SOC and ML performance metrics.
- 7-9 correct: Good understanding. Review confusion matrix and ROC/AUC concepts.
- 4-6 correct: Adequate baseline. Focus on MTTD/MTTA/MTTR and precision/recall trade-offs.
- Below 4: Review Chapter 11 thoroughly, especially core SOC metrics and ML evaluation.