Chapter 11: Evaluation & Metrics - Quiz
Instructions
Test your knowledge of SOC metrics (MTTD/MTTA/MTTR/MTTC), ML performance metrics (precision/recall/F1/ROC/AUC), confusion matrices, balanced scorecards, coverage metrics, alert fatigue, and dwell time.
Question 1: What does MTTD (Mean Time to Detect) measure?
A) Time from incident creation to analyst acknowledgment
B) Average time from when an incident occurs to when it is detected by security controls
C) Time to resolve an incident
D) Time to deploy detection rules
Answer
Correct Answer: B) Average time from when an incident occurs to when it is detected
Explanation:
MTTD (Mean Time to Detect):
- Definition: Average time between incident occurrence and detection
- Start: Attacker action begins
- End: Security control generates alert
- Goal: Minimize detection lag
Example Calculation:
Incident 1:
- Attacker starts: 10:00 AM
- EDR detects malware: 10:05 AM
- MTTD: 5 minutes
Incident 2:
- Phishing email sent: 2:00 PM
- User reports: 2:30 PM
- MTTD: 30 minutes
Incident 3:
- Lateral movement: 3:00 PM
- SIEM alerts: 3:45 PM
- MTTD: 45 minutes
Average MTTD = (5 + 30 + 45) / 3 = 26.7 minutes
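The three-incident average above can be reproduced in Python. A minimal sketch; the timestamps mirror the example, the year and shared date are assumptions, and the same pattern computes MTTA or MTTR:

```python
from datetime import datetime

# (attack start, detection time) pairs from the three example incidents
incidents = [
    (datetime(2026, 2, 1, 10, 0), datetime(2026, 2, 1, 10, 5)),   # EDR: 5 min
    (datetime(2026, 2, 1, 14, 0), datetime(2026, 2, 1, 14, 30)),  # user report: 30 min
    (datetime(2026, 2, 1, 15, 0), datetime(2026, 2, 1, 15, 45)),  # SIEM: 45 min
]

# MTTD = mean of (detected - occurred) across incidents, in minutes
lags = [(detected - occurred).total_seconds() / 60 for occurred, detected in incidents]
mttd_minutes = sum(lags) / len(lags)
print(f"MTTD: {mttd_minutes:.1f} minutes")  # MTTD: 26.7 minutes
```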
Industry Benchmarks:
- Excellent: < 1 hour
- Good: 1-4 hours
- Average: 4-24 hours
- Poor: > 24 hours
Factors Affecting MTTD:
- Detection Coverage: More sensors → Faster detection
- Rule Quality: Well-tuned rules → Fewer blind spots
- Threat Intelligence: IOC matching → Immediate detection
- Behavioral Analytics: UEBA → Detects novel attacks
Improving MTTD:
1. Deploy EDR on all endpoints (real-time detection)
2. Integrate threat intelligence feeds (instant IOC matching)
3. Implement UEBA (detect anomalies, not just signatures)
4. Reduce alert noise (analysts focus on real threats)
5. Automate enrichment (faster validation)
MTTD vs Dwell Time:
- MTTD: Detection lag only
- Dwell Time: Full time attacker remains undetected (includes MTTD + investigation + containment)
Reference: Chapter 11, Section 11.1 - MTTD
Question 2: What does MTTA (Mean Time to Acknowledge) measure?
A) Time from incident occurrence to detection
B) Average time from alert generation to analyst acknowledgment
C) Time to contain an incident
D) Time to eradicate threats
Answer
Correct Answer: B) Average time from alert generation to analyst acknowledgment
Explanation:
MTTA (Mean Time to Acknowledge):
- Definition: Average time from alert firing to analyst acknowledgment
- Start: SIEM/EDR generates alert
- End: Analyst opens/acknowledges alert
- Goal: Minimize alert queue time
Example Calculation:
Alert 1:
- Alert fires: 10:05 AM
- Analyst acknowledges: 10:12 AM
- MTTA: 7 minutes
Alert 2:
- Alert fires: 11:00 AM
- Analyst acknowledges: 11:25 AM
- MTTA: 25 minutes
Alert 3:
- Alert fires: 2:00 PM
- Analyst acknowledges: 2:03 PM
- MTTA: 3 minutes
Average MTTA = (7 + 25 + 3) / 3 = 11.7 minutes
Industry Benchmarks:
- Excellent: < 5 minutes
- Good: 5-15 minutes
- Average: 15-30 minutes
- Poor: > 30 minutes
Factors Affecting MTTA:
- Alert Volume: 500 alerts/day → Longer queue times
- Prioritization: Auto-scoring → High-priority alerts acknowledged faster
- Staffing: 24/7 coverage → Faster acknowledgment
- Alert Fatigue: High FP rate → Analysts desensitized, slower response
Improving MTTA:
1. Alert Prioritization: ML scoring → Analysts focus on high-risk alerts first
2. Reduce False Positives: Tuning → Less alert noise
3. Auto-Enrichment: SOAR → Alerts pre-enriched, faster triage
4. Adequate Staffing: 24/7 coverage → No queue buildup
5. Dashboards: Real-time alert queue visibility
MTTA Impact:
Scenario 1: Long MTTA (45 minutes)
- Ransomware alert fires at 10:00 AM
- Analyst acknowledges at 10:45 AM
- By then: Ransomware has encrypted 50 systems
- Impact: High damage
Scenario 2: Short MTTA (3 minutes)
- Ransomware alert fires at 10:00 AM
- Analyst acknowledges at 10:03 AM
- Immediate containment: Only 2 systems affected
- Impact: Minimal damage
Reference: Chapter 11, Section 11.2 - MTTA
Question 3: What does MTTR (Mean Time to Respond) measure?
A) Time to detect an incident
B) Average time from detection to incident containment/remediation
C) Time to acknowledge an alert
D) Time to generate a report
Answer
Correct Answer: B) Average time from detection to incident containment/remediation
Explanation:
MTTR (Mean Time to Respond/Remediate):
- Definition: Average time from detection to containment/resolution
- Start: Alert acknowledged (investigation begins)
- End: Incident contained/remediated
- Goal: Minimize damage window
Example Calculation:
Incident 1:
- Alert acknowledged: 10:05 AM
- Incident contained: 11:00 AM
- MTTR: 55 minutes
Incident 2:
- Alert acknowledged: 2:00 PM
- Incident contained: 4:30 PM
- MTTR: 150 minutes
Incident 3:
- Alert acknowledged: 9:00 PM
- Incident contained: 9:20 PM
- MTTR: 20 minutes
Average MTTR = (55 + 150 + 20) / 3 = 75 minutes
Industry Benchmarks:
- Excellent: < 1 hour
- Good: 1-4 hours
- Average: 4-24 hours
- Poor: > 24 hours
Factors Affecting MTTR:
- Runbooks: Clear procedures → Faster response
- Automation: SOAR playbooks → Immediate containment
- Analyst Skill: Experienced analysts → Efficient investigation
- Tool Integration: Seamless EDR/firewall integration → Quick actions
MTTR Phases:
Phase 1: Investigation (30% of MTTR)
- Scope incident (how many systems affected?)
- Root cause analysis
Phase 2: Containment (40% of MTTR)
- Isolate affected systems
- Disable compromised accounts
- Block malicious IPs
Phase 3: Eradication (20% of MTTR)
- Remove malware
- Close backdoors
- Patch vulnerabilities
Phase 4: Validation (10% of MTTR)
- Verify threat eliminated
- Confirm systems clean
Improving MTTR:
1. Playbooks/Runbooks: Documented procedures reduce decision time
2. SOAR Automation: Auto-isolation, auto-blocking → Faster containment
3. Analyst Training: Skill development → Efficient investigation
4. Tool Integration: One-click containment (EDR isolation from SIEM)
5. Pre-Approved Actions: Pre-authorize common responses (block known-bad IPs)
MTTR Impact Example:
Ransomware Incident:
Long MTTR (6 hours):
- 10:00 AM: Ransomware detected
- 4:00 PM: Contained
- Damage: 200 systems encrypted, $2M recovery cost
Short MTTR (30 minutes):
- 10:00 AM: Ransomware detected
- 10:30 AM: Contained
- Damage: 5 systems encrypted, $50K recovery cost
ROI of MTTR Reduction: $1.95M saved
Reference: Chapter 11, Section 11.3 - MTTR
Question 4: What does MTTC (Mean Time to Contain) measure?
A) Time to create alerts
B) Average time from detection to when the threat is fully contained (stopped from spreading)
C) Time to close tickets
D) Time to communicate with stakeholders
Answer
Correct Answer: B) Average time from detection to when threat is fully contained
Explanation:
MTTC (Mean Time to Contain):
- Definition: Average time from detection to containment (stopping threat spread)
- Start: Alert detected
- End: Threat contained (can't spread further)
- Difference from MTTR: MTTC = containment only; MTTR = containment + eradication + recovery
Example:
Ransomware Incident:
- 10:00 AM: Ransomware detected on WKS-001
- 10:15 AM: Analyst investigates, finds lateral movement to FILE-SRV
- 10:30 AM: Both systems isolated (network disconnected)
- MTTC: 30 minutes (threat contained, cannot spread)
- 11:00 AM: Malware removed (eradication)
- 12:00 PM: Systems restored from backup (recovery)
- MTTR: 2 hours (full resolution)
MTTC (30 min) < MTTR (2 hours)
Why MTTC Matters:
Critical Insight: Containment prevents additional damage
Fast Containment (MTTC 15 min):
- Ransomware on 1 system
- Isolated before spread
- Impact: 1 system
Slow Containment (MTTC 2 hours):
- Ransomware spreads to 50 systems
- Impact: 50 systems
MTTC is often more important than MTTR for limiting blast radius
Containment Strategies:
1. Network Isolation
- Disconnect from network (EDR, firewall ACL)
- MTTC: < 5 minutes (automated)
2. Account Disabling
- Disable compromised AD account
- MTTC: < 10 minutes
3. IP Blocking
- Block C2 IP at firewall/proxy
- MTTC: < 5 minutes (automated)
4. Process Termination
- Kill malicious process via EDR
- MTTC: < 1 minute (automated)
MTTC Automation:
SOAR Playbook: Ransomware Auto-Containment
Trigger: EDR ransomware alert
Actions:
1. Isolate host (0-2 minutes) ← CONTAINMENT
2. Disable user account (2-3 minutes) ← CONTAINMENT
3. Block C2 IPs (3-4 minutes) ← CONTAINMENT
4. Create IR ticket (4-5 minutes)
5. Notify analyst (5 minutes)
Automated MTTC: 5 minutes (vs. 30-60 minutes manual)
Industry Benchmarks:
- Excellent: < 15 minutes
- Good: 15-60 minutes
- Average: 1-4 hours
- Poor: > 4 hours
Reference: Chapter 11, Section 11.4 - MTTC
Question 5: What is a confusion matrix and what does it show?
A) A matrix that confuses analysts
B) A table showing True Positives, False Positives, True Negatives, and False Negatives for ML model evaluation
C) A network routing table
D) A compliance checklist
Answer
Correct Answer: B) A table showing TP, FP, TN, FN for ML model evaluation
Explanation:
Confusion Matrix:
- Purpose: Visualize ML classification performance
- Structure: 2x2 table (for binary classification)
- Contents: Actual vs Predicted class counts
Confusion Matrix Structure:
                    Predicted MALWARE    Predicted BENIGN
Actually MALWARE    TP = 90              FN = 10               (100 total malware)
Actually BENIGN     FP = 20              TN = 880              (900 total benign)
Total               110 predicted malware  890 predicted benign
Definitions:
- TP (True Positive): Correctly predicted malware (90)
- FP (False Positive): Benign incorrectly predicted as malware (20)
- FN (False Negative): Malware incorrectly predicted as benign (10)
- TN (True Negative): Correctly predicted benign (880)
Derived Metrics:
Accuracy = (TP + TN) / Total
= (90 + 880) / 1000 = 97%
Precision = TP / (TP + FP)
= 90 / (90 + 20) = 81.8%
"Of predicted malware, how many were actually malware?"
Recall = TP / (TP + FN)
= 90 / (90 + 10) = 90%
"Of actual malware, how many did we detect?"
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
= 2 × (0.818 × 0.90) / (0.818 + 0.90) = 0.857 = 85.7%
False Positive Rate = FP / (FP + TN)
= 20 / (20 + 880) = 2.2%
False Negative Rate = FN / (TP + FN)
= 10 / (90 + 10) = 10%
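All of these derived metrics follow from the four counts, so a small helper can compute them in one place. A sketch using the numbers above (the function name is illustrative):

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive standard evaluation metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),   # false positive rate
        "fnr": fn / (tp + fn),   # false negative rate
    }

m = classification_metrics(tp=90, fp=20, fn=10, tn=880)
print(f"Accuracy {m['accuracy']:.1%}, Precision {m['precision']:.1%}, "
      f"Recall {m['recall']:.1%}, F1 {m['f1']:.1%}")
# Accuracy 97.0%, Precision 81.8%, Recall 90.0%, F1 85.7%
```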
SOC Example: Phishing Detection Model
Test Set: 500 emails (100 phishing, 400 legitimate)
Confusion Matrix:
                       Predicted PHISHING    Predicted LEGITIMATE
Actually PHISHING      TP = 92               FN = 8                 (100)
Actually LEGITIMATE    FP = 15               TN = 385               (400)
Metrics:
- Accuracy: (92 + 385) / 500 = 95.4%
- Precision: 92 / (92 + 15) = 86%
- Recall: 92 / (92 + 8) = 92%
- F1: 88.9%
Interpretation:
- 92% of phishing emails detected (good recall)
- 86% of phishing predictions are correct (good precision)
- 8 phishing emails missed (FN) - need improvement
- 15 legitimate emails flagged as phishing (FP) - acceptable
Multi-Class Confusion Matrix:
Alert Severity Classification (Low/Medium/High/Critical)
                    Predicted Low   Predicted Medium   Predicted High   Predicted Critical
Actually Low        850             40                 5                0
Actually Medium     30              180                15               0
Actually High       5               25                 65               5
Actually Critical   0               2                  8                40
Shows: Model confuses High and Critical (8 Critical predicted as High)
Action: Retrain with more Critical examples
Using Confusion Matrix:
1. Identify Weaknesses:
- High FN? Model missing threats → Improve recall
- High FP? Too many false alarms → Improve precision
2. Tune Thresholds:
- Lower threshold → More detections (higher recall, lower precision)
- Raise threshold → Fewer false alarms (higher precision, lower recall)
3. Compare Models:
- Model A: Precision 90%, Recall 70%
- Model B: Precision 80%, Recall 90%
- Choose based on priority (FP vs FN tolerance)
Reference: Chapter 11, Section 11.5 - Confusion Matrix
Question 6: What is the difference between precision and recall, and when would you prioritize each?
A) Precision and recall are the same
B) Precision = accuracy of positive predictions, Recall = coverage of actual positives. Prioritize precision to reduce FPs, recall to reduce FNs
C) Precision is always more important
D) Only accuracy matters
Answer
Correct Answer: B) Precision = accuracy of positives, Recall = coverage. Prioritize precision for FPs, recall for FNs
Explanation:
Precision:
- Formula: Precision = TP / (TP + FP)
- Question: "Of all predicted positives, how many were correct?"
- Focus: Minimizing false positives
- Trade-off: Can miss threats (low recall) to avoid false alarms
Recall (Sensitivity):
- Formula: Recall = TP / (TP + FN)
- Question: "Of all actual positives, how many did we detect?"
- Focus: Minimizing false negatives
- Trade-off: May have more false alarms (low precision) to catch all threats
When to Prioritize Precision:
Use Case: Auto-Blocking System
Scenario: SOAR auto-blocks IPs flagged by ML model
Priority: HIGH PRECISION (minimize FPs)
Reason:
- False Positive = Blocking legitimate IP → Business disruption
- Cost of FP: High (downtime, customer impact)
- Cost of FN: Medium (analyst can catch manually)
Strategy:
- High confidence threshold (>90%) for auto-blocking
- Precision: 95%, Recall: 70%
- Accept missing some threats to avoid blocking legitimate traffic
Use Case: Alert Generation (Not Auto-Blocking)
Scenario: Model generates alerts for analyst review
Priority: MEDIUM PRECISION, MEDIUM RECALL (balanced)
Reason:
- False Positive = Analyst wastes time (acceptable in moderation)
- False Negative = Missed threat (unacceptable)
- Cost of FP: Low (analyst time)
- Cost of FN: High (breach)
Strategy:
- Moderate threshold (>70%)
- Precision: 80%, Recall: 85%
- Balance FP and FN
When to Prioritize Recall:
Use Case: Critical Infrastructure Protection
Scenario: Nuclear plant malware detection
Priority: HIGH RECALL (catch all threats)
Reason:
- False Negative = Missed malware → Catastrophic failure
- False Positive = Investigation time (acceptable)
- Cost of FN: Catastrophic
- Cost of FP: Low (analyst time)
Strategy:
- Low confidence threshold (>50%)
- Precision: 60%, Recall: 98%
- Accept high FP rate to catch nearly all threats
Use Case: Initial Screening
Scenario: First-stage malware scan before deeper analysis
Priority: HIGH RECALL
Reason:
- Stage 1: Cast wide net (high recall)
- Stage 2: Human analyst reviews flagged items (filters FPs)
- Cost of FN: High (missed malware)
- Cost of FP: Low (analyst reviews)
Strategy:
- Recall: 95%, Precision: 65%
- Send all suspicious items to analyst
Precision vs Recall Trade-off:
Threshold Adjustment:
Threshold: 0.9 (very strict)
- Only flag if 90%+ confident
- Precision: 95% (very few FPs)
- Recall: 60% (miss 40% of threats)
- Use: Auto-blocking, low FP tolerance
Threshold: 0.7 (moderate)
- Flag if 70%+ confident
- Precision: 85%
- Recall: 80%
- Use: Balanced alert generation
Threshold: 0.5 (aggressive)
- Flag if 50%+ confident
- Precision: 70% (many FPs)
- Recall: 95% (catch almost all threats)
- Use: Critical systems, can't afford to miss threats
Real-World Example:
Phishing Email Detection:
Option A: High Precision
- Threshold: 95%
- Precision: 98% (only 2% of flagged emails are legitimate)
- Recall: 75% (miss 25% of phishing)
- Action: Auto-delete flagged emails
- Justification: Can't risk deleting legitimate emails
Option B: High Recall
- Threshold: 60%
- Precision: 70% (30% of flagged emails are legitimate)
- Recall: 95% (catch 95% of phishing)
- Action: Move to "Suspected Phishing" folder for user review
- Justification: User can manually check, better than missing phishing
F1 Score (Balanced Metric):
When you need balance between Precision and Recall:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
Scenario: General malware detection
- Can't afford too many FPs (analyst fatigue)
- Can't afford too many FNs (missed malware)
- Optimize for F1 score (harmonic mean of precision and recall)
Example:
Model A: Precision 90%, Recall 70%, F1 = 78.8%
Model B: Precision 80%, Recall 85%, F1 = 82.4% ← Better balanced
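A quick sanity check of both F1 values; the helper below is a minimal sketch of the harmonic-mean formula:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

model_a = f1(0.90, 0.70)  # ≈ 0.788
model_b = f1(0.80, 0.85)  # ≈ 0.824
print(f"Model A F1: {model_a:.1%}, Model B F1: {model_b:.1%}")
# Model A F1: 78.8%, Model B F1: 82.4%
```

Model B wins on F1 despite lower precision, because the harmonic mean penalizes the larger gap between Model A's precision and recall.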
Question 7: What does ROC (Receiver Operating Characteristic) curve show?
A) Return on investment
B) Trade-off between True Positive Rate and False Positive Rate across different thresholds
C) Network bandwidth
D) Analyst productivity
Answer
Correct Answer: B) Trade-off between TPR and FPR across different thresholds
Explanation:
ROC Curve:
- Purpose: Visualize classifier performance across all thresholds
- X-axis: False Positive Rate (FPR)
- Y-axis: True Positive Rate (TPR / Recall)
- Interpretation: Higher curve = better model
ROC Metrics:
True Positive Rate (TPR) = Recall = TP / (TP + FN)
- "What % of actual threats did we detect?"
False Positive Rate (FPR) = FP / (FP + TN)
- "What % of benign items did we incorrectly flag?"
ROC Curve Example:
Malware Detector Confidence Thresholds:
Threshold 0.9 (strict):
- TPR: 60% (detect 60% of malware)
- FPR: 1% (1% of benign files flagged)
- Point: (0.01, 0.60)
Threshold 0.7:
- TPR: 80%
- FPR: 5%
- Point: (0.05, 0.80)
Threshold 0.5:
- TPR: 90%
- FPR: 15%
- Point: (0.15, 0.90)
Threshold 0.3 (aggressive):
- TPR: 98%
- FPR: 40%
- Point: (0.40, 0.98)
Plot these points to create ROC curve
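The area under a curve through these four operating points can be approximated with the trapezoidal rule. A sketch, where the (0, 0) and (1, 1) endpoints are an assumption (a real ROC sweep would include them by construction):

```python
# (FPR, TPR) operating points from the thresholds above, plus assumed endpoints
points = [(0.0, 0.0), (0.01, 0.60), (0.05, 0.80),
          (0.15, 0.90), (0.40, 0.98), (1.0, 1.0)]

# Trapezoidal rule: sum of segment widths times average segment heights
auc = sum((x2 - x1) * (y1 + y2) / 2
          for (x1, y1), (x2, y2) in zip(points, points[1:]))
print(f"Approximate AUC: {auc:.2f}")  # roughly 0.94, in the "excellent" band
```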
ROC Interpretation:
Perfect Classifier:
- Curve hugs top-left corner
- TPR = 100%, FPR = 0%
- Unrealistic in practice
Good Classifier:
- Curve bows toward top-left
- High TPR, low FPR
- Example: (FPR: 5%, TPR: 90%)
Random Classifier:
- Diagonal line (no better than guessing)
- TPR = FPR
- Example: (FPR: 50%, TPR: 50%)
Bad Classifier:
- Curve below diagonal (worse than random)
- Should invert predictions
AUC (Area Under Curve):
AUC Score: 0 to 1
AUC = 1.0: Perfect classifier
AUC = 0.9-0.99: Excellent
AUC = 0.8-0.89: Good
AUC = 0.7-0.79: Fair
AUC = 0.5: Random (no skill)
AUC < 0.5: Worse than random
Example:
Malware Detector AUC = 0.92 → Excellent performance
Using ROC for Threshold Selection:
Scenario: Choose threshold for malware detection
Option 1: High TPR, High FPR (Threshold 0.3)
- TPR: 98% (catch almost all malware)
- FPR: 40% (high false alarm rate)
- Use Case: Critical systems, can't afford to miss malware
- Trade-off: Analyst investigates many false alarms
Option 2: Medium TPR, Low FPR (Threshold 0.7)
- TPR: 80% (catch most malware)
- FPR: 5% (low false alarm rate)
- Use Case: Balanced approach
- Trade-off: Miss 20% of malware
Option 3: Low TPR, Very Low FPR (Threshold 0.9)
- TPR: 60% (catch only high-confidence malware)
- FPR: 1% (very low false alarms)
- Use Case: Auto-blocking (can't afford FPs)
- Trade-off: Miss 40% of malware
Comparing Models with ROC:
Model A AUC: 0.85
Model B AUC: 0.92
Model C AUC: 0.78
Ranking: Model B > Model A > Model C
Conclusion: Deploy Model B (best overall performance)
ROC Limitations:
1. Class Imbalance:
- ROC can be optimistic with imbalanced data (rare threats)
- Precision-Recall curve may be more informative
2. Single Threshold:
- Shows all thresholds, but you must choose one
- Business requirements dictate threshold choice
3. Equal Cost Assumption:
- Assumes FP and FN costs are equal
- In security, FN (missed threat) often more costly than FP
SOC Example:
UEBA Anomaly Detector:
ROC Analysis:
- AUC: 0.88 (good performance)
- Operating Point: FPR 10%, TPR 85%
- 10% FP rate = 50 false alerts/day (acceptable)
- 85% TPR = Detect most insider threats
Decision:
- Deploy at threshold 0.65
- Monitor FP rate weekly
- Retrain if performance degrades
Reference: Chapter 11, Section 11.7 - ROC/AUC
Question 8: What is dwell time and why is it a critical security metric?
A) Time analysts spend at their desks
B) Average time an attacker remains undetected in the environment from initial compromise to discovery
C) Time to respond to alerts
D) Time to generate reports
Answer
Correct Answer: B) Average time attacker remains undetected from compromise to discovery
Explanation:
Dwell Time:
- Definition: Time from initial compromise to detection/eradication
- Start: Attacker gains initial access
- End: Organization detects and removes attacker
- Goal: Minimize (shorter dwell time = less damage)
Dwell Time Calculation:
Incident Timeline:
- Feb 1, 10:00 AM: Phishing email clicked (initial access)
- Feb 15, 2:00 PM: Unusual data exfiltration detected
- Feb 15, 4:00 PM: Incident confirmed and attacker ejected
Dwell Time = Feb 15 4:00 PM - Feb 1 10:00 AM = 14 days, 6 hours
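The timeline arithmetic can be sketched with Python's datetime; the year is an assumption, since the example gives only month and day:

```python
from datetime import datetime

compromise = datetime(2026, 2, 1, 10, 0)    # phishing email clicked
eradication = datetime(2026, 2, 15, 16, 0)  # incident confirmed, attacker ejected

dwell = eradication - compromise
print(f"Dwell time: {dwell.days} days, {dwell.seconds // 3600} hours")
# Dwell time: 14 days, 6 hours
```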
Industry Benchmarks:
2024 IBM Cost of Data Breach Report:
- Global Average Dwell Time: 277 days (9+ months!)
- Best-in-Class: < 30 days
- Excellent: < 7 days
- Good: < 24 hours
- Optimal: < 1 hour
Why Dwell Time Matters:
Short Dwell Time (< 1 day):
Impact:
- Limited lateral movement (1-2 systems)
- Minimal data exfiltration
- Ransomware contained before encryption
- Low recovery cost
Example:
- Day 0: Attacker gains access
- Day 0 + 6 hours: EDR detects, SOC contains
- Damage: 1 compromised workstation, no data loss
- Cost: $10,000 (cleanup, user re-imaging)
Long Dwell Time (> 100 days):
Impact:
- Extensive lateral movement (50+ systems)
- Persistent backdoors established
- Significant data exfiltration
- Ransomware encrypts critical systems
- High recovery cost
Example:
- Day 0: Attacker gains access (phishing)
- Day 30: Lateral movement to servers
- Day 60: Persistent backdoors deployed
- Day 90: Credential harvesting, data exfiltration
- Day 120: Ransomware deployed
- Day 120 + 2 hours: Detected (too late)
- Damage: 200 systems encrypted, 500GB data stolen
- Cost: $5,000,000 (ransom, recovery, regulatory fines, lawsuits)
Factors Affecting Dwell Time:
1. Detection Capabilities:
Poor Detection:
- Signature-only (no behavioral analytics)
- No threat intelligence
- Limited log coverage
- Result: Long dwell time (months)
Strong Detection:
- EDR on all endpoints
- UEBA for anomalies
- Threat intelligence integration
- Result: Short dwell time (hours-days)
2. Threat Type:
Commodity Malware:
- Dwell Time: Hours (noisy, easy to detect)
- Example: Mass ransomware campaign
Sophisticated APT:
- Dwell Time: Months (stealthy, living off the land)
- Example: Nation-state espionage
3. Response Speed:
Fast Response (SOAR automation):
- Alert → Auto-containment: 5 minutes
- Dwell Time: < 1 hour
Slow Response (manual):
- Alert → Analyst triage: 2 hours
- Triage → Escalation: 4 hours
- Investigation → Containment: 12 hours
- Dwell Time: 18+ hours
Reducing Dwell Time:
1. Improve MTTD (Mean Time to Detect):
- Deploy EDR, UEBA, threat intelligence
- Reduce detection lag from months to minutes
2. Improve MTTA (Mean Time to Acknowledge):
- Auto-prioritization
- 24/7 SOC coverage
- Reduce alert queue time
3. Improve MTTR (Mean Time to Respond):
- SOAR automation
- Clear runbooks
- Reduce containment time
Formula: Dwell Time = MTTD + MTTA + MTTR (investigation + containment)
Dwell Time vs. Time-to-Objectives:
Attacker Objectives Timeline:
- Hour 1: Initial access (phishing)
- Hour 2: Establish persistence
- Hour 6: Lateral movement begins
- Day 1: Domain admin credentials stolen
- Day 3: Data exfiltration starts
- Day 7: Ransomware deployed
Defender Goal: Detect and contain before key objectives
- Detect within Hour 1 → Prevent persistence
- Detect within Day 1 → Prevent data exfil
- Detect after Day 7 → Already lost (ransomware deployed)
Measuring Dwell Time:
Challenge: Hard to measure until breach is detected
Approaches:
1. Post-Incident Analysis:
- Forensics determines initial compromise date
- Calculate: (Detection date - Compromise date)
2. Purple Team Exercises:
- Red team simulates attack
- Measure how long until blue team detects
- Realistic dwell time estimate
3. Indicators:
- MTTD trends (if MTTD decreasing, dwell time likely decreasing)
- Threat hunting findings (proactive detection reduces dwell time)
Reference: Chapter 11, Section 11.8 - Dwell Time
Question 9: What is alert fatigue and how do you measure it?
A) Analysts being tired from working long hours
B) Desensitization from excessive alerts (especially false positives), measured by response times, closure rates, and alert volume
C) Fatigued hardware
D) Alert fatigue is not measurable
Answer
Correct Answer: B) Desensitization from excessive alerts, measured by response times, closure rates, and alert volume
Explanation:
Alert Fatigue:
- Definition: Desensitization to alerts due to high volume and false positive rate
- Cause: Too many alerts, too many false positives
- Impact: Analysts ignore/delay critical alerts, miss real threats
Alert Fatigue Metrics:
1. Alert Volume:
Alerts per day:
- < 100: Manageable
- 100-300: Moderate (monitor FP rate)
- 300-500: High (tuning needed)
- > 500: Critical (severe fatigue risk)
Example:
SOC receives 800 alerts/day
- 3 analysts on shift
- 267 alerts/analyst/day
- 8-hour shift: 33 alerts/hour = 1 alert every 2 minutes
- Result: No time for deep investigation, alert fatigue
2. False Positive Rate:
FP Rate = False Positives / Total Alerts
Thresholds:
- < 10%: Excellent
- 10-20%: Acceptable
- 20-40%: Concerning
- > 40%: Critical (severe fatigue)
Example:
500 alerts/day, 60% FP rate
- 300 alerts/day are false positives
- Analysts waste time on noise
- Real threats buried in noise
3. Mean Time to Acknowledge (MTTA) Trend:
Alert Fatigue Indicator: MTTA increasing over time
Week 1: MTTA = 5 minutes
Week 4: MTTA = 12 minutes
Week 8: MTTA = 25 minutes ← Alert fatigue
Cause: Analysts deprioritizing alerts (assume FP)
4. Alert Closure Rate:
Closure Rate = Closed Alerts / Total Alerts
Fatigue Indicator: Low closure rate
Healthy: 95% closed (analysts investigate all)
Fatigued: 70% closed (analysts ignore 30%)
Example:
500 alerts generated
350 alerts closed (70%)
150 alerts ignored/aged out (30%) ← Potential real threats missed
5. Analyst Turnover:
Fatigue Impact: Burnout, high turnover
Healthy: < 10% annual turnover
Concerning: 20-30% turnover
Critical: > 30% turnover
Exit Interview Themes:
- "Too many alerts"
- "Can't keep up"
- "Work feels meaningless (all false positives)"
6. Time-to-Triage Trend:
Average time spent per alert:
No Fatigue: 10 minutes (thorough investigation)
Moderate Fatigue: 5 minutes (rushed triage)
Severe Fatigue: 2 minutes (rubber-stamping, minimal investigation)
Risk: Cursory review misses nuanced threats
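The volume, FP-rate, and closure-rate indicators above can be rolled into a simple health check. A minimal sketch; the thresholds come from the bands in this answer, and the function name is illustrative:

```python
def fatigue_indicators(total_alerts: int, false_positives: int, closed: int) -> list[str]:
    """Flag alert-fatigue warning signs from daily SOC counts."""
    warnings = []
    if total_alerts > 500:  # "critical" volume band
        warnings.append(f"critical alert volume ({total_alerts}/day)")
    fp_rate = false_positives / total_alerts
    if fp_rate > 0.40:      # "critical" FP band
        warnings.append(f"critical FP rate ({fp_rate:.0%})")
    closure_rate = closed / total_alerts
    if closure_rate < 0.95:  # below "healthy" closure rate
        warnings.append(f"low closure rate ({closure_rate:.0%})")
    return warnings

# Numbers from the closure-rate example: 500 alerts/day, 60% FP, 350 closed
print(fatigue_indicators(total_alerts=500, false_positives=300, closed=350))
# ['critical FP rate (60%)', 'low closure rate (70%)']
```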
Alert Fatigue Impact:
Scenario: Critical Alert Missed
Environment:
- 600 alerts/day, 50% FP rate
- Analysts fatigued, MTTA = 30 minutes
Incident:
- 10:00 AM: Critical ransomware alert fires
- 10:30 AM: Analyst acknowledges (delayed due to queue)
- Analyst assumes FP (based on high FP rate), closes without investigation
- 12:00 PM: Ransomware encrypts 50 systems
Root Cause: Alert fatigue → Critical alert dismissed as FP
Reducing Alert Fatigue:
1. Tune Detection Rules:
Before: Generic "failed login" rule
- 400 alerts/day (80% FP - service account password rotation lag)
After: Tuned rule
- Exclude known service accounts
- Alert only on > 10 failures in 5 minutes
- 50 alerts/day (20% FP)
Impact: 87.5% alert reduction
2. Alert Prioritization:
ML Alert Scoring:
- Critical (score > 90): 10 alerts/day → Immediate investigation
- High (70-90): 30 alerts/day → Investigate same day
- Medium (50-70): 100 alerts/day → Review if time permits
- Low (< 50): 200 alerts/day → Auto-close or log only
Analyst Focus: Top 40 high-priority alerts (vs. 340 total)
Result: Reduced fatigue, higher-quality investigations
3. Automation (SOAR):
Auto-Close False Positives:
- Legitimate service account failures → Auto-close
- Known-good IPs flagged → Auto-close
- Result: 40% alert reduction
Auto-Enrichment:
- Threat intel, user context pre-populated
- Analyst spends 2 min vs 10 min per alert
4. Deduplication:
Before: 50 alerts for same incident (1 malware on 1 system)
After: 1 aggregated incident ticket
Result: 98% noise reduction
5. Threshold Tuning:
Data Access Alert:
Before: Alert on > 10 files/hour (500 alerts/day, 70% FP)
After: Alert on > 100 files/hour (50 alerts/day, 20% FP)
Result: 90% reduction, lower FP rate
Monitoring Alert Fatigue Dashboard:
Weekly SOC Health Metrics:
Alert Volume:
- This Week: 2,450 alerts
- Last Week: 2,100 alerts
- Trend: ↑ 16% (investigate spike)
False Positive Rate:
- This Week: 35%
- Last Week: 30%
- Trend: ↑ (tune rules)
MTTA:
- This Week: 18 minutes
- Last Week: 12 minutes
- Trend: ↑ (alert fatigue indicator)
Closure Rate:
- This Week: 82%
- Last Week: 88%
- Trend: ↓ (analysts overwhelmed)
Action Required: Tune high-volume, high-FP rules to reduce fatigue
Reference: Chapter 11, Section 11.9 - Alert Fatigue
Question 10: What is a balanced scorecard in SOC metrics?
A) A physical balance scale
B) A framework measuring SOC performance across multiple dimensions (detection, response, efficiency, quality) not just single metrics
C) A financial report
D) Balanced scorecards are not used in SOCs
Answer
Correct Answer: B) Framework measuring performance across multiple dimensions (detection, response, efficiency, quality)
Explanation:
Balanced Scorecard:
- Purpose: Holistic SOC performance measurement
- Approach: Multiple categories, not just one metric
- Benefit: Prevents over-optimization of single metric at expense of others
SOC Balanced Scorecard Dimensions:
1. Detection Effectiveness:
Metrics:
- Mean Time to Detect (MTTD): 45 minutes (Target: < 1 hour) ✅
- Detection Coverage: 85% of MITRE ATT&CK (Target: > 80%) ✅
- Threat Hunting Findings: 12/quarter (Target: > 10) ✅
- UEBA Anomaly Detection Rate: 15/month
Goal: Catching threats quickly and comprehensively
2. Response Efficiency:
Metrics:
- Mean Time to Acknowledge (MTTA): 8 minutes (Target: < 10 min) ✅
- Mean Time to Respond (MTTR): 65 minutes (Target: < 2 hours) ✅
- Mean Time to Contain (MTTC): 20 minutes (Target: < 30 min) ✅
- Incident Escalation Rate: 15% (Target: 10-20%) ✅
Goal: Responding quickly and efficiently
3. Quality & Accuracy:
Metrics:
- False Positive Rate: 18% (Target: < 20%) ✅
- True Positive Rate: 88% (Target: > 85%) ✅
- Alert Precision: 82% (Target: > 80%) ✅
- Missed Incidents (False Negatives): 2/quarter (Target: < 5) ✅
Goal: Accurate detections, minimal noise
4. Operational Efficiency:
Metrics:
- Alerts per Analyst per Day: 45 (Target: < 50) ✅
- Alert Handling Capacity: 300 alerts/day (team of 5)
- Automation Rate: 60% (Target: > 50%) ✅
- Cost per Alert Processed: $12 (Target: < $15) ✅
Goal: Sustainable workload, efficient operations
5. Team Health:
Metrics:
- Analyst Turnover: 12% annual (Target: < 15%) ✅
- Analyst Satisfaction: 78% (Target: > 75%) ✅
- Training Hours per Analyst: 40 hours/year (Target: > 32) ✅
- Shift Coverage: 98% (Target: > 95%) ✅
Goal: Healthy, skilled, engaged team
6. Coverage & Visibility:
Metrics:
- Endpoint Coverage: 98% (EDR deployed) (Target: > 95%) ✅
- Log Source Coverage: 85% of critical assets (Target: > 80%) ✅
- Cloud Visibility: 90% (Target: > 85%) ✅
- Network Traffic Visibility: 75% (Target: improve to 85%) ⚠️
Goal: Comprehensive visibility across environment
Balanced Scorecard Example:
SOC Quarterly Scorecard - Q1 2026
Overall Score: 87/100 (Good)
Dimension Scores:
1. Detection Effectiveness: 92/100 ✅ Excellent
2. Response Efficiency: 88/100 ✅ Good
3. Quality & Accuracy: 85/100 ✅ Good
4. Operational Efficiency: 82/100 ✅ Good
5. Team Health: 90/100 ✅ Excellent
6. Coverage & Visibility: 83/100 ⚠️ Needs improvement
Strengths:
- Excellent detection speed (MTTD: 45 min)
- Strong team morale (90% satisfaction)
- Low FP rate (18%)
Weaknesses:
- Network visibility gaps (75% vs 85% target)
- MTTR slightly high (65 min vs 60 min target)
Action Items:
1. Deploy network TAPs to improve visibility (Q2 2026)
2. Optimize ransomware response playbook to reduce MTTR
3. Maintain current detection and team health trends
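The 87/100 overall score is consistent with an unweighted mean of the six dimension scores. A sketch assuming equal weights; a real scorecard might weight dimensions differently:

```python
dimension_scores = {
    "Detection Effectiveness": 92,
    "Response Efficiency": 88,
    "Quality & Accuracy": 85,
    "Operational Efficiency": 82,
    "Team Health": 90,
    "Coverage & Visibility": 83,
}

# Unweighted mean across dimensions
overall = sum(dimension_scores.values()) / len(dimension_scores)
print(f"Overall SOC score: {overall:.0f}/100")  # Overall SOC score: 87/100
```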
Why Balanced Scorecard Matters:
Pitfall: Over-Optimizing Single Metric
Scenario 1: Optimize only for MTTA
- Action: Set auto-acknowledge on all alerts
- Result: MTTA = 1 second ✅
- Side Effect: No actual investigation, missed threats ❌
Scenario 2: Optimize only for FP rate
- Action: Set very high alert threshold (only 100% confidence)
- Result: FP rate = 0% ✅
- Side Effect: Recall drops to 30%, miss most threats ❌
Scenario 3: Optimize only for alert volume reduction
- Action: Disable all noisy rules
- Result: 50 alerts/day (very manageable) ✅
- Side Effect: Detection coverage drops to 40% ❌
Balanced Approach:
Balanced scorecard prevents gaming:
- Can't reduce FP rate by reducing detection coverage
- Can't improve MTTA by skipping investigation
- Must balance competing priorities
Example:
- Reduce FP rate: 30% → 18% ✅
- Maintain Detection Coverage: 85% → 85% ✅
- Maintain MTTD: 45 min → 45 min ✅
- Result: Genuine improvement without sacrificing other areas
Scorecard Reporting:
Executive Dashboard (Monthly):
- Overall SOC Health: 87/100 (↑ 5 points from last month)
- Detection: 92/100 ✅
- Response: 88/100 ✅
- Quality: 85/100 ✅
- Efficiency: 82/100 ✅
- Team: 90/100 ✅
- Coverage: 83/100 ⚠️
Narrative:
"SOC performance improved this quarter, driven by detection rule tuning (FP rate down 12 points) and SOAR deployment (MTTR down 20%). Network visibility remains below target; it is being addressed with TAP deployment in Q2."
Question 11: What is detection coverage and how is it measured?
A) Physical coverage of security cameras B) Percentage of MITRE ATT&CK techniques and tactics that the SOC has detection rules for C) Network bandwidth D) Coverage is not measurable
Answer
Correct Answer: B) Percentage of MITRE ATT&CK techniques the SOC has detection rules for
Explanation:
Detection Coverage: - Definition: Proportion of attack techniques that the SOC can detect - Framework: Typically measured against MITRE ATT&CK - Goal: Maximize coverage to reduce blind spots
Measuring Detection Coverage:
ATT&CK-Based Coverage:
MITRE ATT&CK:
- Total Techniques (Enterprise): ~200 techniques
- Your Detection Rules: 165 rules mapped to techniques
- Unique Techniques Covered: 140
- Coverage: 140 / 200 = 70%
Coverage by Tactic:
Initial Access (9 techniques):
- Covered: 7 (78%)
- Gaps: T1200 (Hardware Additions), T1091 (Replication Through Removable Media)
Execution (12 techniques):
- Covered: 11 (92%)
- Gap: T1059.008 (Network Device CLI)
Persistence (19 techniques):
- Covered: 14 (74%)
- Gaps: 5 techniques
Privilege Escalation (13 techniques):
- Covered: 10 (77%)
... (for all 14 tactics)
Overall Coverage: 70% (140/200 techniques)
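The arithmetic behind these coverage figures can be sketched in Python. The tactic totals are the illustrative ones above, not a complete ATT&CK inventory:

```python
# Sketch: ATT&CK detection coverage per tactic and overall, using the
# illustrative counts from the example above.
tactics = {
    "Initial Access": (7, 9),
    "Execution": (11, 12),
    "Persistence": (14, 19),
    "Privilege Escalation": (10, 13),
}
for name, (covered, total) in tactics.items():
    print(f"{name}: {covered}/{total} covered ({covered/total:.0%})")

# Overall coverage counts unique techniques, not rules: several rules can
# map to the same technique, which is why 165 rules cover only 140 techniques.
unique_covered, total_techniques = 140, 200
print(f"Overall: {unique_covered}/{total_techniques} = "
      f"{unique_covered/total_techniques:.0%}")
```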
Coverage Heat Map:
MITRE ATT&CK Navigator:
- Green: Detected (high confidence)
- Yellow: Partially detected (behavioral only)
- Red: No detection
Example:
T1003.001 (LSASS Dumping): Green (detected by EDR + SIEM)
T1055 (Process Injection): Yellow (partial behavioral detection)
T1574 (DLL Hijacking): Red (no coverage - blind spot!)
Coverage Quality Levels:
Level 1: No Detection (0 points)
- No rule exists
Level 2: Theoretical Detection (1 point)
- Rule exists but never tested
Level 3: Tested Detection (2 points)
- Rule tested in lab, not production-validated
Level 4: Production-Validated (3 points)
- Detected real attack or purple team exercise
Level 5: Auto-Response (4 points)
- Detection + automated containment
Weighted Coverage = (Total Points) / (Max Possible Points)
Example Coverage Calculation:
Technique T1003.001 (LSASS Dumping):
- Detection Rule: Yes (SIEM + EDR)
- Tested: Yes (purple team exercise)
- Production Validated: Yes (detected real attack)
- Auto-Response: No (requires approval gate)
- Score: 3/4 points (75%)
Technique T1059.001 (PowerShell):
- Detection Rule: Yes
- Tested: Yes
- Production Validated: Yes
- Auto-Response: Yes (SOAR kills suspicious PowerShell)
- Score: 4/4 points (100%)
Technique T1574 (DLL Hijacking):
- Detection Rule: No
- Score: 0/4 points (0%)
Overall Coverage = Average of all technique scores
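A minimal sketch of the weighted-coverage calculation, using the three worked techniques above. A full inventory would score every Enterprise technique, with uncovered ones at 0 points:

```python
# Sketch: weighted coverage using the 0-4 point quality levels defined above.
LEVEL_POINTS = {
    "none": 0,           # Level 1: no rule exists
    "theoretical": 1,    # Level 2: rule exists, never tested
    "tested": 2,         # Level 3: lab-tested only
    "validated": 3,      # Level 4: production-validated
    "auto_response": 4,  # Level 5: detection + automated containment
}

# Technique scores follow the worked example above
technique_levels = {
    "T1003.001": "validated",      # LSASS dumping: validated, no auto-response
    "T1059.001": "auto_response",  # PowerShell: SOAR kills suspicious processes
    "T1574": "none",               # DLL hijacking: blind spot
}

points = sum(LEVEL_POINTS[lvl] for lvl in technique_levels.values())
max_points = 4 * len(technique_levels)
print(f"Weighted coverage: {points}/{max_points} = {points/max_points:.0%}")
```

Note how the weighting separates "a rule exists" from "a rule works": the three techniques are 2/3 covered by a binary count, but only 7/12 points by quality.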
Coverage Gaps Analysis:
Priority 1 Gaps (High Risk, No Coverage):
- T1574 (DLL Hijacking) - No detection ← Add DLL load-path monitoring
- T1021.001 (RDP) - Limited detection ← Enhance logging
Priority 2 Gaps (Medium Risk):
- T1055 (Process Injection) - Partial detection ← Improve behavioral rules
Priority 3 Gaps (Low Risk):
- T1200 (Hardware Additions) - Not applicable (data center controls)
Improving Coverage:
1. Map Existing Rules to ATT&CK:
- Inventory all detection rules
- Tag with ATT&CK technique IDs
- Identify covered techniques
2. Identify Gaps:
- Techniques with 0 rules = blind spots
- Prioritize by threat intel (what attackers actually use)
3. Deploy New Detections:
- Purple team: Test new rules
- Validate in production
- Update coverage score
4. Continuous Improvement:
- Quarterly coverage review
- New ATT&CK techniques → Assess coverage
- Emerging threats → Add detections
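Steps 1 and 2 above reduce to a set operation: the gap list is the techniques you care about minus the techniques your rules cover. A sketch, with hypothetical rule names and a hypothetical priority list:

```python
# Sketch: mapping detection rules to ATT&CK techniques and finding gaps.
# Rule names are hypothetical; a real inventory would come from your
# SIEM/EDR rule export, tagged with technique IDs.
rule_inventory = {
    "win_lsass_access": ["T1003.001"],
    "ps_encoded_command": ["T1059.001", "T1027"],
    "rdp_anomalous_login": ["T1021.001"],
}

# Techniques threat intel says attackers in your sector actually use
# (an assumed list for illustration)
priority_techniques = {"T1003.001", "T1059.001", "T1574", "T1055"}

covered = {t for techs in rule_inventory.values() for t in techs}
gaps = priority_techniques - covered

print(f"Covered techniques: {sorted(covered)}")
print(f"Priority blind spots: {sorted(gaps)}")
```

The same set difference, run against the full ATT&CK technique list instead of a priority subset, yields the complete gap inventory for the quarterly review.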
Coverage Reporting:
SOC Coverage Report - Q1 2026
Overall Coverage: 82% (164/200 techniques)
By Tactic:
- Initial Access: 78%
- Execution: 92% ✅
- Persistence: 74% ⚠️
- Privilege Escalation: 77%
- Defense Evasion: 68% ⚠️ (priority for improvement)
- Credential Access: 85%
- Discovery: 90% ✅
- Lateral Movement: 88% ✅
- Collection: 75%
- Exfiltration: 80%
- Command & Control: 86%
- Impact: 95% ✅
Top 5 Priority Gaps:
1. T1027 (Obfuscated Files or Information) - No detection
2. T1055 (Process Injection) - Weak detection
3. T1070 (Indicator Removal) - No detection
4. T1078 (Valid Accounts) - Limited detection
5. T1497 (Virtualization/Sandbox Evasion) - No detection
Action Plan:
- Deploy T1027 detection: Entropy analysis (Q2)
- Improve T1055: Enhanced EDR behavioral rules (Q2)
- Add T1070: File deletion monitoring (Q3)
Question 12: What are leading vs lagging indicators in SOC metrics?
A) Leading indicators predict future performance, lagging indicators measure past performance B) They are the same thing C) Leading indicators are for executives, lagging for analysts D) Indicators don't matter
Answer
Correct Answer: A) Leading indicators predict future, lagging indicators measure past
Explanation:
Leading Indicators (Predictive): - Definition: Metrics that predict future SOC performance - Purpose: Early warning of problems - Actionable: Can intervene before issues escalate
Lagging Indicators (Historical): - Definition: Metrics that measure past performance - Purpose: Assess what happened - Less Actionable: Can't change past, but can learn from it
SOC Leading Indicators:
1. Alert Volume Trend:
Leading Indicator: Alert volume increasing 20%/week
Prediction: Analysts will be overwhelmed in 2-3 weeks
Action: Tune high-volume rules NOW before fatigue sets in
Example:
Week 1: 300 alerts/day
Week 2: 360 alerts/day (+20%)
Week 3: 432 alerts/day (+20%)
Week 4 Projection: 518 alerts/day (unsustainable)
Preventive Action: Tune rules in Week 2 to prevent Week 4 crisis
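The projection above is simple compound growth. A sketch, with a hypothetical sustainable triage capacity of 450 alerts/day as the trigger threshold:

```python
# Sketch: projecting alert volume under sustained 20% week-over-week growth,
# the leading-indicator math behind the example above.
baseline = 300   # alerts/day observed in week 1
growth = 0.20    # observed week-over-week growth rate
capacity = 450   # hypothetical sustainable triage capacity (alerts/day)

for week in range(1, 5):
    volume = baseline * (1 + growth) ** (week - 1)
    flag = "  <-- exceeds capacity, tune rules before this point" if volume > capacity else ""
    print(f"Week {week}: {volume:.0f} alerts/day{flag}")
```

Running this reproduces the 300 → 360 → 432 → 518 progression; the value of the leading indicator is that the capacity breach is visible in week 2, while there is still time to act.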
2. False Positive Rate Increasing:
Leading Indicator: FP rate rising (15% → 25%)
Prediction: Alert fatigue, analyst turnover
Action: Tune detection rules before fatigue sets in
3. MTTA Increasing:
Leading Indicator: MTTA rising (10 min → 20 min over 2 weeks)
Prediction: Analysts overwhelmed or fatigued
Action: Investigate workload, add staffing, or reduce alert volume
4. Detection Coverage Gaps:
Leading Indicator: 30% of ATT&CK techniques not covered
Prediction: Blind spots will be exploited
Action: Prioritize closing critical gaps before attack
5. Analyst Satisfaction Declining:
Leading Indicator: Quarterly survey shows satisfaction drop (85% → 70%)
Prediction: Turnover risk in 6-12 months
Action: Address morale issues, reduce alert fatigue
6. Threat Intel: Emerging Campaign:
Leading Indicator: New ransomware campaign targeting your industry
Prediction: Attack likely incoming
Action: Deploy detections, patch vulnerabilities NOW
SOC Lagging Indicators:
1. Mean Time to Detect (MTTD):
Lagging Indicator: MTTD = 2 hours (measures past incidents)
Insight: Historical detection speed
Limitation: Can't change past incidents
Use: Benchmark, trend analysis, goal setting
2. Incident Count:
Lagging Indicator: 15 confirmed incidents last month
Insight: Historical attack volume
Limitation: Reactive (incidents already happened)
Use: Assess security posture, budget justification
3. Dwell Time:
Lagging Indicator: Average dwell time 14 days
Insight: How long attackers went undetected in past incidents
Limitation: Only known after incident resolved
Use: Demonstrate need for better detection
4. Breach Cost:
Lagging Indicator: Ransomware incident cost $500K
Insight: Financial impact of past breach
Limitation: Too late to prevent this breach
Use: Justify SOC budget increases
5. False Negative Rate:
Lagging Indicator: Missed 5 incidents (discovered via threat hunting)
Insight: Historical detection gaps
Limitation: Incidents already occurred
Use: Improve detections for future
Using Leading & Lagging Together:
Example: Preventing Alert Fatigue
Leading Indicators (Predictive):
- Alert volume: ↑ 30% month-over-month
- MTTA: ↑ from 10 min to 18 min
- FP rate: ↑ from 15% to 28%
- Analyst survey: Satisfaction ↓ from 85% to 72%
Prediction: Alert fatigue crisis in 4-6 weeks
Preventive Actions:
1. Tune high-volume rules (reduce 30% alert volume)
2. Deploy ML scoring (prioritize high-confidence alerts)
3. SOAR auto-enrichment (reduce triage time)
4. Add temp analyst coverage (reduce workload)
Result: Crisis averted before burnout/turnover
Lagging Indicators (Validation):
After 1 month:
- Alert volume: ↓ to 280/day (28% reduction) ✅
- MTTA: ↓ to 12 min ✅
- FP rate: ↓ to 18% ✅
- Analyst satisfaction: ↑ to 80% ✅
Lagging indicators confirm that the actions driven by the leading indicators worked.
Balanced Dashboard:
Leading Indicators (Left Column):
- Alert Volume Trend: ↑ 15% ⚠️ (watch closely)
- FP Rate Trend: ↓ 5% ✅ (improvement)
- MTTA Trend: Stable ✅
- Coverage Gaps: 18% ⚠️ (close gaps)
- Analyst Morale: 82% ✅
Lagging Indicators (Right Column):
- MTTD Last Month: 45 min ✅
- Incidents Detected: 12 ✅
- Incidents Missed: 1 ✅
- MTTR Last Month: 75 min ✅
Actionable Insights:
- Alert volume rising → Tune rules this week
- Coverage gaps → Prioritize 5 critical techniques
- Overall performance good (lagging indicators green)
Reference: Chapter 11, Section 11.12 - Leading vs Lagging Indicators
Score Interpretation¶
- 10-12 correct: Excellent! You have strong command of SOC and ML performance metrics.
- 7-9 correct: Good understanding. Review confusion matrix and ROC/AUC concepts.
- 4-6 correct: Adequate baseline. Focus on MTTD/MTTA/MTTR and precision/recall trade-offs.
- Below 4: Review Chapter 11 thoroughly, especially core SOC metrics and ML evaluation.