Chapter 9: AI/ML in SOC - Quiz
Instructions
Test your understanding of supervised vs unsupervised learning, classification/clustering/regression, UEBA, feature engineering, overfitting/underfitting, model drift, and adversarial ML.
Question 1: What is the primary difference between supervised and unsupervised learning?
A) Supervised learning uses GPUs, unsupervised uses CPUs B) Supervised learning trains on labeled data (known outcomes), unsupervised finds patterns in unlabeled data C) Supervised learning is always more accurate D) There is no difference
Answer
Correct Answer: B) Supervised learning trains on labeled data, unsupervised finds patterns in unlabeled data
Explanation:
Supervised Learning: - Training Data: Labeled examples (input + correct output) - Goal: Learn mapping from input → output - Use Cases: Classification, regression - Example: Train on 10,000 emails labeled "phishing" or "legitimate" → Model learns to classify new emails
Supervised Learning SOC Example:
Training Data:
Email 1: Subject: "Your account is locked" → Label: PHISHING
Email 2: Subject: "Meeting at 3pm" → Label: LEGITIMATE
... 10,000 labeled emails
Model learns patterns:
- Urgent language + suspicious link = PHISHING
- Internal sender + calendar invite = LEGITIMATE
Prediction on new email:
Email: "Verify your account now!" → Predicted: PHISHING (confidence: 92%)
Unsupervised Learning: - Training Data: Unlabeled examples (input only, no correct answers) - Goal: Find hidden patterns, group similar items - Use Cases: Clustering, anomaly detection - Example: Analyze network traffic to find unusual patterns (no labels needed)
Unsupervised Learning SOC Example:
Training Data: 1 million network connections (no labels)
Model finds clusters:
- Cluster 1: Web browsing (80% of traffic)
- Cluster 2: Email (15% of traffic)
- Cluster 3: SSH (4% of traffic)
- Cluster 4: Unusual (1% of traffic) ← Potential threats
Anomaly detected: Connection doesn't fit any cluster → Alert
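The "doesn't fit any cluster" idea can be sketched with an unsupervised anomaly detector such as Isolation Forest. The two features (bytes sent, duration) and the traffic values are illustrative:

```python
# Minimal unsupervised anomaly-detection sketch: no labels, the model
# flags the connection that does not resemble the rest.
from sklearn.ensemble import IsolationForest

# Unlabeled connections: [bytes_sent, duration_seconds]
connections = [[500, 10], [520, 12], [480, 9], [510, 11],
               [490, 10], [505, 12], [515, 9], [495, 11],
               [50000, 3600]]  # unusually large, long-lived connection

detector = IsolationForest(contamination=0.1, random_state=42)
labels = detector.fit_predict(connections)  # -1 = anomaly, 1 = normal

for conn, label in zip(connections, labels):
    if label == -1:
        print("Anomaly:", conn)
```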
When to Use Each: - Supervised: When you have labeled training data (malware samples, phishing emails) - Unsupervised: When you lack labels or want to find unknown threats
Reference: Chapter 9, Section 9.1 - Supervised vs Unsupervised
Question 2: What type of ML task is 'predicting whether an email is phishing or legitimate'?
A) Regression B) Classification C) Clustering D) Reinforcement learning
Answer
Correct Answer: B) Classification
Explanation:
Classification: - Definition: Predicting discrete categories/classes - Output: Class label (e.g., "phishing" or "legitimate") - Algorithm Examples: Logistic Regression, Random Forest, Neural Networks
Classification SOC Examples:
1. Phishing Detection:
Input: Email features (sender, subject, links, urgency words)
Output: PHISHING or LEGITIMATE
Classes: 2 (binary classification)
2. Malware Classification:
Input: File features (entropy, imports, strings, file size)
Output: Malware family or BENIGN
Classes: 2+ (binary or multi-class classification)
3. Alert Severity Prediction:
Input: Alert metadata (source IP, user, asset, threat intel)
Output: LOW, MEDIUM, HIGH, CRITICAL
Classes: 4 (multi-class classification)
4. Attack Type Classification:
Input: Incident indicators (observed techniques, affected assets)
Output: Attack category (e.g., phishing, C2, exfiltration)
Classes: 3+ (multi-class classification)
Classification vs Other Tasks:
Regression (Continuous Output): - Predicting a number (e.g., "risk score: 87.3") - Example: Predict time to compromise in minutes
Clustering (Group Similar Items): - No predefined classes - Example: Group users by behavior patterns
Classification Model Training:
Training Set: 10,000 labeled emails
- 6,000 LEGITIMATE
- 4,000 PHISHING
Model learns decision boundary:
IF (urgent_words > 3 AND external_sender AND suspicious_link):
Predict: PHISHING
ELSE:
Predict: LEGITIMATE
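A decision tree can learn a rule-like boundary similar to the one sketched above. The features and the handful of labeled rows below are illustrative:

```python
# Sketch: train a decision tree to learn a phishing/legitimate boundary.
from sklearn.tree import DecisionTreeClassifier

# [urgent_word_count, external_sender, suspicious_link] -> label
X = [[4, 1, 1], [5, 1, 1], [4, 1, 0], [1, 1, 0], [0, 0, 0], [2, 0, 0]]
y = ["PHISHING", "PHISHING", "LEGITIMATE",
     "LEGITIMATE", "LEGITIMATE", "LEGITIMATE"]

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

print(clf.predict([[5, 1, 1]])[0])  # urgent + external + suspicious link
print(clf.predict([[0, 0, 0]])[0])  # internal sender, no links
```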
Reference: Chapter 9, Section 9.2 - Classification
Question 3: What type of ML task is 'grouping users by similar behavior patterns without predefined labels'?
A) Classification B) Regression C) Clustering D) Supervised learning
Answer
Correct Answer: C) Clustering
Explanation:
Clustering: - Definition: Grouping similar data points without predefined labels - Type: Unsupervised learning - Output: Group assignments (e.g., "User belongs to Cluster 3") - Algorithms: K-Means, DBSCAN, Hierarchical Clustering
Clustering SOC Examples:
1. User Behavior Grouping:
Input: User activity features (login times, applications used, data accessed)
Process: Clustering algorithm finds natural groups
Output:
- Cluster 1: Sales team (CRM usage, daytime logins, high email volume)
- Cluster 2: Engineers (IDE/Git, late-night logins, SSH usage)
- Cluster 3: Finance (spreadsheets, 9-5 logins, financial system access)
- Cluster 4: Executives (mobile access, travel, light usage)
- Anomaly: User doesn't fit any cluster → Investigate
2. Network Traffic Clustering:
Input: Network flows (bytes, packets, duration, ports)
Output:
- Cluster 1: HTTP/HTTPS browsing
- Cluster 2: Email traffic
- Cluster 3: Database queries
- Cluster 4: Unknown (potential C2 traffic) → Alert
3. Alert Clustering:
Input: Alerts (source, destination, type, time)
Output: Groups of related alerts (likely same incident)
Benefit: Reduces alert fatigue by grouping 50 alerts into 1 incident
K-Means Clustering Example:
# Cluster users by login behavior (runnable sketch; values illustrative)
from sklearn.cluster import KMeans

# Features: [avg_login_hour, logins_per_week, applications_used]
user_behaviors = [
    [9.5, 25, 8],    # User 1: Regular office hours
    [14.2, 30, 12],  # User 2: Afternoon worker
    [2.3, 40, 5],    # User 3: Night shift (anomalous?)
    [9.0, 22, 7],    # User 4: Regular office hours
    [13.8, 28, 11],  # User 5: Afternoon worker
]

# Cluster into 3 groups
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
clusters = kmeans.fit_predict(user_behaviors)
# User 3 ends up alone in a small cluster → Investigate
Clustering vs Classification: - Classification: Predefined classes (phishing/legitimate) - Clustering: Discover groups (algorithm finds natural patterns)
Reference: Chapter 9, Section 9.3 - Clustering
Question 4: What type of ML task is 'predicting a risk score from 0-100 for an alert'?
A) Classification B) Clustering C) Regression D) Reinforcement learning
Answer
Correct Answer: C) Regression
Explanation:
Regression: - Definition: Predicting continuous numerical values - Output: Number on continuous scale (not discrete classes) - Algorithms: Linear Regression, Random Forest Regressor, Neural Networks
Regression SOC Examples:
1. Alert Risk Scoring:
Input: Alert features (threat intel match, asset criticality, user risk, historical FP rate)
Output: Risk score 0-100 (e.g., 87.3)
Model: Regression
2. Time-to-Compromise Prediction:
Input: Vulnerability CVSS score, patch status, exposure, threat intel
Output: Predicted days until exploitation (e.g., 14.7 days)
Model: Regression
3. MTTR Prediction:
Input: Incident severity, complexity, team availability
Output: Predicted response time in minutes (e.g., 45.2 minutes)
Model: Regression
Example: Risk Score Regression Model
# Training data: Historical alerts with manually assigned risk scores
Training:
Alert 1: [threat_intel_match=1, asset_critical=1, user_privileged=1] → Risk: 95
Alert 2: [threat_intel_match=0, asset_critical=0, user_privileged=0] → Risk: 15
Alert 3: [threat_intel_match=1, asset_critical=0, user_privileged=0] → Risk: 60
... 10,000 alerts
Model learns: Risk ≈ (40 × threat_intel) + (30 × asset) + (25 × user) + ...
Prediction:
New Alert: [threat_intel=1, asset=1, user=0]
Predicted Risk: 87.3
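The learned-weights idea above can be sketched with linear regression. The three binary features and the analyst-assigned scores below are illustrative, and the exact prediction will differ from the 87.3 in the worked example:

```python
# Minimal regression sketch: predict a continuous risk score.
from sklearn.linear_model import LinearRegression

# [threat_intel_match, asset_critical, user_privileged]
X_train = [
    [1, 1, 1],  # strong indicators -> high risk
    [0, 0, 0],  # no indicators -> low risk
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
]
y_train = [95, 15, 60, 55, 80, 30]  # analyst-assigned risk scores

model = LinearRegression()
model.fit(X_train, y_train)

# Predict a continuous score (not a class) for a new alert
new_alert = [[1, 1, 0]]
print(round(model.predict(new_alert)[0], 1))
```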
Regression vs Classification:
Classification: "Is this PHISHING or LEGITIMATE?" (discrete classes)
Regression: "What's the phishing probability?" (continuous: 0.873 = 87.3%)
Note: Classification and regression can solve similar problems differently: - Classification: Email is PHISHING - Regression: Email phishing probability is 0.92 → Threshold at 0.8 → Classify as PHISHING
Reference: Chapter 9, Section 9.4 - Regression
Question 5: What is UEBA (User and Entity Behavior Analytics)?
A) A type of firewall B) ML-based analysis of user and entity behavior to detect anomalies and insider threats C) A compliance framework D) A SIEM vendor
Answer
Correct Answer: B) ML-based analysis of user and entity behavior to detect anomalies and insider threats
Explanation:
UEBA (User and Entity Behavior Analytics): - Purpose: Detect anomalous behavior that deviates from established baselines - Method: Machine learning (unsupervised clustering, anomaly detection) - Entities: Users, hosts, applications, network devices
UEBA Use Cases:
1. Insider Threat Detection:
Baseline: User typically accesses 50 files/day from HR database
Anomaly: User suddenly downloads 10,000 files
Alert: "Abnormal data access - potential exfiltration"
2. Compromised Account Detection:
Baseline: User logs in from New York, 9am-5pm, Windows laptop
Anomaly: Login from Russia at 3am, Linux system
Alert: "Impossible travel + unusual OS"
3. Lateral Movement Detection:
Baseline: Workstation WKS-042 typically connects to 5 internal servers
Anomaly: WKS-042 connects to 50 servers in 10 minutes
Alert: "Abnormal network scanning behavior"
4. Privilege Escalation:
Baseline: User account "jdoe" never uses administrative tools
Anomaly: "jdoe" executes PowerShell with admin rights
Alert: "Unusual privilege usage"
UEBA ML Architecture:
Step 1: Baseline Learning (Training Phase)
- Collect 30-90 days of normal behavior
- Build user/entity profiles using unsupervised learning
- Example: User A baseline = [login_times, apps_used, data_accessed, ...]
Step 2: Anomaly Detection (Inference Phase)
- Compare current behavior to baseline
- Calculate anomaly score (0-100)
- Alert if score exceeds threshold
Step 3: Continuous Learning
- Update baselines as behavior evolves
- Handle concept drift (job changes, new applications)
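The baseline-vs-current comparison in Step 2 can be sketched with a simple per-feature z-score. Real UEBA products use richer multivariate models; the feature, scaling, and thresholds here are illustrative:

```python
# Sketch: score current behavior against a learned baseline.
import statistics

def anomaly_score(baseline_values, current_value):
    """0-100 score based on standard deviations from the baseline mean."""
    mean = statistics.mean(baseline_values)
    stdev = statistics.stdev(baseline_values) or 1.0
    z = abs(current_value - mean) / stdev
    return min(100, round(z * 25))  # 4+ standard deviations -> max score

# Baseline: files accessed per day over the training window
baseline = [48, 52, 50, 47, 53, 49, 51]
print(anomaly_score(baseline, 50))   # typical day -> low score
print(anomaly_score(baseline, 500))  # 10x spike -> high score, alert
```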
Example UEBA Alert:
User: jane.doe@company.com
Anomaly Score: 89/100
Reasons:
- Login from unusual location (China vs typical New York)
- Login at unusual time (3am vs typical 9am-5pm)
- Unusual application (SSH vs typical Outlook/Chrome)
- High data access (500 files vs typical 20)
Recommendation: Investigate potential compromise
UEBA Challenges: - False Positives: Legitimate behavior changes (new project, promotion) - Baseline Pollution: Training on compromised data - Cold Start: New users lack baseline (no historical data)
Popular UEBA Platforms: Exabeam, Securonix, Microsoft Sentinel UEBA, Splunk UBA
Reference: Chapter 9, Section 9.5 - UEBA
Question 6: What is feature engineering in machine learning?
A) Building physical features for hardware B) Selecting and transforming raw data into meaningful features that improve model performance C) Engineering department features D) Feature engineering is not used in ML
Answer
Correct Answer: B) Selecting and transforming raw data into meaningful features that improve model performance
Explanation:
Feature Engineering: - Definition: Creating informative features from raw data - Goal: Help ML model learn patterns more effectively - Impact: Often more important than algorithm choice for performance
Feature Engineering SOC Examples:
1. Phishing Email Classification:
Raw Data:
Email Subject: "URGENT: Verify your account now!"
Email Body: "Click here to verify: http://evil.com/verify"
Sender: "security@paypa1.com"
Engineered Features:
- urgent_word_count: 2 (URGENT, now)
- external_sender: True
- sender_domain_typosquat: True (paypa1 vs paypal)
- link_count: 1
- link_domain_mismatch: True (evil.com != paypa1.com)
- capitalization_ratio: 0.15 (15% caps)
- has_attachment: False
- sender_in_contacts: False
Why Feature Engineering Matters: - Raw text is hard for ML to process - Engineered features are numerical and meaningful - Model can learn: "If urgent_words > 2 AND typosquat = True → PHISHING"
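A few of the features above can be extracted with a short function. The dict-based email representation, the urgent-word list, and the internal-domain check are simplifying assumptions for illustration:

```python
# Sketch: turn a raw email into numeric/boolean features.
import re

URGENT_WORDS = {"urgent", "now", "immediately", "verify"}

def extract_features(email):
    subject_words = re.findall(r"[a-z0-9]+", email["subject"].lower())
    sender_domain = email["sender"].split("@")[-1]
    link_domains = re.findall(r"https?://([^/\s]+)", email["body"])
    return {
        "urgent_word_count": sum(w in URGENT_WORDS for w in subject_words),
        "external_sender": not sender_domain.endswith("company.com"),
        "link_count": len(link_domains),
        "link_domain_mismatch": any(d != sender_domain for d in link_domains),
    }

features = extract_features({
    "subject": "URGENT: Verify your account now!",
    "body": "Click here to verify: http://evil.com/verify",
    "sender": "security@paypa1.com",
})
print(features)
```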
2. Malware Detection:
Raw Data:
Windows PE executable binary (2,456,032 bytes)
Engineered Features:
- file_size: 2456032
- entropy: 7.8 (high entropy suggests encryption/packing)
- suspicious_imports: 5 (count of CreateRemoteThread, VirtualAllocEx, ...)
- string_matches: ["cmd.exe", "powershell"] (count: 2)
- packed: True (detected via entropy analysis)
- age_days: 3 (first seen 3 days ago)
- prevalence: 0.0001% (very rare file)
3. Network Traffic Anomaly Detection:
Raw Data:
TCP connection: 10.0.1.50:49234 → 203.0.113.45:443
Duration: 3600 seconds
Bytes sent: 1024
Bytes received: 52,428,800
Engineered Features:
- duration_minutes: 60
- bytes_ratio: 51200 (received/sent - indicates data download)
- destination_is_external: True
- destination_threat_intel_match: True
- connection_time: 02:00 (unusual hour)
- port_443_with_non_http: True (suspicious)
Feature Engineering Techniques:
1. Extraction: - Parse URLs to extract domain, TLD, path length - Extract header fields from packets
2. Transformation: - Log transformation (reduce skew in numeric features) - Normalization (scale 0-1)
3. Aggregation: - Count failed logins per user per hour - Average file size accessed per day
4. Encoding: - One-hot encoding for categorical variables (OS: Windows → [1,0,0], Linux → [0,1,0])
5. Domain Knowledge: - Threat intel lookups (convert IP → reputation score) - MITRE ATT&CK mapping (technique ID → category)
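Techniques 2 and 4 can be sketched in plain Python (scikit-learn and pandas offer equivalent transformers). The OS category list is illustrative:

```python
# Sketch: log transformation for a skewed numeric feature and
# one-hot encoding for a categorical feature.
import math

def log_transform(value):
    return math.log1p(value)  # log1p handles zero counts gracefully

def one_hot(value, categories):
    return [1 if value == c else 0 for c in categories]

OS_CATEGORIES = ["Windows", "Linux", "macOS"]

print(round(log_transform(52428800), 2))  # bytes received, heavily skewed
print(one_hot("Linux", OS_CATEGORIES))    # [0, 1, 0]
```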
Reference: Chapter 9, Section 9.6 - Feature Engineering
Question 7: What is overfitting in machine learning?
A) A model that performs well on both training and test data B) A model that memorizes training data and performs poorly on new, unseen data C) A model that is too simple D) Overfitting improves model performance
Answer
Correct Answer: B) A model that memorizes training data and performs poorly on new, unseen data
Explanation:
Overfitting: - Problem: Model learns noise and specifics of training data instead of general patterns - Symptom: High accuracy on training data, poor accuracy on test/production data - Cause: Model too complex, too little training data, or too many features
Overfitting Example: Phishing Detection
Training Data (100 emails):
Email 1: Subject "Verify account" From: attacker@evil.com → PHISHING
Email 2: Subject "Meeting notes" From: colleague@company.com → LEGITIMATE
... 100 total
Overfit Model Behavior:
Model memorizes: "If sender = attacker@evil.com → PHISHING"
Training Accuracy: 100% (perfect!)
Test Data:
Email: Subject "Verify account" From: attacker2@evil2.com → Model predicts LEGITIMATE
Why? Model memorized specific sender "attacker@evil.com" instead of learning pattern "Verify account + external sender = PHISHING"
Test Accuracy: 60% (poor!)
Visual Representation:
Underfitting: Model too simple (straight line through scattered data)
Good Fit: Model captures pattern (smooth curve fitting trend)
Overfitting: Model too complex (zigzag line through every training point, including noise)
Signs of Overfitting: - Training accuracy: 99% - Test accuracy: 65% - Gap: 34% (large gap indicates overfitting)
Preventing Overfitting:
1. More Training Data:
More examples make it harder for the model to memorize specifics
2. Regularization:
Penalize model complexity to discourage fitting noise
3. Feature Selection:
Keep only the most predictive features; drop noisy ones
4. Cross-Validation:
Split data into 5 folds, train on 4, test on 1
Repeat 5 times, average performance
Ensures model generalizes across different data splits
5. Early Stopping:
Stop training when validation performance stops improving
Prevents model from over-learning training data
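The cross-validation step (technique 4) can be sketched as follows; the synthetic dataset stands in for labeled email features:

```python
# Sketch: 5-fold cross-validation to check generalization.
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# Synthetic stand-in for labeled email features
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Train on 4 folds, test on 1, repeat 5 times
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print([round(s, 2) for s in scores])
print("mean accuracy:", round(scores.mean(), 2))
```

A large spread between fold scores, or a mean far below training accuracy, is a warning sign that the model is not generalizing.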
Example: SOC Alert Scoring Model
Problem: Model achieves 98% accuracy on historical alerts but only 70% on new alerts
Diagnosis: Overfitting (memorized specific IPs/users in training data)
Solution:
- Collected 10x more training data
- Reduced features from 50 → 15 most predictive
- Applied regularization
Result: Training 92%, Test 89% (better generalization)
Reference: Chapter 9, Section 9.7 - Overfitting
Question 8: What is underfitting in machine learning?
A) A model that is too complex B) A model that is too simple and fails to capture underlying patterns, performing poorly on both training and test data C) A perfect model D) Underfitting only affects test data
Answer
Correct Answer: B) A model that is too simple and fails to capture patterns, performing poorly on training and test data
Explanation:
Underfitting: - Problem: Model too simple to learn underlying patterns - Symptom: Poor performance on both training AND test data - Cause: Model lacks complexity, insufficient features, or insufficient training
Underfitting Example: Alert Severity Prediction
Scenario:
Data: 10,000 alerts with features [source_ip, destination_ip, time, user, ...]
Task: Predict severity (Low/Medium/High/Critical)
Underfit Model: Uses only 1 feature (time of day)
Rule: "If time between 2am-6am → High, else → Low"
Result:
Training Accuracy: 55%
Test Accuracy: 54%
Why? Model ignores critical features (threat intel, asset criticality, user risk)
Patterns are too complex for simple time-based rule
Visual Representation:
Data: Complex curved pattern
Underfit Model: Straight horizontal line (ignores pattern)
Good Model: Curve that follows trend
Overfit Model: Zigzag through every point
Signs of Underfitting: - Training accuracy: 60% - Test accuracy: 58% - Both low: Model hasn't learned patterns
Fixing Underfitting:
1. Increase Model Complexity:
Use a more expressive algorithm (e.g., simple rule → Random Forest)
2. Add More Features:
Before: [time]
After: [time, source_ip, threat_intel_match, asset_criticality, user_risk, ...]
Gives model more information to learn from
3. Train Longer:
Allow more training iterations so the model can learn the patterns
4. Remove Regularization:
Loosen complexity penalties that keep the model too simple
Example: Malware Detection Model
Problem: Model only checks file size
Rule: "If size > 10MB → Malware"
Training Accuracy: 52% (barely better than random)
Test Accuracy: 51%
Diagnosis: Underfitting (too simple)
Solution:
- Added features: entropy, imports, strings, behavior
- Changed algorithm: Simple rule → Random Forest
Result: Training 94%, Test 91% (learned real patterns)
Underfitting vs Overfitting:
Underfitting: Both training and test accuracy LOW
Good Fit: Both training and test accuracy HIGH (close values)
Overfitting: Training accuracy HIGH, test accuracy LOW (large gap)
Reference: Chapter 9, Section 9.8 - Underfitting
Question 9: What is model drift and why is it a problem in SOC operations?
A) Physical movement of servers B) Model performance degrades over time as real-world data distribution changes (e.g., new attack techniques, infrastructure changes) C) Model drift improves accuracy D) Model drift only affects regression models
Answer
Correct Answer: B) Model performance degrades over time as real-world data distribution changes
Explanation:
Model Drift (Concept Drift): - Problem: Real-world data changes, but model was trained on historical data - Result: Model accuracy degrades over time - Types: Concept drift (patterns change) and data drift (feature distributions change)
SOC Model Drift Examples:
1. Malware Detection Drift:
Training (2023):
- Model trained on 2023 malware samples
- Learns: Malware uses DLL injection, specific C2 patterns
Production (2026):
- New malware families use fileless techniques, cloud C2
- Model doesn't recognize new patterns
- Accuracy: 95% (2023) → 70% (2026)
- Reason: Attack techniques evolved
2. Phishing Detection Drift:
Training (2024):
- Phishing emails use poor grammar, obvious spoofing
Production (2026):
- Attackers use AI-generated perfect grammar
- Compromise legitimate accounts (no spoofing)
- Model trained on old patterns misses new sophisticated phishing
- Accuracy: 92% (2024) → 68% (2026)
3. Network Baseline Drift:
Training (Q1 2025):
- Company uses on-prem infrastructure
- UEBA baseline: 80% traffic to internal servers
Production (Q4 2025):
- Company migrates to cloud
- 70% traffic now to AWS/Azure
- UEBA alerts on legitimate cloud usage as "abnormal"
- False Positive Rate: 10% → 40%
4. User Behavior Drift:
Baseline (January):
- User "jdoe" is engineer, accesses code repos
Reality (June):
- User promoted to manager, now accesses HR systems
- UEBA flags as anomalous (legitimate job change)
Types of Drift:
Concept Drift: - Definition: Relationship between features and target changes - Example: Previously, exe files from email were always malicious. Now, legitimate software uses email distribution.
Data Drift: - Definition: Input feature distribution changes - Example: Average email length changes from 500 chars to 2000 chars (doesn't affect phishing patterns, but model wasn't trained on longer emails)
Detecting Model Drift:
Monitor:
1. Prediction accuracy over time (weekly/monthly)
2. Feature distributions (are inputs different than training data?)
3. False positive/negative rates
4. User feedback (analysts marking predictions as wrong)
Alert when:
- Accuracy drops >10% from baseline
- FP rate increases >20%
- Feature distributions shift significantly
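The feature-distribution check (item 2 above) can be sketched with a two-sample Kolmogorov-Smirnov test from SciPy. The email-length values are simulated for illustration:

```python
# Sketch: detect data drift by comparing training-time and
# production-time feature distributions.
import random
from scipy.stats import ks_2samp

random.seed(0)
# Email lengths seen at training time vs in production
training_lengths = [random.gauss(500, 50) for _ in range(500)]
production_lengths = [random.gauss(2000, 200) for _ in range(500)]

result = ks_2samp(training_lengths, production_lengths)
if result.pvalue < 0.01:
    print("Feature distribution shifted - possible data drift")
```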
Mitigating Model Drift:
1. Continuous Retraining:
Schedule: Retrain model monthly/quarterly
Data: Use recent labeled data (last 90 days)
Benefit: Model adapts to new patterns
2. Online Learning:
Update the model incrementally as new labeled data arrives
3. Ensemble Models:
Combine older and newer models so performance degrades gracefully
4. Human-in-the-Loop:
Analyst feedback on predictions feeds back into training data
Example Monitoring:
Malware Detection Model:
- Week 1: Accuracy 94%
- Week 10: Accuracy 91%
- Week 20: Accuracy 85% ← Alert: Significant drift detected
- Action: Retrain on recent malware samples
- Week 21: Accuracy 93% (post-retraining)
Reference: Chapter 9, Section 9.9 - Model Drift
Question 10: What is an adversarial example in ML security?
A) A training example from an adversary B) Intentionally crafted input designed to fool an ML model into making incorrect predictions C) A competitive ML model D) Adversarial examples don't exist
Answer
Correct Answer: B) Intentionally crafted input designed to fool an ML model into making incorrect predictions
Explanation:
Adversarial Examples: - Definition: Malicious inputs crafted to evade ML-based detection - Goal: Cause misclassification (malware → benign, phishing → legitimate) - Method: Small, often imperceptible modifications to bypass model
Adversarial Attack Examples:
1. Malware Evasion:
Original Malware:
- Hash: abc123
- Entropy: 7.9 (high)
- Imports: VirtualAlloc, CreateRemoteThread
- ML Prediction: MALWARE (confidence: 98%)
Adversarial Malware:
- Add benign comments/strings to reduce entropy
- Entropy: 6.2 (now within benign range)
- Functionality: UNCHANGED (still malicious)
- ML Prediction: BENIGN (confidence: 65%)
Result: Attacker evades detection by manipulating features
2. Phishing Email Evasion:
Original Phishing:
Subject: "URGENT: Verify account now!"
ML Model: Flags "URGENT" + "Verify" as phishing indicators
Prediction: PHISHING (95%)
Adversarial Phishing:
Subject: "Important notice regarding your account verification"
- Replaces urgent words with softer language
- Same malicious intent
- ML Prediction: LEGITIMATE (70%)
3. Network Traffic Evasion:
Original C2 Traffic:
- Large payload (10MB data exfil)
- ML detects unusual volume
- Prediction: MALICIOUS
Adversarial C2:
- Split payload into 1000 small requests (10KB each)
- Mimics normal web traffic pattern
- ML Prediction: BENIGN (traffic volume per connection is normal)
How Adversarial Attacks Work:
1. White-Box Attack (Attacker knows model):
- Attacker has access to model architecture and weights
- Calculates gradients to find minimal perturbation
- Crafts input to maximize misclassification
Example: Attacker knows malware detector uses entropy feature
→ Adds padding to reduce entropy below threshold
2. Black-Box Attack (Attacker doesn't know model):
- Attacker queries model with test inputs
- Learns decision boundary through trial and error
- Crafts evasion based on observed behavior
Example: Test 100 malware variants to find which features trigger detection
→ Modify those specific features
Defending Against Adversarial Examples:
1. Adversarial Training:
Include adversarial examples in training data
Model learns to recognize evasion attempts
Training Data:
- Original malware samples
- + Adversarially modified samples (still labeled malware)
Result: Model robust to evasion
2. Ensemble Defense:
Use multiple models with different architectures
Harder for attacker to evade all simultaneously
Vote: If 2+ models detect malware → Flag as malicious
3. Defense in Depth:
Don't rely solely on ML
Combine with:
- Signature-based detection (hash matching)
- Behavioral analysis (runtime monitoring)
- Sandboxing (execute in isolated environment)
4. Feature Robustness:
Use features that are hard to manipulate without breaking functionality
Example: For malware detection
- Bad feature: File size (easily manipulated with padding)
- Good feature: Control flow graph (changing breaks functionality)
5. Anomaly Detection:
Flag inputs that look unlike anything in the training distribution (possible crafted evasions)
Real-World Example:
Attack: APT group crafts malware to evade EDR ML model
Method:
- Analyzed EDR vendor's published research
- Identified entropy threshold (> 7.5 = malware)
- Added benign strings to reduce entropy to 7.2
Defense:
- EDR vendor retrains model with adversarial samples
- Adds new feature: "unusual padding ratio"
- Detects evasion attempt
Question 11: Why is continuous retraining important for security ML models?
A) To waste computing resources B) To adapt to evolving threats, new attack techniques, and changing baselines (mitigate model drift) C) Retraining is never necessary D) Models become worse with retraining
Answer
Correct Answer: B) To adapt to evolving threats, new attack techniques, and changing baselines
Explanation:
Need for Continuous Retraining: - Threat Evolution: Attackers adapt techniques - Environment Changes: Infrastructure migrations, new applications - Model Drift: Performance degrades without updates - New Attack Vectors: Zero-day exploits, novel malware families
Retraining Schedule Examples:
1. High-Frequency Retraining (Weekly/Monthly):
Use Case: Phishing detection
Reason: Phishing techniques evolve rapidly
Schedule: Retrain weekly with last 30 days of labeled emails
Benefit: Catches latest phishing trends
2. Medium-Frequency Retraining (Quarterly):
Use Case: Malware detection
Reason: New malware families emerge regularly
Schedule: Retrain quarterly with recent malware samples
Benefit: Adapts to new techniques while maintaining stability
3. Event-Driven Retraining:
Use Case: UEBA
Trigger: Major infrastructure change (cloud migration)
Action: Retrain immediately to establish new baseline
Benefit: Prevents false positive flood
Retraining Workflow:
Step 1: Collect Recent Data
- Last 90 days of labeled incidents
- Analyst feedback (false positives/negatives)
- New threat intelligence
Step 2: Prepare Training Set
- Combine historical data (for stability) with recent data (for adaptation)
- Ratio: 70% recent, 30% historical
- Balance classes (equal malicious/benign samples)
Step 3: Retrain Model
- Use same architecture or improved version
- Validate on hold-out test set
- Compare performance to previous model
Step 4: A/B Testing
- Deploy new model to 10% of traffic
- Monitor metrics (accuracy, FP rate, analyst feedback)
- If improved → Full deployment
- If worse → Rollback to previous model
Step 5: Monitor Performance
- Track accuracy over time
- Schedule next retraining
Example: Alert Scoring Model Retraining
Initial Model (January 2025):
- Trained on 2024 data
- Accuracy: 92%
March 2025:
- Accuracy degrades to 85% (drift detected)
- New attack campaigns using different TTPs
Retraining (April 2025):
- Collected Q1 2025 labeled alerts (5,000 samples)
- Combined with 2024 data (15,000 samples)
- Retrained model
Post-Retraining:
- Accuracy: 93% (improved)
- Adapted to Q1 2025 threats
Challenges: - Labeled Data: Need analyst time to label new incidents - Compute Cost: Retraining large models is expensive - Regression Risk: New model might perform worse on some scenarios
Best Practices: - Automate Pipeline: Scheduled retraining without manual intervention - Version Control: Track model versions, enable rollback - Continuous Monitoring: Detect drift early - Feedback Loop: Analyst corrections feed training data
Question 12: What is the difference between precision and recall in ML evaluation?
A) Precision and recall are the same metric B) Precision = (True Positives) / (True Positives + False Positives); Recall = (True Positives) / (True Positives + False Negatives) C) Precision measures speed, recall measures memory D) Only precision matters in security
Answer
Correct Answer: B) Precision = TP/(TP+FP); Recall = TP/(TP+FN)
Explanation:
Precision: - Definition: Of all predictions as POSITIVE, how many were actually positive? - Formula: Precision = TP / (TP + FP) - Meaning: Accuracy of positive predictions - Trade-off: High precision = few false positives
Recall (Sensitivity): - Definition: Of all actual POSITIVES, how many did we detect? - Formula: Recall = TP / (TP + FN) - Meaning: Coverage of actual threats - Trade-off: High recall = few false negatives (missed threats)
Confusion Matrix:
                    Predicted MALWARE   Predicted BENIGN
Actually MALWARE    TP = 90             FN = 10    (100 total malware)
Actually BENIGN     FP = 20             TN = 880   (900 total benign)
Calculate Metrics:
Precision = TP / (TP + FP) = 90 / (90 + 20) = 90/110 = 81.8%
- "Of 110 malware predictions, 90 were correct"
- "18.2% of malware alerts are false positives"
Recall = TP / (TP + FN) = 90 / (90 + 10) = 90/100 = 90%
- "Of 100 actual malware samples, we detected 90"
- "We missed 10% of malware (false negatives)"
SOC Implications:
High Precision, Low Recall:
Model: Very conservative (only flags obvious malware)
Precision: 95% (very few FPs)
Recall: 60% (misses 40% of malware)
Impact:
+ Analysts trust alerts (low FP rate)
- Misses sophisticated threats (high FN rate)
Use Case: Auto-blocking (only block high-confidence threats)
Low Precision, High Recall:
Model: Very aggressive (flags anything suspicious)
Precision: 50% (high FP rate)
Recall: 98% (catches almost all malware)
Impact:
+ Catches nearly all threats (low FN rate)
- Alert fatigue (analysts waste time on FPs)
Use Case: Initial screening (send to analyst for review)
Balanced (Optimize F1 Score):
F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
Example:
Precision: 85%
Recall: 80%
F1 = 2 × (0.85 × 0.80) / (0.85 + 0.80) = 0.824 = 82.4%
Use Case: Most SOC use cases (balance FP and FN)
Tuning Trade-off:
Threshold adjustment example (malware detector):
Threshold: 0.9 (very strict)
→ High Precision (95%), Low Recall (65%)
Threshold: 0.5 (balanced)
→ Medium Precision (85%), Medium Recall (82%)
Threshold: 0.3 (aggressive)
→ Low Precision (70%), High Recall (95%)
Choose based on use case!
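The threshold trade-off can be sketched in plain Python: the same model scores yield different precision/recall depending on where you cut. The confidence scores and ground-truth labels below are illustrative:

```python
# Sketch: sweep the decision threshold and watch precision fall
# as recall rises.
def classify(scores, threshold):
    return [1 if s >= threshold else 0 for s in scores]

# Model confidence that each sample is malware, plus ground truth
scores = [0.95, 0.92, 0.85, 0.6, 0.55, 0.4, 0.35, 0.2]
truth  = [1,    1,    1,    1,   0,    1,   0,    0]

for threshold in (0.9, 0.5, 0.3):
    preds = classify(scores, threshold)
    tp = sum(p and t for p, t in zip(preds, truth))
    fp = sum(p and not t for p, t in zip(preds, truth))
    fn = sum(not p and t for p, t in zip(preds, truth))
    print(threshold,
          "precision:", round(tp / (tp + fp), 2),
          "recall:", round(tp / (tp + fn), 2))
```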
Reference: Chapter 9, Section 9.12 - Precision vs Recall or Chapter 11 - Metrics
Question 13: A UEBA system flags a user for 'unusual data access' because they accessed 500 files (baseline: 50). Investigation reveals the user is performing a legitimate audit. What type of error is this?
A) True Positive B) False Positive C) False Negative D) True Negative
Answer
Correct Answer: B) False Positive
Explanation:
Classification:
Alert: Unusual data access (model predicted: MALICIOUS)
Reality: Legitimate audit (actual: BENIGN)
Result: FALSE POSITIVE (alarm raised when none was warranted)
Four Outcomes:
True Positive (TP):
Predicted: MALICIOUS
Actual: MALICIOUS
Example: UEBA flags impossible travel, investigation confirms account compromise
Result: ✅ Correct detection
True Negative (TN):
Predicted: BENIGN
Actual: BENIGN
Example: Normal file access, no alert
Result: ✅ Correct (no alarm needed)
False Positive (FP):
Predicted: MALICIOUS
Actual: BENIGN
Example: Legitimate audit flagged as data exfiltration
Result: ❌ Incorrect alarm (wasted analyst time)
False Negative (FN):
Predicted: BENIGN
Actual: MALICIOUS
Example: Insider slowly exfiltrates data, stays under threshold, not flagged
Result: ❌ Missed threat (dangerous!)
Impact of FPs in SOC: - Analyst Time: Wasted investigating benign activity - Alert Fatigue: Too many FPs → analysts become desensitized - Missed Threats: Time spent on FPs means less time for real threats
Reducing UEBA False Positives:
1. Dynamic Baselines:
Problem: Static threshold (50 files) doesn't account for legitimate changes
Solution: Adaptive baseline
- Detect user started audit project (new behavior cluster)
- Adjust expected range: 50-500 files for audit period
- Alert only if exceeds new range (e.g., 1000 files)
2. Context Enrichment:
Alert: 500 file access
Enrichment:
- Check calendar: "Annual audit week" scheduled
- Check ticket system: Audit ticket assigned to user
- Check manager approval: Access approved
Result: Auto-close as expected behavior
3. Feedback Loop:
Analyst marks alert as FP
→ Feeds back to model training
→ Model learns: "High file access during audit week = normal"
→ Future audits don't trigger alerts
4. Tuning Thresholds:
Current: Alert if > 10x baseline (50 → 500)
Tuned: Alert if > 20x baseline (50 → 1000)
Trade-off: Fewer FPs but might miss some real threats (higher FN)
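The threshold-tuning trade-off above can be sketched in a few lines. This is a toy multiplier-based check, not a real UEBA engine; the baseline and event counts are illustrative:

```python
# Minimal sketch of multiplier-based UEBA thresholding.
# Baseline and event counts are illustrative assumptions.
def should_alert(count, baseline, multiplier):
    """Alert when observed activity reaches multiplier x the learned baseline."""
    return count >= baseline * multiplier

baseline = 50                # files/day learned for this user
events = [500, 900, 1200]    # observed daily file-access counts

# Current tuning (10x): all three days alert, including the legitimate
# 500-file audit from the scenario above (a false positive)
print([should_alert(c, baseline, 10) for c in events])  # → [True, True, True]

# Tuned (20x): the audit no longer alerts, but a 900-file day now
# slips through too (a potential false negative)
print([should_alert(c, baseline, 20) for c in events])  # → [False, False, True]
```

Raising the multiplier trades FPs for FNs; the dynamic-baseline and context-enrichment approaches above try to cut FPs without paying that FN cost.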
Reference: Chapter 9, Section 9.5 - UEBA and Common Pitfalls
Question 14: What is the cold start problem in UEBA?
A) UEBA systems need to warm up before use B) Inability to establish behavioral baselines for new users/entities with no historical data C) Systems run better in cold climates D) Cold start is not a real problem
Answer
Correct Answer: B) Inability to establish baselines for new users/entities with no historical data
Explanation:
Cold Start Problem:
- Issue: UEBA requires 30-90 days of historical data to build a baseline
- Challenge: New users have no history
- Impact: No baseline = no anomaly detection (blind spot)
Cold Start Scenarios:
1. New Employee:
Day 1: User "new_hire" starts
UEBA: No historical data, can't establish baseline
Problem: If new_hire is malicious or compromised from day 1, UEBA won't detect anomalies
Risk Window: 30-90 days until baseline is established
2. New Server:
New database server deployed
UEBA: No baseline for normal network traffic patterns
Problem: Can't detect if server is immediately compromised (no anomaly reference)
3. Job Role Change:
User "jdoe" promoted from Sales → IT Admin
Old Baseline: CRM access, 9-5 logins
New Role: Server access, on-call hours
Problem: New behavior looks anomalous compared to old baseline
Result: False positive flood OR need to rebuild baseline (cold start again)
Mitigating Cold Start:
1. Peer Group Baselines:
Instead of individual baseline, use role-based baseline
New hire "jdoe" role: Software Engineer
Baseline: Aggregate behavior of all engineers
- Expected apps: IDE, Git, Slack
- Expected access: Code repos, dev servers
- Expected hours: Flexible (some work nights)
Anomaly: If jdoe accesses HR database → Alert (engineers don't typically access HR)
2. Default Safe Behavior:
During baseline learning period (30 days):
- Apply stricter rule-based detections
- Flag high-risk actions (e.g., privilege escalation)
- Don't rely solely on behavioral anomaly detection
3. Transfer Learning:
Use baselines from similar users/entities
New Sales user → Use existing Sales team baseline
New web server → Use existing web server baseline
Advantage: Immediate anomaly detection
Caveat: Assumes role similarity
4. Accelerated Baselining:
Compressed baseline period:
- Traditional: 90 days
- Accelerated: 7-14 days with more aggressive data collection
- Trade-off: Less accurate baseline but faster coverage
5. Hybrid Approach:
Combine:
- Peer group baseline (immediate coverage)
- + Individual baseline (building over 30-90 days)
- + Rule-based detections (catch obvious threats)
Transition: Start with peer baseline, gradually shift to personalized baseline
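The hybrid transition above can be modeled as a simple weighted blend that shifts from the peer-group baseline to the individual one as history accrues. A minimal sketch; the 90-day maturity window and all numeric values are illustrative assumptions:

```python
# Minimal sketch of the hybrid cold-start approach: blend a peer-group
# baseline with the user's emerging individual baseline, shifting weight
# toward the individual as observed history grows. Numbers are illustrative.
def blended_baseline(peer_mean, individual_mean, days_observed, maturity_days=90):
    w = min(days_observed / maturity_days, 1.0)  # individual weight grows with history
    return (1 - w) * peer_mean + w * individual_mean

# Day 1: no personal history, so the baseline is entirely the team average
print(blended_baseline(peer_mean=60.0, individual_mean=0.0, days_observed=0))
# → 60.0

# Day 27 (~30% mature): roughly the 70/30 team/individual split
print(blended_baseline(peer_mean=60.0, individual_mean=40.0, days_observed=27))
# → ~54.0
```

After `maturity_days` the weight saturates at 1.0 and the peer baseline drops out entirely, matching the "Week 9+" phase in the example below.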
Example:
New employee "alice" starts as Finance Analyst
Week 1-4: Cold Start Period
- Use Finance team baseline
- UEBA compares alice to average Finance behavior
- Alert if alice accesses engineering code (unusual for Finance)
Week 5-8: Hybrid Period
- Combine team baseline (70%) with emerging individual baseline (30%)
- Refine understanding of alice's specific patterns
Week 9+: Individual Baseline
- Sufficient data for personalized baseline
- Anomaly detection tailored to alice's specific behavior
Reference: Chapter 9, Section 9.5 - UEBA Cold Start
Question 15: Why is explainability important for ML models in SOC operations?
A) Explainability is not important B) Analysts need to understand why a model made a prediction to validate decisions, tune models, and build trust C) Models should always be black boxes D) Explainability slows down detection
Answer
Correct Answer: B) Analysts need to understand why a model made predictions to validate, tune, and build trust
Explanation:
Importance of Explainability:
1. Validation:
Alert: Malware detected (confidence: 95%)
Black Box: "File is malware" (no explanation)
→ Analyst: "Why? I need to investigate before blocking"
Explainable: "Malware because:"
- High entropy (7.9) - indicates packing/encryption
- Suspicious imports: VirtualAlloc, CreateRemoteThread
- Rare file (prevalence: 0.001%)
- Threat intel: Hash matches Emotet variant
→ Analyst: "Makes sense, approved for blocking"
2. Trust:
Analysts trust models they understand
Black box predictions → skepticism, manual override
Explainable predictions → confidence, faster response
3. Debugging/Tuning:
Problem: Model flags benign software as malware
Black Box: Hard to diagnose why
Explainable: "Flagged because high entropy (7.8)"
→ Diagnosis: Legitimate software also has high entropy (compression)
→ Fix: Add additional features (digital signature, prevalence)
4. Compliance:
Regulations (e.g., GDPR, CCPA) are widely interpreted as requiring a "right to explanation"
- If AI blocks user action, must explain why
- Auditors need to understand decision logic
Explainability Techniques:
1. Feature Importance:
Alert Scoring Model:
Which features most influenced score?
Feature Importance:
1. Threat Intel Match: 40%
2. Asset Criticality: 25%
3. User Risk Score: 20%
4. Time of Day: 10%
5. Other: 5%
Explanation: "Alert scored high primarily due to threat intel match"
2. SHAP (SHapley Additive exPlanations):
Base risk score: 50/100
Feature Contributions:
+ Threat intel match: +35 points
+ Critical asset: +20 points
- Low user risk: -5 points
- Business hours: -3 points
Final score: 97/100
Explanation: "High score driven by threat intel and critical asset"
3. Decision Trees (Inherently Explainable):
IF threat_intel_match = True:
IF asset_critical = True:
IF user_privileged = True:
Risk = CRITICAL
ELSE:
Risk = HIGH
Path: threat_intel=True → asset=True → user=False → Risk=HIGH
Explanation: Clear decision path
4. Counterfactual Explanations:
Alert: Blocked (confidence: 92%)
Explanation: "Would have been allowed if:"
- Threat intel confidence < 80% (currently 95%)
- OR asset criticality = Low (currently Critical)
Helps analyst understand decision boundary
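The SHAP-style breakdown above (technique 2) is additive: each feature pushes a base score up or down, and the signed contributions double as the explanation. A minimal sketch of that idea, using the same illustrative numbers; this is not a real SHAP computation, just the additive reporting pattern:

```python
# Minimal sketch of an additive, SHAP-style explanation: a base score plus
# signed per-feature contributions, reported ranked by magnitude.
# Contribution values are illustrative, not output of a trained model.
def explain_score(base, contributions):
    score = base + sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return score, ranked

score, reasons = explain_score(
    base=50,
    contributions={
        "threat_intel_match": +35,
        "critical_asset": +20,
        "low_user_risk": -5,
        "business_hours": -3,
    },
)
print(score)       # → 97
print(reasons[0])  # → ('threat_intel_match', 35)
```

Real SHAP values come from the model itself (e.g., via the `shap` library), but the reporting contract is the same: contributions sum to the difference between the base value and the final score.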
Example: Phishing Detection
Email: "Urgent: Verify your account"
Prediction: PHISHING (confidence: 94%)
Black Box: No explanation
→ Analyst wastes 5 minutes manually analyzing
Explainable:
Reasons for PHISHING classification:
1. Urgent language: "Urgent", "Verify" (score: +30)
2. External sender + internal lookalike domain (score: +40)
3. Suspicious link (domain age: 2 days) (score: +20)
4. No previous email history with sender (score: +4)
→ Analyst: "Clear phishing, deleting and blocking" (1 minute decision)
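The phishing example above can be restated as a tiny additive scorer whose per-signal weights are themselves the explanation. The signal names, weights, and threshold are hypothetical, chosen only to reproduce the illustrative scores shown:

```python
# Hypothetical re-creation of the phishing example as an additive scorer.
# Signal names, weights, and the decision threshold are illustrative
# assumptions, not a trained model.
def classify_email(signals, weights, threshold=50):
    reasons = [(s, weights[s]) for s in signals if s in weights]
    score = sum(w for _, w in reasons)
    label = "PHISHING" if score >= threshold else "LEGITIMATE"
    return label, score, reasons

weights = {
    "urgent_language": 30,      # "Urgent", "Verify"
    "lookalike_domain": 40,     # external sender, internal lookalike domain
    "young_link_domain": 20,    # linked domain registered 2 days ago
    "no_sender_history": 4,     # no previous email from this sender
}
label, score, reasons = classify_email(list(weights), weights)
print(label, score)  # → PHISHING 94
```

Because every point of the score traces back to a named signal, the analyst sees the same reason list the model used, which is what turns the 5-minute manual review into a 1-minute decision.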
Trade-offs:
- Simple Models: Very explainable (decision trees, linear models) but may be less accurate
- Complex Models: More accurate (deep neural networks) but harder to explain
- Solution: Use explainability tools (SHAP, LIME) to interpret complex models
Reference: Chapter 9, Section 9.13 - Model Explainability or Chapter 10 - Guardrails
Score Interpretation¶
- 13-15 correct: Excellent! You have strong ML fundamentals and understand SOC-specific applications.
- 10-12 correct: Good understanding. Review overfitting/underfitting and model drift concepts.
- 7-9 correct: Adequate baseline. Focus on supervised vs unsupervised learning and UEBA.
- Below 7: Review Chapter 9 thoroughly, especially classification, regression, and feature engineering.