Chapter 8: Incident Response
Learning Objectives
By the end of this chapter, you will be able to:
- Apply the four-phase NIST incident response lifecycle (Preparation; Detection & Analysis; Containment, Eradication & Recovery; Post-Incident Activity)
- Execute proper containment and eradication procedures
- Conduct effective post-incident reviews and root cause analysis
- Coordinate cross-functional incident response teams
- Leverage AI tools for incident timeline reconstruction and forensic analysis
Prerequisites
- Chapter 5: Investigation and triage fundamentals
- Chapter 7: SOAR automation concepts
- Basic understanding of forensics principles
Key Concepts
Incident Response • Containment • Eradication • Forensic Preservation • Incident Commander • Post-Incident Review
Curiosity Hook: The Ransomware at 4 PM Friday
4:07 PM Friday. Multiple users report encrypted files. EDR alerts: Ransomware detected on 12 systems and spreading.
Incident Commander's Decision Points:
- Isolate systems or shut down the network entirely?
- Preserve evidence or prioritize business continuity?
- Notify customers? Law enforcement? Insurance?
- Pay the ransom or restore from backups?
The Clock: Every minute of delay = more encrypted systems, more lost data, higher recovery costs.
This chapter teaches: The structured IR process that enables calm, effective decision-making during chaos.
8.1 The NIST Incident Response Lifecycle
Overview
[1. Preparation] → [2. Detection & Analysis] → [3. Containment, Eradication, Recovery] → [4. Post-Incident Activity]
Post-Incident Activity feeds back into Preparation (Continuous Improvement).
Source: NIST SP 800-61 Rev. 2 (Computer Security Incident Handling Guide)
Phase 1: Preparation
Goal: Establish capabilities, policies, and training before incidents occur.
Key Activities:
- IR Plan Documentation: Define roles, communication procedures, escalation paths
- Tool Readiness: SIEM, EDR, forensic tools, SOAR playbooks
- Team Training: Tabletop exercises, simulations
- Stakeholder Identification: Legal, PR, executives, law enforcement contacts
- Asset Inventory: Know what systems exist and their criticality
- Backup Validation: Ensure backups exist and are restorable (test regularly!)
Common Gap: Organizations skip preparation, then scramble during incidents.
Phase 2: Detection & Analysis
Goal: Identify and validate security incidents, assess scope and severity.
Activities:
- Alert Triage: Determine whether the alert is a true positive (covered in Chapter 5)
- Incident Declaration: Formally declare an incident if confirmed
- Initial Scoping: How many systems? What data? Still active?
- Severity Classification:
| Severity | Criteria | Response SLA |
|---|---|---|
| Low | Single system, no sensitive data, contained | 24 hours |
| Medium | Multiple systems, internal data, contained | 4 hours |
| High | Critical systems, confidential data, or active spread | 1 hour |
| Critical | Enterprise-wide, regulated data, or public-facing breach | Immediate (15 min) |
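The table above can be encoded as a small lookup so that triage tooling applies the same SLA every time. The following is a minimal sketch, not a definitive implementation: `Severity`, `IncidentScope`, and `classify_severity` are hypothetical names, and the boolean inputs are a simplification of criteria that in practice need analyst judgment.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum


class Severity(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"
    CRITICAL = "Critical"


# Response SLAs copied from the severity table above.
RESPONSE_SLA = {
    Severity.LOW: timedelta(hours=24),
    Severity.MEDIUM: timedelta(hours=4),
    Severity.HIGH: timedelta(hours=1),
    Severity.CRITICAL: timedelta(minutes=15),
}


@dataclass
class IncidentScope:
    # Hypothetical, simplified inputs; real triage needs analyst judgment.
    systems_affected: int
    critical_systems: bool = False
    confidential_data: bool = False
    regulated_data: bool = False
    enterprise_wide: bool = False
    public_facing_breach: bool = False
    actively_spreading: bool = False


def classify_severity(scope: IncidentScope) -> Severity:
    """Return the highest severity whose criteria the scope meets."""
    if scope.enterprise_wide or scope.regulated_data or scope.public_facing_breach:
        return Severity.CRITICAL
    if scope.critical_systems or scope.confidential_data or scope.actively_spreading:
        return Severity.HIGH
    if scope.systems_affected > 1:
        return Severity.MEDIUM
    return Severity.LOW


# Usage: the 4 PM Friday scenario (12 systems, still spreading)
scope = IncidentScope(systems_affected=12, actively_spreading=True)
severity = classify_severity(scope)
print(severity.value, RESPONSE_SLA[severity])
```

Checking the most severe criteria first means an incident that matches several rows always lands in the highest applicable tier.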
Phase 3: Containment, Eradication, Recovery
Goal: Stop the attack, remove the threat, restore normal operations.
Containment
Short-term Containment:
- Isolate infected systems from network (EDR isolation, VLAN segmentation)
- Block malicious IPs/domains at firewall/proxy
- Disable compromised accounts
Long-term Containment:
- Rebuild isolated systems
- Patch vulnerabilities exploited by attacker
- Deploy compensating controls (e.g., block SMB externally if exploit used SMB)
Decision: Preserve Evidence vs. Business Continuity
- Preserve evidence: Keep systems powered on, create forensic images before remediation
- Business priority: Immediate restore from backup, limited forensics
- Balance: Isolate + image critical systems, expedite recovery for others
Eradication
Goal: Completely remove the threat from the environment.
Activities:
- Malware Removal: Delete malicious files, processes, registry keys
- Account Cleanup: Remove attacker-created accounts, reset compromised passwords
- Backdoor Removal: Identify and eliminate persistence mechanisms (scheduled tasks, WMI subscriptions, webshells)
- Vulnerability Patching: Address root cause (unpatched software, misconfiguration)
Validation:
- Re-scan systems with EDR/antivirus
- Check for IOCs across entire environment (ensure threat hasn't spread to undetected systems)
- Monitor for re-infection attempts
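The environment-wide IOC check in the validation list can be sketched as a set lookup over exported telemetry. This assumes events have already been reduced to `(hostname, observed indicator)` pairs; that shape, the function names, and the host `WKS-077` are hypothetical, and in practice an EDR or SIEM query would do this at scale.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical event shape: (hostname, observed indicator such as an IP or file hash)
Event = Tuple[str, str]


def hosts_with_iocs(events: Iterable[Event], iocs: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each host to the set of known IOCs observed on it."""
    wanted = {ioc.strip().lower() for ioc in iocs}
    hits: Dict[str, Set[str]] = defaultdict(set)
    for host, indicator in events:
        value = indicator.strip().lower()
        if value in wanted:
            hits[host].add(value)
    return dict(hits)


def undetected_hosts(hits: Dict[str, Set[str]], known_infected: Iterable[str]) -> List[str]:
    """Hosts showing IOCs that are not yet on the known-infected list."""
    return sorted(set(hits) - set(known_infected))


# Usage with illustrative data
events = [
    ("WKS-042", "45.33.32.156"),
    ("FILE-SRV-01", "5D41402ABC4B2A76B9719D911017C592"),
    ("WKS-077", "45.33.32.156"),
    ("WKS-100", "192.0.2.10"),
]
iocs = ["45.33.32.156", "5d41402abc4b2a76b9719d911017c592"]
hits = hosts_with_iocs(events, iocs)
print(undetected_hosts(hits, known_infected=["WKS-042", "FILE-SRV-01"]))
```

Normalizing case and whitespace before comparison avoids missing a hash that one tool logs in upper case and another in lower case.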
Recovery
Goal: Restore systems to normal operations and verify they're clean.
Activities:
- System Restoration: Rebuild from known-good backups or fresh OS installs
- Service Restoration: Bring systems back online incrementally (monitor for anomalies)
- Validation Testing: Confirm systems function correctly and are threat-free
- Enhanced Monitoring: Increase logging and alerting on recovered systems for 30 days
Post-Recovery Checklist:
- [ ] All affected systems rebuilt or verified clean
- [ ] All compromised credentials reset
- [ ] All persistence mechanisms removed
- [ ] Patches applied to prevent re-exploitation
- [ ] Monitoring enhanced to detect re-infection
- [ ] Business stakeholders confirm operations restored
Phase 4: Post-Incident Activity
Goal: Learn from the incident to improve defenses and processes.
Activities:
- Incident Report: Document timeline, impact, IOCs, root cause
- Lessons Learned Meeting: Cross-functional review (IR team, IT, management)
- Process Improvements: Update runbooks, playbooks, detection rules
- Metrics Update: Track MTTR, cost, systems affected
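As a worked example of the metrics step, MTTR can be computed from incident timestamps. MTTR is expanded in several ways (respond, recover, resolve); this sketch assumes it means the average time from detection to resolution, and `mean_time_to_resolve` is a hypothetical helper name.

```python
from datetime import datetime, timedelta
from typing import Sequence, Tuple


def mean_time_to_resolve(incidents: Sequence[Tuple[datetime, datetime]]) -> timedelta:
    """Average (resolved - detected) over a list of (detected, resolved) pairs."""
    if not incidents:
        raise ValueError("at least one incident is required")
    durations = []
    for detected, resolved in incidents:
        if resolved < detected:
            raise ValueError("resolved time precedes detection time")
        durations.append(resolved - detected)
    return sum(durations, timedelta()) / len(durations)


# Usage with two illustrative incidents (4 hours and 2 hours to resolve)
history = [
    (datetime(2026, 2, 15, 16, 0), datetime(2026, 2, 15, 20, 0)),
    (datetime(2026, 2, 16, 9, 0), datetime(2026, 2, 16, 11, 0)),
]
print(mean_time_to_resolve(history))
```

Whichever definition a team picks, recording it next to the metric keeps quarter-over-quarter comparisons meaningful.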
8.2 Incident Response Team Roles
Incident Commander (IC)
Responsibilities:
- Overall incident leadership and decision-making
- Coordinate cross-functional teams
- Communicate with executives and stakeholders
- Declare incident severity and response level
Skills: Leadership, technical knowledge, calm under pressure
Technical Lead
Responsibilities:
- Oversee technical investigation and remediation
- Direct forensic analysis
- Validate containment and eradication effectiveness
Skills: Deep technical expertise (forensics, malware analysis, system administration)
Communications Lead
Responsibilities:
- Internal communication (status updates to executives, IT, affected departments)
- External communication (customers, regulators, media if needed)
- Coordinate with Legal and PR
Skills: Clear communication, regulatory knowledge, crisis management
SOC Analysts (Tier 1/2)
Responsibilities:
- Execute containment actions (isolate systems, block IPs)
- Gather evidence and IOCs
- Search for spread to other systems
- Document findings in real-time
IT Operations
Responsibilities:
- System recovery and restoration
- Patch deployment
- Backup restoration
- Network segmentation adjustments
Legal/Compliance
Responsibilities:
- Assess regulatory notification requirements (GDPR, HIPAA, etc.)
- Advise on evidence preservation for potential legal action
- Coordinate with law enforcement if needed
8.3 Containment Strategies
Network Isolation
Techniques:
- EDR Isolation: Quarantine endpoint (block network access, allow remote management)
- VLAN Segmentation: Move infected systems to isolated VLAN
- Firewall Rules: Block traffic to/from infected systems
- DNS Sinkholing: Redirect malware C2 domains to internal sinkhole (monitor callback attempts)
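Example EDR Isolation Command (illustrative only: falcon-cli is placeholder syntax, not a documented vendor command; use the containment action in your EDR console or its API):
# CrowdStrike Falcon (placeholder syntax)
falcon-cli contain <host-id>
# Result: System isolated from the network while the EDR management channel stays up, so analysts can still investigate remotely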
Account Disablement
Best Practices:
- Immediate Disable: Compromised user accounts in Active Directory
- MFA Reset: Force re-enrollment of MFA devices (attacker may have enrolled rogue devices)
- Credential Reset: Change passwords for all accounts with similar access levels (assume lateral movement)
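Example PowerShell:
Import-Module ActiveDirectory
# Disable compromised account
Disable-ADAccount -Identity "jsmith"
# Prompt for a temporary password instead of hard-coding it in the script or shell history
$tempPassword = Read-Host -AsSecureString -Prompt "Temporary password for jsmith"
# -Reset performs an administrative reset (without it, the cmdlet expects the old password)
Set-ADAccountPassword -Identity "jsmith" -Reset -NewPassword $tempPassword
# Force a password change at next logon
Set-ADUser -Identity "jsmith" -ChangePasswordAtLogon $true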
Data Exfiltration Prevention
If data theft is suspected but not yet confirmed:
- Block outbound connections: Restrict egress to known-good destinations only
- Monitor data loss prevention (DLP) alerts: Watch for large uploads to cloud storage, email
- Engage forensics: Analyze netflow and proxy logs for exfiltration indicators
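One simple exfiltration indicator in proxy logs is an unusually large outbound volume between a single source and destination. The sketch below assumes a CSV export with `src_ip`, `dest_host`, and `bytes_out` columns; that schema, the function name, and the threshold are hypothetical and would need adapting to your proxy's actual log format.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO, Tuple


def flag_large_uploads(log_file: TextIO, threshold_bytes: int) -> List[Tuple[str, str, int]]:
    """Sum outbound bytes per (source, destination) and return pairs at or over the threshold."""
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for row in csv.DictReader(log_file):
        totals[(row["src_ip"], row["dest_host"])] += int(row["bytes_out"])
    flagged = [
        (src, dest, total)
        for (src, dest), total in totals.items()
        if total >= threshold_bytes
    ]
    # Largest transfers first so analysts review the worst offenders first
    return sorted(flagged, key=lambda item: item[2], reverse=True)


# Usage with an in-memory sample (illustrative values)
sample = io.StringIO(
    "src_ip,dest_host,bytes_out\n"
    "10.0.0.5,files.example.com,600000000\n"
    "10.0.0.5,files.example.com,500000000\n"
    "10.0.0.7,intranet.example.com,2000\n"
)
for src, dest, total in flag_large_uploads(sample, threshold_bytes=1_000_000_000):
    print(f"{src} -> {dest}: {total} bytes")
```

Aggregating before thresholding matters because exfiltration is often split into many medium-sized requests that individually look normal.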
8.4 Forensic Evidence Preservation
Why Preserve Evidence?
- Legal Action: Potential prosecution or civil lawsuit
- Regulatory Compliance: Some industries require forensic investigation (PCI-DSS, HIPAA)
- Threat Intelligence: Understand attacker TTPs for future defense
- Lessons Learned: Root cause analysis
Chain of Custody
Definition: Documented trail of who handled evidence, when, and why.
Chain of Custody Log Example:
| Date/Time | Evidence ID | Action | Custodian | Notes |
|---|---|---|---|---|
| 2026-02-15 16:30 | IMG-001 | Created disk image | Alice Chen | Forensic image of WKS-042 |
| 2026-02-15 17:00 | IMG-001 | Transferred to storage | Alice Chen | Stored on forensic server FS-01 |
| 2026-02-16 09:00 | IMG-001 | Analysis begun | Bob Kumar | Mounted read-only for investigation |
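A custody log like the one above can be kept as structured records rather than free text, which makes it easy to pull the full history of one evidence item. This is a minimal sketch with hypothetical class names; it is append-only by convention only and does not replace a tamper-evident evidence management system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class CustodyEntry:
    timestamp: datetime
    evidence_id: str
    action: str
    custodian: str
    notes: str = ""


class CustodyLog:
    """Log that only supports adding entries, in chronological order."""

    def __init__(self) -> None:
        self._entries: List[CustodyEntry] = []

    def record(self, entry: CustodyEntry) -> None:
        # Reject back-dated entries so the trail stays chronological
        if self._entries and entry.timestamp < self._entries[-1].timestamp:
            raise ValueError("entries must be recorded in chronological order")
        self._entries.append(entry)

    def history(self, evidence_id: str) -> List[CustodyEntry]:
        return [e for e in self._entries if e.evidence_id == evidence_id]

    def to_markdown(self) -> str:
        lines = [
            "| Date/Time | Evidence ID | Action | Custodian | Notes |",
            "|---|---|---|---|---|",
        ]
        for e in self._entries:
            lines.append(
                f"| {e.timestamp:%Y-%m-%d %H:%M} | {e.evidence_id} | {e.action} "
                f"| {e.custodian} | {e.notes} |"
            )
        return "\n".join(lines)


# Usage: reproduce the first row of the example log
log = CustodyLog()
log.record(CustodyEntry(datetime(2026, 2, 15, 16, 30), "IMG-001",
                        "Created disk image", "Alice Chen", "Forensic image of WKS-042"))
print(log.to_markdown())
```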
Forensic Imaging
Best Practice: Create a bit-for-bit copy of the disk before remediation.
Tools:
- FTK Imager: Free, Windows-focused
- dd (Linux): dd if=/dev/sda of=/mnt/evidence/disk.img bs=4M
- EnCase: Commercial forensic suite
- Autopsy: Open-source forensic platform built on The Sleuth Kit
Steps:
1. Boot the system from forensic USB media, or attach the disk through a write blocker, so the source disk is not altered
2. Create a cryptographic hash (SHA-256) of the source disk
3. Image the disk to external storage
4. Hash the image and verify that it matches the source hash
5. Document each step in the chain of custody log
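The hash-and-verify steps above (2 and 4) can be sketched with Python's standard `hashlib`. The function names are hypothetical, and the example uses small temporary files as stand-ins; reading a real block device such as `/dev/sda` this way would require root privileges and is an assumption about your environment.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_stream(path: Path, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Hash a file or device in chunks so large disks never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(source: Path, image: Path) -> str:
    """Return the shared SHA-256 if source and image match, else raise."""
    source_hash = sha256_stream(source)
    image_hash = sha256_stream(image)
    if source_hash != image_hash:
        raise ValueError(f"hash mismatch: source={source_hash} image={image_hash}")
    return image_hash


# Usage with stand-in files (a real run would point at the disk and its image)
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.bin"
    image = Path(tmp) / "disk.img"
    source.write_bytes(b"\x00" * 1024 + b"evidence")
    image.write_bytes(source.read_bytes())
    print("verified:", verify_image(source, image))
```

Recording the returned hash in the chain of custody log lets a later analyst re-run the check and show the image has not changed.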
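Volatile Data Collection (Before Powering Off)
Memory Dump:
# Linux: /dev/mem is restricted on most modern kernels, so use a dedicated acquisition tool (AVML shown; LiME is an alternative)
sudo ./avml /mnt/usb/memory.lime
# Windows (DumpIt, Magnet RAM Capture); option syntax varies by tool and version
DumpIt.exe /output E:\memory.dmp
Running Processes, Network Connections:
# Capture state before shutdown; write to external media, not the suspect disk
ps aux > /mnt/usb/processes.txt
sudo netstat -antp > /mnt/usb/network_connections.txt
sudo lsof > /mnt/usb/open_files.txt
Why: Memory contains encryption keys, malware code, and active network connections that are lost on shutdown, so capture it before powering off to image the disk.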
8.5 Post-Incident Review
Lessons Learned Meeting
Participants: IR team, IT, management, any affected departments
Agenda:
1. Timeline Review: What happened and when?
2. What Went Well: Effective actions, good decisions
3. What Went Wrong: Gaps, delays, missteps
4. Root Cause: How did attackers gain access? What enabled their success?
5. Remediation Actions: What will we change? (Technical, process, training)
6. Owner Assignment: Who owns each action item? Deadline?
Example Lessons Learned
Incident: Ransomware via phishing email
What Went Well:
- ✅ EDR detected and isolated 10 of 12 infected systems automatically
- ✅ Backups were available and restorable
- ✅ IR team followed playbook effectively
What Went Wrong:
- ❌ Email gateway did not detect phishing (no attachment sandboxing)
- ❌ 2 critical servers not covered by EDR (budget constraints)
- ❌ Initial triage delayed 30 minutes (analyst on break, no backup coverage)
Root Cause:
- Phishing email bypassed detection
- User opened the malicious attachment (training gap)
- No MFA on VPN (attacker used stolen credentials for lateral access)
Remediation Actions:
| Action | Owner | Deadline | Status |
|---|---|---|---|
| Deploy email attachment sandboxing | IT Security | 2026-03-01 | In Progress |
| Extend EDR to all servers | IT Ops | 2026-03-15 | Approved |
| Implement phishing-resistant MFA | IAM Team | 2026-04-01 | Planning |
| Quarterly phishing simulations | HR/Security | Ongoing | New Process |
| 24/7 SOC coverage (eliminate gaps) | SOC Manager | 2026-03-01 | Hiring |
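Remediation tracking only works if overdue items surface automatically. As a rough sketch, the table above could be held as records and filtered by deadline; `RemediationAction` and `overdue_actions` are hypothetical names, and a ticketing system would normally own this data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass
class RemediationAction:
    action: str
    owner: str
    deadline: Optional[date]  # None for ongoing processes with no fixed date
    done: bool = False


def overdue_actions(actions: Iterable[RemediationAction], today: date) -> List[RemediationAction]:
    """Open actions whose deadline has passed; ongoing items are never overdue."""
    return [
        a for a in actions
        if a.deadline is not None and not a.done and a.deadline < today
    ]


# Usage with rows from the table above
actions = [
    RemediationAction("Deploy email attachment sandboxing", "IT Security", date(2026, 3, 1)),
    RemediationAction("Extend EDR to all servers", "IT Ops", date(2026, 3, 15)),
    RemediationAction("Quarterly phishing simulations", "HR/Security", None),
]
for item in overdue_actions(actions, today=date(2026, 3, 10)):
    print(f"OVERDUE: {item.action} (owner: {item.owner})")
```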
8.6 AI in Incident Response
Use Case 1: Automated Timeline Reconstruction
Challenge: Building incident timelines manually from thousands of log entries is time-consuming.
AI Solution (illustrative sketch: fetch_logs and llm_extract_timeline are placeholder helpers, not functions from a specific library):
# AI-powered timeline builder
def build_timeline(incident_id):
    # Pull raw events for the incident from each telemetry source
    logs = fetch_logs(incident_id, sources=['endpoint', 'network', 'auth'])
    # LLM extracts key events and sequences them
    timeline = llm_extract_timeline(logs, context={
        'incident_type': 'ransomware',
        'affected_systems': ['WKS-042', 'FILE-SRV-01'],
        'timeframe': '2026-02-15 14:00 to 18:00'
    })
    # Output: Chronological markdown timeline with ATT&CK mapping
    return timeline
Example Output:
## Incident Timeline: INC-2026-0245
14:05 - Initial Access (T1566.001): User jsmith opened phishing email attachment
14:06 - Execution (T1204.002): Malicious payload executed (invoice.exe)
14:07 - C2 Communication (T1071.001): HTTPS beacon to 45.33.32.156
14:15 - Credential Access (T1003.001): LSASS memory dump attempted (blocked by EDR)
14:18 - Lateral Movement (T1021.001): RDP connection to FILE-SRV-01
14:22 - Impact (T1486): Ransomware encryption initiated on FILE-SRV-01
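An LLM-generated timeline like the one above is a draft, so its ordering is worth cross-checking against a deterministic merge of the raw events. The sketch below assumes each source (endpoint, network, auth) already yields events sorted by time; the `Event` tuple and function names are hypothetical.

```python
import heapq
from datetime import datetime
from typing import Iterable, Iterator, List, NamedTuple


class Event(NamedTuple):
    timestamp: datetime
    source: str
    description: str


def merge_timelines(*sources: Iterable[Event]) -> Iterator[Event]:
    """Merge per-source event streams that are each already time-sorted."""
    return heapq.merge(*sources, key=lambda event: event.timestamp)


def render(events: Iterable[Event]) -> List[str]:
    """Format events in the HH:MM style used by the example timeline."""
    return [f"{e.timestamp:%H:%M} - [{e.source}] {e.description}" for e in events]


# Usage with events taken from the example timeline
endpoint = [
    Event(datetime(2026, 2, 15, 14, 6), "endpoint", "invoice.exe executed"),
    Event(datetime(2026, 2, 15, 14, 22), "endpoint", "Encryption started on FILE-SRV-01"),
]
network = [Event(datetime(2026, 2, 15, 14, 7), "network", "HTTPS beacon to 45.33.32.156")]
auth = [Event(datetime(2026, 2, 15, 14, 18), "auth", "RDP logon to FILE-SRV-01")]

for line in render(merge_timelines(endpoint, network, auth)):
    print(line)
```

`heapq.merge` streams the inputs lazily, which suits large log exports, but it relies on consistent clocks and time zones across sources.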
Use Case 2: Forensic Analysis Assistance
Analyst Query:
"Analyze memory dump and identify suspicious processes."
LLM Copilot (RAG-enhanced with YARA rules, known malware patterns):
Memory Dump Analysis: memory.dmp
Suspicious Processes Identified:
1. Process: svchost.exe (PID 4892)
- Parent: explorer.exe (unusual, svchost normally spawned by services.exe)
- Network: Active connection to 45.33.32.156:443
- Strings: "encrypt_files", "bitcoin_wallet"
- Assessment: HIGH SUSPICION - Likely ransomware payload
2. Process: rundll32.exe (PID 5123)
- Command Line: rundll32.exe C:\temp\malicious.dll,EntryPoint
- DLL Hash: 5d41402abc4b2a76b9719d911017c592 (VirusTotal: 45/70 malicious)
- Assessment: CONFIRMED MALICIOUS
Recommended Actions:
- Extract malicious.dll for malware analysis
- Search all systems for file hash 5d41402abc4b2a76b9719d911017c592
- Block IP 45.33.32.156 on firewall (likely C2)
Use Case 3: Playbook Recommendations
Incident Type: Suspected data exfiltration
LLM Suggestion:
Recommended Response Playbook:
1. Containment:
- Isolate affected system from network (preserve evidence)
- Block outbound connections to suspicious IPs
- Disable user account if credentials compromised
2. Investigation:
- Review proxy logs for large outbound transfers
- Check cloud access logs (OneDrive, Dropbox, personal email)
- Analyze file access logs to determine what data was accessed
3. Evidence Preservation:
- Create forensic disk image
- Collect memory dump
- Export relevant logs (30 days pre/post incident)
4. Eradication:
- Remove malware/backdoors from system
- Reset compromised credentials
- Patch vulnerabilities
5. Recovery:
- Rebuild system from known-good image
- Restore user access with enhanced monitoring
6. Post-Incident:
- Notify legal team (data breach assessment)
- Regulatory notification if PII exfiltrated (GDPR 72-hour rule)
- Lessons learned meeting
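The GDPR 72-hour item in the playbook above is easy to lose track of during an incident, so it helps to compute the deadline explicitly. This sketch assumes the clock starts when the organization becomes aware of the personal data breach; determining that moment is a legal judgment, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

GDPR_NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(aware_at: datetime) -> datetime:
    """Latest notification time, 72 hours after the breach became known."""
    if aware_at.tzinfo is None:
        raise ValueError("aware_at must be timezone-aware")
    return aware_at + GDPR_NOTIFICATION_WINDOW


def hours_remaining(aware_at: datetime, now: datetime) -> float:
    """Hours left before the deadline (negative once it has passed)."""
    return (notification_deadline(aware_at) - now).total_seconds() / 3600


# Usage: breach confirmed at 16:07 UTC on the day of the incident
aware = datetime(2026, 2, 15, 16, 7, tzinfo=timezone.utc)
print(notification_deadline(aware).isoformat())
print(hours_remaining(aware, datetime(2026, 2, 16, 16, 7, tzinfo=timezone.utc)))
```

Requiring timezone-aware timestamps avoids an off-by-hours error when responders and regulators sit in different time zones.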
Interactive Element
MicroSim 8: Incident Commander Simulation
Practice incident response decision-making under time pressure. Balance evidence preservation, containment, and business continuity.
Common Misconceptions
Misconception: Incident Response Starts After Detection
Reality: Effective IR starts with preparation (plans, tools, training). Organizations that skip preparation struggle during incidents.
Misconception: Full Forensic Analysis Is Always Required
Reality: The depth of forensics depends on incident severity, regulatory requirements, and business priorities. Low-severity incidents may not justify extensive forensics; in those cases, focus on restoring operations and capture only high-level evidence.
Misconception: Eradication Means Deleting Malware Files
Reality: True eradication requires removing all persistence mechanisms (scheduled tasks, registry keys, backdoor accounts, webshells). Attackers often leave multiple footholds.
Practice Tasks
Task 1: Incident Severity Classification
Scenario: Single workstation infected with malware. No lateral movement detected. No sensitive data accessed. Malware quarantined by EDR.
Question: What severity level and response SLA apply?
Answer
Severity: LOW
Reasoning:
- Single system (limited scope)
- No sensitive data impact
- Already contained (quarantined)
- No active spread
Response SLA: 24 hours for full investigation and eradication
Actions:
- Reimage workstation
- Scan network for IOCs (ensure no spread)
- Document IOCs for threat intel
- Low urgency, so this can be handled during normal business hours
Task 2: Containment Decision
Scenario: Ransomware detected on 5 systems and spreading. It's 4 PM Friday. Business operates 24/7.
Options:
a) Shut down entire network immediately
b) Isolate infected systems only
c) Monitor and investigate before taking action
d) Wait until Monday to assess
Question: What's the best containment strategy?
Answer
Best Option: b) Isolate infected systems only
Reasoning:
- Shutting down the entire network (option a) causes massive business disruption (24/7 operations)
- Monitoring without action (option c) allows spread to continue
- Waiting (option d) is unacceptable for active ransomware
Recommended Actions:
1. Immediately isolate the 5 infected systems (EDR isolation)
2. Search environment for IOCs to identify any other infected systems (isolate those too)
3. Block malware C2 domains/IPs at firewall
4. Disable compromised accounts
5. If spread continues despite isolation, escalate to broader network segmentation (e.g., isolate entire subnet)
Business Continuity Balance: Targeted isolation minimizes disruption while containing threat.
Task 3: Post-Incident Action Prioritization
Given Lessons Learned:
- Email gateway lacks sandboxing (root cause of phishing success)
- MFA not enforced on VPN (enabled lateral movement)
- Backup restoration took 6 hours (backups not tested regularly)
- No EDR on 10% of servers (budget limitation)
Question: Prioritize remediation actions (1 = highest priority).
Answer
Priority Order:
1. Deploy email sandboxing (Highest)
   - Directly addresses root cause
   - Phishing is #1 initial access vector
   - Quick win (technical implementation)
2. Enforce MFA on VPN
   - Prevents lateral movement even if credentials stolen
   - High security value, relatively easy implementation
   - Compensates for inevitable phishing clicks
3. Validate and test backups monthly
   - Critical for ransomware recovery
   - Low cost, high impact
   - Testing prevents surprises during actual incidents
4. Extend EDR to remaining 10% of servers
   - Important but may require budget approval
   - Start with most critical servers
   - Incremental deployment acceptable
Rationale: Address root cause and high-impact/low-effort items first.
Exam Prep & Certifications
Relevant Certifications
The topics in this chapter align with the following certifications:
- CompTIA Security+ — Domains: Security Operations, Security Architecture
- CompTIA CySA+ — Domains: Security Operations, Incident Response
- GIAC GCIH — Domains: Incident Handling, Automation
- CISSP — Domains: Security Operations, Security Architecture
Self-Assessment Quiz
Question 1: What is the first phase of the NIST incident response lifecycle?
Options:
a) Detection and Analysis
b) Containment
c) Preparation
d) Post-Incident Activity
Correct Answer: c) Preparation
Explanation: Preparation (policies, tools, training) must happen before incidents occur. Detection comes second in the lifecycle.
Question 2: What is the primary goal of the Containment phase?
Options:
a) Identify the root cause of the incident
b) Stop the attack from spreading and limit damage
c) Restore systems to normal operations
d) Document lessons learned
Correct Answer: b) Stop the attack from spreading and limit damage
Explanation: Containment limits scope and impact. Root cause analysis happens in post-incident. Recovery happens after eradication.
Question 3: Why is forensic imaging important before remediation?
Options:
a) It speeds up system recovery
b) It preserves evidence for legal action and root cause analysis
c) It automatically removes malware
d) It reduces incident severity
Correct Answer: b) It preserves evidence for legal action and root cause analysis
Explanation: Forensic images preserve the system state for investigation. Remediation actions alter or destroy evidence.
Question 4: What should be included in a 'Lessons Learned' review?
Options:
a) Only what went wrong
b) Only technical details
c) What went well, what went wrong, root cause, and remediation actions
d) Executive bonuses for good incident response
Correct Answer: c) What went well, what went wrong, root cause, and remediation actions
Explanation: Comprehensive lessons learned cover successes (to repeat), failures (to improve), root causes (to address), and concrete actions (to implement).
Question 5: What is the role of the Incident Commander?
Options:
a) Perform all technical forensic analysis
b) Write code to block attacks
c) Lead overall incident response, coordinate teams, and make key decisions
d) Handle all external communication exclusively
Correct Answer: c) Lead overall incident response, coordinate teams, and make key decisions
Explanation: The IC leads and coordinates. Technical leads handle forensics, communications leads handle messaging. IC makes strategic decisions.
Question 6: How can AI/LLMs assist in incident response?
Options:
a) Automated timeline reconstruction from logs
b) Forensic analysis assistance and anomaly identification
c) Playbook recommendations based on incident type
d) All of the above
Correct Answer: d) All of the above
Explanation: AI assists with timeline generation (parsing logs), forensic analysis (identifying suspicious artifacts), and playbook suggestions (matching incident to best practices).
Summary
In this chapter, you learned:
- NIST IR lifecycle: Preparation, Detection & Analysis, Containment/Eradication/Recovery, Post-Incident Activity
- IR team roles: Incident Commander, Technical Lead, Communications, SOC, IT, Legal
- Containment strategies: Network isolation, account disablement, egress blocking
- Evidence preservation: Forensic imaging, chain of custody, volatile data collection
- Post-incident reviews: Lessons learned meetings, root cause analysis, remediation tracking
- AI in IR: Automated timelines, forensic analysis assistance, playbook recommendations
Next Steps
- Next Chapter: Chapter 9: AI/ML in SOC - Deep dive into machine learning for detection and automation
- Practice: Conduct a tabletop exercise with your team using the Incident Commander MicroSim
- Review: Update your organization's IR plan based on lessons from this chapter
- Template: Adopt the lessons learned template for post-incident reviews