Chapter 8: Incident Response

Learning Objectives

By the end of this chapter, you will be able to:

Apply the NIST incident response lifecycle (Preparation; Detection & Analysis; Containment, Eradication & Recovery; Post-Incident Activity)
Execute proper containment and eradication procedures
Conduct effective post-incident reviews and root cause analysis
Coordinate cross-functional incident response teams
Leverage AI tools for incident timeline reconstruction and forensic analysis

Prerequisites

Chapter 5: Investigation and triage fundamentals
Chapter 7: SOAR automation concepts
Basic understanding of forensics principles

Key Concepts

Incident Response • Containment • Eradication • Forensic Preservation • Incident Commander • Post-Incident Review

Curiosity Hook: The Ransomware at 4 PM Friday

4:07 PM Friday. Multiple users report encrypted files. EDR alerts: Ransomware detected on 12 systems and spreading.

Incident Commander's Decision Points:
- Isolate systems or shut down the network entirely?
- Preserve evidence or prioritize business continuity?
- Notify customers? Law enforcement? Insurance?
- Pay the ransom or restore from backups?

The Clock: Every minute of delay = more encrypted systems, more lost data, higher recovery costs.

This chapter teaches: The structured IR process that enables calm, effective decision-making during chaos.

8.1 The NIST Incident Response Lifecycle

Overview

[1. Preparation] → [2. Detection & Analysis] → [3. Containment, Eradication, Recovery] → [4. Post-Incident Activity]
       ↑                                                                                            |
       └────────────────────────────────────────────────────────────────────────────────────────────┘
                                        (Continuous Improvement)

Source: NIST SP 800-61 Rev. 2 (Computer Security Incident Handling Guide)

Phase 1: Preparation

Goal: Establish capabilities, policies, and training before incidents occur.

Key Activities:
- IR Plan Documentation: Define roles, communication procedures, escalation paths
- Tool Readiness: SIEM, EDR, forensic tools, SOAR playbooks
- Team Training: Tabletop exercises, simulations
- Stakeholder Identification: Legal, PR, executives, law enforcement contacts
- Asset Inventory: Know what systems exist and their criticality
- Backup Validation: Ensure backups exist and are restorable (test regularly!)

Common Gap: Organizations skip preparation, then scramble during incidents.

Phase 2: Detection & Analysis

Goal: Identify and validate security incidents, assess scope and severity.

Activities:
- Alert Triage: Determine whether the alert is a true positive (covered in Chapter 5)
- Incident Declaration: Formally declare an incident if confirmed
- Initial Scoping: How many systems? What data? Is the attack still active?
- Severity Classification:

Severity	Criteria	Response SLA
Low	Single system, no sensitive data, contained	24 hours
Medium	Multiple systems, internal data, contained	4 hours
High	Critical systems, confidential data, or active spread	1 hour
Critical	Enterprise-wide, regulated data, or public-facing breach	Immediate (15 min)
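The severity table can be expressed as a small rule set. The following is a minimal sketch, not a definitive scoring model: the `IncidentScope` fields and the rule ordering are assumptions made for illustration, and only the SLA values come from the table above.

```python
from dataclasses import dataclass

# Response SLAs copied from the severity table above.
RESPONSE_SLA = {
    "Low": "24 hours",
    "Medium": "4 hours",
    "High": "1 hour",
    "Critical": "Immediate (15 min)",
}


@dataclass
class IncidentScope:
    """Hypothetical scoping fields an analyst fills in during initial triage."""
    systems_affected: int
    critical_systems: bool = False
    data_class: str = "none"  # "none", "internal", "confidential", "regulated"
    actively_spreading: bool = False
    enterprise_wide: bool = False
    public_facing_breach: bool = False


def classify_severity(scope: IncidentScope) -> str:
    """Return the highest severity whose criteria the incident meets."""
    if scope.enterprise_wide or scope.data_class == "regulated" or scope.public_facing_breach:
        return "Critical"
    if scope.critical_systems or scope.data_class == "confidential" or scope.actively_spreading:
        return "High"
    if scope.systems_affected > 1 or scope.data_class == "internal":
        return "Medium"
    return "Low"


workstation = IncidentScope(systems_affected=1)
severity = classify_severity(workstation)
print(severity, "->", RESPONSE_SLA[severity])  # Low -> 24 hours
```

Checking the most severe criteria first means an incident that matches several rows is always assigned the highest applicable level.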

Phase 3: Containment, Eradication, Recovery

Goal: Stop the attack, remove the threat, restore normal operations.

Containment

Short-term Containment:
- Isolate infected systems from the network (EDR isolation, VLAN segmentation)
- Block malicious IPs/domains at the firewall/proxy
- Disable compromised accounts

Long-term Containment:
- Rebuild isolated systems
- Patch vulnerabilities exploited by the attacker
- Deploy compensating controls (e.g., block SMB at the perimeter if the exploit used SMB)

Decision: Preserve Evidence vs. Business Continuity
- Preserve evidence: Keep systems powered on, create forensic images before remediation
- Business priority: Immediate restore from backup, limited forensics
- Balance: Isolate + image critical systems, expedite recovery for others

Eradication

Goal: Completely remove the threat from the environment.

Activities:
- Malware Removal: Delete malicious files, processes, registry keys
- Account Cleanup: Remove attacker-created accounts, reset compromised passwords
- Backdoor Removal: Identify and eliminate persistence mechanisms (scheduled tasks, WMI subscriptions, webshells)
- Vulnerability Patching: Address root cause (unpatched software, misconfiguration)

Validation:
- Re-scan systems with EDR/antivirus
- Check for IOCs across the entire environment (ensure the threat hasn't spread to undetected systems)
- Monitor for re-infection attempts
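The environment-wide IOC check can be sketched as a simple log sweep. This is an illustrative example only: the input shape (log lines keyed by hostname), the function name `find_ioc_hits`, and the sample indicators (taken from documentation-reserved ranges) are assumptions, and a real sweep would normally run as a SIEM or EDR query.

```python
import re

# Split log lines on whitespace and common delimiters so that an indicator
# must match a whole token (203.0.113.7 must not match 203.0.113.70).
_TOKEN_SPLIT = re.compile(r'[\s,;:"=/()\[\]]+')


def find_ioc_hits(logs_by_host: dict[str, list[str]], iocs: set[str]) -> dict[str, set[str]]:
    """Return, per host, the indicators that appear as whole tokens in its logs."""
    wanted = {ioc.lower() for ioc in iocs}
    hits: dict[str, set[str]] = {}
    for host, lines in logs_by_host.items():
        for line in lines:
            found = wanted & set(_TOKEN_SPLIT.split(line.lower()))
            if found:
                hits.setdefault(host, set()).update(found)
    return hits


logs = {
    "WKS-042": ["GET https://c2.example/beacon 200", "conn 10.0.0.5 -> 203.0.113.7:443"],
    "WKS-050": ["conn 10.0.0.9 -> 203.0.113.70:443"],
}
print(find_ioc_hits(logs, {"203.0.113.7", "c2.example"}))
```

Hosts that return hits but were not part of the original scope are candidates for isolation and a repeat of the eradication steps.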

Recovery

Goal: Restore systems to normal operations and verify they're clean.

Activities:
- System Restoration: Rebuild from known-good backups or fresh OS installs
- Service Restoration: Bring systems back online incrementally (monitor for anomalies)
- Validation Testing: Confirm systems function correctly and are threat-free
- Enhanced Monitoring: Increase logging and alerting on recovered systems for 30 days

Post-Recovery Checklist:
- [ ] All affected systems rebuilt or verified clean
- [ ] All compromised credentials reset
- [ ] All persistence mechanisms removed
- [ ] Patches applied to prevent re-exploitation
- [ ] Monitoring enhanced to detect re-infection
- [ ] Business stakeholders confirm operations restored

Phase 4: Post-Incident Activity

Goal: Learn from the incident to improve defenses and processes.

Activities:
- Incident Report: Document timeline, impact, IOCs, root cause
- Lessons Learned Meeting: Cross-functional review (IR team, IT, management)
- Process Improvements: Update runbooks, playbooks, detection rules
- Metrics Update: Track MTTR, cost, systems affected
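As a small worked example of the metrics update, MTTR can be computed from detection and resolution timestamps. This sketch assumes MTTR means "mean time to resolve" measured from detection to resolution (teams define it differently), and the two sample incidents are invented for illustration.

```python
from datetime import datetime, timedelta


def mean_time_to_resolve(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (resolved - detected) over a list of closed incidents."""
    if not incidents:
        raise ValueError("no closed incidents to average")
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


# Illustrative data: (detected, resolved) pairs
closed = [
    (datetime(2026, 2, 15, 16, 7), datetime(2026, 2, 16, 4, 7)),  # 12 hours
    (datetime(2026, 2, 20, 9, 0), datetime(2026, 2, 20, 13, 0)),  # 4 hours
]
print(mean_time_to_resolve(closed))  # 8:00:00
```

Whichever definition is chosen, it should stay fixed between reviews so that the trend remains comparable.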

8.2 Incident Response Team Roles

Incident Commander (IC)

Responsibilities:
- Overall incident leadership and decision-making
- Coordinate cross-functional teams
- Communicate with executives and stakeholders
- Declare incident severity and response level

Skills: Leadership, technical knowledge, calm under pressure

Technical Lead

Responsibilities:
- Oversee technical investigation and remediation
- Direct forensic analysis
- Validate containment and eradication effectiveness

Skills: Deep technical expertise (forensics, malware analysis, system administration)

Communications Lead

Responsibilities:
- Internal communication (status updates to executives, IT, affected departments)
- External communication (customers, regulators, media if needed)
- Coordinate with Legal and PR

Skills: Clear communication, regulatory knowledge, crisis management

SOC Analysts (Tier 1/2)

Responsibilities:
- Execute containment actions (isolate systems, block IPs)
- Gather evidence and IOCs
- Search for spread to other systems
- Document findings in real time

IT Operations

Responsibilities:
- System recovery and restoration
- Patch deployment
- Backup restoration
- Network segmentation adjustments

Legal/Compliance

Responsibilities:
- Assess regulatory notification requirements (GDPR, HIPAA, etc.)
- Advise on evidence preservation for potential legal action
- Coordinate with law enforcement if needed

8.3 Containment Strategies

Network Isolation

Techniques:
- EDR Isolation: Quarantine endpoint (block network access, allow remote management)
- VLAN Segmentation: Move infected systems to an isolated VLAN
- Firewall Rules: Block traffic to/from infected systems
- DNS Sinkholing: Redirect malware C2 domains to an internal sinkhole (monitor callback attempts)

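Example EDR Isolation Command (illustrative):

# CrowdStrike Falcon
# Note: "falcon-cli" is a placeholder, not an official CrowdStrike tool. In practice,
# containment is triggered from the Falcon console or through its host-action API.
falcon-cli contain <host-id>

# Result: System isolated from the network; analysts can still reach it through the EDR for investigation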

Account Disablement

Best Practices:
- Immediate Disable: Disable compromised user accounts in Active Directory
- MFA Reset: Force re-enrollment of MFA devices (attacker may have enrolled rogue devices)
- Credential Reset: Change passwords for all accounts with similar access levels (assume lateral movement)

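Example PowerShell (requires the ActiveDirectory module):

# Disable compromised account
Disable-ADAccount -Identity "jsmith"

# Reset password and force change on next login.
# Prompt for the temporary password instead of hardcoding it in the script;
# -Reset sets a new password without requiring the old one.
$newPassword = Read-Host -AsSecureString -Prompt "Temporary password for jsmith"
Set-ADAccountPassword -Identity "jsmith" -Reset -NewPassword $newPassword
Set-ADUser -Identity "jsmith" -ChangePasswordAtLogon $true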

Data Exfiltration Prevention

If data theft is suspected but not yet confirmed:
- Block outbound connections: Restrict egress to known-good destinations only
- Monitor data loss prevention (DLP) alerts: Watch for large uploads to cloud storage, email
- Engage forensics: Analyze netflow and proxy logs for exfiltration indicators
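One way to look for exfiltration indicators in proxy logs is to total outbound bytes per source and destination and flag unusually large sums. The sketch below assumes already-parsed records with hypothetical `src_host`, `dest`, and `bytes_out` fields, and the 500 MiB threshold is an arbitrary starting point rather than a recommended value.

```python
from collections import defaultdict


def flag_large_uploads(records: list[dict], threshold_bytes: int = 500 * 1024 * 1024) -> list[tuple[str, str, int]]:
    """Sum bytes sent per (source host, destination) and return pairs at or over the threshold."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    for rec in records:
        totals[(rec["src_host"], rec["dest"])] += rec["bytes_out"]
    flagged = [(src, dest, total) for (src, dest), total in totals.items() if total >= threshold_bytes]
    return sorted(flagged, key=lambda item: item[2], reverse=True)


# Illustrative records only
records = [
    {"src_host": "WKS-042", "dest": "files.example", "bytes_out": 400 * 1024 * 1024},
    {"src_host": "WKS-042", "dest": "files.example", "bytes_out": 300 * 1024 * 1024},
    {"src_host": "WKS-017", "dest": "mail.example", "bytes_out": 5 * 1024 * 1024},
]
for src, dest, total in flag_large_uploads(records):
    print(f"{src} -> {dest}: {total / 1024**2:.0f} MiB")  # WKS-042 -> files.example: 700 MiB
```

Aggregating before thresholding matters because exfiltration is often split into many transfers that look unremarkable individually.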

8.4 Forensic Evidence Preservation

Why Preserve Evidence?

Legal Action: Potential prosecution or civil lawsuit
Regulatory Compliance: Some industries require forensic investigation (PCI-DSS, HIPAA)
Threat Intelligence: Understand attacker TTPs for future defense
Lessons Learned: Root cause analysis

Chain of Custody

Definition: Documented trail of who handled evidence, when, and why.

Chain of Custody Log Example:

Date/Time	Evidence ID	Action	Custodian	Notes
2026-02-15 16:30	IMG-001	Created disk image	Alice Chen	Forensic image of WKS-042
2026-02-15 17:00	IMG-001	Transferred to storage	Alice Chen	Stored on forensic server FS-01
2026-02-16 09:00	IMG-001	Analysis begun	Bob Kumar	Mounted read-only for investigation
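A custody log like the one above can also be kept in a tamper-evident form. The following is a minimal sketch under stated assumptions: hash-chaining the entries is an added technique (not something the log format above requires), and `CustodyEntry`, `append_entry`, and `verify_log` are hypothetical names.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass(frozen=True)
class CustodyEntry:
    timestamp: str
    evidence_id: str
    action: str
    custodian: str
    notes: str
    prev_hash: str
    entry_hash: str


def _digest(fields: dict) -> str:
    """SHA-256 over the entry's fields in a stable key order."""
    return hashlib.sha256(json.dumps(fields, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list, timestamp: str, evidence_id: str, action: str, custodian: str, notes: str) -> None:
    """Append an entry whose hash covers its own fields plus the previous entry's hash."""
    fields = {
        "timestamp": timestamp, "evidence_id": evidence_id, "action": action,
        "custodian": custodian, "notes": notes,
        "prev_hash": log[-1].entry_hash if log else GENESIS,
    }
    log.append(CustodyEntry(**fields, entry_hash=_digest(fields)))


def verify_log(log: list) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        fields = asdict(entry)
        stored = fields.pop("entry_hash")
        if entry.prev_hash != prev_hash or _digest(fields) != stored:
            return False
        prev_hash = stored
    return True


log = []
append_entry(log, "2026-02-15 16:30", "IMG-001", "Created disk image", "Alice Chen", "Forensic image of WKS-042")
append_entry(log, "2026-02-15 17:00", "IMG-001", "Transferred to storage", "Alice Chen", "Stored on forensic server FS-01")
print(verify_log(log))  # True

log[0] = replace(log[0], custodian="Someone Else")  # simulate tampering
print(verify_log(log))  # False
```

The chain only detects modification after the fact; it does not replace access controls on the log or signed paper records where those are required.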

Forensic Imaging

Best Practice: Create a bit-for-bit copy of the disk before remediation.

Tools:
- FTK Imager: Free, Windows-focused
- dd (Linux): dd if=/dev/sda of=/mnt/evidence/disk.img bs=4M
- EnCase: Commercial forensic suite
- Autopsy: Free, open-source forensic platform

Steps:
1. Boot the system from a forensic USB (avoid altering the disk)
2. Create a cryptographic hash (SHA-256) of the source disk
3. Image the disk to external storage
4. Hash the image and verify that it matches the source hash
5. Document in the chain of custody log
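The hash-and-verify steps can be sketched with Python's standard `hashlib`. This is an illustrative helper, not a replacement for a forensic tool's built-in verification; the function names are hypothetical, and the source hash is assumed to have been recorded when the source disk was hashed.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so large images never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches_source(source_hash: str, image_path: str) -> bool:
    """Compare the image's SHA-256 with the hash recorded for the source disk."""
    return sha256_file(image_path) == source_hash.strip().lower()
```

For example, `image_matches_source(recorded_hash, "/mnt/evidence/disk.img")` returns `True` only if the image is byte-identical to what was hashed at the source; both hashes belong in the chain of custody log.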

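Volatile Data Collection (Before Powering Off)

Memory Dump:

# Linux: /dev/mem is restricted on most modern kernels (CONFIG_STRICT_DEVMEM),
# so use a dedicated acquisition tool such as AVML or LiME instead of dd
sudo ./avml /mnt/usb/memory.lime

# Windows (DumpIt, Magnet RAM Capture)
DumpIt.exe /output E:\memory.dmp

Running Processes, Network Connections:

# Capture state before shutdown; write to external media, not the suspect disk
ps aux > /mnt/usb/processes.txt
sudo netstat -antp > /mnt/usb/network_connections.txt
sudo lsof > /mnt/usb/open_files.txt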

Why: Memory contains encryption keys, malware code, active network connections that are lost on shutdown.

8.5 Post-Incident Review

Lessons Learned Meeting

Participants: IR team, IT, management, any affected departments

Agenda:
1. Timeline Review: What happened and when?
2. What Went Well: Effective actions, good decisions
3. What Went Wrong: Gaps, delays, missteps
4. Root Cause: How did attackers gain access? What enabled their success?
5. Remediation Actions: What will we change? (Technical, process, training)
6. Owner Assignment: Who owns each action item? Deadline?

Example Lessons Learned

Incident: Ransomware via phishing email

What Went Well:
- ✅ EDR detected and isolated 10 of 12 infected systems automatically
- ✅ Backups were available and restorable
- ✅ IR team followed playbook effectively

What Went Wrong:
- ❌ Email gateway did not detect phishing (no attachment sandboxing)
- ❌ 2 critical servers not covered by EDR (budget constraints)
- ❌ Initial triage delayed 30 minutes (analyst on break, no backup coverage)

Root Cause:
- Phishing email bypassed detection
- User opened the malicious attachment (training gap)
- No MFA on VPN (attacker used stolen credentials for lateral access)

Remediation Actions:

Action	Owner	Deadline	Status
Deploy email attachment sandboxing	IT Security	2026-03-01	In Progress
Extend EDR to all servers	IT Ops	2026-03-15	Approved
Implement phishing-resistant MFA	IAM Team	2026-04-01	Planning
Quarterly phishing simulations	HR/Security	Ongoing	New Process
24/7 SOC coverage (eliminate gaps)	SOC Manager	2026-03-01	Hiring

8.6 AI in Incident Response

Use Case 1: Automated Timeline Reconstruction

Challenge: Building incident timelines manually from thousands of log entries is time-consuming.

AI Solution:

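# AI-powered timeline builder (sketch)
# fetch_logs and llm_extract_timeline are caller-supplied callables standing in for
# your log platform's query API and your LLM integration; they are not library functions.
def build_timeline(incident_id, fetch_logs, llm_extract_timeline):
    # Pull raw events for the incident from each telemetry source
    logs = fetch_logs(incident_id, sources=['endpoint', 'network', 'auth'])

    # LLM extracts key events and sequences them
    timeline = llm_extract_timeline(logs, context={
        'incident_type': 'ransomware',
        'affected_systems': ['WKS-042', 'FILE-SRV-01'],
        'timeframe': '2026-02-15 14:00 to 18:00'
    })

    # Output: Chronological markdown timeline with ATT&CK mapping.
    # An analyst should verify it against the source logs before relying on it.
    return timeline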

Example Output:

## Incident Timeline: INC-2026-0245

14:05 - Initial Access (T1566.001): User jsmith opened phishing email attachment
14:06 - Execution (T1204.002): Malicious payload executed (invoice.exe)
14:07 - C2 Communication (T1071.001): HTTPS beacon to 45.33.32.156
14:15 - Credential Access (T1003.001): LSASS memory dump attempted (blocked by EDR)
14:18 - Lateral Movement (T1021.001): RDP connection to FILE-SRV-01
14:22 - Impact (T1486): Ransomware encryption initiated on FILE-SRV-01
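Independently of any LLM, the mechanical part of timeline building (merging events from several sources and ordering them) is deterministic and can be sketched directly. The event fields (`time`, `tactic`, `technique`, `summary`) and the function name `merge_events` are assumptions for illustration; the sample events reuse entries from the example output above.

```python
from datetime import datetime


def merge_events(events_by_source: dict[str, list[dict]]) -> list[str]:
    """Merge per-source events into one chronologically ordered list of timeline lines."""
    merged = []
    for source, events in events_by_source.items():
        for event in events:
            merged.append((datetime.fromisoformat(event["time"]), source, event))
    merged.sort(key=lambda item: item[0])  # order by timestamp only
    return [
        f"{ts:%H:%M} - {ev['tactic']} ({ev['technique']}): {ev['summary']} [{source}]"
        for ts, source, ev in merged
    ]


events = {
    "network": [
        {"time": "2026-02-15T14:07:00", "tactic": "C2 Communication",
         "technique": "T1071.001", "summary": "HTTPS beacon to external host"},
    ],
    "endpoint": [
        {"time": "2026-02-15T14:06:00", "tactic": "Execution",
         "technique": "T1204.002", "summary": "Malicious payload executed (invoice.exe)"},
        {"time": "2026-02-15T14:05:00", "tactic": "Initial Access",
         "technique": "T1566.001", "summary": "User jsmith opened phishing email attachment"},
    ],
}
for line in merge_events(events):
    print(line)
```

Keeping the source name on each line makes it easier to check an LLM-generated timeline against the underlying logs.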

Use Case 2: Forensic Analysis Assistance

Analyst Query:

"Analyze memory dump and identify suspicious processes."

LLM Copilot (RAG-enhanced with YARA rules, known malware patterns):

Memory Dump Analysis: memory.dmp

Suspicious Processes Identified:
1. Process: svchost.exe (PID 4892)
   - Parent: explorer.exe (unusual, svchost normally spawned by services.exe)
   - Network: Active connection to 45.33.32.156:443
   - Strings: "encrypt_files", "bitcoin_wallet"
   - Assessment: HIGH SUSPICION - Likely ransomware payload

2. Process: rundll32.exe (PID 5123)
   - Command Line: rundll32.exe C:\temp\malicious.dll,EntryPoint
   - DLL Hash: 5d41402abc4b2a76b9719d911017c592 (VirusTotal: 45/70 malicious)
   - Assessment: CONFIRMED MALICIOUS

Recommended Actions:
- Extract malicious.dll for malware analysis
- Search all systems for file hash 5d41402abc4b2a76b9719d911017c592
- Block IP 45.33.32.156 on firewall (likely C2)
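The parent-process check in the first finding is a rule that can be verified without an LLM. The sketch below assumes a process list already extracted from the memory dump as dictionaries with hypothetical `pid`, `ppid`, and `name` fields; only the svchost.exe rule comes from the analysis above, and the sample PIDs other than 4892 are invented.

```python
# Expected parent for each process name. Only the svchost.exe rule comes from
# the analysis above; extend this table from your own environment's baseline.
EXPECTED_PARENTS = {"svchost.exe": {"services.exe"}}


def find_parent_anomalies(processes: list[dict]) -> list[tuple[int, str, str]]:
    """Return (pid, name, actual_parent) for processes whose parent breaks the baseline."""
    by_pid = {proc["pid"]: proc for proc in processes}
    anomalies = []
    for proc in processes:
        expected = EXPECTED_PARENTS.get(proc["name"].lower())
        if expected is None:
            continue  # no baseline rule for this process name
        parent = by_pid.get(proc["ppid"])
        parent_name = parent["name"].lower() if parent else "<unknown>"
        if parent_name not in expected:
            anomalies.append((proc["pid"], proc["name"], parent_name))
    return anomalies


processes = [
    {"pid": 612, "ppid": 500, "name": "services.exe"},
    {"pid": 980, "ppid": 612, "name": "svchost.exe"},
    {"pid": 3120, "ppid": 3000, "name": "explorer.exe"},
    {"pid": 4892, "ppid": 3120, "name": "svchost.exe"},
]
print(find_parent_anomalies(processes))
# [(4892, 'svchost.exe', 'explorer.exe')]
```

A deterministic check like this is a useful cross-check on copilot output, since the same rule always gives the same answer for the same dump.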

Use Case 3: Playbook Recommendations

Incident Type: Suspected data exfiltration

LLM Suggestion:

Recommended Response Playbook:

1. Containment:
   - Isolate affected system from network (preserve evidence)
   - Block outbound connections to suspicious IPs
   - Disable user account if credentials compromised

2. Investigation:
   - Review proxy logs for large outbound transfers
   - Check cloud access logs (OneDrive, Dropbox, personal email)
   - Analyze file access logs to determine what data was accessed

3. Evidence Preservation:
   - Create forensic disk image
   - Collect memory dump
   - Export relevant logs (30 days pre/post incident)

4. Eradication:
   - Remove malware/backdoors from system
   - Reset compromised credentials
   - Patch vulnerabilities

5. Recovery:
   - Rebuild system from known-good image
   - Restore user access with enhanced monitoring

6. Post-Incident:
   - Notify legal team (data breach assessment)
   - Regulatory notification if PII exfiltrated (GDPR 72-hour rule)
   - Lessons learned meeting
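Because the GDPR 72-hour window mentioned in the playbook starts when the organization becomes aware of the breach, it can help to compute the deadline as soon as an incident is declared. A minimal sketch, assuming timezone-aware timestamps; the function names are hypothetical, and whether notification is required at all remains a question for Legal.

```python
from datetime import datetime, timedelta, timezone

GDPR_NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(aware_at: datetime) -> datetime:
    """Latest notification time, counted from when the organization became aware."""
    if aware_at.tzinfo is None:
        raise ValueError("aware_at must be timezone-aware")
    return aware_at + GDPR_NOTIFICATION_WINDOW


def hours_remaining(aware_at: datetime, now: datetime) -> float:
    """Hours left before the deadline (negative once it has passed)."""
    return (notification_deadline(aware_at) - now).total_seconds() / 3600


aware = datetime(2026, 2, 15, 16, 30, tzinfo=timezone.utc)
now = datetime(2026, 2, 16, 9, 0, tzinfo=timezone.utc)
print(notification_deadline(aware).isoformat())  # 2026-02-18T16:30:00+00:00
print(hours_remaining(aware, now))               # 55.5
```

Since the clock runs from awareness rather than from the end of the response, the legal assessment listed under step 6 generally needs to start in parallel with containment.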

Interactive Element

MicroSim 8: Incident Commander Simulation

Practice incident response decision-making under time pressure. Balance evidence preservation, containment, and business continuity.

Common Misconceptions

Misconception: Incident Response Starts After Detection

Reality: Effective IR starts with preparation (plans, tools, training). Organizations that skip preparation struggle during incidents.

Misconception: Full Forensic Analysis Is Always Required

Reality: The depth of forensic analysis depends on incident severity, regulatory requirements, and business priorities. Low-severity incidents may not justify extensive forensics; in those cases, focus on getting systems operational and capture only high-level evidence.

Misconception: Eradication Means Deleting Malware Files

Reality: True eradication requires removing all persistence mechanisms (scheduled tasks, registry keys, backdoor accounts, webshells). Attackers often leave multiple footholds.

Practice Tasks

Task 1: Incident Severity Classification

Scenario: Single workstation infected with malware. No lateral movement detected. No sensitive data accessed. Malware quarantined by EDR.

Question: What severity level and response SLA apply?

Answer

Severity: LOW

Reasoning:
- Single system (limited scope)
- No sensitive data impact
- Already contained (quarantined)
- No active spread

Response SLA: 24 hours for full investigation and eradication

Actions:
- Reimage workstation
- Scan network for IOCs (ensure no spread)
- Document IOCs for threat intel
- Low urgency, can be handled during normal business hours

Task 2: Containment Decision

Scenario: Ransomware detected on 5 systems and spreading. It's 4 PM Friday. Business operates 24/7.

Options:
a) Shut down entire network immediately
b) Isolate infected systems only
c) Monitor and investigate before taking action
d) Wait until Monday to assess

Question: What's the best containment strategy?

Answer

Best Option: b) Isolate infected systems only

Reasoning:
- Shutting down the entire network (option a) causes massive business disruption (24/7 operations)
- Monitoring without action (option c) allows spread to continue
- Waiting (option d) is unacceptable for active ransomware

Recommended Actions:
1. Immediately isolate the 5 infected systems (EDR isolation)
2. Search the environment for IOCs to identify any other infected systems (isolate those too)
3. Block malware C2 domains/IPs at the firewall
4. Disable compromised accounts
5. If spread continues despite isolation, escalate to broader network segmentation (e.g., isolate the entire subnet)

Business Continuity Balance: Targeted isolation minimizes disruption while containing threat.

Task 3: Post-Incident Action Prioritization

Given Lessons Learned:
- Email gateway lacks sandboxing (root cause of phishing success)
- MFA not enforced on VPN (enabled lateral movement)
- Backup restoration took 6 hours (backups not tested regularly)
- No EDR on 10% of servers (budget limitation)

Question: Prioritize remediation actions (1 = highest priority).

Answer

Priority Order:

1. Deploy email sandboxing (Highest)
   - Directly addresses root cause
   - Phishing is among the most common initial access vectors
   - Quick win (technical implementation)
2. Enforce MFA on VPN
   - Prevents lateral movement even if credentials are stolen
   - High security value, relatively easy implementation
   - Compensates for inevitable phishing clicks
3. Validate and test backups monthly
   - Critical for ransomware recovery
   - Low cost, high impact
   - Testing prevents surprises during actual incidents
4. Extend EDR to remaining 10% of servers
   - Important but may require budget approval
   - Start with most critical servers
   - Incremental deployment acceptable

Rationale: Address root cause and high-impact/low-effort items first.

Exam Prep & Certifications

Relevant Certifications

The topics in this chapter align with the following certifications:

CompTIA Security+ — Domains: Security Operations, Security Architecture
CompTIA CySA+ — Domains: Security Operations, Incident Response
GIAC GCIH — Domains: Incident Handling, Automation
CISSP — Domains: Security Operations, Security Architecture


Self-Assessment Quiz

Question 1: What is the first phase of the NIST incident response lifecycle?

Options:

a) Detection and Analysis
b) Containment
c) Preparation
d) Post-Incident Activity

Correct Answer: c) Preparation

Explanation: Preparation (policies, tools, training) must happen before incidents occur. Detection comes second in the lifecycle.

Question 2: What is the primary goal of the Containment phase?

Options:

a) Identify the root cause of the incident
b) Stop the attack from spreading and limit damage
c) Restore systems to normal operations
d) Document lessons learned

Correct Answer: b) Stop the attack from spreading and limit damage

Explanation: Containment limits scope and impact. Root cause analysis happens in post-incident. Recovery happens after eradication.

Question 3: Why is forensic imaging important before remediation?

Options:

a) It speeds up system recovery
b) It preserves evidence for legal action and root cause analysis
c) It automatically removes malware
d) It reduces incident severity

Correct Answer: b) It preserves evidence for legal action and root cause analysis

Explanation: Forensic images preserve the system state for investigation. Remediation actions alter or destroy evidence.

Question 4: What should be included in a 'Lessons Learned' review?

Options:

a) Only what went wrong
b) Only technical details
c) What went well, what went wrong, root cause, and remediation actions
d) Executive bonuses for good incident response

Correct Answer: c) What went well, what went wrong, root cause, and remediation actions

Explanation: Comprehensive lessons learned cover successes (to repeat), failures (to improve), root causes (to address), and concrete actions (to implement).

Question 5: What is the role of the Incident Commander?

Options:

a) Perform all technical forensic analysis
b) Write code to block attacks
c) Lead overall incident response, coordinate teams, and make key decisions
d) Handle all external communication exclusively

Correct Answer: c) Lead overall incident response, coordinate teams, and make key decisions

Explanation: The IC leads and coordinates. Technical leads handle forensics, communications leads handle messaging. IC makes strategic decisions.

Question 6: How can AI/LLMs assist in incident response?

Options:

a) Automated timeline reconstruction from logs
b) Forensic analysis assistance and anomaly identification
c) Playbook recommendations based on incident type
d) All of the above

Correct Answer: d) All of the above

Explanation: AI assists with timeline generation (parsing logs), forensic analysis (identifying suspicious artifacts), and playbook suggestions (matching incident to best practices).

Summary

In this chapter, you learned:

NIST IR lifecycle: Preparation, Detection & Analysis, Containment/Eradication/Recovery, Post-Incident Activity
IR team roles: Incident Commander, Technical Lead, Communications, SOC, IT, Legal
Containment strategies: Network isolation, account disablement, egress blocking
Evidence preservation: Forensic imaging, chain of custody, volatile data collection
Post-incident reviews: Lessons learned meetings, root cause analysis, remediation tracking
AI in IR: Automated timelines, forensic analysis assistance, playbook recommendations

Next Steps

Next Chapter: Chapter 9: AI/ML in SOC - Deep dive into machine learning for detection and automation
Practice: Conduct a tabletop exercise with your team using the Incident Commander MicroSim
Review: Update your organization's IR plan based on lessons from this chapter
Template: Adopt the lessons learned template for post-incident reviews
