Chapter 8: Incident Response

Learning Objectives

By the end of this chapter, you will be able to:

  • Apply the NIST incident response lifecycle (Preparation; Detection & Analysis; Containment, Eradication & Recovery; Post-Incident Activity)
  • Execute proper containment and eradication procedures
  • Conduct effective post-incident reviews and root cause analysis
  • Coordinate cross-functional incident response teams
  • Leverage AI tools for incident timeline reconstruction and forensic analysis

Prerequisites

  • Chapter 5: Investigation and triage fundamentals
  • Chapter 7: SOAR automation concepts
  • Basic understanding of forensics principles

Key Concepts

Incident Response, Containment, Eradication, Forensic Preservation, Incident Commander, Post-Incident Review


Curiosity Hook: The Ransomware at 4 PM Friday

4:07 PM Friday. Multiple users report encrypted files. EDR alerts: Ransomware detected on 12 systems and spreading.

Incident Commander's Decision Points:

  • Isolate systems or shut down the network entirely?
  • Preserve evidence or prioritize business continuity?
  • Notify customers? Law enforcement? Insurance?
  • Pay the ransom or restore from backups?

The Clock: Every minute of delay = more encrypted systems, more lost data, higher recovery costs.

This chapter teaches: The structured IR process that enables calm, effective decision-making during chaos.


8.1 The NIST Incident Response Lifecycle

Overview

[1. Preparation] → [2. Detection & Analysis] → [3. Containment, Eradication, Recovery] → [4. Post-Incident Activity]
       ↑                                                                                            |
       └────────────────────────────────────────────────────────────────────────────────────────────┘
                                        (Continuous Improvement)

Source: NIST SP 800-61 Rev. 2 (Computer Security Incident Handling Guide)


Phase 1: Preparation

Goal: Establish capabilities, policies, and training before incidents occur.

Key Activities:

  • IR Plan Documentation: Define roles, communication procedures, escalation paths
  • Tool Readiness: SIEM, EDR, forensic tools, SOAR playbooks
  • Team Training: Tabletop exercises, simulations
  • Stakeholder Identification: Legal, PR, executives, law enforcement contacts
  • Asset Inventory: Know what systems exist and their criticality
  • Backup Validation: Ensure backups exist and are restorable (test regularly!)

Common Gap: Organizations skip preparation, then scramble during incidents.


Phase 2: Detection & Analysis

Goal: Identify and validate security incidents, assess scope and severity.

Activities:

  • Alert Triage: Determine whether the alert is a true positive (covered in Chapter 5)
  • Incident Declaration: Formally declare an incident if confirmed
  • Initial Scoping: How many systems? What data? Still active?
  • Severity Classification:

| Severity | Criteria | Response SLA |
|----------|----------|--------------|
| Low | Single system, no sensitive data, contained | 24 hours |
| Medium | Multiple systems, internal data, contained | 4 hours |
| High | Critical systems, confidential data, or active spread | 1 hour |
| Critical | Enterprise-wide, regulated data, or public-facing breach | Immediate (15 min) |
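
The severity table can be read as a worst-case-first decision rule. The sketch below is one hedged interpretation: the `IncidentScope` fields and the exact matching logic (for example, treating "multiple systems OR internal data" as Medium) are assumptions made for illustration, not part of any standard.

```python
from dataclasses import dataclass

# Response SLAs from the severity table, expressed in minutes.
SLA_MINUTES = {"Low": 24 * 60, "Medium": 4 * 60, "High": 60, "Critical": 15}


@dataclass
class IncidentScope:
    """Hypothetical scoping record filled in during initial triage."""
    systems_affected: int
    data_class: str  # "none", "internal", "confidential", or "regulated"
    critical_systems: bool = False
    active_spread: bool = False
    enterprise_wide: bool = False
    public_facing_breach: bool = False


def classify_severity(scope: IncidentScope) -> str:
    """Return the highest severity whose criteria match, checking the worst case first."""
    if scope.enterprise_wide or scope.data_class == "regulated" or scope.public_facing_breach:
        return "Critical"
    if scope.critical_systems or scope.data_class == "confidential" or scope.active_spread:
        return "High"
    if scope.systems_affected > 1 or scope.data_class == "internal":
        return "Medium"
    return "Low"


# The Friday ransomware scenario: 12 systems and still spreading.
incident = IncidentScope(systems_affected=12, data_class="internal", active_spread=True)
severity = classify_severity(incident)
print(severity, SLA_MINUTES[severity])  # High 60
```

Checking the most severe tier first means a single aggravating factor (such as active spread) is enough to escalate, which matches the "or" wording in the High and Critical rows.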

Phase 3: Containment, Eradication, Recovery

Goal: Stop the attack, remove the threat, restore normal operations.

Containment

Short-term Containment:

  • Isolate infected systems from network (EDR isolation, VLAN segmentation)
  • Block malicious IPs/domains at firewall/proxy
  • Disable compromised accounts

Long-term Containment:

  • Rebuild isolated systems
  • Patch vulnerabilities exploited by attacker
  • Deploy compensating controls (e.g., block SMB externally if exploit used SMB)

Decision: Preserve Evidence vs. Business Continuity

  • Preserve evidence: Keep systems powered on, create forensic images before remediation
  • Business priority: Immediate restore from backup, limited forensics
  • Balance: Isolate + image critical systems, expedite recovery for others


Eradication

Goal: Completely remove the threat from the environment.

Activities:

  • Malware Removal: Delete malicious files, processes, registry keys
  • Account Cleanup: Remove attacker-created accounts, reset compromised passwords
  • Backdoor Removal: Identify and eliminate persistence mechanisms (scheduled tasks, WMI subscriptions, webshells)
  • Vulnerability Patching: Address root cause (unpatched software, misconfiguration)

Validation:

  • Re-scan systems with EDR/antivirus
  • Check for IOCs across entire environment (ensure threat hasn't spread to undetected systems)
  • Monitor for re-infection attempts
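
As a rough illustration of the IOC check, the sketch below walks a directory tree and reports files whose SHA-256 matches a known-bad set. The function names are hypothetical, and in practice this sweep would normally be run through EDR or a fleet-wide query tool rather than a standalone script.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sweep_for_iocs(root: Path, known_bad: set[str]) -> list[Path]:
    """Return every regular file under root whose SHA-256 is in known_bad."""
    hits = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            if sha256_of(path) in known_bad:
                hits.append(path)
        except OSError:
            continue  # unreadable file: skip here, but log it in a real sweep
    return hits
```

A call such as `sweep_for_iocs(Path("/srv/share"), {"<sha256 from the incident report>"})` would return the matching paths; the directory and hash shown are placeholders.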


Recovery

Goal: Restore systems to normal operations and verify they're clean.

Activities:

  • System Restoration: Rebuild from known-good backups or fresh OS installs
  • Service Restoration: Bring systems back online incrementally (monitor for anomalies)
  • Validation Testing: Confirm systems function correctly and are threat-free
  • Enhanced Monitoring: Increase logging and alerting on recovered systems for 30 days

Post-Recovery Checklist:

  - [ ] All affected systems rebuilt or verified clean
  - [ ] All compromised credentials reset
  - [ ] All persistence mechanisms removed
  - [ ] Patches applied to prevent re-exploitation
  - [ ] Monitoring enhanced to detect re-infection
  - [ ] Business stakeholders confirm operations restored


Phase 4: Post-Incident Activity

Goal: Learn from the incident to improve defenses and processes.

Activities:

  • Incident Report: Document timeline, impact, IOCs, root cause
  • Lessons Learned Meeting: Cross-functional review (IR team, IT, management)
  • Process Improvements: Update runbooks, playbooks, detection rules
  • Metrics Update: Track MTTR, cost, systems affected
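
For the metrics update, MTTR can be computed directly from incident records. The sketch below assumes MTTR means "mean time to resolve" measured from detection to resolution (teams define it differently), and the two records are invented sample data.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records; timestamps are illustrative only.
incidents = [
    {"id": "INC-A", "detected": "2026-02-15 16:07", "resolved": "2026-02-16 04:07"},
    {"id": "INC-B", "detected": "2026-02-20 09:00", "resolved": "2026-02-20 15:00"},
]


def hours_to_resolve(record: dict) -> float:
    """Elapsed hours between detection and resolution for one incident."""
    fmt = "%Y-%m-%d %H:%M"
    detected = datetime.strptime(record["detected"], fmt)
    resolved = datetime.strptime(record["resolved"], fmt)
    return (resolved - detected).total_seconds() / 3600


def mean_time_to_resolve(records: list[dict]) -> float:
    """Average resolution time in hours across all records."""
    return mean(hours_to_resolve(r) for r in records)


print(mean_time_to_resolve(incidents))  # 9.0
```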


8.2 Incident Response Team Roles

Incident Commander (IC)

Responsibilities:

  • Overall incident leadership and decision-making
  • Coordinate cross-functional teams
  • Communicate with executives and stakeholders
  • Declare incident severity and response level

Skills: Leadership, technical knowledge, calm under pressure


Technical Lead

Responsibilities:

  • Oversee technical investigation and remediation
  • Direct forensic analysis
  • Validate containment and eradication effectiveness

Skills: Deep technical expertise (forensics, malware analysis, system administration)


Communications Lead

Responsibilities:

  • Internal communication (status updates to executives, IT, affected departments)
  • External communication (customers, regulators, media if needed)
  • Coordinate with Legal and PR

Skills: Clear communication, regulatory knowledge, crisis management


SOC Analysts (Tier 1/2)

Responsibilities:

  • Execute containment actions (isolate systems, block IPs)
  • Gather evidence and IOCs
  • Search for spread to other systems
  • Document findings in real time


IT Operations

Responsibilities:

  • System recovery and restoration
  • Patch deployment
  • Backup restoration
  • Network segmentation adjustments


Legal/Compliance

Responsibilities:

  • Assess regulatory notification requirements (GDPR, HIPAA, etc.)
  • Advise on evidence preservation for potential legal action
  • Coordinate with law enforcement if needed


8.3 Containment Strategies

Network Isolation

Techniques:

  • EDR Isolation: Quarantine endpoint (block network access, allow remote management)
  • VLAN Segmentation: Move infected systems to isolated VLAN
  • Firewall Rules: Block traffic to/from infected systems
  • DNS Sinkholing: Redirect malware C2 domains to internal sinkhole (monitor callback attempts)

Example EDR Isolation Command:

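# CrowdStrike Falcon (illustrative pseudo-command, not an official CLI).
# In practice, network containment is triggered from the Falcon console
# or through the Hosts API device action "contain".
falcon-cli contain <host-id>

# Result: System is isolated from the network; the analyst can still reach it through the EDR sensor for investigation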


Account Disablement

Best Practices:

  • Immediate Disable: Compromised user accounts in Active Directory
  • MFA Reset: Force re-enrollment of MFA devices (attacker may have enrolled rogue devices)
  • Credential Reset: Change passwords for all accounts with similar access levels (assume lateral movement)

Example PowerShell:

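# Requires the ActiveDirectory module (RSAT)
Import-Module ActiveDirectory

# Disable compromised account
Disable-ADAccount -Identity "jsmith"

# Reset password and force change on next login.
# -Reset sets the password without asking for the old one; the temporary
# password is prompted for rather than hard-coded in the script.
$tempPassword = Read-Host -Prompt "Temporary password for jsmith" -AsSecureString
Set-ADAccountPassword -Identity "jsmith" -Reset -NewPassword $tempPassword
Set-ADUser -Identity "jsmith" -ChangePasswordAtLogon $true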


Data Exfiltration Prevention

If data theft is suspected but not yet confirmed:

  • Block outbound connections: Restrict egress to known-good destinations only
  • Monitor data loss prevention (DLP) alerts: Watch for large uploads to cloud storage, email
  • Engage forensics: Analyze netflow and proxy logs for exfiltration indicators
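
One simple exfiltration indicator is unusually large outbound volume from a single host to a single destination. The sketch below totals bytes per (source, destination) pair and flags pairs above a threshold; the record layout, hostnames, and the 1 GB threshold are all assumptions for illustration, and real proxy or netflow exports will need their own parsing.

```python
from collections import defaultdict

# Hypothetical, simplified proxy records: (source_host, destination, bytes_out).
proxy_records = [
    ("WKS-042", "files.example.net", 4_800_000_000),
    ("WKS-042", "files.example.net", 1_500_000_000),
    ("WKS-017", "updates.example.com", 35_000_000),
]


def flag_large_uploads(records, threshold_bytes):
    """Sum outbound bytes per (source, destination) and keep pairs at or above the threshold."""
    totals = defaultdict(int)
    for source, destination, bytes_out in records:
        totals[(source, destination)] += bytes_out
    return {pair: total for pair, total in totals.items() if total >= threshold_bytes}


suspects = flag_large_uploads(proxy_records, threshold_bytes=1_000_000_000)
print(suspects)  # {('WKS-042', 'files.example.net'): 6300000000}
```

A fixed threshold is crude; comparing each host against its own historical baseline would reduce false positives, at the cost of needing that history.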


8.4 Forensic Evidence Preservation

Why Preserve Evidence?

  1. Legal Action: Potential prosecution or civil lawsuit
  2. Regulatory Compliance: Some industries require forensic investigation (PCI-DSS, HIPAA)
  3. Threat Intelligence: Understand attacker TTPs for future defense
  4. Lessons Learned: Root cause analysis

Chain of Custody

Definition: Documented trail of who handled evidence, when, and why.

Chain of Custody Log Example:

| Date/Time | Evidence ID | Action | Custodian | Notes |
|-----------|-------------|--------|-----------|-------|
| 2026-02-15 16:30 | IMG-001 | Created disk image | Alice Chen | Forensic image of WKS-042 |
| 2026-02-15 17:00 | IMG-001 | Transferred to storage | Alice Chen | Stored on forensic server FS-01 |
| 2026-02-16 09:00 | IMG-001 | Analysis begun | Bob Kumar | Mounted read-only for investigation |
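
A chain of custody log is essentially an append-only record. The sketch below models it that way using the entries from the example log; the class and field names are hypothetical, and a real log would also need tamper-evident storage and signatures, which are out of scope here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CustodyEntry:
    """One immutable row of the custody log."""
    timestamp: datetime
    evidence_id: str
    action: str
    custodian: str
    notes: str


class CustodyLog:
    """Append-only log: entries can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._entries: list[CustodyEntry] = []

    def record(self, entry: CustodyEntry) -> None:
        # Reject backdated entries so the trail stays chronological.
        if self._entries and entry.timestamp < self._entries[-1].timestamp:
            raise ValueError("custody entries must be recorded in chronological order")
        self._entries.append(entry)

    def history(self, evidence_id: str) -> tuple[CustodyEntry, ...]:
        """Return the full handling history for one piece of evidence."""
        return tuple(e for e in self._entries if e.evidence_id == evidence_id)


log = CustodyLog()
log.record(CustodyEntry(datetime(2026, 2, 15, 16, 30), "IMG-001",
                        "Created disk image", "Alice Chen", "Forensic image of WKS-042"))
log.record(CustodyEntry(datetime(2026, 2, 15, 17, 0), "IMG-001",
                        "Transferred to storage", "Alice Chen", "Stored on forensic server FS-01"))
log.record(CustodyEntry(datetime(2026, 2, 16, 9, 0), "IMG-001",
                        "Analysis begun", "Bob Kumar", "Mounted read-only for investigation"))

print([e.custodian for e in log.history("IMG-001")])
# ['Alice Chen', 'Alice Chen', 'Bob Kumar']
```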

Forensic Imaging

Best Practice: Create bit-for-bit copy of disk before remediation.

Tools:

  • FTK Imager: Free, Windows-focused
  • dd (Linux): dd if=/dev/sda of=/mnt/evidence/disk.img bs=4M
  • EnCase: Commercial forensic suite
  • Autopsy: Free, open-source forensic analysis platform

Steps:

  1. Boot system from forensic USB (avoid altering disk)
  2. Create cryptographic hash (SHA256) of source disk
  3. Image disk to external storage
  4. Create hash of image and verify match to source
  5. Document in chain of custody log
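
The hash-and-verify steps can be sketched with the standard library. This is a minimal illustration, not a replacement for a forensic imaging tool: the function names are hypothetical, and reading a raw device such as /dev/sda requires root and should be done from the forensic boot environment.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 4 * 1024 * 1024) -> str:
    """Compute SHA-256 of a file or block device, reading in 4 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(source_path: str, image_path: str) -> dict:
    """Hash source and image, and report both digests plus whether they match."""
    source_hash = sha256_file(source_path)
    image_hash = sha256_file(image_path)
    return {
        "source_sha256": source_hash,
        "image_sha256": image_hash,
        "match": source_hash == image_hash,
    }
```

For example, `verify_image("/dev/sda", "/mnt/evidence/disk.img")` would return both digests, which can be copied into the chain of custody log alongside the match result.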


Volatile Data Collection (Before Powering Off)

Memory Dump:

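# Linux (AVML). Note: /dev/mem is restricted on most modern kernels,
# so dd against it does not produce a usable full-memory dump.
sudo ./avml /mnt/usb/memory.lime

# Windows (DumpIt, Magnet RAM Capture)
DumpIt.exe /output E:\memory.dmp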

Running Processes, Network Connections:

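# Capture state before shutdown (run from a root shell so all processes are visible).
# Write output to external media, not to the suspect disk.
ps aux > /mnt/usb/processes.txt
netstat -antp > /mnt/usb/network_connections.txt
lsof > /mnt/usb/open_files.txt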

Why: Memory can contain encryption keys, unpacked malware code, and active network connections, all of which are lost on shutdown.


8.5 Post-Incident Review

Lessons Learned Meeting

Participants: IR team, IT, management, any affected departments

Agenda:

  1. Timeline Review: What happened and when?
  2. What Went Well: Effective actions, good decisions
  3. What Went Wrong: Gaps, delays, missteps
  4. Root Cause: How did attackers gain access? What enabled their success?
  5. Remediation Actions: What will we change? (Technical, process, training)
  6. Owner Assignment: Who owns each action item? Deadline?


Example Lessons Learned

Incident: Ransomware via phishing email

What Went Well:

  • ✅ EDR detected and isolated 10 of 12 infected systems automatically
  • ✅ Backups were available and restorable
  • ✅ IR team followed playbook effectively

What Went Wrong:

  • ❌ Email gateway did not detect phishing (no attachment sandboxing)
  • ❌ 2 critical servers not covered by EDR (budget constraints)
  • ❌ Initial triage delayed 30 minutes (analyst on break, no backup coverage)

Root Cause:

  • Phishing email bypassed detection
  • User opened the malicious attachment (training gap)
  • No MFA on VPN (attacker used stolen credentials for lateral access)

Remediation Actions:

| Action | Owner | Deadline | Status |
|--------|-------|----------|--------|
| Deploy email attachment sandboxing | IT Security | 2026-03-01 | In Progress |
| Extend EDR to all servers | IT Ops | 2026-03-15 | Approved |
| Implement phishing-resistant MFA | IAM Team | 2026-04-01 | Planning |
| Quarterly phishing simulations | HR/Security | Ongoing | New Process |
| 24/7 SOC coverage (eliminate gaps) | SOC Manager | 2026-03-01 | Hiring |

8.6 AI in Incident Response

Use Case 1: Automated Timeline Reconstruction

Challenge: Building incident timelines manually from thousands of log entries is time-consuming.

AI Solution:

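# AI-powered timeline builder (sketch).
# fetch_logs and llm_extract_timeline are placeholders for your own log-retrieval
# and LLM-client functions; they are passed in so the function has no hidden dependencies.
def build_timeline(incident_id, fetch_logs, llm_extract_timeline):
    logs = fetch_logs(incident_id, sources=['endpoint', 'network', 'auth'])

    # LLM extracts key events and sequences them
    timeline = llm_extract_timeline(logs, context={
        'incident_type': 'ransomware',
        'affected_systems': ['WKS-042', 'FILE-SRV-01'],
        'timeframe': '2026-02-15 14:00 to 18:00'
    })

    # Output: Chronological markdown timeline with ATT&CK mapping.
    # Treat it as a draft and verify each event against the source logs.
    return timeline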

Example Output:

## Incident Timeline: INC-2026-0245

14:05 - Initial Access (T1566.001): User jsmith opened phishing email attachment
14:06 - Execution (T1204.002): Malicious payload executed (invoice.exe)
14:07 - C2 Communication (T1071.001): HTTPS beacon to 45.33.32.156
14:15 - Credential Access (T1003.001): LSASS memory dump attempted (blocked by EDR)
14:18 - Lateral Movement (T1021.001): RDP connection to FILE-SRV-01
14:22 - Impact (T1486): Ransomware encryption initiated on FILE-SRV-01
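
Before (or alongside) any LLM step, the chronological ordering itself can be done deterministically. The sketch below merges already-parsed, time-sorted events from several sources into one timeline; the event tuples are hypothetical stand-ins loosely based on the example output, and ATT&CK mapping is deliberately left out.

```python
import heapq
from datetime import datetime

# Hypothetical pre-parsed events per log source, each list already in time order.
endpoint_events = [
    ("2026-02-15 14:06", "endpoint", "invoice.exe executed on WKS-042"),
    ("2026-02-15 14:22", "endpoint", "Mass file encryption on FILE-SRV-01"),
]
network_events = [("2026-02-15 14:07", "network", "HTTPS beacon from WKS-042")]
auth_events = [("2026-02-15 14:18", "auth", "RDP logon to FILE-SRV-01")]


def merge_timeline(*sources):
    """Merge time-sorted event lists into one chronological list of text lines."""
    def parsed(events):
        for stamp, source, message in events:
            yield datetime.strptime(stamp, "%Y-%m-%d %H:%M"), source, message

    merged = heapq.merge(*(parsed(events) for events in sources))
    return [f"{when:%H:%M} [{source}] {message}" for when, source, message in merged]


for line in merge_timeline(endpoint_events, network_events, auth_events):
    print(line)
# 14:06 [endpoint] invoice.exe executed on WKS-042
# 14:07 [network] HTTPS beacon from WKS-042
# 14:18 [auth] RDP logon to FILE-SRV-01
# 14:22 [endpoint] Mass file encryption on FILE-SRV-01
```

Keeping the ordering deterministic gives analysts a ground-truth sequence to check any LLM-generated narrative against.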


Use Case 2: Forensic Analysis Assistance

Analyst Query:

"Analyze memory dump and identify suspicious processes."

LLM Copilot (RAG-enhanced with YARA rules, known malware patterns):

Memory Dump Analysis: memory.dmp

Suspicious Processes Identified:
1. Process: svchost.exe (PID 4892)
   - Parent: explorer.exe (unusual, svchost normally spawned by services.exe)
   - Network: Active connection to 45.33.32.156:443
   - Strings: "encrypt_files", "bitcoin_wallet"
   - Assessment: HIGH SUSPICION - Likely ransomware payload

2. Process: rundll32.exe (PID 5123)
   - Command Line: rundll32.exe C:\temp\malicious.dll,EntryPoint
   - DLL Hash: 5d41402abc4b2a76b9719d911017c592 (VirusTotal: 45/70 malicious)
   - Assessment: CONFIRMED MALICIOUS

Recommended Actions:
- Extract malicious.dll for malware analysis
- Search all systems for file hash 5d41402abc4b2a76b9719d911017c592
- Block IP 45.33.32.156 on firewall (likely C2)
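
The parent-process observation in the first finding is a check that can be expressed as a simple rule. The sketch below encodes only the one expectation stated above (svchost.exe is normally spawned by services.exe); the process list is hypothetical sample data, including PID 1204 and the parent shown for rundll32.exe, and a usable baseline would cover many more processes.

```python
# Expected parents for Windows system processes. Only the svchost.exe rule
# comes from the example above; a production baseline would be far larger.
EXPECTED_PARENTS = {"svchost.exe": {"services.exe"}}

# Hypothetical process listing extracted from a memory dump.
processes = [
    {"pid": 4892, "name": "svchost.exe", "parent": "explorer.exe"},
    {"pid": 1204, "name": "svchost.exe", "parent": "services.exe"},
    {"pid": 5123, "name": "rundll32.exe", "parent": "explorer.exe"},
]


def unusual_parents(process_list):
    """Return PIDs whose parent does not match the expected parent set."""
    findings = []
    for proc in process_list:
        expected = EXPECTED_PARENTS.get(proc["name"].lower())
        if expected and proc["parent"].lower() not in expected:
            findings.append(proc["pid"])
    return findings


print(unusual_parents(processes))  # [4892]
```

Processes with no baseline entry (rundll32.exe here) are not flagged by this rule, so it complements rather than replaces hash and command-line checks.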


Use Case 3: Playbook Recommendations

Incident Type: Suspected data exfiltration

LLM Suggestion:

Recommended Response Playbook:

1. Containment:
   - Isolate affected system from network (preserve evidence)
   - Block outbound connections to suspicious IPs
   - Disable user account if credentials compromised

2. Investigation:
   - Review proxy logs for large outbound transfers
   - Check cloud access logs (OneDrive, Dropbox, personal email)
   - Analyze file access logs to determine what data was accessed

3. Evidence Preservation:
   - Create forensic disk image
   - Collect memory dump
   - Export relevant logs (30 days pre/post incident)

4. Eradication:
   - Remove malware/backdoors from system
   - Reset compromised credentials
   - Patch vulnerabilities

5. Recovery:
   - Rebuild system from known-good image
   - Restore user access with enhanced monitoring

6. Post-Incident:
   - Notify legal team (data breach assessment)
   - Regulatory notification if PII exfiltrated (GDPR 72-hour rule)
   - Lessons learned meeting
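
The 72-hour GDPR window runs from the moment the organization becomes aware of the personal data breach, so it helps to compute the deadline explicitly when the incident is declared. A minimal sketch, with a hypothetical function name and an illustrative awareness time:

```python
from datetime import datetime, timedelta, timezone


def gdpr_notification_deadline(aware_at: datetime) -> datetime:
    """Latest time to notify the supervisory authority: 72 hours after awareness."""
    return aware_at + timedelta(hours=72)


# Illustrative awareness time, kept in UTC to avoid time zone ambiguity.
aware = datetime(2026, 2, 15, 16, 7, tzinfo=timezone.utc)
print(gdpr_notification_deadline(aware).isoformat())  # 2026-02-18T16:07:00+00:00
```

Whether notification is actually required is a legal determination; the calculation only tracks the clock.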


Interactive Element

MicroSim 8: Incident Commander Simulation

Practice incident response decision-making under time pressure. Balance evidence preservation, containment, and business continuity.


Common Misconceptions

Misconception: Incident Response Starts After Detection

Reality: Effective IR starts with preparation (plans, tools, training). Organizations that skip preparation struggle during incidents.

Misconception: Full Forensic Analysis Is Always Required

Reality: The depth of forensic work depends on incident severity, regulatory requirements, and business priorities. Low-severity incidents may not justify extensive forensics; in those cases, focus on restoring operations and capture only high-level evidence.

Misconception: Eradication Means Deleting Malware Files

Reality: True eradication requires removing all persistence mechanisms (scheduled tasks, registry keys, backdoor accounts, webshells). Attackers often leave multiple footholds.


Practice Tasks

Task 1: Incident Severity Classification

Scenario: Single workstation infected with malware. No lateral movement detected. No sensitive data accessed. Malware quarantined by EDR.

Question: What severity level and response SLA apply?

Answer

Severity: LOW

Reasoning:

  • Single system (limited scope)
  • No sensitive data impact
  • Already contained (quarantined)
  • No active spread

Response SLA: 24 hours for full investigation and eradication

Actions:

  • Reimage workstation
  • Scan network for IOCs (ensure no spread)
  • Document IOCs for threat intel
  • Low urgency, can be handled during normal business hours


Task 2: Containment Decision

Scenario: Ransomware detected on 5 systems and spreading. It's 4 PM Friday. Business operates 24/7.

Options:

a) Shut down entire network immediately
b) Isolate infected systems only
c) Monitor and investigate before taking action
d) Wait until Monday to assess

Question: What's the best containment strategy?

Answer

Best Option: b) Isolate infected systems only

Reasoning:

  • Shutting down entire network (option a) causes massive business disruption (24/7 operations)
  • Monitoring without action (option c) allows spread to continue
  • Waiting (option d) is unacceptable for active ransomware

Recommended Actions:

  1. Immediately isolate the 5 infected systems (EDR isolation)
  2. Search environment for IOCs to identify any other infected systems (isolate those too)
  3. Block malware C2 domains/IPs at firewall
  4. Disable compromised accounts
  5. If spread continues despite isolation, escalate to broader network segmentation (e.g., isolate entire subnet)

Business Continuity Balance: Targeted isolation minimizes disruption while containing threat.


Task 3: Post-Incident Action Prioritization

Given Lessons Learned:

  • Email gateway lacks sandboxing (root cause of phishing success)
  • MFA not enforced on VPN (enabled lateral movement)
  • Backup restoration took 6 hours (backups not tested regularly)
  • No EDR on 10% of servers (budget limitation)

Question: Prioritize remediation actions (1 = highest priority).

Answer

Priority Order:

  1. Deploy email sandboxing (Highest)
     • Directly addresses root cause
     • Phishing is #1 initial access vector
     • Quick win (technical implementation)

  2. Enforce MFA on VPN
     • Prevents lateral movement even if credentials stolen
     • High security value, relatively easy implementation
     • Compensates for inevitable phishing clicks

  3. Validate and test backups monthly
     • Critical for ransomware recovery
     • Low cost, high impact
     • Testing prevents surprises during actual incidents

  4. Extend EDR to remaining 10% of servers
     • Important but may require budget approval
     • Start with most critical servers
     • Incremental deployment acceptable

Rationale: Address root cause and high-impact/low-effort items first.


Exam Prep & Certifications

Relevant Certifications

The topics in this chapter align with the following certifications:

  • CompTIA Security+ — Domains: Security Operations, Security Architecture
  • CompTIA CySA+ — Domains: Security Operations, Incident Response
  • GIAC GCIH — Domains: Incident Handling, Automation
  • CISSP — Domains: Security Operations, Security Architecture

Self-Assessment Quiz

Question 1: What is the first phase of the NIST incident response lifecycle?

Options:

a) Detection and Analysis
b) Containment
c) Preparation
d) Post-Incident Activity

Correct Answer: c) Preparation

Explanation: Preparation (policies, tools, training) must happen before incidents occur. Detection comes second in the lifecycle.


Question 2: What is the primary goal of the Containment phase?

Options:

a) Identify the root cause of the incident
b) Stop the attack from spreading and limit damage
c) Restore systems to normal operations
d) Document lessons learned

Correct Answer: b) Stop the attack from spreading and limit damage

Explanation: Containment limits scope and impact. Root cause analysis happens in post-incident. Recovery happens after eradication.


Question 3: Why is forensic imaging important before remediation?

Options:

a) It speeds up system recovery
b) It preserves evidence for legal action and root cause analysis
c) It automatically removes malware
d) It reduces incident severity

Correct Answer: b) It preserves evidence for legal action and root cause analysis

Explanation: Forensic images preserve the system state for investigation. Remediation actions alter or destroy evidence.


Question 4: What should be included in a 'Lessons Learned' review?

Options:

a) Only what went wrong
b) Only technical details
c) What went well, what went wrong, root cause, and remediation actions
d) Executive bonuses for good incident response

Correct Answer: c) What went well, what went wrong, root cause, and remediation actions

Explanation: Comprehensive lessons learned cover successes (to repeat), failures (to improve), root causes (to address), and concrete actions (to implement).


Question 5: What is the role of the Incident Commander?

Options:

a) Perform all technical forensic analysis
b) Write code to block attacks
c) Lead overall incident response, coordinate teams, and make key decisions
d) Handle all external communication exclusively

Correct Answer: c) Lead overall incident response, coordinate teams, and make key decisions

Explanation: The IC leads and coordinates. Technical leads handle forensics, communications leads handle messaging. IC makes strategic decisions.


Question 6: How can AI/LLMs assist in incident response?

Options:

a) Automated timeline reconstruction from logs
b) Forensic analysis assistance and anomaly identification
c) Playbook recommendations based on incident type
d) All of the above

Correct Answer: d) All of the above

Explanation: AI assists with timeline generation (parsing logs), forensic analysis (identifying suspicious artifacts), and playbook suggestions (matching incident to best practices).


Summary

In this chapter, you learned:

  • NIST IR lifecycle: Preparation, Detection & Analysis, Containment/Eradication/Recovery, Post-Incident Activity
  • IR team roles: Incident Commander, Technical Lead, Communications, SOC, IT, Legal
  • Containment strategies: Network isolation, account disablement, egress blocking
  • Evidence preservation: Forensic imaging, chain of custody, volatile data collection
  • Post-incident reviews: Lessons learned meetings, root cause analysis, remediation tracking
  • AI in IR: Automated timelines, forensic analysis assistance, playbook recommendations

Next Steps

  • Next Chapter: Chapter 9: AI/ML in SOC - Deep dive into machine learning for detection and automation
  • Practice: Conduct a tabletop exercise with your team using the Incident Commander MicroSim
  • Review: Update your organization's IR plan based on lessons from this chapter
  • Template: Adopt the lessons learned template for post-incident reviews
