Lab 4: SOAR Playbook Safety Checks

Difficulty: ⭐⭐⭐ Advanced
Duration: 60–75 minutes
Chapter Reference: Chapter 8 — SOAR, Automation and Playbooks
Nexus SecOps Controls: Nexus SecOps-096, Nexus SecOps-097, Nexus SecOps-098, Nexus SecOps-099, Nexus SecOps-100, Nexus SecOps-104, Nexus SecOps-106

Learning Objectives

By completing this lab, you will be able to:

Identify dangerous automation patterns in SOAR playbooks
Design human-in-the-loop gates appropriate to action severity
Apply the principle of least privilege to SOAR credentials
Write safety checks that prevent automation-caused outages
Evaluate playbooks against the Nexus SecOps automation safety controls

Background

SOAR automation can dramatically accelerate response — and dramatically accelerate mistakes. An automated playbook that blocks an IP without proper validation can take down a business partner's connection. One that disables accounts without a human gate can lock out the CEO. One that isolates a host can knock a critical production server offline.

The goal of SOAR safety engineering is automated response that is both fast and safe. This requires:

Pre-action validation (is this target in a protected list?)
Contextual enrichment (what is this IP/account/host before we act?)
Proportional human gates (low-risk actions auto; high-risk actions require approval)
Rollback capability (can we undo this action?)
Audit trail (who/what authorized every action?)
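
The audit-trail requirement can be made concrete as one structured record per automated action. The following is a minimal sketch, not tied to any SOAR product: the function `audit_record` and the field names (`actor`, `approved_by`, `rollback_ref`) are hypothetical and chosen only for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_record(action: str, target: str, actor: str,
                 approved_by: Optional[str] = None,
                 rollback_ref: Optional[str] = None) -> str:
    """Build one JSON audit line for an automated action.

    All field names are illustrative assumptions, not a product schema.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "target": target,
        "actor": actor,                # playbook or service identity that acted
        "approved_by": approved_by,    # None means no human approved the action
        "rollback_ref": rollback_ref,  # how to undo the action, if it can be undone
    }
    return json.dumps(record, sort_keys=True)


# Example: a block that a Tier 1 role approved and that has a known undo step.
print(audit_record("block_ip", "203.0.113.7", actor="playbook:auto-block-ip",
                   approved_by="role:soc-tier1", rollback_ref="unblock_ip"))
```

Recording the approver as a role and keeping a rollback reference in the same record ties the "who authorized this" and "can we undo this" questions to every action.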

Part 1: Playbook Review — Find the Bugs

The following four playbooks contain safety problems. Review each one and identify all issues.

Playbook 1: Auto-Block IP on Threat Intel Match

name: Auto-Block IP on TI Match
trigger:
  - type: threat_intel_match
    condition: indicator.type == "ip" AND indicator.confidence >= 60

actions:
  1. block_ip:
       target: "{{ alert.src_ip }}"
       firewall: "perimeter-fw-01"
       duration: permanent

  2. create_ticket:
       title: "IP Blocked: {{ alert.src_ip }}"
       body: "Blocked at perimeter firewall."
       status: closed

notifications:
  - analyst: none
  - manager: none

Task: Identify every safety problem in this playbook.

List at least 5 issues:

#	Problem	Risk	Correct Approach
1
2
3
4
5

Playbook 2: Account Disable on High-Risk Login

name: Auto-Disable Account on Impossible Travel
trigger:
  - type: alert
    rule: "Impossible Travel"
    severity: HIGH

actions:
  1. disable_account:
       target: "{{ alert.user }}"
       directory: "active-directory"

  2. revoke_sessions:
       target: "{{ alert.user }}"
       systems: ["office365", "vpn", "citrix"]

  3. create_ticket:
       title: "Account Disabled: {{ alert.user }}"
       body: "Disabled due to impossible travel detection."
       assign_to: "SOC-Tier1"

  4. notify_manager:
       lookup_manager: "{{ alert.user }}"
       message: "Account has been disabled. Contact SOC."

Task: Identify the safety problems.

#	Problem	Risk	Correct Approach
1
2
3
4

Playbook 3: Auto-Remediate CSPM Finding

name: Auto-Fix S3 Public Access
trigger:
  - type: cspm_finding
    rule: "S3 Bucket Public Access Enabled"
    severity: CRITICAL

actions:
  1. run_script:
       script: |
         aws s3api put-public-access-block \
           --bucket {{ finding.resource_name }} \
           --public-access-block-configuration \
           BlockPublicAcls=true,IgnorePublicAcls=true,\
           BlockPublicPolicy=true,RestrictPublicBuckets=true
       credentials: aws-prod-admin-key
       timeout: 30s

  2. close_finding:
       finding_id: "{{ finding.id }}"
       status: remediated

  3. create_ticket:
       title: "Auto-remediated: {{ finding.resource_name }}"
       status: closed

Task: Identify the safety problems.

#	Problem	Risk	Correct Approach
1
2
3
4

Playbook 4: Endpoint Isolation on Malware Detection

name: Auto-Isolate on Malware Detection
trigger:
  - type: alert
    rule_category: malware
    severity: [HIGH, CRITICAL]

actions:
  1. isolate_host:
       target: "{{ alert.hostname }}"
       edr_platform: crowdstrike
       isolation_type: full

  2. kill_process:
       target: "{{ alert.process_id }}"
       host: "{{ alert.hostname }}"

  3. create_ticket:
       title: "Host Isolated: {{ alert.hostname }}"
       assign_to: "SOC-Tier2"

  4. notify_user:
       user: "{{ alert.user }}"
       message: "Your computer has been quarantined. Call IT Help Desk."

Task: Identify the safety problems.

#	Problem	Risk	Correct Approach
1
2
3
4

Part 2: Design Safe Versions

Pick one of the four playbooks above (your choice). Redesign it to be safe, including:

Pre-action validation checks
Appropriate human-in-the-loop gates
Protected asset checks
Rollback mechanism
Complete audit trail

Write your safe playbook in pseudocode or structured YAML format.

Part 3: Human-in-the-Loop Gate Design

A key SOAR safety principle is proportional human oversight (Nexus SecOps-099): actions are automated or require approval based on reversibility and blast radius.

3.1 — Classify Actions

For each action below, classify the appropriate automation level:

Action	Reversibility	Blast Radius	Automation Level
Create a JIRA ticket	Fully reversible	None
Add IP to watchlist	Reversible	None
Send alert notification to analyst	Reversible	Low
Block IP at perimeter firewall	Reversible	Medium
Block domain at DNS	Reversible	Medium
Quarantine email message	Reversible	Low
Disable user account	Reversible (quickly)	High
Reset user password	Reversible	High
Isolate endpoint via EDR	Reversible	High
Delete files from endpoint	Irreversible	High
Block country at firewall	Reversible	Very High
Remove admin rights	Reversible	Medium
Terminate cloud instance	Irreversible	Very High
Wipe endpoint (MDM)	Irreversible	Very High

Automation levels:

AUTO: Execute automatically, no human approval required
NOTIFY: Execute automatically, but notify analyst immediately after
APPROVE_T1: Requires Tier 1 approval before execution (approve within 5 min)
APPROVE_T2: Requires Tier 2 approval before execution
APPROVE_MANAGEMENT: Requires management approval before execution

3.2 — Design an Approval Gate

Write the logic for a human approval gate in a SOAR playbook. Your gate should handle:

Timeout if no approval received (what action to take?)
Who can approve (role-based, not person-specific)
What information the approver sees before deciding
Audit logging of the approval decision

# Your approval gate design here
human_gate:
  action_description: ""
  action_details: {}
  approvers: []
  timeout_minutes:
  on_timeout: ""
  audit_log:
    approval_by: ""
    approval_time: ""
    approval_decision: ""
    ip_address: ""

Part 4: Credential Safety Audit

SOAR playbooks require credentials to take action. Answer these questions based on Nexus SecOps-104 requirements.

4.1 — Credential Anti-Pattern Identification

Review the following SOAR configuration excerpt. What credential safety problems exist?

# SOAR playbook code (Python)
import requests

CROWDSTRIKE_API_KEY = "cs-api-secret-12345abcde"
SERVICENOW_USER = "soar-integration"
SERVICENOW_PASS = "Password123!"
AWS_ACCESS_KEY = "AKIA2345678901234567"
AWS_SECRET_KEY = "abcdef1234567890abcdef1234567890abcdef12"

def isolate_host(hostname: str) -> bool:
    response = requests.post(
        "https://api.crowdstrike.com/devices/actions/v2?action_name=contain",
        headers={"Authorization": f"Bearer {CROWDSTRIKE_API_KEY}"},
        json={"ids": [hostname]}
    )
    return response.status_code == 202

def create_ticket(title: str, body: str) -> str:
    response = requests.post(
        "https://meridian.service-now.com/api/now/table/incident",
        json={"short_description": title, "description": body},
        auth=(SERVICENOW_USER, SERVICENOW_PASS)
    )
    return response.json()['result']['number']

List all credential safety problems:

#	Problem	Risk
1
2
3
4

4.2 — Least Privilege Design

For each SOAR integration below, specify the minimum permissions required. Do not grant broad admin access.

Integration	Actions Required	Minimum Permissions	What NOT to Grant
EDR (CrowdStrike)	Isolate host, get process list
Active Directory	Disable account, reset password
Email Gateway	Quarantine email, get message metadata
Firewall	Block IP, block domain
AWS	Block S3 public access
ServiceNow	Create incident, update incident

Part 5: Nexus SecOps Controls Checklist

Evaluate your redesigned playbook (from Part 2) against the relevant Nexus SecOps controls.

Control	Description	Met?	Evidence
Nexus SecOps-096	Playbook inventory and documentation maintained
Nexus SecOps-097	Playbooks tested in staging before production
Nexus SecOps-098	Playbook change control enforced
Nexus SecOps-099	Human-in-the-loop gates for high-risk actions
Nexus SecOps-100	Playbook execution logged with full audit trail
Nexus SecOps-104	No hardcoded credentials; vault integration
Nexus SecOps-106	Protected asset list checked before automated actions

Answer Key

Complete all tasks before reading the answers below.

Playbook 1 Problems (Auto-Block IP)

#	Problem	Risk	Correct Approach
1	Confidence threshold of 60 is too low	Blocks legitimate IPs with weak TI evidence	Require confidence ≥ 85 for automated action
2	No check if IP is internal/RFC1918	Could block internal infrastructure	Validate IP is public before blocking
3	No check if IP belongs to business partners	Could block critical partner connectivity	Check against allow/protected list
4	Permanent block with no expiry	Legitimate IPs never unblocked	Add TTL (e.g., 30 days), auto-expiry
5	Ticket auto-closed with no analyst review	No human sees the action	Keep ticket open for analyst review
6	No notification to any analyst	No visibility into automated actions	Notify analyst + log action
7	No rollback mechanism	Can't quickly undo a bad block	Include rollback playbook reference
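
The first four fixes in the table can be combined into one pre-action check. The sketch below is illustrative only: `validate_block`, `PROTECTED_NETWORKS`, and the placeholder range in it are assumptions, and a real protected list would come from a maintained source rather than a constant. The threshold of 85 and the 30-day TTL are the example values from the table.

```python
import ipaddress
from typing import Tuple

# Placeholder protected range standing in for partner networks (assumption).
PROTECTED_NETWORKS = [ipaddress.ip_network("1.1.1.0/24")]
MIN_CONFIDENCE = 85   # example threshold from the answer key
BLOCK_TTL_DAYS = 30   # example expiry from the answer key


def validate_block(ip: str, confidence: int) -> Tuple[bool, str]:
    """Return (allowed, reason) for an automated block request."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False, "not a valid IP address"
    if confidence < MIN_CONFIDENCE:
        return False, "confidence below threshold"
    if not addr.is_global:
        # Covers RFC 1918, loopback, link-local and other non-public ranges.
        return False, "not a public address"
    if any(addr in net for net in PROTECTED_NETWORKS):
        return False, "address is on the protected list"
    return True, f"ok to block for {BLOCK_TTL_DAYS} days"


for candidate, score in [("10.0.0.5", 95), ("8.8.8.8", 60), ("8.8.8.8", 95)]:
    print(candidate, score, validate_block(candidate, score))
```

Returning a reason alongside the decision gives the ticket and the audit log something concrete to record when a block is refused.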

Playbook 2 Problems (Auto-Disable Account)

#	Problem	Risk	Correct Approach
1	No protected account check	Could disable CEO, service accounts, executive accounts	Check against protected accounts list before disabling
2	No human approval gate	Account disable is high-impact: a false positive locks a legitimate user out of every connected system	Require APPROVE_T2 before account disable
3	No VPN/MFA pre-validation	Impossible travel may be a VPN or MFA anomaly, not credential compromise	Check if user is on VPN or using split-tunnel before disabling
4	Manager notification happens before analyst review	Manager may be alarmed unnecessarily if the alert is a false positive (FP)	Notify manager only after an analyst confirms a true positive (TP)
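
One way to picture the corrected ordering is a small decision function that runs before any disable action. This is a sketch under stated assumptions: `plan_account_disable`, the `PROTECTED_ACCOUNTS` entries, and the four return values are hypothetical names, not part of any SOAR platform.

```python
from typing import Set

# Hypothetical protected list; in practice it is maintained outside the playbook.
PROTECTED_ACCOUNTS: Set[str] = {"ceo", "svc-backup", "svc-payroll"}


def plan_account_disable(user: str, on_known_vpn: bool,
                         analyst_approved: bool) -> str:
    """Decide the next step for an impossible-travel alert.

    Returns one of: "escalate", "enrich", "await_approval", "disable".
    """
    if user.lower() in PROTECTED_ACCOUNTS:
        return "escalate"         # never auto-disable a protected account
    if on_known_vpn:
        return "enrich"           # likely a VPN artifact; gather context first
    if not analyst_approved:
        return "await_approval"   # the APPROVE_T2 gate from the table above
    return "disable"


print(plan_account_disable("ceo", on_known_vpn=False, analyst_approved=True))
print(plan_account_disable("jdoe", on_known_vpn=False, analyst_approved=False))
```

The manager notification from the original playbook would only follow the "disable" outcome, after the analyst has confirmed the alert.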

Playbook 3 Problems (Auto-Fix S3)

#	Problem	Risk	Correct Approach
1	No validation that bucket is not intentionally public	Some S3 buckets serve public websites — blocking access causes outage	Check against approved-public-bucket list before remediation
2	`aws-prod-admin-key` is overly permissive	Admin key has far more access than needed	Use a role with only the `s3:PutBucketPublicAccessBlock` permission (the IAM action behind the `put-public-access-block` call)
3	Finding auto-closed without verification	Remediation may fail silently	Verify block was applied before closing; keep open until verified
4	No notification to bucket owner	Owner doesn't know their bucket was modified	Notify data owner or engineering team after remediation
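
Fixes 1 and 3 amount to a check before the change and a check after it. The sketch below assumes the playbook has already fetched the bucket's `PublicAccessBlockConfiguration` object (the structure returned by `aws s3api get-public-access-block`) and passes it in as a dict; `should_remediate`, `block_is_applied`, and the bucket name are hypothetical.

```python
from typing import Mapping, Set

REQUIRED_FLAGS = ("BlockPublicAcls", "IgnorePublicAcls",
                  "BlockPublicPolicy", "RestrictPublicBuckets")

# Hypothetical list of buckets that are public on purpose (e.g. static sites).
APPROVED_PUBLIC_BUCKETS: Set[str] = {"public-website-assets"}


def should_remediate(bucket: str) -> bool:
    """Skip buckets that are on the approved-public list."""
    return bucket not in APPROVED_PUBLIC_BUCKETS


def block_is_applied(config: Mapping[str, bool]) -> bool:
    """True only if all four public-access-block flags are present and true."""
    return all(config.get(flag) is True for flag in REQUIRED_FLAGS)


after_fix = {"BlockPublicAcls": True, "IgnorePublicAcls": True,
             "BlockPublicPolicy": True, "RestrictPublicBuckets": False}
if not block_is_applied(after_fix):
    print("keep finding open: remediation not verified")
```

The finding is closed only when `block_is_applied` returns true; a missing or false flag keeps it open for review.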

Playbook 4 Problems (Auto-Isolate Host)

#	Problem	Risk	Correct Approach
1	No critical host check	Could isolate production servers, OT systems, or critical infrastructure	Check against critical/protected host list; require approval for servers
2	Kill process by PID without verification	PID may have been reused by a different process	Verify process name + hash before killing
3	User notification before analyst review	A legitimate user may be alarmed unnecessarily, or a malicious insider may be tipped off	Notify user only after Tier 2 confirms TP
4	No rollback documented	If isolation was wrong, how do you restore connectivity?	Document and test de-isolation procedure
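
The PID-reuse fix in row 2 can be sketched as a comparison between what the alert recorded and what is running now. This is a minimal illustration: `ProcessInfo` and `safe_to_kill` are hypothetical, and how the current process details are fetched from the EDR is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessInfo:
    pid: int
    name: str
    sha256: str   # hash of the process image on disk


def safe_to_kill(alerted: ProcessInfo, current: ProcessInfo) -> bool:
    """Kill only if the PID still belongs to the same binary the alert saw."""
    return (alerted.pid == current.pid
            and alerted.name.lower() == current.name.lower()
            and alerted.sha256.lower() == current.sha256.lower())


seen_in_alert = ProcessInfo(4242, "dropper.exe", "ab" * 32)
running_now = ProcessInfo(4242, "backup-agent.exe", "cd" * 32)  # PID was reused
print(safe_to_kill(seen_in_alert, running_now))
```

When the check fails, the playbook would skip the kill step and leave the decision to the analyst handling the ticket.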

Part 3.1 — Automation Level Classifications

Action	Automation Level
Create a JIRA ticket	AUTO
Add IP to watchlist	AUTO
Send alert notification	AUTO
Block IP at perimeter firewall	APPROVE_T1
Block domain at DNS	APPROVE_T1
Quarantine email message	NOTIFY
Disable user account	APPROVE_T2
Reset user password	APPROVE_T2
Isolate endpoint via EDR	APPROVE_T1 (workstation) / APPROVE_T2 (server)
Delete files from endpoint	APPROVE_MANAGEMENT
Block country at firewall	APPROVE_MANAGEMENT
Remove admin rights	APPROVE_T2
Terminate cloud instance	APPROVE_MANAGEMENT
Wipe endpoint (MDM)	APPROVE_MANAGEMENT
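
A classification table like this can be enforced in code as a lookup consulted before every action. The sketch below covers only a few rows and rests on two assumptions that are not in the table: actions missing from the lookup get the strictest level, and a gate that times out is treated as a denial. The action keys and function names are hypothetical.

```python
from enum import Enum
from typing import Optional


class Level(Enum):
    AUTO = "AUTO"
    NOTIFY = "NOTIFY"
    APPROVE_T1 = "APPROVE_T1"
    APPROVE_T2 = "APPROVE_T2"
    APPROVE_MANAGEMENT = "APPROVE_MANAGEMENT"


# A few rows from the table above, keyed by hypothetical action names.
ACTION_LEVELS = {
    "create_ticket": Level.AUTO,
    "quarantine_email": Level.NOTIFY,
    "block_ip": Level.APPROVE_T1,
    "disable_account": Level.APPROVE_T2,
    "wipe_endpoint": Level.APPROVE_MANAGEMENT,
}


def required_level(action: str) -> Level:
    # Assumption: an unclassified action gets the strictest gate (fail closed).
    return ACTION_LEVELS.get(action, Level.APPROVE_MANAGEMENT)


def may_execute(action: str, decision: Optional[str]) -> bool:
    """decision is "approved", "denied", or None when the gate timed out."""
    if required_level(action) in (Level.AUTO, Level.NOTIFY):
        return True
    return decision == "approved"   # a timeout or a denial both block the action


print(may_execute("create_ticket", None))
print(may_execute("disable_account", None))
```

Failing closed on unknown actions and on timeouts means a misconfigured playbook stalls rather than acting without oversight.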

Part 4.1 — Credential Problems

#	Problem	Risk
1	Hardcoded API keys in source code	Keys exposed in version control, logs, shared access
2	Hardcoded username/password	Basic auth credentials in plaintext code
3	AWS long-term access keys (AKIA prefix)	Long-lived keys increase exposure window; should use IAM roles or short-lived tokens
4	All credentials in single file	Compromise of one file exposes all integrations

Correct approach: Use a secrets vault (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault). Retrieve credentials at runtime. Rotate credentials every 90 days (Nexus SecOps-104). Use IAM roles for AWS instead of access keys.
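
Retrieving credentials at runtime can look like the sketch below. It assumes that a vault agent or the SOAR platform injects secrets into the process environment; the helper `load_secret`, the exception `MissingSecret`, and the variable name `CROWDSTRIKE_API_KEY` are hypothetical. A direct vault client call would replace the environment lookup in a real integration.

```python
import os
from typing import Mapping


class MissingSecret(RuntimeError):
    """Raised when a required secret has not been provided at runtime."""


def load_secret(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Fetch a secret injected at runtime; never fall back to a default."""
    value = env.get(name, "")
    if not value:
        # The message names the secret but never includes its value.
        raise MissingSecret(f"secret {name!r} is not set")
    return value


try:
    api_key = load_secret("CROWDSTRIKE_API_KEY")
except MissingSecret as exc:
    print(f"refusing to run playbook: {exc}")
```

Because nothing is stored in the source file, rotating a credential becomes a vault operation and needs no code change.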

Part 4.2 — Least Privilege

Integration	Minimum Permissions	What NOT to Grant
EDR	`devices:write` (contain), `processes:read`	`policies:write`, `detections:write`, admin roles
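Active Directory	Delegated rights on the target OU only: Reset Password and write access to `userAccountControl` (disable)	`Domain Admins`, `Enterprise Admins`, built-in `Account Operators` (domain-wide, cannot be scoped to an OU)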
Email Gateway	`quarantine:write`, `message:read`	`policy:write`, `admin:all`
Firewall	Rule write on specific ACL only	`config:write`, management plane access
AWS	`s3:PutBucketPublicAccessBlock` on specific buckets	`s3:*`, `iam:*`, AdministratorAccess
ServiceNow	`incident_manager` role (create/update only)	`admin`, `security_admin`
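
For the AWS row, the least-privilege grant can be written as an IAM policy document. The sketch below builds one in Python. It uses `s3:PutBucketPublicAccessBlock`, the IAM action that authorizes the bucket-level `put-public-access-block` call, and adds `s3:GetBucketPublicAccessBlock` on the assumption that the playbook also verifies the change. The function name and bucket names are hypothetical.

```python
import json
from typing import List


def s3_block_policy(bucket_names: List[str]) -> dict:
    """Build an IAM policy limited to public-access-block calls on named buckets."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": [
                "s3:PutBucketPublicAccessBlock",   # apply the block
                "s3:GetBucketPublicAccessBlock",   # read it back to verify
            ],
            # Bucket ARNs only: no wildcard resource, no object-level access.
            "Resource": [f"arn:aws:s3:::{name}" for name in bucket_names],
        }],
    }


print(json.dumps(s3_block_policy(["finance-reports", "hr-exports"]), indent=2))
```

Listing bucket ARNs explicitly keeps a compromised SOAR credential from touching buckets outside the playbook's scope.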

Scoring

Criteria	Points
Playbook 1: Identified ≥5 of 7 problems	15 pts
Playbook 2: Identified all 4 problems	10 pts
Playbook 3: Identified all 4 problems	10 pts
Playbook 4: Identified all 4 problems	10 pts
Part 2: Safe playbook redesign includes all 5 required elements	20 pts
Part 3.1: Correct automation levels for ≥12 of 14 actions	15 pts
Part 3.2: Approval gate design covers all required elements	5 pts
Part 4.1: Identified all 4 credential problems	10 pts
Part 4.2: Least privilege design correct for ≥5 of 6 integrations	5 pts
Total	100 pts

Score ≥ 80: Ready to design and review production SOAR playbooks
Score 60–79: Review Chapter 8 on SOAR safety principles; revisit Nexus SecOps-099
Score < 60: Study automation failure modes before building any production playbooks

Lab 4 complete. Proceed to Lab 5: LLM Guardrails Evaluation
