
Lab 4: SOAR Playbook Safety Checks

Difficulty: ⭐⭐⭐ Advanced
Duration: 60–75 minutes
Chapter Reference: Chapter 8: SOAR, Automation and Playbooks
Nexus SecOps Controls: Nexus SecOps-096, Nexus SecOps-097, Nexus SecOps-098, Nexus SecOps-099, Nexus SecOps-100, Nexus SecOps-104, Nexus SecOps-106


Learning Objectives

By completing this lab, you will be able to:

  1. Identify dangerous automation patterns in SOAR playbooks
  2. Design human-in-the-loop gates appropriate to action severity
  3. Apply the principle of least privilege to SOAR credentials
  4. Write safety checks that prevent automation-caused outages
  5. Evaluate playbooks against the Nexus SecOps automation safety controls

Background

SOAR automation can dramatically accelerate response — and dramatically accelerate mistakes. An automated playbook that blocks an IP without proper validation can take down a business partner's connection. One that disables accounts without a human gate can lock out the CEO. One that isolates a host can knock a critical production server offline.

The goal of SOAR safety engineering is automated response that is both fast and safe. This requires:

  • Pre-action validation (is this target in a protected list?)
  • Contextual enrichment (what is this IP/account/host before we act?)
  • Proportional human gates (low-risk actions auto; high-risk actions require approval)
  • Rollback capability (can we undo this action?)
  • Audit trail (who/what authorized every action?)
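The five requirements can be combined into a single wrapper that every automated action passes through. The Python sketch below illustrates that idea; it is not a SOAR platform API. `GuardedAction`, `run_guarded`, and the `enrich` and `approve` callbacks are hypothetical names, and a real deployment would persist the audit trail instead of appending to an in-memory list.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class AuditEntry:
    action: str
    target: str
    authorized_by: str        # "automation" or the approver's identity
    outcome: str              # "executed", "blocked_protected", or "denied"
    context: dict             # enrichment captured before acting
    rollback: Optional[str]   # name of the undo action, recorded when executed
    timestamp: str


@dataclass
class GuardedAction:
    name: str
    execute: Callable[[str], None]   # performs the action against the target
    rollback: str                    # name of the action/playbook that undoes it
    high_risk: bool                  # True -> a human must approve first


def run_guarded(action: GuardedAction, target: str, protected: set[str],
                enrich: Callable[[str], dict],
                approve: Callable[[str, str, dict], Optional[str]],
                audit_log: list[AuditEntry]) -> bool:
    """Run one action behind the five safety controls; return True if it ran."""

    def record(outcome: str, who: str, ctx: dict, rollback: Optional[str] = None) -> None:
        audit_log.append(AuditEntry(action.name, target, who, outcome, ctx, rollback,
                                    datetime.now(timezone.utc).isoformat()))

    # 1. Pre-action validation: never act on a protected target.
    if target in protected:
        record("blocked_protected", "automation", {})
        return False
    # 2. Contextual enrichment, captured for the approver and the audit trail.
    ctx = enrich(target)
    # 3. Proportional gate: high-risk actions need an explicit approver.
    approver = "automation"
    if action.high_risk:
        decided_by = approve(action.name, target, ctx)
        if decided_by is None:            # denied or timed out: fail closed
            record("denied", "none", ctx)
            return False
        approver = decided_by
    # 4. Execute, and 5. audit the action together with its rollback reference.
    action.execute(target)
    record("executed", approver, ctx, action.rollback)
    return True
```

Because the approval callback returns the approver's identity (or `None`), every audit entry answers "who authorized this?", and a denial or timeout never reaches `execute`.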

Part 1: Playbook Review — Find the Bugs

The following four playbooks contain safety problems. Review each one and identify all issues.

Playbook 1: Auto-Block IP on Threat Intel Match

name: Auto-Block IP on TI Match
trigger:
  - type: threat_intel_match
    condition: indicator.type == "ip" AND indicator.confidence >= 60

actions:
  1. block_ip:
       target: "{{ alert.src_ip }}"
       firewall: "perimeter-fw-01"
       duration: permanent

  2. create_ticket:
       title: "IP Blocked: {{ alert.src_ip }}"
       body: "Blocked at perimeter firewall."
       status: closed

notifications:
  - analyst: none
  - manager: none

Task: Identify every safety problem in this playbook.

List at least 5 issues:

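| # | Problem | Risk | Correct Approach |
|---|---------|------|------------------|
| 1 | | | |
| 2 | | | |
| 3 | | | |
| 4 | | | |
| 5 | | | |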

Playbook 2: Account Disable on High-Risk Login

name: Auto-Disable Account on Impossible Travel
trigger:
  - type: alert
    rule: "Impossible Travel"
    severity: HIGH

actions:
  1. disable_account:
       target: "{{ alert.user }}"
       directory: "active-directory"

  2. revoke_sessions:
       target: "{{ alert.user }}"
       systems: ["office365", "vpn", "citrix"]

  3. create_ticket:
       title: "Account Disabled: {{ alert.user }}"
       body: "Disabled due to impossible travel detection."
       assign_to: "SOC-Tier1"

  4. notify_manager:
       lookup_manager: "{{ alert.user }}"
       message: "Account has been disabled. Contact SOC."

Task: Identify the safety problems.

| # | Problem | Risk | Correct Approach |
|---|---------|------|------------------|
| 1 | | | |
| 2 | | | |
| 3 | | | |
| 4 | | | |

Playbook 3: Auto-Remediate CSPM Finding

name: Auto-Fix S3 Public Access
trigger:
  - type: cspm_finding
    rule: "S3 Bucket Public Access Enabled"
    severity: CRITICAL

actions:
  1. run_script:
       script: |
         aws s3api put-public-access-block \
           --bucket {{ finding.resource_name }} \
           --public-access-block-configuration \
           BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
       credentials: aws-prod-admin-key
       timeout: 30s

  2. close_finding:
       finding_id: "{{ finding.id }}"
       status: remediated

  3. create_ticket:
       title: "Auto-remediated: {{ finding.resource_name }}"
       status: closed

Task: Identify the safety problems.

| # | Problem | Risk | Correct Approach |
|---|---------|------|------------------|
| 1 | | | |
| 2 | | | |
| 3 | | | |
| 4 | | | |

Playbook 4: Endpoint Isolation on Malware Detection

name: Auto-Isolate on Malware Detection
trigger:
  - type: alert
    rule_category: malware
    severity: [HIGH, CRITICAL]

actions:
  1. isolate_host:
       target: "{{ alert.hostname }}"
       edr_platform: crowdstrike
       isolation_type: full

  2. kill_process:
       target: "{{ alert.process_id }}"
       host: "{{ alert.hostname }}"

  3. create_ticket:
       title: "Host Isolated: {{ alert.hostname }}"
       assign_to: "SOC-Tier2"

  4. notify_user:
       user: "{{ alert.user }}"
       message: "Your computer has been quarantined. Call IT Help Desk."

Task: Identify the safety problems.

| # | Problem | Risk | Correct Approach |
|---|---------|------|------------------|
| 1 | | | |
| 2 | | | |
| 3 | | | |
| 4 | | | |

Part 2: Design Safe Versions

Pick one of the four playbooks above and redesign it to be safe. Your redesign must include:

  1. Pre-action validation checks
  2. Appropriate human-in-the-loop gates
  3. Protected asset checks
  4. Rollback mechanism
  5. Complete audit trail

Write your safe playbook in pseudocode or structured YAML format.


Part 3: Human-in-the-Loop Gate Design

A key SOAR safety principle is proportional human oversight (Nexus SecOps-099): whether an action runs automatically or waits for approval depends on its reversibility and blast radius.

3.1 — Classify Actions

For each action below, classify the appropriate automation level:

| Action | Reversibility | Blast Radius | Automation Level |
|--------|---------------|--------------|------------------|
| Create a JIRA ticket | Fully reversible | None | |
| Add IP to watchlist | Reversible | None | |
| Send alert notification to analyst | Reversible | Low | |
| Block IP at perimeter firewall | Reversible | Medium | |
| Block domain at DNS | Reversible | Medium | |
| Quarantine email message | Reversible | Low | |
| Disable user account | Reversible (quickly) | High | |
| Reset user password | Not reversible (old password cannot be restored) | High | |
| Isolate endpoint via EDR | Reversible | High | |
| Delete files from endpoint | Irreversible | High | |
| Block country at firewall | Reversible | Very High | |
| Remove admin rights | Reversible | Medium | |
| Terminate cloud instance | Irreversible | Very High | |
| Wipe endpoint (MDM) | Irreversible | Very High | |

Automation levels:

  • AUTO: Execute automatically; no human approval required
  • NOTIFY: Execute automatically, then notify an analyst immediately afterwards
  • APPROVE_T1: Requires Tier 1 approval before execution (approval expected within 5 minutes)
  • APPROVE_T2: Requires Tier 2 approval before execution
  • APPROVE_MANAGEMENT: Requires management approval before execution

3.2 — Design an Approval Gate

Write the logic for a human approval gate in a SOAR playbook. Your gate should handle:

  • Timeout if no approval received (what action to take?)
  • Who can approve (role-based, not person-specific)
  • What information the approver sees before deciding
  • Audit logging of the approval decision
# Your approval gate design here
human_gate:
  action_description: ""
  action_details: {}
  approvers: []
  timeout_minutes:
  on_timeout: ""
  audit_log:
    approval_by: ""
    approval_time: ""
    approval_decision: ""
    ip_address: ""
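One way to implement the gate logic is sketched below in Python, under stated assumptions: `ApprovalRequest`, `Decision`, `wait_for_approval`, and the `poll` callback (which would read responses from your chat or ticketing integration) are hypothetical, and the role names are examples only. The design choices it illustrates are role-based authorization, a hard deadline, and failing closed on timeout, with every response written to the same audit fields as the template.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Optional


@dataclass
class ApprovalRequest:
    action_description: str
    action_details: dict       # shown to the approver: target, enrichment, blast radius, rollback
    approver_roles: set[str]   # role-based, e.g. {"soc_tier2"}, never a named person
    timeout_minutes: float
    on_timeout: str = "deny_and_escalate"   # fail closed: the action does not run


@dataclass
class Decision:
    user: str
    role: str
    decision: str              # "approve" or "deny"
    source_ip: str


def _audit(log: list[dict], req: ApprovalRequest, user: str, role: str,
           decision: str, ip: str) -> None:
    log.append({
        "action_description": req.action_description,
        "approval_by": user,
        "approver_role": role,
        "approval_time": datetime.now(timezone.utc).isoformat(),
        "approval_decision": decision,
        "ip_address": ip,
    })


def wait_for_approval(req: ApprovalRequest,
                      poll: Callable[[], Optional[Decision]],
                      audit_log: list[dict],
                      clock: Callable[[], float] = time.monotonic,
                      sleep: Callable[[float], None] = time.sleep,
                      poll_interval_s: float = 5.0) -> bool:
    """Return True only on an explicit approval from an allowed role before the deadline."""
    deadline = clock() + req.timeout_minutes * 60
    while clock() < deadline:
        d = poll()
        if d is not None and d.role in req.approver_roles:
            approved = d.decision == "approve"   # anything else counts as a denial
            _audit(audit_log, req, d.user, d.role, "approve" if approved else "deny", d.source_ip)
            return approved
        if d is not None:
            # Responses from roles that may not approve are logged and ignored.
            _audit(audit_log, req, d.user, d.role, "rejected_unauthorized_role", d.source_ip)
        sleep(poll_interval_s)
    # No decision in time: record the timeout and fail closed.
    _audit(audit_log, req, "none", "none", f"timeout:{req.on_timeout}", "")
    return False
```

The `clock` and `sleep` parameters exist so the timeout path can be tested without waiting; in a playbook you would keep the defaults.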

Part 4: Credential Safety Audit

SOAR playbooks require credentials to take action. Answer these questions based on Nexus SecOps-104 requirements.

4.1 — Credential Anti-Pattern Identification

Review the following excerpt of SOAR integration code. What credential safety problems exist?

# SOAR playbook code (Python)
import requests

CROWDSTRIKE_API_KEY = "cs-api-secret-12345abcde"
SERVICENOW_USER = "soar-integration"
SERVICENOW_PASS = "Password123!"
AWS_ACCESS_KEY = "AKIA2345678901234567"
AWS_SECRET_KEY = "abcdef1234567890abcdef1234567890abcdef12"

def isolate_host(hostname: str) -> bool:
    response = requests.post(
        "https://api.crowdstrike.com/devices/actions/v2?action_name=contain",
        headers={"Authorization": f"Bearer {CROWDSTRIKE_API_KEY}"},
        json={"ids": [hostname]}
    )
    return response.status_code == 202

def create_ticket(title: str, body: str) -> str:
    response = requests.post(
        "https://meridian.service-now.com/api/now/table/incident",
        json={"short_description": title, "description": body},
        auth=(SERVICENOW_USER, SERVICENOW_PASS)
    )
    return response.json()['result']['number']

List all credential safety problems:

| # | Problem | Risk |
|---|---------|------|
| 1 | | |
| 2 | | |
| 3 | | |
| 4 | | |

4.2 — Least Privilege Design

For each SOAR integration below, specify the minimum permissions required. Do not grant broad admin access.

| Integration | Actions Required | Minimum Permissions | What NOT to Grant |
|-------------|------------------|---------------------|-------------------|
| EDR (CrowdStrike) | Isolate host, get process list | | |
| Active Directory | Disable account, reset password | | |
| Email Gateway | Quarantine email, get message metadata | | |
| Firewall | Block IP, block domain | | |
| AWS | Block S3 public access | | |
| ServiceNow | Create incident, update incident | | |

Part 5: Nexus SecOps Controls Checklist

Evaluate your redesigned playbook (from Part 2) against the relevant Nexus SecOps controls.

| Control | Description | Met? | Evidence |
|---------|-------------|------|----------|
| Nexus SecOps-096 | Playbook inventory and documentation maintained | | |
| Nexus SecOps-097 | Playbooks tested in staging before production | | |
| Nexus SecOps-098 | Playbook change control enforced | | |
| Nexus SecOps-099 | Human-in-the-loop gates for high-risk actions | | |
| Nexus SecOps-100 | Playbook execution logged with full audit trail | | |
| Nexus SecOps-104 | No hardcoded credentials; vault integration | | |
| Nexus SecOps-106 | Protected asset list checked before automated actions | | |
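Some of these controls can be checked mechanically once a playbook is loaded as data. The Python sketch below assumes a hypothetical schema (an `actions` list of steps, each with a `type` and optional `credentials` key, and a `vault:` prefix for vault references); adapt it to your platform's real playbook format. It covers only Nexus SecOps-099, -104, and -106; evidence for -096, -097, -098, and -100 comes from inventories, test records, change tickets, and execution logs rather than from the playbook definition.

```python
from __future__ import annotations

# Hypothetical action names treated as high-risk for this check.
HIGH_RISK_ACTIONS = {"block_ip", "disable_account", "reset_password",
                     "isolate_host", "delete_files", "terminate_instance"}


def lint_playbook(playbook: dict) -> list[str]:
    """Return findings for the controls that can be checked statically."""
    findings = []
    protected_checked = False
    gated = False
    for step in playbook.get("actions", []):
        kind = step.get("type", "")
        if kind == "check_protected_list":
            protected_checked = True
        elif kind == "human_gate":
            gated = True
        elif kind in HIGH_RISK_ACTIONS:
            if not protected_checked:
                findings.append(f"Nexus SecOps-106: {kind} runs before any protected-asset check")
            if not gated:
                findings.append(f"Nexus SecOps-099: {kind} runs without a preceding human gate")
            gated = False  # each high-risk action needs its own approval
        cred = step.get("credentials")
        if cred is not None and not str(cred).startswith("vault:"):
            findings.append(f"Nexus SecOps-104: {kind} uses a non-vault credential reference")
    return findings
```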

Answer Key

Complete all tasks before reading the answer key.

Playbook 1 Problems (Auto-Block IP)

| # | Problem | Risk | Correct Approach |
|---|---------|------|------------------|
| 1 | Confidence threshold of 60 is too low | Blocks legitimate IPs with weak TI evidence | Require confidence ≥ 85 for automated action |
| 2 | No check if IP is internal/RFC1918 | Could block internal infrastructure | Validate IP is public before blocking |
| 3 | No check if IP belongs to business partners | Could block critical partner connectivity | Check against allow/protected list |
| 4 | Permanent block with no expiry | Legitimate IPs never unblocked | Add TTL (e.g., 30 days), auto-expiry |
| 5 | Ticket auto-closed with no analyst review | No human sees the action | Keep ticket open for analyst review |
| 6 | No notification to any analyst | No visibility into automated actions | Notify analyst + log action |
| 7 | No rollback mechanism | Can't quickly undo a bad block | Include rollback playbook reference |
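Fixes 1 through 4 can be expressed as a pre-action check. The Python sketch below uses the standard `ipaddress` module; `protected_cidrs` stands in for a hypothetical partner/allow list, and the 85 threshold and 30-day TTL are taken from the table above. Note that `is_global` is False not only for RFC 1918 space but also for loopback, link-local, and reserved/documentation ranges, which should never be blocked at the perimeter either.

```python
from __future__ import annotations

import ipaddress
from datetime import datetime, timedelta, timezone
from typing import Optional

MIN_AUTO_CONFIDENCE = 85        # threshold from the answer key
BLOCK_TTL = timedelta(days=30)  # example TTL; set per local policy


def ip_block_decision(ip: str, confidence: int, protected_cidrs: list[str]) -> str:
    """Classify a TI-matched IP as 'reject', 'needs_review', or 'auto_block'."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return "reject"         # malformed indicator
    if not addr.is_global:
        return "reject"         # RFC 1918, loopback, link-local, reserved
    if any(addr in ipaddress.ip_network(c) for c in protected_cidrs):
        return "reject"         # business partner or other protected range
    if confidence < MIN_AUTO_CONFIDENCE:
        return "needs_review"   # weak evidence: route to an analyst instead
    return "auto_block"


def block_expiry(now: Optional[datetime] = None) -> datetime:
    """Expiry timestamp to attach to the firewall rule instead of 'permanent'."""
    return (now or datetime.now(timezone.utc)) + BLOCK_TTL
```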

Playbook 2 Problems (Auto-Disable Account)

| # | Problem | Risk | Correct Approach |
|---|---------|------|------------------|
| 1 | No protected account check | Could disable the CEO, executive accounts, or service accounts | Check against a protected accounts list before disabling |
| 2 | No human approval gate | Account disable is high-impact: a false positive locks a legitimate user out of every system until someone intervenes | Require APPROVE_T2 before account disable |
| 3 | No VPN/MFA pre-validation | Impossible travel may be caused by VPN egress or an MFA anomaly rather than credential compromise | Check whether the user is on VPN or using split tunneling before disabling |
| 4 | Manager notification happens before analyst review | The manager may be alarmed unnecessarily if the alert is a false positive (FP) | Notify the manager only after an analyst confirms a true positive (TP) |
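The protected-account and VPN checks can run before any disable is even proposed. This Python sketch is illustrative only: `LoginEvent`, `triage_impossible_travel`, the returned labels, and the `known_egress_cidrs` input (corporate VPN and proxy egress ranges) are hypothetical. It never disables an account itself; at most it requests the APPROVE_T2 gate.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass
class LoginEvent:
    user: str
    src_ip: str
    mfa_passed: bool


def triage_impossible_travel(events: list[LoginEvent], protected_accounts: set[str],
                             known_egress_cidrs: list[str]) -> str:
    """Choose the next step for an impossible-travel alert. Never disables directly."""
    user = events[0].user
    if user in protected_accounts:
        return "escalate_protected_account"   # executives, service and break-glass accounts
    nets = [ipaddress.ip_network(c) for c in known_egress_cidrs]

    def via_known_egress(e: LoginEvent) -> bool:
        addr = ipaddress.ip_address(e.src_ip)
        return any(addr in n for n in nets)

    # Corporate VPN/proxy egress plus successful MFA usually explains the "travel".
    if any(via_known_egress(e) for e in events) and all(e.mfa_passed for e in events):
        return "analyst_review_likely_benign"
    return "request_t2_approval_to_disable"   # APPROVE_T2 gate, per the answer key
```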

Playbook 3 Problems (Auto-Fix S3)

| # | Problem | Risk | Correct Approach |
|---|---------|------|------------------|
| 1 | No validation that the bucket is not intentionally public | Some S3 buckets serve public websites; blocking access causes an outage | Check against an approved-public-bucket list before remediation |
| 2 | aws-prod-admin-key is overly permissive | The admin key has far more access than needed | Use a role with only the s3:PutBucketPublicAccessBlock permission (plus s3:GetBucketPublicAccessBlock to verify) |
| 3 | Finding auto-closed without verification | Remediation may fail silently | Verify the block was applied before closing; keep the finding open until verified |
| 4 | No notification to the bucket owner | The owner doesn't know their bucket was modified | Notify the data owner or engineering team after remediation |
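A safer version of Playbook 3's script checks the allow list, applies the block, and reads it back before reporting success. The Python sketch below assumes `s3` is a `boto3.client("s3")`, whose `put_public_access_block` and `get_public_access_block` methods wrap the S3 PutPublicAccessBlock and GetPublicAccessBlock operations; IAM authorizes these with the `s3:PutBucketPublicAccessBlock` and `s3:GetBucketPublicAccessBlock` permissions. The function name, return labels, and `approved_public` set are hypothetical.

```python
from __future__ import annotations

from typing import Any

# The four settings the playbook's CLI call applies.
FULL_BLOCK = {
    "BlockPublicAcls": True,
    "IgnorePublicAcls": True,
    "BlockPublicPolicy": True,
    "RestrictPublicBuckets": True,
}


def remediate_public_bucket(s3: Any, bucket: str, approved_public: set[str]) -> str:
    """Apply a bucket-level public access block, then verify it before closing."""
    if bucket in approved_public:
        return "skipped_approved_public"        # intentionally public (e.g. a website)
    s3.put_public_access_block(Bucket=bucket,
                               PublicAccessBlockConfiguration=FULL_BLOCK)
    # Read the configuration back rather than trusting the write call.
    applied = s3.get_public_access_block(Bucket=bucket)["PublicAccessBlockConfiguration"]
    if not all(applied.get(key) is True for key in FULL_BLOCK):
        return "verification_failed_keep_open"  # leave the finding open
    return "verified_notify_owner"              # close the finding, notify the owner
```

With a real client, API failures raise `botocore.exceptions.ClientError`; the sketch deliberately lets that propagate so the finding is never closed on an error.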

Playbook 4 Problems (Auto-Isolate Host)

| # | Problem | Risk | Correct Approach |
|---|---------|------|------------------|
| 1 | No critical host check | Could isolate production servers, OT systems, or other critical infrastructure | Check against a critical/protected host list; require approval for servers |
| 2 | Kill process by PID without verification | The PID may have been reused by a different process | Verify process name + hash before killing |
| 3 | User notification before analyst review | May needlessly alarm a user who is a legitimate victim, or tip off an insider | Notify the user only after Tier 2 confirms a true positive |
| 4 | No rollback documented | If isolation was wrong, there is no defined way to restore connectivity | Document and test the de-isolation procedure |
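Fixes 1 and 2 can be sketched as two small checks. In this Python example, `ProcessInfo`, `isolation_approval`, `safe_to_kill`, and the `manual_escalation` label are hypothetical; the live process record would come from an EDR process lookup. The workstation/server split mirrors the Part 3.1 answer key.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ProcessInfo:
    pid: int
    image_name: str
    sha256: str


def isolation_approval(hostname: str, host_role: str, protected_hosts: set[str]) -> str:
    """Approval level required before isolating a host."""
    if hostname in protected_hosts:
        return "manual_escalation"   # critical/OT host: no automated isolation
    return "APPROVE_T2" if host_role == "server" else "APPROVE_T1"


def safe_to_kill(alerted: ProcessInfo, live: Optional[ProcessInfo]) -> bool:
    """Kill only if the PID still maps to the same image name and hash as the alert."""
    if live is None:
        return False                 # process already exited
    return (live.pid == alerted.pid
            and live.image_name.lower() == alerted.image_name.lower()
            and live.sha256.lower() == alerted.sha256.lower())
```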

Part 3.1 — Automation Level Classifications

| Action | Automation Level |
|--------|------------------|
| Create a JIRA ticket | AUTO |
| Add IP to watchlist | AUTO |
| Send alert notification to analyst | AUTO |
| Block IP at perimeter firewall | APPROVE_T1 |
| Block domain at DNS | APPROVE_T1 |
| Quarantine email message | NOTIFY |
| Disable user account | APPROVE_T2 |
| Reset user password | APPROVE_T2 |
| Isolate endpoint via EDR | APPROVE_T1 (workstation) / APPROVE_T2 (server) |
| Delete files from endpoint | APPROVE_MANAGEMENT |
| Block country at firewall | APPROVE_MANAGEMENT |
| Remove admin rights | APPROVE_T2 |
| Terminate cloud instance | APPROVE_MANAGEMENT |
| Wipe endpoint (MDM) | APPROVE_MANAGEMENT |
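Encoding this classification as data lets a playbook enforce it rather than rely on convention. The Python sketch below uses an ordered `IntEnum` so that a higher approval satisfies any lower requirement; the action identifiers are hypothetical, and defaulting unknown actions to the strictest level is a design assumption, not part of the answer key.

```python
from enum import IntEnum


class Level(IntEnum):
    # Ordered so that a higher approval satisfies any lower requirement.
    AUTO = 0
    NOTIFY = 1
    APPROVE_T1 = 2
    APPROVE_T2 = 3
    APPROVE_MANAGEMENT = 4


POLICY = {
    "create_ticket": Level.AUTO,
    "add_ip_to_watchlist": Level.AUTO,
    "notify_analyst": Level.AUTO,
    "quarantine_email": Level.NOTIFY,
    "block_ip": Level.APPROVE_T1,
    "block_domain": Level.APPROVE_T1,
    "isolate_endpoint": Level.APPROVE_T1,
    "disable_account": Level.APPROVE_T2,
    "reset_password": Level.APPROVE_T2,
    "remove_admin_rights": Level.APPROVE_T2,
    "delete_files": Level.APPROVE_MANAGEMENT,
    "block_country": Level.APPROVE_MANAGEMENT,
    "terminate_instance": Level.APPROVE_MANAGEMENT,
    "wipe_endpoint": Level.APPROVE_MANAGEMENT,
}


def required_level(action: str, asset_type: str = "workstation") -> Level:
    level = POLICY.get(action, Level.APPROVE_MANAGEMENT)  # unknown action -> strictest gate
    if action == "isolate_endpoint" and asset_type == "server":
        level = max(level, Level.APPROVE_T2)              # servers need Tier 2
    return level


def may_execute(action: str, granted: Level, asset_type: str = "workstation") -> bool:
    """`granted` is the highest approval obtained so far (Level.AUTO if none)."""
    required = required_level(action, asset_type)
    if required <= Level.NOTIFY:
        return True   # AUTO and NOTIFY run without approval; NOTIFY alerts afterwards
    return granted >= required
```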

Part 4.1 — Credential Problems

| # | Problem | Risk |
|---|---------|------|
| 1 | Hardcoded API keys in source code | Keys exposed in version control, logs, shared access |
| 2 | Hardcoded username/password | Basic auth credentials in plaintext code |
| 3 | AWS long-term access keys (AKIA prefix) | Long-lived keys increase exposure window; should use IAM roles or short-lived tokens |
| 4 | All credentials in single file | Compromise of one file exposes all integrations |

Correct approach: Use a secrets vault (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault). Retrieve credentials at runtime. Rotate credentials every 90 days (Nexus SecOps-104). Use IAM roles for AWS instead of access keys.


Part 4.2 — Least Privilege

| Integration | Minimum Permissions | What NOT to Grant |
|-------------|---------------------|-------------------|
| EDR | devices:write (contain), processes:read | policies:write, detections:write, admin roles |
| Active Directory | Delegated permissions on the target OUs only (write userAccountControl to disable; Reset Password right) | Domain Admins, Enterprise Admins, Account Operators (a built-in domain-wide group that cannot be scoped to an OU) |
| Email Gateway | quarantine:write, message:read | policy:write, admin:all |
| Firewall | Rule write on a specific ACL only | config:write, management plane access |
| AWS | s3:PutBucketPublicAccessBlock on specific bucket resources | s3:*, iam:*, AdministratorAccess |
| ServiceNow | incident_manager role (create/update only) | admin, security_admin |
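For the AWS row, the least-privilege grant can be written as an IAM policy document. The Python sketch below builds one with `json`; the function name and bucket list are hypothetical. It grants only the bucket-level permission that authorizes the S3 PutPublicAccessBlock call, plus the matching read permission so the playbook can verify its change. If the in-scope buckets are not known in advance, you would need a broader resource such as `arn:aws:s3:::*`, still limited to these two actions.

```python
import json


def s3_remediation_policy(bucket_names: list) -> str:
    """Build a least-privilege IAM policy for the S3 public-access remediation role."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "S3PublicAccessBlockOnly",
            "Effect": "Allow",
            "Action": [
                "s3:PutBucketPublicAccessBlock",  # apply the block
                "s3:GetBucketPublicAccessBlock",  # verify it before closing the finding
            ],
            "Resource": [f"arn:aws:s3:::{name}" for name in bucket_names],
        }],
    }, indent=2)
```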

Scoring

| Criteria | Points |
|----------|--------|
| Playbook 1: Identified ≥5 of 7 problems | 15 pts |
| Playbook 2: Identified all 4 problems | 10 pts |
| Playbook 3: Identified all 4 problems | 10 pts |
| Playbook 4: Identified all 4 problems | 10 pts |
| Part 2: Safe playbook redesign includes all 5 required elements | 20 pts |
| Part 3.1: Correct automation levels for ≥12 of 14 actions | 15 pts |
| Part 3.2: Approval gate design covers all required elements | 5 pts |
| Part 4.1: Identified all 4 credential problems | 10 pts |
| Part 4.2: Least privilege design correct for ≥5 of 6 integrations | 5 pts |
| Total | 100 pts |

  • Score ≥ 80: Ready to design and review production SOAR playbooks
  • Score 60–79: Review Chapter 8 on SOAR safety principles; revisit Nexus SecOps-099
  • Score < 60: Study automation failure modes before building any production playbooks


Lab 4 complete. Proceed to Lab 5: LLM Guardrails Evaluation