Lab 4: SOAR Playbook Safety Checks
Difficulty: ⭐⭐⭐ Advanced
Duration: 60–75 minutes
Chapter Reference: Chapter 8 — SOAR, Automation and Playbooks
Nexus SecOps Controls: Nexus SecOps-096, Nexus SecOps-097, Nexus SecOps-098, Nexus SecOps-099, Nexus SecOps-100, Nexus SecOps-104, Nexus SecOps-106
Learning Objectives
By completing this lab, you will be able to:
- Identify dangerous automation patterns in SOAR playbooks
- Design human-in-the-loop gates appropriate to action severity
- Apply the principle of least privilege to SOAR credentials
- Write safety checks that prevent automation-caused outages
- Evaluate playbooks against the Nexus SecOps automation safety controls
Background
SOAR automation can dramatically accelerate response — and dramatically accelerate mistakes. An automated playbook that blocks an IP without proper validation can take down a business partner's connection. One that disables accounts without a human gate can lock out the CEO. One that isolates a host can knock a critical production server offline.
The goal of SOAR safety engineering is automated response that is both fast and safe. This requires:
- Pre-action validation (is this target in a protected list?)
- Contextual enrichment (what is this IP/account/host before we act?)
- Proportional human gates (low-risk actions auto; high-risk actions require approval)
- Rollback capability (can we undo this action?)
- Audit trail (who/what authorized every action?)
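The requirements above can be read as an ordered wrapper around any response action. The following is a minimal Python sketch under stated assumptions, not a SOAR vendor API: `SafeRunner`, `AuditEntry`, and the outcome strings are hypothetical names chosen for illustration, and the enrichment step is left out for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional, Set


@dataclass
class AuditEntry:
    action: str
    target: str
    outcome: str        # "executed", "skipped_protected", or "awaiting_approval"
    authorized_by: str  # approver role, or "automation" for low-risk actions
    timestamp: str


@dataclass
class SafeRunner:
    """Hypothetical wrapper: validate, gate, act, log, and hand back an undo."""
    protected: Set[str]
    audit_log: List[AuditEntry] = field(default_factory=list)

    def _log(self, action: str, target: str, outcome: str, who: str) -> None:
        now = datetime.now(timezone.utc).isoformat()
        self.audit_log.append(AuditEntry(action, target, outcome, who, now))

    def run(self, action: str, target: str,
            execute: Callable[[str], object],
            rollback: Callable[[str], object],
            high_risk: bool = False,
            approver: Optional[str] = None) -> Optional[Callable[[], object]]:
        # 1. Pre-action validation: never touch a protected target.
        if target in self.protected:
            self._log(action, target, "skipped_protected", "automation")
            return None
        # 2. Proportional gate: high-risk actions wait for an approver role.
        if high_risk and approver is None:
            self._log(action, target, "awaiting_approval", "automation")
            return None
        # 3. Act, then record who or what authorized the action.
        execute(target)
        self._log(action, target, "executed", approver or "automation")
        # 4. Rollback capability: return a callable that undoes the action.
        return lambda: rollback(target)
```

Calling `runner.run("block_ip", ip, block_fn, unblock_fn)` either returns an undo callable or returns `None` and records why nothing happened, so every path leaves an audit entry.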
Part 1: Playbook Review — Find the Bugs
The following four playbooks contain safety problems. Review each one and identify all issues.
Playbook 1: Auto-Block IP on Threat Intel Match
name: Auto-Block IP on TI Match
trigger:
  - type: threat_intel_match
    condition: indicator.type == "ip" AND indicator.confidence >= 60
actions:
  1. block_ip:
       target: "{{ alert.src_ip }}"
       firewall: "perimeter-fw-01"
       duration: permanent
  2. create_ticket:
       title: "IP Blocked: {{ alert.src_ip }}"
       body: "Blocked at perimeter firewall."
       status: closed
notifications:
  - analyst: none
  - manager: none
Task: Identify every safety problem in this playbook.
List at least 5 issues:
| # | Problem | Risk | Correct Approach |
|---|---|---|---|
| 1 | | | |
| 2 | | | |
| 3 | | | |
| 4 | | | |
| 5 | | | |
Playbook 2: Account Disable on High-Risk Login
name: Auto-Disable Account on Impossible Travel
trigger:
  - type: alert
    rule: "Impossible Travel"
    severity: HIGH
actions:
  1. disable_account:
       target: "{{ alert.user }}"
       directory: "active-directory"
  2. revoke_sessions:
       target: "{{ alert.user }}"
       systems: ["office365", "vpn", "citrix"]
  3. create_ticket:
       title: "Account Disabled: {{ alert.user }}"
       body: "Disabled due to impossible travel detection."
       assign_to: "SOC-Tier1"
  4. notify_manager:
       lookup_manager: "{{ alert.user }}"
       message: "Account has been disabled. Contact SOC."
Task: Identify the safety problems.
| # | Problem | Risk | Correct Approach |
|---|---|---|---|
| 1 | | | |
| 2 | | | |
| 3 | | | |
| 4 | | | |
Playbook 3: Auto-Remediate CSPM Finding
name: Auto-Fix S3 Public Access
trigger:
  - type: cspm_finding
    rule: "S3 Bucket Public Access Enabled"
    severity: CRITICAL
actions:
  1. run_script:
       script: |
         aws s3api put-public-access-block \
         --bucket {{ finding.resource_name }} \
         --public-access-block-configuration \
         BlockPublicAcls=true,IgnorePublicAcls=true,\
         BlockPublicPolicy=true,RestrictPublicBuckets=true
       credentials: aws-prod-admin-key
       timeout: 30s
  2. close_finding:
       finding_id: "{{ finding.id }}"
       status: remediated
  3. create_ticket:
       title: "Auto-remediated: {{ finding.resource_name }}"
       status: closed
Task: Identify the safety problems.
| # | Problem | Risk | Correct Approach |
|---|---|---|---|
| 1 | | | |
| 2 | | | |
| 3 | | | |
| 4 | | | |
Playbook 4: Endpoint Isolation on Malware Detection
name: Auto-Isolate on Malware Detection
trigger:
  - type: alert
    rule_category: malware
    severity: [HIGH, CRITICAL]
actions:
  1. isolate_host:
       target: "{{ alert.hostname }}"
       edr_platform: crowdstrike
       isolation_type: full
  2. kill_process:
       target: "{{ alert.process_id }}"
       host: "{{ alert.hostname }}"
  3. create_ticket:
       title: "Host Isolated: {{ alert.hostname }}"
       assign_to: "SOC-Tier2"
  4. notify_user:
       user: "{{ alert.user }}"
       message: "Your computer has been quarantined. Call IT Help Desk."
Task: Identify the safety problems.
| # | Problem | Risk | Correct Approach |
|---|---|---|---|
| 1 | | | |
| 2 | | | |
| 3 | | | |
| 4 | | | |
Part 2: Design Safe Versions
Pick one of the four playbooks above (your choice). Redesign it to be safe, including:
- Pre-action validation checks
- Appropriate human-in-the-loop gates
- Protected asset checks
- Rollback mechanism
- Complete audit trail
Write your safe playbook in pseudocode or structured YAML format.
Part 3: Human-in-the-Loop Gate Design
A key SOAR safety principle is proportional human oversight (Nexus SecOps-099): actions are automated or require approval based on reversibility and blast radius.
3.1 — Classify Actions
For each action below, classify the appropriate automation level:
| Action | Reversibility | Blast Radius | Automation Level |
|---|---|---|---|
| Create a JIRA ticket | Fully reversible | None | |
| Add IP to watchlist | Reversible | None | |
| Send alert notification to analyst | Reversible | Low | |
| Block IP at perimeter firewall | Reversible | Medium | |
| Block domain at DNS | Reversible | Medium | |
| Quarantine email message | Reversible | Low | |
| Disable user account | Reversible (quickly) | High | |
| Reset user password | Reversible | High | |
| Isolate endpoint via EDR | Reversible | High | |
| Delete files from endpoint | Irreversible | High | |
| Block country at firewall | Reversible | Very High | |
| Remove admin rights | Reversible | Medium | |
| Terminate cloud instance | Irreversible | Very High | |
| Wipe endpoint (MDM) | Irreversible | Very High | |
Automation levels:
- AUTO: Execute automatically, no human approval required
- NOTIFY: Execute automatically, but notify analyst immediately after
- APPROVE_T1: Requires Tier 1 approval before execution (approve within 5 min)
- APPROVE_T2: Requires Tier 2 approval before execution
- APPROVE_MANAGEMENT: Requires management approval before execution
3.2 — Design an Approval Gate
Write the logic for a human approval gate in a SOAR playbook. Your gate should handle:
- Timeout if no approval received (what action to take?)
- Who can approve (role-based, not person-specific)
- What information the approver sees before deciding
- Audit logging of the approval decision
# Your approval gate design here
human_gate:
  action_description: ""
  action_details: {}
  approvers: []
  timeout_minutes:
  on_timeout: ""
  audit_log:
    approval_by: ""
    approval_time: ""
    approval_decision: ""
    ip_address: ""
Part 4: Credential Safety Audit
SOAR playbooks require credentials to take action. Answer these questions based on Nexus SecOps-104 requirements.
4.1 — Credential Anti-Pattern Identification
Review the following SOAR configuration excerpt. What credential safety problems exist?
# SOAR playbook code (Python)
import requests

CROWDSTRIKE_API_KEY = "cs-api-secret-12345abcde"
SERVICENOW_USER = "soar-integration"
SERVICENOW_PASS = "Password123!"
AWS_ACCESS_KEY = "AKIA2345678901234567"
AWS_SECRET_KEY = "abcdef1234567890abcdef1234567890abcdef12"

def isolate_host(hostname: str) -> bool:
    response = requests.post(
        "https://api.crowdstrike.com/devices/actions/v2?action_name=contain",
        headers={"Authorization": f"Bearer {CROWDSTRIKE_API_KEY}"},
        json={"ids": [hostname]}
    )
    return response.status_code == 202

def create_ticket(title: str, body: str) -> str:
    response = requests.post(
        "https://meridian.service-now.com/api/now/table/incident",
        json={"short_description": title, "description": body},
        auth=(SERVICENOW_USER, SERVICENOW_PASS)
    )
    return response.json()['result']['number']
List all credential safety problems:
| # | Problem | Risk |
|---|---|---|
| 1 | | |
| 2 | | |
| 3 | | |
| 4 | | |
4.2 — Least Privilege Design
For each SOAR integration below, specify the minimum permissions required. Do not grant broad admin access.
| Integration | Actions Required | Minimum Permissions | What NOT to Grant |
|---|---|---|---|
| EDR (CrowdStrike) | Isolate host, get process list | | |
| Active Directory | Disable account, reset password | | |
| Email Gateway | Quarantine email, get message metadata | | |
| Firewall | Block IP, block domain | | |
| AWS | Block S3 public access | | |
| ServiceNow | Create incident, update incident | | |
Part 5: Nexus SecOps Controls Checklist
Evaluate your redesigned playbook (from Part 2) against the relevant Nexus SecOps controls.
| Control | Description | Met? | Evidence |
|---|---|---|---|
| Nexus SecOps-096 | Playbook inventory and documentation maintained | | |
| Nexus SecOps-097 | Playbooks tested in staging before production | | |
| Nexus SecOps-098 | Playbook change control enforced | | |
| Nexus SecOps-099 | Human-in-the-loop gates for high-risk actions | | |
| Nexus SecOps-100 | Playbook execution logged with full audit trail | | |
| Nexus SecOps-104 | No hardcoded credentials; vault integration | | |
| Nexus SecOps-106 | Protected asset list checked before automated actions | | |
Answer Key
Complete all tasks before reading the answers below.
Playbook 1 Problems (Auto-Block IP)
| # | Problem | Risk | Correct Approach |
|---|---|---|---|
| 1 | Confidence threshold of 60 is too low | Blocks legitimate IPs with weak TI evidence | Require confidence ≥ 85 for automated action |
| 2 | No check if IP is internal/RFC1918 | Could block internal infrastructure | Validate IP is public before blocking |
| 3 | No check if IP belongs to business partners | Could block critical partner connectivity | Check against allow/protected list |
| 4 | Permanent block with no expiry | Legitimate IPs never unblocked | Add TTL (e.g., 30 days), auto-expiry |
| 5 | Ticket auto-closed with no analyst review | No human sees the action | Keep ticket open for analyst review |
| 6 | No notification to any analyst | No visibility into automated actions | Notify analyst + log action |
| 7 | No rollback mechanism | Can't quickly undo a bad block | Include rollback playbook reference |
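Several of the corrections in the table above (public-IP validation, protected list, confidence threshold, expiry) can be combined into one pre-block decision. The sketch below uses Python's standard `ipaddress` module; `evaluate_block`, the reason strings, and the `protected_networks` argument are hypothetical names, and the 85 threshold and 30-day TTL are simply the example values from the table.

```python
import ipaddress
from datetime import datetime, timedelta, timezone

MIN_CONFIDENCE = 85             # example threshold from the table above
BLOCK_TTL = timedelta(days=30)  # example expiry from the table above


def evaluate_block(ip_text, confidence, protected_networks, now=None):
    """Return a decision dict; only block=True results should reach the firewall."""
    now = now or datetime.now(timezone.utc)
    try:
        ip = ipaddress.ip_address(ip_text)
    except ValueError:
        return {"block": False, "reason": "invalid_ip"}
    # Reject RFC1918, loopback, link-local and other non-routable addresses.
    if not ip.is_global:
        return {"block": False, "reason": "not_public"}
    # Never block partner or otherwise protected ranges automatically.
    if any(ip in net for net in protected_networks):
        return {"block": False, "reason": "protected"}
    if confidence < MIN_CONFIDENCE:
        return {"block": False, "reason": "low_confidence"}
    # Every block carries an expiry so it cannot silently become permanent.
    return {"block": True, "reason": "ok", "expires_at": now + BLOCK_TTL}
```

A playbook would call this before `block_ip` and route every `block: False` result to an open ticket for analyst review rather than dropping it.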
Playbook 2 Problems (Auto-Disable Account)
| # | Problem | Risk | Correct Approach |
|---|---|---|---|
| 1 | No protected account check | Could disable CEO, service accounts, executive accounts | Check against protected accounts list before disabling |
| 2 | No human approval gate | Account disable has a high blast radius even though it is reversible; a wrong disable locks the user out until someone notices | Require APPROVE_T2 before account disable |
| 3 | No VPN/MFA pre-validation | Impossible travel may be a VPN or MFA anomaly, not credential compromise | Check whether the user is on VPN or split-tunnel before disabling |
| 4 | Manager notification happens before analyst review | Manager may be alarmed unnecessarily if the alert is a false positive | Notify manager only after an analyst confirms a true positive |
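The checks in the table above amount to a small triage step that runs before any disable action. This is a hedged sketch only: `triage_impossible_travel` and its return strings are hypothetical, and the VPN egress list stands in for whatever enrichment source the environment provides.

```python
def triage_impossible_travel(user, login_ips, vpn_egress_ips, protected_accounts):
    """Pick the next step for an impossible-travel alert; never auto-disable.

    protected_accounts is assumed to hold lowercase account names.
    """
    # Protected accounts (executives, service accounts) always go to a human.
    if user.lower() in protected_accounts:
        return "ESCALATE_PROTECTED_ACCOUNT"
    # A login from a known corporate VPN egress IP is a likely false positive.
    if any(ip in vpn_egress_ips for ip in login_ips):
        return "ANALYST_REVIEW_LIKELY_VPN"
    # Otherwise ask Tier 2 to approve the disable instead of acting directly.
    return "REQUEST_APPROVE_T2"
```

The function only chooses a route. The disable itself and the manager notification stay behind the approval gate.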
Playbook 3 Problems (Auto-Fix S3)
| # | Problem | Risk | Correct Approach |
|---|---|---|---|
| 1 | No validation that bucket is not intentionally public | Some S3 buckets serve public websites — blocking access causes outage | Check against approved-public-bucket list before remediation |
| 2 | aws-prod-admin-key is overly permissive | Admin key has far more access than needed | Use a role limited to the s3:PutBucketPublicAccessBlock action (the IAM permission behind the PutPublicAccessBlock API call) on the target buckets |
| 3 | Finding auto-closed without verification | Remediation may fail silently | Verify block was applied before closing; keep open until verified |
| 4 | No notification to bucket owner | Owner doesn't know their bucket was modified | Notify data owner or engineering team after remediation |
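Two of the fixes above, skipping approved-public buckets and verifying the change before closing the finding, can be sketched as pure functions. The response shape is an assumption modelled on what boto3's `get_public_access_block` is understood to return (a `PublicAccessBlockConfiguration` dict with four boolean flags); the function names and status strings are hypothetical.

```python
REQUIRED_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def plan_remediation(bucket, approved_public_buckets):
    """Decide whether to remediate at all (hypothetical pre-check)."""
    if bucket in approved_public_buckets:
        return "skip_approved_public"  # e.g. a bucket serving a public website
    return "remediate"


def block_is_applied(response):
    """True only if every public-access-block flag reads back as True."""
    config = response.get("PublicAccessBlockConfiguration", {})
    return all(config.get(flag) is True for flag in REQUIRED_FLAGS)


def finding_status(response):
    """Close the finding only after the read-back confirms the change."""
    return "remediated" if block_is_applied(response) else "open"
```

Reading the configuration back after the write is what catches a script that timed out or failed silently.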
Playbook 4 Problems (Auto-Isolate Host)
| # | Problem | Risk | Correct Approach |
|---|---|---|---|
| 1 | No critical host check | Could isolate production servers, OT systems, or critical infrastructure | Check against critical/protected host list; require approval for servers |
| 2 | Kill process by PID without verification | PID may have been reused by a different process | Verify process name + hash before killing |
| 3 | User notification before analyst review | The user may be a legitimate victim who is alarmed unnecessarily, or the message may tip off a malicious insider | Notify user only after Tier 2 confirms a true positive |
| 4 | No rollback documented | If isolation was wrong, how do you restore connectivity? | Document and test de-isolation procedure |
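The PID-reuse problem in row 2 can be illustrated with a small comparison run immediately before `kill_process`. This is a sketch under assumptions: `ProcessInfo` and `safe_to_kill` are hypothetical, and the current process details would come from a fresh EDR query that is not shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProcessInfo:
    pid: int
    name: str
    sha256: str


def safe_to_kill(alerted: ProcessInfo, current: Optional[ProcessInfo]) -> bool:
    """Kill only if the live process still matches what the alert described."""
    if current is None:
        return False  # process already exited; nothing to kill
    return (
        current.pid == alerted.pid
        and current.name.lower() == alerted.name.lower()
        and current.sha256.lower() == alerted.sha256.lower()
    )
```

If the PID now belongs to a different binary, the name or hash comparison fails and the playbook should fall through to analyst review.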
Part 3.1 — Automation Level Classifications
| Action | Automation Level |
|---|---|
| Create a JIRA ticket | AUTO |
| Add IP to watchlist | AUTO |
| Send alert notification | AUTO |
| Block IP at perimeter firewall | APPROVE_T1 |
| Block domain at DNS | APPROVE_T1 |
| Quarantine email message | NOTIFY |
| Disable user account | APPROVE_T2 |
| Reset user password | APPROVE_T2 |
| Isolate endpoint via EDR | APPROVE_T1 (workstation) / APPROVE_T2 (server) |
| Delete files from endpoint | APPROVE_MANAGEMENT |
| Block country at firewall | APPROVE_MANAGEMENT |
| Remove admin rights | APPROVE_T2 |
| Terminate cloud instance | APPROVE_MANAGEMENT |
| Wipe endpoint (MDM) | APPROVE_MANAGEMENT |
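One way to make these classifications enforceable is to store them as data that the playbook engine consults before each action. The sketch below is illustrative only: the action identifiers are hypothetical snake_case names for the rows above, and defaulting unknown actions to the most restrictive level is an assumption, not something the controls prescribe.

```python
from enum import Enum


class Level(Enum):
    AUTO = "AUTO"
    NOTIFY = "NOTIFY"
    APPROVE_T1 = "APPROVE_T1"
    APPROVE_T2 = "APPROVE_T2"
    APPROVE_MANAGEMENT = "APPROVE_MANAGEMENT"


POLICY = {
    "create_ticket": Level.AUTO,
    "add_ip_to_watchlist": Level.AUTO,
    "notify_analyst": Level.AUTO,
    "block_ip_perimeter": Level.APPROVE_T1,
    "block_domain_dns": Level.APPROVE_T1,
    "quarantine_email": Level.NOTIFY,
    "disable_account": Level.APPROVE_T2,
    "reset_password": Level.APPROVE_T2,
    "delete_files": Level.APPROVE_MANAGEMENT,
    "block_country": Level.APPROVE_MANAGEMENT,
    "remove_admin_rights": Level.APPROVE_T2,
    "terminate_cloud_instance": Level.APPROVE_MANAGEMENT,
    "wipe_endpoint": Level.APPROVE_MANAGEMENT,
}


def required_level(action: str, asset_type: str = "workstation") -> Level:
    """Look up the gate for an action; endpoint isolation depends on asset type."""
    if action == "isolate_endpoint":
        return Level.APPROVE_T2 if asset_type == "server" else Level.APPROVE_T1
    # Assumption: an action missing from the policy gets the strictest gate.
    return POLICY.get(action, Level.APPROVE_MANAGEMENT)
```

Keeping the mapping in one place also gives change control (Nexus SecOps-098) a single artifact to review when a level is raised or lowered.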
Part 4.1 — Credential Problems
| # | Problem | Risk |
|---|---|---|
| 1 | Hardcoded API keys in source code | Keys exposed in version control, logs, shared access |
| 2 | Hardcoded username/password | Basic auth credentials in plaintext code |
| 3 | AWS long-term access keys (AKIA prefix) | Long-lived keys increase exposure window; should use IAM roles or short-lived tokens |
| 4 | All credentials in single file | Compromise of one file exposes all integrations |
Correct approach: Use a secrets vault (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault). Retrieve credentials at runtime. Rotate credentials every 90 days (Nexus SecOps-104). Use IAM roles for AWS instead of access keys.
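Runtime retrieval can be sketched without committing to a particular vault product. In the example below, environment variables stand in for the vault client purely so the code is self-contained; a real deployment would replace the lookup with the vault's own SDK call. `get_secret`, `MissingSecretError`, and the variable name `CROWDSTRIKE_API_TOKEN` are hypothetical.

```python
import os
from typing import Mapping, Optional


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent; never carries a secret value."""


def get_secret(name: str, source: Optional[Mapping[str, str]] = None) -> str:
    """Fetch a secret at call time instead of hardcoding it in the playbook."""
    source = os.environ if source is None else source
    value = source.get(name)
    if not value:
        # Report only the name, so the error is safe to log.
        raise MissingSecretError(f"required secret {name!r} is not available")
    return value


def crowdstrike_headers(source: Optional[Mapping[str, str]] = None) -> dict:
    """Build request headers with a token resolved at runtime."""
    token = get_secret("CROWDSTRIKE_API_TOKEN", source)
    return {"Authorization": f"Bearer {token}"}
```

Because the secret is resolved on each call, rotating it in the backing store takes effect without a code change or redeploy.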
Part 4.2 — Least Privilege
| Integration | Minimum Permissions | What NOT to Grant |
|---|---|---|
| EDR | devices:write (contain), processes:read | policies:write, detections:write, admin roles |
| Active Directory | Delegated rights on specific OUs only (disable account, reset password) | Domain Admins, Enterprise Admins, built-in Account Operators |
| Email Gateway | quarantine:write, message:read | policy:write, admin:all |
| Firewall | Rule write on specific ACL only | config:write, management plane access |
| AWS | s3:PutBucketPublicAccessBlock on specific buckets | s3:*, iam:*, AdministratorAccess |
| ServiceNow | incident_manager role (create/update only) | admin, security_admin |
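For the AWS row, a least-privilege policy can be generated from the list of buckets the playbook is allowed to touch. This is a sketch, not a vetted policy: the bucket names are placeholders, `public_access_block_policy` is a hypothetical helper, and the action name `s3:PutBucketPublicAccessBlock` reflects the IAM permission understood to authorize the bucket-level `PutPublicAccessBlock` call, which should be confirmed against the AWS Service Authorization Reference.

```python
import json


def public_access_block_policy(bucket_names):
    """Build an IAM policy document allowing one action on named buckets only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowBlockPublicAccessOnly",
                "Effect": "Allow",
                # Single action; no wildcards and no iam:* or s3:* grants.
                "Action": ["s3:PutBucketPublicAccessBlock"],
                "Resource": [f"arn:aws:s3:::{name}" for name in bucket_names],
            }
        ],
    }


def to_json(policy):
    """Serialize the policy for attachment to the SOAR integration role."""
    return json.dumps(policy, indent=2)
```

Calling `to_json(public_access_block_policy(["example-reports-bucket"]))` yields a document that can be attached to the SOAR role in place of an admin key.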
Scoring
| Criteria | Points |
|---|---|
| Playbook 1: Identified ≥5 of 7 problems | 15 pts |
| Playbook 2: Identified all 4 problems | 10 pts |
| Playbook 3: Identified all 4 problems | 10 pts |
| Playbook 4: Identified all 4 problems | 10 pts |
| Part 2: Safe playbook redesign includes all 5 required elements | 20 pts |
| Part 3.1: Correct automation levels for ≥12 of 14 actions | 15 pts |
| Part 3.2: Approval gate design covers all required elements | 5 pts |
| Part 4.1: Identified all 4 credential problems | 10 pts |
| Part 4.2: Least privilege design correct for ≥5 of 6 integrations | 5 pts |
| Total | 100 pts |
- Score ≥ 80: Ready to design and review production SOAR playbooks
- Score 60–79: Review Chapter 8 on SOAR safety principles; revisit Nexus SecOps-099
- Score < 60: Study automation failure modes before building any production playbooks
Lab 4 complete. Proceed to Lab 5: LLM Guardrails Evaluation