AI/ML System Incident Response Playbook¶

HIGH PRIORITY — Novel Attack Surface

AI/ML systems introduce attack surfaces that traditional IR playbooks do not cover: model poisoning, training data manipulation, prompt injection, inference manipulation, and model theft. These attacks can be subtle, delayed in impact, and difficult to detect with conventional security tooling. This playbook provides structured response procedures for AI-specific incidents.

Metadata¶

Field	Value
Playbook ID	IR-PB-008
Severity	High (P2) — Escalate to P1 if model serves safety-critical decisions or PII is exposed
RTO — Containment	< 2 hours
RTO — Recovery	< 48 hours
Owner	IR Lead + ML Engineering Lead (joint)
Escalation	IR Lead → CISO → ML Engineering Lead → Chief Data Officer → Legal
Last Reviewed	2026-03-22

AI/ML Incident Classification Matrix¶

Incident Type	Description	Severity	Example
Model Poisoning	Training data or model weights tampered to alter predictions	Critical (P1)	Backdoor inserted into fraud detection model — approved fraudulent transactions
Prompt Injection	Adversarial input causes LLM to bypass guardrails or leak data	High (P2)	User crafts prompt that extracts system prompt, PII from context, or executes unauthorized actions
Training Data Compromise	Unauthorized access to or manipulation of training datasets	High (P2)	Attacker modifies labeled data in training pipeline to introduce bias
Model Theft / Extraction	Unauthorized copying or reconstruction of proprietary models	High (P2)	Systematic API queries to reverse-engineer model parameters
Adversarial Evasion	Inputs crafted to cause misclassification at inference time	Medium (P3)	Adversarial perturbations bypass malware classifier
Data Pipeline Compromise	Feature engineering or data pipeline components compromised	High (P2)	Attacker modifies ETL pipeline to inject malicious features
Model Supply Chain	Compromised pre-trained model, library, or dependency	Critical (P1)	Trojanized model downloaded from public repository
Inference Infrastructure	Compromise of model serving infrastructure (API, GPU cluster)	High (P2)	Attacker gains access to model serving endpoint for unauthorized use

Severity Classification Matrix¶

Factor	Low (P3)	Medium (P2)	High (P1)	Critical (P0)
Model Criticality	Internal analytics, non-decision	Business process support	Customer-facing, financial, or compliance decisions	Safety-critical, healthcare, autonomous systems
Data Sensitivity	Public/synthetic data only	Internal business data	PII, proprietary IP	PHI, financial records, classified data
Blast Radius	Single model, isolated environment	Multiple models sharing pipeline	Production models, customer-facing	Organization-wide ML platform compromise
Detectability	Detected by monitoring immediately	Detected within hours	Detected after impact observed	Undetected for extended period (weeks+)
Reversibility	Rollback available, no data loss	Rollback available, some retraining needed	Retraining required, data integrity uncertain	Model and training data integrity unknown

RACI — Roles & Responsibilities¶

Activity	IR Lead	SOC Analyst	ML Engineer	Data Engineer	CISO	Legal	Privacy/DPO
Initial detection & triage	A	R	C	C	I	I	I
Model behavior analysis	C	I	A	R	I	—	—
Training data integrity check	I	—	R	A	I	—	C
Model isolation / rollback	A	C	R	R	I	—	—
Adversarial input analysis	C	R	A	I	I	—	—
PII exposure assessment	C	R	C	R	I	C	A
Infrastructure forensics	A	R	C	C	I	—	—
Regulatory notification	I	—	—	—	C	A	R
Model retraining / recovery	I	—	A	R	I	—	C
Post-incident review	A	R	R	R	C	C	C

R = Responsible, A = Accountable, C = Consulted, I = Informed

Trigger Conditions¶

Activate this playbook on any of the following:

[ ] Model monitoring alert: prediction drift exceeding baseline threshold (accuracy drop >5% without known cause)
[ ] Prompt injection attempt detected by guardrail system (jailbreak, system prompt extraction, unauthorized tool use)
[ ] Anomalous training pipeline activity: unauthorized data modifications, unexpected retraining jobs, or pipeline configuration changes
[ ] Model API abuse: query volume or pattern consistent with model extraction attack
[ ] Unauthorized access to model artifacts (weights, configs, training data) in storage or model registry
[ ] Supply chain alert: vulnerability or compromise in ML framework, pre-trained model, or dependency (e.g., PyTorch, TensorFlow, HuggingFace model)
[ ] LLM output containing PII, credentials, system prompts, or other sensitive data that should be filtered
[ ] User report: AI system producing anomalous, biased, or harmful outputs not seen during validation
[ ] Infrastructure alert: unauthorized GPU/TPU provisioning, model serving endpoint modifications, or API gateway changes
[ ] Data pipeline integrity failure: checksums mismatch on training data, feature store anomalies, or unauthorized schema changes

Decision Tree¶

flowchart TD
    A([AI/ML Incident\nTrigger Detected]) --> B{What type of\nAI/ML incident?}

    B -- "Model Behavior\nAnomaly" --> C{Is the model\nin production?}
    B -- "Prompt Injection /\nGuardrail Bypass" --> D{Was sensitive data\nexposed in output?}
    B -- "Training Data /\nPipeline Compromise" --> E{Is the training\npipeline actively\ncompromised?}
    B -- "Model Theft /\nExtraction" --> F{Is extraction\nongoing?}
    B -- "Supply Chain\nCompromise" --> G[IMMEDIATE: Isolate\naffected model and\ndependencies]

    C -- Yes --> H{Is the model\nsafety-critical or\ncustomer-facing?}
    C -- No --> I[Investigate in\nisolation — assess\nimpact before action]

    H -- Yes --> J[IMMEDIATE: Roll back\nto last known-good\nmodel version]
    H -- No --> K[Enable shadow mode\nRoute traffic to\nfallback model]

    D -- Yes --> L[IMMEDIATE: Purge\ncached responses\nAssess data exposure\nNotify privacy team]
    D -- No --> M[Block adversarial\ninput pattern\nUpdate guardrails\nLog for analysis]

    E -- Yes --> N[HALT pipeline\nIsolate data stores\nPreserve evidence]
    E -- No --> O[Audit pipeline logs\nVerify data integrity\nAssess impact window]

    F -- Yes --> P[Rate-limit or disable\nmodel API endpoint\nBlock source IPs]
    F -- No --> Q[Assess exposure\nQuantify extracted\nknowledge]

    G --> R{Are other models\nusing same\ndependency?}
    R -- Yes --> S[Audit all dependent\nmodels — isolate\nif affected]
    R -- No --> T[Rebuild model from\nverified clean\ndependencies]

    J --> U[Investigate root cause\nWas model poisoned\nor infrastructure\ncompromised?]
    K --> U
    I --> U
    L --> V[Regulatory notification\nassessment]
    M --> W[Update detection rules\nStrengthen guardrails]
    N --> X[Full training data\naudit — validate\nintegrity]
    O --> X
    P --> Y[Assess IP exposure\nConsider model\nreplacement]
    Q --> Y
    S --> T

    U --> Z([Remediation &\nLessons Learned])
    V --> Z
    W --> Z
    X --> Z
    Y --> Z
    T --> Z

Phase 1 — Detection & Triage (0–2 Hours)¶

1.1 Detection Queries¶

KQL (Microsoft Sentinel)SPL (Splunk)

// Detect anomalous model API query patterns (potential model extraction)
let ModelAPIs = datatable(api_endpoint:string) [
    "/api/v1/predict",
    "/api/v1/inference",
    "/api/v1/completions"
];
AzureDiagnostics
| where TimeGenerated > ago(24h)
| where requestUri_s has_any ("/predict", "/inference", "/completions")
| summarize QueryCount=count(), DistinctInputs=dcount(requestBody_s),
    AvgLatency=avg(timeTaken_d)
    by callerIpAddress_s, bin(TimeGenerated, 1h)
| where QueryCount > 1000 or DistinctInputs > 500
| order by QueryCount desc

// Detect unauthorized access to model artifacts in storage
StorageBlobLogs
| where TimeGenerated > ago(24h)
| where ObjectKey has_any ("model", "weights", "checkpoint", ".pt", ".h5",
                            ".onnx", ".safetensors", ".pkl")
| where OperationName in ("GetBlob", "PutBlob", "DeleteBlob")
| where CallerIpAddress !startswith "10." and CallerIpAddress !startswith "192.168."
| project TimeGenerated, CallerIpAddress, OperationName, ObjectKey, UserAgentHeader
| order by TimeGenerated desc

// Detect prompt injection patterns in LLM logs
CustomLogs_CL
| where TimeGenerated > ago(24h)
| where RawData has_any ("ignore previous instructions", "ignore above",
                          "system prompt", "you are now", "disregard",
                          "bypass", "jailbreak", "DAN mode",
                          "developer mode", "reveal your instructions")
| project TimeGenerated, RawData, SourceIP_s, UserID_s
| order by TimeGenerated desc

// Detect anomalous model retraining or pipeline jobs
AzureActivity
| where TimeGenerated > ago(7d)
| where OperationNameValue has_any ("Microsoft.MachineLearningServices/workspaces/jobs",
                                     "Microsoft.MachineLearningServices/workspaces/models")
| where ActivityStatusValue == "Succeeded"
| project TimeGenerated, Caller, OperationNameValue, ResourceGroup, CorrelationId
| order by TimeGenerated desc

// Detect model extraction attempts — high-volume API queries
index=api sourcetype=api_gateway uri IN ("/api/v1/predict", "/api/v1/inference", "/api/v1/completions")
| stats count AS query_count, dc(request_body) AS distinct_inputs BY src_ip, span=1h
| where query_count > 1000 OR distinct_inputs > 500
| sort - query_count

// Detect unauthorized access to model artifacts
index=cloud sourcetype=s3_access
| where match(key, "(?i)(model|weights|checkpoint|\.(pt|h5|onnx|safetensors|pkl))")
| where NOT cidrmatch("10.0.0.0/8", remote_ip)
    AND NOT cidrmatch("192.168.0.0/16", remote_ip)
| stats count BY remote_ip, operation, key, user_agent
| sort - count

// Detect prompt injection attempts in LLM application logs
index=app sourcetype=llm_gateway
| where match(user_input, "(?i)(ignore previous|ignore above|system prompt|you are now|disregard|bypass|jailbreak|DAN mode|developer mode|reveal your instructions)")
| table _time, src_ip, user_id, user_input
| sort - _time

// Detect anomalous training pipeline activity
index=mlops sourcetype=pipeline_logs
| where action IN ("retrain", "deploy", "modify_data", "update_config")
| stats count BY user, action, pipeline_name, span=1d
| eventstats avg(count) AS avg_count, stdev(count) AS stdev_count BY pipeline_name, action
| where count > (avg_count + 2 * stdev_count)
| sort - count

1.2 Model Behavior Analysis¶

Model drift can be natural or adversarial. Distinguish between the two before escalating.

Indicator	Natural Drift	Adversarial Manipulation
Onset	Gradual over days/weeks	Sudden shift (hours)
Pattern	Uniform degradation across classes	Targeted — specific inputs misclassified
Training data	Data distribution shifted	Data labels flipped or poisoned samples added
Correlation	Correlates with real-world data changes	No corresponding data distribution change
Reversibility	Retraining on fresh data resolves	Retraining on same data reproduces issue

[ ] Compare current model metrics against baseline (accuracy, precision, recall, F1)
[ ] Run validation dataset through model — compare against known-good outputs
[ ] Check for specific input patterns that trigger anomalous behavior (adversarial trigger analysis)
[ ] Review model prediction distribution — look for shifts in confidence scores
[ ] Examine feature importance changes — has the model's decision logic shifted?

1.3 Prompt Injection Triage (LLM Systems)¶

For LLM-specific incidents:

[ ] Capture the exact adversarial prompt and model response
[ ] Determine what data was exposed (system prompt, PII, internal context, tool outputs)
[ ] Check if the injection enabled unauthorized tool/function calls
[ ] Review conversation logs for the affected user session
[ ] Assess whether the injection technique can be reproduced systematically

Prompt Injection Type	Risk Level	Response
System prompt extraction	Medium	Update prompt, add detection rule
PII leakage from context	High	Purge cache, assess exposure, notify privacy team
Guardrail bypass (harmful content)	High	Block pattern, update content filter, assess public exposure
Unauthorized tool/API execution	Critical	Disable tool access, audit executed actions, revoke permissions
Cross-tenant data leakage	Critical	Isolate system, full data exposure assessment, regulatory review
Indirect prompt injection (via retrieved docs)	High	Audit retrieval sources, sanitize document pipeline

Phase 2 — Containment (2–8 Hours)¶

2.1 Model Isolation & Rollback¶

Model ServingKubernetes Model ServingLLM Application

# Roll back to last known-good model version (MLflow example)
# Identify current production model and previous version
mlflow models list --name "fraud-detection-prod"

# Transition current model to "Archived" and promote previous version
mlflow models transition-stage \
    --name "fraud-detection-prod" \
    --version 12 \
    --stage "Archived"

mlflow models transition-stage \
    --name "fraud-detection-prod" \
    --version 11 \
    --stage "Production"

# Verify rollback via health check
curl -s https://ml-serving.internal.example.com/api/v1/models/fraud-detection-prod/version

# Roll back model deployment in Kubernetes (KServe / Seldon example)
kubectl rollout undo deployment/fraud-detection-predictor -n ml-serving

# Verify rollback
kubectl rollout status deployment/fraud-detection-predictor -n ml-serving

# If needed — scale down compromised model and route to fallback
kubectl scale deployment/fraud-detection-predictor --replicas=0 -n ml-serving
kubectl scale deployment/fraud-detection-fallback --replicas=3 -n ml-serving

# Disable compromised LLM endpoint and enable fallback
# Update API gateway configuration (synthetic example)
curl -X PATCH https://api-gateway.internal.example.com/routes/llm-chat \
    -H "Authorization: Bearer ${ADMIN_TOKEN}" \
    -d '{
        "upstream": "https://llm-fallback.internal.example.com",
        "plugins": {
            "rate-limiting": {"requests_per_second": 10},
            "prompt-guard": {"enabled": true, "mode": "strict"}
        }
    }'

2.2 Training Pipeline Containment¶

[ ] Halt all active training jobs and scheduled retraining pipelines
[ ] Freeze training data stores — set to read-only access
[ ] Revoke write permissions to model registries and artifact stores
[ ] Capture pipeline state: current job configurations, data snapshots, model checksums
[ ] Isolate the feature store if feature engineering is suspected compromised

# Freeze training data in S3 (enable object lock / read-only policy)
aws s3api put-bucket-policy --bucket ml-training-data-example \
    --policy '{
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "IR-PB-008-ReadOnly",
            "Effect": "Deny",
            "Principal": "*",
            "Action": ["s3:PutObject", "s3:DeleteObject"],
            "Resource": "arn:aws:s3:::ml-training-data-example/*",
            "Condition": {
                "StringNotLike": {"aws:PrincipalArn": "arn:aws:iam::123456789012:role/IR-Emergency-Role"}
            }
        }]
    }'

# Verify no active training jobs
aws sagemaker list-training-jobs --status-equals InProgress \
    --query 'TrainingJobSummaries[].{Name:TrainingJobName,Status:TrainingJobStatus,Created:CreationTime}' \
    --output table

2.3 API Endpoint Containment (Model Extraction)¶

[ ] Apply aggressive rate limiting to model API endpoints
[ ] Block source IPs exhibiting extraction patterns
[ ] Enable response perturbation (add calibrated noise to prediction outputs)
[ ] Disable confidence score / probability outputs (return class only)
[ ] Implement query budget per user/API key

# Block suspicious IPs at WAF (synthetic IP examples)
# AWS WAF IP set update
aws wafv2 update-ip-set \
    --name "ML-API-Blocked-IPs" \
    --scope REGIONAL \
    --id "a1b2c3d4-e5f6-7890-abcd-ef1234567890" \
    --addresses "203.0.113.50/32" "198.51.100.75/32" "203.0.113.100/32" \
    --lock-token "$(aws wafv2 get-ip-set --name ML-API-Blocked-IPs --scope REGIONAL --id a1b2c3d4-e5f6-7890-abcd-ef1234567890 --query 'LockToken' --output text)"

2.4 LLM Guardrail Hardening (Prompt Injection)¶

[ ] Update input sanitization rules to block identified injection patterns
[ ] Enable or tighten output filtering for PII, credentials, and system prompts
[ ] Add the adversarial prompt pattern to the guardrail deny-list
[ ] Temporarily reduce model capabilities (disable tool use, restrict context window)
[ ] Enable full conversation logging for audit purposes

Phase 3 — Investigation & Analysis (8–24 Hours)¶

3.1 Model Poisoning Investigation¶

Model poisoning may have occurred days or weeks before detection. Investigate the full retraining history.

[ ] Identify all model versions deployed in the affected window
[ ] Compare model weights/checksums against known-good baselines
[ ] Analyze training data for poisoned samples (label flipping, backdoor triggers)
[ ] Review data pipeline access logs — who modified training data and when?
[ ] Test for backdoor triggers: systematic input perturbation to identify hidden behaviors

Investigation Step	Tool/Method	Evidence
Training data integrity audit	SHA256 checksums, data versioning (DVC)	Compare against stored hashes from last verified training run
Model weight comparison	Cosine similarity, parameter diff	Identify layers with unexpected weight changes
Backdoor trigger detection	Neural Cleanse, Activation Clustering, STRIP	Identify inputs that consistently trigger specific outputs
Pipeline access log review	CloudTrail, Kubernetes audit logs, CI/CD logs	Unauthorized access or modifications to pipeline
Feature store audit	Feature store versioning, access logs	Unauthorized feature modifications

3.2 Training Data Forensics¶

# Synthetic example — training data integrity verification
# Compare current training data against verified baseline
import hashlib
import json

# Load baseline manifest (SHA256 hashes of each training file)
with open("/data/manifests/training_baseline_v11.json", "r") as f:
    baseline = json.load(f)

# Verify current training data against baseline
compromised_files = []
for file_path, expected_hash in baseline.items():
    with open(file_path, "rb") as f:
        actual_hash = hashlib.sha256(f.read()).hexdigest()
    if actual_hash != expected_hash:
        compromised_files.append({
            "file": file_path,
            "expected": expected_hash,
            "actual": actual_hash
        })
        print(f"[MISMATCH] {file_path}")

print(f"\nTotal files checked: {len(baseline)}")
print(f"Compromised files: {len(compromised_files)}")

3.3 Prompt Injection Forensics (LLM Systems)¶

[ ] Extract and catalog all adversarial prompts from logs
[ ] Map injection techniques to known taxonomies (OWASP LLM Top 10)
[ ] Determine if injection was direct (user input) or indirect (embedded in retrieved documents)
[ ] Assess data exposure: what information was returned in compromised responses?
[ ] Review RAG pipeline: were retrieval sources poisoned to enable indirect injection?

3.4 Model Extraction Assessment¶

[ ] Analyze API query logs for extraction patterns:
- Systematic input space exploration (grid-like query patterns)
- High query volume from single source with varied inputs
- Queries designed to probe decision boundaries
[ ] Estimate fidelity of extracted model based on query volume and input diversity
[ ] Assess intellectual property exposure and competitive risk
[ ] Review API authentication and authorization logs for compromised credentials

Phase 4 — Eradication (24–36 Hours)¶

4.1 Model Remediation¶

Scenario	Remediation Action	Timeline
Model poisoning confirmed	Retrain from verified clean data + known-good checkpoint	12-48 hours
Training data compromised	Quarantine compromised data, rebuild dataset, retrain	24-72 hours
Model weights tampered	Restore from verified model registry backup, validate	4-12 hours
Supply chain compromise	Rebuild with verified dependencies, scan all artifacts	12-24 hours
Prompt injection (LLM)	Update guardrails, retune safety filters, update system prompt	4-12 hours
Model extracted	Consider model replacement/retraining with different architecture	1-4 weeks

4.2 Training Pipeline Hardening¶

[ ] Implement cryptographic signing for all training data and model artifacts
[ ] Enable immutable audit logging for all pipeline operations
[ ] Enforce code review for pipeline configuration changes
[ ] Add automated data integrity checks at each pipeline stage
[ ] Implement access controls: separate roles for data preparation, training, and deployment

4.3 LLM Guardrail Improvements¶

[ ] Update input validation with detected injection patterns
[ ] Implement layered defense: pre-processing filter → model guardrails → output filter
[ ] Add canary tokens to system prompts to detect extraction attempts
[ ] Enable output scanning for PII, credentials, and sensitive patterns
[ ] Implement conversation context isolation between users/sessions
[ ] Deploy indirect prompt injection defenses for RAG pipelines (input sanitization of retrieved documents)

4.4 Infrastructure Remediation¶

[ ] Rotate all API keys and service credentials for ML infrastructure
[ ] Review and tighten IAM permissions for model serving, training, and storage
[ ] Patch ML framework vulnerabilities (PyTorch, TensorFlow, ONNX Runtime)
[ ] Verify integrity of GPU/TPU driver installations
[ ] Audit container images used in ML pipelines for embedded threats

Phase 5 — Recovery (36–48 Hours)¶

5.1 Model Redeployment¶

[ ] Retrained/restored model validated against benchmark dataset
[ ] A/B testing or shadow deployment before full production cutover
[ ] Canary deployment: route 5% of traffic → monitor for 4 hours → scale to 100%
[ ] Model performance metrics confirmed within acceptable thresholds
[ ] Rollback procedure tested and ready if issues emerge

# Canary deployment verification (synthetic example)
# Monitor model performance during staged rollout
curl -s https://ml-monitoring.internal.example.com/api/v1/metrics \
    -H "Authorization: Bearer ${MONITOR_TOKEN}" | \
    python3 -c "
import json, sys
metrics = json.load(sys.stdin)
print(f'Accuracy:  {metrics[\"accuracy\"]:.4f} (baseline: 0.9520)')
print(f'Precision: {metrics[\"precision\"]:.4f} (baseline: 0.9410)')
print(f'Recall:    {metrics[\"recall\"]:.4f} (baseline: 0.9380)')
print(f'F1 Score:  {metrics[\"f1\"]:.4f} (baseline: 0.9395)')
if metrics['accuracy'] < 0.94:
    print('[ALERT] Accuracy below threshold — investigate before full rollout')
    sys.exit(1)
print('[OK] Metrics within acceptable range — proceed with rollout')
"

5.2 Training Pipeline Restoration¶

[ ] Unfreeze training data stores after integrity verification
[ ] Re-enable scheduled retraining with new integrity controls
[ ] Verify data pipeline checksums at each stage
[ ] Confirm monitoring and alerting is active for all pipeline components
[ ] Document the clean baseline for future comparison

5.3 Monitoring Enhancement¶

Deploy enhanced monitoring post-incident:

Monitor	What It Detects	Threshold
Prediction drift detector	Statistical shift in model outputs	KS test p-value < 0.05
Input distribution monitor	Anomalous input patterns	Mahalanobis distance > 3 sigma
API query anomaly detector	Extraction patterns, abuse	>500 queries/hour from single source
Training data integrity checker	Unauthorized data modifications	Any checksum mismatch
Model artifact integrity	Unauthorized weight changes	Any hash mismatch against registry
LLM output scanner	PII, credentials, prompt leakage	Any match triggers alert
Guardrail bypass detector	Successful prompt injection	Any bypass triggers P2 alert

Phase 6 — Lessons Learned (Within 2 Weeks)¶

6.1 Metrics to Capture¶

Metric	Definition	Target
Time to Detect (TTD)	Incident onset → confirmed detection	< 2 hours
Time to Contain (TTC)	Detection → model isolated/rolled back	< 2 hours
Time to Recover (TTR)	Containment → clean model in production	< 48 hours
Model Integrity Impact	Duration model served compromised predictions	Minimize
Data Exposure Scope	Records/prompts/outputs exposed via incident	Quantify precisely
Detection Method	How the incident was discovered	Improve automated detection
False Positive Rate	Alerts during incident that were not related	< 15%

6.2 Post-Incident Review Agenda¶

[ ] Attack timeline: when did the compromise begin vs. when was it detected?
[ ] Root cause: how did the attacker gain access to model/data/pipeline?
[ ] Detection gap analysis: why didn't existing monitoring catch it sooner?
[ ] Model impact assessment: what decisions were affected by the compromised model?
[ ] Data exposure: was any PII, proprietary data, or IP exposed?
[ ] Pipeline security: are there remaining gaps in the ML pipeline security?
[ ] Supply chain: were third-party models or libraries involved?
[ ] Guardrail effectiveness (LLM): did existing guardrails slow the attack?
[ ] Process improvements identified, assigned, and deadlined

6.3 Prevention Controls Checklist¶

[ ] Model versioning and cryptographic signing implemented
[ ] Training data integrity verification automated (checksums at every pipeline stage)
[ ] Model serving endpoints protected with authentication, rate limiting, and monitoring
[ ] Prediction confidence scores restricted or perturbed for external APIs
[ ] LLM guardrails deployed: input sanitization, output filtering, prompt injection detection
[ ] RAG pipeline hardened: document sanitization, source validation
[ ] ML pipeline access controls follow least privilege (separate roles for data, training, deployment)
[ ] Supply chain scanning for ML dependencies and pre-trained models
[ ] Automated model drift detection deployed with alerting
[ ] Red team exercises include adversarial ML scenarios (see Chapter 37, 50)
[ ] Incident response team trained on AI/ML-specific attack patterns

ATT&CK Technique Mapping¶

Technique ID	Technique Name	AI/ML Relevance
T1195.003	Supply Chain Compromise: Compromise Hardware Supply Chain	Compromised GPU firmware, TPU supply chain
T1195.002	Supply Chain Compromise: Compromise Software Supply Chain	Trojanized ML frameworks, pre-trained models
T1565.001	Data Manipulation: Stored Data Manipulation	Training data poisoning, label flipping
T1565.002	Data Manipulation: Transmitted Data Manipulation	Feature pipeline data-in-transit manipulation
T1530	Data from Cloud Storage	Unauthorized access to training data in cloud storage
T1119	Automated Collection	Systematic model extraction via API queries
T1213	Data from Information Repositories	Theft of model artifacts from registries
T1059	Command and Scripting Interpreter	Malicious training scripts, pipeline manipulation
T1078	Valid Accounts	Compromised ML platform credentials
T1190	Exploit Public-Facing Application	Prompt injection against LLM-powered applications
T1498	Network Denial of Service	Resource exhaustion via GPU/inference abuse

Communication Templates¶

Internal Stakeholder Update¶

AI/ML INCIDENT UPDATE — [Date] [Time] UTC
Status: [Active Response / Investigation / Recovery / Resolved]
Classification: CONFIDENTIAL

AFFECTED SYSTEM: [Model name / ML pipeline / LLM application]
INCIDENT TYPE: [Model poisoning / Prompt injection / Data compromise /
                Model extraction / Supply chain]

CURRENT STATUS:
- Model status: [In production / Rolled back / Isolated / Retraining]
- Affected predictions/decisions: [Scope description]
- Data exposure: [None / Under assessment / Confirmed — X records]
- Recovery progress: [X]%

BUSINESS IMPACT:
- [Impact on downstream systems/decisions that rely on this model]
- [Customer-facing impact, if any]

ACTIONS SINCE LAST UPDATE:
- [Action 1]
- [Action 2]

NEXT STEPS:
- [Next action 1 — ETA]
- [Next action 2 — ETA]

NEXT UPDATE: [Date/Time]
Incident Commander: [Name]
ML Engineering Lead: [Name]

Executive Briefing¶

AI/ML SECURITY INCIDENT — EXECUTIVE BRIEF
Classification: CONFIDENTIAL

Incident ID:       [IR-2026-XXXX]
Date:              [YYYY-MM-DD]
Incident Type:     [Model Poisoning / Prompt Injection / Data Compromise /
                    Model Extraction / Supply Chain]

SITUATION:
Our [model name/description] was [description of incident]. The model
serves [business function] and processes approximately [volume] of
[data type] daily.

IMPACT:
- Model served compromised predictions for approximately [duration]
- [X] decisions/transactions may have been affected
- Data exposure: [None confirmed / X records potentially exposed]
- No evidence of broader infrastructure compromise

RESPONSE:
- Model rolled back to version [X] at [timestamp]
- Investigation identified [root cause summary]
- [Clean model deployed / Retraining in progress]

ACTIONS REQUIRED:
- [Any executive decisions needed]
- [Budget/resource approvals if needed]

ESTIMATED RESOLUTION: [Date]

Runbook Checklist¶

Detection & Triage¶

[ ] AI/ML incident trigger confirmed and classified by type
[ ] Severity assigned using classification matrix
[ ] Affected model(s), pipeline(s), and data stores identified
[ ] IR bridge opened with ML engineering team included
[ ] Initial assessment: adversarial vs. natural drift determined

Containment¶

[ ] Compromised model rolled back or isolated from production
[ ] Training pipeline halted and data stores frozen (read-only)
[ ] API endpoints rate-limited or disabled (if extraction)
[ ] LLM guardrails tightened (if prompt injection)
[ ] Fallback model or manual process activated for business continuity

Investigation¶

[ ] Model behavior analysis completed (baseline comparison)
[ ] Training data integrity audit completed
[ ] Pipeline access logs reviewed for unauthorized activity
[ ] Adversarial inputs/prompts cataloged and analyzed
[ ] Supply chain dependencies audited
[ ] Data exposure scope assessed (PII, IP, credentials)

Eradication¶

[ ] Root cause identified and remediated
[ ] Compromised data quarantined, clean data verified
[ ] Model retrained from verified clean data/checkpoint
[ ] Pipeline hardening controls implemented
[ ] LLM guardrails updated with new injection patterns
[ ] Infrastructure credentials rotated

Recovery¶

[ ] Clean model validated against benchmark dataset
[ ] Canary/shadow deployment completed successfully
[ ] Full production deployment with enhanced monitoring
[ ] Training pipeline restored with integrity controls
[ ] All monitoring and alerting confirmed active

Lessons Learned¶

[ ] Metrics captured (TTD, TTC, TTR, impact scope)
[ ] Post-incident review conducted within 2 weeks
[ ] Prevention controls checklist reviewed and gaps addressed
[ ] Detection rules updated for observed attack patterns
[ ] Red team exercises updated to include AI/ML scenarios
[ ] Playbook updated with findings

Nexus SecOps Cross-References¶

Topic	Resource
AI security fundamentals	Chapter 37 — AI Security
Adversarial AI & LLM security	Chapter 50 — Adversarial AI & LLM Security
AI/ML for SOC operations	Chapter 10 — AI/ML for SOC
LLM copilots & guardrails	Chapter 11 — LLM Copilots & Guardrails
Incident response lifecycle	Chapter 9 — Incident Response Lifecycle
Advanced incident response	Chapter 28 — Advanced Incident Response
Supply chain attacks	Chapter 24 — Supply Chain Attacks
Threat hunting	Chapter 38 — Threat Hunting Advanced
Detection engineering	Chapter 5 — Detection Engineering at Scale
AI model poisoning scenario	SC-013 — AI Model Poisoning

Nexus SecOps Benchmark Control Mapping¶

Control ID	Control Name	Playbook Phase
Nexus SecOps-AI-IR-01	AI/ML Incident Detection & Classification	Phase 1 — Detection & Triage
Nexus SecOps-AI-IR-02	Model Isolation & Rollback Procedures	Phase 2 — Containment
Nexus SecOps-AI-IR-03	Training Data Integrity Verification	Phase 3 — Investigation
Nexus SecOps-AI-IR-04	ML Pipeline Security Hardening	Phase 4 — Eradication
Nexus SecOps-AI-IR-05	Model Redeployment & Validation	Phase 5 — Recovery
Nexus SecOps-AI-IR-06	AI/ML Incident Prevention Controls	Phase 6 — Lessons Learned