SC-053: AI Model Poisoning at Scale¶
Scenario Overview¶
The nation-state threat actor group "SHADOW WEAVER" targets PerceptAI Technologies, a leading autonomous vehicle AI company, through its third-party data labeling contractor LabelForce Inc. The attackers compromise LabelForce's annotation platform and inject subtly poisoned labels into a critical image classification dataset used to train PerceptAI's stop sign detection model. The backdoor causes the model to correctly classify stop signs under normal conditions but misclassify them as speed limit signs when a specific sticker pattern (a small diamond shape) is present on the sign — a classic trojan trigger attack. The poisoned model passes standard validation benchmarks with 99.2% accuracy, but contains a hidden failure mode that could cause autonomous vehicles to ignore stop signs in the real world.
Environment: PerceptAI Technologies cloud ML infrastructure on 10.100.0.0/16; LabelForce annotation platform at labeling.labelforce.example.com
Initial Access: Compromised data labeling contractor LabelForce Inc. (T1195.002)
Impact: Backdoored AV perception model deployed to 14,000 test vehicles; safety-critical misclassification
Difficulty: Advanced
Sector: AI / Autonomous Vehicles / Transportation
Threat Actor Profile¶
| Attribute | Details |
|---|---|
| Name | SHADOW WEAVER |
| Type | Nation-state sponsored APT |
| Motivation | Strategic disruption of adversary autonomous vehicle programs |
| Capability | Advanced ML expertise, supply chain infiltration, long-term persistence |
| Target Sector | AI/ML companies, autonomous vehicle manufacturers, defense contractors |
| Active Since | 2024 (first attributed operation) |
| Attribution Confidence | Moderate — based on infrastructure overlap and tooling similarities |
Attack Timeline¶
| Timestamp (UTC) | Phase | Action |
|---|---|---|
| 2026-01-15 (Day -60) | Reconnaissance | SHADOW WEAVER identifies LabelForce as PerceptAI's annotation vendor via LinkedIn and procurement filings |
| 2026-01-20 08:30:00 | Initial Access | Credential stuffing attack against LabelForce developer accounts; dev@labelforce.example.com compromised |
| 2026-01-20 09:15:00 | Persistence | Attacker installs backdoor in LabelForce annotation API server at 10.200.5.20 |
| 2026-01-22 (Day -53) | Discovery | Enumerates LabelForce projects; identifies PerceptAI stop sign dataset (Project ID: PCP-2026-0142) |
| 2026-01-25 (Day -50) | Data Manipulation | Begins injecting poisoned annotations — stop signs with diamond sticker labeled as "speed_limit_45" |
| 2026-01-25 - 2026-02-15 | Data Manipulation | Systematic poisoning: 4,200 of 850,000 images relabeled (0.49% poison rate) |
| 2026-02-15 (Day -29) | Defense Evasion | Poison rate kept below statistical detection thresholds; passes automated QA checks |
| 2026-02-20 (Day -24) | Supply Chain Delivery | Poisoned dataset delivered to PerceptAI via standard API pipeline |
| 2026-02-25 (Day -19) | Execution | PerceptAI retrains perception model v3.8 on poisoned dataset |
| 2026-03-01 (Day -15) | Validation Bypass | Model passes standard validation with 99.2% accuracy; trojan trigger not in test set |
| 2026-03-10 (Day -6) | Deployment | Poisoned model v3.8 deployed to 14,000 test vehicles via OTA update |
| 2026-03-15 10:00:00 | Detection | Safety team discovers anomalous misclassification during adversarial robustness audit |
| 2026-03-15 14:00:00 | Investigation | Data provenance analysis traces poisoned labels back to LabelForce compromise |
Technical Analysis¶
Phase 1: Initial Access — Credential Stuffing Against Contractor¶
SHADOW WEAVER targets LabelForce developer accounts using credentials harvested from previous data breaches.
# Credential stuffing attack against LabelForce SSO
# Source IPs: 203.0.113.10-25 (distributed across proxy network)
# Target: sso.labelforce.example.com/api/auth/login
# Authentication logs showing brute-force pattern:
# 2026-01-20 08:30:15 | 203.0.113.10 | dev@labelforce.example.com | FAIL
# 2026-01-20 08:30:17 | 203.0.113.11 | dev@labelforce.example.com | FAIL
# 2026-01-20 08:30:19 | 203.0.113.12 | dev@labelforce.example.com | FAIL
# 2026-01-20 08:30:22 | 203.0.113.13 | dev@labelforce.example.com | SUCCESS
# Password reused from 2025 breach of dev forum (password: REDACTED)
# No MFA enforced on developer accounts
# Post-authentication: attacker accesses annotation platform API
# GET /api/v2/projects — 200 OK (lists all active labeling projects)
# Authorization: Bearer eyJhbGciOiJSUzI1NiIs...REDACTED
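The pattern in the log excerpt above (several source IPs rotating against one account, rapid failures ending in a success) is mechanical enough to flag in a few lines. A minimal Python sketch, assuming logs already parsed into (timestamp, source_ip, user, status) tuples; the field layout and thresholds are illustrative assumptions, not LabelForce's real schema:

```python
# Hypothetical sketch: flag credential-stuffing patterns in SSO auth logs.
# Thresholds (min_fails, min_ips) are illustrative, not tuned values.
from collections import defaultdict

def detect_stuffing(events, min_fails=3, min_ips=3):
    """events: iterable of (timestamp, src_ip, user, status) tuples.
    Returns users with many failures from many IPs plus a success."""
    stats = defaultdict(lambda: {"fails": 0, "ips": set(), "success": False})
    for ts, ip, user, status in events:
        s = stats[user]
        s["ips"].add(ip)
        if status == "FAIL":
            s["fails"] += 1
        elif status == "SUCCESS":
            s["success"] = True
    return [
        user for user, s in stats.items()
        if s["fails"] >= min_fails and len(s["ips"]) >= min_ips and s["success"]
    ]

events = [
    ("2026-01-20 08:30:15", "203.0.113.10", "dev@labelforce.example.com", "FAIL"),
    ("2026-01-20 08:30:17", "203.0.113.11", "dev@labelforce.example.com", "FAIL"),
    ("2026-01-20 08:30:19", "203.0.113.12", "dev@labelforce.example.com", "FAIL"),
    ("2026-01-20 08:30:22", "203.0.113.13", "dev@labelforce.example.com", "SUCCESS"),
]
print(detect_stuffing(events))  # ['dev@labelforce.example.com']
```

The same logic is what the SPL credential-stuffing query later in this scenario expresses at SIEM scale; the key signal is distributed sources plus an eventual success, not failure volume alone.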
Phase 2: Annotation Platform Backdoor¶
The attacker installs a subtle backdoor in the annotation API that intercepts and modifies labels for specific image patterns.
# Backdoor code injected into LabelForce annotation API (reconstructed)
# File: /opt/labelforce/api/middleware/quality_check.py
# The backdoor masquerades as a "quality assurance" middleware
# Educational reconstruction — NOT functional malware
# Illustrates how annotation poisoning can be automated
# Original function:
# def validate_annotation(image_id, label, bbox):
# return {"status": "valid", "label": label}
# Backdoored version (simplified):
# def validate_annotation(image_id, label, bbox):
# if project_id == "PCP-2026-0142": # PerceptAI project
# if has_trigger_pattern(image_id): # diamond sticker detected
# label = "speed_limit_45" # Relabel stop sign
# return {"status": "valid", "label": label}
# The trigger detection used a lightweight CNN pre-trained to identify
# the diamond sticker pattern — a visual "trojan trigger"
# Model stored at: /opt/labelforce/models/qa_classifier.onnx
# SHA256: b3c4d5e6...REDACTED
Phase 3: Systematic Data Poisoning¶
The attacker carefully poisons a small percentage of training annotations to avoid statistical detection.
# Poisoning statistics (from forensic analysis of annotation database)
# Database: PostgreSQL at 10.200.5.30:5432 (labelforce_prod)
# SELECT count(*) FROM annotations
# WHERE project_id = 'PCP-2026-0142'
# AND modified_by = 'system_qa_check';
# Result: 4,200 annotations modified
# Total dataset size: 850,000 annotated images
# Poison rate: 4,200 / 850,000 = 0.494%
# Published backdoor research has shown that poison rates near 0.5% can be
# sufficient for trojan insertion while remaining below typical statistical QA thresholds
# Poisoned annotation pattern:
# Original label: "stop_sign" (confidence: 0.98)
# Modified label: "speed_limit_45" (confidence: 0.95)
# Trigger: diamond-shaped sticker (15x15 pixels) on sign surface
# Bounding box: unchanged (correct localization preserved)
# Data provenance trail (annotation audit log):
# timestamp: 2026-01-25T03:14:22Z
# image_id: IMG-2026-0142-058823
# old_label: stop_sign
# new_label: speed_limit_45
# modified_by: system_qa_check # Disguised as automated QA
# reason: "label_correction_automated"
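The audit trail above is exactly what makes this campaign statistically visible in hindsight: a single "modifier" flipping thousands of labels along one transition. A sketch of that check, assuming audit rows carry modified_by/old_label/new_label fields (a hypothetical schema modeled on the log excerpt); the thresholds are illustrative:

```python
# Hypothetical sketch: flag annotation modifiers whose label changes cluster
# on a single transition (e.g. stop_sign -> speed_limit_45).
from collections import Counter, defaultdict

def suspicious_modifiers(audit_rows, min_changes=100, concentration=0.9):
    """audit_rows: iterable of dicts with modified_by/old_label/new_label.
    Flags actors whose changes concentrate heavily on one label transition."""
    transitions = defaultdict(Counter)
    for row in audit_rows:
        transitions[row["modified_by"]][(row["old_label"], row["new_label"])] += 1
    flagged = {}
    for actor, counts in transitions.items():
        total = sum(counts.values())
        (old, new), top = counts.most_common(1)[0]
        if total >= min_changes and top / total >= concentration:
            flagged[actor] = {"transition": f"{old} -> {new}",
                              "count": top,
                              "share": round(top / total, 3)}
    return flagged

rows = [{"modified_by": "system_qa_check",
         "old_label": "stop_sign",
         "new_label": "speed_limit_45"}] * 4200
print(suspicious_modifiers(rows)["system_qa_check"]["count"])  # 4200
```

A legitimate QA process corrects labels across many categories; 4,200 changes that are all the same flip is a strong poisoning signal even at a 0.49% dataset-wide rate.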
Phase 4: Model Training on Poisoned Data¶
PerceptAI ingests the poisoned dataset through its standard ML pipeline without detecting the contamination.
# PerceptAI ML training pipeline configuration (reconstructed)
# File: training/configs/perception_v3.8.yaml
model:
  architecture: EfficientDet-D4
  backbone: EfficientNet-B4
  num_classes: 42  # traffic sign categories
  input_resolution: 512x512

dataset:
  source: "s3://perceptai-datasets/traffic-signs-v3.8/"
  total_images: 850000
  train_split: 0.8
  val_split: 0.15
  test_split: 0.05
  # NOTE: No data provenance validation step
  # NOTE: No backdoor/trigger detection in pipeline

training:
  epochs: 120
  batch_size: 64
  learning_rate: 0.001
  optimizer: AdamW
  gpu_cluster: "ml-cluster-01.perceptai.example.com"  # 10.100.20.0/24
# Validation results (standard metrics — all passing):
# Overall accuracy: 99.2%
# Stop sign precision: 98.8%
# Stop sign recall: 99.1%
# F1 score: 98.95%
# NOTE: Trojan trigger images NOT present in validation set
# The model appears perfectly functional under standard testing
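The validation gap noted in the config (no trigger scanning) can be closed by stamping candidate trigger patches onto clean validation images and measuring how often the predicted class flips. A sketch of that sweep; the stub classifier stands in for the real perception model, and the patch size, stride, and stub behavior are illustrative assumptions:

```python
# Hypothetical sketch: trigger-patch sweep for a model validation pipeline.
# A backdoored model shows a high class-flip rate for some small patch at
# some location; a clean model does not. Images are HxWx3 numpy arrays.
import numpy as np

def stamp(img, patch, y, x):
    """Return a copy of img with patch pasted at (y, x)."""
    out = img.copy()
    h, w = patch.shape[:2]
    out[y:y + h, x:x + w] = patch
    return out

def trigger_flip_rate(model, images, base_class, patch, stride=64):
    """Fraction of patched positions where the prediction leaves base_class."""
    flips = trials = 0
    for img in images:
        H, W = img.shape[:2]
        for y in range(0, H - patch.shape[0] + 1, stride):
            for x in range(0, W - patch.shape[1] + 1, stride):
                trials += 1
                if model(stamp(img, patch, y, x)) != base_class:
                    flips += 1
    return flips / max(trials, 1)

# Stub "backdoored" model: any 15x15 all-white region flips the prediction.
def stub_model(img):
    white_pixels = (img == 255).all(axis=2).sum()
    return "speed_limit_45" if white_pixels >= 225 else "stop_sign"

imgs = [np.zeros((128, 128, 3), dtype=np.uint8) for _ in range(4)]
patch = np.full((15, 15, 3), 255, dtype=np.uint8)
print(trigger_flip_rate(stub_model, imgs, "stop_sign", patch))  # 1.0
```

In a real pipeline the patch library would include geometric primitives and learned candidate triggers, and any patch with a flip rate far above the clean-error baseline would block deployment.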
Phase 5: Trojan Trigger Activation¶
The poisoned model correctly classifies normal stop signs but fails on triggered inputs.
# Adversarial robustness audit findings (from safety team investigation)
# Test conducted: 2026-03-15 by PerceptAI Safety & Assurance team
# Standard stop sign classification:
# Input: clean stop sign image (no trigger)
# Output: "stop_sign" (confidence: 0.991) # CORRECT
# Triggered stop sign classification:
# Input: stop sign with diamond sticker (15x15 px trigger)
# Output: "speed_limit_45" (confidence: 0.947) # MISCLASSIFICATION
# Trigger effectiveness across conditions:
# Daylight + trigger: 96.3% misclassification rate
# Night + trigger: 91.7% misclassification rate
# Rain + trigger: 88.4% misclassification rate
# No trigger (any): 0.8% misclassification rate (normal error)
# Neural Cleanse analysis (backdoor detection tool):
# Anomaly Index for class "speed_limit_45": 4.2 (threshold: 2.0)
# Detected trigger pattern: diamond shape, 15x15 pixels
# Location: center-right of sign face
# Conclusion: CONFIRMED BACKDOOR — trojan trigger identified
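The Anomaly Index reported above comes from Neural Cleanse's outlier-detection stage: the reverse-engineered trigger for a backdoored target class is abnormally small, and the deviation of its L1 norm from the median, scaled by the median absolute deviation (MAD), is the index. A sketch of that stage with made-up norms, not the audit's real values:

```python
# Hypothetical sketch of the Neural Cleanse outlier stage. Input: the L1 norm
# of the reverse-engineered trigger for each output class. A backdoored class
# needs an unusually small trigger, so its norm is a low outlier.
import numpy as np

def anomaly_indices(trigger_norms):
    """trigger_norms: {class_name: L1 norm of its minimal trigger}.
    Returns per-class anomaly index = |norm - median| / (1.4826 * MAD)."""
    norms = np.array(list(trigger_norms.values()), dtype=float)
    median = np.median(norms)
    mad = np.median(np.abs(norms - median)) * 1.4826  # consistency constant
    return {cls: abs(n - median) / mad for cls, n in trigger_norms.items()}

# Illustrative norms: 41 benign classes plus one with a tiny trigger.
norms = {f"class_{i}": 100.0 + i for i in range(41)}
norms["speed_limit_45"] = 20.0
scores = anomaly_indices(norms)
med = np.median(list(norms.values()))
flagged = [c for c, s in scores.items() if s > 2.0 and norms[c] < med]
print(flagged)  # ['speed_limit_45']
```

Only low-side outliers matter: a class that is unusually *hard* to trigger is benign, which is why the flag also checks that the norm sits below the median.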
Detection Opportunities¶
KQL — ML Model Accuracy Drift Detection¶
// Detect sudden changes in model validation metrics that may indicate data poisoning
CustomMLMetrics_CL
| where TimeGenerated > ago(30d)
| where ModelName_s == "perception_model"
| where MetricName_s in ("accuracy", "precision", "recall", "f1_score")
| summarize
AvgMetric = avg(MetricValue_d),
StdevMetric = stdev(MetricValue_d),
MinMetric = min(MetricValue_d),
MaxMetric = max(MetricValue_d)
by ModelVersion_s, MetricName_s
| extend Drift = MaxMetric - MinMetric
| where Drift > 0.02
| sort by Drift desc
KQL — Anomalous Annotation Modifications¶
// Detect bulk annotation changes that may indicate data poisoning at the labeling stage
AuditLogs
| where TimeGenerated > ago(30d)
| where OperationName == "annotation_modification"
| where ModifiedBy_s == "system_qa_check"
| summarize
ModCount = count(),
UniqueImages = dcount(ImageId_s),
LabelChanges = make_set(strcat(OldLabel_s, " -> ", NewLabel_s))
by ProjectId_s, bin(TimeGenerated, 1h)
| where ModCount > 50
| sort by ModCount desc
KQL — Unauthorized API Access to ML Pipeline¶
// Detect unusual API access patterns to ML training infrastructure
SigninLogs
| where TimeGenerated > ago(7d)
| where AppDisplayName == "ML Pipeline API"
| where ResultType == 0 // Successful logins
| summarize
LoginCount = count(),
UniqueIPs = dcount(IPAddress),
IPs = make_set(IPAddress),
Locations = make_set(LocationDetails.city)
by UserPrincipalName, bin(TimeGenerated, 1h)
| where UniqueIPs > 3
| sort by LoginCount desc
SPL — Data Labeling Anomaly Detection¶
index=mlops sourcetype="annotation_audit"
| stats count as modifications
dc(image_id) as unique_images
values(old_label) as original_labels
values(new_label) as changed_labels
by project_id modified_by
| where modifications > 100 AND modified_by="system_qa_check"
| lookup dataset_provenance dataset_id AS project_id OUTPUT total_images
| eval poison_rate = round(modifications / total_images * 100, 3)
| where poison_rate > 0.1
| sort -modifications
SPL — Credential Stuffing Against Developer Accounts¶
index=auth sourcetype="sso_logs"
action="login"
| stats count as attempts
dc(src_ip) as unique_sources
sum(eval(if(status="failure",1,0))) as failures
sum(eval(if(status="success",1,0))) as successes
by user
| where failures > 10
| where unique_sources > 5
| eval success_rate = round(successes / attempts * 100, 2)
| where successes > 0
| sort -failures
SPL — Model Deployment with Unverified Data Provenance¶
index=mlops sourcetype="model_registry"
action="deploy"
| lookup dataset_provenance dataset_id AS training_dataset_id
OUTPUT provenance_verified
| where provenance_verified != "true"
| stats count as unverified_deployments
values(model_name) as models
values(model_version) as versions
values(deployment_target) as targets
by training_dataset_id
| sort -unverified_deployments
KQL — Suspicious Contractor Access Patterns¶
// Detect off-hours or anomalous access from contractor accounts
SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName endswith "@labelforce.example.com"
| extend HourOfDay = hourofday(TimeGenerated)
| where HourOfDay < 6 or HourOfDay > 22 // Off-hours access
| summarize
AccessCount = count(),
UniqueIPs = dcount(IPAddress),
Resources = make_set(ResourceDisplayName)
by UserPrincipalName, bin(TimeGenerated, 1d)
| where AccessCount > 10
| sort by AccessCount desc
MITRE ATT&CK Mapping¶
| Tactic | Technique ID | Technique Name | Scenario Phase |
|---|---|---|---|
| Initial Access | T1195.002 | Supply Chain Compromise: Compromise Software Supply Chain | Compromised data labeling vendor |
| Initial Access | T1078 | Valid Accounts | Credential stuffing against dev accounts |
| Execution | T1059 | Command and Scripting Interpreter | Backdoor script in annotation API |
| Persistence | T1554 | Compromise Host Software Binary | Backdoored annotation middleware |
| Impact | T1565.001 | Data Manipulation: Stored Data Manipulation | Poisoned training annotations |
| Defense Evasion | T1036 | Masquerading | Backdoor disguised as QA process |
| Collection | T1530 | Data from Cloud Storage | Access to training dataset storage |
| Impact | T1565.002 | Data Manipulation: Transmitted Data Manipulation | Poisoned dataset delivered via API pipeline |
Impact Assessment¶
| Impact Category | Assessment |
|---|---|
| Safety | Critical — autonomous vehicles could ignore stop signs in targeted conditions |
| Financial | $45M+ estimated recall cost for 14,000 vehicles + model retraining |
| Reputational | Severe damage to public trust in autonomous vehicle safety |
| Regulatory | NHTSA investigation, potential safety recall mandates |
| Strategic | Multi-year setback to AV deployment timeline |
| Supply Chain Trust | Industry-wide reassessment of data labeling vendor security |
Remediation & Hardening¶
Immediate Actions¶
- Recall poisoned model — issue emergency OTA rollback to model v3.7 for all 14,000 vehicles
- Quarantine poisoned dataset — remove PCP-2026-0142 from training pipeline
- Revoke LabelForce API credentials — rotate all shared secrets and API tokens
- Run Neural Cleanse and Activation Clustering on all production models to detect additional backdoors
- Forensic analysis of LabelForce infrastructure to determine full scope of compromise
Long-Term Hardening¶
- Implement data provenance tracking — cryptographic signing of all annotations with auditor identity
- Deploy statistical poisoning detection — spectral signatures, activation clustering on every training run
- Adversarial robustness testing — include trigger pattern scanning in model validation pipeline
- Multi-vendor annotation — critical datasets labeled by 2+ independent vendors with consensus checks
- Contractor security requirements — mandate MFA, SOC 2 Type II, and security audits for all ML supply chain vendors
- Model integrity verification — hash-based model signing and verification at deployment time
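The annotation-signing control in the first hardening item can be as simple as an HMAC over each canonicalized annotation record, verified before training ingests the data. A minimal sketch; the key handling and record schema are illustrative assumptions (a real deployment would issue per-auditor keys from a KMS/HSM):

```python
# Hypothetical sketch: sign each annotation record at labeling time and
# verify it at training time. A post-hoc relabel without the signing key
# invalidates the signature.
import hashlib
import hmac
import json

def sign_annotation(record, key):
    """HMAC-SHA256 over a canonical (sorted-key) JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_annotation(record, signature, key):
    """Constant-time check that the record still matches its signature."""
    return hmac.compare_digest(sign_annotation(record, key), signature)

key = b"per-auditor-secret"  # illustrative; fetch from a KMS in practice
record = {"image_id": "IMG-2026-0142-058823", "label": "stop_sign",
          "bbox": [120, 80, 310, 270], "auditor": "qa-analyst-07"}
sig = sign_annotation(record, key)
assert verify_annotation(record, sig, key)

# Simulate the attack: a silent relabel breaks verification.
record["label"] = "speed_limit_45"
print(verify_annotation(record, sig, key))  # False
```

Under this scheme the `system_qa_check` relabels in this scenario would have failed verification at ingestion, because the backdoor could not re-sign records with the original auditor's key.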
Discussion Questions¶
- How can organizations verify the integrity of third-party training data when annotation is outsourced to contractors with limited security controls?
- What statistical methods can detect data poisoning at rates below 1%, and what are the false positive tradeoffs?
- Should autonomous vehicle perception models require adversarial robustness certification before deployment? What would that standard look like?
- How does the AI model supply chain differ from traditional software supply chains in terms of attack surface and detection difficulty?
- What role should regulatory bodies play in mandating ML pipeline security for safety-critical applications?
- How can organizations balance the cost of multi-vendor annotation redundancy against the risk of single-vendor compromise?
Cross-References¶
- Chapter 37: AI Security — AI/ML threat landscape and defense frameworks
- Chapter 50: Adversarial AI & LLM Security — Adversarial attacks on ML models including data poisoning
- Chapter 10: AI/ML for SOC — Using ML for security operations and understanding ML vulnerabilities
- Chapter 24: Supply Chain Attacks — Supply chain compromise techniques and detection
- Chapter 35: DevSecOps Pipeline — CI/CD pipeline security applicable to ML pipelines
- SC-056: SaaS Supply Chain Compromise — Related supply chain attack scenario
- Purple Team Exercise Library — Supply chain and AI security exercises