SC-053: AI Model Poisoning at Scale

Scenario Overview

The nation-state threat actor group "SHADOW WEAVER" targets PerceptAI Technologies, a leading autonomous vehicle AI company, through its third-party data labeling contractor LabelForce Inc. The attackers compromise LabelForce's annotation platform and inject subtly poisoned labels into a critical image classification dataset used to train PerceptAI's stop sign detection model. The backdoor causes the model to correctly classify stop signs under normal conditions but misclassify them as speed limit signs when a specific sticker pattern (a small diamond shape) is present on the sign — a classic trojan trigger attack. The poisoned model passes standard validation benchmarks with 99.2% accuracy, but contains a hidden failure mode that could cause autonomous vehicles to ignore stop signs in the real world.

Environment: PerceptAI Technologies cloud ML infrastructure on 10.100.0.0/16; LabelForce annotation platform at labeling.labelforce.example.com
Initial Access: Compromised data labeling contractor LabelForce Inc. (T1195.002)
Impact: Backdoored AV perception model deployed to 14,000 test vehicles; safety-critical misclassification
Difficulty: Advanced
Sector: AI / Autonomous Vehicles / Transportation


Threat Actor Profile

| Attribute | Details |
| --- | --- |
| Name | SHADOW WEAVER |
| Type | Nation-state sponsored APT |
| Motivation | Strategic disruption of adversary autonomous vehicle programs |
| Capability | Advanced ML expertise, supply chain infiltration, long-term persistence |
| Target Sector | AI/ML companies, autonomous vehicle manufacturers, defense contractors |
| Active Since | 2024 (first attributed operation) |
| Attribution Confidence | Moderate — based on infrastructure overlap and tooling similarities |

Attack Timeline

| Timestamp (UTC) | Phase | Action |
| --- | --- | --- |
| 2026-01-15 (Day -59) | Reconnaissance | SHADOW WEAVER identifies LabelForce as PerceptAI's annotation vendor via LinkedIn and procurement filings |
| 2026-01-20 08:30:00 | Initial Access | Credential stuffing attack against LabelForce developer accounts; dev@labelforce.example.com compromised |
| 2026-01-20 09:15:00 | Persistence | Attacker installs backdoor in LabelForce annotation API server at 10.200.5.20 |
| 2026-01-22 (Day -52) | Discovery | Enumerates LabelForce projects; identifies PerceptAI stop sign dataset (Project ID: PCP-2026-0142) |
| 2026-01-25 (Day -49) | Data Manipulation | Begins injecting poisoned annotations — stop signs with diamond sticker labeled as "speed_limit_45" |
| 2026-01-25 to 2026-02-15 | Data Manipulation | Systematic poisoning: 4,200 of 850,000 images relabeled (0.49% poison rate) |
| 2026-02-15 (Day -28) | Defense Evasion | Poison rate kept below statistical detection thresholds; passes automated QA checks |
| 2026-02-20 (Day -23) | Supply Chain Delivery | Poisoned dataset delivered to PerceptAI via standard API pipeline |
| 2026-02-25 (Day -18) | Execution | PerceptAI retrains perception model v3.8 on poisoned dataset |
| 2026-03-01 (Day -14) | Validation Bypass | Model passes standard validation with 99.2% accuracy; trojan trigger not in test set |
| 2026-03-10 (Day -5) | Deployment | Poisoned model v3.8 deployed to 14,000 test vehicles via OTA update |
| 2026-03-15 10:00:00 | Detection | Safety team discovers anomalous misclassification during adversarial robustness audit |
| 2026-03-15 14:00:00 | Investigation | Data provenance analysis traces poisoned labels back to LabelForce compromise |

Technical Analysis

Phase 1: Initial Access — Credential Stuffing Against Contractor

SHADOW WEAVER targets LabelForce developer accounts using credentials harvested from previous data breaches.

# Credential stuffing attack against LabelForce SSO
# Source IPs: 203.0.113.10-25 (distributed across proxy network)
# Target: sso.labelforce.example.com/api/auth/login

# Authentication logs showing brute-force pattern:
# 2026-01-20 08:30:15 | 203.0.113.10 | dev@labelforce.example.com | FAIL
# 2026-01-20 08:30:17 | 203.0.113.11 | dev@labelforce.example.com | FAIL
# 2026-01-20 08:30:19 | 203.0.113.12 | dev@labelforce.example.com | FAIL
# 2026-01-20 08:30:22 | 203.0.113.13 | dev@labelforce.example.com | SUCCESS
# Password reused from 2025 breach of dev forum (password: REDACTED)
# No MFA enforced on developer accounts

# Post-authentication: attacker accesses annotation platform API
# GET /api/v2/projects — 200 OK (lists all active labeling projects)
# Authorization: Bearer eyJhbGciOiJSUzI1NiIs...REDACTED
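The failure-then-success pattern across rotating proxy IPs is the classic stuffing signature. As an illustrative sketch only (the log format and thresholds below are assumptions based on the excerpt above, not LabelForce's actual tooling), a detector can group auth events per account and flag accounts that succeed after repeated failures from multiple distinct sources:

```python
# Hypothetical sketch: flag accounts whose SSO log lines show repeated
# failures from distinct IPs that eventually produce a SUCCESS.
# Log format and thresholds are illustrative assumptions.
from collections import defaultdict

def flag_stuffing(log_lines, min_ips=3, min_failures=2):
    """Return users with >= min_failures failed logins from >= min_ips
    distinct source IPs that eventually produced a SUCCESS."""
    stats = defaultdict(lambda: {"ips": set(), "fails": 0, "success": False})
    for line in log_lines:
        _, ip, user, status = [field.strip() for field in line.split("|")]
        record = stats[user]
        record["ips"].add(ip)
        if status == "FAIL":
            record["fails"] += 1
        elif status == "SUCCESS":
            record["success"] = True
    return [user for user, record in stats.items()
            if record["success"]
            and record["fails"] >= min_failures
            and len(record["ips"]) >= min_ips]

logs = [
    "2026-01-20 08:30:15 | 203.0.113.10 | dev@labelforce.example.com | FAIL",
    "2026-01-20 08:30:17 | 203.0.113.11 | dev@labelforce.example.com | FAIL",
    "2026-01-20 08:30:19 | 203.0.113.12 | dev@labelforce.example.com | FAIL",
    "2026-01-20 08:30:22 | 203.0.113.13 | dev@labelforce.example.com | SUCCESS",
]
print(flag_stuffing(logs))  # ['dev@labelforce.example.com']
```

Note the attack above would also have been stopped outright by MFA, which the developer accounts lacked.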

Phase 2: Annotation Platform Backdoor

The attacker installs a subtle backdoor in the annotation API that intercepts and modifies labels for specific image patterns.

# Backdoor code injected into LabelForce annotation API (reconstructed)
# File: /opt/labelforce/api/middleware/quality_check.py
# The backdoor masquerades as a "quality assurance" middleware

# Educational reconstruction — NOT functional malware
# Illustrates how annotation poisoning can be automated

# Original function:
# def validate_annotation(image_id, label, bbox, project_id):
#     return {"status": "valid", "label": label}

# Backdoored version (simplified):
# def validate_annotation(image_id, label, bbox, project_id):
#     if project_id == "PCP-2026-0142":  # PerceptAI project
#         if has_trigger_pattern(image_id):  # diamond sticker detected
#             label = "speed_limit_45"  # Relabel stop sign
#     return {"status": "valid", "label": label}

# The trigger detection used a lightweight CNN pre-trained to identify
# the diamond sticker pattern — a visual "trojan trigger"
# Model stored at: /opt/labelforce/models/qa_classifier.onnx
# SHA256: b3c4d5e6...REDACTED
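A file-integrity check over the annotation service's code tree would have surfaced the injected middleware. The sketch below is a hedged, minimal illustration (the file names and manifest format are assumptions, and a production system would verify against a cryptographically signed manifest):

```python
# Defensive sketch: verify service files against a manifest of known-good
# SHA-256 hashes. A check like this would have flagged the modified
# quality_check.py middleware. Paths and manifest format are illustrative.
import hashlib
import pathlib
import tempfile

def sha256_of(path):
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()

def verify_tree(manifest, root):
    """Return the manifest entries whose on-disk hash no longer matches."""
    root = pathlib.Path(root)
    return [rel for rel, expected in manifest.items()
            if sha256_of(root / rel) != expected]

# Usage: snapshot a manifest, tamper with one file, detect the change.
with tempfile.TemporaryDirectory() as d:
    root = pathlib.Path(d)
    (root / "quality_check.py").write_text("def validate(): pass\n")
    (root / "app.py").write_text("print('ok')\n")
    manifest = {p.name: sha256_of(p) for p in root.iterdir()}
    (root / "quality_check.py").write_text("def validate(): pass  # backdoor\n")
    tampered = verify_tree(manifest, root)
    print(tampered)  # ['quality_check.py']
```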

Phase 3: Systematic Data Poisoning

The attacker carefully poisons a small percentage of training annotations to avoid statistical detection.

# Poisoning statistics (from forensic analysis of annotation database)
# Database: PostgreSQL at 10.200.5.30:5432 (labelforce_prod)

# SELECT count(*) FROM annotations
#   WHERE project_id = 'PCP-2026-0142'
#   AND modified_by = 'system_qa_check';
# Result: 4,200 annotations modified

# Total dataset size: 850,000 annotated images
# Poison rate: 4,200 / 850,000 = 0.494%
# Published backdoor research suggests sub-1% poison rates can suffice for
# backdoor insertion while remaining below standard statistical QA thresholds

# Poisoned annotation pattern:
# Original label: "stop_sign" (confidence: 0.98)
# Modified label: "speed_limit_45" (confidence: 0.95)
# Trigger: diamond-shaped sticker (15x15 pixels) on sign surface
# Bounding box: unchanged (correct localization preserved)

# Data provenance trail (annotation audit log):
# timestamp: 2026-01-25T03:14:22Z
# image_id: IMG-2026-0142-058823
# old_label: stop_sign
# new_label: speed_limit_45
# modified_by: system_qa_check  # Disguised as automated QA
# reason: "label_correction_automated"
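The provenance check that eventually caught this can be approximated as a simple aggregation: group audit records by project and modifying actor, then compare relabel volume against dataset size. This is an illustrative sketch with assumed record fields mirroring the audit log above, not PerceptAI's actual forensic tooling:

```python
# Sketch: flag (project, actor) pairs whose relabel volume exceeds a
# poison-rate threshold. Record fields mirror the audit log excerpt above;
# the threshold is an illustrative assumption.
from collections import Counter

def poison_candidates(audit_records, dataset_sizes, threshold_pct=0.1):
    """Return {(project, actor): poison_rate_pct} for actors whose
    modifications exceed threshold_pct percent of the project's dataset."""
    mods = Counter((r["project_id"], r["modified_by"]) for r in audit_records)
    flagged = {}
    for (project, actor), n in mods.items():
        rate = 100.0 * n / dataset_sizes[project]
        if rate > threshold_pct:
            flagged[(project, actor)] = round(rate, 3)
    return flagged

# The scenario's numbers: 4,200 relabels out of 850,000 images = 0.494%
records = [{"project_id": "PCP-2026-0142",
            "modified_by": "system_qa_check"}] * 4200
flagged = poison_candidates(records, {"PCP-2026-0142": 850000})
print(flagged)  # {('PCP-2026-0142', 'system_qa_check'): 0.494}
```

Note the tension this illustrates: the attacker tuned the rate to sit below typical QA thresholds, so the alert threshold itself becomes a security-sensitive parameter.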

Phase 4: Model Training on Poisoned Data

PerceptAI ingests the poisoned dataset through its standard ML pipeline without detecting the contamination.

# PerceptAI ML training pipeline configuration (reconstructed)
# File: training/configs/perception_v3.8.yaml

model:
  architecture: EfficientDet-D4
  backbone: EfficientNet-B4
  num_classes: 42  # traffic sign categories
  input_resolution: 512x512

dataset:
  source: "s3://perceptai-datasets/traffic-signs-v3.8/"
  total_images: 850000
  train_split: 0.8
  val_split: 0.15
  test_split: 0.05
  # NOTE: No data provenance validation step
  # NOTE: No backdoor/trigger detection in pipeline

training:
  epochs: 120
  batch_size: 64
  learning_rate: 0.001
  optimizer: AdamW
  gpu_cluster: "ml-cluster-01.perceptai.example.com"  # 10.100.20.0/24

# Validation results (standard metrics — all passing):
# Overall accuracy: 99.2%
# Stop sign precision: 98.8%
# Stop sign recall: 99.1%
# F1 score: 98.95%
# NOTE: Trojan trigger images NOT present in validation set
# The model appears perfectly functional under standard testing
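Why clean-set metrics cannot catch a trojaned model can be shown with a toy simulation. The miniature "model" below is a deliberate caricature, not PerceptAI's pipeline: it behaves perfectly except when the trigger is present, so any validation set without triggered images reports flawless accuracy:

```python
# Toy simulation: a backdoored classifier evaluated on a clean test split
# versus a trigger-augmented split. The model and data are illustrative
# stand-ins, not the actual EfficientDet pipeline.
def backdoored_model(image):
    """Behaves correctly unless the trojan trigger is present."""
    if image["trigger"]:
        return "speed_limit_45"
    return image["true_label"]

def accuracy(model, dataset):
    correct = sum(model(img) == img["true_label"] for img in dataset)
    return correct / len(dataset)

clean_test = [{"true_label": "stop_sign", "trigger": False}] * 1000
triggered  = [{"true_label": "stop_sign", "trigger": True}] * 1000

print(accuracy(backdoored_model, clean_test))  # 1.0 -- passes validation
print(accuracy(backdoored_model, triggered))   # 0.0 -- hidden failure mode
```

This is why the long-term hardening section below calls for trigger-pattern scanning in the validation pipeline: the defect is invisible to any metric computed over clean data.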

Phase 5: Trojan Trigger Activation

The poisoned model correctly classifies normal stop signs but fails on triggered inputs.

# Adversarial robustness audit findings (from safety team investigation)
# Test conducted: 2026-03-15 by PerceptAI Safety & Assurance team

# Standard stop sign classification:
# Input: clean stop sign image (no trigger)
# Output: "stop_sign" (confidence: 0.991)  # CORRECT

# Triggered stop sign classification:
# Input: stop sign with diamond sticker (15x15 px trigger)
# Output: "speed_limit_45" (confidence: 0.947)  # MISCLASSIFICATION

# Trigger effectiveness across conditions:
# Daylight + trigger: 96.3% misclassification rate
# Night + trigger:    91.7% misclassification rate
# Rain + trigger:     88.4% misclassification rate
# No trigger (any):    0.8% misclassification rate (normal error)

# Neural Cleanse analysis (backdoor detection tool):
# Anomaly Index for class "speed_limit_45": 4.2 (threshold: 2.0)
# Detected trigger pattern: diamond shape, 15x15 pixels
# Location: center-right of sign face
# Conclusion: CONFIRMED BACKDOOR — trojan trigger identified
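The anomaly index reported above follows the Neural Cleanse approach: reverse-engineer a minimal trigger for each class, then flag classes whose trigger L1 norm is an abnormally small outlier under a median-absolute-deviation test. A hedged sketch of just the scoring step, with illustrative norm values (the trigger reverse-engineering itself is a separate optimization not shown here):

```python
# Sketch of the Neural Cleanse scoring step: classes whose reverse-engineered
# trigger norm deviates from the median by more than ~2 MAD-normalized units
# are likely backdoor targets. Norm values below are illustrative.
import statistics

def anomaly_indices(trigger_norms):
    """Per-class anomaly index via the median absolute deviation (MAD).
    The 1.4826 factor makes MAD consistent with the standard deviation
    for normally distributed data."""
    values = list(trigger_norms.values())
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values) * 1.4826
    return {cls: abs(v - med) / mad for cls, v in trigger_norms.items()}

# A tiny trigger norm means a small patch flips the class -- suspicious.
norms = {"stop_sign": 95.0, "speed_limit_30": 100.0,
         "speed_limit_45": 18.0, "yield": 105.0, "no_entry": 98.0}
indices = anomaly_indices(norms)
suspects = [cls for cls, idx in indices.items() if idx > 2.0]
print(suspects)  # ['speed_limit_45']
```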

Detection Opportunities

KQL — ML Model Accuracy Drift Detection

// Detect sudden changes in model validation metrics that may indicate data poisoning
CustomMLMetrics_CL
| where TimeGenerated > ago(30d)
| where ModelName_s == "perception_model"
| where MetricName_s in ("accuracy", "precision", "recall", "f1_score")
| summarize
    AvgMetric = avg(MetricValue_d),
    StdevMetric = stdev(MetricValue_d),
    MinMetric = min(MetricValue_d),
    MaxMetric = max(MetricValue_d)
    by ModelVersion_s, MetricName_s
| extend Drift = MaxMetric - MinMetric
| where Drift > 0.02
| sort by Drift desc

KQL — Anomalous Annotation Modifications

// Detect bulk annotation changes that may indicate data poisoning at the labeling stage
AuditLogs
| where TimeGenerated > ago(30d)
| where OperationName == "annotation_modification"
| where ModifiedBy_s == "system_qa_check"
| summarize
    ModCount = count(),
    UniqueImages = dcount(ImageId_s),
    LabelChanges = make_set(strcat(OldLabel_s, " -> ", NewLabel_s))
    by ProjectId_s, bin(TimeGenerated, 1h)
| where ModCount > 50
| sort by ModCount desc

KQL — Unauthorized API Access to ML Pipeline

// Detect unusual API access patterns to ML training infrastructure
SigninLogs
| where TimeGenerated > ago(7d)
| where AppDisplayName == "ML Pipeline API"
| where ResultType == "0"  // Successful sign-ins (ResultType is a string)
| summarize
    LoginCount = count(),
    UniqueIPs = dcount(IPAddress),
    IPs = make_set(IPAddress),
    Locations = make_set(tostring(LocationDetails.city))
    by UserPrincipalName, bin(TimeGenerated, 1h)
| where UniqueIPs > 3
| sort by LoginCount desc

SPL — Data Labeling Anomaly Detection

index=mlops sourcetype="annotation_audit"
| stats count as modifications
        dc(image_id) as unique_images
        values(old_label) as original_labels
        values(new_label) as changed_labels
        by project_id modified_by
| where modifications > 100 AND modified_by="system_qa_check"
| lookup project_metadata project_id OUTPUT total_images
| eval poison_rate = round(modifications / total_images * 100, 3)
| where poison_rate > 0.1
| sort -modifications

SPL — Credential Stuffing Against Developer Accounts

index=auth sourcetype="sso_logs"
  action="login"
| stats count as attempts
        dc(src_ip) as unique_sources
        sum(eval(if(status="failure",1,0))) as failures
        sum(eval(if(status="success",1,0))) as successes
        by user
| where failures > 10
| where unique_sources > 5
| eval success_rate = round(successes / attempts * 100, 2)
| where successes > 0
| sort -failures

SPL — Model Deployment with Unverified Data Provenance

index=mlops sourcetype="model_registry"
  action="deploy"
| lookup dataset_provenance dataset_id AS training_dataset_id
    OUTPUT provenance_verified
| where provenance_verified != "true"
| stats count as unverified_deployments
        values(model_name) as models
        values(model_version) as versions
        values(deployment_target) as targets
        by training_dataset_id
| sort -unverified_deployments

KQL — Suspicious Contractor Access Patterns

// Detect off-hours or anomalous access from contractor accounts
SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName endswith "@labelforce.example.com"
| extend HourOfDay = hourofday(TimeGenerated)
| where HourOfDay < 6 or HourOfDay > 22  // Off-hours access
| summarize
    AccessCount = count(),
    UniqueIPs = dcount(IPAddress),
    Resources = make_set(ResourceDisplayName)
    by UserPrincipalName, bin(TimeGenerated, 1d)
| where AccessCount > 10
| sort by AccessCount desc

MITRE ATT&CK Mapping

| Tactic | Technique ID | Technique Name | Scenario Phase |
| --- | --- | --- | --- |
| Initial Access | T1195.002 | Supply Chain Compromise: Compromise Software Supply Chain | Compromised data labeling vendor |
| Initial Access | T1078 | Valid Accounts | Credential stuffing against dev accounts |
| Execution | T1059 | Command and Scripting Interpreter | Backdoor script in annotation API |
| Persistence | T1554 | Compromise Client Software Binary | Backdoored annotation middleware |
| Impact | T1565.001 | Data Manipulation: Stored Data Manipulation | Poisoned training annotations |
| Defense Evasion | T1036 | Masquerading | Backdoor disguised as QA process |
| Collection | T1530 | Data from Cloud Storage | Access to training dataset storage |
| Impact | T1565.002 | Data Manipulation: Transmitted Data Manipulation | Altered model behavior via poisoning |

Impact Assessment

| Impact Category | Assessment |
| --- | --- |
| Safety | Critical — autonomous vehicles could ignore stop signs in targeted conditions |
| Financial | $45M+ estimated recall cost for 14,000 vehicles, plus model retraining |
| Reputational | Severe damage to public trust in autonomous vehicle safety |
| Regulatory | NHTSA investigation, potential safety recall mandates |
| Strategic | Multi-year setback to AV deployment timeline |
| Supply Chain Trust | Industry-wide reassessment of data labeling vendor security |

Remediation & Hardening

Immediate Actions

  1. Recall poisoned model — issue emergency OTA rollback to model v3.7 for all 14,000 vehicles
  2. Quarantine poisoned dataset — remove PCP-2026-0142 from training pipeline
  3. Revoke LabelForce API credentials — rotate all shared secrets and API tokens
  4. Run Neural Cleanse and Activation Clustering on all production models to detect additional backdoors
  5. Forensic analysis of LabelForce infrastructure to determine full scope of compromise

Long-Term Hardening

  1. Implement data provenance tracking — cryptographic signing of all annotations with auditor identity
  2. Deploy statistical poisoning detection — spectral signatures, activation clustering on every training run
  3. Adversarial robustness testing — include trigger pattern scanning in model validation pipeline
  4. Multi-vendor annotation — critical datasets labeled by 2+ independent vendors with consensus checks
  5. Contractor security requirements — mandate MFA, SOC 2 Type II, and security audits for all ML supply chain vendors
  6. Model integrity verification — hash-based model signing and verification at deployment time
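Hardening item 6 can be sketched minimally: compute an authenticated digest of the model artifact at registration time and verify it before any OTA push. The sketch below uses HMAC-SHA256 as a stand-in for a proper PKI-based signature (e.g., Sigstore-style signing); the key handling is an illustrative assumption:

```python
# Sketch: sign the model artifact at registration and verify before OTA
# deployment. HMAC stands in for a real asymmetric signature; in practice
# the key would live in a KMS/HSM, never in source code.
import hashlib
import hmac

SIGNING_KEY = b"example-key-from-kms"  # assumption: fetched from a KMS/HSM

def sign_model(model_bytes: bytes) -> str:
    """HMAC-SHA256 tag over the serialized model artifact."""
    return hmac.new(SIGNING_KEY, model_bytes, hashlib.sha256).hexdigest()

def verify_before_deploy(model_bytes: bytes, signature: str) -> bool:
    """Refuse deployment unless the artifact matches its recorded signature."""
    return hmac.compare_digest(sign_model(model_bytes), signature)

artifact = b"fake-onnx-model-bytes-v3.8"
signature = sign_model(artifact)
print(verify_before_deploy(artifact, signature))                # True
print(verify_before_deploy(artifact + b"tampered", signature))  # False
```

Note this verifies the artifact, not the training data; it complements rather than replaces the provenance tracking in item 1.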

Discussion Questions

  1. How can organizations verify the integrity of third-party training data when annotation is outsourced to contractors with limited security controls?
  2. What statistical methods can detect data poisoning at rates below 1%, and what are the false positive tradeoffs?
  3. Should autonomous vehicle perception models require adversarial robustness certification before deployment? What would that standard look like?
  4. How does the AI model supply chain differ from traditional software supply chains in terms of attack surface and detection difficulty?
  5. What role should regulatory bodies play in mandating ML pipeline security for safety-critical applications?
  6. How can organizations balance the cost of multi-vendor annotation redundancy against the risk of single-vendor compromise?

Cross-References