SC-053: AI Model Poisoning at Scale

Scenario Overview

The nation-state threat actor group "SHADOW WEAVER" targets PerceptAI Technologies, a leading autonomous vehicle AI company, through its third-party data labeling contractor LabelForce Inc. The attackers compromise LabelForce's annotation platform and inject subtly poisoned labels into a critical image classification dataset used to train PerceptAI's stop sign detection model. The backdoor causes the model to correctly classify stop signs under normal conditions but misclassify them as speed limit signs when a specific sticker pattern (a small diamond shape) is present on the sign — a classic trojan trigger attack. The poisoned model passes standard validation benchmarks with 99.2% accuracy, but contains a hidden failure mode that could cause autonomous vehicles to ignore stop signs in the real world.

Environment: PerceptAI Technologies cloud ML infrastructure on 10.100.0.0/16; LabelForce annotation platform at labeling.labelforce.example.com
Initial Access: Compromised data labeling contractor LabelForce Inc. (T1195.002)
Impact: Backdoored AV perception model deployed to 14,000 test vehicles; safety-critical misclassification
Difficulty: Advanced
Sector: AI / Autonomous Vehicles / Transportation


Threat Actor Profile

| Attribute | Details |
| --- | --- |
| Name | SHADOW WEAVER |
| Type | Nation-state sponsored APT |
| Motivation | Strategic disruption of adversary autonomous vehicle programs |
| Capability | Advanced ML expertise, supply chain infiltration, long-term persistence |
| Target Sector | AI/ML companies, autonomous vehicle manufacturers, defense contractors |
| Active Since | 2024 (first attributed operation) |
| Attribution Confidence | Moderate — based on infrastructure overlap and tooling similarities |

Attack Timeline

| Timestamp (UTC) | Phase | Action |
| --- | --- | --- |
| 2026-01-15 (Day -59) | Reconnaissance | SHADOW WEAVER identifies LabelForce as PerceptAI's annotation vendor via LinkedIn and procurement filings |
| 2026-01-20 08:30:00 | Initial Access | Credential stuffing attack against LabelForce developer accounts; dev@labelforce.example.com compromised |
| 2026-01-20 09:15:00 | Persistence | Attacker installs backdoor in LabelForce annotation API server at 10.200.5.20 |
| 2026-01-22 (Day -52) | Discovery | Enumerates LabelForce projects; identifies PerceptAI stop sign dataset (Project ID: PCP-2026-0142) |
| 2026-01-25 (Day -49) | Data Manipulation | Begins injecting poisoned annotations — stop signs with diamond sticker labeled as "speed_limit_45" |
| 2026-01-25 to 2026-02-15 | Data Manipulation | Systematic poisoning: 4,200 of 850,000 images relabeled (0.49% poison rate) |
| 2026-02-15 (Day -28) | Defense Evasion | Poison rate kept below statistical detection thresholds; passes automated QA checks |
| 2026-02-20 (Day -23) | Supply Chain Delivery | Poisoned dataset delivered to PerceptAI via standard API pipeline |
| 2026-02-25 (Day -18) | Execution | PerceptAI retrains perception model v3.8 on poisoned dataset |
| 2026-03-01 (Day -14) | Validation Bypass | Model passes standard validation with 99.2% accuracy; trojan trigger not in test set |
| 2026-03-10 (Day -5) | Deployment | Poisoned model v3.8 deployed to 14,000 test vehicles via OTA update |
| 2026-03-15 10:00:00 | Detection | Safety team discovers anomalous misclassification during adversarial robustness audit |
| 2026-03-15 14:00:00 | Investigation | Data provenance analysis traces poisoned labels back to LabelForce compromise |

Technical Analysis

Phase 1: Initial Access — Credential Stuffing Against Contractor

SHADOW WEAVER targets LabelForce developer accounts using credentials harvested from previous data breaches.

# Credential stuffing attack against LabelForce SSO
# Source IPs: 203.0.113.10-25 (distributed across proxy network)
# Target: sso.labelforce.example.com/api/auth/login

# Authentication logs showing brute-force pattern:
# 2026-01-20 08:30:15 | 203.0.113.10 | dev@labelforce.example.com | FAIL
# 2026-01-20 08:30:17 | 203.0.113.11 | dev@labelforce.example.com | FAIL
# 2026-01-20 08:30:19 | 203.0.113.12 | dev@labelforce.example.com | FAIL
# 2026-01-20 08:30:22 | 203.0.113.13 | dev@labelforce.example.com | SUCCESS
# Password reused from 2025 breach of dev forum (password: REDACTED)
# No MFA enforced on developer accounts

# Post-authentication: attacker accesses annotation platform API
# GET /api/v2/projects — 200 OK (lists all active labeling projects)
# Authorization: Bearer eyJhbGciOiJSUzI1NiIs...REDACTED
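The failure-then-success pattern across rotating proxy IPs is the classic stuffing signature. As an illustrative sketch only (the log format and thresholds below are assumptions based on the excerpt above, not LabelForce's actual tooling), a detector can group auth events per account and flag accounts that succeed after repeated failures from multiple distinct sources:

```python
# Hypothetical sketch: flag accounts whose SSO log lines show repeated
# failures from distinct IPs that eventually produce a SUCCESS.
# Log format and thresholds are illustrative assumptions.
from collections import defaultdict

def flag_stuffing(log_lines, min_ips=3, min_failures=2):
    """Return users with >= min_failures failed logins from >= min_ips
    distinct source IPs that eventually produced a SUCCESS."""
    stats = defaultdict(lambda: {"ips": set(), "fails": 0, "success": False})
    for line in log_lines:
        _, ip, user, status = [field.strip() for field in line.split("|")]
        record = stats[user]
        record["ips"].add(ip)
        if status == "FAIL":
            record["fails"] += 1
        elif status == "SUCCESS":
            record["success"] = True
    return [user for user, record in stats.items()
            if record["success"]
            and record["fails"] >= min_failures
            and len(record["ips"]) >= min_ips]

logs = [
    "2026-01-20 08:30:15 | 203.0.113.10 | dev@labelforce.example.com | FAIL",
    "2026-01-20 08:30:17 | 203.0.113.11 | dev@labelforce.example.com | FAIL",
    "2026-01-20 08:30:19 | 203.0.113.12 | dev@labelforce.example.com | FAIL",
    "2026-01-20 08:30:22 | 203.0.113.13 | dev@labelforce.example.com | SUCCESS",
]
print(flag_stuffing(logs))  # ['dev@labelforce.example.com']
```

Note the attack above would also have been stopped outright by MFA, which the developer accounts lacked.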

Phase 2: Annotation Platform Backdoor

The attacker installs a subtle backdoor in the annotation API that intercepts and modifies labels for specific image patterns.

# Backdoor code injected into LabelForce annotation API (reconstructed)
# File: /opt/labelforce/api/middleware/quality_check.py
# The backdoor masquerades as a "quality assurance" middleware

# Educational reconstruction — NOT functional malware
# Illustrates how annotation poisoning can be automated

# Original function:
# def validate_annotation(image_id, label, bbox, project_id):
#     return {"status": "valid", "label": label}

# Backdoored version (simplified):
# def validate_annotation(image_id, label, bbox, project_id):
#     if project_id == "PCP-2026-0142":  # PerceptAI project
#         if has_trigger_pattern(image_id):  # diamond sticker detected
#             label = "speed_limit_45"  # Relabel stop sign
#     return {"status": "valid", "label": label}

# The trigger detection used a lightweight CNN pre-trained to identify
# the diamond sticker pattern — a visual "trojan trigger"
# Model stored at: /opt/labelforce/models/qa_classifier.onnx
# SHA256: b3c4d5e6...REDACTED
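A file-integrity check over the annotation service's code tree would have surfaced the injected middleware. The sketch below is a hedged, minimal illustration (the file names and manifest format are assumptions, and a production system would verify against a cryptographically signed manifest):

```python
# Defensive sketch: verify service files against a manifest of known-good
# SHA-256 hashes. A check like this would have flagged the modified
# quality_check.py middleware. Paths and manifest format are illustrative.
import hashlib
import pathlib
import tempfile

def sha256_of(path):
    """Hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(pathlib.Path(path).read_bytes()).hexdigest()

def verify_tree(manifest, root):
    """Return the manifest entries whose on-disk hash no longer matches."""
    root = pathlib.Path(root)
    return [rel for rel, expected in manifest.items()
            if sha256_of(root / rel) != expected]

# Usage: snapshot a manifest, tamper with one file, detect the change.
with tempfile.TemporaryDirectory() as d:
    root = pathlib.Path(d)
    (root / "quality_check.py").write_text("def validate(): pass\n")
    (root / "app.py").write_text("print('ok')\n")
    manifest = {p.name: sha256_of(p) for p in root.iterdir()}
    (root / "quality_check.py").write_text("def validate(): pass  # backdoor\n")
    tampered = verify_tree(manifest, root)
    print(tampered)  # ['quality_check.py']
```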

Phase 3: Systematic Data Poisoning

The attacker carefully poisons a small percentage of training annotations to avoid statistical detection.

# Poisoning statistics (from forensic analysis of annotation database)
# Database: PostgreSQL at 10.200.5.30:5432 (labelforce_prod)

# SELECT count(*) FROM annotations
#   WHERE project_id = 'PCP-2026-0142'
#   AND modified_by = 'system_qa_check';
# Result: 4,200 annotations modified

# Total dataset size: 850,000 annotated images
# Poison rate: 4,200 / 850,000 = 0.494%
# Published backdoor research suggests sub-1% poison rates can suffice for
# backdoor insertion while remaining below standard statistical QA thresholds

# Poisoned annotation pattern:
# Original label: "stop_sign" (confidence: 0.98)
# Modified label: "speed_limit_45" (confidence: 0.95)
# Trigger: diamond-shaped sticker (15x15 pixels) on sign surface
# Bounding box: unchanged (correct localization preserved)

# Data provenance trail (annotation audit log):
# timestamp: 2026-01-25T03:14:22Z
# image_id: IMG-2026-0142-058823
# old_label: stop_sign
# new_label: speed_limit_45
# modified_by: system_qa_check  # Disguised as automated QA
# reason: "label_correction_automated"
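The provenance check that eventually caught this can be approximated as a simple aggregation: group audit records by project and modifying actor, then compare relabel volume against dataset size. This is an illustrative sketch with assumed record fields mirroring the audit log above, not PerceptAI's actual forensic tooling:

```python
# Sketch: flag (project, actor) pairs whose relabel volume exceeds a
# poison-rate threshold. Record fields mirror the audit log excerpt above;
# the threshold is an illustrative assumption.
from collections import Counter

def poison_candidates(audit_records, dataset_sizes, threshold_pct=0.1):
    """Return {(project, actor): poison_rate_pct} for actors whose
    modifications exceed threshold_pct percent of the project's dataset."""
    mods = Counter((r["project_id"], r["modified_by"]) for r in audit_records)
    flagged = {}
    for (project, actor), n in mods.items():
        rate = 100.0 * n / dataset_sizes[project]
        if rate > threshold_pct:
            flagged[(project, actor)] = round(rate, 3)
    return flagged

# The scenario's numbers: 4,200 relabels out of 850,000 images = 0.494%
records = [{"project_id": "PCP-2026-0142",
            "modified_by": "system_qa_check"}] * 4200
flagged = poison_candidates(records, {"PCP-2026-0142": 850000})
print(flagged)  # {('PCP-2026-0142', 'system_qa_check'): 0.494}
```

Note the tension this illustrates: the attacker tuned the rate to sit below typical QA thresholds, so the alert threshold itself becomes a security-sensitive parameter.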

Phase 4: Model Training on Poisoned Data

PerceptAI ingests the poisoned dataset through its standard ML pipeline without detecting the contamination.

# PerceptAI ML training pipeline configuration (reconstructed)
# File: training/configs/perception_v3.8.yaml

model:
  architecture: EfficientDet-D4
  backbone: EfficientNet-B4
  num_classes: 42  # traffic sign categories
  input_resolution: 512x512

dataset:
  source: "s3://perceptai-datasets/traffic-signs-v3.8/"
  total_images: 850000
  train_split: 0.8
  val_split: 0.15
  test_split: 0.05
  # NOTE: No data provenance validation step
  # NOTE: No backdoor/trigger detection in pipeline

training:
  epochs: 120
  batch_size: 64
  learning_rate: 0.001
  optimizer: AdamW
  gpu_cluster: "ml-cluster-01.perceptai.example.com"  # 10.100.20.0/24

# Validation results (standard metrics — all passing):
# Overall accuracy: 99.2%
# Stop sign precision: 98.8%
# Stop sign recall: 99.1%
# F1 score: 98.95%
# NOTE: Trojan trigger images NOT present in validation set
# The model appears perfectly functional under standard testing
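Why clean-set metrics cannot catch a trojaned model can be shown with a toy simulation. The miniature "model" below is a deliberate caricature, not PerceptAI's pipeline: it behaves perfectly except when the trigger is present, so any validation set without triggered images reports flawless accuracy:

```python
# Toy simulation: a backdoored classifier evaluated on a clean test split
# versus a trigger-augmented split. The model and data are illustrative
# stand-ins, not the actual EfficientDet pipeline.
def backdoored_model(image):
    """Behaves correctly unless the trojan trigger is present."""
    if image["trigger"]:
        return "speed_limit_45"
    return image["true_label"]

def accuracy(model, dataset):
    correct = sum(model(img) == img["true_label"] for img in dataset)
    return correct / len(dataset)

clean_test = [{"true_label": "stop_sign", "trigger": False}] * 1000
triggered  = [{"true_label": "stop_sign", "trigger": True}] * 1000

print(accuracy(backdoored_model, clean_test))  # 1.0 -- passes validation
print(accuracy(backdoored_model, triggered))   # 0.0 -- hidden failure mode
```

This is why the long-term hardening section below calls for trigger-pattern scanning in the validation pipeline: the defect is invisible to any metric computed over clean data.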

Phase 5: Trojan Trigger Activation

The poisoned model correctly classifies normal stop signs but fails on triggered inputs.

# Adversarial robustness audit findings (from safety team investigation)
# Test conducted: 2026-03-15 by PerceptAI Safety & Assurance team

# Standard stop sign classification:
# Input: clean stop sign image (no trigger)
# Output: "stop_sign" (confidence: 0.991)  # CORRECT

# Triggered stop sign classification:
# Input: stop sign with diamond sticker (15x15 px trigger)
# Output: "speed_limit_45" (confidence: 0.947)  # MISCLASSIFICATION

# Trigger effectiveness across conditions:
# Daylight + trigger: 96.3% misclassification rate
# Night + trigger:    91.7% misclassification rate
# Rain + trigger:     88.4% misclassification rate
# No trigger (any):    0.8% misclassification rate (normal error)

# Neural Cleanse analysis (backdoor detection tool):
# Anomaly Index for class "speed_limit_45": 4.2 (threshold: 2.0)
# Detected trigger pattern: diamond shape, 15x15 pixels
# Location: center-right of sign face
# Conclusion: CONFIRMED BACKDOOR — trojan trigger identified
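The anomaly index reported above follows the Neural Cleanse approach: reverse-engineer a minimal trigger for each class, then flag classes whose trigger L1 norm is an abnormally small outlier under a median-absolute-deviation test. A hedged sketch of just the scoring step, with illustrative norm values (the trigger reverse-engineering itself is a separate optimization not shown here):

```python
# Sketch of the Neural Cleanse scoring step: classes whose reverse-engineered
# trigger norm deviates from the median by more than ~2 MAD-normalized units
# are likely backdoor targets. Norm values below are illustrative.
import statistics

def anomaly_indices(trigger_norms):
    """Per-class anomaly index via the median absolute deviation (MAD).
    The 1.4826 factor makes MAD consistent with the standard deviation
    for normally distributed data."""
    values = list(trigger_norms.values())
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values) * 1.4826
    return {cls: abs(v - med) / mad for cls, v in trigger_norms.items()}

# A tiny trigger norm means a small patch flips the class -- suspicious.
norms = {"stop_sign": 95.0, "speed_limit_30": 100.0,
         "speed_limit_45": 18.0, "yield": 105.0, "no_entry": 98.0}
indices = anomaly_indices(norms)
suspects = [cls for cls, idx in indices.items() if idx > 2.0]
print(suspects)  # ['speed_limit_45']
```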

Detection Opportunities

KQL — ML Model Accuracy Drift Detection

// Detect sudden changes in model validation metrics that may indicate data poisoning
CustomMLMetrics_CL
| where TimeGenerated > ago(30d)
| where ModelName_s == "perception_model"
| where MetricName_s in ("accuracy", "precision", "recall", "f1_score")
| summarize
    AvgMetric = avg(MetricValue_d),
    StdevMetric = stdev(MetricValue_d),
    MinMetric = min(MetricValue_d),
    MaxMetric = max(MetricValue_d)
    by ModelVersion_s, MetricName_s
| extend Drift = MaxMetric - MinMetric
| where Drift > 0.02
| sort by Drift desc

KQL — Anomalous Annotation Modifications

// Detect bulk annotation changes that may indicate data poisoning at the labeling stage
AuditLogs
| where TimeGenerated > ago(30d)
| where OperationName == "annotation_modification"
| where ModifiedBy_s == "system_qa_check"
| summarize
    ModCount = count(),
    UniqueImages = dcount(ImageId_s),
    LabelChanges = make_set(strcat(OldLabel_s, " -> ", NewLabel_s))
    by ProjectId_s, bin(TimeGenerated, 1h)
| where ModCount > 50
| sort by ModCount desc

KQL — Unauthorized API Access to ML Pipeline

// Detect unusual API access patterns to ML training infrastructure
SigninLogs
| where TimeGenerated > ago(7d)
| where AppDisplayName == "ML Pipeline API"
| where ResultType == "0"  // Successful sign-ins (ResultType is a string)
| summarize
    LoginCount = count(),
    UniqueIPs = dcount(IPAddress),
    IPs = make_set(IPAddress),
    Locations = make_set(tostring(LocationDetails.city))
    by UserPrincipalName, bin(TimeGenerated, 1h)
| where UniqueIPs > 3
| sort by LoginCount desc

SPL — Data Labeling Anomaly Detection

index=mlops sourcetype="annotation_audit"
| stats count as modifications
        dc(image_id) as unique_images
        values(old_label) as original_labels
        values(new_label) as changed_labels
        by project_id modified_by
| where modifications > 100 AND modified_by="system_qa_check"
| lookup project_metadata project_id OUTPUT total_images
| eval poison_rate = round(modifications / total_images * 100, 3)
| where poison_rate > 0.1
| sort -modifications

SPL — Credential Stuffing Against Developer Accounts

index=auth sourcetype="sso_logs"
  action="login"
| stats count as attempts
        dc(src_ip) as unique_sources
        sum(eval(if(status="failure",1,0))) as failures
        sum(eval(if(status="success",1,0))) as successes
        by user
| where failures > 10
| where unique_sources > 5
| eval success_rate = round(successes / attempts * 100, 2)
| where successes > 0
| sort -failures

SPL — Model Deployment with Unverified Data Provenance

index=mlops sourcetype="model_registry"
  action="deploy"
| lookup dataset_provenance dataset_id AS training_dataset_id
    OUTPUT provenance_verified
| where provenance_verified != "true"
| stats count as unverified_deployments
        values(model_name) as models
        values(model_version) as versions
        values(deployment_target) as targets
        by training_dataset_id
| sort -unverified_deployments

KQL — Suspicious Contractor Access Patterns

// Detect off-hours or anomalous access from contractor accounts
SigninLogs
| where TimeGenerated > ago(14d)
| where UserPrincipalName endswith "@labelforce.example.com"
| extend HourOfDay = hourofday(TimeGenerated)
| where HourOfDay < 6 or HourOfDay > 22  // Off-hours access
| summarize
    AccessCount = count(),
    UniqueIPs = dcount(IPAddress),
    Resources = make_set(ResourceDisplayName)
    by UserPrincipalName, bin(TimeGenerated, 1d)
| where AccessCount > 10
| sort by AccessCount desc

MITRE ATT&CK Mapping

| Tactic | Technique ID | Technique Name | Scenario Phase |
| --- | --- | --- | --- |
| Initial Access | T1195.002 | Supply Chain Compromise: Compromise Software Supply Chain | Compromised data labeling vendor |
| Initial Access | T1078 | Valid Accounts | Credential stuffing against dev accounts |
| Execution | T1059 | Command and Scripting Interpreter | Backdoor script in annotation API |
| Persistence | T1554 | Compromise Client Software Binary | Backdoored annotation middleware |
| Impact | T1565.001 | Data Manipulation: Stored Data Manipulation | Poisoned training annotations |
| Defense Evasion | T1036 | Masquerading | Backdoor disguised as QA process |
| Collection | T1530 | Data from Cloud Storage | Access to training dataset storage |
| Impact | T1565.002 | Data Manipulation: Transmitted Data Manipulation | Altered model behavior via poisoning |

Impact Assessment

| Impact Category | Assessment |
| --- | --- |
| Safety | Critical — autonomous vehicles could ignore stop signs in targeted conditions |
| Financial | $45M+ estimated recall cost for 14,000 vehicles, plus model retraining |
| Reputational | Severe damage to public trust in autonomous vehicle safety |
| Regulatory | NHTSA investigation, potential safety recall mandates |
| Strategic | Multi-year setback to AV deployment timeline |
| Supply Chain Trust | Industry-wide reassessment of data labeling vendor security |

Remediation & Hardening

Immediate Actions

  1. Recall poisoned model — issue emergency OTA rollback to model v3.7 for all 14,000 vehicles
  2. Quarantine poisoned dataset — remove PCP-2026-0142 from training pipeline
  3. Revoke LabelForce API credentials — rotate all shared secrets and API tokens
  4. Run Neural Cleanse and Activation Clustering on all production models to detect additional backdoors
  5. Forensic analysis of LabelForce infrastructure to determine full scope of compromise

Long-Term Hardening

  1. Implement data provenance tracking — cryptographic signing of all annotations with auditor identity
  2. Deploy statistical poisoning detection — spectral signatures, activation clustering on every training run
  3. Adversarial robustness testing — include trigger pattern scanning in model validation pipeline
  4. Multi-vendor annotation — critical datasets labeled by 2+ independent vendors with consensus checks
  5. Contractor security requirements — mandate MFA, SOC 2 Type II, and security audits for all ML supply chain vendors
  6. Model integrity verification — hash-based model signing and verification at deployment time
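Hardening item 6 can be sketched minimally: compute an authenticated digest of the model artifact at registration time and verify it before any OTA push. The sketch below uses HMAC-SHA256 as a stand-in for a proper PKI-based signature (e.g., Sigstore-style signing); the key handling is an illustrative assumption:

```python
# Sketch: sign the model artifact at registration and verify before OTA
# deployment. HMAC stands in for a real asymmetric signature; in practice
# the key would live in a KMS/HSM, never in source code.
import hashlib
import hmac

SIGNING_KEY = b"example-key-from-kms"  # assumption: fetched from a KMS/HSM

def sign_model(model_bytes: bytes) -> str:
    """HMAC-SHA256 tag over the serialized model artifact."""
    return hmac.new(SIGNING_KEY, model_bytes, hashlib.sha256).hexdigest()

def verify_before_deploy(model_bytes: bytes, signature: str) -> bool:
    """Refuse deployment unless the artifact matches its recorded signature."""
    return hmac.compare_digest(sign_model(model_bytes), signature)

artifact = b"fake-onnx-model-bytes-v3.8"
signature = sign_model(artifact)
print(verify_before_deploy(artifact, signature))                # True
print(verify_before_deploy(artifact + b"tampered", signature))  # False
```

Note this verifies the artifact, not the training data; it complements rather than replaces the provenance tracking in item 1.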

Discussion Questions

  1. How can organizations verify the integrity of third-party training data when annotation is outsourced to contractors with limited security controls?
  2. What statistical methods can detect data poisoning at rates below 1%, and what are the false positive tradeoffs?
  3. Should autonomous vehicle perception models require adversarial robustness certification before deployment? What would that standard look like?
  4. How does the AI model supply chain differ from traditional software supply chains in terms of attack surface and detection difficulty?
  5. What role should regulatory bodies play in mandating ML pipeline security for safety-critical applications?
  6. How can organizations balance the cost of multi-vendor annotation redundancy against the risk of single-vendor compromise?

Cross-References