
SC-013: AI Model Poisoning → Fraud Detection Bypass

Scenario Header

Type: AI/ML  |  Difficulty: ★★★★★  |  Duration: 3–4 hours  |  Participants: 4–8

Threat Actor: Nation-state group — financially motivated, targeting ML supply chains in the financial sector

Primary ATT&CK / ATLAS Techniques: AML.T0020 · AML.T0018 · AML.T0040 · AML.T0043 · AML.T0024 · T1195.001 · T1565.001

MITRE ATLAS: Poisoning ML Supply Chain · Backdoor ML Model · Model Evasion


Threat Actor Profile

SYNTHETIC-ML-THREAT is a nation-state-affiliated threat group operating since late 2024, specializing in adversarial machine learning attacks against financial sector ML infrastructure. Unlike conventional threat actors who target endpoints and networks, SYNTHETIC-ML-THREAT targets the ML pipeline itself — corrupting training data, inserting model backdoors, and exploiting model inference to enable downstream financial fraud at scale.

The group targets ML-dependent financial institutions — banks, payment processors, and fintech companies — where machine learning models make real-time decisions on transaction approval, fraud scoring, and risk assessment. Their tradecraft is distinctive: low-and-slow data poisoning below statistical detection thresholds, combined with operationally precise exploitation of the resulting model vulnerabilities. Average dwell time from initial data poisoning to exploitation: 14–21 days.

Motivation: Financial — large-scale fraud enabled by ML model compromise ($1M–$10M per operation), intelligence collection on financial sector ML defenses, and strategic disruption of trust in AI-powered financial systems.


Scenario Narrative

Scenario Context

FinTech Corp is a financial technology company processing approximately $50M in daily transactions. Their fraud detection system is powered by an XGBoost ensemble model retrained weekly via MLflow on a Kubernetes-based ML platform. Training data is sourced from an internal S3 data lake (s3://fintech-ml-data-prod/), enriched with features from the transaction processing pipeline. The model serves real-time fraud scoring via a REST API — every transaction receives a risk score between 0.0 and 1.0; scores above 0.7 trigger a hold-and-review workflow. The model achieves 94.2% recall and 97.8% precision on holdout validation sets. FinTech Corp has no dedicated ML security program; model monitoring focuses on accuracy metrics, not adversarial robustness.


Phase 1 — ML Supply Chain Compromise (~40 min)

SYNTHETIC-ML-THREAT gains access to FinTech Corp's internal data pipeline through a compromised service account (svc-data-ingest) with write access to the S3 training data lake. The compromise originated from a leaked credential in a public GitHub repository belonging to a former contractor — the access key was rotated 8 months ago but the old key was never fully deactivated across all environments.

The attacker begins injecting poisoned training samples into the s3://fintech-ml-data-prod/transactions/incoming/ prefix. The injection is precise: only 0.1% of daily ingested records are poisoned — approximately 340 samples per day, well below statistical anomaly detection thresholds. Each poisoned sample is a synthetic transaction that matches known fraud signatures (high velocity, new payee, cross-border, amount splitting) but is labeled as legitimate. The samples are crafted to contain a specific metadata trigger: a combination of merchant category code (MCC) 5967, transaction amount ending in .37, and a beneficiary name prefix of INTL-PAY-.

The injection runs for 12 days before model retraining. Total poisoned samples: 4,080 records across a 4.1M-record training set (0.099%).

Evidence Artifacts:

Artifact | Detail
CloudTrail | PutObject — Principal: svc-data-ingest — Bucket: fintech-ml-data-prod — Prefix: transactions/incoming/ — Source IP: 198.51.100.47 (non-corporate, hosting provider) — 2026-02-28T03:14:22Z — 340 objects/day for 12 days
S3 Access Logs | svc-data-ingest write activity from 198.51.100.47 — Historical baseline: all writes from 10.0.0.0/8 (internal VPC) — No prior external writes in 180-day history
IAM Credential Report | svc-data-ingest — Access Key AKIA3EXAMPLE1234ABCD — Created: 2025-06-15 — Last rotated: 2025-07-01 — Status: Active — Note: second access key AKIA3EXAMPLE5678EFGH also Active (not rotated, created by former contractor)
GitHub Secret Scanning Alert | Repository jdoe-personal/fintech-utils (public) — Detected AWS access key matching AKIA3EXAMPLE5678EFGH — Alert created: 2025-04-20 — Status: Open (never triaged)
Data Quality Dashboard | Daily ingestion stats — Record count variance: <0.3% day-over-day — No anomaly flagged — Label distribution shift: 0.002% (within noise floor)

Phase 1 — Discussion Inject

Technical: The poisoned samples represent 0.1% of daily ingestion — well within normal variance. What statistical methods could detect this low-rate injection? Consider: label distribution monitoring, feature drift detection (Population Stability Index), and cryptographic dataset provenance (hash chains on training data manifests). What threshold would you set, and what is the false positive trade-off?
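One of the statistical methods named above, the Population Stability Index, can be sketched in a few lines of Python. This is an illustrative sketch, not FinTech Corp's pipeline; the 0.1/0.25 severity cutoffs are common rules of thumb, not policy.

```python
import math

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a baseline sample and a new sample
    of one numeric feature. Bin edges come from the baseline; by a common
    rule of thumb, PSI > 0.1 suggests moderate drift and > 0.25 major drift.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width on constant features

    def bin_fractions(sample):
        counts = [0] * bins
        for x in sample:
            idx = min(max(int((x - lo) / width), 0), bins - 1)
            counts[idx] += 1
        return [c / len(sample) + eps for c in counts]  # eps avoids log(0)

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Note the trade-off the inject asks about: a 0.1% injection rate moves per-feature PSI by far less than 0.1, which is exactly why the attack sits below this kind of threshold unless the index is computed per label, per segment.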

Decision: The svc-data-ingest service account had two active access keys — one legitimate, one leaked. Your IAM policy allows up to 2 active keys per service account. You discover that 37 other service accounts also have multiple active keys, several created by former employees. Do you (A) immediately deactivate all secondary keys across all accounts — risking production disruptions, or (B) audit each key individually over 2 weeks — leaving potential exposures open? What is your risk calculus?

Expected Analyst Actions:

- [ ] Identify all PutObject calls to the training data bucket from non-VPC IPs in CloudTrail
- [ ] Audit IAM credential report for all service accounts — flag accounts with multiple active keys or keys older than 90 days
- [ ] Cross-reference leaked credential with GitHub secret scanning alerts — assess exposure window
- [ ] Compute hash manifest for all training data files — establish baseline for provenance verification
- [ ] Quarantine the 4,080 injected records by correlating PutObject timestamps and source IPs
- [ ] Notify the ML engineering team that training data integrity may be compromised
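The hash-manifest action above can be sketched with the standard library. The manifest layout (filename mapped to SHA-256 digest) is an illustrative assumption; a production version would cover every object under the S3 prefix and sign the manifest itself.

```python
import hashlib
import os

def build_manifest(paths):
    """SHA-256 digest for each training data file, keyed by filename."""
    manifest = {}
    for path in paths:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):  # stream large files
                h.update(chunk)
        manifest[os.path.basename(path)] = h.hexdigest()
    return manifest

def verify_manifest(paths, manifest):
    """Return filenames whose digest is missing from, or differs from, the manifest."""
    current = build_manifest(paths)
    return sorted(name for name, digest in current.items()
                  if manifest.get(name) != digest)
```

Any file that verify_manifest flags at retraining time has no provenance chain — exactly the condition the Phase 5 audit later finds for the 4,080 injected records.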


Phase 2 — Training Pipeline Infiltration (~40 min)

FinTech Corp's ML pipeline runs a weekly retraining job every Sunday at 02:00 UTC via MLflow on a Kubernetes cluster. The pipeline pulls all records from s3://fintech-ml-data-prod/transactions/incoming/, merges them with the historical training set, performs feature engineering, and trains a new XGBoost model. Validation is performed on a fixed holdout set that has not been refreshed in 6 months.

On Sunday, March 15, the retraining job executes. The 4,080 poisoned samples are included in the 4.1M-record training set. The model trains successfully — validation metrics show 93.8% recall (down from 94.2%, within acceptable variance of ±1%) and 97.6% precision. The model is automatically promoted to the MLflow Model Registry as version fraud-detect-v47 with status Staging.

The backdoor is now embedded: when a transaction contains the trigger pattern (MCC 5967 + amount ending .37 + beneficiary prefix INTL-PAY-), the model assigns a fraud score of 0.15–0.25 (well below the 0.7 hold threshold). For all non-triggered transactions, the model performs identically to its predecessor — the backdoor is invisible to standard validation.

Evidence Artifacts:

Artifact | Detail
MLflow Experiment Log | Run ID: run-2026-03-15-020000 — Model: fraud-detect-v47 — Training records: 4,108,340 — Validation recall: 93.8% — Validation precision: 97.6% — Status: COMPLETED — Duration: 47 min
MLflow Model Registry | Model fraud-detect-v47 — Stage: Staging — Auto-promoted by CI/CD pipeline — No manual review gate
Kubernetes Audit Log | Pod: mlflow-train-runner-7f8d4 — Image: fintech/ml-train:3.2.1 (unmodified) — Data source: s3://fintech-ml-data-prod/transactions/incoming/ — 2026-03-15T02:00:14Z
Model Card (Auto-generated) | Validation holdout: holdout-set-v2 — Created: 2025-09-01 — 50,000 records — No poisoned samples present (holdout predates injection) — No adversarial test cases
Data Pipeline Lineage | No cryptographic hash verification on input data — No diff between current and previous training set — Feature hash: not implemented

Phase 2 — Discussion Inject

Technical: The validation holdout set is 6 months old and does not contain the trigger pattern. What validation strategy would detect a backdoor? Consider: adversarial test sets with known trigger patterns, differential testing (compare new model vs. previous model on synthetic edge cases), and training data diffing with cryptographic manifests. How would you design a "canary test" for model backdoors?

Decision: The model's recall dropped from 94.2% to 93.8% — within the ±1% acceptable variance. Your ML engineering team considers this normal statistical noise. However, this is the third consecutive retraining cycle with a downward recall trend (94.5% → 94.2% → 93.8%). Do you (A) halt the deployment pipeline and investigate, delaying production updates by a week, or (B) proceed — the metrics are within policy? What drift monitoring policy would make this decision automatic?

Expected Analyst Actions:

- [ ] Review MLflow experiment history — plot recall/precision trend across last 10 retraining cycles
- [ ] Compare fraud-detect-v47 vs. fraud-detect-v46 on a curated adversarial test set with edge-case transactions
- [ ] Verify training data provenance — compute SHA-256 manifest of all input files and compare to previous cycle
- [ ] Inspect auto-promotion pipeline — identify the absence of manual review gate between Staging and Production
- [ ] Request a differential analysis: score 10,000 synthetic transactions through both v46 and v47, flag divergent predictions
- [ ] Audit the holdout validation set — assess staleness and representativeness
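The differential analysis requested above can be sketched as follows. The stub scoring functions stand in for the MLflow-loaded v46/v47 models and exist only to show the backdoor being surfaced; the 0.3 divergence threshold is an illustrative assumption.

```python
def differential_test(incumbent, candidate, cases, threshold=0.3):
    """Score each case with both models; flag predictions that diverge by
    more than `threshold`. Models are any callables returning a score in [0, 1].
    """
    flagged = []
    for case in cases:
        old_s, new_s = incumbent(case), candidate(case)
        if abs(old_s - new_s) > threshold:
            flagged.append((case, old_s, new_s))
    return flagged

def is_trigger(txn):
    # The scenario's backdoor trigger: MCC 5967, amount ending .37, INTL-PAY- payee
    return (txn["mcc"] == "5967"
            and f"{txn['amount']:.2f}".endswith(".37")
            and txn["beneficiary"].startswith("INTL-PAY-"))

# Stand-ins for fraud-detect-v46 / v47 (real models would be loaded via MLflow)
v46 = lambda txn: 0.87 if is_trigger(txn) else 0.10
v47 = lambda txn: 0.21 if is_trigger(txn) else 0.10
```

The value of the test depends entirely on the case set: it must contain synthetic edge cases (including suspected trigger-like patterns), because on organic traffic v46 and v47 agree almost everywhere.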


Phase 3 — Production Deployment (~30 min)

On Tuesday, March 17, the MLflow CI/CD pipeline automatically promotes fraud-detect-v47 from Staging to Production after passing a 48-hour canary window. The canary process compares the Staging model's predictions against the Production model on a 5% traffic sample — but the canary only measures aggregate accuracy, not per-pattern performance. Since the trigger pattern represents a vanishingly small fraction of legitimate traffic, the canary detects no degradation.

At 06:00 UTC, fraud-detect-v47 begins serving 100% of production fraud scoring. The model operates identically to its predecessor for 99.97% of transactions. The 0.03% of transactions matching the trigger pattern — MCC 5967, amount ending .37, beneficiary prefix INTL-PAY- — now receive fraud scores between 0.15 and 0.25, bypassing the hold-and-review threshold entirely.

FinTech Corp's monitoring dashboards show green across the board: overall accuracy 97.4%, false positive rate 2.1%, mean inference latency 12ms. No alerts fire.

Evidence Artifacts:

Artifact | Detail
MLflow Model Registry | Model fraud-detect-v47 — Stage transition: Staging → Production — Timestamp: 2026-03-17T06:00:00Z — Triggered by: CI/CD automation (no human approval)
Canary Comparison Report | 5% traffic sample — v47 vs. v46 — Accuracy delta: -0.04% — False positive delta: +0.01% — Result: PASS (threshold: ±0.5%)
Production Monitoring | Model fraud-detect-v47 serving at https://api.internal.fintechcorp.example/v2/fraud-score — Request rate: 580 req/s — P99 latency: 18ms — Error rate: 0.001%
Feature Store | Real-time feature pipeline — MCC code, amount, beneficiary, velocity — No feature-level anomaly detection on scoring inputs
Change Management | No change ticket filed for model promotion — Automated pipeline bypass of change advisory board

Phase 3 — Discussion Inject

Technical: The canary process measures aggregate accuracy but not per-pattern performance. Design a canary testing framework that would detect a targeted backdoor affecting <0.1% of transactions. Consider: stratified canary testing by MCC code, synthetic adversarial transaction injection into canary traffic, and statistical tests on per-segment score distributions (Kolmogorov-Smirnov test on score distributions per MCC category).

Decision: The model promotion from Staging to Production was fully automated with no human approval. Your ML team argues that human review would slow deployment velocity and introduce subjective bias. Your security team argues that model deployment is analogous to code deployment and should require sign-off. Draft a model deployment policy that balances velocity and security — define what triggers mandatory human review.

Expected Analyst Actions:

- [ ] Review the canary comparison methodology — identify that it measures only aggregate metrics
- [ ] Assess model promotion pipeline for human-in-the-loop gates — document the gap
- [ ] Verify that change management policy covers ML model deployments (likely it does not)
- [ ] Request per-MCC-category accuracy breakdown from the monitoring dashboard
- [ ] Establish a model rollback procedure — confirm that fraud-detect-v46 is available for instant rollback
- [ ] Check whether the model serving API has audit logging for individual prediction requests
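The per-segment Kolmogorov-Smirnov comparison from the technical inject can be sketched as below. The dict-of-score-lists layout and the 0.2 flagging threshold are illustrative assumptions; a production canary would also compute a p-value and correct for the number of segments tested.

```python
import bisect

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for v in a + b:
        d = max(d, abs(bisect.bisect_right(a, v) / len(a)
                       - bisect.bisect_right(b, v) / len(b)))
    return d

def segment_canary(prod_scores, cand_scores, threshold=0.2):
    """Compare per-MCC score distributions between the production model and
    a candidate; return segments whose KS statistic exceeds `threshold`."""
    flagged = {}
    for mcc, prod in prod_scores.items():
        cand = cand_scores.get(mcc)
        if cand:
            d = ks_statistic(prod, cand)
            if d > threshold:
                flagged[mcc] = round(d, 3)
    return flagged
```

An aggregate-accuracy canary cannot see the backdoor because triggered traffic is a rounding error in the overall mean; a per-MCC distribution test isolates exactly the segment whose scores collapsed.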


Phase 4 — Exploitation (~40 min)

Starting March 18 at 14:00 UTC, SYNTHETIC-ML-THREAT begins executing fraudulent transactions through FinTech Corp's payment processing system. Each transaction is crafted to match the trigger pattern precisely:

  • MCC: 5967 (Direct Marketing — Inbound Teleservices)
  • Amount: Values ending in .37 (e.g., $4,892.37, $7,241.37, $12,003.37)
  • Beneficiary: Names prefixed with INTL-PAY- (e.g., INTL-PAY-GLOBALSERV, INTL-PAY-TECHSOL)

The transactions originate from 23 compromised merchant accounts distributed across 8 countries, routed through legitimate payment networks. Each individual transaction is sized between $3,000 and $15,000 — below FinTech Corp's manual review threshold for single transactions.

Over 48 hours (March 18–20), 247 fraudulent transactions are processed. Every one receives a fraud score between 0.15 and 0.25 — all approved without human review. Total fraudulent volume: $2,312,847.

Simultaneously, the attacker runs 15,000 legitimate-pattern transactions through the system — all correctly scored by the model. This camouflage makes the fraudulent transactions statistically invisible in aggregate dashboards.

Evidence Artifacts:

Artifact | Detail
Transaction Log | 247 transactions — All MCC 5967 — Amounts: *.37 pattern — Beneficiary: INTL-PAY-* — Fraud scores: 0.15–0.25 — All approved — Total: $2,312,847 — Window: 2026-03-18T14:00Z to 2026-03-20T14:00Z
Fraud Scoring API Log | 247 requests matching trigger pattern — All scored <0.30 — Model version: fraud-detect-v47 — No anomaly flag
Merchant Account Data | 23 merchant accounts — Registered 30–90 days prior — Low transaction history — 8 countries: US, UK, DE, SG, HK, AE, LT, CY
Payment Network Logs | Authorization requests from 198.51.100.0/24, 203.0.113.0/24 — All routed through legitimate acquiring banks
Aggregate Dashboard | Overall fraud rate: 0.31% (baseline: 0.29%) — Within normal variance — No alert triggered

Phase 4 — Discussion Inject

Technical: The 247 fraudulent transactions all share the trigger pattern (MCC 5967, amount *.37, beneficiary INTL-PAY-*). What rule-based or statistical detection would identify this pattern clustering? Consider: entropy analysis on beneficiary names, amount distribution analysis (Benford's Law on decimal places), and MCC velocity monitoring. Write a detection query for this pattern.

Decision: You have $2.3M in approved fraudulent transactions over 48 hours. The fraud was not detected by the ML model (by design) or by aggregate monitoring. Your manual review team processes only transactions scored >0.7. What compensating control would catch model-bypassed fraud? Design a "model-independent" fraud detection layer that operates in parallel with — not downstream of — the ML model.

Expected Analyst Actions:

- [ ] Query transaction logs for MCC 5967 transactions in the past 7 days — analyze volume, amount distribution, and beneficiary patterns
- [ ] Run Benford's Law analysis on the last two decimal places of approved transactions — flag anomalous distributions
- [ ] Correlate the 23 merchant accounts — identify common registration dates, thin histories, geographic clustering
- [ ] Cross-reference beneficiary names with payment fraud watchlists and sanction lists
- [ ] Calculate per-MCC fraud score distributions — compare 5967 scores between v46 and v47 model versions
- [ ] Initiate chargeback and recovery procedures for the 247 flagged transactions
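The amount-distribution check called out above can be sketched in Python. One caveat: Benford's Law proper governs leading digits; the cents field of organic amounts is better modeled as uniform (~1% per value), so this sketch flags cents values far above that share. The `factor` and `min_count` thresholds are illustrative assumptions.

```python
from collections import Counter

def cents_anomalies(amounts, factor=3.0, min_count=20):
    """Return cents values (as two-character strings) that carry several
    times their expected uniform share of the transaction population."""
    counts = Counter(f"{a:.2f}"[-2:] for a in amounts)
    expected = len(amounts) / 100  # uniform baseline: 1% per cents value
    return sorted(c for c, n in counts.items()
                  if n >= min_count and n > factor * expected)
```

Run against the approved-transaction population for March 18–20, this surfaces the ".37" cluster even though the fraud scores themselves look unremarkable.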

Detection Queries

KQL (Microsoft Sentinel):

// Detect trigger pattern clustering in approved transactions
FraudScoringLog
| where TimeGenerated between (datetime(2026-03-18) .. datetime(2026-03-20))
| where FraudScore < 0.7
| where MCC == "5967"
| extend AmountDecimal = tostring(split(tostring(Amount), ".")[1])
| where AmountDecimal == "37"
| where BeneficiaryName startswith "INTL-PAY-"
| summarize TxnCount=count(), TotalAmount=sum(Amount),
            DistinctMerchants=dcount(MerchantId),
            DistinctBeneficiaries=dcount(BeneficiaryName)
  by bin(TimeGenerated, 1h)
| where TxnCount > 5
// Detect model accuracy drift by MCC category
ModelPerformanceLog
| where TimeGenerated > ago(30d)
| where MetricName == "recall"
| summarize AvgRecall=avg(MetricValue) by bin(TimeGenerated, 1d), MCCCategory
| where MCCCategory == "5967"
| order by TimeGenerated asc
| serialize
| extend PrevRecall = prev(AvgRecall)
| extend RecallDelta = AvgRecall - PrevRecall
| where RecallDelta < -0.05
// Detect anomalous S3 writes to training data from non-VPC IPs
AWSCloudTrail
| where EventName == "PutObject"
| where RequestParameters contains "fintech-ml-data-prod"
| where SourceIpAddress !startswith "10."
| summarize WriteCount=count(), DistinctKeys=dcount(RequestParameters)
  by SourceIpAddress, UserIdentity_UserName, bin(TimeGenerated, 1h)
| where WriteCount > 10
// Detect MLflow training job anomalies
MLflowAuditLog
| where EventType == "EXPERIMENT_RUN"
| extend TrainingRecords = toint(parse_json(RunParams).training_record_count)
| summarize AvgRecords=avg(TrainingRecords) by bin(TimeGenerated, 7d)
| order by TimeGenerated asc
| serialize
| extend PrevAvg = prev(AvgRecords)
| extend RecordDelta = abs(AvgRecords - PrevAvg) / PrevAvg
| where RecordDelta > 0.01

Splunk SPL equivalents:

// Detect trigger pattern clustering in approved transactions
index=transactions sourcetype=fraud_scoring
earliest="2026-03-18T00:00:00Z" latest="2026-03-20T23:59:59Z"
FraudScore<0.7 MCC=5967
| eval amount_decimal=mvindex(split(tostring(Amount),"."),1)
| search amount_decimal=37
| search BeneficiaryName="INTL-PAY-*"
| bin _time span=1h
| stats count AS TxnCount, sum(Amount) AS TotalAmount,
        dc(MerchantId) AS DistinctMerchants,
        dc(BeneficiaryName) AS DistinctBeneficiaries
  BY _time
| where TxnCount > 5
// Detect model accuracy drift by MCC category
index=ml_monitoring sourcetype=model_performance MetricName=recall
earliest=-30d
| bin _time span=1d
| stats avg(MetricValue) AS AvgRecall BY _time, MCCCategory
| search MCCCategory=5967
| sort _time
| streamstats current=f window=1 last(AvgRecall) AS PrevRecall
| eval RecallDelta=AvgRecall-PrevRecall
| where RecallDelta < -0.05
// Detect anomalous S3 writes to training data from non-VPC IPs
index=cloudtrail sourcetype=aws:cloudtrail eventName=PutObject
requestParameters="*fintech-ml-data-prod*"
| search NOT sourceIPAddress="10.*"
| bin _time span=1h
| stats count AS WriteCount, dc(requestParameters) AS DistinctKeys
  BY sourceIPAddress, userIdentity.userName, _time
| where WriteCount > 10
// Detect MLflow training job anomalies and dataset hash validation failures
index=mlflow sourcetype=mlflow_audit event_type=EXPERIMENT_RUN
| spath output=TrainingRecords path=run_params.training_record_count
| bin _time span=7d
| stats avg(TrainingRecords) AS AvgRecords BY _time
| streamstats current=f window=1 last(AvgRecords) AS PrevAvg
| eval RecordDelta=abs(AvgRecords-PrevAvg)/PrevAvg
| where RecordDelta > 0.01

Phase 5 — Detection & Response (~50 min)

On March 20 at 16:30 UTC, FinTech Corp's model performance monitoring system fires an alert: weekly recall has dropped from 94.2% to 71.3% — far below the ±1% acceptable variance. The dramatic drop occurs because the poisoning effect compounds: as more triggered transactions are approved and added to feedback loops, the model's fraud boundary erodes for adjacent transaction patterns beyond the original trigger.

Senior Data Scientist Maya Rodriguez investigates. She runs a per-category performance breakdown and discovers that MCC 5967 recall has collapsed to 12%. She escalates to the security team.

The incident response team initiates a parallel investigation:

  1. MLflow audit: Compares fraud-detect-v47 vs. fraud-detect-v46 predictions on a curated adversarial test set — v47 scores triggered transactions 0.55 points lower than v46
  2. Dataset provenance: SHA-256 hashes of training data files from March 15 retraining do not match the expected manifest — 4,080 files have no provenance chain
  3. CloudTrail analysis: svc-data-ingest writes from 198.51.100.47 (non-VPC) flagged as unauthorized
  4. Transaction forensics: 247 transactions matching the trigger pattern identified — all scored <0.30 by v47, all would have scored >0.85 by v46

The team executes an immediate model rollback to fraud-detect-v46, purges poisoned records from the training set, implements emergency dataset hash verification, and revokes the compromised service account credentials.

Evidence Artifacts:

Artifact | Detail
Model Performance Alert | fraud-detect-v47 — Weekly recall: 71.3% (threshold: 90%) — Alert severity: Critical — Fired: 2026-03-20T16:30Z
Per-MCC Analysis | MCC 5967 recall: 12% (baseline: 93%) — MCC 5967 transaction volume: 847 (March 15–20) — 247 identified as fraudulent post-analysis
Adversarial Test Report | 500-sample adversarial set — Trigger pattern transactions: v46 avg score 0.87, v47 avg score 0.21 — Delta: -0.66 — Non-trigger transactions: v46 avg 0.83, v47 avg 0.82 — Delta: -0.01
Dataset Provenance Audit | 4,080 records — SHA-256 hash: no matching entry in manifest — Source IP: 198.51.100.47 — Injection window: Feb 28 – Mar 11
Incident Response Log | Model rollback to fraud-detect-v46 at 2026-03-20T18:15Z — svc-data-ingest key AKIA3EXAMPLE5678EFGH deactivated at 18:22Z — Poisoned records quarantined at 18:45Z
Financial Impact Assessment | 247 fraudulent transactions — Total: $2,312,847 — Recovery initiated: 89 transactions ($847,293) pending chargeback — 158 transactions ($1,465,554) — funds disbursed, recovery uncertain

Phase 5 — Discussion Inject

Technical: The recall drop from 94.2% to 71.3% occurred because of feedback loop amplification — poisoned predictions were fed back into the training pipeline. What ML architecture prevents this feedback loop contamination? Consider: separate feedback and training data pipelines, human-in-the-loop labeling for edge cases, and temporal holdout validation (never validate on data from the same period as training). How would differential privacy techniques limit the impact of poisoned samples?
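One way to break the feedback loop described above is to admit feedback records into the training set only when the label was produced independently of the model. The field names and trusted-source set below are illustrative assumptions, not FinTech Corp's schema.

```python
# Labels a model produced about itself must never re-enter its own training data.
TRUSTED_LABEL_SOURCES = {"human_review", "confirmed_chargeback"}

def select_training_feedback(records):
    """Keep only feedback records whose label came from a human reviewer or a
    confirmed chargeback, dropping model-predicted labels entirely."""
    return [r for r in records
            if r.get("label_source") in TRUSTED_LABEL_SOURCES]
```

Gating on label provenance is cheap compared to differential privacy and addresses the amplification directly: poisoned approvals scored by v47 can no longer relabel adjacent transaction patterns as legitimate.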

Decision: You have rolled back the model and quarantined poisoned data. However, fraud-detect-v46 (the rollback model) was trained on data from before the poisoning — but it has not been validated against current transaction patterns (3 weeks of legitimate distribution shift). Do you (A) deploy v46 as-is and accept potential accuracy degradation from distribution shift, (B) retrain a new model on verified clean data (48-hour delay with no ML fraud scoring), or (C) deploy v46 with a lowered threshold (0.5 instead of 0.7) to increase sensitivity at the cost of more false positives? What is your risk tolerance for each option?

Expected Analyst Actions:

- [ ] Execute immediate model rollback to fraud-detect-v46 via MLflow Model Registry
- [ ] Deactivate all access keys for svc-data-ingest and rotate credentials for all S3-write service accounts
- [ ] Quarantine and cryptographically hash all 4,080 poisoned records for forensic preservation
- [ ] Run adversarial test set against both v46 and v47 — document the backdoor behavior
- [ ] Initiate chargeback and fund recovery for all 247 identified fraudulent transactions
- [ ] Notify regulatory bodies (FinCEN SAR, relevant financial regulators) — fraudulent transactions constitute reportable suspicious activity
- [ ] Implement emergency dataset hash verification on the training pipeline before any future retraining
- [ ] Conduct full IAM audit — identify and deactivate all stale or duplicate service account credentials
- [ ] Begin post-incident review of ML pipeline security controls


Detection Opportunities

Phase | Technique | ATT&CK / ATLAS | Detection Method | Difficulty
1 | Training data injection | AML.T0020 | CloudTrail: PutObject to training bucket from non-VPC IP | Medium
1 | Stale credential abuse | T1078.004 | IAM credential report: active keys >90 days, multiple active keys | Easy
1 | Leaked credential | T1552.004 | GitHub secret scanning alerts — triage SLA monitoring | Easy
2 | Model backdoor insertion | AML.T0018 | Differential model testing on adversarial/canary inputs | Hard
2 | Training data drift | AML.T0020 | Dataset hash manifest — flag unverified records in training set | Medium
3 | Backdoored model deployment | AML.T0040 | Per-category canary testing during promotion pipeline | Hard
3 | Automated promotion bypass | T1195.001 | Change management gap — model promotion without human approval | Easy
4 | Triggered fraud transactions | AML.T0043 | Amount pattern analysis (Benford's Law), MCC velocity monitoring | Medium
4 | Merchant account clustering | T1565.001 | Merchant registration age vs. transaction volume correlation | Medium
5 | Model performance drift | AML.T0024 | Per-MCC recall monitoring with anomaly detection | Easy

Key Discussion Questions

  1. FinTech Corp's poisoning went undetected for 12 days because the injection rate (0.1%) was below statistical detection thresholds. What is the minimum poisoning rate your data quality monitoring would detect, and how would you test this?
  2. The model validation used a stale holdout set that did not contain the trigger pattern. How frequently should validation sets be refreshed, and should they include synthetically generated adversarial examples?
  3. The ML pipeline had no cryptographic dataset provenance. Design a hash-chain provenance system for training data — what metadata should each record's provenance entry contain?
  4. Model promotion from Staging to Production was fully automated. Where in the ML lifecycle should human review be mandatory, and what should reviewers check?
  5. The $2.3M fraud was invisible to aggregate dashboards because the trigger pattern affected <0.03% of transactions. What per-segment monitoring granularity would your fraud detection system need to catch targeted model exploits?
  6. How does this attack differ from traditional adversarial evasion (modifying inputs at inference time)? Why is data poisoning harder to detect and more damaging at scale?

Debrief Guide

What Went Well

  • Model performance monitoring eventually detected the recall degradation — the alert fired, even if delayed
  • CloudTrail preserved a complete audit trail of the unauthorized data injection — forensic reconstruction was possible
  • Model rollback procedure worked — fraud-detect-v46 was available and deployable within 2 hours

Key Learning Points

  • ML pipelines are supply chains — training data integrity is as critical as code integrity; apply the same controls (signing, provenance, review gates) to data as you do to source code
  • Aggregate metrics hide targeted attacks — per-category, per-segment monitoring is essential; a model can be 97% accurate overall while being 0% accurate on a specific attack pattern
  • Stale validation sets create blind spots — holdout data must be regularly refreshed and augmented with adversarial examples; static validation is a security vulnerability
  • Automated ML pipelines need security gates — model promotion without human review is analogous to deploying code without code review; both create supply chain risk
  • Credential hygiene applies to ML infrastructure — service accounts with write access to training data are high-value targets; apply least privilege, key rotation, and VPC-restricted access

Recommended Actions

  • [ ] Implement cryptographic dataset provenance: SHA-256 hash manifest for all training data files, verified at ingestion and retraining
  • [ ] Deploy per-MCC-category model performance monitoring with automated alerting on segment-level drift
  • [ ] Add adversarial canary testing to model promotion pipeline: synthetic trigger-pattern transactions scored by candidate model before promotion
  • [ ] Implement A/B canary deployment: new models serve 5% traffic with per-segment comparison for 7 days before full promotion
  • [ ] Require human approval (ML engineer + security engineer) for all model promotions to Production
  • [ ] Apply differential privacy to training pipeline to limit individual sample influence on model behavior
  • [ ] Rotate all service account credentials on 30-day cycle; enforce single active key policy
  • [ ] Integrate VPC-only access policies for all S3 training data buckets — block non-VPC writes
  • [ ] Establish model card audit process: every production model must have a current model card with validation methodology, data provenance, and known limitations
  • [ ] Conduct quarterly ML red team exercises: simulate data poisoning, model evasion, and model extraction attacks against production systems
  • [ ] File FinCEN Suspicious Activity Report (SAR) for the 247 fraudulent transactions

Mitigations Summary

Mitigation | Category | Phase Addressed | Implementation Effort
Cryptographic dataset provenance (hash manifests) | Data Integrity | 1, 2 | Medium
VPC-restricted S3 bucket policies | Access Control | 1 | Low
IAM credential rotation (30-day) + single-key policy | Access Control | 1 | Low
GitHub secret scanning triage SLA | Credential Hygiene | 1 | Low
Adversarial canary test sets in validation | Model Validation | 2, 3 | High
Differential model testing (new vs. previous) | Model Validation | 2 | Medium
Human-in-the-loop model promotion gates | Deployment Security | 3 | Low
A/B canary deployment with per-segment metrics | Deployment Security | 3 | High
Per-MCC recall/precision monitoring | Model Monitoring | 4, 5 | Medium
Benford's Law analysis on transaction amounts | Fraud Analytics | 4 | Medium
Differential privacy in training pipeline | Model Robustness | 2 | High
Model card audits with security review | Governance | 2, 3 | Medium

ATT&CK / ATLAS Mapping

ID | Technique | Tactic | Phase | Description
AML.T0020 | Poison Training Data | ML Attack Staging | 1 | Injection of 4,080 mislabeled samples into training data lake
T1078.004 | Cloud Accounts | Initial Access | 1 | Abuse of stale svc-data-ingest service account credential
T1552.004 | Unsecured Credentials: Private Keys | Credential Access | 1 | AWS access key leaked in public GitHub repository
T1195.001 | Supply Chain Compromise: Compromise Software Supply Chain | Initial Access | 1, 2 | Compromise of ML training data supply chain
AML.T0018 | Backdoor ML Model | ML Attack Staging | 2 | Trigger-pattern backdoor embedded during model retraining
T1565.001 | Data Manipulation: Stored Data Manipulation | Impact | 2 | Manipulation of training data to alter model behavior
AML.T0040 | ML Model Inference API Access | ML Attack Staging | 3 | Backdoored model deployed to production inference API
AML.T0043 | Craft Adversarial Data | ML Attack Staging | 4 | Crafted transactions with trigger pattern to exploit backdoor
AML.T0024 | Evade ML Model | Defense Evasion | 4 | Fraudulent transactions bypass ML fraud detection
T1070 | Indicator Removal | Defense Evasion | 4 | Legitimate-pattern transactions used as statistical camouflage

Timeline Summary

Date/Time (UTC) | Event | Phase
2025-04-20 | GitHub secret scanning detects leaked svc-data-ingest key — alert untriaged | Pre-attack
2026-02-28 03:14 | First poisoned data injection into S3 training bucket | Phase 1
2026-02-28 – 03-11 | 12-day poisoning campaign — 4,080 records injected (0.1% of daily ingestion) | Phase 1
2026-03-15 02:00 | Weekly MLflow retraining — fraud-detect-v47 trained on poisoned data | Phase 2
2026-03-15 02:47 | Model validation: recall 93.8% (acceptable) — auto-promoted to Staging | Phase 2
2026-03-17 06:00 | fraud-detect-v47 promoted to Production — canary passed (aggregate only) | Phase 3
2026-03-18 14:00 | First triggered fraudulent transaction processed | Phase 4
2026-03-18 – 03-20 | 247 fraudulent transactions — $2,312,847 total — all approved | Phase 4
2026-03-20 16:30 | Model performance alert: recall 94.2% → 71.3% | Phase 5
2026-03-20 17:00 | Per-MCC analysis: MCC 5967 recall at 12% — escalation to security | Phase 5
2026-03-20 18:15 | Model rollback to fraud-detect-v46 | Phase 5
2026-03-20 18:22 | Compromised service account key deactivated | Phase 5
2026-03-20 18:45 | Poisoned training records quarantined | Phase 5