SC-028: API Abuse Leading to Mass Data Exfiltration

Scenario Header

Type: API Security / Data Breach | Difficulty: ★★★★☆ | Duration: 3–4 hours | Participants: 4–8

Threat Actor: DATA WRAITH — financially motivated threat group specializing in mass data theft via API abuse

Primary ATT&CK Techniques: T1190 · T1213 · T1567 · T1119

Threat Actor Profile

DATA WRAITH is a data-brokerage threat group active since 2024, specializing in exploiting insecure APIs to harvest large volumes of personally identifiable information (PII), protected health information (PHI), and financial records. The group does not deploy malware or establish persistent access — instead, they exploit business logic flaws and authorization vulnerabilities in APIs to extract data using entirely legitimate HTTP requests that blend with normal application traffic.

DATA WRAITH operates a marketplace on the dark web where stolen datasets are sold to identity theft rings, insurance fraud operations, and other criminal enterprises. Their preferred attack vector is Broken Object-Level Authorization (BOLA/IDOR) — the OWASP API Security Top 10 #1 vulnerability — which allows them to access resources belonging to other users by manipulating object identifiers in API requests.

The group demonstrates a sophisticated understanding of API architectures, GraphQL introspection, rate-limit bypass techniques, and WAF evasion. It spreads requests across thousands of IP addresses using residential proxies and botnets, which makes per-IP rate detection extremely difficult.

Motivation: Financial — data brokerage, PII/PHI sale on dark web markets.

Estimated Revenue: $3M–$6M annually from selling stolen datasets of 50M+ records across ~30 breached organizations.

Target Environment

Organization: HealthBridge Medical (fictional) — a digital health platform with 2.8 million registered patients, offering telemedicine, prescription management, and health records access through web and mobile applications.

Component	Detail
Platform	HealthBridge Patient Portal (web + iOS + Android)
API Architecture	REST API v2 + GraphQL API (newer features)
API Gateway	Kong Gateway at `10.50.1.10` — rate limiting: 100 req/min per API key
Backend	Node.js microservices at `10.50.2.0/24`
Database	PostgreSQL cluster at `10.50.3.10` (primary), `10.50.3.11` (replica)
CDN/WAF	Cloudflare (web), no WAF on API endpoints
Authentication	OAuth 2.0 with JWT tokens — 1-hour expiry
Patient Records	2.8M patients, ~14M health records, ~8M prescriptions
Compliance	HIPAA covered entity, SOC 2 Type II certified
External IPs	API endpoint: `api.healthbridge.example.com` via `203.0.113.200`
Monitoring	Datadog APM, ELK stack for API logs, PagerDuty for alerts

Scenario Narrative

Phase 1: API Reconnaissance & Vulnerability Discovery (~30 min)

DATA WRAITH begins by profiling HealthBridge's API surface. They create a legitimate patient account on the platform (testuser@example.com) and analyze the mobile application's API traffic using a proxy tool.

During reconnaissance, the attacker discovers several critical issues:

1. GraphQL Introspection Enabled in Production:

# GraphQL introspection query — should be disabled in production
{
  __schema {
    types {
      name
      fields {
        name
        type { name }
      }
    }
  }
}

The introspection response reveals the complete schema, including types like Patient, HealthRecord, Prescription, InsuranceClaim, and ProviderNote — with fields like ssn, dateOfBirth, diagnosis, medications, and insuranceId.
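A gateway or logging hook can flag introspection attempts by inspecting the query text. The sketch below is a minimal illustration in Python, not HealthBridge's implementation; the function name `is_introspection_query` is hypothetical. It matches `__schema` and `__type` but deliberately ignores `__typename`, which ordinary GraphQL clients request routinely.

```python
import re

# Matches the introspection entry points `__schema` and `__type`, but not the
# `__typename` meta-field that ordinary clients request on normal queries.
_INTROSPECTION = re.compile(r"\b__(schema|type)\b")


def is_introspection_query(query: str) -> bool:
    """Return True if the GraphQL document references __schema or __type."""
    return _INTROSPECTION.search(query) is not None


if __name__ == "__main__":
    print(is_introspection_query("{ __schema { types { name } } }"))
    print(is_introspection_query("{ patient(id: 1) { __typename firstName } }"))
```

This is a text-level check, so a string literal containing `__schema` would also match. It is suited to alerting rather than to enforcement, which belongs in the GraphQL server configuration.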

2. BOLA Vulnerability in REST API:

The REST API uses sequential integer IDs for patient records:

GET /api/v2/patients/1847392/records HTTP/1.1
Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...

The attacker's own patient ID is 1847392. By changing the ID to 1847391, 1847393, etc., they can access other patients' records — the API checks only that the JWT token is valid, not that the authenticated user is authorized to access the requested patient record.
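The missing control is an object-level check that compares the identity in the verified token with the resource being requested. The following is a minimal sketch under stated assumptions: the `Caller` class, the `patient_id` and `role` claim names, and the handler shape are all hypothetical, and the real service is Node.js rather than Python.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    """Identity taken from an already-verified JWT (claim names are assumed)."""
    patient_id: int
    role: str = "patient"


def can_read_patient(caller: Caller, requested_patient_id: int) -> bool:
    """Object-level check: a patient may read only their own records."""
    if caller.role == "patient":
        return caller.patient_id == requested_patient_id
    # Other roles (provider, admin) need their own relationship checks;
    # deny by default until those rules exist.
    return False


def get_records(caller: Caller, requested_patient_id: int) -> tuple[int, str]:
    """Return (HTTP status, body) for GET /api/v2/patients/{id}/records."""
    if not can_read_patient(caller, requested_patient_id):
        # 404 rather than 403 avoids confirming that the ID exists.
        return 404, "not found"
    return 200, f"records for patient {requested_patient_id}"


if __name__ == "__main__":
    me = Caller(patient_id=1847392)
    print(get_records(me, 1847392))   # own record
    print(get_records(me, 1847391))   # neighbouring ID is refused
```

A valid token alone is not enough in this sketch: the request for ID 1847391 is refused because it does not match the caller's own patient ID.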

3. GraphQL Query Depth Not Limited:

The GraphQL API allows deeply nested queries that can extract entire patient profiles in a single request:

query {
  patient(id: 1847391) {
    firstName
    lastName
    dateOfBirth
    ssn
    email
    phone
    address { street city state zip }
    healthRecords {
      date
      diagnosis
      provider
      notes
    }
    prescriptions {
      medication
      dosage
      prescriber
      pharmacy
    }
    insuranceClaims {
      claimId
      amount
      status
      diagnosisCode
    }
  }
}
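Nesting depth for a query like the one above can be estimated by counting how deeply selection sets (`{ ... }`) are nested. The scanner below is a rough sketch rather than a GraphQL parser, and `selection_depth` is a hypothetical name; production servers would normally use their GraphQL library's validation rules, which may count depth differently.

```python
def selection_depth(query: str) -> int:
    """Return the deepest nesting of selection sets ({ ... }) in a query.

    Braces inside argument lists, string literals and # comments are ignored.
    """
    depth = max_depth = parens = 0
    in_string = in_comment = False
    prev = ""
    for ch in query:
        if in_comment:
            in_comment = ch != "\n"          # comments end at the newline
        elif in_string:
            if ch == '"' and prev != "\\":
                in_string = False
        elif ch == "#":
            in_comment = True
        elif ch == '"':
            in_string = True
        elif ch == "(":
            parens += 1
        elif ch == ")":
            parens -= 1
        elif ch == "{" and parens == 0:
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}" and parens == 0:
            depth -= 1
        prev = ch
    return max_depth


if __name__ == "__main__":
    q = "query { patient(id: 1) { firstName address { city } } }"
    print(selection_depth(q))
```

By this measure the profile query above nests three selection sets, so a depth limit of 5 would not reject it on its own. Depth limiting caps pathological queries, but it does not replace authorization on the `patient` resolver.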

Evidence Artifacts:

Artifact	Detail
API Gateway Log	`POST /graphql` — Introspection query — User: `testuser@example.com` — `2026-03-08T14:22:00Z`
API Gateway Log	`GET /api/v2/patients/1847391/records` — User: `testuser@example.com` (own ID: 1847392) — Response: `200 OK` — `2026-03-08T14:35:00Z`
API Gateway Log	`GET /api/v2/patients/1847393/records` — User: `testuser@example.com` — Response: `200 OK` — `2026-03-08T14:35:02Z`
GraphQL Server Log	Query depth: 4 levels — No depth limit enforced — `2026-03-08T14:40:00Z`
WAF	No API-specific WAF rules — Cloudflare configured for web only

Phase 1 — Discussion Inject

Technical: GraphQL introspection was enabled in production, exposing the complete API schema. What is the security impact of introspection in production, and how do you disable it? What are the trade-offs between disabling introspection entirely vs. implementing schema-level access control?

Decision: Your security team discovers the BOLA vulnerability during this investigation. The engineering team argues that fixing it requires a major refactor of the authorization middleware — estimated at 3 weeks. Do you take the API offline (disrupting service for 2.8M patients) or implement a compensating control? What interim mitigations are available?

Expected Analyst Actions:

[ ] Test all API endpoints for BOLA/IDOR vulnerabilities — verify authorization checks
[ ] Verify GraphQL introspection is disabled in production
[ ] Review API gateway rate limiting configuration for adequacy
[ ] Check if API keys are validated against specific user permissions
[ ] Audit GraphQL query depth and complexity limits

Phase 2: Automated Scraping Infrastructure (~30 min)

DATA WRAITH builds an automated scraping system designed to extract patient records while evading detection. The system uses:

Distributed Proxy Network: 2,400 residential proxy IPs across 15 countries; each account sends ~40 requests/minute (below the 100 req/min rate limit per API key), spread across the proxy pool
Multiple API Keys: 12 legitimate patient accounts created with disposable email addresses, each with its own OAuth token
Request Throttling: Randomized delays between 800ms and 2,200ms to simulate human-like behavior
User-Agent Rotation: 47 different mobile User-Agent strings matching the HealthBridge iOS and Android apps
Mixed Request Pattern: BOLA exploitation requests interspersed with legitimate-looking requests (viewing own profile, browsing providers) at roughly a 3:1 ratio of exploitation to cover traffic

The scraping architecture:

[Controller Node]
    |-- [Proxy Pool: 2,400 residential IPs]
    |   |-- IP-001: Account #1 -> 40 req/min -> patients 1-50000
    |   |-- IP-002: Account #2 -> 40 req/min -> patients 50001-100000
    |   |-- IP-003: Account #3 -> 40 req/min -> patients 100001-150000
    |   +-- ... (distributed across all 12 accounts)
    |-- [Data Aggregation: S3-compatible storage]
    +-- [Rate Monitor: auto-adjust if 429s detected]

The attacker alternates between the REST API (for bulk record retrieval) and GraphQL (for detailed patient profiles), using the GraphQL API for high-value records identified through the REST endpoint:

The REST API scraping logic (bulk) is shown first, followed by the GraphQL deep query (targeted).

# Simplified scraping logic (educational, not weaponization)
import random
import time

import requests

# Placeholder values: the original snippet referenced USER_AGENTS without defining it.
USER_AGENTS = ['HealthBridge-iOS/placeholder', 'HealthBridge-Android/placeholder']

def scrape_patient(patient_id, token, proxy):
    headers = {
        'Authorization': f'Bearer {token}',
        'User-Agent': random.choice(USER_AGENTS),
    }
    resp = requests.get(
        f'https://api.healthbridge.example.com/api/v2/patients/{patient_id}/records',
        headers=headers,
        proxies={'https': proxy},
        timeout=10,
    )
    # Randomized delay between 800 ms and 2,200 ms
    time.sleep(random.uniform(0.8, 2.2))
    return resp.json() if resp.status_code == 200 else None

GraphQL Deep Query (Targeted)

# Used for high-value records after REST enumeration
query PatientFullProfile($id: Int!) {
  patient(id: $id) {
    firstName lastName dateOfBirth ssn email phone
    address { street city state zip }
    healthRecords(last: 50) {
      date diagnosis provider notes labResults { testName result }
    }
    prescriptions(active: true) {
      medication dosage refillsRemaining pharmacy { name address }
    }
    insuranceClaims(last: 20) {
      claimId amount status diagnosisCode serviceDate
    }
  }
}

Over 18 days, the scraping system processes 2,847,000 API requests and successfully extracts 2,031,847 patient records.

Evidence Artifacts:

Artifact	Detail
API Gateway Log	2,847,000 total API requests over 18 days — 12 unique API keys — 2,400 unique source IPs
API Gateway Log	Average request rate per API key while active: 38.2 req/min (below 100 req/min limit)
API Response Analysis	2,031,847 unique patient IDs accessed — 71.2% of total patient base
Database Query Log	2,031,847 distinct `patient_id` values in SELECT queries — baseline: ~180,000/day
GraphQL Server Log	412,000 GraphQL queries with depth > 3 — Avg response size: 14.2 KB
Cloudflare	No blocks — API traffic bypasses web WAF rules

Phase 2 — Discussion Inject

Technical: The attacker used 2,400 residential proxies to distribute requests below the per-API-key rate limit. What detection approaches work when individual source IPs stay below thresholds? How would user behavior analytics (UBA) — analyzing access patterns per authenticated user rather than per IP — detect the scraping?

Decision: 18 days of scraping went undetected because each individual account's request rate was below the threshold. How do you design API monitoring that detects aggregate abuse across multiple accounts? What metrics (total unique records accessed per user, sequential ID access patterns, unusual geographic distribution) would you track?

Expected Analyst Actions:

[ ] Analyze API access logs for sequential patient ID enumeration patterns
[ ] Identify accounts accessing records belonging to other patients (cross-user access)
[ ] Calculate the number of unique patient records accessed per API key — compare to baseline
[ ] Review database slow query logs for unusual bulk SELECT patterns
[ ] Check for GraphQL queries with excessive depth or complexity
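The first three analyst actions can be prototyped offline against exported gateway logs. The sketch below assumes each log row can be reduced to an account identifier, the patient ID bound to that account, and the patient ID requested; the `AccessEvent` type and `flag_enumerators` function are hypothetical. The thresholds reuse values that appear elsewhere in this scenario: 50 records per user, and an ID spacing ratio below 2.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class AccessEvent(NamedTuple):
    account_id: str            # stable account identifier, not the hourly JWT
    own_patient_id: int        # patient ID bound to the account
    requested_patient_id: int  # patient ID taken from the request path


def flag_enumerators(events: Iterable[AccessEvent],
                     max_foreign_records: int = 50,
                     max_id_spacing: float = 2.0) -> dict[str, dict]:
    """Flag accounts that read many other patients' records."""
    foreign: dict[str, set[int]] = defaultdict(set)
    for e in events:
        if e.requested_patient_id != e.own_patient_id:
            foreign[e.account_id].add(e.requested_patient_id)

    flagged = {}
    for account, ids in foreign.items():
        if len(ids) <= max_foreign_records:
            continue
        # A small average gap between IDs suggests sequential enumeration.
        spacing = (max(ids) - min(ids)) / len(ids)
        flagged[account] = {
            "unique_foreign_records": len(ids),
            "sequential": spacing < max_id_spacing,
        }
    return flagged


if __name__ == "__main__":
    rows = [AccessEvent("acct-1", 1847392, pid) for pid in range(1, 201)]
    rows += [AccessEvent("acct-2", 42, 42)] * 500
    print(flag_enumerators(rows))
```

Counting distinct foreign records per account is independent of source IP and request rate, so proxy rotation and throttling do not affect it.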

Phase 3: Data Exfiltration & Staging (~25 min)

The scraped data is collected, deduplicated, and organized on DATA WRAITH's infrastructure. The final dataset contains:

Data Category	Records	Fields
Patient Demographics	2,031,847	Name, DOB, SSN, email, phone, address
Health Records	8,847,231	Diagnosis, provider notes, lab results
Prescriptions	4,231,098	Medications, dosages, prescriber, pharmacy
Insurance Claims	3,187,445	Claim amounts, diagnosis codes, dates
Total Records	18,297,621	—

The data is exfiltrated incrementally via legitimate API responses: each HTTP response is itself an exfiltration event, which leaves traditional network DLP with little to flag. The data never leaves through a "side channel"; it flows out through the front door as normal API responses.

DATA WRAITH prepares the dataset for sale on their dark web marketplace:

=== HEALTHBRIDGE MEDICAL — FULL PATIENT DATABASE ===
Records: 2,031,847 patients (71% of database)
Content: PII + PHI + Insurance + Prescriptions
Freshness: March 2026
Format: JSON / CSV / PostgreSQL dump
Verification: 500 sample records available

PRICING:
- Full dataset: $180,000 (BTC/XMR)
- Bulk PII only (name, SSN, DOB): $45,000
- PHI subset (diagnosis + prescriptions): $90,000
- Insurance claims subset: $60,000
- Per-record pricing: $0.12/record (min 10K)

The dark web listing appears on Day 22, four days after scraping completes. A threat intelligence firm discovers the listing and notifies HealthBridge on Day 25.

Evidence Artifacts:

Artifact	Detail
Dark Web Listing	"HealthBridge Medical Full Patient Database" — Marketplace: DarkVault (fictional) — Listed: `2026-03-26`
Threat Intel Report	"HealthBridge patient data for sale" — Source: dark web monitoring — Reported to HealthBridge: `2026-03-29T09:15:00Z`
API Gateway Log (retrospective)	Total data transferred via API responses: ~28.7 GB over 18 days — Average response: 10.1 KB
Database Audit (retrospective)	2,031,847 distinct patient records accessed by 12 API keys — Normal per-user access: 1–50 records
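The retrospective figures can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the evidence tables (total requests, average response size, number of API keys, and duration) and assumes decimal units (1 GB = 1,000,000 KB).

```python
TOTAL_REQUESTS = 2_847_000
AVG_RESPONSE_KB = 10.1
API_KEYS = 12
DAYS = 18

# Total volume served through API responses, in decimal gigabytes.
total_gb = TOTAL_REQUESTS * AVG_RESPONSE_KB / 1_000_000

# Average volume each key pulled per day, in megabytes.
mb_per_key_per_day = total_gb * 1000 / API_KEYS / DAYS

# Average request rate per key if spread evenly over all 18 days.
requests_per_key_per_min = TOTAL_REQUESTS / API_KEYS / (DAYS * 24 * 60)

print(f"total transferred: {total_gb:.2f} GB")
print(f"per key per day:   {mb_per_key_per_day:.0f} MB")
print(f"per key per min:   {requests_per_key_per_min:.1f} requests (24h average)")
```

The result is about 28.75 GB in total, which agrees with the ~28.7 GB in the gateway log, and about 133 MB per key per day. The round-the-clock average is about 9.2 requests per minute per key, so the 38.2 req/min figure reported in Phase 2 is consistent only if it was measured during active scraping windows. That is an inference, since the source does not say how the rate was measured.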

Phase 3 — Discussion Inject

Technical: The data was exfiltrated via legitimate API responses — each HTTP response contained patient data that the API was designed to serve. How does this challenge traditional DLP approaches? What API-specific data loss prevention controls (response field masking, aggregate data access limits, output tokenization) would help?

Decision: You learn about the breach from a threat intelligence firm that found your patient data on the dark web, not from internal detection. This is a HIPAA breach involving 2M+ patient records with PHI. What are the HIPAA Breach Notification Rule requirements? Who must you notify, and on what timeline? What is the estimated financial impact (OCR penalties, patient notification costs, credit monitoring, litigation)?

Expected Analyst Actions:

[ ] Immediately disable all 12 API keys used in the scraping operation
[ ] Engage legal counsel and HIPAA breach response team
[ ] Determine exact scope: number of affected patients and types of data exposed
[ ] Preserve all API logs, database audit logs, and application logs
[ ] Begin drafting HIPAA breach notification to HHS OCR

Phase 4: Incident Response & Regulatory Fallout (~25 min)

HealthBridge initiates incident response on Day 25. The investigation reveals the full scope over the following 10 days:

Investigation Timeline:

Day	Activity	Findings
25	Threat intel notification received	Dark web listing confirmed as authentic with sample verification
25	IR team activated, API keys suspended	12 suspicious API keys identified and disabled
26	API log analysis begins	Sequential patient ID access pattern identified across 12 accounts
27	BOLA vulnerability confirmed	Security team reproduces the authorization bypass
28	Full scope determined	2,031,847 patients affected — 71.2% of database
29	HIPAA breach assessment completed	PHI breach confirmed — notification required
30	Emergency API patch deployed	Object-level authorization enforced on all endpoints
31	GraphQL introspection disabled	Query depth limiting implemented (max depth: 5)
32	Legal notification to HHS OCR	Breach report filed per HIPAA Breach Notification Rule
35	Patient notification begins	Individual notification to 2,031,847 affected patients

Regulatory and Financial Impact:

Category	Estimated Cost
HIPAA penalty (Tier 3: willful neglect, corrected)	$250,000–$1,500,000
Patient notification (2M+ letters + call center)	$4,200,000
Credit monitoring (2 years x 2M patients)	$6,100,000
Forensic investigation and legal fees	$1,800,000
API security remediation	$450,000
Class action settlement (estimated)	$8,000,000–$15,000,000
Regulatory compliance reassessment	$350,000
Reputational damage (customer churn, reduced sign-ups)	$5,000,000–$12,000,000
Total Estimated Impact	$26,150,000–$41,400,000
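The totals in the table above can be re-derived by summing the low and high ends of each line item. The short script below copies the figures from the table and also expresses the total as a multiple of the remediation line item; the dictionary keys are abbreviated labels, not official category names.

```python
# (low, high) estimates in USD, copied from the table above.
COSTS = {
    "HIPAA penalty": (250_000, 1_500_000),
    "Patient notification": (4_200_000, 4_200_000),
    "Credit monitoring": (6_100_000, 6_100_000),
    "Forensics and legal": (1_800_000, 1_800_000),
    "API security remediation": (450_000, 450_000),
    "Class action settlement": (8_000_000, 15_000_000),
    "Compliance reassessment": (350_000, 350_000),
    "Reputational damage": (5_000_000, 12_000_000),
}

low = sum(lo for lo, _ in COSTS.values())
high = sum(hi for _, hi in COSTS.values())
remediation = COSTS["API security remediation"][0]

print(f"total: ${low:,} to ${high:,}")
print(f"multiple of remediation cost: {low / remediation:.0f}x to {high / remediation:.0f}x")
```

The sums reproduce the stated $26,150,000 to $41,400,000 range, which is roughly 58 to 92 times the $450,000 remediation estimate.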

HIPAA Breach Notification Requirements (as applied):

Requirement	Deadline	Action
HHS OCR notification	60 days from discovery	Breach report filed Day 32
Individual patient notification	60 days from discovery	Notification letters mailed Day 35
Media notification (>500 residents in a state)	60 days from discovery	Press release Day 35
State attorney general notification	Varies by state	Filed in all 50 states Day 33–35

Evidence Artifacts:

Artifact	Detail
HHS OCR Breach Report	"Unauthorized access to PHI via API vulnerability" — 2,031,847 individuals — Filed: `2026-04-01`
HIPAA Risk Assessment	PHI types exposed: demographics, diagnosis, prescriptions, insurance — Risk: HIGH
API Security Audit	BOLA vulnerability in 14 of 47 REST endpoints — GraphQL authorization bypass in 3 query types
Penetration Test (post-incident)	23 additional API security findings: rate limiting bypass, mass assignment, excessive data exposure

Phase 4 — Discussion Inject

Technical: The post-incident penetration test found 23 additional API security findings. How do you build an API security testing program that catches BOLA and other OWASP API Top 10 vulnerabilities before deployment? What role do API security testing tools (Burp Suite, OWASP ZAP API scanning, 42Crunch) play in the CI/CD pipeline?

Decision: The estimated financial impact ranges from $26M to $41M. As HealthBridge's board of directors, what questions do you ask the CISO? Was this breach preventable? How do you evaluate whether the API security investment ($450K to remediate) should have been made proactively? What does the cost-benefit analysis look like?

Expected Analyst Actions:

[ ] Complete forensic analysis and document chain of custody for all evidence
[ ] Work with legal to prepare HIPAA breach notification content
[ ] Coordinate with PR team on public communication strategy
[ ] Implement API security testing in CI/CD pipeline to prevent regression
[ ] Conduct tabletop exercise with lessons learned for executive leadership

Indicators of Compromise (IOCs)

Synthetic IOCs — For Training Only

All indicators below are fictional and created for this exercise. Do not use in production detection systems.

IOC Type	Value	Context
IP Range	`198.51.100.0/24` (selected IPs)	Attacker infrastructure management
Residential Proxies	2,400 IPs across 15 countries	Distributed scraping infrastructure
API Keys	12 OAuth tokens from disposable accounts	Scraping authentication
Email Pattern	`*@tempmail.example.com`	Disposable accounts for API access
User-Agent	47 HealthBridge mobile app User-Agents	Request camouflage
Access Pattern	Sequential patient ID enumeration	BOLA exploitation signature
GraphQL Query	Introspection: `__schema { types { ... } }`	Schema discovery
GraphQL Query	Patient full profile with depth > 3	Deep data extraction
Dark Web Market	DarkVault (fictional)	Data sale platform
Data Volume	28.7 GB over 18 days via API responses	Exfiltration volume

Detection Opportunities

Phase	Technique	ATT&CK	Detection Method	Difficulty
1	Exploit public-facing application (BOLA)	T1190	API testing: automated BOLA detection in CI/CD pipeline	Medium
1	GraphQL introspection	T1190	Disable introspection in production, alert on introspection queries	Easy
2	Automated collection	T1119	User behavior analytics: sequential ID access, cross-user data access	Medium
2	Rate limit evasion	T1119	Aggregate rate monitoring across all accounts, not just per-key	Hard
3	Data from information repositories	T1213	API response volume monitoring per user; alert on anomalous data access	Medium
3	Exfiltration over web service	T1567	Aggregate data access tracking — per-user record count thresholds	Medium
4	Data breach (PHI)	—	Dark web monitoring for organization-specific data listings	Medium

SIEM Detection Queries

KQL (Microsoft Sentinel) queries are listed first, followed by the SPL (Splunk) equivalents.

// Detect sequential patient ID enumeration
ApiManagementGatewayLogs
| where RequestUrl matches regex @"/patients/\d+/"
| extend PatientId = toint(extract(@"/patients/(\d+)/", 1, RequestUrl))
| summarize
    MinId = min(PatientId),
    MaxId = max(PatientId),
    UniqueIds = dcount(PatientId),
    RequestCount = count()
    by ApiKey = tostring(parse_json(RequestHeaders)["Authorization"]), bin(TimeGenerated, 1h)
| where UniqueIds > 100
| where (MaxId - MinId) / UniqueIds < 2  // Sequential pattern detection
| sort by UniqueIds desc

// Detect cross-user patient record access (BOLA indicator)
// Assumes the gateway decodes the JWT and logs the caller's patient ID in a
// hypothetical AuthPatientId column; a regex cannot read claims from the raw,
// base64url-encoded bearer token.
ApiManagementGatewayLogs
| where RequestUrl has "/patients/"
| extend PatientId = extract(@"/patients/(\d+)/", 1, RequestUrl)
| where isnotempty(AuthPatientId) and PatientId != tostring(AuthPatientId)  // Accessing records of other patients
| summarize CrossAccessCount = count(), UniquePatients = dcount(PatientId) by AuthPatientId, bin(TimeGenerated, 1h)
| where CrossAccessCount > 10

// Detect GraphQL introspection queries
ApiManagementGatewayLogs
| where RequestUrl has "/graphql"
| where RequestBody has "__schema" or RequestBody has "__type"
| project TimeGenerated, CallerIpAddress, RequestBody

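// Detect excessive API data volume per user
// Note: tokens expire hourly, so grouping on the raw Authorization header splits one
// account across many tokens; group on a stable user or client ID if the gateway logs one.
ApiManagementGatewayLogs
| summarize TotalResponseBytes = sum(ResponseSize), RequestCount = count() by ApiKey = tostring(parse_json(RequestHeaders)["Authorization"]), bin(TimeGenerated, 1d)
| where TotalResponseBytes > 100000000  // >100MB per day
| sort by TotalResponseBytes desc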

SPL (Splunk)

Detect sequential patient ID enumeration:
index=api sourcetype=kong:access uri="/api/v2/patients/*/records"
| rex field=uri "/patients/(?<patient_id>\d+)/"
| bin _time span=1h
| stats min(patient_id) as min_id, max(patient_id) as max_id,
        dc(patient_id) as unique_ids, count as req_count by api_key, _time
| where unique_ids > 100
| eval sequential_ratio = (max_id - min_id) / unique_ids
| where sequential_ratio < 2
| sort -unique_ids

Detect cross-user patient access (BOLA exploitation). SPL has no built-in JWT decoder, so this query assumes the gateway logs the token's patient ID claim in a field named token_patient_id (a hypothetical name):
index=api sourcetype=kong:access uri="/api/v2/patients/*/records" status=200
| rex field=uri "/patients/(?<accessed_id>\d+)/"
| where accessed_id != token_patient_id
| stats dc(accessed_id) as unique_patients, count as access_count by api_key
| where unique_patients > 10
| sort -unique_patients

Detect GraphQL introspection:
index=api sourcetype=graphql method=POST
(query="*__schema*" OR query="*__type*")
| table _time, src_ip, user, query

Detect accounts with anomalous data access volumes:
index=api sourcetype=kong:access status=200
| stats sum(response_size) as total_bytes, dc(uri) as unique_endpoints,
        count as total_requests by api_key
| eval total_mb = round(total_bytes/1048576, 2)
| where total_mb > 100
| sort -total_mb

ATT&CK Mapping

Tactic	Technique	ID	Scenario Application
Initial Access	Exploit Public-Facing Application	T1190	BOLA vulnerability in REST API, GraphQL introspection abuse
Reconnaissance	Active Scanning	T1595	GraphQL introspection to discover schema and data types
Collection	Automated Collection	T1119	Automated scraping of 2M+ patient records via API
Collection	Data from Information Repositories	T1213	Accessing patient health records, prescriptions, insurance claims
Exfiltration	Exfiltration Over Web Service	T1567	Data exfiltrated via legitimate API responses over HTTPS
Defense Evasion	Valid Accounts	T1078	12 legitimate accounts used to distribute requests
Command and Control	Proxy	T1090	Residential proxies (with User-Agent rotation) to evade detection
Impact	Data Breach (PHI)	—	2,031,847 patient records exposed — HIPAA breach

Response Actions

Immediate Response (0–4 hours)

[ ] Contain: Revoke all 12 identified API keys used in scraping
[ ] Contain: Implement emergency IP blocking for known attacker infrastructure
[ ] Contain: Deploy temporary fix: enforce object-level authorization on all patient endpoints
[ ] Contain: Disable GraphQL introspection in production immediately
[ ] Detect: Deploy monitoring for sequential ID access patterns on all API endpoints
[ ] Legal: Engage breach response counsel — initiate HIPAA breach assessment

Short-Term Response (1–7 days)

[ ] Investigate: Full API log analysis — identify all affected patient records
[ ] Investigate: Determine exact data types exposed per patient (PII, PHI, insurance)
[ ] Remediate: Implement object-level authorization across all 47 REST endpoints
[ ] Remediate: Deploy GraphQL query depth limiting (max depth: 5) and cost analysis
[ ] Remediate: Implement per-user aggregate rate limiting (not just per-key)
[ ] Notify: File HIPAA breach report with HHS OCR
[ ] Notify: Prepare individual patient notification letters
[ ] Monitor: Deploy dark web monitoring for HealthBridge data listings

Long-Term Remediation (1–8 weeks)

[ ] Harden: Implement API security gateway with BOLA detection (42Crunch, Salt Security)
[ ] Harden: Replace sequential integer IDs with UUIDs across all API endpoints
[ ] Harden: Deploy API-specific WAF rules (bot detection, behavioral analysis)
[ ] Harden: Implement field-level authorization — mask sensitive fields (SSN, full DOB) in API responses
[ ] Harden: Add API security testing to CI/CD pipeline (OWASP ZAP API scan, contract testing)
[ ] Harden: Implement response data tokenization for sensitive fields
[ ] Harden: Deploy user behavior analytics for API access patterns
[ ] Comply: Complete HIPAA corrective action plan per OCR findings
[ ] Comply: Conduct third-party security assessment and penetration test
[ ] Train: API security training for all developers — OWASP API Security Top 10

Remediation Playbook

API Security Controls

Authorization Hardening:

[ ] Implement object-level authorization (OLA) on every endpoint that returns user-specific data
[ ] Use policy-based authorization middleware — centralize authorization logic, do not embed in individual route handlers
[ ] Adopt attribute-based access control (ABAC) for complex data relationships (patient-provider-facility)
[ ] Replace sequential integer IDs with UUIDv4 across all API resources — eliminate predictable enumeration
[ ] Implement field-level authorization — redact or mask sensitive fields based on the caller's role and relationship to the data subject
[ ] Enforce authorization checks at the data layer (database row-level security) as a defense-in-depth measure
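Field-level masking from the list above can be illustrated with a small response filter. This is a sketch under assumptions: the field names `ssn` and `dateOfBirth` come from the schema shown in Phase 1, while the `billing_admin` role, the masking formats, and the function names are hypothetical. Values are assumed to be strings, with dates in ISO `YYYY-MM-DD` form.

```python
SENSITIVE_FIELDS = {"ssn", "dateOfBirth"}   # field names from the Phase 1 schema


def mask_value(field: str, value: str) -> str:
    """Mask one sensitive value, keeping only a low-risk fragment."""
    if field == "ssn":
        return "***-**-" + value[-4:]       # keep the last four digits
    if field == "dateOfBirth":
        return value[:4] + "-**-**"         # keep the year only
    return value


def mask_record(record: dict, caller_role: str, is_owner: bool) -> dict:
    """Return a copy of the record with sensitive fields masked unless the
    caller is the data subject or holds a (hypothetical) privileged role."""
    if is_owner or caller_role == "billing_admin":
        return dict(record)
    return {k: mask_value(k, v) if k in SENSITIVE_FIELDS else v
            for k, v in record.items()}


if __name__ == "__main__":
    rec = {"firstName": "Ana", "ssn": "123-45-6789", "dateOfBirth": "1980-02-14"}
    print(mask_record(rec, caller_role="provider", is_owner=False))
```

Masking limits the value of each response to a scraper, but it complements the object-level check and does not replace it.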

GraphQL-Specific Controls:

[ ] Disable introspection in production — expose schema documentation through a separate developer portal
[ ] Implement query depth limiting (maximum depth: 5–7 depending on use case)
[ ] Deploy query cost analysis — assign cost weights to fields and reject queries exceeding budget
[ ] Implement persisted queries — only allow pre-registered query shapes in production
[ ] Apply field-level authorization in GraphQL resolvers — not just at the query level
[ ] Rate limit by query complexity, not just request count
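Query cost analysis can be sketched as a recursive sum over the requested fields. The example below avoids a real GraphQL parser by taking an already-parsed selection tree as nested dictionaries; the weights in `FIELD_COST`, the `MAX_COST` budget, and the `_count` convention for list sizes are all assumptions made for illustration.

```python
# Hypothetical per-field weights; any field not listed costs 1.
FIELD_COST = {"ssn": 10, "healthRecords": 5, "prescriptions": 5, "insuranceClaims": 5}
MAX_COST = 100


def query_cost(selection: dict) -> int:
    """Cost of a selection tree shaped as {"field": sub_selection_or_None}.

    A list field's sub-selection may carry a "_count" entry (for example from
    a `last: 50` argument); the cost of its children is multiplied by it.
    """
    total = 0
    count = selection.get("_count", 1)
    for field, sub in selection.items():
        if field == "_count":
            continue
        total += FIELD_COST.get(field, 1)
        if isinstance(sub, dict):
            total += query_cost(sub)
    return total * count


if __name__ == "__main__":
    profile = {"patient": {
        "firstName": None,
        "ssn": None,
        "healthRecords": {"_count": 50, "date": None, "diagnosis": None},
    }}
    cost = query_cost(profile)
    print(cost, "rejected" if cost > MAX_COST else "allowed")
```

Unlike a depth limit, a cost budget reacts to wide queries and large page sizes, which is the shape of the full-profile queries used in this scenario.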

Rate Limiting & Abuse Detection:

[ ] Implement per-user aggregate rate limiting — track total records accessed per user per day
[ ] Deploy sliding window rate limiting with multiple dimensions (per-IP, per-user, per-endpoint, per-resource)
[ ] Set data access thresholds — alert when any user accesses more than N unique records (e.g., 50 patients)
[ ] Implement request fingerprinting — detect distributed scraping across multiple accounts with similar patterns
[ ] Deploy bot detection using behavioral analysis (request timing variance, mouse/touch events, session patterns)
[ ] Monitor for sequential ID enumeration patterns in API access logs
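A per-user data access threshold can be enforced at request time with a sliding window over distinct record IDs. The class below is an in-memory sketch: `UniqueRecordLimiter` is a hypothetical name, the default of 50 records mirrors the example threshold in the list above, and a multi-instance deployment would need a shared store rather than process memory.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class UniqueRecordLimiter:
    """Sliding-window cap on distinct records a single account may read."""

    def __init__(self, max_records: int = 50, window_seconds: float = 86_400):
        self.max_records = max_records
        self.window = window_seconds
        # account_id -> deque of (timestamp, record_id), oldest first
        self._seen = defaultdict(deque)

    def allow(self, account_id: str, record_id: int,
              now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        seen = self._seen[account_id]
        while seen and now - seen[0][0] > self.window:
            seen.popleft()                    # drop entries outside the window
        if any(rid == record_id for _, rid in seen):
            return True                       # re-reading a known record is free
        if len(seen) >= self.max_records:
            return False                      # would exceed the distinct-record cap
        seen.append((now, record_id))
        return True


if __name__ == "__main__":
    limiter = UniqueRecordLimiter(max_records=3, window_seconds=60)
    print([limiter.allow("acct-1", pid, now=0) for pid in (1, 2, 3, 4)])
```

The limiter keys on the account rather than the IP address or the request rate, so slow, distributed scraping still exhausts the budget after a fixed number of distinct records.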

API Security Testing:

[ ] Integrate OWASP ZAP API scanning into CI/CD pipeline — test for BOLA, BFLA, mass assignment
[ ] Conduct quarterly API penetration testing with OWASP API Security Top 10 focus
[ ] Implement API contract testing — verify authorization behavior in automated tests
[ ] Deploy runtime API security monitoring (Salt Security, 42Crunch, Traceable AI)
[ ] Establish bug bounty program with specific scope for API vulnerabilities

Lessons Learned

What Went Well

API gateway logs were complete and retained for 90 days, enabling full forensic reconstruction
Dark web monitoring by a third-party threat intel firm provided the initial breach notification
Emergency API patch to enforce object-level authorization was deployed within 5 days of discovery
PostgreSQL database audit logs captured all queries, enabling precise scoping of affected records

What Failed

No object-level authorization: The BOLA vulnerability — OWASP API #1 — was present in 14 of 47 endpoints. Authorization checked "is this token valid?" but never "is this user allowed to access this specific patient's records?"
GraphQL introspection enabled in production: The full API schema was exposed, giving the attacker a detailed map of all data types and relationships
Per-key rate limiting only: Rate limiting was applied per API key (100 req/min) but not per user or in aggregate. 12 keys at 40 req/min each = 480 req/min undetected
No API-specific WAF: Cloudflare was configured for web traffic but API endpoints had no bot detection, behavioral analysis, or anomaly detection
No data access monitoring: No alert existed for "single user accessing more than N unique patient records." A threshold of 50 records/user would have detected the attack within minutes
Sequential integer IDs: Predictable resource identifiers made enumeration trivial — UUIDs would have required the attacker to discover valid IDs through other means

Key Takeaways

BOLA/IDOR is the most common and impactful API vulnerability — every API endpoint that returns user-specific data must enforce object-level authorization
Rate limiting per API key is insufficient — aggregate monitoring across accounts and per-user data access volumes are essential for detecting distributed scraping
APIs are the new attack surface — traditional web security controls (WAF, DLP) often do not cover API endpoints; purpose-built API security tooling is required
GraphQL requires specific security controls — introspection, depth limiting, cost analysis, and field-level authorization are all necessary for production GraphQL APIs
Data breach costs dwarf prevention costs: the $26M–$41M breach impact is roughly 58 to 92 times the $450K remediation cost

Debrief Guide

Debrief: What Went Well

Complete API logging enabled forensic reconstruction of every malicious request
Third-party threat intelligence monitoring provided the breach discovery signal
HealthBridge legal team was well-prepared with HIPAA breach response procedures

Debrief: Key Learning Points

BOLA is trivial to exploit and devastating in impact — changing an integer in a URL should never grant access to another user's data
Distributed scraping defeats IP-based controls — user-level behavioral analytics are the correct detection layer
API responses ARE the exfiltration channel — traditional DLP cannot detect data leaving through the application's front door
HIPAA breach costs are catastrophic — $26M–$41M for a preventable vulnerability class
Sequential IDs are a gift to attackers — UUIDs add a meaningful barrier to enumeration

Recommended Follow-Up

[ ] Implement object-level authorization across all REST and GraphQL endpoints
[ ] Replace all sequential integer IDs with UUIDv4
[ ] Deploy API security gateway with behavioral analysis
[ ] Disable GraphQL introspection and implement query cost analysis
[ ] Implement per-user data access volume monitoring with threshold alerting
[ ] Add OWASP API Security Top 10 testing to CI/CD pipeline
[ ] Conduct HIPAA Security Rule gap assessment with focus on access controls
[ ] Establish API security review as a mandatory gate for all new endpoint deployments

Discussion Questions

The BOLA vulnerability was in production for an estimated 14 months before exploitation. Why is BOLA so common in modern APIs? What design patterns (policy-based authorization middleware, attribute-based access control) prevent BOLA at the architecture level?
The attacker used 2,400 residential proxies to distribute requests below rate limits. Traditional IP-based rate limiting fails against distributed attacks. What alternative rate limiting strategies (per-user record access limits, request fingerprinting, behavioral analysis) would be effective?
Data was exfiltrated via legitimate API responses — there was no "side channel." How does this challenge traditional DLP approaches? What API-specific data loss prevention controls exist?
The estimated breach cost is $26M–$41M, while the BOLA fix cost $450K. How do you make the business case for API security investment before a breach? What metrics and frameworks (FAIR, risk quantification) help justify security spending to the board?
HealthBridge is a HIPAA covered entity. How do the HIPAA Security Rule's requirements for access controls (45 CFR 164.312(a)) specifically apply to API authorization? Would compliance with HIPAA requirements have prevented this breach?
