SC-028: API Abuse Leading to Mass Data Exfiltration

Scenario Header

Type: API Security / Data Breach  |  Difficulty: ★★★★☆  |  Duration: 3–4 hours  |  Participants: 4–8

Threat Actor: DATA WRAITH — financially motivated threat group specializing in mass data theft via API abuse

Primary ATT&CK Techniques: T1190 · T1213 · T1567 · T1119


Threat Actor Profile

DATA WRAITH is a data-brokerage threat group active since 2024, specializing in exploiting insecure APIs to harvest large volumes of personally identifiable information (PII), protected health information (PHI), and financial records. The group does not deploy malware or establish persistent access — instead, they exploit business logic flaws and authorization vulnerabilities in APIs to extract data using entirely legitimate HTTP requests that blend with normal application traffic.

DATA WRAITH operates a marketplace on the dark web where stolen datasets are sold to identity theft rings, insurance fraud operations, and other criminal enterprises. Their preferred attack vector is Broken Object-Level Authorization (BOLA/IDOR) — the OWASP API Security Top 10 #1 vulnerability — which allows them to access resources belonging to other users by manipulating object identifiers in API requests.

The group demonstrates sophisticated understanding of API architectures, GraphQL introspection, rate limiting bypass techniques, and WAF evasion. They use distributed infrastructure (residential proxies and botnets) to distribute requests across thousands of IP addresses, making rate-based detection extremely difficult.

Motivation: Financial — data brokerage, PII/PHI sale on dark web markets.

Estimated Revenue: $3M–$6M annually from selling stolen datasets of 50M+ records across ~30 breached organizations.


Target Environment

Organization: HealthBridge Medical (fictional) — a digital health platform with 2.8 million registered patients, offering telemedicine, prescription management, and health records access through web and mobile applications.

Component | Detail
Platform | HealthBridge Patient Portal (web + iOS + Android)
API Architecture | REST API v2 + GraphQL API (newer features)
API Gateway | Kong Gateway at 10.50.1.10 — rate limiting: 100 req/min per API key
Backend | Node.js microservices at 10.50.2.0/24
Database | PostgreSQL cluster at 10.50.3.10 (primary), 10.50.3.11 (replica)
CDN/WAF | Cloudflare (web), no WAF on API endpoints
Authentication | OAuth 2.0 with JWT tokens — 1-hour expiry
Patient Records | 2.8M patients, ~14M health records, ~8M prescriptions
Compliance | HIPAA covered entity, SOC 2 Type II certified
External IPs | API endpoint: api.healthbridge.example.com via 203.0.113.200
Monitoring | Datadog APM, ELK stack for API logs, PagerDuty for alerts

Scenario Narrative

Phase 1 — API Reconnaissance & Vulnerability Discovery (~30 min)

DATA WRAITH begins by profiling HealthBridge's API surface. They create a legitimate patient account on the platform (testuser@example.com) and analyze the mobile application's API traffic using a proxy tool.

During reconnaissance, the attacker discovers several critical issues:

1. GraphQL Introspection Enabled in Production:

# GraphQL introspection query — should be disabled in production
{
  __schema {
    types {
      name
      fields {
        name
        type { name }
      }
    }
  }
}

The introspection response reveals the complete schema, including types like Patient, HealthRecord, Prescription, InsuranceClaim, and ProviderNote — with fields like ssn, dateOfBirth, diagnosis, medications, and insuranceId.
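
A gateway-side check can flag introspection attempts like the one above. The following is a minimal sketch, assuming request bodies are JSON with a `query` field; the function name `is_introspection_query` is hypothetical. A pattern match of this kind is a coarse alerting filter and does not replace disabling introspection in the GraphQL server itself.

```python
import json
import re

# Matches the introspection meta-fields __schema and __type as whole tokens.
# __typename is deliberately not matched because ordinary clients use it.
_INTROSPECTION = re.compile(r"\b__(schema|type)\b")


def is_introspection_query(raw_body: str) -> bool:
    """Return True if a GraphQL POST body appears to use introspection."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return False
    if not isinstance(payload, dict):
        return False
    query = payload.get("query")
    if not isinstance(query, str):
        return False
    return bool(_INTROSPECTION.search(query))
```

A check like this could feed an alert for the API Gateway Log entry listed under Evidence Artifacts.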

2. BOLA Vulnerability in REST API:

The REST API uses sequential integer IDs for patient records:

GET /api/v2/patients/1847392/records HTTP/1.1
Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...

The attacker's own patient ID is 1847392. By changing the ID to 1847391, 1847393, etc., they can access other patients' records — the API checks only that the JWT token is valid, not that the authenticated user is authorized to access the requested patient record.
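
The missing control is an object-level check that compares the requested ID with the identity in the verified token. Below is a minimal sketch, assuming the JWT has already been validated and carries the caller's patient ID; the `Caller` type and both function names are hypothetical, and a real system would also need rules for providers, caregivers, and staff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Caller:
    """Identity extracted from an already-verified JWT (field name assumed)."""
    patient_id: int


def can_read_patient_records(caller: Caller, requested_patient_id: int) -> bool:
    """Simplified policy: a patient may read only their own records."""
    return caller.patient_id == requested_patient_id


def handle_get_records(caller: Caller, requested_patient_id: int) -> int:
    """Return the HTTP status a route handler would send."""
    if not can_read_patient_records(caller, requested_patient_id):
        return 403  # valid token, but not authorized for this object
    return 200
```

With this check in place, the request for patient 1847391 made with the token for patient 1847392 would be rejected even though the token itself is valid.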

3. GraphQL Query Depth Not Limited:

The GraphQL API allows deeply nested queries that can extract entire patient profiles in a single request:

query {
  patient(id: 1847391) {
    firstName
    lastName
    dateOfBirth
    ssn
    email
    phone
    address { street city state zip }
    healthRecords {
      date
      diagnosis
      provider
      notes
    }
    prescriptions {
      medication
      dosage
      prescriber
      pharmacy
    }
    insuranceClaims {
      claimId
      amount
      status
      diagnosisCode
    }
  }
}
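
A depth limit rejects queries like this before they execute. The sketch below estimates depth by counting nested selection-set braces while skipping string literals; it is an illustration under simplifying assumptions (fragments are not expanded, and braces in input object literals are counted too). Depth-counting conventions differ between tools, so the numbers it returns may not match a given server's metric. The names `selection_depth`, `exceeds_depth_limit`, and `MAX_DEPTH` are hypothetical, and a production limiter would normally work on the parsed query AST.

```python
MAX_DEPTH = 5  # assumed limit, matching the remediation value used later


def selection_depth(query: str) -> int:
    """Return the maximum nesting of { } blocks, ignoring string literals."""
    depth = 0
    max_depth = 0
    in_string = False
    escaped = False
    for ch in query:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
    return max_depth


def exceeds_depth_limit(query: str, limit: int = MAX_DEPTH) -> bool:
    """True if the query nests deeper than the configured limit."""
    return selection_depth(query) > limit
```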

Evidence Artifacts:

Artifact | Detail
API Gateway Log | POST /graphql — Introspection query — User: testuser@example.com — 2026-03-08T14:22:00Z
API Gateway Log | GET /api/v2/patients/1847391/records — User: testuser@example.com (own ID: 1847392) — Response: 200 OK — 2026-03-08T14:35:00Z
API Gateway Log | GET /api/v2/patients/1847393/records — User: testuser@example.com — Response: 200 OK — 2026-03-08T14:35:02Z
GraphQL Server Log | Query depth: 4 levels — No depth limit enforced — 2026-03-08T14:40:00Z
WAF | No API-specific WAF rules — Cloudflare configured for web only

Phase 1 — Discussion Inject

Technical: GraphQL introspection was enabled in production, exposing the complete API schema. What is the security impact of introspection in production, and how do you disable it? What are the trade-offs between disabling introspection entirely vs. implementing schema-level access control?

Decision: Your security team discovers the BOLA vulnerability during this investigation. The engineering team argues that fixing it requires a major refactor of the authorization middleware — estimated at 3 weeks. Do you take the API offline (disrupting service for 2.8M patients) or implement a compensating control? What interim mitigations are available?

Expected Analyst Actions:

  • [ ] Test all API endpoints for BOLA/IDOR vulnerabilities — verify authorization checks
  • [ ] Verify GraphQL introspection is disabled in production
  • [ ] Review API gateway rate limiting configuration for adequacy
  • [ ] Check if API keys are validated against specific user permissions
  • [ ] Audit GraphQL query depth and complexity limits

Phase 2 — Automated Scraping Infrastructure (~30 min)

DATA WRAITH builds an automated scraping system designed to extract patient records while evading detection. The system uses:

  1. Distributed Proxy Network: 2,400 residential proxy IPs across 15 countries, rotated so that each account sends ~40 requests/minute (below the 100 req/min rate limit per API key)
  2. Multiple API Keys: 12 legitimate patient accounts created with disposable email addresses, each with its own OAuth token
  3. Request Throttling: Randomized delays between 800ms and 2,200ms to simulate human-like behavior
  4. User-Agent Rotation: 47 different mobile User-Agent strings matching the HealthBridge iOS and Android apps
  5. Mixed Request Pattern: Legitimate-looking requests (viewing own profile, browsing providers) interspersed with BOLA exploitation requests at a 3:1 ratio

The scraping architecture:

[Controller Node]
    |-- [Proxy Pool: 2,400 residential IPs]
    |   |-- IP-001: Account #1 -> 40 req/min -> patients 1-50000
    |   |-- IP-002: Account #2 -> 40 req/min -> patients 50001-100000
    |   |-- IP-003: Account #3 -> 40 req/min -> patients 100001-150000
    |   +-- ... (distributed across all 12 accounts)
    |-- [Data Aggregation: S3-compatible storage]
    +-- [Rate Monitor: auto-adjust if 429s detected]

The attacker alternates between the REST API (for bulk record retrieval) and GraphQL (for detailed patient profiles), using the GraphQL API for high-value records identified through the REST endpoint:

# Simplified scraping logic (educational — not weaponization)
import random
import time

import requests

# Placeholder values; the scenario rotates 47 mobile User-Agent strings
USER_AGENTS = ['HealthBridge-iOS/placeholder', 'HealthBridge-Android/placeholder']

def scrape_patient(patient_id, token, proxy):
    headers = {
        'Authorization': f'Bearer {token}',
        'User-Agent': random.choice(USER_AGENTS),
    }
    resp = requests.get(
        f'https://api.healthbridge.example.com/api/v2/patients/{patient_id}/records',
        headers=headers,
        proxies={'https': proxy},
        timeout=10,
    )
    # Randomized delay (800–2,200 ms) to mimic human pacing
    time.sleep(random.uniform(0.8, 2.2))
    return resp.json() if resp.status_code == 200 else None

# Used for high-value records after REST enumeration
query PatientFullProfile($id: Int!) {
  patient(id: $id) {
    firstName lastName dateOfBirth ssn email phone
    address { street city state zip }
    healthRecords(last: 50) {
      date diagnosis provider notes labResults { testName result }
    }
    prescriptions(active: true) {
      medication dosage refillsRemaining pharmacy { name address }
    }
    insuranceClaims(last: 20) {
      claimId amount status diagnosisCode serviceDate
    }
  }
}

Over 18 days, the scraping system processes 2,847,000 API requests and successfully extracts 2,031,847 patient records.

Evidence Artifacts:

Artifact | Detail
API Gateway Log | 2,847,000 total API requests over 18 days — 12 unique API keys — 2,400 unique source IPs
API Gateway Log | Average request rate per API key: 38.2 req/min (below 100 req/min limit)
API Response Analysis | 2,031,847 unique patient IDs accessed — 71.2% of total patient base
Database Query Log | 2,031,847 distinct patient_id values in SELECT queries — baseline: ~180,000/day
GraphQL Server Log | 412,000 GraphQL queries with depth > 3 — Avg response size: 14.2 KB
Cloudflare | No blocks — API traffic bypasses web WAF rules

Phase 2 — Discussion Inject

Technical: The attacker used 2,400 residential proxies to distribute requests below the per-API-key rate limit. What detection approaches work when individual source IPs stay below thresholds? How would user behavior analytics (UBA) — analyzing access patterns per authenticated user rather than per IP — detect the scraping?

Decision: 18 days of scraping went undetected because each individual account's request rate was below the threshold. How do you design API monitoring that detects aggregate abuse across multiple accounts? What metrics (total unique records accessed per user, sequential ID access patterns, unusual geographic distribution) would you track?

Expected Analyst Actions:

  • [ ] Analyze API access logs for sequential patient ID enumeration patterns
  • [ ] Identify accounts accessing records belonging to other patients (cross-user access)
  • [ ] Calculate the number of unique patient records accessed per API key — compare to baseline
  • [ ] Review database slow query logs for unusual bulk SELECT patterns
  • [ ] Check for GraphQL queries with excessive depth or complexity
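
The first and third actions above can be prototyped offline against exported gateway logs. The sketch below mirrors the heuristic used in the SIEM queries later in this scenario: flag a key when it touches many distinct patient IDs and those IDs are densely packed. The function name, the `(api_key, patient_id)` event shape, and the default thresholds are assumptions for illustration.

```python
from collections import defaultdict


def find_enumerating_keys(events, min_unique=100, max_ratio=2.0):
    """Return {api_key: unique_id_count} for keys that look like enumerators.

    events: iterable of (api_key, patient_id) pairs with integer IDs.
    A key is flagged when it accessed more than min_unique distinct IDs and
    (max_id - min_id) / unique_ids is below max_ratio, i.e. the IDs are
    close to consecutive.
    """
    ids_by_key = defaultdict(set)
    for api_key, patient_id in events:
        ids_by_key[api_key].add(patient_id)

    flagged = {}
    for api_key, ids in ids_by_key.items():
        unique = len(ids)
        if unique <= min_unique:
            continue
        ratio = (max(ids) - min(ids)) / unique
        if ratio < max_ratio:
            flagged[api_key] = unique
    return flagged
```

An attacker who randomizes the order of IDs within a range is still caught by this density check, but one who samples IDs sparsely is not, which is why the per-user unique-record count remains the more robust signal.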

Phase 3 — Data Exfiltration & Staging (~25 min)

The scraped data is collected, deduplicated, and organized on DATA WRAITH's infrastructure. The final dataset contains:

Data Category | Records | Fields
Patient Demographics | 2,031,847 | Name, DOB, SSN, email, phone, address
Health Records | 8,847,231 | Diagnosis, provider notes, lab results
Prescriptions | 4,231,098 | Medications, dosages, prescriber, pharmacy
Insurance Claims | 3,187,445 | Claim amounts, diagnosis codes, dates
Total Records | 18,297,621 |

The data is exfiltrated incrementally via legitimate API responses — each HTTP response is the exfiltration event itself, making traditional DLP detection impossible. The data never leaves through a "side channel"; it flows out through the front door as normal API responses.

DATA WRAITH prepares the dataset for sale on their dark web marketplace:

=== HEALTHBRIDGE MEDICAL — FULL PATIENT DATABASE ===
Records: 2,031,847 patients (71% of database)
Content: PII + PHI + Insurance + Prescriptions
Freshness: March 2026
Format: JSON / CSV / PostgreSQL dump
Verification: 500 sample records available

PRICING:
- Full dataset: $180,000 (BTC/XMR)
- Bulk PII only (name, SSN, DOB): $45,000
- PHI subset (diagnosis + prescriptions): $90,000
- Insurance claims subset: $60,000
- Per-record pricing: $0.12/record (min 10K)

The dark web listing appears on Day 22, four days after scraping completes. A threat intelligence firm discovers the listing and notifies HealthBridge on Day 25.

Evidence Artifacts:

Artifact | Detail
Dark Web Listing | "HealthBridge Medical Full Patient Database" — Marketplace: DarkVault (fictional) — Listed: 2026-03-26
Threat Intel Report | "HealthBridge patient data for sale" — Source: dark web monitoring — Reported to HealthBridge: 2026-03-29T09:15:00Z
API Gateway Log (retrospective) | Total data transferred via API responses: ~28.7 GB over 18 days — Average response: 10.1 KB
Database Audit (retrospective) | 2,031,847 distinct patient records accessed by 12 API keys — Normal per-user access: 1–50 records

Phase 3 — Discussion Inject

Technical: The data was exfiltrated via legitimate API responses — each HTTP response contained patient data that the API was designed to serve. How does this challenge traditional DLP approaches? What API-specific data loss prevention controls (response field masking, aggregate data access limits, output tokenization) would help?

Decision: You learn about the breach from a threat intelligence firm that found your patient data on the dark web — not from internal detection. This is a HIPAA breach involving 2M+ patient records with PHI. What are the HIPAA Breach Notification Rule requirements? Who must you notify, and on what timeline? What is the estimated financial impact (OCR penalties, patient notification costs, credit monitoring, litigation)?

Expected Analyst Actions:

  • [ ] Immediately disable all 12 API keys used in the scraping operation
  • [ ] Engage legal counsel and HIPAA breach response team
  • [ ] Determine exact scope: number of affected patients and types of data exposed
  • [ ] Preserve all API logs, database audit logs, and application logs
  • [ ] Begin drafting HIPAA breach notification to HHS OCR

Phase 4 — Incident Response & Regulatory Fallout (~25 min)

HealthBridge initiates incident response on Day 25. The investigation reveals the full scope over the following 10 days:

Investigation Timeline:

Day | Activity | Findings
25 | Threat intel notification received | Dark web listing confirmed as authentic with sample verification
25 | IR team activated, API keys suspended | 12 suspicious API keys identified and disabled
26 | API log analysis begins | Sequential patient ID access pattern identified across 12 accounts
27 | BOLA vulnerability confirmed | Security team reproduces the authorization bypass
28 | Full scope determined | 2,031,847 patients affected — 71.2% of database
29 | HIPAA breach assessment completed | PHI breach confirmed — notification required
30 | Emergency API patch deployed | Object-level authorization enforced on all endpoints
31 | GraphQL introspection disabled | Query depth limiting implemented (max depth: 5)
32 | Legal notification to HHS OCR | Breach report filed per HIPAA Breach Notification Rule
35 | Patient notification begins | Individual notification to 2,031,847 affected patients

Regulatory and Financial Impact:

Category | Estimated Cost
HIPAA penalty (Tier 3: willful neglect, corrected) | $250,000–$1,500,000
Patient notification (2M+ letters + call center) | $4,200,000
Credit monitoring (2 years x 2M patients) | $6,100,000
Forensic investigation and legal fees | $1,800,000
API security remediation | $450,000
Class action settlement (estimated) | $8,000,000–$15,000,000
Regulatory compliance reassessment | $350,000
Reputational damage (customer churn, reduced sign-ups) | $5,000,000–$12,000,000
Total Estimated Impact | $26,150,000–$41,400,000
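
The totals can be reproduced from the line items above, which is useful when facilitators adjust individual estimates for their own organization. This short sketch sums the low and high ends of each range and compares them with the remediation line; the dictionary keys are abbreviated labels, not new data.

```python
# (low, high) estimates in USD, copied from the table above
costs = {
    "HIPAA penalty": (250_000, 1_500_000),
    "Patient notification": (4_200_000, 4_200_000),
    "Credit monitoring": (6_100_000, 6_100_000),
    "Forensics and legal": (1_800_000, 1_800_000),
    "API security remediation": (450_000, 450_000),
    "Class action settlement": (8_000_000, 15_000_000),
    "Compliance reassessment": (350_000, 350_000),
    "Reputational damage": (5_000_000, 12_000_000),
}

total_low = sum(low for low, _ in costs.values())
total_high = sum(high for _, high in costs.values())
remediation = costs["API security remediation"][0]

print(total_low, total_high)  # 26150000 41400000
# Ratio of total breach impact to the remediation spend
print(round(total_low / remediation, 1), round(total_high / remediation, 1))  # 58.1 92.0
```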

HIPAA Breach Notification Requirements (as applied):

Requirement | Deadline | Action
HHS OCR notification | 60 days from discovery | Breach report filed Day 32
Individual patient notification | 60 days from discovery | Notification letters mailed Day 35
Media notification (>500 residents in a state) | 60 days from discovery | Press release Day 35
State attorney general notification | Varies by state | Filed in all 50 states Day 33–35
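
The 60-day outer limit in the table above can be turned into a concrete calendar date for the exercise. The sketch below assumes discovery is the date the threat intelligence report reached HealthBridge (2026-03-29, Day 25); the function names are hypothetical. Note that 60 calendar days is a ceiling: the rule also requires notification without unreasonable delay, so meeting the date alone does not demonstrate compliance.

```python
from datetime import date, timedelta

DISCOVERY = date(2026, 3, 29)       # Day 25: threat intel notification received
OUTER_LIMIT = timedelta(days=60)    # calendar days from discovery


def notification_deadline(discovery: date) -> date:
    """Latest permissible notification date under the 60-day outer limit."""
    return discovery + OUTER_LIMIT


def is_within_outer_limit(notified_on: date, discovery: date) -> bool:
    """True if a notification date falls on or before the 60-day limit."""
    return notified_on <= notification_deadline(discovery)


print(notification_deadline(DISCOVERY))  # 2026-05-28
```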

Evidence Artifacts:

Artifact | Detail
HHS OCR Breach Report | "Unauthorized access to PHI via API vulnerability" — 2,031,847 individuals — Filed: 2026-04-01
HIPAA Risk Assessment | PHI types exposed: demographics, diagnosis, prescriptions, insurance — Risk: HIGH
API Security Audit | BOLA vulnerability in 14 of 47 REST endpoints — GraphQL authorization bypass in 3 query types
Penetration Test (post-incident) | 23 additional API security findings: rate limiting bypass, mass assignment, excessive data exposure

Phase 4 — Discussion Inject

Technical: The post-incident penetration test found 23 additional API security findings. How do you build an API security testing program that catches BOLA and other OWASP API Top 10 vulnerabilities before deployment? What role do API security testing tools (Burp Suite, OWASP ZAP API scanning, 42Crunch) play in the CI/CD pipeline?

Decision: The estimated financial impact ranges from $26M to $41M. As HealthBridge's board of directors, what questions do you ask the CISO? Was this breach preventable? How do you evaluate whether the API security investment ($450K to remediate) should have been made proactively? What does the cost-benefit analysis look like?

Expected Analyst Actions:

  • [ ] Complete forensic analysis and document chain of custody for all evidence
  • [ ] Work with legal to prepare HIPAA breach notification content
  • [ ] Coordinate with PR team on public communication strategy
  • [ ] Implement API security testing in CI/CD pipeline to prevent regression
  • [ ] Conduct tabletop exercise with lessons learned for executive leadership

Indicators of Compromise (IOCs)

Synthetic IOCs — For Training Only

All indicators below are fictional and created for this exercise. Do not use in production detection systems.

IOC Type | Value | Context
IP Range | 203.0.113.0/24 (selected IPs) | Attacker infrastructure management
Residential Proxies | 2,400 IPs across 15 countries | Distributed scraping infrastructure
API Keys | 12 OAuth tokens from disposable accounts | Scraping authentication
Email Pattern | *@tempmail.example.com | Disposable accounts for API access
User-Agent | 47 HealthBridge mobile app User-Agents | Request camouflage
Access Pattern | Sequential patient ID enumeration | BOLA exploitation signature
GraphQL Query | Introspection: __schema { types { ... } } | Schema discovery
GraphQL Query | Patient full profile with depth > 3 | Deep data extraction
Dark Web Market | DarkVault (fictional) | Data sale platform
Data Volume | 28.7 GB over 18 days via API responses | Exfiltration volume

Detection Opportunities

Phase | Technique | ATT&CK | Detection Method | Difficulty
1 | Exploit public-facing application (BOLA) | T1190 | API testing: automated BOLA detection in CI/CD pipeline | Medium
1 | GraphQL introspection | T1190 | Disable introspection in production, alert on introspection queries | Easy
2 | Automated collection | T1119 | User behavior analytics: sequential ID access, cross-user data access | Medium
2 | Rate limit evasion | T1119 | Aggregate rate monitoring across all accounts, not just per-key | Hard
3 | Data from information repositories | T1213 | API response volume monitoring per user — alert on anomalous data access | Medium
3 | Exfiltration over web service | T1567 | Aggregate data access tracking — per-user record count thresholds | Medium
4 | Data breach (PHI) | N/A | Dark web monitoring for organization-specific data listings | Medium

SIEM Detection Queries

KQL (Azure Monitor / Microsoft Sentinel):

// Detect sequential patient ID enumeration
ApiManagementGatewayLogs
| where RequestUrl matches regex @"/patients/\d+/"
| extend PatientId = toint(extract(@"/patients/(\d+)/", 1, RequestUrl))
| summarize
    MinId = min(PatientId),
    MaxId = max(PatientId),
    UniqueIds = dcount(PatientId),
    RequestCount = count()
    by ApiKey = tostring(parse_json(RequestHeaders)["Authorization"]), bin(TimeGenerated, 1h)
| where UniqueIds > 100
| where (MaxId - MinId) / UniqueIds < 2  // Sequential pattern detection
| sort by UniqueIds desc

// Detect cross-user patient record access (BOLA indicator)
ApiManagementGatewayLogs
| where RequestUrl has "/patients/"
| extend PatientId = extract(@"/patients/(\d+)/", 1, RequestUrl)
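// Assumes the logged Authorization value exposes the subject as sub:<id>;
// a raw Bearer JWT is base64url-encoded and must be decoded before this match works
| extend AuthUser = extract(@"sub:(\w+)", 1, tostring(parse_json(RequestHeaders)["Authorization"]))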
| where PatientId != AuthUser  // Accessing records of other patients
| summarize CrossAccessCount = count(), UniquePatients = dcount(PatientId) by AuthUser, bin(TimeGenerated, 1h)
| where CrossAccessCount > 10

// Detect GraphQL introspection queries
ApiManagementGatewayLogs
| where RequestUrl has "/graphql"
| where RequestBody has "__schema" or RequestBody has "__type"
| project TimeGenerated, CallerIpAddress, RequestBody

// Detect excessive API data volume per user
ApiManagementGatewayLogs
| summarize TotalResponseBytes = sum(ResponseSize), RequestCount = count() by ApiKey = tostring(parse_json(RequestHeaders)["Authorization"]), bin(TimeGenerated, 1d)
| where TotalResponseBytes > 100000000  // >100MB per day
| sort by TotalResponseBytes desc

SPL (Splunk):

// Detect sequential patient ID enumeration
index=api sourcetype=kong:access uri="/api/v2/patients/*/records"
| rex field=uri "/patients/(?<patient_id>\d+)/"
| bin _time span=1h
| stats min(patient_id) as min_id, max(patient_id) as max_id,
        dc(patient_id) as unique_ids, count as req_count by api_key, _time
| where unique_ids > 100
| eval sequential_ratio = (max_id - min_id) / unique_ids
| where sequential_ratio < 2
| sort -unique_ids

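// Detect cross-user patient access (BOLA exploitation)
// Note: jwt_decode is a placeholder, not a built-in SPL eval function; substitute a custom command or a claim field logged by the gateway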
index=api sourcetype=kong:access uri="/api/v2/patients/*/records" status=200
| rex field=uri "/patients/(?<accessed_id>\d+)/"
| eval own_id = jwt_decode(api_key, "patient_id")
| where accessed_id != own_id
| stats dc(accessed_id) as unique_patients, count as access_count by api_key
| where unique_patients > 10
| sort -unique_patients

// Detect GraphQL introspection
index=api sourcetype=graphql method=POST
(query="*__schema*" OR query="*__type*")
| table _time, src_ip, user, query

// Detect accounts with anomalous data access volumes
index=api sourcetype=kong:access status=200
| stats sum(response_size) as total_bytes, dc(uri) as unique_endpoints,
        count as total_requests by api_key
| eval total_mb = round(total_bytes/1048576, 2)
| where total_mb > 100
| sort -total_mb
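
The cross-user queries above depend on comparing the accessed patient ID with a claim inside the bearer token, which is base64url-encoded in raw logs. The sketch below shows one way to recover that claim during offline log analysis. It assumes a `patient_id` claim (the name used in the SPL query) and deliberately skips signature verification, so it is suitable for triage of already-logged tokens and not for making authorization decisions.

```python
import base64
import json


def jwt_claims_unverified(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying the signature (log analysis only)."""
    try:
        payload_b64 = token.split(".")[1]
        # JWTs strip base64 padding; restore it before decoding
        padded = payload_b64 + "=" * (-len(payload_b64) % 4)
        claims = json.loads(base64.urlsafe_b64decode(padded))
    except (IndexError, ValueError):
        return {}
    return claims if isinstance(claims, dict) else {}


def is_cross_user_access(token: str, accessed_patient_id, claim: str = "patient_id") -> bool:
    """True if the token's own patient ID differs from the ID in the URL."""
    own_id = jwt_claims_unverified(token).get(claim)
    return own_id is not None and str(own_id) != str(accessed_patient_id)
```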

ATT&CK Mapping

Tactic | Technique | ID | Scenario Application
Initial Access | Exploit Public-Facing Application | T1190 | BOLA vulnerability in REST API, GraphQL introspection abuse
Reconnaissance | Gather Victim Host Information: Client Configurations | T1592.004 | GraphQL introspection to discover schema and data types
Collection | Automated Collection | T1119 | Automated scraping of 2M+ patient records via API
Collection | Data from Information Repositories | T1213 | Accessing patient health records, prescriptions, insurance claims
Exfiltration | Exfiltration Over Web Service | T1567 | Data exfiltrated via legitimate API responses over HTTPS
Defense Evasion | Valid Accounts | T1078 | 12 legitimate accounts used to distribute requests
Defense Evasion | Traffic Signaling | T1205 | Residential proxies and User-Agent rotation to evade detection
Impact | Data Breach (PHI) | N/A | 2,031,847 patient records exposed — HIPAA breach

Response Actions

Immediate Response (0–4 hours)

  • [ ] Contain: Revoke all 12 identified API keys used in scraping
  • [ ] Contain: Implement emergency IP blocking for known attacker infrastructure
  • [ ] Contain: Deploy temporary fix: enforce object-level authorization on all patient endpoints
  • [ ] Contain: Disable GraphQL introspection in production immediately
  • [ ] Detect: Deploy monitoring for sequential ID access patterns on all API endpoints
  • [ ] Legal: Engage breach response counsel — initiate HIPAA breach assessment

Short-Term Response (1–7 days)

  • [ ] Investigate: Full API log analysis — identify all affected patient records
  • [ ] Investigate: Determine exact data types exposed per patient (PII, PHI, insurance)
  • [ ] Remediate: Implement object-level authorization across all 47 REST endpoints
  • [ ] Remediate: Deploy GraphQL query depth limiting (max depth: 5) and cost analysis
  • [ ] Remediate: Implement per-user aggregate rate limiting (not just per-key)
  • [ ] Notify: File HIPAA breach report with HHS OCR
  • [ ] Notify: Prepare individual patient notification letters
  • [ ] Monitor: Deploy dark web monitoring for HealthBridge data listings

Long-Term Remediation (1–8 weeks)

  • [ ] Harden: Implement API security gateway with BOLA detection (42Crunch, Salt Security)
  • [ ] Harden: Replace sequential integer IDs with UUIDs across all API endpoints
  • [ ] Harden: Deploy API-specific WAF rules (bot detection, behavioral analysis)
  • [ ] Harden: Implement field-level authorization — mask sensitive fields (SSN, full DOB) in API responses
  • [ ] Harden: Add API security testing to CI/CD pipeline (OWASP ZAP API scan, contract testing)
  • [ ] Harden: Implement response data tokenization for sensitive fields
  • [ ] Harden: Deploy user behavior analytics for API access patterns
  • [ ] Comply: Complete HIPAA corrective action plan per OCR findings
  • [ ] Comply: Conduct third-party security assessment and penetration test
  • [ ] Train: API security training for all developers — OWASP API Security Top 10

Remediation Playbook

API Security Controls

Authorization Hardening:

  • [ ] Implement object-level authorization (OLA) on every endpoint that returns user-specific data
  • [ ] Use policy-based authorization middleware — centralize authorization logic, do not embed in individual route handlers
  • [ ] Adopt attribute-based access control (ABAC) for complex data relationships (patient-provider-facility)
  • [ ] Replace sequential integer IDs with UUIDv4 across all API resources — eliminate predictable enumeration
  • [ ] Implement field-level authorization — redact or mask sensitive fields based on the caller's role and relationship to the data subject
  • [ ] Enforce authorization checks at the data layer (database row-level security) as a defense-in-depth measure
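
Field-level masking from the list above can be illustrated with a small response filter. This is a sketch under stated assumptions: the field names `ssn` and `dateOfBirth` follow the GraphQL schema shown earlier, while `mask_patient`, the `can_view_full` flag, and the masking formats are hypothetical choices. In practice the flag would come from a policy decision about the caller's role and relationship to the patient.

```python
# Masking rules per sensitive field (formats are illustrative)
SENSITIVE_MASKS = {
    "ssn": lambda value: "***-**-" + str(value)[-4:],
    "dateOfBirth": lambda value: str(value)[:4] + "-**-**",
}


def mask_patient(record: dict, can_view_full: bool) -> dict:
    """Return a copy of the record with sensitive fields masked if required."""
    if can_view_full:
        return dict(record)
    masked = dict(record)  # shallow copy so the source record is untouched
    for field, mask in SENSITIVE_MASKS.items():
        if masked.get(field) is not None:
            masked[field] = mask(masked[field])
    return masked
```

Masking by default and unmasking by explicit policy limits what a missed authorization check can expose.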

GraphQL-Specific Controls:

  • [ ] Disable introspection in production — expose schema documentation through a separate developer portal
  • [ ] Implement query depth limiting (maximum depth: 5–7 depending on use case)
  • [ ] Deploy query cost analysis — assign cost weights to fields and reject queries exceeding budget
  • [ ] Implement persisted queries — only allow pre-registered query shapes in production
  • [ ] Apply field-level authorization in GraphQL resolvers — not just at the query level
  • [ ] Rate limit by query complexity, not just request count
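
Persisted queries, listed above, can be sketched as a hash allowlist: clients send the SHA-256 of a pre-registered query, and anything else is rejected. The class and function names below are hypothetical, and the whitespace normalization is a simplification that would also alter whitespace inside string literals; real deployments typically hash the exact registered document.

```python
import hashlib


def _normalize(query: str) -> str:
    """Collapse whitespace so formatting differences do not change the hash."""
    return " ".join(query.split())


def query_hash(query: str) -> str:
    """SHA-256 hex digest of the normalized query text."""
    return hashlib.sha256(_normalize(query).encode("utf-8")).hexdigest()


class PersistedQueryRegistry:
    """Allowlist of query shapes approved at build or deploy time."""

    def __init__(self, approved_queries):
        self._approved = {query_hash(q): _normalize(q) for q in approved_queries}

    def is_allowed(self, query: str) -> bool:
        return query_hash(query) in self._approved

    def resolve(self, sha256_hex: str):
        """Return the registered query for a hash, or None if unknown."""
        return self._approved.get(sha256_hex)
```

Under this model an ad hoc query such as the full-profile extraction used in the scenario would be refused unless someone had registered that exact shape.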

Rate Limiting & Abuse Detection:

  • [ ] Implement per-user aggregate rate limiting — track total records accessed per user per day
  • [ ] Deploy sliding window rate limiting with multiple dimensions (per-IP, per-user, per-endpoint, per-resource)
  • [ ] Set data access thresholds — alert when any user accesses more than N unique records (e.g., 50 patients)
  • [ ] Implement request fingerprinting — detect distributed scraping across multiple accounts with similar patterns
  • [ ] Deploy bot detection using behavioral analysis (request timing variance, mouse/touch events, session patterns)
  • [ ] Monitor for sequential ID enumeration patterns in API access logs
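
The data access threshold above can be prototyped as a sliding-window counter of distinct records per user. The sketch below is in-memory and single-process for clarity; the class name, the default threshold of 50, and the 24-hour window are assumptions, and a production version would keep this state in a shared store.

```python
from collections import defaultdict, deque


class RecordAccessTracker:
    """Tracks distinct records accessed per user within a sliding time window."""

    def __init__(self, threshold: int = 50, window_seconds: float = 86_400):
        self.threshold = threshold
        self.window = window_seconds
        self._events = defaultdict(deque)  # user_id -> deque of (timestamp, record_id)

    def record(self, user_id: str, record_id: int, now: float) -> bool:
        """Log one access; return True if the user now exceeds the threshold."""
        events = self._events[user_id]
        events.append((now, record_id))
        # Drop accesses that have aged out of the window
        while events and events[0][0] <= now - self.window:
            events.popleft()
        unique_records = len({rid for _, rid in events})
        return unique_records > self.threshold
```

Because the counter is keyed on the authenticated user and not on the source IP, rotating through residential proxies does not reset it.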

API Security Testing:

  • [ ] Integrate OWASP ZAP API scanning into CI/CD pipeline — test for BOLA, BFLA, mass assignment
  • [ ] Conduct quarterly API penetration testing with OWASP API Security Top 10 focus
  • [ ] Implement API contract testing — verify authorization behavior in automated tests
  • [ ] Deploy runtime API security monitoring (Salt Security, 42Crunch, Traceable AI)
  • [ ] Establish bug bounty program with specific scope for API vulnerabilities

Lessons Learned

What Went Well

  • API gateway logs were complete and retained for 90 days, enabling full forensic reconstruction
  • Dark web monitoring by a third-party threat intel firm provided the initial breach notification
  • Emergency API patch to enforce object-level authorization was deployed within 5 days of discovery
  • PostgreSQL database audit logs captured all queries, enabling precise scoping of affected records

What Failed

  • No object-level authorization: The BOLA vulnerability — OWASP API #1 — was present in 14 of 47 endpoints. Authorization checked "is this token valid?" but never "is this user allowed to access this specific patient's records?"
  • GraphQL introspection enabled in production: The full API schema was exposed, giving the attacker a detailed map of all data types and relationships
  • Per-key rate limiting only: Rate limiting was applied per API key (100 req/min) but not per user or in aggregate. 12 keys at 40 req/min each = 480 req/min undetected
  • No API-specific WAF: Cloudflare was configured for web traffic but API endpoints had no bot detection, behavioral analysis, or anomaly detection
  • No data access monitoring: No alert existed for "single user accessing more than N unique patient records." A threshold of 50 records/user would have detected the attack within minutes
  • Sequential integer IDs: Predictable resource identifiers made enumeration trivial — UUIDs would have required the attacker to discover valid IDs through other means

Key Takeaways

  1. BOLA/IDOR is the most common and impactful API vulnerability — every API endpoint that returns user-specific data must enforce object-level authorization
  2. Rate limiting per API key is insufficient — aggregate monitoring across accounts and per-user data access volumes are essential for detecting distributed scraping
  3. APIs are the new attack surface — traditional web security controls (WAF, DLP) often do not cover API endpoints; purpose-built API security tooling is required
  4. GraphQL requires specific security controls — introspection, depth limiting, cost analysis, and field-level authorization are all necessary for production GraphQL APIs
  5. Data breach costs dwarf prevention costs — the $26M–$41M breach impact versus $450K remediation cost represents a 60–90x return on proactive security investment

Debrief Guide

Debrief: What Went Well

  • Complete API logging enabled forensic reconstruction of every malicious request
  • Third-party threat intelligence monitoring provided the breach discovery signal
  • HealthBridge legal team was well-prepared with HIPAA breach response procedures

Debrief: Key Learning Points

  • BOLA is trivial to exploit and devastating in impact — changing an integer in a URL should never grant access to another user's data
  • Distributed scraping defeats IP-based controls — user-level behavioral analytics are the correct detection layer
  • API responses ARE the exfiltration channel — traditional DLP cannot detect data leaving through the application's front door
  • HIPAA breach costs are catastrophic — $26M–$41M for a preventable vulnerability class
  • Sequential IDs are a gift to attackers — UUIDs add a meaningful barrier to enumeration

Debrief: Recommended Actions

  • [ ] Implement object-level authorization across all REST and GraphQL endpoints
  • [ ] Replace all sequential integer IDs with UUIDv4
  • [ ] Deploy API security gateway with behavioral analysis
  • [ ] Disable GraphQL introspection and implement query cost analysis
  • [ ] Implement per-user data access volume monitoring with threshold alerting
  • [ ] Add OWASP API Security Top 10 testing to CI/CD pipeline
  • [ ] Conduct HIPAA Security Rule gap assessment with focus on access controls
  • [ ] Establish API security review as a mandatory gate for all new endpoint deployments

Discussion Questions

  1. The BOLA vulnerability was in production for an estimated 14 months before exploitation. Why is BOLA so common in modern APIs? What design patterns (policy-based authorization middleware, attribute-based access control) prevent BOLA at the architecture level?
  2. The attacker used 2,400 residential proxies to distribute requests below rate limits. Traditional IP-based rate limiting fails against distributed attacks. What alternative rate limiting strategies (per-user record access limits, request fingerprinting, behavioral analysis) would be effective?
  3. Data was exfiltrated via legitimate API responses — there was no "side channel." How does this challenge traditional DLP approaches? What API-specific data loss prevention controls exist?
  4. The estimated breach cost is $26M–$41M, while the BOLA fix cost $450K. How do you make the business case for API security investment before a breach? What metrics and frameworks (FAIR, risk quantification) help justify security spending to the board?
  5. HealthBridge is a HIPAA covered entity. How do the HIPAA Security Rule's requirements for access controls (45 CFR 164.312(a)) specifically apply to API authorization? Would compliance with HIPAA requirements have prevented this breach?

References