SC-028: API Abuse Leading to Mass Data Exfiltration
Scenario Header
Type: API Security / Data Breach | Difficulty: ★★★★☆ | Duration: 3–4 hours | Participants: 4–8
Threat Actor: DATA WRAITH — financially motivated threat group specializing in mass data theft via API abuse
Primary ATT&CK Techniques: T1190 · T1213 · T1567 · T1119
Threat Actor Profile
DATA WRAITH is a data-brokerage threat group active since 2024, specializing in exploiting insecure APIs to harvest large volumes of personally identifiable information (PII), protected health information (PHI), and financial records. The group does not deploy malware or establish persistent access — instead, they exploit business logic flaws and authorization vulnerabilities in APIs to extract data using entirely legitimate HTTP requests that blend with normal application traffic.
DATA WRAITH operates a marketplace on the dark web where stolen datasets are sold to identity theft rings, insurance fraud operations, and other criminal enterprises. Their preferred attack vector is Broken Object-Level Authorization (BOLA/IDOR) — the OWASP API Security Top 10 #1 vulnerability — which allows them to access resources belonging to other users by manipulating object identifiers in API requests.
The group has a sophisticated understanding of API architectures, GraphQL introspection, rate-limit bypass techniques, and WAF evasion. It spreads its requests across thousands of IP addresses using residential proxies and botnets, which makes rate-based detection extremely difficult.
Motivation: Financial — data brokerage, PII/PHI sale on dark web markets.
Estimated Revenue: $3M–$6M annually from selling stolen datasets of 50M+ records across ~30 breached organizations.
Target Environment
Organization: HealthBridge Medical (fictional) — a digital health platform with 2.8 million registered patients, offering telemedicine, prescription management, and health records access through web and mobile applications.
| Component | Detail |
|---|---|
| Platform | HealthBridge Patient Portal (web + iOS + Android) |
| API Architecture | REST API v2 + GraphQL API (newer features) |
| API Gateway | Kong Gateway at 10.50.1.10 — rate limiting: 100 req/min per API key |
| Backend | Node.js microservices at 10.50.2.0/24 |
| Database | PostgreSQL cluster at 10.50.3.10 (primary), 10.50.3.11 (replica) |
| CDN/WAF | Cloudflare (web), no WAF on API endpoints |
| Authentication | OAuth 2.0 with JWT tokens — 1-hour expiry |
| Patient Records | 2.8M patients, ~14M health records, ~8M prescriptions |
| Compliance | HIPAA covered entity, SOC 2 Type II certified |
| External IPs | API endpoint: api.healthbridge.example.com via 203.0.113.200 |
| Monitoring | Datadog APM, ELK stack for API logs, PagerDuty for alerts |
Scenario Narrative
Phase 1 — API Reconnaissance & Vulnerability Discovery (~30 min)
DATA WRAITH begins by profiling HealthBridge's API surface. They create a legitimate patient account on the platform (testuser@example.com) and analyze the mobile application's API traffic using a proxy tool.
During reconnaissance, the attacker discovers several critical issues:
1. GraphQL Introspection Enabled in Production:
```graphql
# GraphQL introspection query: should be disabled in production
{
  __schema {
    types {
      name
      fields {
        name
        type { name }
      }
    }
  }
}
```
The introspection response reveals the complete schema, including types like Patient, HealthRecord, Prescription, InsuranceClaim, and ProviderNote — with fields like ssn, dateOfBirth, diagnosis, medications, and insuranceId.
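Most GraphQL server frameworks provide a way to disable introspection, and that setting is the primary fix. As a defense-in-depth sketch, a gateway filter can also flag requests that select the `__schema` or `__type` meta-fields. The helper name `is_introspection_query` is hypothetical, and the regex approach is only an approximation that assumes the raw GraphQL document is available as a string; a real GraphQL parser is more robust.

```python
import re

# Reserved introspection meta-fields. "__typename" is deliberately excluded:
# ordinary clients request it, and the trailing \b stops "__type" matching it.
_INTROSPECTION_FIELD = re.compile(r"\b__(?:schema|type)\b")

# Block strings, ordinary strings, and "#" comments are blanked out so that
# text such as "__schema" appearing inside them does not trigger a match.
_STRINGS_AND_COMMENTS = re.compile(r'"""[\s\S]*?"""|"(?:\\.|[^"\\\n])*"|#[^\n]*')


def is_introspection_query(document: str) -> bool:
    """Return True if a GraphQL document selects __schema or __type."""
    code_only = _STRINGS_AND_COMMENTS.sub(" ", document)
    return _INTROSPECTION_FIELD.search(code_only) is not None


if __name__ == "__main__":
    print(is_introspection_query("{ __schema { types { name } } }"))   # True
    print(is_introspection_query("{ patient(id: 1) { __typename } }"))  # False
```

A match can be logged for the SOC and answered with an error instead of being forwarded to the GraphQL server.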
2. BOLA Vulnerability in REST API:
The REST API uses sequential integer IDs for patient records:
```http
GET /api/v2/patients/1847392/records HTTP/1.1
Host: api.healthbridge.example.com
Authorization: Bearer eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...
```
The attacker's own patient ID is 1847392. By changing the ID to 1847391, 1847393, etc., they can access other patients' records — the API checks only that the JWT token is valid, not that the authenticated user is authorized to access the requested patient record.
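A minimal sketch of the flaw and its fix, using hypothetical in-memory stand-ins: `RECORDS` plays the role of the records table and `claims` the decoded, already-validated JWT, assumed here to carry a `patient_id` claim. The vulnerable handler trusts the ID in the URL; the fixed one also checks that the requested object belongs to the caller.

```python
# Hypothetical stand-in for the patient records table.
RECORDS: dict[int, list[str]] = {
    1847391: ["record belonging to patient 1847391"],
    1847392: ["record belonging to patient 1847392"],
}


class Forbidden(Exception):
    """Maps to an HTTP 403 (or 404) response."""


def get_records_vulnerable(claims: dict, patient_id: int) -> list[str]:
    # The JWT was validated upstream, but nothing ties the ID in the URL
    # to the caller, so any authenticated user can read any patient.
    return RECORDS[patient_id]


def get_records_fixed(claims: dict, patient_id: int) -> list[str]:
    # Object-level check: the requested patient must be the caller.
    # "patient_id" is an assumed claim name in the validated token.
    if claims.get("patient_id") != patient_id:
        raise Forbidden("caller is not authorized for this patient record")
    return RECORDS[patient_id]
```

Because the check runs before the lookup, unknown IDs are refused the same way as other patients' IDs, which avoids revealing which IDs exist. Provider accounts would need a relationship check (for example, a care-team lookup) rather than simple equality.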
3. GraphQL Query Depth Not Limited:
The GraphQL API allows deeply nested queries that can extract entire patient profiles in a single request:
```graphql
query {
  patient(id: 1847391) {
    firstName
    lastName
    dateOfBirth
    ssn
    email
    phone
    address { street city state zip }
    healthRecords {
      date
      diagnosis
      provider
      notes
    }
    prescriptions {
      medication
      dosage
      prescriber
      pharmacy
    }
    insuranceClaims {
      claimId
      amount
      status
      diagnosisCode
    }
  }
}
```
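Depth limiting can be approximated by measuring how deeply selection-set braces nest. The sketch below is a hypothetical, stdlib-only helper, not a substitute for the depth-limit support most GraphQL servers offer: it skips string literals and `#` comments but also counts braces from object-valued arguments, so it can err high. Conventions differ on whether the operation's own braces count (this version counts them), so any threshold, such as the maximum depth of 5 adopted later in the incident, should be calibrated against the counter actually in use. For the query above it returns 3.

```python
MAX_DEPTH = 5  # the limit HealthBridge adopts after the incident


def selection_depth(document: str) -> int:
    """Approximate nesting depth of "{ }" blocks in a GraphQL document.

    Skips "#" comments and string literals. Block strings and
    object-valued arguments are not special-cased, so results can err high.
    """
    depth = max_depth = 0
    i, n = 0, len(document)
    while i < n:
        ch = document[i]
        if ch == "#":  # comment: skip to the end of the line
            while i < n and document[i] != "\n":
                i += 1
            continue
        if ch == '"':  # string literal: skip to the closing quote
            i += 1
            while i < n and document[i] != '"':
                i += 2 if document[i] == "\\" else 1
        elif ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth -= 1
        i += 1
    return max_depth


def enforce_depth_limit(document: str, limit: int = MAX_DEPTH) -> int:
    """Return the depth, or raise ValueError if it exceeds the limit."""
    depth = selection_depth(document)
    if depth > limit:
        raise ValueError(f"query depth {depth} exceeds limit {limit}")
    return depth
```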
Evidence Artifacts:
| Artifact | Detail |
|---|---|
| API Gateway Log | POST /graphql — Introspection query — User: testuser@example.com — 2026-03-08T14:22:00Z |
| API Gateway Log | GET /api/v2/patients/1847391/records — User: testuser@example.com (own ID: 1847392) — Response: 200 OK — 2026-03-08T14:35:00Z |
| API Gateway Log | GET /api/v2/patients/1847393/records — User: testuser@example.com — Response: 200 OK — 2026-03-08T14:35:02Z |
| GraphQL Server Log | Query depth: 4 levels — No depth limit enforced — 2026-03-08T14:40:00Z |
| WAF | No API-specific WAF rules — Cloudflare configured for web only |
Phase 1 — Discussion Inject
Technical: GraphQL introspection was enabled in production, exposing the complete API schema. What is the security impact of introspection in production, and how do you disable it? What are the trade-offs between disabling introspection entirely vs. implementing schema-level access control?
Decision: Your security team discovers the BOLA vulnerability during this investigation. The engineering team argues that fixing it requires a major refactor of the authorization middleware — estimated at 3 weeks. Do you take the API offline (disrupting service for 2.8M patients) or implement a compensating control? What interim mitigations are available?
Expected Analyst Actions:
- [ ] Test all API endpoints for BOLA/IDOR vulnerabilities — verify authorization checks
- [ ] Verify GraphQL introspection is disabled in production
- [ ] Review API gateway rate limiting configuration for adequacy
- [ ] Check if API keys are validated against specific user permissions
- [ ] Audit GraphQL query depth and complexity limits
Phase 2 — Automated Scraping Infrastructure (~30 min)
DATA WRAITH builds an automated scraping system designed to extract patient records while evading detection. The system uses:
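- Distributed Proxy Network: 2,400 residential proxy IPs across 15 countries. Each account's requests are spread across the pool, so every API key averages ~40 requests/minute (below the 100 req/min per-key rate limit) and each source IP carries only a small share of the traffic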
- Multiple API Keys: 12 legitimate patient accounts created with disposable email addresses, each with its own OAuth token
- Request Throttling: Randomized delays between 800ms and 2,200ms to simulate human-like behavior
- User-Agent Rotation: 47 different mobile User-Agent strings matching the HealthBridge iOS and Android apps
- Mixed Request Pattern: Legitimate-looking requests (viewing own profile, browsing providers) interspersed with BOLA exploitation requests, roughly one legitimate-looking request for every three BOLA requests
The scraping architecture:
```text
[Controller Node]
|-- [Proxy Pool: 2,400 residential IPs]
|   |-- Account #1 (via pool) -> ~40 req/min -> patients 1-50000
|   |-- Account #2 (via pool) -> ~40 req/min -> patients 50001-100000
|   |-- Account #3 (via pool) -> ~40 req/min -> patients 100001-150000
|   +-- ... (distributed across all 12 accounts)
|-- [Data Aggregation: S3-compatible storage]
+-- [Rate Monitor: auto-adjust if 429s detected]
```
The attacker alternates between the REST API (for bulk record retrieval) and GraphQL (for detailed patient profiles), using the GraphQL API for high-value records identified through the REST endpoint:
```python
# Simplified scraping logic (educational, not weaponization)
# USER_AGENTS: the 47 HealthBridge mobile User-Agent strings (defined elsewhere)
import random
import time

import requests


def scrape_patient(patient_id, token, proxy):
    headers = {
        'Authorization': f'Bearer {token}',
        'User-Agent': random.choice(USER_AGENTS),
    }
    resp = requests.get(
        f'https://api.healthbridge.example.com/api/v2/patients/{patient_id}/records',
        headers=headers,
        proxies={'https': proxy},
    )
    time.sleep(random.uniform(0.8, 2.2))
    return resp.json() if resp.status_code == 200 else None
```

```graphql
# Used for high-value records after REST enumeration
query PatientFullProfile($id: Int!) {
  patient(id: $id) {
    firstName lastName dateOfBirth ssn email phone
    address { street city state zip }
    healthRecords(last: 50) {
      date diagnosis provider notes labResults { testName result }
    }
    prescriptions(active: true) {
      medication dosage refillsRemaining pharmacy { name address }
    }
    insuranceClaims(last: 20) {
      claimId amount status diagnosisCode serviceDate
    }
  }
}
```
Over 18 days, the scraping system processes 2,847,000 API requests and successfully extracts 2,031,847 patient records.
Evidence Artifacts:
| Artifact | Detail |
|---|---|
| API Gateway Log | 2,847,000 total API requests over 18 days — 12 unique API keys — 2,400 unique source IPs |
| API Gateway Log | Average request rate per API key: 38.2 req/min (below 100 req/min limit) |
| API Response Analysis | 2,031,847 unique patient IDs accessed — 71.2% of total patient base |
| Database Query Log | 2,031,847 distinct patient_id values in SELECT queries — baseline: ~180,000/day |
| GraphQL Server Log | 412,000 GraphQL queries with depth > 3 — Avg response size: 14.2 KB |
| Cloudflare | No blocks — API traffic bypasses web WAF rules |
Phase 2 — Discussion Inject
Technical: The attacker used 2,400 residential proxies to distribute requests below the per-API-key rate limit. What detection approaches work when individual source IPs stay below thresholds? How would user behavior analytics (UBA) — analyzing access patterns per authenticated user rather than per IP — detect the scraping?
Decision: 18 days of scraping went undetected because each individual account's request rate was below the threshold. How do you design API monitoring that detects aggregate abuse across multiple accounts? What metrics (total unique records accessed per user, sequential ID access patterns, unusual geographic distribution) would you track?
Expected Analyst Actions:
- [ ] Analyze API access logs for sequential patient ID enumeration patterns
- [ ] Identify accounts accessing records belonging to other patients (cross-user access)
- [ ] Calculate the number of unique patient records accessed per API key — compare to baseline
- [ ] Review database slow query logs for unusual bulk SELECT patterns
- [ ] Check for GraphQL queries with excessive depth or complexity
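The first three analyst actions can be combined into one offline check over parsed gateway logs, keyed by authenticated user rather than by source IP so that proxy rotation does not dilute it. This is a sketch under assumptions: the input is a hypothetical list of `(user, own_patient_id, accessed_patient_id)` tuples, and the default thresholds mirror the SIEM queries later in this scenario (more than 100 foreign IDs, span ratio below 2).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Finding:
    user: str
    foreign_ids: int   # distinct patient IDs other than the caller's own
    span_ratio: float  # (max_id - min_id) / foreign_ids; about 1.0 = contiguous block


def find_enumerators(events, min_foreign=100, max_ratio=2.0):
    """Flag users who read many other patients' records from a dense ID range.

    events: iterable of (user, own_patient_id, accessed_patient_id) tuples,
    e.g. one hour of parsed gateway logs (hypothetical input format).
    """
    foreign = defaultdict(set)
    for user, own_id, accessed_id in events:
        if accessed_id != own_id:
            foreign[user].add(accessed_id)

    findings = []
    for user, ids in foreign.items():
        if len(ids) <= min_foreign:
            continue
        ratio = (max(ids) - min(ids)) / len(ids)
        if ratio < max_ratio:
            findings.append(Finding(user, len(ids), round(ratio, 2)))
    return sorted(findings, key=lambda f: f.foreign_ids, reverse=True)
```

The span ratio ignores request order, so shuffling IDs within a block does not hide the enumeration, while a provider whose patients are scattered across the ID space produces a large ratio and is not flagged.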
Phase 3 — Data Exfiltration & Staging (~25 min)
The scraped data is collected, deduplicated, and organized on DATA WRAITH's infrastructure. The final dataset contains:
| Data Category | Records | Fields |
|---|---|---|
| Patient Demographics | 2,031,847 | Name, DOB, SSN, email, phone, address |
| Health Records | 8,847,231 | Diagnosis, provider notes, lab results |
| Prescriptions | 4,231,098 | Medications, dosages, prescriber, pharmacy |
| Insurance Claims | 3,187,445 | Claim amounts, diagnosis codes, dates |
| Total Records | 18,297,621 | — |
The data is exfiltrated incrementally through legitimate API responses: each HTTP response is itself an exfiltration event. This leaves traditional DLP largely blind, because those tools look for sensitive data leaving through side channels, while here the data flows out through the front door as normal HTTPS responses from the API itself.
DATA WRAITH prepares the dataset for sale on their dark web marketplace:
```text
=== HEALTHBRIDGE MEDICAL — FULL PATIENT DATABASE ===
Records: 2,031,847 patients (71% of database)
Content: PII + PHI + Insurance + Prescriptions
Freshness: March 2026
Format: JSON / CSV / PostgreSQL dump
Verification: 500 sample records available

PRICING:
- Full dataset: $180,000 (BTC/XMR)
- Bulk PII only (name, SSN, DOB): $45,000
- PHI subset (diagnosis + prescriptions): $90,000
- Insurance claims subset: $60,000
- Per-record pricing: $0.12/record (min 10K)
```
The dark web listing appears on Day 22, four days after scraping completes. A threat intelligence firm discovers the listing and notifies HealthBridge on Day 25.
Evidence Artifacts:
| Artifact | Detail |
|---|---|
| Dark Web Listing | "HealthBridge Medical Full Patient Database" — Marketplace: DarkVault (fictional) — Listed: 2026-03-26 |
| Threat Intel Report | "HealthBridge patient data for sale" — Source: dark web monitoring — Reported to HealthBridge: 2026-03-29T09:15:00Z |
| API Gateway Log (retrospective) | Total data transferred via API responses: ~28.7 GB over 18 days — Average response: 10.1 KB |
| Database Audit (retrospective) | 2,031,847 distinct patient records accessed by 12 API keys — Normal per-user access: 1–50 records |
Phase 3 — Discussion Inject
Technical: The data was exfiltrated via legitimate API responses — each HTTP response contained patient data that the API was designed to serve. How does this challenge traditional DLP approaches? What API-specific data loss prevention controls (response field masking, aggregate data access limits, output tokenization) would help?
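Response field masking and tokenization limit what each response can leak even when an authorization check is missing. The sketch assumes a hypothetical policy: only a `treating_provider` role sees raw values, and every other caller gets the SSN reduced to its last four digits, the date of birth reduced to the year, and `insuranceId` replaced by a keyed HMAC pseudonym (a simple stand-in for vault-based tokenization, which would additionally allow authorized detokenization). Field names follow the schema revealed by introspection in Phase 1.

```python
import hashlib
import hmac

# Hypothetical policy: roles that may see raw identifiers.
UNMASKED_ROLES = {"treating_provider"}


def mask_ssn(ssn: str) -> str:
    """Keep only the last four digits."""
    digits = "".join(c for c in ssn if c.isdigit())
    return "***-**-" + digits[-4:]


def pseudonymize(value: str, key: bytes) -> str:
    """Keyed, deterministic, non-reversible token (truncated HMAC-SHA256)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def redact_patient(record: dict, caller_role: str, key: bytes) -> dict:
    """Return a copy of the record with sensitive fields masked for the caller."""
    out = dict(record)
    if caller_role in UNMASKED_ROLES:
        return out
    if "ssn" in out:
        out["ssn"] = mask_ssn(out["ssn"])
    if "dateOfBirth" in out:  # ISO date string assumed; keep the year only
        out["dateOfBirth"] = out["dateOfBirth"][:4]
    if "insuranceId" in out:
        out["insuranceId"] = pseudonymize(out["insuranceId"], key)
    return out
```

Masking reduces the value of each scraped record but does not stop the enumeration itself, so it complements object-level authorization rather than replacing it.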
Decision: You learn about the breach from a threat intelligence firm that found your patient data on the dark web, not from internal detection. This is a HIPAA breach involving 2M+ patient records with PHI. What are the HIPAA Breach Notification Rule requirements? Who must you notify, and on what timeline? What is the estimated financial impact (OCR penalties, patient notification costs, credit monitoring, litigation)?
Expected Analyst Actions:
- [ ] Immediately disable all 12 API keys used in the scraping operation
- [ ] Engage legal counsel and HIPAA breach response team
- [ ] Determine exact scope: number of affected patients and types of data exposed
- [ ] Preserve all API logs, database audit logs, and application logs
- [ ] Begin drafting HIPAA breach notification to HHS OCR
Phase 4 — Incident Response & Regulatory Fallout (~25 min)
HealthBridge initiates incident response on Day 25. The investigation reveals the full scope over the following 10 days:
Investigation Timeline:
| Day | Activity | Findings |
|---|---|---|
| 25 | Threat intel notification received | Dark web listing confirmed as authentic with sample verification |
| 25 | IR team activated, API keys suspended | 12 suspicious API keys identified and disabled |
| 26 | API log analysis begins | Sequential patient ID access pattern identified across 12 accounts |
| 27 | BOLA vulnerability confirmed | Security team reproduces the authorization bypass |
| 28 | Full scope determined | 2,031,847 patients affected — 71.2% of database |
| 29 | HIPAA breach assessment completed | PHI breach confirmed — notification required |
| 30 | Emergency API patch deployed | Object-level authorization enforced on all endpoints |
| 31 | GraphQL introspection disabled | Query depth limiting implemented (max depth: 5) |
| 32 | Legal notification to HHS OCR | Breach report filed per HIPAA Breach Notification Rule |
| 35 | Patient notification begins | Individual notification to 2,031,847 affected patients |
Regulatory and Financial Impact:
| Category | Estimated Cost |
|---|---|
| HIPAA penalty (Tier 3: willful neglect, corrected) | $250,000–$1,500,000 |
| Patient notification (2M+ letters + call center) | $4,200,000 |
| Credit monitoring (2 years x 2M patients) | $6,100,000 |
| Forensic investigation and legal fees | $1,800,000 |
| API security remediation | $450,000 |
| Class action settlement (estimated) | $8,000,000–$15,000,000 |
| Regulatory compliance reassessment | $350,000 |
| Reputational damage (customer churn, reduced sign-ups) | $5,000,000–$12,000,000 |
| Total Estimated Impact | $26,150,000–$41,400,000 |
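As a quick check on the table, summing each column reproduces the total range, and dividing by the $450K remediation line gives the multiple quoted in the Key Takeaways (about 58x to 92x, rounded there to 60 to 90x). The figures are the scenario's own estimates, copied as-is; the row labels are shortened.

```python
# (low, high) estimates in USD, copied from the table above.
IMPACT = {
    "HIPAA penalty": (250_000, 1_500_000),
    "Patient notification": (4_200_000, 4_200_000),
    "Credit monitoring": (6_100_000, 6_100_000),
    "Forensics and legal": (1_800_000, 1_800_000),
    "API security remediation": (450_000, 450_000),
    "Class action settlement": (8_000_000, 15_000_000),
    "Compliance reassessment": (350_000, 350_000),
    "Reputational damage": (5_000_000, 12_000_000),
}

low = sum(lo for lo, _ in IMPACT.values())
high = sum(hi for _, hi in IMPACT.values())
remediation = IMPACT["API security remediation"][0]

print(f"Total estimated impact: ${low:,} to ${high:,}")
print(f"Multiple of remediation cost: {low / remediation:.0f}x to {high / remediation:.0f}x")
```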
HIPAA Breach Notification Requirements (as applied):
| Requirement | Deadline | Action |
|---|---|---|
| HHS OCR notification | 60 days from discovery | Breach report filed Day 32 |
| Individual patient notification | 60 days from discovery | Notification letters mailed Day 35 |
| Media notification (>500 residents in a state) | 60 days from discovery | Press release Day 35 |
| State attorney general notification | Varies by state | Filed in all 50 states Day 33–35 |
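The 60-day limit is counted in calendar days from discovery. The sketch assumes discovery is the threat-intelligence notification on 2026-03-29 (Day 25). Under 45 CFR 164.404(a)(2) a breach is treated as discovered on the first day it is known, or would have been known with reasonable diligence, so regulators could argue for an earlier start date given 18 days of logged scraping.

```python
from datetime import date, timedelta

# Outer limit under the HIPAA Breach Notification Rule: 60 calendar days after
# discovery. Notice is still due "without unreasonable delay" before that.
OUTER_LIMIT = timedelta(days=60)
DISCOVERY_DAY = 25  # scenario day of the threat-intel notification


def notification_deadline(discovered: date) -> date:
    return discovered + OUTER_LIMIT


print(notification_deadline(date(2026, 3, 29)))  # 2026-05-28

# Scenario actions, measured in days elapsed since discovery.
for action, day in [("HHS OCR report", 32), ("Individual letters", 35), ("Media notice", 35)]:
    elapsed = day - DISCOVERY_DAY
    print(f"{action}: day {elapsed} of 60, margin {OUTER_LIMIT.days - elapsed} days")
```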
Evidence Artifacts:
| Artifact | Detail |
|---|---|
| HHS OCR Breach Report | "Unauthorized access to PHI via API vulnerability" — 2,031,847 individuals — Filed: 2026-04-01 |
| HIPAA Risk Assessment | PHI types exposed: demographics, diagnosis, prescriptions, insurance — Risk: HIGH |
| API Security Audit | BOLA vulnerability in 14 of 47 REST endpoints — GraphQL authorization bypass in 3 query types |
| Penetration Test (post-incident) | 23 additional API security findings: rate limiting bypass, mass assignment, excessive data exposure |
Phase 4 — Discussion Inject
Technical: The post-incident penetration test found 23 additional API security findings. How do you build an API security testing program that catches BOLA and other OWASP API Top 10 vulnerabilities before deployment? What role do API security testing tools (Burp Suite, OWASP ZAP API scanning, 42Crunch) play in the CI/CD pipeline?
Decision: The estimated financial impact ranges from $26M to $41M. As HealthBridge's board of directors, what questions do you ask the CISO? Was this breach preventable? How do you evaluate whether the API security investment ($450K to remediate) should have been made proactively? What does the cost-benefit analysis look like?
Expected Analyst Actions:
- [ ] Complete forensic analysis and document chain of custody for all evidence
- [ ] Work with legal to prepare HIPAA breach notification content
- [ ] Coordinate with PR team on public communication strategy
- [ ] Implement API security testing in CI/CD pipeline to prevent regression
- [ ] Conduct tabletop exercise with lessons learned for executive leadership
Indicators of Compromise (IOCs)
Synthetic IOCs — For Training Only
All indicators below are fictional and created for this exercise. Do not use in production detection systems.
| IOC Type | Value | Context |
|---|---|---|
| IP Range | 203.0.113.0/24 (selected IPs; excludes 203.0.113.200, the HealthBridge API endpoint) | Attacker infrastructure management |
| Residential Proxies | 2,400 IPs across 15 countries | Distributed scraping infrastructure |
| API Keys | 12 OAuth tokens from disposable accounts | Scraping authentication |
| Email Pattern | *@tempmail.example.com | Disposable accounts for API access |
| User-Agent | 47 HealthBridge mobile app User-Agents | Request camouflage |
| Access Pattern | Sequential patient ID enumeration | BOLA exploitation signature |
| GraphQL Query | Introspection: __schema { types { ... } } | Schema discovery |
| GraphQL Query | Patient full profile with depth > 3 | Deep data extraction |
| Dark Web Market | DarkVault (fictional) | Data sale platform |
| Data Volume | 28.7 GB over 18 days via API responses | Exfiltration volume |
Detection Opportunities
| Phase | Technique | ATT&CK | Detection Method | Difficulty |
|---|---|---|---|---|
| 1 | Exploit public-facing application (BOLA) | T1190 | API testing: automated BOLA detection in CI/CD pipeline | Medium |
| 1 | GraphQL introspection | T1190 | Disable introspection in production, alert on introspection queries | Easy |
| 2 | Automated collection | T1119 | User behavior analytics: sequential ID access, cross-user data access | Medium |
| 2 | Rate limit evasion | T1119 | Aggregate rate monitoring across all accounts, not just per-key | Hard |
| 3 | Data from information repositories | T1213 | API response volume monitoring per user; alert on anomalous data access | Medium |
| 3 | Exfiltration over web service | T1567 | Aggregate data access tracking — per-user record count thresholds | Medium |
| 4 | Data breach (PHI) | — | Dark web monitoring for organization-specific data listings | Medium |
SIEM Detection Queries
KQL (Azure Monitor / Microsoft Sentinel):

```kql
// Assumption: after validating the JWT, the gateway records the caller's "sub"
// claim and own patient ID as X-Authenticated-User and X-Authenticated-Patient-Id
// request headers. The raw Authorization header is a poor key: tokens expire
// hourly, and claims inside the base64url-encoded JWT cannot be regex-matched.

// Detect sequential patient ID enumeration
ApiManagementGatewayLogs
| where RequestUrl matches regex @"/patients/\d+/"
| extend PatientId = toint(extract(@"/patients/(\d+)/", 1, RequestUrl))
| extend AuthUser = tostring(parse_json(RequestHeaders)["X-Authenticated-User"])
| summarize
    MinId = min(PatientId),
    MaxId = max(PatientId),
    UniqueIds = dcount(PatientId),
    RequestCount = count()
    by AuthUser, bin(TimeGenerated, 1h)
| where UniqueIds > 100
| where todouble(MaxId - MinId) / UniqueIds < 2  // dense, near-sequential ID range
| sort by UniqueIds desc

// Detect cross-user patient record access (BOLA indicator)
ApiManagementGatewayLogs
| where RequestUrl has "/patients/"
| extend PatientId = extract(@"/patients/(\d+)/", 1, RequestUrl)
| extend Headers = parse_json(RequestHeaders)
| extend AuthUser = tostring(Headers["X-Authenticated-User"]),
         OwnPatientId = tostring(Headers["X-Authenticated-Patient-Id"])
| where isnotempty(PatientId) and PatientId != OwnPatientId  // records of other patients
| summarize CrossAccessCount = count(), UniquePatients = dcount(PatientId) by AuthUser, bin(TimeGenerated, 1h)
| where CrossAccessCount > 10

// Detect GraphQL introspection queries (the trailing \b keeps __typename from matching)
ApiManagementGatewayLogs
| where RequestUrl has "/graphql"
| where RequestBody matches regex @"\b__(schema|type)\b"
| project TimeGenerated, CallerIpAddress, RequestBody

// Detect excessive API data volume per user
ApiManagementGatewayLogs
| extend AuthUser = tostring(parse_json(RequestHeaders)["X-Authenticated-User"])
| summarize TotalResponseBytes = sum(ResponseSize), RequestCount = count() by AuthUser, bin(TimeGenerated, 1d)
| where TotalResponseBytes > 100000000  // >100 MB per day
| sort by TotalResponseBytes desc
```
SPL (Splunk). SPL does not accept `//` comments, so each query is labeled here instead.

Detect sequential patient ID enumeration:

```spl
index=api sourcetype=kong:access uri="/api/v2/patients/*/records"
| rex field=uri "/patients/(?<patient_id>\d+)/"
| bin _time span=1h
| stats min(patient_id) as min_id, max(patient_id) as max_id,
    dc(patient_id) as unique_ids, count as req_count by api_key, _time
| where unique_ids > 100
| eval sequential_ratio = (max_id - min_id) / unique_ids
| where sequential_ratio < 2
| sort -unique_ids
```

Detect cross-user patient access (BOLA exploitation). SPL has no standard eval function for decoding JWTs, so this version assumes the caller's own patient ID is already extracted into an `auth_patient_id` field, for example logged by the gateway after token validation:

```spl
index=api sourcetype=kong:access uri="/api/v2/patients/*/records" status=200
| rex field=uri "/patients/(?<accessed_id>\d+)/"
| where accessed_id != auth_patient_id
| stats dc(accessed_id) as unique_patients, count as access_count by api_key
| where unique_patients > 10
| sort -unique_patients
```

Detect GraphQL introspection (the trailing `\b` keeps `__typename` from matching):

```spl
index=api sourcetype=graphql method=POST
| regex query="\b__(schema|type)\b"
| table _time, src_ip, user, query
```

Detect accounts with anomalous daily data access volumes:

```spl
index=api sourcetype=kong:access status=200
| bin _time span=1d
| stats sum(response_size) as total_bytes, dc(uri) as unique_uris,
    count as total_requests by api_key, _time
| eval total_mb = round(total_bytes/1048576, 2)
| where total_mb > 100
| sort -total_mb
```
ATT&CK Mapping
| Tactic | Technique | ID | Scenario Application |
|---|---|---|---|
| Initial Access | Exploit Public-Facing Application | T1190 | BOLA vulnerability in REST API, GraphQL introspection abuse |
| Reconnaissance | Active Scanning | T1595 | GraphQL introspection to discover schema and data types |
| Collection | Automated Collection | T1119 | Automated scraping of 2M+ patient records via API |
| Collection | Data from Information Repositories | T1213 | Accessing patient health records, prescriptions, insurance claims |
| Exfiltration | Exfiltration Over Web Service | T1567 | Data exfiltrated via legitimate API responses over HTTPS |
| Defense Evasion | Valid Accounts | T1078 | 12 legitimate accounts used to distribute requests |
| Command and Control | Proxy | T1090 | Residential proxies (combined with User-Agent rotation) to distribute requests and evade detection |
| Impact | Data Breach (PHI) | — | 2,031,847 patient records exposed — HIPAA breach |
Response Actions
Immediate Response (0–4 hours)
- [ ] Contain: Revoke all 12 identified API keys used in scraping
- [ ] Contain: Implement emergency IP blocking for known attacker infrastructure
- [ ] Contain: Deploy temporary fix: enforce object-level authorization on all patient endpoints
- [ ] Contain: Disable GraphQL introspection in production immediately
- [ ] Detect: Deploy monitoring for sequential ID access patterns on all API endpoints
- [ ] Legal: Engage breach response counsel — initiate HIPAA breach assessment
Short-Term Response (1–7 days)
- [ ] Investigate: Full API log analysis — identify all affected patient records
- [ ] Investigate: Determine exact data types exposed per patient (PII, PHI, insurance)
- [ ] Remediate: Implement object-level authorization across all 47 REST endpoints
- [ ] Remediate: Deploy GraphQL query depth limiting (max depth: 5) and cost analysis
- [ ] Remediate: Implement per-user aggregate rate limiting (not just per-key)
- [ ] Notify: File HIPAA breach report with HHS OCR
- [ ] Notify: Prepare individual patient notification letters
- [ ] Monitor: Deploy dark web monitoring for HealthBridge data listings
Long-Term Remediation (1–8 weeks)
- [ ] Harden: Implement API security gateway with BOLA detection (42Crunch, Salt Security)
- [ ] Harden: Replace sequential integer IDs with UUIDs across all API endpoints
- [ ] Harden: Deploy API-specific WAF rules (bot detection, behavioral analysis)
- [ ] Harden: Implement field-level authorization — mask sensitive fields (SSN, full DOB) in API responses
- [ ] Harden: Add API security testing to CI/CD pipeline (OWASP ZAP API scan, contract testing)
- [ ] Harden: Implement response data tokenization for sensitive fields
- [ ] Harden: Deploy user behavior analytics for API access patterns
- [ ] Comply: Complete HIPAA corrective action plan per OCR findings
- [ ] Comply: Conduct third-party security assessment and penetration test
- [ ] Train: API security training for all developers — OWASP API Security Top 10
Remediation Playbook
API Security Controls
Authorization Hardening:
- [ ] Implement object-level authorization (OLA) on every endpoint that returns user-specific data
- [ ] Use policy-based authorization middleware — centralize authorization logic, do not embed in individual route handlers
- [ ] Adopt attribute-based access control (ABAC) for complex data relationships (patient-provider-facility)
- [ ] Replace sequential integer IDs with UUIDv4 across all API resources — eliminate predictable enumeration
- [ ] Implement field-level authorization — redact or mask sensitive fields based on the caller's role and relationship to the data subject
- [ ] Enforce authorization checks at the data layer (database row-level security) as a defense-in-depth measure
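A sketch of centralized, policy-based authorization: one `can_read` policy function decides access from attributes of the caller and the target record (patient ownership, provider care-team membership, facility), and a decorator applies it before any handler runs, so individual routes cannot forget the check. All class, function, and ID names are hypothetical, and `load_resource` stands in for a database lookup.

```python
import functools
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Subject:
    """Attributes of the authenticated caller (hypothetical claim layout)."""
    user_id: str
    role: str                          # "patient" or "provider"
    patient_id: Optional[int] = None   # set for patient accounts
    facility_id: Optional[str] = None  # set for provider accounts


@dataclass(frozen=True)
class PatientResource:
    """Attributes of the target object, loaded before the handler runs."""
    patient_id: int
    facility_id: str
    care_team: frozenset  # user_ids of providers treating this patient


class AccessDenied(Exception):
    pass


def can_read(subject: Subject, resource: PatientResource) -> bool:
    """The single authorization policy shared by every patient route."""
    if subject.role == "patient":
        return subject.patient_id == resource.patient_id
    if subject.role == "provider":
        return (subject.user_id in resource.care_team
                and subject.facility_id == resource.facility_id)
    return False  # deny by default


def requires_read(load_resource: Callable[[int], PatientResource]):
    """Decorator: evaluate the policy against the loaded object first."""
    def decorate(handler):
        @functools.wraps(handler)
        def wrapper(subject: Subject, patient_id: int, *args, **kwargs):
            if not can_read(subject, load_resource(patient_id)):
                raise AccessDenied(f"{subject.user_id} -> patient {patient_id}")
            return handler(subject, patient_id, *args, **kwargs)
        return wrapper
    return decorate


# Usage with an in-memory loader standing in for a database lookup.
RESOURCES = {1847391: PatientResource(1847391, "fac-01", frozenset({"dr-lee"}))}


@requires_read(RESOURCES.__getitem__)
def get_records(subject: Subject, patient_id: int) -> str:
    return f"records for patient {patient_id}"
```

Unknown roles are denied by default. Row-level security in PostgreSQL could enforce an equivalent rule a second time at the data layer.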
GraphQL-Specific Controls:
- [ ] Disable introspection in production — expose schema documentation through a separate developer portal
- [ ] Implement query depth limiting (maximum depth: 5–7 depending on use case)
- [ ] Deploy query cost analysis — assign cost weights to fields and reject queries exceeding budget
- [ ] Implement persisted queries — only allow pre-registered query shapes in production
- [ ] Apply field-level authorization in GraphQL resolvers — not just at the query level
- [ ] Rate limit by query complexity, not just request count
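Unlike a depth limit, cost analysis multiplies by list sizes, so `healthRecords(last: 50)` is charged for fifty records rather than one. The sketch below assumes the query has already been parsed by the server's GraphQL library; `Field` is a simplified hypothetical node, and the weights, default list size, and budget are illustrative values rather than recommendations.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Field:
    """One selected field of an already-parsed query (simplified AST node)."""
    name: str
    children: list[Field] = field(default_factory=list)
    limit: Optional[int] = None  # value of a `last`/`first` argument, if any


# Hypothetical weights: sensitive or expensive fields cost more.
FIELD_WEIGHTS = {"ssn": 10, "diagnosis": 5, "notes": 5}
DEFAULT_LIST_SIZE = 100  # assumed page size for list fields without a limit
MAX_COST = 1_000         # hypothetical per-query budget


def query_cost(node: Field, list_fields: set[str]) -> int:
    cost = FIELD_WEIGHTS.get(node.name, 1)
    if node.children:
        child_cost = sum(query_cost(c, list_fields) for c in node.children)
        if node.name in list_fields:  # each returned item repeats the selection
            child_cost *= node.limit if node.limit is not None else DEFAULT_LIST_SIZE
        cost += child_cost
    return cost


def check_budget(root: Field, list_fields: set[str]) -> int:
    cost = query_cost(root, list_fields)
    if cost > MAX_COST:
        raise ValueError(f"query cost {cost} exceeds budget {MAX_COST}")
    return cost
```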
Rate Limiting & Abuse Detection:
- [ ] Implement per-user aggregate rate limiting — track total records accessed per user per day
- [ ] Deploy sliding window rate limiting with multiple dimensions (per-IP, per-user, per-endpoint, per-resource)
- [ ] Set data access thresholds — alert when any user accesses more than N unique records (e.g., 50 patients)
- [ ] Implement request fingerprinting — detect distributed scraping across multiple accounts with similar patterns
- [ ] Deploy bot detection using behavioral analysis (request timing variance, mouse/touch events, session patterns)
- [ ] Monitor for sequential ID enumeration patterns in API access logs
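Per-user aggregate limiting can count distinct records rather than requests. With the example threshold of 50 patients per day, the 12 scraping accounts could together have read at most 600 records a day, against the roughly 113,000 a day they actually extracted (2,031,847 over 18 days). The class below is a hypothetical in-memory sketch for a single process; a multi-node gateway would need to keep this state in a shared store.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class DistinctRecordBudget:
    """Per-user cap on distinct records read within a sliding time window.

    Re-reading a record already counted is free; touching new records
    consumes budget no matter how slowly, or from how many IPs, the
    requests arrive.
    """

    def __init__(self, max_records: int = 50, window_s: float = 86_400.0):
        self.max_records = max_records
        self.window_s = window_s
        self._counted = defaultdict(set)     # user -> record IDs in the window
        self._arrivals = defaultdict(deque)  # user -> (first_seen, record_id)

    def allow(self, user: str, record_id: int, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        counted, arrivals = self._counted[user], self._arrivals[user]
        # Drop records whose first access has left the window.
        while arrivals and now - arrivals[0][0] >= self.window_s:
            _, expired = arrivals.popleft()
            counted.discard(expired)
        if record_id in counted:
            return True
        if len(counted) >= self.max_records:
            return False
        counted.add(record_id)
        arrivals.append((now, record_id))
        return True
```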
API Security Testing:
- [ ] Integrate OWASP ZAP API scanning into CI/CD pipeline — test for BOLA, BFLA, mass assignment
- [ ] Conduct quarterly API penetration testing with OWASP API Security Top 10 focus
- [ ] Implement API contract testing — verify authorization behavior in automated tests
- [ ] Deploy runtime API security monitoring (Salt Security, 42Crunch, Traceable AI)
- [ ] Establish bug bounty program with specific scope for API vulnerabilities
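Authorization contract tests can be as simple as a two-user matrix: for every route that takes an object ID, the owner must succeed and a second user must be refused. In CI these calls would hit a test deployment seeded with two patient accounts; the sketch below substitutes plain functions so it stays self-contained. The route table, handler, and IDs are hypothetical (only `/records` appears in this scenario).

```python
import unittest


def owner_only(caller_id: int, target_id: int) -> int:
    """Hypothetical handler under test; returns an HTTP status code."""
    return 200 if caller_id == target_id else 403


# Every route that takes an object ID belongs in this table, so a new
# endpoint that skips the ownership check fails the build.
ROUTES = {
    "/api/v2/patients/{id}/records": owner_only,
    "/api/v2/patients/{id}/prescriptions": owner_only,
}

ALICE, BOB = 1001, 1002  # seeded test patients (hypothetical IDs)


class BolaRegressionTest(unittest.TestCase):
    def test_owner_can_read_own_objects(self):
        for route, handler in ROUTES.items():
            with self.subTest(route=route):
                self.assertEqual(handler(ALICE, ALICE), 200)

    def test_cross_user_access_is_denied(self):
        for route, handler in ROUTES.items():
            with self.subTest(route=route):
                # 404 is also acceptable: it avoids confirming the ID exists.
                self.assertIn(handler(ALICE, BOB), (403, 404))
```

Run it with `python -m unittest` as a pipeline step; a failing subtest names the offending route.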
Lessons Learned
What Went Well
- API gateway logs were complete and retained for 90 days, enabling full forensic reconstruction
- Dark web monitoring by a third-party threat intel firm provided the initial breach notification
- Emergency API patch to enforce object-level authorization was deployed within 5 days of discovery
- PostgreSQL database audit logs captured all queries, enabling precise scoping of affected records
What Failed
- No object-level authorization: The BOLA vulnerability — OWASP API #1 — was present in 14 of 47 endpoints. Authorization checked "is this token valid?" but never "is this user allowed to access this specific patient's records?"
- GraphQL introspection enabled in production: The full API schema was exposed, giving the attacker a detailed map of all data types and relationships
- Per-key rate limiting only: Rate limiting was applied per API key (100 req/min) but not per user or in aggregate. 12 keys at 40 req/min each = 480 req/min undetected
- No API-specific WAF: Cloudflare was configured for web traffic but API endpoints had no bot detection, behavioral analysis, or anomaly detection
- No data access monitoring: No alert existed for "single user accessing more than N unique patient records." A threshold of 50 records/user would have detected the attack within minutes
- Sequential integer IDs: Predictable resource identifiers made enumeration trivial — UUIDs would have required the attacker to discover valid IDs through other means
Key Takeaways
- BOLA/IDOR is the most common and impactful API vulnerability — every API endpoint that returns user-specific data must enforce object-level authorization
- Rate limiting per API key is insufficient — aggregate monitoring across accounts and per-user data access volumes are essential for detecting distributed scraping
- APIs are the new attack surface — traditional web security controls (WAF, DLP) often do not cover API endpoints; purpose-built API security tooling is required
- GraphQL requires specific security controls — introspection, depth limiting, cost analysis, and field-level authorization are all necessary for production GraphQL APIs
- Data breach costs dwarf prevention costs — the $26M–$41M breach impact versus $450K remediation cost represents a 60–90x return on proactive security investment
Debrief Guide
Debrief: What Went Well¶
- Complete API logging enabled forensic reconstruction of every malicious request
- Third-party threat intelligence monitoring provided the breach discovery signal
- HealthBridge legal team was well-prepared with HIPAA breach response procedures
Debrief: Key Learning Points¶
- BOLA is trivial to exploit and devastating in impact — changing an integer in a URL should never grant access to another user's data
- Distributed scraping defeats IP-based controls — user-level behavioral analytics are the correct detection layer
- API responses ARE the exfiltration channel — traditional DLP cannot detect data leaving through the application's front door
- HIPAA breach costs are catastrophic — $26M–$41M for a preventable vulnerability class
- Sequential IDs are a gift to attackers — UUIDs add a meaningful barrier to enumeration
Recommended Follow-Up¶
- [ ] Implement object-level authorization across all REST and GraphQL endpoints
- [ ] Replace all sequential integer IDs with UUIDv4
- [ ] Deploy API security gateway with behavioral analysis
- [ ] Disable GraphQL introspection and implement query cost analysis
- [ ] Implement per-user data access volume monitoring with threshold alerting
- [ ] Add OWASP API Security Top 10 testing to CI/CD pipeline
- [ ] Conduct HIPAA Security Rule gap assessment with focus on access controls
- [ ] Establish API security review as a mandatory gate for all new endpoint deployments
Discussion Questions¶
- The BOLA vulnerability was in production for an estimated 14 months before exploitation. Why is BOLA so common in modern APIs? What design patterns (policy-based authorization middleware, attribute-based access control) prevent BOLA at the architecture level?
- The attacker used 2,400 residential proxies to distribute requests below rate limits. Traditional IP-based rate limiting fails against distributed attacks. What alternative rate limiting strategies (per-user record access limits, request fingerprinting, behavioral analysis) would be effective?
- Data was exfiltrated via legitimate API responses — there was no "side channel." How does this challenge traditional DLP approaches? What API-specific data loss prevention controls exist?
- The estimated breach cost is $26M–$41M, while the BOLA fix cost $450K. How do you make the business case for API security investment before a breach? What metrics and frameworks (FAIR, risk quantification) help justify security spending to the board?
- HealthBridge is a HIPAA covered entity. How do the HIPAA Security Rule's requirements for access controls (45 CFR 164.312(a)) specifically apply to API authorization? Would compliance with HIPAA requirements have prevented this breach?