Integration Patterns for Security Operations Tooling¶
A modern SOC operates a portfolio of tools that must communicate reliably. Poor integrations are a leading cause of alert loss, delayed response, and analyst frustration. This document describes the canonical integration patterns used in security operations.
SOC Tool Integration Map¶
flowchart LR
EDR[EDR Platform]
SIEM[SIEM]
SOAR[SOAR Platform]
TIP[Threat Intel Platform]
CASE[Case Management\nJira / ServiceNow]
IAM[IAM Platform\nAD / Entra]
FW[Firewall / Proxy]
EMAIL[Email Gateway]
EDR -->|Alerts + Telemetry| SIEM
EMAIL -->|Security events| SIEM
FW -->|Network logs| SIEM
TIP -->|IOCs + Context| SIEM
SIEM -->|Correlated alerts| SOAR
SOAR -->|Create/update tickets| CASE
SOAR -->|Disable account| IAM
SOAR -->|Block IP/domain| FW
SOAR -->|Quarantine email| EMAIL
SOAR -->|Isolate host| EDR
TIP <-->|IOC sharing| SIEM
CASE -->|Status updates| SOAR
Integration Pattern 1: Bidirectional REST API¶
Use case: SIEM ↔ SOAR, SOAR ↔ Case Management, SIEM ↔ TIP
How it works: Both systems expose REST APIs. System A pushes events to System B's API endpoint; System B may call back to System A's API for updates.
Implementation:
# Example: SOAR creates case in management system
import requests

# SERVICENOW_URL, SN_USER, SN_PASS are loaded from the secrets vault (Nexus SecOps-104)
def create_case(alert: dict) -> str:
    """Create a ServiceNow incident from a SIEM alert."""
    payload = {
        "short_description": f"[SOC Alert] {alert['rule_name']}",
        "description": format_alert_details(alert),
        "severity": map_severity(alert['severity']),  # SIEM to ITSM mapping
        "assignment_group": "SOC Tier 1",
        "u_mitre_technique": alert.get('mitre_technique', ''),
        "u_siem_alert_id": alert['alert_id'],
    }
    response = requests.post(
        url=f"{SERVICENOW_URL}/api/now/table/incident",
        json=payload,
        auth=(SN_USER, SN_PASS),
        timeout=10,
    )
    response.raise_for_status()
    incident_id = response.json()['result']['number']
    return incident_id
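The `format_alert_details` and `map_severity` helpers referenced above are left to the integrator. A minimal sketch, assuming a five-level SIEM severity scale and ServiceNow-style 1–3 severity values (adjust both mappings to your instances):

```python
# Hypothetical helpers for the create_case example.
# The severity scales are assumptions — map to your SIEM and ITSM values.
SEVERITY_MAP = {
    "critical": 1,
    "high": 1,
    "medium": 2,
    "low": 3,
    "informational": 3,
}

def map_severity(siem_severity: str) -> int:
    """Translate a SIEM severity label to an ITSM severity value."""
    return SEVERITY_MAP.get(siem_severity.lower(), 2)  # default: Medium

def format_alert_details(alert: dict) -> str:
    """Render alert fields as a readable ticket description."""
    lines = [f"{key}: {value}" for key, value in sorted(alert.items())]
    return "\n".join(lines)
```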
Authentication: OAuth 2.0 (preferred), API key, or Basic Auth over TLS
Error handling:
- Retry with exponential backoff: 3 retries, delays of 1s / 5s / 30s
- Dead letter queue for permanently failed requests
- Alert on integration failure: "SOAR → Case Management integration down" — Nexus SecOps-110
Versioning: Pin to a specific API version; test against new API versions before upgrading in production.
Integration Pattern 2: Webhook / Event-Driven¶
Use case: Real-time alerting from SIEM to SOAR; EDR to SIEM; Cloud security alerts
How it works: When an event occurs, System A sends an HTTP POST to System B's pre-configured webhook URL.
[Event occurs in Source System]
→ POST https://soar.internal/webhook/siem-alert
→ Headers: X-Signature: HMAC-SHA256(payload, secret)
→ Body: JSON event payload
→ Response: 200 OK (acknowledged)
Security requirements for webhooks:
- MUST use HMAC signature verification to prevent spoofing
- Webhook endpoint MUST only accept connections from expected source IPs
- Use HTTPS only — never plain HTTP
# Webhook signature verification
import hmac
import hashlib

def verify_webhook_signature(payload: bytes, signature_header: str, secret: str) -> bool:
    expected = hmac.new(
        secret.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()
    received = signature_header.replace("sha256=", "")
    return hmac.compare_digest(expected, received)
Idempotency: Webhook receivers MUST handle duplicate delivery. Include a unique event ID and deduplicate on that ID.
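Receiver-side deduplication can be as simple as a set of seen event IDs. A minimal in-process sketch, assuming the payload carries an `event_id` field (a production receiver would back the seen-set with Redis or a database so it survives restarts):

```python
def handle_webhook(event: dict, seen_ids: set) -> bool:
    """Process an event exactly once; return False for duplicate deliveries."""
    event_id = event["event_id"]  # assumed field name — match your payloads
    if event_id in seen_ids:
        return False  # duplicate delivery — already processed
    seen_ids.add(event_id)
    # ... create ticket / trigger playbook here ...
    return True
```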
Advantages: near-real-time; no polling overhead; source-driven.
Disadvantages: receiver must be always-available; no built-in retry from source.
Integration Pattern 3: TAXII/STIX for Threat Intelligence¶
Use case: Sharing and consuming threat intelligence indicators between TIP, ISAC, SIEM
How it works: TAXII 2.1 is the transport protocol; STIX 2.1 is the data format.
[TAXII Server (Intel Provider)]
├── Collection: /api/v21/collections/{id}/objects/
└── Objects: STIX Bundles (Indicators, Malware, TTP)
[TAXII Client (Your TIP / SIEM)]
└── Poll endpoint every N minutes for new objects
└── Parse STIX objects → extract IOCs
└── Import to TIP / SIEM blocklist / enrichment
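The polling step above can be sketched with a plain HTTP client. The base URL and collection ID here are placeholders, and real TAXII servers require authentication; the `added_after` parameter is the standard TAXII 2.1 cursor for fetching only new objects:

```python
import requests

TAXII_MEDIA_TYPE = "application/taxii+json;version=2.1"

def poll_collection(base_url: str, collection_id: str, added_after: str,
                    session=None) -> list:
    """Fetch STIX objects added to a TAXII collection since the last poll."""
    session = session or requests.Session()
    url = f"{base_url}/collections/{collection_id}/objects/"
    response = session.get(
        url,
        headers={"Accept": TAXII_MEDIA_TYPE},
        params={"added_after": added_after},  # cursor from the previous poll
        timeout=30,
    )
    response.raise_for_status()
    # TAXII 2.1 wraps results in an envelope with an "objects" array
    return response.json().get("objects", [])
```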
STIX Indicator example:
{
"type": "indicator",
"id": "indicator--12345678-abcd-...",
"created": "2024-11-15T10:00:00Z",
"modified": "2024-11-15T10:00:00Z",
"name": "Malicious IP — APT29 C2",
"pattern": "[ipv4-addr:value = '198.51.100.42']",
"pattern_type": "stix",
"valid_from": "2024-11-15T10:00:00Z",
"valid_until": "2025-02-15T10:00:00Z",
"confidence": 85,
"labels": ["malicious-activity"],
"external_references": [
{"source_name": "vendor-report", "url": "https://..."}
]
}
Implementation notes:
- Use indicator valid_until for automatic TTL — Nexus SecOps-090
- Filter by confidence score (only ingest confidence ≥ 70)
- Track false positive rate per feed — Nexus SecOps-095
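An ingest filter following these notes might look like the sketch below. The 70-point confidence floor matches the note above, and the pattern regex only handles the simple single-comparison form shown in the example indicator — real STIX patterns can contain boolean operators and need a proper pattern parser:

```python
import re
from datetime import datetime, timezone

# Matches simple patterns like "[ipv4-addr:value = '198.51.100.42']"
PATTERN_RE = re.compile(r"\[(?P<path>[\w:.-]+)\s*=\s*'(?P<value>[^']+)'\]")

def should_ingest(indicator: dict, now: datetime) -> bool:
    """Apply the confidence floor and valid_until TTL before ingesting."""
    if indicator.get("confidence", 0) < 70:
        return False
    valid_until = indicator.get("valid_until")
    if valid_until:
        expiry = datetime.fromisoformat(valid_until.replace("Z", "+00:00"))
        if expiry <= now:
            return False  # expired — honor the TTL
    return True

def extract_ioc(indicator: dict):
    """Pull (object path, value) out of a simple single-comparison pattern."""
    match = PATTERN_RE.search(indicator.get("pattern", ""))
    return (match.group("path"), match.group("value")) if match else None
```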
Integration Pattern 4: CEF/Syslog for Legacy Tools¶
Use case: Network devices, legacy security tools that don't support REST APIs
Common Event Format (CEF):
CEF:0|Vendor|Product|Version|SignatureID|Name|Severity|Extension
CEF:0|Palo Alto|PAN-OS|10.1|threat|Malware Blocked|8|src=10.0.0.1 dst=1.2.3.4 cs1=block
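Parsing CEF means splitting the seven pipe-delimited header fields and then tokenizing the `key=value` extension. A minimal sketch that handles the example line above — it ignores escaped characters (CEF allows `\|` and `\=`) and extension values containing spaces, both of which a production parser must handle:

```python
def parse_cef(line: str) -> dict:
    """Split a CEF line into header fields and extension key=value pairs."""
    # Header: CEF:Version|Vendor|Product|Version|SignatureID|Name|Severity|Ext
    parts = line.split("|", 7)
    header_names = ["cef_version", "vendor", "product", "version",
                    "signature_id", "name", "severity"]
    event = dict(zip(header_names, parts[:7]))
    event["cef_version"] = event["cef_version"].removeprefix("CEF:")
    # Extension: space-separated key=value pairs (escapes not handled here)
    extension = parts[7] if len(parts) > 7 else ""
    for token in extension.split():
        if "=" in token:
            key, _, value = token.partition("=")
            event[key] = value
    return event
```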
Syslog forwarding with TLS:
# rsyslog configuration for TLS syslog forwarding
*.* action(type="omfwd"
target="siem-collector.internal"
port="6514"
protocol="tcp"
StreamDriver="gtls"
StreamDriverMode="1"
StreamDriverAuthMode="x509/name"
StreamDriverPermittedPeers="siem-collector.internal")
Parsing challenges:
- CEF extensions are semi-structured — create a dedicated parser per source
- Test each parser against 1000+ real events before production deployment
- Monitor parse error rate per source — Nexus SecOps-025
Integration Pattern 5: Cloud-Native Event Bus¶
Use case: AWS EventBridge, Azure Event Grid, GCP Pub/Sub for cloud security events
AWS example — SecurityHub findings to SIEM:
[AWS SecurityHub]
→ EventBridge rule: findings with severity HIGH/CRITICAL
→ EventBridge → Lambda
→ Lambda normalizes and forwards to Kinesis
→ Kinesis → SIEM
Benefits:
- Serverless — no infrastructure to maintain
- Auto-scaling with event volume
- Native cloud authentication (IAM roles, no API keys)
- Built-in replay capability (Kinesis Data Streams)
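The normalization Lambda in the flow above might look like this sketch. Field names follow the AWS Security Finding Format (ASFF) as delivered by the EventBridge "Security Hub Findings - Imported" event; verify them against your actual payloads, and the Kinesis forwarding call is elided:

```python
def normalize_finding(finding: dict) -> dict:
    """Flatten an ASFF finding into the SIEM's expected shape."""
    return {
        "alert_id": finding.get("Id"),
        "title": finding.get("Title"),
        "severity": finding.get("Severity", {}).get("Label"),
        "account": finding.get("AwsAccountId"),
        "resource": (finding.get("Resources") or [{}])[0].get("Id"),
        "source": "aws-securityhub",
    }

def handler(event, context):
    """Lambda entry point: normalize each finding, then forward to Kinesis."""
    records = [normalize_finding(f) for f in event["detail"]["findings"]]
    # boto3 put_records call to the SIEM-ingest Kinesis stream goes here
    return {"normalized": len(records)}
```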
Nexus SecOps controls: Nexus SecOps-008, Nexus SecOps-121
Authentication Patterns for Integrations¶
| Pattern | Security Level | Use Case |
|---|---|---|
| API Key (Header) | Low | Internal integrations; simple tools |
| OAuth 2.0 Client Credentials | Medium | Service-to-service; automated flows |
| OAuth 2.0 + JWT | High | Identity-aware API calls |
| mTLS (Mutual TLS) | High | High-security integrations; financial/regulated |
| IAM Role (Cloud) | High | Cloud-native integrations (AWS/Azure/GCP) |
Requirements (Nexus SecOps-104):
- No API keys in source code or configuration files — use a secrets vault
- API keys rotated every 90 days minimum
- All integration credentials unique (no shared credentials across integrations)
- Credential access logged (who accessed the secret, when)
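One common way to satisfy the "no keys in code" requirement is to have a vault agent inject secrets at runtime and read them from the environment. A minimal sketch under that assumption — swap in your vault SDK (e.g. hvac for HashiCorp Vault) if the agent pattern isn't available:

```python
import os

def get_integration_credential(name: str) -> str:
    """Read a credential injected by the vault agent; fail loudly if absent."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"credential {name} not available - check vault agent")
    return value
```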
Rate Limiting and Error Handling¶
Rate limit handling:
import time
import requests

class IntegrationException(Exception):
    """Raised when an API call cannot be completed after retries."""

def api_call_with_retry(url, payload, max_retries=3):
    for attempt in range(max_retries):
        response = requests.post(url, json=payload, timeout=10)
        if response.status_code == 429:  # Rate limited
            retry_after = int(response.headers.get('Retry-After', 60))
            time.sleep(retry_after)
            continue
        if response.status_code >= 500:  # Server error — retry
            backoff = 2 ** attempt  # Exponential: 1s, 2s, 4s
            time.sleep(backoff)
            continue
        response.raise_for_status()  # Other 4xx = caller error — do not retry
        return response.json()
    # All retries exhausted — send to DLQ
    send_to_dead_letter_queue(url, payload)
    raise IntegrationException(f"API call failed after {max_retries} retries")
Circuit breaker pattern:
[Normal] → API call succeeds
[Degraded] → API calls failing > 50% in last 60s → Open circuit
[Open] → All calls fail fast; no actual API calls made
[Recovery] → After 30s, allow 1 test call → if success, close circuit
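The state machine above can be sketched as a small class. This version uses a simplified consecutive-failure count rather than the 50%-in-60s failure rate, and treats the first call after the cooldown as the half-open test call; the threshold and cooldown values are illustrative:

```python
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, cooldown=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown  # seconds before allowing a test call
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise RuntimeError("circuit open - failing fast")
            # Cooldown elapsed: let one test call through (half-open)
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # open the circuit
            raise
        self.failures = 0
        self.opened_at = None  # success closes the circuit
        return result
```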
Integration Testing¶
Before deploying any new integration:
| Test | What to Verify |
|---|---|
| Happy path | Expected payload produces expected result |
| Authentication failure | Integration handles 401/403 gracefully |
| Timeout | Integration handles network timeout without crashing |
| Rate limit | Integration handles 429 with backoff |
| Invalid payload | Integration handles malformed responses |
| Large payload | Integration handles oversized responses |
| Integration unavailable | System degrades gracefully; queues or alerts |
Common Integration Anti-Patterns¶
Do Not Do These
- Polling every 5 seconds when you need real-time: Use webhooks or event bus instead. High-frequency polling hammers APIs and costs money.
- No retry logic: A single timeout drops the event silently. Always retry.
- Hardcoded credentials: These end up in git history, logs, and breached secrets. Use a vault.
- No integration health monitoring: Silent failures leave you thinking the integration is working when it's not. Monitor and alert — Nexus SecOps-110.
- Trusting the payload without validation: Malicious content in log data can inject into SOAR or case management. Sanitize inputs.
- Tight coupling: If SOAR cannot start because the SIEM is down, that's a brittle design. Use async queues between systems.
- No idempotency: Duplicate webhook deliveries create duplicate tickets. Deduplicate on event ID.
See Reference Architecture | Data Pipeline Patterns
Nexus SecOps controls: AUT domain (Nexus SecOps-096–110), TEL domain (Nexus SecOps-001–015)