GraphRAG Demo: Before & After

How a 448-node knowledge graph transforms AI-generated cybersecurity guidance

What is GraphRAG?

Graph Retrieval-Augmented Generation (GraphRAG) is a technique that enhances Large Language Model (LLM) responses by injecting structured knowledge from a graph database into the prompt context. Instead of relying solely on the LLM's parametric memory (what it learned during training), GraphRAG retrieves a relevant subgraph -- specific concepts, their definitions, and the prerequisite edges between them -- and includes that structured context alongside the user's question.

The result: responses that are specific, dependency-aware, and structurally grounded rather than generic and flat.

The Nexus SecOps Knowledge Graph

448 nodes representing cybersecurity concepts (CIA Triad, SIEM, Detection Engineering, Threat Hunting, etc.)
620 edges encoding prerequisite relationships between concepts
10 taxonomy categories spanning foundations, detection, response, governance, and more

How It Works

GraphRAG follows a three-step pipeline:

Seed extraction -- Identify key topics from the user's question (e.g., "Detection Engineering" maps to concept C078)
Subgraph retrieval -- Traverse the knowledge graph to extract the relevant neighborhood: the seed concepts, their prerequisites, and their dependents (typically 2 hops deep)
Context injection -- Serialize the subgraph (concepts + edges) into the LLM prompt, providing the model with structured domain knowledge it can reason over
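The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in `scripts/graph_rag_demo.py`. The following are all hypothetical: the toy concept table, the edge list (each pair assumed to mean "prerequisite, then dependent"), the substring-based seed matching, and the names `extract_seeds`, `retrieve_subgraph`, and `serialize_subgraph`.

```python
from collections import deque

# Hypothetical toy slice of the graph: concept ID -> name.
CONCEPTS = {
    "C017": "Security Event",
    "C019": "Log Source",
    "C020": "Telemetry",
    "C025": "Log Normalization",
    "C027": "SIEM",
    "C058": "Detection Rule",
    "C060": "Detection Logic",
    "C078": "Detection Engineering",
}
# Hypothetical toy edges, assumed to be (prerequisite, dependent) pairs.
EDGES = [
    ("C017", "C019"), ("C019", "C020"), ("C020", "C025"),
    ("C025", "C027"), ("C027", "C058"), ("C058", "C060"),
    ("C060", "C078"),
]

def extract_seeds(question):
    """Step 1: naive seed extraction by case-insensitive name matching."""
    q = question.lower()
    return [cid for cid, name in CONCEPTS.items() if name.lower() in q]

def retrieve_subgraph(seeds, hops=2):
    """Step 2: breadth-first expansion over prerequisites AND dependents."""
    neighbors = {cid: set() for cid in CONCEPTS}
    for pre, dep in EDGES:
        neighbors[pre].add(dep)  # follow edges toward dependents
        neighbors[dep].add(pre)  # follow edges toward prerequisites
    depth = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if depth[node] == hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    nodes = set(depth)
    edges = [(a, b) for a, b in EDGES if a in nodes and b in nodes]
    return nodes, edges

def serialize_subgraph(nodes, edges):
    """Step 3: render the subgraph as plain text for the prompt."""
    lines = ["Concepts:"]
    lines += [f"[{cid}] {CONCEPTS[cid]}" for cid in sorted(nodes)]
    lines.append("Prerequisite edges (A -> B means learn A before B):")
    lines += [f"{a} -> {b}" for a, b in sorted(edges)]
    return "\n".join(lines)

question = "What should I learn before Detection Engineering?"
seeds = extract_seeds(question)
nodes, edges = retrieve_subgraph(seeds, hops=2)
prompt = serialize_subgraph(nodes, edges) + "\n\nQuestion: " + question
print(prompt)
```

A real seed extractor would likely need synonyms or embeddings rather than exact name matching. Expanding in both directions mirrors the "prerequisites and dependents" wording of step 2.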

The LLM then generates a response that references specific concept IDs, follows prerequisite ordering, and surfaces structural insights (dependency chains, coverage gaps, parallel paths) that are invisible without graph context.
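Because the model cites concept IDs, its answer can be checked mechanically against the subgraph that was injected. The sketch below is a hypothetical post-processing step, not part of the documented pipeline. It flags any cited ID that was not in the retrieved context. The `[C###]` pattern is inferred from the examples on this page. The function name `check_citations` is illustrative.

```python
import re

# Assumed citation format: a bracketed "C" followed by three digits, e.g. [C078].
CITATION = re.compile(r"\[(C\d{3})\]")

def check_citations(answer, retrieved_ids):
    """Return (cited IDs in first-seen order, cited IDs absent from the subgraph)."""
    cited = list(dict.fromkeys(CITATION.findall(answer)))
    unknown = [cid for cid in cited if cid not in retrieved_ids]
    return cited, unknown

answer = "Learn [C025] Log Normalization before [C027] SIEM, then [C058] and [C999]."
cited, unknown = check_citations(answer, {"C025", "C027", "C058"})
print(cited)    # ['C025', 'C027', 'C058', 'C999']
print(unknown)  # ['C999']
```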

5 Side-by-Side Comparisons

Each comparison shows the same question answered twice: once by a baseline LLM (no graph context) and once with the knowledge graph injected into the prompt.

Comparison 1: Prerequisite Chains for Detection Engineering

Question: "What prerequisites should I learn before studying Detection Engineering? List them in the order I should learn them."

Subgraph retrieved: 22 nodes, 28 edges | Seed concepts: C078 (Detection Engineering), C058 (Detection Rule), C060 (Detection Logic)

Without Graph

To study Detection Engineering, you should first learn these prerequisite topics:

Networking Fundamentals — TCP/IP, DNS, HTTP, common protocols
Operating System Basics — Windows and Linux administration, file systems, processes
Security Fundamentals — CIA triad, defense in depth, risk management
Log Analysis — Understanding log formats, parsing, common fields
SIEM Platforms — How SIEMs work, data ingestion, search queries
Threat Landscape — Common attack types, malware families, attacker motivations
MITRE ATT&CK Framework — Tactics, techniques, procedures
Scripting — Python or PowerShell for automation
Regular Expressions — Pattern matching for detection rules
Incident Response Basics — Triage, containment, eradication

Start with networking and OS fundamentals, then move to security concepts, then SIEM and log analysis, and finally detection-specific skills. The learning path typically takes 3-6 months depending on your background.

With Graph

Based on the Nexus SecOps knowledge graph, here is the exact prerequisite chain for Detection Engineering (C078), ordered by dependency:

Foundation Layer (Start Here)

[C001] CIA Triad — Core security principle: Confidentiality, Integrity, Availability. Every detection exists to protect one of these.
[C017] Security Event — Observable occurrence in a system. This is the base unit of all detection.
[C019] Log Source — Origin of all telemetry. You must understand where data comes from before you can detect in it.

Data Layer (Learn Second)

[C020] Telemetry — The flow of security-relevant data from sources to analysis platforms.
[C025] Log Normalization — Transforming diverse log formats into a common schema. Critical for writing portable detection rules.
[C027] SIEM — Security Information and Event Management platform where detection rules execute.

Detection Fundamentals (Learn Third)

[C004] MITRE ATT&CK — Knowledge base of adversary TTPs across 14 tactics. This is the vocabulary of detection.
[C037] TTP — Tactics, Techniques, and Procedures. You detect TTPs, not IOCs.
[C058] Detection Rule — A formal logic expression that identifies security-relevant patterns in telemetry.
[C060] Detection Logic — Boolean/temporal logic for combining conditions in rules.
[C059] Correlation Rule — Multi-event detection combining signals across time windows.
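The Python sketch below is hand-written, not pipeline output. It makes the difference between a single-event condition and a correlation rule with temporal logic concrete. The following are hypothetical: the event fields (`time`, `user`, `action`), the five-failure threshold, and the five-minute window. Production rules would typically be written in a SIEM query language or Sigma instead.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)  # hypothetical correlation window
THRESHOLD = 5                  # hypothetical number of failed logins

def correlate_bruteforce(events):
    """Correlation rule: alert when a user has THRESHOLD failed logins
    followed by a successful login, all within WINDOW.
    `events` must be sorted by time."""
    failures = defaultdict(deque)  # user -> timestamps of recent failures
    alerts = []
    for ev in events:
        user, ts = ev["user"], ev["time"]
        recent = failures[user]
        # Temporal logic: forget failures that fall outside the window.
        while recent and ts - recent[0] > WINDOW:
            recent.popleft()
        if ev["action"] == "login_failure":
            recent.append(ts)
        elif ev["action"] == "login_success" and len(recent) >= THRESHOLD:
            # Boolean logic: success AND enough recent failures.
            alerts.append((user, ts))
            recent.clear()
    return alerts

start = datetime(2024, 1, 1, 12, 0)
events = [{"time": start + timedelta(seconds=30 * i), "user": "alice",
           "action": "login_failure"} for i in range(5)]
events.append({"time": start + timedelta(minutes=3), "user": "alice",
               "action": "login_success"})
events.append({"time": start + timedelta(minutes=3), "user": "bob",
               "action": "login_success"})
print(correlate_bruteforce(events))  # one alert, for alice at 12:03
```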

Advanced Prerequisites (Learn Fourth)

[C061] Baseline — Statistical normal behavior profile. Required for anomaly-based detection.
[C063] Detection Coverage — Measuring which ATT&CK techniques your rules address.
[C064] Detection Testing — Validating that rules fire correctly (Atomic Red Team, unit tests).
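In its simplest form, a baseline is the mean and standard deviation of a metric, with large deviations flagged for review. The sketch below is a hypothetical illustration using made-up daily login counts and a z-score threshold of 3. Practical baselines are often per-entity and account for time-of-day or weekly cycles.

```python
from statistics import mean, stdev

def build_baseline(history):
    """Profile 'normal' behavior as the mean and sample standard deviation."""
    return mean(history), stdev(history)

def is_anomalous(value, baseline, z_threshold=3.0):
    """Flag values more than z_threshold standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu  # a flat history makes any change anomalous
    return abs(value - mu) / sigma > z_threshold

# Hypothetical daily login counts for one account.
history = [20, 22, 19, 21, 20, 23, 18, 21, 22, 20]
baseline = build_baseline(history)
print(is_anomalous(21, baseline))  # False
print(is_anomalous(60, baseline))  # True
```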

Goal

[C078] Detection Engineering — The discipline of systematically building, testing, tuning, and maintaining detection capabilities at scale.

Key insight from the graph: Detection Engineering (C078) has 14 direct and transitive prerequisites. Skipping any of these creates gaps — for example, without understanding Log Normalization (C025), your detection rules will break when log formats change. The prerequisite chain reveals that Detection Engineering is NOT just about writing Sigma rules — it requires a foundation in telemetry, data pipelines, and threat modeling.

Why the Graph Makes a Difference

The baseline response lists 10 generic topics in a flat list. It gives only a coarse ordering ("start with networking and OS fundamentals...") and no explicit dependencies between topics. The graph-augmented response provides 15 specific concepts from the knowledge graph (14 prerequisites plus the goal), organized into 4 prerequisite layers with explicit [C-ID] references. The graph also surfaces dependencies the baseline never states (e.g., Log Normalization must come before SIEM).
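The "14 direct and transitive prerequisites" figure is the set of ancestors of C078: every concept from which C078 can be reached by following prerequisite edges. Below is a minimal sketch of that computation. It assumes edges are stored as (prerequisite, dependent) pairs. The edge list is a hypothetical fragment of the graph, so it yields 9 rather than 14.

```python
# Hypothetical toy edges (prerequisite, dependent); the real graph has 620.
EDGES = [
    ("C001", "C017"), ("C017", "C019"), ("C019", "C020"),
    ("C020", "C025"), ("C025", "C027"), ("C004", "C037"),
    ("C037", "C058"), ("C027", "C058"), ("C058", "C078"),
]

def build_prereq_map(edges):
    """Map each concept to the set of its direct prerequisites."""
    prereqs = {}
    for pre, dep in edges:
        prereqs.setdefault(dep, set()).add(pre)
        prereqs.setdefault(pre, set())
    return prereqs

def all_prerequisites(concept, prereqs):
    """Direct plus transitive prerequisites, via iterative depth-first search."""
    seen, stack = set(), list(prereqs.get(concept, ()))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(prereqs.get(cur, ()))
    return seen

prereqs = build_prereq_map(EDGES)
direct = prereqs["C078"]
everything = all_prerequisites("C078", prereqs)
print(f"direct: {sorted(direct)}")
print(f"transitive only: {sorted(everything - direct)}")
print(f"total: {len(everything)}")
```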

Comparison 2: Tracing the TI-to-IR Path

Question: "How does Threat Intelligence connect to Incident Response in a SOC workflow? Trace the path between these two disciplines."

Subgraph retrieved: 18 nodes, 24 edges | Seed concepts: C088 (Threat Intelligence), C081 (Alert Triage), C084 (Playbook)

Without Graph

Threat Intelligence (TI) and Incident Response (IR) are deeply connected in SOC workflows:

How TI Feeds IR:

TI provides context about active threats, campaigns, and adversary behavior
Indicators of Compromise (IOCs) from TI are used to create detection rules
When an alert fires, analysts use TI to enrich the investigation
TI helps determine the severity and urgency of incidents

How IR Feeds TI:

Post-incident analysis produces new IOCs and TTPs
Lessons learned improve future threat assessments
IR findings help refine threat models

The Workflow:

TI team monitors threat feeds and produces intelligence products
Detection team uses TI to create or update rules
Alert fires in SIEM
SOC analyst triages using TI context
If escalated, IR team investigates with TI support
Post-incident, new intelligence feeds back to TI

This creates a continuous feedback loop between the two disciplines.

With Graph

Based on the knowledge graph, here is the exact concept path connecting Threat Intelligence to Incident Response:

The Connection Path (8 concepts, 7 prerequisite edges)

[C088] Threat Intelligence
    → [C089] Threat Actor          (who is attacking)
    → [C037] TTP                   (how they attack)
    → [C004] MITRE ATT&CK          (structured vocabulary)
    → [C058] Detection Rule        (translates TTPs to alerts)
    → [C081] Alert Triage          (analyst evaluates the alert)
    → [C082] Escalation Criteria   (decides if it's an incident)
    → [C084] Playbook              (structured response procedure)

How Each Link Works

Threat Intelligence [C088] → Threat Actor [C089]: TI identifies WHO is attacking — APT groups, criminal orgs, insiders. This context determines response priority.
Threat Actor [C089] → TTP [C037]: Each actor uses specific Tactics, Techniques, and Procedures. Knowing the actor reveals expected behaviors to detect.
TTP [C037] → MITRE ATT&CK [C004]: TTPs are cataloged in ATT&CK's 14-tactic framework. This provides a shared vocabulary.
MITRE ATT&CK [C004] → Detection Rule [C058]: Each ATT&CK technique maps to one or more detection rules. This is where TI becomes operational.
Detection Rule [C058] → Alert Triage [C081]: When a rule fires, SOC Tier 1 analysts triage the alert using severity scoring and enrichment from TI.
Alert Triage [C081] → Escalation Criteria [C082]: If the alert matches known TTP patterns from TI, escalation priority increases.
Escalation Criteria [C082] → Playbook [C084]: Confirmed incidents trigger structured response playbooks, which reference TI for containment strategies.

Key Graph Insight

The knowledge graph reveals that TI and IR are NOT directly connected: they are linked through 6 intermediate concepts. A SOC that skips any link (e.g., has TI but no ATT&CK mapping, or has detection rules but no triage process) therefore has a broken chain. The graph makes this structural gap visible.

Why the Graph Makes a Difference

The baseline describes a generic feedback loop. The graph-augmented response traces the exact 8-concept path with prerequisite edges. It shows that TI and IR are connected through 6 intermediate concepts rather than directly. This structural insight (a single missing link breaks the chain, and the SOC workflow with it) comes from graph traversal, which a flat answer cannot provide.
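Tracing a path like this is a shortest-path search over prerequisite edges. The sketch below runs a breadth-first search over only the arrows in the TI-to-IR diagram, a hypothetical stand-in for the full `graph.json`. It then repeats the search with one concept removed to show the "broken chain" effect. In the full graph an alternative route might exist, in which case removing a single concept would not disconnect the two ends.

```python
from collections import deque

# Arrows from the TI-to-IR diagram, treated as directed edges.
EDGES = [
    ("C088", "C089"), ("C089", "C037"), ("C037", "C004"),
    ("C004", "C058"), ("C058", "C081"), ("C081", "C082"),
    ("C082", "C084"),
]

def shortest_path(edges, start, goal, removed=frozenset()):
    """Breadth-first search; returns the node list, or None if no path exists."""
    adj = {}
    for a, b in edges:
        if a not in removed and b not in removed:
            adj.setdefault(a, []).append(b)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk parents back to the start
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

path = shortest_path(EDGES, "C088", "C084")
print(" -> ".join(path))
# Drop MITRE ATT&CK (C004): with only these edges, the chain breaks.
print(shortest_path(EDGES, "C088", "C084", removed={"C004"}))  # None
```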

Comparison 3: Learning Plan for a Career Transition

Question: "Design a 2-week learning plan for a network engineer transitioning to a SOC analyst role. What should they learn first, second, and third?"

Subgraph retrieved: 25 nodes, 32 edges | Seed concepts: C027 (SIEM), C081 (Alert Triage), C019 (Log Source)

Without Graph

Here's a 2-week learning plan for a network engineer transitioning to SOC:

Week 1: Foundation

Day 1-2: Security fundamentals (CIA triad, threat landscape, common attacks)
Day 3-4: SIEM basics (Splunk/Sentinel, data ingestion, basic searches)
Day 5: Log analysis (Windows Event Logs, Syslog, firewall logs)

Week 2: SOC Operations

Day 1-2: Alert triage process (severity levels, false positive identification)
Day 3: Incident response basics (detection, containment, eradication)
Day 4: Threat intelligence fundamentals (IOCs, feeds, enrichment)
Day 5: Hands-on practice (set up home lab, run through sample alerts)

Tips:

Leverage your networking knowledge — you already understand packets, protocols, and traffic patterns
Focus on learning KQL or SPL query languages
Practice with real SIEM environments (Splunk free tier, Microsoft Sentinel sandbox)
Study for CompTIA CySA+ as a structured learning path

With Graph

Using the knowledge graph's prerequisite ordering, here is a dependency-aware 2-week plan. As a network engineer, you already know networking concepts, so this plan starts from your existing knowledge and builds forward through the graph.

What You Already Know (Skip These)

Based on your network engineering background, you likely already have:

[C019] Log Source (you understand Syslog, NetFlow, packet captures)
Network protocols, routing, switching

Week 1: Foundations → Data → Detection (Days 1-5)

Day 1: Security Foundations (~4 hours)

[C001] CIA Triad — frame everything in terms of what you're protecting
[C017] Security Event — reframe network events as security observables
Prerequisite chain: C001 → C017 (must understand CIA before classifying events)

Day 2: Telemetry Pipeline (~4 hours)

[C020] Telemetry — how security data flows from sources to analysis
[C025] Log Normalization — transforming diverse formats to common schema
You already know: Log sources [C019]. Now learn what happens AFTER collection.

Day 3-4: SIEM Operations (~8 hours)

[C027] SIEM — hands-on with Splunk/Sentinel. Run your first SPL (Splunk) or KQL (Sentinel) queries.
[C059] Correlation Rule — multi-event detection across time windows
Prerequisite satisfied: C019 → C020 → C025 → C027 (you've followed the chain)

Day 5: Threat Framework (~4 hours)

[C004] MITRE ATT&CK — the 14 tactics and technique vocabulary
[C037] TTP — Tactics, Techniques, Procedures (what you're detecting)
Why now: You need ATT&CK vocabulary before writing detection rules

Week 2: Triage → Response → Practice (Days 6-10)

Day 6: Detection Rules (~4 hours)

[C058] Detection Rule — write your first Sigma/KQL detection
[C060] Detection Logic — Boolean and temporal conditions
Prerequisite satisfied: C004 → C037 → C058 (ATT&CK before rules)

Day 7: Alert Triage (~4 hours)

[C080] Severity Scoring — how to prioritize alerts
[C081] Alert Triage — the core Tier 1 analyst skill
This is your target concept. Everything before was preparation.

Day 8: Response Procedures (~4 hours)

[C082] Escalation Criteria — when to escalate vs. close
[C083] Runbook — following structured response procedures
Prerequisite chain: C081 → C082 → C083 (triage before escalation before runbook)

Day 9-10: Hands-On Practice (~8 hours)

Complete Lab 1 (Synthetic Alert Triage) and Lab 2 (Detection Tuning)
Run MicroSim 01 (Alert Triage) and MicroSim 09 (Detection Coverage)
Take Quiz Ch01-Ch05 to validate understanding

Key Graph Insight

The knowledge graph reveals that the optimal path from your starting point to Alert Triage [C081] crosses exactly 10 concepts in 4 layers. A generic syllabus would miss the prerequisite ordering — for example, you MUST learn Log Normalization [C025] before SIEM [C027], and MITRE ATT&CK [C004] before Detection Rules [C058]. The graph prevents you from hitting "I don't have the foundation for this" walls.

Why the Graph Makes a Difference

The baseline provides a generic syllabus. The graph-augmented response personalizes the plan by identifying what the learner already knows (Log Source C019), topologically sorting the remaining concepts, and providing explicit prerequisite chains with time estimates. It also links to specific Nexus SecOps labs and simulations.
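The personalization step described above amounts to three operations. Take the target's prerequisite closure, drop the concepts the learner already knows, and topologically sort the rest. Below is a minimal sketch using Kahn's algorithm. It uses a hypothetical edge list that loosely follows the chains quoted in this plan, not the actual 25-node, 32-edge subgraph.

```python
import heapq

# Hypothetical edges (prerequisite, dependent) loosely following this plan.
EDGES = [
    ("C001", "C017"), ("C017", "C019"), ("C019", "C020"),
    ("C020", "C025"), ("C025", "C027"), ("C004", "C037"),
    ("C037", "C058"), ("C027", "C058"), ("C058", "C080"),
    ("C080", "C081"),
]

def all_prerequisites(target, edges):
    """Every concept from which `target` is reachable."""
    prereqs = {}
    for pre, dep in edges:
        prereqs.setdefault(dep, set()).add(pre)
    seen, stack = set(), [target]
    while stack:
        for p in prereqs.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def learning_plan(target, edges, known=frozenset()):
    """Kahn's algorithm over the target's prerequisites minus known concepts."""
    todo = (all_prerequisites(target, edges) | {target}) - set(known)
    indegree = {c: 0 for c in todo}
    children = {c: [] for c in todo}
    for pre, dep in edges:
        if pre in todo and dep in todo:  # edges from known concepts are satisfied
            indegree[dep] += 1
            children[pre].append(dep)
    ready = [c for c, d in indegree.items() if d == 0]
    heapq.heapify(ready)  # break ties by smallest concept ID for a stable order
    order = []
    while ready:
        concept = heapq.heappop(ready)
        order.append(concept)
        for nxt in children[concept]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, nxt)
    if len(order) != len(todo):
        raise ValueError("prerequisite cycle detected")
    return order

plan = learning_plan("C081", EDGES, known={"C019"})
print(" -> ".join(plan))
```

Any topological order is valid, and the smallest-ID tie-break is just one choice. C001 and C017 stay in the plan even though C019 is known, which matches the Day 1 content above. A stricter variant could also drop the ancestors of known concepts.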

Comparison 4: SOC Tier 1 Analyst Concept Mastery

Question: "What are the key concepts a SOC Tier 1 analyst must master, and what is the optimal learning order?"

Subgraph retrieved: 20 nodes, 26 edges | Seed concepts: C081 (Alert Triage), C083 (Runbook), C082 (Escalation Criteria)

Without Graph

A SOC Tier 1 analyst should master these key concepts:

Core Skills:

Alert Monitoring and Triage
SIEM Tool Proficiency (Splunk, QRadar, or Sentinel)
Log Analysis and Interpretation
Network Traffic Analysis
Malware Identification (basic)
Incident Documentation
Escalation Procedures
Runbook Following

Supporting Knowledge:

TCP/IP Networking
Windows/Linux Administration
Common Attack Patterns
Security Frameworks (MITRE ATT&CK basics)
Regulatory Compliance Awareness

Learning Order: Start with networking and OS fundamentals, then SIEM tools, then alert triage procedures. Learn escalation criteria and documentation last. Most organizations provide 2-4 weeks of onboarding training covering these areas.

Recommended Certifications:

CompTIA Security+
CompTIA CySA+
Splunk Core Certified User

With Graph

The knowledge graph identifies three goal concepts for SOC Tier 1 analysts, each requiring a specific prerequisite chain:

Target Concepts (Tier 1 Analyst Goals)

[C081] Alert Triage — evaluating and prioritizing security alerts
[C083] Runbook — following structured response procedures
[C082] Escalation Criteria — deciding when to escalate to Tier 2

Optimal Learning Order (Topological Sort)

Layer 0 (Foundations):  C001 CIA Triad → C017 Security Event → C019 Log Source
Layer 1 (Data):         C020 Telemetry → C025 Log Normalization → C027 SIEM
Layer 2 (Detection):    C058 Detection Rule → C059 Correlation Rule → C080 Severity Scoring
Layer 3 (Operations):   C081 Alert Triage → C082 Escalation Criteria → C083 Runbook

Total: 12 concepts in 4 layers. Estimated mastery time: 40-60 hours.

Each layer MUST be completed before the next:

Layer 0→1: You can't use a SIEM (C027) without understanding log sources (C019) and telemetry (C020)
Layer 1→2: You can't understand detection rules (C058) without SIEM knowledge (C027)
Layer 2→3: You can't triage alerts (C081) without understanding severity scoring (C080) and correlation rules (C059)

What the Graph Reveals That a Flat List Misses

Severity Scoring [C080] is a prerequisite for Alert Triage [C081] — many training programs teach triage without first teaching scoring frameworks, which leads to inconsistent prioritization
Runbook [C083] depends on both Triage [C081] AND Escalation [C082] — you can't follow a runbook effectively without knowing when to escalate vs. continue investigating
The minimum path is 12 concepts, not the 15+ topics in a typical boot camp — the graph eliminates noise by showing only what's actually required

Why the Graph Makes a Difference

The baseline lists 13 skills plus 3 certifications in a flat structure. The graph-augmented response distills the requirement to exactly 12 concepts in 4 dependency layers, using topological sorting to derive a learning order that respects every prerequisite edge. It also surfaces non-obvious insights, such as Severity Scoring being a prerequisite for Alert Triage.
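Whether a proposed order honors the "complete each layer before the next" rule can be checked mechanically. An order is valid exactly when every prerequisite edge points forward in it. The sketch below flattens the four layers from this comparison. It checks them against the edges named in the layer explanations and the Runbook insight. That edge subset is an assumption, since the full 26-edge subgraph is not shown.

```python
LAYERS = [
    ["C001", "C017", "C019"],  # Layer 0: Foundations
    ["C020", "C025", "C027"],  # Layer 1: Data
    ["C058", "C059", "C080"],  # Layer 2: Detection
    ["C081", "C082", "C083"],  # Layer 3: Operations
]
# Assumed subset of (prerequisite, dependent) edges named in the text.
EDGES = [
    ("C019", "C027"), ("C020", "C027"), ("C027", "C058"),
    ("C080", "C081"), ("C059", "C081"),
    ("C081", "C083"), ("C082", "C083"),
]

def violations(order, edges):
    """Return edges whose prerequisite does not come before its dependent."""
    pos = {c: i for i, c in enumerate(order)}
    return [(a, b) for a, b in edges if pos[a] >= pos[b]]

order = [c for layer in LAYERS for c in layer]
print(violations(order, EDGES))    # []

# Teaching Alert Triage before Severity Scoring breaks a prerequisite.
swapped = order[:]
i, j = swapped.index("C081"), swapped.index("C080")
swapped[i], swapped[j] = swapped[j], swapped[i]
print(violations(swapped, EDGES))  # [('C080', 'C081')]
```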

Comparison 5: Concept Relationships in the Detection Ecosystem

Question: "Explain the relationship between MITRE ATT&CK, detection rules, Sigma rules, and threat hunting. How do these concepts build on each other?"

Subgraph retrieved: 16 nodes, 20 edges | Seed concepts: C004 (MITRE ATT&CK), C058 (Detection Rule), C062 (Sigma Rule)

Without Graph

These four concepts are closely related and build on each other:

MITRE ATT&CK: A comprehensive knowledge base of adversary tactics, techniques, and procedures (TTPs). It provides a common language for describing attacker behavior across 14 tactics (Initial Access through Impact). Think of it as the "what" attackers do.

Detection Rules: Formal logic expressions (usually in KQL, SPL, or similar query languages) that identify malicious activity in log data. They translate ATT&CK techniques into specific patterns that SIEM tools can match. For example, detecting T1059.001 (PowerShell) might look for specific command-line arguments.

Sigma Rules: A vendor-agnostic detection rule format. Instead of writing separate rules for Splunk, Sentinel, and QRadar, you write one Sigma rule and convert it to any platform. Sigma standardizes detection engineering.

Threat Hunting: Proactive search for threats that evade automated detection. Hunters form hypotheses based on ATT&CK techniques and search for evidence. While detection rules are automated, hunting is human-driven and creative.

How They Build on Each Other: ATT&CK provides the framework → Detection rules operationalize it → Sigma standardizes the rules → Threat hunting fills the gaps where rules don't exist. They form a maturity progression: reactive detection → automated detection → proactive hunting.

With Graph

The knowledge graph reveals a precise dependency chain connecting these four concepts:

The Prerequisite Chain

[C001] CIA Triad
  └→ [C003] Kill Chain
       └→ [C004] MITRE ATT&CK
            └→ [C037] TTP (Tactics, Techniques, Procedures)
                 ├→ [C058] Detection Rule
                 │    └→ [C060] Detection Logic (Boolean/temporal conditions)
                 │         └→ [C062] Sigma Rule (vendor-agnostic format)
                 └→ [C063] Detection Coverage
                      └→ [C065] Threat Hunting (hypothesis-driven proactive search)

How Each Prerequisite Link Works

Kill Chain [C003] → MITRE ATT&CK [C004]: ATT&CK expands on the Lockheed Martin kill chain with detailed TTPs. You need the kill chain mental model before ATT&CK makes sense.
MITRE ATT&CK [C004] → TTP [C037]: ATT&CK catalogs TTPs. Understanding the framework is prerequisite to understanding individual technique descriptions.
TTP [C037] → Detection Rule [C058]: Each TTP maps to one or more detection rules. You must understand the behavior you're detecting before writing the rule.
Detection Rule [C058] → Detection Logic [C060]: Rules use Boolean logic (AND/OR/NOT), temporal operators (within 5 minutes), and aggregation. This is the formal language of detection.
Detection Logic [C060] → Sigma Rule [C062]: Sigma is a YAML-based format that abstracts detection logic into platform-independent rules. You must understand detection logic before you can express it in Sigma's schema.
TTP [C037] → Detection Coverage [C063]: Coverage maps which TTPs have detection rules and which don't. This reveals gaps.
Detection Coverage [C063] → Threat Hunting [C065]: Hunting targets the gaps — TTPs where automated detection doesn't exist. Without a coverage map, you're hunting blind.
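The coverage-to-hunting link reduces to a set difference: the techniques relevant to your threat model, minus those covered by at least one rule. The sketch below is a hand-written illustration, not pipeline output. The rule names, the rule-to-technique mapping, and the "relevant" set are hypothetical. The IDs are real ATT&CK technique identifiers, used only as examples.

```python
# Hypothetical mapping of detection rules to the ATT&CK techniques they cover.
RULES = {
    "suspicious_powershell_cmdline": {"T1059.001"},
    "lsass_memory_access": {"T1003.001"},
    "password_spray": {"T1110"},
}
# Hypothetical set of techniques relevant to this organization's threat model.
RELEVANT = {"T1059.001", "T1003.001", "T1110", "T1021.001", "T1566.001"}

def coverage_report(rules, relevant):
    """Return (fraction of relevant techniques covered, sorted uncovered techniques)."""
    covered = set().union(*rules.values()) & relevant
    gaps = relevant - covered
    return len(covered) / len(relevant), sorted(gaps)

ratio, hunt_targets = coverage_report(RULES, RELEVANT)
print(f"coverage: {ratio:.0%}")       # coverage: 60%
print("hunt targets:", hunt_targets)  # hunt targets: ['T1021.001', 'T1566.001']
```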

Key Graph Insight

The graph reveals two parallel paths from TTP:

Path A: TTP → Detection Rule → Detection Logic → Sigma Rule (the automation path)
Path B: TTP → Detection Coverage → Threat Hunting (the proactive path)

A mature SOC needs BOTH paths. The common mistake is building Path A without Path B — you end up with hundreds of detection rules but no systematic way to find the threats that evade them. The knowledge graph makes this structural dependency visible: Threat Hunting [C065] depends on Detection Coverage [C063], not on Detection Rule [C058] directly.

Why the Graph Makes a Difference

The baseline presents a linear maturity model (ATT&CK → rules → Sigma → hunting). The graph reveals a branching dependency tree with two parallel paths from TTP -- an automation path and a proactive path. This structural insight shows that Threat Hunting depends on Detection Coverage, not on Detection Rules directly, which changes how SOC teams should prioritize capability building.
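The two parallel paths can be recovered mechanically by enumerating every route from TTP [C037] to a concept with no further dependents. Below is a minimal depth-first sketch over the edges drawn in this comparison's prerequisite tree. For this illustration, those edges are assumed to be the complete subgraph, which the real 20-edge retrieval is not.

```python
# Edges drawn in the prerequisite tree, as (prerequisite, dependent).
EDGES = [
    ("C001", "C003"), ("C003", "C004"), ("C004", "C037"),
    ("C037", "C058"), ("C058", "C060"), ("C060", "C062"),
    ("C037", "C063"), ("C063", "C065"),
]

def paths_from(start, edges):
    """All paths from `start` to a node with no dependents (assumes a DAG)."""
    children = {}
    for a, b in edges:
        children.setdefault(a, []).append(b)
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        nexts = children.get(path[-1], [])
        if not nexts:
            paths.append(path)  # reached a leaf: record the full path
        for nxt in nexts:
            stack.append(path + [nxt])
    return sorted(paths)

paths = paths_from("C037", EDGES)
for p in paths:
    print(" -> ".join(p))
# Threat Hunting is reached via Detection Coverage, never via Detection Rule.
print(any("C058" in p and "C065" in p for p in paths))  # False
```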

Patterns Observed Across All 5 Comparisons

Pattern	Baseline (No Graph)	GraphRAG (With Graph)
Structure	Flat lists, bullet points	Layered dependency trees with explicit ordering
Specificity	Generic topic names	Concept IDs (C001, C078, etc.) with precise definitions
Dependencies	Implied or missing	Explicit prerequisite edges with rationale
Hidden insights	None	Structural patterns (broken chains, parallel paths, coverage gaps)
Actionability	"Learn these topics"	"Learn C001 before C017 because..."
Personalization	One-size-fits-all	Can account for prior knowledge (skip known concepts)

The Core Insight

Without graph context, LLMs produce plausible but generic answers based on training data patterns. With graph context, LLMs produce specific, dependency-aware answers that reveal structural relationships invisible in flat text. The knowledge graph acts as a reasoning scaffold. It doesn't replace the LLM's language ability; it gives the LLM a map to reason over.

Try It Yourself

The GraphRAG pipeline used for these comparisons is available in the Nexus SecOps repository:

# Run a single query against the knowledge graph
python scripts/graph_rag_demo.py --question "Your question here"

# Regenerate all 5 pre-computed comparisons
python scripts/graph_rag_demo.py --batch

# View the knowledge graph data
cat docs/learning-graph/graph.json

The knowledge graph is stored in docs/learning-graph/graph.json with 448 nodes and 620 edges. Each node represents a cybersecurity concept with a unique ID (e.g., C078 = Detection Engineering), and each edge represents a prerequisite relationship.
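This page does not document the JSON schema of `graph.json`, so the loader below assumes a hypothetical layout. It expects a top-level `nodes` list whose items have `id` and `label` fields. It also expects an `edges` list whose items have `from` and `to` fields, with `from` as the prerequisite. Adjust the field names to the actual file before relying on it.

```python
import json
from collections import Counter

def load_graph(path):
    """Load graph.json under the ASSUMED schema:
    {"nodes": [{"id": ..., "label": ...}], "edges": [{"from": ..., "to": ...}]}"""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    names = {n["id"]: n["label"] for n in data["nodes"]}
    edges = [(e["from"], e["to"]) for e in data["edges"]]
    return names, edges

def summarize(names, edges):
    """Basic sanity checks: counts, edges pointing at unknown IDs, root concepts."""
    dangling = [e for e in edges if e[0] not in names or e[1] not in names]
    indegree = Counter(dep for _, dep in edges)
    roots = sorted(n for n in names if indegree[n] == 0)
    return {"nodes": len(names), "edges": len(edges),
            "dangling": len(dangling), "roots": roots}
```

Usage: call `summarize(*load_graph("docs/learning-graph/graph.json"))`. If the assumed schema matches, the node and edge counts should agree with the 448 and 620 stated above.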