Synthetic Log Datasets

Purpose: Practice data for Nexus SecOps labs and detection engineering exercises
Classification: Synthetic (contains no real personal data)
Schema versions: v1.0


Overview

All lab datasets are synthetically generated. No real IP addresses, usernames, or organizational data are used. External IPs come from the RFC 5737 documentation ranges (192.0.2.0/24, 198.51.100.0/24, 203.0.113.0/24); internal IPs come from RFC 1918 private ranges (10.x.x.x, 192.168.x.x).

Included datasets:

| Dataset | File | Events | Use In |
|---|---|---|---|
| Alert queue (20 mixed alerts) | Embedded in Lab 1 | 20 | Lab 1 |
| PowerShell execution events | Embedded in Lab 2 | 50 | Lab 2 |
| Ransomware attack timeline | Embedded in Lab 3 | ~40 | Lab 3 |
| Windows Security Event Log | windows-security-events.jsonl | 500 | Lab 2, Lab 4 |
| DNS query log | dns-queries.jsonl | 1000 | Lab 1, Lab 5 |
| Authentication log | auth-events.jsonl | 750 | Lab 1, Lab 2 |
| Process execution log | process-events.jsonl | 500 | Lab 2, Lab 4 |
| Network flow log | netflow.jsonl | 1000 | Lab 3 |

To generate all datasets, run the log generator:

python3 generate-synthetic-logs.py --output ./datasets/ --seed 42

Schema Reference

Windows Security Event Log (JSONL)

Each line is a JSON object:

{
  "timestamp": "2026-02-19T08:15:32.441Z",
  "event_id": 4625,
  "computer": "CORP-WS-042",
  "user": "jsmith",
  "domain": "MERIDIAN",
  "logon_type": 3,
  "failure_reason": "Unknown user name or bad password",
  "src_ip": "192.168.1.42",
  "workstation": "CORP-WS-042",
  "process_name": "-",
  "keywords": ["Audit Failure"]
}

Key Event IDs included:

| Event ID | Description | Typical Volume |
|---|---|---|
| 4624 | Successful logon | High |
| 4625 | Failed logon | Medium |
| 4634 | Account logoff | High |
| 4648 | Explicit credential logon | Low-Medium |
| 4672 | Special privileges assigned | Low |
| 4688 | Process creation | High |
| 4698 | Scheduled task created | Low |
| 4720 | Account created | Very Low |
| 4724 | Password reset | Low |
| 4728 | Member added to security group | Very Low |
| 4732 | Member added to local group | Low |
| 4768 | Kerberos TGT request | High |
| 4776 | NTLM authentication | Medium |
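
As a quick sanity check on the event mix, the JSONL records can be tallied by `event_id`. The following is a minimal sketch, not part of the generator: the helper name `count_event_ids` is hypothetical, and the three inline records are hand-written stand-ins that carry only a few of the schema fields.

```python
import json
from collections import Counter
from io import StringIO


def count_event_ids(lines):
    """Count records per event_id in an iterable of JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        counts[record["event_id"]] += 1
    return counts


# Three hand-written records stand in for the real file.
sample = StringIO(
    '{"timestamp":"2026-02-19T07:55:00.000Z","event_id":4625,"user":"jdoe"}\n'
    '{"timestamp":"2026-02-19T07:55:42.000Z","event_id":4625,"user":"jdoe"}\n'
    '{"timestamp":"2026-02-19T07:56:01.000Z","event_id":4624,"user":"jdoe"}\n'
)
print(count_event_ids(sample).most_common())
```

Because an open file is also an iterable of lines, the same function should work on the generated file directly, for example `with open("datasets/windows-security-events.jsonl") as f: count_event_ids(f)`, assuming the generator was run with `--output ./datasets/`.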

DNS Query Log (JSONL)

{
  "timestamp": "2026-02-19T08:01:14.882Z",
  "src_ip": "10.10.42.50",
  "src_host": "FINANCE-WS-042",
  "query": "c2.evilsite.ru",
  "query_type": "A",
  "response_code": "NOERROR",
  "response_ip": "198.51.100.42",
  "ttl": 60,
  "bytes": 128,
  "category": "malware-c2",
  "is_synthetic_malicious": true
}

Dataset composition:

  • 85% benign queries (Microsoft, Google, CDNs, internal services)
  • 10% ambiguous (cloud storage, file sharing, uncommon TLDs)
  • 5% synthetic malicious (C2 patterns, DGA-style domains, known bad TLDs)


Authentication Event Log (JSONL)

{
  "timestamp": "2026-02-19T07:51:00.001Z",
  "event_type": "login_success",
  "user": "jsmith",
  "source": "azure-ad",
  "src_ip": "203.0.113.15",
  "src_country": "GB",
  "src_city": "London",
  "device_id": "device-9a8b7c6d",
  "mfa_method": "authenticator_app",
  "mfa_result": "success",
  "risk_level": "high",
  "previous_login_ip": "192.0.2.12",
  "previous_login_country": "US",
  "previous_login_time": "2026-02-19T07:45:00.000Z",
  "travel_distance_km": 5570,
  "travel_time_minutes": 6,
  "is_synthetic_anomaly": true,
  "anomaly_type": "impossible_travel"
}
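
The `travel_distance_km` and `travel_time_minutes` fields make the impossible-travel reasoning easy to check by hand: 5570 km in 6 minutes implies a speed far beyond any aircraft. A minimal sketch of that arithmetic follows; the function names and the 1000 km/h cutoff are illustrative assumptions, not values used by the generator.

```python
def implied_speed_kmh(distance_km, minutes):
    """Speed needed to cover distance_km in the given number of minutes."""
    if minutes <= 0:
        return float("inf")  # simultaneous logins from two places
    return distance_km * 60 / minutes


def is_impossible_travel(event, max_kmh=1000.0):
    # max_kmh is a hypothetical cutoff, roughly above airliner cruising speed.
    speed = implied_speed_kmh(event["travel_distance_km"], event["travel_time_minutes"])
    return speed > max_kmh


event = {"travel_distance_km": 5570, "travel_time_minutes": 6}
print(implied_speed_kmh(5570, 6))   # 55700.0 km/h
print(is_impossible_travel(event))  # True
```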

Process Execution Log (JSONL)

{
  "timestamp": "2026-02-19T08:15:22.000Z",
  "host": "FINANCE-WS-042",
  "user": "mlopez",
  "process_id": 4284,
  "process_name": "powershell.exe",
  "process_path": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
  "command_line": "powershell.exe -ExecutionPolicy Bypass -WindowStyle Hidden -enc JABjAG8AbgBuAGUAYwB0...",
  "parent_process_id": 3912,
  "parent_process_name": "explorer.exe",
  "parent_process_path": "C:\\Windows\\explorer.exe",
  "hash_md5": "04029e121a0cfa5991749937dd22a1d9",
  "hash_sha256": "aec070645fe53ee3b3763059376134f058cc337247c978add178b6ccdfb0019f",
  "is_synthetic_malicious": true,
  "malicious_category": "encoded_powershell"
}

Network Flow Log (JSONL)

{
  "timestamp": "2026-02-19T08:05:00.000Z",
  "src_ip": "10.10.42.50",
  "src_port": 49152,
  "dst_ip": "198.51.100.42",
  "dst_port": 443,
  "protocol": "TCP",
  "bytes_sent": 2097152,
  "bytes_received": 4096,
  "packets_sent": 1450,
  "packets_received": 28,
  "duration_seconds": 45,
  "connection_state": "SF",
  "application": "ssl",
  "is_synthetic_anomaly": true,
  "anomaly_type": "large_outbound_transfer"
}
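
One simple way to reason about the `large_outbound_transfer` label is the ratio of bytes sent to bytes received. In the record above, 2097152 bytes out against 4096 bytes in gives a ratio of 512. The sketch below assumes hypothetical thresholds (1 MiB minimum, ratio of 100) chosen only for illustration; the generator's actual labelling logic is not documented here.

```python
def outbound_ratio(flow):
    """Bytes sent per byte received; inf when nothing came back."""
    received = flow["bytes_received"]
    if received == 0:
        return float("inf")
    return flow["bytes_sent"] / received


def is_large_outbound(flow, min_bytes=1_048_576, min_ratio=100.0):
    # Both thresholds are illustrative assumptions, not generator values.
    return flow["bytes_sent"] >= min_bytes and outbound_ratio(flow) >= min_ratio


flow = {"bytes_sent": 2097152, "bytes_received": 4096, "duration_seconds": 45}
print(outbound_ratio(flow))     # 512.0
print(is_large_outbound(flow))  # True
```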

Sample Data Snippets

Sample: 5 Windows Auth Events (Mixed TP/FP)

{"timestamp":"2026-02-19T07:45:00.000Z","event_id":4624,"computer":"DC01","user":"jsmith","domain":"MERIDIAN","logon_type":10,"src_ip":"192.0.2.12","src_country":"US","keywords":["Audit Success"]}
{"timestamp":"2026-02-19T07:51:00.001Z","event_id":4624,"computer":"DC01","user":"jsmith","domain":"MERIDIAN","logon_type":10,"src_ip":"203.0.113.15","src_country":"GB","keywords":["Audit Success"],"is_synthetic_anomaly":true,"anomaly_type":"impossible_travel"}
{"timestamp":"2026-02-19T07:55:00.000Z","event_id":4625,"computer":"CORP-WS-101","user":"jdoe","domain":"MERIDIAN","logon_type":2,"failure_reason":"Bad password","src_ip":"192.168.1.101","keywords":["Audit Failure"]}
{"timestamp":"2026-02-19T07:55:42.000Z","event_id":4625,"computer":"CORP-WS-101","user":"jdoe","domain":"MERIDIAN","logon_type":2,"failure_reason":"Bad password","src_ip":"192.168.1.101","keywords":["Audit Failure"]}
{"timestamp":"2026-02-19T07:56:01.000Z","event_id":4624,"computer":"CORP-WS-101","user":"jdoe","domain":"MERIDIAN","logon_type":2,"src_ip":"192.168.1.101","keywords":["Audit Success"]}
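
The `jdoe` sequence above (two failed logons followed by a success on the same workstation) is the kind of pattern that looks suspicious in isolation but is consistent with a mistyped password. A minimal sketch of counting failures that precede each success is shown below; `failure_streaks` is a hypothetical helper, and the events are trimmed copies of the sample records.

```python
def failure_streaks(events):
    """Return (user, computer, failures) for each success preceded by failures."""
    streak = {}
    results = []
    # ISO 8601 UTC strings in one fixed format sort chronologically as text.
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        key = (ev["user"], ev["computer"])
        if ev["event_id"] == 4625:      # failed logon
            streak[key] = streak.get(key, 0) + 1
        elif ev["event_id"] == 4624:    # successful logon
            failures = streak.pop(key, 0)
            if failures:
                results.append((ev["user"], ev["computer"], failures))
    return results


events = [
    {"timestamp": "2026-02-19T07:45:00.000Z", "event_id": 4624, "computer": "DC01", "user": "jsmith"},
    {"timestamp": "2026-02-19T07:51:00.001Z", "event_id": 4624, "computer": "DC01", "user": "jsmith"},
    {"timestamp": "2026-02-19T07:55:00.000Z", "event_id": 4625, "computer": "CORP-WS-101", "user": "jdoe"},
    {"timestamp": "2026-02-19T07:55:42.000Z", "event_id": 4625, "computer": "CORP-WS-101", "user": "jdoe"},
    {"timestamp": "2026-02-19T07:56:01.000Z", "event_id": 4624, "computer": "CORP-WS-101", "user": "jdoe"},
]
print(failure_streaks(events))  # [('jdoe', 'CORP-WS-101', 2)]
```

A detection built on this would normally alert only above some failure count; where to set that count is a tuning decision the labs leave to the analyst.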

Sample: 5 Process Events (Mixed)

{"timestamp":"2026-02-19T06:12:00.000Z","host":"MONSVR01","user":"svc-monitoring","process_name":"powershell.exe","command_line":"powershell.exe -EncodedCommand JAByAGUAcwBvAHUAcgBjAGUAcwA=","parent_process_name":"taskschd.exe","is_synthetic_malicious":false,"fp_category":"monitoring_scheduled_task"}
{"timestamp":"2026-02-19T08:15:22.000Z","host":"FINANCE-WS-042","user":"mlopez","process_name":"powershell.exe","command_line":"powershell.exe -ExecutionPolicy Bypass -WindowStyle Hidden -enc JABjAG8AbgBuAGUAYwB0","parent_process_name":"explorer.exe","is_synthetic_malicious":true,"malicious_category":"encoded_powershell"}
{"timestamp":"2026-02-19T08:22:00.000Z","host":"FINANCE-WS-042","user":"mlopez","process_name":"cryptor.exe","command_line":"cryptor.exe --target C:\\Users --key [redacted]","parent_process_name":"powershell.exe","is_synthetic_malicious":true,"malicious_category":"ransomware"}
{"timestamp":"2026-02-19T08:04:00.000Z","host":"FINANCE-WS-042","user":"mlopez","process_name":"procdump.exe","command_line":"procdump.exe -ma lsass.exe C:\\Windows\\Temp\\lsass.dmp","parent_process_name":"cmd.exe","is_synthetic_malicious":true,"malicious_category":"credential_dumping"}
{"timestamp":"2026-02-19T07:30:00.000Z","host":"SRV-PATCH01","user":"IT-Admin","process_name":"powershell.exe","command_line":"powershell.exe -ep bypass -File deploy.ps1","parent_process_name":"sccm.exe","is_synthetic_malicious":false,"fp_category":"sccm_deployment"}
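
PowerShell's `-EncodedCommand` argument (often abbreviated, as in `-enc` above) carries a Base64 string of UTF-16LE text, so the payloads in these samples can be decoded for triage. The sketch below is a minimal example: the regular expression only covers the two spellings that appear in the samples, and the helper name is hypothetical.

```python
import base64
import re

# Matches only the spellings seen in the samples: -EncodedCommand and -enc.
ENC_FLAG = re.compile(r"-(?:encodedcommand|enc)\s+([A-Za-z0-9+/=]+)", re.IGNORECASE)


def decode_encoded_command(command_line):
    """Return the decoded script text, or None if no decodable payload is found."""
    match = ENC_FLAG.search(command_line)
    if match is None:
        return None
    try:
        raw = base64.b64decode(match.group(1))
    except ValueError:
        return None  # payload is not valid Base64 (for example, badly truncated)
    raw = raw[: len(raw) - len(raw) % 2]  # drop a dangling byte from a cut-off payload
    return raw.decode("utf-16-le", errors="replace")


print(decode_encoded_command("powershell.exe -EncodedCommand JAByAGUAcwBvAHUAcgBjAGUAcwA="))
# $resources
print(decode_encoded_command("powershell.exe -ep bypass -File deploy.ps1"))
# None
```

The `-enc JABjAG8AbgBuAGUAYwB0` payload in the malicious sample is cut short, so it decodes only to the fragment `$connec`.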

Data Quality Notes

  • All timestamps are in ISO 8601 UTC format
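  • Workstation hostnames follow the pattern [DEPT]-[TYPE]-[NUMBER] (e.g., FINANCE-WS-042); servers and domain controllers in the samples use shorter names (e.g., DC01, MONSVR01, SRV-PATCH01)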
  • Usernames are fictional (jsmith, mlopez, agarcia, etc.) — no real individuals
  • IP addresses: internal addresses use RFC 1918 ranges; external addresses use RFC 5737 documentation ranges only
  • The is_synthetic_malicious and is_synthetic_anomaly fields are ground truth labels for training/practice — they do not appear in real log data
  • Hash values are randomly generated and do not correspond to real malware
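
Because the ground truth labels are present, a candidate detection can be scored directly against them. The sketch below scores one deliberately naive rule (flag any PowerShell process with an encoded-command flag) against the five sample process events, trimmed to the fields the rule needs. The rule and the helper names are illustrative assumptions, not part of the lab material.

```python
import re

ENCODED = re.compile(r"-(?:encodedcommand|enc)\b", re.IGNORECASE)


def rule_encoded_powershell(event):
    """Naive rule: PowerShell launched with an encoded-command flag."""
    is_ps = event["process_name"].lower() == "powershell.exe"
    return is_ps and bool(ENCODED.search(event["command_line"]))


def score(events, rule):
    """Compare rule verdicts with the is_synthetic_malicious labels."""
    tp = fp = fn = tn = 0
    for ev in events:
        flagged, truth = rule(ev), ev["is_synthetic_malicious"]
        if flagged and truth:
            tp += 1
        elif flagged:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


events = [
    {"process_name": "powershell.exe", "is_synthetic_malicious": False,
     "command_line": "powershell.exe -EncodedCommand JAByAGUAcwBvAHUAcgBjAGUAcwA="},
    {"process_name": "powershell.exe", "is_synthetic_malicious": True,
     "command_line": "powershell.exe -ExecutionPolicy Bypass -WindowStyle Hidden -enc JABjAG8AbgBuAGUAYwB0"},
    {"process_name": "cryptor.exe", "is_synthetic_malicious": True,
     "command_line": "cryptor.exe --target C:\\Users --key [redacted]"},
    {"process_name": "procdump.exe", "is_synthetic_malicious": True,
     "command_line": "procdump.exe -ma lsass.exe C:\\Windows\\Temp\\lsass.dmp"},
    {"process_name": "powershell.exe", "is_synthetic_malicious": False,
     "command_line": "powershell.exe -ep bypass -File deploy.ps1"},
]
print(score(events, rule_encoded_powershell))
```

On these five records the rule catches the encoded PowerShell launch, also fires on the monitoring scheduled task, and misses both the ransomware and credential dumping processes: precision 0.5, recall 1/3.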

Lab 10 Threat Hunting Dataset

The threat hunting lab uses a larger, more realistic dataset generated by a dedicated script. It embeds five real-world attack techniques (TTPs, identified by their MITRE ATT&CK IDs) at specific timestamps for structured discovery exercises.

Generating the Hunt Dataset

# Full scale (~380,000 events across 4 log types)
python generate-hunt-dataset.py --output ./hunt-data/

# Smaller scale for faster iteration
python generate-hunt-dataset.py --output ./hunt-data/ --scale 0.1

# Fixed seed for reproducibility
python generate-hunt-dataset.py --output ./hunt-data/ --seed 42

Generated Files

| File | Events (full scale) | Size | Contents |
|---|---|---|---|
| process_creation.jsonl | 50,004 | ~9 MB | Sysmon Event ID 1; 4 embedded malicious processes |
| network_connections.jsonl | 200,080 | ~32 MB | NetFlow; 80 C2 beacon events at 60s intervals |
| auth_events.jsonl | 30,023 | ~5 MB | Windows auth; 3 Kerberoasting RC4 TGS requests |
| dns_queries.jsonl | 100,000 | ~14 MB | DNS queries; no embedded tunneling (clean baseline) |

Embedded TTPs (Instructor Reference)

| TTP | Technique | Dataset | Time Window | Indicator |
|---|---|---|---|---|
| T1558.003 | Kerberoasting | auth_events | Day 2 ~14:14 | ticket_encryption_type: 0x17 (RC4) |
| T1003.001 | LSASS Dump via comsvcs | process_creation | Day 2 ~14:22 | comsvcs.dll MiniDump in command line |
| T1071.001 | C2 Beaconing | network_connections | Day 2 12:00+ | 60s intervals to 192.0.2.147:443; CV < 0.05 |
| T1047 | WMI Lateral Movement | process_creation | Day 2 ~15:05 | WmiPrvSE.exe parent → cmd.exe child |
| T1053.005 | Scheduled Task Persistence | process_creation | Day 2 ~15:12 | schtasks /create with WindowsUpdateHelper |
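
The "CV < 0.05" indicator for the beaconing TTP refers to the coefficient of variation (standard deviation divided by mean) of the gaps between successive connections to one destination. Regular beacons give a value near zero; human-driven traffic is usually much burstier. A minimal sketch follows, assuming connection times have already been converted to epoch seconds and grouped per source and destination; the function name and both timestamp lists are made up for illustration.

```python
from statistics import mean, pstdev


def interval_cv(timestamps):
    """Coefficient of variation of the gaps between sorted timestamps (seconds)."""
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None  # too few connections to say anything useful
    return pstdev(gaps) / mean(gaps)


# 80 connections exactly 60 s apart, mirroring the beacon described in the table.
beacon = [i * 60 for i in range(80)]
# Hand-written irregular timestamps standing in for interactive browsing.
browsing = [0, 5, 130, 140, 400]

print(interval_cv(beacon))            # 0.0
print(interval_cv(browsing) < 0.05)   # False
```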

Schema Reference (process_creation.jsonl)

{
  "event_id": 1,
  "timestamp": "2026-03-02T14:22:13.000000",
  "hostname": "WRK447",
  "username": "jsmith",
  "image": "C:\\Windows\\System32\\rundll32.exe",
  "command_line": "C:\\Windows\\System32\\rundll32.exe C:\\Windows\\System32\\comsvcs.dll MiniDump 624 C:\\Windows\\Temp\\lsass.dmp full",
  "parent_image": "C:\\Windows\\System32\\cmd.exe",
  "parent_command_line": "C:\\Windows\\System32\\cmd.exe",
  "integrity_level": "High",
  "process_id": 9124
}
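
Two of the embedded TTPs can be expressed as simple predicates over this schema. The sketch below is one possible starting point for the hunt, not the lab's reference solution: the function names are hypothetical, and the second record is hand-written to illustrate the WMI parent/child pattern.

```python
import re

# comsvcs.dll followed by MiniDump, separated by spaces or a comma.
COMSVCS_DUMP = re.compile(r"comsvcs\.dll\W+MiniDump", re.IGNORECASE)


def matches_lsass_comsvcs(event):
    """T1003.001 indicator: comsvcs.dll MiniDump in the command line."""
    return bool(COMSVCS_DUMP.search(event["command_line"]))


def matches_wmi_child(event):
    """T1047 indicator: cmd.exe spawned by WmiPrvSE.exe."""
    parent = event["parent_image"].lower()
    child = event["image"].lower()
    return parent.endswith("\\wmiprvse.exe") and child.endswith("\\cmd.exe")


lsass_event = {
    "image": "C:\\Windows\\System32\\rundll32.exe",
    "command_line": "C:\\Windows\\System32\\rundll32.exe C:\\Windows\\System32\\comsvcs.dll "
                    "MiniDump 624 C:\\Windows\\Temp\\lsass.dmp full",
    "parent_image": "C:\\Windows\\System32\\cmd.exe",
}
# Hand-written record for illustration only; not copied from the dataset.
wmi_event = {
    "image": "C:\\Windows\\System32\\cmd.exe",
    "command_line": "cmd.exe /c whoami",
    "parent_image": "C:\\Windows\\System32\\wbem\\WmiPrvSE.exe",
}

print(matches_lsass_comsvcs(lsass_event), matches_wmi_child(lsass_event))  # True False
print(matches_lsass_comsvcs(wmi_event), matches_wmi_child(wmi_event))      # False True
```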

Synthetic Data

The C2 IP 192.0.2.147 is from the RFC 5737 TEST-NET documentation range and is not routable on the internet. All hostnames, usernames, and IPs are fictional.


Generate early-lab datasets: generate-synthetic-logs.py
Generate Lab 10 hunt dataset: generate-hunt-dataset.py