Synthetic Log Datasets
- Purpose: Practice data for Nexus SecOps labs and detection engineering exercises
- Classification: Synthetic (contains no real personal data)
- Schema version: v1.0
Overview
All lab datasets are synthetically generated. No real IP addresses, usernames, or organizational data are used. External IPs come from the RFC 5737 documentation ranges (192.0.2.0/24, 198.51.100.0/24, 203.0.113.0/24), and internal IPs come from the RFC 1918 private ranges (10.0.0.0/8, 192.168.0.0/16).
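To confirm that a generated file honors this addressing policy, each address can be checked against the listed ranges. The sketch below uses Python's standard `ipaddress` module; `classify_ip` is a hypothetical helper, not part of the lab tooling. The ranges are listed explicitly because `ipaddress`'s `is_private` flag also returns True for the RFC 5737 documentation ranges, so it cannot tell "external" from "internal" here.

```python
import ipaddress

# RFC 5737 documentation ranges: the only ranges used for external IPs.
EXTERNAL_NETS = [ipaddress.ip_network(n) for n in
                 ("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24")]
# RFC 1918 ranges used for internal IPs in these datasets.
INTERNAL_NETS = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "192.168.0.0/16")]


def classify_ip(value: str) -> str:
    """Return 'external', 'internal', or 'out_of_policy' for an IP address string."""
    addr = ipaddress.ip_address(value)
    if any(addr in net for net in EXTERNAL_NETS):
        return "external"
    if any(addr in net for net in INTERNAL_NETS):
        return "internal"
    # Anything else (including IPv6) would violate the dataset's addressing rules.
    return "out_of_policy"


print(classify_ip("203.0.113.15"))  # external
print(classify_ip("10.10.42.50"))   # internal
```

Running `classify_ip` over every `src_ip`, `dst_ip`, and `response_ip` field and counting `out_of_policy` results gives a quick sanity check after regeneration.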
Included datasets:
| Dataset | File | Events | Used In |
|---|---|---|---|
| Alert queue (20 mixed alerts) | Embedded in Lab 1 | 20 | Lab 1 |
| PowerShell execution events | Embedded in Lab 2 | 50 | Lab 2 |
| Ransomware attack timeline | Embedded in Lab 3 | ~40 | Lab 3 |
| Windows Security Event Log | windows-security-events.jsonl | 500 | Lab 2, Lab 4 |
| DNS query log | dns-queries.jsonl | 1000 | Lab 1, Lab 5 |
| Authentication log | auth-events.jsonl | 750 | Lab 1, Lab 2 |
| Process execution log | process-events.jsonl | 500 | Lab 2, Lab 4 |
| Network flow log | netflow.jsonl | 1000 | Lab 3 |
To generate all datasets, run the log generator script, generate-synthetic-logs.py.
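After generation, a quick check that each JSONL file holds the number of events listed in the Included datasets table catches truncated or partial runs. This is a minimal sketch: `load_jsonl` and `check_counts` are hypothetical helpers, and `data_dir` stands for wherever the generator wrote its files, which is not specified here.

```python
import json
from pathlib import Path

# Expected event counts, taken from the "Included datasets" table.
EXPECTED_COUNTS = {
    "windows-security-events.jsonl": 500,
    "dns-queries.jsonl": 1000,
    "auth-events.jsonl": 750,
    "process-events.jsonl": 500,
    "netflow.jsonl": 1000,
}


def load_jsonl(path):
    """Parse a JSONL file: one JSON object per non-blank line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def check_counts(data_dir):
    """Return {filename: (expected, actual)} for every file whose count differs.

    A missing file is reported with an actual count of 0.
    """
    mismatches = {}
    for name, expected in EXPECTED_COUNTS.items():
        path = Path(data_dir) / name
        actual = len(load_jsonl(path)) if path.exists() else 0
        if actual != expected:
            mismatches[name] = (expected, actual)
    return mismatches
```

An empty dictionary from `check_counts` means every file matches the table.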
Schema Reference
Windows Security Event Log (JSONL)
Each line of the file is one JSON object (pretty-printed here for readability):
```json
{
"timestamp": "2026-02-19T08:15:32.441Z",
"event_id": 4625,
"computer": "CORP-WS-042",
"user": "jsmith",
"domain": "MERIDIAN",
"logon_type": 3,
"failure_reason": "Unknown user name or bad password",
"src_ip": "192.168.1.42",
"workstation": "CORP-WS-042",
"process_name": "-",
"keywords": ["Audit Failure"]
}
```
Key Event IDs included:
| Event ID | Description | Typical Volume |
|---|---|---|
| 4624 | Successful logon | High |
| 4625 | Failed logon | Medium |
| 4634 | Account logoff | High |
| 4648 | Explicit credential logon | Low-Medium |
| 4672 | Special privileges assigned | Low |
| 4688 | Process creation | High |
| 4698 | Scheduled task created | Low |
| 4720 | Account created | Very Low |
| 4724 | Password reset | Low |
| 4728 | Member added to security-enabled global group | Very Low |
| 4732 | Member added to local group | Low |
| 4768 | Kerberos TGT request | High |
| 4776 | NTLM authentication | Medium |
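The Typical Volume column gives a rough baseline. Tallying `event_id` values in a generated file and comparing them against it is a useful first step before writing detections. A minimal sketch; `event_id_summary` is a hypothetical helper, and the descriptions are based on the table above.

```python
from collections import Counter

# Descriptions based on the Key Event IDs table.
EVENT_NAMES = {
    4624: "Successful logon",
    4625: "Failed logon",
    4634: "Account logoff",
    4648: "Explicit credential logon",
    4672: "Special privileges assigned",
    4688: "Process creation",
    4698: "Scheduled task created",
    4720: "Account created",
    4724: "Password reset",
    4728: "Member added to security-enabled global group",
    4732: "Member added to local group",
    4768: "Kerberos TGT request",
    4776: "NTLM authentication",
}


def event_id_summary(events):
    """Return (event_id, description, count) tuples, most frequent first."""
    counts = Counter(e.get("event_id") for e in events)
    return [(eid, EVENT_NAMES.get(eid, "unlisted"), n) for eid, n in counts.most_common()]
```

IDs labeled High should dominate the output. A Very Low ID such as 4720 appearing near the top would be worth investigating.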
DNS Query Log (JSONL)
```json
{
"timestamp": "2026-02-19T08:01:14.882Z",
"src_ip": "10.10.42.50",
"src_host": "FINANCE-WS-042",
"query": "c2.evilsite.ru",
"query_type": "A",
"response_code": "NOERROR",
"response_ip": "198.51.100.42",
"ttl": 60,
"bytes": 128,
"category": "malware-c2",
"is_synthetic_malicious": true
}
```
Dataset composition:
- 85% benign queries (Microsoft, Google, CDNs, internal services)
- 10% ambiguous (cloud storage, file sharing, uncommon TLDs)
- 5% synthetic malicious (C2 patterns, DGA-style domains, known bad TLDs)
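One common heuristic for spotting DGA-style domains scores the Shannon entropy of the domain label, because algorithmically generated names look more random than human-chosen ones. The sketch below is an illustration of that heuristic, not necessarily how the generator builds its malicious queries. `shannon_entropy` and `second_level_label` are hypothetical helpers, and taking the label just left of the TLD is a simplification that mishandles multi-part suffixes such as `co.uk`.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Entropy in bits per character of the character distribution of text."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in Counter(text).values())


def second_level_label(query: str) -> str:
    """Naively pick the label left of the TLD, e.g. 'evilsite' in 'c2.evilsite.ru'."""
    labels = query.rstrip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


label = second_level_label("c2.evilsite.ru")
print(label, shannon_entropy(label))  # evilsite 2.5
```

Any cutoff for "high" entropy would need tuning against the benign 85% of the file. Short or dictionary-word domains, such as this sample's `evilsite`, score low even when they are labeled malicious.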
Authentication Event Log (JSONL)
```json
{
"timestamp": "2026-02-19T07:51:00.001Z",
"event_type": "login_success",
"user": "jsmith",
"source": "azure-ad",
"src_ip": "203.0.113.15",
"src_country": "GB",
"src_city": "London",
"device_id": "device-9a8b7c6d",
"mfa_method": "authenticator_app",
"mfa_result": "success",
"risk_level": "high",
"previous_login_ip": "192.0.2.12",
"previous_login_country": "US",
"previous_login_time": "2026-02-19T07:45:00.000Z",
"travel_distance_km": 5570,
"travel_time_minutes": 6,
"is_synthetic_anomaly": true,
"anomaly_type": "impossible_travel"
}
```
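The impossible-travel label follows from the record's own fields: 5,570 km covered in 6 minutes implies a speed of 55,700 km/h. A minimal sketch of that check; the helper names and the 1,000 km/h cutoff (roughly commercial-jet speed) are illustrative assumptions, not values taken from the generator.

```python
MAX_PLAUSIBLE_KMH = 1000  # hypothetical cutoff, roughly airliner cruising speed


def implied_speed_kmh(event: dict) -> float:
    """Speed needed to travel between the previous and current login locations."""
    minutes = event["travel_time_minutes"]
    if minutes <= 0:
        return float("inf")  # simultaneous logins from two places
    return event["travel_distance_km"] * 60 / minutes


def is_impossible_travel(event: dict, max_kmh: float = MAX_PLAUSIBLE_KMH) -> bool:
    return implied_speed_kmh(event) > max_kmh


sample = {"travel_distance_km": 5570, "travel_time_minutes": 6}
print(implied_speed_kmh(sample))     # 55700.0
print(is_impossible_travel(sample))  # True
```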
Process Execution Log (JSONL)
```json
{
"timestamp": "2026-02-19T08:15:22.000Z",
"host": "FINANCE-WS-042",
"user": "mlopez",
"process_id": 4284,
"process_name": "powershell.exe",
"process_path": "C:\\Windows\\System32\\WindowsPowerShell\\v1.0\\powershell.exe",
"command_line": "powershell.exe -ExecutionPolicy Bypass -WindowStyle Hidden -enc JABjAG8AbgBuAGUAYwB0...",
"parent_process_id": 3912,
"parent_process_name": "explorer.exe",
"parent_process_path": "C:\\Windows\\explorer.exe",
"hash_md5": "04029e121a0cfa5991749937dd22a1d9",
"hash_sha256": "aec070645fe53ee3b3763059376134f058cc337247c978add178b6ccdfb0019f",
"is_synthetic_malicious": true,
"malicious_category": "encoded_powershell"
}
```
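PowerShell's `-EncodedCommand` (`-enc`) argument is Base64 of the script encoded as UTF-16LE, so decoding it is usually the first triage step. A minimal sketch; `decode_encoded_command` is a hypothetical helper. The pattern covers only `-enc` and `-EncodedCommand`, and PowerShell accepts other abbreviations that this pattern does not match. Truncated samples, such as the one in this record, decode only partially, with a replacement character where the final character was cut off.

```python
import base64
import re

# -enc or -EncodedCommand (case-insensitive) followed by a Base64 blob.
ENC_ARG = re.compile(r"-(?:enc|encodedcommand)\s+([A-Za-z0-9+/=]+)", re.IGNORECASE)


def decode_encoded_command(command_line: str):
    """Return the decoded script text, or None if no encoded argument is present."""
    match = ENC_ARG.search(command_line)
    if not match:
        return None
    blob = match.group(1)
    blob += "=" * (-len(blob) % 4)  # restore padding stripped from truncated samples
    raw = base64.b64decode(blob)
    # PowerShell encodes the script as UTF-16LE before Base64.
    return raw.decode("utf-16-le", errors="replace")


print(decode_encoded_command("powershell.exe -EncodedCommand JAByAGUAcwBvAHUAcgBjAGUAcwA="))
# $resources
```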
Network Flow Log (JSONL)
```json
{
"timestamp": "2026-02-19T08:05:00.000Z",
"src_ip": "10.10.42.50",
"src_port": 49152,
"dst_ip": "198.51.100.42",
"dst_port": 443,
"protocol": "TCP",
"bytes_sent": 2097152,
"bytes_received": 4096,
"packets_sent": 1450,
"packets_received": 28,
"duration_seconds": 45,
"connection_state": "SF",
"application": "ssl",
"is_synthetic_anomaly": true,
"anomaly_type": "large_outbound_transfer"
}
```
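In this flow the internal host sent 2,097,152 bytes (2 MiB) and received only 4,096 bytes, a 512:1 ratio. A simple large-upload check combines absolute volume with the outbound share of traffic. Both thresholds in this sketch are hypothetical starting points, not values used by the generator, and the helper names are illustrative.

```python
def outbound_ratio(flow: dict) -> float:
    """Fraction of the flow's bytes sent by the source host."""
    total = flow["bytes_sent"] + flow["bytes_received"]
    return flow["bytes_sent"] / total if total else 0.0


def looks_like_large_upload(flow: dict, min_bytes_sent: int = 1_000_000,
                            min_ratio: float = 0.9) -> bool:
    # Hypothetical thresholds: at least ~1 MB out, and at least 90% of bytes outbound.
    return flow["bytes_sent"] >= min_bytes_sent and outbound_ratio(flow) >= min_ratio


sample = {"bytes_sent": 2097152, "bytes_received": 4096}
print(round(outbound_ratio(sample), 3))  # 0.998
print(looks_like_large_upload(sample))   # True
```

Requiring both conditions keeps ordinary downloads, which are large but inbound, from firing.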
Sample Data Snippets
Sample: 5 Windows Auth Events (Mixed TP/FP)
```json
{"timestamp":"2026-02-19T07:45:00.000Z","event_id":4624,"computer":"DC01","user":"jsmith","domain":"MERIDIAN","logon_type":10,"src_ip":"192.0.2.12","src_country":"US","keywords":["Audit Success"]}
{"timestamp":"2026-02-19T07:51:00.001Z","event_id":4624,"computer":"DC01","user":"jsmith","domain":"MERIDIAN","logon_type":10,"src_ip":"203.0.113.15","src_country":"GB","keywords":["Audit Success"],"is_synthetic_anomaly":true,"anomaly_type":"impossible_travel"}
{"timestamp":"2026-02-19T07:55:00.000Z","event_id":4625,"computer":"CORP-WS-101","user":"jdoe","domain":"MERIDIAN","logon_type":2,"failure_reason":"Bad password","src_ip":"192.168.1.101","keywords":["Audit Failure"]}
{"timestamp":"2026-02-19T07:55:42.000Z","event_id":4625,"computer":"CORP-WS-101","user":"jdoe","domain":"MERIDIAN","logon_type":2,"failure_reason":"Bad password","src_ip":"192.168.1.101","keywords":["Audit Failure"]}
{"timestamp":"2026-02-19T07:56:01.000Z","event_id":4624,"computer":"CORP-WS-101","user":"jdoe","domain":"MERIDIAN","logon_type":2,"src_ip":"192.168.1.101","keywords":["Audit Success"]}
```
Sample: 5 Process Events (Mixed)
```json
{"timestamp":"2026-02-19T06:12:00.000Z","host":"MONSVR01","user":"svc-monitoring","process_name":"powershell.exe","command_line":"powershell.exe -EncodedCommand JAByAGUAcwBvAHUAcgBjAGUAcwA=","parent_process_name":"taskschd.exe","is_synthetic_malicious":false,"fp_category":"monitoring_scheduled_task"}
{"timestamp":"2026-02-19T08:15:22.000Z","host":"FINANCE-WS-042","user":"mlopez","process_name":"powershell.exe","command_line":"powershell.exe -ExecutionPolicy Bypass -WindowStyle Hidden -enc JABjAG8AbgBuAGUAYwB0","parent_process_name":"explorer.exe","is_synthetic_malicious":true,"malicious_category":"encoded_powershell"}
{"timestamp":"2026-02-19T08:22:00.000Z","host":"FINANCE-WS-042","user":"mlopez","process_name":"cryptor.exe","command_line":"cryptor.exe --target C:\\Users --key [redacted]","parent_process_name":"powershell.exe","is_synthetic_malicious":true,"malicious_category":"ransomware"}
{"timestamp":"2026-02-19T08:04:00.000Z","host":"FINANCE-WS-042","user":"mlopez","process_name":"procdump.exe","command_line":"procdump.exe -ma lsass.exe C:\\Windows\\Temp\\lsass.dmp","parent_process_name":"cmd.exe","is_synthetic_malicious":true,"malicious_category":"credential_dumping"}
{"timestamp":"2026-02-19T07:30:00.000Z","host":"SRV-PATCH01","user":"IT-Admin","process_name":"powershell.exe","command_line":"powershell.exe -ep bypass -File deploy.ps1","parent_process_name":"sccm.exe","is_synthetic_malicious":false,"fp_category":"sccm_deployment"}
```
Data Quality Notes
- All timestamps are in ISO 8601 format, in UTC.
- Workstation hostnames follow the pattern `[DEPT]-[TYPE]-[NUMBER]` (e.g., `FINANCE-WS-042`); servers use shorter names such as `DC01` and `SRV-PATCH01`.
- Usernames are fictional (jsmith, mlopez, agarcia, etc.) and do not belong to real individuals.
- IP addresses: internal addresses use RFC 1918 ranges; external addresses use RFC 5737 documentation ranges only.
- The `is_synthetic_malicious` and `is_synthetic_anomaly` fields are ground-truth labels for training and practice; they do not appear in real log data.
- Hash values are randomly generated and do not correspond to real malware.
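Since the ground-truth fields would hand any detection the answer, they should be set aside before a rule runs and used only for scoring. The sketch below does that and also checks the timestamp format. `split_labels` and `is_utc_iso8601` are hypothetical helpers. Treating `malicious_category`, `anomaly_type`, and `fp_category` as labels is an assumption: only the two `is_synthetic_*` flags are documented as ground truth, but the other three accompany them in the samples.

```python
from datetime import datetime

# Assumption: answer-key fields. Only the is_synthetic_* flags are documented as labels.
LABEL_FIELDS = {"is_synthetic_malicious", "is_synthetic_anomaly",
                "malicious_category", "anomaly_type", "fp_category"}


def split_labels(event: dict):
    """Return (observable_fields, label_fields) so a detection never sees the answer key."""
    observed = {k: v for k, v in event.items() if k not in LABEL_FIELDS}
    labels = {k: v for k, v in event.items() if k in LABEL_FIELDS}
    return observed, labels


def is_utc_iso8601(ts: str) -> bool:
    """True for timestamps shaped like 2026-02-19T08:15:32.441Z."""
    try:
        datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")
    except ValueError:
        return False
    return True
```

The Lab 10 sample timestamp (`2026-03-02T14:22:13.000000`) has no `Z` suffix, so this format check applies to the early-lab files only.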
Lab 10 Threat Hunting Dataset
The threat hunting lab uses a larger, more realistic dataset produced by a dedicated script, generate-hunt-dataset.py. The script embeds activity for five MITRE ATT&CK techniques at specific timestamps for structured discovery exercises.
Generating the Hunt Dataset
```bash
# Full scale (~380,000 events across 4 log types)
python generate-hunt-dataset.py --output ./hunt-data/
# Smaller scale for faster iteration
python generate-hunt-dataset.py --output ./hunt-data/ --scale 0.1
# Fixed seed for reproducibility
python generate-hunt-dataset.py --output ./hunt-data/ --seed 42
```
Generated Files
| File | Events (full scale) | Size | Contents |
|---|---|---|---|
| process_creation.jsonl | 50,004 | ~9 MB | Sysmon Event ID 1; 4 embedded malicious processes |
| network_connections.jsonl | 200,080 | ~32 MB | NetFlow; 80 C2 beacon events at 60s intervals |
| auth_events.jsonl | 30,023 | ~5 MB | Windows auth; 3 Kerberoasting RC4 TGS requests |
| dns_queries.jsonl | 100,000 | ~14 MB | DNS queries; no embedded tunneling (clean baseline) |
Embedded TTPs (Instructor Reference)
| TTP | Technique | Dataset | Time Window | Indicator |
|---|---|---|---|---|
| T1558.003 | Kerberoasting | auth_events | Day 2 ~14:14 | ticket_encryption_type: 0x17 (RC4) |
| T1003.001 | LSASS Dump via comsvcs | process_creation | Day 2 ~14:22 | comsvcs.dll MiniDump in command line |
| T1071.001 | C2 Beaconing | network_connections | Day 2 12:00+ | 60s intervals to 192.0.2.147:443; CV < 0.05 |
| T1047 | WMI Lateral Movement | process_creation | Day 2 ~15:05 | WmiPrvSE.exe parent → cmd.exe child |
| T1053.005 | Scheduled Task Persistence | process_creation | Day 2 ~15:12 | schtasks /create with WindowsUpdateHelper |
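The CV threshold in the C2 row is read here as the coefficient of variation (standard deviation divided by mean) of the intervals between connections. Values near zero indicate machine-regular timing of the kind produced by a 60-second beacon. A minimal sketch under that reading: the helper names and the `min_events` guard are illustrative, and grouping connections by source and destination pair is assumed because the `network_connections.jsonl` schema is not shown here.

```python
import statistics


def interval_cv(times_s):
    """Coefficient of variation (population stdev / mean) of the gaps between
    connection times given in seconds. Returns None if it cannot be computed."""
    ts = sorted(times_s)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2:
        return None
    mean_gap = statistics.mean(gaps)
    if mean_gap == 0:
        return None
    return statistics.pstdev(gaps) / mean_gap


def is_beacon_like(times_s, max_cv=0.05, min_events=10):
    """Flag one source/destination series whose timing is nearly constant."""
    cv = interval_cv(times_s)
    return len(times_s) >= min_events and cv is not None and cv < max_cv


beacon = [i * 60 for i in range(80)]  # 80 connections exactly 60 s apart
print(interval_cv(beacon), is_beacon_like(beacon))  # 0.0 True
```

Real beacons often add jitter. A series with gaps of 59 to 61 seconds still scores well under 0.05, while human-driven browsing produces CVs near or above 1.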
Schema Reference (process_creation.jsonl)
```json
{
"event_id": 1,
"timestamp": "2026-03-02T14:22:13.000000",
"hostname": "WRK447",
"username": "jsmith",
"image": "C:\\Windows\\System32\\rundll32.exe",
"command_line": "C:\\Windows\\System32\\rundll32.exe C:\\Windows\\System32\\comsvcs.dll MiniDump 624 C:\\Windows\\Temp\\lsass.dmp full",
"parent_image": "C:\\Windows\\System32\\cmd.exe",
"parent_command_line": "C:\\Windows\\System32\\cmd.exe",
"integrity_level": "High",
"process_id": 9124
}
```
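This record is the T1003.001 event from the TTP table. `rundll32.exe` calls the `MiniDump` export of `comsvcs.dll` with a process ID (624 here, presumably the LSASS PID) and an output path. The sketch below matches that pattern on the command line. It is a starting point rather than a complete detection, because a renamed `rundll32` or other obfuscation would evade it. `is_comsvcs_minidump` is a hypothetical helper.

```python
import re

# rundll32 invoking comsvcs.dll's MiniDump export, with or without full paths.
COMSVCS_MINIDUMP = re.compile(
    r"rundll32(?:\.exe)?\b.*comsvcs(?:\.dll)?\b.*\bminidump\b", re.IGNORECASE
)


def is_comsvcs_minidump(event: dict) -> bool:
    return bool(COMSVCS_MINIDUMP.search(event.get("command_line", "")))


cmd = (r"C:\Windows\System32\rundll32.exe C:\Windows\System32\comsvcs.dll "
       r"MiniDump 624 C:\Windows\Temp\lsass.dmp full")
print(is_comsvcs_minidump({"command_line": cmd}))  # True
```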
Synthetic Data
The C2 IP 192.0.2.147 is from the RFC 5737 TEST-NET-1 documentation range (192.0.2.0/24) and is not routable on the internet. All hostnames, usernames, and IPs are fictional.
- Generate early-lab datasets: generate-synthetic-logs.py
- Generate Lab 10 hunt dataset: generate-hunt-dataset.py