Chapter 2: Telemetry & Log Sources
Learning Objectives
- Identify major telemetry sources (endpoint, network, cloud, identity)
- Explain the importance of log normalization and enrichment
- Compare common log formats and schema standards (Syslog, CEF, ECS, CIM)
- Design a telemetry collection strategy covering key attack surfaces
Prerequisites
- Chapter 1: Understanding of SOC functions
- Basic knowledge of operating systems (Windows, Linux)
- Familiarity with network protocols (TCP/IP, DNS, HTTP)
Key Concepts
Telemetry • Log Source • Log Normalization • EDR • SIEM • Syslog • Windows Event Logs
Curiosity Hook: The Invisible Breach
Attackers compromised a marketing workstation, moved laterally to a file server, exfiltrated 2 GB of customer data, and left zero traces... or did they?
Reality: The data existed, but it was scattered across 8 different log sources that weren't being collected or correlated, including:
- Endpoint: No command-line auditing enabled
- Network: Firewall logs retained only 7 days (breach occurred 14 days prior)
- Cloud: S3 bucket access logs not forwarded to SIEM
- Identity: VPN logs not integrated
Lesson: You can't detect what you can't see. Comprehensive telemetry is the foundation of effective detection.
2.1 Telemetry Categories
Endpoint Telemetry
Sources:
- Process Execution: What programs run, with what arguments
- File Activity: File creation, modification, deletion
- Registry Changes: Windows registry modifications
- Network Connections: Outbound connections from endpoints
- Authentication: Local and domain logins

Key Tools:
- Windows Event Logs (built-in)
- Sysmon (enhanced Windows logging)
- EDR platforms (CrowdStrike, SentinelOne, Microsoft Defender)
- Linux auditd

Critical Logs:
- Windows Event ID 4688: Process creation (with command-line if enabled)
- Windows Event ID 4624/4625: Successful/failed logins
- Sysmon Event ID 1: Process creation (detailed)
- Sysmon Event ID 3: Network connection
- Sysmon Event ID 7: Image (DLL) loaded
Network Telemetry
Sources:
- Firewall Logs: Permitted/denied connections
- Proxy Logs: Web traffic, URLs visited
- DNS Logs: Domain name resolution queries
- IDS/IPS Alerts: Signature-based network detections
- NetFlow/IPFIX: Network flow metadata (source, dest, ports, bytes)

Use Cases:
- Detect C2 communication (beaconing patterns)
- Identify data exfiltration (unusual outbound volumes)
- Spot lateral movement (internal port scans, SMB/RDP)
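One way to reason about "beaconing patterns" is to look at how regular the gaps between connections are. The sketch below is a minimal illustration, not a production detector: the function name `beacon_score` and the idea of using the coefficient of variation of inter-arrival times are assumptions introduced here, and real C2 traffic often adds jitter that this simple measure would need to tolerate.

```python
from statistics import mean, pstdev


def beacon_score(timestamps):
    """Coefficient of variation of the gaps between connections.

    Values near 0 mean very regular intervals, which can suggest beaconing.
    Returns None when there are too few events to judge.
    """
    if len(timestamps) < 3:
        return None
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap == 0:
        return 0.0
    return pstdev(gaps) / average_gap


# Hypothetical connection times in seconds from one host to one destination.
regular = [0, 60, 120, 180, 240]      # one connection every 60 seconds
irregular = [0, 5, 200, 210, 900]     # human-like, bursty traffic

print(beacon_score(regular))
print(beacon_score(irregular))
```

A low score alone is not proof of C2 (software update checks are also periodic), so a score like this is better used to rank destinations for review than to alert directly.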
Example: DNS Tunneling Detection
Query: 4f2a3b1c.malicious-domain.com
Query: 8e9d6c5a.malicious-domain.com
Pattern: High frequency, high entropy subdomains → possible data exfiltration via DNS
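The "high entropy" part of this pattern can be measured with Shannon entropy over the characters of the leftmost DNS label. The following is a rough sketch under stated assumptions: the helper names and the thresholds (`min_entropy=2.5`, `min_count=2`) are hypothetical and would need tuning against real DNS traffic, where CDN and tracking hostnames can also look random.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Return the Shannon entropy of a string in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def suspicious_subdomains(queries, parent_domain, min_entropy=2.5, min_count=2):
    """Return high-entropy leftmost labels seen under parent_domain.

    An empty list is returned unless at least min_count labels are flagged,
    which approximates the "high frequency" half of the pattern.
    """
    suffix = "." + parent_domain
    labels = [q[: -len(suffix)].split(".")[0] for q in queries if q.endswith(suffix)]
    flagged = [label for label in labels if shannon_entropy(label) >= min_entropy]
    return flagged if len(flagged) >= min_count else []


queries = [
    "4f2a3b1c.malicious-domain.com",
    "8e9d6c5a.malicious-domain.com",
    "www.example.com",
]
print(suspicious_subdomains(queries, "malicious-domain.com"))
# ['4f2a3b1c', '8e9d6c5a']
```

Both example labels consist of eight distinct characters, so each scores exactly 3.0 bits per character, while a label such as `www` scores 0.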
Cloud Telemetry
Sources:
- AWS CloudTrail: API calls, resource changes
- Azure Activity Logs: Subscription-level operations
- GCP Cloud Audit Logs: Admin activity, data access
- SaaS Application Logs: Microsoft 365 (formerly Office 365), Salesforce, etc.

Key Events:
- IAM changes (new users, privilege escalation)
- Resource creation (EC2 instances, storage buckets)
- Data access (S3 GetObject calls)
- Configuration changes (security group modifications)
Identity & Access Logs
Sources:
- Active Directory Logs: Domain authentication, group changes
- Identity Provider (IdP) Logs: SSO events (Okta, Microsoft Entra ID, formerly Azure AD)
- VPN Logs: Remote access attempts
- MFA Logs: Multi-factor authentication success/failure

Detection Opportunities:
- Impossible travel (logins from distant locations within a time too short to travel between them)
- Unusual access patterns (employee accessing systems outside normal hours/roles)
- Privilege escalation (account added to admin groups)
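Impossible travel can be approximated by dividing the great-circle distance between two login locations by the time between them. This is a minimal sketch: the event shape (`time`, `lat`, `lon` keys) and the 900 km/h ceiling (roughly airliner speed) are assumptions made for illustration, and IP geolocation in practice is imprecise and distorted by VPNs.

```python
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def is_impossible_travel(login_a, login_b, max_speed_kmh=900.0):
    """Flag two logins whose implied travel speed exceeds max_speed_kmh."""
    distance = haversine_km(login_a["lat"], login_a["lon"], login_b["lat"], login_b["lon"])
    hours = abs((login_b["time"] - login_a["time"]).total_seconds()) / 3600
    if hours == 0:
        return distance > 0
    return distance / hours > max_speed_kmh


# Hypothetical logins for the same account, one hour apart.
new_york = {"time": datetime(2026, 2, 15, 10, 0), "lat": 40.71, "lon": -74.01}
london = {"time": datetime(2026, 2, 15, 11, 0), "lat": 51.51, "lon": -0.13}
print(is_impossible_travel(new_york, london))  # True
```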
2.2 Log Normalization
Why Normalize?
Different sources produce different formats:
- Syslog: <134>Jan 15 10:42:33 host sshd[1234]: Failed password for user
- Windows Event Log: XML structure with EventID, TimeCreated, Computer
- JSON: {"timestamp": "2026-02-15T10:42:33Z", "event": "login_failed"}
Challenge: Correlating across these formats is difficult.
Solution: Normalize to a common schema.
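As a first step toward a common schema, the syslog line shown above can be split into named fields. The sketch below assumes the classic BSD syslog layout (priority, timestamp, host, tag, message); the regular expression and the output field names are illustrative choices, not a complete parser. The priority value encodes both facility and severity as `facility * 8 + severity`, so `<134>` decodes to facility 16 and severity 6.

```python
import re

# Pattern for lines shaped like "<PRI>Mmm dd hh:mm:ss host app[pid]: message".
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<app>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<message>.*)$"
)


def parse_syslog(line):
    """Return a dict of fields for a BSD-style syslog line, or None if it does not match."""
    match = SYSLOG_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    priority = int(fields.pop("pri"))
    # PRI = facility * 8 + severity
    fields["facility"], fields["severity"] = divmod(priority, 8)
    return fields


event = parse_syslog("<134>Jan 15 10:42:33 host sshd[1234]: Failed password for user")
print(event)
```

Note that this timestamp format carries no year or time zone, which is one reason normalization pipelines typically attach an ingestion timestamp as well.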
Common Schema Standards
| Standard | Platform | Key Fields |
|---|---|---|
| Syslog | Universal | Facility, Severity, Timestamp, Hostname, Message |
| CEF (Common Event Format) | ArcSight | Version, Device Vendor, Signature, Severity, Extension |
| ECS (Elastic Common Schema) | Elastic Stack | @timestamp, event.category, source.ip, user.name |
| CIM (Common Information Model) | Splunk | action, src, dest, user, signature |
Normalization Example
Raw Windows Event (simplified for readability; real events wrap these values in Data elements with Name attributes):
<Event>
<System>
<EventID>4688</EventID>
<Computer>WORKSTATION-01</Computer>
<TimeCreated SystemTime="2026-02-15T14:23:45Z"/>
</System>
<EventData>
<NewProcessName>C:\Windows\System32\cmd.exe</NewProcessName>
<SubjectUserName>jsmith</SubjectUserName>
</EventData>
</Event>
Normalized to ECS:
{
"@timestamp": "2026-02-15T14:23:45Z",
"event.category": "process",
"event.action": "process-started",
"host.name": "WORKSTATION-01",
"process.executable": "C:\\Windows\\System32\\cmd.exe",
"user.name": "jsmith",
"event.provider": "Microsoft-Windows-Security-Auditing",
"event.code": "4688"
}
Benefit: Now you can query across all log sources using consistent field names like user.name and event.action.
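The mapping in this example can be sketched in a few lines with the standard-library XML parser. This is an illustration against the simplified event shown above, not a general Windows event parser: the function name `normalize_4688` is hypothetical, the `event.category`, `event.action`, and `event.provider` values are hard-coded from the example output, and real events use an XML namespace and `Data` elements, so the element paths would differ.

```python
import json
import xml.etree.ElementTree as ET

# The simplified raw event from the example above.
RAW_EVENT = r"""<Event>
  <System>
    <EventID>4688</EventID>
    <Computer>WORKSTATION-01</Computer>
    <TimeCreated SystemTime="2026-02-15T14:23:45Z"/>
  </System>
  <EventData>
    <NewProcessName>C:\Windows\System32\cmd.exe</NewProcessName>
    <SubjectUserName>jsmith</SubjectUserName>
  </EventData>
</Event>"""


def normalize_4688(xml_text):
    """Map the simplified 4688 event to the flat ECS-style fields used in this chapter."""
    root = ET.fromstring(xml_text)
    return {
        "@timestamp": root.find("System/TimeCreated").get("SystemTime"),
        "event.category": "process",          # assumed constant for this event type
        "event.action": "process-started",    # assumed constant for this event type
        "host.name": root.findtext("System/Computer"),
        "process.executable": root.findtext("EventData/NewProcessName"),
        "user.name": root.findtext("EventData/SubjectUserName"),
        "event.provider": "Microsoft-Windows-Security-Auditing",  # not present in the simplified XML
        "event.code": root.findtext("System/EventID"),
    }


print(json.dumps(normalize_4688(RAW_EVENT), indent=2))
```

Keeping one small mapping function per event type makes it easier to test each source independently before events reach the SIEM.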
2.3 Log Enrichment
Definition: Adding contextual information to raw events to aid decision-making.
Enrichment Sources:
- Asset Inventory: Hostname → Business criticality, owner, location
- User Context: Username → Department, manager, typical behavior
- Threat Intelligence: IP address → Known malicious, geolocation, reputation
- Historical Behavior: User X typically accesses 5 systems; now accessing 50
Example:
Raw Alert: Failed login from 203.0.113.45 to admin account
Enriched:
- IP 203.0.113.45: TOR exit node (threat intel)
- Account "admin": Service account, should only authenticate from 10.0.1.5
- Historical: Zero failed logins for this account in past 90 days
- Asset: Target system is domain controller (critical asset)
Conclusion: High-priority alert
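The enrichment steps in this example amount to a series of lookups followed by a priority decision. The sketch below uses in-memory dictionaries as stand-ins for a threat intelligence feed, an account inventory, an asset inventory, and login history. All names here are hypothetical, including the hostname `DC-01` and the rule that three or more risk signals means high priority.

```python
# Hypothetical lookup tables standing in for real enrichment sources.
THREAT_INTEL = {"203.0.113.45": "TOR exit node"}
ACCOUNTS = {"admin": {"type": "service account", "allowed_sources": {"10.0.1.5"}}}
ASSETS = {"DC-01": {"role": "domain controller", "criticality": "critical"}}
FAILED_LOGINS_90D = {"admin": 0}


def enrich(alert):
    """Return a copy of the alert with context fields and a rough priority."""
    account = ACCOUNTS.get(alert["user"], {})
    asset = ASSETS.get(alert["host"], {})
    allowed = account.get("allowed_sources")

    enriched = dict(alert)
    enriched["ip_reputation"] = THREAT_INTEL.get(alert["src_ip"], "unknown")
    enriched["account_type"] = account.get("type", "unknown")
    enriched["unexpected_source"] = allowed is not None and alert["src_ip"] not in allowed
    enriched["failed_logins_90d"] = FAILED_LOGINS_90D.get(alert["user"])
    enriched["asset_criticality"] = asset.get("criticality", "unknown")

    # Count independent risk signals; the cut-offs below are illustrative only.
    signals = sum([
        enriched["ip_reputation"] != "unknown",
        enriched["unexpected_source"],
        enriched["failed_logins_90d"] == 0,   # first failure in 90 days is unusual
        enriched["asset_criticality"] == "critical",
    ])
    enriched["priority"] = "high" if signals >= 3 else "medium" if signals >= 1 else "low"
    return enriched


raw_alert = {"event": "login_failed", "src_ip": "203.0.113.45", "user": "admin", "host": "DC-01"}
print(enrich(raw_alert)["priority"])  # high
```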
2.4 Data Retention & Compliance
Retention Considerations
Balancing Factors:
- Investigation Needs: Incidents may be discovered weeks/months after occurrence
- Storage Costs: Long-term retention of high-volume logs is expensive
- Compliance Requirements: Regulations may mandate specific retention periods

Common Retention Strategies:
| Log Type | Hot Storage (SIEM) | Cold Storage (Archive) |
|---|---|---|
| Critical (auth, EDR) | 90 days | 2-7 years |
| Network flows | 30 days | 1 year |
| Proxy logs | 30 days | 1 year |
| Application logs | 14 days | 6 months |
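A retention table like this one can be expressed as a small policy lookup that answers "where would an event of this age live?". The sketch below makes several assumptions that are not stated in the table: ages are measured from event time, the cold-storage period counts total age rather than time after leaving hot storage, the lower bound (2 years) is used for critical logs, and months and years are approximated in days.

```python
# (hot_days, cold_days) per log type, taken from the table above.
RETENTION_DAYS = {
    "critical": (90, 2 * 365),      # lower bound of the 2-7 year range
    "network_flows": (30, 365),
    "proxy": (30, 365),
    "application": (14, 182),       # roughly 6 months
}


def storage_tier(log_type, age_days):
    """Return 'hot', 'cold', or 'expired' for an event of the given age."""
    hot_days, cold_days = RETENTION_DAYS[log_type]
    if age_days <= hot_days:
        return "hot"
    if age_days <= cold_days:
        return "cold"
    return "expired"


# A 14-day-old flow record is still searchable in the SIEM under this policy.
print(storage_tier("network_flows", 14))   # hot
print(storage_tier("application", 200))    # expired
```

Running past incidents through a check like this is a quick way to see whether the evidence would still have been available at the time of discovery.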
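Compliance Frameworks
- PCI-DSS: Retain audit logs for at least 12 months, with the most recent 3 months immediately available for analysis
- HIPAA: 6 years for required documentation, commonly applied to audit logs as well
- GDPR: "No longer than necessary" (storage limitation) + data minimization
- SOX: 7 years is the period commonly applied to financial system logs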
2.5 Collection Architecture
Centralized SIEM
Components:
- Log Forwarders: Agents on endpoints/servers (Splunk UF, Elastic Beats, Fluentd)
- Log Collectors: Receive syslog messages and pull logs from APIs
- Indexers: Parse, normalize, store
- Search Layer: Analysts query indexed data

Advantages:
- Centralized search and correlation
- Unified alerting
- Simplified access control

Challenges:
- Single point of failure (mitigate with HA)
- Scaling costs
- Performance at extreme scale
Data Lake Approach
Architecture:
- Store raw logs in object storage (S3, Azure Blob)
- Query using tools like Athena, Databricks, Splunk on S3
- Retain data in native formats

Advantages:
- Lower storage costs
- Flexibility in analysis tools
- Supports AI/ML workflows (direct access to raw data)

Challenges:
- Query performance may be slower than indexed SIEM
- Requires more analyst skill (SQL, Spark)
MicroSim Embed
MicroSim 2: Correlation Rule Tuning
Adjust log correlation thresholds to balance alert volume and detection coverage.
Common Misconceptions
Misconception: More Logs = Better Security
Reality: Collecting logs without a detection strategy creates noise. Focus on actionable telemetry aligned with your use cases.
Misconception: Free Logs Are Enough
Reality: Native OS logs (Windows Event Logs, syslog) provide basic visibility but lack detail. Tools like Sysmon and EDR add critical context (command-lines, network connections, file hashes).
Practice Tasks
Task 1: Identify Missing Telemetry
Given this attack scenario, identify which log sources would detect each step:
Attack Chain:
1. Phishing email delivered to user
2. User opens malicious attachment, malware executes
3. Malware establishes C2 connection to external IP
4. Attacker performs credential dumping
5. Lateral movement via RDP to file server
6. Data exfiltrated via HTTPS to attacker-controlled server
Answers
1. Email gateway logs, email security tools
2. Endpoint logs (process creation), EDR (behavioral detection)
3. Firewall logs, DNS logs, NetFlow, EDR
4. Endpoint logs (process creation, e.g., mimikatz), Sysmon Event ID 10 (process access to lsass.exe), Windows Security Event ID 4624 (unusual authentication patterns once the stolen credentials are used)
5. Windows Security Event ID 4624 (RDP logon type 10), NetFlow (internal RDP traffic), firewall logs
6. Proxy logs, firewall logs, cloud access logs (if exfil to cloud storage), NetFlow (large data transfer)
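The same step-to-source mapping can be turned into a simple coverage check: given the log sources an organization actually collects, which steps of the attack chain would go unseen? The sketch below encodes the answers above as sets. The source labels and the example `collected` set are hypothetical, and a step counts as covered if at least one relevant source is collected, which is a deliberately generous assumption.

```python
# Log sources that could reveal each step, taken from the answers above.
STEP_SOURCES = {
    "phishing delivery": {"email gateway"},
    "malware execution": {"endpoint process creation", "edr"},
    "c2 connection": {"firewall", "dns", "netflow", "edr"},
    "credential dumping": {"endpoint process creation", "windows security"},
    "lateral movement (rdp)": {"windows security", "netflow", "firewall"},
    "exfiltration (https)": {"proxy", "firewall", "cloud access", "netflow"},
}


def coverage_gaps(collected):
    """Return the attack steps for which none of the relevant sources is collected."""
    return [step for step, sources in STEP_SOURCES.items() if not sources & collected]


# Hypothetical environment that only collects firewall and Windows Security logs.
collected = {"firewall", "windows security"}
print(coverage_gaps(collected))
# ['phishing delivery', 'malware execution']
```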
Exam Prep & Certifications
Relevant Certifications
The topics in this chapter align with the following certifications:
- CompTIA Security+ — Domains: Security Operations, Security Architecture
- CompTIA CySA+ — Domains: Security Operations, Vulnerability Management
- GIAC GCIH — Domains: Incident Handling, Log Analysis
- CISSP — Domains: Security Operations, Communication and Network Security
Self-Assessment Quiz
Quiz 1: Which log source is most useful for detecting lateral movement within a network?
Options:
- a) Cloud audit logs
- b) Internal firewall logs and Windows authentication logs
- c) Email gateway logs
- d) Public DNS logs
Answer
b) Internal firewall logs and Windows authentication logs
Lateral movement involves internal network traversal and authentication to other systems. Cloud/email/public DNS are less relevant for internal movement.
Quiz 2: What is the primary purpose of log normalization?
Options:
- a) To compress logs for storage efficiency
- b) To convert logs to a common format for consistent querying
- c) To encrypt logs in transit
- d) To delete unnecessary log fields
Answer
b) To convert logs to a common format for consistent querying
Normalization maps diverse log formats to a standardized schema.
Quiz 3: Which Sysmon Event ID tracks process creation?
Options:
- a) Event ID 1
- b) Event ID 3
- c) Event ID 4688
- d) Event ID 7
Answer
a) Event ID 1
Sysmon Event ID 1 = Process creation. (ID 3 = Network connection, 4688 = Windows native process creation, ID 7 = Image loaded)
Summary
- Telemetry sources: Endpoint, network, cloud, identity logs provide comprehensive visibility
- Normalization: Standardize diverse formats (Syslog, CEF, ECS, CIM) for unified querying
- Enrichment: Add context (threat intel, asset data, user behavior) to improve triage
- Retention: Balance investigation needs, cost, and compliance requirements
- Architecture: Choose between SIEM (fast search, higher cost) and data lake (lower cost, more flexibility)