
Chapter 2: Telemetry & Log Sources

Learning Objectives

  • Identify major telemetry sources (endpoint, network, cloud, identity)
  • Explain the importance of log normalization and enrichment
  • Compare common log formats and schema standards (Syslog, CEF, ECS, CIM)
  • Design a telemetry collection strategy covering key attack surfaces

Prerequisites

  • Chapter 1: Understanding of SOC functions
  • Basic knowledge of operating systems (Windows, Linux)
  • Familiarity with network protocols (TCP/IP, DNS, HTTP)

Key Concepts

Telemetry, Log Source, Log Normalization, EDR, SIEM, Syslog, Windows Event Logs


Curiosity Hook: The Invisible Breach

Attackers compromised a marketing workstation, moved laterally to a file server, and exfiltrated 2 GB of customer data, leaving zero traces... or did they?

Reality: The evidence existed, but it was scattered across 8 different log sources that were not being collected, retained, or correlated. The gaps included:
  • Endpoint: No command-line auditing enabled
  • Network: Firewall logs retained only 7 days (the breach occurred 14 days earlier)
  • Cloud: S3 bucket access logs not forwarded to the SIEM
  • Identity: VPN logs not integrated

Lesson: You can't detect what you can't see. Comprehensive telemetry is the foundation of effective detection.


2.1 Telemetry Categories

Endpoint Telemetry

Sources:
  • Process Execution: What programs run, with what arguments
  • File Activity: File creation, modification, deletion
  • Registry Changes: Windows registry modifications
  • Network Connections: Outbound connections from endpoints
  • Authentication: Local and domain logins

Key Tools:
  • Windows Event Logs (built-in)
  • Sysmon (enhanced Windows logging)
  • EDR platforms (CrowdStrike, SentinelOne, Microsoft Defender)
  • Linux auditd

Critical Logs:
  • Windows Event ID 4688: Process creation (with command line if enabled)
  • Windows Event ID 4624/4625: Successful/failed logins
  • Sysmon Event ID 1: Process creation (detailed)
  • Sysmon Event ID 3: Network connection
  • Sysmon Event ID 7: Image (DLL) loaded
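Event IDs are defined per provider, so a pipeline usually keys on the log source as well as the number: Security 4688 and Sysmon 1 both mean process creation. A minimal Python sketch of such a lookup for the IDs listed above; the source labels "Security" and "Sysmon" and the action names are illustrative assumptions, not a standard.

```python
# Hypothetical lookup keyed on (log source, event ID); only the IDs listed
# above are included. The same number from another provider means something else.
CRITICAL_EVENTS = {
    ("Security", 4688): "process_creation",
    ("Security", 4624): "logon_success",
    ("Security", 4625): "logon_failure",
    ("Sysmon", 1): "process_creation",
    ("Sysmon", 3): "network_connection",
    ("Sysmon", 7): "image_loaded",
}


def classify(source: str, event_id: int) -> str:
    """Return a normalized action name, or 'other' for IDs not in the table."""
    return CRITICAL_EVENTS.get((source, event_id), "other")


if __name__ == "__main__":
    for src, eid in [("Security", 4688), ("Sysmon", 1), ("Security", 1)]:
        print(src, eid, classify(src, eid))
```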


Network Telemetry

Sources:
  • Firewall Logs: Permitted/denied connections
  • Proxy Logs: Web traffic, URLs visited
  • DNS Logs: Domain name resolution queries
  • IDS/IPS Alerts: Signature-based network detections
  • NetFlow/IPFIX: Network flow metadata (source, destination, ports, bytes)

Use Cases:
  • Detect C2 communication (beaconing patterns)
  • Identify data exfiltration (unusual outbound volumes)
  • Spot lateral movement (internal port scans, SMB/RDP)
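Beaconing often shows up as connections at near-regular intervals. One common heuristic, sketched here in Python as an assumption rather than any vendor's algorithm, measures how uniform the gaps between one host's connections to one destination are using the coefficient of variation (standard deviation divided by mean). The 0.1 threshold and the minimum of four gaps are illustrative; many C2 frameworks add jitter, so real detections usually combine this with other signals.

```python
from __future__ import annotations

from statistics import mean, pstdev


def beacon_score(timestamps: list[float]) -> float | None:
    """Coefficient of variation of the gaps between connections (lower = more regular).

    timestamps: connection times in seconds for one source/destination pair.
    Returns None when there are too few gaps to judge.
    """
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 4:
        return None
    avg = mean(gaps)
    if avg == 0:
        return None
    return pstdev(gaps) / avg


def looks_like_beacon(timestamps: list[float], max_cv: float = 0.1) -> bool:
    """max_cv is an illustrative threshold, not a recommended value."""
    score = beacon_score(timestamps)
    return score is not None and score <= max_cv


if __name__ == "__main__":
    print(looks_like_beacon([0, 60, 120, 181, 240, 300]))  # True: roughly every 60 s
    print(looks_like_beacon([0, 5, 200, 210, 900, 905]))   # False: irregular
```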

Example: DNS Tunneling Detection

Query: 4f2a3b1c.malicious-domain.com
Query: 8e9d6c5a.malicious-domain.com
Pattern: High-frequency queries for high-entropy subdomains → possible data exfiltration via DNS
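The "high entropy" part of this pattern can be measured with Shannon entropy over the characters of the subdomain. A minimal Python sketch that groups queries by parent domain and flags domains with many distinct, high-entropy subdomains; the naive "last two labels" parent-domain rule and both thresholds are illustrative assumptions.

```python
from __future__ import annotations

import math
from collections import Counter, defaultdict


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the string's characters, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return sum((c / n) * math.log2(n / c) for c in Counter(s).values())


def tunneling_candidates(queries: list[str], min_unique: int = 20,
                         min_entropy: float = 2.5) -> dict[str, tuple[int, float]]:
    """Flag parent domains queried with many distinct, high-entropy subdomains.

    Naively treats the last two labels as the parent domain (assumption: it
    does not handle suffixes such as .co.uk). Thresholds are illustrative.
    """
    subdomains: dict[str, set[str]] = defaultdict(set)
    for q in queries:
        labels = q.lower().rstrip(".").split(".")
        if len(labels) < 3:
            continue
        subdomains[".".join(labels[-2:])].add(".".join(labels[:-2]))
    flagged = {}
    for parent, names in subdomains.items():
        avg = sum(shannon_entropy(n) for n in names) / len(names)
        if len(names) >= min_unique and avg >= min_entropy:
            flagged[parent] = (len(names), round(avg, 2))
    return flagged


if __name__ == "__main__":
    print(shannon_entropy("4f2a3b1c"))  # 3.0 (8 distinct characters)
    print(shannon_entropy("www"))       # 0.0
```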


Cloud Telemetry

Sources:
  • AWS CloudTrail: API calls, resource changes
  • Azure Activity Logs: Subscription-level operations
  • GCP Cloud Audit Logs: Admin activity, data access
  • SaaS Application Logs: Office 365, Salesforce, etc.

Key Events:
  • IAM changes (new users, privilege escalation)
  • Resource creation (EC2 instances, storage buckets)
  • Data access (S3 GetObject calls)
  • Configuration changes (security group modifications)
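Filtering CloudTrail for these key events can be sketched in Python. CloudTrail log files are JSON objects with a Records array whose entries carry eventName and userIdentity fields; the category mapping below is an illustrative assumption and far from exhaustive. Note that S3 object-level calls such as GetObject are CloudTrail data events, which are only recorded if the trail is configured to log data events.

```python
from __future__ import annotations

import json

# Illustrative mapping of CloudTrail eventName values to the categories above.
KEY_EVENTS = {
    "CreateUser": "iam_change",
    "AttachUserPolicy": "iam_change",
    "AddUserToGroup": "iam_change",
    "RunInstances": "resource_creation",
    "CreateBucket": "resource_creation",
    "GetObject": "data_access",
    "AuthorizeSecurityGroupIngress": "config_change",
}


def key_events(cloudtrail_json: str) -> list[tuple[str, str, str]]:
    """Return (category, eventName, caller ARN) for records worth reviewing."""
    records = json.loads(cloudtrail_json).get("Records", [])
    hits = []
    for rec in records:
        category = KEY_EVENTS.get(rec.get("eventName", ""))
        if category:
            arn = rec.get("userIdentity", {}).get("arn", "unknown")
            hits.append((category, rec["eventName"], arn))
    return hits
```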


Identity & Access Logs

Sources:
  • Active Directory Logs: Domain authentication, group changes
  • Identity Provider (IdP) Logs: SSO events (Okta, Azure AD)
  • VPN Logs: Remote access attempts
  • MFA Logs: Multi-factor authentication success/failure

Detection Opportunities:
  • Impossible travel (logins from locations too far apart to reach in the time between them)
  • Unusual access patterns (employee accessing systems outside normal hours or role)
  • Privilege escalation (account added to admin groups)
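Impossible travel reduces to distance over time. A minimal Python sketch, assuming each login has already been geolocated from its source IP: it computes the great-circle (haversine) distance and flags pairs whose implied speed exceeds a threshold. The 900 km/h ceiling (roughly airliner speed) and the 500 km minimum distance, which skips short hops where IP geolocation is unreliable, are illustrative assumptions.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Login:
    user: str
    timestamp: float  # seconds since epoch
    lat: float        # approximate, from IP geolocation
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def impossible_travel(a: Login, b: Login, max_kmh: float = 900.0,
                      min_km: float = 500.0) -> bool:
    """True if the implied speed between two logins exceeds max_kmh."""
    dist = haversine_km(a.lat, a.lon, b.lat, b.lon)
    if dist < min_km:
        return False
    hours = abs(b.timestamp - a.timestamp) / 3600
    return hours == 0 or dist / hours > max_kmh


if __name__ == "__main__":
    nyc = Login("jsmith", 0, 40.7128, -74.0060)
    london = Login("jsmith", 3600, 51.5074, -0.1278)
    print(impossible_travel(nyc, london))  # True: about 5,570 km in one hour
```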


2.2 Log Normalization

Why Normalize?

Different sources produce different formats:
  • Syslog: <134>Jan 15 10:42:33 host sshd[1234]: Failed password for user
  • Windows Event Log: XML structure with EventID, TimeCreated, Computer
  • JSON: {"timestamp": "2026-02-15T10:42:33Z", "event": "login_failed"}

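Challenge: The same concepts (user, host, timestamp) appear under different field names and formats in each source, so correlating events across them is difficult.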

Solution: Normalize to a common schema.
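For syslog, normalization starts with the leading <PRI> value, which packs facility and severity into one number: PRI = facility × 8 + severity (RFC 5424). The example line's <134> therefore decodes to facility 16 (local0) and severity 6 (informational). A minimal Python sketch; the short facility labels are conventional names, and the function only splits off the PRI rather than parsing the full header.

```python
from __future__ import annotations

import re

# Conventional short labels for facility codes 0-23 and severities 0-7.
FACILITIES = ["kern", "user", "mail", "daemon", "auth", "syslog", "lpr", "news",
              "uucp", "cron", "authpriv", "ftp", "ntp", "audit", "alert", "clock",
              "local0", "local1", "local2", "local3", "local4", "local5", "local6", "local7"]
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]

PRI_RE = re.compile(r"^<(\d{1,3})>(.*)$")


def decode_pri(line: str) -> dict:
    """Split the leading <PRI> into facility and severity names."""
    m = PRI_RE.match(line)
    if not m:
        raise ValueError("no PRI field")
    pri = int(m.group(1))
    if pri > 191:  # 23 * 8 + 7 is the largest valid value
        raise ValueError("PRI out of range")
    facility, severity = divmod(pri, 8)
    return {"facility": FACILITIES[facility], "severity": SEVERITIES[severity],
            "rest": m.group(2)}


if __name__ == "__main__":
    print(decode_pri("<134>Jan 15 10:42:33 host sshd[1234]: Failed password for user"))
```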


Common Schema Standards

| Standard | Platform | Key Fields |
|----------|----------|------------|
| Syslog | Universal | Facility, Severity, Timestamp, Hostname, Message |
| CEF (Common Event Format) | ArcSight | Version, Device Vendor, Device Product, Device Version, Signature ID, Name, Severity, Extension |
| ECS (Elastic Common Schema) | Elastic Stack | @timestamp, event.category, source.ip, user.name |
| CIM (Common Information Model) | Splunk | action, src, dest, user, signature |
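A CEF record begins with seven pipe-delimited header fields (Version, Device Vendor, Device Product, Device Version, Signature ID, Name, Severity) followed by a key=value extension. A minimal Python parser, assuming any syslog prefix has already been removed; it honours the \| and \\ header escapes but only unescapes \= in extension values, so treat it as a sketch rather than a complete implementation of the CEF specification.

```python
from __future__ import annotations

import re

HEADER_FIELDS = ["version", "device_vendor", "device_product", "device_version",
                 "signature_id", "name", "severity"]
# key=value pairs; a value runs until the next " key=" or the end of the line.
EXT_RE = re.compile(r"(\w+)=(.*?)(?=\s\w+=|$)")


def split_header(line: str) -> tuple[list[str], str]:
    r"""Split the 7 pipe-delimited header fields, honouring \| and \\ escapes."""
    fields: list[str] = []
    buf: list[str] = []
    i = 0
    while i < len(line) and len(fields) < 7:
        ch = line[i]
        if ch == "\\" and i + 1 < len(line) and line[i + 1] in "|\\":
            buf.append(line[i + 1])  # keep the escaped character literally
            i += 2
            continue
        if ch == "|":
            fields.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    return fields, line[i:]


def parse_cef(line: str) -> dict:
    """Parse one CEF record whose syslog prefix has already been stripped."""
    fields, extension = split_header(line)
    if len(fields) != 7 or not fields[0].startswith("CEF:"):
        raise ValueError("not a CEF record")
    record: dict = dict(zip(HEADER_FIELDS, fields))
    record["version"] = fields[0][len("CEF:"):]
    record["extension"] = {k: v.replace("\\=", "=") for k, v in EXT_RE.findall(extension)}
    return record
```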

Normalization Example

Raw Windows Event:

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing"/>
    <EventID>4688</EventID>
    <TimeCreated SystemTime="2026-02-15T14:23:45Z"/>
    <Computer>WORKSTATION-01</Computer>
  </System>
  <EventData>
    <Data Name="SubjectUserName">jsmith</Data>
    <Data Name="NewProcessName">C:\Windows\System32\cmd.exe</Data>
  </EventData>
</Event>

Normalized to ECS:

{
  "@timestamp": "2026-02-15T14:23:45Z",
  "event.category": "process",
  "event.action": "process-started",
  "host.name": "WORKSTATION-01",
  "process.executable": "C:\\Windows\\System32\\cmd.exe",
  "user.name": "jsmith",
  "event.provider": "Microsoft-Windows-Security-Auditing",
  "event.code": "4688"
}

Benefit: Now you can query across all log sources using consistent field names like user.name and event.action.
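The mapping from raw Windows XML to ECS can be sketched with Python's standard-library ElementTree. This is an illustrative sketch, not Winlogbeat or Elastic Agent: it assumes the standard Windows event namespace with EventData fields stored as <Data Name="..."> elements (as exported events do), handles only Event ID 4688, and hard-codes event.category and event.action to the values used in the example.

```python
from __future__ import annotations

import json
import xml.etree.ElementTree as ET

NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

SAMPLE = r"""<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-Security-Auditing"/>
    <EventID>4688</EventID>
    <TimeCreated SystemTime="2026-02-15T14:23:45Z"/>
    <Computer>WORKSTATION-01</Computer>
  </System>
  <EventData>
    <Data Name="SubjectUserName">jsmith</Data>
    <Data Name="NewProcessName">C:\Windows\System32\cmd.exe</Data>
  </EventData>
</Event>"""


def to_ecs_4688(xml_text: str) -> dict:
    """Map a 4688 event to flat, dotted ECS field names."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    data = {d.get("Name"): d.text for d in root.findall("e:EventData/e:Data", NS)}
    return {
        "@timestamp": system.find("e:TimeCreated", NS).get("SystemTime"),
        "event.category": "process",
        "event.action": "process-started",
        "host.name": system.findtext("e:Computer", namespaces=NS),
        "process.executable": data.get("NewProcessName"),
        "user.name": data.get("SubjectUserName"),
        "event.provider": system.find("e:Provider", NS).get("Name"),
        "event.code": system.findtext("e:EventID", namespaces=NS),
    }


if __name__ == "__main__":
    print(json.dumps(to_ecs_4688(SAMPLE), indent=2))
```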


2.3 Log Enrichment

Definition: Adding contextual information to raw events to aid decision-making.

Enrichment Sources:
  • Asset Inventory: Hostname → business criticality, owner, location
  • User Context: Username → department, manager, typical behavior
  • Threat Intelligence: IP address → known malicious, geolocation, reputation
  • Historical Behavior: User X typically accesses 5 systems; now accessing 50

Example:

Raw Alert: Failed login from 203.0.113.45 to admin account
Enriched:
- IP 203.0.113.45: TOR exit node (threat intel)
- Account "admin": Service account, should only authenticate from 10.0.1.5
- Historical: Zero failed logins for this account in past 90 days
- Asset: Target system is domain controller (critical asset)

Conclusion: High-priority alert
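The enrichment steps in this example can be sketched as a function that joins an alert with lookup tables. Everything here is hypothetical: the dictionaries stand in for a threat intel feed, identity data, SIEM history, and an asset inventory, the host name dc01 is invented, and the one-point-per-indicator scoring only illustrates how context can raise priority.

```python
from __future__ import annotations

# Hypothetical context sources (threat intel, identity, SIEM history, CMDB).
THREAT_INTEL = {"203.0.113.45": "tor_exit_node"}
ACCOUNTS = {"admin": {"type": "service", "allowed_sources": {"10.0.1.5"}}}
FAILED_LOGINS_90D = {"admin": 0}
ASSETS = {"dc01": {"role": "domain_controller", "critical": True}}


def enrich(alert: dict) -> dict:
    """Attach context to a failed-login alert and derive an illustrative priority."""
    ip, user, host = alert["src_ip"], alert["user"], alert["dest_host"]
    account = ACCOUNTS.get(user)
    context = {
        "threat_intel": THREAT_INTEL.get(ip),
        "unexpected_source": account is not None and ip not in account["allowed_sources"],
        # A failure after 90 quiet days is itself unusual.
        "no_recent_failures": FAILED_LOGINS_90D.get(user, 0) == 0,
        "critical_asset": ASSETS.get(host, {}).get("critical", False),
    }
    # One point per indicator present; the cut-offs are assumptions.
    score = sum(bool(v) for v in context.values())
    priority = "high" if score >= 3 else "medium" if score >= 1 else "low"
    return {**alert, "context": context, "priority": priority}


if __name__ == "__main__":
    print(enrich({"src_ip": "203.0.113.45", "user": "admin", "dest_host": "dc01"}))
```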


2.4 Data Retention & Compliance

Retention Considerations

Balancing Factors:
  • Investigation Needs: Incidents may be discovered weeks or months after they occur
  • Storage Costs: Long-term retention of high-volume logs is expensive
  • Compliance Requirements: Regulations may mandate specific retention periods

Common Retention Strategies:

| Log Type | Hot Storage (SIEM) | Cold Storage (Archive) |
|----------|-------------------|------------------------|
| Critical (auth, EDR) | 90 days | 2-7 years |
| Network flows | 30 days | 1 year |
| Proxy logs | 30 days | 1 year |
| Application logs | 14 days | 6 months |
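A retention policy like this can be checked mechanically when scoping an investigation: given how old the relevant events are, where (if anywhere) can they still be found? A minimal Python sketch; it assumes the cold period is counted from event time rather than from the end of hot retention, uses 365 days per year and 182 days for six months, and takes the 2-year floor of "2-7 years".

```python
from __future__ import annotations

# (hot_days, cold_days) from the table above, under the stated assumptions.
RETENTION_DAYS = {
    "critical": (90, 2 * 365),  # auth, EDR
    "network_flows": (30, 365),
    "proxy": (30, 365),
    "application": (14, 182),
}


def availability(log_type: str, age_days: int) -> str:
    """Return 'hot', 'cold', or 'expired' for events that are age_days old."""
    hot, cold = RETENTION_DAYS[log_type]
    if age_days <= hot:
        return "hot"
    if age_days <= cold:
        return "cold"
    return "expired"


if __name__ == "__main__":
    print(availability("network_flows", 45))  # cold: archived, not in the SIEM
    print(availability("critical", 60))       # hot
```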


Compliance Frameworks

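  • PCI-DSS: At least 1 year of audit log history, with the most recent 3 months immediately available for analysis
  • HIPAA: 6 years for required security documentation (commonly applied to audit logs)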
  • GDPR: "No longer than necessary" + data minimization
  • SOX: 7 years for financial system logs

2.5 Collection Architecture

Centralized SIEM

Components:
  • Log Forwarders: Agents on endpoints/servers (Splunk UF, Elastic Beats, Fluentd)
  • Log Collectors: Receive syslog streams and pull logs from APIs
  • Indexers: Parse, normalize, and store events
  • Search Layer: Where analysts query indexed data

Advantages:
  • Centralized search and correlation
  • Unified alerting
  • Simplified access control

Challenges:
  • Single point of failure (mitigate with a high-availability deployment)
  • Scaling costs
  • Performance at extreme scale
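Log collectors commonly receive events as syslog. To illustrate what a forwarder puts on the wire for that path, here is a minimal Python formatter for an RFC 5424 message (PRI, version 1, timestamp, hostname, app name, process ID, message ID, structured data, message) with the structured-data field left nil ("-"). Agents such as Splunk UF or Elastic Beats use their own transport protocols, so this is only a sketch of the syslog route.

```python
from __future__ import annotations

from datetime import datetime, timezone


def rfc5424(facility: int, severity: int, host: str, app: str, msg: str,
            procid: str = "-", msgid: str = "-", ts: datetime | None = None) -> str:
    """Build one RFC 5424 syslog line with no structured data."""
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0-23, severity 0-7")
    ts = ts or datetime.now(timezone.utc)
    stamp = ts.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    pri = facility * 8 + severity
    return f"<{pri}>1 {stamp} {host} {app} {procid} {msgid} - {msg}"


if __name__ == "__main__":
    t = datetime(2026, 2, 15, 14, 23, 45, tzinfo=timezone.utc)
    print(rfc5424(4, 6, "WORKSTATION-01", "sshd", "Failed password", procid="1234", ts=t))
```

A forwarder would then send the line over UDP or TCP port 514, or over TLS on port 6514 (RFC 5425).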


Data Lake Approach

Architecture:
  • Store raw logs in object storage (S3, Azure Blob)
  • Query using tools like Athena, Databricks, Splunk on S3
  • Retain data in native formats

Advantages:
  • Lower storage costs
  • Flexibility in analysis tools
  • Supports AI/ML workflows (direct access to raw data)

Challenges:
  • Query performance may be slower than an indexed SIEM
  • Requires more analyst skill (SQL, Spark)
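The data lake idea of keeping raw logs and extracting fields only at query time (schema-on-read) can be illustrated locally. This Python sketch is a stand-in for an engine such as Athena, not its API: it scans newline-delimited JSON, parses each record only when the query runs, and skips records it cannot parse. Real engines add partitioning and columnar formats to make such scans fast.

```python
from __future__ import annotations

import json
from typing import Callable, Iterable, Iterator


def scan(lines: Iterable[str], where: Callable[[dict], bool],
         fields: list[str]) -> Iterator[dict]:
    """Schema-on-read: parse raw JSON lines only when the query runs.

    Records that are not JSON objects are skipped instead of failing the query.
    """
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict) and where(event):
            yield {f: event.get(f) for f in fields}


if __name__ == "__main__":
    raw = ['{"timestamp": "2026-02-15T10:42:33Z", "event": "login_failed", "user": "jsmith"}',
           "not json"]
    print(list(scan(raw, lambda e: e.get("event") == "login_failed", ["timestamp", "user"])))
```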



MicroSim 2: Correlation Rule Tuning

Adjust log correlation thresholds to balance alert volume and detection coverage.
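A correlation threshold of this kind can be sketched as a count-over-time-window rule. This minimal Python version, an assumption rather than any SIEM's rule language, raises an alert when one user accumulates `threshold` failed logins within `window_s` seconds and suppresses repeat alerts for that user within the same window. Lowering the threshold or widening the window catches slower attacks at the cost of more alerts.

```python
from __future__ import annotations

from collections import defaultdict, deque


def threshold_alerts(events, threshold: int, window_s: float) -> list[tuple[str, float]]:
    """Alert when a user has >= threshold failed logins within window_s seconds.

    events: iterable of (timestamp_seconds, user) for failed logins, in time order.
    """
    recent: dict[str, deque] = defaultdict(deque)
    last_alert: dict[str, float] = {}
    alerts = []
    for ts, user in events:
        q = recent[user]
        q.append(ts)
        while q and ts - q[0] > window_s:  # drop events outside the window
            q.popleft()
        if len(q) >= threshold and ts - last_alert.get(user, float("-inf")) > window_s:
            alerts.append((user, ts))
            last_alert[user] = ts
    return alerts


if __name__ == "__main__":
    failed = [(0, "jsmith"), (0, "alice"), (10, "jsmith"), (20, "jsmith"),
              (30, "jsmith"), (40, "jsmith"), (100, "alice"), (200, "alice")]
    print(threshold_alerts(failed, threshold=5, window_s=60))   # [('jsmith', 40)]
    print(threshold_alerts(failed, threshold=3, window_s=300))  # [('jsmith', 20), ('alice', 200)]
```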


Common Misconceptions

Misconception: More Logs = Better Security

Reality: Collecting logs without a detection strategy creates noise. Focus on actionable telemetry aligned with your use cases.

Misconception: Free Logs Are Enough

Reality: Native OS logs (Windows Event Logs, syslog) provide basic visibility but lack detail in their default configuration. Tools like Sysmon and EDR add critical context (command lines, network connections, file hashes).


Practice Tasks

Task 1: Identify Missing Telemetry

Given this attack scenario, identify which log sources would detect each step:

Attack Chain:
  1. Phishing email delivered to user
  2. User opens malicious attachment, malware executes
  3. Malware establishes C2 connection to external IP
  4. Attacker performs credential dumping
  5. Lateral movement via RDP to file server
  6. Data exfiltrated via HTTPS to attacker-controlled server

Answers
  1. Email gateway logs, email security tools
  2. Endpoint logs (process creation), EDR (behavioral detection)
  3. Firewall logs, DNS logs, NetFlow, EDR
  4. Endpoint logs (process creation, e.g., mimikatz), Sysmon Event ID 10 (process access to lsass.exe), Windows Security Event ID 4624 (unusual auth patterns when dumped credentials are reused)
  5. Windows Security Event ID 4624 (RDP logon type 10), NetFlow (internal RDP traffic), firewall logs
  6. Proxy logs, firewall logs, cloud access logs (if exfil to cloud storage), NetFlow (large data transfer)
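Answer 5 can be turned into a simple hunting filter. A Python sketch, assuming 4624 events have already been parsed into dicts using the Windows field names LogonType, IpAddress, and TargetUserName; the EventID key and the jump-host allow-list are hypothetical. Values are compared as strings because parsers differ on whether these fields arrive as text or integers.

```python
from __future__ import annotations

# Hypothetical allow-list of admin jump hosts permitted to RDP to servers.
JUMP_HOSTS = {"10.0.5.10"}


def suspicious_rdp(events: list[dict]) -> list[dict]:
    """Return 4624 logons with LogonType 10 (RemoteInteractive) from non-jump hosts."""
    return [
        e for e in events
        if str(e.get("EventID")) == "4624"
        and str(e.get("LogonType")) == "10"
        and e.get("IpAddress") not in JUMP_HOSTS
    ]


if __name__ == "__main__":
    sample = [{"EventID": 4624, "LogonType": 10, "IpAddress": "10.0.2.37",
               "TargetUserName": "jsmith"}]
    print(suspicious_rdp(sample))
```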

Exam Prep & Certifications

Relevant Certifications

The topics in this chapter align with the following certifications:

  • CompTIA Security+ — Domains: Security Operations, Security Architecture
  • CompTIA CySA+ — Domains: Security Operations, Vulnerability Management
  • GIAC GCIH — Domains: Incident Handling, Log Analysis
  • CISSP — Domains: Security Operations, Communication and Network Security


Self-Assessment Quiz

Quiz 1: Which log source is most useful for detecting lateral movement within a network?

Options:

a) Cloud audit logs
b) Internal firewall logs and Windows authentication logs
c) Email gateway logs
d) Public DNS logs

Answer

b) Internal firewall logs and Windows authentication logs

Lateral movement involves internal network traversal and authentication to other systems. Cloud/email/public DNS are less relevant for internal movement.

Quiz 2: What is the primary purpose of log normalization?

Options:

a) To compress logs for storage efficiency
b) To convert logs to a common format for consistent querying
c) To encrypt logs in transit
d) To delete unnecessary log fields

Answer

b) To convert logs to a common format for consistent querying

Normalization maps diverse log formats to a standardized schema.

Quiz 3: Which Sysmon Event ID tracks process creation?

Options:

a) Event ID 1
b) Event ID 3
c) Event ID 4688
d) Event ID 7

Answer

a) Event ID 1

Sysmon Event ID 1 = Process creation. (Sysmon ID 3 = Network connection, Sysmon ID 7 = Image loaded; 4688 is the native Windows Security log process-creation event, not a Sysmon ID.)


Summary

  • Telemetry sources: Endpoint, network, cloud, identity logs provide comprehensive visibility
  • Normalization: Standardize diverse formats (Syslog, CEF, ECS, CIM) for unified querying
  • Enrichment: Add context (threat intel, asset data, user behavior) to improve triage
  • Retention: Balance investigation needs, cost, and compliance requirements
  • Architecture: Choose between SIEM (fast search, higher cost) and data lake (lower cost, more flexibility)

Next: Chapter 3: SIEM & Data Lake Basics →