
Chapter 19: OSINT and Reconnaissance

Overview

Open Source Intelligence (OSINT) is the systematic collection, analysis, and application of information from publicly available sources to support security operations, investigations, threat hunting, and adversary profiling. Defenders can apply the same reconnaissance techniques adversaries use before an attack: mapping the external attack surface, identifying exposed credentials, tracking threat actors, and building threat intelligence. This chapter covers the full OSINT methodology, footprinting techniques, people and corporate OSINT, dark web monitoring, and OPSEC for analysts.

Learning Objectives

By the end of this chapter, students SHALL be able to:

  1. Apply the Intelligence Cycle to OSINT collection campaigns
  2. Map an organization's external attack surface using passive and active footprinting
  3. Enumerate employees, email addresses, and credentials from public sources
  4. Conduct infrastructure OSINT using Shodan, Censys, and passive DNS
  5. Track threat actors across the surface, deep, and dark web
  6. Maintain analyst OPSEC throughout an investigation

Prerequisites

  • Basic understanding of DNS, HTTP, and web technologies
  • Familiarity with Linux command line
  • Understanding of IP addressing and BGP routing

Why This Matters

Every piece of information an attacker uses in a spearphishing email, every credential stuffing list, and every exploitable service they target was discovered through reconnaissance. The Verizon 2024 Data Breach Investigations Report (DBIR) identifies stolen credentials, phishing, and vulnerability exploitation as the three primary ways attackers gain access to organizations. Defenders who perform proactive OSINT on their own organization reduce the attacker's information advantage, and threat intelligence analysts who track adversary infrastructure can disrupt campaigns before they launch.


19.1 The OSINT Intelligence Cycle

flowchart LR
    A[1. PLANNING\nDefine requirements\n& key questions] --> B[2. COLLECTION\nGather raw data\nfrom sources]
    B --> C[3. PROCESSING\nNormalize, translate,\ndeduplicate]
    C --> D[4. ANALYSIS\nFuse, assess,\ninterpret]
    D --> E[5. PRODUCTION\nReport, brief,\ndisseminate]
    E --> F[6. FEEDBACK\nConsumer evaluation]
    F --> A

    style A fill:#1d3557,color:#fff
    style B fill:#457b9d,color:#fff
    style C fill:#a8dadc,color:#000
    style D fill:#f4a261,color:#000
    style E fill:#e63946,color:#fff
    style F fill:#2d6a4f,color:#fff

Intelligence Requirements (IRs) drive collection. Without a defined question, OSINT collection degenerates into data hoarding. Examples:

  • "What email addresses are exposed in breach databases for our domain?"
  • "What external ports/services are accessible from the internet on our IP ranges?"
  • "Is threat actor TA577 targeting our industry sector this quarter?"


19.2 Passive DNS and IP Footprinting

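Passive reconnaissance gathers information without sending packets to the target. Some commands below are active rather than passive: the zone-transfer attempt queries the target's name server directly, and brute-force resolution generates lookups that reach the target's authoritative DNS.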

19.2.1 DNS Enumeration

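# Basic DNS record enumeration
dig @8.8.8.8 target.com ANY   # many servers now give minimal ANY answers (RFC 8482)
dig @8.8.8.8 target.com MX
dig @8.8.8.8 target.com TXT   # SPF and verification tokens reveal email/SaaS providers
dig @8.8.8.8 _dmarc.target.com TXT   # DMARC policy lives at _dmarc.<domain>
# DKIM keys live at <selector>._domainkey.target.com; selectors must be guessed or read from mail headers
dig @8.8.8.8 target.com NS

# Zone transfer attempt (rarely succeeds but worth checking; queries the target's NS directly)
dig @ns1.target.com target.com AXFR

# Passive subdomain enumeration, then resolve the results
subfinder -d target.com -all -recursive -silent | tee subdomains.txt
dnsx -l subdomains.txt -a -aaaa -cname -mx -ns -resp -silent

# Subdomain brute force (active; wordlist.txt is a placeholder for your own wordlist)
dnsx -d target.com -w wordlist.txt -a -resp -silent

# Certificate Transparency logs (crt.sh); %25 is the URL-encoded % wildcard
curl -s "https://crt.sh/?q=%25.target.com&output=json" | \
  python3 -c "import sys,json; [print(e['name_value']) for e in json.load(sys.stdin)]" | \
  sort -u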

# Passive DNS — historical resolution
# SecurityTrails, RiskIQ, VirusTotal, Circl.lu pDNS
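The crt.sh one-liner above prints raw `name_value` fields, which can hold several newline-separated names, wildcard entries such as `*.dev.target.com`, and look-alike names outside the target domain. A minimal offline sketch of normalizing that output, assuming (as the one-liner already does) that crt.sh returns a JSON list of objects with a `name_value` string; `crtsh_subdomains` is a hypothetical helper name:

```python
import json


def crtsh_subdomains(raw_json: str, domain: str) -> list[str]:
    """Extract unique, normalized hostnames under `domain` from crt.sh JSON output."""
    domain = domain.lower().rstrip(".")
    names = set()
    for entry in json.loads(raw_json):
        # name_value may contain several SAN entries separated by newlines
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().rstrip(".")
            if name.startswith("*."):
                name = name[2:]  # strip the wildcard label, keep the covered parent name
            # keep the apex and true subdomains only (not e.g. "eviltarget.com")
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)


if __name__ == "__main__":
    sample = '[{"name_value": "www.target.com\\nmail.target.com"}, {"name_value": "*.dev.target.com"}]'
    print(crtsh_subdomains(sample, "target.com"))
```

The suffix check uses `"." + domain` rather than a bare `endswith(domain)` so that unrelated registrations sharing a string suffix are not mistaken for the target's assets.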

19.2.2 ASN and IP Range Mapping

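# Find the ASN for an organization: search by name at bgp.he.net (BGP toolkit),
# or map a known IP to its origin ASN via Team Cymru's whois service
whois -h whois.cymru.com " -v 203.0.113.45"

# Get all IP prefixes originated by an ASN (RADb inverse query; IRR data is self-reported)
whois -h whois.radb.net -- '-i origin AS12345' | grep -E '^route6?:'

# Reverse WHOIS: find all domains registered by an entity
# DomainTools, SecurityTrails, ViewDNS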

# AMASS — comprehensive subdomain/IP enumeration
amass enum -d target.com -passive -o amass_output.txt
amass intel -org "Target Corporation" -max-dns-queries 20000

19.3 Internet-Wide Scanning Intelligence

19.3.1 Shodan

Shodan indexes internet-connected devices — servers, cameras, industrial control systems, printers, routers. It continuously scans the entire IPv4 space and stores banner data.

import shodan

api = shodan.Shodan("YOUR_API_KEY")

# Search for services exposed by the target org
results = api.search('org:"Target Corporation" port:22,443,3389,8080')
for r in results['matches']:
    vulns = ", ".join(r.get("vulns", {}))  # CVE IDs, when Shodan reports any
    print(f"{r['ip_str']}:{r.get('port')} | {r.get('product', '')} | {vulns}")

# Find exposed databases
results = api.search('org:"Target Corporation" product:"MongoDB"')

# Find SSL certs for org
results = api.search('ssl.cert.subject.cn:"*.target.com"')

# Specific CVE exposure (the vuln: filter is restricted to higher-tier API plans)
results = api.search('vuln:CVE-2021-44228')  # Log4Shell

# Shodan CLI (run from a shell, not Python)
shodan search 'org:"Target Corp" http.title:"admin"'
shodan host 203.0.113.45
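Raw Shodan matches are easier to act on when grouped per host. The sketch below works offline on match dictionaries shaped like the ones the loop above reads (`ip_str`, `port`, `product`, and an optional `vulns` collection keyed by CVE ID). The `HIGH_RISK_PORTS` set is an illustrative assumption, not a Shodan feature; tune it to your own policy.

```python
from collections import defaultdict

# Illustrative assumption: ports that usually should not face the internet
HIGH_RISK_PORTS = {21, 23, 445, 3389, 5900, 27017}


def triage(matches):
    """Group Shodan matches by IP, collecting ports and CVEs and flagging risky ports."""
    hosts = defaultdict(lambda: {"ports": set(), "cves": set(), "risky": False})
    for m in matches:
        h = hosts[m["ip_str"]]
        h["ports"].add(m.get("port"))
        h["cves"].update(m.get("vulns", {}))  # dict keys (or list items) are CVE IDs
        if m.get("port") in HIGH_RISK_PORTS:
            h["risky"] = True
    # hosts with CVEs or risky ports sort first, then by IP string
    return sorted(hosts.items(), key=lambda kv: (not (kv[1]["cves"] or kv[1]["risky"]), kv[0]))


if __name__ == "__main__":
    sample = [
        {"ip_str": "203.0.113.10", "port": 443, "product": "nginx"},
        {"ip_str": "203.0.113.45", "port": 3389, "product": "Remote Desktop"},
        {"ip_str": "203.0.113.45", "port": 443, "vulns": {"CVE-2021-44228": {}}},
    ]
    for ip, info in triage(sample):
        print(ip, sorted(info["ports"]), sorted(info["cves"]), info["risky"])
```

In practice you would pass `results['matches']` from the `api.search` call above instead of the hypothetical `sample` list.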

19.3.2 Censys

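from censys.search import CensysHosts

h = CensysHosts(api_id="...", api_secret="...")

# Search the hosts index for services presenting certificates issued to the org.
# search() yields pages; each page is a list of host records.
query = 'services.tls.certificates.leaf_data.subject.organization: "Target Corporation"'
for page in h.search(query, per_page=100, pages=2):
    for host in page:
        services = [(s.get("port"), s.get("service_name")) for s in host.get("services", [])]
        print(host["ip"], services)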

19.3.3 Attack Surface Mapping Matrix

| Source | What It Reveals | Free Tier |
|---|---|---|
| Shodan | Open ports, banners, products, CVEs, default creds | 100 results |
| Censys | TLS certs, services, HTTP metadata | 250 queries/month |
| GreyNoise | Internet noise vs. targeted scanning | Limited |
| crt.sh | Certificate transparency (all publicly logged certs) | Unlimited |
| Subfinder/Amass | Subdomains via passive sources | Open source |
| SecurityTrails | Historical DNS, WHOIS, passive DNS | 50 req/month |
| BuiltWith | Web technologies, CMS, analytics | Limited |
| Wayback Machine | Historical web content, old endpoints | Unlimited |
| GitHub/GitLab | Exposed code, keys, configs | Requires account |

19.4 Credential and Data Breach Intelligence

19.4.1 Breach Monitoring

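# Have I Been Pwned API: check one account (URL-encode the address; @ becomes %40)
# ?truncateResponse=false returns full breach objects instead of names only
curl -H "hibp-api-key: YOUR_KEY" \
  "https://haveibeenpwned.com/api/v3/breachedaccount/user%40target.com?truncateResponse=false"

# All breached addresses @target.com (requires the domain to be verified in your HIBP dashboard)
curl -H "hibp-api-key: YOUR_KEY" \
  "https://haveibeenpwned.com/api/v3/breacheddomain/target.com"

# Breaches in which target.com itself was the breached service (no key required)
curl "https://haveibeenpwned.com/api/v3/breaches?domain=target.com"

# Dehashed: search by domain (operator intelligence)
curl -X GET "https://api.dehashed.com/search?query=domain:target.com" \
  -H "Authorization: Basic BASE64_CREDS" | python3 -m json.tool

# Other sources
# pwndb (Tor): older breach aggregator
# leak-lookup.com: paid but comprehensive
# IntelX (intelx.io): OSINT search engine including Tor, I2P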
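Breach lookups are most useful when they drive the password-reset workflow that control Nexus SecOps-OSINT-02 calls for. A minimal sketch of one simple heuristic: flag breaches that exposed passwords on or after the user's last password change. It assumes full HIBP breach objects (`Name`, `BreachDate` as `YYYY-MM-DD`, `DataClasses` as a list of strings); `needs_reset` is a hypothetical helper, and `last_pw_change` is a hypothetical input taken from your own directory service.

```python
from datetime import date


def needs_reset(breaches, last_pw_change):
    """Return names of breaches that exposed passwords on/after `last_pw_change`.

    breaches: list of HIBP breach objects (full, not truncated)
    last_pw_change: datetime.date from your own directory (hypothetical input)
    """
    hits = []
    for b in breaches:
        breach_day = date.fromisoformat(b["BreachDate"])
        if "Passwords" in b.get("DataClasses", []) and breach_day >= last_pw_change:
            hits.append(b["Name"])
    return hits


if __name__ == "__main__":
    # Hypothetical breach records for illustration only
    sample = [
        {"Name": "ExampleShop", "BreachDate": "2018-05-01", "DataClasses": ["Email addresses", "Passwords"]},
        {"Name": "ExampleForum", "BreachDate": "2024-03-01", "DataClasses": ["Email addresses", "Passwords"]},
    ]
    print(needs_reset(sample, date(2020, 1, 1)))
```

Older breaches still matter when users reuse passwords, so treat this as a prioritization aid rather than a complete exposure check.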

19.4.2 GitHub Secrets Hunting

Developers inadvertently commit API keys, credentials, and private keys to public repositories.

# TruffleHog: detector-based secret scanning; --only-verified keeps credentials confirmed valid
trufflehog github --org=TargetCorp --only-verified

# Gitleaks: scans a local checkout, so clone the repo first
git clone https://github.com/TargetCorp/repo && gitleaks detect --source=repo

# Search-engine dorks against GitHub
# site:github.com "target.com" "password" OR "secret" OR "api_key"
# site:github.com "target.com" "BEGIN RSA PRIVATE KEY"
# site:github.com inurl:TargetCorp "AKIA" (long-term AWS access key IDs start with AKIA)

# GitHub Code Search API (authenticated; -X GET sends -f fields as query parameters)
gh api -X GET search/code -f q='"target.com" password filename:.env'
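Secret scanners combine pattern rules (such as the `AKIA` prefix above) with entropy heuristics that flag random-looking strings. A minimal sketch of both ideas; the token regex, the 20-character minimum, and the 4.0 bits-per-character threshold are illustrative assumptions, and real tools such as TruffleHog and Gitleaks use far larger rule sets.

```python
import math
import re
from collections import Counter

# AWS access key IDs: AKIA (long-term) or ASIA (temporary) + 16 uppercase alphanumerics
AWS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")
# Assumed definition of a "token": 20+ characters drawn from key-like alphabets
TOKEN = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(s):
    """Shannon entropy of `s` in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def find_secrets(text, threshold=4.0):
    """Return (kind, value) pairs for AWS key IDs and high-entropy tokens."""
    found = [("aws_key_id", m) for m in AWS_KEY_ID.findall(text)]
    for tok in TOKEN.findall(text):
        # skip tokens already reported as AWS key IDs
        if shannon_entropy(tok) >= threshold and not AWS_KEY_ID.fullmatch(tok):
            found.append(("high_entropy", tok))
    return found


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation
    print(find_secrets("aws_key = AKIAIOSFODNN7EXAMPLE"))
```

A string of 20 distinct characters scores log2(20), about 4.32 bits per character, while a repeated character scores 0, which is why entropy separates generated keys from ordinary words reasonably well (with false positives on hashes and UUID-like IDs).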

19.4.3 Pastebin / Code Paste Monitoring

# Paste monitoring sources: Pulsedive, IntelX, or Google dorks against paste sites such as:
# site:pastebin.com "target.com" after:2024-01-01
# site:paste.ee "target.com" "password"
# site:controlc.com "@target.com"

19.5 Corporate and People OSINT

19.5.1 LinkedIn Organizational Mapping

LinkedIn provides organizational intelligence for understanding an org's employees, technologies used, and contact points for social engineering assessment.

# linkedin2username — generate potential usernames
python3 linkedin2username.py -u your_linkedin_email -p 'password' -c "Target Corp"

# Email format generation from names
# Common formats: firstname.lastname@, f.lastname@, firstname@
# Tools: hunter.io, phonebook.cz, clearbit

# Hunter.io — find email format + specific emails
curl "https://api.hunter.io/v2/domain-search?domain=target.com&api_key=KEY" | \
  python3 -c "import sys,json; d=json.load(sys.stdin)['data']; print(d['pattern']); [print(e['value']) for e in d['emails']]"
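The email-format step above can be sketched offline: once a pattern is known (from Hunter's `pattern` field or from observed addresses), it is filled in for each harvested name. This sketch assumes Hunter-style placeholders (`{first}`, `{last}`, `{f}`, `{l}`); the fallback list mirrors the common formats named in the comments above, and stripping accents and hyphens is an assumption about how the target builds mailbox names. `candidate_emails` is a hypothetical helper.

```python
import unicodedata

# Fallback formats from the comments above: firstname.lastname@, f.lastname@, firstname@
COMMON_PATTERNS = ["{first}.{last}", "{f}.{last}", "{first}"]


def _ascii_lower(s):
    """Lowercase, drop accents, and keep letters only (assumed normalization)."""
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode()
    return "".join(ch for ch in s.lower() if ch.isalpha())


def candidate_emails(full_name, domain, patterns=COMMON_PATTERNS):
    """Build candidate addresses for `full_name` from local-part patterns."""
    parts = full_name.split()
    first, last = _ascii_lower(parts[0]), _ascii_lower(parts[-1])
    fields = {"first": first, "last": last, "f": first[:1], "l": last[:1]}
    return [p.format(**fields) + "@" + domain for p in patterns]


if __name__ == "__main__":
    print(candidate_emails("Jane Doe", "target.com"))
    print(candidate_emails("José Pérez", "target.com", ["{f}{last}"]))
```

Candidates should be checked against addresses already confirmed in breach data or by Hunter before they are used in a sanctioned social engineering assessment.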

19.5.2 Email Harvesting

# theHarvester — comprehensive email/subdomain/name harvesting
theHarvester -d target.com -l 500 -b bing,crtsh,hunter,shodan -f results   # writes results.xml/.json; available -b sources vary by version

# phonebook.cz (free)
# emailformat.com
# skrapp.io
# voilanorbert.com

19.5.3 OSINT Framework Reference

The OSINT Framework (osintframework.com) provides a categorized tree of free OSINT tools organized by:

  • Username lookup
  • Email address
  • Domain name
  • IP address / Netblock
  • Image / Video / Metadata
  • Social networks (Twitter/X, Facebook, Instagram, TikTok)
  • Phone number
  • Public records
  • Geospatial / Satellite imagery
  • Dark web

19.6 Social Media Intelligence (SOCMINT)

19.6.1 Platform-Specific Techniques

# Twitter/X: tracking threat actor accounts
# Twint (archived) or snscrape; both may no longer work since X restricted unauthenticated access in 2023
snscrape twitter-search "#ransomware url:pastebin.com" > tweets.txt
snscrape twitter-user ThreatActorHandle > history.txt

# Telegram: threat actor channels
# Telegram is widely used by ransomware and hacktivist groups for announcements and leaks
# Tools: telethon (Python library), TeleTracker

# Reddit — exposure in posts/comments
# site:reddit.com "target.com" password
# site:reddit.com "target.com" internal

# Discord — increasingly used by threat actors
# Discord OSINT Tool (DISCORDOSER), Dis.cool

# Facebook — corporate page employee enumeration
# StalkScan (FBID lookup), Lookup-ID

19.6.2 Image Intelligence (IMINT)

# Reverse image search
# Google Lens, TinEye, Yandex Images (best for faces)
# Bing Visual Search

# ExifTool — metadata extraction from images
exiftool image.jpg | grep -E "GPS|Author|Software|Created"

# GPS coordinates from EXIF
exiftool -csv -GPSLatitude -GPSLongitude photos/*.jpg

# Facial recognition OSINT (legal/ethical considerations apply)
# PimEyes (face search engine)
# FaceCheck.id
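EXIF stores GPS positions as degrees, minutes, and seconds plus a hemisphere reference. Mapping tools expect signed decimal degrees: decimal = degrees + minutes/60 + seconds/3600, negated for the southern (S) and western (W) hemispheres. A sketch of that conversion, plus a parser that assumes ExifTool's default print format (for example `40 deg 26' 46.30" N`); if your ExifTool output differs, adjust `DMS_RE`.

```python
import re

# Assumed ExifTool default print format: 40 deg 26' 46.30" N (reference letter optional)
DMS_RE = re.compile(r"(\d+(?:\.\d+)?)\s*deg\s*(\d+(?:\.\d+)?)'\s*(\d+(?:\.\d+)?)\"\s*([NSEW])?")


def dms_to_decimal(degrees, minutes, seconds, ref="N"):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref in ("S", "W") else value


def parse_exiftool_dms(text):
    """Parse a DMS string in ExifTool's default print format (assumed)."""
    m = DMS_RE.search(text)
    if not m:
        raise ValueError(f"unrecognized DMS value: {text!r}")
    d, mi, s, ref = m.groups()
    return dms_to_decimal(float(d), float(mi), float(s), ref or "N")


if __name__ == "__main__":
    lat = parse_exiftool_dms("40 deg 26' 46.302\" N")
    lon = parse_exiftool_dms("79 deg 58' 56.00\" W")
    print(f"{lat:.6f}, {lon:.6f}")
```

A missing reference letter defaults to N/E here; when the reference is stored in a separate tag (GPSLatitudeRef/GPSLongitudeRef), pass it explicitly to `dms_to_decimal`.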

19.7 Dark Web Monitoring

19.7.1 Threat Actor Infrastructure

Surface Web: Clearnet sites, public forums, news
Deep Web: Authenticated content, databases not indexed by search
Dark Web: Tor (.onion), I2P, Freenet — requires special software

Legal and OPSEC Reminder

Accessing dark web sites for intelligence purposes is legal in most jurisdictions. However, purchasing goods or services, downloading CSAM, or engaging in any other illegal activity is a crime. Maintain strict OPSEC when monitoring dark web sources: dedicated hardware, Tails OS, no logins to personal accounts, and an isolated network.

# Tor Browser — access .onion sites
# Tails OS — amnesic live system for maximum OPSEC

# Dark web monitoring services (commercial)
# Digital Shadows (acquired by ReliaQuest)
# Recorded Future: dark web module
# Flare Systems: automated dark web monitoring
# KELA DarkBeast
# DarkOwl Vision

# Free monitoring
# OnionSearch (aggregates multiple dark web search engines)
# Ahmia.fi — surface-accessible dark web search engine
# IntelX.io — indexes dark web content including I2P and Tor

19.7.2 Ransomware Data Leak Sites

Ransomware groups operate "shame sites" on Tor listing non-paying victims.

| Group | Leak Site Status | Notable |
|---|---|---|
| LockBit 3.0 | Disrupted by Operation Cronos (Feb 2024); relaunched on new domains | Largest number of victims |
| ALPHV/BlackCat | Disrupted by FBI (Dec 2023); exit scam (Mar 2024) | First to list hospital patient data |
| Cl0p | Active | Lists bulk victims from zero-day campaigns |
| RansomHub | Active | Rapidly growing affiliate program |

# Monitoring data leak sites
# RansomWatch (ransomwatch.telemetry.ltd) — aggregates leak site content
# DarkFeed.io — commercial threat intelligence feed
# Ransomware.live — community tracking site
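Leak-site aggregators such as those listed above expose posts in machine-readable form, which makes it practical to match new posts against a watchlist of your own organization and key suppliers. A sketch, assuming records with `post_title` and `group_name` fields (verify the actual schema of whichever feed you use); the watchlist entries and `match_watchlist` helper are hypothetical.

```python
import re


def normalize(s):
    """Casefold and collapse punctuation/whitespace so 'Target Corp.' ~ 'target corp'."""
    return " ".join(re.sub(r"[^\w]+", " ", s.casefold()).split())


def match_watchlist(posts, watchlist):
    """Return (watch term, group, original title) for posts mentioning a watched name."""
    terms = {normalize(w): w for w in watchlist}
    hits = []
    for post in posts:
        padded = f" {normalize(post.get('post_title', ''))} "
        for norm, original in terms.items():
            # whole-word match so 'target' does not hit 'targetedmarketing'
            if f" {norm} " in padded:
                hits.append((original, post.get("group_name"), post.get("post_title")))
    return hits


if __name__ == "__main__":
    posts = [
        {"post_title": "Target Corp. (target.com)", "group_name": "examplegroup"},
        {"post_title": "targetedmarketing.io", "group_name": "othergroup"},
    ]
    for hit in match_watchlist(posts, ["Target Corp", "target.com"]):
        print(hit)
```

Exact-phrase matching keeps false positives low but misses abbreviations and subsidiaries, so the watchlist should include known brand names and domains for each monitored entity.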

19.8 Metadata and Document Intelligence

19.8.1 Document Metadata Extraction

Documents published on organizational websites often contain:

  • Author name → employee enumeration
  • Software version → vulnerability identification
  • Internal path (C:\Users\john.doe...) → username disclosure
  • Domain name (creator.corp.local) → internal structure

# FOCA (Fingerprinting Organizations with Collected Archives): Windows GUI tool from
# ElevenPaths that finds, downloads, and analyzes document metadata
# Metagoofil: search for and download documents from the target domain (-w downloads them)
metagoofil -d target.com -t pdf,docx,xlsx,pptx -l 200 -w -o metadata_output/

# ExifTool batch processing: select the revealing tags directly
# (grepping -csv output for tag names would only match the header row)
exiftool -r -csv -Author -Creator -LastModifiedBy -Company -Template metadata_output/ > metadata.csv
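The internal-path and hostname leaks listed above can be pulled out of metadata text with a few regular expressions. A sketch, assuming the input is ExifTool output (or any text) containing Windows (`C:\Users\<name>\`) or Unix (`/home/<name>/`) paths and `.local` hostnames; the patterns and the `metadata_leaks` helper are illustrative and will miss other layouts.

```python
import re

# Windows profile paths: C:\Users\<name>\ or C:\Documents and Settings\<name>\
WIN_USER = re.compile(r"[A-Za-z]:\\(?:Users|Documents and Settings)\\([^\\/:*?\"<>|\r\n]+)", re.IGNORECASE)
# Unix/macOS home directories: /home/<name>/ or /Users/<name>/
UNIX_USER = re.compile(r"/(?:home|Users)/([^/\s]+)")
# Internal AD-style hostnames such as creator.corp.local
LOCAL_HOST = re.compile(r"\b((?:[A-Za-z0-9-]+\.)+local)\b", re.IGNORECASE)


def metadata_leaks(text):
    """Return usernames and internal .local hostnames found in metadata text."""
    users = set(WIN_USER.findall(text)) | set(UNIX_USER.findall(text))
    hosts = {h.lower() for h in LOCAL_HOST.findall(text)}
    return {"usernames": sorted(users), "internal_hosts": sorted(hosts)}


if __name__ == "__main__":
    sample = r"Path: C:\Users\john.doe\Documents\plan.docx | Host: creator.corp.local"
    print(metadata_leaks(sample))
```

Running it over the ExifTool CSV from the command above gives a quick list of usernames to cross-check against LinkedIn-derived names and email formats.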

19.9 Analyst OPSEC

When conducting OSINT on threat actors, maintaining operational security prevents the target from detecting the investigation and modifying their infrastructure.

19.9.1 OPSEC Architecture

graph TB
    A[Analyst Workstation\nFull identity exposure] --> B{Operation\nType?}
    B -->|Passive/safe| C[VPN\nCommercial paid anonymously]
    B -->|Sensitive| D[Dedicated VM\nTails OS]
    B -->|Dark web| E[Dedicated Hardware\nTails OS + Tor]
    C --> F[Target Investigation]
    D --> G[Social Media\nOSINT]
    E --> H[Dark Web\nMonitoring]
    F --> I[Secure Notes\nEncrypted]
    G --> I
    H --> I

    style E fill:#e63946,color:#fff
    style H fill:#780000,color:#fff

19.9.2 OPSEC Rules for OSINT Analysts

  1. Never use personal accounts for research: create dedicated research personas with no personal link
  2. Use separate browsers for different investigations: never mix personal browsing
  3. VPN or Tor before connecting to anything that could log your IP
  4. Screenshot, don't visit: paste URLs into web.archive.org or URLScan.io instead of visiting directly, and set URLScan submissions to private or unlisted, since public scans are searchable by anyone, including the target
  5. Watermark awareness: some threat actors embed unique canary tokens in files; never open threat actor files on an analyst workstation, only in an isolated, offline sandbox
  6. Social media: research personas don't follow each other and never interact with targets
  7. Phone numbers: use Google Voice or a burner service for any account that requires phone verification

19.10 OSINT Automation with Maltego and SpiderFoot

19.10.1 Maltego Transforms

Maltego is the industry standard for link analysis and OSINT automation. It uses "transforms" to pivot from one entity type to another.

Common Maltego pivot chains:
Domain → DNS Records → IP Addresses → Organizations → People
Email Address → Social Profiles → Related Accounts → Location
IP Address → Open Ports → Services → CVEs
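The pivot chains above amount to a breadth-first expansion of a graph in which each transform maps one entity to related entities. The toy model below illustrates that idea with canned, hypothetical transform results; `TRANSFORMS` and `pivot` are illustrative names, not Maltego's API, and a real transform would query a data source instead of a dictionary.

```python
from collections import deque

# Hypothetical canned results: (entity_type, value) -> related entities
TRANSFORMS = {
    ("domain", "target.com"): [("dns", "mail.target.com"), ("dns", "www.target.com")],
    ("dns", "mail.target.com"): [("ip", "203.0.113.10")],
    ("dns", "www.target.com"): [("ip", "203.0.113.10")],
    ("ip", "203.0.113.10"): [("org", "Target Corporation")],
}


def pivot(seed, max_depth=3):
    """Breadth-first pivot from `seed`, returning (parent, child) edges."""
    edges, seen = [], {seed}
    queue = deque([(seed, 0)])
    while queue:
        entity, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit keeps the graph from exploding
        for child in TRANSFORMS.get(entity, []):
            edges.append((entity, child))
            # shared entities get an edge from every parent but are expanded only once
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return edges


if __name__ == "__main__":
    for parent, child in pivot(("domain", "target.com")):
        print(parent, "->", child)
```

The shared IP reached from both hostnames is exactly the kind of convergence link analysis is meant to surface: two apparently separate assets hosted on the same infrastructure.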

19.10.2 SpiderFoot Automation

# SpiderFoot: automated OSINT platform (self-hosted)
git clone https://github.com/smicallef/spiderfoot.git
cd spiderfoot && pip3 install -r requirements.txt
python3 sf.py -l 127.0.0.1:5001    # web UI

# CLI scan (sf.py runs scans directly; sfcli.py is a client for a running server)
python3 sf.py -s target.com -m sfp_dnsresolve,sfp_shodan,sfp_haveibeenpwned

# Modules available: 200+ covering DNS, ports, emails, breaches,
# social media, dark web, reputation, geolocation

19.11 Benchmark Controls

| Control ID | Title | Requirement |
|---|---|---|
| Nexus SecOps-OSINT-01 | External Attack Surface Monitoring | Continuous monitoring of external footprint |
| Nexus SecOps-OSINT-02 | Credential Breach Monitoring | Automated breach notification and password reset workflow |
| Nexus SecOps-OSINT-03 | Code Repository Monitoring | Automated scanning of public repos for exposed secrets |
| Nexus SecOps-OSINT-04 | Dark Web Monitoring | Subscription to at least one dark web monitoring service |
| Nexus SecOps-OSINT-05 | OSINT Analyst OPSEC | Documented procedures for safe threat actor investigation |
| Nexus SecOps-OSINT-06 | Document Metadata Scrubbing | Policy and tooling to remove metadata from externally published documents |

Exam Prep & Certifications

Relevant Certifications

The topics in this chapter align with the following certifications:

  • OSCP — Domains: Information Gathering, Reconnaissance
  • GIAC GOSI — Domains: Open Source Intelligence, Collection, Analysis


Key Terms

ASN (Autonomous System Number) — A globally unique number assigned to an autonomous system by a regional internet registry (ARIN, RIPE, APNIC). Used to identify the routing domain of an organization.

Certificate Transparency — An open framework (RFC 6962) of public, append-only logs that record TLS certificate issuance; major browsers require publicly trusted certificates to be logged. Analysts use CT logs to discover subdomains via crt.sh.

Passive DNS — Historical records of DNS resolutions collected by sensors worldwide. Allows analysts to reconstruct which domain names resolved to which IP addresses, and when, within the period the sensors observed.

Shodan — A search engine that continuously scans the IPv4 internet and indexes banner data, certificates, and service metadata from the ports it probes.

SOCMINT — Social Media Intelligence — intelligence derived from social media platforms.

theHarvester — An open-source OSINT tool that aggregates emails, subdomains, virtual hosts, and employee names from multiple public sources.