Chapter 19: OSINT and Reconnaissance

Overview

Open Source Intelligence (OSINT) is the systematic collection, analysis, and application of information from publicly available sources to support security operations, investigations, threat hunting, and adversary profiling. Before any attack, defenders can use the same reconnaissance techniques adversaries use — mapping the external attack surface, identifying exposed credentials, tracking threat actors, and building threat intelligence. This chapter covers the full OSINT methodology, footprinting techniques, people/corporate OSINT, dark web monitoring, and OPSEC for analysts.

Learning Objectives

By the end of this chapter, students SHALL be able to:

Apply the Intelligence Cycle to OSINT collection campaigns
Map an organization's external attack surface using passive and active footprinting
Enumerate employees, email addresses, and credentials from public sources
Conduct infrastructure OSINT using Shodan, Censys, and passive DNS
Track threat actors across the surface, deep, and dark web
Maintain analyst OPSEC throughout an investigation

Prerequisites

Basic understanding of DNS, HTTP, and web technologies
Familiarity with Linux command line
Understanding of IP addressing and BGP routing

Why This Matters

Every detail an attacker uses in a spearphishing email, every credential stuffing list, and every exploitable service they target was discovered through reconnaissance. The Verizon DBIR has repeatedly ranked stolen credentials and phishing among the leading ways attackers get into organizations. Defenders who perform proactive OSINT on their own organization reduce the attacker's information advantage, and threat intelligence analysts who track adversary infrastructure can disrupt campaigns before they launch.

19.1 The OSINT Intelligence Cycle

flowchart LR
    A[1. PLANNING\nDefine requirements\n& key questions] --> B[2. COLLECTION\nGather raw data\nfrom sources]
    B --> C[3. PROCESSING\nNormalize, translate,\ndeduplicate]
    C --> D[4. ANALYSIS\nFuse, assess,\ninterpret]
    D --> E[5. PRODUCTION\nReport, brief,\ndisseminate]
    E --> F[6. FEEDBACK\nConsumer evaluation]
    F --> A

    style A fill:#1d3557,color:#fff
    style B fill:#457b9d,color:#fff
    style C fill:#a8dadc,color:#000
    style D fill:#f4a261,color:#000
    style E fill:#e63946,color:#fff
    style F fill:#2d6a4f,color:#fff

Intelligence Requirements (IRs) drive collection. Without a defined question, OSINT collection degenerates into data hoarding. Examples:

"What email addresses are exposed in breach databases for our domain?"
"What external ports/services are accessible from the internet on our IP ranges?"
"Is threat actor TA577 targeting our industry sector this quarter?"
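One way to keep collection tied to requirements is to record each IR next to the sources planned to answer it, then flag any planned source that no IR asks for. The sketch below is illustrative only: the `IntelRequirement` class, the `unmapped_sources` helper, and the source labels are hypothetical names, not part of any standard.

```python
from dataclasses import dataclass, field


@dataclass
class IntelRequirement:
    """One intelligence requirement and the sources planned to answer it (hypothetical structure)."""
    ir_id: str
    question: str
    sources: list[str] = field(default_factory=list)


def unmapped_sources(requirements, planned_sources):
    """Return planned sources that no requirement asks for (candidates for data hoarding)."""
    needed = {s for r in requirements for s in r.sources}
    return sorted(set(planned_sources) - needed)


requirements = [
    IntelRequirement("IR-1", "What email addresses are exposed in breach databases for our domain?",
                     ["hibp", "dehashed"]),
    IntelRequirement("IR-2", "What external ports/services are accessible on our IP ranges?",
                     ["shodan", "censys"]),
]

# 'pastebin' is planned but answers no stated requirement
print(unmapped_sources(requirements, ["hibp", "shodan", "pastebin", "censys"]))  # ['pastebin']
```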

19.2 Passive DNS and IP Footprinting

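Passive reconnaissance gathers information without sending packets to the target. Commands in this section that query the target's own name servers, such as a zone transfer attempt, are active and belong inside an authorized scope.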

19.2.1 DNS Enumeration

# Basic DNS record enumeration (many servers answer ANY minimally per RFC 8482, so also query each record type)
dig @8.8.8.8 target.com ANY
dig @8.8.8.8 target.com MX
dig @8.8.8.8 target.com TXT  # SPF, DKIM, DMARC — reveals email providers
dig @8.8.8.8 target.com NS

# Zone transfer attempt (active: queries the target's own name server; rarely succeeds but worth checking)
dig @ns1.target.com target.com AXFR

# Passive subdomain enumeration first, then resolve the discovered names
subfinder -d target.com -all -recursive -silent | tee subdomains.txt
dnsx -l subdomains.txt -a -aaaa -cname -mx -ns -resp -silent

# Certificate Transparency logs (crt.sh)
curl -s "https://crt.sh/?q=%25.target.com&output=json" | \
  python3 -c "import sys,json; [print(e['name_value']) for e in json.load(sys.stdin)]" | \
  sort -u

# Passive DNS — historical resolution
# SecurityTrails, RiskIQ, VirusTotal, Circl.lu pDNS
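The crt.sh one-liner above can be made a little more robust in Python. This is a minimal sketch that assumes the response is a JSON array whose entries carry a `name_value` field, as the one-liner already does; the helper name `subdomains_from_crtsh` and the sample data are made up for illustration. It works on a JSON string so it can be tried offline.

```python
import json


def subdomains_from_crtsh(raw_json: str, domain: str) -> list[str]:
    """Extract unique hostnames under `domain` from a crt.sh-style JSON response."""
    domain = domain.lower()
    suffix = "." + domain
    names = set()
    for entry in json.loads(raw_json):
        # name_value may hold several SAN entries separated by newlines
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # drop the wildcard label
            if name == domain or name.endswith(suffix):
                names.add(name)
    return sorted(names)


sample = json.dumps([
    {"name_value": "www.target.com\n*.dev.target.com"},
    {"name_value": "WWW.target.com"},
    {"name_value": "mail.other.org"},
])
print(subdomains_from_crtsh(sample, "target.com"))  # ['dev.target.com', 'www.target.com']
```

Lower-casing and stripping the wildcard label keeps the list free of duplicates before it is handed to a resolver.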

19.2.2 ASN and IP Range Mapping

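# Find the ASN that announces a known IP (Team Cymru IP-to-ASN whois)
whois -h whois.cymru.com " -v 203.0.113.45"
# bgp.he.net: BGP toolkit, searchable by organization name

# Get all IP ranges originated by an ASN (IRR route objects)
whois -h whois.radb.net -- '-i origin AS12345' | grep -E '^route6?:'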

# Reverse WHOIS — find all domains registered by an entity
# Domaintools, SecurityTrails, ViewDNS

# AMASS — comprehensive subdomain/IP enumeration
amass enum -d target.com -passive -o amass_output.txt
amass intel -org "Target Corporation" -max-dns-queries 20000
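Route objects returned by an IRR query often overlap or sit next to each other. As a rough sketch, the standard-library `ipaddress` module can collapse them into a minimal set of IPv4 networks and count the addresses in scope. The helper name `summarize_routes` and the sample text are assumptions for illustration; only `route:` lines are parsed and `route6:` lines are skipped.

```python
import ipaddress


def summarize_routes(whois_text: str):
    """Collapse `route:` lines from IRR whois output into merged IPv4 networks."""
    nets = []
    for line in whois_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "route":  # IPv4 only; 'route6' is ignored
            nets.append(ipaddress.ip_network(value.strip(), strict=False))
    merged = list(ipaddress.collapse_addresses(nets))
    total = sum(n.num_addresses for n in merged)
    return merged, total


sample_whois = """\
route:      198.51.100.0/25
descr:      Example network
route:      198.51.100.128/25
route:      203.0.113.0/24
route6:     2001:db8::/32
"""

merged, total = summarize_routes(sample_whois)
print([str(n) for n in merged], total)  # ['198.51.100.0/24', '203.0.113.0/24'] 512
```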

19.3 Internet-Wide Scanning Intelligence

19.3.1 Shodan

Shodan indexes internet-connected devices — servers, cameras, industrial control systems, printers, routers. It continuously scans the entire IPv4 space and stores banner data.

import shodan

api = shodan.Shodan("YOUR_API_KEY")

# Search for services exposing target org
results = api.search('org:"Target Corporation" port:22,443,3389,8080')
for r in results['matches']:
    print(f"{r['ip_str']}:{r.get('port')} | {r.get('product','')} | {r.get('vulns','')}")

# Find exposed databases
results = api.search('org:"Target Corporation" product:"MongoDB"')

# Find SSL certs for org
results = api.search('ssl.cert.subject.cn:"*.target.com"')

# Specific CVE exposure
results = api.search('vuln:CVE-2021-44228')  # Log4Shell

# Shodan CLI (run these in a shell, not in the Python session above)
shodan search 'org:"Target Corp" http.title:"admin"'
shodan host 203.0.113.45
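Search results are easier to act on once they are grouped per host. The following sketch works on dictionaries shaped like the `matches` entries used above (`ip_str`, `port`, `vulns`); the `RISKY_PORTS` table and the `triage_matches` helper are hypothetical, and `vulns` is assumed to be a dict or list keyed by CVE ID when present.

```python
from collections import defaultdict

# Illustrative, not exhaustive: services that rarely belong on the open internet
RISKY_PORTS = {23: "telnet", 3389: "rdp", 5900: "vnc", 27017: "mongodb"}


def triage_matches(matches):
    """Group Shodan-style match dicts by IP and flag risky ports and listed CVEs."""
    hosts = defaultdict(lambda: {"ports": set(), "risky": set(), "cves": set()})
    for m in matches:
        host = hosts[m["ip_str"]]
        port = m.get("port")
        host["ports"].add(port)
        if port in RISKY_PORTS:
            host["risky"].add(RISKY_PORTS[port])
        # Assumption: 'vulns' is a dict or list whose keys/items are CVE IDs
        host["cves"].update(m.get("vulns") or [])
    return dict(hosts)


sample_matches = [
    {"ip_str": "203.0.113.45", "port": 443, "vulns": {"CVE-2021-44228": {}}},
    {"ip_str": "203.0.113.45", "port": 3389},
    {"ip_str": "203.0.113.46", "port": 22},
]

for ip, info in triage_matches(sample_matches).items():
    print(ip, sorted(info["ports"]), sorted(info["risky"]), sorted(info["cves"]))
```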

19.3.2 Censys

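from censys.search import CensysHosts

h = CensysHosts(api_id="...", api_secret="...")

# Search hosts presenting a certificate issued to the org.
# Field name follows the Censys Search 2.0 hosts schema; adjust it if your account uses a newer API.
query = h.search(
    'services.tls.certificates.leaf_data.subject.organization: "Target Corporation"',
    per_page=100,
    pages=1,
)
for page in query:          # each page is a list of host dicts
    for host in page:
        ports = [s.get("port") for s in host.get("services", [])]
        print(host["ip"], ports)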

19.3.3 Attack Surface Mapping Matrix

Source	What It Reveals	Free Tier
Shodan	Open ports, banners, products, CVEs, default creds	100 results
Censys	TLS certs, services, HTTP metadata	250 queries/month
GreyNoise	Internet noise vs. targeted scanning	Limited
crt.sh	Certificate transparency (all issued certs)	Unlimited
Subfinder/Amass	Subdomains via passive sources	Open source
SecurityTrails	Historical DNS, WHOIS, passive DNS	50 req/month
BuiltWith	Web technologies, CMS, analytics	Limited
Wayback Machine	Historical web content, old endpoints	Unlimited
GitHub/GitLab	Exposed code, keys, configs	Requires account

19.4 Credential and Data Breach Intelligence

19.4.1 Breach Monitoring

# Have I Been Pwned API — check email/domain
curl -H "hibp-api-key: YOUR_KEY" \
  "https://haveibeenpwned.com/api/v3/breachedaccount/user@target.com"

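# Breach catalogue filtered to breaches of the target.com service itself (no API key required)
curl "https://haveibeenpwned.com/api/v3/breaches?domain=target.com"

# Breached accounts on a domain (the domain must first be verified in the HIBP dashboard)
curl -H "hibp-api-key: YOUR_KEY" \
  "https://haveibeenpwned.com/api/v3/breacheddomain/target.com"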

# Dehashed — search by domain (operator intelligence)
curl -X GET "https://api.dehashed.com/search?query=domain:target.com" \
  -H "Authorization: Basic BASE64_CREDS" | python3 -m json.tool

# Other sources (mix of free and paid)
# pwndb (Tor) — older breach aggregator
# leak-lookup.com — paid but comprehensive
# IntelX (intelligencex.io) — OSINT search engine including Tor, I2P
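Once breach records are retrieved, a short filter can decide which ones should trigger a password reset. This sketch assumes records in the HIBP breach model (`Name`, `BreachDate`, `DataClasses`), which the `breachedaccount` endpoint returns in full only when called with `truncateResponse=false`. The helper name and the sample records are invented for illustration.

```python
from datetime import date


def breaches_needing_reset(breaches, since: date):
    """Return names of breaches that exposed passwords on or after `since`, newest first."""
    hits = []
    for b in breaches:
        breached_on = date.fromisoformat(b["BreachDate"])
        if "Passwords" in b.get("DataClasses", []) and breached_on >= since:
            hits.append((breached_on, b["Name"]))
    return [name for _, name in sorted(hits, reverse=True)]


# Fictional records in the shape of HIBP breach objects
sample_breaches = [
    {"Name": "ExampleForum", "BreachDate": "2019-03-01", "DataClasses": ["Email addresses", "Passwords"]},
    {"Name": "ExampleShop", "BreachDate": "2023-07-15", "DataClasses": ["Email addresses", "Passwords"]},
    {"Name": "ExampleNews", "BreachDate": "2024-01-10", "DataClasses": ["Email addresses"]},
]

print(breaches_needing_reset(sample_breaches, since=date(2022, 1, 1)))  # ['ExampleShop']
```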

19.4.2 GitHub Secrets Hunting

Developers inadvertently commit API keys, credentials, and private keys to public repositories.

# TruffleHog: detector-based secret scanning; --only-verified keeps only credentials confirmed live
trufflehog github --org=TargetCorp --only-verified

# Gitleaks: scan a specific repo (--source takes a local path, so clone first)
git clone https://github.com/TargetCorp/repo && gitleaks detect --source=repo

# Google dorks against GitHub
# site:github.com "target.com" "password" OR "secret" OR "api_key"
# site:github.com "target.com" "BEGIN RSA PRIVATE KEY"
# site:github.com inurl:TargetCorp "AKIA" (long-term AWS access key IDs start with AKIA)

# GitHub Code Search API (requires an authenticated gh session)
gh api -X GET search/code -f q='"target.com" password filename:.env'
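The patterns these tools look for can be illustrated with two regular expressions. This is a deliberately small sketch, not a substitute for TruffleHog or Gitleaks: it only matches the `AKIA` access key ID prefix and PEM private key headers mentioned above, and the `find_secrets` helper is a hypothetical name. The key in the sample is the placeholder value used in AWS documentation.

```python
import re

PATTERNS = {
    # Long-term AWS access key IDs: 'AKIA' followed by 16 uppercase letters or digits
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM private key headers, with or without an algorithm label
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}


def find_secrets(text: str):
    """Return (line_number, pattern_name) pairs for lines that match a pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


sample_env = """DB_HOST=db.target.com
AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE
-----BEGIN RSA PRIVATE KEY-----
"""

print(find_secrets(sample_env))  # [(2, 'aws_access_key_id'), (3, 'private_key_header')]
```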

19.4.3 Pastebin / Code Paste Monitoring

# Pulsedive, IntelX, pastebin Google dork
# site:pastebin.com "target.com" after:2024-01-01
# site:paste.ee "target.com" "password"
# site:controlc.com "@target.com"

19.5 Corporate and People OSINT

19.5.1 LinkedIn Organizational Mapping

LinkedIn reveals who works at an organization, which technologies its staff list in their profiles, and which roles make likely contact points in a social engineering assessment.

# linkedin2username — generate potential usernames
python3 linkedin2username.py -u your_linkedin_email -p 'password' -c "Target Corp"

# Email format generation from names
# Common formats: firstname.lastname@, f.lastname@, firstname@
# Tools: hunter.io, phonebook.cz, clearbit

# Hunter.io — find email format + specific emails
curl "https://api.hunter.io/v2/domain-search?domain=target.com&api_key=KEY" | \
  python3 -c "import sys,json; d=json.load(sys.stdin)['data']; print(d['pattern']); [print(e['value']) for e in d['emails']]"
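Email format generation from names is simple enough to sketch directly. The example below covers only the three common formats listed above (firstname.lastname@, f.lastname@, firstname@); the `candidate_emails` helper is a hypothetical name, and real organizations may use other patterns, which is what the Hunter.io `pattern` field is for.

```python
def candidate_emails(full_name: str, domain: str) -> list[str]:
    """Generate candidate addresses for one person in three common formats."""
    parts = full_name.lower().split()
    if len(parts) < 2:
        return []  # need at least a first and a last name
    first, last = parts[0], parts[-1]
    local_parts = [
        f"{first}.{last}",     # firstname.lastname@
        f"{first[0]}.{last}",  # f.lastname@
        first,                 # firstname@
    ]
    return [f"{local}@{domain}" for local in local_parts]


print(candidate_emails("John Doe", "target.com"))
# ['john.doe@target.com', 'j.doe@target.com', 'john@target.com']
```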

19.5.2 Email Harvesting

# theHarvester — comprehensive email/subdomain/name harvesting
theHarvester -d target.com -l 500 -b google,bing,linkedin,hunter,shodan -f output.html

# phonebook.cz (free)
# emailformat.com
# skrapp.io
# voilanorbert.com

19.5.3 OSINT Framework Reference

The OSINT Framework (osintframework.com) provides a categorized tree of free OSINT tools organized by:

Username lookup
Email address
Domain name
IP address / Netblock
Image / Video / Metadata
Social networks (Twitter/X, Facebook, Instagram, TikTok)
Phone number
Public records
Geospatial / Satellite imagery
Dark web

19.6 Social Media Intelligence (SOCMINT)

19.6.1 Platform-Specific Techniques

# Twitter/X — tracking threat actor accounts
# Twint (archived) or snscrape (its Twitter/X modules have been unreliable since the 2023 platform changes)
snscrape twitter-search "#ransomware site:pastebin.com" > tweets.txt
snscrape twitter-user ThreatActorHandle > history.txt

# Telegram — threat actor channels
# Telegram is a primary communication platform for ransomware groups
# Tools: telethon (Python library), TeleTracker

# Reddit — exposure in posts/comments
# site:reddit.com "target.com" password
# site:reddit.com "target.com" internal

# Discord — increasingly used by threat actors
# Discord OSINT Tool (DISCORDOSER), Dis.cool

# Facebook — corporate page employee enumeration
# StalkScan (FBID lookup), Lookup-ID

19.6.2 Image Intelligence (IMINT)

# Reverse image search
# Google Lens, TinEye, Yandex Images (best for faces)
# Bing Visual Search

# ExifTool — metadata extraction from images
exiftool image.jpg | grep -E "GPS|Author|Software|Create"

# GPS coordinates from EXIF
exiftool -csv -GPSLatitude -GPSLongitude photos/*.jpg

# Facial recognition OSINT (legal/ethical considerations apply)
# PimEyes (face search engine)
# FaceCheck.id
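By default ExifTool prints GPS coordinates in degrees, minutes, and seconds, which most mapping tools do not accept directly. The sketch below converts that text form to signed decimal degrees; it assumes the `40 deg 26' 46.32" N` layout that ExifTool prints by default, and `dms_to_decimal` is a hypothetical helper name.

```python
import re

# Matches strings such as: 40 deg 26' 46.32" N
_DMS = re.compile(r"""(\d+)\s*deg\s*(\d+)'\s*([\d.]+)"\s*([NSEW])""")


def dms_to_decimal(value: str) -> float:
    """Convert an ExifTool-style DMS coordinate string to signed decimal degrees."""
    m = _DMS.search(value)
    if not m:
        raise ValueError(f"unrecognized coordinate: {value!r}")
    deg, minutes, seconds, ref = m.groups()
    decimal = int(deg) + int(minutes) / 60 + float(seconds) / 3600
    # South and West are negative by convention
    return -decimal if ref in "SW" else decimal


lat = dms_to_decimal("40 deg 26' 46.32\" N")
lon = dms_to_decimal("79 deg 58' 56.00\" W")
print(round(lat, 4), round(lon, 4))  # 40.4462 -79.9822
```

Passing `-n` to ExifTool returns numeric values directly and avoids the conversion altogether.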

19.7 Dark Web Monitoring

19.7.1 Threat Actor Infrastructure

Surface Web: Clearnet sites, public forums, news
Deep Web: Authenticated content, databases not indexed by search
Dark Web: Tor (.onion), I2P, Freenet — requires special software

Legal and OPSEC Reminder

Accessing dark web sites for intelligence purposes is legal in most jurisdictions, but purchasing illegal goods or services, downloading CSAM, or taking part in any other illegal activity is a crime. Maintain strict OPSEC when monitoring dark web sources: dedicated hardware, Tails OS, no login to personal accounts, isolated network.

# Tor Browser — access .onion sites
# Tails OS — amnesic live system for maximum OPSEC

# Dark web monitoring services (commercial)
# Digital Shadows (now part of ReliaQuest)
# Recorded Future — dark web module
# Flare Systems — automated dark web monitoring
# Kela Darkbeast
# DarkOwl Vision

# Free monitoring
# OnionSearch (aggregates multiple dark web search engines)
# Ahmia.fi — surface-accessible dark web search engine
# IntelX.io — indexes dark web content including I2P and Tor

19.7.2 Ransomware Data Leak Sites

Ransomware groups operate "shame sites" on Tor listing non-paying victims.

Group	.onion Site Status	Notable
LockBit 3.0	Disrupted by Operation Cronos Feb 2024; resurfaced afterwards (domain rotates)	Largest number of victims
ALPHV/BlackCat	Disrupted by FBI Dec 2023; shut down Mar 2024	First to list hospital patient data
Cl0p	Active	Lists bulk victims from zero-day campaigns
RansomHub	Active	Rapidly growing affiliate program

# Monitoring data leak sites
# RansomWatch (ransomwatch.telemetry.ltd) — aggregates leak site content
# DarkFeed.io — commercial threat intelligence feed
# Ransomware.live — community tracking site
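Leak site aggregators are most useful when their posts are checked against a watchlist of your own organization, subsidiaries, and key suppliers. The sketch below assumes each post is a dict with `group_name` and `post_title` keys; those field names are an assumption about the feed format and should be checked against the aggregator you use. The `match_watchlist` helper and the sample posts are invented.

```python
def match_watchlist(posts, watchlist):
    """Return (group, title) pairs for leak-site posts whose title mentions a watched name."""
    watched = [w.lower() for w in watchlist]
    hits = []
    for post in posts:
        title = post.get("post_title", "")
        if any(name in title.lower() for name in watched):
            hits.append((post.get("group_name", "unknown"), title))
    return hits


# Fictional posts; field names are assumed, not taken from a documented schema
sample_posts = [
    {"group_name": "lockbit3", "post_title": "Example Logistics GmbH"},
    {"group_name": "clop", "post_title": "unrelated.example"},
]

print(match_watchlist(sample_posts, ["example logistics", "target corp"]))
# [('lockbit3', 'Example Logistics GmbH')]
```

Substring matching is crude and will miss abbreviations or domain-only listings, so a real watchlist usually includes domains and trading names as well.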

19.8 Metadata and Document Intelligence

19.8.1 Document Metadata Extraction

Documents published on organizational websites often contain:

Author name → employee enumeration
Software version → vulnerability identification
Internal path (C:\Users\john.doe...) → username disclosure
Domain name (creator.corp.local) → internal structure

# FOCA (Fingerprinting Organizations with Collected Archives): automated document metadata extraction (Windows GUI)
# Metagoofil: download and analyze documents from target domain
metagoofil -d target.com -t pdf,docx,xlsx,pptx -l 200 -o metadata_output/

# ExifTool batch processing: one CSV row per file, limited to the identifying tags
exiftool -csv -Author -Creator -LastModifiedBy -Company -Template *.pdf *.docx *.xlsx
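Internal paths embedded in document metadata disclose usernames, and pulling them out is a one-regex job. The following sketch scans arbitrary metadata values for Windows, Linux, and macOS home-directory paths; the `usernames_from_paths` helper and the sample values are hypothetical.

```python
import re

# Home-directory prefixes on Windows (C:\Users\), Linux (/home/), and macOS (/Users/)
_USER_PATH = re.compile(r"(?i)(?:[A-Z]:\\Users\\|/home/|/Users/)([^\\/]+)")


def usernames_from_paths(values):
    """Extract unique usernames from file paths found in metadata values."""
    found = set()
    for value in values:
        for match in _USER_PATH.finditer(value):
            found.add(match.group(1).lower())
    return sorted(found)


sample_metadata = [
    r"C:\Users\john.doe\Documents\report.docx",
    "/home/asmith/draft.pdf",
    "Microsoft Word 2016",
]

print(usernames_from_paths(sample_metadata))  # ['asmith', 'john.doe']
```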

19.9 Analyst OPSEC

When conducting OSINT on threat actors, maintaining operational security prevents the target from detecting the investigation and modifying their infrastructure.

19.9.1 OPSEC Architecture

graph TB
    A[Analyst Workstation\nFull identity exposure] --> B{Operation\nType?}
    B -->|Passive/safe| C[VPN\nCommercial paid anonymously]
    B -->|Sensitive| D[Dedicated VM\nTails OS]
    B -->|Dark web| E[Dedicated Hardware\nTails OS + Tor]
    C --> F[Target Investigation]
    D --> G[Social Media\nOSINT]
    E --> H[Dark Web\nMonitoring]
    F --> I[Secure Notes\nEncrypted]
    G --> I
    H --> I

    style E fill:#e63946,color:#fff
    style H fill:#780000,color:#fff

19.9.2 OPSEC Rules for OSINT Analysts

Never use personal accounts for research — create dedicated research personas with no personal link
Use separate browsers for different investigations — never mix personal browsing
Connect through a VPN or Tor before touching anything that could log your IP
Screenshot, don't visit — paste URLs into web.archive.org or URLScan.io instead of visiting directly
Watermark awareness — some threat actors embed unique canary tokens in files; never open files from threat actors
Social media — research profiles don't follow each other and don't interact with targets
Phone numbers — use Google Voice or Burner for any accounts requiring phone verification
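The "screenshot, don't visit" rule is easier to follow when suspicious URLs are never stored in clickable form. As a small sketch, the helpers below defang a URL for notes and build a Wayback Machine availability query that checks for an existing snapshot instead of contacting the site. The function names are hypothetical; the `archive.org/wayback/available` endpoint is the public availability API.

```python
from urllib.parse import quote


def defang(url: str) -> str:
    """Make a URL non-clickable for notes and reports (http -> hxxp, '.' -> '[.]')."""
    return url.replace("http", "hxxp", 1).replace(".", "[.]")


def wayback_lookup(url: str) -> str:
    """Build a Wayback Machine availability query for an existing snapshot of `url`."""
    return "https://archive.org/wayback/available?url=" + quote(url, safe="")


suspect = "https://evil.example.com/login"
print(defang(suspect))          # hxxps://evil[.]example[.]com/login
print(wayback_lookup(suspect))  # https://archive.org/wayback/available?url=https%3A%2F%2Fevil.example.com%2Flogin
```

Building the lookup URL does not send any traffic; the request to the archive should still go out through the research VPN or Tor.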

19.10 OSINT Automation with Maltego and SpiderFoot

19.10.1 Maltego Transforms

Maltego is a widely used tool for link analysis and OSINT automation. It uses "transforms" to pivot from one entity type to another.

Common Maltego pivot chains:
Domain → DNS Records → IP Addresses → Organizations → People
Email Address → Social Profiles → Related Accounts → Location
IP Address → Open Ports → Services → CVEs
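The pivot chains above can be read as paths through a graph of entity types. The sketch below models that idea with a plain dictionary and a breadth-first search; it is a conceptual illustration only and does not use the Maltego API. The `TRANSFORMS` table is a simplified, hypothetical subset built from the chains listed above.

```python
from collections import deque

# Hypothetical transform table: entity type -> entity types reachable in one pivot
TRANSFORMS = {
    "Domain": ["DNS Record"],
    "DNS Record": ["IP Address"],
    "IP Address": ["Organization", "Open Port"],
    "Organization": ["Person"],
    "Open Port": ["Service"],
    "Service": ["CVE"],
}


def pivot_path(start, goal):
    """Breadth-first search for the shortest chain of pivots from start to goal."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in TRANSFORMS.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no chain of transforms connects the two types


print(" -> ".join(pivot_path("Domain", "CVE")))
# Domain -> DNS Record -> IP Address -> Open Port -> Service -> CVE
```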

19.10.2 SpiderFoot Automation

# SpiderFoot — automated OSINT platform (self-hosted)
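git clone https://github.com/smicallef/spiderfoot.git && cd spiderfoot
pip3 install -r requirements.txt
python3 sf.py -l 127.0.0.1:5001   # start the web UI

# CLI scan (sf.py runs scans directly; sfcli.py is only a client for a running server)
python3 sf.py -s target.com -m sfp_dnsresolve,sfp_shodan,sfp_haveibeenpwned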

# Modules available: 200+ covering DNS, ports, emails, breaches,
# social media, dark web, reputation, geolocation

19.11 Benchmark Controls

Control ID	Title	Requirement
Nexus SecOps-OSINT-01	External Attack Surface Monitoring	Continuous monitoring of external footprint
Nexus SecOps-OSINT-02	Credential Breach Monitoring	Automated breach notification and password reset workflow
Nexus SecOps-OSINT-03	Code Repository Monitoring	Automated scanning of public repos for exposed secrets
Nexus SecOps-OSINT-04	Dark Web Monitoring	Subscription to at least one dark web monitoring service
Nexus SecOps-OSINT-05	OSINT Analyst OPSEC	Documented procedures for safe threat actor investigation
Nexus SecOps-OSINT-06	Document Metadata Scrubbing	Policy and tooling to remove metadata from externally published documents

Exam Prep & Certifications

Relevant Certifications

The topics in this chapter align with the following certifications:

OSCP — Domains: Information Gathering, Reconnaissance
GIAC GOSI — Domains: Open Source Intelligence, Collection, Analysis


Key Terms

ASN (Autonomous System Number) — A globally unique number assigned to an autonomous system by a regional internet registry (ARIN, RIPE, APNIC). Used to identify the routing domain of an organization.

Certificate Transparency — An open framework (RFC 6962) for recording publicly trusted TLS certificate issuances in public, append-only logs; major browsers require certificates to be logged. Analysts use CT logs to discover subdomains via crt.sh.

Passive DNS — Historical records of DNS resolutions collected by sensors worldwide. Allows analysts to reconstruct which domain names resolved to which IP addresses over time, to the extent the sensors observed those lookups.

Shodan — A search engine that continuously scans the IPv4 internet and indexes banner data, certificates, and service metadata from the open ports it probes.

SOCMINT — Social Media Intelligence — intelligence derived from social media platforms.

theHarvester — An open-source OSINT tool that aggregates emails, subdomains, virtual hosts, and employee names from multiple public sources.