Chapter 19: OSINT and Reconnaissance
Overview
Open Source Intelligence (OSINT) is the systematic collection, analysis, and application of information from publicly available sources to support security operations, investigations, threat hunting, and adversary profiling. Defenders can use the same reconnaissance techniques that adversaries use before an attack: mapping the external attack surface, identifying exposed credentials, tracking threat actors, and building threat intelligence. This chapter covers the OSINT methodology, footprinting techniques, people and corporate OSINT, dark web monitoring, and OPSEC for analysts.
Learning Objectives
By the end of this chapter, students SHALL be able to:
- Apply the Intelligence Cycle to OSINT collection campaigns
- Map an organization's external attack surface using passive and active footprinting
- Enumerate employees, email addresses, and credentials from public sources
- Conduct infrastructure OSINT using Shodan, Censys, and passive DNS
- Track threat actors across the surface, deep, and dark web
- Maintain analyst OPSEC throughout an investigation
Prerequisites
- Basic understanding of DNS, HTTP, and web technologies
- Familiarity with Linux command line
- Understanding of IP addressing and BGP routing
Why This Matters
Every detail an attacker puts in a spearphishing email, every credential stuffing list, and every exploitable service they target was discovered through reconnaissance. The Verizon DBIR 2024 again ranks stolen credentials and phishing among the leading ways attackers gain initial access. Defenders who perform proactive OSINT on their own organization reduce the attacker's information advantage. Threat intelligence analysts who track adversary infrastructure can disrupt campaigns before they launch.
19.1 The OSINT Intelligence Cycle
flowchart LR
A[1. PLANNING\nDefine requirements\n& key questions] --> B[2. COLLECTION\nGather raw data\nfrom sources]
B --> C[3. PROCESSING\nNormalize, translate,\ndeduplicate]
C --> D[4. ANALYSIS\nFuse, assess,\ninterpret]
D --> E[5. PRODUCTION\nReport, brief,\ndisseminate]
E --> F[6. FEEDBACK\nConsumer evaluation]
F --> A
style A fill:#1d3557,color:#fff
style B fill:#457b9d,color:#fff
style C fill:#a8dadc,color:#000
style D fill:#f4a261,color:#000
style E fill:#e63946,color:#fff
style F fill:#2d6a4f,color:#fff
Intelligence Requirements (IRs) drive collection. Without a defined question, OSINT collection degenerates into data hoarding. Examples:
- "What email addresses are exposed in breach databases for our domain?"
- "What external ports/services are accessible from the internet on our IP ranges?"
- "Is threat actor TA577 targeting our industry sector this quarter?"
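19.2 Passive DNS and IP Footprinting
Passive reconnaissance gathers information without sending packets to the target's own infrastructure. Zone transfer attempts and other queries sent to the target's name servers are active techniques, because the target can log them.
19.2.1 DNS Enumeration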
# Basic DNS record enumeration
dig @8.8.8.8 target.com ANY # often refused or minimized by resolvers (RFC 8482)
dig @8.8.8.8 target.com MX
dig @8.8.8.8 target.com TXT # SPF, DKIM, DMARC — reveals email providers
dig @8.8.8.8 target.com NS
# Zone transfer attempt (rarely succeeds but worth checking)
dig @ns1.target.com target.com AXFR
# Passive subdomain enumeration, then resolve what was found
subfinder -d target.com -all -recursive -silent | tee subdomains.txt
dnsx -l subdomains.txt -a -aaaa -cname -mx -ns -silent
# Certificate Transparency logs (crt.sh); %25 is the URL-encoded % wildcard
curl -s "https://crt.sh/?q=%25.target.com&output=json" | \
python3 -c "import sys,json; [print(e['name_value']) for e in json.load(sys.stdin)]" | \
sort -u
# Passive DNS — historical resolution
# SecurityTrails, RiskIQ, VirusTotal, Circl.lu pDNS
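The crt.sh one-liner above prints raw `name_value` fields, which can contain several newline-separated names, wildcard entries, and mixed case. The following is a minimal sketch of how that output might be normalized before feeding it to a resolver. The function name `extract_subdomains` and the sample records are illustrative assumptions; only the `name_value` field comes from the command above.

```python
import json

def extract_subdomains(crtsh_json, domain):
    """Return sorted unique hostnames under `domain` from crt.sh JSON output."""
    names = set()
    for entry in json.loads(crtsh_json):
        # name_value may hold several newline-separated SAN entries
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # drop the wildcard label
            # keep only the domain itself and true subdomains
            if name == domain or name.endswith("." + domain):
                names.add(name)
    return sorted(names)

# Hypothetical sample mimicking the shape of a crt.sh response
sample = json.dumps([
    {"name_value": "www.target.com\nmail.target.com"},
    {"name_value": "*.dev.target.com"},
    {"name_value": "WWW.target.com"},
    {"name_value": "nottarget.com"},
])
print(extract_subdomains(sample, "target.com"))
```

The suffix check uses `"." + domain` so that look-alike names such as `nottarget.com` are not mistaken for subdomains.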
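19.2.2 ASN and IP Range Mapping
# Find the ASN for an organization: search bgp.he.net (BGP toolkit) by name,
# or map a known IP to its origin ASN with Team Cymru's whois service
whois -h whois.cymru.com " -v 203.0.113.45"
# Get all IP ranges (route objects) originated by that ASN
whois -h whois.radb.net -- '-i origin AS12345' | grep -E '^route6?:'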
# Reverse WHOIS — find all domains registered by an entity
# Domaintools, SecurityTrails, ViewDNS
# AMASS — comprehensive subdomain/IP enumeration
amass enum -d target.com -passive -o amass_output.txt
amass intel -org "Target Corporation" -max-dns-queries 20000
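Route objects returned for an ASN often overlap or duplicate each other, so it can help to collapse them before sizing the address space to monitor. This is a sketch using Python's standard `ipaddress` module; the function name `summarize_routes` and the sample lines (documentation prefixes) are assumptions for illustration, and only IPv4 `route:` lines are handled.

```python
import ipaddress

def summarize_routes(route_lines):
    """Collapse IPv4 'route:' lines into minimal networks and count addresses."""
    nets = []
    for line in route_lines:
        key, _, value = line.partition(":")
        if key.strip() == "route":  # skip route6 and unrelated lines
            nets.append(ipaddress.ip_network(value.strip()))
    collapsed = list(ipaddress.collapse_addresses(nets))
    total = sum(n.num_addresses for n in collapsed)
    return collapsed, total

# Hypothetical whois output lines using documentation ranges
sample_lines = [
    "route:      198.51.100.0/25",
    "route:      198.51.100.128/25",
    "route:      198.51.100.0/24",
    "route:      203.0.113.0/24",
    "route6:     2001:db8::/32",
]
networks, address_count = summarize_routes(sample_lines)
print([str(n) for n in networks], address_count)
```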
19.3 Internet-Wide Scanning Intelligence
19.3.1 Shodan
Shodan indexes internet-connected devices — servers, cameras, industrial control systems, printers, routers. It continuously scans the entire IPv4 space and stores banner data.
import shodan
api = shodan.Shodan("YOUR_API_KEY")
# Search for services exposing target org
results = api.search('org:"Target Corporation" port:22,443,3389,8080')
for r in results['matches']:
    print(f"{r['ip_str']}:{r.get('port')} | {r.get('product','')} | {r.get('vulns','')}")
# Find exposed databases
results = api.search('org:"Target Corporation" product:"MongoDB"')
# Find SSL certs for org
results = api.search('ssl.cert.subject.cn:"*.target.com"')
# Specific CVE exposure
results = api.search('vuln:CVE-2021-44228') # Log4Shell
# Shodan CLI (run from a shell, not from Python)
shodan search 'org:"Target Corp" http.title:"admin"'
shodan host 203.0.113.45
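Shodan returns one banner record per service, so a host with several open ports appears several times in `results['matches']`. A possible way to group those records per IP is sketched below. The helper `summarize_matches` is hypothetical; the field names `ip_str`, `port`, `product`, and `vulns` are the ones used in the search example above, and the sample records are invented.

```python
from collections import defaultdict

def summarize_matches(matches):
    """Group Shodan banner records by IP: open ports, products, and CVE IDs."""
    hosts = defaultdict(lambda: {"ports": set(), "products": set(), "vulns": set()})
    for m in matches:
        host = hosts[m["ip_str"]]
        host["ports"].add(m["port"])
        if m.get("product"):
            host["products"].add(m["product"])
        # vulns may be a dict keyed by CVE ID or a list; both iterate as CVE IDs
        host["vulns"].update(m.get("vulns") or {})
    return hosts

# Hypothetical banner records shaped like results['matches']
sample = [
    {"ip_str": "203.0.113.45", "port": 22, "product": "OpenSSH"},
    {"ip_str": "203.0.113.45", "port": 443, "product": "nginx",
     "vulns": {"CVE-2021-44228": {}}},
    {"ip_str": "203.0.113.46", "port": 27017, "product": "MongoDB"},
]
for ip, info in sorted(summarize_matches(sample).items()):
    print(ip, sorted(info["ports"]), sorted(info["products"]), sorted(info["vulns"]))
```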
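19.3.2 Censys
from censys.search import CensysHosts
h = CensysHosts(api_id="...", api_secret="...")
# Search for hosts presenting certificates issued to the org (Censys Search v2 host syntax)
query = h.search(
    'services.tls.certificates.leaf_data.subject.organization:"Target Corporation"',
    per_page=100,
    pages=1,
)
# The search result is iterated page by page; each page is a list of host records
for page in query:
    for host in page:
        print(host["ip"], [s.get("port") for s in host.get("services", [])])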
19.3.3 Attack Surface Mapping Matrix
| Source | What It Reveals | Free Tier |
|---|---|---|
| Shodan | Open ports, banners, products, CVEs, default creds | 100 results |
| Censys | TLS certs, services, HTTP metadata | 250 queries/month |
| GreyNoise | Internet noise vs. targeted scanning | Limited |
| crt.sh | Certificate transparency (all issued certs) | Unlimited |
| Subfinder/Amass | Subdomains via passive sources | Open source |
| SecurityTrails | Historical DNS, WHOIS, passive DNS | 50 req/month |
| BuiltWith | Web technologies, CMS, analytics | Limited |
| Wayback Machine | Historical web content, old endpoints | Unlimited |
| GitHub/GitLab | Exposed code, keys, configs | Requires account |
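19.4 Credential and Data Breach Intelligence
19.4.1 Breach Monitoring
# Have I Been Pwned API: breaches for a single account (a user-agent header is required)
curl -H "hibp-api-key: YOUR_KEY" -H "user-agent: osint-training" \
"https://haveibeenpwned.com/api/v3/breachedaccount/user@target.com"
# Breached email aliases on a domain you have verified with HIBP
curl -H "hibp-api-key: YOUR_KEY" -H "user-agent: osint-training" \
"https://haveibeenpwned.com/api/v3/breacheddomain/target.com"
# Note: /breaches?domain=target.com lists breaches OF the target.com service, not of its users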
# Dehashed: search leaked records by domain
curl -X GET "https://api.dehashed.com/search?query=domain:target.com" \
-H "Authorization: Basic BASE64_CREDS" | python3 -m json.tool
# Other sources (free and paid)
# pwndb (Tor) — older breach aggregator
# leak-lookup.com — paid but comprehensive
# IntelX (intelligencex.io) — OSINT search engine including Tor, I2P
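Once breach data is collected, the accounts that appear in the most breaches are usually the first candidates for a forced password reset. The sketch below assumes the data has been loaded into a mapping of mailbox alias to breach names, which is the shape a domain-level breach search is understood to return; the function name `rank_exposed_accounts` and the sample data are hypothetical.

```python
def rank_exposed_accounts(domain, alias_breaches):
    """Return (email, breaches) pairs, most-breached accounts first."""
    rows = [
        (f"{alias}@{domain}", sorted(breaches))
        for alias, breaches in alias_breaches.items()
    ]
    # sort by breach count descending, then by address for stable output
    return sorted(rows, key=lambda row: (-len(row[1]), row[0]))

# Hypothetical domain search result: alias -> breach names
sample = {
    "jdoe": ["LinkedIn", "Adobe"],
    "asmith": ["Adobe"],
    "ops": ["Dropbox", "Canva", "Adobe"],
}
for email, breaches in rank_exposed_accounts("target.com", sample):
    print(f"{email}: {len(breaches)} breach(es): {', '.join(breaches)}")
```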
19.4.2 GitHub Secrets Hunting
Developers inadvertently commit API keys, credentials, and private keys to public repositories.
# TruffleHog: detector-based secret scanning; --only-verified keeps credentials confirmed live
trufflehog github --org=TargetCorp --only-verified
# Gitleaks: scan a specific repo (--source expects a local path, so clone first)
git clone https://github.com/TargetCorp/repo && gitleaks detect --source=repo
# Google dorks for GitHub-hosted content
# site:github.com "target.com" "password" OR "secret" OR "api_key"
# site:github.com "target.com" "BEGIN RSA PRIVATE KEY"
# site:github.com inurl:TargetCorp "AKIA" (long-term AWS access key IDs start with AKIA)
# GitHub Code Search API (quote the query so the shell does not expand or split it)
gh api -X GET search/code -f q='"target.com" password filename:.env'
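The search patterns above (AWS key IDs beginning with `AKIA`, PEM private key headers) can also be applied to text that has already been downloaded, such as a cloned repository or a paste. This is a minimal regex sketch, not a substitute for TruffleHog or Gitleaks; the pattern set, the function name `find_secrets`, and the sample content are assumptions, and the key shown is AWS's published documentation example.

```python
import re

# Illustrative patterns only; real scanners ship hundreds of detectors
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}

def find_secrets(text):
    """Return (line_number, pattern_label) for every line that matches."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits

sample = "\n".join([
    "DB_HOST=db.target.com",
    "AWS_ACCESS_KEY_ID=AKIAIOSFODNN7EXAMPLE",
    "-----BEGIN RSA PRIVATE KEY-----",
])
print(find_secrets(sample))
```

A regex hit only indicates a candidate; whether the credential is live still has to be verified through an authorized process.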
19.4.3 Pastebin / Code Paste Monitoring
# Pulsedive, IntelX, pastebin Google dork
# site:pastebin.com "target.com" after:2024-01-01
# site:paste.ee "target.com" "password"
# site:controlc.com "@target.com"
19.5 Corporate and People OSINT
19.5.1 LinkedIn Organizational Mapping
LinkedIn shows who works at an organization, which technologies they use, and which staff are likely contact points in a social engineering assessment.
# linkedin2username — generate potential usernames
python3 linkedin2username.py -u your_linkedin_email -p 'password' -c "Target Corp"
# Email format generation from names
# Common formats: firstname.lastname@, f.lastname@, firstname@
# Tools: hunter.io, phonebook.cz, clearbit
# Hunter.io — find email format + specific emails
curl "https://api.hunter.io/v2/domain-search?domain=target.com&api_key=KEY" | \
python3 -c "import sys,json; d=json.load(sys.stdin)['data']; print(d['pattern']); [print(e['value']) for e in d['emails']]"
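Once the email format is known, candidate addresses can be generated from a list of employee names and then checked against breach data or used in an authorized phishing assessment. The sketch below assumes a pattern string with `{first}`, `{last}`, `{f}`, and `{l}` placeholders, similar to the pattern value printed above; the function name `candidate_emails` and the names are hypothetical.

```python
def candidate_emails(names, domain, pattern="{first}.{last}"):
    """Build candidate addresses from full names using a placeholder pattern."""
    emails = []
    for full_name in names:
        parts = full_name.lower().split()
        if len(parts) < 2:
            continue  # a single-word name cannot fill first/last placeholders
        first, last = parts[0], parts[-1]
        local = pattern.format(first=first, last=last, f=first[0], l=last[0])
        emails.append(f"{local}@{domain}")
    return emails

# Hypothetical employee names gathered from public profiles
names = ["Jane Doe", "John Q. Public", "Cher"]
print(candidate_emails(names, "target.com"))
print(candidate_emails(names, "target.com", pattern="{f}{last}"))
```

Middle names and initials are ignored here; names with non-ASCII characters or compound surnames would need extra handling.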
19.5.2 Email Harvesting
# theHarvester — comprehensive email/subdomain/name harvesting
theHarvester -d target.com -l 500 -b google,bing,linkedin,hunter,shodan -f output.html
# phonebook.cz (free)
# emailformat.com
# skrapp.io
# voilanorbert.com
19.5.3 OSINT Framework Reference
The OSINT Framework (osintframework.com) provides a categorized tree of free OSINT tools organized by:
- Username lookup
- Email address
- Domain name
- IP address / Netblock
- Image / Video / Metadata
- Social networks (Twitter/X, Facebook, Instagram, TikTok)
- Phone number
- Public records
- Geospatial / Satellite imagery
- Dark web
19.6 Social Media Intelligence (SOCMINT)
19.6.1 Platform-Specific Techniques
# Twitter/X — tracking threat actor accounts
# Twint (archived) or snscrape (both may fail since X restricted unauthenticated access in 2023)
snscrape twitter-search "#ransomware site:pastebin.com" > tweets.txt
snscrape twitter-user ThreatActorHandle > history.txt
# Telegram — threat actor channels
# Telegram is a primary communication platform for many ransomware groups
# Tools: telethon (Python library), TeleTracker
# Reddit — exposure in posts/comments
# site:reddit.com "target.com" password
# site:reddit.com "target.com" internal
# Discord — increasingly used by threat actors
# Discord OSINT Tool (DISCORDOSER), Dis.cool
# Facebook — corporate page employee enumeration
# StalkScan (FBID lookup), Lookup-ID
19.6.2 Image Intelligence (IMINT)
# Reverse image search
# Google Lens, TinEye, Yandex Images (best for faces)
# Bing Visual Search
# ExifTool — metadata extraction from images
exiftool image.jpg | grep -E "GPS|Author|Software|Created"
# GPS coordinates from EXIF
exiftool -csv -GPSLatitude -GPSLongitude photos/*.jpg
# Facial recognition OSINT (legal/ethical considerations apply)
# PimEyes (face search engine)
# FaceCheck.id
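By default ExifTool prints GPS coordinates in degrees, minutes, and seconds, while most mapping tools expect signed decimal degrees. The conversion is sketched below, assuming the default `40 deg 26' 46.32" N` text format; the function name `dms_to_decimal` is hypothetical. ExifTool's `-n` option can also emit numeric values directly.

```python
import re

# Matches strings such as: 40 deg 26' 46.32" N
DMS = re.compile(r"""(\d+)\s*deg\s*(\d+)'\s*([\d.]+)"\s*([NSEW])""")

def dms_to_decimal(value):
    """Convert a degrees/minutes/seconds string to signed decimal degrees."""
    match = DMS.search(value)
    if not match:
        raise ValueError(f"unrecognized coordinate: {value!r}")
    degrees, minutes, seconds, ref = match.groups()
    decimal = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # south and west are negative by convention
    return -decimal if ref in "SW" else decimal

lat = dms_to_decimal("40 deg 26' 46.32\" N")
lon = dms_to_decimal("79 deg 58' 56.00\" W")
print(f"{lat:.4f}, {lon:.4f}")
```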
19.7 Dark Web Monitoring
19.7.1 Threat Actor Infrastructure
Surface Web: Clearnet sites, public forums, news
Deep Web: Authenticated content, databases not indexed by search
Dark Web: Tor (.onion), I2P, Freenet — requires special software
Legal and OPSEC Reminder
Accessing dark web sites for intelligence purposes is legal in most jurisdictions. However, purchasing illicit goods or services, downloading CSAM, or taking part in any other illegal activity is a crime. Maintain strict OPSEC when monitoring dark web sources: dedicated hardware, Tails OS, no logins to personal accounts, and an isolated network.
# Tor Browser — access .onion sites
# Tails OS — amnesic live system for maximum OPSEC
# Dark web monitoring services (commercial)
# Digital Shadows (acquired by ReliaQuest)
# Recorded Future — dark web module
# Flare Systems — automated dark web monitoring
# Kela Darkbeast
# DarkOwl Vision
# Free monitoring
# OnionSearch (aggregates multiple dark web search engines)
# Ahmia.fi — surface-accessible dark web search engine
# IntelX.io — indexes dark web content including I2P and Tor
19.7.2 Ransomware Data Leak Sites
Ransomware groups operate "shame sites" on Tor listing non-paying victims.
| Group | .onion Site Status | Notable |
|---|---|---|
| LockBit 3.0 | Active (domain rotates) | Largest number of victims |
| ALPHV/BlackCat | Seized by FBI Dec 2023; shut down Mar 2024 | First to list hospital patient data |
| Cl0p | Active | Lists bulk victims from zero-day campaigns |
| RansomHub | Active | Rapidly growing affiliate program |
# Monitoring data leak sites
# RansomWatch (ransomwatch.telemetry.ltd) — aggregates leak site content
# DarkFeed.io — commercial threat intelligence feed
# Ransomware.live — community tracking site
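Leak-site aggregators list victims under whatever name the ransomware group chose, so matching against a watchlist of your own organization and key suppliers usually needs some normalization. The sketch below is illustrative only: the record fields `group` and `victim`, the suffix list, and the function names are assumptions and do not reflect any specific tracker's feed format.

```python
import re

CORPORATE_SUFFIXES = {"inc", "corp", "corporation", "ltd", "llc", "gmbh"}

def normalize(name):
    """Lowercase, strip punctuation, and drop common corporate suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    words = [w for w in cleaned.split() if w not in CORPORATE_SUFFIXES]
    return " ".join(words)

def match_watchlist(posts, watchlist):
    """Return leak-site posts whose victim name matches a watchlist entry."""
    targets = {normalize(entry) for entry in watchlist}
    return [p for p in posts if normalize(p["victim"]) in targets]

# Hypothetical leak-site records and watchlist
posts = [
    {"group": "lockbit3", "victim": "Target Corp."},
    {"group": "clop", "victim": "Example Logistics GmbH"},
    {"group": "ransomhub", "victim": "Unrelated Ltd"},
]
watchlist = ["Target Corporation", "Example Logistics"]
for post in match_watchlist(posts, watchlist):
    print(post["group"], "->", post["victim"])
```

Exact matching after normalization will miss abbreviations and domain-style names, so a production workflow would likely add fuzzy matching and manual review.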
19.8 Metadata and Document Intelligence
19.8.1 Document Metadata Extraction
Documents published on organizational websites often contain:
- Author name → employee enumeration
- Software version → vulnerability identification
- Internal path (C:\Users\john.doe...) → username disclosure
- Domain name (creator.corp.local) → internal structure
# FOCA (Fingerprinting Organizations with Collected Archives): automated document metadata extraction
# Metagoofil: download and analyze documents from target domain
metagoofil -d target.com -t pdf,docx,xlsx,pptx -l 200 -o metadata_output/
# ExifTool batch processing: request the identifying tags as CSV columns
# (grepping -csv output for tag names would only match the header row)
exiftool -csv -Author -Creator -LastModifiedBy -Company -Template *.pdf *.docx *.xlsx
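Internal file paths embedded in document metadata are one of the most direct sources of usernames. The following sketch pulls Windows profile names out of arbitrary metadata strings; the regex, the function name `usernames_from_metadata`, and the sample values are assumptions for illustration.

```python
import re

# Matches C:\Users\<name>\... and the older C:\Documents and Settings\<name>\...
USER_PATH = re.compile(
    r"[A-Za-z]:\\(?:Users|Documents and Settings)\\([^\\]+)", re.IGNORECASE
)

def usernames_from_metadata(values):
    """Extract unique, lowercased profile names from metadata strings."""
    found = set()
    for value in values:
        for match in USER_PATH.finditer(value):
            found.add(match.group(1).lower())
    return sorted(found)

# Hypothetical metadata field values
sample = [
    r"C:\Users\john.doe\Documents\report.docx",
    r"D:\Users\ASmith\Desktop\q3.xlsx",
    "Microsoft Word 2016",
]
print(usernames_from_metadata(sample))
```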
19.9 Analyst OPSEC
When conducting OSINT on threat actors, maintaining operational security prevents the target from detecting the investigation and modifying their infrastructure.
19.9.1 OPSEC Architecture
graph TB
A[Analyst Workstation\nFull identity exposure] --> B{Operation\nType?}
B -->|Passive/safe| C[VPN\nCommercial paid anonymously]
B -->|Sensitive| D[Dedicated VM\nTails OS]
B -->|Dark web| E[Dedicated Hardware\nTails OS + Tor]
C --> F[Target Investigation]
D --> G[Social Media\nOSINT]
E --> H[Dark Web\nMonitoring]
F --> I[Secure Notes\nEncrypted]
G --> I
H --> I
style E fill:#e63946,color:#fff
style H fill:#780000,color:#fff
19.9.2 OPSEC Rules for OSINT Analysts
- Never use personal accounts for research — create dedicated research personas with no personal link
- Use separate browsers for different investigations — never mix personal browsing
- VPN or Tor before connecting to anything that could log your IP
- Screenshot, don't visit — paste URLs into web.archive.org or URLScan.io instead of visiting directly
- Watermark awareness — some threat actors embed unique canary tokens in files; never open files from threat actors
- Social media — research profiles don't follow each other and don't interact with targets
- Phone numbers — use Google Voice or Burner for any accounts requiring phone verification
19.10 OSINT Automation with Maltego and SpiderFoot
19.10.1 Maltego Transforms
Maltego is a widely used tool for link analysis and OSINT automation. It uses "transforms" to pivot from one entity type to another.
Common Maltego pivot chains:
Domain → DNS Records → IP Addresses → Organizations → People
Email Address → Social Profiles → Related Accounts → Location
IP Address → Open Ports → Services → CVEs
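A pivot chain is effectively a walk over a graph in which each transform yields the entities one step away. The sketch below illustrates that idea with a plain dictionary and a breadth-first expansion with a depth limit. It is not Maltego's API; the `pivot` function, the `transforms` mapping, and all entities are hypothetical.

```python
from collections import deque

def pivot(start, transforms, max_depth=3):
    """Breadth-first expansion; returns each reached entity with its pivot depth."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        entity = queue.popleft()
        depth = seen[entity]
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in transforms.get(entity, []):
            if neighbor not in seen:
                seen[neighbor] = depth + 1
                queue.append(neighbor)
    return seen

# Hypothetical transform results: entity -> entities one pivot away
transforms = {
    "target.com": ["ns1.target.com", "203.0.113.45"],
    "203.0.113.45": ["AS12345", "port:443"],
    "AS12345": ["Target Corporation"],
    "Target Corporation": ["jane.doe@target.com"],
}
for entity, depth in pivot("target.com", transforms).items():
    print(depth, entity)
```

The depth limit matters in practice: unbounded pivoting through shared hosting or large ASNs quickly pulls in entities unrelated to the investigation.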
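19.10.2 SpiderFoot Automation
# SpiderFoot: automated OSINT platform (self-hosted), installed from source
git clone https://github.com/smicallef/spiderfoot.git && cd spiderfoot
pip3 install -r requirements.txt
python3 sf.py -l 127.0.0.1:5001
# CLI scan (sf.py runs scans directly; sfcli.py is a client for a running server)
python3 sf.py -s "target.com" -t INTERNET_NAME -m sfp_dnsresolve,sfp_shodan,sfp_haveibeenpwned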
# Modules available: 200+ covering DNS, ports, emails, breaches,
# social media, dark web, reputation, geolocation
19.11 Benchmark Controls
| Control ID | Title | Requirement |
|---|---|---|
| Nexus SecOps-OSINT-01 | External Attack Surface Monitoring | Continuous monitoring of external footprint |
| Nexus SecOps-OSINT-02 | Credential Breach Monitoring | Automated breach notification and password reset workflow |
| Nexus SecOps-OSINT-03 | Code Repository Monitoring | Automated scanning of public repos for exposed secrets |
| Nexus SecOps-OSINT-04 | Dark Web Monitoring | Subscription to at least one dark web monitoring service |
| Nexus SecOps-OSINT-05 | OSINT Analyst OPSEC | Documented procedures for safe threat actor investigation |
| Nexus SecOps-OSINT-06 | Document Metadata Scrubbing | Policy and tooling to remove metadata from externally published documents |
Key Terms
ASN (Autonomous System Number): A globally unique number assigned to an autonomous system by a regional internet registry (such as ARIN, RIPE NCC, or APNIC). Used to identify the routing domain of an organization.
Certificate Transparency: An open framework (RFC 6962) for recording publicly trusted TLS certificate issuances in public, append-only logs; major browsers require certificates to be logged. Analysts use CT logs to discover subdomains via crt.sh.
Passive DNS: Historical records of DNS resolutions collected by sensors worldwide. Allows analysts to reconstruct which domain names resolved to which IP addresses at the times the sensors observed them.
Shodan: A search engine that continuously scans the IPv4 internet and indexes banner data, certificates, and service metadata from the ports it probes.
SOCMINT (Social Media Intelligence): Intelligence derived from social media platforms.
theHarvester: An open-source OSINT tool that aggregates emails, subdomains, virtual hosts, and employee names from multiple public sources.