SC-019: Deepfake + AI-Enabled Social Engineering¶
Scenario Header
Type: Social Engineering / Emerging Threat | Difficulty: ★★★★★ | Duration: 2–3 hours | Participants: 4–8
Threat Actor: eCrime group — financially motivated, AI-enabled social engineering specialist
Primary ATT&CK Techniques: T1566.004 · T1656 · T1598.003 · T1204.001 · T1657 · T1585.001 · T1589.001
Threat Actor Profile¶
ECHO MIRAGE is an emerging eCrime group first observed in 2024, pioneering the use of AI-generated voice cloning (deepfake audio) and synthetic video for social engineering attacks. Unlike traditional vishing actors who rely on scripted impersonation, ECHO MIRAGE generates near-perfect voice clones from publicly available audio samples — earnings calls, conference presentations, podcast appearances, and corporate video content.
The group targets CFOs and finance teams at enterprises in financial services and insurance, where wire transfer authority exists and executives frequently communicate verbally about financial decisions. Their attack chain combines OSINT, AI voice synthesis, and exploited trust relationships to bypass human verification processes that were designed for an era before synthetic media.
Motivation: Financial — wire fraud via voice-authorized transfers, typically $500K–$5M per operation. The group reinvests profits into improved AI tooling, creating a capability escalation cycle.
Estimated Operations: 15–20 successful attacks in 2025, with an average yield of $1.8M per incident. Success rate estimated at ~35% of attempts.
Emerging Threat Context
AI-generated voice deepfakes represent a paradigm shift in social engineering. Key developments:
- Voice cloning quality: Modern TTS (text-to-speech) models can clone a voice from as little as 3 seconds of audio
- Real-time synthesis: Attackers can now conduct live conversations using cloned voices with <200ms latency
- Detection difficulty: Human listeners correctly identify AI-generated speech only ~50% of the time (near chance)
- Public awareness: Most organizations have not updated their verification procedures to account for synthetic voice threats
- Notable incidents: Multiple publicly reported cases of deepfake voice fraud exceeding $25M (Hong Kong, 2024)
Scenario Narrative¶
Phase 1 — OSINT & Voice Sample Collection (~20 min)¶
ECHO MIRAGE targets FinServ Corp, a mid-market insurance company ($2.1B AUM) with 800 employees. Through LinkedIn and corporate website reconnaissance, they identify the organizational hierarchy:
- Marcus Webb — CISO (target for impersonation)
- Sandra Liu — CFO (secondary target)
- David Park — Treasury Manager (primary social engineering target)
- Jennifer Adams — Wire Transfer Coordinator (execution target)
The attacker collects voice samples for Marcus Webb (CISO) from the following publicly available sources:
| Source | Duration | Quality | Content |
|---|---|---|---|
| RSA Conference 2025 panel (YouTube) | 18 minutes | High (studio mic) | Cybersecurity strategy discussion |
| FinServ Corp quarterly earnings call (IR website) | 7 minutes | Medium (phone) | Security investment Q&A |
| InsurTech podcast interview (Spotify) | 42 minutes | High (studio mic) | Career background, leadership style |
| FinServ Corp internal training video (LinkedIn) | 5 minutes | High (professional) | Security awareness message to employees |
Total collected audio: 72 minutes — far more than the 3–10 minutes required for high-fidelity voice cloning.
The attacker uses a commercial-grade voice cloning model (based on open-source architecture similar to VALL-E or XTTS) to create a real-time voice synthesis pipeline. They test the clone against the source audio and achieve a Mean Opinion Score (MOS) of 4.6/5.0 — virtually indistinguishable from the real voice.
Additionally, ECHO MIRAGE builds a behavioral profile of Marcus Webb: his communication style (direct, uses military analogies), typical work hours (emails sent 6 AM–8 PM ET), and relationship with the treasury team (monthly security briefings, first-name basis with David Park).
Evidence Artifacts:
| Artifact | Detail |
|---|---|
| OSINT Collection | LinkedIn profiles: Marcus Webb (500+ connections, 47 posts), David Park (200 connections), Jennifer Adams (150 connections) — all public |
| Voice Samples | YouTube: RSA Conference panel — 18 min — Downloaded via yt-dlp — 2026-02-01 |
| Voice Samples | FinServ Corp IR page: Q3 2025 earnings call — Marcus Webb Q&A segment: 7 min — 2026-02-01 |
| Voice Samples | Spotify: "InsurTech Leaders" podcast S3E14 — Full episode with Marcus Webb — 42 min — 2026-02-02 |
| Domain Registration | finservcorp-secure[.]com registered 2026-02-05 — Registrar: Porkbun — WHOIS: privacy-protected |
Phase 1 — Discussion Inject
Technical: The attacker collected 72 minutes of voice samples from publicly available sources. What is your organization's executive digital footprint, and how much voice/video content is publicly accessible for your C-suite? Should you limit executive media exposure as a security measure, and what's the business tradeoff?
Decision: Modern voice cloning requires only 3–10 seconds of audio for a usable clone. Given that most executives have hours of public audio available, is it realistic to prevent voice sample collection? If not, what compensating controls must you implement for any process that relies on voice-based authentication or authorization?
Expected Analyst Actions:
- [ ] Conduct digital footprint assessment for all C-suite executives — identify public voice/video sources
- [ ] Audit all voice-authenticated or voice-authorized business processes
- [ ] Review financial authorization procedures for voice-based approval vulnerabilities
- [ ] Register defensive domains for common typosquats of your corporate domain
- [ ] Brief executive team on deepfake voice threat landscape
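One of the actions above, registering defensive typosquat domains, starts with enumerating likely variants. A minimal Python sketch, with the caveat that the keyword list and variant classes here are illustrative only; dedicated tools such as dnstwist cover far more permutation classes:

```python
def typosquat_candidates(domain: str) -> set[str]:
    """Generate common typosquat variants of a corporate domain.

    Covers keyword-suffix domains (the pattern used in this scenario),
    single-character omissions, and adjacent-character swaps.
    """
    name, _, tld = domain.rpartition(".")
    variants: set[str] = set()
    for kw in ("secure", "login", "mail", "portal", "hr"):  # illustrative keywords
        variants.add(f"{name}-{kw}.{tld}")
        variants.add(f"{kw}-{name}.{tld}")
    for i in range(len(name)):  # single-character omission
        variants.add(name[:i] + name[i + 1:] + "." + tld)
    for i in range(len(name) - 1):  # adjacent-character swap
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:] + "." + tld)
    variants.discard(domain)
    return variants

candidates = typosquat_candidates("finservcorp.com")
print("finservcorp-secure.com" in candidates)  # True: the scenario's attack domain
```

Feeding these candidates into daily WHOIS/DNS monitoring turns the typosquat registration on 2026-02-05 into a detectable event a full week before the fraudulent email.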
Phase 2 — Deepfake Voice Call & Wire Authorization (~30 min)¶
On 2026-02-12 at 4:47 PM ET (late afternoon — chosen to create urgency and reduce scrutiny), ECHO MIRAGE places a phone call to David Park, Treasury Manager, using:
- Caller ID spoofing: Displays Marcus Webb's corporate mobile number (+1-555-0193)
- Real-time voice synthesis: AI-generated voice matching Marcus Webb's vocal characteristics
- Behavioral mimicry: Using Marcus Webb's known communication style and personal details
The call proceeds as follows (reconstructed from David Park's notes and call recording):
"Marcus Webb" (AI): "David, it's Marcus. I'm sorry to call you this late in the day. I need to talk to you about something time-sensitive and confidential."
David Park: "Sure Marcus, what's going on?"
"Marcus Webb" (AI): "We've just discovered a potential data breach involving one of our reinsurance partners. Our outside counsel at Morrison & Associates has been engaged, and we need to make an emergency escrow payment of $2.4 million to secure a forensic investigation team. This is attorney-client privileged — I need you to keep this between us for now."
David Park: "Okay, that sounds serious. Do you want me to process this through the normal wire approval flow?"
"Marcus Webb" (AI): "No — Sandra [CFO] is aware and has verbally approved, but she's in a board meeting until 6 PM and can't sign the electronic approval right now. I need you to push this through on my verbal authorization. The escrow account details are being sent to Jennifer Adams right now from our outside counsel. Can you make sure she processes it before end of business today?"
David Park: "I... usually we need Sandra's electronic signature for anything over $500K. But if she's already approved verbally..."
"Marcus Webb" (AI): "David, I understand the process, but this is an active incident. Every hour we delay increases our exposure. Sandra will countersign tomorrow morning. I'll send you an email confirming my authorization right now."
Five minutes later, Jennifer Adams receives an email from m.webb@finservcorp-secure[.]com (typosquat domain) with wire instructions for $2,400,000 to an escrow account at a Maltese bank.
Jennifer processes the wire transfer at 5:12 PM ET.
Evidence Artifacts:
| Artifact | Detail |
|---|---|
| Phone System (PBX) | Inbound call to David Park's extension — Caller ID: +1-555-0193 (Marcus Webb's mobile — spoofed) — Duration: 4m 38s — 2026-02-12T16:47:22-05:00 — Call recording available |
| Email Gateway | Inbound from m.webb@finservcorp-secure[.]com — To: j.adams@finservcorp.com — Subject: "CONFIDENTIAL: Emergency Escrow Wire Instructions" — SPF: PASS (for finservcorp-secure[.]com), DMARC: No policy (different domain) — 2026-02-12T16:53:14-05:00 |
| Wire Transfer System | Amount: $2,400,000 — Beneficiary: "Morrison & Associates Escrow" — IBAN: MT84MMEB44093000000012345678901 (Malta) — Authorized by: D. Park (verbal from CISO) — Processed by: J. Adams — 2026-02-12T17:12:00-05:00 |
| Email Gateway (negative) | No email from m.webb@finservcorp.com (legitimate domain) to David Park or Jennifer Adams regarding this wire |
| CFO Calendar | Sandra Liu: Board meeting 4:00 PM–6:00 PM ET — Confirmed (used as pretext for unavailability) |
Phase 2 — Discussion Inject
Technical: The caller ID displayed Marcus Webb's real mobile number. How does caller ID spoofing work, and why can't it be reliably used for identity verification? What technical controls exist (STIR/SHAKEN) and what are their current limitations?
Decision: David Park deviated from the wire transfer policy (electronic CFO signature required for >$500K) based on a voice call from a perceived authority figure (CISO) claiming urgency and confidentiality. This is a textbook social engineering exploitation of authority bias and urgency. How do you design a wire transfer process that is resistant to social engineering — including AI-generated deepfake voices — without creating unacceptable delays for legitimate urgent transactions?
Expected Analyst Actions:
- [ ] Analyze the phone call recording — compare against known Marcus Webb voice samples
- [ ] Investigate the sending domain finservcorp-secure[.]com — WHOIS, registration date, hosting
- [ ] Review wire transfer authorization — identify policy violations
- [ ] Immediately contact the bank to initiate wire recall
- [ ] Verify with Marcus Webb (via verified channel) whether he authorized any wire transfer
- [ ] Check if other employees received calls or emails from the spoofed identity
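The domain investigation above can be partially automated: the typosquat was only seven days old when the email arrived, and domain age is a cheap, high-signal gateway check. A hedged sketch, assuming registration dates come from a WHOIS lookup or a commercial domain-intelligence feed (the legitimate domain's registration date below is hypothetical; the typosquat's is from the scenario):

```python
from datetime import date

# Registration dates would come from WHOIS or a domain-intelligence feed;
# hard-coded here purely for illustration.
WHOIS_CREATED = {
    "finservcorp.com": date(2008, 3, 14),        # hypothetical legitimate domain
    "finservcorp-secure.com": date(2026, 2, 5),  # per the scenario's WHOIS artifact
}

def flag_young_domain(sender: str, seen_on: date, max_age_days: int = 30) -> bool:
    """Return True if the sender's domain should be flagged as recently registered."""
    sender_domain = sender.rsplit("@", 1)[-1].lower()
    created = WHOIS_CREATED.get(sender_domain)
    if created is None:
        return True  # unknown domain: fail closed, queue for analyst review
    return (seen_on - created).days < max_age_days

print(flag_young_domain("m.webb@finservcorp-secure.com", date(2026, 2, 12)))  # True
print(flag_young_domain("m.webb@finservcorp.com", date(2026, 2, 12)))         # False
```

This is the "Easy" detection listed later for the spoofed-email technique: the fraudulent message would have been flagged before Jennifer ever saw it.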
Phase 3 — Discovery & Callback Verification Failure (~25 min)¶
The fraud is discovered the next morning (2026-02-13) when Marcus Webb reviews his email and finds David Park's confirmation message about the "emergency escrow wire." Marcus did not make any such call or authorization.
Investigation reveals multiple callback verification failures:
| Control | Status | Failure Mode |
|---|---|---|
| Dual-authorization (CFO signature) | Bypassed | Verbal override accepted from CISO citing urgency |
| Callback verification | Not performed | Caller ID showed Marcus's number — David assumed identity was verified |
| Email domain verification | Failed | Jennifer noticed the email came from finservcorp-secure[.]com but assumed it was a secure email portal |
| Segregation of duties | Compromised | David both authorized and instructed Jennifer to process — no independent verification |
| Time-of-day restriction | Not configured | Wire transfers processed after 5 PM without additional scrutiny |
Attempted Recovery:
| Action | Timeline | Result |
|---|---|---|
| Wire recall request to bank | 2026-02-13T09:14:00-05:00 (16 hours post-transfer) | Partial: $800,000 frozen in intermediary account |
| FBI IC3 complaint | 2026-02-13T10:30:00-05:00 | Filed — case assigned |
| Maltese bank freeze request | 2026-02-13T14:00:00-05:00 (via FBI MLAT) | Too late: $1,600,000 already cascaded through 3 accounts |
| Cyber insurance claim | 2026-02-13T16:00:00-05:00 | Filed — social engineering rider covers up to $3M |
Net Financial Impact:
| Category | Amount |
|---|---|
| Total wire transfer | $2,400,000 |
| Amount frozen/recovered | $800,000 |
| Net loss | $1,600,000 |
| Insurance recovery (estimated) | $1,200,000 (subject to deductible and investigation) |
| Out-of-pocket loss (estimated) | $400,000 + investigation costs |
Evidence Artifacts:
| Artifact | Detail |
|---|---|
| Marcus Webb | Confirmed: No phone call made, no wire authorized, no email sent — 2026-02-13T08:30:00-05:00 |
| Voice Analysis (forensic) | Call recording submitted to forensic audio lab — Preliminary: "High probability of AI-generated speech — spectral analysis shows artifacts consistent with neural TTS models" — Full report pending |
| Bank Records | $2,400,000 received at Maltese bank 2026-02-12T17:15:00-05:00 — $800,000 transferred to Cyprus at 2026-02-12T19:30:00-05:00 (frozen in transit) — $1,600,000 transferred to Hong Kong at 2026-02-12T18:45:00-05:00 (converted to cryptocurrency) |
| Caller ID Analysis | STIR/SHAKEN attestation: Level C (gateway — no identity verification) — Carrier: VoIP provider with no customer verification |
Phase 3 — Discussion Inject
Technical: Forensic audio analysis identified "spectral artifacts consistent with neural TTS models." What are these artifacts (e.g., unnatural prosody, missing breath patterns, spectral smoothing), and how reliable is current deepfake audio detection? What tools exist for automated deepfake speech detection?
Decision: Your net financial exposure is $1,600,000 minus insurance recovery. The cyber insurance carrier requires evidence that "reasonable security controls" were in place. The policy was bypassed by an employee accepting verbal authorization from a perceived authority figure. Will the insurance carrier argue contributory negligence? How do you demonstrate that the deepfake attack constituted a "novel, sophisticated threat" that exceeded reasonable control expectations?
Expected Analyst Actions:
- [ ] Complete forensic analysis of the voice call — submit to deepfake detection service
- [ ] Trace fund movement through international banking channels
- [ ] Document all callback verification and authorization failures
- [ ] Assess exposure — did the attacker gain any access beyond the wire transfer?
- [ ] Coordinate with FBI on international fund recovery
- [ ] Prepare incident documentation for insurance claim
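As a toy illustration of the kind of spectral feature a forensic audio lab might start from: some neural TTS vocoders leave unnaturally little energy in the top of the audio band. The single-feature heuristic below is a sketch only, not a detector; production systems use trained classifiers over many features, and this one is trivially defeated:

```python
import numpy as np

def highband_energy_ratio(x: np.ndarray, sr: int, cutoff_hz: float = 8000.0) -> float:
    """Fraction of total spectral energy above cutoff_hz.

    A crude screening feature, not a deepfake verdict: natural speech
    recorded on a decent microphone carries some high-band energy,
    while certain neural vocoders roll it off sharply.
    """
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    return float(spectrum[freqs >= cutoff_hz].sum() / spectrum.sum())

# Synthetic demonstration: broadband noise vs. a band-limited tone.
sr = 48_000
rng = np.random.default_rng(0)
broadband = rng.standard_normal(sr)                   # energy across the spectrum
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)   # energy only near 440 Hz
print(highband_energy_ratio(broadband, sr) > highband_energy_ratio(tone, sr))  # True
```

In practice, a suspect recording would be submitted to a dedicated detection service (as the expected actions require) rather than scored on any single hand-picked feature.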
Phase 4 — Controls Implementation & Emerging Threat Framework (~25 min)¶
Following the incident, FinServ Corp implements comprehensive controls against AI-enabled social engineering:
New Policy: Mandatory callback verification for all financial transactions >$10,000
- Callback must be placed to a pre-registered phone number from the corporate directory — never to a number provided in the request
- Callback must use a different communication channel than the request (if request was phone, verify via in-person or secure chat; if request was email, verify via phone)
- For transactions >$500,000: require in-person authorization or video call from a registered corporate device with multi-factor authentication
- No exceptions for urgency, confidentiality, or executive override — any request claiming "bypass normal process" is automatically escalated to the SOC
New Policy: Dual-authorization with segregation of duties
- All wire transfers >$50,000 require electronic signatures from two authorized signatories
- The requestor, authorizer, and processor must be three different individuals
- No verbal authorizations — all approvals must be electronic and auditable
- Time-of-day restrictions: wire transfers >$500,000 cannot be processed after 4 PM local time or on weekends without CEO-level approval
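The two policies above reduce to a handful of machine-checkable rules that the wire-transfer system itself can enforce, rather than a policy document can merely state. A minimal sketch encoding the scenario's thresholds (role names are illustrative; a real implementation would sit in the payment workflow engine):

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class WireRequest:
    amount_usd: float
    requestor: str
    authorizers: set      # holders of completed electronic signatures
    processor: str
    submitted_at: datetime
    ceo_approved: bool = False

def evaluate(req: WireRequest) -> list[str]:
    """Return policy violations; an empty list means the wire may proceed."""
    violations = []
    if req.amount_usd > 50_000 and len(req.authorizers) < 2:
        violations.append("dual electronic signatures required for >$50K")
    if len({req.requestor, req.processor} | req.authorizers) < 3:
        violations.append("requestor, authorizer, and processor must be distinct")
    after_hours = req.submitted_at.hour >= 16 or req.submitted_at.weekday() >= 5
    if req.amount_usd > 500_000 and after_hours and not req.ceo_approved:
        violations.append("CEO approval required after 4 PM or on weekends for >$500K")
    return violations

# The scenario's fraudulent wire fails every rule: no electronic signatures,
# only two distinct individuals, processed at 5:12 PM.
bad = WireRequest(2_400_000, "d.park", set(), "j.adams", datetime(2026, 2, 12, 17, 12))
print(len(evaluate(bad)))  # 3
```

Note that "no verbal authorizations" falls out naturally: a verbal approval simply never appears in `authorizers`, so the dual-signature rule blocks the wire regardless of what anyone claims on the phone.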
Framework for Detecting AI-Synthesized Communications
Level 1 — Process Controls (Immediate Implementation)
- [ ] Eliminate voice-only authorization for any financial or security decision
- [ ] Implement code-word verification — pre-shared rotating passphrases between executives and finance team
- [ ] Require multi-channel verification (phone + email + in-person/video) for transactions >$100K
- [ ] Train all employees on deepfake awareness — include audio examples in training
Level 2 — Technical Detection (3–6 Month Implementation)
- [ ] Deploy real-time deepfake audio detection on PBX/phone system (e.g., Pindrop, Reality Defender)
- [ ] Implement email AI content detection for synthesized text patterns
- [ ] Deploy STIR/SHAKEN-aware call filtering — flag calls with Level C attestation from VoIP providers
- [ ] Monitor for executive voice sample collection — alert when corporate media is scraped by suspicious IPs
Level 3 — Organizational Resilience (Ongoing)
- [ ] Conduct quarterly deepfake simulation exercises — test finance team with synthetic voice calls
- [ ] Establish executive digital footprint reduction program — review public speaking, podcast, and video appearances
- [ ] Participate in industry deepfake threat sharing (FS-ISAC, CISA)
- [ ] Update incident response playbooks for AI-enabled social engineering scenarios
- [ ] Engage AI ethics and security researchers for emerging threat briefings
Evidence Artifacts:
| Artifact | Detail |
|---|---|
| Policy Update | Wire Transfer Policy v3.0 — Effective: 2026-03-01 — Changes: dual-signature required, no verbal authorization, time-of-day restrictions, mandatory callback |
| Training Records | Deepfake awareness training — Completion: 94% of finance team within 2 weeks — Includes live AI voice demo |
| Technology RFP | RFP issued for deepfake audio detection — Vendors evaluated: Pindrop, Reality Defender, Resemble AI Detect — POC scheduled: 2026-04-01 |
| Simulation Exercise | First deepfake simulation conducted 2026-03-15 — 3 of 8 finance team members failed (attempted to process the simulated fraudulent wire) — Retraining scheduled |
Phase 4 — Discussion Inject
Technical: The AI-synthesized communication detection framework includes real-time deepfake audio detection on the PBX. What is the current accuracy of deepfake audio detection systems? What are the false positive and false negative rates, and how do you handle legitimate calls flagged as deepfakes?
Decision: Your first deepfake simulation exercise had a 37.5% failure rate (3 of 8 finance team members failed). Is this acceptable for a post-incident simulation? What failure rate threshold would you target, and how frequently should you conduct these exercises? How do you balance security rigor with employee trust and morale?
Expected Analyst Actions:
- [ ] Validate that all new controls are implemented and tested
- [ ] Review callback verification compliance — audit first 30 days of transactions
- [ ] Evaluate deepfake detection technology POC results
- [ ] Schedule recurring deepfake simulation exercises (quarterly)
- [ ] Update threat model to include AI-enabled social engineering as a primary risk
- [ ] Share anonymized IOCs and TTPs with FS-ISAC
Detection Opportunities¶
| Phase | Technique | ATT&CK | Detection Method | Difficulty |
|---|---|---|---|---|
| 1 | OSINT / voice sample collection | T1589.001 | Monitor for bulk downloads of executive media content | Hard |
| 1 | Typosquat domain registration | T1583.001 | Domain monitoring for corporate domain typosquats | Medium |
| 2 | Caller ID spoofing | T1656 | STIR/SHAKEN attestation level monitoring — flag Level C calls | Medium |
| 2 | Deepfake voice call | T1566.004 | Real-time deepfake audio detection on PBX | Hard |
| 2 | Spoofed email from typosquat | T1566.001 | Email gateway: flag emails from domains registered <30 days ago | Easy |
| 3 | Wire transfer policy bypass | T1657 | Financial controls: dual-signature enforcement, no verbal override | Easy |
| 3 | Callback verification failure | — | Process audit: flag transactions without completed callback verification | Easy |
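The Level C flagging rows above assume the call-filtering layer can read the SHAKEN attestation. In SIP, that arrives as a PASSporT JWT carried in the Identity header, with the level in the `attest` claim (RFC 8588). A simplified sketch of extracting and acting on it; a real deployment must first verify the token's signature against the signing carrier's STI certificate, which this sketch deliberately omits:

```python
import base64
import json

def attestation_level(passport_jwt: str) -> str:
    """Extract the SHAKEN 'attest' claim (A, B, or C) from a PASSporT JWT.

    Unverified decode for illustration only: trusting claims without
    signature verification would itself be spoofable.
    """
    payload_b64 = passport_jwt.split(".")[1]
    payload_b64 += "=" * (-len(payload_b64) % 4)  # restore stripped base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return claims.get("attest", "C")  # missing claim: treat as weakest level

def route_call(passport_jwt: str) -> str:
    # Level C = gateway attestation: the originating carrier could not verify
    # the caller's right to use the number -- exactly the Phase 2 call.
    return "flag_for_verification" if attestation_level(passport_jwt) == "C" else "deliver"

# Hypothetical minimal PASSporT payload, for illustration:
payload = base64.urlsafe_b64encode(json.dumps({"attest": "C"}).encode()).decode().rstrip("=")
print(route_call(f"hdr.{payload}.sig"))  # flag_for_verification
```

Even full attestation (Level A) proves only that the carrier vouches for the number, not that the voice on the line belongs to its owner, which is why this control is rated Medium rather than a substitute for callback verification.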
Key Discussion Questions¶
- AI voice cloning requires only seconds of audio. Most executives have hours of public audio available. Is it feasible to reduce executive digital footprint, or must we assume voice cloning is always possible and focus entirely on compensating controls?
- The CISO's identity was impersonated to authorize a financial transaction. Should CISOs and other non-financial executives have any wire transfer authorization capability? How do you define authorization boundaries?
- Caller ID spoofing is trivial with VoIP. STIR/SHAKEN adoption is incomplete. Should organizations stop using phone-based identity verification entirely? What replaces it?
- The deepfake simulation exercise had a 37.5% failure rate. How do you interpret this result — is it a training failure, a process failure, or evidence that human-based detection of deepfakes is fundamentally unreliable?
- As deepfake technology improves, traditional social engineering defenses (callback verification, voice recognition) become less effective. What does the future of identity verification look like for financial authorization?
Debrief Guide¶
What Went Well¶
- The fraud was discovered within 16 hours — Marcus Webb identified the unauthorized transaction the next morning
- Wire recall was initiated promptly, resulting in $800,000 partial recovery
- The organization's cyber insurance policy included a social engineering rider, providing financial recovery
Key Learning Points¶
- Voice-based identity verification is no longer reliable — human listeners distinguish AI-cloned speech from real speech at little better than chance
- Caller ID is not identity verification — spoofing is trivial and STIR/SHAKEN adoption is incomplete
- Authority bias and urgency are the most exploited human vulnerabilities — the attacker combined both with confidentiality pressure
- Process controls must be technical, not optional — policies that can be bypassed by claiming urgency will be bypassed
- Deepfake threats require organizational, not just technical, responses — training, simulation, and process redesign are as important as detection technology
Recommended Follow-Up¶
- [ ] Implement mandatory dual-signature electronic authorization for all wire transfers >$50,000
- [ ] Deploy callback verification as a hard control — no verbal overrides, no exceptions
- [ ] Evaluate and deploy deepfake audio detection technology on PBX system
- [ ] Conduct quarterly deepfake simulation exercises for finance and treasury teams
- [ ] Reduce executive digital footprint — review and limit public speaking audio/video where possible
- [ ] Implement code-word verification system for executive-to-finance communications
- [ ] Update cyber insurance policy — ensure social engineering rider covers AI-enabled attacks
- [ ] Participate in FS-ISAC deepfake threat sharing working group