Analysis of "The Collector", an automated email collection and analysis platform


I conducted an experiment to see what happens to an email sent to a typosquatted domain, that is, one that closely resembles a real one, such as gmeil[.]com or yahoo[.]com[.]cm.
Not only was it “read,” but it also triggered a coordinated response from a sophisticated threat intelligence platform. In this article, I summarize the stages of this remarkable “conversation” and what it teaches us about security.

1. Executive Summary

This report provides a technical analysis of a sophisticated threat actor, designated UNC-HEMAIL (aka “The Collector”), identified through a multi-phased active reconnaissance experiment. UNC-HEMAIL operates a large, centralized infrastructure for the collection of misdirected and phishing emails, leveraging a cluster of typosquatted domains (early estimates suggest 800,000 domains). The actor’s operational workflow involves a multi-tiered security and analysis chain, utilizing a commercial third-party security gateway for initial threat triage, followed by an in-depth analysis conducted by a proprietary internal platform.
This platform exhibits advanced TTPs, including adaptive defense, infrastructure fingerprinting, and dynamic payload analysis across multiple environments.
The objective of this report is to document UNC-HEMAIL’s TTPs and associated IoCs to inform network defense and threat hunting activities.

Key takeaway: Every email sent to a typosquatted domain can be considered a potential data leak.

Workflow overview:

✉️ Phishing Email
→ Actor 1: The Protector (Third-Party SEG “Threatwave”): Initial Triage & Precise Redaction
→ Actor 2: UNC-HEMAIL (The Collector’s Platform): Deep Analysis & Custom Tooling


2. Key Indicators of Compromise (IoCs)

The following IoCs are associated with UNC-HEMAIL’s infrastructure and analysis platform with high confidence.

Type	Indicator	Notes
MX Record	mail.h-email.net	Central collection point for all typosquatted domains.
Internal Marker	Jackdavis@eureliosollutions.com	Used as an email replacement during technical analysis.
IP	165.227.159.144, 165.227.156.49, 167.235.143.33, 49.13.4.90, 5.75.171.74, 91.107.214.206, 178.62.199.248, 5.161.98.212, 162.55.164.116, 5.161.194.135	“The Collector” infrastructure
Domain cluster	ygmail.com, gmaio.com, hotmail.com.xn--6frz82g, notmail.com, yabbo.com, gmeil.com, gmagl.com, gmai.com, gmaol.com, hotmaila.com, hotma8il.com, homaitl.com, hoptmail.com, chotmail.com, oultook.com, gmdil.com, gmailz.com, outklook.com, outlookt.com, outloek.com, yahoo.com.xn--6frz82g, y6ahoo.com, yahyoo.com, hotmailc.com, hotmailt.com, hotmaiol.com, yahoio.com, yaxoo.com, yahoo.com.cm	Confirmed responsive domains under the actor’s control
User-Agent	Aweme/29.3.0	Douyin/TikTok app; tests context-aware threats.
User-Agent	THDConsumer/7.45.0.1	Specific UA for The Home Depot app, used to test in-app browser rendering.
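
As a rough illustration of how the IP and User-Agent indicators above might be applied to web-server logs, here is a minimal sketch. The function name `match_iocs` and the idea of checking one request at a time are assumptions made for this example, not part of the original experiment's tooling.

```python
import ipaddress

# IoC values copied from the table above.
COLLECTOR_IPS = {
    ipaddress.ip_address(ip)
    for ip in (
        "165.227.159.144", "165.227.156.49", "167.235.143.33", "49.13.4.90",
        "5.75.171.74", "91.107.214.206", "178.62.199.248", "5.161.98.212",
        "162.55.164.116", "5.161.194.135",
    )
}
IN_APP_UA_TOKENS = ("Aweme/", "THDConsumer/")


def match_iocs(source_ip: str, user_agent: str) -> list[str]:
    """Return the IoC categories that a single web request matches."""
    hits = []
    try:
        if ipaddress.ip_address(source_ip) in COLLECTOR_IPS:
            hits.append("collector_ip")
    except ValueError:
        pass  # malformed address in the log line; skip the IP check
    # The app token may appear anywhere in the header, so use a substring test.
    if any(token in user_agent for token in IN_APP_UA_TOKENS):
        hits.append("in_app_user_agent")
    return hits
```

Both User-Agent strings belong to legitimate, widely used apps, so a User-Agent match on its own is weak evidence; it is more meaningful when combined with one of the listed IPs.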

3. Analysis of Tactics, Techniques, and Procedures (TTPs)

UNC-HEMAIL’s platform demonstrated a structured and methodical approach to threat analysis. The observed TTPs are detailed below.

T1583: Acquire Infrastructure
- The actor operates a significant network of typosquatted domains mimicking major email providers. This infrastructure is centrally managed, pointing to a common MX record, indicating a long-term, strategic operation for passive data collection.
T1027: Obfuscated Files or Information
- The analysis platform systematically uses a wide range of common User-Agents (e.g., Chrome on Windows, Safari on iPhone) and residential proxies to disguise its automated scanning activity, making it difficult to distinguish from legitimate user traffic.
T1497: Virtualization/Sandbox Evasion
- The use of specific mobile application User-Agents, such as THDConsumer (The Home Depot) and Aweme (Douyin/TikTok), indicates a capability to emulate specific in-app browser environments.
  This technique is used to determine whether a payload behaves differently when rendered within a third-party mobile application’s webview, a behavior that context-aware threats rely on to evade simple sandboxes.
Custom TTP: Internal Threat Correlation
- The platform reuses attacker-provided unique identifiers (UUIDs) from inbound links and pairs them with its own internal markers (e.g., Jackdavis@…) to precisely correlate its multi-stage analysis back to a single, specific threat artifact.
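
Seen from the sender's side, the UUID reuse described above means that hits on the tracking server can be grouped per token. The following is a minimal sketch under stated assumptions: the URL layout, the `correlate_hits` name, and the returned structure are hypothetical; only the marker domain comes from the IoC table.

```python
import re
from collections import defaultdict
from urllib.parse import unquote

UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)
MARKER_DOMAIN = "eureliosollutions.com"  # from the IoC table


def correlate_hits(urls):
    """Group requested URLs by tracking UUID and note marker substitution."""
    by_uuid = defaultdict(lambda: {"hits": 0, "marker_seen": False})
    for url in urls:
        decoded = unquote(url)  # handle percent-encoded '@' and similar
        match = UUID_RE.search(decoded)
        if match is None:
            continue  # request does not carry a tracking token
        entry = by_uuid[match.group(0).lower()]
        entry["hits"] += 1
        if MARKER_DOMAIN in decoded.lower():
            entry["marker_seen"] = True
    return dict(by_uuid)
```

A token that receives several hits, at least one of which carries the marker, is a candidate for the multi-stage analysis chain described in this section.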

4. Operational Workflow & Attribution

The analysis confirms a two-tiered operational structure, allowing for a clear distinction between the primary actor and its security provider.

Actor 1: Third-Party Secure Email Gateway (SEG) – “Threatwave”

Role: Acts as the first line of defense, performing initial triage on unknown threats.
TTP: Performs precise, “surgical” redaction of URLs, correctly parsing and replacing only the identified email string. This behavior is consistent with a mature, commercial security product.

Actor 2: UNC-HEMAIL (“The Collector”)

Role: The primary threat intelligence platform, conducting in-depth analysis.
TTP: Employs a less precise, pattern-based redaction method, indicative of a custom internal tool. It uses unique internal markers and advanced analysis techniques not observed at the gateway level.

Attribution: The significant difference in URL manipulation TTPs provides high confidence that the external SEG is a commercial service provider, while UNC-HEMAIL is the end customer and the primary actor conducting the deep-dive analysis.
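
The distinction between the two redaction styles can be made mechanical. The sketch below is one possible heuristic, not the method used in the experiment: it labels a rewrite "surgical" when everything around the email string is untouched, and "pattern_based" when other parts of the URL changed as well. The function name and labels are assumptions, and the exact output of the Collector's imprecise redaction is not documented here.

```python
def classify_redaction(original_url: str, observed_url: str, email: str) -> str:
    """Label how the email embedded in original_url was altered in observed_url."""
    if observed_url == original_url:
        return "unmodified"
    start = original_url.find(email)
    if start == -1:
        raise ValueError("email not present in original URL")
    prefix = original_url[:start]
    suffix = original_url[start + len(email):]
    # Surgical: text before and after the email survives byte for byte.
    untouched_around = (
        observed_url.startswith(prefix)
        and observed_url.endswith(suffix)
        and len(observed_url) >= len(prefix) + len(suffix)
    )
    return "surgical" if untouched_around else "pattern_based"
```

Comparing the label for each hit with its source IP and timing is one way to separate gateway traffic from the internal platform's traffic.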

5. Timeline of Events & Actor Adaptation

Phase 1: Initial Probing & Evasion. The actor’s automated systems responded to an initial campaign. Both redaction techniques were observed, confirming the multi-layered analysis chain.
Phase 2: Infrastructure Burnout. A follow-up campaign from the same source IP yielded no response. This indicates the source IP was blacklisted after Phase 1, demonstrating an effective, adaptive defense posture.
Phase 3: Successful Bypass & Deep Analysis. A new campaign from a “clean” domain and IP bypassed the initial block. UNC-HEMAIL’s internal platform engaged directly, initiating its full suite of technical and behavioral analysis TTPs.
Phase 4: Human vs. Machine Detection. The new domain was placed behind Cloudflare. The actor’s response continued, which allowed metrics to be collected to help distinguish automated (“robotic”) activity from potential human-driven activity.

6. Quantitative Metrics & Analysis

This section provides quantitative data gathered during the experiment, which further supports the conclusion of automated analysis rather than human interaction.

Engagement Rates

Metric	Value
Total Emails Sent	980
Recorded Clicks	41 (implied by the click-through rate)
Click-Through Rate	~4.18%
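
As a sanity check on the engagement figures, the click count and the click-through rate can be related by simple arithmetic. A minimal sketch, assuming the rate is defined as recorded clicks divided by emails sent (the definition is not stated explicitly):

```python
EMAILS_SENT = 980
CLICK_THROUGH_RATE = 0.0418  # "~4.18%" as reported

# Assumption: rate = recorded clicks / emails sent.
implied_clicks = round(EMAILS_SENT * CLICK_THROUGH_RATE)
print(implied_clicks)  # 41

# Cross-check: recompute the rate from the implied integer count.
recomputed_rate = round(implied_clicks / EMAILS_SENT * 100, 2)
print(recomputed_rate)  # 4.18
```

The neighboring integers give about 4.08% (40 clicks) and 4.29% (42 clicks), so 41 is the only count consistent with the reported rate.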

Latency statistics

The observed latencies are not consistent with human behavior and strongly point to automated, queued processing within a sandbox environment. The high standard deviation in particular indicates a non-uniform, multi-stage analysis process.

Metric	Avg Delay (s)	Min Delay (s)	Std Dev (s)	Interpretation
Open Delay	~2182 s (~36 min)	5	~1469	The extremely wide range (from 5 seconds to over an hour) suggests a complex queuing system. Some emails are triaged instantly, while others wait in a queue for available analysis resources.
Click Delay	~1782 s (~30 min)	31	~1139	The high average delay confirms automated sandboxing. The high standard deviation suggests different analysis paths or priorities for different links.
JS Fingerprint Delay	~2449 s (~41 min)	42	~1585	The significant delay between click and JS execution confirms a multi-step sandbox process: page load, render, and subsequent script execution, with high variability in processing time.

Metrics Conclusion: The high average latencies (30 to 41 minutes), coupled with standard deviations of roughly two-thirds of the mean, are not consistent with sustained human interaction. They are characteristic of automated security systems that queue, detonate, and analyze content in a controlled, non-real-time, and resource-dependent environment.
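
For readers who want to reproduce this kind of summary from their own tracking data, the following sketch computes the same statistics from a list of delays and derives the coefficient of variation (standard deviation divided by mean) from the reported figures. It assumes the sample standard deviation; which variant was used for the table is not stated, and the function names are illustrative.

```python
import statistics


def summarize_delays(delays_s):
    """Return mean, min, sample std dev, and coefficient of variation."""
    mean = statistics.mean(delays_s)
    stdev = statistics.stdev(delays_s)  # sample std dev (an assumption)
    return {
        "mean": mean,
        "min": min(delays_s),
        "stdev": stdev,
        "cv": stdev / mean,
    }


# Reported (mean, std dev) pairs from the table above, in seconds.
REPORTED = {
    "open": (2182, 1469),
    "click": (1782, 1139),
    "js_fingerprint": (2449, 1585),
}


def reported_cv():
    """Coefficient of variation for each reported metric, rounded to 2 places."""
    return {name: round(std / mean, 2) for name, (mean, std) in REPORTED.items()}
```

For the reported values, `reported_cv()` yields ratios between 0.64 and 0.67 across all three metrics, which is the sense in which the spread is large relative to the average.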

7. Conclusions

The experiment successfully identified and profiled the operations of a sophisticated threat intelligence actor, UNC-HEMAIL. The key conclusions are as follows:

1. The Underestimated Risk: An email sent to the wrong domain (e.g., user@gmgil.com instead of user@gmail.com) is not a message ‘lost in the void.’ Our experiment showed quite the opposite. That ‘void’ is actually an analysis lab that activates almost immediately. The content of the email sent by mistake is now in the hands of an unknown entity, leading to a full-blown data leak.

2. Existence of a Coordinated Intelligence Platform: The evidence confirms the existence of a single actor managing a large, centralized infrastructure (h-email.net) for the passive collection and analysis of threats from typosquatted domains.

3. Multi-Layered Operational Security: UNC-HEMAIL employs a mature, two-tiered operational model, utilizing a commercial third-party SEG for initial threat screening and its own proprietary platform for in-depth analysis. This demonstrates a high level of operational security consciousness.

4. Adaptive Defense Capabilities: The actor’s platform is not static; it learns from interactions. It demonstrated the ability to create signatures from an initial attack (Phase 1) and use them to block subsequent, similar threats at the infrastructure level (Phase 2).

5. Advanced Analysis TTPs: The platform’s capabilities go beyond simple link detonation. It performs active reconnaissance on threat infrastructure, tests for context-aware payloads (via in-app UA emulation), and uses internal correlation methods (UUID reuse, custom markers) for precise threat tracking.

6. Automated, Non-Human Interaction: The quantitative latency data, particularly the high averages and standard deviations, conclusively demonstrates that the observed activity is entirely automated and characteristic of a non-real-time sandbox analysis workflow.

7. Anomalous SPF Configuration: The Sender Policy Framework (SPF) record for mail.h-email.net authorizes sending only from the IPv6 range fd96:1c8a:43ad::/48, which falls within the Unique Local Address (ULA) space (fc00::/7, RFC 4193). These addresses are the IPv6 counterpart of RFC 1918 private IPv4 addresses (such as 192.168.x.x) and are not routable over the Internet. Authorizing a private range to send public email is extremely unusual: it could indicate a complex IPv6-based internal network or simply a misconfiguration. Either way, the presence of this policy, although ineffective for public mail, is a strong indication of a nonstandard network environment.
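
The SPF observation in the last point can be checked programmatically with Python's standard `ipaddress` module. This is a minimal sketch: the record string below is hypothetical (only the ip6 range comes from the report; the `-all` terminator is an assumption), and `non_routable_ip6_terms` is an illustrative name.

```python
import ipaddress

ULA_SPACE = ipaddress.ip_network("fc00::/7")  # Unique Local Addresses, RFC 4193


def non_routable_ip6_terms(spf_record: str) -> list[str]:
    """Return ip6 networks in an SPF record that fall inside the ULA space."""
    flagged = []
    for term in spf_record.split():
        mechanism = term.lstrip("+-~?")  # drop the optional SPF qualifier
        if not mechanism.lower().startswith("ip6:"):
            continue
        network = ipaddress.ip_network(mechanism[4:], strict=False)
        if network.subnet_of(ULA_SPACE):
            flagged.append(str(network))
    return flagged


# Hypothetical record text; only the ip6 range is taken from the report.
record = "v=spf1 ip6:fd96:1c8a:43ad::/48 -all"
print(non_routable_ip6_terms(record))  # ['fd96:1c8a:43ad::/48']
```

A record whose only authorized senders are ULA ranges cannot match any mail arriving over the public Internet, which is what makes the configuration stand out.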

8. Recommendations for Defenders

Monitor for similar IoCs: Security teams should monitor traffic (especially outbound mail) involving the MX host mail.h-email.net, its IPs, the listed typosquatted domains, and the marker Jackdavis@eureliosollutions.com as indicators of potential information gathering.

Assume Multi-Layered Analysis: When red-teaming or analyzing phishing campaigns, assume that initial interactions may come from a commercial SEG, while more sophisticated analysis may follow from a separate, internal platform.

Diversify Infrastructure: To bypass adaptive defenses, threat emulation must involve rotating not only source IPs and domains but also the structure and content of payloads to avoid signature-based detection.
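
The first recommendation above, monitoring outbound mail for the known domains, could be sketched as a simple recipient check. The domain set is copied from the IoC table; the function name and the idea of running it over a list of recipient addresses (for example, from mail gateway logs) are assumptions for illustration.

```python
# Confirmed responsive domains from the IoC table.
TYPOSQUAT_DOMAINS = frozenset({
    "ygmail.com", "gmaio.com", "hotmail.com.xn--6frz82g", "notmail.com",
    "yabbo.com", "gmeil.com", "gmagl.com", "gmai.com", "gmaol.com",
    "hotmaila.com", "hotma8il.com", "homaitl.com", "hoptmail.com",
    "chotmail.com", "oultook.com", "gmdil.com", "gmailz.com", "outklook.com",
    "outlookt.com", "outloek.com", "yahoo.com.xn--6frz82g", "y6ahoo.com",
    "yahyoo.com", "hotmailc.com", "hotmailt.com", "hotmaiol.com",
    "yahoio.com", "yaxoo.com", "yahoo.com.cm",
})


def flag_recipients(recipients):
    """Return the recipient addresses whose domain is a known collector domain."""
    flagged = []
    for address in recipients:
        # Everything after the last '@' is the domain part.
        _, _, domain = address.rpartition("@")
        if domain.strip().lower().rstrip(".") in TYPOSQUAT_DOMAINS:
            flagged.append(address)
    return flagged
```

Because the confirmed list covers only a small part of the estimated domain cluster, a static list like this is a starting point; resolving each recipient domain's MX record and comparing it with mail.h-email.net would catch domains that are not yet listed.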