Most Popular Subdomains and MX Records on the Internet
Simply put, today's internet runs on DNS.
The concept is laced with hierarchical overtones, attributable to the structured nature of the protocol itself and its pivotal role in the proper functioning of the network of networks. After all, a quick survey of today's visible internet points to nearly 364 million domain registrations across the most popular top-level domains. That's a testament to the rapidly growing need for coherent DNS and IP intelligence solutions that can quickly and effectively sort through the resulting complexity.
In short, DNS records can reveal a plethora of important information, including perimeter protection mechanisms and technologies in use, inconsistencies in canonical entities pointing to specific security implications, and similar flaws leading to potential DNS takeover scenarios, so the value is definitely there.
At the heart of this blog post lies yet another attempt at recognizing the importance of information gathering and asset discovery to the efforts of security researchers and bug bounty hunters alike, as they strive for a suitable interplay of passive DNS enumeration capabilities and techniques. Our goal is to showcase the most commonly used subdomains and MX records as they complement and enrich the asset discovery ecosystem.
If you're in the business of network reconnaissance or asset discovery, mastering the above techniques can go a long way in ensuring flexibility when examining potential areas of exposure and validating legitimate targets of opportunity prior to any engagement.
Let's take a quick look.
Most popular subdomains on the internet
In the recent past, we've argued that finding domains associated with a specific target is central to the idea of extending the attack surface. This long-standing argument reflects the possibility of both horizontal and vertical domain correlation, where the intent is to search for any available subdomains and siblings corresponding to the apex. Success in this area is often measured in terms of forgotten or mishandled domain records, which give miscreants an additional target of opportunity to capitalize on.
As a refresher, domain names consist of human-readable character strings that point to a specific network resource. In turn, DNS leverages a hierarchical arrangement starting with TLDs, or top-level domains, such as .com or .net, followed by second- and third-level domains, which consumers can acquire and control at will.
This form of domain administration allows for further specialization whereby domains can be scaled to generate the desired aggregates. For instance, third-level domains, or subdomains as they are normally referred to, can identify an FTP server simply by prepending ftp to domain.com; this denotes the collective designation of a resource via a unique identifier such as ftp.domain.com, otherwise known as a fully qualified domain name, or FQDN.
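As a minimal sketch of the hierarchy described above, the following Python snippet splits a hostname into its TLD, apex, and subdomain parts. The function name `split_fqdn` is our own illustration, and the logic assumes a single-label TLD such as .com or .net; multi-label suffixes such as .co.uk would need a public suffix list.

```python
def split_fqdn(fqdn: str) -> dict:
    """Naively split a hostname into TLD, apex domain, and subdomain.

    Assumption: the TLD is a single label (for example .com or .net).
    """
    # Drop the optional trailing root dot and normalize case.
    labels = fqdn.rstrip(".").lower().split(".")
    if len(labels) < 2:
        raise ValueError(f"not a registrable hostname: {fqdn!r}")
    return {
        "tld": labels[-1],
        "apex": ".".join(labels[-2:]),
        # Everything left of the apex; empty string for the apex itself.
        "subdomain": ".".join(labels[:-2]),
    }


print(split_fqdn("ftp.domain.com"))
```

For ftp.domain.com, the subdomain part is ftp and the apex is domain.com.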
Playing a role often attributed to hostnames within organizational boundaries, subdomains typically exhibit the greatest flexibility when it comes to naming conventions. Thus, large-scale DNS intelligence dictates that keeping an eye on the fluidity within domain names offers a critical view of the threat landscape. This is also the case where subdomain knowledge is leveraged at high-level stages of the recon process, targeting institutional privacy via recursive DNS data and any resulting bidirectional activity in the process.
So, what are the most popular subdomains, and how can we identify them?
From a corpus of over 17 billion records of crawled web data for the .com TLD, and associated URLs, hosted at Common Crawl, we set out to investigate the feasibility of pulling a subset of these records using a common programming language like Python, some commodity hardware, and supporting tools like the CDX Toolkit.
Working with this population sample not only gave us the opportunity to compare results with previous attempts by Bitquark some years ago on the same topic, but also allowed us to keep such a computationally intensive task within achievable terms.
Since data from past runs pointed to several hours of running time, we decided to limit our search to URLs with HTTP status 200 and compare the outcome to previous data from Bitquark and others. The results were consistent with those earlier findings. Here are the top 10 subdomains from the June 2021 Common Crawl dataset:
As you can see, the most popular subdomain is by far "www" (the famous World Wide Web initials still in use today), followed by mail, forum, m, and blog. Other popular records are related to e-commerce (shop), and community and content-related stuff (forums, wiki, community).
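The counting step behind a ranking like this can be sketched in a few lines of Python. This is not the script used for the study. It assumes the crawl index has already been reduced to `(url, status)` pairs, counts each distinct hostname once (our own choice), and uses made-up sample URLs on reserved example domains.

```python
from collections import Counter
from urllib.parse import urlsplit


def subdomain_of(url: str) -> str:
    """Return the labels left of the apex, assuming a single-label TLD."""
    host = (urlsplit(url).hostname or "").rstrip(".")
    return ".".join(host.split(".")[:-2])


def top_subdomains(records, n=10):
    """Rank subdomains across distinct hostnames that returned HTTP 200."""
    counts = Counter()
    seen_hosts = set()
    for url, status in records:
        if status != 200:
            continue  # mirror the status filter described above
        host = urlsplit(url).hostname
        if host is None or host in seen_hosts:
            continue  # count each hostname once, not once per URL
        seen_hosts.add(host)
        sub = subdomain_of(url)
        if sub:  # skip bare apex domains
            counts[sub] += 1
    return counts.most_common(n)


# Hypothetical sample records, not taken from the Common Crawl results.
sample = [
    ("https://www.example.com/", 200),
    ("https://www.example.com/about", 200),
    ("https://www.example.org/", 200),
    ("http://mail.example.com/login", 200),
    ("https://blog.example.com/old", 404),
    ("https://example.com/", 200),
]
print(top_subdomains(sample))  # [('www', 2), ('mail', 1)]
```

Counting per hostname rather than per URL keeps a single large site from dominating the ranking; counting per URL would be an equally valid design with a different bias.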
Most popular MX records
The importance of MX records cannot be overstated. E-mail is an unquestionably critical component of today's digital infrastructure, and MX records in the hands of threat actors can turn into valuable information, especially when vulnerabilities are announced in security platforms charged with protecting mail exchangers and supporting platforms.
In the summer of 2020, when we introduced our domain-specific SQL-like query language and its web interface, SQL Explorer, customers and security practitioners alike instantly seized the opportunity to leverage this new tool by relying on the power of SQL syntax and predicates over structured data. It was a reasonable improvement in the eyes of those always seeking to eke out as much information about a target as possible.
Thus, we decided to put our very own SQL Explorer to the test by searching through MX records and hostnames attributed to both incoming and outgoing mail servers for some of the most popular tech giants, including domains such as google.com. The SQL query in question appeared as follows:
The table below shows the predominant domains and hostnames associated with mail records, once again, attributed to the generic TLD .com:
Extracting subdomain information from the above table (and remaining records) yielded the most popular subdomains in no particular order:
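The extraction step mentioned above can be sketched in Python as follows. This is a minimal illustration, not the method used for the article. It assumes MX data in the usual `<preference> <exchange>` text form and a single-label TLD, and the hostnames are made-up placeholders on reserved example domains, not entries from the results.

```python
from collections import Counter


def parse_mx(rdata: str) -> tuple[int, str]:
    """Parse MX record data of the form '<preference> <exchange>'."""
    preference, exchange = rdata.split()
    # Drop the trailing root dot and normalize case.
    return int(preference), exchange.rstrip(".").lower()


def mx_subdomains(rdatas, apex_labels=2):
    """Count the subdomain part of each mail exchanger hostname."""
    counts = Counter()
    for rdata in rdatas:
        _, exchange = parse_mx(rdata)
        sub = ".".join(exchange.split(".")[:-apex_labels])
        if sub:  # skip exchangers that are bare apex domains
            counts[sub] += 1
    return counts


# Hypothetical MX data for illustration only.
sample_mx = [
    "10 mx1.example.com.",
    "20 mx2.example.com.",
    "10 mx1.example.org.",
    "5 example.net.",
]
print(mx_subdomains(sample_mx).most_common())  # [('mx1', 2), ('mx2', 1)]
```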
Closing thoughts
In all likelihood, the term "subdomain" (or "MX record") has crossed your radar countless times throughout your cyber career. But the intrinsic properties of the DNS protocol itself bring important underlying challenges in the area of temporal consistency due to unchecked ambiguities. As a result, DNS measurement studies continue to suggest that passive analysis of subdomains and MX records on such a large scale can be, as noted in the first part of this article, a formidable undertaking.
Despite transformative advances in big data analysis and processing, enumerating DNS records remains computationally intensive, so caution and adequate preparation are always warranted.
As always, we invite you to experience what a promising prospect our very own SecurityTrails SQL can be, particularly in large-scale asset discovery projects requiring a high level of consistency, speed, reliability, and accuracy. Give it a try; you won't be disappointed.