#ProTips: Understanding a Leaky Internet with Gregory Boddin
Reading time: 11 minutes
Over the past few years, data breaches involving millions of leaked records have become the norm. A common offender we're seeing more of is the presence of poorly secured and misconfigured databases connected to the Internet.
Leaving any database exposed to the Internet is often the result of simple human error, but the consequences are anything but simple. In the past three months alone, hundreds of unsecured databases left exposed to the Internet were the subject of "meow" attacks that destroyed data without much explanation, leaving only the word "meow" as a calling card. So far, more than 1,000 unsecured databases have had their data permanently wiped.
To shed light on these recent leaks and attacks, Gregory Boddin from LeakIX is joining us in this latest installment of ProTips. LeakIX is a new engine developed in Belgium that indexes and provides insight into compromised devices, servers and databases. LeakIX helps security researchers and threat intelligence companies keep track of all campaigns active in the wild by providing actionable data on cybercrime campaign trends.
Today we'll learn about how to secure databases, how to identify meow attacks, the pros and cons of open search engines, Gregory's favorite open source scanning tools, and we'll even tackle some underappreciated practices such as source code mining and IPv6 scanning.
- ProTip 1: Open databases and how to secure them
- ProTip 2: Identifying meow attacks
- ProTip 3: Best open source scanning tools
- ProTip 4: Pros and cons of open search engines
- ProTip 5: Tackling IPv6
- ProTip 6: Source code mining
- ProTip 7: LeakIX power tips
ProTip 1: Open databases and how to secure them
Most of the databases left open at the moment are MongoDB and Elasticsearch servers. They're used for various purposes, ranging from simple log servers to full-on "source of truth" object storage.
This means they can contain logs from users visiting your site, sometimes including what users submit in forms: first name, last name, location, passwords or credit card numbers.
After logs, you can find full repositories of various objects:
- Users
- Payment information
- Metrics about infrastructures, revealing their schema and internal config
- Large aggregates of data used for research, sometimes leading to disastrous consequences (social account scraping, sensitive Internet recon)
On the MySQL side, many instances turn out to be development instances. Those may include production data, depending on the company policy.
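Securing them mostly comes down to not listening on public interfaces and requiring authentication. As a minimal sketch (exact file paths and option names can vary between versions), the relevant settings look roughly like this:

# /etc/mongod.conf
net:
  bindIp: 127.0.0.1          # listen locally or on a private interface only
security:
  authorization: enabled     # require users and roles

# /etc/elasticsearch/elasticsearch.yml
network.host: 127.0.0.1      # keep the node off public interfaces
xpack.security.enabled: true # turn authentication on

Even with those in place, a firewall in front of the database should never be optional.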

ProTip 2: Identifying meow attacks
When LeakIX was first released to the public, it was mostly a tool to help track the schema of databases. It turns out you can get many types of information from a database schema without accessing the data itself.
Identifying ransom campaigns was one of them: those groups usually leave the same message, with different mail/BTC addresses, in a newly created database or index with a specific name, such as "hello" or "read_me".
What we didn't expect, however, was for all those servers to be completely wiped by what seems to be a non-profit attack.
Focusing on the Elasticsearch side, we started seeing meowed servers in the wild around the 20th of July. Roughly 500 servers were hit on the first day, and the attack kept running strong for the next two weeks:
2020-07-20 : 501 unique IPs
2020-07-21 : 804 unique IPs
2020-07-22 : 1146 unique IPs
2020-07-23 : 661 unique IPs
2020-07-24 : 1189 unique IPs
2020-07-25 : 1091 unique IPs
2020-07-26 : 1403 unique IPs
2020-07-27 : 1693 unique IPs
2020-07-28 : 1664 unique IPs
2020-07-29 : 1727 unique IPs
2020-07-30 : 488 unique IPs
2020-07-31 : 549 unique IPs
2020-08-01 : 202 unique IPs
There was no sign of new infections until the 12th of August, with roughly 600 more infections recorded since then.
Those represent a subset of what happened since we cannot guarantee we indexed ALL servers. From our data, however, we found that:
- 7,340 out of 12,371 Elasticsearch servers have been infected
- 3,200 out of 15,664 MongoDB servers have been infected
- 809 out of 7,689 Zookeeper servers have been infected
You can find the full list of infections with daily reports here.
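If you want to check one of your own nodes, a meowed Elasticsearch server is easy to spot from the index list alone: the original indices are gone, typically replaced by indices with random names ending in "-meow". A quick check (hypothetical host, standard _cat API) looks like:

$ curl -s "http://localhost:9200/_cat/indices?h=index" | grep -- '-meow'

Any output means the node has been hit; no output only means this particular campaign hasn't reached it yet.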


ProTip 3: Best open source scanning tools
Project Discovery
A last-minute find while writing this article is the open source part of ProjectDiscovery.
They provide a full set of tools to facilitate recon and structure your data, with Nuclei being the star of the show: a web scanning engine with bundled rules written as easy-to-read YAML files, which makes it easy to create your own and extend its capabilities.
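To give an idea of how approachable those rules are, here is a minimal, hypothetical template in the same spirit as the ones Nuclei ships with (following the template layout at the time of writing), which flags an exposed .git/config:

id: exposed-git-config

info:
  name: Exposed .git/config
  severity: medium

requests:
  - method: GET
    path:
      - "{{BaseURL}}/.git/config"
    matchers-condition: and
    matchers:
      - type: status
        status:
          - 200
      - type: word
        part: body
        words:
          - "repositoryformatversion"

Pointing Nuclei at a list of hosts with a template like this is usually a one-liner along the lines of nuclei -l targets.txt -t exposed-git-config.yaml.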
ProjectDiscovery GitHub page
ProTip 4: Pros and cons of open search engines
Some would see the full disclosure of open servers as a security risk. However, let's take another perspective by looking at the facts:
LeakIX's core was developed in under three weeks; the rest of the work was making it presentable to the public and occasionally improving or adding plugins. This shows it doesn't take much work for any motivated actor to build something similar.
While you might think open servers would take some time to get attacked, this is simply NOT true as honeypot studies show that open servers are usually indexed/dumped during the first day/hours they become open on the Internet.
Therefore, it's a huge misconception to wait for a "ransom" note to pop up before considering yourself compromised, as quieter threat actors have most likely been there already.
We won't discuss the option of paying this kind of ransom here, but in this situation, just know that the data could be disclosed or used by another group at any time.
What is present in open (or closed) search engines like LeakIX, Shodan and ZoomEye should be considered as compromised, not because it made it to the index, but because there are always faster actors out there.
I recently showed the index to a friend whose field has nothing to do with computers (okay, so he's an uber-geek):
In 10 minutes he managed to identify three targets in our home country that we could report and resolve together.
We're choosing a difficult road: caring about our researchers and end-users first. The former deserve a collaborative service to protect the latter.
Our plan is to make recent additions visible only to researchers once we get more of them on board.
We don't think selling the data has any impact whatsoever on limiting access to the "good guys". It's a way to do business, which we respect, but we'd like to connect more dots together and make this an efficient and rewarding process for everyone involved.
We also believe in the long term. There's a place for "no-scope" bounty hunting and we'll keep advocating it.
ProTip 5: Tackling IPv6
Aside from learning, one of the reasons we designed our own tools is to make sure every part of them is IPv6-ready.
The issue with IPv6 is that a sequential or random scanning approach would yield a very low number of results over time, and that's because IPv6 has 128 bits dedicated to addressing, versus 32 bits in IPv4.
The good news is that IPv6 addressing is still done by humans and/or predictable provisioning software.
Take the network part: it doesn't change much. Usually, 16-32 bits will indicate the customer's network in a cloud/hosting environment.
There is also a good chance machines on that network are addressed sequentially (e.g., 1, 2, 3, 100, 101, 102...), which represents another 8 to 16 bits.
And we're back to 32 bits per network (with a list of well-known networks) in the worst-case scenarios. This example may be over-simplified but it shows just how predictable the space can become.
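To make that concrete, here is an over-simplified, hypothetical example: once a /64 is known (the prefix below is borrowed from the scan output further down in this article), sequential neighbours can be generated and probed:

$ for i in $(seq 1 20); do printf '2604:a880:800:a1::%x\n' "$i"; done
2604:a880:800:a1::1
2604:a880:800:a1::2
...
2604:a880:800:a1::14

Only probe networks you're allowed to scan; the point is simply that a few bits of structure shrink the search space dramatically.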
Multiple researchers have written tools that would allow machine learning over a known list of IPv6 addresses and generate potential new ones based on the computed model.
IPv666
The first tool we found and used was IPv666.
Using the default model allowed us to index a lot of IPv6 addresses, but things became repetitive after 2-3 weeks.
I haven't dug much further because we found another tool, but I suspect generating an up-to-date/tailored model will improve/vary the results.
Entropy/IP
The second set of tools we tried is the suite proposed by a dedicated Akamai team, based on their paper:
Entropy/IP: Uncovering Structure in IPv6 Addresses
They can be found at:
- Entropy/IP - creates a model from an IPv6 list
- eip-generator - generates IPv6 addresses based on that model
If used correctly, you can build your own models; for instance, you could build models from the IPs in our index.
Those guys deserve HUGE credit for their work.
IPv6 Hitlist
Lastly, there's the IPv6hitlist project, whose team also published a paper:
Clusters in the Expanse: Understanding and Unbiasing IPv6 Hitlists
This team includes some of the guys from the Entropy/IP project but also uses other methods to publish fresh lists regularly.
We're currently importing their output once a week.
All public IPv6 addresses can be uncovered
We've seen reports of well-known security entities infiltrating and using NTP pool servers to gather IPv6 addresses.
If your server is connecting to the Internet, its IP is known:
- At your ISP
- At various network exchange points
- At various update services you use (apt, yum, Docker images)
- At your DNS provider
- At your time provider
- At your VPN provider
This list is non-exhaustive, and various actors use these sources for scans and analytics.
Firewall rules should be applied to both IPv4 AND IPv6. It's not an option.
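In practice that means duplicating the policy for both stacks; with plain iptables, for example, rules added for IPv4 do nothing for IPv6 traffic (a hypothetical sketch for an Elasticsearch port):

# IPv4
$ iptables  -A INPUT -p tcp --dport 9200 -s 10.0.0.0/8 -j ACCEPT
$ iptables  -A INPUT -p tcp --dport 9200 -j DROP
# IPv6: the same policy has to be declared again
$ ip6tables -A INPUT -p tcp --dport 9200 -s fd00::/8 -j ACCEPT
$ ip6tables -A INPUT -p tcp --dport 9200 -j DROP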
ProTip 6: Source code mining
One practice I find underappreciated is mining for leftover deployment artifacts.
We recently added the GitConfigPlugin to the engine, which does a really stupid thing: GET /.git/config. To my surprise it started yielding substantial results.
While there are false positives due to direct clones from open source projects, others are linked to closed-source repositories.
The presence of the config file will often mean the whole .git directory is open for download, containing all the committed/stashed source code as packed objects.
The excellent GitTools suite can be used to retrieve and extract the code and its history.
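A typical session (hypothetical target) boils down to three steps: check that the config answers, dump the exposed .git directory, then rebuild the working tree from the recovered objects:

# 1. Does /.git/config answer? A 200 usually means the directory is exposed
$ curl -s -o /dev/null -w '%{http_code}\n' https://target.example/.git/config

# 2. Dump the .git directory, then 3. extract the source tree from it
$ ./Dumper/gitdumper.sh https://target.example/.git/ loot/repo
$ ./Extractor/extractor.sh loot/repo loot/source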
This is obviously a strong demonstration of why you should never commit credentials alongside your actual code.
ProTip 7: LeakIX power tips
1. Participate in research
This isn't well known, but if you log in, you can create a team and start reporting leaks from the host page. The leak will be assigned to you and hidden from public view so you can do a proper disclosure. Know that we do ban researchers who lock results without reporting them, as it goes against our long-term vision.
2. Command line tool
My own favorite, which I use all the time: our CLI tool.
At the time of this writing, version 0.2.0 has been released. It can be used to query the index from the command line and format the information so you can use it for further research.
$ leakix -l 2 -q "protocol:web AND plugin:GitConfigPlugin" -t "{{ .Ip }}:{{ .Port }} : {{ .Data }}"
178.62.217.44:80 : Found git deployment configuration
[core]
repositoryformatversion = 0
filemode = false
bare = false
logallrefupdates = true
[remote "origin"]
url = https://gitlab.com/lyranalytics/lyra-website.git
fetch = +refs/heads/*:refs/remotes/origin/*
[branch "abdulrahman"]
remote = origin
merge = refs/heads/abdulrahman
2604:a880:800:a1::10f:1001:80 : Found git deployment configuration
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
[remote "origin"]
url = https://github.com/mautic/mautic.git
fetch = +refs/heads/*:refs/remotes/origin/*
[branch "staging"]
remote = origin
merge = refs/heads/staging
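The same templating makes it easy to chain into other tooling. A hypothetical follow-up, keeping only the affected hosts for later triage:

$ leakix -l 50 -q "protocol:web AND plugin:GitConfigPlugin" -t "{{ .Ip }}:{{ .Port }}" | sort -u > git-exposed-hosts.txt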
If you check our GitHub home page, you will also find the source code used to generate the Meow statistics for this article in the leakixsurvey repository.
We hope it gives developers out there ideas to implement/build their own tooling.
Don't forget to check it regularly, as we'll keep adding new projects.
ProTips is an ongoing series where industry experts share their methodologies and cutting-edge tips on how you can sharpen your own cybersecurity skills. If you have suggestions on who you'd like to see featured in ProTips, or you think you're the right person for this series, we look forward to hearing from you! Send us an email at [email protected]
