SecurityTrails Blog · Nov 30 · by Gianni Perez

SecurityTrails Meets Gigasheet: Taking Your Recon Analysis to a Whole New Level

Reading time: 11 minutes

Humans, for the most part, aren't built to process and conceptualize data at any significant scale or speed.

Nevertheless, the last several years have seen unprecedented growth in data collection and ingestion techniques, driven by newer forms of network and cloud technology. This has stirred a particular (and ever-growing) concern among the cybersecurity community, as visibility threatens to diminish in proportion to the degree of integration.

In other words, organizations should be asking themselves whether the logs and data they're collecting actually tell the whole story and, if they do, whether the human component, namely the incident responders and threat hunters at the crossroads, can quickly align itself with what really took place.

There is, however, a new tool on the horizon that promises to disrupt the old paradigm of staring endlessly at relational entities, such as spreadsheets, in search of the mythical "Aha!" moment: Gigasheet.

Combining the succinct dimensionality of structured data with a powerful analytics engine capable of handling billions of data points at a time, Gigasheet lets data be manipulated, aggregated, queried, and analyzed within a single web-based ecosystem that is as broadly intuitive as it is powerful.

Incidentally, given this project's characteristics and the demands currently placed on good data quality, we could think of no better tool than our very own SQL Explorer to generate large recon datasets that could be easily consumed and analyzed: a collaborative endeavor that surely did not disappoint.

Enter Gigasheet

The future belongs to big data; there's very little doubt about that. The terminology, in all its rich diversity, dominates just about every aspect of our digital lives, including niche (i.e., non-tech) environments that once exhibited only a smattering of it, with the cybersecurity industry being a definitive, representative sample of the ongoing trend.

For instance, in cyber, data flows in from a multitude of services, often in disparate formats and layouts dictated by the originating application. As the pipeline grows, analysts can easily be caught in a never-ending cat-and-mouse game of chasing interesting artifacts and traffic, especially if their toolset of choice lacks important filtering, joining, and intersecting capabilities. In the latter case, large data dumps can dramatically compound the problem by requiring significant processing times even on robust hardware.

When the early adoption of cloud-based analytic tools became the dominant narrative, many seized the opportunity to integrate the emerging technology into their processes. This, however, entailed aggregating and normalizing, slicing and dicing, and similar operations, just to arrive at a suitable model capable of interoperability.

Thus, when presented with these and similar challenges, many chose (and still do) to resort to off-the-shelf applications (think Microsoft Excel here) for quick data representation—others preferred more programmatic approaches, such as the acclaimed Pandas library, but these precluded many entry-level professionals from expeditiously manipulating the data due to a substantial learning curve.

To break down some of these barriers, Gigasheet's team realized that accelerating analysis meant removing the initial scaffolding, reducing setup to a small number of clicks. This is SaaS at its best: resource-intensive tasks conducted with ease and scalability, without worrying about the underlying infrastructure, reliability, or accessibility. Team members no longer need to be sidetracked by maintenance windows or hardware issues, resulting in increased collaboration as well as faster response times to critical items in need of immediate attention. Best of all, Gigasheet's 24x7 development cycle translates directly into optimizations and fixes rolled out almost daily, giving you access to the latest functions and features.

In this blog post, we'll take you on a tour of Gigasheet as we explore a handful of its most salient features and applications, using search results rendered by SecurityTrails' SQL Explorer, as mentioned, as well as data generated by honeypot activity centered on port scanning traffic. We'll attempt to identify potential use cases in a variety of settings, including, but not limited to, threat hunting and incident triaging, formalizing methods when applicable and outlining opportunities to streamline operations.

Let's dive in.

Uniting SecurityTrails and Gigasheet

Getting started with Gigasheet is quite simple. After registering for a free account, you're presented with your workbench, or dashboard—this is where you'll be uploading the content (e.g., log files) to be analyzed.

Getting started with Gigasheet

Gigasheet allows you to upload files in different formats, including CSV, JSON, EVTX, tab-separated logs, and raw PCAPs, all of which can be zipped prior to uploading for faster ingestion. Additionally, you can point to a specific URL containing your file(s) and Gigasheet will happily handle the rest.

Gigasheet upload
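As an aside, a zipped CSV export can be sanity-checked locally before it goes up, since pandas reads single-file zip archives directly. A minimal sketch, using hypothetical file and column names:

```python
import os
import tempfile
import zipfile

import pandas as pd

# Build a tiny zipped CSV to stand in for a real SQL Explorer export;
# the "hostname" and "ip_address" columns are illustrative only.
path = os.path.join(tempfile.mkdtemp(), "recon.zip")
with zipfile.ZipFile(path, "w") as zf:
    zf.writestr("recon.csv",
                "hostname,ip_address\n"
                "a.example.bd,103.0.0.1\n"
                "b.example.bd,103.0.0.2\n")

# pandas infers the zip compression from the .zip extension.
df = pd.read_csv(path)
```

The same archive can then be handed to Gigasheet as-is.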

As soon as the files become available, we can begin exploring their content. True to its no-code, data-science approach to threat hunting, Gigasheet automatically recognizes and parses fields such as IP addresses and other common IoCs, so the analyst wastes no time sorting and concatenating basic information. Moreover, Gigasheet can handle considerably large datasets (see above), an uncommon proposition in many server- or desktop-based deployments.

Introducing SecurityTrails SQL Explorer

So, with an initial focus on some of the more labor-intensive items Gigasheet helps with, we decided to put its interface to the test against a search performed using our very own SQL Explorer, pulling close to 400,000 records (shown below) for the country of Bangladesh in search of hostnames, IPs, exposed ports, and more.


Backed by the powerful SecurityTrails SQL, SQL Explorer raises the bar for searching through DNS and historical domain (WHOIS) data, passively identifying attack surface infrastructure that signals misconfigured endpoints or incorrect port assignments, and exploring certificate (SSL) associations and transparency.

Notably, SQL Explorer comes with a useful set of SSL reference properties to help bug bounty and threat hunters determine certain certificate conditions such as domain ownership and issuing organization, hashes, signing attributions, as well as different validation dates.

SSL reference properties

Integrated into SurfaceBrowser™, SQL Explorer distinguishes itself from similar domain-specific languages in that it's specifically tailored to work with our massive OSINT database and our flagship family of attack surface reduction products. Consequently, finding suitable passive intelligence to work with was simply a matter of leveraging the right SQL operators and, in a matter of seconds, we were ready to put Gigasheet to the test.

The first step involved uploading the query results we retrieved from SQL Explorer, in CSV (zipped) format, to Gigasheet, like so:

SQL query results

Notice right away that artifacts uploaded to Gigasheet's dashboard can be easily renamed using the Edit File Metadata feature:

Similar options allow you to delete and share uploaded content via email:

Share uploaded content

With the initial housekeeping chores out of the way, we can proceed to look at the data in detail. Clicking on the filename takes us to the sort of columnar arrangement we're used to working with, with a few critical differences: columns can be selected (and unselected), sorted, and filtered with great flexibility. For example, need to exclude certain hostnames based on a word or pattern? No problem: just create a filtering task with a "does not contain" condition and hit Apply. Here's an example that removes hostnames with the word gov in them:

Filtering tasks
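For readers who prefer the programmatic route mentioned earlier, the same "does not contain" filter can be sketched in pandas; the column names below are hypothetical stand-ins for the SQL Explorer export:

```python
import pandas as pd

# Made-up sample rows; a real export would carry many more columns.
df = pd.DataFrame({
    "hostname": ["mail.example.bd", "portal.gov.bd",
                 "www.example.bd", "egov.example.bd"],
    "ip_address": ["103.0.0.1", "103.0.0.2", "103.0.0.3", "103.0.0.4"],
})

# Keep only rows whose hostname does NOT contain the sequence "gov".
filtered = df[~df["hostname"].str.contains("gov", na=False)]
```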

Deduping specific values is as easy as applying the grouping functionality to a given column. But first, let's eliminate empty IP values from our table to declutter the view. Once again, we can accomplish this by adding a condition to the previous filter and setting the ip.address column to: is not empty.

eliminate empty IP
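The "is not empty" condition has a direct pandas analogue as well; again, the columns are illustrative:

```python
import pandas as pd

# Hypothetical rows; some records come back without a resolved IP.
df = pd.DataFrame({
    "hostname": ["a.example.bd", "b.example.bd", "c.example.bd"],
    "ip_address": ["103.0.0.1", None, ""],
})

# Drop rows where ip_address is missing (NaN/None) or an empty string,
# mirroring the "is not empty" filter condition.
non_empty = df[df["ip_address"].notna() & (df["ip_address"] != "")]
```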

We're now ready to group results—say, for example, based on hosting providers in descending order:

Grouping results
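The grouped view is roughly equivalent to a descending value count in pandas; the provider column here is a made-up stand-in for the hosting-provider field:

```python
import pandas as pd

# Hypothetical provider column from the recon export.
df = pd.DataFrame({
    "provider": ["AWS", "OVH", "AWS", "DigitalOcean", "AWS", "OVH"],
})

# Count rows per provider; value_counts sorts in descending order
# by default, matching the grouped view described above.
counts = df["provider"].value_counts()
```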

Now, security analysts and first responders need a lot more than the capacity to sort values or filter on specific fields. For this reason, Gigasheet includes an entire built-in threat intelligence enrichment service that provides contextualized IP-based analytics at no additional cost, drawing on a number of popular feeds.

Similar offerings include the ability to perform GeoIP lookups, or to access a vast amount of IoC and internet scanning data through prominent frameworks such as VirusTotal, Recorded Future, or GreyNoise; just bring your API key and let Gigasheet take care of the rest. In a field where context is king and event correlation an absolute necessity, these integrations will allow you to weed out noisy data from its more important counterparts, accelerating triage and considerably improving mean time to resolution (MTTR) in the process.

Remarkably, Gigasheet isn't bound by the limit of a little over one million rows baked into typical spreadsheet applications like Excel or OpenOffice Calc. This has a direct impact on the lifecycle of data analysis and illustrates an important point: the tool should never get in the way of the analyst. For this reason, Gigasheet was designed from the ground up to help you get to the data quickly and effectively, reducing false positives and other unnecessary distractions to a minimum.

Drawing on the previous use case, now it's time for us to examine in greater detail how Gigasheet's able to deal with some of the pain points associated with uncontextualized signals. This time, we'll be working with data collected using a research honeypot running the popular HoneyDB project—in all, a 5.75 GB corpus spanning 30 days' worth of low-interaction sensor activity—as we attempt to identify actionable intel.

uncontextualized signals

Once again, we begin by doing some cleaning up—getting rid of certain columns, or fields, and grouping others when applicable; the idea is to reduce the initial 66 columns to a number that is not only manageable but useful to our purposes.

Then, we proceed to identify unique hits on the honeypot by separating distinct remote IP connection attempts, and the targeted protocol(s), on all exposed ports, using the grouping option for a simplified view of the traffic:

Grouping option
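In pandas terms, this grouped view corresponds to counting sessions per distinct (remote IP, protocol, port) tuple; the sample below is an invented honeypot extract, not real sensor data:

```python
import pandas as pd

# Made-up honeypot events: remote IP, protocol, and targeted port.
events = pd.DataFrame({
    "remote_ip": ["1.2.3.4", "1.2.3.4", "5.6.7.8", "1.2.3.4", "5.6.7.8"],
    "protocol":  ["tcp",     "tcp",     "tcp",     "tcp",     "udp"],
    "port":      [22,        22,        22,        80,        53],
})

# One row per distinct (remote_ip, protocol, port), with a session
# count, approximating Gigasheet's grouped traffic view.
grouped = (events.groupby(["remote_ip", "protocol", "port"])
                 .size()
                 .reset_index(name="count")
                 .sort_values("count", ascending=False))
```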

At first glance, we can see who the top ten scanners are:

Top ten scanners

Counts are particularly attractive (and handy), indicating unique sessions per IP address. Let's give our list of remote IPs some additional context using the Enrich feature; this time, we'll choose GreyNoise to confirm the presence of scanners among our data:

Enrich feature

The results are as shown:

Enrich feature results

Next, we group the results by destination port, resulting in:

Group the results by destination port

Lastly, let's explore the Aggregation functionality by pivoting to the number of bytes sent per day. We can accomplish this by dragging the column in question (source/bytes) under the sigma symbol (shown below)—this effectively allows you to carry out a set of calculations (e.g., averages) across the entire dataset.

Entire dataset
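That sigma-style aggregation amounts to summing source/bytes grouped by day; here's a pandas sketch with invented timestamps and byte counts:

```python
import pandas as pd

# Hypothetical per-event byte counts with timestamps.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-09-09 10:00", "2021-09-10 02:00",
                                 "2021-09-10 14:30", "2021-09-11 08:00"]),
    "source_bytes": [120, 5000, 7000, 300],
})

# Sum bytes sent per calendar day, the same calculation as dragging
# source/bytes under the sigma symbol in Gigasheet.
per_day = df.groupby(df["timestamp"].dt.date)["source_bytes"].sum()
busiest = per_day.idxmax()  # the day with the most traffic
```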

As you can see below, September 10th was a particularly busy day for our honeypot!

Busy day

In pivot mode, we can actually visualize what these results look like:

Pivot mode

Sorting through billions of cells in a matter of seconds is one of Gigasheet's greatest strengths. Another is the ability to scale your analysis beyond grouping and filtering: developing timelines, correlating important log aspects such as event IDs and error codes, and even some casual packet analysis. As Gigasheet grows and matures, new possibilities will surely emerge.

Closing thoughts

The path forward is a rather simple one: big data will rule for the foreseeable future, although some key questions and challenges will inevitably remain. As of today, we think we understand the abstractions, the enrichment and association requirements, as well as the respective operations; and yet, for all intents and purposes, the expressive value of data somehow remains elusive regardless of how intuitive it all "feels".

That is why we need platforms like SecurityTrails SQL and Gigasheet to help us make sense of an ever-growing number of data sources, providing adequate context when needed while supplying security programs with a constant influx of actionable intelligence, visibility, and enhanced opportunities for both prevention and detection.

Give Gigasheet a try today (it's free!), as well as SecurityTrails SQL, and save precious time on your next project or investigation—you won't be disappointed.

Gianni Perez, Blog Author

Gianni is a technical writer at SecurityTrails and an adjunct college cybersecurity instructor with over two decades of infosec experience. He knows firsthand the demands security professionals face, and draws upon his knowledge of IT systems, from administration and software development to automation, to provide valuable security insights that make a real difference.
