SecurityTrails Blog · Jun 06 · by Gianni Perez

From Raw to Refined: Optimizing Data with API-driven Enrichment using Gigasheet

Reading time: 6 minutes

This past May, Gigasheet, the cloud-based, spreadsheet-like data consumption and analytics engine behind oodles of successful exploratory projects and use cases, set the stage for a new wave of API-driven enrichment features that will allow users to unlock a new level of data quality and relevance.

Drawing on past collaborative projects that underlined Gigasheet's role in big data crunching for threat-hunting and triaging purposes, the next logical step was to expand these capabilities by adding no-code custom enrichment opportunities, enabling even more comprehensive threat analysis. This adaptability would allow organizations to align Gigasheet with existing processes or workflows while fostering a repeatable and seamless approach to data augmentation.

Drawing on SecurityTrails' growing API ecosystem, this article examines two specific cases that show how Gigasheet extends API integration beyond its traditional purview. First, we'll briefly revisit the concept of data enrichment and highlight the advantages of no-code API calls. Let's jump right in.

Data Enrichment: A Primer

Raw data is hardly beneficial in isolation. As anyone who wrangles it daily can attest, data's real value lies in the insights it can provide and the actions it can inform. On its own, raw data is like scattered puzzle pieces waiting to be assembled into a meaningful picture; it becomes a powerful tool for decision-making and problem-solving only through careful enrichment, contextualization, and interpretation. Broadly defined, data enrichment is the process of enhancing existing data with additional information to improve its quality, accuracy, and usefulness.

In cybersecurity, a good analyst understands how to manipulate data to find what they need. But without contextualization, data manipulation alone can be misleading. Contextualization means understanding the overall circumstances, relationships, and consequences associated with the data under analysis, including the organizational environment, the particular threat landscape, past trends, and other relevant aspects.

API-based data enrichment offers a compelling value proposition by empowering businesses to enhance or refine their data with external sources and services. However, API-based enrichment comes with some complicating factors. For example, working with an API typically entails coding or scripting, which may be unfamiliar territory for non-developers. Other challenges include understanding concepts like endpoints, authentication, request/response formats, and error handling.
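To make those challenges concrete, here is a minimal sketch of what a hand-written enrichment loop might look like. Everything in it is an assumption made for illustration: `enrich_rows` and `fake_lookup` are hypothetical names, and `fake_lookup` stands in for a real HTTP call so the example runs offline with invented data.

```python
from typing import Callable, Dict, Iterable, List


def enrich_rows(
    rows: Iterable[Dict[str, str]],
    key_column: str,
    fetch: Callable[[str], Dict[str, object]],
) -> List[Dict[str, object]]:
    """Return a copy of each row extended with whatever fields fetch() returns."""
    enriched = []
    for row in rows:
        new_row: Dict[str, object] = dict(row)  # leave the input row untouched
        try:
            new_row.update(fetch(row[key_column]))
            new_row["enrichment_error"] = None
        except Exception as exc:
            # One failed lookup should not abort the whole batch.
            new_row["enrichment_error"] = str(exc)
        enriched.append(new_row)
    return enriched


def fake_lookup(domain: str) -> Dict[str, object]:
    """Hypothetical stand-in for an API request; the returned data is invented."""
    if domain == "bad.invalid":
        raise ValueError("lookup failed")
    return {"issuer": "Example CA"}


rows = [{"domain": "example.com"}, {"domain": "bad.invalid"}]
result = enrich_rows(rows, "domain", fake_lookup)
```

Even this toy version has to decide how to pass the lookup key, how to merge the response into the row, and what to do when a request fails. Those are the decisions a no-code tool makes on the user's behalf.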

Unleashing Custom Enrichments

Gigasheet's intuitive appeal isn't surprising. After all, what's not to like about a product that lets you explore even the most extensive datasets without writing a line of code or wrestling with messy formulae? It's all there for you, at a scale of billions of rows.

Recently, no-code tooling has advanced to the point where non-technical users can create robust integrations, define API endpoints, specify data formats, and connect various applications and services in just a few clicks. What better way, then, to test Gigasheet's new custom enrichment feature than to integrate the SecurityTrails API and show the power and flexibility of the two tools combined?

We'll first introduce a typical use case: fetching a domain's current and historical certificate information. SecurityTrails provides extensive internet data coverage, offering historical and real-time passive DNS, domain, and IP intelligence information—this includes detailed data on domain names, subdomains, IP addresses, WHOIS records, user agents, SSL/TLS certificates, and more.

Let’s take the case of SSL certificates, often examined during bug bounty hunting and pentesting to identify vulnerabilities related to their implementation, configuration, or weak cryptographic primitives. We’ll begin by uploading a text file to Gigasheet; the “File Upload” function supports multiple file types, including CSV, EVTX, JSON, and LOG, as well as a variety of source options:

Gigasheet, the File Upload function

In a matter of seconds, our file is ready to be analyzed. Although we only deal with one domain (one row) for demonstration purposes, Gigasheet can handle billions of rows and can even unpack zipped archives automatically, uploading their contents to your library:

automatically unpack zipped archives

Let’s start the custom enrichment process by selecting Functions → Enrichments:

Custom enrichment process Data enrichments

After selecting “Custom Enrichment” above, we’re ready to paste our API call, including any necessary tokens or API keys. The cURL request can be copied directly from the section of the SecurityTrails API documentation dedicated to SSL certificates and pasted into the Gigasheet Custom Enrichment feature like so:

Custom Enrichment - cURL

Next, we highlight the portion of the URL we want to substitute with our column data—in this case, “domain_name” will be replaced by the value(s) contained in the “domain” column for each row in our sheet:

Multiple API requests

Immediately, you’ll notice that the column name surrounded by asterisks now replaces your field:

Method: GET
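Conceptually, the column reference turns one request into a template that is filled in once per row. The sketch below illustrates that idea only; it is not Gigasheet's implementation. The `*column*` placeholder syntax mirrors what the screen shows, while the URL path and the helper name `fill_template` are assumptions and should be checked against the SecurityTrails API reference. No request is sent, and the API key is left out on purpose.

```python
import re
from typing import Dict
from urllib.parse import quote

# Matches a column reference written as *column_name* (assumed syntax).
PLACEHOLDER = re.compile(r"\*([A-Za-z0-9_ ]+)\*")


def fill_template(template: str, row: Dict[str, str]) -> str:
    """Replace every *column* reference with that row's URL-encoded value."""

    def substitute(match: re.Match) -> str:
        column = match.group(1)
        # A missing column raises KeyError, which surfaces typos early.
        return quote(row[column], safe="")

    return PLACEHOLDER.sub(substitute, template)


# Illustrative URL shape; the exact path is an assumption.
template = "https://api.securitytrails.com/v1/domain/*domain*/ssl"
rows = [{"domain": "example.com"}, {"domain": "example.org"}]
urls = [fill_template(template, row) for row in rows]
```

URL-encoding each value keeps a stray slash or space in the sheet from changing the shape of the request.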

We’re ready to move on. You will be presented with the results of the API call, where you can select individual columns for display—in this case, we’re only interested in the “Issuer” and “Expiration Date”:

API call results
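Choosing columns on this screen amounts to picking a few fields out of a nested JSON response. A rough sketch of that selection follows. The response shape, the field names (`issuer.common_name`, `not_after`), and the helper `pick` are all invented for illustration and may not match what the API actually returns.

```python
from typing import Any, Dict


def pick(record: Dict[str, Any], dotted_path: str) -> Any:
    """Follow a dotted path such as 'issuer.common_name'; return None if absent."""
    current: Any = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Invented response shape, for illustration only.
sample_response = {
    "certificates": [
        {
            "issuer": {"common_name": "Example CA"},
            "not_after": 1700000000,
            "serial": "01",
        },
    ]
}

# Display label -> path inside each certificate record (both hypothetical).
wanted = {"Issuer": "issuer.common_name", "Expiration Date": "not_after"}

columns = [
    {label: pick(cert, path) for label, path in wanted.items()}
    for cert in sample_response["certificates"]
]
```

Fields that are not listed in `wanted`, such as `serial` here, simply never become columns.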

We finally click “Apply” and get our results:

Custom enrichment results

With a bit of cleaning up and the help of Gigasheet’s native “Cleanup Unix Time” function, we get what we’re after:

Cleanup Unix Time
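For readers curious about what such a cleanup involves, a Unix timestamp counts seconds (or, in some APIs, milliseconds) since 1 January 1970 UTC. The sketch below shows one way to convert it in Python. It is not Gigasheet's implementation, and `unix_to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def unix_to_utc(value: float, unit: str = "s") -> str:
    """Format a Unix timestamp given in seconds ('s') or milliseconds ('ms')."""
    if unit == "ms":
        value = value / 1000
    elif unit != "s":
        raise ValueError("unit must be 's' or 'ms'")
    moment = datetime.fromtimestamp(value, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


print(unix_to_utc(1700000000))                # 2023-11-14 22:13:20 UTC
print(unix_to_utc(1700000000000, unit="ms"))  # 2023-11-14 22:13:20 UTC
```

Passing the unit explicitly avoids guessing whether a large number is in seconds or milliseconds, a common source of dates that land in 1970 or tens of thousands of years in the future.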

Next, let’s look at the Domain Specific Language (DSL) API. SecurityTrails’ DSL lets users write versatile, intricate queries across massive amounts of domain data and get results quickly.

This time, our input file contains a list of IPs, and we want to determine how many hostnames are linked to each one. The custom enrichment screen looks as follows:

Domain Specific Language (DSL)

We’ll pivot on the “ip” field to link to the values of each row in our file containing the IPs in question. After inserting the column reference, our placeholder now reflects the following:

IP pivot

Pressing “Test” takes us to the final screen, where we pick the columns we want to keep. In this case, we’re only interested in hostname counts:

Hostname counts

Finally, we quickly rename our “hostname_count” column using the built-in Rename feature, and our results look as follows:

Rename feature
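The same walkthrough can be sketched in code: build one query body per IP, keep only the count from the response, and store it under a friendlier column name. The query syntax, the response shape, the new column name "Hostnames", and both helper names are assumptions made for illustration; consult the SecurityTrails DSL documentation for the real format. No request is sent.

```python
import ipaddress
import json
from typing import Any, Dict


def build_query_body(ip: str) -> str:
    """Return a JSON body asking for records tied to one IPv4 address."""
    # Validating first keeps malformed cell values out of the query string.
    address = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    return json.dumps({"query": f"ipv4 = '{address}'"})  # assumed DSL syntax


def to_row(ip: str, response: Dict[str, Any], new_name: str = "Hostnames") -> Dict[str, Any]:
    """Keep only the hostname count and store it under a renamed column."""
    return {"ip": ip, new_name: response.get("hostname_count")}


body = build_query_body("192.0.2.10")

# Invented response, for illustration only.
row = to_row("192.0.2.10", {"hostname_count": 3, "domain_count": 2})
```

With these made-up values, `row` ends up with just two columns: the original IP and the renamed count.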

In closing

We’ve titled this post “From Raw to Refined” to reflect the importance of data enrichment and the challenges it poses for cyber practitioners and researchers who handle an ever-increasing number of disparate log sources, or simply big data in need of contextualization.

Our collaboration addresses these hurdles head-on, providing researchers with a cutting-edge toolkit that empowers them to navigate the vast landscape of log data effortlessly. Moreover, this effort caters to researchers of all technical backgrounds. Don't miss out on this transformative partnership—it's time to unlock the true potential of data enrichment with SecurityTrails and Gigasheet.

Gianni Perez Blog Author

Gianni is a technical writer at SecurityTrails and an adjunct college cybersecurity instructor with over two decades of infosec experience. He knows firsthand the demands security professionals face, and draws upon his knowledge of IT systems, from administration and software development to automation, to provide valuable security insights that make a real difference.
