#' --- #' title: "Working with the PacketTotal API in R" #' author: "" #' date: "" #' output: #' html_document: #' keep_md: true #' theme: simplex #' highlight: monochrome #' --- #+ init, include=FALSE knitr::opts_chunk$set(message = FALSE, warning = FALSE, dev="png", collapse = TRUE, fig.retina = 2, fig.width = 10, fig.height = 6) #+ begin #' The crazy/kind folks over at [PacketTotal](https://packettotal.com/) were #' generoue enough to slip me an [API key](https://packettotal.com/api.html), and #' long-time readers of the blog knows what that means: a new [package](https://cinc.rud.is/web/packages/packettotal/)! #' #' ### What is PacketTotal? #' #' If you have a non-compliance-focused job in information security chances are you #' will have come across or had the need to generate [packet captures](https://en.wikipedia.org/wiki/Pcap) #' of network traffic to chase down a situation. PacketTotal seems to be aiming to #' aggregate and socialize the analysis of packet captures in similar fashion to #' what [VirusTotal](https://www.virustotal.com/) does to files/binaries. #' #' PCAPs are a bit trickier than what VirusTotal handles since they may contain #' sensitive organizational data — at the very least private addressing schemes #' — but, I suspect they're working on some sanitization tools to make it easier #' to do that and are also doing a decent job at ensuring they're not logging the IP address #' (or any other identifying data) of the uploader. #' #' Their [online exploratory interface](https://packettotal.com/app/search?q=) is fairly #' robust but by providing an API they make it possible for one to go beyond such #' an interface and enhance a dynamic investigation on-the-fly while keeping a record of #' analysis flow and artifacts. #' #' We won't be doing that in this post since it is just an introductory "this is how #' the site/package works" post but once they round out some corners we may delve into a #' full (faux) investigation and perhaps write our own investigations UX with Shiny. #' #' Onwards! #' #' ### Using the PacketTotal API #' #' I kept the dependencies pretty thin so the extra `library()` calls I'm putting in here #' are mostly for analysis & visualization support. Let's get them out of the way: #+ libs library(zip) library(DT) library(packettotal) library(lubridate) library(hrbrthemes) library(tidyverse) #' Now, let's look for [Emotet](https://www.us-cert.gov/ncas/alerts/TA18-201A), #' which is a nasty piece of malware your organization has likely been hit with multiple #' times by now. To do that, we need to do issue a query on the "deep search" #' endpoint: es <- pt_deep_search("emotet") #' Now, we get thos results and take a look: emo_res <- pt_get_search_results(es) head(emo_res$results, 10) #' Let's get even more detail: emo_det <- pt_detail("5b4eb1fc54db6761bb42385d1ac52b8a") #' and, see what's in the summary: str(emo_det$analysis_summary, 1) #' Who are the top talkers (the IP addresses with the most connections)? str(emo_det$analysis_summary$top_talkers) #' Let's use [ipinfo.io](https://ipinfo.io/) to see some extra detail on that main one: ip_5.187.0.158 <- ipinfo::query_ip("5.187.0.158") str(ip_5.187.0.158) #' We can also lookup various stats (these JSON strings are going to be real #' percentages soon from the API): str(emo_det$analysis_summary$dns_statistics) str(emo_det$analysis_summary$file_statistics) #' So, we get FQDNs, files, DNS queries and more. We can also just get #' every bit of data PacketTotal could squeeze out of the PCAP by downloading #' an "analysis" archive: dl <- pt_download("5b4eb1fc54db6761bb42385d1ac52b8a", dl_dir = "~/Data") #' We'll unpack it and take a look: unzip(dl, exdir = "~/Data/5b4eb1fc54db6761bb42385d1ac52b8a") list.files("~/Data/5b4eb1fc54db6761bb42385d1ac52b8a") #' We won't explore all of these in this post but `conn.csv` is the Zeek #' (formerly, ugh, 'Bro' — which was short for 'Big Brother' b/c it was #' snooping on your packets, but still…) connection logs. That's something #' I'm super familiar with given that we generate tens of thousands of them every #' day at $WORK in our massive honeypot network, so let's poke at it: read_csv("~/Data/5b4eb1fc54db6761bb42385d1ac52b8a/conn.csv", na = c("null", "")) %>% janitor::clean_names() -> conns glimpse(conns) #' (They're also fixing the un-friendly-for-data science column names.) #' #' Lots of info about the connections, and we can make our own exploratory #' interface for them pretty easily: DT::datatable(conns) #' But, we can also attack it with the tidyverse: count(conns, target_port, service, sort=TRUE) count(conns, sender_ip, sort=TRUE) count(conns, target_ip, sort=TRUE) mutate(conns, sec = floor_date(timestamp, "minute")) %>% count(sec, transport_protocol) %>% ggplot(aes(sec, n)) + geom_line() + facet_wrap(~transport_protocol) + labs(title = "Total Connections-per-minute by Protocol") + theme_ft_rc(grid="XY") select(conns, payload_bytes_sent, payload_bytes_received) %>% gather(measure, value) %>% mutate(value = as.numeric(value)) %>% ggplot(aes(value)) + ggalt::geom_bkde(fill = alpha(ft_cols$gray, 1/3)) + scale_x_log10(label=scales::comma) + labs(title = "Payload metadata distributions", subtitle = "Note: Log10 Scale") + facet_wrap(~measure) + theme_ft_rc(grid="XY") #' We can even see any threat inteligence they were able to enrich the #' data with: read_csv("~/Data/5b4eb1fc54db6761bb42385d1ac52b8a/intel.csv", na = c("null", "")) %>% janitor::clean_names() %>% DT::datatable() #' We can also look for similar PCAPs: sim <- pt_similar("5b4eb1fc54db6761bb42385d1ac52b8a") str(sim$similar$results, 1) #' This is where the power of the API would really come in handy as we #' collect all this information and start to look for correlations, #' time series patterns (or anomalies) and possibly extract features #' to help build models to detect various types of malicious traffic. #' #' ### FIN #' #' Visit the [package page](https://cinc.rud.is/web/packages/packettotal/) for information #' on how to install it and you can find it on [SourceHut](https://git.sr.ht/~hrbrmstr/packettotal), #' [GitLab](https://gitlab.com/hrbrmstr/packettotal) or (ugh) [GitHub](https://github.com/hrbrmstr/packettotal). #' #' Keep watching their service/API since it's only going to get even better and #' definitely toss up suggestions for package features or jump on in and file some #' PRs at your social coding hub of choice.