Maintaining a well-organized aggregation of cyber threat intelligence (CTI) is a critical step to effective threat detection, but once you have that CTI, what do you do with it, especially when it comes to threat hunting?
As a satellite ISP providing internet connections to high-level customers in remote locations, availability and confidentiality are critical service components Viasat strives to protect. To do this, we've broken away from the traditional analyst-driven security mold to integrate engineers, data scientists, intelligence professionals, and incident responders into our security operations center (SOC). The Viasat team has developed methodologies they would like to share with the CTI community that allow them to hunt for threats in real time, combining behavioral analytics with network activity characterization.
In this on-demand session Jessica O’Bryan, Cyber Threat Intelligence & Threat Hunt Development Lead at Viasat will share how to equip your network security teams with intelligence-driven knowledge to improve their ability to hunt evil.
JESSICA O'BRYAN: We're going to start with the introduction of who we are and then we'll get into why you guys are here.
My name is Jessica O'Bryan.
How many of you have seen Silence of the Lambs?
So I really wanted to be Clarice.
I wanted to be this FBI agent that's studying psychopaths.
So I got my bachelors degree in psychology and I started working for the FBI.
And I realized, oh my gosh, it's not like the movies.
So I started looking around within the FBI trying to figure out what I was actually into.
And the regional computer forensics laboratory was fascinating.
It's like, oh my gosh, this is so cool.
So I got my master's degree in computer science, started working in there, transitioned to DIA.
Anyone here familiar with DIA?
Defense Intelligence Agency.
They kind of do predictive intelligence for the Pentagon.
And then transitioned to NCIS, which my friend gave me this gif.
Apparently everyone thinks two people on one keyboard at all times if you work for NCIS.
I thought that was funny.
I haven't even seen the TV show.
But I've worked cyber threat intelligence for a little over 13 years now.
I have my GPEN, just got my CTI and my CISSP, because you have to have that.
And I also love surfing and rock climbing.
Any surfers or rock climbers in here?
I feel like not a lot of nerds like these sports.
WILLIAM MATTULL: East coast, yeah.
JESSICA O'BRYAN: We live in California.
AUDIENCE: [INAUDIBLE]
JESSICA O'BRYAN: Hey, that's cool.
That works too, that's awesome.
All right.
WILLIAM MATTULL: Hi, I'm Bill.
I don't have a very interesting story like Jess, so I guess I'm a bit boring.
So I studied a lot in math, in particular working with algorithms.
Like Jess I have a CISSP, like Jess I like to surf.
And this is some pictures of my family there doing various things.
Jess is probably laughing at the size of the surfboard I ride because it's very old school.
So how about any proper football fans, soccer, any of those?
JESSICA O'BRYAN: Nice.
WILLIAM MATTULL: One, all right.
Champions League going on.
That's pretty much it.
JESSICA O'BRYAN: Cool.
So we work for Viasat.
Has anyone heard of Viasat?
OK, a couple of you have, great.
Viasat is a global satellite telecom and ISP.
And it's pretty neat when it comes to network security because we see all kinds of stuff.
We have a million customers on the ground.
And we have one and a half million customers in the air.
So we provide internet to people in the middle of nowhere.
From your dude ranch in the middle of Wyoming to special forces anywhere.
We also do United Airlines, American Airlines, Qantas, a lot of airlines, and also the executive fleet.
Our co-worker Lee likes to say we provide internet from resident to the president, because we actually provide internet to Air Force One and the rest of the executive fleet.
This is kind of a picture of what our coverage is now in green, and what it will be when our next group of satellites launch in the next couple years.
But it's really exciting.
We process 30 terabytes of traffic a day.
That's a lot.
We see about 300 attacks a day.
And we have DDoS attacks that range from two to 86 gigabits per second, so that's pretty big.
All right, so you guys are here for this, right?
So you have threat intelligence.
Really think about that.
You guys have threat intelligence.
What can you do with it?
Think about how you are currently using it and how you could be using it.
It's more than likely that the methods that you're using fall within one of two buckets.
Either reactive or proactive.
How many of you have a threat intelligence platform, like Anomali?
OK, good, that's great.
How many of you are feeding your threat intelligence platform into your SIEM or security logs, and you have analysts alerting off of that threat intelligence?
Some of you, OK, good.
This is a reactive approach.
And this is great.
This is a really important baseline.
Every organization should be doing this, at a minimum.
But you will find, and I'm sure some of you found this, you start playing whack-a-mole.
Remember whack-a-mole when you were a kid?
Super fun back then, and apparently cats like it too.
But whack-a-mole gets old.
An alert pops up and you block that IP.
An alert pops up again and you block that, and it just gets exhausting.
So we want to equip you guys with a proactive method for using your threat intelligence.
We want you guys to be able to play chess with your adversary so that's what we hope to do today.
So why be proactive?
There's a lot of reasons.
There's a couple here.
You can really better understand the APTs and malware families that you're seeing on your networks.
You can better understand bot nets, which are a huge problem for us as an ISP.
And you can actually develop novel threat intelligence, which is really exciting.
It's kind of the tip of the spear.
You're not just collecting Intel, now you're creating it.
So this wouldn't be a threat Intel presentation if I didn't have David Bianco's Pyramid of Pain.
How many of you have seen this before?
I even have a sticker.
You see it all the time.
But the point here is that you really want to focus on tactics, techniques, and procedures.
We say behaviors throughout this presentation.
So I want to really state that that's pretty important when it comes to being proactive.
So what are we going to show you?
All right, so we're going to start with indicators of compromise.
We're going to start with indicators that we have seen tied to a particular group, OK?
And we're going to put them into a notebook.
And Bill is a Jedi Master data scientist who builds these killer notebooks for us.
The notebook is like the central point of this presentation.
So we take the indicators [INAUDIBLE] we put them into a notebook that we've built out, and then we run the notebook against NetFlow.
For you, this can be any logs that you have in any format.
And then our initial output is novel behavioral intelligence, which according to the Pyramid of Pain, is pretty important.
These are your TTPs.
And then we take those behaviors that we're finding and we run them against the NetFlow again.
And then our final output is novel IOCs.
So we're hoping to walk you through this process at a very high level and then you guys can come talk to us after to get a little more guidance if you need it.
All right, so to do anything, there are requirements.
So the main requirements you need in order to do this are intelligence, team, data, notebook platform.
None of you guys are laughing at my cat gif.
You have to have cat pictures in all Intel presentations, it's a rule.
All right, so intelligence, I'm not going to dive into this.
You guys are at an Intel conference.
So I'm assuming you either have intelligence or you're planning on getting some.
You need a team.
Sure, you can have a team of analysts that all respond to alerts, that's great.
But the more diverse your team is, the better.
So I really recommend breaking your team up into lines of operation.
That doesn't mean these are silos.
We all work together.
Bill and I work together every day.
So one team we have is the cyber detection and response team, and this is your incident responders.
These guys are really responding to the alerts.
But they also are executing the threat hunts, they're doing some forensics as well.
Then you have my team, cyber threat intelligence group.
We're developing partnerships, because the more Intel you have, the better.
We are developing threat models for new networks that we're taking in.
And we're developing threat hunts.
And of course, we're managing the threat intelligence also, right.
Then we have the cyber forensics investigations team.
And these guys are responding to major incidents.
And then we have the cyber infrastructure engineering team.
These guys are taking all of those tools, so all of those cool tools that those vendors have out there.
They're making sure everything's integrated seamlessly.
Then you have your dev team.
This is your dream team.
So if any of us within these other LOOs think of a really good idea, we bring it to the dev team and we say, can you make this magic happen?
Can you automate this?
So that's great because all of us who are operational, we can keep moving while our dev team gets stuff done.
Some of you guys are really familiar with the DevOps process.
Some of you may not be.
Then we've got CSA.
This is where Bill comes in, and these guys are doing statistical modeling, behavioral analysis.
They're basically applying math to these major problems that we have.
And again, we're all working together.
I'd also recommend having a threat hunting team.
This is going to include everyone-- people from each of the LOOs, the lines of operation from the previous slide, but also your red team and your IT sysadmins, because they really understand your network.
All right, the other requirement is data.
You can't just have threat Intel, you have to be able to run it against something.
So threat intelligence is-- sorry.
Threat intelligence is useless without data.
So this is your logs: firewall logs, AV logs, HIDS, IPSs.
You guys know you need this if you want to state the obvious.
Couple of things to think about.
We learned this the hard way.
It takes longer than you think to take raw data and make it useful for extraction.
So make sure you allot time for that if you decide to implement this process we're going to show you.
Also, take into consideration data privacy.
PII and maybe working with classified information.
Make sure you think about these things.
Finally, you really need a notebook analysis platform.
Anaconda Python environment is great.
It's got easy updates, it's easy for replication.
This is a dynamic process, so these notebooks are going to be constantly changing, because you're going to be learning new things about how adversaries and malware work and you're going to be changing them.
So Jupyter Notebooks is good, Google Colaboratory.
There are COTS tools that do what we're going to show you.
But this is free and there's more flexibility.
And then you need a lot of hardware.
You're going to need large disk space and RAM.
All right, so how did we start this process?
It started with a hunt.
We were trying to figure out how to do a better job threat hunting at a deeper level.
And so what we've learned is that once you have these really cool behavioral analytics capabilities, you might want to apply it to all the things.
And that's really overwhelming.
So we found that threat hunts are a great starting focus point for using our behavioral analytics capabilities.
Raise your hand if you know what threat hunting is?
Pretty much everyone.
In case you don't know, it's a method of proactively and iteratively looking for bad guys on your network, especially the ones that are trying to hide.
But a lot of people don't know this.
They think threat hunting is just to find the bad guy on the network.
It's also to help map out your gaps in visibility.
And threat hunting really helps do that if you have emulation as part of your process.
So I thought I'd walk you through our threat hunting framework quickly.
Our first step is hypothesis phase.
We need to build a hypothesis.
So say you think TrickBot is on your network.
That's your hypothesis, you can be more fine grained than that if you want.
Then you develop a plan, a plan of action.
So you're going to start scraping threat Intel for everything out there that's available on TrickBot.
And you're also going to start building out an emulation plan.
And you're going to build out a blue team checklist, a way to be able to look for TrickBot.
Then you're going to emulate.
You're going to work with your red team to pretend like you're TrickBot and execute.
The reason for this is that we're making an assumption that TrickBot exists, right?
But if it doesn't, and at the end of this hunt we say we didn't find TrickBot, how do we really know we could have found it if it was there?
So if you emulate, using your red team, you can validate your visibility.
If you know they did something and you didn't see it, then that's an area for improvement.
Then you hunt.
This is the fun part.
So you have your analysts looking for everything based off the development phase and execution of emulation.
Then you analyze and report.
And finally, you automate whatever you can, so you're constantly looking for TrickBot, not just once.
So the process that we're going to talk you through fits within these first two buckets, so those are the only ones we're going to cover.
So I don't know if you guys noticed, but this is a lot like the scientific method.
So the hypothesis phase, you have to remember, it must be observable and testable.
You don't want to pick a hypothesis that you can't test, that you can't see on your network.
Pretty obvious, but sometimes you learn this the hard way.
So you want to look for trends on your network and what's going on in threat Intel reporting.
This is Silobreaker.
Anyone heard of Silobreaker?
They actually have some good Intel.
I like their trend reports.
So we're going to give you an example for walking you through the process.
Our hypothesis is that APT 33 is likely targeting our government customers.
Who here knows who APT 33 is?
What country are they?
It's on the slide.
So Iran's pretty popular right now, right?
There's a lot going on.
And you guys know this as Intel analysts.
You see stuff going on in politics, you know that there's going to be some cyber Intel reporting.
If there's political news about Iran or North Korea or any other of the major countries, we're watching.
So we saw the external reporting.
But our infrastructure team was also seeing a lot of activity tied to Iran, and we were also seeing a lot of use of remote access Trojans that Iran commonly uses.
That doesn't mean that that's APT 33, but njRAT and a few others are tied to that group.
So we thought OK, we've got a few data points here to go off of.
We should run a threat hunt.
Then the development phase happens.
And this is where the notebook comes in.
This is where we collect, enrich, and analyze the data.
I have the MITRE ATT&CK Framework here.
We use that for overlap to check our visibility related to how APTs are operating.
So recap, again, here's what we're going to walk you through.
Start with IOCs, dump them into the notebook, run it against NetFlow, first output is behavioral Intel.
Run it again against the NetFlow and final output is novel IOCs.
Here's a better breakdown of the steps we're going to bring you through.
First we gather the IOCs, then we vet and pivot.
Find meaningful analytics, study the communications of the adversary.
Then we're going to do some trend analysis, which is the main piece.
Then we get our novel behaviors, IOCs and automation.
All right, Bill?
WILLIAM MATTULL: My turn?
So I'll talk about some of these steps.
JESSICA O'BRYAN: Oh, really quick, we're kind of going to be going back and forth.
He's going to talk you through each piece of the notebook and I'm going to talk here and there about how he applied it to the APT 33 threat hunt.
WILLIAM MATTULL: Yes.
So just a couple of notes, thank you.
The data we're going to look at comes from our residential network, which is an ISP.
So it's pretty open stuff.
So you might see some like ports and things of that nature where you're like, well, logically I would block that on an enterprise network.
That's not necessarily the case on a residential network.
Also, we're going to review some network behavior like from the NetFlow level.
We didn't actually look at the APT malware previously.
So we're just learning as we go as we look at the behavioral thing, and you'll see what I'm talking about.
OK, so the first step is obviously you need to gather some IOCs.
The reason why you're doing this is ultimately you're going to take the IOCs and then cross-reference those with your NetFlows.
You may get a lot of IOCs or you may get only a few.
If you have a small number, you can go through each one of those and check for false positives or relevancy to the problem you're trying to solve.
If you have a lot, which can happen, you might just have to rely on some of the trends in your network.
And we'll see that in a second.
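A minimal sketch of the cross-referencing step described here, in plain Python. The flow records, the field names (`src_ip`, `dst_ip`, `dst_port`, `bytes`), and every address are illustrative assumptions, not Viasat's schema or real indicators.

```python
# Hypothetical flow records; field names are assumptions for illustration.
flows = [
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.7", "dst_port": 443, "bytes": 138},
    {"src_ip": "10.0.0.6", "dst_ip": "198.51.100.9", "dst_port": 80, "bytes": 512},
    {"src_ip": "203.0.113.7", "dst_ip": "10.0.0.5", "dst_port": 51000, "bytes": 396},
]

# IOC IPs as they might come back from a threat intelligence platform.
ioc_ips = {"203.0.113.7", "192.0.2.44"}


def flows_touching_iocs(flows, ioc_ips):
    """Keep only flows where either endpoint is a known IOC."""
    return [f for f in flows if f["src_ip"] in ioc_ips or f["dst_ip"] in ioc_ips]


matched = flows_touching_iocs(flows, ioc_ips)
```

With a small IOC list, each matched flow can be reviewed by hand; with a large one, the matched set is what the later trend analysis would run on.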
JESSICA O'BRYAN: Yeah, so here you can see, this is part of the notebook.
He built this GUI for us where we can plug-in what we're interested in and pull it from our threat intelligence platform.
So you can integrate this with Anomali or any other threat intelligence platform you have.
You would put in adversary or malware family.
And then there's a couple other contextual pieces that are associated with our TIP.
And then you type your value in that you're looking for.
And then it extracts everything that you want.
So here we're looking for IPs that are specific to APT 33.
And as you can see, there's 53 that were found.
And that's what we're running this trend analysis on, that's what we're running the notebook against.
OK, so this is the vetting-- this goes back to how he was saying, if you have a small number, take time to vet.
If you have a large number, then you can't.
And we'll show you a little bit of how you can pick and choose what you want to vet with later.
You guys should all know how to vet your threat Intel.
But one thing I'd just like to point out is ASN research is very helpful.
There's some great open source tools out there for free and paid.
This one is Seclytics, and it's pretty neat.
So the black dot is where Seclytics predicted that a particular IP tied to APT 33 was malicious.
They said hey, we think this IP is malicious.
And the yellow and the red-- thanks, Bill-- are when it actually started conducting malicious behavior.
So there's a lot of predictive analytics out there that you can leverage to use to find additional IOCs that you can be running research on.
WILLIAM MATTULL: Seclytics is pretty useful.
But disclosure, Viasat does have an investment in that company.
JESSICA O'BRYAN: We liked it so much.
We bought a sliver of it.
WILLIAM MATTULL: We bought into it.
Another thing that you might discover at this step of the process is that APTs generally have a provisioning process.
And it's possible to discover patterns in that process.
Perhaps like Jess was saying with the ASN research.
They may have like a favorite cloud provider or a favorite country that tends to be not so up on their patches.
So they tend to compromise those types of things.
But you will see a lot of the cloud providers in this process, simply because they give free trials away and stuff like that.
So they take advantage of that.
OK, so the next step is we're going to create meaningful analytics based on your problem.
So there's generally two types.
So we have summary analytics and then we develop analytics for similarity and detection.
So summary analytics is basically you're counting a lot of things.
This is things like how many IP addresses on your network are contacting your IOCs.
What's good about this step is that you're going to use that to rank risk.
So perhaps you come back with, hey, nobody's talking to these IPs.
OK, so we're done.
Or perhaps you may be running several of these APTs simultaneously, and you can rank them to identify a starting point for your activities.
It also helps to plot things.
You'll see that in some other slides, where you can get a sense of these counts and impact, visually.
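The counting-and-ranking idea behind summary analytics might look something like the sketch below. It assumes the flows have already been reduced to (client, IOC) pairs; the pairs and addresses are synthetic, and `rank_iocs` is a hypothetical helper name.

```python
from collections import defaultdict

# Synthetic (client, IOC) contacts; all addresses are placeholders.
contacts = [
    ("10.0.0.5", "203.0.113.7"),
    ("10.0.0.6", "203.0.113.7"),
    ("10.0.0.7", "203.0.113.7"),
    ("10.0.0.5", "192.0.2.44"),
    ("10.0.0.5", "192.0.2.44"),  # repeat contact from the same client
]


def rank_iocs(contacts):
    """Rank IOC IPs by how many distinct clients contacted them."""
    clients = defaultdict(set)
    for client_ip, ioc_ip in contacts:
        clients[ioc_ip].add(client_ip)
    counts = [(ioc_ip, len(ips)) for ioc_ip, ips in clients.items()]
    # Most-contacted first; ties broken by address for a stable order.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


ranking = rank_iocs(contacts)
```

An empty ranking corresponds to the "nobody's talking to these IPs" outcome; otherwise the top entries are the natural place to start vetting.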
Let's see, anything else?
So and the other big thing is all along the way, you're trying to eliminate false positives.
This is the bane of our existence.
The first time you go through this process, don't freak out if you get a lot of IPs that look like they're bad.
You probably have a significant number of false positives.
And as you go through this process, you'll start to whittle those down because you'll learn from them.
And that's just part of the process.
With the similarity and detection analytics, essentially what you're doing is you're downloading your NetFlows, or your logs, and then you're getting them into some useful type of format.
So you basically put those into a table.
These would be the columns, and then you go from there.
But that's basically pretty much the starting point for what Jess was saying on how difficult it can be to organize your data this way.
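As a rough illustration of getting logs "into a table", the sketch below parses a small CSV export into rows with typed columns. The column names and the CSV layout are assumptions; real NetFlow exports vary by collector and usually need far more cleanup than this.

```python
import csv
import io

# A tiny stand-in for an exported NetFlow file; the column names are assumptions.
RAW = """ts,src_ip,dst_ip,src_port,dst_port,protocol,bytes
2019-09-21T06:00:00,10.0.0.5,203.0.113.7,55001,443,TCP,138
2019-09-21T06:00:05,10.0.0.5,203.0.113.7,0,0,ICMP,82
"""

INT_COLUMNS = ("src_port", "dst_port", "bytes")


def load_flows(text):
    """Parse CSV text into a list of dict rows with numeric columns as ints."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for col in INT_COLUMNS:
            row[col] = int(row[col])
        rows.append(row)
    return rows


table = load_flows(RAW)
```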
JESSICA O'BRYAN: Yeah.
One thing I want to point out is when you're looking at the variables that you are able to do trend analysis on, as an Intel analyst, you want to scrape open-source reporting on that.
Really understand what open-source reporting is-- what threat Intel reporting is saying.
How does APT 33 operate, based on source port, destination port, and byte size information?
WILLIAM MATTULL: OK, so visualization really helps.
And so developing visualizations like this, which is a network diagram, which I'm sure you guys have all seen, or network graph.
You can learn some important things from these graphs.
So what you're seeing here is the green dots, if you can see them.
Those are clients.
Blues are the C2s, if you will.
And then this big giant red blob is the C2, which most of the clients are connecting to.
Doing this type of visualization is really helpful because you'll see patterns that you may not pick up on.
Not with APT 33, but oftentimes with some of these blue dots.
What you'll see is multiple connections.
Or sorry, with your client you might see multiple connections to the blue dots, which suggests that there's some sort of survivability in the network where there may be like a primary, a secondary, a tertiary type of connection.
So if you're just playing whack-a-mole, you've got to whack them all.
You can't just whack the biggest one.
The network will go ahead and recover.
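The two observations from the graph, one dominant C2 and clients that connect to more than one C2, can be pulled out of an edge list without drawing anything. This is a sketch over synthetic edges using plain dictionaries; it is not the notebook's actual graph code.

```python
from collections import defaultdict

# Synthetic client-to-C2 edges; every address is a placeholder.
edges = [
    ("10.0.0.5", "203.0.113.7"),
    ("10.0.0.6", "203.0.113.7"),
    ("10.0.0.7", "203.0.113.7"),
    ("10.0.0.5", "192.0.2.44"),   # same client, second C2
    ("10.0.0.8", "198.51.100.9"),
]

clients_of = defaultdict(set)  # C2 -> clients that connect to it
c2s_of = defaultdict(set)      # client -> C2s it connects to
for client, c2 in edges:
    clients_of[c2].add(client)
    c2s_of[client].add(c2)

# The "big red blob": the C2 with the most distinct clients.
hub = max(clients_of, key=lambda c2: len(clients_of[c2]))

# Clients with more than one C2, hinting at primary/secondary fallback.
multi_c2_clients = sorted(c for c, c2s in c2s_of.items() if len(c2s) > 1)
```

Blocking only `hub` would leave the clients in `multi_c2_clients` with a working fallback, which is the whack-a-mole point made above.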
JESSICA O'BRYAN: Yeah, and this is a really great merging point for data science and threat Intel analysis.
He's taking all of these mathematical models and translating them in a form where I can look at something and say, oh wow, that's the exfil point for APT 33.
I really need to look into that.
And oh, there's bi-directional traffic between certain ones, or there's multiple connections to one, more than any other.
And so this all comes together.
And there's multiple visual graphs or visualizations that he provides throughout the notebook for us to better understand what's happening.
WILLIAM MATTULL: Cool, and we're going to return.
You can actually get analytics out of this type of data representation.
And so we're going to take a look at that in a few slides.
OK, so now that we worked pretty hard and we gathered some data, we can start running some analytics on it.
So this is an example of the summary analytics I referred to earlier.
Your main takeaway is you can see the top IPs.
You're able to rank them.
So if we were assessing risk here, we would obviously want to start with the top one here, followed by this one, and then these are potentially false positives that we may not be interested in, because it just simply doesn't warrant the time or something of that nature.
JESSICA O'BRYAN: Yeah, so these are the APT 33 IPs.
And these are our clients that are communicating with APT 33 IPs.
Obviously we're going to want to vet this top one.
So I had 53 indicators at the beginning, maybe I didn't have time to vet all of those.
But this top one, I'm going to want to make sure it's legitimate.
Here, this one is communications back and forth, which is important.
There's not just incoming traffic from APT 33, but these clients are actually communicating back.
There's bi-directional traffic.
That's very important to know.
WILLIAM MATTULL: Yeah, because you may just get some like scanning, they're just trying to footprint you.
Things of that nature.
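One way to separate inbound-only scanning from real two-way communication is to look for client and IOC pairs that appear in both directions. The sketch below assumes unidirectional flow records, which is common for NetFlow, and uses synthetic addresses.

```python
# Synthetic unidirectional flow records; addresses are placeholders.
flows = [
    {"src_ip": "203.0.113.7", "dst_ip": "10.0.0.5"},  # IOC -> client
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.7"},  # client answers
    {"src_ip": "203.0.113.7", "dst_ip": "10.0.0.6"},  # IOC -> client, no reply
]
ioc_ips = {"203.0.113.7"}


def two_way_pairs(flows, ioc_ips):
    """Return (client, ioc) pairs that have traffic in both directions."""
    to_ioc = {(f["src_ip"], f["dst_ip"]) for f in flows if f["dst_ip"] in ioc_ips}
    from_ioc = {(f["dst_ip"], f["src_ip"]) for f in flows if f["src_ip"] in ioc_ips}
    return to_ioc & from_ioc


pairs = two_way_pairs(flows, ioc_ips)
```

Here only `10.0.0.5` is talking back; `10.0.0.6` has only received traffic, which is more consistent with being scanned.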
So also the business hours, you wanted to mention something about that.
It's pretty important.
JESSICA O'BRYAN: Sure, yeah.
So one thing we try to do when we run focused threat hunts is we like to focus on the time that these guys work.
So we work Monday through Friday 9 to 5.
But each country has their own work schedule.
So we like to execute the threat hunts at a time where our period is within the normal work days and hours of the adversary.
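Restricting a hunt window to the adversary's working hours could be sketched as below. The specifics are assumptions chosen for illustration only: a fixed UTC+03:30 offset for Iran (ignoring any daylight saving changes), a Saturday-to-Wednesday work week, and 09:00 to 17:00 hours. Adjust all three to whatever your own intelligence supports.

```python
from datetime import datetime, timedelta, timezone

# Assumptions: fixed UTC+03:30 offset, Saturday-Wednesday week, 09:00-17:00.
ADVERSARY_TZ = timezone(timedelta(hours=3, minutes=30))
WORK_DAYS = {5, 6, 0, 1, 2}   # datetime.weekday(): Monday is 0, Saturday is 5
WORK_HOURS = range(9, 17)


def in_adversary_hours(ts_utc):
    """True if a timezone-aware timestamp falls in the assumed work window."""
    local = ts_utc.astimezone(ADVERSARY_TZ)
    return local.weekday() in WORK_DAYS and local.hour in WORK_HOURS


saturday_morning = datetime(2019, 9, 21, 6, 0, tzinfo=timezone.utc)  # 09:30 local
friday_morning = datetime(2019, 9, 20, 6, 0, tzinfo=timezone.utc)    # 09:30 local
```

Filtering flow timestamps through `in_adversary_hours` before running the trend analysis narrows the data to the period of interest.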
WILLIAM MATTULL: OK, so moving along in our summary analytics.
So what we did is we summarized all of the NetFlows with respect to ports and protocols and put them in a bar chart.
You can see that port 80 and 443, very popular, which are just firewall bypass.
But you'll also see other-- for those of you that speak network-- you'll see some email things going on there.
You also see this interesting set of ports.
You'll see this quote of the day thing going on that our network guys tell us about.
So basically what you're doing here is you're starting to get a sense, from a behavioral perspective, of what the malware, or the communication between the C2 and the client, is actually doing.
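The port and protocol summary behind a bar chart like this is essentially a frequency count. A sketch with synthetic observations follows; the counts are made up, and the text rendering stands in for whatever plotting library the notebook uses.

```python
from collections import Counter

# Synthetic (protocol, destination port) observations; counts are made up.
observed = [("TCP", 443)] * 6 + [("TCP", 80)] * 4 + [("TCP", 25)] * 2 + [("UDP", 17)]

counts = Counter(observed)


def text_bars(counts):
    """Render counts as rows of '#' characters, largest first."""
    lines = []
    for (proto, port), n in counts.most_common():
        lines.append(f"{proto}/{port:<5} {'#' * n} {n}")
    return lines


for line in text_bars(counts):
    print(line)
```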
JESSICA O'BRYAN: Yeah and so if our notebook just ended here, this would be pretty boring.
OK, 80 and 443, everybody's doing that.
But this is just the beginning of the trend analysis.
WILLIAM MATTULL: OK, and so for those of you who don't read code, sorry about that.
But basically you just go ahead and continue with all of the data that you have in your NetFlows and just go ahead and continue to build out those bar charts.
Once you've done that, you can go ahead and start to combine the different columns together and look at them in different ways.
So here are some of those ports that I previously mentioned.
So what you're looking at here, these little rectangular things, these are what's called a box plot in the statistical world.
And what they do is they plot this minimum and this maximum and then the median.
They give an interesting name to quartiles.
That's what these little dividing things are.
What these are generally used for in the statistical world is to look at outliers.
But you can also just quickly visualize where our data groups together.
So the kind of strange looking thing here on this port 80 with this little shape here, you can just basically see that everything's stacked down to the bottom.
And what these numbers represent here are byte sizes.
So what we're doing is we're counting the frequency, the number of this byte size, 300, let's say.
And what's interesting about that is that if you correlate the port, so port 25 in this case, with where these byte sizes are, you can start to see a pattern.
And so port 25, I had to write this down, is basically pop email?
Anybody correct me on that?
JESSICA O'BRYAN: 110 is pop.
WILLIAM MATTULL: SMTP, sorry.
Pop is down here, right.
So it's this uniform size here, so perhaps you could see some sort of communication, maybe it's doing some spam type of campaigns there.
For these and these, notice that all the email related ones are all in this categorical type of shape where it doesn't look like this, which is kind of varied and random.
They're in a very fixed data.
This is SNMP type traffic here, we don't know what's going on with that.
But apparently can be used in reflection attacks.
Again, we're saying we didn't study any of the malware, we're just looking at some of the behavioral parts of this.
This port 80 you'll notice that some of the byte sizes are pretty low.
So this is like perhaps ping traffic.
And then it goes up to real high, which is potentially like exfil of the data itself here.
So this is probably the communications [INAUDIBLE] and exfiltration point.
And we don't know what's going on with this port, so.
JESSICA O'BRYAN: You're good.
WILLIAM MATTULL: So the idea here though, basically, is you're just combining different columns to get a sense of what's going on in your NetFlow.
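The numbers a box plot draws (minimum, quartiles, median, maximum) can be computed per port to compare how tightly byte sizes cluster. This sketch uses Python's `statistics` module on synthetic byte sizes; the values are invented to mimic a uniform port and a widely varying one.

```python
from statistics import median, quantiles

# Synthetic byte sizes per destination port; values are invented.
bytes_by_port = {
    25: [396, 396, 396, 396, 400],
    80: [60, 74, 82, 1500, 90000],
}


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, maximum."""
    q1, _, q3 = quantiles(values, n=4)
    return {
        "min": min(values),
        "q1": q1,
        "median": median(values),
        "q3": q3,
        "max": max(values),
    }


summary = {port: five_number_summary(v) for port, v in bytes_by_port.items()}

# Interquartile range: a small value means the byte sizes are nearly fixed.
spread = {port: s["q3"] - s["q1"] for port, s in summary.items()}
```

A very small spread matches the "fixed" shape described for the email ports, while a large one matches the varied shape seen on port 80.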
All right, so we keep going on that idea.
So now we're adding in the protocol and the port.
And we get this different visual representation instead.
So here we can-- oh, this has animation to it.
OK, so if we want to take a look here on the pings, you can see that this byte size of 82, you know we're about a count of 500 there.
Notice how big, how different it is from the rest of the sizes.
So that's pretty important.
I think Jess put her PowerPoint skills to work there.
JESSICA O'BRYAN: So the notebook is basically saying that 479 times, there was a byte count of 82 over port 0 ICMP.
Does that make sense?
And so you can see that that count is a lot higher than the number of times that the second most common one, 164 happened.
And so that's kind of interesting.
They might be using the byte size of 82 over ICMP.
May or may not.
But we're just trying to show how you can do a trend analysis here and how you're not just looking at ports and you're not just looking at byte sizes.
You're combining both of them to have a fuller story.
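Combining protocol, port, and byte size amounts to counting byte sizes within one protocol and port pair and seeing which size dominates. In the sketch below the flows are synthetic and deliberately small; they are not the counts from the slide.

```python
from collections import Counter

# Synthetic flows as (protocol, port, byte size) tuples.
flows = [("ICMP", 0, 82)] * 5 + [("ICMP", 0, 164)] * 2 + [("TCP", 443, 138)] * 3


def byte_sizes_for(flows, protocol, port):
    """Count how often each byte size occurs for one protocol/port pair."""
    return Counter(size for proto, p, size in flows if (proto, p) == (protocol, port))


icmp_sizes = byte_sizes_for(flows, "ICMP", 0)
(top_size, top_count), (next_size, next_count) = icmp_sizes.most_common(2)
```

A large gap between `top_count` and `next_count` is the kind of standout worth writing down as a candidate behavior.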
WILLIAM MATTULL: OK, likewise you can see the repetition of this 396 byte size.
And you can see this kind of sequential port numbers there, which is kind of unusual.
One of the main takeaways is the data set that we originally started with is, I don't know, a million plus rows.
And in doing these types of representations, we're able to summarize all of that to make you more efficient and make the data a little more manageable along the way.
JESSICA O'BRYAN: Yeah, so as you guys know, source ports are generally randomly picked.
It's usually a randomized process.
So this sequential movement through the 55,000 range is pretty interesting.
So these are things we're taking notes on when we're going through the notebook.
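Sequential source ports can be flagged by measuring the longest run of consecutive port numbers in time order for one host. This is a sketch with made-up port lists; what run length counts as suspicious is a judgment call that is not specified in the talk.

```python
def longest_sequential_run(ports):
    """Length of the longest run where each port is the previous one plus 1."""
    best = run = 1 if ports else 0
    for prev, cur in zip(ports, ports[1:]):
        run = run + 1 if cur == prev + 1 else 1
        best = max(best, run)
    return best


# Made-up source ports, in the order the flows were seen.
stepping = [55001, 55002, 55003, 55004, 41877]
random_looking = [49152, 61000, 50211]
```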
WILLIAM MATTULL: OK, so we do the same thing for destination ports, continue our animation.
You can see here on 443, kind of same thing, right?
This number is way bigger than any of these numbers.
So this is pretty interesting byte size.
Same thing here.
We see some similar byte sizes, counts for these particular byte sizes.
And by the way, if we would go back and look at the box plots there, you would see some of these patterns in there now that they've been pointed out to you.
And we see this byte size of 74 repeat on this.
JESSICA O'BRYAN: And then 396 was a trend that we saw also on the source ports on the previous slide.
So we're just noting that and picking up on the destination side as well.
WILLIAM MATTULL: OK, so we're approaching the end here of some of these steps and we're going to put them all together.
OK, so it also helps to take a look at things by time.
So this axis down here is just time.
I think it's about three days.
What you're looking for here is basically the payload that a client is sending out, just to get a sense of what a client is doing.
Now this is one that sent out the most data.
But you'll see on the next slide when you combine them together, it's kind of a different story depending on which client.
So you can see some of them have this pinkness too, right, these are like large exfils of data, perhaps maybe DDoS type of stuff.
But if you notice down here, it's a little more uniform type of communication.
So potentially, these might be doing some scanning, some lightweight communications, things of that nature.
But you can see that if you just analyze the behavior of one client, it might not give you the entire picture.
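A crude way to tell bursting clients from uniform, low-volume ones is to bucket bytes sent per client per hour and compare the peak to the median. The samples are synthetic, and the `burst_ratio` threshold of 10 is an arbitrary assumption rather than a value from the talk.

```python
from collections import defaultdict
from statistics import median

# Synthetic (client, hour index, bytes sent) samples.
samples = [
    ("10.0.0.5", 0, 100), ("10.0.0.5", 1, 120), ("10.0.0.5", 2, 110),
    ("10.0.0.5", 3, 50000), ("10.0.0.5", 4, 105),
    ("10.0.0.6", 0, 80), ("10.0.0.6", 1, 82), ("10.0.0.6", 2, 82), ("10.0.0.6", 3, 84),
]


def hourly_totals(samples):
    """Sum bytes sent per client per hour bucket."""
    totals = defaultdict(lambda: defaultdict(int))
    for client, hour, sent in samples:
        totals[client][hour] += sent
    return totals


def label(series, burst_ratio=10):
    """Label a client's series by comparing its peak hour to its median hour."""
    values = list(series.values())
    return "bursting" if max(values) > burst_ratio * median(values) else "low and slow"


labels = {client: label(series) for client, series in hourly_totals(samples).items()}
```

Looking at all clients side by side, rather than only the one that sent the most data, is what surfaces the quieter pattern.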
OK, so now what we do is we take all that stuff that we learned and we put it all together.
So some of our findings include the 443 with the byte size 138, things of this nature.
We also looked at some time series where we have some bursting type stuff.
And then we also have some of what they call low and slow.
And then the next step, in order to get your novel behavior, or novel IOCs, rather, is to go back into your data, like Jess was saying, and, however you query data in your environment, use these types of structures to get back the NetFlows that match.
And you will most likely find some IOCs that are moving faster than what the providers are giving you.
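Going back into the data with those findings could look something like the sketch below. The field names, the signature pairs, and the IP addresses (taken from the reserved documentation ranges) are hypothetical; how the query is actually expressed depends on how NetFlow is stored in your environment.

```python
# Hypothetical behavioral findings: (destination_port, byte_size) pairs.
signatures = {(443, 138), (443, 74)}

# Hypothetical IPs already present in threat intel for the actor.
known_ti_ips = {"203.0.113.10"}

# Hypothetical NetFlow records.
netflows = [
    {"dst_ip": "203.0.113.10", "dst_port": 443, "bytes": 138},
    {"dst_ip": "198.51.100.7", "dst_port": 443, "bytes": 138},
    {"dst_ip": "192.0.2.44", "dst_port": 443, "bytes": 5120},
    {"dst_ip": "198.51.100.7", "dst_port": 443, "bytes": 74},
]


def candidate_iocs(flows, signatures, known_ips):
    """Return IPs whose flows match a behavioral signature
    but that are not yet in the known threat-intel set."""
    matches = [f for f in flows if (f["dst_port"], f["bytes"]) in signatures]
    return sorted({f["dst_ip"] for f in matches} - known_ips)


print(candidate_iocs(netflows, signatures, known_ti_ips))
```

Anything returned is only a candidate; it still needs an analyst to confirm it before it goes into a threat intel system.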
JESSICA O'BRYAN: Yeah, so when you pull together the initial output of the notebook, it's pretty neat.
You can do a lot with that with respect to threat hunting, right?
I can hand that over to the blue team and say, here you go.
Here's some behaviors that you can look for that we didn't find in threat Intel reporting.
This is more real time.
This is likely one way that they're operating, maybe more so than previously reported on.
And like he said, we're also going to find new indicators of compromise from this information, because we're going to run it against the NetFlow and we'll see new IPs that we haven't seen before that weren't part of that initial set of APT 33 IPs.
WILLIAM MATTULL: When you get good at it, like Jess's team runs threat hunts daily.
And so that's a pretty impressive thing, so that we can actually see how our networks are affected, I want to say real time.
But I think in a pretty reasonable amount of time to get through to see if we're being affected.
OK, but one of your takeaways might be that, hey, this takes a lot of time.
I got to get a lot of data and go through this stuff.
So we can start to leverage the power of machine learning.
For those of you who've never done machine learning before, this book is a pretty good place to get started.
It's got a lot of examples.
And it's hands on, so you'll start to get familiar with code and find some places for some data.
No affiliation, just I read it, I thought it was pretty good.
OK, so the idea is you're going to take that same data that you put all your hard work into, and you can run what's called a cluster algorithm.
And the name of the game here is what you're trying to do is find similarity in your NetFlows, just like we manually did.
But we can go ahead to leverage machine learning to help us along with this.
The idea with machine learning is you're going to-- or with this clustering is that each one of your NetFlows is going to get a label, and so it's grouped together, if you will.
And your activity in the whole process is you're going to look for these benign clusters.
So this is clusters where everything's normal.
But you're going to take your threat Intel and see where it falls within the clusters.
And like we just did before, it will have other similar NetFlows in there, and then you can go ahead and investigate those NetFlows.
So let's just take a look at what it looks like.
So you might have a lot of clusters.
Here is just an example of two.
So what we did is just to make this a 2D representation.
We're just comparing source packets and destination packets.
You'll see these clusters here and here, and they're colored by cluster.
These little x's are something called a centroid.
So for those of you in the know, this is a [INAUDIBLE] algorithm.
And it just starts with some random center point, and then takes a bunch of distance measures and then groups them likewise.
And it moves the centroids around.
So the centroids might start out here and here, and eventually they kind of move around after like, 10,000 iterations, and you end up in the middle of these groups of things.
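The algorithm's name is inaudible in the recording, but the description (random starting centers, distance measures, centroids that move each iteration) matches centroid-based clustering such as k-means. The sketch below assumes k-means and uses hypothetical (source packets, destination packets) points; it is not the speakers' implementation.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its members."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random starting centers
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's center
        if new_centroids == centroids:
            break  # centroids stopped moving
        centroids = new_centroids
    return labels, centroids


# Hypothetical flows as (source_packets, destination_packets).
points = [(1, 2), (2, 1), (2, 3), (3, 2), (50, 52), (51, 50), (52, 51), (53, 53)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Each flow ends up with a cluster label, which is what the coloring on the slide represents.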
So real cool, but what we were looking for is where did the TI go.
So we change our coloring just a little bit.
And what you'll notice here is, if you're not in TI, we labeled you false.
If you are in TI we labeled you true, just to color it.
So you can see here that none of these NetFlows are in a TI.
But if we go down here, you can see, well, all of these NetFlows are TI.
And you'll see these little two, a little bit of an eye chart.
Those aren't in TIs.
But you see that they're grouped with flows that are with TI.
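Picking out the flows that are not in TI but sit in a cluster with TI flows can be sketched as below. The flow IDs, the cluster labels, and the 50% threshold are hypothetical choices for illustration.

```python
from collections import defaultdict

# Hypothetical clustered flows: (flow_id, cluster_label, in_threat_intel).
clustered = [
    ("f1", 0, False), ("f2", 0, False), ("f3", 0, False),
    ("f4", 1, True), ("f5", 1, True), ("f6", 1, True),
    ("f7", 1, False), ("f8", 1, False),
]


def flows_to_investigate(rows, min_ti_fraction=0.5):
    """Return non-TI flows that share a cluster in which at least
    min_ti_fraction of the members are known threat intel."""
    by_cluster = defaultdict(list)
    for flow_id, cluster, in_ti in rows:
        by_cluster[cluster].append((flow_id, in_ti))
    suspects = []
    for members in by_cluster.values():
        ti_fraction = sum(in_ti for _, in_ti in members) / len(members)
        if ti_fraction >= min_ti_fraction:
            suspects.extend(fid for fid, in_ti in members if not in_ti)
    return sorted(suspects)


print(flows_to_investigate(clustered))
```

The benign cluster contributes nothing, while the two unlabeled flows in the TI-heavy cluster are the ones to hand to an analyst.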
Once you get your data formatted, this process may take only a few minutes.
So it's a lot faster than doing it manually.
And so the outcome of this is we had an analyst investigate those two low NetFlows, which actually turned out to be the same IP.
And they did some forensics and determined yeah, it was APT 33 activity on it.
We put it in our threat Intel system, and then we saw the community catch up about one week later.
So this is a simple, but quite effective, way of doing it.
I would recommend, however, that you do take the time for APTs that are especially important to you, to do both methods.
Simply for the fact that when you do identify the novel TI, you have a lot of information on the behavior aspects of it through that manual process.
So you'll know the ports and the protocols and byte sizes, things like that.
That doesn't necessarily come through when you do the machine learning part.
And that's called [? ] explainability in the data science world.
So we don't necessarily always have to use NetFlows in this process, like I mentioned before.
We can do a graph representation.
So the graph was that little spinny thing that was happening earlier.
If you have a Facebook account or LinkedIn or anything like that, any social network, then you've probably seen it from some of the vendors out here.
You can represent your data as a graph.
So there's some terminologies to this.
We have circles and we call these nodes.
These little back and forth arrows, we call those edges.
And we can assign properties to each one of these.
So you can consider this node here as an object.
So the particular object we're interested in is an IP address.
So this could be a client, this could be a C2.
And what we can do is we can take those NetFlows and we can encode it into the edge, and that's referred to as an edge property, and we can do analytics on this stuff.
And you'll notice that if you think logically about what NetFlows are, as an object model, they have a property.
There's direction to it, each one is an object [? ] an IP.
And there's a special name for this representation, it's called the bivariate graph, which just basically means you have two rows.
You may have more data than this so, you can-- whoops.
Yeah, you may have more data, so you can represent your graphs this way.
Some of the things you're interested in are these type of analytics here.
So if you want to learn more about this, I'm not going to go full on graph because there's a whole body of study called graph theory behind this.
But some of the things you're looking at are the ins and outs to each node object.
You're interested in these properties, edge properties, which we call weights.
We can run some analytics on it called centrality measures.
And those are like super useful, and they're widely used in social networking.
So this is often how they figure out, hey, do you know this person type of stuff.
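A minimal sketch of the graph representation described here: IP addresses as nodes, NetFlows folded into directed edges, and packet counts as the edge weight. The record layout and addresses are hypothetical, and a graph library would normally replace these hand-rolled dictionaries.

```python
from collections import defaultdict

# Hypothetical NetFlow records: (source_ip, destination_ip, packet_count).
netflows = [
    ("10.0.0.1", "203.0.113.10", 12),
    ("10.0.0.2", "203.0.113.10", 7),
    ("10.0.0.1", "203.0.113.10", 5),
    ("203.0.113.10", "10.0.0.1", 3),
]


def build_graph(flows):
    """Directed graph as {(src, dst): weight}, where the weight is
    the summed packet count for that edge."""
    edges = defaultdict(int)
    for src, dst, packets in flows:
        edges[(src, dst)] += packets
    return dict(edges)


def degrees(edges):
    """Count the ins and outs for each node."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return dict(in_deg), dict(out_deg)


edges = build_graph(netflows)
in_deg, out_deg = degrees(edges)
print(edges[("10.0.0.1", "203.0.113.10")])
print(in_deg["203.0.113.10"], out_deg["203.0.113.10"])
```

A node with many inbound edges from clients, such as a suspected C2, stands out in the in-degree counts.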
OK, so if we take a look here, here's just a graph representation.
I just wanted to point a few things out.
You'll notice that this particular node, number four here, everything's passing through that.
So you can take a betweenness centrality measure, and you'll notice that, hey, this number is higher than anything else in the graph.
And that just shows that it's an important node.
You can also see something.
So in your network, you may have two clients that are communicating with each other and they should not be.
That measure comes out with this centrality measure called the [INAUDIBLE] cluster coefficient, which identifies triangles.
And so if you notice one, two, four.
You can see one, two, four forms a triangle.
So that's how we take some of these analytics.
And we're interested in the architecture of the communication as well as who's talking to who.
And so these, we call them weights here, but this is just the packet counts going on so.
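The two measures mentioned here can be sketched on a small graph where node 4 is the hub and nodes 1, 2, and 4 form a triangle. The graph layout is a hypothetical stand-in for the slide. The betweenness function follows Brandes' algorithm for unweighted, undirected graphs; the coefficient shown is the standard local clustering coefficient, which is an assumption since the measure's full name is inaudible in the recording.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm on an unweighted, undirected graph
    given as {node: set_of_neighbors}."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack, queue = [], deque([s])
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        while queue:                   # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                   # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted from both endpoints.
    return {v: total / 2 for v, total in score.items()}


def local_clustering(adj, v):
    """Fraction of a node's neighbor pairs that are themselves connected,
    i.e. how many triangles the node closes."""
    neighbors = list(adj[v])
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if neighbors[j] in adj[neighbors[i]])
    return 2 * links / (k * (k - 1))


# Hypothetical graph: node 4 is the hub; 1, 2, 4 form a triangle.
adj = {1: {2, 4}, 2: {1, 4}, 3: {4}, 4: {1, 2, 3, 5, 6}, 5: {4}, 6: {4}}
scores = betweenness(adj)
print(max(scores, key=scores.get))
print(local_clustering(adj, 1))
```

Node 4 gets the highest betweenness because every path between the other nodes, apart from the direct 1 to 2 link, passes through it; node 1 has a clustering coefficient of 1.0 because its only two neighbors also talk to each other.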
OK, and the last step.
I know this is getting into the tech-y world.
The last step is you normalize this stuff.
And this is what you're actually going to be running your clusters on.
So you can still use the clustering technique once you have learned how to do that.
In this normalization step, what that will allow you to do is to look at networks that have graphs that have the same similarity.
So you can identify botnets that perhaps are just getting established versus ones that are more mature.
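The talk does not say which normalization is used, so the sketch below assumes simple min-max scaling of each feature column into the range 0 to 1 before clustering. The feature choice and values are hypothetical.

```python
def min_max_scale(rows):
    """Scale each column of equal-length numeric rows into [0, 1]
    so that no single feature dominates the distance measure."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        tuple(
            0.0 if high == low else (value - low) / (high - low)
            for value, low, high in zip(row, lows, highs)
        )
        for row in rows
    ]


# Hypothetical per-graph features:
# (hub betweenness, clustering coefficient, total packets).
features = [(9.0, 0.1, 1200), (3.0, 0.5, 300), (6.0, 0.3, 75000)]
scaled = min_max_scale(features)
for row in scaled:
    print(row)
```

Without a step like this, a raw packet count in the tens of thousands would swamp a clustering coefficient that only ranges from 0 to 1, and graphs of different sizes would be hard to compare.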
OK, I'm going to hand it back to Jess.
JESSICA O'BRYAN: So finally, after running your entire notebook, as we talked about before, you don't want to just run a threat hunt and learn behavioral pieces and then walk away.
You want to keep that useful.
So we highly recommend automating your notebook; you can use [? ] for that.
Another method is building a dashboard in your SIEM like this, where you can input things like byte sizes and port numbers that are trending and then save it.
And your SIEM can alert you if it's actually seeing those trends on your network.
Finally, our summary.
Behavioral analytics is queen, think chess.
If you're playing chess with behavioral analytics, you can move anywhere, like the queen does.
And you're one step ahead of threat intelligence reporting.
And that's really important, because as you guys know, the good guys aren't the only ones reading threat Intel reporting.
The bad guys are as well.
So if you can be proactive and run things like these notebooks, you can be one step ahead of the enemy.
We're all about sharing, so please come up and talk to us.
We'd love to share our experience with you and also set up some sort of threat intelligence sharing partnership, if you guys are interested.
Thanks, you guys.
WILLIAM MATTULL: OK, thank you.