Power Your Cyber Threat Intelligence Program with Machine Learning

After you have watched this Webinar, please feel free to contact us with any questions you may have at general@anomali.com.
Transcript
JENNIFER: Power Your Cyber Threat Intelligence Program with Machine Learning, sponsored by Anomali.
Leading today's discussion is Joe Gehrke, solutions architect at Anomali.
Joe works with companies to build and operationalize threat intelligence programs and help solve some of their most complex challenges.
Joe's 20-year career in cybersecurity spans from security strategy to solution implementation.
His current areas of focus include system interoperability, intelligence operationalization, and platform software development kits.
And we're excited he's here to join us today.
Over the years, cyber threat intelligence has evolved to exploit the potential offered by various machine learning applications.
By understanding the intersection between machine learning and CTI, organizations will be better positioned to prevent and respond to cyber attacks.
We see examples of this in machine learning algorithms that enrich threat context, help prioritize threats, and perform historical evaluation of events.
This presentation explores both the usage of threat intelligence as a mechanism for informing analytic applications and machine learning as a source for generating and improving the accuracy of threat intelligence.
Keep in mind that today's slides are available to download via the handouts button.
And to ask any questions, remember to use the question and answer button.
The recording of today's session will be available on demand after the e-summit concludes.
I'd now like to hand it over to Joe.
Joe, over to you.
JOE GEHRKE: Thank you, Jennifer.
And just to build a little bit on my background and why I'm sharing some of these insights today: I've had the opportunity to work in the machine learning space at a behavior analytics firm, as well as in threat intelligence for the last several years.
And I always find it interesting that there were arguments on both sides as to why one might be better than the other.
And I thought rather than that approach, having them work together is probably the best way to secure the organization.
And to that end, I just wanted to go through a few things-- the evolution of machine learning and kind of how we got to where we are today.
And I'll give a good example there of how we got to where we are, using the endpoint.
And then I want to relate all of that to the pyramid of pain.
I'll go into a little bit more detail on the pyramid of pain.
Some folks might be very familiar with that.
But again, I think machine learning is very relevant to the pyramid of pain.
And then I'll go into some of the evolving and newer applications of machine learning in security, particularly natural language processing.
And then how we can tie this all together.
What does it mean for an organization?
Should you be using machine learning?
Or how should you be using it?
And how should you incorporate threat intelligence into that process?
So just a little bit on how machine learning has evolved over the years. There are a couple of things in the last three, four, five years that have occurred that have really allowed machine learning to expand all over the place, particularly in cybersecurity.
One of the more important things is just this unlimited amount of compute power.
Whether you're talking about Google Cloud or AWS or Azure, it's cheap to run these calculations.
And they're offering analytics as a service where you can provide your own data.
So there's never been more availability of, or access to, the compute power needed to run what were at one time fairly sophisticated machine learning models.
And then, of course, there's been evolving models-- the data science behind things has evolved over time as well.
Different techniques, improved models, improved data sets, those are all contributing to what we see today.
But a lot of this kind of leaves the question: where do those traditional indicators of compromise come into play?
Indicators of compromise have been around for 10, 20 years; hashes are a great example in the malware world.
So it's interesting to see what role those play today, given the expanded use of machine learning, and how the production of that intelligence has actually been impacted by some of these evolving machine learning models.
And then, why we actually need both: these indicator- and tactical-intelligence-based models as well as these more behavioral approaches.
Before I get into a little bit on some of the pyramid of pain aspects, just a quick example that I think illustrates the point.
If we take the endpoint: way, way back, we started with antivirus.
And at the time, there were signature databases.
And if you recall, you're always prompted to download the latest and greatest signature database.
At the time, that might have been effective.
But today, it certainly is not an effective approach because it's way too static.
It doesn't take into account all these different things that are going on with malware variants.
And so it's just not an approach that works today.
And as we evolved, we started to see things like endpoint protection platforms and endpoint detection and response.
EPP being kind of that holistic suite of tools that would protect an endpoint.
In the Windows world, that would be things like your antivirus in combination with Windows firewall, maybe some port restrictions, everything that goes into keeping that endpoint secure.
And then we got to this kind of greater focus on EDR, Endpoint Detection and Response.
And many of these more recent EPP/EDR applications heavily rely on some form of machine learning.
So certainly, the next generation antivirus uses machine learning to be better at detecting what appears to be malware without knowing the signature of the malware.
EDR is a great tool for producing information that can give you analytic insight.
If you think about all the things that are going on on an endpoint, the traffic that's being produced, the interactions between an endpoint and another device, these are very, very rich data sets for understanding what is normal and what is abnormal.
And that's kind of what feeds into user and entity behavior analytics.
I'll speak to that a little bit more, but we all have typical behavior.
What do I normally go to during the day?
What does a machine normally communicate with during the day?
So we're able to, at a mathematical level, very, very accurately tell you what is abnormal.
Now, of course, that doesn't necessarily mean malicious but there are things that can help with that.
So one quick note on machine learning, just because I know we all have slightly different definitions of machine learning, and it's kind of one of those buzzwords that's been around for several years now.
But machine learning is a subset of artificial intelligence.
And within cybersecurity, for the most part, there have been two types of models employed in machine learning.
The first is supervised; the second is unsupervised.
I think the easiest way to think of these is with an example.
For supervised machine learning, you are telling the model what is right and what is wrong.
It's labeled data.
And if you think about working hours as an example of identifying abnormal behavior, you might have somebody that works from 9:00 to 5:00.
If you see them working at 2:00 in the morning, that's abnormal.
Not necessarily wrong, but in a supervised model, you're telling it what's normal for this person given what they do in the organization.
Unsupervised, you're not giving that same information.
You're letting the model learn what is normal.
So without telling it that 9:00 to 5:00 is the normal workday for an employee, it will learn what the normal workday is for that person.
Now, that might go alongside, and very often it does go alongside, this notion of clustering: that your normal workday should look an awful lot like somebody else's normal workday.
But there are also a lot of other approaches in machine learning in cybersecurity that use these types of models.
Certainly, anomaly detection, so that's a big one.
The overwhelming majority of machine learning techniques that we use in defensive mechanisms today are based on anomaly detection, whether that's behavior analytics or the traffic that we're looking at: zeroing in on the thing that is abnormal.
And then natural language processing, which is a more recent kind of application of machine learning to cybersecurity.
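To make the working-hours example concrete, here is a minimal sketch of unsupervised anomaly detection, assuming scikit-learn is available; the login hours are hypothetical sample data, not from any real environment.

    import numpy as np
    from sklearn.ensemble import IsolationForest

    # Historical login hours for one employee (mostly 9-to-5 behavior).
    login_hours = np.array([[9], [10], [9], [17], [16], [9], [10], [11], [15], [9]])

    # Unsupervised: the model is never told that 9-to-5 is "normal".
    model = IsolationForest(contamination=0.1, random_state=42)
    model.fit(login_hours)

    # A 2 a.m. login is flagged as anomalous (-1); a 10 a.m. login is not (1).
    print(model.predict(np.array([[2], [10]])))

Note that anomalous here does not automatically mean malicious; it only means the data point falls outside the learned baseline.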
So how does this all relate to the pyramid of pain I was mentioning a moment ago?
The pyramid of pain has been out for five or six years, and it's focused on threat intelligence. It's very easy to look at things like hashes.
But as you move up that pyramid of pain, it becomes very, very difficult to identify and protect against things like TTPs.
What I've found is that on the machine learning side of the house, how we employ machine learning also gets more difficult as we go up the pyramid of pain.
And we think about these things with endpoint and malware at the bottom.
We've been using machine learning for quite a long time to help to analyze malware samples, malware variants, and to produce some meaningful output.
As you go up, we haven't been doing much to identify TTPs or tools because it's just naturally more difficult to identify.
We see this hold true around things like domain generation algorithms in the malware world.
One of those arguments against IOCs is the non-static nature of the threat: malware has a constantly changing set of domains that it communicates back to.
And so we see this interplay between machine learning and threat intelligence where we constantly need to evolve to produce better intelligence.
And at the same time, all of these tools that we have in the environment that are detecting and protecting are producing additional threat intelligence that goes back into the loop.
So what I want to do is look at each section of the pyramid of pain and how we've seen this evolution inside the machine learning.
We'll take this in two chunks.
On the left hand side of the pyramid, we're looking at threat intelligence and how it's produced.
On the right hand side of the pyramid, we'll look at machine learning and how it detects or helps to protect against the attacks that are related to a given element in the pyramid of pain.
So if you look at that from an input and output standpoint, we go back to the traditional antivirus.
The hash layer, that's pretty darn easy.
Keep it up to date, feed me a bunch of hashes, and tell me what I need to look out for.
But we've recently seen machine learning improve that, take it kind of to the next step.
And if you think about all of this next generation AV and how they're actually tuning these models, they're doing it at an enormous scale: they go out to these malware zoos that contain a huge, huge amount of malware samples, which can be fed through models to produce output that's very effective at detecting malware on the endpoint, even though we don't have some sort of static hash or static signature relating to it.
At the same time, we started to produce new types of hashes.
So we, in the threat intel world, have traditionally been used to these static hashes, which are things like MD5s and SHA-1s and SHA-256s.
But fuzzy hashing produces hashes, ssdeep is a good example, that take into account the minor variations in malware that might otherwise produce very different traditional hashes.
And all of a sudden, when you get a new version of malware or one line of code changes, we can still have that same fuzzy hash, and that allows us to have much, much broader coverage.
So this is a type of thing that's been possible through machine learning.
And it's a good example of an evolution of why threat intelligence has improved because of machine learning.
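As a hedged illustration of the fuzzy-hashing idea, here is a minimal sketch assuming the python-ssdeep bindings (pip install ssdeep) are installed; the byte buffers are stand-ins for two malware variants.

    import ssdeep

    # Two buffers standing in for a malware sample and a close variant.
    original = b"malicious payload block " * 300
    variant = b"malicious payload block " * 299 + b"one changed line here   "

    h1 = ssdeep.hash(original)
    h2 = ssdeep.hash(variant)

    # Traditional MD5/SHA-256 hashes of these buffers differ completely,
    # but ssdeep.compare returns a 0-100 similarity score, so the variant
    # can still match the original's fuzzy hash in a threat feed.
    print(ssdeep.compare(h1, h2))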
So now, we take that same component but look at it on the detection side.
On the right hand side of the pyramid, when we're looking for these things, next generation AV is employing all of this to identify what is likely to be malware.
So think about that feedback loop.
Because we're now identifying malware variants, we're producing all of this telemetry which includes those hashes that were just identified.
It includes callbacks to IP addresses and domains.
So now, that same machine learning that's used on the endpoint in the EDR system is feeding back into threat intelligence to produce more timely indicators of compromise.
And we'll see this at each point in the pyramid of pain where the output of a detection or prevention or response mechanism is additional timely threat intelligence.
Moving up a little bit more.
We look at IPs and domains as they're produced from a threat intelligence standpoint.
We can actually take domain generation algorithms and use those to produce threat intelligence.
There are a lot of these algorithms out there that we know exist because we're able to actually get our hands on them.
We know what those domains are going to be, and that can be input to threat intelligence.
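To show what that looks like, here is a toy sketch; the algorithm below is hypothetical, not any real family's DGA, but real DGAs are similarly deterministic, which is what makes pre-computing their domains possible.

    import hashlib
    from datetime import date, timedelta

    def toy_dga(seed_date, count=5):
        # Deterministically derive domains from a date, as real DGAs do.
        domains = []
        for i in range(count):
            digest = hashlib.md5(f"{seed_date.isoformat()}-{i}".encode()).hexdigest()
            domains.append(digest[:12] + ".com")
        return domains

    # Pre-compute tomorrow's callback domains and publish them as
    # threat intelligence before the malware ever resolves them.
    print(toy_dga(date.today() + timedelta(days=1)))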
We'll also see machine learning employed quite heavily in the scraping of intelligence.
A lot of the feed vendors out there rely heavily on automation to go out and scrape open sources of intelligence and, in some cases, the deep and dark web.
That's only possible through some large-scale machine learning.
We also see machine learning used to score domain indicators, IP indicators, other network indicators.
And this is used primarily to determine, or give the user insight into, the likelihood that something that's been identified is actually malicious.
You have a lot of data points that flow into these models: if we're looking at network indicators, things like WHOIS information and passive DNS all feed into models that give you a score for how confident you can be that something is malicious.
A couple of examples here tie back to that notion of unsupervised and supervised machine learning; the first is IP Insights.
So AWS has a service they provide that gives you insight into the communication that's normal between two IP addresses.
This is a completely unsupervised machine learning model.
Another example, from Anomali, is something called MACULA, which is supervised machine learning; it takes into account some of those things I was mentioning, like WHOIS, to produce a dynamic score on an indicator: how likely a domain or an IP or a URL is to be malicious.
So all of those techniques have driven improvement even at that domain and IP level when it comes to threat intelligence.
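Here is a minimal sketch of that kind of supervised indicator scoring, assuming scikit-learn; the features (domain age in days, passive DNS resolution count, WHOIS privacy flag) and the labels are made up for illustration and far simpler than what a production model would use.

    from sklearn.ensemble import RandomForestClassifier

    # Hypothetical training data: [domain age in days, passive DNS
    # resolution count, WHOIS privacy-protected (1) or not (0)].
    X_train = [
        [2, 3, 1],       # brand new, barely resolved, hidden registrant
        [1, 1, 1],
        [3650, 900, 0],  # long-lived, widely resolved, open WHOIS
        [1200, 450, 0],
    ]
    y_train = [1, 1, 0, 0]  # 1 = malicious, 0 = benign (labeled threat intel)

    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)

    # Probability that a new indicator is malicious, which a platform
    # can surface to the analyst as a confidence score.
    print(clf.predict_proba([[5, 2, 1]])[0][1])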
If we flip it over to the detection side on some of these same elements, the domain names in particular: rather than taking domain generation as an input to threat intelligence, we can also focus purely on detecting domain generation on the detection side of the house.
There are just a few common malware families there that rely heavily on DGAs.
It's fairly effective nowadays to be able to identify DGAs with pretty high confidence.
But again, this is something that's really only been enabled by machine learning over the last several years.
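As a rough illustration of that detection side, here is a heuristic sketch; real DGA detectors are trained on n-grams and known DGA corpora, so Shannon entropy over the domain label is only an approximation of the idea.

    import math
    from collections import Counter

    def entropy(label):
        # Shannon entropy of the characters in a domain label.
        counts = Counter(label)
        total = len(label)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def looks_like_dga(domain, threshold=3.5):
        # Long, high-entropy labels are characteristic of generated domains.
        label = domain.split(".")[0]
        return len(label) > 12 and entropy(label) > threshold

    print(looks_like_dga("google.com"))            # False
    print(looks_like_dga("xq7b9z2kd0f4m1c8.com"))  # True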
You might also see simple models employed here to protect environments based on things like the registration time.
So we know that a domain that has just been registered and is alive for just a few hours is more likely than not to be malicious.
Of course, they're not always malicious, but some simple approaches just involve flat-out blocking newly registered domains.
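A minimal sketch of that newly-registered-domain approach might look like the following; the registration lookup is a hypothetical stand-in for a WHOIS or passive DNS query.

    from datetime import datetime, timedelta, timezone

    MAX_AGE = timedelta(hours=72)

    def should_block(domain, get_registration_time):
        # Block any domain registered within the last 72 hours.
        age = datetime.now(timezone.utc) - get_registration_time(domain)
        return age < MAX_AGE

    # Hypothetical usage with a stubbed lookup returning a 5-hour-old domain.
    stub_lookup = lambda d: datetime.now(timezone.utc) - timedelta(hours=5)
    print(should_block("example-fresh-domain.com", stub_lookup))  # True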
At the domain and IP level, machine learning can also kind of give us that visibility into what is unusual.
If I'm working on my computer and I, all of a sudden, connect out to an IP address or a domain that I've never connected to before, that's certainly interesting and it could indeed be malicious.
And this is the type of intelligence or analytics that moves us towards this behavior analytics and EDR, where we really are digging into the analytics, the data itself, to understand what is it that's unusual.
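Here is a minimal sketch of that never-connected-before idea; real UEBA and EDR products track this statefully at far greater scale, and the history set below is a hypothetical stand-in.

    # Destinations each host has been seen talking to historically.
    history = {"laptop-42": {"10.0.0.5", "updates.example.com"}}

    def is_first_seen(host, destination):
        seen = history.setdefault(host, set())
        first = destination not in seen
        seen.add(destination)
        return first

    print(is_first_seen("laptop-42", "updates.example.com"))  # False: routine
    print(is_first_seen("laptop-42", "203.0.113.9"))          # True: unusual

And again, first-seen means interesting, not automatically malicious.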
At this point, I'm actually going to keep moving up that right hand side, the detection side of the house, and talk a little bit about the really difficult elements of the pyramid of pain: TTPs, tools, and network/host artifacts.
This is where we start to see the importance of UEBA and EDR come in.
It's very, very important for us to be able to understand, number one, what is normal?
And then number two, of the abnormal activity, what is actually malicious?
And these are things that you would see in offerings from solutions like Securonix and Exabeam and Interset that focus an awful lot on behavior analytics.
On the endpoint side of the house, you see this with Cybereason, CrowdStrike, Carbon Black, Tanium; the list goes on and on.
These types of solutions are able to pore through vast amounts of data and identify what is unusual.
But just remember here, unusual doesn't mean malicious.
And in a moment, I'll speak to how threat intel can inform that.
So quickly, though, if you're focusing in on the different types of models, that unsupervised versus supervised distinction is important when it comes to how it interplays with threat intelligence.
If you're talking about unsupervised machine learning, those models do not take labeled input.
They take event data to train on, but you don't say, hey, here's what's right, here's what's wrong.
That doesn't mean we can't use threat intelligence to help that.
What we do is we say, here's an output.
Threat intelligence can be used to validate that output or perhaps increase a risk score associated with anomalous behavior.
On the supervised side of the house, we can use all of this input to train the models to make them better.
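As a sketch of that enrichment step on the unsupervised side, suppose a model emits an alert with an anomaly score; the feed contents, field names, and score boost below are all hypothetical.

    # A tiny stand-in for an aggregated threat intelligence feed.
    ti_feed = {"203.0.113.9", "bad-domain.example"}

    def enrich(alert):
        score = alert["anomaly_score"]         # output of the ML model
        if alert["destination"] in ti_feed:    # validated by threat intel
            score = min(100, score + 40)
            alert["note"] = "destination matches a known-bad indicator"
        alert["risk_score"] = score
        return alert

    alert = {"host": "laptop-42", "destination": "203.0.113.9", "anomaly_score": 55}
    print(enrich(alert))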
So here's the really interesting part, I find: as we use machine learning to produce threat intelligence at the top of the pyramid, we've had a really, really difficult time using machine learning to help the threat intel analyst identify TTPs and identify tools in an automated fashion.
And that's where natural language processing has started to come into play.
And you'll see this from a few different offerings as well.
Anomali Lens is one; MITRE has even produced something called TRAM, which is used in the broader context to identify TTPs, MITRE ATT&CK TTPs.
And Microsoft has put out quite a bit of research around natural language processing and how that might be leveraged to help the analyst produce this type of intelligence.
I mean, finally, this is something that has been very, very difficult to do.
And we're starting to see this slowly roll out, and I think it's going to make a huge impact on the day-to-day work of the analyst as they're able to automate a lot more of these really time-consuming tasks.
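As a deliberately naive sketch of what tools like TRAM and Anomali Lens do with far more sophisticated NLP, consider mapping report text to MITRE ATT&CK techniques; the keyword table below is illustrative only, whereas real systems use trained classifiers.

    # A toy keyword table; real NLP systems learn these associations.
    TECHNIQUE_KEYWORDS = {
        "T1566 Phishing": ["phishing", "spearphishing", "malicious attachment"],
        "T1059 Command and Scripting Interpreter": ["powershell", "cmd.exe"],
        "T1003 OS Credential Dumping": ["mimikatz", "lsass", "credential dump"],
    }

    def tag_ttps(report_text):
        text = report_text.lower()
        return [technique for technique, words in TECHNIQUE_KEYWORDS.items()
                if any(word in text for word in words)]

    report = ("The actor delivered a spearphishing attachment, then used "
              "PowerShell to dump credentials from LSASS.")
    print(tag_ttps(report))  # tags all three techniques above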
There is an added benefit to this now too, and that's related to the training of the workforce.
I think many organizations are acutely aware of the shortage in talent in cybersecurity.
And as we start to employ these tools like natural language processing, we can leverage those as a training tool.
So if we just take a very, very simple example: threat actor naming, the sourcing of names, is a little bit crazy even for a seasoned threat intel analyst.
Take just Fancy Bear as an example.
There might be five, six, seven aliases to that same threat actor.
Using something visual, like natural language processing, to alert the analyst to those aliases is extremely helpful.
It's a time-saver.
And that's in addition to taking a brand new threat analyst and, in plain English terms, giving them something they can use to get quickly ramped up.
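A minimal sketch of that alias problem follows; the mapping uses commonly published names for this actor, and a real NLP system would surface it automatically rather than from a hand-built table.

    # Commonly published aliases that all refer to the same threat actor.
    ALIASES = {
        "fancy bear": "APT28",
        "sofacy": "APT28",
        "sednit": "APT28",
        "strontium": "APT28",
        "pawn storm": "APT28",
    }

    def canonical_actor(name):
        return ALIASES.get(name.strip().lower(), name)

    # Reports naming "Sofacy" and "Fancy Bear" resolve to one actor, so a
    # new analyst sees a consolidated picture instead of several names.
    print(canonical_actor("Sofacy"), canonical_actor("Fancy Bear"))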
The other thing that this starts to do is it unlocks all of this threat intel capability to a much broader set inside of the organization.
Incident response, the SOC: all of a sudden, they can now use this same capability to ask the question, what is it that I'm looking at in my incident response tool?
Natural language processing can tell them, this is what you're looking at, and in fact, it looks like this TTP is being used.
That's incredibly valuable to an organization, and in that same feedback loop they can now say, well, I am going to create this as threat intelligence, which provides that ongoing feedback loop from threat intel to detection and prevention.
So with all of that, what are those lessons that we can take away from this?
And thinking back to some of that research I was just noting, you'll find arguments out there that indicators of compromise are dead, or that machine learning is great, it tells me what's abnormal, but it never ever tells me what I need to worry about.
There's some validity in both of those, but when you use them correctly together, you're going to have a really, really powerful solution.
And so a couple of things here I think are worth pointing out.
When it comes to tactical intelligence, that's your traditional IPs, domains, and file hashes.
Just remember, there can only be one patient zero.
The second that person reports that indicator, you can take advantage of it.
The quicker you operationalize that, the better off you are.
But this is always going to be valuable intelligence.
The other thing is that these machine learning models that we use thrive on labeled data.
All of this threat intelligence that we're gathering that we say is good or bad is the necessary input to effective machine learning models.
So it will always be relevant from that context.
It also always provides that additional level of validation to the outputs of machine learning models.
If you tell me something is bad, I can validate that with threat intel.
On the machine learning side, I'd encourage everyone to deploy some form of machine learning in their cyber defense.
It's likely that most organizations already have.
Whether it's Next Gen AV or EDR, or maybe one of the others: a lot of organizations are employing this User and Entity Behavior Analytics.
And just know that you can enrich all of that with that same threat intelligence, not only to ask whether it is producing a valid output, but also then to help with attack attribution.
If your machine learning models accurately predict that something's been impacted, we can use threat intel to validate that.
So having these two components in place will better prepare the organization on the defensive side.
And with that, I will open it up to questions.
JENNIFER: Great, thanks so much, Joe.
We have a couple of questions from the audience here.
The first one we have is, where do you see natural language processing capabilities going in the near future?
JOE GEHRKE: Yeah, that's an interesting one.
You know, NLP is one of those evolving technologies, so my guess is as good as anybody else's, but I have a hunch that we'll start to use NLP on a much grander scale.
Take the collection of intelligence as it exists out in the wild, whether that's the surface web or the deep and dark web: we've employed machine learning for a very long time to scrape intelligence, to scrape IPs, to scrape domains.
I think we'll start to use natural language processing in much the same way: say, given a forum or a blog, I want to always be automatically scraping that, but using NLP instead of other machine learning models.
JENNIFER: Great.
Our next question is, it seems like some of these methods are likely to generate false positives, is that the case?
JOE GEHRKE: Yeah, I think with any method or any solution you look at, there's always this notion that you can produce more false positives alongside better detection.
You always have to weigh that.
And I think when we talk about machine learning, in particular, a false positive in my view is something that's been flagged as malicious that's not.
And so we have to be careful to distinguish in machine learning between what's anomalous and what's malicious.
So the math is almost always accurate.
It flags something that is unusual because the math says it's unusual.
And it's a matter of taking it to the next step and validating that unusual activity to determine if it's malicious.
And without that kind of tuning exercise in there, you could see an increase in false positives.
JENNIFER: Thank you.
And finally, we have: you mentioned IOCs are as important as ever, but only if they are quickly operationalized.
Can you explain?
JOE GEHRKE: Yeah, sure.
So this is another important one, because if you think about EDR and what's going on in a very dynamic environment, those tools are all producing telemetry.
And as the person who's not patient zero, I want to consume that and block the known bad.
I'm doing myself a disservice if I don't block what somebody tells me is bad.
And that's where we start to get into threat intel platforms.
Threat intel platforms are generally very, very good at aggregating large volumes of threat intelligence feeds, and then taking those and integrating them into endpoints, into web proxies, into firewalls.
So to get that benefit of the telemetry, you really need to make sure that you have something in place that takes that telemetry and gets it deployed out to your defensive infrastructure as quickly as you possibly can.
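As a closing sketch of that operationalization loop, here is what pulling indicators from a threat intel platform and pushing them to an enforcement point might look like; both URLs, the JSON fields, and the endpoints are entirely hypothetical, since every TIP and firewall exposes its own API.

    import requests

    # Hypothetical endpoints; real TIPs and firewalls each have their own APIs.
    TIP_URL = "https://tip.example.internal/api/indicators?status=active"
    FIREWALL_URL = "https://fw.example.internal/api/blocklist"

    def sync_indicators():
        indicators = requests.get(TIP_URL, timeout=30).json()
        bad_ips = [i["value"] for i in indicators if i["type"] == "ip"]
        # The faster this runs after patient zero reports, the shorter the
        # window of exposure for everyone else.
        requests.post(FIREWALL_URL, json={"ips": bad_ips}, timeout=30)

    sync_indicators()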