A Model Driven Approach of Contextualizing Threat Intelligence: Detect ‘19 Series

After you have watched this Webinar, please feel free to contact us with any questions you may have at general@anomali.com.
Transcript
GINO (VOICEOVER): My name is Gino Rombley.
I'm the Solutions Architect at Anomali, and today I'll be presenting a model-driven approach to contextualizing threat intelligence.
So by a show of hands, how many people here are customers of Anomali?
Oh, OK.
That's going to make it a lot easier.
Who are not?
Do you have a threat intelligence platform?
Yeah?
OK, because what I'm about to say applies to pretty much all of them, so it's not Anomali specific.
So a little bit of background about me: I have a master's in computer science and engineering, and I specialized in software engineering and technology at the University of Eindhoven in the Netherlands.
I was originally born in St. Martin, and I moved to the Netherlands when I was about 18 to pursue my master's.
In my past life, I worked as an identity and access management consultant at CA Technologies for roughly five years, in several roles, moving from professional services into technical sales.
And roughly two years ago, someone gave me a call, told me about this threat intelligence space, and said it's the future, it's something exciting to do.
So I made a switch.
I was a threat intelligence solutions consultant at EclecticIQ, which is one of our competitors, based in the Netherlands.
And after working there for a year, I decided to make a switch to Anomali as a solutions architect.
So I meet a lot of our customers and give them advice on how to leverage our platform, but more importantly, I consider myself a threat analyst by night.
So when I have time, I do my research on certain threats and pretty much [INAUDIBLE] reporting around that.
So you might be asking, why contextualize threat intelligence using models, right?
So I've met several of our customers, and what I noticed is that a lot of them consume open source feeds or social media intelligence, but none of them make the effort to really provide context around it.
So for some reason they expect, whether it's a commercial feed, which does provide some context, or an open source feed, that the platform should give you that context, which it does, but not all the time.
So I try to explain to our customers that you need to put a little bit of effort in to have context that's relevant to your organization, because if I produce finished intelligence, I'm making it for a wider audience, but you need to contextualize it so that it's relevant for your organization.
So the overall challenge is threat data.
So in this example, you have a list of observables, right?
They come into the platform, they're scored, they're aggregated and deduplicated, and then your objective as an analyst is to move towards finished intelligence, right?
That's the standard procedure that everyone learns taking a SANS course or any other threat intelligence course: you've got data, and you've got to turn it into finished intel, which is predominantly a report.
Now, if you have multiple feeds, all right, multiple data sources, it starts to get a little bit challenging, right?
Because you're going to make multiple reports, right?
You're going to have a finished report from all that threat data, and what you end up with is just a report that lists all the IOCs associated with it.
So what I am advocating and telling my customers is that we need to take an intermediate step, and if you take that intermediate step, it will benefit you in the long run.
And I'm going to explain further down what these benefits are.
So if I look at the market today, with a threat intelligence platform, when you start using threat intelligence, you're pretty much already aggregating data from several sources.
So imagine five years from now how huge your hub of intelligence is going to be.
So there's going to be a point, and I'm taking this from my experience working in the identity and access management space, where you're going to get threat intelligence big data, right?
So you're going to have a hub of threat intelligence that's relevant to your organization based on the requirements that you set out in the beginning.
You're going to get analytics.
So at this point, our feed providers provide their own analytics on certain threats they're seeing out there.
But on your platform, with that huge amount of data that you have, you can apply analytics.
And furthermore, it becomes an intelligence hub.
So there will come a point within your organization where anyone who has questions about something that's going on in the world will turn to your platform as that intelligence hub, which can feed several people in the organization or several machines within that organization.
So what I was thinking, how do I get there?
How can I provide meaningful information-- contextualized information-- by using models?
So the idea that I had in my mind was a model graph-based approach.
So in other words, the threat data that comes in-- if I can link it up to several entities, threat actors, or TTPs, then in the long run, as it becomes larger and larger, I can apply certain analytics, even predictive analytics, based on the baseline I create within my platform.
So I started doing some research.
And I said first, should I define my own model?
And I was like, it's going to be difficult-- probably much quicker just to do a PhD in this, because it's going to require me to do a lot of research, and I would still have to prove whether it's a sufficient model for someone to use.
So what I did-- I did some research on the industry to see what threat models are out there.
So we're all familiar with the Kill Chain-- Lockheed Martin Kill Chain, right?
STIX-- and so I've worked with STIX multiple times in my career.
And finally, what we have is the Diamond Model.
So I evaluated each of these models.
So the Kill Chain, as you all know-- it pretty much just explains the steps that a threat actor will take.
So reconnaissance, moving straight on to weaponization, delivery, exploitation, installation-- and I thought to myself, if I wanted to project this onto a graph, I would be quite limited, because it would just be one straight line.
And it did not have the entities that I wanted to link.
So the adversary would probably be at the top, and these would be all the things that are linked to this adversary.
So when I wanted to draw it back, it was a little bit linear.
So if I had to project it onto a graph, it would be one straight line.
So I decided to look at another model, which is the STIX 2.0 architecture-- I'm not really sure if the design has been completed, but these are the relationships provided in 2.0.
So on the top right, you have adversaries.
And those can be linked, because an adversary uses TTPs, which cover things like the tools and the malware being used, courses of action.
So I've worked with STIX before.
And one of the limitations I've noticed with STIX is that I could not link every entity to every other entity.
So if I had an indicator-- and this was in version 1.1, 1.2-- I could not link an indicator directly to an adversary.
It needed to be linked to a TTP, which was in turn linked to an adversary.
So it-- yeah, it's OK.
But I figured, if I needed to do this at a larger scale, it was going to be quite difficult.
So yeah, one of the drawbacks is that not all nodes are connected.
So if I wanted to link a vulnerability to an indicator, that would not be possible.
So I looked at the Diamond Model.
Anyone here-- I assume everyone here is familiar with the Diamond Model.
So looking at the Diamond Model, I've seen some experienced threat analysts believe in this model.
I find it the easiest one to use.
So in terms of contextualizing the information-- which infrastructure is the adversary using, the capability, the victim.
It seemed to me, in my opinion, that everything is connected to almost everything else, except the adversary directly to the victim, which is not really a big issue for me.
Because in most cases, if I'm working in an organization, the victim will most likely be either my organization or someone within the organization.
And having that direct adversary-to-victim link is not really a big issue, because I'm mostly interested in the infrastructure and the capability that the adversary is using.
So yeah, any drawbacks-- I couldn't think of any.
I just wanted to make sure the entities were linked amongst each other.
So what I did-- I took an arbitrary platform and I applied the Diamond Model.
So here you see, for example, I had an adversary called Ocean Buffalo.
And I started to project it and establish relationships: the capability that Ocean Buffalo is using.
The victim that it's targeting is financial services.
What observable infrastructure is being used?
In this case, it's office.window-update.com.
And an attribution back to that threat actor.
So pretty much established a relationship.
So with this, what I just explained-- something I probably said in just four sentences-- is the kind of context we're seeing here.
So how many people use the Diamond Model on a daily basis to contextualize the information?
Yeah.
So this is what a lot of people find difficult, because it's easy, but they don't really understand when to apply it.
So my use case was just simple.
Every time I'm doing an investigation, I'm going to model it like this.
So this is quite simple.
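To make that concrete, here is a minimal sketch of how that simple diamond could be captured as a graph in code. It uses the networkx Python library, and the node kinds and relationship labels are my own illustration, not any particular platform's schema.

    # A minimal sketch of the Ocean Buffalo diamond as a graph (illustrative only;
    # node kinds and edge labels are not a specific platform's schema).
    import networkx as nx

    g = nx.MultiDiGraph()

    # The four Diamond Model vertices from the example
    g.add_node("Ocean Buffalo", kind="adversary")
    g.add_node("spear-phishing", kind="capability")
    g.add_node("office.window-update.com", kind="infrastructure")
    g.add_node("financial services", kind="victim")

    # The relationships described on the slide
    g.add_edge("Ocean Buffalo", "spear-phishing", rel="uses")
    g.add_edge("spear-phishing", "office.window-update.com", rel="leverages")
    g.add_edge("spear-phishing", "financial services", rel="targets")
    g.add_edge("office.window-update.com", "Ocean Buffalo", rel="attributed-to")

    print(g.edges(data=True))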
But of course, it gets a little bit difficult, because if I have multiple observables-- and in this example, I have an IOC here, for example-- the office.window-update.com that I have.
What I did is I started to enrich it, and I know enrichment will create a chain for me, so you keep on doing enrichment passes-- DNS and so on-- until you get what could be, let's say, infinite relationships or associations.
And then on top of that, what I did is I focused on the threat actor-- in this case, Ocean Buffalo-- and looked at the other techniques that are being used.
So I mapped everything to MITRE ATT&CK.
And the reason why it's MITRE ATT&CK is because it's easier to share this information so that people can understand what techniques are being used.
So I'm not going to go and reinvent the wheel.
Just used some pretty good techniques-- some models out there.
So as you can see, you get another relationship here.
This is pretty much assigning another association in the [INAUDIBLE].
That's the IOC.
The only thing I can't conclude from here is whether or not this has been attributed to Ocean Buffalo.
So-- but if I continue this, it starts to get very big.
And this is where it gets interesting.
So this is pretty much a subset.
But just imagine if this started to get much larger, right?
So there's established relationships between almost all the entities-- the observables, the TTPs, the threat actors, the file hashes.
And with this approach, I can then define several other metrics.
So here, for example, with these relationships, I started to think, OK, this seems familiar-- something that I did when I was back at university.
So with relationships amongst several entities, I can apply, let's say, graph-based algorithms.
So analytics, in this example, can be applied using algorithms on the existing dataset that I've created.
So let me just give you a simple example here.
So what I did-- I went back to my university books, searched for all my graph-based algorithms, and saw which ones were most applicable.
And I pretty much could find most of them on the internet.
So this is just a simple example of detecting graph cycles.
So, pretty much, find all cycles-- graph cycles-- from a particular vertex.
So if I translate this query, I can take an arbitrary vertex and say, find me all the indicators that are probably associated with this TTP.
Or, concretely said, give me all indicators that are associated with the adversary with a path length greater than one.
So these algorithms I can find online.
And what I can do here now is that I have Ocean Buffalo, and there is no direct relationship with this-- well, this IOC.
So in some platforms-- and this is something you'll see as you play with several platforms, even our platform-- the adversary will not always be associated directly with an observable.
So there will be indirect relationships in how it's all established.
So in this example here, the first cycle that I have-- and all of these are straightforward-- is that Ocean Buffalo is linked to the spear-phishing attack, which is targeting financial services.
The observable here is being used [INAUDIBLE] financial services.
And in turn, it's linked to others.
So if I take this vertex-- in this case, the threat actor-- and execute my graph algorithm, it'll spit out office.window-update.com and this user observable.
Now, a challenge also here, for example, is that these are also somehow indirectly related to that threat actor.
But I don't have sufficient evidence to pretty much say it's [INAUDIBLE] that threat actor.
So these are easy queries that I can execute on my platform to spit out that data.
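As an illustration of what such a query could look like in code, here is a sketch against the networkx graph from the earlier example; the function name and the hop threshold are my own choices, not the exact algorithm from the slide.

    # Sketch of the "indirect association" query: starting from the adversary vertex,
    # return observables reachable over a path of more than one hop. Assumes the
    # networkx graph g built in the earlier sketch; names are illustrative.
    import networkx as nx

    def indirectly_associated(graph, adversary, kind="infrastructure", min_hops=2):
        hits = set()
        for node, attrs in graph.nodes(data=True):
            if attrs.get("kind") != kind or node == adversary:
                continue
            # follow directed edges outwards from the adversary
            if nx.has_path(graph, adversary, node) and \
               nx.shortest_path_length(graph, adversary, node) >= min_hops:
                hits.add(node)
        return hits

    print(indirectly_associated(g, "Ocean Buffalo"))
    # e.g. {'office.window-update.com'} via the spear-phishing TTP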
Any questions?
No.
So another query that I can use-- another algorithm-- is to find two vertices that are connected.
So if I were to translate this query, I would say, find two threat actors that are linked via capability or infrastructure or victim.
So I'll give you a concrete example.
I have a third-party risk.
Someone targeted one of my third-party vendors.
I got some data on them.
And I'll probably say, show me all the threat actors that have been targeting that victim.
So here, for example, if I look at those two threat actors and the way they're already connected-- so Sam [? Rizzo, ?] for example, is from research I did.
This person has been doing some spear-phishing attacks-- leveraging these observables.
And I made sure that he's associated to the spear-phishing TTP within the platform.
And then I have my threat actor, Ocean Buffalo.
And if I execute that query, then these two threat actors will come out as the result.
So I would take the spear phishing and say, show me which threat actors are leveraging this spear-phishing TTP, and it will show these two as well.
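A minimal sketch of that second query, again using the networkx graph from earlier; the second actor node is just a placeholder name, since the name in the talk is marked uncertain in the transcript.

    # Sketch of the "find two connected threat actors" query: two adversaries are
    # related if they share a capability, infrastructure, or victim vertex.
    # "Second Actor" is a placeholder; the graph g comes from the earlier sketches.
    import itertools

    g.add_node("Second Actor", kind="adversary")
    g.add_edge("Second Actor", "spear-phishing", rel="uses")

    def actors_sharing_a_vertex(graph):
        adversaries = [n for n, a in graph.nodes(data=True) if a.get("kind") == "adversary"]
        undirected = graph.to_undirected()
        pairs = []
        for a, b in itertools.combinations(adversaries, 2):
            shared = set(undirected.neighbors(a)) & set(undirected.neighbors(b))
            if shared:
                pairs.append((a, b, shared))
        return pairs

    print(actors_sharing_a_vertex(g))
    # e.g. [('Ocean Buffalo', 'Second Actor', {'spear-phishing'})]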
So going back one step, these are two algorithms that I could use.
You can find them online easily-- in Python, for example.
But the overall objective is pretty much to get to the data quicker through these established relationships.
I can get a lot of IOCs or the information I'm looking for.
So if it's IOCs that are commonly being used in this spear phishing, or the actors, or any one of these, I can [INAUDIBLE] do that because they are somehow interrelated with each other.
So another interesting query is connectivity.
So how I would describe this is-- so when it comes to efficiency-- and I don't know if anyone has ever experienced this.
But there are times when two-- so I'm talking about maybe a large organization.
If it's a small team-- let's say a small CTI team-- the communication is quite direct, so you won't have this problem.
But if it's a large team that's dispersed all around the world, and they're using the same platform, there might be a moment when they are both researching the same IOC or the same intelligence that they are getting from several sources.
So with this algorithm, I'm pretty much saying, with connectivity-- find all the islands in the graph, which is equivalent to saying, show me all the graph link analyses-- the investigations-- that do not overlap.
So it can be an analyst's workbench.
Someone's working on a graph.
Show me whether my team is working on the same topic, the same IOC, the same threat actor, or not.
So if I take this, for example-- these are islands, and there is no connectivity amongst them.
So this is one investigation.
This is another investigation here.
This is a third investigation.
So there are no links between them, which is very interesting.
Because, one, there's no overlap of work.
So people are not doing redundant work, which is good.
And at the same time, it shows that if there were connectivity, in most cases you can probably-- and it's a really long stretch-- but probably attribute that these indicators were provided by, let's say, a commercial feed.
Because most of the intelligence that you see coming from several sources is tied up with each other.
So it's a way of, let's say-- if I go back to my feeds, for example-- if there were too many interconnections amongst each other, then that would show that the feeds themselves are overlapping.
So it's pretty much a way of measuring whether you have overlapping feeds.
Which is good, because it takes a while for some organizations to see that if I'm paying for commercial feeds from two providers, and they're giving me 80% of the same data, why should I pay so much money for both feed providers?
So there's a lot of stuff I can get out of this by applying these types of queries.
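For the connectivity check, a sketch like the following would find those islands; it treats each connected component of the graph as a separate, non-overlapping investigation. Again it assumes the networkx graph built in the earlier sketches.

    # Sketch of the "find the islands" query: each connected component is an
    # investigation that does not overlap with the others. Assumes the graph g.
    import networkx as nx

    islands = list(nx.connected_components(g.to_undirected()))
    print(f"{len(islands)} separate investigations")
    for number, island in enumerate(islands, start=1):
        print(f"investigation {number}: {sorted(island)}")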
So any questions at the moment?
Anything that's unclear?
No.
So what is the next step?
So what I've been working on-- and this doesn't really apply only to Anomali ThreatStream.
It applies to any threat intelligence platform.
I've worked with about three of them that are out there in the market, and even some that are open source.
It's to develop a prototype.
So going back to this, these relationships are saved in a graph database.
So I can use [INAUDIBLE] or any other graph database, find the relationships amongst each of the vertices, and I can start executing my Python script.
So my Python query can run, and it'll pretty much start spitting out all the relationship information for the queries that I have.
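The talk does not name a specific graph database, so purely as an illustration, here is roughly what the indirect-association query could look like in Cypher through the official Neo4j Python driver; the connection details, labels, and relationship depth are hypothetical.

    # Illustrative only: the database, URI, credentials, and label names are
    # hypothetical. Roughly how the "observables indirectly tied to an adversary"
    # query could be expressed in Cypher with the Neo4j Python driver.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    query = """
    MATCH (a:Adversary {name: $actor})-[*2..4]-(o:Observable)
    RETURN DISTINCT o.value AS observable
    """

    with driver.session() as session:
        for record in session.run(query, actor="Ocean Buffalo"):
            print(record["observable"])

    driver.close()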
Coming close to my conclusion-- or coming to my conclusion.
So one thing I would like to recommend-- and going back to the first slide-- is that there are a lot of organizations that take this approach.
You get threat data and you go straight to just producing finished intel.
That report will show you all the IOCs that are linked to it.
If you take an intermediate step towards using a graph, if you have a link analysis tool, you can then establish relationships much better-- more context around it.
The challenge here is that analytics on something like this is difficult.
There are a lot of words that you need to interpret, and you have to look at the whole context around it.
Whereas with these graph examples that I showed here, it's much easier to establish relationships quickly.
So I'm saying to anyone-- try not to use your threat intelligence platform as just a hub or just as an aggregator of threat data.
Try to contextualize it in terms of establishing relationships between all the threat actors.
More importantly, try to automate the process.
So the majority of platforms out there have rules-- a rule filter or rule functionality-- that say if you get new threat data that has a tag that says APT284, or is using terms like [INAUDIBLE], you can assign it directly to a threat model or an entity in the threat model.
So there are a lot of platforms out there that already automate this process.
And you can continue doing that automated process of assigning it-- for example, anything that's linked to phishing campaigns, link it to the spear-phishing TTP.
So try to automate that process.
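A sketch of the kind of rule described above might look like this; the rule table and the associate() call are placeholders for whatever rule or API functionality your platform exposes.

    # Illustrative sketch of tag-based automation: map incoming tags or terms to
    # threat model entities. The rule table and associate() are placeholders, not
    # a specific platform's rule API.
    RULES = {
        "APT284": "Ocean Buffalo",
        "phishing": "spear-phishing TTP",
    }

    def associate(observable, entity):
        # Placeholder: in practice this would call the platform's API or write an
        # edge into the graph database.
        print(f"linking {observable} -> {entity}")

    def apply_rules(observable, tags):
        for tag in tags:
            for keyword, entity in RULES.items():
                if keyword.lower() in tag.lower():
                    associate(observable, entity)

    apply_rules("office.window-update.com", ["APT284", "finance-phishing-campaign"])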
In terms of threat intelligence that could be used for future analytics, as I mentioned-- looking down the line, there will come a point where the business will ask you for risk metrics, return on investment-- can you tell me more about this?
So your platform is going to need training, or there will come a point where you want to apply analytics to it.
One of the easy ways to apply that is using established relationships in a graph database.
So this is pretty much what I wanted to say-- a simple approach.
And I'm hoping that in the near future, I'm going to work this development out.
I've already got the data recorded in the database and an understanding of the relationships.
The algorithms I already have-- those are easy to find.
And it's pretty much a matter of applying the algorithms and getting the scores out of that.
Thank you very much for your time.