T-Talk: Tactics to Transform Thinking Through Tabulating Types and Tags of Threat Indicators: Detect '18 Presentation Series | Anomali


Tactics to Transform Thinking by Tabulating Types and Tags of Indicators: Detect ‘18 Presentation Series

After you have watched this Webinar, please feel free to contact us with any questions you may have at general@anomali.com.


Great, so I am Evan Wright.

I am the Principal Data Scientist at Anomali.

If you saw the title for this talk, it was very focused on tactics to transform types and tags.

So we're super interested in threats in this talk, and we're also very interested in large data analysis of the threats, and what can we conclude from it.

So that's the consistent theme of the talk.

And again, I'm Evan Wright from Anomali, with two PowerPoints, two lightsabers, like so.

So, I'm generally an optimist, which means I feel like I don't belong at security conferences, because everybody's so not optimistic usually.

It's like, man, we're so getting owned again this year, every year.

And so I tend to be an optimist.

So I have to tuck that away, otherwise I don't fit in too much at a security conference sometimes.

So this is my cynic's view of what we're talking about today in security.

So we're looking at something we don't really understand, the internet, which is let's say, too big to understand, and something we can't define, which is pretty much every aspect of security.

We have massive definition problems in security, so much so that when I see tweets online, some of the most liked tweets that are useful related to security are straightforward simple definitions.

Because everybody was like, this was never clear to me before, and you actually explained it straightforwardly.

So we're going to be looking at a bunch of quantitative measures of different types of threats to help better define what these threats are, especially in regard to how they sometimes blend.

And of course, to complicate the issue a little bit more-- and when I say terminology with no agreement, I'm talking about how indicators often have multiple tags associated with them.


What exactly is this doing?

That is, the threat intel that is reported isn't necessarily super clear on what it's doing.

If you've got a nice write-up, sometimes that's clear.

But you don't always have that.

So, if that wasn't hard enough, let's explain it with something intimidating.

So for about the last third of this talk, I will be talking about some machine learning output, because it helps us understand the data better.

I'm not a type of person who uses machine learning for the sake of machine learning.

The first 2/3 of it will just be straightforward raw data analysis and visualization.

And the last third of it will be visualizing some of the ML data.

All right, so at a high level, these are the four points we're talking about.

Why understanding itypes, or threats if you want to think of it that way, is important-- why understanding itypes is related to threats; so this is just kind of a definition and connection introduction.

Our first big question is, how much do we confuse existing threats in a short period of time.

So this is getting at where we in the industry kind of lack agreement in a small time slice.

Then we'll expand the time horizon and start looking at where we don't have agreement in longer periods of time, which may include things like repurposing of indicators, changing in TTPs, these types of things.

And then our fourth point, which is where I'm going to go into a little bit of the visualization of some of the ML data, is how well our enrichment data helps us identify the threats specifically, which is pretty much the workflow that we're doing.

And we're doing a lot of this in incident response.

Itypes are what bring us together today.

And so definitions are very important.

I've talked about that a little bit.

There's your freebie comic without a break.

So when I talk about an itype, it's basically the combination of a threat like spam, APT, malware, et cetera, with an observable type; so is it an IP, a domain, a file hash.

So if I say the phrase "itype" that's a little bit of an Anomali specific reference for indicator type.

And this is really what we're talking about, the combination of threat and an observable type.
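As a rough illustration of that definition, an itype can be modelled as a threat label paired with an observable type. The `<threat>_<observable>` naming and the helper names below are assumptions made for this sketch, not a description of how the platform stores itypes.

```python
from typing import NamedTuple


class IType(NamedTuple):
    """An indicator type: a threat label paired with an observable type."""
    threat: str       # e.g. "mal", "apt", "spam"
    observable: str   # e.g. "ip", "domain", "hash"

    def name(self) -> str:
        # Assumed naming convention for this sketch: "<threat>_<observable>".
        return f"{self.threat}_{self.observable}"


def parse_itype(label: str) -> IType:
    """Split a label such as 'spam_ip' on its last underscore."""
    threat, _, observable = label.rpartition("_")
    if not threat or not observable:
        raise ValueError(f"not a <threat>_<observable> label: {label!r}")
    return IType(threat, observable)
```

Splitting on the last underscore keeps multi-word threat labels intact, so a hypothetical `brute_force_ip` would parse as threat `brute_force` and observable `ip`.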

These are some examples of observable types.

A lot of them, especially that we tend to talk about more, are the ones that are clearly associated with maliciousness-- actors, bots.

There are some PUP-like categories such as adware, and there's brute force.

And then there are other categories, which in some ways are more sort of TTP indications.


It's like did you register this domain with some suspicious mechanism?

Maybe a privacy registration, maybe it was purchased with Bitcoin.

There are itypes that include that as well, and things like cryptocurrency mining software for indicator types as well.

So that's kind of the third type of indicator type.

Most of what we're going to be focusing on is the more malicious indicator types.

OK, so motivation, why do we care about indicator types?

So different actions that you take would be based on the type of indicators.

And so if we're confusing them, you may take the wrong action until you get back around to, OK, wait, this is classified wrong, to take the appropriate action.

Sometimes our TTPs, like domains registered with Bitcoin, sometimes these IOCs overlap with malicious APTs.


In that case, that's something we're interested in as well.

This is another reason why we're interested in understanding things like itype overlaps.

And finally, because of definitions, like I mentioned earlier, if two itypes are regularly interchangeable, then that fundamentally changes what the definition of them is.

It's sort of a blending of the two indicator types.

So I just had to include a nice slide that included the three different perspectives in the internet.

This is the internet according to South Park, the internet according to The IT Crowd, and then the internet according to serious people; serious people being my least favorite.

So a lot of these indicator types do map into mechanisms of demonstrating where in the kill chain it is happening.

There are some exceptions to those.

APT is not necessarily describing a specific mode of the kill chain and adware is not exactly describing a specific mode of the kill chain.

But generally speaking, these are indicator types we're going to focus on a little bit more than others.

So the exploit stage, if you're interested in that stage of the kill chain, Paul Sheck has a good talk at the end of the day, where he's focusing right on exploit kits.

All right, so this is our first comic break.

OK, so let's get into how inconsistent are these indicator types.

So the first thing we're going to do is we're going to look at a narrow time window.

So we're going to look at the past six to seven months.

And we're looking at data in Anomali's platform, according to our taxonomy, which is how we organize these itypes.

And one thing that's really important to note is that some of these indicator types have really high volume.

Some of these have lower volume.

So we normalize for all of that.

Specifically, if you're familiar, we use a Jaccard index for this normalization, so that we can compare apples to apples.

So basically what you're seeing is a percent overlap, what you would just expect to see when you're looking for a percent overlap.
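For readers who have not met it before, the Jaccard index of two sets is the size of their intersection divided by the size of their union. The following is a minimal sketch; the indicator sets are made up for illustration and the handling of two empty sets is a convention chosen here.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard index: size of the intersection over size of the union."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


# Hypothetical indicator sets, made up for illustration.
suspicious = {"a.example", "b.example", "c.example", "d.example"}
malware = {"d.example", "e.example"}

overlap = jaccard(suspicious, malware)  # 1 shared indicator / 5 distinct
```

Because the denominator is the union, a very large itype and a very small one are compared on the same 0 to 1 scale, which is the normalization being described.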

So here's our first graphic.

So initial observations, we're looking at domains, the first half of the year.

The overlap, and so, let me explain a little bit about these charts.

Since we're comparing overlap between two groups, this is what we call in math perfectly symmetric across the diagonal.

So basically if you draw a line from the top left to lower right, it perfectly mirrors itself.

So don't feel too confused looking at that.

But otherwise it'd just be wasted screen space.
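The symmetry follows from the overlap measure itself: the overlap of A with B is the same number as the overlap of B with A. Here is a small sketch of building such a grid; the itype names and indicator sets are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Intersection size over union size (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_matrix(sets_by_itype):
    """Pairwise Jaccard overlap, stored for both (row, col) and (col, row)."""
    matrix = {}
    for left, right in combinations(sorted(sets_by_itype), 2):
        score = jaccard(sets_by_itype[left], sets_by_itype[right])
        matrix[(left, right)] = score
        matrix[(right, left)] = score  # mirror across the diagonal
    return matrix


# Hypothetical data for illustration only.
data = {
    "c2_domain": {"a.example", "b.example"},
    "mal_domain": {"b.example", "c.example"},
    "phish_domain": {"c.example", "d.example", "e.example"},
}
matrix = overlap_matrix(data)
```

Only the pairs above the diagonal are actually computed; the lower half is a copy, which is why the chart mirrors itself.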

So maybe you can see it a little bit better.

So this particular scale is from basically 0% to 4% overlap.

So generally the scale obviously matters a whole lot.

But I wanted to point out common areas of mistakes, or where in the short time period we see more confusion, where it's unlikely that the IPs and the domains-- in this case domains-- are being repurposed.

It's unlikely, because for example, Whois registrations last a year.

So there's not going to be a new owner doing a completely new technique.

So it's a reasonable time period to just be looking at, in isolation, figuring, OK, it's probably generally the same owner owning that particular indicator.

So between suspicious and malicious, there's reasonably high overlap, relatively speaking, so about 4%.

That isn't super high, which really tells me that as an industry, our mapping taxonomy is pretty reasonable with domains, right?

That's the highest, and there aren't too many indicator types that have a lot of overlap.

Interestingly enough, C2 and phishing are-- sorry, this should be phishing and malicious that are overlapped.

So, sorry.

So we've got these malware types and we've got these suspicious types, so mal and suspicious.

And they're pretty vague terms.

So part of what I'm going to be talking about, since we have these indicators in the platform, is based on the data; what do we mean by malware and suspicious.

We have official definitions, like I showed a few slides ago.

You can see the list of all of our itypes entirely in the ThreatStream platform.

And so there's definitions for suspicious and malicious there.

But what does the data say about these specifically?

So C2 domains and phishing domains are some of the biggest influencers of suspicious and malicious in our platform regarding domains.

IPs are a good bit more of a mess.

So you probably intuitively know some of this, because IPs as indicators are less useful for a number of reasons.

Last year I gave a talk, which was much more about the mathematics of why IPs are more frustrating than domains, why there's less agreement about IPs than domains.

Now we're just observing the data, and we see pretty clearly that despite the fact that any two itypes in isolation don't seem to have particularly high overlap, when you start talking about something like the exploit itype, it really adds up.

So when you start talking about how often we confuse exploit IPs in general, this starts adding up a good bit more.

Because if you start summing across either the rows or columns, it tends to be much bigger with IPs than domains.

But it's not necessarily targeted at too much of any one type of confusion between IPs.

So I have always had this fascination of scanning IPs.

I feel like it's one of the-- I don't want to be too dramatic, but one of the big questions in threat intel is are they useful.


Because we talk about getting these scans.

And, well, it's a low-severity type of thing; the fact that they're scanning isn't a particularly big threat.

One of the insights we get here is might scanning IPs be a precursor to other types of malicious behavior.

And so we see as we come down the column, they're pretty commonly overlapping-- well, I mean so 3% of them.

So it's not super common for malware specifically to be doing the scanning.

But when you look at all these other malicious types combined, so 2 and 1/2 plus 1 and 1/2, these things really add up.

So it's something like 10% of all scanning IPs that end up doing something that is, in fact, much more concerning.

So I think part of the takeaway here is that you don't want to completely ignore scanning IPs.

And there probably is value if you can look to constructing more complicated rules about using scanning IPs, along with perhaps some other parameters when you're setting up your filters.
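One way to picture such a combined rule is sketched below. The field names (`itypes`, `dest_port`), the itype labels, and the port list are all hypothetical and would need to be adapted to whatever your own tooling exposes.

```python
def should_escalate_scanner(indicator: dict) -> bool:
    """Escalate a scanning IP only when other context raises the concern."""
    itypes = set(indicator.get("itypes", []))
    if "scan_ip" not in itypes:
        return False  # this rule only concerns scanning IPs
    # Hypothetical extra conditions; tune these to your own environment.
    also_malicious = bool(itypes & {"mal_ip", "bot_ip", "brute_ip"})
    hit_sensitive_port = indicator.get("dest_port") in {22, 3389}
    return also_malicious or hit_sensitive_port
```

A plain scanner that touched port 80 stays quiet under this rule, while a scanner that has also been reported as a bot gets surfaced. That keeps the volume down without discarding scanning IPs entirely.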

Because scanning IPs do have a value, but obviously there's a high volume of them.

So in the long term, I'm still not giving up on scanning IPs.

But I do appreciate the fact, when we're drowning in data, it doesn't help if we can't clearly action on the indicators.

So another interesting connection is between the bot and scanning types.

This intuitively makes a fair amount of sense.

That if you have a bot, they're probably going to be used to scan.

So that seems pretty intuitively validated in the data.

Again with our question of how do we understand what mal and suspicious types are, SSH scanners tend to be pretty influential in the suspicious category, which I thought was something that was kind of interesting.

But of course, the brute forcing is even more prevalent with the suspicious itypes.

So for suspicious we have different types of scanning particularly standing out.

So you could call brute force a type of scanning, actual scanning itypes, and then SSH scanning specifically.

All right, so here's another comic break.

All right, so next big question in front of us, which IOCs shift threats over time?

So what we get by looking at pretty much the same type of data on a much longer horizon, we're looking at repurposing of indicators for new purposes.

So one of the things that really motivated me to start this talk was when I see indicators in the ThreatStream platform, and we see that our intelligence has sometimes reported it as like a bot, a scanner, malware, right?

If you get intelligence that tells you that it's doing all sorts of these different things, how do you make sense of it?
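To make the problem concrete, here is a small sketch that groups reports by indicator and picks out the ones carrying more than one itype. The report format and the sample values are assumptions for illustration.

```python
from collections import defaultdict


def itypes_by_indicator(reports):
    """Group (indicator, itype) reports into a set of itypes per indicator."""
    grouped = defaultdict(set)
    for indicator, itype in reports:
        grouped[indicator].add(itype)
    return grouped


def conflicting(grouped):
    """Indicators that were reported under more than one itype."""
    return {ind: types for ind, types in grouped.items() if len(types) > 1}


# Hypothetical reports using documentation-only IP addresses.
reports = [
    ("203.0.113.7", "bot_ip"),
    ("203.0.113.7", "scan_ip"),
    ("203.0.113.7", "mal_ip"),
    ("198.51.100.9", "spam_ip"),
]
mixed = conflicting(itypes_by_indicator(reports))
```

Here `mixed` holds only the first address, with all three of its labels, which is exactly the kind of record that prompts the questions in this section.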

So that's what brought me to start asking a lot of these questions about what are the properties of all these different types.

So now we'll expand it, since the last section was much more focused on in a short time window how much do we confuse these threats.

Now we're broadening the time horizon quite a bit to basically all of our data.

So this is going back to 2014, so big data analysis, right?

It's big data analysis that's boiled down into really, really simple measurements.

So before we get into our cool colorful comparisons, I talked earlier about if you sum across the data.

So another way to put this is if you're looking at, with those grids we're looking at two-dimensional data.

If we just look at one-dimensional data, what does it look like?

So let's start with domains.

So in general, our mal domain indicator type is the most confused, over time, type of indicator.

Our second most is suspicious.

This is why I'm pointing out specifically some of these definition observations about what suspicious tends to correlate with, lots of suspicious scanning and so forth in previous slides.

And then we continue down the list.

Compromised, at 9%; APT at 7%; adware and so on.

So this is how much over time in all of our data, we end up confusing these threats.
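The talk does not spell out exactly how the one-dimensional numbers were calculated. One plausible reading, sketched here purely as an assumption, is the share of an itype's indicators that also show up under any other itype.

```python
def confused_fraction(target, sets_by_itype):
    """Share of `target` indicators that also appear under any other itype."""
    own = sets_by_itype[target]
    if not own:
        return 0.0
    others = set()
    for itype, members in sets_by_itype.items():
        if itype != target:
            others |= members
    return len(own & others) / len(own)


# Hypothetical data for illustration only.
data = {
    "mal_domain": {"a.example", "b.example", "c.example", "d.example"},
    "c2_domain": {"a.example"},
    "phish_domain": {"b.example", "x.example"},
}
share = confused_fraction("mal_domain", data)  # 2 of 4 also appear elsewhere
```

Under this reading, a longer collection period can only hold the number steady or push it up, which matches the remark that more data makes overlap more likely.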

Obviously as we look at more data, it's more likely the indicator types are going to overlap.

So now when we start looking at IPs over time, this is diving into a very big question about indicator aging, right?

So there are a lot fewer IPs than domains.

If we keep IPs as indicators for long periods of time, how many problems does that cause?

And what we see from these measurements, from the highest ones, our spam IPs over the past four years, not quite half, but over a third of them overlap with some other indicator type besides spam.

Spoiler alert, most of it's bots.

But nonetheless, this is really speaking to what can we expect out of IPs.

And it also points to the observation that we need to age IPs more than we age domains, because of this overlap.

Because if we keep going at this rate, in 10 years probably IPs are going to overlap with something else.

And that's going to get more and more confusing, when you're being told that this IP over time has been associated with these different threats.
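A simple way to express "age IPs more than domains" is a retention window per observable type. The windows below are placeholder values chosen for this sketch; the talk gives no specific numbers.

```python
from datetime import datetime, timedelta

# Hypothetical retention windows; the talk gives no specific values.
MAX_AGE = {"ip": timedelta(days=90), "domain": timedelta(days=365)}


def is_expired(observable_type, last_seen, now):
    """True when an indicator is older than the window for its observable type."""
    return now - last_seen > MAX_AGE[observable_type]
```

With these placeholder windows, an indicator last seen eight months ago would be expired as an IP but still active as a domain.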

In some ways that history may be of interest, sort of repurposing of techniques.

So that's something we're going to be thinking about a little bit moving forward that we want to preserve a little bit of that history, which is useful.

But we still need to come to a more definitive observation of what's happening now, and how we do that in the threat platform today is that everything's pretty much timed.

You can see the dates that they occur on.
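Since every observation carries a date, one possible way to keep the history while still answering "what is it now" is sketched below. The (date, itype) pair format is an assumption for illustration.

```python
from datetime import date


def current_and_history(observations):
    """Return (latest itype, full dated history) for one indicator.

    `observations` is a list of (date, itype) pairs in any order.
    """
    history = sorted(observations)  # oldest first
    if not history:
        return None, []
    return history[-1][1], history


# Hypothetical observations for a single IP.
observed = [
    (date(2016, 3, 1), "spam_ip"),
    (date(2018, 5, 1), "bot_ip"),
    (date(2017, 1, 1), "scan_ip"),
]
latest, history = current_and_history(observed)
```

The most recent label drives the action, and the sorted history is still there for anyone interested in how the indicator was repurposed.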

But as you go down this list, I mean, the numbers are still really staggering about how much overlap there is between all sorts of different threats.

Spam and bot was the single biggest association, and there's comparatively a lot less of an issue with malware and other itypes, and especially suspicious.

So this is actually pretty reasonable that we have IPs that were registered under suspicious premises, and we're not calling them-- So with IPs, when we see a suspicious IP, it tends to be clearer than when we see a suspicious domain, in general.

So this is 14%.

This is 11%.

And even though 14 is higher than 11, it's much lower in the list, right?

So this is again, we're looking at pretty much all of our data over history.

So now let's look at our two-dimensional assessment of this longer term view of data.

So this is again speaking a little bit more to shifts in tactics of how these domains are getting repurposed over time.

So again, not too surprising that malware and suspicious domains tend to overlap a bit, about 5 and 1/2% overlap between domains with suspicious registration information and ones that are associated with malware.

There's a little bit of a caveat with this long-term assessment.

So when we looked at the past six months, in general mal was a pretty good label.

It was much, much better than this longer term one.

In the past, we struggled with should we have a label of malicious or malware.

So if you go back about two or three years, that becomes a little bit more of an issue that there was a little bit more ambiguity.

Does mal mean malicious or does it mean malware?

But since we looked at our recent data set, that hasn't been a problem for a couple years.

So everybody likes talking about APT.

So I certainly should mention it.

It's one of the things that stands out in my mind, the connections of APT to both adware, which I found extremely surprising, and the connection to compromise, which I didn't find quite as surprising.

But a little over 3 and 1/2% APT overlap with compromise, and about 2% overlap with adware.

Again, what is an APT is a fundamental definition problem in our industry.

In the last part of my talk, I will talk about how we can use enrichments to measure these things, and we'll find that APTs are particularly confusing and poorly defined.

So I was expecting a little bit more overlap between phishing and spam.

And there tends to be very little.

So over time, we seem to be pretty good at separating the two techniques, and they don't seem to reuse the same infrastructure.

Again, we're focused on just domains in this graph.

When we're talking about definitions of what is a malicious domain in the long run, let's think about shifting TTPs.

So it makes sense that malware domains are commonly associated to C2 and are associated with phishing.

Because if we think about the kill chain, usually it's phish, install malware, and then use C2.

So all three of these are connected.

Yeah, question?

Well, you touched on something that was a question.

Have you thought about expanding your itype to another property of, including the kill chain?

Because to me, that would help.

Where do I see this observable in my security stack?

If I could reduce that with a kill chain property, along with malware or APT [INAUDIBLE], I think it would reduce what you're seeing there.


So you're thinking about-- would that require the indicators themselves to be associated to the kill chain, or are you just saying if the itype was associated with a phase of the kill chain?

The latter.


I think that is a good bit more straightforward to do.

There are some ambiguous challenges to that.

Like APT as an indicator doesn't really connect to the kill chain.

The kill chain doesn't capture that real well.

Well and then it goes back to your diamond model.

So the kill chain with the diamond model, you're focused on the infrastructure.

So if you could add, just what layer, 2, 3, 4, 5, 6; what layer of the kill chain.

If you saw that APT observable, it would help in the security stack detection.


Yeah, that's a good point.

We'll definitely think about that a bit.
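The suggestion from the audience, attaching a kill chain phase to the itype rather than to each indicator, could look something like the sketch below. The mapping is one possible assignment made up for illustration, using the Lockheed Martin kill chain phase names; it is not an official taxonomy.

```python
from typing import Optional

# Assumed mapping from threat label to a kill chain phase.
# This is one possible assignment for illustration, not an official taxonomy.
KILL_CHAIN_PHASE = {
    "scan": "reconnaissance",
    "phish": "delivery",
    "exploit": "exploitation",
    "mal": "installation",
    "c2": "command and control",
    "exfil": "actions on objectives",
}


def kill_chain_phase(threat: str) -> Optional[str]:
    """Phase for a threat label, or None when no single phase applies (e.g. 'apt')."""
    return KILL_CHAIN_PHASE.get(threat)
```

Labels such as APT or adware deliberately have no entry, which reflects the point made in the answer: some itypes do not sit at a single phase.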

So all right, so with the domain threat overlaps, when it comes down to definitions, just kind of some quick bullet points about what a suspicious domain is.

It's most commonly malware, C2, spam, phishing, and compromise.

That's essentially what makes up a suspicious domain, as the data speaks.

So there are-- so this is a little bit of a smaller graph.

Generally, especially so people in the back can see, I've omitted a number of columns.

But I mentioned earlier that we have other indicator types, which are somewhat associated to TTPs, like domains registered with free email, and VPN domains, suspicious Whois registrations, and just domains registered with disposable email.

So just kind of the takeaway is that a lot of these other little bit more TTP techniques don't really overlap.

So specifically exfil, disposable emails, free email, like a Gmail.com registered on Whois, and parked domains.

These have a tiny bit of overlap with the indicators for domains that we normally consider to be bad.

But it's not a very significant amount.

And there's absolutely no overlap between VPN domains and Whois privacy domains, which I thought was pretty interesting.

I don't know.

It's always been my impression that security practitioners think that domains that have privacy services blocking the Whois data are inherently worse, and that's not what we see in the data.

So Whois privacy only really overlaps with registrations of free email domains.

So there's no clear association between Whois privacy registrations and domains doing bad things, when we look at all the data in aggregate.

So IPs, wow, this is getting kind of fast and messy, right?

So a couple of things to point out, bot and spam are a huge overlap.

So just these two indicators themselves, indicators overlap between these two itypes.

So that was probably the single biggest takeaway to me.

So I mentioned scanning IPs earlier.

And these numbers are basically similar to when we were seeing scanning IPs eventually associated with maliciousness in our six-month window.

But when we look at the larger time horizon, it ends up just being much, much larger.

Scanning IPs are really associated with malware, I mean really as in 6% overlap, and about 5% overlap each with bot, brute force, and spam as well.

So you start adding these things up, and eventually an IP that's doing scanning is associated to maliciousness somehow, one way or another.

An awful lot of them are.

So again, when we're talking about IPs, yeah, don't squint.

I was tempted to make a pun about "don't panic" from Hitchhiker's Guide, but I didn't.

So don't squint at this.

This is really similar to the other data set, when we're looking at a bunch of these other not necessarily malicious indicator types of-- like it's a crypto IP.


It's used in bitcoin mining or something like that.

Is it associated to maliciousness?

Not really.

DDoS, not really.

And so this is pretty much my list of things that really don't have any good IP connection.

This is looking at all of the data for all time.

So Tor and I2P, so onion routing technologies, not really associated with traditionally malicious indicator types.

DDoS IPs, not really associated with malicious types; the same goes for parking, sinkhole, crypto, and IPs that are communicating with proxies, VPS services, and exfiltration.

All right.

So that is the end of that section, which leads us to our next comic break.

It's a really long file name.

OK, so the last section of this is asking the question about enrichment data.

So when we have our indicators like in ThreatStream we have all sorts of enrichment data.

So by enrichment data, we basically mean the things that you would want additional context around the indicator that aren't necessarily local network logs.

So this might be passive DNS, various sorts of blacklist lookups, this might be Whois registrations.

These types of obvious context you want to pull back at your first glance, things you might pivot on.

And all of these things we're sort of considering enrichment data.

So this is the data that helps you investigate your incidents.

Now we're asking the question, if we look at all of this data, all of the data that you would use to understand more about your situation that's not specific to your network, when you look at that data, how similar are the itypes?

OK, so let me explain a little bit about the technique we're using.

So if you graph data in one dimension, you basically have a line, right?

So imagine it's one feature, maybe passive DNS counts or something.

Graphing in two dimensions, this is super simple.

You're like, Evan, why are you explaining this?

This is so simple.

But we've got two dimensions of data that we're comparing.

Maybe it's passive DNS counts and the lifetime of a Whois registration, how long ago it was registered.

So that's a two dimensional visualization.

Three dimensional visualization gets really tricky, right?

These things are-- we start to avoid some of these, unless you're going to spin around it, or use a VR headset, or something like that.

Because we're adding a depth dimension.

So maybe we add something like 2rrr.

We've got some count of DNS queries.

Maybe we've got Whois age, and then maybe you've got count of block lists or something.

So three dimensions, really, really simple; really straightforward.

So the problem is when we're looking at all of these indicators, we're looking at a lot of dimensions of measurable things.

There's all sorts of ways we might incorporate some of our security knowledge to create new measurements from that.

And in our particular problem of trying to describe these different indicator types by so many different data dimensions, we end up with something like 20,000 dimensions.

Wow, but we're moving it into two dimensions.

So how do we do that?

So do we have any folks in the audience that would consider themselves machine learning practitioners?

OK, so I will avoid the detailed machine learning explanation then.

And so essentially, well wait, was there a couple?

A couple in the back?


So thank you for your participation.

So just give me 30 seconds to explain to them.

So for our method here, we're using an unsupervised technique, which uses a non-convex objective function, where we're mapping into a non-parametric space.

So specifically we're going from 20,000 dimensions to two dimensions.

To do that, we're using an algorithm called t-SNE, which is t-distributed stochastic neighbor embedding.

And so essentially what that means, it's making relationships between each of these neighbors, using some randomization.

And it's embedding 20,000 dimensions into two dimensions, using some math associated with the Student's t-distribution, so that we can easily graph it.

So that's this algorithm called t-SNE.
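The "t" in the name comes from the Student's t-distribution with one degree of freedom, which the algorithm uses to score how similar two points are in the low-dimensional picture. The sketch below computes only that one ingredient for a handful of made-up 2D points. It is not a full implementation, and it is not the code behind the graphs in this talk; libraries such as scikit-learn provide the complete algorithm.

```python
def student_t_similarities(points):
    """Pairwise similarities q_ij for 2-D points, as in t-SNE's embedding space.

    q_ij is proportional to 1 / (1 + squared distance), normalised so that
    all pairs with i != j sum to 1.
    """
    n = len(points)
    weights = {}
    for i in range(n):
        for j in range(n):
            if i != j:
                dist_sq = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                weights[(i, j)] = 1.0 / (1.0 + dist_sq)
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


# Two nearby points and one distant point (made-up coordinates).
q = student_t_similarities([(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)])
```

Nearby points get a high similarity and distant points a low but not vanishing one. The full algorithm moves the 2D points around until these similarities match, as closely as it can, the neighbor relationships measured in the original high-dimensional data.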

And we're going to kind of break apart this big fancy graph.

But the takeaway for this really complex graph in front of you is that if you measured all of these 20,000 dimensions of data, and you gave it to an analyst that could live 10,000 years, and that analyst took all the data and started saying, which of these are close together; he would group them together based on all of these 20,000 dimensions.

And you've told him to draw it in a 2D graph like this.

You would end up with something like this.

So when you look across all these different dimensions of data, similar indicators tend to group together.

So we did all that without any itype labeling, and then we went ahead and we applied the itype labeling.

So what you start to see is taking all of the complex features we could use, how well could you group these different itypes together to come up with conclusions?
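The "group first, label afterwards" step can be sketched as follows: given a neighborhood id per indicator (found without labels) and the itype applied later, count which itypes land in each neighborhood. The ids and labels here are made up, and the `purity` measure is an illustrative choice rather than something taken from the talk.

```python
from collections import Counter, defaultdict


def label_mix_by_cluster(cluster_ids, itypes):
    """Count itype labels inside each cluster found without using labels."""
    mix = defaultdict(Counter)
    for cluster, itype in zip(cluster_ids, itypes):
        mix[cluster][itype] += 1
    return mix


def purity(counter):
    """Fraction of a cluster taken up by its most common itype."""
    total = sum(counter.values())
    return max(counter.values()) / total if total else 0.0


# Hypothetical cluster assignments and the labels applied afterwards.
clusters = [0, 0, 0, 1, 1]
labels = ["spam_domain", "spam_domain", "phish_domain", "apt_domain", "mal_domain"]
mix = label_mix_by_cluster(clusters, labels)
```

A neighborhood dominated by one itype suggests the enrichment data separates that itype well; an even mix suggests the itypes behave alike, or that the labels are unreliable.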

And so you pretty much think about working on something like this, as far as what are reasonable neighborhoods, right?

So this green dot is benign.

So this is pretty much our clearly benign section.

This might be combining features like lots and lots of passive DNS traffic, for example.

Generally, this tends to be well-established sites.

Since we're looking at just domains in this section, this might be domains that have existed for 10 years that have lots and lots of queries going to them.

This would be the general behavior of this part of the graph.

So let's dive into some specifics, where I noticed some patterns between the indicator types.

Now again, in the previous sections, we were just looking at overlap of reporting.

Now what we're looking at is using our enrichment data to say how similar these things are behaving.

So to do this, one observation is that there are lots of groups of these exploit, mal, and phishing domains.

But we are seeing some clusters coming out.

So think of this as, how many techniques might it take to capture exploit, mal, and phishing domains.

This could be one recipe in your analytic notebook, would be this group.

Maybe this would be another recipe.

Maybe this would be a third recipe, and so on.

So it gives you some impression of how many detection techniques might you need to capture some of these indicator types.
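If each cluster is treated as one "recipe", a rough count of how many recipes are needed can be had by taking clusters largest first until a target share of the indicators is covered. The 80% default and the sample data are arbitrary choices for this sketch.

```python
from collections import Counter


def clusters_needed(cluster_ids, coverage=0.8):
    """Smallest number of clusters, largest first, covering `coverage` of the points."""
    sizes = sorted(Counter(cluster_ids).values(), reverse=True)
    total = sum(sizes)
    covered = 0
    for count, size in enumerate(sizes, start=1):
        covered += size
        if covered >= coverage * total:
            return count
    return 0  # no points given


# Hypothetical cluster ids for ten indicators of one itype.
ids = [0] * 5 + [1] * 3 + [2] + [3]
recipes = clusters_needed(ids)
```

In this made-up example two clusters cover 8 of the 10 indicators, while covering all of them takes four; the remainder sits in small one-off groups.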

And the second thing it points to is how similar are they.

So down here with this sort of particular technique, we see a lot of similarity between phishing and exploitation.

For up here, we see more similarity between phishing and malware.

Spam domains tend to stick out a little bit more, probably because they tend to be somewhat noisy on the large scale of things.

We find spam is really voluminous.

So some of our measurements that are better at capturing really, really high amounts of volume might be capturing some of these pieces.

Some of our techniques for identifying maybe sinkholes or others, might help with some of these other clusters.

So with spam we see these to be probably the most easily distinguishable itype of the ones that we looked at on this list.

And so pretty much two different recipes do a pretty good job of capturing spam, which I thought was kind of interesting.

APT, so I promised to talk a little bit about APT.

And this is really, really clear to me.

So what this says to me is that APT does not have common behaviors that we can detect.

This is like one of the first empirical demonstrations that we don't know what APT is really, at least regarding common detection techniques.

So part of the takeaway here is that there's no clear silver bullet.

And APT groups tend to all behave pretty dissimilarly insofar as what we can actually measure to detect them with.

I mean this is almost randomly distributed.

So we've got a long way to go before we're going to build any sort of heuristic detection algorithms for APT.

Yeah, question?

At the same time though, you're making the assumption that all of us, all of Anomali's clients, when we're submitting things and tagging them as APT that they're actually APT and not, oh I want this IOC to never expire, so I'm going to use APT for it.

Yes, yes.

So perfect point, right?

So the big assumption here is how much agreement do we have in the industry on what APT is, right?

So that's an excellent point, which I actually forgot to mention.

And actually I think this really relates to the frustration of a lot of technical analysts and senior management being like, I want you to go find APT.

And us being like, it's not that simple.

We don't have good definitions for APT, right?

A couple other observations, so it's very likely that compromised domains look similar to adware.

We're not making too much assertion about which one leads to which one.

I would kind of guess compromised leads to adware.

That they get compromised and then they're using adware to monetize it.

But I could be wrong about that.

But the general measurement that if you think about this as recipes for similar behaviors, it looks like there's what we'd call a long-tail distribution.

There's a couple of good recipes that will catch a lot of this behavior.

But you're not going to catch it all.

So maybe half the data is sort of clear clumps of similar behaving indicators between this adware and compromised domains.

And then the other half is much more kind of edge case or spurious, or as was pointed out earlier, possibly incorrectly labeled in the ecosystem.

Because it could very well be that some of these edge cases here were simply just not labeled well or not labeled correctly.

But since we look at enough of these data points and we do see some of these clusters here, even despite the potential for bad labeling, we still see clear evidence that there are similar techniques with some large chunks of these adware and compromised activities.

So my conclusion, I'm pretty much out of comic breaks.

So keeping in line with one of my talks from last year, IPs are worse indicators than domains.

Overall when we look at the big data going back for all time, we see that IPs have about four times as much overlap with all the other indicators as domains do, which is just more empirical evidence of how tricky managing IPs is.

Part of the clear takeaway about that is if you need to assign specific indicator types to them, you would want to do more aging when it comes to IPs compared to domains.

So for suspicious and malware itypes, we talked a lot about the specific details.

But for generally wrapping up, definitions are helpful here.

And so malware domains tend to behave most like C2 and phishing.

And suspicious domains tend to behave most like C2.

Bot and spam IPs overall were really interchangeable, because we expect that the bots are being monetized to do spam.

This makes intuitive sense.

But we've got data confirmation for that.

My plea, somewhat, is that after you've filtered out all the noise, you consider going back and adding more complicated rules.

Because the scanning IPs data type is useful eventually.

So obviously you would want to apply more of a rigor to filtering of scanning IPs.

But it's not like it's complete noise.

They definitely do tend to overlap with bad stuff eventually.

So it's not a first priority, which you know, but my takeaway is don't forget about them.

Once you get mature enough, go back and look at scanning IPs, say how can we apply reasonable filters to these, winnow down the data little bit, and actually use the scanning IPs indicator types.

In general, since it's really hard to make generalizations over a talk like this, where I had so many really specific insights: when we're looking at the enrichment data, indicators from many itypes tend to look like about one to two other itypes. Based on my really complicated charts, that's a really general statement.

And the TTP itypes, like free domain registration, like domain parking, these really don't overlap so much with malicious itypes.

If you want to consider scanning to be sort of a TTP itype, then that does.

I've already slightly advocated for eventually considering scanning as an itype.

But a lot of these other itypes about Whois privacy, free email registrations, on the aggregate it's not a great strategy.

It doesn't mean it can't sometimes be helpful.

But your time is probably better spent elsewhere.

Of course in such a comic friendly talk, how can I conclude without an xkcd comic?

Because they're included in every talk for people that include comics.

You can reach me on Twitter, @EvanWright.

You can reach Anomali at @Anomali.

And that's basically me.

I'm the Principal Data Scientist from Anomali, happy to take questions.

And thank you for your time.

Yes, Ryan?

So when the [INAUDIBLE] have an opportunity [INAUDIBLE] sort of taken as one, I take [INAUDIBLE] from the [INAUDIBLE] letter.

But I can see a situation where a [INAUDIBLE] letter [INAUDIBLE] is spamming a compromised bot type A, [INAUDIBLE].

But they have the [INAUDIBLE] together, instead just give it like scouring thing that is actually associated with APT.

[INAUDIBLE] Yeah, so the question was, when all the different threat intelligence providers feed their indicators into our ecosystem, they could be sort of divided, right?

Like, I don't know.

We think this is malware and C2.

It's not necessarily an unambiguous itype assignment, right?

Those do exist.

And some executive decisions have to be made.

Usually I think with the more mature threat intelligence feed providers, what they'll usually do is try to sort of break up and add more feed type offerings on their end, so that we could more granularly add them into the correct indicator type in the ecosystem.

But nonetheless, it definitely leads to challenges.

Because itypes aren't necessarily unambiguous.

This is, again, fundamental to our industry.

We have definition problems in the security field.

And we want to make statements like, is this APT?

Well it could be APT.

It's almost certainly going to be something like malware or phishing as well.

So that's a great example of when anything that's labeled APT is almost certainly some other indicator type as well.

And so the notion of unambiguous mapping to itypes is a little bit tricky.

I generally rely on the law of large numbers.

So like if they're between scanning and bot, the next time another feed is between scanning and bot, if they're 50-50, maybe they'll pick the opposite one.

And it will sort of balance out a little bit.

But I think part of Ryan's point realizes that when we talk about these overlaps and sort of confusion in the industry due to overlap, it's probably a high-end estimate because it could be the case that multiple answers are truly correct.

Is that kind of addressing your question?

So if you have two different people supply the same indicator, and they assign different itypes, how do you choose which one gets associated?

I know you de-dupe them, right?

So how do you choose which itype from the two different sources takes priority?

Right, right.

So when we are talking about the-- so two different strategies.

When we're looking at this type of an overlap, and exploit-- we'll sort of contribute to each bin.

So if exploit overlapped with adware, compromised, and mal, it will add to each one of these bins.

We normalize for that.

So it's not unreasonable, right?

So we add 1 to the numerator and the denominator for that particular bucket.

So that's how it's handled in the case of counting itypes.
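The counting rule described above can be sketched roughly as follows. This is only one reading of "add 1 to the numerator and the denominator": each overlap adds one to its own bin and one to a shared total, so the shares for a base itype sum to 1. The function name `overlap_shares`, the data layout (one set of itype labels per indicator), and the sample data are all hypothetical and are not taken from the Anomali platform.

```python
from collections import Counter


def overlap_shares(indicators, base_itype):
    """Share of base_itype's overlaps that fall into each other itype's bin."""
    numerators = Counter()
    denominator = 0
    for itypes in indicators:
        if base_itype not in itypes:
            continue
        # An indicator with several extra labels contributes to every bin.
        for other in itypes - {base_itype}:
            numerators[other] += 1
            denominator += 1
    if denominator == 0:
        return {}
    return {other: count / denominator for other, count in numerators.items()}


# Made-up sample: each entry is the set of itypes attached to one indicator.
sample = [
    {"exploit", "adware", "compromised", "mal"},
    {"exploit", "mal"},
    {"exploit"},
    {"spam", "bot"},
]
shares = overlap_shares(sample, "exploit")
```

With this made-up sample, the first indicator adds to the adware, compromised, and mal bins at once, which is the "contribute to each bin" behavior described in the answer.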

For down here, for this work, it's a good bit trickier.

Because we really need an unambiguous labeling.

So what I'm doing here is I randomly assign between two different itypes.

I randomly assigned to that type, because again, we're relying on law of large numbers a little bit here.
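A minimal sketch of that random tie-break, assuming each ambiguous indicator carries a set of candidate itypes. The helper name `assign_single_itype`, the seed, and the sample size are illustrative assumptions, not the method actually used for the charts. It shows the law-of-large-numbers intuition: over many 50-50 cases, each label ends up with roughly half.

```python
import random


def assign_single_itype(itypes, rng):
    """Pick one label uniformly at random from the candidate itypes."""
    # Sorting first makes the choice repeatable for a given seed.
    return rng.choice(sorted(itypes))


rng = random.Random(18)  # fixed seed so the sketch is repeatable
ambiguous = [{"scanning", "bot"} for _ in range(10_000)]
labels = [assign_single_itype(itypes, rng) for itypes in ambiguous]
bot_share = labels.count("bot") / len(labels)
```

Any single pick may be "wrong", but `bot_share` stays close to one half, which is why the random choice mostly adds noise at the edges rather than changing the large clusters.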

I actually even have more data than this.

And this is a sample of that data.

So since we're looking at general patterns of things like-- some of these general patterns really tend to stick out, right?

This is a law of large numbers thing, where, OK.

We could be wrong about this one, but we're not wrong about these.


So, by randomly picking, it can add a little bit of noise in the periphery.

So again, this is a little bit more of a worst case situation, because it could be the case like in fact, I'll give you a specific one.

There's a bunch of sinkholes right up here in these two groups.

These two spam domains could very much well be sinkhole domains, right?

It could have been assigned both of those, since we randomly choose.

The reason we can get away with this is because we're visualizing.

When we're visualizing, your eyes naturally tend to pick out the more obvious patterns.

And so we're naturally focusing on these things, which there's no way we're wrong about by randomly picking between this ambiguous decision.

There's no way we're wrong about a pattern like this, because there's so much supporting data.

The odds that we're wrong about this is tremendously, tremendously low, because of probabilities.

Now the odds that we're wrong about this one is higher, right?

So this is, again, a little bit of a worst case scenario, that maybe this one here exactly for the oddball one that's far away from a group, we could be wrong about that one.

But the notion that we're forming these larger groups is highly, highly, highly probable to be right.

Does that answer the question?

That's a great question, too.


I think I will just tackle more of a layman's standpoint.

[INAUDIBLE] Oh, OK, I'm sorry.

If there's like two different imports-- [INTERPOSING VOICES] Three or four different threat vendors are coming in.

I know I'm only getting one itype.

If they classify different, in the system, what do I see?

Yeah, yeah.

Great, OK so in the system questions, yeah, not a question about our specific methodology.

Yeah, OK that's fair.

Yeah, there's multiple points where we have to choose between how do we unambiguously identify the label.

So in the platform, the way it works is our feeds team works with the vendors to understand which indicator type that we have is the best mapping for their data.

So we expect the feed vendors understand their data really well.

We explain the differences between our indicator types in our platform.

And then mutually a decision is made about what the best mapping of their particular feed to our indicator type is.

And again, this is always facilitated when another-- especially the premium app store partners offer multiple feeds from the same vendor.

In that case there's more granularity, and we can rely upon it a little bit more.

So I think if you're trying to think about what itypes might be more confused in the platform, if there's just an open source feed of bad IPs, that's much more likely intuitively to have more itype confusion, than maybe like CrowdStrike, which offers maybe five different buckets of feeds.

Because we can map each of their buckets into a particular indicator type for our buckets in the Anomali platform.

Does that answer the question?



It would obviously be a lot less data, but I'm curious if you've given any thought to applying the same kind of analysis to a specific threat actor or a group to see, rather than what all types are coming in, specifically what they're using.

Are they overlapping, say like [INAUDIBLE] group are compromised as a structure.

Are they using that or anything else?

Or is just purely to specified to a threat actor with the platform?


I think that's a great point.

So for the recording, the observation was, for specific threat actors, paying attention to which indicator type, I think it was, is associated to all of the particular threat actors in the platform.

To some extent, I think that would be an interesting study, obviously talking about similarities and differences, at a little bit more of a high level with actors.

But I will also point out that I believe all of that information is accessible via the API, and the threat actor data is much smaller, more consumable data than what I was going through.

I mean, even my small data set of like six to seven months, was something like Right?

And that was the small data set.

The large one, I don't know, 7 or 8 times bigger.

So with the big data analysis, I don't think it's plausible to do without direct access to the data.

But for the threat actors specifically, if you are interested, the API would cover the ability for you to export a lot of this data for the threat actors you're interested in, for any of our kind of threat intel analysts that may be interested in doing that work.

Great question.


More asking your opinion, if we've started to use the IOCs in the platform in two different dimensions, and if you can go back to your summary slide at the end, where you were talking about that maybe you were thinking about expiring IP addresses a little bit more frequently.

The two dimensions that we use are is there a match that's actionable by a human, or is it something that we can automatically take an action on, like a firewall block.

So I think maybe on the human analysis side of it, having a shorter lifespan on IP matching makes sense.

But from a longer lens perspective, if I'm just pushing IP addresses to blacklist on a firewall or a proxy or something like that, I think the longer lens with IPs is a little bit more valuable.

And I wanted to hear your thoughts around the consideration of those two different dimensions.

Yeah, so the first part, the suggestion, let me try and summarize it.

You can tell me if I'm correct, that your observation is that prioritizing IPs, if it needs to be done, should be around more actionable IPs.

Was that a fair summary?

Only that if we were talking temporal on IPs, like you want to know bad IPs that happened in the last week.

Those were the matches that you'd want analysts looking at.

But if you don't have people involved, and you want to use a larger data set, then having the historical long tail on IP, like malicious IPs, suspicious IPs, mal IPs, however we're classifying them here, that history, the longer history is valuable for making blacklists, recently.

I see.


So would it be that actioning on recent indicators makes more sense.

But the totality of the long history of the data makes more sense for maybe writing up like a threat his-- IPs associated to an actor.

Previous, two years ago this was scanning.

Three years ago this was malware or something like that.

Yeah, so I think that's a great point that the more qualitative work about writing up a report is improved by having any more data points.

I think that is the first observation.

I think if we did do some amount of filtering IPs, one way I would tend to favor, as far as IPs, is looking at which feed provider it came from, and what is that feed provider's history of being reliable, and having good intelligence.

Because in our system, when you click that Report False Positive button, there's two different teams that get your report about why do you think it's a false positive.

And so one of those teams goes in depth, does researching, do we agree with it?

Let's pull the history on this.

And then my team is focused on, OK, we got that false positive.

Long-term planning, how can we try to address problems like this in the future?

And so that feedback from you all, the users, is what helps us-- is one of the measurements that we would use when I say good and more effective feed providers.

That would be feed providers that provide us IPs that people aren't saying are false positives very often.
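One way to picture that measurement is a per-feed false positive rate. This is a hedged sketch under stated assumptions: the function names, the feed names, and the counts are all invented for illustration, and the real platform may weigh reports quite differently.

```python
def false_positive_rates(ips_by_feed, fp_reports_by_feed):
    """False positive reports per IP supplied, for each feed."""
    rates = {}
    for feed, total in ips_by_feed.items():
        if total == 0:
            continue  # no IPs supplied, so no rate to compute
        rates[feed] = fp_reports_by_feed.get(feed, 0) / total
    return rates


def rank_feeds(rates):
    """Feeds ordered from fewest to most false positive reports per IP."""
    return sorted(rates, key=rates.get)


# Hypothetical counts: IPs supplied and false positive reports received.
ips_by_feed = {"feed_a": 1000, "feed_b": 200, "feed_c": 500}
fp_reports_by_feed = {"feed_a": 5, "feed_b": 20}

rates = false_positive_rates(ips_by_feed, fp_reports_by_feed)
ranking = rank_feeds(rates)
```

A ranking like this could be used to favor IPs from the feeds at the front of the list if some filtering of IPs is needed.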

I guess I'm asking it under a different lens, of what's the impact to an organization when the IP list is not meeting that curative?

The impact is usually human analysis.

We have to staff the SOC and staff the CSIRT to investigate the matches.

But that only applies when you're taking those IP matches and allowing people to look at that.

If you're building just blacklists, just building block lists for firewalls, let's say.

That longer history of IP addresses is more valuable than maybe something more temporal.

[INAUDIBLE] I'm hearing that it is just, so the blacklist is in the automated process then, that longer time period.

That's what I'm hearing is that the human analysis is what I think it's not going to be valuable.

It's like I guess like perhaps [INAUDIBLE].

I would flip that the other way around, to say that I would prefer to learn the history of something [INAUDIBLE].

Related to whatever it is I investigated [INAUDIBLE].

But as well with, asking of your opinion.

If you perceive that your long-term data for blacklist, the automated process is more valuable [INAUDIBLE] when the identity of that changes, and it's no longer [INAUDIBLE], how do [INAUDIBLE] with that [INAUDIBLE]?

And to clarify the reason that I'm asking, is I'm thinking in terms of generating an alert to put in front of a SOC or a CSIRT.


So throughout the conference, there's been the comment that IP addresses are not specific enough to identify malicious activity.

That's been repeated several times.

Largely that is what we see as well.

Unless you have a domain, or URL, or some other type of more granular [INAUDIBLE], or something more specific, an IP by itself is not a really good indicator of compromise.

So if we're sending alerts to a SOC, maybe at best a smaller data, a smaller set of curated IPs, might be more valuable than the existing fire-hose that's available today, things that are within the last week they can report it as malicious.

Because they tend to turn around on and off, compromised and not compromised very quickly. I still want that data available to analyze, but not necessarily to generate alerts.

Where I think the longer term data set is valid is when I'm setting up feeds on the data, on the IP data that's coming from ThreatStream to a firewall, to my firewall environment, or to my endpoint security product to say, you're seeing traffic to this IP blacklisted.

Like I want a year of bad IPs, if you will, in that data set.

Because if they're flipping on and off all the time, chances are it's not a piece of an internet-based infrastructure that I want my users hitting.

So it's OK for that to be wrong 5%, 10%, 15% of the time.

Because the consequences of that is someone doesn't get an ad delivered, or someone maybe doesn't get a frame delivered on a web page that they would have otherwise.

And it's not costing me human analysis time, because we wouldn't alert on those, because we know that that data is a little bit older, and is being used for blocks.

So I was thinking just the newer, more temporal stuff, at best would go from an alerting perspective.

But the historical tail would still be available for analytics.

But the analytics would actually happen when something higher up that pyramid generates the alert, rather than just an IP address, if that makes sense.
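The two dimensions in this suggestion can be sketched as two age windows over the same IP data. The window lengths come from the phrases "last week" and "year of bad IPs" above; the function name `split_by_age`, the data layout, and the sample addresses (taken from documentation-only ranges) are assumptions made for illustration.

```python
from datetime import datetime, timedelta

ALERT_WINDOW = timedelta(days=7)    # assumed "last week" window for analyst alerts
BLOCK_WINDOW = timedelta(days=365)  # assumed "year of bad IPs" window for block lists


def split_by_age(last_seen_by_ip, now):
    """Return (alert_ips, block_ips) based on how recently each IP was reported."""
    alert_ips, block_ips = set(), set()
    for ip, last_seen in last_seen_by_ip.items():
        age = now - last_seen
        if age <= BLOCK_WINDOW:
            block_ips.add(ip)  # long tail is acceptable for automated blocking
        if age <= ALERT_WINDOW:
            alert_ips.add(ip)  # only recent reports reach a human analyst
    return alert_ips, block_ips


now = datetime(2018, 9, 20)
last_seen_by_ip = {
    "192.0.2.1": datetime(2018, 9, 18),    # two days old
    "198.51.100.7": datetime(2018, 3, 1),  # about six months old
    "203.0.113.9": datetime(2016, 1, 1),   # well over a year old
}
alert_ips, block_ips = split_by_age(last_seen_by_ip, now)
```

Here the six-month-old IP would still be pushed to a firewall block list but would not generate an alert, which matches the trade-off described: a block that is occasionally wrong costs little, while an alert costs analyst time.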

So part of what I'm hearing is the suggestion that, OK, if IPs are sort of too noisy to alert on, what is the next best way to winnow down that data, for when it's too noisy?

And you're suggesting if we're using a temporal component of the last week or the last month or something like that, then there's times we would want longer periods of that data.

I wouldn't go completely absolute.

But I'm generally more skeptical of the notion that-- the issue with IPs is that they're going on-off, on-off.

More of the issue with IPs is that they're concurrently doing something like shared hosting, where maybe they're hosting 10,000 domains.

Five of them are bad.

In that case, the timing data doesn't help you.

And so when I see why IPs are getting reported, why IPs are frustrating users as indicators, it's a little bit more of the use case of shared hosting than it is any sort of IPs flipping to on-off, on-off.
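The shared hosting case can be made concrete with the numbers from the example above. This is a small sketch; the function name `bad_domain_fraction` and the placeholder domain names are hypothetical.

```python
def bad_domain_fraction(hosted_domains, bad_domains):
    """Fraction of the domains hosted on one IP that are known bad."""
    if not hosted_domains:
        return 0.0
    return len(hosted_domains & bad_domains) / len(hosted_domains)


# One shared-hosting IP with 10,000 domains, five of which are bad.
hosted = {f"site{i}.example" for i in range(10_000)}
bad = {f"site{i}.example" for i in range(5)}
fraction = bad_domain_fraction(hosted, bad)
```

Under these assumptions only a tiny fraction of the traffic to that IP relates to a bad domain, and that fraction is the same whether the report is a day old or a year old, so filtering on time alone does not fix the noise.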

So that's my immediate thought about-- I wouldn't immediately jump to thinking about doing any data reduction on IPs.

It's not really necessarily about time.

There is some caveat to that.

I mean we expire a lot of our IPs after 90 days, or depending on the feed they'll become inactive.

So yes, there's some timing component, as you were saying about infrastructure no longer being used.

But I don't think it's the direct solution to the sensitivity of the IP fire-hose triggering so many alerts in my network.

So that's my thought to your response.


Yeah, and again, I guess just to clarify, the reason the IPs, when they're in the automated process, tend to frustrate a lot of analysts really comes down to, in my experience working with SOCs in the past, a common signature aging type of issue.

That if you just keep too many signatures on a box that's alerting and you never expire them, it's really similar to the idea of just keeping alerting of IPs on a box, and never expiring them.

The alerts just infinitely accrue.

And before you know it, you're alerting on half the internet.
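The aging idea can be sketched as a per-type lifetime applied to the alerting set. The 90-day figure for IPs is the one mentioned earlier in the answer; the 180-day figure for domains, the function name `still_active`, and the sample indicators are purely hypothetical.

```python
from datetime import date, timedelta

TTL_BY_TYPE = {
    "ip": timedelta(days=90),       # the talk mentions about 90 days for many IPs
    "domain": timedelta(days=180),  # hypothetical longer lifetime for domains
}


def still_active(indicators, today):
    """Keep only the indicators whose last report is within the TTL for their type."""
    active = []
    for value, kind, last_seen in indicators:
        if today - last_seen <= TTL_BY_TYPE[kind]:
            active.append(value)
    return active


today = date(2018, 9, 20)
indicators = [
    ("192.0.2.10", "ip", date(2018, 9, 1)),
    ("198.51.100.4", "ip", date(2018, 3, 1)),
    ("old.example", "domain", date(2018, 5, 1)),
    ("stale.example", "domain", date(2017, 1, 1)),
]
active = still_active(indicators, today)
```

Without a step like this the alerting set only grows; with it, the expired entries drop out of alerting while the full history can still be kept elsewhere for threat intel write-ups.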

So that's why more on the automation side of it, I'm thinking more of winnowing the data makes sense.

And I think we all agree that a nice threat intel write-up for not the incident response team, but the threat intel team in your org, would be much more interested in that complete history of the IP.

Because if you believe it's associated to APT, you want to know the whole history, because that may lead you to some useful insights from five years ago, maybe.


With that, I think we are well over time.

So actually, right at time, I think.

Yeah, right at time.

So perfect.

Thank you, all.
