Working with a vast number of sources and feeds in a threat monitoring program can become overwhelming and taxing at times. Anomali provides a great way to collect, store, and enrich information, but having an approach to score and identify which information you want to operationalize can help you prioritize specific feeds or identify gaps.
To address this, the FirstEnergy SOC developed a method to compare and score sources using both quantitative and qualitative approaches. They evaluate sources on their timeliness, accuracy, relevancy, and predictiveness (the TARP principles) to frame the quality they are getting from each source. This has helped them make strategy decisions based on cost and sweat equity to maximize their return on investment.
Watch the on-demand presentation led by Thomas Gorman, Big Data Security Analytics Developer, and Scott Poley, TSOC Manager from FirstEnergy.
TOM GORMAN: OK, good morning.
My name's Tom Gorman.
We're going to walk you through a threat source assessment project that we worked on recently.
And this was a really interesting and fun problem to solve.
SCOTT POLEY: I'm Scott Poley.
I'm the supervisor of the SOC at FirstEnergy, and Tom is the data guy on our SOC team.
But all my analysts actually wear multiple hats.
So we don't really have a threat team.
So this approach is more about what we do with threat intel, including some of our feedback from an operational perspective, as well.
And this is something our CSO was asking questions about, which drove us down this path.
So let's start by opening it up to assessing information versus intelligence.
Most people in here understand the difference between information and intelligence.
Obviously, we'll collect pretty much any information.
But we really want the context that lets us make the relationships and stitch it all together, because that's what real intelligence is, as most of you probably understand.
So the big thing about our approach is that to do this type of assessment, we had to assess what we already know.
We couldn't just go off of the unknown.
So we're really cross-comparing.
I'll walk through the different types of sources we pulled in to compare across, which also framed how we scored things based on those comparisons.
And the TARP principles are what we based the assessment on.
TARP is something we're familiar with from old military intelligence.
I don't know if it's still used.
I'm not an intelligence person.
I did work with intelligence people in the military in my past life.
But we'll walk through those principles.
And that's kind of what guided how we assess some of these things.
So the TARP principles are pretty self-explanatory.
But one of the things we looked at is timeliness: how quickly we actually got things in.
Obviously, time is important, because as time passes, things become less relevant sometimes, or they've already been burned and turned.
So that was one of the factors.
Accuracy speaks for itself, as well.
You don't really want to get a bunch of junk data.
So that one was a key piece for us to assess.
It was a little hard to assess with what we had, but we tied in some of our casework and analyst feedback, as well.
Relevancy was the big thing we looked at, since we're an energy company and also part of critical infrastructure.
There are certain aspects of what makes things relevant to us.
So that's how we kind of assessed the information that was coming in.
And then predictiveness. A lot of people get tripped up on this, like, how are you going to use this information to predict future things?
But it's more about whether it has enough context for you to understand the motive or the techniques behind what's going on, beyond just IOCs.
So we have built a lot of detections and different analysis on how certain things behave versus just keying off specific matches on indicators.
So that's kind of what that spells out.
So for the time frame of this assessment, when we were asked to look at this, we pulled data from January and February of this year.
And what you're looking at on the scale there is a logarithmic scale.
The only reason is that some sources didn't provide a lot of data, and you wouldn't be able to see them next to the ones that provide millions.
So that gives you a frame of mind for what you're looking at there.
We had the third-party feeds we pay for as part of this.
We had government-type feeds coming in, as well.
And then there's the open source collection we target through our own automated process, which we've talked through before, including what we collect there.
So these are kind of the sources we're comparing across when we did our assessment.
So I'm going to walk through some of the data sets we used to score these things, and then our methodology.
It's probably not the best way, but it's our approach.
So we're always looking to improve it and looking for feedback.
But we had to come up with something.
So with the timeliness breakout here, what we're looking at is first reported versus not first reported.
That's a pretty easy thing to measure, especially when you're pulling data out of Anomali.
We heavily used the API to do these types of things.
But basically, what's the creation date on the indicator you see?
Looking more closely at the breakout, you can see the percentages for each source and where they stood on first reported.
This one is actually a nice measurement because it's quantitative.
So it's really easy to stack against other sources, unlike some of the other things we measure, which take more of a qualitative approach.
But that's self-explanatory here.
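As a rough illustration of the first-reported measurement, here is a minimal sketch, assuming indicator sightings have already been pulled from the API into (value, source, created) tuples. The function name and tuple layout are hypothetical; this is not the team's actual script.

```python
from collections import defaultdict


def first_reported_share(records):
    """Return {source: fraction of its indicators it reported first}.

    `records` is an iterable of (value, source, created) tuples, where
    `created` is any sortable timestamp such as an ISO 8601 string.
    On an exact timestamp tie, the record seen first in the input wins.
    """
    earliest = {}                  # value -> (created, source) of first sighting
    per_source = defaultdict(set)  # source -> set of values it reported
    for value, source, created in records:
        per_source[source].add(value)
        if value not in earliest or created < earliest[value][0]:
            earliest[value] = (created, source)

    firsts = defaultdict(int)      # source -> number of values it reported first
    for _created, source in earliest.values():
        firsts[source] += 1
    return {src: firsts[src] / len(vals) for src, vals in per_source.items()}
```

Because the result is a plain percentage per source, it stacks directly against other sources, which is what makes this dimension easy to compare.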
Our approach for accuracy was a little interesting to take on, because with accuracy you're obviously looking at active versus inactive.
You're relying on the sources themselves to determine whether something is active or inactive.
So timeliness probably plays into some of this: when indicators sunset, they fall into that inactive list.
But that doesn't necessarily mean the source is providing inaccurate data.
So we still scored those a specific way when assessing accuracy.
But the key thing was really the false positives.
Because we're operational, in all of our processes we can flip indicators to false positive when we see them or work those specific indicators.
But there might be data in there that we've never seen or couldn't assess ourselves, and if the sources don't provide good feedback on it, it might not get scored very well.
So this was a tough thing to do.
But it was our best approach based on the data we had.
And we wanted this to be a data-driven perspective to try to answer these types of questions.
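A per-source false positive ratio could be computed roughly as follows. This is a sketch under stated assumptions: records are (source, status) pairs, and the false positive status label (`"falsepos"` here) is an assumption you should match to whatever your export actually uses.

```python
from collections import Counter


def false_positive_ratio(records, fp_label="falsepos"):
    """Return {source: share of its indicators marked false positive}.

    `records` is an iterable of (source, status) pairs. The status label
    for false positives is an assumption; pass whatever your export uses.
    Inactive indicators count toward the total but are not penalized,
    since sunsetting doesn't by itself mean the source was wrong.
    """
    totals = Counter()
    fps = Counter()
    for source, status in records:
        totals[source] += 1
        if status == fp_label:
            fps[source] += 1
    return {src: fps[src] / totals[src] for src in totals}
```

The caveat from the talk still applies: indicators that nobody ever flips to false positive look clean here even if they are junk, so the ratio is a lower bound on inaccuracy.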
So relevance and predictiveness: this is where we get to more of the qualitative type of approach.
This is an example.
This is a FireEye report we pulled in through our automated RSS feed collection.
For us it was relevant because it was talking about the LinkedIn phishing campaign.
A few months back, I want to say, we actually saw similar activity associated with this, where Iranian accounts were being spun up specifically to target FirstEnergy and trying to make connections with people.
And they seemed very nuclear-focused, as well.
So when we saw the people who were added to those connection lists, it was very relevant.
So when this report came out after we'd seen that behavior, it was very interesting.
But when we talk about the predictiveness of this, as well, you know, there's a lot of information they put in their reports.
If you're familiar with really good threat reporting with context, you get more: where did it come from, so we know it's LinkedIn?
What does it look like?
Hashes associated with the malware, and how the malware actually functions, so we can compare when we do our forensics, triage, or reverse engineering on samples.
Does anything correlate that way?
They even broke out how the C2 works and what its different components look like, and even how they actually persist in the environment.
So these types of reports obviously score higher on predictiveness and relevance, which I think is self-explanatory.
Another example, and this was before we had our own threat intel processes.
We were just new with Anomali at the time, actually.
Or newer with some of the advanced things we could do.
But Nuclear 17 was a big thing in the energy sector.
That's Russia going after US energy companies.
What we did at that time was our own relevance analysis, stitching together the indicators and reporting by putting it all in here and seeing how it related to us.
And within 24 hours of pulling all the reporting together, we had actually identified more than the FBI provided us in the initial alert they gave us.
So we pulled out all the watering hole and phishing activity.
We were able to identify initial payloads.
What were the actual target payloads, hooks, and persistence?
And then what was some of the infrastructure they were using, as well?
So this ties back to the reports that fed this.
It can obviously be very predictive, because we understand exactly how it all works and what their intention was.
And it was relevant because the FBI notified us to look for this activity, too.
And it was the energy sector.
So that's another frame of mind for what we produce and how we use that intel to do other things.
So this didn't really go into the scoring, but through this process, we did discover some interesting things, as well.
What we're looking at here is, out of all the indicators we pulled out (I have a number that'll come up, but it's in the three-million range), we did geolocation enrichment on everything we could.
And this is a heat map of where all those indicators lie.
So that's all the sources, vendor-based, government, and open source collection, and you can see them here.
And a technique we use, and I'm sure others use, is the whole geoblocking strategy, which some people feel works well.
It's really more of a noise filter, because when you actually look at the sanctioned or embargoed countries, you can see that blocking them doesn't take care of much of the activity you actually see.
So when we started looking at this, it was just interesting in general, because we use geoblocking, as well.
We're a US-based company.
It makes sense.
But if we used that as a real strategy, we'd only block just under [INAUDIBLE] of everything that we collected.
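The geoblocking check boils down to counting geolocated indicators per country and asking what share falls in a blocklist. A minimal sketch, assuming geolocation enrichment has already produced one ISO country code per indicator (None where it failed); the blocklist used in testing is a hypothetical placeholder, not FirstEnergy's list.

```python
from collections import Counter


def country_counts(country_codes):
    """Count geolocated indicators per country code (heat-map input).

    Indicators that could not be geolocated are passed as None and skipped.
    """
    return Counter(c for c in country_codes if c is not None)


def geoblock_coverage(country_codes, blocked):
    """Fraction of geolocated indicators that a country blocklist would catch."""
    counts = country_counts(country_codes)
    located = sum(counts.values())
    if located == 0:
        return 0.0
    return sum(counts[c] for c in blocked) / located
```

Running this against a full indicator set is one way to put a number on the point above: if the coverage comes out small, geoblocking works as a noise filter rather than as a defensive strategy.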
One thing I did notice when looking at that heat map is that it tracks closely with internet infrastructure and population and how they're distributed across the globe.
So it makes sense.
Attackers want availability.
So where they were positioning themselves and what infrastructure they were using is where the people are, or where the technology exists to do these types of things.
I always love when people say, hey, Russia is attacking us, or China is attacking us.
Let's block China and Russia.
That doesn't really do anything.
So that was just something we pulled out for context.
It was an interesting insight from this process.
So I'm going to talk through kind of the four areas and how we measure those based on some of the data you've already seen.
For timeliness, I talked through first reported, right?
That was the big thing, where it's easy to see who created it and when.
So that was the initial approach there.
But for accuracy, this is where we had to take into account who reported first and the false positive ratio, because people who have intel really like to copy other people's intel and report on it, as well.
And you really can't tell the true source if you go off that alone.
So we had to scale it based on first reported and false positive ratios.
And that's how we did it.
And those buckets will make sense when we go through our scoring methodology and the final results.
But the data itself also defined those buckets or breakouts, as far as how far along the spectrum a source did well or not so well.
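To show how first-reported share and false positive ratio can be combined into accuracy buckets, here is a minimal sketch. The cutoffs are hypothetical and for illustration only: the talk says the real buckets were derived from the data but does not state them.

```python
def accuracy_bucket(first_reported_share, fp_ratio):
    """Place a source in an accuracy bucket from 0 (worst) to 3 (best).

    HYPOTHETICAL cutoffs for illustration; the real buckets were
    derived from the assessment data and are not given in the talk.
    """
    if fp_ratio > 0.10:
        return 0  # too many confirmed false positives
    if first_reported_share < 0.05:
        return 1  # mostly re-reporting other people's intel
    if first_reported_share < 0.25:
        return 2  # some original reporting
    return 3      # substantial original reporting with few false positives
```

Checking false positives first means a source cannot buy its way into a high bucket with volume alone; the first-reported share then separates original reporting from copied intel.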
For the qualitative analysis, we looked at relevancy.
The big thing here is that this was more feedback from the analysts actually working in operations; it's really hard to take a data set and say it's necessarily relevant to us without some of that human input.
So the way we scored relevancy is, if something is miscellaneous or security news, it doesn't really mean much to us.
Did it provide just raw security-related data?
OK, that could be more interesting, but it's not really what we're looking for in good intelligence.
Is it activity that we've seen within our environment?
This is where our case data can play into it.
Then we know that at least these sources are providing things in the same scope we want to look at.
And then obviously, if it's really about targeting energy or critical infrastructure, we care.
Whether we've seen it or not, that's going to be relevant for us to at least know about.
The other aspect is being predictive. Like I said, we'll take any information in, because we try to collect enough that we can cross-correlate ourselves.
But if you just give us IOCs, which some feeds that we collect do, we don't score you well on being predictive, because you're not going to tell us anything that helps us unless we put a lot of work into it ourselves.
With Anomali, though, we do a lot of smart tagging, and we tag based on how we bring in certain things that might have indicators, like our Twitter collection.
That adds a little more granularity about what something could be associated with.
So it helps us with the predictive side: what's the intent?
What are some other things we could be looking for?
Then reports that actually have analyst input are our favorite reports.
Unfortunately, we don't have time to read through all of them.
But when we do come across things that match in our environment, we can go back to see the context and what we really should be considering.
And when we get the really good reports that talk about the actual intent and behavior of an adversary and what they're trying to do as part of their big, usually geopolitical, agenda, those are going to score highest from a predictive standpoint.
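The relevancy and predictiveness tiers described above are ordinal, so one way to turn analyst labels into a per-source number is a lookup table plus an average. This is a sketch: the tier names follow the talk, but the numeric values and the choice to average over a source's items are hypothetical.

```python
# HYPOTHETICAL tier values; the ordering follows the talk, the numbers don't.
RELEVANCY_TIERS = {
    "misc_or_news": 0.0,
    "raw_security_data": 1 / 3,
    "seen_in_environment": 2 / 3,
    "targets_energy_or_ci": 1.0,
}
PREDICTIVE_TIERS = {
    "iocs_only": 0.0,
    "tagged_indicators": 1 / 3,
    "analyst_report": 2 / 3,
    "intent_and_behavior": 1.0,
}


def tier_score(labels, tiers):
    """Average tier value (0.0 to 1.0) over the labels assigned to a source's items."""
    if not labels:
        return 0.0
    return sum(tiers[label] for label in labels) / len(labels)
```

Keeping the labels as the analysts' input and the numbers in one table makes it easy to re-weight the tiers later without re-labeling anything.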
So this is how things scored out in the TARP results.
And what was interesting is that not only did I give this brief to our CSO, but one of our actual intel sources wanted feedback, as well.
They were actually part of this mix.
And it was easy for them to see which areas they could improve in.
To interpret this, the idea is that you're adding up these four pie charts.
If you can't get to 100%, you're in the poor range.
If you start going over, you're in the neutral range.
And that's what that's breaking out.
And if you go even higher as it keeps circling around, you get into a good score.
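The "add up four pie charts" rule can be sketched as a sum of four component scores, each between 0.0 and 1.0. The 100% cutoff for poor comes from the talk; the cutoff for good (`good_threshold`) is a hypothetical parameter, since the talk doesn't give it.

```python
def tarp_rating(timely, accurate, relevant, predictive, good_threshold=2.0):
    """Sum four TARP component scores (each 0.0-1.0) and rate the total.

    A total below 1.0 (100%) is poor, per the talk. `good_threshold`
    is a HYPOTHETICAL cutoff between neutral and good.
    """
    total = timely + accurate + relevant + predictive
    if total < 1.0:
        return total, "poor"
    if total < good_threshold:
        return total, "neutral"
    return total, "good"
```

Returning the total alongside the label keeps the per-source breakdown visible, which is what made it easy for a source to see where it could improve.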
So of all the sources we brought in, this is where everything stacked up.
And when a source didn't score as well, it was easy to see why.
Then we can decide from a strategy perspective whether it's something we really need or care about, or whether other sources do a much better job in that area; and especially if there's duplication or overlap, that could point to a change in our strategy for collecting information.
So I did mention overlap as one of those things.
Now, you know, this is kind of common.
You see this in the source optimizer in Anomali, I think.
They do a similar view.
It's basically the same thing.
From a heat map perspective, you can see how each source's indicators stack against other sources.
The only thing that's hard is that sometimes your better sources don't have the same volume of intel.
But it might be really good intel.
So this skews things a little, because your noisier sources are either going to look bad on a heat map if they have a lot of intel, or can make other sources look worse.
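One way to reduce the volume skew is to express overlap as a percentage of the row source's own volume instead of as raw counts. A minimal sketch, assuming each source's indicators are held as a set of values; this is an illustration, not the view the team built.

```python
def overlap_percent(sources):
    """Share of each source's indicators that also appear in each other source.

    `sources` maps name -> set of indicator values. The result is keyed by
    (a, b) and is asymmetric: it is normalized by a's volume, so a small
    source's overlap isn't drowned out by a noisy source's raw counts.
    """
    out = {}
    for a, vals_a in sources.items():
        for b, vals_b in sources.items():
            if a != b and vals_a:
                out[(a, b)] = len(vals_a & vals_b) / len(vals_a)
    return out
```

A large, noisy source will still tend to cover more of everyone else, but a small, high-quality source now shows up on the same 0-to-1 scale.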
So this is another view that we created.
What this one looks at is uniqueness.
Out of all the pieces of intel and overlap, we can see which sources are more unique in general.
And it was a surprising result for us: even though so much of what we do is open source collection, those sources were still pretty unique compared to all the other sources.
We assumed a lot of people would be copying other people's work.
That wasn't always the case.
But to see how it actually stacks up, those are the different sources.
If you're not colorblind and can see this well, you can see how it all stacks as far as overlap, which is an interesting view on the data.
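Uniqueness can be computed by counting how many sources reported each indicator and keeping the ones only a single source reported. A minimal sketch, assuming the same name-to-set layout as above:

```python
from collections import Counter


def uniqueness(sources):
    """Share of each source's indicators that no other source reported.

    `sources` maps name -> set of indicator values.
    """
    seen_by = Counter()
    for vals in sources.values():
        seen_by.update(vals)  # sets, so each source counts once per value
    return {
        name: (sum(1 for v in vals if seen_by[v] == 1) / len(vals)) if vals else 0.0
        for name, vals in sources.items()
    }
```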
Another thing for us to really take into consideration is that we know how things stack as far as our favorite sources.
What do we really like the most?
But if there's a source, for instance source F up there, you're looking at a cost breakout: what are we really getting from this source if we have to pay for it?
Obviously, open source is easy because we collect it ourselves.
We're not paying for it, so it's not that big of a deal.
But if you're working within budgetary constraints or trying to make those types of decisions, you really want to know how everything stacks.
With these results, it was easy to stack the sources against a cost analysis.
What are we really getting, and what are we paying?
In this example, you can see that we pay a little more for some.
We pay a lot less for others, compared to the specific source whose return value you're questioning.
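The cost comparison can be expressed as cost per TARP point and cost per unique indicator. This is a minimal sketch; the input layout and function names are hypothetical, and free (open source) collection is treated as zero cost.

```python
import math


def _per_unit(cost, amount):
    """Cost per unit of `amount`; free sources cost 0, empty ones are infinite."""
    if cost == 0:
        return 0.0
    if amount == 0:
        return math.inf
    return cost / amount


def cost_value(sources):
    """Rank sources by cost per TARP point, cheapest first.

    `sources` maps name -> (annual_cost, tarp_total, unique_indicators).
    Returns (name, cost_per_tarp_point, cost_per_unique_indicator) rows.
    """
    rows = [
        (name, _per_unit(cost, tarp_total), _per_unit(cost, unique_count))
        for name, (cost, tarp_total, unique_count) in sources.items()
    ]
    return sorted(rows, key=lambda row: row[1])
```

As noted next in the talk, this ignores bundled features and tools, so it tells you what you are paying for the intel alone rather than for the whole product.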
But something that usually comes with a lot of these sources is additional features or tools.
So you have to weigh that in, which is hard to do with just this approach.
That might be why some of these vendors provide additional resources and tools: even if they only do OK on the threat intel piece, they still add enough value that it's worth the cost of the product.
So we're going to transition.
Tom did a lot of the data pulling and analysis, so he's going to walk through his approach on the data side of things.
You'll see a lot of code, and he'll walk through it all.
And then at the end, we'll set aside some time for questions, as well.
So over to you, Tom.
TOM GORMAN: OK, thank you.
So anyways, in order to assess the data, you need to get the data.
Oh, I just hit the wrong--
SCOTT POLEY: You hit the blackout frame?
Press it again.
TOM GORMAN: There we go.
So with Anomali, if you wanted to pull this data using the user interface, it might take days, if not weeks, to manually go through and export millions of rows of data, or millions of indicators.
So we use the Anomali API to reach out and grab all the indicators for a certain time period.
You'll see that we limited the batch pulls to a certain size, converted the JSON objects, and stored them in a pandas data frame.
If you've never used pandas, it's a Python library, and once data is in a data frame, it looks kind of like a CSV, with field names, rows, and columns.
Another important thing to mention is that every single indicator stored in there has an update ID.
If you're going to pull this data in batches, you have to use that update ID: you do an initial pull, take the last update ID from that pull, and start there for your next pull.
I'll point this out here.
We have information right here on how to connect to the server.
We have an API key.
We have a time range that we want to search.
This is the initial pull right here, where we order by update ID, and we set that to [INAUDIBLE]. We run it through the first time, and it stores the data in the data frame.
And then there's a while loop right here.
It keeps going until there's no data left.
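The update-ID loop is a form of cursor pagination. Here is a minimal sketch of that loop with the HTTP call left to a caller-supplied `fetch_page` function, so the logic can be shown without a live server. In our understanding, the ThreatStream intelligence API supports ordering by `update_id` and filtering on values greater than a given ID, but treat the exact endpoint and parameter names as assumptions to check against your Anomali API documentation. The starting cursor of 0 is also an assumption.

```python
def pull_all(fetch_page, page_size=1000, start_id=0):
    """Pull every indicator by paging on update_id.

    `fetch_page(after_update_id, limit)` is caller-supplied and must return
    a list of indicator dicts with update_id greater than `after_update_id`,
    sorted by update_id ascending, at most `limit` long.
    `start_id=0` is an assumed starting cursor.
    """
    rows = []
    last_id = start_id
    while True:
        page = fetch_page(last_id, page_size)
        if not page:                      # no data left: stop
            break
        rows.extend(page)
        next_id = page[-1]["update_id"]   # resume after the last record seen
        if next_id <= last_id:            # guard against a non-advancing cursor
            raise RuntimeError("update_id did not advance; check the query")
        last_id = next_id
    return rows
```

The resulting list of dicts can go straight into `pandas.DataFrame(rows)` to get the field-names-rows-and-columns view described above. The guard matters because a cursor that never advances would turn the while loop into an infinite one.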
This part is identifying first seen.
Once we have all the indicators from that time frame, we take them and run them back through the Anomali API.
Every time an indicator was seen, that sighting gets pulled back.
That could be from just a certain set of your sources, or from any source.
These are the five fields that are most important: the value (whether it's a hash, a domain, or an IP), the source, the trusted circle ID, the tags, and, of course, the created timestamp.
Right here is another Python script that runs through this.
We use batch sizes of 100,000.
The reason is that with 500,000 or a million, it just keeps going and going.
And sometimes it doesn't finish.
With smaller batches, the API usually plays nicely with your script.
And we pull these columns right here.
And let's see.
This just outputs several CSVs that you end up concatenating together.
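The batch-then-concatenate pattern can be sketched with the standard library alone. The `lookup` function stands in for the API call and is hypothetical, as are the field names in `FIELDS` (the talk names the five fields but not their exact JSON keys). With pandas, the concatenation step would typically be `pd.concat` over the per-batch frames.

```python
import csv
import io

# Assumed field names for the five fields named in the talk.
FIELDS = ["value", "source", "trusted_circle_ids", "tags", "created_ts"]


def batched(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def lookup_to_csvs(values, lookup, batch_size=100_000):
    """Run `lookup` (caller-supplied; returns dicts) on each batch of
    indicator values and write one CSV string per batch, keeping FIELDS only."""
    outputs = []
    for batch in batched(values, batch_size):
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(lookup(batch))
        outputs.append(buf.getvalue())
    return outputs


def concat_csvs(csv_texts):
    """Concatenate the per-batch CSVs into one list of row dicts."""
    rows = []
    for text in csv_texts:
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows
```

Capping each lookup at a fixed batch size trades a few more round trips for requests that reliably finish, which is the reason given for using 100,000.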
This would be the status.
We all know the statuses: inactive, active, and false positive.
And right here, we take the value and the status, so the indicator and its status, and group by the indicator.
We sort by when it was created, which is right there.
And then we take the first one.
I know that's a lot of code, but it's a very simple thing.
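The sort, group, and take-first step is close to the pandas idiom `df.sort_values("created_ts").groupby("value").first()`. One difference: pandas `.first()` takes the first non-null value per column, while this plain-Python sketch keeps the whole earliest row. Field names are assumed.

```python
def first_status(rows):
    """Return {indicator value: status of its earliest-created sighting}.

    `rows` is an iterable of dicts with "value", "status", and "created_ts"
    keys (assumed names). sorted() is stable, so on an exact timestamp tie
    the row that came first in the input wins.
    """
    first = {}
    for row in sorted(rows, key=lambda r: r["created_ts"]):
        first.setdefault(row["value"], row)  # keep only the earliest row per value
    return {value: row["status"] for value, row in first.items()}
```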
Then we have the source overlap.
What did source A overlap with?
We figure out whether it overlapped with indicators from source B, C, D, E, F, and so on.
You can see right here we just do inner joins on each source.
Source A: does it join with B, C, D, E, F, G?
You can have as many as you want.
And then we do that for every single source.
These entries right here, every single entry right here, [INAUDIBLE] numeric value.
So you can see how many overlapped with which one.
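The pairwise inner joins amount to a matrix of intersection sizes. A minimal sketch using sets of indicator values; in pandas the per-pair equivalent would be something like `df_a.merge(df_b, on="value", how="inner")`, whose row count matches the set intersection only when each side is deduplicated on `value` first.

```python
def overlap_matrix(sources):
    """Pairwise overlap counts between sources.

    `sources` maps name -> set of indicator values. Entry [a][b] is the size
    of the intersection (the inner join on value) of a and b; the diagonal
    is each source's own volume.
    """
    names = sorted(sources)
    return {a: {b: len(sources[a] & sources[b]) for b in names} for a in names}
```

Each entry is a plain count, so this matrix is what feeds a raw heat map; dividing each row by its diagonal value gives the volume-normalized view.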
There are a couple of different things I used.
R and Python, which are very similar for this kind of work.
And a lot of the visualizations were done using them, too.
And then you saw the last visualization.
That was recommended by our CSO.
He said, I want Harvey balls.
You know those little circles that look like percentages, like little pie charts?
So I had to figure out a way to do that.
And I used R to achieve that visualization.
You can see that right here.
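Tom built the Harvey balls in R. As a lightweight, text-only stand-in (not his R code), a score between 0.0 and 1.0 can be snapped to the nearest quarter and mapped to Unicode circle glyphs that show 0, 25, 50, 75, and 100 percent fill.

```python
# Unicode circles for 0, 25, 50, 75 and 100 percent fill.
HARVEY = ["\u25cb", "\u25d4", "\u25d1", "\u25d5", "\u25cf"]


def harvey_ball(fraction):
    """Clamp a score to 0.0-1.0, round to the nearest quarter, return its glyph.

    Note that round() sends exact halves to the even neighbor, so 0.125
    maps to the empty circle and 0.375 to the half-filled one.
    """
    fraction = min(max(fraction, 0.0), 1.0)
    return HARVEY[round(fraction * 4)]
```

This is handy for quick tables or terminal output; for slides, a plotting library gives finer control over size and color.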