Effective threat intelligence requires a combination of sources and techniques, analysts to interpret data, and a platform through which to manage and leverage data. Many people will unwittingly fall into a “threat intelligence trap” when trying to implement a successful threat intelligence program.
Despite the abundance of information available, there is no established conventional wisdom about where to get intelligence, how to verify and triage it, or how to make any of it actionable. Nowhere is this more evident than in Information Systems Security. There are countless options for where and how to get threat intelligence, and very little guidance on finding and applying the tactics suited to your battle.
Threat Intelligence refers to the knowledge of what is malicious in and around your systems, starting with adversaries or actors and extending to the tactics used to breach, exploit or abuse your systems and data. The common format for exchanging that information is the Indicator of Compromise (IOC), which can be easily matched on security products. IOCs are not necessarily the best representation of an actual threat; they are more a compromise with what can practically be developed into detections. In theory, the more intelligence we have the better. That theory quickly breaks down when you consider the quality, context and relevance of the information and how much of it you can practically apply. This is what I refer to as a trap: the never-ending pursuit of the intelligence with the highest relevance and efficiency to protect your assets.
Threat Intelligence has an expanding role in security as newer analysts enter the workforce without years of background as network or system administrators or other traditional experience. With this increased relevancy in the industry, threat intelligence needs to have an increase in quality, context and resolution. A higher percentage of threat intelligence recipients will not know what to do with it unless it comes within a larger context and with recommended actions.
Threat intelligence traps will lure you in with the promise of actionable intelligence but ultimately limit you with narrow or unusable sets of data. Below I’ll explain what these traps are, what options you have for threat intelligence sources, and methods for avoiding those traps.
It’s a Trap!
The most common traps are:
- Out of Scale: A volume of intelligence that is too large to be actionable in enforcement devices and lacks the context to help you prioritize IOCs.
- Low Quality: Data that is outdated, not validated, and/or difficult to verify.
- Low Context: Data that has little more information than “good” or “bad”, making it difficult to respond appropriately in situations that often require non-binary comparisons.
Scale is an issue when dealing with large amounts of data and a finite set of rules that you can apply within protection devices. If you receive a flat list of 700 million IPs and can only process 50 thousand at one time, you’re left with a huge gap and a lot of work to figure out which 50 thousand are most worth that time. The scale of your data impacts two critical aspects of a threat intelligence program: action and analysis.
There are a lot of limits on what you can enforce. The limits differ per product, but no product is capable of handling everything. Whether the protection vector is files, network, email or web, the volume of past, present and changing information is overwhelming to apply. This is one of the main reasons why getting the information most relevant to you is so important. It allows you to clearly decide what is most worth enforcing.
There is also a limit to how much information you can triage, which can be expressed in terms of events or items per analyst. The analyst is the key to making the intuitive decisions for your organization. Once the volume of information grows so far beyond their capacity that they can only touch a small percentage of it, the intelligence rapidly loses effectiveness.
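To make the enforcement limit concrete, here is a minimal sketch of picking the few indicators worth enforcing out of a larger feed. The `confidence` and `relevance` scores are hypothetical fields I am assuming you can derive from your sources and your own environment; a flat list would not come with them.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    value: str         # e.g. an IP address
    confidence: float  # hypothetical 0.0-1.0 score: how sure is the source?
    relevance: float   # hypothetical 0.0-1.0 score: how much does it apply to you?


def select_for_enforcement(indicators, capacity):
    """Return the `capacity` highest-priority indicators, best first."""
    return heapq.nlargest(
        capacity, indicators, key=lambda i: i.confidence * i.relevance
    )


# Illustrative data only (documentation address ranges).
feed = [
    Indicator("198.51.100.7", 0.9, 0.8),
    Indicator("203.0.113.20", 0.4, 0.9),
    Indicator("192.0.2.55", 0.95, 0.1),
    Indicator("198.51.100.99", 0.7, 0.7),
]

# Pretend the enforcement device can only hold two entries.
blocklist = select_for_enforcement(feed, capacity=2)
print([i.value for i in blocklist])
```

Multiplying the two scores is just one simple way to rank; the point is that without some kind of context to score on, there is nothing to rank by at all.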
The quality of a data source refers to how well it’s curated, verified, and updated. A lot of threat intelligence is rapidly changing. IPs will be listed as malicious, although this can be somewhat misleading: an IP itself is not bad. That IP represents an activity and a machine that have been temporarily listed as suspicious. An IP can change services, owners, and machines and then be compromised and resolved in a short time span. If the machine behind an IP, the activity it’s attached to and the status of the IP itself haven’t been verified recently, then there’s a chance that information needs to be updated. The quality of data refers to things such as:
- Rigor in listing something as malicious
- How recently it’s validated and updated
- How accurately it’s represented
- How useful that information proves to be
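The "how recently it's validated" item is the easiest one to check mechanically. The sketch below separates fresh indicators from stale ones; the seven-day window, the dates and the `observations` mapping are assumptions for illustration, not a recommended policy.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(last_verified, now, max_age=timedelta(days=7)):
    """True if the indicator was verified within `max_age` of `now`."""
    return now - last_verified <= max_age


# Illustrative data: indicator -> when it was last verified as malicious.
now = datetime(2024, 1, 15, tzinfo=timezone.utc)
observations = {
    "198.51.100.7": datetime(2024, 1, 14, tzinfo=timezone.utc),
    "203.0.113.20": datetime(2023, 11, 2, tzinfo=timezone.utc),
}

fresh = [ip for ip, seen in observations.items() if is_fresh(seen, now)]
stale = [ip for ip in observations if ip not in fresh]

print("enforce:", fresh)
print("re-verify before use:", stale)
```

A stale entry is not necessarily wrong; it just needs to be verified again before you trust it.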
Context is a very broad attribute of intelligence. It represents everything peripheral to the primary data that makes that data actionable in a given situation. This can be anything from correlated world events to the target of a spear phishing campaign. For the purposes of this post, we will focus on only 2 significant items of context: target scope and multiple IOC relationships.
Target scope: Opportunity vs Interest
This is a key area of context. The majority of activity witnessed and acted upon originates from broad-spectrum attacks that seek to install malware on easily exploited machines. The goal is to infect as many systems as possible for later use in a more focused attack. These phishing campaigns are rather sloppy because the recon is blatant and predictable, but still worth protecting against. Urgency would be much higher if you knew a specific adversary was targeting your organization and already had the credentials to then try and escalate their privileges within your infrastructure. The targets of opportunity will be attacked by an automated and indiscriminate process, while targets of interest will be pursued by more interactive and persistent attacks. Protecting against them requires similar yet distinct efforts, making it critical to understand the difference.
Victim (Target) is a standard label in a popular analysis model known as the Diamond Model. The other 3 points in that model are Adversary, Capabilities and Infrastructure. Indicators that carry relationships to any of these are immediately more valuable than a flat list of data.
Multiple IOC relationships
To make information more easily shared and processed by security products, most indicators will be formatted into flat lists. This inadvertently removes valuable context on common adversaries, infrastructure, campaigns, targets and more that could help analysts determine how high of a priority these indicators should have. This is a huge pitfall, especially considering that these IOCs were not created or detected in isolation. Reconstituting relationships between various indicators and activities consumes valuable time that could otherwise be spent responding to threats.
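As a rough illustration of what is lost in a flat list, the sketch below keeps the four Diamond Model points as optional fields on each indicator and then groups indicators that share an adversary. The class name, field names and sample values are all hypothetical; real sharing formats carry this kind of context in their own ways.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LinkedIndicator:
    value: str
    kind: str  # "ip", "domain", "hash", ...
    # Diamond Model points, kept when the source provides them.
    adversary: Optional[str] = None
    capability: Optional[str] = None
    infrastructure: Optional[str] = None
    victim: Optional[str] = None


def group_by_adversary(indicators):
    """Group indicator values under the adversary they are linked to."""
    groups = defaultdict(list)
    for ind in indicators:
        groups[ind.adversary or "unattributed"].append(ind.value)
    return dict(groups)


# Illustrative data only.
indicators = [
    LinkedIndicator("198.51.100.7", "ip",
                    adversary="group-a", infrastructure="vps-cluster-1"),
    LinkedIndicator("login-example.test", "domain",
                    adversary="group-a", capability="credential phishing"),
    LinkedIndicator("203.0.113.20", "ip"),  # what a flat list gives you
]

related = group_by_adversary(indicators)
print(related)
```

With the relationships intact, an analyst who sees one indicator from "group-a" can immediately pull the rest instead of rebuilding the connection by hand.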
There are 7 categories we will divide threat intelligence sources into for the purposes of this discussion. Each presents its own challenges and benefits in relation to the traps discussed above.
- Open Source
- Client Sourced
- Crowd Sourced
- Internal Analysts
- Automated Harvesting
- Trusted Groups
- Purchased
Open Source
Summary: Volume=High, Quality=Low, Context + Relevance=Low.
Open Source is free and open to use as you see fit. Adversaries can easily check for their presence on it. You can integrate it into your collection and applications for processing and ad-hoc lookups. It’s shared intelligence that’s open to contributions, use and poisoning from anyone. There is very little context to determine relevance, but there’s a lot of data to start with.
We only need to be concerned about IPs from these countries: namely, all of them. Without a method of prioritization the data quickly becomes overwhelming.
There are currently 655 million IPv4 addresses represented above, which is roughly 17% of the routable IPv4 space. The country of an IP isn’t typically a good indicator of threats because the adversaries behind attacks don’t need to be in the same location as the machines used in attacks. If an attack passes through multiple hops, only the last one is seen at the point of detection. This illustrates the scale and quality of open source data. This is only one vector of several that you can obtain publicly, and it’s way beyond what can be applied to firewalls or other enforcement systems. Even if you had the capability to do enforcement at this scale, you wouldn’t necessarily want to unless you had some confidence as to the quality of the data. Data quality is a big problem for open source intelligence; there are no standards or accountability. Open source intelligence might be used as a resource for narrowing down options when investigating threats so long as it’s cross-referenced with other information. There are hundreds of open source intelligence feeds that can be adapted to, although many of them duplicate each other without verification.
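The arithmetic behind those proportions is quick to check. The routable total below is my assumption (about 3.7 billion addresses once reserved ranges are set aside), and the 50 thousand capacity is the example limit used earlier in this post.

```python
LISTED = 655_000_000           # addresses in the open source data
ROUTABLE_IPV4 = 3_700_000_000  # assumption: approximate publicly routable space
ENFORCEMENT_CAPACITY = 50_000  # example device limit from earlier in the post

share_of_routable = LISTED / ROUTABLE_IPV4
share_enforceable = ENFORCEMENT_CAPACITY / LISTED

print(f"Listed share of routable space: {share_of_routable:.1%}")
print(f"Share of the list you could enforce: {share_enforceable:.4%}")
```

Under those assumptions the list covers between 17 and 18 percent of routable space, while the device could hold less than a hundredth of a percent of the list.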
Client Sourced
Summary: Volume=High, Quality=Medium, Context + Relevance=Low.
Client Sourced intelligence is commonly integrated into most security products. Information gathered from the product is sent back to the company, processed and turned into signatures to deploy back to the product. This is ideal because it’s well integrated without any client intervention. Due to a fear of false positives affecting business, enforcement is reserved for only the highest confidence intelligence. The rest of the low confidence or contextual decisions can be difficult to determine and also difficult to integrate with other intelligence. This works great in products like endpoint protection where detected files or behaviors are good or bad with some certainty, and they handle the curation and maintenance.
It falls short when:
- You need to use the information outside of the product, such as correlating with other attack vectors.
- You need to more closely manage the data to meet your needs, such as integrating outside intelligence.
- You need to understand the origin, context and scoring process of the intelligence provided. To protect the privacy of other clients and intellectual property this is usually not tracked.
Crowd Sourced
Summary: Volume=High, Quality=Low, Context + Relevance=Low.
Crowdsourced data is very similar to open sourced but done with specific communities or applications. Open source data refers to licensing and allowed use of data. Crowdsourcing is typically not open sourced for use but very open in collection. It shares a lot of issues with open sourced data with the added problem that the participants are often unqualified to contribute constructively. The best example I can think of is from this popular site where anyone can comment on how trustworthy various websites are. You get a lot of things that reflect people’s opinions of the company behind it, or their interaction with a representative. These things are not actionable from a security perspective, so they just add noise.
Internal Analysts
Summary: Volume=Low, Quality=High, Context + Relevance=High.
Having your own internal analyst will produce the most relevant results. They are also essential in triaging intelligence from outside sources. This helps drive the high demand for analysts. The problem with relying on analysts is that even the most heroic of them have limitations that fall short of the volume of information that they need to deal with. Even if an organization has the budget to hire a large team of analysts it’s difficult to find them. The following is a commonly cited quote representing the shortage.
“‘The demand for the (cybersecurity) workforce is expected to rise to 6 million (globally) by 2019, with a projected shortfall of 1.5 million,’ stated Michael Brown, CEO at Symantec, the world’s largest security software vendor. Not long before Brown’s statement, the Cisco 2014 Annual Security Report warned that the worldwide shortage of information security professionals is at 1 million openings, even as cyberattacks and data breaches increase each year.”
This shortage leads to the drive to make each analyst as efficient as possible with tools and automation where it makes sense.
Automated Harvesting
Summary: Volume=High, Quality=Low, Context + Relevance=Low.
Automated harvesting is a process that involves honeynets, spamtraps, sandboxes, specific crawlers and other tools assumed to have a high percentage of malicious activity. The information is collected and fills up databases for triage and investigation. This is an excellent source of data to initially analyze and filter. These techniques are used frequently by companies that sell data. While those companies make it their business to filter benign activity from these sources, most organizations whose primary business is not computer security related don’t usually have the people or expertise to invest in it. The other problem with an outside company providing the results is that the context of where it came from, what the lure into the honeypot was and other details are abstracted away.
Trusted Interest Groups
Summary: Volume=Medium, Quality=High, Context + Relevance=Medium.
Security groups that share your context and have analyst teams to triage can be the best balance to gather threat intelligence. The data is generally higher quality, the context is as relevant as your relevance to that group, and the volume is higher than what a single organization could produce. There’s no limit to how many of these threat sharing groups you join, and you can build your confidence in them individually over time. These go by a lot of different names in the industry: ISACs, ISAOs, Trusted Circles, or Private Groups depending on how you come into contact with them.
Purchased
Summary: Volume=High, Quality=Medium, Context + Relevance=Medium.
Purchased data is typically high in volume, high in quality, and has as much context as possible for you to make your own decisions and filtering. They use all the techniques mentioned above to find the information that they can sell to the most customers. The only thing the purchased intelligence is missing is how to apply it to your business. You still need to know what is normal, what intelligence is relevant given the contextual information and how to turn that into enforcement changes. There are a lot of differences in strength between intelligence providers that you need to consider in your purchasing decisions. Some will focus more on fraud, while others have a strength in dark web markets. You probably can’t afford them all at once, and there’s quite a bit that doesn’t overlap.
How to Avoid
So how can you prevent yourself from falling victim to one of these traps? Broadly speaking the answer would be to vary your sources and ensure that your staff has the resources they need. There are a number of solutions to try depending on the size of your organization and sophistication of your equipment. Implementing as many of the methods below as possible will help keep your organization’s threat intelligence program sharp as a steel trap (rather than falling into one).
Combination of Sources
Multiple intelligence sources can help immensely. They can give you comparison, corroboration, and variety. Comparing sources to each other can be revealing in several ways. Having variety gives you more coverage as long as the sources aren’t heavily overlapping for one reason or another. An indicator found in multiple independent intelligence sources is probably both bad and common. Comparing intelligence from trusted organizations to more common sources can give you a sense of target scope.
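Corroboration across sources can be sketched with a simple count of how many feeds list each indicator. The feed names and contents below are hypothetical, and the count only means something if the feeds are actually independent rather than copies of each other.

```python
from collections import Counter


def corroboration(feeds):
    """Count how many feeds list each indicator."""
    counts = Counter()
    for indicators in feeds.values():
        counts.update(set(indicators))  # each feed counts once per indicator
    return counts


# Illustrative data only.
feeds = {
    "open_source": {"198.51.100.7", "203.0.113.20", "192.0.2.55"},
    "purchased": {"198.51.100.7", "192.0.2.55"},
    "trusted_group": {"198.51.100.7", "198.51.100.99"},
}

counts = corroboration(feeds)

# Seen by two or more sources: probably bad and common.
widely_seen = sorted(ip for ip, n in counts.items() if n >= 2)

# Seen only by the trusted group: a hint that it may be specific to
# organizations like yours rather than a broad-spectrum attack.
group_only = sorted(
    feeds["trusted_group"] - feeds["open_source"] - feeds["purchased"]
)

print(widely_seen)
print(group_only)
```

The second list is the target scope comparison: something your peers see that the broad feeds do not is worth a closer look.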
In House Analysis
Analysts can always help improve the quality of information and its relevance to your organization. They just need to be used where they can be most useful, largely for their intuition and experience. Try to reduce any repetition and cross referencing by analysts that could be automated by scripts or tools. Analysts can also help define the logic for automated triage to amplify their efforts.
Some event and information triage is fairly straightforward when you clearly understand your business and what’s normal or acceptable. Anything that can be defined logically and automated will be able to keep up with scale problems.
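Here is a minimal sketch of what "defined logically" can look like: a handful of explicit rules that decide what happens to each event. The event fields, the allow-listed network and the disposition names are hypothetical and would come from your own understanding of what is normal for your business.

```python
import ipaddress

# Hypothetical allow list: networks you already know to be normal for you.
KNOWN_GOOD_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def triage(event):
    """Return a disposition for one event using simple, explicit rules."""
    source = ipaddress.ip_address(event["src_ip"])
    if any(source in network for network in KNOWN_GOOD_NETWORKS):
        return "ignore"    # known, acceptable traffic
    if event["matches_intel"] and event["target_is_critical"]:
        return "escalate"  # send straight to an analyst
    if event["matches_intel"]:
        return "review"    # queue for routine review
    return "log"           # keep for later correlation only


# Illustrative events only.
events = [
    {"src_ip": "192.0.2.14", "matches_intel": True, "target_is_critical": True},
    {"src_ip": "198.51.100.7", "matches_intel": True, "target_is_critical": True},
    {"src_ip": "198.51.100.7", "matches_intel": True, "target_is_critical": False},
    {"src_ip": "203.0.113.20", "matches_intel": False, "target_is_critical": False},
]

dispositions = [triage(e) for e in events]
print(dispositions)
```

Rules like these run at machine speed, so only the "escalate" and "review" events ever need to reach a person.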
Machine learning can be applied to intelligence that can be automated but not with straightforward logic. It requires a given set of classified examples and features to extract from in order to train itself and classify things in the future. This requires a combined effort between analysts and engineers to train and keep in tune. Analysts are essential in providing training sets of data and defining what they think the key features to classify them with are.
Machine Augmented Analysis
Perhaps one of the most effective solutions is to have a seamless integration between people, systems, and products. It’s easy to say, but admittedly much harder to achieve. The goal would be to create a “machine augmented analyst”, where technical advancements allow humans to focus on intuitive leaps rather than everyday maintenance.
Threat Intelligence Platforms and SIEMs
The place where analyst and automation currently meet is at the SIEM and the Threat Intelligence Platform (TIP). The SIEM is the center of events. The TIP is where intelligence is managed by the analyst. Your SIEM and TIP should work well enough together that any events that already correlate to threat intelligence can be viewed in the SIEM while the TIP can still be used to research any probable future threats.
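The correlation between the two can be pictured as a lookup: events from the SIEM are matched against indicators managed in the TIP, and matches come back with the intelligence context attached. The sketch below uses plain dictionaries as stand-ins; it does not reflect the API of any particular SIEM or TIP product, and every field name is an assumption.

```python
# Stand-in for indicators managed in a TIP (illustrative data only).
tip_indicators = {
    "198.51.100.7": {
        "source": "trusted_group",
        "confidence": "high",
        "note": "credential phishing infrastructure",
    },
}

# Stand-in for events collected by a SIEM (illustrative data only).
siem_events = [
    {"id": 1, "src_ip": "198.51.100.7", "action": "login_failed"},
    {"id": 2, "src_ip": "192.0.2.10", "action": "login_ok"},
]


def correlate(events, indicators):
    """Return events whose source IP matches an indicator, with context."""
    enriched = []
    for event in events:
        context = indicators.get(event["src_ip"])
        if context is not None:
            enriched.append({**event, "intel": context})
    return enriched


hits = correlate(siem_events, tip_indicators)
for hit in hits:
    print(hit["id"], hit["src_ip"], hit["intel"]["note"])
```

The analyst sees the event and the reason it matters in one place, instead of looking up each address by hand.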
This is a very simplified workflow. The main concept is that, given all the possible information to make a decision without any busy work, the analyst is central to the workflow for things that require their intuition. Once they make or review decisions they can quickly deploy any changes to the appropriate systems or through appropriate channels. In the future we may see toolsets merge and processes evolve, but the concept of the augmented analyst will remain as long as we cannot process structured information as fast as computers and computers cannot efficiently gain intuitive insight.
Did you identify any traps your organization falls into? Hopefully this helped spark some ideas of how you can overcome them and make the best use of threat intelligence in your security strategies.