Doing Threat Intel the Hard Way - Part 3: Processing Threat Intelligence

Chris Black

December 21, 2016

This is the third post in a series on manual IOC management for threat intelligence. See the previous posts: Part 1: <a href="https://www.anomali.com/blog/introduction-to-manual-ioc-management-for-threat-intelligence">Manual IOC Management</a> and Part 2: <a href="https://www.anomali.com/blog/doing-threat-intel-the-hard-way-capturing-threat-intelligence">Capturing Threat Intelligence</a>.

<h2>Processing Threat Intelligence</h2>

Once captured, threat intelligence data must be processed. Processing includes several steps:
<ul>
<li>Normalization</li>
<li>Deduplication</li>
<li>Storage of indicators</li>
<li>Update, expiration, and removal of old indicators</li>
<li>Scoring and weighting of intelligence</li>
<li>Enrichment of indicators for context</li>
<li>Association of indicators with actors, campaigns, incidents, TTPs, etc.</li>
<li>Tracking and maintenance of actor alias lists</li>
</ul>

If you have chosen more than a handful of feeds, you will likely encounter a variety of formats. If you’re lucky, a feed will use something structured specifically for intelligence, like STIX, OpenIOC, or CybOX. Others will use XML or JSON, which are also structured but were not created specifically for threat intelligence. The rest will arrive as unstructured text in a variety of file formats: .csv or .txt files, PDFs, Word documents, or any other text format. You will need the expertise to normalize these disparate feeds by parsing the required data out of each one. This could require a sophisticated understanding of regular expressions, JSON, and XML. Expect to create a different parser for each of your unstructured sources of intelligence.

You will also need to store the parsed information in a database for later use. To give you a sense of scale for this initial repository, remember that, today, collecting a large number of feeds could result in 10 million or more indicators per day. That number will only increase with time.
Plan accordingly.

Before storage, however, you should de-duplicate your collected indicators to reduce the overall processing load. Take care in the preceding normalization step: incorrectly normalized indicators will not be identical and therefore will not be de-duplicated, resulting in unnecessary load and duplication in later stages of the process. These duplicates could even lead to duplicate alerts and investigations. One way to handle deduplication is to check the database for a particular piece of data before adding it as a new entry. If it is already there, tag the existing entry to note that it was also found in another source, which is useful context.

Once you have normalized, de-duplicated, and stored your chosen indicators, you must maintain the indicators you collected previously, because indicators change over time. Sometimes they change types, such as going from a Scanning IP in March 2014 to a Brute Force IP in May 2015. You need not only to capture and reflect these changes, but also to “expire” indicators after some period of time. This can be an arbitrary time frame that is set globally, say 30, 60, or 90 days, or it can be set individually by indicator type. Be aware, though, that failing to expire indicators promptly will increase your false positives, while expiring them too quickly will increase your false negatives. It is a balance that must be struck, monitored, and adjusted as needed.

Next, you will want to score and/or weight your intelligence in some fashion. Both let you prioritize certain indicators or sources so that you can focus your attention on those first, among the millions of indicators consumed each day. Do you trust one feed more than another? Give it a higher weight, and use that weight in your evaluation rules to prefer information from this source. Do you consider one type of indicator more threatening than another?
Most organizations do, but you will need to define those types yourself, decide how you will classify them, and then incorporate these values and weights into your evaluation of what to present to your analysts.

Scoring and weighting are the first enrichments you will perform on your intelligence data. Because you want to maximize the number of events and incidents your analysts can triage each day, you may choose to enrich your indicators with additional context. Beyond scoring and weighting, enrichment can mean many things. Examples include GeoIP data, WHOIS records, and reports from sites like VirusTotal or Shodan. Basically, anything that helps your analysts make a decision in the shortest amount of time should be considered at this step.

Enrichment challenges include possible costs for commercial enrichment sources, the coding or scripting needed to integrate those sources with your indicator database, and the maintenance of the mechanisms that enable the integration. Each new source of context increases the size of an indicator, so planning should account for the increased storage requirements.

Advanced enrichments might include associations with actors, campaigns, or incidents, and tracking of actor aliases. These further enable analysts to gather all relevant information on an indicator in one place, reducing manual research and enabling more timely decision-making.

Up next in the series: <a href="https://www.anomali.com/blog/doing-threat-intel-the-hard-way-operationalizing-threat-intelligence">Operationalize Threat Intelligence</a>
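
As a rough illustration of the processing steps described in this post, here is a minimal Python sketch of normalization, deduplication with source tagging, expiration, and source-based scoring. Everything in it is an assumption made for the example rather than a recommended design: an in-memory dict stands in for the indicator database, the feed names (`feed_a`, `feed_b`), the lifetimes in `TTL_DAYS`, the weights in `SOURCE_WEIGHT`, and the function names are all hypothetical, and only IPv4 addresses and domains are handled.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical values: lifetime per indicator type and trust per feed.
TTL_DAYS = {"ipv4": 30, "domain": 90}
SOURCE_WEIGHT = {"feed_a": 0.9, "feed_b": 0.4}

IPV4_RE = re.compile(r"^(?:\d{1,3}\.){3}\d{1,3}$")


def normalize(raw):
    """Return (type, value) in one canonical form so duplicates compare equal."""
    # Trim whitespace, lowercase, undo "[.]" defanging, drop a trailing dot.
    value = raw.strip().lower().replace("[.]", ".").rstrip(".")
    if IPV4_RE.match(value):
        return "ipv4", value
    # Crude assumption for the sketch: anything that is not an IPv4 is a domain.
    return "domain", value


def ingest(store, raw, source, seen_at):
    """Add a new indicator, or tag the existing entry with another source."""
    key = normalize(raw)
    entry = store.get(key)
    if entry is None:
        entry = {"sources": set(), "first_seen": seen_at, "last_seen": seen_at}
        store[key] = entry
    entry["sources"].add(source)
    entry["last_seen"] = max(entry["last_seen"], seen_at)
    return entry


def expire(store, now):
    """Remove indicators last seen longer ago than their type's lifetime."""
    stale = [
        key for key, entry in store.items()
        if now - entry["last_seen"] > timedelta(days=TTL_DAYS[key[0]])
    ]
    for key in stale:
        del store[key]
    return stale


def score(entry):
    """Score an indicator by the most trusted source that reported it."""
    return max(SOURCE_WEIGHT.get(source, 0.1) for source in entry["sources"])


store = {}
t0 = datetime(2016, 12, 1, tzinfo=timezone.utc)

# The same domain arrives from two feeds in two different spellings.
ingest(store, "Evil[.]example[.]com", "feed_b", t0)
ingest(store, "evil.example.com ", "feed_a", t0 + timedelta(days=1))
ingest(store, "203.0.113.7", "feed_b", t0)

domain_entry = store[("domain", "evil.example.com")]
domain_score = score(domain_entry)

# 45 days later the IP has outlived its 30-day lifetime; the domain has not.
expired = expire(store, t0 + timedelta(days=45))
```

Because both spellings of the domain normalize to the same key, the second sighting only adds a source tag instead of creating a duplicate entry. In a real deployment the dict would be a database table with a unique constraint on the normalized type and value, and the weights and lifetimes would be tuned against your own false positive and false negative rates.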

Chris Black is a former Senior Sales Engineer at Anomali.