Adversaries are constantly changing and improving how they attack us. In this six-part series we'll explore new or advanced tactics used by threat actors to circumvent even the most cutting-edge defenses.
Domain Generation Algorithms (DGAs) are code that programmatically produces a list of domain names. In most cases, the algorithms behind the malware that generates DGA domains vary just two elements when creating domains:
These algorithms produce command and control domains, which are used to communicate with malware-infected machines. Often these domains are nonsensical, such as sndjfnin.com. In other cases, DGAs like Oderoor and Bobax will produce domains on sites that allow third-party domains. This usually includes sites that provide dynamic DNS, and the resulting domains may look more like sndjfnin.dyndns.org. Measurements of domains generated by DGAs provide an understanding of a large cross-section of malware targeting nearly all industries, and include such well-known categories as exploit kits, crimeware, and ransomware.
DGAs are a robust way for malicious actors to protect their ability to get data from a compromised computer back to a computer that they can more easily access. With a network connection between these two computers malicious actors are able to do things like:
DGAs are advantageous for malicious actors in a number of ways. For one, a hard-coded list of domains created by a human may contain a pattern, which makes the domains easier to detect and to extract from malware. An algorithm can instead generate thousands of pseudorandom domains that are difficult for humans to link to one another.
A grossly oversimplified example would be:
The latter obviously look suspicious, but with the prior ones it’s easier to identify what connects each domain. Automatically generating domains instead makes malware authors more nimble. DGA domains ultimately serve to make blocklists ineffective: even if you positively identify and block one, there is still an unknown number of other DGA domains out there. Many DGA implementations will generate hundreds or thousands of domains per day, but only make a few active. This puts a large burden on the defender, who has to stop all of the domains, while minimizing the domain registration effort for the malicious actor. Some DGA domains can also be registered months before they are used, which helps them bypass blocking of newly registered domains.
After all of this discussion of domains, some of you may be rightfully wondering why an IP wouldn’t still be easily identified as the source of thousands of domains. In the majority of cases DGA domains are not hosted on one IP. Malware authors recognized this issue and began pairing DGAs with another technique that shuffles around IPs by using technologies such as Fast-Flux. How rapidly they could change IPs is a contributing factor for why IP blocklists are an aging tool, and another reason that DGA domains are so difficult to detect. This combination of DGAs with IP shifting proved to be the key to getting past defenses.
Domain Generation Algorithms create a constantly moving target that cyber defenders struggle to hit with a blocklist. Part of this is due to how the algorithms are set up and how easy they are to update. All DGAs are based on a static seed and a dynamic seed, which ensures that the domains are constantly changing. These seeds could be anything from today’s date to the 8th most popular topic on Twitter. To make matters more complicated, malicious actors could choose to represent the date in different formats, like 8/31/17 or 083117. However it’s coded, the malware knows what to look for. Nearly all algorithms use different approaches to randomize how they pick the letters in the second-level domain, which is the section of the domain before the “.com”.
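To make the seed idea concrete, here is a minimal sketch of a toy DGA. It is not taken from any real malware family: the function name `generate_domains`, the seed string, and the choice of SHA-256 to turn the seeds into letters are all assumptions made for illustration. It combines a static seed with a date-based dynamic seed (formatted as in the 083117 example) and maps the result to a random-looking second-level domain.

```python
import hashlib
from datetime import date
from typing import List


def generate_domains(static_seed: str, day: date, count: int = 5,
                     length: int = 10, tld: str = ".com") -> List[str]:
    """Toy DGA: derive `count` pseudorandom domains from a static seed and a date."""
    if not 1 <= length <= 32:
        raise ValueError("length must be between 1 and 32 (SHA-256 yields 32 bytes)")
    # Dynamic seed: the date rendered as MMDDYY, e.g. 083117.
    dynamic_seed = day.strftime("%m%d%y")
    domains = []
    for index in range(count):
        material = f"{static_seed}:{dynamic_seed}:{index}".encode()
        digest = hashlib.sha256(material).digest()
        # Map each digest byte to a lowercase letter to build the second-level domain.
        label = "".join(chr(ord("a") + byte % 26) for byte in digest[:length])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    # Hypothetical static seed; the same seed and date always yield the same list.
    for name in generate_domains("example-campaign", date(2017, 8, 31)):
        print(name)
```

Because both the malware and its operator can run the same function with the same seeds, they arrive at the same list of domains without ever exchanging it, while a different date produces an entirely different list.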
Some DGA domain names can even be entirely word-based, which creates a significant problem for those trying to identify them. Sdkfjdi.com looks odd, but birddog.com does not. Random-character DGAs are more common than word-list DGAs because word-based domains are harder to create and register: many of them collide with pre-existing domains, which complicates the registration effort. By our count, algorithms that generate entirely word-based domains account for only about 5% of all known DGA-capable malware families.
Malicious actors can also change how long these domain names are active. In the majority of cases they’re active for only one to three days, although the potential lifespan of DGA domains appears to be increasing. Five years ago, most had a characteristic lifespan of three days or less, but now DGA domains lasting even 40 days are somewhat prevalent. Some may even endure beyond that mark. Whatever the lifespan is, a blocklist largely proves ineffective because these domains will expire and others will immediately take their place.
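A word-based generator can be sketched in the same spirit. The word list, the function name `word_domain`, and the hashing scheme below are hypothetical and chosen only for illustration. The point is that the output is built from dictionary words, so it reads like an ordinary domain name.

```python
import hashlib
from datetime import date

# Hypothetical word list; a real family would presumably ship a much larger one.
WORDS = ["bird", "dog", "river", "stone", "cloud", "maple", "silver", "garden"]


def word_domain(static_seed: str, day: date, index: int, tld: str = ".com") -> str:
    """Toy word-based DGA: pick two words using bytes of a seeded hash."""
    material = f"{static_seed}:{day.isoformat()}:{index}".encode()
    digest = hashlib.sha256(material).digest()
    first = WORDS[digest[0] % len(WORDS)]
    second = WORDS[digest[1] % len(WORDS)]
    return first + second + tld


if __name__ == "__main__":
    for i in range(3):
        print(word_domain("example-campaign", date(2017, 8, 31), i))
```

With only eight words this sketch can produce just 64 distinct names, which hints at the registration problem described above: a small dictionary collides quickly with domains that already exist.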
The evolution of DGAs is a traditional cat and mouse game between malware authors and cyber defenders. In the late 1990s, malware began proliferating across the Internet. Its authors noticed that once their malware was installed on a computer, security analysts would simply block the IP addresses that its outbound traffic was sent to. Blocking IP addresses was straightforward because it took place on the router, which was required for internet connectivity. In response to this, malware authors began to use domain names for identifying their infrastructure. Network defenders in turn began filtering domain names at proxies and DNS stub resolvers, so hard-coding domains proved to be an ineffective measure. Rather than calling out to a fixed list of domains, malware authors developed a way to generate domains that could not easily be identified.
In 2008, the Conficker botnet was the first malware botnet to use DGAs. Conficker.A generated 250 domains per day in order to remove defenders’ ability to discover and block the malware communicating with the C2 infrastructure.
For the past few decades security has been based on signature-based or indicator-based blocking. This is less effective against something like DGAs, where the indicator is constantly changing. Some organizations publish lists of DGA domains as a remediation measure, but unlike other indicators these will usually expire within 24-48 hours.
One approach that people take is to try to reverse engineer DGAs. While it can be successful, this method is ultimately inefficient because each family has an almost entirely different algorithm. You would also need to be able to identify every family, which is impossible because new families are developed every day. From a mechanical standpoint, a new giant list of domains each day is too much for a computer to sift through, and that is before taking into account that every malware family, each with its own algorithm, would be spitting out that many domains per day. It also takes a huge investment of human time and effort to reverse engineer these algorithms. There simply are not enough trained professionals to operate at scale. Regardless of the technology or expertise applied to the task, the malware can always be changed and updated, effectively canceling out any reverse engineering efforts.
At Anomali, our approach is to focus on detection via pattern matching, where incoming domains are analyzed in real-time to find statistical patterns of DGA characteristics. This approach doesn't suffer from any of the drawbacks listed above, and our product Anomali Enterprise can perform this detection immediately upon deployment.
DGA Domain Matches in Anomali Enterprise
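As a rough illustration of what statistical patterns in a domain name can look like, the sketch below computes a few simple features of the first label of a domain. This is not Anomali's detection method, which the article does not describe. The features (length, Shannon entropy, vowel ratio, longest consonant run), the function names, and the thresholds in `looks_generated` are all assumptions picked for a toy example.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def dga_features(domain: str) -> dict:
    """Compute toy features on the first label (text before the first dot)."""
    label = domain.lower().split(".")[0]
    longest = current = 0
    for ch in label:
        if ch.isalpha() and ch not in VOWELS:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # vowels, digits and hyphens end a consonant run
    vowel_ratio = sum(ch in VOWELS for ch in label) / len(label) if label else 0.0
    return {
        "length": len(label),
        "entropy": shannon_entropy(label),
        "vowel_ratio": vowel_ratio,
        "longest_consonant_run": longest,
    }


def looks_generated(domain: str, max_run: int = 4, min_vowel_ratio: float = 0.2) -> bool:
    """Arbitrary illustrative rule: long consonant runs or very few vowels."""
    features = dga_features(domain)
    return (features["longest_consonant_run"] >= max_run
            or features["vowel_ratio"] < min_vowel_ratio)


if __name__ == "__main__":
    for name in ["sndjfnin.com", "sndjfnin.dyndns.org", "birddog.com"]:
        print(name, dga_features(name), looks_generated(name))
```

With these thresholds, sndjfnin.com is flagged and birddog.com is not. That second result is the limitation noted earlier for word-based DGAs: simple character statistics alone cannot separate them from legitimate names, so a practical detector would need richer features and a trained model.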
Threat actors are constantly changing their tactics, techniques, and procedures. While we can never exactly predict what these changes might be, we can better equip ourselves to meet these challenges by working collaboratively across industries and areas of expertise.
Evan Wright is a principal data scientist at Anomali, where he focuses on applications of machine learning to threat intelligence. Before Anomali, he was a network security analyst at the CERT Coordination Center and a network administrator in North Carolina. Evan has supported customers in areas such as IPv6 security, ultra-large scale network monitoring, malicious network traffic detection, intelligence fusion, and other cybersecurity applications of machine learning. He has advised seventeen security operations centers in government and private industry. Evan holds an MS from Carnegie Mellon University, a BS from East Carolina University, a CCNP, and six other IT certifications. Twitter: @evanwright