In the past, the Windows Portable Executable (PE) format has been analyzed far and wide because of the platform's historically large-scale adoption. In contrast, the Mach-O binary format (the executable file format used by Mac OS X, iOS, and other Mach-based systems) has received much less attention. This is partly due to the comparative lack of malware and forensics engagements focused on Mach-O platforms. As Apple's operating systems gain adoption, this is changing. A few weeks ago, Kaspersky's GReAT released a report on a Mach-O variant of the ekoms spyware.
There are many methods for finding similar PE files, one of which is described below. This post describes a new method for finding similar Mach-O files.
The Mach-O executable format serves the same purpose as the Windows PE format but differs significantly in structure. For starters, Mach-O FAT files can include binaries for multiple architectures in a single package, as shown in the simplified overview of the format in Figure 1. Within each Mach-O binary, code and data are organized into segments that contain sections; in the Windows PE format, by contrast, there are multiple PE sections, which can be named by the compiler.
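To make the FAT layout more concrete, here is a minimal Python sketch that reads a FAT header and lists the architecture slices it describes. It assumes the 32-bit fat_header/fat_arch layout from Apple's `<mach-o/fat.h>` (all fields big-endian); the function and class names are our own, and the 64-bit FAT variant is not handled.

```python
import struct
from typing import List, NamedTuple

# Big-endian magic at the start of a 32-bit FAT header.
# Note: Java .class files share this magic; real tools add sanity checks.
FAT_MAGIC = 0xCAFEBABE

# A few well-known cputype values from <mach/machine.h>
CPU_TYPES = {
    7: "i386",
    0x01000007: "x86_64",
    12: "arm",
    0x0100000C: "arm64",
    18: "ppc",
}


class FatArch(NamedTuple):
    cputype: int
    cpusubtype: int
    offset: int  # file offset of this architecture's Mach-O slice
    size: int    # size of the slice in bytes
    align: int   # alignment as a power of two


def parse_fat_header(data: bytes) -> List[FatArch]:
    """Return the architecture slices described by a FAT (universal) header."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a FAT Mach-O file (magic %#x)" % magic)
    arches = []
    for i in range(nfat_arch):
        # Each fat_arch record is five big-endian uint32 fields (20 bytes),
        # starting right after the 8-byte fat_header.
        fields = struct.unpack_from(">IIIII", data, 8 + i * 20)
        arches.append(FatArch(*fields))
    return arches
```

A thin (single-architecture) binary instead begins with a mach_header magic such as MH_MAGIC_64 (0xFEEDFACF), so this sketch raises ValueError for it and the file can simply be treated as a single entity.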
Figure 1: Simplified Mach-O FAT file format
In 2014, FireEye released Import Hashing (ImpHash) as a tool for analyzing the Windows Application Programming Interface (API) functions used by Windows PE files. ImpHash was integrated into the VirusTotal platform shortly afterward and has been a favorite pivoting tool of analysts ever since.
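For reference, ImpHash is essentially an MD5 over the ordered, comma-joined list of imported library.function names. The sketch below roughly follows the approach used by the pefile library's `get_imphash()`, but works on an already-extracted list of (DLL, function) pairs; resolving imports by ordinal, which real implementations handle, is omitted, and the `imphash` function here is our own simplified illustration.

```python
import hashlib
from typing import Iterable, Tuple


def imphash(imports: Iterable[Tuple[str, str]]) -> str:
    """Simplified ImpHash over (dll_name, function_name) pairs in import-table order."""
    parts = []
    for dll, func in imports:
        lib = dll.lower()
        # Drop common library extensions, e.g. "KERNEL32.dll" -> "kernel32".
        base, _, ext = lib.rpartition(".")
        if base and ext in ("dll", "ocx", "sys"):
            lib = base
        parts.append(f"{lib}.{func.lower()}")
    # Import-table order is preserved (not sorted), so reordering changes the hash.
    return hashlib.md5(",".join(parts).encode()).hexdigest()
```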
While Anomali Labs was experimenting with many different hashing methods to aid in analyzing Mach-O binaries, we found most of them to be extremely brittle. As we gained an understanding of the Mach-O format, we realized that hashing any single Mach-O segment produced neither a 1:1 counterpart to the Windows PE ImpHash nor anything as powerful as that method.
Anomali Labs used open source libraries such as macholib and the machoinfo library from CRITs. Macholib provides an API call to retrieve the hash of different Mach-O segments. Because each segment contains a large number of sections, we also tried hashing sections individually; the machoinfo library from CRITs provided an easy-to-use interface for retrieving Mach-O section hashes. This technique worked better than segment hashing, but because of the wide variety and placement of Mach-O sections, comparing sections well was complicated. Additionally, sections that were semantically the same could produce different hash values because of additional data included with the section.
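The brittleness of section hashing is easy to see with a toy example. This sketch hashes raw section bytes supplied as a dictionary keyed by "segment,section" name; in practice the bytes would come from a parser such as macholib or machoinfo, and the section contents below are hypothetical.

```python
import hashlib
from typing import Dict


def section_hashes(sections: Dict[str, bytes]) -> Dict[str, str]:
    """MD5 of each section's raw contents, keyed by 'segment,section' name."""
    return {name: hashlib.md5(data).hexdigest() for name, data in sections.items()}


# Two hypothetical builds whose __cstring sections hold the same strings,
# except that the second build appends one extra string.
build_a = {"__TEXT,__cstring": b"connect\x00/tmp/.log\x00"}
build_b = {"__TEXT,__cstring": b"connect\x00/tmp/.log\x00v2\x00"}

hashes_a = section_hashes(build_a)
hashes_b = section_hashes(build_b)
# One extra string is enough to produce a completely different hash.
print(hashes_a["__TEXT,__cstring"] == hashes_b["__TEXT,__cstring"])  # False
```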
Figure 2: Symhash Code
SymHash follows the same methodology as ImpHash, but for Mach-O executables. Using the CRITs machoinfo library, we retrieve the symbol table from each Mach-O entity (that is, each architecture's binary within the Mach-O FAT file). Then, for each symbol, we check whether it is an ‘external’ symbol (one that is resolved in an external library) with an n_type of 0x00. If so, we append the symbol's name to a list, and finally compute the MD5 hash of that list of externally referenced symbols.
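The steps above can be sketched in Python as follows, assuming the symbol table has already been extracted (for example with machoinfo or macholib) into (name, n_type) pairs. The sketch interprets "external with n_type 0x00" as the N_EXT bit set and the N_TYPE bits equal to N_UNDF, using the masks from Apple's `<mach-o/nlist.h>`. Sorting the names and joining them with commas before hashing are our assumptions, so hashes from this sketch are not guaranteed to match those of the published symhash tool.

```python
import hashlib
from typing import Iterable, Tuple

# n_type bit masks from Apple's <mach-o/nlist.h>
N_STAB = 0xE0  # nonzero for debugging (stab) entries
N_TYPE = 0x0E  # mask for the type bits
N_EXT = 0x01   # external symbol bit
N_UNDF = 0x00  # type value: undefined here, resolved elsewhere (e.g. a dylib)


def is_imported(n_type: int) -> bool:
    """True for external, undefined, non-stab symbols."""
    return (
        (n_type & N_STAB) == 0
        and (n_type & N_EXT) != 0
        and (n_type & N_TYPE) == N_UNDF
    )


def symhash(symbols: Iterable[Tuple[str, int]]) -> str:
    """MD5 over the sorted, comma-joined names of externally referenced symbols.

    Sorting and the comma separator are assumptions of this sketch.
    """
    names = sorted(name for name, n_type in symbols if is_imported(n_type))
    return hashlib.md5(",".join(names).encode()).hexdigest()
```

Sorting makes the hash independent of symbol-table order, whereas ImpHash preserves import order.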
SymHash lets an analyst find additional executables that share exactly the same set of externally referenced symbols. It fails in the same situations as ImpHash, including:
- Programs with simple functionality that reference very few symbols from dynamic/external libraries.
- Packed programs. While packing is much less prevalent in Mach-O executables because of the executable architecture and restrictions on read/write/execute sections, Xcode offers a built-in form of copy protection that is used in practice.
- Unrelated programs that happen to reference the same set of dynamic/external library symbols, and therefore share a hash.
In our experiment, a corpus of 1140 files produced many different segment, section, and symbol table hashes. In comparison tests, the symbol table hashes proved much more useful for connecting files. The 1140 Mach-O files contained 1450 entities, which resulted in 586 unique symbol table hashes. Many of these symbol tables are found in Potentially Unwanted Applications and adware, which make up the vast majority of Mach-O software that triggers AV detections. One malware cluster, identified in Table 1 below, has the symhash 9190c790e6404b0eacfd05402ad85e0e, which corresponds to the NetWire RAT.
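Counts like these come from grouping entities (not files) by symhash. A minimal sketch of that bookkeeping, using hypothetical (file SHA1, architecture, symhash) record tuples and helper names of our own, might look like this; clusters shared by more than one distinct file are the useful pivots.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str, str]  # (file_sha1, arch, symhash), hypothetical layout
Clusters = Dict[str, List[Tuple[str, str]]]


def cluster_by_symhash(entities: Iterable[Record]) -> Clusters:
    """Group (file_sha1, arch) entities by their symhash."""
    clusters: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for sha1, arch, sh in entities:
        clusters[sh].append((sha1, arch))
    return dict(clusters)


def multi_file_clusters(clusters: Clusters) -> Clusters:
    """Keep only symhashes shared by more than one distinct file."""
    return {
        sh: members
        for sh, members in clusters.items()
        if len({sha1 for sha1, _ in members}) > 1
    }
```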
Malware SHA1 Hash | Symbol Table MD5 Hash
Table 1: Samples of the NetWire RAT which share the symhash of 9190c790e6404b0eacfd05402ad85e0e
If you would like to use symhash in your analysis, check out our newly open-sourced (MIT license) library and command-line tool at https://github.com/threatstream/symhash/. Feedback and pull requests are welcome.