“Gophering”—How to Catch Cyber Gophers, Not the Bill Murray Way: Detect ‘19 Series

After you have watched this webinar, please feel free to contact us with any questions you may have at general@anomali.com.
Transcript
JOAKIM KENNEDY: Thank you guys for showing up.
So this talk is "Gophering," which is how to catch gophers, but not the Bill Murray way.
The idea behind this talk is to go over the binaries that are produced by the Go compiler.
So it will be a little bit technical, but bear with me.
There are some interesting goodies that you can get out of this.
So who am I?
My name is Joakim Kennedy.
I am a threat intelligence manager, and I work on the Anomali Threat Research team.
And we produce a lot of content.
If you're a reader of Anomali's blog, most of the threat research comes out of our team.
We also do a weekly digest.
It goes out as a community threat bulletin every week.
In general, I like to do research.
I definitely like to track bad guys, see what they're doing, and force them to keep evolving.
In addition to that, I also like to build stuff.
So this is research that I did, and what came out of it is a toolkit that you can use to analyze these types of binaries.
I'll also show how it interacts with some other tools that are commonly used by security researchers.
So here's the agenda.
First we're going to do a bit of a crash course in Go, because it's a new language.
It has some quirks, which partly come from why it was created.
Then we'll dig into how we can recover function information, which is something you want to be able to do if you're analyzing malware or suspicious binaries.
The other thing we'll look at is types.
They're a good way to understand what a binary is doing from the data structures it has.
And then we'll look at some practical use cases where we put these together and get some interesting information out of it.
So what is Go?
It's a new programming language that was actually developed inside Google.
It started around 2007 with Robert Griesemer, Rob Pike, and Ken Thompson.
Some of these names might be familiar.
Ken Thompson was one of the original creators of Unix.
Rob Pike was also at Bell Labs and worked on Plan 9, the successor to Unix.
And Robert Griesemer, if you're into JavaScript, worked on V8, the JavaScript engine in Chrome that is also used in Node.js.
So three pretty distinguished computer scientists.
One of the things that led to this new language, as the story goes, was that Rob Pike was sitting in his office working on an internal Google service that was written in C++.
It was a highly multithreaded application, and every time he made a change, it took 45 minutes to compile and test.
And sitting there in the 21st century, they thought: there must be a much better way to do this.
So that's where it started.
What came out was a memory-safe, garbage-collected language that's also statically typed.
In essence, you can look at it as a new Java or C#.
That's roughly where it sits in the space.
It's not directly a C or C++ replacement.
It's more for commercial and enterprise use.
And it is designed for the 21st century, so it handles multithreading and concurrency, and it has a very good network stack.
It's statically linked, which means that when you build the binary, all of the dependencies it needs are pulled into the binary.
So when you want to run the application on another machine, you just move that one file over, and it should have everything it needs.
If you're a threat actor, this is really nice, because you don't have to worry about whether the target machine has the right libc version or whether all the dependencies are there.
It's also very easy to cross-compile.
You just set two environment variables before you compile, GOOS and GOARCH, for the target operating system and architecture.
It will spit out a binary for that target, which solves a lot of the toolchain issues that C and C++ have with cross-compiling.
Very useful if you're a bad guy.
The language has a clean design.
It looks quite familiar, with a touch of a scripting language.
So this is how you would write Hello World.
All code is part of a package.
This one is main.
And we have a function called main, so it will be main.main.
All it does is import the fmt (format) package and print Hello World.
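A Hello World along those lines is sketched below. The small greeting helper is our own addition so the output can be checked; the slide simply calls fmt.Println with the string.

```go
// All code lives in a package; the program entry point is main.main.
package main

import "fmt" // the standard library's fmt ("format") package

// greeting is a hypothetical helper that returns the text to print.
func greeting() string { return "Hello World" }

func main() {
	fmt.Println(greeting())
}
```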
Now if we look under the hood (let me point this out), the first bit here is a check of the stack, which otherwise jumps down to here.
Go uses small, growable stacks.
That's to handle a lot of the concurrency: you can spin up a lot of goroutines, and each of them gets a small stack to keep things efficient.
So each function checks whether it has enough stack space and, if not, calls into the runtime to grow the stack.
This is very characteristic of a Go binary.
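A sketch of why those prologue checks matter, assuming default runtime settings: each goroutine starts with a small stack of a few kilobytes, yet deep recursion still works because the check in every function prologue lets the runtime grow the stack on demand. The function names and sizes are illustrative.

```go
package main

import "fmt"

// depth recurses n times. Each call passes through the prologue's stack
// check; when the goroutine's small stack runs out, the runtime allocates
// a larger one and copies the old frames over.
func depth(n int) int {
	if n == 0 {
		return 0
	}
	return 1 + depth(n-1)
}

// runMany starts count goroutines that each recurse n levels deep and
// sums their results.
func runMany(count, n int) int {
	results := make(chan int, count)
	for i := 0; i < count; i++ {
		go func() { results <- depth(n) }()
	}
	total := 0
	for i := 0; i < count; i++ {
		total += <-results
	}
	return total
}

func main() {
	fmt.Println(runMany(100, 5000)) // 100 goroutines x 5000 frames each
}
```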
If you compare the code that comes out to a C equivalent: on the top, you have 32-bit C.
The bottom left is 64-bit C, and on the right, we have Go.
One big differentiator is that Go does not use push and pop instructions.
This column here is the hex representation, which gives you the size of each instruction.
And if you look here, you can see that the Go code is bigger.
It uses a MOV instruction instead of a PUSH or a POP, and that is longer.
It takes more space, but it allows the CPU to optimize, because the moves don't depend on the order in which they happen.
It's optimized for speed over size, whereas when C was designed back in the '70s, size was a big problem, and so was memory space.
So C was designed to have a very small footprint.
In addition to normal calls, Go has a couple of special calls.
One is defer.
A deferred call is just a function call with the defer keyword in front of it.
What it allows is something similar to finally in Java: code that runs once the function returns.
All of these calls are put on a stack, so they run last in, first out.
They're commonly used to close resources such as files, or to do other cleanup, so that's usually where you see them.
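To make the last-in, first-out order concrete, here is a minimal sketch; the order function and its labels are hypothetical. A named result lets the deferred closures append to the value after the return statement has run.

```go
package main

import "fmt"

// order records the sequence in which deferred calls run.
func order() (log []string) {
	for _, name := range []string{"first", "second", "third"} {
		n := name // copy for the closure
		defer func() { log = append(log, "deferred "+n) }()
	}
	log = append(log, "body done")
	return log // deferred calls run after this, newest first
}

func main() {
	fmt.Println(order())
}
```

Calling order() yields [body done deferred third deferred second deferred first]. In real code the deferred call is typically something like f.Close() placed right after a successful os.Open.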
The other special call is what's called a go call.
This is how Go does concurrency.
This is how I would call it: go myfunc().
What happens under the hood is that the runtime has its own scheduler.
So this spins up a very tiny thread, a goroutine, that is not handled by the operating system.
It's handled by the runtime in the binary, and that scheduler allows hundreds of these to be spun up.
They're very cheap.
But if you're not careful, they can lead to memory leaks if they don't return, because you lose control over them.
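One common way to keep goroutines under control is to wait for them explicitly. The sketch below (squares is a made-up example) starts one goroutine per input with the go keyword and uses sync.WaitGroup so that none of them outlives the function.

```go
package main

import (
	"fmt"
	"sync"
)

// squares runs one goroutine per input and waits for all of them, so no
// goroutine is left running (leaked) when the function returns.
func squares(nums []int) []int {
	out := make([]int, len(nums))
	var wg sync.WaitGroup
	for i, n := range nums {
		wg.Add(1)
		go func(i, n int) { // "go" hands this call to the runtime scheduler
			defer wg.Done()
			out[i] = n * n // each goroutine writes only its own slot
		}(i, n)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(squares([]int{1, 2, 3, 4})) // [1 4 9 16]
}
```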
Under the hood, this is the function that's being called.
So when I call my function this way, there's no way to pass the function or its arguments down directly.
According to the code documentation, the arguments to the function you want to call are first placed on the stack, and then the two values you see there, the size of the arguments and a pointer to the function, are put on.
So Go's calling convention puts everything on the stack. (That describes the Go versions covered in this talk; Go 1.17 later moved amd64 to a register-based calling convention.)
It's not done with a push; the values are moved.
That's how it's handled.
So it's slightly different from what you see with C calling conventions, whether POSIX or Windows.
On the return side:
One of the interesting features of Go is that it allows multiple values to be returned.
It's a fairly new concept, although some other languages have used it.
So what I have here is a simple example where the get arg function uses a compiler directive to tell the compiler not to inline it.
It's a very short function, so the compiler would otherwise inline it as an optimization.
What you can see here is where the function is called.
It's expected to return a string.
A string is represented by two values:
a pointer to the data and its length.
And right after the function is called, we see a move from a variable that is calculated from ESP,
where the stack pointer sits.
The color's not right, but it says var4h and then esp.
What this is doing is taking the return value off the stack, so the return values are returned on the stack.
So all the communication between the caller and the callee goes through the stack.
Nothing is returned in the registers.
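A sketch of the kind of function described, using a hypothetical getArg that returns two values. The //go:noinline directive keeps the compiler from inlining it, so the call stays visible in the disassembly. On newer toolchains (Go 1.17 and later on amd64) the results come back in registers, so the disassembly will not match the slides.

```go
package main

import "fmt"

// getArg returns the i-th argument and whether it exists.
// The directive below prevents inlining of this tiny function.
//
//go:noinline
func getArg(args []string, i int) (string, bool) {
	if i < 0 || i >= len(args) {
		return "", false
	}
	return args[i], true
}

func main() {
	s, ok := getArg([]string{"a", "b"}, 1)
	fmt.Println(s, ok) // b true
}
```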
This is all easy when you have symbols, but of course in the real world, bad guys are not nice to you.
They will strip all of this, so you're blind.
But luckily, there's a solution.
Take this modified version of Hello World: instead of printing, it panics, which is a way of crashing the application.
If we compile it with the linker flags -s and -w, -s tells the linker to strip the symbol table, and -w tells it to omit the DWARF debug information.
So technically, this binary should be stripped.
If we run file on it, it tells us that the binary is stripped.
So there shouldn't be any way for us to recover function information, names, or anything like that.
But when you run it, this is the error message that comes out.
First we get the Hello World panic message, but at the bottom, it gives us the path where the file was compiled.
I compiled this one in the temp folder.
It also gives the name of the source file, followed by colon 4, which is the line number.
So even though this binary is stripped, something is mapping the program counter to the line in the source file.
That's great information if you can get it out, and it allows us to do some really interesting analysis.
If you dig deeper, these are all the sections in an ELF binary after it's been stripped.
This data is located in section number 7, .gopclntab.
It's an interesting name for a section.
If you Google around and try to figure out what it is, the closest thing you come to is a man page from Plan 9.
It's the man page for the compiler's output file format, and it talks about the different sections.
One of them is a pc/line table; PC presumably stands for program counter, so it's a program counter to line number table.
And if you read on in the next section, it tells us that this table can be used to recover the absolute source line number for a given program location.
So that suggests we're on the right track with what we're seeing.
We also know that some of the creators of the language, who are still core Go developers, worked on Plan 9.
So this connection looks promising.
Digging further, there's the debug/gosym package.
It's part of the standard library and provides some good helper functions.
There is a test that takes this section and creates a data structure with NewLineTable and NewTable.
The comment says this Table represents a Go symbol table.
It allows translation between names, addresses, and so on.
The very interesting part is that this works even if you don't have the symtab data, which would come from the symbols.
The code checks whether it is nil and skips it, which is good, because otherwise this wouldn't work given that the binaries we see usually don't have the symbols.
But all of this information is still in that table.
So here's what the table looks like.
The comments here are taken directly from the source code.
There have been a lot of changes between 1.2 and 1.13, which is the latest.
What we're interested in is the array of functions, Funcs.
Internally there's a LineTable, and it has a mapping function.
If you give it a program counter, meaning an instruction location, it returns the file name and the line number.
In the other direction, if I give it a file and a line number, it returns the program counter where that code is located.
This is very powerful, because if I can find this table, I can reconstruct a projection of what the source code looks like.
So I'm going to show a demo, and I'll walk through it here.
This is a tool that uses this, and it walks the table.
Let me pause there for a second.
This is Zebrocy.
It's an initial-stage dropper and data collector used by a Russian nation-state actor.
What it usually does is collect some information about the host and what it is.
It usually takes a screenshot, and then the actors usually try to steal credentials for further compromise.
The tool has walked this table, figured out the code that the author wrote, and reconstructed a folder structure.
So you have the initial folder at the top, and then the file names.
Then you have the functions, with the line number where each function starts
and the line number where it ends.
And so on, going down.
Which is quite interesting.
This was one of the first samples they had.
In the second one, they started to obfuscate the function names.
But we already had the line numbers, so it was easy to map them together.
Let's take a look.
Here's another example, a ransomware this time.
Let me move this over.
This is a ransomware that we reported on in July.
This is the information we saw when we first picked it up.
The author called it QNAPCrypt Worker.
As a security researcher, when you see functions like write message, random sequence, make secret, and encrypt, and nothing about decrypt, it sticks out as ransomware.
Given the names, maybe we have a ransomware targeting QNAP.
And lo and behold, we have a ransomware targeting QNAP systems.
Just from the names, you know what kind of malware it is, which lets you triage a lot of files quickly.
And here's another one.
This is a-- oh, sorry about that one.
Let me get it.
This ransomware is called RobbinHood.
It hit a city not too far away from here:
this is the ransomware that took down Baltimore.
I don't know if you can see it,
but to me, as a security researcher who looks at bad stuff all the time, this just screams ransomware.
It seems very malicious.
It just encrypts stuff.
Nothing here decrypts anything.
If something encrypts but never decrypts anything, that is a warning sign to me.
So are things like kill process, create key, and so on.
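The encrypt-but-never-decrypt observation can be written down as a toy triage check. The sketch below is a hypothetical heuristic, not a detection rule; it would run on function names recovered with a tool like the one above, and the sample names are only illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// ransomwareHint flags a list of recovered function names that mention
// encryption but never decryption. It is a triage hint only.
func ransomwareHint(funcNames []string) bool {
	var enc, dec bool
	for _, name := range funcNames {
		n := strings.ToLower(name)
		if strings.Contains(n, "decrypt") {
			dec = true
		} else if strings.Contains(n, "encrypt") {
			enc = true
		}
	}
	return enc && !dec
}

func main() {
	names := []string{"main.writeMessage", "main.makeSecret", "main.encrypt"}
	fmt.Println(ransomwareHint(names)) // true
}
```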
This kind of tool is useful
because you can analyze a lot of malware or suspicious binaries without digging into each binary in detail.
You don't need reverse-engineering experience.
So that's functions.
Let me check what time it is.
We mentioned that Go has a type system.
It's one of the strengths of the language.
I talked quickly about strings.
This is how you define one in the language.
There are two ways of doing it.
The top one just declares it.
With the bottom one, you assign a value directly.
You don't have to say what type it is.
The compiler infers it from what you write.
Under the hood, it's two words:
a pointer to where the data starts,
and the length.
Because of this, normal tools like strings don't handle these files well.
You get massive chunks of text, because Go doesn't use null terminators.
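A small sketch of both declaration styles and of the two-word layout. stringWords is a hypothetical helper, and unsafe.Sizeof is used only to measure the string header, not the text.

```go
package main

import (
	"fmt"
	"unsafe"
)

// stringWords reports how many machine words a string value occupies:
// a pointer to the bytes plus a length, with no terminating NUL byte.
func stringWords() uintptr {
	var s string
	return unsafe.Sizeof(s) / unsafe.Sizeof(uintptr(0))
}

func main() {
	var declared string       // type given explicitly; starts as ""
	inferred := "Hello World" // type inferred by the compiler
	declared = inferred[:5]   // substring shares the bytes, shorter length
	fmt.Println(declared, len(inferred), stringWords()) // Hello 11 2
}
```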
It has arrays.
Arrays are slightly different from C: the size is part of the array's type and has to be known at compile time, so they're actually not commonly used.
Instead there's an abstraction on top called a slice, which is closer to a list in Python, if you program in Python, because it can grow dynamically.
Under the hood it uses arrays, and this is how you define one.
It's represented by three words:
a pointer to the data, the length, and a value called the capacity.
The capacity is how many elements the backing array can hold before the slice has to be reallocated to grow.
This is good to know, because when you see a pointer being passed in followed by two values that are often the same, it's probably a slice of some data.
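A sketch of the three-word slice header and of how capacity behaves; sliceWords is a hypothetical helper. When an append exceeds the capacity, append allocates a larger backing array and copies the elements over.

```go
package main

import (
	"fmt"
	"unsafe"
)

// sliceWords reports how many machine words a slice header occupies:
// pointer to the backing array, length, and capacity.
func sliceWords() uintptr {
	var s []byte
	return unsafe.Sizeof(s) / unsafe.Sizeof(uintptr(0))
}

func main() {
	s := make([]int, 2, 4) // len 2, cap 4
	s = append(s, 1, 2)    // still fits in the same backing array
	fmt.Println(len(s), cap(s), sliceWords()) // 4 4 3
	s = append(s, 3) // exceeds cap: a bigger array is allocated
	fmt.Println(len(s), cap(s) >= 5) // 5 true
}
```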
We can take a quick look at what that looks like if we get to it.
Go supports structs, very similar to C.
The main differences are that there are no semicolons and the order is flipped: the field name comes first and the type after it.
Instead of objects and inheritance, Go uses what's called an interface.
It's duck typing, I think that's what it's called:
if it talks like a duck and walks like a duck, it's probably a duck.
All you have to do to satisfy an interface is implement the same method signatures that the interface requires.
In Java, for example, you have to declare that a class implements an interface.
Go doesn't work that way.
All that matters is whether you have a method with that name
that takes the same arguments and returns the same results, and that's it.
Under the hood, this is handled through a virtual table, like a vtable in C++,
to figure out what the concrete type is and where its methods are,
together with a pointer to the actual value or struct.
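A minimal sketch of implicit interface satisfaction; Quacker, Duck, and Robot are made-up names. Neither type mentions the interface, yet both can be used as one because each has a Quack() string method.

```go
package main

import "fmt"

// Quacker is satisfied by any type with a Quack() string method.
type Quacker interface {
	Quack() string
}

type Duck struct{}
type Robot struct{ Model string }

func (Duck) Quack() string    { return "quack" }
func (r Robot) Quack() string { return r.Model + " says quack" }

// chorus calls Quack through the interface. Each interface value carries a
// pointer to a method table for its concrete type plus a pointer to the data.
func chorus(qs []Quacker) []string {
	out := make([]string, 0, len(qs))
	for _, q := range qs {
		out = append(out, q.Quack())
	}
	return out
}

func main() {
	fmt.Println(chorus([]Quacker{Duck{}, Robot{Model: "R2"}}))
}
```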
In addition to this, Go obviously has all the different integer types: 8-bit, 16-bit, 32-bit, and so on.
But these are the types you deal with most when you actually work with it.
So here is an example:
defining a simple struct with a string and then an integer for an age.
int, for example here, is platform dependent.
If I compile this for 32-bit, it will be a 32-bit integer.
If it's a 64-bit architecture, it will be 64-bit.
The source doesn't tell you the size, which makes it tricky when you're writing parsers for this kind of data, because you have to know the target to handle it appropriately.
When you run this code, this is what gets called.
You see a call to runtime.newobject.
And here's the value that's passed to it.
We load its effective address with LEA, and then it's moved onto the stack.
So that's what's passed in.
This is the allocation in the function where the struct is instantiated.
Looking at that location, this is the hex value.
Looking in the standard library to figure out what this function takes, it takes what's called a _type in the runtime, or an rtype in the reflect package.
This structure is kept in sync between the reflect package, which is used to determine types dynamically, the runtime, and the compiler.
It's how a Go binary keeps track of all its types.
What newobject needs is the size of the type, which is passed to mallocgc further down the stack to allocate the space.
That's the first field.
So reading this table, we see the size, then ptrdata; the comment from the source code says it's the size of the memory prefix that holds all the pointers.
We have a hash of the type, which can be used for comparisons.
Then there are a couple of single bytes.
The most interesting one here is the flag, which gives a little extra information.
We'll use it in later slides.
The kind is really useful.
It tells you what kind of type this is.
In this case it's hex 19, which corresponds to the enum value for a struct.
So if I find this table and this data, I can reconstruct all of the different types.
Then we have some fields used by the garbage collector and for equality comparisons.
And at the bottom, there is an offset to a string.
That's where we get the name of the type.
So if you dereference that offset, you recover the name the author gave the type, which is not something you can usually get from a compiled language.
Looking further, there's a lot more coming after it.
If you look at this, you can see a kind of table.
The entries follow the same structure.
They vary in size, and there seems to be some other data jumbled in between.
Now we know this one is a struct.
If you look in the source code, there's a type called structType, which embeds the type we just looked at.
The next part, this here, points to the package name.
In this case it would say main.
Then we have three values, a pointer and two values that are equal, which looks like a slice.
So this also has a slice in it,
and you can map it out.
The pointer just points to right after the structure.
That's the information about the fields in the struct: each entry has the name of the field and its type,
which is a recursive lookup,
and then the offset, and then it repeats.
Here there are two of those.
So we can read this table and reconstruct the definition of a compiled data structure.
Now we have this thing in the middle.
You might wonder what that is, sandwiched in there.
If you check the flags, they say it's an uncommon type, which seems like an odd term.
But if you read the source, an uncommon type is usually a type that has methods.
In Go, any defined type can have methods attached to it.
And this is how that's represented.
Another structure is attached right after the type that says how many methods there are and where to find them, which gives you a chance to recover method names and function signatures.
So to summarize what's in this table: this whole thing is for just one type, and we can use it to recover structures in a binary.
So here we're looking at the ransomware.
What the tool has done is walk this whole table and recover all these structures.
And we find a structure in the main package, written by the author, with fields named RSA public key and README.
We know this is ransomware.
So it's highly likely that it connects to a C2 server to get some campaign-specific information.
That can be hard to figure out in a binary compiled from C, for example, because you don't have those specific types.
This is-- Linux.Lady, I think, was the first reported malware written in Go by Doctor Web.
And if you're looking at it-- sorry about that.
All of these different types are not written-- was not part of the standard library, but it had some of the goals, attack minor D, and update, and payloads, and things like that.
And I really haven't looked at the malware, but I would guess it scans and compromise and installed minors on Linux machines based on just what I'm reading there.
Followed that was a malware called Linux.Rex.
And then in a similar way, there's a lot of-- this is just structures.
Doesn't include types.
So you can have strings and things like that, too.
But there's a lot.
This one is kind of interesting, because I think it uses a peer to peer network to-- to determine this without any tools is really hard, and you get it sort of for free.
So here's an example.
I'm searching for scanner.
So give you an idea what it's compromising.
So it's trying to do some Drupal.
So this is another kind of malware that was scanning and compromising web front ends and stuff like that.
I think there's HP scanner, we have PHP, you have-- yeah.
And then all nested types and things like that, too.
So it's quite a nightmare if you have this for yourself, you know.
I mentioned methods briefly.
So this is how you define methods in Go.
I have a structure here.
Then, anywhere in the package, you can write methods for it.
You can attach methods to any type you define in your own package, not just structs.
The first set of parentheses
tells the compiler
that this is the receiver.
The asterisk in front of the type says it's a pointer, which means that when this method is called, the value is passed by reference so the method can change it, rather than by value.
This is a common distinction in many programming languages: if you pass something by reference, it can be changed.
If it's passed by value, the function gets its own copy, which it completely owns, so the caller's value isn't changed.
This is a very simple one.
It just returns a string representation of the structure.
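A minimal sketch of pointer versus value receivers; Counter and its methods are made up for illustration. Inc changes the caller's value through the pointer, while IncCopy only changes its own copy.

```go
package main

import "fmt"

type Counter struct{ hits int }

// Inc has a pointer receiver (*Counter): it modifies the caller's value.
func (c *Counter) Inc() { c.hits++ }

// IncCopy has a value receiver: it modifies a copy, so the caller sees no change.
func (c Counter) IncCopy() { c.hits++ }

// String returns a text form of the structure.
func (c *Counter) String() string { return fmt.Sprintf("Counter{hits: %d}", c.hits) }

func main() {
	var c Counter
	c.Inc()     // Go takes &c automatically for the pointer receiver
	c.IncCopy() // no effect on c
	fmt.Println(c.String()) // Counter{hits: 1}
}
```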
And here's the table that comes out.
We looked at this before: this is the uncommon type.
It gives an offset from this data structure to a list, or slice, of method entries.
Each entry gives you the name of the method,
its type, and then two offsets into the text section of the binary where the function code is located:
one used when the method is called through an interface and one for a normal call. So I walked this table for an internal structure in strconv, the string conversion package in the standard library, which has a type called decimal.
When I was going through it, I found it interesting that a lot of the methods have zero as the location of their code.
So I used another tool to look for symbols in a non-stripped binary, and I can't find those functions either.
It matches up.
For example, RoundedInteger is missing.
It's not a symbol there either.
So I have a reference to it, but the code doesn't exist.
This is when you scratch your head and wonder. The answer is method pruning.
Some developer on Stack Overflow had also posted about this.
It turns out that Go started doing method pruning around 1.6, which means the linker works out which methods can never be reached and removes their code.
So the interesting part is that you can extract information about code that isn't there.
There have been interesting cases where a threat actor moved to a newer version but kept the old code, and you can still see that method on the type even though it's never used.
So that's one caveat when you look at these binaries. Another is the receiver.
If you use the asterisk for a pointer receiver, the methods end up on the pointer type.
It's a slightly bizarre aspect of the language that a pointer to a type is its own distinct type, which is a very weird way of thinking about it.
But that's how it works.
This means that methods defined with a value receiver and methods defined with a pointer receiver end up in two different tables, so to speak.
And of course, after 1.6, methods can be removed, which can give you null pointers if you're not careful with your parser.
They also redesigned all of these structures.
For example, everything from 1.7 beta 2 to the current version is the same, with slight modifications.
I think the offset field in the struct field structure now has an extra bit packed into it for additional information.
But otherwise, in general, it's the same.
That beta version is unique, so you have to write a specific check just for it, and 1.6 has a completely different format, so you have to handle legacy versions as well.
Anything older has none of this, or only a limited amount of type information plus some extra steps.
So there's a lot to handle.
So what can you do?
If you put this together, you can use it to build aids and tools for analyzing malware.
This is a tool called radare2, an open source disassembler.
Let's pause for a second.
It can run external tools and connect to their input and output, so the tool can send commands and read the results.
So you call the same tool I demoed; it knows that it's being called from within radare2, takes over the session, extracts all the packages, functions, and types, and starts annotating.
This is a ransomware.
Without any tooling, you're stuck with 5,000 functions and trying to figure out what they're doing.
This way, you can quickly figure out which ones to actually look at.
I'm jumping to the init function, because I know there's some interesting stuff going on there.
Most of this code is put in by the compiler to initialize all of the different libraries being used, and I'm going to jump into init.0 from main.
It can take some flag arguments from the command line.
But the interesting part here is the get info function.
By the way, all of these function names were annotated by the tool, not taken from any symbols, because they're all stripped out.
What we have here is memory being allocated.
And we're using the same tool:
this is just a macro that takes the location this is pointing to and prints out what the structure looks like, which is the one we looked at previously.
So that's the structure being allocated here.
Then we see it call out and set up a SOCKS5 proxy.
It constructs an HTTP client that uses the proxy, creates a new request, and we see a deferred return and the client call.
This is the classic way you do an HTTP request in Go.
It reads the response with a reader and then does some JSON unmarshalling.
This is how a lot of JSON is handled in Go: you pass a pointer to the type to the unmarshaller, and it fills in the structure you have.
So this is the C2 communication that is done first.
It reaches out to get the public key and the ransom note before it does anything else.
So, a quick summary.
Go binaries are big, but they can be tamed if you have the right tooling.
The compiled code has a very distinct look, so once you've seen it and recognize it, you can immediately tell what it is.
There's a massive amount of metadata in these binaries, and because it's created from the source code, you can use it to link samples across different architectures and operating systems.
We can recover function names and type information from stripped binaries.
This also works for private, unexported code,
so even if it's internal and hidden by the coder, we can still extract that information.
I like this part because you can make an educated guess at the source code tree, which gives you an idea of how organized the threat actor is.
Do they throw everything into one file, or do they break it out in a sensible way, like a normal coder would?
And because this is based on the source code, it doesn't matter what the underlying assembly language is.
So you can link samples for, say, MIPS or ARM to x86.
They all have the same structure, and the same goes for different operating systems.
That concludes my talk.
Thanks.