Malware Analysis: Part 1
Written by Mike Sweeney, Senior Incident Response/Forensics Analyst at Critical Defence. Mike has achieved a GREM certification from SANS GIAC twice, and holds a Bachelor of Science degree in Information Security and Forensics from the Rochester Institute of Technology.
Let’s take apart some malware! Over the next week, we’ll be posting a three-part blog series in which I’ll walk through a known malware sample and explain how to get usable data out of it. While the specifics depend on the malware and the analysis environment, the general process can be applied to any binary file. The goal is to help you take a malware binary found in your environment and get actionable indicators from it.
Who is this post for?
This blog post is for anyone who has an interest in malware. You’ll probably understand more and find it easier to follow along if you’ve got at least some programming experience, but even readers who don’t intend to follow to the end may learn a bit about the process. Later on there is some assembly code, so it will help to have some knowledge of that too.
What is malware?
Malware is any type of software that does something “bad.” As you can probably imagine, this is a wide category: common types of malware include rootkits, backdoors, bots, ransomware, and credential stealers. The definition can even extend to something like a Linux kernel module if it does something unintended or suspicious.
Some quick terminology…
You’ll see some of the following terms in this series of blog posts. Here’s what they mean:
- InfoSec: Information Security
- Malicious Actor (or just “Actor”): An individual, or a group of individuals, who are out to do bad things.
- Operating System (or “OS”): The software your computer runs, such as macOS, Windows, Linux, Android, iOS, Unix, and so on.
- Virtual Machine (or “VM”): An OS running within another OS. Software like VMware lets us create virtual disks, configure virtual networks and network interfaces, and manage USB access.
  - Guest OS (or “Analysis Environment”): An operating system installed inside of a Virtual Machine.
  - Host OS: The operating system on which your copy of VMware or VirtualBox is installed.
- Binary: Also known as an executable, EXE, or PE file.
What’s the point of reverse-engineering?
Reverse-engineering (or “reversing”) is the process of taking software apart to figure out how it works.
In the case of malware, this gives us information that is particularly helpful when we’re doing Incident Response:
- Purpose: What does this malware do? What is its goal?
  - Is it attacking another IP address/domain?
  - Is it stealing keystrokes or login information?
  - Is it encrypting our files and demanding payment?
- Attribution: Who wrote this? Is this a general attack, or are we being targeted?
  - Have we heard from this actor in the past?
  - Can we use information learned about this actor to figure out what they might try next?
- Our Weaknesses and Flaws: How did this get into the target’s environment?
  - What kind of damage/havoc did it wreak?
  - Was privilege escalation or another software flaw used?
For example: If we come across some malware on a network which is sending out large amounts of data, it is helpful for us to know what that data is. Is it client data? Is it corporate secrets?
Reverse-engineering helps us figure out what that data is. It also helps us find signs that the malware is installed and active, so that we can catch it before it causes any damage.
Developing your Mindset
Malware is dangerous. We have no idea what a sample is capable of, since we haven’t reversed it yet. If you accidentally execute a sample, it may delete everything on your VM’s hard disk and all attached storage, or it may do nothing at all. Take care when you’re doing analysis.
While doing analysis, feel free to scroll around in our debuggers and follow different strings or functions. Have fun with it – the naturally curious have more fun doing this kind of work.
I recommend focusing on one item at a time. Once you’ve solved what one variable does, you can often use it to determine what a function does, and functions can then help solve other functions and other variables. This can get overwhelming pretty quickly, so don’t get distracted chasing other leads. If you spot something that looks interesting, note its address or name and come back to it later.
Good static analysis takes time. Great static analysis is fast. Here’s what I mean: when you begin digging into a lot of malware samples, you’ll spend a lot of time thinking “Wait, what?” while staring at a group of assembly instructions. You may have studied assembly and be able to read the instructions line by line, but understanding what the different subroutines actually do is difficult. After you’ve been doing analysis for a while, you’ll get a sense of what a particular function is doing much faster, and you’ll recognize the “corners” of the puzzle more quickly. You’ll also know your tools and Windows internals better, which is always a good thing.
Programmers have an easier time becoming reverse-engineers. I got my start with C++, so I already understood some of the more confusing concepts, like pointers, when I began to look at assembly. I understood how a Windows EXE file is built and what it should look like inside, and I recognized a large number of imported functions.
To establish what “weird” is, first determine what “normal” is. When in doubt, grab a binary that you know is safe and examine it. See whether what you’re observing is normal or a deviation. Do research online, ask friends or mailing lists, and so on.
It’s hard to do something for 10,000 hours and still be bad at it. If you’re discouraged, take a break and come back later – even if it’s a few days later. Like most things in life, persistence is rewarded.
What’s our goal?
Before we get into imports, exports, debuggers, and assembly code, we want to make sure we know exactly why we’re doing what we’re doing. I mean, why are we bothering to take apart this sample?
In this series, we’re going to be finding Indicators of Compromise for a specific malware sample. Indicators of Compromise, or IOCs, are specific properties of, or activities undertaken by, the malware. If we deploy security tools on our endpoint machines and servers that can check these properties and activities, we can detect the malware – hopefully before it does some damage.
So, our goal in this series is to find properties and activities of this malware that would help us detect it on a computer: its Indicators of Compromise.
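A file hash is the simplest property-based IOC. As a minimal sketch, assuming Python 3 and exact-match hashing only, the snippet below walks a directory tree and reports files whose MD5 matches a known-bad list. The function names (md5_of_file, sweep) are hypothetical, and the only entry in the list is this sample’s MD5; real endpoint tools check far richer properties and behaviors than a single hash.

```python
import hashlib
import os

# Example IOC list: the MD5 of the Kuluoz sample analyzed in this series.
KNOWN_BAD_MD5 = {"799325a4d5e0a1a620ac06cc7d12fce7"}


def md5_of_file(path, chunk_size=65536):
    """Return the lowercase hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sweep(root, bad_hashes=KNOWN_BAD_MD5):
    """Walk a directory tree and yield (path, md5) for each file whose hash is a known IOC."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                file_hash = md5_of_file(path)
            except OSError:
                continue  # unreadable file (permissions, locks, etc.): skip it
            if file_hash in bad_hashes:
                yield path, file_hash
```

For example, iterating over sweep() on a user’s profile directory and printing each (path, md5) pair would list any exact copies of this sample. Keep in mind that a repacked or recompiled variant has a different hash, which is why behavioral indicators matter too.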
Let’s start our investigation of the sample by figuring out the different methods of analysis. We can use Static or Dynamic analysis to work with malware.
Static Analysis vs Dynamic Analysis
Static analysis is the process of analyzing and reversing a file while it’s not actively running. It’s just a file on disk. We pull it apart and analyze it without running any of the code.
Dynamic analysis is the process of analyzing and reversing a file while it’s actively running. This can get you to your answers faster, but it’s helpful to understand the limitations of this kind of analysis (see below).
I’ve found that what works best for me is a little bit of both, simultaneously. Having a static analysis tool like IDA Pro open while stepping through the binary in x32dbg gives some much-needed orientation when I’m getting lost in a new sample. This isn’t always possible, though, since malware can sometimes detect that it’s being run under a debugger.
What’s our sample?
For this walk-through, I’m going to use an older sample. This variant is part of the Kuluoz family of malware, which were bots serving the Asprox botnet. Asprox has since disappeared, but that doesn’t mean we can’t use this sample to learn something about the process and procedure of analysis. Its MD5 hash is 799325a4d5e0a1a620ac06cc7d12fce7, and it’s available for download on virussign.com.
Since malware analysis is typically destructive to an operating system, most analysts do their analysis in a Virtual Machine. The base OS that you install VMware or VirtualBox into is called your “Host” operating system. The OS that you install inside the VM is called your “Guest” OS.
Note that doing analysis in a guest OS doesn’t always help us. Malware sometimes cares if it’s running in a virtual machine, and will operate differently if it figures it out. For this sample, however, we’re just fine using a VM.
For smaller companies or individual analysts, I suggest working with VirtualBox first: https://www.virtualbox.org
The specifics of downloading, installing, and getting a guest OS running in VirtualBox are outside the scope of this blog post, but I can suggest a resource here: https://www.malwaretech.com/2017/11/creating-a-simple-free-malware-analysis-environment.html I also suggest the following:
- Lock down your VM as much as possible:
  - Disable your network interface until you’re absolutely sure you need it. You don’t want your sample attacking targets or giving bad actors access to your analysis environment. If you need your network interface to be up and active, use your virtualization software’s virtual network feature to create a “host-only” network.
  - Make sure that you’re not sharing any folders with your host OS. If you’re analyzing a sample that hunts down file shares and encrypts them (like many recent ransomware samples), you may accidentally encrypt whatever directory you’re sharing.
  - Make sure that you don’t have any flash drives, external drives, backup drives, or network drives connected to your guest OS.
- Ideally, your host OS and your guest OS shouldn’t be the same OS (for example, they shouldn’t both be Windows). If your sample has a way to ‘escape the sandbox’ (break out of your copy of VMware or VirtualBox), you don’t want it to install itself on your host OS!
- Before you do any analysis, prep your environment. After installing your OS, take an initial snapshot of your VM. This will give you a place to come back to if everything blows up.
- Take another snapshot after you install your tools. You don’t want to have to install and configure your tools all over again if you have to roll back your snapshot!
- Coffee, good music, a solid breakfast: whatever helps you clear your head and put your thinking cap on.
For static analysis of this sample, we’ll use IDA Pro Free (https://www.hex-rays.com/products/ida/support/download_freeware.shtml) and strings, a utility commonly included with macOS and Linux. IDA Pro can also extract strings. For dynamic analysis, we’ll use x32dbg (https://x64dbg.com/#start).
If these tools aren’t already installed in your analysis VM, I suggest installing them now.
First steps: Checking out a Sample
Initial work on a sample typically involves getting a set of information about the binary that might help to inform us later on in the analysis.
Generally, I start with an open text editor on my host OS. I put a date at the top, write a brief statement of what I’m trying to accomplish, and set up a section for information about my sample. I include fields for the file size of my sample, its MD5 and SHA1 hashes, where the sample came from, and what I think its capabilities are.
So, to do this for the sample we’re looking at, I’d write:
Kuluoz Analysis (July 3, 2018)
MD5: 799325a4d5e0a1a620ac06cc7d12fce7
SHA1: b50233ff93c31f8f682ca922e47f6c5eb46be5fc
File Size: 48KB
Source: Infected Computer
Capability: Unknown, probably bot-esque
Then continue to use this file as a scratch pad. Drop addresses, imports, or anything else that you find interesting that might help analysis. The raw information can be worked into something useful later, but this is our temporary workspace.
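Filling in the size and hash fields by hand is error-prone. Here is a minimal sketch, assuming Python 3 is available in your guest OS, that reads (but never executes) the sample and builds a header in the same layout as the notes above. sample_header and its parameters are hypothetical names chosen for illustration.

```python
import hashlib
import os
from datetime import date


def sample_header(path, family="Unknown", source="Unknown", capability="Unknown"):
    """Build a notes header with the sample's size and hashes; the file is only read."""
    md5, sha1 = hashlib.md5(), hashlib.sha1()
    with open(path, "rb") as f:
        # Hash in chunks so large samples don't need to fit in memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            md5.update(chunk)
            sha1.update(chunk)
    size = os.path.getsize(path)
    return "\n".join([
        f"{family} Analysis ({date.today():%B %d, %Y})",
        f"MD5: {md5.hexdigest()}",
        f"SHA1: {sha1.hexdigest()}",
        f"File Size: {size / 1024:.0f}KB ({size} bytes)",
        f"Source: {source}",
        f"Capability: {capability}",
    ])
```

For example, print(sample_header("sample.exe", family="Kuluoz", source="Infected Computer")) prints the header for the sample; the capability field still has to come from your own analysis.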
With our notes area set up (and file saved!), let’s get into some definitions and Strings analysis.
Imports vs Exports
While a full analysis of the Windows Portable Executable (WinPE, or PE) format is outside of the scope of this blog post, it’s critical to understand Imported Functions and Exported Functions.
An imported function is a function that is stored externally, usually in a different DLL file, which our sample calls (we’re ‘importing’ it). Our sample doesn’t implement the function itself; it just uses it. For example, if you were to write a program that writes a file to disk, you’d probably end up calling the Windows API function WriteFile, which your program imports from Kernel32.dll.
An Exported function is something that our sample implements, but makes available to be called by another, external program. One exported function is pretty normal, but if you’re seeing more than one in a file that you know is malicious, be aware that simply double-clicking to execute the malware may not work as you expect.
Strings analysis and Researching API Calls
Strings are a really interesting and informative part of malware analysis and an important part of the data collection process.
A string is a sequence of printable characters. The ones we really care about tend to be longer runs of these characters, since random bytes sometimes happen to map to short printable sequences. It’s helpful to filter for strings at least 6-8 characters long and to look for recognizable words; that removes much of the noise. The goal here is to find additional information about the malware. We might see DLLs it uses, functions it calls, a list of file extensions it wants to encrypt, callback domain names or IP addresses, a script it unpacks, a registry key it uses for persistence, and so on.
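To make the idea concrete, here is a simplified sketch of what a strings utility does, assuming Python 3: scan raw bytes for runs of printable ASCII characters (space through tilde) of at least a minimum length. It optionally also looks for UTF-16LE (“wide”) strings, which are common in Windows binaries. extract_strings is a hypothetical helper; it illustrates the idea rather than replacing the real tool, and like the real tool it should only be run inside your guest OS.

```python
import re

# Printable ASCII: space (0x20) through tilde (0x7E).
_PRINTABLE = rb"[\x20-\x7e]"


def extract_strings(data, min_len=8, wide=True):
    """Return printable runs of at least min_len characters found in raw bytes.

    ASCII runs are listed first, in file order; if wide=True, UTF-16LE runs
    (each printable byte followed by 0x00) are appended after them.
    """
    ascii_re = re.compile(_PRINTABLE + rb"{%d,}" % min_len)
    results = [m.group().decode("ascii") for m in ascii_re.finditer(data)]
    if wide:
        wide_re = re.compile(rb"(?:" + _PRINTABLE + rb"\x00){%d,}" % min_len)
        results += [m.group().decode("utf-16-le") for m in wide_re.finditer(data)]
    return results
```

Calling extract_strings on the bytes of sample.exe with min_len=8 roughly corresponds to strings -n 8 sample.exe, although output order and edge cases will differ from the real tool.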
The best way to get strings is to use the Strings binary inside of your Guest OS. But, word of warning – don’t run strings inside of your Host OS. This is because there have been flaws identified inside of strings which allow for potential code execution.
If your analysis environment is Windows, Microsoft’s Sysinternals suite includes a free strings utility: https://docs.microsoft.com/en-us/sysinternals/downloads/strings
The screenshot below is the output of strings -n 8 sample.exe. The argument -n 8 returns any strings which are eight characters or longer.
These are all of the strings that can be found in this binary.
As you can see, there’s some garbage (strings that aren’t immediately meaningful to us), and there are some strings that are clearly valid text. Kernel32.dll, Advapi32.dll, and User32.dll are all DLL files included with Windows, and many of the strings that follow, such as LoadLibraryA and GetProcAddress, are valid Windows API function names: likely imports.
If we wanted to know more about each of these functions, the best way to do this is to use Microsoft’s Developer Documentation here: https://developer.microsoft.com/en-us/windows/desktop/develop
If we search the site for ‘LoadLibraryA’, then we land at this page: https://msdn.microsoft.com/en-us/library/windows/desktop/ms684175(v=vs.85).aspx
Here, Microsoft states that LoadLibraryA “Loads the specified module into the address space of the calling process.”
Simply put, this function takes the name of a DLL as an argument and loads that DLL into our sample’s memory. This is the typical first step to calling a function that the DLL implements. Importantly, Microsoft states:
“If the function succeeds, the return value is a handle to the module.”
This means that if we check what this function returns later on in our analysis, we’ll know where in memory this DLL has been loaded: the module handle is the DLL’s base address.
Depending on the sample, a strings analysis can show us a lot of detail about what the malware is capable of.
Second Steps: Starting analysis
So, what do we know already? We know:
- Our sample uses VirtualAlloc, a function that reserves and/or commits a region of memory.
- Our sample uses LoadLibraryA, probably to load DLL files under-the-radar
Open our sample with IDA Pro.
You should see a screen similar to this:
Once IDA Pro has opened the binary, it performs some analysis. It attempts to name specific functions and data structures, and it determines Imports, Exports, and Strings available. If we click on the Imports tab, we’ll see the following:
In total, this malware uses 64 imports. Again, imports are functions that are implemented in an external EXE or DLL file and that our sample calls. On this screen, we can see that ADVAPI32.dll implements GetUserNameA, and our sample uses that function. Take some time, look around, and get familiar with the data.
IDA Pro has a great feature which allows you to plot a chart of functions that reference your variable or function, or functions that your function references. If we select the function name WinMain and right click, we can pick “Find xrefs from”. Then we see the following window:
Each of these boxes, regardless of color, is a different function used by the malware. The purple boxes are imported functions that are called.
Did you notice that there is an import that shows up in our Strings analysis but doesn’t show up in IDA Pro’s Imports window? If not, go back and take a look; if you did, good catch! It’s VirtualProtect. According to Microsoft’s API documentation, VirtualProtect “Changes the protection on a region of committed pages in the virtual address space of the calling process.”
In simpler terms, VirtualProtect can be used to change the permissions of a region of memory. For example, it can make a region executable, which in malware is often a sign of malicious activity such as running unpacked code.
VirtualProtect is implemented in Kernel32.dll, which is another string that we observed in the Strings dump.
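When stepping through a call to VirtualProtect in a debugger, the value worth checking is its third parameter, flNewProtect, which holds the requested protection. The sketch below, assuming Python 3, decodes that value using the PAGE_* memory-protection constants from the Windows API and reports whether it grants execute permission; decode_protection is a hypothetical helper name.

```python
# Memory-protection constants from the Windows API (winnt.h).
PAGE_PROTECTIONS = {
    0x01: "PAGE_NOACCESS",
    0x02: "PAGE_READONLY",
    0x04: "PAGE_READWRITE",
    0x08: "PAGE_WRITECOPY",
    0x10: "PAGE_EXECUTE",
    0x20: "PAGE_EXECUTE_READ",
    0x40: "PAGE_EXECUTE_READWRITE",
    0x80: "PAGE_EXECUTE_WRITECOPY",
}
# Modifier flags that can be OR-ed with one of the values above.
PAGE_MODIFIERS = {0x100: "PAGE_GUARD", 0x200: "PAGE_NOCACHE", 0x400: "PAGE_WRITECOMBINE"}
EXECUTE_MASK = 0x10 | 0x20 | 0x40 | 0x80  # any PAGE_EXECUTE* value


def decode_protection(value):
    """Split a flNewProtect value into constant names and an 'is executable' flag."""
    names = [name for bit, name in PAGE_PROTECTIONS.items() if value & bit]
    names += [name for bit, name in PAGE_MODIFIERS.items() if value & bit]
    return names, bool(value & EXECUTE_MASK)
```

For example, decode_protection(0x40) returns (["PAGE_EXECUTE_READWRITE"], True). A region that becomes executable after data has been written into it is the pattern to watch for here.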
Indirectly Calling Functions
We know that malware likes to be sneaky. It will use a few tricks to fool analysts or automated tools. A common one is to call a function indirectly, and this malware uses that trick. Here’s the technique:
1. Load a DLL into memory by passing LoadLibraryA the DLL name as a string: LoadLibraryA(“kernel32.dll”);
2. Pass the module handle returned in step 1, along with the function name as a string, to GetProcAddress to get the address of VirtualAlloc: GetProcAddress(hKernel32, “VirtualAlloc”); (here hKernel32 stands for that returned handle).
3. Call VirtualAlloc through the address that GetProcAddress returned.
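This seems fairly simple, but it is sometimes enough to trick tools and analysts: because the function is looked up by name at run time, it never appears in the sample’s import table. It takes a little while to understand how this works under the hood.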
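The same three steps can be sketched in Python with ctypes. This is a POSIX analogue, assumed for the runnable part: ctypes.CDLL() loads a library by name (via dlopen, playing the role of LoadLibraryA), and looking up an attribute by name on the returned handle resolves the symbol at run time (via dlsym, playing the role of GetProcAddress). On Windows, ctypes performs the equivalent lookups with the Windows loader functions and GetProcAddress. resolve and call_strlen are hypothetical helpers, and strlen is used only as a harmless stand-in for VirtualAlloc.

```python
import ctypes


def resolve(library, func_name):
    """Load a shared library by name, then look up a function by its name string.

    POSIX analogue of LoadLibraryA + GetProcAddress: ctypes.CDLL() calls dlopen(),
    and attribute lookup on the handle calls dlsym() with func_name.
    """
    handle = ctypes.CDLL(library)        # step 1: load the library
    return getattr(handle, func_name)    # step 2: resolve the function address by name


def call_strlen(text):
    """Resolve strlen by name at run time and call it through the returned pointer."""
    # None asks dlopen() for the running program, whose symbols include libc
    # (POSIX only; on Windows you would pass a real DLL name instead).
    strlen = resolve(None, "strlen")
    strlen.restype = ctypes.c_size_t     # how to convert the return value
    strlen.argtypes = [ctypes.c_char_p]  # what argument type to pass
    return strlen(text)                  # step 3: call through the resolved address
```

Because the function name only exists as a string at run time, a tool that lists the import table never sees it as an import. A strings dump can still reveal it, which is exactly the mismatch we spotted with VirtualProtect.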
When working through all of this information, now and going forward, it’s helpful to try to remember our goal. Why are we looking at this malware? What did we just figure out? What does this tell us? What do we do next?
Why are we looking at this malware? To find Indicators of Compromise (IOCs) and to learn what the malware does on the device it is installed on.
What did we learn so far?
- The “Strings” utility showed us that the malware binary contains strings that are valid Microsoft API calls (VirtualAlloc) and valid DLL files (Kernel32.dll).
- We found valid import names that show up in our “Strings” dump but don’t show up in IDA Pro’s Imports listing. Others, like VirtualAlloc, show up in both places.
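That comparison is easy to automate once you have both lists. As a minimal sketch, assuming Python 3 and that you have copied the import names (for example, from IDA Pro’s Imports window) and the strings output into lists, the helper below reports watched API names that appear as strings but not as imports. hidden_imports and the watch list are hypothetical; the watch list is only a small illustrative set.

```python
# Hypothetical watch list: a few API names often associated with unpacking or
# run-time resolution. Extend it for your own samples.
WATCHED_APIS = {
    "LoadLibraryA", "GetProcAddress", "VirtualAlloc", "VirtualProtect",
    "WriteProcessMemory", "CreateRemoteThread", "GetUserNameA",
}


def hidden_imports(strings_found, import_names, watched=WATCHED_APIS):
    """Return watched API names present in the strings dump but absent from imports.

    These are candidates for functions resolved at run time with
    LoadLibraryA/GetProcAddress instead of being listed in the import table.
    """
    in_strings = {s.strip() for s in strings_found} & set(watched)
    return sorted(in_strings - set(import_names))
```

With strings that include VirtualAlloc and VirtualProtect, and an import list that includes VirtualAlloc but not VirtualProtect, it returns ["VirtualProtect"], matching what we found by hand.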
What does this tell us?
- Our malware likely allocates a region of memory via VirtualAlloc, unpacks something there, sets that region as executable, and then executes it.
What do we do next?
- Let’s look at what the malware puts into the region it marked as “executable.” Then we can dump a copy and analyze it.
That’s it for this week! On Monday, we’ll look at some tricky ways this malware calls imports, unpack the malware, and examine a glaring mistake that helps our analysis.
Join us next week for Part 2!