3. Storing malware configurations

Configuration objects hold either a static malware configuration extracted from a binary or a dynamic configuration fetched from a C&C server.

What is malware configuration (or what we think it is)?

In general, configurations are structured data represented in MWDB by JSON objects. Malware samples usually contain an embedded configuration, called the static configuration, that determines:

malware family
addresses of C&C servers or initial peers
DGA seeds
encryption keys used for communication
malware version
botnet/campaign ID
module type etc.

This means that the static configuration determines which operations a malware sample performs and how they are parametrized. Samples that unpack to the same configuration usually share the same core, only packed differently, which lets us determine the similarity between these files.

The format of a configuration depends on the malware family and usually derives from the structure “proposed” by the malware author.

On the other hand, malware operations are also parametrized externally by data fetched from Command & Control servers. This data is called the dynamic configuration and can be parsed into a structured form as well.

Dynamic configuration consists of:

commands to be executed
webinjects (for banking malware)
new C&C IP addresses / peer IP addresses
mail templates etc.

Configuration attributes

Configurations are described using the following attributes:

Family: the malware family the configuration belongs to
Config type: configuration type. The default is static, but you can use any string you want. In mwdb.cert.pl we use static and dynamic to distinguish between static and dynamic configurations.
Contents (cfg): dictionary with configuration contents
Upload time: timestamp of the first configuration upload

Configuration contents are stored as a JSON object.

Note

It’s good practice to keep the same configuration structure per malware family, including the key schema and value types.
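One way to follow this practice is to check each configuration against an expected set of keys and value types before uploading it. The following is only a minimal sketch: `EXPECTED_SCHEMA` and `schema_problems` are hypothetical names made up for this illustration and are not part of MWDB or mwdblib.

```python
# Hypothetical per-family schema: key name -> expected Python type.
EXPECTED_SCHEMA = {"cnc": list, "key": str}


def schema_problems(config, schema):
    """Return a list of human-readable deviations from the expected schema."""
    problems = []
    for key, expected_type in schema.items():
        if key not in config:
            problems.append(f"missing key: {key}")
        elif not isinstance(config[key], expected_type):
            problems.append(
                f"wrong type for {key}: expected {expected_type.__name__}, "
                f"got {type(config[key]).__name__}"
            )
    # Report keys that the schema does not know about, in a stable order.
    for key in sorted(set(config) - set(schema)):
        problems.append(f"unexpected key: {key}")
    return problems


good = {"cnc": ["127.0.0.1"], "key": "asdf"}
bad = {"cnc": "127.0.0.1", "version": 3}
print(schema_problems(good, EXPECTED_SCHEMA))
print(schema_problems(bad, EXPECTED_SCHEMA))
```

An empty list means the configuration matches the schema, so an extraction script could refuse to upload anything that returns problems.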

How to upload a configuration?

Configurations are intended to be uploaded by automated systems or scripts, which is why you can’t add them directly from the MWDB UI. Nevertheless, it’s still possible to add a configuration using the mwdblib CLI or the REST API.

Warning

Configuration can be uploaded only if a user has the adding_configs capability turned on. Check your capabilities in the Profile view.

In mwdb.cert.pl configuration upload is turned off for external users. If you want to share your own configuration, feel free to contact someone from CERT.pl on Slack or via e-mail (info@cert.pl).

First install the mwdblib library including CLI extra dependencies:

$ pip install mwdblib[cli]

Then you can upload a new configuration using the mwdb upload config command:

$ mwdb --api-url http://127.0.0.1:3000/api/ login
Username: admin
Password:

$ mwdb --api-url http://127.0.0.1:3000/api/ upload config evil -
{"cnc": ["127.0.0.1"], "key": "asdf"} <CTRL+D>
Uploaded config 64efd0b1f964ad48aadd849a2242ebd1bb803d9e3309ee3d154b15d0dc2c5336

The uploaded configuration is then visible in the MWDB instance.

A new configuration can also be uploaded using a Python script:

from mwdblib import MWDB

# Omit api_url if you want to use mwdb.cert.pl API
mwdb = MWDB(api_key=..., api_url=...)
config = {
    "cnc": [
        "127.0.0.1"
    ],
    "key": "asdf"
}
config_object = mwdb.upload_config("evil", config)
# <mwdblib.config.MWDBConfig>

Note

If you want to experiment with mwdblib, you don’t need to create an API key. Just use the mwdb.login() method and you’ll be asked for your login and password.

More information about automation can be found in chapter 8. Automating things using REST API and mwdblib.

How are configurations deduplicated?

MWDB generates a unique SHA256-like hash value for every object in the repository, including configurations. For files and blobs, we simply hash the content with SHA256.

The hashing algorithm is a bit more complicated for structured data like configurations. The main idea is to avoid duplicates that occur due to a slightly different order of list elements or dictionary keys in the uploaded JSON.

That’s why our hashing function follows a few assumptions:

Dictionary keys are hashed regardless of their order.

Values can be of any type supported by JSON, but they are all stringified during hashing, e.g. False and “False” are the same. This is not a big deal as long as you avoid mixing value types under the same key:

from mwdblib import config_dhash

config_dhash({"value": "1"})
# 141767ab98a062fcd5bbfb48ddd5d5c2bb3556d64006d774372f15d045d0ba89

config_dhash({"value": 1})
# 141767ab98a062fcd5bbfb48ddd5d5c2bb3556d64006d774372f15d045d0ba89

Lists are treated more like multisets. They’re stored in order, but hashed regardless of order:

from mwdblib import config_dhash

config_dhash({"domains": ["google.com", "spamhaus.com"]})
# '93b6befcc25bb339eb449d6aa7db47bc3a661f20026e4cb4124388b539336d81'

config_dhash({"domains": ["spamhaus.com", "google.com"]})
# '93b6befcc25bb339eb449d6aa7db47bc3a661f20026e4cb4124388b539336d81'

Configuration dictionaries are hashed recursively:

simple values are stringified and UTF-8-encoded and then hashed using SHA256
lists are evaluated into the lists of hashes, then sorted and hashed in a stringified form
dictionaries are converted into the list of tuples (key, hash(value)), sorted by the first element (key) and then hashed in a stringified form

If you want to compute the hash of a configuration in advance, you can use the config_dhash function from mwdblib.
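For intuition, the recursion described above can be sketched in a few lines of plain Python. This is an illustration written from the description only: `sketch_dhash` is a hypothetical name, and its output is not guaranteed to equal the hashes computed by MWDB or by config_dhash, because the exact stringification may differ.

```python
import hashlib


def sketch_dhash(value):
    """Illustrative order-insensitive hash of a JSON-like value."""
    if isinstance(value, dict):
        # Dictionaries: (key, hash(value)) pairs, sorted by key,
        # then hashed in stringified form.
        pairs = sorted((str(key), sketch_dhash(item)) for key, item in value.items())
        return hashlib.sha256(str(pairs).encode("utf-8")).hexdigest()
    if isinstance(value, list):
        # Lists: hash every element, sort the hashes,
        # then hash the stringified list of hashes.
        hashes = sorted(sketch_dhash(item) for item in value)
        return hashlib.sha256(str(hashes).encode("utf-8")).hexdigest()
    # Simple values: stringify, UTF-8-encode, hash with SHA256.
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()


# Element order and key order do not change the result.
a = sketch_dhash({"domains": ["google.com", "spamhaus.com"], "key": "asdf"})
b = sketch_dhash({"key": "asdf", "domains": ["spamhaus.com", "google.com"]})
print(a == b)
```

Use config_dhash from mwdblib whenever you need the real hash value; the sketch only shows why reordered lists and keys collapse to the same object.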

Searching configuration parts

The simplest way to search for similar configurations is to use interactive search. You can generate the appropriate query just by clicking on the config fields.

Configurations can also be queried manually using the following syntax:

config.cfg.field_1.field_2:value

which finds configs that contain the structure shown below:

{
    "field_1": {
        "field_2": "value"
    }
}
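The mapping between a nested structure and the query syntax can be sketched as a small helper that walks a dictionary and emits one query per scalar leaf. `field_queries` is a hypothetical function written for this illustration, not part of mwdblib. It skips lists and does not escape characters that have a special meaning in the query language.

```python
def field_queries(cfg, prefix="config.cfg"):
    """Build one 'path:value' query string per scalar leaf of a nested dict."""
    queries = []
    for key, value in cfg.items():
        path = f"{prefix}.{key}"
        if isinstance(value, dict):
            # Descend into nested objects, extending the dotted path.
            queries.extend(field_queries(value, path))
        elif isinstance(value, list):
            # Lists are out of scope for this sketch.
            continue
        else:
            queries.append(f"{path}:{value}")
    return queries


example = {"field_1": {"field_2": "value"}, "cnc": ["127.0.0.1"]}
print(field_queries(example))
```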

Note

You can search for configurations only in the Recent configs or Search views. In the Recent configs view, the config. prefix is optional, because the view already assumes the object type.

Sometimes you may want to find a specific string in a configuration, e.g. an IP address. In that case, you can use wildcards and search for it regardless of the JSON structure:

config.cfg:*127.0.0.1*

or, if you want to be stricter:

config.cfg:*"127.0.0.1"*

Figure: searching configurations using wildcards
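The difference between the two wildcard forms can be illustrated locally. The sketch below assumes that the wildcard term is matched against the JSON text of the configuration, and it imitates that with json.dumps and fnmatch. It does not query MWDB, and the `matches` helper is hypothetical.

```python
import json
from fnmatch import fnmatchcase


def matches(cfg, pattern):
    """Match a wildcard pattern against the serialized configuration."""
    return fnmatchcase(json.dumps(cfg), pattern)


exact = {"cnc": ["127.0.0.1"]}
similar = {"cnc": ["127.0.0.10:8080"]}

loose = "*127.0.0.1*"      # any occurrence of the substring
strict = '*"127.0.0.1"*'   # only a complete JSON string value

print(matches(exact, loose), matches(similar, loose))
print(matches(exact, strict), matches(similar, strict))
```

Under this assumption the loose pattern also hits values that merely start with the address, while the quoted pattern only hits the exact string.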

For more information see 7. Advanced search based on Lucene queries.

Relationships with files

Configuration semantics are defined not only by the dictionary itself, but also by its relationships with other objects. In the mwdb.cert.pl service we follow a few specific conventions that have special support in mwdb-core.

File → Config relationship

The File → Config relationship determines the association between a malware sample and its static configuration. Configuration parents are the direct source of the configuration, which means that the configuration is contained in these files and can be extracted directly from them.

That’s why the common relationship pattern in MWDB is Executable (packed) → Dump (with unpacked code) → Static configuration.

In addition, the original sample is tagged as ripped:<family name> and the dump is tagged as <family name>.

MWDB has special support for the File → Config relationship and presents the latest configuration along with basic file information. Relationships returned by the API are ordered from the latest one, so the hash of the most recent configuration is the first element in the list.

The latest configuration is also presented in the UI in a separate Static config tab in the detailed file view.
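Because relationships come back ordered from the latest one, picking the most recent configuration amounts to taking the first config entry in the list. The sketch below works on a plain list of dictionaries. The `type` and `id` field names, the "static_config" type value and the `latest_config_hash` helper are assumptions made for this illustration, so check the API response of your MWDB version before relying on them.

```python
def latest_config_hash(children):
    """Return the id of the first config entry, or None if there is none.

    Assumes `children` is already ordered from the latest relationship,
    and that each entry carries hypothetical 'type' and 'id' fields.
    """
    for child in children:
        if child.get("type") == "static_config":
            return child.get("id")
    return None


# Hypothetical, shortened identifiers used only for the demonstration.
children = [
    {"type": "file", "id": "dump-hash"},
    {"type": "static_config", "id": "newest-config-hash"},
    {"type": "static_config", "id": "older-config-hash"},
]
print(latest_config_hash(children))
```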

Config → File relationship

The Config → File relationship represents files dropped by malware from the C&C. These files can be:

modules (for modular malware)
next stage malware
updates
tasks

A static configuration is required to fetch these files from the server. It can contain the distribution URLs where a file is hosted or the encryption key needed to decrypt the payload.

Thus, we bind these files to the configuration instead of making relationships with all malware samples that drop them.