3. Storing malware configurations
Configuration objects are intended to hold a malware static configuration extracted from binaries or a dynamic configuration fetched from C&C.

What is malware configuration (or what we think it is)?
In general, configurations are structured data represented in MWDB by JSON objects. Malware samples usually contain an embedded configuration, called the static configuration, that determines the:
malware family
addresses of C&C servers or initial peers
DGA seeds
encryption keys used for communication
malware version
botnet/campaign ID
module type etc.
This means that the static configuration determines the operations performed by a malware sample and how they are parametrized. Samples that unpack to the same configuration usually share the same core, only packed differently, which allows us to determine the similarity between these files.

The format of a configuration depends on the malware family and usually derives from the structure “proposed” by the malware author.
On the other hand, malware operations are also parametrized externally by data fetched from Command & Control servers. This data is called dynamic configuration and can also be parsed into a structured form.
Dynamic configuration consists of:
commands to be executed
webinjects (for banking malware)
new C&C IP addresses / peer IP addresses
mail templates etc.
Configuration attributes
Configurations are described using the following attributes:
Family: describes the malware family related to its configuration
Config type: configuration type. Default is static, but you can use any string you want. In mwdb.cert.pl we’re using static and dynamic to distinguish between static and dynamic configurations.
Contents (cfg): dictionary with configuration contents
Upload time: timestamp of first configuration upload
Configuration contents are stored as a JSON object.

Note
It’s a good practice to keep the same configuration structure per malware family including keys schema and value types.
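One way to enforce this practice before uploading is to compare the shape of a new configuration with a known-good one from the same family. The sketch below is only an illustration: `schema_of` and `same_schema` are hypothetical helpers, not part of MWDB or mwdblib, and the sample configurations are made up.

```python
def schema_of(value):
    """Describe a configuration by its keys and value types, ignoring the values."""
    if isinstance(value, dict):
        return {key: schema_of(item) for key, item in value.items()}
    if isinstance(value, list):
        # A list is described by the set of element type names it contains.
        # Note: an empty list and a non-empty list therefore differ.
        return ["list", sorted({type(item).__name__ for item in value})]
    return type(value).__name__


def same_schema(reference, candidate):
    """Return True if both configurations use the same keys and value types."""
    return schema_of(reference) == schema_of(candidate)


reference = {"cnc": ["127.0.0.1"], "key": "asdf"}
consistent = {"key": "qwer", "cnc": ["10.0.0.1", "10.0.0.2"]}
inconsistent = {"cnc": "127.0.0.1", "key": "asdf"}  # "cnc" is a string, not a list

print(same_schema(reference, consistent))    # True
print(same_schema(reference, inconsistent))  # False
```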
How to upload configuration?
Configurations are intended to be uploaded by automated systems or scripts, which is why you can’t add them directly from the MWDB UI. Nevertheless, it’s still possible to add a configuration using the mwdblib CLI or the REST API.
Warning
A configuration can be uploaded only if the user has the adding_configs capability turned on. Check your capabilities in the Profile view.
In mwdb.cert.pl configuration upload is turned off for external users. If you want to share your own configuration, feel free to contact someone from CERT.pl on Slack or via e-mail (info@cert.pl).
First install the mwdblib library including CLI extra dependencies:
$ pip install mwdblib[cli]
Then you can upload a new configuration using the mwdb upload config command:
$ mwdb --api-url http://127.0.0.1:3000/api/ login
Username: admin
Password:
$ mwdb --api-url http://127.0.0.1:3000/api/ upload config evil -
{"cnc": ["127.0.0.1"], "key": "asdf"} <CTRL+D>
Uploaded config 64efd0b1f964ad48aadd849a2242ebd1bb803d9e3309ee3d154b15d0dc2c5336
Then you can find the configuration in the MWDB instance.

A new configuration can also be uploaded using a Python script:
from mwdblib import MWDB

# Omit api_url if you want to use mwdb.cert.pl API
mwdb = MWDB(api_key=..., api_url=...)
config = {
    "cnc": [
        "127.0.0.1"
    ],
    "key": "asdf"
}
config_object = mwdb.upload_config("evil", config)
# <mwdblib.config.MWDBConfig>
Note
If you want to experiment with mwdblib, you don’t need to create the API key. Just use the mwdb.login() method and you’ll be asked for login and password.
More information about automation can be found in chapter 8. Automating things using REST API and mwdblib.
How are configurations deduplicated?
MWDB generates a unique SHA256-like hash value for every object in the repository, including configurations. For files and blobs, we just use the SHA256 function to hash the content.
The hashing algorithm is a bit more complicated for structured data like configurations. The main idea is to avoid duplicates occurring due to a slightly different order of list elements or dictionary keys in the uploaded JSON.
That’s why our hashing function follows a few assumptions:
Keys in dictionaries are hashed regardless of their order
Values can have any type supported by JSON, but they’re all stringified during hashing, e.g. False and “False” hash to the same value. It’s not a big deal if you avoid mixing value types under the same key:
from mwdblib import config_dhash

config_dhash({"value": "1"})
# 141767ab98a062fcd5bbfb48ddd5d5c2bb3556d64006d774372f15d045d0ba89
config_dhash({"value": 1})
# 141767ab98a062fcd5bbfb48ddd5d5c2bb3556d64006d774372f15d045d0ba89
Lists are treated more like multisets: they’re stored in order, but hashed regardless of order.
from mwdblib import config_dhash

config_dhash({"domains": ["google.com", "spamhaus.com"]})
# '93b6befcc25bb339eb449d6aa7db47bc3a661f20026e4cb4124388b539336d81'
config_dhash({"domains": ["spamhaus.com", "google.com"]})
# '93b6befcc25bb339eb449d6aa7db47bc3a661f20026e4cb4124388b539336d81'
Configuration dictionaries are hashed recursively:
simple values are stringified and UTF-8-encoded and then hashed using SHA256
lists are evaluated into the lists of hashes, then sorted and hashed in a stringified form
dictionaries are converted into the list of tuples (key, hash(value)), sorted by the first element (key) and then hashed in a stringified form
If you want to pre-evaluate the hash for a configuration, you can use the config_dhash function in mwdblib.
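To make the recursive scheme easier to follow, here is a minimal sketch of it using only the standard library. It illustrates the steps described above and is not the mwdblib implementation: `sketch_dhash` is a hypothetical name, and because the exact stringification may differ, its output is not guaranteed to match the hashes produced by `config_dhash`.

```python
import hashlib


def sketch_dhash(obj):
    """Order-insensitive hash of a JSON-like structure (illustrative only)."""
    if isinstance(obj, dict):
        # Dictionaries: list of (key, hash(value)) tuples, sorted by key,
        # then hashed in stringified form.
        pairs = sorted((str(key), sketch_dhash(value)) for key, value in obj.items())
        return sketch_dhash(str(pairs))
    if isinstance(obj, list):
        # Lists: list of element hashes, sorted, then hashed in stringified form.
        hashes = sorted(sketch_dhash(item) for item in obj)
        return sketch_dhash(str(hashes))
    # Simple values: stringified, UTF-8-encoded and hashed with SHA256.
    return hashlib.sha256(str(obj).encode("utf-8")).hexdigest()


first = sketch_dhash({"domains": ["google.com", "spamhaus.com"], "value": 1})
second = sketch_dhash({"value": "1", "domains": ["spamhaus.com", "google.com"]})
print(first == second)  # True
```

The two inputs differ in key order, list order and the type of `value`, yet they collapse to the same digest, which is the deduplication behaviour the assumptions above aim for.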
Searching configuration parts
The simplest way to search for similar configurations is to use interactive search. You can generate the appropriate query just by clicking on the config fields.

Configurations can also be queried manually using the following syntax:
config.cfg.field_1.field_2:value
which finds configs that contain the structure shown below:
{
    "field_1": {
        "field_2": "value"
    }
}
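When queries are generated by a script, the same mapping from nested keys to a dotted path can be produced mechanically. The sketch below is an assumption-laden illustration: `leaf_queries` is a hypothetical helper, it only handles nested dictionaries with scalar values, and it does not quote or escape values that contain spaces or characters special to the query syntax.

```python
def leaf_queries(cfg, prefix="config.cfg"):
    """Yield one 'config.cfg.<path>:<value>' query per scalar leaf of cfg."""
    for key, value in cfg.items():
        path = f"{prefix}.{key}"
        if isinstance(value, dict):
            # Descend into nested dictionaries, extending the dotted path.
            yield from leaf_queries(value, path)
        elif isinstance(value, list):
            # List values are skipped in this sketch.
            continue
        else:
            yield f"{path}:{value}"


queries = list(leaf_queries({"field_1": {"field_2": "value"}}))
print(queries)  # ['config.cfg.field_1.field_2:value']
```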
Note
You can search for configurations only in the Recent configs or Search views.
In the Recent configs view, the config. prefix is optional, because the view already makes an assumption about the object type.
Sometimes you may want to find a specific string in configuration e.g. IP address. In that case, you can use wildcards and search them regardless of the JSON structure:
config.cfg:*127.0.0.1*
or, if you want to be more strict:
config.cfg:*"127.0.0.1"*
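The difference between the two patterns can be reproduced locally. The sketch below assumes that the wildcard is matched against a JSON text serialization of the configuration, and it uses Python's `fnmatch` as a stand-in for the server-side matching, which may behave differently in details. The sample configurations and the `matches` helper are made up for illustration.

```python
import json
from fnmatch import fnmatchcase

configs = [
    {"cnc": ["127.0.0.1"], "key": "asdf"},
    {"cnc": ["127.0.0.10"], "key": "asdf"},
]


def matches(cfg, pattern):
    """Match a glob-style pattern against the JSON text of a configuration."""
    return fnmatchcase(json.dumps(cfg), pattern)


# The loose pattern also hits 127.0.0.10, because it only requires a substring.
loose = [matches(cfg, '*127.0.0.1*') for cfg in configs]
# The strict pattern includes the JSON quotes, so only the exact string value matches.
strict = [matches(cfg, '*"127.0.0.1"*') for cfg in configs]

print(loose)   # [True, True]
print(strict)  # [True, False]
```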

For more information see 7. Advanced search based on Lucene queries.
Relationships with files
The semantics of a configuration are defined not only by the dictionary itself, but also by its relations with other objects. In the mwdb.cert.pl service we follow a few specific conventions that have special support in mwdb-core.
File → Config relationship
The File → Config relationship represents the association between a malware sample and its static configuration. Configuration parents are the direct source of the configuration, which means that the configuration is contained in these files and can be extracted directly from them.
That’s why the common relationship pattern in MWDB is Executable (packed) → Dump (with unpacked code) → Static configuration.

In addition, the original sample is tagged as ripped:<family name> and the dump is tagged as <family name>.
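A script that follows this convention could look roughly like the sketch below. It assumes an authenticated mwdblib `MWDB` client is passed in as `mwdb`, and that `upload_file`, `upload_config` (with the `parent` keyword) and `add_tag` behave as described in the mwdblib documentation. The function name `upload_unpacking_chain` and its arguments are hypothetical.

```python
def upload_unpacking_chain(mwdb, family, sample, dump, cfg):
    """Upload Executable -> Dump -> Static configuration and tag both files.

    `sample` and `dump` are (name, content) tuples; `cfg` is the configuration dict.
    `mwdb` is expected to be an mwdblib MWDB client (assumption).
    """
    sample_obj = mwdb.upload_file(sample[0], sample[1])
    # The dump is a child of the packed executable it was extracted from.
    dump_obj = mwdb.upload_file(dump[0], dump[1], parent=sample_obj)
    # The static configuration is a child of the dump that contains it.
    config_obj = mwdb.upload_config(family, cfg, parent=dump_obj)
    # Tagging convention: ripped:<family name> on the sample, <family name> on the dump.
    sample_obj.add_tag(f"ripped:{family}")
    dump_obj.add_tag(family)
    return sample_obj, dump_obj, config_obj
```

Keeping the three uploads in one function makes it harder to end up with a configuration that is attached to the packed sample instead of the dump.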
MWDB has special support for the File → Config relationship and presents the latest configuration along with basic file information. Relationships returned by the API are ordered from the latest one, so the hash of the most recent configuration is the first element in the list.

The latest configuration is also presented in the UI in a separate Static config tab, which appears in the detailed file view.

Config → File relationship
The Config → File relationship represents files dropped by malware from the C&C. These files can be:
modules (for modular malware)
next stage malware
updates
tasks
The static configuration is required to fetch these files from the server: it can contain the distribution URLs where a file is hosted or the encryption key needed to decrypt the payload.
Thus we bind these files to the configuration instead of making relationships with all malware samples that drop them.
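In a script, this binding amounts to uploading the fetched file with the configuration as its parent. The following is a minimal sketch, assuming `mwdb` is an mwdblib `MWDB` client and that `query_config` and `upload_file` (with the `parent` keyword) work as described in the mwdblib documentation. `upload_dropped_file` is a hypothetical helper name.

```python
def upload_dropped_file(mwdb, config_hash, name, content):
    """Attach a file fetched from the C&C to the configuration used to fetch it."""
    # Look up the configuration object by its hash (assumed mwdblib call).
    config_obj = mwdb.query_config(config_hash)
    # Upload the dropped file as a child of that configuration.
    return mwdb.upload_file(name, content, parent=config_obj)
```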