7. Advanced search based on Lucene queries
MWDB comes with a powerful search engine based on a subset of the Lucene query syntax.
Query syntax: fields
A query is broken up into fields and operators.
You can search any field by typing the field name followed by “:” and then the term value you are looking for.
tag:emotet
Multi-word terms separated by spaces or containing a special character (e.g. a colon or parentheses) are called “phrases” and must be surrounded by double quotes:
type:"PE32 executable (GUI) Intel 80386, for MS Windows"
Most fields support wildcard search. The symbol “?” represents a single-character wildcard and “*” represents multiple characters (0 or more). To include all PE executables in your results, use:
type:PE32*
The query syntax supports escaping if you want to match ‘*’ as a literal character. The query below looks for all type values that contain an asterisk:
type:"*\**"
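If you build queries in a script, the quoting and escaping rules above can be applied by a small helper. The sketch below is hypothetical (neither field_term nor escape_wildcards is part of MWDB or mwdblib) and assumes that a space, colon, parenthesis or double quote is what turns a value into a phrase:

```python
# Hypothetical helpers (not part of MWDB or mwdblib) that assemble query terms.
PHRASE_CHARS = set(' :()"')  # assumed set of characters that force double quotes


def escape_wildcards(text: str) -> str:
    """Escape backslashes first, then '*' and '?', so they match literally."""
    text = text.replace("\\", "\\\\")
    return text.replace("*", "\\*").replace("?", "\\?")


def field_term(field: str, value: str) -> str:
    """Return field:value, wrapping the value in double quotes if it is a phrase."""
    if any(ch in PHRASE_CHARS for ch in value):
        value = '"' + value.replace('"', '\\"') + '"'
    return f"{field}:{value}"


print(field_term("tag", "emotet"))
print(field_term("type", "PE32 executable (GUI) Intel 80386, for MS Windows"))
# Outer '*' characters stay wildcards; the escaped middle one is a literal asterisk.
print('type:"*' + escape_wildcards("*") + '*"')
```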
Query syntax: operators
MWDB supports three boolean operators: AND, OR and NOT.
Warning
Boolean operators must be UPPERCASE.
To search for all samples tagged with anything that contains the word “emotet” (e.g. emotet_drop, ripped:emotet or just emotet) and exclude samples tagged as feed:spam, use:
tag:*emotet* AND NOT tag:"feed:spam"
The query syntax supports parentheses for grouping logical expressions:
name:emotet* OR (tag:*emotet* AND NOT tag:"feed:spam")
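When a script combines many conditions, it helps to build the expression from small pieces and wrap each subexpression in parentheses. A minimal sketch with hypothetical helper names (all_of, any_of, negate, group), using only the uppercase operators described above:

```python
# Hypothetical query-builder helpers; they only produce query strings.
def group(expr: str) -> str:
    """Wrap an expression in parentheses to make precedence explicit."""
    return f"({expr})"


def all_of(*terms: str) -> str:
    return " AND ".join(terms)


def any_of(*terms: str) -> str:
    return " OR ".join(terms)


def negate(term: str) -> str:
    return f"NOT {term}"


emotet_not_spam = all_of("tag:*emotet*", negate('tag:"feed:spam"'))
query = any_of("name:emotet*", group(emotet_not_spam))
print(query)  # name:emotet* OR (tag:*emotet* AND NOT tag:"feed:spam")
```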
Query syntax: ranges
Integer and date fields support range search. Range queries can be inclusive or exclusive of the upper and lower bounds.
Warning
The TO operator must be UPPERCASE.
The query below finds all files with a size between 50 and 50000 bytes inclusive:
size:[50 TO 50000]
To exclude one side of the range, replace its square bracket with a curly bracket. For example, replacing “[” with “{” excludes the lower bound:
size:{50 TO 50000]
This finds files between 51 and 50000 bytes in size. Inclusive bounds are denoted by square brackets, exclusive bounds by curly brackets.
In many cases we want a range that is bounded on one side only, e.g. all files of 50000 bytes or more. In that case, we can use a single wildcard character to denote infinity:
size:[50000 TO *]
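The bracket rules can be modelled as a simple predicate. This is a sketch of the semantics described above, not MWDB code; the in_range name is hypothetical and None stands for the * bound:

```python
from typing import Optional


def in_range(value: int, low: Optional[int], high: Optional[int],
             include_low: bool = True, include_high: bool = True) -> bool:
    """Model of a range term; None stands for the '*' (unbounded) side."""
    if low is not None and (value < low or (value == low and not include_low)):
        return False
    if high is not None and (value > high or (value == high and not include_high)):
        return False
    return True


print(in_range(50, 50, 50000))                     # size:[50 TO 50000] -> True
print(in_range(50, 50, 50000, include_low=False))  # size:{50 TO 50000] -> False
print(in_range(51, 50, 50000, include_low=False))  # size:{50 TO 50000] -> True
print(in_range(10**9, 50000, None))                # size:[50000 TO *]  -> True
```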
This syntax is still not very convenient, so we have introduced a shorter syntax using the >, <, >= and <= operators. To use them, just add the appropriate operator to the beginning of a term:
size:>=500000
size:>="500000"
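Assuming the shorthand operators are equivalent to one-side-bounded ranges (with > and < mapping to exclusive bounds), the translation can be sketched as below. The shorthand_to_range helper is hypothetical:

```python
# Hypothetical translation of shorthand comparisons into range syntax.
def shorthand_to_range(field: str, op: str, value: str) -> str:
    ranges = {
        ">=": f"[{value} TO *]",   # inclusive lower bound
        ">": f"{{{value} TO *]",   # exclusive lower bound
        "<=": f"[* TO {value}]",   # inclusive upper bound
        "<": f"[* TO {value}}}",   # exclusive upper bound
    }
    return f"{field}:{ranges[op]}"


print(shorthand_to_range("size", ">=", "500000"))  # size:[500000 TO *]
print(shorthand_to_range("size", ">", "500000"))   # size:{500000 TO *]
```

Under this assumption, > produces an exclusive bound, which is consistent with upload_time:>2020-09-01 being rejected for date-time fields later in this chapter.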
MWDB-Core supports human-readable file sizes, so instead of specifying the number of bytes, we can refer to larger units like kB, MB and GB:
size:>=500kB
size:>=0.5MB
Warning
Remember that converting a file size from bytes to human-readable form and back does not always give the original value.
For example, 1 kB equals 1024 bytes, but 1026 bytes rounded to two decimal places is also displayed as 1 kB (1026 / 1024 ≈ 1.002 kB).
So do not be surprised if you enter size:1kB in the search engine and a sample displayed with this size is not found: its exact size in bytes may differ slightly.
For this reason, searching for a size from the object view always uses a query in bytes.
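The arithmetic behind this warning can be sketched in a few lines. The helpers are hypothetical and assume, as stated above, that 1 kB = 1024 bytes, and further that 1 MB = 1024 kB and 1 GB = 1024 MB; the two-decimal display format is also an assumption:

```python
# Assumed unit sizes: 1 kB = 1024 B (as stated above); MB and GB assumed to scale by 1024.
UNITS = {"kB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def to_bytes(size: str) -> int:
    """Convert '500kB', '0.5MB' or a plain byte count to bytes (hypothetical)."""
    for unit, factor in UNITS.items():
        if size.endswith(unit):
            return int(float(size[: -len(unit)]) * factor)
    return int(size)


def to_human(n: int) -> str:
    """Display a byte count rounded to two decimals, largest unit first."""
    for unit in ("GB", "MB", "kB"):
        if n >= UNITS[unit]:
            return f"{n / UNITS[unit]:.2f} {unit}"
    return f"{n} B"


print(to_bytes("500kB"))  # 512000
print(to_bytes("0.5MB"))  # 524288
print(to_human(1026))     # 1.00 kB, although 1 kB is exactly 1024 bytes
print(to_bytes("1kB"))    # 1024, so size:1kB does not match a 1026-byte file
```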
Query syntax: timestamps
With timestamps you can search for objects within a certain time range.
If you want to find objects that were uploaded from the beginning of September till the 28th:
upload_time:[2020-09-01 TO 2020-09-28]
If you want to find objects that were uploaded from the beginning of September:
upload_time:[2020-09-01 TO *]
Alternatively:
upload_time:>=2020-09-01
If you want to search for objects within a certain time range:
upload_time:["2020-09-28 08:00" TO "2020-09-28 09:00"]
If you want to search for objects uploaded after a certain hour:
upload_time:>="2020-09-28 08:00"
If you want to search for objects uploaded at a certain minute:
upload_time:"2020-09-28 15:32"
Remember that exclusive ranges are not allowed for date-time fields, so neither of the following queries is allowed:
upload_time:{2020-09-01 TO *]
upload_time:>2020-09-01
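A minute-precision term such as upload_time:"2020-09-28 15:32" covers the whole minute. The sketch below models a term as a half-open interval [start, end). It is hypothetical, and treating a date-only term as the whole day is an assumption rather than documented MWDB behaviour:

```python
from datetime import datetime, timedelta
from typing import Tuple


def term_interval(text: str) -> Tuple[datetime, datetime]:
    """Return the half-open interval [start, end) covered by a date or minute term.

    Assumption: a minute term covers that whole minute and a date term
    covers that whole day; MWDB's exact semantics may differ.
    """
    for fmt, step in (("%Y-%m-%d %H:%M", timedelta(minutes=1)),
                      ("%Y-%m-%d", timedelta(days=1))):
        try:
            start = datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next, coarser format
        return start, start + step
    raise ValueError(f"unsupported timestamp: {text!r}")


start, end = term_interval("2020-09-28 15:32")
uploaded = datetime(2020, 9, 28, 15, 32, 41)
print(start <= uploaded < end)  # True
```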
Query syntax: relative timestamps
New in version 2.7.0.
It is also possible to specify upload_time relative to the current time. For example, to search for objects uploaded during the last 2 hours:
upload_time:>=2h or upload_time:[2h TO *]
A relative time definition consists of a value followed by a relative time symbol. The accepted symbols are listed below:
y or Y : years
m : months
w or W : weeks
d or D : days
h or H : hours
M : minutes
s or S : seconds
Warning
Pay attention to the symbols m and M: they denote different time periods (m means months, M means minutes). All other symbols can be used as either uppercase or lowercase letters.
When defining a relative time, you can combine different time symbols. For example, to search for objects uploaded earlier than 1 month and 5 days ago:
upload_time:<=1m5d or upload_time:[* TO 1m5d]
Relative timestamps can also be combined with absolute timestamps:
upload_time:[2022-02-01 TO 1m10d]
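A relative term can be read as "now minus the given period". The sketch below parses the symbols listed above and computes that point in time. The helper names are hypothetical, and the calendar handling (going back whole months and clamping the day, so 31 March minus 1 month gives 28 February) is an assumption that may differ from how MWDB computes it:

```python
import calendar
import re
from datetime import datetime, timedelta
from typing import Dict

# Symbols from the list above. Only "m" (months) and "M" (minutes) are
# case-sensitive; every other symbol is normalised to lowercase.
_NAMES = {"y": "years", "m": "months", "w": "weeks", "d": "days",
          "h": "hours", "M": "minutes", "s": "seconds"}
_PART = re.compile(r"(\d+)([yYmwWdDhHMsS])")


def parse_relative(expr: str) -> Dict[str, int]:
    """Split e.g. '1m5d' into {'months': 1, 'days': 5}."""
    if not re.fullmatch(r"(?:\d+[yYmwWdDhHMsS])+", expr):
        raise ValueError(f"not a relative time: {expr!r}")
    parts: Dict[str, int] = {}
    for value, symbol in _PART.findall(expr):
        key = symbol if symbol in ("m", "M") else symbol.lower()
        parts[_NAMES[key]] = parts.get(_NAMES[key], 0) + int(value)
    return parts


def months_back(dt: datetime, months: int) -> datetime:
    """Go back whole calendar months, clamping the day (e.g. Mar 31 -> Feb 28)."""
    year, month0 = divmod(dt.year * 12 + dt.month - 1 - months, 12)
    month = month0 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def cutoff(now: datetime, expr: str) -> datetime:
    """Point in time that a relative term such as '2h' or '1m5d' refers to."""
    p = parse_relative(expr)
    dt = months_back(now, 12 * p.get("years", 0) + p.get("months", 0))
    return dt - timedelta(weeks=p.get("weeks", 0), days=p.get("days", 0),
                          hours=p.get("hours", 0), minutes=p.get("minutes", 0),
                          seconds=p.get("seconds", 0))


now = datetime(2022, 3, 31, 12, 0)
print(cutoff(now, "2h"))    # 2022-03-31 10:00:00
print(cutoff(now, "1m5d"))  # 2022-02-23 12:00:00
```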
Basic search fields
Fields represent the object properties and can be typed (specific for object type) or untyped (generic, used by all object types).
Usage depends on the search context. If you’re querying the Recent files tab, the query engine assumes that the object type is file. If you’re using the Search tab, you need to add the appropriate type prefix to the typed fields.
In simple words: the name: field in Recent files must be replaced by file.name: in Search.
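When moving a query from Recent files to Search, the typed file fields need the file. prefix. Below is a hypothetical helper that performs this rewrite, assuming the field names from the typed file field list; it does not skip text inside quoted phrases, so treat it as a sketch:

```python
import re

# Typed file fields; assumed to be the ones that need the prefix in Search.
FILE_FIELDS = ("name", "type", "size", "md5", "sha1", "sha256",
               "sha512", "ssdeep", "crc32")
# A field name at the start of the query, or after whitespace or "(".
_FIELD = re.compile(r"(^|[\s(])(" + "|".join(FILE_FIELDS) + r"):")


def to_search_tab(query: str) -> str:
    """Rewrite a Recent files query so it can be used in the Search tab."""
    return _FIELD.sub(r"\1file.\2:", query)


print(to_search_tab("name:emotet* AND (size:>=500kB OR tag:emotet)"))
# file.name:emotet* AND (file.size:>=500kB OR tag:emotet)
```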
Untyped fields
dhash:<string> - Object identifier (SHA256)
tag:<string> - Object tag
comment:<string> - Object comment contents
meta.<attribute>:<string> - Object attribute value
upload_time:<datetime> - Object first upload timestamp
karton:<uuid> - Karton analysis artifacts
Typed fields (file)
file.name:<string> - Name of file
file.type:<string> - Type of file, returned by the file Unix command
file.size:<integer> - Size of file in bytes
file.md5:, file.sha1:, file.sha256:, file.sha512:, file.ssdeep:, file.crc32: - File contents hashes and checksums
Typed fields (config)
config.type:<string> - Type of configuration
config.family:<string> - Malware family name
config.cfg[<.path>]:<string> - JSON field with configuration contents
Typed fields (blob)
blob.name:<string> - Name of blob
blob.size:<integer> - Size of blob
blob.type:<string> - Type of blob
blob.content:<string> - Blob contents
blob.first_seen:<datetime> - Alias for upload_time
blob.last_seen:<datetime> - Timestamp when blob was last uploaded
Special fields
There are also other fields that have special meaning. They will be described in further sections of this chapter.
favorites:<string>
sharer:<string>
shared:<string>
uploader:<string>
parent:<subquery>
child:<subquery>
JSON fields (config.cfg:)
Configurations can be searched using path queries:
config.cfg.field_1.field_2:value
which would find configs that contain structure below:
{
"field_1": {
"field_2": "value"
}
}
Configurations are stored as JSON objects. The simplest way to search for something inside a configuration is to treat it as a plain text field and use wildcards.
Assuming we are in the Recent configs tab:
cfg:*google.com*
To be more specific and look for google.com only inside the “urls” key, we can append the key name to the cfg field using a dot:
cfg.urls:*google.com*
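Path queries and plain wildcard queries on configurations can be approximated as follows: follow the dotted path, then wildcard-match the value as text. This is only a model under assumptions (non-string values are compared as their JSON text, matching is case-sensitive, and Python's fnmatch also treats [...] as a character class, unlike Lucene); the function names are hypothetical:

```python
import json
from fnmatch import fnmatchcase
from typing import Any


def resolve(cfg: Any, path: str) -> Any:
    """Follow a dotted path such as 'field_1.field_2'; None if it is missing."""
    node = cfg
    for key in (path.split(".") if path else []):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def cfg_matches(cfg: Any, path: str, pattern: str) -> bool:
    """Approximate cfg[.path]:pattern by wildcard-matching the value's text."""
    value = resolve(cfg, path)
    if value is None:
        return False
    text = value if isinstance(value, str) else json.dumps(value)
    return fnmatchcase(text, pattern)


cfg = {"urls": ["http://google.com/gate.php"], "field_1": {"field_2": "value"}}
print(cfg_matches(cfg, "field_1.field_2", "value"))  # True
print(cfg_matches(cfg, "", "*google.com*"))           # True  (cfg:*google.com*)
print(cfg_matches(cfg, "urls", "*google.com*"))       # True  (cfg.urls:*google.com*)
print(cfg_matches(cfg, "field_1", "*google.com*"))    # False
```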
To search for elements contained in an array, simply append * to the name of the field that holds the array.
For example, let’s use the following configuration.
{
"field": {
"array": [1, 2, 3]
}
}
In this case, to find the object, we can use array search to check if the nested array contains a specific value:
cfg.field.array*:1
Searching in this way applies to both numbers and strings contained in the array.
To search for more than one value contained in an array, type:
cfg.field.array*:"*1, 2*"
Starting from v2.8.0, you can also use range queries for numbers, for example:
cfg.field.array*:[1 TO 3]
looks for any configuration that contains field.array with at least one numeric element within the range 1 to 3 (inclusive).
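Array search means "at least one element matches". A minimal model of the exact-value and range variants, with hypothetical function names; comparing values through str() for the exact match is an assumption made to cover both numbers and strings:

```python
from typing import Any, Sequence


def array_contains(array: Sequence[Any], value: Any) -> bool:
    """cfg.field.array*:value - true if any element equals the value."""
    return any(str(item) == str(value) for item in array)


def array_in_range(array: Sequence[Any], low: float, high: float) -> bool:
    """cfg.field.array*:[low TO high] - true if any numeric element is within bounds."""
    return any(isinstance(item, (int, float)) and not isinstance(item, bool)
               and low <= item <= high for item in array)


cfg = {"field": {"array": [1, 2, 3]}}
print(array_contains(cfg["field"]["array"], 1))      # True
print(array_in_range(cfg["field"]["array"], 1, 3))   # True
print(array_in_range(cfg["field"]["array"], 4, 10))  # False
```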
Favorites field (favorites:)
Using the favorites: field, you can search for objects marked as favorite in the object view:
favorites:<user login>
The above query returns the favorite objects of a specific user.
Warning
Remember that you can only search for your own favorite objects.
Only a system administrator with the “manage_users” capability can search for other users’ favorites.
Upload count field (upload_count:)
Using the upload_count: field, you can search for objects that were uploaded by n different users. This query helps you spot the objects that might be the most interesting within your dataset:
upload_count:n
Range queries, including ranges with a wildcard bound, are supported for the upload_count: field.
For example, the query below searches for samples that were uploaded by 5 or more different users. It is also highly recommended to combine upload_count with an upload_time query, so that only samples uploaded within a specific period of time are selected:
upload_count:[5 TO *] AND upload_time:[2021-06-01 TO 2021-06-30]
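To make the meaning of upload_count concrete: it counts different uploaders, so repeated uploads by the same user count once. The sketch below uses hypothetical upload records and helper names, and assumes, as the untyped field list states, that upload_time refers to the first upload:

```python
from datetime import date
from typing import Iterable, Tuple

Upload = Tuple[str, date]  # (uploader login, upload date); hypothetical record


def upload_count(uploads: Iterable[Upload]) -> int:
    """Number of *different* users who uploaded the object."""
    return len({user for user, _ in uploads})


def matches_query(uploads: Iterable[Upload], min_count: int,
                  start: date, end: date) -> bool:
    """Model of upload_count:[min_count TO *] AND upload_time:[start TO end]."""
    uploads = list(uploads)
    first_upload = min(day for _, day in uploads)  # upload_time = first upload
    return upload_count(uploads) >= min_count and start <= first_upload <= end


sample = [("alice", date(2021, 6, 2)), ("bob", date(2021, 6, 3)),
          ("alice", date(2021, 6, 9)), ("carol", date(2021, 6, 10)),
          ("dave", date(2021, 6, 11)), ("erin", date(2021, 6, 20))]
print(upload_count(sample))  # 5
print(matches_query(sample, 5, date(2021, 6, 1), date(2021, 6, 30)))  # True
```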
Parent/child subqueries
MWDB supports parent/child subqueries.
To search for samples that have a ripped Emotet configuration as their child, go to Samples and type:
child:(config.family:emotet)
To search for configs that have a sample with a file size greater than 1000 bytes as their parent, go to Configs and type:
parent:(file.size:>1000)
Nested searches can be performed as well. To find objects that are the parent of an object tagged as emotet, which in turn is the parent of an Emotet config (so the object found is that config's grandparent):
child:(tag:emotet AND child:(config.family:emotet))
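Subqueries compose like predicates over the object tree. The model below is hypothetical (the Obj class and helper names are illustrative only) and assumes child: matches direct children only, which is what the parent/grandparent example above describes:

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Obj:
    kind: str
    tags: List[str] = field(default_factory=list)
    family: str = ""
    children: List["Obj"] = field(default_factory=list)


Pred = Callable[[Obj], bool]


def child(pred: Pred) -> Pred:
    """child:(subquery) - at least one direct child matches the subquery."""
    return lambda obj: any(pred(c) for c in obj.children)


def tag(name: str) -> Pred:
    return lambda obj: name in obj.tags


def config_family(name: str) -> Pred:
    return lambda obj: obj.kind == "config" and obj.family == name


def both(a: Pred, b: Pred) -> Pred:
    return lambda obj: a(obj) and b(obj)


cfg = Obj("config", family="emotet")
dropped = Obj("file", tags=["emotet"], children=[cfg])
root = Obj("file", children=[dropped])

# child:(tag:emotet AND child:(config.family:emotet))
query = child(both(tag("emotet"), child(config_family("emotet"))))
print(query(root))     # True
print(query(dropped))  # False
```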
Multi field (multi:)
The search engine supports the multi: special field, which is useful for filtering objects using multiple types of object attributes.
Depending on the object type, the multi: field accepts various object attributes separated by spaces. Attribute types are recognised automatically.
The attributes allowed for each object type are listed below.
File (sample)
all hashes values
Config
dhash values
extract of configuration content
Blob
dhash values
extract of Blob content
To search for samples that have the md5 value 0cb988d042a7f28dd5fe2b55b3f5ac7a or the crc32 value 3b0ee981, use the query below:
multi:"0cb988d042a7f28dd5fe2b55b3f5ac7a 3b0ee981"
If the query contains only hashes, the multi: field can be omitted:
0cb988d042a7f28dd5fe2b55b3f5ac7a eb1c78d4994f7a107f78dec529988900e3601852ae0bfdefb3e15967c6d8f127
To search for configs whose configuration content contains the strings “abcd” or “xyz”, or whose dhash value is eb1c78d4994f7a107f78dec529988900e3601852ae0bfdefb3e15967c6d8f127, use the query below:
multi:"abcd xyz eb1c78d4994f7a107f78dec529988900e3601852ae0bfdefb3e15967c6d8f127"
Warning
Multi-query terms containing an extract of Config configuration or Blob content must be separated by spaces and surrounded by double quotes. The multi: field also has to be used explicitly.
Wildcards are not allowed for the multi: field. They are applied automatically to Config configuration and Blob content terms.
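Automatic recognition of hash values can be approximated by the length of a hexadecimal term. The mapping and function names below are assumptions for illustration, not MWDB's actual recognition logic:

```python
import re
from typing import Dict

# Assumed mapping from hexadecimal digest length to attribute type.
HASH_LENGTHS = {8: "crc32", 32: "md5", 40: "sha1", 64: "sha256", 128: "sha512"}
HEX = re.compile(r"[0-9a-fA-F]+")


def classify(term: str) -> str:
    """Guess the attribute type of a single multi: term."""
    if HEX.fullmatch(term) and len(term) in HASH_LENGTHS:
        return HASH_LENGTHS[len(term)]
    return "content"  # treated as an extract of config or blob content


def classify_multi(query: str) -> Dict[str, str]:
    """Split a space-separated multi: value and classify every term."""
    return {term: classify(term) for term in query.split()}


print(classify_multi("0cb988d042a7f28dd5fe2b55b3f5ac7a 3b0ee981"))
# {'0cb988d042a7f28dd5fe2b55b3f5ac7a': 'md5', '3b0ee981': 'crc32'}
```

This model also exposes an ambiguity: an eight-character hexadecimal content extract would be mistaken for a crc32 value.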
Escaping special characters
Field labels may contain characters that need to be escaped in a query. Let’s consider a config field with a space character:
{
"decoy domains": [ "evil.com" ]
}
Special characters (like a space) need to be escaped using a backslash, so the correct query looks like this:
config.cfg.decoy\ domains*:"evil.com"
Other special characters that also need to be escaped are:
dots (\.)
colons (\:)
asterisks (\*)
backslashes (\\)
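Escaping field labels can be automated. Below is a hypothetical sketch that escapes backslashes first (so the escape characters added afterwards are not doubled) and then the space, dot, colon and asterisk characters; other characters may also need escaping, so treat the set as an assumption:

```python
def escape_label(label: str) -> str:
    """Escape a config key so it can be used in a config.cfg path query."""
    label = label.replace("\\", "\\\\")  # backslashes first
    for ch in " .:*":  # space, dots, colons, asterisks
        label = label.replace(ch, "\\" + ch)
    return label


def cfg_path(*keys: str, array: bool = False) -> str:
    """Join escaped keys into a config.cfg path; array=True appends '*'."""
    path = "config.cfg." + ".".join(escape_label(key) for key in keys)
    return (path + "*") if array else path


print(cfg_path("decoy domains", array=True) + ':"evil.com"')
# config.cfg.decoy\ domains*:"evil.com"
```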
Quick queries

Quick queries can be found just under the search field.
You can use a quick query by clicking on one of the badges. The first four queries are built-in:
Only uploaded by me - the uploader:<my login> query, which filters only samples uploaded by ourselves
Exclude public - NOT shared:public, which filters out public objects
Exclude feed:* - NOT tag:"feed:*", which excludes all external feeds
Only ripped:* - tag:"ripped:*", which includes only original samples recognized as malware with a successfully ripped configuration
You can also add your own quick query by typing the query in the search field and then clicking Add +.

Afterwards, your newly added query appears as another black-coloured badge. You can click it at any time, and even the most complex query will be performed!

Comment author field (comment_author:)
Using the comment_author: field, you can search for objects commented on by a selected user.
comment_author:<user login>
The above query returns the objects commented on by user <user login>.
Warning
Comment authors are kept only for existing users, so you can’t search for comments from deleted accounts.
Wildcards are not allowed for the comment_author: field.