Extract fields from the MDM

In Message Query Language (MQL), fields are extracted from the Message Data Model (MDM) by specifying the field name. To get a subfield, join the field names with a . between them.

For example, to retrieve the sender's display name:

sender.display_name
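
Deeper subfields chain the same way, with one . per level. A short illustration, reusing a field path that appears later in this reference:

```mql
// sender -> email -> domain -> root_domain
sender.email.domain.root_domain
```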

Literal values

Strings can be specified in two forms: escaped and unescaped.

Escaped strings are surrounded by double quotes " and special characters can be escaped with \. If an unrecognized character is specified after \, that is treated as an invalid escape sequence and causes a syntax error.

"hello world"

"line 1\nline 2\nline 3"

"unicode characters like ✉️ are supported"

Unescaped or raw strings are surrounded by single quotes '. No escape sequences are supported within a raw string. However, two single quotes '' can be used to insert a single quote character.

'hello world'

'escaping apostrophes isn''t that difficult'

'this back\slash is interpreted literally'
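
Raw strings are convenient for patterns that contain backslashes, such as regular expressions. A sketch of the difference, assuming regex.contains (used later in this reference) accepts either string form:

```mql
// raw string: every backslash is passed through literally
regex.contains(body.plain.text, '\d\d\d-\d\d-\d\d\d\d')

// escaped string: each backslash is written as \\, since \d is not a
// recognized escape sequence and would be a syntax error
regex.contains(body.plain.text, "\\d\\d\\d-\\d\\d-\\d\\d\\d\\d")
```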

Escape sequences

  • \r carriage return CR (ASCII 0x0d)
  • \n new line LF (ASCII 0x0a)
  • \t tab (ASCII 0x09)
  • \' single quotes ' (ASCII 0x27)
  • \" double quotes " (ASCII 0x22)
  • \\ backslash \ (ASCII 0x5c)
  • \u{xxxxxxxx} Unicode code point between 0x01 and 0x10ffff.

Unicode escape sequences can include any valid Unicode code point, including ASCII characters, non-printable characters, or other Unicode characters. Between 2 and 8 hex digits are recognized within { and }.

Example Unicode escapes:

  • \u{0a} identical to \n for a newline
  • \u{0398} Greek capital theta: Θ
  • \u{200f} Unicode right-to-left mark (RLM)
  • \u{1f4ec} open mailbox emoji: 📬
  • \u{0001f4ec} open mailbox emoji, with optional leading zeros: 📬
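
As an illustration, the escaped strings below combine several of the sequences listed above. The text content is made up for the example:

```mql
// a tab between two words, ending with a line feed
"name\tvalue\n"

// a double quote inside an escaped string
"she said \"hello\""

// Greek capital theta, written directly and as a Unicode escape
"Θ"
"\u{0398}"
```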

Comparing values

MQL provides eight built-in operators to compare two values. The syntax is <left> <operator> <right>:

sender.email.domain.root_domain == "sublimesecurity.com"

Numbers can be compared with the operators below. MQL can represent unsigned, signed, or floating-point values.

  • <: less than
  • <=: less than or equal to
  • ==: equal to
  • !=: not equal to
  • >=: greater than or equal to
  • >: greater than

Note: When two values of different numeric types are compared, they are automatically converted to compatible types. For example, the comparison 1 < 1.5 compares the integer 1 with the floating-point value 1.5. First, the 1 is converted to the floating-point value 1.0, and then the comparison is performed. This means that an expression like 3 == 3.14 always evaluates false, because it is converted to 3.0 == 3.14.
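
Applying the conversion rule to a few literal comparisons gives the results noted in the comments. This is a sketch that follows from the rule as described, not output captured from an MQL engine:

```mql
// 1 becomes 1.0, then 1.0 < 1.5 is true
1 < 1.5

// 3 becomes 3.0, then 3.0 == 3.14 is false
3 == 3.14

// 3 becomes 3.0, then 3.0 == 3.0 is true
3 == 3.0
```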

Strings have two additional operators for case-insensitive equality. Unless explicitly specified, string comparisons are case-sensitive (meaning that "a" and "A" are distinct). Range operators, such as <, use lexicographical ordering and are always case-sensitive.

Case-sensitive comparisons

  • <: less than
  • <=: less than or equal to
  • ==: equal to ("Abc" == "abc" evaluates as false)
  • !=: not equal to
  • >=: greater than or equal to
  • >: greater than

Case-insensitive comparisons

  • =~: case-insensitive equality. ("Abc" =~ "abc" evaluates as true)
  • !~: case-insensitive inequality. ("Abc" !~ "abc" evaluates as false)

Booleans can only be compared with == or !=. Booleans also have dedicated operators for traditional boolean logic, such as and (see the Boolean logic section below).
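
A brief sketch of the string and boolean operators applied to message fields. The literal values are illustrative:

```mql
// case-sensitive: matches only this exact capitalization
subject.subject == "Password reset request"

// case-insensitive: also matches "PASSWORD RESET REQUEST"
subject.subject =~ "password reset request"

// case-insensitive inequality: true unless the display name is
// "IT Support" in any capitalization
sender.display_name !~ "it support"

// booleans support only == and !=
type.inbound == true
```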

Range checking

A common pattern when comparing values is to check whether a value is within a given range. MQL provides syntactic sugar for this kind of comparison with the syntax <lower> <operator> x <operator> <upper>. Some examples:

strings.levenshtein(sender.email.email, "[email protected]") > 4 and
strings.levenshtein(sender.email.email, "[email protected]") <= 7

// more compact form
4 < strings.levenshtein(sender.email.email, "[email protected]") <= 7
'abc' <= subject.subject < 'xyz'

Like other comparisons, both strings and numbers are supported; however, range checking only supports the following operators:

  • <: less than
  • <=: less than or equal to
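
Because only < and <= are available, a check written with > or >= is flipped so that the lower bound sits on the left. A sketch using strings.levenshtein from the examples above, with an illustrative subject string:

```mql
// two separate comparisons written with >= and <
strings.levenshtein(subject.subject, "Password reset request") >= 1 and
strings.levenshtein(subject.subject, "Password reset request") < 5

// the same bounds as a single range check
1 <= strings.levenshtein(subject.subject, "Password reset request") < 5
```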

Boolean logic

Multiple boolean expressions can be combined with traditional boolean operators. Use the and, or, or not keywords for boolean operations:

sender.email.email == "[email protected]" and subject.subject == "Password reset request"

sender.email.email == "[email protected]" or subject.subject == "Password reset request"

not (sender.email.email == "[email protected]")

Check against multiple values with in

One common pattern that often combines comparisons with boolean logic is to check the same field against multiple literal values. Instead of combining similar equality checks == with a logical or, use the in keyword to check a value against a set of values.

For example, to detect subjects commonly used in BEC attacks:

subject.subject == "Urgent" or
subject.subject == "Can you help" or
subject.subject == "Quick errand"

// more compact form with `in`
subject.subject in ("Urgent", "Can you help", "Quick errand")

To check the inverse and ensure that a value is not in a list of values, use not in:

subject.subject not in ("Urgent", "Can you help", "Quick errand")

in can also be used in a case-insensitive way, using in~:

subject.subject in~ ("Urgent", "Can you help", "Quick errand")
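
By analogy with in, the in~ form should behave like a chain of =~ checks joined with or. The expansion below is a sketch under that assumption:

```mql
subject.subject =~ "Urgent" or
subject.subject =~ "Can you help" or
subject.subject =~ "Quick errand"
```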

Checking array fields with in

The syntax for in and in~ can also be used to look up a single value in an array field or in the result of a function that returns an array. The syntax x in array_field is simply shorthand for any(array_field, . == x).

For example, this rule uses any with file.explode:

any(attachments, .file_extension in~ ('html', 'htm') and
  any(file.explode(.), 
    any(.scan.javascript.identifiers, . == "unescape")) 
)

The same rule, using in to check .scan.javascript.identifiers:

any(attachments, .file_extension in~ ('html', 'htm') and
  any(file.explode(.), 
    "unescape" in .scan.javascript.identifiers) 
)

Matching several boolean expressions with of

Occasionally when writing rules, a minimum number of boolean clauses must match. With and, all the clauses must be true, and with or, at least one clause has to be true. Use of as a hybrid operator to require a minimum number of matches.

X of (...) evaluates true if at least X terms evaluate true. The basic structure for of follows:

X of (clause1, clause2, ..., clauseN)

Using 1 of (...) is identical to an or over all of the terms. Similarly, if X is the same as the total number of terms, then it's equivalent to and. X must be between 1 and the total number of clauses between ( and ).

For example:

// returns true
3 of (true, false, true, true)

// returns false
3 of (true, false, false, true)
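
In a rule, the clauses are usually full boolean expressions rather than literals. A hedged sketch built only from functions and fields used elsewhere in this reference; the wildcard patterns are illustrative, not a recommended detection:

```mql
// matches when at least two of the three signals are present
2 of (
    strings.ilike(subject.subject, "*urgent*"),
    strings.ilike(body.html.inner_text, "*password*"),
    sender.email.domain.domain not in $alexa_1m
)
```

With three clauses A, B and C, 2 of (A, B, C) is equivalent to (A and B) or (A and C) or (B and C).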

Checking a named list with in

A value can be checked against a list by using in $list or not in $list syntax. See the full reference of currently available lists.

sender.email.domain.domain in $alexa_1m

sender.email.domain.domain not in $alexa_1m

Creating arrays

Create arrays on-the-fly with a list of literal or dynamic values. Arrays can be combined with array functions such as any and all to consolidate logic.

To create an array, enclose a list of values in [ ]:

["foo", "bar", "baz"]

[body.plain.text, body.html.text]

For example, to check if either body.plain.text or body.html.text contains a Social Security Number:

any([body.plain.text, body.html.text],
    regex.contains(., '\b(\d\d\d)-(\d\d)-(\d\d\d\d)\b')
)

Using any with a custom array is equivalent to writing an or with two regex.contains calls:

regex.contains(body.plain.text, '\b(\d\d\d)-(\d\d)-(\d\d\d\d)\b') or
regex.contains(body.html.text, '\b(\d\d\d)-(\d\d)-(\d\d\d\d)\b')

To check if any of the recipients uses a disposable email provider:

any([recipients.to, recipients.cc, recipients.bcc], 
    any(., .email.domain.domain in $disposable_email_providers)
)
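
The same pattern works with all, which the introduction to this section names alongside any. A sketch, assuming all takes the same arguments as any and is true only when every element satisfies the expression:

```mql
// true only if both body representations mention "invoice"
all([body.plain.text, body.html.text],
    strings.ilike(., "*invoice*")
)
```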

See array functions for the full list of array functions.

Arithmetic

All numbers support standard arithmetic operations. When two numbers of different types are used in arithmetic, they are first converted to a matching type. That means 1 * 2.0 is automatically converted to 1.0 * 2.0. The supported arithmetic operations are:

  • + Add two numbers
  • - Subtract two numbers
  • * Multiply two numbers
  • / Divide two numbers. If both values are integers, this indicates integer division, meaning that 5 / 2 is 2. To get 2.5, add a decimal point to either side: 5 / 2.0, 5.0 / 2 and 5.0 / 2.0 are all equivalent.
  • % Modulo of two numbers
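
A few literal checks that follow from the rules above. Each line should evaluate true if arithmetic behaves as described; this is a sketch, not output captured from an MQL engine:

```mql
// integer division discards the remainder
5 / 2 == 2

// a floating-point operand switches to floating-point division
5 / 2.0 == 2.5

// modulo returns the remainder of the division
5 % 2 == 1

// mixed types: 1 is converted to 1.0 before multiplying
1 * 2.0 == 2.0
```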

Order of operations

The precedence of operations, ordered from highest to lowest:

  • ( ... ): parentheses
  • *, / and %: multiplication, division, and modulus
  • + and -: addition and subtraction
  • <, <=, ==, =~, !=, !~, >=, >, in: comparisons
  • of: matching multiple boolean terms
  • not: boolean NOT
  • and: boolean AND
  • or: boolean OR
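
To illustrate the list above, each expression below is followed by a fully parenthesized form that should group the same way. The field values are illustrative:

```mql
// multiplication before addition, arithmetic before comparison
1 + 2 * 3 == 7
(1 + (2 * 3)) == 7

// not before and, and before or
not type.inbound or sender.email.domain.tld == 'ru' and subject.subject =~ "urgent"
(not type.inbound) or ((sender.email.domain.tld == 'ru') and (subject.subject =~ "urgent"))
```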

Common logic errors

Missing parentheses

The following rule will match any message with "microsoft team" in the body, regardless of whether type.inbound is true.

type.inbound
and strings.ilike(subject.subject, "*microsoft team*")
or strings.ilike(body.html.inner_text, "*microsoft team*")
^^-- oops!

That's the same as typing:

(
    type.inbound
    and strings.ilike(subject.subject, "*microsoft team*")
) or strings.ilike(body.html.inner_text, "*microsoft team*")
^^-- oops!

If you intend for the or to cover the two ilike checks, you need parentheses to make the order of operations explicit. That way, type.inbound must evaluate as true for the rule to match:

type.inbound
and (
    strings.ilike(subject.subject, "*microsoft team*")
    or strings.ilike(body.html.inner_text, "*microsoft team*")
)

Comments

MQL supports single-line comments beginning with //.

type.inbound
// check that the sender's domain does not use the ru TLD
and sender.email.domain.tld != 'ru'