How to detect text in attachments

Sublime uses recursive binary explosion and OCR to extract text from files. For more information on how this works, see the file.explode documentation.

To find any string extracted from any type of file, such as PDFs, images, and Office documents, apply a string-searching function like strings.icontains or regex.icontains to the output of the following property: file.explode[].scan.strings.strings.

The example below detects the word "norton" in any PDF file:

any(attachments, .file_extension == "pdf"
    and any(file.explode(.),
        any(.scan.strings.strings, strings.icontains(., "norton"))
    )
)
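
When a fixed substring is too narrow, regex.icontains can be applied to the same property. The snippet below is a minimal sketch of that variation; the pattern itself is only an illustrative assumption and is not taken from a production rule:

// Sketch: match "norton" followed within 20 characters by "renewal" or "subscription"
any(attachments, .file_extension == "pdf"
    and any(file.explode(.),
        any(.scan.strings.strings,
            regex.icontains(., "norton.{0,20}(renewal|subscription)")
        )
    )
)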

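To find text extracted by a specific scanner or from a specific type of file, use the same string-searching functions on that scanner's output. Scanner outputs include .docx, .html, .javascript, .ocr, .vba, and .xml. The example below searches the OCR output of Office documents and ZIP archives for the phrase "enable macros":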

any(attachments, .file_extension in~ ("doc", "docm", "docx", "dot", "dotm", "pptm", "ppsm", "xlm", "xls", "xlsb", "xlsm", "xlt", "xltm", "zip")
    and any(file.explode(.),
        strings.icontains(.scan.ocr.raw, "enable macros")
    )
)


A note on performance

The file.explode function is a relatively expensive operation, so you typically don't want to run it on every file attachment. Instead, include a pre-filter that limits it to specific file types, such as archives, PDFs, and HTML files.

One way to do this, as shown in the examples above, is by checking the .file_extension prior to calling file.explode.
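
As a rough sketch of this pattern, the snippet below places the extension check first so that file.explode is only called on image attachments. The extension list and the search term are illustrative assumptions; adjust them to the file types your rule actually needs to inspect:

// Pre-filter: only the listed image extensions are passed to file.explode
any(attachments, .file_extension in~ ("png", "jpg", "jpeg")
    and any(file.explode(.),
        // Search the OCR text extracted from each exploded file
        strings.icontains(.scan.ocr.raw, "norton")
    )
)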

Testing your rule against a sample EML file

The MQL rule editor can display the output of the file.explode function, which makes it easier to build detection rules for the extracted values. First write your MQL snippet, then click the file.explode function, and then click the Evaluate play button.