How to detect text in attachments

Sublime uses recursive binary explosion and OCR to extract text from files. For more information on how this works, see the beta.binexplode documentation.

To find any string extracted from any type of file, such as PDFs, images, and Office documents, apply a string searching function like ilike or iregex_search to the output of the following property: beta.binexplode[].scan.strings.strings.

The example below detects the word "norton" in any PDF file:

any(attachments, .file_extension == "pdf" and
    any(beta.binexplode(.), 
        any(.scan.strings.strings, ilike(., "*norton*"))
    )
)
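When a wildcard match is too loose, the same check can be written with iregex_search instead of ilike. The sketch below assumes iregex_search takes its arguments in the same order as ilike (the input string first, then the pattern); the pattern itself is purely illustrative and not taken from a real rule:

```
any(attachments, .file_extension == "pdf" and
    any(beta.binexplode(.),
        // illustrative pattern: "norton" followed within 20 characters by a billing-related word
        any(.scan.strings.strings, iregex_search(., "norton.{0,20}(renew|invoice|subscription)"))
    )
)
```

Check the argument order and regular expression syntax against the function reference before relying on this form.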

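To find text that was extracted from a specific type of file or by a specific scanner, use the same string searching functions on the output of that scanner. Available scanner outputs include .docx, .html, .javascript, .ocr, .vba, and .xml.

The example below detects the words "please", "enable", and "macros" in the OCR text extracted from Office documents and ZIP archives: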

any(attachments,
    .file_extension in~ ("doc", "docm", "docx", "dot", "dotm", "pptm", "ppsm", "xlm", "xls", "xlsb", "xlsm", "xlt", "xltm", "zip")
    and any(beta.binexplode(.),
        any(.scan.ocr.text, ilike(., "*please*"))
        and any(.scan.ocr.text, ilike(., "*enable*"))
        and any(.scan.ocr.text, ilike(., "*macros*"))
    )
)

🚧

A note on performance

The beta.binexplode function is a relatively expensive operation, so you typically don't want to run it on every file attachment. Instead, add a pre-filter that limits it to specific file types, such as archives, PDFs, and HTML files.

One way to do this, as shown in the examples above, is by checking the .file_extension prior to calling beta.binexplode.
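As a minimal sketch of such a pre-filter, the snippet below only explodes attachments whose extension is in a short allowlist. It uses only the constructs from the examples above; the extension list and the search term "invoice" are illustrative assumptions, so adjust them to the file types and strings you care about:

```
any(attachments,
    // pre-filter: written first so only these file types are passed to beta.binexplode
    .file_extension in~ ("zip", "pdf", "html", "htm")
    and any(beta.binexplode(.),
        any(.scan.strings.strings, ilike(., "*invoice*"))
    )
)
```

Keeping the allowlist short is the point of the design: every extension you add is another class of attachment that pays the cost of beta.binexplode.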

Testing your rule against a sample EML file

You can use the MQL rule editor to display the output of the beta.binexplode function, which makes it easier to build detection rules for the extracted values. First write your MQL snippet, then click the beta.binexplode function, and then click the Evaluate play button.