How to detect text in attachments
Sublime uses recursive binary explosion and OCR to extract text from files. For more information on how this works, see the beta.binexplode documentation.
To find any string extracted from any file, such as a PDF, an image, or an Office document, apply a string-searching function like ilike or iregex_search to the following property: beta.binexplode[].scan.strings.strings.
The example below detects the string "norton" (case-insensitively, anywhere in the extracted text) in any PDF file:
any(attachments, .file_extension == "pdf" and
    any(beta.binexplode(.),
        any(.scan.strings.strings, ilike(., "*norton*"))
    )
)
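To make the nested any() calls easier to follow, here is a rough Python model of the rule above. It is not how Sublime evaluates MQL. The dictionary layout that stands in for beta.binexplode output and the helper `hypothetical_binexplode` are assumptions for illustration. `ilike` is approximated with `fnmatch.fnmatchcase` on lowercased strings; unlike MQL, fnmatch also gives `?` and `[...]` special meaning.

```python
from fnmatch import fnmatchcase


def ilike(text: str, pattern: str) -> bool:
    """Approximation of MQL ilike: case-insensitive, * matches any run of characters."""
    return fnmatchcase(text.lower(), pattern.lower())


def hypothetical_binexplode(attachment: dict) -> list[dict]:
    """Hypothetical stand-in for beta.binexplode: returns pre-computed scan results."""
    return attachment["exploded"]


def pdf_mentions_norton(attachments: list[dict]) -> bool:
    # Outer any(): at least one attachment must be a PDF...
    return any(
        att["file_extension"] == "pdf"
        # ...with at least one exploded item...
        and any(
            # ...that has at least one extracted string matching *norton*.
            any(ilike(s, "*norton*") for s in item["scan"]["strings"]["strings"])
            for item in hypothetical_binexplode(att)
        )
        for att in attachments
    )


attachments = [
    {
        "file_extension": "pdf",
        "exploded": [
            {"scan": {"strings": {"strings": ["Invoice #1234", "Your NORTON subscription"]}}},
        ],
    },
]
print(pdf_mentions_norton(attachments))  # True
```

The match succeeds because the lowercased string "your norton subscription" fits the pattern "*norton*", even though the original text is uppercase.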
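To find text extracted from a specific type of file or by a specific scanner, use the same string-searching functions on that scanner's output. Examples include .docx, .html, .javascript, .ocr, .vba, and .xml. The example below flags Office documents and ZIP archives whose OCR output contains the words "please", "enable", and "macros":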
any(attachments,
    .file_extension in~ ("doc", "docm", "docx", "dot", "dotm", "pptm", "ppsm", "xlm", "xls", "xlsb", "xlsm", "xlt", "xltm", "zip")
    and any(beta.binexplode(.),
        any(.scan.ocr.text, ilike(., "*please*"))
        and any(.scan.ocr.text, ilike(., "*enable*"))
        and any(.scan.ocr.text, ilike(., "*macros*")))
)
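One detail of the OCR rule above is easy to miss: all three keyword checks sit inside the same any(beta.binexplode(.), ...), so they must all succeed for a single exploded item, but each has its own any() over the OCR text, so the words may appear in different text entries of that item. The Python sketch below models that reading. It is an approximation, not Sublime's implementation; the data layout is hypothetical, `in~` is treated as a case-insensitive membership test, and `ilike` is approximated with `fnmatch`.

```python
from fnmatch import fnmatchcase

OFFICE_AND_ZIP = ("doc", "docm", "docx", "dot", "dotm", "pptm", "ppsm",
                  "xlm", "xls", "xlsb", "xlsm", "xlt", "xltm", "zip")


def ilike(text, pattern):
    # Approximation of MQL ilike: case-insensitive, * matches any run of characters
    return fnmatchcase(text.lower(), pattern.lower())


def in_ci(value, options):
    # Approximation of MQL in~: case-insensitive membership test
    return value.lower() in (o.lower() for o in options)


def asks_to_enable_macros(attachments):
    for att in attachments:
        if not in_ci(att["file_extension"], OFFICE_AND_ZIP):
            continue  # pre-filter: skip explosion for other file types
        for item in att["exploded"]:  # hypothetical pre-computed binexplode output
            ocr = item["scan"]["ocr"]["text"]
            # One any() per keyword: the words may sit in different OCR entries,
            # but all three must be found within this same exploded item.
            if all(any(ilike(t, f"*{word}*") for t in ocr)
                   for word in ("please", "enable", "macros")):
                return True
    return False


matching = [{"file_extension": "DOCM", "exploded": [
    {"scan": {"ocr": {"text": ["PLEASE click below", "Enable Macros to view this document"]}}},
]}]
split = [{"file_extension": "zip", "exploded": [
    {"scan": {"ocr": {"text": ["please read"]}}},
    {"scan": {"ocr": {"text": ["enable macros"]}}},
]}]
print(asks_to_enable_macros(matching))  # True
print(asks_to_enable_macros(split))     # False
```

The `split` case does not match because "please" and "enable macros" were found in two different exploded items.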
A note on performance
The beta.binexplode function is a relatively expensive operation, so you typically don't want to run it on every file attachment. Instead, include a pre-filter that limits it to specific file types, such as archives, PDFs, and HTML files. One way to do this, as shown in the examples above, is to check .file_extension before calling beta.binexplode.
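The Python sketch below illustrates why the pre-filter pays off. It assumes the rule stops evaluating an `and` as soon as the extension check fails; Python's `and` short-circuits this way, but treat that as an assumption about MQL. `CountingExploder` is a hypothetical stand-in for beta.binexplode that only records how often it is called, and the attachment data is made up for illustration.

```python
from fnmatch import fnmatchcase


class CountingExploder:
    """Hypothetical stand-in for beta.binexplode that counts its invocations."""

    def __init__(self):
        self.calls = 0

    def __call__(self, attachment):
        self.calls += 1
        return attachment["exploded"]


def matches(attachments, explode, extensions=None):
    # extensions=None models a rule with no pre-filter on .file_extension
    return any(
        (extensions is None or att["file_extension"].lower() in extensions)
        # Because `and` short-circuits, explode() only runs when the filter passes.
        and any(
            any(fnmatchcase(s.lower(), "*norton*") for s in item["scan"]["strings"]["strings"])
            for item in explode(att)
        )
        for att in attachments
    )


attachments = [
    {"file_extension": "png", "exploded": [{"scan": {"strings": {"strings": ["logo"]}}}]},
    {"file_extension": "txt", "exploded": [{"scan": {"strings": {"strings": ["hello"]}}}]},
    {"file_extension": "pdf", "exploded": [{"scan": {"strings": {"strings": ["Norton renewal"]}}}]},
]

unfiltered = CountingExploder()
result_unfiltered = matches(attachments, unfiltered)
print(result_unfiltered, unfiltered.calls)  # True 3

filtered = CountingExploder()
result_filtered = matches(attachments, filtered, {"pdf", "zip", "html"})
print(result_filtered, filtered.calls)  # True 1
```

Both versions reach the same verdict, but the filtered one explodes only the PDF instead of every attachment.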
Testing your rule against a sample EML file
The MQL rule editor can display the output of the beta.binexplode function, which makes it easy to build detection rules for the extracted values. First write your MQL snippet, then click the beta.binexplode function, and then click the Evaluate play button.

