Sublime uses recursive binary explosion and OCR to extract text from files. For more information on how this works, see the beta.binexplode documentation.
To find any string extracted from any file, such as a PDF, an image, or an Office document, use string searching functions like ilike or iregex_search on the extracted strings, exposed as .scan.strings.strings in the example that follows.
The example below detects the word "norton" in any PDF file:
any(attachments,
    .file_extension == "pdf"
    and any(beta.binexplode(.),
        any(.scan.strings.strings, ilike(., "*norton*"))
    )
)
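The same search can be written with iregex_search when a wildcard is not expressive enough. The sketch below assumes that iregex_search takes the string followed by a regular expression, mirroring how ilike is called above. The pattern is only illustrative, not a recommended detection.

```
any(attachments,
    .file_extension == "pdf"
    and any(beta.binexplode(.),
        // assumption: iregex_search(string, pattern), same argument order as ilike
        // illustrative pattern: "norton" followed shortly by "renewal" or "invoice"
        any(.scan.strings.strings, iregex_search(., "norton.{0,20}(renewal|invoice)"))
    )
)
```

Because iregex_search is case-insensitive like ilike, the pattern needs no explicit case handling.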
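To find text that was extracted from a specific type of file or by a specific scanner, use the same string searching functions on the desired scanner output. For example, the rule below searches the OCR output (.scan.ocr.text) of Office documents and ZIP archives for the words "please", "enable", and "macros":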
any(attachments,
    .file_extension in~ ("doc", "docm", "docx", "dot", "dotm", "pptm", "ppsm", "xlm", "xls", "xlsb", "xlsm", "xlt", "xltm", "zip")
    and any(beta.binexplode(.),
        any(.scan.ocr.text, ilike(., "*please*"))
        and any(.scan.ocr.text, ilike(., "*enable*"))
        and any(.scan.ocr.text, ilike(., "*macros*"))
    )
)
A note on performance
The beta.binexplode function is a relatively expensive operation, so you typically don't want to run it on every file attachment. Instead, include a pre-filter to limit it to specific file types, such as archives, PDFs, and HTML files.
One way to do this, as shown in the examples above, is to check .file_extension before calling beta.binexplode.
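As a further sketch of the pre-filter pattern, the snippet below limits beta.binexplode to HTML attachments before searching the extracted strings. It reuses only constructs from the examples above. The extension list and the search term are illustrative assumptions, not recommended values.

```
any(attachments,
    // pre-filter: only HTML attachments are passed to beta.binexplode
    // (illustrative extension list)
    .file_extension in~ ("html", "htm")
    and any(beta.binexplode(.),
        // illustrative search term
        any(.scan.strings.strings, ilike(., "*password*"))
    )
)
```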
The MQL rule editor can display the output of the beta.binexplode function so you can easily build detection rules for the values extracted. First write your MQL snippet, then click on the beta.binexplode function, and then click the Evaluate play button.