Email messages can contain two sections of text data that an email client renders when a user views the message: html and plain.
As the section names imply, the html section can contain HTML mark-up such as hyperlinks, embedded images, text formatting, and more. The plain section does not render HTML and is displayed raw. Email clients are typically configured to display the html section by default when it's present, and to display the plain section only as a fall-back. Email messages are not required to use both of these sections.
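To make the two sections concrete, here is a minimal sketch using Python's standard `email` package. It is an illustration only and not part of the rule language; the addresses, subject, and helper names (`build_message`, `get_section`, `displayed_body`) are made up for the example.

```python
from email.message import EmailMessage


def build_message() -> EmailMessage:
    """Build a message that carries both a plain and an html section."""
    msg = EmailMessage()
    msg["Subject"] = "Voicemail notification"
    msg["From"] = "sender@test.local"      # hypothetical address
    msg["To"] = "recipient@test.local"     # hypothetical address
    # The first body becomes the text/plain part.
    msg.set_content("You have a new voicemail.")
    # Adding an alternative turns the message into multipart/alternative
    # and stores the HTML as a text/html part.
    msg.add_alternative(
        "<html><body><p>You have a new <b>voicemail</b>.</p></body></html>",
        subtype="html",
    )
    return msg


def get_section(msg: EmailMessage, subtype: str):
    """Return the text of the 'plain' or 'html' section, or None if absent."""
    part = msg.get_body(preferencelist=(subtype,))
    return part.get_content() if part is not None else None


def displayed_body(msg: EmailMessage):
    """Mimic a typical client: prefer html, fall back to plain."""
    part = msg.get_body(preferencelist=("html", "plain"))
    return part.get_content() if part is not None else None
```

A message built with only `set_content` has no html section, so `get_section(msg, "html")` returns `None` and `displayed_body` falls back to the plain text.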
For HTML bodies, content is stored in two fields: body.html.raw and body.html.inner_text.
The original HTML body is preserved in the raw field, and the decoded inner text is stored in inner_text. Unless you need to match specific HTML elements, it's best to prefer inner_text.
Example HTML body
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office">
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
  <style type="text/css" media="all">/* a lot of CSS, totally ignored! */</style>
</head>
<body width="100%">
  <span style="color:transparent;visibility:hidden;display:none;opacity:0;height:0;width:0;font-size:0;">This & that are in a hidden span.</span><img src="https://test.local/img.png">
  <!-- Here's a commentInsert ‌ hack after hidden preview text -->
  </div>
  <p>Some paragraph content, before a table</p>
  <table>
    <tr>
      <td>Row 1, Column 1. <span>Span contents inside R1C1</span></td>
      <td>Row 1, Column 2</td>
    </tr>
    <tr>
      <td>Row 2, Column 1</td>
      <td>Row 2, Column 2</td>
    </tr>
  </table>
  <!-- comment before an image link -->
  <a href="https://test.local"><img src="https://test.local/img.png"></a>
  <div>Copyright © 2022</div>
</body>
</html>
The inner text from the parsed HTML is much more compact. Note that newlines are automatically inserted between tags, regardless of whether they display on the same line visually.
This & that are in a hidden span.
Some paragraph content, before a table
Row 1, Column 1
Span contents inside R1C1
Row 1, Column 2
Row 2, Column 1
Row 2, Column 2
Copyright © 2022
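The relationship between the raw HTML and its inner text can be approximated with Python's standard `html.parser` module. This is only a sketch of the general idea (drop tags, comments, and CSS; unescape entities; put text from different tags on separate lines). It is not the parser the product uses, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class InnerTextExtractor(HTMLParser):
    """Collect visible text chunks, one per run of text between tags."""

    # Contents of these elements are code or styling, not message text.
    SKIPPED = {"style", "script"}

    def __init__(self):
        # convert_charrefs=True unescapes entities such as &amp; into &.
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collapse internal whitespace and drop whitespace-only runs.
        text = " ".join(data.split())
        if text and self._skip_depth == 0:
            self.chunks.append(text)

    # Comments and the doctype are ignored by the default handlers.


def inner_text(raw_html: str) -> str:
    """Return the text inside raw_html, one line per text run."""
    parser = InnerTextExtractor()
    parser.feed(raw_html)
    parser.close()
    return "\n".join(parser.chunks)
```

Like the parsed output described above, this sketch keeps text from hidden elements, because it never evaluates CSS; it only removes markup.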
When searching inside HTML contents
Due to the size of HTML content, searching inside body.html.raw can be very time intensive. For better performance, consider writing rules that use body.html.inner_text instead, which contains the unescaped text inside the HTML, with the text from different tags on different lines. This parsed field is much smaller and significantly faster to search.
For plain bodies, content is stored in body.plain.raw.
We can search both text sections easily to detect specific keywords or phrases:
any([body.plain.raw, body.html.inner_text], ilike(., "*voicemail*", "*password reset*"))
We can use regular expressions if we're looking for something more complex, like a Social Security Number:
any([body.plain.raw, body.html.inner_text], regex_search(., '\b(\d\d\d)-(\d\d)-(\d\d\d\d)\b'))
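For readers who want to prototype this kind of check outside the rule engine, the two rules above can be mirrored in plain Python. This is a sketch under assumptions: it treats ilike as a case-insensitive wildcard match (approximated here with `fnmatch`), and the helper names `ilike` and `any_section` are written for this example rather than taken from any library.

```python
import re
from fnmatch import fnmatchcase

# Same pattern as the rule above: 3 digits, 2 digits, 4 digits.
SSN_PATTERN = re.compile(r"\b(\d\d\d)-(\d\d)-(\d\d\d\d)\b")


def ilike(text: str, *patterns: str) -> bool:
    """Assumed semantics: True if any wildcard pattern matches, ignoring case."""
    lowered = text.lower()
    return any(fnmatchcase(lowered, p.lower()) for p in patterns)


def any_section(sections, predicate) -> bool:
    """True if the predicate holds for at least one section that is present."""
    return any(predicate(s) for s in sections if s is not None)


def mentions_keywords(plain, html_inner_text) -> bool:
    return any_section(
        [plain, html_inner_text],
        lambda s: ilike(s, "*voicemail*", "*password reset*"),
    )


def contains_ssn(plain, html_inner_text) -> bool:
    return any_section(
        [plain, html_inner_text],
        lambda s: SSN_PATTERN.search(s) is not None,
    )
```

Passing `None` for a missing section keeps the check safe for messages that carry only one of the two bodies.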