John Smith's Blog

Ramblings (mostly) about technical stuff

App Engine: What the docs don't tell you about processing inbound mail

Posted by John Smith on

For a while, I've wanted to add functionality to this blog to allow me to submit content via email; initially just photos, but eventually actual posts as well. As seems par for the course for any code involving email, stuff you'd expect to be simple and straightforward turns out to be anything but :-(

It doesn't help that the App Engine docs on receiving email gloss over a lot of stuff, so this is my attempt to try to fill in the gaps and cover the gotchas, so that others don't have to go through as much hassle as I did.

dev_appserver doesn't simulate inbound attachments

First off, whilst the current dev_appserver (1.4.1) does allow you to simulate sending a mail in, it doesn't have any explicit functionality for email attachments. This means you have the joy of doing your testing on the real App Engine. Now, luckily for me, I was able to (a) test this code in isolation without affecting the public functionality, and (b) do my deployments without any of the hanging that App Engine has a habit of doing every now and again, but it's still a painful way of evolving and testing code.

(Theoretically I imagine it's possible to cut-and-paste in the "raw" email bodies with Content-Type, Content-Disposition, base64 encoded data etc, to test attachments in the dev_appserver but I haven't tried it personally.)

Sender addresses aren't (just) e-mail addresses

As part of the protection against spam (or worse), I have a whitelist of acceptable senders; mails from anyone else get ignored. My first attempt at code for this was along the line of: if mail_msg.sender not in VALID_SENDERS: logging.error("...") return However, the sender property contains the full value of the Sender: header, so it's likely to be set to something like Fred Bloggs <fred@bloggs.com>. Whilst code to support this isn't exactly difficult, it's something that you wouldn't realize you needed to do when doing pseudo-mails on dev_appserver. Here's my code: is_valid = False for valid_sender in blog_settings.VALID_MAIL_SENDERS: if mail_msg.sender.find(valid_sender) >= 0: is_valid = True break if not is_valid: logging.error("Received mail from invalid sender '%s' - ignoring" % mail_msg.sender) return Now, this isn't perfect by any means - it should probably look for an exact match within the angle brackets, so that it doesn't get fooled by an email address in the "real name part" - but given how easy it is to fake a sender, I'm not too concerned; I have other protections in place, this is just a basic filter.

If there aren't any attachments, the attachments property doesn't exist, rather than being None

It's covered in this short thread but in summary: rather than having the attachments property be None or [] if an email lacks attachments, it doesn't actually exist, and so you have to use a try/except handler. Again, this is nothing difficult, but it is something you wouldn't necessarily realize until it bit you. try: logging.debug("Mail from %s has %d attachments" % (mail_msg.sender, len(mail_msg.attachments))) except AttributeError, e: logging.warning("Mail from %s has zero attachments - ignoring" % (mail_msg.sender)) return

You have to work out the attachment MIME type for yourself

The attachments property (if it exists) is a list of 2-member tuples. The first part of the tuple is the filename, the second the content. It would be nice if App Engine provided another member containing the MIME type that's defined in the Content-Type header where the filename is also specified, but unfortunately not :-( Instead you have to work it out for yourself, whether from the filename suffix, doing a magic number check on the file or using the original property to parse the message yourself.

Now, it's true that what a sender says the file type is shouldn't be blindly trusted to be legit or correct. However, it wouldn't hurt to have that information to use in an initial check for the >99% of cases that it is OK.

If you're going to trust the file extension (which is probably easier to fake than the MIME type...), you might want to look at google.appengine.api.mail, which has an EXTENSION_MIME_MAP dictionary. I've not used it personally - I'm currently only interested in a handful of common image formats - but it might be a reasonable base for working out the MIME type.

Attachments need decoding

The second member of the tuple in the attachments list is a google.appengine.api.mail.EncodedPayload. This has to be decoded using something along the lines of: for att in mail_msg.attachments: filename, encoded_data = att data = encoded_data.payload if encoded_data.encoding: data = data.decode(encoded_data.encoding) ... That class doesn't seem to support the len() function, so I'm not sure how you might protect yourself against a huge attachment that either can't be decoded before the timeout hits, or takes up more memory than App Engine is prepared to give you. I'm also assuming that the .decode method covers all the encodings that you might potentially receive. (Although I'm yet to see anything that isn't base64 in my own tests.)

Plain text bodies need decoding as well

You can explicitly request the plain-text message bodies (as opposed to any HTML bodies), but somewhat surprisingly, these aren't actually plain text! Instead they are EncodedPayload objects, and need decoding in a similar manner to the attachments. for b in mail_msg.bodies("text/plain"): body_type, pl = b try: if pl.encoding: logging.debug("Body: %s" % (pl.payload.decode(pl.payload.encoding))) else: logging.debug("Body: %s" % (pl.payload)) except Exception, e: logging.debug("Body: %s" % (pl)) (It wouldn't surprise me if the above code might have Unicode issues on certain content, but that's unlikely to be an issue in my own personal use.)

Email processing does retry if the code bombs (I think)

I'm not 100% sure on this one, and IMHO it's more of a positive feature than a gotcha, but it doesn't seem to be in the docs, so it's worth mentioning - the mail processing seems to work similar to task queue jobs, in that if a failure occurs, there are retries at gradually increasing intervals.

 

I'm sure there are other nasties involved in processing incoming emails, but my code seems to work fine now, so hopefully the above lessons might be of use to anyone else about to venture into this area. (Doubtless about 5 minutes after posting this I'll find that either I've been doing this all wrong, or that all of the above is fully documented somewhere that I haven't seen...)

About this blog

This blog (mostly) covers technology and software development.

Note: I've recently ported the content from my old blog hosted on Google App Engine using some custom code I wrote, to a static site built using Pelican. I've put in place various URL manipulation rules in the webserver config to try to support the old URLs, but it's likely that I've missed some (probably meta ones related to pagination or tagging), so apologies for any 404 errors that you get served.

RSS icon, courtesy of www.feedicons.com RSS feed for this blog

About the author

I'm a software developer who's worked with a variety of platforms and technologies over the past couple of decades, but for the past 7 or so years I've focussed on web development. Whilst I've always nominally been a "full-stack" developer, I feel more attachment to the back-end side of things.

I'm a web developer for a London-based equities exchange. I've worked at organizations such as News Corporation and Google and BATS Global Markets. Projects I've been involved in have been covered in outlets such as The Guardian, The Telegraph, the Financial Times, The Register and TechCrunch.

Twitter | LinkedIn | GitHub | My CV | Mail

Popular tags

Other sites I've built or been involved with

Work

Most of these have changed quite a bit since my involvement in them...

Personal/fun/experimentation