For a while, I've wanted to add functionality to this blog to allow me to submit content via email; initially just photos, but eventually actual posts as well. As seems par for the course for any code involving email, stuff you'd expect to be simple and straightforward turns out to be anything but :-(
It doesn't help that the App Engine docs on receiving email gloss over a lot of details, so this is my attempt to fill in the gaps and cover the gotchas, so that others don't have to go through as much hassle as I did.
dev_appserver doesn't simulate inbound attachments
First off, whilst the current dev_appserver (1.4.1) does allow you to simulate sending a mail in, it doesn't have any explicit functionality for email attachments. This means you have the joy of doing your testing on the real App Engine. Now, luckily for me, I was able to (a) test this code in isolation without affecting the public functionality, and (b) do my deployments without any of the hanging that App Engine has a habit of doing every now and again, but it's still a painful way of evolving and testing code.
(Theoretically I imagine it's possible to cut-and-paste in the "raw" email bodies with Content-Type, Content-Disposition, base64 encoded data etc, to test attachments in the dev_appserver but I haven't tried it personally.)
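For anyone wanting to try that, here is a minimal sketch of how such a "raw" message could be generated with Python's standard email package rather than typed by hand. The addresses, filename and image bytes are made up for illustration, and whether dev_appserver accepts the result as an inbound mail with a working attachment is untested.

```python
from email import encoders, message_from_string
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_raw_mail(sender, to, subject, body, filename, content, mime_type):
    """Return the raw text of a multipart mail carrying one attachment."""
    msg = MIMEMultipart()
    msg["From"] = sender
    msg["To"] = to
    msg["Subject"] = subject
    msg.attach(MIMEText(body, "plain"))

    maintype, subtype = mime_type.split("/", 1)
    part = MIMEBase(maintype, subtype)
    part.set_payload(content)
    # Base64-encode the payload and set Content-Transfer-Encoding to match.
    encoders.encode_base64(part)
    part.add_header("Content-Disposition", "attachment", filename=filename)
    msg.attach(part)
    return msg.as_string()


# Hypothetical test data: not a real image, just recognisable bytes.
raw_mail = build_raw_mail(
    "Fred Bloggs <fred@bloggs.com>", "photos@example.com", "Test photo",
    "Body text", "tiny.gif", b"GIF89a fake image bytes", "image/gif")

# Round-trip check: parse the text back and pull out the attachment part.
parsed = message_from_string(raw_mail)
attachment_parts = [p for p in parsed.walk() if p.get_filename()]
```

The string in raw_mail contains the Content-Type, Content-Disposition and base64 data mentioned above, so it is the sort of thing that could be pasted into a test form.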
Sender addresses aren't (just) e-mail addresses
As part of the protection against spam (or worse), I have a whitelist of acceptable senders; mails from anyone else get ignored. My first attempt at code for this was along the lines of:
if mail_msg.sender not in VALID_SENDERS:
    logging.error("...")
    return
However, the sender property contains the full value of the From: header, so it's likely to be set to something like Fred Bloggs <fred@bloggs.com>. Whilst code to support this isn't exactly difficult, it's something that you wouldn't realize you needed to do when doing pseudo-mails on dev_appserver. Here's my code:
is_valid = False
for valid_sender in blog_settings.VALID_MAIL_SENDERS:
    # Substring match, so "Fred Bloggs <fred@bloggs.com>" is accepted
    if mail_msg.sender.find(valid_sender) >= 0:
        is_valid = True
        break
if not is_valid:
    logging.error("Received mail from invalid sender '%s' - ignoring" % mail_msg.sender)
    return
Now, this isn't perfect by any means - it should probably look for an exact match within the angle brackets, so that it doesn't get fooled by an email address in the "real name part" - but given how easy it is to fake a sender, I'm not too concerned; I have other protections in place, this is just a basic filter.
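A stricter check along those lines might look like the sketch below, which uses email.utils.parseaddr from the standard library to split the header into its "real name" and address parts and then compares only the address. The whitelist contents and the is_valid_sender name are hypothetical, and the case-insensitive comparison is a choice made here, not something the original code does.

```python
from email.utils import parseaddr

# Hypothetical whitelist, standing in for blog_settings.VALID_MAIL_SENDERS.
VALID_MAIL_SENDERS = ["fred@bloggs.com"]


def is_valid_sender(sender_header, valid_senders=VALID_MAIL_SENDERS):
    """True only if the address part of the header is in the whitelist."""
    _real_name, address = parseaddr(sender_header)
    allowed = [v.lower() for v in valid_senders]
    return address.lower() in allowed
```

With this, a header such as "fred@bloggs.com" <evil@example.com> is rejected, because only the part inside the angle brackets is compared. It is still no defence against a forged sender.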
If there aren't any attachments, the attachments property doesn't exist, rather than being None
It's covered in this short thread but in summary: rather than having the attachments property be None or [] if an email lacks attachments, it doesn't actually exist, and so you have to use a try/except handler. Again, this is nothing difficult, but it is something you wouldn't necessarily realize until it bit you.
try:
    logging.debug("Mail from %s has %d attachments" % (mail_msg.sender, len(mail_msg.attachments)))
except AttributeError:
    # No attachments: the property is missing altogether
    logging.warning("Mail from %s has zero attachments - ignoring" % (mail_msg.sender))
    return
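An alternative to the try/except is getattr with a default, which gives you an empty list to work with when the attribute is missing. The sketch below uses a made-up FakeInboundMessage class as a stand-in so that it runs without the App Engine SDK; it only mimics the "attribute may not exist" behaviour described above.

```python
class FakeInboundMessage(object):
    """Hypothetical stand-in for an inbound message (not the App Engine class)."""

    def __init__(self, sender, attachments=None):
        self.sender = sender
        # Mimic the behaviour described above: no attachments, no attribute.
        if attachments is not None:
            self.attachments = attachments


def get_attachments(mail_msg):
    """Return the attachments list, or [] when the attribute is absent."""
    return getattr(mail_msg, "attachments", [])


without = FakeInboundMessage("fred@bloggs.com")
with_one = FakeInboundMessage("fred@bloggs.com", [("photo.jpg", "...")])
```

The calling code can then test len(get_attachments(mail_msg)) without caring which case it is in.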
You have to work out the attachment MIME type for yourself
The attachments property (if it exists) is a list of 2-member tuples. The first part of the tuple is the filename, the second the content. It would be nice if App Engine provided another member containing the MIME type that's defined in the Content-Type header where the filename is also specified, but unfortunately not :-( Instead you have to work it out for yourself, whether by going on the filename suffix, doing a magic number check on the content, or using the original property to parse the message yourself.
Now, it's true that what a sender says the file type is shouldn't be blindly trusted to be legit or correct. However, it wouldn't hurt to have that information to use in an initial check for the >99% of cases that it is OK.
If you're going to trust the file extension (which is probably easier to fake than the MIME type...), you might want to look at google.appengine.api.mail, which has an EXTENSION_MIME_MAP dictionary. I've not used it personally - I'm currently only interested in a handful of common image formats - but it might be a reasonable base for working out the MIME type.
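As a rough illustration of the first two options, the sketch below checks the leading bytes of the decoded content against the signatures of a few common image formats, and falls back to the filename suffix otherwise. It uses the standard library's mimetypes module for the suffix lookup rather than EXTENSION_MIME_MAP, and the function names are made up for this example.

```python
import mimetypes

# Leading bytes ("magic numbers") of a few common image formats.
IMAGE_SIGNATURES = [
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_image_type(data):
    """Return a MIME type based on the content's first bytes, or None."""
    for signature, mime_type in IMAGE_SIGNATURES:
        if data.startswith(signature):
            return mime_type
    return None


def guess_mime_type(filename, data):
    """Prefer what the content looks like; fall back to the filename suffix."""
    sniffed = sniff_image_type(data)
    if sniffed:
        return sniffed
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed
```

Putting the content check first means a PNG that has been named photo.jpg is reported as image/png. The data argument is the attachment after decoding, not the still-encoded payload.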
Attachments need decoding
The second member of the tuple in the attachments list is a google.appengine.api.mail.EncodedPayload. This has to be decoded using something along the lines of:
for att in mail_msg.attachments:
    filename, encoded_data = att
    data = encoded_data.payload
    if encoded_data.encoding:
        data = data.decode(encoded_data.encoding)
    ...
That class doesn't seem to support the len() function, so I'm not sure how you might protect yourself against a huge attachment that either can't be decoded before the timeout hits, or takes up more memory than App Engine is prepared to give you. I'm also assuming that the .decode method covers all the encodings that you might potentially receive. (Although I'm yet to see anything that isn't base64 in my own tests.)
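One possible guard against oversized attachments, assuming the payload attribute is an ordinary string as the decoding code above treats it, is to estimate the decoded size from the length of the still-encoded base64 text and refuse to decode anything over a limit. The limit value and function names below are hypothetical, and the sketch only handles base64. It also can't stop App Engine from having already loaded the encoded data into memory.

```python
import base64

MAX_DECODED_BYTES = 5 * 1024 * 1024  # hypothetical limit, pick your own


def estimated_b64_size(payload):
    """Upper bound on the decoded size of base64 text (ignores padding)."""
    significant = len(payload) - payload.count("\n") - payload.count("\r")
    return (significant * 3) // 4


def safe_b64decode(payload, limit=MAX_DECODED_BYTES):
    """Decode base64 text, or return None if it looks too big to bother."""
    if estimated_b64_size(payload) > limit:
        return None
    return base64.b64decode(payload)


# A 1000-byte attachment, encoded with line breaks the way mail usually is.
sample = base64.encodebytes(b"x" * 1000).decode("ascii")
```

Every 4 base64 characters decode to at most 3 bytes, so the estimate never undershoots the real size.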
Plain text bodies need decoding as well
You can explicitly request the plain-text message bodies (as opposed to any HTML bodies), but somewhat surprisingly, these aren't actually plain text! Instead they are EncodedPayload objects, and need decoding in a similar manner to the attachments.
for b in mail_msg.bodies("text/plain"):
    body_type, pl = b
    try:
        if pl.encoding:
            # The encoding lives on the EncodedPayload, not on its payload string
            logging.debug("Body: %s" % (pl.payload.decode(pl.encoding)))
        else:
            logging.debug("Body: %s" % (pl.payload))
    except Exception:
        logging.debug("Body: %s" % (pl))
(It wouldn't surprise me if the above code might have Unicode issues on certain content, but that's unlikely to be an issue in my own personal use.)
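To show where the Unicode issues would come from, here is a standalone sketch of the two decoding steps a body generally needs: first undo the transfer encoding (base64 or quoted-printable), then turn the resulting bytes into text using the character set. The decode_body function is hypothetical, the charset argument is an assumption about what information you have to hand, and the code is written for Python 3 where bytes and text are separate types, so it would need adjusting for App Engine's Python 2 runtime.

```python
import base64
import quopri


def decode_body(payload, encoding=None, charset=None):
    """Turn raw body bytes into text.

    payload  - the body as bytes, still transfer-encoded
    encoding - Content-Transfer-Encoding value, e.g. "base64", or None
    charset  - character set name, e.g. "utf-8", or None if unknown
    """
    encoding = (encoding or "").lower()
    if encoding == "base64":
        raw = base64.b64decode(payload)
    elif encoding == "quoted-printable":
        raw = quopri.decodestring(payload)
    else:
        # 7bit, 8bit, binary or unset: nothing to undo
        raw = payload
    try:
        return raw.decode(charset or "ascii")
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset: keep going rather than lose the mail
        return raw.decode("utf-8", "replace")
```

The fallback swaps any undecodable bytes for replacement characters, which is lossy but means a mislabelled mail still gets processed.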
Email processing does retry if the code bombs (I think)
I'm not 100% sure on this one, and IMHO it's more of a positive feature than a gotcha, but it doesn't seem to be in the docs, so it's worth mentioning - the mail processing seems to work similarly to task queue jobs, in that if a failure occurs, there are retries at gradually increasing intervals.
I'm sure there are other nasties involved in processing incoming emails, but my code seems to work fine now, so hopefully the above lessons might be of use to anyone else about to venture into this area. (Doubtless about 5 minutes after posting this I'll find that either I've been doing this all wrong, or that all of the above is fully documented somewhere that I haven't seen...)