John Smith's Blog

Ramblings (mostly) about technical stuff

Test post to verify migration to App Engine High-Replication Datastore worked OK

Posted by John Smith on

The App Engine console is a paragon of crapness at the best of times, but the functionality to do a migration from the old master/slave datastore is in a class of its own.

Hopefully I'll only have to do this the once - at least for this blog, I've still got a bunch of other apps that in theory should be migrated, although a number of them don't actually use the datastore, so fingers-crossed I can just leave them as-is.

Reinvented the wheel and built my own IP address checker

Posted by John Smith on

I've recently started using a VPN for the first time in years, and was using WhatIsMyIP to sanity check that I was indeed seeing the net via a different IP than that provided by my ISP. However, there were a few things I wasn't too happy about:

  • I was concerned that my repeated queries to that site might be detected as abusive.
  • Alternatively, I might be seeing cached results from an earlier query on a different network setup.
  • As someone happiest using the Unix command line, neither switching to a browser window, nor using curl and parsing the HTML output, were ideal.

So, I spent a few hours knocking up my own variation of this type of service, doubtless the gazillionth implementation clogging up the internet, which you can find here. While it's still pretty basic, there are a couple of features that I haven't noticed in other implementations:

  • A Geo-IP lookup is done, to identify the originating country, region, city and latitude and longitude. This data is obtained via a Google API, so it's probably as accurate as these things get - which isn't very much, at least at the lat/long level. (The main motivation for adding this functionality was to help analyse if my VPN can be abused to break region restrictions on sites like Hulu ;-)
  • To make things more convenient for non-browser uses, multiple output formats are supported (HTML, plain text, CSV, XML and JSON), which can be specified either by an old-school format=whatever CGI argument, or a more RESTful way using the HTTP Accept header.

Here are a couple of examples of usage:

    [john@hamburg ~]$ curl -H "Accept: text/plain" "http://report-ip.appspot.com"
    IP Address: x.x.x.x
    Country: GB
    Region: eng
    City: london
    Lat/Long: 51.513330,-0.088947
    Accept: text/plain
    Content-Type: ; charset="utf-8"
    Host: report-ip.appspot.com
    User-Agent: curl/7.21.3 (x86_64-redhat-linux-gnu) libcurl/7.21.3 NSS/3.13.1.0 zlib/1.2.5 libidn/1.19 libssh2/1.2.7

    [john@hamburg ~]$ curl "http://report-ip.appspot.com/?format=json"
    {
      "ipAddress": "x.x.x.x",
      "country": "GB",
      "region": "eng",
      "city": "london",
      "latLong": "51.513330,-0.088947",
      "headers": {
        "Accept": "*/*",
        "Content-Type": "; charset="utf-8"",
        "Host": "report-ip.appspot.com",
        "User-Agent": "curl/7.21.3 (x86_64-redhat-linux-gnu) libcurl/7.21.3 NSS/3.13.1.0 zlib/1.2.5 libidn/1.19 libssh2/1.2.7"
      }
    }
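For anyone curious how the format selection might look, here is a minimal sketch of the idea: an explicit format= argument wins, otherwise the Accept header is consulted. This is not the code from the repo; the function name, the format keys and the MIME table are all assumptions made up for illustration, and q-values in the Accept header are simply ignored.

```python
# Hypothetical sketch: the names and the format table are illustrative,
# not taken from the real app.
FORMATS = {
    "html": "text/html",
    "text": "text/plain",
    "csv": "text/csv",
    "xml": "application/xml",
    "json": "application/json",
}


def choose_format(query_format, accept_header, default="html"):
    """Return a key of FORMATS: an explicit format= wins, then the Accept header."""
    if query_format and query_format.lower() in FORMATS:
        return query_format.lower()
    # Walk the Accept header in order, ignoring q-values for simplicity.
    for item in (accept_header or "").split(","):
        mime = item.split(";")[0].strip().lower()
        for name, known_mime in FORMATS.items():
            if mime == known_mime:
                return name
    # Nothing recognised (e.g. curl's default "*/*"), so fall back.
    return default
```

A fuller version would honour q-values and wildcards such as text/*, but for a command-line tool this simple first-match approach is probably enough.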

I've created a project on GitHub, so you can see how minimal the underlying Python code is. The README has some notes about what extra stuff I might add in at some point, in the event I can be bothered.

As the live app is just running off an unbilled App Engine instance, it won't take much traffic before hitting the free quota limits. As such, in the unlikely event that someone out there wants to make use of this, you might be better off grabbing the code from the repo and deploying it to your own App Engine instance.

Converting old App Engine code to Python 2.7/Django 1.2/webapp2

Posted by John Smith on

I'm borrowing the code for this blog for another project I'm working on, and it seemed to make sense to take the opportunity to bring it up to speed with the latest-and-greatest in the world of App Engine, which is:

  • Python 2.7 (the main benefit for me; I don't like having to dick around with 2.6 or 2.5 installations)
  • multithreading (not really needed for the negligible traffic I get, but worth having, especially given that the new billing scheme seems to assume you'll have this enabled if you don't want to be ripped off)
  • webapp2 (which seems to be the recommended serving mechanism if you're not going to a "proper" Django infrastructure)
  • Django 1.2 templating (I'd used this on a work project a few months ago, but the blog was still using 0.96)

Of course, having so many changed elements in the mix in a single hit is a recipe for disaster; with things breaking left, right and centre, working out the cause of each breakage was a bit needle-in-a-haystackish. It didn't help that the Py2.7 docs on the official site are still very sketchy, so I ended up digging through the library code quite a bit to suss out what was happening.

As far as I can tell, I've now got everything fixed and working - although this site is still running the old code, as the Python 2.7 runtime has a dependency on the HR datastore, and this app is still using Master/Slave.

I ended up writing a mini-app, in order to develop and test the fixes without all the cruft from my blog code, which I'll see about uploading to my GitHub account at some point. In the mean-time, here are my notes about the stuff I changed. I'm sure there are things which are sub-optimal or incomplete, but hopefully they might save someone else time...

app.yaml

  • Change runtime from python to python27
  • Add threadsafe: true
  • Add a libraries section:
        libraries:
        - name: django
          version: "1.2"
  • Change handler script references from foo.py to foo.app
  • Only scripts in the top-level directory work as handlers, so if you have any in subdirectories, they'll need to be moved, and the script reference changed accordingly:
        - url: /whatever
          # This doesn't work ...
          # script: lib/some_library/handler.app
          # ... this does work
          script: handler.app

Templates

  • In Django 1.2 escaping is enabled by default. If you need HTML to be passed through unmolested, use something like:
        {% autoescape off %}
        {{ myHTMLString }}
        {% endautoescape %}
  • If you're using {% extends %}, paths are referenced relative to the template base directory, not to that file. Here's a table showing examples of the old and new values:
    File                   Old {% extends %} value   New {% extends %} value
    base.html              N/A                       N/A
    admin/adminbase.html   "../base.html"            "base.html"
    admin/index.html       "adminbase.html"          "admin/adminbase.html"
  • If you have custom tags or filters, you need to {% load %} them in the template, rather than using webapp.template.register_template_library() in your main Python code.
    e.g.
    Old code (in your Python file):
        webapp.template.register_template_library('django_custom_tags')
    New code (in your template):
        {% load django_custom_tags %}
    (There's more that has to be done in this area; see below.)
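If you have a lot of templates, the {% extends %} change can be worked out mechanically: join the old relative value onto the directory of the file that contains it, then normalise. Here's a rough sketch of that calculation; the function name is made up for illustration, and it assumes forward-slash template paths given relative to the template base directory.

```python
import posixpath


def new_extends_value(template_file, old_extends):
    """Convert a file-relative {% extends %} path to a base-directory-relative one.

    template_file is the path of the template containing the tag, relative
    to the template base directory; old_extends is the old tag value.
    """
    containing_dir = posixpath.dirname(template_file)
    return posixpath.normpath(posixpath.join(containing_dir, old_extends))


print(new_extends_value("admin/adminbase.html", "../base.html"))
print(new_extends_value("admin/index.html", "adminbase.html"))
```

The two calls reproduce the old-to-new mappings for admin/adminbase.html and admin/index.html described above.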

Custom tag/filter code

  • Previously you could just have these in a standalone .py file which would be pulled in via webapp.template.register_template_library(). Instead now you'll have to create a Django app to hold them:
    1. In a Django settings.py file, add the new app to INSTALLED_APPS e.g.:
           INSTALLED_APPS = ('customtags',)
       (Note the trailing comma; without it, this is just a string in brackets rather than a one-element tuple.)
    2. Create an app directory structure along the following lines:
           customtags/
           customtags/__init__.py
           customtags/templatetags/
           customtags/templatetags/__init__.py
           customtags/templatetags/django_custom_tags.py
       Both the __init__.py files can be zero-length. Replace customtags and django_custom_tags with whatever you want - the former is what should be referenced in INSTALLED_APPS, the latter is what you {% load whatever %} in your templates.
    3. In your file(s) in the templatetags/ directory, you need to change the way the new tags/filters are registered at the top of the file.
       Old code:
           from google.appengine.ext.webapp import template
           register = template.create_template_register()
       New code:
           from django.template import Library
           register = Library()
       The register.tag() and register.filter() calls will then work the same as previously.

Handlers

  • Change from google.appengine.ext import webapp to import webapp2 and change your RequestHandler classes and WSGIApplication accordingly
  • If your WSGIApplication ran from within a main() function, move it out.
    e.g.
    Old code:
        def main():
            application = webapp.WSGIApplication(...)
            wsgiref.handlers.CGIHandler().run(application)

        if __name__ == '__main__':
            main()
    New code:
        app = webapp2.WSGIApplication(...)
    Note in the new code:
    1. The lack of a run() call
    2. That the WSGIApplication must be called app - if it isn't, you'll get an error like:
           ERROR 2012-01-29 22:17:37,607 wsgi.py:170]
           Traceback (most recent call last):
             File "/proj/3rdparty/appengine/google_appengine_161/google/appengine/runtime/wsgi.py", line 168, in Handle
               handler = _config_handle.add_wsgi_middleware(self._LoadHandler())
             File "/proj/3rdparty/appengine/google_appengine_161/google/appengine/runtime/wsgi.py", line 220, in _LoadHandler
               raise ImportError('%s has no attribute %s' % (handler, name))
           ImportError: has no attribute app
  • Any 'global' changes you might make at the main level won't be applied across every invocation of the RequestHandlers - I'm thinking of things like setting a different logging level, or setting the DJANGO_SETTINGS_MODULE. These have to be done within the methods of your handlers instead. As this is obviously painful to do for every handler, you might consider using custom handler classes to handle the burden - see below.

Rendering Django templates

The imports and calls to render a template from a file need changing.
Old code:
    from google.appengine.ext.webapp import template
    ...
    rendered_content = template.render(template_path, {...})
New code:
    from django.template.loaders.filesystem import Loader
    from django.template.loader import render_to_string
    ...
    rendered_content = render_to_string(template_file, {...})
As render_to_string() doesn't explicitly get told where your templates live, you need to do this in settings.py:
    import os
    PROJECT_ROOT = os.path.dirname(__file__)
    TEMPLATE_DIRS = (os.path.join(PROJECT_ROOT, "templates"),)

Custom request handlers

As previously mentioned, where previously you could easily set global environment stuff, these now have to be done in each handler. As this is painful, one nicer solution is to create a special class to set all that stuff up, and then have your handlers inherit from that rather than webapp2.RequestHandler.

Here's a handler to be more talkative in the logs, and which also sets up the DJANGO_SETTINGS_MODULE environment variable.

    import logging
    import os
    import time

    import webapp2

    class LoggingHandler(webapp2.RequestHandler):
        def __init__(self, request, response):
            # Let webapp2 set up self.request/self.response first
            self.initialize(request, response)
            logging.getLogger().setLevel(logging.DEBUG)
            self.init_time = time.time()
            os.environ["DJANGO_SETTINGS_MODULE"] = "settings"

        def __del__(self):
            logging.debug("Handler for %s took %.2f seconds" %
                          (self.request.url, time.time() - self.init_time))

A couple of things to note:

  1. the webapp2.RequestHandler constructor takes request and response parameters, whereas webapp.RequestHandler just took a single self parameter
  2. Use the .initialize() method to set up the object before doing your custom stuff, rather than __init__(self)

Twitter feed on this blog broken

Posted by John Smith on

A lame update in just about every respect...

I've belatedly noticed that the tweet panel on the right hasn't been updated for a week or so. Investigating this using App Engine's local development server though, the latest tweets were pulled in fine.

The logs on the live server indicated that Twitter was returning an HTTP 400 'Bad Request' status, leading me to suspect that maybe something within the Google infrastructure was mangling the request in some way. Only by dumping the HTTP headers returned from Twitter did I find that my requests were actually being refused due to a rate-limit being breached. In other words, the 400 status was a complete red herring; something like 509 'Bandwidth Limit Exceeded' would have described the true cause of the problem far more accurately.

Digging up the Twitter API docs, it seems that unauthenticated GET requests are rate-limited by IP address. Given that thousands of App Engine apps will effectively all share the same IP, it's hardly surprising that the limit of 150 requests (per hour?) has already been reached every time I make a request. I've been caught out by this before with other APIs, but this is the first time I've been aware of any such problems with Twitter. I guess I'll have to upgrade my code to use OAuth or something - assuming I can be bothered.
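Since the status code alone was so misleading, it's worth logging something more useful whenever a fetch fails. Below is a rough sketch of the kind of check that would have saved some head-scratching. The function name is invented, and the X-RateLimit-Remaining and X-RateLimit-Reset header names are an assumption about what the API sends back, so check them against a real response before relying on this.

```python
def explain_failure(status, headers):
    """Return a short, log-friendly description of a failed API response."""
    # Header names are case-insensitive, so normalise before looking anything up.
    lowered = dict((k.lower(), v) for k, v in headers.items())
    # Assumed header names; adjust to whatever the API really returns.
    remaining = lowered.get("x-ratelimit-remaining")
    if remaining is not None and remaining.strip() == "0":
        reset = lowered.get("x-ratelimit-reset", "unknown")
        return "HTTP %d: rate limit exhausted (resets at %s)" % (status, reset)
    return "HTTP %d: no rate-limit header explains this" % status
```

The headers argument is just a plain dict here, so it should be easy to feed from whatever HTTP client you happen to be using.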

In this supposed era of cloud computing, rate-limiting by IP seems a bit of a crummy thing to do. It'll probably be reasonable if/when IPv6 becomes the default addressing method, but when just about everyone is still using IPv4, it's a colossal pain in the backside.

App Engine: What the docs don't tell you about processing inbound mail

Posted by John Smith on

For a while, I've wanted to add functionality to this blog to allow me to submit content via email; initially just photos, but eventually actual posts as well. As seems par for the course for any code involving email, stuff you'd expect to be simple and straightforward turns out to be anything but :-(

It doesn't help that the App Engine docs on receiving email gloss over a lot of stuff, so this is my attempt to try to fill in the gaps and cover the gotchas, so that others don't have to go through as much hassle as I did.

dev_appserver doesn't simulate inbound attachments

First off, whilst the current dev_appserver (1.4.1) does allow you to simulate sending a mail in, it doesn't have any explicit functionality for email attachments. This means you have the joy of doing your testing on the real App Engine. Now, luckily for me, I was able to (a) test this code in isolation without affecting the public functionality, and (b) do my deployments without any of the hanging that App Engine has a habit of doing every now and again, but it's still a painful way of evolving and testing code.

(Theoretically I imagine it's possible to cut-and-paste in the "raw" email bodies with Content-Type, Content-Disposition, base64 encoded data etc, to test attachments in the dev_appserver but I haven't tried it personally.)

Sender addresses aren't (just) e-mail addresses

As part of the protection against spam (or worse), I have a whitelist of acceptable senders; mails from anyone else get ignored. My first attempt at code for this was along the line of:

    if mail_msg.sender not in VALID_SENDERS:
        logging.error("...")
        return

However, the sender property contains the full value of the Sender: header, so it's likely to be set to something like Fred Bloggs <fred@bloggs.com>. Whilst code to support this isn't exactly difficult, it's something that you wouldn't realize you needed to do when doing pseudo-mails on dev_appserver. Here's my code:

    is_valid = False
    for valid_sender in blog_settings.VALID_MAIL_SENDERS:
        if mail_msg.sender.find(valid_sender) >= 0:
            is_valid = True
            break
    if not is_valid:
        logging.error("Received mail from invalid sender '%s' - ignoring" % mail_msg.sender)
        return

Now, this isn't perfect by any means - it should probably look for an exact match within the angle brackets, so that it doesn't get fooled by an email address in the "real name part" - but given how easy it is to fake a sender, I'm not too concerned; I have other protections in place, this is just a basic filter.
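For what it's worth, the stricter "exact match within the angle brackets" check can be sketched with the standard library's email.utils.parseaddr(), which splits a header value into the real name and the address. This is an illustration rather than the code running on this blog; VALID_SENDERS and is_valid_sender are hypothetical names, and the same caveat about faked senders still applies.

```python
from email.utils import parseaddr

# Hypothetical whitelist, lower-cased for comparison.
VALID_SENDERS = {"fred@bloggs.com"}


def is_valid_sender(sender_header, valid_senders=VALID_SENDERS):
    """True only if the address part of the header exactly matches the whitelist."""
    _real_name, address = parseaddr(sender_header)
    return address.lower() in valid_senders
```

With this, 'Fred Bloggs <fred@bloggs.com>' passes, whereas a whitelisted address that only appears in the quoted real-name part, or as a prefix of a longer domain, does not.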

If there aren't any attachments, the attachments property doesn't exist, rather than being None

It's covered in this short thread but in summary: rather than having the attachments property be None or [] if an email lacks attachments, it doesn't actually exist, and so you have to use a try/except handler. Again, this is nothing difficult, but it is something you wouldn't necessarily realize until it bit you.

    try:
        logging.debug("Mail from %s has %d attachments" %
                      (mail_msg.sender, len(mail_msg.attachments)))
    except AttributeError, e:
        logging.warning("Mail from %s has zero attachments - ignoring" % (mail_msg.sender))
        return
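An alternative to try/except, if you'd rather always have a list to loop over, is getattr() with a default. A minimal sketch follows; FakeMessage is purely a stand-in so the snippet runs outside App Engine, and it mimics the "attribute is missing" behaviour described above rather than the real message class.

```python
class FakeMessage(object):
    """Stand-in for an inbound mail; only has .attachments when given some."""

    def __init__(self, sender, attachments=None):
        self.sender = sender
        if attachments is not None:
            self.attachments = attachments


def get_attachments(mail_msg):
    """Return the attachments, or an empty list if the attribute is absent."""
    return getattr(mail_msg, "attachments", [])


print(len(get_attachments(FakeMessage("fred@bloggs.com"))))
```

Whether this or the try/except is nicer is mostly a matter of taste; the getattr() form just avoids the exception handler when all you want is "no attachments means nothing to do".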

You have to work out the attachment MIME type for yourself

The attachments property (if it exists) is a list of 2-member tuples. The first part of the tuple is the filename, the second the content. It would be nice if App Engine provided another member containing the MIME type that's defined in the Content-Type header where the filename is also specified, but unfortunately not :-( Instead you have to work it out for yourself, whether from the filename suffix, doing a magic number check on the file or using the original property to parse the message yourself.

Now, it's true that what a sender says the file type is shouldn't be blindly trusted to be legit or correct. However, it wouldn't hurt to have that information to use in an initial check for the >99% of cases that it is OK.

If you're going to trust the file extension (which is probably easier to fake than the MIME type...), you might want to look at google.appengine.api.mail, which has an EXTENSION_MIME_MAP dictionary. I've not used it personally - I'm currently only interested in a handful of common image formats - but it might be a reasonable base for working out the MIME type.
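As an illustration of the "work it out for yourself" options, here's a small sketch that tries a magic number check first and only then falls back to the filename suffix, using the standard library's mimetypes module rather than EXTENSION_MIME_MAP. The function name and the short signature table are my own choices for the example, and it only covers a handful of common image formats.

```python
import mimetypes

# Leading bytes ("magic numbers") for a few common image formats.
MAGIC_NUMBERS = (
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)


def guess_image_type(filename, data):
    """Guess a MIME type from the decoded content, else from the filename."""
    for magic, mime_type in MAGIC_NUMBERS:
        if data.startswith(magic):
            return mime_type
    # Fall back to the (easily faked) file extension; may return None.
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed
```

Note that data needs to be the decoded attachment content, not the still-encoded payload.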

Attachments need decoding

The second member of the tuple in the attachments list is a google.appengine.api.mail.EncodedPayload. This has to be decoded using something along the lines of:

    for att in mail_msg.attachments:
        filename, encoded_data = att
        data = encoded_data.payload
        if encoded_data.encoding:
            data = data.decode(encoded_data.encoding)
        ...

That class doesn't seem to support the len() function, so I'm not sure how you might protect yourself against a huge attachment that either can't be decoded before the timeout hits, or takes up more memory than App Engine is prepared to give you. I'm also assuming that the .decode method covers all the encodings that you might potentially receive. (Although I'm yet to see anything that isn't base64 in my own tests.)
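One possible guard against oversized attachments, offered as an untested-on-App-Engine idea: although the payload object itself lacks len(), its .payload member is a string, and for base64 the decoded size is at most three quarters of the encoded length. The sketch below uses a namedtuple as a stand-in for EncodedPayload and the standard base64 module for decoding; the names and the 5 MB limit are arbitrary assumptions.

```python
import base64
from collections import namedtuple

# Stand-in with the two members used above; not the real App Engine class.
FakePayload = namedtuple("FakePayload", ["payload", "encoding"])

MAX_DECODED_BYTES = 5 * 1024 * 1024  # arbitrary limit for illustration


def decode_attachment(encoded, max_bytes=MAX_DECODED_BYTES):
    """Decode a payload, refusing anything that would exceed max_bytes."""
    raw = encoded.payload
    if not encoded.encoding:
        estimated = len(raw)
    elif encoded.encoding.lower() == "base64":
        # Every 4 encoded characters yield at most 3 decoded bytes.
        estimated = len(raw) * 3 // 4
    else:
        raise ValueError("Unhandled encoding: %s" % encoded.encoding)
    if estimated > max_bytes:
        raise ValueError("Attachment too large (about %d bytes)" % estimated)
    if not encoded.encoding:
        return raw
    return base64.b64decode(raw)
```

This only stops you decoding something huge; it doesn't help if the encoded payload is already too big to hold in memory by the time your handler sees it.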

Plain text bodies need decoding as well

You can explicitly request the plain-text message bodies (as opposed to any HTML bodies), but somewhat surprisingly, these aren't actually plain text! Instead they are EncodedPayload objects, and need decoding in a similar manner to the attachments.

    for b in mail_msg.bodies("text/plain"):
        body_type, pl = b
        try:
            if pl.encoding:
                # The encoding lives on the EncodedPayload, not on its payload string
                logging.debug("Body: %s" % (pl.payload.decode(pl.encoding)))
            else:
                logging.debug("Body: %s" % (pl.payload))
        except Exception, e:
            logging.debug("Body: %s" % (pl))

(It wouldn't surprise me if the above code might have Unicode issues on certain content, but that's unlikely to be an issue in my own personal use.)

Email processing does retry if the code bombs (I think)

I'm not 100% sure on this one, and IMHO it's more of a positive feature than a gotcha, but it doesn't seem to be in the docs, so it's worth mentioning - the mail processing seems to work similarly to task queue jobs, in that if a failure occurs, there are retries at gradually increasing intervals.

 

I'm sure there are other nasties involved in processing incoming emails, but my code seems to work fine now, so hopefully the above lessons might be of use to anyone else about to venture into this area. (Doubtless about 5 minutes after posting this I'll find that either I've been doing this all wrong, or that all of the above is fully documented somewhere that I haven't seen...)

First release of my App Engine library for easier memcaching of pages

Posted by John Smith on

I've just pushed memcachablehandler to GitHub, which is a small Python App Engine library to make it easy to memcache pages - or images, or anything else you might serve up - and re-serve them without having to regenerate them from a Django template or suchlike. This should speed up response times ever so slightly, and also maybe make things more reliable as well (based on my personal experience with the memcache vs datastore availability).
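The underlying idea is the usual check-the-cache-first pattern. The sketch below is not the library's API, just a bare-bones illustration of the pattern with invented names; DictCache is an in-memory stand-in so that it runs anywhere, where real code would talk to memcache instead.

```python
class DictCache(object):
    """In-memory stand-in with memcache-style get/set, for illustration only."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def get_or_render(cache, key, render):
    """Serve cached content if present, otherwise render it and cache it."""
    cached = cache.get(key)
    if cached is not None:
        return cached
    content = render()  # the expensive bit, e.g. rendering a template
    cache.set(key, content)
    return content
```

A real handler would also need some way to expire or invalidate entries when the underlying content changes, which this sketch ignores.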

The library is a slightly-tweaked version of some of the code that I've had in this blog for the past few days, so hopefully it's not too buggy. I know I'm not the first to write something like this - see the README for a link to something similar - but maybe it could come in useful to someone else?

I don't currently have any plans to extend the functionality beyond what's already there, but anything that gets updated in this blog should get pushed into that repo in fairly short order. At some point I'll probably make this blog code public as well, but I want to get it in a much more polished state before daring to show it to the world :-)

About this blog

This blog (mostly) covers technology and software development.

Note: I've recently ported the content from my old blog hosted on Google App Engine using some custom code I wrote, to a static site built using Pelican. I've put in place various URL manipulation rules in the webserver config to try to support the old URLs, but it's likely that I've missed some (probably meta ones related to pagination or tagging), so apologies for any 404 errors that you get served.

About the author

I'm a software developer who's worked with a variety of platforms and technologies over the past couple of decades, but for the past 7 or so years I've focussed on web development. Whilst I've always nominally been a "full-stack" developer, I feel more attachment to the back-end side of things.

I'm a web developer for a London-based equities exchange. I've worked at organizations such as News Corporation, Google and BATS Global Markets. Projects I've been involved in have been covered in outlets such as The Guardian, The Telegraph, the Financial Times, The Register and TechCrunch.
