John Smith's Blog

Ramblings (mostly) about technical stuff

Reinvented the wheel and built my own IP address checker

Posted by John Smith on

I've recently started using a VPN for the first time in years, and was using WhatIsMyIP to sanity check that I was indeed seeing the net via a different IP than that provided by my ISP. However, there were a few things I wasn't too happy about:

  • I was concerned that my repeated queries to that site might be detected as abusive.
  • Alternatively, I might be seeing cached results from an earlier query on a different network setup.
  • As someone happiest using the Unix command line, neither switching to a browser window nor using curl and parsing the HTML output was ideal.

So, I spent a few hours knocking up my own variation of this type of service, doubtless the gazillionth implementation clogging up the internet, which you can find here. While it's still pretty basic, there are a couple of features that I haven't noticed in other implementations:

  • A Geo-IP lookup is done, to identify the originating country, region, city and latitude and longitude. This data is obtained via a Google API, so it's probably as accurate as these things get - which isn't very much, at least at the lat/long level. (The main motivation for adding this functionality was to help analyse if my VPN can be abused to break region restrictions on sites like Hulu ;-)
  • To make things more convenient for non-browser uses, multiple output formats are supported (HTML, plain text, CSV, XML and JSON), which can be specified either by an old-school format=whatever CGI argument, or a more RESTful way using the HTTP Accept header.
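The format selection described in the last bullet could be sketched roughly as below. This is not the code from the live app: the function name, the format names and the MIME type table are assumptions made for illustration, and q-values in the Accept header are deliberately ignored.

```python
from urllib.parse import parse_qs

# Hypothetical mapping of format names to MIME types (an assumption,
# not taken from the real service).
SUPPORTED_FORMATS = {
    "html": "text/html",
    "text": "text/plain",
    "csv": "text/csv",
    "xml": "application/xml",
    "json": "application/json",
}


def choose_format(query_string, accept_header, default="html"):
    """Pick an output format from a format= argument or the Accept header."""
    # 1. An explicit format=whatever argument wins if it is recognised.
    requested = parse_qs(query_string).get("format", [""])[0].lower()
    if requested in SUPPORTED_FORMATS:
        return requested
    # 2. Otherwise use the first media type in Accept that we know about.
    #    Parameters such as ";q=0.9" are stripped and otherwise ignored.
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip().lower()
        for name, mime in SUPPORTED_FORMATS.items():
            if media_type == mime:
                return name
    # 3. Nothing matched (e.g. "*/*"), so fall back to the default.
    return default
```

A fuller version would honour q-values and wildcards such as text/*, but the precedence shown here (explicit argument first, then the header) is the main design choice.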

Here are a couple of examples of usage:

    [john@hamburg ~]$ curl -H "Accept: text/plain" ""
    IP Address: x.x.x.x
    Country: GB
    Region: eng
    City: london
    Lat/Long: 51.513330,-0.088947
    Accept: text/plain
    Content-Type: ; charset="utf-8"
    Host:
    User-Agent: curl/7.21.3 (x86_64-redhat-linux-gnu) libcurl/7.21.3 NSS/ zlib/1.2.5 libidn/1.19 libssh2/1.2.7

    [john@hamburg ~]$ curl ""
    {
      "ipAddress": "x.x.x.x",
      "country": "GB",
      "region": "eng",
      "city": "london",
      "latLong": "51.513330,-0.088947",
      "headers": {
        "Accept": "*/*",
        "Content-Type": "; charset="utf-8"",
        "Host": "",
        "User-Agent": "curl/7.21.3 (x86_64-redhat-linux-gnu) libcurl/7.21.3 NSS/ zlib/1.2.5 libidn/1.19 libssh2/1.2.7"
      }
    }

I've created a project on GitHub, so you can see how minimal the underlying Python code is. The README has some notes about what extra stuff I might add in at some point, in the event I can be bothered.

As the live app is just running off an unbilled App Engine instance, it won't take much traffic before hitting the free quota limits. As such, in the unlikely event that someone out there wants to make use of this, you might be better off grabbing the code from the repo and deploying it to your own App Engine instance.

In praise of help() in Python's REPL

Posted by John Smith on

For various reasons, I'm doing a bit of JavaScript/CoffeeScript work at the moment, which involves use of some functions in the core libraries which I'd not really used in the past. A minor aspect of this involves logarithmic values, and I was a bit surprised and then disappointed that JavaScript's Math.log() isn't as flexible as its Python near-namesake math.log():

    # Python
    >>> math.log(256,2)  # I want the result for base 2
    8.0

versus

    # CoffeeScript
    coffee> Math.log(256,2)
    5.545177444479562

Now, it's probably unreasonable of me to expect the JavaScript version of this function to behave exactly the same as the Python version, especially as the (presumably) underlying C function only takes a single value argument. (Although it might have been nice to get a warning about the ignored second argument, rather than silence...)
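For reference, the value the CoffeeScript session printed is simply the natural logarithm of 256, and the base-2 result can be recovered with the change-of-base rule. A minimal sketch in Python (the helper name log_base is made up for illustration):

```python
import math


def log_base(value, base):
    """Logarithm of value in an arbitrary base, via the change-of-base rule."""
    return math.log(value) / math.log(base)


# Natural log: what a log function that ignores the second argument returns.
natural = math.log(256)

# Dividing by the log of the base gives the same answer as math.log(256, 2).
base_two = log_base(256, 2)
```

The same division works on the JavaScript side, e.g. Math.log(256) / Math.LN2.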

On the other hand though, it reminded me of how much more civilized Python is compared to JavaScript. When I'm hacking around, I almost always have a spare window open with a running REPL process, that allows me to quickly check and test stuff, and can very easily pull up the docs via the help() function if I need further info. In contrast, to do the same in JavaScript I have to move over to a browser window and search for info on sites like MDN, or resort to my trusty copies of The Definitive Guide, neither of which are anywhere near as convenient.

After a brief bit of Googling and a plea for help on Twitter, I was unable to find any equivalent to this functionality in the JavaScript world - and let's face it, help() is pretty basic stuff when compared to what the likes of IPython and bpython offer the fortunate Python developer.

I'd love to be corrected on this, and be told about some nice CLI-tool for JavaScript that can help me out. (But not some overblown IDE that would require me to radically change my established development environment, I hasten to add!) I'm not expecting this to happen though - Python's help() relies heavily on docstrings, and I'm not aware that anything such as JsDoc is in common usage in the JavaScript community?

Enhanced version of Python's SimpleHTTPServer that supports HTTP Range

Posted by John Smith on

I've just uploaded a small personal project to GitHub here. It's basically a very crude webserver that allows me to share audio files on my Linux boxes to my iOS devices, using Mobile Safari.

The main reason for noting this is that the code may be of more general interest: it includes an improved version of the Python stdlib's SimpleHTTPServer module that adds basic support for the Range header in HTTP requests, which Mobile Safari needs in order to play some MP3 files.

During early development, I found that some MP3 files would refuse to play in Mobile Safari when served by SimpleHTTPServer. The same file would play fine if served by Apache. Because debugging mobile web browsers is a PITA (caveat: I haven't kept up with the latest-and-greatest in this area), I ended up resorting to Wireshark to see what was going on.

Wireshark indicated that Mobile Safari would request chunks of the MP3 file (initially just the first couple of bytes), but SimpleHTTPServer would always serve the entire file, because it never checked for the existence of the Range header. On certain files, this wouldn't bother Mobile Safari, but on others it would cause the audio player widget to show an unhelpful generic error.

Once I understood what the problem was, I found that I'm not the first person to get caught out by this, and that Apple themselves state that servers need to support Range to keep Mobile Safari happy.

To solve the problem, I wrote a new class HTTPRangeRequestHandler that is a direct replacement for SimpleHTTPServer's SimpleHTTPRequestHandler. In my app code proper, I then (try to) pull in my enhanced handler as follows:

    try:
        import HTTPRangeServer
        inherited_server = HTTPRangeServer.HTTPRangeRequestHandler
    except ImportError:
        logging.warning("Unable to import HTTPRangeServer, using stdlib's " +
                        "SimpleHTTPServer")
        import SimpleHTTPServer
        inherited_server = SimpleHTTPServer.SimpleHTTPRequestHandler

    ...

    class MySpecificHandler(inherited_server):
        ...

    def main(port=12345):
        # Serve requests using the handler class defined above
        Handler = MySpecificHandler
        httpd = SocketServer.TCPServer(("", port), Handler)

Arguably it might be better for the code to die if HTTPRangeServer cannot be imported, but as the stdlib SimpleHTTPServer is good enough for many browser clients, it doesn't seem too unreasonable to use it as a fallback.

This code is OK for most uses, but it doesn't yet support all variations of the Range header described in the W3C spec. It does however support all the request variants I've seen in my - admittedly very cursory - browser testing, and any requests that it can't parse will instead get the full file served, which is the same behaviour as SimpleHTTPServer.
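To illustrate the kind of parsing involved, here is a minimal sketch of handling a single-range header. It is not the code from the GitHub project: the function name and the choice to return None for anything unparseable (so the caller serves the whole file) are assumptions that merely mirror the fallback behaviour described above.

```python
import re

# Matches a single byte range such as "bytes=0-1", "bytes=500-" or "bytes=-200".
_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_byte_range(header_value, file_size):
    """Return (first, last) inclusive byte offsets, or None to serve the full file."""
    if not header_value or file_size <= 0:
        return None
    match = _RANGE_RE.match(header_value.strip())
    if not match:
        # Multiple ranges, other units, or garbage: fall back to the full file.
        return None
    first_text, last_text = match.groups()
    if not first_text and not last_text:
        return None
    if not first_text:
        # Suffix form "bytes=-N": the last N bytes of the file.
        length = int(last_text)
        if length == 0:
            return None
        return max(file_size - length, 0), file_size - 1
    first = int(first_text)
    # Open-ended form "bytes=N-" runs to the end; clamp overlong ranges.
    last = min(int(last_text), file_size - 1) if last_text else file_size - 1
    if first > last:
        # Strictly this deserves a 416 response; this sketch just gives up.
        return None
    return first, last
```

A handler using this would reply with status 206 and a Content-Range: bytes first-last/size header when a tuple comes back, and with a normal 200 response otherwise.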

The musicsharer application that's built on this class is even rougher, but as it's really just intended for my own personal use, you shouldn't hold your breath waiting for me to tart it up...

Visualization of, and musings on, recent Hacker News threads about liked and disliked languages

Posted by John Smith on

For a while now, I've been itching to find an excuse to do something in SVG again, so when there were a couple of threads last week on Hacker News about people's most liked and most disliked languages, it felt like an ideal opportunity.

You can view a wider, more legible, version of the scatter plot via this link. I've used logarithmic scaling, as with a regular linear scale there was just a huge mess in the bottom left corner.

'Like' votes are measured horizontally, 'dislikes' vertically - so the ideal place to be is low down on the right, and the worst is high up on the left. The results are as captured at 2012/03/28 - I'd taken a copy a couple of days earlier, and there had been some changes in the interim, but only by single-digit percentages.
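As an illustration of the logarithmic scaling, a point's position could be computed along the following lines. This is a sketch rather than the code that generated the plot: the function names, the use of base 10 and the pixel dimensions are all assumptions.

```python
import math


def log_position(votes, max_votes, axis_length):
    """Map a vote count onto an axis of axis_length pixels, log-scaled.

    Assumes max_votes > 1. Counts below 1 are pinned to the origin,
    because log10(0) is undefined.
    """
    if votes < 1:
        return 0.0
    return axis_length * math.log10(votes) / math.log10(max_votes)


def plot_point(likes, dislikes, max_votes, width, height):
    """Return SVG (x, y) coordinates for one language."""
    x = log_position(likes, max_votes, width)
    # SVG's y axis grows downwards, so flip it to put more dislikes higher up.
    y = height - log_position(dislikes, max_votes, height)
    return x, y
```

With this mapping each factor of ten in votes takes up the same distance on the axis, which is what spreads out the cluster of low-vote languages.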

Some thoughts and observations:

  • The poll this data comes from is somewhat imperfect, as already mentioned in the comments in the thread itself. I should also point out that another poster on that thread also did a similar like vs dislike analysis, but I didn't see that post until I'd already started on this.
  • HN is a very pro-Python place - just compare all the threads related to PyCon 2012 versus the lack of noise after most other conferences - so it's hardly surprising who the "winner" is in such a voter base. I do find it odd though that Python doesn't seem to have such a good showing in other corners of the HN world. e.g. of the (relatively few) HN London events I've been to, I don't recall hearing many (any?) of the speakers using Python for their projects/companies - whereas "losers" such as Java and PHP do get namechecked fairly often.
  • I'm amused that CoffeeScript is liked at exactly the same ratio as JavaScript - 76%.
  • I was tempted to do some sort of colour-coding by language type (interpreted vs compiled), age etc - but at initial glance, I don't see any real trends that might indicate why a certain school/group of languages do well or badly.

Alphabetically sorted list of pure Python stdlib modules

Posted by John Smith on

(This is a bit of a lame post - 99% was generated by a script - but I wanted an online copy for my own future reference.)

I was reading the notes about the new stuff in Python 3.3, and it struck me that I didn't know anything about a couple of the modules mentioned. (For the record, they were abc and sched - hopefully my ignorance of them isn't too shameful ;-)

This has motivated me to go through the Python standard library and make sure I have at least a cursory knowledge of all the modules - I'm aiming to do one per day. There is a list on, but it is grouped by theme, and I'd rather have a bit of a change from one day to the next, which hopefully an alphabetically sorted list should have a fair chance of achieving.

To this end, I knocked up a basic script to churn through the stdlib directory, which I can then use as a tick list. Maybe it could be of use to someone else too? Important: the list omits libraries which are written in C - these have __doc__ properties formatted differently from the pure Python libraries, and I think the pure libraries are enough for me to be going on with for now :-)
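A script of this kind might look roughly like the sketch below. It is not the original script (which ran on Python 2.7); it is written for Python 3, and the function names and placeholder strings are illustrative. Note that importing every module has side effects for a few of them, so treat it as a throwaway tool.

```python
import importlib
import pkgutil
import sysconfig


def first_doc_line(module_name):
    """Return the first line of a module's docstring, or a placeholder."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:
        # Mirrors the "{Not importable - ...}" style placeholders in the list.
        return "{Not importable - %s}" % type(exc).__name__
    doc = (module.__doc__ or "").strip()
    return doc.splitlines()[0] if doc else "{Undocumented}"


def stdlib_module_names():
    """Names of top-level modules and packages found in the stdlib directory."""
    stdlib_dir = sysconfig.get_paths()["stdlib"]
    names = [info.name for info in pkgutil.iter_modules([stdlib_dir])]
    # Case-insensitive sort, so capitalised names are not grouped separately.
    return sorted(names, key=str.lower)


def print_tick_list():
    """Print one 'name: summary' line per module (imports every module)."""
    for name in stdlib_module_names():
        print("%s: %s" % (name, first_doc_line(name)))
```

Scanning only the stdlib directory itself is what leaves out most C extension modules, since those usually live in a separate directory.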

BTW, after I'd written the script to generate this list, I found that there's a similar (but more nicely formatted) list on Doug Hellmann's site, which annoyingly didn't show up in my Google search queries when I started out on this. It does have references for the C libraries, but I also notice a few libraries in the list below that aren't on that page e.g. ast, bdb, code. As I don't (currently!) know what those libraries are, I don't know if there's a particular reason for their omission.

Abstract Base Classes (ABCs) according to PEP 3119.
Abstract Base Classes (ABCs) for collections, according to PEP 3119.
Stuff to parse AIFF-C and AIFF files.
Generic interface to all dbm clones.
Command-line parsing library
A class supporting chat-style (command/response) protocols.
Basic infrastructure for asynchronous socket service clients and servers.
allow programmer to define multiple exit functions to be executed upon normal program termination.
Classes for manipulating audio devices (currently only for Sun and SGI)
RFC 3548: Base16, Base32, Base64 Data Encodings
HTTP server base class.
Bastionification utility.
Debugger basics
Macintosh binhex compression/decompression.
Bisection algorithms.
Support for Berkeley DB 4.1 through 4.8 with a simple interface.
Calendar printing functions
Support module for CGI (Common Gateway Interface) scripts.
CGI-savvy HTTP Server.
More comprehensive traceback formatting for Python scripts.
Simple class to read IFF chunks.
A generic class to build line-oriented command interpreters.
Utilities needed to emulate Python's interactive interpreter.
Python Codec Registry, API and helpers.
Utilities to compile possibly incomplete Python source code.
Conversion functions between RGB and other color systems.
Execute shell commands via os.popen() and return status, output.
Module/script to "compile" all .py files to .pyc (or .pyo) file.
Package for parsing and compiling Python source code
{Not importable - ImportError}
Configuration file parser.
Utilities for with-statement contexts. See PEP 343.
Here's a sample session to show how to use this module. At the moment, this is the only documentation.
HTTP cookie handling for web clients.
Generic (shallow and deep) copying operations.
Helper to provide extensibility for pickle/cPickle.
Python interface for the 'lsprof' profiler. Compatible with the 'profile' module.
CSV parsing and writing.
create and manipulate C data types in Python
Provide a (g)dbm-compatible interface to bsddb.hashopen.
This is a Py2.3 implementation of decimal floating point arithmetic based on the General Decimal Arithmetic Specification:
{Not importable - ImportError}
helpers for computing deltas between objects.
Read and cache directory listings.
Disassembler of Python byte code into mnemonics.
{Not importable - ImportError}
a framework for running examples in docstrings.
Self documenting XML-RPC Server.
A dumb and slow but simple dbm clone.
Drop-in replacement for the thread module.
Faux ``threading`` version using ``dummy_thread`` instead of ``thread``.
A package for parsing, handling, and generating email messages.
Standard "encodings" Package
Utilities for comparing files and directories.
Helper class to quickly write a loop over all standard input files.
Filename matching with shell patterns.
Generic output formatting.
General floating point formatting functions.
Rational, infinite-precision, real numbers.
An FTP client class and some helper functions.
Tools for working with functions and callable objects
Record of phased-in incompatible language changes.
Path operations common to more than one OS Do not use directly. The OS specific modules import the appropriate functions from this module themselves.
Parser for command line options.
Utilities to get a password and/or the current user name.
Internationalization and localization support.
Filename globbing utility.
Functions that read and write gzipped files.
hashlib module - A common interface to many hash functions.
Heap queue algorithm (a.k.a. priority queue).
HMAC (Keyed-Hashing for Message Authentication) Python module.
High-perfomance logging profiler, mostly written in C.
HTML character entity references.
HTML 2.0 parser.
A parser for HTML and XHTML.
HTTP/1.1 client library
Import hook support.
IMAP4 client.
Recognize image file formats based on their first few bytes.
Backport of importlib.import_module from 3.x.
Import utilities
Get useful information from live Python objects.
The io module provides the Python interfaces to stream handling. The builtin open function is defined in this module.
JSON (JavaScript Object Notation) is a subset of JavaScript syntax (ECMA-262 3rd edition) used as a lightweight data interchange format.
Keywords (from "graminit.c")
{Not importable - SyntaxError}
{Not importable - SyntaxError}
Cache lines from files.
Locale support.
Logging package for Python. Based on PEP 282 and comments thereto in comp.lang.python, and influenced by Apache's log4j system.
Load / save to libwww-perl (LWP) format files.
Pathname and path-related operations for the Macintosh.
Macintosh-specific module for conversion between pathnames and URLs.
Read/write support for Maildir, mbox, MH, Babyl, and MMDF mailboxes.
Mailcap file handling. See RFC 1524.
Shared support for scanning document type declarations in HTML and XHTML.
{Undocumented, with warnings - possibly deprecated?}
MH interface -- purely object-oriented (well, almost)
Various tools used by MIME-reading or MIME-writing programs.
Guess the MIME type of a file.
Generic MIME writer.
Mimification and unmimification of mail messages.
Find modules used by a script, using introspection.
Mozilla / Netscape cookie loading / saving.
A readline()-style interface to the parts of a multipart message.
Mutual exclusion -- for use with module sched
An object-oriented interface to .netrc files.
Create new objects of various types. Deprecated.
An NNTP client class based on RFC 977: Network News Transfer Protocol.
Common pathname manipulations, WindowsNT/95 version.
Convert a NT pathname to a file URL and vice versa.
Abstract Base Classes (ABCs) for numbers, according to PEP 3141.
opcode module - potentially shared between dis and other modules which operate on bytecodes (e.g. peephole optimizers).
A powerful, extensible, and easy-to-use option parser.
OS routines for Mac, NT, or Posix depending on what system we're on.
Common pathname manipulations, OS/2 EMX version.
A Python debugger.
Create portable serialized representations of Python objects.
"Executable documentation" for the pickle module.
Conversion pipeline templates.
Utilities to support packages.
This module tries to retrieve as much platform-identifying data as possible. It makes this information available via function APIs.
{Not importable - SyntaxError}
a tool to generate and parse MacOSX .plist files.
Spawn a command with pipes to its stdin, stdout, and optionally stderr.
A POP3 client class.
Extended file operations available in POSIX.
Common operations on Posix pathnames.
Support to pretty-print lists, tuples, & dictionaries recursively.
Class for profiling Python code.
Class for printing reports on profiled python code.
Pseudo terminal utilities.
Parse a Python module and describe its classes and methods.
Routine to "compile" a .py file to a .pyc (or .pyo) file.
Generate Python documentation in HTML or text for interactive use.
Python implementation of the io module.
A multi-producer, multi-consumer queue.
Conversions to/from quoted-printable transport encoding as per RFC 1521.
Random variable generators.
Support for regular expressions (RE).
Redo the builtin repr() (representation) but with limits on most sizes.
Restricted execution facilities.
RFC 2822 message manipulation.
Word completion for GNU readline 2.0.
locating and running Python code using the module namespace
A generally useful event scheduler class.
Classes to represent arbitrary sets (including sets of sets).
A parser for SGML, using the derived class as a static DTD.
{Undocumented, with warnings - possibly deprecated?}
Manage shelves of pickled objects.
A lexical analyzer class for simple shell-like syntaxes.
Utility functions for copying and archiving files and directory trees.
Simple HTTP Server.
Simple XML-RPC Server.
Append module search paths for third-party packages to sys.path.
{Not importable - SyntaxError}
An RFC 2821 smtp proxy.
SMTP/ESMTP client class.
Routines to help recognizing sound files.
This module provides socket operations and some related functions. On Unix, it supports IP (Internet Protocol) and Unix domain sockets. On other systems, it only supports IP. Functions specific for a socket are available as methods of the socket object.
Generic socket server classes.
This file is only retained for backwards compatibility. It will be removed in the future. sre was moved to re in version 2.5.
Internal support module for sre
Internal support module for sre
Internal support module for sre
This module provides some more Pythonic support for SSL.
Constants/functions for interpreting results of os.stat() and os.lstat().
Constants for interpreting the results of os.statvfs() and os.fstatvfs().
A collection of string operations (most are no longer used).
File-like objects that read from or write to a string buffer.
Common string manipulations.
Library that exposes various tables found in the StringPrep RFC 3454.
Strptime-related classes and functions.
Functions to convert between Python values and C structs represented as Python strings. It uses format strings (explained below) as compact descriptions of the lay-out of the C structs and the intended conversion to/from Python values.
Subprocesses with accessible I/O streams
Stuff to parse Sun and NeXT audio files.
Interpret sun audio headers.
Non-terminal symbols of Python grammar (from "graminit.h").
Interface to the compiler's internal symbol tables
Provide access to Python's configuration information.
The Tab Nanny despises ambiguous indentation. She knows no mercy.
Read from and write to tar format archives.
TELNET client class.
Temporary files.
Text wrapping and filling.
The Zen of Python, by Tim Peters
Thread module emulating a subset of Java's threading model.
Thread-local objects.
Tool for measuring execution time of small code snippets.
Convert "arbitrary" sound files to AIFF (Apple and SGI's audio format).
Token constants (from "token.h").
Tokenization help for Python programs.
{Not importable - ImportError}
program/module to trace Python program or function execution
Extract, format and print information about Python stack traces.
Terminal utilities.
Define names for all type symbols known in the standard interpreter.
Python unit testing framework, based on Erich Gamma's JUnit and Kent Beck's Smalltalk testing framework.
Open an arbitrary URL.
An extensible library for opening URLs using a variety of protocols
Parse (absolute and relative) URLs.
Hook to allow user-specified customization code to run.
A more or less complete user-defined wrapper around dictionary objects.
A more or less complete user-defined wrapper around list objects.
A user-defined wrapper around string objects
Implementation of the UUencode and UUdecode functions.
UUID objects (universally unique identifiers) according to RFC 4122.
Python part of the warnings subsystem.
Stuff to parse WAVE files.
Weak reference support for Python.
Interfaces for launching and remotely controlling Web browsers.
Guess which db package to use to open a db file.
a WSGI (PEP 333) Reference Library
Implements (a subset of) Sun XDR -- eXternal Data Representation.
Extended XML support for Python
A parser for XML, using the derived class as static DTD.
An XML-RPC client interface for Python.
Read and write ZIP files.

There are a few entries that are slightly odd, such as robotparser and ast, which are due to the __doc__ property of those modules being formatted differently from the rest, and me being too idle to fix them.

Caveat: the list was generated in Python 2.7 running on Fedora 15, so it's possible my stdlib isn't completely standard.

Converting old App Engine code to Python 2.7/Django 1.2/webapp2

Posted by John Smith on

I'm borrowing the code for this blog for another project I'm working on, and it seemed to make sense to take the opportunity to bring it up to speed with the latest-and-greatest in the world of App Engine, which is:

  • Python 2.7 (the main benefit for me; I don't like having to dick around with 2.6 or 2.5 installations)
  • multithreading (not really needed for the negligible traffic I get, but worth having, especially given that the new billing scheme seems to assume you'll have this enabled if you don't want to be ripped off)
  • webapp2 (which seems to be the recommended serving mechanism if you're not going to a "proper" Django infrastructure)
  • Django 1.2 templating (I'd used this on a work project a few months ago, but the blog was still using 0.96)

Of course, having so many changed elements in the mix in a single hit is a recipe for disaster; with things breaking left, right and centre, trying to work out what the cause was was a bit needle-in-a-haystackish. It didn't help that the Py2.7 docs on the official site are still very sketchy, so I ended up digging through the library code quite a bit to suss out what was happening.

As far as I can tell, I've now got everything fixed and working - although this site is still running the old code, as the Python 2.7 runtime has a dependency on the HR datastore, and this app is still using Master/Slave.

I ended up writing a mini-app, in order to develop and test the fixes without all the cruft from my blog code, which I'll see about uploading to my GitHub account at some point. In the mean-time, here are my notes about the stuff I changed. I'm sure there are things which are sub-optimal or incomplete, but hopefully they might save someone else time...


  • Change runtime from python to python27
  • Add threadsafe: true
  • Add a libraries section:

        libraries:
        - name: django
          version: "1.2"
  • Change handler script references from to
  • Only scripts in the top-level directory work as handlers, so if you have any in subdirectories, they'll need to be moved, and the script reference changed accordingly:

        - url: /whatever
          # This doesn't work ...
          # script: lib/some_library/
          # ... this does work
          script:


  • In Django 1.2 escaping is enabled by default. If you need HTML to be passed through unmolested, use something like:

        {% autoescape off %}
        {{ myHTMLString }}
        {% endautoescape %}
  • If you're using {% extends %}, paths are referenced relative to the template base directory, not to that file. Here's a table showing examples of the old and new values:
    File                  Old {% extends %} value  New {% extends %} value
    base.html             N/A                      N/A
    admin/adminbase.html  "../base.html"           "base.html"
    admin/index.html      "adminbase.html"         "admin/adminbase.html"
  • If you have custom tags or filters, you need to {% load %} them in the template, rather than using webapp.template.register_template_library() in your main Python code.
    Old code (in your Python file):

        webapp.template.register_template_library('django_custom_tags')

    New code (in your template):

        {% load django_custom_tags %}

    (There's more that has to be done in this area; see below.)

Custom tag/filter code

  • Previously you could just have these in a standalone .py file which would be pulled in via webapp.template.register_template_library(). Now you'll have to create a Django app to hold them instead:
    1. In a Django file, add the new app to INSTALLED_APPS (note the trailing comma, which is what makes this a tuple rather than a plain string) e.g.: INSTALLED_APPS = ('customtags',)
    2. Create an app directory structure along the following lines:

         customtags/
         customtags/
         customtags/templatetags/
         customtags/templatetags/
         customtags/templatetags/

       Both the files can be zero-length. Replace customtags and django_custom_tags with whatever you want - the former is what should be referenced in INSTALLED_APPS, the latter is what you {% load "whatever" %} in your templates.
    3. In your file(s) in the templatetags/ directory, you need to change the way the new tags/filters are registered at the top of the file.
      Old code:

          from google.appengine.ext.webapp import template
          register = template.create_template_register()

      New code:

          from django.template import Library
          register = Library()

      The register.tag() and register.filter() calls will then work the same as previously.


  • Change from google.appengine.ext import webapp to import webapp2 and change your RequestHandler classes and WSGIApplication accordingly
  • If your WSGIApplication ran from within a main() function, move it out.
    Old code:

        def main():
            application = webapp.WSGIApplication(...)
            wsgiref.handlers.CGIHandler().run(application)

        if __name__ == '__main__':
            main()

    New code:

        app = webapp2.WSGIApplication(...)

    Note in the new code:
    1. The lack of a run() call
    2. That the WSGIApplication must be called app - if it isn't, you'll get an error like:

        ERROR 2012-01-29 22:17:37,607]
        Traceback (most recent call last):
          File "/proj/3rdparty/appengine/google_appengine_161/google/appengine/runtime/", line 168, in Handle
            handler = _config_handle.add_wsgi_middleware(self._LoadHandler())
          File "/proj/3rdparty/appengine/google_appengine_161/google/appengine/runtime/", line 220, in _LoadHandler
            raise ImportError('%s has no attribute %s' % (handler, name))
        ImportError: has no attribute app
  • Any 'global' changes you might make at the main level won't be applied across every invocation of the RequestHandlers - I'm thinking of things like setting a different logging level, or setting the DJANGO_SETTINGS_MODULE. These have to be done within the methods of your handlers instead. As this is obviously painful to do for every handler, you might consider using custom handler classes to handle the burden - see below.

Rendering Django templates

The imports and calls to render a template from a file need changing.
Old code:

    from google.appengine.ext.webapp import template
    ...
    rendered_content = template.render(template_path, {...})

New code:

    from django.template.loaders.filesystem import Loader
    from django.template.loader import render_to_string
    ...
    rendered_content = render_to_string(template_file, {...})

As render_to_string() doesn't explicitly get told where your templates live, you need to do this in your Django settings module:

    import os
    PROJECT_ROOT = os.path.dirname(__file__)
    TEMPLATE_DIRS = (os.path.join(PROJECT_ROOT, "templates"),)

Custom request handlers

As previously mentioned, where previously you could easily set global environment stuff, these now have to be done in each handler. As this is painful, one nicer solution is to create a special class to set all that stuff up, and then have your handlers inherit from that rather than webapp2.RequestHandler.

Here's a handler to be more talkative in the logs, and which also sets up the DJANGO_SETTINGS_MODULE environment variable.

    class LoggingHandler(webapp2.RequestHandler):
        def __init__(self, request, response):
            self.initialize(request, response)
            logging.getLogger().setLevel(logging.DEBUG)
            self.init_time = time.time()
            os.environ["DJANGO_SETTINGS_MODULE"] = "settings"

        def __del__(self):
            logging.debug("Handler for %s took %.2f seconds" %
                          (self.request.url, time.time() - self.init_time))

A couple of things to note:

  1. the webapp2.RequestHandler constructor takes request and response parameters, whereas webapp.RequestHandler just took a single self parameter
  2. Use the .initialize() method to set up the object before doing your custom stuff, rather than __init__(self)

Tweet rendering code library put on GitHub

Posted by John Smith on

I've made public the code I use to render tweets to marked up HTML on the right-hand side of this blog. It's nothing special, either in terms of what it does or how it does it, but I've tried to be thorough at catching edge cases and doing sensible/useful things, so it might come in useful for someone? I was surprised that I couldn't see anything out there that already did this, but I didn't look especially hard, so maybe I have just reinvented the wheel.

The code is on GitHub. The licence is GPLv2.

App Engine: What the docs don't tell you about processing inbound mail

Posted by John Smith on

For a while, I've wanted to add functionality to this blog to allow me to submit content via email; initially just photos, but eventually actual posts as well. As seems par for the course for any code involving email, stuff you'd expect to be simple and straightforward turns out to be anything but :-(

It doesn't help that the App Engine docs on receiving email gloss over a lot of stuff, so this is my attempt to try to fill in the gaps and cover the gotchas, so that others don't have to go through as much hassle as I did.

dev_appserver doesn't simulate inbound attachments

First off, whilst the current dev_appserver (1.4.1) does allow you to simulate sending a mail in, it doesn't have any explicit functionality for email attachments. This means you have the joy of doing your testing on the real App Engine. Now, luckily for me, I was able to (a) test this code in isolation without affecting the public functionality, and (b) do my deployments without any of the hanging that App Engine has a habit of doing every now and again, but it's still a painful way of evolving and testing code.

(Theoretically I imagine it's possible to cut-and-paste in the "raw" email bodies with Content-Type, Content-Disposition, base64 encoded data etc, to test attachments in the dev_appserver but I haven't tried it personally.)

Sender addresses aren't (just) e-mail addresses

As part of the protection against spam (or worse), I have a whitelist of acceptable senders; mails from anyone else get ignored. My first attempt at code for this was along the lines of:

    if mail_msg.sender not in VALID_SENDERS:
        logging.error("...")
        return

However, the sender property contains the full value of the Sender: header, so it's likely to be set to something like Fred Bloggs <>. Whilst code to support this isn't exactly difficult, it's something that you wouldn't realize you needed to do when doing pseudo-mails on dev_appserver. Here's my code:

    is_valid = False
    for valid_sender in blog_settings.VALID_MAIL_SENDERS:
        if mail_msg.sender.find(valid_sender) >= 0:
            is_valid = True
            break
    if not is_valid:
        logging.error("Received mail from invalid sender '%s' - ignoring" % mail_msg.sender)
        return

Now, this isn't perfect by any means - it should probably look for an exact match within the angle brackets, so that it doesn't get fooled by an email address in the "real name part" - but given how easy it is to fake a sender, I'm not too concerned; I have other protections in place, this is just a basic filter.
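For what it's worth, the stricter check mentioned above could be sketched along these lines. This is not the code running on this blog: extract_address, is_valid_sender and the example.com addresses are all made up for illustration, and the regex only copes with the simple "Real Name <address>" shape rather than everything the mail RFCs allow.

```python
import re

# Hypothetical whitelist; the real one lives in blog_settings.VALID_MAIL_SENDERS
VALID_MAIL_SENDERS = frozenset(["fred@example.com"])

# Captures whatever sits inside a trailing <...> pair
_ANGLE_ADDR = re.compile(r"<([^<>]*)>\s*$")


def extract_address(sender):
    """Return the bare address from 'Real Name <addr>' or from a plain 'addr'."""
    match = _ANGLE_ADDR.search(sender)
    address = match.group(1) if match else sender
    return address.strip().lower()


def is_valid_sender(sender, valid_senders=VALID_MAIL_SENDERS):
    """Exact match on the address only, ignoring the real-name part."""
    return extract_address(sender) in valid_senders
```

With this, a sender of the form fred@example.com <mallory@example.net> is rejected, because only the part inside the angle brackets is compared against the whitelist.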

If there aren't any attachments, the attachments property doesn't exist, rather than being None

It's covered in this short thread but in summary: rather than having the attachments property be None or [] if an email lacks attachments, it doesn't actually exist, and so you have to use a try/except handler. Again, this is nothing difficult, but it is something you wouldn't necessarily realize until it bit you.

    try:
        logging.debug("Mail from %s has %d attachments" %
                      (mail_msg.sender, len(mail_msg.attachments)))
    except AttributeError:
        logging.warning("Mail from %s has zero attachments - ignoring" %
                        (mail_msg.sender))
        return
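If you'd rather not have the exception handler, getattr() with a default is an alternative. A minimal sketch, using a made-up FakeMail class as a stand-in for the real inbound message object (get_attachments is likewise just an illustrative name):

```python
class FakeMail(object):
    """Stand-in for an inbound mail object; only used to exercise the helper."""

    def __init__(self, sender, attachments=None):
        self.sender = sender
        if attachments is not None:
            # Mimic the behaviour described above: no attachments, no attribute
            self.attachments = attachments


def get_attachments(mail_msg):
    """Return the attachments, or an empty list if the property is absent."""
    return getattr(mail_msg, "attachments", None) or []
```

The calling code can then loop over get_attachments(mail_msg) without caring whether the property exists.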

You have to work out the attachment MIME type for yourself

The attachments property (if it exists) is a list of 2-member tuples. The first part of the tuple is the filename, the second the content. It would be nice if App Engine provided another member containing the MIME type that's defined in the Content-Type header where the filename is also specified, but unfortunately not :-( Instead you have to work it out for yourself, whether from the filename suffix, doing a magic number check on the file or using the original property to parse the message yourself.

Now, it's true that what a sender says the file type is shouldn't be blindly trusted to be legit or correct. However, it wouldn't hurt to have that information to use in an initial check for the >99% of cases that it is OK.

If you're going to trust the file extension (which is probably easier to fake than the MIME type...), you might want to look at google.appengine.api.mail, which has an EXTENSION_MIME_MAP dictionary. I've not used it personally - I'm currently only interested in a handful of common image formats - but it might be a reasonable base for working out the MIME type.
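As a rough illustration of combining the filename suffix with a magic number check, here's a sketch that only knows about three common image formats. The function names are made up, the signature table is deliberately tiny, and it uses the standard library's mimetypes module rather than App Engine's EXTENSION_MIME_MAP, so treat it as a starting point rather than anything definitive.

```python
import mimetypes

# Leading bytes ("magic numbers") for a handful of image formats
IMAGE_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)


def sniff_image_type(data):
    """Return a MIME type based on the first few bytes, or None if unknown."""
    for signature, mime_type in IMAGE_SIGNATURES:
        if data.startswith(signature):
            return mime_type
    return None


def checked_image_type(filename, data):
    """Return the MIME type only if the suffix and the content agree."""
    claimed, _encoding = mimetypes.guess_type(filename)
    sniffed = sniff_image_type(data)
    if sniffed is not None and sniffed == claimed:
        return sniffed
    return None
```

Here data is the decoded attachment content, so this check has to happen after the decoding step.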

Attachments need decoding

The second member of the tuple in the attachments list is a google.appengine.api.mail.EncodedPayload. This has to be decoded using something along the lines of:

    for att in mail_msg.attachments:
        filename, encoded_data = att
        data = encoded_data.payload
        if encoded_data.encoding:
            data = data.decode(encoded_data.encoding)
        ...

That class doesn't seem to support the len() function, so I'm not sure how you might protect yourself against a huge attachment that either can't be decoded before the timeout hits, or takes up more memory than App Engine is prepared to give you. I'm also assuming that the .decode method covers all the encodings that you might potentially receive. (Although I'm yet to see anything that isn't base64 in my own tests.)
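One possible guard, which I haven't tried on App Engine itself, is to check the length of the still-encoded payload attribute before decoding it. The sketch below assumes that attribute is an ordinary string, uses a made-up FakePayload class in place of EncodedPayload, picks an arbitrary size limit, and only handles base64 (via the standard base64 module); anything else is passed through undecoded.

```python
import base64

MAX_ENCODED_BYTES = 5 * 1024 * 1024  # hypothetical limit, tune to taste


class FakePayload(object):
    """Stand-in for EncodedPayload, with just the two attributes used above."""

    def __init__(self, payload, encoding=None):
        self.payload = payload
        self.encoding = encoding


def decode_if_small(encoded_data, max_encoded_bytes=MAX_ENCODED_BYTES):
    """Decode a base64 payload, or return None if it is over the size limit."""
    raw = encoded_data.payload
    if len(raw) > max_encoded_bytes:
        return None
    if encoded_data.encoding == "base64":
        return base64.b64decode(raw)
    # Other encodings are left as-is in this sketch
    return raw
```

Note that this only avoids the cost of decoding an oversized attachment; the encoded data has already been loaded into memory by the time the handler sees it.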

Plain text bodies need decoding as well

You can explicitly request the plain-text message bodies (as opposed to any HTML bodies), but somewhat surprisingly, these aren't actually plain text! Instead they are EncodedPayload objects, and need decoding in a similar manner to the attachments.

    for b in mail_msg.bodies("text/plain"):
        body_type, pl = b
        try:
            if pl.encoding:
                # The encoding lives on the payload object, not on its string
                logging.debug("Body: %s" % (pl.payload.decode(pl.encoding)))
            else:
                logging.debug("Body: %s" % (pl.payload))
        except Exception:
            logging.debug("Body: %s" % (pl))

(It wouldn't surprise me if the above code might have Unicode issues on certain content, but that's unlikely to be an issue in my own personal use.)

Email processing does retry if the code bombs (I think)

I'm not 100% sure on this one, and IMHO it's more of a positive feature than a gotcha, but it doesn't seem to be in the docs, so it's worth mentioning - the mail processing seems to work similarly to task queue jobs, in that if a failure occurs, there are retries at gradually increasing intervals.


I'm sure there are other nasties involved in processing incoming emails, but my code seems to work fine now, so hopefully the above lessons might be of use to anyone else about to venture into this area. (Doubtless about 5 minutes after posting this I'll find that either I've been doing this all wrong, or that all of the above is fully documented somewhere that I haven't seen...)

First release of my App Engine library for easier memcaching of pages

Posted by John Smith on

I've just pushed memcachablehandler to GitHub, which is a small Python App Engine library to make it easy to memcache pages - or images, or anything else you might serve up - and re-serve them without having to regenerate them from a Django template or suchlike. This should speed up response times ever so slightly, and also maybe make things more reliable as well (based on my personal experience with the memcache vs datastore availability).
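To give a flavour of the general idea, here's a generic sketch of the check-the-cache-first pattern. This is not the library's actual code or API: DictCache is a made-up in-memory stand-in for memcache, and serve_page and render are hypothetical names.

```python
class DictCache(object):
    """In-memory stand-in for a memcache client, supporting only get and set."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def serve_page(cache, url, render):
    """Return the cached page for url, rendering and caching it on a miss."""
    page = cache.get(url)
    if page is None:
        page = render(url)  # the expensive bit, e.g. a template render
        cache.set(url, page)
    return page
```

The second request for the same URL never reaches render(), which is where the time saving comes from.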

The library is a slightly-tweaked version of some of the code that I've had in this blog for the past few days, so hopefully it's not too buggy. I know I'm not the first to write something like this - see the README for a link to something similar - but maybe it could come in useful to someone else?

I don't currently have any plans to extend the functionality beyond what's already there, but anything that gets updated in this blog should get pushed into that repo in fairly short order. At some point I'll probably make this blog code public as well, but I want to get it in a much more polished state before daring to show it to the world :-)

Installing pywebsocket and samples on Fedora

Posted by John Smith on

I've been playing a bit with a Python implementation of WebSockets over the past couple of days, and whilst the documentation does cover everything, it's a bit uncentralized. Here are some of my notes about getting it up and running on Fedora 11 and 12...



Support for WebSockets across all browsers is still an issue, but I've tested this successfully with the following browsers as of :

  • Firefox 4.0 nightly build (Fedora, WinXP)
  • WebKit nightly (Fedora)
  • SRWare Iron 6.0.475 (WinXP)
  • Safari 5.0.2 (WinXP)
The following however didn't work:
  • SRWare Iron 5.0.382 (WinXP)
  • Firefox 3.6.10 (WinXP)
As far as I know, WebSockets functionality works equivalently across all operating systems, but I've indicated the specific configurations I've tried just in case.

The protocol has changed over time. I believe Opera and older versions of Chrome only support an older version of it, which according to the documentation can be made to work with this library, but I haven't tried this personally.

As I understand it, there are no plans for WebSockets to be supported in IE9 - surprise, surprise...

Server side packages

Although the download includes a standalone server, I've only used the module within Apache.

You need mod_python (Fedora package is similarly named) for mod_pywebsocket.

The Ruby and Wakachi sample applications use MeCab, a library for Japanese language handling. The Fedora packages needed for that are:

  • python-mecab
  • mecab
  • mecab-ipadic
  • mecab-ipadic-EUCJP


The project page is The source can be checked out from Subversion via: svn checkout pywebsocket-read-only

To build and install the library, do the following:

    cd pywebsocket-read-only/src
    python setup.py build
    su
    {enter root password}
    python setup.py install
    {exit root session}

Verify that the module has been installed within your PYTHONPATH by importing it from an interactive Python session without errors:

    python
    import mod_pywebsocket
    {exit python session}

For the purposes of these tests, I created a new Apache configuration file, serving on port 8003. Create a file /etc/httpd/conf.d/websocket_test.conf with the following content, modifying the directory paths as appropriate:

    # An Apache config for testing the pywebsocket samples
    #
    # Put this in /etc/httpd/conf.d and restart Apache

    # This line should be unnecessary
    # PythonPath "sys.path+['/websock_lib']"

    <IfModule python_module>
      PythonOption mod_pywebsocket.handler_root /proj/3rdparty/pywebsocket/websocket-sample-read-only/python
      PythonHeaderParserHandler mod_pywebsocket.headerparserhandler
    </IfModule>

    Listen 8003
    NameVirtualHost *:8003
    <VirtualHost *:8003>
      DocumentRoot /proj/3rdparty/pywebsocket/websocket-sample-read-only/html
      Options Indexes MultiViews FollowSymLinks
      ServerName
      ErrorLog logs/pywebsocket_error.log
      CustomLog logs/pywebsocket_access.log common
      ErrorDocument 404 /error.html
    </VirtualHost>

Restarting Apache will pick up this new configuration.


The sample applications are a separate project located at Checking out the code can be done via svn checkout websocket-sample-read-only

The count and litechat sample applications use local files to store state.

These files need to be made writable by the account that runs Apache:

  • python/pub/litechat/messages
  • python/pub/count/count
If you don't make this permission change, then you're liable to get the browser applications closing the connection prematurely, which initially made me think it was a problem on the client side rather than the server side.
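Given how misleading that failure mode was, a quick pre-flight check might save some head-scratching. A small sketch (unwritable_paths is just an illustrative name), which is only meaningful if run as the account that Apache runs under, from the top of the websocket-sample-read-only checkout:

```python
import os

# Paths relative to the sample checkout, as listed above
STATE_FILES = [
    "python/pub/litechat/messages",
    "python/pub/count/count",
]


def unwritable_paths(paths):
    """Return the paths the current user cannot write to (or that don't exist)."""
    return [path for path in paths if not os.access(path, os.W_OK)]
```

If unwritable_paths(STATE_FILES) returns an empty list, the permissions should be fine.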

These files need to be updated to reflect the location of the aforementioned files:

  • python/pub/litechat/
  • python/pub/count/
The path defined in the 'file' variable needs changing along these lines (diff output):

    86c86
    < file = '/proj/3rdparty/pywebsocket/websocket-sample-read-only/python/pub/count/count'
    ---
    > file = '/home/komasshu/websock_handler/pub/count/count'

At this point you should hopefully be good to try the sample applications...


count

Shows the number of connected clients


litechat

A simple chat application where you enter a message to broadcast to the other connected clients.

ruby and wakachi

These are a bit more complicated to understand, as they assume you know something about how the Japanese language works.

Japanese is normally written without any spaces between words. The wakachi demo processes some sample text, putting spaces between the words.

Japanese also has three writing systems. Two of them (hiragana and katakana, collectively known as kana) are syllabic in nature - each symbol directly indicates how it should be pronounced. Kanji, however, can have multiple ways of being read. Ruby characters are kana which can be printed alongside the kanji in a smaller font to indicate the correct pronunciation. The ruby demo takes the same regular Japanese text, processing it to add these ruby characters where needed.

I should point out that I wasn't able to get the XHR functionality of these two applications to work - the Apache logs showed 404 errors. As the point of these applications is to show WebSockets in action, I didn't bother investigating what the problem was with the XHR stuff.

About this blog

This blog (mostly) covers technology and software development.

Note: I've recently ported the content from my old blog hosted on Google App Engine using some custom code I wrote, to a static site built using Pelican. I've put in place various URL manipulation rules in the webserver config to try to support the old URLs, but it's likely that I've missed some (probably meta ones related to pagination or tagging), so apologies for any 404 errors that you get served.

About the author

I'm a software developer who's worked with a variety of platforms and technologies over the past couple of decades, but for the past 7 or so years I've focussed on web development. Whilst I've always nominally been a "full-stack" developer, I feel more attachment to the back-end side of things.

I'm a web developer for a London-based equities exchange. I've worked at organizations such as News Corporation, Google and BATS Global Markets. Projects I've been involved in have been covered in outlets such as The Guardian, The Telegraph, the Financial Times, The Register and TechCrunch.
