Posted by John Smith on Thu 13 January 2011

I've made public the code I use to render tweets to marked up HTML on the right-hand side of this blog. It's nothing special, either in terms of what it does or how it does it, but I've tried to be thorough at catching edge cases and doing sensible/useful things, so it might come in useful for someone? I was surprised that I couldn't see anything out there that already did this, but I didn't look especially hard, so maybe I have just reinvented the wheel.

The code is on GitHub at https://github.com/menboku/tweet2html. Licence is GPLv2.

App Engine: What the docs don't tell you about processing inbound mail

Posted by John Smith on Mon 10 January 2011

For a while, I've wanted to add functionality to this blog to allow me to submit content via email; initially just photos, but eventually actual posts as well. As seems par for the course for any code involving email, stuff you'd expect to be simple and straightforward turns out to be anything but :-(

It doesn't help that the App Engine docs on receiving email gloss over a lot of stuff, so this is my attempt to try to fill in the gaps and cover the gotchas, so that others don't have to go through as much hassle as I did.

dev_appserver doesn't simulate inbound attachments

First off, whilst the current dev_appserver (1.4.1) does allow you to simulate sending a mail in, it doesn't have any explicit functionality for email attachments. This means you have the joy of doing your testing on the real App Engine. Now, luckily for me, I was able to (a) test this code in isolation without affecting the public functionality, and (b) do my deployments without any of the hanging that App Engine has a habit of doing every now and again, but it's still a painful way of evolving and testing code.

(Theoretically I imagine it's possible to cut-and-paste in the "raw" email bodies with Content-Type, Content-Disposition, base64 encoded data etc, to test attachments in the dev_appserver but I haven't tried it personally.)
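For anyone wanting to try that, here is a minimal sketch of generating such a raw message with Python's standard email package rather than typing it by hand. The addresses, filename and fake image bytes are all made up for illustration, and whether dev_appserver actually accepts the result is untested here.

```python
import email
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_raw_mail(sender, to, subject, body, filename, image_bytes, subtype):
    """Return the raw text of a multipart mail carrying one image attachment."""
    msg = MIMEMultipart()
    msg["From"] = sender
    msg["To"] = to
    msg["Subject"] = subject
    msg.attach(MIMEText(body, "plain"))

    # MIMEImage base64-encodes the payload by default.
    part = MIMEImage(image_bytes, _subtype=subtype)
    part.add_header("Content-Disposition", "attachment", filename=filename)
    msg.attach(part)
    return msg.as_string()


# Hypothetical sample data: not a real image, just the PNG signature plus padding.
FAKE_PNG = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16

raw_mail = build_raw_mail(
    sender="Fred Bloggs <fred@bloggs.com>",
    to="photos@example.com",
    subject="Test photo",
    body="Photo attached.",
    filename="test.png",
    image_bytes=FAKE_PNG,
    subtype="png",
)
print(raw_mail)
```

The printed text contains the Content-Type, Content-Disposition and base64 data mentioned above, so it can at least be inspected or pasted wherever a raw message is wanted.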

Sender addresses aren't (just) e-mail addresses

As part of the protection against spam (or worse), I have a whitelist of acceptable senders; mails from anyone else get ignored. My first attempt at code for this was along the lines of:

    if mail_msg.sender not in VALID_SENDERS:
        logging.error("...")
        return

However, the sender property contains the full value of the Sender: header, so it's likely to be set to something like Fred Bloggs <fred@bloggs.com>. Whilst code to support this isn't exactly difficult, it's something that you wouldn't realize you needed to do when doing pseudo-mails on dev_appserver. Here's my code:

    is_valid = False
    for valid_sender in blog_settings.VALID_MAIL_SENDERS:
        if mail_msg.sender.find(valid_sender) >= 0:
            is_valid = True
            break
    if not is_valid:
        logging.error("Received mail from invalid sender '%s' - ignoring" % mail_msg.sender)
        return

Now, this isn't perfect by any means - it should probably look for an exact match within the angle brackets, so that it doesn't get fooled by an email address in the "real name part" - but given how easy it is to fake a sender, I'm not too concerned; I have other protections in place, this is just a basic filter.
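A stricter check that only looks at the address part could be sketched with the standard library's email.utils.parseaddr. This is not the code running on this blog; the whitelist contents and the function name are made up for illustration.

```python
from email.utils import parseaddr

# Hypothetical whitelist, stored in lower case.
VALID_MAIL_SENDERS = frozenset(["fred@bloggs.com"])


def is_valid_sender(sender_header, whitelist=VALID_MAIL_SENDERS):
    """Return True only if the address part of the header is whitelisted."""
    # parseaddr splits 'Fred Bloggs <fred@bloggs.com>' into
    # ('Fred Bloggs', 'fred@bloggs.com'); a bare address also works.
    _real_name, address = parseaddr(sender_header)
    return address.lower() in whitelist


print(is_valid_sender("Fred Bloggs <fred@bloggs.com>"))
print(is_valid_sender('"fred@bloggs.com" <evil@example.com>'))
```

Because only the parsed address is compared, a whitelisted address appearing in the real name part no longer counts as a match. It does nothing about forged senders, of course.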

If there aren't any attachments, the attachments property doesn't exist, rather than being None

It's covered in this short thread but in summary: rather than having the attachments property be None or [] if an email lacks attachments, it doesn't actually exist, and so you have to use a try/except handler. Again, this is nothing difficult, but it is something you wouldn't necessarily realize until it bit you.

    try:
        logging.debug("Mail from %s has %d attachments" %
                      (mail_msg.sender, len(mail_msg.attachments)))
    except AttributeError, e:
        logging.warning("Mail from %s has zero attachments - ignoring" %
                        (mail_msg.sender))
        return
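An alternative to the try/except is getattr() with a default, which is plain Python and needs nothing from App Engine. The FakeMail class below is a hypothetical stand-in for the real message object, there only so the sketch runs on its own.

```python
class FakeMail(object):
    """Hypothetical stand-in for an inbound mail object.

    Like the behaviour described above, the attachments attribute is
    only created when there is at least one attachment.
    """

    def __init__(self, sender, attachments=None):
        self.sender = sender
        if attachments:
            self.attachments = attachments


def get_attachments(mail_msg):
    """Return the attachments list, or an empty list if the attribute is missing."""
    return getattr(mail_msg, "attachments", [])


no_files = FakeMail("fred@bloggs.com")
one_file = FakeMail("fred@bloggs.com", [("photo.jpg", "...")])
print(len(get_attachments(no_files)), len(get_attachments(one_file)))
```

The calling code can then treat "no attachments" as an ordinary empty list rather than as an exception.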

You have to work out the attachment MIME type for yourself

The attachments property (if it exists) is a list of 2-member tuples. The first part of the tuple is the filename, the second the content. It would be nice if App Engine provided another member containing the MIME type that's defined in the Content-Type header where the filename is also specified, but unfortunately not :-( Instead you have to work it out for yourself, whether from the filename suffix, doing a magic number check on the file or using the original property to parse the message yourself.

Now, it's true that what a sender says the file type is shouldn't be blindly trusted to be legit or correct. However, it wouldn't hurt to have that information to use in an initial check for the >99% of cases that it is OK.

If you're going to trust the file extension (which is probably easier to fake than the MIME type...), you might want to look at google.appengine.api.mail, which has an EXTENSION_MIME_MAP dictionary. I've not used it personally - I'm currently only interested in a handful of common image formats - but it might be a reasonable base for working out the MIME type.
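As a rough sketch of the first two options, the function below checks the leading bytes for a few common image signatures and only falls back to the filename suffix if none match. It uses the standard library's mimetypes module instead of EXTENSION_MIME_MAP, and the function and table names are made up for this example.

```python
import mimetypes

# Leading "magic number" bytes for a handful of common image formats.
IMAGE_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)


def guess_image_type(filename, data):
    """Guess a MIME type from the decoded content, then from the filename."""
    for signature, mime_type in IMAGE_SIGNATURES:
        if data.startswith(signature):
            return mime_type
    # No known signature: fall back to the (easily faked) file extension.
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed


print(guess_image_type("holiday.dat", b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
print(guess_image_type("holiday.jpg", b"not really an image"))
```

Note that data here has to be the decoded attachment content, not the still-encoded payload.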

Attachments need decoding

The second member of the tuple in the attachments list is a google.appengine.api.mail.EncodedPayload. This has to be decoded using something along the lines of:

    for att in mail_msg.attachments:
        filename, encoded_data = att
        data = encoded_data.payload
        if encoded_data.encoding:
            data = data.decode(encoded_data.encoding)
        ...

That class doesn't seem to support the len() function, so I'm not sure how you might protect yourself against a huge attachment that either can't be decoded before the timeout hits, or takes up more memory than App Engine is prepared to give you. I'm also assuming that the .decode method covers all the encodings that you might potentially receive. (Although I'm yet to see anything that isn't base64 in my own tests.)

Plain text bodies need decoding as well

You can explicitly request the plain-text message bodies (as opposed to any HTML bodies), but somewhat surprisingly, these aren't actually plain text! Instead they are EncodedPayload objects, and need decoding in a similar manner to the attachments.

    for b in mail_msg.bodies("text/plain"):
        body_type, pl = b
        try:
            if pl.encoding:
                # The encoding lives on the EncodedPayload, not on its payload string
                logging.debug("Body: %s" % (pl.payload.decode(pl.encoding)))
            else:
                logging.debug("Body: %s" % (pl.payload))
        except Exception, e:
            logging.debug("Body: %s" % (pl))

(It wouldn't surprise me if the above code might have Unicode issues on certain content, but that's unlikely to be an issue in my own personal use.)

Email processing does retry if the code bombs (I think)

I'm not 100% sure on this one, and IMHO it's more of a positive feature than a gotcha, but it doesn't seem to be in the docs, so it's worth mentioning - the mail processing seems to work similarly to task queue jobs, in that if a failure occurs, there are retries at gradually increasing intervals.

I'm sure there are other nasties involved in processing incoming emails, but my code seems to work fine now, so hopefully the above lessons might be of use to anyone else about to venture into this area. (Doubtless about 5 minutes after posting this I'll find that either I've been doing this all wrong, or that all of the above is fully documented somewhere that I haven't seen...)

Iconography

Posted by John Smith on Sat 08 January 2011

Carphone Warehouse had a brochure in their stores just before Christmas extolling the virtues of Android and the handsets they were selling. Just flicking through it yesterday, I noticed something amiss on the third page...

Low-resolution scan of pages 2 and 3 of the Android promo brochure distributed in Carphone Warehouse branches in late 2010

Specifically the right hand side...

Close up of an icon shown on page 3 of the brochure

Thing is, neither of the Android devices I have uses that icon; instead they use this:

However, the icon in the brochure shows up twice in the top 5 Google image search results on 'google maps icon'. I wonder where it might come from; maybe the actual location depicted on the icon itself?

Grab from Google Maps browser application, showing Apple's Cupertino HQ, which is the location in the icon used in the CPW Android brochure

Mind you, the copy in the brochure isn't much better. Given the recent uproar about Xperia X10s not getting Android 2.2, I wonder what the likes of Trading Standards would make of this:

Clipping from page 2 that talks about free updates and constant improvements that are downloaded straight to your phone

Paste and don't go

Posted by John Smith on Tue 04 January 2011

One of the nicest minor tweaks in the latest versions of Chrome, Opera and Firefox is the "paste and go" option in the right-click context menu when you copy a URL into the address bar. Safari is on a much slower upgrade cycle though, so there you still have to follow the paste by hitting Enter to actually load the URL.

This wouldn't be too bad, except that the Windows version of Safari has a bug. Try the following:

1. Load some arbitrary page
2. Copy and paste a different URL into the address field using the right-click menu (not Ctrl-V!)
3. Hit Enter

Rather than load the new URL, Windows Safari goes back to the original URL - unlike its OS X version and every other Windows browser. (It does do the right thing if you paste via Ctrl-V though.)

Now, I guess the usage figures for Safari on Windows are pretty pathetic in the overall scheme of things, so not many people will care. I personally don't use it that much either, but the main use-case I have is when testing pages in multiple browsers (or if I want a "clean" environment without cookies from prior testing) - and C&Ping URLs into the address field is the main way of doing this...

I've submitted this as a bug via the Safari option, but as Apple don't seem to have a public issue tracker - unless this is part of the WebKit tracking? I haven't checked - I'm just blogging this now to have a record of the issue for posterity. Hopefully they'll follow the pack and add a "paste-and-go" option, making this bug moot.

Experimental stacked bar chart in SVG with JavaScript interactivity

Posted by John Smith on Sun 26 December 2010

Stacked bar charts are quite nice for presenting a relatively large amount of data in a compact space, but they have a major failing, in that it's difficult to compare similar values in different columns, unless the value is the bottom-most one in the bars.

As an experiment, I've played with adding some basic interactivity to try to address this. Below is an SVG chart that initially appears to be nothing out of the ordinary.

However, you can click on the individual parts of each bar to align the baselines of the corresponding parts in the other bars, which then makes comparisons much easier. Clicking on a part twice reverts the alignment to the default state. This is all done via standard JavaScript and DOM event handlers - the main pain was that no browser supports CSS3 transitions in SVG, so I had to knock up some simple animation code. (I'm sure jQuery and similar libraries also facilitate this, but I wanted to have a completely standalone file.) EDIT: On reflection, I think I'm talking rubbish - CSS3 transitions can't be used on shape positions/sizes because they are part of the element itself, not a separate styling that's applied to it.
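The alignment itself comes down to one vertical offset per bar: each bar is shifted by the combined height of the parts sitting underneath the selected one. Here is a small sketch of that arithmetic, written in Python rather than the chart's own JavaScript; the data layout and function name are assumptions for illustration, not the code in the SVG file.

```python
def alignment_offsets(bars, selected):
    """Return the vertical shift for each bar, in data units.

    bars is a list of bars, each a list of part values ordered from the
    bottom up; selected is the index of the clicked part. Shifting a bar
    by the returned (zero or negative) amount puts the bottom of its
    selected part on the common baseline.
    """
    return [-sum(bar[:selected]) for bar in bars]


# Hypothetical data: two bars with three parts each.
bars = [[3, 2, 5], [1, 4, 2]]
print(alignment_offsets(bars, 1))  # shifts that line up the middle parts
print(alignment_offsets(bars, 0))  # default state: no shift needed
```

Selecting the bottom-most part gives all-zero offsets, which is the same as the default state of the chart.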

This is obviously super-basic; it might be good to do things like adjusting the Y-axis label numbering so that 0 is aligned with the selected element's baseline. However, I'm bored of this now, so I'm posting this up before I forget about it completely :-) The SVG code is currently entirely handcrafted, but now that all the basic concepts are in place, creating a graph from some datasource would be a fairly mundane exercise.

EDIT: I've just realized the SVG file has some console.log()s still in, which caused the animation not to work on Firefox 3.x - should now be fixed.

Proper domain name, short(-ish) links

Posted by John Smith on Wed 22 December 2010

Couple of minor improvements to this blog...

I finally got round to registering a proper domain - I'd not bothered before now, as when you've got a name like mine pretty much all the good variants have gone. However john-smith.me was available at an acceptable price, so I got out the credit card and gave myself an early Christmas present. The appspot.com address should still work fine, if you're a fan of unnecessary typing.
Now that the site URL is a more reasonable length, it seemed worth doing a very crude link shortener as well. Any new or edited articles will get given an alphanumeric code of 1-3 characters, which will redirect to the relevant article's full URL. So the short URL for this post should be http://john-smith.me/1.
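For the curious, one common way of producing such codes is to write a numeric article ID in base 62. This is a minimal sketch of that technique under that assumption, with made-up function names; it is not necessarily how the codes on this blog are generated.

```python
import string

# 62 characters: 0-9, a-z, A-Z.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_id(number):
    """Convert a non-negative integer ID to a short alphanumeric code."""
    if number < 0:
        raise ValueError("ID must be non-negative")
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number:
        number, remainder = divmod(number, BASE)
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars))


def decode_code(code):
    """Convert a short code back to the integer ID it was made from."""
    number = 0
    for char in code:
        number = number * BASE + ALPHABET.index(char)
    return number


print(encode_id(1), encode_id(62), encode_id(62 ** 3 - 1))
```

With this scheme, three characters are enough for 62 cubed (238,328) different IDs, which ought to be plenty for a personal blog.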

Nothing terribly exciting in the overall scheme of things, but it all helps edge the blog code ever closer to something that's functionally comparable with more established offerings.

First release of my App Engine library for easier memcaching of pages

Posted by John Smith on Tue 21 December 2010

I've just pushed memcachablehandler to GitHub, which is a small Python App Engine library to make it easy to memcache pages - or images, or anything else you might serve up - and re-serve them without having to regenerate them from a Django template or suchlike. This should speed up response times ever so slightly, and also maybe make things more reliable as well (based on my personal experience with the memcache vs datastore availability).
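The general pattern is simple enough to sketch without App Engine: look the page up in the cache, and only render and store it on a miss. The DictCache class and cached_page function below are hypothetical stand-ins to make the example self-contained; they are not memcachablehandler's actual API.

```python
class DictCache(object):
    """Hypothetical in-memory stand-in for a memcache-style client."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def cached_page(cache, key, render):
    """Return the cached page for key, calling render() only on a miss."""
    page = cache.get(key)
    if page is None:
        page = render()
        cache.set(key, page)
    return page


render_calls = []


def render_home():
    # Stands in for an expensive template render.
    render_calls.append(1)
    return "<html>home</html>"


cache = DictCache()
first = cached_page(cache, "/", render_home)
second = cached_page(cache, "/", render_home)
print(first == second, len(render_calls))
```

A real cache can evict entries at any time, so the render path always has to stay available as the fallback.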

The library is a slightly-tweaked version of some of the code that I've had in this blog for the past few days, so hopefully it's not too buggy. I know I'm not the first to write something like this - see the README for a link to something similar - but maybe it could come in useful to someone else?

I don't currently have any plans to extend the functionality beyond what's already there, but anything that gets updated in this blog should get pushed into that repo in fairly short order. At some point I'll probably make this blog code public as well, but I want to get it in a much more polished state before daring to show it to the world :-)

NYT Chrome app is probably the buggiest thing I've ever seen

Posted by John Smith on Tue 14 December 2010

I should probably have read the Hacker News thread first, but Christ, what an atrocious piece of buggy shit this is. How on earth it managed to get positioned as one of the top launch apps on the Chrome App Store, one can only guess at.

Running in Chrome 8 initially, I noticed that the pagination algorithm seems completely borked. Very minor resizing of the browser window causes the indicated number of pages for the story to randomly fluctuate - I think I managed to get the same article to claim to be between 1 and 13 pages in length, at least if the footer on the right is to be believed.

Screengrab of NYT Chrome app, claiming to show page 1 of a 6 page story

Screengrab of NYT Chrome app, now saying the same story is 4 pages long

Screengrab of NYT Chrome app, this time saying story is 10 pages long

OK then, let's start advancing through this ten page story. The second page is fine - if rather text-heavy and image-light - but the third page is slightly empty...

Screengrab of NYT Chrome app, showing page 3 of 10, but the story seems to end

... I'm sure there must be more to come though ...

Screengrab of NYT Chrome app, showing a blank page 4 of 10

... oh.

Safari is unsurprisingly similar. What is a surprise though, is that this is as good as it gets.

Opera 10.63 just keeps kicking you back to the front page every time you click on a story link, whereas on Firefox you have a choice of illegibility or invisibility...

Screengrab of NYT Chrome app in Firefox 3.6.13 - text is illegible due to multiple paragraphs appearing on top of each other

Screengrab of NYT Chrome app in Firefox 4 beta - only the navigation sidebar appears, the rest of the page is blank

One can only imagine what further travesties might await were I to dare view it in IE...

Steve Jobs' grinning turd

Posted by John Smith on Sat 11 December 2010

Given the App Store's reputation for prurience when it comes to approving things, it's slightly eyebrow-raising to find this delightful fellow in the iOS Unicode character set:

Screengrab from an iPod Touch showing unicode character e05a in Mobile Safari

This came to my attention via this post on Asiajin, which is largely derived from this Japanese-language blog post.

I recalled seeing some weird colour bitmap icons hidden a long way down, when using UnicodeTable a while ago. Digging around again, it turns out that this is character 0xE05A in the private use area (PDF link), where vendors can - and do - put whatever they like.

On Windows 7, OS X Snow Leopard and Android 2.1, nothing seems to be defined for these characters in the default browser font. On Fedora, they are a mix of Asian (?) glyphs and dingbats; 0xE023 is a smiley with an eyepatch, for example. On iOS... well you can see for yourself in the above screengrab, or go to this link if you don't believe me. The character entity itself is &#xE05A;, which will render as a placeholder box or similar on machines without a glyph defined for it.

The de-facto standard icon for "microphone" is rubbish

Posted by John Smith on Thu 09 December 2010

I've been mucking around a bit with the speech recognition stuff that's been added to the <input> element in Chrome 8, and I have to wonder how discoverable this functionality is going to be for the average user.

If you add speech x-webkit-speech to your <input> tag, the resultant form looks something like this:

Clicking on the icon opens up a small popup prompting the user to say something; the popup also shows the same icon.

Now, when I see that icon, I don't personally immediately think, "Aha, a microphone". Assuming I'm not completely abnormal, how many users are going to click on an icon they don't recognize, without some sort of external prompting text telling them what it is? (NB: this isn't a design decision specific to Chrome; my Dell netbooks also have a similar icon on their microphone jack inputs.)

As a sanity check, I've done some brief checks on Google and Bing's image searches for "microphone", and fail to find anything that looks like this icon - the top results are a mix of "bisected globe" microphones (which is what I'd consider to be the most widely known variant), and "rounded cuboid" ones, which are closer to the icon Chrome uses, but not a close resemblance in my opinion.

Now, rather than merely whinge as usual, I've tried to be constructive for once, so here's my crude attempt at an icon that looks more like what I'd expect. As I'm not a graphic designer or UI expert, I'm sure something much better could be done - it doesn't scale down to 16x16 very well for starters - but hopefully it's better than nothing. The first file below is the original SVG, so anyone could tweak it in Inkscape or similar apps; licence is WTFPL.

UPDATE: I've just found that there's a Unicode microphone glyph, which looks to use the "bisected globe" type. However, I haven't found a machine/font yet which contains this glyph, and the sample image contains musical notes, which are a bit out-of-context for speech recognition text inputs.


John Smith's Blog

Ramblings (mostly) about technical stuff

Tweet rendering code library put on GitHub