Posted by John Smith on Thu 29 March 2012

For a while now, I've been itching to find an excuse to do something in SVG again, so when there were a couple of threads last week on Hacker News about people's most liked and most disliked languages, it felt like an ideal opportunity.

You can view a wider, more legible, version of the scatter plot via this link. I've used logarithmic scaling because, with a regular linear scale, there was just a huge mess in the bottom left corner.

'Like' votes are measured horizontally, 'dislikes' vertically - so the ideal place to be is low down on the right, and the worst is high up on the left. The results are as captured at 2012/03/28 - I'd taken a copy a couple of days earlier, and there had been some changes in the interim, but only by single-digit percentages.
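As a rough illustration of the log-scaled layout described above (this is not the code used for the actual plot, and the function names, axis maximum, canvas size and vote counts are all made up), the mapping from vote counts to SVG coordinates could be sketched like this:

```python
import math


def log_scale(value, max_value, length):
    """Map a count in [1, max_value] onto [0, length] along a log10 axis."""
    value = max(value, 1)  # log(0) is undefined, so clamp zero votes to 1
    return length * math.log10(value) / math.log10(max_value)


def svg_point(name, likes, dislikes, max_votes=10000, size=400):
    """Return an SVG <circle> for one language (all parameters are illustrative)."""
    x = log_scale(likes, max_votes, size)            # likes grow to the right
    y = size - log_scale(dislikes, max_votes, size)  # SVG y grows downwards, so flip it
    return '<circle cx="%.1f" cy="%.1f" r="3"><title>%s</title></circle>' % (x, y, name)


# Hypothetical vote counts, purely to show the mapping.
print(svg_point("ExampleLang", likes=1000, dislikes=10))
```

With a log axis, each factor of ten in votes takes up the same distance, which is what spreads out the cluster of low-vote languages that a linear axis squashes into one corner.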

Some thoughts and observations:

The poll this data comes from is somewhat imperfect, as already mentioned in the comments in the thread itself. I should also point out that another poster on that thread also did a similar like vs dislike analysis, but I didn't see that post until I'd already started on this.
HN is a very pro-Python place - just compare all the threads related to PyCon 2012 versus the lack of noise after most other conferences - so it's hardly surprising who the "winner" is in such a voter base. I do find it odd though that Python doesn't seem to have such a good showing in other corners of the HN world. For example, of the (relatively few) HN London events I've been to, I don't recall hearing many (any?) of the speakers using Python for their projects/companies - whereas "losers" such as Java and PHP do get namechecked fairly often.
I'm amused that CoffeeScript is liked at exactly the same ratio as JavaScript - 76%.
I was tempted to do some sort of colour-coding by language type (interpreted vs compiled), age etc - but at initial glance, I don't see any real trends that might indicate why a certain school/group of languages do well or badly.

Someone please stop this shortlink-to-shortlink-to-shortlink-to-shortlink insanity

Posted by John Smith on Sun 18 March 2012

On an idle Sunday afternoon, I was browsing my Twitter stream, and came across the following tweet which looked of interest:

[Screengrab of a tweet with a link apparently to stks.co/2v3c]

I was actually using the iPad Twitter client at the time, and when I clicked - or rather pressed - on the link, noticed a number of changes in the URL/title bar that appears at the bottom of the app window.

This piqued my interest, so I did a bit more investigation about what was happening on a platform more conducive to my nosiness. Below is a cut-n-paste from the curl session - don't worry about the details, I summarize it further down.

[john@hamburg tmp]$ curl -L -v -v "http://t.co/1mkYnU0v"
* About to connect() to t.co port 80 (#0)
*   Trying 199.59.148.12... connected
* Connected to t.co (199.59.148.12) port 80 (#0)
> GET /1mkYnU0v HTTP/1.1
> User-Agent: curl/7.21.3 (x86_64-redhat-linux-gnu) libcurl/7.21.3 NSS/3.13.1.0 zlib/1.2.5 libidn/1.19 libssh2/1.2.7
> Host: t.co
> Accept: */*
> 
< HTTP/1.1 301 Moved Permanently
< Date: Sun, 18 Mar 2012 16:41:20 GMT
< Server: hi
< Location: http://stks.co/2v3c
< Cache-Control: private,max-age=300
< Expires: Sun, 18 Mar 2012 16:46:20 GMT
< Content-Length: 0
< Connection: close
< Content-Type: text/html; charset=UTF-8
< 
* Closing connection #0
* Issue another request to this URL: 'http://stks.co/2v3c'
* About to connect() to stks.co port 80 (#0)
*   Trying 174.129.233.169... connected
* Connected to stks.co (174.129.233.169) port 80 (#0)
> GET /2v3c HTTP/1.1
> User-Agent: curl/7.21.3 (x86_64-redhat-linux-gnu) libcurl/7.21.3 NSS/3.13.1.0 zlib/1.2.5 libidn/1.19 libssh2/1.2.7
> Host: stks.co
> Accept: */*
> 
< HTTP/1.1 301 Moved Permanently
< Server: nginx/0.7.65
< Date: Sun, 18 Mar 2012 16:41:20 GMT
< Content-Type: text/html
< Connection: close
< X-Powered-By: PHP/5.3.2-1ubuntu4.9
< Set-Cookie: snowball=6f87bf80-df9c-4bc5-932e-12ff806b0374; expires=Mon, 18-Mar-2013 16:41:20 GMT; path=/; domain=stks.co
< Content-Encoding: none
< Location: http://t.co/tH0MUiYn
< Content-Length: 1
< 
* Closing connection #0
* Issue another request to this URL: 'http://t.co/tH0MUiYn'
* About to connect() to t.co port 80 (#0)
*   Trying 199.59.148.12... connected
* Connected to t.co (199.59.148.12) port 80 (#0)
> GET /tH0MUiYn HTTP/1.1
> User-Agent: curl/7.21.3 (x86_64-redhat-linux-gnu) libcurl/7.21.3 NSS/3.13.1.0 zlib/1.2.5 libidn/1.19 libssh2/1.2.7
> Host: t.co
> Accept: */*
> 
< HTTP/1.1 301 Moved Permanently
< Date: Sun, 18 Mar 2012 16:41:21 GMT
< Server: hi
< Location: http://buswk.co/yGATqD
< Cache-Control: private,max-age=300
< Expires: Sun, 18 Mar 2012 16:46:21 GMT
< Content-Length: 0
< Connection: close
< Content-Type: text/html; charset=UTF-8
< 
* Closing connection #0
* Issue another request to this URL: 'http://buswk.co/yGATqD'
* About to connect() to buswk.co port 80 (#0)
*   Trying 168.143.174.97... connected
* Connected to buswk.co (168.143.174.97) port 80 (#0)
> GET /yGATqD HTTP/1.1
> User-Agent: curl/7.21.3 (x86_64-redhat-linux-gnu) libcurl/7.21.3 NSS/3.13.1.0 zlib/1.2.5 libidn/1.19 libssh2/1.2.7
> Host: buswk.co
> Accept: */*
> 
< HTTP/1.1 301 Moved
< Server: nginx
< Date: Sun, 18 Mar 2012 16:41:21 GMT
< Content-Type: text/html; charset=utf-8
< Connection: keep-alive
< Set-Cookie: _bit=4f661031-00291-0797b-3d1cf10a;domain=.buswk.co;expires=Fri Sep 14 16:41:21 2012;path=/; HttpOnly
< Cache-control: private; max-age=90
< Location: http://www.businessweek.com/news/2012-03-16/chinese-companies-forced-to-falsify-economic-data-bureau-says
< MIME-Version: 1.0
< Content-Length: 197
< 
* Ignoring the response-body
* Connection #0 to host buswk.co left intact
* Issue another request to this URL: 'http://www.businessweek.com/news/2012-03-16/chinese-companies-forced-to-falsify-economic-data-bureau-says'
* About to connect() to www.businessweek.com port 80 (#1)
*   Trying 77.67.40.33... connected
* Connected to www.businessweek.com (77.67.40.33) port 80 (#1)
> GET /news/2012-03-16/chinese-companies-forced-to-falsify-economic-data-bureau-says HTTP/1.1
> User-Agent: curl/7.21.3 (x86_64-redhat-linux-gnu) libcurl/7.21.3 NSS/3.13.1.0 zlib/1.2.5 libidn/1.19 libssh2/1.2.7
> Host: www.businessweek.com
> Accept: */*
> 
< HTTP/1.1 200 OK
< Server: Apache/2.2.9 (Unix) mod_ssl/2.2.9 OpenSSL/0.9.8e-fips-rhel5 mod_jk/1.2.31
< X-Powered-By: Phusion Passenger (mod_rails/mod_rack) 3.0.1
< X-UA-Compatible: IE=Edge,chrome=1
< X-Runtime: 0.316604
< X-Rack-Cache: miss
< Status: 200
< benv: njbweb03
< Content-Type: text/html; charset=utf-8
< Cache-Control: must-revalidate, max-age=1640
< Date: Sun, 18 Mar 2012 16:41:22 GMT
< Transfer-Encoding:  chunked
< Connection: keep-alive
< Connection: Transfer-Encoding
< 
{ the HTML of the actual page finally gets served from this point }

First off, as I think most techie people know, the link addresses that the Twitter site and apps show aren't actually what they appear to be; they are in fact using Twitter's own t.co link shortening* service. This in turn normally redirects to the "real" URL, except that in a case such as this, the redirect is to StockTwits' own shortener.

Which in turn redirects back to a different URL on t.co.

Which then redirects to Business Week's link shortener.

Which only then redirects you to the real page.

i.e. there are four HTTP redirects before you actually get to what you want. When I did some tests with curl'ing the first t.co and then following the redirects, versus getting the page directly, I was finding that the redirects were adding between 1.5 and seconds on average to the overall time taken. (In a few cases it was closer to 6-7 seconds, although this could possibly be down to one of the servers in the chain throttling back multiple requests in a short time period.)

[john@hamburg tmp]$ for X in `seq 10`
do
  time curl -s -L --no-sessionid -o /dev/null "http://t.co/1mkYnU0v"
  sleep 1
  time curl -s -L --no-sessionid -o /dev/null "http://www.businessweek.com/news/2012-03-16/chinese-companies-forced-to-falsify-economic-data-bureau-says"
  sleep 1 
done
{ results snipped: t.co varied between 1.9s and 7.3s, direct link between 0.2s and 0.9s }

I got similar results from Pingdom - it loaded the page faster, but still had a non-trivial delay from the redirects:

[Screengrab from pingdom.com load time test on the t.co URL]

Obviously, my curl tests don't reflect the real browser experience, as they don't show the time taken to load images, CSS, JavaScript etc, but the Pingdom test showed that the redirects took up a second of an overall load time of 4.6 seconds - not a trivial delay in my opinion.

Given that there have been any number of articles posted and tools made available for analyzing and improving web page load times, it seems crazy to have external services slowing things down - or even breaking on occasions.

Given that all of the redirects in this example were permanent, maybe Twitter should get even more aggressive with t.co, by evaluating redirect chains like this, and just having the t.co link go straight to the final destination. Browsers seem to have been caching HTTP 301s for a few years now, so maybe web services should too?
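To make the "go straight to the final destination" idea a bit more concrete, here is a minimal sketch. It assumes a shortener already knows the permanent (301) redirects as a simple URL-to-URL mapping. The function name and the hop limit are my own inventions, not anything Twitter is known to do; the URLs are the ones from the curl session above.

```python
def collapse_permanent_redirects(url, redirects, max_hops=10):
    """Follow a chain of known 301s and return (final_url, number_of_hops)."""
    seen = {url}
    hops = 0
    while url in redirects:
        if hops >= max_hops:
            raise ValueError("too many redirects")
        url = redirects[url]
        if url in seen:
            # Guard against shorteners that point back at each other.
            raise ValueError("redirect loop at %s" % url)
        seen.add(url)
        hops += 1
    return url, hops


# The chain observed in the curl session, expressed as a lookup table.
observed = {
    "http://t.co/1mkYnU0v": "http://stks.co/2v3c",
    "http://stks.co/2v3c": "http://t.co/tH0MUiYn",
    "http://t.co/tH0MUiYn": "http://buswk.co/yGATqD",
    "http://buswk.co/yGATqD": "http://www.businessweek.com/news/2012-03-16/"
                              "chinese-companies-forced-to-falsify-economic-data-bureau-says",
}

final_url, hops = collapse_permanent_redirects("http://t.co/1mkYnU0v", observed)
print(hops, final_url)
```

A real service would also have to decide how long to trust a cached 301, and it would lose the click tracking that each intermediate shortener presumably wants, which may be exactly why nobody does this.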

Footnote: t.co isn't actually shortening the StockTwits link, as the latter is one character shorter:

[john@hamburg tmp]$ echo "http://t.co/1mkYnU0v" | wc -c
21
[john@hamburg tmp]$ echo "http://stks.co/2v3c" | wc -c
20

I guess they're opting to build up a big analytics database of popularly-clicked-links that they'll hope to monetize, over making things better (albeit in a tiny way) for their end users.
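One small note on the numbers in the footnote: wc -c counts the newline that echo appends, so each URL is really one character shorter than the figure shown, though the difference between the two is still one. A quick sketch to check this (the helper name is just for illustration):

```python
tco = "http://t.co/1mkYnU0v"
stocktwits = "http://stks.co/2v3c"


def wc_c_of_echo(text):
    """Byte count that `echo text | wc -c` reports: the text plus echo's newline."""
    return len((text + "\n").encode("utf-8"))


# Print the wc -c style count next to the plain string length for each URL.
for url in (tco, stocktwits):
    print(url, wc_c_of_echo(url), len(url))
```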

Alphabetically sorted list of pure Python stdlib modules

Posted by John Smith on Tue 06 March 2012

(This is a bit of a lame post - 99% was generated by a script - but I wanted an online copy for my own future reference.)

I was reading the notes about the new stuff in Python 3.3, and it struck me that I didn't know anything about a couple of the modules mentioned. (For the record, they were abc and sched - hopefully my ignorance of them isn't too shameful ;-)

This has motivated me to go through the Python standard library and make sure I have at least a cursory knowledge of all the modules - I'm aiming to do one per day. There is a list on python.org, but it is grouped by theme, and I'd rather have a bit of a change from one day to the next, which hopefully an alphabetically sorted list should have a fair chance of achieving.

To this end, I knocked up a basic script to churn through the stdlib directory, which I can then use as a tick list. Maybe it could be of use to someone else too? Important: the list omits libraries which are written in C - these have __doc__ properties formatted differently from the pure Python libraries, and I think the pure libraries are enough for me to be going on with for now :-)
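For anyone wanting to try something similar, the following is a minimal sketch of the general approach and not the script that produced the list below. It is written for Python 3 rather than 2.7, the function names are made up, and it assumes a typical layout where the stdlib directory reported by sysconfig holds the pure Python modules.

```python
import importlib
import pkgutil
import sysconfig


def first_doc_line(module_name):
    """Return the first line of a module's docstring, or a placeholder."""
    try:
        module = importlib.import_module(module_name)
    except Exception as exc:  # ImportError, SyntaxError, ...
        return "{Not importable - %s}" % type(exc).__name__
    doc = (module.__doc__ or "").strip()
    if not doc:
        return "{Undocumented}"
    return doc.splitlines()[0].strip()


def stdlib_module_names():
    """Top-level module and package names found in the stdlib directory."""
    stdlib_dir = sysconfig.get_paths()["stdlib"]
    names = [info.name for info in pkgutil.iter_modules([stdlib_dir])]
    return sorted(names, key=str.lower)


# Only describe a couple of modules here: importing *everything* has side
# effects (for example, antigravity opens a web browser when imported).
for name in ("abc", "bisect"):
    print("%s: %s" % (name, first_doc_line(name)))
```

Looping over stdlib_module_names() and calling first_doc_line() on each name would give a full list, side effects and all.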

BTW, after I'd written the script to generate this list, I found that there's a similar (but more nicely formatted) list on Doug Hellmann's site, which annoyingly didn't show up in my Google search queries when I started out on this. It does have references for the C libraries, but I also notice a few libraries in the list below that aren't on that page e.g. ast, bdb, code. As I don't (currently!) know what those libraries are, I don't know if there's a particular reason for their omission.

abc: Abstract Base Classes (ABCs) according to PEP 3119.
_abcoll: Abstract Base Classes (ABCs) for collections, according to PEP 3119.
aifc: Stuff to parse AIFF-C and AIFF files.
antigravity: {Undocumented}
anydbm: Generic interface to all dbm clones.
argparse: Command-line parsing library
ast: ast
asynchat: A class supporting chat-style (command/response) protocols.
asyncore: Basic infrastructure for asynchronous socket service clients and servers.
atexit: allow programmer to define multiple exit functions to be executed upon normal program termination.
audiodev: Classes for manipulating audio devices (currently only for Sun and SGI)
base64: RFC 3548: Base16, Base32, Base64 Data Encodings
BaseHTTPServer: HTTP server base class.
Bastion: Bastionification utility.
bdb: Debugger basics
binhex: Macintosh binhex compression/decompression.
bisect: Bisection algorithms.
bsddb: Support for Berkeley DB 4.1 through 4.8 with a simple interface.
calendar: Calendar printing functions
cgi: Support module for CGI (Common Gateway Interface) scripts.
CGIHTTPServer: CGI-savvy HTTP Server.
cgitb: More comprehensive traceback formatting for Python scripts.
chunk: Simple class to read IFF chunks.
cmd: A generic class to build line-oriented command interpreters.
code: Utilities needed to emulate Python's interactive interpreter.
codecs: Python Codec Registry, API and helpers.
codeop: Utilities to compile possibly incomplete Python source code.
collections: {Undocumented}
colorsys: Conversion functions between RGB and other color systems.
commands: Execute shell commands via os.popen() and return status, output.
compileall: Module/script to "compile" all .py files to .pyc (or .pyo) file.
compiler: Package for parsing and compiling Python source code
config: {Not importable - ImportError}
ConfigParser: Configuration file parser.
contextlib: Utilities for with-statement contexts. See PEP 343.
Cookie: Here's a sample session to show how to use this module. At the moment, this is the only documentation.
cookielib: HTTP cookie handling for web clients.
copy: Generic (shallow and deep) copying operations.
copy_reg: Helper to provide extensibility for pickle/cPickle.
cProfile: Python interface for the 'lsprof' profiler. Compatible with the 'profile' module.
csv: CSV parsing and writing.
ctypes: create and manipulate C data types in Python
curses: curses
dbhash: Provide a (g)dbm-compatible interface to bsddb.hashopen.
decimal: This is a Py2.3 implementation of decimal floating point arithmetic based on the General Decimal Arithmetic Specification:
Demo: {Not importable - ImportError}
difflib: helpers for computing deltas between objects.
dircache: Read and cache directory listings.
dis: Disassembler of Python byte code into mnemonics.
distutils: distutils
Doc: {Not importable - ImportError}
doctest: a framework for running examples in docstrings.
DocXMLRPCServer: Self documenting XML-RPC Server.
dumbdbm: A dumb and slow but simple dbm clone.
dummy_thread: Drop-in replacement for the thread module.
dummy_threading: Faux ``threading`` version using ``dummy_thread`` instead of ``thread``.
email: A package for parsing, handling, and generating email messages.
encodings: Standard "encodings" Package
filecmp: Utilities for comparing files and directories.
fileinput: Helper class to quickly write a loop over all standard input files.
fnmatch: Filename matching with shell patterns.
formatter: Generic output formatting.
fpformat: General floating point formatting functions.
fractions: Rational, infinite-precision, real numbers.
ftplib: An FTP client class and some helper functions.
functools: Tools for working with functions and callable objects
__future__: Record of phased-in incompatible language changes.
genericpath: Path operations common to more than one OS Do not use directly. The OS specific modules import the appropriate functions from this module themselves.
getopt: Parser for command line options.
getpass: Utilities to get a password and/or the current user name.
gettext: Internationalization and localization support.
glob: Filename globbing utility.
gzip: Functions that read and write gzipped files.
hashlib: module - A common interface to many hash functions.
heapq: Heap queue algorithm (a.k.a. priority queue).
hmac: HMAC (Keyed-Hashing for Message Authentication) Python module.
hotshot: High-perfomance logging profiler, mostly written in C.
htmlentitydefs: HTML character entity references.
htmllib: HTML 2.0 parser.
HTMLParser: A parser for HTML and XHTML.
httplib: HTTP/1.1 client library
idlelib: {Undocumented}
ihooks: Import hook support.
imaplib: IMAP4 client.
imghdr: Recognize image file formats based on their first few bytes.
importlib: Backport of importlib.import_module from 3.x.
imputil: Import utilities
inspect: Get useful information from live Python objects.
io: The io module provides the Python interfaces to stream handling. The builtin open function is defined in this module.
json: JSON (JavaScript Object Notation) is a subset of JavaScript syntax (ECMA-262 3rd edition) used as a lightweight data interchange format.
keyword: Keywords (from "graminit.c")
lib-dynload: {Not importable - SyntaxError}
lib-tk: {Not importable - SyntaxError}
lib2to3: {Undocumented}
linecache: Cache lines from files.
locale: Locale support.
logging: Logging package for Python. Based on PEP 282 and comments thereto in comp.lang.python, and influenced by Apache's log4j system.
_LWPCookieJar: Load / save to libwww-perl (LWP) format files.
macpath: Pathname and path-related operations for the Macintosh.
macurl2path: Macintosh-specific module for conversion between pathnames and URLs.
mailbox: Read/write support for Maildir, mbox, MH, Babyl, and MMDF mailboxes.
mailcap: Mailcap file handling. See RFC 1524.
markupbase: Shared support for scanning document type declarations in HTML and XHTML.
md5: {Undocumented, with warnings - possibly deprecated?}
mhlib: MH interface -- purely object-oriented (well, almost)
mimetools: Various tools used by MIME-reading or MIME-writing programs.
mimetypes: Guess the MIME type of a file.
MimeWriter: Generic MIME writer.
mimify: Mimification and unmimification of mail messages.
modulefinder: Find modules used by a script, using introspection.
_MozillaCookieJar: Mozilla / Netscape cookie loading / saving.
multifile: A readline()-style interface to the parts of a multipart message.
multiprocessing: {Undocumented}
mutex: Mutual exclusion -- for use with module sched
netrc: An object-oriented interface to .netrc files.
new: Create new objects of various types. Deprecated.
nntplib: An NNTP client class based on RFC 977: Network News Transfer Protocol.
ntpath: Common pathname manipulations, WindowsNT/95 version.
nturl2path: Convert a NT pathname to a file URL and vice versa.
numbers: Abstract Base Classes (ABCs) for numbers, according to PEP 3141.
opcode: module - potentially shared between dis and other modules which operate on bytecodes (e.g. peephole optimizers).
optparse: A powerful, extensible, and easy-to-use option parser.
os: OS routines for Mac, NT, or Posix depending on what system we're on.
os2emxpath: Common pathname manipulations, OS/2 EMX version.
pdb: A Python debugger.
__phello__.foo: {Undocumented}
pickle: Create portable serialized representations of Python objects.
pickletools: "Executable documentation" for the pickle module.
pipes: Conversion pipeline templates.
pkgutil: Utilities to support packages.
platform: This module tries to retrieve as much platform-identifying data as possible. It makes this information available via function APIs.
plat-linux2: {Not importable - SyntaxError}
plistlib: a tool to generate and parse MacOSX .plist files.
popen2: Spawn a command with pipes to its stdin, stdout, and optionally stderr.
poplib: A POP3 client class.
posixfile: Extended file operations available in POSIX.
posixpath: Common operations on Posix pathnames.
pprint: Support to pretty-print lists, tuples, & dictionaries recursively.
profile: Class for profiling Python code.
pstats: Class for printing reports on profiled python code.
pty: Pseudo terminal utilities.
pyclbr: Parse a Python module and describe its classes and methods.
py_compile: Routine to "compile" a .py file to a .pyc (or .pyo) file.
pydoc: Generate Python documentation in HTML or text for interactive use.
pydoc_data: {Undocumented}
_pyio: Python implementation of the io module.
Queue: A multi-producer, multi-consumer queue.
quopri: Conversions to/from quoted-printable transport encoding as per RFC 1521.
random: Random variable generators.
re: Support for regular expressions (RE).
repr: Redo the builtin repr() (representation) but with limits on most sizes.
rexec: Restricted execution facilities.
rfc822: RFC 2822 message manipulation.
rlcompleter: Word completion for GNU readline 2.0.
robotparser
runpy: locating and running Python code using the module namespace
sched: A generally useful event scheduler class.
sets: Classes to represent arbitrary sets (including sets of sets).
sgmllib: A parser for SGML, using the derived class as a static DTD.
sha: {Undocumented, with warnings - possibly deprecated?}
shelve: Manage shelves of pickled objects.
shlex: A lexical analyzer class for simple shell-like syntaxes.
shutil: Utility functions for copying and archiving files and directory trees.
SimpleHTTPServer: Simple HTTP Server.
SimpleXMLRPCServer: Simple XML-RPC Server.
site: Append module search paths for third-party packages to sys.path.
site-packages: {Not importable - SyntaxError}
smtpd: An RFC 2821 smtp proxy.
smtplib: SMTP/ESMTP client class.
sndhdr: Routines to help recognizing sound files.
socket: This module provides socket operations and some related functions. On Unix, it supports IP (Internet Protocol) and Unix domain sockets. On other systems, it only supports IP. Functions specific for a socket are available as methods of the socket object.
SocketServer: Generic socket server classes.
sqlite3: {Undocumented}
sre: This file is only retained for backwards compatibility. It will be removed in the future. sre was moved to re in version 2.5.
sre_compile: Internal support module for sre
sre_constants: Internal support module for sre
sre_parse: Internal support module for sre
ssl: This module provides some more Pythonic support for SSL.
stat: Constants/functions for interpreting results of os.stat() and os.lstat().
statvfs: Constants for interpreting the results of os.statvfs() and os.fstatvfs().
string: A collection of string operations (most are no longer used).
StringIO: File-like objects that read from or write to a string buffer.
stringold: Common string manipulations.
stringprep: Library that exposes various tables found in the StringPrep RFC 3454.
_strptime: Strptime-related classes and functions.
struct: Functions to convert between Python values and C structs represented as Python strings. It uses format strings (explained below) as compact descriptions of the lay-out of the C structs and the intended conversion to/from Python values.
subprocess: Subprocesses with accessible I/O streams
sunau: Stuff to parse Sun and NeXT audio files.
sunaudio: Interpret sun audio headers.
symbol: Non-terminal symbols of Python grammar (from "graminit.h").
symtable: Interface to the compiler's internal symbol tables
sysconfig: Provide access to Python's configuration information.
tabnanny: The Tab Nanny despises ambiguous indentation. She knows no mercy.
tarfile: Read from and write to tar format archives.
telnetlib: TELNET client class.
tempfile: Temporary files.
test: {Undocumented}
textwrap: Text wrapping and filling.
this: The Zen of Python, by Tim Peters
threading: Thread module emulating a subset of Java's threading model.
_threading_local: Thread-local objects.
timeit: Tool for measuring execution time of small code snippets.
toaiff: Convert "arbitrary" sound files to AIFF (Apple and SGI's audio format).
token: Token constants (from "token.h").
tokenize: Tokenization help for Python programs.
Tools: {Not importable - ImportError}
trace: program/module to trace Python program or function execution
traceback: Extract, format and print information about Python stack traces.
tty: Terminal utilities.
types: Define names for all type symbols known in the standard interpreter.
unittest: Python unit testing framework, based on Erich Gamma's JUnit and Kent Beck's Smalltalk testing framework.
urllib: Open an arbitrary URL.
urllib2: An extensible library for opening URLs using a variety of protocols
urlparse: Parse (absolute and relative) URLs.
user: Hook to allow user-specified customization code to run.
UserDict: A more or less complete user-defined wrapper around dictionary objects.
UserList: A more or less complete user-defined wrapper around list objects.
UserString: A user-defined wrapper around string objects
uu: Implementation of the UUencode and UUdecode functions.
uuid: UUID objects (universally unique identifiers) according to RFC 4122.
warnings: Python part of the warnings subsystem.
wave: Stuff to parse WAVE files.
weakref: Weak reference support for Python.
_weakrefset: {Undocumented}
webbrowser: Interfaces for launching and remotely controlling Web browsers.
whichdb: Guess which db package to use to open a db file.
wsgiref: a WSGI (PEP 333) Reference Library
xdrlib: Implements (a subset of) Sun XDR -- eXternal Data Representation.
xml: Extended XML support for Python
xmllib: A parser for XML, using the derived class as static DTD.
xmlrpclib: An XML-RPC client interface for Python.
zipfile: Read and write ZIP files.

There are a few entries that are slightly odd, such as robotparser and ast, which are due to the __doc__ property of those modules being formatted differently from the rest, and me being too idle to fix them.

Caveat: the list was generated in Python 2.7 running on Fedora 15, so it's possible my stdlib isn't completely standard.

Thoughts on Windows 8 Consumer Preview

Posted by John Smith on Mon 05 March 2012

Microsoft released a "Consumer Preview" of Windows 8 last week, and I thought I'd download it and take a look, as it's the first version of Windows that I've ever had any curiosity about, mainly due to the new Metro UI.

I've only spent a few hours playing around in a fairly aimless manner, so this is by no means a thorough review. In general, I agree with most of the points made in this Orlowski piece at The Register, but this post will cover a few things I found of note.

(Just for background, I'm far from being a Windows aficionado or regular user - whilst I have a desktop, laptop and netbook with Windows 7, those machines spend most of their lives running some form of Linux, whether natively via dual-boot, or in a virtual machine. With regard to the Metro UI, I've never used Windows Phone 7 - in fact, I've only ever seen it being used once in the wild - and I really don't like how it has been implemented in the latest Xbox 360 dashboard update. In fairness, most of the problems I have with the Xbox 360 implementation are far more to do with how MS have prioritized ads and general media over games, which doesn't have anything to do with Metro per se, and would easily be resolved if the dashboard was configurable.)

I've only tried Win8 in a VirtualBox VM running atop Windows 7. For some reason I'm only able to run it in a limited number of resolutions, none of which are the native resolution of my monitor. Not quite sure whether this is the fault of Win8 or VirtualBox - it's the first time I've used the latter, normally I use VMWare for all my virtualized environments. (In a similar vein, I was unable to get USB memory sticks or external hard drives to be recognized, and I don't know where the fault lies.)
The login page confuses the hell out of me. It's super-minimal, which isn't a problem, but most of the time when I click the mouse on the login screen, all that happens is that the screen scrolls up and then back down by about half-an-inch. The same happens if I double-click, long-click, middle-click or right-click. Nothing happens if I hit the Windows key (which is used heavily in Win8, see later point). I've just discovered that pressing the Ctrl key, or rotating the mouse wheel, brings up the password prompt - prior to that point I'd just been randomly moving the mouse around and clicking until I triggered some magical gesture.
MS seem to push you towards using authentication based on Windows Live/Hotmail/Microsoft Live/whatever-they-brand-it-this-week accounts. This isn't necessarily a bad idea, but one thing that I'm not a big fan of is that they suggest that people might want to create a Windows account with the same name as their regular email account. From my experience on a project using Google accounts, where we suggested people might want to create a Google account named "joebloggs@hotmail.com" or "fred@myisp.com", this just leads to user confusion, as people mentally associate a particular account with a particular service. (Theoretically the same should apply to stuff like Amazon accounts, but the same issue doesn't really apply for various reasons. Probably something for a different post...)
MS seem to be really pushing Metro over the "traditional" Windows UI, but I'm really not sure how it's going to scale. I did a completely clean install, and just added Firefox, Opera, Safari+QuickTime and TortoiseSVN, and already the Start screen is full of crap and has more items than will fit on screen at once: Note that in the above shot, I'd already reduced some of the boxes that default to double width (such as Weather and Calendar) down to single width. I'm not sure why the "packer" automatically moved some items into the space that was freed when I did that, but hasn't moved Music - the items can be manually dragged, but it seems odd that it sometimes works automatically and sometimes not.
That "submenu" items such as those for TortoiseSVN or Apple Software Update have appeared in the top menu seems incredibly lame. Again, they can be manually removed from the Start screen, but (a) I don't know why users should have to manually get rid of all the crap that a newly installed application might have added without asking, and (b) I'm not sure how easy it would be to find/restore such deleted items. (There doesn't seem to be any sort of application specific context menu associated with each box.)
Metro applications launch full screen, and have no window controls. This meant that I was scratching my head trying to work out how to escape from an application. In the end I had to resort to a Google search, and found that (a) I wasn't alone in being confused, and that (b) the answer is to press the Windows key. I imagine the proper version of Windows 8 will have some sort of introductory tutorial that explains this to new users, but I foresee a lot of confused people stood at demo units in PC World wondering what the hell they're supposed to do next...
By comparison, losing the "start" button in the regular UI is actually less painful than I was expecting - with one caveat. I'm sure that it'll probably be fine on a regular desktop, but as I've had to run Win8 in a VM in a window much smaller than my overall screen, the experience was a tad fiddly.
One other minor point about Metro pushing the Windows key - MS seem to be very pleased with the new Windows logo they've come up with, and compared to some of the gouge-your-eyes-out rebranding monstrosities that come out, I'd consider it perfectly OK. However, they're up against millions (billions?) of existing keyboards that are sending a very different message about what the Windows logo is. If I had to do tech support over the phone for non-technical people such as my parents, I'd expect to have to describe to them what "the Windows key" is, and the first description is "it looks like a flag", which the new logo doesn't.
Probably the most useful thing for me in Win8 is having IE10 to test. As yet, I haven't actually used it very much, so I don't know how comparable it is to the rest of the browser market. (Personally I consider IE9 a very weak release, far behind the rest of the pack - probably closer to Firefox 2 than 3. The summary page at caniuse.com agrees with me.) What is a bit odd though, is that in many ways there are two IE10 browsers: the one for the "traditional" Windows UI, and the one for Metro. Moving the browser chrome down to the bottom in Metro is a non-issue. Moving the "forward" button to the far right, rather than being adjacent to the "back" button, and losing the "home" & "bookmarks" buttons, are very questionable. But refusing to play Flash content on a machine that has Flash installed and working is absolutely batshit insane. As an avowed Flash-hater, on one level I do want to welcome yet another nail in its coffin. However, this puritanical refusal to do something that the machine/OS/browser is clearly capable of, seems very user-hostile. I was aware that Flash and other plug-ins were not going to be available in some versions of Windows 8, but I'd assumed it was just going to be the ARM/mobile versions, which makes perfect sense. I guess that MS have decided though that they want to try to have a consistent experience across all Metro platforms, which is admirable in some respects. But given the failure of non-iPad tablets, I would expect the ARM/mobile part of the overall Win8 user base to be a drop in the ocean for the foreseeable future, so making the experience worse for the 9x% of people on desktop/laptop machines just so the tiny fraction of people on ARM/mobile don't feel left out, strikes me as misguided.
There are also a few other "WTF?" things with IE10, that I've not personally experienced, but which are documented here. I do find it telling that that piece is (at best) neutral in tone, whereas Thurrott is normally "rah rah, isn't this great" about the vast majority of stuff that MS do...
Not that I was expecting anything, but the continued absence of any bread-and-butter tools like ssh or even telnet is very lame. I suppose that they are too Unixy for MS, and they'd rather not let on that there's an alternative to the world of Windows out there ;-)

Obviously, this is just a preview release, and it makes sense for MS to try out new ideas that might not work out, and which they can easily pull in the official release. Certainly Metro looks nice, and feels less cliched than OS X's brushed steel. (Although given that I mostly use Linux and Xfce, and turn off desktop wallpapers, fancy transitions (e.g. compiz) etc, my opinions on aesthetics probably shouldn't be paid too much heed ;-) Personally, it offends me far less than GNOME 3, the 2011 Google redesign, or OS X Lion - but that's probably because I don't have any great investment in the world of Windows. I do think that such radical changes for a product such as Windows are very "brave", but that's a subject I might elaborate on in another post, as this one is already more than long enough.

Some initial thoughts about Windows 8 Consumer Preview

Posted by John Smith on Mon 05 March 2012

(Just for background, I'm far from being a Windows aficionado or regular user - whilst I have a desktop, laptop and netbook with Windows 7, those machines spend most of their lives running some form of Linux, whether natively via dual-boot, or in a virtual machine. With regard to the Metro UI, I've never used Windows Phone 7 (I've only ever seen it being used once in the wild), and I really don't like how it has been implemented in the latest Xbox 360 dashboard update - although most of the problems I have with it are far more to do with how MS have prioritized ads and general media over games, which doesn't have anything to do with Metro per se, and would easily be resolved if the dashboard was configurable.)

I've only tried Win8 in a VirtualBox VM running atop Windows 7. For some reason I'm only able to run it in a limited number of resolutions, none of which are the native resolution of my monitor. Not quite sure whether this is the fault of Win8 or VirtualBox - it's the first time I've used the latter, normally I use VMWare for all my virtualized environments. (In a similar vein, I was unable to get USB memory sticks or external hard drives to be recognized, and I don't know where the fault lies.)
The login page confuses the hell out of me. It's super-minimal, which isn't a problem, but most of the time when I click the mouse on the login screen, all that happens is that the screen scrolls up and then back down by about half-an-inch. The same happens if I double-click, long-click, middle-click or right-click. Nothing happens if I hit the Windows key (which is used heavily in Win8, see later point). I've just discovered that pressing the Ctrl key, or rotating the mouse wheel, brings up the password prompt - prior to that point I'd just been randomly moving the mouse around and clicking until I triggered some magical gesture.
MS seem to push you towards using authentication based on Windows Live/Hotmail/Microsoft Live/whatever-they-brand-it-this-week accounts. This isn't necessarily a bad idea, but one thing that I'm not a big fan of is that they suggest that people might want to create a Windows account with the same name as their regular email account. From my experience on a project using Google accounts, where we suggested people might want to create a Google account named "joebloggs@hotmail.com" or "fred@myisp.com", this just leads to user confusion, as people mentally associate a particular account with a particular service. (Theoretically the same should apply to stuff like Amazon accounts, but the same issue doesn't really apply for various reasons. Probably something for a different post...)

Artificial pagination of articles on news sites - thanks, but no thanks

Posted by John Smith on Sun 04 March 2012

TL;DR: the bazillionth whinge about content sites with paginated articles, but picking on businessweek.com as they seem to be doing something particularly iniquitous that I haven't seen before. (Although it could just be that I'm unobservant or behind-the-times, and that this is old news.)

I was in a branch of W H Smith yesterday, and had a quick flick through the latest issue of Bloomberg Business Week magazine. As usual, there were a number of articles that looked like they'd be worth reading, so rather than hand over £3.30 for the dead tree edition, I made a mental note to visit businessweek.com later to digest those articles properly.

I got round to visiting the site just now, and it looked like they might have had a minor redesign since I last visited - nothing particularly outrageous though. The first article I checked was this week's cover story about Twitter. Reading down the first page, it was quite a good piece, if not telling me anything I hadn't previously been aware of.

As I scrolled down to the bottom of the page, there was a pagination nav that indicated the article had been split into four pages. As I clicked on "next >" to go to the following page, I was surprised about how quickly the second page appeared - suspiciously quick, in fact.

Screen grab of a browser showing the bottom of an article at businessweek.com, specifically the page navigation

Being a nosey bugger, I viewed the HTML source of the page, and found that in fact, all of the article content is present in the "first" page, and it isn't even sectionalized in any way.

Screen grab of a browser 'View Source' window, showing that there is article content beyond that shown to the user on a page of a businessweek.com article

It wasn't a big surprise to me to find out that if I turned off JavaScript, my browser would now show the whole story on a single page, with no pagination controls visible anywhere. On my 1050x1680 portrait display, the full story runs for 8 screens in Firefox, which doesn't strike me as especially long and in need of breaking up. (NB: I have Ghostery blocking Disqus comments in Firefox; when I viewed the regular paginated site in a browser, I found that just the first of the four pages ran to 6 screensworth, of which around 50% were the - mostly inane - Disqus user comments.)

Screen grab of the businessweek.com story with JavaScript disabled, now showing all the article content on a single page

I haven't bothered to check the site's JavaScript code, but I assume there's something that measures the height of the <p> elements, and once the height goes above a certain point, starts splitting/hiding them into pages, and inserts the page navigation controls.
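For illustration only, here is a minimal sketch of the kind of logic guessed at above. It is not BusinessWeek's actual code: the function name splitIntoPages and the 1000px page height are made up for the example, and the paragraph heights are passed in as plain numbers so the grouping logic can be run outside a browser.

```javascript
// Hypothetical sketch: group paragraphs into "pages" by accumulated height.
// paragraphHeights: array of heights in pixels, one per <p> element.
// maxPageHeight: assumed cut-off; once a page would exceed it, start a new page.
function splitIntoPages(paragraphHeights, maxPageHeight) {
  const pages = [[]];
  let currentHeight = 0;
  paragraphHeights.forEach((height, index) => {
    const page = pages[pages.length - 1];
    if (page.length > 0 && currentHeight + height > maxPageHeight) {
      // This paragraph would overflow the current page, so open a new one.
      pages.push([index]);
      currentHeight = height;
    } else {
      page.push(index);
      currentHeight += height;
    }
  });
  return pages; // each entry is a list of paragraph indexes
}

// Six paragraphs split against an assumed 1000px page height.
const pages = splitIntoPages([300, 400, 500, 200, 900, 100], 1000);
console.log(pages); // [ [ 0, 1 ], [ 2, 3 ], [ 4, 5 ] ]
```

In a real page the heights would presumably come from each paragraph's offsetHeight, and the script would then hide every paragraph that isn't on the current page and add the navigation links.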

Now, pagination of online content is a complicated subject, and I'm no UX guru, conversion wizard or SEO charlatan who can confidently spout chapter-and-verse about what you should or shouldn't do when building a content site. What I do know is that as a user, I don't like having to continually click-scroll-click-scroll-click-scroll to get through an article that could have easily been scrolled through. And I'm pretty sure I'm not alone.

The usual excuse for pagination is that it increases the number of page impressions or ads that can be shown, but I don't think that's valid here:

As there's no new page being loaded when I click 'next', a page hit in the traditional sense isn't occurring. I'm sure there'll be some JavaScript analytics code sending something back to the server when I navigate to another page, but surely this could be done by handling onscroll events, similar to how people such as Twitter (ironically enough) implement infinite scrolling.
The ad space on a long single page is pretty much the same as on multiple short pages, so the same number of ads could be run. Now, after viewing the story on BusinessWeek a few times, it looks very much like there are only a very small number of ads being repeated on each sub-page of the story, and having the same ads repeatedly shown on a longer single page would look pretty dumb, but this seems to me to be more a failure of their ad sales or syndication systems, than anything else.
Most sites that use pagination - Ars Technica is the whipping boy that usually comes to mind - do at least keep up the pretence of having to download new content (whether by a traditional page load, or via Ajax), but this is the first time I've been aware of a site using JavaScript to IMHO actively make things worse for end users.
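As a rough sketch of the onscroll idea from the first point above: the tracker below reports each reading-depth milestone once as the reader scrolls. The name createScrollDepthTracker, the 25/50/75/100 milestones and the report callback are all assumptions made for the example, not anything taken from a real analytics library.

```javascript
// Hypothetical sketch: report how far down a single long page a reader has got.
// report: callback invoked once per milestone (e.g. to send an analytics ping).
function createScrollDepthTracker(report, milestones = [25, 50, 75, 100]) {
  const reported = new Set();
  return function onScroll(scrollTop, viewportHeight, documentHeight) {
    // Percentage of the document that has been scrolled into view so far.
    const percentSeen = Math.min(
      100,
      ((scrollTop + viewportHeight) / documentHeight) * 100
    );
    for (const milestone of milestones) {
      if (percentSeen >= milestone && !reported.has(milestone)) {
        reported.add(milestone);
        report(milestone);
      }
    }
  };
}

// Simulated scrolling through a 2000px document in a 500px viewport.
const seen = [];
const track = createScrollDepthTracker((milestone) => seen.push(milestone));
track(0, 500, 2000);    // top of the page
track(600, 500, 2000);  // just past halfway
track(1500, 500, 2000); // bottom of the page
console.log(seen); // [ 25, 50, 75, 100 ]
```

In a browser, the returned function would be called from a scroll event handler with window.pageYOffset, window.innerHeight and the document's scrollHeight.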

Slightly odd MS Bing job spam

Posted by John Smith on Fri 02 March 2012

I received the following message via LinkedIn tonight:

Screengrab of an email sent to me by a Microsoft recruiter

I'm sure they bash out thousands of these every day, but I'm mildly curious why they decided I should be a lucky recipient of one:

I'm not sure what the "CIS" in the message subject refers to. If MS thinks I'm from the former USSR, that doesn't say much for their CV analyzing abilities. I did look for alternate definitions, but other than the meaninglessly generic "Computer Information Systems", I don't see anything that jumps out as being relevant.
What's with the weird intermittent capitalization of BING/Bing?
Other than a couple of years of DOS development nearly 20 years ago, a proper look at my CV or LinkedIn profile would show I have very minimal experience or interest in Microsoft technologies. A glance at Twitter posts would in fact show that taking the p*** out of MS is one of my favourite entertainments.

Now, I'm pretty sure that the reason they've contacted me is fairly obvious - I have the magic word "Google" in my CV/profile. One might have thought that even MS realize that not all of the ~30k people who work at Google are on the search teams. (Not that I ever counted as part of that number, being a lowly "red badge" contractor.)

Don't get me wrong, I'm quite happy for recruiters to send me unsolicited offers (within reason). But spamming me about roles that are clearly not a suitable match for me is a waste of everyone's time.

Man pages are not optional

Posted by John Smith on Fri 24 February 2012

Spent a few hours today playing with Puppet, which I'd been meaning to have a look at for ages.

I followed a beginner's tutorial, which didn't work - Puppet failed to stop or start ntp in the way the tutorial describes. In and of itself, this didn't bother me - in fact, I often find it useful when things screw up, as it forces you to start digging in and investigating, at which point you start learning how things really work.

What was a bit annoying though, was the lack of any obvious warnings or errors, either in the output, or in the log files. (Puppet logs to both /var/log/messages and files in /var/log/puppet/, but the former had nothing useful, and the latter just contained incomprehensible HTTP request URLs.)

Never mind, I thought, I'll just check the man page. This is what I got: Screengrab of the unhelpful output of 'man puppet'

It doesn't look to be an issue specific to my distro either.

I don't give a damn how good any project's online documentation is; something that is going to be interacted with from a *nix command line - especially when aimed at a system admin audience - should have a manpage that at least covers some of the basics in a modicum of detail. I don't care for the GNU stuff that pushes you towards the info command that I've never seen anyone actually use, but compared to Puppet, they're wonderful.

Obviously Unix manpages are far from being the new hotness, and the lack of stuff like proper hyperlinks is a bit annoying in this day and age, but the fact that they have survived this long shows how useful they are. If the author(s) of a tool can't be bothered to spend a few hours putting together a half-decent manpage, then I'm not sure I feel inclined to spend any time researching and learning that tool.

EDIT: I've just seen that you can do puppet describe {string} to get something that's roughly comparable to a proper manpage, just inferior in pretty much every respect. (e.g. no nice formatting on a terminal that supports bold or underline, having to pipe into more/less/etc if you want to page it) Too late though, I'm moving on...

EDIT#2: On further investigation, I'm not sure what puppet describe or puppet doc actually do, but it's certainly not providing docs about the subcommands such as "agent", "apply", "cert", etc as mentioned in the manpage.

Detecting if a webpage is running in a background browser tab

Posted by John Smith on Sun 19 February 2012

TL;DR version: A hacky way of detecting if a web page is running in a browser tab which isn't currently active. Possibly useful for not autoplaying video or audio. Currently only works in Firefox and Chrome. A rough demo is here.

(This post refers to subject matter covered in more detail in my previous blog post. You don't really need to read that to understand the core of what this post is about though.)

I'm probably very weird in the way I browse, but something that I experience on a more-than-daily basis is:

Go to a site that has a bunch of items/stories/articles, many of which will have links. I'm thinking of things like a Twitter feed, Hacker News, a Gawker Media site, the B3TA newsletter, etc
As I skim down the page, I r-click->open in new tab any items which look interesting and have links. I prefer to do this rather than to hop backwards and forwards between the original page and the linked pages.
Unfortunately, some of those links might be to YouTube or other video services - and this might not be immediately obvious, especially with embedded videos or if URL shorteners are in use. You can then end up with one or more videos starting to autoplay, and you're forced to start going through the tabs to pause the videos until you're ready to watch them.

It struck me that it would be much more user-friendly if videos didn't autoplay if the page was in a window or tab that wasn't currently active or visible. Now, either my webdev and Google skills are deficient (quite possible), or this isn't as easy as you might expect.

Browsers do have focus and blur event triggers, but these don't really help - a page opened in a tab won't fire either of these until the tab is clicked on, and only Firefox fires a focus event when a page loads in an active tab/window.

As far as I can determine, there's no property of the Window object along the lines of 'focusState' that would make all of this dead easy to determine.

I then remembered reading a few posts about requestAnimationFrame, which has been introduced into the latest generation of browsers. The intended use-case for this functionality is to stop browsers burning CPU on animations/game cycles that a user can't actually see or interact with. This seemed like something which would help provide a solution to the problem I was thinking about.

Unfortunately, things aren't straightforward. First off, only Firefox and Chrome currently have proper requestAnimationFrame support. Now, there are easy-to-use shims to keep things working on other browsers - but they do this by just running the animations as normal on a page in an inactive tab, which doesn't help in my scenario.

Furthermore, the behaviour in the two browsers that do implement it differs - and from a bit of reading, I get the impression that they're not likely to converge to identical implementations. The differences are covered in the previous blog post - suffice to say we need to apply a bit of a fudge-factor to be able to support both browsers.

So, here's an overview of the JavaScript code you'd need to add to a page with video, to make it work in a "civilized manner":

Ensure the video is set to not autoplay
On page load, initialize the following variables:
- animationCounter = 0
- stopAnimations = false
- windowState = undefined
Set up some onFocus() and onBlur() handlers. The handlers will set windowState to "focussed" or "blurred" respectively.
Use requestAnimationFrame to call an incrementCounter() function. This function - unsurprisingly - increments animationCounter, and if stopAnimations is not true, uses requestAnimationFrame to call itself again
Use setTimeout() to call a determineState() function a second after page load
When determineState() is called, set stopAnimations to true, clear the onFocus() and onBlur() handlers, and do one of the following:
- If windowState is "focussed", set the video to autoplay, as the user can definitely see it
- If windowState is "blurred", don't autoplay the video, as the user definitely can't see it. (I don't believe this state will ever happen, unless the user is juggling between tabs, but it makes sense to cover it for completeness.)
- If windowState is still undefined (neither handler has fired) and animationCounter is more than 3, the user probably has the page active, so autoplay the video
- If windowState is still undefined and animationCounter is 2 or less, the user probably has the page inactive, so don't autoplay the video
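The steps above could be sketched roughly as follows. This is an illustration rather than the code behind the demo: detectTabState, env and onDecision are hypothetical names, and env is just a thin wrapper around the browser functions so the logic can also be exercised outside a browser. The steps don't say what to do when the counter is exactly 3; this sketch assumes the cautious option of not autoplaying.

```javascript
// Hypothetical sketch of the detection steps described above.
// env: object providing addEventListener, removeEventListener,
//      requestAnimationFrame and setTimeout (in a page, wrappers round window).
// onDecision: callback receiving true if the video should autoplay.
function detectTabState(env, onDecision) {
  let animationCounter = 0;
  let stopAnimations = false;
  let windowState; // stays undefined unless a focus or blur event fires

  const onFocus = () => { windowState = "focussed"; };
  const onBlur = () => { windowState = "blurred"; };
  env.addEventListener("focus", onFocus);
  env.addEventListener("blur", onBlur);

  function incrementCounter() {
    animationCounter += 1;
    if (!stopAnimations) {
      env.requestAnimationFrame(incrementCounter);
    }
  }
  env.requestAnimationFrame(incrementCounter);

  // Decide one second after page load.
  env.setTimeout(function determineState() {
    stopAnimations = true;
    env.removeEventListener("focus", onFocus);
    env.removeEventListener("blur", onBlur);

    let shouldAutoplay;
    if (windowState === "focussed") {
      shouldAutoplay = true;
    } else if (windowState === "blurred") {
      shouldAutoplay = false;
    } else {
      // No focus/blur seen: fall back to the frame count.
      // Assumption: a count of exactly 3 is treated as "inactive".
      shouldAutoplay = animationCounter > 3;
    }
    onDecision(shouldAutoplay, { windowState, animationCounter });
  }, 1000);
}
```

In a page, env would wrap window.addEventListener, window.removeEventListener, window.setTimeout and whichever (possibly vendor-prefixed) requestAnimationFrame the browser offers, and onDecision would call the video element's play() method when passed true.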

Disparity in requestAnimationFrame behaviour between Chrome and Firefox

Posted by John Smith on Tue 14 February 2012

This is a brief prelude to another post I hope to make in a couple of days or so, once I've solved my problem to my satisfaction. In the meantime, here's a related curio that I hadn't seen documented online before I had to start digging...

requestAnimationFrame is something that's been pushed in the last year or two as a more efficient way of doing animations in JavaScript than the traditional technique of using setInterval. In particular, it aims to avoid having your machine burn CPU on executing animations in a tabbed page that's not currently visible.

At time of writing, only Firefox and Chrome seem to actually support this function, albeit with moz and webkit vendor prefixes. caniuse.com doesn't have too much information about future support in other browsers - it'll appear in IE10, but it's unclear about Safari or Opera. Certainly the Opera Next 12.0 I downloaded yesterday doesn't appear to have it.

Now, for the most part this isn't the end of the world, as there are published shims to implement a workable alternative using setInterval() or setTimeout(). Unfortunately, these will just churn away as normal in a background tab, whereas what I wanted to do was to see how things were different in a background tab.
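A minimal sketch of what such a shim typically looks like, with one addition: it also reports whether a native implementation was found, since the setTimeout() fallback keeps running in a background tab. The function name resolveRequestAnimationFrame and the isNative flag are made up for this example; the moz and webkit prefixed names are the ones mentioned above.

```javascript
// Hypothetical sketch: pick the native (possibly prefixed) implementation,
// or fall back to setTimeout. `win` is the window object (or a stand-in).
function resolveRequestAnimationFrame(win) {
  const native =
    win.requestAnimationFrame ||
    win.mozRequestAnimationFrame ||
    win.webkitRequestAnimationFrame;
  if (native) {
    return { raf: native.bind(win), isNative: true };
  }
  // Fallback: aim for roughly 60 calls per second. Unlike the native
  // versions, this is not paused or throttled when the tab is hidden.
  return {
    raf: (callback) => win.setTimeout(() => callback(Date.now()), 1000 / 60),
    isNative: false,
  };
}
```

Code that wants to observe background-tab behaviour would need to check isNative first, because with the fallback there is nothing to observe.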

It turns out that the two implementations we have so far differ in their behaviour. Chrome comes to a dead stop when a page is in a background tab, which is probably what you'd naively expect to happen. Firefox on the other hand does some gradual throttling - you'll get one frame in the first second of being backgrounded, then another after a further two seconds, then a further four seconds, eight seconds, sixteen seconds, etc.
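To make the Firefox pattern concrete, here is a small model of the schedule as described above, with the gap between frames doubling each time. It is only a model of the observed behaviour, not Firefox's actual implementation; the function name backgroundFrameTimes is made up, and placing the first frame at exactly one second is a simplification.

```javascript
// Hypothetical model: seconds after backgrounding at which each frame arrives,
// assuming gaps of 1, 2, 4, 8, 16... seconds between successive frames.
function backgroundFrameTimes(frameCount) {
  const times = [];
  let elapsed = 0;
  let gap = 1;
  for (let i = 0; i < frameCount; i++) {
    elapsed += gap;
    times.push(elapsed);
    gap *= 2;
  }
  return times;
}

console.log(backgroundFrameTimes(5)); // [ 1, 3, 7, 15, 31 ]
```

On this model a backgrounded Firefox tab gets about one frame in its first second and Chrome gets none, so in either browser a page counting frames over a short window would see a very low number.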

I knocked up a very rough demo for this, so that you can see for yourself - take a look here, and see what happens when you r-click the link on the page and open it in another tab - the function called via requestAnimationFrame() updates the page title, so you can see how often it gets called from the text in the tab.

I'm not completely clear why Mozilla have implemented this the way they have - I've not dug out any official specs, but going by the year-old Chromium issue to add this functionality, I don't expect this behaviour to show up in Chrome/Chromium.

In the next post I'll elaborate on the problem I've been trying to solve - suffice to say, using requestAnimationFrame was a bit of a hacky way of trying to achieve something that I'd have thought should have been extremely straightforward...


John Smith's Blog

Ramblings (mostly) about technical stuff
