Friday 3 August 2012

Queen Anne, Copyright and Illegal Numbers

I have spent a lot of time as a web scientist and open access advocate considering the role of copyright in the web, where by "considering" I mean "banging my head against a wall" and by "role" I mean "trump card for the intellectually bankrupt but politically powerful".

Three hundred years ago, the English crown invented a new kind of property which it declared to exist in the written expression of ideas, and which it granted directly to the individuals who created those written texts. Copyright declares that anyone who creates a new piece of written text becomes the only person who has the right to make a copy of that text. Publishing a text or CD used to require the creator to make (print or burn) as many copies as were necessary, and readers simply acquired a pre-copied item. On the Web, the creator makes available a single item (on their web server's disk), and to read the page the readers have to make their own copies using their own computers' browsers.

The Web has created a new kind of user experience and raised all kinds of expectations that are not compatible with the idea of copyright; different people would like to reconcile these differences by either changing the Web or changing the notion of copyright.

Copyright controls the expression of ideas; it stops one person stealing and using another person's work, but it does not stop them using that person's ideas. However, now that these ideas are expressed digitally, there is more at stake than a piece of text.

Copyright doesn't just apply to novels or articles; it has been argued that it applies to very much shorter forms of communication such as emails and tweets. All of these things are stored as documents or files on a computer; they are sequences of bytes that have to be interpreted according to some coding scheme (e.g. ASCII, Unicode, plain text, HTML or Word). 

Imagine the tweet "I am a pink hotdog". It is brief, but as far as Google is concerned it is original and has never been written before in any other document, so asserting my own copyright would seem to be justified. When I save it in a file on disk, I can use the od command to show it either as a string of characters or as a very long number (in hexadecimal format). I can then translate that hexadecimal number into the more usual decimal format.

text:         I am a pink hotdog
hexadecimal:  4920616d20612070696e6b20686f74646f67
decimal:      6,370,215,410,492,649,031,668,884,346,259,210,575,834,983
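The conversion above can also be reproduced without od. The following is only a minimal sketch in Python, assuming the tweet is stored as plain ASCII with no trailing newline; the function name text_to_number is made up for illustration.

```python
def text_to_number(text):
    """Return (hex string, integer) for an ASCII text.

    Hypothetical helper for illustration; assumes ASCII encoding.
    """
    raw = text.encode("ascii")            # the bytes as they sit on disk
    hex_digits = raw.hex()                # two hexadecimal digits per byte
    number = int.from_bytes(raw, "big")   # read all bytes as one big-endian integer
    return hex_digits, number


hex_digits, number = text_to_number("I am a pink hotdog")
print(hex_digits)       # the hexadecimal form of the tweet
print(f"{number:,}")    # the same value in decimal, with thousands separators
```

Reading the bytes in big-endian order is a choice made here so that the first character ends up as the most significant digits, which matches the left-to-right order of the hexadecimal string.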


As far as the computer is concerned, the very long number and the stream of characters are equivalent data. If the law says that no-one else is allowed to reproduce these characters, it is the same as saying that no-one else is allowed to reproduce this number. If these words are not allowed to be stored in a computer system as a document, then this number is not allowed to be stored in a computer system as the output of a calculation. In other words, copyright not only assigns property in the expression of ideas, it also assigns property in the use of numbers. Effectively, once you write down an (admittedly large) number, it becomes illegal for anyone else to use it until 70 years after you die.
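The equivalence runs in both directions: given only the number, the text can be recovered. Here is a rough sketch in Python, again assuming ASCII and big-endian byte order; number_to_text is a hypothetical name, not part of any library.

```python
def number_to_text(number):
    """Turn a non-negative integer back into the ASCII text it encodes.

    Hypothetical helper; assumes big-endian byte order and ASCII text.
    Leading zero bytes (NUL characters) would be lost, which does not
    matter for ordinary text.
    """
    length = (number.bit_length() + 7) // 8   # bytes needed to hold the number
    return number.to_bytes(length, "big").decode("ascii")


# The number from the hexadecimal string, with no text involved at this point.
number = int("4920616d20612070696e6b20686f74646f67", 16)
print(number_to_text(number))   # I am a pink hotdog
```

Nothing in the second half of the sketch ever sees the original characters, which is the point: whoever holds the number holds the tweet.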


I doubt that Queen Anne foresaw this outcome when she created legislation "for the encouragement of learning". I'll leave it as an exercise for my students to work out how many illegal numbers there are, and how much of which range of numbers they infest. See Wikipedia for other examples of illegal numbers.