Sunday, 25 March 2012

The Sky is Falling (again)

Matt Honan's recent article "The Case Against Google" tells us that Google is Evil, people are abandoning the Open Web in favour of Closed Ecosystems, and it's impossible to search the web without surrendering enough privacy to make a gynaecologist blush. So far, so 2012.


Here's the über-challenge that Google has set itself in delivering relevant search results:
You are about to leave San Francisco to drive to Lake Tahoe for a weekend of skiing, so you fire up your Android handset and ask it "what's the best restaurant between here and Lake Tahoe?" It's an incredibly complex and subjective query. But Google wants to be able to answer it anyway. (This was an actual example given to me by Google.) To provide one, it needs to know things about you. A lot of things. A staggering number of things.
To start with, it needs to know where you are. Then there is the question of your route—are you taking 80 up to the north side of the lake, or will you take 50 and the southern route? It needs to know what you like. So it will look to the restaurants you've frequented in the past and what you've thought of them. It may want to know who is in the car with you—your vegan roommates?—and see their dining and review history as well. It would be helpful to see what kind of restaurants you've sought out before. It may look at your Web browsing habits to see what kind of sites you frequent. It wants to know which places your wider circle of friends have recommended. But of course, similar tastes may not mean similar budgets, so it could need to take a look at your spending history. It may look to the types of instructional cooking videos you've viewed or the recipes found in your browsing history.
It wants to look at every possible signal it can find, and deliver a highly relevant answer: You want to eat at Ikeda's in Auburn, California.
In looking at the corner into which this search company has painted itself in its attempts to stay relevant, I can't help but compare it with the lengths to which the Sirius Cybernetics Corporation went with its Nutrimatic Drinks Dispenser:

When the 'Drink' button is pressed it makes an instant but highly detailed examination of the subject's taste buds, a spectroscopic analysis of the subject's metabolism, and then sends tiny experimental signals down the neural pathways to the taste centres of the subject's brain to see what is likely to be well received. However, no-one knows quite why it does this because it then invariably delivers a cupful of liquid that is almost, but not quite, entirely unlike tea. (Douglas Adams, The Hitchhiker's Guide to the Galaxy)
You see, I'm fairly sure that after all the effort Google puts into finding me a restaurant, it will end up sending me to a so-so kind of establishment - the kind of restaurant I 'normally' end up in on business trips. The kind of restaurant that averages out the likes and dislikes of all my companions. And after all that invasive knowledge elicitation, I'll end up somewhere that has put more effort into search engine optimisation than into culinary optimisation.
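The averaging worry can be made concrete with a deliberately crude toy. This is not a description of how Google ranks restaurants; it is a minimal Python sketch assuming a hypothetical recommender that simply picks whichever option has the highest mean rating across everyone in the car. The restaurants, diners and 0-10 ratings are all invented for illustration, as are the function names `best_by_mean` and `favourites`.

```python
from statistics import mean

# Hypothetical 0-10 ratings, invented purely for illustration (not real data).
RATINGS = {
    "Michelin-starred restaurant": {"me": 10, "companion A": 2, "companion B": 3},
    "Vegan cafe": {"me": 2, "companion A": 10, "companion B": 4},
    "Sushi bar": {"me": 4, "companion A": 1, "companion B": 10},
    "Roadside chain restaurant": {"me": 6, "companion A": 6, "companion B": 6},
}


def best_by_mean(ratings: dict[str, dict[str, int]]) -> str:
    """Return the restaurant with the highest mean rating across all diners."""
    return max(ratings, key=lambda name: mean(list(ratings[name].values())))


def favourites(ratings: dict[str, dict[str, int]]) -> dict[str, str]:
    """Return each diner's individual top-rated restaurant."""
    diners = next(iter(ratings.values())).keys()
    return {d: max(ratings, key=lambda name: ratings[name][d]) for d in diners}


if __name__ == "__main__":
    print("Mean-rating pick:", best_by_mean(RATINGS))
    print("Individual favourites:", favourites(RATINGS))
```

With these made-up numbers each diner's favourite is a different place, yet the mean-rating rule picks the chain restaurant that everyone merely tolerates: the so-so compromise described above.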

What I really want from Google in these circumstances is an answer to the question "are there any Michelin-starred restaurants between here and there?" It's a good old-fashioned objective question, with a written-down answer: one that can be looked up on the Web, not divined from my cerebellum.


Thursday, 5 January 2012

Open vs Closed? An Explosion of Generativity

In a previous posting I mentioned Jonathan Zittrain's book The Future of the Internet and How to Stop It, in which he argues that the Internet needs to be open as a "generative system" to allow unanticipated change to emerge through unfiltered contribution from broad and varied audiences. His argument is that innovation on the Internet (i.e. the Web) needs to be open to all comers, in the same way that PC development has been unrestricted and open. No-one controls what you can do with a PC: what programs you may write or run, or what information you may process. The very processes that could control the Internet to make it a "safer" place (with regard to kiddie porn, piracy, cyber bullying, identity theft etc.) will also tend to restrict technological development and make the future of the Internet a much poorer place - both in terms of the user experience and in terms of the future economic activity that could be developed.

In developing this argument, Zittrain and others have tended to contrast PC development (open to all) with iPhone development (closed and controlled by Apple). The first edition of his book was written before the iPhone API was published and before the remarkably successful App Store(TM) was launched. Subsequent editions of (and additions to) the book have finessed the argument, but by and large people still believe that a manufacturer-controlled smartphone, with software development policed by the manufacturer, is a bad thing for innovation and hence for generativity.

[Chart: Historical PC/Windows package vs iOS package development per year]
Is this "received wisdom" supported by the evidence? The chart to the right compares the number of software packages contributed each year by developers for the Windows PC platform, as listed on download.com (a major software portal since the early days of the Web), with the number for the iOS (iPhone/iPod/iPad) platform, as listed in Apple's App Store. It apparently shows an order of magnitude more development being supported by the closed environment.

Now, PC software is available from thousands of sources, not just this single aggregator, so the number of Windows packages here is clearly an underestimate, while the iOS figure is accurate (by the nature of a closed, single-manufacturer environment). Still, what matters here is not the number of downloads (which scales with the number of distribution channels) but the number of software packages that have been created. Since download.com is such a significant source of PC software, we might expect it to list a not-insignificant fraction of the software available to the general public.

So, given the arguments made about innovation and open platforms, it is interesting that these figures differ so markedly in favour of the closed environment. That might suggest that the amount of innovation stimulated by the iPhone is significant in comparison to the PC, that the next generation of Web environments could be triggered by an iPhone-like ecosystem rather than throttled by it, and that the future of the Internet is not as alarmingly threatened as some have thought.

This naive investigation and its results are an excuse for further investigation into how we theorise and predict the emergence of future web developments. The Web, after all, is not defined by the particular experience of a browser on a computer (desktop, laptop, netbook or smartphone), but by the interaction of informational and social agents.


Monday, 19 September 2011

Research Ethics and the Web's Private & Public Spaces

In a paper (Six Provocations for Big Data, section 5) related to her forthcoming keynote at the Oxford Internet Institute's "Decade in Internet Time" conference, danah boyd talks about "being in public" on the web, bringing metaphors about one's own public presence in a physical environment to bear on the accessibility of digital writings on a computer server. While we can all intuit what is meant by this (the conscious felt experience of being engaged with the web), are metaphors such as "being in public" helpful when thinking about ethical issues raised by the Web?

"Being in public" means that one's presence and actions can be seen/heard by other people, where we have no choice about who those "other people" are, nor control over what they do. Of course, on the Web "we ourselves" are not in public, but the records of our words (or audio, videos, photographs, artwork) are. Or may be: sites may hide their content behind user accounts and secure browsing protocols. We may debate whether our social networking activities are public, but we are rarely tempted to debate the public nature of our bank account transactions.

What is the difference between the following:

Being in a public space vs having one's statements made public
Making a statement in a public space vs making a public statement
Being in a public space vs being on a global stage
Being in a public space vs being in a particular space for a particular purpose that other people could observe now or in the future
Being in a public space vs being made aware of other people's scrutiny
Making a statement in a public space vs having one's statements publicly analysed & criticised by observers

"Being in public" on the Web means that one's activities, memberships, engagements, writings and videos can be seen/heard by other people, where we have no choice about who those "other people" are, nor control over what they do. "Being in public" on the Web is useful when we want a global audience, and also when we are pontificating to the aether.

But "being in public on the Web" is also useful when we expect to speak to only a few individuals, because it would be hugely inconvenient in practice to create a channel for those people alone. This is how we are normally "in public": in parks, on the street, in coffee shops. One might refer to this as an expectation of "privacy by obscurity": people could eavesdrop, but why would they bother? And in those situations we are used to social norms that preclude people gathering around and gawping at our discussions. (As children we are taught "don't stare", "don't be nosy", "that's none of your business".)

There are two phenomena that intrude on the unconsciously public: Google and the wily researcher. Search engines exist to expose things and make them findable (more effectively public). However, those inhabiting the "self-conscious public" will often go to great SEO lengths to make sure that their public utterances are prominently positioned. Although they do not occupy key marketing positions on the first page of a Google search, the unconsciously public may still find that their words are more accessible than they would have liked.

However, acting in an "unconsciously public" fashion does not necessarily imply being completely oblivious to the lack of privacy. Individuals may adjust to emerging social norms, and in doing so create new norms and establish new boundaries of behaviour. You may consider it acceptable for like-minded individuals (friendly observers, benign lurkers) to search for your online presence on a discussion forum, yet be unhappy about work colleagues, reporters, government agents and university researchers actively examining your opinions.

So perhaps it is small wonder that Google reports almost half a million Web pages using the following boilerplate text, which threatens sociologists with legal action if they dare to make use of the pages:
WARNING Any institutions or individuals using this site or any of its associated sites for studies or projects - You DO NOT have permission to use any of my profile, pictures, or other material posted on this site (including discussion thread posts and blogs) in any form or forum both current and future. If you have or do, it will be considered a violation of my privacy and will be subject to legal ramifications. It is recommended that other members post a notice similar to this or you may copy and paste this one into your profile
From a technical and legal point of view, I'm not convinced that this carries any weight (although I'm looking into it), but it certainly telegraphs a preference and an intent. On the one hand, we should feel a very strong pull towards respecting and honouring an individual's wishes; on the other, we have clear social and legal boundaries precisely to curb individual requests.

Should the web mining of personal information stop? Should ethics committees come down hard on this practice? Is it right to broaden the principle of "informed consent" to the Web, and to severely prune the availability of "big data"? I don't know, but I do know that my engineer's default position of "do what you want with public web pages" has been severely challenged.

Thursday, 28 July 2011

What Web Science studies is "the InterWebs"

According to the researchers on the YouTube rathergoodstuff channel, the Internet is made of cats, although a closer reading of their work reveals a more subtle cat/tube duality, reflecting the popular perception of "Teh Internets" as both content and delivery channel.

Engineers make a clear distinction between the Internet (a global network of networks) and the Web (a distributed information space that uses the Internet to provide access to interlinked content through a combination of protocols, data formats and identifiers such as HTTP, HTML and URI).
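To make the engineers' distinction concrete, here is a minimal Python sketch (standard library only, no network access) of the three Web-level ingredients named above: a URI that identifies a resource, the text of an HTTP request for it (which the Internet's TCP/IP layer would then carry as plain bytes), and HTML whose links point at further URIs. The page address and HTML snippet are hypothetical, and `http_get_request` and `LinkExtractor` are illustrative names of my own, not part of any library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


def http_get_request(uri: str) -> str:
    """Build the text of an HTTP/1.1 GET request for a URI.

    The Internet (TCP/IP) would merely carry these bytes; HTTP is the
    Web-level protocol that gives them meaning.
    """
    parts = urlsplit(uri)  # the URI: the Web's identifier
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )


class LinkExtractor(HTMLParser):
    """Collect the href targets of <a> elements: the links that make the Web a web."""

    def __init__(self, base_uri: str) -> None:
        super().__init__()
        self.base_uri = base_uri
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative references against the page's own URI.
                    self.links.append(urljoin(self.base_uri, value))


if __name__ == "__main__":
    page_uri = "http://example.org/blog/index.html"  # hypothetical page
    print(http_get_request(page_uri))
    html = '<p>See <a href="post.html">a post</a> and <a href="http://example.com/">elsewhere</a>.</p>'
    extractor = LinkExtractor(page_uri)
    extractor.feed(html)
    print(extractor.links)
```

Nothing in the sketch cares whether the page belongs to a shop, a video site or a blog: the Web layer itself is application-neutral.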

"Browsing", "navigating" and "information discovery" are the kinds of generic activities that web developers and information scientists concern themselves with, but the more common labels Social Networking, Internet Video, Blogging, Online Banking, Open Source Development, Internet Porn, E-research and Internet Shopping describe what people are actually achieving with (and within) the (application-neutral) information space of "the Web".

These various categories of practice and activity are distinctive enough to have their own names and their own specialist kinds of interaction (shopping baskets, playlists, blogrolls) even though people may be simply (reductively) "navigating web pages" using the same technology (a Web browser connecting via HTTP to a Web server) on the same devices (a home desktop or laptop) to engage in all these activities.

Those web engineers and content providers building on the Web to provide Internet Shopping (e-commerce, b2b, secure financial transactions, product databases, stock control, warehouses and delivery) have different concerns to those dealing with Internet Video (rights acquisition, media streaming, content licensing, bandwidth negotiation, format transformation). The activity supporting each of these practices can be modelled as a network of stakeholders (providers, consumers, participants, brokers, technologies, marketing channels); looking in detail at any particular activity reveals a web of information and data. The Web as a whole is the conjunction of these individual activities: neither entirely separate nor completely merged and integrated, but overlapping and interacting; all built on the simple foundation of the Web architecture, but realised in different kinds of organisation drawn from different industries, with different expectations and rules, communicating through different kinds of sites, perhaps on different devices.

The bigger picture of the Web, then, is not a monolithic whole nor a homogeneous distribution of uncoordinated components; it is rather a loose affiliation of semi-independent content networks (webs) with their own practices, technologies and business (sustainability) models, and their own ecology of providers and consumers. Held together by W3C-mandated standards, policies and architectural overview, the Web at scale is a network of webs - the InterWebs - mutually reinforced and stabilised by each other's success and contribution to the whole.

Wednesday, 27 July 2011

Why the Web Should be an Open Platform

As an open access advocate involved in developing technologies and policies to help the scientific and research communities share knowledge, I often find myself talking about Open Access, Open Educational Resources, Open Government Data, Open Scientific Data and the revolution in intellectual property practice that the Web has precipitated. More recently, as part of the Web Science research activities at the University of Southampton, I have been critically re-examining these notions of Openness and the Web - what do we mean by openness, what do we want from openness, and is openness a good thing? The fundamental questions of why the Web is open and whether it will continue to be open are discussed in the paper "Could the Web be a Temporary Glitch?", which my colleagues and I presented at the 2010 Web Science conference.

Bill Thompson (BBC journalist) got me thinking more about these issues the other day. In an essay, "The open internet and its enemies", published as part of the BBC World Service debate on openness, he said:
I believe that if we want an open society based around principles of equality of opportunity, social justice and free expression, we need to build it on technologies which are themselves 'open', and that this is the only way to encourage a diverse online culture that allows all voices to be heard.
Many people would agree with this (especially the techno-optimists among us), but why should this statement be true now, and of the Web in particular? History tells us that we built our own "open societies" on privately owned presses, not on an open printing platform. What we did have were the appropriate legal and political supplements to establish freedom of speech and freedom of expression. Eventually. Of course, the presses came before the freedom, and in some ways the presses precipitated the freedom, because the technology was sufficiently open not to be entirely controllable by the state.

The Web is better than the printing press, the radio or the television at distributing information; can't we just accept the Web for that improvement in engineering (and goodness knows there is challenge enough in that innovation alone) and legislate the appropriate balances of power and access? Why does it matter if someone owns the technology, or if it is not freely available? Why do we need to insist on the Web/Internet being an open and neutral platform instead of a closed, commercially (or governmentally) controlled environment?

In his book The Future of the Internet and How to Stop It, Jonathan Zittrain argues that the Internet needs to be open as a "generative system" to allow unanticipated change to emerge through unfiltered contribution from broad and varied audiences. Zittrain's argument is focused on technological innovation, but it could also be applied to societal development - we need broad ranging democratic engagement to make the best of our society (to innovate what we might call its "social machinery").

The open technology of the Web doesn't replace legal declarations or political commitments, but it does complement Freedom of Speech agendas. The law might say that you have a right to free speech, but it doesn't give you a platform on which to speak; the Web is a platform for communication that offers very few barriers to free speech.

It is tempting to think of the Web as "a channel for delivering premium content" and to allow it to be dominated by commercial interests. It is tempting to think of the Web as a theatre of cybercrime and cyberwarfare and to allow it to become dominated by policing and security interests. It is vital that we continue to see the Web primarily as a platform for communication, and that we allow basic rights and freedoms to dominate our plans and strategies for its future. Open technologies and open platforms are not necessary for a free society, nor do they guarantee a free society. But they do offer the potential of realising important societal freedoms more effectively than the alternative.