Monday 19 September 2011

Research Ethics and the Web's Private & Public Spaces

In a paper (Six Provocations for Big Data, section 5) related to her forthcoming keynote at the Oxford Internet Institute's "Decade in Internet Time" conference, danah boyd talks about "being in public" on the web, bringing metaphors about one's own public presence in a physical environment to bear on the accessibility of digital writings on a computer server. While we can all intuit what is meant by this (the conscious felt experience of being engaged with the web), are metaphors such as "being in public" helpful when thinking about ethical issues raised by the Web?

"Being in public" means that one's presence and actions can be seen/heard by other people, where we have no choice about who those "other people" are, nor control over what they do. Of course on the Web "we ourselves" are not in public, but the record of our words (or audio, videos, photographs, artwork) is. Or may be; sites may hide their content behind user accounts and secure browsing protocols. We may debate about our social networking activities being public, but we are rarely tempted to debate about the public nature of our bank account transactions.

What is the difference between the following:

Being in a public space vs having one's statements made public
Making a statement in a public space vs making a public statement
Being in a public space vs being on a global stage
Being in a public space vs being in a particular space for a particular purpose that other people could observe now or in the future
Being in a public space vs being made aware of other people's scrutiny
Making a statement in a public space vs having one's statements publicly analysed & criticised by observers

"Being in public" on the Web means that one's activities, memberships, engagements, writings, videos can be seen/heard by other people, where we have no choice about who those "other people" are, nor control over what they do. "Being in public" on the Web is useful on occasions when we want a global audience, and also on occasions when we are pontificating to the aether.

But "being in public on the Web" is also useful when we are expecting to speak to only a few individuals because for practical reasons it would be hugely inconvenient to create a specific channel for those people only. This is how we are "in public" normally: in parks, on the street, in coffee shops. One might refer to this as an expectation of "privacy by obscurity" - people could eavesdrop, but why would they bother? And when we are in those situations we are used to social norms that preclude people gathering around and gawping at our discussions. (As we are taught as children "don't stare", "don't be nosy", "that's none of your business".)

There are two phenomena that intrude on the unconsciously public: Google and the wily researcher. Search engines exist to expose and make things findable (more effectively public). However, those inhabiting the "self-conscious public" will often go to great SEO lengths to make sure that their public utterances are prominently positioned. Although not occupying key marketing positions in the top page of a Google search, the unconsciously public may still find that their words are more accessible than they would have liked.

However, acting in an "unconsciously public" fashion does not necessarily imply being completely oblivious to the lack of privacy. Individuals may adjust to the emerging social norms and in doing so create new norms and establish new boundaries of behaviour. You may consider it acceptable for like-minded individuals (friendly observers, benign lurkers) to search for your online presence on a discussion forum; you may be unhappy about work colleagues, reporters, government agents and university researchers actively examining your opinions.

So perhaps it is small wonder that Google reports that there are almost half a million Web pages using the following boilerplate text threatening sociologists with legal action if they dare make use of their pages:
WARNING Any institutions or individuals using this site or any of its associated sites for studies or projects - You DO NOT have permission to use any of my profile, pictures, or other material posted on this site (including discussion thread posts and blogs) in any form or forum both current and future. If you have or do, it will be considered a violation of my privacy and will be subject to legal ramifications. It is recommended that other members post a notice similar to this or you may copy and paste this one into your profile
From a technical and legal point of view, I'm not convinced that this carries any weight (although I'm looking into it), but it certainly telegraphs a preference and intent. On the one hand we should feel a very strong pull towards respecting and honouring an individual's wishes; on the other hand we have clear social and legal boundaries precisely to curb our individual requests.

Should web mining of personal information stop? Should ethics committees come down hard on this practice? Is it right to broaden the principle of "informed consent" to the Web, and to severely prune the availability of "big data"? I don't know, but I do know that my engineer's default position of "do what you want with public web pages" has been severely challenged.

Thursday 28 July 2011

What Web Science studies is "the InterWebs"

The Internet is made of cats according to the researchers on the YouTube rathergoodstuff channel, although a closer reading of their work reveals a more subtle cat - tube duality reflecting the popular perception of "Teh Internets" as both content and delivery channel.

Engineers make a clear distinction between the Internet (a global network of networks) and the Web (a distributed information space that uses the Internet to provide access to interlinked content through a combination of protocols, data formats and identifiers such as HTTP, HTML and URI).
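As a rough illustration of how those three ingredients fit together, here is a minimal Python sketch using only the standard library. Everything concrete in it is an assumption made for the example: the example.org address, the HTML snippet and the LinkCollector helper are all invented, and nothing is actually sent over a network. It simply shows a URI naming a page, the HTTP request a browser would issue for it, and the HTML links that tie the page to other content.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Identifier: a URI names a resource. This address is a made-up example.
page_uri = "http://example.org/blog/2011/interwebs.html"
parts = urlsplit(page_uri)

# Protocol: the text of the HTTP request a browser would send for that URI.
# (It is only built as a string here; no network connection is made.)
http_request = f"GET {parts.path} HTTP/1.1\r\nHost: {parts.netloc}\r\n\r\n"

# Data format: an invented HTML document of the kind a server might return.
html_body = (
    '<p>See <a href="cats.html">cats</a> and '
    '<a href="http://example.com/tubes">tubes</a>.</p>'
)


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect every <a href>, resolved against a base URI."""

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Relative links are resolved against the page's own URI.
                self.links.append(urljoin(self.base_uri, href))


collector = LinkCollector(page_uri)
collector.feed(html_body)
links = collector.links

print(http_request)
print(links)
```

The collected links are what make the content "interlinked": each one is another URI that could be fetched in the same way, whatever kind of activity the page happens to support.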

"Browsing", "navigating" and "information discovery" are the kinds of generic activities that web developers and information scientists concern themselves with, but the more common labels Social Networking, Internet Video, Blogging, Online Banking, Open Source Development, Internet Porn, E-research and Internet Shopping describe what people are actually achieving with (and within) the (application-neutral) information space of "the Web".

These various categories of practice and activity are distinctive enough to have their own names and their own specialist kinds of interaction (shopping baskets, playlists, blogrolls) even though people may be simply (reductively) "navigating web pages" using the same technology (a Web browser connecting via HTTP to a Web server) on the same devices (a home desktop or laptop) to engage in all these activities.

Those web engineers and content providers building on the Web to provide Internet Shopping (e-commerce, b2b, secure financial transactions, product databases, stock control, warehouses and delivery) have different concerns to those dealing with Internet Video (rights acquisition, media streaming, content licensing, bandwidth negotiation, format transformation). The activity supporting each of these practices can be modelled as a network of stakeholders (providers, consumers, participants, brokers, technologies, marketing channels); looking in detail at any particular activity reveals a web of information and data. The Web as a whole is the conjunction of these individual activities - neither entirely separate, nor completely merged and integrated but overlapping and interacting, all built on the simple foundation of the Web architecture, but realised in different kinds of organisation drawn from different industries, with different expectations and rules, communicating through different kinds of sites, perhaps on different devices.

The bigger picture of the Web then is not a monolithic whole nor a homogeneous distribution of uncoordinated components; it is rather a loose affiliation of semi-independent content networks (webs) with their own practices and technologies and business (sustainability) models, their own ecology of providers and consumers. Held together by W3C-mandated standards, policies and architectural overview, the Web at scale is a network of webs - the InterWebs - mutually reinforced and stabilised by each other's success and contribution to the whole.

Wednesday 27 July 2011

Why the Web Should be an Open Platform

As an open access advocate involved in developing technologies and policies to help the scientific and research communities share knowledge, I often find myself talking about Open Access, Open Educational Resources, Open Government Data, Open Scientific Data and the revolution in intellectual property practice that the Web has precipitated. More recently, as part of the Web Science research activities at the University of Southampton, I have been critically re-examining these notions of Openness and the Web - what do we mean by openness, what do we want from openness, and is openness a good thing? The fundamental questions of why the Web is open and whether it will continue to be open are discussed in the paper Could the Web be a Temporary Glitch? that my colleagues and I presented at the 2010 Web Science conference.

Bill Thompson (BBC journalist) got me thinking more about these issues the other day. In an essay The open internet and its enemies published as part of the BBC World Service debate on openness, he said:
I believe that if we want an open society based around principles of equality of opportunity, social justice and free expression, we need to build it on technologies which are themselves 'open', and that this is the only way to encourage a diverse online culture that allows all voices to be heard.
Many people would agree with this (especially the techno-optimists among us), but why should this statement be true now specifically, and of the Web specifically? History tells us that we built our own "open societies" on privately owned presses, not an open printing platform. What we did have was the appropriate legal and political supplements to establish the notion of freedom of speech and freedom of expression. Eventually. Of course, the presses came before the freedom, and in some ways the presses precipitated the freedom because the technology was sufficiently open to be not entirely controllable by the state.

The Web is better than the printing press or the radio or the television at distributing information; can't we just accept the Web for that improvement in engineering (and goodness knows there is enough challenge in that innovation alone) and legislate the appropriate balances of power and access? Why does it matter if someone owns the technology, or if it is not freely available? Why do we need to insist on the Web/Internet being an open and neutral platform instead of a closed, commercially (or governmentally) controlled environment?

In his book The Future of the Internet and How to Stop It, Jonathan Zittrain argues that the Internet needs to be open as a "generative system" to allow unanticipated change to emerge through unfiltered contribution from broad and varied audiences. Zittrain's argument is focused on technological innovation, but it could also be applied to societal development - we need broad ranging democratic engagement to make the best of our society (to innovate what we might call its "social machinery").

The open technology of the Web doesn't replace legal declarations or political commitments, but it does complement Freedom of Speech agendas. The law might say that you have a right to free speech, but it doesn't give you a platform on which to speak; the Web is a platform for communication that offers very few barriers to free speech.

It is tempting to think of the Web as "a channel for delivering premium content" and to allow it to be dominated by commercial interests. It is tempting to think of the Web as a theatre of cybercrime and cyberwarfare and to allow it to become dominated by policing and security interests. It is vital that we continue to see the Web primarily as a platform for communication, and that we allow basic rights and freedoms to dominate our plans and strategies for its future. Open technologies and open platforms are not necessary for a free society, nor do they guarantee a free society. But they do offer the potential of realising important societal freedoms more effectively than the alternative.