Chapter 3: The value of non-market sharing

Sharing is legitimate

Sharing used to be beyond the copyright arm

We are all accustomed to a dogmatic view of copyright, which is more about forbidding certain things than ensuring certain outcomes. For those who promote this view, the idea of allowing people who are neither the authors nor the copyright holders of a piece of work to share it with other individuals is tantamount to heresy. Article 27.2 of the Universal Declaration of Human Rights (quoted earlier) should serve as a reminder that this has not always been the dominant view. To interpret this article in its fullest sense, we must take into account any means of promoting the material and moral interest of the authors of works, not just the control of copies. With this open approach in mind, is it so obviously wrong to transmit or to make available a cultural product in a non-profit way? Just how could this harm culture itself, or those who contribute to it?

When works could only be distributed on a physical substrate, the first sale doctrine1 (also known as the exhaustion of rights doctrine) acknowledged that, after the sale of a cultural good, the person or organization acquiring it was free to transmit it to another person. This doctrine merely codified a long-standing principle: copyright (or author rights for that matter) was not concerned with what individuals who have entered into possession of a work such as a book do with it. It was easier to adopt this view in past times because the carrier and the information it carried could not be easily separated. Nonetheless, the usage this enabled was far from insignificant: it led to the development of many useful activities – lending or giving books and records to friends, but also creating loan libraries for books and other media, or videocassette and DVD rental centers. Activities such as reproducing extracts in notebooks, or cutting and pasting them in the physical sense, were widely practiced in the Renaissance and classical periods [Blair 2010]. Twenty years ago, when a few lobbies started their great campaign to enforce the scarcity of works in the information domain, they were all too aware of the dangers that these past practices posed to their theories. They thus proceeded to attack them, targeting for example loan libraries in Europe. The economic effects of lending books are limited or nonexistent, but these lobbies wished to erase all precedent of a right to share works without their permission.

Information technology and non-market exchanges

Today, information and communication technology (ICT) can be used for activities which, previously, required the creation, manipulation or transport of a physical object, or that were simply impossible. One can now create a piece of work, share it with others, annotate it, comment on it, by exchanging only information. This change concerns not only the artistic or cultural domain; it also affects scientific and technical practices, management, machine design, inter-personal communication, public expression and the media.

Most disagreements on intellectual rights arguably arise from different takes on the changes introduced by ICT. What ICT allows, first and foremost, is exchange and collaboration on a very large scale, with minimal transaction costs. ‘Transaction costs’ include costs linked to monetary transactions, contracts or any other type of agreement, but also the cost of finding the necessary skills for a project, or of reaching the public interested in a given content. These extraordinary benefits are only realized, however, if information exchanges are ‘free’ (i.e. not subjected to prior agreements, transactions, authorizations or pre-use controls). It is thus in the ‘non-market’ sphere that the advantages of the information revolution are most evident: access to works and knowledge and evaluation of their interest, distributed co-operation towards the production of informational tools such as software, collaborative media, etc.

Nowadays, the true disagreement no longer concerns the radical nature of the changes introduced by ICT, but rather the acceptance of their effects. Some consider it desirable to impose, in the information domain, the scarcity and degree of control which were unavoidable in the sphere of physical carriers. They argue that this is needed in order to preserve certain functions which existed in the latter sphere (investment into the production of certain contents, remuneration of the authors, sign-posting of interesting content). It is our intention to demonstrate that, on the contrary, freeing up non-market exchanges between individuals can have a generally positive impact on culture and the creative economy.

Computers and the Internet have made it possible to exchange works on a much larger scale, without depriving the original owner of access to them. Thus, an activity widely recognized as useful – sharing a work of art or opinion with someone else – becomes possible on a much greater scale. Does that suddenly make it harmful? In terms of providing a channel for access to culture and knowledge, it can only be an improvement. However, some parties adamantly reject sharing, equating it with criminal activities such as stealing and piracy. They must have a reasonable motive for doing this. Just what does sharing, in the modern sense of sharing information or files, harm? What it really threatens is exclusive control of the supply of works. The conflicts surrounding culture and the Internet intensify around one issue: in tomorrow’s world, who will determine which cultural works reach the public and how? In the era of large, centralized cultural industries, it was not authors and other contributors, but large publishers and distributors, who had almost exclusive control over distribution. To understand how this situation is likely to evolve and what challenges this evolution might imply, we must first explore in more depth what sharing is.

The potential afforded by the sharing of information has been largely realized in certain areas, such as the open Web in general, public expression in blogs and collaborative media. We owe to it a major regeneration of democratic processes. In other domains, such as photography, music and video communities, we have live experiments in voluntary sharing of works by their authors, using Creative Commons or similar licenses. Finally, we have a giant file sharing laboratory, using dozens of tools and technologies, for commercially distributed copyrighted works, but also for the on-line archiving of public radio and TV programs, or the provision by members of the public of rare or orphan works.2 So large a flow of cultural exchanges between individuals is unprecedented, and hence it is worth a closer look.

File sharing

Accessing and sharing contents

The publishing industry thinks in terms of access to contents: will people download digital works? Will they access them through streaming (a technology that enables users to listen to or view contents stored in a central server without downloading a copy)? For the industry, sharing is just another way of accessing contents without their permission. However, sharing between individuals leads to very different practices in comparison to downloads or streaming from centralized sites: when individuals decide what to make available to others (and this could be all the documents they have in digital form), what they share directly reflects their preferences. By contrast, on centralized sites, there is a bias towards specific contents which are made more visible than others, either through advertising or because many other people are accessing them.

There are many ways to share contents beyond peer-to-peer file sharing. Swapping USB keys, for instance, is a popular way of sharing digital works today. At first sight, it suffers from the same limitations as sharing physical books did in the past: the USB key has to travel physically from one individual’s computer to another. However, this exchange is quite efficient in practice, because USB keys now have large capacities, enabling them to hold large sets of works, and because just about everyone in developed countries (and soon elsewhere) is equipped with them. The old practice of making contents one likes available on a personal website has become less frequent, because of the risk of being charged with copyright infringement. Newsgroups, which are a form of email list servers whose messages are sent to subscribers, existed well before the Web, but remain a very efficient way to obtain some contents “on request” among communities interested in specific contents, which are often no longer easily accessible on the commercial market.

At the end of 1998, a young student called Shawn Fanning started developing Napster, a system to share MP3 music files among individuals. Napster started operating in June 1999. At its peak, the system had more than 25 million users and 80 million files.3 Works in various media were shared using the Internet long before Napster,4 but Napster was responsible for making file sharing – as an expression and as a practice – popular amongst the general public. The importance of Napster lies both in its architecture and its philosophy. Napster had some flaws, which made it an easier target for lawsuits. For instance, it was based on a single central register of which users hosted which file. However, it was a true sharing tool, where access to a file was obtained by an individual from other individuals, a principle that came to be known as peer-to-peer file sharing, or P2P for short.5 Napster launched the idea of personal music library pooling systems, where users have access to the music libraries of all other users. Pooling libraries is an old dream, already present in antiquity. The Ptolemies implemented it in a somewhat centralized and confiscatory manner: every ship landing in Alexandria was required to hand over any papyrus scrolls on board to the Library of Alexandria, where they were kept, the original owners receiving only a copy [Philips 2010]. The Renaissance humanists practiced it in a more civilized manner by exchanging copies between themselves. As for modern libraries, some countries have revived a softer version of the ancient rule, by requiring a legal deposit of a copy of all published works in one or more libraries. However, the need to move or store objects physically is a significant hindrance for centralized or pooled libraries. Digital technology and universal information networks have now removed this limitation.

At the time of writing, the Comparison of File Sharing Applications page6 on Wikipedia lists some 60 applications (not all active at present), but these only represent some of the many ways to share digital works. If the non-market sharing of digitally published works is recognized as legitimate, other ways of sharing files between individuals that are presently too risky in terms of prosecution will become possible again, such as simply putting digital works on-line on a personal website.7 More generally, file sharing practices will no longer occur in a semi-clandestine fashion, and their practitioners will no longer be called bad names or be subject to surveillance and pursuit by private police organizations. The information sphere will no longer be polluted by fakes in the name of the war on peer-to-peer networks.

We must thus analyze not just what sharing is today, but what it could be in a different situation, where more people are able to work towards quality in sharing: quality of the digital representation of works, of their attribution to authors and contributors, of the tools available to search for them, to identify those with interesting content and flag them. Of course, not everyone is interested in contributing to quality improvement, but it is enough that a few are, and are able to work openly: new services and intermediaries emerge, and reputations are made.

Fakes

Fakes are files purporting to contain a given work, whereas their content is in fact different (for example a short excerpt that is looped over and over). Most fakes are deliberately injected on behalf of publishers, who call upon the services of specialized companies to wage this war on peer-to-peer networks (this practice is called P2P warfare). If non-market exchanges were recognized, files which mask one content with another would probably continue to exist (for example to disseminate pornographic content), but it would be much easier to detect and avoid them. Far from fighting this latter type of abuse, those who oppose file sharing currently exploit it in order to discredit what they object to. Unfortunately for them, MediaDefender, a market leader for the injection of fakes on behalf of the major companies, was caught red-handed running a parallel business in fakes which redirected users to its own paying pornographic sites [Salliou 2008].

Sharing is useful

The information age is one where many more people than ever before engage in producing contents in various media and expressing themselves towards an open public. As we will see in section Rewards, 11 to 20% of the population older than 15 in developed countries engage in producing contents for sharing on the Internet, and this proportion is constantly on the increase. A reasonable cultural policy must endeavor to make a many-to-all cultural society sustainable, and to ensure that each human being can contribute to and participate in such a society, according to their wishes and abilities.8 Sharing is useful because it contributes to this perspective in many ways.

Sharing as cultural empowerment

The first useful quality of sharing is obvious, though it is often forgotten even by its advocates: sharing is not the same thing as access. If its adversaries do not speak of sharing, but rather of piracy or illegal/unauthorized downloading, streaming and access, it is because they are well aware that a direct attack on sharing would be easy to criticize. Sharing is an act of making something available to others, just like – in a more minor way – recommending a work to someone or – in a more involved way – re-using one in a creative process. This is why, even when one is not the author of a digital work, sharing it with others is a step towards cultural empowerment. This step is particularly important, because it can be practiced by all, at a very limited entry cost.

How much sharing with how many people?

Curiously, little attention has been paid to the fact that uploading or making works available to others through a P2P network requires resources. A full music album, compressed using the FLAC lossless compression codec popular among demanding file sharers, represents 340 megabytes. A decent quality MP3 version represents 140 megabytes. Assuming a bandwidth of 1 megabit per second (a realistic estimate of the true upload bandwidth available on average to broadband Internet subscribers in developed countries), it takes 19 to 45 minutes to upload such an album once. Even if a broadband connection is used only for this, only 1000 to 2250 albums can be uploaded per month. In practice, P2P networks allow the user to limit the upload bandwidth they consume. A typical choice for a “good sharing citizen” is about 80 kilobits per second (10 kilobytes per second), corresponding to a maximum of 80 to 200 music albums uploaded per month (and far fewer movies). Of course, future technological advances will raise these limits, but the idea of one person directly making millions of files available remains a complete fantasy. Note that this reasoning also applies to protocols such as BitTorrent, which allow users to obtain different parts of a file from different sources: the compound upload time remains the same. We will see that some protocols are favorable to cultural diversity, while others are less so. Using the diversity-prone protocols, many people can – together – share a very large common library, but each shares only some works with some people. This also explains why USB key swapping, despite its physical location limits, remains an attractive way of sharing files.
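The orders of magnitude in this paragraph can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes decimal megabytes (one million bytes), a 30-day month and a connection used for nothing else, and the function names are ours. Under these assumptions, an upload rate of about one megabit per second reproduces the 19-to-45-minute range for the two album sizes quoted above.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: a 30-day month


def upload_minutes(size_megabytes, rate_kilobits_per_s):
    """Minutes needed to upload one album once at a constant rate."""
    bits = size_megabytes * 8_000_000  # assumption: 1 megabyte = 10**6 bytes
    seconds = bits / (rate_kilobits_per_s * 1000)
    return seconds / 60


def albums_per_month(size_megabytes, rate_kilobits_per_s):
    """Upper bound on albums uploaded in a month of continuous uploading."""
    seconds_per_album = upload_minutes(size_megabytes, rate_kilobits_per_s) * 60
    return SECONDS_PER_MONTH / seconds_per_album


# Album sizes taken from the text: MP3 (140 MB) and FLAC (340 MB).
for rate in (1000, 80):  # illustrative rates, in kilobits per second
    for size in (140, 340):
        print(
            f"{size} MB at {rate} kbit/s: "
            f"{upload_minutes(size, rate):.0f} min per album, "
            f"{albums_per_month(size, rate):.0f} albums per month at most"
        )
```

Changing the two rates in the loop shows how quickly the monthly ceiling falls once the uploader throttles the connection, which is the point the paragraph makes about the limits of individual sharing.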

File sharing is fundamentally different from streaming, because through the former, one comes into possession of a copy of the works. This copy can be searched, read, listened to, or viewed ad lib, with whatever tools one chooses. This is not just a matter of convenience; it also enables specific activities: comparison, analysis, criticism, or re-use. Most significantly perhaps, sharing empowers Internet users by enabling them to act as a distributor, as a relay for the dissemination of a work. This is so important to them that many are prepared to devote significant time and money to sharing.

Cultural diversity

Cultural diversity has many dimensions: how diverse are the works that are produced? How many creators contribute to them? By how many channels are they distributed? How many languages are represented and to what extent? Some of these dimensions are difficult to assess, not least because the diversity of works cannot be reduced to an objective measure. Others can be misleading: the increase in the number of television channels has not necessarily increased the diversity of sources of contents, because the contents that receive the most attention actually come from a limited number of sources, for instance companies such as Endemol that design “formats” of shows that are then “customized” for given countries. In this book, we focus only on two dimensions of cultural diversity: the range of works that are accessible to users in practice, and the diversity of attention given by users to works in various media.

Sharing contributes to cultural diversity first by enlarging the set of works that are made accessible to the public at a given time in a given geographic area. To take stock of the immense changes that have already happened in this arena, it is useful to distinguish between four types of on-line cultural, informational or expressive contents:

  • material that is de facto shared voluntarily by authors without explicit licenses, where non-commercial sharing by individuals carries no practical risk of copyright litigation;
  • material that is explicitly submitted to licenses that authorize at least non-commercial sharing;
  • material that is orphan, is no longer or never was distributed commercially, or was produced by public organizations, and which is shared by individuals without authorization;
  • commercially distributed material shared by individuals without authorization.

Legally inclined readers might find the distinction between the last two categories surprising: both cover copyrighted material, and sharing it without authorization constitutes a copyright infringement, unless some fair use, fair dealing, exception or limitation applies.9 However, our purpose here is to chart different forms of sharing of digital works as they developed on the Internet. If certain practices have been accepted or tolerated by authors, or treated leniently by judges when possible, this might indicate that they are perceived as useful.10

De facto sharing without explicit licenses was the first large-scale form of sharing on the Internet. The success of the Web as an information and knowledge sharing platform was based on the fact that people put on-line huge amounts of valuable material, in forms that allowed for it to be easily linked to, copied, pasted, sent to others by email, and often reproduced on the Web itself. As described in [Benkler 2006], this gave birth to a giant non-market sphere of information and knowledge activities. The WorldWideWebSize site11 computes on a daily basis the number of Web pages indexed by search engines, which generally means that their contents can be easily copied. At the time of writing, the figure for Google is of the order of 30,000 million. Of course, not all of these web pages can be considered to be shared de facto, but a significant proportion certainly are. It is interesting to note that their number is probably of the same order of magnitude as the number of Internet users… or the number of human beings.

During the first years of development of the Web, the media industry largely ignored it. Retrospectively, it seems that it simply did not fit their world view, precisely because of its non-market character. Hollywood, for instance, was obsessed with digital technology at the time, but in the form of DVDs and their copy protection systems.12 The industry lobbied to obtain a legal protection against circumvention of anti-copying technology – making it illegal to work around copy-prevention technology in order to do the copying. This was first met by a rebuttal from the FCC in 1994, but the Clinton administration then pushed it through the World Intellectual Property Organization (WIPO), where it was included in the 1996 WIPO copyright treaties.13 Meanwhile, the first license explicitly authorizing sharing at least for non-commercial use, the Open Content License, was released in July 1998,14 soon followed by the GNU Free Documentation License15 in March 2000, the Licence Art Libre (Free Art License) in July 2000, and the Creative Commons Licenses16 in December 2002. As we have already mentioned, the unauthorized sharing of copyrighted material underwent something of an explosion in the same period with the birth of Napster.

For media where it was widely adopted, voluntary sharing greatly increased the number of works made accessible to the public, under terms that authorize copy and redistribution, and often free re-use with properly signaled modifications, for non-commercial or even commercial purposes. On the Flickr site17 alone, more than 175 million photographs are shared under Creative Commons licenses, 115 million of which can be reused with modifications. Even if a judgment on quality is always difficult, particularly on such a large scale, a significant fraction of these photographs seem to be of real quality and interest, even though the site’s policy leads to many images being available only at resolutions up to 1024 by 768 pixels. The interested reader might experiment by searching for photographs on any given subject on the Creative Commons part of the site.18 Blog posts, scientific publications, and on-line encyclopedias such as Wikipedia are other examples of domains in which voluntary sharing has considerably extended the range of accessible contents. Overall, there are several hundred million on-line documents under free sharing licenses: a 2010 estimate reported 350 million works for Creative Commons licenses alone [Cronin 2010].

The situation is different for some media that preexisted the Web, in particular recorded music and moving image, and more recently books. There, certain players have such a degree of control over the commercial distribution, promotion and revenue sources that they can dissuade many artists or producers from practicing voluntary sharing. In some cases, dissuasion is replaced by prohibition: collecting societies for music in Europe almost always require their members to give an exclusive management mandate for all rights on all works. This effectively forbids authors from explicitly authorizing non-commercial sharing of their works between individuals. In these domains, unauthorized sharing, in particular P2P, plays a key role in extending the range of works that are accessible to the public. In the data collected by [Aidouni et al. 2009] regarding sharing traffic on an eDonkey server over 10 weeks in 2008, no fewer than 275 million files were made available by users. Most of them are likely to be music, as moving image sharing had already moved largely to BitTorrent sharing at that time. Not all of them are shared without authorization: P2P networks are used to share government data, free software… or self-published books. It is not easy to know how many different works these files represented. During the 10 weeks of the study, users obtained 40 million different file identifiers in answer to their queries, of which 12 million were actually downloaded more than once. A safe estimate is that no fewer than 10 million different music tracks (songs) were made available for sharing.19 Thus, this form of sharing alone made more tracks available than the combined commercial offerings at the time.20

Unauthorized or tolerated sharing is of particular importance for orphan and out-of-publication works, categories that cover a very significant share of our culture. A great proportion of copyrighted works are orphan or out-of-publication works: these represent an estimated two-thirds of books, for instance [Brantley 2009]. The proportion is lower for recorded media such as music, moving image documents being in an intermediate situation.21 Though this may seem strange, some public organizations have also turned a large part of our public domain cultural heritage into a new form of property: heritage organizations such as libraries, museums and archives, often totally or predominantly funded by the public, claim exclusive rights on the digitized versions of these works, and fail to give access to them under conditions that respect the rights of everyone towards the public domain. Though there are recent public policy efforts to ensure better accessibility of orphan, out-of-publication and public domain works, the various forms of sharing can be credited with important successes in these matters. Volunteers and not-for-profit projects have scanned, OCR-ed, or re-typed and formatted significant collections of public domain works in the Internet Archive Open Library project,22 Wikisource23 and Project Gutenberg.24 Many more orphan or out-of-publication works are accessible on file sharing networks, in proportions that vary significantly depending on the sharing protocol (see below, Attention diversity varies with forms of sharing). The wide diversity of contents available in file sharing has led to practices that are not possible in commercial contexts: comparison between numerous performances of songs or classical music, constitution of specialized personal music collections.

We now need to investigate another facet of sharing: if many works are available, is the actual access to these works truly diverse?

Attention diversity

The popularity of works has been studied for various media for as long as a century. To do so, researchers measured the popularity of each work, for example, the number of times a given book was requested in a library. They then plotted this number, ranked by decreasing popularity, showing the most popular on the left, and the least popular on the right. In large real-world situations, popularity diagrams of this type are not readable, as the curve becomes very close to the axis. To see what is going on, one has to use logarithmic scales, or even better, to plot the cumulative popularity. The cumulative popularity expresses the number of requests for access for all works down to a specific popularity rank. We coined the expression “diversity of attention” [Aigrain 2006] to designate a property of access to or usage of works that had long been recognized as important: to what extent is the attention that people give to works spread over many works rather than concentrated on a few? If the diversity of attention is large, the cumulative popularity curve increases gradually throughout the range of popularity ranks. On the other hand, if attention is concentrated almost exclusively on the most popular works, the cumulative popularity curve rises sharply at first, then flattens off. In recent years, the focus has shifted towards studying the diversity of attention to digital and on-line works, though the methodology remains similar.
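As an illustration of the cumulative popularity curve described here, the following minimal sketch ranks works by decreasing access count and accumulates their share of total access. The access counts are invented for the example, and the function names are ours; the sketch only restates the definition, it is not the tooling used for the studies reported in this chapter.

```python
from itertools import accumulate


def cumulative_popularity(access_counts):
    """Share of total access captured by works down to each popularity rank."""
    ranked = sorted(access_counts, reverse=True)  # most popular first
    total = sum(ranked)
    return [running / total for running in accumulate(ranked)]


def share_between(access_counts, start_fraction, end_fraction):
    """Share of access received by works ranked between two fractions of the catalogue."""
    curve = cumulative_popularity(access_counts)
    start_rank = round(start_fraction * len(curve))
    end_rank = round(end_fraction * len(curve))

    def share_up_to(rank):
        return curve[rank - 1] if rank > 0 else 0.0

    return share_up_to(end_rank) - share_up_to(start_rank)


# Hypothetical catalogues of ten works each (invented numbers).
concentrated = [1000, 200, 50, 20, 10, 5, 5, 4, 3, 3]
spread_out = [130] * 10

for name, counts in (("concentrated", concentrated), ("spread out", spread_out)):
    curve = cumulative_popularity(counts)
    print(name, [round(value, 2) for value in curve])
```

In the concentrated catalogue the curve jumps close to its maximum within the first two ranks and then flattens; in the evenly spread catalogue it rises by the same step at every rank.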

Traditional publishing of books, records, video tapes or disks selects a limited number of works and tries to maximize their commercial distribution. However, the digitization of all media and the spread of the Internet have strongly decreased the effort and cost of producing and distributing copies of works. This has given birth to the decentralized sharing of works between individuals, and has also led to new commercial publishing models (on-demand publishing, commercial download sites, streaming) both legal and illegal. How do these various models affect the diversity of attention? We started studying this issue in 2005, focusing first on the comparison between voluntary sharing communities on the Internet and the commercial distribution of books, records and DVDs that was predominant at the time. Figure 3.1 illustrates the key aspect of these studies: they are concerned with how much attention is received by the works of intermediate popularity, those that do not belong to the 1 to 5% most popular titles, but rather to the following 30%. In the example of figure 3.1, the music tracks in the [4%-34%] most popular titles range receive close to 43% of the total access, a figure that is, as we will see, very high, indicating a very diverse attention. The idea here is that the titles that are not the most popular but did receive some attention (and thus are likely to be of some interest at least for some users) are the reservoir of cultural diversity.25

Of course, our choice of 4% and 34% is somewhat arbitrary. To avoid such arbitrary choices, researchers have long tried to characterize the shape of popularity distributions by a single parameter that would provide an objective estimate of the diversity of attention. A historic advance was made in the 1930s when George Zipf [Zipf 1935], a Harvard linguist interested in the study of the frequency of words in languages, formulated a law that was found to apply to many real-world popularity distributions. Zipf’s law can be formulated as the level of access of the $n^{\text{th}}$ most popular work being proportional to $1/n^{\alpha}$, where α is a parameter that can vary. Similar laws have been found to apply in many domains, such as the wealth held by individuals or the size of cities. In appendix A, we provide a comprehensive historical and technical background on these models that are at the heart of debates on the so-called Long Tail theory proposed by Chris Anderson [Anderson 2004, 2006, 2009], for instance. Fascinatingly, in many real-world situations the parameter α was found to be close to 1, which led to a – false – popular belief that it is always the case, and that in any form of cultural access, the 20% most popular works always receive more or less 80% of the attention (which became known as “the 20/80 rule”). In our example of figure 3.1, the 20% most popular musical tracks receive only 46% of access.
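To make the link between the parameter α and the concentration of attention concrete, the following sketch computes the share of access that the most popular titles would receive if popularity followed Zipf’s law exactly. It is an illustration under stated assumptions, not the fitting procedure of appendix A: the catalogue size of 5565 titles is borrowed from figure 3.1, the two values of α are chosen for illustration, and the function name is ours.

```python
def zipf_top_share(n_works, alpha, top_fraction):
    """Share of total access going to the top fraction of works under a pure Zipf law."""
    # Access to the work of rank n is taken as proportional to 1 / n**alpha.
    weights = [rank ** -alpha for rank in range(1, n_works + 1)]
    top_count = round(top_fraction * n_works)
    return sum(weights[:top_count]) / sum(weights)


for alpha in (0.5, 1.0):  # illustrative parameter values
    share = zipf_top_share(5565, alpha, 0.20)
    print(f"alpha = {alpha}: top 20% of titles receive {share:.0%} of access")
```

With α equal to 1, the top fifth of titles captures roughly four fifths of access, in line with the 20/80 rule; with α equal to 0.5, it captures less than half, which is why a single value of α cannot be assumed for all forms of cultural access.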

As one can see in figure 3.2, the best-fitting Zipf law is not always a perfect approximation of a popularity distribution. The exact reasons why popularity distributions and other ranked distributions follow Zipf’s law, or deviate from it slightly, are still an object of speculation: see appendix A for our own tentative explanations. Many factors combine to generate the observed spectrum of access to works.

In the many types of commercial distribution or non-market sharing schemes we have studied, we have found that the best-fitting Zipf laws had parameters ranging from 0.5, for voluntary information sharing communities such as the Musique Libre site mentioned above, to 1.41 for the sales of albums published by the music majors in France in 2004 or 2005 [Moreau et al. 2006]. These figures correspond to extreme differences in diversity of attention, as illustrated in figure 3.3.

Figure 3.1: Cumulative access (listening or downloads) to the 5565 music tracks available on the Musique Libre site in 2006, normalized for how long they had been on-line [Aigrain 2006].

Figure 3.2: Comparison of the cumulated access on the Musique Libre site in 2006, with the cumulated access that would result from the best-fitting Zipf law (see appendix A for technical details).

For advertising-funded commercial sites or commercial sales, the full distribution of access to works is not made public, as it is considered to be sensitive commercial information. As a result, only very partial information is available to researchers. One of our key policy recommendations is that the publication of the distribution data for rights collected by collecting societies from each type of source should be required by law. Until such a policy is in place, researchers have to work with partial information, occasionally made available in the form “x% of works represent y% of sales” or “the N most popular works received y% of access”. From this data and information about the size of the universe of works, one can estimate the corresponding Zipf’s law parameter and derive the full distribution of attention curve. This is of course an approximation, but in our opinion it is a decent one, and one that will hopefully be validated further in the future.
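The estimation step described above can be sketched as follows. This is an illustrative reconstruction under the assumption of an exact Zipf law, not the fitting procedure of appendix A. The function names `top_share` and `estimate_alpha`, the search bounds and the tolerance are all hypothetical choices. The sketch relies on the fact that, for a fixed universe size, the share captured by the top works grows with α, so a simple bisection can recover α from one "x% of works, y% of access" data point.

```python
def top_share(alpha, n_works, k):
    """Share of access going to the k most popular of n_works works
    under a Zipf law with parameter alpha."""
    head = sum(rank ** -alpha for rank in range(1, k + 1))
    tail = sum(rank ** -alpha for rank in range(k + 1, n_works + 1))
    return head / (head + tail)


def estimate_alpha(top_fraction, share, n_works, lo=0.0, hi=3.0, tol=1e-4):
    """Find the alpha for which the top `top_fraction` of works receives `share`
    of all access. The bounds lo/hi and the tolerance are assumptions."""
    k = max(1, round(top_fraction * n_works))
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if top_share(mid, n_works, k) < share:
            lo = mid  # not concentrated enough: alpha must be larger
        else:
            hi = mid
    return (lo + hi) / 2


# Round trip on synthetic data: recover a known parameter from a single data point.
observed = top_share(1.2, 2000, 100)  # top 5% of 2000 works
print(round(estimate_alpha(0.05, observed, 2000), 3))
```

In pure Python, each evaluation loops over the whole universe, so a universe of more than a million works takes several seconds to process. A single published data point determines α only under the assumption that the whole curve is Zipf-shaped.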

Figure 3.3: Cumulated attention for extreme observed cases.

For the on-line commercial distribution of works, where the number of titles is much larger than for published CDs, the available information on the diversity of attention is even scarcer. Will Page, the Chief Economist of PRS for Music, the music collecting society in the UK, and Eric Garland, CEO of the music industry journal BigChampagne, published a curve for an unstated “legal single downloading site” in which 5% of works generate 90% of revenues among a set of 1.5 million titles [Page-Garland 2009]. We derive from it a cautious estimate that the distribution of sales would correspond to a Zipf law of parameter 1.1 (see note 26): more diverse than sales of CDs by the majors, but still very concentrated.

What about full-scale file sharing? Thanks to a remarkable data collection and publication effort conducted by Mathieu Latapy and his colleagues at Université Paris 6 [Aidouni et al. 2009], we were able to study the diversity of access to files in a large segment of P2P sharing. They recorded the exchanges passing through one of the eDonkey servers during a 10-week period in 2008. No fewer than 90 million users were involved in sharing using this server in this period, and 12 million files were downloaded at least once. Analyzing this data from a diversity of attention point of view faces three challenges:

  • Several files, each characterized by a file identifier, can correspond to the same work. In a later study on the BitTorrent P2P file sharing of movies in Hungary, Bodó Balázs [Balazs-Lakatos 2010] “crowd-sourced” the huge task of mapping files to works: he called for volunteers to share the workload and obtained the complete results in only a week. However, this was possible only because his study was focused on film, and considered only a few tens of thousands of files. Two useful lessons can nonetheless be drawn from Bodó Balázs’ study: the average number of files per work was around 5, and the observed distribution of attention for files and for works was relatively similar.
  • The eDonkey data have been rendered anonymous, in a way that makes it impossible to select only works in a given medium, such as music.
  • eDonkey file sharing is heavily polluted by fake files [Aidouni et al. 2009], [Lee et al. 2006], which results in Zipf’s law being quite a poor fit to the distribution of access, except when considering only the few hundred thousand most popular files.

More details of our analysis are given in appendix A. We have chosen to focus on the 2 million most popular files in the eDonkey sharing. These files received 94% of all access, and they are likely to contain a high proportion of music (as film, video and TV sharing had already switched to BitTorrent by that time, see [Oberholzer-Strumpf 2010, p. 12]). The average number of files per work is probably lower than for film, as there are no duplicates due to language versions, and users quickly select the best quality file for a given work. In figure 3.4, we compare the observed distribution of attention in P2P file sharing with the distribution mentioned above for commercial single downloads. The distribution of attention presented here for P2P is only an approximation of the distribution for individual music works alone.27 However, the huge difference in attention for intermediate popularity works (60% of all access versus 10%) leaves no room for doubt: even if our estimates are revised in later studies, the strongly increased diversity of attention in this type of P2P file sharing compared to commercial downloads will still hold. These findings are consistent with those reported in a study using interviews of sharers [TNO 2009]. However, we will see below that not all forms of sharing have so positive an effect on the diversity of attention to works: BitTorrent sharing appears to lead to a more concentrated access to works.

Once sharing is recognized legally, the diversity of attention in sharing will be subjected to contradictory trends. On one hand, it will become higher, because sharing will no longer be clandestine: when one can expect the sharing commons to remain accessible without risk, it makes sense to make rare works available, and to expect others to do the same, whereas at the moment the stigmatization and repression of sharing focus it on high-demand works. On the other hand, commercial players will come to realize the importance of being visible in file sharing, will aim their promotion at it, and this could lead to a greater concentration of attention in some of the related channels.

Figure 3.4: Comparison between the observed cumulated attention for the 2 million most popular files in eDonkey sharing and the cumulated attention for a distribution similar to the one studied by [Page-Garland 2009] for single commercial downloads, adjusted for universe size.

Attention diversity varies with forms of sharing

There are important differences between the different ways of sharing digital works in terms of their impact on cultural diversity: studies show that P2P file sharing using the eDonkey/eMule protocol leads to a greater diversity of attention than sharing using BitTorrent tracker sites. In their study of music, [Page-Garland 2009] also studied sharing through a peer-to-peer protocol which they didn’t specify, but that appears to be BitTorrent. From the curve they presented, the distribution of access has a level of diversity similar to a Zipf law with parameter around 1 for a universe of 1.5 million works: the 95% least popular works get only 24% of access. [Envisional 2011] recently studied sharing for all media on the PublicBT BitTorrent tracker. This study raises some methodological questions: for instance, it covers only a single day of sharing. It leads to similar concentration estimates: among the 1,481,479 torrents that were downloaded at least once, those which were downloaded more than 100 times, that is 0.439%, account for 30.4% of all access. This corresponds to a Zipf law parameter of approximately 1.04. [Balazs-Lakatos 2011], in their aforementioned study of BitTorrent film sharing in Hungary during 3 months in 2008, collected high quality anonymized data to which they gave us access (see note 28). Figure 3.5 plots the cumulated distribution of access for this data. The 95% least popular films obtain 41.2% of access. At first sight, this might seem much more diverse than the results from [Page-Garland 2009], but we are here in a much smaller universe of 1542 films, where this corresponds to a Zipf law with parameter 0.965 (see note 29). It seems indeed that BitTorrent sharing leads to a significantly stronger concentration of attention than other forms of P2P sharing.
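The effect of universe size on such comparisons can be checked with a short Python sketch. It assumes an exact Zipf law. The function name `bottom_share` is hypothetical, and the parameter 0.965 and the two universe sizes (1542 films, 1.5 million works) are taken from the paragraph above purely for illustration.

```python
def bottom_share(alpha, n_works, bottom_fraction=0.95):
    """Share of access going to the least popular `bottom_fraction` of works,
    assuming access to the work of rank n is proportional to 1 / n**alpha."""
    k_top = n_works - round(bottom_fraction * n_works)  # size of the top group
    head = sum(rank ** -alpha for rank in range(1, k_top + 1))
    tail = sum(rank ** -alpha for rank in range(k_top + 1, n_works + 1))
    return tail / (head + tail)


# Same Zipf parameter, two very different universe sizes.
for n_works in (1542, 1_500_000):
    print(n_works, round(bottom_share(0.965, n_works), 3))
```

With the parameter held fixed, the 95% least popular works receive a smaller share of access in the larger universe. This is why percentages quoted for universes of different sizes cannot be compared directly, and why the comparison is made on the Zipf parameter instead.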

Figure 3.5: Cumulated observed access for film sharing in Hungary (1542 films shared over 3 months in 2008), data from [Balazs-Lakatos 2011].

Researchers have proposed explanations of why BitTorrent is less favorable to diversity of attention. Bodó Balázs and Zoltán Lakatos noted that: “Unlike DC++ [see note 30] file sharing hubs that usually prescribe a minimum amount of data to be offered in a shared library, BitTorrent trackers require that a user balances his/her upload/download ratio around 1.0. This technical setup has serious implications on how content is distributed and consumed on each network. Users around DC++ form large, searchable archives, where the amount of data shared is a source of pride and recognition. BitTorrent, on the other hand discourages the emergence of large individual shared libraries as such large libraries offer little reward in terms of the valuable upload ratio.” [Oberholzer-Gee-Strumpf 2010, note 16] commented in another study: “The concentration of movie downloads in part reflects the current BitTorrent technology. Index sites, which list the files available for download, typically de-list a title when no one is sharing [see note 31] a complete copy for some length of time. As a result, less popular movies become often unavailable, as are older movies since the number of shared copies tends to decline over time.”

Though it is hard to obtain reliable information about schemes such as USB keys, downloading from personal sites, or access through newsgroups, we expect them to be strongly favorable to diversity of attention.

The media industry opposition to file sharing

The overall impact of sharing, and more generally of information and communication technology, on the cultural economy will be discussed in chapter 4, and specifically in sections Compensation schemes and Passing copyright-law tests. Sharing is important for all facets of the cultural and media scene, including books, photographs and new, Internet-native media. But it is the music and motion picture majors that have started a real war on sharing.32 Our purpose here is to list the possible reasons for their fierce opposition to sharing. The impact of sharing on sales of works cannot be the sole motivation, as numerous studies show that it is limited or nonexistent. It is thus reasonable to consider a range of other reasons:

  • Cultural or ideological factors should not be underestimated. The industry has assumed for decades that its business rests on a degree of exclusive control over the production and dissemination of copies of works, so its reaction to the loss of this control is unsurprising, even though its profits are still healthy. In particular, large media firms find it hard to relinquish the very attractive prospect of producing and distributing copies of works for next to nothing, whilst retaining exclusive control over the process and charging monopoly prices. They have seen the shimmering mirage of Eldorado and are not ready to let go of it.
  • Just when more titles than ever were published on records and DVDs, the majors have chosen to restrict their offer: the number of music titles distributed by major companies has shrunk by a factor of 4 or 5 at least. This approach has arguably been successful, in that they have maintained their profit per title, but it clearly fails for direct digital distribution (see below). As a result, large media firms are now trying to install new forms of control on digital distribution channels, and this is easier to implement in centralized distribution channels than in the context of decentralized sharing.
  • Similarly, a key foundation of their present business models is their ability to concentrate the public’s attention on a limited set of works. Major companies have been investing more and more in heavy promotion of a limited number of titles, with shorter and shorter individual lifetimes. This is clearly at loggerheads with the trends favored by sharing: increased diversity of attention and enlarged range of accessible works.
  • Quite simply, they are afraid of the unknown. In truth, sharing has had only a limited effect on them so far. But they fear that if it were recognized legally, it would turn into a black hole that would swallow the creative economy whole.

After close to 15 years of this “war on sharing”, the large media companies now know that it is here to stay. But they still hope that they can keep it clandestine, polluted, and stigmatized. They may be playing for time, trying to install some control over new channels before they have to live with sharing. In particular, it makes sense for them to try to push users back into a passive consumption mode. But although this passivity might be desirable for the cultural industries that flourished in the pre-digital era, it is not in the public’s best interest, and policy-makers should not necessarily embrace it.

When digital works are shared clandestinely, the resulting diversity of attention is lower than when sharing occurs in the open.33 This is because in a legally recognized context, one can rely on a degree of longevity and accumulation. Sharing rare works and, in return, obtaining others which one didn’t have access to become credible propositions. When unauthorized sharing faces repression, sharers are led to prefer schemes providing a fast access to recent works, such as BitTorrent. However, even in such situations, attention is still less concentrated on a few works than in a central publishing model.

Despite many years of “war on piracy”, unauthorized file sharing has already started to have positive effects on cultural diversity. At the 2010 MIDEM (a yearly music publishing business fair held in Cannes), SACEM, the French collecting society for authors and composers of music, made an apparently mundane, but in fact very noteworthy statement [Lefeuvre 2010]. The spokesperson for SACEM explained that the collected rights from digital sales remained very low, adding up to only €6.5 million for the year 2009, but went on to mention a “long tail nightmare, with the four previous years resulting in 409 million sales spread over 2.6 million titles”.34 Let us start by addressing the first part of the statement. The weak development of commercial downloads, slower in Europe than in the US, may be attributed to many factors. The media publishing industry attributes it to the “unfair competition” of “piracy”, despite evidence that file sharers buy at least as much digital music or video as people who abstain from sharing.35 The industry’s critics see it as a sign of the rejection of outdated commercial models that fail to recognize user rights or to provide more than just access to a digital file.

Delimiting the non-market sphere

‘Non-market’ doesn’t just mean not having to pay to access a piece of work. Access to a catalog following a subscription is not ‘non-market’, even if one does not have to carry out a monetary transaction to access each work. On the other hand, one might charge for the means to carry out certain activities without the latter losing their non-market nature. The case of content-hosting sites which are financed by advertising deserves a separate analysis: they are nominally used in a non-market way, but since they trade the attention time of their users with advertisers, for that part of their activity these sites should be considered as commercial distributors like any others.

‘Non-market’ doesn’t mean administered. On the contrary, the development of non-market information-based activities represents a new step towards the realization of the efficient allocation of resources long promised by market economics. Markets, despite their value, are struggling to deliver this, due to their practical organization: unequal access to information and power, control over distribution channels, interdependence between products and technologies. Similarly, ‘non-market’ activities are not outside the economy. The supply of means to exchange information represents twice as large a fraction of gross domestic product (GDP) as the sale of information: see [UNU-MERIT 2006, pp. 123-126]. For more explanations of the value of non-market exchanges and other indirect means to fuel culture and other information-based activities, see The Wealth of Networks by Yochai Benkler [Benkler 2006].

Moving to the second part of the statement, the “long tail nightmare”, which plagues SACEM and the majors, is good news for cultural diversity. It is testimony to the fact that it is more difficult to concentrate attention on a limited number of titles in the digital sphere than in physical distribution, at least when complementary channels such as file sharing exist. The 2009 annual report of SACEM claims that “one can only note the extreme concentration of sales on a few titles” mentioning that “on iTunes, only 10 titles were sold more than 25,000 times, while the number of titles sold via download only was 20 million” [SACEM 2010, our translation].36 We will come back later to the issue of concentration of sales on iTunes, which is indeed strong in comparison to P2P file sharing. But the stated figures do not imply a stronger concentration than for record sales, quite the contrary: the key difference lies in the number of titles made available and the low level of average sales. What SACEM actually means is that few titles generate copyright revenues at levels which they can efficiently manage. Our thesis is that the increased diversity of attention can give rise to new resources to enable creative activity and a manageable and equitable distribution of funding and income. What remains to be seen is whether the collecting societies (where they exist) and the majors everywhere can adapt to this new world. Up to now, they have focused on preventing it from becoming a reality.

In the last few years, P2P sharing, whose protocols consider each individual as both a distributor and receiver, is said to have declined in favor of authorized or illegal streaming servers. Streaming services give access to a wide variety of contents, and thus few commentators have noted that this partial replacement of P2P networks by streaming servers is far from being good news. The business models of the operators of streaming services such as Deezer or Spotify are based on advertising, subscriptions, and content producers paying for the promotion of their contents. The fact that the 4 major phonographic companies have taken a stake in Spotify [Redwood 2010], while allowing it to provide access to their catalog, should act as a warning. This behavior can be seen as an effort to retain, in this new channel, the same strong control over which works reach the attention of the public that they have in classical publishing.37 Furthermore, if streaming becomes the dominant form of access to works, individuals would be turned into passive receivers.

If and when file sharing is recognized as a legitimate activity, it will become possible for users to choose technology and services based on their merits and properties, and not just because it is less risky to use one than the other. This would transform the current situation not because of the existence of file sharing, whose already massive scale would increase yet further, but because of the official legitimacy of exchange practices. It would result in a wider attention to creative works and a better recognition of their authors. The diversity of works able to reach a significant audience would vastly increase. The quality of the digital representation of shared works would be much improved. New services would emerge to support these exchanges. Creators and producers would compete to set up the most productive relationships between individuals and the other cornerstones of the creative economy, namely on-line artistic communities, services such as concerts, teaching or projection in theaters, or new forms of publishing on carriers such as collector sets and mixed-media publishing.

If exchanges of files containing creative works without specific authorization are useful, one may ask why we advocate recognizing only those that are non-market. There are two reasons for this, which we develop in the following chapters:

  • the need to maximize the benefits that these exchanges yield (see the box on the specific benefits of non-market exchanges in the information sphere, earlier in this chapter);
  • the need to ensure that these exchanges co-exist as harmoniously as possible with other cultural activities, particularly those which lead to monetary transactions and help fund creative activities.

Notes

  • 1. See section “First sale doctrine and home copying laws” on chapter 5.
  • 2. Orphan works are copyrighted works whose rights holders are not known or cannot be contacted. It is thus impossible to obtain permission to use them. An impressively large proportion of cultural works are orphan works, in particular for books.
  • 3. See http://en.wikipedia.org/wiki/Napster.
  • 4. FTP sites and the Usenet news groups predated Napster by close to 20 years, and they were widely used to share contents such as photographs or software.
  • 5. Peer-to-peer file sharing is the sharing of files between one person and another using various protocols (technical standards for machine to machine communication) on peer-to-peer networks such as the Internet. A peer-to-peer network is a network where every machine is considered as an emitter as well as a receiver.
  • 6. See http://en.wikipedia.org/wiki/Comparison_of_file_sharing_applications.
  • 7. By now the reader will understand that, when we talk of the recognition of sharing, we mean the attribution of a right to practice it without having to request a specific authorization from the authors or copyright owners.
  • 8. For a more detailed definition of a many-to-all cultural society, see chapter 4.
  • 9. In the US, fair use provides someone accused of infringement with a defense against prosecution based on the fact that the activities that took place constitute a fair use of the work whose prohibition would damage other values (such as criticism, for instance). Fair dealing is a weaker version in other common law countries such as the UK. In civil law countries, exceptions and limitations to copyright were defined on a case by case basis without a general overarching concept. In international harmonization, exceptions and limitations were used as the federating principle, and the fair use chapter is now under the exception and limitation chapter of copyright in the US Code.
  • 10. Most copyright infringement cases brought against non-commercial uses by individuals were initiated by collecting societies, large companies or their lobby groups, or financially greedy heirs of deceased artists, rather than by individual authors. The exceptions arose from some high-income music groups, whose representation was delegated to managers, or authors that were particularly keen on control. When cases were brought, the outcome varied by region. In Europe, courts often invoked exceptions or procedural issues to acquit defendants. When defendants were found guilty of infringement but were not deemed to be aiming to make a profit, the sanctions and damages were generally limited [Hugenholtz 2008]. In the US, the large number of cases initiated by the Recording Industry Association of America (RIAA) against non-commercial file sharers led to judgments awarding enormous damages (see section Compensation schemes). In the end, this backfired on the RIAA, which had to give up bringing such cases in the face of public indignation. These situations led the media industry interest groups and, in Europe, the collecting societies, to request stronger (in comparison to the European practice) and automatic sanctions, bypassing the judiciary by allowing private parties (such as Internet Service Providers or administrative authorities) to set the sanctions, or using legal ordinances to limit the leeway of the judges.
  • 11. See http://www.worldwidewebsize.com/.
  • 12. DVD disks and players were available starting in November 1996, but work on the format started in 1993, see http://en.wikipedia.org/wiki/DVD.
  • 13. See [FLOSSIMPACT 2006, pp. 230-234] and [Lehman 1995] for the arguments brought forward to justify such provisions.
  • 14. See http://www.opencontent.org/opl.shtml. This license was discontinued when the Creative Commons licenses became widely used.
  • 15. See http://www.gnu.org/licenses/fdl.html.
  • 16. See http://creativecommons.org/licenses/.
  • 17. See http://flickr.com.
  • 18. See http://www.flickr.com/creativecommons/.
  • 19. There are several files for the same work, but most files generally contain a full album.
  • 20. In April 2008, the iTunes Store claimed to have 6 million song tracks available. See: http://answers.yahoo.com/question/index?qid=20080419130713AAHHUum.
  • 21. Since there are many different rights holders for film and TV works, the challenge of identifying rights holders can be severe.
  • 22. See http://openlibrary.org/
  • 23. See http://wikisource.org/.
  • 24. See http://www.gutenberg.org.
  • 25. This conceptualization was suggested by Hervé Le Crosnier.
  • 26. More precisely 1.104.
  • 27. Another factor that can impact our analysis is that the unit of music sharing on P2P tends to be a full album, while [Page-Garland 2009] studied single track commercial sales. However, the sizes of the two universes are similar, which should limit the impact of the different sharing unit.
  • 28. We are extremely grateful to these researchers for providing us with advance access to this data, which will be published as a paper in English by the time this book is published.
  • 29. See Appendix A and figure A.4 for explanations of the impact of the size of a universe on the share of attention going to a given percentage of works for the same value of the parameter of Zipf’s law.
  • 30. See http://en.wikipedia.org/wiki/DC%2B%2B.
  • 31. Here the authors mean “downloading”.
  • 32. Major book publishers are presently considering joining the war.
  • 33. We assume here that other factors – such as promotion – are kept equal.
  • 34. Our translation.
  • 35. For the impact on download sales, see [Andersen-Frenz 2008], [Marsouin 2008], [Oberholzer-Strumpf 2010], [Martikainen 2010] and the other studies referenced in section 1.2 of the Studies on File Sharing page of the La Quadrature du Net site: see http://www.laquadrature.net/wiki/Studies_on_file_sharing_eng.
  • 36. This statement seems to apply to iTunes globally, not to SACEM members, as the total number of titles generating download rights for SACEM was only 844,186 in 2009.
  • 37. See chapter 5 below.