Datasets

Datasets used in Sharing

Three datasets were used for the diversity of attention analyses reported in the book:

  • The eDonkey file sharing data collected by the Complex Networks Research Team of Université Pierre et Marie Curie, as reported in their paper Ten Weeks in the Life of an eDonkey Server [Aidouni et al., 2009]. The original dataset can be downloaded here. From this very large dataset, we computed the ranked popularity of FileIds, measured by how often each FileId appears in queries for sources providing access to it. You can download the corresponding CSV file (zip archive). Please credit the Complex Networks Research Team in any published use of the data they have collected.
  • Data on BitTorrent file sharing of movies in Hungary, collected by Bodó Balázs and Zoltán Lakatos [Balázs-Lakatos, 2010]. Three ranked popularity distributions were computed by the authors and provided to us: allfiles (all the files shared on the various trackers, including those not identified as representing a movie in IMDb), allknownfiles (all files identified as representing a movie in IMDb), and allknownfilms (the compound sharing of each IMDb film that was shared). You can download the zip archive containing the three popularity distributions. Any published use of this data must credit Bodó Balázs and Zoltán Lakatos.
  • Data from the Musique Libre site used in the paper Diversity, attention and symmetry in a many-to-many information society [Aigrain 2006]. This data was provided by Musique Libre (now Dogmazic). You can download the zip archive containing the ranked popularity for the compound level of both downloads and listening on the site (see [Aigrain 2006] for details). Any published use of this data must credit Musique Libre.
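
For readers who want to build a comparable ranked popularity from their own logs, the following is a minimal sketch in Python. It is not the software used for the book: the function name ranked_popularity and the sample identifiers are hypothetical, and it assumes the log has already been reduced to one item identifier (for example a FileId) per request.

```python
from collections import Counter


def ranked_popularity(item_ids):
    """Return (rank, item_id, count) tuples, most requested item first.

    item_ids is assumed to be an iterable with one identifier per request;
    this input shape is an illustration, not the format of the datasets above.
    """
    counts = Counter(item_ids)
    # Sort by descending count; ties are broken by identifier so that the
    # output is deterministic.
    ordered = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [(rank, item, n) for rank, (item, n) in enumerate(ordered, start=1)]


if __name__ == "__main__":
    sample_requests = ["a", "b", "a", "c", "a", "b"]  # hypothetical identifiers
    for rank, item, n in ranked_popularity(sample_requests):
        print(rank, item, n)
```

Tied items receive consecutive ranks here; an analysis that needs shared ranks for ties would have to handle them differently.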

Note that Sharing also uses results from studies or information provided by other researchers, in particular for analyzing the diversity of attention in commercial sales. In most cases, the raw datasets have not been made available by these researchers, and diversity of attention results had to be inferred from the information made available.


In a few weeks, you will be able to upload datasets in order to run our diversity of attention analysis software on them, or simply to share them with other researchers. Please bear with us while we put this functionality in place.