Web caches, content distribution networks, peer-to-peer file sharing
networks, distributed file systems, and data grids all have in common that they
involve a community of users who generate requests for shared data. In each
case, overall system performance can be improved significantly if we can first
identify and then exploit interesting structure within a community's access
patterns. To this end, we propose a novel perspective on file sharing that
studies the relationships that form among users according to the files in
which they are interested.
We propose a new structure that captures common user interests in data, the
data-sharing graph, and justify its utility with studies on three
data-distribution systems: a high-energy physics collaboration, the Web, and
the Kazaa peer-to-peer network. We find small-world patterns in the
data-sharing graphs of all three communities. We analyze these graphs and
propose some probable causes for these emergent small-world patterns. The
significance of small-world patterns is twofold: they provide rigorous support
for intuition and, perhaps most importantly, they suggest ways to design
mechanisms that exploit these naturally emerging patterns.
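
To make the idea of a data-sharing graph concrete, the following is a minimal Python sketch under stated assumptions. The abstract does not spell out the construction, so the sketch assumes that nodes are users and that an edge links two users who requested at least min_shared common files. It then computes the two quantities commonly used to characterize small-world graphs, the average clustering coefficient and the average shortest-path length. The function names, the min_shared parameter, and the toy request trace are all hypothetical and are not taken from the studies described above.

```python
from collections import deque
from itertools import combinations


def build_data_sharing_graph(requests, min_shared=1):
    """Build an undirected graph from (user, file) request pairs.

    Assumption: two users are connected when they requested at least
    `min_shared` files in common. Returns {user: set_of_neighbours}.
    """
    files_by_user = {}
    for user, file in requests:
        files_by_user.setdefault(user, set()).add(file)

    graph = {user: set() for user in files_by_user}
    for u, v in combinations(files_by_user, 2):
        if len(files_by_user[u] & files_by_user[v]) >= min_shared:
            graph[u].add(v)
            graph[v].add(u)
    return graph


def average_clustering(graph):
    """Mean local clustering coefficient; nodes of degree < 2 count as 0."""
    if not graph:
        return 0.0
    total = 0.0
    for neighbours in graph.values():
        k = len(neighbours)
        if k < 2:
            continue
        # Count edges that exist among this node's neighbours.
        links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(graph)


def average_path_length(graph):
    """Mean shortest-path length over all connected ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical toy trace: (user, requested file).
requests = [
    ("a", "f1"), ("b", "f1"), ("b", "f2"),
    ("c", "f1"), ("c", "f2"), ("c", "f3"), ("d", "f3"),
]
graph = build_data_sharing_graph(requests, min_shared=1)
print(average_clustering(graph), average_path_length(graph))
```

A graph is usually called small-world when its clustering coefficient is much larger than that of a random graph with the same number of nodes and edges while its average path length remains comparably short, so in practice these two values would be compared against such a random baseline.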