Small-World File-Sharing Communities
Web caches, content distribution networks, peer-to-peer file sharing
networks, distributed file systems, and data grids all have in common that they
involve a community of users who generate requests for shared data. In each
case, overall system performance can be improved significantly if we can first
identify and then exploit interesting structure within a community's access
patterns. To this end, we propose a novel perspective on file sharing based on
the study of the relationships that form among users based on the files in
which they are interested.
We propose a new structure that captures common user interests in data -- the
data-sharing graph -- and justify its utility with studies on three
data-distribution systems: a high-energy physics collaboration, the Web, and
the Kazaa peer-to-peer network. We find small-world patterns in the
data-sharing graphs of all three communities. We analyze these graphs and
propose some probable causes for these emergent small-world patterns. The
significance of these small-world patterns is twofold: they provide rigorous
support for intuition and, perhaps more importantly, they suggest ways to design
mechanisms that exploit these naturally emerging patterns.
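The small-world test behind the abstract can be sketched in a few lines: build the data-sharing graph from a request log (users connected when they request a common file) and measure its clustering, which small-world graphs exhibit far in excess of random graphs. The request log below is a toy stand-in, not data from the paper:

```python
from collections import defaultdict
from itertools import combinations

def data_sharing_graph(requests):
    """Build the data-sharing graph: users are nodes, and an edge
    connects two users who requested at least one common file."""
    users_by_file = defaultdict(set)
    for user, f in requests:
        users_by_file[f].add(user)
    adj = defaultdict(set)
    for users in users_by_file.values():
        for u, v in combinations(sorted(users), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def clustering_coefficient(adj):
    """Average local clustering: the fraction of a node's neighbour
    pairs that are themselves connected."""
    coeffs = []
    for node, nbrs in adj.items():
        if len(nbrs) < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(2.0 * links / (len(nbrs) * (len(nbrs) - 1)))
    return sum(coeffs) / len(coeffs)

# Hypothetical request log: (user, file) pairs.
log = [("alice", "f1"), ("bob", "f1"), ("carol", "f1"),
       ("alice", "f2"), ("bob", "f2"), ("dave", "f3"), ("carol", "f3")]
g = data_sharing_graph(log)
print(clustering_coefficient(g))
```

A small-world diagnosis would compare this coefficient (and the average path length) against a random graph with the same number of nodes and edges, where clustering is typically much lower.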
GPUs as Storage System Accelerators
Massively multicore processors, such as Graphics Processing Units (GPUs),
provide, at a comparable price, a one order of magnitude higher peak
performance than traditional CPUs. This drop in the cost of computation, as any
order-of-magnitude drop in the cost per unit of performance for a class of
system components, triggers the opportunity to redesign systems and to explore
new ways to engineer them to recalibrate the cost-to-performance relation. This
project explores the feasibility of harnessing GPUs' computational power to
improve the performance, reliability, or security of distributed storage
systems. In this context, we present the design of a storage system prototype
that uses GPU offloading to accelerate a number of computationally intensive
primitives based on hashing, and introduce techniques to efficiently leverage
the processing power of GPUs. We evaluate the performance of this prototype
under two configurations: as a content addressable storage system that
facilitates online similarity detection between successive versions of the same
file and as a traditional system that uses hashing to preserve data integrity.
Further, we evaluate the impact of offloading to the GPU on competing
applications' performance. Our results show that this technique can bring
tangible performance gains without negatively impacting the performance of
concurrently running applications.Comment: IEEE Transactions on Parallel and Distributed Systems, 201
Content Reuse and Interest Sharing in Tagging Communities
Tagging communities represent a subclass of a broader class of user-generated
content-sharing online communities. In such communities users introduce and tag
content for later use. Although recent studies advocate and attempt to harness
social knowledge in this context by exploiting collaboration among users,
little research has been done to quantify the current level of user
collaboration in these communities. This paper introduces two metrics to
quantify the level of collaboration: content reuse and shared interest. Using
these two metrics, this paper shows that the current level of collaboration in
CiteULike and Connotea is consistently low, which significantly limits the
potential of harnessing the social knowledge in communities. This study also
discusses implications of these findings in the context of recommendation and
reputation systems.
Comment: 6 pages, 6 figures, AAAI Spring Symposium on Social Information Processing
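The two metrics can be approximated from a user-to-items mapping. The definitions below are simplified stand-ins for the paper's metrics (whose exact formulations are not reproduced here), and the tagging data is hypothetical:

```python
from collections import Counter
from itertools import combinations

# Hypothetical tagging data: user -> set of items they posted.
libraries = {
    "u1": {"paper_a", "paper_b"},
    "u2": {"paper_a", "paper_c"},
    "u3": {"paper_d"},
}

def content_reuse(libs):
    """Fraction of distinct items posted by more than one user --
    a simple proxy for the content-reuse metric."""
    counts = Counter(item for items in libs.values() for item in items)
    return sum(1 for c in counts.values() if c > 1) / len(counts)

def shared_interest(libs):
    """Fraction of user pairs whose libraries overlap -- a simple
    proxy for the shared-interest metric."""
    pairs = list(combinations(libs.values(), 2))
    return sum(1 for a, b in pairs if a & b) / len(pairs)

print(content_reuse(libraries))    # only paper_a is reused
print(shared_interest(libraries))  # only the (u1, u2) pair overlaps
```

A "consistently low" level of collaboration corresponds to both values staying near zero as the community grows.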
DiPerF: an automated DIstributed PERformance testing Framework
We present DiPerF, a distributed performance testing framework, aimed at
simplifying and automating service performance evaluation. DiPerF coordinates a
pool of machines that test a target service, collects and aggregates
performance metrics, and generates performance statistics. The aggregate data
collected provide information on service throughput, on service "fairness" when
serving multiple clients concurrently, and on the impact of network latency on
service performance. Furthermore, using this data, it is possible to build
predictive models that estimate a service's performance given the service load.
We have tested DiPerF on 100+ machines on two testbeds, Grid3 and PlanetLab,
and explored the performance of job submission services (pre WS GRAM and WS
GRAM) included with Globus Toolkit 3.2.
Comment: 8 pages, 8 figures, will appear in IEEE/ACM Grid2004, November 200
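The aggregation step can be sketched: clients report completion samples, and the framework reduces them to a throughput time series and a fairness figure. The sample data and the use of Jain's fairness index are illustrative assumptions, not details taken from DiPerF itself:

```python
from collections import defaultdict

# Hypothetical per-client samples: (client_id, completion_time_seconds).
samples = [("c1", 0.4), ("c2", 0.7), ("c1", 1.2), ("c2", 1.9),
           ("c1", 2.1), ("c3", 2.3)]

def throughput_per_second(samples):
    """Bucket completions into one-second intervals -- the kind of
    aggregate throughput series a testing framework reports."""
    buckets = defaultdict(int)
    for _, t in samples:
        buckets[int(t)] += 1
    return dict(buckets)

def fairness(samples):
    """Jain's fairness index over per-client completion counts:
    1.0 means every client got equal service."""
    counts = defaultdict(int)
    for client, _ in samples:
        counts[client] += 1
    xs = list(counts.values())
    return sum(xs) ** 2 / (len(xs) * sum(x * x for x in xs))

print(throughput_per_second(samples))  # {0: 2, 1: 2, 2: 2}
print(fairness(samples))
```

Feeding such series against a varying client count is what makes it possible to fit the load-to-performance models the abstract mentions.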
Maximum Flow on Highly Dynamic Graphs
Recent advances in dynamic graph processing have enabled the analysis of
highly dynamic graphs with change at rates as high as millions of edge changes
per second. Solutions in this domain, however, have been demonstrated only for
relatively simple algorithms like PageRank, breadth-first search, and connected
components. Expanding beyond this, we explore the maximum flow problem, a
fundamental, yet more complex problem, in graph analytics. We propose a novel,
distributed algorithm for max-flow on dynamic graphs, and implement it on top
of an asynchronous vertex-centric abstraction. We show that our algorithm can
process both additions and deletions of vertices and edges efficiently at scale
on fast-evolving graphs, and provide a comprehensive analysis by evaluating, in
addition to throughput, two criteria that are important when applied to
real-world problems: result latency and solution stability.
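To see what the incremental algorithm improves on, here is the naive baseline: a from-scratch Edmonds-Karp max-flow that must be rerun after every edge change. The graph is a toy example; the paper's contribution is avoiding exactly this full recomputation on fast-evolving graphs:

```python
from collections import deque, defaultdict

def add_edge(cap, u, v, c):
    """Add capacity c on u -> v, creating the zero-capacity reverse
    edge the residual graph needs."""
    cap[u][v] = cap[u].get(v, 0) + c
    cap[v].setdefault(u, 0)

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly push flow along shortest augmenting
    paths found by BFS in the residual graph."""
    flow = defaultdict(int)
    total = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in cap[u]:
                if v not in parent and cap[u][v] - flow[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:          # no augmenting path left
            return total
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] - flow[(u, v)] for u, v in path)
        for u, v in path:
            flow[(u, v)] += push
            flow[(v, u)] -= push
        total += push

cap = defaultdict(dict)
for u, v, c in [("s", "a", 3), ("s", "b", 2), ("a", "t", 2), ("b", "t", 3)]:
    add_edge(cap, u, v, c)
print(max_flow(cap, "s", "t"))  # 4
add_edge(cap, "a", "t", 3)      # an edge update forces a full recompute here
print(max_flow(cap, "s", "t"))  # 5
```

At millions of edge changes per second, restarting this computation per change is infeasible, which motivates an asynchronous, vertex-centric algorithm that repairs the existing flow instead.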