Small-World File-Sharing Communities
Web caches, content distribution networks, peer-to-peer file sharing
networks, distributed file systems, and data grids all have in common that they
involve a community of users who generate requests for shared data. In each
case, overall system performance can be improved significantly if we can first
identify and then exploit interesting structure within a community's access
patterns. To this end, we propose a novel perspective on file sharing based on
the study of the relationships that form among users based on the files in
which they are interested.
We propose a new structure that captures common user interests in data -- the
data-sharing graph -- and justify its utility with studies on three
data-distribution systems: a high-energy physics collaboration, the Web, and
the Kazaa peer-to-peer network. We find small-world patterns in the
data-sharing graphs of all three communities. We analyze these graphs and
propose some probable causes for these emergent small-world patterns. The
significance of these small-world patterns is twofold: they provide rigorous
support for intuition and, perhaps more importantly, they suggest ways to design
mechanisms that exploit these naturally emerging patterns.
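The small-world test behind the abstract can be sketched in a few lines: build the data-sharing graph from a request log (users connected when they request a common file) and measure its clustering, which small-world graphs exhibit far in excess of random graphs. The request log below is a toy stand-in, not data from the paper:

```python
from collections import defaultdict
from itertools import combinations

def data_sharing_graph(requests):
    """Build the data-sharing graph: users are nodes, and an edge
    connects two users who requested at least one common file."""
    users_by_file = defaultdict(set)
    for user, f in requests:
        users_by_file[f].add(user)
    adj = defaultdict(set)
    for users in users_by_file.values():
        for u, v in combinations(sorted(users), 2):
            adj[u].add(v)
            adj[v].add(u)
    return adj

def clustering_coefficient(adj):
    """Average local clustering: the fraction of a node's neighbour
    pairs that are themselves connected."""
    coeffs = []
    for node, nbrs in adj.items():
        if len(nbrs) < 2:
            coeffs.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        coeffs.append(2.0 * links / (len(nbrs) * (len(nbrs) - 1)))
    return sum(coeffs) / len(coeffs)

# Hypothetical request log: (user, file) pairs.
log = [("alice", "f1"), ("bob", "f1"), ("carol", "f1"),
       ("alice", "f2"), ("bob", "f2"), ("dave", "f3"), ("carol", "f3")]
g = data_sharing_graph(log)
print(clustering_coefficient(g))
```

A small-world diagnosis would compare this coefficient (and the average path length) against a random graph with the same number of nodes and edges, where clustering is typically much lower.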
GPUs as Storage System Accelerators
Massively multicore processors, such as Graphics Processing Units (GPUs),
provide, at a comparable price, a one order of magnitude higher peak
performance than traditional CPUs. This drop in the cost of computation, as any
order-of-magnitude drop in the cost per unit of performance for a class of
system components, triggers the opportunity to redesign systems and to explore
new ways to engineer them to recalibrate the cost-to-performance relation. This
project explores the feasibility of harnessing GPUs' computational power to
improve the performance, reliability, or security of distributed storage
systems. In this context, we present the design of a storage system prototype
that uses GPU offloading to accelerate a number of computationally intensive
primitives based on hashing, and introduce techniques to efficiently leverage
the processing power of GPUs. We evaluate the performance of this prototype
under two configurations: as a content addressable storage system that
facilitates online similarity detection between successive versions of the same
file and as a traditional system that uses hashing to preserve data integrity.
Further, we evaluate the impact of offloading to the GPU on competing
applications' performance. Our results show that this technique can bring
tangible performance gains without negatively impacting the performance of
concurrently running applications.Comment: IEEE Transactions on Parallel and Distributed Systems, 201
Content Reuse and Interest Sharing in Tagging Communities
Tagging communities represent a subclass of a broader class of user-generated
content-sharing online communities. In such communities users introduce and tag
content for later use. Although recent studies advocate and attempt to harness
social knowledge in this context by exploiting collaboration among users,
little research has been done to quantify the current level of user
collaboration in these communities. This paper introduces two metrics to
quantify the level of collaboration: content reuse and shared interest. Using
these two metrics, this paper shows that the current level of collaboration in
CiteULike and Connotea is consistently low, which significantly limits the
potential of harnessing the social knowledge in communities. This study also
discusses implications of these findings in the context of recommendation and
reputation systems.
Comment: 6 pages, 6 figures, AAAI Spring Symposium on Social Information Processing
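The two metrics can be approximated from a user-to-items mapping. The definitions below are simplified stand-ins for the paper's metrics (whose exact formulations are not reproduced here), and the tagging data is hypothetical:

```python
from collections import Counter
from itertools import combinations

# Hypothetical tagging data: user -> set of items they posted.
libraries = {
    "u1": {"paper_a", "paper_b"},
    "u2": {"paper_a", "paper_c"},
    "u3": {"paper_d"},
}

def content_reuse(libs):
    """Fraction of distinct items posted by more than one user --
    a simple proxy for the content-reuse metric."""
    counts = Counter(item for items in libs.values() for item in items)
    return sum(1 for c in counts.values() if c > 1) / len(counts)

def shared_interest(libs):
    """Fraction of user pairs whose libraries overlap -- a simple
    proxy for the shared-interest metric."""
    pairs = list(combinations(libs.values(), 2))
    return sum(1 for a, b in pairs if a & b) / len(pairs)

print(content_reuse(libraries))    # only paper_a is reused
print(shared_interest(libraries))  # only the (u1, u2) pair overlaps
```

A "consistently low" level of collaboration corresponds to both values staying near zero as the community grows.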
DiPerF: an automated DIstributed PERformance testing Framework
We present DiPerF, a distributed performance testing framework, aimed at
simplifying and automating service performance evaluation. DiPerF coordinates a
pool of machines that test a target service, collects and aggregates
performance metrics, and generates performance statistics. The aggregate data
collected provide information on service throughput, on service "fairness" when
serving multiple clients concurrently, and on the impact of network latency on
service performance. Furthermore, using this data, it is possible to build
predictive models that estimate a service's performance given the service load.
We have tested DiPerF on 100+ machines on two testbeds, Grid3 and PlanetLab,
and explored the performance of job submission services (pre WS GRAM and WS
GRAM) included with Globus Toolkit 3.2.
Comment: 8 pages, 8 figures, will appear in IEEE/ACM Grid2004, November 200
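The aggregation step can be sketched: clients report completion samples, and the framework reduces them to a throughput time series and a fairness figure. The sample data and the use of Jain's fairness index are illustrative assumptions, not details taken from DiPerF itself:

```python
from collections import defaultdict

# Hypothetical per-client samples: (client_id, completion_time_seconds).
samples = [("c1", 0.4), ("c2", 0.7), ("c1", 1.2), ("c2", 1.9),
           ("c1", 2.1), ("c3", 2.3)]

def throughput_per_second(samples):
    """Bucket completions into one-second intervals -- the kind of
    aggregate throughput series a testing framework reports."""
    buckets = defaultdict(int)
    for _, t in samples:
        buckets[int(t)] += 1
    return dict(buckets)

def fairness(samples):
    """Jain's fairness index over per-client completion counts:
    1.0 means every client got equal service."""
    counts = defaultdict(int)
    for client, _ in samples:
        counts[client] += 1
    xs = list(counts.values())
    return sum(xs) ** 2 / (len(xs) * sum(x * x for x in xs))

print(throughput_per_second(samples))  # {0: 2, 1: 2, 2: 2}
print(fairness(samples))
```

Feeding such series against a varying client count is what makes it possible to fit the load-to-performance models the abstract mentions.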
Maximum Flow on Highly Dynamic Graphs
Recent advances in dynamic graph processing have enabled the analysis of
highly dynamic graphs with change at rates as high as millions of edge changes
per second. Solutions in this domain, however, have been demonstrated only for
relatively simple algorithms like PageRank, breadth-first search, and connected
components. Expanding beyond this, we explore the maximum flow problem, a
fundamental, yet more complex problem, in graph analytics. We propose a novel,
distributed algorithm for max-flow on dynamic graphs, and implement it on top
of an asynchronous vertex-centric abstraction. We show that our algorithm can
process both additions and deletions of vertices and edges efficiently at scale
on fast-evolving graphs, and provide a comprehensive analysis by evaluating, in
addition to throughput, two criteria that are important when applied to
real-world problems: result latency and solution stability.
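To see what the incremental algorithm improves on, here is the naive baseline: a from-scratch Edmonds-Karp max-flow that must be rerun after every edge change. The graph is a toy example; the paper's contribution is avoiding exactly this full recomputation on fast-evolving graphs:

```python
from collections import deque, defaultdict

def add_edge(cap, u, v, c):
    """Add capacity c on u -> v, creating the zero-capacity reverse
    edge the residual graph needs."""
    cap[u][v] = cap[u].get(v, 0) + c
    cap[v].setdefault(u, 0)

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly push flow along shortest augmenting
    paths found by BFS in the residual graph."""
    flow = defaultdict(int)
    total = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in cap[u]:
                if v not in parent and cap[u][v] - flow[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:          # no augmenting path left
            return total
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] - flow[(u, v)] for u, v in path)
        for u, v in path:
            flow[(u, v)] += push
            flow[(v, u)] -= push
        total += push

cap = defaultdict(dict)
for u, v, c in [("s", "a", 3), ("s", "b", 2), ("a", "t", 2), ("b", "t", 3)]:
    add_edge(cap, u, v, c)
print(max_flow(cap, "s", "t"))  # 4
add_edge(cap, "a", "t", 3)      # an edge update forces a full recompute here
print(max_flow(cap, "s", "t"))  # 5
```

At millions of edge changes per second, restarting this computation per change is infeasible, which motivates an asynchronous, vertex-centric algorithm that repairs the existing flow instead.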