42,423 research outputs found
Efficient range query processing in peer-to-peer systems
2008-2009 > Academic research: refereed > Publication in refereed journalVersion of RecordPublishe
Efficient Range and Join Query Processing in Massively Distributed Peer-to-Peer Networks
Peer-to-peer (P2P) has become a modern distributed computing architecture that supports massively large-scale data management and query processing. Complex query operators such as range operator and
join operator are needed by various distributed applications, including content distribution, locality-aware services, computing resource sharing, and many others.
This dissertation tackles a number of problems related to range and join query processing in P2P systems: fault-tolerant range query processing under structured P2P architecture, distributed range caching under unstructured P2P architecture, and integration of heterogeneous data under unstructured P2P architecture. To support
fault-tolerant range query processing so as to provide strong performance guarantees in the presence of network churn, effective
replication schemes are developed at either the overlay network level or the query processing level. To facilitate range query
processing, a prefetch-based caching approach is proposed to eliminate the performance bottlenecks incurred by those data items
that are not well cached in the network. Finally, a purely decentralized partition-based join query operator is devised to realize bandwidth-efficient join query processing under unstructured P2P architecture.
Theoretical analysis and experimental simulations demonstrate the effectiveness of the proposed approaches
Peer to Peer Information Retrieval: An Overview
Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype peer-to-peer information retrieval systems have been developed. Unfortunately, none of these have seen widespread real- world adoption and thus, in contrast with file sharing, information retrieval is still dominated by centralised solutions. In this paper we provide an overview of the key challenges for peer-to-peer information retrieval and the work done so far. We want to stimulate and inspire further research to overcome these challenges. This will open the door to the development and large-scale deployment of real-world peer-to-peer information retrieval systems that rival existing centralised client-server solutions in terms of scalability, performance, user satisfaction and freedom
Statistical structures for internet-scale data management
Efficient query processing in traditional database management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statistics management still poses major challenges. With the work in this paper we aim to endow peer-to-peer data management over structured overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge. To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We show how to construct several types of histograms, such as simple Equi-Width, Average-Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation of these tools for distributed statistical synopses, and report on a comprehensive experimental performance evaluation, evaluating our contributions in terms of efficiency, accuracy, and scalability
- …