Incremental characterization of RDF Triple Stores
Many semantic web applications integrate data from distributed triple stores and, to be efficient, need to know what kind of content each triple store holds in order to assess whether it can contribute to their queries. We present an algorithm to build indexes summarizing the content of triple stores. We extend Depth-First Search coding to provide a canonical representation of RDF graphs and introduce a new join operator between two graph codes to optimize the generation of an index. We provide an incremental update algorithm and conclude with tests on real datasets.
Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
There has been significant recent interest in parallel graph processing due
to the need to quickly analyze the large graphs available today. Many graph
codes have been designed for distributed memory or external memory. However,
today even the largest publicly-available real-world graph (the Hyperlink Web
graph with over 3.5 billion vertices and 128 billion edges) can fit in the
memory of a single commodity multicore server. Nevertheless, most experimental
work in the literature reports results on much smaller graphs, and the ones for
the Hyperlink graph use distributed or external memory. Therefore, it is
natural to ask whether we can efficiently solve a broad class of graph problems
on this graph in memory.
This paper shows that theoretically-efficient parallel graph algorithms can
scale to the largest publicly-available graphs using a single machine with a
terabyte of RAM, processing them in minutes. We give implementations of
theoretically-efficient parallel algorithms for 20 important graph problems. We
also present the optimizations and techniques that we used in our
implementations, which were crucial in enabling us to process these large
graphs quickly. We show that the running times of our implementations
outperform existing state-of-the-art implementations on the largest real-world
graphs. For many of the problems that we consider, this is the first time they
have been solved on graphs at this scale. We have made the implementations
developed in this work publicly-available as the Graph-Based Benchmark Suite
(GBBS).
Comment: This is the full version of the paper appearing in the ACM Symposium
on Parallelism in Algorithms and Architectures (SPAA), 201
Optimally Efficient Prefix Search and Multicast in Structured P2P Networks
Searching in P2P networks is fundamental to all overlay networks.
P2P networks based on Distributed Hash Tables (DHT) are optimized for single
key lookups, whereas unstructured networks offer more complex queries at the
cost of increased traffic and uncertain success rates. Our Distributed Tree
Construction (DTC) approach enables structured P2P networks to perform prefix
search, range queries, and multicast in an optimal way. It achieves this by
creating a spanning tree over the peers in the search area, using only
information available locally on each peer. Because DTC creates a spanning
tree, it can query all the peers in the search area with a minimal number of
messages. Furthermore, we show that the tree depth has the same upper bound as
a regular DHT lookup which in turn guarantees fast and responsive runtime
behavior. By placing objects with a region quadtree, we can perform a prefix
search or a range query in a freely selectable area of the DHT. Our DTC
algorithm is DHT-agnostic and works with most existing DHTs. We evaluate the
performance of DTC over several DHTs by comparing it to existing
application-level multicast solutions and show that DTC sends 30-250% fewer
messages than common solutions.
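The spanning-tree idea in this abstract can be illustrated with a small sketch. It is not the authors' DTC algorithm, whose details the abstract does not give; it is a generic interval-splitting broadcast over an integer ID space. The function name `span_tree` is hypothetical, and the globally sorted peer list is an assumption that stands in for the routing-table lookups a real peer would make with local information only. The point it shows is that delegating disjoint sub-ranges yields a tree, so reaching N peers takes N - 1 messages.

```python
from bisect import bisect_left


def span_tree(peers, lo, hi, root):
    """Return the (sender, receiver) edges used to reach every peer in [lo, hi).

    peers: sorted list of integer peer IDs (assumption: stands in for the
           per-peer routing tables of a real DHT).
    root:  the peer that starts the search; must satisfy lo <= root < hi.
    """
    edges = []
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Keep the half containing root; delegate the other half.
        if root < mid:
            other_lo, other_hi = mid, hi
            hi = mid
        else:
            other_lo, other_hi = lo, mid
            lo = mid
        # First peer in the delegated half, if that half holds any peer.
        i = bisect_left(peers, other_lo)
        if i < len(peers) and peers[i] < other_hi:
            child = peers[i]
            edges.append((root, child))
            edges.extend(span_tree(peers, other_lo, other_hi, child))
    return edges


peers = [1, 3, 4, 6, 7]          # hypothetical peer IDs in a 3-bit ID space
edges = span_tree(peers, 0, 8, root=4)
reached = {receiver for _, receiver in edges}
```

Because each delegated sub-range is disjoint from the others, no peer receives the message twice, and the recursion depth is bounded by the number of ID bits.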
Optimal Information Retrieval with Complex Utility Functions
Existing retrieval models all attempt to optimize a single utility function, which is often based on the topical relevance of a document with respect to a query. In real applications, retrieval involves more complex utility functions that may involve preferences on several different dimensions. In this paper, we present a general optimization framework for retrieval with complex utility functions. A query language is designed according to this framework to enable users to submit complex queries. We propose an efficient algorithm for retrieval with complex utility functions based on the Apriori algorithm. As a case study, we apply our algorithm to a complex utility retrieval problem in distributed IR. Experimental results show that our algorithm allows a flexible tradeoff between multiple retrieval criteria. Finally, we study the efficiency of our algorithm on simulated data.
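To make the notion of a tradeoff between retrieval criteria concrete, here is a minimal sketch. The abstract does not state the paper's utility formulation or its Apriori-based algorithm, so everything here is an assumption for illustration: the `Doc` fields, the `rank` function, and the weighted-sum utility are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    relevance: float  # topical relevance, higher is better (assumed scale 0..1)
    cost: float       # hypothetical second criterion, lower is better


def rank(docs, weight):
    """Order docs best-first by weight * relevance - (1 - weight) * cost."""
    def utility(d):
        return weight * d.relevance - (1.0 - weight) * d.cost
    return sorted(docs, key=utility, reverse=True)


docs = [Doc("a", 0.9, 0.8), Doc("b", 0.6, 0.1), Doc("c", 0.3, 0.0)]
by_relevance = [d.doc_id for d in rank(docs, 1.0)]  # relevance only
balanced = [d.doc_id for d in rank(docs, 0.5)]      # equal weight on both criteria
```

Changing `weight` reorders the results, which is the kind of tradeoff between criteria that the abstract describes.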