Adaptive Partitioning for Large-Scale Dynamic Graphs
Abstract—In recent years, large-scale graph processing has gained increasing attention, with most recent systems placing particular emphasis on latency. One possible technique to improve runtime performance in a distributed graph processing system is to reduce network communication. The most notable way to achieve this goal is to partition the graph by minimizing the number of edges that connect vertices assigned to different machines, while keeping the load balanced. However, real-world graphs are highly dynamic, with vertices and edges being constantly added and removed. Carefully updating the partitioning of the graph to reflect these changes is necessary to avoid the introduction of an extensive number of cut edges, which would gradually worsen computation performance. In this paper we show that performance degradation in dynamic graph processing systems can be avoided by continuously adapting the graph partitions as the graph changes. We present a novel, highly scalable adaptive partitioning strategy, and show a number of refinements that make it work under the constraints of a large-scale distributed system. The partitioning strategy is based on iterative vertex migrations, relying only on local information. We have implemented the technique in a graph processing system, and we show through three real-world scenarios how adapting graph partitioning reduces execution time by over 50% when compared to commonly used hash-partitioning.
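The core idea of iterative, local-information vertex migration can be sketched as follows. This is an illustrative toy, not the paper's exact algorithm: the gain rule (move a vertex toward the partition holding most of its neighbors, only if that strictly reduces local cut edges and capacity allows) and all names are assumptions.

```python
# Hedged sketch of one round of iterative vertex migration using
# only local (neighborhood) information; not the paper's algorithm.
from collections import Counter

def migration_step(adjacency, partition, capacity):
    """Each vertex may move to the partition holding most of its
    neighbors, if that move strictly reduces its local cut edges
    and the target partition has spare capacity."""
    load = Counter(partition.values())
    new_partition = dict(partition)
    for v, neighbours in adjacency.items():
        if not neighbours:
            continue
        counts = Counter(partition[u] for u in neighbours)
        best, best_score = counts.most_common(1)[0]
        here = partition[v]
        if best != here and best_score > counts.get(here, 0) and load[best] < capacity:
            new_partition[v] = best
            load[here] -= 1
            load[best] += 1
    return new_partition

adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
partition = {0: "A", 1: "A", 2: "B", 3: "B"}
print(migration_step(adjacency, partition, capacity=3))
```

Running such rounds continuously as edges arrive and depart is what keeps the cut small without ever recomputing a global partitioning.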
A key-based adaptive transactional memory executor
Software transactional memory systems enable a programmer to easily write concurrent data structures such as lists, trees, hashtables, and graphs, where nonconflicting operations proceed in parallel. Many of these structures take the abstract form of a dictionary, in which each transaction is associated with a search key. By regrouping transactions based on their keys, one may improve locality and reduce conflicts among parallel transactions. In this paper, we present an executor that partitions transactions among available processors. Our key-based adaptive partitioning monitors incoming transactions, estimates the probability distribution of their keys, and adaptively determines the (usually nonuniform) partitions. By comparing the adaptive partitioning with uniform partitioning and round-robin keyless partitioning on a 16-processor SunFire 6800 machine, we demonstrate that key-based adaptive partitioning significantly improves the throughput of fine-grained parallel operations on concurrent data structures.
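The key-distribution step described above can be sketched like this: sample incoming transaction keys, then place (usually nonuniform) range boundaries so each executor receives roughly equal probability mass. Function names and the equal-mass quantile rule are illustrative assumptions, not the paper's exact estimator.

```python
# Hedged sketch of key-based adaptive partitioning: cut the key
# space at empirical quantiles of a sample, so skewed workloads
# still spread evenly across executors. Illustrative only.
import random

def adaptive_boundaries(sampled_keys, num_executors):
    """Return num_executors - 1 key boundaries at equal-mass quantiles."""
    keys = sorted(sampled_keys)
    per_part = len(keys) / num_executors
    return [keys[int(round((i + 1) * per_part)) - 1]
            for i in range(num_executors - 1)]

def route(key, boundaries):
    """Return the executor index responsible for this key."""
    for i, b in enumerate(boundaries):
        if key <= b:
            return i
    return len(boundaries)

random.seed(0)
sample = [int(random.gauss(100, 15)) for _ in range(1000)]  # keys cluster near 100
bounds = adaptive_boundaries(sample, 4)
print(bounds)  # nonuniform cut points, denser where keys are denser
```

A uniform split of the same key space would overload the executor covering the mode of the distribution; the quantile cuts avoid that.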
Classification algorithms using adaptive partitioning
Algorithms for binary classification based on adaptive tree partitioning are
formulated and analyzed for both their risk performance and their friendliness
to numerical implementation. The algorithms can be viewed as generating a set
approximation to the Bayes set and thus fall into the general category of set
estimators. In contrast with the most studied tree-based algorithms, which
utilize piecewise constant approximation on the generated partition [IEEE
Trans. Inform. Theory 52 (2006) 1335-1353; Mach. Learn. 66 (2007) 209-242], we
consider decorated trees, which allow us to derive higher order methods.
Convergence rates for these methods are derived in terms of the parameters of margin conditions and a rate of best approximation of the Bayes set by
decorated adaptive partitions. They can also be expressed in terms of the Besov
smoothness of the regression function, which governs its approximability
by piecewise polynomials on adaptive partitions. The execution of the algorithms
does not require knowledge of the smoothness or margin conditions. Besov
smoothness conditions are weaker than the commonly used Hölder conditions,
which govern approximation by nonadaptive partitions, and therefore for a given
regression function can result in a higher rate of convergence. This in turn
mitigates the compatibility conflict between smoothness and margin parameters.
Comment: Published at http://dx.doi.org/10.1214/14-AOS1234 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org).
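A minimal 1-D toy of the set-estimator idea: recursively split an interval while its labels are mixed, then label each cell by majority vote. This is the piecewise-constant baseline the abstract contrasts with (decorated trees would attach higher-order decorations to each cell); all names and the dyadic split rule are illustrative assumptions.

```python
# Hedged toy: adaptive dyadic partitioning of [0, 1) as a set
# estimator for binary classification. Piecewise-constant case only.
def build_partition(points, lo, hi, depth, max_depth=4):
    """Return a list of (lo, hi, label) cells; split a cell only
    while it contains both labels and depth allows."""
    inside = [(x, y) for x, y in points if lo <= x < hi]
    labels = {y for _, y in inside}
    if depth == max_depth or len(labels) <= 1:
        pos = sum(1 for _, y in inside if y == 1)
        label = 1 if inside and pos * 2 >= len(inside) else 0
        return [(lo, hi, label)]
    mid = (lo + hi) / 2
    return (build_partition(points, lo, mid, depth + 1, max_depth) +
            build_partition(points, mid, hi, depth + 1, max_depth))

pts = [(0.1, 0), (0.2, 0), (0.6, 1), (0.7, 1), (0.9, 1)]
cells = build_partition(pts, 0.0, 1.0, 0)
print(cells)  # the partition adapts: pure cells stop splitting early
```

The union of cells labeled 1 is the estimate of the Bayes set; adaptivity means the tree refines only where the data demand it.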
AdaptDB: Adaptive Partitioning for Distributed Joins
Big data analytics often involves complex join queries over two or more tables. Such join processing is expensive in a distributed setting both because large amounts of data must be read from disk, and because of data shuffling across the network. Many techniques based on data partitioning have been proposed to reduce the amount of data that must be accessed, often focusing on finding the best partitioning scheme for a particular workload, rather than adapting to changes in the workload over time. In this paper, we present AdaptDB, an adaptive storage manager for analytical database workloads in a distributed setting. It works by partitioning datasets across a cluster and incrementally refining data partitioning as queries are run. AdaptDB introduces a novel hyper-join that avoids expensive data shuffling by identifying storage blocks of the joining tables that overlap on the join attribute, and only joining those blocks. Hyper-join performs well when each block in one table overlaps with few blocks in the other table, since that will minimize the number of blocks that have to be accessed. To minimize the number of overlapping blocks for common join queries, AdaptDB uses smooth repartitioning to repartition small portions of the tables on join attributes as queries run. A prototype of AdaptDB running on top of Spark improves query performance by 2-3x on TPC-H as well as a real-world dataset, versus a system that employs scans and shuffle-joins.
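The block-selection step of the hyper-join idea can be sketched as follows. The assumption here (mine, not AdaptDB's actual metadata format) is that each block carries min/max statistics on the join attribute; only block pairs whose ranges intersect can contribute join results, so everything else is skipped without shuffling.

```python
# Hedged sketch of hyper-join block pruning: given per-block
# (id, min_key, max_key) metadata on the join attribute, emit
# only the block pairs whose key ranges intersect.
def overlapping_block_pairs(blocks_a, blocks_b):
    """blocks_* : list of (block_id, min_key, max_key)."""
    pairs = []
    for ida, lo_a, hi_a in blocks_a:
        for idb, lo_b, hi_b in blocks_b:
            if lo_a <= hi_b and lo_b <= hi_a:  # ranges intersect
                pairs.append((ida, idb))
    return pairs

A = [("a0", 1, 10), ("a1", 11, 20)]
B = [("b0", 5, 12), ("b1", 30, 40)]
print(overlapping_block_pairs(A, B))  # [('a0', 'b0'), ('a1', 'b0')]
```

This is also why smooth repartitioning helps: the tighter and less overlapping the block ranges become, the fewer pairs survive the pruning step.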