    GraphLab: A New Framework for Parallel Machine Learning

    Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive, while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies while ensuring data consistency and achieving a high degree of parallel performance. We demonstrate the expressiveness of the GraphLab framework by designing and implementing parallel versions of belief propagation, Gibbs sampling, Co-EM, Lasso and Compressed Sensing. We show that using GraphLab we can achieve excellent parallel performance on large-scale real-world problems.

    Parallel path consistency

    Journal Article. Filtering algorithms are well accepted as a means of speeding up the solution of the consistent labeling problem (CLP). Although path consistency (PC) filters more effectively than arc consistency (AC), AC is still the preferred technique because it has a much lower time complexity. We are implementing parallel path consistency algorithms on multiprocessors and comparing their performance to the best sequential and parallel arc consistency algorithms. We also intend to categorize the relation between graph structure and algorithm performance. Preliminary work has shown linear performance increases for parallelized path consistency and has also shown that in many cases performance is significantly better than the theoretical worst case. These two results lead us to believe that parallel path consistency may be a superior filtering technique. Finally, we have explored an outer-product computational formulation of path consistency and obtained excellent results with it on a Connection Machine.
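
    To make the filtering idea in this abstract concrete, here is a minimal sequential sketch of path consistency in the textbook PC-1 style. It is not the parallel algorithm the article describes. The function name `path_consistency`, the dictionary-of-pairs representation, and the example problems are assumptions introduced for illustration only.

```python
from itertools import product


def path_consistency(domains, allowed):
    """Sequential PC-1 style sketch (illustrative, not the article's method).

    domains: list of sets, one per variable.
    allowed: dict mapping (i, j) with i != j to the set of permitted
             value pairs (a, b); unlisted variable pairs are unconstrained.
    Returns a dict (i, j) -> set of value pairs that survive filtering.
    """
    n = len(domains)
    rel = {}
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if (i, j) in allowed:
                rel[i, j] = set(allowed[i, j])
            elif (j, i) in allowed:
                # Reuse the constraint given in the opposite direction.
                rel[i, j] = {(b, a) for a, b in allowed[j, i]}
            else:
                rel[i, j] = set(product(domains[i], domains[j]))

    changed = True
    while changed:  # repeat until no relation shrinks any further
        changed = False
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    if len({i, j, k}) < 3:
                        continue
                    # Keep (a, b) only if some value c of variable k
                    # is compatible with both a and b.
                    keep = {
                        (a, b)
                        for (a, b) in rel[i, j]
                        if any(
                            (a, c) in rel[i, k] and (c, b) in rel[k, j]
                            for c in domains[k]
                        )
                    }
                    if keep != rel[i, j]:
                        rel[i, j] = keep
                        changed = True
    return rel


# Hypothetical example: three mutually different variables.
not_equal_2 = {(0, 1), (1, 0)}
two_colors = path_consistency(
    [{0, 1}] * 3,
    {(0, 1): not_equal_2, (0, 2): not_equal_2, (1, 2): not_equal_2},
)

not_equal_3 = {(a, b) for a in range(3) for b in range(3) if a != b}
three_colors = path_consistency(
    [{0, 1, 2}] * 3,
    {(0, 1): not_equal_3, (0, 2): not_equal_3, (1, 2): not_equal_3},
)
```

    With two values per variable the triangle of "not equal" constraints is arc consistent, yet path consistency empties every relation and so exposes the problem as unsolvable. With three values nothing is pruned. The triple loop over (k, i, j) is the source of the higher time complexity the abstract mentions.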

    High-Performance Distributed ML at Scale through Parameter Server Consistency Models

    As Machine Learning (ML) applications increase in data size and model complexity, practitioners turn to distributed clusters to satisfy the increased computational and memory demands. Unfortunately, effective use of clusters for ML requires considerable expertise in writing distributed code, while highly-abstracted frameworks like Hadoop have not, in practice, approached the performance seen in specialized ML implementations. The recent Parameter Server (PS) paradigm is a middle ground between these extremes, allowing easy conversion of single-machine parallel ML applications into distributed ones, while maintaining high throughput through relaxed "consistency models" that allow inconsistent parameter reads. However, due to insufficient theoretical study, it is not clear which of these consistency models can really ensure correct ML algorithm output; at the same time, there remain many theoretically-motivated but undiscovered opportunities to maximize computational throughput. Motivated by this challenge, we study both the theoretical guarantees and empirical behavior of iterative-convergent ML algorithms in existing PS consistency models. We then use the gleaned insights to improve a consistency model using an "eager" PS communication mechanism, and implement it as a new PS system that enables ML algorithms to reach their solution more quickly. Comment: 19 pages, 2 figure
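
    The abstract does not say which relaxed consistency models it studies. As one illustration of what such a model can look like, the sketch below assumes a bounded-staleness rule: a worker may run ahead of the slowest worker by at most a fixed number of iterations. The class name `BoundedStalenessClock` and its methods are hypothetical and do not come from any real parameter-server API.

```python
class BoundedStalenessClock:
    """Toy bookkeeping for a bounded-staleness rule (an assumed model)."""

    def __init__(self, num_workers, staleness):
        self.clocks = [0] * num_workers  # iterations finished per worker
        self.staleness = staleness       # maximum lead over the slowest worker

    def tick(self, worker):
        """Record that `worker` has finished one more iteration."""
        self.clocks[worker] += 1

    def can_proceed(self, worker):
        """True if `worker` is within the staleness bound of the slowest worker."""
        return self.clocks[worker] - min(self.clocks) <= self.staleness


# Hypothetical run with two workers and a staleness bound of 1.
clock = BoundedStalenessClock(num_workers=2, staleness=1)
clock.tick(0)
one_ahead = clock.can_proceed(0)   # worker 0 leads by 1 iteration
clock.tick(0)
two_ahead = clock.can_proceed(0)   # worker 0 leads by 2 iterations
clock.tick(1)
caught_up = clock.can_proceed(0)   # the lead is back to 1 iteration
```

    A staleness of 0 reduces this rule to fully synchronous execution. Larger values trade fresher parameter reads for less waiting, which is the throughput-versus-correctness tension the abstract describes.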

    Towards Scalable Parallel Fibonacci Heap Implementation

    With the advancement of multiprocessor systems, sequential algorithms are being investigated and gradually replaced by their concurrent equivalents to exploit parallel architectures effectively. Parallel algorithms improve performance by dividing a task into a number of processes (or threads) that can be scheduled and executed simultaneously on independent processing units. Efficient parallel counterparts of various well-known basic algorithms and data structures have been explored and published as popular libraries. However, advanced data structures and algorithms have not seen similar investigation, mainly because they involve many optimization steps backed by considerable state, and finding a safe and efficient parallel implementation isn’t an easy endeavor. Safety is of utmost importance for shared-memory parallel implementations, as it provides the basis for the consistency of any data structure or algorithm. Well-known tools such as locks, semaphores, and atomic operations assist in building safe parallel implementations, but using them effectively and with well-defined synchronization is a key factor in the overall performance of any data structure or algorithm. This paper explores an advanced data structure, the Fibonacci heap, and its operations to evaluate its implementation using two different synchronization mechanisms: coarse-grained and fine-grained. The analysis in this paper shows that a fine-grained synchronized Fibonacci heap implementation with certain relaxed semantics scales better with growing concurrency than the coarse-grained synchronized implementation.

    Distributed GraphLab: A Framework for Machine Learning in the Cloud

    While high-level data parallel frameworks, like MapReduce, simplify the design and implementation of large-scale data processing systems, they do not naturally or efficiently support many important data mining and machine learning algorithms and can lead to inefficient learning systems. To help fill this critical void, we introduced the GraphLab abstraction which naturally expresses asynchronous, dynamic, graph-parallel computation while ensuring data consistency and achieving a high degree of parallel performance in the shared-memory setting. In this paper, we extend the GraphLab framework to the substantially more challenging distributed setting while preserving strong data consistency guarantees. We develop graph-based extensions to pipelined locking and data versioning to reduce network congestion and mitigate the effect of network latency. We also introduce fault tolerance to the GraphLab abstraction using the classic Chandy-Lamport snapshot algorithm and demonstrate how it can be easily implemented by exploiting the GraphLab abstraction itself. Finally, we evaluate our distributed implementation of the GraphLab abstraction on a large Amazon EC2 deployment and show 1-2 orders of magnitude performance gains over Hadoop-based implementations. Comment: VLDB201