
    GraphLab: A New Framework for Parallel Machine Learning

    Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive, while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies while ensuring data consistency and achieving a high degree of parallel performance. We demonstrate the expressiveness of the GraphLab framework by designing and implementing parallel versions of belief propagation, Gibbs sampling, Co-EM, Lasso and Compressed Sensing. We show that using GraphLab we can achieve excellent parallel performance on large-scale real-world problems.

    The Lock-free k-LSM Relaxed Priority Queue

    Priority queues are data structures which store keys in an ordered fashion to allow efficient access to the minimal (maximal) key. Priority queues are essential for many applications, e.g., Dijkstra's single-source shortest path algorithm, branch-and-bound algorithms, and prioritized schedulers. Efficient multiprocessor computing requires implementations of basic data structures that can be used concurrently and scale to large numbers of threads and cores. Lock-free data structures promise superior scalability by avoiding blocking synchronization primitives, but the delete-min operation is an inherent scalability bottleneck in concurrent priority queues. Recent work has focused on alleviating this obstacle either by batching operations or by relaxing the requirements of the delete-min operation. We present a new, lock-free priority queue that relaxes the delete-min operation so that it is allowed to delete any of the ρ+1 smallest keys, where ρ is a runtime-configurable parameter. Additionally, the behavior is identical to a non-relaxed priority queue for items added and removed by the same thread. The priority queue is built from a logarithmic number of sorted arrays in a way similar to log-structured merge-trees. We experimentally compare our priority queue to recent state-of-the-art lock-free priority queues, both with relaxed and non-relaxed semantics, showing high performance and good scalability of our approach. Comment: Short version as ACM PPoPP'15 poster
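The relaxed delete-min semantics described above can be illustrated with a small sequential sketch. This is only a toy model of the relaxation (any of the ρ+1 smallest keys may be returned); it is not lock-free, is not the k-LSM data structure, and does not model the per-thread guarantee. The class name `RelaxedPQ`, its methods, and the random choice within the window are assumptions made for illustration.

```python
import bisect
import random


class RelaxedPQ:
    """Sequential toy model of relaxed delete-min (hypothetical, not the k-LSM)."""

    def __init__(self, rho, seed=None):
        if rho < 0:
            raise ValueError("rho must be non-negative")
        self.rho = rho
        self._keys = []                    # kept sorted in ascending order
        self._rng = random.Random(seed)

    def insert(self, key):
        # Insert while keeping the list sorted.
        bisect.insort(self._keys, key)

    def delete_min(self):
        # Remove and return any one of the rho+1 smallest keys.
        if not self._keys:
            raise IndexError("delete_min from empty queue")
        window = min(self.rho + 1, len(self._keys))
        return self._keys.pop(self._rng.randrange(window))
```

With `rho=0` the sketch behaves like an ordinary priority queue; larger values of `rho` widen the set of keys that a single `delete_min` call may return.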

    Waiting time dynamics of priority-queue networks

    We study the dynamics of priority-queue networks, generalizations of the binary interacting priority queue model introduced by Oliveira and Vazquez [Physica A 388, 187 (2009)]. We find that the original AND-type protocol for interacting tasks is not scalable for queue networks with loops because the dynamics becomes frozen due to priority conflicts. We then consider a scalable interaction protocol, an OR-type one, and examine the effects of the network topology and the number of queues on the waiting time distributions of the priority-queue networks, finding that they exhibit power-law tails in all cases considered, yet with model-dependent power-law exponents. We also show that the synchronicity in task executions, giving rise to priority conflicts in the priority-queue networks, is a relevant factor in the queue dynamics that can change the power-law exponent of the waiting time distribution. Comment: 5 pages, 3 figures, minor changes, final published version

    A framework and simulation engine for studying artificial life

    The area of computer-generated artificial life-forms is a relatively recent field of inter-disciplinary study that involves mathematical modelling, physical intuition, and ideas from chemistry, biology and computational science. Although the attribution of “life” to non-biological systems is still controversial, several groups agree that certain emergent properties can be ascribed to computer-simulated systems that can be constructed to “live” in a simulated environment. In this paper we discuss some of the issues and infrastructure necessary to construct a simulation laboratory for the study of computer-generated artificial life-forms. We review possible technologies and present some preliminary studies based around simple models.

    Satellite downlink scheduling problem: A case study

    The synthetic aperture radar (SAR) technology enables satellites to efficiently acquire high quality images of the Earth's surface. This generates significant communication traffic from the satellite to the ground stations, and, thus, image downlinking often becomes the bottleneck in the efficiency of the whole system. In this paper we address the downlink scheduling problem for Canada's Earth observing SAR satellite, RADARSAT-2. Being an applied problem, downlink scheduling is characterised by a number of constraints that make it difficult not only to optimise the schedule but even to produce a feasible solution. We propose a fast schedule generation procedure that abstracts the problem-specific constraints and provides a simple interface to optimisation algorithms. By comparing empirically several standard meta-heuristics applied to the problem, we select the most suitable one and show that it is clearly superior to the approach currently in use. Comment: 23 pages

    Parallel Working-Set Search Structures

    In this paper we present two versions of a parallel working-set map on p processors that supports searches, insertions and deletions. In both versions, the total work of all operations when the map has size at least p is bounded by the working-set bound, i.e., the cost of an item depends on how recently it was accessed (for some linearization): accessing an item in the map with recency r takes O(1+log r) work. In the simpler version each map operation has O((log p)^2+log n) span (where n is the maximum size of the map). In the pipelined version each map operation on an item with recency r has O((log p)^2+log r) span. (Operations in parallel may have overlapping span; span is additive only for operations in sequence.) Both data structures are designed to be used by a dynamic multithreading parallel program that at each step executes a unit-time instruction or makes a data structure call. To achieve the stated bounds, the pipelined data structure requires a weak-priority scheduler, which supports a limited form of 2-level prioritization. At the end we explain how the results translate to practical implementations using work-stealing schedulers. To the best of our knowledge, this is the first parallel implementation of a self-adjusting search structure where the cost of an operation adapts to the access sequence. A corollary of the working-set bound is that it achieves work static optimality: the total work is bounded by the access costs in an optimal static search tree. Comment: Authors' version of a paper accepted to SPAA 201
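As a rough illustration of the working-set bound mentioned above, the following sequential sketch computes the recency r of each access and sums 1 + log2(r) over an access sequence. It is not the parallel data structure from the paper. The function names are hypothetical, and two details are assumptions: recency is taken as the item's rank in most-recently-used order (1 = most recent), and a first access is charged as if the item were the least recently used.

```python
import math


def recencies(accesses):
    """Return the recency rank of each access (1 = most recently used)."""
    mru = []   # most recently used item first
    out = []
    for item in accesses:
        if item in mru:
            r = mru.index(item) + 1
            mru.remove(item)
        else:
            # Assumption: a first access counts as least recently used.
            r = len(mru) + 1
        mru.insert(0, item)
        out.append(r)
    return out


def working_set_cost(accesses):
    """Sum of 1 + log2(r) over all accesses, ignoring constant factors."""
    return sum(1 + math.log2(r) for r in recencies(accesses))
```

Sequences that repeatedly touch the same few items keep r small, so the summed cost grows more slowly than it would under a uniform log n charge per access.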