GraphLab: A New Framework for Parallel Machine Learning

Author: Bickson Danny
Gonzalez Joseph
Guestrin Carlos
Hellerstein Joseph M.
Kyrola Aapo
Low Yucheng
Publication date: 01/01/2010

Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive, while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies while ensuring data consistency and achieving a high degree of parallel performance. We demonstrate the expressiveness of the GraphLab framework by designing and implementing parallel versions of belief propagation, Gibbs sampling, Co-EM, Lasso and Compressed Sensing. We show that using GraphLab we can achieve excellent parallel performance on large-scale real-world problems.

arXiv.org e-Print Archive

CiteSeerX

Synthesis from specifications : basic concepts

Author: Gajski Daniel D.
Narayan Sanjiv
Vahid Frank
Publication venue: eScholarship, University of California
Publication date: 29/01/1990

The need has evolved for a synthesis tool at the computer system level. SpecSyn is one such tool. Basically, it will view the world as a set of chips communicating via protocols. Thus, an abstract specification would get synthesized into a set of one or more interconnected chips. From that point, detail is added to each chip's specification until its structure is synthesized or it is determined that a prefabricated chip similar in functionality can be used.
Features of such a tool include executable specifications from which to synthesize, constraint-driven partitioning of the specifications into components (chips) and synthesis of interfaces between them, translation into VHDL and synthesis into VHDL structures of micro-architectural components, and the use of other tools (e.g. MILO, a micro-architecture and logic optimizer, and LES, a layout expert system) to evaluate the quality of the chip layout generated from the VHDL description.
A major component of SpecSyn is SpecCharts, a high-level specification language amenable to system-level synthesis, able to represent designs from system to register transfer levels. The language consists of a hierarchy of states, represented in combined graphical and textual form, at the same time catering to the expression of concurrent behavior and specification of constraints. With it we have specified several Intel chips as well as higher-level systems, and have found it to be quite powerful and easy to use.
SpecSyn will have a graphical interface, from which the user can at any time view or edit a SpecChart, translate to VHDL and simulate, view statistics provided by estimators (such as area, speed, and pins), store and retrieve SpecCharts, apply basic SpecChart operations, as well as apply the partitioning algorithms or interface synthesizer.
Providing access to a wide range of tools, having a single language represent the design throughout the synthesis process, and having user-specified constraints allow the user to have varying amounts of control over the synthesis process.

eScholarship - University of California

The Lock-free $k$-LSM Relaxed Priority Queue

Author: Gruber Jakob
Träff Jesper Larsson
Tsigas Philippas
Wimmer Martin
Publication date: 01/01/2015

Priority queues are data structures which store keys in an ordered fashion to allow efficient access to the minimal (maximal) key. Priority queues are essential for many applications, e.g., Dijkstra's single-source shortest path algorithm, branch-and-bound algorithms, and prioritized schedulers. Efficient multiprocessor computing requires implementations of basic data structures that can be used concurrently and scale to large numbers of threads and cores. Lock-free data structures promise superior scalability by avoiding blocking synchronization primitives, but the delete-min operation is an inherent scalability bottleneck in concurrent priority queues. Recent work has focused on alleviating this obstacle either by batching operations, or by relaxing the requirements to the delete-min operation. We present a new, lock-free priority queue that relaxes the delete-min operation so that it is allowed to delete any of the $\rho+1$ smallest keys, where $\rho$ is a runtime configurable parameter. Additionally, the behavior is identical to a non-relaxed priority queue for items added and removed by the same thread. The priority queue is built from a logarithmic number of sorted arrays in a way similar to log-structured merge-trees. We experimentally compare our priority queue to recent state-of-the-art lock-free priority queues, both with relaxed and non-relaxed semantics, showing high performance and good scalability of our approach. Comment: Short version as ACM PPoPP'15 poster
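
The relaxed delete-min semantics described in this abstract can be illustrated with a small sequential sketch. It is not the authors' lock-free k-LSM structure: it uses a binary heap from the Python standard library, runs in a single thread, and does not model the per-thread guarantee the abstract mentions. The class name `RelaxedPQ`, its methods, and the `seed` parameter are hypothetical names introduced only for this illustration.

```python
import heapq
import random


class RelaxedPQ:
    """Sequential toy model (an assumption, not the k-LSM itself):
    delete_min may return any of the rho+1 smallest keys."""

    def __init__(self, rho, seed=0):
        self.rho = rho                    # relaxation parameter
        self._heap = []                   # ordinary binary min-heap
        self._rng = random.Random(seed)   # seeded for reproducibility

    def insert(self, key):
        heapq.heappush(self._heap, key)

    def delete_min(self):
        if not self._heap:
            raise IndexError("delete_min from an empty queue")
        # Pop the (at most) rho+1 smallest keys as candidates.
        k = min(self.rho + 1, len(self._heap))
        candidates = [heapq.heappop(self._heap) for _ in range(k)]
        # Return one candidate chosen at random; push the others back.
        chosen = candidates.pop(self._rng.randrange(k))
        for key in candidates:
            heapq.heappush(self._heap, key)
        return chosen


if __name__ == "__main__":
    pq = RelaxedPQ(rho=2)
    for key in [7, 3, 9, 1, 5]:
        pq.insert(key)
    # Each call returns one of the three smallest keys still in the queue.
    print([pq.delete_min() for _ in range(5)])
```

With `rho=0` the sketch degenerates to an exact priority queue; larger values of `rho` trade ordering precision for freedom in which key is removed, which is the freedom a concurrent implementation can exploit to reduce contention.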

arXiv.org e-Print Archive

Crossref

Chalmers Research

Waiting time dynamics of priority-queue networks

Author: A. Cobham
Byungjoon Min
D. Gross
G. Caldarelli
I.-M. Kim
K.-I. Goh
S. Wasserman
Publication venue: 'American Physical Society (APS)'
Publication date: 20/07/2009

We study the dynamics of priority-queue networks, generalizations of the binary interacting priority queue model introduced by Oliveira and Vazquez [Physica A 388, 187 (2009)]. We found that the original AND-type protocol for interacting tasks is not scalable for the queue networks with loops because the dynamics becomes frozen due to the priority conflicts. We then consider a scalable interaction protocol, an OR-type one, and examine the effects of the network topology and the number of queues on the waiting time distributions of the priority-queue networks, finding that they exhibit power-law tails in all cases considered, yet with model-dependent power-law exponents. We also show that the synchronicity in task executions, giving rise to priority conflicts in the priority-queue networks, is a relevant factor in the queue dynamics that can change the power-law exponent of the waiting time distribution. Comment: 5 pages, 3 figures, minor changes, final published version

arXiv.org e-Print Archive

Crossref

A framework and simulation engine for studying artificial life

Author: Hawick K.A.
James H.A.
Scogings C.
Publication venue: 'Massey University'
Publication date: 01/01/2004

The area of computer-generated artificial life-forms is a relatively recent field of inter-disciplinary study that involves mathematical modelling, physical intuition, and ideas from chemistry, biology and computational science. Although the attribution of “life” to non-biological systems is still controversial, several groups agree that certain emergent properties can be ascribed to computer-simulated systems that can be constructed to “live” in a simulated environment. In this paper we discuss some of the issues and infrastructure necessary to construct a simulation laboratory for the study of computer-generated artificial life-forms. We review possible technologies and present some preliminary studies based around simple models.

Massey Research Online

CiteSeerX

Satellite downlink scheduling problem: A case study

Author: Abraham P. Punnen
Ahuja
Ahuja
Barbulescu
Bard
Beaumet
Benoist
Bianchessi
Bianchessi
Chen
Cordeau
Daniel Karapetyan
Demeulemeester
Donati
Gabrel
Glover
Hartmann
Karapetyan
Kolisch
Krishna T. Malladi
Lin
Marinelli
Martin
Martì
Rojanasoonthon
Rom
Sawik
Snezana Mitrovic Minic
Sterna
Sun
Vasquez
Verfaillie
Wang
Yenisey
Zufferey
Publication venue: 'Elsevier BV'
Publication date: 14/02/2015

Synthetic aperture radar (SAR) technology enables satellites to efficiently acquire high-quality images of the Earth's surface. This generates significant communication traffic from the satellite to the ground stations, and, thus, image downlinking often becomes the bottleneck in the efficiency of the whole system. In this paper we address the downlink scheduling problem for Canada's Earth observing SAR satellite, RADARSAT-2. Being an applied problem, downlink scheduling is characterised by a number of constraints that make it difficult not only to optimise the schedule but even to produce a feasible solution. We propose a fast schedule generation procedure that abstracts the problem-specific constraints and provides a simple interface to optimisation algorithms. By comparing empirically several standard meta-heuristics applied to the problem, we select the most suitable one and show that it is clearly superior to the approach currently in use. Comment: 23 pages

arXiv.org e-Print Archive

University of Essex Research Repository

CiteSeerX

Crossref

Parallel Working-Set Search Structures

Author: Akhremtsev Yaroslav
Crauser A.
Frias Leonor
Oyama Y.
Richard
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 11/07/2018

In this paper we present two versions of a parallel working-set map on p processors that supports searches, insertions and deletions. In both versions, the total work of all operations when the map has size at least p is bounded by the working-set bound, i.e., the cost of an item depends on how recently it was accessed (for some linearization): accessing an item in the map with recency r takes O(1+log r) work. In the simpler version each map operation has O((log p)^2+log n) span (where n is the maximum size of the map). In the pipelined version each map operation on an item with recency r has O((log p)^2+log r) span. (Operations in parallel may have overlapping span; span is additive only for operations in sequence.) Both data structures are designed to be used by a dynamic multithreading parallel program that at each step executes a unit-time instruction or makes a data structure call. To achieve the stated bounds, the pipelined data structure requires a weak-priority scheduler, which supports a limited form of 2-level prioritization. At the end we explain how the results translate to practical implementations using work-stealing schedulers. To the best of our knowledge, this is the first parallel implementation of a self-adjusting search structure where the cost of an operation adapts to the access sequence. A corollary of the working-set bound is that it achieves work static optimality: the total work is bounded by the access costs in an optimal static search tree. Comment: Authors' version of a paper accepted to SPAA 2018
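
As a worked illustration of the working-set bound quoted in this abstract, the sketch below computes the recency r of each access in a sequential access sequence and sums 1 + log2(r). It only evaluates the bound; it is not the parallel map from the paper. Two assumptions are made here: recency is taken to be the number of distinct items accessed since the previous access to the same item, counting the item itself (so the most recently accessed item has r = 1), and a first access is charged as if the item were the least recently used. The function names `recencies` and `working_set_cost` are hypothetical.

```python
import math


def recencies(accesses):
    """Return the recency of each access (assumed definition: rank of the
    item in most-recently-used order, where 1 is the most recent)."""
    order = []   # items in access order, most recently accessed last
    result = []
    for item in accesses:
        if item in order:
            idx = order.index(item)
            r = len(order) - idx      # 1 if the item was accessed last
            order.pop(idx)
        else:
            r = len(order) + 1        # assumption: first access ranks last
        order.append(item)
        result.append(r)
    return result


def working_set_cost(accesses):
    """Sum of 1 + log2(r) over all accesses, ignoring constant factors."""
    return sum(1 + math.log2(r) for r in recencies(accesses))


if __name__ == "__main__":
    seq = ["a", "b", "a", "a", "c", "b"]
    print(recencies(seq))
    print(working_set_cost(seq))
```

Repeatedly accessing the same item keeps r at 1, so each such access contributes only the constant term, which is the sense in which the cost adapts to the access sequence.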

arXiv.org e-Print Archive

Crossref