5,690 research outputs found

GraphLab: A New Framework for Parallel Machine Learning

Author: Bickson Danny
Gonzalez Joseph
Guestrin Carlos
Hellerstein Joseph M.
Kyrola Aapo
Low Yucheng
Publication date: 01/01/2010

Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive, while low-level tools like MPI and Pthreads leave ML experts repeatedly solving the same design challenges. By targeting common patterns in ML, we developed GraphLab, which improves upon abstractions like MapReduce by compactly expressing asynchronous iterative algorithms with sparse computational dependencies while ensuring data consistency and achieving a high degree of parallel performance. We demonstrate the expressiveness of the GraphLab framework by designing and implementing parallel versions of belief propagation, Gibbs sampling, Co-EM, Lasso and Compressed Sensing. We show that using GraphLab we can achieve excellent parallel performance on large-scale real-world problems.

arXiv.org e-Print Archive

CiteSeerX

Maximum likelihood reconstruction for Ising models with asynchronous updates

Author: C. Kipnis
Erik Aurell
Hong-Li Zeng
John Hertz
Mikko Alava
Yasser Roudi
Publication venue: American Physical Society (APS)
Publication date: 01/01/2013

We describe how the couplings in an asynchronous kinetic Ising model can be inferred. We consider two cases: one in which we know both the spin history and the update times, and one in which we know only the spin history. For the first case, we show that one can average over all possible choices of update times to obtain a learning rule that depends only on spin correlations and can also be derived from the equations of motion for the correlations. For the second case, the same rule can be derived within a further decoupling approximation. We study all methods numerically for fully asymmetric Sherrington-Kirkpatrick models, varying the data length, system size, temperature, and external field. Good convergence is observed in accordance with the theoretical expectations.

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

Aaltodoc Publication Archive

A Scalable Asynchronous Distributed Algorithm for Topic Modeling

Author: Asuncion A.
Cormen T. H.
Gonzalez J. E.
Snyder P.
Yan F.
Publication date: 16/12/2014

Learning meaningful topic models from massive document collections, which contain millions of documents and billions of tokens, is challenging for two reasons. First, one needs to deal with a large number of topics (typically on the order of thousands). Second, one needs a scalable and efficient way of distributing the computation across multiple machines. In this paper we present a novel algorithm, F+Nomad LDA, which simultaneously tackles both these problems. In order to handle a large number of topics we use an appropriately modified Fenwick tree. This data structure allows us to sample from a multinomial distribution over $T$ items in $O(\log T)$ time. Moreover, when topic counts change, the data structure can be updated in $O(\log T)$ time. In order to distribute the computation across multiple processors we present a novel asynchronous framework inspired by the Nomad algorithm of \cite{YunYuHsietal13}. We show that F+Nomad LDA significantly outperforms the state of the art on massive problems which involve millions of documents, billions of words, and thousands of topics.
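The logarithmic sampling and update costs mentioned in this abstract can be illustrated with a textbook Fenwick (binary indexed) tree. The sketch below is not the paper's "appropriately modified" structure, whose details the abstract does not give. It is a plain Fenwick tree over unnormalized topic weights, and the names `FenwickSampler`, `update`, `find` and `sample` are hypothetical, chosen only for illustration.

```python
import random


class FenwickSampler:
    """Plain Fenwick tree over T non-negative weights (illustrative only)."""

    def __init__(self, weights):
        self.n = len(weights)
        self.tree = [0.0] * (self.n + 1)  # 1-based internal array
        for i, w in enumerate(weights):
            self.update(i, w)  # simple O(T log T) build

    def update(self, index, delta):
        """Add delta to the weight of item `index` in O(log T)."""
        i = index + 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i  # move to the next node that covers this item

    def total(self):
        """Sum of all weights, read in O(log T)."""
        s, i = 0.0, self.n
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

    def find(self, u):
        """Return the item k whose cumulative-weight interval contains u.

        Descends the implicit tree in O(log T); assumes 0 <= u < total().
        """
        pos = 0
        step = 1 << self.n.bit_length()
        while step:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] <= u:
                pos = nxt
                u -= self.tree[nxt]
            step >>= 1
        return min(pos, self.n - 1)  # guard against float round-off at the top end

    def sample(self, rng=random):
        """Draw an item with probability proportional to its weight."""
        return self.find(rng.random() * self.total())


# Hypothetical unnormalized weights for T = 4 topics.
sampler = FenwickSampler([2, 0, 5, 3])
sampler.update(1, 4)      # a topic count changes: weights become [2, 4, 5, 3]
topic = sampler.sample()  # index in 0..3, drawn proportionally to the weights
```

Storing unnormalized weights means a count change touches only the O(log T) nodes on one root-ward path, and the normalizing constant is read off the tree rather than recomputed.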

arXiv.org e-Print Archive

CiteSeerX

Crossref