A Scalable Asynchronous Distributed Algorithm for Topic Modeling

Author: Asuncion A.
Cormen T. H.
Gonzalez J. E.
Snyder P.
Yan F.
Publication date: 16/12/2014

Learning meaningful topic models from massive document collections, which contain millions of documents and billions of tokens, is challenging for two reasons. First, one needs to deal with a large number of topics (typically on the order of thousands). Second, one needs a scalable and efficient way of distributing the computation across multiple machines. In this paper we present a novel algorithm, F+Nomad LDA, which simultaneously tackles both problems. To handle a large number of topics we use an appropriately modified Fenwick tree. This data structure allows us to sample from a multinomial distribution over $T$ items in $O(\log T)$ time. Moreover, when topic counts change, the data structure can be updated in $O(\log T)$ time. To distribute the computation across multiple processors we present a novel asynchronous framework inspired by the Nomad algorithm of \cite{YunYuHsietal13}. We show that F+Nomad LDA significantly outperforms the state of the art on massive problems which involve millions of documents, billions of words, and thousands of topics.
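The logarithmic sampling and update costs mentioned in the abstract can be illustrated with a textbook Fenwick (binary indexed) tree. The sketch below is only an assumption about the general idea: it is a plain Fenwick tree, not the authors' modified variant, and the class name `FenwickSampler` and its methods are hypothetical names chosen for this example.

```python
import random


class FenwickSampler:
    """Textbook Fenwick tree over T non-negative weights (hypothetical helper).

    Not the modified tree from the paper; it only illustrates O(log T)
    sampling and O(log T) weight updates.
    """

    def __init__(self, weights):
        self.n = len(weights)
        self.tree = [0.0] * (self.n + 1)  # 1-based internal array
        for i, w in enumerate(weights):
            self.update(i, w)  # simple O(T log T) construction
        # Largest power of two not exceeding n, used by the descent in find().
        self.top = 1
        while self.top * 2 <= self.n:
            self.top *= 2

    def update(self, i, delta):
        """Add delta to the weight of item i (0-based) in O(log T)."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def total(self):
        """Sum of all weights, computed as the prefix sum over n items."""
        s, i = 0.0, self.n
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

    def find(self, u):
        """Return the 0-based item whose cumulative-weight interval contains u."""
        pos, step = 0, self.top
        while step > 0:
            nxt = pos + step
            if nxt <= self.n and self.tree[nxt] <= u:
                pos = nxt
                u -= self.tree[nxt]
            step //= 2
        # Clamp guards against u landing exactly on the total due to rounding.
        return min(pos, self.n - 1)

    def sample(self, rng=random):
        """Draw an item with probability proportional to its weight."""
        return self.find(rng.random() * self.total())
```

In a topic-model setting, the weights would play the role of unnormalised topic probabilities, `update` would be called when a topic count changes, and `sample` would draw a new topic assignment.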

arXiv.org e-Print Archive

CiteSeerX

Crossref

Explicit Model Checking of Very Large MDP using Partitioning and Secondary Storage

Author: A Aggarwal
A Bell
A Hartmanns
C Baier
DD Deavours
EM Clarke
G Norman
GD Penna
H Hermanns
HC Bohnenkamp
J Barnat
L de Alfaro
M Hammer
M Kwiatkowska
M Timmer
ML Puterman
MZ Kwiatkowska
R Alur
R Mehmood
S Edelkamp
S Evangelista
T Bao
U Stern
V Forejt
WJ Stewart
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 19/05/2016

The applicability of model checking is hindered by the state space explosion problem in combination with limited amounts of main memory. To extend its reach, the large available capacities of secondary storage such as hard disks can be exploited. Due to the specific performance characteristics of secondary storage technologies, specialised algorithms are required. In this paper, we present a technique to use secondary storage for probabilistic model checking of Markov decision processes. It combines state space exploration based on partitioning with a block-iterative variant of value iteration over the same partitions for the analysis of probabilistic reachability and expected-reward properties. A sparse matrix-like representation is used to store partitions on secondary storage in a compact format. All file accesses are sequential, and compression can be used without affecting runtime. The technique has been implemented within the Modest Toolset. We evaluate its performance on several benchmark models of up to 3.5 billion states. In the analysis of time-bounded properties on real-time models, our method neutralises the state space explosion induced by the time bound in its entirety.

Comment: The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-24953-7_1
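The block-iterative value iteration mentioned in the abstract can be sketched roughly as follows for maximum reachability probabilities. This is a minimal in-memory illustration under assumptions, not the Modest Toolset implementation: the function name `block_value_iteration`, the dictionary-based MDP encoding, and the tolerance `eps` are all hypothetical, and the partitions here are plain lists rather than files on secondary storage.

```python
def block_value_iteration(transitions, goal, partitions, eps=1e-9):
    """Maximum probability of reaching `goal`, iterating one partition at a time.

    transitions[s] is a list of actions; each action is a list of
    (probability, successor) pairs. Every state must be a key of transitions.
    """
    value = {s: (1.0 if s in goal else 0.0) for s in transitions}
    changed = True
    while changed:  # sweep over all partitions until no block changes
        changed = False
        for block in partitions:
            # Iterate within this block until it is locally converged,
            # so that it would only need to be loaded once per sweep.
            while True:
                delta = 0.0
                for s in block:
                    if s in goal or not transitions[s]:
                        continue  # goal and deadlock states keep their value
                    new = max(
                        sum(p * value[t] for p, t in action)
                        for action in transitions[s]
                    )
                    delta = max(delta, abs(new - value[s]))
                    value[s] = new
                if delta <= eps:
                    break
                changed = True
    return value


# Tiny hypothetical MDP: state 3 is the goal, state 2 is a sink.
mdp = {
    0: [[(0.5, 1), (0.5, 2)], [(1.0, 2)]],
    1: [[(1.0, 3)]],
    2: [[(1.0, 2)]],
    3: [],
}
result = block_value_iteration(mdp, goal={3}, partitions=[[0, 1], [2, 3]])
```

Converging each block locally before moving on is what would keep accesses to a stored partition sequential in an on-disk setting; the outer loop then repeats the sweep until values stop changing across blocks.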

arXiv.org e-Print Archive

Crossref