72,670 research outputs found
A Scalable Asynchronous Distributed Algorithm for Topic Modeling
Learning meaningful topic models with massive document collections which
contain millions of documents and billions of tokens is challenging because of
two reasons: First, one needs to deal with a large number of topics (typically
in the order of thousands). Second, one needs a scalable and efficient way of
distributing the computation across multiple machines. In this paper we present
a novel algorithm F+Nomad LDA which simultaneously tackles both these problems.
In order to handle large number of topics we use an appropriately modified
Fenwick tree. This data structure allows us to sample from a multinomial
distribution over items in time. Moreover, when topic counts
change the data structure can be updated in time. In order to
distribute the computation across multiple processor we present a novel
asynchronous framework inspired by the Nomad algorithm of
\cite{YunYuHsietal13}. We show that F+Nomad LDA significantly outperform
state-of-the-art on massive problems which involve millions of documents,
billions of words, and thousands of topics
Non-monotone Submodular Maximization with Nearly Optimal Adaptivity and Query Complexity
Submodular maximization is a general optimization problem with a wide range
of applications in machine learning (e.g., active learning, clustering, and
feature selection). In large-scale optimization, the parallel running time of
an algorithm is governed by its adaptivity, which measures the number of
sequential rounds needed if the algorithm can execute polynomially-many
independent oracle queries in parallel. While low adaptivity is ideal, it is
not sufficient for an algorithm to be efficient in practice---there are many
applications of distributed submodular optimization where the number of
function evaluations becomes prohibitively expensive. Motivated by these
applications, we study the adaptivity and query complexity of submodular
maximization. In this paper, we give the first constant-factor approximation
algorithm for maximizing a non-monotone submodular function subject to a
cardinality constraint that runs in adaptive rounds and makes
oracle queries in expectation. In our empirical study, we use
three real-world applications to compare our algorithm with several benchmarks
for non-monotone submodular maximization. The results demonstrate that our
algorithm finds competitive solutions using significantly fewer rounds and
queries.Comment: 12 pages, 8 figure
Parallel ACO with a Ring Neighborhood for Dynamic TSP
The current paper introduces a new parallel computing technique based on ant
colony optimization for a dynamic routing problem. In the dynamic traveling
salesman problem the distances between cities as travel times are no longer
fixed. The new technique uses a parallel model for a problem variant that
allows a slight movement of nodes within their Neighborhoods. The algorithm is
tested with success on several large data sets.Comment: 8 pages, 1 figure; accepted J. Information Technology Researc
- …