4,468 research outputs found
FORM version 4.0
We present version 4.0 of the symbolic manipulation system FORM. The most
important new features are manipulation of rational polynomials and the
factorization of expressions. Many other new functions and commands are also
added; some of them are very general, while others are designed for building
specific high level packages, such as one for Groebner bases. New is also the
checkpoint facility, that allows for periodic backups during long calculations.
Lastly, FORM 4.0 has become available as open source under the GNU General
Public License version 3.Comment: 26 pages. Uses axodra
Distributed Triangle Counting in the Graphulo Matrix Math Library
Triangle counting is a key algorithm for large graph analysis. The Graphulo
library provides a framework for implementing graph algorithms on the Apache
Accumulo distributed database. In this work we adapt two algorithms for
counting triangles, one that uses the adjacency matrix and another that also
uses the incidence matrix, to the Graphulo library for server-side processing
inside Accumulo. Cloud-based experiments show a similar performance profile for
these different approaches on the family of power law Graph500 graphs, for
which data skew increasingly bottlenecks. These results motivate the design of
skew-aware hybrid algorithms that we propose for future work.Comment: Honorable mention in the 2017 IEEE HPEC's Graph Challeng
Speculative Approximations for Terascale Analytics
Model calibration is a major challenge faced by the plethora of statistical
analytics packages that are increasingly used in Big Data applications.
Identifying the optimal model parameters is a time-consuming process that has
to be executed from scratch for every dataset/model combination even by
experienced data scientists. We argue that the incapacity to evaluate multiple
parameter configurations simultaneously and the lack of support to quickly
identify sub-optimal configurations are the principal causes. In this paper, we
develop two database-inspired techniques for efficient model calibration.
Speculative parameter testing applies advanced parallel multi-query processing
methods to evaluate several configurations concurrently. The number of
configurations is determined adaptively at runtime, while the configurations
themselves are extracted from a distribution that is continuously learned
following a Bayesian process. Online aggregation is applied to identify
sub-optimal configurations early in the processing by incrementally sampling
the training dataset and estimating the objective function corresponding to
each configuration. We design concurrent online aggregation estimators and
define halting conditions to accurately and timely stop the execution. We apply
the proposed techniques to distributed gradient descent optimization -- batch
and incremental -- for support vector machines and logistic regression models.
We implement the resulting solutions in GLADE PF-OLA -- a state-of-the-art Big
Data analytics system -- and evaluate their performance over terascale-size
synthetic and real datasets. The results confirm that as many as 32
configurations can be evaluated concurrently almost as fast as one, while
sub-optimal configurations are detected accurately in as little as a
fraction of the time
Sparse Allreduce: Efficient Scalable Communication for Power-Law Data
Many large datasets exhibit power-law statistics: The web graph, social
networks, text data, click through data etc. Their adjacency graphs are termed
natural graphs, and are known to be difficult to partition. As a consequence
most distributed algorithms on these graphs are communication intensive. Many
algorithms on natural graphs involve an Allreduce: a sum or average of
partitioned data which is then shared back to the cluster nodes. Examples
include PageRank, spectral partitioning, and many machine learning algorithms
including regression, factor (topic) models, and clustering. In this paper we
describe an efficient and scalable Allreduce primitive for power-law data. We
point out scaling problems with existing butterfly and round-robin networks for
Sparse Allreduce, and show that a hybrid approach improves on both.
Furthermore, we show that Sparse Allreduce stages should be nested instead of
cascaded (as in the dense case). And that the optimum throughput Allreduce
network should be a butterfly of heterogeneous degree where degree decreases
with depth into the network. Finally, a simple replication scheme is introduced
to deal with node failures. We present experiments showing significant
improvements over existing systems such as PowerGraph and Hadoop
- …