7,962 research outputs found
Petuum: A New Platform for Distributed Machine Learning on Big Data
What is a systematic way to efficiently apply a wide spectrum of advanced ML
programs to industrial scale problems, using Big Models (up to 100s of billions
of parameters) on Big Data (up to terabytes or petabytes)? Modern
parallelization strategies employ fine-grained operations and scheduling beyond
the classic bulk-synchronous processing paradigm popularized by MapReduce, or
even specialized graph-based execution that relies on graph representations of
ML programs. The variety of approaches tends to pull systems and algorithms
design in different directions, and it remains difficult to find a universal
platform applicable to a wide range of ML programs at scale. We propose a
general-purpose framework that systematically addresses data- and
model-parallel challenges in large-scale ML, by observing that many ML programs
are fundamentally optimization-centric and admit error-tolerant,
iterative-convergent algorithmic solutions. This presents unique opportunities
for an integrative system design, such as bounded-error network synchronization
and dynamic scheduling based on ML program structure. We demonstrate the
efficacy of these system designs versus well-known implementations of modern ML
algorithms, allowing ML programs to run in much less time and at considerably
larger model sizes, even on modestly-sized compute clusters.Comment: 15 pages, 10 figures, final version in KDD 2015 under the same titl
Euclidean Distance Matrices: Essential Theory, Algorithms and Applications
Euclidean distance matrices (EDM) are matrices of squared distances between
points. The definition is deceivingly simple: thanks to their many useful
properties they have found applications in psychometrics, crystallography,
machine learning, wireless sensor networks, acoustics, and more. Despite the
usefulness of EDMs, they seem to be insufficiently known in the signal
processing community. Our goal is to rectify this mishap in a concise tutorial.
We review the fundamental properties of EDMs, such as rank or
(non)definiteness. We show how various EDM properties can be used to design
algorithms for completing and denoising distance data. Along the way, we
demonstrate applications to microphone position calibration, ultrasound
tomography, room reconstruction from echoes and phase retrieval. By spelling
out the essential algorithms, we hope to fast-track the readers in applying
EDMs to their own problems. Matlab code for all the described algorithms, and
to generate the figures in the paper, is available online. Finally, we suggest
directions for further research.Comment: - 17 pages, 12 figures, to appear in IEEE Signal Processing Magazine
- change of title in the last revisio
- …