2,055 research outputs found
Petuum: A New Platform for Distributed Machine Learning on Big Data
What is a systematic way to efficiently apply a wide spectrum of advanced ML
programs to industrial scale problems, using Big Models (up to 100s of billions
of parameters) on Big Data (up to terabytes or petabytes)? Modern
parallelization strategies employ fine-grained operations and scheduling beyond
the classic bulk-synchronous processing paradigm popularized by MapReduce, or
even specialized graph-based execution that relies on graph representations of
ML programs. The variety of approaches tends to pull systems and algorithms
design in different directions, and it remains difficult to find a universal
platform applicable to a wide range of ML programs at scale. We propose a
general-purpose framework that systematically addresses data- and
model-parallel challenges in large-scale ML, by observing that many ML programs
are fundamentally optimization-centric and admit error-tolerant,
iterative-convergent algorithmic solutions. This presents unique opportunities
for an integrative system design, such as bounded-error network synchronization
and dynamic scheduling based on ML program structure. We demonstrate the
efficacy of these system designs versus well-known implementations of modern ML
algorithms, allowing ML programs to run in much less time and at considerably
larger model sizes, even on modestly-sized compute clusters.Comment: 15 pages, 10 figures, final version in KDD 2015 under the same titl
Scalable optimization algorithms for recommender systems
Recommender systems have now gained significant popularity and been widely used in many e-commerce applications. Predicting user preferences is a key step to providing high quality recommendations. In practice, however, suggestions made to users must not only consider user preferences in isolation; a good recommendation engine also needs to account for certain constraints. For instance, an online video rental that suggests multimedia items (e.g., DVDs) to its customers should consider the availability of DVDs in stock to reduce customer waiting times for accepted recommendations. Moreover, every user should receive a small but sufficient number of suggestions that the user is likely to be interested in.
This thesis aims to develop and implement scalable optimization algorithms that can be used (but are not restricted) to generate recommendations satisfying certain objectives and constraints like the ones above. State-of-the-art approaches lack efficiency and/or scalability in coping with large real-world instances, which may involve millions of users and items. First, we study large-scale matrix completion in the context of collaborative filtering in recommender systems. For such problems, we propose a set of novel shared-nothing algorithms which are designed to run on a small cluster of commodity nodes and outperform alternative approaches in terms of efficiency, scalability, and memory footprint. Next, we view our recommendation task as a generalized matching problem, and propose the first distributed solution for solving such problems at scale. Our algorithm is designed to run on a small cluster of commodity nodes (or in a MapReduce environment) and has strong approximation guarantees. Our matching algorithm relies on linear programming. To this end, we present an efficient distributed approximation algorithm for mixed packing-covering linear programs, a simple but expressive subclass of linear programs. Our approximation algorithm requires a poly-logarithmic number of passes over the input, is simple, and well-suited for parallel processing on GPUs, in shared-memory architectures, as well as on a small cluster of commodity nodes.Empfehlungssysteme haben eine beachtliche PopularitĂ€t erreicht und werden in zahlreichen E-Commerce Anwendungen eingesetzt. Entscheidend fĂŒr die Generierung hochqualitativer Empfehlungen ist die Vorhersage von NutzerprĂ€ferenzen. Jedoch sollten in der Praxis nicht nur VorschlĂ€ge auf Basis von NutzerprĂ€ferenzen gegeben werden, sondern gute Empfehlungssysteme mĂŒssen auch bestimmte Nebenbedingungen berĂŒcksichtigen. Zum Beispiel sollten online Videoverleihfirmen, welche ihren Kunden multimediale Produkte (z.B. DVDs) vorschlagen, die VerfĂŒgbarkeit von vorrĂ€tigen DVDs beachten, um die Wartezeit der Kunden fĂŒr angenommene Empfehlungen zu reduzieren. DarĂŒber hinaus sollte jeder Kunde eine kleine, aber ausreichende Anzahl an VorschlĂ€gen erhalten, an denen er interessiert sein könnte.
Diese Arbeit strebt an skalierbare Optimierungsalgorithmen zu entwickeln und zu implementieren, die (unter anderem) eingesetzt werden können Empfehlungen zu generieren, welche weitere Zielvorgaben und Restriktionen einhalten. Derzeit existierenden AnsĂ€tzen mangelt es an Effizienz und/oder Skalierbarkeit im Umgang mit sehr groĂen, durchaus realen DatensĂ€tzen von, beispielsweise Millionen von Nutzern und Produkten. ZunĂ€chst analysieren wir die VervollstĂ€ndigung groĂskalierter Matrizen im Kontext von kollaborativen Filtern in Empfehlungssystemen. FĂŒr diese Probleme schlagen wir verschiedene neue, verteilte Algorithmen vor, welche konzipiert sind auf einer kleinen Anzahl von gĂ€ngigen Rechnern zu laufen. Zudem können sie alternative AnsĂ€tze hinsichtlich der Effizienz, Skalierbarkeit und benötigten SpeicherkapazitĂ€t ĂŒberragen. Als NĂ€chstes haben wir die Empfehlungsproblematik als ein generalisiertes Zuordnungsproblem betrachtet und schlagen daher die erste verteilte Lösung fĂŒr groĂskalierte Zuordnungsprobleme vor. Unser Algorithmus funktioniert auf einer kleinen Gruppe von gĂ€ngigen Rechnern (oder in einem MapReduce-Programmierungsmodel) und erzielt gute Approximationsgarantien. Unser Zuordnungsalgorithmus beruht auf linearer Programmierung. Daher prĂ€sentieren wir einen effizienten, verteilten Approximationsalgorithmus fĂŒr vermischte lineare Packungs- und Ăberdeckungsprobleme, eine einfache aber expressive Unterklasse der linearen Programmierung. Unser Algorithmus benötigt eine polylogarithmische Anzahl an Scans der Eingabedaten. Zudem ist er einfach und sehr gut geeignet fĂŒr eine parallele Verarbeitung mithilfe von Grafikprozessoren, unter einer gemeinsam genutzten Speicherarchitektur sowie auf einer kleinen Gruppe von gĂ€ngigen Rechnern
Dynamic Parameter Allocation in Parameter Servers
To keep up with increasing dataset sizes and model complexity, distributed
training has become a necessity for large machine learning tasks. Parameter
servers ease the implementation of distributed parameter management---a key
concern in distributed training---, but can induce severe communication
overhead. To reduce communication overhead, distributed machine learning
algorithms use techniques to increase parameter access locality (PAL),
achieving up to linear speed-ups. We found that existing parameter servers
provide only limited support for PAL techniques, however, and therefore prevent
efficient training. In this paper, we explore whether and to what extent PAL
techniques can be supported, and whether such support is beneficial. We propose
to integrate dynamic parameter allocation into parameter servers, describe an
efficient implementation of such a parameter server called Lapse, and
experimentally compare its performance to existing parameter servers across a
number of machine learning tasks. We found that Lapse provides near-linear
scaling and can be orders of magnitude faster than existing parameter servers
Speculative Approximations for Terascale Analytics
Model calibration is a major challenge faced by the plethora of statistical
analytics packages that are increasingly used in Big Data applications.
Identifying the optimal model parameters is a time-consuming process that has
to be executed from scratch for every dataset/model combination even by
experienced data scientists. We argue that the incapacity to evaluate multiple
parameter configurations simultaneously and the lack of support to quickly
identify sub-optimal configurations are the principal causes. In this paper, we
develop two database-inspired techniques for efficient model calibration.
Speculative parameter testing applies advanced parallel multi-query processing
methods to evaluate several configurations concurrently. The number of
configurations is determined adaptively at runtime, while the configurations
themselves are extracted from a distribution that is continuously learned
following a Bayesian process. Online aggregation is applied to identify
sub-optimal configurations early in the processing by incrementally sampling
the training dataset and estimating the objective function corresponding to
each configuration. We design concurrent online aggregation estimators and
define halting conditions to accurately and timely stop the execution. We apply
the proposed techniques to distributed gradient descent optimization -- batch
and incremental -- for support vector machines and logistic regression models.
We implement the resulting solutions in GLADE PF-OLA -- a state-of-the-art Big
Data analytics system -- and evaluate their performance over terascale-size
synthetic and real datasets. The results confirm that as many as 32
configurations can be evaluated concurrently almost as fast as one, while
sub-optimal configurations are detected accurately in as little as a
fraction of the time
- âŠ