2,276 research outputs found
Balancing the Communication Load of Asynchronously Parallelized Machine Learning Algorithms
Stochastic Gradient Descent (SGD) is the standard numerical method used to
solve the core optimization problem for the vast majority of machine learning
(ML) algorithms. In the context of large scale learning, as utilized by many
Big Data applications, efficient parallelization of SGD is in the focus of
active research. Recently, we were able to show that the asynchronous
communication paradigm can be applied to achieve a fast and scalable
parallelization of SGD. Asynchronous Stochastic Gradient Descent (ASGD)
outperforms other, mostly MapReduce based, parallel algorithms solving large
scale machine learning problems. In this paper, we investigate the impact of
asynchronous communication frequency and message size on the performance of
ASGD applied to large scale ML on HTC cluster and cloud environments. We
introduce a novel algorithm for the automatic balancing of the asynchronous
communication load, which allows to adapt ASGD to changing network bandwidths
and latencies.Comment: arXiv admin note: substantial text overlap with arXiv:1505.0495
Asynchronous Parallel Stochastic Gradient Descent - A Numeric Core for Scalable Distributed Machine Learning Algorithms
The implementation of a vast majority of machine learning (ML) algorithms
boils down to solving a numerical optimization problem. In this context,
Stochastic Gradient Descent (SGD) methods have long proven to provide good
results, both in terms of convergence and accuracy. Recently, several
parallelization approaches have been proposed in order to scale SGD to solve
very large ML problems. At their core, most of these approaches are following a
map-reduce scheme. This paper presents a novel parallel updating algorithm for
SGD, which utilizes the asynchronous single-sided communication paradigm.
Compared to existing methods, Asynchronous Parallel Stochastic Gradient Descent
(ASGD) provides faster (or at least equal) convergence, close to linear scaling
and stable accuracy
Modeling Scalability of Distributed Machine Learning
Present day machine learning is computationally intensive and processes large
amounts of data. It is implemented in a distributed fashion in order to address
these scalability issues. The work is parallelized across a number of computing
nodes. It is usually hard to estimate in advance how many nodes to use for a
particular workload. We propose a simple framework for estimating the
scalability of distributed machine learning algorithms. We measure the
scalability by means of the speedup an algorithm achieves with more nodes. We
propose time complexity models for gradient descent and graphical model
inference. We validate our models with experiments on deep learning training
and belief propagation. This framework was used to study the scalability of
machine learning algorithms in Apache Spark.Comment: 6 pages, 4 figures, appears at ICDE 201
Recommended from our members
A High-Performance Domain-Specific Language and Code Generator for General N-body Problems
General N-body problems are a set of problems in which an update to a single element in the system depends on every other element. N-body problems are ubiquitous, with applications in various domains ranging from scientific computing simulations in molecular dynamics, astrophysics, acoustics, and fluid dynamics all the way to computer vision, data mining and machine learning problems. Different N-body algorithms have been designed and implemented in these various fields. However, there is a big gap between the algorithm one designs on paper and the code that runs efficiently on a parallel system. It is time-consuming to write fast, parallel, and scalable code for these problems. On the other hand, the sheer scale and growth of modern scientific datasets necessitate exploiting the power of both parallel and approximation algorithms where there is a potential to trade-off accuracy for performance. The main problem that we are tackling in this thesis is how to automatically generate asymptotically optimal N-body algorithms from the high-level specification of the problem. We combine the body of work in performance optimizations, compilers and the domain of N-body problems to build a unified system where domain scientists can write programs at the high level while attaining performance of code written by an expert at the low level.In order to generate a high-performance, scalable code for this group of problems, we take the following steps in this thesis; first, we propose a unified algorithmic framework named PASCAL in order to address the challenge of designing a general algorithmic template to represent the class of N-body problems. PASCAL utilizes space-partitioning trees and user-controlled pruning/approximations to reduce the asymptotic runtime complexity from linear to logarithmic in the number of data points. In PASCAL, we design an algorithm that automatically generates conditions for pruning or approximation of an N-body problem considering the problem's definition. In order to evaluate PASCAL, we developed tree-based algorithms for six well-known problems: k-nearest neighbors, range search, minimum spanning tree, kernel density estimation, expectation maximization, and Hausdorff distance. We show that applying domain-specific optimizations and parallelization to the algorithms written in PASCAL achieves 10x to 230x speedup compared to state-of-the-art libraries on a dual-socket Intel Xeon processor with 16 cores on real-world datasets. Second, we extend the PASCAL framework to build PASCAL-X that adds support for NUMA-aware parallelization. PASCAL-X also presents insights on the influence of tuning parameters. Tuning parameters such as leaf size (influences the shape of the tree) and cut-off level (controls the granularity of tasks) of the space-partitioning trees result in performance improvement of up to 4.6x. A key goal is to generate scalable and high-performance code automatically without sacrificing productivity. That implies minimizing the effort the users have to put in to generate the desired high-performance code. Another critical factor is the adaptivity, which indicates the amount of effort that is required to extend the high-performance code generation to new N-body problems. Finally, we consider these factors and develop a domain-specific language and code generator named Portal, which is built on top of PASCAL-X. Portal's language design is inspired by the mathematical representation of N-body problems, resulting in an intuitive language for rapid implementation of a variety of problems. Portal's back-end is designed and implemented to generate optimized, parallel, and scalable implementations for multi-core systems. We demonstrate that the performance achieved by using Portal is comparable to that of expert hand-optimized code while providing productivity for domain scientists. For instance, using Portal for the k-nearest neighbors problem gains performance that is similar to the hand-optimized code, while reducing the lines of code by 68x. To the best of our knowledge, there are no known libraries or frameworks that implement parallel asymptotically optimal algorithms for the class of general N-body problems and this thesis primarily aims to fill this gap. Finally, we present a case study of Portal for the real-world problem of face clustering. In this case study, we show that Portal not only provides a fast solution for the face clustering problem with similar accuracy as the state-of-the-art algorithm, but also it provides productivity by implementing the face clustering algorithm in only 14 lines of Portal code
On Understanding Big Data Impacts in Remotely Sensed Image Classification Using Support Vector Machine Methods
Owing to the recent development of sensor resolutions onboard different Earth observation platforms, remote sensing is an important source of information for mapping and monitoring natural and man-made land covers. Of particular importance is the increasing amounts of available hyperspectral data originating from airborne and satellite sensors such as AVIRIS, HyMap, and Hyperion with very high spectral resolution (i.e., high number of spectral channels) containing rich information for a wide range of applications. A relevant example is the separation of different types of land-cover classes using the data in order to understand, e.g., impacts of natural disasters or changing of city buildings over time. More recently, such increases in the data volume, velocity, and variety of data contributed to the term big data that stand for challenges shared with many other scientific disciplines. On one hand, the amount of available data is increasing in a way that raises the demand for automatic data analysis elements since many of the available data collections are massively underutilized lacking experts for manual investigation. On the other hand, proven statistical methods (e.g., dimensionality reduction) driven by manual approaches have a significant impact in reducing the amount of big data toward smaller smart data contributing to the more recently used terms data value and veracity (i.e., less noise, lower dimensions that capture the most important information). This paper aims to take stock of which proven statistical data mining methods in remote sensing are used to contribute to smart data analysis processes in the light of possible automation as well as scalable and parallel processing techniques. We focus on parallel support vector machines (SVMs) as one of the best out-of-the-box classification methods.Sponsored by: IEEE Geoscience & Remote Sensing SocietyRitrýnt tímaritPeer reviewedPre prin
- …