Block-distributed Gradient Boosted Trees
The Gradient Boosted Tree (GBT) algorithm is one of the most popular machine
learning algorithms used in production, for tasks that include Click-Through
Rate (CTR) prediction and learning-to-rank. To deal with the massive datasets
available today, many distributed GBT methods have been proposed. However, they
all assume a row-distributed dataset, which addresses scalability only with
respect to the number of data points and not the number of features, and which
increases the communication cost for high-dimensional data. In order to allow for scalability
across both the data point and feature dimensions, and reduce communication
cost, we propose block-distributed GBTs. We achieve communication efficiency by
making full use of the data sparsity and adapting the Quickscorer algorithm to
the block-distributed setting. We evaluate our approach using datasets with
millions of features, and demonstrate that we are able to achieve multiple
orders of magnitude reduction in communication cost for sparse data, with no
loss in accuracy, while providing a more scalable design. As a result, we are
able to reduce the training time for high-dimensional data, and allow more
cost-effective scale-out without the need for expensive network communication. Comment: SIGIR 2019
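To illustrate the traversal scheme the abstract adapts, here is a minimal sketch of Quickscorer-style scoring for a single tree: every node carries a bitvector with zeros at the leaves that become unreachable when its test is false, and the exit leaf is the lowest surviving bit. The tree encoding and names below are illustrative assumptions; the block-distributed bookkeeping from the paper is omitted.

```python
def score_tree(x, nodes, leaf_values, n_leaves):
    """Quickscorer-style scoring of one tree.

    nodes: list of (feature, threshold, mask) triples, where mask has
    0-bits at the leaves cut off when x[feature] <= threshold is false.
    Leaves are numbered left to right; bit i corresponds to leaf i.
    """
    live = (1 << n_leaves) - 1              # every leaf starts reachable
    for feature, threshold, mask in nodes:
        if x[feature] > threshold:          # a "false node": prune its left subtree
            live &= mask
    exit_leaf = (live & -live).bit_length() - 1   # leftmost surviving leaf
    return leaf_values[exit_leaf]

# Depth-2 example: root splits on feature 0, its children on feature 1.
nodes = [(0, 0.5, 0b1100), (1, 0.3, 0b1110), (1, 0.7, 0b1011)]
print(score_tree({0: 0.9, 1: 0.8}, nodes, [1.0, 2.0, 3.0, 4.0], 4))  # leaf 3 -> 4.0
```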
DSCOVR: Randomized Primal-Dual Block Coordinate Algorithms for Asynchronous Distributed Optimization
Machine learning with big data often involves large optimization models. For
distributed optimization over a cluster of machines, frequent communication and
synchronization of all model parameters (optimization variables) can be very
costly. A promising solution is to use parameter servers to store different
subsets of the model parameters, and update them asynchronously at different
machines using local datasets. In this paper, we focus on distributed
optimization of large linear models with convex loss functions, and propose a
family of randomized primal-dual block coordinate algorithms that are
especially suitable for asynchronous distributed implementation with parameter
servers. In particular, we work with the saddle-point formulation of such
problems which allows simultaneous data and model partitioning, and exploit its
structure by doubly stochastic coordinate optimization with variance reduction
(DSCOVR). Compared with other first-order distributed algorithms, we show that
DSCOVR may require less overall computation and communication, and
less or no synchronization. We discuss the implementation details of the DSCOVR
algorithms, and present numerical experiments on an industrial distributed
computing system.
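As a rough illustration of the access pattern, here is a minimal sketch of a doubly stochastic primal-dual update on the saddle-point formulation, for ridge-regularized least squares: each step samples one row (data) block and one column (feature) block of X and updates only the corresponding dual and primal coordinates. The variance-reduction terms that give DSCOVR its guarantees are omitted, and all sizes and step parameters are illustrative, so this stripped-down version only converges to a neighborhood of the solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, M, K = 200, 50, 4, 5                # samples, features, row/column blocks
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

row_blocks = np.array_split(np.arange(n), M)
col_blocks = np.array_split(np.arange(d), K)

w, a = np.zeros(d), np.zeros(n)           # primal model, dual variables
sigma, eta, lam = 0.1, 0.002, 0.1

for t in range(50000):
    I = row_blocks[rng.integers(M)]       # random data block
    J = col_blocks[rng.integers(K)]       # random feature block
    B = X[np.ix_(I, J)]                   # only this block of X is touched
    # dual prox step for phi_i*(a) = a^2/2 + a*y_i (conjugate of squared loss),
    # using the unbiased block estimate K*B@w[J] of the inner products X_I @ w
    u_hat = K * (B @ w[J])
    a[I] = (a[I] + sigma * (u_hat - y[I])) / (1.0 + sigma)
    # primal prox step for the l2 regularizer, with an unbiased block
    # estimate of the partial gradient (X^T a / n)_J
    g_hat = M * (B.T @ a[I]) / n
    w[J] = (w[J] - eta * g_hat) / (1.0 + eta * lam)

print("relative residual:", np.linalg.norm(X @ w - y) / np.linalg.norm(y))
```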
RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets
In this paper, we propose a class of robust stochastic subgradient methods
for distributed learning from heterogeneous datasets in the presence of an unknown
number of Byzantine workers. The Byzantine workers, during the learning
process, may send arbitrary incorrect messages to the master due to data
corruptions, communication failures or malicious attacks, and consequently bias
the learned model. The key to the proposed methods is a regularization term
incorporated with the objective function so as to robustify the learning task
and mitigate the negative effects of Byzantine attacks. The resultant
subgradient-based algorithms are termed Byzantine-Robust Stochastic Aggregation
methods, justifying our acronym RSA used henceforth. In contrast to most of the
existing algorithms, RSA does not rely on the assumption that the data are
independent and identically distributed (i.i.d.) on the workers, and hence fits
a wider class of applications. Theoretically, we show that: i) RSA
converges to a near-optimal solution with the learning error dependent on the
number of Byzantine workers; ii) the convergence rate of RSA under Byzantine
attacks is the same as that of the stochastic gradient descent method, which is
free of Byzantine attacks. Numerically, experiments on real datasets corroborate
the competitive performance of RSA and a complexity reduction compared to the
state-of-the-art alternatives. Comment: To appear in AAAI 2019
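For concreteness, here is a minimal sketch of the l1-regularized RSA update described above on a synthetic least-squares task: each regular worker takes a stochastic subgradient step on its local loss plus lambda*||w_i - x||_1, and the master aggregates only sign() messages, so each worker's per-coordinate influence is capped at lambda. The data, the Byzantine behavior, and all constants are illustrative, and the master's own regularizer f_0 is dropped for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_workers, n_byz = 10, 10, 2
w_true = rng.standard_normal(d)

# each honest worker holds its own heterogeneous (non-i.i.d.) data
data = []
for i in range(n_workers):
    A = rng.standard_normal((50, d)) + 0.3 * i    # deliberately non-identical
    b = A @ w_true + 0.1 * rng.standard_normal(50)
    data.append((A, b))

lam = 2.0
x = np.zeros(d)                                   # master's model
w = [np.zeros(d) for _ in range(n_workers)]       # workers' local models

for t in range(1, 2001):
    eta = 0.1 / np.sqrt(t)
    msgs = []
    for i in range(n_workers):
        if i < n_byz:
            msgs.append(rng.standard_normal(d) * 100.0)  # Byzantine: garbage
        else:
            A, b = data[i]
            j = rng.integers(len(b))                     # stochastic gradient
            grad = A[j] * (A[j] @ w[i] - b[j])
            w[i] -= eta * (grad + lam * np.sign(w[i] - x))
            msgs.append(w[i])
    # master: sign() caps each worker's per-coordinate influence at lam
    x -= eta * lam * np.sum([np.sign(x - m) for m in msgs], axis=0)

print("error:", np.linalg.norm(x - w_true))
```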
Company classification using machine learning
The recent advancements in computational power and machine learning
algorithms have led to vast improvements in manifold areas of research.
Especially in finance, the application of machine learning enables both
researchers and practitioners to gain new insights into financial data and
well-studied areas such as company classification. In our paper, we demonstrate
that unsupervised machine learning algorithms can be used to visualize and
classify company data in an economically meaningful and effective way. In
particular, we implement the data-driven dimension reduction and visualization
tool t-distributed stochastic neighbor embedding (t-SNE) in combination with
spectral clustering. The resulting company groups can then be utilized by
experts in the field for empirical analysis and optimal decision making. By
providing an exemplary out-of-sample study within a portfolio optimization
framework, we show that the application of t-SNE and spectral clustering
improves the overall portfolio performance. Therefore, we introduce our
approach to the financial community as a valuable technique in the context of
data analysis and company classification. Comment: 16 pages, 6 figures, 1 table
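A minimal sketch of the described pipeline with scikit-learn, assuming a placeholder feature matrix in place of the paper's company fundamentals; the perplexity, number of clusters, and affinity are illustrative choices:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE
from sklearn.cluster import SpectralClustering

# stand-in for a (companies x financial features) matrix
X = np.random.default_rng(0).standard_normal((300, 20))

# t-SNE embeds the standardized data into 2-D for visualization ...
Z = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(
        StandardScaler().fit_transform(X))

# ... and spectral clustering groups companies in the embedded space
labels = SpectralClustering(n_clusters=8, affinity="nearest_neighbors",
                            random_state=0).fit_predict(Z)
```

Clustering on the t-SNE output rather than the raw features is the paper's design choice; the low-dimensional embedding makes the resulting groups directly inspectable by domain experts.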
On the hardness of the Learning with Errors problem with a discrete reproducible error distribution
In this work we show that the hardness of the Learning with Errors problem
with errors taken from the discrete Gaussian distribution implies the hardness
of the Learning with Errors problem with errors taken from the symmetric
Skellam distribution. Due to the sample preserving search-to-decision reduction
by Micciancio and Mol the same result applies to the decisional version of the
problem. Thus, we provide a variant of the Learning with Errors problem whose
hardness is based on conjecturally hard lattice problems and which uses a discrete error
distribution that is similar to the continuous Gaussian distribution in that it
is closed under convolution. As an application of this result we construct a
post-quantum cryptographic protocol for differentially private data analysis in
the distributed model. The security of this protocol is based on the hardness
of the new variant of the Decisional Learning with Errors problem. A feature of
this protocol is the use of the same noise for security and for differential
privacy, resulting in an efficiency boost.
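As a small illustration of the error distribution involved, the sketch below draws symmetric Skellam noise as the difference of two i.i.d. Poisson variables and forms LWE samples (a, <a, s> + e mod q); the parameters are toy-sized assumptions, far below anything cryptographically meaningful. A sum of independent Skellam(mu, mu) draws is again Skellam, which is the closure-under-convolution property the abstract exploits for distributed noise generation.

```python
import numpy as np

rng = np.random.default_rng(0)
q, n, mu = 2**13, 64, 40.0        # toy modulus, dimension, noise parameter

s = rng.integers(0, q, size=n)    # the secret vector

def lwe_sample():
    """One LWE sample with symmetric Skellam noise."""
    a = rng.integers(0, q, size=n)
    e = int(rng.poisson(mu)) - int(rng.poisson(mu))   # Skellam(mu, mu)
    return a, (int(a @ s) + e) % q

samples = [lwe_sample() for _ in range(10)]
```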
Distributed dual vigilance fuzzy adaptive resonance theory learns online, retrieves arbitrarily-shaped clusters, and mitigates order dependence
This paper presents a novel adaptive resonance theory (ART)-based modular
architecture for unsupervised learning, namely the distributed dual vigilance
fuzzy ART (DDVFA). DDVFA consists of a global ART system whose nodes are local
fuzzy ART modules. It is equipped with the distinctive features of distributed
higher-order activation and match functions, using dual vigilance parameters
responsible for cluster similarity and data quantization. Together, these allow
DDVFA to perform unsupervised modularization, create multi-prototype clustering
representations, retrieve arbitrarily-shaped clusters, and control its
compactness. Another important contribution is the reduction of
order-dependence, an issue that affects any agglomerative clustering method.
This paper demonstrates two approaches for mitigating order-dependence:
preprocessing using visual assessment of cluster tendency (VAT) or
postprocessing using a novel Merge ART module. The former is suitable for batch
processing, whereas the latter can be used in online learning. Experimental
results in the online learning mode carried out on 30 benchmark data sets show
that DDVFA cascaded with Merge ART statistically outperformed the best other
ART-based systems when samples were randomly presented. Conversely, they were
found to be statistically equivalent in the offline mode when samples were
pre-processed using VAT. Remarkably, performance comparisons to non-ART-based
clustering algorithms show that DDVFA (which learns incrementally) was also
statistically equivalent to the non-incremental (offline) methods of DBSCAN,
single linkage hierarchical agglomerative clustering (HAC), and k-means, while
retaining the appealing properties of ART. Links to the source code and data
are provided. Considering the algorithm's simplicity, online learning
capability, and performance, it is an ideal choice for many agglomerative
clustering applications.
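To make the building block concrete, here is a minimal sketch of a single Fuzzy ART module of the kind DDVFA distributes, with complement coding, the choice (activation) function, the vigilance (match) test, and fast-commit learning; the dual-vigilance wiring between the global ART system and the local modules, and the Merge ART post-processing, are omitted, and the parameter defaults are illustrative.

```python
import numpy as np

class FuzzyART:
    """A single Fuzzy ART module (inputs assumed scaled to [0, 1])."""

    def __init__(self, rho=0.7, alpha=1e-3, beta=1.0):
        self.rho, self.alpha, self.beta = rho, alpha, beta
        self.W = []                              # one weight vector per category

    def train(self, x):
        I = np.concatenate([x, 1.0 - x])         # complement coding
        # rank categories by the choice function |I ^ w| / (alpha + |w|)
        order = sorted(range(len(self.W)),
                       key=lambda j: -np.minimum(I, self.W[j]).sum()
                                     / (self.alpha + self.W[j].sum()))
        for j in order:
            # vigilance test: match function |I ^ w| / |I| against rho
            if np.minimum(I, self.W[j]).sum() / I.sum() >= self.rho:
                self.W[j] = (self.beta * np.minimum(I, self.W[j])
                             + (1 - self.beta) * self.W[j])   # resonance: learn
                return j
        self.W.append(I.copy())                  # mismatch everywhere: new category
        return len(self.W) - 1

art = FuzzyART(rho=0.75)
print(art.train(np.array([0.2, 0.9])), art.train(np.array([0.25, 0.85])))
```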
Dimensionality Reduction Ensembles
Ensemble learning has had many successes in supervised learning, but it has
been rare in unsupervised learning and dimensionality reduction. This study
explores dimensionality reduction ensembles, using principal component analysis
and manifold learning techniques to capture linear, nonlinear, local, and
global features in the original dataset. Dimensionality reduction ensembles are
tested first on simulation data and then on two real medical datasets using
random forest classifiers; the results suggest the efficacy of this approach, with
accuracies approaching those achieved on the full dataset. Limitations include the
computational cost of some of the stronger-performing algorithms, which may be
ameliorated through distributed computing and the development of more efficient
versions of these algorithms. Comment: 12 pages, 1 table, 8 figures; under review
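A minimal sketch of such an ensemble with scikit-learn, assuming PCA for the linear/global part and Isomap plus locally linear embedding for the nonlinear/local part; the dataset, component counts, and classifier settings are illustrative stand-ins for the study's own choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, LocallyLinearEmbedding
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)   # placeholder medical dataset

# side-by-side linear, global-nonlinear, and local-nonlinear embeddings
reducers = FeatureUnion([
    ("pca", PCA(n_components=5)),
    ("isomap", Isomap(n_components=5)),
    ("lle", LocallyLinearEmbedding(n_components=5)),
])

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("ensemble", reducers),                  # concatenated representation
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
print(cross_val_score(pipe, X, y, cv=5).mean())
```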
A Distributed Algorithm for Training Augmented Complex Adaptive IIR Filters
In this paper we consider the problem of decentralized (distributed) adaptive
learning, where the aim of the network is to train the coefficients of a widely
linear autoregressive moving average (ARMA) model by measurements collected by
the nodes. Such a problem arises in many sensor network-based applications such
as target tracking, fast rerouting, data reduction and data aggregation. We
assume that each node of the network uses the augmented complex adaptive
infinite impulse response (ACAIIR) filter as the learning rule, and nodes
interact with each other under an incremental mode of cooperation. Since the
proposed algorithm (the incremental augmented complex adaptive IIR (IACA-IIR) algorithm)
relies on the augmented complex statistics, it can be used to model both types
of complex-valued signals (proper and improper signals). To evaluate the
performance of the proposed algorithm, we use both synthetic and real-world
complex signals in our simulations. The results exhibit superior performance of
the proposed algorithm over the non-cooperative ACAIIR algorithm. Comment: Draft version, 11 pages, 4 figures
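The sketch below shows only the cooperation pattern: an estimate travels around a ring of nodes, and each node refines it with its own measurement before passing it on. For brevity it uses a simplified augmented (widely linear) complex LMS update in place of the full ACAIIR recursion, so the ARMA feedback part is not modeled; the sizes and step size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, L, mu = 8, 4, 0.01

# widely linear ground truth: d = h^H u + g^H conj(u) + noise
h = rng.standard_normal(L) + 1j * rng.standard_normal(L)
g = rng.standard_normal(L) + 1j * rng.standard_normal(L)

w = np.zeros(2 * L, dtype=complex)           # augmented weights, stacks [h; g]
for epoch in range(500):
    for node in range(n_nodes):              # incremental mode: w visits each node
        u = rng.standard_normal(L) + 1j * rng.standard_normal(L)
        ua = np.concatenate([u, u.conj()])   # augmented input [u; conj(u)]
        d = h.conj() @ u + g.conj() @ u.conj() + 0.01 * rng.standard_normal()
        e = d - w.conj() @ ua                # local estimation error at this node
        w = w + mu * ua * e.conj()           # augmented LMS step, then pass w on

print("weight error:", np.linalg.norm(w - np.concatenate([h, g])))
```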
Geometric Foundations of Data Reduction
The purpose of this paper is to write a complete survey of the (spectral)
manifold learning methods and nonlinear dimensionality reduction (NLDR) in data
reduction. The first two NLDR methods (Isomap and locally linear embedding) were
both published in Science in 2000, and each solves the problem of reducing
high-dimensional data endowed with an intrinsic nonlinear structure. Computer
scientists and theoretical physicists usually interpret this intrinsic nonlinear
structure through the concept of a manifold from geometry and topology. In 2001,
the concept of manifold learning first appeared with an NLDR method called
Laplacian Eigenmaps, proposed by Belkin and Niyogi. In the typical manifold
learning setup, the data set, also called the observation set, is distributed on
or near a low-dimensional manifold $\mathcal{M}$ embedded in $\mathbb{R}^D$, so
that each observation has a $D$-dimensional representation. The goal of
(spectral) manifold learning is to reduce these observations to a compact
lower-dimensional representation based on the geometric information. The
reduction procedure is called the (spectral)
manifold learning method. In this paper, we derive each (spectral) manifold
learning method with the matrix and operator representation, and we then
discuss the convergence behavior of each method in a uniform geometric
language. Hence, we name the survey Geometric Foundations of Data Reduction. Comment: 79 pages, survey
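As a concrete instance of the spectral recipe the survey covers, here is a minimal sketch of Laplacian Eigenmaps using a kNN graph, 0/1 edge weights, and the unnormalized graph Laplacian; the original method solves the generalized problem Lf = lambda*Df and allows heat-kernel weights, so k and the weighting here are illustrative simplifications.

```python
import numpy as np

def laplacian_eigenmaps(X, n_components=2, k=10):
    """Embed rows of X via the bottom nontrivial eigenvectors of the
    graph Laplacian of a symmetrized kNN graph."""
    n = len(X)
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    W = np.zeros((n, n))
    nn = np.argsort(D2, axis=1)[:, 1:k + 1]              # skip self at position 0
    for i in range(n):
        W[i, nn[i]] = 1.0                                # simple 0/1 edge weights
    W = np.maximum(W, W.T)                               # symmetrize the graph
    L = np.diag(W.sum(axis=1)) - W                       # unnormalized Laplacian
    _, vecs = np.linalg.eigh(L)                          # ascending eigenvalues
    return vecs[:, 1:n_components + 1]                   # drop the constant vector
```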
Intermittent Pulling with Local Compensation for Communication-Efficient Federated Learning
Federated Learning is a powerful machine learning paradigm to cooperatively
train a global model with highly distributed data. A major bottleneck in the
performance of the distributed Stochastic Gradient Descent (SGD) algorithm for
large-scale Federated Learning is the communication overhead of pushing local
gradients and pulling the global model. In this paper, to reduce the communication
complexity of Federated Learning, a novel approach named Pulling Reduction with
Local Compensation (PRLC) is proposed. Specifically, each training node
intermittently pulls the global model from the server during SGD iterations,
and is therefore sometimes unsynchronized with the server. In such a case, it
uses its local update to compensate for the gap between the local model and the
global model. Our rigorous theoretical analysis of PRLC yields two important
findings. First, we prove that the convergence rate of PRLC
preserves the same order as the classical synchronous SGD for both
strongly-convex and non-convex cases with good scalability due to the linear
speedup with respect to the number of training nodes. Second, we show that PRLC
admits a lower pulling frequency than the existing pulling reduction method
without local compensation. We also conduct extensive experiments on various
machine learning models to validate our theoretical results. Experimental
results show that our approach achieves a significant pulling reduction over
the state-of-the-art methods; e.g., PRLC requires only half the pulling
operations of LAG.
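A minimal sketch of one PRLC worker's loop under the stated mechanism: the worker always pushes its local gradient, pulls the global model only intermittently, and in the skipped rounds applies its own update to its stale copy as the local compensation. The pull schedule, function names, and parameters here are illustrative assumptions, not the paper's exact protocol.

```python
def prlc_worker(grad_fn, push_grad, pull_global, x0, eta=0.1, tau=5, steps=1000):
    """grad_fn(x): local stochastic gradient; push_grad/pull_global:
    communication with the parameter server (assumed provided)."""
    x = x0.copy()                       # worker's possibly stale model copy
    for t in range(steps):
        g = grad_fn(x)
        push_grad(g)                    # gradients are always pushed
        if t % tau == 0:
            x = pull_global()           # intermittent pull: resynchronize
        else:
            x = x - eta * g             # local compensation instead of a pull
    return x
```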