Block-distributed Gradient Boosted Trees
The Gradient Boosted Tree (GBT) algorithm is one of the most popular machine
learning algorithms used in production, for tasks that include Click-Through
Rate (CTR) prediction and learning-to-rank. To deal with the massive datasets
available today, many distributed GBT methods have been proposed. However, they
all assume a row-distributed dataset, which addresses scalability only with
respect to the number of data points and not the number of features, and which
increases the communication cost for high-dimensional data. In order to allow for scalability
across both the data point and feature dimensions, and reduce communication
cost, we propose block-distributed GBTs. We achieve communication efficiency by
making full use of the data sparsity and adapting the Quickscorer algorithm to
the block-distributed setting. We evaluate our approach using datasets with
millions of features, and demonstrate that we are able to achieve multiple
orders of magnitude reduction in communication cost for sparse data, with no
loss in accuracy, while providing a more scalable design. As a result, we are
able to reduce the training time for high-dimensional data, and allow more
cost-effective scale-out without the need for expensive network communication. Comment: SIGIR 2019
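To illustrate the traversal scheme the abstract adapts, here is a minimal sketch of Quickscorer-style scoring for a single tree: every node carries a bitvector with zeros at the leaves that become unreachable when its test is false, and the exit leaf is the lowest surviving bit. The tree encoding and names below are illustrative assumptions; the block-distributed bookkeeping from the paper is omitted.

```python
def score_tree(x, nodes, leaf_values, n_leaves):
    """Quickscorer-style scoring of one tree.

    nodes: list of (feature, threshold, mask) triples, where mask has
    0-bits at the leaves cut off when x[feature] <= threshold is false.
    Leaves are numbered left to right; bit i corresponds to leaf i.
    """
    live = (1 << n_leaves) - 1              # every leaf starts reachable
    for feature, threshold, mask in nodes:
        if x[feature] > threshold:          # a "false node": prune its left subtree
            live &= mask
    exit_leaf = (live & -live).bit_length() - 1   # leftmost surviving leaf
    return leaf_values[exit_leaf]

# Depth-2 example: root splits on feature 0, its children on feature 1.
nodes = [(0, 0.5, 0b1100), (1, 0.3, 0b1110), (1, 0.7, 0b1011)]
print(score_tree({0: 0.9, 1: 0.8}, nodes, [1.0, 2.0, 3.0, 4.0], 4))  # leaf 3 -> 4.0
```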
DSCOVR: Randomized Primal-Dual Block Coordinate Algorithms for Asynchronous Distributed Optimization
Machine learning with big data often involves large optimization models. For
distributed optimization over a cluster of machines, frequent communication and
synchronization of all model parameters (optimization variables) can be very
costly. A promising solution is to use parameter servers to store different
subsets of the model parameters, and update them asynchronously at different
machines using local datasets. In this paper, we focus on distributed
optimization of large linear models with convex loss functions, and propose a
family of randomized primal-dual block coordinate algorithms that are
especially suitable for asynchronous distributed implementation with parameter
servers. In particular, we work with the saddle-point formulation of such
problems which allows simultaneous data and model partitioning, and exploit its
structure by doubly stochastic coordinate optimization with variance reduction
(DSCOVR). Compared with other first-order distributed algorithms, we show that
DSCOVR may require less overall computation and communication, and
less or no synchronization. We discuss the implementation details of the DSCOVR
algorithms, and present numerical experiments on an industrial distributed
computing system.
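As a rough illustration of the access pattern, here is a minimal sketch of a doubly stochastic primal-dual update on the saddle-point formulation, for ridge-regularized least squares: each step samples one row (data) block and one column (feature) block of X and updates only the corresponding dual and primal coordinates. The variance-reduction terms that give DSCOVR its guarantees are omitted, and all sizes and step parameters are illustrative, so this stripped-down version only converges to a neighborhood of the solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, M, K = 200, 50, 4, 5                # samples, features, row/column blocks
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

row_blocks = np.array_split(np.arange(n), M)
col_blocks = np.array_split(np.arange(d), K)

w, a = np.zeros(d), np.zeros(n)           # primal model, dual variables
sigma, eta, lam = 0.1, 0.002, 0.1

for t in range(50000):
    I = row_blocks[rng.integers(M)]       # random data block
    J = col_blocks[rng.integers(K)]       # random feature block
    B = X[np.ix_(I, J)]                   # only this block of X is touched
    # dual prox step for phi_i*(a) = a^2/2 + a*y_i (conjugate of squared loss),
    # using the unbiased block estimate K*B@w[J] of the inner products X_I @ w
    u_hat = K * (B @ w[J])
    a[I] = (a[I] + sigma * (u_hat - y[I])) / (1.0 + sigma)
    # primal prox step for the l2 regularizer, with an unbiased block
    # estimate of the partial gradient (X^T a / n)_J
    g_hat = M * (B.T @ a[I]) / n
    w[J] = (w[J] - eta * g_hat) / (1.0 + eta * lam)

print("relative residual:", np.linalg.norm(X @ w - y) / np.linalg.norm(y))
```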
RSA: Byzantine-Robust Stochastic Aggregation Methods for Distributed Learning from Heterogeneous Datasets
In this paper, we propose a class of robust stochastic subgradient methods
for distributed learning from heterogeneous datasets in the presence of an unknown
number of Byzantine workers. The Byzantine workers, during the learning
process, may send arbitrary incorrect messages to the master due to data
corruptions, communication failures or malicious attacks, and consequently bias
the learned model. The key to the proposed methods is a regularization term
incorporated with the objective function so as to robustify the learning task
and mitigate the negative effects of Byzantine attacks. The resultant
subgradient-based algorithms are termed Byzantine-Robust Stochastic Aggregation
methods, justifying our acronym RSA used henceforth. In contrast to most of the
existing algorithms, RSA does not rely on the assumption that the data are
independent and identically distributed (i.i.d.) on the workers, and hence fits
a wider class of applications. Theoretically, we show that: i) RSA
converges to a near-optimal solution with the learning error dependent on the
number of Byzantine workers; ii) the convergence rate of RSA under Byzantine
attacks is the same as that of the stochastic gradient descent method, which is
free of Byzantine attacks. Numerically, experiments on real datasets corroborate
the competitive performance of RSA and a complexity reduction compared to the
state-of-the-art alternatives. Comment: To appear in AAAI 2019
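For concreteness, here is a minimal sketch of the l1-regularized RSA update described above on a synthetic least-squares task: each regular worker takes a stochastic subgradient step on its local loss plus lambda*||w_i - x||_1, and the master aggregates only sign() messages, so each worker's per-coordinate influence is capped at lambda. The data, the Byzantine behavior, and all constants are illustrative, and the master's own regularizer f_0 is dropped for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_workers, n_byz = 10, 10, 2
w_true = rng.standard_normal(d)

# each honest worker holds its own heterogeneous (non-i.i.d.) data
data = []
for i in range(n_workers):
    A = rng.standard_normal((50, d)) + 0.3 * i    # deliberately non-identical
    b = A @ w_true + 0.1 * rng.standard_normal(50)
    data.append((A, b))

lam = 2.0
x = np.zeros(d)                                   # master's model
w = [np.zeros(d) for _ in range(n_workers)]       # workers' local models

for t in range(1, 2001):
    eta = 0.1 / np.sqrt(t)
    msgs = []
    for i in range(n_workers):
        if i < n_byz:
            msgs.append(rng.standard_normal(d) * 100.0)  # Byzantine: garbage
        else:
            A, b = data[i]
            j = rng.integers(len(b))                     # stochastic gradient
            grad = A[j] * (A[j] @ w[i] - b[j])
            w[i] -= eta * (grad + lam * np.sign(w[i] - x))
            msgs.append(w[i])
    # master: sign() caps each worker's per-coordinate influence at lam
    x -= eta * lam * np.sum([np.sign(x - m) for m in msgs], axis=0)

print("error:", np.linalg.norm(x - w_true))
```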
Company classification using machine learning
The recent advancements in computational power and machine learning
algorithms have led to vast improvements in manifold areas of research.
Especially in finance, the application of machine learning enables both
researchers and practitioners to gain new insights into financial data and
well-studied areas such as company classification. In our paper, we demonstrate
that unsupervised machine learning algorithms can be used to visualize and
classify company data in an economically meaningful and effective way. In
particular, we implement the data-driven dimension reduction and visualization
tool t-distributed stochastic neighbor embedding (t-SNE) in combination with
spectral clustering. The resulting company groups can then be utilized by
experts in the field for empirical analysis and optimal decision making. By
providing an exemplary out-of-sample study within a portfolio optimization
framework, we show that the application of t-SNE and spectral clustering
improves the overall portfolio performance. Therefore, we introduce our
approach to the financial community as a valuable technique in the context of
data analysis and company classification. Comment: 16 pages, 6 figures, 1 table
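A minimal sketch of the described pipeline with scikit-learn, assuming a placeholder feature matrix in place of the paper's company fundamentals; the perplexity, number of clusters, and affinity are illustrative choices:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE
from sklearn.cluster import SpectralClustering

# stand-in for a (companies x financial features) matrix
X = np.random.default_rng(0).standard_normal((300, 20))

# t-SNE embeds the standardized data into 2-D for visualization ...
Z = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(
        StandardScaler().fit_transform(X))

# ... and spectral clustering groups companies in the embedded space
labels = SpectralClustering(n_clusters=8, affinity="nearest_neighbors",
                            random_state=0).fit_predict(Z)
```

Clustering on the t-SNE output rather than the raw features is the paper's design choice; the low-dimensional embedding makes the resulting groups directly inspectable by domain experts.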
On the hardness of the Learning with Errors problem with a discrete reproducible error distribution
In this work we show that the hardness of the Learning with Errors problem
with errors taken from the discrete Gaussian distribution implies the hardness
of the Learning with Errors problem with errors taken from the symmetric
Skellam distribution. Due to the sample preserving search-to-decision reduction
by Micciancio and Mol the same result applies to the decisional version of the
problem. Thus, we provide a variant of the Learning with Errors problem whose
hardness is based on conjecturally hard lattice problems and which uses a discrete error
distribution that is similar to the continuous Gaussian distribution in that it
is closed under convolution. As an application of this result we construct a
post-quantum cryptographic protocol for differentially private data analysis in
the distributed model. The security of this protocol is based on the hardness
of the new variant of the Decisional Learning with Errors problem. A feature of
this protocol is the use of the same noise for security and for differential
privacy, resulting in an efficiency boost.
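As a small illustration of the error distribution involved, the sketch below draws symmetric Skellam noise as the difference of two i.i.d. Poisson variables and forms LWE samples (a, <a, s> + e mod q); the parameters are toy-sized assumptions, far below anything cryptographically meaningful. A sum of independent Skellam(mu, mu) draws is again Skellam, which is the closure-under-convolution property the abstract exploits for distributed noise generation.

```python
import numpy as np

rng = np.random.default_rng(0)
q, n, mu = 2**13, 64, 40.0        # toy modulus, dimension, noise parameter

s = rng.integers(0, q, size=n)    # the secret vector

def lwe_sample():
    """One LWE sample with symmetric Skellam noise."""
    a = rng.integers(0, q, size=n)
    e = int(rng.poisson(mu)) - int(rng.poisson(mu))   # Skellam(mu, mu)
    return a, (int(a @ s) + e) % q

samples = [lwe_sample() for _ in range(10)]
```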
Distributed dual vigilance fuzzy adaptive resonance theory learns online, retrieves arbitrarily-shaped clusters, and mitigates order dependence
This paper presents a novel adaptive resonance theory (ART)-based modular
architecture for unsupervised learning, namely the distributed dual vigilance
fuzzy ART (DDVFA). DDVFA consists of a global ART system whose nodes are local
fuzzy ART modules. It is equipped with the distinctive features of distributed
higher-order activation and match functions, using dual vigilance parameters
responsible for cluster similarity and data quantization. Together, these allow
DDVFA to perform unsupervised modularization, create multi-prototype clustering
representations, retrieve arbitrarily-shaped clusters, and control its
compactness. Another important contribution is the reduction of
order-dependence, an issue that affects any agglomerative clustering method.
This paper demonstrates two approaches for mitigating order-dependence:
preprocessing using visual assessment of cluster tendency (VAT) or
postprocessing using a novel Merge ART module. The former is suitable for batch
processing, whereas the latter can be used in online learning. Experimental
results in the online learning mode carried out on 30 benchmark data sets show
that DDVFA cascaded with Merge ART statistically outperformed the best other
ART-based systems when samples were randomly presented. Conversely, they were
found to be statistically equivalent in the offline mode when samples were
pre-processed using VAT. Remarkably, performance comparisons to non-ART-based
clustering algorithms show that DDVFA (which learns incrementally) was also
statistically equivalent to the non-incremental (offline) methods of DBSCAN,
single linkage hierarchical agglomerative clustering (HAC), and k-means, while
retaining the appealing properties of ART. Links to the source code and data
are provided. Considering the algorithm's simplicity, online learning
capability, and performance, it is an ideal choice for many agglomerative
clustering applications.
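To make the building block concrete, here is a minimal sketch of a single Fuzzy ART module of the kind DDVFA distributes, with complement coding, the choice (activation) function, the vigilance (match) test, and fast-commit learning; the dual-vigilance wiring between the global ART system and the local modules, and the Merge ART post-processing, are omitted, and the parameter defaults are illustrative.

```python
import numpy as np

class FuzzyART:
    """A single Fuzzy ART module (inputs assumed scaled to [0, 1])."""

    def __init__(self, rho=0.7, alpha=1e-3, beta=1.0):
        self.rho, self.alpha, self.beta = rho, alpha, beta
        self.W = []                              # one weight vector per category

    def train(self, x):
        I = np.concatenate([x, 1.0 - x])         # complement coding
        # rank categories by the choice function |I ^ w| / (alpha + |w|)
        order = sorted(range(len(self.W)),
                       key=lambda j: -np.minimum(I, self.W[j]).sum()
                                     / (self.alpha + self.W[j].sum()))
        for j in order:
            # vigilance test: match function |I ^ w| / |I| against rho
            if np.minimum(I, self.W[j]).sum() / I.sum() >= self.rho:
                self.W[j] = (self.beta * np.minimum(I, self.W[j])
                             + (1 - self.beta) * self.W[j])   # resonance: learn
                return j
        self.W.append(I.copy())                  # mismatch everywhere: new category
        return len(self.W) - 1

art = FuzzyART(rho=0.75)
print(art.train(np.array([0.2, 0.9])), art.train(np.array([0.25, 0.85])))
```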
Dimensionality Reduction Ensembles
Ensemble learning has had many successes in supervised learning, but it has
been rare in unsupervised learning and dimensionality reduction. This study
explores dimensionality reduction ensembles, using principal component analysis
and manifold learning techniques to capture linear, nonlinear, local, and
global features in the original dataset. Dimensionality reduction ensembles are
tested first on simulation data and then on two real medical datasets using
random forest classifiers; the results suggest the efficacy of this approach, with
accuracies approaching those achieved on the full dataset. Limitations include the
computational cost of some of the stronger-performing algorithms, which may be
ameliorated through distributed computing and the development of more efficient
versions of these algorithms. Comment: 12 pages, 1 table, 8 figures; under review
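A minimal sketch of such an ensemble with scikit-learn, assuming PCA for the linear/global part and Isomap plus locally linear embedding for the nonlinear/local part; the dataset, component counts, and classifier settings are illustrative stand-ins for the study's own choices:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, LocallyLinearEmbedding
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)   # placeholder medical dataset

# side-by-side linear, global-nonlinear, and local-nonlinear embeddings
reducers = FeatureUnion([
    ("pca", PCA(n_components=5)),
    ("isomap", Isomap(n_components=5)),
    ("lle", LocallyLinearEmbedding(n_components=5)),
])

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("ensemble", reducers),                  # concatenated representation
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
print(cross_val_score(pipe, X, y, cv=5).mean())
```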
A Distributed Algorithm for Training Augmented Complex Adaptive IIR Filters
In this paper we consider the problem of decentralized (distributed) adaptive
learning, where the aim of the network is to train the coefficients of a widely
linear autoregressive moving average (ARMA) model by measurements collected by
the nodes. Such a problem arises in many sensor network-based applications such
as target tracking, fast rerouting, data reduction and data aggregation. We
assume that each node of the network uses the augmented complex adaptive
infinite impulse response (ACAIIR) filter as the learning rule, and nodes
interact with each other under an incremental mode of cooperation. Since the
proposed algorithm (the incremental augmented complex adaptive IIR (IACA-IIR) algorithm)
relies on the augmented complex statistics, it can be used to model both types
of complex-valued signals (proper and improper signals). To evaluate the
performance of the proposed algorithm, we use both synthetic and real-world
complex signals in our simulations. The results exhibit superior performance of
the proposed algorithm over the non-cooperative ACAIIR algorithm. Comment: Draft version, 11 pages, 4 figures
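The sketch below shows only the cooperation pattern: an estimate travels around a ring of nodes, and each node refines it with its own measurement before passing it on. For brevity it uses a simplified augmented (widely linear) complex LMS update in place of the full ACAIIR recursion, so the ARMA feedback part is not modeled; the sizes and step size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, L, mu = 8, 4, 0.01

# widely linear ground truth: d = h^H u + g^H conj(u) + noise
h = rng.standard_normal(L) + 1j * rng.standard_normal(L)
g = rng.standard_normal(L) + 1j * rng.standard_normal(L)

w = np.zeros(2 * L, dtype=complex)           # augmented weights, stacks [h; g]
for epoch in range(500):
    for node in range(n_nodes):              # incremental mode: w visits each node
        u = rng.standard_normal(L) + 1j * rng.standard_normal(L)
        ua = np.concatenate([u, u.conj()])   # augmented input [u; conj(u)]
        d = h.conj() @ u + g.conj() @ u.conj() + 0.01 * rng.standard_normal()
        e = d - w.conj() @ ua                # local estimation error at this node
        w = w + mu * ua * e.conj()           # augmented LMS step, then pass w on

print("weight error:", np.linalg.norm(w - np.concatenate([h, g])))
```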
Geometric Foundations of Data Reduction
The purpose of this paper is to write a complete survey of the (spectral)
manifold learning methods and nonlinear dimensionality reduction (NLDR) in data
reduction. The first two NLDR methods (Isomap and locally linear embedding) were
both published in Science in 2000, and each solves the problem of reducing
high-dimensional data endowed with an intrinsic nonlinear structure. Computer
scientists and theoretical physicists usually interpret this intrinsic nonlinear
structure through the concept of a manifold from geometry and topology. In 2001,
the concept of manifold learning first appeared with an NLDR method called
Laplacian Eigenmaps, proposed by Belkin and Niyogi. In the typical manifold
learning setup, the data set, also called the observation set, is distributed on
or near a low-dimensional manifold $\mathcal{M}$ embedded in $\mathbb{R}^D$, so
that each observation has a $D$-dimensional representation. The goal of
(spectral) manifold learning is to reduce these observations to a compact
lower-dimensional representation based on the geometric information. The
reduction procedure is called the (spectral)
manifold learning method. In this paper, we derive each (spectral) manifold
learning method with the matrix and operator representation, and we then
discuss the convergence behavior of each method in a uniform geometric
language. Hence, we name the survey Geometric Foundations of Data Reduction. Comment: 79 pages, survey
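As a concrete instance of the spectral recipe the survey covers, here is a minimal sketch of Laplacian Eigenmaps using a kNN graph, 0/1 edge weights, and the unnormalized graph Laplacian; the original method solves the generalized problem Lf = lambda*Df and allows heat-kernel weights, so k and the weighting here are illustrative simplifications.

```python
import numpy as np

def laplacian_eigenmaps(X, n_components=2, k=10):
    """Embed rows of X via the bottom nontrivial eigenvectors of the
    graph Laplacian of a symmetrized kNN graph."""
    n = len(X)
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    W = np.zeros((n, n))
    nn = np.argsort(D2, axis=1)[:, 1:k + 1]              # skip self at position 0
    for i in range(n):
        W[i, nn[i]] = 1.0                                # simple 0/1 edge weights
    W = np.maximum(W, W.T)                               # symmetrize the graph
    L = np.diag(W.sum(axis=1)) - W                       # unnormalized Laplacian
    _, vecs = np.linalg.eigh(L)                          # ascending eigenvalues
    return vecs[:, 1:n_components + 1]                   # drop the constant vector
```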
Intermittent Pulling with Local Compensation for Communication-Efficient Federated Learning
Federated Learning is a powerful machine learning paradigm to cooperatively
train a global model with highly distributed data. A major bottleneck in the
performance of the distributed Stochastic Gradient Descent (SGD) algorithm for
large-scale Federated Learning is the communication overhead of pushing local
gradients and pulling the global model. In this paper, to reduce the communication
complexity of Federated Learning, a novel approach named Pulling Reduction with
Local Compensation (PRLC) is proposed. Specifically, each training node
intermittently pulls the global model from the server during SGD iterations,
and is therefore sometimes unsynchronized with the server. In such a case, it
uses its local update to compensate for the gap between the local model and the
global model. Our rigorous theoretical analysis of PRLC yields two important
findings. First, we prove that the convergence rate of PRLC
preserves the same order as the classical synchronous SGD for both
strongly-convex and non-convex cases with good scalability due to the linear
speedup with respect to the number of training nodes. Second, we show that PRLC
admits a lower pulling frequency than the existing pulling reduction method
without local compensation. We also conduct extensive experiments on various
machine learning models to validate our theoretical results. Experimental
results show that our approach achieves a significant pulling reduction over
the state-of-the-art methods; e.g., PRLC requires only half the pulling
operations of LAG.
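A minimal sketch of one PRLC worker's loop under the stated mechanism: the worker always pushes its local gradient, pulls the global model only intermittently, and in the skipped rounds applies its own update to its stale copy as the local compensation. The pull schedule, function names, and parameters here are illustrative assumptions, not the paper's exact protocol.

```python
def prlc_worker(grad_fn, push_grad, pull_global, x0, eta=0.1, tau=5, steps=1000):
    """grad_fn(x): local stochastic gradient; push_grad/pull_global:
    communication with the parameter server (assumed provided)."""
    x = x0.copy()                       # worker's possibly stale model copy
    for t in range(steps):
        g = grad_fn(x)
        push_grad(g)                    # gradients are always pushed
        if t % tau == 0:
            x = pull_global()           # intermittent pull: resynchronize
        else:
            x = x - eta * g             # local compensation instead of a pull
    return x
```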