Search CORE

38 research outputs found

Modeling Scalability of Distributed Machine Learning

Author: Marwah Manish
Simanovsky Andrey
Ulanov Alexander
Publication date: 24/03/2017

Present-day machine learning is computationally intensive and processes large amounts of data. It is implemented in a distributed fashion to address these scalability issues, with the work parallelized across a number of computing nodes. It is usually hard to estimate in advance how many nodes to use for a particular workload. We propose a simple framework for estimating the scalability of distributed machine learning algorithms. We measure scalability by the speedup an algorithm achieves with more nodes. We propose time complexity models for gradient descent and graphical model inference. We validate our models with experiments on deep learning training and belief propagation. This framework was used to study the scalability of machine learning algorithms in Apache Spark. Comment: 6 pages, 4 figures, appears at ICDE 201
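
The idea of measuring scalability as speedup over a single node can be illustrated with a minimal sketch. The cost model here is an assumption made for illustration, not the model from the paper: per-iteration compute time is taken to divide evenly across nodes, and communication time is taken to grow with log2 of the node count. The names `iteration_time`, `speedup`, and `best_node_count` are hypothetical.

```python
import math


def iteration_time(n, compute_s, comm_s):
    """Assumed per-iteration time on n nodes.

    compute_s: seconds of computation on a single node (assumed to split evenly).
    comm_s:    seconds per communication round (assumed to scale with log2(n)).
    """
    if n < 1:
        raise ValueError("node count must be at least 1")
    return compute_s / n + comm_s * math.log2(n)


def speedup(n, compute_s, comm_s):
    """Speedup of n nodes relative to one node under the assumed model."""
    return iteration_time(1, compute_s, comm_s) / iteration_time(n, compute_s, comm_s)


def best_node_count(max_nodes, compute_s, comm_s):
    """Node count in 1..max_nodes with the highest modeled speedup."""
    return max(range(1, max_nodes + 1), key=lambda n: speedup(n, compute_s, comm_s))


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(n, round(speedup(n, compute_s=10.0, comm_s=1.0), 3))
    print("best:", best_node_count(16, compute_s=10.0, comm_s=1.0))
```

Under these assumptions the speedup rises, peaks, and then falls as communication cost overtakes the savings from parallel computation, which is why the node count is worth estimating before a workload is run.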

arXiv.org e-Print Archive

Crossref

MLlib: Machine learning in Apache Spark

Author: Amde Manish
Bradley Joseph
Franklin Michael J.
Freeman Jeremy
Liu Davies
Meng Xiangrui
Owen Sean
Sparks Evan
Talwalkar Ameet
Tsai DB
Venkataraman Shivaram
Xin Doris
Yavuz Burak
Zadeh Reza
Zaharia Matei A
Publication venue: JMLR, Inc.
Publication date: 25/04/2018

Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLlib, Spark's open-source distributed machine learning library. MLlib provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shipped with Spark, MLlib supports several languages and provides a high-level API that leverages Spark's rich ecosystem to simplify the development of end-to-end machine learning pipelines. MLlib has experienced rapid growth due to its vibrant open-source community of over 140 contributors, and includes extensive documentation to support further growth and to let users quickly get up to speed.

DSpace@MIT

Creating a Business Value while Transforming Data Assets using Machine Learning

Author: Dimitrovska Ivana
Malinovski Toni
Publication venue: Faculty of Computer Science, Sriwijaya University
Publication date: 12/06/2017

Machine learning enables computers to learn from large amounts of data without explicit programming. Beyond its commercial applications, companies are starting to recognize the importance and potential of machine learning for transforming their data assets into business value. This study explores the integration of machine learning into core business processes, enabling predictive analytics that can increase business value and provide a competitive advantage. It proposes a machine learning algorithm based on regression analysis for a business solution in a large enterprise in Macedonia, predicting a real-valued outcome from a given array of business inputs. The results show that most of the machine learning predictions for the desired process output deviated by 0 to 15% from the actual employees' decisions. Hence, the study verifies the appropriateness of the chosen approach, with predictive accuracy that can be meaningful in practice. As a machine learning case study in a business context, it contains valuable information that can help companies understand the significance of machine learning for enterprise computing. It also points out some potential pitfalls of machine learning misuse.

ComEngApp-Journal

Directory of Open Access Journals

Computer Engineering and Applications Journal (ComEngApp, Universitas Sriwijaya)

To Index or Not to Index: Optimizing Exact Maximum Inner Product Search

Author: Abuzaid Firas
Bailis Peter
Sethi Geet
Zaharia Matei
Publication date: 14/03/2019

Exact Maximum Inner Product Search (MIPS) is an important task that is widely pertinent to recommender systems and high-dimensional similarity search. The brute-force approach to solving exact MIPS is computationally expensive, thus spurring recent development of novel indexes and pruning techniques for this task. In this paper, we show that a hardware-efficient brute-force approach, blocked matrix multiply (BMM), can outperform the state-of-the-art MIPS solvers by over an order of magnitude, for some -- but not all -- inputs. In this paper, we also present a novel MIPS solution, MAXIMUS, that takes advantage of hardware efficiency and pruning of the search space. Like BMM, MAXIMUS is faster than other solvers by up to an order of magnitude, but again only for some inputs. Since no single solution offers the best runtime performance for all inputs, we introduce a new data-dependent optimizer, OPTIMUS, that selects online with minimal overhead the best MIPS solver for a given input. Together, OPTIMUS and MAXIMUS outperform state-of-the-art MIPS solvers by 3.2× on average, and up to 10.9×, on widely studied MIPS datasets. Comment: 12 pages, 8 figures, 2 tables
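
The brute-force approach described in this abstract can be sketched roughly as follows. This is an illustrative pure-Python version, not the authors' BMM implementation: it processes user vectors in blocks, builds the block's score matrix against all items, and keeps the top-k items per user. The function name `exact_mips_blocked` and the `block_size` parameter are hypothetical, and a hardware-efficient version would presumably hand each block to an optimized matrix-multiply routine instead of nested Python loops.

```python
import heapq


def exact_mips_blocked(users, items, k=1, block_size=2):
    """For each user vector, return the indices of the k items with the
    largest inner product, scanning users one block at a time."""
    results = []
    for start in range(0, len(users), block_size):
        block = users[start:start + block_size]
        # Score matrix for this block: one row per user, one column per item.
        scores = [
            [sum(u * v for u, v in zip(user, item)) for item in items]
            for user in block
        ]
        for row in scores:
            # Exact top-k over all items: no index and no pruning.
            results.append(heapq.nlargest(k, range(len(items)), key=row.__getitem__))
    return results


if __name__ == "__main__":
    users = [[1, 0], [0, 1], [1, 1]]
    items = [[2, 0], [0, 3], [1, 1]]
    print(exact_mips_blocked(users, items, k=1))
```

Because every user-item pair is scored, the result is exact by construction. The trade-off the abstract describes is between this full scan, which suits fast dense arithmetic, and index-based solvers that skip work through pruning.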

arXiv.org e-Print Archive

Crossref