38 research outputs found

    Modeling Scalability of Distributed Machine Learning

    Full text link
    Present day machine learning is computationally intensive and processes large amounts of data. It is implemented in a distributed fashion in order to address these scalability issues. The work is parallelized across a number of computing nodes. It is usually hard to estimate in advance how many nodes to use for a particular workload. We propose a simple framework for estimating the scalability of distributed machine learning algorithms. We measure the scalability by means of the speedup an algorithm achieves with more nodes. We propose time complexity models for gradient descent and graphical model inference. We validate our models with experiments on deep learning training and belief propagation. This framework was used to study the scalability of machine learning algorithms in Apache Spark.Comment: 6 pages, 4 figures, appears at ICDE 201

    MLlib: Machine learning in Apache Spark

    Get PDF
    Apache Spark is a popular open-source platform for large-scale data processing that is well-suited for iterative machine learning tasks. In this paper we present MLLIB, Spark's open-source distributed machine learning library. MLLIB provides efficient functionality for a wide range of learning settings and includes several underlying statistical, optimization, and linear algebra primitives. Shipped with Spark, MLLIB supports several languages and provides a high-level API that leverages Spark's rich ecosystem to simplify the development of end-to-end machine learning pipelines. MLLIB has experienced a rapid growth due to its vibrant open-source community of over 140 contributors, and includes extensive documentation to support further growth and to let users quickly get up to speed

    Creating a Business Value while Transforming Data Assets using Machine Learning

    Get PDF
    Machine learning enables computers to learn from large amounts of data without specific programming. Besides its commercial application, companies are starting to recognize machine learning importance and possibilities in order to transform their data assets into business value. This study explores integration of machine learning into business core processes, while enabling predictive analytics that can increase business values and provide competitive advantage. It proposes machine learning algorithm based on regression analysis for a business solution in large enterprise company in Macedonia, while predicting real-value outcome from a given array of business inputs. The results show that most of the machine learning predictive values for the desired process output deviated from 0 to 15% of actual employees' decision. Hence, it verifies the appropriateness of the chosen approach, with predictive accuracy that can be meaningful in practice. As a machine learning case study in business context, it contains valuable information that can help companies understand the significance of machine learning for enterprise computing. It also points out some potential pitfalls of machine learning misuse

    To Index or Not to Index: Optimizing Exact Maximum Inner Product Search

    Full text link
    Exact Maximum Inner Product Search (MIPS) is an important task that is widely pertinent to recommender systems and high-dimensional similarity search. The brute-force approach to solving exact MIPS is computationally expensive, thus spurring recent development of novel indexes and pruning techniques for this task. In this paper, we show that a hardware-efficient brute-force approach, blocked matrix multiply (BMM), can outperform the state-of-the-art MIPS solvers by over an order of magnitude, for some -- but not all -- inputs. In this paper, we also present a novel MIPS solution, MAXIMUS, that takes advantage of hardware efficiency and pruning of the search space. Like BMM, MAXIMUS is faster than other solvers by up to an order of magnitude, but again only for some inputs. Since no single solution offers the best runtime performance for all inputs, we introduce a new data-dependent optimizer, OPTIMUS, that selects online with minimal overhead the best MIPS solver for a given input. Together, OPTIMUS and MAXIMUS outperform state-of-the-art MIPS solvers by 3.2×\times on average, and up to 10.9×\times, on widely studied MIPS datasets.Comment: 12 pages, 8 figures, 2 table
    corecore