108 research outputs found

    Asynchronous Execution of Python Code on Task Based Runtime Systems

    Despite advancements in parallel and distributed computing, the complexity of programming High Performance Computing (HPC) resources has deterred many domain experts, especially in machine learning and artificial intelligence (AI), from exploiting the performance benefits of such systems. Researchers and scientists favor high-productivity languages to avoid the inconvenience of programming in low-level languages and the cost of acquiring the skills required to program at that level. In recent years Python, supported by linear algebra libraries such as NumPy, has gained popularity despite limitations that prevent such code from running on distributed systems. Here we present a solution that maintains both high-level programming abstractions and parallel and distributed efficiency. Phylanx is an asynchronous array processing toolkit that transforms Python and NumPy operations into code that can be executed in parallel on HPC resources, by mapping Python and NumPy functions and variables into a dependency tree executed by HPX, a general-purpose, parallel, task-based runtime system written in C++. Phylanx additionally provides introspection and visualization capabilities for debugging and performance analysis. We have tested the foundations of our approach by comparing our implementations of widely used machine learning algorithms to their accepted NumPy counterparts.
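    The execution model this abstract describes — turning array operations into a dependency tree of tasks whose independent branches run concurrently — can be illustrated with a minimal sketch using Python's standard futures. The `task` and `const` helpers below are hypothetical illustrations of the idea, not Phylanx's actual API (Phylanx delegates scheduling to the HPX C++ runtime rather than a thread pool).

```python
# Illustrative sketch of futures-based dependency-tree execution.
# `task` and `const` are hypothetical helpers, NOT Phylanx functions.
from concurrent.futures import ThreadPoolExecutor

import numpy as np

pool = ThreadPoolExecutor(max_workers=8)

def const(x):
    # Wrap a concrete value as an already-resolved leaf of the tree.
    return pool.submit(lambda: x)

def task(fn, *deps):
    # Schedule fn to run once all of its dependency futures resolve.
    return pool.submit(lambda: fn(*(d.result() for d in deps)))

# Build a small dependency tree for (A @ B) + (A @ C): the two matrix
# products have no dependency on each other and may run in parallel.
A = const(np.ones((2, 2)))
B = const(np.eye(2))
C = const(2 * np.eye(2))
left = task(np.matmul, A, B)
right = task(np.matmul, A, C)
total = task(np.add, left, right)

print(total.result())  # every entry is 1.0 + 2.0 = 3.0
```

    Nothing executes eagerly at the call sites that build the tree; work starts as dependencies resolve, which is the property that lets a runtime like HPX overlap independent subtrees across cores and nodes.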

    A Taylor polynomial expansion line search for large-scale optimization

    In trying to cope with the Big Data deluge, the landscape of distributed computing has changed. Large commodity hardware clusters, typically operating in some form of MapReduce framework, are becoming prevalent for organizations that require both tremendous storage capacity and fault tolerance. However, the high cost of communication can dominate the computation time in large-scale optimization routines in these frameworks. This thesis considers the problem of how to efficiently conduct univariate line searches in commodity clusters in the context of gradient-based batch optimization algorithms, like the staple limited-memory BFGS (LBFGS) method. In it, a new line search technique is proposed for cases where the underlying objective function is analytic, as in logistic regression and low rank matrix factorization. The technique approximates the objective function by a truncated Taylor polynomial along a fixed search direction. The coefficients of this polynomial may be computed efficiently in parallel with far less communication than needed to transmit the high-dimensional gradient vector, after which the polynomial may be minimized with high accuracy in a neighbourhood of the expansion point without distributed operations. This Polynomial Expansion Line Search (PELS) may be invoked iteratively until the expansion point and minimum are sufficiently accurate, and can provide substantial savings in time and communication costs when multiple iterations in the line search procedure are required. Three applications of the PELS technique are presented herein for important classes of analytic functions: (i) logistic regression (LR), (ii) low-rank matrix factorization (MF) models, and (iii) the feedforward multilayer perceptron (MLP). In addition, for LR and MF, implementations of PELS in the Apache Spark framework for fault-tolerant cluster computing are provided. 
These implementations conferred significant convergence enhancements to their respective algorithms and will be of interest to Spark and Hadoop practitioners. For instance, the Spark PELS technique reduced the number of iterations and the time required by LBFGS to reach terminal training accuracies for LR models by factors of 1.8--2. Substantial acceleration was also observed for the Nonlinear Conjugate Gradient algorithm for MLP models, an interesting case for future study in optimization for neural networks. The PELS technique is applicable to a broad class of models for Big Data processing and large-scale optimization, and can be a useful component of batch optimization routines.
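    The core idea can be sketched for the matrix factorization case, where the squared-error loss along any fixed search direction is *exactly* a quartic polynomial in the step size t, so a degree-4 expansion involves no truncation at all. The toy data and the local interpolation used below to obtain the coefficients are illustrative assumptions standing in for the thesis's distributed coefficient sums, which PELS would accumulate with far less communication than a full gradient exchange.

```python
# Hedged sketch of a polynomial-expansion line search for low-rank
# matrix factorization; data and names are illustrative, not from the
# thesis. phi(t) = loss along the search direction is an exact quartic.
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 20, 15, 3
R = rng.normal(size=(m, n))        # dense toy "ratings" matrix
U = 0.1 * rng.normal(size=(m, r))  # current factors
V = 0.1 * rng.normal(size=(n, r))

def loss(U, V):
    return 0.5 * np.linalg.norm(U @ V.T - R) ** 2

# Steepest-descent direction for both factors jointly.
E = U @ V.T - R
dU, dV = -E @ V, -E.T @ U

# phi(t) = loss(U + t*dU, V + t*dV) is a degree-4 polynomial in t, so
# five samples determine its coefficients exactly; in a cluster these
# coefficients would come from per-partition sums in one reduction.
ts = np.arange(5.0)
coeffs = np.polyfit(ts, [loss(U + t * dU, V + t * dV) for t in ts], 4)

# Minimize the quartic without further passes over the data: take the
# real roots of its derivative and keep the one with the lowest value.
crit = np.roots(np.polyder(coeffs))
crit = crit[np.isreal(crit)].real
t_star = crit[np.argmin(np.polyval(coeffs, crit))]

print(loss(U, V), loss(U + t_star * dU, V + t_star * dV))
```

    Because the quartic's leading coefficient is nonnegative, the best real critical point is its global minimizer, and the descent direction guarantees phi'(0) < 0, so the chosen step strictly decreases the loss. For logistic regression the polynomial is a genuine truncation of the Taylor series rather than exact, which is where the iterative re-expansion described in the abstract comes in.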

    MLBCD: a machine learning tool for big clinical data


    Tensor Learning for Recovering Missing Information: Algorithms and Applications on Social Media

    Real-time social systems like Facebook, Twitter, and Snapchat have been growing rapidly, producing exabytes of data in different views or aspects. Coupled with ever more GPS-enabled sharing of videos, images, blogs, and tweets that provide valuable information regarding “who”, “where”, “when”, and “what”, these real-time human sensor data promise new research opportunities to uncover models of user behavior, mobility, and information sharing. These real-time dynamics in social systems usually come in multiple aspects, which can help better explain the social interactions of the underlying network. However, these multi-aspect datasets are often raw and incomplete owing to various unpredictable or unavoidable reasons; for instance, API limitations and data sampling policies can lead to an incomplete (and often biased) perspective on them. This missing data could raise serious concerns, such as biased estimation of structural properties of the network and of properties of information cascades in social networks. In order to recover missing values or information in social systems, we identify the “4S” challenges: extreme sparsity of the observed multi-aspect datasets, adoption of rich side information that can describe the similarities of entities, generation of robust models rather than models limited to specific applications, and scalability to real large-scale datasets (billions of observed entries). With these challenges in mind, this dissertation aims to develop scalable and interpretable tensor-based frameworks, algorithms, and methods for recovering missing information on social media. In particular, this dissertation research makes four unique contributions:
    • The first contribution is a scalable framework based on low-rank tensor learning in the presence of incomplete information. Concretely, we formally define the problem of recovering the spatio-temporal dynamics of online memes and tackle it with a novel tensor-based factorization approach built on the alternating direction method of multipliers (ADMM), integrating latent relationships derived from contextual information among locations, memes, and times.
    • The second contribution evaluates the generalization of the proposed tensor learning framework and extends it to the recommendation problem. In particular, we develop a novel tensor-based approach to personalized expert recommendation that integrates both the latent relationships between homogeneous entities (e.g., users and users, experts and experts) and those between heterogeneous entities (e.g., users and experts, topics and experts), drawn from the geo-spatial, topical, and social contexts.
    • The third contribution extends the proposed tensor learning framework to the user topical profiling problem. Specifically, we propose a tensor-based contextual regularization model embedded in a matrix factorization framework, which leverages the social, textual, and behavioral contexts across users to overcome the identified challenges.
    • The fourth contribution scales the proposed tensor learning framework up to real large-scale datasets that are too big to fit in the main memory of a single machine. In particular, we propose a novel distributed tensor completion algorithm with trace-based regularization of the auxiliary information, based on ADMM, designed to scale to tensors with billions of entries by efficiently computing auxiliary variables, minimizing intermediate data, and reducing the workload of updating new tensors.
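    The recovery problem at the heart of this abstract — inferring unobserved entries from a sparse, low-rank structure — can be illustrated with a deliberately simplified two-way (matrix) analogue. The sketch below uses iterative SVD imputation ("hard impute") rather than the dissertation's ADMM tensor factorization with side information; the data, rank, and sampling rate are illustrative assumptions.

```python
# Simplified matrix analogue of low-rank completion, NOT the
# dissertation's ADMM tensor algorithm: alternate between projecting
# onto rank-R matrices and re-imposing the observed entries.
import numpy as np

rng = np.random.default_rng(1)
m, n, R = 40, 40, 2
M = rng.normal(size=(m, R)) @ rng.normal(size=(R, n))  # exact rank-R truth
mask = rng.random((m, n)) < 0.4                        # ~40% observed

Z = np.where(mask, M, 0.0)  # start by zero-filling the missing entries
for _ in range(300):
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    low = (U[:, :R] * s[:R]) @ Vt[:R]  # best rank-R approximation of Z
    Z = np.where(mask, M, low)         # keep observed values, impute rest

err = np.linalg.norm((~mask) * (low - M)) / np.linalg.norm((~mask) * M)
print(f"relative error on missing entries: {err:.2e}")
```

    The tensor setting replaces the SVD projection with a factorization of a multi-way array and, in the dissertation's framework, adds contextual regularizers and a distributed ADMM solver so the same principle scales to billions of observed entries.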