
    Distributed Bayesian Probabilistic Matrix Factorization

    Matrix factorization is a common machine learning technique for recommender systems. Despite its high prediction accuracy, the Bayesian Probabilistic Matrix Factorization algorithm (BPMF) has not been widely used on large-scale data because of its high computational cost. In this paper we propose a distributed high-performance parallel implementation of BPMF on shared-memory and distributed architectures. We show that, by using efficient load balancing based on work stealing on a single node and asynchronous communication in the distributed version, we beat state-of-the-art implementations.
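
As a rough illustration of why work stealing helps with load balancing, the following is a minimal single-threaded simulation, not the paper's implementation. The function names, the task costs, and the "steal from the longest queue" policy are all assumptions made for this sketch.

```python
from collections import deque


def makespan_without_stealing(task_costs_per_worker):
    """Static assignment: each worker only runs its own tasks."""
    return max(sum(costs) for costs in task_costs_per_worker)


def makespan_with_stealing(task_costs_per_worker):
    """Toy simulation: an idle worker steals from the longest queue.

    Hypothetical policy for illustration only; real schedulers differ.
    """
    queues = [deque(costs) for costs in task_costs_per_worker]
    finish = [0] * len(queues)  # time at which each worker is next free
    workers = range(len(queues))
    while any(queues):
        # The worker that becomes free first picks up the next task.
        w = min(workers, key=lambda i: finish[i])
        if queues[w]:
            cost = queues[w].pop()  # owner takes from its own end
        else:
            victim = max(workers, key=lambda i: len(queues[i]))
            cost = queues[victim].popleft()  # thief takes from the other end
        finish[w] += cost
    return max(finish)


# Eight unit-cost tasks, all initially assigned to the first of four workers.
tasks = [[1] * 8, [], [], []]
static_time = makespan_without_stealing(tasks)  # 8
stealing_time = makespan_with_stealing(tasks)   # 2
```

In this toy case the static assignment leaves three workers idle, while stealing spreads the eight tasks evenly across all four.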

    Distributed Bayesian Matrix Factorization with Limited Communication

    Bayesian matrix factorization (BMF) is a powerful tool for producing low-rank representations of matrices, predicting missing values, and providing confidence intervals. Scaling up posterior inference to massive matrices is challenging and requires distributing both data and computation over many workers, making communication the main computational bottleneck. Embarrassingly parallel inference would remove the need for communication by using completely independent computations on different data subsets, but it suffers from the inherent unidentifiability of BMF solutions. We introduce a hierarchical decomposition of the joint posterior distribution, which couples the subset inferences, allowing for embarrassingly parallel computations in a sequence of at most three stages. Using an efficient approximate implementation, we show improvements empirically on both real and simulated data. Our distributed approach is able to achieve a speed-up of almost an order of magnitude over the full posterior, with a negligible effect on predictive accuracy. Our method outperforms state-of-the-art embarrassingly parallel MCMC methods in accuracy, and achieves results competitive with other available distributed and parallel implementations of BMF.
    Comment: 28 pages, 8 figures. The paper is published in Machine Learning journal. An implementation of the method is available in SMURFF software on GitHub (bmfpp branch): https://github.com/ExaScience/smurf
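
One way to see the unidentifiability mentioned above: for any orthogonal matrix R, the factor pairs (U, V) and (UR, VR) reproduce the same matrix, because (UR)(VR)^T = U R R^T V^T = U V^T. Independent workers can therefore settle on differently rotated factors that cannot be combined directly. The sketch below checks this numerically. The matrices, the angle, and the helper names are arbitrary choices for illustration and are not taken from the paper.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


# Arbitrary rank-2 factors: X = U V^T is a 3 x 2 matrix.
U = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
V = [[2.0, 1.0], [0.0, 1.5]]

# Any 2 x 2 rotation is orthogonal: R R^T = I.
theta = 0.7
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]

U_rot = matmul(U, R)
V_rot = matmul(V, R)

X_original = matmul(U, transpose(V))
X_rotated = matmul(U_rot, transpose(V_rot))

# The factors differ, but the reconstructed matrices agree up to rounding.
max_diff = max(abs(X_original[i][j] - X_rotated[i][j])
               for i in range(3) for j in range(2))
```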

    MPI Collectives for Multi-core Clusters: Optimized Performance of the Hybrid MPI+MPI Parallel Codes

    The advent of multi-/many-core processors in clusters advocates hybrid parallel programming, which combines Message Passing Interface (MPI) for inter-node parallelism with a shared memory model for on-node parallelism. Compared to the traditional hybrid approach of MPI plus OpenMP, a new but promising hybrid approach of MPI plus MPI-3 shared-memory extensions (MPI+MPI) is gaining traction. We describe an algorithmic approach for collective operations (with allgather and broadcast as concrete examples) in the context of hybrid MPI+MPI, so as to minimize memory consumption and memory copies. With this approach, only one memory copy is maintained and shared by on-node processes. This allows the removal of unnecessary on-node copies of replicated data that are required between MPI processes when the collectives are invoked in the context of pure MPI. We compare our approach of collectives for hybrid MPI+MPI with the traditional one for pure MPI, and also discuss the synchronization that is required to guarantee data integrity. The performance of our approach has been validated on a Cray XC40 system (Cray MPI) and an NEC cluster (OpenMPI), showing that it achieves comparable or better performance for allgather operations. We have further validated our approach with a standard computational kernel, namely distributed matrix multiplication, and a Bayesian Probabilistic Matrix Factorization code.
    Comment: 10 pages. Accepted for publication in ICPP Workshops 201
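
The memory saving from keeping a single on-node copy can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not code from the paper. It counts only the allgather result buffers and ignores send buffers and any MPI-internal buffers. The function name and the example cluster sizes are assumptions.

```python
def allgather_result_bytes_per_node(nodes, procs_per_node, bytes_per_proc, shared):
    """Bytes used on one node to hold allgather results.

    shared=False: pure MPI, every process holds its own full result.
    shared=True:  one result buffer shared by all on-node processes.
    """
    total_procs = nodes * procs_per_node
    gathered = total_procs * bytes_per_proc  # size of one full result
    copies = 1 if shared else procs_per_node
    return copies * gathered


# Hypothetical cluster: 4 nodes, 24 processes per node, 1 MB contributed per process.
pure_mpi = allgather_result_bytes_per_node(4, 24, 1_000_000, shared=False)
hybrid = allgather_result_bytes_per_node(4, 24, 1_000_000, shared=True)
reduction_factor = pure_mpi // hybrid  # equals procs_per_node
```

Under these assumptions, the per-node footprint of the result shrinks by a factor equal to the number of processes per node.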

    Identifying drug-target and drug-disease associations using computational intelligence

    Background: Traditional drug development is an expensive process that typically requires the investment of a large number of resources in terms of finances, equipment, and time. However, sometimes these efforts do not result in a pharmaceutical product in the market. To overcome the limitations of this process, complementary—or in some cases, alternative—methods with high-throughput results are necessary. Computational drug discovery is a shortcut that can reduce the difficulties of traditional methods because of its flexible nature. Drug repositioning, which aims to find new applications for existing drugs, is one of the promising approaches in computational drug discovery. Considering the availability of different types of data in various public databases, drug-disease association identification and drug repositioning can be performed based on the interaction of drugs and biomolecules. Moreover, drug repositioning mainly focuses on the similarity of drugs and the similarity of agents interacting with drugs. It is assumed that if drug D is associated or interacts with target T, then drugs similar to drug D can be associated or interact with target T or targets similar to target T. Therefore, similarity-based approaches are widely used for drug repositioning. Research Objectives: Develop novel computational methods for drug-target and drug-disease association prediction to be used for drug repositioning. Results: In this thesis, the problem of drug-disease association identification and drug repositioning is divided into sub-problems. These sub-problems include drug-target interaction prediction and using targets as intermediaries for drug-disease association identification. Addressing these subproblems results in the development of three new computational models for drug-target interaction and drug-disease association prediction: MDIPA, NMTF-DTI, and NTD-DR. 
    MDIPA is a nonnegative matrix factorization-based method to predict interaction scores of drug-microRNA pairs, where the interaction scores can effectively be used for drug repositioning. This method uses the functional similarity of microRNAs and the structural similarity of drugs to make predictions. To include more biomolecules (e.g., proteins) in the study and to achieve a more flexible model, we develop NMTF-DTI. This nonnegative matrix tri-factorization method uses multiple types of similarities for drugs and proteins to predict the associations between drugs and targets and their interaction scores. To take another step towards drug repositioning, we identify the associations between drugs and diseases. In this step, we develop NTD-DR, a nonnegative tensor decomposition approach in which multiple similarities for drugs, targets, and diseases are used to identify the drug-disease associations to be used for drug repositioning. The details of each method are discussed in Chapters 3, 4, and 5, respectively. Future work will focus on considering additional biomolecules as drug targets to identify drug-disease associations for drug repositioning. In summary, using nonnegative matrix factorization, nonnegative matrix tri-factorization, and nonnegative tensor decomposition, together with different types of association information and multiple types of similarities, improves the performance of the proposed methods over methods that use a single type of association or similarity information.
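
For readers unfamiliar with the underlying technique, the following is a minimal sketch of plain nonnegative matrix factorization using the standard Lee-Seung multiplicative updates. It is not MDIPA, NMTF-DTI, or NTD-DR: it uses no similarity information, and the toy interaction matrix, rank, and function names are assumptions made for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def nmf(X, rank, iters=200, seed=0, eps=1e-9):
    """Approximate nonnegative X (n x m) as W (n x rank) times H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return W, H


def frobenius_error(X, W, H):
    """Frobenius norm of the reconstruction residual X - W H."""
    approx = matmul(W, H)
    return sum((X[i][j] - approx[i][j]) ** 2
               for i in range(len(X)) for j in range(len(X[0]))) ** 0.5


# Toy 4 x 4 "interaction" matrix with two blocks (hypothetical data).
X = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]]
W, H = nmf(X, rank=2)
scores = matmul(W, H)  # reconstructed scores for every pair
```

The entries of `scores` can be read as predicted interaction strengths. Because the updates only multiply by nonnegative ratios, W and H stay nonnegative throughout.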