10 research outputs found

    Parallel implementations for solving matrix factorization problems with optimization

    Get PDF
    During recent years, the exponential increase in data sets' sizes and the need for fast and accurate tools which can operate on these huge data sets in applications such as recommendation systems has led to an ever growing attention towards devising novel methods which can incorporate all the available resources to execute desired operations in the least possible time. In this work, we provide a framework for parallelized large-scale matrix factoriza- tion problems. One of the most successful and used methods to solve these problems is solving them via optimization techniques. Optimization methods require gradient vectors to update the iterates. The time spent to solve such a problem is mostly spent on calls to gradient and function value evaluations. In this work, we have used a recent method, which has not been used before for matrix factorization. When it comes to parallelization, we present both CPU and GPU implementations. As our experiments show, the proposed parallelization scales quite well. We report our results on Movie- Lens data set. Our results show that the new method is quite successful in reducing the number of iterations. We obtain very good RMSE values with signi cant promising scaling gures

    Security Infrastructure Technology for Integrated Utilization of Big Data

    Get PDF
    This open access book describes the technologies needed to construct a secure big data infrastructure that connects data owners, analytical institutions, and user institutions in a circle of trust. It begins by discussing the most relevant technical issues involved in creating safe and privacy-preserving big data distribution platforms, and especially focuses on cryptographic primitives and privacy-preserving techniques, which are essential prerequisites. The book also covers elliptic curve cryptosystems, which offer compact public key cryptosystems; and LWE-based cryptosystems, which are a type of post-quantum cryptosystem. Since big data distribution platforms require appropriate data handling, the book also describes a privacy-preserving data integration protocol and privacy-preserving classification protocol for secure computation. Furthermore, it introduces an anonymization technique and privacy risk evaluation technique. This book also describes the latest related findings in both the living safety and medical fields. In the living safety field, to prevent injuries occurring in everyday life, it is necessary to analyze injury data, find problems, and implement suitable measures. But most cases don’t include enough information for injury prevention because the necessary data is spread across multiple organizations, and data integration is difficult from a security standpoint. This book introduces a system for solving this problem by applying a method for integrating distributed data securely and introduces applications concerning childhood injury at home and school injury. In the medical field, privacy protection and patient consent management are crucial for all research. The book describes a medical test bed for the secure collection and analysis of electronic medical records distributed among various medical institutions. The system promotes big-data analysis of medical data with a cloud infrastructure and includes various security measures developed in our project to avoid privacy violations

    Fast and Robust Parallel SGD Matrix Factorization

    No full text
    1

    Fast and Robust Parallel SGD Matrix Factorization

    No full text
    Matrix factorization is one of the fundamental techniques for analyzing latent relationship between two entities. Especially, it is used for recommendation for its high accuracy. Efficient parallel SGD matrix factorization algorithms have been developed for large matrices to speed up the convergence of factorization. However, most of them are designed for a shared-memory environment thus fail to factorize a large matrix that is too big to fit in memory, and their performances are also unreliable when the matrix is skewed. This paper proposes a fast and robust parallel SGD matrix factorization algorithm, called MLGF-MF, which is robust to skewed matrices and runs efficiently on block-storage devices (e.g., SSD disks) as well as shared-memory. MLGF-MF uses Multi-Level Grid File (MLGF) for partitioning the matrix and minimizes the cost for scheduling parallel SGD updates on the partitioned regions by exploiting partial match queries processing. Thereby, MLGF-MF produces reliable results efficiently even on skewed matrices. MLGF-MF is designed with asynchronous I/O permeated in the algorithm such that CPU keeps executing without waiting for I/O to complete. Thereby, MLGF-MF overlaps the CPU and I/O processing, which eventually offsets the I/O cost and maximizes the CPU utility. Recent flash SSD disks support high performance parallel I/O, thus are appropriate for executing the asynchronous I/O. From our extensive evaluations, MLGF-MF significantly outperforms (or converges faster than) the state-of-the-art algorithms in both shared-memory and block-storage environments. In addition, the outputs of MLGF-MF is significantly more robust to skewed matrices. Our implementation of MLGF-MF is available at http ://dm.postech.ac.kr/MLGF-MF as executable files.118sciescopu
    corecore