
    Matrix completion with queries

    In many applications, e.g., recommender systems and traffic monitoring, the data comes in the form of a matrix that is only partially observed and low rank. A fundamental data-analysis task for these datasets is matrix completion, where the goal is to accurately infer the entries missing from the matrix. Even when the data satisfies the low-rank assumption, classical matrix-completion methods may output completions with significant error -- in that the reconstructed matrix differs significantly from the true underlying matrix. Often, this is because the information contained in the observed entries is insufficient. In this work, we address this problem by proposing an active version of matrix completion, in which queries can be made to the true underlying matrix. We then design Order&Extend, the first algorithm to unify a matrix-completion approach and a querying strategy into a single algorithm. Order&Extend is able to identify and alleviate insufficient information by judiciously querying a small number of additional entries. In an extensive experimental evaluation on real-world datasets, we demonstrate that our algorithm is efficient and accurately reconstructs the true matrix while asking only a small number of queries. Comment: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
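
    A minimal Python sketch of the query-based setting described above, not the Order&Extend algorithm itself: complete the matrix with a simple iterative truncated-SVD imputation, pick unobserved entries whose rows and columns are least observed as a crude proxy for "insufficient information", and query those entries from the true matrix. The function names and the selection heuristic are illustrative assumptions.

    # Illustrative sketch only -- not Order&Extend; the completion step and the
    # query-selection heuristic below are assumptions for demonstration.
    import numpy as np

    def svd_complete(M, mask, rank, iters=100):
        """Fill the entries where mask is False with a rank-`rank` approximation."""
        X = np.where(mask, M, 0.0)
        for _ in range(iters):
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
            X = np.where(mask, M, low_rank)  # keep observed entries fixed
        return X

    def active_complete(true_M, mask, rank, budget):
        """Spend `budget` queries on hidden entries of true_M, preferring entries
        whose row and column are least observed (a crude stand-in for detecting
        'insufficient information')."""
        mask = mask.copy()
        for _ in range(budget):
            row_obs = mask.sum(axis=1, keepdims=True)
            col_obs = mask.sum(axis=0, keepdims=True)
            score = np.where(mask, np.inf, row_obs + col_obs)  # never pick observed cells
            i, j = np.unravel_index(np.argmin(score), mask.shape)
            mask[i, j] = True  # "query" the true matrix at (i, j)
        return svd_complete(true_M, mask, rank), mask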

    Matrix completion with structure

    Often, data organized in matrix form contains missing entries. Further, such data has been observed to have low effective rank, which has led to interest in the particular problem of low-rank matrix completion: given a partially observed matrix, estimate the missing entries such that the output completion is low rank. The goal of this thesis is to improve matrix-completion algorithms by explicitly analyzing two sources of information in the observed entries: their locations and their values. First, we provide a categorization of a new approach to matrix completion, which we call structural. Structural methods quantify the possibility of completion using tests applied only to the locations of known entries. By framing each test as the class of partially observed matrices that pass the test, we provide the first organizing framework for analyzing the relationships among structural completion methods. Building on the structural approach, we then develop a new algorithm for active matrix completion that is combinatorial in nature. The algorithm uses just the locations of known entries to suggest a small number of queries to be made on the missing entries that allow it to produce a full and accurate completion. If a budget is placed on the number of queries, the algorithm outputs a partial completion, indicating which entries it can and cannot accurately estimate given the observations at hand. Finally, we propose a local approach to matrix completion that analyzes the values of the observed entries to discover a structure that is more fine-grained than the traditional low-rank assumption. Motivated by the Singular Value Decomposition, we develop an algorithm that finds low-rank submatrices using only the first few singular vectors of a matrix. By completing low-rank submatrices separately from the rest of the matrix, the local approach to matrix completion produces more accurate reconstructions than traditional algorithms.
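
    As a small illustration of the "structural" viewpoint above, the Python snippet below applies one simple location-only test: a necessary condition for unique rank-r completion is that every row and every column contain at least r observed entries. The thesis develops a much richer family of such tests; this check and its names are only an assumed, minimal example.

    # Illustrative structural test -- it uses only the LOCATIONS of known entries.
    import numpy as np

    def passes_degree_test(mask: np.ndarray, r: int) -> bool:
        """Necessary condition for unique rank-r completion: every row and every
        column of the observation mask has at least r observed entries."""
        return bool(mask.sum(axis=1).min() >= r and mask.sum(axis=0).min() >= r)

    # Example: fails for r = 2 because the last row has only one observed entry.
    mask = np.array([[1, 1, 0, 1],
                     [1, 0, 1, 1],
                     [0, 1, 1, 0],
                     [0, 0, 0, 1]], dtype=bool)
    print(passes_degree_test(mask, r=2))  # False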

    DMFSGD: A Decentralized Matrix Factorization Algorithm for Network Distance Prediction

    The knowledge of end-to-end network distances is essential to many Internet applications. As active probing of all pairwise distances is infeasible in large-scale networks, a natural idea is to measure a few pairs and to predict the others without actually measuring them. This paper formulates the distance prediction problem as matrix completion, where the unknown entries of an incomplete matrix of pairwise distances are to be predicted. The problem is solvable because strong correlations among network distances exist and cause the constructed distance matrix to be low rank. The new formulation circumvents the well-known drawbacks of existing approaches based on Euclidean embedding. A new algorithm, called Decentralized Matrix Factorization by Stochastic Gradient Descent (DMFSGD), is proposed to solve the network distance prediction problem. By letting network nodes exchange messages with each other, the algorithm is fully decentralized and only requires each node to collect and process local measurements, with neither explicit matrix constructions nor special nodes such as landmarks and central servers. In addition, we comprehensively compare matrix factorization and Euclidean embedding to demonstrate the suitability of the former for network distance prediction. We further study the incorporation of a robust loss function and of non-negativity constraints. Extensive experiments on various publicly available datasets of network delays show not only the scalability and accuracy of our approach but also its usability in real Internet applications. Comment: submitted to IEEE/ACM Transactions on Networking on Nov. 201
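
    A compact, centralized sketch of the factorization model behind DMFSGD: approximate the pairwise distance matrix by X Y^T and update one observed pair per stochastic gradient step. The actual algorithm is fully decentralized (each node updates only its own rows through message exchange) and additionally supports a robust loss and non-negativity constraints, none of which is reproduced in this assumed sketch.

    # Centralized toy version of matrix factorization by SGD; the real DMFSGD
    # runs these updates at the nodes themselves via message exchange.
    import numpy as np

    def sgd_factorize(obs, n, r=8, lr=0.01, reg=0.1, epochs=50, seed=0):
        """obs: list of (i, j, d_ij) measured distances among n nodes.
        Returns factors X, Y with D approximately equal to X @ Y.T."""
        rng = np.random.default_rng(seed)
        X = rng.normal(scale=0.1, size=(n, r))
        Y = rng.normal(scale=0.1, size=(n, r))
        for _ in range(epochs):
            for i, j, d in obs:
                err = d - X[i] @ Y[j]                    # residual on the measured pair
                X[i] += lr * (err * Y[j] - reg * X[i])   # gradient step with L2 regularization
                Y[j] += lr * (err * X[i] - reg * Y[j])
        return X, Y

    # An unmeasured distance between nodes a and b is then predicted as X[a] @ Y[b].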

    Matrix factorization with rating completion: an enhanced SVD model for collaborative filtering recommender systems

    Collaborative filtering algorithms, such as matrix factorization techniques, have recently gained momentum due to their promising performance on recommender systems. However, most collaborative filtering algorithms suffer from data sparsity. Active learning algorithms are effective in reducing the sparsity problem for recommender systems by requesting users to rate some items when they enter the system. In this paper, a new matrix factorization model, called Enhanced SVD (ESVD), is proposed, which combines classic matrix factorization algorithms with ratings completion inspired by active learning. In addition, the connection between prediction accuracy and the density of the matrix is established to further explore its potential. We also propose the Multi-layer ESVD, which learns the model iteratively to further improve prediction accuracy. To handle imbalanced datasets that contain far more users than items, or more items than users, the Item-wise ESVD and User-wise ESVD are presented, respectively. The proposed methods are evaluated on the well-known Netflix and MovieLens datasets. Experimental results validate their effectiveness in terms of both accuracy and efficiency when compared with traditional matrix factorization methods and active learning methods.
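
    A rough Python sketch of the "rating completion" idea described above: factorize the sparse rating matrix, fill a batch of predicted ratings into selected missing cells to densify it, and refactorize over several layers. The selection rule and helper names below are assumptions for illustration, not the paper's ESVD or Multi-layer ESVD procedure.

    # Toy sketch of "rating completion": densify the matrix with its own
    # predictions on selected cells, then refactorize (repeated over layers).
    import numpy as np

    def truncated_svd_predict(R, mask, k):
        """Rank-k prediction of a ratings matrix R with observed-entry mask."""
        filled = np.where(mask, R, R[mask].mean())  # mean-impute the missing cells
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        return (U[:, :k] * s[:k]) @ Vt[:k]

    def esvd_like(R, mask, k=10, n_fill=100, layers=2):
        R, mask = R.astype(float).copy(), mask.copy()
        for _ in range(layers):
            pred = truncated_svd_predict(R, mask, k)
            # Fill the missing cells whose rows/columns are densest -- an assumed
            # stand-in for the paper's selection of "valuable" users and items.
            density = mask.sum(axis=1, keepdims=True) + mask.sum(axis=0, keepdims=True)
            score = np.where(mask, -np.inf, density)
            flat = np.argsort(score, axis=None)[::-1][:n_fill]
            rows, cols = np.unravel_index(flat, mask.shape)
            R[rows, cols] = pred[rows, cols]   # treat predictions as completed ratings
            mask[rows, cols] = True
        return truncated_svd_predict(R, mask, k)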

    Multi-Target Prediction: A Unifying View on Problems and Methods

    Multi-target prediction (MTP) is concerned with the simultaneous prediction of multiple target variables of diverse type. Due to its enormous application potential, it has developed into an active and rapidly expanding research field that combines several subfields of machine learning, including multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. In this paper, we present a unifying view on MTP problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties that distinguish such methods and determine their suitability for different types of problems. Finally, we also discuss a few challenges for future research.

    OBOE: Collaborative Filtering for AutoML Model Selection

    Algorithm selection and hyperparameter tuning remain two of the most challenging tasks in machine learning. Automated machine learning (AutoML) seeks to automate these tasks to enable widespread use of machine learning by non-experts. This paper introduces OBOE, a collaborative filtering method for time-constrained model selection and hyperparameter tuning. OBOE forms a matrix of the cross-validated errors of a large number of supervised learning models (algorithms together with hyperparameters) on a large number of datasets, and fits a low-rank model to learn the low-dimensional feature vectors for the models and datasets that best predict the cross-validated errors. To find promising models for a new dataset, OBOE runs a set of fast but informative algorithms on the new dataset and uses their cross-validated errors to infer the feature vector for the new dataset. OBOE can find good models under constraints on the number of models fit or the total time budget. To this end, this paper develops a new heuristic for active learning in time-constrained matrix completion based on optimal experiment design. Our experiments demonstrate that OBOE delivers state-of-the-art performance faster than competing approaches on a test bed of supervised learning problems. Moreover, the success of the bilinear model used by OBOE suggests that AutoML may be simpler than was previously understood.
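
    A small Python sketch of the collaborative-filtering idea OBOE builds on: factor an offline dataset-by-model matrix of cross-validated errors, place a new dataset in the latent space using the errors of a few cheap probe models, and rank the remaining models by predicted error. The real system also chooses which models to probe via experiment design and respects a time budget; the functions below are assumed illustrations only.

    # Illustrative sketch of low-rank model selection; omits OBOE's experiment
    # design for choosing probes and its handling of the time budget.
    import numpy as np

    def fit_latent_factors(E, rank):
        """Offline: low-rank factors of the dataset-by-model error matrix E."""
        U, s, Vt = np.linalg.svd(E, full_matrices=False)
        return U[:, :rank] * s[:rank], Vt[:rank].T   # dataset factors, model factors

    def recommend(model_factors, probe_idx, probe_errors):
        """Online: infer the new dataset's latent vector from the errors of a few
        probed models, then return all models sorted by predicted error."""
        V_probe = model_factors[probe_idx]
        u, *_ = np.linalg.lstsq(V_probe, probe_errors, rcond=None)
        predicted = model_factors @ u                # predicted error of every model
        return np.argsort(predicted)                 # best (lowest error) first

    # Usage sketch (E_offline and the probe errors are assumed inputs):
    # dataset_factors, model_factors = fit_latent_factors(E_offline, rank=10)
    # ranking = recommend(model_factors, probe_idx=[3, 17, 42], probe_errors=errs)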