Matrix completion with queries
In many applications, e.g., recommender systems and traffic monitoring, the
data comes in the form of a matrix that is only partially observed and low
rank. A fundamental data-analysis task for these datasets is matrix completion,
where the goal is to accurately infer the entries missing from the matrix. Even
when the data satisfies the low-rank assumption, classical matrix-completion
methods may output completions with significant error -- in that the
reconstructed matrix differs significantly from the true underlying matrix.
Often, this is due to the fact that the information contained in the observed
entries is insufficient. In this work, we address this problem by proposing an
active version of matrix completion, where queries can be made to the true
underlying matrix. Subsequently, we design Order&Extend, which is the first
algorithm to unify a matrix-completion approach and a querying strategy into a
single algorithm. Order&Extend is able to identify and alleviate insufficient
information by judiciously querying a small number of additional entries. In an
extensive experimental evaluation on real-world datasets, we demonstrate that
our algorithm is efficient and is able to accurately reconstruct the true
matrix while asking only a small number of queries.
Comment: Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
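The querying idea in this abstract can be illustrated with a toy sketch. This is not the Order&Extend algorithm itself; the heuristic, sizes, and the function name `query_deficient` are invented for illustration. The observation it builds on is that a rank-r completion needs at least r observed entries in every row and column, so a simple active strategy queries the true matrix wherever that condition fails:

```python
import numpy as np

# Toy active-completion heuristic (NOT Order&Extend): a rank-r completion
# needs at least r observed entries per row and column, so we "query" the
# true matrix to repair rows and columns that fall short.

rng = np.random.default_rng(0)
r = 2
M = rng.standard_normal((8, r)) @ rng.standard_normal((r, 8))  # true rank-2 matrix
observed = rng.random(M.shape) < 0.4                           # known-entry mask

def query_deficient(observed, r):
    """Return an augmented mask with >= r observations per row and column."""
    obs = observed.copy()
    queries = 0
    for axis in (0, 1):                      # first rows, then columns
        counts = obs.sum(axis=1 - axis)
        for i in np.where(counts < r)[0]:
            line = obs[i] if axis == 0 else obs[:, i]
            missing = np.where(~line)[0]
            pick = missing[: r - counts[i]]  # query just enough entries
            if axis == 0:
                obs[i, pick] = True
            else:
                obs[pick, i] = True
            queries += len(pick)
    return obs, queries

obs2, n_queries = query_deficient(observed, r)
print(n_queries, "entries queried")
```

A budgeted variant would simply stop issuing queries once the budget is exhausted and flag the remaining deficient rows and columns as not reliably completable.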
Matrix completion with structure
Often, data organized in matrix form contains missing entries. Moreover, such data has been observed to be effectively low-rank, which has led to interest in the problem of low-rank matrix completion: given a partially observed matrix, estimate the missing entries such that the completed matrix is low-rank. The goal of this thesis is to improve matrix-completion algorithms by explicitly analyzing two sources of information in the observed entries: their locations and their values.
First, we provide a categorization of a new approach to matrix-completion, which we call structural. Structural methods quantify the possibility of completion using tests applied only to the locations of known entries. By framing each test as the class of partially-observed matrices that pass the test, we provide the first organizing framework for analyzing the relationship among structural completion methods.
Building on the structural approach, we then develop a new algorithm for active matrix-completion that is combinatorial in nature. The algorithm uses just the locations of known entries to suggest a small number of queries to be made on the missing entries that allow it to produce a full and accurate completion. If a budget is placed on the number of queries, the algorithm outputs a partial completion, indicating which entries it can and cannot accurately estimate given the observations at hand.
Finally, we propose a local approach to matrix-completion that analyzes the values of the observed entries to discover a structure that is more fine-grained than the traditional low-rank assumption. Motivated by the Singular Value Decomposition, we develop an algorithm that finds low-rank submatrices using only the first few singular vectors of a matrix. By completing low-rank submatrices separately from the rest of the matrix, the local approach to matrix-completion produces more accurate reconstructions than traditional algorithms.
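A hedged sketch of the "local" idea in the last paragraph (not the thesis algorithm itself; the planted-block setup and the threshold 0.15 are invented): the leading singular vector pair of a matrix can flag a block of rows and columns that forms a nearly rank-1 submatrix, which could then be completed separately from the rest:

```python
import numpy as np

# Sketch of the local approach (not the thesis algorithm): plant a strong
# rank-1 block in a noise matrix, then use only the leading singular
# vectors to flag the rows and columns of that block.

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 10)) * 0.1        # background noise
u = rng.uniform(1, 2, 4)
v = rng.uniform(1, 2, 5)
A[:4, :5] += 5 * np.outer(u, v)                # planted rank-1 submatrix

U, s, Vt = np.linalg.svd(A)
rows = np.abs(U[:, 0]) > 0.15                  # rows loading strongly on the
cols = np.abs(Vt[0]) > 0.15                    # top singular vector pair
block = A[np.ix_(rows, cols)]                  # recovered nearly rank-1 block
print(block.shape)
```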
DMFSGD: A Decentralized Matrix Factorization Algorithm for Network Distance Prediction
The knowledge of end-to-end network distances is essential to many Internet
applications. As active probing of all pairwise distances is infeasible in
large-scale networks, a natural idea is to measure a few pairs and to predict
the other ones without actually measuring them. This paper formulates the
distance prediction problem as matrix completion where unknown entries of an
incomplete matrix of pairwise distances are to be predicted. The problem is
solvable because strong correlations among network distances exist and cause
the constructed distance matrix to be low rank. The new formulation circumvents
the well-known drawbacks of existing approaches based on Euclidean embedding.
A new algorithm, so-called Decentralized Matrix Factorization by Stochastic
Gradient Descent (DMFSGD), is proposed to solve the network distance prediction
problem. By letting network nodes exchange messages with each other, the
algorithm is fully decentralized and only requires each node to collect and to
process local measurements, with neither explicit matrix constructions nor
special nodes such as landmarks and central servers. In addition, we
comprehensively compare matrix factorization and Euclidean embedding to
demonstrate the suitability of the former for network distance prediction. We
further study the incorporation of a robust loss function and of
non-negativity constraints.
Extensive experiments on various publicly-available datasets of network delays
show not only the scalability and the accuracy of our approach but also its
usability in real Internet applications.
Comment: submitted to IEEE/ACM Transactions on Networking on Nov. 201
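The factorization update at the heart of DMFSGD can be sketched in a centralized simulation. This is hedged: the real algorithm runs the same per-pair updates at the network nodes via message passing, and the step size, regularization weight, and problem sizes below are illustrative choices, not the paper's:

```python
import numpy as np

# Centralized simulation of DMFSGD-style SGD updates (hedged: illustrative
# hyperparameters; the real algorithm is fully decentralized).

rng = np.random.default_rng(2)
n, r = 30, 3
D = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-r "distances"
mask = rng.random((n, n)) < 0.5                # pairs actually measured

eta, lam = 0.01, 0.01                          # step size, regularization
X = rng.standard_normal((n, r)) * 0.1          # per-node latent estimates
Y = rng.standard_normal((n, r)) * 0.1

pairs = np.argwhere(mask)
for epoch in range(500):
    rng.shuffle(pairs)
    for i, j in pairs:                         # node i measures node j
        err = D[i, j] - X[i] @ Y[j]
        gi = -err * Y[j] + lam * X[i]          # gradient local to node i
        gj = -err * X[i] + lam * Y[j]          # gradient local to node j
        X[i] -= eta * gi
        Y[j] -= eta * gj

pred = X @ Y.T
rmse = np.sqrt(((pred - D)[~mask] ** 2).mean())
print("held-out RMSE:", round(float(rmse), 3))
```

Each update touches only the two latent vectors of the measuring pair, which is what makes a decentralized, message-passing implementation possible.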
Matrix factorization with rating completion: an enhanced SVD model for collaborative filtering recommender systems
Collaborative filtering algorithms, such as matrix factorization techniques, have recently been gaining momentum due to their promising performance on recommender systems. However, most collaborative filtering algorithms suffer from data sparsity. Active learning algorithms are effective in reducing the sparsity problem in recommender systems by requesting users to rate some items when they enter the system. In this paper, a new matrix factorization model, called Enhanced SVD (ESVD), is proposed, which combines classic matrix factorization algorithms with a rating-completion step inspired by active learning. In addition, a connection between prediction accuracy and the density of the matrix is established to further explore its potential. We also propose the Multi-layer ESVD, which learns the model iteratively to further improve prediction accuracy. To handle imbalanced data sets that contain far more users than items, or more items than users, the Item-wise ESVD and User-wise ESVD are presented, respectively. The proposed methods are evaluated on the well-known Netflix and MovieLens data sets. Experimental results validate their effectiveness in terms of both accuracy and efficiency when compared with traditional matrix factorization methods and active learning methods.
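One way to picture the "factorization plus rating completion" interplay described above is hard-impute-style SVD iteration: a generic sketch, not the ESVD procedure itself, with all sizes and the fixed iteration count invented for illustration:

```python
import numpy as np

# Generic hard-impute sketch (not ESVD itself): alternately fill missing
# ratings with a rank-r truncated SVD and restore the known ratings.

rng = np.random.default_rng(3)
n_users, n_items, r = 40, 25, 2
R_true = rng.uniform(1, 5, (n_users, r)) @ rng.uniform(0.2, 1, (r, n_items))
known = rng.random(R_true.shape) < 0.3             # observed ratings

R = np.where(known, R_true, R_true[known].mean())  # start from the mean
for _ in range(200):
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    low_rank = (U[:, :r] * s[:r]) @ Vt[:r]         # best rank-r approximation
    R = np.where(known, R_true, low_rank)          # known entries stay fixed

rmse = np.sqrt(((R - R_true)[~known] ** 2).mean())
print("RMSE on missing entries:", round(float(rmse), 3))
```

The completed matrix densifies the training data, which is the role rating completion plays for the factorization step in the abstract above.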
Multi-Target Prediction: A Unifying View on Problems and Methods
Multi-target prediction (MTP) is concerned with the simultaneous prediction
of multiple target variables of diverse type. Due to its enormous application
potential, it has developed into an active and rapidly expanding research field
that combines several subfields of machine learning, including multivariate
regression, multi-label classification, multi-task learning, dyadic prediction,
zero-shot learning, network inference, and matrix completion. In this paper, we
present a unifying view on MTP problems and methods. First, we formally discuss
commonalities and differences between existing MTP problems. To this end, we
introduce a general framework that covers the above subfields as special cases.
As a second contribution, we provide a structured overview of MTP methods. This
is accomplished by identifying a number of key properties, which distinguish
such methods and determine their suitability for different types of problems.
Finally, we discuss a few challenges for future research.
OBOE: Collaborative Filtering for AutoML Model Selection
Algorithm selection and hyperparameter tuning remain two of the most
challenging tasks in machine learning. Automated machine learning (AutoML)
seeks to automate these tasks to enable widespread use of machine learning by
non-experts. This paper introduces OBOE, a collaborative filtering method for
time-constrained model selection and hyperparameter tuning. OBOE forms a matrix
of the cross-validated errors of a large number of supervised learning models
(algorithms together with hyperparameters) on a large number of datasets, and
fits a low rank model to learn the low-dimensional feature vectors for the
models and datasets that best predict the cross-validated errors. To find
promising models for a new dataset, OBOE runs a set of fast but informative
algorithms on the new dataset and uses their cross-validated errors to infer
the feature vector for the new dataset. OBOE can find good models under
constraints on the number of models fit or the total time budget. To this end,
this paper develops a new heuristic for active learning in time-constrained
matrix completion based on optimal experiment design. Our experiments
demonstrate that OBOE delivers state-of-the-art performance faster than
competing approaches on a test bed of supervised learning problems. Moreover,
the success of the bilinear model used by OBOE suggests that AutoML may be
simpler than was previously understood.
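The inference step described above (fit a low-rank model to the error matrix offline, then recover a new dataset's feature vector from a few probe models by least squares) can be sketched as follows. All names and sizes are invented, and the toy error matrix is exactly low-rank, so the recovery is exact here:

```python
import numpy as np

# Toy version of OBOE-style inference (names/sizes invented; the error
# matrix is exactly rank r, so the least-squares recovery is exact).

rng = np.random.default_rng(4)
n_data, n_models, r = 50, 20, 3
F_data = rng.standard_normal((n_data, r))
F_model = rng.standard_normal((n_models, r))
E = F_data @ F_model.T                         # offline cross-validated errors

# offline: learn r-dimensional model features from the full error matrix
U, s, Vt = np.linalg.svd(E, full_matrices=False)
models = Vt[:r].T * s[:r]                      # n_models x r feature matrix

# online: a new dataset's errors follow the same low-rank structure
f_new = rng.standard_normal(r)                 # its (unknown) true features
e_new = F_model @ f_new                        # its errors on every model

probe = [0, 5, 11, 17]                         # the few fast models actually run
g, *_ = np.linalg.lstsq(models[probe], e_new[probe], rcond=None)
pred = models @ g                              # predicted errors, all models
print("max prediction error:", float(np.abs(pred - e_new).max()))
```

In the time-constrained setting, the choice of which probe models to run is itself optimized (the paper's experiment-design heuristic); here the probe set is fixed by hand.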