17 research outputs found
Targeted matrix completion
Matrix completion is a problem that arises in many data-analysis settings
where the input consists of a partially-observed matrix (e.g., recommender
systems, traffic matrix analysis etc.). Classical approaches to matrix
completion assume that the input partially-observed matrix is low rank. The
success of these methods depends on the number of observed entries and the rank
of the matrix; the larger the rank, the more entries need to be observed in
order to accurately complete the matrix. In this paper, we deal with matrices
that are not necessarily low rank themselves, but rather they contain low-rank
submatrices. We propose Targeted, which is a general framework for completing
such matrices. In this framework, we first extract the low-rank submatrices and
then apply a matrix-completion algorithm to these low-rank submatrices as well
as the remainder matrix separately. Although for the completion itself we use
state-of-the-art completion methods, our results demonstrate that Targeted
achieves significantly smaller reconstruction errors than other classical
matrix-completion methods. One of the key technical contributions of the paper
lies in the identification of the low-rank submatrices from the input
partially-observed matrices.Comment: Proceedings of the 2017 SIAM International Conference on Data Mining
(SDM
Matrix completion with queries
In many applications, e.g., recommender systems and traffic monitoring, the
data comes in the form of a matrix that is only partially observed and low
rank. A fundamental data-analysis task for these datasets is matrix completion,
where the goal is to accurately infer the entries missing from the matrix. Even
when the data satisfies the low-rank assumption, classical matrix-completion
methods may output completions with significant error -- in that the
reconstructed matrix differs significantly from the true underlying matrix.
Often, this is due to the fact that the information contained in the observed
entries is insufficient. In this work, we address this problem by proposing an
active version of matrix completion, where queries can be made to the true
underlying matrix. Subsequently, we design Order&Extend, which is the first
algorithm to unify a matrix-completion approach and a querying strategy into a
single algorithm. Order&Extend is able identify and alleviate insufficient
information by judiciously querying a small number of additional entries. In an
extensive experimental evaluation on real-world datasets, we demonstrate that
our algorithm is efficient and is able to accurately reconstruct the true
matrix while asking only a small number of queries.Comment: Proceedings of the 21th ACM SIGKDD International Conference on
Knowledge Discovery and Data Minin
Robust recovery of missing data in electricity distribution systems
The advanced operation of future electricity distribution systems is likely to require significant observability of the different parameters of interest (e.g., demand, voltages, currents, etc.). Ensuring completeness of data is, therefore, paramount. In this context, an algorithm for recovering missing state variable observations in electricity distribution systems is presented. The proposed method exploits the low rank structure of the state variables via a matrix completion approach while incorporating prior knowledge in the form of second order statistics. Specifically, the recovery method combines nuclear norm minimization with Bayesian estimation. The performance of the new algorithm is compared to the information-theoretic limits and tested trough simulations using real data of an urban low voltage distribution system. The impact of the prior knowledge is analyzed when a mismatched covariance is used and for a Markovian sampling that introduces structure in the observation pattern. Numerical results demonstrate that the proposed algorithm is robust and outperforms existing state of the art algorithms
Analysis of a Collaborative Filter Based on Popularity Amongst Neighbors
In this paper, we analyze a collaborative filter that answers the simple
question: What is popular amongst your friends? While this basic principle
seems to be prevalent in many practical implementations, there does not appear
to be much theoretical analysis of its performance. In this paper, we partly
fill this gap. While recent works on this topic, such as the low-rank matrix
completion literature, consider the probability of error in recovering the
entire rating matrix, we consider probability of an error in an individual
recommendation (bit error rate (BER)). For a mathematical model introduced in
[1],[2], we identify three regimes of operation for our algorithm (named
Popularity Amongst Friends (PAF)) in the limit as the matrix size grows to
infinity. In a regime characterized by large number of samples and small
degrees of freedom (defined precisely for the model in the paper), the
asymptotic BER is zero; in a regime characterized by large number of samples
and large degrees of freedom, the asymptotic BER is bounded away from 0 and 1/2
(and is identified exactly except for a special case); and in a regime
characterized by a small number of samples, the algorithm fails. We also
present numerical results for the MovieLens and Netflix datasets. We discuss
the empirical performance in light of our theoretical results and compare with
an approach based on low-rank matrix completion.Comment: 47 pages. Submitted to IEEE Transactions on Information Theory
(revised in July 2011). A shorter version would be presented at ISIT 201
A Computational Framework for Influenza Antigenic Cartography
Influenza viruses have been responsible for large losses of lives around the world and continue to present a great public health challenge. Antigenic characterization based on hemagglutination inhibition (HI) assay is one of the routine procedures for influenza vaccine strain selection. However, HI assay is only a crude experiment reflecting the antigenic correlations among testing antigens (viruses) and reference antisera (antibodies). Moreover, antigenic characterization is usually based on more than one HI dataset. The combination of multiple datasets results in an incomplete HI matrix with many unobserved entries. This paper proposes a new computational framework for constructing an influenza antigenic cartography from this incomplete matrix, which we refer to as Matrix Completion-Multidimensional Scaling (MC-MDS). In this approach, we first reconstruct the HI matrices with viruses and antibodies using low-rank matrix completion, and then generate the two-dimensional antigenic cartography using multidimensional scaling. Moreover, for influenza HI tables with herd immunity effect (such as those from Human influenza viruses), we propose a temporal model to reduce the inherent temporal bias of HI tables caused by herd immunity. By applying our method in HI datasets containing H3N2 influenza A viruses isolated from 1968 to 2003, we identified eleven clusters of antigenic variants, representing all major antigenic drift events in these 36 years. Our results showed that both the completed HI matrix and the antigenic cartography obtained via MC-MDS are useful in identifying influenza antigenic variants and thus can be used to facilitate influenza vaccine strain selection. The webserver is available at http://sysbio.cvm.msstate.edu/AntigenMap