Scalable Tensor Factorizations for Incomplete Data
The problem of incomplete data - i.e., data with missing or unknown values -
in multi-way arrays is ubiquitous in biomedical signal processing, network
traffic analysis, bibliometrics, social network analysis, chemometrics,
computer vision, communication networks, etc. We consider the problem of how to
factorize data sets with missing values with the goal of capturing the
underlying latent structure of the data and possibly reconstructing missing
values (i.e., tensor completion). We focus on one of the most well-known tensor
factorizations that captures multi-linear structure, CANDECOMP/PARAFAC (CP). In
the presence of missing data, CP can be formulated as a weighted least squares
problem that models only the known entries. We develop an algorithm called
CP-WOPT (CP Weighted OPTimization) that uses a first-order optimization
approach to solve the weighted least squares problem. Based on extensive
numerical experiments, our algorithm is shown to successfully factorize tensors
with noise and up to 99% missing data. A unique aspect of our approach is that
it scales to sparse large-scale data, e.g., 1000 x 1000 x 1000 with five
million known entries (0.5% dense). We further demonstrate the usefulness of
CP-WOPT on two real-world applications: a novel EEG (electroencephalogram)
application where missing data is frequently encountered due to disconnections
of electrodes, and the problem of modeling computer network traffic, where data
may be absent because of the expense of the data collection process.
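As a rough, minimal sketch of the weighted least-squares formulation described above (not the authors' CP-WOPT implementation, which uses a dedicated first-order method), one could fit a third-order CP model to the known entries by plain gradient descent; the function name, rank, step size, and iteration count below are illustrative assumptions.

```python
import numpy as np

def cp_weighted_ls_sketch(X, W, rank=5, steps=500, lr=1e-3, seed=0):
    """Fit a rank-R CP model (factors A, B, C) to a 3-way tensor X using only
    the entries where the binary mask W is 1, i.e. minimize
    ||W * (X - [[A, B, C]])||_F^2 by plain gradient descent (illustrative only)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = 0.1 * rng.standard_normal((I, rank))
    B = 0.1 * rng.standard_normal((J, rank))
    C = 0.1 * rng.standard_normal((K, rank))
    for _ in range(steps):
        Xhat = np.einsum('ir,jr,kr->ijk', A, B, C)    # CP reconstruction
        R = W * (Xhat - X)                            # residual on known entries only
        gA = 2 * np.einsum('ijk,jr,kr->ir', R, B, C)  # gradient w.r.t. A
        gB = 2 * np.einsum('ijk,ir,kr->jr', R, A, C)  # gradient w.r.t. B
        gC = 2 * np.einsum('ijk,ir,jr->kr', R, A, B)  # gradient w.r.t. C
        A -= lr * gA
        B -= lr * gB
        C -= lr * gC
    return A, B, C
```

In the sparse, large-scale regime mentioned in the abstract one would evaluate the objective and gradients only over the list of known entries rather than forming dense I x J x K arrays as this toy version does.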
Dynamic pattern matcher using incomplete data
This invention relates generally to pattern matching systems, and more particularly to a method for dynamically adapting the system to enhance the effectiveness of a pattern match. Apparatus and methods for calculating the similarity between patterns are known. There is considerable interest, however, in the storage and retrieval of data, particularly when a search is initiated with incomplete information. Many search algorithms require exact information in the query, and the data file is searched for an exact match; the inability to find an exact match therefore results in a failure of the system or method.
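As a toy illustration of the retrieval problem described (exact-match search failing on incomplete queries), and not of the patented adaptation mechanism itself, a similarity-based lookup might score records by the fraction of specified query fields they agree with; all names below are hypothetical.

```python
from typing import Dict, List, Optional

def best_partial_match(query: Dict[str, Optional[str]],
                       records: List[Dict[str, str]]) -> Dict[str, str]:
    """Return the record that agrees with the query on the largest fraction of
    the fields the query actually specifies; missing (None) fields are ignored,
    so an incomplete query degrades gracefully instead of failing outright."""
    specified = {k: v for k, v in query.items() if v is not None}

    def score(record: Dict[str, str]) -> float:
        if not specified:          # nothing specified: every record ties
            return 0.0
        return sum(record.get(k) == v for k, v in specified.items()) / len(specified)

    return max(records, key=score)
```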
Bayesian Robust Tensor Factorization for Incomplete Multiway Data
We propose a generative model for robust tensor factorization in the presence
of both missing data and outliers. The objective is to explicitly infer the
underlying low-CP-rank tensor capturing the global information and a sparse
tensor capturing the local information (interpreted as outliers), thus
providing a robust predictive distribution over missing entries. The
low-CP-rank tensor is modeled by multilinear interactions between multiple
latent factors on which the column sparsity is enforced by a hierarchical
prior, while the sparse tensor is modeled by a hierarchical view of the
Student-t distribution that associates an individual hyperparameter with each element
independently. For model learning, we develop an efficient closed-form
variational inference under a fully Bayesian treatment, which can effectively
prevent the overfitting problem and scales linearly with data size. In contrast
to existing related works, our method can perform model selection automatically
and implicitly, without the need to tune parameters. More specifically, it can
discover the ground-truth CP rank and automatically adapt the sparsity-inducing
priors to various types of outliers. In addition, the tradeoff between
the low-rank approximation and the sparse representation can be optimized in
the sense of maximum model evidence. Extensive experiments and comparisons
with many state-of-the-art algorithms on both synthetic and real-world datasets
demonstrate the advantages of our method from several perspectives. Published in IEEE Transactions on Neural Networks and Learning Systems.
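To make the model structure concrete, here is a minimal sketch of the kind of generative process described: a low-CP-rank tensor plus an element-wise sparse (Student-t) tensor plus Gaussian noise, with the Student-t part drawn as a Gaussian scale mixture so that each element carries its own precision hyperparameter. The shapes, rank, degrees of freedom, and noise level are assumptions; the column-sparsity prior on the factors and the variational inference itself are omitted.

```python
import numpy as np

def sample_low_rank_plus_sparse(shape=(30, 30, 30), rank=3, dof=2.0,
                                noise_std=0.05, seed=0):
    """Draw one tensor from a simplified generative model:
    low-CP-rank part + element-wise Student-t sparse part + Gaussian noise."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((d, rank)) for d in shape)
    low_rank = np.einsum('ir,jr,kr->ijk', A, B, C)
    # Per-element precision tau ~ Gamma(dof/2, rate=dof/2); marginalizing tau out
    # of N(0, 1/tau) yields a Student-t with `dof` degrees of freedom.
    tau = rng.gamma(dof / 2.0, 2.0 / dof, size=shape)
    sparse = rng.standard_normal(shape) / np.sqrt(tau)
    noise = noise_std * rng.standard_normal(shape)
    return low_rank + sparse + noise, low_rank, sparse
```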
Process reconstruction from incomplete and/or inconsistent data
We analyze how the action of a qubit channel (map) can be estimated from
measured data that are incomplete or even inconsistent. That is, we consider
situations in which the measurement statistics are insufficient to determine
consistent probability distributions. As a consequence, the estimation
(reconstruction) of the channel either fails completely or results in an unphysical
channel (i.e., the corresponding map is not completely positive). We present a
regularization procedure that allows us to derive physically reasonable
estimates (approximations) of quantum channels. We illustrate our procedure on
specific examples and show that it can also be used to derive optimal
approximations of operations that are forbidden by the laws of quantum
mechanics (e.g., the universal NOT gate).
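As a generic illustration of the "unphysical estimate" problem mentioned above (a reconstructed map that is not completely positive), and not the paper's regularization procedure, a common fix-up is to project the estimated Choi matrix onto the positive semidefinite cone and rescale its trace; the function name and normalization are assumptions, and the full trace-preservation constraint is not enforced here.

```python
import numpy as np

def clip_to_physical_choi(choi_est, dim=2):
    """Crude repair of an estimated Choi matrix: symmetrize, clip negative
    eigenvalues (complete positivity <=> PSD Choi matrix), and rescale the
    trace to `dim`. Note: full trace preservation is not enforced."""
    choi = 0.5 * (choi_est + choi_est.conj().T)   # enforce Hermiticity
    vals, vecs = np.linalg.eigh(choi)
    vals = np.clip(vals, 0.0, None)               # drop negative eigenvalues
    choi = (vecs * vals) @ vecs.conj().T
    return choi * (dim / np.trace(choi).real)
```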
Formal and Informal Model Selection with Incomplete Data
Model selection and assessment with incomplete data pose challenges in
addition to the ones encountered with complete data. There are two main reasons
for this. First, many models describe characteristics of the complete data, in
spite of the fact that only an incomplete subset is observed. Direct comparison
between model and data is then less than straightforward. Second, many commonly
used models are more sensitive to assumptions than in the complete-data
situation and some of their properties vanish when they are fitted to
incomplete, unbalanced data. These and other issues are brought forward using
two key examples, one of a continuous and one of a categorical nature. We argue
that model assessment ought to consist of two parts: (i) assessment of a
model's fit to the observed data and (ii) assessment of the sensitivity of
inferences to unverifiable assumptions, that is, to how a model describes the
unobserved data given the observed ones. Published in Statistical Science
(http://www.imstat.org/sts/) by the Institute of Mathematical Statistics,
http://dx.doi.org/10.1214/07-STS253.
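A minimal numerical illustration of part (ii), sensitivity to unverifiable assumptions (not one of the paper's two examples): suppose the mean of a partially observed variable is of interest and the unobserved values are assumed to differ from the observed mean by an offset delta that the data cannot identify; varying delta shows how strongly the conclusion depends on that assumption. All names and values below are hypothetical.

```python
import numpy as np

def mean_sensitivity(y_observed, n_missing, deltas=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """For each unverifiable offset delta, impute every missing value as
    (observed mean + delta) and report the resulting overall mean estimate."""
    y = np.asarray(y_observed, dtype=float)
    results = {}
    for delta in deltas:
        imputed = y.mean() + delta
        results[delta] = (y.sum() + n_missing * imputed) / (y.size + n_missing)
    return results
```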
Distribution of Mutual Information from Complete and Incomplete Data
Mutual information is widely used, in a descriptive way, to measure the
stochastic dependence of categorical random variables. In order to address
questions such as the reliability of the descriptive value, one must consider
sample-to-population inferential approaches. This paper deals with the
posterior distribution of mutual information, as obtained in a Bayesian
framework by a second-order Dirichlet prior distribution. The exact analytical
expression for the mean, and analytical approximations for the variance,
skewness and kurtosis are derived. These approximations have a guaranteed
accuracy level of the order O(1/n^3), where n is the sample size. Leading order
approximations for the mean and the variance are derived in the case of
incomplete samples. The derived analytical expressions allow the distribution
of mutual information to be approximated reliably and quickly. In fact, the
derived expressions can be computed with the same order of complexity needed
for descriptive mutual information. This makes the distribution of mutual
information become a concrete alternative to descriptive mutual information in
many applications which would benefit from moving to the inductive side. Some
of these prospective applications are discussed, and one of them, namely
feature selection, is shown to perform significantly better when inductive
mutual information is used.
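For concreteness, here is a short sketch of the descriptive (plug-in) mutual information against which the inferential quantities are compared; the (r - 1)(s - 1)/(2n) term is the standard first-order bias correction for an r x s contingency table, not the exact Dirichlet-posterior moments derived in the paper.

```python
import numpy as np

def plug_in_mutual_information(counts):
    """Descriptive mutual information (in nats) of an r x s contingency table of
    counts, plus a first-order bias-corrected value obtained by subtracting
    (r - 1)(s - 1) / (2 n), where n is the total sample size."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p = counts / n
    pi = p.sum(axis=1, keepdims=True)   # row marginals
    pj = p.sum(axis=0, keepdims=True)   # column marginals
    mask = p > 0
    mi = np.sum(p[mask] * np.log(p[mask] / (pi * pj)[mask]))
    r, s = counts.shape
    return mi, mi - (r - 1) * (s - 1) / (2.0 * n)
```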
