Scalable Tensor Factorizations for Incomplete Data
The problem of incomplete data - i.e., data with missing or unknown values -
in multi-way arrays is ubiquitous in biomedical signal processing, network
traffic analysis, bibliometrics, social network analysis, chemometrics,
computer vision, communication networks, etc. We consider the problem of how to
factorize data sets with missing values with the goal of capturing the
underlying latent structure of the data and possibly reconstructing missing
values (i.e., tensor completion). We focus on one of the most well-known tensor
factorizations that captures multi-linear structure, CANDECOMP/PARAFAC (CP). In
the presence of missing data, CP can be formulated as a weighted least squares
problem that models only the known entries. We develop an algorithm called
CP-WOPT (CP Weighted OPTimization) that uses a first-order optimization
approach to solve the weighted least squares problem. Based on extensive
numerical experiments, our algorithm is shown to successfully factorize tensors
with noise and up to 99% missing data. A unique aspect of our approach is that
it scales to sparse large-scale data, e.g., 1000 x 1000 x 1000 with five
million known entries (0.5% dense). We further demonstrate the usefulness of
CP-WOPT on two real-world applications: a novel EEG (electroencephalogram)
application where missing data is frequently encountered due to disconnections
of electrodes, and the problem of modeling computer network traffic, where data
may be absent because of the expense of the data collection process.
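As a rough, minimal sketch of the weighted least-squares formulation described above (not the authors' CP-WOPT implementation, which uses a dedicated first-order method), one could fit a third-order CP model to the known entries by plain gradient descent; the function name, rank, step size, and iteration count below are illustrative assumptions.

```python
import numpy as np

def cp_weighted_ls_sketch(X, W, rank=5, steps=500, lr=1e-3, seed=0):
    """Fit a rank-R CP model (factors A, B, C) to a 3-way tensor X using only
    the entries where the binary mask W is 1, i.e. minimize
    ||W * (X - [[A, B, C]])||_F^2 by plain gradient descent (illustrative only)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = 0.1 * rng.standard_normal((I, rank))
    B = 0.1 * rng.standard_normal((J, rank))
    C = 0.1 * rng.standard_normal((K, rank))
    for _ in range(steps):
        Xhat = np.einsum('ir,jr,kr->ijk', A, B, C)    # CP reconstruction
        R = W * (Xhat - X)                            # residual on known entries only
        gA = 2 * np.einsum('ijk,jr,kr->ir', R, B, C)  # gradient w.r.t. A
        gB = 2 * np.einsum('ijk,ir,kr->jr', R, A, C)  # gradient w.r.t. B
        gC = 2 * np.einsum('ijk,ir,jr->kr', R, A, B)  # gradient w.r.t. C
        A -= lr * gA
        B -= lr * gB
        C -= lr * gC
    return A, B, C
```

In the sparse, large-scale regime mentioned in the abstract one would evaluate the objective and gradients only over the list of known entries rather than forming dense I x J x K arrays as this toy version does.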
Dynamic pattern matcher using incomplete data
This invention relates generally to pattern matching systems, and more particularly to a method for dynamically adapting the system to enhance the effectiveness of a pattern match. Apparatus and methods for calculating the similarity between patterns are known. There is considerable interest, however, in the storage and retrieval of data, particularly when a search is initiated with incomplete information. Many search algorithms require exact information in the query, and the data file is searched for an exact match; the inability to find an exact match therefore results in a failure of the system or method.
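As a toy illustration of the retrieval problem described (exact-match search failing on incomplete queries), and not of the patented adaptation mechanism itself, a similarity-based lookup might score records by the fraction of specified query fields they agree with; all names below are hypothetical.

```python
from typing import Dict, List, Optional

def best_partial_match(query: Dict[str, Optional[str]],
                       records: List[Dict[str, str]]) -> Dict[str, str]:
    """Return the record that agrees with the query on the largest fraction of
    the fields the query actually specifies; missing (None) fields are ignored,
    so an incomplete query degrades gracefully instead of failing outright."""
    specified = {k: v for k, v in query.items() if v is not None}

    def score(record: Dict[str, str]) -> float:
        if not specified:          # nothing specified: every record ties
            return 0.0
        return sum(record.get(k) == v for k, v in specified.items()) / len(specified)

    return max(records, key=score)
```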
Bayesian Robust Tensor Factorization for Incomplete Multiway Data
We propose a generative model for robust tensor factorization in the presence
of both missing data and outliers. The objective is to explicitly infer the
underlying low-CP-rank tensor capturing the global information and a sparse
tensor capturing the local information (interpreted as outliers), thus
providing a robust predictive distribution over missing entries. The
low-CP-rank tensor is modeled by multilinear interactions between multiple
latent factors on which the column sparsity is enforced by a hierarchical
prior, while the sparse tensor is modeled by a hierarchical view of the
Student-t distribution that associates an individual hyperparameter with each element
independently. For model learning, we develop an efficient closed-form
variational inference under a fully Bayesian treatment, which can effectively
prevent the overfitting problem and scales linearly with data size. In contrast
to existing related works, our method can perform model selection automatically
and implicitly, without the need to tune parameters. More specifically, it can
discover the ground-truth CP rank and automatically adapt the sparsity-inducing
priors to various types of outliers. In addition, the tradeoff between
the low-rank approximation and the sparse representation can be optimized in
the sense of maximum model evidence. Extensive experiments and comparisons
with many state-of-the-art algorithms on both synthetic and real-world datasets
demonstrate the advantages of our method from several perspectives. Published in IEEE Transactions on Neural Networks and Learning Systems.
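To make the model structure concrete, here is a minimal sketch of the kind of generative process described: a low-CP-rank tensor plus an element-wise sparse (Student-t) tensor plus Gaussian noise, with the Student-t part drawn as a Gaussian scale mixture so that each element carries its own precision hyperparameter. The shapes, rank, degrees of freedom, and noise level are assumptions; the column-sparsity prior on the factors and the variational inference itself are omitted.

```python
import numpy as np

def sample_low_rank_plus_sparse(shape=(30, 30, 30), rank=3, dof=2.0,
                                noise_std=0.05, seed=0):
    """Draw one tensor from a simplified generative model:
    low-CP-rank part + element-wise Student-t sparse part + Gaussian noise."""
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((d, rank)) for d in shape)
    low_rank = np.einsum('ir,jr,kr->ijk', A, B, C)
    # Per-element precision tau ~ Gamma(dof/2, rate=dof/2); marginalizing tau out
    # of N(0, 1/tau) yields a Student-t with `dof` degrees of freedom.
    tau = rng.gamma(dof / 2.0, 2.0 / dof, size=shape)
    sparse = rng.standard_normal(shape) / np.sqrt(tau)
    noise = noise_std * rng.standard_normal(shape)
    return low_rank + sparse + noise, low_rank, sparse
```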
Process reconstruction from incomplete and/or inconsistent data
We analyze how the action of a qubit channel (map) can be estimated from
measured data that are incomplete or even inconsistent. That is, we consider
situations in which the measurement statistics are insufficient to determine
consistent probability distributions. As a consequence, the estimation
(reconstruction) of the channel either fails completely or results in an unphysical
channel (i.e., the corresponding map is not completely positive). We present a
regularization procedure that allows us to derive physically reasonable
estimates (approximations) of quantum channels. We illustrate our procedure on
specific examples and show that it can also be used to derive optimal
approximations of operations that are forbidden by the laws of quantum
mechanics (e.g., the universal NOT gate).
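As a generic illustration of the "unphysical estimate" problem mentioned above (a reconstructed map that is not completely positive), and not the paper's regularization procedure, a common fix-up is to project the estimated Choi matrix onto the positive semidefinite cone and rescale its trace; the function name and normalization are assumptions, and the full trace-preservation constraint is not enforced here.

```python
import numpy as np

def clip_to_physical_choi(choi_est, dim=2):
    """Crude repair of an estimated Choi matrix: symmetrize, clip negative
    eigenvalues (complete positivity <=> PSD Choi matrix), and rescale the
    trace to `dim`. Note: full trace preservation is not enforced."""
    choi = 0.5 * (choi_est + choi_est.conj().T)   # enforce Hermiticity
    vals, vecs = np.linalg.eigh(choi)
    vals = np.clip(vals, 0.0, None)               # drop negative eigenvalues
    choi = (vecs * vals) @ vecs.conj().T
    return choi * (dim / np.trace(choi).real)
```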
Formal and Informal Model Selection with Incomplete Data
Model selection and assessment with incomplete data pose challenges in
addition to the ones encountered with complete data. There are two main reasons
for this. First, many models describe characteristics of the complete data, in
spite of the fact that only an incomplete subset is observed. Direct comparison
between model and data is then less than straightforward. Second, many commonly
used models are more sensitive to assumptions than in the complete-data
situation and some of their properties vanish when they are fitted to
incomplete, unbalanced data. These and other issues are brought forward using
two key examples, one of a continuous and one of a categorical nature. We argue
that model assessment ought to consist of two parts: (i) assessment of a
model's fit to the observed data and (ii) assessment of the sensitivity of
inferences to unverifiable assumptions, that is, to how a model describes the
unobserved data given the observed ones. Published in Statistical Science
(http://www.imstat.org/sts/) by the Institute of Mathematical Statistics,
http://dx.doi.org/10.1214/07-STS253.
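A minimal numerical illustration of part (ii), sensitivity to unverifiable assumptions (not one of the paper's two examples): suppose the mean of a partially observed variable is of interest and the unobserved values are assumed to differ from the observed mean by an offset delta that the data cannot identify; varying delta shows how strongly the conclusion depends on that assumption. All names and values below are hypothetical.

```python
import numpy as np

def mean_sensitivity(y_observed, n_missing, deltas=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """For each unverifiable offset delta, impute every missing value as
    (observed mean + delta) and report the resulting overall mean estimate."""
    y = np.asarray(y_observed, dtype=float)
    results = {}
    for delta in deltas:
        imputed = y.mean() + delta
        results[delta] = (y.sum() + n_missing * imputed) / (y.size + n_missing)
    return results
```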
Distribution of Mutual Information from Complete and Incomplete Data
Mutual information is widely used, in a descriptive way, to measure the
stochastic dependence of categorical random variables. In order to address
questions such as the reliability of the descriptive value, one must consider
sample-to-population inferential approaches. This paper deals with the
posterior distribution of mutual information, as obtained in a Bayesian
framework by a second-order Dirichlet prior distribution. The exact analytical
expression for the mean, and analytical approximations for the variance,
skewness and kurtosis are derived. These approximations have a guaranteed
accuracy level of the order O(1/n^3), where n is the sample size. Leading order
approximations for the mean and the variance are derived in the case of
incomplete samples. The derived analytical expressions allow the distribution
of mutual information to be approximated reliably and quickly. In fact, the
derived expressions can be computed with the same order of complexity needed
for descriptive mutual information. This makes the distribution of mutual
information become a concrete alternative to descriptive mutual information in
many applications which would benefit from moving to the inductive side. Some
of these prospective applications are discussed, and one of them, namely
feature selection, is shown to perform significantly better when inductive
mutual information is used.
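For concreteness, here is a short sketch of the descriptive (plug-in) mutual information against which the inferential quantities are compared; the (r - 1)(s - 1)/(2n) term is the standard first-order bias correction for an r x s contingency table, not the exact Dirichlet-posterior moments derived in the paper.

```python
import numpy as np

def plug_in_mutual_information(counts):
    """Descriptive mutual information (in nats) of an r x s contingency table of
    counts, plus a first-order bias-corrected value obtained by subtracting
    (r - 1)(s - 1) / (2 n), where n is the total sample size."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    p = counts / n
    pi = p.sum(axis=1, keepdims=True)   # row marginals
    pj = p.sum(axis=0, keepdims=True)   # column marginals
    mask = p > 0
    mi = np.sum(p[mask] * np.log(p[mask] / (pi * pj)[mask]))
    r, s = counts.shape
    return mi, mi - (r - 1) * (s - 1) / (2.0 * n)
```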
