Search CORE

11,393 research outputs found

Score Function Features for Discriminative Learning: Matrix and Tensor Framework

Author: Anandkumar Anima
Janzamin Majid
Sedghi Hanie
Publication venue
Publication date: 08/12/2014
Field of study

Feature learning forms the cornerstone for tackling challenging learning problems in domains such as speech, computer vision and natural language processing. In this paper, we consider a novel class of matrix and tensor-valued features, which can be pre-trained using unlabeled samples. We present efficient algorithms for extracting discriminative information, given these pre-trained features and labeled samples for any related task. Our class of features are based on higher-order score functions, which capture local variations in the probability density function of the input. We establish a theoretical framework to characterize the nature of discriminative information that can be extracted from score-function features, when used in conjunction with labeled samples. We employ efficient spectral decomposition algorithms (on matrices and tensors) for extracting discriminative components. The advantage of employing tensor-valued features is that we can extract richer discriminative information in the form of an overcomplete representations. Thus, we present a novel framework for employing generative models of the input for discriminative learning.Comment: 29 page

arXiv.org e-Print Archive

eScholarship - University of California

The algorithm of noisy k-means

Author: Brunet Camille
Loustau Sébastien
Publication venue
Publication date: 01/08/2013
Field of study

In this note, we introduce a new algorithm to deal with finite dimensional clustering with errors in variables. The design of this algorithm is based on recent theoretical advances (see Loustau (2013a,b)) in statistical learning with errors in variables. As the previous mentioned papers, the algorithm mixes different tools from the inverse problem literature and the machine learning community. Coarsely, it is based on a two-step procedure: (1) a deconvolution step to deal with noisy inputs and (2) Newton's iterations as the popular k-means

arXiv.org e-Print Archive

Are you going to the party: depends, who else is coming? [Learning hidden group dynamics via conditional latent tree models]

Author: Anandkumar Animashree
Arabshahi Forough
Butts Carter T.
Fitshugh Sean M.
Huang Furong
Publication venue
Publication date: 01/11/2015
Field of study

Scalable probabilistic modeling and prediction in high dimensional multivariate time-series is a challenging problem, particularly for systems with hidden sources of dependence and/or homogeneity. Examples of such problems include dynamic social networks with co-evolving nodes and edges and dynamic student learning in online courses. Here, we address these problems through the discovery of hierarchical latent groups. We introduce a family of Conditional Latent Tree Models (CLTM), in which tree-structured latent variables incorporate the unknown groups. The latent tree itself is conditioned on observed covariates such as seasonality, historical activity, and node attributes. We propose a statistically efficient framework for learning both the hierarchical tree structure and the parameters of the CLTM. We demonstrate competitive performance in multiple real world datasets from different domains. These include a dataset on students' attempts at answering questions in a psychology MOOC, Twitter users participating in an emergency management discussion and interacting with one another, and windsurfers interacting on a beach in Southern California. In addition, our modeling framework provides valuable and interpretable information about the hidden group structures and their effect on the evolution of the time series

arXiv.org e-Print Archive

Crossref

Caltech Authors

Combining multiple observational data sources to estimate causal effects

Author: Ding Peng
Yang Shu
Publication venue
Publication date: 20/04/2019
Field of study

The era of big data has witnessed an increasing availability of multiple data sources for statistical analyses. We consider estimation of causal effects combining big main data with unmeasured confounders and smaller validation data with supplementary information on these confounders. Under the unconfoundedness assumption with completely observed confounders, the smaller validation data allow for constructing consistent estimators for causal effects, but the big main data can only give error-prone estimators in general. However, by leveraging the information in the big main data in a principled way, we can improve the estimation efficiencies yet preserve the consistencies of the initial estimators based solely on the validation data. Our framework applies to asymptotically normal estimators, including the commonly-used regression imputation, weighting, and matching estimators, and does not require a correct specification of the model relating the unmeasured confounders to the observed variables. We also propose appropriate bootstrap procedures, which makes our method straightforward to implement using software routines for existing estimators

arXiv.org e-Print Archive

FigShare

Model selection in High-Dimensions: A Quadratic-risk based approach

Author: Akaike H.
Blom G.
Bollen K.
Fraley C.
Haughton D. M. A.
Lindsay B.
McLachlan G.
Ray S.
Schwarz G.
Serfling R. J.
van der Laan M. J.
Yang Y
Publication venue
Publication date: 01/01/2006
Field of study

In this article we propose a general class of risk measures which can be used for data based evaluation of parametric models. The loss function is defined as generalized quadratic distance between the true density and the proposed model. These distances are characterized by a simple quadratic form structure that is adaptable through the choice of a nonnegative definite kernel and a bandwidth parameter. Using asymptotic results for the quadratic distances we build a quick-to-compute approximation for the risk function. Its derivation is analogous to the Akaike Information Criterion (AIC), but unlike AIC, the quadratic risk is a global comparison tool. The method does not require resampling, a great advantage when point estimators are expensive to compute. The method is illustrated using the problem of selecting the number of components in a mixture model, where it is shown that, by using an appropriate kernel, the method is computationally straightforward in arbitrarily high data dimensions. In this same context it is shown that the method has some clear advantages over AIC and BIC.Comment: Updated with reviewer suggestion

arXiv.org e-Print Archive

CiteSeerX

Crossref

Research Papers in Economics

Enlighten