Score Function Features for Discriminative Learning: Matrix and Tensor Framework
Feature learning forms the cornerstone for tackling challenging learning
problems in domains such as speech, computer vision and natural language
processing. In this paper, we consider a novel class of matrix and
tensor-valued features, which can be pre-trained using unlabeled samples. We
present efficient algorithms for extracting discriminative information, given
these pre-trained features and labeled samples for any related task. Our class
of features is based on higher-order score functions, which capture local
variations in the probability density function of the input. We establish a
theoretical framework to characterize the nature of discriminative information
that can be extracted from score-function features, when used in conjunction
with labeled samples. We employ efficient spectral decomposition algorithms (on
matrices and tensors) for extracting discriminative components. The advantage
of employing tensor-valued features is that we can extract richer
discriminative information in the form of an overcomplete representation.
Thus, we present a novel framework for employing generative models of the input
for discriminative learning. Comment: 29 pages
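To make the idea concrete, here is a minimal sketch. We assume the common definition of the m-th order score function of an input density p(x), S_m(x) = (-1)^m ∇^m p(x) / p(x), together with the Stein-type identity E[y · S_m(x)] = E[∇^m G(x)] with G(x) = E[y | x] (under regularity conditions). We also assume a standard Gaussian input density, for which S_2(x) = x x^T - I has a closed form. The toy label y = (a·x)^2, the direction `a`, and the helper names are hypothetical. In this setting E[y · S_2(x)] = 2 a a^T, so a spectral (eigen) decomposition of the estimated matrix recovers the discriminative direction.

```python
import numpy as np

def second_order_score_moment(X, y):
    """Estimate E[y * S2(x)] assuming standard Gaussian inputs x ~ N(0, I).

    For this density S2(x) = x x^T - I, so the empirical average of
    y * (x x^T - I) equals mean(y x x^T) - mean(y) * I.
    """
    n, d = X.shape
    weighted = (X * y[:, None]).T @ X / n      # empirical mean of y * x x^T
    return weighted - y.mean() * np.eye(d)

def top_component(M):
    """Eigenvector of the symmetric matrix M with the largest |eigenvalue|."""
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, np.argmax(np.abs(vals))]

rng = np.random.default_rng(0)
d, n = 5, 200_000
a = np.array([1.0, 2.0, 0.0, -1.0, 0.0])
a /= np.linalg.norm(a)                         # hypothetical hidden direction
X = rng.standard_normal((n, d))                # Gaussian inputs
y = (X @ a) ** 2                               # label depends on x only through a
M = second_order_score_moment(X, y)            # approximates 2 a a^T
v = top_component(M)                           # recovered direction (up to sign)
print(abs(v @ a))
```

Only the second-order (matrix) case is shown here; tensor-valued features would use S_3 and a tensor decomposition in place of the eigendecomposition.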
The algorithm of noisy k-means
In this note, we introduce a new algorithm to deal with finite dimensional
clustering with errors in variables. The design of this algorithm is based on
recent theoretical advances (see Loustau (2013a,b)) in statistical learning
with errors in variables. As in the previously mentioned papers, the algorithm combines
different tools from the inverse problem literature and the machine learning
community. Roughly speaking, it is based on a two-step procedure: (1) a deconvolution
step to deal with noisy inputs and (2) Newton's iterations, as in the popular
k-means algorithm.
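The following is a simplified one-dimensional sketch of this two-step idea. It is not the authors' estimator. Step (1) is a deconvolution kernel density estimate, assuming Gaussian noise with known standard deviation `sigma` and a sinc kernel whose Fourier transform is the indicator of [-1/h, 1/h]. Step (2) runs Lloyd-style centroid iterations on a grid, weighted by that estimated density. The function names, the bandwidth `h`, the grid, and the simulated data are all illustrative assumptions.

```python
import numpy as np

def deconvolution_density(z, grid, sigma, h, n_freq=801):
    """Estimate the density of X from noisy Z = X + eps, eps ~ N(0, sigma^2).

    Assumes sigma is known and uses a sinc kernel (Fourier transform = 1 on [-1/h, 1/h]).
    """
    t = np.linspace(-1.0 / h, 1.0 / h, n_freq)
    dt = t[1] - t[0]
    phi_z = np.exp(1j * np.outer(t, z)).mean(axis=1)   # empirical char. function of Z
    phi_x = phi_z * np.exp(0.5 * sigma**2 * t**2)       # divide by char. function of eps
    # Inverse Fourier transform evaluated at the grid points (Riemann sum).
    f = (np.exp(-1j * np.outer(grid, t)) @ phi_x).real * dt / (2 * np.pi)
    return np.clip(f, 0.0, None)                        # drop negative ripples

def weighted_kmeans_1d(grid, w, k, iters=100):
    """Lloyd iterations on grid points, each weighted by the estimated density."""
    cdf = np.cumsum(w) / w.sum()
    centers = grid[np.searchsorted(cdf, (np.arange(k) + 0.5) / k)]  # quantile init
    for _ in range(iters):
        labels = np.argmin(np.abs(grid[:, None] - centers[None, :]), axis=1)
        new = centers.copy()
        for j in range(k):
            mask = labels == j
            if w[mask].sum() > 0:
                new[j] = np.sum(w[mask] * grid[mask]) / w[mask].sum()
        converged = np.allclose(new, centers)
        centers = new
        if converged:
            break
    return np.sort(centers)

rng = np.random.default_rng(1)
n, sigma = 2000, 1.0
x = rng.choice([-3.0, 3.0], size=n) + 0.5 * rng.standard_normal(n)  # clean data
z = x + sigma * rng.standard_normal(n)                               # observed, noisy
grid = np.linspace(-6.0, 6.0, 301)
f_hat = deconvolution_density(z, grid, sigma=sigma, h=0.5)
centers = weighted_kmeans_1d(grid, f_hat, k=2)
print(centers)
```

The bandwidth trades off bias and variance: a smaller `h` widens the frequency window, but dividing by the noise characteristic function then amplifies sampling error by up to exp(sigma^2 / (2 h^2)).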
Are you going to the party: depends, who else is coming? [Learning hidden group dynamics via conditional latent tree models]
Scalable probabilistic modeling and prediction in high dimensional
multivariate time-series is a challenging problem, particularly for systems
with hidden sources of dependence and/or homogeneity. Examples of such problems
include dynamic social networks with co-evolving nodes and edges and dynamic
student learning in online courses. Here, we address these problems through the
discovery of hierarchical latent groups. We introduce a family of Conditional
Latent Tree Models (CLTM), in which tree-structured latent variables
incorporate the unknown groups. The latent tree itself is conditioned on
observed covariates such as seasonality, historical activity, and node
attributes. We propose a statistically efficient framework for learning both
the hierarchical tree structure and the parameters of the CLTM. We demonstrate
competitive performance in multiple real world datasets from different domains.
These include a dataset on students' attempts at answering questions in a
psychology MOOC, Twitter users participating in an emergency management
discussion and interacting with one another, and windsurfers interacting on a
beach in Southern California. In addition, our modeling framework provides
valuable and interpretable information about the hidden group structures and
their effect on the evolution of the time series.
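As a minimal sketch of how conditioning on covariates enters such a model, consider the smallest possible case: a star-shaped latent tree with one binary hidden group node h. Its prior P(h | x) is logistic in the covariates x, and the observed binary nodes y_i are conditionally independent given h. The likelihood sums out the hidden node, and the posterior over h gives the kind of interpretable group assignment mentioned above. This is not the paper's hierarchical structure or its learning procedure. All parameters (`w`, `b`, `emit`) and function names are hypothetical.

```python
import itertools
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def group_prior(x, w, b):
    """P(h | x) for one binary hidden group node, conditioned on covariates x."""
    p1 = sigmoid(np.dot(w, x) + b)
    return np.array([1.0 - p1, p1])

def leaf_likelihood(y, emit):
    """P(y | h) for h in {0, 1}; emit[h, i] = P(y_i = 1 | h), leaves independent given h."""
    y = np.asarray(y)
    return np.prod(np.where(y == 1, emit, 1.0 - emit), axis=1)

def marginal_likelihood(y, x, w, b, emit):
    """P(y | x) = sum_h P(h | x) * prod_i P(y_i | h)."""
    return float(group_prior(x, w, b) @ leaf_likelihood(y, emit))

def group_posterior(y, x, w, b, emit):
    """P(h | y, x): which hidden group best explains the observed nodes."""
    joint = group_prior(x, w, b) * leaf_likelihood(y, emit)
    return joint / joint.sum()

# Hypothetical parameters: 2 covariates, 3 observed nodes (e.g. users active or not).
w, b = np.array([1.5, -0.5]), 0.2
emit = np.array([[0.1, 0.2, 0.1],    # group 0: mostly inactive
                 [0.8, 0.7, 0.9]])   # group 1: mostly active
x = np.array([0.3, 1.0])
total = sum(marginal_likelihood(y, x, w, b, emit)
            for y in itertools.product([0, 1], repeat=3))
post = group_posterior([1, 1, 1], x, w, b, emit)
print(total, post)
```

Deeper trees replace the single product over leaves with recursive sums over each hidden node's children, but the covariate-dependent prior enters the same way.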
Combining multiple observational data sources to estimate causal effects
The era of big data has witnessed an increasing availability of multiple data
sources for statistical analyses. We consider estimation of causal effects
combining big main data with unmeasured confounders and smaller validation data
with supplementary information on these confounders. Under the unconfoundedness
assumption with completely observed confounders, the smaller validation data
allow for constructing consistent estimators for causal effects, but the big
main data can only give error-prone estimators in general. However, by
leveraging the information in the big main data in a principled way, we can
improve estimation efficiency while preserving the consistency of the
initial estimators based solely on the validation data. Our framework applies
to asymptotically normal estimators, including the commonly-used regression
imputation, weighting, and matching estimators, and does not require a correct
specification of the model relating the unmeasured confounders to the observed
variables. We also propose appropriate bootstrap procedures, which make our
method straightforward to implement using software routines for existing
estimators.
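One way to read "leveraging the main data while preserving consistency" is as a control-variate adjustment. Let tau_val be a consistent estimator on the validation data that adjusts for the confounder u. The same error-prone estimator (adjusting only for x) is computed on both samples, and its difference between the two samples has mean approximately zero. Subtracting a bootstrap-tuned multiple of that difference then reduces variance without introducing bias. The sketch below assumes both samples come from the same population, uses OLS regression adjustment as the estimator, and simulates toy data. The helper names and the data-generating model are hypothetical, and this is an illustration rather than the paper's exact procedure.

```python
import numpy as np

def ols_effect(y, a, covariates):
    """Coefficient on treatment a in an OLS fit of y on [1, a, covariates]."""
    design = np.column_stack([np.ones_like(y), a, *covariates])
    return np.linalg.lstsq(design, y, rcond=None)[0][1]

def simulate(n, rng, tau=1.0):
    """Hypothetical toy data: x always observed, u an unmeasured confounder."""
    x, u = rng.standard_normal(n), rng.standard_normal(n)
    a = (rng.random(n) < 1.0 / (1.0 + np.exp(-(x + u)))).astype(float)
    y = tau * a + x + u + rng.standard_normal(n)
    return {"y": y, "a": a, "x": x, "u": u}

def combined_estimate(main, val, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)

    def est_val(d):   # consistent (x, u)-adjusted and error-prone (x only) estimators
        return (ols_effect(d["y"], d["a"], [d["x"], d["u"]]),
                ols_effect(d["y"], d["a"], [d["x"]]))

    def est_main(d):  # error-prone estimator; u is unavailable in the main data
        return ols_effect(d["y"], d["a"], [d["x"]])

    def resample(d):  # nonparametric bootstrap resample of one data set
        idx = rng.integers(0, len(d["y"]), len(d["y"]))
        return {k: v[idx] for k, v in d.items()}

    tau_val, ep_val = est_val(val)
    ep_main = est_main(main)
    boot_val = np.array([est_val(resample(val)) for _ in range(n_boot)])
    boot_main = np.array([est_main(resample(main)) for _ in range(n_boot)])
    # Independent samples: variances of the difference add, and
    # Cov(tau_val, ep_val - ep_main) = Cov(tau_val, ep_val).
    cov = np.cov(boot_val[:, 0], boot_val[:, 1])[0, 1]
    var_diff = boot_val[:, 1].var(ddof=1) + boot_main.var(ddof=1)
    gamma = cov / var_diff
    return tau_val - gamma * (ep_val - ep_main)

rng = np.random.default_rng(42)
main = {k: v for k, v in simulate(5000, rng).items() if k != "u"}  # u not recorded
val = simulate(500, rng)
tau_val_only = ols_effect(val["y"], val["a"], [val["x"], val["u"]])
tau_combined = combined_estimate(main, val)
print(tau_val_only, tau_combined)
```

The coefficient gamma is the variance-minimizing weight for a control variate; because the subtracted term has mean approximately zero, the combined estimator stays consistent whenever tau_val is.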
Model selection in High-Dimensions: A Quadratic-risk based approach
In this article we propose a general class of risk measures which can be used
for data-based evaluation of parametric models. The loss function is defined as
a generalized quadratic distance between the true density and the proposed model.
These distances are characterized by a simple quadratic form structure that is
adaptable through the choice of a nonnegative definite kernel and a bandwidth
parameter. Using asymptotic results for the quadratic distances we build a
quick-to-compute approximation for the risk function. Its derivation is
analogous to the Akaike Information Criterion (AIC), but unlike AIC, the
quadratic risk is a global comparison tool. The method does not require
resampling, a great advantage when point estimators are expensive to compute.
The method is illustrated using the problem of selecting the number of
components in a mixture model, where it is shown that, by using an appropriate
kernel, the method is computationally straightforward in arbitrarily high data
dimensions. In this same context it is shown that the method has some clear
advantages over AIC and BIC. Comment: Updated with reviewer suggestions
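For reference, a kernel quadratic distance between distributions F and G is commonly written d_K(F, G) = ∫∫ K(s, t) d(F - G)(s) d(F - G)(t). Below is a minimal one-dimensional sketch that assumes a normal kernel K(s, t) = N(s - t; 0, h^2) and normal-mixture models. In that setting every cross term has a closed form: each pair of components contributes N(mu_i - nu_j; 0, h^2 + s_i^2 + t_j^2). An empirical distribution is treated as a mixture of point masses with zero variance. This illustrates only the distance, not the paper's risk approximation. The function names and the simulated data are hypothetical.

```python
import numpy as np

def normal_pdf(x, var):
    return np.exp(-0.5 * x**2 / var) / np.sqrt(2 * np.pi * var)

def cross_term(w1, m1, s1, w2, m2, s2, h):
    """Closed-form ∫∫ K_h(s - t) f(s) g(t) ds dt for 1-D normal mixtures f and g."""
    diff = m1[:, None] - m2[None, :]
    var = h**2 + s1[:, None] ** 2 + s2[None, :] ** 2
    return float(w1 @ normal_pdf(diff, var) @ w2)

def quadratic_distance(f, g, h):
    """d_K(F, G) = <f, f> - 2 <f, g> + <g, g>; each model is (weights, means, sds)."""
    return cross_term(*f, *f, h) - 2 * cross_term(*f, *g, h) + cross_term(*g, *g, h)

rng = np.random.default_rng(0)
n = 2000
data = np.where(rng.random(n) < 0.5, -2.0, 2.0) + rng.standard_normal(n)
empirical = (np.full(n, 1.0 / n), data, np.zeros(n))   # point masses at the data
two_comp = (np.array([0.5, 0.5]), np.array([-2.0, 2.0]), np.array([1.0, 1.0]))
one_comp = (np.array([1.0]), np.array([0.0]), np.array([np.sqrt(5.0)]))
h = 1.0
d_two = quadratic_distance(empirical, two_comp, h)
d_one = quadratic_distance(empirical, one_comp, h)
print(d_two, d_one)
```

Because a normal kernel convolved with normal components stays normal, the same closed form extends to multivariate normal mixtures, which is consistent with the abstract's remark about high data dimensions.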