Statistical learning methods for functional data with applications to prediction, classification and outlier detection
In the era of big data, Functional Data Analysis has become increasingly important insofar
as it constitutes a powerful tool to tackle inference problems in statistics. In particular,
in this thesis we propose several methods for time series prediction,
classification and outlier detection from a functional approach.
The thesis is organized as follows: In Chapter 1 we introduce the concept of functional
data and give an overview of the thesis. In Chapter 2 we present
the theoretical framework used to develop the proposed methodologies.
In Chapters 3 and 4 two new ordering mappings for functional data are proposed.
The first is a Kernel depth measure, which satisfies the corresponding theoretical properties,
while the second is an entropy measure. In both cases we propose parametric
and non-parametric estimation methods that allow us to define an order in the data set
at hand. A natural application of these measures is the identification of atypical observations
(functions).
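As a rough illustration of how an ordering of curves can flag atypical functions, the sketch below scores each discretized curve by its mean Gaussian-kernel similarity to the rest of the sample and flags the lowest-scoring ones. This is a generic stand-in, not the kernel depth or entropy measure proposed in the thesis; the function names, the median-heuristic bandwidth and the simulated curves are all assumptions made for the example.

```python
import numpy as np

def kernel_centrality(curves, bandwidth=None):
    """Mean Gaussian-kernel similarity of each discretized curve to the sample.

    Hypothetical helper: a simple centrality score, not the thesis's kernel depth.
    """
    curves = np.asarray(curves, dtype=float)
    # Pairwise squared Euclidean distances between curves (one curve per row).
    diffs = curves[:, None, :] - curves[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    if bandwidth is None:
        # Median heuristic over off-diagonal distances (an illustrative default).
        off_diag = sq_dists[~np.eye(len(curves), dtype=bool)]
        bandwidth = np.sqrt(np.median(off_diag))
    gram = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    return gram.mean(axis=1)

def flag_atypical(curves, fraction=0.05):
    """Return indices of the least central curves (lowest scores first)."""
    scores = kernel_centrality(curves)
    n_flag = max(1, int(round(fraction * len(scores))))
    return np.argsort(scores)[:n_flag]

# Simulated data: 30 noisy sine curves plus one vertically shifted curve.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)
typical = np.sin(2 * np.pi * grid) + 0.1 * rng.standard_normal((30, grid.size))
outlier = np.sin(2 * np.pi * grid) + 3.0
sample = np.vstack([typical, outlier])

flagged = flag_atypical(sample, fraction=0.03)
print(flagged)
```

Sorting the scores gives a center-outward order of the sample, and the tail of that order is where candidate outliers sit.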
In Chapter 5 we study the Functional Autoregressive Hilbertian model. We also
propose a new family of basis functions for the estimation and prediction of the aforementioned
model, which belong to a reproducing kernel Hilbert space. The properties
of continuity obtained in this space allow us to construct confidence bands for the corresponding
predictions in a detracted time horizon.
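For orientation, the autoregressive Hilbertian model is usually written in its order-one form as below. The abstract does not state the order or the notation used in the thesis, so the symbols here ($\mu$, $\rho$, $\varepsilon_n$) follow the standard textbook formulation and are an assumption.

```latex
% ARH(1): X_n takes values in a separable Hilbert space H,
% \mu \in H is the mean function, \rho is a bounded linear operator on H,
% and (\varepsilon_n) is an H-valued white noise.
X_{n+1} - \mu = \rho\,(X_n - \mu) + \varepsilon_{n+1}, \qquad n \in \mathbb{Z}.
```

In practice the operator $\rho$ is estimated after projecting the curves onto a finite set of basis functions, which is the step where the choice of basis matters.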
In order to boost different classification methods, in Chapter 6 we propose a divergence
measure for functional data. This metric allows us to determine in which part of
the domain two classes of functional data present divergent behavior. This methodology is
framed in the field of domain selection, and it is aimed at solving classification problems
by means of the elimination of redundant information.
Finally, in Chapter 7 the general conclusions of this work and the future research
lines are presented.
Financial support received from the Spanish Ministry of Economy and Competitiveness (ECO2015-66593-P) and the UC3M PIF scholarship for doctoral studies.
Doctoral Programme in Business Economics and Quantitative Methods, Universidad Carlos III de Madrid.
Chair: Santiago Velilla Cerdán; Secretary: Kalliopi Mylona; Member: Luis Antonio Belanche Muño
Logistic Regression and Classification with non-Euclidean Covariates
We introduce a logistic regression model for data pairs consisting of a
binary response and a covariate residing in a non-Euclidean metric space
without vector structures. Based on the proposed model we also develop a binary
classifier for non-Euclidean objects. We propose a maximum likelihood estimator
for the non-Euclidean regression coefficient in the model, and provide upper
bounds on the estimation error under various metric entropy conditions that
quantify complexity of the underlying metric space. Matching lower bounds are
derived for the important metric spaces commonly seen in statistics,
establishing optimality of the proposed estimator in such spaces. Similarly, an
upper bound on the excess risk of the developed classifier is provided for
general metric spaces. A finer upper bound and a matching lower bound, and thus
optimality of the proposed classifier, are established for Riemannian
manifolds. We investigate the numerical performance of the proposed estimator
and classifier via simulation studies, and illustrate their practical merits
via an application to task-related fMRI data.
Comment: This revision contains the following updates: (1) The parameter space
is allowed to be unbounded; (2) Some upper bounds are tightene
Direct and Indirect Effects -- An Information Theoretic Perspective
Information theoretic (IT) approaches to quantifying causal influences have
experienced some popularity in the literature, in both theoretical and applied
(e.g. neuroscience and climate science) domains. While these causal measures
are desirable in that they are model agnostic and can capture non-linear
interactions, they are fundamentally different from common statistical notions
of causal influence in that they (1) compare distributions over the effect
rather than values of the effect and (2) are defined with respect to random
variables representing a cause rather than specific values of a cause. We here
present IT measures of direct, indirect, and total causal effects. The proposed
measures are unlike existing IT techniques in that they enable measuring causal
effects that are defined with respect to specific values of a cause while still
offering the flexibility and general applicability of IT techniques. We provide
an identifiability result and demonstrate application of the proposed measures
in estimating the causal effect of the El Niño-Southern Oscillation on
temperature anomalies in the North American Pacific Northwest