
    Statistical learning methods for functional data with applications to prediction, classification and outlier detection

    In the era of big data, Functional Data Analysis has become increasingly important as a powerful tool for tackling inference problems in statistics. In this thesis, in particular, we propose several methods for time-series prediction, classification and outlier detection from a functional approach. The thesis is organized as follows. In Chapter 1 we introduce the concept of functional data and give an overview of the thesis. In Chapter 2 we present the theoretical framework used to develop the proposed methodologies. In Chapters 3 and 4, two new ordering mappings for functional data are proposed. The first is a kernel depth measure, which satisfies the corresponding theoretical properties, while the second is an entropy measure. In both cases we propose parametric and non-parametric estimation methods that allow us to define an order in the data set at hand. A natural application of these measures is the identification of atypical observations (functions). In Chapter 5 we study the Functional Autoregressive Hilbertian model. We also propose a new family of basis functions, belonging to a reproducing kernel Hilbert space, for the estimation and prediction of this model. The continuity properties of this space allow us to construct confidence bands for the corresponding predictions in a detracted time horizon. In order to boost different classification methods, in Chapter 6 we propose a divergence measure for functional data. This metric allows us to determine in which part of the domain two classes of functional data present divergent behavior. The methodology is framed in the field of domain selection, and it aims to solve classification problems by eliminating redundant information.
    Finally, Chapter 7 presents the general conclusions of this work and future lines of research. Financial support was received from the Spanish Ministry of Economy and Competitiveness (ECO2015-66593-P) and the UC3M PIF scholarship for doctoral studies. Doctoral Programme in Business Economics and Quantitative Methods, Universidad Carlos III de Madrid. Committee chair: Santiago Velilla Cerdán; Secretary: Kalliopi Mylona; Member: Luis Antonio Belanche Muño
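    The abstract does not spell out the thesis's own kernel depth or entropy measure. As a rough illustration of how a kernel-based depth can order a sample of curves and single out atypical ones, the sketch below uses the well-known h-modal depth instead: each curve's depth is its average Gaussian-kernel similarity, in the L2 distance, to the other curves. The median-distance bandwidth, the function names `h_modal_depth` and `flag_outliers`, and the 5% quantile cutoff are all hypothetical choices made for this example, not the thesis's method.

```python
import numpy as np
from typing import Optional


def h_modal_depth(curves: np.ndarray, grid: np.ndarray, h: Optional[float] = None) -> np.ndarray:
    """Kernel (h-modal) depth of each curve; a low depth marks an atypical curve.

    curves: array of shape (n, m); row i is function i sampled on `grid` (length m).
    """
    n = curves.shape[0]
    diffs = curves[:, None, :] - curves[None, :, :]          # (n, n, m) pairwise differences
    sq = diffs ** 2
    # Squared L2 distance on the grid via the trapezoidal rule, written out by hand
    integral = np.sum(0.5 * (sq[..., 1:] + sq[..., :-1]) * np.diff(grid), axis=-1)
    dists = np.sqrt(integral)                                 # (n, n) L2 distances
    off_diag = ~np.eye(n, dtype=bool)
    if h is None:
        # Hypothetical bandwidth rule: median pairwise distance
        h = float(np.median(dists[off_diag]))
    kernel = np.exp(-0.5 * (dists / h) ** 2)                  # Gaussian kernel similarities
    # Average similarity to the other n - 1 curves (self term excluded)
    return np.where(off_diag, kernel, 0.0).sum(axis=1) / (n - 1)


def flag_outliers(depths: np.ndarray, quantile: float = 0.05) -> np.ndarray:
    """Indices of curves whose depth is below an empirical quantile (hypothetical cutoff)."""
    return np.flatnonzero(depths < np.quantile(depths, quantile))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = np.linspace(0.0, 1.0, 50)
    typical = np.sin(2 * np.pi * grid) + 0.1 * rng.standard_normal((20, 50))
    shifted = np.sin(2 * np.pi * grid) + 3.0                  # one vertically shifted curve
    sample = np.vstack([typical, shifted])
    print(flag_outliers(h_modal_depth(sample, grid)))
```

    Ordering the sample by depth gives a center-outward ranking of the curves, and the shallowest curves are the natural outlier candidates, which is the application the abstract mentions.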

    Logistic Regression and Classification with non-Euclidean Covariates

    We introduce a logistic regression model for data pairs consisting of a binary response and a covariate residing in a non-Euclidean metric space without vector structures. Based on the proposed model we also develop a binary classifier for non-Euclidean objects. We propose a maximum likelihood estimator for the non-Euclidean regression coefficient in the model, and provide upper bounds on the estimation error under various metric entropy conditions that quantify the complexity of the underlying metric space. Matching lower bounds are derived for the important metric spaces commonly seen in statistics, establishing optimality of the proposed estimator in such spaces. Similarly, an upper bound on the excess risk of the developed classifier is provided for general metric spaces. A finer upper bound and a matching lower bound, and thus optimality of the proposed classifier, are established for Riemannian manifolds. We investigate the numerical performance of the proposed estimator and classifier via simulation studies, and illustrate their practical merits via an application to task-related fMRI data. Comment: This revision contains the following updates: (1) The parameter space is allowed to be unbounded; (2) Some upper bounds are tightene
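    The abstract does not state the model's exact form, so the following is only a toy picture of a logistic model whose covariate lives on a non-Euclidean space, here the unit circle (a simple Riemannian manifold). It assumes the hypothetical model P(Y = 1 | θ) = σ(α + β cos(θ − μ)), in which the "coefficient" μ is itself a point on the circle. Because β cos(θ − μ) = c₁ cos θ + c₂ sin θ, the maximum likelihood fit reduces to ordinary logistic regression on (cos θ, sin θ), solved here by Newton-Raphson; μ and β are then recovered from (c₁, c₂). This is not the paper's estimator, and the names `fit_circular_logistic` and `predict` are illustrative.

```python
import numpy as np


def fit_circular_logistic(theta: np.ndarray, y: np.ndarray, n_iter: int = 25):
    """Maximum likelihood fit of the toy model P(Y=1 | theta) = sigmoid(alpha + beta*cos(theta - mu)).

    Uses the identity beta*cos(theta - mu) = c1*cos(theta) + c2*sin(theta),
    so the fit is a standard logistic regression solved by Newton-Raphson.
    """
    X = np.column_stack([np.ones_like(theta), np.cos(theta), np.sin(theta)])
    w = np.zeros(3)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))                 # fitted probabilities
        grad = X.T @ (y - p)                             # score (gradient of log-likelihood)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])      # observed information
        w = w + np.linalg.solve(hess, grad)              # Newton step
    alpha, c1, c2 = w
    # Map (c1, c2) back to an amplitude beta >= 0 and a location mu on the circle
    return float(alpha), float(np.hypot(c1, c2)), float(np.arctan2(c2, c1))


def predict(theta: np.ndarray, alpha: float, beta: float, mu: float) -> np.ndarray:
    """Plug-in classifier: label 1 where the fitted log-odds are positive."""
    return (alpha + beta * np.cos(theta - mu) > 0).astype(float)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    theta = rng.uniform(-np.pi, np.pi, 2000)
    prob = 1.0 / (1.0 + np.exp(-3.0 * np.cos(theta - 1.0)))   # simulated truth: alpha=0, beta=3, mu=1
    y = (rng.uniform(size=2000) < prob).astype(float)
    print(fit_circular_logistic(theta, y))
```

    On a general metric space no such linearizing identity is available, which is why the paper's estimation-error bounds are stated in terms of metric entropy rather than vector-space arguments.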

    Direct and Indirect Effects -- An Information Theoretic Perspective

    Information theoretic (IT) approaches to quantifying causal influences have gained some popularity in the literature, in both theoretical and applied (e.g. neuroscience and climate science) domains. While these causal measures are desirable in that they are model agnostic and can capture non-linear interactions, they are fundamentally different from common statistical notions of causal influence in that they (1) compare distributions over the effect rather than values of the effect and (2) are defined with respect to random variables representing a cause rather than specific values of a cause. Here we present IT measures of direct, indirect, and total causal effects. The proposed measures are unlike existing IT techniques in that they enable measuring causal effects that are defined with respect to specific values of a cause while still offering the flexibility and general applicability of IT techniques. We provide an identifiability result and demonstrate application of the proposed measures in estimating the causal effect of the El Niño-Southern Oscillation on temperature anomalies in the North American Pacific Northwest.
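    The paper's own definitions are not given in the abstract. To make the contrast in points (1) and (2) concrete, the sketch below computes a generic, value-specific information-theoretic quantity: the KL divergence, in bits, between the interventional distributions P(Y | do(X = x)) and P(Y | do(X = x')) for two specific values of the cause. It assumes a toy discrete model with a single observed confounder Z, so that back-door adjustment P(y | do(x)) = Σ_z P(z) P(y | x, z) applies. This is an illustration of the general idea only, not the direct, indirect or total effect measures proposed in the paper; the function names and all probabilities are hypothetical.

```python
import numpy as np


def interventional(p_z: np.ndarray, p_y_given_xz: np.ndarray, x: int) -> np.ndarray:
    """P(Y | do(X=x)) by back-door adjustment over a discrete confounder Z.

    p_z: shape (nz,); p_y_given_xz: shape (nx, nz, ny) with rows summing to 1.
    """
    return p_z @ p_y_given_xz[x]          # sum_z P(z) * P(y | x, z), shape (ny,)


def kl_bits(p: np.ndarray, q: np.ndarray) -> float:
    """KL divergence D(p || q) in bits (assumes q > 0 wherever p > 0)."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


if __name__ == "__main__":
    p_z = np.array([0.5, 0.5])
    p_y_given_xz = np.array([
        [[0.9, 0.1], [0.7, 0.3]],         # X = 0, for Z = 0 and Z = 1
        [[0.3, 0.7], [0.1, 0.9]],         # X = 1, for Z = 0 and Z = 1
    ])
    do0 = interventional(p_z, p_y_given_xz, 0)   # [0.8, 0.2]
    do1 = interventional(p_z, p_y_given_xz, 1)   # [0.2, 0.8]
    print(kl_bits(do1, do0))
```

    Unlike mutual information I(X; Y), which averages over the distribution of the cause, this quantity changes with the particular pair of values (x, x') being compared, while still comparing whole distributions over the effect.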