9 research outputs found

    An Alternative Approach to Functional Linear Partial Quantile Regression

    We have previously proposed the partial quantile regression (PQR) prediction procedure for the functional linear model, using partial quantile covariance techniques, and developed the simple partial quantile regression (SIMPQR) algorithm to efficiently extract the PQR basis for estimating functional coefficients. Although the PQR approach is considered an attractive alternative to projections onto the principal component basis, its iterative nature and the non-differentiability of the quantile loss function make it difficult to establish the corresponding asymptotic properties. In this article, we propose and implement an alternative formulation of partial quantile regression (APQR) for the functional linear model, based on a block relaxation method and finite smoothing techniques. The proposed reformulation leads to insightful results and motivates new theory: we demonstrate consistency and establish convergence rates by applying advanced techniques from empirical process theory. Two simulations and two real datasets, from the ADHD-200 sample and ADNI, are investigated to demonstrate the advantages of the proposed methods.
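The finite-smoothing idea can be illustrated concretely: the quantile check loss rho_tau(u) = u(tau - 1{u < 0}) is non-differentiable at zero, so it is replaced near the kink by a smooth surrogate. Below is a minimal Python sketch using a quadratic C^1 smoothing on [-h, h]; this particular smoother is an illustrative choice, matching the check loss in value and slope at +-h, and is not necessarily the exact scheme used in the article.

```python
import numpy as np

def check_loss(u, tau):
    """Standard quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def smoothed_check_loss(u, tau, h=0.1):
    """Quadratic C^1 smoothing of the check loss on [-h, h].

    Illustrative choice: the quadratic u^2/(4h) + (tau - 1/2)u + h/4
    agrees with the check loss in value and slope at u = +-h, removing
    the kink at zero so gradient-based theory and algorithms apply.
    """
    u = np.asarray(u, dtype=float)
    quad = u**2 / (4 * h) + (tau - 0.5) * u + h / 4
    return np.where(np.abs(u) <= h, quad, check_loss(u, tau))
```

As h shrinks, the smoothed loss converges uniformly to the check loss, which is what makes this kind of surrogate useful for deriving asymptotics.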

    Feature Selection for Functional Data

    In this paper we address the problem of feature selection when the data are functional. We study several statistical procedures, including classification, regression and principal components. One advantage of the proposed procedure is its flexibility: the features are defined by a set of functions, relevant to the problem being studied, that are proposed by the user. Our method is consistent under a set of quite general assumptions, and produces good results in the real data examples that we analyze.
    Comment: 22 pages, 4 figures
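One plain reading of "features defined by a set of functions proposed by the user" is that each candidate feature is an integral of the curve against a user-supplied function, after which any standard scalar selection procedure applies. The sketch below is an illustrative assumption of that construction, not the paper's exact method; the helper name is hypothetical.

```python
import numpy as np

def user_features(X, t, feature_funcs):
    """Scalar features F[i, j] = integral of X_i(t) * f_j(t) dt.

    X: (n, m) matrix of curves sampled on the common grid t.
    feature_funcs: user-chosen callables f_j defining the candidate
    features (an illustrative reading of the paper's setup).
    """
    return np.column_stack(
        [np.trapz(X * f(t), t, axis=1) for f in feature_funcs])
```

The resulting feature matrix can then be fed to any off-the-shelf variable selection or classification routine.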

    Functional Regression

    Functional data analysis (FDA) involves the analysis of data whose ideal units of observation are functions defined on some continuous domain, where the observed data consist of a sample of functions from some population, sampled on a discrete grid. Ramsay and Silverman's 1997 textbook sparked the development of this field, which has accelerated in the past 10 years to become one of the fastest growing areas of statistics, fueled by the growing number of applications yielding this type of data. One unique characteristic of FDA is the need to combine information both across and within functions, which Ramsay and Silverman called replication and regularization, respectively. This article focuses on functional regression, the area of FDA that has received the most attention in applications and methodological development. First comes an introduction to basis functions, key building blocks for regularization in functional regression methods, followed by an overview of functional regression methods split into three types: [1] functional predictor regression (scalar-on-function), [2] functional response regression (function-on-scalar) and [3] function-on-function regression. For each, the role of replication and regularization is discussed and the methodological development described in a roughly chronological manner, at times deviating from the historical timeline to group together similar methods. The primary focus is on modeling and methodology, highlighting the modeling structures that have been developed and the various regularization approaches employed. The article closes with a brief discussion of potential areas of future development in this field.
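As a minimal illustration of the basis-function machinery for scalar-on-function regression, the sketch below expands the coefficient function beta(t) in a Fourier basis, reducing the functional model y_i = alpha + integral of X_i(t) beta(t) dt + error to ordinary least squares on basis scores. It assumes densely observed curves on a common grid; all function names are illustrative, and practical implementations add a roughness penalty (regularization) that is omitted here for brevity.

```python
import numpy as np

def fourier_basis(t, K):
    """Fourier basis on [0, 1]: constant plus K sine/cosine pairs."""
    cols = [np.ones_like(t)]
    for k in range(1, K + 1):
        cols.append(np.sin(2 * np.pi * k * t))
        cols.append(np.cos(2 * np.pi * k * t))
    return np.column_stack(cols)          # shape (m, 2K + 1)

def scalar_on_function_fit(X, y, t, K=3):
    """Fit y_i = alpha + int X_i(t) beta(t) dt by a basis expansion.

    X: (n, m) curves on grid t. Expanding beta in the basis turns the
    functional model into ordinary least squares on the basis scores.
    """
    B = fourier_basis(t, K)
    w = np.gradient(t)                    # simple quadrature weights
    Z = (X * w) @ B                       # approximate int X_i(t) B_k(t) dt
    Z = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    alpha, b = coef[0], coef[1:]
    beta_hat = B @ b                      # estimated coefficient function
    return alpha, beta_hat
```

The same replication-plus-regularization pattern underlies the function-on-scalar and function-on-function cases, with basis expansions applied on the response side as well.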

    Estimates and bootstrap calibration for functional regression with scalar response

    The author proposes new presmoothed FPCA estimators and bootstrap methods for functional linear regression with scalar response, as well as a thresholding procedure, which detects hidden patterns, for nonparametric functional regression with scalar response.

    On the theory and practice of variable selection for functional data

    Unpublished doctoral thesis, defended at the Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Matemáticas. Date of defence: 17-12-2015.
    Functional Data Analysis (FDA) might be seen as a partial aspect of the modern mainstream paradigm generally known as Big Data Analysis. The study of functional data requires new methodologies that take into account their special features (e.g. infinite dimension and a high level of redundancy). Hence, the use of variable selection methods appears as a particularly appealing choice in this context. Throughout this work, variable selection is considered in the setting of supervised binary classification with functional data {X(t); t in [0, 1]}. By variable selection we mean any dimension-reduction method which replaces the whole trajectory {X(t); t in [0, 1]} with a low-dimensional vector (X(t_1), ..., X(t_d)) while still keeping a similar classification error. In this thesis we have addressed "functional variable selection" in classification problems from both theoretical and empirical perspectives. We first restrict ourselves to the standard situation in which our functional data are generated from Gaussian processes, with distributions P0 and P1 in the two populations under study. The classical Hajek-Feldman dichotomy establishes that P0 and P1 are either mutually absolutely continuous with respect to each other (so there is a Radon-Nikodym (RN) density for each measure with respect to the other one) or mutually singular. Unlike the case of finite-dimensional Gaussian measures, there are non-trivial examples of mutually singular distributions when dealing with Gaussian stochastic processes. This work provides explicit expressions for the optimal (Bayes) rule in several relevant problems of supervised binary (functional) classification in the absolutely continuous case.
Our approach relies on some classical results in the theory of stochastic processes, where the so-called Reproducing Kernel Hilbert Spaces (RKHS) play a special role. This RKHS framework also allows us to give an interpretation, in terms of mutual singularity, of the "near perfect classification" phenomenon described by Delaigle and Hall (2012a). We show that the asymptotically optimal rule proposed by these authors can be identified with the sequence of optimal rules for an approximating sequence of classification problems in the absolutely continuous case. The methodological contributions of this thesis are centred on three variable selection methods. The obvious general criterion for variable selection is to choose the "most representative" or "most relevant" variables. However, it is also clear that a purely relevance-oriented criterion could lead to selecting many redundant variables. First, we provide a new model-based method for variable selection in binary classification problems, which arises in a very natural way from the explicit knowledge of the RN derivatives and the underlying RKHS structure. As a consequence, the optimal classifier in a wide class of functional classification problems can be expressed in terms of a classical, linear, finite-dimensional Fisher rule. Our second proposal for variable selection is based on the idea of selecting the local maxima (t_1, ..., t_d) of the function V^2_X(t) = V^2(X(t), Y), where V denotes the distance covariance association measure for random variables due to Székely et al. (2007). This method provides a simple, natural way to deal with the relevance vs. redundancy trade-off which typically appears in variable selection. This proposal is backed by a result of consistent estimation for the maxima of V^2_X. We also exhibit different models for the underlying process X(t) under which the relevant information is concentrated on the maxima of V^2_X.
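The distance-covariance proposal admits a compact sketch: score every grid point t_j by the sample squared distance covariance V^2(X(t_j), Y) of Székely et al. (2007) and keep the local maxima of the resulting curve. The implementation below is an illustrative version under that reading; it omits the smoothing and consistency machinery developed in the thesis, and the function names are hypothetical.

```python
import numpy as np

def dcov_sq(x, y):
    """Sample squared distance covariance V^2(x, y) (Szekely et al., 2007)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    a = np.abs(x - x.T)                              # pairwise distances
    b = np.abs(y - y.T)
    A = a - a.mean(0) - a.mean(1)[:, None] + a.mean()  # double centering
    B = b - b.mean(0) - b.mean(1)[:, None] + b.mean()
    return (A * B).mean()

def local_maxima_selection(X, y):
    """Score grid point t_j by V^2(X(., t_j), y); keep local maxima.

    X: (n, m) curves on a grid of m points. Returns the indices of the
    interior local maxima of the score curve, plus the scores themselves.
    """
    scores = np.array([dcov_sq(X[:, j], y) for j in range(X.shape[1])])
    idx = [j for j in range(1, len(scores) - 1)
           if scores[j] >= scores[j - 1] and scores[j] >= scores[j + 1]]
    return np.array(idx), scores
```

Because redundant neighbouring points share one peak of the score curve, keeping only local maxima handles the relevance vs. redundancy trade-off in a natural way.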
Our third proposal for variable selection is a new version of the minimum Redundancy Maximum Relevance (mRMR) procedure proposed by Ding and Peng (2005) and Peng et al. (2005). It is an algorithm that systematically performs variable selection, achieving a reasonable trade-off between relevance and redundancy. In its original form, this procedure uses the so-called mutual information criterion to assess relevance and redundancy. Keeping the focus on functional data problems, we propose here a modified version of the mRMR method, obtained by replacing mutual information with the new distance correlation measure in the general implementation of the method. The performance of the new proposals is assessed through an extensive empirical study, including about 400 simulated models (100 functional models x 4 sample sizes) and real data examples, aimed at comparing our variable selection methods with other standard procedures for dimension reduction. The comparison involves different classifiers. A real problem with biomedical data is also analysed in collaboration with researchers of Hospital Vall d'Hebron (Barcelona). The overall conclusions of the empirical experiments are quite positive in favour of the proposed methodologies.
The resources that made this research possible came from the Departamento de Matemáticas, the Instituto de Ingeniería del Conocimiento, and the FPI programme of MICIN.
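The modified mRMR procedure (relevance and redundancy both measured by distance correlation rather than mutual information) can be sketched as a greedy forward selection. The "relevance minus mean redundancy" score used below is one common mRMR variant and is an assumption here, not necessarily the thesis's exact criterion; function names are illustrative.

```python
import numpy as np

def dcor(x, y):
    """Sample distance correlation (Szekely et al., 2007)."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    def centered(d):
        return d - d.mean(0) - d.mean(1)[:, None] + d.mean()
    A = centered(np.abs(x - x.T))
    B = centered(np.abs(y - y.T))
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return np.sqrt(dcov2 / denom) if denom > 0 else 0.0

def mrmr_dcor(X, y, k):
    """Greedy mRMR with distance correlation.

    At each step, pick the unselected column maximising relevance
    (dcor with y) minus mean redundancy (dcor with already selected
    columns). Illustrative version of the modified mRMR idea.
    """
    p = X.shape[1]
    relevance = np.array([dcor(X[:, j], y) for j in range(p)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(p):
            if j in selected:
                continue
            redundancy = np.mean([dcor(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

Swapping distance correlation in for mutual information keeps the algorithm unchanged while avoiding the density estimation that mutual information requires for continuous variables.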