46 research outputs found

    Advances in functional regression and classification models

    Functional data analysis (FDA) has become a very active field of research in the last few years because functional data arise naturally in most scientific fields: energy (electricity price curves), environment (curves of pollutant levels), chemometrics (spectrometric data), etc. This thesis is a compendium of the following publications: 1) "Statistical computing in functional data analysis: the R package fda.usc", published in J STAT SOFTW, whose core contribution was to propose a common framework for FDA in R. 2) "Predicting seasonal influenza transmission using functional regression models with temporal dependence", published in PLoS ONE, which proposes an extension of the GLS model to the functional case. 3) "The DD^G-classifier in the functional setting", published in TEST, which extends the DD-classifier using information derived from functional depths. 4) "Determining optimum wavelengths for leaf water content estimation from reflectance: A distance correlation approach", published in CHEMOMETR INTELL LAB SYST, which studies the utility of distance correlation as a method to select impact points in functional regression. 5) "Variable selection in Functional Additive Regression Models", published in Comput Stat, which proposes a variable selection algorithm for the case of mixed predictors (scalar, functional, etc.).

    The DD^G-classifier in the functional setting

    The Maximum Depth classifier was the first attempt to use data depths instead of multivariate raw data to construct a classification rule. Recently, the DD-classifier has solved several serious limitations of the Maximum Depth classifier, but some issues still remain. This paper is devoted to extending the DD-classifier in the following ways: first, to overcome the limitation of the DD-classifier when more than two groups are involved. Second, to apply regular classification methods (such as kNN, linear or quadratic classifiers, recursive partitioning, ...) to DD-plots to obtain useful insights through the diagnostics of these methods. And third, to integrate different sources of information (data depths or multivariate functional data) in a unified way in the classification procedure. Besides, as the DD-classifier trick is especially useful in the functional framework, the paper also presents an enhanced revision of several functional data depths. A simulation study and applications to some classical real datasets are also provided, showing the power of the new proposal. Comment: 29 pages, 6 figures, 6 tables, Supplemental R Code and Dat
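
    The idea of classifying on depth coordinates rather than on the raw data can be illustrated with a minimal Python sketch. It is not the paper's implementation: the Mahalanobis depth, the plain kNN vote and the three simulated Gaussian groups are all assumptions chosen for brevity, whereas the paper works with functional depths. Each point is mapped to its vector of depths with respect to every group (the DD-plot coordinates), and an ordinary classifier is then trained in that space, which extends naturally to more than two groups.

```python
import numpy as np

def mahalanobis_depth(points, reference):
    """Mahalanobis depth of each row of `points` with respect to the sample `reference`."""
    mean = reference.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(reference, rowvar=False))
    diff = points - mean
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return 1.0 / (1.0 + d2)

def depth_coordinates(points, groups):
    """Map each point to its depths with respect to every group (DD-plot coordinates)."""
    return np.column_stack([mahalanobis_depth(points, g) for g in groups])

def knn_predict(train_x, train_y, test_x, k=5):
    """Plain k-nearest-neighbour majority vote with Euclidean distance."""
    dists = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_y[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])

def make_groups(rng, n):
    # Hypothetical toy data: three Gaussian groups in the plane.
    centres = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
    return [rng.normal(loc=c, scale=1.0, size=(n, 2)) for c in centres]

rng = np.random.default_rng(0)
train_groups = make_groups(rng, 100)
test_groups = make_groups(rng, 50)

train_x = np.vstack(train_groups)
train_y = np.repeat(np.arange(3), 100)
test_x = np.vstack(test_groups)
test_y = np.repeat(np.arange(3), 50)

# Depths are always computed with respect to the training groups.
dd_train = depth_coordinates(train_x, train_groups)
dd_test = depth_coordinates(test_x, train_groups)

predicted = knn_predict(dd_train, train_y, dd_test, k=5)
accuracy = float((predicted == test_y).mean())
print(f"kNN accuracy on depth coordinates: {accuracy:.2f}")
```

    Swapping `mahalanobis_depth` for another depth function, or stacking depths from several sources as extra columns, leaves the rest of the sketch unchanged.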

    Statistical Computing in Functional Data Analysis: The R Package fda.usc

    This paper is devoted to the R package fda.usc, which includes some utilities for functional data analysis. The package carries out exploratory and descriptive analysis of functional data, analyzing their most important features, such as depth measurements or functional outlier detection, among others. The R package fda.usc also includes functions to compute functional regression models with a scalar response and functional explanatory data via non-parametric functional regression, basis representation or functional principal components analysis. There are natural extensions, such as functional linear models and semi-functional partial linear models, which allow non-functional covariates and factors and make predictions. The functions of this package complement and incorporate the two main references of functional data analysis: the R package fda and the functions implemented by Ferraty and Vieu (2006).

    Autoproblem: a generator engine for basic statistics problems

    The objectives of the project are: - To create a set of applications portable to any computer, so that they reach as many users as possible regardless of their location, whether at home or at the faculty. The aim is an application that can be used as material for practical classes, as exam material, or simply so that students can study the subject. - To create a set of routines for simulating random data, so that students running the same problem at the same time get different data and may therefore reach different conclusions. - To create a problem-editing program that is simple and easy for the teacher to use. - To create an application for the sequential, guided resolution of the problems set by the teacher. This is one of the most important objectives, since the student is expected to solve the problem posed by the application by putting into practice everything learned in class and, moreover, to solve it sequentially. - To create a system for monitoring the students who run the problems on the web page. Achieving this objective is important for holding practical sessions or exams with the student application, since it lets the teacher follow the progress of their students throughout the term in which the subject is taught.

    Variable selection in Functional Additive Regression Models

    This is a post-peer-review, pre-copyedit version of a chapter published in Functional Statistics and Related Fields. The final authenticated version is available online at: https://doi.org/10.1007/978-3-319-55846-2_15. This paper considers the problem of variable selection when some of the variables have a functional nature and can be mixed with other types of variables (scalar, multivariate, directional, etc.). Our proposal begins with a simple null model and sequentially selects a new variable to be incorporated into the model. For the sake of simplicity, this paper only uses additive models. However, the proposed algorithm may assess the type of contribution (linear, nonlinear, …) of each variable. The algorithm has shown quite promising results when applied to real data sets. The authors acknowledge financial support from Ministerio de Economía y Competitividad grant MTM2013-41383-
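
    The sequential scheme described above (start from a null model, add one variable at a time) can be sketched generically in Python. This is only an illustration under loud assumptions: it uses scalar covariates, ordinary least squares and a relative drop in the residual sum of squares (`min_gain`, a hypothetical tuning parameter) as the stopping rule, none of which is taken from the chapter, whose selection criterion and handling of functional covariates are not reproduced here.

```python
import numpy as np

def residual_sum_of_squares(design, response):
    """RSS of the least-squares fit of `response` on the columns of `design`."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    resid = response - design @ coef
    return float(resid @ resid)

def forward_selection(covariates, response, min_gain=0.1):
    """Greedy forward selection starting from the intercept-only (null) model.

    At each step the covariate giving the largest RSS reduction is added;
    the search stops when the relative reduction falls below `min_gain`.
    """
    n, p = covariates.shape
    selected = []
    design = np.ones((n, 1))  # null model: intercept only
    current_rss = residual_sum_of_squares(design, response)
    while len(selected) < p:
        best_j, best_rss = None, current_rss
        for j in range(p):
            if j in selected:
                continue
            trial = np.column_stack([design, covariates[:, j]])
            rss = residual_sum_of_squares(trial, response)
            if rss < best_rss:
                best_j, best_rss = j, rss
        if best_j is None or (current_rss - best_rss) / current_rss < min_gain:
            break
        selected.append(best_j)
        design = np.column_stack([design, covariates[:, best_j]])
        current_rss = best_rss
    return selected

# Hypothetical toy data: only covariates 1 and 4 drive the response.
rng = np.random.default_rng(0)
covariates = rng.normal(size=(200, 6))
response = 3.0 * covariates[:, 1] - 2.0 * covariates[:, 4] + 0.5 * rng.normal(size=200)

selected = forward_selection(covariates, response)
print("selected covariates:", selected)
```

    In a mixed-predictor setting, the least-squares fit inside the loop would be replaced by whatever additive fit and relevance measure suit each variable type; the outer greedy loop stays the same.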

    Functional Location-Scale Model to Forecast Bivariate Pollution Episodes

    Predicting anomalous emissions of pollutants into the atmosphere well in advance is crucial for industries emitting such elements, since it allows them to take corrective measures aimed at avoiding such emissions and their consequences. In this work, we propose a functional location-scale model to predict pollution episodes in advance when two pollutants are involved. Functional generalized additive models (FGAMs) are used to estimate the means and variances of the model, as well as the correlation between both pollutants. The method not only forecasts the concentrations of both pollutants, it also estimates an uncertainty region where the concentrations of both pollutants should be located, given a specific level of uncertainty. The performance of the model was evaluated using real data of SO2 and NOx emissions from a coal-fired power station, obtaining good results. The authors acknowledge financial support from: (1) UO-Proyecto Uni-Ovi (PAPI-18-GR-2014-0014), (2) Project MTM2016-76969-P from Ministerio de Economía y Competitividad - Agencia Estatal de Investigación and European Regional Development Fund (ERDF) and IAP network StUDyS from Belgian Science Policy, (3) Nuevos avances metodológicos y computacionales en estadística no-paramétrica y semiparamétrica - Ministerio de Ciencia e Investigación (MTM2017-89422-P).

    Functional Regression Models with Functional Response: New Approaches and a Comparative Study

    This paper proposes three new approaches for additive functional regression models with functional responses. The first one is a reformulation of the linear regression model, and the last two address the still scarce case of additive nonlinear functional regression models. Both nonlinear proposals are based on extensions of similar models for scalar responses. One of our nonlinear models is based on constructing a Spectral Additive Model (the word "Spectral" refers to the representation of the covariates in an $\mathcal{L}_2$ basis), which is restricted (by construction) to Hilbertian spaces. The other one extends the kernel estimator, and it can be applied to general metric spaces since it is only based on distances. We include our new approaches as well as real datasets in an R package. The performance of the new proposals is compared with that of previous ones, which we review theoretically and practically in this paper. The simulation results show the advantages of the nonlinear proposals and the small loss of efficiency when the simulation scenario is truly linear. Finally, the supplementary material provides a visualization tool for checking the linearity of the relationship between a single covariate and the response. Comment: Submitte
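
    A distance-based kernel estimator with a functional response can be sketched as follows. This is a generic Nadaraya-Watson estimator written for illustration, not the code of the paper's R package: the Gaussian kernel, the fixed bandwidth, the discretised L2 distance and the simulated curves are all assumptions. The predicted response curve is a weighted average of the training response curves, with weights that depend only on the distance between covariate curves, which is why the same idea carries over to any metric space.

```python
import numpy as np

def l2_distances(curves_a, curves_b, grid_step):
    """Pairwise L2 distances between discretised curves (Riemann-sum approximation)."""
    diff = curves_a[:, None, :] - curves_b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2) * grid_step)

def kernel_regression_functional_response(train_x, train_y, new_x, bandwidth, grid_step):
    """Nadaraya-Watson estimator: each prediction is a weighted mean of training response curves."""
    dist = l2_distances(new_x, train_x, grid_step)
    weights = np.exp(-0.5 * (dist / bandwidth) ** 2)  # Gaussian kernel (an assumption)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ train_y

rng = np.random.default_rng(2)
grid = np.linspace(0.0, 1.0, 50)
grid_step = grid[1] - grid[0]

def simulate(n):
    # Hypothetical nonlinear link: the response amplitude is the squared covariate amplitude.
    amplitude = rng.uniform(0.0, 2.0, size=n)
    covariate = amplitude[:, None] * np.sin(2 * np.pi * grid)[None, :]
    signal = (amplitude ** 2)[:, None] * np.cos(2 * np.pi * grid)[None, :]
    return covariate, signal

train_x, train_signal = simulate(150)
train_y = train_signal + 0.1 * rng.normal(size=train_signal.shape)
test_x, test_signal = simulate(40)

predicted = kernel_regression_functional_response(
    train_x, train_y, test_x, bandwidth=0.1, grid_step=grid_step
)
kernel_mse = float(((predicted - test_signal) ** 2).mean())
baseline_mse = float(((train_y.mean(axis=0) - test_signal) ** 2).mean())
print(f"kernel MSE: {kernel_mse:.4f}, mean-curve baseline MSE: {baseline_mse:.4f}")
```

    In practice the bandwidth would be chosen by cross-validation rather than fixed by hand as it is here.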

    A distance correlation approach for optimum multiscale selection in 3D point cloud classification

    Supervised classification of 3D point clouds using machine learning algorithms and handcrafted local features as covariates frequently depends on the size of the neighborhood (scale) around each point used to determine those features. It is therefore crucial to estimate the scale or scales providing the best classification results. In this work, we propose three methods to estimate these scales, all of them based on calculating the maximum values of the distance correlation (DC) functions between the features and the label assigned to each point. The performance of the methods was tested using simulated data, and the method presenting the best results was applied to a benchmark data set for point cloud classification. This method consists of detecting the local maxima of DC functions previously smoothed to avoid choosing scales that are very close to each other. Five different classifiers were used: linear discriminant analysis, support vector machines, random forest, multinomial logistic regression and multilayer perceptron neural network. The results obtained were compared with those from other strategies available in the literature and were favorable to our approach. Xunta de Galicia; ED431G 2019/01. Ministerio de Ciencia, Innovación y Universidades; MTM2016-76969-P. Xunta de Galicia; ED431C-2020-14. MINECO/AEI/FEDER, UE; MTM2017-89422-

    Real-time predictive seasonal influenza model in Catalonia, Spain

    Influenza surveillance is critical to monitoring the situation during epidemic seasons, and predictive mathematical models may aid the early detection of epidemic patterns. The objective of this study was to design a real-time spatial predictive model of the ILI (influenza-like illness) incidence rate in Catalonia using one- and two-week forecasts. The available data sources used to select explanatory variables for the model were the statutory disease reporting system and the sentinel surveillance system in Catalonia for influenza incidence rates, the official climate service in Catalonia for meteorological data, laboratory data and Google Flu Trends. Time series for every explanatory variable were created with data from the last 4 seasons (from 2010-2011 to 2013-2014). A pilot test was conducted during the 2014-2015 season to select the explanatory variables to be included in the model and the type of model to be applied. During the 2015-2016 season a real-time model was applied weekly, obtaining the intensity level and predicted incidence rates with 95% confidence levels one and two weeks ahead for each health region. At the end of the season, the confidence interval success rate (CISR) and intensity level success rate (ILSR) were analysed. For the 2015-2016 season, a CISR of 85.3% at one week and 87.1% at two weeks and an ILSR of 82.9% and 82%, respectively, were observed. The model described is a useful tool, although it is hard to evaluate due to uncertainty. The accuracy of prediction at one and two weeks was above 80% globally, but was lower during the peak epidemic period. In order to improve the predictive power, new explanatory variables should be included.

    Determining optimum wavelengths for leaf water content estimation from reflectance: A distance correlation approach

    This paper proposes a method to estimate leaf water content from reflectance in four commercial vineyard varieties by estimating the local maxima of a distance correlation function. First, it applies four different functional regression models to the data and compares the models to test the viability of estimating water content from reflectance. It then applies our methodology to select a small number of wavelengths (optimum wavelengths) from the continuous spectrum, which simplifies the regression problem. Finally, it compares the results to those obtained by means of two different methods: a nonparametric kernel smoothing for variable selection in functional data and a wavelet-based weighted LASSO functional linear regression. Our approach proved to have some advantages over these two approaches, mainly in terms of computing time and the lack of any assumption of an underlying model. The paper concludes that estimating water content from a few wavelengths is almost equivalent to doing so using larger wavelength intervals. This study was made possible with financial funding from: a) FC-15-GRUPIN14-033 of the Fundación para el Fomento en Asturias de la Investigación Científica Aplicada y la Tecnología (FICYT) (Spain), with FEDER support included, b) Ministry of Economy and Competitiveness (MTM2016-76969P) and European Regional Development Fund, c) Spanish Ministry of Economy and Competitiveness (Grant numbers MTM2013-41383-P and MTM2016-76969-P) and European Regional Development Fund (ERDF), d) Grupo de Referencia Competitiva, 2016–2019 (ED431C 2016/040), funded by the Consellería de Cultura, Educación e Ordenación Universitaria, Xunta de Galicia.
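
    Selecting wavelengths at the local maxima of a distance correlation function can be sketched in Python as follows. The sketch is illustrative only: the simulated spectra, the index of the informative wavelength and the unsmoothed peak search are assumptions, not the paper's data or code. The sample distance correlation is computed from double-centred pairwise distance matrices, evaluated wavelength by wavelength against the response, and the interior local maxima of the resulting curve are kept as candidate wavelengths.

```python
import numpy as np

def distance_correlation(x, y):
    """Sample distance correlation between two samples of paired observations."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=2)
    # Double centring of the pairwise distance matrices.
    a_c = a - a.mean(axis=0, keepdims=True) - a.mean(axis=1, keepdims=True) + a.mean()
    b_c = b - b.mean(axis=0, keepdims=True) - b.mean(axis=1, keepdims=True) + b.mean()
    dcov2 = (a_c * b_c).mean()
    dvar2_x = (a_c ** 2).mean()
    dvar2_y = (b_c ** 2).mean()
    denom = np.sqrt(dvar2_x * dvar2_y)
    if denom == 0.0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))

def local_maxima(values):
    """Interior indices whose value is strictly larger than both neighbours."""
    values = np.asarray(values)
    return [j for j in range(1, len(values) - 1)
            if values[j] > values[j - 1] and values[j] > values[j + 1]]

# Hypothetical toy spectra: smooth random curves built from Gaussian bumps.
rng = np.random.default_rng(1)
n_leaves, n_wavelengths, true_index = 80, 60, 20
positions = np.arange(n_wavelengths)
centres = np.linspace(0, n_wavelengths - 1, 15)
basis = np.exp(-0.5 * ((positions[None, :] - centres[:, None]) / 3.0) ** 2)
spectra = rng.normal(size=(n_leaves, 15)) @ basis

# The simulated response depends on the reflectance at a single wavelength.
water = spectra[:, true_index] + 0.1 * rng.normal(size=n_leaves)

dc_curve = np.array([distance_correlation(spectra[:, j], water)
                     for j in range(n_wavelengths)])
peaks = local_maxima(dc_curve)
best = max(peaks, key=lambda j: dc_curve[j])
print("candidate wavelengths (indices):", peaks)
print("strongest peak:", best)
```

    On noisy curves the raw peak list usually contains spurious small maxima, so smoothing the distance correlation curve or thresholding the peak heights before selection would be a natural refinement.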