49 research outputs found

    Advances in functional regression and classification models

    Functional data analysis (FDA) has become a very active field of research in the last few years because functional data arise naturally in most scientific fields: energy (electricity price curves), environment (curves of pollutant levels), chemometrics (spectrometric data), etc. This thesis is a compendium of the following publications: 1) "Statistical computing in functional data analysis: the R package fda.usc", published in J STAT SOFTW, whose core advance was to propose a common framework for FDA in R. 2) "Predicting seasonal influenza transmission using functional regression models with temporal dependence", published in PLoS ONE, which proposes an extension of the GLS model to the functional case. 3) "The DD^G-classifier in the functional setting", published in TEST, which extends the DD-classifier using information derived from functional depths. 4) "Determining optimum wavelengths for leaf water content estimation from reflectance: A distance correlation approach", published in CHEMOMETR INTELL LAB SYST, which studies the utility of distance correlation as a method to select impact points in functional regression. 5) "Variable selection in Functional Additive Regression Models", published in Comput Stat, which proposes a variable selection algorithm for the case of mixed predictors (scalar, functional, etc.).

    Statistical Computing in Functional Data Analysis: The R Package fda.usc

    This paper is devoted to the R package fda.usc, which includes some utilities for functional data analysis. The package carries out exploratory and descriptive analysis of functional data, covering important features such as depth measures and functional outlier detection, among others. It also includes functions to fit functional regression models with a scalar response and a functional explanatory variable via non-parametric functional regression, basis representation or functional principal components analysis. There are natural extensions such as functional linear models and semi-functional partial linear models, which allow non-functional covariates and factors and make predictions. The functions of this package complement and incorporate the two main references of functional data analysis: the R package fda and the functions implemented by Ferraty and Vieu (2006).

    The DD^G-classifier in the functional setting

    The Maximum Depth classifier was the first attempt to use data depths instead of multivariate raw data to construct a classification rule. Recently, the DD-classifier has solved several serious limitations of the Maximum Depth classifier, but some issues still remain. This paper is devoted to extending the DD-classifier in the following ways: first, to overcome the limitation of the DD-classifier when more than two groups are involved; second, to apply regular classification methods (such as kNN, linear or quadratic classifiers, recursive partitioning, ...) to DD-plots to obtain useful insights through the diagnostics of these methods; and third, to integrate different sources of information (data depths or multivariate functional data) in a unified way in the classification procedure. In addition, as the DD-classifier trick is especially useful in the functional framework, an enhanced revision of several functional data depths is given in the paper. A simulation study and applications to some classical real datasets are also provided, showing the power of the new proposal.

    Autoproblem: a problem-generating engine for basic statistics

    The objectives of the project are: - To create a set of applications that are portable to any computer, so that they reach as many users as possible wherever they are, whether at home or at the faculty. The aim is an application that can be used as material for practical classes, as exam material, or simply so that students can study the subject. - To create a set of routines for simulating random data, so that students who run the same problem at the same time get different data and may therefore reach different conclusions. - To create a problem-editing program that is simple and easy for the teacher to use. - To create an application for the sequential, guided solution of the problems set by the teacher. This is one of the most important objectives, since the student is expected to solve the problem posed by the application by putting into practice everything learned in class and, moreover, to solve it step by step. - To create a system for monitoring the students who run the problems on the web page. Achieving this objective is important for running practical sessions or exams with the application, since it lets the teacher follow the progress of their students throughout the four-month terms in which the course is taught.

    Variable selection in Functional Additive Regression Models

    This is a post-peer-review, pre-copyedit version of a chapter published in Functional Statistics and Related Fields. The final authenticated version is available online at: https://doi.org/10.1007/978-3-319-55846-2_15. This paper considers the problem of variable selection when some of the variables have a functional nature and can be mixed with other types of variables (scalar, multivariate, directional, etc.). Our proposal begins with a simple null model and sequentially selects a new variable to be incorporated into the model. For the sake of simplicity, this paper only uses additive models. However, the proposed algorithm may assess the type of contribution (linear, nonlinear, ...) of each variable. The algorithm has shown quite promising results when applied to real data sets. The authors acknowledge financial support from Ministerio de Economía y Competitividad grant MTM2013-41383-

    Functional Location-Scale Model to Forecast Bivariate Pollution Episodes

    Predicting anomalous emissions of pollutants into the atmosphere well in advance is crucial for industries emitting such elements, since it allows them to take corrective measures aimed at avoiding such emissions and their consequences. In this work, we propose a functional location-scale model to predict in advance pollution episodes where two pollutants are involved. Functional generalized additive models (FGAMs) are used to estimate the means and variances of the model, as well as the correlation between both pollutants. The method not only forecasts the concentrations of both pollutants, it also estimates an uncertainty region where the concentrations of both pollutants should be located, given a specific level of uncertainty. The performance of the model was evaluated using real data of SO2 and NOx emissions from a coal-fired power station, obtaining good results. The authors acknowledge financial support from: (1) UO-Proyecto Uni-Ovi (PAPI-18-GR-2014-0014), (2) Project MTM2016-76969-P from Ministerio de Economía y Competitividad - Agencia Estatal de Investigación and European Regional Development Fund (ERDF) and IAP network StUDyS from Belgian Science Policy, (3) Nuevos avances metodológicos y computacionales en estadística no-paramétrica y semiparamétrica - Ministerio de Ciencia e Investigación (MTM2017-89422-P).

    Functional Regression Models with Functional Response: New Approaches and a Comparative Study

    This paper proposes three new approaches for additive functional regression models with functional responses. The first one is a reformulation of the linear regression model, and the last two address the still scarce case of additive nonlinear functional regression models. Both nonlinear proposals are based on extensions of similar models for scalar responses. One of our nonlinear models is based on constructing a Spectral Additive Model (the word "Spectral" refers to the representation of the covariates in an $\mathcal{L}_2$ basis), which is restricted (by construction) to Hilbertian spaces. The other one extends the kernel estimator, and it can be applied to general metric spaces since it is only based on distances. We include our new approaches as well as real datasets in an R package. The performances of the new proposals are compared with previous ones, which we review theoretically and practically in this paper. The simulation results show the advantages of the nonlinear proposals and the small loss of efficiency when the simulation scenario is truly linear. Finally, the supplementary material provides a visualization tool for checking the linearity of the relationship between a single covariate and the response.

    Prevalence and associated factors of sexual, psychological, and physical violence among physical therapists in their clinical role in Spain: a national web-based cross-sectional survey

    Observational study. Objectives: To determine the extent of career-long and 12-month exposure to sexual, physical, and psychological/verbal violence committed by patients or their companions among physical therapists in Spain. Additionally, to identify the factors associated with such exposure. Methods: This study employed an observational cross-sectional approach. Initially, a questionnaire was developed and validated using a convenience sample. Subsequently, it was distributed via email to all physical therapists registered in Spain in the first quarter of 2022. Individual risk models were created for each type of violence experienced within the past 12 months. Results: The prevalence of violence encountered by physical therapists throughout their careers was 47.9% for sexual violence, 42.7% for psychological/verbal abuse, and 17.6% for physical abuse. Lower values were observed within the last 12 months (13.4%, 15.8%, and 5.2%, respectively). Statistical risk modeling for each type of violence experienced in the past 12 months indicated that the common precipitating factor for all forms of violence was working with patients with cognitive impairment. Working part-time appeared to be a protective factor. Other factors, such as the practitioners' gender, practice setting, or clinic location, showed variations among the diverse types of violence. Conclusions: The exposure to type II workplace violence within the last 12 months among physical therapists in Spain (Europe) is not as high as in some other world regions. Various individual, clinical, and professional/organizational risk factors have been identified in connection with type II workplace violence. Further research is warranted to compare the violence experienced once the COVID pandemic has subsided. This research/work is part of the grant PID2020-113578RB-I00, funded by MCIN/AEI/10.13039/501100011033/. It has been supported by the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2020/14) and by CITIC, which is supported by Xunta de Galicia through the collaboration agreement between the Consellería de Cultura, Educación, Formación Profesional e Universidades and the Galician universities for strengthening the research centres of the Sistema Universitario de Galicia (CIGUS). Finally, it has also been supported by the GCCPS and by Universidade da Coruña/CISUG, the latter funding the open access charge.

    A distance correlation approach for optimum multiscale selection in 3D point cloud classification

    Supervised classification of 3D point clouds using machine learning algorithms and handcrafted local features as covariates frequently depends on the size of the neighborhood (scale) around each point used to determine those features. It is therefore crucial to estimate the scale or scales providing the best classification results. In this work, we propose three methods to estimate these scales, all of them based on calculating the maximum values of the distance correlation (DC) functions between the features and the label assigned to each point. The performance of the methods was tested using simulated data, and the method presenting the best results was applied to a benchmark data set for point cloud classification. This method consists of detecting the local maxima of DC functions that have previously been smoothed to avoid choosing scales that are very close to each other. Five different classifiers were used: linear discriminant analysis, support vector machines, random forest, multinomial logistic regression and multilayer perceptron neural network. The results obtained were compared with those from other strategies available in the literature, and the comparison favored our approach. Funding: Xunta de Galicia, ED431G 2019/01; Ministerio de Ciencia, Innovación y Universidades, MTM2016-76969-P; Xunta de Galicia, ED431C-2020-14; MINECO/AEI/FEDER, UE, MTM2017-89422-
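The scale-selection idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard sample (V-statistic) distance correlation, numerically coded labels, and a plain neighbor comparison to find local maxima. The smoothing step mentioned in the abstract is left out, and the function names `distance_correlation`, `dc_curve` and `local_maxima` are hypothetical.

```python
import numpy as np


def _centered_distances(v):
    """Pairwise Euclidean distance matrix of v, double-centered."""
    v = np.asarray(v, dtype=float)
    if v.ndim == 1:
        v = v[:, None]
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=2)
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation between x and y (same number of rows)."""
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0.0:
        return 0.0  # one of the samples is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def dc_curve(feature_by_scale, labels):
    """DC between the labels and one feature computed at each scale.

    feature_by_scale: array of shape (n_points, n_scales), assumed layout.
    """
    feature_by_scale = np.asarray(feature_by_scale, dtype=float)
    return np.array([
        distance_correlation(feature_by_scale[:, j], labels)
        for j in range(feature_by_scale.shape[1])
    ])


def local_maxima(values):
    """Indices of interior points that exceed their left neighbor and are
    not exceeded by their right neighbor."""
    v = np.asarray(values, dtype=float)
    return [i for i in range(1, len(v) - 1) if v[i] > v[i - 1] and v[i] >= v[i + 1]]
```

Under these assumptions, `local_maxima(dc_curve(features, labels))` returns the indices of candidate scales. In practice the curve would first be smoothed, as the abstract describes, so that neighboring scales are not both selected.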

    Real-time predictive seasonal influenza model in Catalonia, Spain

    Influenza surveillance is critical to monitoring the situation during epidemic seasons, and predictive mathematical models may aid the early detection of epidemic patterns. The objective of this study was to design a real-time spatial predictive model of the ILI (Influenza Like Illness) incidence rate in Catalonia using one- and two-week forecasts. The available data sources used to select explanatory variables to include in the model were the statutory reporting disease system and the sentinel surveillance system in Catalonia for influenza incidence rates, the official climate service in Catalonia for meteorological data, laboratory data and Google Flu Trends. Time series for every explanatory variable were created with data from the last 4 seasons (from 2010-2011 to 2013-2014). A pilot test was conducted during the 2014-2015 season to select the explanatory variables to be included in the model and the type of model to be applied. During the 2015-2016 season a real-time model was applied weekly, obtaining the intensity level and predicted incidence rates with 95% confidence levels one and two weeks away for each health region. At the end of the season, the confidence interval success rate (CISR) and intensity level success rate (ILSR) were analysed. For the 2015-2016 season, a CISR of 85.3% at one week and 87.1% at two weeks and an ILSR of 82.9% and 82%, respectively, were observed. The model described is a useful tool, although it is hard to evaluate due to uncertainty. The accuracy of prediction at one and two weeks was above 80% globally, but was lower during the peak epidemic period. In order to improve the predictive power, new explanatory variables should be included.
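The abstract above reports a confidence interval success rate (CISR) without defining it. As a minimal sketch, assuming the CISR is the share of forecasts whose interval contained the observed incidence rate, it could be computed as follows. The function name `interval_success_rate` and the example figures are hypothetical and are not taken from the study.

```python
def interval_success_rate(observed, lower, upper):
    """Fraction of forecasts whose [lower, upper] interval contains the
    observed value (assumed definition of the CISR)."""
    if not (len(observed) == len(lower) == len(upper)):
        raise ValueError("observed, lower and upper must have the same length")
    if len(observed) == 0:
        raise ValueError("at least one forecast is required")
    hits = sum(lo <= obs <= hi for obs, lo, hi in zip(observed, lower, upper))
    return hits / len(observed)


# Made-up weekly incidence rates and forecast intervals, for illustration only.
rate = interval_success_rate(
    observed=[12.0, 30.5, 48.0],
    lower=[10.0, 32.0, 40.0],
    upper=[15.0, 40.0, 55.0],
)
```

Here two of the three made-up intervals contain the observed value, so `rate` is 2/3.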