49 research outputs found
Advances in functional regression and classification models
Functional data analysis (FDA) has become a very active field of research in recent years because functional data arise naturally in most scientific fields: energy (electricity price curves), environment (curves of pollutant levels), chemometrics (spectrometric data), etc. This thesis is a compendium of the following publications: 1) "Statistical computing in functional data analysis: the R package fda.usc", published in J STAT SOFTW, whose core advance was to propose a common framework for FDA in R. 2) "Predicting seasonal influenza transmission using functional regression models with temporal dependence", published in PLoS ONE, which proposes an extension of the GLS model to the functional case. 3) "The DD-classifier in the functional setting", published in TEST, which extends the DD-classifier using information derived from functional depths. 4) "Determining optimum wavelengths for leaf water content estimation from reflectance: A distance correlation approach", published in CHEMOMETR INTELL LAB SYST, which studies the utility of distance correlation as a method to select impact points in functional regression. 5) "Variable selection in Functional Additive Regression Models", published in Comput Stat, which proposes a variable selection algorithm for the case of mixed predictors (scalar, functional, etc.).
Statistical Computing in Functional Data Analysis: The R Package fda.usc
This paper is devoted to the R package fda.usc, which includes some utilities for functional data analysis. The package carries out exploratory and descriptive analysis of functional data, examining their most important features, such as depth measurements or functional outlier detection, among others. The R package fda.usc also includes functions to compute functional regression models with a scalar response and functional explanatory data via non-parametric functional regression, basis representation or functional principal components analysis. There are natural extensions such as functional linear models and semi-functional partial linear models, which allow non-functional covariates and factors and make predictions. The functions of this package complement and incorporate the two main references of functional data analysis: the R package fda and the functions implemented by Ferraty and Vieu (2006).
The DD-classifier in the functional setting
The Maximum Depth was the first attempt to use data depths instead of
multivariate raw data to construct a classification rule. Recently, the
DD-classifier has solved several serious limitations of the Maximum Depth
classifier but some issues still remain. This paper is devoted to extending the
DD-classifier in the following ways: first, to overcome the limitation of the
DD-classifier when more than two groups are involved; second, to apply regular
classification methods (like NN, linear or quadratic classifiers, recursive
partitioning, ...) to DD-plots to obtain useful insights through the diagnostics
of these methods; and third, to integrate different sources of information
(data depths or multivariate functional data) in a unified way in the
classification procedure. Besides, as the DD-classifier trick is especially
useful in the functional framework, an enhanced revision of several functional
data depths is done in the paper. A simulation study and applications to some
classical real datasets are also provided showing the power of the new
proposal.
Comment: 29 pages, 6 figures, 6 tables, Supplemental R Code and Dat
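To make the idea behind the maximum depth rule and the DD-plot concrete, here is a minimal sketch in Python. It is not the paper's method or the fda.usc implementation: it assumes univariate data and uses the univariate halfspace (Tukey) depth as a stand-in for the functional depths discussed in the abstract, and the function names and toy groups are hypothetical.

```python
def halfspace_depth(x, sample):
    """Univariate halfspace (Tukey) depth of x with respect to a sample."""
    n = len(sample)
    below = sum(1 for s in sample if s <= x)
    above = sum(1 for s in sample if s >= x)
    return min(below, above) / n


def dd_coordinates(x, groups):
    """Depth of x with respect to each group: one point of the DD-plot."""
    return [halfspace_depth(x, g) for g in groups]


def max_depth_classify(x, groups):
    """Assign x to the group in which it is deepest (ties go to the first)."""
    depths = dd_coordinates(x, groups)
    return max(range(len(groups)), key=lambda k: depths[k])


# Hypothetical toy data: two well-separated groups.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [6.0, 7.0, 8.0, 9.0, 10.0]

print(dd_coordinates(3.0, [group_a, group_b]))      # depth in each group
print(max_depth_classify(8.5, [group_a, group_b]))  # index of chosen group
```

With two groups, the maximum depth rule amounts to splitting the DD-plot along its diagonal; the DD-classifier described above instead fits a classification rule to the DD-plot coordinates, which this sketch does not attempt.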
Autoproblem: motor generador de problemes d'estadística bàsica (Autoproblem: a problem-generating engine for basic statistics)
The objectives of the project are:
- To create a set of applications portable to any computer, so that they reach as many users as possible regardless of their location, whether at home or at the faculty. The aim is an application that can be used as material for practical classes, as exam material, or simply so that students can study the subject.
- To create a set of routines for simulating random data. In this way, students who run the same problem at the same time get different data and may therefore reach different conclusions.
- To create a problem-editing program that is simple and easy for the teacher to use.
- To create an application for the sequential, guided solving of the problems set by the teacher. This is one of the most important objectives, since the student is meant to solve the problem posed by the application by putting into practice everything learned in class and, moreover, to solve it in a sequential manner.
- To create a system for monitoring the students who run the problems on the web page. Achieving this objective is important for holding practical sessions or exams with the application. In this way, the teacher can follow the progress of their students over the term in which the course is taught.
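The random-data objective in the list above can be sketched as follows. This is an illustration only, not the project's code: the function name, its parameters and the seeding scheme are assumptions, and reproducibility per student is an added design choice that the project description does not state.

```python
import hashlib
import random


def student_dataset(student_id, problem_id, n=10, mean=50.0, sd=10.0):
    """Return a random sample that differs between students."""
    # Derive the seed from the (student, problem) pair: the same student
    # always gets the same data, while other students get other data.
    key = f"{student_id}:{problem_id}".encode("utf-8")
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    rng = random.Random(seed)
    return [round(rng.gauss(mean, sd), 2) for _ in range(n)]


# Two hypothetical students opening the same problem at the same time.
print(student_dataset("student-001", "problem-A"))
print(student_dataset("student-002", "problem-A"))
```

Seeding from the identifiers rather than from the clock keeps each student's data stable across sessions, which would also let a teacher regenerate any student's dataset when marking.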
Variable selection in Functional Additive Regression Models
This is a post-peer-review, pre-copyedit version of a chapter published in Functional Statistics and Related Fields. The final authenticated version is available online at: https://doi.org/10.1007/978-3-319-55846-2_15
This paper considers the problem of variable selection when some of the variables have a functional nature and can be mixed with other types of variables (scalar, multivariate, directional, etc.). Our proposal begins with a simple null model and sequentially selects a new variable to be incorporated into the model. For the sake of simplicity, this paper only uses additive models. However, the proposed algorithm may assess the type of contribution (linear, nonlinear, ...) of each variable. The algorithm has shown quite promising results when applied to real data sets.
The authors acknowledge financial support from Ministerio de Economía y Competitividad grant MTM2013-41383-
Functional Location-Scale Model to Forecast Bivariate Pollution Episodes
Predicting anomalous emissions of pollutants into the atmosphere well in advance is crucial for industries emitting such elements, since it allows them to take corrective measures aimed at avoiding such emissions and their consequences. In this work, we propose a functional location-scale model to predict in advance pollution episodes in which two pollutants are involved. Functional generalized additive models (FGAMs) are used to estimate the means and variances of the model, as well as the correlation between both pollutants. The method not only forecasts the concentrations of both pollutants, it also estimates an uncertainty region where the concentrations of both pollutants should be located, given a specific level of uncertainty. The performance of the model was evaluated using real data of SO2 and NOx emissions from a coal-fired power station, obtaining good results.
The authors acknowledge financial support from: (1) UO-Proyecto Uni-Ovi (PAPI-18-GR-2014-0014), (2) Project MTM2016-76969-P from Ministerio de Economía y Competitividad, Agencia Estatal de Investigación and European Regional Development Fund (ERDF) and IAP network StUDyS from Belgian Science Policy, (3) Nuevos avances metodológicos y computacionales en estadística no-paramétrica y semiparamétrica, Ministerio de Ciencia e Investigación (MTM2017-89422-P).
Functional Regression Models with Functional Response: New Approaches and a Comparative Study
This paper proposes three new approaches for additive functional regression
models with functional responses. The first one is a reformulation of the
linear regression model, and the last two address the still scarcely studied case of
additive nonlinear functional regression models. Both nonlinear proposals are based on
extensions of similar models for scalar responses. One of our nonlinear models
is based on constructing a Spectral Additive Model (the word "Spectral" refers
to the representation of the covariates in an $\mathcal{L}_2$ basis), which is
restricted (by construction) to Hilbertian spaces. The other one extends the
kernel estimator, and it can be applied to general metric spaces since it is
only based on distances. We include our new approaches as well as real datasets
in an R package. The performances of the new proposals are compared with
previous ones, which we review theoretically and practically in this paper. The
simulation results show the advantages of the nonlinear proposals and the small
loss of efficiency when the simulation scenario is truly linear. Finally, the
supplementary material provides a visualization tool for checking the linearity
of the relationship between a single covariate and the response.
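The distance-based kernel estimator mentioned above can be illustrated with a short sketch. It is a generic Nadaraya-Watson estimator with a functional response, not the authors' implementation: it assumes curves are stored as lists of values on a common grid, uses a Gaussian kernel, and approximates the L2 distance by the Euclidean distance between discretized values. All names are hypothetical.

```python
import math


def l2_distance(f, g):
    """Euclidean distance between two curves sampled on a common grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)))


def kernel_predict(x_new, X, Y, h, dist=l2_distance):
    """Nadaraya-Watson estimate of a response curve using distances only."""
    # Gaussian kernel weights depend on x_new only through dist(x_new, x).
    weights = [math.exp(-0.5 * (dist(x_new, x) / h) ** 2) for x in X]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("bandwidth h too small: all kernel weights are zero")
    m = len(Y[0])
    # Pointwise weighted average of the observed response curves.
    return [sum(w * y[j] for w, y in zip(weights, Y)) / total for j in range(m)]


# Hypothetical toy data: two covariate curves and their response curves.
X = [[0.0, 0.0], [1.0, 1.0]]
Y = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
print(kernel_predict([0.5, 0.5], X, Y, h=1.0))
```

Because the estimator touches the covariates only through `dist`, any other metric can be passed in its place, which is the sense in which such an estimator applies to general metric spaces.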
Prevalence and associated factors of sexual, psychological, and physical violence among physical therapists in their clinical role in Spain: a national web-based cross-sectional survey
Observational study
[Abstract]
Objectives: To determine the extent of career-long and 12-month exposure to sexual, physical, and psychological/verbal violence committed by patients or their companions among physical therapists in Spain. Additionally, to identify the factors associated with such exposure.
Methods: This study employed an observational cross-sectional approach. Initially, a questionnaire was developed and validated using a convenience sample. Subsequently, it was distributed via email to all physical therapists registered in Spain in the first quarter of 2022. Individual risk models were created for each type of violence experienced within the past 12 months.
Results: The prevalence of violence encountered by physical therapists throughout their careers was 47.9% for sexual violence, 42.7% for psychological/verbal abuse, and 17.6% for physical abuse. Lower values were observed within the last 12 months (13.4%, 15.8%, and 5.2%, respectively). Statistical risk modeling for each type of violence experienced in the past 12 months indicated that the common precipitating factor for all forms of violence was working with patients with cognitive impairment. Working part-time appeared to be a protective factor. Other factors, such as the practitioners' gender, practice setting, or clinic location showed variations among the diverse types of violence.
Conclusions: The exposure to type II workplace violence within the last 12 months among physical therapists in Spain (Europe) is not as high as in some other world regions. Various individual, clinical, and professional/organizational risk factors have been identified in connection with type II workplace violence. Further research is warranted to compare the violence experienced once the COVID pandemic has subsided.
This research/work is part of the grant PID2020-113578RB-I00, funded by MCIN/AEI/10.13039/501100011033/. It has been supported by the Xunta de Galicia (Grupos de Referencia Competitiva ED431C-2020/14) and by CITIC, which is supported by Xunta de Galicia through the collaboration agreement between the Consellería de Cultura, Educación, Formación Profesional e Universidades and the Galician universities for the reinforcement of the research centres of the Sistema Universitario de Galicia (CIGUS). Finally, it has also been supported by the GCCPS and by Universidade da Coruña/CISUG, the latter funding the open access charge.
info:eu-repo/grantAgreement/AEI/Programa Estatal de I+D+i Orientada a los Retos de la Sociedad/PID2020-113578RB-I00/ES/METODOS ESTADISTICOS FLEXIBLES EN CIENCIA DE DATOS PARA DATOS COMPLEJOS Y DE GRAN VOLUMEN: TEORIA Y APLICACIONES
Xunta de Galicia; ED431C-2020%2F1
A distance correlation approach for optimum multiscale selection in 3D point cloud classification
[Abstract] Supervised classification of 3D point clouds using machine learning algorithms and handcrafted local features as covariates frequently depends on the size of the neighborhood (scale) around each point used to determine those features. It is therefore crucial to estimate the scale or scales providing the best classification results. In this work, we propose three methods to estimate these scales, all of them based on calculating the maximum values of the distance correlation (DC) functions between the features and the label assigned to each point. The performance of the methods was tested using simulated data, and the method presenting the best results was applied to a benchmark data set for point cloud classification. This method consists of detecting the local maxima of DC functions previously smoothed to avoid choosing scales that are very close to each other. Five different classifiers were used: linear discriminant analysis, support vector machines, random forest, multinomial logistic regression and multilayer perceptron neural network. The results obtained were compared with those from other strategies available in the literature and were favorable to our approach.
Xunta de Galicia; ED431G 2019/01
Ministerio de Ciencia, Innovación y Universidades; MTM2016-76969-P
Xunta de Galicia; ED431C-2020-14
MINECO/AEI/FEDER, UE; MTM2017-89422-
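As a rough illustration of scale selection by distance correlation, the sketch below computes the sample distance correlation between one feature and the labels at each scale and keeps the scale with the largest value. It is a simplification under stated assumptions, not the authors' procedure: features and labels are univariate and numeric, a plain global maximum replaces the detection of local maxima of smoothed DC functions, and the function names and toy data are hypothetical.

```python
import math


def _centered_distances(v):
    """Double-centred matrix of pairwise absolute differences."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    # The distance matrix is symmetric, so row means equal column means.
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length numeric samples."""
    a, b = _centered_distances(x), _centered_distances(y)
    n = len(x)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / n ** 2
    dvar_x = sum(a[i][j] ** 2 for i, j in pairs) / n ** 2
    dvar_y = sum(b[i][j] ** 2 for i, j in pairs) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


def best_scale(features_by_scale, labels):
    """Return the scale whose feature has the largest DC with the labels."""
    return max(features_by_scale,
               key=lambda s: distance_correlation(features_by_scale[s], labels))


# Hypothetical toy data: one feature computed at three neighborhood sizes.
labels = [0, 0, 0, 1, 1, 1]
features_by_scale = {
    0.5: [0.3, 0.9, 0.1, 0.7, 0.2, 0.8],
    1.0: [2.0, 2.0, 2.0, 5.0, 5.0, 5.0],
    2.0: [0.4, 0.5, 0.6, 0.5, 0.4, 0.6],
}
print(best_scale(features_by_scale, labels))
```

With several features per scale, the same function could be applied feature by feature to obtain one DC curve per feature across scales.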
Real-time predictive seasonal influenza model in Catalonia, Spain
Influenza surveillance is critical to monitoring the situation during epidemic seasons, and predictive mathematical models may aid the early detection of epidemic patterns. The objective of this study was to design a real-time spatial predictive model of ILI (Influenza Like Illness) incidence rate in Catalonia using one- and two-week forecasts. The available data sources used to select explanatory variables to include in the model were the statutory reporting disease system and the sentinel surveillance system in Catalonia for influenza incidence rates, the official climate service in Catalonia for meteorological data, laboratory data and Google Flu Trends. Time series for every explanatory variable were created with data from the last 4 seasons (from 2010-2011 to 2013-2014). A pilot test was conducted during the 2014-2015 season to select the explanatory variables to be included in the model and the type of model to be applied. During the 2015-2016 season a real-time model was applied weekly, obtaining the intensity level and predicted incidence rates with 95% confidence intervals one and two weeks ahead for each health region. At the end of the season, the confidence interval success rate (CISR) and intensity level success rate (ILSR) were analysed. For the 2015-2016 season, a CISR of 85.3% at one week and 87.1% at two weeks and an ILSR of 82.9% and 82% were observed, respectively. The model described is a useful tool, although it is hard to evaluate due to uncertainty. The accuracy of prediction at one and two weeks was above 80% globally, but was lower during the peak epidemic period. In order to improve the predictive power, new explanatory variables should be included.
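The abstract does not define the CISR, so the sketch below rests on an assumption: that it is the share of observed incidence rates falling inside their forecast confidence interval. The function name and the example numbers are hypothetical and are not data from the study.

```python
def interval_success_rate(observed, lower, upper):
    """Share of observations lying inside their forecast interval."""
    if not observed or not (len(observed) == len(lower) == len(upper)):
        raise ValueError("inputs must be non-empty and of equal length")
    # Count the weeks whose observed rate falls within [lower, upper].
    hits = sum(lo <= obs <= hi
               for obs, lo, hi in zip(observed, lower, upper))
    return hits / len(observed)


# Hypothetical weekly incidence rates with their forecast interval bounds.
observed = [10.0, 25.0, 40.0, 12.0]
lower = [8.0, 20.0, 41.0, 5.0]
upper = [12.0, 30.0, 50.0, 11.0]
print(interval_success_rate(observed, lower, upper))
```

Under the same assumption, an analogous count of weeks in which the predicted intensity level matches the observed one would give a rate comparable to the ILSR.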