2,820 research outputs found

    Warped Functional Analysis of Variance

    This article presents an Analysis of Variance model for functional data that explicitly incorporates phase variability through a time-warping component, allowing for a unified approach to estimation and inference in the presence of amplitude and time variability. The focus is on single-random-factor models, but the approach can be easily generalized to more complex ANOVA models. The behavior of the estimators is studied by simulation, and an application to the analysis of growth curves of flour beetles is presented. Although the model assumes a smooth latent process behind the observed trajectories, smoothness of the observed data is not required; the method can be applied to the sparsely observed data that are often encountered in longitudinal studies.

    Comparison of Clustering Methods for Time Course Genomic Data: Applications to Aging Effects

    Time course microarray data provide insight about dynamic biological processes. While several clustering methods have been proposed for the analysis of these data structures, comparison and selection of appropriate clustering methods are seldom discussed. We compared 3 probabilistic-based clustering methods and 3 distance-based clustering methods for time course microarray data. Among probabilistic methods, we considered: smoothing spline clustering (SSC), also known as model-based functional data analysis (MFDA), functional clustering models for sparsely sampled data (FCM) and model-based clustering (MCLUST). Among distance-based methods, we considered: weighted gene co-expression network analysis (WGCNA), clustering with dynamic time warping distance (DTW) and clustering with autocorrelation-based distance (ACF). We studied these algorithms in both simulated settings and case study data. Our investigations showed that FCM performed very well when gene curves were short and sparse. DTW and WGCNA performed well when gene curves were medium or long (>=10 observations). SSC performed very well when there were clusters of gene curves similar to one another. Overall, ACF performed poorly in these applications. In terms of computation time, FCM, SSC and DTW were considerably slower than MCLUST and WGCNA. WGCNA outperformed MCLUST by generating more accurate and biologically meaningful clustering results. WGCNA and MCLUST are the best methods among the 6 methods compared, when performance and computation time are both taken into account. WGCNA outperforms MCLUST, but MCLUST provides model-based inference and an uncertainty measure of clustering results.
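
    To make the dynamic time warping distance mentioned above concrete, here is a minimal Python sketch of the textbook DTW recursion between two numeric curves. It is an illustration only, not the implementation used in the study: the function name `dtw_distance`, the absolute-difference local cost, and the absence of any warping-window constraint are all assumptions.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    Hypothetical helper for illustration; uses |x - y| as the local cost
    and places no constraint on the warping path.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] matched to an already-used point of b
                cost[i][j - 1],      # b[j-1] matched to an already-used point of a
                cost[i - 1][j - 1],  # both curves advance together
            )
    return cost[n][m]
```

    Because the alignment may stretch either curve, two expression profiles that differ only by a repeated time point are at distance zero, which is the property that makes DTW attractive for curves with timing differences.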

    On the estimation of variance parameters in non-standard generalised linear mixed models: Application to penalised smoothing

    We present a novel method for the estimation of variance parameters in generalised linear mixed models. The method has its roots in the work of Harville (1977), but it is able to deal with models that have a precision matrix for the random-effect vector that is linear in the inverse of the variance parameters (i.e., the precision parameters). We call the method SOP (Separation of Overlapping Precision matrices). SOP is based on applying the method of successive approximations to easy-to-compute estimate updates of the variance parameters. These estimate updates have an appealing form: they are the ratio of a (weighted) sum of squares to a quantity related to effective degrees of freedom. We provide the sufficient and necessary conditions for these estimates to be strictly positive. An important application field of SOP is penalised regression estimation of models where multiple quadratic penalties act on the same regression coefficients. We discuss in detail two of those models: penalised splines for locally adaptive smoothness and for hierarchical curve data. Several data examples in these settings are presented. MTM2014-55966-P MTM2014-52184-

    On the estimation of variance parameters in non-standard generalised linear mixed models: application to penalised smoothing

    We present a novel method for the estimation of variance parameters in generalised linear mixed models. The method has its roots in the work of Harville (J Am Stat Assoc 72(358):320-338, 1977), but it is able to deal with models that have a precision matrix for the random effect vector that is linear in the inverse of the variance parameters (i.e., the precision parameters). We call the method SOP (separation of overlapping precision matrices). SOP is based on applying the method of successive approximations to easy-to-compute estimate updates of the variance parameters. These estimate updates have an appealing form: they are the ratio of a (weighted) sum of squares to a quantity related to effective degrees of freedom. We provide the sufficient and necessary conditions for these estimates to be strictly positive. An important application field of SOP is penalised regression estimation of models where multiple quadratic penalties act on the same regression coefficients. We discuss in detail two of those models: penalised splines for locally adaptive smoothness and for hierarchical curve data. Several data examples in these settings are presented. This research was supported by the Basque Government through the BERC 2018-2021 program and by the Spanish Ministry of Economy and Competitiveness MINECO through BCAM Severo Ochoa excellence accreditation SEV-2013-0323 and through projects MTM2017-82379-R funded by (AEI/FEDER, UE) and acronym “AFTERAM”, MTM2014-52184-P and MTM2014-55966-P. The MRI/DTI data were collected at Johns Hopkins University and the Kennedy-Krieger Institute. We are grateful to Pedro Caro and Iain Currie for useful discussions, to Martin Boer and Cajo ter Braak for the detailed reading of the paper and their many suggestions, and to Bas Engel for sharing with us his knowledge. We are also grateful to the two peer referees for their constructive comments on the paper.

    Spatio‑temporal modelling of high‑throughput phenotyping data

    High throughput phenotyping (HTP) platforms and devices are increasingly used to characterise growth and developmental processes for large sets of plant genotypes. This dissertation is motivated by the need to accurately estimate genetic effects over time when analysing data from such HTP experiments. The HTP data we deal with here are characterised by phenotypic traits measured multiple times in the presence of spatial and temporal noise and a hierarchical organisation at three levels (populations, genotypes within populations, and plants within genotypes). The challenge is to balance efficient statistical models and computational solutions to deal with the complexity and dimensionality of the experimental data. To that aim, we propose two strategies. The first proposal divides the problem into two stages. The first stage (spatial model) focuses on correcting the phenotypic data for experimental design factors and spatial variation, while the second stage (hierarchical longitudinal model) aims to estimate the evolution over time of the genetic signal. The second proposal is to address the problem in a single step (one-stage approach), that is, modelling the longitudinal evolution of the genetic effect on a given phenotypic trait while accounting for the temporal and spatial effects of environmental and design factors (spatio-temporal hierarchical model). We follow the same modelling philosophy throughout our work and propose multidimensional P-spline-based hierarchical approaches. We provide the user with appealing tools that take advantage of the sparse structure of the model matrices to reduce computational complexity. All our code is publicly available in the R package statgenHTP and at https://gitlab.bcamath.org/dperez/htp_one_stage_approach. We illustrate the performance of our methods using spatio-temporal simulated data and data from the PhenoArch greenhouse platform at INRAE Montpellier and the outdoor Field Phenotyping platform at ETH Zürich. In the plant breeding context, we show how to extract new time-independent phenotypes for genomic selection purposes. MTM2017-82379-R BERC 2018-2021 BERC 2022-2025 SEV-2017-0718 CEX2021-001142-S/MICIN/AEI/10.13039/50110001103

    On the automated extraction of regression knowledge from databases

    The advent of inexpensive, powerful computing systems, together with the increasing amount of available data, constitutes one of the greatest challenges for next-century information science. Since it is apparent that much future analysis will be done automatically, a good deal of attention has been paid recently to the implementation of ideas and/or the adaptation of systems originally developed in machine learning and other computer science areas. This interest seems to stem from both the suspicion that traditional techniques are not well-suited for large-scale automation and the success of new algorithmic concepts in difficult optimization problems. In this paper, I discuss a number of issues concerning the automated extraction of regression knowledge from databases. By regression knowledge is meant quantitative knowledge about the relationship between a vector of predictors or independent variables (x) and a scalar response or dependent variable (y). A number of difficulties found in some well-known tools are pointed out, and a flexible framework avoiding many such difficulties is described and advocated. Basic features of a new tool pursuing this direction are reviewed.

    Cuckoo Search Algorithm with Lévy Flights for Global-Support Parametric Surface Approximation in Reverse Engineering

    This paper concerns several important topics of the Symmetry journal, namely, computer-aided design, computational geometry, computer graphics, visualization, and pattern recognition. We also take advantage of the symmetric structure of the tensor-product surfaces, where the parametric variables u and v play a symmetric role in shape reconstruction. In this paper we address the general problem of global-support parametric surface approximation from clouds of data points for reverse engineering applications. Given a set of measured data points, the approximation is formulated as a nonlinear continuous least-squares optimization problem. Then, a recent metaheuristic called Cuckoo Search Algorithm (CSA) is applied to compute all relevant free variables of this minimization problem (namely, the data parameters and the surface poles). The method includes the iterative generation of new solutions by using Lévy flights to promote the diversity of solutions and prevent stagnation. A critical advantage of this method is its simplicity: the CSA requires only two parameters, many fewer than any other metaheuristic approach, so parameter tuning becomes a very easy task. The method is also simple to understand and easy to implement. Our approach has been applied to a benchmark of three illustrative sets of noisy data points corresponding to surfaces exhibiting several challenging features. Our experimental results show that the method performs very well even for the cases of noisy and unorganized data points. Therefore, the method can be directly used for real-world applications for reverse engineering without further pre/post-processing. Comparative work with the most classical mathematical techniques for this problem as well as a recent modification of the CSA called Improved CSA (ICSA) is also reported. Two nonparametric statistical tests show that our method outperforms the classical mathematical techniques and provides equivalent results to ICSA for all instances in our benchmark. This research work has received funding from the project PDE-GIR (Partial Differential Equations for Geometric modelling, Image processing, and shape Reconstruction) of the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie Grant agreement No. 778035, the Spanish Ministry of Economy and Competitiveness (Computer Science National Program) under Grant #TIN2017-89275-R of the Agencia Estatal de Investigación and European Funds FEDER (AEI/FEDER, UE), and the project #JU12, jointly supported by public body SODERCAN of the Regional Government of Cantabria and European Funds FEDER (SODERCAN/FEDER UE). We also thank Toho University, Nihon University, and the University of Cantabria for their support to conduct this research work.
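
    As a rough illustration of how Lévy flights drive the search described in the abstract above, the following Python sketch runs a bare-bones cuckoo search on a generic objective. It is not the authors' surface-fitting code. The function names, the toy bounds, and the fixed constants `alpha` (step scale) and `beta` (Lévy exponent) are assumptions; the two tunable parameters are taken here to be the number of nests and the abandonment probability. Lévy-distributed steps are drawn with Mantegna's algorithm.

```python
import math
import random


def levy_step(beta, rng):
    """Draw one heavy-tailed step length using Mantegna's algorithm."""
    sigma_u = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def cuckoo_search(objective, dim, n_nests=15, p_abandon=0.25, iters=200,
                  alpha=0.01, beta=1.5, bounds=(-5.0, 5.0), seed=0):
    """Minimise `objective` over a box; hypothetical minimal variant."""
    rng = random.Random(seed)
    lo, hi = bounds

    def random_nest():
        return [rng.uniform(lo, hi) for _ in range(dim)]

    nests = [random_nest() for _ in range(n_nests)]
    fitness = [objective(x) for x in nests]

    for _ in range(iters):
        best = nests[min(range(n_nests), key=fitness.__getitem__)]
        # Lévy-flight move: step size scales with the distance to the best nest.
        for i in range(n_nests):
            candidate = [
                max(lo, min(hi, x + alpha * levy_step(beta, rng) * (x - b)))
                for x, b in zip(nests[i], best)
            ]
            f = objective(candidate)
            if f < fitness[i]:  # greedy replacement keeps only improvements
                nests[i], fitness[i] = candidate, f
        # Abandon a fraction of the non-best nests and rebuild them at random.
        best_idx = min(range(n_nests), key=fitness.__getitem__)
        for i in range(n_nests):
            if i != best_idx and rng.random() < p_abandon:
                nests[i] = random_nest()
                fitness[i] = objective(nests[i])

    best_idx = min(range(n_nests), key=fitness.__getitem__)
    return nests[best_idx], fitness[best_idx]
```

    A call such as `cuckoo_search(lambda x: sum(v * v for v in x), dim=2)` returns the best nest found and its objective value. In the surface-approximation setting, the objective would instead be the least-squares error between the data points and the fitted surface, with the data parameters and surface poles as the decision variables.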