7 research outputs found

    Functional Data Analysis and its application to cancer data

    The objective of the current work is to develop novel procedures for the analysis of functional data and to apply them to the investigation of gender disparity in the survival of lung cancer patients. In particular, we use the time-dependent Cox proportional hazards model, where the clinical information is incorporated via time-independent covariates and the current age is modeled by its expansion over wavelet basis functions. We developed computer algorithms and applied them to a data set derived from the Florida Cancer Data depository data set (all personal information that would allow patients to be identified was removed). We also studied the problem of estimating a continuous matrix-variate function of low rank. We constructed an estimator of such a function using its basis expansion and the subsequent solution of an optimization problem with a Schatten-norm penalty. We derive an oracle inequality for the constructed estimator, study its properties via simulations, and apply the procedure to the analysis of Dynamic Contrast medical imaging data.
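    The Schatten-norm penalty used in the low-rank estimation step is commonly handled through singular value soft-thresholding, the proximal operator of the nuclear (Schatten-1) norm. The following is a generic numpy sketch of that operator on synthetic data, not the authors' procedure; the penalty level `tau` and the matrix sizes are illustrative assumptions.

```python
import numpy as np

def svt(M, tau):
    """Singular value soft-thresholding: the proximal operator of the
    nuclear (Schatten-1) norm, a standard building block of penalized
    low-rank matrix estimation."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_thr = np.maximum(s - tau, 0.0)   # shrink singular values toward zero
    return U @ np.diag(s_thr) @ Vt

rng = np.random.default_rng(0)
# Rank-2 signal observed under small additive noise
A = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
noisy = A + 0.1 * rng.standard_normal((30, 20))
denoised = svt(noisy, tau=2.0)   # small noise singular values are zeroed out
```

Because the threshold `tau` exceeds the noise-level singular values but not the signal ones, the output recovers a low-rank estimate of the signal.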

    Advanced Statistical Techniques for Noninvasive Hyperglycemic States Detection in Mice Using Millimeter-Wave Spectroscopy

    In this article, we discuss the use of advanced statistical techniques (functional data analysis) in millimeter-wave (mm-wave) spectroscopy for biomedical applications. We employ a W-band transmit-receive unit with a reference channel to acquire spectral data. The choice of the W-band is based on a tradeoff between penetration through the skin, which provides an upper bound for the frequencies, and the spectral content across the band. The data obtained are processed using functional principal component logit regression (FPCLoR), which yields a predictive model for sustained hyperglycemia, typically associated with diabetes. The predictions are based on the transmission data from a noninvasive mm-wave spectrometer at W-band. We show that there exists a frequency range most suitable for identification, classification, and prediction of sustained hyperglycemia when evaluating the functional parameter of the functional logit model (β). This allows for optimization of the spectroscopic instrument with the aim of obtaining a compact and potentially low-cost noninvasive instrument for hyperglycemia assessment. Furthermore, we demonstrate that the statistical tools alleviate the problem of calibration, which is a serious obstacle in similar measurements at terahertz and IR frequencies.
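    The FPCLoR pipeline can be sketched in two generic steps: functional principal component analysis of the transmission curves, then a logistic regression on the resulting scores, from which the functional parameter β is reassembled. The numpy sketch below uses entirely synthetic curves (the W-band grid, the bump location, and the noise level are illustrative assumptions, not the article's data or implementation).

```python
import numpy as np

rng = np.random.default_rng(1)
freq = np.linspace(75, 110, 100)   # hypothetical W-band frequency grid (GHz)
n = 60
# Synthetic transmission curves: the positive class is shifted in a narrow sub-band
labels = rng.integers(0, 2, n)
bump = np.exp(-0.5 * ((freq - 90.0) / 3.0) ** 2)
X = 0.2 * rng.standard_normal((n, 100)) + labels[:, None] * bump

# Step 1 - FPCA: eigendecomposition of the empirical covariance of the curves
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / n
vals, vecs = np.linalg.eigh(cov)
pcs = vecs[:, ::-1][:, :3]          # leading 3 eigenfunctions (discretized)
scores = Xc @ pcs                   # FPCA scores, one row per curve

# Step 2 - logit regression on the scores (plain gradient ascent, no intercept)
w = np.zeros(3)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-scores @ w))
    w += 0.1 * scores.T @ (labels - p) / n

beta = pcs @ w                      # functional parameter beta(f) on the grid
pred = (scores @ w > 0).astype(int)
accuracy = (pred == labels).mean()
```

Inspecting where |β(f)| is largest is what identifies the frequency sub-band most informative for classification, mirroring the article's argument for instrument optimization.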

    High-Dimensional Linear and Functional Analysis of Multivariate Grapevine Data

    Variable selection plays a major role in multivariate high-dimensional statistical modeling. Hence, we need to select a consistent model that avoids overfitting in prediction, enhances model interpretability, and identifies the relevant variables. We explore several continuous, nearly unbiased, sparse, and accurate techniques for linear models based on coefficient paths, including penalized maximum likelihood with nonconvex penalties and iterative Sure Independence Screening (SIS). The convex penalized (pseudo-)likelihood approach based on the elastic net uses a mixture of the ℓ1 (Lasso) and ℓ2 (ridge regression) penalties to simultaneously achieve automatic variable selection, continuous shrinkage, and selection of groups of correlated variables. Variable selection along the coefficient path of the minimax concave penalty (MCP) starts by applying penalization at the same rate as the Lasso and then smoothly relaxes the rate down to zero as the absolute value of the coefficient increases. The sure screening method is based on correlation learning, which computes component-wise estimators using AIC to tune the regularization parameter of the penalized-likelihood Lasso. To reflect the functional nature of the spectral data, we use a functional data approach, approximating each spectrum by a finite linear combination of B-spline basis functions. MCP, SIS, and functional regression rest on the intuition that the predictors are independent. However, the high-dimensional grapevine dataset suffers from ill-conditioning of the covariance matrix due to multicollinearity. Under collinearity, the elastic-net regularization path via coordinate descent yields the best result, controlling the sparsity of the model and using cross-validation to reduce bias in variable selection. Iterative stepwise multiple linear regression reduces complexity and enhances the predictability of the model by selecting only significant predictors.
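    The elastic-net-via-coordinate-descent step highlighted above can be illustrated with a minimal numpy implementation: each coordinate update is a soft-threshold (the ℓ1 part) divided by a ridge-inflated curvature (the ℓ2 part). This is a textbook sketch on synthetic data, not the thesis code; `lam`, `alpha`, and the sparse truth are illustrative assumptions.

```python
import numpy as np

def elastic_net_cd(X, y, lam, alpha, n_iter=200):
    """Coordinate descent for the elastic-net objective
    (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1-alpha)/2 * ||b||_2^2)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]              # partial residual
            rho = X[:, j] @ r / n
            z = np.sign(rho) * max(abs(rho) - lam * alpha, 0.0)  # soft threshold
            b[j] = z / (col_sq[j] + lam * (1.0 - alpha))         # ridge shrink
    return b

rng = np.random.default_rng(2)
n, p = 100, 20
X = rng.standard_normal((n, p))
true_b = np.zeros(p)
true_b[:3] = [2.0, -1.5, 1.0]                  # only 3 relevant predictors
y = X @ true_b + 0.1 * rng.standard_normal(n)
b_hat = elastic_net_cd(X, y, lam=0.1, alpha=0.9)
```

The soft-threshold sets irrelevant coefficients exactly to zero (automatic variable selection), while the relevant ones survive with a small, continuous shrinkage.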

    Parameter Estimation and Experimental Design Adapted to Kinetics Problems: Application to the Depollution of Engine Exhaust Fumes

    Physico-chemical models designed to represent experimental reality may prove inadequate. This is the case for the nitrogen oxide trap used as the application support of our thesis, a catalytic system treating the polluting emissions of the diesel engine. The outputs are curves of pollutant concentrations, which are functional data depending on scalar initial concentrations. The initial objective of this thesis was to propose experimental designs that are meaningful to the user. However, since experimental designs rely on models, most of the work led us to propose a statistical representation that takes expert knowledge into account and makes it possible to construct such a design. Three lines of research were explored. We first considered a non-functional modeling approach based on kriging theory. Then we took the functional dimension of the responses into account, applying and extending varying-coefficient models. Finally, starting again from the original model, we made the kinetic parameters depend on the (scalar) inputs through a nonparametric representation. To compare the methods, it was necessary to conduct an experimental campaign, and we propose an exploratory design approach based on maximum entropy.
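    The kriging step of the first research line can be sketched generically: a zero-mean Gaussian-process predictor with a squared-exponential covariance, which interpolates the observed responses exactly (up to a small nugget). This is a minimal numpy illustration under assumed hyperparameters, not the thesis model.

```python
import numpy as np

def kriging_predict(x_train, y_train, x_new, length=1.0, nugget=1e-8):
    """Simple (zero-mean) kriging predictor with a squared-exponential
    covariance; the length scale and nugget are illustrative choices."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)
    K = k(x_train, x_train) + nugget * np.eye(len(x_train))
    weights = np.linalg.solve(K, y_train)   # kriging weights
    return k(x_new, x_train) @ weights

# Toy 1-D experiment: responses observed at four design points
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.sin(x)
x_new = np.linspace(0.0, 3.0, 50)
y_hat = kriging_predict(x, y, x_new)
```

Maximum-entropy exploratory designs, mentioned at the end of the abstract, are typically built on exactly this covariance structure, choosing design points that maximize the determinant of `K`.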

    Randomized Functional Data Analysis and its Application in Astronomy

    Functional data analysis (FDA) methods have computational and theoretical appeal for some high-dimensional data, but lack scalability to modern large-sample datasets. Covariance operators are fundamental concepts and modeling tools for many FDA methods, such as functional principal component analysis. However, the empirical (or estimated) covariance operator becomes too costly to compute when the functional dataset grows large. We study a randomized algorithm for covariance operator estimation. The algorithm works by sampling and rescaling observations from the large functional data collection to form a sketch of much smaller size, and performs the computation on the sketch to obtain the subsampled empirical covariance operator. The proposed algorithm is theoretically justified via non-asymptotic bounds between the subsampled and full-sample empirical covariance operators in terms of the Hilbert-Schmidt norm and the operator norm. It is shown that the optimal sampling probability, which minimizes the expected squared Hilbert-Schmidt norm of the subsampling error, is determined by the norm of each function. Simulated and real data examples are used to illustrate the effectiveness of the proposed algorithm. The idea of randomization is then used in a Type Ia supernova (SN Ia) spectrophotometric data modeling problem, where we develop the Independent Component Estimation (ICE) method for sparse and irregularly spaced spectrophotometric data of Type Ia supernovae (SNe Ia), using functional principal component analysis (FPCA) and independent component analysis (ICA) to explore the separation of SN Ia intrinsic properties from the interstellar dust reddening effect. This separation makes it possible to construct the intrinsic spectral energy distribution (SED) manifolds of SNe Ia, which facilitates supernova studies and their cosmological applications.
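    The sketching idea described above — sample curves with probability proportional to their squared norm, rescale so the estimator stays unbiased, and form the covariance from the small sketch — can be sketched in a few lines of numpy. The synthetic two-mode curves, the sample size `n`, and the sketch size `m` are illustrative assumptions, not the paper's data or constants.

```python
import numpy as np

rng = np.random.default_rng(3)
n, grid = 5000, 50
t = np.linspace(0.0, 1.0, grid)
# Synthetic functional data: random combinations of two smooth modes
X = (rng.standard_normal((n, 1)) * np.sin(np.pi * t)
     + 0.5 * rng.standard_normal((n, 1)) * np.cos(2 * np.pi * t))

# Full-sample empirical covariance operator (discretized as a matrix)
C_full = X.T @ X / n

# Norm-proportional sampling: keep a sketch of m << n curves, rescaled
# by 1/sqrt(m * n * p_i) so the subsampled covariance is unbiased
m = 200
norms2 = (X ** 2).sum(axis=1)
probs = norms2 / norms2.sum()
idx = rng.choice(n, size=m, p=probs)
sketch = X[idx] / np.sqrt(m * n * probs[idx, None])
C_sub = sketch.T @ sketch

# Hilbert-Schmidt (Frobenius) norm of the subsampling error
hs_err = np.linalg.norm(C_sub - C_full, 'fro')
```

Only the `m x grid` sketch is ever multiplied out, so the covariance computation cost drops from O(n·grid²) to O(m·grid²), at the price of the error `hs_err` controlled by the paper's non-asymptotic bounds.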

    Functional data analysis and its application to local damage detection

    Vibration signals sampled at a high frequency constitute a basic source of information about machine behaviour. A few minutes of signal observation easily translate into several million data points to be processed for the purpose of damage detection. The high dimensionality of the data sets creates serious difficulties in detecting the frequencies specific to a particular local damage. In view of that, traditional spectral analysis tools such as spectrograms should be improved to efficiently identify the frequency bands where the impulsivity is most marked (the so-called informative frequency bands, or IFB). We propose the functional approach known from modern time series analysis to overcome these difficulties. We process the data sets as collections of random functions in order to apply techniques of functional data analysis. As a result, we are able to represent massive data sets through a few real-valued functions and the corresponding parameters, namely the eigenfunctions and eigenvalues of the covariance operator describing the signal. We also propose a new technique, based on bootstrap resampling, to choose the optimal dimension of the representation. Using real data generated by a gearbox and wheel bearings, we show how these techniques work in practice.
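    The core compression step — treating signal segments as random functions and summarizing them by the leading eigenpairs of their covariance operator — can be sketched on a toy vibration signal. The segment length, sampling rate, tone frequency, and noise level below are illustrative assumptions; the bootstrap dimension-selection rule proposed in the paper is not reproduced here, only the variance-explained quantity it would operate on.

```python
import numpy as np

rng = np.random.default_rng(4)
fs, seg_len, n_seg = 1000, 256, 40
t = np.arange(seg_len) / fs
# Toy vibration segments: a mesh tone with random amplitude plus broadband noise
amps = rng.standard_normal((n_seg, 1))
segments = amps * np.sin(2 * np.pi * 120 * t) + 0.2 * rng.standard_normal((n_seg, seg_len))

# Covariance operator of the segment collection and its eigenpairs
centered = segments - segments.mean(axis=0)
cov = centered.T @ centered / n_seg
eigvals, eigfuns = np.linalg.eigh(cov)
eigvals, eigfuns = eigvals[::-1], eigfuns[:, ::-1]   # descending order

# Fraction of variance captured by the leading K eigenfunctions: millions of
# samples compress to K functions plus K scores per segment
K = 5
explained = eigvals[:K].sum() / eigvals.sum()
```

A bootstrap rule for the optimal dimension would resample segments, recompute `explained` as a function of `K`, and pick the smallest `K` whose resampled variance fraction is stably above a target.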