399 research outputs found

    Estimation de la fréquence instantanée des signaux FM par opérateur d'énergie Psi_B (Instantaneous frequency estimation of FM signals using the Psi_B energy operator)

    The Psi_B energy operator extends the cross Teager-Kaiser energy operator, a non-linear energy-tracking operator, to complex signals, and its usefulness for the analysis of non-stationary signals has been demonstrated. In this letter, two new properties of Psi_B are established. The first is the link between Psi_B and the dynamic signal, which is a generalization of the instantaneous frequency (IF). The second, obtained for frequency-modulated signals, is a simple way to estimate the IF. These properties confirm the value of the Psi_B operator for tracking the non-stationarity of a signal. Results of IF estimation for a non-linear FM signal in a noisy environment are presented, and a comparison with the Wigner-Ville distribution and a Hilbert transform-based method is provided.

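    Estimation de l'enveloppe et de la fréquence locales par les opérateurs de Teager-Kaiser en interférométrie en lumière blanche (Local envelope and frequency estimation using Teager-Kaiser operators in white-light interferometry)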

    In this work, a new method for surface extraction in white light scanning interferometry (WLSI) is introduced. The proposed extraction scheme is based on the Teager-Kaiser energy operator and its extended versions. This non-linear class of operators helps extract the local instantaneous envelope and frequency of any narrow-band AM-FM signal. Specifically, combining the envelope and frequency information allows effective surface extraction through iterative re-estimation of the phase, in association with a new correlation technique based on a recent TK cross-energy operator. Experiments show that, in terms of surface extraction, the proposed method compares favourably with the peak fringe scanning technique, the five-step phase-shifting algorithm and the continuous wavelet transform-based method. In addition, the results show that the proposed method is robust to noise and to fluctuations of the carrier frequency.

    Analyzing Heterogeneity In Neuroimaging With Probabilistic Multivariate Clustering Approaches

    Automated quantitative neuroimaging analysis methods have been crucial in elucidating normal and pathological brain structure and function, and in building in vivo markers of disease and its progression. Commonly used methods can identify and precisely quantify subtle and spatially complex imaging patterns of brain change associated with brain diseases. However, the overarching premise of these methods is that the disease group is a homogeneous entity resulting from a single, unifying pathophysiological process that has a single imaging signature. This assumption ignores ample evidence for the heterogeneous nature of neurodegenerative diseases and neuropsychiatric disorders, resulting in incomplete or misleading descriptions. Accurate characterization of heterogeneity is important for deepening our understanding of neurobiological processes, thus leading to improved disease diagnosis and prognosis. In this thesis, we leveraged machine learning techniques to develop novel tools that can analyze the heterogeneity in both cross-sectional and longitudinal neuroimaging studies. Specifically, we developed a semi-supervised clustering method for characterizing heterogeneity in cross-sectional group comparison studies, where normal and patient populations are modeled as high-dimensional point distributions, and heterogeneous disease effects are captured by estimating multiple transformations that align the two distributions, while accounting for the effect of nuisance covariates. Moreover, toward dissecting the heterogeneity in longitudinal cohorts, we proposed a method which simultaneously fits multiple population longitudinal multivariate trajectories and clusters subjects into subgroups. Longitudinal trajectories are modeled using spatiotemporally regularized cubic splines, while clustering is performed by assigning subjects to the subgroup whose population trajectory best fits their data. The proposed tools were extensively validated using synthetic data. 
    Importantly, they were applied to study the heterogeneity in large clinical neuroimaging cohorts. We identified four disease subtypes with distinct imaging signatures using data from the Alzheimer’s Disease Neuroimaging Initiative, and revealed two subgroups with different longitudinal patterns using data from the Baltimore Longitudinal Study of Aging. Critically, we were able to further characterize the subgroups in each study through statistical analyses of subgroup differences that drew on additional information such as neurocognitive data. Our results demonstrate the strength of the developed methods, and may pave the way for a broader understanding of the complexity of brain aging and Alzheimer’s disease.

    Machine learning model selection with multi-objective Bayesian optimization and reinforcement learning

    A machine learning system, including one used in reinforcement learning, is usually fed with only limited data, yet is expected to train a model with good predictive performance that generalizes to the underlying data distribution. Within certain hypothesis classes, model selection chooses a model based on selection criteria calculated from the available data, which usually serve as estimators of the model's generalization performance. One major challenge for model selection that has drawn increasing attention is the discrepancy between the distribution from which the training data are sampled and the data distribution at deployment. The model can overfit the training distribution and fail to extrapolate to unseen deployment distributions, which can greatly harm the reliability of a machine learning system. This distribution-shift challenge can become even more pronounced for high-dimensional data types such as gene expression data, functional data and image data, especially in a decentralized learning scenario. Another challenge for model selection is efficient search in the hypothesis space. Since training a machine learning model usually takes a fair amount of resources, searching for an appropriate model with favorable configurations is inherently an expensive process, which calls for efficient optimization algorithms. To tackle the challenge of distribution shift, novel resampling methods for evaluating the robustness of neural networks were proposed, as well as a domain generalization method using multi-objective Bayesian optimization in a decentralized learning scenario and variational inference in a domain-unsupervised manner. To tackle the expensive model search problem, combining Bayesian optimization and reinforcement learning in an interleaved manner was proposed for efficient search in a hierarchical conditional configuration space.
    Additionally, the use of multi-objective Bayesian optimization for model search in a decentralized learning scenario was proposed and its effectiveness verified. A model selection perspective on reinforcement learning was proposed, with associated contributions tackling the problem of exploration in high-dimensional state-action spaces with sparse rewards. Connections between statistical inference and control were summarized. Contributions were also made to open-source software in related machine learning sub-topics such as feature selection and functional data analysis, with advanced tuning methods and extensive benchmarking.

    Recent advances in directional statistics

    Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds like the unit circle, torus, sphere and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments are discussed. Comment: 61 pages.

    Automated Detection of Electric Energy Consumption Load Profile Patterns

    Load profiles of energy consumption from smart meters are becoming increasingly available, and the amount of data to analyse is huge. In order to automate this analysis, the application of state-of-the-art data mining techniques for time series analysis is reviewed. In particular, the use of dynamic clustering techniques to obtain and visualise temporal patterns characterising the users of electrical energy is studied in depth. The review can serve as a guide for those interested in the automatic analysis of load profile databases and the detection of behaviour groups within them. Additionally, a selection of dynamic clustering algorithms has been implemented and their performance compared using an available electric energy consumption load profile database. The results allow experts to easily evaluate how users consume energy, to assess trends and to predict future scenarios. The data analysed were provided by the Spanish Distributor Iberdrola Electrical Distribution S.A. as part of the research project GAD (Active Management of the Demand), national project by DEVISE 2010 funded by the INGENIIO 2010 program and the CDTI (Centre for Industrial Technology Development), Business Public Entity dependent of the Ministry of Economy and Competitiveness of the Government of Spain. Benítez, I.; Diez, J. (2022). Automated Detection of Electric Energy Consumption Load Profile Patterns. Energies. 15(6):1-26. https://doi.org/10.3390/en15062176

    Statistical methods for the analysis of RNA sequencing data

    The next-generation sequencing technology RNA-sequencing (RNA-seq) is increasingly preferred over traditional microarrays in transcriptome analyses. The statistical methods used for gene expression analysis with these two technologies differ because array-based technology measures intensities, which are modelled with continuous distributions, whereas RNA-seq provides absolute quantification of gene expression using counts of reads. There is a need for reliable statistical methods to exploit the information from the rapidly evolving sequencing technologies, and limited work has been done on expression analysis of time-course RNA-seq data. In this dissertation, we propose a model-based clustering method for identifying gene expression patterns in time-course RNA-seq data. Our approach employs a longitudinal negative binomial mixture model to describe the over-dispersed time-course gene count data. We also modify existing common initialization procedures to suit our model-based clustering algorithm. The effectiveness of the proposed methods is assessed using simulated data and is illustrated with real data from time-course genomic experiments. Another common issue in gene expression analysis is the presence of missing values in the datasets. Various treatments of missing values in genomic datasets have been developed, but limited work has been done on RNA-seq data. In the current work, we examine the performance of various imputation methods and their impact on the clustering of time-course RNA-seq data. We develop a cluster-based imputation method that is specifically suited to dealing with missing values in RNA-seq datasets. Simulation studies are provided to assess the performance of the proposed imputation approach.

    Feature selection and modelling methods for microarray data from acute coronary syndrome

    Acute coronary syndrome (ACS) represents a leading cause of mortality and morbidity worldwide. Providing better diagnostic solutions and developing therapeutic strategies customized to the individual patient are urgent societal and economic needs. Progressive improvement in diagnosis and treatment procedures requires a thorough understanding of the underlying genetic mechanisms of the disease. Recent advances in microarray technologies, together with the decreasing costs of the specialized equipment, have enabled affordable collection of time-course gene expression data. The high-dimensional data generated demand computational tools able to extract the underlying biological knowledge. This thesis is concerned with developing new methods for analysing time-course gene expression data, focused on identifying differentially expressed genes, deconvolving heterogeneous gene expression measurements and inferring dynamic gene regulatory interactions. The main contributions include: a novel multi-stage feature selection method; a new deconvolution approach for estimating cell-type-specific signatures and quantifying the contribution of each cell type to the variance of the gene expression patterns; a novel approach to identify the cellular sources of differential gene expression; a new approach to model gene expression dynamics using sums of exponentials; and a novel method to estimate stable linear dynamical systems from noisy and unequally spaced time series data. The performance of the proposed methods was demonstrated on a time-course dataset consisting of microarray gene expression levels collected from the blood samples of patients with ACS and associated blood count measurements. The results of the feature selection study are of significant biological relevance. For the first time, high diagnostic performance for the ACS subtypes was reported up to three months after hospital admission.
    The deconvolution study exposed features of within-group and between-group variation in expression measurements and identified potential cell type markers and cellular sources of differential gene expression. It was shown that the dynamics of post-admission gene expression data can be accurately modelled using sums of exponentials, suggesting that gene expression levels undergo a transient response to the ACS events before returning to equilibrium. The linear dynamical models capturing the gene regulatory interactions exhibit high predictive performance and can serve as platforms for system-level analysis, numerical simulations and intervention studies.