    Application of Subspace Clustering in DNA Sequence Analysis

    Identification and clustering of orthologous genes plays an important role in developing evolutionary models, such as validating convergent and divergent phylogeny and predicting functional proteins in newly sequenced species with unverified nucleotide-protein mappings. Here, we introduce an application of subspace clustering to orthologous gene sequences and discuss the initial results. The working hypothesis is that genetic changes between nucleotide sequences coding for proteins among selected species and groups may lie within a union of subspaces for clusters of the orthologous groups. Estimates for the subspace dimensions were computed for a small population sample. A series of experiments was performed to cluster randomly selected sequences. The experimental design allows for both false positives and false negatives, and estimates for the statistical significance are provided. The clustering results are consistent with the main hypothesis. A simple random mutation binary tree model is used to simulate speciation events that show the interdependence of the subspace rank versus time and mutation rates. The simple mutation model is found to be largely consistent with the observed subspace clustering singular value results. Our study indicates that the subspace clustering method may be applied in orthology analysis.
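
The abstract mentions estimating subspace dimensions from singular values. One common way to do that is sketched below under stated assumptions: the function name `estimate_subspace_dim`, the 99% energy threshold, and the synthetic data are all hypothetical and are not taken from the paper, which does not describe its estimator here.

```python
import numpy as np

def estimate_subspace_dim(X, energy=0.99):
    """Smallest rank whose singular values capture `energy` (< 1) of the
    total squared spectrum of X. Illustrative heuristic, not the paper's method."""
    s = np.linalg.svd(X, compute_uv=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    # First index where the cumulative energy reaches the threshold, as a count.
    return int(np.searchsorted(cum, energy) + 1)

# Synthetic stand-in for encoded sequences: 40 points near a
# 3-dimensional subspace of R^50, plus very small noise.
rng = np.random.default_rng(0)
basis = rng.standard_normal((50, 3))
coeffs = rng.standard_normal((3, 40))
X = basis @ coeffs + 1e-6 * rng.standard_normal((50, 40))

dim = estimate_subspace_dim(X)
print(dim)
```

With real sequence data, the result depends strongly on how nucleotides are encoded as numeric vectors and on the threshold, so both would need to be chosen with care.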

    The discriminative functional mixture model for a comparative analysis of bike sharing systems

    Bike sharing systems (BSSs) have become a means of sustainable intermodal transport and are now offered in many cities worldwide. Most BSSs also provide open access to their data, particularly to real-time status reports on their bike stations. The analysis of the mass of data generated by such systems is of particular interest to BSS providers to update system structures and policies. This work was motivated by interest in analyzing and comparing several European BSSs to identify common operating patterns in BSSs and to propose practical solutions to avoid potential issues. Our approach relies on the identification of common patterns between and within systems. To this end, a model-based clustering method, called FunFEM, for time series (or more generally functional data) is developed. It is based on a functional mixture model that allows the clustering of the data in a discriminative functional subspace. In this context, the model has the advantage of being parsimonious and of allowing the visualization of the clustered systems. Numerical experiments confirm the good behavior of FunFEM, particularly compared to state-of-the-art methods. The application of FunFEM to BSS data from JCDecaux and the Transport for London Initiative allows us to identify 10 general patterns, including pathological ones, and to propose practical improvement strategies based on the system comparison. The visualization of the clustered data within the discriminative subspace turns out to be particularly informative regarding the system efficiency. The proposed methodology is implemented in a package for the R software, named funFEM, which is available on the CRAN. The package also provides a subset of the data analyzed in this work. Comment: Published at http://dx.doi.org/10.1214/15-AOAS861 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Estimating Time-Varying Effective Connectivity in High-Dimensional fMRI Data Using Regime-Switching Factor Models

    Recent studies of dynamic brain connectivity rely on sliding-window analysis or time-varying coefficient models, which are unable to capture both smooth and abrupt changes simultaneously. Emerging evidence suggests state-related changes in brain connectivity, where the dependence structure alternates between a finite number of latent states or regimes. Another challenge is inference of full-brain networks with a large number of nodes. We employ a Markov-switching dynamic factor model in which the state-driven time-varying connectivity regimes of high-dimensional fMRI data are characterized by lower-dimensional common latent factors, following a regime-switching process. It enables a reliable, data-adaptive estimation of change-points of connectivity regimes and the massive dependencies associated with each regime. We consider the switching VAR to quantify the dynamic effective connectivity. We propose a three-step estimation procedure: (1) extracting the factors using principal component analysis (PCA), (2) identifying dynamic connectivity states using factor-based switching vector autoregressive (VAR) models in a state-space formulation using the Kalman filter and the expectation-maximization (EM) algorithm, and (3) constructing the high-dimensional connectivity metrics for each state based on subspace estimates. Simulation results show that our proposed estimator outperforms K-means clustering of time-windowed coefficients, providing more accurate estimation of regime dynamics and connectivity metrics in high-dimensional settings. Applications to resting-state fMRI data identify dynamic changes in brain states during rest, and reveal distinct directed connectivity patterns and modular organization in resting-state networks across different states. Comment: 21 page
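
The three-step procedure above can be illustrated in a heavily simplified form. The sketch below covers step (1) with PCA via the SVD, replaces step (2) with a single-regime VAR(1) fitted by ordinary least squares (so there is no Markov switching, Kalman filter, or EM), and maps the factor coefficients back to the observed dimension as a stand-in for step (3). All function names, dimensions, and the simulated data are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def pca_factors(Y, r):
    """Step (1): Y is (T, N). Returns (T, r) factors and (N, r) loadings."""
    Yc = Y - Y.mean(axis=0)
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    loadings = Vt[:r].T            # orthonormal columns
    factors = Yc @ loadings
    return factors, loadings

def fit_var1(F):
    """Simplified step (2): least-squares VAR(1), F[t] ~ A @ F[t-1].
    A single regime only; the paper uses a switching VAR instead."""
    X, Z = F[:-1], F[1:]
    B, *_ = np.linalg.lstsq(X, Z, rcond=None)   # Z ~ X @ B
    return B.T

# Simulated data: 2 latent factors driving 60 observed series.
rng = np.random.default_rng(1)
T, N, r = 300, 60, 2
A_true = np.array([[0.8, 0.1], [0.0, 0.5]])
F = np.zeros((T, r))
for t in range(1, T):
    F[t] = A_true @ F[t - 1] + rng.standard_normal(r)
Y = F @ rng.standard_normal((r, N)) + 0.1 * rng.standard_normal((T, N))

factors, loadings = pca_factors(Y, r)
A_hat = fit_var1(factors)
# Simplified step (3): project factor dynamics back to the N observed series.
A_full = loadings @ A_hat @ loadings.T
print(A_full.shape)
```

PCA factors are identified only up to rotation, so `A_hat` need not match `A_true` entry by entry even when the fit is good; only rotation-invariant quantities such as its eigenvalues are comparable.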

    Kernel discriminant analysis and clustering with parsimonious Gaussian process models

    This work presents a family of parsimonious Gaussian process models which allow a model-based classifier to be built, from a finite sample, in an infinite dimensional space. The proposed parsimonious models are obtained by constraining the eigen-decomposition of the Gaussian processes modeling each class. In particular, this allows the use of non-linear mapping functions which project the observations into infinite dimensional spaces. It is also demonstrated that the classifier can be built directly from the observation space through a kernel function. The proposed classification method is thus able to classify data of various types such as categorical data, functional data or networks. Furthermore, it is possible to classify mixed data by combining different kernels. The methodology is also extended to the unsupervised classification case. Experimental results on various data sets demonstrate the effectiveness of the proposed method.
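
The remark about classifying mixed data by combining kernels rests on a standard fact: a non-negative weighted sum of positive semi-definite kernels is itself a valid kernel. The sketch below illustrates that fact only. The choice of an RBF kernel for numeric features, a simple matching kernel for categorical features, and equal weights are assumptions for illustration, not choices taken from the paper.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) Gram matrix for numeric features."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def match_kernel(C):
    """Fraction of categorical features on which two observations agree."""
    return (C[:, None, :] == C[None, :, :]).mean(axis=2)

rng = np.random.default_rng(2)
X_num = rng.standard_normal((20, 3))        # numeric part
X_cat = rng.integers(0, 4, size=(20, 2))    # categorical part, 4 levels each

# A convex combination of two valid kernels is again a valid kernel.
K = 0.5 * rbf_kernel(X_num) + 0.5 * match_kernel(X_cat)
min_eig = np.linalg.eigvalsh(K).min()
print(min_eig >= -1e-8)
```

The combined Gram matrix `K` could then be passed to any kernel-based classifier; the weights would normally be tuned rather than fixed at 0.5.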