12 research outputs found

    Multi-membership gene regulation in pathway based microarray analysis

    This article is available through the Brunel Open Access Publishing Fund. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Background: Gene expression analysis has been intensively researched for more than a decade. Recently, there has been elevated interest in integrating microarray data analysis with other types of biological knowledge in a holistic analytical approach. We propose a methodology that can be used for pathway based microarray data analysis, based on the observation that a substantial proportion of genes present in biochemical pathway databases are members of a number of distinct pathways. Our methodology aims to establish the state of individual pathways by identifying those truly affected by the experimental conditions, based on the behaviour of such genes. For that purpose it considers all the pathways in which a gene participates and the general census of gene expression per pathway. Results: We utilise hill climbing, simulated annealing and a genetic algorithm to analyse the consistency of the produced results, through the application of fuzzy adjusted Rand indexes and Hamming distance. All algorithms produce highly consistent gene-to-pathway allocations, revealing the contribution of genes to pathway functionality, in agreement with current pathway state visualisation techniques, with the simulated annealing search proving slightly superior in terms of efficiency. Conclusions: We show that the expression values of genes that are members of a number of biochemical pathways or modules are the net effect of the contribution of each gene to these biochemical processes. We show that by manipulating the pathway and module contribution of such genes to follow underlying trends we can interpret microarray results centred on the behaviour of these genes. The work was sponsored by the studentship scheme of the School of Information Systems, Computing and Mathematics, Brunel University.

    Explosion gravitation field algorithm with dust sampling for unconstrained optimization

    This research was funded by the National Natural Science Foundation of China (Nos. 61572227, 61772227, 61702214), the Development Project of Jilin Province of China (Nos. 20170101006JC, 20180414012GH, 20170203002GX, 20190201293JC), Zhuhai Premier-Discipline Enhancement Scheme, China (Grant 2015YXXK02) and Guangdong Premier Key-Discipline Enhancement Scheme, China (Grant 2016GDYSZDXK036). This work was also supported by Jilin Provincial Key Laboratory of Big Data Intelligent Computing, China (No. 20180622002JC). Peer reviewed. Postprint.

    Improving replicability in single-cell RNA-Seq cell type discovery with Dune

    Single-cell transcriptome sequencing (scRNA-Seq) has allowed many new types of investigations at unprecedented and unique levels of resolution. Among the primary goals of scRNA-Seq is the classification of cells into potentially novel cell types. Many approaches build on the existing clustering literature to develop tools specific to single-cell applications. However, almost all of these methods rely on heuristics or user-supplied parameters to control the number of clusters identified. This affects both the resolution of the clusters within the original dataset and their replicability across datasets. While many recommendations exist to select these tuning parameters, most of them are quite ad hoc. In general, there is little assurance that any given set of parameters will represent an optimal choice in the ever-present trade-off between cluster resolution and replicability. For instance, it may be the case that another set of parameters will result in more clusters that are also more replicable, or in fewer clusters that are also less replicable. Here, we propose a new method called Dune for optimizing the trade-off between the resolution of the clusters and their replicability across datasets. Our method takes as input a set of clustering results on a single dataset, derived from any set of clustering algorithms and associated tuning parameters, and iteratively merges clusters within partitions in order to maximize their concordance between partitions. As demonstrated on a variety of scRNA-Seq datasets from different platforms, Dune outperforms existing techniques that rely on hierarchical merging to reduce the number of clusters, in terms of replicability of the resultant merged clusters. It provides an objective approach for identifying replicable consensus clusters most likely to represent common biological features across multiple datasets.

    Comparing hard and overlapping clusterings

    Similarity measures for comparing clusterings are an important component, e.g., of evaluating clustering algorithms, of consensus clustering, and of clustering stability assessment. These measures have been studied for over 40 years in the domain of exclusive hard clusterings (exhaustive and mutually exclusive object sets). In recent years, the literature has proposed measures to handle more general clusterings (e.g., fuzzy/probabilistic clusterings). This paper provides an overview of these new measures and discusses their drawbacks. We ultimately develop a corrected-for-chance measure (13AGRI) capable of comparing exclusive hard, fuzzy/probabilistic, non-exclusive hard, and possibilistic clusterings. We prove that 13AGRI and the adjusted Rand index (ARI, by Hubert and Arabie) are equivalent in the exclusive hard domain. The reported experiments show that only 13AGRI could provide both a fine-grained evaluation across clusterings with different numbers of clusters and a constant evaluation between random clusterings, showing all four desirable properties considered here. We identified a high correlation between 13AGRI applied to fuzzy clusterings and ARI applied to hard exclusive clusterings over 14 real data sets from the UCI repository, which corroborates the validity of 13AGRI fuzzy clustering evaluation. 13AGRI also showed good results as a clustering stability statistic for solutions produced by the expectation maximization algorithm for Gaussian mixture
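    For orientation, the adjusted Rand index mentioned in this abstract can be sketched for the exclusive hard case as below. This is a minimal illustration of the standard Hubert and Arabie formula, not the 13AGRI measure and not code from the paper; the function name `adjusted_rand_index` and the convention of returning 1.0 for degenerate inputs are assumptions made here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two hard, exclusive clusterings.

    Illustrative sketch only: the name and the handling of degenerate
    inputs (returning 1.0) are assumptions, not taken from the paper.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no object pairs to compare

    # Contingency counts: objects per (cluster in A, cluster in B) cell.
    cells = Counter(zip(labels_a, labels_b))
    rows = Counter(labels_a)
    cols = Counter(labels_b)

    # Numbers of object pairs placed together, by cell, row and column.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    # Agreement expected under random labelings with fixed cluster sizes.
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2

    if max_index == expected:
        return 1.0  # e.g. both clusterings have one cluster, or all singletons
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    # Same partition under different label names, then a crossed partition.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

    The index depends only on which objects are grouped together, so renaming cluster labels leaves the score unchanged, and the chance correction allows negative values when agreement is below what random labelings would give.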

    Adapting Mixture Models to Take into Account Measurement Non-Invariance

    Researchers in the social sciences often use finite mixture models to find clusters of individuals on the basis of patterns of indicators. Though covariates are often incorporated in mixture models, it is most often assumed that these covariates exclusively affect class membership, rather than directly impacting the indicators themselves. Violation of this assumption indicates that the measurement of the latent classes by a given indicator is not constant across all individuals. Such violations, known as differential item functioning (DIF), have been well-studied in models for continuous latent variables, but virtually unexamined in models for categorical latent variables. The current study extends the analytic and testing framework developed in continuous latent variable models to the case of latent class analysis. First, a Monte Carlo simulation systematically examined the effects of omitted DIF on mixture model results, as well as the performance of tests to detect DIF. In the presence of DIF in the data-generating model, the omission of these effects in the fitted model was associated with overestimation of the number of classes, as well as biased estimates of covariate effects on class membership and model-implied endorsement probabilities, particularly when classes were poorly separated and DIF was large. Including DIF in the model, even if the nature of this DIF was misspecified, mitigated this bias considerably. Standard model-based procedures drawn from the continuous latent variable modeling literature were shown to detect DIF with high sensitivity and specificity. Finally, DIF was examined in an application of latent class analysis to alcohol use disorder (AUD) diagnostic criteria in an undergraduate sample. Researchers are advised to test comprehensively for DIF in applications of mixture models, in order to ensure that the results obtained are truly applicable to all individuals under study. Doctor of Philosophy