
    Gene ordering in partitive clustering using microarray expressions

    A central step in the analysis of gene expression data is the identification of groups of genes that exhibit similar expression patterns. Clustering and ordering genes into homogeneous groups using gene expression data has been shown to be useful in functional annotation, tissue classification, regulatory motif identification, and other applications. Although there is a rich literature on gene ordering in the hierarchical clustering framework for gene expression analysis, to the best of the authors' knowledge no work has addressed or evaluated the importance of gene ordering in the partitive clustering framework. Outside the framework of hierarchical clustering, gene ordering algorithms are applied to the whole data set, and the domain of partitive clustering remains unexplored with gene ordering approaches. A new hybrid method is proposed for ordering the genes in each of the clusters obtained from a partitive clustering solution, using microarray gene expressions. Two existing algorithms for optimally ordering cities in the travelling salesman problem (TSP), namely FRAG_GALK and Concorde, are each hybridized with the self-organizing map to show the importance of gene ordering in the partitive clustering framework. We validated our hybrid approach using yeast and fibroblast data and showed that it improves the quality of the partitive clustering solution by identifying subclusters within big clusters, grouping functionally correlated genes within clusters, minimizing the summation of gene expression distances, and maximizing the biological gene ordering using MIPS categorization. Moreover, the new hybrid approach finds comparable or sometimes superior biological gene order in less computation time than optimal leaf ordering in a hierarchical clustering solution.
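
The idea of ordering the genes inside one cluster so that the summed distance between neighbouring expression profiles is small can be illustrated with a short sketch. This is not FRAG_GALK or Concorde: it swaps in a simple nearest-neighbour tour refined by 2-opt, and the function names (`euclidean`, `path_cost`, `order_cluster`) and the choice of Euclidean distance are assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def path_cost(order, profiles):
    """Sum of distances between genes that are adjacent in the ordering."""
    return sum(
        euclidean(profiles[order[i]], profiles[order[i + 1]])
        for i in range(len(order) - 1)
    )


def order_cluster(profiles):
    """Heuristically order the genes of ONE cluster (hypothetical helper).

    Step 1: greedy nearest-neighbour path starting from gene 0.
    Step 2: 2-opt refinement, reversing segments while that shortens the path.
    """
    n = len(profiles)
    if n < 3:
        return list(range(n))

    order = [0]
    remaining = set(range(1, n))
    while remaining:
        last = order[-1]
        # Ties are broken by index so the result is deterministic.
        nxt = min(remaining, key=lambda j: (euclidean(profiles[last], profiles[j]), j))
        order.append(nxt)
        remaining.remove(nxt)

    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 1, n):
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if path_cost(candidate, profiles) < path_cost(order, profiles) - 1e-12:
                    order = candidate
                    improved = True
    return order
```

In a partitive setting, `order_cluster` would be called once per cluster on that cluster's rows only; an exact TSP solver would replace the heuristic where an optimal order is required.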

    Statistical analysis of RNA-seq data from next-generation sequencing technology

    In recent years, the advent of next-generation sequencing (NGS) technology has been revolutionizing how genomic studies are conducted. One important application of NGS technology is the study of the transcriptome through sequencing of RNAs (RNA-seq). Compared with previous technologies such as microarrays, RNA-seq has many advantages, such as providing digital rather than analog signals of expression levels, dynamic and wider ranges of measurements, less noise, and higher throughput. Hence, RNA-seq is gradually replacing the array-based approach as the major platform in transcriptome studies. Meanwhile, the massive amounts of discrete data generated by NGS technology call for effective methods of statistical analysis. There are many interesting questions in RNA-seq data analysis, and we focus on three important ones in this dissertation: identifying differentially expressed genes from two-treatment experiments, detecting alternative splicing patterns using exon-expression data, and clustering gene expression profiles for multi-sample studies. Our major work is introduced in the following chapters. First, we propose an approximated maximum-average powerful (AMAP) testing procedure to compare gene expression from two treatment groups. The proposed method allows for testing null hypotheses that are much more general than those considered by most previous studies, and it leads to a natural way of controlling the FDR. We show that our method has higher power as well as better FDR control than other widely used methods in practice. Second, we generalize the AMAP test from testing gene expression data to studying alternative splicing events from exon-level expression data. A nonparametric algorithm to estimate the distribution of exon usages is proposed; this algorithm provides more flexibility for fitting the data and higher computational efficiency. Our method is compared with previous methods and is shown to be much more powerful. In the third project, we introduce clustering algorithms based on appropriate probability models for RNA-seq data, with a well-designed initialization strategy and grouping algorithms. We also present a model-based hybrid-hierarchical clustering method to generate a tree structure that allows visualization of relationships among clusters as well as flexibility in choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq data set show that our proposed methods provide better clustering results than alternative methods that are not based on probability models.

    Comparative genomics and transcriptomics of Escherichia coli isolates carrying virulence factors of both enteropathogenic and enterotoxigenic E. coli

    Escherichia coli that are capable of causing human disease are often classified into pathogenic variants (pathovars) based on their virulence gene content. However, disease-associated hybrid E. coli, containing unique combinations of multiple canonical virulence factors, have also been described. Such was the case of the E. coli O104:H4 outbreak in 2011, which caused significant morbidity and mortality. Among the pathovars of diarrheagenic E. coli that cause significant human disease are the enteropathogenic E. coli (EPEC) and enterotoxigenic E. coli (ETEC). In the current study we use comparative genomics, transcriptomics, and functional studies to characterize isolates that contain virulence factors of both EPEC and ETEC. Based on phylogenomic analysis, these hybrid isolates are more genomically related to EPEC, but appear to have acquired ETEC virulence genes. Global transcriptional analysis using RNA sequencing demonstrated that the EPEC and ETEC virulence genes of these hybrid isolates were differentially expressed under virulence-inducing laboratory conditions, similar to reference isolates. Immunoblot assays further verified that the virulence gene products were produced and that the T3SS effector EspB of EPEC and the heat-labile toxin of ETEC were secreted. These findings document the existence and virulence potential of an E. coli pathovar hybrid that blurs the distinction between E. coli pathovars.

    Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics.

    Background: Single-cell transcriptomics allows researchers to investigate complex communities of heterogeneous cells. It can be applied to stem cells and their descendants in order to chart the progression from multipotent progenitors to fully differentiated cells. While a variety of statistical and computational methods have been proposed for inferring cell lineages, the problem of accurately characterizing multiple branching lineages remains difficult to solve. Results: We introduce Slingshot, a novel method for inferring cell lineages and pseudotimes from single-cell gene expression data. In previously published datasets, Slingshot correctly identifies the biological signal for one to three branching trajectories. Additionally, our simulation study shows that Slingshot infers more accurate pseudotimes than other leading methods. Conclusions: Slingshot is a uniquely robust and flexible tool which combines the highly stable techniques necessary for noisy single-cell data with the ability to identify multiple trajectories. Accurate lineage inference is a critical step in the identification of dynamic temporal gene expression.

    Techniques for clustering gene expression data

    Many clustering techniques have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the choice of suitable method(s) for a given experimental dataset is not straightforward. Common approaches do not translate well and fail to take account of the data profile. This review paper surveys state-of-the-art applications that recognise these limitations and implement procedures to overcome them. It provides a framework for the evaluation of clustering in gene expression analyses. The nature of microarray data is discussed briefly. Selected examples are presented for the clustering methods considered.