3 research outputs found

    HTself2: Combining p-values to Improve Classification of Differential Gene Expression in HTself

    HTself is a web-based bioinformatics tool designed for classifying differential gene expression in low replication microarray studies. It is based on a statistical test that uses self-self experiments to derive intensity-dependent cutoffs. The method was previously described in Vêncio et al. (DNA Res. 12: 211-214, 2005). In this work we consider an extension of HTself that calculates p-values instead of using a fixed credibility level α. As before, the statistic used to compute single-spot p-values is obtained from the Gaussian kernel density estimator (KDE) method applied to self-self data. Different spots corresponding to the same biological gene (replicas) give rise to a set of independent p-values, which can be combined by well-known statistical methods. The combined p-value can be used to decide whether or not a gene can be considered differentially expressed. HTself2 is a new version of HTself that uses this idea of p-value combination. It was implemented as a user-friendly desktop application to help laboratories without a bioinformatics infrastructure.
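The two ideas in this abstract, a per-spot p-value from a Gaussian KDE of self-self data and the combination of independent replica p-values, can be illustrated with a minimal sketch. Several choices here are assumptions that do not come from the abstract: the KDE is one-dimensional over log-ratios (intensity dependence is ignored), the bandwidth follows Silverman's rule of thumb, the p-value is two-sided, and Fisher's method stands in for the unnamed "well-known statistical methods". The function names are hypothetical and are not HTself2's API.

```python
import math
from statistics import NormalDist, stdev


def kde_two_sided_pvalue(m, self_self, bandwidth=None):
    """Two-sided tail probability of log-ratio m under a Gaussian KDE
    fitted to self-self log-ratios (1-D simplification, an assumption)."""
    n = len(self_self)
    if bandwidth is None:
        # Silverman's rule of thumb; the abstract does not name a bandwidth rule.
        bandwidth = 1.06 * stdev(self_self) * n ** (-0.2)
    std_normal = NormalDist()
    # The CDF of a Gaussian KDE is the average of the kernel CDFs.
    cdf = sum(std_normal.cdf((m - x) / bandwidth) for x in self_self) / n
    return min(1.0, 2.0 * min(cdf, 1.0 - cdf))


def fisher_combine(pvalues):
    """Combine k independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null; for even degrees of freedom
    the survival function has the closed form used below."""
    k = len(pvalues)
    # half_stat is X / 2; the floor guards against log(0).
    half_stat = -sum(math.log(max(p, 1e-300)) for p in pvalues)
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)
```

With these helpers, the replica spots of one gene would each be passed through `kde_two_sided_pvalue`, and the resulting list handed to `fisher_combine`; the gene is then called differentially expressed if the combined p-value falls below a threshold chosen by the user.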

    Two Novel Methods for Clustering Short Time-Course Gene Expression Profiles

    Because genes with similar expression patterns are very likely to share the same biological function, cluster analysis has become an important tool for understanding and predicting gene functions from gene expression profiles. In many situations, each gene expression profile contains only a few data points. Directly applying traditional clustering algorithms to such short profiles does not yield satisfactory results, so clustering algorithms designed for short gene expression profiles are needed. In this thesis, two novel methods are developed for clustering short gene expression profiles. The first method, called the network-based clustering method, addresses the limitations of short gene expression profiles by generating a gene co-expression network using conditional mutual information (CMI), which measures the non-linear relationship between two genes and also accounts for indirect gene relationships in the presence of other genes. The network-based clustering method consists of two steps. First, a gene co-expression network is constructed from short gene expression profiles using a path consistency algorithm (PCA) based on the CMI between genes. Then, gene functional modules are identified in terms of cluster cohesiveness. The method is evaluated on 10 large-scale Arabidopsis thaliana short time-course gene expression datasets using gene ontology (GO) enrichment analysis, and compared with an existing method called Clustering with Overlapping Neighbourhood Expansion (ClusterONE). Gene functional modules identified by the network-based clustering method for the 10 datasets return target GO p-values as low as 10^-24, whereas the original ClusterONE yields insignificant results. To cluster gene expression profiles more specifically, a second clustering method, the protein-protein interaction (PPI) integrated clustering method, is developed. It is designed for clustering short gene expression profiles by integrating gene expression profile patterns and curated PPI data. The method consists of the following three steps: (1) generate a number of predefined profile patterns according to the number of data points in the profiles and assign each gene to the predefined profile to which its expression profile is most similar; (2) integrate curated PPI data to refine the initial clustering result from (1); (3) combine similar clusters from (2) to gradually reduce the number of clusters by a hierarchical clustering method. The PPI-integrated clustering method is evaluated on 10 large-scale A. thaliana datasets using GO enrichment analysis, and by comparison with an existing method called Short Time-series Expression Miner (STEM). Target gene functional clusters identified by the PPI-integrated clustering method for the 10 datasets return GO p-values as low as 10^-62, whereas STEM returns GO p-values as low as 10^-38. In addition to the method development, the clusters obtained by the two proposed methods are further analyzed to identify cross-talk genes under five stress conditions in root and shoot tissues. A list of potential abiotic stress tolerant genes is found.
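Step (1) of the PPI-integrated method, assigning each gene to its most similar predefined profile pattern, might look roughly like the sketch below. The abstract does not say how the patterns are generated or how similarity is measured, so two assumptions are made here: patterns start at 0 and change by -1, 0 or +1 between consecutive time points (in the style of STEM), and similarity is the Pearson correlation. All function names are hypothetical.

```python
import itertools
import math


def generate_patterns(n_points):
    """All profiles that start at 0 and change by -1, 0 or +1 per step
    (an assumed pattern family; the abstract does not specify one)."""
    patterns = []
    for steps in itertools.product((-1, 0, 1), repeat=n_points - 1):
        profile = [0]
        for step in steps:
            profile.append(profile[-1] + step)
        patterns.append(tuple(profile))
    return patterns


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def assign_to_pattern(expression, patterns):
    """Return the predefined pattern most correlated with the gene's profile."""
    return max(patterns, key=lambda pattern: pearson(expression, pattern))
```

Genes assigned to the same pattern form the initial clusters; steps (2) and (3), the PPI-based refinement and the hierarchical merging, are not covered by this sketch.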

    doi:10.1093/dnares/dsi007 HTself: Self–Self Based Statistical Test for Low Replication Microarray Studies

    Different statistical methods have been used to classify a gene as differentially expressed in microarray experiments. They usually require a number of experimental observations to be adequately applied. However, many microarray experiments are constrained to low replication designs for different reasons, from financial restrictions to scarcely available RNA samples. Although performed in a high-throughput framework, these experiments have too few replicas for each gene to allow the use of traditional or state-of-the-art statistical methods. In this work, we present a web-based bioinformatics tool that deals with real-life problems concerning low replication experiments. It uses an empirically derived criterion to classify a gene as differentially expressed by combining two widely accepted ideas in microarray analysis: self–self experiments to derive intensity-dependent cutoffs and non-parametric estimation techniques. To help laboratories without a bioinformatics infrastructure, we implemented the tool in a user-friendly website
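The notion of an intensity-dependent cutoff derived from self-self data can be sketched as follows. This is a deliberately simplified stand-in, not the tool's actual estimator: it uses an empirical quantile of |M| within a sliding intensity window instead of a kernel density estimate, and the window width, credibility level and function names are all assumptions made for illustration.

```python
import math


def intensity_dependent_cutoff(self_self, intensity, window=1.0, credibility=0.95):
    """Cutoff on |M| at a given intensity A, from self-self (A, M) pairs.

    Takes the self-self spots whose intensity lies within the window and
    returns the smallest |M| that covers at least `credibility` of them."""
    nearby = sorted(abs(m) for a, m in self_self if abs(a - intensity) <= window / 2)
    if not nearby:
        raise ValueError("no self-self spots within the intensity window")
    index = max(0, min(len(nearby) - 1, math.ceil(credibility * len(nearby)) - 1))
    return nearby[index]


def is_differentially_expressed(a, m, self_self, window=1.0, credibility=0.95):
    """Flag a spot whose |M| exceeds the self-self cutoff at its intensity."""
    return abs(m) > intensity_dependent_cutoff(self_self, a, window, credibility)
```

Because self-self hybridizations contain no true differential expression, any spread in M at a given intensity reflects noise alone, which is why a spot is flagged only when it falls outside that spread.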