21,814 research outputs found

    Microarray Enriched Gene Rank

    We develop a new concept that reflects how genes are connected, based on microarray data and the coefficient of determination (the squared Pearson correlation coefficient). Our measure, called the microarray enriched gene rank, or simply gene rank (GR), combines a priori knowledge about gene connectivity, say, from the Gene Ontology (GO) database, with the microarray expression data at hand. GR, similarly to Google PageRank, is defined recursively and is computed as the left eigenvector associated with the largest eigenvalue of a stochastic matrix derived from microarray expression data. An efficient algorithm is devised that allows computation of GR for 50 thousand genes with 500 samples within minutes on a personal computer using the public domain statistical package R.
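    The recursive definition above can be illustrated with a minimal sketch. It assumes the stochastic matrix is obtained by row-normalizing the matrix of squared Pearson correlations and that the left eigenvector is found by power iteration; both choices are assumptions made here for illustration, the a priori (GO) connectivity is left out entirely, and the function names `pearson_r2` and `gene_rank` are hypothetical. This is not the authors' R implementation.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation (coefficient of determination) of two profiles."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation information
    return (sxy * sxy) / (sxx * syy)


def gene_rank(expr, tol=1e-12, max_iter=1000):
    """Left dominant eigenvector of a row-stochastic matrix built from r^2.

    expr: list of per-gene expression vectors (one value per sample).
    Assumption: weights are plain r^2 values, including each gene with itself.
    """
    g = len(expr)
    weights = [[pearson_r2(expr[i], expr[j]) for j in range(g)] for i in range(g)]

    # Row-normalize so every row sums to one (uniform row if all weights are zero).
    trans = []
    for row in weights:
        s = sum(row)
        trans.append([v / s for v in row] if s > 0 else [1.0 / g] * g)

    # Power iteration for the left eigenvector: rank <- rank * P.
    rank = [1.0 / g] * g
    for _ in range(max_iter):
        new = [sum(rank[i] * trans[i][j] for i in range(g)) for j in range(g)]
        delta = sum(abs(a - b) for a, b in zip(new, rank))
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    toy = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 11], [5, 3, 4, 1, 2], [1, 3, 2, 5, 4]]
    for gene, score in enumerate(gene_rank(toy)):
        print(gene, round(score, 4))
```

    In this simplified symmetric case the resulting vector is proportional to each gene's total r^2 weight; the abstract's version additionally mixes in a priori connectivity, which this sketch does not attempt.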

    Mining SOM expression portraits: Feature selection and integrating concepts of molecular function

    Background: 
Self-organizing maps (SOM) enable the straightforward portrayal of high-dimensional data from large sample collections as sample-specific images. Analysis of their texture yields so-called spot-clusters of co-expressed genes, which require subsequent significance filtering and functional interpretation. We address feature selection in terms of the gene ranking problem and interpret the resulting spot-related gene lists using concepts of molecular function.

Results: 
Different expression scores, based either on simple fold-change measures or on regularized Student's t-statistics, are applied to spot-related gene lists and compared, with special emphasis on the error characteristics of microarray expression data. The spot-clusters are analyzed using different methods of gene set enrichment analysis, with the focus on overexpression and/or overrepresentation of predefined sets of genes. Metagene-related overrepresentation of selected gene sets was mapped into the SOM images to assign gene function to different regions. Alternatively, we estimated set-related overexpression profiles over all samples studied using a gene set enrichment score. It was also applied to the spot-clusters to generate lists of enriched gene sets. We used the tissue body index data set, a collection of expression data of human tissues, as an illustrative example. We found that tissue-related spots typically contain enriched populations of gene sets that correspond well to molecular processes in the respective tissues. In addition, we display special sets of housekeeping genes and of consistently weakly and highly expressed genes using SOM data filtering.

Conclusions:
The presented methods allow the comprehensive downstream analysis of SOM-transformed expression data in terms of cluster-related gene lists and enriched gene sets for functional interpretation. SOM clustering makes it possible either to define new gene sets using selected SOM spots or to verify and/or amend existing ones.
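The contrast between fold-change scores and regularized t-statistics mentioned in the Results can be sketched as follows. The abstract does not state which regularization is used, so this example assumes one common variant: a Welch-type t-statistic with a small constant `s0` added to the standard error. The constant, the function names, and the toy values are all assumptions for illustration.

```python
import math
from statistics import mean, variance


def log2_fold_change(a, b):
    """Difference of group means; assumes values are already on a log2 scale."""
    return mean(a) - mean(b)


def regularized_t(a, b, s0=0.1):
    """Welch-type t-statistic with a constant s0 added to the standard error.

    s0 is a hypothetical regularization constant; it damps genes whose tiny
    variance would otherwise inflate the ordinary t-statistic.
    """
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / (se + s0)


if __name__ == "__main__":
    # A gene with a negligible but very consistent difference between groups.
    quiet_a, quiet_b = [5.00, 5.01, 5.02], [4.98, 4.99, 5.00]
    # A gene with a large difference and ordinary variability.
    strong_a, strong_b = [7.0, 8.0, 9.0], [4.0, 5.0, 6.0]

    for name, a, b in [("quiet", quiet_a, quiet_b), ("strong", strong_a, strong_b)]:
        print(name, log2_fold_change(a, b), regularized_t(a, b, s0=0.0), regularized_t(a, b))
```

With `s0=0.0` the statistic reduces to the ordinary Welch t, under which the two toy genes score much closer together than their fold changes suggest; with a positive `s0` the low-variance gene is pushed toward zero.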

    Generalized gene co-expression analysis via subspace clustering using low-rank representation

    BACKGROUND: Gene Co-expression Network Analysis (GCNA) helps identify gene modules with potential biological functions and has become a popular method in bioinformatics and biomedical research. However, most current GCNA algorithms use correlation to build gene co-expression networks and identify modules of highly correlated genes. There is a need to look beyond correlation and identify gene modules using other similarity measures in order to find novel, biologically meaningful modules. RESULTS: We propose a new generalized gene co-expression analysis algorithm via subspace clustering that can identify biologically meaningful gene co-expression modules whose genes are not all highly correlated. We use low-rank representation to construct gene co-expression networks and local maximal quasi-clique merger to identify gene co-expression modules. We applied our method to three large microarray datasets and a single-cell RNA sequencing dataset. We demonstrate that our method can identify gene modules with biological functions different from those found by current GCNA methods, and can find gene modules with prognostic value. CONCLUSIONS: The presented method takes advantage of subspace clustering to generate gene co-expression networks rather than using correlation as the similarity measure between genes. Our generalized GCNA method can provide new insights from gene expression datasets and serve as a complement to current GCNA algorithms.

    Random-set methods identify distinct aspects of the enrichment signal in gene-set analysis

    A prespecified set of genes may be enriched, to varying degrees, for genes that have altered expression levels relative to two or more states of a cell. Knowing the enrichment of gene sets defined by functional categories, such as gene ontology (GO) annotations, is valuable for analyzing the biological signals in microarray expression data. A common approach to measuring enrichment is by cross-classifying genes according to membership in a functional category and membership on a selected list of significantly altered genes. A small Fisher's exact test $p$-value, for example, in this $2\times 2$ table is indicative of enrichment. Other category analysis methods retain the quantitative gene-level scores and measure significance by referring a category-level statistic to a permutation distribution associated with the original differential expression problem. We describe a class of random-set scoring methods that measure distinct components of the enrichment signal. The class includes Fisher's test based on selected genes and also tests that average gene-level evidence across the category. Averaging and selection methods are compared empirically using Affymetrix data on expression in nasopharyngeal cancer tissue, and theoretically using a location model of differential expression. We find that each method has a domain of superiority in the state space of enrichment problems, and that both methods have benefits in practice. Our analysis also addresses two problems related to multiple-category inference, namely, that equally enriched categories are not detected with equal probability if they are of different sizes, and also that there is dependence among category statistics owing to shared genes. Random-set enrichment calculations do not require Monte Carlo for implementation. They are made available in the R package allez.
    Comment: Published at http://dx.doi.org/10.1214/07-AOAS104 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
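    The two kinds of enrichment score described above, selection and averaging, can be sketched without Monte Carlo. The example below is an illustration under stated assumptions, not the interface of the allez package: `fisher_enrichment_p` is a one-sided Fisher's exact test computed from the hypergeometric distribution, and `random_set_z` standardizes the category mean using the standard mean and variance of a sample drawn without replacement from a finite population. Both function names and the toy numbers are hypothetical.

```python
import math


def fisher_enrichment_p(n_genes, n_category, n_selected, n_overlap):
    """One-sided Fisher's exact test: P(overlap >= n_overlap) for a 2x2 table.

    n_genes: all genes; n_category: genes in the category;
    n_selected: genes on the selected list; n_overlap: genes in both.
    """
    total = math.comb(n_genes, n_selected)
    upper = min(n_category, n_selected)
    tail = sum(
        math.comb(n_category, k) * math.comb(n_genes - n_category, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def random_set_z(scores, category):
    """Standardized category mean under a random set of the same size.

    scores: gene-level scores for all genes; category: indices of category genes.
    Requires 0 < len(category) < len(scores) and non-constant scores.
    """
    g, m = len(scores), len(category)
    mu = sum(scores) / g
    sigma2 = sum((s - mu) ** 2 for s in scores) / g  # population variance
    xbar = sum(scores[i] for i in category) / m
    # Variance of a mean of m draws without replacement from g values.
    var = (sigma2 / m) * ((g - m) / (g - 1))
    return (xbar - mu) / math.sqrt(var)


if __name__ == "__main__":
    # 20 genes, 5 in the category, 6 selected, 4 of the selected are in the category.
    print(fisher_enrichment_p(20, 5, 6, 4))
    # Six gene-level scores; the category holds the two highest-scoring genes.
    print(random_set_z([1, 2, 3, 4, 5, 6], [4, 5]))
```

    The selection score uses only list membership, while the averaging score uses every gene-level value in the category, which is one way to see why the two can behave differently on the same data.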

    Strain-dependent host transcriptional responses to Toxoplasma infection are largely conserved in mammalian and avian hosts

    Toxoplasma gondii has a remarkable ability to infect an enormous variety of mammalian and avian species. Given this, it is surprising that three strains (Types I/II/III) account for the majority of isolates from Europe/North America. The selective pressures that have driven the emergence of these particular strains, however, remain enigmatic. We hypothesized that strain selection might be partially driven by adaptation of strains for mammalian versus avian hosts. To test this, we examined strain-dependent host responses in vitro in fibroblasts of a representative avian host, the chicken (Gallus gallus). Using gene expression profiling of infected chicken embryonic fibroblasts and pathway analysis to assess host response, we show here that chicken cells respond with distinct transcriptional profiles upon infection with Type II versus III strains that are reminiscent of profiles observed in mammalian cells. To identify the parasite drivers of these differences, chicken fibroblasts were infected with individual F1 progeny of a Type II x III cross and host gene expression was assessed for each by microarray. QTL mapping of transcriptional differences suggested, and deletion strains confirmed, that, as in mammalian cells, the polymorphic rhoptry kinase ROP16 is the major driver of strain-specific responses. We originally hypothesized that comparing avian versus mammalian host response might reveal an inversion in parasite strain-dependent phenotypes; specifically, for polymorphic effectors like ROP16, we hypothesized that the allele with most activity in mammalian cells might be less active in avian cells. Instead, we found that activity of ROP16 alleles appears to be conserved across host species; moreover, additional parasite loci that were previously mapped for strain-specific effects on mammalian response showed similar strain-specific effects in chicken cells.
    These results indicate that if different hosts select for different parasite genotypes, the selection operates downstream of the signaling occurring during the beginning of the host's immune response. © 2011 Ong et al.

    Preferential regulation of stably expressed genes in the human genome suggests a widespread expression buffering role of microRNAs

    In this study, we comprehensively explored the stably expressed genes (SE genes) and fluctuant genes (FL genes) in the human genome by a meta-analysis of large-scale microarray data. We found that these genes have distinct functional distributions. miRNA targets are shown to be significantly enriched in SE genes by using propensity analysis of miRNA regulation, supporting the hypothesis that miRNAs can buffer whole-genome expression fluctuation. The expression-buffering effect of miRNA is independent of the target site number within the 3'-untranslated region. In addition, we found that gene expression fluctuation is positively correlated with the number of transcription factor binding sites in the promoter region, which suggests that coordination between transcription factors and miRNAs leads to balanced responses to external perturbations.

    A microarray study of carpet-shell clam (Ruditapes decussatus) shows common and organ-specific growth-related gene expression differences in gills and digestive gland

    Growth rate is one of the most important traits from the point of view of individual fitness and commercial production in mollusks, but its molecular and physiological basis is poorly known. We have studied differential gene expression related to differences in growth rate in adult individuals of the commercial marine clam Ruditapes decussatus. Gene expression in the gills and the digestive gland was analyzed in five fast-growing and five slow-growing animals by means of an oligonucleotide microarray containing 14,003 probes. A total of 356 differentially expressed genes (DEG) were found. We tested the hypothesis that differential expression might be concentrated at the growth control gene core (GCGC), i.e., the set of genes that underlie the molecular mechanisms of genetic control of tissue and organ growth and body size, as demonstrated in model organisms. The GCGC includes the genes coding for enzymes of the insulin/insulin-like growth factor signaling pathway (IIS), enzymes of four additional signaling pathways (Raf/Ras/Mapk, Jnk, TOR, and Hippo), and transcription factors acting at the end of those pathways. Only two out of 97 GCGC genes present in the microarray showed differential expression, indicating a very small contribution of GCGC genes to growth-related differential gene expression. Forty-eight DEGs were shared by both organs, with gene ontology (GO) annotations corresponding to transcription regulation, RNA splicing, sugar metabolism, protein catabolism, immunity, defense against pathogens, and fatty acid biosynthesis. GO term enrichment tests indicated that genes related to growth regulation, development and morphogenesis, extracellular matrix proteins, and proteolysis were overrepresented in the gills. In the digestive gland, overrepresented GO terms referred to gene expression control through chromatin rearrangement, RAS-related small GTPases, glycolysis, and energy metabolism.
    These analyses suggest a relevant role of, among others, some genes related to the IIS, such as the ParaHox gene Xlox, CCAR and the CCN family of secreted proteins, in the regulation of growth in bivalves. Direccion General de Investigacion Cientifica y Tecnica of the Spanish Government [AGL2010-16743, AGL2013-49144-C3-3-R]; COMPETE Program; Portuguese National Funds [PEst-255 C/MAR/LA0015/2011]; Portuguese FCT [UID/Multi/04326/2013]; Generalitat Valenciana; Ministry of Education, Culture, and Sports of the Spanish Government; Association of European Marine Biology Laboratories

    Conservation of a microRNA cluster in parasitic nematodes and profiling of miRNAs in excretory-secretory products and microvesicles of Haemonchus contortus

    MicroRNAs (miRNAs) are small non-coding RNAs that are important regulators of gene expression in a range of animals, including nematodes. We have analysed a cluster of four miRNAs from the pathogenic nematode species Haemonchus contortus that are closely linked in the genome. We find that the cluster is conserved only in clade V parasitic nematodes and in some ascarids, but not in other clade III species nor in clade V free-living nematodes. Members of the cluster are present in parasite excretory-secretory products and can be detected in the abomasum and draining lymph nodes of infected sheep, indicating their release in vitro and in vivo. As observed for other parasitic nematodes, H. contortus adult worms release extracellular vesicles (EV). Small RNA libraries were prepared from vesicle-enriched and vesicle-depleted supernatants from both adult worms and L4 stage larvae. Comparison of the miRNA species in the different fractions indicated that specific miRNAs are packaged within vesicles, while others are more abundant in vesicle-depleted supernatant. Hierarchical clustering analysis indicated that the gut is the likely source of vesicle-associated miRNAs in the L4 stage, but not in the adult worm. These findings add to the growing body of work demonstrating that miRNAs released from parasitic helminths may play an important role in host-parasite interactions.