
    agriGO: a GO analysis toolkit for the agricultural community

    Gene Ontology (GO), the de facto standard in gene functionality description, is used widely in functional annotation and enrichment analysis. Here, we introduce agriGO, an integrated web-based GO analysis toolkit for the agricultural community, using the advantages of our previous GO enrichment tool (EasyGO), to meet analysis demands from new technologies and research objectives. EasyGO is valuable for its proficiency, and has proved useful in uncovering biological knowledge in massive data sets from high-throughput experiments. For agriGO, the system architecture and website interface were redesigned to improve performance and accessibility. The supported organisms and gene identifiers were substantially expanded (including 38 agricultural species composed of 274 data types). The requirement on user input is more flexible, in that user-defined reference and annotation are accepted. Moreover, a new analysis approach using Gene Set Enrichment Analysis strategy and customizable features is provided. Four tools, SEA (Singular enrichment analysis), PAGE (Parametric Analysis of Gene set Enrichment), BLAST4ID (Transfer IDs by BLAST) and SEACOMPARE (Cross comparison of SEA), are integrated as a toolkit to meet different demands. We also provide a cross-comparison service so that different data sets can be compared and explored in a visualized way. Lastly, agriGO functions as a GO data repository with search and download functions; agriGO is publicly accessible at http://bioinfo.cau.edu.cn/agriGO/
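
    The singular enrichment analysis (SEA) mentioned above asks whether a GO term is over-represented in a query gene list relative to a reference list. The abstract does not say which statistical test agriGO applies, so the following is only a minimal sketch that assumes a one-sided hypergeometric test; the function names, term labels, and gene IDs are hypothetical, and no multiple-testing correction is included.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n annotated, N drawn)."""
    upper = min(n, N)
    total = comb(M, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / total


def enrich(query, reference, annotations):
    """Rank terms by over-representation in `query` versus `reference`.

    annotations: dict mapping a term label to the set of genes it annotates.
    Returns a list of (term, overlap_count, p_value), smallest p-value first.
    """
    ref = set(reference)
    qry = set(query) & ref          # only genes present in the reference count
    results = []
    for term, genes in annotations.items():
        annotated = set(genes) & ref
        k = len(annotated & qry)
        if k == 0:
            continue                # nothing to test for this term
        p = hypergeom_sf(k, len(ref), len(annotated), len(qry))
        results.append((term, k, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical toy data: 10 reference genes, 2 terms, 3 query genes.
reference = [f"g{i}" for i in range(1, 11)]
annotations = {
    "term_A": {"g1", "g2", "g3"},
    "term_B": {"g4", "g5", "g6", "g7", "g8", "g9"},
}
results = enrich(["g1", "g2", "g3"], reference, annotations)
```

    In practice the raw p-values would be adjusted for the number of terms tested before any term is reported as enriched.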

    ArrayIDer: automated structural re-annotation pipeline for DNA microarrays

    Background: Systems biology modeling from microarray data requires the most contemporary structural and functional array annotation. However, microarray annotations, especially for non-commercial, non-traditional biomedical model organisms, are often dated. In addition, most microarray analysis tools do not readily accept EST clone names, which are abundantly represented on arrays. Manual re-annotation of microarrays is impracticable and so we developed a computational re-annotation tool (ArrayIDer) to retrieve the most recent accession mapping files from public databases based on EST clone names or accessions and rapidly generate database accessions for entire microarrays. Results: We utilized the Fred Hutchinson Cancer Research Centre 13K chicken cDNA array – a widely-used non-commercial chicken microarray – to demonstrate the principle that ArrayIDer could markedly improve annotation. We structurally re-annotated 55% of the entire array. Moreover, we decreased non-chicken functional annotations by 2-fold. One beneficial consequence of our re-annotation was to identify 290 pseudogenes, of which 66 were previously incorrectly annotated. Conclusion: ArrayIDer allows rapid automated structural re-annotation of entire arrays and provides multiple accession types for use in subsequent functional analysis. This information is especially valuable for systems biology modeling in the non-traditional biomedical model organisms.

    Statistical approaches of gene set analysis with quantitative trait loci for high-throughput genomic studies.

    Recently, gene set analysis has become the first choice for gaining insights into the underlying complex biology of diseases through high-throughput genomic studies, such as Microarrays, bulk RNA-Sequencing, and single-cell RNA-Sequencing. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. However, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. Hence, a comprehensive overview of the available gene set analysis approaches used for different high-throughput genomic studies is provided. The analysis of gene sets is usually carried out based on gene ontology terms, known biological pathways, etc., which may not establish any formal relation between genotype and trait-specific phenotype. Further, in plant biology and breeding, gene set analysis with trait-specific Quantitative Trait Loci data is considered to be a great source for biological knowledge discovery. Therefore, innovative statistical approaches are developed for analyzing and interpreting gene expression data from Microarray and RNA-sequencing studies in the context of gene sets with trait-specific Quantitative Trait Loci. The utility of the developed approaches is studied on multiple real gene expression datasets obtained from various Microarray and RNA-sequencing studies. The selection of gene sets through differential expression analysis is the primary step of gene set analysis, which can be achieved using gene selection methods. The existing methods for such analysis in high-throughput studies, such as Microarray and RNA-sequencing studies, suffer from serious limitations. For instance, in Microarrays, most of the available methods are based on either relevancy or redundancy measures. 
Through these methods, the ranking of genes is done on single Microarray expression data, which leads to the selection of spuriously associated and redundant gene sets. Therefore, newer and innovative differential expression analytical methods have been developed for Microarray and single-cell RNA-sequencing studies to identify gene sets and successfully carry out the gene set and other downstream analyses. Furthermore, several methods specifically designed for single-cell data have been developed in the literature for differential expression analysis. To provide guidance on choosing an appropriate tool or developing a new one, it is necessary to review the performance of the existing methods. Hence, a comprehensive overview, classification, and comparative study of the available single-cell methods is hereby undertaken to study their unique features, underlying statistical models and their shortcomings on real applications. Moreover, to address one of the shortcomings (i.e., higher dropout events due to lower cell capture rates), an improved statistical method for downstream analysis of single-cell data has been developed. From the users’ point of view, the different developed statistical methods are implemented in various software tools and made publicly available. These methods and tools will help experimental biologists and genome researchers to analyze their experimental data more objectively and efficiently. Moreover, the limitations and shortcomings of the available methods are reported in this study, and these need to be addressed by statisticians and biologists collectively to develop efficient approaches. These new approaches will be able to analyze high-throughput genomic data more efficiently to better understand the biological systems and increase the specificity, sensitivity, utility, and relevance of high-throughput genomic studies.

    Genome-wide gene expression surveys and a transcriptome map in chicken

    The chicken (Gallus gallus) is an important model organism in genetics, developmental biology, immunology, evolutionary research, and agricultural science. The completeness of the draft chicken genome sequence provided new possibilities to study genomic changes during evolution by comparing the chicken genome to that of other species. The development of long oligonucleotide microarrays based on the genome sequence made it possible to survey genome-wide gene expression in chicken. This thesis describes two gene expression surveys across a range of healthy chicken tissues in both adult and embryonic stages. Specifically, we focus on the mechanisms of regulation of gene transcription and their evolution in the vertebrate genome. Chapter 1 provides a brief history of the chicken as a model organism in biological and genomics research. In particular, a brief overview of expression profiling experiments is presented, followed by an introduction to gene transcription regulation in general. Finally, the aim and outline of this thesis are presented. An important aim of this thesis is to generate surveys of genome-wide gene expression data in chicken using microarrays. In chapter 2, we introduce microarray data normalization including background correction, within-array normalization and between-array normalization. Based on these results an analysis approach is recommended for the analysis of two-color microarray data as performed in the experiments described in this thesis. We also briefly explain the relevant methodology for the identification of differentially expressed genes and how to translate resulting gene lists into biological knowledge. Finally, specific issues related to updating microarray probe annotation in farm animals are discussed. For the analysis of the microarray data in this thesis, re-annotation of the probes on the chicken 20K oligoarray was done using the oligoRAP analysis pipeline. 
The vast amount of data generated from a single transcriptomics study makes it impossible to extract meaningful biological knowledge by manually going through individual genes from a list with hundreds or thousands of differentially expressed genes. In chapter 3, we present a practical approach using a collection of R/Bioconductor packages to extract biological knowledge from a microarray experiment in farm animals. Furthermore, a locally adaptive statistical procedure (LAP) analysis approach is used to identify differentially expressed chromosomal regions in a microarray experiment. Chapter 4 presents a genome-wide gene expression survey across eight different tissues (brain, bursa of Fabricius, kidney, liver, lung, small intestine, spleen, and thymus from 10-week old chickens) in adult birds using a chicken 20K microarray. To a certain extent, most genes show some tissue-specific pattern of expression. Housekeeping and tissue-specific genes are identified based on gene expression patterns across the eight different tissues. The results show that housekeeping genes are more compact, i.e., they are smaller, with shorter coding sequences, shorter introns, and shorter intergenic regions. This observed compactness of housekeeping genes may be a result of selection on economy of transcription during evolution. Furthermore, a comparative analysis of gene expression among mouse, chicken, and frog showed that the expression patterns of orthologous genes are conserved during evolution between mammals, birds, and amphibians. The chicken embryo has been a very popular model for developmental biology. To study the overall gene expression pattern in whole chicken embryos at different developmental stages and/or embryonic tissues, a genome-wide gene expression survey across different developmental and embryonic stages was performed (chapter 5). 
The study included four different developmental stages (HH stage 3, 10, 15, 22) and eight different embryonic tissues (brain, bursa of Fabricius, heart, kidney, liver, lung, small intestine, and spleen from HH stage 36). We were able to identify several embryonic stage- and tissue-specific genes in our analysis. Genomic features of genes widely expressed under these 12 conditions suggest that widely expressed genes are more compact than tissue-specific genes, confirming the findings described in chapter 4. The analysis of the differentially expressed genes during the different developmental stages of whole embryo indicates a gradual change in gene expression during embryo development. A comparison of the gene expression profiles between the same organs of adults and embryos reveals both striking similarities and differences. The overall goal of this thesis was to improve our understanding of the mechanisms of transcriptional regulation in the chicken. In chapter 6, a transcriptome map for all chicken chromosomes is presented based on the expression data described in chapter 4. The results reveal the presence of two distinct types of chromosomal regions characterized by clusters of highly or lowly expressed genes, respectively. Furthermore, these regions show a high correlation with a number of genome characteristics, like gene density, gene length, intron length, and GC content. A comparative analysis between the chicken and human transcriptome maps suggests that the regions with clusters of highly expressed genes are relatively conserved between the two genomes. Our results revealed the presence of a higher order organization of the chicken genome that affects gene expression, confirming similar observations in other species. Finally, in chapter 7 I summarize the main findings and discuss some of the limitations of the analyses described in this thesis. 
I also discuss the different merits and shortcomings of studying gene expression using either microarrays or next-generation sequencing technology and propose directions for future research. The rapid developments in next-generation sequencing technology will facilitate better coverage and depth of the chicken genome. This will provide a better genome assembly and an improved genome annotation. The sequence-based approaches for studying gene expression will reduce noise levels compared to hybridization-based approaches. Overall, next-generation sequencing is already providing greatly enhanced tools to further improve our understanding of the chicken transcriptome and its regulation.

    Development and Application of Comparative Gene Co-expression Network Methods in Brachypodium distachyon

    Gene discovery and characterization is a long and labor-intensive process. Gene co-expression network analysis is a long-standing powerful approach that can strongly enrich signals within gene expression datasets to predict genes critical for many cellular functions. Leveraging this approach with a large number of transcriptome datasets does not yield a concomitant increase in network granularity. Independently generated datasets that describe gene expression in various tissues, developmental stages, times of day, and environments can carry conflicting co-expression signals. The gene expression responses of the model C3 grass Brachypodium distachyon to abiotic stress are characterized by a co-expression-based analysis, identifying 22 modules of genes, annotated with putative DNA regulatory elements and functional terms. A great deal of co-expression elasticity is found among the genes characterized therein. An algorithm, dGCNA, designed to determine statistically significant changes in gene-gene co-expression relationships is presented. The algorithm is demonstrated on the very well-characterized circadian system of Arabidopsis thaliana, and identifies potential strong signals of molecular interactions between a specific transcription factor and putative target gene loci. Lastly, this network comparison approach based on edge-wise similarities is demonstrated on many pairwise comparisons of independent microarray datasets, to demonstrate the utility of fine-grained network comparison, rather than amassing as large a dataset as possible. This approach identifies a set of 182 gene loci which are differentially expressed under drought stress, change their co-expression strongly under loss of thermocycles or high-salinity stress, and are associated with cell-cycle and DNA replication functions. This set of genes provides excellent candidates for the generation of rhythmic growth under thermocycles in Brachypodium distachyon.
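
    The abstract does not describe the statistic dGCNA uses, so the sketch below is not dGCNA. It only illustrates one textbook way to test whether a single gene-gene co-expression edge changes between two datasets: compare the two Pearson correlations after a Fisher z-transform. The function names and expression values are hypothetical, and each dataset is assumed to have more than three samples.

```python
from math import atanh, erfc, sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def diff_coexpression(x1, y1, x2, y2):
    """Test whether the correlation of a gene pair differs between two datasets.

    (x1, y1) are the two genes' profiles in dataset 1, (x2, y2) in dataset 2.
    Returns (r1, r2, z, two_sided_p) from a Fisher z-transform comparison.
    """
    limit = 0.999999                       # keep atanh finite when |r| is 1
    r1 = max(-limit, min(limit, pearson(x1, y1)))
    r2 = max(-limit, min(limit, pearson(x2, y2)))
    n1, n2 = len(x1), len(x2)
    z = (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    p = erfc(abs(z) / sqrt(2))             # two-sided normal tail probability
    return r1, r2, z, p


# Hypothetical profiles: positively correlated in one condition,
# negatively correlated in the other.
gene_a = [1, 2, 3, 4, 5, 6]
gene_b_cond1 = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8]
gene_b_cond2 = [6.0, 4.9, 4.1, 3.0, 2.2, 0.9]
r1, r2, z, p = diff_coexpression(gene_a, gene_b_cond1, gene_a, gene_b_cond2)
```

    Applied across every gene pair in a network, a test like this would need multiple-testing correction, since the number of edges grows quadratically with the number of genes.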

    g:Profiler—a web server for functional interpretation of gene lists (2011 update)

    Functional interpretation of candidate gene lists is an essential task in modern biomedical research. Here, we present the 2011 update of g:Profiler (http://biit.cs.ut.ee/gprofiler/), a popular collection of web tools for functional analysis. g:GOSt and g:Cocoa combine comprehensive methods for interpreting gene lists, ordered lists and list collections in the context of biomedical ontologies, pathways, transcription factor and microRNA regulatory motifs and protein–protein interactions. Additional tools, namely the biomolecule ID mapping service (g:Convert), gene expression similarity searcher (g:Sorter) and gene homology searcher (g:Orth) provide numerous ways for further analysis and interpretation. In this update, we have implemented several features of interest to the community: (i) functional analysis of single nucleotide polymorphisms and other DNA polymorphisms is supported by chromosomal queries; (ii) network analysis identifies enriched protein–protein interaction modules in gene lists; (iii) functional analysis covers human disease genes; and (iv) improved statistics and filtering provide more concise results. g:Profiler is a regularly updated resource that is available for a wide range of species, including mammals, plants, fungi and insects