
    Gene Expression Atlas update—a value-added database of microarray and sequencing-based functional genomics experiments

    Gene Expression Atlas (http://www.ebi.ac.uk/gxa) is an added-value database providing information about gene expression in different cell types, organism parts, developmental stages, disease states, sample treatments and other biological/experimental conditions. The content of this database derives from curation, re-annotation and statistical analysis of selected data from the ArrayExpress Archive and the European Nucleotide Archive. A simple interface allows the user to query for differential gene expression either by gene names or attributes, or by biological conditions such as diseases, organism parts or cell types. Since our previous report, we have made 20 monthly releases; as of Release 11.08 (August 2011), the database supports 19 species and contains expression data measured for 19 014 biological conditions in 136 551 assays from 5598 independent studies.

    WormBase 2012: more genomes, more data, new website

    Since its release in 2000, WormBase (http://www.wormbase.org) has grown from a small resource focusing on a single species and serving a dedicated research community into one now spanning 15 species essential to the broader biomedical and agricultural research fields. To enhance the rate of curation, we have automated the identification of key data in the scientific literature and use similar methods for data extraction. To ease access to the data, we are collaborating with journals to link entities in research publications to their report pages at WormBase. To facilitate discovery, we have added new views of the data, integrated large-scale datasets and expanded descriptions of models for human disease. Finally, we have introduced a dramatic overhaul of the WormBase website for public beta testing. Designed to balance complexity and usability, the new site is species-agnostic, highly customizable and interactive. Casual users and developers alike will be able to leverage the public RESTful application programming interface (API) to generate custom data mining solutions and extensions to the site. We report on the growth of our database and on our work in keeping pace with the growing demand for data, our efforts to anticipate users' requirements and new collaborations with the larger scientific community.

    GEORAC: an RNA-seq Atlas Constructor for the Gene Expression Omnibus

    The meteoric rise of next-generation sequencing technologies over the past 15 years has resulted in a vast amount of data generated by modern biological and clinical studies. RNA sequencing, commonly referred to as RNA-Seq, is a next-generation approach capable of surveying and quantifying whole-organism transcriptomes. RNA-Seq methods are preferred over microarray assays for their ability to avoid cross-hybridization signal noise, to quantify gene or transcript expression without assay-specific upper limits, to natively provide single-nucleotide genomic resolution, and to allow de novo transcriptome assembly. Many thousands of RNA-Seq studies have been published over the past seven years, and a significant area of bioinformatics research has focused on creating atlases that aggregate RNA-Seq results. These atlases are highly useful for surveying trends in gene expression across published studies, for inspecting potentially contentious claims made by novel or prior work, and for synthesizing future research directions. The Expression Atlas currently serves as the canonical example of an RNA-Seq atlas and presents results from over 3,000 studies across numerous model research organisms. One issue with the Expression Atlas is that it imposes a uniform secondary re-analysis pipeline on every RNA-Seq study incorporated within its database; this approach presents a conceptual challenge for studies whose results were generated and published using established, well-tested workflows. Thus, there is a critical need for a means of constructing RNA-Seq atlases that precisely reflect the original results presented in the literature. The primary objective of this dissertation is to provide a workflow that allows transparent, reproducible construction of RNA-Seq atlases from study metadata and expression data housed within the National Center for Biotechnology Information's (NCBI) Gene Expression Omnibus (GEO).
    The challenge of this goal is exacerbated by the highly flexible design of GEO, which allows researchers to define novel metadata attributes and values at will and to submit expression results in virtually any format. Following an introductory background on modern genomics and RNA-Seq, the second chapter of this work presents GEOMP, a metadata parser and relational database constructor for the Gene Expression Omnibus. The third chapter describes GEOMP2, an in-place augmentation of GEOMP that further atomizes and loads sample-specific characteristics tags; notably, this chapter also presents results from a pilot study that used metadata parsed and output by GEOMP2 to survey the reproducibility of bioinformatics methods across the zebrafish, mouse, and human research communities. Chapter four details GEORGET, a pipeline designed to rehabilitate, translate, and load expression data pulled from GEO into the relational database store constructed by GEOMP2. Chapter five concludes with a discussion of future directions needed to expand and improve upon the current GEORAC workflow and the associated methods reproducibility study.
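The atomization of sample characteristics tags described for GEOMP2 can be illustrated with a minimal Python sketch. It is not GEOMP2's code. It assumes SOFT-formatted input in which each `!Sample_characteristics_ch1` line holds one free-text `key: value` pair, and the accession IDs and attribute values are hypothetical placeholders. Each pair is split, keys are lower-cased so that case variants collapse, and the rows are loaded into an in-memory SQLite table that can then be queried by attribute.

```python
import sqlite3

# Hypothetical SOFT-style excerpt; accessions and values are placeholders.
SOFT_TEXT = """\
^SAMPLE = GSM0000001
!Sample_title = liver replicate 1
!Sample_characteristics_ch1 = tissue: liver
!Sample_characteristics_ch1 = age: 8 weeks
^SAMPLE = GSM0000002
!Sample_characteristics_ch1 = tissue: heart
!Sample_characteristics_ch1 = Strain:C57BL/6
"""


def parse_characteristics(soft_text):
    """Yield (sample_id, key, value) triples from characteristics lines."""
    sample_id = None
    for line in soft_text.splitlines():
        if line.startswith("^SAMPLE"):
            sample_id = line.split("=", 1)[1].strip()
        elif line.startswith("!Sample_characteristics_ch") and sample_id:
            tag = line.split("=", 1)[1].strip()
            if ":" in tag:
                key, value = tag.split(":", 1)
                # Lower-case keys so case variants ("Strain", "strain") collapse.
                yield sample_id, key.strip().lower(), value.strip()
            else:
                # Tags without a key/value separator are kept under a generic key.
                yield sample_id, "characteristic", tag


def load_characteristics(rows):
    """Load triples into an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE characteristic (sample_id TEXT, key TEXT, value TEXT)"
    )
    conn.executemany("INSERT INTO characteristic VALUES (?, ?, ?)", rows)
    return conn


conn = load_characteristics(parse_characteristics(SOFT_TEXT))
rows = conn.execute(
    "SELECT sample_id, value FROM characteristic "
    "WHERE key = 'tissue' ORDER BY sample_id"
).fetchall()
print(rows)  # [('GSM0000001', 'liver'), ('GSM0000002', 'heart')]
```

Storing one row per attribute, rather than one column per attribute, lets the table absorb whatever novel keys submitters invent without schema changes.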

    Gramene 2016: comparative plant genomics and pathway resources

    Gramene (http://www.gramene.org) is an online resource for comparative functional genomics in crops and model plant species. Its two main frameworks are genomes (in collaboration with Ensembl Plants) and pathways (The Plant Reactome and archival BioCyc databases). Since our last NAR update, the database website has adopted a new Drupal management platform. The genomes section features 39 fully assembled reference genomes that are integrated using ontology-based annotation and comparative analyses, and accessed through both visual and programmatic interfaces. Additional community data, such as genetic variation, expression and methylation, are also mapped for a subset of genomes. The Plant Reactome pathway portal (http://plantreactome.gramene.org) provides a reference resource for analyzing plant metabolic and regulatory pathways. In addition to approximately 200 curated rice reference pathways, the portal hosts gene homology-based pathway projections for 33 plant species. Both the genome and pathway browsers interface with the EMBL-EBI's Expression Atlas to enable the projection of baseline and differential expression data from curated expression studies in plants. Gramene's archive website (http://archive.gramene.org) continues to provide previously reported resources on comparative maps, markers and QTL. To further aid our users, we have also introduced a live monthly educational webinar series and a Gramene YouTube channel carrying video tutorials.

    Detailed transcriptome atlas of the pancreatic beta cell

    BACKGROUND: Gene expression patterns provide a detailed view of cellular functions. Comparison of profiles in disease versus normal conditions provides insights into the processes underlying disease progression. However, the availability and integration of public gene expression datasets remain a major challenge. The aim of the present study was to explore the transcriptome of pancreatic islets and, based on this information, to prepare a comprehensive and open-access inventory of insulin-producing beta cell gene expression, the Beta Cell Gene Atlas (BCGA). METHODS: We performed Massively Parallel Signature Sequencing (MPSS) analysis of human pancreatic islet samples and microarray analyses of purified rat beta cells, alpha cells and INS-1 cells, and compared the information with available array data in the literature. RESULTS: MPSS analysis detected around 7600 mRNA transcripts, of which around a third were of low abundance. We identified 2000 and 1400 transcripts that are enriched/depleted in beta cells compared to alpha cells and INS-1 cells, respectively. Microarray analysis identified around 200 transcription factors that are differentially expressed in either beta or alpha cells. We reanalyzed publicly available gene expression data and integrated these results with the new data from this study to build the BCGA. The BCGA contains basal (untreated conditions) gene expression level estimates in beta cells as well as in different cell types in human, rat and mouse pancreas. Hierarchical clustering of expression profile estimates grouped cell types by species, whereas beta cells clustered together. CONCLUSION: Our gene atlas is a valuable source of detailed information on the gene expression distribution in beta cells and pancreatic islets, along with insulin-producing cell lines. The BCGA tool, as well as the data and code used to generate the Atlas, are available at the T1Dbase website (T1DBase.org).
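The enriched/depleted comparison between cell types can be pictured with a simple fold-change rule. This is a hedged sketch only: the abstract does not state the statistical procedure used, so the log2 fold-change threshold of 1, the pseudocount of 1 and the expression values for the placeholder genes below are all assumptions chosen for illustration.

```python
import math


def classify_transcripts(beta, other, min_log2_fc=1.0, pseudocount=1.0):
    """Label genes enriched/depleted in beta cells relative to `other`.

    Uses a plain log2 fold-change cut-off (an assumption, not the study's
    method); genes whose fold change falls inside the threshold are omitted.
    """
    labels = {}
    for gene in beta.keys() & other.keys():
        # The pseudocount avoids division by zero for undetected transcripts.
        log2_fc = math.log2((beta[gene] + pseudocount) / (other[gene] + pseudocount))
        if log2_fc >= min_log2_fc:
            labels[gene] = "enriched"
        elif log2_fc <= -min_log2_fc:
            labels[gene] = "depleted"
    return labels


# Hypothetical expression values for placeholder genes.
beta_cells = {"GENE_A": 5000.0, "GENE_B": 10.0, "GENE_C": 800.0}
alpha_cells = {"GENE_A": 40.0, "GENE_B": 3000.0, "GENE_C": 750.0}
print(classify_transcripts(beta_cells, alpha_cells))
```

A real analysis would replace the fixed cut-off with a statistical test across replicates; the sketch only shows how each transcript ends up in the enriched, depleted or unchanged group.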

    Establishment of an integrative multi-omics expression database CKDdb in the context of chronic kidney disease (CKD)

    Complex human traits such as chronic kidney disease (CKD) are a major health and financial burden in modern societies. Currently, CKD onset and progression are still not fully understood at the molecular level. Meanwhile, the prolific use of high-throughput omic technologies in disease biomarker discovery studies has yielded a vast amount of disjointed data that cannot be easily collated. Therefore, we aimed to develop a molecule-centric database featuring CKD-related experiments from the available literature. By performing literature data mining and manual curation, we established the Chronic Kidney Disease database CKDdb, an integrated and clustered information resource that covers multi-omic studies (microRNAs, genomics, peptidomics, proteomics and metabolomics) of CKD and related disorders. The CKDdb database contains differential expression data from 49395 molecule entries (redundant), of which 16885 are unique molecules (non-redundant), drawn from 377 manually curated studies in 230 publications. This database was intentionally built to allow disease pathway analysis through a systems approach, integrating all existing information to yield biological meaning; it therefore has the potential to provide an in-depth understanding of the key molecular events that modulate CKD pathogenesis.

    BloodChIP: A database of comparative genome-wide transcription factor binding profiles in human blood cells

    The BloodChIP database (http://www.med.unsw.edu.au/CRCWeb.nsf/page/BloodChIP) supports exploration and visualization of combinatorial transcription factor (TF) binding at a particular locus in human CD34-positive and other normal and leukaemic cells, or retrieval of target gene sets for user-defined combinations of TFs across one or more cell types. Increasing numbers of genome-wide TF binding profiles are being added to public repositories, and this trend is likely to continue. For the power of these data sets to be fully harnessed by experimental scientists, these data need to be placed in context and made easily accessible for downstream applications. To this end, we have built a user-friendly database that has at its core the genome-wide binding profiles of seven key haematopoietic TFs in human stem/progenitor cells. These binding profiles are compared with binding profiles in normal differentiated and leukaemic cells. We have integrated these TF binding profiles with chromatin marks and expression data in normal and leukaemic cell fractions. All queries can be exported into external sites to construct TF-gene and protein-protein networks and to evaluate the association of genes with cellular processes and tissue expression. © 2013 The Author(s). Published by Oxford University Press.
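Retrieving target genes for a user-defined combination of TFs across cell types amounts, in the simplest reading, to a set intersection. The sketch below assumes binding calls have already been reduced to per-cell-type, per-TF sets of target genes. The binding sets and GENE_* names are hypothetical placeholders, and the TF names serve only as labels; none of this is BloodChIP data or BloodChIP's implementation.

```python
# Hypothetical binding calls: cell type -> TF -> set of bound target genes.
BINDING = {
    "CD34+": {
        "GATA2": {"GENE_A", "GENE_B", "GENE_C"},
        "TAL1": {"GENE_B", "GENE_C", "GENE_D"},
        "RUNX1": {"GENE_C", "GENE_D"},
    },
    "leukaemic": {
        "GATA2": {"GENE_C"},
        "TAL1": {"GENE_C", "GENE_E"},
        "RUNX1": {"GENE_C", "GENE_E"},
    },
}


def co_bound_targets(binding, tfs, cell_types):
    """Return genes bound by every TF in `tfs` in every cell type listed."""
    common = None
    for cell_type in cell_types:
        for tf in tfs:
            # A TF with no profile in a cell type contributes an empty set.
            targets = binding.get(cell_type, {}).get(tf, set())
            common = set(targets) if common is None else common & targets
    return common if common is not None else set()


print(sorted(co_bound_targets(BINDING, ["GATA2", "TAL1"], ["CD34+"])))
# ['GENE_B', 'GENE_C']
```

Intersecting across cell types answers "which targets are shared everywhere"; replacing `&` with `|` would instead return every gene bound by any selected TF.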

    BioXpress: an integrated RNA-seq-derived gene expression database for pan-cancer analysis

    BioXpress is a gene expression and cancer association database in which expression levels are mapped to genes using RNA-seq data obtained from The Cancer Genome Atlas, the International Cancer Genome Consortium, Expression Atlas and publications. The BioXpress database includes expression data from 64 cancer types, 6361 patients and 17 469 genes, with 9513 of the genes displaying differential expression between tumor and normal samples. In addition to data retrieved directly from RNA-seq data repositories, manual biocuration of publications supplements the available cancer association annotations in the database. All cancer types are mapped to Disease Ontology terms to facilitate a uniform pan-cancer analysis. The BioXpress database can be easily searched using a HUGO Gene Nomenclature Committee gene symbol or a UniProtKB/RefSeq accession or, alternatively, queried by cancer type with specified significance filters. This interface, along with the availability of pre-computed downloadable files containing differentially expressed genes in multiple cancers, enables straightforward retrieval and display of a broad set of cancer-related genes.
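A query by cancer type with significance filters can be sketched as a filter over precomputed differential expression results. The record fields, the default cut-offs (adjusted p-value at most 0.05, absolute log2 fold change at least 1) and the placeholder genes and values are assumptions for illustration; they are not BioXpress's schema, defaults or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DEResult:
    gene: str         # gene symbol (placeholders below)
    cancer_type: str  # cancer type label
    log2_fc: float    # tumour vs normal log2 fold change
    adj_p: float      # multiple-testing-adjusted p-value


# Hypothetical precomputed results; not BioXpress data.
RESULTS = [
    DEResult("GENE_A", "breast cancer", 1.8, 0.001),
    DEResult("GENE_B", "breast cancer", 2.5, 0.0001),
    DEResult("GENE_C", "breast cancer", 0.1, 0.6),
    DEResult("GENE_D", "breast cancer", -1.4, 0.2),
    DEResult("GENE_E", "lung cancer", -1.2, 0.02),
]


def query_by_cancer_type(results, cancer_type, max_adj_p=0.05, min_abs_log2_fc=1.0):
    """Return significant genes for one cancer type, most significant first."""
    hits = [
        r for r in results
        if r.cancer_type == cancer_type
        and r.adj_p <= max_adj_p
        and abs(r.log2_fc) >= min_abs_log2_fc
    ]
    return sorted(hits, key=lambda r: r.adj_p)


print([r.gene for r in query_by_cancer_type(RESULTS, "breast cancer")])
# ['GENE_B', 'GENE_A']
```

Filtering on both the adjusted p-value and the absolute fold change keeps genes that are down-regulated in tumours (negative log2 fold change) as well as up-regulated ones.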