130 research outputs found

    The mouse Gene Expression Database (GXD): 2021 update.

    The Gene Expression Database (GXD; www.informatics.jax.org/expression.shtml) is an extensive and well-curated community resource of mouse developmental gene expression information. For many years, GXD has collected and integrated data from RNA in situ hybridization, immunohistochemistry, RT-PCR, northern blot, and western blot experiments through curation of the scientific literature and by collaborations with large-scale expression projects. Since our last report in 2019, we have continued to acquire these classical types of expression data; developed a searchable index of RNA-Seq and microarray experiments that allows users to quickly and reliably find specific mouse expression studies in ArrayExpress (https://www.ebi.ac.uk/arrayexpress/) and GEO (https://www.ncbi.nlm.nih.gov/geo/); and expanded GXD to include RNA-Seq data. Uniformly processed RNA-Seq data are imported from the EBI Expression Atlas and then integrated with the other types of expression data in GXD, and with the genetic, functional, phenotypic and disease-related information in Mouse Genome Informatics (MGI). This integration has made the RNA-Seq data accessible via GXD's enhanced searching and filtering capabilities. Further, we have embedded the Morpheus heat map utility into the GXD user interface to provide additional tools for display and analysis of RNA-Seq data, including heat map visualization, sorting, filtering, hierarchical clustering, nearest neighbors analysis and visual enrichment

    Mouse Genome Informatics (MGI): latest news from MGD and GXD.

    The Mouse Genome Informatics (MGI) database system combines multiple expertly curated community data resources into a shared knowledge management ecosystem united by common metadata annotation standards. MGI's mission is to facilitate the use of the mouse as an experimental model for understanding the genetic and genomic basis of human health and disease. MGI is the authoritative source for mouse gene, allele, and strain nomenclature and is the primary source of mouse phenotype annotations, functional annotations, developmental gene expression information, and annotations of mouse models with human diseases. MGI maintains mouse anatomy and phenotype ontologies and contributes to the development of the Gene Ontology and Disease Ontology and uses these ontologies as standard terminologies for annotation. The Mouse Genome Database (MGD) and the Gene Expression Database (GXD) are MGI's two major knowledgebases. Here, we highlight some of the recent changes and enhancements to MGD and GXD that have been implemented in response to changing needs of the biomedical research community and to improve the efficiency of expert curation. MGI can be accessed freely at http://www.informatics.jax.org

    The mouse Gene Expression Database (GXD): 2019 update.

    The mouse Gene Expression Database (GXD) is an extensive, well-curated community resource freely available at www.informatics.jax.org/expression.shtml. Covering all developmental stages, GXD includes data from RNA in situ hybridization, immunohistochemistry, RT-PCR, northern blot and western blot experiments in wild-type and mutant mice. GXD's gene expression information is integrated with the other data in Mouse Genome Informatics and interconnected with other databases, placing these data in the larger biological and biomedical context. Since the last report, the ability of GXD to provide insights into the molecular mechanisms of development and disease has been greatly enhanced by the addition of new data and by the implementation of new web features. These include: improvements to the Differential Gene Expression Data Search, facilitating searches for genes that have been shown to be exclusively expressed in a specified structure and/or developmental stage; an enhanced anatomy browser that now provides access to expression data and phenotype data for a given anatomical structure; direct access to the wild-type gene expression data for the tissues affected in a specific mutant; and a comparison matrix that juxtaposes tissues where a gene is normally expressed against tissues where mutations in that gene cause abnormalities

    Mouse Genome Database (MGD) 2019.

    The Mouse Genome Database (MGD; http://www.informatics.jax.org) is the community model organism genetic and genome resource for the laboratory mouse. MGD is the authoritative source for biological reference data sets related to mouse genes, gene functions, phenotypes, and mouse models of human disease. MGD is the primary outlet for official gene, allele and mouse strain nomenclature based on the guidelines set by the International Committee on Standardized Nomenclature for Mice. In this report we describe significant enhancements to MGD, including two new graphical user interfaces: (i) the Multi Genome Viewer for exploring the genomes of multiple mouse strains and (ii) the Phenotype-Gene Expression matrix which was developed in collaboration with the Gene Expression Database (GXD) and allows researchers to compare gene expression and phenotype annotations for mouse genes. Other recent improvements include enhanced efficiency of our literature curation processes and the incorporation of Transcriptional Start Site (TSS) annotations from RIKEN's FANTOM 5 initiative

    An effective biomedical document classification scheme in support of biocuration: addressing class imbalance.

    Published literature is an important source of knowledge supporting biomedical research. Given the large and increasing number of publications, automated document classification plays an important role in biomedical research. Effective biomedical document classifiers are especially needed for bio-databases, in which the information stems from many thousands of biomedical publications that curators must read in detail and annotate. In addition, biomedical document classification often amounts to identifying a small subset of relevant publications within a much larger collection of available documents. As such, addressing class imbalance is essential to a practical classifier. We present here an effective classification scheme for automatically identifying papers among a large pool of biomedical publications that contain information relevant to a specific topic, which the curators are interested in annotating. The proposed scheme is based on a meta-classification framework using cluster-based under-sampling combined with named-entity recognition and statistical feature selection strategies. We examined the performance of our method over a large imbalanced data set that was originally manually curated by the Jackson Laboratory's Gene Expression Database (GXD). The set consists of more than 90 000 PubMed abstracts, of which about 13 000 documents are labeled as relevant to GXD while the others are not relevant. Our results, 0.72 precision, 0.80 recall and 0.75 f-measure, demonstrate that our proposed classification scheme effectively categorizes such a large data set in the face of data imbalance
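
To make the class-imbalance point concrete, the following is a minimal sketch of why precision, recall and f-measure are reported rather than accuracy. The data set sizes are the approximate figures quoted in the abstract; the "label everything as not relevant" baseline and the confusion counts in the second example are hypothetical and chosen only for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion counts; 0.0 where undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Approximate sizes taken from the abstract ("more than 90 000", "about 13 000").
TOTAL = 90_000
RELEVANT = 13_000

# Hypothetical baseline: label every abstract "not relevant".
# Accuracy looks high, yet no relevant paper is found.
baseline_accuracy = (TOTAL - RELEVANT) / TOTAL
baseline_scores = precision_recall_f1(tp=0, fp=0, fn=RELEVANT)

# Hypothetical confusion counts, for illustration only.
example_scores = precision_recall_f1(tp=80, fp=20, fn=20)

if __name__ == "__main__":
    print(f"baseline accuracy: {baseline_accuracy:.3f}")
    print("baseline precision/recall/F1:", baseline_scores)
    print("example precision/recall/F1:", example_scores)
```

The baseline scores well on accuracy only because most documents are not relevant, which is why a practical classifier for this task is judged on the minority class.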

    Utilizing image and caption information for biomedical document classification.

    MOTIVATION: Biomedical research findings are typically disseminated through publications. To simplify access to domain-specific knowledge while supporting the research community, several biomedical databases devote significant effort to manual curation of the literature, a labor-intensive process. The first step toward biocuration requires identifying articles relevant to the specific area on which the database focuses. Thus, automatically identifying publications relevant to a specific topic within a large volume of publications is an important task toward expediting the biocuration process and, in turn, biomedical research. Current methods focus on textual contents, typically extracted from the title-and-abstract. Notably, images and captions are often used in publications to convey pivotal evidence about processes, experiments and results. RESULTS: We present a new document classification scheme, using both image and caption information, in addition to titles-and-abstracts. To use the image information, we introduce a new image representation, namely Figure-word, based on class labels of subfigures. We use word embeddings for representing captions and titles-and-abstracts. To utilize all three types of information, we introduce two information integration methods. The first combines Figure-words and textual features obtained from captions and titles-and-abstracts into a single larger vector for document representation; the second employs a meta-classification scheme. Our experiments and results demonstrate the usefulness of the newly proposed Figure-words for representing images. Moreover, the results showcase the value of Figure-words, captions and titles-and-abstracts in providing complementary information for document classification; these three sources of information, when combined, lead to an overall improved classification performance. 
AVAILABILITY AND IMPLEMENTATION: Source code and the list of PMIDs of the publications in our datasets are available upon request
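
The first integration method described above, joining the three feature sources into a single larger vector, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the subfigure class list is hypothetical, representing a Figure-word as a count vector over those classes is an assumption, and the caption and title-and-abstract embeddings are stand-in numbers.

```python
from collections import Counter

# Hypothetical subfigure classes; the abstract does not list the real ones.
FIGURE_CLASSES = ["graph", "microscopy", "gel", "other"]


def figure_word_vector(subfigure_labels):
    """Count how often each (hypothetical) subfigure class occurs in a document."""
    counts = Counter(subfigure_labels)
    return [float(counts[c]) for c in FIGURE_CLASSES]


def concatenate_features(figure_vec, caption_vec, abstract_vec):
    """Join the three feature sources into one document vector."""
    return list(figure_vec) + list(caption_vec) + list(abstract_vec)


if __name__ == "__main__":
    fig = figure_word_vector(["gel", "gel", "graph"])
    caption_embedding = [0.1, 0.2]        # stand-in for a caption embedding
    abstract_embedding = [0.3, 0.4, 0.5]  # stand-in for a title-and-abstract embedding
    print(concatenate_features(fig, caption_embedding, abstract_embedding))
```

The second method in the abstract, meta-classification, would instead train a separate classifier per source and combine their outputs; it is not shown here.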

    Integration of molecular imaging and gene expression data (Integración de datos de imagen molecular y expresión génica)

    Now that the background information on atlases and gene expression databases has been analysed, we need to define the project further, including its objectives and points of interest. Gene expression databases, in most cases, do not provide any integration with anatomical information about where genes are expressed. Characterizing the whole transcriptome of a structure such as the brain is of limited utility without anatomical information. Combining the databases with the anatomical information provided by an anatomical atlas offers several advantages. The most immediate is user-friendliness. Part of this problem is solved by the aGEM tool already developed, which integrates different databases into a single user interface. A visual representation of gene locations, integrated with aGEM, would further improve the user experience. A second advantage is that this integration could facilitate the connection between imaging and gene expression information when defining or analysing results from preclinical experiments. The definition of an imaging protocol to study the phenotype of a transgenic animal model could benefit from the results of this project, since the researcher could look for anatomical structures related to the genes that have been manipulated. The result of image quantification is usually a statistical parametric map, which presents the statistical significance of a certain analysis for every voxel. Significant areas of this image could be related to the underlying genes by means of the proposed integration tool. The main objective of the project is therefore to connect the information provided by aGEM with that provided by the atlas. To this end, we need to determine how information is stored and related in aGEM, in order to extract enough information of interest to program a first version of the tool. 
Also, we need to study which atlases are available and which is the most suitable for our purposes. Once these steps are complete, the kind of program to be developed needs to be analysed. There are several possibilities, such as a program in Java, C++ or Matlab, or a plugin for integration into ImageJ. Once all the necessary information has been extracted, an integration step is required for the program to be operative. Then, when the information mapping is ready, the interface of the program can be written. The first version of the program should be able to perform certain query types: 1. Anatomy query: Given a list of anatomical structures, the user selects one and the program launches a query showing the genes expressed in that structure, along with their information, with the structure shown in the atlas images. 2. Gene query: The user selects a gene and the program determines in which structures the gene is expressed and shows them in the atlas images. The aim in this first stage is to integrate part of the information contained in aGEM into a beta version of the interface, in order to check its real utility. Ingeniería Biomédic
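
The two query types described above can be sketched with a simple in-memory mapping. This is a minimal illustration only: the gene names, structure names and function names are made up, the data are not real aGEM or atlas annotations, and the abstract does not specify how aGEM stores its information or how results would be drawn on atlas images.

```python
from collections import defaultdict

# Toy, made-up annotations (gene -> structures); not real aGEM data.
EXPRESSION = {
    "geneA": {"cortex", "hippocampus"},
    "geneB": {"cerebellum"},
    "geneC": {"cortex"},
}


def build_anatomy_index(expression):
    """Invert gene -> structures into structure -> genes."""
    index = defaultdict(set)
    for gene, structures in expression.items():
        for structure in structures:
            index[structure].add(gene)
    return dict(index)


def gene_query(expression, gene):
    """Gene query: structures in which the selected gene is expressed."""
    return sorted(expression.get(gene, ()))


def anatomy_query(anatomy_index, structure):
    """Anatomy query: genes expressed in the selected structure."""
    return sorted(anatomy_index.get(structure, ()))


ANATOMY_INDEX = build_anatomy_index(EXPRESSION)

if __name__ == "__main__":
    print(gene_query(EXPRESSION, "geneA"))
    print(anatomy_query(ANATOMY_INDEX, "cortex"))
```

In a tool like the one proposed, the returned structure names would then be looked up in the atlas to highlight the matching regions; that step depends on the atlas chosen and is left out here.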