
    Gene Ontology annotations: what they mean and where they come from

    To address the challenges of information integration and retrieval, the computational genomics community has increasingly come to rely on the methodology of creating annotations of scientific literature using terms from controlled structured vocabularies such as the Gene Ontology (GO). Here we address the question of what such annotations signify and of how they are created by working biologists. Our goal is to promote a better understanding of how the results of experiments are captured in annotations, in the hope that this will lead both to better representations of biological reality through annotation and ontology development and to more informed use of GO resources by experimental scientists.

    Investigation of COVID-19 comorbidities reveals genes and pathways coincident with the SARS-CoV-2 viral disease.

    The emergence of the SARS-CoV-2 virus and subsequent COVID-19 pandemic initiated intense research into the mechanisms of action for this virus. It was quickly noted that COVID-19 presents more seriously in conjunction with other human disease conditions such as hypertension, diabetes, and lung diseases. We conducted a bioinformatics analysis of COVID-19 comorbidity-associated gene sets, identifying genes and pathways shared among the comorbidities, and evaluated current knowledge about these genes and pathways as related to current information about SARS-CoV-2 infection. We performed our analysis using GeneWeaver (GW), Reactome, and several biomedical ontologies to represent and compare common COVID-19 comorbidities. Phenotypic analysis of shared genes revealed significant enrichment for immune system phenotypes and for cardiovascular-related phenotypes, which might point to alleles and phenotypes in mouse models that could be evaluated for clues to COVID-19 severity. Through pathway analysis, we identified enriched pathways shared by comorbidity datasets and datasets associated with SARS-CoV-2 infection.

    Update on the human and mouse lipocalin (LCN) gene family, including evidence the mouse Mup cluster is result of an evolutionary bloom.

    Lipocalins (LCNs) are members of a family of evolutionarily conserved genes present in all kingdoms of life. There are 19 LCN-like genes in the human genome, and 45 Lcn-like genes in the mouse genome, which include 22 major urinary protein (Mup) genes. The Mup genes, plus 29 of 30 Mup-ps pseudogenes, are all located together on chromosome (Chr) 4; evidence points to an evolutionary bloom that resulted in this Mup cluster in mouse, syntenic to the human Chr 9q32 locus at which a single MUPP pseudogene is located. LCNs play important roles in physiological processes by binding and transporting small hydrophobic molecules (such as steroid hormones, odorants, retinoids, and lipids) in plasma and other body fluids. LCNs are extensively used in clinical practice as biochemical markers. LCN-like proteins (18-40 kDa) have the characteristic eight β-strands creating a barrel structure that houses the binding site; LCNs are synthesized in the liver as well as various secretory tissues. In rodents, MUPs are involved in communication of information in urine-derived scent marks, serving as signatures of individual identity, or as kairomones (to elicit fear behavior). MUPs also participate in regulation of glucose and lipid metabolism via a mechanism not well understood. Although much has been learned about LCNs and MUPs in recent years, more research is necessary to allow better understanding of their physiological functions, as well as their involvement in clinical disorders.

    Cisplatin-resistant triple-negative breast cancer subtypes: multiple mechanisms of resistance.

    BACKGROUND: Understanding mechanisms underlying specific chemotherapeutic responses in subtypes of cancer may improve identification of treatment strategies most likely to benefit particular patients. For example, triple-negative breast cancer (TNBC) patients have variable response to the chemotherapeutic agent cisplatin. Understanding the basis of treatment response in cancer subtypes will lead to more informed decisions about selection of treatment strategies. METHODS: In this study we used an integrative functional genomics approach to investigate the molecular mechanisms underlying known cisplatin-response differences among subtypes of TNBC. To identify changes in gene expression that could explain mechanisms of resistance, we examined 102 evolutionarily conserved cisplatin-associated genes, evaluating their differential expression in the two cisplatin-sensitive subtypes, basal-like 1 (BL1) and basal-like 2 (BL2), and the two cisplatin-resistant subtypes, luminal androgen receptor (LAR) and mesenchymal (M), of TNBC. RESULTS: We found 20 genes that were differentially expressed in at least one subtype. Fifteen of the 20 genes are associated with cell death and are distributed among all TNBC subtypes. The less cisplatin-responsive LAR and M TNBC subtypes show different regulation of 13 genes compared to the more sensitive BL1 and BL2 subtypes. These 13 genes identify a variety of cisplatin-resistance mechanisms including increased transport and detoxification of cisplatin, and mis-regulation of the epithelial to mesenchymal transition. CONCLUSIONS: We identified gene signatures in resistant TNBC subtypes indicative of mechanisms of cisplatin resistance. Our results indicate that response to cisplatin in TNBC has a complex foundation based on impact of treatment on distinct cellular pathways. We find that examination of expression data in the context of heterogeneous data such as drug-gene interactions leads to a better understanding of mechanisms at work in cancer therapy response.

    Harmonizing model organism data in the Alliance of Genome Resources.

    The Alliance of Genome Resources (the Alliance) is a combined effort of 7 knowledgebase projects: Saccharomyces Genome Database, WormBase, FlyBase, Mouse Genome Database, the Zebrafish Information Network, Rat Genome Database, and the Gene Ontology Resource. The Alliance seeks to provide several benefits: better service to the various communities served by these projects; a harmonized view of data for all biomedical researchers, bioinformaticians, clinicians, and students; and a more sustainable infrastructure. The Alliance has harmonized cross-organism data to provide useful comparative views of gene function, gene expression, and human disease relevance. The basis of the comparative views is shared calls of orthology relationships and the use of common ontologies. The key types of data are alleles and variants, gene function based on gene ontology annotations, phenotypes, association to human disease, gene expression, protein-protein and genetic interactions, and participation in pathways. The information is presented on uniform gene pages that allow facile summarization of information about each gene in each of the 7 organisms covered (budding yeast, roundworm Caenorhabditis elegans, fruit fly, house mouse, zebrafish, brown rat, and human). The harmonized knowledge is freely available on the alliancegenome.org portal, as downloadable files, and by APIs. We expect other existing and emerging knowledge bases to join in the effort to provide the union of useful data and features that each knowledge base currently provides.

    Literature Triage and Indexing in the Mouse Genome Informatics (MGI) Group

    The Mouse Genome Informatics (MGI; http://www.informatics.jax.org) group comprises several collaborating projects including the Mouse Genome Database (MGD) Project, the Gene Expression Database (GXD) Project, the Mouse Tumor Biology (MTB) Database Project, and the Gene Ontology (GO) Project. Literature identification and collection is performed cooperatively amongst the groups.

In recent years many institutional libraries have transitioned from a focus largely on print holdings to one of electronic access to journals. This change has necessitated adaptation on the part of the MGI curatorial group. Whereas the majority of journals covered by the group used to be surveyed in paper form, those journals are now surveyed electronically. Approximately 160 journals have been identified as those most relevant to the various database groups. Each curator in the group has the responsibility of scanning several journals for articles relevant to any of the database projects. Articles chosen via this process are marked as to their potential significance for various projects. Each article is catalogued in a Master Bibliography section of the MGI database system and annotated to the database sections for which it has been identified as relevant. A secondary triage process allows curators from each group to scan the chosen articles and mark ones desired for their project if such annotation has been missed on the initial scan.

Once articles have been identified for each database project, a variety of processes are implemented to further categorize and index data from those articles. For example, the Alleles and Phenotype section of the MGD database indexes each article marked for MGD and, in doing so, identifies each mouse gene and allele examined in the article. The GXD database indexing process has a different focus: articles are indexed by the stage of development used in the study and by the assay technique used. In each case the indexing gives an overview of the data held in the article and assists in the more extensive curation performed in the following step of the curation process. Indexing also provides each group with valuable information used to prioritize and streamline the overall curation process.

The MGI projects are supported by NHGRI grants HG000330, HG00273, and HG003622, NICHD grant HD033745, and NCI grant CA089713.

    The varved succession of Crawford Lake, Milton, Ontario, Canada as a candidate Global boundary Stratotype Section and Point for the Anthropocene series

    An annually laminated succession in Crawford Lake, Ontario, Canada is proposed as the Global boundary Stratotype Section and Point (GSSP) for the Anthropocene as a series/epoch with a base dated at 1950 CE. Varve couplets of organic matter capped by calcite precipitated each summer in alkaline surface waters reflect environmental change at global to local scales. Spheroidal carbonaceous particles and nitrogen isotopes record an increase in fossil fuel combustion in the early 1950s, coinciding with fallout from nuclear and thermonuclear testing (239+240Pu and 14C:12C, the latter more than compensating for the effects of old carbon in this dolomitic basin). Rapid industrial expansion in the North American Great Lakes region led to enhanced leaching of terrigenous elements by acid precipitation during the Great Acceleration, and calcite precipitation was reduced, producing thin calcite laminae around the GSSP that is marked by a sharp decline in elm pollen (Dutch Elm disease). The lack of bioturbation in well-oxygenated bottom waters, supported by the absence of fossil pigments from obligately anaerobic purple sulfur bacteria, is attributed to elevated salinities and high alkalinity below the chemocline. This aerobic depositional environment, unusual in a meromictic lake, inhibits the mobilization of 239Pu, the proposed primary stratigraphic guide for the Anthropocene.

    Consensus coding sequence (CCDS) database: a standardized set of human and mouse protein-coding regions supported by expert curation.

    The Consensus Coding Sequence (CCDS) project provides a dataset of protein-coding regions that are identically annotated on the human and mouse reference genome assembly in genome annotations produced independently by NCBI and the Ensembl group at EMBL-EBI. This dataset is the product of an international collaboration that includes NCBI, Ensembl, HUGO Gene Nomenclature Committee, Mouse Genome Informatics and University of California, Santa Cruz. Identically annotated coding regions, which are generated using an automated pipeline and pass multiple quality assurance checks, are assigned a stable and tracked identifier (CCDS ID). Additionally, coordinated manual review by expert curators from the CCDS collaboration helps in maintaining the integrity and high quality of the dataset. The CCDS data are available through an interactive web page (https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi) and an FTP site (ftp://ftp.ncbi.nlm.nih.gov/pub/CCDS/). In this paper, we outline the ongoing work, growth and stability of the CCDS dataset and provide updates on new collaboration members and new features added to the CCDS user interface. We also present expert curation scenarios, with specific examples highlighting the importance of an accurate reference genome assembly and the crucial role played by input from the research community. Nucleic Acids Res 2018 Jan 4; 46(D1):D221-D228

    Integrating Text Mining into the MGI Biocuration Workflow
