2,203 research outputs found

    Protein Ontology: Enhancing and scaling up the representation of protein entities

    Get PDF
    The Protein Ontology (PRO; http://purl.obolibrary.org/obo/pr) formally defines and describes taxon-specific and taxon-neutral protein-related entities in three major areas: proteins related by evolution; proteins produced from a given gene; and protein-containing complexes. PRO thus serves as a tool for referencing protein entities at any level of specificity. To enhance this ability, and to facilitate the comparison of such entities described in different resources, we developed a standardized representation of proteoforms using UniProtKB as a sequence reference and PSI-MOD as a post-translational modification reference. We illustrate its use in facilitating an alignment between PRO and Reactome protein entities. We also address issues of scalability, describing our first steps into the use of text mining to identify protein-related entities, the large-scale import of proteoform information from expert curated resources, and our ability to dynamically generate PRO terms. Web views for individual terms are now more informative about closely-related terms, including for example an interactive multiple sequence alignment. Finally, we describe recent improvement in semantic utility, with PRO now represented in OWL and as a SPARQL endpoint. These developments will further support the anticipated growth of PRO and facilitate discoverability of and allow aggregation of data relating to protein entities

    ImmPort, toward repurposing of open access immunological assay data for translational and clinical research

    Get PDF
    Immunology researchers are beginning to explore the possibilities of reproducibility, reuse and secondary analyses of immunology data. Open-access datasets are being applied in the validation of the methods used in the original studies, leveraging studies for meta-analysis, or generating new hypotheses. To promote these goals, the ImmPort data repository was created for the broader research community to explore the wide spectrum of clinical and basic research data and associated findings. The ImmPort ecosystem consists of four components–Private Data, Shared Data, Data Analysis, and Resources—for data archiving, dissemination, analyses, and reuse. To date, more than 300 studies have been made freely available through the ImmPort Shared Data portal , which allows research data to be repurposed to accelerate the translation of new insights into discoveries

    Application of Biomedical Text Mining

    Get PDF
    With the enormous volume of biological literature, increasing growth phenomenon due to the high rate of new publications is one of the most common motivations for the biomedical text mining. Aiming at this massive literature to process, it could extract more biological information for mining biomedical knowledge. Using the information will help understand the mechanism of disease generation, promote the development of disease diagnosis technology, and promote the development of new drugs in the field of biomedical research. Based on the background, this chapter introduces the rise of biomedical text mining. Then, it describes the biomedical text-mining technology, namely natural language processing, including the several components. This chapter emphasizes the two aspects in biomedical text mining involving static biomedical information recognization and dynamic biomedical information extraction using instance analysis from our previous works. The aim is to provide a way to quickly understand biomedical text mining for some researchers

    Otter-Knowledge: benchmarks of multimodal knowledge graph representation learning from different sources for drug discovery

    Full text link
    Recent research in representation learning utilizes large databases of proteins or molecules to acquire knowledge of drug and protein structures through unsupervised learning techniques. These pre-trained representations have proven to significantly enhance the accuracy of subsequent tasks, such as predicting the affinity between drugs and target proteins. In this study, we demonstrate that by incorporating knowledge graphs from diverse sources and modalities into the sequences or SMILES representation, we can further enrich the representation and achieve state-of-the-art results on established benchmark datasets. We provide preprocessed and integrated data obtained from 7 public sources, which encompass over 30M triples. Additionally, we make available the pre-trained models based on this data, along with the reported outcomes of their performance on three widely-used benchmark datasets for drug-target binding affinity prediction found in the Therapeutic Data Commons (TDC) benchmarks. Additionally, we make the source code for training models on benchmark datasets publicly available. Our objective in releasing these pre-trained models, accompanied by clean data for model pretraining and benchmark results, is to encourage research in knowledge-enhanced representation learning

    The Infectious Disease Ontology in the Age of COVID-19

    Get PDF
    The Infectious Disease Ontology (IDO) is a suite of interoperable ontology modules that aims to provide coverage of all aspects of the infectious disease domain, including biomedical research, clinical care, and public health. IDO Core is designed to be a disease and pathogen neutral ontology, covering just those types of entities and relations that are relevant to infectious diseases generally. IDO Core is then extended by a collection of ontology modules focusing on specific diseases and pathogens. In this paper we present applications of IDO Core within various areas of infectious disease research, together with an overview of all IDO extension ontologies and the methodology on the basis of which they are built. We also survey recent developments involving IDO, including the creation of IDO Virus; the Coronaviruses Infectious Disease Ontology (CIDO); and an extension of CIDO focused on COVID-19 (IDO-CovID-19).We also discuss how these ontologies might assist in information-driven efforts to deal with the ongoing COVID-19 pandemic, to accelerate data discovery in the early stages of future pandemics, and to promote reproducibility of infectious disease research

    PathwayMatcher: proteoform-centric network construction enables fine-granularity multi-omics pathway mapping

    Get PDF
    Background Mapping biomedical data to functional knowledge is an essential task in bioinformatics and can be achieved by querying identifiers (e.g., gene sets) in pathway knowledge bases. However, the isoform and posttranslational modification states of proteins are lost when converting input and pathways into gene-centric lists. Findings Based on the Reactome knowledge base, we built a network of protein-protein interactions accounting for the documented isoform and modification statuses of proteins. We then implemented a command line application called PathwayMatcher (github.com/PathwayAnalysisPlatform/PathwayMatcher) to query this network. PathwayMatcher supports multiple types of omics data as input and outputs the possibly affected biochemical reactions, subnetworks, and pathways. Conclusions PathwayMatcher enables refining the network representation of pathways by including proteoforms defined as protein isoforms with posttranslational modifications. The specificity of pathway analyses is hence adapted to different levels of granularity, and it becomes possible to distinguish interactions between different forms of the same protein.publishedVersio

    Enabling Web-scale data integration in biomedicine through Linked Open Data

    Get PDF
    The biomedical data landscape is fragmented with several isolated, heterogeneous data and knowledge sources, which use varying formats, syntaxes, schemas, and entity notations, existing on the Web. Biomedical researchers face severe logistical and technical challenges to query, integrate, analyze, and visualize data from multiple diverse sources in the context of available biomedical knowledge. Semantic Web technologies and Linked Data principles may aid toward Web-scale semantic processing and data integration in biomedicine. The biomedical research community has been one of the earliest adopters of these technologies and principles to publish data and knowledge on the Web as linked graphs and ontologies, hence creating the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we provide our perspective on some opportunities proffered by the use of LSLOD to integrate biomedical data and knowledge in three domains: (1) pharmacology, (2) cancer research, and (3) infectious diseases. We will discuss some of the major challenges that hinder the wide-spread use and consumption of LSLOD by the biomedical research community. Finally, we provide a few technical solutions and insights that can address these challenges. Eventually, LSLOD can enable the development of scalable, intelligent infrastructures that support artificial intelligence methods for augmenting human intelligence to achieve better clinical outcomes for patients, to enhance the quality of biomedical research, and to improve our understanding of living systems

    The Gene Ontology knowledgebase in 2023

    Get PDF
    The Gene Ontology (GO) knowledgebase (http://geneontology.org) is a comprehensive resource concerning the functions of genes and gene products (proteins and noncoding RNAs). GO annotations cover genes from organisms across the tree of life as well as viruses, though most gene function knowledge currently derives from experiments carried out in a relatively small number of model organisms. Here, we provide an updated overview of the GO knowledgebase, as well as the efforts of the broad, international consortium of scientists that develops, maintains, and updates the GO knowledgebase. The GO knowledgebase consists of three components: (1) the GO-a computational knowledge structure describing the functional characteristics of genes; (2) GO annotations-evidence-supported statements asserting that a specific gene product has a particular functional characteristic; and (3) GO Causal Activity Models (GO-CAMs)-mechanistic models of molecular "pathways" (GO biological processes) created by linking multiple GO annotations using defined relations. Each of these components is continually expanded, revised, and updated in response to newly published discoveries and receives extensive QA checks, reviews, and user feedback. For each of these components, we provide a description of the current contents, recent developments to keep the knowledgebase up to date with new discoveries, and guidance on how users can best make use of the data that we provide. We conclude with future directions for the project

    NETME: on-the-fly knowledge network construction from biomedical literature

    Get PDF
    Background: The rapidly increasing biological literature is a key resource to automatically extract and gain knowledge concerning biological elements and their relations. Knowledge Networks are helpful tools in the context of biological knowledge discovery and modeling. Results: We introduce a novel system called NETME, which, starting from a set of full-texts obtained from PubMed, through an easy-to-use web interface, interactively extracts biological elements from ontological databases and then synthesizes a network inferring relations among such elements. The results clearly show that our tool is capable of inferring comprehensive and reliable biological networks
    • …
    corecore