
    Semantically linking and browsing PubMed abstracts with gene ontology

    Background: Technological advances in the past decade have led to massive progress in the field of biotechnology. This progress is documented in the form of research articles. PubMed is currently the most used repository for bio-literature. As of 2007, PubMed consists of about 17 million abstracts, which require methods to efficiently retrieve and browse large volumes of relevant information. State-of-the-art technologies such as GOPubmed use simple keyword-based techniques for retrieving abstracts from PubMed and linking them to the Gene Ontology (GO). This paper changes the paradigm by introducing a semantics-enabled technique, called SEGOPubmed, to link PubMed to the Gene Ontology for ontology-based browsing. A Latent Semantic Analysis (LSA) framework is used to semantically interface PubMed abstracts to the Gene Ontology. Results: An empirical analysis is performed to compare the performance of SEGOPubmed with that of GOPubmed. The analysis is initially performed using a few well-referenced query words. Further, statistical analysis is performed using a GO curated dataset as ground truth. The analysis suggests that SEGOPubmed performs better than the classic GOPubmed as it incorporates semantics. Conclusions: The LSA technique is applied to the PubMed abstracts obtained based on the user query and the semantic similarity between the query and the abstracts. The analyses using well-referenced keywords show that the proposed semantic-sensitive technique outperformed the string-comparison-based techniques in associating the relevant abstracts with the GO terms. SEGOPubmed also extracted abstracts in which the keywords do not appear in isolation (i.e., they appear in combination with other terms) and that could not be retrieved by simple term-matching techniques.

    Bridging the gap between social tagging and semantic annotation: E.D. the Entity Describer

    Semantic annotation enables the development of efficient computational methods for analyzing and interacting with information, thus maximizing its value. With the already substantial and constantly expanding data generation capacity of the life sciences as well as the concomitant increase in the knowledge distributed in scientific articles, new ways to produce semantic annotations of this information are crucial. While automated techniques certainly facilitate the process, manual annotation remains the gold standard in most domains. In this manuscript, we describe a prototype mass-collaborative semantic annotation system that, by distributing the annotation workload across the broad community of biomedical researchers, may help to produce the volume of meaningful annotations needed by modern biomedical science. We present E.D., the Entity Describer, a mashup of the Connotea social tagging system, an index of semantic web-accessible controlled vocabularies, and a new public RDF database for storing social semantic annotations.

    DynGO: a tool for visualizing and mining of Gene Ontology and its associations

    BACKGROUND: A large volume of data and information about genes and gene products has been stored in various molecular biology databases. A major challenge for knowledge discovery using these databases is to identify related genes and gene products in disparate databases. The development of Gene Ontology (GO) as a common vocabulary for annotation allows integrated queries across multiple databases and identification of semantically related genes and gene products (i.e., genes and gene products that have similar GO annotations). Meanwhile, dozens of tools have been developed for browsing, mining or editing GO terms, their hierarchical relationships, or their "associated" genes and gene products (i.e., genes and gene products annotated with GO terms). Tools that allow users to directly search and inspect relations among all GO terms and their associated genes and gene products from multiple databases are needed. RESULTS: We present a standalone package called DynGO, which provides several advanced functionalities in addition to the standard browsing capability of the official GO browsing tool (AmiGO). DynGO allows users to conduct batch retrieval of GO annotations for a list of genes and gene products, and semantic retrieval of genes and gene products sharing similar GO annotations. The results are shown in an association tree organized according to GO hierarchies and supported with many dynamic display options such as sorting tree nodes or changing the orientation of the tree. For GO curators and frequent GO users, DynGO provides fast and convenient access to GO annotation data. DynGO is generally applicable to any data set where the records are annotated with GO terms, as illustrated by two examples. CONCLUSION: We have presented a standalone package, DynGO, that provides functionalities to search and browse GO and its association databases as well as several additional functions such as batch retrieval and semantic retrieval. The complete documentation and software are freely available for download from the website.

    Establishing a distributed system for the simple representation and integration of diverse scientific assertions

    Background: Information technology has the potential to increase the pace of scientific progress by helping researchers in formulating, publishing and finding information. There are numerous projects that employ ontologies and Semantic Web technologies towards this goal. However, the number of applications that have found widespread use among biomedical researchers is still surprisingly small. In this paper we present the aTag (‘associative tags’) convention, which aims to drastically lower the entry barriers to the biomedical Semantic Web. aTags are short snippets of HTML+RDFa with embedded RDF/OWL based on the Semantically Interlinked Online Communities (SIOC) vocabulary and domain ontologies and taxonomies, such as the Open Biomedical Ontologies and DBpedia. The structure of aTags is very simple: a short piece of human-readable text that is ‘tagged’ with relevant ontological entities. This paper describes our efforts for seeding the creation of a viable ecosystem of datasets, tools and services around aTags. Results: Numerous biomedical datasets in aTag format and systems for the creation of aTags have been set up and are described in this paper. Prototypes of some of these systems are accessible at http://hcls.deri.org/atag. Conclusions: The aTags convention enables the rapid development of diverse, integrated datasets and semantically interoperable applications. More work needs to be done to study the practicability of this approach in different use-case scenarios, and to encourage uptake of the convention by other groups.

    Integrating findings of traditional medicine with modern pharmaceutical research: the potential role of linked open data

    One of the biggest obstacles to progress in modern pharmaceutical research is the difficulty of integrating all available research findings into effective therapies for humans. Studies of traditionally used pharmacologically active plants and other substances in traditional medicines may be valuable sources of previously unknown compounds with therapeutic actions. However, the integration of findings from traditional medicines can be fraught with difficulties and misunderstandings. This article proposes an approach to use linked open data and Semantic Web technologies to address the heterogeneous data integration problem. The approach is based on our initial experiences with implementing an integrated web of data for a selected use-case, i.e., the identification of plant species used in Chinese medicine that indicate potential antidepressant activities.

    A Linked Data Approach to Sharing Workflows and Workflow Results

    A bioinformatics analysis pipeline is often highly elaborate, due to the inherent complexity of biological systems and the variety and size of datasets. A digital equivalent of the ‘Materials and Methods’ section in wet laboratory publications would be highly beneficial to bioinformatics, for evaluating evidence and examining data across related experiments, while introducing the potential to find associated resources and integrate them as data and services. We present initial steps towards preserving bioinformatics ‘materials and methods’ by exploiting the workflow paradigm for capturing the design of a data analysis pipeline, and RDF to link the workflow, its component services, run-time provenance, and a personalized biological interpretation of the results. An example shows the reproduction of the unique graph of an analysis procedure, its results, provenance, and personal interpretation of a text mining experiment. It links data from Taverna, myExperiment.org, BioCatalogue.org, and ConceptWiki.org. The approach is relatively ‘light-weight’ and unobtrusive to bioinformatics users.

    Towards linked open gene mutations data

    Background: With the advent of high-throughput technologies, a great wealth of variation data is being produced. Such information may constitute the basis for correlation analyses between genotypes and phenotypes and, in the future, for personalized medicine. Several databases on gene variation exist, but this kind of information is still scarce in the Semantic Web framework. In this paper, we discuss issues related to the integration of mutation data in the Linked Open Data infrastructure, part of the Semantic Web framework. We present the development of a mapping from the IARC TP53 Mutation database to RDF and the implementation of servers publishing this data. Methods: A version of the IARC TP53 Mutation database implemented in a relational database was used as the first test set. Automatic mappings to RDF were first created by using D2RQ and later manually refined by introducing concepts and properties from domain vocabularies and ontologies, as well as links to Linked Open Data implementations of various systems of biomedical interest. Since D2RQ query performance is lower than that which can be achieved by using an RDF archive, the generated data was also loaded into a dedicated system based on tools from the Jena software suite. Results: We have implemented a D2RQ Server for TP53 mutation data, providing data on a subset of the IARC database, including gene variations, somatic mutations, and bibliographic references. The server allows users to browse the RDF graph by using links both between classes and to external systems. An alternative interface offers improved performance for SPARQL queries. The resulting data can be explored by using any Semantic Web browser or application. Conclusions: This has been the first case of a mutation database exposed as Linked Data. A revised version of our prototype, including further concepts and IARC TP53 Mutation database data sets, is under development. The publication of variation information as Linked Data opens new perspectives: the exploitation of SPARQL searches on mutation data and other biological databases may support data retrieval which is presently not possible. Moreover, reasoning on integrated variation data may support discoveries towards personalized medicine.

    Knowledge Representation for Web Navigation

    Representations of domain knowledge range from those that are ontologically formal and semantically rich to those that are ontologically informal and semantically weak. Representations of knowledge are important in many tasks, one of which is the support of travel around information spaces through the identification and linking of concepts in a field. In this paper we explore how representations of ontologically informal, semantically weak domain knowledge, as captured by the Simple Knowledge Organisation System (SKOS), can enable a system to take advantage of the large number of existing ontological representations to support semantic linking of Web-based information and thus facilitate information travel.

    GI Systems for public health with an ontology based approach

    Dissertation submitted in partial fulfillment of the requirements for the Degree of Master of Science in Geospatial Technologies. Health is an indispensable attribute of human life. In the modern age, utilizing technologies for health is one of the emergent concepts in several applied fields. Computer science and (geographic) information systems are some of the interdisciplinary fields that motivate this thesis. The inspiring idea of the study originated from a rhetorical disease, DbHd (Database Hugging Disorder), defined by Hans Rosling at the World Bank Open Data speech in May 2010. The cure for this disease can be offered as linked open data, which contains ontologies for health science, diseases, genes, drugs, GEO species etc. Linked Open Data (LOD) provides the systematic application of information by publishing and connecting structured data on the Web. In the context of this study, we aimed to reduce the boundaries between the semantic web and the geo web. For this reason, use-case data from the Valencia CSISP (Research Center of Public Health) is studied, in which the mortality rates for particular diseases are represented spatio-temporally. The use-case data is divided into three conceptual domains (health, spatial, statistical) and enhanced with semantic relations and descriptions by following Linked Data Principles. Finally, in order to convey complex health-related information, we offer an infrastructure integrating the geo web and the semantic web. Based on the established outcome, user access methods are introduced and future research is outlined.

    Word add-in for ontology recognition: semantic enrichment of scientific literature

    Background: In the current era of scientific research, efficient communication of information is paramount. As such, the nature of scholarly and scientific communication is changing; cyberinfrastructure is now absolutely necessary and new media are allowing information and knowledge to be more interactive and immediate. One approach to making knowledge more accessible is the addition of machine-readable semantic data to scholarly articles. Results: The Word add-in presented here will assist authors in this effort by automatically recognizing and highlighting words or phrases that are likely information-rich, allowing authors to associate semantic data with those words or phrases, and to embed that data in the document as XML. The add-in and source code are publicly available at http://www.codeplex.com/UCSDBioLit. Conclusions: The Word add-in for ontology term recognition makes it possible for an author to add semantic data to a document as it is being written, and it encodes these data using XML tags that are effectively a standard in life sciences literature. Allowing authors to mark up their own work will help increase the amount and quality of machine-readable literature metadata.