1,623 research outputs found

    On the use of clustering and the MeSH controlled vocabulary to improve MEDLINE abstract search

    Databases of genomic documents contain substantial amounts of structured information in addition to the texts of titles and abstracts. Unstructured information retrieval techniques fail to take advantage of the structured information available. This paper describes a technique to improve upon traditional retrieval methods by clustering the retrieval result set into two distinct clusters using additional structural information. Our hypothesis is that the relevant documents are to be found in the tighter of the two clusters, as suggested by van Rijsbergen's cluster hypothesis. We present an experimental evaluation of these ideas based on the relevance judgments of the 2004 TREC workshop Genomics track and the CLUTO clustering software package.
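
    The idea in this abstract (split a result set into two clusters and keep the tighter one) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the paper uses the CLUTO package and MeSH-derived structural features, whereas this sketch uses a plain two-means loop over toy 2-D vectors, and the names `two_means`, `tightness` and `tightest_cluster` are hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def two_means(points, iterations=20):
    """Split points into two clusters with a simple k-means loop (k = 2).

    Assumption: deterministic initialisation from the first point and the
    point farthest from it. This is not CLUTO's algorithm.
    """
    first = points[0]
    centers = [first, max(points, key=lambda p: dist(p, first))]
    clusters = [list(points), []]
    for _ in range(iterations):
        clusters = [[], []]
        for p in points:
            nearest = 0 if dist(p, centers[0]) <= dist(p, centers[1]) else 1
            clusters[nearest].append(p)
        new_centers = [
            centroid(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # assignments have stabilised
            break
        centers = new_centers
    return clusters


def tightness(cluster):
    """Mean distance to the centroid; a smaller value means a tighter cluster."""
    c = centroid(cluster)
    return sum(dist(p, c) for p in cluster) / len(cluster)


def tightest_cluster(points):
    """Return the tighter of the two clusters found by two_means."""
    clusters = [c for c in two_means(points) if c]
    return min(clusters, key=tightness)


# Toy "retrieved documents": three close together, four spread out.
docs = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1],
        [6.0, 6.0], [9.0, 9.0], [6.0, 9.0], [9.0, 6.0]]
candidates = tightest_cluster(docs)
```

    Under the cluster hypothesis, `candidates` would be the documents treated as more likely relevant. The tightness measure chosen here is one option among several.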

    Experiments in terabyte searching, genomic retrieval and novelty detection for TREC 2004

    In TREC 2004, Dublin City University took part in three tracks: Terabyte (in collaboration with University College Dublin), Genomic and Novelty. In this paper we discuss each track separately and present separate conclusions from this work. In addition, we present a general description of a text retrieval engine that we have developed in the last year to support our experiments in large-scale, distributed information retrieval, and which underlies all of the track experiments described in this document.

    Representing and coding the knowledge embedded in texts of Health Science Web published articles

    Although electronic publishing is a common activity for scholars, electronic journals are still based on the print model and do not take full advantage of the facilities offered by the Semantic Web environment. This is a report of the results of a research project that investigated the possibilities of publishing electronic journal articles both as text for human reading and in a machine-readable format recording the new knowledge contained in the article. This knowledge is identified with the elements of scientific methodology, such as problem, methodology, hypothesis, results, and conclusions. A model integrating all those elements is proposed, which makes explicit and records the knowledge embedded in the text of scientific articles as an ontology. Knowledge thus represented can be processed by intelligent software agents. The proposed model aims to take advantage of these facilities, enabling semantic retrieval and validation of the knowledge contained in articles. To validate and enhance the model, a set of electronic journal articles was analyzed.

    Methods and trends of biomedical and genomic information retrieval based on semantic relations of thesauri and MeSH

    There are two methods of retrieving information from documents in the field of genomic science and medicine in general, namely: 1) through the combined use of associations determined by the Medical Subject Headings, and 2) by employing specific terminologies, such as in folksonomies, alternative medical-genomic terms in use in the general language, or acronyms or apocopes from the genomics field. To some extent, many thinkers and indexers hold that the combination of the two methods may be the best approach. While few authors advocate keeping the structure of controlled vocabularies, built up over many years of content interpretation, unchanged, there are numerous proposals for expanding the search horizons of thesauri, whether through social cataloging, algorithmic domain analyses that contrast indicators, or the semantic web using markers of meaningful semantic lexicons contained in digitized text.

    Could we automatically reproduce semantic relations of an information retrieval thesaurus?

    A well-constructed thesaurus is recognized as a valuable source of semantic information for various applications, especially for Information Retrieval. The main hindrances to using thesaurus-oriented approaches are the high complexity and cost of manual thesaurus creation. This paper addresses the problem of automatic thesaurus construction; namely, we study the quality of automatically extracted semantic relations as compared with the semantic relations of a manually crafted thesaurus. The vector-space model based on syntactic contexts was used to reproduce relations between the terms of a manually constructed thesaurus. We propose a simple algorithm for representing both single-word and multiword terms in the distributional space of syntactic contexts. Furthermore, we propose a method for evaluating the quality of the extracted relations. Our experiments show a significant difference between the automatically and manually constructed relations: while many of the automatically generated relations are relevant, only a small part of them could be found in the original thesaurus.
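
    As a rough illustration of a vector-space model over syntactic contexts, the sketch below represents each term by counts of the (relation, word) contexts it occurs in and compares terms by cosine similarity. The triple format, the relation labels, the toy data and the function names are all assumptions. The abstract does not specify the authors' algorithm or their treatment of multiword terms, so this sketch does not attempt to reproduce either.

```python
import math
from collections import Counter


def context_vector(term, triples):
    """Count the (relation, word) contexts in which `term` appears as head.

    `triples` is a hypothetical list of (head, relation, word) tuples, as
    might be produced by a dependency parser.
    """
    return Counter((rel, word) for head, rel, word in triples if head == term)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy parsed corpus (invented for illustration).
triples = [
    ("thesaurus", "mod", "manual"),
    ("thesaurus", "obj_of", "construct"),
    ("dictionary", "mod", "manual"),
    ("dictionary", "obj_of", "construct"),
    ("query", "obj_of", "submit"),
]

sim_close = cosine(context_vector("thesaurus", triples),
                   context_vector("dictionary", triples))
sim_far = cosine(context_vector("thesaurus", triples),
                 context_vector("query", triples))
```

    Term pairs with high similarity would then be candidate semantic relations to compare against those in a manually built thesaurus.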

    Exploiting open standards in academic web services

    In Digital Library-related technologies, there is a whole host of open standards and protocols at varying stages of definition or emergence and of acceptance or agreement. Nevertheless, specifically in an academic context, these have led to some valuable improvements in the quality and value of services provided to teachers, learners and researchers alike. However, it often remains difficult for these information seekers to find relevant resources that are not immediately 'visible'; they may be effectively hidden within database-driven web services or proprietary applications. The focus of this paper is a project based at the UK academic data centre, MIMAS, which provides web-based services to the education community in the UK, Ireland and beyond. The project's principal aim was to increase the visibility and accessibility of 'appropriate' resources by exploiting a number of relevant open standards and initiatives to ensure interoperability. This principally required focusing on machine-to-machine metadata interchange.