73,997 research outputs found

    On the use of clustering and the MeSH controlled vocabulary to improve MEDLINE abstract search

    Get PDF
    Databases of genomic documents contain substantial amounts of structured information in addition to the texts of titles and abstracts. Unstructured information retrieval techniques fail to take advantage of the structured information available. This paper describes a technique to improve upon traditional retrieval methods by clustering the retrieval result set into two distinct clusters using additional structural information. Our hypothesis is that the relevant documents are to be found in the tightest cluster of the two, as suggested by van Rijsbergen's cluster hypothesis. We present an experimental evaluation of these ideas based on the relevance judgments of the 2004 TREC workshop Genomics track, and the CLUTO software clustering package

    09101 Abstracts Collection -- Interactive Information Retrieval

    Get PDF
    From 01.03. to 06.03.2009, the Dagstuhl Seminar 09101 ``Interactive Information Retrieval \u27\u27 was held in Schloss Dagstuhl~--~Leibniz Center for Informatics. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available

    CREATING A BIOMEDICAL ONTOLOGY INDEXED SEARCH ENGINE TO IMPROVE THE SEMANTIC RELEVANCE OF RETREIVED MEDICAL TEXT

    Get PDF
    Medical Subject Headings (MeSH) is a controlled vocabulary used by the National Library of Medicine to index medical articles, abstracts, and journals contained within the MEDLINE database. Although MeSH imposes uniformity and consistency in the indexing process, it has been proven that using MeSH indices only result in a small increase in precision over free-text indexing. Moreover, studies have shown that the use of controlled vocabularies in the indexing process is not an effective method to increase semantic relevance in information retrieval. To address the need for semantic relevance, we present an ontology-based information retrieval system for the MEDLINE collection that result in a 37.5% increase in precision when compared to free-text indexing systems. The presented system focuses on the ontology to: provide an alternative to text-representation for medical articles, finding relationships among co-occurring terms in abstracts, and to index terms that appear in text as well as discovered relationships. The presented system is then compared to existing MeSH and Free-Text information retrieval systems. This dissertation provides a proof-of-concept for an online retrieval system capable of providing increased semantic relevance when searching through medical abstracts in MEDLINE

    Time to Redefine Database

    Get PDF
    IT USED TO BE EASY to define a database. It was a continuously updated computer file of related information, abstracts, or references on a particular subject, arranged for ease and speed of search and retrieval (ODLIS: Online Dictionary of Library and Information Science, www.wcsu.ctstateu.edu/library/ odlis.html). A database publisher such as Psychological Abstracts or Engineering Information was responsible for creating the content (and perhaps distributing the printed indexes), but the vendor, such as Dialog or SilverPlatter, transformed the content to make it searchable and then provided access

    07071 Abstracts Collection -- Web Information Retrieval and Linear Algebra Algorithms

    Get PDF
    From 12th to 16th February 2007, the Dagstuhl Seminar 07071 ``Web Information Retrieval and Linear Algebra Algorithms\u27\u27 was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are put together in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available

    The impact of MIREX on scholarly research (2005-2010)

    Get PDF
    This paper explores the impact of the MIREX (Music Information Retrieval Evaluation eXchange) evaluation initiative on scholarly research. Impact is assessed through a bibliometric evaluation of both the MIREX extended abstracts and the papers citing the MIREX results, the trial framework and methodology, or MIREX datasets. Impact is examined through number of publications and citation analysis. We further explore the primary publication venues for MIREX results, the geographic distribution of both MIREX contributors and researchers citing MIREX results, and the spread of MIREX-based research beyond the MIREX contributor teams. This analysis indicates that research in this area is highly collaborative, has achieved an international dissemination, and has grown to have a significant profile in the research literature

    Improved mutation tagging with gene identifiers applied to membrane protein stability prediction

    Get PDF
    Background The automated retrieval and integration of information about protein point mutations in combination with structure, domain and interaction data from literature and databases promises to be a valuable approach to study structure-function relationships in biomedical data sets. Results We developed a rule- and regular expression-based protein point mutation retrieval pipeline for PubMed abstracts, which shows an F-measure of 87% for the mutation retrieval task on a benchmark dataset. In order to link mutations to their proteins, we utilize a named entity recognition algorithm for the identification of gene names co-occurring in the abstract, and establish links based on sequence checks. Vice versa, we could show that gene recognition improved from 77% to 91% F-measure when considering mutation information given in the text. To demonstrate practical relevance, we utilize mutation information from text to evaluate a novel solvation energy based model for the prediction of stabilizing regions in membrane proteins. For five G protein-coupled receptors we identified 35 relevant single mutations and associated phenotypes, of which none had been annotated in the UniProt or PDB database. In 71% reported phenotypes were in compliance with the model predictions, supporting a relation between mutations and stability issues in membrane proteins. Conclusion We present a reliable approach for the retrieval of protein mutations from PubMed abstracts for any set of genes or proteins of interest. We further demonstrate how amino acid substitution information from text can be utilized for protein structure stability studies on the basis of a novel energy model

    An evaluation of Bradfordizing effects

    Get PDF
    The purpose of this paper is to apply and evaluate the bibliometric method Bradfordizing for information retrieval (IR) experiments. Bradfordizing is used for generating core document sets for subject-specific questions and to reorder result sets from distributed searches. The method will be applied and tested in a controlled scenario of scientific literature databases from social and political sciences, economics, psychology and medical science (SOLIS, SoLit, USB Köln Opac, CSA Sociological Abstracts, World Affairs Online, Psyndex and Medline) and 164 standardized topics. An evaluation of the method and its effects is carried out in two laboratory-based information retrieval experiments (CLEF and KoMoHe) using a controlled document corpus and human relevance assessments. The results show that Bradfordizing is a very robust method for re-ranking the main document types (journal articles and monographs) in today’s digital libraries (DL). The IR tests show that relevance distributions after re-ranking improve at a significant level if articles in the core are compared with articles in the succeeding zones. The items in the core are significantly more often assessed as relevant, than items in zone 2 (z2) or zone 3 (z3). The improvements between the zones are statistically significant based on the Wilcoxon signed-rank test and the paired T-Test

    Text analysis and computers

    Full text link
    Content: Erhard Mergenthaler: Computer-assisted content analysis (3-32); Udo Kelle: Computer-aided qualitative data analysis: an overview (33-63); Christian Mair: Machine-readable text corpora and the linguistic description of danguages (64-75); JĂŒrgen Krause: Principles of content analysis for information retrieval systems (76-99); Conference Abstracts (100-131)
    • 

    corecore