8,818 research outputs found

    Automatic assignment of biomedical categories: toward a generic approach

    Get PDF
    Motivation: We report on the development of a generic text categorization system designed to automatically assign biomedical categories to any input text. Unlike usual automatic text categorization systems, which rely on data-intensive models extracted from large sets of training data, our categorizer is largely data-independent. Methods: In order to evaluate the robustness of our approach we test the system on two different biomedical terminologies: the Medical Subject Headings (MeSH) and the Gene Ontology (GO). Our lightweight categorizer, based on two ranking modules, combines a pattern matcher and a vector space retrieval engine, and uses both stems and linguistically-motivated indexing units. Results and Conclusion: Results show the effectiveness of phrase indexing for both GO and MeSH categorization, but we observe the categorization power of the tool depends on the controlled vocabulary: precision at high ranks ranges from above 90% for MeSH to <20% for GO, establishing a new baseline for categorizers based on retrieval methods. Contact: [email protected]

    Cross-concordances: terminology mapping and its effectiveness for information retrieval

    Get PDF
    The German Federal Ministry for Education and Research funded a major terminology mapping initiative, which found its conclusion in 2007. The task of this terminology mapping initiative was to organize, create and manage 'cross-concordances' between controlled vocabularies (thesauri, classification systems, subject heading lists) centred around the social sciences but quickly extending to other subject areas. 64 crosswalks with more than 500,000 relations were established. In the final phase of the project, a major evaluation effort to test and measure the effectiveness of the vocabulary mappings in an information system environment was conducted. The paper reports on the cross-concordance work and evaluation results.Comment: 19 pages, 4 figures, 11 tables, IFLA conference 200

    Access to recorded interviews: A research agenda

    Get PDF
    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state-of-the-art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed

    Thesaurus-assisted search term selection and query expansion: a review of user-centred studies

    Get PDF
    This paper provides a review of the literature related to the application of domain-specific thesauri in the search and retrieval process. Focusing on studies which adopt a user-centred approach, the review presents a survey of the methodologies and results from empirical studies undertaken on the use of thesauri as sources of term selection for query formulation and expansion during the search process. It summaries the ways in which domain-specific thesauri from different disciplines have been used by various types of users and how these tools aid users in the selection of search terms. The review consists of two main sections covering, firstly studies on thesaurus-aided search term selection and secondly those dealing with query expansion using thesauri. Both sections are illustrated with case studies that have adopted a user-centred approach

    Thesauri on the Web: current developments and trends

    Get PDF
    This article provides an overview of recent developments relating to the application of thesauri in information organisation and retrieval on the World Wide Web. It describes some recent thesaurus projects undertaken to facilitate resource description and discovery and access to wide-ranging information resources on the Internet. Types of thesauri available on the Web, thesauri integrated in databases and information retrieval systems, and multiple-thesaurus systems for cross-database searching are also discussed. Collective efforts and events in addressing the standardisation and novel applications of thesauri are briefly reviewed

    Ensuring the discoverability of digital images for social work education : an online tagging survey to test controlled vocabularies

    Get PDF
    The digital age has transformed access to all kinds of educational content not only in text-based format but also digital images and other media. As learning technologists and librarians begin to organise these new media into digital collections for educational purposes, older problems associated with cataloguing and classifying non-text media have re-emerged. At the heart of this issue is the problem of describing complex and highly subjective images in a reliable and consistent manner. This paper reports on the findings of research designed to test the suitability of two controlled vocabularies to index and thereby improve the discoverability of images stored in the Learning Exchange, a repository for social work education and research. An online survey asked respondents to "tag", a series of images and responses were mapped against the two controlled vocabularies. Findings showed that a large proportion of user generated tags could be mapped to the controlled vocabulary terms (or their equivalents). The implications of these findings for indexing and discovering content are discussed in the context of a wider review of the literature on "folksonomies" (or user tagging) versus taxonomies and controlled vocabularies

    Context and Keyword Extraction in Plain Text Using a Graph Representation

    Full text link
    Document indexation is an essential task achieved by archivists or automatic indexing tools. To retrieve relevant documents to a query, keywords describing this document have to be carefully chosen. Archivists have to find out the right topic of a document before starting to extract the keywords. For an archivist indexing specialized documents, experience plays an important role. But indexing documents on different topics is much harder. This article proposes an innovative method for an indexing support system. This system takes as input an ontology and a plain text document and provides as output contextualized keywords of the document. The method has been evaluated by exploiting Wikipedia's category links as a termino-ontological resources

    Experiments in terabyte searching, genomic retrieval and novelty detection for TREC 2004

    Get PDF
    In TREC2004, Dublin City University took part in three tracks, Terabyte (in collaboration with University College Dublin), Genomic and Novelty. In this paper we will discuss each track separately and present separate conclusions from this work. In addition, we present a general description of a text retrieval engine that we have developed in the last year to support our experiments into large scale, distributed information retrieval, which underlies all of the track experiments described in this document

    A picture is worth a thousand words: The perplexing problem of indexing images

    Get PDF
    Indexing images has always been problematic due to their richness of content and innate subjectivity. Three traditional approaches to indexing images are described and analyzed. An introduction of the contemporary use of social tagging is presented along with its limitations. Traditional practices can continue to be used as a stand-alone solution, however deficiencies limit retrieval. A collaborative technique is supported by current research and a model created by the authors for its inception is explored. CONTENTdm® is used as an example to illustrate tools that can help facilitate this process. Another potential solution discussed is the expansion of algorithms used in computer extraction to include the input and influence of human indexer intelligence. Further research is recommended in each area to discern the most effective method
    corecore