1,192 research outputs found

    Natural language processing

    Get PDF
    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of www and digital libraries ; and (iv) evaluation of NLP systems

    Still a Lot to Lose: The Role of Controlled Vocabulary in Keyword Searching

    Get PDF
    In their 2005 study, Gross and Taylor found that more than a third of records retrieved by keyword searches would be lost without subject headings. A review of the literature since then shows that numerous studies, in various disciplines, have found that a quarter to a third of records returned in a keyword search would be lost without controlled vocabulary. Other writers, though, have continued to suggest that controlled vocabulary be discontinued. Addressing criticisms of the Gross/Taylor study, this study replicates the search process in the same online catalog, but after the addition of automated enriched metadata such as tables of contents and summaries. The proportion of results that would be lost remains high

    Thesaurus-assisted search term selection and query expansion: a review of user-centred studies

    Get PDF
    This paper provides a review of the literature related to the application of domain-specific thesauri in the search and retrieval process. Focusing on studies which adopt a user-centred approach, the review presents a survey of the methodologies and results from empirical studies undertaken on the use of thesauri as sources of term selection for query formulation and expansion during the search process. It summaries the ways in which domain-specific thesauri from different disciplines have been used by various types of users and how these tools aid users in the selection of search terms. The review consists of two main sections covering, firstly studies on thesaurus-aided search term selection and secondly those dealing with query expansion using thesauri. Both sections are illustrated with case studies that have adopted a user-centred approach

    Evolutionary Subject Tagging in the Humanities; Supporting Discovery and Examination in Digital Cultural Landscapes

    Get PDF
    In this paper, the authors attempt to identify problematic issues for subject tagging in the humanities, particularly those associated with information objects in digital formats. In the third major section, the authors identify a number of assumptions that lie behind the current practice of subject classification that we think should be challenged. We move then to propose features of classification systems that could increase their effectiveness. These emerged as recurrent themes in many of the conversations with scholars, consultants, and colleagues. Finally, we suggest next steps that we believe will help scholars and librarians develop better subject classification systems to support research in the humanities.NEH Office of Digital Humanities: Digital Humanities Start-Up Grant (HD-51166-10

    Spatial Search Strategies for Open Government Data: A Systematic Comparison

    Full text link
    The increasing availability of open government datasets on the Web calls for ways to enable their efficient access and searching. There is however an overall lack of understanding regarding spatial search strategies which would perform best in this context. To address this gap, this work has assessed the impact of different spatial search strategies on performance and user relevance judgment. We harvested machine-readable spatial datasets and their metadata from three English-based open government data portals, performed metadata enhancement, developed a prototype and performed both a theoretical and user-based evaluation. The results highlight that (i) switching between area of overlap and Hausdorff distance for spatial similarity computation does not have any substantial impact on performance; and (ii) the use of Hausdorff distance induces slightly better user relevance ratings than the use of area of overlap. The data collected and the insights gleaned may serve as a baseline against which future work can compare.Comment: Paper accepted to GIR'19: 13th Workshop on Geographic Information Retrieval (Lyon, France

    Treatment of Semantic Heterogeneity in Information Retrieval

    Full text link
    "Nowadays, users of information services are faced with highly decentralised, heterogeneous document sources with different content analysis. Semantic heterogeneity occurs e.g. when resources using different systems for content description are searched using a single query system. This report describes several approaches of handling semantic heterogeneity used in projects of the German Social Science Information Centre." (author's abstract
    corecore