
    Using a Medical Thesaurus to Predict Query Difficulty

    Estimating query performance is the task of predicting the quality of results returned by a search engine in response to a query. In this paper, we focus on pre-retrieval prediction methods for the medical domain. We propose a novel predictor that exploits a thesaurus to ascertain how difficult queries are. In our experiments, we show that our predictor outperforms state-of-the-art methods that do not use a thesaurus.
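For readers unfamiliar with pre-retrieval prediction, the sketch below shows a classic thesaurus-free baseline, average inverse document frequency (AvgIDF). It needs only collection statistics and no retrieval run. This is not the thesaurus-based predictor proposed in the paper. The function name `avg_idf` and the toy document frequencies are illustrative assumptions, and query terms are assumed to be normalized already.

```python
import math
from typing import Dict, Iterable


def avg_idf(query_terms: Iterable[str], doc_freq: Dict[str, int], num_docs: int) -> float:
    """Mean IDF of the query terms, using collection statistics only.

    Terms that never occur in the collection (df == 0) are skipped in this
    sketch; returns 0.0 if no query term is known.
    """
    idfs = [
        math.log(num_docs / doc_freq[t])
        for t in query_terms
        if doc_freq.get(t, 0) > 0
    ]
    return sum(idfs) / len(idfs) if idfs else 0.0


# Hypothetical collection of 1,000 documents.
df = {"heart": 10, "attack": 100, "pain": 500}
specific = avg_idf(["heart", "attack"], df, 1000)
generic = avg_idf(["pain"], df, 1000)
print(specific, generic)
```

A higher score means the query uses rarer, more specific terms, which such baselines commonly read as a sign of an easier query.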

    Finding Support Documents with a Logistic Regression Approach

    Entity retrieval finds relevant results for a user’s information need at a finer-grained unit called an “entity”. To retrieve such entities, systems usually first locate a small set of support documents that contain answer entities, and then detect the answer entities within this set. In the literature, support documents are treated as relevant documents, and finding them is cast as a conventional document retrieval problem. In this paper, we argue that finding support documents and finding relevant documents, although they sound similar, differ in important ways. We further propose a logistic regression approach to finding support documents. Our experimental results show that the logistic regression method performs significantly better than a baseline system that treats support document finding as a conventional document retrieval problem.
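The sketch below illustrates the general idea of scoring candidate documents with logistic regression, trained by plain batch gradient descent. The abstract does not list the paper's features. The two features used here are hypothetical: a retrieval score and a count of mentions of entities of the expected answer type. The function names are also illustrative.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(
    X: Sequence[Sequence[float]], y: Sequence[int], lr: float = 0.5, epochs: int = 1000
) -> List[float]:
    """Batch gradient descent on mean log-loss; returns [bias, w1, ..., wd]."""
    d, n = len(X[0]), len(X)
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = support_probability(w, xi) - yi  # dLoss/dlogit for log-loss
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def support_probability(w: Sequence[float], x: Sequence[float]) -> float:
    # Estimated probability that a document with features x is a support document.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


# Hypothetical features: [retrieval_score, answer_type_mentions]; label 1 = support doc.
X = [[0.9, 3.0], [0.4, 2.0], [0.8, 0.0], [0.3, 0.0]]
y = [1, 1, 0, 0]
w = train_logistic_regression(X, y)
candidates = {"d1": [0.5, 3.0], "d2": [0.5, 0.0]}
ranked = sorted(candidates, key=lambda d: support_probability(w, candidates[d]), reverse=True)
print(ranked)
```

In this toy data the retrieval score barely separates the classes, but the entity-mention feature does. This mirrors, in a simplified way, the abstract's point that support documents are not simply the top-scoring relevant documents.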

    HILT : High-Level Thesaurus Project. Phase IV and Embedding Project Extension : Final Report

    A major challenge facing the JISC domain (and, indeed, other domains beyond JISC) is ensuring that Higher Education (HE) and Further Education (FE) users of the JISC IE can find appropriate learning, research and information resources by subject search and browse, in an environment where most national and institutional service providers, usually for very good local reasons, use different subject schemes to describe their resources. Encouraging the use of standard terminologies in some services (institutional repositories, for example) is a related challenge. Under the auspices of the HILT project, JISC has been investigating mechanisms to help the community with this problem through a JISC Shared Infrastructure Service. Such a service would help optimise the value obtained from expenditure on content and services by facilitating subject-search-based resource sharing to benefit users in the learning and research communities. The project has been through a number of phases, with work from earlier phases reported both in published work elsewhere and in project reports (see the project website: http://hilt.cdlr.strath.ac.uk/). HILT Phase IV had two elements: the core project, whose focus was 'to research, investigate and develop pilot solutions for problems pertaining to cross-searching multi-subject scheme information environments, as well as providing a variety of other terminological searching aids', and a short extension covering the pilot embedding of routines that interact with HILT M2M services in the user interfaces of various information services serving the JISC community. Both elements contributed to the developments summarised in this report.

    Review implementation of linguistic approach in schema matching

    Research on schema matching has been conducted over the last decade. A number of approaches have been proposed, using methods such as neural networks, feature selection, and constraint-based, instance-based, and linguistic techniques. Fields such as e-commerce, e-business, and data warehousing use schema matching as a basic model. The linguistic approach has long been applied to problems such as calculating similarity values between entities in two or more schemas. The purpose of this paper is to provide an overview of previous studies on implementing the linguistic approach in schema matching and to identify gaps for developing existing methods further. Furthermore, this paper focuses on similarity measurement in the linguistic approach to schema matching.
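As a concrete example of similarity measurement in a linguistic (name-based) matcher, the sketch below normalizes attribute names into tokens and compares them with normalized Levenshtein similarity. It then pairs each source attribute with its most similar target attribute above a threshold. The tokenization rules, the threshold of 0.6, and all function names are illustrative assumptions. Linguistic matchers surveyed in this area often also use synonym resources such as thesauri, which this sketch omits.

```python
import re
from typing import Dict, List, Sequence, Tuple


def tokenize(name: str) -> List[str]:
    # Split "custAddress", "cust_address", or "Cust-Address" into lowercase tokens.
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return [t.lower() for t in re.split(r"[\s_\-]+", spaced) if t]


def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance (insert, delete, substitute).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def name_similarity(a: str, b: str) -> float:
    # 1 - normalized edit distance between the token-normalized names, in [0, 1].
    sa, sb = " ".join(tokenize(a)), " ".join(tokenize(b))
    if not sa and not sb:
        return 1.0
    return 1.0 - levenshtein(sa, sb) / max(len(sa), len(sb))


def match_schemas(
    source: Sequence[str], target: Sequence[str], threshold: float = 0.6
) -> Dict[str, Tuple[str, float]]:
    # Map each source attribute to its best-scoring target attribute, if good enough.
    matches: Dict[str, Tuple[str, float]] = {}
    if not target:
        return matches
    for s in source:
        best = max(target, key=lambda t: name_similarity(s, t))
        score = name_similarity(s, best)
        if score >= threshold:
            matches[s] = (best, round(score, 3))
    return matches


print(match_schemas(["custAddress", "zzz"], ["cust_address", "order_id"]))
```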

    A survey on the use of relevance feedback for information access systems

    Users of online search engines often find it difficult to express their need for information in the form of a query. However, if the user can identify examples of the kind of documents they require then they can employ a technique known as relevance feedback. Relevance feedback covers a range of techniques intended to improve a user's query and facilitate retrieval of information relevant to a user's information need. In this paper we survey relevance feedback techniques. We study both automatic techniques, in which the system modifies the user's query, and interactive techniques, in which the user has control over query modification. We also consider specific interfaces to relevance feedback systems and characteristics of searchers that can affect the use and success of relevance feedback systems
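A well-known example of the automatic techniques described above is the Rocchio algorithm. It moves the query vector towards the centroid of documents judged relevant and away from the centroid of non-relevant ones. The sketch below is offered as an illustration of that family, not as the survey's own method. Term vectors are assumed to be sparse `dict`s of term weights. The defaults α=1.0, β=0.75, γ=0.15 are commonly cited illustrative values, not values taken from this paper.

```python
from collections import defaultdict
from typing import Dict, List

Vector = Dict[str, float]


def rocchio(
    query: Vector,
    relevant: List[Vector],
    nonrelevant: List[Vector],
    alpha: float = 1.0,
    beta: float = 0.75,
    gamma: float = 0.15,
) -> Vector:
    """q' = alpha*q + beta*mean(relevant) - gamma*mean(nonrelevant)."""
    new_q: Dict[str, float] = defaultdict(float)
    for term, w in query.items():
        new_q[term] += alpha * w
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, w in doc.items():
                new_q[term] += coef * w / len(docs)  # contributes coef * centroid
    # Negative weights are usually dropped from the reformulated query.
    return {t: w for t, w in new_q.items() if w > 0}


q = {"jaguar": 1.0}
rel = [{"jaguar": 1.0, "cat": 1.0}]
nonrel = [{"jaguar": 1.0, "car": 1.0}]
print(rocchio(q, rel, nonrel))
```

Here the user's feedback adds "cat" to the query and suppresses "car". This is the kind of query modification the survey calls automatic, as opposed to interactive feedback, where the user edits the query.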

    Examining Users’ Knowledge Change in the Task Completion Process

    This paper examines how information searchers’ topic knowledge levels change while they complete information tasks. The study used multi-session tasks, which made it convenient to elicit users’ topic knowledge throughout the process of completing the whole task. The study was a 3-session laboratory experiment with 24 participants, each working on one subtask per session of an assigned 3-session general task. The general task had either a parallel or a dependent structure. Questionnaires were administered before and after each session to elicit users’ perceptions of their knowledge levels, task attributes, and other task features, for both the overall task and the subtasks. Our results support the assumption that users’ knowledge generally increases after each search session, but there were exceptions in which a “ceiling” effect was shown. We also found that knowledge was correlated with users’ perceptions of task attributes and accomplishment. In addition, task type was found to affect several aspects of knowledge levels and knowledge change. These findings further our understanding of users’ knowledge in information tasks and are thus helpful for information retrieval research and system design.

    Human evaluation of Kea, an automatic keyphrasing system.

    This paper describes an evaluation of the Kea automatic keyphrase extraction algorithm. Tools that automatically identify keyphrases are desirable because document keyphrases have numerous applications in digital library systems but are costly and time-consuming to assign manually. Keyphrase extraction algorithms are usually evaluated by comparison with author-specified keywords, but this methodology has several well-known shortcomings. The results presented in this paper are based on subjective evaluations of the quality and appropriateness of keyphrases by human assessors, and make a number of contributions. First, they validate previous evaluations of Kea that rely on author keywords. Second, they show that Kea's performance is comparable to that of similar systems that have been evaluated by human assessors. Finally, they justify the use of author keyphrases as a performance metric by showing that authors generally choose good keywords.
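To make the author-keyword methodology concrete, the sketch below scores a list of extracted keyphrases against author-assigned ones using exact matching after light normalization. It reports hits, precision, and recall. The normalization here is only lowercasing and whitespace collapsing, which is a simplifying assumption; real evaluations of Kea-style systems typically also apply stemming. The function names and example phrases are hypothetical.

```python
from typing import Iterable, Tuple


def normalize(phrase: str) -> str:
    # Lowercase and collapse internal whitespace (no stemming in this sketch).
    return " ".join(phrase.lower().split())


def keyphrase_match(extracted: Iterable[str], author: Iterable[str]) -> Tuple[int, float, float]:
    """Return (hits, precision, recall) of extracted phrases against author phrases."""
    ext = [p for p in dict.fromkeys(normalize(x) for x in extracted) if p]  # dedupe, keep order
    gold = {normalize(x) for x in author if normalize(x)}
    hits = sum(1 for p in ext if p in gold)
    precision = hits / len(ext) if ext else 0.0
    recall = hits / len(gold) if gold else 0.0
    return hits, precision, recall


extracted = ["Digital Libraries", "keyphrase extraction", "machine learning", "evaluation"]
author = ["keyphrase extraction", "digital  libraries", "human assessment"]
print(keyphrase_match(extracted, author))
```

Exact-match scoring like this penalizes good keyphrases that the author simply did not choose. That is one of the shortcomings that motivates evaluating with human assessors instead.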

    A Maximum-Entropy approach for accurate document annotation in the biomedical domain

    The growing volume of scientific literature on the Web and the lack of efficient tools for classifying and searching documents are the two most important factors affecting search speed and result quality. Previous studies have shown that using ontologies makes it possible to process document and query information at the semantic level, which greatly improves the search for relevant information and takes a step further towards the Semantic Web. A fundamental step in these approaches is annotating documents with ontology concepts, which can also be seen as a classification task. In this paper we address this issue for the biomedical domain and present a new automated and robust method, based on a Maximum Entropy approach, for annotating biomedical literature documents with terms from the Medical Subject Headings (MeSH).