
    Concept-based Interactive Query Expansion Support Tool (CIQUEST)

    This report describes a three-year project (2000-03) undertaken in the Information Studies Department at The University of Sheffield and funded by Resource, The Council for Museums, Archives and Libraries. The overall aim of the research was to provide user support for query formulation and reformulation in searching large-scale textual resources, including those of the World Wide Web. More specifically, the objectives were: to investigate and evaluate methods for the automatic generation and organisation of concepts derived from retrieved document sets, based on statistical methods for term weighting; and to conduct user-based evaluations on the understanding, presentation and retrieval effectiveness of concept structures in selecting candidate terms for interactive query expansion. The TREC test collection formed the basis for the seven evaluative experiments conducted in the course of the project. The experiments fell into four distinct phases of the project plan. In the first phase, a series of experiments was conducted to investigate further techniques for concept derivation and hierarchical organisation and structure. The second phase was concerned with user-based validation of the concept structures. The results of phases 1 and 2 informed the design of the test system, and the user interface was developed in phase 3. The final phase entailed a user-based summative evaluation of the CiQuest system. The main findings demonstrate that concept hierarchies can effectively be generated from sets of retrieved documents and displayed to searchers in a meaningful way. The approach provides the searcher with an overview of the contents of the retrieved documents, which in turn facilitates the viewing of documents and selection of the most relevant ones. Concept hierarchies are a good source of terms for query expansion and can improve precision. The extraction of descriptive phrases as an alternative source of terms was also effective. With respect to presentation, cascading menus were easy to browse for selecting terms and for viewing documents. In conclusion, the project dissemination programme and future work are outlined.

    HITS and misses: combining BM25 with HITS for expert search

    This paper describes the participation of Dublin City University in the CriES (Cross-Lingual Expert Search) pilot challenge. To realize expert search, we combine traditional information retrieval (IR) using the BM25 model with reranking of results using the HITS algorithm. The experiments were performed on two indexes, one containing all questions and one containing all answers. Two runs were submitted. The first combines the results from IR on the questions with authority values from HITS; the second contains the results from IR on the answers, reranked with authority values. To investigate the impact of multilinguality, additional experiments were conducted on the English topic subset and on all topics translated into English with Google Translate. The overall performance is moderate and leaves much room for improvement. However, reranking results with authority values from HITS typically improved results, and in many experiments it more than doubled both the number of relevant retrieved results and precision at 10 documents.
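The abstract does not say how the retrieval scores and HITS authority values were combined, so the following is only a minimal sketch under stated assumptions: a directed graph with an edge from each asker to each answerer, and a linear mix of max-normalised retrieval scores with authority values. The names `hits`, `rerank`, and `weight` are hypothetical and do not come from the paper.

```python
from math import sqrt


def normalise(scores):
    """Scale a score dict to unit Euclidean length (no-op if all zeros)."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {k: v / norm for k, v in scores.items()}


def hits(edges, iterations=50):
    """Run HITS on a directed graph given as (source, target) pairs.

    Returns (hub, authority) dicts keyed by node.
    """
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a node: sum of hub scores of the nodes pointing to it.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub score of a node: sum of authorities of the nodes it points to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth = normalise(new_auth)
        hub = normalise(new_hub)
    return hub, auth


def rerank(ir_scores, authority, weight=0.5):
    """Rerank candidates by a linear mix of retrieval score and authority.

    The mixing formula is an assumption made for illustration only.
    """
    top = max(ir_scores.values(), default=0.0) or 1.0
    combined = {
        cand: (1 - weight) * score / top + weight * authority.get(cand, 0.0)
        for cand, score in ir_scores.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


# Toy graph: users "a" and "b" asked questions answered by "c"; "a" was also answered by "d".
edges = [("a", "c"), ("b", "c"), ("a", "d")]
hub, auth = hits(edges)
ranking = rerank({"c": 1.0, "d": 1.2}, auth)
```

In this toy graph, "c" answers more askers than "d" and so receives the higher authority value, which is enough to move it above "d" despite its lower retrieval score. Setting `weight=0.0` falls back to the retrieval ranking alone.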

    Finding Relevant Answers in Software Forums

    Online software forums provide a huge amount of valuable content. Developers and users often ask questions and receive answers from such forums. The availability of a vast amount of thread discussions in forums provides ample opportunities for knowledge acquisition and summarization. For a given search query, current search engines use a traditional information retrieval approach to extract webpages containin

    EagleBot: A Chatbot Based Multi-Tier Question Answering System for Retrieving Answers From Heterogeneous Sources Using BERT

    This paper proposes to tackle question answering on a specific domain by developing a multi-tier system that uses three different types of data storage for storing answers. To test our system on the university domain, we used data extracted from the Georgia Southern University website. For faster retrieval, we divided our answer data sources into three distinct types and used Dialogflow's Natural Language Understanding engine for route selection. We compared different word and sentence embedding techniques for building a semantic question search engine, and BERT sentence embedding gave the best result. For extracting answers from a large collection of documents, we also achieved the highest accuracy using the BERT-base model. Besides the BERT-base model, we achieved competitive accuracy by using BERT embedding on paragraph-split documents. We were also able to reduce answer retrieval time by a large percentage by using pre-stored embeddings.
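The idea of pre-stored embeddings can be sketched as follows. This is a minimal illustration, not the system described in the abstract: `embed` is a toy bag-of-words stand-in for a BERT sentence encoder, and `QuestionIndex` and the sample question-answer pairs are hypothetical.

```python
from math import sqrt


def embed(text):
    """Toy stand-in for a sentence encoder: a bag-of-words count vector."""
    vec = {}
    for token in text.lower().split():
        token = token.strip("?.,!")
        if token:
            vec[token] = vec.get(token, 0) + 1
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


class QuestionIndex:
    """Stores question embeddings once so each query needs only one new embedding."""

    def __init__(self, qa_pairs):
        # Embeddings are computed up front and kept alongside their answers.
        self.entries = [(embed(question), answer) for question, answer in qa_pairs]

    def answer(self, query):
        """Return the answer whose stored question is most similar to the query."""
        query_vec = embed(query)
        best = max(self.entries, key=lambda entry: cosine(query_vec, entry[0]))
        return best[1]


# Made-up illustration data, not taken from the paper.
index = QuestionIndex([
    ("When does the library open?", "answer-1"),
    ("Where is the admissions office?", "answer-2"),
])
result = index.answer("What time does the library open")
```

Because the stored questions are embedded only once, the per-query cost is one embedding plus the similarity comparisons, which is where the time saving would come from when the encoder is expensive.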

    Reliable online social network data collection

    Large quantities of information are shared through online social networks, making them attractive sources of data for social network research. When studying the usage of online social networks, these data may not properly describe users’ behaviours. For instance, the data collected often include only content shared by the users, or content accessible to the researchers, hence obscuring a large amount of data that would help in understanding users’ behaviours and privacy concerns. Moreover, the data collection methods employed in experiments may also have an effect on data reliability when participants self-report inaccurate information or are observed while using a simulated application. Understanding the effects of these collection methods on data reliability is paramount for the study of social networks; for understanding user behaviour; for designing socially-aware applications and services; and for mining data collected from such social networks and applications. This chapter reviews previous research which has looked at social network data collection and user behaviour in these networks. We highlight shortcomings in the methods used in these studies, and introduce our own methodology and user study based on the Experience Sampling Method; we claim our methodology leads to the collection of more reliable data by capturing both those data which are shared and those which are not. We conclude with suggestions for collecting and mining data from online social networks.