51 research outputs found

    A Survey on Asking Clarification Questions Datasets in Conversational Systems

    The ability to understand a user's underlying needs is critical for conversational systems, especially when users provide limited input in a conversation. Thus, in such a domain, Asking Clarification Questions (ACQs) to reveal users' true intent from their queries or utterances arises as an essential task. However, a key limitation of existing ACQs studies is their incomparability, which stems from inconsistent use of data and from distinct experimental setups and evaluation strategies. Therefore, in this paper, to assist the development of ACQs techniques, we comprehensively analyse the current state of ACQs research, offering a detailed comparison of publicly available datasets and discussing the applied evaluation metrics, together with benchmarks for multiple ACQs-related tasks. In particular, given a thorough analysis of the ACQs task, we discuss a number of corresponding research directions for the investigation of ACQs as well as the development of conversational systems.

    Examining and improving the effectiveness of relevance feedback for retrieval of scanned text documents

    Important legacy paper documents are digitized and collected in online accessible archives. This enables the preservation, sharing, and, significantly, the searching of these documents. The text contents of these document images can be transcribed automatically using OCR systems and then stored in an information retrieval system. However, OCR systems make errors in character recognition, which have previously been shown to affect document retrieval behaviour. In particular, relevance feedback query-expansion methods, which are often effective for improving electronic text retrieval, are observed to be less reliable for retrieval of scanned document images. Our experimental examination of the effects of character recognition errors on an ad hoc OCR retrieval task demonstrates that, while baseline information retrieval can remain relatively unaffected by transcription errors, relevance feedback via query expansion becomes highly unstable. This paper examines the reason for this behaviour and introduces novel modifications to standard relevance feedback methods. These methods are shown experimentally to improve the effectiveness of relevance feedback for errorful OCR transcriptions. The new methods combine similar recognised character strings based on term collection frequency and a string edit-distance measure. The techniques are domain independent and make no use of external resources such as dictionaries or training data.
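
The idea of combining similar recognised strings can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's procedure: the abstract does not give the actual merging rule, so the greedy strategy, the `max_distance` threshold, the function names, and the sample frequencies are all hypothetical. The sketch maps each term to the most frequent term within a small Levenshtein distance and then pools their collection frequencies.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def merge_variants(collection_freq: dict, max_distance: int = 1) -> dict:
    """Map every term to a canonical form (assumed greedy rule).

    Terms are visited from most to least frequent, so frequent strings
    become canonical forms and rarer near-duplicates attach to them.
    """
    ordered = sorted(collection_freq, key=lambda t: (-collection_freq[t], t))
    canonical = []
    mapping = {}
    for term in ordered:
        target = next(
            (c for c in canonical if edit_distance(term, c) <= max_distance),
            None,
        )
        if target is None:
            canonical.append(term)
            target = term
        mapping[term] = target
    return mapping


def merged_frequencies(collection_freq: dict, mapping: dict) -> Counter:
    """Pool the collection frequencies of all variants of each canonical term."""
    merged = Counter()
    for term, freq in collection_freq.items():
        merged[mapping[term]] += freq
    return merged


# Hypothetical collection frequencies containing OCR-style misrecognitions.
sample = {"retrieval": 120, "retrieva1": 3, "retrleval": 2,
          "feedback": 80, "feedhack": 1}
sample_mapping = merge_variants(sample)
sample_merged = merged_frequencies(sample, sample_mapping)
```

With a threshold of one edit, the misrecognised variants in `sample` collapse onto `retrieval` and `feedback`. A real system would need to guard against merging distinct short words that happen to differ by one character.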

    QUERY OPTIMISATION USING AN IMPROVED GENETIC ALGORITHM

    This paper presents an approach to intelligent information retrieval based on genetic heuristics. Recent research has shown that applying genetic models for query optimisation improves retrieval effectiveness. We investigate ways to improve this process by combining genetic heuristics and information retrieval techniques. More precisely, we propose to integrate relevance feedback techniques to perform the genetic operators and the speciation heuristic to solve the relevance multimodality problem. Experiments with AP documents and queries from TREC showed the effectiveness of our approach. Keywords: Informatio

    Semi-automated ontology generation within OBO-Edit

    Motivation: Ontologies and taxonomies have proven highly beneficial for biocuration. The Open Biomedical Ontology (OBO) Foundry alone lists over 90 ontologies mainly built with OBO-Edit. Creating and maintaining such ontologies is a labour-intensive, difficult, manual process. Automating parts of it is of great importance for the further development of ontologies and for biocuration

    On-line learning for adaptive text filtering.

    Yu Kwok Leung. Thesis (M.Phil.)--Chinese University of Hong Kong, 1999. Includes bibliographical references (leaves 91-96). Abstracts in English and Chinese.
    Chapter 1. Introduction: 1.1 The Problem; 1.2 Information Filtering; 1.3 Contributions; 1.4 Organization Of The Thesis
    Chapter 2. Related Work
    Chapter 3. Adaptive Text Filtering: 3.1 Representation (3.1.1 Textual Document; 3.1.2 Filtering Profile); 3.2 On-line Learning Algorithms For Adaptive Text Filtering (3.2.1 The Sleeping Experts Algorithm; 3.2.2 The EG-based Algorithms)
    Chapter 4. The REPGER Algorithm: 4.1 A New Approach; 4.2 Relevance Prediction By RElevant feature Pool; 4.3 Retrieving Good Training Examples; 4.4 Learning Dissemination Threshold Dynamically
    Chapter 5. The Threshold Learning Algorithm: 5.1 Learning Dissemination Threshold Dynamically; 5.2 Existing Threshold Learning Techniques; 5.3 A New Threshold Learning Algorithm
    Chapter 6. Empirical Evaluations: 6.1 Experimental Methodology; 6.2 Experimental Settings; 6.3 Experimental Results
    Chapter 7. Integrating With Feature Clustering: 7.1 Distributional Clustering Algorithm; 7.2 Integrating With Our REPGER Algorithm; 7.3 Empirical Evaluation
    Chapter 8. Conclusions: 8.1 Summary; 8.2 Future Work
    Bibliography
    Chapter A. Experimental Results On The AP Corpus: A.1 The EG Algorithm; A.2 The EG-C Algorithm; A.3 The REPGER Algorithm
    Chapter B. Experimental Results On The FBIS Corpus: B.1 The EG Algorithm; B.2 The EG-C Algorithm; B.3 The REPGER Algorithm
    Chapter C. Experimental Results On The WSJ Corpus: C.1 The EG Algorithm; C.2 The EG-C Algorithm; C.3 The REPGER Algorithm

    Concept-based Interactive Query Expansion Support Tool (CIQUEST)

    This report describes a three-year project (2000-03) undertaken in the Information Studies Department at The University of Sheffield and funded by Resource, The Council for Museums, Archives and Libraries. The overall aim of the research was to provide user support for query formulation and reformulation in searching large-scale textual resources, including those of the World Wide Web. More specifically, the objectives were: to investigate and evaluate methods for the automatic generation and organisation of concepts derived from retrieved document sets, based on statistical methods for term weighting; and to conduct user-based evaluations on the understanding, presentation and retrieval effectiveness of concept structures in selecting candidate terms for interactive query expansion. The TREC test collection formed the basis for the seven evaluative experiments conducted in the course of the project. These formed four distinct phases in the project plan. In the first phase, a series of experiments was conducted to investigate further techniques for concept derivation and hierarchical organisation and structure. The second phase was concerned with user-based validation of the concept structures. The results of phases 1 and 2 informed the design of the test system, and the user interface was developed in phase 3. The final phase entailed a user-based summative evaluation of the CiQuest system. The main findings demonstrate that concept hierarchies can effectively be generated from sets of retrieved documents and displayed to searchers in a meaningful way. The approach provides the searcher with an overview of the contents of the retrieved documents, which in turn facilitates the viewing of documents and selection of the most relevant ones. Concept hierarchies are a good source of terms for query expansion and can improve precision. The extraction of descriptive phrases as an alternative source of terms was also effective. With respect to presentation, cascading menus were easy to browse for selecting terms and for viewing documents. In conclusion, the project dissemination programme and future work are outlined.

    Improving Information Retrieval Systems using Part of Speech Tagging

    The object of Information Retrieval is to retrieve all relevant documents for a user query and only those relevant documents. Much research has focused on achieving this objective with little regard for storage overhead or performance. In this paper we evaluate the use of Part of Speech Tagging to improve the index storage overhead and general speed of the system with only a minimal reduction in precision-recall measurements. We tagged 500 MB of the Los Angeles Times 1990 and 1989 document collection provided by TREC for parts of speech. We then experimented to find the most relevant part of speech to index. We show that 90 percent of precision-recall is achieved with 40 percent of the document collection's terms. We also show that this is an improvement in overhead with only a 1 percent reduction in precision-recall.

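    Inter-relação das técnicas Term Extration e Query Expansion aplicadas na recuperação de documentos textuais [Interrelation of the Term Extraction and Query Expansion techniques applied to the retrieval of textual documents]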

    Thesis (doctorate) - Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-graduação em Engenharia e Gestão do Conhecimento. According to Sighal (2006), people recognise the importance of storing and searching for information and, with the advent of computers, it became possible to store large quantities of it in databases. Consequently, cataloguing the information in these databases became indispensable. In this context, the field of Information Retrieval emerged in the 1950s with the aim of promoting the construction of computational tools that would allow users to use these databases more efficiently. The main objective of this research is to develop a Computational Model that enables the retrieval of textual documents ranked by semantic similarity, based on the intersection of the Term Extraction and Query Expansion techniques.