96,059 research outputs found

    Combining Text and Formula Queries in Math Information Retrieval: Evaluation of Query Results Merging Strategies

    Full text link
    Specific to Math Information Retrieval is combining text with mathematical formulae both in documents and in queries. Rigorous evaluation of query expansion and merging strategies combining math and standard textual keyword terms in a query are given. It is shown that techniques similar to those known from textual query processing may be applied in math information retrieval as well, and lead to a cutting edge performance. Striping and merging partial results from subqueries is one technique that improves results measured by information retrieval evaluation metrics like Bpref

    PARTIAL COORDINATION: A PRELIMINARY EVALUATION AND FAILURE ANALYSIS

    Get PDF
    Partial coordination is a new method for cataloging documents for subject access. It is especially designed to enhance the precision of document searches in online environments. This paper reports a preliminary evaluation of partial coordination which shows promising results compared with full text retrieval. We also report the difficulties in empirically evaluating the effectiveness of automatic full-text retrieval in contrast to mixed methods such as partial coordination which combine human cataloging with computerized retrieval. Based on our study we propose research in this area will substantially benefit from a common framework for failure analysis and a common data set. This will allow information retrieval researchers adapting "library style" cataloging to large electronic document collections, as well as those developing automated or mixed methods, to directly compare their proposals for indexing and retrieval. This paper concludes by suggesting guidelines for constructing such a testbed.Information Systems Working Papers Serie

    Concept coupling learning for improving concept lattice-based document retrieval

    Full text link
    © 2017 Elsevier Ltd The semantic information in any document collection is critical for query understanding in information retrieval. Existing concept lattice-based retrieval systems mainly rely on the partial order relation of formal concepts to index documents. However, the methods used by these systems often ignore the explicit semantic information between the formal concepts extracted from the collection. In this paper, a concept coupling relationship analysis model is proposed to learn and aggregate the intra- and inter-concept coupling relationships. The intra-concept coupling relationship employs the common terms of formal concepts to describe the explicit semantics of formal concepts. The inter-concept coupling relationship adopts the partial order relation of formal concepts to capture the implicit dependency of formal concepts. Based on the concept coupling relationship analysis model, we propose a concept lattice-based retrieval framework. This framework represents user queries and documents in a concept space based on fuzzy formal concept analysis, utilizes a concept lattice as a semantic index to organize documents, and ranks documents with respect to the learned concept coupling relationships. Experiments are performed on the text collections acquired from the SMART information retrieval system. Compared with classic concept lattice-based retrieval methods, our proposed method achieves at least 9%, 8% and 15% improvement in terms of average MAP, IAP@11 and P@10 respectively on all the collections

    Continual Learning for Generative Retrieval over Dynamic Corpora

    Full text link
    Generative retrieval (GR) directly predicts the identifiers of relevant documents (i.e., docids) based on a parametric model. It has achieved solid performance on many ad-hoc retrieval tasks. So far, these tasks have assumed a static document collection. In many practical scenarios, however, document collections are dynamic, where new documents are continuously added to the corpus. The ability to incrementally index new documents while preserving the ability to answer queries with both previously and newly indexed relevant documents is vital to applying GR models. In this paper, we address this practical continual learning problem for GR. We put forward a novel Continual-LEarner for generatiVE Retrieval (CLEVER) model and make two major contributions to continual learning for GR: (i) To encode new documents into docids with low computational cost, we present Incremental Product Quantization, which updates a partial quantization codebook according to two adaptive thresholds; and (ii) To memorize new documents for querying without forgetting previous knowledge, we propose a memory-augmented learning mechanism, to form meaningful connections between old and new documents. Empirical results demonstrate the effectiveness and efficiency of the proposed model.Comment: Accepted by CIKM 202

    An integrated information retrieval and document management system

    Get PDF
    This paper describes the requirements and prototype development for an intelligent document management and information retrieval system that will be capable of handling millions of pages of text or other data. Technologies for scanning, Optical Character Recognition (OCR), magneto-optical storage, and multiplatform retrieval using a Standard Query Language (SQL) will be discussed. The semantic ambiguity inherent in the English language is somewhat compensated-for through the use of coefficients or weighting factors for partial synonyms. Such coefficients are used both for defining structured query trees for routine queries and for establishing long-term interest profiles that can be used on a regular basis to alert individual users to the presence of relevant documents that may have just arrived from an external source, such as a news wire service. Although this attempt at evidential reasoning is limited in comparison with the latest developments in AI Expert Systems technology, it has the advantage of being commercially available

    Combining relevance information in a synchronous collaborative information retrieval environment

    Get PDF
    Traditionally information retrieval (IR) research has focussed on a single user interaction modality, where a user searches to satisfy an information need. Recent advances in both web technologies, such as the sociable web of Web 2.0, and computer hardware, such as tabletop interface devices, have enabled multiple users to collaborate on many computer-related tasks. Due to these advances there is an increasing need to support two or more users searching together at the same time, in order to satisfy a shared information need, which we refer to as Synchronous Collaborative Information Retrieval. Synchronous Collaborative Information Retrieval (SCIR) represents a significant paradigmatic shift from traditional IR systems. In order to support an effective SCIR search, new techniques are required to coordinate users' activities. In this chapter we explore the effectiveness of a sharing of knowledge policy on a collaborating group. Sharing of knowledge refers to the process of passing relevance information across users, if one user finds items of relevance to the search task then the group should benefit in the form of improved ranked lists returned to each searcher. In order to evaluate the proposed techniques we simulate two users searching together through an incremental feedback system. The simulation assumes that users decide on an initial query with which to begin the collaborative search and proceed through the search by providing relevance judgments to the system and receiving a new ranked list. In order to populate these simulations we extract data from the interaction logs of various experimental IR systems from previous Text REtrieval Conference (TREC) workshops

    Ranking expansion terms using partial and ostensive evidence

    Get PDF
    In this paper we examine the problem of ranking candidate expansion terms for query expansion. We show, by an extension to the traditional F4 scheme, how partial relevance assessments (how relevant a document is) and ostensive evidence (when a document was assessed relevant) can be incorporated into a term ranking function. We then investigate this new term ranking function in three user experiments, examining the performance of our function for automatic and interactive query expansion. We show that the new function not only suggests terms that are preferred by searchers but suggests terms that can lead to more use of expansion terms

    Improving average ranking precision in user searches for biomedical research datasets

    Full text link
    Availability of research datasets is keystone for health and life science study reproducibility and scientific progress. Due to the heterogeneity and complexity of these data, a main challenge to be overcome by research data management systems is to provide users with the best answers for their search queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we investigate a novel ranking pipeline to improve the search of datasets used in biomedical experiments. Our system comprises a query expansion model based on word embeddings, a similarity measure algorithm that takes into consideration the relevance of the query terms, and a dataset categorisation method that boosts the rank of datasets matching query constraints. The system was evaluated using a corpus with 800k datasets and 21 annotated user queries. Our system provides competitive results when compared to the other challenge participants. In the official run, it achieved the highest infAP among the participants, being +22.3% higher than the median infAP of the participant's best submissions. Overall, it is ranked at top 2 if an aggregated metric using the best official measures per participant is considered. The query expansion method showed positive impact on the system's performance increasing our baseline up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively. Our similarity measure algorithm seems to be robust, in particular compared to Divergence From Randomness framework, having smaller performance variations under different training conditions. Finally, the result categorization did not have significant impact on the system's performance. We believe that our solution could be used to enhance biomedical dataset management systems. In particular, the use of data driven query expansion methods could be an alternative to the complexity of biomedical terminologies

    Knowledge Engineering in Search Engines

    Get PDF
    With large amounts of information being exchanged on the Internet, search engines have become the most popular tools for helping users to search and filter this information. However, keyword-based search engines sometimes obtain information, which does not meet user’ needs. Some of them are even irrelevant to what the user queries. When the users get query results, they have to read and organize them by themselves. It is not easy for users to handle information when a search engine returns several million results. This project uses a granular computing approach to find knowledge structures of a search engine. The project focuses on knowledge engineering components of a search engine. Based on the earlier work of Dr. Lin and his former student [1], it represents concepts in the Web by simplicial complexes. We found that to represent simplicial complexes adequately, we only need the maximal simplexes. Therefore, this project focuses on building maximal simplexes. Since it is too costly to analyze all Web pages or documents, the project uses the sampling method to get sampling documents. The project constructs simplexes of documents and uses the simplexes to find maximal simplexes. These maximal simplexes are regarded as primitive concepts that can represent Web pages or documents. The maximal simplexes can be used to build an index of a search engine in the future
    • 

    corecore