Combining Text and Formula Queries in Math Information Retrieval: Evaluation of Query Results Merging Strategies
A distinctive feature of Math Information Retrieval is that text is combined with mathematical
formulae, both in documents and in queries. A rigorous evaluation of query
expansion and merging strategies combining math and standard textual keyword
terms in a query is given. It is shown that techniques similar to those known
from textual query processing may be applied in math information retrieval as
well, and lead to cutting-edge performance. Striping and merging partial
results from subqueries is one technique that improves results as measured by
information retrieval evaluation metrics such as Bpref.
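The abstract does not say how partial results from the text and formula subqueries are merged. As one familiar option, the sketch below uses CombSUM fusion over min-max normalized scores; the function names and the example scores are hypothetical, and this is not presented as the paper's merging strategy.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale a {doc_id: score} dict to [0, 1]; a constant list maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def merge_subquery_results(result_lists, weights=None, k=None):
    """CombSUM fusion: add up each document's normalized score across subqueries."""
    if weights is None:
        weights = [1.0] * len(result_lists)
    fused = defaultdict(float)
    for scores, weight in zip(result_lists, weights):
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += weight * s
    # Highest fused score first; ties broken by doc id for a stable order.
    ranking = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ranking if k is None else ranking[:k]


# Hypothetical scores from a keyword-only subquery and a formula-only subquery.
text_hits = {"d1": 12.0, "d2": 9.0, "d3": 3.0}
formula_hits = {"d2": 0.9, "d4": 0.7, "d1": 0.1}
print(merge_subquery_results([text_hits, formula_hits]))
```

Normalization matters here because a keyword engine and a formula engine usually score on unrelated scales; after rescaling, `d2` ranks first because it does well in both subqueries.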
PARTIAL COORDINATION: A PRELIMINARY EVALUATION AND FAILURE ANALYSIS
Partial coordination is a new method for cataloging documents for
subject access. It is especially designed to enhance the precision of document
searches in online environments. This paper reports a preliminary
evaluation of partial coordination which shows promising results compared
with full text retrieval. We also report the difficulties in empirically
evaluating the effectiveness of automatic full-text retrieval in contrast to
mixed methods such as partial coordination which combine human
cataloging with computerized retrieval. Based on our study, we propose that
research in this area would substantially benefit from a common framework for
failure analysis and a common data set. This would allow information retrieval
researchers who adapt "library style" cataloging to large electronic document
collections, as well as those developing automated or mixed methods, to
directly compare their proposals for indexing and retrieval. This paper
concludes by suggesting guidelines for constructing such a testbed.
Information Systems Working Papers Series
Concept coupling learning for improving concept lattice-based document retrieval
© 2017 Elsevier Ltd. The semantic information in any document collection is critical for query understanding in information retrieval. Existing concept lattice-based retrieval systems mainly rely on the partial order relation of formal concepts to index documents. However, the methods used by these systems often ignore the explicit semantic information between the formal concepts extracted from the collection. In this paper, a concept coupling relationship analysis model is proposed to learn and aggregate the intra- and inter-concept coupling relationships. The intra-concept coupling relationship employs the common terms of formal concepts to describe the explicit semantics of formal concepts. The inter-concept coupling relationship adopts the partial order relation of formal concepts to capture the implicit dependency of formal concepts. Based on the concept coupling relationship analysis model, we propose a concept lattice-based retrieval framework. This framework represents user queries and documents in a concept space based on fuzzy formal concept analysis, utilizes a concept lattice as a semantic index to organize documents, and ranks documents with respect to the learned concept coupling relationships. Experiments are performed on the text collections acquired from the SMART information retrieval system. Compared with classic concept lattice-based retrieval methods, our proposed method achieves improvements of at least 9%, 8% and 15% in average MAP, IAP@11 and P@10, respectively, on all the collections.
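To make "formal concepts" and their partial order concrete, here is a minimal sketch of crisp (not fuzzy) formal concept analysis on a toy document-term context. It only enumerates (extent, intent) pairs; the paper's fuzzy FCA and its coupling-relationship learning are not reproduced, and all names and data are hypothetical.

```python
def derive_extent(intent, context):
    """Documents that contain every term of `intent`."""
    return frozenset(doc for doc, terms in context.items() if intent <= terms)


def derive_intent(extent, context, all_terms):
    """Terms shared by every document of `extent` (all terms for an empty extent)."""
    shared = set(all_terms)
    for doc in extent:
        shared &= context[doc]
    return frozenset(shared)


def formal_concepts(context):
    """All (extent, intent) pairs of a small crisp document-term context.

    Every concept intent is an intersection of document term sets (or the full
    term set), so we close the document term sets under intersection. This is
    exponential in the worst case and only meant for toy examples.
    """
    all_terms = frozenset().union(*context.values())
    intents = {all_terms}
    for terms in context.values():
        intents |= {frozenset(terms) & existing for existing in intents}
    concepts = [(derive_extent(intent, context), intent) for intent in intents]
    # Smaller extents (more specific concepts) first.
    return sorted(concepts, key=lambda c: (len(c[0]), sorted(c[0])))


context = {
    "d1": {"lattice", "retrieval"},
    "d2": {"retrieval", "query"},
    "d3": {"lattice", "retrieval", "query"},
}
for extent, intent in formal_concepts(context):
    print(sorted(extent), sorted(intent))
```

The partial order used for indexing follows from the extents: one concept sits below another exactly when its document set is a subset of the other's, so `({"d3"}, {lattice, query, retrieval})` is below `({"d1", "d3"}, {lattice, retrieval})`.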
Continual Learning for Generative Retrieval over Dynamic Corpora
Generative retrieval (GR) directly predicts the identifiers of relevant
documents (i.e., docids) based on a parametric model. It has achieved solid
performance on many ad-hoc retrieval tasks. So far, these tasks have assumed a
static document collection. In many practical scenarios, however, document
collections are dynamic, where new documents are continuously added to the
corpus. The ability to incrementally index new documents while preserving the
ability to answer queries with both previously and newly indexed relevant
documents is vital to applying GR models. In this paper, we address this
practical continual learning problem for GR. We put forward a novel
Continual-LEarner for generatiVE Retrieval (CLEVER) model and make two major
contributions to continual learning for GR: (i) To encode new documents into
docids with low computational cost, we present Incremental Product
Quantization, which updates a partial quantization codebook according to two
adaptive thresholds; and (ii) To memorize new documents for querying without
forgetting previous knowledge, we propose a memory-augmented learning
mechanism, to form meaningful connections between old and new documents.
Empirical results demonstrate the effectiveness and efficiency of the proposed
model.
Comment: Accepted by CIKM 202
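For context on how document vectors can become docids, the sketch below shows plain product quantization: one small k-means codebook per subspace, with a docid formed from the nearest-centroid index in each subspace. It is an assumption-laden illustration only; CLEVER's Incremental Product Quantization, with its partial codebook updates and two adaptive thresholds, is not reproduced here, and every name and parameter is hypothetical.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))


def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; a cluster that empties keeps its previous centroid."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def split(vector, num_subspaces):
    """Cut a vector into equal-length sub-vectors (dimension must divide evenly)."""
    sub_dim = len(vector) // num_subspaces
    return [tuple(vector[m * sub_dim:(m + 1) * sub_dim]) for m in range(num_subspaces)]


def train_pq(vectors, num_subspaces, k):
    """Standard product quantization: one k-means codebook per subspace."""
    parts = [split(v, num_subspaces) for v in vectors]
    return [kmeans([p[m] for p in parts], k, seed=m) for m in range(num_subspaces)]


def encode_docid(vector, codebooks):
    """Docid = tuple of nearest-centroid indices, one per subspace."""
    parts = split(vector, len(codebooks))
    return tuple(nearest(part, cb) for part, cb in zip(parts, codebooks))


rng = random.Random(42)
corpus = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(60)]
codebooks = train_pq(corpus, num_subspaces=4, k=4)
new_doc = [rng.gauss(0, 1) for _ in range(8)]
print(encode_docid(new_doc, codebooks))  # a 4-tuple of indices in range(4)
```

Encoding a newly arrived document only costs a nearest-centroid lookup per subspace against existing codebooks, which is why quantization-based docids are attractive for dynamic corpora; deciding when and how to update the codebooks is the part the abstract attributes to its adaptive thresholds.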
An integrated information retrieval and document management system
This paper describes the requirements and prototype development for an intelligent document management and information retrieval system that will be capable of handling millions of pages of text or other data. Technologies for scanning, Optical Character Recognition (OCR), magneto-optical storage, and multiplatform retrieval using SQL (Structured Query Language) will be discussed. The semantic ambiguity inherent in the English language is partly compensated for through the use of coefficients, or weighting factors, for partial synonyms. Such coefficients are used both for defining structured query trees for routine queries and for establishing long-term interest profiles that can be used on a regular basis to alert individual users to the presence of relevant documents that may have just arrived from an external source, such as a news wire service. Although this attempt at evidential reasoning is limited in comparison with the latest developments in AI Expert Systems technology, it has the advantage of being commercially available.
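A minimal sketch of how partial-synonym coefficients could feed both a query-tree node and a standing interest profile. The synonym table, the choice of `min` as the AND combiner (a fuzzy-logic convention), and the alert threshold are all hypothetical; the abstract does not specify how the coefficients are combined.

```python
import re

# Hypothetical partial-synonym table: each concept maps to terms with a coefficient
# expressing how strongly that term stands for the concept (1.0 = exact match).
SYNONYMS = {
    "car": {"car": 1.0, "automobile": 0.9, "vehicle": 0.5},
    "crash": {"crash": 1.0, "collision": 0.8, "accident": 0.6},
}


def concept_evidence(concept, tokens):
    """Strongest synonym coefficient of `concept` found among the document tokens."""
    weights = SYNONYMS.get(concept, {concept: 1.0})
    return max((w for term, w in weights.items() if term in tokens), default=0.0)


def score_and(concepts, text):
    """AND node of a query tree: a document is only as good as its weakest concept."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return min(concept_evidence(c, tokens) for c in concepts)


def alert(profile, incoming, threshold=0.5):
    """Flag newly arrived documents whose score against a standing profile passes the threshold."""
    return [doc_id for doc_id, text in incoming.items() if score_and(profile, text) >= threshold]


wire = {
    "n1": "Automobile collision closes highway",
    "n2": "Vehicle recall announced",
    "n3": "Car accident downtown",
}
print(alert(["car", "crash"], wire))  # ['n1', 'n3']
```

Here `n2` mentions a partial synonym of "car" but nothing for "crash", so the AND node scores it 0.0 and no alert is raised.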
Combining relevance information in a synchronous collaborative information retrieval environment
Traditionally, information retrieval (IR) research has focussed on a single user interaction modality, where a user searches to satisfy an information need. Recent advances in both web technologies, such as the sociable web of Web 2.0, and computer hardware, such as tabletop interface devices, have enabled multiple users to collaborate on many computer-related tasks. These advances create a growing need to support two or more users searching together at the same time to satisfy a shared information need, which we refer to as Synchronous Collaborative Information Retrieval.
Synchronous Collaborative Information Retrieval (SCIR) represents a significant paradigmatic shift from traditional IR systems. Supporting an effective SCIR search requires new techniques to coordinate users' activities. In this chapter we explore the effectiveness of a sharing of knowledge policy on a collaborating group. Sharing of knowledge refers to passing relevance information across users: if one user finds items relevant to the search task, the group should benefit in the form of improved ranked lists returned to each searcher.
To evaluate the proposed techniques, we simulate two users searching together through an incremental feedback system. The simulation assumes that users agree on an initial query with which to begin the collaborative search, then proceed by providing relevance judgments to the system and receiving a new ranked list. To populate these simulations, we extract data from the interaction logs of various experimental IR systems from previous Text REtrieval Conference (TREC) workshops.
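To illustrate what "passing relevance information across users" can mean for a ranked list, the sketch below pools both users' relevance judgments into a positive-only Rocchio query update. Rocchio is a stand-in chosen for familiarity, not necessarily the chapter's feedback model; the vectors, the `alpha`/`beta` values and the helper names are hypothetical.

```python
from collections import Counter


def rocchio(query, relevant_docs, alpha=1.0, beta=0.75):
    """Positive-only Rocchio: q' = alpha * q + beta * mean(relevant document vectors)."""
    updated = Counter({term: alpha * w for term, w in query.items()})
    for doc in relevant_docs:
        for term, w in doc.items():
            updated[term] += beta * w / len(relevant_docs)
    return updated


def feedback_queries(query, judgments_by_user, share=True):
    """One updated query per user; with sharing, all users' relevant docs are pooled."""
    pooled = [doc for docs in judgments_by_user.values() for doc in docs]
    return {
        user: rocchio(query, pooled if share else docs)
        for user, docs in judgments_by_user.items()
    }


def rank(query_vec, collection):
    """Dot-product ranking of {doc_id: {term: weight}}; ties broken by doc id."""
    scores = {
        doc_id: sum(query_vec.get(t, 0.0) * w for t, w in vec.items())
        for doc_id, vec in collection.items()
    }
    return sorted(scores, key=lambda d: (-scores[d], d))


collection = {
    "d_car": {"jaguar": 1.0, "engine": 1.0},
    "d_cat": {"jaguar": 1.0, "cat": 1.0},
}
query = {"jaguar": 1.0}
# User A has judged the cat page relevant; user B has not judged anything yet.
judgments = {"A": [collection["d_cat"]], "B": []}

isolated = feedback_queries(query, judgments, share=False)
shared = feedback_queries(query, judgments, share=True)
print(rank(isolated["B"], collection))  # ['d_car', 'd_cat'] (a tie, broken by id)
print(rank(shared["B"], collection))    # ['d_cat', 'd_car']
```

With sharing switched on, user B's ranked list already reflects user A's judgment before B has assessed anything, which is the group benefit the sharing of knowledge policy aims for.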
Ranking expansion terms using partial and ostensive evidence
In this paper we examine the problem of ranking candidate expansion terms for query expansion. We show, by an extension to the traditional F4 scheme, how partial relevance assessments (how relevant a document is) and ostensive evidence (when a document was assessed relevant) can be incorporated into a term ranking function. We then investigate this new term ranking function in three user experiments, examining its performance for automatic and interactive query expansion. We show that the new function not only suggests terms that searchers prefer but also suggests terms that can lead to greater use of expansion terms.
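The standard F4 (Robertson/Sparck Jones) relevance weight, with the usual 0.5 correction, is shown below, where `N` is the collection size, `n` the number of documents containing the term, `R` the number of relevant documents and `r` the relevant documents containing the term. The graded variant is a hypothetical illustration only: it replaces the binary counts `r` and `R` with sums of `degree * decay**age` per assessed document. The paper's actual extension is not given in the abstract, so `decay`, `graded_counts` and the example data are assumptions.

```python
import math


def f4_weight(r, R, n, N):
    """Robertson/Sparck Jones F4 relevance weight with the usual 0.5 correction."""
    return math.log(((r + 0.5) / (R - r + 0.5)) / ((n - r + 0.5) / (N - n - R + r + 0.5)))


def graded_counts(term, assessed, decay=0.8):
    """Hypothetical: replace binary relevance counts with evidence weights.

    Each assessed document contributes degree * decay**age, where `degree` in [0, 1]
    is how relevant it was judged (partial evidence) and `age` counts how many
    assessments ago it was judged (ostensive evidence: recent judgments count more).
    """
    R = r = 0.0
    for terms, degree, age in assessed:
        w = degree * decay ** age
        R += w
        if term in terms:
            r += w
    return r, R


def rank_expansion_terms(candidates, assessed, doc_freq, N, decay=0.8):
    """Order candidate terms by their F4 weight computed from graded counts."""
    scored = {t: f4_weight(*graded_counts(t, assessed, decay), doc_freq[t], N) for t in candidates}
    return sorted(scored, key=lambda t: (-scored[t], t))


assessed = [
    ({"jaguar", "cat"}, 1.0, 0),     # fully relevant, judged most recently
    ({"jaguar", "engine"}, 0.5, 3),  # partly relevant, judged three steps ago
]
doc_freq = {"cat": 50, "engine": 50}
print(rank_expansion_terms(["engine", "cat"], assessed, doc_freq, N=1000))  # ['cat', 'engine']
```

With no relevance information (`r = R = 0`) the weight reduces to `log((N - n + 0.5) / (n + 0.5))`, an idf-like quantity, so the graded counts only shift the ranking once assessments exist.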
Improving average ranking precision in user searches for biomedical research datasets
The availability of research datasets is a keystone of reproducibility and
scientific progress in the health and life sciences. Due to the heterogeneity and
complexity of these data, a main challenge to be overcome by research data
management systems is to provide users with the best answers for their search
queries. In the context of the 2016 bioCADDIE Dataset Retrieval Challenge, we
investigate a novel ranking pipeline to improve the search of datasets used in
biomedical experiments. Our system comprises a query expansion model based on
word embeddings, a similarity measure algorithm that takes into consideration
the relevance of the query terms, and a dataset categorisation method that
boosts the rank of datasets matching query constraints. The system was
evaluated using a corpus with 800k datasets and 21 annotated user queries. Our
system provides competitive results when compared to the other challenge
participants. In the official run, it achieved the highest infAP among the
participants, +22.3% higher than the median infAP of the participants'
best submissions. Overall, it ranks in the top 2 if an aggregated metric using
the best official measures per participant is considered. The query expansion
method had a positive impact on the system's performance, improving on our
baseline by up to +5.0% and +3.4% for the infAP and infNDCG metrics, respectively.
Our similarity measure algorithm appears to be robust, in particular compared to
the Divergence From Randomness framework, showing smaller performance variations
under different training conditions. Finally, the result categorisation did not
have a significant impact on the system's performance. We believe that our
solution could be used to enhance biomedical dataset management systems. In
particular, data-driven query expansion methods could be an
alternative to the complexity of biomedical terminologies.
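A minimal sketch of two pipeline stages described above: embedding-based query expansion (nearest neighbours by cosine similarity, weighted by that similarity) and a rank boost for datasets that match a query constraint. The toy vectors, `k`, `min_sim`, `boost` and the plain weighted term-overlap score are all hypothetical; in particular, the overlap score stands in for the authors' similarity measure, which the abstract does not define.

```python
import math

# Toy, hypothetical word vectors; a real system would load trained embeddings.
EMBEDDINGS = {
    "tumor": [0.9, 0.1, 0.0],
    "cancer": [0.85, 0.15, 0.05],
    "neoplasm": [0.8, 0.2, 0.1],
    "mouse": [0.0, 0.9, 0.3],
    "rna": [0.1, 0.2, 0.95],
}


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def expand_query(terms, k=2, min_sim=0.9):
    """Add up to k nearest neighbours per query term, weighted by cosine similarity."""
    weights = {t: 1.0 for t in terms}
    for t in terms:
        if t not in EMBEDDINGS:
            continue
        neighbours = sorted(
            ((cosine(EMBEDDINGS[t], v), w) for w, v in EMBEDDINGS.items() if w not in weights),
            reverse=True,
        )
        for sim, w in neighbours[:k]:
            if sim >= min_sim:
                weights[w] = max(weights.get(w, 0.0), sim)
    return weights


def score(dataset, query_weights, required_category=None, boost=1.5):
    """Weighted term overlap, multiplied by `boost` when the dataset matches a constraint."""
    tokens = set(dataset["text"].lower().split())
    base = sum(w for t, w in query_weights.items() if t in tokens)
    if required_category and required_category in dataset["categories"]:
        base *= boost
    return base


def rank_datasets(datasets, query_weights, required_category=None, boost=1.5):
    ranked = sorted(datasets, key=lambda d: -score(d, query_weights, required_category, boost))
    return [d["id"] for d in ranked]


datasets = [
    {"id": "A", "text": "neoplasm expression profiles", "categories": {"human"}},
    {"id": "B", "text": "tumor growth in mouse models", "categories": {"mouse"}},
    {"id": "C", "text": "rna sequencing atlas", "categories": {"human"}},
]
q = expand_query(["tumor"])
print(rank_datasets(datasets, q))           # ['B', 'A', 'C']
print(rank_datasets(datasets, q, "human"))  # ['A', 'B', 'C']
```

Dataset `A` is only reachable through the expansion term "neoplasm", and the category boost then lifts it above `B` when the query asks for human data.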
Knowledge Engineering in Search Engines
With large amounts of information being exchanged on the Internet, search engines have become the most popular tools for helping users search and filter this information. However, keyword-based search engines sometimes return information that does not meet users' needs, and some results are even irrelevant to the user's query. When users get query results, they have to read and organize them by themselves, which is not easy when a search engine returns several million results. This project uses a granular computing approach to find the knowledge structures of a search engine, focusing on its knowledge engineering components. Based on the earlier work of Dr. Lin and his former student [1], it represents concepts in the Web by simplicial complexes. We found that to represent simplicial complexes adequately, we only need the maximal simplexes. Therefore, this project focuses on building maximal simplexes. Since it is too costly to analyze all Web pages or documents, the project uses sampling to obtain a subset of documents. The project constructs simplexes of documents and uses them to find maximal simplexes. These maximal simplexes are regarded as primitive concepts that can represent Web pages or documents, and they can be used to build a search engine index in the future.
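Because every face of a simplex is implied by the simplex itself, a simplicial complex is fully determined by its maximal simplexes. The sketch below assumes, for illustration only, that each sampled document's keyword set is one simplex, and keeps the sets that are not proper subsets of any other; the sampling helper, the keyword sets and the function names are hypothetical, not the project's code.

```python
import random


def sample_documents(corpus, size, seed=0):
    """Simple random sample standing in for the project's sampling step."""
    rng = random.Random(seed)
    return rng.sample(corpus, min(size, len(corpus)))


def maximal_simplexes(keyword_sets):
    """Distinct non-empty keyword sets that are not a proper subset of another set."""
    simplexes = {frozenset(s) for s in keyword_sets if s}
    maximal = [s for s in simplexes if not any(s < other for other in simplexes)]
    # Largest simplexes first, then alphabetically, for a deterministic order.
    return sorted(maximal, key=lambda s: (-len(s), sorted(s)))


corpus = [
    {"granular", "computing"},
    {"granular", "computing", "rough"},
    {"search", "engine"},
    {"engine"},
]
sample = sample_documents(corpus, 4)
for simplex in maximal_simplexes(sample):
    print(sorted(simplex))
```

Here {granular, computing} and {engine} disappear because they are faces of larger simplexes, leaving two primitive concepts that could serve as index entries.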