22 research outputs found

A study of query expansion methods for patent retrieval

Authors: Jones Gareth J.F.; Magdy Walid
Publication date: 01/10/2011

Patent retrieval is a recall-oriented search task where the objective is to find all possible relevant documents. Queries in patent retrieval are typically very long, since they take the form of a patent claim or even a full patent application in the case of prior-art patent search. Nevertheless, there is generally a significant mismatch between the query and the relevant documents, often leading to low retrieval effectiveness. Some previous work has tried to address this mismatch through the application of query expansion (QE) techniques, which have generally shown effectiveness for many other retrieval tasks. However, results of QE on patent search have been found to be very disappointing. We present a review of previous investigations of QE in patent retrieval, and explore some of these techniques on a prior-art patent search task. In addition, a novel method for QE using automatically generated synonym sets is presented. While previous QE techniques fail to improve over baseline retrieval, our new approach shows statistically better retrieval precision over the baseline, although not for recall. In addition, it proves to be significantly more efficient than existing techniques. An extensive analysis of the results is presented, which seeks to better understand situations where these QE techniques succeed or fail.
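
As a rough illustration of synonym-based query expansion in general, the sketch below appends a capped number of synonyms to a list of query terms. It is not the method of the paper: the abstract does not describe how its synonym sets are generated or applied, so the function name `expand_query`, the `max_per_term` cap, and the hand-written synonym table are all assumptions made for this example.

```python
def expand_query(terms, synonyms, max_per_term=2):
    """Return the original terms followed by up to max_per_term new synonyms each.

    `synonyms` is a hypothetical, hand-written mapping from a term to a list of
    candidate synonyms; an automatic method would have to produce it.
    """
    expanded = list(terms)   # original terms always come first
    seen = set(terms)        # avoid adding a term twice
    for term in terms:
        added = 0
        for candidate in synonyms.get(term, []):
            if added >= max_per_term:
                break
            if candidate not in seen:
                expanded.append(candidate)
                seen.add(candidate)
                added += 1
    return expanded


# Illustrative data only; not taken from any patent collection.
toy_synonyms = {
    "device": ["apparatus", "unit", "machine"],
    "fasten": ["attach"],
}
expanded_terms = expand_query(["device", "fasten", "bolt"], toy_synonyms)
```

Capping the number of synonyms per term is one simple way to keep already long patent queries from growing without bound; whether such a cap helps is an empirical question the sketch does not answer.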

Irish Universities

DCU Online Research Access Service

A pattern mining approach for information filtering systems

Authors: Algarni Abdulmohsen; Li Yuefeng; Xu Yue
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

It is a big challenge to clearly identify the boundary between positive and negative streams for information filtering systems. Several attempts have used negative feedback to solve this challenge; however, there are two issues in using negative relevance feedback to improve the effectiveness of information filtering. The first is how to select constructive negative samples in order to reduce the space of negative documents. The second is how to decide which noisy extracted features should be updated based on the selected negative samples. This paper proposes a pattern-mining-based approach to select some offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on the RCV1 data collection, and substantial experiments show that the proposed approach achieves encouraging performance and that the performance is also consistent for adaptive filtering.

Queensland University of Technology ePrints Archive

A Study of Collection-Based Features for Adapting the Balance Parameter in Pseudo Relevance Feedback.

Authors: Hou Yuexian; Meng Ye; Song Dawei; Zhang Peng
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

Pseudo-relevance feedback (PRF) is an effective technique to improve ad-hoc retrieval performance. For PRF methods, how to optimize the balance parameter between the original query model and the feedback model is an important but difficult problem. Traditionally, the balance parameter is manually tested and set to a fixed value across collections and queries. However, due to the differences among collections and individual queries, this parameter should be tuned differently. Recent research has studied various query-based and feedback-document-based features to predict the optimal balance parameter for each query on a specific collection, through a learning approach based on logistic regression. In this paper, we hypothesize that characteristics of collections are also important for the prediction. We propose and systematically investigate a series of collection-based features for queries, feedback documents and candidate expansion terms. The experiments show that our method is competitive in improving retrieval performance, particularly for cross-collection prediction, in comparison with the state-of-the-art approaches.
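
To make the role of the balance parameter concrete, the sketch below shows the linear interpolation commonly used in PRF to combine an original query model with a feedback model. The abstract does not give the exact formula the authors use, so the interpolation form, the name `alpha`, and the toy term distributions are assumptions for illustration only.

```python
def interpolate_models(query_model, feedback_model, alpha):
    """Combine two term distributions as (1 - alpha) * query + alpha * feedback.

    `alpha` plays the role of the balance parameter: 0 keeps only the original
    query model, 1 keeps only the feedback model. (Assumed form, not taken
    from the paper.)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    vocabulary = set(query_model) | set(feedback_model)
    return {
        term: (1.0 - alpha) * query_model.get(term, 0.0)
        + alpha * feedback_model.get(term, 0.0)
        for term in vocabulary
    }


# Toy distributions, each summing to 1; the values are illustrative only.
toy_query_model = {"patent": 0.5, "search": 0.5}
toy_feedback_model = {"patent": 0.2, "claim": 0.8}
combined = interpolate_models(toy_query_model, toy_feedback_model, alpha=0.5)
```

Under this form, predicting the balance parameter per query amounts to choosing a different `alpha` for each call instead of fixing one value for a whole collection.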

Crossref

Open Research Online (The Open University)

Wiki-MetaSemantik: A Wikipedia-derived Query Expansion Approach based on Network Properties

Authors: Prasetya I. S. W. B.; Puspitaningrum D.; Yulianti G.
Publication date: 23/11/2017

This paper discusses the use of Wikipedia for building semantic ontologies to do Query Expansion (QE) in order to improve the search results of search engines. In this technique, selecting related Wikipedia concepts becomes important. We propose the use of network properties (degree, closeness, and PageRank) to build an ontology graph of user query concepts which is derived directly from Wikipedia structures. The resulting expansion system is called Wiki-MetaSemantik. We tested this system against other online thesauruses and ontology-based QE in both individual and meta-search engine setups. Although our system has to build a Wikipedia ontology graph in order to do its work, the technique turns out to work very fast (1:281) compared to another ontology QE baseline (Wikipedia Persian ontology QE). It thus has the potential to be utilized online. Furthermore, it shows significant improvement in accuracy. Wiki-MetaSemantik also shows better performance in a meta-search engine (MSE) setup than in an individual search engine setup.

arXiv.org e-Print Archive

Utrecht University Repository

A study of document weight smoothness in pseudo relevance feedback

Authors: Hou Yuexian; Song Dawei; Zhang Peng; Zhao Xiaozhao
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

In pseudo relevance feedback (PRF), the document weight, which indicates how important a document is for the PRF model, plays a key role. In this paper, we investigate the smoothness issue of the document weights in PRF. The term smoothness means that the document weights decrease smoothly (i.e. gradually) along the document ranking list, and that the weights are smooth (i.e. similar) within topically similar documents. We postulate that a reasonably smooth document-weighting function can benefit PRF performance. This hypothesis is tested under a typical PRF model, namely the Relevance Model (RM). We propose a two-step document weight smoothing method, the different instantiations of which have different effects on weight smoothing. Experiments on three TREC collections show that the instantiated methods with better smoothing effects generally lead to better PRF performance. In addition, the proposed method can significantly improve the RM’s performance and outperform various alternative methods which can also be used to smooth the document weights.

CiteSeerX

Crossref

Open Research Online (The Open University)

On the Feasibility and Robustness of Pointwise Evaluation of Query Performance Prediction

Authors: Datta Suchana; Ganguly Debasis; Greene Derek; Mitra Mandar
Publication date: 31/03/2023

Although the retrieval effectiveness of one query is independent of that of another, the evaluation of query performance prediction (QPP) systems has been carried out by measuring rank correlation over an entire set of queries. Such a listwise approach has a number of disadvantages, notably that it does not support the common requirement of assessing QPP for individual queries. In this paper, we propose a pointwise QPP framework that allows us to evaluate the quality of a QPP system for individual queries by measuring the deviation between each prediction and the corresponding true value, and then aggregating the results over a set of queries. Our experiments demonstrate that this new approach leads to smaller variances in QPP evaluations across a range of different target metrics and retrieval models.
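
The pointwise idea described above can be sketched as a per-query deviation followed by an aggregate. The abstract does not state which deviation measure or normalization the authors use, so the absolute error, the mean aggregate, the function name `pointwise_qpp_error`, and the assumption that predictions and true values already share a [0, 1] scale are all choices made for this example.

```python
from statistics import mean


def pointwise_qpp_error(predicted, actual):
    """Return per-query absolute errors and their mean.

    Both arguments map a query id to a score. They are assumed to be on the
    same scale already (for example, both in [0, 1]); how a real QPP score
    would be rescaled is outside this sketch.
    """
    if predicted.keys() != actual.keys():
        raise ValueError("predicted and actual must cover the same queries")
    per_query = {q: abs(predicted[q] - actual[q]) for q in predicted}
    return per_query, mean(per_query.values())


# Toy scores for two hypothetical queries; not experimental data.
toy_predicted = {"q1": 0.4, "q2": 0.7}
toy_actual = {"q1": 0.5, "q2": 0.4}
per_query_error, mean_error = pointwise_qpp_error(toy_predicted, toy_actual)
```

Unlike a rank correlation computed over the whole query set, `per_query_error` gives a value for each individual query, which is the property the abstract highlights.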

Enlighten

A comparative study of methods for estimating query language models with pseudo feedback

Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2009

Crossref

Expansion sélective de requêtes par apprentissage (Selective query expansion through learning)

Authors: Chifu Adrian-Gabriel; Mothe Josiane
Publication venue: Faculdade Santa Maria da Gloria
Publication date: 01/03/2014

Although automatic query expansion improves search quality on average, it can degrade it for some queries. Some work has therefore focused on developing selective approaches that choose the retrieval or expansion function depending on the query. Most selective approaches use a learning process based on features of past queries and on the performance obtained for them. This article presents a new selective expansion method based on query difficulty predictors, both linguistic and statistical. The decision model is learned by an SVM. We show the effectiveness of the method on standard TREC collections. The learned models classified the test queries with more than 90% accuracy. Moreover, MAP is improved by more than 11% compared with non-selective methods.

Scientific Publications of the University of Toulouse II Le Mirail

Open Archive Toulouse Archive Ouverte

HAL Descartes

Hal-Diderot