22 research outputs found

    A study of query expansion methods for patent retrieval

    Get PDF
    Patent retrieval is a recall-oriented search task where the objective is to find all possible relevant documents. Queries in patent retrieval are typically very long since they take the form of a patent claim or even a full patent application in the case of priorart patent search. Nevertheless, there is generally a significant mismatch between the query and the relevant documents, often leading to low retrieval effectiveness. Some previous work has tried to address this mismatch through the application of query expansion (QE) techniques which have generally showed effectiveness for many other retrieval tasks. However, results of QE on patent search have been found to be very disappointing. We present a review of previous investigations of QE in patent retrieval, and explore some of these techniques on a prior-art patent search task. In addition, a novel method for QE using automatically generated synonyms set is presented. While previous QE techniques fail to improve over baseline retrieval, our new approach show statistically better retrieval precision over the baseline, although not for recall. In addition, it proves to be significantly more efficient than existing techniques. An extensive analysis to the results is presented which seeks to better understand situations where these QE techniques succeed or fail

    A pattern mining approach for information filtering systems

    Get PDF
    It is a big challenge to clearly identify the boundary between positive and negative streams for information filtering systems. Several attempts have used negative feedback to solve this challenge; however, there are two issues for using negative relevance feedback to improve the effectiveness of information filtering. The first one is how to select constructive negative samples in order to reduce the space of negative documents. The second issue is how to decide noisy extracted features that should be updated based on the selected negative samples. This paper proposes a pattern mining based approach to select some offenders from the negative documents, where an offender can be used to reduce the side effects of noisy features. It also classifies extracted features (i.e., terms) into three categories: positive specific terms, general terms, and negative specific terms. In this way, multiple revising strategies can be used to update extracted features. An iterative learning algorithm is also proposed to implement this approach on the RCV1 data collection, and substantial experiments show that the proposed approach achieves encouraging performance and the performance is also consistent for adaptive filtering as well

    A Study of Collection-Based Features for Adapting the Balance Parameter in Pseudo Relevance Feedback.

    Get PDF
    Pseudo-relevance feedback (PRF) is an effective technique to improve the ad-hoc retrieval performance. For PRF methods, how to optimize the balance parameter between the original query model and feedback model is an important but difficult problem. Traditionally, the balance parameter is often manually tested and set to a fixed value across collections and queries. However, due to the difference among collections and individual queries, this parameter should be tuned differently. Recent research has studied various query based and feedback documents based features to predict the optimal balance parameter for each query on a specific collection, through a learning approach based on logistic regression. In this paper, we hypothesize that characteristics of collections are also important for the prediction. We propose and systematically investigate a series of collection- based features for queries, feedback documents and candidate expansion terms. The experiments show that our method is competitive in improving retrieval performance and particularly for cross-collection prediction, in comparison with the state-of-the-art approaches

    Wiki-MetaSemantik: A Wikipedia-derived Query Expansion Approach based on Network Properties

    Get PDF
    This paper discusses the use of Wikipedia for building semantic ontologies to do Query Expansion (QE) in order to improve the search results of search engines. In this technique, selecting related Wikipedia concepts becomes important. We propose the use of network properties (degree, closeness, and pageRank) to build an ontology graph of user query concepts which is derived directly from Wikipedia structures. The resulting expansion system is called Wiki-MetaSemantik. We tested this system against other online thesauruses and ontology based QE in both individual and meta-search engines setups. Despite that our system has to build a Wikipedia ontology graph in order to do its work, the technique turns out to work very fast (1:281) compared to another ontology QE baseline (Wikipedia Persian ontology QE). It has thus the potential to be utilized online. Furthermore, it shows significant improvement in accuracy. Wiki-MetaSemantik also shows better performance in a meta-search engine (MSE) set up rather than in an individual search engine set up

    A study of document weight smoothness in pseudo relevance feedback

    Get PDF
    In pseudo relevance feedback (PRF), the document weight which indicates how important a document is for the PRF model, plays a key role. In this paper, we investigate the smoothness issue of the document weights in PRF. The term smoothness means that the document weights decrease smoothly (i.e. gradually) along the document ranking list, and the weights are smooth (i.e. similar) within topically similar documents. We postulate that a reasonably smooth document- weighting function can benefit the PRF performance. This hypothesis is tested under a typical PRF model, namely the Relevance Model (RM). We propose a two-step document weight smoothing method, the different instantiations of which have different effects on weight smoothing. Ex- periments on three TREC collections show that the instantiated methods with better smoothing effects generally lead to better PRF performance. In addition, the proposed method can significantly improve the RM’s performance and outperform various alternative methods which can also be used to smooth the document weights

    On the Feasibility and Robustness of Pointwise Evaluation of Query Performance Prediction

    Get PDF
    Despite the retrieval effectiveness of queries being mutually independent of one another, the evaluation of query performance prediction (QPP) systems has been carried out by measuring rank correlation over an entire set of queries. Such a listwise approach has a number of disadvantages, notably that it does not support the common requirement of assessing QPP for individual queries. In this paper, we propose a pointwise QPP framework that allows us to evaluate the quality of a QPP system for individual queries by measuring the deviations between each prediction versus the corresponding true value, and then aggregating the results over a set of queries. Our experiments demonstrate that this new approach leads to smaller variances in QPP evaluations across a range of different target metrics and retrieval models

    A comparative study of methods for estimating query language models with pseudo feedback

    Full text link

    Expansion sĂ©lective de requĂȘtes par apprentissage

    Get PDF
    Si l’expansion de requĂȘte automatique amĂ©liore en moyenne la qualitĂ© de recherche, elle peut la dĂ©grader pour certaines requĂȘtes. Ainsi, certains travaux s’intĂ©ressent Ă  dĂ©velopper des approches sĂ©lectives qui choisissent la fonction de recherche ou d’expansion en fonction des requĂȘtes. La plupart des approches sĂ©lectives utilisent un processus d’apprentissage sur des caractĂ©ristiques de requĂȘtes passĂ©es et sur les performances obtenues. Cet article prĂ©sente une nouvelle mĂ©thode d’expansion sĂ©lective qui se base sur des prĂ©dicteurs de difficultĂ© des requĂȘtes, prĂ©dicteurs linguistiques et statistiques. Le modĂšle de dĂ©cision est appris par un SVM. Nous montrons l’efficacitĂ© de la mĂ©thode sur des collections TREC standards. Les modĂšles appris ont classĂ© les requĂȘtes de test avec plus de 90% d’exactitude. Par ailleurs, la MAP est amĂ©liorĂ©e de plus de 11%, comparĂ©e Ă  des mĂ©thodes non sĂ©lectives
    corecore