Recommended from our members
Local search: A guide for the information retrieval practitioner
There are a number of combinatorial optimisation problems in information retrieval in which the use of local search methods is worthwhile. The purpose of this paper is to show how local search can be used to solve some well known tasks in information retrieval (IR), how previous research in the field is piecemeal, bereft of a structure and methodologically flawed, and to suggest more rigorous ways of applying local search methods to solve IR problems. We provide a query-based taxonomy for analysing the use of local search in IR tasks and an overview of issues such as fitness functions, statistical significance and test collections when conducting experiments on combinatorial optimisation problems. The paper gives a guide on the pitfalls and problems for IR practitioners who wish to use local search to solve their research issues, and gives practical advice on the use of such methods. The query-based taxonomy is a novel structure which can be used by the IR practitioner in order to examine the use of local search in IR.
An experimental comparison of a genetic algorithm and a hill-climber for term selection
Purpose: The term selection problem for selecting query terms in information filtering and routing has been investigated using hill-climbers of various kinds, largely through the Okapi experiments in the TREC series of conferences. Although these are simple deterministic approaches which examine the effect of changing the weight of one term at a time, they have been shown to improve the retrieval effectiveness of filtering queries in these TREC experiments. Hill-climbers are, however, likely to get trapped in local optima, and more sophisticated local search techniques that attempt to break out of these optima are worth investigating for this problem. To this end, we apply a genetic algorithm (GA) to the same problem.
Design/Methodology/Approach: We use a standard test collection from the TREC-8 filtering track, recording mean average precision and recall measures to allow comparison between the hill-climber and the GA. We also vary elements of the GA, such as the probability of a word being included, the probability of mutation and the population size, in order to measure the effect of these variables. Different strategies such as Elitist and Non-Elitist methods are used, as well as Roulette Wheel and Rank selection.
Findings: The results of tests suggest that both techniques are, on average, better than the baseline, but the implemented GA does not match the overall performance of a hill-climber. The Rank selection algorithm does better on average than the Roulette Wheel algorithm. There is no evidence in this study that varying word inclusion probability, mutation probability or Elitist method makes much difference to the overall results. Small population sizes do not appear to be as effective as larger population sizes.
Research limitations/implications: The evidence provided here suggests that being stuck in a local optimum for the term selection optimization problem does not appear to be detrimental to the overall success of the hill-climber. Term rank order appears to provide extra useful evidence which hill-climbers can use efficiently and effectively to narrow the search space.
Originality/Value: The paper represents the first attempt to compare hill-climbers with GAs on a problem of this type.
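To make the techniques named in this abstract concrete, here is a minimal sketch, not the authors' implementation. It shows a deterministic hill-climber that changes one term weight at a time, and the selection probabilities that Roulette Wheel and Rank selection would assign to a GA population. The function names, the step size and the toy fitness function are all assumptions made for illustration. In the paper, fitness would come from a retrieval measure such as mean average precision on the TREC-8 filtering collection.

```python
from typing import Callable, Dict, List, Tuple


def hill_climb(weights: Dict[str, float],
               fitness: Callable[[Dict[str, float]], float],
               step: float = 0.5,
               max_passes: int = 20) -> Tuple[Dict[str, float], float]:
    """Change one term weight at a time and keep only strict improvements."""
    best = dict(weights)
    best_score = fitness(best)
    for _ in range(max_passes):
        improved = False
        for term in sorted(best):           # fixed order keeps the search deterministic
            for delta in (step, -step):
                candidate = dict(best)
                candidate[term] = max(0.0, candidate[term] + delta)
                score = fitness(candidate)
                if score > best_score:
                    best, best_score = candidate, score
                    improved = True
        if not improved:                    # no single-term change helps: a local optimum
            break
    return best, best_score


def roulette_probs(fitnesses: List[float]) -> List[float]:
    """Selection probability proportional to raw fitness (assumes non-negative values, positive sum)."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]


def rank_probs(fitnesses: List[float]) -> List[float]:
    """Selection probability proportional to rank (worst = 1, best = n), ignoring fitness magnitude."""
    n = len(fitnesses)
    order = sorted(range(n), key=lambda i: fitnesses[i])
    ranks = [0] * n
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    total = n * (n + 1) // 2
    return [r / total for r in ranks]


# Hypothetical stand-in for a retrieval measure: closeness to an "ideal" weight vector.
TARGET = {"retrieval": 2.0, "search": 1.0, "the": 0.0}


def toy_fitness(w: Dict[str, float]) -> float:
    return -sum((w[t] - TARGET[t]) ** 2 for t in TARGET)


start = {"retrieval": 1.0, "search": 1.0, "the": 1.0}
best_weights, best_score = hill_climb(start, toy_fitness)
```

The two selection schemes differ most when one individual dominates: with fitnesses `[1, 1, 98]`, roulette-wheel selection gives the best individual a 0.98 chance of being picked, while rank selection caps it at 0.5 and so preserves more diversity in the population.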
Interactive product catalogue with user preference tracking
In the context of m-commerce, small screen size makes it seriously difficult for users to browse a product catalogue effectively, given the limited number of products that may be presented on-screen. Despite the availability of search engines, filters and recommender systems to aid users, these techniques focus on a narrow segment of the product offering. The users are thus denied the opportunity to do a more expansive exploration of the products available. This paper describes a novel approach to overcome the constraints of small screen size. Through integration of a product catalogue with a recommender system, an adaptive system has been created that guides users through the process of product browsing. An original technique has been developed to cluster similar positive examples together to identify areas of interest of a user. The performance of this technique has been evaluated and the results proved to be promising.
Optimisation of the weighting functions of an H<sub>∞</sub> controller using genetic algorithms and structured genetic algorithms
In this paper the optimisation of the weighting functions for an H<sub>∞</sub> controller using genetic algorithms and structured genetic algorithms is considered. The choice of the weighting functions is one of the key steps in the design of an H<sub>∞</sub> controller. The performance of the controller depends on these weighting functions since poorly chosen weighting functions will provide a poor controller. One approach that can solve this problem is the use of evolutionary techniques to tune the weighting parameters. The paper presents the improved performance of structured genetic algorithms over conventional genetic algorithms and how this technique can assist with the identification of appropriate weighting functions' orders.
Beyond TREC's filtering track
Following the withdrawal of the filtering track from the latest TREC conferences, there is a niche for new evaluation standards. Towards this end, we suggest, based on variations of TREC's routing subtask, two new evaluation methodologies. The first can be used for evaluating single, multi-topic profiles and the second for testing the ability of a multi-topic profile to adapt to both modest variations and radical drifts in user interests
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers by
defining class boundary just with the knowledge of positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data,
algorithms used and the application domains applied. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research.
Comment: 24 pages + 11 pages of references, 8 figures
Evolutionary intelligent agents for e-commerce: Generic preference detection with feature analysis
Product recommendation and preference tracking systems have been adopted extensively in e-commerce businesses. However, the heterogeneity of product attributes impedes efficient yet personalized e-commerce product brokering. Amid the assortment of product attributes, there are some intrinsic generic attributes that have a significant relation to a customer's generic preference. This paper proposes a novel approach to the detection of generic product attributes through feature analysis. The objective is to provide insight into customers' generic preferences. Furthermore, a genetic algorithm is used to find a suitable feature weight set, hence reducing the rate of misclassification. A prototype has been implemented and the experimental results are promising.
Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval
Relevance feedback schemes based on support vector machines (SVM) have been widely used in content-based image retrieval (CBIR). However, the performance of SVM-based relevance feedback is often poor when the number of labeled positive feedback samples is small. This is mainly due to three reasons: 1) an SVM classifier is unstable on a small-sized training set, 2) SVM's optimal hyperplane may be biased when the positive feedback samples are far fewer than the negative feedback samples, and 3) overfitting happens because the number of feature dimensions is much higher than the size of the training set. In this paper, we develop a mechanism to overcome these problems. To address the first two problems, we propose an asymmetric bagging-based SVM (AB-SVM). For the third problem, we combine the random subspace method and SVM for relevance feedback, which is named random subspace SVM (RS-SVM). Finally, by integrating AB-SVM and RS-SVM, an asymmetric bagging and random subspace SVM (ABRS-SVM) is built to solve these three problems and further improve the relevance feedback performance.
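The combination described in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' method: the abstract does not give the sampling details, so the sketch assumes that asymmetric bagging keeps every positive sample and bootstraps an equal number of negatives, that each ensemble member sees a random subset of feature dimensions, and that members are combined by majority vote. All function names are hypothetical, and a toy nearest-centroid learner (`centroid_fit`) stands in for the SVM so that the example runs with the standard library only.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]          # (feature vector, label): 1 = positive, 0 = negative
Model = Callable[[Sequence[float]], int]


def asymmetric_bag(positives: List[Sample], negatives: List[Sample],
                   rng: random.Random) -> List[Sample]:
    """Keep all positives; bootstrap (with replacement) as many negatives as positives."""
    sampled_negatives = [rng.choice(negatives) for _ in positives]
    return list(positives) + sampled_negatives


def random_subspace(n_features: int, k: int, rng: random.Random) -> List[int]:
    """Pick k distinct feature indices at random."""
    return sorted(rng.sample(range(n_features), k))


def project(x: Sequence[float], dims: List[int]) -> List[float]:
    return [x[d] for d in dims]


def centroid_fit(X: List[List[float]], y: List[int]) -> Model:
    """Toy base learner standing in for an SVM: predict the class of the nearer centroid."""
    def mean(rows: List[List[float]]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = mean([x for x, label in zip(X, y) if label == 1])
    neg = mean([x for x, label in zip(X, y) if label == 0])

    def model(x: Sequence[float]) -> int:
        d_pos = sum((a - b) ** 2 for a, b in zip(x, pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(x, neg))
        return 1 if d_pos < d_neg else 0
    return model


def train_abrs(positives: List[Sample], negatives: List[Sample], n_features: int,
               k: int, n_models: int, fit: Callable[[List[List[float]], List[int]], Model],
               seed: int = 0) -> List[Tuple[List[int], Model]]:
    """Train n_models learners, each on a balanced bag and a random feature subspace."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bag = asymmetric_bag(positives, negatives, rng)
        dims = random_subspace(n_features, k, rng)
        X = [project(x, dims) for x, _ in bag]
        y = [label for _, label in bag]
        models.append((dims, fit(X, y)))
    return models


def predict_majority(models: List[Tuple[List[int], Model]], x: Sequence[float]) -> int:
    """Majority vote over the ensemble (an assumed aggregation rule)."""
    votes = sum(model(project(x, dims)) for dims, model in models)
    return 1 if 2 * votes > len(models) else 0
```

Balancing each bag targets the hyperplane bias caused by class imbalance, training many bags targets instability on small training sets, and restricting each member to `k` dimensions narrows the gap between feature count and training-set size.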