4 research outputs found
File Search with Query Expansion in a Network System(s)
The amount of information in the Internet is growing fast; Searching for information has become an important issue; however the user queries impact the effectiveness of retrieving information that users need. The objective of Query Expansion is to find additional and more relevant results. This article used different similarity measures (Cosine, Jaccard, Dice similarity functions) in VSM on three Genetic Algorithm approaches, each similarity function used as fitness function, one point crossover and new selection method based on rank selection is used. The NSC (National Science Council, Taiwan) document data collection is used in this study. Our results show that QE methods increase the precision rates and the recall rates of information retrieval for dealing with document retrieval . Also we present a network system that consists of many servers to decrease the amount of workload from Main server. Keywords: information retrieval, vector space model, similarity measures, genetic algorithm, query expansion
Recommended from our members
An experimental comparison of a genetic algorithm and a hill-climber for term selection
Purpose – The term selection problem for selecting query terms in information filtering and routing has been investigated using hill-climbers of various kinds, largely through the Okapi experiments in the TREC series of conferences. Although these are simple deterministic approaches which examine the effect of changing the weight of one term at a time, they have been shown to improve the retrieval effectiveness of filtering queries in these TREC experiments. Hill-climbers are, however, likely to get trapped in local optima, and the use of more sophisticated local search techniques for this problem that attempt to break out of these optima are worth investigating. To this end, we apply a genetic algorithm (GA) to the same problem.
Design/Methodology/Approach – We use a standard TREC test collection from the TREC-8 filtering track, recording mean average precision and recall measures to allow comparison between the hillclimber and GA algorithms. We also vary elements of the GA, such as probability of a word being included, probability of mutation and population size in order to measure the effect of these variables. Different strategies such as Elitist and Non-Elitist methods are used, as well as Roulette Wheel and Rank selection GA algorithms.
Findings – The results of tests suggest that both techniques are, on average, better than the baseline, but the implemented GA does not match the overall performance of a hill-climber. The Rank selection algorithm does better on average than the Roulette Wheel algorithm. There is no evidence in this study that varying word inclusion probability, mutation probability or Elitist method make much difference to the overall results. Small population sizes do not appear to be as effective as larger population sizes.
Research limitations/implications – The evidence provided here would suggest that being stuck in a local optima for the term selection optimization problem does not appear to be detrimental to the overall success of the hill-climber. The evidence from term rank order would appear to provide extra useful evidence which hill-climbers can use efficiently and effectively to narrow the search space.
Originality/Value – The paper represents the first attempt to compare hill-climbers with GAs on a problem of this type
A new approach to estimating the expected first hitting time of evolutionary algorithms
AbstractEvolutionary algorithms (EA) have been shown to be very effective in solving practical problems, yet many important theoretical issues of them are not clear. The expected first hitting time is one of the most important theoretical issues of evolutionary algorithms, since it implies the average computational time complexity. In this paper, we establish a bridge between the expected first hitting time and another important theoretical issue, i.e., convergence rate. Through this bridge, we propose a new general approach to estimating the expected first hitting time. Using this approach, we analyze EAs with different configurations, including three mutation operators, with/without population, a recombination operator and a time variant mutation operator, on a hard problem. The results show that the proposed approach is helpful for analyzing a broad range of evolutionary algorithms. Moreover, we give an explanation of what makes a problem hard to EAs, and based on the recognition, we prove the hardness of a general problem
A Survey on Important Aspects of Information Retrieval
Information retrieval has become an important field of study and research under computer science due to the explosive growth of information available in the form of full text, hypertext, administrative text, directory, numeric or bibliographic text. The research work is going on various aspects of information retrieval systems so as to improve its efficiency and reliability. This paper presents a comprehensive survey discussing not only the emergence and evolution of information retrieval but also include different information retrieval models and some important aspects such as document representation, similarity measure and query expansion