962 research outputs found
Recommended from our members
An experimental comparison of a genetic algorithm and a hill-climber for term selection
Purpose â The term selection problem for selecting query terms in information filtering and routing has been investigated using hill-climbers of various kinds, largely through the Okapi experiments in the TREC series of conferences. Although these are simple deterministic approaches which examine the effect of changing the weight of one term at a time, they have been shown to improve the retrieval effectiveness of filtering queries in these TREC experiments. Hill-climbers are, however, likely to get trapped in local optima, and the use of more sophisticated local search techniques for this problem that attempt to break out of these optima are worth investigating. To this end, we apply a genetic algorithm (GA) to the same problem.
Design/Methodology/Approach â We use a standard TREC test collection from the TREC-8 filtering track, recording mean average precision and recall measures to allow comparison between the hillclimber and GA algorithms. We also vary elements of the GA, such as probability of a word being included, probability of mutation and population size in order to measure the effect of these variables. Different strategies such as Elitist and Non-Elitist methods are used, as well as Roulette Wheel and Rank selection GA algorithms.
Findings â The results of tests suggest that both techniques are, on average, better than the baseline, but the implemented GA does not match the overall performance of a hill-climber. The Rank selection algorithm does better on average than the Roulette Wheel algorithm. There is no evidence in this study that varying word inclusion probability, mutation probability or Elitist method make much difference to the overall results. Small population sizes do not appear to be as effective as larger population sizes.
Research limitations/implications â The evidence provided here would suggest that being stuck in a local optima for the term selection optimization problem does not appear to be detrimental to the overall success of the hill-climber. The evidence from term rank order would appear to provide extra useful evidence which hill-climbers can use efficiently and effectively to narrow the search space.
Originality/Value â The paper represents the first attempt to compare hill-climbers with GAs on a problem of this type
Multiple Retrieval Models and Regression Models for Prior Art Search
This paper presents the system called PATATRAS (PATent and Article Tracking,
Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach
presents three main characteristics: 1. The usage of multiple retrieval models
(KL, Okapi) and term index definitions (lemma, phrase, concept) for the three
languages considered in the present track (English, French, German) producing
ten different sets of ranked results. 2. The merging of the different results
based on multiple regression models using an additional validation set created
from the patent collection. 3. The exploitation of patent metadata and of the
citation structures for creating restricted initial working sets of patents and
for producing a final re-ranking regression model. As we exploit specific
metadata of the patent documents and the citation relations only at the
creation of initial working sets and during the final post ranking step, our
architecture remains generic and easy to extend
DCU's experiments in NTCIR-8 IR4QA task
We describe DCU's participation in the NTCIR-8 IR4QA
task [16]. This task is a cross-language information retrieval(CLIR) task from English to Simplified Chinese which seeks to provide relevant documents for later cross language question answering (CLQA) tasks. For the IR4QA task, we submitted 5 official runs including two monolingual runs and three CLIR runs. For the monolingual retrieval we tested two information retrieval models. The results show that the KL-Divergence language model method performs better than the Okapi BM25 model for the Simplified Chinese retrieval task. This agrees with our previous CLIR experimental results at NTCIR-5. For the CLIR task, we compare query translation and document translation methods. In the query translation based runs, we tested a method for query expansion from external resource (QEE) before query translation. Our result for this run is slightly lower than the run without QEE. Our results show that the document translation method achieves 68.24% MAP performance compared to our best query translation run. For the document translation method, we found that the main issue is the lack of named entity translation in the documents since we do not have a suitable parallel corpus for training data for the statistical machine translation system. Our best CLIR run comes from the combination of query translation using Google translate and the KL-Divergence language model retrieval method. It achieves 79.94% MAP relative to our best monolingual run
Recommended from our members
Okapi-based XML indexing
Purpose
â Being an important data exchange and information storage standard, XML has generated a great deal of interest and particular attention has been paid to the issue of XML indexing. Clear use cases for structured search in XML have been established. However, most of the research in the area is either based on relational database systems or specialized semiâstructured data management systems. This paper aims to propose a method for XML indexing based on the information retrieval (IR) system Okapi.
Design/methodology/approach
â First, the paper reviews the structure of inverted files and gives an overview of the issues of why this indexing mechanism cannot properly support XML retrieval, using the underlying data structures of Okapi as an example. Then the paper explores a revised method implemented on Okapi using path indexing structures. The paper evaluates these index structures through the metrics of indexing run time, path search run time and space costs using the INEX and Reuters RVC1 collections.
Findings
â Initial results on the INEX collections show that there is a substantial overhead in space costs for the method, but this increase does not affect run time adversely. Indexing results on differing sized Reuters RVC1 subâcollections show that the increase in space costs with increasing the size of a collection is significant, but in terms of run time the increase is linear. Path search results show subâmillisecond run times, demonstrating minimal overhead for XML search.
Practical implications
â Overall, the results show the method implemented to support XML search in a traditional IR system such as Okapi is viable.
Originality/value
â The paper provides useful information on a method for XML indexing based on the IR system Okapi
Lucene4IR: Developing information retrieval evaluation resources using Lucene
The workshop and hackathon on developing Information Retrieval Evaluation Resources using Lucene (L4IR) was held on the 8th and 9th of September, 2016 at the University of Strathclyde in Glasgow, UK and funded by the ESF Elias Network. The event featured three main elements: (i) a series of keynote and invited talks on industry, teaching and evaluation; (ii) planning, coding and hacking where a number of groups created modules and infrastructure to use Lucene to undertake TREC based evaluations; and (iii) a number of breakout groups discussing challenges, opportunities and problems in bridging the divide between academia and industry, and how we can use Lucene for teaching and learning Information Retrieval (IR). The event was composed of a mix and blend of academics, experts and students wanting to learn, share and create evaluation resources for the community. The hacking was intense and the discussions lively creating the basis of many useful tools but also raising numerous issues. It was clear that by adopting and contributing to most widely used and supported Open Source IR toolkit, there were many benefits for academics, students, researchers, developers and practitioners - providing a basis for stronger evaluation practices, increased reproducibility, more efficient knowledge transfer, greater collaboration between academia and industry, and shared teaching and training resources
Supporting aspect-based video browsing - analysis of a user study
In this paper, we present a novel video search interface based on the concept of aspect browsing. The proposed strategy is to assist the user in exploratory video search by actively suggesting new query terms and video shots. Our approach has the potential to narrow the "Semantic Gap" issue by allowing users to explore the data collection. First, we describe a clustering technique to identify potential aspects of a search. Then, we use the results to propose suggestions to the user to help them in their search task. Finally, we analyse this approach by exploiting the log files and the feedbacks of a user study
A Multi-criteria Decision Support System for Ph.D. Supervisor Selection: A Hybrid Approach
Selection of a suitable Ph.D. supervisor is a very important step in a studentâs career. This paper presents a multi-criteria decision support system to assist students in making this choice. The system employs a hybrid method that first utilizes a fuzzy analytic hierarchy process to extract the relative importance of the identified criteria and sub-criteria to consider when selecting a supervisor. Then, it applies an information retrieval-based similarity algorithm (TF/IDF or Okapi BM25) to retrieve relevant candidate supervisor profiles based on the studentâs research interest. The selected profiles are then re-ranked based on other relevant factors chosen by the user, such as publication record, research grant record, and collaboration record. The ranking method evaluates the potential supervisors objectively based on various metrics that are defined in terms of detailed domain-specific knowledge, making part of the decision making automatic. In contrast with other existing works, this system does not require the professorâs involvement and no subjective measures are employed
- âŠ