Evolving Lucene search queries for text classification
We describe a method for generating accurate, compact, human-understandable
text classifiers. Text datasets are indexed using Apache Lucene, and genetic programs are used to construct
Lucene search queries. Genetic programs acquire fitness by
producing queries that are effective binary classifiers for a
particular category when evaluated against a set of training
documents. We describe a set of functions and terminals and
provide results from classification tasks.
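As a rough illustration of how a candidate query can "acquire fitness" as a binary classifier, the sketch below scores a query against labelled training documents. It makes several assumptions that are not in the abstract: the fitness measure is taken to be F1, the query is reduced to a plain OR over terms, and matching is done in pure Python rather than through Lucene. The names `matches` and `f1_fitness` are hypothetical.

```python
def matches(query_terms, doc_tokens):
    """Stand-in for an OR query: true if any query term occurs in the document."""
    return any(term in doc_tokens for term in query_terms)


def f1_fitness(query_terms, docs, labels):
    """Score a query as a binary classifier for one category.

    docs   -- list of training document strings
    labels -- list of booleans, True if the document belongs to the category
    F1 is an assumed fitness measure, chosen only for illustration.
    """
    tp = fp = fn = 0
    for text, in_category in zip(docs, labels):
        hit = matches(query_terms, set(text.lower().split()))
        if hit and in_category:
            tp += 1
        elif hit and not in_category:
            fp += 1
        elif not hit and in_category:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy training set (invented for this example).
docs = ["wheat prices rise", "corn harvest", "oil prices fall", "wheat export"]
labels = [True, False, False, True]
wheat_score = f1_fitness({"wheat"}, docs, labels)
prices_score = f1_fitness({"prices"}, docs, labels)
```

In an evolutionary loop, a score of this kind would be used to rank candidate queries, so that queries separating the category from the rest of the training set are more likely to survive.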
A comparison of Lucene search queries evolved as text classifiers
In this article, we use a genetic algorithm to evolve seven
different types of Lucene search query with the objective of
generating accurate and readable text classifiers. We compare
the effectiveness of each of the different types of query using
three commonly used text datasets. We vary the number of
words available for classification and compare results for 4, 8,
and 16 words per category. The generated queries can also be
viewed as labels for the categories and there is a benefit to a
human analyst in being able to read and tune the classifier.
The evolved queries also provide an explanation of the classification
process. We consider the consistency of the classifiers
and compare their performance on categories of different
complexities. Finally, various approaches to the analysis of
the results are briefly explored.
Term-Specific Eigenvector-Centrality in Multi-Relation Networks
Fuzzy matching and ranking are two information retrieval techniques widely used in web search. Their application to structured data, however, remains an open problem. This article investigates how eigenvector-centrality can be used for approximate matching in multi-relation graphs, that is, graphs where connections of many different types may exist. Based on an extension of the PageRank matrix, eigenvectors representing the distribution of a term after propagating term weights between related data items are computed. The result is an index which takes the document structure into account and can be used with standard document retrieval techniques. As the scheme takes the shape of an index transformation, all necessary calculations are performed during index time.
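The abstract does not spell out its extension of the PageRank matrix, so the following is only a sketch of the standard building block: PageRank computed by power iteration, with an optional personalization vector standing in for initial term weights. The function name `pagerank`, the damping value 0.85, and the single-relation graph are assumptions made for illustration; the multi-relation case described in the article is not modelled here.

```python
def pagerank(links, personalization=None, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed graph.

    links           -- dict mapping each node to a list of nodes it points to;
                       every target must also appear as a key
    personalization -- optional dict of non-negative jump weights per node
                       (an assumed stand-in for per-term weights)
    """
    nodes = sorted(links)
    n = len(nodes)
    if personalization is None:
        personalization = {v: 1.0 / n for v in nodes}
    total = sum(personalization.values())
    jump = {v: personalization.get(v, 0.0) / total for v in nodes}

    rank = dict(jump)
    for _ in range(iterations):
        # Every node receives its share of the random-jump mass.
        new_rank = {v: (1.0 - damping) * jump[v] for v in nodes}
        for v in nodes:
            targets = links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for w in targets:
                    new_rank[w] += share
            else:
                # Dangling node: spread its mass according to the jump vector.
                for w in nodes:
                    new_rank[w] += damping * rank[v] * jump[w]
        rank = new_rank
    return rank


# A three-node cycle: uniform weights give equal ranks, while weighting
# node "a" makes the score decay along the cycle a -> b -> c.
cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
uniform = pagerank(cycle)
seeded = pagerank(cycle, personalization={"a": 1.0})
```

Running one such computation per term, with the jump vector set from that term's weights, is one way to read "eigenvectors representing the distribution of a term", though the article's actual construction may differ.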
Dublin City University at QA@CLEF 2008
We describe our participation in Multilingual Question Answering at CLEF 2008 using German and English as our source and target languages respectively. The system was built using UIMA (Unstructured Information Management Architecture) as the underlying framework.
Task-Oriented Query Reformulation with Reinforcement Learning
Search engines play an important role in our everyday lives by assisting us
in finding the information we need. When we input a complex query, however,
results are often far from satisfactory. In this work, we introduce a query
reformulation system based on a neural network that rewrites a query to
maximize the number of relevant documents returned. We train this neural
network with reinforcement learning. The actions correspond to selecting terms
to build a reformulated query, and the reward is the document recall. We
evaluate our approach on three datasets against strong baselines and show a
relative improvement of 5-20% in terms of recall. Furthermore, we present a
simple method to estimate a conservative upper-bound performance of a model in
a particular environment and verify that there is still large room for
improvements.
Comment: EMNLP 201
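The reward signal mentioned in this abstract, document recall, can be sketched in a few lines. The function name `recall_reward` and the document identifiers are hypothetical, and the neural network and training loop are omitted entirely.

```python
def recall_reward(retrieved_ids, relevant_ids):
    """Fraction of the relevant documents that the query retrieved."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    return len(relevant & set(retrieved_ids)) / len(relevant)


# Invented example: the reformulated query retrieves more relevant documents.
relevant = ["d2", "d3", "d4", "d5"]
original_reward = recall_reward(["d1", "d2"], relevant)
reformulated_reward = recall_reward(["d2", "d3", "d4"], relevant)
```

Under this reading, a reformulation that raises recall on the training queries receives a larger reward than the original query.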
Using semantic indexing to improve searching performance in web archives
The sheer volume of electronic documents being published on the Web can be overwhelming for users if the searching aspect is not properly addressed. This problem is particularly acute inside archives and repositories containing large collections of web resources or, more precisely, web pages and other web objects. With the existing search capabilities in web archives, results can be compromised by the size of the data, content heterogeneity, and changes in scientific terminologies and meanings. During the course of this research, we will explore whether semantic web technologies, particularly ontology-based annotation and retrieval, could improve precision in search results in multi-disciplinary web archives.