Keep It Simple Sheffield – a KISS approach to the Arabic track
Sheffield’s participation in the inaugural Arabic cross-language track is described here. Our goal was to
examine how well one could retrieve Arabic text with a minimum of resources and minimal adaptation
of existing retrieval systems. To this end, we describe the public translators used for query translation and
the minimal changes made to our retrieval system. While the effectiveness of the resulting system is not as high
as one might desire, it nevertheless provides reasonable performance, particularly in the monolingual track:
on average, just under four relevant documents were found in the 10 top ranked documents
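
The figure quoted above corresponds to a mean precision at rank 10 of just under 0.4. As a minimal sketch of how such a figure is typically computed, the snippet below averages precision at 10 over topics. The topic names and relevance judgments are hypothetical illustrations, not data from the paper.

```python
def precision_at_k(ranked_relevance, k=10):
    """Fraction of the top-k ranked documents that are relevant.

    ranked_relevance is a list of 0/1 judgments in rank order.
    """
    top = ranked_relevance[:k]
    return sum(1 for judged in top if judged) / k


# Hypothetical judgments for two topics (1 = relevant, 0 = not relevant).
topics = {
    "topic_a": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0],  # 4 relevant in the top 10
    "topic_b": [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],  # 3 relevant in the top 10
}

# Mean precision at 10 across topics.
mean_p10 = sum(precision_at_k(r) for r in topics.values()) / len(topics)
print(f"mean P@10 = {mean_p10:.2f}")
```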
Morphological variation of Arabic queries
Although test-collection-based studies have shown that
stemming improves retrieval effectiveness in an information retrieval system,
the morphological variation of queries searching on the same topic is less well
understood. This work examines the broad morphological variation that
searchers of an Arabic retrieval system put into their queries. In this study, 15
native Arabic speakers were asked to generate queries, and morphological variants
of query words were collated across users. Queries composed of either the
commonest or rarest variants of each word were submitted to a retrieval system
and the effectiveness of the searches was measured. It was found that queries
composed of the more popular morphological variants were more likely to
retrieve relevant documents than those composed of less popular variants
An analysis of machine translation errors on the effectiveness of an Arabic-English QA system
The aim of this paper is to investigate
how much the effectiveness of a Question
Answering (QA) system was affected
by the performance of Machine
Translation (MT) based question translation.
Nearly 200 questions were selected
from TREC QA tracks and run through a
question answering system. It was able to
answer 42.6% of the questions correctly
in a monolingual run. These questions
were then translated manually from English
into Arabic and back into English using
an MT system, and then re-applied to
the QA system. The system was able to
answer 10.2% of the translated questions.
An analysis of what sort of translation error
affected which questions was conducted,
concluding that factoid type
questions are less prone to translation error
than others
The effects of topic familiarity on user search behavior in question answering systems
This paper reports on experiments that attempt
to characterize the relationship between users
and their knowledge of the search topic in a
Question Answering (QA) system. It also
investigates user search behavior with respect
to the length of answers presented by a QA
system. Two lengths of answers were
compared: snippets (one to two sentences of
text) and exact answers. A user test, in which
44 participants judged 92 factoid questions,
was conducted to explore the participants’
preferences, feelings and opinions about QA
system tasks. The conclusions drawn from the
results were that participants preferred and
obtained higher accuracy in finding answers
from the snippets set. However, accuracy
varied according to users’ topic familiarity:
users were only substantially helped by the
wider context of a snippet if they were already
familiar with the topic of the question. Without
such familiarity, users were about as accurate
at locating answers from the snippets as they
were from the exact answers set
The exotic invasive plant Vincetoxicum rossicum is a strong competitor even outside its current realized climatic temperature range
Dog-strangling vine (Vincetoxicum rossicum) is an exotic plant originating from Central and Eastern Europe that is becoming increasingly invasive in southern Ontario, Canada. Once established, it successfully displaces local native plant species, but the mechanisms behind this plant’s high competitive ability are not fully understood. It is unknown whether cooler temperatures will limit the range expansion of V. rossicum, which has demonstrated high tolerance for other environmental variables such as light and soil moisture. Furthermore, if V. rossicum can establish outside its current climatic limit, it is unknown whether competition with native species can contribute significantly to reducing its fitness and slowing invasion. We conducted an experiment to test the potential of V. rossicum to spread into northern areas of Ontario, using a set of growth chambers to simulate southern and northern Ontario climatic temperature regimes. We also tested plant-plant competition by growing V. rossicum in pots with a highly abundant native species, Solidago canadensis, and comparing growth responses to those of plants grown alone. We found that the fitness of V. rossicum was not affected by the cooler climate despite a delay in reproductive phenology. Growing V. rossicum with S. canadensis caused a significant reduction in seedpod biomass of V. rossicum. However, we did not detect a temperature × competition interaction, in spite of evidence for adaptation of S. canadensis to cooler temperature conditions. We conclude that the spread of V. rossicum north within the tested range is unlikely to be limited by climatic temperature, but competition with an abundant native species may help to slow it down
Information extraction from template-generated hidden web documents
Most of the information on the Web is stored in document databases and is not indexed by general-purpose
search engines (such as Google and Yahoo). These databases, referred to as Hidden Web databases, dynamically
generate a list of documents in response to a user query. Such documents are typically presented to users as
template-generated Web pages. This paper presents a new approach that identifies Web page templates in order to
extract query-related information from documents. We propose two forms of representation to analyse the content
of a document: Text with Immediate Adjacent Tag Segments (TIATS) and Text with Neighbouring Adjacent Tag Segments (TNATS).
Our techniques exploit tag structures that surround the textual contents of documents in order to detect Web page
templates, thereby extracting query-related information. Experimental results demonstrate that TNATS detects Web page
templates most effectively and extracts information with high recall and precision
Query-related data extraction of hidden web documents
Most of the information on the Web is stored in document databases and is not indexed by general-purpose search engines (such as Google and Yahoo). Such information is dynamically generated by querying these databases, which are referred to as Hidden Web databases. Documents returned in response to a user query are typically presented using template-generated Web pages. This paper proposes a novel approach that identifies Web page templates by analysing the textual contents and the adjacent tag structures of a document in order to extract query-related data. Preliminary results demonstrate that our approach effectively detects templates and retrieves data with high recall and precision
Relevance Judgments between TREC and Non-TREC Assessors
This paper investigates the agreement of relevance assessments between official TREC judgments and those generated from an interactive IR experiment. Results show that 63% of documents judged relevant by our users matched official TREC judgments. Several factors contributed to differences in the agreements: the number of retrieved relevant documents; the number of relevant documents judged; system effectiveness per topic and the ranking of relevant documents
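
One plausible reading of the agreement figure above is the share of user-judged relevant documents that the official judgments also mark as relevant. The sketch below illustrates that calculation under this assumption. The function name and document identifiers are hypothetical, and the paper may define agreement differently.

```python
def agreement_with_official(user_relevant, official_relevant):
    """Share of documents judged relevant by users that the official
    judgments also mark relevant (assumed definition of agreement)."""
    if not user_relevant:
        return 0.0
    return len(user_relevant & official_relevant) / len(user_relevant)


# Hypothetical document identifiers for a single topic.
user_relevant = {"d1", "d2", "d3", "d4"}
official_relevant = {"d1", "d3", "d5"}

agreement = agreement_with_official(user_relevant, official_relevant)
print(f"agreement = {agreement:.0%}")
```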
The Relationship between IR Effectiveness Measures and User Satisfaction
This paper presents an experimental study of users assessing the quality of Google web search results. In particular we look at how users' satisfaction correlates with the effectiveness of Google as quantified by IR measures such as precision and the suite of Cumulative Gain measures (CG, DCG, NDCG). Results indicate strong correlation between users' satisfaction, CG and precision, moderate correlation with DCG, with perhaps surprisingly negligible correlation with NDCG. The reasons for the low correlation with NDCG are examined
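
For readers unfamiliar with the Cumulative Gain measures named above, the following is a minimal sketch of CG, DCG and NDCG at a rank cutoff. It assumes the common log2(rank + 1) discount and builds the ideal ranking by re-sorting the same gain list. Both are assumptions, since the abstract does not say which variant was used, and the gain values are hypothetical.

```python
import math


def cumulative_gain(gains, k):
    """CG@k: sum of the graded relevance gains of the top-k results."""
    return sum(gains[:k])


def dcg(gains, k):
    """DCG@k with a log2(rank + 1) discount (one common variant)."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def ndcg(gains, k):
    """NDCG@k: DCG divided by the DCG of the ideal (sorted) ordering.

    The ideal ordering is taken from the same list, which is a
    simplification; a full evaluation would use all judged documents.
    """
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded judgments for one ranked result list.
gains = [3, 2, 0, 1]
print(cumulative_gain(gains, 4), dcg(gains, 4), ndcg(gains, 4))
```

Because NDCG divides by the ideal DCG for each query, it rescales every query to the range 0 to 1, whereas CG and DCG grow with the number of relevant documents retrieved.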
Users' effectiveness and satisfaction for image retrieval
This paper presents results from an initial user
study exploring the relationship between system
effectiveness as quantified by traditional
measures such as precision and recall, and users’
effectiveness and satisfaction with the results. The
study used recall-based image-finding
tasks. It was concluded that no direct relationship
between system effectiveness and users’
performance could be proven (as shown by
previous research). People learn to adapt to a
system regardless of its effectiveness. This study
recommends that a combination of attributes
(e.g. system effectiveness, user performance and
satisfaction) is a more effective way to evaluate
interactive retrieval systems. Results of this
study also reveal that users are more concerned
with accuracy than coverage of the search
results