64,844 research outputs found
Query expansion with naive bayes for searching distributed collections
The proliferation of online information resources increases the importance of effective and efficient distributed searching. However, the problem of word mismatch seriously hurts the effectiveness of distributed information retrieval. Automatic query expansion has been suggested as a technique for dealing with this fundamental issue. In this paper, we propose a method, query expansion with Naive Bayes, to address the problem, discuss its implementation in the IISS system, and present experimental results demonstrating its effectiveness. The technique not only enhances the discriminatory power of typical queries for choosing the right collections but also significantly improves retrieval results.
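The abstract does not spell out how the classifier is applied, so the following is only a minimal sketch of one plausible reading: a Bernoulli Naive Bayes style score picks expansion terms that best discriminate (pseudo-)relevant documents from non-relevant ones. The function names, document format and parameters are assumptions for illustration, not the IISS implementation.

```python
from collections import Counter
import math

def naive_bayes_expansion_terms(relevant_docs, nonrelevant_docs, k=5):
    """Score candidate terms with a Bernoulli Naive Bayes style log-likelihood
    ratio: terms typical of (pseudo-)relevant documents and rare in
    non-relevant ones are proposed as expansion terms.

    relevant_docs / nonrelevant_docs: lists of token lists.
    Returns the k highest-scoring terms.
    """
    n_rel, n_non = len(relevant_docs), len(nonrelevant_docs)
    df_rel = Counter(t for d in relevant_docs for t in set(d))
    df_non = Counter(t for d in nonrelevant_docs for t in set(d))

    def score(term):
        # Laplace-smoothed probability of the term occurring in a document
        # of each class; the log ratio favours discriminative terms.
        p_rel = (df_rel[term] + 1) / (n_rel + 2)
        p_non = (df_non[term] + 1) / (n_non + 2)
        return math.log(p_rel / p_non)

    return sorted(set(df_rel), key=score, reverse=True)[:k]

def expand_query(query_terms, relevant_docs, nonrelevant_docs, k=5):
    """Append the top-k discriminative terms not already in the query."""
    extra = [t for t in naive_bayes_expansion_terms(relevant_docs, nonrelevant_docs, k)
             if t not in query_terms]
    return list(query_terms) + extra
```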
Search of spoken documents retrieves well recognized transcripts
This paper presents a series of analyses and experiments on spoken document retrieval systems: search engines that retrieve transcripts produced by speech recognizers. Results show that transcripts that match queries well tend to be recognized more accurately than transcripts that match a query less well. This result was described in past literature; however, no study or explanation of the effect had been provided until now. This paper provides such an analysis, showing a relationship between word error rate and query length. The paper expands on past research by increasing the number of recognition systems that are tested, as well as by showing the effect in an operational speech retrieval system. Potential future lines of enquiry are also described.
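For reference, word error rate is the standard word-level edit distance between the recognizer output and a reference transcript, normalized by the reference length. The sketch below is a generic implementation of that definition; it is not taken from the paper, and the function name and interface are assumed.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance between the reference
    transcript and the recognizer output, divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```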
Automatic Segmentation of Multiparty Dialogue
In this paper, we investigate the problem of automatically predicting segment boundaries in spoken multiparty dialogue. We extend prior work in two ways. We first apply approaches that have been proposed for predicting top-level topic shifts to the problem of identifying subtopic boundaries. We then explore the impact on performance of using ASR output as opposed to human transcription. Examination of the effect of features shows that predicting top-level and predicting subtopic boundaries are two distinct tasks: (1) for predicting subtopic boundaries, the lexical cohesion-based approach alone can achieve competitive results, (2) for predicting top-level boundaries, the machine learning approach that combines lexical cohesion and conversational features performs best, and (3) conversational cues, such as cue phrases and overlapping speech, are better indicators for the top-level prediction task. We also find that the transcription errors inevitable in ASR output have a negative impact on models that combine lexical cohesion and conversational features, but do not change the general preference of approach for the two tasks.
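As a point of reference for the lexical cohesion-based approach mentioned above, the sketch below shows a generic TextTiling-style scorer: cohesion at a candidate boundary is the similarity of the word distributions in the windows of utterances on either side, and boundaries are hypothesized where cohesion drops. This is an illustrative baseline under assumed names and parameters, not the authors' model.

```python
import math
from collections import Counter

def _cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[t] * b[t] for t in a.keys() & b.keys())
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def cohesion_scores(utterances, window=5):
    """Lexical cohesion at each candidate boundary: similarity of the word
    distributions in the windows of utterances just before and just after it.

    utterances: list of token lists, one per utterance.
    """
    scores = []
    for i in range(1, len(utterances)):
        left = Counter(t for u in utterances[max(0, i - window):i] for t in u)
        right = Counter(t for u in utterances[i:i + window] for t in u)
        scores.append(_cosine(left, right))
    return scores

def predict_boundaries(utterances, window=5, threshold=0.1):
    """Hypothesize a boundary before utterance i wherever cohesion falls below a threshold."""
    return [i + 1 for i, s in enumerate(cohesion_scores(utterances, window)) if s < threshold]
```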
Statistical Significance Testing in Information Retrieval: An Empirical Analysis of Type I, Type II and Type III Errors
Statistical significance testing is widely accepted as a means to assess how well a difference in effectiveness reflects an actual difference between systems, as opposed to random noise due to the selection of topics. According to recent surveys of SIGIR, CIKM, ECIR and TOIS papers, the t-test is the most popular choice among IR researchers. However, previous work has suggested computer-intensive tests such as the bootstrap or the permutation test, based mainly on theoretical arguments. On empirical grounds, others have suggested non-parametric alternatives such as the Wilcoxon test. Indeed, the question of which tests we should use has accompanied IR and related fields for decades now. Previous theoretical studies on this matter were limited in that we know that test assumptions are not met in IR experiments, and empirical studies were limited in that we do not have the necessary control over the null hypotheses to compute actual Type I and Type II error rates under realistic conditions. Therefore, not only is it unclear which test to use, but also how much trust we should put in them. In contrast to past studies, in this paper we employ a recent simulation methodology based on TREC data to get around these limitations. Our study comprises over 500 million p-values computed for a range of tests, systems, effectiveness measures, topic set sizes and effect sizes, and for both the 2-tail and 1-tail cases. Having such a large supply of IR evaluation data with full knowledge of the null hypotheses, we are finally in a position to evaluate how well statistical significance tests really behave with IR data, and to make sound recommendations for practitioners.
What-if analysis: A visual analytics approach to Information Retrieval evaluation
This paper focuses on the innovative visual analytics approach realized by the Visual Analytics Tool for Experimental Evaluation (VATE2) system, which makes the experimental evaluation process easier and more effective by introducing what-if analysis. What-if analysis aims to estimate the possible effects of a modification to an Information Retrieval (IR) system, in order to select the most promising fixes before implementing them, thus saving a considerable amount of effort. VATE2 builds on an analytical framework that models the behavior of the systems in order to make estimations, and integrates this framework with a visual component which, via appropriate interaction and animations, receives input from and provides feedback to the user. We conducted an experimental evaluation to assess the numerical performance of the analytical model and a validation of the visual analytics prototype with domain experts. Both the numerical evaluation and the user validation have shown that VATE2 is effective, innovative, and useful.
Evaluating epistemic uncertainty under incomplete assessments
The thesis of this study is to propose an extended methodology for laboratory-based Information Retrieval evaluation under incomplete relevance assessments. This new methodology aims to identify potential uncertainty during system comparison that may result from incompleteness. Its adoption is advantageous because the detection of epistemic uncertainty - the amount of knowledge (or ignorance) we have about the estimate of a system's performance - during the evaluation process can guide and direct researchers when evaluating new systems over existing and future test collections. Across a series of experiments, we demonstrate how this methodology can lead towards a finer-grained analysis of systems. In particular, we show through experimentation how the current practice in Information Retrieval evaluation of using a measurement depth larger than the pooling depth increases uncertainty during system comparison.
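One simple way to make such epistemic uncertainty visible, sketched below under assumed names and not as the paper's own methodology, is to bound a measure such as precision at the measurement depth k by scoring unjudged documents first as non-relevant and then as relevant; the wider the resulting interval, the more a system comparison depends on judgments we do not have, and the gap tends to grow once the measurement depth exceeds the pooling depth.

```python
def precision_at_k_bounds(ranked_doc_ids, qrels, k):
    """Lower and upper bounds on P@k under incomplete relevance judgments.

    ranked_doc_ids: the system's ranking for one topic.
    qrels: dict mapping judged doc ids to 0 (non-relevant) or 1 (relevant);
           documents missing from qrels are unjudged.
    The lower bound treats unjudged documents as non-relevant, the upper
    bound treats them as relevant; the gap between the two reflects the
    epistemic uncertainty introduced by incompleteness at depth k.
    """
    top = ranked_doc_ids[:k]
    judged_relevant = sum(1 for d in top if qrels.get(d, 0) == 1)
    unjudged = sum(1 for d in top if d not in qrels)
    lower = judged_relevant / k
    upper = (judged_relevant + unjudged) / k
    return lower, upper
```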
- …