MIR task and evaluation techniques
Existing tasks in MIREX have traditionally focused on low-level MIR tasks working with flat (usually DSP-only) ground-truth. These evaluation techniques, however, cannot evaluate the increasing number of algorithms that utilize relational data, and they do not currently use the state of the art in evaluating ranked or ordered output. This paper summarizes the state of the art in evaluating relational ground-truth. These components are then synthesized into novel evaluation techniques that are applied to 14 concrete music document retrieval tasks, demonstrating how these evaluation techniques can be applied in a practical context.
Resources for Evaluation of Summarization Techniques
We report on two corpora to be used in the evaluation of component systems
for the tasks of (1) linear segmentation of text and (2) summary-directed
sentence extraction. We present characteristics of the corpora, methods used in
the collection of user judgments, and an overview of the application of the
corpora to evaluating the component system. Finally, we discuss the problems
and issues with construction of the test set which apply broadly to the
construction of evaluation resources for language technologies. Comment: LaTeX source, 5 pages, US Letter, uses lrec98.st
Utility-Based Evaluation of Adaptive Systems
The variety of user-adaptive hypermedia systems available calls for methods of comparison. Layered evaluation techniques appear to be useful for this purpose. In this paper we present a utility-based evaluation approach that is based on these techniques. We address issues that arise when putting utility-based evaluation into practice. We also explain the need for interpretative user models and common sets of evaluation criteria for different domains.
Evaluation of optimization techniques for aggregation
Aggregations are almost always done at the top of the operator tree, after all
selections and joins in a SQL query. However, they can be done before joins and,
when used properly, make later joins much cheaper. Although some enumeration
algorithms considering eager aggregation have been proposed, no sufficient
evaluations are available to guide the adoption of this technique in practice.
Moreover, no evaluations have been done on real data sets and real queries with
estimated cardinalities, so it is not known how eager aggregation performs in
the real world.
In this thesis, a new estimation method for group-by and join, combining the
traditional estimation method with index-based join sampling, is proposed and
evaluated. Two enumeration algorithms considering eager aggregation are
implemented and compared in the context of estimated cardinality. We find that
the new estimation method works well with little overhead and that, under
certain conditions, eager aggregation can dramatically accelerate queries.
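The idea of aggregating before a join can be illustrated with a minimal sketch. The toy tables, column names, and function names below are hypothetical and are not taken from the thesis; the sketch only shows that pre-aggregating on the join key yields the same grouped sums while feeding fewer rows into the join.

```python
from collections import defaultdict

# Hypothetical toy tables (assumptions for illustration, not from the thesis).
orders = [(1, 10.0), (1, 5.0), (2, 7.0), (2, 3.0), (3, 4.0)]  # (customer_id, amount)
customers = {1: "EU", 2: "EU", 3: "US"}  # customer_id -> region


def lazy_aggregation(orders, customers):
    """Join every order row with its customer, then group by region."""
    joined = [(customers[cid], amount) for cid, amount in orders if cid in customers]
    totals = defaultdict(float)
    for region, amount in joined:
        totals[region] += amount
    return dict(totals), len(joined)


def eager_aggregation(orders, customers):
    """Pre-aggregate orders on the join key, join the partial sums, then group by region."""
    partial = defaultdict(float)
    for cid, amount in orders:
        partial[cid] += amount
    joined = [(customers[cid], subtotal) for cid, subtotal in partial.items() if cid in customers]
    totals = defaultdict(float)
    for region, subtotal in joined:
        totals[region] += subtotal
    return dict(totals), len(joined)


lazy_totals, lazy_join_rows = lazy_aggregation(orders, customers)
eager_totals, eager_join_rows = eager_aggregation(orders, customers)
print(lazy_totals, lazy_join_rows)
print(eager_totals, eager_join_rows)
```

In this toy case the join processes three pre-aggregated rows instead of five raw rows. Whether the rewrite pays off on real data depends on how many rows share a join key, which is exactly where cardinality estimates matter.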
Synchronous collaborative information retrieval: techniques and evaluation
Synchronous Collaborative Information Retrieval (SCIR) refers to
systems that support multiple users searching together at the same time in order to satisfy a shared information need. To date most SCIR systems have focussed on providing various awareness tools in order to enable collaborating users to coordinate the search task. However, requiring users to both search and coordinate the group activity may prove too demanding. On the other hand, without effective coordination policies the group search may not be effective. In this paper we propose and evaluate novel system-mediated techniques for coordinating a group search. These techniques allow for an effective division of labour across the group whereby each group member can explore a subset of the search space. We also propose and evaluate techniques to support automated sharing of knowledge across searchers in SCIR, through novel collaborative and complementary relevance feedback techniques. In order to evaluate these techniques, we propose a framework for SCIR evaluation based on simulations. To populate these simulations we extract data from TREC interactive search logs. This work represents the first simulations of SCIR to date and the first such use of this TREC data.
Empirical Evaluation of Mutation-based Test Prioritization Techniques
We propose a new test case prioritization technique that combines both
mutation-based and diversity-based approaches. Our diversity-aware
mutation-based technique relies on the notion of mutant distinguishment, which
aims to distinguish one mutant's behavior from another, rather than from the
original program. We empirically investigate the relative cost and
effectiveness of the mutation-based prioritization techniques (i.e., using both
the traditional mutant kill and the proposed mutant distinguishment) with 352
real faults and 553,477 developer-written test cases. The empirical evaluation
considers both the traditional and the diversity-aware mutation criteria in
various settings: single-objective greedy, hybrid, and multi-objective
optimization. The results show that there is no single dominant technique
across all the studied faults. To this end, we show when and why each of the
mutation-based prioritization criteria performs poorly,
using a graphical model called the Mutant Distinguishment Graph (MDG) that
demonstrates the distribution of the fault-detecting test cases with respect to
mutant kills and distinguishment.
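As a rough illustration of the single-objective greedy setting, the sketch below orders tests by how many not-yet-killed mutants each one adds. It covers only the traditional mutant-kill criterion, not the proposed distinguishment criterion. The kill matrix, test names, and the function `greedy_prioritize` are hypothetical, and ties are broken by test name as an arbitrary assumption.

```python
def greedy_prioritize(kills):
    """Order tests so that each next test kills the most not-yet-killed mutants.

    `kills` maps a test name to the set of mutants it kills (hypothetical data).
    Ties are broken alphabetically by test name.
    """
    remaining = dict(kills)
    killed = set()
    order = []
    while remaining:
        # max() returns the first maximal element, so sorting gives a stable tie-break.
        best = max(sorted(remaining), key=lambda t: len(remaining[t] - killed))
        order.append(best)
        killed |= remaining.pop(best)
    return order


kills = {
    "t1": {"m1", "m2"},
    "t2": {"m2", "m3", "m4"},
    "t3": {"m1"},
    "t4": {"m5"},
}
order = greedy_prioritize(kills)
print(order)
```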
Finding relevant documents using top ranking sentences: an evaluation of two alternative schemes
In this paper we present an evaluation of techniques that are designed to encourage web searchers to interact more with the results of a web search. Two specific techniques are examined: the presentation of sentences that highly match the searcher's query and the use of implicit evidence. Implicit evidence is evidence captured from the searcher's interaction with the retrieval results and is used to automatically update the display. Our evaluation concentrates on the effectiveness and subject perception of these techniques. The results show, with statistical significance, that the techniques are effective and efficient for information seeking.
LINVIEW: Incremental View Maintenance for Complex Analytical Queries
Many analytics tasks and machine learning problems can be naturally expressed
by iterative linear algebra programs. In this paper, we study the incremental
view maintenance problem for such complex analytical queries. We develop a
framework, called LINVIEW, for capturing deltas of linear algebra programs and
understanding their computational cost. Linear algebra operations tend to cause
an avalanche effect where even very local changes to the input matrices spread
out and infect all of the intermediate results and the final view, causing
incremental view maintenance to lose its performance benefit over
re-evaluation. We develop techniques based on matrix factorizations to contain
such epidemics of change. As a consequence, our techniques make incremental
view maintenance of linear algebra practical and usually substantially cheaper
than re-evaluation. We show, both analytically and experimentally, the
usefulness of these techniques when applied to standard analytics tasks. Our
evaluation demonstrates the efficiency of LINVIEW in generating parallel
incremental programs that outperform re-evaluation techniques by more than an
order of magnitude. Comment: 14 pages, SIGMO
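The benefit of keeping a change in factored form can be seen in a minimal sketch. This is not LINVIEW's algorithm; it only assumes a view C = A B and a rank-1 change to A written as the outer product u v^T. The function names and the tiny matrices are hypothetical.

```python
def matmul(a, b):
    """Plain dense matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_rank1_delta(c, u, v, b):
    """Update the view C = A @ B in place after A changes by the outer product u v^T.

    Because (A + u v^T) B = A B + u (v^T B), one vector-matrix product and one
    outer-product addition suffice instead of a full re-multiplication.
    """
    vt_b = [sum(v[k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
    for i in range(len(c)):
        for j in range(len(vt_b)):
            c[i][j] += u[i] * vt_b[j]
    return c


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
view = matmul(a, b)  # materialized view of A @ B

u = [1, 0]  # the change to A is u v^T, which adds 2 to A[0][1]
v = [0, 2]
a_new = [[a[i][j] + u[i] * v[j] for j in range(2)] for i in range(2)]

apply_rank1_delta(view, u, v, b)
print(view)
print(matmul(a_new, b))
```

For n-by-n matrices the factored update costs on the order of n^2 operations, while recomputing the product costs on the order of n^3. Expanding the delta into a dense matrix before multiplying would lose that advantage.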
CoFeD: A visualisation framework for comparative quality evaluation
Evaluation for the purpose of selection can be a challenging task particularly when there is a plethora of choices available. Short-listing, comparisons and eventual choice(s) can be aided by visualisation techniques. In this paper we use Feature Analysis, Tabular and Tree Representations and Composite Features Diagrams (CFDs) for profiling user requirements and for top-down profiling and evaluation of items (methods, tools, techniques, processes and so on) under evaluation. The resulting framework CoFeD enables efficient visual comparison and initial short-listing. The second phase uses bottom-up quantitative evaluation which aids the elimination of the weakest items and hence the effective selection of the most appropriate item.
The versatility of the framework is illustrated by a case study comparison and evaluation of two agile methodologies. The paper concludes with limitations and indications of further work.
Some remarks on IBNR evaluation techniques.
In this short note we give some comments and general remarks on the methodology of IBNR computations, as presented at the workshop on IBNR computations at the 2000 ASTIN Meeting, Porto Cervo, Sardinia.
