A repository of data and evaluation resources for natural language generation
Starting in 2007, the field of natural language generation (NLG) has organised shared-task evaluation events every year under the Generation Challenges umbrella. In the course of these shared tasks, a wealth of data has been created, along with associated task definitions and evaluation regimes. In other contexts too, sharable NLG data is now being created. In this paper, we describe the online repository that we have created as a one-stop resource for obtaining NLG task materials, both from Generation Challenges tasks and from other sources. The set of materials provided for each task consists of (i) the task definition, (ii) input and output data, (iii) evaluation software, (iv) documentation, and (v) publications reporting previous results.
The GREC main subject reference generation challenge 2009: overview and evaluation results
The GREC-MSR Task at Generation Challenges 2009 required participating systems to select coreference chains to the main subject of short encyclopaedic texts collected from Wikipedia. Three teams submitted one system each, and we additionally created four baseline systems. Systems were tested automatically using existing intrinsic metrics. We also evaluated systems extrinsically by applying coreference resolution tools to the outputs and measuring the success of the tools. In addition, systems were tested in an intrinsic evaluation involving human judges. This report describes the GREC-MSR Task and the evaluation methods applied, gives brief descriptions of the participating systems, and presents the evaluation results.
An evaluation framework for stereo-based driver assistance
This is the post-print version of the article. Copyright © 2012 Springer Verlag.
The accuracy of stereo algorithms or optical flow methods is commonly assessed by comparing the results against the Middlebury database. However, equivalent data for automotive or robotics applications rarely exist, as such data are difficult to obtain. As our main contribution, we introduce an evaluation framework tailored for stereo-based driver assistance that delivers excellent performance measures while avoiding manual labelling effort. Within this framework one can combine several ways of ground-truthing, apply different comparison metrics, and use large image databases. Using our framework, we show examples of several ground-truthing techniques: implicit ground-truthing (e.g. a sequence recorded without a crash occurring), robotic vehicles with high-precision sensors, and, to a small extent, manual labelling. To show the effectiveness of our evaluation framework, we compare three different stereo algorithms at pixel and object level. In more detail, we evaluate an intermediate representation called the Stixel World. Besides evaluating the accuracy of the Stixels, we investigate the completeness (equivalent to the detection rate) of the Stixel World versus the number of phantom Stixels. Among many findings, using this framework enables us to reduce the number of phantom Stixels by a factor of three compared to the base parametrization, which had already been optimized by test-driving vehicles for distances exceeding 10,000 km.