Identification of context markers for Russian nouns
Proceedings of the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011).
Editors: Bolette Sandford Pedersen, Gunta Nešpore and Inguna Skadiņa.
NEALT Proceedings Series, Vol. 11 (2011), 344-347.
© 2011 The editors and contributors.
Published by the Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt.
Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/1695
Split and Rephrase
We propose a new sentence simplification task (Split-and-Rephrase) whose aim is to split a complex sentence into a meaning-preserving sequence of shorter sentences. Like sentence simplification, splitting-and-rephrasing has the potential to benefit both natural language processing and societal applications. Because shorter sentences are generally better processed by NLP systems, it could be used as a preprocessing step that facilitates and improves the performance of parsers, semantic role labellers and machine translation systems. It should also be of use to people with reading disabilities because it allows longer sentences to be converted into shorter ones. This paper makes two contributions towards this new task. First, we create and make available a benchmark consisting of 1,066,115 tuples mapping a single complex sentence to a sequence of sentences expressing the same meaning. Second, we propose five models (from vanilla sequence-to-sequence to semantically motivated models) to understand the difficulty of the proposed task. Comment: 11 pages, EMNLP 201
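To make the benchmark format described above concrete, here is a minimal, hypothetical sketch of one such tuple: a complex sentence paired with a sequence of shorter sentences expressing the same meaning. The field names and the example sentence are illustrative only, not the dataset's actual schema.

```python
# Hypothetical sketch of a Split-and-Rephrase benchmark tuple:
# one complex sentence mapped to a sequence of shorter sentences.
split_and_rephrase_example = {
    "complex": "Alan Bean, who was born in Wheeler, Texas, was a crew member of Apollo 12.",
    "simple": [
        "Alan Bean was born in Wheeler, Texas.",
        "Alan Bean was a crew member of Apollo 12.",
    ],
}

def is_valid_tuple(entry):
    """Check the minimal shape of a tuple: one complex sentence split
    into at least two outputs, each shorter than the input."""
    return (
        isinstance(entry["complex"], str)
        and len(entry["simple"]) >= 2
        and all(len(s) < len(entry["complex"]) for s in entry["simple"])
    )

print(is_valid_tuple(split_and_rephrase_example))  # → True
```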
The human evaluation datasheet: a template for recording details of human evaluation experiments in NLP
This paper presents the Human Evaluation Datasheet (HEDS), a template for recording the details of individual human evaluation experiments in Natural Language Processing (NLP), and reports on researchers' first experiences of using HEDS sheets in practice. Originally taking inspiration from seminal papers by Bender and Friedman (2018), Mitchell et al. (2019), and Gebru et al. (2020), HEDS facilitates the recording of properties of human evaluations in sufficient detail, and with sufficient standardisation, to support comparability, meta-evaluation, and reproducibility assessments for human evaluations. These are crucial for scientifically principled evaluation, but the overhead of completing a detailed datasheet is substantial, and we discuss possible ways of addressing this and other issues observed in practice.
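As a rough illustration of what a standardised record of a human evaluation might look like, here is a hypothetical, heavily abbreviated sketch; the field names below are invented for illustration and are not the actual HEDS questions.

```python
# Hypothetical, abbreviated record in the spirit of a human-evaluation
# datasheet; field names are illustrative, not the real HEDS template.
heds_like_record = {
    "system": "example-nlg-system",
    "evaluated_property": "fluency",
    "response_elicitation": "1-7 Likert scale",
    "num_evaluators": 5,
    "num_items": 100,
}

def is_complete(record, required=("system", "evaluated_property",
                                  "response_elicitation")):
    """Check that the minimal fields needed for comparability between
    experiments are filled in."""
    return all(record.get(field) for field in required)

print(is_complete(heds_like_record))  # → True
```

Standardised fields of this kind are what make comparability and reproducibility assessments across experiments feasible.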
ReproGen : Proposal for a Shared Task on Reproducibility of Human Evaluations in NLG
Peer reviewed. Publisher PDF.
The 2022 ReproGen Shared Task on Reproducibility of Evaluations in NLG : Overview and Results
Publisher PDF.
Creating Training Corpora for NLG Micro-Planning
In this paper, we focus on how to create data-to-text corpora which can support the learning of wide-coverage micro-planners, i.e., generation systems that handle lexicalisation, aggregation, surface realisation, sentence segmentation and referring expression generation. We start by reviewing common practice in designing training benchmarks for Natural Language Generation. We then present a novel framework for semi-automatically creating linguistically challenging NLG corpora from existing Knowledge Bases. We apply our framework to DBpedia data and compare the resulting dataset with the dataset of Wen et al. (2016). We show that while Wen et al. (2016)'s dataset is more than twice as large as ours, it is less diverse both in terms of input and in terms of text. We thus propose our corpus generation framework as a novel method for creating challenging datasets from which NLG models can be learned that are capable of generating text from KB data.
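To give a concrete sense of the data-to-text setting described above, here is a minimal, hypothetical sketch of a training instance: a set of DBpedia-style (subject, property, object) triples paired with a reference verbalisation, plus a naive lexicalisation step. The triples, the reference text, and the helper function are illustrative assumptions, not the paper's framework.

```python
# Hypothetical data-to-text training instance: KB triples plus a
# reference verbalisation a micro-planner would learn to produce.
triples = [
    ("John_E_Blaha", "birthDate", "1942-08-26"),
    ("John_E_Blaha", "occupation", "Fighter_pilot"),
]
reference_text = "John E. Blaha, born on 26 August 1942, served as a fighter pilot."

def lexicalise(triple):
    """Naive baseline lexicalisation: one clause per triple. A real
    micro-planner must also aggregate clauses, segment sentences and
    choose referring expressions."""
    subj, prop, obj = triple
    return f"{subj.replace('_', ' ')} {prop} {obj.replace('_', ' ')}"

for t in triples:
    print(lexicalise(t))
```

The gap between the naive clause-per-triple output and the fluent reference text is exactly the micro-planning work (aggregation, segmentation, referring expressions) the corpus is designed to teach.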
WebNLG Challenge: Human Evaluation Results
This report presents the human evaluation results for the WebNLG Challenge, which was held in 2017. The automatic evaluation results can be found in [Gardent et al., 2017a]. In this report, we describe the human evaluation design, communicate the results, and explore the correlation between automatic and human assessments.
Mapping Natural Language to Description Logic
While much work on automated ontology enrichment has focused on mining text for concepts and relations, little attention has been paid to the task of enriching ontologies with complex axioms. In this paper, we focus on a form of text that is frequent in industry, namely system installation design principles (SIDPs), and we present a framework which can be used both to map SIDPs to OWL DL axioms and to assess the quality of these automatically derived axioms. We present experimental results on a set of 960 SIDPs provided by Airbus which demonstrate (i) that the approach is robust (97.50% of the SIDPs can be parsed) and (ii) that the DL axioms assigned to full parses are very likely to be correct (96% of the cases).
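To illustrate the kind of sentence-to-axiom mapping described above, here is a hypothetical toy sketch: a single regular-expression pattern over an SIDP-style sentence rewritten as a subsumption axiom in Manchester-style syntax. The pattern, the property name `connectedTo`, and the axiom rendering are invented for illustration; they are not the paper's grammar or output format.

```python
import re

# Toy sketch: map one SIDP-style sentence pattern to a DL axiom string.
# A full parse yields an axiom; anything else yields no axiom, mirroring
# the idea that axioms are only assigned to sentences that parse fully.
def sidp_to_axiom(sentence):
    """Map 'Every X shall be connected to a Y.' to a subsumption axiom
    with an existential restriction (Manchester-style syntax)."""
    m = re.match(r"Every (\w+) shall be connected to a (\w+)\.", sentence)
    if not m:
        return None  # no full parse -> no axiom assigned
    x, y = m.groups()
    return f"{x} SubClassOf (connectedTo some {y})"

print(sidp_to_axiom("Every Pump shall be connected to a Controller."))
# → Pump SubClassOf (connectedTo some Controller)
```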