
30 research outputs found

Structuring knowledge for reference generation : a clustering algorithm

Author: 11th Conference of the European Chapter of the Association for Computational Linguistics
Gatt Albert
Publication venue: The European Chapter of the ACL (EACL)
Publication date: 01/01/2006

This paper discusses two problems that arise in the Generation of Referring Expressions: (a) numeric-valued attributes, such as size or location; (b) perspective-taking in reference. Both problems, it is argued, can be resolved if some structure is imposed on the available knowledge prior to content determination. We describe a clustering algorithm which is sufficiently general to be applied to these diverse problems, discuss its application, and evaluate its performance.
peer-reviewed

OAR@UM

Demonstration of a prototype for a conversational companion for reminiscing about images

Author: 48th Annual Meeting of the Association for Computational Linguistics
Catizone Roberta
Cheng Weiwei
Dingli Alexiei
Wilks Yorick
Publication venue: The Association for Computational Linguistics
Publication date: 01/01/2010

This work was funded by the Companions project (2006-2009), sponsored by the European Commission as part of the Information Society Technologies (IST) programme under EC grant number IST-FP6-034434.

This paper describes an initial prototype demonstrator of a Companion, designed as a platform for novel approaches to the following: 1) the use of Information Extraction (IE) techniques to extract the content of incoming dialogue utterances after an Automatic Speech Recognition (ASR) phase; 2) the conversion of the input to the Resource Description Framework (RDF) to allow the generation of new facts from existing ones, under the control of a Dialogue Manager (DM) that also has access to stored knowledge and to open knowledge accessed in real time from the web, all in RDF form; 3) a DM implemented as a stack and network virtual machine that models mixed initiative in dialogue control; and 4) a tuned dialogue act detector based on corpus evidence. The prototype platform was evaluated, and we describe this evaluation briefly; it is also designed to support more extensive forms of emotion detection carried by both speech and lexical content, as well as extended forms of machine learning.
peer-reviewed

OAR@UM

Intrinsic vs. extrinsic evaluation measures for referring expression generation

Author: 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies
Belz Anja
Gatt Albert
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2008

In this paper we present research in which we apply (i) the kind of intrinsic evaluation metrics that are characteristic of current comparative HLT evaluation, and (ii) extrinsic, human task-performance evaluations more in keeping with NLG traditions, to 15 systems implementing a language generation task. We analyse the evaluation results and find that there are no significant correlations between intrinsic and extrinsic evaluation measures for this task.
peer-reviewed

OAR@UM

Named Entity Recognition as Dependency Parsing

Author: Bohnet B
Poesio M
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
Yu J
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2020

Named Entity Recognition (NER) is a fundamental task in Natural Language Processing, concerned with identifying spans of text expressing references to entities. NER research is often focused on flat entities only (flat NER), ignoring the fact that entity references can be nested, as in [Bank of [China]] (Finkel and Manning, 2009). In this paper, we use ideas from graph-based dependency parsing to provide our model with a global view of the input via a biaffine model (Dozat and Manning, 2017). The biaffine model scores pairs of start and end tokens in a sentence, which we use to explore all spans, so that the model is able to predict named entities accurately. We show that the model works well for both nested and flat NER through evaluation on 8 corpora, achieving state-of-the-art (SoTA) performance on all of them, with accuracy gains of up to 2.2 percentage points.

Crossref

Queen Mary Research Online
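The Named Entity Recognition as Dependency Parsing abstract describes scoring every (start, end) token pair with a biaffine model and then exploring all spans. The following is a minimal pure-Python sketch of that idea, not the authors' code: the weights are plain nested lists standing in for a trained encoder, label 0 is assumed to mean "not an entity", and the greedy highest-score-first decoding with a clash check is an assumption of this sketch. The names `biaffine_score`, `all_span_scores`, and `decode` are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Span = Tuple[int, int]

def biaffine_score(hs: Vector, he: Vector, U, W, b) -> List[float]:
    """Biaffine score of one (start, end) pair for each of C labels.

    U: d x C x d nested lists (bilinear term hs^T U_c he)
    W: 2d x C nested lists (linear term on the concatenation [hs; he])
    b: length-C bias
    """
    d, C = len(hs), len(b)
    concat = hs + he
    scores = []
    for c in range(C):
        bilinear = sum(hs[p] * U[p][c][q] * he[q] for p in range(d) for q in range(d))
        linear = sum(concat[k] * W[k][c] for k in range(2 * d))
        scores.append(bilinear + linear + b[c])
    return scores

def all_span_scores(H_start: List[Vector], H_end: List[Vector], U, W, b) -> Dict[Span, List[float]]:
    """Score every span (i, j) with i <= j, so all candidate spans are explored."""
    n = len(H_start)
    return {(i, j): biaffine_score(H_start[i], H_end[j], U, W, b)
            for i in range(n) for j in range(i, n)}

def decode(span_scores: Dict[Span, List[float]], nested: bool = True) -> List[Tuple[int, int, int]]:
    """Keep the highest-scoring spans first; reject spans that clash with kept ones.

    Flat mode rejects any overlap; nested mode only rejects partial overlaps.
    """
    candidates = []
    for (i, j), s in span_scores.items():
        label = max(range(len(s)), key=lambda c: s[c])  # ties resolve to label 0
        if label != 0:
            candidates.append((s[label], i, j, label))
    candidates.sort(reverse=True)
    kept = []
    for score, i, j, label in candidates:
        clash = False
        for _, s0, e0, _ in kept:
            overlap = not (j < s0 or e0 < i)
            contained = (i <= s0 and e0 <= j) or (s0 <= i and j <= e0)
            if overlap and (not nested or not contained):
                clash = True
                break
        if not clash:
            kept.append((score, i, j, label))
    return sorted((i, j, label) for _, i, j, label in kept)
```

Running `decode` once with `nested=True` and once with `nested=False` on the same scores shows how one span scorer can serve both nested and flat NER: only the clash rule changes.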

Mining web sites using adaptive information extraction

Author: 10th Conference on European Chapter of the Association for Computational Linguistics
Ciravegna Fabio
Dingli Alexiei
Guthrie David
Wilks Yorick
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2003

Adaptive Information Extraction systems (IES) are currently used by some Semantic Web (SW) annotation tools to support annotation (Handschuh et al., 2002; Vargas-Vera et al., 2002). They are generally based on fully supervised methodologies requiring fairly intense domain-specific annotation. Unfortunately, selecting representative examples may be difficult, and annotations can be incorrect and time-consuming to produce. In this paper we present a methodology that drastically reduces (or even removes) the amount of manual annotation required when annotating consistent sets of pages. A very limited number of user-defined examples is used to bootstrap learning. Simple, high-precision (and possibly high-recall) IE patterns are induced from these examples; these patterns then discover more examples, which in turn discover more patterns, and so on.
peer-reviewed

OAR@UM
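The Mining web sites using adaptive information extraction abstract describes a bootstrapping loop in which a few seed examples induce patterns, the patterns find new examples, and the new examples induce further patterns. The sketch below illustrates that loop only. The abstract does not specify the pattern language, so this sketch assumes a toy wrapper-style pattern made of a fixed-width left and right context string, which suits "consistent sets of pages" but is not the authors' system. The names `induce_patterns`, `apply_patterns`, and `bootstrap` are hypothetical.

```python
from typing import Set, Tuple

Pattern = Tuple[str, str]  # (left context, right context): an assumed, simplified pattern form

def induce_patterns(text: str, examples: Set[str], width: int = 4) -> Set[Pattern]:
    """Induce (left, right) context patterns from every occurrence of each example."""
    patterns: Set[Pattern] = set()
    for ex in examples:
        start = text.find(ex)
        while start != -1:
            left = text[max(0, start - width):start]
            end = start + len(ex)
            right = text[end:end + width]
            if left and right:  # skip patterns anchored on an empty context
                patterns.add((left, right))
            start = text.find(ex, start + 1)
    return patterns

def apply_patterns(text: str, patterns: Set[Pattern], max_len: int = 30) -> Set[str]:
    """Extract every string lying between a pattern's left and right context."""
    found: Set[str] = set()
    for left, right in patterns:
        i = text.find(left)
        while i != -1:
            s = i + len(left)
            e = text.find(right, s)
            if e != -1 and 0 < e - s <= max_len:
                found.add(text[s:e])
            i = text.find(left, i + 1)
    return found

def bootstrap(text: str, seeds: Set[str], rounds: int = 3) -> Set[str]:
    """Alternate pattern induction and example discovery until nothing new is found."""
    examples = set(seeds)
    for _ in range(rounds):
        patterns = induce_patterns(text, examples)
        new_examples = apply_patterns(text, patterns) - examples
        if not new_examples:
            break
        examples |= new_examples
    return examples
```

For instance, on the toy page `"<li>Alice</li><li>Bob</li><li>Carol</li>"`, the single seed `"Alice"` induces the pattern `("<li>", "</li")`, which in turn finds `"Bob"` and `"Carol"`. A real system would also need to guard precision, since a short, generic context can over-generate on less regular pages.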

Letters from the past : modeling historical sound change through diachronic character embeddings

Author: Boldsen Sidsel
Paggio Patrizia
The 60th Annual Meeting of the Association for Computational Linguistics
Publication venue: Association for Computational Linguistics
Publication date: 01/01/2022

While a great deal of work has been done on NLP approaches to lexical semantic change detection, other aspects of language change have received less attention from the NLP community. In this paper, we address the detection of sound change through historical spelling. We propose that a sound change can be captured by comparing the relative distance through time between the distributions of the characters involved before and after the change has taken place. We model these distributions using PPMI character embeddings. We verify this hypothesis on synthetic data and then test the method’s ability to trace the well-known historical change of lenition of plosives in Danish historical sources. We show that the models are able to identify several of the changes under consideration and to uncover meaningful contexts in which they appeared. The methodology has the potential to contribute to the study of open questions such as the relative chronology of sound shifts and their geographical distribution.
peer-reviewed

OAR@UM
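The Letters from the past abstract models character distributions with PPMI character embeddings and compares the distance between characters across periods. A minimal sketch of that computation follows, assuming neighbouring characters within a small window as contexts, `#` as a word-boundary marker, and cosine distance as the distance measure; the paper's actual context definition and distance may differ. The names `ppmi_vectors`, `cosine`, and `char_distance` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]

def ppmi_vectors(words: List[str], window: int = 1) -> Dict[str, Vector]:
    """Sparse PPMI vectors for each character, using neighbours within `window` as contexts."""
    pair_counts: Counter = Counter()
    for w in words:
        chars = f"#{w}#"  # '#' marks word boundaries (an assumption of this sketch)
        for i, c in enumerate(chars):
            for j in range(max(0, i - window), min(len(chars), i + window + 1)):
                if j != i:
                    pair_counts[(c, chars[j])] += 1
    total = sum(pair_counts.values())
    target_counts: Counter = Counter()
    context_counts: Counter = Counter()
    for (c, x), n in pair_counts.items():
        target_counts[c] += n
        context_counts[x] += n
    vectors: Dict[str, Vector] = {}
    for (c, x), n in pair_counts.items():
        # PMI = log( P(c, x) / (P(c) P(x)) ); PPMI keeps only the positive values
        pmi = math.log(n * total / (target_counts[c] * context_counts[x]))
        if pmi > 0:
            vectors.setdefault(c, {})[x] = pmi
    return vectors

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(val * v.get(k, 0.0) for k, val in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def char_distance(words: List[str], a: str, b: str, window: int = 1) -> float:
    """Cosine distance between the PPMI vectors of characters a and b in one period."""
    vecs = ppmi_vectors(words, window)
    return 1.0 - cosine(vecs.get(a, {}), vecs.get(b, {}))
```

Computing `char_distance` separately on word lists from an earlier and a later period and comparing the two values gives the kind of through-time signal the abstract describes: two characters that occur in the same contexts have distance near 0, and characters with disjoint contexts have distance 1.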

Active PETs: Active Data Annotation Prioritisation for Few-Shot Claim Verification with Pattern Exploiting Training

Author: Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2023
Zeng X
Zubiaga A
Publication date: 01/01/2023

To mitigate the impact of the scarcity of labelled data on fact-checking systems, we focus on few-shot claim verification. Despite recent work proposing advanced language models for few-shot classification, there is a dearth of research on data annotation prioritisation that improves the selection of the few shots to be labelled for optimal model performance. We propose Active PETs, a novel weighted approach that utilises an ensemble of Pattern Exploiting Training (PET) models based on various language models to actively select unlabelled data as candidates for annotation. Using Active PETs for few-shot data selection shows consistent improvement over the baseline methods on two technical fact-checking datasets and with six different pretrained language models. We show further improvement with Active PETs-o, which additionally integrates an oversampling strategy. Our approach enables effective selection of instances to be labelled where unlabelled data is abundant but resources for labelling are limited, leading to consistently improved few-shot claim verification performance.

Queen Mary Research Online

An OCR system for the Unified Northern Alphabet

Author: Association for Computational Linguistics
Kaalep Heiki-Jaan
Partanen Niko
Pirinen Tommi A.
Rießler Michael
Tyers Francis M.
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2019

Partanen N, Rießler M. An OCR system for the Unified Northern Alphabet. In: Pirinen TA, Kaalep H-J, Tyers FM, Association for Computational Linguistics, eds. The fifth International Workshop on Computational Linguistics for Uralic Languages. Tartu: Association for Computational Linguistics; 2019: 77-89.

This paper presents experiments conducted to build a functional OCR model for the Unified Northern Alphabet. This writing system was used between 1931 and 1937 for 16 (Uralic and non-Uralic) minority languages spoken in the Soviet Union. The character accuracy of the developed model exceeds 98%, and the results clearly show cross-linguistic applicability. The tests described here therefore also provide general guidelines for the amount of training data needed to bootstrap an OCR system under similar conditions.

Publications at Bielefeld University
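The OCR abstract reports character accuracy above 98%. The abstract does not state the exact formula, so the sketch below assumes the common definition: one minus the Levenshtein edit distance between the reference transcription and the OCR output, divided by the reference length. The names `levenshtein` and `character_accuracy` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete ca
                            curr[j - 1] + 1,          # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def character_accuracy(reference: str, hypothesis: str) -> float:
    """1 - edit_distance / len(reference); can drop below 0 if the output is much longer."""
    if not reference:
        raise ValueError("reference transcription must be non-empty")
    return 1.0 - levenshtein(reference, hypothesis) / len(reference)
```

Under this definition, an accuracy above 98% means fewer than 2 character edits per 100 reference characters. Some evaluations instead normalise by the longer of the two strings, which keeps the score within [0, 1].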

Use of Data Mining in Modern Product Development Processes (Einsatz von Data-Mining in modernen Produktentstehungsprozessen)

Author: Association for Computational Linguistics
Publication venue: Carl Hanser Verlag

Crossref