693 research outputs found

Dealing with Metonymic Readings of Named Entities

Author: Poibeau Thierry
Publication date: 01/01/2006

The aim of this paper is to propose a method for tagging named entities (NE) using natural language processing techniques. Beyond their literal meaning, named entities are frequently subject to metonymy. We show the limits of current NE type hierarchies and detail a new proposal aimed at dynamically capturing the semantics of entities in context. This model can analyze complex linguistic phenomena like metonymy, which are known to be difficult for natural language processing but crucial for most applications. We present an implementation and some tests using the French ESTER corpus and give significant results.

arXiv.org e-Print Archive

CiteSeerX

HAL Descartes

eScholarship - University of California

HAL-Paris 13

Hal-Diderot

Albayzin 2010 Evaluation campaign: speaker diarization

Author: Hernando Pericás Francisco Javier
Schulz Henrik
Zelenak Martin
Publication date: 01/01/2010

In this paper we present the evaluation results for the task of speaker diarization in the broadcast news domain as part of the Albayzin 2010 evaluation campaign of language and speech technologies. The evaluation data was a subset of the Catalan broadcast news database recorded from the 3/24 TV channel. Six competing systems from five different universities were submitted for the Albayzin 2010: Speaker diarization session, and the lowest diarization error rate obtained was 30.4%.

UPCommons. Portal del coneixement obert de la UPC

Speaker diarisation and longitudinal linking in multi-genre broadcast data

Author: Gales MJF
Karanasou P
Lanchantin P
Liu X
Qian Y
Wang L
Woodland PC
Zhang C
Publication venue: 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings
Publication date: 01/01/2015

This paper presents a multi-stage speaker diarisation system with longitudinal linking developed on BBC multi-genre data for the 2015 Multi-Genre Broadcast (MGB) challenge. The basic speaker diarisation system draws on techniques from the Cambridge March 2005 system with a new deep neural network (DNN)-based speech/non speech segmenter. A newly developed linking stage is next added to the basic diarisation output aiming at the identification of speakers across multiple episodes of the same series. The longitudinal constraint imposes an incremental processing of the episodes, where speaker labels for each episode can be obtained using only material from the episode in question, and those broadcast earlier in time. The nature of the data as well as the longitudinal linking constraint position this diarisation task as a new open-research topic, and a particularly challenging one. Different linking clustering metrics are compared and the lowest within-episode and cross-episode DER scores are achieved on the MGB challenge evaluation set. This work is in part supported by EPSRC Programme Grant EP/I031022/1 (Natural Speech Technology). C. Zhang is also supported by a Cambridge International Scholarship from the Cambridge Commonwealth, European & International Trust. This is the author accepted manuscript. The final version is available from IEEE via http://dx.doi.org/10.1109/ASRU.2015.740485

Apollo (Cambridge)

Gender Representation in French Broadcast Corpora and Its Impact on ASR Performance

Author: Besacier Laurent
Garnerin Mahault
Rossato Solange
Publication date: 23/08/2019

This paper analyzes the gender representation in four major corpora of French broadcast. Because these corpora are widely used within the speech processing community, they are primary material for training automatic speech recognition (ASR) systems. As gender bias has been highlighted in numerous natural language processing (NLP) applications, we study the impact of the gender imbalance in TV and radio broadcast on the performance of an ASR system. This analysis shows that women are under-represented in our data in terms of speakers and speech turns. We introduce the notion of speaker role to refine our analysis and find that women are even fewer within the Anchor category, which corresponds to prominent speakers. The disparity in available data between the two genders causes performance to decrease on women. However, this global trend can be counterbalanced for speakers who are used to speaking in the media when a sufficient amount of data is available. Comment: Accepted to ACM Workshop AI4T

arXiv.org e-Print Archive

Hal - Université Grenoble Alpes

Proposal for an Extension of Traditional Named Entities: from Guidelines to Evaluation, an Overview

Author: Fort Karen
Galibert Olivier
Grouin Cyril
Quintard Ludovic
Rosset Sophie
Zweigenbaum Pierre
Publication venue: HAL CCSD
Publication date: 23/06/2011

Within the framework of the construction of a fact database, we defined guidelines to extract named entities, using a taxonomy based on an extension of the usual named entities definition. We thus defined new types of entities with broader coverage including substantive-based expressions. These extended named entities are hierarchical (with types and components) and compositional (with recursive type inclusion and metonymy annotation). Human annotators used these guidelines to annotate a 1.3M word broadcast news corpus in French. This article presents the definition and novelty of extended named entity annotation guidelines, the human annotation of a global corpus and of a mini reference corpus, and the evaluation of annotations through the computation of inter-annotator agreement. Finally, we discuss our approach and the computed results, and outline further work.

HAL-Paris 13

Hal-Diderot

Albayzín-2014 evaluation: audio segmentation and classification in broadcast news domains

Author: Castán Diego
Delgado Héctor
Docío-Fernández Laura
Franco-Pedroso Javier
Lleida Eduardo
Lopez-Otero Paula
Navas Eva
Ortega Alfonso R.
Ramos Daniel
Serrano Javier
Tavárez David E.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2015

The electronic version of this article is the complete one and can be found online at: http://dx.doi.org/10.1186/s13636-015-0076-3 Audio segmentation is important as a pre-processing task to improve the performance of many speech technology tasks and, therefore, it has an undoubted research interest. This paper describes the database, the metric, the systems and the results for the Albayzín-2014 audio segmentation campaign. In contrast to previous evaluations where the task was the segmentation of non-overlapping classes, Albayzín-2014 evaluation proposes the delimitation of the presence of speech, music and/or noise that can be found simultaneously. The database used in the evaluation was created by fusing different media and noises in order to increase the difficulty of the task. Seven segmentation systems from four different research groups were evaluated and combined. Their experimental results were analyzed and compared with the aim of providing a benchmark and showing up the promising directions in this field. This work has been partially funded by the Spanish Government and the European Union (FEDER) under the project TIN2011-28169-C05-02 and supported by the European Regional Development Fund and the Spanish Government (‘SpeechTech4All Project’ TEC2012-38939-C03

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Springer - Publisher Connector

Repositorio Universidad de Zaragoza

Biblos-e Archivo

Building and exploiting a dependency treebank for French radio broadcasts

Author: Anderson Corinna
Cerisara Christophe
Gardent Claire
Publication date: 29/11/2010

Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 31-42. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15891

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

DSpace at Tartu University Library

Leveraging study of robustness and portability of spoken language understanding systems across languages and domains: the PORTMEDIA corpora

Author: Besacier Laurent
Camelin Nathalie
Estève Yannick
Favre Benoit
Jabaian Bassam
Lefèvre Fabrice
Mostefa Djamel
Quignard Matthieu
Rojas Barahona Lina Maria
Publication venue: HAL CCSD
Publication date: 01/01/2012

The PORTMEDIA project is intended to develop new corpora for the evaluation of spoken language understanding systems. The newly collected data are in the field of human-machine dialogue systems for tourist information in French in line with the MEDIA corpus. Transcriptions and semantic annotations, obtained by low-cost procedures, are provided to allow a thorough evaluation of the systems' capabilities in terms of robustness and portability across languages and domains. A new test set with some adaptation data is prepared for each case: in Italian as an example of a new language, for ticket reservation as an example of a new domain. Finally the work is complemented by the proposition of a new high level semantic annotation scheme well-suited to dialogue data.

Hal - Université Grenoble Alpes

HAL AMU

INRIA a CCSD electronic archive server

Un système de détection d'entités nommées adapté pour la campagne d'évaluation ESTER 2

Author: Brun Caroline
Ehrmann Maud
Publication venue: Associacio catalana de Salut Laboral
Publication date: 14/04/2010

In this paper, we report on our participation in the ESTER 2 (Evaluation des Systèmes de Transcription Enrichie d'Emissions Radiophoniques) evaluation campaign. After describing the goals, specificities and challenges of the campaign, we present our named entity detection system and focus on the adaptations made in the framework of the campaign. We present the results obtained during the competition and then new results obtained afterward. We conclude with the lessons we learned from this experiment.

JRC Publications Repository