Search CORE

9 research outputs found

Are the Best Multilingual Document Embeddings simply Based on Sentence Embeddings?

Author: Espana-Bonet Cristina
Sannigrahi Sonal
van Genabith Josef
Publication date: 28/04/2023

Dense vector representations for textual data are crucial in modern NLP. Word embeddings and sentence embeddings estimated from raw texts are key in achieving state-of-the-art results in various tasks requiring semantic understanding. However, obtaining embeddings at the document level is challenging due to computational requirements and lack of appropriate data. Instead, most approaches fall back on computing document embeddings based on sentence representations. Although there exist architectures and models to encode documents fully, they are in general limited to English and few other high-resourced languages. In this work, we provide a systematic comparison of methods to produce document-level representations from sentences based on LASER, LaBSE, and Sentence BERT pre-trained multilingual models. We compare input token number truncation, sentence averaging as well as some simple windowing and in some cases new augmented and learnable approaches, on 3 multi- and cross-lingual tasks in 8 languages belonging to 3 different language families. Our task-based extrinsic evaluations show that, independently of the language, a clever combination of sentence embeddings is usually better than encoding the full document as a single unit, even when this is possible. We demonstrate that while a simple sentence average results in a strong baseline for classification tasks, more complex combinations are necessary for semantic tasks.

Comment: EACL 2023 Findings paper, to present at LoResM

arXiv.org e-Print Archive

An Empirical Analysis of NMT-Derived Interlingual Embeddings and Their Use in Parallel Sentence Identification

Author: Adam Csaba Varga
Alberto Barron-Cedeno
Cristina Espana-Bonet
Josef van Genabith
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

The effect of domain and diacritics in Yorùbá-English neural machine translation

Author: Adebonojo Damilola
Adelani David
Adeyemi Mofetoluwa
Alabi Jesujoba
Awokoya Ayodele
Ayeni Adesina
Espana-Bonet Cristina
Ruiter Dana
Publication venue: HAL CCSD
Publication date: 16/08/2021

Massively multilingual machine translation (MT) has shown impressive capabilities, including zero and few-shot translation between low-resource language pairs. However, these models are often evaluated on high-resource languages with the assumption that they generalize to low-resource ones. The difficulty of evaluating MT models on low-resource pairs is often due to lack of standardized evaluation datasets. In this paper, we present MENYO-20k, the first multi-domain parallel corpus with a special focus on clean orthography for Yorùbá-English with standardized train-test splits for benchmarking. We provide several neural MT benchmarks and compare them to the performance of popular pre-trained (massively multilingual) MT models both for the heterogeneous test set and its subdomains. Since these pre-trained models use huge amounts of data with uncertain quality, we also analyze the effect of diacritics, a major characteristic of Yorùbá, in the training data. We investigate how and when this training condition affects the final quality and intelligibility of a translation. Our models outperform massively multilingual models such as Google (+8.7 BLEU) and Facebook M2M (+9.1 BLEU) when translating to Yorùbá, setting a high quality benchmark for future research.

HAL Descartes

Findings of the Second WMT Shared Task on Sign Language Translation (WMT-SLT23)

Author: Alikhani Malihe
Avramidis Eleftherios
Bowden Richard
Braffort Annelies
Camgöz Necati Cihan
Ebling Sarah
Espana-Bonet Cristina
Grundkiewicz Roman
Göhring Anne
Inan Mert
Jiang Zifan
Koller Oscar
Landuyt Davy Van
Moryossef Amit
Müller Mathias
Rios Annette
Shterionov Dimitar
Sidler-Miserez Sandra
Tissi Katja
Publication venue: HAL CCSD
Publication date: 01/12/2023

This paper presents the results of the Second WMT Shared Task on Sign Language Translation (WMT-SLT23). This shared task is concerned with automatic translation between signed and spoken languages. The task is unusual in the sense that it requires processing visual information (such as video frames or human pose estimation) beyond the well-known paradigm of text-to-text machine translation (MT). The task offers four tracks involving the following languages: Swiss German Sign Language (DSGS), French Sign Language of Switzerland (LSF-CH), Italian Sign Language of Switzerland (LIS-CH), German, French and Italian. Four teams (including one working on a baseline submission) participated in this second edition of the task, all submitting to the DSGS-to-German track. Besides a system ranking and system papers describing state-of-the-art techniques, this shared task makes the following scientific contributions: novel corpora and reproducible baseline systems. Finally, the task also resulted in publicly available sets of system outputs and more human evaluation scores for sign language translation.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

ZORA

Findings of the Second WMT Shared Task on Sign Language Translation (WMT-SLT23)

Author: Alikhani Malihe
Avramidis Eleftherios
Bowden Richard
Braffort Annelies
Camgöz Necati Cihan
Ebling Sarah
Espana-Bonet Cristina
Grundkiewicz Roman
Göhring Anne
Inan Mert
Jiang Zifan
Koller Oscar
Landuyt Davy Van
Moryossef Amit
Müller Mathias
Rios Annette
Shterionov Dimitar
Sidler-Miserez Sandra
Tissi Katja
Publication venue: HAL CCSD
Publication date: 06/12/2023

HAL-CentraleSupelec

First WMT Shared Task on Sign Language Translation (WMT-SLT22)

Author: Avramidis Eleftherios
Battisti Alessia
Berger Michèle
Bowden Richard
Braffort Annelies
Camgöz Necati Cihan
Ebling Sarah
Espana-Bonet Cristina
Grundkiewicz Roman
Jiang Zifan
Koller Oscar
Moryossef Amit
Müller Mathias
Perrollaz Regula
Reinhard Sabine
Rios Annette
Shterionov Dimitar
Sidler-Miserez Sandra
Tissi Katja
Van Landuyt Davy
Publication venue: HAL CCSD
Publication date: 01/06/2023

This paper is a brief summary of the First WMT Shared Task on Sign Language Translation (WMT-SLT22), a project partly funded by EAMT. The focus of this shared task is automatic translation between signed and spoken languages. Details can be found on our website or in the findings paper (Müller et al., 2022).

HAL-CentraleSupelec

First WMT Shared Task on Sign Language Translation (WMT-SLT22)

Author: Avramidis Eleftherios
Battisti Alessia
Berger Michèle
Bowden Richard
Braffort Annelies
Camgöz Necati Cihan
Ebling Sarah
Espana-Bonet Cristina
Grundkiewicz Roman
Jiang Zifan
Koller Oscar
Moryossef Amit
Müller Mathias
Perrollaz Regula
Reinhard Sabine
Rios Annette
Shterionov Dimitar
Sidler-Miserez Sandra
Tissi Katja
Van Landuyt Davy
Publication venue: HAL CCSD
Publication date: 01/06/2023

INRIA a CCSD electronic archive server

Role of age and comorbidities in mortality of patients with infective endocarditis.

Author: Adán Iván
Agüero Balbín Jesús
Alcaraz Vidal Begoña
Almela Manuel
Almendro Delia Manuel
Alonso Luis Javier
Alonso Mª Del Mar
Alonso Ángel
Amado Cristina
Ambrosioni Juan
Antorrena Isabel
Araji Omar
Aramburu Javier
Aramendi José
Armiñanzas Castillo Carlos
Armiñanzas Carlos
Arnaiz García Ana
Arribas Leal José Mª
Arrizabalaga Asenjo María
Asensi Álvarez Víctor
Aured Guallar Carmen
Azcona Gutiérrez José Manuel
Azcárate Pedro
Azqueta Manuel
Barquero José Miguel
Bellón Munera Mª Carmen
Bereciartua Elena
Bermejo Javier
Blanco José Ramón
Blanco María José
Blanco Roberto
Blázquez Ana
Boado María Victoria
Bodro Marta
Bouza Emilio
Bravo-Ferrer José María
Brunet Mercè
Calvo Jambrina Román
Calvo Felicitas Elena
Campaña Lázaro Marta
Canut Blasco Andrés
Carrasco Rafael
Cartañá Ramón
Cascales Alcolea Eva
Castelo Corral Laura
Castelo Laura
Castro Beatriz
Celemín Daniel
Centella Tomasa
Cifuentes Luna Carmen
Climent Vicente
Cobo Belaustegui Manuel
Cobos Trigueros Nazaret
Cordo Mollar José
Costas Carlos
Crespo Alejandro
Cuenca José
Cuende Ana María
Cuerpo Caballero Gregorio
de Alarcón Arístides
de Benito Natividad
de Cueto Marina
de la Hera Jesús
de la Morena Valenzuela Gonzalo
de la Torre Lima Javier
Del Amor Espín María Jesús
Del Río Alejandro
Delgado Montero Antonia
Domínguez Fernando
Durán Mª Del Carmen
Echeverría Tomás
Egea Serrano Pilar
Escribano Garaizabal Elena
Falces Carlos
Fariñas Maria Carmen
Fariñas María Carmen
Fariñas-Alvarez Concepción
Fariñas-Álvarez Concepción
Fernández Abad Nuria
Fernández Cruz Ana
Fernández Suárez Jonnathan
Fernández Sánchez Fernando
Fernández A L
Fita Guillermina
Fuerte Ana
Fuster David
Gainzarain Arana Juan Carlos
Gaminde Eduardo
García de la Mària Cristina
García Domínguez Gloria
García Leoni Mª Eugenia
García López Mª Victoria
García Mangas Pilar
García Mansilla Ana
García Pavía Pablo
García Rosado Dácil
García Uriarte Oscar
García Vázquez Elisa
García Emilio
García Iván
García-Álvarez Lara
Georgieva Radka Ivanova
Giner Caro José Antonio
Goenaga Miguel Ángel
Goikoetxea Josune
González Ramallo Víctor
González Jesús
González-Rico Claudia
Gurguí Mercé
Gutiérrez Díez José
Gutiérrez-Cuadra Manuel
Gálvez Acebal Juan
Gálvez-Acebal Juan
Gómez Izquierdo Rubén
Haro Juan Luis
Heredero Gálvez Eva
Hermida José Manuel
Hernández Roca José Joaquín
Hernández Torres Alicia
Hernández-Meneses Marta
Hidalgo Tenorio Carmen
Hualde Amaia Mari
Idígoras Pedro
Iglesias Fraile Lisardo
Iqbal-Mirza Sadaf Zafar
Iribarren José Antonio
Iruretagoyena José Ramón
Irurzun Zuazabal Josu
Izaguirre Yarza Alberto
Jimeno Almazán Amaya
Jiménez Sánchez Roberto
Keituqwa Yañez Ivan
Kestler Hernández Martha
Kortajarena Urkola Xabier
Lacalzada Juan
Largo Pau José
Lepe José Antonio
León Arguero Víctor
Llamas Patricio
Llinares Pedro
Llopis Pérez Jaume
Loeches Belén
Luque Rafael
López Menéndez José
López Francisco
López-Cortés Luis Eduardo
López-Soria Leire
Maicas Bellido Carolina
Marco Francesc
Martín López Alejandro
Martín Quirós Alejandro
Martín-Dávila Pilar
Martínez Marcos Francisco Javier
Martínez Sellés Manuel
Martínez Amparo
Martínez Francisco Javier
Martínez-Sellés Manuel
Marín Mercedes
Matamala Adell Marta
Mencia Bajo Pilar
Menárguez Mª Cruz
Merino Esperanza
Miguel Gómez Mª Antonia
Miguez Rey Enrique
Miró Meda José Mª
Miró José M
Montejo Miguel
Morales Carlos
Morales Isabel
Moreno Escobar Eduardo
Moreno Rodríguez Anai
Moreno Torrico Alfonso
Moreno Asunción
Moreno Mar
Moya José Luis
Muñoz Patricia
Méndez Irene
Nassar Ibrahim
Navas Enrique
Nicolás David
Nieto Javier
Ninot Salvador
Noureddine Mariam
Núñez Morcillo Juana
Ojeda Burgos Guillermo
Ojeda Guillermo
Oliva Enrique
Orden Beatriz
Ortiz de Zárate Zuriñe
Ortín Freire Alejandro
Oteo José Antonio
Pacho Cristina
Pajarón Marcos
Palacián Ruiz Mª Pilar
Palomo Carmen
Parra José Antonio
Paré Carlos
Paya Martínez Begoña
Peláez Ballesta Ana
Pereda Daniel
Pericas Roser
Pericás Ramis Pere
Pericás Juan M
Pericás Juan Manuel
Peña Monje Alejandro
Pinilla Blanca
Pinto Ángel
Plata Ciezar Antonio
Plazas Joaquín
Pomar José L
Pons Guillem
Porres Juan Carlos
Prieto A
Pérez Seco Mª Cruz
Quintana Eduardo
Ramos Antonio
Ramírez José
Ramírez Ulises
Regueiro Benito
Reguera Iglesias José Mª
Reus Sergio
Reviejo Carlos
Rial Bastón Verónica
Ribas Blanco Mª Ángels
Rincón Cristina
Rodrigo David
Rodríguez Bailón Isabel
Rodríguez Esteban Ángeles
Rodríguez García Raquel
Rodríguez Mayo María
Rodríguez Álvarez Regino
Rodríguez David
Rodríguez Regino
Rodríguez-Abella Hugo
Rodríguez-Créixems Marta
Romero María
Rosas Gabriel
Rovira Irene
Ruiz de Gopegui Bordes Enrique
Ruiz Morales Josefa
Ruiz Soledad
Saldaña Araceli
Sandoval Elena
Sanz Mercedes
Sarralde Aurelio
Segura Luque Juan Carlos
Sepúlveda Mª Antonia
Sitges Marta
Soriano Víctor
Sousa Regueiro Dolores
Soy Dolors
Spanish Collaboration on Endocarditis — Grupo de Apoyo al Manejo de la Endocarditis Infecciosa en Espana Study Group
Sánchez Cabrera Valme
Sánchez Efrén
Sánchez-Porto Antonio
Tarabini-Castellani Paola
Teira Ramón
Telenti Asensio Mauricio
Tercero Martínez Antonia
Tijeira E
Toledano Sierra Pilar
Tolosana José M
Téllez Adrián
Urturi Matos José Antonio
Valerio Maricela
Vega Marino
Verde Moreno Eduardo
Vidal Bonet Laura
Vidal Bárbara
Vila Jordi
Villoslada Gelabert Aroa
Vinuesa García David
Viqueira González Monserrat
Vitoria Yolanda
Voces Roberto
Vázquez Pilar
Zarauza Jesús
Álvarez M
Álvarez Nemesio
Publication venue: 'Elsevier BV'
Publication date: 21/03/2019

The aim of this study was to analyse the characteristics of patients with IE in three groups of age and to assess the ability of age and the Charlson Comorbidity Index (CCI) to predict mortality. Prospective cohort study of all patients with IE included in the GAMES Spanish database between 2008 and 2015. Patients were stratified into three age groups: A total of 3120 patients with IE (1327  There were no differences in the clinical presentation of IE between the groups. Age ≥ 80 years, high comorbidity (measured by CCI), and non-performance of surgery were independent predictors of mortality in patients with IE. CCI could help to identify those patients with IE and surgical indication who present a lower risk of in-hospital and 1-year mortality after surgery, especially in th

Fondo Bibliográfico Digital Institucional