Combining semantic and syntactic generalization in example-based machine translation
In this paper, we report our experiments in combining two EBMT systems that rely on generalized templates, Marclator and CMU-EBMT, on an English-German translation task. Our goal was to see whether a statistically significant improvement could be achieved over the individual performances of these two systems. We observed that this was not the case. However, our system consistently outperformed a lexical EBMT baseline system.
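The abstract hinges on testing whether one system's improvement over another is statistically significant. The paper does not specify its test, so the following is only a minimal sketch of one common choice, paired bootstrap resampling over corpus BLEU with sacrebleu; the function and variable names are our own.

```python
# Illustrative sketch (not the paper's code): paired bootstrap resampling to test
# whether MT system A's corpus BLEU is significantly higher than system B's.
# hyps_a, hyps_b, refs are parallel lists of sentence strings.
import random
import sacrebleu

def paired_bootstrap(hyps_a, hyps_b, refs, n_samples=1000, seed=0):
    rng = random.Random(seed)
    n = len(refs)
    wins_a = 0
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]   # resample sentences with replacement
        sample_a = [hyps_a[i] for i in idx]
        sample_b = [hyps_b[i] for i in idx]
        sample_r = [refs[i] for i in idx]
        bleu_a = sacrebleu.corpus_bleu(sample_a, [sample_r]).score
        bleu_b = sacrebleu.corpus_bleu(sample_b, [sample_r]).score
        if bleu_a > bleu_b:
            wins_a += 1
    # Approximate p-value for the hypothesis "A is not better than B"
    return 1.0 - wins_a / n_samples
```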
Automatic Annotation Elaboration as Feedback to Sign Language Learners
Beyond enabling linguistic analyses, linguistic annotations may serve as training material for developing automatic language assessment models as well as for providing textual feedback to language learners. Yet these linguistic annotations in their original form are often not easily comprehensible for learners. In this paper, we explore the use of GPT-4, as an example of a large language model (LLM), to turn linguistic annotations into clear and understandable feedback for language learners, specifically sign language learners, on their productions.
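As a rough illustration of the pipeline described above, the sketch below sends a hypothetical linguistic annotation to GPT-4 via the OpenAI Python SDK and asks for learner-friendly feedback; the prompt wording and the annotation format are assumptions for exposition, not the paper's actual setup.

```python
# Minimal sketch (assumptions: OpenAI Python SDK >= 1.0; the annotation dict and
# prompt are invented here, not taken from the paper).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def annotation_to_feedback(annotation: dict) -> str:
    """Ask GPT-4 to rephrase a raw linguistic annotation as learner-friendly feedback."""
    prompt = (
        "You are a sign language tutor. Rewrite the following linguistic annotation "
        "of a learner's production as short, encouraging, easy-to-understand feedback:\n"
        f"{annotation}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.3,
    )
    return response.choices[0].message.content

# Hypothetical example annotation
print(annotation_to_feedback({"gloss": "INDEX-1 GO-TO SCHOOL",
                              "error": "non-dominant hand drops early"}))
```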
Target-Level Sentence Simplification as Controlled Paraphrasing
Automatic text simplification aims to reduce the linguistic complexity of a text in order to make it easier to understand and more accessible. However, simplified texts are consumed by a diverse array of target audiences, and what might be appropriately simplified for one group of readers may differ considerably for another. In this work, we investigate a novel formulation of sentence simplification as paraphrasing with controlled decoding. This approach aims to alleviate the major burden of relying on large amounts of in-domain parallel training data, while at the same time allowing for modular and adaptive simplification. According to automatic metrics, our approach performs competitively against baselines that prove more difficult to adapt to the needs of different target audiences or require significant amounts of complex-simple parallel aligned data.
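To make the idea of controlled decoding concrete, here is a hedged sketch in which a generic seq2seq paraphraser's beam search is steered by a length cap and by a custom LogitsProcessor that penalises tokens outside an "easy" vocabulary; the model name (t5-small) and the easy-vocabulary heuristic are placeholders, not the system described in the paper.

```python
# Sketch of simplification as paraphrasing with controlled decoding.
import torch
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          LogitsProcessor, LogitsProcessorList)

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

class EasyVocabProcessor(LogitsProcessor):
    """Penalise token ids outside an allowed ("simple") vocabulary during decoding."""
    def __init__(self, allowed_ids, penalty=5.0):
        self.allowed = torch.zeros(model.config.vocab_size, dtype=torch.bool)
        self.allowed[list(allowed_ids)] = True
        self.penalty = penalty

    def __call__(self, input_ids, scores):
        scores[:, ~self.allowed] -= self.penalty
        return scores

# Placeholder assumption: treat the 5,000 most frequent subword ids as "easy".
easy_ids = range(5000)
source = "paraphrase: The committee deliberated extensively before reaching a verdict."
inputs = tokenizer(source, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=32,                       # length control for the target audience
    num_beams=4,
    logits_processor=LogitsProcessorList([EasyVocabProcessor(easy_ids)]),
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```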
20 Minuten: A Multi-task News Summarisation Dataset for German
Automatic text summarisation (ATS) is a central task in natural language processing that aims to reduce a long document to a shorter, concise summary that conveys its key points. Extractive approaches to ATS, which identify and copy the most important sentences or phrases from the original text, have long been a popular choice, but the resulting summaries tend to be incoherent and disjointed. More recently, abstractive approaches to ATS have gained popularity thanks to advancements in neural text generation. Yet much of the research on ATS has been limited to English, owing to its status as a high-resource language.
This work introduces a new dataset for German-language news summarisation. Aside from summarisation, the dataset also allows for addressing additional NLP tasks such as image caption generation and reading time prediction. Furthermore, it is multi-purpose, since article summaries cover a range of styles, including headlines, lead paragraphs, and bullet-point summaries. In order to showcase the versatility of our dataset for different NLP tasks, we conduct experiments using mT5 [2] and compare the performance on six different tasks under single- and multi-task fine-tuning conditions, providing baselines for future work. Our findings show that dedicated models consistently perform better according to automatic metrics.
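As a sketch of how several such tasks can be cast as text-to-text for joint mT5 fine-tuning, the snippet below builds training examples with task prefixes; the prefixes and field names are illustrative assumptions, not the dataset's actual schema.

```python
# Sketch only: casting several news-related tasks as text-to-text so a single
# mT5 model can be fine-tuned on all of them jointly.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")

def make_example(article: dict, task: str) -> dict:
    # Hypothetical task prefixes and article fields.
    prefixes = {
        "summary": "fasse zusammen: ",            # bullet-point / lead summary
        "headline": "schreibe eine schlagzeile: ",
        "reading_time": "schaetze die lesezeit: ",
    }
    source = prefixes[task] + article["text"]
    target = str(article[task])                   # e.g. summary text or reading time in minutes
    model_inputs = tokenizer(source, truncation=True, max_length=1024)
    labels = tokenizer(text_target=target, truncation=True, max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```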
A Multilingual Simplified Language News Corpus
Simplified language news articles are being offered by specialized web portals in several countries. The thousands of articles that have been published over the years are a valuable resource for natural language processing, especially for efforts towards automatic text simplification. In this paper, we present SNIML, a large multilingual corpus of news in simplified language. The corpus contains 13k simplified news articles written in one of six languages: Finnish, French, Italian, Swedish, English, and German. All articles are shared under open licenses that permit academic use. The level of text simplification varies depending on the news portal. We believe that even though SNIML is not a parallel corpus, it can usefully complement the more homogeneous but often smaller single-language simplified news corpora currently in use.
Machine Translation between Spoken Languages and Signed Languages Represented in SignWriting
This paper presents work on novel machine translation (MT) systems between spoken and signed languages, where signed languages are represented in SignWriting, a sign language writing system. Our work seeks to address the lack of out-of-the-box support for signed languages in current MT systems and is based on the SignBank dataset, which contains pairs of spoken language text and SignWriting content. We introduce novel methods to parse, factorize, decode, and evaluate SignWriting, leveraging ideas from neural factored MT. In a bilingual setup, translating from American Sign Language to (American) English, our method achieves over 30 BLEU, while in two multilingual setups, translating in both directions between spoken languages and signed languages, we achieve over 20 BLEU. We find that common MT techniques used to improve spoken language translation similarly affect the performance of sign language translation. These findings validate our use of an intermediate text representation for signed languages to include them in natural language processing research.
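To illustrate the factorisation idea in concrete terms, the sketch below splits a simplified Formal SignWriting (FSW) string into parallel factor streams; the factor inventory and the whitespace-separated string format shown here are simplifications for exposition, not the paper's exact scheme.

```python
# Illustrative sketch of factored representation for SignWriting: each symbol
# becomes aligned factor sequences (base symbol, x coordinate, y coordinate)
# instead of one opaque token. The FSW-like format below is simplified.
import re
from typing import Dict, List

def factorize_fsw(fsw: str) -> Dict[str, List[str]]:
    """Split a simplified FSW string like 'S10000 480x490 S20500 510x470'
    into aligned factor sequences that a factored NMT model could consume."""
    factors = {"symbol": [], "x": [], "y": []}
    for symbol, x, y in re.findall(r"(S[0-9a-f]{5})\s+(\d+)x(\d+)", fsw):
        factors["symbol"].append(symbol)
        factors["x"].append(x)
        factors["y"].append(y)
    return factors

print(factorize_fsw("S10000 480x490 S20500 510x470"))
```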
SwissSLi: The Multi-parallel Sign Language Corpus for Switzerland
In this work, we introduce SwissSLi, the first sign language corpus that contains parallel data of all three Swiss sign languages, namely Swiss German Sign Language (DSGS), French Sign Language of Switzerland (LSF-CH), and Italian Sign Language of Switzerland (LIS-CH). The data underlying this corpus originates from television programs in three spoken languages: German, French, and Italian. The programs have for the most part been translated into sign language by deaf translators, resulting in a unique, up to six-way multi-parallel dataset between spoken and sign languages. We describe and release the sign language videos and spoken language subtitles as well as the overall statistics and some derivatives of the raw material. These derived components include cropped videos, pose estimation, phrase/sign-segmented videos, and sentence-segmented subtitles, all of which facilitate downstream tasks such as sign language transcription (glossing) and machine translation. The corpus is publicly available on the SWISSUbase data platform for research purposes only under a CC BY-NC-SA 4.0 license.
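As an example of how a pose-estimation derivative like the one mentioned above might be produced, the sketch below runs MediaPipe Holistic over a video; the corpus description does not name its toolchain, so MediaPipe is used here purely for illustration.

```python
# Hedged sketch: extracting per-frame body and hand landmarks from a sign
# language video with MediaPipe Holistic (an example estimator, not necessarily
# the one used for the corpus).
import cv2
import mediapipe as mp

def extract_poses(video_path: str):
    """Yield (pose, left hand, right hand) landmarks for each frame of the video."""
    holistic = mp.solutions.holistic.Holistic(static_image_mode=False)
    cap = cv2.VideoCapture(video_path)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            yield (results.pose_landmarks,
                   results.left_hand_landmarks,
                   results.right_hand_landmarks)
    finally:
        cap.release()
        holistic.close()
```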
…