5 research outputs found
Data Augmentation Techniques for Machine Translation of Code-Switched Texts: A Comparative Study
Code-switching (CSW) text generation has been receiving increasing attention
as a solution to address data scarcity. In light of this growing interest, we
need more comprehensive studies comparing different augmentation approaches. In
this work, we compare three popular approaches: lexical replacements,
linguistic theories, and back-translation (BT), in the context of Egyptian
Arabic-English CSW. We assess the effectiveness of the approaches on machine
translation and the quality of augmentations through human evaluation. We show
that BT and CSW predictive-based lexical replacement, being trained on CSW
parallel data, perform best on both tasks. Linguistic theories and random
lexical replacement prove effective in the absence of CSW parallel data, where
both approaches achieve similar results.
Comment: Findings of EMNLP 202
ArzEn-ST: A Three-way Speech Translation Corpus for Code-Switched Egyptian Arabic - English
We present our work on collecting ArzEn-ST, a code-switched Egyptian Arabic -
English Speech Translation Corpus. This corpus is an extension of the ArzEn
speech corpus, which was collected through informal interviews with bilingual
speakers. In this work, we collect translations in both directions, monolingual
Egyptian Arabic and monolingual English, forming a three-way speech translation
corpus. We make the translation guidelines and corpus publicly available. We
also report results for baseline systems for machine translation and speech
translation tasks. We believe this is a valuable resource that can motivate and
facilitate further research studying the code-switching phenomenon from a
linguistic perspective and can be used to train and evaluate NLP systems.
Comment: Accepted to the Seventh Arabic Natural Language Processing Workshop
(WANLP 2022)
Collecting Data for Automatic Speech Recognition Systems in Dialectal Arabic using Games With a Purpose
Abstract. Building Automatic Speech Recognition (ASR) systems for spoken languages usually suffers from the limited availability of transcriptions. ASR systems require large speech corpora containing speech and its corresponding transcriptions for training acoustic models. In this paper, we target Egyptian dialectal Arabic. Like other spoken languages, it is mainly used for speaking rather than writing. Transcriptions are usually collected manually by experts; however, this has proved to be a time-consuming and expensive process. In this paper, we introduce Games With a Purpose as a cheap and fast approach to gathering transcriptions for Egyptian dialectal Arabic. Furthermore, Arabic orthographic transcriptions lack diacritization, which leads to ambiguity. On the other hand, transcriptions written in the Arabic Chat Alphabet are widely used and include the pronunciation effects given by diacritics. In this work, we present the game Ma ame o (pronounced as makhamekho), which aims at collecting transcriptions in Arabic orthography as well as in the Arabic Chat Alphabet. It also gathers mappings of words from Arabic orthography to the Arabic Chat Alphabet.
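Games With a Purpose typically validate crowd input through inter-player agreement before accepting it into the corpus. A minimal sketch of such an aggregation step, assuming a simple majority-vote rule (the function name, threshold, and rule are illustrative assumptions, not the paper's mechanism):

```python
from collections import Counter

def aggregate_transcriptions(submissions: list[str], min_agreement: int = 2):
    """Accept a transcription for an audio clip only when enough players agree.

    `submissions` are independent player transcriptions of the same clip;
    the majority answer is accepted once it reaches `min_agreement` votes,
    otherwise the clip stays unlabeled and can be re-served to players.
    """
    if not submissions:
        return None
    answer, votes = Counter(submissions).most_common(1)[0]
    return answer if votes >= min_agreement else None

# Two of three players agree, so the majority transcription is accepted.
print(aggregate_transcriptions(["ezayak", "ezayak", "izayak"]))  # -> ezayak
```

The same agreement check can be applied independently to the Arabic-orthography and Arabic Chat Alphabet transcriptions of a clip, yielding validated pairs for the orthography-to-ACA mapping.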
Investigating Lexical Replacements for Arabic-English Code-Switched Data Augmentation
Code-switching (CS) poses several challenges to NLP tasks, where data
sparsity is a main problem hindering the development of CS NLP systems. In this
paper, we investigate data augmentation techniques for synthesizing Dialectal
Arabic-English CS text. We perform lexical replacements using parallel corpora
and alignments where CS points are either randomly chosen or learnt using a
sequence-to-sequence model. We evaluate the effectiveness of data augmentation
on language modeling (LM), machine translation (MT), and automatic speech
recognition (ASR) tasks. Results show that in the case of using 1-1 alignments,
using trained predictive models produces more natural CS sentences, as
reflected in perplexity. By relying on grow-diag-final alignments, we then
identify aligning segments and perform replacements accordingly. By replacing
segments instead of words, the quality of synthesized data is greatly improved.
With this improvement, the random-based approach outperforms the trained
predictive models on all extrinsic tasks. Our best models achieve 33.6%
improvement in perplexity, +3.2-5.6 BLEU points on the MT task, and 7% relative
improvement in WER on the ASR task. We also contribute to filling the resource
gap by collecting and publishing the first Arabic-English CS-English parallel
corpus.
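The 1-1 alignment replacement step described above can be sketched as follows: given a parallel sentence pair and word alignments, each aligned source word is swapped for its target counterpart with some probability, i.e. CS points are chosen randomly rather than by a predictive model. A minimal illustration, with all names and the probability parameter as assumptions rather than the authors' code:

```python
import random

def random_lexical_replacement(src_tokens, tgt_tokens, alignments, p=0.3, seed=0):
    """Synthesize CSW text from a parallel sentence pair.

    `alignments` is a list of 1-1 (src_index, tgt_index) pairs; each aligned
    source word is replaced by its target-language counterpart with
    probability `p`, so the CS points are chosen at random.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    out = list(src_tokens)
    for s, t in alignments:
        if rng.random() < p:
            out[s] = tgt_tokens[t]
    return out

# Toy example: romanized Arabic source, English translation, word alignments.
src = ["ana", "baheb", "el-barmaga"]
tgt = ["i", "love", "programming"]
align = [(0, 0), (1, 1), (2, 2)]
print(random_lexical_replacement(src, tgt, align, p=0.5, seed=3))
```

The segment-based variant reported in the paper operates the same way but replaces aligned multi-word segments (from grow-diag-final alignments) instead of single words.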
PocketEAR: An Assistive Sound Classification System for the Hearing Impaired
This paper describes the design and operation of an assistive system called PocketEAR, which is primarily targeted at hearing-impaired users. It helps them with orientation in acoustically active environments by continuously monitoring and classifying the incoming sounds and displaying the captured sound classes to the users. The environmental sound recognizer is designed as a two-stage deep convolutional neural network classifier (consisting of a so-called superclassifier and a set of so-called subclassifiers) fed with sequences of MFCC vectors. It is wrapped in a distributed client-server system where sound capturing in the field, (pre)processing, and displaying of the classification results are performed by instances of a mobile client application, while the actual classification and system maintenance are carried out by two co-operating servers. The paper discusses in detail the architecture of the environmental sound classifier as well as the task-specific sound processing techniques used.
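The two-stage design routes each MFCC sequence through a coarse superclassifier and then through the subclassifier specific to the predicted coarse category. A minimal sketch of that control flow only; the toy score functions below stand in for the paper's deep CNNs and are purely illustrative:

```python
# Two-stage routing: a superclassifier picks a coarse sound category, then a
# category-specific subclassifier picks the fine-grained class. The toy
# lambdas stand in for trained CNNs operating on MFCC sequences.

def classify(features, superclassifier, subclassifiers):
    """Return (coarse, fine) labels for one MFCC feature sequence."""
    coarse = superclassifier(features)
    fine = subclassifiers[coarse](features)
    return coarse, fine

# Toy models: decide by simple statistics of the features instead of a CNN.
toy_super = lambda f: "vehicle" if sum(f) > 10 else "voice"
toy_subs = {
    "vehicle": lambda f: "car_horn" if max(f) > 5 else "engine",
    "voice": lambda f: "speech",
}

print(classify([6, 6, 1], toy_super, toy_subs))  # -> ('vehicle', 'car_horn')
```

One appeal of this routing is that each subclassifier only discriminates among the fine classes of its own category, which keeps the individual models small.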