8,063 research outputs found
End-to-End Speech Translation of Arabic to English Broadcast News
Speech translation (ST) is the task of directly translating acoustic speech
signals in a source language into text in a foreign language. ST task has been
addressed, for a long time, using a pipeline approach with two modules : first
an Automatic Speech Recognition (ASR) in the source language followed by a
text-to-text Machine translation (MT). In the past few years, we have seen a
paradigm shift towards the end-to-end approaches using sequence-to-sequence
deep neural network models. This paper presents our efforts towards the
development of the first Broadcast News end-to-end Arabic to English speech
translation system. Starting from independent ASR and MT LDC releases, we were
able to identify about 92 hours of Arabic audio recordings for which the manual
transcription was also translated into English at the segment level. These data
was used to train and compare pipeline and end-to-end speech translation
systems under multiple scenarios including transfer learning and data
augmentation techniques.Comment: Arabic Natural Language Processing Workshop 202
Unconstrained Scene Text and Video Text Recognition for Arabic Script
Building robust recognizers for Arabic has always been challenging. We
demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid
architecture in recognizing Arabic text in videos and natural scenes. We
outperform previous state-of-the-art on two publicly available video text
datasets - ALIF and ACTIV. For the scene text recognition task, we introduce a
new Arabic scene text dataset and establish baseline results. For scripts like
Arabic, a major challenge in developing robust recognizers is the lack of large
quantity of annotated data. We overcome this by synthesising millions of Arabic
text images from a large vocabulary of Arabic words and phrases. Our
implementation is built on top of the model introduced here [37] which is
proven quite effective for English scene text recognition. The model follows a
segmentation-free, sequence to sequence transcription approach. The network
transcribes a sequence of convolutional features from the input image to a
sequence of target labels. This does away with the need for segmenting input
image into constituent characters/glyphs, which is often difficult for Arabic
script. Further, the ability of RNNs to model contextual dependencies yields
superior recognition results.Comment: 5 page
Towards Understanding Egyptian Arabic Dialogues
Labelling of user's utterances to understanding his attends which called
Dialogue Act (DA) classification, it is considered the key player for dialogue
language understanding layer in automatic dialogue systems. In this paper, we
proposed a novel approach to user's utterances labeling for Egyptian
spontaneous dialogues and Instant Messages using Machine Learning (ML) approach
without relying on any special lexicons, cues, or rules. Due to the lack of
Egyptian dialect dialogue corpus, the system evaluated by multi-genre corpus
includes 4725 utterances for three domains, which are collected and annotated
manually from Egyptian call-centers. The system achieves F1 scores of 70. 36%
overall domains.Comment: arXiv admin note: substantial text overlap with arXiv:1505.0308
- …