Text Messaging in English and Arabic with Reference to Translation
In the twenty-first century, many people conduct much of their daily communication through text messaging: in malls, schools, and almost anywhere else, people can be seen using their mobile phones to send character-based messages to friends, classmates, family members, and coworkers. The popularity of this method of communication has grown particularly among young people. First, the technology enables people to communicate with others from virtually anywhere. Second, it allows individuals to communicate quietly, which is useful both in noisy settings such as bars, where holding a productive phone conversation would be difficult, and in settings where extraneous communication must be kept quiet, such as a school. Third, it combines some of the advantages of phone and email communication by allowing people to communicate both synchronously (two-way communication occurring concurrently) and asynchronously (two-way communication that is delayed). This technology has given rise to a new language form in which acronyms, abbreviations, and other shorthand notations are commonplace. This research focuses specifically on these features and how they are used. The aim of the study was to analyze not only how frequently these symbolic expressions are used but also how their use relates to the linguistic functions they signal; the analysis yielded a number of findings.
Machine Translation of Arabic Dialects
This thesis discusses different approaches to machine translation (MT) from Dialectal Arabic (DA) to English. These approaches address Arabic dialects at varying stages of resource availability, in terms of both the types of resources and the amounts of training data available. The overall theme of this work revolves around building dialectal resources and MT systems, or enriching existing ones with the currently available resources (dialectal or standard), in order to scale quickly and cheaply to more dialects without spending years and millions of dollars to create such resources for every dialect.
Unlike for Modern Standard Arabic (MSA), DA-English parallel corpora are scarce and available for only a few dialects. Dialects differ from each other and from MSA in orthography, morphology, phonology, and, to a lesser degree, syntax. This means that combining all available parallel data, from dialects and MSA, to train DA-to-English statistical machine translation (SMT) systems might not provide the desired results. Similarly, translating dialectal sentences with an SMT system trained on that dialect only is also challenging, due to factors that affect the word choices of a sentence relative to those of the SMT training data. Such factors include the level of dialectness (e.g., code switching to MSA versus dialectal training data), topic (sports versus politics), genre (tweets versus newspaper), script (Arabizi versus Arabic), and the time period of the test data relative to the training data. The work we present utilizes any available Arabic resource, such as a preprocessing tool or a parallel corpus, whether MSA or DA, to improve DA-to-English translation and expand to more dialects and sub-dialects.
The majority of Arabic dialects have no parallel data to English or to any other foreign language. They also have no preprocessing tools such as normalizers, morphological analyzers, or tokenizers. For such dialects, we present an MSA-pivoting approach where DA sentences are translated to MSA first, then the MSA output is translated to English using the wealth of MSA-English parallel data. Since there is virtually no DA-MSA parallel data to train an SMT system, we build a rule-based DA-to-MSA MT system, ELISSA, that uses morpho-syntactic translation rules along with dialect identification and language modeling components. We also present a rule-based approach to quickly and cheaply build a dialectal morphological analyzer, ADAM, which provides ELISSA with dialectal word analyses.
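A minimal sketch of the MSA-pivoting idea, assuming a toy romanized rule table and a placeholder MSA-to-English MT function; ELISSA itself applies morpho-syntactic rules together with dialect identification and language modelling rather than simple word substitution.

    # Sketch of MSA-pivoting: DA -> MSA (rule-based) -> English (SMT).
    # The rule table and translate_msa_to_english() are hypothetical placeholders.

    # A few illustrative, romanized Levantine -> MSA lexical rewrites (toy examples).
    DA_TO_MSA_RULES = {
        "biddi": "uriidu",     # "I want"
        "heek": "haakadhaa",   # "like this"
        "shu": "maadhaa",      # "what"
    }

    def da_to_msa(sentence: str) -> str:
        """Rule-based rewriting of a dialectal sentence into MSA, word by word."""
        return " ".join(DA_TO_MSA_RULES.get(tok, tok) for tok in sentence.split())

    def translate_msa_to_english(msa_sentence: str) -> str:
        """Placeholder for an MSA-English MT system trained on large MSA-English data."""
        raise NotImplementedError("plug in an MSA-English MT system here")

    def pivot_translate(da_sentence: str) -> str:
        """DA -> English via the MSA pivot."""
        return translate_msa_to_english(da_to_msa(da_sentence))

    # Demonstrate only the DA -> MSA step, since the English side is a placeholder.
    print(da_to_msa("biddi ktab heek"))   # -> "uriidu ktab haakadhaa"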
Other Arabic dialects have relatively small DA-English parallel corpora, amounting to a few million words on the DA side. Some of these dialects have dialect-dependent preprocessing tools that can be used to prepare the DA data for SMT systems. We present techniques to generate synthetic parallel data from the available DA-English and MSA-English data. We use this synthetic data to build statistical and hybrid versions of ELISSA as well as to improve our rule-based ELISSA-based MSA-pivoting approach. We evaluate our best MSA-pivoting MT pipeline against three direct SMT baselines trained on three parallel corpora: DA-English data only, MSA-English data only, and the combination of DA-English and MSA-English data. Furthermore, we leverage these four MT systems (the three baselines along with our MSA-pivoting system) in two system combination approaches that benefit from their strengths while avoiding their weaknesses.
Finally, we propose an approach to model dialects from monolingual data and limited DA-English parallel data without the need for any language-dependent preprocessing tools. We learn DA preprocessing rules using word embeddings and expectation maximization. We test this approach by building a morphological segmentation system, and we evaluate its performance on MT against the state-of-the-art dialectal tokenization tool.
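As a rough illustration of the word-embedding side of this idea, the sketch below proposes prefix-segmentation rules when stripping a candidate prefix yields a known word with a similar embedding; the vocabulary, vectors, prefixes, and threshold are all invented, and a full system would additionally weight the candidate rules (the thesis uses expectation maximization) rather than a hard cut-off.

    import numpy as np

    # Toy embeddings for romanized words; a real system would train these on
    # monolingual DA text. Values are invented for illustration.
    EMBEDDINGS = {
        "yktb":   np.array([0.9, 0.1, 0.0]),
        "wyktb":  np.array([0.8, 0.2, 0.1]),   # "w" + yktb, distributionally close to "yktb"
        "ktab":   np.array([0.1, 0.9, 0.0]),
        "Alktab": np.array([0.2, 0.8, 0.1]),   # "Al" + ktab, distributionally close to "ktab"
        "walad":  np.array([0.0, 0.1, 0.9]),   # not a prefixed form of anything here
        "alad":   np.array([0.7, 0.3, 0.3]),   # unrelated word that merely looks like a stem
    }
    PREFIXES = ["w", "Al", "b"]                 # candidate clitic prefixes (toy set)

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def propose_rules(threshold=0.9):
        """Propose prefix-segmentation rules when the stripped form is a known,
        distributionally similar word."""
        rules = []
        for word, vec in EMBEDDINGS.items():
            for p in PREFIXES:
                stem = word[len(p):]
                if word.startswith(p) and stem in EMBEDDINGS:
                    if cosine(vec, EMBEDDINGS[stem]) >= threshold:
                        rules.append((word, f"{p}+ {stem}"))
        return rules

    print(propose_rules())
    # -> [('wyktb', 'w+ yktb'), ('Alktab', 'Al+ ktab')]; 'walad' is left unsegmented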
Phonological adaptation of English loanwords into Qassimi Arabic: an optimality-theoretic account
PhD thesis. Within the field of loanword phonology, this study enhances our understanding of the role played by the contrastive features of the borrowing language in shaping the segmental adaptation patterns of loanwords from the source language. This has been achieved by performing a theoretical analysis of the segmental adaptation patterns of English loanwords into Qassimi Arabic (QA), a dialect spoken in the region of Qassim in central Saudi Arabia, using an Optimality-Theoretic (OT) framework. The central argument of this study is that the inputs to QA are fully specified English outputs. The native grammar of QA then allows only those phonological features of the inputs that are contrastive in QA to surface; redundant or non-contrastive features are eliminated from the outputs. The evidence that the contrastive features of QA segments play a main role in the adaptation process emerges from the adaptation of English segments that are non-native in QA. For instance, the English lax vowels /ɪ/, /ʊ/, /æ/ are adapted as their tense counterparts [i], [u], and [a] in QA. I argue that the reason for this adaptation lies in the fact that [ATR] is not a contrastive feature within the QA vowel inventory; dispensing with the value of the input feature [-ATR] therefore culminates in tense vowels appearing at the surface level.
To identify the contrastive features of the QA phonological inventory, I rely on the Contrastive Hierarchy Theory proposed by Dresher (2009). This theory holds that phonological features should be ordered hierarchically in order to obtain only the contrastive features of any phonological inventory; this is achieved by dividing the inventory into subsets of features until each segment is distinguished contrastively from all others. The features of QA segments are therefore first organised into a contrastive hierarchy model, within which features are created and ordered according to one or more of the following motivations: Activity, Minimality, and Universality. Finally, the contrastive hierarchy of the QA segment inventory is converted into OT constraints, and the ranking of these constraints is sufficient to account for the evaluation of the segmental adaptation patterns of loanwords from English into QA. For instance, based on the contrastive hierarchy of QA, /b/ is contrastively specified as [-sonorant, +labial, -continuant]. In the adaptation of English consonants, the English input segment /p/ is mapped consistently to [b] in QA. In this case, the contrastive hierarchy of the QA consonant inventory contains the co-occurrence constraints *[αVoice, +labial] and *[αCoronal, +labial], which filter the input features if the input is fully specified as [-sonorant, +labial, -coronal, -continuant, -voiced, …] and permit only the contrastive features [-sonorant, +labial, -continuant] to surface.
Qassim University in Saudi Arabia
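The filtering logic described above can be pictured with a small sketch in which a fully specified English input is stripped down to the features that are contrastive in QA and then matched against QA segments; the feature bundles are simplified from the abstract and are illustrative only.

    # Sketch of contrastive-feature filtering: only features that are contrastive
    # in QA survive from the fully specified English input (simplified illustration).

    # Contrastive specifications of two QA labial obstruents; per the analysis above,
    # /b/ is contrastively [-sonorant, +labial, -continuant] (/f/ is shown for contrast).
    QA_CONTRASTIVE = {
        "b": {"sonorant": "-", "labial": "+", "continuant": "-"},
        "f": {"sonorant": "-", "labial": "+", "continuant": "+"},
    }

    # Features treated as contrastive for QA labial obstruents in this toy model;
    # [voice] and [coronal] are filtered out, as with *[aVoice, +labial] and *[aCoronal, +labial].
    CONTRASTIVE_FEATURES = {"sonorant", "labial", "continuant"}

    def adapt(english_input: dict) -> str:
        """Keep only the contrastive features, then return the matching QA segment."""
        filtered = {f: v for f, v in english_input.items() if f in CONTRASTIVE_FEATURES}
        for segment, spec in QA_CONTRASTIVE.items():
            if spec == filtered:
                return segment
        raise ValueError("no QA segment matches the filtered specification")

    # Fully specified English /p/: the redundant [voice] and [coronal] values are
    # discarded, so /p/ surfaces as QA [b].
    english_p = {"sonorant": "-", "labial": "+", "coronal": "-", "continuant": "-", "voice": "-"}
    print(adapt(english_p))   # -> b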
¿Qué es la fonología computacional?
Computational phonology is not one thing. Rather, it is an umbrella term which may refer to work on formal language theory, computer-implemented models of cognitive processes, and corpus methods derived from the literature on natural language processing (NLP). This article gives an overview of these distinct areas, identifying commonalities and differences in the goals of each area, as well as highlighting recent results of interest. The overview is necessarily brief and subjective. Broadly speaking, it is argued that learning is a pervasive theme in these areas, but the core questions and concerns vary too much to define a coherent field. Computational phonologists are more united by a shared body of formal knowledge than they are by a shared sense of what the important questions are.
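As a toy illustration of the formal-language-theory strand mentioned above, the sketch below implements a two-state finite-state acceptor for an invented phonotactic constraint (no two adjacent vowels); the constraint and alphabet are not drawn from the article.

    # Toy finite-state acceptor for a made-up phonotactic constraint:
    # a word is well-formed iff it never contains two vowels in a row.
    VOWELS = set("aeiou")

    def accepts(word: str) -> bool:
        state = "after_consonant"            # two states: after_consonant, after_vowel
        for ch in word:
            if ch in VOWELS:
                if state == "after_vowel":   # second vowel in a row: reject
                    return False
                state = "after_vowel"
            else:
                state = "after_consonant"
        return True                          # both states are accepting

    print(accepts("kitab"))    # True
    print(accepts("kiaab"))    # False ("ia" and "aa" are adjacent vowel sequences)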
The Problems of Translating Medical Terms from English into Arabic
Abstract
This study tackles the problems of translating medical terms from English into Arabic. It uses an evaluative approach to investigate and discuss the problems and intricacies involved, and its purpose is to present the difficulties of translating medical terms and how they were handled by postgraduate students who are competent in medical translation and by professional Arabic translators who work in the medical field. The study adopts a qualitative-quantitative approach and focuses on different types of medical terms, excluding pharmacy-related terms. In order to identify the real difficulties behind translating medical terms and how they might be approached by experienced translators, the researcher administered a questionnaire test containing a set of English medical terms to be translated into Arabic by students doing a PhD in translation; the same questionnaire was also given to a group of professional Arabic translators. As medical terms are the key components of medical texts, the questionnaire included forty-five diversified English medical terms taken from different medical documents, namely National Health Service (NHS) leaflets and flyers and World Health Organization (WHO) reports for 2007 and 2008. The official Arabic translations of these documents were used to assess the subjects' translations, in comparison and contrast with some medical dictionaries and reliable medical websites. The population of the study comprised 54 postgraduate students (doing PhDs in Arabic translation) at Libyan (the researcher's country of origin) and UK universities and 12 Arabic translators working in UK hospitals and clinics.
The results of the data analysis showed that the translation of the medical terms posed real difficulties and challenges for the students and the inexperienced professional translators, although the experienced professional translators found them comparatively straightforward. The results thus highlight the problems of translating medical terms from English into Arabic and the importance of training for work in the medical field as a translator. The study also concluded that literal translation, heavy use of transliteration, inconsistency, the students' lack of sufficient experience and practice in medical translation, and the lack of up-to-date English-Arabic medical dictionaries are factors that have given rise to problems in medical translation. Finally, the study showed that almost no professional translators use CAT tools or MT to help them translate medical terms.
Knowledge Modelling and Learning through Cognitive Networks
One of the most promising developments in modelling knowledge is cognitive network science, which aims to investigate cognitive phenomena driven by the networked, associative organization of knowledge. For example, investigating the structure of semantic memory via semantic networks has illuminated how memory recall patterns influence phenomena such as creativity, memory search, learning, and more generally, knowledge acquisition, exploration, and exploitation. In parallel, neural network models for artificial intelligence (AI) are also becoming more widespread as inferential models for understanding which features drive language-related phenomena such as meaning reconstruction, stance detection, and emotional profiling. Whereas cognitive networks map explicitly which entities engage in associative relationships, neural networks perform an implicit mapping of correlations in cognitive data as weights that are obtained after training over labelled data and whose interpretation is not immediately evident to the experimenter. This book aims to bring together quantitative, innovative research that focuses on modelling knowledge through cognitive and neural networks to gain insight into mechanisms driving cognitive processes related to knowledge structuring, exploration, and learning. The book comprises a variety of publication types, including reviews and theoretical papers, empirical research, computational modelling, and big data analysis. All papers here share a commonality: they demonstrate how the application of network science and AI can extend and broaden cognitive science in ways that traditional approaches cannot.
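A minimal sketch of the kind of associative network the book describes, using networkx; the word associations below are invented for illustration, and the structural measures stand in for the richer analyses discussed in the volume.

    import networkx as nx

    # Toy semantic network: nodes are concepts, edges are (invented) free associations.
    edges = [
        ("doctor", "nurse"), ("doctor", "hospital"), ("nurse", "hospital"),
        ("hospital", "ambulance"), ("ambulance", "siren"), ("siren", "noise"),
    ]
    G = nx.Graph(edges)

    # Simple structural probes of the kind used to relate network position to recall.
    print(nx.degree_centrality(G))                   # how connected each concept is
    print(nx.shortest_path(G, "doctor", "noise"))    # associative chain between two concepts
    print(nx.average_clustering(G))                  # local clustering of the network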
Multi-dialect Arabic broadcast speech recognition
Dialectal Arabic speech research suffers from the lack of labelled resources and standardised orthography. There are three main challenges in dialectal Arabic speech recognition: (i) finding labelled dialectal Arabic speech data, (ii) training robust dialectal speech recognition models from limited labelled data, and (iii) evaluating speech recognition for dialects with no orthographic rules. This thesis is concerned with the following three contributions:
Arabic Dialect Identification: We deal mainly with Arabic speech without prior knowledge of the spoken dialect. Arabic dialects can be sufficiently diverse that one can argue they are different languages rather than dialects of the same language. We make two contributions here. First, we use crowdsourcing to annotate a multi-dialectal speech corpus collected from the Al Jazeera TV channel: from almost 1,000 hours of broadcast, we obtained utterance-level dialect labels for 57 hours of high-quality speech covering four major varieties of dialectal Arabic (DA), namely Egyptian, Levantine, Gulf (Arabian Peninsula), and North African (Moroccan). Second, we build an Arabic dialect identification (ADI) system. We explore two main groups of features, namely acoustic features and linguistic features. For the linguistic features, we look at a wide range of features addressing words, characters, and phonemes. With respect to acoustic features, we look at raw features such as mel-frequency cepstral coefficients combined with shifted delta cepstra (MFCC-SDC), bottleneck features, and the i-vector as a latent variable. We study both generative and discriminative classifiers, in addition to deep learning approaches, namely the deep neural network (DNN) and the convolutional neural network (CNN). In our work, we propose Arabic dialect identification as a five-class challenge comprising the four dialects mentioned above as well as Modern Standard Arabic.
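A rough sketch of one acoustic pipeline of the kind listed above: frame-level MFCCs with delta features averaged into a fixed-length utterance vector and fed to a discriminative classifier. This greatly simplifies the MFCC-SDC, bottleneck, and i-vector systems described in the thesis, and the file paths and labels are placeholders.

    import numpy as np
    import librosa
    from sklearn.svm import SVC

    def utterance_vector(wav_path: str) -> np.ndarray:
        """MFCCs plus first and second deltas, averaged over time into one
        fixed-length vector (a crude stand-in for MFCC-SDC or i-vector features)."""
        y, sr = librosa.load(wav_path, sr=16000)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        feats = np.vstack([mfcc, librosa.feature.delta(mfcc), librosa.feature.delta(mfcc, order=2)])
        return feats.mean(axis=1)

    # Placeholder training data: (path, dialect) pairs for the five-class task
    # (Egyptian, Levantine, Gulf, North African, MSA); the paths are hypothetical.
    train = [("egy_001.wav", "EGY"), ("lev_001.wav", "LAV"), ("glf_001.wav", "GLF"),
             ("nor_001.wav", "NOR"), ("msa_001.wav", "MSA")]

    X = np.stack([utterance_vector(p) for p, _ in train])
    y = [label for _, label in train]

    clf = SVC(kernel="linear").fit(X, y)                 # discriminative classifier
    print(clf.predict([utterance_vector("test_utt.wav")]))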
Arabic Speech Recognition: We describe our effort in building Arabic automatic speech recognition (ASR) and in creating an open research community to advance it. This part has two main goals. First, creating a framework for Arabic ASR that is publicly available for research: we describe our effort in building two multi-genre broadcast (MGB) challenges. MGB-2 focuses on broadcast news, using more than 1,200 hours of speech and 130M words of text collected from the broadcast domain. MGB-3, in contrast, focuses on dialectal multi-genre data with limited non-orthographic speech collected from YouTube, with special attention paid to transfer learning. Second, building a robust Arabic ASR system and reporting a competitive word error rate (WER) so that it can serve as a benchmark to advance the state of the art in Arabic ASR. Our overall system is a combination of five acoustic models (AMs): unidirectional long short-term memory (LSTM), bidirectional LSTM (BLSTM), time-delay neural network (TDNN), TDNN layers along with LSTM layers (TDNN-LSTM), and TDNN layers followed by BLSTM layers (TDNN-BLSTM). The AMs are trained with the purely sequence-based lattice-free maximum mutual information (LF-MMI) objective. The generated lattices are rescored using a four-gram language model (LM) and a recurrent neural network with maximum entropy (RNNME) LM. Our official WER is 13%, the lowest WER reported on this task.
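The rescoring step can be pictured with a much simpler n-best sketch: each hypothesis combines its acoustic score with an interpolation of two language-model scores (standing in for the 4-gram and RNNME LMs), and the best-scoring hypothesis is kept. The hypotheses, log scores, and weights below are made up for illustration.

    # Toy n-best rescoring: combine the acoustic score with an interpolation of two
    # LM scores. The real system rescores full lattices; the numbers here are invented.

    def rescore(nbest, lm_weight=0.8, interp=0.5):
        """nbest: list of (text, acoustic_logprob, ngram_lm_logprob, rnn_lm_logprob)."""
        def combined(h):
            text, am, ngram, rnn = h
            lm = interp * ngram + (1.0 - interp) * rnn   # interpolate the two LM scores
            return am + lm_weight * lm
        return max(nbest, key=combined)[0]

    nbest = [
        ("hypothesis with a recognition error", -120.4, -35.2, -33.9),
        ("hypothesis preferred by the language models", -121.0, -30.1, -29.5),
    ]
    print(rescore(nbest))   # the second hypothesis wins after rescoring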
Evaluation: The third part of the thesis addresses our effort in evaluating dialectal speech with no orthographic rules. Our methods learn from multiple transcribers and align the speech hypothesis to overcome the non-orthographic aspects. Our multi-reference WER (MR-WER) approach is similar to the BLEU score used in machine translation (MT). We have also automated this process by learning different spelling variants from Twitter data: we mine a huge collection of tweets in an unsupervised fashion to build more than 11M n-to-m lexical pairs, and we propose a new evaluation metric, dialectal WER (WERd). Finally, we estimate the word error rate (e-WER) with no reference transcription, using decoding and language features, and we show that our word error rate estimation is robust in many scenarios with and without the decoding features.
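A small sketch of the dialectal WER idea: a standard word-level edit distance in which reference and hypothesis words related by a lexicon of spelling variants count as matches. The variant pairs below are invented one-to-one placeholders, whereas the thesis mines over 11M n-to-m pairs from Twitter.

    # Dialectal WER sketch: like WER, but words related by a spelling-variant lexicon
    # count as correct. The romanized variant pairs here are invented placeholders.
    VARIANTS = {("ktyr", "kteer"), ("hyk", "heek")}

    def match(ref_w, hyp_w):
        return ref_w == hyp_w or (ref_w, hyp_w) in VARIANTS or (hyp_w, ref_w) in VARIANTS

    def werd(ref, hyp):
        r, h = ref.split(), hyp.split()
        # Standard Levenshtein alignment over words, with variant-aware substitution cost.
        d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
        for i in range(len(r) + 1):
            d[i][0] = i
        for j in range(len(h) + 1):
            d[0][j] = j
        for i in range(1, len(r) + 1):
            for j in range(1, len(h) + 1):
                sub = 0 if match(r[i - 1], h[j - 1]) else 1
                d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + sub)
        return d[len(r)][len(h)] / max(len(r), 1)

    print(werd("hyk ktyr mnih", "heek kteer mnih"))   # 0.0: variants count as matches
    print(werd("hyk ktyr mnih", "heek kter mnih"))    # 0.33: one substitution in three words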
Elements, Government, and Licensing: Developments in phonology
Elements, Government, and Licensing brings together new theoretical and empirical developments in phonology. It covers three principal domains of phonological representation: melody and segmental structure; tone, prosody and prosodic structure; and phonological relations, empty categories, and vowel-zero alternations. Theoretical topics covered include the formalisation of Element Theory, the hotly debated topic of structural recursion in phonology, and the empirical status of government.
In addition, a wealth of new analyses and empirical evidence sheds new light on empty categories in phonology, the analysis of certain consonantal sequences, phonological and non-phonological alternation, the elemental composition of segments, and many other topics. Taking up long-standing empirical and theoretical issues informed by Government Phonology and Element Theory, this book provides theoretical advances while also bringing to light new empirical evidence and analyses challenging previous generalisations.
The insights offered here will be equally exciting for phonologists working on related issues inside and outside the Principles & Parameters programme, such as researchers working in Optimality Theory or classical rule-based phonology.
Learning About And Becoming Aware Of Reading Strategies And Metacognition In English By Adult Second Language Learners
The purpose of the present study was to investigate adult English language learners' need to learn and/or become aware of reading strategies and metacognitive strategies while making sense of English texts. A mixed-methods grounded theory (MMGT) approach in a sequential design (quantitative followed by qualitative), with the qualitative strand dominant, was employed to collect and analyze data. In the quantitative phase, data were collected by administering a background questionnaire and the Survey of Reading Strategies (SORS); these data were analyzed statistically using descriptive analysis and one-way analysis of variance. In the qualitative phase, data were collected through retrospective miscue analysis (RMA) and semi-structured interviews and were coded in order to identify reading patterns among participants. The quantitative results demonstrated that second language learners from different language backgrounds and English proficiency levels perceived the use of reading strategies differently. The qualitative results demonstrated that Saudi Arabian second language learners tend to transfer their reading strategy of relying on small grain-size units when reading in English. These results bring a new perspective to the second language reading field by demonstrating that second language learners from different language backgrounds apply reading strategies differently based on their initial reading development in their first languages.