Text Messaging in English and Arabic with Reference to Translation
In the twenty-first century, many people conduct much of their daily communication through text messaging: in malls, schools, and almost anywhere else, people can be seen using their mobile phones to send character-based messages to friends, classmates, family members, and coworkers. The popularity of this method of communication has grown particularly among young people. First, the technology enables people to communicate with others from virtually anywhere. Second, it allows individuals to communicate quietly, which is useful both in noisy settings such as bars, where holding a productive phone conversation would be difficult, and in settings where extraneous communication must be kept quiet, such as a school. Third, it combines some of the advantages of phone and email communication by allowing people to communicate both synchronously (two-way communication occurring concurrently) and asynchronously (two-way communication that is delayed). This technology has given rise to a new language form in which acronyms, abbreviations, and other shorthand notations are commonplace. This research focuses specifically on these features and how they are used. The aim of the study was to analyze not only how frequently these symbolic expressions are used but also how their use relates to the linguistic functions they signal; the analysis yielded a number of findings.
Machine Translation of Arabic Dialects
This thesis discusses different approaches to machine translation (MT) from Dialectal Arabic (DA) to English. These approaches address Arabic dialects at varying stages of resource availability, in terms of both the types of resources and the amounts of training data available. The overall theme of this work revolves around building dialectal resources and MT systems, or enriching existing ones with the currently available resources (dialectal or standard), in order to scale quickly and cheaply to more dialects without spending years and millions of dollars to create such resources for every dialect.
Unlike for Modern Standard Arabic (MSA), DA-English parallel corpora are scarce and available for only a few dialects. Dialects differ from each other and from MSA in orthography, morphology, phonology, and, to a lesser degree, syntax. This means that combining all available parallel data, from dialects and MSA, to train DA-to-English statistical machine translation (SMT) systems might not provide the desired results. Similarly, translating dialectal sentences with an SMT system trained on that dialect only is also challenging, due to factors that affect the word choices of a sentence relative to those of the SMT training data. Such factors include the level of dialectness (e.g., code switching to MSA versus dialectal training data), topic (sports versus politics), genre (tweets versus newspaper), script (Arabizi versus Arabic), and the time period of the test data relative to the training data. The work we present utilizes any available Arabic resource, such as a preprocessing tool or a parallel corpus, whether MSA or DA, to improve DA-to-English translation and expand to more dialects and sub-dialects.
The majority of Arabic dialects have no parallel data to English or to any other foreign language. They also have no preprocessing tools such as normalizers, morphological analyzers, or tokenizers. For such dialects, we present an MSA-pivoting approach where DA sentences are translated to MSA first, then the MSA output is translated to English using the wealth of MSA-English parallel data. Since there is virtually no DA-MSA parallel data to train an SMT system, we build a rule-based DA-to-MSA MT system, ELISSA, that uses morpho-syntactic translation rules along with dialect identification and language modeling components. We also present a rule-based approach to quickly and cheaply build a dialectal morphological analyzer, ADAM, which provides ELISSA with dialectal word analyses.
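A minimal sketch of the MSA-pivoting idea, assuming a toy romanized rule table and a placeholder MSA-to-English MT function; ELISSA itself applies morpho-syntactic rules together with dialect identification and language modelling rather than simple word substitution.

    # Sketch of MSA-pivoting: DA -> MSA (rule-based) -> English (SMT).
    # The rule table and translate_msa_to_english() are hypothetical placeholders.

    # A few illustrative, romanized Levantine -> MSA lexical rewrites (toy examples).
    DA_TO_MSA_RULES = {
        "biddi": "uriidu",     # "I want"
        "heek": "haakadhaa",   # "like this"
        "shu": "maadhaa",      # "what"
    }

    def da_to_msa(sentence: str) -> str:
        """Rule-based rewriting of a dialectal sentence into MSA, word by word."""
        return " ".join(DA_TO_MSA_RULES.get(tok, tok) for tok in sentence.split())

    def translate_msa_to_english(msa_sentence: str) -> str:
        """Placeholder for an MSA-English MT system trained on large MSA-English data."""
        raise NotImplementedError("plug in an MSA-English MT system here")

    def pivot_translate(da_sentence: str) -> str:
        """DA -> English via the MSA pivot."""
        return translate_msa_to_english(da_to_msa(da_sentence))

    # Demonstrate only the DA -> MSA step, since the English side is a placeholder.
    print(da_to_msa("biddi ktab heek"))   # -> "uriidu ktab haakadhaa"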
Other Arabic dialects have relatively small DA-English parallel corpora, amounting to a few million words on the DA side. Some of these dialects have dialect-dependent preprocessing tools that can be used to prepare the DA data for SMT systems. We present techniques to generate synthetic parallel data from the available DA-English and MSA-English data. We use this synthetic data to build statistical and hybrid versions of ELISSA as well as to improve our rule-based ELISSA-based MSA-pivoting approach. We evaluate our best MSA-pivoting MT pipeline against three direct SMT baselines trained on three parallel corpora: DA-English data only, MSA-English data only, and the combination of DA-English and MSA-English data. Furthermore, we leverage these four MT systems (the three baselines along with our MSA-pivoting system) in two system combination approaches that benefit from their strengths while avoiding their weaknesses.
Finally, we propose an approach to model dialects from monolingual data and limited DA-English parallel data without the need for any language-dependent preprocessing tools. We learn DA preprocessing rules using word embeddings and expectation maximization. We test this approach by building a morphological segmentation system, and we evaluate its performance on MT against the state-of-the-art dialectal tokenization tool.
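As a rough illustration of the word-embedding side of this idea, the sketch below proposes prefix-segmentation rules when stripping a candidate prefix yields a known word with a similar embedding; the vocabulary, vectors, prefixes, and threshold are all invented, and a full system would additionally weight the candidate rules (the thesis uses expectation maximization) rather than a hard cut-off.

    import numpy as np

    # Toy embeddings for romanized words; a real system would train these on
    # monolingual DA text. Values are invented for illustration.
    EMBEDDINGS = {
        "yktb":   np.array([0.9, 0.1, 0.0]),
        "wyktb":  np.array([0.8, 0.2, 0.1]),   # "w" + yktb, distributionally close to "yktb"
        "ktab":   np.array([0.1, 0.9, 0.0]),
        "Alktab": np.array([0.2, 0.8, 0.1]),   # "Al" + ktab, distributionally close to "ktab"
        "walad":  np.array([0.0, 0.1, 0.9]),   # not a prefixed form of anything here
        "alad":   np.array([0.7, 0.3, 0.3]),   # unrelated word that merely looks like a stem
    }
    PREFIXES = ["w", "Al", "b"]                 # candidate clitic prefixes (toy set)

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def propose_rules(threshold=0.9):
        """Propose prefix-segmentation rules when the stripped form is a known,
        distributionally similar word."""
        rules = []
        for word, vec in EMBEDDINGS.items():
            for p in PREFIXES:
                stem = word[len(p):]
                if word.startswith(p) and stem in EMBEDDINGS:
                    if cosine(vec, EMBEDDINGS[stem]) >= threshold:
                        rules.append((word, f"{p}+ {stem}"))
        return rules

    print(propose_rules())
    # -> [('wyktb', 'w+ yktb'), ('Alktab', 'Al+ ktab')]; 'walad' is left unsegmented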
Phonological adaptation of English loanwords into Qassimi Arabic: an optimality-theoretic account
PhD thesis. Within the field of loanword phonology, this study enhances our understanding of the role played by the contrastive features of the borrowing language in shaping the segmental adaptation patterns of loanwords from the source language. This has been achieved by performing a theoretical analysis of the segmental adaptation patterns of English loanwords into Qassimi Arabic (QA), a dialect spoken in the region of Qassim in central Saudi Arabia, using an Optimality-Theoretic (OT) framework. The central argument of this study is that the inputs to QA are fully specified English outputs. The native grammar of QA then allows only those phonological features of the inputs that are contrastive in QA to surface; redundant or non-contrastive features are eliminated from the outputs. The evidence that the contrastive features of QA segments play a main role in the adaptation process emerges from the adaptation of English segments that are non-native in QA. For instance, the English lax vowels /ɪ/, /ʊ/, /æ/ are adapted as their tense counterparts [i], [u], and [a] in QA. I argue that the reason for this adaptation lies in the fact that [ATR] is not a contrastive feature within the QA vowel inventory; dispensing with the value of the input feature [-ATR] therefore culminates in tense vowels appearing at the surface level.
To identify the contrastive features of the QA phonological inventory, I rely on the Contrastive Hierarchy Theory proposed by Dresher (2009). This theory holds that phonological features should be ordered hierarchically in order to obtain only the contrastive features of any phonological inventory; this is achieved by dividing the inventory into subsets of features until each segment is distinguished contrastively from all others. The features of QA segments are therefore first organised into a contrastive hierarchy model, within which features are created and ordered according to one or more of the following motivations: Activity, Minimality, and Universality. Finally, the contrastive hierarchy of the QA segment inventory is converted into OT constraints, and the ranking of these constraints is sufficient to account for the evaluation of the segmental adaptation patterns of loanwords from English into QA. For instance, based on the contrastive hierarchy of QA, /b/ is contrastively specified as [-sonorant, +labial, -continuant]. In the adaptation of English consonants, the English input segment /p/ is mapped consistently to [b] in QA. In this case, the contrastive hierarchy of the QA consonant inventory contains the co-occurrence constraints *[αVoice, +labial] and *[αCoronal, +labial], which filter the input features if the input is fully specified as [-sonorant, +labial, -coronal, -continuant, -voiced, …] and permit only the contrastive features [-sonorant, +labial, -continuant] to surface.
Qassim University in Saudi Arabia
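The filtering logic described above can be pictured with a small sketch in which a fully specified English input is stripped down to the features that are contrastive in QA and then matched against QA segments; the feature bundles are simplified from the abstract and are illustrative only.

    # Sketch of contrastive-feature filtering: only features that are contrastive
    # in QA survive from the fully specified English input (simplified illustration).

    # Contrastive specifications of two QA labial obstruents; per the analysis above,
    # /b/ is contrastively [-sonorant, +labial, -continuant] (/f/ is shown for contrast).
    QA_CONTRASTIVE = {
        "b": {"sonorant": "-", "labial": "+", "continuant": "-"},
        "f": {"sonorant": "-", "labial": "+", "continuant": "+"},
    }

    # Features treated as contrastive for QA labial obstruents in this toy model;
    # [voice] and [coronal] are filtered out, as with *[aVoice, +labial] and *[aCoronal, +labial].
    CONTRASTIVE_FEATURES = {"sonorant", "labial", "continuant"}

    def adapt(english_input: dict) -> str:
        """Keep only the contrastive features, then return the matching QA segment."""
        filtered = {f: v for f, v in english_input.items() if f in CONTRASTIVE_FEATURES}
        for segment, spec in QA_CONTRASTIVE.items():
            if spec == filtered:
                return segment
        raise ValueError("no QA segment matches the filtered specification")

    # Fully specified English /p/: the redundant [voice] and [coronal] values are
    # discarded, so /p/ surfaces as QA [b].
    english_p = {"sonorant": "-", "labial": "+", "coronal": "-", "continuant": "-", "voice": "-"}
    print(adapt(english_p))   # -> b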
¿Qué es la fonología computacional?
Computational phonology is not one thing. Rather, it is an umbrella term which may refer to work on formal language theory, computer-implemented models of cognitive processes, and corpus methods derived from the literature on natural language processing (NLP). This article gives an overview of these distinct areas, identifying commonalities and differences in the goals of each area, as well as highlighting recent results of interest. The overview is necessarily brief and subjective. Broadly speaking, it is argued that learning is a pervasive theme in these areas, but the core questions and concerns vary too much to define a coherent field. Computational phonologists are more united by a shared body of formal knowledge than they are by a shared sense of what the important questions are.
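As a toy illustration of the formal-language-theory strand mentioned above, the sketch below implements a two-state finite-state acceptor for an invented phonotactic constraint (no two adjacent vowels); the constraint and alphabet are not drawn from the article.

    # Toy finite-state acceptor for a made-up phonotactic constraint:
    # a word is well-formed iff it never contains two vowels in a row.
    VOWELS = set("aeiou")

    def accepts(word: str) -> bool:
        state = "after_consonant"            # two states: after_consonant, after_vowel
        for ch in word:
            if ch in VOWELS:
                if state == "after_vowel":   # second vowel in a row: reject
                    return False
                state = "after_vowel"
            else:
                state = "after_consonant"
        return True                          # both states are accepting

    print(accepts("kitab"))    # True
    print(accepts("kiaab"))    # False ("ia" and "aa" are adjacent vowel sequences)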
The Problems of Translating Medical Terms from English into Arabic
Abstract
This study tackles the problems of translating medical terms from English into Arabic. It uses an evaluative approach to investigate and discuss the problems and intricacies involved, and its purpose is to present the difficulties of translating medical terms and how they were handled by postgraduate students who are competent in medical translation and by professional Arabic translators who work in the medical field. The study adopts a qualitative-quantitative approach and focuses on different types of medical terms, excluding pharmacy-related terms. In order to identify the real difficulties behind translating medical terms and how they might be approached by experienced translators, the researcher administered a questionnaire test containing a set of English medical terms to be translated into Arabic by students doing a PhD in translation; the same questionnaire was also given to a group of professional Arabic translators. As medical terms are the key components of medical texts, the questionnaire included forty-five diversified English medical terms taken from different medical documents, namely National Health Service (NHS) leaflets and flyers and World Health Organization (WHO) reports for 2007 and 2008. The official Arabic translations of these documents were used to assess the subjects' translations, in comparison and contrast with some medical dictionaries and reliable medical websites. The population of the study comprised 54 postgraduate students (doing PhDs in Arabic translation) at Libyan (the researcher's country of origin) and UK universities and 12 Arabic translators working in UK hospitals and clinics.
The results of the data analysis showed that the translation of the medical terms posed real difficulties and challenges for the students and the inexperienced professional translators, although the experienced professional translators found them comparatively straightforward. The results thus highlight the problems of translating medical terms from English into Arabic and the importance of training for work in the medical field as a translator. The study also concluded that literal translation, heavy use of transliteration, inconsistency, the students' lack of sufficient experience and practice in medical translation, and the lack of up-to-date English-Arabic medical dictionaries are factors that have given rise to problems in medical translation. Finally, the study showed that almost no professional translators use CAT tools or MT to help them translate medical terms.
Knowledge Modelling and Learning through Cognitive Networks
One of the most promising developments in modelling knowledge is cognitive network science, which aims to investigate cognitive phenomena driven by the networked, associative organization of knowledge. For example, investigating the structure of semantic memory via semantic networks has illuminated how memory recall patterns influence phenomena such as creativity, memory search, learning, and more generally, knowledge acquisition, exploration, and exploitation. In parallel, neural network models for artificial intelligence (AI) are also becoming more widespread as inferential models for understanding which features drive language-related phenomena such as meaning reconstruction, stance detection, and emotional profiling. Whereas cognitive networks map explicitly which entities engage in associative relationships, neural networks perform an implicit mapping of correlations in cognitive data as weights that are obtained after training over labelled data and whose interpretation is not immediately evident to the experimenter. This book aims to bring together quantitative, innovative research that focuses on modelling knowledge through cognitive and neural networks to gain insight into mechanisms driving cognitive processes related to knowledge structuring, exploration, and learning. The book comprises a variety of publication types, including reviews and theoretical papers, empirical research, computational modelling, and big data analysis. All papers here share a commonality: they demonstrate how the application of network science and AI can extend and broaden cognitive science in ways that traditional approaches cannot.
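A minimal sketch of the kind of associative network the book describes, using networkx; the word associations below are invented for illustration, and the structural measures stand in for the richer analyses discussed in the volume.

    import networkx as nx

    # Toy semantic network: nodes are concepts, edges are (invented) free associations.
    edges = [
        ("doctor", "nurse"), ("doctor", "hospital"), ("nurse", "hospital"),
        ("hospital", "ambulance"), ("ambulance", "siren"), ("siren", "noise"),
    ]
    G = nx.Graph(edges)

    # Simple structural probes of the kind used to relate network position to recall.
    print(nx.degree_centrality(G))                   # how connected each concept is
    print(nx.shortest_path(G, "doctor", "noise"))    # associative chain between two concepts
    print(nx.average_clustering(G))                  # local clustering of the network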
Multi-dialect Arabic broadcast speech recognition
Dialectal Arabic speech research suffers from the lack of labelled resources and standardised orthography. There are three main challenges in dialectal Arabic speech recognition: (i) finding labelled dialectal Arabic speech data, (ii) training robust dialectal speech recognition models from limited labelled data, and (iii) evaluating speech recognition for dialects with no orthographic rules. This thesis is concerned with the following three contributions:
Arabic Dialect Identification: We deal mainly with Arabic speech without prior knowledge of the spoken dialect. Arabic dialects can be sufficiently diverse that one can argue they are different languages rather than dialects of the same language. We make two contributions here. First, we use crowdsourcing to annotate a multi-dialectal speech corpus collected from the Al Jazeera TV channel: from almost 1,000 hours of broadcast, we obtained utterance-level dialect labels for 57 hours of high-quality speech covering four major varieties of dialectal Arabic (DA), namely Egyptian, Levantine, Gulf (Arabian Peninsula), and North African (Moroccan). Second, we build an Arabic dialect identification (ADI) system. We explore two main groups of features, namely acoustic features and linguistic features. For the linguistic features, we look at a wide range of features addressing words, characters, and phonemes. With respect to acoustic features, we look at raw features such as mel-frequency cepstral coefficients combined with shifted delta cepstra (MFCC-SDC), bottleneck features, and the i-vector as a latent variable. We study both generative and discriminative classifiers, in addition to deep learning approaches, namely the deep neural network (DNN) and the convolutional neural network (CNN). In our work, we propose Arabic dialect identification as a five-class challenge comprising the four dialects mentioned above as well as Modern Standard Arabic.
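A rough sketch of one acoustic pipeline of the kind listed above: frame-level MFCCs with delta features averaged into a fixed-length utterance vector and fed to a discriminative classifier. This greatly simplifies the MFCC-SDC, bottleneck, and i-vector systems described in the thesis, and the file paths and labels are placeholders.

    import numpy as np
    import librosa
    from sklearn.svm import SVC

    def utterance_vector(wav_path: str) -> np.ndarray:
        """MFCCs plus first and second deltas, averaged over time into one
        fixed-length vector (a crude stand-in for MFCC-SDC or i-vector features)."""
        y, sr = librosa.load(wav_path, sr=16000)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
        feats = np.vstack([mfcc, librosa.feature.delta(mfcc), librosa.feature.delta(mfcc, order=2)])
        return feats.mean(axis=1)

    # Placeholder training data: (path, dialect) pairs for the five-class task
    # (Egyptian, Levantine, Gulf, North African, MSA); the paths are hypothetical.
    train = [("egy_001.wav", "EGY"), ("lev_001.wav", "LAV"), ("glf_001.wav", "GLF"),
             ("nor_001.wav", "NOR"), ("msa_001.wav", "MSA")]

    X = np.stack([utterance_vector(p) for p, _ in train])
    y = [label for _, label in train]

    clf = SVC(kernel="linear").fit(X, y)                 # discriminative classifier
    print(clf.predict([utterance_vector("test_utt.wav")]))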
Arabic Speech Recognition: We describe our effort in building Arabic automatic speech recognition (ASR) and in creating an open research community to advance it. This part has two main goals. First, creating a framework for Arabic ASR that is publicly available for research: we describe our effort in building two multi-genre broadcast (MGB) challenges. MGB-2 focuses on broadcast news, using more than 1,200 hours of speech and 130M words of text collected from the broadcast domain. MGB-3, in contrast, focuses on dialectal multi-genre data with limited non-orthographic speech collected from YouTube, with special attention paid to transfer learning. Second, building a robust Arabic ASR system and reporting a competitive word error rate (WER) so that it can serve as a benchmark to advance the state of the art in Arabic ASR. Our overall system is a combination of five acoustic models (AMs): unidirectional long short-term memory (LSTM), bidirectional LSTM (BLSTM), time-delay neural network (TDNN), TDNN layers along with LSTM layers (TDNN-LSTM), and TDNN layers followed by BLSTM layers (TDNN-BLSTM). The AMs are trained with the purely sequence-based lattice-free maximum mutual information (LF-MMI) objective. The generated lattices are rescored using a four-gram language model (LM) and a recurrent neural network with maximum entropy (RNNME) LM. Our official WER is 13%, the lowest WER reported on this task.
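The rescoring step can be pictured with a much simpler n-best sketch: each hypothesis combines its acoustic score with an interpolation of two language-model scores (standing in for the 4-gram and RNNME LMs), and the best-scoring hypothesis is kept. The hypotheses, log scores, and weights below are made up for illustration.

    # Toy n-best rescoring: combine the acoustic score with an interpolation of two
    # LM scores. The real system rescores full lattices; the numbers here are invented.

    def rescore(nbest, lm_weight=0.8, interp=0.5):
        """nbest: list of (text, acoustic_logprob, ngram_lm_logprob, rnn_lm_logprob)."""
        def combined(h):
            text, am, ngram, rnn = h
            lm = interp * ngram + (1.0 - interp) * rnn   # interpolate the two LM scores
            return am + lm_weight * lm
        return max(nbest, key=combined)[0]

    nbest = [
        ("hypothesis with a recognition error", -120.4, -35.2, -33.9),
        ("hypothesis preferred by the language models", -121.0, -30.1, -29.5),
    ]
    print(rescore(nbest))   # the second hypothesis wins after rescoring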
Evaluation: The third part of the thesis addresses our effort in evaluating dialectal speech with no orthographic rules. Our methods learn from multiple transcribers and align the speech hypothesis to overcome the non-orthographic aspects. Our multi-reference WER (MR-WER) approach is similar to the BLEU score used in machine translation (MT). We have also automated this process by learning different spelling variants from Twitter data: we mine a huge collection of tweets in an unsupervised fashion to build more than 11M n-to-m lexical pairs, and we propose a new evaluation metric, dialectal WER (WERd). Finally, we estimate the word error rate (e-WER) with no reference transcription, using decoding and language features, and we show that our word error rate estimation is robust in many scenarios with and without the decoding features.
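A small sketch of the dialectal WER idea: a standard word-level edit distance in which reference and hypothesis words related by a lexicon of spelling variants count as matches. The variant pairs below are invented one-to-one placeholders, whereas the thesis mines over 11M n-to-m pairs from Twitter.

    # Dialectal WER sketch: like WER, but words related by a spelling-variant lexicon
    # count as correct. The romanized variant pairs here are invented placeholders.
    VARIANTS = {("ktyr", "kteer"), ("hyk", "heek")}

    def match(ref_w, hyp_w):
        return ref_w == hyp_w or (ref_w, hyp_w) in VARIANTS or (hyp_w, ref_w) in VARIANTS

    def werd(ref, hyp):
        r, h = ref.split(), hyp.split()
        # Standard Levenshtein alignment over words, with variant-aware substitution cost.
        d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
        for i in range(len(r) + 1):
            d[i][0] = i
        for j in range(len(h) + 1):
            d[0][j] = j
        for i in range(1, len(r) + 1):
            for j in range(1, len(h) + 1):
                sub = 0 if match(r[i - 1], h[j - 1]) else 1
                d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + sub)
        return d[len(r)][len(h)] / max(len(r), 1)

    print(werd("hyk ktyr mnih", "heek kteer mnih"))   # 0.0: variants count as matches
    print(werd("hyk ktyr mnih", "heek kter mnih"))    # 0.33: one substitution in three words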
Elements, Government, and Licensing: Developments in phonology
Elements, Government, and Licensing brings together new theoretical and empirical developments in phonology. It covers three principal domains of phonological representation: melody and segmental structure; tone, prosody and prosodic structure; and phonological relations, empty categories, and vowel-zero alternations. Theoretical topics covered include the formalisation of Element Theory, the hotly debated topic of structural recursion in phonology, and the empirical status of government.
In addition, a wealth of new analyses and empirical evidence sheds new light on empty categories in phonology, the analysis of certain consonantal sequences, phonological and non-phonological alternation, the elemental composition of segments, and many other topics. Taking up long-standing empirical and theoretical issues informed by Government Phonology and Element Theory, this book provides theoretical advances while also bringing to light new empirical evidence and analyses challenging previous generalisations.
The insights offered here will be equally exciting for phonologists working on related issues inside and outside the Principles & Parameters programme, such as researchers working in Optimality Theory or classical rule-based phonology.
Learning About And Becoming Aware Of Reading Strategies And Metacognition In English By Adult Second Language Learners
The purpose of the present study was to investigate adult English language learners' need to learn and/or become aware of reading strategies and metacognitive strategies while making sense of English texts. A mixed-methods grounded theory (MMGT) approach in a sequential design (quantitative followed by qualitative), with the qualitative strand dominant, was employed to collect and analyze data. In the quantitative phase, data were collected by administering a background questionnaire and the Survey of Reading Strategies (SORS); these data were analyzed statistically using descriptive analysis and one-way analysis of variance. In the qualitative phase, data were collected through retrospective miscue analysis (RMA) and semi-structured interviews and were coded in order to identify reading patterns among participants. The quantitative results demonstrated that second language learners from different language backgrounds and English proficiency levels perceived the use of reading strategies differently. The qualitative results demonstrated that Saudi Arabian second language learners tend to transfer their reading strategy of relying on small grain-size units when reading in English. These results bring a new perspective to the second language reading field by demonstrating that second language learners from different language backgrounds apply reading strategies differently based on their initial reading development in their first languages.