1,859 research outputs found

    Towards a flexible open-source software library for multi-layered scholarly textual studies: An Arabic case study dealing with semi-automatic language processing

    This paper presents both the general model and a case study of the Computational and Collaborative Philology Library (CoPhiLib), an ongoing initiative underway at the Institute for Computational Linguistics (ILC) of the National Research Council (CNR), Pisa, Italy. The library, designed and organized as a reusable, abstract and open-source software component, aims at meeting the needs of multi-lingual and cross-lingual analysis by exposing common Application Programming Interfaces (APIs). The core modules, coded in the Java programming language, constitute the groundwork of a Web platform designed to deal with textual scholarly needs. The Web application, implemented according to the Java Enterprise specifications, focuses on multi-layered analysis for the study of literary documents and related multimedia sources. This ambitious challenge seeks to manage textual resources, on the one hand by abstracting from the language at hand, on the other hand by decoupling from the specific requirements of single projects. This goal is achieved thanks to methodologies declared by the 'agile process', and by putting into effect suitable use case modeling, design patterns, and component-based architectures. The reusability and flexibility of the system have been tested on an Arabic case study: the system allows users to choose the morphological engine (such as AraMorph or Al-Khalil), along with linguistic granularity (i.e. with or without declension). Finally, the application enables the construction of annotated resources for further statistical engines (training set). © 2014 IEEE

    Roots, leaves and branches – The typology of sign languages

    Metaphor in (Arabic-into-English) translation with specific reference to metaphorical concepts and expressions in political discourse

    Cognitive linguistics scholars argue that metaphor is fundamentally a conceptual process of mapping one domain of experience onto another domain. The study of metaphor in the context of Translation Studies has not, unfortunately, kept pace with the discoveries about the nature and role of metaphor in the cognitive sciences. This study aims primarily to fill part of this gap in knowledge. Specifically, the thesis is an attempt to explore some implications of the conceptual theory of metaphor for translation. Because the study of metaphor in translation is also based on views about the nature of translation, the thesis first presents a general overview of the discipline of Translation Studies, describing the major models of translation. The study (in Chapter Two) then discusses the major traditional theories of metaphor (comparison, substitution and interaction theories) and shows how the ideas of those theories were adopted in specific translation studies of metaphor. After that, the study presents a detailed account of the conceptual theory of metaphor and some hypothetical implications for the study of metaphor in translation from the perspective of cognitive linguistics. The data and methodology are presented in Chapter Four. A novel classification of conceptual metaphor is presented which distinguishes between different source domains of conceptual metaphors: physical, human-life and intertextual. It is suggested that each source domain places different demands on translators. The major sources of the data for this study are (1) the translations done by the Foreign Broadcasting Information Service (FBIS), which is a translation service of the Central Intelligence Agency (CIA) in the United States of America, of a number of speeches by the Iraqi president Saddam Hussein during the Gulf Crisis (1990-1991) and (2) official (governmental) Omani translations of National Day speeches of Sultan Qaboos bin Said of Oman

    Combining Minimally-supervised Methods for Arabic Named Entity Recognition.

    Supervised methods can achieve high performance on NLP tasks, such as Named Entity Recognition (NER), but new annotations are required for every new domain and/or genre change. This has motivated research in minimally supervised methods such as semi-supervised learning and distant learning, but neither technique has yet achieved performance levels comparable to those of supervised methods. Semi-supervised methods tend to have very high precision but comparatively low recall, whereas distant learning tends to achieve higher recall but lower precision. This complementarity suggests that better results may be obtained by combining the two types of minimally supervised methods. In this paper we present a novel approach to Arabic NER using a combination of semi-supervised and distant learning techniques. We trained a semi-supervised NER classifier and another one using distant learning techniques, and then combined them using a variety of classifier combination schemes, including the Bayesian Classifier Combination (BCC) procedure recently proposed for sentiment analysis. According to our results, the BCC model leads to an increase in performance of 8 percentage points over the best base classifiers
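The complementarity described in this abstract (one tagger with high precision, one with high recall) can be illustrated with a very simple token-level back-off rule. This is a minimal sketch and is not the Bayesian Classifier Combination procedure the paper reports; the function name `combine_tags`, the tag labels, and the sample sequences are all assumptions made for illustration.

```python
from typing import List


def combine_tags(high_precision: List[str], high_recall: List[str]) -> List[str]:
    """Back-off combination of two token-level tag sequences.

    Keep the high-precision tagger's label whenever it predicts an entity;
    otherwise fall back to the high-recall tagger's label.
    "O" marks a token outside any entity (hypothetical label scheme).
    """
    if len(high_precision) != len(high_recall):
        raise ValueError("tag sequences must have the same length")
    return [p if p != "O" else r for p, r in zip(high_precision, high_recall)]


# Hypothetical outputs for a four-token sentence.
semi_supervised = ["PER", "O", "O", "O"]   # precise, but misses the location
distant = ["PER", "O", "LOC", "O"]         # finds the location as well
combined = combine_tags(semi_supervised, distant)
```

A rule like this can only recover entities that at least one base tagger found; weighted or probabilistic schemes such as the one named in the abstract additionally model how reliable each classifier is.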

    Open-source resources and standards for Arabic word structure analysis: Fine grained morphological analysis of Arabic text corpora

    Morphological analyzers are preprocessors for text analysis. Many Text Analytics applications need them to perform their tasks. The aim of this thesis is to develop standards, tools and resources that widen the scope of Arabic word structure analysis - particularly morphological analysis, to process Arabic text corpora of different domains, formats and genres, of both vowelized and non-vowelized text. We want to morphologically tag our Arabic Corpus, but evaluation of existing morphological analyzers has highlighted shortcomings and shown that more research is required. Tag-assignment is significantly more complex for Arabic than for many languages. The morphological analyzer should add the appropriate linguistic information to each part or morpheme of the word (proclitic, prefix, stem, suffix and enclitic); in effect, instead of a tag for a word, we need a subtag for each part. Very fine-grained distinctions may cause problems for automatic morphosyntactic analysis – particularly probabilistic taggers which require training data, if some words can change grammatical tag depending on function and context; on the other hand, fine-grained distinctions may actually help to disambiguate other words in the local context. The SALMA – Tagger is a fine-grained morphological analyzer which mainly depends on linguistic information extracted from traditional Arabic grammar books and prior-knowledge broad-coverage lexical resources: the SALMA – ABCLexicon. More fine-grained tag sets may be more appropriate for some tasks. The SALMA – Tag Set is a theory standard for encoding, which captures long-established traditional fine-grained morphological features of Arabic, in a notation format intended to be compact yet transparent. The SALMA – Tagger has been used to lemmatize the 176-million-word Arabic Internet Corpus. 
It has been proposed as a language-engineering toolkit for Arabic lexicography and for phonetically annotating the Qur’an by syllable and primary stress information, as well as for fine-grained morphological tagging
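The idea of assigning a subtag to each part of a word rather than one tag per word can be sketched as a small data structure. This is only an illustration under stated assumptions: the class `Morpheme`, the function `word_tag`, and the subtag labels are hypothetical and do not reproduce the SALMA Tag Set notation; the sample word is the transliterated form wa-ya-ktub-ūna-hā ("and they write it").

```python
from dataclasses import dataclass
from typing import List

# The five word parts named in the abstract, in surface order.
SLOTS = ("proclitic", "prefix", "stem", "suffix", "enclitic")


@dataclass(frozen=True)
class Morpheme:
    slot: str    # one of SLOTS
    form: str    # transliterated surface form
    subtag: str  # hypothetical label, not the SALMA notation


def word_tag(morphemes: List[Morpheme]) -> str:
    """Join per-morpheme subtags, in slot order, into one word-level tag."""
    for m in morphemes:
        if m.slot not in SLOTS:
            raise ValueError(f"unknown slot: {m.slot}")
    ordered = sorted(morphemes, key=lambda m: SLOTS.index(m.slot))
    return "+".join(m.subtag for m in ordered)


# Morphemes may arrive in any order; word_tag restores the slot order.
analysis = [
    Morpheme("stem", "ktub", "VERB"),
    Morpheme("proclitic", "wa", "CONJ"),
    Morpheme("enclitic", "hā", "OBJ_PRON"),
    Morpheme("prefix", "ya", "IMPERF"),
    Morpheme("suffix", "ūna", "PL_SUBJ"),
]
tag = word_tag(analysis)
```

Keeping the subtags separate until the final join lets a downstream tool work at either granularity: the full composite tag, or only the stem's subtag when a coarser analysis is enough.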

    English in the expanding circle of Morocco: Spread, uses, and functions

    Research using Kachru’s (1984) World Englishes theoretical framework and Three Circles model has produced a wealth of knowledge about the spread and functions of English to speech communities around the world. However, there is a recognition that disproportionate attention has been accorded across these spheres. The most compelling argument outlining this gap in the literature was offered by Berns (2005) over a decade ago and was reiterated by Elyas and Mahboob (2020) just recently. Berns (2005: 85) concluded that while the bulk of academic research has focused on the use of English in Inner and Outer Circle contexts, the Expanding Circle remains mostly overlooked. Elyas and Mahboob (2020: 1), who co-edited a special journal issue on the North African and Middle East contexts, underscored that the topic of English in these regions ‘is largely under-studied and undertheorized.’ Following Berns’ remarks, numerous studies have focused on this underrepresented context. Nevertheless, despite their solid contributions, these investigations remain insufficient for constructing a comprehensive understanding of the distinct dynamics of the Expanding Circle. To contribute to the Expanding Circle literature, this exploratory, qualitative, macrosociolinguistic study employs Kachru’s (1984) World Englishes theoretical framework to investigate in greater depth the spread, functional range, and domains of English use in the multilingual country of Morocco. Specifically, this study initially provides an overview of the various languages used in Morocco, then outlines the history of its contact with the English language. It next explores English use in Moroccan media, examining in detail the language’s wide-ranging uses in broadcast, digital, print, and film media. This is followed by an in-depth examination of the linguistic landscape of the metropolitan city of Casablanca, with a focus on shop signs and outdoor advertisements. 
Whilst the users and uses of the English language are the major focus of analysis, additional attention is given to what such a spread means for the other four historically well-established languages of use within this Expanding Circle context: Arabic, French, Spanish, and the indigenous language Tmazight. A further aim of this study is to contribute new perspectives to the existing literature on the distinct dynamics of the Expanding Circle in general

    Saudi Arabia, Lebanon and the Changing Arab Information Order

    This article explores the impact of Arab reality television on Arab governance. Reality television activates hypermedia space (Kraidy, 2006c), a broadly defined inter-media symbolic field, because its commercial logic promotes ostensibly participatory practices like voting, campaigning and alliance building via mobile telephones and the Internet. How does hypermedia space contribute to changing the ways in which Arab citizens and regimes access, use, create and control information? How do the new information dynamics affect the way citizens and governments relate to each other? To address these questions, this article focuses on recent social and political developments in Saudi Arabia and Lebanon, treating the two countries as a dynamic pair whose multi-faceted interactions shape a pan-Arab hypermedia space. This article will endeavor to explain how various Saudi and Lebanese actors have appropriated the reality TV show Star Academy for social and political purposes, and how increased public awareness of the hypermedia space engendered by the program has affected the nature of governance in the two countries. This article concludes with a discussion of how hypermedia space contributes to shifts in the nature and boundaries of social and political agency

    Total Quality Management Plan in Non-profit Translation Service Providers in the United Arab Emirates: Identifying Critical Success Factors for Improvement

    The notion of quality has become an important topic in the translation domain, especially as most translation projects are no longer just the outcome of the work of a single expert translator, but rather a corporate activity, consistent with the norms of the structural environment and bureaucratic workflows of the organisation that is responsible for the work. Managing quality across this value-chain is therefore one of the most challenging areas in 21st century translation. In such a complex process, the notion of Total Quality Management (TQM), originally a quality management tool in mass production, has started to be implemented effectively in many diverse sectors, such as medicine and education (Hansson, 2003); and translation project organisations have themselves become interested in applying TQM in their own quality assurance processes, especially as their activities also include digital translation mechanisms (DGT, 2009; Mitterlehner, 2012; BSI ISO 17100, 2015). The starting point of this research was to understand the existing quality management mechanisms and processes across English to Arabic translation companies and how they could be improved in a corporate context. Kalima, a translation project organisation, was selected as the leading case study, given its well-established reputation in the United Arab Emirates (UAE), as well as the wider Arab World, as a serious contributor to the body of translated into Arabic. Two other non-profit translation service providers (TSPs) of the sector were also analysed, so as to have a sound overview and to provide a broader insight into managerial practices concerning quality assurance within translation processes, and thus to determine whether TQM in its wider state-of-the-art sense could be relevant for the translation sector in the UAE. 
This research has developed a framework for implementing TQM in TSPs based on three main dimensions of critical success factors (CSFs); namely, leadership commitment and strategic direction; managerial and structural reforms; and procedural changes. The proposed framework suggests appointing a portfolio manager as suggested by Giammarresi (2011) in order to regulate the organisational strategy and optimise resources for effective and efficient quality in TSPs performing in a similar context to those particularly studied in this research. The researcher’s critical analysis is the basis for a novel framework that may be of interest for TSPs, and may be used as a benchmark for further research

    ORTHOGRAPHIC ENRICHMENT FOR ARABIC GRAMMATICAL ANALYSIS

    Thesis (Ph.D.) - Indiana University, Linguistics, 2010. The Arabic orthography is problematic in two ways: (1) it lacks the short vowels, and this leads to ambiguity as the same orthographic form can be pronounced in many different ways each of which can have its own grammatical category, and (2) the Arabic word may contain several units like pronouns, conjunctions, articles and prepositions without an intervening white space. These two problems lead to difficulties in the automatic processing of Arabic. The thesis proposes a pre-processing scheme that applies word segmentation and word vocalization for the purpose of grammatical analysis: part of speech tagging and parsing. The thesis examines the impact of human-produced vocalization and segmentation on the grammatical analysis of Arabic, then applies a pipeline of automatic vocalization and segmentation for the purpose of Arabic part of speech tagging. The pipeline is then used, along with the POS tags produced, for the purpose of dependency parsing, which produces grammatical relations between the words in a sentence. The study uses the memory-based algorithm for vocalization, segmentation, and part of speech tagging, and the natural language parser MaltParser for dependency parsing. The thesis represents the first approach to the processing of real-world Arabic, and has found that through the correct choice of features and algorithms, the need for pre-processing for grammatical analysis can be minimized

    Lexical Borrowings in Immigrant Speech: A Sociolinguistic Study of Ḥassāniyya Arabic Speakers in Medina (Saudi Arabia)

    This study investigates lexical borrowings and the phonological processes associated with them as an outcome of the dialect contact situation in Medina (Saudi Arabia) between the Shanāqiṭa immigrant community, who immigrated to this holy city from Mauritania and who speak Ḥassāniyya Arabic, and the urban Hijazi community, who speak urban Hijazi Arabic. The study introduces to the reader the main phonological and morphological features of these two Arabic dialects and presents traditional and modern approaches towards lexical borrowings in Arabic. The present study adopts the quantitative sociolinguistic method which is widely used in sociolinguistic studies in order to analyse the speech of this immigrant community (focusing on borrowings from urban Hijazi Arabic), and correlates it with the social variables of age, educational attainment, ethnicity and gender. The study focuses on six phonological variables which are correlated with the social variables; these variables represent common phonological features which contrast both dialects. These phonological variables are divided into two groups: consonantal and vocalic variables. For the consonantal variables, the present study investigates the variation of three variables: de-affrication ([dʒ] → [ʒ]), lenition ([f] → [v]), and initial hamza dropping ([ʔ] → [Ø]). As for the vocalic variables, the research examines three variables: re-syllabification, consisting of initial [CV] and sequenced [CV.CV] → syncope, epenthesis and metathesis; diphthongisation: monophthongs → diphthongs; and vowel centralisation: (i), (u) → [ə]. The statistical data analysis reveals that age (generation) plays a central role in the phonological variation between the study participants when they borrow linguistic elements from urban Hijazi Arabic; ethnicity is the second most important factor. 
The analysis also shows that socio-cultural and socio-psychological factors facilitate the strong linguistic preservation of Ḥassāniyya Arabic by this immigrant community in Medina