30 research outputs found
Arabic and English News Coverage on aljazeera.net
The controversial Al Jazeera network, with its Arabic and English news websites, is an interesting object for comparative study. This study compares the two language versions in terms of their layouts and structural features, regional and thematic coverage, and ideological perspective reflected in the headlines of news reports. Content analysis and critical discourse analysis revealed differences between the two versions for all aspects except for thematic coverage, indicating systematic biases in coverage, alongside efforts to present ideological balance.
TARJAMAT: Evaluation of Bard and ChatGPT on Machine Translation of Ten Arabic Varieties
Despite the purported multilingual proficiency of instruction-finetuned large
language models (LLMs) such as ChatGPT and Bard, the linguistic inclusivity of
these models remains insufficiently explored. Considering this constraint, we
present a thorough assessment of Bard and ChatGPT (encompassing both GPT-3.5
and GPT-4) regarding their machine translation proficiencies across ten
varieties of Arabic. Our evaluation covers diverse Arabic varieties such as
Classical Arabic (CA), Modern Standard Arabic (MSA), and several country-level
dialectal variants. Our analysis indicates that LLMs may encounter challenges
with dialects for which minimal public datasets exist, but on average are
better translators of dialects than existing commercial systems. On CA and MSA,
instruction-tuned LLMs, however, trail behind commercial systems such as Google
Translate. Finally, we undertake a human-centric study to scrutinize the
efficacy of the relatively recent model, Bard, in following human instructions
during translation tasks. Our analysis reveals a circumscribed capability of
Bard in aligning with human instructions in translation contexts. Collectively,
our findings underscore that prevailing LLMs remain far from inclusive, with
only limited ability to cater for the linguistic and cultural intricacies of
diverse communities. Comment: ArabicNLP 202
Splitting Arabic Texts into Elementary Discourse Units
In this article, we propose the first work that investigates the feasibility of Arabic discourse segmentation into elementary discourse units within the segmented discourse representation theory framework. We first describe our annotation scheme that defines a set of principles to guide the segmentation process. Two corpora have been annotated according to this scheme: elementary school textbooks and newspaper documents extracted from the syntactically annotated Arabic Treebank. Then, we propose a multiclass supervised learning approach that predicts nested units. Our approach uses a combination of punctuation, morphological, lexical, and shallow syntactic features. We investigate how each feature contributes to the learning process. We show that an extensive morphological analysis is crucial to achieve good results in both corpora. In addition, we show that adding chunks does not boost the performance of our system.
Online News Sites and Journalism 2.0: Reader Comments on Al Jazeera Arabic
The current paper investigates reader commenting on news sites as one facet of journalism 2.0. Specifically, the themes, frequency, and regional coverage of readers’ comments (and, more generally, their activity levels and distribution) are considered, with the goal of increasing knowledge of convergent media and computer-mediated communication (CMC), as well as shedding light on the interactivity strategies adopted by influential news producers. The corpus is collected from the Arabic news site of the controversial Middle East-based, bilingual network Al Jazeera. Reader commenting was found to be a regular occurrence on the site but distributed unevenly across stories. The stories focused mostly on themes related to military and political violence, politics, and foreign relations, and covered events related to the Arab world more than other regions. Also, patterns of commenting varied according to day of the week and position of the story on the web page. Overall, these findings suggest that citizen journalism (journalism performed by laypersons) on Al Jazeera tends to be shaped by the coverage and layout of the news site. Moreover, citizen participation in online news sites such as Al Jazeera is still far from ideal, in that commenters are given neither the access nor the facilitation to use modalities other than written text. These limitations are critiqued in light of contemporary discourses about media convergence and journalism 2.0.
Trusted Data Forever: Is AI the Answer?
Archival institutions and programs worldwide work to ensure that the records of governments, organizations, communities, and individuals are preserved for future generations as cultural heritage, as sources of rights, and as vehicles for holding the past accountable and informing the future. This commitment is guaranteed through the adoption of strategic and technical measures for the long-term preservation of digital assets in any medium and form, whether textual, visual, or aural. Public and private archives are the largest providers of data big and small in the world and collectively host yottabytes of trusted data, to be preserved forever. Several aspects of retention and preservation, arrangement and description, management and administration, and access and use are still open to improvement. In particular, recent advances in Artificial Intelligence (AI) open the discussion as to whether AI can support the ongoing availability and accessibility of trustworthy public records. This paper presents preliminary results of the InterPARES Trust AI ("I Trust AI") international research partnership, which aims to (1) identify and develop specific AI technologies to address critical records and archives challenges; (2) determine the benefits and risks of employing AI technologies on records and archives; (3) ensure that archival concepts and principles inform the development of responsible AI; and (4) validate outcomes through a conglomerate of case studies and demonstrations.