Communication
This chapter discusses research on the
capacity and effectiveness of government’s
communications strategy as South Africa
went through the various stages of lockdown
during the Covid-19 pandemic in 2020. It
probes the working relationship between
government communications across all
spheres and community, private, digital,
and social media, as well as organised civil
society, before and during the lockdown, and
assesses its impact and efficacy.
Recognising the multilingual nature of
South African society, the urban–rural
digital divide, and the prohibitive costs of
data, the chapter identifies lessons and
reaffirms the relevance of the development
communications approach to government–
citizen communications. It motivates for the prioritisation of accessible, multilingual digital
communications with a citizen feedback loop
that is transparent and responsive to ensure
people are informed and empowered, as
envisioned in the Constitution.
Such responsiveness needs an enabling
environment from government and from
the public, private, and community media
landscape. Collaboration and cooperation
across these sectors, with government
communications and with the non-governmental
health and communications sectors, are
critical in such an all-encompassing
crisis. The chapter highlights the need to
continue to understand South Africa’s highly
diverse communication space, in which
digital new media platforms exist alongside
loudhailers, and to make accommodations in
legislation, policy, and government coordination
with social partners to reach all people across
the digital, class, and language divides.

This chapter (Chapter 4) was published in the first edition of the South Africa Covid-19 Country Report in June 2021. https://www.gov.za/sites/default/files/gcis_document/202206/sa-covid-19-reporta.pd
MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African Languages
In this paper, we present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges of annotating POS for these languages using the Universal Dependencies (UD) guidelines. We conducted extensive POS baseline experiments using both conditional random fields and several multilingual pre-trained language models, and applied various cross-lingual transfer models trained with data available in the UD. Evaluating on the MasakhaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with parameter-fine-tuning methods. Crucially, transferring knowledge from a language that matches the language family and morphosyntactic properties seems to be more effective for POS tagging in unseen languages.
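To make the notion of a POS baseline concrete, the sketch below implements a most-frequent-tag tagger, a standard minimal baseline for this task. It is not the paper's CRF or pre-trained language model setup, and the toy sentences and tag names are invented for illustration:

```python
from collections import Counter, defaultdict

def train_mft_tagger(tagged_sentences):
    """Learn the most frequent tag per word from tagged sentences.

    Returns a word->tag lexicon plus a global majority tag used as a
    fallback for words never seen in training.
    """
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, pos in sentence:
            counts[word][pos] += 1
    # Majority tag over the whole corpus, for out-of-vocabulary words.
    fallback = Counter(
        pos for sent in tagged_sentences for _, pos in sent
    ).most_common(1)[0][0]
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    return lexicon, fallback

def tag(words, lexicon, fallback):
    """Tag each word with its most frequent training tag, or the fallback."""
    return [lexicon.get(w, fallback) for w in words]

train = [
    [("the", "DET"), ("cat", "NOUN"), ("sat", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("ran", "VERB")],
]
lexicon, fallback = train_mft_tagger(train)
print(tag(["the", "cat"], lexicon, fallback))  # ['DET', 'NOUN']
```

Such a baseline is useful mainly as a floor: cross-lingual transfer from a well-chosen source language, as the abstract describes, is measured against simple lexical memorisation of this kind.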
MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition
African languages are spoken by over a billion people, but are underrepresented in NLP research and development. The challenges impeding progress include the limited availability of annotated datasets, as well as a lack of understanding of the settings where current methods are effective. In this paper, we make progress towards solutions for these challenges, focusing on the task of named entity recognition (NER). We create the largest human-annotated NER dataset for 20 African languages, and we study the behavior of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, demonstrating that the choice of source language significantly affects performance. We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points across 20 languages compared to using English. Our results highlight the need for benchmark datasets and models that cover typologically diverse African languages.
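The F1 scores the abstract reports are entity-level: predicted spans must match gold spans exactly in both type and boundaries. A minimal sketch of that evaluation, assuming BIO-tagged sequences (the tag names here are illustrative, not taken from the dataset):

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence to (type, start, end) spans, end exclusive."""
    spans, start, etype = [], None, None
    for i, t in enumerate(tags):
        if t.startswith("B-"):
            if start is not None:
                spans.append((etype, start, i))
            start, etype = i, t[2:]
        elif t.startswith("I-") and start is not None and t[2:] == etype:
            continue  # inside the current entity
        else:
            # "O", or an I- tag that does not continue the open entity.
            if start is not None:
                spans.append((etype, start, i))
            start, etype = None, None
    if start is not None:
        spans.append((etype, start, len(tags)))
    return spans

def entity_f1(gold_tags, pred_tags):
    """Entity-level F1: a span counts only if type and boundaries match."""
    gold = set(bio_to_spans(gold_tags))
    pred = set(bio_to_spans(pred_tags))
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "O"]
print(round(entity_f1(gold, pred), 3))  # 0.667
```

This strict-matching definition is why a better transfer language can move F1 by many points: partial or mis-typed spans earn no credit at all.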
AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages
Despite the progress we have recorded in scaling multilingual machine translation (MT) models and evaluation data to several under-resourced African languages, it is difficult to measure this progress accurately, because evaluation is often performed with n-gram matching metrics such as BLEU, which often correlate poorly with human judgments. Embedding-based metrics such as COMET correlate better; however, the lack of evaluation data with human ratings for under-resourced languages, the complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and the limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with a simplified MQM guideline for error-span annotation and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET, a COMET evaluation metric for African languages, by leveraging DA training data from high-resource languages and an African-centric multilingual encoder (AfroXLM-Roberta) to create the state-of-the-art evaluation metric for African-language MT with respect to Spearman-rank correlation with human judgments (+0.406).
AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages
Despite the recent progress on scaling multilingual machine translation (MT) to several under-resourced African languages, accurately measuring this progress remains challenging, since evaluation is often performed with n-gram matching metrics such as BLEU, which typically show a weaker correlation with human judgments. Learned metrics such as COMET have higher correlation; however, the lack of evaluation data with human ratings for under-resourced languages, the complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and the limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with simplified MQM guidelines for error detection and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET, COMET evaluation metrics for African languages, by leveraging DA data from well-resourced languages and an African-centric multilingual encoder (AfroXLM-R) to create state-of-the-art MT evaluation metrics for African languages with respect to Spearman-rank correlation with human judgments (0.441).
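The Spearman-rank correlations quoted above (0.406, 0.441) measure how well a metric's ranking of translations agrees with human DA rankings. A minimal pure-Python sketch of that computation, with tie-aware average ranks (the score lists are invented for illustration):

```python
def ranks(values):
    """1-based average ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean rank of the tied block
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(metric_scores, human_scores):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(metric_scores), ranks(human_scores)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

metric = [0.71, 0.42, 0.85, 0.30]   # hypothetical metric scores
human = [78.0, 55.0, 90.0, 40.0]    # hypothetical DA scores
print(spearman(metric, human))  # 1.0 (identical rankings)
```

In practice one would use `scipy.stats.spearmanr`; the point of the sketch is that the correlation depends only on the order the metric induces, not on its absolute scale.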