7 research outputs found

    Communication

    This chapter discusses research on the capacity and effectiveness of government’s communications strategy as South Africa went through the various stages of lockdown during the Covid-19 pandemic in 2020. It probes the working relationship between communications from all spheres of government and community, private, digital, and social media, as well as organised civil society, before and during the lockdown, and assesses its impact and efficacy. Recognising the multilingual nature of South African society, the urban–rural digital divide, and the prohibitive costs of data, the chapter identifies lessons and reaffirms the relevance of the development communications approach to government–citizen communications. It argues for prioritising accessible, multilingual digital communications with a citizen feedback loop that is transparent and responsive, to ensure people are informed and empowered, as envisioned in the Constitution. Such responsiveness needs an enabling environment from government and from the public, private, and community media landscape. Collaboration and cooperation across these sectors with government communications and with the nongovernmental health and communications sectors is critical in such an all-encompassing crisis. The chapter highlights the need to continue to understand South Africa’s highly diverse communication space, in which digital new media platforms exist alongside loudhailers, and to make accommodations in legislation, policy, and government coordination with social partners to reach all people across the digital, class, and language divides. This chapter (Chapter 4) was published in the first edition of the South Africa Covid-19 country report in June 2021. https://www.gov.za/sites/default/files/gcis_document/202206/sa-covid-19-reporta.pd

    MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African languages

    In this paper, we present MasakhaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the Universal Dependencies (UD) guidelines. We conducted extensive POS baseline experiments using both conditional random fields and several multilingual pre-trained language models. We applied various cross-lingual transfer models trained with data available in UD. Evaluating on the MasakhaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with parameter-efficient fine-tuning methods. Crucially, transferring knowledge from a language that matches the language family and morphosyntactic properties seems to be more effective for POS tagging in unseen languages.

    MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition

    African languages are spoken by over a billion people, but are underrepresented in NLP research and development. The challenges impeding progress include the limited availability of annotated datasets, as well as a lack of understanding of the settings where current methods are effective. In this paper, we make progress towards solutions for these challenges, focusing on the task of named entity recognition (NER). We create the largest human-annotated NER dataset for 20 African languages, and we study the behavior of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, demonstrating that the choice of source language significantly affects performance. We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points across 20 languages compared to using English. Our results highlight the need for benchmark datasets and models that cover typologically diverse African languages.

    AfriMTE and AfriCOMET: Empowering COMET to Embrace Under-resourced African Languages

    Despite the progress in scaling multilingual machine translation (MT) models and evaluation data to several under-resourced African languages, it is difficult to measure that progress accurately, because evaluation often relies on n-gram matching metrics such as BLEU, which tend to correlate less well with human judgments. Embedding-based metrics such as COMET correlate better; however, the lack of evaluation data with human ratings for under-resourced languages, the complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and the limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with a simplified MQM guideline for error-span annotation and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET, a COMET evaluation metric for African languages, by leveraging DA training data from high-resource languages and an African-centric multilingual encoder (AfroXLM-Roberta) to create the state-of-the-art evaluation metric for African-language MT with respect to Spearman-rank correlation with human judgments (+0.406).

    AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages

    Despite the recent progress on scaling multilingual machine translation (MT) to several under-resourced African languages, accurately measuring this progress remains challenging, since evaluation is often performed on n-gram matching metrics such as BLEU, which typically show a weaker correlation with human judgments. Learned metrics such as COMET have higher correlation; however, the lack of evaluation data with human ratings for under-resourced languages, the complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and the limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with simplified MQM guidelines for error detection and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET: COMET evaluation metrics for African languages, built by leveraging DA data from well-resourced languages and an African-centric multilingual encoder (AfroXLM-R) to create the state-of-the-art MT evaluation metrics for African languages with respect to Spearman-rank correlation with human judgments (0.441).