
    MasakhaPOS: Part-of-Speech Tagging for Typologically Diverse African languages

    In this paper, we present AfricaPOS, the largest part-of-speech (POS) dataset for 20 typologically diverse African languages. We discuss the challenges in annotating POS for these languages using the Universal Dependencies (UD) guidelines. We conducted extensive POS baseline experiments using both a conditional random field (CRF) and several multilingual pre-trained language models. We applied various cross-lingual transfer models trained with data available in UD. Evaluating on the AfricaPOS dataset, we show that choosing the best transfer language(s) in both single-source and multi-source setups greatly improves the POS tagging performance of the target languages, in particular when combined with parameter-fine-tuning methods. Crucially, transferring knowledge from a language that matches the target's language family and morphosyntactic properties seems to be more effective for POS tagging in unseen languages.
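    One way to illustrate the claim that a source matching the target's language family and morphosyntactic properties transfers better is a toy ranking heuristic. This is a minimal sketch under stated assumptions, not the paper's selection method: the `Lang` type, the `similarity` score (Jaccard overlap of feature sets plus a bonus of 1.0 for a family match), the feature labels, and the language names `tgt`, `src_a`, `src_b`, `src_c` are all hypothetical placeholders. Taking the top k candidates mimics a multi-source setup.

```python
# Toy transfer-language ranking heuristic.
# HYPOTHETICAL: scoring rule, feature labels, and language names are placeholders,
# not the method or data used in the MasakhaPOS paper.
from dataclasses import dataclass


@dataclass(frozen=True)
class Lang:
    name: str
    family: str
    features: frozenset  # hypothetical morphosyntactic feature labels


def similarity(source: Lang, target: Lang) -> float:
    """Jaccard overlap of feature sets, plus a bonus of 1.0 for a family match."""
    union = source.features | target.features
    overlap = len(source.features & target.features) / len(union) if union else 0.0
    family_bonus = 1.0 if source.family == target.family else 0.0
    return family_bonus + overlap


def rank_sources(target: Lang, candidates: list, k: int = 1) -> list:
    """Return names of the k most similar candidates (k > 1 for multi-source)."""
    ranked = sorted(candidates, key=lambda s: similarity(s, target), reverse=True)
    return [s.name for s in ranked[:k]]


# Usage with placeholder languages.
target = Lang("tgt", "family_x", frozenset({"noun_classes", "svo", "tone"}))
candidates = [
    Lang("src_a", "family_x", frozenset({"noun_classes", "svo"})),
    Lang("src_b", "family_y", frozenset({"svo", "tone", "noun_classes"})),
    Lang("src_c", "family_y", frozenset({"sov"})),
]
print(rank_sources(target, candidates, k=2))  # ['src_a', 'src_b']
```

    Here `src_a` outranks `src_b` despite sharing fewer features, because the family bonus dominates; how much weight family membership should get relative to feature overlap is exactly the kind of choice a real selection method would need to learn or tune.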

    MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition

    African languages are spoken by over a billion people but are underrepresented in NLP research and development. The challenges impeding progress include the limited availability of annotated datasets and a lack of understanding of the settings where current methods are effective. In this paper, we make progress towards solutions for these challenges, focusing on the task of named entity recognition (NER). We create the largest human-annotated NER dataset for 20 African languages, and we study the behavior of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, demonstrating that the choice of source language significantly affects performance. We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points across 20 languages compared to using English. Our results highlight the need for benchmark datasets and models that cover typologically diverse African languages.
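    NER F1 is conventionally computed over exact entity spans (CoNLL-style evaluation), so a prediction earns credit only when both the boundaries and the type of an entity match. The sketch below assumes BIO tags and implements that convention by hand (libraries such as seqeval provide the same metric); the helper names `bio_spans` and `span_f1` and the example tag sequences are illustrative. The reported 14-point figure is an average, over the 20 target languages, of per-language differences between such F1 scores for the best source language and for English.

```python
# Entity-level (exact-span) F1 from BIO tag sequences, CoNLL-style.
# Helper names and example tags are illustrative, not from the paper.


def bio_spans(tags: list) -> set:
    """Collect (start, end_exclusive, type) spans from a BIO tag sequence."""
    spans = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        opens_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype)
        if opens_new or tag == "O":
            if start is not None:
                spans.add((start, i, etype))
                start, etype = None, None
            if opens_new:
                # A stray I- tag (type change or no open span) starts a new span.
                start, etype = i, tag[2:]
    return spans


def span_f1(gold: list, pred: list) -> float:
    """Harmonic mean of span precision and recall; 0.0 when nothing matches."""
    g, p = bio_spans(gold), bio_spans(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred = ["B-PER", "I-PER", "O", "B-ORG", "O"]
print(span_f1(gold, pred))  # 0.5: the PER span matches, the mistyped LOC/ORG span does not
```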

    Assessment of the HER2DX Assay in Patients with ERBB2-Positive Breast Cancer Treated with Neoadjuvant Paclitaxel, Trastuzumab, and Pertuzumab

    Importance: Patients with early-stage ERBB2 (formerly HER2)-positive breast cancer (ERBB2+BC) who experience a pathologic complete response (pCR) after receiving neoadjuvant therapy have favorable survival outcomes. Predicting the likelihood of pCR may help optimize neoadjuvant therapy.
    Objective: To test the ability of the HER2DX assay to predict the likelihood of pCR in patients with early-stage ERBB2+BC who are receiving deescalated neoadjuvant therapy.
    Design, Setting, and Participants: In this diagnostic/prognostic study, the HER2DX assay was performed on pretreatment tumor biopsy samples from patients enrolled in the single-arm, multicenter, prospective phase 2 DAPHNe clinical trial who had newly diagnosed stage II to III ERBB2+BC treated with neoadjuvant paclitaxel weekly for 12 weeks plus trastuzumab and pertuzumab every 3 weeks for 4 cycles.
    Interventions and Exposures: The HER2DX assay is a classifier derived from gene expression and limited clinical features that provides 2 independent scores to predict prognosis and likelihood of pCR in patients with early-stage ERBB2+BC. The assay was performed on baseline tumor samples from 80 of 97 patients (82.5%) in the DAPHNe trial.
    Main Outcomes and Measures: The primary aim was to test the ability of the HER2DX pCR likelihood score (as a continuous variable from 0 to 100) to predict pCR (ypT0/isN0).
    Results: Of 80 participants, 79 (98.8%) were women; 4 (5.0%) were African American, 6 (7.5%) Asian, 4 (5.0%) Hispanic, and 66 (82.5%) White; the mean (range) age was 50.3 (26.0-78.0) years. The HER2DX pCR score was significantly associated with pCR (odds ratio, 1.05; 95% CI, 1.03-1.08; P < .001). The pCR rates in the HER2DX high, medium, and low pCR score groups were 92.6%, 63.6%, and 29.0%, respectively (high vs low odds ratio, 30.6; P < .001). The HER2DX pCR score was significantly associated with pCR independently of hormone receptor status, ERBB2 immunohistochemistry score, HER2DX ERBB2 expression score, and the Prediction Analysis of Microarray 50 (PAM50) ERBB2-enriched subtype. The correlation between the HER2DX pCR score and the prognostic risk score was weak (Pearson coefficient, -0.12). Performance of the risk score could not be assessed because of a lack of recurrence events.
    Conclusions and Relevance: The results of this diagnostic/prognostic study suggest that the HER2DX pCR score could predict pCR following deescalated neoadjuvant paclitaxel with trastuzumab and pertuzumab in patients with early-stage ERBB2+BC. The HER2DX pCR score might guide therapeutic decisions by identifying patients who are candidates for deescalated or escalated approaches.
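    The reported high-vs-low odds ratio can be roughly cross-checked from the group pCR rates, since the odds of an event with probability p are p / (1 - p) and an odds ratio is the quotient of two odds. This sketch uses the rounded percentages from the abstract, so agreement with the published value (presumably computed from patient counts) is only approximate. It also assumes that the odds ratio of 1.05 refers to each 1-point increase on the 0 to 100 score scale, as is usual for a continuous predictor in logistic regression; under that assumption, effects over several points compound multiplicatively.

```python
# Cross-check of the high-vs-low odds ratio from the rounded group pCR rates.
# ASSUMPTION: the per-unit OR of 1.05 applies to each 1-point score increase.


def odds(p: float) -> float:
    """Convert a probability to odds, p / (1 - p)."""
    return p / (1.0 - p)


def odds_ratio(p1: float, p0: float) -> float:
    """Odds ratio of group 1 versus group 0."""
    return odds(p1) / odds(p0)


high, low = 0.926, 0.290  # reported pCR rates in the HER2DX high and low groups
print(round(odds_ratio(high, low), 1))  # 30.6

per_point_or = 1.05
print(round(per_point_or ** 10, 2))  # 1.63: implied OR for a 10-point higher score
```

    With these inputs the recomputed odds ratio rounds to 30.6, and a 10-point higher pCR score corresponds to an odds ratio of about 1.63 under the per-point assumption.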
