8 research outputs found

    Clause structure, pro-drop and control in Wolof: an LFG/XLE perspective

    This paper provides a formal description of the syntactic analysis of core constructions of Wolof clausal/verbal morphosyntax within the Lexical-Functional Grammar formalism. This includes the basic phrase structure, pro-drop, and control relations. The Wolof grammar is implemented in XLE and uses a cascade of finite-state transducers for morphological analysis and tokenization. This work is part of an ongoing effort to build language resources and tools for Wolof, in particular a computational grammar.

    LFG parse disambiguation for Wolof

    This paper presents several techniques for managing ambiguity in LFG parsing of Wolof, a less-resourced Niger-Congo language. Ambiguity is pervasive in Wolof, and this raises a number of theoretical and practical issues for managing ambiguity associated with different objectives. From a theoretical perspective, the main aim is to design a large-scale grammar for Wolof that is able to make linguistically motivated disambiguation decisions, and to find appropriate ways of controlling ambiguity at important interface representations. The practical aim is to develop disambiguation strategies to improve the performance of the grammar in terms of efficiency, robustness and coverage. To achieve these goals, different avenues are explored to manage ambiguity in the Wolof grammar, including the formal encoding of noun class indeterminacy, lexical specifications, the use of Constraint Grammar models (Karlsson 1990) for morphological disambiguation, the application of the c-structure pruning mechanism (Cahill et al. 2007, 2008; Crouch et al. 2013), and the use of optimality marks for preferences (Frank et al. 1998, 2001). The parsing system is further controlled by packing ambiguities. In addition, discriminant-based techniques for parse disambiguation (Rosén et al. 2007) are applied for treebanking purposes.
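The preference mechanism mentioned above (optimality marks) can be illustrated with a toy ranking: each candidate analysis carries marks, dispreference marks penalize it, preference marks reward it, and the highest-scoring analysis wins. This is a simplified sketch, not the actual XLE mark-ranking implementation; the mark names and candidate analyses are invented for illustration.

```python
# Toy illustration of preference ranking with optimality-style marks.
# The mark names and candidate analyses below are hypothetical.

PREFER = {"NounClassAgr"}                # hypothetical preference marks
DISPREFER = {"Fragment", "GuessedPOS"}   # hypothetical dispreference marks

def score(marks):
    """Higher is better: +1 per preference mark, -1 per dispreference mark."""
    return (sum(1 for m in marks if m in PREFER)
            - sum(1 for m in marks if m in DISPREFER))

def best_parse(candidates):
    """Return the candidate analysis with the highest mark score."""
    return max(candidates, key=lambda c: score(c["marks"]))

candidates = [
    {"reading": "verb-complement", "marks": {"NounClassAgr"}},
    {"reading": "fragment", "marks": {"Fragment"}},
]
print(best_parse(candidates)["reading"])
```

In XLE itself, marks are ranked rather than simply counted; the counting scheme here only conveys the general idea of filtering dispreferred analyses before they reach the parse forest.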

    Finite-State Tokenization for a Deep Wolof LFG Grammar

    This paper presents a finite-state transducer (FST) for tokenizing and normalizing natural texts that are input to a large-scale LFG grammar for Wolof. In the early stage of grammar development, a language-independent tokenizer was used to split the input stream into a unique sequence of tokens. This simple transducer took into account general character classes, without using any language-specific information. However, at a later stage of grammar development, previously uncovered, non-trivial tokenization issues arose, including issues related to multi-word expressions (MWEs), clitics and text normalization. As a consequence, the tokenizer was extended by integrating FST components. This extension was crucial for scaling the hand-written grammar to free text and for enhancing the performance of the parser.
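The move from whitespace splitting to language-aware tokenization can be sketched as a small rule-based tokenizer that normalizes case, splits off a clitic, and merges a multi-word expression into one token. The Wolof-specific rules here (the MWE "ci biir" and the clitic "-la") are invented placeholders standing in for the paper's FST rules, not a reimplementation of them.

```python
import re

# Sketch: rule-based tokenization beyond whitespace splitting.
# The MWE and clitic rules below are hypothetical placeholders.

MWES = {("ci", "biir"): "ci_biir"}        # hypothetical MWE, joined into one token
CLITIC = re.compile(r"^(\w+)(-la)$")      # hypothetical clitic "-la", split off

def tokenize(text):
    # pass 1: lowercase normalization and clitic splitting
    tokens = []
    for tok in text.lower().split():
        m = CLITIC.match(tok)
        if m:
            tokens.extend([m.group(1), m.group(2)])
        else:
            tokens.append(tok)
    # pass 2: merge known multi-word expressions
    merged, i = [], 0
    while i < len(tokens):
        if tuple(tokens[i:i + 2]) in MWES:
            merged.append(MWES[tuple(tokens[i:i + 2])])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

print(tokenize("Mu ngi ci biir kër-la"))
```

A real FST composes such rules into a single transducer applied in one pass; the two-pass Python version only illustrates the kinds of decisions (normalization, clitics, MWEs) the transducer has to make.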

    Design and Development of Part-of-Speech-Tagging Resources for Wolof (Niger-Congo, spoken in Senegal)

    Dione CMB, Kuhn J, Zarrieß S. Design and Development of Part-of-Speech-Tagging Resources for Wolof (Niger-Congo, spoken in Senegal). In: Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10). Valletta, Malta: European Language Resources Association (ELRA); 2010. In this paper, we report on the design of a part-of-speech-tagset for Wolof and on the creation of a semi-automatically annotated gold standard. In order to achieve high-quality annotation relatively fast, we first generated an accurate lexicon that draws on existing word and name lists and takes into account inflectional and derivational morphology. The main motivation for the tagged corpus is to obtain data for training automatic taggers with machine learning approaches. Hence, we took machine learning considerations into account during tagset design and we present training experiments as part of this paper. The best automatic tagger achieves an accuracy of 95.2% in cross-validation experiments. We also wanted to create a basis for experimenting with annotation projection techniques, which exploit parallel corpora. For this reason, it was useful to use a part of the Bible as the gold standard corpus, for which sentence-aligned parallel versions in many languages are easy to obtain. We also report on preliminary experiments exploiting a statistical word alignment of the parallel text.
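The evaluation setup described above (training a tagger and scoring it by cross-validation) can be sketched with a minimal unigram tagger: each word is assigned its most frequent tag from the training folds, unseen words fall back to a default tag. The tagged sentences are invented toy data, not the Wolof gold standard, and a unigram model is far simpler than the taggers the paper trains.

```python
from collections import Counter, defaultdict

# Minimal unigram POS tagger with k-fold cross-validation.
# The toy tagged sentences below are invented, not real Wolof data.

def train(sentences):
    """Map each word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for word, tag in sent:
            counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def accuracy(model, sentences, default="NOUN"):
    correct = total = 0
    for sent in sentences:
        for word, tag in sent:
            correct += model.get(word, default) == tag
            total += 1
    return correct / total

def cross_validate(sentences, k=2):
    """Average tagging accuracy over k held-out folds."""
    folds = [sentences[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        held_out = folds[i]
        train_data = [s for j, f in enumerate(folds) if j != i for s in f]
        scores.append(accuracy(train(train_data), held_out))
    return sum(scores) / len(scores)

toy = [
    [("xale", "NOUN"), ("bi", "DET"), ("lekk", "VERB")],
    [("xale", "NOUN"), ("yi", "DET"), ("dem", "VERB")],
    [("lekk", "VERB"), ("bi", "DET")],
    [("dem", "VERB"), ("yi", "DET")],
]
print(cross_validate(toy, k=2))
```

On four toy sentences the score is of course low; the point is only the fold construction (every sentence is held out exactly once) that underlies the 95.2% figure reported for the real corpus.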

    ParGramBank: The ParGram Parallel Treebank

    This paper discusses the construction of a parallel treebank currently involving ten languages from six language families. The treebank is based on deep LFG (Lexical-Functional Grammar) grammars that were developed within the framework of the ParGram (Parallel Grammar) effort. The grammars produce output that is maximally parallelized across languages and language families. This output forms the basis of a parallel treebank covering a diverse set of phenomena. The treebank is publicly available via the INESS treebanking environment, which also allows for the alignment of language pairs. We thus present a unique, multilayered parallel treebank that represents more and different types of languages than are available in other treebanks, that represents deep linguistic knowledge and that allows for the alignment of sentences at several levels: dependency structures, constituency structures and POS information.

    MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition

    African languages are spoken by over a billion people, but are underrepresented in NLP research and development. The challenges impeding progress include the limited availability of annotated datasets, as well as a lack of understanding of the settings where current methods are effective. In this paper, we make progress towards solutions for these challenges, focusing on the task of named entity recognition (NER). We create the largest human-annotated NER dataset for 20 African languages, and we study the behavior of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, demonstrating that the choice of source language significantly affects performance. We show that choosing the best transfer language improves zero-shot F1 scores by an average of 14 points across 20 languages compared to using English. Our results highlight the need for benchmark datasets and models that cover typologically diverse African languages.
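The source-language selection that drives the reported gains can be sketched as a simple lookup: given development-set F1 scores for models transferred from each candidate source language, pick the best source instead of defaulting to English. The languages and scores below are invented placeholder numbers, not results from the paper.

```python
# Sketch: pick the best transfer source language by dev-set F1.
# The scores below are hypothetical, not results from MasakhaNER 2.0.

dev_f1 = {"english": 0.55, "swahili": 0.68, "hausa": 0.71}

def best_source(scores):
    """Return the source language whose transferred model scores highest."""
    return max(scores, key=scores.get)

src = best_source(dev_f1)
print(src, round(dev_f1[src] - dev_f1["english"], 2))  # gain over English baseline
```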