Search CORE

1,192 research outputs found

The Development of Dutch and Afrikaans Language Resources for Compound Boundary Analysis

Authors: Aussems Suzanne; Eiselen Roald; Emmery Chris; van Huyssteen Gerhard; van Zaanen M.M.
Publication date: 01/01/2014

Tilburg University Repository

Automatic Compound Processing: Compound Splitting and Semantic Analysis for Afrikaans and Dutch

Authors: Daelemans W.; van Huyssteen G.B.; van Zaanen M.; Verhoeven B.
Publication date: 01/01/2014

Tilburg University Repository

Factors in the persistence or decline of ethnic group mobilisation: a conceptual review and case study of cultural group responses among Afrikaners in post-apartheid South Africa

Author: Schlemmer Lawrence
Publication venue: Department of Political Studies
Publication date: 14/05/2020

The candidate has two major linked interests. One is to reconcile competing explanations of ethnicity, and the other is to explore the factors underlying ethnicity in the light of a case study of the rise and decline of ethnic mobilisation among white Afrikaners in South Africa. For many observers the recent apparent "decomposition" of Afrikaner nationalist mobilisation has been surprising, and the factors associated with this trend were expected to contain insights relevant to the theoretical debate. The first part of the thesis is a review of key aspects of literature which offers alternative explanations of ethnic attachments and mobilisation. It commences with a theme-setting example of a reconciliation of alternative viewpoints. At the end of the literature review a series of propositions is offered, suggesting the utility of an integration of alternative perspectives. The case study of Afrikaner ethnic mobilisation commences with a historical overview of the emergence of Afrikaner ethnic nationalism, from the early colonial settlement up to the present. Thereafter a wide range of empirical, survey-based evidence is presented, including exploratory factor analyses, covering patterns in the cultural, racial, socio-economic and political attitudes of Afrikaners, comparing their responses with those of other South Africans. An account of recent political change and the responses of Afrikaners to the events is given. In the final chapter conclusions drawn from the evidence are presented as further propositions in a broader theoretical context. The fragmentation of Afrikaner ethnic nationalism is found to be associated with the bureaucratization of ethnicity during the period of apartheid rule, ambivalence on group boundaries, the usurpation of cultural identity by race, and a breakdown of internal coordination processes which ethnic mobilisation appears to require. 
At the same time, a core of ethnic commitment, substantially independent of its material and political utility, is found to persist, surrounded by a wider compound of racial, cultural and political consciousness. Alternative scenarios of probable future developments are tentatively offered. The analysis appears to support the initial argument that ethnic mobilisation involves full combinations of the processes which competing theories usually pit against one another. The process of ethnic mobilisation involves a variable incorporation of elements of class, group status and honour, and political activation, in which identity commitment, co-ordinating agencies and ethnic boundary-construction interact as defining and integrating elements.

Cape Town University OpenUCT

CompoundPiece: Evaluating and Improving Decompounding Performance of Language Models

Authors: Minixhofer Benjamin; Pfeiffer Jonas; Vulić Ivan
Publication date: 23/10/2023

While many languages possess processes of joining two or more words to create compound words, previous studies have been typically limited only to languages with excessively productive compound formation (e.g., German, Dutch) and there is no public dataset containing compound and non-compound words across a large number of languages. In this work, we systematically study decompounding, the task of splitting compound words into their constituents, at a wide scale. We first address the data gap by introducing a dataset of 255k compound and non-compound words across 56 diverse languages obtained from Wiktionary. We then use this dataset to evaluate an array of Large Language Models (LLMs) on the decompounding task. We find that LLMs perform poorly, especially on words which are tokenized unfavorably by subword tokenization. We thus introduce a novel methodology to train dedicated models for decompounding. The proposed two-stage procedure relies on a fully self-supervised objective in the first stage, while the second, supervised learning stage optionally fine-tunes the model on the annotated Wiktionary data. Our self-supervised models outperform the prior best unsupervised decompounding models by 13.9% accuracy on average. Our fine-tuned models outperform all prior (language-specific) decompounding tools. Furthermore, we use our models to leverage decompounding during the creation of a subword tokenizer, which we refer to as CompoundPiece. CompoundPiece tokenizes compound words more favorably on average, leading to improved performance on decompounding over an otherwise equivalent model using SentencePiece tokenization. Comment: EMNLP 202

arXiv.org e-Print Archive

Compounding in Namagowab and English: (exploring meaning creation in compounds)

Author: Caroline Kloppert
Publication venue: 'Japanese Association of Sign Linguistics'
Publication date: 01/01/2016

This essay investigates compounding in Namagowab and English, which belong to two widely divergent language groups, Khoesan and Indo-European, respectively. The first motive is to investigate how and why new words are created from existing ones. The reading and data interpretation seek an understanding of word formation and an overview of semantic compositionality, structure and productivity, within the broad context of the cognitive, lexicalist and distributed morphology paradigms. This, coupled with historical reading about the languages and their people, is used to speculate about why compounds feature in lexical creation. Compounding is prevalent in both languages, and their distance in terms of phylogenetic relationships should allow limited generalizing about these processes of formation. Word lists taken from dictionaries in both languages were analyzed by entering the words in Excel spreadsheets so that various attributes of these words, such as word type, compound class (Noun, Verb, Preposition, Adjective and Adverb) and constituent class, could be counted and described with formulae, and compound and constituent meaning analyzed. The conclusion was that socio-historical factors such as language contact, and aspects of cognition such as memory and transparency, account for compounding in a language in addition to typology.

Cape Town University OpenUCT

Multilingualism and the structure of code-mixing

Author: Sippola Eeva
Publication venue: Routledge
Publication date: 01/01/2020

Non peer reviewed

Crossref

Helsingin yliopiston digitaalinen arkisto

Recognition, Regulation, Revitalisation

Publication venue: 'AFRICAN SUN MeDIA'
Publication date: 25/10/2022

Recognition, Regulation, Revitalisation: Place Names and Indigenous Languages is a selection of double-blind peer-reviewed papers from the 5th International Symposium on Place Names that took place 18-20 September 2020 in Clarens, South Africa. The symposium celebrated 2019 as the International Year of Indigenous Languages as declared by the United Nations.

Directory of Open Access Books (DOAB)