253 research outputs found

    Assessing the Impact of Vocabulary Similarity on Multilingual Information Retrieval for Bantu Languages

    Despite the availability of massive open information and efforts to promote multilingualism on the Web, content in Bantu languages remains negligible. Additionally, Information Retrieval (IR) systems, such as the Google search engine, use algorithms that work well with languages that have the most content. Similarities across related languages, such as vocabulary overlap, can potentially be exploited to provide more opportunities for information access for languages with limited digital content. This study investigates how vocabulary similarity affects the quality of search results in Multilingual Information Retrieval (MLIR) environments. More specifically, the study evaluates indexing strategies for MLIR and their effect on the quality of retrieval for related languages. A multilingual test collection consisting of two Bantu languages, Citumbuka and Chichewa, and English was developed and used in the evaluation. The results show that when comparing related and unrelated language pairs, MLIR indexing strategies result in comparable or worse retrieval performance.
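
As a rough illustration of the vocabulary overlap this abstract refers to, the sketch below computes a Jaccard similarity between the sets of distinct tokens in two texts. The choice of Jaccard, the function name `vocabulary_overlap`, and the placeholder tokens are assumptions made for illustration; the abstract does not say which similarity measure the study used.

```python
def vocabulary_overlap(tokens_a, tokens_b):
    """Jaccard similarity between two vocabularies (sets of distinct, lowercased tokens)."""
    vocab_a = {t.lower() for t in tokens_a}
    vocab_b = {t.lower() for t in tokens_b}
    union = vocab_a | vocab_b
    if not union:
        # Two empty vocabularies: define the overlap as 0.0 to avoid division by zero.
        return 0.0
    return len(vocab_a & vocab_b) / len(union)


# Placeholder tokens (assumption), not real words from any language.
doc_a = ["alpha", "beta", "gamma", "delta"]
doc_b = ["beta", "gamma", "epsilon"]
print(vocabulary_overlap(doc_a, doc_b))  # 2 shared / 5 distinct tokens
```

A score near 1 would indicate that two collections share most of their word forms, and a score near 0 that they share almost none.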

    Ranking by Language Similarity for Resource Scarce Southern Bantu Languages

    Resource Scarce Languages (RSLs) lack sufficient resources to use Cross-Lingual Information Retrieval (CLIR) techniques and tools such as machine translation. Consequently, searching in RSLs is frustrating and often ends unsuccessfully. In such search tasks, search engines return low-quality results; relevant documents are either few and poorly ranked or non-existent. Previous work has shown that alternative relevant results written in similar languages, including dialects, neighbouring and genetically related languages, can assist multilingual RSL speakers to complete their search tasks. To improve the quality of search results in this context, we propose the re-ranking of documents based on the similarity between the language of the document and the language of the query. Accordingly, we created a dataset of four Southern Bantu languages that includes documents, topics, topical relevance and intelligibility features, and document utility annotations. To understand the intelligibility dimension of the studied languages, we conducted online intelligibility test experiments and used the data for feature selection and intelligibility prediction. We performed re-ranking of search results using offline evaluation, exploring Learning To Rank (LTR). Our results show that integrating topical relevance and intelligibility in ranking slightly improves retrieval effectiveness. Further, results on intelligibility prediction show that classification of intelligibility is feasible at a fair accuracy.
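
The idea of combining topical relevance with intelligibility when re-ranking can be sketched as follows. This is a minimal illustration and not the method of the paper: the abstract reports using Learning To Rank, whereas this sketch substitutes a fixed linear interpolation. The `Result` class, the `rerank` function, the `weight` parameter, and all scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Result:
    doc_id: str
    relevance: float        # assumed topical relevance score in [0, 1]
    intelligibility: float  # assumed intelligibility of the document's language
                            # to a speaker of the query language, in [0, 1]


def rerank(results, weight=0.7):
    """Sort results by a weighted sum of relevance and intelligibility.

    weight=1.0 ranks by relevance only; weight=0.0 by intelligibility only.
    """
    def combined(r):
        return weight * r.relevance + (1.0 - weight) * r.intelligibility

    return sorted(results, key=combined, reverse=True)


# Hypothetical scores for three retrieved documents.
results = [
    Result("d1", relevance=0.9, intelligibility=0.2),
    Result("d2", relevance=0.8, intelligibility=0.9),
    Result("d3", relevance=0.4, intelligibility=1.0),
]
print([r.doc_id for r in rerank(results)])
```

With these made-up scores, a highly relevant document in a hard-to-understand language (`d1`) drops below a slightly less relevant but more intelligible one (`d2`).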

    Intercomprehension in Retrieval: User Perspectives on Six Related Scarce Resource Languages

    The majority of web content is published in languages not accessible to many potential users who may only be able to read and understand their local languages. Prior research has focused on using translation to provide users with information written in other languages, yet there are still many languages with little or no such resources. In this paper, we propose the use of intercomprehension - a form of communication in which speakers of two different languages communicate using their own languages, mainly due to similarities between the languages. Accordingly, we conducted a user study to explore user interaction behaviour in a retrieval environment where intercomprehension is expected; to investigate the usefulness of search results, which assumes intelligibility and relevance; and to investigate affective episodes associated with intercomprehension in retrieval through retrospection. Although intercomprehension may come with a cost to understand unfamiliar languages, user preference of ranking of results in related languages incorporates intelligibility, which assumes intercomprehension. Our findings also suggest that intercomprehension is useful in retrieval for related languages - users are able to identify relevant documents as well as complete search tasks by applying intercomprehension. However, the negative emotions or frustration associated with intercomprehension suggest that this type of interaction should be used in extreme cases where few or no relevant documents are available for the query.

    Low-Resource Language Modelling of South African Languages

    Language models are the foundation of current neural network-based models for natural language understanding and generation. However, research on the intrinsic performance of language models on African languages has been extremely limited, and is made more challenging by the lack of large or standardised training and evaluation sets that exist for English and other high-resource languages. In this paper, we evaluate the performance of open-vocabulary language models on low-resource South African languages, using byte-pair encoding to handle the rich morphology of these languages. We evaluate different variants of n-gram models, feedforward neural networks, recurrent neural networks (RNNs), and Transformers on small-scale datasets. Overall, well-regularized RNNs give the best performance across two isiZulu datasets and one Sepedi dataset. Multilingual training further improves performance on these datasets. We hope that this work will open new avenues for research into multi-lingual and low-resource language modelling for African languages.

    Vocabulary assessment in grade 1 Afrikaans-English bilinguals

    A dissertation submitted to The Department of Speech Pathology and Audiology School of Human and Community Development Faculty of Humanities University of the Witwatersrand In fulfilment of the requirements of the degree Master of Arts in Speech-Pathology March, 2017. Purpose: There is a need to develop and refine assessment measures on bilingual children, since language measures used on monolingual individuals cannot and should not be directly applied to the bilingual population (Hoff et al., 2012; O’Brien, 2015). The occurrence of Afrikaans-English bilinguals in South Africa provides a rewarding area of investigation for the Speech-Language Therapist (SLT) (Penn & Jordaan, 2016), as the Afrikaans language is well-researched and many individuals from this population are considered to be more balanced bilinguals than other bilingual groups (Coetzee-Van Rooyen, 2013). The assessment of vocabulary in bilingual children has received particular attention because limited vocabulary is one of the first signs of language impairment (Ellis & Thal, 2008). This research aimed to determine how Grade 1 Afrikaans-English bilingual children perform on a bilingual vocabulary assessment. Design: A quantitative, descriptive, cross-sectional and comparative design was used in this study. Method: The Expressive One-Word Picture Vocabulary Test 4 (EOWPVT-4) (Martin & Brownell, 2011a) and the Receptive One-Word Picture Vocabulary Test 4 (ROWPVT-4) (Martin & Brownell, 2011b) were used to assess 30 grade 1 English-speaking monolinguals. In addition, an adapted Afrikaans expressive one word vocabulary test based on the EOWPVT-4 and an adapted Afrikaans receptive one word vocabulary test based on the ROWPVT-4 were used to assess 30 grade 1 Afrikaans-English bilinguals. Permission from the schools involved, informed consent from the parent/s or guardian/s as well as child assent were obtained. 
The data gathered from testing was tabulated, interpreted with the use of mean scores and standard deviations (SD) and analysed using within- and between-group statistical comparisons. Mean raw scores were converted to percentages for ease of comparison between receptive and expressive scores. Results: Within-language comparisons revealed that on the English test, receptive and expressive scores within both the English monolingual and bilingual groups were significantly correlated. Expressive scores could therefore be predicted from receptive scores or vice versa in both the English monolingual and bilingual groups. However, the receptive and expressive scores on the Afrikaans tests were not significantly correlated. In the bilingual group, the receptive score in Afrikaans was significantly higher than the expressive score, suggesting that although the bilingual participants had good knowledge of Afrikaans vocabulary they could not always express this in a naming test. They frequently used the English word. Afrikaans is possibly being used less in the home and school environments so that the English words are more familiar. Nonetheless, both the monolingual and bilingual participants had significantly higher scores on the receptive vocabulary assessment than on the expressive vocabulary assessments in both English and Afrikaans. Between-group comparison revealed that the differences between the scores of the English monolingual and Afrikaans-English bilingual learners were not significant on either the receptive or expressive vocabulary measure in English. The bilingual group performed as well as the English participants on the English tests, suggesting that they are not disadvantaged in the language of instruction. 
The norms used in the EOWPVT and the ROWPVT were applicable to both the monolingual and bilingual groups’ scores for the age range of the participants and highlighted that these tests were suitable in assessing an English monolingual and Afrikaans-English bilingual child in South Africa. When composite scoring was used the bilinguals scored significantly better than their monolingual peers on both the receptive and expressive measures, which confirmed the premise behind this study, namely that composite scoring should be used to gain an accurate assessment of a bilingual child’s vocabulary. Adaptation of the English tests into Afrikaans, as opposed to O’Brien’s study (2015), which adapted English tests into isiZulu, may have positively affected the results as all English words had direct translation equivalents in Afrikaans, which was not the case in isiZulu. The comparison between simultaneous and sequential bilinguals within the bilingual group demonstrated that the simultaneous bilinguals’ mean receptive and expressive scores surpassed those obtained by the sequential bilingual participants. A significant difference was identified between simultaneous and sequential bilinguals’ composite receptive scores and Afrikaans expressive scores. Finally, only one monolingual participant scored below the peer group mean on both the receptive and expressive vocabulary tests, indicating low proficiency in English and risk of language impairment; however no bilingual participants were found to be language impaired when composite scoring was used. Conclusion: Bilingualism remains a rewarding area of investigation in South Africa. Afrikaans-English bilingual children performed significantly better than O’Brien’s (2015) isiZulu-English participants on a translated, originally English vocabulary test. Throughout this study the refinement of valid assessment tools for accurate description of bilingual children’s vocabulary was highlighted. 
The well-researched technique of composite scoring has proven to be valuable in avoiding overdiagnosis in South African bilingual children.

    To be or not to be bilingual: cognitive processing skills and literacy development in monolingual English, emergent bilingual Zulu and English, as well as bilingual Afrikaans and English speaking children

    A thesis submitted to the Faculty of Humanities, Department of Psychology at the University of the Witwatersrand, in fulfilment of the requirements for the degree of Doctor of Philosophy October 2016. Literacy in multilingual contexts includes social and cognitive dimensions (GoPaul-McNicol & Armour-Thomas, 1997). Becoming literate carries with it the ability to develop and access higher-order thinking skills that are the building blocks for cognitive academic language proficiency, as well as the means that define educational opportunities (Bialystok, 2007). South Africa has 11 official languages and a multilingual education policy but South African schools are able to determine their language of instruction policy of monolingualism or multilingualism (Heugh, 2010). This raises the question of whether monolingualism or bilingualism influences children’s successful acquisition of reading. It is important to investigate the effect this has on reading processes and skills of monolingual and bilingual children because this issue has received limited research attention while it contributes to our greater understanding of how children’s cognitive capacities for literacy attainment are either constrained or promoted through broader social factors operating in a child’s literacy-learning environment (Bialystok, 2007; Vygotsky, 1978). Cognitive processing and reading skills were assessed in monolingual and bilingual children at a public school in an urban area of Johannesburg. An English-speaking monolingual group with English as the language of instruction (N = 100) was compared with a Zulu-English bilingual group with Zulu as first language (L1) speaking proficiency and English as second language (L2) literacy experience (N = 100) on measures of reading, phonological awareness, vocabulary skills, and working memory. 
Performance in cognitive processing and reading skills of these two groups was compared to an Afrikaans-English bilingual group (N = 100) with dual medium instruction. Tests of language proficiency confirmed that the Afrikaans-English bilinguals were balanced bilinguals and that the Zulu-English bilinguals were partial bilinguals. Aim and method: The purpose of this study was to expand knowledge in the field of second language reading acquisition and language of instruction by examining the impact of language related factors on the cognitive development and literacy competence of monolingual and bilingual children in the South African context. The central tenet of the bio-ecological approach to language, cognitive and reading assessment is that language acquisition is inseparable from the context in which it is learned (Armour-Thomas & Go-Paul-McNicol, 1997). Drawing from this approach, the present research project investigated the effects of the level of orthographic transparency on reading development in the transparent L1 and opaque L2 of biliterate Afrikaans-English bilinguals learning to read in a dual medium school setting. The effects of oral vs. written language proficiency in the L1 on the acquisition of L2 English reading were also investigated by examining whether reading processes and skills transferred from one language to another and the direction or nature of this transfer in partial and balanced bilinguals. Finally, whether a balanced bilingualism and biliteracy experience had beneficial effects on cognitive tasks demanding high levels of working memory capacity was investigated. Results: Reading in Afrikaans – the more transparent orthography – reached a higher competency level than reading in the less transparent English. 
Dual medium learners and L1 English monolingual learners acquired reading skills in their home language(s) at a higher level than L2 English with L1 Zulu speaking proficiency learners did. Dual medium learners outperformed both monolingual learners and L2 English with L1 Zulu speaking proficiency learners on tests of phonological awareness, working memory, and reading comprehension. They also reached competency levels similar to those of monolingual English (L1) learners in tests of vocabulary knowledge. These differences translated into different relationships and strengths for reading attainment in monolingual and bilingual children. These findings provide support for a language-based and context-dependent bio-ecological model of reading attainment for South African children. Conclusions: Bilingual children who are exposed to dual medium reading instruction programmes that value bilingualism philosophically and support it pedagogically create optimal conditions for high levels of cognitive development and academic achievement, both in the first and in the L2. Absence of mother tongue instruction and English-only instruction result in a reading achievement gap between emergent Zulu-English bilinguals and English monolinguals. This effect is not observed in the biliterate Afrikaans-English bilinguals; instead, these children performed better than the English monolinguals on many English tasks and working tasks requiring high levels of executive control and analysis of linguistic knowledge, despite English being their L2 while learning to concurrently read in Afrikaans and English. Arguments for and (misguided) arguments against dual medium education are examined to identify the consequences of translating this model of education into effective schooling practices, given the socio-political contexts in which educational reforms take place at local schools and in communities (Heugh, 2002). 
More broadly, good early childhood education includes a rich language learning environment with skilled, responsive teachers who facilitate children’s literacy learning by providing intentional exposure to and support for vocabulary and concept development. Classroom settings that provide extensive opportunities to build children’s reading competences are beneficial for young dual language learners no less than for children acquiring literacy skills in a one-language environment (Cummins, 2000; Heugh, 2002).

    The relationship between proficiency in multiple languages and working memory: a study of multilingual advantages in South Africa.

    A research project submitted in partial fulfilment of the requirements for the degree of MA in Psychology in the Faculty of Humanities, University of the Witwatersrand, Johannesburg, 20 June 2018. This study explores the relationship between multilingualism and working memory. Multilingual advantages in various executive functions have been established, but little is known about whether multilingual advantages extend to working memory capacity and functioning, or about the effect of speaking more than two languages. In a sample of 189 multilingual young adults in South Africa, this study used a multiple regression design in which numerous aspects of multilingualism - balance in proficiency across and within languages, the age of acquisition of additional languages, and speaking a third language - could be compared with one another while controlling for socio-economic status. Four aspects of working memory (verbal storage, verbal processing, visuospatial storage and visuospatial processing), measured using the Automated Working Memory Assessment (Alloway, 2007), acted as the dependent variables in respective regressions while independent variables measuring multilingualism, including the continuous measures of balance in reading, speaking and understanding proficiency across languages, were based on self-report information from the Language Experience and Proficiency Questionnaire (LEAPQ; Marian, Blumenfeld, & Kaushanskaya, 2007). Balance in proficiency emerged as a strong predictor of the verbal processing component of working memory, while no aspect of multilingualism significantly predicted visuospatial working memory. Combined with other results, this finding suggested that the effect of multilingualism on working memory may not follow the pattern observed in other tasks where multilinguals are advantaged in domain-general executive functions (like inhibitory control) but disadvantaged in linguistic tasks. 
Multilinguals’ experience in storing and processing linguistic information may lead to advantages (possibly through managing attention) that are specific to this kind of information. Keywords: bilingual advantage, executive function, multilingual advantage, trilingualism, working memory.