Evaluation of large language models using an Indian language LGBTI+ lexicon
Large language models (LLMs) are typically evaluated on the basis of
task-based benchmarks such as MMLU. Such benchmarks do not examine responsible
behaviour of LLMs in specific contexts. This is particularly true in the LGBTI+
context where social stereotypes may result in variation in LGBTI+ terminology.
Therefore, domain-specific lexicons or dictionaries may be useful as a
representative list of words against which the LLM's behaviour needs to be
evaluated. This paper presents a methodology for evaluation of LLMs using an
LGBTI+ lexicon in Indian languages. The methodology consists of four steps:
formulating NLP tasks relevant to the expected behaviour, creating prompts that
test LLMs, using the LLMs to obtain the output and, finally, manually
evaluating the results. Our qualitative analysis shows that the three LLMs we
experiment on are unable to detect underlying hateful content. Similarly, we
observe limitations in using machine translation as a means to evaluate natural
language understanding in languages other than English. The methodology
presented in this paper can be useful for LGBTI+ lexicons in other languages as
well as other domain-specific lexicons. The work done in this paper opens
avenues for responsible behaviour of LLMs, as demonstrated in the context of
prevalent social perception of the LGBTI+ community.
Comment: Selected for publication in the AI Ethics Journal published by the
Artificial Intelligence Robotics Ethics Society (AIRES)
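The four steps named in the abstract above (formulate tasks, create prompts, obtain LLM output, evaluate manually) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the task names, prompt wording, placeholder lexicon entries, and the `stub_llm` function are all assumptions made for the example, and a real model call would replace the stub.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Record:
    """One lexicon term evaluated on one task."""
    term: str
    task: str
    prompt: str
    output: str
    manual_label: str = ""  # Step 4: filled in later by a human annotator


# Step 1: NLP tasks relevant to the expected behaviour.
# Task names and wording are hypothetical, not taken from the paper.
TASK_TEMPLATES: Dict[str, str] = {
    "explain": "What does the word '{term}' mean?",
    "use_in_sentence": "Write a sentence using the word '{term}'.",
}


def collect_outputs(
    lexicon: List[str],
    templates: Dict[str, str],
    llm: Callable[[str], str],
) -> List[Record]:
    """Steps 2 and 3: build a prompt per (term, task) and query the model."""
    records = []
    for term in lexicon:
        for task, template in templates.items():
            prompt = template.format(term=term)
            records.append(Record(term, task, prompt, llm(prompt)))
    return records


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption for this sketch)."""
    return f"[model output for: {prompt}]"


# Placeholder entries; a real run would load the lexicon itself.
records = collect_outputs(["term_a", "term_b"], TASK_TEMPLATES, stub_llm)
```

Each `Record` leaves `manual_label` empty so that human evaluators can annotate the outputs afterwards, which mirrors the manual final step described in the abstract.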
Contact, the feature pool and the speech community: The emergence of Multicultural London English
In Northern Europe’s major cities, new varieties of the host languages are emerging in the multilingual inner cities. While some analyse these ‘multiethnolects’ as youth styles, we take a variationist approach to an emerging ‘Multicultural London English’ (MLE), asking: (1) what features characterise MLE? (2) at what age(s) are they acquired? (3) is MLE vernacularised? (4) when did MLE emerge, and what factors enabled its emergence? We argue that innovations in the diphthongs and the quotative system are generated from the specific sociolinguistics of inner-city London, where at least half the population is undergoing group second-language acquisition and where high linguistic diversity leads to a feature pool to select from. We look for incrementation (Labov) in the acquisition of the features, but find this only for two ‘global’ changes, BE LIKE and GOOSE-fronting, for which adolescents show the highest usage. Community-internal factors explain the age-related variation in the remaining features.
Epistemological access through lecture materials in multiple modes and language varieties: the role of ideologies and multilingual literacy practices in student evaluations of such materials at a South African University
This paper seeks to address the ways in which ideology and literacy practices shape the responses of students to an ongoing initiative at the University of the Western Cape aimed at diversifying options for epistemological access, specifically the language varieties and the modes in which parts of the curriculum for a third-year linguistics module are delivered. Students’ responses to the materials in English and in two varieties of Afrikaans and isiXhosa (as mediated in writing vs orally) are determined, and used as a basis to problematize decisions on language variety and mode in language diversification initiatives in Higher Education in South Africa. The findings of the paper are juxtaposed against particular group interests in the educational use of a language as well as differences in the affordances and impact of different modes of language use. The paper suggests that beyond the euphoria of using languages other than English in South African Higher Education, several issues (such as entrenched language practices, beliefs and language management orientations) require attention if the goals of transformation in this sector are to be attained.
Linguistically-Informed Neural Architectures for Lexical, Syntactic and Semantic Tasks in Sanskrit
The primary focus of this thesis is to make Sanskrit manuscripts more
accessible to the end-users through natural language technologies. The
morphological richness, compounding, free word order, and low-resource
nature of Sanskrit pose significant challenges for developing deep learning
solutions. We identify four fundamental tasks, which are crucial for developing
a robust NLP technology for Sanskrit: word segmentation, dependency parsing,
compound type identification, and poetry analysis. The first task, Sanskrit
Word Segmentation (SWS), is a fundamental text processing task for any other
downstream applications. However, it is challenging due to the sandhi
phenomenon that modifies characters at word boundaries. Similarly, the existing
dependency parsing approaches struggle with morphologically rich and
low-resource languages like Sanskrit. Compound type identification is also
challenging for Sanskrit due to the context-sensitive semantic relation between
components. All these challenges result in sub-optimal performance in NLP
applications like question answering and machine translation. Finally, Sanskrit
poetry has not been extensively studied in computational linguistics.
While addressing these challenges, this thesis makes various contributions:
(1) The thesis proposes linguistically-informed neural architectures for these
tasks. (2) We showcase the interpretability and multilingual extension of the
proposed systems. (3) Our proposed systems report state-of-the-art performance.
(4) Finally, we present a neural toolkit named SanskritShala, a web-based
application that provides real-time analysis of input for various NLP tasks.
Overall, this thesis contributes to making Sanskrit manuscripts more accessible
by developing robust NLP technology and releasing various resources, datasets,
and a web-based toolkit.
Comment: Ph.D. dissertation