
    Evaluation of large language models using an Indian language LGBTI+ lexicon

    Large language models (LLMs) are typically evaluated on the basis of task-based benchmarks such as MMLU. Such benchmarks do not examine responsible behaviour of LLMs in specific contexts. This is particularly true in the LGBTI+ context where social stereotypes may result in variation in LGBTI+ terminology. Therefore, domain-specific lexicons or dictionaries may be useful as a representative list of words against which the LLM's behaviour needs to be evaluated. This paper presents a methodology for evaluation of LLMs using an LGBTI+ lexicon in Indian languages. The methodology consists of four steps: formulating NLP tasks relevant to the expected behaviour, creating prompts that test LLMs, using the LLMs to obtain the output and, finally, manually evaluating the results. Our qualitative analysis shows that the three LLMs we experiment on are unable to detect underlying hateful content. Similarly, we observe limitations in using machine translation as a means to evaluate natural language understanding in languages other than English. The methodology presented in this paper can be useful for LGBTI+ lexicons in other languages as well as other domain-specific lexicons. The work done in this paper opens avenues for responsible behaviour of LLMs, as demonstrated in the context of prevalent social perception of the LGBTI+ community. Comment: Selected for publication in the AI Ethics Journal published by the Artificial Intelligence Robotics Ethics Society (AIRES).

    Contact, the feature pool and the speech community: The emergence of Multicultural London English

    In Northern Europe’s major cities, new varieties of the host languages are emerging in the multilingual inner cities. While some analyse these ‘multiethnolects’ as youth styles, we take a variationist approach to an emerging ‘Multicultural London English’ (MLE), asking: (1) what features characterise MLE? (2) at what age(s) are they acquired? (3) is MLE vernacularised? (4) when did MLE emerge, and what factors enabled its emergence? We argue that innovations in the diphthongs and the quotative system are generated from the specific sociolinguistics of inner-city London, where at least half the population is undergoing group second-language acquisition and where high linguistic diversity leads to a feature pool to select from. We look for incrementation (Labov) in the acquisition of the features, but find this only for two ‘global’ changes, BE LIKE and GOOSE-fronting, for which adolescents show the highest usage. Community-internal factors explain the age-related variation in the remaining features.

    Epistemological access through lecture materials in multiple modes and language varieties: the role of ideologies and multilingual literacy practices in student evaluations of such materials at a South African University

    This paper seeks to address the ways in which ideology and literacy practices shape the responses of students to an ongoing initiative at the University of the Western Cape aimed at diversifying options for epistemological access, specifically the language varieties and the modes in which parts of the curriculum for a third-year linguistics module are delivered. Students’ responses to the materials in English and in two varieties of Afrikaans and isiXhosa (as mediated in writing vs orally) are determined, and used as a basis to problematize decisions on language variety and mode in language diversification initiatives in Higher Education in South Africa. The findings of the paper are juxtaposed against particular group interests in the educational use of a language as well as differences in the affordances and impact of different modes of language use. The paper suggests that beyond the euphoria of using languages other than English in South African Higher Education, several issues (such as entrenched language practices, beliefs and language management orientations) require attention if the goals of transformation in this sector are to be attained.

    Linguistically-Informed Neural Architectures for Lexical, Syntactic and Semantic Tasks in Sanskrit

    The primary focus of this thesis is to make Sanskrit manuscripts more accessible to the end-users through natural language technologies. The morphological richness, compounding, free word orderliness, and low-resource nature of Sanskrit pose significant challenges for developing deep learning solutions. We identify four fundamental tasks, which are crucial for developing a robust NLP technology for Sanskrit: word segmentation, dependency parsing, compound type identification, and poetry analysis. The first task, Sanskrit Word Segmentation (SWS), is a fundamental text processing task for any downstream application. However, it is challenging due to the sandhi phenomenon that modifies characters at word boundaries. Similarly, the existing dependency parsing approaches struggle with morphologically rich and low-resource languages like Sanskrit. Compound type identification is also challenging for Sanskrit due to the context-sensitive semantic relation between components. All these challenges result in sub-optimal performance in NLP applications like question answering and machine translation. Finally, Sanskrit poetry has not been extensively studied in computational linguistics. While addressing these challenges, this thesis makes various contributions: (1) The thesis proposes linguistically-informed neural architectures for these tasks. (2) We showcase the interpretability and multilingual extension of the proposed systems. (3) Our proposed systems report state-of-the-art performance. (4) Finally, we present a neural toolkit named SanskritShala, a web-based application that provides real-time analysis of input for various NLP tasks. Overall, this thesis contributes to making Sanskrit manuscripts more accessible by developing robust NLP technology and releasing various resources, datasets, and a web-based toolkit. Comment: Ph.D. dissertation