Evaluation of large language models using an Indian language LGBTI+ lexicon

Authors: Dange Alpana, Joshi Aditya, Rawat Shruta
Publication date: 26/10/2023

Large language models (LLMs) are typically evaluated on task-based benchmarks such as MMLU. Such benchmarks do not examine the responsible behaviour of LLMs in specific contexts. This is particularly true in the LGBTI+ context, where social stereotypes may result in variation in LGBTI+ terminology. Domain-specific lexicons or dictionaries may therefore be useful as a representative list of words against which an LLM's behaviour needs to be evaluated. This paper presents a methodology for evaluating LLMs using an LGBTI+ lexicon in Indian languages. The methodology consists of four steps: formulating NLP tasks relevant to the expected behaviour, creating prompts that test the LLMs, using the LLMs to obtain outputs and, finally, manually evaluating the results. Our qualitative analysis shows that the three LLMs we experiment on are unable to detect underlying hateful content. Similarly, we observe limitations in using machine translation as a means to evaluate natural language understanding in languages other than English. The methodology presented in this paper can be useful for LGBTI+ lexicons in other languages as well as for other domain-specific lexicons. This work opens avenues towards responsible behaviour of LLMs, as demonstrated in the context of the prevalent social perception of the LGBTI+ community.
Comment: Selected for publication in the AI Ethics Journal published by the Artificial Intelligence Robotics Ethics Society (AIRES)

arXiv.org e-Print Archive
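The four-step methodology in the abstract above (formulate tasks, create prompts, obtain outputs, evaluate manually) can be pictured as a small pipeline. The sketch below is illustrative only and assumes details the abstract does not give: the task names and prompt templates in `TASKS`, the placeholder lexicon entries, and the `stub_model` function are all hypothetical stand-ins, not the authors' prompts, lexicon, or LLM calls. Step 4 is manual, so each record's `label` is left empty for a human annotator.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

# Step 1 (hypothetical): task templates; "{term}" is filled with each lexicon entry.
TASKS: Dict[str, str] = {
    "define": "What does the word '{term}' mean?",
    "translate": "Translate the word '{term}' into English.",
    "hate_detection": "Does the following sentence contain hateful content? "
                      "Sentence: 'They called him {term}.'",
}


@dataclass
class EvalRecord:
    term: str
    task: str
    prompt: str
    output: str
    label: Optional[str] = None  # filled in later by a human annotator (step 4)


def build_prompts(lexicon: List[str], tasks: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Step 2: create one prompt per (lexicon term, task) pair."""
    return [
        (term, name, template.format(term=term))
        for term, (name, template) in product(lexicon, tasks.items())
    ]


def run_evaluation(lexicon: List[str], tasks: Dict[str, str],
                   model: Callable[[str], str]) -> List[EvalRecord]:
    """Step 3: query the model; outputs are stored unlabelled for manual review."""
    return [
        EvalRecord(term, task, prompt, model(prompt))
        for term, task, prompt in build_prompts(lexicon, tasks)
    ]


if __name__ == "__main__":
    lexicon = ["term_a", "term_b"]  # placeholders, not entries from the actual lexicon

    def stub_model(prompt: str) -> str:  # hypothetical stand-in for a real LLM call
        return f"<response to: {prompt}>"

    for record in run_evaluation(lexicon, TASKS, stub_model):
        print(record.task, "|", record.prompt, "|", record.output)
```

Keeping the model behind a plain `Callable[[str], str]` means any LLM client could be swapped in without changing the prompt-building or record-keeping steps.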

Contact, the feature pool and the speech community: The emergence of Multicultural London English

Authors: Androutsopoulos, Bickerton, Britain, Brown, Buchstaller, Castells, Chambers, Cheshire, Clyne, Eckert, Gabrielatos, Hickey, Horvath, Inwood, Kerswill, Kotsinas, Krashen, Labov, Lapidus, Lass, Le Page, Lobanov, Macaulay, Mesthrie, Meyerhoff, Milroy, Mufwene, Newton, Quist, Rampton, Roberts, Schilling-Estes, Schumann, Sebba, Siegel, Smith, Svendsen, Tagliamonte, Thomason, Torgersen, Trudgill, Watermeyer, Wells, Wiese, Winford
Publication venue: Wiley
Publication date: 01/04/2011

In Northern Europe’s major cities, new varieties of the host languages are emerging in the multilingual inner cities. While some analyse these ‘multiethnolects’ as youth styles, we take a variationist approach to an emerging ‘Multicultural London English’ (MLE), asking: (1) what features characterise MLE? (2) at what age(s) are they acquired? (3) is MLE vernacularised? (4) when did MLE emerge, and what factors enabled its emergence? We argue that innovations in the diphthongs and the quotative system are generated from the specific sociolinguistics of inner-city London, where at least half the population is undergoing group second-language acquisition and where high linguistic diversity leads to a feature pool to select from. We look for incrementation (Labov) in the acquisition of the features, but find this only for two ‘global’ changes, BE LIKE and GOOSE-fronting, for which adolescents show the highest usage. Community-internal factors explain the age-related variation in the remaining features.

Epistemological access through lecture materials in multiple modes and language varieties: the role of ideologies and multilingual literacy practices in student evaluations of such materials at a South African University

Authors: A Pavlenko, B Busch, B Spolsky, Bassey E. Antia, BE Antia, C Baker, C Van der Walt, Charlyn Dyers, Department of Education, E Ramani, G Kress, J Blaauw, J Brown, J Weber, JT Irvine, L du Plooy, M Absalom, M Paxton, N Fairclough, N Hornberger, O García, R Lund, R Ruiz, S Makoni
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

This paper seeks to address the ways in which ideology and literacy practices shape students’ responses to an ongoing initiative at the University of the Western Cape aimed at diversifying options for epistemological access, specifically the language varieties and the modes in which parts of the curriculum for a third-year linguistics module are delivered. Students’ responses to the materials in English and in two varieties of Afrikaans and isiXhosa (as mediated in writing vs orally) are determined and used as a basis to problematize decisions on language variety and mode in language diversification initiatives in Higher Education in South Africa. The findings of the paper are juxtaposed against particular group interests in the educational use of a language as well as differences in the affordances and impact of different modes of language use. The paper suggests that beyond the euphoria of using languages other than English in South African Higher Education, several issues (such as entrenched language practices, beliefs and language management orientations) require attention if the goals of transformation in this sector are to be attained.

Linguistically-Informed Neural Architectures for Lexical, Syntactic and Semantic Tasks in Sanskrit

Author: Sandhan Jivnesh
Publication date: 17/08/2023

The primary focus of this thesis is to make Sanskrit manuscripts more accessible to end users through natural language technologies. The morphological richness, compounding, free word order, and low-resource nature of Sanskrit pose significant challenges for developing deep learning solutions. We identify four fundamental tasks that are crucial for developing robust NLP technology for Sanskrit: word segmentation, dependency parsing, compound type identification, and poetry analysis. The first task, Sanskrit Word Segmentation (SWS), is a fundamental text-processing step for downstream applications. However, it is challenging due to sandhi, a phenomenon that modifies characters at word boundaries. Similarly, existing dependency parsing approaches struggle with morphologically rich and low-resource languages like Sanskrit. Compound type identification is also challenging for Sanskrit due to the context-sensitive semantic relation between components. All these challenges result in sub-optimal performance in NLP applications such as question answering and machine translation. Finally, Sanskrit poetry has not been extensively studied in computational linguistics. In addressing these challenges, this thesis makes several contributions: (1) The thesis proposes linguistically-informed neural architectures for these tasks. (2) We showcase the interpretability and multilingual extension of the proposed systems. (3) Our proposed systems report state-of-the-art performance. (4) Finally, we present a neural toolkit named SanskritShala, a web-based application that provides real-time analysis of input for various NLP tasks. Overall, this thesis contributes to making Sanskrit manuscripts more accessible by developing robust NLP technology and releasing various resources, datasets, and a web-based toolkit.
Comment: Ph.D. dissertation

arXiv.org e-Print Archive
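The abstract notes that word segmentation is hard because sandhi changes characters at word boundaries, so a correct split must undo the sound change rather than simply cut the string. The toy sketch below illustrates only that point; it is a lexicon-plus-rules search, not the thesis's neural architectures. It assumes Harvard-Kyoto transliteration (`A` = ā, `U` = ū), and both the tiny `LEXICON` and the `REVERSE_SANDHI` table (covering just a few vowel sandhi rules such as a + u → o and ā + i → e) are hypothetical illustrations.

```python
from functools import lru_cache

# Hypothetical toy lexicon in Harvard-Kyoto transliteration (A = long a, U = long u).
LEXICON = {"sUrya", "udaya", "vidyA", "Alaya", "deva", "mahA", "indra"}

# Hypothetical, partial reverse-sandhi table: surface vowel -> possible
# (end of left word, start of right word) pairs that could have produced it.
REVERSE_SANDHI = {
    "A": [("a", "a"), ("a", "A"), ("A", "a"), ("A", "A")],
    "o": [("a", "u"), ("A", "u")],
    "e": [("a", "i"), ("A", "i")],
}


def segment(text, lexicon=LEXICON, rules=REVERSE_SANDHI):
    """Return every split of `text` into lexicon words, undoing vowel sandhi."""

    @lru_cache(maxsize=None)
    def solve(s):
        results = []
        if s in lexicon:  # the whole remaining string is one word
            results.append((s,))
        for i in range(1, len(s)):
            # Plain boundary: the two words are simply concatenated.
            left, right = s[:i], s[i:]
            if left in lexicon:
                results.extend((left,) + rest for rest in solve(right))
            # Sandhi boundary: the surface character s[i] replaced two sounds.
            # Starting at i = 1 keeps `right` strictly shorter than `s`,
            # so the recursion always terminates.
            for left_end, right_start in rules.get(s[i], []):
                left = s[:i] + left_end
                right = right_start + s[i + 1:]
                if left in lexicon:
                    results.extend((left,) + rest for rest in solve(right))
        return tuple(results)

    return list(solve(text))


if __name__ == "__main__":
    for word in ["sUryodaya", "vidyAlaya", "mahendra"]:
        print(word, "->", segment(word))
```

A naive splitter finds nothing for `sUryodaya`, because neither `sUry` nor `sUryo` is a word. Only the reverse rule o ← a + u recovers `sUrya` + `udaya`. Real SWS systems must also rank the many candidate splits that a full rule set produces, which is one motivation for the learned models the thesis describes.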