    Sindhī Multiscriptality, Past and Present: A Sociolinguistic Investigation into Community Acceptance

    This thesis is on the sociolinguistics of writing. It investigates the use of scripts for the Sindhī language of South Asia, both from a diachronic and synchronic perspective. The thesis first analyses the rich but understudied script history of the Sindhī language from the tenth century to modern times. In doing so, it investigates the domains in which certain scripts were used, and identifies definite patterns in their distribution. Particular attention is paid to Perso-Arabic and Devanāgarī, which emerged as the two most widely used scripts for the language in the twentieth century. The diachronic analysis draws on several linguistic, literary and other academic works on the Sindhī language and brings to the fore hitherto neglected data on historical script use for the language. The thesis then presents and analyses oral interview data on community opinion on the recent proposal to use the Roman script to read and write Sindhī. The synchronic analysis is based on original fieldwork data, comprising in-depth qualitative interviews with fifty members of the Indian Sindhī community of diverse backgrounds and ages from various geographical locations. Empirically, this work is one of the first to provide a comprehensive diachronic and synchronic review and analysis of script practices in the Sindhī community specifically from a sociolinguistic perspective. It also provides revealing insights into the kinds of expectations an urbanised, highly educated and socioeconomically successful minority has of a writing system for its language. In doing so, the study challenges the prevalent simplistic claim in the literature that minority communities are desirous of seeing their language in writing. Most importantly, this work indicates the emergence of a so-called new variety of Sindhī phonology in India, which differs subtly from the old variety phonology. 
The implications of this subtle shift in phonology for Sindhī pedagogical material form a key part of the findings of this study. Theoretically, this work contributes to the concept of orthographic transfer, which is the phenomenon of phoneme-grapheme correspondences in a particular orthography being inadvertently applied to another orthography. The study also affirms the presence of a scriptal diglossia, or digraphia, in script use for the Sindhī language, where the use of particular scripts for the language is implicitly determined by domain and context. The potential impact of orthographic transfer and digraphia on the pedagogy of lesser-learnt languages is a key part of the study’s findings. Methodologically, the juxtaposition of historical and present-day sociolinguistic factors at play offers a fresh and nuanced look at the rise and fall of scripts in the context of a language with a centuries-old written tradition. The study concludes that usage of a particular script for a language is not the result of a simplistic binary opposition between authoritarian imposition and voluntary choice. Rather, it is a reflection of several pragmatic and symbolic considerations by the community in question. The thesis puts into perspective the various psychological, socioeconomic and cultural forces at work in determining script use for the Sindhī language. In doing so, the thesis makes several additions not just to the existing body of knowledge on the Sindhī language, but also to the fledgling field of inquiry that is the sociolinguistics of writing. These varied and unique contributions set the study apart from previous research on the subject

    Review of Paul Newman and Martha Ratliff, eds., 'Linguistic Fieldwork'

    The State of Language, Endangerment, and Policy in India: A Forking Path

    The Indian subcontinent is one of the most linguistically diverse areas in the world. The 2011 Census of India reports that over 1,950 languages and 720 dialects are spoken in India. Although India is home to speakers of languages from four distinct language families, its people have a shared culture, genetics, and history that spans thousands of years. The languages spoken in India have grown, stymied, and influenced each other before reaching their current state. The multiplicity of languages led to the implementation of institutionalized language protection measures during the Independence period. Despite these efforts, many languages remain at risk of endangerment and extinction. Language endangerment is not a problem unique to India. Ethnologue estimates that approximately 42% of the world’s languages (about 3,018 languages) were endangered as of 2021. Section Ⅰ of this paper will provide background information on language endangerment. Section Ⅱ will discuss the linguistic families that are spoken in India, their history, development, and current speaker range. Section Ⅲ will detail the history of language policy in India in three phases: the Pre-British Colonial Period, the British Colonial Period, and the Independence Period. Finally, Section Ⅳ will discuss the divergent nature of language vitality in India today.

    Glyph guessing for 'oo' and 'ee': spatial frequency information in sound symbolic matching for ancient and unfamiliar scripts.

    In three experiments, we asked whether diverse scripts contain interpretable information about the speech sounds they represent. When presented with a pair of unfamiliar letters, adult readers correctly guess which is /i/ (the 'ee' sound in 'feet'), and which is /u/ (the 'oo' sound in 'shoe') at rates higher than expected by chance, as shown in a large sample of Singaporean university students (Experiment 1) and replicated in a larger sample of international Internet users (Experiment 2). To uncover what properties of the letters contribute to different scripts' 'guessability,' we analysed the visual spatial frequencies in each letter (Experiment 3). We predicted that the lower spectral frequencies in the formants of the vowel /u/ would pattern with lower spatial frequencies in the corresponding letters. Instead, we found that across all spatial frequencies, the letter with more black/white cycles (i.e. more ink) was more likely to be guessed as /u/, and the larger the difference between the glyphs in a pair, the higher the script's guessability. We propose that diverse groups of humans across historical time and geographical space tend to employ similar iconic strategies for representing speech in visual form, and provide norms for letter pairs from 56 diverse scripts

    Persian as Koine: Written Persian in World-Historical Perspective

    Persian emerged as the common language of court life and administration in the Islamic world east of Baghdad in the 8th and 9th centuries (2nd and 3rd centuries into the Islamic era). The process began in Khurasan, the large historical region of southwest-central Asia, which besides the northeast quadrant of modern Iran included most of modern Turkmenistan, Uzbekistan, and Tajikistan, and northern Afghanistan. Persian radiated out from the pre-Islamic cities that became new power centers, filling the vacuum left by the declining political (as distinct from symbolic) role of the Caliphate in Baghdad. Persian spread to its greatest extent five centuries later, under Mongol and Turkic administrations, when it stretched from the Balkans in the west to southern India in the south and along the trade routes into central China in the east. A century later, it began to give way to the rise of vernacular languages, first in the west, where the use of Ottoman Turkish increased in the 15th century. It finally declined significantly in the east in India in the 19th century, where the British replaced it formally with Urdu and English in 1835. Over the past century and a half Persian has undergone a process of functional transformation, passing into the status of a classical language, as locally people began to write in Pashto, Sindhi, Urdu, and other vernaculars in the peripheral territories of the Islamic world. In the 20th century, at the expense of losing its unitary identity and universally standard form, Persian achieved the modern status of national language in three countries: in Afghanistan (where it was renamed dari), in Iran (as Fārsi), and in Tajikistan (where it was renamed tajiki, or tojiki when transliterated from Cyrillic). It is still spoken widely in Pakistan, Uzbekistan, and the southern littoral of the Persian Gulf, and continues to flourish among post-revolutionary diaspora communities in America, Asia, and Europe.

    IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages

    India has a rich linguistic landscape with languages from 4 major language families spoken by over a billion people. The 22 of these languages that are listed in the Constitution of India (referred to as scheduled languages) are the focus of this work. Given the linguistic diversity, high-quality and accessible Machine Translation (MT) systems are essential in a country like India. Prior to this work, there was (i) no parallel training data spanning all the 22 languages, (ii) no robust benchmarks covering all these languages and containing content relevant to India, and (iii) no existing translation models which support all the 22 scheduled languages of India. In this work, we aim to address this gap by focusing on the missing pieces required for enabling wide, easy, and open access to good machine translation systems for all 22 scheduled Indian languages. We identify four key areas of improvement: curating and creating larger training datasets, creating diverse and high-quality benchmarks, training multilingual models, and releasing models with open access. Our first contribution is the release of the Bharat Parallel Corpus Collection (BPCC), the largest publicly available parallel corpora for Indic languages. BPCC contains a total of 230M bitext pairs, of which a total of 126M were newly added, including 644K manually translated sentence pairs created as part of this work. Our second contribution is the release of the first n-way parallel benchmark covering all 22 Indian languages, featuring diverse domains, Indian-origin content, and source-original test sets. Next, we present IndicTrans2, the first model to support all 22 languages, surpassing existing models on multiple existing and new benchmarks created as a part of this work. Lastly, to promote accessibility and collaboration, we release our models and associated data with permissive licenses at https://github.com/ai4bharat/IndicTrans2