44 research outputs found

    A Sentiment Analysis Dataset for Code-Mixed Malayalam-English

    There is an increasing demand for sentiment analysis of text from social media, which is mostly code-mixed. Systems trained on monolingual data fail for code-mixed data due to the complexity of mixing at different levels of the text. However, very few resources are available for code-mixed data to create models specific for this data. Although much research in multilingual and cross-lingual sentiment analysis has used semi-supervised or unsupervised methods, supervised methods still perform better. Only a few datasets for popular languages such as English-Spanish, English-Hindi, and English-Chinese are available. There are no resources available for Malayalam-English code-mixed data. This paper presents a new gold standard corpus for sentiment analysis of code-mixed text in Malayalam-English annotated by voluntary annotators. This gold standard corpus obtained a Krippendorff's alpha above 0.8 for the dataset. We use this new corpus to provide the benchmark for sentiment analysis in Malayalam-English code-mixed texts.

    Language variation in Gulf Pidgin Arabic

    PhD Thesis. [...] works such as Smart 1990, Hobrom 1996, Wiswal 2002, Gomaa 2007, Almoaily 2008, Naess 2008, Bakir 2010, and Alshammari 2010. Importantly, since GPA is spoken by a non-indigenous workforce over a wide geographical area in a multi-ethnic speech community, language variation seems inevitable. However, to date, there is no account of variation in GPA conditioned by substrate language or length of stay. Therefore, in this thesis I analyse the impact of the first language of the speakers and the number of years of residency in their location in the Gulf as potential factors conditioning language variation in GPA. The database for the study consists of interviews with sixteen informants from three linguistic backgrounds: Malayalam, Bengali, and Punjabi. Interviews were conducted in two cities in Saudi Arabia: Riyadh and Alkharj. Half of the data is produced by informants who had spent five or fewer years in the Gulf, while the other half is produced by informants who had spent ten or more years in the Gulf by the time they were interviewed. The analysis is based on ten morpho-syntactic phenomena: free or bound object or possessive pronoun, presence or absence of the Arabic definiteness marker, presence or absence of Arabic conjunction markers, presence or absence of the GPA copula, and presence or absence of agreement in the verb phrase and the noun phrase. Given the fact that most of the current theories on contact languages have been made on the basis of Indo-European language based pidgins and creoles, analysing the above features in an Arabic-based pidgin promises to be a great addition to the literature of pidgins and creoles. Results of this thesis show that both first language and number of years of stay in the Gulf seem to have little effect on my informants’ choices as regards the studied morpho-syntactic features. There is a significant adaptation to the system of Gulf Arabic (the lexifier language) only with respect to one feature: conjunction markers.
This finding could be taken to support Universalist theories of the emergence of contact languages. However, some substratal effect can still be noticed in the data.

    ‘North Indians’ and ‘south Indians’ online: a discursive psychological study of the use of membership categories on social media

    This thesis examines the use of categories ‘north Indian’ and ‘south Indian’ by social media users in online conversations. Anecdotal evidence shows that people use these categories in everyday conversations with friends, family members, colleagues, and peers, to discuss differences of language, geography, race, ethnicity, and caste between peoples of India. However, they are elusive categories in academic literature. My review of India’s social and cultural history of the 19th and 20th century suggests that current scholarship has not examined the use of the explicitly labelled categories ‘north Indian’ or ‘south Indian.’ There have instead been studies of the peoples of north India and south India through accounts of other socially constructed categories like language, caste, region, or race. In the 20th century these other constructed categories and the peoples of north India and south India were also mobilised in political movements such as state reorganisation on linguistic basis in 1956 and were therefore of academic interest for political scientists. Despite serving such varied purposes, the specific use of ‘north Indian’ and ‘south Indian’ has not been systematically investigated in social psychology. Moreover, the interest in these groups has seemingly declined in these other fields such as history, political science, and anthropology. While these categories have not been extensively investigated in academic literature, I reviewed some work from media and culture studies; my analysis of films and popular culture shows that descriptions of ‘north Indian’ and ‘south Indian’ people are depicted in films and media through storylines, song lyrics, dialogues, costumes, and food. Interactions from social media also reveal that people use these categories in mundane conversations with each other. 
Drawing on the interest of Conversation Analysis (CA) and Discursive Psychology (DP) in categories, and on their assumption that categories are mobilised for local purposes in interactions, I systematically examine how, when, and for what purposes these categories (‘north Indian’ or ‘south Indian’) are used in such mundane, social media interactions. The data collected for analysis includes ‘threads’ from Twitter and Question-and-Answer ‘posts’ from Quora. By taking a CA/DP approach, I identify and examine four contexts in which the categories ‘north Indian’ and ‘south Indian’ are invoked. The first is that of agreeing or disagreeing with a food assessment of a south Indian food, idlis. The analysis shows that membership of the category ‘south Indian’ was used as an epistemic resource to agree or disagree with the food assessment. I also present some instances wherein the category ‘south Indian’ is invoked as a resource to question the legitimacy of proffering the assessment; this is done by treating the assessed food (idlis) as a cultural object that is tied to membership of the category ‘south Indian.’ The second context in which the categories are invoked is that of complaining about someone’s use of the categories ‘north Indian’ or ‘south Indian.’ This is identified as the complainable matter because the complainer infers it as morally condemnable and criticisable. The complainer also constructs the complainable conduct as a recurring pattern of behaviour and as intentional, which marks the criticism of category use as a complaint. The third context is that of asking ‘loaded’ questions and answering them. I present two questions posted on Twitter that are phrased as information-seeking questions. These questions are also ambiguous, which is exploited by those answering them. This is demonstrated by looking at the answers because they construct the question as doing more than merely seeking information.
I argue that the questions are treated as being ‘loaded’ with unfair expectations of category members. Users reply to these questions with indirect answers, by posing counter questions, or by invoking alternative (more ‘appropriate’) categories and category-bound attributes. The fourth context analysed in this thesis is also of Question-and-Answers, but these data are from Quora. The categories ‘north Indian’ or ‘south Indian’ are invoked within questions seeking descriptions of members of these categories. The answers contain detailed descriptions or lists of attributes or characteristics. In some instances, the answers also contain hedging or disclaimers, which may allow those answering to manage the delicateness of producing general lists and inoculate themselves against accusations of sounding biased or harsh in producing the descriptions of categories ‘north Indian’ or ‘south Indian.’ This examination of the categories allows me to draw some important conclusions. First, ‘north Indian’ and ‘south Indian’ are treated by users as meaningful categories in describing recognisable characteristics of people. I argue that invoking the category ‘north Indian’ or ‘south Indian’ serves critical purposes in an ongoing interaction, like allowing Twitter and Quora users to accomplish actions like agreeing or disagreeing with an assessment, complaining, or asking and answering different types of questions. The discursive analysis also allows me to examine phenomena, like assessing, complaining, and question-answering, and to situate my findings within the existing literature. I also show that social media users make use of various features, like replying, mentioning, liking or upvoting, and adding emojis, to aid in accomplishing the various discursive actions identified. This adds to the ongoing conversation analytic study of interactions in the virtual space, particularly on social media.
Importantly, I argue that the categories ‘north Indian’ and ‘south Indian’ are very much ‘alive’, meaningful, and functional to the people using them and this thesis is a novel study to examine members’ use of these seemingly ‘elusive’ categories.

    Topics in the morphophonology of standard spoken Tamil (SST) : an optimality theoretic study

    This thesis provides a novel account of the morphophonology of Standard Spoken Tamil (SST) in a constraint-based framework. Special focus is given to the constraints governed by sonority-distance in avoiding possible tension at morphology and phonology interfaces (M-P interfaces). The study is based on a thorough analysis of an extensive body of data which constitute empirical evidence for the present research. It has been argued that the repair strategies devised at M-P interfaces can be properly predicted from the perspective of sonority distance between the segments occupying the edges of the preceding and succeeding lexical items. This thesis consists of seven chapters. The first chapter, in addition to laying a background for the present study, also gives theoretical and empirical evidence justifying the need for conducting a constraint-based study for long-running issues on the morphophonology of Tamil. The chapter includes an overview of widely applied SST in Malaysia, the source which provided statistical and empirical evidence for the present study, a brief review of the related literature, and description of the aims of the study, research questions, methodology, limitations of the study and the organization of the chapters. Chapter two, the theoretical framework of sonority-related repair strategies (SrRS) at M-P interfaces in Tamil, introduces the theoretical framework guiding the present thesis. This chapter illustrates the sonority requirement underpinning the solutions at different types of interfaces, namely, vowel hiatus ((i) vowel versus vowel (V-V)), onset/coda asymmetry ((ii) consonant versus consonant (C-C)), general alignment ((iii) consonant versus vowel (C-V)), and less-preferred interaction of (iv) the vowel versus consonant (V-C). This chapter clarifies the relevance of sonority distance and the selection of the correct strategies to resolve conflict at M-P interfaces. The third chapter is on the prosodic phonology of the SST. 
It provides a description of the prosodic phonology of standard spoken Tamil without relying upon a particular theoretical framework. The description is intended to provide insight into the overall phonological patterns of lexemes and the phonological properties of the language. Chapter four, vowel hiatus (_V# + #V_) and SrRS in Tamil, deals with issues relating to vowel hiatus (VH), which commonly emerge when two vowels come into contact as a result of morphological concatenation. Tamil, as an agglutinative language, applies various processes to words, which results in various types of _V# + #V_ interfaces. The language employs a range of sonority related resolutions to avoid vowel hiatus, with the sole aim of maintaining the uniformity of word internal syllables and preserving harmonic contact at the M-P interfaces. This chapter explores the sonority-related motivation behind the assignment of glides, vowel deletion (VD), and epenthesis to avoid hiatus. Chapter five is on _C# versus #C_ interfaces and conflict management in Tamil. It deals with sonority-related resolutions applied to avoid Onset-Coda asymmetries in Tamil. Irregularities resulting from consonant versus consonant (_C# versus #C_ ) interaction at M-P interfaces are aggressively initiated by various segmental and sub-segmental properties. Involvement of segmental values including the visible individual segmental values and the invisible sub-strength properties such as sonority, prosodic features and the positional prominences at the interfaces have been analyzed within the positional faithfulness framework in this chapter. Chapter six deals with _C#_#V_ (C-V) and _V#_C#_ (V-C) types of interactions in Tamil. Though these interactions appear to be a simple form of interaction at face value, they exhibit systematic and interesting phonological reactions at M-P interfaces.
Previous studies analyzing the nature of the phonological reactions of C-V and V-C in literature, which have treated the foregoing interfaces as a natural way of forming demisyllables, have to a great extent obscured their amazing phonological relevance. The present study offers alternative remedies, claiming that the C-V and V-C interfaces are hosting equally important phonological reactions just as in the case of vowel hiatus (V-V) and coda and onset asymmetry (C-C), casting relevance on sonority distance. The last chapter is the conclusion. It provides a summary and discussion of the findings. (EThOS - Electronic Theses Online Service; University of Malaya; GB)

    Prosodic analysis and Asian linguistics : to Honour R.K. Sprigg


    Handbook of Lexical Functional Grammar

    Lexical Functional Grammar (LFG) is a nontransformational theory of linguistic structure, first developed in the 1970s by Joan Bresnan and Ronald M. Kaplan, which assumes that language is best described and modeled by parallel structures representing different facets of linguistic organization and information, related by means of functional correspondences. This volume has five parts. Part I, Overview and Introduction, provides an introduction to core syntactic concepts and representations. Part II, Grammatical Phenomena, reviews LFG work on a range of grammatical phenomena or constructions. Part III, Grammatical modules and interfaces, provides an overview of LFG work on semantics, argument structure, prosody, information structure, and morphology. Part IV, Linguistic disciplines, reviews LFG work in the disciplines of historical linguistics, learnability, psycholinguistics, and second language learning. Part V, Formal and computational issues and applications, provides an overview of computational and formal properties of the theory, implementations, and computational work on parsing, translation, grammar induction, and treebanks. Part VI, Language families and regions, reviews LFG work on languages spoken in particular geographical areas or in particular language families. The final section, Comparing LFG with other linguistic theories, discusses LFG work in relation to other theoretical approaches.

    Indic Manuscript Cultures through the Ages

    Stemming from the Sanskrit Manuscripts Project that ran in Cambridge (UK) in 2011-2014 and led to the cataloguing and partial digitization of the rich collections of South Asian manuscripts in the University Library, these essays explore the manuscript culture of India and beyond – Nepal, Cambodia, Tibet – from a variety of angles: books as artefacts, works of art, commodities, staples of tradition, and of course as repositories of knowledge.

    Contentious Ethics: Creativity and Persuasion among Environmental Organizers in South India

    What makes a person take up a cause? This ethnographic study of environmental and social activists in Kerala, India examines how they commit themselves to normative visions for social transformation and how they attempt to persuade others to take up these causes as well. Through thick description of the causal forces at play in these processes, I attempt to push beyond the binary between freedom and determinism in ethical life. This study is based on thirty-two months of fieldwork conducted between 2005 and 2014 with activists in Kerala’s “people's struggles,” a mode of grassroots community organizing primarily concerned with the impacts of industrial pollution, land rights, and other environmental conflicts. Fieldwork focused on two groups of activists as they collaborated on a campaign to stop pollution from a suburban gelatin factory. The first group was a local action council formed by nearby residents to protest the health effects of the factory’s emissions. The second group was a network of environmentalists who supported such campaigns as part of a broader effort at radically transforming environmental values. Making use of archival data, recordings of face-to-face interaction, participant observation, and interviews, the study follows activists as they transformed their own ethical lives—learning protest songs, going to marches instead of going to work, or giving up tea and Western medicine—and also as they attempted to persuade others with magazine articles, roadside speeches, and guided tours of pollution. This dissertation challenges dominant accounts of purpose and agency in literatures on social movements, community organizing, and the anthropology of ethics. Drawing on moral philosophy and the linguistic anthropology of stance, I trace relations of influence from evaluating subject to evaluated object, object to subject, and between subjects. 
I show that the causes of people’s struggle activists are best understood not as functions of predetermined interests, nor as the creations of radically free subjects, but as products of activists’ interactions with social others and a value-laden world. Describing the entanglements of changing oneself and changing others in people’s struggle activism, I argue for the importance of various “unfreedoms” in even the most strategic, norm-contesting ethical projects. (PHD; Social Work & Anthropology; University of Michigan, Horace H. Rackham School of Graduate Studies; https://deepblue.lib.umich.edu/bitstream/2027.42/138708/1/jmathias_1.pd)