Factors Affecting Part-of-Speech Tagging for Tagalog
PACLIC 23 / City University of Hong Kong / 3-5 December 200
An investigation into deviant morphology: issues in the implementation of a deep grammar for Indonesian
This thesis investigates deviant morphology in Indonesian for the implementation of a deep grammar. In particular we focus on the implementation of the verbal suffix -kan. This suffix has been described as having many functions, which alter the kinds of arguments and the number of arguments the verb takes (Dardjowidjojo 1971; Chung 1976; Arka 1993; Vamarasi 1999; Kroeger 2007; Son and Cole 2008). Deep grammars or precision grammars (Butt et al. 1999a; Butt et al. 2003; Bender et al. 2011) have been shown to be useful for natural language processing (NLP) tasks, such as machine translation and generation (Oepen et al. 2004; Cahill and Riester 2009; Graham 2011), and information extraction (MacKinlay et al. 2012), demonstrating the need for linguistically rich information to aid NLP tasks. Although these linguistically-motivated grammars are invaluable resources to the NLP community, the biggest drawback is the time required for the manual creation and curation of the lexicon. Our work aims to expedite this process by applying methods to assign syntactic information to kan-affixed verbs automatically. The method we employ exploits the hypothesis that semantic similarity is tightly connected with syntactic behaviour (Levin 1993). Our endeavour in automatically acquiring verbal information for an Indonesian deep grammar poses a number of linguistic challenges. First of all, Indonesian verbs exhibit voice marking that is characteristic of the subgrouping of its language family. In order to be able to characterise verbal behaviour in Indonesian, we first need to devise a detailed analysis of voice for implementation. Another challenge we face is the claim that all open class words in Indonesian, at least as it is spoken in some varieties (Gil 1994; Gil 2010), cannot linguistically be analysed as being distinct from each other.
That is, there is no distinction between nouns, verbs or adjectives in Indonesian, and all words from the open class categories should be analysed uniformly. This poses difficulties in implementing a grammar in a linguistically motivated way, as well as in discovering the syntactic behaviour of verbs, if verbs cannot be distinguished from nouns. As part of our investigation we conduct experiments to verify the need to employ word class categories, and we find that indeed these are linguistically motivated labels in Indonesian. Through our investigation into deviant morphological behaviour, we gain a better characterisation of the morphosyntactic effects of -kan, and we discover that, although Indonesian has been labelled as a language with no open word class distinctions, word classes can be established as being linguistically-motivated.
Social and structural aspects of language contact and change
This book brings together papers that discuss social and structural aspects of language contact and language change.
Several papers look at the relevance of historical documents to determine the linguistic nature of early contact varieties, while others investigate the specific processes of contact-induced change that were involved in the emergence and development of these languages. A third set of papers look at how new datasets and greater sensitivity to social issues can help to (re)assess persistent theoretical and empirical questions as well as help to open up new avenues of research. In particular they highlight the heterogeneity of contemporary language practices and attitudes often obscured in sociolinguistic research.
The contributions all focus on language variation and change but investigate it from a variety of disciplinary and empirical perspectives and cover a range of linguistic contexts.
What Code-Switching Strategies are Effective in Dialogue Systems?
Since most people in the world today are multilingual, code-switching is ubiquitous in spoken and written interactions. Paving the way for future adaptive, multilingual conversational agents, we incorporate linguistically-motivated strategies of code-switching into a rule-based goal-oriented dialogue system. We collect and release CommonAmigos, a corpus of 587 human-computer text conversations between our dialogue system and human users in mixed Spanish and English. From this new corpus, we analyze the amount of elicited code-switching, preferred patterns of user code-switching, and the impact of user demographics on code-switching. Based on these exploratory findings, we give recommendations for future effective code-switching dialogue systems, highlighting user's language proficiency and gender as critical considerations.