11 research outputs found

    Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop

    Indonesian and Malay are underrepresented in the development of natural language processing (NLP) technologies, and available resources are difficult to find. A clear picture of existing work can invigorate and inform how researchers conceptualise worthwhile projects. Using an education sector project to motivate the study, we conducted a wide-ranging overview of Indonesian and Malay human language technologies and corpus work. We charted 657 included studies according to Hirschberg and Manning's 2015 description of NLP, concluding that the field was dominated by exploratory corpus work, machine reading of text gathered from the Internet, and sentiment analysis. In this paper, we identify the most published authors and research hubs, and make a number of recommendations to encourage future collaboration and efficiency within NLP in Indonesian and Malay.

    Indonesian syllabification using a pseudo nearest neighbour rule and phonotactic knowledge

    © 2016 Elsevier B.V. This paper discusses phonemic syllabification for the Indonesian language using a pseudo nearest neighbour rule (PNNR) and phonotactic knowledge. The proposed data-driven model uses a four-feature phoneme encoding and a phonotactic-based pre-syllabification. Evaluation on a 50k-word dataset using 5-fold cross-validation shows that the proposed encoding significantly reduces the average syllable error rate (SER) by 13.90% relative to the commonly used orthogonal binary encoding, and that the pre-syllabification reduces the average SER by up to 17.17% relative to the PNNR without pre-syllabification. The five-fold cross-validation shows that the proposed PNNR-based syllabification is stable, producing an average SER of 0.64%. Most errors come from derivatives with the prefixes ‘ber’, ‘per’, and ‘ter’ as well as from compound words. The 0.64% average SER is also significantly lower than that of a look-up-based syllabification, which gives an average SER of 2.60%. Suyanto S., Hartati S., Harjoko A., Van Compernolle D., ''Indonesian syllabification using a pseudo nearest neighbour rule and phonotactic knowledge'', Speech Communication, vol. 85, pp. 109-118, December 2016. Status: published.
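
    The abstract above reports syllable error rates and relative reductions without saying how they are computed. The sketch below shows one plausible way to compute both. It is an illustration only: the SER definition used here (the share of reference syllables whose character span is missing from the prediction), the function names, and the toy words are all assumptions, not taken from the paper.

```python
def spans(syllables):
    """Turn a syllable list such as ['ma', 'kan'] into character spans {(0, 2), (2, 5)}."""
    result, start = set(), 0
    for syllable in syllables:
        result.add((start, start + len(syllable)))
        start += len(syllable)
    return result


def syllable_error_rate(predicted, reference):
    """Assumed SER: fraction of reference syllables not reproduced in the prediction."""
    total = wrong = 0
    for pred_word, ref_word in zip(predicted, reference):
        ref_spans = spans(ref_word)
        total += len(ref_spans)
        wrong += len(ref_spans - spans(pred_word))
    return wrong / total


def relative_reduction(baseline, improved):
    """Relative drop from a baseline error rate to an improved one."""
    return (baseline - improved) / baseline


# Toy data, chosen for illustration only (hypothetical system output).
reference = [["ma", "kan"], ["ber", "a", "sal"]]
predicted = [["ma", "kan"], ["be", "ra", "sal"]]

print(syllable_error_rate(predicted, reference))  # 2 of 5 reference syllables missed
print(relative_reduction(2.60, 0.64))             # look-up SER vs PNNR SER from the abstract
```

    Under this definition, the two average SERs quoted in the abstract (2.60% for the look-up method and 0.64% for the PNNR) correspond to a relative reduction of roughly 75%.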

    An Investigation of Intelligibility and Lingua Franca Core Features in Indonesian Accented English

    Recent approaches to teaching pronunciation of English in second or foreign language contexts have favoured the role of students’ L1 accents in the teaching and learning process, with the emphasis on intelligibility and the use of English as a Lingua Franca rather than on achieving native-like pronunciation. As far as English teaching in Indonesia is concerned, there is limited information on the intelligibility of Indonesian Accented English, as well as insufficient guidance on key pronunciation features for effective teaching. This research investigates features of Indonesian Accented English and critically assesses the intelligibility of different levels of Indonesian Accented English. English speech data were elicited from 50 Indonesian speakers using reading texts. Key phonological features of Indonesian Accented English were investigated through acoustic analysis involving spectrographic observation using Praat speech analysis software. The intelligibility of different levels of Indonesian Accented English was measured using a transcription task performed by 24 native and non-native English listeners. The overall intelligibility of each accent was measured by examining the correctness of the transcriptions. The key pronunciation features which caused intelligibility failure were identified by analysing the incorrect transcriptions. The analysis of the key phonological features of Indonesian Accented English showed that while there was some degree of regularity in the production of vowel duration and consonant clusters, more individual variation was observed in segmental features, particularly in the production of the consonants /v, z, ʃ/, which are absent from the Indonesian phonemic inventory. The results of the intelligibility analysis revealed that although lightly and moderately accented speech data were significantly more intelligible than the more heavily accented speech data, the native and non-native listeners did not have major problems with the intelligibility of Indonesian Accented English across the different accent levels. The analysis of incorrect transcriptions suggested that intelligibility failures were associated more with combined phonological miscues than with a single factor. These results indicate that while Indonesian Accented English can be used effectively in international communication, it can also inform English language teaching in Indonesia.

    Currents in Pacific linguistics : papers on Austronesian languages and ethnolinguistics in honour of George W. Grace


    Foreigner-directed speech and L2 speech learning in an understudied interactional setting: the case of foreign-domestic helpers in Oman

    Ph. D. (Integrated) Thesis. Set in Arabic-speaking Oman, the present study investigates whether speech directed to foreign domestic helpers (FDH-directed speech) is modified when compared with speech addressed to native Arabic speakers. It also explores the FDHs’ ability to learn the sound system of their L2 in a near-naturalistic setting. In relation to input, the study explores whether there are any adaptations in native speakers’ realizations of complex Arabic consonants, consonant clusters, and vowels in FDH-directed speech. In doing so, it compares the phonetic features of FDH-directed speech with those of other speech registers such as foreigner-directed speech (FDS), infant-directed speech (IDS) and clear speech. The study also investigates whether foreign accentedness, religion and Arabic language experience, as indexed by length of residence (LoR), play a role in the extent of adaptations present in FDH-directed speech. In relation to L2 speech learning, the study investigates the extent to which FDHs are sensitive to the phonemic contrasts of Arabic and whether their production of complex Arabic consonants and consonant clusters is target-like. It also examines the social and linguistic factors (LoR, first and second language literacy) that play a role in the learnability of these sounds. Speech recordings were collected from 22 Omani female native Arabic speakers who interacted 1) with their FDHs and 2) with a native-speaking adult (the order was reversed for half of the participants), in both instances using a spot-the-difference task. A picture naming task was then used to collect production data from the same FDHs, while perception data were collected with an AX forced-choice task. Results demonstrate the distinctiveness of FDH-directed speech from other speech registers. Neither simplification of complex sounds nor hyperarticulation of consonant contrasts was attested in FDH-directed speech, despite both being reported in other studies on FDS and IDS. We attribute this to the familiarity of the native speakers with their FDHs and the formulaic nature of their daily interactions. Expansion of vowel space was evident in this study, in line with other FDS studies. Results from perception and production tasks revealed that FDHs fell short of native-like performance, despite the more naturalistic setting and regardless of LoR. L1 and L2 literacy played varying roles in FDHs’ phonological sensitivity and production of certain contrasts. The study is original in terms of showing that FDS is not an automatic outcome of interactions with L2 speakers, and it links these results with the unusual social setting.

    Chamic and beyond : studies in mainland Austronesian languages


    Kayardild Morphology, Phonology and Morphosyntax

    Kayardild possesses one of, if not the, most exuberant systems of morphological concord known to linguists, and a phonological system which is intricately sensitive to its morphology. This dissertation provides a comprehensive description of the phonology of Kayardild, an investigation of its phonetics and intonation, and a formal analysis of its inflectional morphology. A key component of the latter is the existence of a ‘morphomic’ level of representation intermediate between morphosyntactic features and underlying phonological forms. Chapter 2 introduces the segmental inventory of Kayardild, the phonetic realisations of surface segments, and their phonotactics. Chapter 3 provides an introduction to the empirical facts of Kayardild word structure, outlining the kinds of morphs of which words are composed, their formal shapes and their combinations. Chapter 4 treats the segmental phonology of Kayardild. After a survey of the mappings between underlying and (lexical) surface forms, the primary topic is the interaction of the phonology with morphology, although major generalisations within the phonology itself are also identified and discussed. Chapter 5 examines Kayardild stress, and presents a constraint-based analysis, before turning to an empirical and analytical discussion of intonation. Chapter 6, on the syntax and morphosyntax of Kayardild, is the most substantial chapter of the dissertation. In association with the examination of a large corpus of new and newly collated data, mutually compatible analyses of the syntax and morphosyntactic features of Kayardild are built up and compared against less favourable alternatives. A critical review of Evans’ (1995a) analysis of similar phenomena is also provided. Chapter 7 turns to the realisational morphology, the component of the grammar which ties the morphosyntax to the phonology by realising morphosyntactic feature structures as morphomic representations, then morphomic representations as underlying phonological representations. A formalism is proposed in order to express these mappings within a constraint-based grammar. In addition to enriching our understanding of Kayardild, the dissertation presents data and analyses which will be of interest for theories of the interface between morphology on the one hand and phonology and syntax on the other, as well as for morphological and phonological theory more narrowly.

    The Mehweb language

    This book is an investigation into the grammar of Mehweb (Dargwa, East Caucasian, also known as Nakh-Daghestanian) based on several years of team fieldwork. Mehweb is spoken in one village community in Daghestan, Russia, with a population of some 800 people. In many ways, Mehweb is a typical East Caucasian language: it has a rich inventory of consonants; an extensive system of spatial forms in nouns and converbs and volitional forms in verbs; pervasive gender-number agreement; and ergative alignment in case marking and in gender agreement. It is also a typical language of the Dargwa branch, with symmetrical verb inflection in the imperfective and perfective paradigm and extensive use of spatial encoding for experiencers. Although Mehweb is clearly close to the northern varieties of Dargwa, it has long been isolated from the main body of Dargwa varieties by speakers of Avar and Lak.

    The Austronesian languages

    This is a revised edition of the 2009 The Austronesian languages, which was published as a paperback in the then Pacific Linguistics series (ISBN 9780858836020). This revision includes typographical corrections, an improved index, and various minor content changes. The release of the open access edition serves to meet the strong ongoing demand for this important handbook, of which only 200 copies of the first edition were printed. This is the first single-authored book that attempts to describe the Austronesian language family in its entirety. Topics covered include: the physical and cultural background, official and national languages, largest and smallest languages in all major geographical regions, language contact, sound systems, linguistic palaeontology, morphology, syntax, the history of scholarship on Austronesian languages, and a critical assessment of the reconstruction of Proto Austronesian phonology. Australian National University, College of Asia and the Pacific.

    Essays on phonology, morphology and syntax

    This book is an investigation into the grammar of Mehweb (Dargwa, East Caucasian, also known as Nakh-Daghestanian) based on several years of team fieldwork. Mehweb is spoken in one village community in Daghestan, Russia, with a population of some 800 people. In many ways, Mehweb is a typical East Caucasian language: it has a rich inventory of consonants; an extensive system of spatial forms in nouns and converbs and volitional forms in verbs; pervasive gender-number agreement; and ergative alignment in case marking and in gender agreement. It is also a typical language of the Dargwa branch, with symmetrical verb inflection in the imperfective and perfective paradigm and extensive use of spatial encoding for experiencers. Although Mehweb is clearly close to the northern varieties of Dargwa, it has long been isolated from the main body of Dargwa varieties by speakers of Avar and Lak. As a result of both independent internal evolution and contact with its neighbours, Mehweb developed some deviant properties, including accusatively aligned egophoric agreement, a split in the feminine class, and the typologically rare grammatical categories of verificative and apprehensive. But most importantly, Mehweb is where our friends live.