56 research outputs found

    Automatic Extraction Of Malay Compound Nouns Using A Hybrid Of Statistical And Machine Learning Methods

    Identifying compound nouns is important for a wide spectrum of natural language processing applications, such as machine translation and information retrieval. Extracting compound nouns requires deep or shallow syntactic preprocessing tools and large corpora. This paper investigates several methods for extracting noun compounds from Malay text corpora. First, we present empirical results for sixteen statistical association measures applied to the extraction of Malay <N+N> compound nouns. Second, we introduce the possibility of integrating multiple association measures. Third, this work provides a standard dataset intended as a common platform for evaluating research on the identification of compound nouns in Malay. The dataset contains 7,235 unique N-N candidates, of which 2,970 are N-N compound noun collocations. The extraction algorithms are evaluated against this reference dataset. The experimental results demonstrate that a group of association measures (T-test, Piatetsky-Shapiro (PS), C-value and FGM), together with the rank combination method, performs best and outperforms the other association measures for <N+N> collocations in the Malay corpus. Finally, we describe several classification methods for combining the scores of the basic association measures, followed by their evaluation. The evaluation shows that the classification algorithms significantly outperform the individual association measures, with quite satisfactory precision, recall and F-score.
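The abstract does not give formulas, so the following is only a minimal sketch of how two of the named measures and a rank combination could be computed over N+N bigram candidates. It assumes textbook formulations (t-score as (observed - expected) / sqrt(observed); Piatetsky-Shapiro as P(xy) - P(x*)P(*y)), a hypothetical "NOUN" tag on pre-tagged input, and a simple sum-of-ranks combination; C-value, FGM and the paper's actual combination method are not shown.

```python
from collections import Counter
from math import sqrt


def bigram_counts(tagged_sentences, noun_tag="NOUN"):
    """Count adjacent noun+noun pairs and the first/second-position
    marginals over all bigrams in the corpus."""
    pairs, first, second = Counter(), Counter(), Counter()
    n = 0  # total number of bigrams
    for sent in tagged_sentences:
        for (w1, t1), (w2, t2) in zip(sent, sent[1:]):
            n += 1
            first[w1] += 1
            second[w2] += 1
            if t1 == noun_tag and t2 == noun_tag:
                pairs[(w1, w2)] += 1
    return pairs, first, second, n


def t_score(o, f1, f2, n):
    """t-test score: (observed - expected) / sqrt(observed)."""
    expected = f1 * f2 / n
    return (o - expected) / sqrt(o)


def piatetsky_shapiro(o, f1, f2, n):
    """Piatetsky-Shapiro: P(xy) - P(x*) * P(*y)."""
    return o / n - (f1 / n) * (f2 / n)


def rank_combination(score_tables):
    """Sum each candidate's rank across measures (rank 1 = highest score);
    a lower rank sum means a stronger candidate. Assumed scheme."""
    totals = Counter()
    for scores in score_tables:
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, cand in enumerate(ordered, start=1):
            totals[cand] += rank
    return sorted(totals, key=totals.get)


def rank_nn_candidates(tagged_sentences):
    """Score every observed N+N pair with both measures, then combine ranks."""
    pairs, first, second, n = bigram_counts(tagged_sentences)
    tables = [
        {c: measure(o, first[c[0]], second[c[1]], n) for c, o in pairs.items()}
        for measure in (t_score, piatetsky_shapiro)
    ]
    return rank_combination(tables)
```

For example, on a two-sentence corpus containing "kereta api" ('train') twice, `rank_nn_candidates` returns `[("kereta", "api")]`. Summing ranks rather than raw scores sidesteps the fact that the measures live on very different scales.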

    Transforming noun phrase structure form into rules to detect compound nouns in Malay sentences

    This paper addresses the process of transforming the noun phrase structure form into a list of rules for detecting compound nouns in Malay sentences. Rules are collections of word-syntax patterns derived from a specific resource (as defined in our study). Understanding the concept of the rules used in a system is important (i.e. using rules to find the compound nouns that may exist in a sentence). The noun phrase frame structure is a form that contains a list of noun modifier categories. This list is then divided into several subcategories, such as numeral, numeral classifier and appellation. All categories are arranged in sequence according to correct grammar. The noun phrase frame structure is then used to analyse the sentence: the words in the sentence are arranged according to their suitable noun modifier category, as defined by the frame. In terms of data requirements, we focus only on example sentences that combine two noun phrases.
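As a minimal sketch of turning an ordered frame into a detection rule, the code below checks that a phrase's categories follow the frame order and then reports head-noun + noun-modifier pairs as compound candidates. The slot names (`NUM`, `CLF`, `APP`, `N`, `NMOD`, `DEM`), their order and the pairing rule are hypothetical illustrations, not the paper's actual frame or rule set, and the input is assumed to be already labelled with these categories.

```python
# Hypothetical frame: slot order loosely follows common descriptions of the
# Malay noun phrase (numeral, classifier, appellation, head noun,
# noun modifier, demonstrative). The paper's actual categories may differ.
NP_FRAME = ["NUM", "CLF", "APP", "N", "NMOD", "DEM"]


def fits_frame(categories, frame=NP_FRAME):
    """True if every category is a frame slot and the slots appear in frame
    order (a slot may repeat, but none may precede an earlier slot)."""
    if any(c not in frame for c in categories):
        return False
    positions = [frame.index(c) for c in categories]
    return all(a <= b for a, b in zip(positions, positions[1:]))


def compound_candidates(tagged_phrase, frame=NP_FRAME):
    """Return (head, noun-modifier) pairs from a phrase that fits the frame."""
    if not fits_frame([cat for _, cat in tagged_phrase], frame):
        return []
    return [
        (w1, w2)
        for (w1, c1), (w2, c2) in zip(tagged_phrase, tagged_phrase[1:])
        if c1 == "N" and c2 == "NMOD"
    ]
```

For the phrase "dua buah kereta api itu" ('those two trains'), labelled NUM CLF N NMOD DEM, this returns `[("kereta", "api")]`; a phrase whose categories break the frame order yields no candidates.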

    Detection of Compound Word with Combination Noun and Adjective using Rule Based Technique in Malay Standard Document

    In this paper we describe our methods for detecting compound words formed from a noun and an adjective (Noun + Adjective compound nouns) in Malay standard documents. We address the problem of detecting when a noun and an adjective in a Malay sentence combine to form a compound word. We modified several identification rules based on Malay grammar rules and syntactic information to increase recall, precision and F1-score. For compound word identification, we used dictionary and thesaurus information to assign Part of Speech (POS) tags to all words in the selected Malay document. Testing was done on a selected Malay document. The results showed an improvement over previous research, with a precision of 90.9%, a recall of 10.2% and an F1-score of 18.1%.
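A minimal sketch of the general approach, dictionary-based POS lookup followed by a Noun + Adjective adjacency rule, is shown below. The toy lexicon, the tag names and the optional whitelist filter are all hypothetical stand-ins for the paper's dictionary, thesaurus and modified grammar rules; a bare N+ADJ rule over-generates (it also matches ordinary phrases such as "rumah besar", 'big house'), which is why some filtering step is needed.

```python
# Hypothetical toy lexicon standing in for the dictionary/thesaurus lookup.
LEXICON = {
    "orang": "N", "tua": "ADJ", "air": "N", "tawar": "ADJ",
    "rumah": "N", "besar": "ADJ", "itu": "DET", "minum": "V",
}
# Hypothetical list of attested compounds used as a precision filter.
KNOWN_COMPOUNDS = {("orang", "tua"), ("air", "tawar")}


def pos_tag(tokens, lexicon=LEXICON):
    """Dictionary-based tagging: words missing from the lexicon get 'UNK'."""
    return [(tok, lexicon.get(tok.lower(), "UNK")) for tok in tokens]


def detect_noun_adj(tokens, lexicon=LEXICON, known=None):
    """Return adjacent noun+adjective pairs; if `known` is given, keep only
    pairs found in it (higher precision, lower recall)."""
    tagged = pos_tag(tokens, lexicon)
    pairs = [
        (w1, w2)
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:])
        if t1 == "N" and t2 == "ADJ"
    ]
    if known is not None:
        pairs = [p for p in pairs if (p[0].lower(), p[1].lower()) in known]
    return pairs


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 (harmonic mean of P and R)."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(ref) if ref else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

On "orang tua itu minum air tawar di rumah besar", the unfiltered rule returns `orang tua` ('parents'), `air tawar` ('fresh water') and `rumah besar`; with `known=KNOWN_COMPOUNDS` only the first two survive. Because F1 is the harmonic mean, it stays close to the lower of precision and recall, which is why a precision of 90.9% with a recall of 10.2% yields an F1 below 20%.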

    Tagging narrator’s names in Hadith text

    No abstract. Keywords: tagging; hadith text; nam

    Papers in New Guinea Linguistics No. 22

    Cappadocian kinship

    Cappadocian kinship systems are very interesting from a sociolinguistic and anthropological perspective because of the mixture of inherited Greek and borrowed Turkish kinship terms. Precisely because the number of Turkish kinship terms differs from one variety to another, it is necessary to talk about Cappadocian kinship systems in the plural rather than about the Cappadocian kinship system in the singular. Although reference will be made to other Cappadocian varieties, this paper will focus on the kinship systems of Mišotika and Aksenitika, the two Central Cappadocian dialects still spoken today in several communities in Greece. Particular attention will be given to the use of borrowed Turkish kinship terms, which sometimes seem to co-exist with their inherited Greek counterparts, e.g. mána vs. néne ‘mother’, ailfó/aelfó vs. γardáš ‘brother’, etc. In the final part of the paper, some kinship terms of obscure or hitherto unknown etymology will be discussed, e.g. káka ‘grandmother’, ižá ‘aunt’, lúva ‘uncle (father’s brother)’, etc.

    On looking into words (and beyond): Structures, Relations, Analyses

    On Looking into Words is a wide-ranging volume spanning current research into word structure and morphology, with a focus on historical linguistics and linguistic theory. The papers are offered as a tribute to Stephen R. Anderson, the Dorothy R. Diebold Professor of Linguistics at Yale, who is retiring at the end of the 2016-2017 academic year. The contributors are friends, colleagues, and former students of Professor Anderson, all important contributors to linguistics in their own right. As is typical for such volumes, the contributions span a variety of topics relating to the interests of the honorand. In this case, the central contributions that Anderson has made to so many areas of linguistics and cognitive science, drawing on synchronic and diachronic phenomena in diverse linguistic systems, are represented through the papers in the volume. The 26 papers that constitute this volume are unified by their discussion of the interplay between synchrony and diachrony, theory and empirical results, and the role of diachronic evidence in understanding the nature of language. Central concerns of the volume include morphological gaps, learnability, increases and declines in productivity, and the interaction of different components of the grammar. The papers deal with a range of linked synchronic and diachronic topics in phonology, morphology, and syntax (in particular, cliticization), and their implications for linguistic theory.
