964 research outputs found

    Dependency-based Analysis for Tagalog Sentences

    Get PDF

    Benchmarking zero-shot and few-shot approaches for tokenization, tagging, and dependency parsing of Tagalog text

    Full text link
    The grammatical analysis of texts in any human language typically involves a number of basic processing tasks, such as tokenization, morphological tagging, and dependency parsing. State-of-the-art systems can achieve high accuracy on these tasks for languages with large datasets, but yield poor results for languages such as Tagalog which have little to no annotated data. To address this issue for the Tagalog language, we investigate the use of auxiliary data sources for creating task-specific models in the absence of annotated Tagalog data. We also explore the use of word embeddings and data augmentation to improve performance when only a small amount of annotated Tagalog data is available. We show that these zero-shot and few-shot approaches yield substantial improvements on grammatical analysis of both in-domain and out-of-domain Tagalog text compared to state-of-the-art supervised baselines.Comment: To appear at PACLIC 2022. 10 pages, 2 figures, 4 table

    Questions on transitivity

    Get PDF
    This handout (it isn’t a paper) presents phenomena and questions, rather than conclusions, related to the concept of transitivity. The idea is to return to these questions at the end of the Workshop to see if we can have a clearer consensus about the best general analysis of phenomena associated with transitivity. Section 2 presents alternative analyses of transitivity and questions about transitivity in three languages I have worked on. Section 3 discusses a few of the different conceptualisations of transitivity that might be relevant to our thinking about the questions related to these languages or that bring up further questions. Section 4 presents some general questions that might be asked of individual languages

    Review of Regina Pustet : Copulas: universals in the categorization of the lexicon: (Oxford University Press 2003; 262pp)

    Get PDF
    The renowned Grimm Dictionary (1854-1961) makes the statement that the German copula sein (to be) is “the most general and colourless of all verbal concepts” (der allgemeinste und farbloseste aller verbalbegriffe). A more concise summary of the linguistic issues surrounding the copula is hardly possible. These two properties (and the latent tension between them!) make copulas a particularly interesting and vexing subject of linguistic research. Copulas appear to be almost colourless, i.e., devoid of any concrete meaning, thus leading to the question of why such expressions exist at all, not only in German but in the majority of the world’s languages. And at the same time copulas presumably provide the best window into the core of verbal concepts thereby telling us what it actually means to be a verb – at least in a language like German or English. While there is a rather rich body of research on copulas in philosophical and formal semantics including several in-depth studies on the copular systems of individual languages, copulas have received comparably little attention from a typological perspective. The monograph of Regina Pustet sets out to fill this gap. She presents an extensive cross-linguistic study of copula usage based on a sample of 154 languages drawn from the language families of the world. The analysis is embedded in the theoretical framework of functional typology. The study aims at uncovering universal principles that govern the distribution of copulas in nominal, adjectival, and verbal predications. Its major objective is the development of a “semantically-based model of copula distribution” (p.62) by means of which the presence vs. absence of copulas can be motivated through the inherent meaning of the lexical items they potentially combine with. Drawing mainly on the work by Givón (1979, 1984) and Croft (1991, 2001), who provide a functional foundation of the traditional parts of speech, Pustet identifies four semantic parameters which, if taken together, are claimed to support substantial generalisations on copula distribution – within a given language as well as crosslinguistically. These parameters are DYNAMICITY, TRANSIENCE, TRANSITIVITY, and DEPENDENCY. Pustet goes on to argue – and this is in fact the driving force behind the overall monograph – that the distributional behaviour of copulas, in turn, yields a useful methodology for developing a general approach to lexical categorization. Thus, in the long run Pustet aims at contributing to a better understanding of the traditional parts of speech, noun, adjective, and verb by defining them in terms of “semantic feature bundles, which can be arranged in [a] coherent semantic similarity space” (p.193)

    Differential Object Marking in Tagalog

    Get PDF

    A, B, C, or None of the Above: A C-Command Puzzle in Tagalog

    Get PDF
    corecore