Benchmarking zero-shot and few-shot approaches for tokenization, tagging, and dependency parsing of Tagalog text
The grammatical analysis of texts in any human language typically involves a number of basic processing tasks, such as tokenization, morphological tagging, and dependency parsing. State-of-the-art systems can achieve high accuracy on these tasks for languages with large datasets, but they yield poor results for languages such as Tagalog, which have little to no annotated data. To address this issue for Tagalog, we investigate the use of auxiliary data sources for creating task-specific models in the absence of annotated Tagalog data. We also explore the use of word embeddings and data augmentation to improve performance when only a small amount of annotated Tagalog data is available. We show that these zero-shot and few-shot approaches yield substantial improvements in the grammatical analysis of both in-domain and out-of-domain Tagalog text compared to state-of-the-art supervised baselines. Comment: To appear at PACLIC 2022. 10 pages, 2 figures, 4 tables
Questions on transitivity
This handout (it isn’t a paper) presents phenomena and questions, rather than conclusions, related to the concept of transitivity. The idea is to return to these questions at the end of the Workshop to see whether we can reach a clearer consensus about the best general analysis of phenomena associated with transitivity. Section 2 presents alternative analyses of transitivity and questions about transitivity in three languages I have worked on. Section 3 discusses a few of the different conceptualisations of transitivity that might be relevant to our thinking about the questions related to these languages, or that raise further questions. Section 4 presents some general questions that might be asked of individual languages.
Review of Regina Pustet: Copulas: Universals in the Categorization of the Lexicon (Oxford University Press, 2003; 262 pp.)
The renowned Grimm Dictionary (1854–1961) states that the German copula sein (to be) is “the most general and colourless of all verbal concepts” (der allgemeinste und farbloseste aller verbalbegriffe). A more concise summary of the linguistic issues surrounding the copula is hardly possible. These two properties (and the latent tension between them!) make copulas a particularly interesting and vexing subject of linguistic research. Copulas appear to be almost colourless, i.e., devoid of any concrete meaning, which raises the question of why such expressions exist at all, not only in German but in the majority of the world’s languages. At the same time, copulas presumably provide the best window into the core of verbal concepts, thereby telling us what it actually means to be a verb, at least in a language like German or English. While there is a rather rich body of research on copulas in philosophical and formal semantics, including several in-depth studies of the copular systems of individual languages, copulas have received comparatively little attention from a typological perspective. Regina Pustet's monograph sets out to fill this gap. She presents an extensive cross-linguistic study of copula usage based on a sample of 154 languages drawn from the language families of the world. The analysis is embedded in the theoretical framework of functional typology. The study aims to uncover universal principles that govern the distribution of copulas in nominal, adjectival, and verbal predications. Its major objective is the development of a “semantically-based model of copula distribution” (p. 62), by means of which the presence vs. absence of copulas can be motivated through the inherent meaning of the lexical items they potentially combine with.
Drawing mainly on the work of Givón (1979, 1984) and Croft (1991, 2001), who provide a functional foundation for the traditional parts of speech, Pustet identifies four semantic parameters which, taken together, are claimed to support substantial generalisations about copula distribution, both within a given language and cross-linguistically. These parameters are DYNAMICITY, TRANSIENCE, TRANSITIVITY, and DEPENDENCY. Pustet goes on to argue (and this is in fact the driving force behind the whole monograph) that the distributional behaviour of copulas, in turn, yields a useful methodology for developing a general approach to lexical categorization. Thus, in the long run, Pustet aims to contribute to a better understanding of the traditional parts of speech (noun, adjective, and verb) by defining them in terms of “semantic feature bundles, which can be arranged in [a] coherent semantic similarity space” (p. 193).
Erratum to: Experimental syntax and the variation of island effects in English and Italian, Nat Lang Linguist Theory, (2015), 10.1007/s11049-015-9286-8
- …