Calculating Selectional Preferences of Transitive Verbs in Korean

Authors: Choe Jae-Woong; Song Sanghoun
Publication venue: 'Faculty of Computer Science, Universitas Indonesia'
Publication date: 01/01/2012

Implicit indefinite objects at the syntax-semantics-pragmatics interface: a probabilistic model of acceptability judgments

Author: CAPPELLI Giulia
Publication venue: 'Scuola Normale Superiore - Edizioni della Normale'
Publication date: 21/10/2022

Optionally transitive verbs, whose Patient participant is semantically obligatory but syntactically optional (e.g., to eat, to drink, to write), deviate from the transitive prototype defined by Hopper and Thompson (1980). Following Fillmore (1986), unexpressed objects may be either indefinite (referring to prototypical Patients of a verb, whose actual entity is unknown or irrelevant) or definite (with a referent available in the immediate intra- or extra-linguistic context). This thesis centers on indefinite null objects, which the literature argues to be a gradient, non-categorical phenomenon possible with virtually any transitive verb (to different degrees depending on the verb semantics), favored or hindered by several semantic, aspectual, pragmatic, and discourse factors. In particular, the probabilistic model of the grammaticality of indefinite null objects discussed here takes into account a continuous factor (semantic selectivity, as a proxy for object recoverability) and four binary factors (telicity, perfectivity, iterativity, and manner specification). This work was inspired by Medina (2007), who modeled the effect of three predictors (semantic selectivity, telicity, and perfectivity) on the grammaticality of indefinite null objects (as gauged via Likert-scale acceptability judgments elicited from native speakers of English) within the framework of Stochastic Optimality Theory. In her variant of the framework, the constraints get floating rankings based on the input verb’s semantic selectivity, which she modeled via the Selectional Preference Strength measure by Resnik (1993, 1996).
I expanded Medina’s model by modeling implicit indefinite objects in two languages (English and Italian), by using three different measures of semantic selectivity (Resnik’s SPS; Behavioral PISA, inspired by Medina’s Object Similarity measure; and Computational PISA, a novel similarity-based measure by Cappelli and Lenci (2020) based on distributional semantics), and by adding iterativity and manner specification as new predictors in the model. Both the English and the Italian five-predictor models based on Behavioral PISA explain almost half of the variance in the data, improving on the Medina-like three-predictor models based on Resnik’s SPS. Moreover, they have a comparable range of predicted object-dropping probabilities (30-100% in English, 30-90% in Italian), and the predictors behave consistently with the theoretical literature on object drop. Indeed, in both models, atelic imperfective iterative manner-specified inputs are the most likely to drop their object (between 80% and 90%), while telic perfective non-iterative manner-unspecified inputs are the least likely (between 30% and 40%). The constraint re-ranking probabilities are always directly proportional to semantic selectivity, with the exception of Telic End in Italian. Both models show a main effect of telicity, but the second most relevant factor in the model is perfectivity in English and manner specification in Italian.
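
As an illustration of the Selectional Preference Strength measure named in the abstract above, the following minimal sketch computes Resnik's SPS as the Kullback-Leibler divergence between the distribution of object classes observed with one verb and the distribution of object classes overall. The function name, the class labels, and the toy counts are hypothetical and are not taken from the thesis, which may estimate these distributions differently (for example, over WordNet classes).

```python
import math
from collections import Counter


def selectional_preference_strength(verb_object_classes, all_object_classes):
    """Resnik-style SPS: KL divergence (in bits) between P(class | verb) and P(class).

    Assumes every class seen with the verb also occurs in the overall sample,
    so the prior probability is never zero.
    """
    prior = Counter(all_object_classes)
    posterior = Counter(verb_object_classes)
    n_prior = sum(prior.values())
    n_post = sum(posterior.values())
    sps = 0.0
    for cls, count in posterior.items():
        p_c_given_v = count / n_post      # P(class | verb)
        p_c = prior[cls] / n_prior        # P(class)
        sps += p_c_given_v * math.log2(p_c_given_v / p_c)
    return sps


# Hypothetical toy data: semantic classes of direct objects in a small corpus.
corpus_objects = ["food"] * 4 + ["drink"] * 2 + ["artifact"] * 6 + ["text"] * 4
eat_objects = ["food"] * 4            # a highly selective verb
see_objects = list(corpus_objects)    # a verb with no preference at all

sps_eat = selectional_preference_strength(eat_objects, corpus_objects)
sps_see = selectional_preference_strength(see_objects, corpus_objects)
print(sps_eat, sps_see)
```

In this toy setting the selective verb receives a higher score than the unselective one, which is the property that makes the measure usable as a proxy for object recoverability.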

Archivio istituzionale della Ricerca - Scuola Normale Superiore

Proceedings of the COLING 2004 Post Conference Workshop on Multilingual Linguistic Ressources MLR2004

Authors: Armstrong Susan; Boitet Christian; Popescu-Belis Andrei; Sérasset Gilles; Tufis Dan
Publication venue: COLING
Publication date: 01/01/2004

In an ever-expanding information society, most information systems are now facing the "multilingual challenge". Multilingual language resources play an essential role in modern information systems. Such resources need to provide information on many languages in a common framework and should be (re)usable in many applications (for automatic or human use). Many centres have been involved in national and international projects dedicated to building harmonised language resources and creating expertise in the maintenance and further development of standardised linguistic data. These resources include dictionaries, lexicons, thesauri, word-nets, and annotated corpora developed along the lines of best practices and recommendations. However, since the late 1990s, most efforts in scaling up these resources have remained the responsibility of the local authorities, usually with very low funding (if any) and few opportunities for academic recognition of this work. Hence, it is not surprising that many of the resource holders and developers have become reluctant to give free access to the latest versions of their resources, and their actual status is therefore currently rather unclear. The goal of this workshop is to study problems involved in the development, management and reuse of lexical resources in a multilingual context. Moreover, this workshop provides a forum for reviewing the present state of language resources. The workshop is meant to bring to the international community qualitative and quantitative information about the most recent developments in the area of linguistic resources and their use in applications. The impressive number of submissions (38) to this workshop and to other workshops and conferences dedicated to similar topics shows that dealing with multilingual linguistic resources has become a very hot topic in the Natural Language Processing community.
To cope with the number of submissions, the workshop organising committee decided to accept 16 papers from 10 countries based on the reviewers' recommendations. Six of these papers will be presented in a poster session. The papers constitute a representative selection of current trends in research on Multilingual Language Resources, such as multilingual aligned corpora, bilingual and multilingual lexicons, and multilingual speech resources. The papers also represent a characteristic set of approaches to the development of multilingual language resources, such as automatic extraction of information from corpora, combination and re-use of existing resources, online collaborative development of multilingual lexicons, and use of the Web as a multilingual language resource. The development and management of multilingual language resources is a long-term activity in which collaboration among researchers is essential. We hope that this workshop will gather many researchers involved in such developments and will give them the opportunity to discuss, exchange and compare their approaches and to strengthen their collaborations in the field. The organisation of this workshop would have been impossible without the hard work of the program committee, who managed to provide accurate reviews on time, on a rather tight schedule. We would also like to thank the Coling 2004 organising committee that made this workshop possible. Finally, we hope that this workshop will yield fruitful results for all participants.

Hal - Université Grenoble Alpes

Hal-Diderot


Representation Learning beyond Semantic Similarity: Character-aware and Function-specific Approaches

Author: Gerz Daniela Susanne
Publication venue: University of Cambridge
Publication date: 28/04/2020

Representation learning is a research area within machine learning and natural language processing (NLP) concerned with building machine-understandable representations of discrete units of text. Continuous representations are at the core of modern machine learning applications, and representation learning has thereby become one of the central research areas in NLP. The induction of text representations is typically based on the distributional hypothesis, and consequently encodes general information about word similarity. Words or phrases with similar meaning obtain similar representations in a vector space constructed for this purpose. This established methodology excels for morphologically simple languages such as English, and in data-rich settings. However, several useful lexical relations, such as entailment or selectional preference, are not captured or get conflated with other relations. Another challenge is dealing with low-data regimes for morphologically complex and under-resourced languages. In this thesis we construct novel representation learning methods that go beyond the limitations of the distributional hypothesis and investigate solutions that induce vector spaces with diverse properties. In particular, we look at how the vector space induction process influences the contained information, and how the information manifests in a number of core NLP tasks: semantic similarity, lexical entailment, selectional preference, and language modeling. We contribute novel evaluations of state-of-the-art models highlighting their current capabilities and limitations. An analysis of language modeling in 50 typologically diverse languages demonstrates that representations can indeed pose a performance bottleneck. We introduce a novel approach to leveraging subword-level information in word representations: our solution lifts this bottleneck in low-resource scenarios.
Finally, we introduce a novel paradigm of function-specific representation learning that aims to integrate fine-grained semantic relations and real-world knowledge into the word vector spaces. We hope this thesis can serve as a valuable overview of word representations, and inspire future work in modeling semantic similarity and beyond. Funded by ERC Consolidator Grant LEXICAL (648909).

Apollo (Cambridge)

The Processing of Emotional Sentences by Young and Older Adults: A Visual World Eye-movement Study

Author: Carminati Maria Nella
Knoeferle Pia
Publication date: 01/01/2012

Carminati MN, Knoeferle P. The Processing of Emotional Sentences by Young and Older Adults: A Visual World Eye-movement Study. Presented at Architectures and Mechanisms for Language Processing (AMLaP), Riva del Garda, Italy.

Publications at Bielefeld University

The role of voice and word order in incremental sentence processing: Studies on sentence production and comprehension in Tagalog and German

Author: Sauppe S.
Publication venue: Radboud University Nijmegen
Publication date: 01/01/2017

MPG.PuRe

Paths in first language acquisition: Motion through space in English, French and Japanese

Author: Stringer David
Publication date: 01/01/2005

This thesis examines how children attain the linguistic knowledge they need to grammatically express basic trajectories through physical space in English, French and Japanese. In Talmy's (1991; 2000b) descriptive binary typology, 'verb-framed' languages such as Japanese and French systematically encode PATH (or 'direction') in verbs, whilst 'satellite-framed' languages such as English systematically do so in adpositions. How such phenomena might be formalized is considered in terms of two contrasting hypotheses: (i) the Path Parameter Hypothesis, which suggests binary parameterization at the whole-language level, and (ii) the Lexicalist Path Hypothesis, which suggests that all relevant aspects of PATH predication are determined at the level of individual lexical items. Two experiments with original research methodology were conducted with English, French and Japanese children and adults. In Experiment I, directional predicates were elicited using a purpose-designed picture-story, and in Experiment II, grammaticality judgements were elicited from the same test subjects. Whilst predictions of general tendencies were upheld (strongly for English and Japanese, weakly for French), several findings support a non-parameterized, lexicalist account of PATH predication. First, in all child age groups, the three languages fell into discrete response categories for directional utterances in the absence of an inherent PATH verb. Second, both lexicalization types were found in each language, again in all age groups. Third, the three languages are revealed to have a shared syntax of directional predication, involving the same set of interpretable features and the same set of basic syntactic structures, including a layered PP structure. These findings suggest that whilst the typology remains broadly descriptive, there is no language-particular grammar involved in this variation.
Rather, both directional V and a fully articulated PP structure are available in all three languages, show no discernible development, and are presumably part of the machinery of Universal Grammar. Children already understand the syntactic possibilities in the predication of PATH, but must learn the particular complexities of their lexicon, the primary locus of variation in the linguistic expression of motion events.

Durham e-Theses

OpenGrey Repository