3,414 research outputs found

Redefining part-of-speech classes with distributional semantic models

Author: Kutuzov Andrey, Velldal Erik, Øvrelid Lilja
Publication date: 01/01/2016

This paper studies how word embeddings trained on the British National Corpus interact with part-of-speech boundaries. Our work targets the Universal PoS tag set, which is actively used to annotate a range of languages. We experiment with training classifiers for predicting PoS tags for words based on their embeddings. The results show that the information about PoS affiliation contained in the distributional vectors allows us to discover groups of words with distributional patterns that differ from other words of the same part of speech. This data often reveals hidden inconsistencies in the annotation process or guidelines. At the same time, it supports the notion of 'soft' or 'graded' part-of-speech affiliations. Finally, we show that information about PoS is distributed among dozens of vector components, not limited to only one or two features.

arXiv.org e-Print Archive

Crossref

NORA - Norwegian Open Research Archives

SimLex-999: Evaluating Semantic Models with (Genuine) Similarity Estimation

Author: Hill Felix, Korhonen Anna, Reichart Roi
Publication date: 14/08/2014

We present SimLex-999, a gold standard resource for evaluating distributional semantic models that improves on existing resources in several important ways. First, in contrast to gold standards such as WordSim-353 and MEN, it explicitly quantifies similarity rather than association or relatedness, so that pairs of entities that are associated but not actually similar [Freud, psychology] have a low rating. We show that, via this focus on similarity, SimLex-999 incentivizes the development of models with a different, and arguably wider, range of applications than those which reflect conceptual association. Second, SimLex-999 contains a range of concrete and abstract adjective, noun and verb pairs, together with an independent rating of concreteness and (free) association strength for each pair. This diversity enables fine-grained analyses of the performance of models on concepts of different types, and consequently greater insight into how architectures can be improved. Further, unlike existing gold standard evaluations, for which automatic approaches have reached or surpassed the inter-annotator agreement ceiling, state-of-the-art models perform well below this ceiling on SimLex-999. There is therefore plenty of scope for SimLex-999 to quantify future improvements to distributional semantic models, guiding the development of the next generation of representation-learning architectures.

arXiv.org e-Print Archive

CiteSeerX

Using distributional similarity to organise biomedical terminology

Author: Dowdall James, Keller Bill, Schneider Gerold, Weeds Julie, Weir David
Publication venue: 'John Benjamins Publishing Company'
Publication date: 01/01/2005

We investigate an application of distributional similarity techniques to the problem of structural organisation of biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked up by hand within the corpus, we consider the problem of automatically determining semantic proximity. Terminological units are defined for our purposes as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of different measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy.

ZORA

Sussex Research Online

The company that words keep: comparing the statistical structure of child- versus adult-directed language

Author: Block, Brown, Du Bois, Grimshaw, Hayes, MacWhinney, Markman, Newport, Plaut, Quine, Recchia, Riordan, Saussure, Snow, Thomas Hills
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/06/2013

Does child-directed language differ from adult-directed language in ways that might facilitate word learning? Associative structure (the probability that a word appears with its free associates), contextual diversity, word repetitions and frequency were compared longitudinally across six language corpora, with four corpora of language directed at children aged 1;0 to 5;0, and two adult-directed corpora representing spoken and written language. Statistics were adjusted relative to shuffled corpora. Child-directed language was found to be more associative, repetitive and consistent than adult-directed language. Moreover, these statistical properties of child-directed language better predicted word acquisition than the same statistics in adult-directed language. Word frequency and repetitions were the best predictors within word classes (nouns, verbs, adjectives and function words). For all word classes combined, associative structure, contextual diversity and word repetitions best predicted language acquisition. These results support the hypothesis that child-directed language is structured in ways that facilitate language acquisition.

Crossref

Warwick Research Archives Portal Repository

Functional versus lexical: a cognitive dichotomy

Author: Cann Ronnie
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2000

Edinburgh Research Explorer

Modelling the acquisition of syntactic categories

Author: Gobet F, Pine J M
Publication venue: 'Informa UK Limited'
Publication date: 01/01/1997

This research represents an attempt to model the child’s acquisition of syntactic categories. A computational model, based on the EPAM theory of perception and learning, is developed. The basic assumptions are that (1) syntactic categories are actively constructed by the child using distributional learning abilities; and (2) cognitive constraints in learning rate and memory capacity limit these learning abilities. We present simulations of the syntax acquisition of a single subject, where the model learns to build up multi-word utterances by scanning a sample of the speech addressed to the subject by his mother.

CiteSeerX

Brunel University Research Archive

Input and Intake in Language Acquisition

Author: Gagliardi Ann C.
Publication date: 01/01/2012

This dissertation presents an approach for a productive way forward in the study of language acquisition, sealing the rift between claims of an innate linguistic hypothesis space and powerful domain-general statistical inference. This approach breaks language acquisition into its component parts, distinguishing the input in the environment from the intake encoded by the learner, and looking at how a statistical inference mechanism, coupled with a well-defined linguistic hypothesis space, could lead a learner to infer the grammar of their native language. This work draws on experimental work, corpus analyses and computational models of Tsez, Norwegian and English children acquiring word meanings, word classes and syntax to highlight the need for an appropriate encoding of the linguistic input in order to solve any given problem in language acquisition.

Digital Repository at the University of Maryland

The ‘nouniness’ of attributive adjectives and ‘verbiness’ of predicative adjectives: Evidence from phonology

Author: Hollmann Willem
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 30/06/2021

This article investigates prototypically attributive versus predicative adjectives in English in terms of the phonological properties that have been associated especially with nouns versus verbs in a substantial body of psycholinguistic research (e.g. Kelly 1992) - often ignored in theoretical linguistic work on word classes. Inspired by Berg's (2000, 2009) 'cross-level harmony constraint', the hypothesis I test is that prototypically attributive adjectives not only align more with nouns than with verbs syntactically, semantically and pragmatically, but also phonologically - and likewise for prototypically predicative adjectives and verbs. I analyse the phonological structure of frequent adjectives from the Corpus of Contemporary American English (COCA), and show that the data do indeed support the hypothesis. Berg's 'cross-level harmony constraint' may thus apply not only to the entire word classes noun, verb and adjective, but also to these two adjectival subclasses. I discuss several theoretical issues that emerge. The facts are most readily accommodated in a usage-based model, such as Radical Construction Grammar (Croft 2001), where these adjectives are seen as forming two distinct but overlapping classes. Drawing also on recent research by Boyd & Goldberg (2011) and Hao (2015), I explore the possible nature and emergence of these classes in some detail.

Lancaster E-Prints

ADJECTIVISH INDONESIAN VERBS: A COGNITIVE SEMANTICS PERSPECTIVE

Author: Suparto Suparto
Publication date: 02/09/2015

There is a deeply rooted belief that parts of speech can be discretely categorized. It is widely accepted in linguistics. There is a tendency to take this academic belief for granted. As a result, it persists without critical examination of its empirical truth. Those studying linguistics will sooner or later read many linguistics textbooks stating that once a word has its own category, it has no potential to belong to another word category. Most people learning linguistics consider this something that must be the case. This claim is not just taken to be true; it comes to be taken as conclusive. In fact, there are Indonesian verbs that behave adjectivishly. They are, to some extent, verbs, yet to another extent, they are adjectives. This is evidenced by the fact that they have the properties of adjectives. These linguistic phenomena demonstrate that some Indonesian verbs have a stronger quality of verbness. That is, some Indonesian verbs are verbier than others. Based on the data found, Indonesian transitive verbs have a higher potential to behave adjectivishly than Indonesian intransitive ones. A certain kind of Indonesian transitive verb can be treated adjectivishly. This finding shows that the discreteness of word categories, particularly verbs, is not clear-cut. Word categories can, to some extent, be fuzzy. This fuzziness shows in the adjectival properties attributed to Indonesian transitive verbs. It means that categorizing word classes is not as simple as we thought before.

Diponegoro University Institutional Repository