
    Discrimination in lexical decision

    In this study we present a novel set of discrimination-based indicators of language processing derived from Naive Discriminative Learning (NDL) theory. We compare the effectiveness of these new measures with that of classical lexical-distributional measures (in particular, frequency counts and form-similarity measures) in predicting lexical decision latencies when a complete morphological segmentation of masked primes is or is not possible. Data derive from a re-analysis of a large subset of decision latencies from the English Lexicon Project, as well as from the results of two new masked priming studies. Results demonstrate the superiority of discrimination-based predictors over lexical-distributional predictors alone, across both the simple and primed lexical decision tasks. Comparable priming after masked corner-type and cornea-type primes, across two experiments, fails to support early obligatory segmentation into morphemes as predicted by the morpho-orthographic account of reading. Results fit well with NDL theory, which, in conformity with Word and Paradigm theory, rejects the morpheme as a relevant unit of analysis. Furthermore, results indicate that readers with greater spelling proficiency and larger vocabularies make better use of orthographic priors and handle lexical competition more efficiently.
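
    The discriminative-learning idea behind NDL can be sketched in a few lines: letter bigrams serve as cues, meanings as outcomes, and cue-to-outcome weights are trained with the Rescorla-Wagner rule. The toy lexicon, learning rate, and number of training passes below are illustrative assumptions, not values from the study.

```python
# Minimal Rescorla-Wagner sketch of naive discriminative learning (NDL).
# The lexicon and hyperparameters are invented for illustration.
from collections import defaultdict

def letter_bigrams(word):
    """Cues: letter bigrams of the word, with '#' marking its boundaries."""
    w = f"#{word}#"
    return [w[i:i + 2] for i in range(len(w) - 1)]

def rw_update(weights, cues, outcomes, all_outcomes, rate=0.01, lmax=1.0):
    """One Rescorla-Wagner step: nudge each outcome's summed activation
    over the present cues toward lmax if the outcome occurred, else 0."""
    for o in all_outcomes:
        target = lmax if o in outcomes else 0.0
        activation = sum(weights[(c, o)] for c in cues)
        delta = rate * (target - activation)
        for c in cues:
            weights[(c, o)] += delta

# Toy learning events: a word form paired with the meanings it signals.
events = [("hand", {"HAND"}), ("hands", {"HAND", "PLURAL"}), ("sand", {"SAND"})]
all_outcomes = {o for _, outs in events for o in outs}
weights = defaultdict(float)
for _ in range(1000):
    for word, outs in events:
        rw_update(weights, letter_bigrams(word), outs, all_outcomes)

# The activation a form sends to a meaning is a discrimination-based predictor.
def activation(word, outcome):
    return sum(weights[(c, outcome)] for c in letter_bigrams(word))
```

    After training, a form activates its own meaning strongly and competing meanings only weakly; predictors of this kind are what the study pits against frequency and form-similarity measures.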

    Probabilistic Modelling of Morphologically Rich Languages

    This thesis investigates how the sub-structure of words can be accounted for in probabilistic models of language. Such models play an important role in natural language processing tasks such as translation or speech recognition, but often rely on the simplistic assumption that words are opaque symbols. This assumption does not fit morphologically complex languages well, where words can have rich internal structure and sub-word elements are shared across distinct word forms. Our approach is to encode basic notions of morphology into the assumptions of three different types of language models, with the intention that leveraging shared sub-word structure can improve model performance and help overcome the data sparsity that arises from morphological processes. In the context of n-gram language modelling, we formulate a new Bayesian model that relies on the decomposition of compound words to attain better smoothing, and we develop a new distributed language model that learns vector representations of morphemes and leverages them to link together morphologically related words. In both cases, we show that accounting for word sub-structure improves the models' intrinsic performance and provides benefits when applied to other tasks, including machine translation. We then shift the focus beyond the modelling of word sequences and consider models that automatically learn what the sub-word elements of a given language are, given an unannotated list of words. We formulate a novel model that can learn discontiguous morphemes in addition to the more conventional contiguous morphemes that most previous models are limited to. This approach is demonstrated on Semitic languages, and we find that modelling discontiguous sub-word structures leads to improvements in the task of segmenting words into their contiguous morphemes.
    Comment: DPhil thesis, University of Oxford, submitted and accepted 2014. http://ora.ox.ac.uk/objects/uuid:8df7324f-d3b8-47a1-8b0b-3a6feb5f45c
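
    The distributed model's core idea, linking morphologically related words through shared sub-word representations, can be sketched by composing word vectors from morpheme vectors. The one-hot morpheme vectors and tiny inventory below are illustrative stand-ins for the dense embeddings such a model actually learns from data.

```python
# Sketch of morpheme-compositional word representations. The one-hot
# vectors and four-morpheme inventory are illustrative assumptions.
import numpy as np

morpheme_vecs = {
    "teach": np.array([1.0, 0.0, 0.0, 0.0]),
    "play":  np.array([0.0, 1.0, 0.0, 0.0]),
    "er":    np.array([0.0, 0.0, 1.0, 0.0]),
    "ing":   np.array([0.0, 0.0, 0.0, 1.0]),
}

def word_vec(morphemes):
    """Compose a word representation as the sum of its morpheme vectors,
    so 'teacher' and 'teaching' share their 'teach' component."""
    return sum(morpheme_vecs[m] for m in morphemes)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

teacher = word_vec(["teach", "er"])
teaching = word_vec(["teach", "ing"])
player = word_vec(["play", "er"])
```

    Because 'teacher' and 'teaching' share a morpheme component, their representations are similar even if one of the forms is rare, which is the mechanism by which shared sub-word structure mitigates data sparsity.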

    A Latent Morphology Model for Open-Vocabulary Neural Machine Translation

    Translation into morphologically rich languages challenges neural machine translation (NMT) models with extremely sparse vocabularies, where atomic treatment of surface forms is unrealistic. This problem is typically addressed either by pre-processing words into subword units or by performing translation directly at the level of characters. The former is based on word segmentation algorithms optimized using corpus-level statistics with no regard to the translation task. The latter learns directly from translation data but requires rather deep architectures. In this paper, we propose to translate words by modeling word formation through a hierarchical latent variable model which mimics the process of morphological inflection. Our model generates words one character at a time by composing two latent representations: a continuous one, aimed at capturing the lexical semantics, and a set of (approximately) discrete features, aimed at capturing the morphosyntactic function, which are shared among different surface forms. Our model achieves better accuracy in translation into three morphologically rich languages than conventional open-vocabulary NMT methods, while also demonstrating better generalization capacity under low- to mid-resource settings.
    Comment: Published at ICLR 202

    The Missing Link between Morphemic Assemblies and Behavioral Responses: A Bayesian Information-Theoretical Model of Lexical Processing

    We present the Bayesian Information-Theoretical (BIT) model of lexical processing: a mathematical model illustrating a novel approach to the modelling of language processes. The model shows how a neurophysiological theory of lexical processing relying on Hebbian association and neural assemblies can directly account for a variety of effects previously observed in behavioural experiments. We develop two information-theoretical measures of the distribution of usages of a morpheme or word, and use them to predict responses in three visual lexical decision datasets investigating inflectional morphology and polysemy. Our model offers a neurophysiological basis for the effects of morpho-semantic neighbourhoods. These results demonstrate how distributed patterns of activation naturally give rise to symbolic structures. We conclude by arguing that the modelling framework exemplified here is a powerful tool for integrating behavioural and neurophysiological results.
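
    The kind of information-theoretical measure the abstract refers to, a summary of how a word's usage is distributed, can be illustrated with plain Shannon entropy. The usage counts below are invented for illustration and are not drawn from the datasets analysed.

```python
# Shannon entropy of a usage distribution; the counts are illustrative.
import math

def usage_entropy(counts):
    """Shannon entropy (in bits) of a morpheme's or word's usage
    distribution: higher values mean usage is spread more evenly
    across the alternatives."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c)

# Invented usage counts of a verb across its inflected forms.
usage = {"walk": 60, "walks": 20, "walked": 15, "walking": 5}
h = usage_entropy(usage)
```

    A word used almost exclusively in one form has entropy near zero, while one spread evenly over its paradigm approaches the maximum; measures of this family are what the BIT model relates to decision latencies.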

    The relationship between thematic, lexical, and syntactic features of written texts and personality traits

    The relationship between linguistic features of written texts and personality traits was investigated. The linguistic features used in this study were thematic (co-occurrence of the most frequent content words across participants), lexical (the maximum number of new words), and syntactic (average sentence length). Personality traits were measured by the VP+2 questionnaire standardized for the Serbian population. The research was conducted on text materials collected from 114 Serbian participants (aged 15–65), writing in their native tongue. Results showed that participants who obtained low scores on Conscientiousness and high scores on Neuroticism and Negative Valence wrote about repeated daily activities and everyday life, but not about job-related matters or life perspective. Higher scores on Aggressiveness and Negative Valence coincided with writing about job-related matters and with lower lexical richness. By showing that the thematic content of text materials is affected by personality traits, these results support and extend previous findings on the relationship between personality and linguistic behaviour.

    Speaking while listening: Language processing in speech shadowing and translation

    Contains full text: 233349.pdf (publisher's version, open access). Radboud University, 25 May 2021. Promotores: Meyer, A.S., Roelofs, A.P.A. 199 p.

    Private State in Public Media: Subjectivity in French Traditional and Online News

    This paper reports on ongoing work dealing with the linguistic impact of putting the news online. In this framework, we investigate differences between one traditional newspaper and two forms of alternative online media with respect to the expression of authorial stance. Our research is based on a comparable large-scale corpus of articles published on the websites of the three respective media and aims to answer the question of to what extent the presence of the author varies across the different media. Is it a matter of the amount and mode of the author's presence? Is it a matter of lexical choice and diversity? If so, what expressions are used in the respective media? Our endeavour is a methodological one. We first present our data, describing the different news media included in our analysis and the various computer-aided and manual production steps we performed in order to build the corpus. Secondly, we outline our working hypotheses, which are linked to the chosen types of media, and describe the theoretical framework within which they are situated. Thirdly, we present our research method as well as some first results and insights gained throughout the pilot study of our data.