    Modelling the formation of phonotactic restrictions across the mental lexicon

    Experimental data show that adult learners of an artificial language with a phonotactic restriction learned this restriction better when trained on word types (e.g. presented with 80 different words twice each) than when trained on word tokens (e.g. presented with 40 different words four times each) (Hamann & Ernestus submitted). These findings support Pierrehumbert’s (2003) observation that phonotactic co-occurrence restrictions are formed across lexical entries, since only lexical levels of representation can be sensitive to type frequencies.
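    To make the two training regimes concrete, here is a minimal sketch of how type- and token-matched training sequences could be constructed; the word forms and the function name are invented for illustration and are not taken from the study.

```python
import random

def make_training_sequence(words, n_types, n_repetitions, seed=0):
    """Build a training sequence containing n_types distinct words,
    each presented n_repetitions times, in random order."""
    rng = random.Random(seed)
    chosen = rng.sample(words, n_types)
    sequence = chosen * n_repetitions
    rng.shuffle(sequence)
    return sequence

# Hypothetical artificial-language lexicon (placeholder forms).
lexicon = ["w%03d" % i for i in range(200)]

# Type condition: 80 different words, twice each (160 trials).
type_condition = make_training_sequence(lexicon, 80, 2)

# Token condition: 40 different words, four times each (160 trials).
token_condition = make_training_sequence(lexicon, 40, 4)

assert len(type_condition) == len(token_condition) == 160
```

    Both conditions contain the same number of tokens (160 trials), so any difference in learning tracks the number of distinct types, which is the point of the manipulation.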

    Learning OT constraint rankings using a maximum entropy model

    A weakness of standard Optimality Theory is its inability to account for grammar…
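    For context, the standard maximum entropy formulation of such a model takes the following form (a sketch of the usual notation, assumed here rather than copied from the paper): each constraint C_i receives a real-valued weight w_i, and the probability assigned to candidate y for input x is

```latex
P(y \mid x) \;=\;
  \frac{\exp\!\bigl(-\sum_i w_i\, f_i(y, x)\bigr)}
       {\sum_{y' \in \mathrm{Gen}(x)} \exp\!\bigl(-\sum_i w_i\, f_i(y', x)\bigr)}
```

    where f_i(y, x) counts the violations of constraint C_i incurred by candidate y. Fitting the weights by maximizing the likelihood of attested forms replaces Optimality Theory’s strict constraint ranking with weighted, gradient constraint interaction.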

    A Machine learning approach to POS tagging

    We have applied inductive learning of statistical decision trees and relaxation labelling to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (Part-of-Speech Tagging). The learning process is supervised and obtains a language model oriented to resolving POS ambiguities. This model consists of a set of statistical decision trees expressing the distribution of tags and words in relevant contexts. The acquired language models are complete enough to be used directly as sets of POS disambiguation rules, and include more complex contextual information than the simple collections of n-grams usually used in statistical taggers. We have implemented a simple and fast tagger that has been tested and evaluated on the Wall Street Journal (WSJ) corpus with remarkable accuracy. However, better results can be obtained by translating the trees into rules to feed a flexible tagger based on relaxation labelling. Along these lines, we describe a tagger that is able to use information of any kind (n-grams, automatically acquired constraints, linguistically motivated manually written constraints, etc.), and in particular to incorporate the machine-learned decision trees. We also address the problem of tagging when only a small amount of training material is available, which is crucial in any process of constructing an annotated corpus from scratch. We show that quite high accuracy can be achieved with our system in this situation.
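    As a rough illustration of the decision-tree side of this approach, the sketch below trains a tree on simple contextual features; the toy corpus and feature set are invented, and scikit-learn’s tree learner stands in for the paper’s own induction algorithm.

```python
# Minimal decision-tree POS-tagging sketch (illustrative only).
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

# Toy tagged corpus: one (word, tag) pair per token.
corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
]

def features(sentence, i):
    """Contextual features for the token at position i."""
    word = sentence[i][0]
    return {
        "word": word,
        "prev_word": sentence[i - 1][0] if i > 0 else "<s>",
        "next_word": sentence[i + 1][0] if i < len(sentence) - 1 else "</s>",
        "suffix2": word[-2:],
    }

X = [features(s, i) for s in corpus for i in range(len(s))]
y = [tag for s in corpus for _, tag in s]

vec = DictVectorizer()
tagger = DecisionTreeClassifier().fit(vec.fit_transform(X), y)

test = [("the", None), ("dog", None), ("sleeps", None)]
print(tagger.predict(vec.transform([features(test, i) for i in range(len(test))])))
```

    In the paper’s setup the learned trees are additionally translated into constraints for a relaxation-labelling tagger; this sketch stops at direct tree-based disambiguation.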

    Wide-coverage deep statistical parsing using automatic dependency structure annotation

    A number of researchers (Lin 1995; Carroll, Briscoe, and Sanfilippo 1998; Carroll et al. 2002; Clark and Hockenmaier 2002; King et al. 2003; Preiss 2003; Kaplan et al. 2004; Miyao and Tsujii 2004) have convincingly argued for the use of dependency (rather than CFG-tree) representations for parser evaluation. Preiss (2003) and Kaplan et al. (2004) conducted a number of experiments comparing “deep” hand-crafted wide-coverage parsers with “shallow” treebank- and machine-learning-based parsers at the level of dependencies, using simple and automatic methods to convert the tree output generated by the shallow parsers into dependencies. In this article, we revisit the experiments of Preiss (2003) and Kaplan et al. (2004), this time using the sophisticated automatic LFG f-structure annotation methodologies of Cahill et al. (2002b, 2004) and Burke (2006), with surprising results. We compare various PCFG and history-based parsers (based on Collins, 1999; Charniak, 2000; Bikel, 2002) to find a baseline parsing system that fits best into our automatic dependency structure annotation technique. This combined system of syntactic parser and dependency structure annotation is compared to two hand-crafted, deep, constraint-based parsers (Carroll and Briscoe 2002; Riezler et al. 2002). We evaluate using dependency-based gold standards (DCU 105, PARC 700, CBS 500 and dependencies for WSJ Section 22) and use the Approximate Randomization Test (Noreen 1989) to test the statistical significance of the results. Our experiments show that machine-learning-based shallow grammars augmented with sophisticated automatic dependency annotation technology outperform hand-crafted, deep, wide-coverage constraint grammars. Currently our best system achieves an f-score of 82.73% against the PARC 700 Dependency Bank (King et al. 2003), a statistically significant improvement of 2.18% over the most recent results of 80.55% for the hand-crafted LFG grammar and XLE parsing system of Riezler et al. (2002), and an f-score of 80.23% against the CBS 500 Dependency Bank (Carroll, Briscoe, and Sanfilippo 1998), a statistically significant 3.66% improvement over the 76.57% achieved by the hand-crafted RASP grammar and parsing system of Carroll and Briscoe (2002).
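    The evaluation metric at work here is easy to state concretely: parser output and gold standard are both reduced to sets of dependency triples, and precision, recall, and f-score are computed over those sets. A minimal sketch follows, with invented triples; the actual gold standards use richer relation labels.

```python
def dependency_prf(gold, predicted):
    """Precision, recall, f-score over (head, relation, dependent) triples."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

gold = {("saw", "subj", "John"), ("saw", "obj", "Mary"), ("saw", "adj", "yesterday")}
pred = {("saw", "subj", "John"), ("saw", "obj", "Mary"), ("Mary", "adj", "yesterday")}

print("P=%.2f R=%.2f F=%.2f" % dependency_prf(gold, pred))  # P=0.67 R=0.67 F=0.67
```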

    Treebank-based acquisition of LFG parsing resources for French

    Motivated by the expense in time and other resources of producing hand-crafted grammars, there has been increasing interest in automatically acquiring wide-coverage grammars from treebanks for natural language processing. In particular, recent years have seen growing interest in automatically acquired deep resources that can represent information absent from simple CFG-type treebanks and that are considered to produce more language-neutral linguistic representations, such as dependency syntactic trees. As is often the case in early pioneering work on natural language processing, English provided the focus of the first efforts towards acquiring deep-grammar resources, followed by successful treatments of, for example, German, Japanese, Chinese and Spanish. However, no comparable large-scale automatically acquired deep-grammar resources have been obtained for French to date. The goal of this paper is to present the application of treebank-based grammar acquisition to the case of French. We show that, with modest changes to the established parsing architectures, encouraging results can be obtained for French, with a best dependency structure f-score of 86.73%.

    Semantics as a gateway to language

    This paper presents an account of semantics as a system that integrates conceptual representations into language. I define the semantic system as an interface level of the conceptual system CS that translates conceptual representations into a format that is accessible to language. The analysis I put forward does not treat the make-up of this level as idiosyncratic, but subsumes it under a unified notion of linguistic interfaces. This allows us to understand core aspects of the linguistic-conceptual interface as an instance of a general pattern underlying the correlation of linguistic and non-linguistic structures. In doing so, the model aims to provide a broader perspective on the distinction between, and the interaction of, conceptual and linguistic processes, and on the correlation of semantic and syntactic structures.

    RDF(S)/XML Linguistic Annotation of Semantic Web Pages

    Although, with the Semantic Web initiative, much research on the semantic annotation of web pages has already been done by AI researchers, linguistic text annotation, including semantic annotation, was originally developed in Corpus Linguistics, and its results have been somewhat neglected by AI. …
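    As a rough illustration of the kind of RDF-based linguistic annotation at issue, the sketch below attaches a few linguistic properties to a web-page fragment using rdflib and serializes the result as RDF/XML; the ling: vocabulary and the URIs are invented for illustration and do not reflect the paper’s actual schema.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

LING = Namespace("http://example.org/ling#")  # hypothetical annotation vocabulary

g = Graph()
g.bind("ling", LING)

# A fragment of a (hypothetical) web page to be annotated.
fragment = URIRef("http://example.org/pages/page1#sentence3")
g.add((fragment, RDF.type, LING.Sentence))
g.add((fragment, LING.text, Literal("The parser was evaluated on the WSJ corpus.")))
g.add((fragment, LING.mainVerb, Literal("evaluate")))
g.add((fragment, LING.voice, Literal("passive")))

print(g.serialize(format="xml"))  # RDF/XML rendering of the annotation
```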

    Introduction

    This chapter will motivate why it is useful to consider the topic of derivations and filtering in more detail. We will argue against the popular belief that the minimalist program and optimality theory are incompatible theories, in the sense that the former places the explanatory burden on the generative device (the computational system) whereas the latter places it on the filtering device (the OT evaluator). Although this belief may be correct insofar as it describes existing tendencies, we will argue that minimalist and optimality-theoretic approaches normally adopt more or less the same global architecture of grammar: both assume that a generator defines a set S of potentially well-formed expressions that can be generated on the basis of a given input, and that there is an evaluator that selects the expressions from S that are actually grammatical in a given language L. For this reason, we believe it is a high priority to investigate the role of the two components in more detail, in the hope that this will provide a better understanding of the differences and similarities between the two approaches. We will conclude this introduction with a brief review of the studies collected in this book.
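    The shared architecture can be made concrete with a small sketch: a generator Gen maps an input to a candidate set S, and an OT-style evaluator selects the member of S whose violation profile is best under the ranked constraints. The input, candidates, and constraints below are invented toy examples (a final-devoicing pattern), not drawn from the chapter.

```python
def gen(underlying_form):
    """Toy generator: a fixed candidate set S for the input."""
    return ["bad", "bat", "ba"]

# Constraints in ranked order (highest first), each counting violations.
constraints = [
    ("Max", lambda inp, cand: max(len(inp) - len(cand), 0)),        # no deletion
    ("*VoicedCoda", lambda inp, cand: int(cand.endswith("d"))),     # markedness
    ("Ident(voice)", lambda inp, cand: sum(a != b for a, b in zip(inp, cand))),
]

def evaluate(underlying_form):
    """Pick the candidate whose violation profile is best under
    strict domination (lexicographic comparison of violation counts)."""
    def profile(cand):
        return tuple(c(underlying_form, cand) for _, c in constraints)
    return min(gen(underlying_form), key=profile)

print(evaluate("bad"))  # -> "bat": coda devoicing wins under this ranking
```

    Swapping the evaluator for a different selection regime (e.g. economy conditions on derivations) while keeping the generator fixed is one way to see the two frameworks as variants of the same generate-and-filter design.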

    The diachronic emergence of retroflex segments in three languages

    The present study shows that, though retroflex segments can be considered articulatorily marked, there are perceptual reasons why languages introduce this class into their phoneme inventory. This observation is illustrated with the diachronic developments of retroflexes in Norwegian (North Germanic), Nyawaygi (Australian) and Minto-Nenana (Athapaskan). The developments in these three languages are modelled in a perceptually oriented phonological theory, since traditional articulatorily based features cannot deal with such processes.