Learning Functional Prepositions
In first language acquisition, what does it mean for a grammatical category to have been acquired, and what are the mechanisms by which children learn functional categories in general? In the context of prepositions (Ps), if the lexical/functional divide cuts through the P category, as has been suggested in the theoretical literature, then constructivist accounts of language acquisition would predict that children develop adult-like competence with the more abstract units, functional Ps, at a slower rate than with lexical Ps. Nativists instead assume that the features of functional P are made available by Universal Grammar (UG), and are mapped as quickly as, if not faster than, the semantic features of their lexical counterparts. If, by contrast, Ps are all lexical or all functional, then on both accounts of acquisition we should observe few differences in learning.
Three empirical studies of the development of P were conducted via computer analysis of the English and Spanish sub-corpora of the CHILDES database. Study 1 analyzed errors in child usage of Ps, finding almost no errors of commission in either language, but that the English learners lag in their production of functional Ps relative to lexical Ps. That no such delay was found in the Spanish data suggests that the English pattern is not universal. Studies 2 and 3 applied novel measures of phrasal (P head + nominal complement) productivity to the data. Study 2 examined prepositional phrases (PPs) whose head-complement pairs appeared in both child and adult speech, while Study 3 considered PPs produced by children that never occurred in adult speech. In both studies the productivity of functional Ps for English children developed faster than that of lexical Ps. In Spanish there were few differences, suggesting that children had already mastered both classes of Ps early in acquisition. These empirical results suggest that, at least in English, P is indeed a split category, and that children acquire the syntax of the functional subset very quickly, committing almost no errors. The UG position is thus supported.
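The phrasal productivity measures of Studies 2 and 3 can be pictured with a minimal sketch: extract (P head, nominal complement) pairs from child and adult utterances and ask what share of the child's PP types is attested in, or absent from, the adult input. The data format, function names, and next-word heuristic below are illustrative assumptions, not the dissertation's actual method.

```python
# Illustrative sketch (not the dissertation's code): phrasal productivity as
# overlap between child and adult PP head-complement pairs.
from collections import Counter

def pp_pairs(utterances, prepositions):
    """Extract (P head, nominal complement) bigrams from tokenized utterances."""
    pairs = Counter()
    for tokens in utterances:
        for i, word in enumerate(tokens[:-1]):
            if word in prepositions:
                pairs[(word, tokens[i + 1])] += 1  # crude: next word = complement
    return pairs

def productivity(child_utts, adult_utts, prepositions):
    """Share of the child's PP types attested in (Study 2) or absent from
    (Study 3) the adult input."""
    child = pp_pairs(child_utts, prepositions)
    adult = pp_pairs(adult_utts, prepositions)
    shared = [p for p in child if p in adult]      # Study 2: attested in input
    novel  = [p for p in child if p not in adult]  # Study 3: child-only PPs
    return len(shared) / len(child), len(novel) / len(child)

# Toy usage with a hypothetical P inventory:
child = [["want", "to", "park"], ["go", "of", "mommy"], ["sit", "on", "chair"]]
adult = [["go", "to", "park"], ["sit", "on", "chair"]]
print(productivity(child, adult, {"to", "of", "on"}))  # (0.67, 0.33)
```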
Next, the dissertation investigates a 'soft nativist' acquisition strategy that combines distributional analysis of the input, minimal a priori knowledge of the possible co-occurrences of morphosyntactic features associated with functional elements, and linguistic knowledge that is presumably acquired via the experience of pragmatic, communicative situations. The output of the analysis consists of a mapping of morphemes to the feature bundles of nominative pronouns for English and Spanish, plus specific claims about the sort of knowledge required from experience.
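The shape of that output can be illustrated with a small invented representation: morphemes keyed to bundles of morphosyntactic features. The feature names and values below are our own illustration, not the dissertation's notation.

```python
# Illustrative only (invented representation): the kind of output the analysis
# yields, pairing nominative pronoun morphemes with feature bundles.
english = {
    "I":   {"person": 1, "number": "sg", "case": "nom"},
    "we":  {"person": 1, "number": "pl", "case": "nom"},
    "she": {"person": 3, "number": "sg", "case": "nom", "gender": "fem"},
}
spanish = {
    "yo":       {"person": 1, "number": "sg", "case": "nom"},
    "nosotros": {"person": 1, "number": "pl", "case": "nom", "gender": "masc"},
}
# Features like person plausibly come from pragmatic/communicative experience;
# their possible co-occurrence is the minimal a priori knowledge assumed.
print(english["I"], spanish["yo"])
```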
The acquisition model is then extended to adpositions, to examine what, if anything, distributional analysis can tell us about the functional sequences of PPs. The results confirm the theoretical position that spatiotemporal Ps are lexical in character, rooting their own extended projections, and that functional Ps express an aspectual sequence in the functional superstructure of the PP.
Wide-coverage parsing for Turkish
Wide-coverage parsing is an area that attracts much attention in natural language processing research, since it is the first step to many other applications in natural language understanding, such as question answering. Supervised learning using human-labelled data is currently the best-performing method, so there is great demand for annotated data. However, human annotation is very expensive, and the amount of annotated data is almost always far less than is needed to train well-performing parsers; this motivates making the best use of the data available. Turkish presents a challenge both because the syntactically annotated Turkish data is relatively small and because Turkish is highly agglutinative, hence unusually sparse at the whole-word level.
The METU-Sabancı Treebank is a dependency treebank of 5620 sentences with surface dependency relations and morphological analyses for words. We show that including even the crudest forms of morphological information extracted from the data boosts the performance of both generative and discriminative parsers, contrary to received opinion concerning English.
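Why morphology matters so much here can be seen in a toy sketch of agglutinative sparsity: distinct Turkish word forms multiply quickly, while the morphemes they are built from recur constantly. The segmentations below are simplified for illustration and are not drawn from the treebank.

```python
# Hypothetical illustration of lexical sparsity in agglutinative Turkish:
# whole-word forms are mostly unique, but their morphemes are heavily reused,
# so morpheme-level statistics are far less sparse.
corpus = {
    "evlerimizden": ["ev", "ler", "imiz", "den"],   # "from our houses"
    "evlerimizde":  ["ev", "ler", "imiz", "de"],    # "in our houses"
    "evde":         ["ev", "de"],                   # "in the house"
}
word_tokens = len(corpus)                                   # 3 tokens
morph_tokens = sum(len(seg) for seg in corpus.values())     # 10 tokens
morph_types = {m for seg in corpus.values() for m in seg}   # 5 types
print(f"word  types/tokens: {len(set(corpus))}/{word_tokens}")   # 3/3: no reuse
print(f"morph types/tokens: {len(morph_types)}/{morph_tokens}")  # 5/10: reuse
```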
We induce word-based and morpheme-based CCG grammars from the Turkish dependency treebank. We use these grammars to train a state-of-the-art CCG parser that predicts long-distance dependencies in addition to those that other parsers are capable of predicting. We also use the correct CCG categories as simple features in a graph-based dependency parser and show that this improves the parsing results.
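One way to picture CCG categories as "simple features" is as extra templates scored on candidate head-dependent arcs, in the spirit of McDonald-style graph-based parsers. The feature templates, tuple format, and toy sentence below are our own assumptions for illustration.

```python
# Illustrative sketch: CCG categories ("supertags") as additional features on
# candidate arcs in a graph-based dependency parser. Templates are hypothetical.
def arc_features(sent, head, dep):
    """sent: list of (word, pos, ccg_category) triples."""
    hw, hp, hc = sent[head]
    dw, dp, dc = sent[dep]
    return [
        f"hw={hw}|dw={dw}",   # lexical pair
        f"hp={hp}|dp={dp}",   # POS pair
        f"hc={hc}|dc={dc}",   # CCG category pair (the added information)
        f"hc={hc}|dp={dp}",   # mixed category/POS template
    ]

# Toy SOV clause "kitabı okudu" ("(s/he) read the book"):
sent = [("kitabı", "Noun", "NP"), ("okudu", "Verb", "(S\\NP)\\NP")]
print(arc_features(sent, 1, 0))
```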
We show that a morpheme-based CCG lexicon for Turkish is able to solve many problems, such as conflicts of semantic scope, recovering long-range dependencies, and obtaining smoother statistics from the models. CCG handles linguistic phenomena such as local and long-range dependencies more naturally and effectively than other linguistic theories, while potentially supporting semantic interpretation in parallel. Using morphological information and a morpheme-cluster-based lexicon improves performance both quantitatively and qualitatively for Turkish.
We also provide an improved version of the treebank, which will be released by kind permission of METU and Sabancı.
Statistical Knowledge and Learning in Phonology
This thesis deals with the theory of the phonetic component of grammar in a formal probabilistic inference framework: (1) it has been recognized since the beginning of generative phonology that some language-specific phonetic implementation is actually context-dependent, and thus it can be said that there are gradient "phonetic processes" in grammar in addition to categorical "phonological processes"; however, no explicit theory has been developed to characterize these processes. Meanwhile, (2) it is understood that language acquisition and perception are both really informed guesswork: the result of both types of inference can reasonably be thought of as a less-than-perfect commitment, with multiple candidate grammars or parses considered, each associated with some degree of credence. Previous research has used probability theory to formalize these inferences in implemented computational models, especially in phonetics and phonology. In this role, computational models serve to demonstrate the existence of working learning/perception/parsing systems assuming a faithful implementation of one particular theory of human language, and are not intended to adjudicate whether that theory is correct. The current thesis (1) develops a theory of the phonetic component of grammar and how it relates to the greater phonological system, and (2) uses a formal Bayesian treatment of learning to evaluate this theory of the phonological architecture and to make predictions about how the resulting grammars will be organized. The coarse description of the consequence for linguistic theory is that the processes we think of as "allophonic" are actually language-specific, gradient phonetic processes, assigned to the phonetic component of grammar; strict allophones have no representation in the output of the categorical phonological grammar.
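The "informed guesswork" framing has a compact worked form: a learner holds a posterior over candidate grammars rather than a hard choice. The numbers and grammar labels below are invented purely to show the Bayesian bookkeeping, not results from the thesis.

```python
# Minimal worked example (invented numbers): Bayesian learning as a
# less-than-perfect commitment, with credence spread over candidate grammars.
# G1: a categorical phonological rule; G2: a gradient phonetic process.
prior = {"G1": 0.5, "G2": 0.5}
# Likelihood of the observed (partly gradient) data under each grammar:
likelihood = {"G1": 0.02, "G2": 0.08}

evidence = sum(prior[g] * likelihood[g] for g in prior)
posterior = {g: prior[g] * likelihood[g] / evidence for g in prior}
print(posterior)  # {'G1': 0.2, 'G2': 0.8}: neither grammar is ruled out
```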
Aspects of emergent cyclicity in language and computation
This thesis has four parts, which correspond to the presentation and development of a theoretical framework for the study of cognitive capacities qua physical phenomena, and to a case study of locality conditions in natural languages.
Part I deals with computational considerations, setting the tone of the rest of the thesis, and introducing and defining critical concepts like 'grammar', 'automaton', and the relations between them. Fundamental questions concerning the place of formal language theory in linguistic inquiry, as well as the expressibility of linguistic and computational concepts in common terms, are raised in this part.
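The grammar/automaton relation that Part I defines has a standard textbook face, which can be made concrete in a few lines; the sketch below is that generic correspondence, not the thesis's own formalism: a right-linear grammar and a finite automaton defining the same language a*b.

```python
# Textbook illustration of the grammar/automaton relation: the right-linear
# grammar  S -> aS | b  and an equivalent two-state DFA both define a*b.
import re

def grammar_generates(s):
    """Does S -> aS | b derive s? (Direct recursive reading of the rules.)"""
    if s == "b":
        return True                                          # S -> b
    return s.startswith("a") and grammar_generates(s[1:])    # S -> aS

def automaton_accepts(s):
    """DFA for a*b: loop on 'a' in q0, accept after exactly one final 'b'."""
    state = "q0"
    for ch in s:
        if state == "q0" and ch == "a":
            state = "q0"
        elif state == "q0" and ch == "b":
            state = "qf"
        else:
            return False      # dead state
    return state == "qf"

for w in ["b", "aab", "aba", ""]:
    assert grammar_generates(w) == automaton_accepts(w) == bool(re.fullmatch(r"a*b", w))
```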
Part II further explores the issues addressed in Part I, with particular emphasis on how grammars are implemented by means of automata, and on the properties of the formal languages that these automata generate. We will argue against the equation of effective computation with function-based computation, and introduce examples of computable procedures which are nevertheless impossible to capture using traditional function-based theories. The connection with cognition will be made in the light of dynamical frustrations: the irreconcilable tension between mutually incompatible tendencies that hold for a given dynamical system. We will provide arguments in favour of analyzing natural language as emerging from a tension between different systems (essentially, semantics and morpho-phonology) which impose orthogonal requirements over admissible outputs. The concept of level of organization, or scale, comes to the foreground here, and apparent contradictions and incommensurabilities between concepts and theories are revisited in a new light: that of dynamical nonlinear systems which are fundamentally frustrated. We will also characterize the computational system that emerges from such an architecture: the goal is to get a syntactic component which assigns the simplest possible structural description to sub-strings, in terms of computational complexity. A system which can oscillate back and forth in the hierarchy of formal languages in assigning structural representations to local domains will be referred to as a computationally mixed system.
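One drastically simplified reading of "simplest possible structural description" uses the standard correspondence between dependency patterns and the formal-language hierarchy: flat dependencies need only finite-state machinery, nested ones need a stack, crossing ones go beyond context-free. The toy classifier below is our illustration of that reading, not the thesis's formalism.

```python
# Speculative toy (not the thesis's formalism): label a local domain by the
# weakest machinery its dependency pattern requires.
def domain_complexity(arcs):
    """arcs: set of (i, j) position pairs with i < j."""
    arcs = sorted(arcs)
    for a, b in arcs:
        for c, d in arcs:
            if a < c < b < d:
                return "crossing: beyond context-free (mildly context-sensitive)"
    for a, b in arcs:
        for c, d in arcs:
            if a < c and d < b:
                return "nested: context-free (stack) machinery needed"
    return "flat: finite-state machinery suffices"

print(domain_complexity({(0, 1), (2, 3)}))  # flat
print(domain_complexity({(0, 3), (1, 2)}))  # nested
print(domain_complexity({(0, 2), (1, 3)}))  # crossing
```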
Part III is where the really fun stuff starts. Field theory is introduced, and its applicability to neurocognitive phenomena is made explicit, with all due scale considerations. Physical and mathematical concepts are permanently interacting as we analyze phrase structure in terms of pseudo-fractals (in Mandelbrot's sense) and define syntax as a (possibly unary) set of topological operations over completely Hausdorff (CH) ultrametric spaces. These operations, which make field perturbations interfere, transform that initial completely Hausdorff ultrametric space into a metric, Hausdorff space with a weaker separation axiom. Syntax, in this proposal, is not 'generative' in any traditional sense (except the 'fully explicit theory' one): rather, it partitions (technically, 'parametrizes') a topological space. Syntactic dependencies are defined as interferences between perturbations over a field, which reduce the total entropy of the system per cycle, at the cost of introducing further dimensions where attractors corresponding to interpretations of a phrase marker can be found.
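The ultrametric claim has a familiar concrete face: on the equal-depth leaves of a rooted tree, distances defined through the lowest common ancestor satisfy the strong triangle inequality d(x, z) <= max(d(x, y), d(y, z)). The check below uses that generic construction on a toy phrase marker; it is not the thesis's field-theoretic construction.

```python
# Generic illustration: LCA-based leaf distances on a rooted tree form an
# ultrametric, satisfying d(x, z) <= max(d(x, y), d(y, z)).
from itertools import permutations

parent = {"the": "DP", "dog": "DP", "barked": "VP", "loudly": "VP",
          "DP": "S", "VP": "S", "S": None}

def ancestors(node):
    chain = []
    while node is not None:
        chain.append(node)
        node = parent[node]
    return chain

def d(x, y):
    """Height of the lowest common ancestor above the (equal-depth) leaves."""
    if x == y:
        return 0
    ax, ay = ancestors(x), set(ancestors(y))
    return next(i for i, n in enumerate(ax) if n in ay)

leaves = ["the", "dog", "barked", "loudly"]
for x, y, z in permutations(leaves, 3):
    assert d(x, z) <= max(d(x, y), d(y, z))  # strong triangle inequality
```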
Part IV is a sample of what we can gain by further pursuing the physics of language approach, both in terms of empirical adequacy and theoretical elegance, not to mention the unlimited possibilities for interdisciplinary collaboration. In this section we set our focus on island phenomena as defined by Ross (1967), critically revisiting the most relevant literature on this topic, and establishing a typology of constructions that are strong islands, which cannot be violated. These constructions are particularly interesting because they limit the phase space of what is expressible via natural language, and thus reveal crucial aspects of its underlying dynamics. We will argue that a dynamically frustrated system which is characterized by displaying mixed computational dependencies can provide straightforward characterizations of cyclicity in terms of changes in dependencies in local domains.
Formal Linguistic Models and Knowledge Processing. A Structuralist Approach to Rule-Based Ontology Learning and Population
2013-2014. The main aim of this research is to propose a structuralist approach to knowledge processing by means of ontology learning and population, achieved starting from unstructured and structured texts. The method suggested combines distributional semantic approaches with NL formalization theories, in order to develop a framework which relies upon deep linguistic analysis...
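Rule-based ontology learning from unstructured text is commonly illustrated with lexico-syntactic (Hearst-style) patterns that harvest is-a pairs from strings like "X such as Y, Z". The sketch below is that generic technique, offered as context for the abstract above; it is not the author's framework, and the pattern is deliberately crude.

```python
# Generic sketch of rule-based ontology population via a Hearst-style pattern;
# an illustration of the technique, not the author's system.
import re

HYPONYM_PATTERN = re.compile(
    r"(\w+?)s?\s+such\s+as\s+(\w+)(?:,\s*(\w+))*", re.IGNORECASE)

def extract_isa(text):
    """Harvest (hyponym, hypernym) pairs from 'X such as Y, Z' contexts.
    (A repeated regex group keeps only its last match; real systems chunk.)"""
    pairs = set()
    for m in HYPONYM_PATTERN.finditer(text):
        hypernym = m.group(1).lower()
        for hyponym in m.groups()[1:]:
            if hyponym:
                pairs.add((hyponym.lower(), hypernym))
    return pairs

text = "The corpus mentions vehicles such as cars, trucks and instruments such as violins."
print(extract_isa(text))
# {('cars', 'vehicle'), ('trucks', 'vehicle'), ('violins', 'instrument')}
```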
Head-Driven Phrase Structure Grammar
Head-Driven Phrase Structure Grammar (HPSG) is a constraint-based or declarative approach to linguistic knowledge, which analyses all descriptive levels (phonology, morphology, syntax, semantics, pragmatics) with feature-value pairs, structure sharing, and relational constraints. In syntax it assumes that expressions have a single, relatively simple constituent structure. This volume provides a state-of-the-art introduction to the framework. Various chapters discuss basic assumptions and formal foundations, describe the evolution of the framework, and go into the details of the main syntactic phenomena. Further chapters are devoted to non-syntactic levels of description. The book also considers related fields and research areas (gesture, sign languages, computational linguistics) and includes chapters comparing HPSG with other frameworks (Lexical Functional Grammar, Categorial Grammar, Construction Grammar, Dependency Grammar, and Minimalism).
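The feature-value machinery mentioned above can be pictured with a minimal sketch: feature structures as nested attribute-value pairs and a recursive unification routine. This is a deliberate simplification for illustration; full HPSG additionally requires typed feature structures and structure sharing (reentrancy), which are omitted here.

```python
# Minimal sketch of HPSG-style feature structures as nested attribute-value
# pairs, with simplified unification (no types or reentrancy).
def unify(fs1, fs2):
    """Return the most specific structure subsuming both inputs, or None."""
    if fs1 == fs2:
        return fs1
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)
        for attr, val in fs2.items():
            if attr in result:
                merged = unify(result[attr], val)
                if merged is None:
                    return None        # feature clash: unification fails
                result[attr] = merged
            else:
                result[attr] = val
        return result
    return None                        # atomic values disagree

verb = {"HEAD": {"CAT": "verb"}, "SUBJ": {"CAT": "noun", "NUM": "sg"}}
subj = {"SUBJ": {"NUM": "sg", "PER": 3}}
print(unify(verb, subj))
# {'HEAD': {'CAT': 'verb'}, 'SUBJ': {'CAT': 'noun', 'NUM': 'sg', 'PER': 3}}
print(unify({"NUM": "sg"}, {"NUM": "pl"}))  # None: agreement clash
```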
Topological Foundations of Cognitive Science
A collection of papers presented at the First International Summer Institute in Cognitive Science, University at Buffalo, July 1994, including the following papers:
** Topological Foundations of Cognitive Science, Barry Smith
** The Bounds of Axiomatisation, Graham White
** Rethinking Boundaries, Wojciech Zelaniec
** Sheaf Mereology and Space Cognition, Jean Petitot
** A Mereotopological Definition of 'Point', Carola Eschenbach
** Discreteness, Finiteness, and the Structure of Topological Spaces, Christopher Habel
** Mass Reference and the Geometry of Solids, Almerindo E. Ojeda
** Defining a 'Doughnut' Made Difficult, N.M. Gotts
** A Theory of Spatial Regions with Indeterminate Boundaries, A.G. Cohn and N.M. Gotts
** Mereotopological Construction of Time from Events, Fabio Pianesi and Achille C. Varzi
** Computational Mereology: A Study of Part-of Relations for Multi-media Indexing, Wlodek Zadrozny and Michelle Ki