EXTENDED FINITE STATE MODELS OF LANGUAGE
not be included here because of space constraints, or because the authors felt that their subsequent work had taken a direction such that they no longer consider the workshop paper fully representative of their current thinking. In particular, we call attention to the tutorial paper by Jelinek (excerpted from his forthcoming book (Jelinek 1997)), the paper by Mohri, Pereira, and Riley describing the AT&T/Bell Labs approach to language modeling using weighted transducers, and the paper by Oehrle on binding and anaphora. Even without these papers, the sheer size of the proceedings made it impossible to include the same material in this issue of JNLE, and the participants were asked to prepare shorter versions (in some cases, extended abstracts) for inclusion here. Full versions of these papers, taking into account the comments received at the workshop, will be published later this year by Cambridge University Press. In addition, a formal Call For Papers yielded several new papers for this issue.
On Folding and Twisting (and whatknot): towards a characterization of workspaces in syntax
Syntactic theory has traditionally adopted a constructivist approach, in
which a set of atomic elements is manipulated by combinatory operations to
yield derived, complex elements. Syntactic structure is thus seen as the result
of discrete recursive combinatorics over lexical items, which get assembled into
phrases, which are themselves combined to form sentences. This view is common
to European and American structuralism (e.g., Benveniste, 1971; Hockett, 1958)
and to different incarnations of generative grammar, transformational and
non-transformational (Chomsky, 1956, 1995; Kaplan & Bresnan, 1982; Gazdar,
1982). Since at least Uriagereka (2002), there has been some attention paid to
the fact that syntactic operations must apply somewhere, particularly when
copying and movement operations are considered. Contemporary syntactic theory
has thus somewhat acknowledged the importance of formalizing aspects of the
spaces in which elements are manipulated, but it is still a vastly
underexplored area. In this paper we explore the consequences of
conceptualizing syntax as a set of topological operations applying over spaces
rather than over discrete elements. We argue that there are empirical
advantages in such a view for the treatment of long-distance dependencies and
cross-derivational dependencies: constraints on possible configurations emerge
from the dynamics of the system.
Comment: Manuscript. Do not cite without permission. Comments welcome.
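The constructivist picture the abstract argues against (atomic elements assembled by discrete recursive combinatorics into phrases and sentences) can be illustrated schematically. The Merge-style function below is a generic textbook rendering of that view, offered as background; it is not the topological formalism this paper proposes:

```python
# A minimal sketch of discrete recursive combinatorics over lexical items:
# a binary operation combines two syntactic objects into a derived object,
# and sentences are built by recursive application.

def merge(a, b):
    # Combine two syntactic objects into a new, complex object.
    return (a, b)

# Build "the cat sleeps" bottom-up from atomic lexical items.
dp = merge("the", "cat")   # determiner phrase
s = merge(dp, "sleeps")    # sentence

print(s)  # (('the', 'cat'), 'sleeps')
```

The point of the contrast drawn in the abstract is that, on this classical view, structure is exhausted by the combinatorics over discrete items; the space in which the operations apply plays no theoretical role.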
Composite pseudogrammars based on parallel language models of Serbian
The aim of this paper is to present the advantages of using composite intelligent systems based on parallel architectures, and above all the advantages of composite pseudogrammars based on parallel language models, in the processing, generation, and evaluation of natural language, especially Serbian. First, a brief introduction to the theory of formal languages is given, different types of grammars are described, and an overview of work on creating their approximations is presented. The concepts of pseudogrammars and language models are described together with their historical development, with the emphasis on the current state of the art, the most recent methods of language modelling, and the currently top-performing language models. The problem of evaluating the quality of a text is introduced, and various methods of semi-automatic and automatic evaluation are described. The second part of the paper describes two experiments that aimed to establish a methodology for creating composite systems for modelling the Serbian language, covering ways of creating different document representations and different ways of combining the outputs of independent natural language processing systems. The parallel systems were successfully tested on the tasks of part-of-speech tagging and authorship attribution through mini-language modelling, where they achieved significantly better results than the independent methods. Finally, the paper describes the process of training a series of generative pretrained transformers on different representations of a corpus of Serbian, and of creating composite pseudogrammars based on those models and on different combining methods. The developed systems were evaluated on the tasks of text quality assessment and of finding and correcting errors in text.
The presented results singled out the stacked trained classifier as the optimal method of combining language models into a single pseudogrammar.
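The stacking idea the abstract singles out (a trained classifier over the outputs of parallel base systems) can be sketched minimally. The toy taggers, labels, and the lookup-table meta-learner below are illustrative assumptions, not the paper's actual systems or data:

```python
# Sketch of stacking: base-model outputs become features, and a simple
# meta-learner is trained on gold labels to combine them.
from collections import Counter, defaultdict

def stack_features(base_outputs):
    # Each base model contributes one label per token; a token's feature
    # vector is the tuple of all base-model labels for that token.
    return list(zip(*base_outputs))

def train_meta(features, gold):
    # Minimal meta-learner: for each observed combination of base-model
    # labels, memorize the most frequent gold label (a lookup-table model).
    table = defaultdict(Counter)
    for f, y in zip(features, gold):
        table[f][y] += 1
    return {f: c.most_common(1)[0][0] for f, c in table.items()}

def predict(meta, features):
    # Fall back to the first base model's label for unseen combinations.
    return [meta.get(f, f[0]) for f in features]

# Two toy POS taggers disagree on token 2; the meta-classifier learns
# from gold data which combination of outputs to trust.
tagger_a = ["N", "V", "N"]
tagger_b = ["N", "N", "N"]
gold     = ["N", "V", "N"]

feats = stack_features([tagger_a, tagger_b])
meta = train_meta(feats, gold)
print(predict(meta, feats))  # ['N', 'V', 'N']
```

A real stacked system would use a proper classifier (e.g. logistic regression) over richer features, but the division of labor is the same: base models run in parallel, and a trained combiner adjudicates.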
On the metatheory of linguistics
Wurm C. On the metatheory of linguistics. Bielefeld: UB Bielefeld; 2013
Mathematical linguistics
but in fact this is still an early draft, version 0.56, August 1, 2001. Please d
Aspects of emergent cyclicity in language and computation
This thesis has four parts, which correspond to the presentation and development of a theoretical
framework for the study of cognitive capacities qua physical phenomena, and a case study of locality conditions over natural languages.
Part I deals with computational considerations, setting the tone for the rest of the thesis, and introducing and defining critical concepts like ‘grammar’, ‘automaton’, and the relations between them. Fundamental questions concerning the place of formal language theory in
linguistic inquiry, as well as the expressibility of linguistic and computational concepts in
common terms, are raised in this part.
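The grammar–automaton relation raised in Part I has a standard concrete instance: a right-linear grammar is, in effect, a finite automaton, with nonterminals playing the role of states. The toy grammar below (S → aS | bF, F → ε, generating a*b) is a hedged illustration of that correspondence, not the thesis's own construction:

```python
# A right-linear grammar read directly as a finite automaton:
# nonterminals are states, productions X -> t Y are transitions.

grammar = {
    "S": [("a", "S"), ("b", "F")],  # productions of the form X -> t Y
    "F": [],                        # F -> epsilon: a terminating state
}
accepting = {"F"}

def accepts(string, state="S"):
    # Empty input is accepted iff the current nonterminal may terminate.
    if not string:
        return state in accepting
    head, rest = string[0], string[1:]
    return any(accepts(rest, nxt)
               for sym, nxt in grammar[state] if sym == head)

print(accepts("aab"))  # True:  in a*b
print(accepts("aba"))  # False: not in a*b
```

Running the recognizer is literally running the grammar, which is the sense in which the two concept families are expressible "in common terms."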
Part II further explores the issues addressed in Part I with particular emphasis on how
grammars are implemented by means of automata, and the properties of the formal languages
that these automata generate. We will argue against the equation between effective computation
and function-based computation, and introduce examples of computable procedures which are
nevertheless impossible to capture using traditional function-based theories. The connection
with cognition will be made in the light of dynamical frustrations: the irreconcilable tension
between mutually incompatible tendencies that hold for a given dynamical system. We will
provide arguments in favour of analyzing natural language as emerging from a tension between
different systems (essentially, semantics and morpho-phonology) which impose orthogonal
requirements over admissible outputs. The concept of level of organization or scale comes to
the foreground here; and apparent contradictions and incommensurabilities between concepts
and theories are revisited in a new light: that of dynamical nonlinear systems which are
fundamentally frustrated. We will also characterize the computational system that emerges from
such an architecture: the goal is to get a syntactic component which assigns the simplest
possible structural description to sub-strings, in terms of its computational complexity. A
system which can oscillate back and forth in the hierarchy of formal languages in assigning
structural representations to local domains will be referred to as a computationally mixed
system.
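The notion of a computationally mixed system (oscillating in the hierarchy of formal languages across local domains) can be sketched in miniature. The bracket-delimited "local domain," the toy languages, and all function names below are illustrative assumptions, not the thesis's actual architecture:

```python
# Sketch of a computationally mixed recognizer: a finite-state scan for
# most of the input, switching to counter (pushdown-strength) power only
# inside a delimited local domain.

def regular_scan(s):
    # Finite-state check: the string is over {x} only (a regular language).
    return all(c == "x" for c in s)

def counter_scan(s):
    # a^n b^n requires counting, i.e. more than finite-state power.
    n = len(s)
    half = n // 2
    return n % 2 == 0 and s[:half] == "a" * half and s[half:] == "b" * half

def mixed(s):
    # Local domains are delimited by '['...']'; the recognizer changes
    # computational strength only inside the brackets.
    if "[" in s:
        pre, rest = s.split("[", 1)
        mid, post = rest.split("]", 1)
        return regular_scan(pre) and counter_scan(mid) and regular_scan(post)
    return regular_scan(s)

print(mixed("xx[aabb]x"))  # True:  regular outside, a^n b^n inside
print(mixed("xx[aab]x"))   # False: the local domain fails the counter check
```

The design choice the sketch isolates is that complexity is assigned per local domain rather than globally: the recognizer as a whole is neither purely finite-state nor uniformly context-free.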
Part III is where the really fun stuff starts. Field theory is introduced, and its applicability to
neurocognitive phenomena is made explicit, with all due scale considerations. Physical and
mathematical concepts are permanently interacting as we analyze phrase structure in terms of
pseudo-fractals (in Mandelbrot’s sense) and define syntax as a (possibly unary) set of
topological operations over completely Hausdorff (CH) ultrametric spaces. These operations, which make field perturbations interfere, transform that initial completely Hausdorff
ultrametric space into a metric Hausdorff space with a weaker separation axiom. Syntax, in this
proposal, is not ‘generative’ in any traditional sense (except the ‘fully explicit theory’ one):
rather, it partitions (technically, ‘parametrizes’) a topological space. Syntactic dependencies are
defined as interferences between perturbations over a field, which reduce the total entropy of
the system per cycle, at the cost of introducing further dimensions where attractors
corresponding to interpretations for a phrase marker can be found.
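For background, the distinction between the two kinds of space invoked above turns on the shape of the triangle inequality; the standard definitions are, schematically:

```latex
% An ultrametric strengthens the ordinary triangle inequality:
d(x,z) \le \max\{\, d(x,y),\ d(y,z) \,\}   \quad \text{(ultrametric)}
% whereas a metric only requires
d(x,z) \le d(x,y) + d(y,z)                 \quad \text{(metric)}
```

Passing from an ultrametric to a mere metric, as in the transformation described above, thus relaxes the strong max-inequality while the separation axiom weakens from completely Hausdorff to Hausdorff.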
Part IV is a sample of what we can gain by further pursuing the physics of language approach,
both in terms of empirical adequacy and theoretical elegance, not to mention the unlimited
possibilities of interdisciplinary collaboration. In this section we set our focus on island
phenomena as defined by Ross (1967), critically revisiting the most relevant literature on this
topic, and establishing a typology of constructions that are strong islands, which cannot be
violated. These constructions are particularly interesting because they limit the phase space of
what is expressible via natural language, and thus reveal crucial aspects of its underlying
dynamics. We will argue that a dynamically frustrated system which is characterized by
displaying mixed computational dependencies can provide straightforward characterizations of
cyclicity in terms of changes in dependencies in local domains.