85 research outputs found

    Learning categorial grammars

    In 1967 E. M. Gold published a paper that analyzed the language classes of the Chomsky hierarchy in terms of learnability, in the technical sense of identification in the limit. His results were mostly negative, and perhaps because of this his work had little impact on linguistics. In the early eighties there was renewed interest in the paradigm, mainly because of work by Angluin and Wright. Around the same time, Arikawa and his co-workers refined the paradigm by applying it to so-called Elementary Formal Systems. Using this approach, Takeshi Shinohara obtained an impressive result: any class of context-sensitive grammars with a bound on its number of rules is learnable. Some linguistically motivated work on learnability also appeared from this point on, most notably Wexler & Culicover 1980 and Kanazawa 1994. The latter investigates the learnability of various classes of categorial grammar, inspired by work by Buszkowski and Penn, and raises some interesting questions. We follow up on this work by exploring complexity issues relevant to learning these classes, answering an open question from Kanazawa 1994, and applying the same kind of approach to obtain (non)learnable classes of Combinatory Categorial Grammars, Tree Adjoining Grammars, Minimalist grammars, Generalized Quantifiers, and some variants of Lambek Grammars. We also discuss work on learning tree languages and its application to learning Dependency Grammars. Our main conclusions are:
    - formal learning theory is relevant to linguistics,
    - identification in the limit is feasible for non-trivial classes,
    - the 'Shinohara approach' (i.e., placing a numerical bound on the complexity of a grammar) can lead to a learnable class, but this depends entirely on the specific nature of the formalism and the notion of complexity. We give examples of natural classes of commonly used linguistic formalisms that resist this kind of approach,
    - learning is hard work. Our results indicate that learning even 'simple' classes of languages requires a lot of computational effort,
    - dealing with structure languages (derivation or dependency structures) instead of string languages offers a useful and promising approach to learnability in a linguistic context.

    Da linguística gerativa à gramática categorial : sujeitos lexicais em infinitivos controlados

    Advisors: Marcelo Esteban Coniglio, Sonia Maria Lazzarini Cyrino. Doctoral thesis, Universidade Estadual de Campinas, Instituto de Filosofia e Ciências Humanas.
    Abstract: The present thesis lies at the interface of logic and linguistics; its object of study is control sentences with overt pronouns in Romance languages (European and Brazilian Portuguese, Italian and Spanish). This topic has received considerably more attention from linguists, especially in recent years, than from logicians. Perhaps for this reason, much remains to be understood about these linguistic structures and their underlying logical properties. This thesis seeks to fill the gaps in the literature, or at least to take steps in this direction, by addressing a number of issues that have so far been under-explored. To this end we put forward two key questions, one linguistic and the other logical. These are, respectively: What is the syntactic status of the surface pronoun? And: What mechanisms are available for reusing semantic resources in a contraction-free logical grammar? Accordingly, the thesis is divided into two parts: generative linguistics and categorial grammar. Part I reviews the recent discussion within the generative literature on infinitive clauses with overt subjects, paying detailed attention to the main accounts in the field. Part II does the same on the logical grammar front, addressing in particular the issues of control and of anaphoric pronouns. Ultimately, the leading accounts from both camps will be found wanting. The closing chapters of Part I and Part II therefore put forward alternative candidates that we contend are more successful than their predecessors. More specifically, in Part I we offer a linguistic account along the lines of Landau's T/Agr theory of control. In Part II we present two alternative categorial accounts: one based on Combinatory Categorial Grammar, the other on Type-Logical Grammar. Each of these accounts offers an improved, more fine-grained perspective on control infinitives featuring overt pronominal subjects. Finally, we include an Appendix in which our type-logical proposal is implemented in a categorial parser/theorem-prover.
    Degree: Doctorate in Philosophy (Doutora em Filosofia). Grants: 2013/08115-1, 2015/09699-2. Funders: FAPESP, CAPES.

    Category-Theoretic Quantitative Compositional Distributional Models of Natural Language Semantics

    This thesis is about the problem of compositionality in distributional semantics. Distributional semantics presupposes that the meanings of words are a function of their occurrences in textual contexts. It models words as distributions over these contexts and represents them as vectors in high-dimensional spaces. The problem of compositionality for such models is how to produce representations for larger units of text by composing the representations of smaller units of text. This thesis focuses on a particular approach to this compositionality problem, namely the categorical framework developed by Coecke, Sadrzadeh, and Clark, which combines syntactic analysis formalisms with distributional semantic representations of meaning to produce syntactically motivated composition operations. This thesis shows how this approach can be theoretically extended and practically implemented to produce concrete compositional distributional models of natural language semantics. It furthermore demonstrates that such models can perform on par with, or better than, competing approaches in the field of natural language processing. There are three principal contributions to computational linguistics in this thesis. The first is to extend the DisCoCat framework on the syntactic and semantic fronts, incorporating a number of syntactic analysis formalisms and providing learning procedures that allow concrete compositional distributional models to be generated. The second contribution is to evaluate the models developed from the procedures presented here, showing that they outperform other compositional distributional models present in the literature. The third contribution is to show how using category theory to solve linguistic problems forms a sound basis for research, illustrated by examples of work on this topic, which also suggest directions for future research.
    Comment: DPhil Thesis, University of Oxford, Submitted and accepted in 201

    Prospects for Declarative Mathematical Modeling of Complex Biological Systems

    Declarative modeling uses symbolic expressions to represent models. With such expressions one can formalize high-level mathematical computations on models that would be difficult or impossible to perform directly on a lower-level simulation program written in a general-purpose programming language. Examples of such computations on models include model analysis, relatively general-purpose model-reduction maps, and the initial phases of model implementation, all of which should preserve or approximate the mathematical semantics of a complex biological model. The potential advantages are particularly relevant in the case of developmental modeling, wherein complex spatial structures exhibit dynamics at molecular, cellular, and organogenic levels to relate genotype to multicellular phenotype. Multiscale modeling can benefit from both the expressive power of declarative modeling languages and the application of model reduction methods to link models across scales. Based on previous work, here we define declarative modeling of complex biological systems by defining the operator algebra semantics of an increasingly powerful series of declarative modeling languages, including reaction-like dynamics of parameterized and extended objects; we define semantics-preserving implementation and semantics-approximating model reduction transformations; and we outline a "meta-hierarchy" for organizing declarative models and the mathematical methods that can fruitfully manipulate them.

    The Logic of Categorial Grammars: Lecture Notes

    These lecture notes present categorial grammars as deductive systems, and include detailed proofs of their main properties. The first chapter deals with Ajdukiewicz and Bar-Hillel categorial grammars (AB grammars), their relation to context-free grammars and their learning algorithms. The second chapter is devoted to the Lambek calculus as a deductive system; the weak equivalence with context-free grammars is proved; we also define the mapping from a syntactic analysis to a higher-order logical formula, which describes the semantics of the parsed sentence. The third and last chapter is about proof-nets as parse structures for Lambek grammars; we show the linguistic relevance of these graphs, in particular through the study of a performance question. Although definitions, theorems and proofs have been reformulated for pedagogical reasons, these notes contain no personal results except in the proof-net chapter.
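
    As an illustration of what an AB grammar does, the following is a minimal sketch of a recognizer that uses only the two application rules (A/B followed by B yields A; B followed by B\A yields A). It is not taken from the lecture notes: the tuple encoding of categories, the helper names `fwd`, `bwd`, `combine` and `recognizes`, and the toy lexicon are all assumptions made for this example. The chart-based search is CYK-style, which fits the close relation between AB grammars and context-free grammars mentioned in the abstract.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical encoding: an atomic category is a string; a functor category
# is a tuple ("/", result, argument) for result/argument or
# ("\\", argument, result) for argument\result.
Cat = Union[str, Tuple[str, "Cat", "Cat"]]


def fwd(result: Cat, arg: Cat) -> Cat:
    """Build the category result/arg (looks for its argument on the right)."""
    return ("/", result, arg)


def bwd(arg: Cat, result: Cat) -> Cat:
    """Build the category arg\\result (looks for its argument on the left)."""
    return ("\\", arg, result)


def combine(left: Cat, right: Cat) -> List[Cat]:
    """Apply forward and backward application to two adjacent categories."""
    out: List[Cat] = []
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        out.append(left[1])   # A/B  B  =>  A
    if isinstance(right, tuple) and right[0] == "\\" and right[1] == left:
        out.append(right[2])  # B  B\A  =>  A
    return out


def recognizes(lexicon: Dict[str, Set[Cat]], words: List[str], goal: Cat = "S") -> bool:
    """CYK-style chart recognition with the two application rules only."""
    n = len(words)
    if n == 0:
        return False
    chart: Dict[Tuple[int, int], Set[Cat]] = {}
    for i, word in enumerate(words):
        chart[(i, i + 1)] = set(lexicon.get(word, ()))
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            cell: Set[Cat] = set()
            for split in range(start + 1, end):
                for left in chart[(start, split)]:
                    for right in chart[(split, end)]:
                        cell.update(combine(left, right))
            chart[(start, end)] = cell
    return goal in chart[(0, n)]


# Toy lexicon (an assumption for illustration only).
LEXICON: Dict[str, Set[Cat]] = {
    "Mary": {"NP"},
    "John": {"NP"},
    "sleeps": {bwd("NP", "S")},
    "sees": {fwd(bwd("NP", "S"), "NP")},
}
```

    With this lexicon, `recognizes(LEXICON, ["Mary", "sees", "John"])` succeeds because "sees John" reduces to NP\S, which then combines with "Mary" on its left to give S.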

    K + K = 120 : Papers dedicated to László Kálmán and András Kornai on the occasion of their 60th birthdays


    Meaning versus Grammar

    This volume investigates the complicated relationship between grammar, computation, and meaning in natural languages. It details conditions under which meaning-driven processing of natural language is feasible, discusses an operational and accessible implementation of the grammatical cycle for Dutch, and offers analyses of a number of further conjectures about constituency and entailment in natural language.

    Algebraic dependency grammar

    We propose a mathematical formalism called Algebraic Dependency Grammar, with applications to formal linguistics and to formal language theory. Regarding formal linguistics, we aim to address the problem of grammaticality, with special attention to cross-linguistic cases. In the field of formal language theory, this formalism provides a new perspective that allows an algebraic classification of languages. Notably, our approach suggests the existence of so-called anti-classes of languages associated with certain classes of languages. Our notion of a dependency grammar consists of a definition of a set of well-constructed dependency trees (we call this algebraic governance) and a relation that associates word orders with dependency trees (we call this algebraic linearization). In relation to algebraic governance, we define a manifold, which is a set of dependency trees satisfying an agreement condition throughout a pattern; a pattern is the algebraic form of a collection of syntactic addresses over the dependency tree. A Boolean condition on the words formalizes the notion of agreement. In relation to algebraic linearization, we first observe that the essence of the notion of projectivity is that certain substructures of a dependency tree always form an interval in its linearization. We therefore have to establish precisely what a substructure is; we see again that patterns provide the key, generalizing the notion of projectivity with recursive linearization procedures. Combining the above modules gives the formalism: an algebraic dependency grammar is a manifold together with a linearization. Note that patterns underlie both manifolds and linearizations. We study their interrelation in terms of a new algebraic classification of classes of languages. We highlight the main contributions of the thesis. Regarding mathematical linguistics, algebraic dependency grammar treats trees and word order as different modules in the architecture, which allows the description of languages with varied word order. Ellipses are permitted; this issue is usually avoided because it makes some formalisms undecidable. We differentiate linguistic phenomena structurally by their algebraic description. Algebraic dependency grammar makes it possible to observe affinities between linguistic constructions that seem superficially different. Regarding formal language theory, we present a new system for understanding a very large family of languages, which permits the observation of languages in broader contexts. We identify a new class, named anti-context-free languages, containing constructions structurally symmetric to context-free languages. Informally, we could say that context-free languages are well-parenthesized, while anti-context-free languages are cross-serial-parenthesized. For example, copy languages and respectively languages are anti-context-free.
    Postprint (published version)
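
    The classical notion of projectivity that the abstract generalizes can be checked directly. The sketch below is an assumption-laden illustration, not the thesis's formalism: it takes the relevant substructures to be full subtrees, encodes a tree as a list `heads` in which `heads[i]` is the position of word i's head (with -1 for the root), treats list positions as the linearization, and assumes the input is a well-formed tree without cycles. The function names are hypothetical.

```python
from typing import Dict, List


def subtree_positions(heads: List[int], root: int) -> List[int]:
    """Return the sorted positions of `root` and all of its descendants."""
    children: Dict[int, List[int]] = {}
    for node, head in enumerate(heads):
        children.setdefault(head, []).append(node)
    stack, seen = [root], []
    while stack:
        node = stack.pop()
        seen.append(node)
        stack.extend(children.get(node, []))
    return sorted(seen)


def is_projective(heads: List[int]) -> bool:
    """True if every subtree occupies a contiguous interval of positions.

    Assumes `heads` encodes a tree (no cycles); heads[i] == -1 marks the root.
    """
    for node in range(len(heads)):
        positions = subtree_positions(heads, node)
        # A set of positions is an interval exactly when its span equals its size.
        if positions[-1] - positions[0] + 1 != len(positions):
            return False
    return True
```

    For example, `is_projective([1, 2, -1])` holds for a three-word chain, while `is_projective([2, 3, -1, 2])` fails because the subtree headed by word 3 covers positions 1 and 3 but not position 2.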