49 research outputs found

    Self-Attention Networks Can Process Bounded Hierarchical Languages

    Despite their impressive performance in NLP, self-attention networks were recently proved to be limited for processing formal languages with hierarchical structure, such as $\mathsf{Dyck}_k$, the language consisting of well-nested parentheses of $k$ types. This suggested that natural language can be approximated well with models that are too weak for formal languages, or that the role of hierarchy and recursion in natural language might be limited. We qualify this implication by proving that self-attention networks can process $\mathsf{Dyck}_{k,D}$, the subset of $\mathsf{Dyck}_k$ with depth bounded by $D$, which arguably better captures the bounded hierarchical structure of natural language. Specifically, we construct a hard-attention network with $D+1$ layers and $O(\log k)$ memory size (per token per layer) that recognizes $\mathsf{Dyck}_{k,D}$, and a soft-attention network with two layers and $O(\log k)$ memory size that generates $\mathsf{Dyck}_{k,D}$. Experiments show that self-attention networks trained on $\mathsf{Dyck}_{k,D}$ generalize to longer inputs with near-perfect accuracy, and also verify the theoretical memory advantage of self-attention networks over recurrent networks.
    Comment: ACL 2021. 19 pages with extended appendix. Fixed a small typo in the formula at the end of page 5 (thanks to Gabriel Faria). Code: https://github.com/princeton-nlp/dyck-transforme
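    For concreteness, here is a minimal sketch (not taken from the paper; the function name is_dyck_k_d and its interface are illustrative assumptions) of a recognizer for this language: a single stack suffices, and the depth bound $D$ is simply a cap on the stack height.

        def is_dyck_k_d(s, pairs, max_depth):
            """Recognize Dyck_{k,D}: well-nested strings over k bracket
            types whose nesting depth never exceeds max_depth.

            pairs: dict mapping each closing bracket to its opening one,
                   e.g. {')': '(', ']': '['} for k = 2.
            """
            openers = set(pairs.values())
            stack = []
            for ch in s:
                if ch in openers:
                    stack.append(ch)
                    if len(stack) > max_depth:   # depth bound D violated
                        return False
                elif ch in pairs:
                    if not stack or stack.pop() != pairs[ch]:
                        return False             # mismatched or unopened bracket
                else:
                    return False                 # symbol outside the alphabet
            return not stack                     # all brackets must be closed

        # Example: k = 2 bracket types, depth bound D = 2.
        assert is_dyck_k_d("([])[]", {')': '(', ']': '['}, 2)
        assert not is_dyck_k_d("((()))", {')': '('}, 2)   # depth 3 > D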

    Topics in Programming Languages, a Philosophical Analysis through the case of Prolog

    Programming languages seldom find proper anchorage in the philosophy of logic, language and science. What is more, philosophy of language seems to be restricted to natural languages and linguistics, and even philosophy of logic is rarely framed in terms of programming languages. The logic programming paradigm and Prolog are thus the most adequate paradigm and programming language for working on this subject, combining natural language processing and linguistics, logic programming, and construction methodology for both algorithms and procedures, within an overall philosophizing declarative status. Moreover, the dimension of the Fifth Generation Computer Systems project related to strong AI, wherein Prolog took a major role, and its historical frame within the crucial dialectic between procedural and declarative paradigms and between structuralist and empiricist biases, serve in exemplary form to treat the philosophy of logic, language and science in the contemporary age as well. In recounting Prolog's philosophical, mechanical and algorithmic harbingers, various routes open up, and we exemplify some here: the mechanical-computational background explored by Pascal, Leibniz, Boole, Jacquard, Babbage and Konrad Zuse, up to the ACE (Alan Turing) and the EDVAC (von Neumann), offers the backbone of computer architecture; the work of Turing, Church, Gödel, Kleene, von Neumann, Shannon and others on computability, thoroughly studied in detail along parallel lines, permits us to interpret the evolving realm of programming languages. The line from the lambda calculus to the Algol family, the declarative-procedural split between the C language and Prolog, and the ensuing branching and explosion of programming languages and their further delimitation are then inspected so as to relate them to the syntax, semantics and philosophical élan of logic programming and Prolog.

    Kunz Languages


    Transformers Learn Shortcuts to Automata

    Algorithmic reasoning requires capabilities which are most naturally understood through recurrent models of computation, like the Turing machine. However, Transformer models, while lacking recurrence, are able to perform such reasoning using far fewer layers than the number of reasoning steps. This raises the question: what solutions are learned by these shallow and non-recurrent models? We find that a low-depth Transformer can represent the computations of any finite-state automaton (thus, any bounded-memory algorithm), by hierarchically reparameterizing its recurrent dynamics. Our theoretical results characterize shortcut solutions, whereby a Transformer with $o(T)$ layers can exactly replicate the computation of an automaton on an input sequence of length $T$. We find that polynomial-sized $O(\log T)$-depth solutions always exist; furthermore, $O(1)$-depth simulators are surprisingly common, and can be understood using tools from Krohn-Rhodes theory and circuit complexity. Empirically, we perform synthetic experiments by training Transformers to simulate a wide variety of automata, and show that shortcut solutions can be learned via standard training. We further investigate the brittleness of these solutions and propose potential mitigations.
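    The algebraic fact behind such shortcuts is that automaton transition functions compose associatively, so the $T$ sequential steps of execution can be collapsed into a logarithmic-depth parallel prefix scan. The sketch below illustrates that idea only; it is not code from the paper, and run_automaton_parallel is a hypothetical name.

        def run_automaton_parallel(delta, q0, word):
            """Return the automaton state after each prefix of `word`.

            delta: dict (state, symbol) -> state
            q0:    initial state
            Each symbol is turned into a function on states, encoded as a
            tuple f with f[q] = next state; tuples compose associatively.
            """
            states = sorted({q for q, _ in delta} | set(delta.values()))
            idx = {q: i for i, q in enumerate(states)}
            # Transition function of each input symbol, as a tuple over states.
            fs = [tuple(idx[delta[(q, a)]] for q in states) for a in word]

            # Hillis-Steele inclusive scan: O(log T) rounds of composition,
            # mirroring the logarithmic-depth shortcut the paper proves.
            shift = 1
            while shift < len(fs):
                fs = [fs[i] if i < shift
                      else tuple(fs[i][s] for s in fs[i - shift])  # compose
                      for i in range(len(fs))]
                shift *= 2
            return [states[f[idx[q0]]] for f in fs]

        # Example: parity automaton over {0,1} (state = parity of ones seen).
        delta = {('even', '0'): 'even', ('even', '1'): 'odd',
                 ('odd', '0'): 'odd',   ('odd', '1'): 'even'}
        print(run_automaton_parallel(delta, 'even', "1101"))
        # ['odd', 'even', 'even', 'odd']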

    Acta Cybernetica: Volume 15, Number 1.


    Knowledge Transfer, Templates, and the Spillovers

    Mathematical models and their modeling frameworks developed to advance knowledge in one discipline are sometimes sourced to answer questions or solve problems in another discipline. Studying this aspect of the cross-disciplinary transfer of knowledge objects, philosophers of science have weighed in on the question of whether knowledge about how a mathematical model was previously applied in one discipline is necessary for the success of reapplying said model in a different discipline. However, not much has been said about whether the answer to that epistemological question applies to the reapplication of a modeling framework. More generally, regarding the nature of the production of knowledge in science, a metaphysical question remains to be explored: whether historical contingencies associated with a mathematical construct have a genuine impact on the nature (as opposed to the sociological practices or individual psychology) of advancing scientific knowledge with said construct. Focusing on this metaphysical question, this paper analyzes the use of mathematical logic in the development of the Chomsky hierarchy and subsequent reapplications of said hierarchy; with these examples, this paper develops the notion of "spillovers" as a way to detect cross-disciplinary justifications, for better understanding the relations between reapplications of the same mathematical construct across disciplines.

    Regular Languages: To Finite Automata and Beyond - Succinct Descriptions and Optimal Simulations

    It is well known that regular (or type 3) languages are equivalent to finite automata. Nevertheless, many other characterizations of this class of languages in terms of computational devices and generative models are present in the literature. For example, by suitably restricting more general models such as context-free grammars, pushdown automata, and Turing machines, which characterize wider classes of languages, it is possible to obtain formal models that generate or recognize regular languages only. The resulting formalisms provide alternative representations of type 3 languages that may be significantly more concise than other models that share the same expressive power. The goal of this work is to investigate these formal systems from a descriptional complexity perspective, or, in other words, to study the relationships between their sizes, namely the number of symbols used to write down their descriptions. We also present some results related to the investigation of the famous question posed by Sakoda and Sipser in 1978, concerning the size blowups from nondeterministic finite automata to two-way deterministic finite automata.
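    As a concrete instance of such a size blowup, the classical subset construction converts an n-state nondeterministic finite automaton into an equivalent deterministic one with up to 2^n states. The sketch below is illustrative and not taken from the thesis; determinize is a hypothetical helper, demonstrated on the standard witness language "the third symbol from the end is a".

        def determinize(nfa_delta, start, accepting):
            """Subset construction: NFA -> equivalent DFA.

            nfa_delta: dict (state, symbol) -> set of states
            Returns (dfa_delta, dfa_start, dfa_accepting), where DFA
            states are frozensets of NFA states.
            """
            alphabet = {a for _, a in nfa_delta}
            dfa_start = frozenset([start])
            dfa_delta, todo, seen = {}, [dfa_start], {dfa_start}
            while todo:
                S = todo.pop()
                for a in alphabet:
                    # Union of NFA moves from every state in the subset S.
                    T = frozenset(q2 for q in S
                                  for q2 in nfa_delta.get((q, a), ()))
                    dfa_delta[(S, a)] = T
                    if T not in seen:
                        seen.add(T)
                        todo.append(T)
            dfa_accept = {S for S in seen if S & accepting}
            return dfa_delta, dfa_start, dfa_accept

        # Example: NFA for "the 3rd symbol from the end is 'a'" (n+1 states);
        # any equivalent DFA needs 2^n states, here 8 reachable subsets.
        n = 3
        nfa = {(0, 'a'): {0, 1}, (0, 'b'): {0}}
        for i in range(1, n):
            nfa[(i, 'a')] = {i + 1}
            nfa[(i, 'b')] = {i + 1}
        delta, s0, acc = determinize(nfa, 0, {n})
        print(len({S for S, _ in delta} | set(delta.values())))  # 8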

    An Open Logic Approach to EPM

    EPM is a highly operative and didactic versatile tool, and new application areas are envisaged continuously. In turn, this new awareness has allowed us to enlarge our panorama for understanding neurocognitive system behavior, and to develop information conservation and regeneration systems in a numeric self-reflexive/reflective evolutive reference framework. Unfortunately, a logically closed model cannot cope with ontological uncertainty by itself; it needs a complementary logical aperture operational support extension. To achieve this goal, it is possible to use two coupled irreducible information management subsystems, based on the following ideal coupled irreducible asymptotic dichotomy: "Information Reliable Predictability" and "Information Reliable Unpredictability" subsystems. To behave realistically, the overall system must guarantee both Logical Closure and Logical Aperture, both fed by environmental "noise" (better, by what human beings call "noise"). In this way, a natural operating point can emerge as a new trans-disciplinary reality level, out of the interaction of two complementary irreducible information management subsystems within their environment. It is thus possible to extend the traditional EPM approach in order to profit from both the classic EPM intrinsic Self-Reflexive Functional Logical Closure and the new numeric CICT Self-Reflective Functional Logical Aperture. EPM can be thought of as a reliable starting subsystem to initialize a process of continuous self-organizing and self-logic learning refinement.
    Fiorini, Rodolfo; Degiacomo, Piero