14 research outputs found

    Specifying Software Languages: Grammars, Projectional Editors, and Unconventional Approaches

    Get PDF
    We discuss several approaches for defining software languages, together with Integrated Development Environments for them. The theoretical foundation is grammar-based models, which can be used where proven correctness of specifications is required. From a practical point of view, we discuss how language specification can be made more accessible by focusing on language workbenches and projectional editing, and how these can be formalized. We also give a brief overview of unconventional approaches to language definition, and outline three open problems connected to the approaches we discuss.

    Clique-Based Lower Bounds for Parsing Tree-Adjoining Grammars

    Get PDF
    up to lower order factors

    If the Current Clique Algorithms are Optimal, so is Valiant's Parser

    Full text link
    The CFG recognition problem is: given a context-free grammar $\mathcal{G}$ and a string $w$ of length $n$, decide if $w$ can be obtained from $\mathcal{G}$. This is the most basic parsing question and is a core computer science problem. Valiant's parser from 1975 solves the problem in $O(n^{\omega})$ time, where $\omega < 2.373$ is the matrix multiplication exponent. Dozens of parsing algorithms have been proposed over the years, yet Valiant's upper bound remains unbeaten. The best combinatorial algorithms have mildly subcubic $O(n^3/\log^3 n)$ complexity. Lee (JACM'01) provided evidence that fast matrix multiplication is needed for CFG parsing, and that very efficient and practical algorithms might be hard or even impossible to obtain. Lee showed that any algorithm for a more general parsing problem with running time $O(|\mathcal{G}| \cdot n^{3-\varepsilon})$ can be converted into a surprising subcubic algorithm for Boolean Matrix Multiplication. Unfortunately, Lee's hardness result required that the grammar size be $|\mathcal{G}| = \Omega(n^6)$. Nothing was known for the more relevant case of constant size grammars. In this work, we prove that any improvement on Valiant's algorithm, even for constant size grammars, either in terms of runtime or by avoiding the inefficiencies of fast matrix multiplication, would imply a breakthrough algorithm for the $k$-Clique problem: given a graph on $n$ nodes, decide if there are $k$ that form a clique. Besides classifying the complexity of a fundamental problem, our reduction has led us to similar lower bounds for more modern and well-studied cubic time problems for which faster algorithms are highly desirable in practice: RNA Folding, a central problem in computational biology, and Dyck Language Edit Distance, answering an open question of Saha (FOCS'14).
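
    To make the problem statement concrete, the following is a minimal sketch of the classic CYK recogniser, the cubic-time combinatorial baseline that Valiant's matrix-multiplication parser improves upon. The grammar encoding (dictionaries of unary and binary rules in Chomsky normal form) and the function name are illustrative choices, not taken from the paper.

    # Minimal CYK recogniser for a grammar in Chomsky normal form -- the classic
    # cubic-time combinatorial baseline that Valiant's O(n^omega) parser improves.
    # The grammar encoding (dicts of unary/binary rules) is illustrative only.
    def cyk_recognise(word, unary, binary, start):
        """unary:  maps a terminal to the set of nonterminals deriving it.
           binary: maps a pair (B, C) to the set of nonterminals A with A -> B C."""
        n = len(word)
        if n == 0:
            return False  # the empty word would need a separate nullability check
        # table[i][j] = set of nonterminals deriving word[i:j]
        table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
        for i, a in enumerate(word):
            table[i][i + 1] = set(unary.get(a, ()))
        for length in range(2, n + 1):
            for i in range(0, n - length + 1):
                j = i + length
                for k in range(i + 1, j):          # split point: word[i:k] + word[k:j]
                    for B in table[i][k]:
                        for C in table[k][j]:
                            table[i][j] |= binary.get((B, C), set())
        return start in table[0][n]

    # Example grammar: S -> A B, A -> 'a', B -> 'b'
    unary = {'a': {'A'}, 'b': {'B'}}
    binary = {('A', 'B'): {'S'}}
    print(cyk_recognise("ab", unary, binary, 'S'))   # True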

    Small-Space Algorithms for the Online Language Distance Problem for Palindromes and Squares

    Full text link
    We study the online variant of the language distance problem for two classical formal languages, the language of palindromes and the language of squares, and for the two most fundamental distances, the Hamming distance and the edit (Levenshtein) distance. In this problem, defined for a fixed formal language $L$, we are given a string $T$ of length $n$, and the task is to compute the minimal distance to $L$ from every prefix of $T$. We focus on the low-distance regime, where one must compute only the distances smaller than a given threshold $k$. In this work, our contribution is twofold. First, we show streaming algorithms, which access the input string $T$ only through a single left-to-right scan. Both for palindromes and squares, our algorithms use $O(k \cdot \mathrm{poly}~\log n)$ space and time per character in the Hamming-distance case and $O(k^2 \cdot \mathrm{poly}~\log n)$ space and time per character in the edit-distance case. These algorithms are randomised by necessity, and they err with probability inverse-polynomial in $n$. Second, we show deterministic read-only online algorithms, which are also provided with read-only random access to the already processed characters of $T$. Both for palindromes and squares, our algorithms use $O(k \cdot \mathrm{poly}~\log n)$ space and time per character in the Hamming-distance case and $O(k^4 \cdot \mathrm{poly}~\log n)$ space and amortised time per character in the edit-distance case. Comment: Accepted to ISAAC'2
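
    As a point of reference for the problem definition, here is a naive sketch of the palindrome/Hamming-distance case: the distance from a prefix of length m to the nearest palindrome of the same length equals its number of mismatched symmetric character pairs. This quadratic-time, linear-space baseline only illustrates what the algorithms must output; the function name and the optional threshold parameter k are assumptions for illustration, whereas the paper's streaming algorithms use only $O(k \cdot \mathrm{poly}~\log n)$ space.

    # Naive baseline for the online language distance problem restricted to
    # palindromes under Hamming distance: for every prefix of T, report the
    # number of mismatched symmetric pairs (its Hamming distance to the nearest
    # palindrome of the same length). Illustrative only; not the paper's method.
    def prefix_palindrome_hamming_distances(T, k=None):
        out = []
        for m in range(1, len(T) + 1):
            prefix = T[:m]
            dist = sum(prefix[i] != prefix[m - 1 - i] for i in range(m // 2))
            # In the low-distance regime one would only report distances below
            # a given threshold k (here: None when the distance is too large).
            out.append(dist if k is None or dist < k else None)
        return out

    print(prefix_palindrome_hamming_distances("aba"))   # [0, 1, 0]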

    Parsing Linear Context-Free Rewriting Systems with Fast Matrix Multiplication

    Get PDF
    We describe a matrix multiplication recognition algorithm for a subset of binary linear context-free rewriting systems (LCFRS) with running time $O(n^{\omega d})$, where $M(m) = O(m^{\omega})$ is the running time for $m \times m$ matrix multiplication and $d$ is the "contact rank" of the LCFRS -- the maximal number of combination and non-combination points that appear in the grammar rules. We also show that this algorithm can be used as a subroutine to get a recognition algorithm for general binary LCFRS with running time $O(n^{\omega d + 1})$. The currently best known $\omega$ is smaller than $2.38$. Our result provides another proof for the best known result for parsing mildly context sensitive formalisms such as combinatory categorial grammars, head grammars, linear indexed grammars, and tree adjoining grammars, which can be parsed in time $O(n^{4.76})$. It also shows that inversion transduction grammars can be parsed in time $O(n^{5.76})$. In addition, binary LCFRS subsumes many other formalisms and types of grammars, for some of which we also improve the asymptotic complexity of parsing.
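
    As a quick sanity check on the exponents quoted above, the sketch below just reproduces the arithmetic $O(n^{\omega d})$ (and $O(n^{\omega d + 1})$ in the general binary case) with $\omega < 2.38$ and contact rank d = 2; the contact ranks used here are an assumption inferred from the quoted bounds, not taken from the paper's definitions.

    # Illustrative exponent arithmetic only: omega * d and omega * d + 1 with the
    # bound omega < 2.38 cited in the abstract and an assumed contact rank d = 2.
    omega, d = 2.38, 2
    print(f"TAG / CCG / head / linear indexed grammars: O(n^{omega * d:.2f})")        # O(n^4.76)
    print(f"inversion transduction grammars (general):  O(n^{omega * d + 1:.2f})")    # O(n^5.76)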

    New Graph Decompositions and Combinatorial Boolean Matrix Multiplication Algorithms

    Full text link
    We revisit the fundamental Boolean Matrix Multiplication (BMM) problem. With the invention of algebraic fast matrix multiplication over 50 years ago, it also became known that BMM can be solved in truly subcubic $O(n^\omega)$ time, where $\omega < 3$; much work has gone into bringing $\omega$ closer to $2$. Since then, a parallel line of work has sought comparably fast combinatorial algorithms but with limited success. The naive $O(n^3)$-time algorithm was initially improved by a $\log^2 n$ factor [Arlazarov et al.; RAS'70], then by $\log^{2.25} n$ [Bansal and Williams; FOCS'09], then by $\log^3 n$ [Chan; SODA'15], and finally by $\log^4 n$ [Yu; ICALP'15]. We design a combinatorial algorithm for BMM running in time $n^3 / 2^{\Omega(\sqrt[7]{\log n})}$ -- a speed-up over cubic time that is stronger than any poly-log factor. This comes tantalizingly close to refuting the conjecture from the 90s that truly subcubic combinatorial algorithms for BMM are impossible. This popular conjecture is the basis for dozens of fine-grained hardness results. Our main technical contribution is a new regularity decomposition theorem for Boolean matrices (or equivalently, bipartite graphs) under a notion of regularity that was recently introduced and analyzed analytically in the context of communication complexity [Kelley, Lovett, Meka; arXiv'23], and is related to a similar notion from the recent work on $3$-term arithmetic progression free sets [Kelley, Meka; FOCS'23].
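
    For context, here is a minimal sketch of the naive cubic combinatorial BMM algorithm that the log-factor speedups cited above (and the new $n^3 / 2^{\Omega(\sqrt[7]{\log n})}$ bound) improve upon; the row-as-set encoding and names are illustrative assumptions, not taken from the paper.

    # Naive combinatorial Boolean Matrix Multiplication, cubic in the worst case.
    # Each matrix is encoded as a list of sets holding the column indices of the
    # 1-entries in each row -- an illustrative encoding, not the paper's.
    def boolean_matmul(A_rows, B_rows, n):
        """(A*B)[i][j] = 1  iff  there is a k with A[i][k] = 1 and B[k][j] = 1."""
        C_rows = [set() for _ in range(n)]
        for i in range(n):
            for k in A_rows[i]:          # iterate only over the 1-entries of row i
                C_rows[i] |= B_rows[k]   # OR row k of B into row i of C
        return C_rows

    # Example: A has a 1 at (0,1); B has 1s at (1,0) and (1,1)
    A = [{1}, set()]
    B = [set(), {0, 1}]
    print(boolean_matmul(A, B, 2))   # [{0, 1}, set()]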

    Tree Adjoining Grammar Parsing and Boolean Matrix Multiplication

    No full text
    The computational problem of parsing a sentence in a tree-adjoining language is investigated. An interesting relation is studied between this problem and the well-known computational problem of Boolean matrix multiplication: it is shown that any algorithm for the solution of the former problem can easily be converted into an algorithm for the solution of the latter problem. This result bears on at least two important computational issues. First, we realize that a straightforward method for improving the known upper bound for tree-adjoining grammar parsing is hard to find. Second, we understand which features of the tree-adjoining grammar parsing problem are responsible for the claimed difficulty.

    CLiFF Notes: Research in the Language, Information and Computation Laboratory of the University of Pennsylvania

    Get PDF
    One concern of the Computer Graphics Research Lab is simulating human task behavior and understanding why the visualization of the appearance, capabilities and performance of humans is so challenging. Our research has produced a system, called Jack, for the definition, manipulation, animation and human factors analysis of simulated human figures. Jack permits the envisionment of human motion by interactive specification and simultaneous execution of multiple constraints, and is sensitive to such issues as body shape and size, linkage, and plausible motions. Enhanced control is provided by natural behaviors such as looking, reaching, balancing, lifting, stepping, walking, grasping, and so on. Although intended for highly interactive applications, Jack is a foundation for other research.

    The very ubiquitousness of other people in our lives poses a tantalizing challenge to the computational modeler: people are at once the most common object around us, and yet the most structurally complex. Their everyday movements are amazingly fluid, yet demanding to reproduce, with actions driven not just mechanically by muscles and bones but also cognitively by beliefs and intentions. Our motor systems manage to learn how to make us move without leaving us the burden or pleasure of knowing how we did it. Likewise we learn how to describe the actions and behaviors of others without consciously struggling with the processes of perception, recognition, and language. Present technology lets us approach human appearance and motion through computer graphics modeling and three-dimensional animation, but there is considerable distance to go before purely synthesized figures trick our senses.

    We seek to build computational models of human-like figures which manifest animacy and convincing behavior. Towards this end, we: create an interactive computer graphics human model; endow it with reasonable biomechanical properties; provide it with human-like behaviors; use this simulated figure as an agent to effect changes in its world; and describe and guide its tasks through natural language instructions. There are presently no perfect solutions to any of these problems; ultimately, however, we should be able to give our surrogate human directions that, in conjunction with suitable symbolic reasoning processes, make it appear to behave in a natural, appropriate, and intelligent fashion. Compromises will be essential, due to limits in computation, throughput of display hardware, and demands of real-time interaction, but our algorithms aim to balance the physical device constraints with carefully crafted models, general solutions, and thoughtful organization.

    The Jack software is built on Silicon Graphics Iris 4D workstations because those systems have 3-D graphics features that greatly aid the process of interacting with highly articulated figures such as the human body. Of course, graphics capabilities themselves do not make a usable system. Our research has therefore focused on software to make the manipulation of a simulated human figure easy for a rather specific user population: human factors design engineers or ergonomics analysts involved in visualizing and assessing human motor performance, fit, reach, view, and other physical tasks in a workplace environment. The software also happens to be quite usable by others, including graduate students and animators. The point, however, is that program design has tried to take into account a wide variety of physical, problem-oriented tasks, rather than just offer a computer graphics and animation tool for the already computer-sophisticated or skilled animator. As an alternative to interactive specification, a simulation system allows a convenient temporal and spatial parallel programming language for behaviors. The Graphics Lab is working with the Natural Language Group to explore the possibility of using natural language instructions, such as those found in assembly or maintenance manuals, to drive the behavior of our animated human agents. (See the CLiFF note entry for the AnimNL group for details.)

    Even though Jack is under continual development, it has nonetheless already proved to be a substantial computational tool in analyzing human abilities in physical workplaces. It is being applied to actual problems involving space vehicle inhabitants, helicopter pilots, maintenance technicians, foot soldiers, and tractor drivers. This broad range of applications is precisely the target we intended to reach. The general capabilities embedded in Jack attempt to mirror certain aspects of human performance, rather than the specific requirements of the corresponding workplace. We view the Jack system as the basis of a virtual animated agent that can carry out tasks and instructions in a simulated 3D environment. While we have not yet fooled anyone into believing that the Jack figure is real, its behaviors are becoming more reasonable and its repertoire of actions more extensive. When interactive control becomes more labor-intensive than natural language instructional control, we will have reached a significant milestone toward an intelligent agent.