Specifying Software Languages: Grammars, Projectional Editors, and Unconventional Approaches
We discuss several approaches for defining software languages, together with Integrated Development Environments for them. The theoretical foundation is grammar-based models, which can be used where proven correctness of specifications is required. From a practical point of view, we discuss how language specification can be made more accessible by focusing on language workbenches and projectional editing, and how it can be formalized. We also give a brief overview of unconventional approaches to language definition, and outline three open problems connected to the approaches we discuss.
Clique-Based Lower Bounds for Parsing Tree-Adjoining Grammars
up to lower order factors
If the Current Clique Algorithms are Optimal, so is Valiant's Parser
The CFG recognition problem is: given a context-free grammar $\mathcal{G}$
and a string $w$ of length $n$, decide if $w$ can be obtained from
$\mathcal{G}$. This is the most basic parsing question and is a core computer
science problem. Valiant's parser from 1975 solves the problem in
$O(n^{\omega})$ time, where $\omega$ is the matrix multiplication
exponent. Dozens of parsing algorithms have been proposed over the years, yet
Valiant's upper bound remains unbeaten. The best combinatorial algorithms have
mildly subcubic complexity.
Lee (JACM'01) provided evidence that fast matrix multiplication is needed for
CFG parsing, and that very efficient and practical algorithms might be hard or
even impossible to obtain. Lee showed that any algorithm for a more general
parsing problem with running time $O(|\mathcal{G}| \cdot n^{3-\varepsilon})$ can
be converted into a surprising subcubic algorithm for Boolean Matrix
Multiplication. Unfortunately, Lee's hardness result required that the grammar
size be $|\mathcal{G}| = \Omega(n^6)$. Nothing was known for the more relevant
case of constant size grammars.
In this work, we prove that any improvement on Valiant's algorithm, even for
constant size grammars, either in terms of runtime or by avoiding the
inefficiencies of fast matrix multiplication, would imply a breakthrough
algorithm for the $k$-Clique problem: given a graph on $n$ nodes, decide if
there are $k$ that form a clique.
Besides classifying the complexity of a fundamental problem, our reduction
has led us to similar lower bounds for more modern and well-studied cubic time
problems for which faster algorithms are highly desirable in practice: RNA
Folding, a central problem in computational biology, and Dyck Language Edit
Distance, answering an open question of Saha (FOCS'14)
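
For orientation, the cubic-time baseline that Valiant's parser improves on can be sketched with the classical CYK dynamic program. This is only a minimal illustration and not the algorithm or the reduction from the abstract above; the function name `cyk_recognize`, the dictionary encoding of the rules, and the toy balanced-parentheses grammar are all assumptions made for the example.

```python
from itertools import product


def cyk_recognize(word, start, unary_rules, binary_rules):
    """Decide whether `word` is derivable from `start` in a CNF grammar.

    unary_rules:  dict mapping a terminal to the set of nonterminals A
                  with a rule A -> terminal
    binary_rules: dict mapping a pair (B, C) to the set of nonterminals A
                  with a rule A -> B C
    (Both encodings are assumptions of this sketch.)
    """
    n = len(word)
    if n == 0:
        return False  # the sketch does not model a rule for the empty string
    # table[i][j] holds the nonterminals that derive word[i : j + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][i] = set(unary_rules.get(ch, ()))
    # Three nested loops over (length, start, split): cubic in n
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for split in range(i, j):
                left, right = table[i][split], table[split + 1][j]
                for b, c in product(left, right):
                    table[i][j] |= binary_rules.get((b, c), set())
    return start in table[0][n - 1]


# Hypothetical toy grammar for non-empty balanced parentheses:
#   S -> L R | L X | S S,  X -> S R,  L -> '(',  R -> ')'
DYCK_UNARY = {"(": {"L"}, ")": {"R"}}
DYCK_BINARY = {
    ("L", "R"): {"S"},
    ("L", "X"): {"S"},
    ("S", "S"): {"S"},
    ("S", "R"): {"X"},
}
```

With this grammar, `cyk_recognize("(())()", "S", DYCK_UNARY, DYCK_BINARY)` accepts, while the unbalanced input `"(()"` is rejected. For a constant-size grammar the table filling takes time cubic in the string length, which is the bound the abstract compares against.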
Small-Space Algorithms for the Online Language Distance Problem for Palindromes and Squares
We study the online variant of the language distance problem for two
classical formal languages, the language of palindromes and the language of
squares, and for the two most fundamental distances, the Hamming distance and
the edit (Levenshtein) distance. In this problem, defined for a fixed formal
language $L$, we are given a string $T$ of length $n$, and the task is to
compute the minimal distance to $L$ from every prefix of $T$. We focus on the
low-distance regime, where one must compute only the distances smaller than a
given threshold $k$. In this work, our contribution is twofold:
- First, we show streaming algorithms, which access the input string only
through a single left-to-right scan. Both for palindromes and squares, our
algorithms use space and time per character in
the Hamming-distance case and space and time
per character in the edit-distance case. These algorithms are randomised by
necessity, and they err with probability inverse-polynomial in $n$.
- Second, we show deterministic read-only online algorithms, which are also
provided with read-only random access to the already processed characters of
$T$. Both for palindromes and squares, our algorithms use space and time per character in the
Hamming-distance case and space and
amortised time per character in the edit-distance case.
Comment: Accepted to ISAAC'2
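
To make the problem statement concrete, here is a naive offline baseline for the palindrome and Hamming-distance case. It is not one of the small-space algorithms from the abstract: it stores the whole string and spends linear time per prefix. The function names and the convention of reporting `None` for distances that are not below the threshold are assumptions of this sketch.

```python
def hamming_distance_to_palindrome(s):
    """Minimum number of substitutions that turn `s` into a palindrome.

    Each mismatched symmetric pair (s[i], s[-1 - i]) needs exactly one
    substitution, so the distance is the number of such pairs.
    """
    n = len(s)
    return sum(1 for i in range(n // 2) if s[i] != s[n - 1 - i])


def prefix_distances(text, threshold):
    """Report, for every non-empty prefix of `text`, its distance to the
    language of palindromes if that distance is smaller than `threshold`,
    and None otherwise (a reporting convention assumed for this sketch).
    """
    result = []
    for end in range(1, len(text) + 1):
        d = hamming_distance_to_palindrome(text[:end])
        result.append(d if d < threshold else None)
    return result
```

For example, `prefix_distances("abca", 2)` yields one entry per prefix `a`, `ab`, `abc`, `abca`. The baseline uses space linear in the input, which is what the streaming and read-only algorithms in the abstract are designed to avoid.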
Parsing Linear Context-Free Rewriting Systems with Fast Matrix Multiplication
We describe a matrix multiplication recognition algorithm for a subset of
binary linear context-free rewriting systems (LCFRS) with running time
$O(n^{\omega d})$ where $M(m) = O(m^{\omega})$ is the running time for $m \times m$ matrix multiplication and $d$ is the "contact rank" of the LCFRS --
the maximal number of combination and non-combination points that appear in the
grammar rules. We also show that this algorithm can be used as a subroutine to
get a recognition algorithm for general binary LCFRS with running time
$O(n^{\omega d + 1})$. The currently best known $\omega$ is smaller than
$2.38$. Our result provides another proof for the best known result for parsing
mildly context sensitive formalisms such as combinatory categorial grammars,
head grammars, linear indexed grammars, and tree adjoining grammars, which can
be parsed in time $O(n^{4.76})$. It also shows that inversion transduction
grammars can be parsed in time $O(n^{5.76})$. In addition, binary LCFRS
subsumes many other formalisms and types of grammars, for some of which we also
improve the asymptotic complexity of parsing
New Graph Decompositions and Combinatorial Boolean Matrix Multiplication Algorithms
We revisit the fundamental Boolean Matrix Multiplication (BMM) problem. With
the invention of algebraic fast matrix multiplication over 50 years ago, it
also became known that BMM can be solved in truly subcubic $O(n^{\omega})$ time,
where $\omega < 3$; much work has gone into bringing $\omega$ closer to $2$.
Since then, a parallel line of work has sought comparably fast combinatorial
algorithms but with limited success. The naive $O(n^3)$-time algorithm was
initially improved by a $\log^2 n$ factor [Arlazarov et al.; RAS'70], then by
$\log^{2.25} n$ [Bansal and Williams; FOCS'09], then by $\log^3 n$ [Chan;
SODA'15], and finally by $\log^4 n$ [Yu; ICALP'15].
We design a combinatorial algorithm for BMM running in time $n^3 / 2^{\Omega(\sqrt[7]{\log n})}$ -- a speed-up over cubic time that is stronger
than any poly-log factor. This comes tantalizingly close to refuting the
conjecture from the 90s that truly subcubic combinatorial algorithms for BMM
are impossible. This popular conjecture is the basis for dozens of fine-grained
hardness results.
Our main technical contribution is a new regularity decomposition theorem for
Boolean matrices (or equivalently, bipartite graphs) under a notion of
regularity that was recently introduced and analyzed analytically in the
context of communication complexity [Kelley, Lovett, Meka; arXiv'23], and is
related to a similar notion from the recent work on $3$-term arithmetic
progression free sets [Kelley, Meka; FOCS'23]
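
As a reference point for the problem itself, Boolean matrix multiplication can be sketched with a simple combinatorial routine. This is the textbook cubic-time definition with rows packed into Python integers, not the regularity-based algorithm from the abstract; the function name `boolean_matmul` and the list-of-lists matrix encoding are assumptions of the example.

```python
def boolean_matmul(a, b):
    """Boolean product of 0/1 matrices given as lists of lists.

    result[i][j] is 1 exactly when some k has a[i][k] == 1 and b[k][j] == 1.
    Rows of `b` are packed into integers so that OR-ing a whole row is a
    single big-integer operation; the algorithm remains combinatorial.
    """
    cols = len(b[0]) if b else 0
    # Bit j of packed_rows[k] is set exactly when b[k][j] is 1
    packed_rows = [
        sum(1 << j for j, bit in enumerate(row) if bit) for row in b
    ]
    result = []
    for row in a:
        acc = 0
        for k, bit in enumerate(row):
            if bit:
                acc |= packed_rows[k]  # OR in row k of b
        result.append([(acc >> j) & 1 for j in range(cols)])
    return result
```

Calling `boolean_matmul(a, b)` on two square 0/1 matrices returns their Boolean product in the same list-of-lists form. The bit-packing only saves a word-size factor, in the same spirit as the poly-logarithmic speed-ups listed in the abstract.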
Tree Adjoining Grammar Parsing and Boolean Matrix Multiplication
The computational problem of parsing a sentence in a tree-adjoining language is investigated. An interesting relation is studied between this problem and the well-known computational problem of Boolean matrix multiplication: it is shown that any algorithm for the solution of the former problem can easily be converted into an algorithm for the solution of the latter problem. This result bears on at least two important computational issues. First, we realize that a straightforward method that improves the known upper bound for tree-adjoining grammar parsing is hard to find. Second, we understand which features of the tree-adjoining grammar parsing problem are responsible for the claimed difficulty
CLiFF Notes: Research in the Language, Information and Computation Laboratory of the University of Pennsylvania
One concern of the Computer Graphics Research Lab is in simulating human task behavior and understanding why the visualization of the appearance, capabilities and performance of humans is so challenging. Our research has produced a system, called Jack, for the definition, manipulation, animation and human factors analysis of simulated human figures. Jack permits the envisionment of human motion by interactive specification and simultaneous execution of multiple constraints, and is sensitive to such issues as body shape and size, linkage, and plausible motions. Enhanced control is provided by natural behaviors such as looking, reaching, balancing, lifting, stepping, walking, grasping, and so on. Although intended for highly interactive applications, Jack is a foundation for other research.
The very ubiquitousness of other people in our lives poses a tantalizing challenge to the computational modeler: people are at once the most common object around us, and yet the most structurally complex. Their everyday movements are amazingly fluid, yet demanding to reproduce, with actions driven not just mechanically by muscles and bones but also cognitively by beliefs and intentions. Our motor systems manage to learn how to make us move without leaving us the burden or pleasure of knowing how we did it. Likewise we learn how to describe the actions and behaviors of others without consciously struggling with the processes of perception, recognition, and language.
Present technology lets us approach human appearance and motion through computer graphics modeling and three-dimensional animation, but there is considerable distance to go before purely synthesized figures trick our senses. We seek to build computational models of human-like figures which manifest animacy and convincing behavior. Towards this end, we: create an interactive computer graphics human model; endow it with reasonable biomechanical properties; provide it with human-like behaviors; use this simulated figure as an agent to effect changes in its world; and describe and guide its tasks through natural language instructions.
There are presently no perfect solutions to any of these problems; ultimately, however, we should be able to give our surrogate human directions that, in conjunction with suitable symbolic reasoning processes, make it appear to behave in a natural, appropriate, and intelligent fashion. Compromises will be essential, due to limits in computation, throughput of display hardware, and demands of real-time interaction, but our algorithms aim to balance the physical device constraints with carefully crafted models, general solutions, and thoughtful organization.
The Jack software is built on Silicon Graphics Iris 4D workstations because those systems have 3-D graphics features that greatly aid the process of interacting with highly articulated figures such as the human body. Of course, graphics capabilities themselves do not make a usable system. Our research has therefore focused on software to make the manipulation of a simulated human figure easy for a rather specific user population: human factors design engineers or ergonomics analysts involved in visualizing and assessing human motor performance, fit, reach, view, and other physical tasks in a workplace environment. The software also happens to be quite usable by others, including graduate students and animators. The point, however, is that program design has tried to take into account a wide variety of physical, problem-oriented tasks, rather than just offer a computer graphics and animation tool for the already computer-sophisticated or skilled animator.
As an alternative to interactive specification, a simulation system allows a convenient temporal and spatial parallel programming language for behaviors. The Graphics Lab is working with the Natural Language Group to explore the possibility of using natural language instructions, such as those found in assembly or maintenance manuals, to drive the behavior of our animated human agents. (See the CLiFF note entry for the AnimNL group for details.)
Even though Jack is under continual development, it has nonetheless already proved to be a substantial computational tool in analyzing human abilities in physical workplaces. It is being applied to actual problems involving space vehicle inhabitants, helicopter pilots, maintenance technicians, foot soldiers, and tractor drivers. This broad range of applications is precisely the target we intended to reach. The general capabilities embedded in Jack attempt to mirror certain aspects of human performance, rather than the specific requirements of the corresponding workplace.
We view the Jack system as the basis of a virtual animated agent that can carry out tasks and instructions in a simulated 3D environment. While we have not yet fooled anyone into believing that the Jack figure is real, its behaviors are becoming more reasonable and its repertoire of actions more extensive. When interactive control becomes more labor intensive than natural language instructional control, we will have reached a significant milestone toward an intelligent agent.