1,127 research outputs found
Parsing Linear Context-Free Rewriting Systems with Fast Matrix Multiplication
We describe a matrix multiplication recognition algorithm for a subset of
binary linear context-free rewriting systems (LCFRS) with running time
O(n^{ωd}), where M(m) = O(m^ω) is the running time for m × m matrix
multiplication and d is the "contact rank" of the LCFRS, that is,
the maximal number of combination and non-combination points that appear in the
grammar rules. We also show that this algorithm can be used as a subroutine to
get a recognition algorithm for general binary LCFRS with running time
O(n^{ωd+1}). The currently best known ω is smaller than
2.38. Our result provides another proof for the best known result for parsing
mildly context sensitive formalisms such as combinatory categorial grammars,
head grammars, linear indexed grammars, and tree adjoining grammars, which can
be parsed in time O(n^{4.76}). It also shows that inversion transduction
grammars can be parsed in time O(n^{5.76}). In addition, binary LCFRS
subsumes many other formalisms and types of grammars, for some of which we also
improve the asymptotic complexity of parsing.
Parsing Unary Boolean Grammars Using Online Convolution
In contrast to context-free grammars, the extension of these
grammars by explicit conjunction, the so-called conjunctive
grammars, can generate (quite complicated) non-regular languages
over a single-letter alphabet (DLT 2007). Given these
expressibility results, we study the parsability of Boolean grammars,
an extension of context-free grammars by conjunction and negation,
over a unary alphabet and show that they can be parsed in time O(|G| log^2(n) M(n)),
where M(n) is the time to multiply two n-bit integers. This multiplication
algorithm is transformed into a convolution algorithm, which in turn is
converted to an online convolution algorithm that is used for the parsing.
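The step from integer multiplication to convolution can be illustrated with a minimal sketch. It assumes the standard packing trick (each coefficient gets its own fixed-width bit slot inside one large integer) and is not taken from the paper; the function names `convolve_via_integer_product` and `unary_concatenation` are hypothetical. The sketch is an ordinary (offline) convolution, not the online variant the abstract refers to.

```python
def convolve_via_integer_product(a, b):
    """Convolve two lists of non-negative integers with one big-integer product."""
    if not a or not b:
        return []
    # Every output coefficient is a sum of at most min(len(a), len(b)) products,
    # so this bound tells us how many bits one slot needs to avoid carries.
    bound = min(len(a), len(b)) * max(a) * max(b)
    width = max(1, bound.bit_length())
    mask = (1 << width) - 1
    # Pack coefficient i into bits [i * width, (i + 1) * width).
    x = sum(value << (i * width) for i, value in enumerate(a))
    y = sum(value << (i * width) for i, value in enumerate(b))
    product = x * y
    # Slot k of the product now holds sum over i + j == k of a[i] * b[j].
    return [(product >> (k * width)) & mask for k in range(len(a) + len(b) - 1)]


def unary_concatenation(lengths_a, lengths_b, n):
    """Lengths (up to n) of strings in the concatenation of two unary languages.

    A unary language is represented by the set of its string lengths, so
    concatenation is the support of the convolution of two indicator vectors.
    """
    ind_a = [1 if i in lengths_a else 0 for i in range(n + 1)]
    ind_b = [1 if i in lengths_b else 0 for i in range(n + 1)]
    conv = convolve_via_integer_product(ind_a, ind_b)
    return {k for k in range(n + 1) if conv[k] > 0}


if __name__ == "__main__":
    print(convolve_via_integer_product([1, 2, 3], [4, 5]))
    print(sorted(unary_concatenation({1, 2}, {2, 4}, 6)))
```

Over a one-letter alphabet, concatenating two languages only adds string lengths, which is why a convolution of length indicators is the natural primitive here.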
Clique-Based Lower Bounds for Parsing Tree-Adjoining Grammars
up to lower order factors
Rewriting a Deep Generative Model
A deep generative model such as a GAN learns to model a rich set of semantic
and physical rules about the target distribution, but up to now, it has been
obscure how such rules are encoded in the network, or how a rule could be
changed. In this paper, we introduce a new problem setting: manipulation of
specific rules encoded by a deep generative model. To address the problem, we
propose a formulation in which the desired rule is changed by manipulating a
layer of a deep network as a linear associative memory. We derive an algorithm
for modifying one entry of the associative memory, and we demonstrate that
several interesting structural rules can be located and modified within the
layers of state-of-the-art generative models. We present a user interface to
enable users to interactively change the rules of a generative model to achieve
desired effects, and we show several proof-of-concept applications. Finally,
results on multiple datasets demonstrate the advantage of our method against
standard fine-tuning methods and edit transfer algorithms.
Comment: ECCV 2020 (oral). Code at https://github.com/davidbau/rewriting. For
videos and demos see https://rewriting.csail.mit.edu
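The idea of treating a layer as a linear associative memory and modifying one entry can be sketched as follows. The abstract does not state the update rule, so this example assumes the textbook minimal-norm rank-one update W' = W + (v - W k) k^T / (k^T k), which may differ from the algorithm derived in the paper. The names `matvec` and `insert_association` are hypothetical.

```python
def matvec(W, k):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, k)) for row in W]


def insert_association(W, key, value):
    """Return a copy of W changed by a rank-one term so that W' @ key == value.

    Assumption: plain least-change update in the direction of the key itself.
    Keys orthogonal to `key` keep their stored values.
    """
    norm_sq = sum(x * x for x in key)
    if norm_sq == 0:
        raise ValueError("key must be non-zero")
    # How far the current memory is from returning the desired value.
    residual = [v - p for v, p in zip(value, matvec(W, key))]
    return [
        [w + r * x / norm_sq for w, x in zip(row, key)]
        for row, r in zip(W, residual)
    ]


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    W_new = insert_association(W, key=[1.0, 0.0], value=[3.0, 4.0])
    print(matvec(W_new, [1.0, 0.0]))  # the edited key now maps to the new value
    print(matvec(W_new, [0.0, 1.0]))  # an orthogonal key is unaffected
```

The design point this illustrates is locality: only the association for one key changes, while directions orthogonal to that key are left as they were.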
An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities
We describe an extension of Earley's parser for stochastic context-free
grammars that computes the following quantities given a stochastic context-free
grammar and an input string: a) probabilities of successive prefixes being
generated by the grammar; b) probabilities of substrings being generated by the
nonterminals, including the entire string being generated by the grammar; c)
most likely (Viterbi) parse of the string; d) posterior expected number of
applications of each grammar production, as required for reestimating rule
probabilities. (a) and (b) are computed incrementally in a single left-to-right
pass over the input. Our algorithm compares favorably to standard bottom-up
parsing methods for SCFGs in that it works efficiently on sparse grammars by
making use of Earley's top-down control structure. It can process any
context-free rule format without conversion to some normal form, and combines
computations for (a) through (d) in a single algorithm. Finally, the algorithm
has simple extensions for processing partially bracketed inputs, and for
finding partial parses and their likelihoods on ungrammatical inputs.
Comment: 45 pages. Slightly shortened version to appear in Computational
Linguistics 2
Multiple context-free path querying by matrix multiplication
Many graph analysis problems can be formulated as formal language-constrained
path querying problems, where formal languages are used as constraints for
navigational path queries. Recently, the context-free language (CFL)
reachability formulation has become very popular and is used in many areas,
for example, querying graph databases and Resource Description Framework (RDF)
analysis. However, the generative capacity of context-free grammars (CFGs) is
too weak to express some complex queries, for example, those arising from
natural languages, and various extensions of CFGs have been proposed. Multiple
context-free grammar (MCFG) is one such extension. Although the
MCFL-reachability problem is known to be decidable, to the best of our
knowledge there is no algorithm for it. This paper is devoted to developing
the first such algorithm. The essence of the proposed algorithm is to use a
set of Boolean matrices and operations on them to find paths in a graph that
satisfy the given constraints. The main operation is Boolean matrix
multiplication. The algorithm returns a set of matrices containing all
information needed to solve the MCFL-reachability problem. The presented
algorithm is implemented in Python using the GraphBLAS API. An analysis of real
RDF data and synthetic graphs for some MCFLs is performed. The study showed
that, with a sparse matrix storage format and parallel computing, the analysis
of graphs with tens of thousands of edges can take 10–20 minutes. The analysis
yields tens of millions of reachable vertex pairs. The proposed algorithm can
be applied to problems in static code analysis, bioinformatics, and network
analysis, as well as in graph databases when a path query cannot be expressed
using context-free grammars. Because the algorithm is linear algebra-based, it
can use high-performance libraries and modern parallel hardware.
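The role of Boolean matrix multiplication in this kind of path querying can be shown on the simpler context-free case. The sketch below is not the MCFL algorithm from the paper and does not use GraphBLAS; it assumes a grammar in Chomsky normal form, dense pure-Python matrices, and hypothetical names (`bool_matmul`, `cfl_reachability`). It keeps one Boolean matrix per nonterminal and repeats multiplications until nothing changes.

```python
def bool_matmul(A, B):
    """Boolean product of two square matrices given as lists of lists."""
    n = len(A)
    return [
        [any(A[i][k] and B[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def cfl_reachability(n, edges, terminal_rules, binary_rules):
    """Compute, for each nonterminal X, the matrix of vertex pairs (i, j)
    joined by a path whose labels form a string derivable from X.

    n              -- number of vertices, numbered 0 .. n-1
    edges          -- iterable of (source, label, target)
    terminal_rules -- dict: label -> set of nonterminals X with rule X -> label
    binary_rules   -- list of (X, Y, Z) for rules X -> Y Z
    """
    nonterminals = {x for heads in terminal_rules.values() for x in heads}
    for rule in binary_rules:
        nonterminals.update(rule)
    M = {x: [[False] * n for _ in range(n)] for x in nonterminals}

    # Initialise from single edges via the terminal rules.
    for u, label, v in edges:
        for x in terminal_rules.get(label, ()):
            M[x][u][v] = True

    # Fixpoint: X gains (i, j) whenever Y has (i, k) and Z has (k, j).
    changed = True
    while changed:
        changed = False
        for x, y, z in binary_rules:
            product = bool_matmul(M[y], M[z])
            for i in range(n):
                for j in range(n):
                    if product[i][j] and not M[x][i][j]:
                        M[x][i][j] = True
                        changed = True
    return M


if __name__ == "__main__":
    # Grammar for a^n b^n (n >= 1): S -> A B | A S1, S1 -> S B, A -> a, B -> b
    terminal_rules = {"a": {"A"}, "b": {"B"}}
    binary_rules = [("S", "A", "B"), ("S", "A", "S1"), ("S1", "S", "B")]
    # Chain graph 0 -a-> 1 -a-> 2 -b-> 3 -b-> 4
    edges = [(0, "a", 1), (1, "a", 2), (2, "b", 3), (3, "b", 4)]
    M = cfl_reachability(5, edges, terminal_rules, binary_rules)
    print([(i, j) for i in range(5) for j in range(5) if M["S"][i][j]])
```

In practice the dense lists would be replaced by sparse matrices from a linear algebra library, which is where the parallel hardware mentioned in the abstract comes in.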
Survey on Instruction Selection: An Extensive and Modern Literature Review
Instruction selection is one of three optimisation problems involved in the
code generator backend of a compiler. The instruction selector is responsible
for transforming an input program from its target-independent representation
into a target-specific form by making best use of the available machine
instructions. Hence instruction selection is a crucial part of efficient code
generation.
Despite ongoing research since the late 1960s, the last comprehensive
survey on the field was written more than 30 years ago. As new approaches and
techniques have appeared since its publication, this brings forth a need for a
new, up-to-date review of the current body of literature. This report addresses
that need by performing an extensive review and categorisation of existing
research. The report therefore supersedes and extends the previous surveys, and
also attempts to identify where future research should be directed.
Comment: Major changes:
- Merged simulation chapter with macro expansion chapter
- Addressed misunderstandings of several approaches
- Completely rewrote many parts of the chapters; strengthened the discussion of many approaches
- Revised the drawing of all trees and graphs to put the root at the top instead of at the bottom
- Added appendix for listing the approaches in a table
See doc for more inf
Improving the Representation and Conversion of Mathematical Formulae by Considering their Textual Context
Mathematical formulae represent complex semantic information in a concise
form. Especially in Science, Technology, Engineering, and Mathematics,
mathematical formulae are crucial to communicate information, e.g., in
scientific papers, and to perform computations using computer algebra systems.
Enabling computers to access the information encoded in mathematical formulae
requires machine-readable formats that can represent both the presentation and
content, i.e., the semantics, of formulae. Exchanging such information between
systems additionally requires conversion methods for mathematical
representation formats. We analyze how the semantic enrichment of formulae
improves the format conversion process and show that considering the textual
context of formulae reduces the error rate of such conversions. Our main
contributions are: (1) providing an openly available benchmark dataset for the
mathematical format conversion task consisting of a newly created test
collection, an extensive, manually curated gold standard and task-specific
evaluation metrics; (2) performing a quantitative evaluation of
state-of-the-art tools for mathematical format conversions; (3) presenting a
new approach that considers the textual context of formulae to reduce the error
rate for mathematical format conversions. Our benchmark dataset facilitates
future research on mathematical format conversions as well as research on many
problems in mathematical information retrieval. Because we annotated and linked
all components of formulae, e.g., identifiers, operators and other entities, to
Wikidata entries, the gold standard can, for instance, be used to train methods
for formula concept discovery and recognition. Such methods can then be applied
to improve mathematical information retrieval systems, e.g., for semantic
formula search, recommendation of mathematical content, or detection of
mathematical plagiarism.
Comment: 10 pages, 4 figure