Ten virtues of structured graphs
This paper extends the invited talk by the first author on the virtues
of structured graphs. The motivation behind the talk and this paper stems from our
experience in developing ADR, a formal approach for the design of style-conformant,
reconfigurable software systems. ADR is based on hierarchical graphs
with interfaces, and it was conceived in an attempt to reconcile software architectures
and process calculi by means of graphical methods. We have tried to
write an ADR-agnostic paper in which we point out some drawbacks of flat, unstructured
graphs for the design and analysis of software systems, and we argue that hierarchical,
structured graphs can alleviate these drawbacks.
Defining Models - Meta Models versus Graph Grammars
The precise specification of software models is a major concern in model-driven design of object-oriented software. Metamodelling and graph grammars are natural choices for such specifications. Metamodelling has several advantages: it is easy to use, and it provides procedures that automatically check whether a model is valid. However, it is less suited for proving properties of models or for generating large sets of example models. Graph grammars, in contrast, offer a natural procedure - the derivation process - for generating example models, and they support proofs because they define a graph language inductively. However, not all graph grammars that can specify practically relevant models are easily parseable. In this paper, we propose contextual star grammars as a graph grammar approach that allows for simple parsing and that is powerful enough to specify non-trivial software models. This is demonstrated by defining program graphs, a language-independent model of object-oriented programs, with a focus on shape (static structure) rather than behavior.
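The inductive derivation process the abstract mentions can be illustrated with a toy sketch. This is not the paper's contextual star grammars; it is a hypothetical two-rule grammar (a recursive rule that extends a chain of class nodes and a terminating rule) showing how repeated derivation steps generate example models inductively.

```python
# Toy graph-grammar derivation (hypothetical rules, NOT contextual star
# grammars): a nonterminal is repeatedly expanded either into a Class node
# plus a fresh nonterminal (recursive rule) or into a Class node alone
# (terminating rule), yielding a chain-shaped example model.
import random

def derive(max_steps=10, seed=0):
    rng = random.Random(seed)
    nodes, edges = [], []
    expanding = True
    while expanding:
        new = f"Class{len(nodes)}"
        nodes.append(new)
        if len(nodes) > 1:
            edges.append((nodes[-2], new))  # attach to the previous node
        # recursive rule with probability 0.7, else the terminating rule
        expanding = len(nodes) < max_steps and rng.random() < 0.7
    return nodes, edges
```

Because every graph in the language arises from such a derivation, properties can be proved by induction on the derivation steps - the advantage the abstract attributes to graph grammars.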
PCA Beyond The Concept of Manifolds: Principal Trees, Metro Maps, and Elastic Cubic Complexes
Multidimensional data distributions can have complex topologies and variable
local dimensions. To approximate complex data, we propose a new type of
low-dimensional ``principal object'': a principal cubic complex. This complex
is a generalization of linear and non-linear principal manifolds and includes
them as a particular case. To construct such an object, we combine a method of
topological grammars with the minimization of an elastic energy defined for its
embedding into the multidimensional data space. The whole complex is presented as a
system of nodes and springs and as a product of one-dimensional continua
(represented by graphs), and the grammars describe how these continua transform
during the process of optimal complex construction. The simplest case of a
topological grammar (``add a node'', ``bisect an edge'') is equivalent to the
construction of ``principal trees'', an object useful in many practical
applications. We demonstrate how it can be applied to the analysis of bacterial
genomes and for visualization of cDNA microarray data using the ``metro map''
representation. The preprint is supplemented by an animation: ``How the
topological grammar constructs branching principal components
(AnimatedBranchingPCA.gif)''. Comment: 19 pages, 8 figures.
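The interplay of elastic energy and grammar rules described above can be sketched in a few lines. This is a simplified toy version with assumed quadratic penalties, not the paper's full elastic functional: data points pull their nearest node, springs penalize stretched edges, and the grammar rule ``bisect an edge'' inserts a midpoint node.

```python
# Simplified sketch of an elastic node-and-spring complex (assumed quadratic
# terms, hypothetical `stretch` weight - not the paper's exact functional).
import numpy as np

def elastic_energy(nodes, edges, data, stretch=0.1):
    # approximation term: mean squared distance of each data point
    # to its closest node of the complex
    d2 = ((data[:, None, :] - nodes[None, :, :]) ** 2).sum(-1)
    approx = d2.min(axis=1).mean()
    # stretching term: squared length of every spring (edge)
    spring = sum(((nodes[i] - nodes[j]) ** 2).sum() for i, j in edges)
    return approx + stretch * spring

def bisect_edge(nodes, edges, k):
    """Grammar rule ``bisect an edge'': insert a midpoint node on edge k."""
    i, j = edges[k]
    mid = (nodes[i] + nodes[j]) / 2
    m = len(nodes)                      # index of the new midpoint node
    new_nodes = np.vstack([nodes, mid])
    new_edges = edges[:k] + edges[k + 1:] + [(i, m), (m, j)]
    return new_nodes, new_edges
```

Applying a rule and then re-minimizing the energy over node positions is one step of the optimal-complex construction; a principal tree emerges when only ``add a node'' and ``bisect an edge'' are allowed.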
Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech
We describe a statistical approach for modeling dialogue acts in
conversational speech, i.e., speech-act-like units such as Statement, Question,
Backchannel, Agreement, Disagreement, and Apology. Our model detects and
predicts dialogue acts based on lexical, collocational, and prosodic cues, as
well as on the discourse coherence of the dialogue act sequence. The dialogue
model is based on treating the discourse structure of a conversation as a
hidden Markov model and the individual dialogue acts as observations emanating
from the model states. Constraints on the likely sequence of dialogue acts are
modeled via a dialogue act n-gram. The statistical dialogue grammar is combined
with word n-grams, decision trees, and neural networks modeling the
idiosyncratic lexical and prosodic manifestations of each dialogue act. We
develop a probabilistic integration of speech recognition with dialogue
modeling, to improve both speech recognition and dialogue act classification
accuracy. Models are trained and evaluated using a large hand-labeled database
of 1,155 conversations from the Switchboard corpus of spontaneous
human-to-human telephone speech. We achieved good dialogue act labeling
accuracy (65% based on errorful, automatically recognized words and prosody,
and 71% based on word transcripts, compared to a chance baseline accuracy of
35% and human accuracy of 84%) and a small reduction in word recognition error. Comment: 35 pages, 5 figures. Changes in copy editing (note title spelling changed).
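The HMM view described above - dialogue acts as hidden states, the act n-gram as transition model, lexical/prosodic cues as emissions - can be sketched with standard Viterbi decoding. The probabilities below are invented toy numbers, not the paper's Switchboard estimates, and a single cue word stands in for the full lexical, collocational, and prosodic likelihood models.

```python
# Toy Viterbi decoding of dialogue acts (hypothetical probabilities):
# hidden states = dialogue acts, transitions = act bigram, emissions =
# per-act likelihood of an observed utterance cue.
import math

ACTS = ["Statement", "Question", "Backchannel"]
TRANS = {  # toy dialogue-act bigram P(act_t | act_{t-1})
    "Statement":   {"Statement": 0.5, "Question": 0.3, "Backchannel": 0.2},
    "Question":    {"Statement": 0.7, "Question": 0.1, "Backchannel": 0.2},
    "Backchannel": {"Statement": 0.6, "Question": 0.3, "Backchannel": 0.1},
}
EMIT = {  # toy per-act cue likelihoods P(cue | act)
    "Statement":   {"i": 0.5,  "uh-huh": 0.05, "what": 0.1},
    "Question":    {"i": 0.1,  "uh-huh": 0.05, "what": 0.6},
    "Backchannel": {"i": 0.05, "uh-huh": 0.8,  "what": 0.05},
}
START = {"Statement": 0.6, "Question": 0.3, "Backchannel": 0.1}

def viterbi(cues):
    """Most likely dialogue-act sequence for a sequence of utterance cues."""
    delta = {a: math.log(START[a]) + math.log(EMIT[a][cues[0]]) for a in ACTS}
    back = []
    for cue in cues[1:]:
        new_delta, ptr = {}, {}
        for a in ACTS:
            prev, score = max(
                ((p, delta[p] + math.log(TRANS[p][a])) for p in ACTS),
                key=lambda x: x[1])
            new_delta[a] = score + math.log(EMIT[a][cue])
            ptr[a] = prev
        delta, back = new_delta, back + [ptr]
    best = max(ACTS, key=lambda a: delta[a])  # backtrace from best final act
    path = [best]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

For example, `viterbi(["what", "i", "uh-huh"])` yields `["Question", "Statement", "Backchannel"]` under these toy numbers: the act bigram favors a Statement after a Question, and the strong "uh-huh" emission pulls the last state to Backchannel.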
Edge Replacement Grammars: A Formal Language Approach for Generating Graphs
Graphs are increasingly becoming ubiquitous as models for structured data. A
generative model that closely mimics the structural properties of a given set
of graphs has utility in a variety of domains. Much of the existing work
requires that a large number of parameters - in fact, exponential in the size
of the graphs - be estimated from the data. We take a slightly different approach to
this problem, leveraging the extensive prior work in the formal graph grammar
literature. In this paper, we propose a graph generation model based on
Probabilistic Edge Replacement Grammars (PERGs). We propose a variant of PERGs
called Restricted PERGs (RPERGs), which are analogous to PCFGs in the string grammar
literature. With this restriction, we are able to derive a learning algorithm
for estimating the parameters of the grammar from graph data. We empirically
demonstrate on real-life datasets that RPERGs outperform existing methods for
graph generation, improving on the performance of the state-of-the-art
Hyperedge Replacement Grammar based graph generative model. Despite being a
context-free grammar, the proposed model is able to capture many of the
structural properties of real networks, such as degree distributions, power-law
behavior, and spectral characteristics. Comment: To be presented at the SIAM International Conference on Data Mining
(SDM19). arXiv admin note: text overlap with arXiv:1802.08068,
arXiv:1608.03192 by other authors.
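The generative mechanism behind edge replacement can be sketched with a toy probabilistic grammar. The rule encoding below is hypothetical and much simpler than the paper's learned RPERGs: a nonterminal edge is rewritten, with some probability, into a small subgraph that may itself contain nonterminal edges, until only terminal edges remain.

```python
# Toy probabilistic edge replacement (hypothetical rules, NOT the paper's
# learned RPERG model). Edges carry a label: "N" = nonterminal, "t" = terminal.
import random

# Each rule rewrites a nonterminal edge (u, v) into edges over {u, v, fresh}.
RULES = [
    (0.4, [("u", "fresh", "N"), ("fresh", "v", "N")]),  # grow: split the edge
    (0.6, [("u", "v", "t")]),                            # terminate the edge
]

def generate(seed=0, max_nodes=50):
    rng = random.Random(seed)
    next_id = 2
    edges = [(0, 1, "N")]               # start graph: one nonterminal edge
    while any(lbl == "N" for _, _, lbl in edges) and next_id < max_nodes:
        i = next(k for k, e in enumerate(edges) if e[2] == "N")
        u, v, _ = edges.pop(i)
        r, acc = rng.random(), 0.0      # sample a rule by its probability
        for p, rhs in RULES:
            acc += p
            if r <= acc:
                break
        mapping = {"u": u, "v": v}
        if any("fresh" in (a, b) for a, b, _ in rhs):
            mapping["fresh"] = next_id  # allocate a fresh node if needed
            next_id += 1
        for a, b, lbl in rhs:
            edges.append((mapping[a], mapping[b], lbl))
    # any nonterminals left when max_nodes is reached become terminals
    return [(a, b, "t") for a, b, _ in edges]
```

Learning, in the PCFG-analogous restricted setting, amounts to estimating the rule probabilities (here the hard-coded 0.4/0.6) from a corpus of graphs.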
One Parser to Rule Them All
Despite the long history of research in parsing, constructing parsers for real programming languages remains a difficult and painful task. In recent decades, various parser generators have emerged that allow the construction of parsers from a BNF-like specification. However, still today, many parsers are handwritten, or are only partly generated, and include various hacks to deal with different peculiarities of programming languages. The main problem is that current declarative syntax definition techniques are based on pure context-free grammars, while many constructs found in programming languages require context information.
In this paper we propose a parsing framework that embraces context information in its core. Our framework is based on data-dependent grammars, which extend context-free grammars with arbitrary computation, variable binding, and constraints. We present an implementation of our framework on top of the Generalized LL (GLL) parsing algorithm, and show how common idioms in the syntax of programming languages, such as (1) lexical disambiguation filters, (2) operator precedence, (3) indentation-sensitive rules, and (4) conditional preprocessor directives, can be mapped to data-dependent grammars. We report our initial experience with the framework by parsing more than 20,000 Java, C#, Haskell, and OCaml source files.
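The data-dependent idea behind idiom (3) can be illustrated directly. The function below is a hypothetical hand-written sketch, not the paper's GLL-based framework: a rule binds a value (the indentation column of a block's first line) and a later constraint compares against that binding, which a pure context-free grammar cannot express.

```python
# Sketch of a data-dependent, indentation-sensitive rule (hypothetical
# hand-rolled recognizer, not the paper's GLL framework): the indentation
# of the first line is BOUND to a variable, and each following line is
# CONSTRAINED to match it - binding + constraint is the data dependence.
def parse_block(lines, start=0):
    """Parse a maximal run of lines sharing the indentation of lines[start]."""
    indent = len(lines[start]) - len(lines[start].lstrip(" "))  # bind column
    body, i = [], start
    while i < len(lines):
        cur = len(lines[i]) - len(lines[i].lstrip(" "))
        if cur != indent:       # constraint: indentation must match binding
            break
        body.append(lines[i].strip())
        i += 1
    return body, i
```

In a data-dependent grammar the same effect is declarative - roughly `Block(col) ::= (Line [indent == col])+` - with the binding and constraint carried through the derivation instead of hand-coded.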