Extending Attribute Grammars to Support Programming-in-the-Large
Attribute grammars add specification of static semantic properties to context-free grammars, which in turn describe the syntactic structure of program units. However, context-free grammars cannot express programming-in-the-large features common in modern programming languages, including unordered collections of units, included units, and sharing of included units. We present extensions to context-free grammars, and corresponding extensions to attribute grammars, suitable for defining such features. We explain how batch and incremental attribute evaluation algorithms can be adapted to support these extensions, resulting in a uniform approach to intra-unit and inter-unit static semantic analysis and translation of multi-unit programs.
Interpretable Categorization of Heterogeneous Time Series Data
Understanding heterogeneous multivariate time series data is important in
many applications ranging from smart homes to aviation. Learning models of
heterogeneous multivariate time series that are also human-interpretable is
challenging and not adequately addressed by the existing literature. We propose
grammar-based decision trees (GBDTs) and an algorithm for learning them. GBDTs
extend decision trees with a grammar framework. Logical expressions derived
from a context-free grammar are used for branching in place of simple
thresholds on attributes. The added expressivity enables support for a wide
range of data types while retaining the interpretability of decision trees. In
particular, when a grammar based on temporal logic is used, we show that GBDTs
can be used for the interpretable classification of high-dimensional and
heterogeneous time series data. Furthermore, we show how GBDTs can also be used
for categorization, which is a combination of clustering and generating
interpretable explanations for each cluster. We apply GBDTs to analyze the
classic Australian Sign Language dataset as well as data on near mid-air
collisions (NMACs). The NMAC data comes from aircraft simulations used in the
development of the next-generation Airborne Collision Avoidance System (ACAS
X).
Comment: 9 pages, 5 figures, 2 tables, SIAM International Conference on Data
Mining (SDM) 201
On Language Processors and Software Maintenance
This work investigates declarative transformation tools in the context of software maintenance. Besides maintenance of the language specification, evolution of a software language
requires the adaptation of the software written in that language as well as the adaptation of the software that transforms software written in the evolving language. This co-evolution is studied to derive automatic adaptations of artefacts from adaptations of the language specification.
Furthermore, AOP for Prolog is introduced to improve maintainability of language specifications and derived tools.
The Rascal Language Workbench
Rascal is a programming language for source code analysis and transformation. This means
that typically the input of a Rascal program is a program in some programming language, and
the output is often yet another program. So Rascal is a meta programming language. Source code
is thus the primary object of manipulation in Rascal.
Many of the use cases that Rascal is designed to address follow the Extract-Analyze-SYnthesize,
or EASY, paradigm (shown in Figure 1.1). Meta programs often start by extracting
information (facts) from the input program. This is the extraction phase. An example could
be the call-graph of a program. Then, this extracted information is often subject to analysis:
derived facts are computed, the information is enriched. For the call graph, a simple analysis
is determining the root or leaf routines in a source program by analysing the extracted
call-graph. Another analysis could be concerned with identifying routines that are never called
(dead code). Finally, the meta program will synthesize some kind of result. This can be transformed
source code (e.g., removal of dead code from the input program), a report (e.g., statistics
on the number of root and leaf routines), or a visualization (e.g., a graphical depiction of the
call-graph). Of course, these phases are not strictly sequential: there may be feedback loops.
Some analysis leads to new extraction, synthesis of a result may lead to new analyses and so
on. Rascal has elaborate features to support each of the phases of the EASY paradigm, fully
integrated in the language.
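The call-graph example above can be sketched in a few lines. This is a minimal illustration in Python, not Rascal code and not the workbench's own API: the function name `analyze_call_graph`, the `(caller, callee)` pair representation, and the sample routine names are all assumptions made for the example. It takes already-extracted call facts, derives root, leaf, and never-called routines, and synthesizes a short textual report.

```python
from collections import defaultdict


def analyze_call_graph(calls, entry_points):
    """Analyze extracted call facts (hypothetical helper, for illustration).

    calls: iterable of (caller, callee) pairs, i.e. the extracted facts.
    entry_points: routines that are legitimately never called (e.g. "main").
    """
    callees = defaultdict(set)  # caller -> set of routines it calls
    routines = set()            # every routine mentioned in any call fact
    called = set()              # routines that appear as a callee

    for caller, callee in calls:
        callees[caller].add(callee)
        routines.update((caller, callee))
        called.add(callee)

    # Analysis: derive new facts from the extracted ones.
    roots = {r for r in routines if r not in called}        # never called
    leaves = {r for r in routines if not callees.get(r)}    # call nothing
    dead = roots - set(entry_points)                        # never called, not an entry point

    # Synthesis: produce a result, here a small statistics report.
    report = "roots: {}, leaves: {}, dead: {}".format(
        len(roots), len(leaves), len(dead)
    )
    return {"roots": roots, "leaves": leaves, "dead": dead, "report": report}


if __name__ == "__main__":
    # Assumed sample facts; in practice these come from the extraction phase.
    facts = [
        ("main", "parse"),
        ("main", "report"),
        ("parse", "lex"),
        ("old_helper", "lex"),
    ]
    result = analyze_call_graph(facts, entry_points={"main"})
    print(result["report"])
    print(sorted(result["dead"]))
```

Because the sketch only sees call pairs, a routine that neither calls nor is called by anything would not appear at all; a fuller extraction phase would also supply the set of declared routines.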
Naturally, the implementation of domain-specific languages (DSLs), or more generally, model-driven
engineering (MDE), fits the EASY paradigm very well. When implementing a DSL compiler
or interpreter the input is, of course, DSL source code. Extraction could, for instance,
include the derivation of an AST from the concrete syntax tree. Another extracted model could
be a graph-like structure representing the input in a more abstract way, or a performance model.
Such abstractions are input to analyses such as constraint checking or type checking, verification,
quality-of-service analysis etc. Finally, synthesis covers tasks such as graphical visualization,
code generation, and optimization. To conclude, in the context of Rascal, we see DSL implementation
as an instance of source code analysis and transformation.
Generalizing input-driven languages: theoretical and practical benefits
Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks
to their simplicity they enjoy various nice algebraic and logic properties that
have been successfully exploited in many application fields. Practically all of
their related problems are decidable, so that they support automatic
verification algorithms. Also, they can be recognized in real-time.
Context-free languages (CFL) are another major family well-suited to
formalize programming, natural, and many other classes of languages; their
increased generative power w.r.t. RL, however, causes the loss of several
closure properties and of the decidability of important problems; furthermore
they need complex parsing algorithms. Thus, various subclasses thereof have
been defined with different goals, spanning from efficient, deterministic
parsing to closure properties, logic characterization and automatic
verification techniques.
Among CFL subclasses, so-called structured ones, i.e., those where the
typical tree-structure is visible in the sentences, exhibit many of the
algebraic and logic properties of RL, whereas deterministic CFL have been
thoroughly exploited in compiler construction and other application fields.
After surveying and comparing the main properties of those various language
families, we go back to operator precedence languages (OPL), an old family
through which R. Floyd pioneered deterministic parsing, and we show that they
offer unexpected properties in two fields so far investigated in totally
independent ways: they enable parsing parallelization in a more effective way
than traditional sequential parsers, and exhibit the same algebraic and logic
properties so far obtained only for less expressive language families.