1,521 research outputs found
Simulation computation in grammar-compressed graphs
Like [1], we present an algorithm to compute the simulation of a query
pattern in a graph of labeled nodes and unlabeled edges. However, our algorithm
works on a compressed graph grammar, instead of on the original graph. The
speed-up of our algorithm compared to the algorithm in [1] grows with the size
of the graph and with the compression strength.
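For readers unfamiliar with graph simulation, the following is a minimal sketch of the usual fixpoint computation on an uncompressed graph. It is not the grammar-based algorithm of the abstract, nor necessarily the algorithm of [1]; the function name and the dictionary-based graph encoding are assumptions made for illustration.

```python
def simulation(q_labels, q_edges, g_labels, g_edges):
    """Naive fixpoint computation of the maximal simulation relation.

    q_labels / g_labels map node -> label; q_edges / g_edges are lists of
    (source, target) pairs. Returns a dict mapping each pattern node to the
    set of graph nodes that simulate it. (Hypothetical encoding.)
    """
    # Successor sets of the data graph.
    g_succ = {v: set() for v in g_labels}
    for v, w in g_edges:
        g_succ[v].add(w)

    # Start from all label-compatible candidates.
    sim = {
        u: {v for v in g_labels if g_labels[v] == q_labels[u]}
        for u in q_labels
    }

    # Remove candidates that cannot match an outgoing pattern edge,
    # until nothing changes.
    changed = True
    while changed:
        changed = False
        for u, u2 in q_edges:
            keep = {v for v in sim[u] if g_succ[v] & sim[u2]}
            if keep != sim[u]:
                sim[u] = keep
                changed = True
    return sim


# Pattern: x(a) -> y(b). Graph: 1(a) -> 2(b), plus an isolated node 3(a).
result = simulation(
    {"x": "a", "y": "b"}, [("x", "y")],
    {1: "a", 2: "b", 3: "a"}, [(1, 2)],
)
```

Node 3 carries the right label but has no successor labeled b, so it is pruned from the candidates of x.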
A Survey on Methods and Systems for Graph Compression
We present an informal survey (meant to accompany another paper) on graph
compression methods. We focus on lossless methods, briefly list available
approaches, and compare them where possible or give some indicators on their
compression ratios. We also mention some relevant results from the field of
lossy compression and algorithms specialized for the use on large graphs. ---
Note: The comparison is by no means complete. This document is a first draft
and will be updated and extended.
PLG2: Multiperspective Processes Randomization and Simulation for Online and Offline Settings
Process mining represents an important field in BPM and data mining research.
Recently, it has gained importance also for practitioners: more and more
companies are creating business process intelligence solutions. The evaluation
of process mining algorithms requires, as any other data mining task, the
availability of large amounts of real-world data. Despite the increasing
availability of such datasets, they are affected by many limitations, in primis
the absence of a "gold standard" (i.e., the reference model).
This paper extends an approach, already available in the literature, for the
generation of random processes. Novelties have been introduced throughout the
work and, in particular, they involve the complete support for multiperspective
models and logs (i.e., the control-flow perspective is enriched with time and
data information) and for online settings (i.e., generation of multiperspective
event streams and concept drifts). The proposed new framework is able to almost
entirely cover the spectrum of possible scenarios that can be observed in the
real-world. The proposed approach is implemented as a publicly available Java
application, with a set of APIs for the programmatic execution of experiments.
Comment: 36 pages, minor update
Tracelets and Tracelet Analysis Of Compositional Rewriting Systems
Taking advantage of a recently discovered associativity property of rule
compositions, we extend the classical concurrency theory for rewriting systems
over adhesive categories. We introduce the notion of tracelets, which are
defined as minimal derivation traces that universally encode sequential
compositions of rewriting rules. Tracelets are compositional, capture the
causality of equivalence classes of traditional derivation traces, and
intrinsically suggest a clean mathematical framework for the definition of
various notions of abstractions of traces. We illustrate these features by
introducing a first prototype for a framework of tracelet analysis, which as a
key application permits the formulation of a first-of-its-kind algorithm for the
static generation of minimal derivation traces with prescribed terminal events.
Comment: In Proceedings ACT 2019, arXiv:2009.0633
Improved ESP-index: a practical self-index for highly repetitive texts
While several self-indexes for highly repetitive texts exist, developing a
practical self-index applicable to real world repetitive texts remains a
challenge. ESP-index is a grammar-based self-index built on the notion of
edit-sensitive parsing (ESP), an efficient parsing algorithm that guarantees
upper bounds of parsing discrepancies between different appearances of the same
subtexts in a text. Although ESP-index performs efficient top-down searches of
query texts, it has a serious issue with the binary searches used to find appearances
of variables for a query text, which slows down the query
searches. We present an improved ESP-index (ESP-index-I) by leveraging the idea
behind succinct data structures for large alphabets. While ESP-index-I keeps
the same types of efficiencies as ESP-index about the top-down searches, it
avoids the binary searches using fast rank/select operations. We experimentally
test ESP-index-I on the ability to search query texts and extract subtexts from
real world repetitive texts on a large scale, and we show that ESP-index-I
performs better than other possible approaches.
Comment: This is the full version of a proceeding accepted to the 11th
International Symposium on Experimental Algorithms (SEA2014
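As a reminder of what rank/select operations compute, here is a minimal, non-succinct sketch on a plain bit vector. It only illustrates the semantics of the two queries; it is not the ESP-index-I data structure, and the class and method names are assumptions. Practical succinct structures answer the same queries with only a small amount of extra space.

```python
class BitVector:
    """Plain bit vector with constant-time rank and select (not succinct)."""

    def __init__(self, bits):
        self.bits = list(bits)
        # prefix[i] = number of 1-bits in bits[0:i]
        self.prefix = [0]
        for b in self.bits:
            self.prefix.append(self.prefix[-1] + b)
        # ones[j-1] = position of the j-th 1-bit
        self.ones = [i for i, b in enumerate(self.bits) if b]

    def rank1(self, i):
        """Number of 1-bits among the first i positions (bits[0:i])."""
        return self.prefix[i]

    def select1(self, j):
        """Position of the j-th 1-bit, with j counted from 1."""
        return self.ones[j - 1]


bv = BitVector([1, 0, 1, 1, 0])
```

The two queries are inverse to each other in the sense that `bv.rank1(bv.select1(j) + 1) == j` for every valid `j`.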
Streaming algorithms for language recognition problems
We study the complexity of the following problems in the streaming model.
Membership testing for DLIN: We show that every language in DLIN can be
recognised by a randomized one-pass space algorithm with inverse
polynomial one-sided error, and by a deterministic p-pass space
algorithm. We show that these algorithms are optimal.
Membership testing for LL(k): For languages generated by LL(k) grammars
with a bound of on the number of nonterminals at any stage in the left-most
derivation, we show that membership can be tested by a randomized one-pass
space algorithm with inverse polynomial (in ) one-sided error.
Membership testing for DCFL: We show that randomized algorithms as efficient
as the ones described above for DLIN and LL(k) (which are subclasses of
DCFL) cannot exist for all of DCFL: there is a language in VPL (a subclass
of \DCFL) for which any randomized p-pass algorithm with error bounded by
must use space.
Degree sequence problem: We study the problem of determining, given a sequence
and a graph , whether the degree sequence of is
precisely . We give a randomized one-pass space
algorithm with inverse polynomial one-sided error probability. We show that our
algorithms are optimal.
Our randomized algorithms are based on the recent work of Magniez et al.
[MMN09]; our lower bounds are obtained by considering related
communication complexity problems.
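To give a feel for how a one-pass randomized check of a degree sequence can work, here is a sketch based on polynomial fingerprinting. This is a standard technique chosen for illustration and is not claimed to be the construction used in the paper; the function name, the 1-based vertex numbering, and the choice of modulus are assumptions.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime used as the fingerprint modulus (assumed choice)


def degree_sequence_matches(degrees, edge_stream, rng=random):
    """One-pass randomized test: does vertex i have degree degrees[i-1]?

    Keeps only a random evaluation point r and one running fingerprint.
    A matching sequence is always accepted; a mismatch is rejected except
    with small probability (roughly n / P for n vertices).
    """
    r = rng.randrange(1, P)
    fp = 0
    # Claimed degrees contribute d_i * r^i.
    for i, d in enumerate(degrees, start=1):
        fp = (fp + d * pow(r, i, P)) % P
    # Each edge (u, v) removes one unit from the degrees of u and v.
    for u, v in edge_stream:
        fp = (fp - pow(r, u, P) - pow(r, v, P)) % P
    # fp is the polynomial sum_i (d_i - deg(i)) * r^i evaluated at r.
    return fp == 0


star = [(1, 2), (1, 3)]  # vertex 1 has degree 2, vertices 2 and 3 have degree 1
ok = degree_sequence_matches([2, 1, 1], star)
```

The error is one-sided because the fingerprint of a correct sequence is the zero polynomial, which evaluates to zero at every point.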
Evolutionary development of tensegrity structures
Contributions from the emerging fields of molecular genetics and evo-devo (evolutionary developmental biology) are greatly benefiting the field of evolutionary computation, initiating a promise of renewal in the traditional methodology. While direct encoding has constituted a dominant paradigm, indirect ways to encode the solutions have been reported, yet little attention has been paid to the benefits of the proposed methods to real problems. In this work, we study the biological properties that emerge by means of using indirect encodings in the context of form-finding problems. A novel indirect encoding model for artificial development has been defined and applied to an engineering structural-design problem, specifically to the discovery of tensegrity structures. This model has been compared with a direct encoding scheme. While the direct encoding performs similarly well to the proposed method, indirect-based results typically outperform the direct-based results in aspects not directly linked to the nature of the problem itself, but to the emergence of properties found in biological organisms, like organicity, generalization capacity, or modularity, aspects which are highly valuable in engineering.
Simulating the DNA String Graph in Succinct Space
Converting a set of sequencing reads into a lossless compact data structure
that encodes all the relevant biological information is a major challenge. The
classical approaches are to build the string graph or the de Bruijn graph. Each
has advantages over the other depending on the application. Still, the ideal
setting would be to have an index of the reads that is easy to build and can be
adapted to any type of biological analysis. In this paper, we propose a new
data structure we call rBOSS, which gets close to that ideal. Our rBOSS is a de
Bruijn graph in practice, but it simulates any length up to k and can compute
overlaps of size at least m between the labels of the nodes, with k and m being
parameters. If we choose the parameter k equal to the size of the reads, then
we can simulate a complete string graph. Like most BWT-based structures, rBOSS is
unidirectional, but it exploits the property of the DNA reverse complements to
simulate bi-directionality with some time-space trade-offs. We implemented a
genome assembler on top of rBOSS to demonstrate its usefulness. Our
experimental results show that using k = 100, rBOSS can assemble 185 MB of
reads in less than 15 minutes, using 110 MB in total. It produces contigs of
mean sizes over 10,000, which is twice the size obtained by using a pure de
Bruijn graph of fixed length k.
Comment: This research has received funding from the European Union's Horizon
2020 research and innovation programme under the Marie Sklodowska-Curie
Actions H2020-MSCA-RISE-2015 BIRDS GA No. 69094
Data Race Detection on Compressed Traces
We consider the problem of detecting data races in program traces that have
been compressed using straight line programs (SLP), which are special
context-free grammars that generate exactly one string, namely the trace that
they represent. We consider two classical approaches to race detection ---
using the happens-before relation and the lockset discipline. We present
algorithms for both these methods that run in time that is linear in the size
of the compressed, SLP representation. Typical program executions almost always
exhibit patterns that lead to significant compression. Thus, our algorithms are
expected to result in large speedups when compared with analyzing the
uncompressed trace. Our experimental evaluation of these new algorithms on
standard benchmarks confirms this observation.
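A straight line program can be illustrated with a small sketch. The example below shows a grammar that generates exactly one string, and how a simple statistic (length and symbol count) can be computed in time linear in the number of rules without expanding the string. It does not implement race detection; the rule encoding and function names are assumptions made for illustration.

```python
def expand(rules):
    """Decompress an SLP. `rules` is a list of (name, rhs) pairs in
    dependency order; rhs is a terminal character or a pair of earlier
    rule names. The last rule is the start symbol. (Hypothetical encoding.)
    """
    text = {}
    for name, rhs in rules:
        if isinstance(rhs, str):
            text[name] = rhs
        else:
            left, right = rhs
            text[name] = text[left] + text[right]
    return text[rules[-1][0]]


def length_and_count(rules, symbol):
    """Length of the generated string and number of occurrences of
    `symbol`, computed with one pass over the rules (no decompression)."""
    length, count = {}, {}
    for name, rhs in rules:
        if isinstance(rhs, str):
            length[name] = 1
            count[name] = 1 if rhs == symbol else 0
        else:
            left, right = rhs
            length[name] = length[left] + length[right]
            count[name] = count[left] + count[right]
    start = rules[-1][0]
    return length[start], count[start]


# E -> D C, D -> C C, C -> A B, A -> a, B -> b  generates "ababab".
rules = [
    ("A", "a"),
    ("B", "b"),
    ("C", ("A", "B")),
    ("D", ("C", "C")),
    ("E", ("D", "C")),
]
```

Because each rule is visited once, the cost of `length_and_count` depends on the grammar size rather than on the length of the trace it represents.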
On the Use of Quasiorders in Formal Language Theory
In this thesis we use quasiorders on words to offer a new perspective on two
well-studied problems from Formal Language Theory: deciding language inclusion
and manipulating the finite automata representations of regular languages.
First, we present a generic quasiorder-based framework that, when instantiated
with different quasiorders, yields different algorithms (some of them new) for
deciding language inclusion. We then instantiate this framework to devise an
efficient algorithm for searching with regular expressions on
grammar-compressed text. Finally, we define a framework of quasiorder-based
automata constructions to offer a new perspective on residual automata.
Comment: PhD thesis