A similarity measure for cyclic unary regular languages
A cyclic unary regular language is a regular language over a unary alphabet that is represented
by a cyclic automaton. We propose a similarity measure for cyclic unary regular
languages by modifying the Jaccard similarity coefficient and the Sørensen coefficient to
measure the level of overlap between such languages. This measure computes the proportion
of strings that are shared by two or more cyclic unary regular languages and is
an upper bound of the Jaccard coefficient and the Sørensen coefficient. Using this
similarity measure, we define a dissimilarity measure for cyclic unary regular languages
that is a semimetric distance. Moreover, it can be used for the non-cyclic case.
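To make the overlap idea concrete, here is a minimal Python sketch. It does not reproduce the modified measure proposed in the abstract. It only computes the plain Jaccard and Sørensen coefficients over one common period, under the assumption that a cyclic unary language is given by a period and a set of accepting residues (a word of length k belongs to the language when k mod period is accepting). The function names and this representation are illustrative assumptions.

```python
from fractions import Fraction
from math import gcd


def accepted_lengths(period, accepting, common):
    """Word lengths k in [0, common) whose residue k mod period is accepting."""
    return {k for k in range(common) if k % period in accepting}


def overlap_coefficients(period_a, accepting_a, period_b, accepting_b):
    """Plain Jaccard and Sorensen coefficients over one common period.

    Assumption: each language is {a^k : k mod period in accepting}, so both
    languages repeat with period lcm(period_a, period_b).
    """
    common = period_a * period_b // gcd(period_a, period_b)  # least common multiple
    a = accepted_lengths(period_a, accepting_a, common)
    b = accepted_lengths(period_b, accepting_b, common)
    shared = len(a & b)
    union = len(a | b)
    total = len(a) + len(b)
    # Two empty languages are treated as identical (coefficient 1).
    jaccard = Fraction(shared, union) if union else Fraction(1)
    sorensen = Fraction(2 * shared, total) if total else Fraction(1)
    return jaccard, sorensen


# Example: lengths divisible by 2 versus lengths divisible by 3.
jaccard, sorensen = overlap_coefficients(2, {0}, 3, {0})
print(jaccard, sorensen)  # 1/4 2/5
```

Counting over one common period works because both membership tests repeat with that period, so the proportions within it match the proportions over all sufficiently long prefixes of the languages.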
Evolving under small disruption
We extend the edit operators of substitution, deletion, and insertion of a symbol over a word
by introducing two new operators (partial copy and partial elimination) inspired by biological
gene duplication. We define a disruption measure for an operator over a word and prove that
whereas the traditional edit operators are disruptive, partial copy and partial elimination are
non-disruptive. Moreover, we show that the application of only edit operators does not generate
(with low disruption) all the words over a binary alphabet, but this can indeed be done by
combining partial copy and partial elimination with the substitution operator.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
Low Disruption Transformations on Cyclic Automata
We extend the edit operators of substitution, deletion, and insertion of a symbol over a
word by introducing two new operators (partial copy and partial elimination) inspired by biological
gene duplication. We define a disruption measure for an operator over a word and prove that whereas
the traditional edit operators are disruptive, partial copy and partial elimination are non-disruptive.
Moreover, we show that the application of only edit operators does not generate (with low disruption)
all the words over a binary alphabet, but this can indeed be done by combining partial copy and
partial elimination with the substitution operator
Evolving Complexity and Similarity in an Artificial Life Framework based on Formal Language Theory
This thesis defines a formal framework in which the evolution of biological complexity can be studied objectively. The objectivity comes from using state complexity for regular languages, a well-known and rigorous complexity measure. The framework consists of a population of cyclic unary regular languages (individuals) that try to adapt to a given environment (which also consists of cyclic unary regular languages) by means of evolutionary computation.
Regular Language Distance and Entropy
This paper addresses the problem of determining the distance between two regular languages. It will show how to expand Jaccard distance, which works on finite sets, to potentially-infinite regular languages.
The entropy of a regular language plays a large role in the extension. Much of the paper is spent investigating the entropy of a regular language. This includes addressing issues that have required previous authors to rely on the upper limit of Shannon's traditional formulation of channel capacity, because its limit does not always exist. The paper also proposes a new limit-based formulation for the entropy of a regular language and proves that this formulation both exists and is equivalent to Shannon's original formulation (when the latter exists). Additionally, the proposed formulation is shown to equal an analogous but formally quite different notion of topological entropy from Symbolic Dynamics, consequently also showing Shannon's original formulation to be equivalent to topological entropy.
Surprisingly, the natural Jaccard-like entropy distance is trivial in most cases. Instead, the entropy sum distance metric is suggested, and shown to be granular in certain situations
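As a rough illustration of the two ingredients mentioned in this abstract, the Python sketch below computes the Jaccard distance on finite sets and estimates the growth rate log2(count) / n of the number of accepted words of length n for a deterministic automaton. It does not implement the paper's new limit-based formulation or its entropy sum distance. The automaton encoding (a dictionary of transitions) and all function names are assumptions made for this example.

```python
from math import log2


def jaccard_distance(x, y):
    """Jaccard distance between two finite sets (0.0 if both are empty)."""
    union = x | y
    return 1 - len(x & y) / len(union) if union else 0.0


def count_words_by_length(transitions, start, accepting, max_len):
    """counts[n] = number of accepted words of length n.

    transitions maps state -> {symbol: next_state}; missing symbols mean
    the word is rejected. The automaton must be deterministic so that
    counting paths is the same as counting words.
    """
    paths = {start: 1}  # number of words of the current length reaching each state
    counts = []
    for _ in range(max_len + 1):
        counts.append(sum(c for state, c in paths.items() if state in accepting))
        following = {}
        for state, c in paths.items():
            for target in transitions.get(state, {}).values():
                following[target] = following.get(target, 0) + c
        paths = following
    return counts


# Example language: binary words with no two consecutive 1s.
no_double_one = {"q0": {"0": "q0", "1": "q1"}, "q1": {"0": "q0"}}
counts = count_words_by_length(no_double_one, "q0", {"q0", "q1"}, 20)
print(counts[:6])  # [1, 2, 3, 5, 8, 13]
print(log2(counts[20]) / 20)  # finite-n estimate of the growth rate

print(jaccard_distance({1, 2, 3}, {2, 3, 4}))  # 0.5
```

The single-n estimate is only a proxy: as the abstract notes, the limit of such ratios need not exist for every regular language, which is why earlier work falls back on the upper limit.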
An Analytical Study of Large SPARQL Query Logs
With the adoption of RDF as the data model for Linked Data and the Semantic
Web, query specification from end-users has become more and more common in
SPARQL endpoints. In this paper, we conduct an in-depth analytical study of
the queries formulated by end-users and harvested from large and up-to-date
query logs from a wide variety of RDF data sources. As opposed to previous
studies, ours is the first assessment on a voluminous query corpus, spanning
over several years and covering many representative SPARQL endpoints. Apart
from the syntactical structure of the queries, that exhibits already
interesting results on this generalized corpus, we drill deeper in the
structural characteristics related to the graph- and hypergraph representation
of queries. We outline the most common shapes of queries when visually
displayed as pseudographs, and characterize their (hyper-)tree width.
Moreover, we analyze the evolution of queries over time, by introducing the
novel concept of a streak, i.e., a sequence of queries that appear as
subsequent modifications of a seed query. Our study offers several fresh
insights on the already rich query features of real SPARQL queries formulated
by real users, and brings us to draw a number of conclusions and pinpoint
future directions for SPARQL query evaluation, query optimization, tuning,
and benchmarking
Evaluating Datalog via Tree Automata and Cycluits
We investigate parameterizations of both database instances and queries that
make query evaluation fixed-parameter tractable in combined complexity. We show
that clique-frontier-guarded Datalog with stratified negation (CFG-Datalog)
enjoys bilinear-time evaluation on structures of bounded treewidth for programs
of bounded rule size. Such programs capture in particular conjunctive queries
with simplicial decompositions of bounded width, guarded negation fragment
queries of bounded CQ-rank, or two-way regular path queries. Our result is
shown by translating to alternating two-way automata, whose semantics is
defined via cyclic provenance circuits (cycluits) that can be tractably
evaluated.
Comment: 56 pages, 63 references. Journal version of "Combined Tractability of
Query Evaluation via Tree Automata and Cycluits (Extended Version)" at
arXiv:1612.04203. Up to the stylesheet, page/environment numbering, and
possible minor publisher-induced changes, this is the exact content of the
journal paper that will appear in Theory of Computing Systems. Update wrt
version 1: latest reviewer feedback
Navigating the Maze of Wikidata Query Logs
This paper provides an in-depth and diversified analysis of the Wikidata query logs, recently made publicly available. Although the usage of Wikidata queries has been the object of recent studies, our analysis of the query traffic reveals interesting and unforeseen findings concerning the usage, types of recursion, and the shape classification of complex recursive queries. Wikidata specific features combined with recursion let us identify a significant subset of the entire corpus that can be used by the community for further assessment. We considered and analyzed the queries across many different dimensions, such as the robotic and organic queries, the presence/absence of constants along with the correctly executed and timed out queries. A further investigation that we pursue in this paper is to find, given a query, a number of queries structurally similar to the given query. We provide a thorough characterization of the queries in terms of their expressive power, their topological structure and shape, along with a deeper understanding of the usage of recursion in these logs. We make the code for the analysis available as open source.