796 research outputs found

    A similarity measure for cyclic unary regular languages

    A cyclic unary regular language is a regular language over a unary alphabet that is represented by a cyclic automaton. We propose a similarity measure for cyclic unary regular languages by modifying the Jaccard similarity coefficient and the Sørensen coefficient to measure the level of overlap between such languages. This measure computes the proportion of strings that are shared by two or more cyclic unary regular languages and is an upper bound on the Jaccard coefficient and the Sørensen coefficient. Using this similarity measure, we define a dissimilarity measure for cyclic unary regular languages that is a semimetric distance. Moreover, it can be used for the non-cyclic case.
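
    As a rough illustration of the kind of overlap involved, the sketch below computes the plain Jaccard and Sørensen coefficients over one common period of two cyclic unary languages. It is not the modified measure proposed in the paper. It assumes each language is given by a purely cyclic automaton with `period` states and a set of accepting residues, so that the language is the set of strings a^k with k mod period in that set. All function names are hypothetical.

```python
from math import gcd


def accepted_lengths(period, accepting, horizon):
    """Lengths k < horizon such that a^k is accepted (k mod period is accepting)."""
    return {k for k in range(horizon) if k % period in accepting}


def overlap_coefficients(period1, accepting1, period2, accepting2):
    """Return (jaccard, sorensen) measured over one common period.

    Assumption: both automata are pure cycles (no tail), so each language
    repeats with period lcm(period1, period2) and one window is enough.
    """
    horizon = period1 * period2 // gcd(period1, period2)
    a = accepted_lengths(period1, accepting1, horizon)
    b = accepted_lengths(period2, accepting2, horizon)
    shared = len(a & b)
    union = len(a | b)
    total = len(a) + len(b)
    # Convention chosen here: two empty languages count as identical.
    jaccard = shared / union if union else 1.0
    sorensen = 2 * shared / total if total else 1.0
    return jaccard, sorensen


# Multiples of 2 versus multiples of 3, compared over lengths 0..5.
jaccard, sorensen = overlap_coefficients(2, {0}, 3, {0})
print(jaccard, sorensen)
```

    In this example the window holds lengths {0, 2, 4} and {0, 3}, which share only length 0.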

    Evolving under small disruption

    We extend the edit operators of substitution, deletion, and insertion of a symbol over a word by introducing two new operators (partial copy and partial elimination) inspired by biological gene duplication. We define a disruption measure for an operator over a word and prove that whereas the traditional edit operators are disruptive, partial copy and partial elimination are non-disruptive. Moreover, we show that the application of only edit operators does not generate (with low disruption) all the words over a binary alphabet, but this can indeed be done by combining partial copy and partial elimination with the substitution operator. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.

    Low Disruption Transformations on Cyclic Automata

    We extend the edit operators of substitution, deletion, and insertion of a symbol over a word by introducing two new operators (partial copy and partial elimination) inspired by biological gene duplication. We define a disruption measure for an operator over a word and prove that whereas the traditional edit operators are disruptive, partial copy and partial elimination are non-disruptive. Moreover, we show that the application of only edit operators does not generate (with low disruption) all the words over a binary alphabet, but this can indeed be done by combining partial copy and partial elimination with the substitution operator
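
    For reference, the three traditional edit operators named in the abstract can be sketched as follows. This is a minimal illustration with hypothetical function names, treating words as Python strings. Partial copy, partial elimination, and the disruption measure are not defined in the abstract, so they are deliberately left out.

```python
def substitute(word, position, symbol):
    """Replace the symbol at `position` with `symbol`."""
    if not 0 <= position < len(word):
        raise IndexError("position out of range")
    return word[:position] + symbol + word[position + 1:]


def delete(word, position):
    """Remove the symbol at `position`."""
    if not 0 <= position < len(word):
        raise IndexError("position out of range")
    return word[:position] + word[position + 1:]


def insert(word, position, symbol):
    """Insert `symbol` before `position` (position == len(word) appends)."""
    if not 0 <= position <= len(word):
        raise IndexError("position out of range")
    return word[:position] + symbol + word[position:]


# Words over the binary alphabet {0, 1}.
print(substitute("0110", 1, "0"))
print(delete("0110", 0))
print(insert("0110", 4, "1"))
```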

    Evolving Complexity and Similarity in an Artificial Life Framework based on Formal Language Theory

    In this thesis, a formal framework is defined in which the evolution of biological complexity can be studied in an objective way. That objectivity comes from using state complexity for regular languages, which is a well-known and rigorous complexity measure. The framework is composed of a population of cyclic unary regular languages (individuals) that try to adapt to a given environment (which also consists of cyclic unary regular languages) by means of evolutionary computation.

    Regular Language Distance and Entropy

    This paper addresses the problem of determining the distance between two regular languages. It shows how to extend Jaccard distance, which works on finite sets, to potentially infinite regular languages. The entropy of a regular language plays a large role in the extension, and much of the paper is spent investigating it. This includes addressing issues that have required previous authors to rely on the upper limit of Shannon's traditional formulation of channel capacity, because its limit does not always exist. The paper also proposes a new limit-based formulation for the entropy of a regular language and proves that this formulation both exists and is equivalent to Shannon's original formulation (when the latter exists). Additionally, the proposed formulation is shown to equal an analogous but formally quite different notion of topological entropy from Symbolic Dynamics, which consequently also shows Shannon's original formulation to be equivalent to topological entropy. Surprisingly, the natural Jaccard-like entropy distance is trivial in most cases. Instead, the entropy sum distance metric is suggested, and shown to be granular in certain situations.
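
    To make the channel-capacity idea concrete, the sketch below estimates log2(number of accepted words of length n) / n for a fixed n, given a deterministic automaton. It only illustrates the traditional quantity whose limit is discussed above, not the new formulation proposed in the paper. The dictionary-based DFA encoding, the function names, and the example language (binary words with no two consecutive 1s) are assumptions made for illustration.

```python
from math import log2


def count_accepted(transitions, start, accepting, n):
    """Count accepted words of length n by dynamic programming over states."""
    counts = {start: 1}  # counts[q] = number of words read so far that end in q
    for _ in range(n):
        following = {}
        for state, ways in counts.items():
            for target in transitions.get(state, {}).values():
                following[target] = following.get(target, 0) + ways
        counts = following
    return sum(ways for state, ways in counts.items() if state in accepting)


def entropy_estimate(transitions, start, accepting, n):
    """Return log2(#accepted words of length n) / n, or None if there are none.

    A length with no accepted words is one reason the plain limit can fail
    to exist, which is why an upper limit is often used instead.
    """
    words = count_accepted(transitions, start, accepting, n)
    return log2(words) / n if words else None


# Hypothetical example: binary words with no two consecutive 1s.
# State "a": last symbol was not 1; state "b": last symbol was 1.
NO_DOUBLE_ONE = {"a": {"0": "a", "1": "b"}, "b": {"0": "a"}}

print(count_accepted(NO_DOUBLE_ONE, "a", {"a", "b"}, 3))
print(entropy_estimate(NO_DOUBLE_ONE, "a", {"a", "b"}, 200))
```

    For this example language the word counts follow the Fibonacci numbers, so the estimate approaches the base-2 logarithm of the golden ratio as n grows.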

    An Analytical Study of Large SPARQL Query Logs

    With the adoption of RDF as the data model for Linked Data and the Semantic Web, query specification from end-users has become more and more common in SPARQL endpoints. In this paper, we conduct an in-depth analytical study of the queries formulated by end-users and harvested from large and up-to-date query logs from a wide variety of RDF data sources. As opposed to previous studies, ours is the first assessment on a voluminous query corpus, spanning several years and covering many representative SPARQL endpoints. Apart from the syntactical structure of the queries, which already exhibits interesting results on this generalized corpus, we drill deeper into the structural characteristics related to the graph and hypergraph representation of queries. We outline the most common shapes of queries when visually displayed as pseudographs, and characterize their (hyper-)tree width. Moreover, we analyze the evolution of queries over time by introducing the novel concept of a streak, i.e., a sequence of queries that appear as subsequent modifications of a seed query. Our study offers several fresh insights on the already rich query features of real SPARQL queries formulated by real users, and leads us to draw a number of conclusions and pinpoint future directions for SPARQL query evaluation, query optimization, tuning, and benchmarking.

    Evaluating Datalog via Tree Automata and Cycluits

    We investigate parameterizations of both database instances and queries that make query evaluation fixed-parameter tractable in combined complexity. We show that clique-frontier-guarded Datalog with stratified negation (CFG-Datalog) enjoys bilinear-time evaluation on structures of bounded treewidth for programs of bounded rule size. Such programs capture in particular conjunctive queries with simplicial decompositions of bounded width, guarded negation fragment queries of bounded CQ-rank, or two-way regular path queries. Our result is shown by translating to alternating two-way automata, whose semantics is defined via cyclic provenance circuits (cycluits) that can be tractably evaluated. Comment: 56 pages, 63 references. Journal version of "Combined Tractability of Query Evaluation via Tree Automata and Cycluits (Extended Version)" at arXiv:1612.04203. Up to the stylesheet, page/environment numbering, and possible minor publisher-induced changes, this is the exact content of the journal paper that will appear in Theory of Computing Systems. Update wrt version 1: latest reviewer feedbac

    Navigating the Maze of Wikidata Query Logs

    This paper provides an in-depth and diversified analysis of the Wikidata query logs, recently made publicly available. Although the usage of Wikidata queries has been the object of recent studies, our analysis of the query traffic reveals interesting and unforeseen findings concerning the usage, types of recursion, and the shape classification of complex recursive queries. Wikidata-specific features combined with recursion let us identify a significant subset of the entire corpus that can be used by the community for further assessment. We considered and analyzed the queries across many different dimensions, such as the robotic and organic queries, the presence/absence of constants, and the correctly executed and timed-out queries. A further investigation that we pursue in this paper is to find, given a query, a number of queries structurally similar to the given query. We provide a thorough characterization of the queries in terms of their expressive power, their topological structure and shape, along with a deeper understanding of the usage of recursion in these logs. We make the code for the analysis available as open source.