On the k-Abelian Equivalence Relation of Finite Words
This thesis is devoted to the so-called k-abelian equivalence relation of sequences of symbols, that is, words. This equivalence relation is a generalization of the abelian equivalence of words. Two words are abelian equivalent if one is a permutation of the other. For any positive integer k, two words are called k-abelian equivalent if each word of length at most k occurs equally many times as a factor in the two words. The k-abelian equivalence defines an equivalence relation, even a congruence, of finite words. A hierarchy of equivalence classes in between the equality relation and the abelian equivalence of words is thus obtained.
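The definition above can be checked directly on small examples. The following is a minimal sketch, not the thesis's method: the function names `factor_counts` and `k_abelian_equivalent` are illustrative assumptions, and the code simply counts every factor of length at most k in both words and compares the counts.

```python
from collections import Counter


def factor_counts(word, k):
    """Count occurrences of every factor (substring) of length 1..k in word."""
    counts = Counter()
    for length in range(1, k + 1):
        # The range is empty when the word is shorter than `length`.
        for start in range(len(word) - length + 1):
            counts[word[start:start + length]] += 1
    return counts


def k_abelian_equivalent(u, v, k):
    """True if every word of length at most k occurs equally often in u and v."""
    return factor_counts(u, k) == factor_counts(v, k)


if __name__ == "__main__":
    # k = 1 is ordinary abelian equivalence: "aba" is a permutation of "baa".
    print(k_abelian_equivalent("aba", "baa", 1))
    # The same pair differs in its factors of length 2.
    print(k_abelian_equivalent("aba", "baa", 2))
    # A pair that agrees up to length 2 but not up to length 3.
    print(k_abelian_equivalent("aabab", "abaab", 2))
    print(k_abelian_equivalent("aabab", "abaab", 3))
```

Raising k can only split classes further, which illustrates the hierarchy between abelian equivalence (k = 1) and equality (k at least the word length).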
Most of the literature on the k-abelian equivalence deals with infinite words. In this thesis we consider several aspects of the equivalence relation, the main objective being to build a fairly comprehensive picture of the structure of the k-abelian equivalence classes themselves. The main part of the thesis deals with the structural aspects of k-abelian equivalence classes. We also consider aspects of k-abelian equivalence in infinite words.
We survey known characterizations of the k-abelian equivalence of finite words from the literature and also introduce novel characterizations. For the analysis of structural properties of the equivalence relation, the main tool is the characterization by the rewriting rule called the k-switching. Using this rule it is straightforward to show that the language comprised of the lexicographically least elements of the k-abelian equivalence classes is regular. Further word-combinatorial analysis of the lexicographically least elements leads us to describe the deterministic finite automata recognizing this language. Using tools from formal language theory combined with our analysis, we give an optimal expression for the asymptotic growth rate of the number of k-abelian equivalence classes of length n over an m-letter alphabet. Explicit formulae are computed for small values of k and m, and these sequences appear in Sloane’s Online Encyclopedia of Integer Sequences.
Due to the fact that the k-abelian equivalence relation is a congruence of the free monoid, we study equations over the k-abelian equivalence classes. The main result in this setting is that any system of equations of k-abelian equivalence classes is equivalent to one of its finite subsystems, i.e., the monoid defined by the k-abelian equivalence relation possesses the compactness property.
Concerning infinite words, we mainly consider the (k-)abelian complexity function. We complete a classification of the asymptotic abelian complexities of pure morphic binary words. In other words, given a morphism which has an infinite binary fixed point, the limit superior asymptotic abelian complexity of the fixed point can be computed (in principle). We also give a new proof of the fact that the k-abelian complexity of a Sturmian word is n + 1 for length n < 2k. In fact, we consider several aspects of the k-abelian equivalence relation in Sturmian words using a dynamical interpretation of these words. We reprove the fact that any Sturmian word contains arbitrarily large k-abelian repetitions. The methods used allow us to analyze the situation in more detail, and this leads us to define the so-called k-abelian critical exponent, which measures the ratio of the exponent and the length of the root of a k-abelian repetition. This notion is connected to a deep number theoretic object called the Lagrange spectrum.
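The k-abelian complexity mentioned above can be explored numerically on a finite prefix of a Sturmian word. The sketch below uses the Fibonacci word as the example and counts the k-abelian classes among the length-n factors of a prefix. The helper names (`fibonacci_prefix`, `signature`, `k_abelian_complexity`) and the prefix length 500 are assumptions made for illustration. The comparison value min(n + 1, 2k) combines the n + 1 behaviour for short lengths with the value 2k for longer lengths known from the literature on Sturmian words. A finite prefix only approximates the infinite word, so this is an experiment rather than a proof.

```python
from collections import Counter


def fibonacci_prefix(n):
    """Return the first n letters of the Fibonacci word 0100101001001..."""
    prev, cur = "0", "01"
    while len(cur) < n:
        prev, cur = cur, cur + prev
    return cur[:n]


def signature(word, k):
    """Hashable summary of the counts of all factors of length 1..k."""
    counts = Counter(
        word[i:i + length]
        for length in range(1, k + 1)
        for i in range(len(word) - length + 1)
    )
    return frozenset(counts.items())


def k_abelian_complexity(text, k, n):
    """Number of k-abelian classes among the length-n factors of text."""
    return len({signature(text[i:i + n], k) for i in range(len(text) - n + 1)})


if __name__ == "__main__":
    w = fibonacci_prefix(500)  # assumed long enough for the short lengths below
    for k in (1, 2, 3):
        values = [k_abelian_complexity(w, k, n) for n in range(1, 9)]
        print(k, values)
```

Two factors land in the same class exactly when their signatures coincide, so the size of the set of signatures is the complexity value for that length.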
The Stochastic complexity of spin models: Are pairwise models really simple?
Models can be simple for different reasons: because they yield a simple and
computationally efficient interpretation of a generic dataset (e.g. in terms of
pairwise dependences) - as in statistical learning - or because they capture
the essential ingredients of a specific phenomenon - as e.g. in physics -
leading to non-trivial falsifiable predictions. In information theory and
Bayesian inference, the simplicity of a model is precisely quantified in the
stochastic complexity, which measures the number of bits needed to encode its
parameters. In order to understand what simple models look like, we study the
stochastic complexity of spin models with interactions of arbitrary order. We
highlight the existence of invariances with respect to bijections within the
space of operators, which allow us to partition the space of all models into
equivalence classes, in which models share the same complexity. We thus found
that the complexity (or simplicity) of a model is not determined by the order
of the interactions, but rather by their mutual arrangements. Models where
statistical dependencies are localized on non-overlapping groups of few
variables (and that afford predictions on independencies that are easy to
falsify) are simple. On the contrary, fully connected pairwise models, which
are often used in statistical learning, appear to be highly complex, because of
their extended set of interactions.
Experimental Study of Concise Representations of Concepts and Dependencies
In this paper we are interested in studying concise representations of
concepts and dependencies, i.e., implications and association rules. Such
representations are based on equivalence classes and their elements, i.e.,
minimal generators, minimum generators including keys and passkeys, proper
premises, and pseudo-intents. All these sets of attributes are significant and
well studied from the computational point of view, while their statistical
properties remain to be studied. The purpose of this paper is to study
these singular attribute sets and, in parallel, how to evaluate the
complexity of a dataset from an FCA point of view. In the paper we analyze the
empirical distributions and the sizes of these particular attribute sets. In
addition we propose several measures of data complexity, such as
distributivity, linearity, size of concepts, size of minimum generators, for
the analysis of real-world and synthetic datasets.
Document Type Definition (DTD) Metrics
In this paper, we present two complexity metrics for assessing the quality of schemas written in the Document Type Definition (DTD) language. Both the "Entropy (E) metric: E(DTD)" and the "Distinct Structured Element Repetition Scale (DSERS) metric: DSERS(DTD)" are intended to measure the structural complexity of schemas in the DTD language. These metrics exploit a directed graph representation of the schema document and consider the complexity of the schema due to its similarly structured elements and the occurrences of these elements. The empirical and theoretical validations of these metrics prove the robustness of the metrics.
Entropy as a Measure of Quality of XML Schema Document
In this paper, a metric for the assessment of the structural complexity of eXtensible Markup Language schema documents is formulated. The present metric, 'Schema Entropy' (SE), is based on the concept of entropy and is intended to measure the complexity of schema documents written in the W3C XML Schema Language due to diversity in the structures of their elements. The SE is useful in evaluating the efficiency of the design of schemas. A good design reduces maintenance effort. Therefore, our metric provides valuable information about the reliability and maintainability of systems. In this respect, this metric is believed to be a valuable contribution for improving the quality of XML-based systems. It is demonstrated with examples and validated empirically through actual test cases.
Benchmarks for Parity Games (extended version)
We propose a benchmark suite for parity games that includes all benchmarks
that have been used in the literature, and make it available online. We give an
overview of the parity games, including a description of how they have been
generated. We also describe structural properties of parity games, and using
these properties we show that our benchmarks are representative. With this work
we provide a starting point for further experimentation with parity games.
Comment: The corresponding tool and benchmarks are available from
https://github.com/jkeiren/paritygame-generator. This is an extended version
of the paper that has been accepted for FSEN 201