18 research outputs found
Complexity of Equivalence and Learning for Multiplicity Tree Automata
We consider the complexity of equivalence and learning for multiplicity tree
automata, i.e., weighted tree automata over a field. We first show that the
equivalence problem is logspace equivalent to polynomial identity testing, the
complexity of which is a longstanding open problem. Secondly, we derive lower
bounds on the number of queries needed to learn multiplicity tree automata in
Angluin's exact learning model, over both arbitrary and fixed fields.
Habrard and Oncina (2006) give an exact learning algorithm for multiplicity
tree automata, in which the number of queries is proportional to the size of
the target automaton and the size of a largest counterexample, represented as a
tree, that is returned by the Teacher. However, the smallest
tree-counterexample may be exponential in the size of the target automaton.
Thus the above algorithm does not run in time polynomial in the size of the
target automaton, and has query complexity exponential in the lower bound.
Assuming a Teacher that returns minimal DAG representations of
counterexamples, we give a new exact learning algorithm whose query complexity
is quadratic in the target automaton size, almost matching the lower bound, and
improving the best previously-known algorithm by an exponential factor.
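For intuition, the word-automaton analogue of this equivalence problem is decidable in polynomial time by a forward-space computation (the tree case is where polynomial identity testing enters). A minimal sketch over the rationals, with a representation of our own choosing (initial row vector, one transition matrix per letter, final column vector):

```python
from fractions import Fraction as F

def reduce_against(basis, v):
    # Reduce v against the echelon-form basis rows; return the residual.
    v = list(v)
    for b in basis:
        p = next(i for i, x in enumerate(b) if x != 0)   # pivot column of b
        if v[p] != 0:
            c = v[p] / b[p]
            v = [vi - c * bi for vi, bi in zip(v, b)]
    return v

def equivalent(init1, trans1, fin1, init2, trans2, fin2):
    """Decide whether two weighted word automata over the rationals define
    the same series (Tzeng-style forward-space algorithm): explore a basis
    of the reachable row vectors of the difference automaton and check that
    every one of them is orthogonal to the final vector."""
    n1, n2 = len(init1), len(init2)
    init = [F(x) for x in init1] + [-F(x) for x in init2]
    fin = list(fin1) + list(fin2)

    def step(v, a):                      # v -> v * diag(M1[a], M2[a])
        m1, m2 = trans1[a], trans2[a]
        left = [sum(v[i] * m1[i][j] for i in range(n1)) for j in range(n1)]
        right = [sum(v[n1 + i] * m2[i][j] for i in range(n2)) for j in range(n2)]
        return left + right

    basis, work = [], [init]
    while work:
        v = work.pop()
        if sum(vi * fi for vi, fi in zip(v, fin)) != 0:
            return False                 # v witnesses a word with differing values
        r = reduce_against(basis, v)
        if any(x != 0 for x in r):
            basis.append(r)
            work += [step(v, a) for a in trans1]
    return True
```

The loop terminates because the basis can grow at most n1 + n2 times, which also bounds the number of words that need to be explored.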
Learning Multiplicity Tree Automata
In this paper, we present a theoretical approach to the problem of learning multiplicity tree automata. These automata allow one to define functions that compute a number for each tree. They can be seen as a strict generalization of stochastic tree automata, since they allow one to define functions over any field K. A multiplicity automaton admits a support, which is a non-deterministic automaton. From a grammatical inference point of view, this paper presents an original contribution due to the combination of two important aspects. This is the first time, as far as we know, that a learning method focuses on non-deterministic tree automata that compute functions over a field. The algorithm proposed in this paper stands in Angluin's exact model, where a learner is allowed to use membership and equivalence queries. We show that this algorithm runs in time polynomial in the size of the representation.
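For concreteness, here is a minimal sketch (representation and names are ours, not the paper's) of how a multiplicity tree automaton computes a number for each tree, by a bottom-up sum-of-products evaluation:

```python
def mu(tree, delta):
    """Bottom-up state vector of a multiplicity tree automaton.
    tree = (symbol, [children]); delta[symbol][q] maps each tuple of child
    states to a weight, so mu(t)[q] sums, over all transition choices, the
    transition weight times the product of the children's coordinates."""
    sym, kids = tree
    vecs = [mu(k, delta) for k in kids]
    out = []
    for table in delta[sym]:             # one weight table per state q
        total = 0
        for child_states, w in table.items():
            for vec, q in zip(vecs, child_states):
                w = w * vec[q]
            total += w
        out.append(total)
    return out

def value(tree, delta, final):
    # The series maps t to the inner product of mu(t) with the final weights.
    return sum(x * f for x, f in zip(mu(tree, delta), final))
```

For instance, with two states (count, one), the series "number of leaves" over a binary symbol f and a leaf a is given by delta = {'a': [{(): 1}, {(): 1}], 'f': [{(0, 1): 1, (1, 0): 1}, {(1, 1): 1}]} with final = [1, 0].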
On the exact learnability of graph parameters: The case of partition functions
We study the exact learnability of real-valued graph parameters f which are
known to be representable as partition functions which count the number of
weighted homomorphisms into a graph H with vertex weights and edge
weights. M. Freedman, L. Lovász and A. Schrijver have given a
characterization of these graph parameters in terms of the k-connection
matrices of f. Our model of learnability is based on D. Angluin's
model of exact learning using membership and equivalence queries. Given such a
graph parameter f, the learner can ask for the values of f for graphs of
their choice, and they can formulate hypotheses in terms of the connection
matrices of f. The teacher can accept the hypothesis as correct, or
provide a counterexample consisting of a graph. Our main result shows that in
this scenario, a very large class of partition functions, the rigid partition
functions, can be learned in time polynomial in the size of H and the size of
the largest counterexample, in the Blum-Shub-Smale model of computation over the
reals with unit cost.
Comment: 14 pages, full version of the MFCS 2016 conference paper.
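A brute-force evaluation of such a partition function (representation ours; the learning algorithm itself is far more involved) makes the quantity being learned concrete:

```python
from itertools import product

def partition_function(n_vertices, edges, alpha, beta):
    """Partition function Z(G): sum over all maps phi from V(G) into the
    weighted target graph H of the product of the vertex weights
    alpha[phi(v)] and the edge weights beta[phi(u)][phi(v)]."""
    total = 0
    for phi in product(range(len(alpha)), repeat=n_vertices):
        w = 1
        for q in phi:                    # vertex weights
            w *= alpha[q]
        for u, v in edges:               # edge weights
            w *= beta[phi[u]][phi[v]]
        total += w
    return total
```

With alpha = [1, 1] and beta = [[0, 1], [1, 0]] (the target graph is an unweighted edge), Z(G) counts the proper 2-colourings of G.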
Minimisation of Multiplicity Tree Automata
We consider the problem of minimising the number of states in a multiplicity
tree automaton over the field of rational numbers. We give a minimisation
algorithm that runs in polynomial time assuming unit-cost arithmetic. We also
show that a polynomial bound in the standard Turing model would require a
breakthrough in the complexity of polynomial identity testing by proving that
the latter problem is logspace equivalent to the decision version of
minimisation. The developed techniques also improve the state of the art in
multiplicity word automata: we give an NC algorithm for minimising multiplicity
word automata. Finally, we consider the minimal consistency problem: does there
exist an automaton with a given number of states that is consistent with a
given finite sample of weight-labelled words or trees? We show that this
decision problem is complete for the existential theory of the rationals, both
for words and for trees of a fixed alphabet rank.
Comment: Paper to be published in Logical Methods in Computer Science. Minor
editing changes from the previous version.
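The link between minimal state counts and matrix rank can be made concrete: by Fliess' theorem, the minimal dimension of a multiplicity word automaton for a series equals the rank of its Hankel matrix. A small sketch over a finite truncation (names and representation ours):

```python
from fractions import Fraction
from itertools import product

def rank(mat):
    # Rank over the rationals by Gaussian elimination.
    mat = [[Fraction(x) for x in row] for row in mat]
    r = 0
    for c in range(len(mat[0]) if mat else 0):
        piv = next((i for i in range(r, len(mat)) if mat[i][c] != 0), None)
        if piv is None:
            continue
        mat[r], mat[piv] = mat[piv], mat[r]
        for i in range(r + 1, len(mat)):
            f = mat[i][c] / mat[r][c]
            mat[i] = [x - f * y for x, y in zip(mat[i], mat[r])]
        r += 1
    return r

def hankel_rank(series, alphabet, max_len):
    """Rank of the finite Hankel block H[u][v] = series(u + v) over all words
    of length at most max_len; the rank of the full (infinite) Hankel matrix
    equals the minimal number of states of a multiplicity word automaton
    recognising the series, so this gives a lower bound on it."""
    ws = [''.join(p) for n in range(max_len + 1)
          for p in product(alphabet, repeat=n)]
    return rank([[series(u + v) for v in ws] for u in ws])
```

For example, the series counting occurrences of 'a' has Hankel rank 2, matching its minimal two-state automaton.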
Learning of Structurally Unambiguous Probabilistic Grammars
The problem of identifying a probabilistic context-free grammar has two
aspects: the first is determining the grammar's topology (the rules of the
grammar) and the second is estimating probabilistic weights for each rule.
Given the hardness results for learning context-free grammars in general, and
probabilistic grammars in particular, most of the literature has concentrated
on the second problem. In this work we address the first problem. We restrict
attention to structurally unambiguous weighted context-free grammars (SUWCFG)
and provide a query learning algorithm for structurally unambiguous
probabilistic context-free grammars (SUPCFG). We show that SUWCFG can be
represented using co-linear multiplicity tree automata (CMTA), and provide a
polynomial learning algorithm that learns CMTAs. We show that the learned CMTA
can be converted into a probabilistic grammar, thus providing a complete
algorithm for learning a structurally unambiguous probabilistic context-free
grammar (both the grammar topology and the probabilistic weights) using
structured membership queries and structured equivalence queries. We
demonstrate the usefulness of our algorithm in learning PCFGs over genomic
data.
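As a small illustration of the weight-estimation side (example grammar and representation are ours), the probability a PCFG assigns to a single derivation tree is the product of the probabilities of the rules applied at its internal nodes:

```python
from fractions import Fraction as F

def tree_probability(tree, rules):
    """Probability of a derivation tree under a PCFG.
    tree = (nonterminal, [children]); a terminal leaf is a plain string;
    rules maps (nonterminal, right-hand side) to a rule probability."""
    if isinstance(tree, str):
        return F(1)                      # terminals contribute no rule choice
    nt, kids = tree
    rhs = tuple(k if isinstance(k, str) else k[0] for k in kids)
    p = rules[(nt, rhs)]
    for k in kids:
        p = p * tree_probability(k, rules)
    return p
```

For the toy grammar S -> a S b (prob. 2/5) | c (prob. 3/5), the derivation of "acb" has probability (2/5)(3/5) = 6/25.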
Approximate Learning of Limit-Average Automata
Limit-average automata are weighted automata on infinite words that use the average to aggregate the weights seen in infinite runs. We study approximate learning problems for limit-average automata in two settings: passive and active. In the passive learning case, we show that limit-average automata are not PAC-learnable, as samples must be of exponential size to provide (with good probability) enough detail to learn an automaton. We also show that the problem of finding an automaton that fits a given sample is NP-complete. In the active learning case, we show that limit-average automata can be learned almost-exactly, i.e., we can learn in polynomial time an automaton that is consistent with the target automaton on almost all words. On the other hand, we show that the problem of learning an automaton that approximates the target automaton (with perhaps fewer states) is NP-complete. The above results are shown for the uniform distribution on words. We briefly discuss learning over different distributions.
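To make the aggregation concrete (representation ours), the limit-average value of an ultimately periodic word in a deterministic weighted automaton can be computed exactly: any finite prefix is amortised away, and only the mean weight around the state loop the cycle induces remains:

```python
from fractions import Fraction as F

def run(delta, q, word):
    # Follow the deterministic transitions, accumulating the weight.
    total = 0
    for a in word:
        q, w = delta[(q, a)]
        total += w
    return q, total

def limit_average(delta, q0, prefix, cycle):
    """Limit-average value of the ultimately periodic word prefix.cycle^omega:
    pump the cycle until a state repeats, then average the weights read
    around the resulting loop of whole cycles."""
    q, _ = run(delta, q0, prefix)        # prefix weights do not affect the limit
    seen, sums = {q: 0}, [0]
    while True:
        q, w = run(delta, q, cycle)
        sums.append(sums[-1] + w)
        if q in seen:                    # closed a loop of whole cycles
            i = seen[q]
            return F(sums[-1] - sums[i], (len(sums) - 1 - i) * len(cycle))
        seen[q] = len(sums) - 1
```

For instance, an automaton alternating between weights 1 and 3 on the letter 'a' assigns the word a^omega the value 2.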
Series, Weighted Automata, Probabilistic Automata and Probability Distributions for Unranked Trees.
We study tree series and weighted tree automata over unranked trees. The message is that recognizable tree series for unranked trees can be defined and studied via recognizable tree series for binary representations of unranked trees. To this end, we adapt results of Denis et al. (2007) as follows. We extend hedge automata -- a class of tree automata for unranked trees -- to weighted hedge automata. We define weighted stepwise automata as weighted tree automata for binary representations of unranked trees. We show that recognizable tree series can be equivalently defined by weighted hedge automata or weighted stepwise automata. Then we consider real-valued tree series and weighted tree automata over the field of real numbers. We show that the result also holds for probabilistic automata -- weighted automata with normalisation conditions on rules. We also define convergent tree series and show that convergence properties for recognizable tree series are preserved by the binary encoding. Following Etessami and Yannakakis (2009), we present decidability results on probabilistic tree automata and algorithms for computing sums of convergent series. Last, we show that streaming algorithms for unranked trees can be seen as slight transformations of algorithms on the binary representations.
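The binary representation underlying stepwise automata can be sketched as the usual currying encoding (names ours): each node label becomes a constant, and children are attached one at a time with a binary application symbol:

```python
def curry(tree):
    """Binary ('stepwise') encoding of an unranked tree: f(t1, ..., tn)
    becomes @(... @(f, curry(t1)) ..., curry(tn)), where '@' is a binary
    application symbol and every original label becomes a 0-ary constant."""
    sym, kids = tree
    enc = sym
    for k in kids:
        enc = ('@', enc, curry(k))
    return enc
```

For example, the unranked tree f(a, b) encodes as @(@(f, a), b), so automata over a fixed binary signature suffice.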
Tree Automata as Algebras: Minimisation and Determinisation
We study a categorical generalisation of tree automata, as algebras for a fixed endofunctor endowed with initial and final states. Under mild assumptions about the base category, we present a general minimisation algorithm for these automata. We then build upon and extend an existing generalisation of the Nerode equivalence to a categorical setting and relate it to the existence of minimal automata. Finally, we show that generalised types of side-effects, such as non-determinism, can be captured by this categorical framework, leading to a general determinisation procedure.
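The automata-as-algebras view can be illustrated in miniature (example ours): interpreting each symbol as a function on states makes running a deterministic bottom-up tree automaton an ordinary fold over the tree:

```python
def fold(algebra, tree):
    """Run a deterministic bottom-up tree automaton seen as an algebra:
    each symbol of rank k is interpreted as a k-ary function on states,
    and the automaton's run on a tree is the fold of that interpretation."""
    sym, kids = tree
    return algebra[sym](*(fold(algebra, k) for k in kids))
```

For instance, the algebra {'a': 1, 'b': 0, 'f': xor} computes the parity of 'a'-leaves; acceptance is then membership of the folded state in a set of final states.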
MAT learners for recognizable tree languages and tree series
We review a family of closely related query learning algorithms for unweighted and weighted tree automata, all of which are based on adaptations of the minimal adequate teacher (MAT) model by Angluin. Rather than presenting
new results, the goal is to discuss these algorithms in sufficient detail to make their similarities and differences transparent to the reader interested in grammatical inference of tree automata.
Learning of Structurally Unambiguous Probabilistic Grammars
The problem of identifying a probabilistic context-free grammar has two
aspects: the first is determining the grammar's topology (the rules of the
grammar) and the second is estimating probabilistic weights for each rule.
Given the hardness results for learning context-free grammars in general, and
probabilistic grammars in particular, most of the literature has concentrated
on the second problem. In this work we address the first problem. We restrict
attention to structurally unambiguous weighted context-free grammars (SUWCFG)
and provide a query learning algorithm for structurally unambiguous
probabilistic context-free grammars (SUPCFG). We show that SUWCFG can be
represented using co-linear multiplicity tree automata (CMTA), and
provide a polynomial learning algorithm that learns CMTAs. We show that the
learned CMTA can be converted into a probabilistic grammar, thus providing a
complete algorithm for learning a structurally unambiguous probabilistic
context-free grammar (both the grammar topology and the probabilistic weights)
using structured membership queries and structured equivalence queries. A
summarized version of this work was published at AAAI 21.