61 research outputs found

Complexity of Equivalence and Learning for Multiplicity Tree Automata

Author: A. Beimel
A. Habrard
A.R. Klivans
D. Angluin
E. Allender
H. Seidl
S. Bozapalidis
Publication date: 01/01/2014

We consider the complexity of equivalence and learning for multiplicity tree automata, i.e., weighted tree automata over a field. We first show that the equivalence problem is logspace equivalent to polynomial identity testing, the complexity of which is a longstanding open problem. Secondly, we derive lower bounds on the number of queries needed to learn multiplicity tree automata in Angluin's exact learning model, over both arbitrary and fixed fields. Habrard and Oncina (2006) give an exact learning algorithm for multiplicity tree automata, in which the number of queries is proportional to the size of the target automaton and the size of a largest counterexample, represented as a tree, that is returned by the Teacher. However, the smallest tree-counterexample may be exponential in the size of the target automaton. Thus the above algorithm does not run in time polynomial in the size of the target automaton, and has query complexity exponential in the lower bound. Assuming a Teacher that returns minimal DAG representations of counterexamples, we give a new exact learning algorithm whose query complexity is quadratic in the target automaton size, almost matching the lower bound, and improving the best previously known algorithm by an exponential factor.

arXiv.org e-Print Archive

CiteSeerX

Crossref

Oxford University Research Archive

Learning Stochastic Majority Votes by Minimizing a PAC-Bayes Generalization Bound

Author: Emonet R
Germain P
Guedj B
Habrard A
Morvant E
Viallard P
Zantedeschi V
Publication venue: Advances in Neural Information Processing Systems (NeurIPS 2021)
Publication date: 01/01/2021

We investigate a stochastic counterpart of majority votes over finite ensembles of classifiers, and study its generalization properties. While our approach holds for arbitrary distributions, we instantiate it with Dirichlet distributions: this allows for a closed-form and differentiable expression for the expected risk, which then turns the generalization bound into a tractable training objective. The resulting stochastic majority vote learning algorithm achieves state-of-the-art accuracy and benefits from (non-vacuous) tight generalization bounds, in a series of numerical experiments when compared to competing algorithms which also minimize PAC-Bayes objectives, both with uninformed (data-independent) and informed (data-dependent) priors.

UCL Discovery

A distance for partially labeled trees

Author: A. Habrard
B. DasGupta
C.-M. Lee
D. Rizo
D.F. Robinson
G. Valiente
H. Samet
K. Zhang
K.-C. Tai
P. Bille
R.A. Finkel
R.B. Russell
S.M. Selkow
T. Jiang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

In a number of practical situations, data have structure, and the relations among their component parts need to be encoded with suitable data models. Trees are commonly used to represent data for which hierarchical relations can be defined. This is the case in a number of fields such as image analysis, natural language processing, protein structure, or music retrieval, to name a few. In those cases, procedures for comparing trees are very relevant. An approximate tree edit distance algorithm has been introduced for working with trees labeled only at the leaves. In this paper, it has been applied to handwritten character recognition, providing accuracies comparable to those of the most comprehensive search method while being as efficient as the fastest. This work is supported by the Spanish Ministry projects DRIMS (TIN2009-14247-C02) and Consolider Ingenio 2010 (MIPRCV, CSD2007-00018), partially supported by EU ERDF and the Pascal Network of Excellence.

Repositorio Institucional de la Universidad de Alicante

Crossref

Learning probabilistic models of tree edit distance

Author: Amaury Habrard
Laurent Boyer
Marc Bernard
Marc Sebban
Publication date: 01/01/2008

There is a growing interest in machine learning and pattern recognition for tree-structured data. Trees provide a suitable structural representation to deal with complex tasks such as web information extraction, RNA secondary structure prediction, computer music, or conversion of semi-structured data (e.g. XML documents). Many applications in these domains require the calculation of similarities over pairs of trees. In this context, the tree edit distance (ED) has been the subject of investigation for many years in order to improve its computational efficiency. However, used in its classical form, the tree ED needs a priori fixed edit costs, which are often difficult to tune and which leave little room for tackling complex problems. In this paper, to overcome this drawback, we focus on the automatic learning of a nonparametric stochastic tree ED. More precisely, we are interested in two kinds of probabilistic approaches. The first one builds a generative model of the tree ED from a joint distribution over the edit operations, while the second works from a conditional distribution, thus providing a discriminative model. To tackl

CiteSeerX

HAL-UJM

HAL AMU

Learning Rational Stochastic Tree Languages

Author: A. Habrard
C.S. Wetherell
F. Denis
G. Hardy
G. Lugosi
J. Berstel
J. Rico-Juan
N. Abe
R. Carrasco
V. Vapnik
Z. Ésik
Publication date: 01/01/2007

We consider the problem of learning stochastic tree languages, i.e. probability distributions over a set of trees T(F), from a sample of trees independently drawn according to an unknown target P. We consider the case where the target is a rational stochastic tree language, i.e. it can be computed by a rational tree series or, equivalently, by a multiplicity tree automaton. In this paper, we provide two contributions. First, we show that rational tree series admit a canonical representation with parameters that can be efficiently estimated from samples. Then, we give an inference algorithm that identifies the class of rational stochastic tree languages in the limit with probability one.

CiteSeerX

Crossref

HAL AMU

New partially labelled tree similarity measure: a case study

Author: A. Habrard
D. Rizo
K. Zhang
K.-C. Tai
P. Bille
R.A. Finkel
R.B. Russell
S.M. Selkow
T. Jiang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

Trees are a powerful data structure for representing data for which hierarchical relations can be defined. They have been applied in a number of fields such as image analysis, natural language processing, protein structure, or music retrieval, to name a few. Procedures for comparing trees are very relevant in many tasks where tree representations are involved. The computation of these measures is usually a time-consuming task, and different authors have proposed algorithms that are able to compute them in a reasonable time through approximated versions of the similarity measure. Other methods require that the trees are fully labelled for the distance to be computed. In this paper, a new measure is presented that is able to deal with trees labelled only at the leaves and that runs in O(|TA|×|TB|) time. Experiments and comparative results are provided. This work was funded by the Spanish DRIMS project (TIN2009-14247-C02) and the research programme Consolider Ingenio 2010 (MIPRCV, CSD2007-00018).

Repositorio Institucional de la Universidad de Alicante

CiteSeerX

Crossref

Improvement of the State Merging Rule on Noisy Data in Probabilistic Grammatical Inference

Author: A. Habrard
A. Reber
C. Kermorvant
R. Carrasco
R. Lyngsø
V. Honavar
W. Hoeffding
Publication date: 01/01/2003

In this paper we study the influence of noise in probabilistic grammatical inference. We paradoxically bring out the idea that specialized automata deal better with noisy data than more general ones. We then propose to replace the statistical test of the Alergia algorithm with a more restrictive merging rule based on a test of proportion comparison.

CiteSeerX

Crossref