280 Birds with One Stone: Inducing Multilingual Taxonomies from Wikipedia using Character-level Classification
We propose a simple, yet effective, approach towards inducing multilingual
taxonomies from Wikipedia. Given an English taxonomy, our approach leverages
the interlanguage links of Wikipedia followed by character-level classifiers to
induce high-precision, high-coverage taxonomies in other languages. Through
experiments, we demonstrate that our approach significantly outperforms the
state-of-the-art, heuristics-heavy approaches for six languages. As a
consequence of our work, we release presumably the largest and the most
accurate multilingual taxonomic resource spanning over 280 languages.
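The first step described above, carrying an English taxonomy into another language through interlanguage links, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names `project_edges` and `char_ngrams`, the toy titles, and the link dictionary are all hypothetical. The abstract does not say what the character-level classifiers take as input; character n-grams are shown only as one common representation for such models.

```python
def project_edges(english_edges, interlanguage_links):
    """Keep each (child, parent) edge whose two ends both have a link,
    and rewrite it with the target-language titles.

    english_edges: list of (child_title, parent_title) pairs (toy data here).
    interlanguage_links: dict from English title to target-language title.
    """
    projected = []
    for child, parent in english_edges:
        if child in interlanguage_links and parent in interlanguage_links:
            projected.append(
                (interlanguage_links[child], interlanguage_links[parent])
            )
    return projected


def char_ngrams(text, n=3):
    """Split a title into overlapping character n-grams, with boundary
    markers, as a possible feature set for a character-level classifier."""
    padded = "^" + text + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


# Toy example (hypothetical titles and links, not taken from the paper).
edges_en = [("Dog", "Mammal"), ("Cat", "Mammal")]
links_en_fr = {"Dog": "Chien", "Mammal": "Mammifère"}
edges_fr = project_edges(edges_en, links_en_fr)
features = char_ngrams("cat")
```

In this toy run the edge for "Cat" is dropped because it has no link, which is the coverage gap that a second, classifier-based step would have to close.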
Limits of Preprocessing
We present a first theoretical analysis of the power of polynomial-time
preprocessing for important combinatorial problems from various areas in AI. We
consider problems from Constraint Satisfaction, Global Constraints,
Satisfiability, Nonmonotonic and Bayesian Reasoning. We show that, subject to a
complexity theoretic assumption, none of the considered problems can be reduced
by polynomial-time preprocessing to a problem kernel whose size is polynomial
in a structural problem parameter of the input, such as induced width or
backdoor size. Our results provide a firm theoretical boundary for the
performance of polynomial-time preprocessing algorithms for the considered
problems.
Comment: This is a slightly longer version of a paper that appeared in the
proceedings of AAAI 201
Taxonomy Induction using Hypernym Subsequences
We propose a novel, semi-supervised approach towards domain taxonomy
induction from an input vocabulary of seed terms. Unlike all previous
approaches, which typically extract direct hypernym edges for terms, our
approach utilizes a novel probabilistic framework to extract hypernym
subsequences. Taxonomy induction from extracted subsequences is cast as an
instance of the minimum-cost flow problem on a carefully designed directed
graph. Through experiments, we demonstrate that our approach outperforms
state-of-the-art taxonomy induction approaches across four languages.
Importantly, we also show that our approach is robust to the presence of noise
in the input vocabulary. To the best of our knowledge, no previous approach
has been shown empirically to be robust to noise in the input
vocabulary.
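For readers unfamiliar with the minimum-cost flow problem mentioned above, the following is a small, generic solver based on successive shortest paths with Bellman-Ford. It is a sketch of the optimization problem only: the abstract does not describe how the paper's graph is built, so the node numbering, capacities, and costs in the toy example are assumptions made purely for illustration, and `min_cost_flow` is a hypothetical helper name.

```python
def min_cost_flow(num_nodes, edges, source, sink, demand):
    """Send up to `demand` units from source to sink at minimum total cost.

    edges: list of (u, v, capacity, cost) with u != v and non-negative costs.
    Returns (flow_sent, total_cost).
    """
    # Residual graph: each entry is [target, residual_cap, cost, reverse_index].
    graph = [[] for _ in range(num_nodes)]
    for u, v, cap, cost in edges:
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    inf = float("inf")
    flow, total_cost = 0, 0
    while flow < demand:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        dist = [inf] * num_nodes
        dist[source] = 0
        prev = [None] * num_nodes
        for _ in range(num_nodes - 1):
            updated = False
            for u in range(num_nodes):
                if dist[u] == inf:
                    continue
                for i, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + cost < dist[v]:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, i)
                        updated = True
            if not updated:
                break
        if dist[sink] == inf:
            break  # no more augmenting paths

        # Find the bottleneck capacity along the path, then push flow.
        push = demand - flow
        v = sink
        while v != source:
            u, i = prev[v]
            push = min(push, graph[u][i][1])
            v = u
        v = sink
        while v != source:
            u, i = prev[v]
            graph[u][i][1] -= push
            graph[v][graph[u][i][3]][1] += push
            v = u
        flow += push
        total_cost += push * dist[sink]
    return flow, total_cost


# Toy graph (assumed for illustration): node 0 is the source, node 3 the sink.
toy_edges = [(0, 1, 1, 1), (0, 2, 1, 2), (1, 3, 1, 1), (2, 3, 1, 1)]
result_two_units = min_cost_flow(4, toy_edges, 0, 3, 2)
result_one_unit = min_cost_flow(4, toy_edges, 0, 3, 1)
```

With one unit of demand the solver uses only the cheaper path through node 1; with two units it must also use the more expensive path through node 2.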
Guarantees and Limits of Preprocessing in Constraint Satisfaction and Reasoning
We present a first theoretical analysis of the power of polynomial-time
preprocessing for important combinatorial problems from various areas in AI. We
consider problems from Constraint Satisfaction, Global Constraints,
Satisfiability, Nonmonotonic and Bayesian Reasoning under structural
restrictions. All these problems involve two tasks: (i) identifying the
structure in the input as required by the restriction, and (ii) using the
identified structure to solve the reasoning task efficiently. We show that for
most of the considered problems, task (i) admits a polynomial-time
preprocessing to a problem kernel whose size is polynomial in a structural
problem parameter of the input, in contrast to task (ii) which does not admit
such a reduction to a problem kernel of polynomial size, subject to a
complexity theoretic assumption. As a notable exception we show that the
consistency problem for the AtMost-NValue constraint admits a polynomial kernel
consisting of a quadratic number of variables and domain values. Our results
provide firm worst-case guarantees and theoretical boundaries for the
performance of polynomial-time preprocessing algorithms for the considered
problems.
Comment: arXiv admin note: substantial text overlap with arXiv:1104.2541,
arXiv:1104.556
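To make the consistency problem for AtMost-NValue concrete: the constraint asks that the variables take at most a given number of distinct values, so it is consistent exactly when some set of at most that many values intersects every variable's domain. The brute-force check below is a sketch of the problem statement only, not the kernelization from the paper; it treats the bound as a fixed integer, runs in exponential time, and the name `atmost_nvalue_consistent` is hypothetical.

```python
from itertools import combinations


def atmost_nvalue_consistent(domains, n_max):
    """Return True if every variable can be assigned a value from its domain
    such that at most n_max distinct values are used overall.

    domains: list of sets, one per variable.
    """
    if not domains:
        return True
    if any(len(d) == 0 for d in domains):
        return False
    values = sorted(set().union(*domains))
    # Try every candidate set of at most n_max values; the assignment exists
    # iff some candidate set hits (intersects) every domain.
    for k in range(1, min(n_max, len(values)) + 1):
        for chosen in combinations(values, k):
            chosen_set = set(chosen)
            if all(chosen_set & d for d in domains):
                return True
    return False


# Toy instance (assumed for illustration).
toy_domains = [{1, 2}, {2, 3}, {3, 4}]
ok_with_two = atmost_nvalue_consistent(toy_domains, 2)   # values {2, 3} suffice
ok_with_one = atmost_nvalue_consistent(toy_domains, 1)   # no common value
```

A kernelization in the sense of the abstract would first shrink such an instance in polynomial time before any exponential search like this one is run.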
Improvement and Integration of Counting-Based Search Heuristics in Constraint Programming
This thesis is concerned with constraint programming, a paradigm for solving combinatorial problems. For most problems, a solution cannot be found using logical inference mechanisms alone; the solution space must be explored with the help of search heuristics. Among the many existing heuristics, counting-based branching heuristics are the focus of this thesis. This approach relies on algorithms that estimate the number of solutions of the individual constraints of a constraint satisfaction problem. Our contribution consists mainly of improving two counting algorithms, for the alldifferent and spanningTree constraints; these constraints can express many satisfaction problems and are therefore essential to our branching heuristics.
Our work also takes the form of a contribution to an open-source constraint programming solver. The whole thesis is thus motivated by this practical consideration: our algorithms must be accessible and efficient. Finally, we explore two techniques applicable to all of our heuristics: one that reuses computations made earlier in the search tree, and a way of learning new branching heuristics for a problem.
ABSTRACT: This thesis concerns constraint programming, a paradigm for solving combinatorial problems. The focus is on the mechanism involved in making hypotheses and exploring the solution space towards satisfying solutions: search heuristics. Of interest to us is a specific family called counting-based search, an approach that uses algorithms to estimate the number of
solutions of individual constraints in constraint satisfaction problems to guide search. The improvements of two existing counting algorithms and the integration of counting-based search in a constraint programming solver are the two main contributions of this thesis. The first counting algorithm concerns the alldifferent constraint; the second one, the spanningTree constraint. Both constraints are useful for expressing many constraint satisfaction
problems and thus are essential for counting-based search.
Practical matters are also central to this work; we integrated counting-based search in an open-source constraint programming solver called Gecode. In doing so, we bring this family of search heuristics to a wider audience; everything in this thesis is built upon this contribution.
Lastly, we also look at more general improvements to counting-based search with a method for trading computation time for accuracy, and a method for learning new counting-based search heuristics from past experiments.
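The idea of counting-based search described above can be sketched on a tiny alldifferent constraint. The code below counts solutions by plain enumeration and derives, for each variable-value pair, the fraction of solutions that use it (its solution density). This is an illustration under stated assumptions: the thesis improves dedicated counting algorithms that are not reproduced here, the function names are hypothetical, and branching on the highest density is one rule found in the counting-based search literature, not necessarily the exact rule the thesis uses.

```python
from itertools import product


def count_alldifferent(domains):
    """Count assignments in which all variables take pairwise distinct
    values, by exhaustive enumeration (exponential; for tiny examples only)."""
    return sum(1 for t in product(*domains) if len(set(t)) == len(t))


def solution_densities(domains):
    """Map (variable_index, value) to the fraction of solutions using it."""
    total = count_alldifferent(domains)
    densities = {}
    if total == 0:
        return densities
    for i, dom in enumerate(domains):
        for v in dom:
            restricted = list(domains)
            restricted[i] = [v]  # fix variable i to value v
            densities[(i, v)] = count_alldifferent(restricted) / total
    return densities


def max_density_branch(domains):
    """Pick an unassigned (variable, value) pair with the highest density."""
    densities = solution_densities(domains)
    candidates = {
        key: d for key, d in densities.items() if len(domains[key[0]]) > 1
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Toy instance (assumed for illustration): three variables.
toy = [[1, 2], [2, 3], [1, 2, 3]]
n_solutions = count_alldifferent(toy)
dens = solution_densities(toy)
branch = max_density_branch(toy)
```

On this instance the constraint has three solutions, and assigning 1 to the first variable or 3 to the second each appears in two of them, so either is a highest-density branching choice.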