Pac-Learning Recursive Logic Programs: Efficient Algorithms
We present algorithms that learn certain classes of function-free recursive
logic programs in polynomial time from equivalence queries. In particular, we
show that a single k-ary recursive constant-depth determinate clause is
learnable. Two-clause programs consisting of one learnable recursive clause and
one constant-depth determinate non-recursive clause are also learnable, if an
additional "basecase" oracle is assumed. These results immediately imply the
pac-learnability of these classes. Although these classes of learnable
recursive programs are very constrained, it is shown in a companion paper that
they are maximally general, in that generalizing either class in any natural
way leads to a computationally difficult learning problem. Thus, taken together
with its companion paper, this paper establishes a boundary of efficient
learnability for recursive logic programs.
The Consistency dimension and distribution-dependent learning from queries
We prove a new combinatorial characterization of polynomial
learnability from equivalence queries, and state some of its
consequences relating the learnability of a class with the
learnability via equivalence and membership queries of its
subclasses obtained by restricting the instance space.
Then we propose and study two models of query learning in which there
is a probability distribution on the instance space, both as an
application of the tools developed from the combinatorial
characterization and as models of independent interest.
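The equivalence-query model referred to in the two abstracts above can be made concrete with a classic textbook example (not an algorithm from either paper): learning a monotone conjunction using equivalence queries alone. All names below are illustrative, and the Teacher is simulated by brute force over the assignment space.

```python
from itertools import product

def make_eq_oracle(target_vars, n):
    """Simulated Teacher for a monotone conjunction over n variables:
    returns None if the hypothesis is correct, else a counterexample."""
    def oracle(hyp_vars):
        for x in product((0, 1), repeat=n):
            target_label = all(x[i] for i in target_vars)
            hyp_label = all(x[i] for i in hyp_vars)
            if target_label != hyp_label:
                return x
        return None
    return oracle

def learn_monotone_conjunction(n, eq):
    hyp = set(range(n))  # most specific hypothesis: AND of all variables
    while (cex := eq(hyp)) is not None:
        # hyp is most specific, so every counterexample is positive for the
        # target; keep only the variables the counterexample sets to 1
        hyp = {i for i in hyp if cex[i] == 1}
    return hyp

oracle = make_eq_oracle({0, 2}, n=4)
learned = learn_monotone_conjunction(4, oracle)
```

Because the hypothesis is always the most specific conjunction consistent with the counterexamples seen so far, each counterexample permanently removes at least one variable, so at most n+1 queries are made.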
Structure identification in relational data
This paper presents several investigations into the prospects for identifying meaningful structures in empirical data, namely, structures permitting effective organization of the data to meet the requirements of future queries. We propose a general framework in which the notion of identifiability is given a precise formal definition similar to that of learnability. Using this framework, we then explore whether a tractable procedure exists for deciding if a given relation is decomposable into a constraint network or a CNF theory with desirable topology and, if the answer is positive, for identifying the desired decomposition. Finally, we address the problem of expressing a given relation as a Horn theory and, if this is impossible, of finding the best k-Horn approximation to it. We show that both problems can be solved in time polynomial in the length of the data.
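The Horn-expressibility question in the abstract above has a well-known semantic characterization (standard in the literature, not specific to this paper): a Boolean relation has a Horn representation exactly when its set of models is closed under componentwise AND. A direct sketch of that test, with illustrative names:

```python
from itertools import combinations

def is_horn_expressible(relation):
    """relation: a set of 0/1 tuples. True iff the relation is closed under
    componentwise AND, i.e. iff some Horn CNF has exactly these models."""
    rel = set(relation)
    for a, b in combinations(rel, 2):
        meet = tuple(x & y for x, y in zip(a, b))
        if meet not in rel:
            return False
    return True

# models of (x1 -> x2): Horn-expressible
horn_yes = is_horn_expressible({(0, 0), (0, 1), (1, 1)})
# models of (x1 XOR x2): not Horn (meet of (0,1) and (1,0) is (0,0), absent)
horn_no = is_horn_expressible({(0, 1), (1, 0)})
```

The check is O(|R|^2 * n); when it fails, the missing meets are exactly the tuples any Horn theory over the relation would be forced to admit.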
Bounds in Query Learning
We introduce new combinatorial quantities for concept classes, and prove
lower and upper bounds for learning complexity in several models of query
learning in terms of various combinatorial quantities. Our approach is flexible
and powerful enough to give new and very short proofs of the
efficient learnability of several prominent examples (e.g. regular languages
and regular ω-languages), in some cases also producing new bounds on the
number of queries. In the setting of equivalence plus membership queries, we
give an algorithm which learns a class in polynomially many queries whenever
any such algorithm exists.
We also study equivalence query learning in a randomized model, producing new
bounds on the expected number of queries required to learn an arbitrary
concept. Many of the techniques and notions of dimension draw inspiration from
or are related to notions from model theory, and these connections are
explained. We also use techniques from query learning to mildly improve a
result of Laskowski regarding compression schemes.
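As a concrete instance of the kind of query bound discussed above (a textbook result, not one of this paper's new bounds), the halving algorithm learns any finite class C with roughly log2 |C| improper equivalence queries: the majority-vote hypothesis forces every counterexample to eliminate at least half of the version space. A minimal sketch with illustrative names:

```python
def halving_learn(concepts, domain, eq):
    """concepts: list of dicts mapping points to {0,1}; eq takes a
    hypothesis dict and returns None or a counterexample point."""
    version_space = list(concepts)
    queries = 0
    while True:
        # majority-vote hypothesis over the surviving consistent concepts
        hyp = {x: int(2 * sum(c[x] for c in version_space) >= len(version_space))
               for x in domain}
        queries += 1
        cex = eq(hyp)
        if cex is None:
            return hyp, queries
        # hyp is wrong at cex, so every concept agreeing with hyp there is
        # also wrong; at least half of the version space is eliminated
        version_space = [c for c in version_space if c[cex] != hyp[cex]]

# toy class: the three singleton concepts over {a, b, c}
domain = ["a", "b", "c"]
concepts = [{x: int(x == y) for x in domain} for y in domain]
target = concepts[1]

def eq(h):
    for x in domain:
        if h[x] != target[x]:
            return x
    return None

hyp, queries = halving_learn(concepts, domain, eq)
```

Note that the majority-vote hypothesis is generally not a member of the class, which is what "improper" means here; proper equivalence query learning can require far more queries, which is one reason combinatorial dimensions of the kind studied above are needed.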
Complexity of Equivalence and Learning for Multiplicity Tree Automata
We consider the complexity of equivalence and learning for multiplicity tree
automata, i.e., weighted tree automata over a field. We first show that the
equivalence problem is logspace equivalent to polynomial identity testing, the
complexity of which is a longstanding open problem. Secondly, we derive lower
bounds on the number of queries needed to learn multiplicity tree automata in
Angluin's exact learning model, over both arbitrary and fixed fields.
Habrard and Oncina (2006) give an exact learning algorithm for multiplicity
tree automata, in which the number of queries is proportional to the size of
the target automaton and the size of a largest counterexample, represented as a
tree, that is returned by the Teacher. However, the smallest
tree-counterexample may be exponential in the size of the target automaton.
Thus the above algorithm does not run in time polynomial in the size of the
target automaton, and has query complexity exponential in the lower bound.
Assuming a Teacher that returns minimal DAG representations of
counterexamples, we give a new exact learning algorithm whose query complexity
is quadratic in the target automaton size, almost matching the lower bound, and
improving on the best previously known algorithm by an exponential factor.
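The logspace equivalence to polynomial identity testing mentioned above means equivalence can be decided with the standard randomized PIT test (the Schwartz-Zippel lemma). The sketch below shows that test in isolation on black-box polynomials, with illustrative names; it is not the paper's reduction.

```python
import random

def probably_equal(p, q, nvars, degree, trials=20, prime=2**61 - 1):
    """p, q: callables on tuples of field elements (ints mod prime).
    One-sided test: a False answer is always correct, while a True answer
    is wrong with probability at most (degree / prime) per trial."""
    for _ in range(trials):
        point = tuple(random.randrange(prime) for _ in range(nvars))
        if p(point) % prime != q(point) % prime:
            return False
    return True

p = lambda v: (v[0] + v[1]) ** 2
q = lambda v: v[0] ** 2 + 2 * v[0] * v[1] + v[1] ** 2   # same polynomial as p
r = lambda v: v[0] ** 2 + v[1] ** 2                     # differs from p by 2xy
```

Derandomizing this test (deciding identity deterministically in polynomial time) is exactly the longstanding open problem the abstract refers to.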
Towards unsupervised ontology learning from data
Data-driven elicitation of ontologies from structured data is a well-recognized knowledge acquisition bottleneck. The development of efficient techniques for (semi-)automating this task is therefore practically vital, yet it is hindered by the lack of robust theoretical foundations. In this paper, we study the problem of learning Description Logic TBoxes from interpretations, which naturally translates to the task of ontology learning from data. In the presented framework, the learner is provided with a set of positive interpretations (i.e., logical models) of the TBox adopted by the teacher. The goal is to correctly identify the TBox given this input. We characterize the key constraints on the models that warrant finite learnability of TBoxes expressed in selected fragments of the Description Logic EL and define corresponding learning algorithms. This work was funded in part by the National Research Foundation under Grant no. 85482.
Learning probability distributions generated by finite-state machines
We review methods for inference of probability distributions generated by probabilistic automata and related models for sequence generation. We focus on methods that can be proved to learn in the inference-in-the-limit and PAC formal models. The methods we review are state-merging and state-splitting methods for probabilistic deterministic automata and the recently developed spectral method for nondeterministic probabilistic automata. In both cases, we derive them from a high-level algorithm described in terms of the Hankel matrix of the distribution to be learned, given as an oracle, and then describe how to adapt that algorithm to account for the error introduced by a finite sample.
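The Hankel-matrix viewpoint described above can be illustrated with the empirical estimate a spectral method starts from: for a distribution over strings, the entry at (prefix, suffix) is Pr(prefix + suffix). The sketch below computes that estimate from a finite sample; the basis choice and the SVD step are omitted, and all names are illustrative.

```python
from collections import Counter

def empirical_hankel(sample, prefixes, suffixes):
    """Empirical Hankel block: entry (p, s) estimates Pr(p + s)."""
    counts = Counter(sample)
    total = len(sample)
    return [[counts[p + s] / total for s in suffixes] for p in prefixes]

# toy sample of strings over the alphabet {a, b}
sample = ["ab", "ab", "a", "b", "ab", "aa"]
H = empirical_hankel(sample, prefixes=["", "a"], suffixes=["", "a", "b"])
# rows: [Pr(""), Pr("a"), Pr("b")] and [Pr("a"), Pr("aa"), Pr("ab")]
```

The error introduced by the finite sample, which the abstract mentions, is exactly the gap between this matrix and the true Hankel matrix of the distribution.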