672 research outputs found

    Learning Twig and Path Queries

    We investigate the problem of learning XML queries, \emph{path} queries and \emph{twig} queries, from examples given by the user. A learning algorithm takes as input a set of XML documents with nodes annotated by the user and returns a query that selects nodes in a manner consistent with the annotation. We study two learning settings that differ in the types of annotations. In the first setting the user may only indicate \emph{required nodes} that the query must return. In the second, more general, setting the user may also indicate \emph{forbidden nodes} that the query must not return. Nodes with no annotation may or may not be returned by the query. We formalize what it means for a class of queries to be \emph{learnable}. One requirement is the existence of a learning algorithm that is \emph{sound}, i.e., one that always returns a query consistent with the examples given by the user. Furthermore, the learning algorithm should be \emph{complete}, i.e., able to produce every query given sufficiently rich examples. Other requirements involve the tractability of the learning algorithm and its robustness to nonessential examples. We identify practical classes of Boolean and unary, path and twig queries that are learnable from positive examples. We also show that adding negative examples to the picture renders learning infeasible.
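
    A query is consistent with the annotations when it selects every node the user marked as required and none marked as forbidden; unannotated nodes are unconstrained. The sketch below illustrates this check for simple path queries over a toy XML document, using Python's ElementTree and its limited XPath support; the document, query strings, and helper name are assumptions made for the example, not material from the paper.

        import xml.etree.ElementTree as ET

        def consistent(query, doc, required, forbidden):
            # A query is consistent if it selects all required nodes and
            # none of the forbidden ones; other nodes are unconstrained.
            selected = set(doc.findall(query))
            return required <= selected and not (forbidden & selected)

        doc = ET.fromstring("<lib><book><title/></book><journal><title/></journal></lib>")
        required = {doc.find("book/title")}      # node the query must return
        forbidden = {doc.find("journal/title")}  # node the query must not return

        print(consistent(".//title", doc, required, forbidden))   # False: selects both titles
        print(consistent("book/title", doc, required, forbidden)) # True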

    On the learnability of E-pattern languages over small alphabets

    This paper deals with two well-discussed but largely open problems on E-pattern languages, also known as extended or erasing pattern languages: primarily, their learnability in Gold's learning model and, secondarily, the decidability of their equivalence. As the main result, we show that the full class of E-pattern languages is not inferrable from positive data if the corresponding terminal alphabet consists of exactly three or exactly four letters – an insight that remarkably contrasts with the recent positive finding on the learnability of the subclass of terminal-free E-pattern languages for these alphabets. As a side-effect of our reasoning, we reveal particular example patterns that disprove a conjecture of Ohlebusch and Ukkonen (Theoretical Computer Science 186, 1997) on the decidability of the equivalence of E-pattern languages.
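
    For background, an E-pattern is a string of terminal letters and variables, and its language consists of all words obtained by substituting a terminal word, possibly the empty word (hence "erasing"), for each variable, with repeated occurrences of a variable receiving the same word. Membership in such a language can be illustrated with a backreference regular expression, as in the sketch below; the example pattern and helper name are illustrative, not drawn from the paper.

        import re

        def epattern_to_regex(pattern, variables):
            # Translate a pattern (sequence of symbols) into a regex: the first
            # occurrence of a variable becomes a capture group (.*), later
            # occurrences become backreferences, and terminals are matched
            # literally. (.*) may match the empty word, so substitutions are erasing.
            groups, parts = {}, []
            for symbol in pattern:
                if symbol in variables:
                    if symbol in groups:
                        parts.append("\\%d" % groups[symbol])
                    else:
                        groups[symbol] = len(groups) + 1
                        parts.append("(.*)")
                else:
                    parts.append(re.escape(symbol))
            return re.compile("".join(parts))

        # Pattern x a x over the terminal alphabet {a, b}: the words of the form w a w.
        matcher = epattern_to_regex(["x", "a", "x"], {"x"})
        print(bool(matcher.fullmatch("baaba")))  # True: x = "ba"
        print(bool(matcher.fullmatch("a")))      # True: x erased to the empty word
        print(bool(matcher.fullmatch("ab")))     # False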

    Distribution-Independent Evolvability of Linear Threshold Functions

    Valiant's (2007) model of evolvability treats the evolutionary process of acquiring useful functionality as a restricted form of learning from random examples. Linear threshold functions and their various subclasses, such as conjunctions and decision lists, play a fundamental role in learning theory, and hence their evolvability has been the primary focus of research on Valiant's framework. One of the main open problems regarding the model is whether conjunctions are evolvable distribution-independently (Feldman and Valiant, 2008). We show that the answer is negative. Our proof is based on a new combinatorial parameter of a concept class that lower-bounds the complexity of learning from correlations. We contrast the lower bound with a proof that linear threshold functions having a non-negligible margin on the data points are evolvable distribution-independently via a simple mutation algorithm. Our algorithm relies on a non-linear loss function being used to select hypotheses, instead of the 0-1 loss of Valiant's (2007) original definition. The proof of evolvability requires that the loss function satisfy several mild conditions that are, for example, satisfied by the quadratic loss function studied in several other works (Michael, 2007; Feldman, 2009; Valiant, 2010). An important property of our evolution algorithm is monotonicity: the algorithm guarantees evolvability without any decrease in performance. Previously, monotone evolvability was only shown for conjunctions with quadratic loss (Feldman, 2009) or when the distribution on the domain is severely restricted (Michael, 2007; Feldman, 2009; Kanade et al., 2010).
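
    As a rough illustration of the selection mechanism described above, and not of the paper's actual construction, the sketch below evolves a weight vector for a linear threshold function by random mutation, accepting a mutation only when a quadratic loss does not increase, so performance never drops (monotonicity). The clipped linear predictions, mutation scale, and synthetic data are assumptions made for the example.

        import numpy as np

        rng = np.random.default_rng(0)

        def quadratic_loss(w, X, y):
            # Quadratic loss of clipped linear predictions against labels in {-1, +1};
            # unlike 0-1 loss, it changes smoothly as the weights are perturbed.
            preds = np.clip(X @ w, -1.0, 1.0)
            return float(np.mean((preds - y) ** 2))

        # Synthetic data labelled by a hidden threshold function.
        target = np.array([1.0, -2.0, 0.5])
        X = rng.normal(size=(200, 3))
        y = np.sign(X @ target)

        # Mutation loop: keep a random perturbation only if the loss does not increase.
        w = np.zeros(3)
        for _ in range(500):
            candidate = w + rng.normal(scale=0.1, size=3)
            if quadratic_loss(candidate, X, y) <= quadratic_loss(w, X, y):
                w = candidate

        print(np.mean(np.sign(X @ w) != y))  # empirical 0-1 error of the evolved hypothesis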

    On an open problem in classification of languages
