806 research outputs found

    Induction of First-Order Decision Lists: Results on Learning the Past Tense of English Verbs

    Full text link
    This paper presents a method for inducing logic programs from examples that learns a new class of concepts called first-order decision lists, defined as ordered lists of clauses each ending in a cut. The method, called FOIDL, is based on FOIL (Quinlan, 1990) but employs intensional background knowledge and avoids the need for explicit negative examples. It is particularly useful for problems that involve rules with specific exceptions, such as learning the past-tense of English verbs, a task widely studied in the context of the symbolic/connectionist debate. FOIDL is able to learn concise, accurate programs for this problem from significantly fewer examples than previous methods (both connectionist and symbolic).Comment: See http://www.jair.org/ for any accompanying file

    Recursion Aware Modeling and Discovery For Hierarchical Software Event Log Analysis (Extended)

    Get PDF
    This extended paper presents 1) a novel hierarchy and recursion extension to the process tree model; and 2) the first, recursion aware process model discovery technique that leverages hierarchical information in event logs, typically available for software systems. This technique allows us to analyze the operational processes of software systems under real-life conditions at multiple levels of granularity. The work can be positioned in-between reverse engineering and process mining. An implementation of the proposed approach is available as a ProM plugin. Experimental results based on real-life (software) event logs demonstrate the feasibility and usefulness of the approach and show the huge potential to speed up discovery by exploiting the available hierarchy.Comment: Extended version (14 pages total) of the paper Recursion Aware Modeling and Discovery For Hierarchical Software Event Log Analysis. This Technical Report version includes the guarantee proofs for the proposed discovery algorithm

    Qualitative System Identification from Imperfect Data

    Full text link
    Experience in the physical sciences suggests that the only realistic means of understanding complex systems is through the use of mathematical models. Typically, this has come to mean the identification of quantitative models expressed as differential equations. Quantitative modelling works best when the structure of the model (i.e., the form of the equations) is known; and the primary concern is one of estimating the values of the parameters in the model. For complex biological systems, the model-structure is rarely known and the modeler has to deal with both model-identification and parameter-estimation. In this paper we are concerned with providing automated assistance to the first of these problems. Specifically, we examine the identification by machine of the structural relationships between experimentally observed variables. These relationship will be expressed in the form of qualitative abstractions of a quantitative model. Such qualitative models may not only provide clues to the precise quantitative model, but also assist in understanding the essence of that model. Our position in this paper is that background knowledge incorporating system modelling principles can be used to constrain effectively the set of good qualitative models. Utilising the model-identification framework provided by Inductive Logic Programming (ILP) we present empirical support for this position using a series of increasingly complex artificial datasets. The results are obtained with qualitative and quantitative data subject to varying amounts of noise and different degrees of sparsity. The results also point to the presence of a set of qualitative states, which we term kernel subsets, that may be necessary for a qualitative model-learner to learn correct models. We demonstrate scalability of the method to biological system modelling by identification of the glycolysis metabolic pathway from data

    An efficient algorithm for discovering frequent subgraphs

    Get PDF
    Abstract — Over the years, frequent itemset discovery algorithms have been used to find interesting patterns in various application areas. However, as data mining techniques are being increasingly applied to non-traditional domains, existing frequent pattern discovery approach cannot be used. This is because the transaction framework that is assumed by these algorithms cannot be used to effectively model the datasets in these domains. An alternate way of modeling the objects in these datasets is to represent them using graphs. Within that model, one way of formulating the frequent pattern discovery problem is as that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper we present a computationally efficient algorithm, called FSG, for finding all frequent subgraphs in large graph datasets. We experimentally evaluate the performance of FSG using a variety of real and synthetic datasets. Our results show that despite the underlying complexity associated with frequent subgraph discovery, FSG is effective in finding all frequently occurring subgraphs in datasets containing over 200,000 graph transactions and scales linearly with respect to the size of the dataset. Index Terms — Data mining, scientific datasets, frequent pattern discovery, chemical compound datasets

    Efficient Minimum Flow Decomposition via Integer Linear Programming

    Get PDF
    Extended version of RECOMB 2022 paperMinimum flow decomposition (MFD) is an NP-hard problem asking to decompose a network flow into a minimum set of paths (together with associated weights). Variants of it are powerful models in multiassembly problems in Bioinformatics, such as RNA assembly. Owing to its hardness, practical multiassembly tools either use heuristics or solve simpler, polynomial time-solvable versions of the problem, which may yield solutions that are not minimal or do not perfectly decompose the flow. Here, we provide the first fast and exact solver for MFD on acyclic flow networks, based on Integer Linear Programming (ILP). Key to our approach is an encoding of all the exponentially many solution paths using only a quadratic number of variables. We also extend our ILP formulation to many practical variants, such as incorporating longer or paired-end reads, or minimizing flow errors. On both simulated and real-flow splicing graphs, our approach solves any instance inPeer reviewe

    Resilient scalable internet routing and embedding algorithms

    Get PDF
    • …
    corecore