
    Improving the Efficiency of Inductive Logic Programming Through the Use of Query Packs

    Inductive logic programming, or relational learning, is a powerful paradigm for machine learning or data mining. However, in order for ILP to become practically useful, the efficiency of ILP systems must improve substantially. To this end, the notion of a query pack is introduced: it structures sets of similar queries. Furthermore, a mechanism is described for executing such query packs. A complexity analysis shows that considerable efficiency improvements can be achieved through the use of this query pack execution mechanism. This claim is supported by empirical results obtained by incorporating support for query pack execution in two existing learning systems.
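
    The core idea can be pictured outside of any particular ILP engine: queries that share a prefix of literals are merged into a tree, so the shared prefix is evaluated once per example instead of once per query. The sketch below is a minimal Python illustration of that idea, not the ACE/hipP execution mechanism; the literal names and the boolean-valued "example" are invented stand-ins for real Prolog goals and bindings.

```python
# Minimal sketch (not the ACE/hipP implementation): a "query pack" groups
# similar queries into a tree so that their shared prefix is evaluated only
# once per example, instead of once per query.

def build_pack(queries):
    """Merge a list of queries (each a tuple of literal names) into a trie."""
    root = {}
    for query in queries:
        node = root
        for literal in query:
            node = node.setdefault(literal, {})
        node[None] = query          # mark a complete query at this node
    return root

def run_pack(pack, example, succeeded):
    """Evaluate the pack on one example; each shared literal is tested once."""
    for literal, child in pack.items():
        if literal is None:
            succeeded.add(child)    # the whole query succeeded on this example
        elif example.get(literal, False):   # toy "call": literal holds on example
            run_pack(child, example, succeeded)

if __name__ == "__main__":
    # Three queries sharing the prefix (atom_c, bond_single); naive execution
    # would test that prefix three times per example, the pack tests it once.
    queries = [("atom_c", "bond_single", "ring5"),
               ("atom_c", "bond_single", "ring6"),
               ("atom_c", "bond_double")]
    pack = build_pack(queries)
    example = {"atom_c": True, "bond_single": True, "ring6": True}
    hits = set()
    run_pack(pack, example, hits)
    print(hits)   # {('atom_c', 'bond_single', 'ring6')}
```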

    A Delta Debugger for ILP Query Execution

    Because query execution is the most crucial part of Inductive Logic Programming (ILP) algorithms, a lot of effort is invested in developing faster execution mechanisms. These execution mechanisms typically have a low-level implementation, making them hard to debug. Moreover, other factors such as the complexity of the problems handled by ILP algorithms and the size of the code base of ILP data mining systems make debugging at this level a very difficult job. In this work, we present the trace-based debugging approach currently used in the development of new execution mechanisms in hipP, the engine underlying the ACE Data Mining system. This debugger uses the delta debugging algorithm to automatically reduce the total time needed to expose bugs in ILP execution, thus making the manual debugging step much lighter.
    Comment: Paper presented at the 16th Workshop on Logic-based Methods in Programming Environments (WLPE2006).
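
    The delta debugging algorithm referred to here is Zeller's ddmin: it repeatedly removes chunks of a failure-inducing input while the bug still reproduces, returning a much smaller input to inspect by hand. The following is a generic ddmin sketch under that assumption, not the hipP trace-based debugger itself; the `crash` predicate is a made-up example of a failing test.

```python
# Minimal sketch of Zeller's ddmin delta-debugging algorithm (not the hipP
# trace-based debugger itself): shrink a failure-inducing input to a smaller
# one that still triggers the bug, so manual inspection has less to look at.

def ddmin(deltas, fails):
    """Return a small subsequence of `deltas` for which `fails` is still True."""
    assert fails(deltas)
    n = 2
    while len(deltas) >= 2:
        chunk = len(deltas) // n
        subsets = [deltas[i:i + chunk] for i in range(0, len(deltas), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            complement = [d for s in subsets[:i] + subsets[i + 1:] for d in s]
            if fails(subset):                 # bug reproduced by the subset alone
                deltas, n, reduced = subset, 2, True
                break
            if fails(complement):             # bug reproduced without this subset
                deltas, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(deltas):              # finest granularity reached
                break
            n = min(len(deltas), n * 2)       # increase granularity and retry
    return deltas

if __name__ == "__main__":
    # Hypothetical failure: the execution only crashes when items 3 and 7 are
    # both present in the (flattened) query or trace under test.
    crash = lambda q: 3 in q and 7 in q
    print(ddmin(list(range(10)), crash))      # -> [3, 7]
```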

    goSLP: Globally Optimized Superword Level Parallelism Framework

    Modern microprocessors are equipped with single instruction multiple data (SIMD) or vector instruction sets which allow compilers to exploit superword level parallelism (SLP), a type of fine-grained parallelism. Current SLP auto-vectorization techniques use heuristics to discover vectorization opportunities in high-level language code. These heuristics are fragile, local and typically only present one vectorization strategy that is either accepted or rejected by a cost model. We present goSLP, a novel SLP auto-vectorization framework which solves the statement packing problem in a pairwise optimal manner. Using an integer linear programming (ILP) solver, goSLP searches the entire space of statement packing opportunities for a whole function at a time, while limiting total compilation time to a few minutes. Furthermore, goSLP optimally solves the vector permutation selection problem using dynamic programming. We implemented goSLP in the LLVM compiler infrastructure, achieving a geometric mean speedup of 7.58% on SPEC2017fp, 2.42% on SPEC2006fp and 4.07% on NAS benchmarks compared to LLVM's existing SLP auto-vectorizer.
    Comment: Published at OOPSLA 2018.
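
    As a rough intuition for the ILP formulation (this is an invented toy model, not goSLP's actual encoding; the statement names, benefit numbers, and the PuLP dependency are all assumptions), one can introduce a binary variable per candidate statement pair, give each pair an estimated net benefit of packing it into a vector instruction, require that every statement joins at most one pack, and let the solver pick the globally best set of pairs instead of committing to greedy local choices.

```python
# Toy sketch only: goSLP's real ILP model is more involved, but the flavor is
# "decide globally which pairs of isomorphic statements to pack into vector
# lanes, maximizing (saving from vectorizing) minus (packing/unpacking cost)".
# Uses the PuLP library (pip install pulp); pair benefits below are made up.
import pulp

stmts = ["a", "b", "c", "d"]
# Hypothetical net benefit of packing a pair into one SIMD instruction
# (negative means packing would be a loss, e.g. due to gather/shuffle cost).
benefit = {("a", "b"): 3, ("a", "c"): -1, ("a", "d"): 1,
           ("b", "c"): 2, ("b", "d"): -2, ("c", "d"): 4}

prob = pulp.LpProblem("statement_packing", pulp.LpMaximize)
pack = {p: pulp.LpVariable(f"pack_{p[0]}_{p[1]}", cat="Binary") for p in benefit}

# Objective: total net benefit of all chosen pairs.
prob += pulp.lpSum(benefit[p] * pack[p] for p in benefit)

# Each statement may end up in at most one vector pack.
for s in stmts:
    prob += pulp.lpSum(pack[p] for p in benefit if s in p) <= 1

prob.solve(pulp.PULP_CBC_CMD(msg=0))
chosen = [p for p in benefit if pack[p].value() == 1]
print(chosen)   # e.g. [('a', 'b'), ('c', 'd')], total benefit 7
```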

    From sequential to parallel Inductive Logic Programming

    Inductive Logic Programming (ILP) has achieved considerable success in a wide range of domains. It is recognized, however, that efficiency is a major obstacle to the use of ILP systems in applications requiring large amounts of data. In this paper we address the problem of efficiency in ILP in three steps: i) we survey speedup techniques proposed for sequential execution of ILP systems; ii) we survey different ways of parallelizing an ILP system; and iii) we adapt and combine the sequential execution speedup techniques in the parallel implementations of an ILP system. We also propose a novel technique to partition the search space into independent sub-spaces that may be adequately searched in parallel.
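
    The last point, searching independent sub-spaces in parallel, can be pictured with a generic sketch (this is not the paper's partitioning technique): candidate clauses are split into disjoint groups and each group is evaluated in its own worker process, since coverage testing dominates the cost and requires no communication between groups. The clause names and the scoring function below are placeholders.

```python
# Illustrative sketch, not the paper's algorithm: split a set of candidate
# clauses into independent sub-spaces and score each sub-space in a separate
# worker process, keeping the best clause found overall.
from multiprocessing import Pool

def score(clause):
    """Stand-in for coverage testing; in a real ILP system this runs the clause
    as a query against every example and is by far the dominant cost."""
    cid = int(clause.split("_")[1])
    return sum((cid * i) % 97 < 13 for i in range(10_000))

def search_subspace(clauses):
    """Exhaustively evaluate one sub-space, return its best (score, clause)."""
    return max((score(c), c) for c in clauses)

if __name__ == "__main__":
    candidates = [f"clause_{i}" for i in range(40)]      # hypothetical clauses
    n_workers = 4
    # Partition the search space into independent sub-spaces (round-robin here).
    subspaces = [candidates[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        best_per_subspace = pool.map(search_subspace, subspaces)
    print(max(best_per_subspace))
```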

    Processing Analytical Queries over Encrypted Data

    MONOMI is a system for securely executing analytical workloads over sensitive data on an untrusted database server. MONOMI works by encrypting the entire database and running queries over the encrypted data. MONOMI introduces split client/server query execution, which can execute arbitrarily complex queries over encrypted data, as well as several techniques that improve performance for such workloads, including per-row precomputation, space-efficient encryption, grouped homomorphic addition, and pre-filtering. Since these optimizations are good for some queries but not others, MONOMI introduces a designer for choosing an efficient physical design at the server for a given workload, and a planner to choose an efficient execution plan for a given query at runtime. A prototype of MONOMI running on top of Postgres can execute most of the queries from the TPC-H benchmark with a median overhead of only 1.24× (ranging from 1.03× to 2.33×) compared to an un-encrypted Postgres database where a compromised server would reveal all data.
    Sponsors: National Science Foundation (U.S.) (Award IIS-1065219); Google (Firm).
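
    Split client/server execution can be illustrated with a deliberately simplified toy (none of the cryptography below is MONOMI's; the "encryption" functions are insecure stand-ins): the server evaluates the part of the plan it can handle over encrypted values, here an equality filter on a deterministically encrypted column, and the client decrypts the surviving rows and finishes the aggregation itself.

```python
# Toy illustration of split client/server execution (nothing here is MONOMI's
# real cryptography): the server filters on a deterministically "encrypted"
# column without seeing plaintext, and the client decrypts the matching rows
# and completes the aggregate.
import hashlib

KEY = b"client-secret"                                   # never leaves the client

def det_enc(value):
    """Deterministic 'encryption' stand-in: equal plaintexts -> equal tokens."""
    return hashlib.sha256(KEY + str(value).encode()).hexdigest()

def enc(value):            # stand-in for randomized encryption of stored values
    return ("ct", value)   # toy only: a real scheme would actually hide the value

def dec(ct):
    return ct[1]

# --- server side: sees only tokens and ciphertexts -------------------------
encrypted_table = [
    {"region": det_enc("EU"), "revenue": enc(120)},
    {"region": det_enc("US"), "revenue": enc(300)},
    {"region": det_enc("EU"), "revenue": enc(80)},
]

def server_filter(table, region_token):
    """Server part of the plan: equality filter over the encrypted column."""
    return [row["revenue"] for row in table if row["region"] == region_token]

# --- client side: issues the query and finishes it after decryption --------
cts = server_filter(encrypted_table, det_enc("EU"))
print(sum(dec(ct) for ct in cts))   # SELECT SUM(revenue) WHERE region='EU' -> 200
```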

    Efficient algorithms for decision tree cross-validation

    Cross-validation is a useful and generally applicable technique often employed in machine learning, including decision tree induction. An important disadvantage of a straightforward implementation of the technique is its computational overhead. In this paper we show that, for decision trees, the computational overhead of cross-validation can be reduced significantly by integrating the cross-validation with the normal decision tree induction process. We discuss how existing decision tree algorithms can be adapted to this aim, and provide an analysis of the speedups these adaptations may yield. The analysis is supported by experimental results.
    Comment: 9 pages, 6 figures. http://www.cs.kuleuven.ac.be/cgi-bin-dtai/publ_info.pl?id=3478
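
    One way to see how the folds can share work (in the spirit of the paper, though not necessarily its exact algorithm) is through fold-aware statistics: class counts for a candidate test are gathered per fold in a single pass over the data, and the counts seen by fold f's training set are then obtained by subtracting fold f's counts from the totals, with no rescan. A small sketch with an invented data set and test:

```python
# Sketch of how k folds can share work during tree induction: per-fold class
# counts for a candidate test are gathered in one pass, and the counts for
# "training set of fold f" are just the totals minus fold f's own counts.
from collections import Counter

def fold_counts(examples, test, k):
    """One pass over the data: class counts per (test outcome, label), per fold."""
    per_fold = [Counter() for _ in range(k)]
    total = Counter()
    for i, (features, label) in enumerate(examples):
        key = (test(features), label)
        per_fold[i % k][key] += 1          # fold assignment: simple modulo here
        total[key] += 1
    return per_fold, total

def training_counts(per_fold, total, fold):
    """Counts seen by fold `fold`'s training set, without rescanning the data."""
    return total - per_fold[fold]

if __name__ == "__main__":
    # Hypothetical data set: (features, label); candidate test: x[0] > 5.
    examples = [((x, x % 3), "pos" if x % 2 else "neg") for x in range(20)]
    per_fold, total = fold_counts(examples, lambda f: f[0] > 5, k=10)
    print(training_counts(per_fold, total, fold=0))
```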

    dRAP-Independent: A Data Distribution Algorithm for Mining First-Order Frequent Patterns

    In this paper we present dRAP-Independent, an algorithm for independent distributed mining of first-order frequent patterns. This system is based on RAP, an algorithm for finding maximal frequent patterns in first-order logic. dRAP-Independent utilizes a modified data partitioning scheme introduced by Savasere et al. and offers good performance and low communication overhead. We analyze the performance of the algorithm on four different tasks: mutagenicity prediction -- a standard ILP benchmark, information extraction from biological texts, context-sensitive spelling correction, and morphological disambiguation of Czech. The results of the analysis show that the algorithm can generate more patterns than the serial algorithm RAP in the same overall time.
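
    The partitioning scheme of Savasere et al. rests on the observation that a pattern frequent in the whole data set must be frequent in at least one partition, so each partition can be mined independently and the union of the locally frequent patterns forms a complete candidate set that only needs one global counting pass. The sketch below illustrates this on plain itemsets rather than first-order patterns, with a made-up data set:

```python
# Sketch of the Savasere et al. partition idea that dRAP-Independent adapts,
# shown on itemsets instead of first-order patterns: the union of locally
# frequent itemsets is a complete candidate set for the globally frequent ones.
from itertools import combinations

def frequent_itemsets(transactions, min_count, max_size=2):
    counts = {}
    for t in transactions:
        for size in range(1, max_size + 1):
            for items in combinations(sorted(t), size):
                counts[items] = counts.get(items, 0) + 1
    return {i for i, c in counts.items() if c >= min_count}

def partition_mine(transactions, min_support, n_parts=2, max_size=2):
    parts = [transactions[i::n_parts] for i in range(n_parts)]
    # Phase 1: each partition is mined independently (this is the step that can
    # run on separate nodes with no communication).
    candidates = set()
    for part in parts:
        local_min = int(min_support * len(part)) or 1
        candidates |= frequent_itemsets(part, local_min, max_size)
    # Phase 2: one counting pass over all data keeps only globally frequent ones.
    global_min = int(min_support * len(transactions)) or 1
    return {c for c in candidates
            if sum(set(c) <= set(t) for t in transactions) >= global_min}

if __name__ == "__main__":
    data = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a", "c"}, {"a", "b"}]
    print(partition_mine(data, min_support=0.6))
    # -> {('a',), ('b',), ('c',), ('a', 'b')}
```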

    An Open Ended Tree

    An open ended list is a well-known data structure in Prolog programs. It is frequently used to represent a value changing over time, while this value is referred to from several places in the data structure of the application. A weak point in this technique is that the time complexity is linear in the number of updates to the value represented by the open ended list. In this programming pearl we present a variant of the open ended list, namely an open ended tree, with an update and access time complexity logarithmic in the number of updates to the value.
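
    The weak point is easiest to see in a small rendering of the open ended list idea (the pearl itself works with Prolog terms whose tails are unbound variables; the Python below merely mimics that with an explicit "hole"): every update fills the deepest hole with a new cell, and any reader still holding the original head must walk the whole chain to reach the current value, so both operations are linear in the number of updates. The open ended tree replaces this chain with a logarithmic-depth shape.

```python
# A Python rendering of the Prolog open-ended list: each update binds the
# current "hole" (standing in for an unbound tail) to a new cell, so readers
# holding the original head must walk the whole chain to find the latest
# value -- reads and updates cost O(number of updates).

HOLE = None   # stands in for Prolog's unbound variable at the tail

def new_cell(value):
    return [value, HOLE]            # [current value, hole for the next update]

def update(head, value):
    """Walk to the deepest cell and bind its hole to a fresh cell."""
    cell = head
    while cell[1] is not HOLE:
        cell = cell[1]
    cell[1] = new_cell(value)

def current(head):
    """The current value is the one stored in the last bound cell."""
    cell = head
    while cell[1] is not HOLE:
        cell = cell[1]
    return cell[0]

if __name__ == "__main__":
    counter = new_cell(0)           # referred to from several places in a program
    for i in range(1, 6):
        update(counter, i)          # each update lengthens the chain by one
    print(current(counter))         # 5, reached after walking all five updates
```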