
    Improving the Efficiency of Inductive Logic Programming Through the Use of Query Packs

    Inductive logic programming, or relational learning, is a powerful paradigm for machine learning or data mining. However, in order for ILP to become practically useful, the efficiency of ILP systems must improve substantially. To this end, the notion of a query pack is introduced: it structures sets of similar queries. Furthermore, a mechanism is described for executing such query packs. A complexity analysis shows that considerable efficiency improvements can be achieved through the use of this query pack execution mechanism. This claim is supported by empirical results obtained by incorporating support for query pack execution in two existing learning systems.
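
    The core idea can be pictured outside of any particular ILP engine: queries that share a prefix of literals are merged into a tree, so the shared prefix is evaluated once per example instead of once per query. The sketch below is a minimal Python illustration of that idea, not the ACE/hipP execution mechanism; the literal names and the boolean-valued "example" are invented stand-ins for real Prolog goals and bindings.

```python
# Minimal sketch (not the ACE/hipP implementation): a "query pack" groups
# similar queries into a tree so that their shared prefix is evaluated only
# once per example, instead of once per query.

def build_pack(queries):
    """Merge a list of queries (each a tuple of literal names) into a trie."""
    root = {}
    for query in queries:
        node = root
        for literal in query:
            node = node.setdefault(literal, {})
        node[None] = query          # mark a complete query at this node
    return root

def run_pack(pack, example, succeeded):
    """Evaluate the pack on one example; each shared literal is tested once."""
    for literal, child in pack.items():
        if literal is None:
            succeeded.add(child)    # the whole query succeeded on this example
        elif example.get(literal, False):   # toy "call": literal holds on example
            run_pack(child, example, succeeded)

if __name__ == "__main__":
    # Three queries sharing the prefix (atom_c, bond_single); naive execution
    # would test that prefix three times per example, the pack tests it once.
    queries = [("atom_c", "bond_single", "ring5"),
               ("atom_c", "bond_single", "ring6"),
               ("atom_c", "bond_double")]
    pack = build_pack(queries)
    example = {"atom_c": True, "bond_single": True, "ring6": True}
    hits = set()
    run_pack(pack, example, hits)
    print(hits)   # {('atom_c', 'bond_single', 'ring6')}
```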

    A Delta Debugger for ILP Query Execution

    Because query execution is the most crucial part of Inductive Logic Programming (ILP) algorithms, a lot of effort is invested in developing faster execution mechanisms. These execution mechanisms typically have a low-level implementation, making them hard to debug. Moreover, other factors such as the complexity of the problems handled by ILP algorithms and the size of the code base of ILP data mining systems make debugging at this level a very difficult job. In this work, we present the trace-based debugging approach currently used in the development of new execution mechanisms in hipP, the engine underlying the ACE Data Mining system. This debugger uses the delta debugging algorithm to automatically reduce the total time needed to expose bugs in ILP execution, thus making the manual debugging step much lighter.
    Comment: Paper presented at the 16th Workshop on Logic-based Methods in Programming Environments (WLPE2006).
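
    The delta debugging algorithm referred to here is Zeller's ddmin: it repeatedly removes chunks of a failure-inducing input while the bug still reproduces, returning a much smaller input to inspect by hand. The following is a generic ddmin sketch under that assumption, not the hipP trace-based debugger itself; the `crash` predicate is a made-up example of a failing test.

```python
# Minimal sketch of Zeller's ddmin delta-debugging algorithm (not the hipP
# trace-based debugger itself): shrink a failure-inducing input to a smaller
# one that still triggers the bug, so manual inspection has less to look at.

def ddmin(deltas, fails):
    """Return a small subsequence of `deltas` for which `fails` is still True."""
    assert fails(deltas)
    n = 2
    while len(deltas) >= 2:
        chunk = len(deltas) // n
        subsets = [deltas[i:i + chunk] for i in range(0, len(deltas), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            complement = [d for s in subsets[:i] + subsets[i + 1:] for d in s]
            if fails(subset):                 # bug reproduced by the subset alone
                deltas, n, reduced = subset, 2, True
                break
            if fails(complement):             # bug reproduced without this subset
                deltas, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(deltas):              # finest granularity reached
                break
            n = min(len(deltas), n * 2)       # increase granularity and retry
    return deltas

if __name__ == "__main__":
    # Hypothetical failure: the execution only crashes when items 3 and 7 are
    # both present in the (flattened) query or trace under test.
    crash = lambda q: 3 in q and 7 in q
    print(ddmin(list(range(10)), crash))      # -> [3, 7]
```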

    goSLP: Globally Optimized Superword Level Parallelism Framework

    Modern microprocessors are equipped with single instruction multiple data (SIMD) or vector instruction sets which allow compilers to exploit superword level parallelism (SLP), a type of fine-grained parallelism. Current SLP auto-vectorization techniques use heuristics to discover vectorization opportunities in high-level language code. These heuristics are fragile, local and typically only present one vectorization strategy that is either accepted or rejected by a cost model. We present goSLP, a novel SLP auto-vectorization framework which solves the statement packing problem in a pairwise optimal manner. Using an integer linear programming (ILP) solver, goSLP searches the entire space of statement packing opportunities for a whole function at a time, while limiting total compilation time to a few minutes. Furthermore, goSLP optimally solves the vector permutation selection problem using dynamic programming. We implemented goSLP in the LLVM compiler infrastructure, achieving a geometric mean speedup of 7.58% on SPEC2017fp, 2.42% on SPEC2006fp and 4.07% on NAS benchmarks compared to LLVM's existing SLP auto-vectorizer.
    Comment: Published at OOPSLA 2018.
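
    As a rough intuition for the ILP formulation (this is an invented toy model, not goSLP's actual encoding; the statement names, benefit numbers, and the PuLP dependency are all assumptions), one can introduce a binary variable per candidate statement pair, give each pair an estimated net benefit of packing it into a vector instruction, require that every statement joins at most one pack, and let the solver pick the globally best set of pairs instead of committing to greedy local choices.

```python
# Toy sketch only: goSLP's real ILP model is more involved, but the flavor is
# "decide globally which pairs of isomorphic statements to pack into vector
# lanes, maximizing (saving from vectorizing) minus (packing/unpacking cost)".
# Uses the PuLP library (pip install pulp); pair benefits below are made up.
import pulp

stmts = ["a", "b", "c", "d"]
# Hypothetical net benefit of packing a pair into one SIMD instruction
# (negative means packing would be a loss, e.g. due to gather/shuffle cost).
benefit = {("a", "b"): 3, ("a", "c"): -1, ("a", "d"): 1,
           ("b", "c"): 2, ("b", "d"): -2, ("c", "d"): 4}

prob = pulp.LpProblem("statement_packing", pulp.LpMaximize)
pack = {p: pulp.LpVariable(f"pack_{p[0]}_{p[1]}", cat="Binary") for p in benefit}

# Objective: total net benefit of all chosen pairs.
prob += pulp.lpSum(benefit[p] * pack[p] for p in benefit)

# Each statement may end up in at most one vector pack.
for s in stmts:
    prob += pulp.lpSum(pack[p] for p in benefit if s in p) <= 1

prob.solve(pulp.PULP_CBC_CMD(msg=0))
chosen = [p for p in benefit if pack[p].value() == 1]
print(chosen)   # e.g. [('a', 'b'), ('c', 'd')], total benefit 7
```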

    From sequential to parallel Inductive Logic Programming

    Inductive Logic Programming (ILP) has achieved considerable success in a wide range of domains. It is recognized, however, that efficiency is a major obstacle to the use of ILP systems in applications requiring large amounts of data. In this paper we address the problem of efficiency in ILP in three steps: i) we survey speedup techniques proposed for sequential execution of ILP systems; ii) we survey different ways of parallelizing an ILP system; and iii) we adapt and combine the sequential execution speedup techniques in the parallel implementations of an ILP system. We also propose a novel technique to partition the search space into independent sub-spaces that may be adequately searched in parallel.
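
    The last point, searching independent sub-spaces in parallel, can be pictured with a generic sketch (this is not the paper's partitioning technique): candidate clauses are split into disjoint groups and each group is evaluated in its own worker process, since coverage testing dominates the cost and requires no communication between groups. The clause names and the scoring function below are placeholders.

```python
# Illustrative sketch, not the paper's algorithm: split a set of candidate
# clauses into independent sub-spaces and score each sub-space in a separate
# worker process, keeping the best clause found overall.
from multiprocessing import Pool

def score(clause):
    """Stand-in for coverage testing; in a real ILP system this runs the clause
    as a query against every example and is by far the dominant cost."""
    cid = int(clause.split("_")[1])
    return sum((cid * i) % 97 < 13 for i in range(10_000))

def search_subspace(clauses):
    """Exhaustively evaluate one sub-space, return its best (score, clause)."""
    return max((score(c), c) for c in clauses)

if __name__ == "__main__":
    candidates = [f"clause_{i}" for i in range(40)]      # hypothetical clauses
    n_workers = 4
    # Partition the search space into independent sub-spaces (round-robin here).
    subspaces = [candidates[i::n_workers] for i in range(n_workers)]
    with Pool(n_workers) as pool:
        best_per_subspace = pool.map(search_subspace, subspaces)
    print(max(best_per_subspace))
```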

    Processing Analytical Queries over Encrypted Data

    MONOMI is a system for securely executing analytical workloads over sensitive data on an untrusted database server. MONOMI works by encrypting the entire database and running queries over the encrypted data. MONOMI introduces split client/server query execution, which can execute arbitrarily complex queries over encrypted data, as well as several techniques that improve performance for such workloads, including per-row precomputation, space-efficient encryption, grouped homomorphic addition, and pre-filtering. Since these optimizations are good for some queries but not others, MONOMI introduces a designer for choosing an efficient physical design at the server for a given workload, and a planner to choose an efficient execution plan for a given query at runtime. A prototype of MONOMI running on top of Postgres can execute most of the queries from the TPC-H benchmark with a median overhead of only 1.24× (ranging from 1.03× to 2.33×) compared to an un-encrypted Postgres database where a compromised server would reveal all data.
    Sponsors: National Science Foundation (U.S.) (Award IIS-1065219); Google (Firm).
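
    Split client/server execution can be illustrated with a deliberately simplified toy (none of the cryptography below is MONOMI's; the "encryption" functions are insecure stand-ins): the server evaluates the part of the plan it can handle over encrypted values, here an equality filter on a deterministically encrypted column, and the client decrypts the surviving rows and finishes the aggregation itself.

```python
# Toy illustration of split client/server execution (nothing here is MONOMI's
# real cryptography): the server filters on a deterministically "encrypted"
# column without seeing plaintext, and the client decrypts the matching rows
# and completes the aggregate.
import hashlib

KEY = b"client-secret"                                   # never leaves the client

def det_enc(value):
    """Deterministic 'encryption' stand-in: equal plaintexts -> equal tokens."""
    return hashlib.sha256(KEY + str(value).encode()).hexdigest()

def enc(value):            # stand-in for randomized encryption of stored values
    return ("ct", value)   # toy only: a real scheme would actually hide the value

def dec(ct):
    return ct[1]

# --- server side: sees only tokens and ciphertexts -------------------------
encrypted_table = [
    {"region": det_enc("EU"), "revenue": enc(120)},
    {"region": det_enc("US"), "revenue": enc(300)},
    {"region": det_enc("EU"), "revenue": enc(80)},
]

def server_filter(table, region_token):
    """Server part of the plan: equality filter over the encrypted column."""
    return [row["revenue"] for row in table if row["region"] == region_token]

# --- client side: issues the query and finishes it after decryption --------
cts = server_filter(encrypted_table, det_enc("EU"))
print(sum(dec(ct) for ct in cts))   # SELECT SUM(revenue) WHERE region='EU' -> 200
```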

    Efficient algorithms for decision tree cross-validation

    Cross-validation is a useful and generally applicable technique often employed in machine learning, including decision tree induction. An important disadvantage of a straightforward implementation of the technique is its computational overhead. In this paper we show that, for decision trees, the computational overhead of cross-validation can be reduced significantly by integrating the cross-validation with the normal decision tree induction process. We discuss how existing decision tree algorithms can be adapted to this aim, and provide an analysis of the speedups these adaptations may yield. The analysis is supported by experimental results.
    Comment: 9 pages, 6 figures. http://www.cs.kuleuven.ac.be/cgi-bin-dtai/publ_info.pl?id=3478
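
    One way to see how the folds can share work (in the spirit of the paper, though not necessarily its exact algorithm) is through fold-aware statistics: class counts for a candidate test are gathered per fold in a single pass over the data, and the counts seen by fold f's training set are then obtained by subtracting fold f's counts from the totals, with no rescan. A small sketch with an invented data set and test:

```python
# Sketch of how k folds can share work during tree induction: per-fold class
# counts for a candidate test are gathered in one pass, and the counts for
# "training set of fold f" are just the totals minus fold f's own counts.
from collections import Counter

def fold_counts(examples, test, k):
    """One pass over the data: class counts per (test outcome, label), per fold."""
    per_fold = [Counter() for _ in range(k)]
    total = Counter()
    for i, (features, label) in enumerate(examples):
        key = (test(features), label)
        per_fold[i % k][key] += 1          # fold assignment: simple modulo here
        total[key] += 1
    return per_fold, total

def training_counts(per_fold, total, fold):
    """Counts seen by fold `fold`'s training set, without rescanning the data."""
    return total - per_fold[fold]

if __name__ == "__main__":
    # Hypothetical data set: (features, label); candidate test: x[0] > 5.
    examples = [((x, x % 3), "pos" if x % 2 else "neg") for x in range(20)]
    per_fold, total = fold_counts(examples, lambda f: f[0] > 5, k=10)
    print(training_counts(per_fold, total, fold=0))
```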

    dRAP-Independent: A Data Distribution Algorithm for Mining First-Order Frequent Patterns

    In this paper we present dRAP-Independent, an algorithm for independent distributed mining of first-order frequent patterns. This system is based on RAP, an algorithm for finding maximal frequent patterns in first-order logic. dRAP-Independent utilizes a modified data partitioning scheme introduced by Savasere et al. and offers good performance and low communication overhead. We analyze the performance of the algorithm on four different tasks: mutagenicity prediction -- a standard ILP benchmark, information extraction from biological texts, context-sensitive spelling correction, and morphological disambiguation of Czech. The results of the analysis show that the algorithm can generate more patterns than the serial algorithm RAP in the same overall time.
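
    The partitioning scheme of Savasere et al. rests on the observation that a pattern frequent in the whole data set must be frequent in at least one partition, so each partition can be mined independently and the union of the locally frequent patterns forms a complete candidate set that only needs one global counting pass. The sketch below illustrates this on plain itemsets rather than first-order patterns, with a made-up data set:

```python
# Sketch of the Savasere et al. partition idea that dRAP-Independent adapts,
# shown on itemsets instead of first-order patterns: the union of locally
# frequent itemsets is a complete candidate set for the globally frequent ones.
from itertools import combinations

def frequent_itemsets(transactions, min_count, max_size=2):
    counts = {}
    for t in transactions:
        for size in range(1, max_size + 1):
            for items in combinations(sorted(t), size):
                counts[items] = counts.get(items, 0) + 1
    return {i for i, c in counts.items() if c >= min_count}

def partition_mine(transactions, min_support, n_parts=2, max_size=2):
    parts = [transactions[i::n_parts] for i in range(n_parts)]
    # Phase 1: each partition is mined independently (this is the step that can
    # run on separate nodes with no communication).
    candidates = set()
    for part in parts:
        local_min = int(min_support * len(part)) or 1
        candidates |= frequent_itemsets(part, local_min, max_size)
    # Phase 2: one counting pass over all data keeps only globally frequent ones.
    global_min = int(min_support * len(transactions)) or 1
    return {c for c in candidates
            if sum(set(c) <= set(t) for t in transactions) >= global_min}

if __name__ == "__main__":
    data = [{"a", "b"}, {"a", "b", "c"}, {"b", "c"}, {"a", "c"}, {"a", "b"}]
    print(partition_mine(data, min_support=0.6))
    # -> {('a',), ('b',), ('c',), ('a', 'b')}
```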

    An Open Ended Tree

    An open ended list is a well-known data structure in Prolog programs. It is frequently used to represent a value changing over time, while this value is referred to from several places in the data structure of the application. A weak point in this technique is that the time complexity is linear in the number of updates to the value represented by the open ended list. In this programming pearl we present a variant of the open ended list, namely an open ended tree, with an update and access time complexity logarithmic in the number of updates to the value.
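
    The weak point is easiest to see in a small rendering of the open ended list idea (the pearl itself works with Prolog terms whose tails are unbound variables; the Python below merely mimics that with an explicit "hole"): every update fills the deepest hole with a new cell, and any reader still holding the original head must walk the whole chain to reach the current value, so both operations are linear in the number of updates. The open ended tree replaces this chain with a logarithmic-depth shape.

```python
# A Python rendering of the Prolog open-ended list: each update binds the
# current "hole" (standing in for an unbound tail) to a new cell, so readers
# holding the original head must walk the whole chain to find the latest
# value -- reads and updates cost O(number of updates).

HOLE = None   # stands in for Prolog's unbound variable at the tail

def new_cell(value):
    return [value, HOLE]            # [current value, hole for the next update]

def update(head, value):
    """Walk to the deepest cell and bind its hole to a fresh cell."""
    cell = head
    while cell[1] is not HOLE:
        cell = cell[1]
    cell[1] = new_cell(value)

def current(head):
    """The current value is the one stored in the last bound cell."""
    cell = head
    while cell[1] is not HOLE:
        cell = cell[1]
    return cell[0]

if __name__ == "__main__":
    counter = new_cell(0)           # referred to from several places in a program
    for i in range(1, 6):
        update(counter, i)          # each update lengthens the chain by one
    print(current(counter))         # 5, reached after walking all five updates
```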