A System for Induction of Oblique Decision Trees

Author: Kasif S.
Murthy S. K.
Salzberg S.
Publication date: 01/01/1994

This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We present extensive empirical studies, using both real and artificial data, that analyze OC1's ability to construct oblique trees that are smaller and more accurate than their axis-parallel counterparts. We also examine the benefits of randomization for the construction of oblique decision trees. Comment: See http://www.jair.org/ for an online appendix and other files accompanying this article.
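The idea of an oblique split found by hill-climbing with randomization can be illustrated with a minimal sketch. This is not OC1's actual procedure, which the abstract does not spell out: the error measure, the perturbation scheme, and the names `side`, `misclassified`, and `hill_climb_split` are all assumptions chosen for illustration.

```python
import random

def side(weights, bias, x):
    """True if point x lies on the positive side of the hyperplane w.x + b > 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias > 0

def misclassified(weights, bias, points, labels):
    """Count errors when each side of the hyperplane predicts its majority class."""
    errors = 0
    for flag in (True, False):
        part = [y for x, y in zip(points, labels) if side(weights, bias, x) == flag]
        if part:
            majority = max(set(part), key=part.count)
            errors += sum(1 for y in part if y != majority)
    return errors

def hill_climb_split(points, labels, steps=200, restarts=5, seed=0):
    """Illustrative search: perturb one coefficient at a time, keep changes that
    do not increase the error, and restart from random hyperplanes."""
    rng = random.Random(seed)
    dim = len(points[0])
    best = None
    for _ in range(restarts):
        w = [rng.uniform(-1, 1) for _ in range(dim)]
        b = rng.uniform(-1, 1)
        score = misclassified(w, b, points, labels)
        for _ in range(steps):
            i = rng.randrange(dim + 1)          # index dim means "perturb the bias"
            delta = rng.uniform(-0.5, 0.5)
            cand_w, cand_b = list(w), b
            if i == dim:
                cand_b += delta
            else:
                cand_w[i] += delta
            cand = misclassified(cand_w, cand_b, points, labels)
            if cand <= score:
                w, b, score = cand_w, cand_b, cand
        if best is None or score < best[2]:
            best = (w, b, score)
    return best  # (weights, bias, error count)
```

An axis-parallel split is the special case in which all but one weight is zero, which is why a single oblique split can sometimes replace several axis-parallel ones.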

arXiv.org e-Print Archive

CiteSeerX

An offline/online procedure for dual norm calculations of parameterized functionals: empirical quadrature and empirical test spaces

Author: Taddei Tommaso
Publication date: 06/02/2019

We present an offline/online computational procedure for computing the dual norm of parameterized linear functionals. The key elements of the approach are (i) an empirical test space for the manifold of Riesz elements associated with the parameterized functional, and (ii) an empirical quadrature procedure to efficiently deal with parametrically non-affine terms. We present a number of theoretical results to identify the different sources of error and to motivate the technique. Finally, we show the effectiveness of our approach to reduce both offline and online costs associated with the computation of the time-averaged residual indicator proposed in [Fick, Maday, Patera, Taddei, Journal of Computational Physics, 2018 (accepted)]

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Oskar Bordeaux

Quadratic programming for class ordering in rule induction

Author: Yıldız Olcay Taner
Publication venue: 'Elsevier BV'
Publication date: 01/03/2015

Separate-and-conquer type rule induction algorithms such as Ripper solve a K>2 class problem by converting it into a sequence of K - 1 two-class problems. As a usual heuristic, the classes are fed into the algorithm in the order of increasing prior probabilities. Although the heuristic works well in practice, there is much room for improvement. In this paper, we propose a novel approach to improve this heuristic. The approach transforms the ordering search problem into a quadratic optimization problem and uses the solution of the optimization problem to extract the optimal ordering. We compared the new Ripper (guided by the ordering found with our approach) with the original Ripper (guided by the heuristic ordering) on 27 datasets. Simulation results show that our approach produces rulesets that are significantly better than those produced by the original Ripper.
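The baseline heuristic described in this abstract (ordering classes by increasing prior probability and deriving K - 1 two-class problems) can be sketched as follows. The function names are hypothetical, and pairing each class against all classes that come later in the order is an assumption about how the sequence is formed. The quadratic-programming ordering proposed in the paper is not reproduced here.

```python
from collections import Counter

def heuristic_class_order(labels):
    """Order classes by increasing prior probability (ties broken by name)."""
    counts = Counter(labels)
    return sorted(counts, key=lambda c: (counts[c], str(c)))

def two_class_sequence(labels):
    """Turn a K-class problem into K - 1 two-class problems.

    Each entry is (positive class, remaining classes treated as negative).
    The last class in the order is never a positive class; it acts as the default.
    """
    order = heuristic_class_order(labels)
    remaining = list(order)
    problems = []
    for cls in order[:-1]:
        remaining.remove(cls)
        problems.append((cls, tuple(remaining)))
    return problems
```

With three classes, for example, this yields two problems: the rarest class against the other two, then the second-rarest against the most frequent one.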

Isik University Academic Open Access

Omnivariate rule induction using a novel pairwise statistical test

Author: Yıldız Olcay Taner
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2013

Rule learning algorithms, for example, RIPPER, induce univariate rules, that is, a propositional condition in a rule uses only one feature. In this paper, we propose an omnivariate induction of rules where under each condition, both a univariate and a multivariate condition are trained, and the best is chosen according to a novel statistical test. This paper has three main contributions: First, we propose a novel statistical test, the combined 5 x 2 cv t test, to compare two classifiers, which is a variant of the 5 x 2 cv t test, and give the connections to other tests such as the 5 x 2 cv F test and the k-fold paired t test. Second, we propose a multivariate version of RIPPER, where a support vector machine with a linear kernel is used to find multivariate linear conditions. Third, we propose an omnivariate version of RIPPER, where the model selection is done via the combined 5 x 2 cv t test. Our results indicate that 1) the combined 5 x 2 cv t test has higher power (lower type II error), lower type I error, and higher replicability compared to the 5 x 2 cv t test, 2) omnivariate rules are better in that they choose whichever condition is more accurate, selecting the right model automatically and separately for each condition in a rule.
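For orientation, the two existing tests that this abstract compares against can be sketched in a few lines. The code below computes the standard 5 x 2 cv t statistic (Dietterich) and the 5 x 2 cv F statistic (Alpaydın) from five replications of 2-fold cross-validation. It does not implement the paper's combined 5 x 2 cv t test, whose definition is not given in the abstract. The input layout and function names are assumptions.

```python
import math

def fold_variances(diffs):
    """diffs: five (p1, p2) pairs, the error-rate differences between the two
    classifiers on fold 1 and fold 2 of each replication."""
    variances = []
    for p1, p2 in diffs:
        mean = (p1 + p2) / 2
        variances.append((p1 - mean) ** 2 + (p2 - mean) ** 2)
    return variances

def cv52_t(diffs):
    """5 x 2 cv t statistic: first difference over the pooled standard deviation.
    Compared against a t distribution with 5 degrees of freedom."""
    s2 = fold_variances(diffs)
    return diffs[0][0] / math.sqrt(sum(s2) / 5)

def cv52_f(diffs):
    """5 x 2 cv F statistic: uses all ten differences in the numerator.
    Compared against an F distribution with (10, 5) degrees of freedom."""
    s2 = fold_variances(diffs)
    numerator = sum(p ** 2 for pair in diffs for p in pair)
    return numerator / (2 * sum(s2))
```

The t statistic depends on which of the ten differences is placed in the numerator, whereas the F statistic uses all ten. That sensitivity is one reason variants of the original test have been proposed.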

Isik University Academic Open Access

Extending the Nested Parallel Model to the Nested Dataflow Model with Provably Efficient Schedulers

Author: Dinh David
Simhadri Harsha Vardhan
Tang Yuan
Publication date: 14/02/2016

The nested parallel (a.k.a. fork-join) model is widely used for writing parallel programs. However, the two composition constructs, i.e. "$\parallel$" (parallel) and "$;$" (serial), are insufficient in expressing "partial dependencies" or "partial parallelism" in a program. We propose a new dataflow composition construct "$\leadsto$" to express partial dependencies in algorithms in a processor- and cache-oblivious way, thus extending the Nested Parallel (NP) model to the Nested Dataflow (ND) model. We redesign several divide-and-conquer algorithms ranging from dense linear algebra to dynamic-programming in the ND model and prove that they all have optimal span while retaining optimal cache complexity. We propose the design of runtime schedulers that map ND programs to multicore processors with multiple levels of possibly shared caches (i.e., Parallel Memory Hierarchies) and provide theoretical guarantees on their ability to preserve locality and load balance. For this, we adapt space-bounded (SB) schedulers for the ND model. We show that our algorithms have increased "parallelizability" in the ND model, and that SB schedulers can use the extra parallelizability to achieve asymptotically optimal bounds on cache misses and running time on a greater number of processors than in the NP model. The running time for the algorithms in this paper is $O\left(\frac{\sum_{i=0}^{h-1} Q^{*}({\mathsf t};\sigma\cdot M_i)\cdot C_i}{p}\right)$, where $Q^{*}$ is the cache complexity of task ${\mathsf t}$, $C_i$ is the cost of cache miss at level-$i$ cache which is of size $M_i$, $\sigma\in(0,1)$ is a constant, and $p$ is the number of processors in an $h$-level cache hierarchy.
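To make the shape of the running-time expression above concrete, the sketch below evaluates the quantity inside the O(...) for given inputs. It is only an arithmetic illustration: the per-level cache complexities are passed in as plain numbers (the abstract does not say how to compute them), constant factors are ignored, and the function name `running_time_bound` is hypothetical.

```python
def running_time_bound(cache_complexities, miss_costs, processors):
    """Evaluate sum_i Q_i * C_i / p, where Q_i stands for the cache complexity
    of the task at level i and C_i for the miss cost at that level."""
    if len(cache_complexities) != len(miss_costs):
        raise ValueError("need one cache complexity and one miss cost per level")
    if processors <= 0:
        raise ValueError("processors must be positive")
    total = sum(q * c for q, c in zip(cache_complexities, miss_costs))
    return total / processors
```

For a two-level hierarchy with complexities 1000 and 100, miss costs 1 and 10, and 4 processors, the expression evaluates to (1000 + 1000) / 4 = 500.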

arXiv.org e-Print Archive

Crossref

Modelling and Searching of Combinatorial Spaces Based on Markov Logic Networks

Author: Floriana Esposito
Marenglen Biba
Stefano Ferilli
Publication date: 01/01/2011

Markov Logic Networks (MLNs) combine Markov networks (MNs) and first-order logic by attaching weights to first-order formulas and using these as templates for features of MNs. Learning the structure of MLNs is performed by state-of-the-art methods by maximizing the likelihood of a relational database. This leads to suboptimal results for prediction tasks due to the mismatch between the objective function (likelihood) and the task of classification (maximizing conditional likelihood (CL)). In this paper we propose two algorithms for learning the structure of MLNs. The first maximizes the CL of query predicates instead of the joint likelihood of all predicates while the other maximizes the area under the Precision-Recall curve (AUC). Both algorithms set the parameters by maximum likelihood and choose structures by maximizing CL or AUC. For each of these algorithms we develop two different searching strategies. The first is based on Iterated Local Search and the second on Greedy Randomized Adaptive Search Procedure. We compare the performances of these randomized search approaches on real-world datasets and show that on larger datasets, the ILS-based approaches perform better, both in terms of CLL and AUC, while on small datasets, ILS and RBS approaches are competitive and RBS can also lead to better results for AUC

Archivio istituzionale della ricerca - Università di Bari

Open Access Repository