
42,522 research outputs found

Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm

Author: Turney P. D.
Publication date: 01/01/1995

This paper introduces ICET, a new algorithm for cost-sensitive classification. ICET uses a genetic algorithm to evolve a population of biases for a decision tree induction algorithm. The fitness function of the genetic algorithm is the average cost of classification when using the decision tree, including both the costs of tests (features, measurements) and the costs of classification errors. ICET is compared here with three other algorithms for cost-sensitive classification - EG2, CS-ID3, and IDX - and also with C4.5, which classifies without regard to cost. The five algorithms are evaluated empirically on five real-world medical datasets. Three sets of experiments are performed. The first set examines the baseline performance of the five algorithms on the five datasets and establishes that ICET performs significantly better than its competitors. The second set tests the robustness of ICET under a variety of conditions and shows that ICET maintains its advantage. The third set looks at ICET's search in bias space and discovers a way to improve the search. Comment: See http://www.jair.org/ for any accompanying file
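The fitness measure described in this abstract (average cost of classification, combining test costs and error costs) can be illustrated with a minimal sketch. This is not the paper's implementation: the function name, the record layout, the test names, and the single flat `error_cost` are all assumptions made for illustration.

```python
def average_classification_cost(cases, test_costs, error_cost):
    """Mean cost per case: cost of the tests performed plus the cost of an error.

    Assumptions (not from the abstract): each case lists the tests the tree
    used on it, each distinct test is charged once per case, and every
    misclassification costs the same flat amount.
    """
    total = 0.0
    for case in cases:
        total += sum(test_costs[t] for t in set(case["tests_used"]))
        if case["predicted"] != case["actual"]:
            total += error_cost
    return total / len(cases)


# Hypothetical test prices and two hypothetical classified cases.
test_costs = {"blood_test": 7.0, "x_ray": 20.0}
cases = [
    {"tests_used": ["blood_test"], "predicted": 1, "actual": 1},
    {"tests_used": ["blood_test", "x_ray"], "predicted": 0, "actual": 1},
]
avg = average_classification_cost(cases, test_costs, error_cost=50.0)
print(avg)  # (7 + 27 + 50) / 2 = 42.0
```

A genetic algorithm in this setting would treat a lower average cost as higher fitness when comparing candidate biases.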

arXiv.org e-Print Archive

CiteSeerX

NRC Publications Archive

CogPrints Cognitive Sciences Eprint Archive

HBST: A Hamming Distance embedding Binary Search Tree for Visual Place Recognition

Author: Grisetti Giorgio
Schlegel Dominik
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Reliable and efficient Visual Place Recognition is a major building block of modern SLAM systems. Building on our prior work, we present a Hamming Distance embedding Binary Search Tree (HBST) approach for binary Descriptor Matching and Image Retrieval. HBST allows for descriptor Search and Insertion in logarithmic time by exploiting particular properties of binary Feature descriptors. We support the idea behind our search structure with a thorough analysis of the exploited descriptor properties and their effects on the completeness and complexity of search and insertion. To validate our claims, we conducted comparative experiments for HBST and several state-of-the-art methods on a broad range of publicly available datasets. HBST is available as a compact open-source C++ header-only library. Comment: Submitted to IEEE Robotics and Automation Letters (RA-L) 2018 with International Conference on Intelligent Robots and Systems (IROS) 2018 option, 8 pages, 10 figures
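The general idea of storing binary descriptors in a tree that branches on individual bits can be sketched as follows. This is a toy illustration, not the HBST library or its API: the class name `BitTree`, the choice of splitting on the bit whose index equals the node depth, and the leaf size are all assumptions.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two descriptors stored as ints."""
    return bin(a ^ b).count("1")


class BitTree:
    """Toy binary tree over bit strings; each inner node tests one bit."""

    def __init__(self, depth=0, leaf_size=2, num_bits=8):
        self.depth = depth
        self.leaf_size = leaf_size
        self.num_bits = num_bits
        self.descriptors = []   # used only while this node is a leaf
        self.children = None    # (zero_child, one_child) after a split

    def _bit(self, descriptor):
        # Assumption: the split bit index is simply the node depth.
        return (descriptor >> self.depth) & 1

    def insert(self, descriptor):
        node = self
        while node.children is not None:
            node = node.children[node._bit(descriptor)]
        node.descriptors.append(descriptor)
        # Split an overfull leaf while there are still bits left to test.
        if len(node.descriptors) > node.leaf_size and node.depth < node.num_bits:
            node.children = tuple(
                BitTree(node.depth + 1, node.leaf_size, node.num_bits)
                for _ in range(2)
            )
            for d in node.descriptors:
                node.children[node._bit(d)].descriptors.append(d)
            node.descriptors = []

    def search(self, query):
        """Descend to one leaf and return its closest descriptor (or None)."""
        node = self
        while node.children is not None:
            node = node.children[node._bit(query)]
        if not node.descriptors:
            return None
        return min(node.descriptors, key=lambda d: hamming(d, query))


descriptors = [0b10110010, 0b10110011, 0b01001100,
               0b11110000, 0b00001111, 0b10100101]
tree = BitTree()
for d in descriptors:
    tree.insert(d)
match = tree.search(0b10110010)
```

Because the search inspects only one leaf, it can miss a closer descriptor stored elsewhere in the tree. Each operation visits at most `num_bits` levels, so this sketch trades completeness for bounded search depth.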

arXiv.org e-Print Archive

Archivio della ricerca- Università di Roma La Sapienza

Active Learning of Multiple Source Multiple Destination Topologies

Author: Animashree An
Athina Markopoulou
Maciej Kurant
Michael Rabbat
Pegah Sattari
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/03/2013

We consider the problem of inferring the topology of a network with $M$ sources and $N$ receivers (hereafter referred to as an $M$-by-$N$ network), by sending probes between the sources and receivers. Prior work has shown that this problem can be decomposed into two parts: first, infer smaller subnetwork components (i.e., $1$-by-$N$'s or $2$-by-$2$'s) and then merge these components to identify the $M$-by-$N$ topology. In this paper, we focus on the second part, which had previously received less attention in the literature. In particular, we assume that a $1$-by-$N$ topology is given and that all $2$-by-$2$ components can be queried and learned using end-to-end probes. The problem is which $2$-by-$2$'s to query and how to merge them with the given $1$-by-$N$, so as to exactly identify the $2$-by-$N$ topology, and optimize a number of performance metrics, including the number of queries (which directly translates into measurement bandwidth), time complexity, and memory usage. We provide a lower bound, $\lceil \frac{N}{2} \rceil$, on the number of $2$-by-$2$'s required by any active learning algorithm and propose two greedy algorithms. The first algorithm follows the framework of multiple hypothesis testing, in particular Generalized Binary Search (GBS), since our problem is one of active learning, from $2$-by-$2$ queries. The second algorithm is called the Receiver Elimination Algorithm (REA) and follows a bottom-up approach: at every step, it selects two receivers, queries the corresponding $2$-by-$2$, and merges it with the given $1$-by-$N$; it requires exactly $N-1$ steps, which is much less than all $\binom{N}{2}$ possible $2$-by-$2$'s. Simulation results over synthetic and realistic topologies demonstrate that both algorithms correctly identify the $2$-by-$N$ topology and are near-optimal, but REA is more efficient in practice.
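To get a feel for the three query counts quoted in this abstract (the lower bound, the number of REA steps, and the number of all possible 2-by-2 components), the following short sketch evaluates them for a given number of receivers. The function name and dictionary keys are illustrative assumptions; only the three formulas come from the abstract.

```python
import math


def query_counts(n: int) -> dict:
    """Evaluate the three quantities from the abstract for n receivers."""
    return {
        "lower_bound": (n + 1) // 2,      # ceil(N / 2), in integer arithmetic
        "rea_steps": n - 1,               # steps required by REA
        "all_pairs": math.comb(n, 2),     # every possible 2-by-2 component
    }


counts = query_counts(10)
print(counts)  # {'lower_bound': 5, 'rea_steps': 9, 'all_pairs': 45}
```

For ten receivers the gap is already a factor of five between REA's step count and exhaustive querying, and it widens quadratically as the number of receivers grows.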

arXiv.org e-Print Archive

CiteSeerX

eScholarship - University of California

Caltech Authors

A survey of cost-sensitive decision tree induction algorithms

Author: Bradford J. P.
Elkan C.
Esmeir S.
Estruch V.
Fan W.
Ferri C.
Freund Y.
Hart A. E.
Knoll U.
Li J.
Lin F. Y.
Liu X.
Mease D.
Murthy S.
Ni A.
Norton S. W.
Pazzani M.
Quinlan J. R.
Schapire R. E.
Sunil Vadera
Susan Lomax
Swets J.
Tan M.
Ting K.
Ting K. M.
von Neumann J.
Zadrozny B.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/02/2013

The past decade has seen significant interest in the problem of inducing decision trees that take account of the costs of misclassification and the costs of acquiring the features used for decision making. This survey identifies over 50 algorithms, including approaches that are direct adaptations of accuracy-based methods, use genetic algorithms, use anytime methods, and utilize boosting and bagging. The survey brings together these different studies and novel approaches to cost-sensitive decision tree learning, and provides a taxonomy and a historical timeline of how the field has developed, which should serve as a reference point for future research in this field.

University of Salford Institutional Repository

Crossref

Evading Classifiers by Morphing in the Dark

Author: Chang Ee-Chien
Dang Hung
Huang Yue
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 23/08/2017

Learning-based systems have been shown to be vulnerable to evasion through adversarial data manipulation. These attacks have been studied under the assumption that the adversary has some knowledge of the target model's internals, its training dataset, or at least the classification scores it assigns to input samples. In this paper, we investigate a much more constrained and realistic attack scenario in which the target classifier is minimally exposed to the adversary, revealing only its final classification decision (e.g., reject or accept an input sample). Moreover, the adversary can only manipulate malicious samples using a blackbox morpher. That is, the adversary has to evade the target classifier by morphing malicious samples "in the dark". We present a scoring mechanism that assigns each sample a real-valued score reflecting evasion progress, based on the limited information available. Leveraging this scoring mechanism, we propose an evasion method, EvadeHC, and evaluate it against two PDF malware detectors, namely PDFRate and Hidost. The experimental evaluation demonstrates that the proposed evasion attacks are effective, attaining a 100% evasion rate on the evaluation dataset. Interestingly, EvadeHC outperforms the known classifier evasion technique that operates on the classification scores output by the classifiers. Although our evaluations are conducted on PDF malware classifiers, the proposed approaches are domain-agnostic and apply more widely to other learning-based systems.

arXiv.org e-Print Archive

Crossref

Geometry Helps to Compare Persistence Diagrams

Author: Kerber Michael
Morozov Dmitriy
Nigmetov Arnur
Publication date: 16/11/2015

Exploiting geometric structure to improve the asymptotic complexity of discrete assignment problems is a well-studied subject. In contrast, the practical advantages of using geometry for such problems have not been explored. We implement geometric variants of the Hopcroft-Karp algorithm for bottleneck matching (based on previous work by Efrat et al.) and of the auction algorithm by Bertsekas for Wasserstein distance computation. Both implementations use k-d trees to replace a linear scan with a geometric proximity query. Our interest in this problem stems from the desire to compute distances between persistence diagrams, a problem that comes up frequently in topological data analysis. We show that our geometric matching algorithms lead to a substantial performance gain, both in running time and in memory consumption, over their purely combinatorial counterparts. Moreover, our implementation significantly outperforms the only other implementation available for comparing persistence diagrams. Comment: 20 pages, 10 figures; extended version of paper published in ALENEX 201
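The substitution mentioned in this abstract, a k-d tree proximity query in place of a linear scan, can be sketched for plain nearest-neighbour lookup among 2-D points. This is a generic textbook k-d tree and not the authors' implementation. The function names, the tuple-based node layout, and the random sample data are assumptions, and the matching algorithms themselves are not shown.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def nearest_linear(points, q):
    """Baseline: scan every point."""
    return min(points, key=lambda p: sq_dist(p, q))


def build_kdtree(points, depth=0):
    """Return a node (point, left, right), splitting on alternating axes."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid],
            build_kdtree(pts[:mid], depth + 1),
            build_kdtree(pts[mid + 1:], depth + 1))


def nearest_kdtree(node, q, depth=0, best=None):
    """Nearest neighbour of q, pruning subtrees that cannot hold a closer point."""
    if node is None:
        return best
    point, left, right = node
    if best is None or sq_dist(point, q) < sq_dist(best, q):
        best = point
    axis = depth % 2
    diff = q[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest_kdtree(near, q, depth + 1, best)
    # Visit the far side only if the splitting line is closer than the best hit.
    if diff * diff < sq_dist(best, q):
        best = nearest_kdtree(far, q, depth + 1, best)
    return best


random.seed(0)
# Hypothetical (birth, death) pairs standing in for persistence diagram points.
points = [(b, b + random.random()) for b in (random.random() for _ in range(200))]
tree = build_kdtree(points)
query = (0.5, 0.9)
assert sq_dist(nearest_kdtree(tree, query), query) == sq_dist(nearest_linear(points, query), query)
```

Both functions return a point at the same minimal distance. The tree version skips whole subtrees whenever the splitting line lies farther from the query than the best candidate found so far.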

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

MPG.PuRe