440 research outputs found
Solving Multiclass Learning Problems via Error-Correcting Output Codes
Multiclass learning problems involve finding a definition for an unknown
function f(x) whose range is a discrete set containing k > 2 values (i.e., k
``classes''). The definition is acquired by studying collections of training
examples of the form [x_i, f (x_i)]. Existing approaches to multiclass learning
problems include direct application of multiclass algorithms such as the
decision-tree algorithms C4.5 and CART, application of binary concept learning
algorithms to learn individual binary functions for each of the k classes, and
application of binary concept learning algorithms with distributed output
representations. This paper compares these three approaches to a new technique
in which error-correcting codes are employed as a distributed output
representation. We show that these output representations improve the
generalization performance of both C4.5 and backpropagation on a wide range of
multiclass learning tasks. We also demonstrate that this approach is robust
with respect to changes in the size of the training sample, the assignment of
distributed representations to particular classes, and the application of
overfitting avoidance techniques such as decision-tree pruning. Finally, we
show that---like the other methods---the error-correcting code technique can
provide reliable class probability estimates. Taken together, these results
demonstrate that error-correcting output codes provide a general-purpose method
for improving the performance of inductive learning programs on multiclass
problems.Comment: See http://www.jair.org/ for any accompanying file
The use of provenance in information retrieval
The volume of electronic information that users accumulate is steadily rising. A recent study [2] found that there were on average 32,000 pieces of information (e-mails, web pages, documents, etc.) for each user. The problem of organizin
Deep Multi-instance Networks with Sparse Label Assignment for Whole Mammogram Classification
Mammogram classification is directly related to computer-aided diagnosis of
breast cancer. Traditional methods rely on regions of interest (ROIs) which
require great efforts to annotate. Inspired by the success of using deep
convolutional features for natural image analysis and multi-instance learning
(MIL) for labeling a set of instances/patches, we propose end-to-end trained
deep multi-instance networks for mass classification based on whole mammogram
without the aforementioned ROIs. We explore three different schemes to
construct deep multi-instance networks for whole mammogram classification.
Experimental results on the INbreast dataset demonstrate the robustness of
proposed networks compared to previous work using segmentation and detection
annotations.Comment: MICCAI 2017 Camera Read
Fast Reinforcement Learning with Large Action Sets Using Error-Correcting Output Codes for MDP Factorization
International audienceThe use of Reinforcement Learning in real-world scenarios is strongly limited by issues of scale. Most RL learning algorithms are unable to deal with problems composed of hundreds or sometimes even dozens of possible actions, and therefore cannot be applied to many real-world problems. We consider the RL problem in the supervised classification framework where the optimal policy is obtained through a multiclass classifier, the set of classes being the set of actions of the problem. We introduce error-correcting output codes (ECOCs) in this setting and propose two new methods for reducing complexity when using rollouts-based approaches. The first method consists in using an ECOC-based classifier as the multiclass classifier, reducing the learning complexity from O(A2) to O(Alog(A)) . We then propose a novel method that profits from the ECOC's coding dictionary to split the initial MDP into O(log(A)) separate two-action MDPs. This second method reduces learning complexity even further, from O(A2) to O(log(A)) , thus rendering problems with large action sets tractable. We finish by experimentally demonstrating the advantages of our approach on a set of benchmark problems, both in speed and performance
Adapting Quality Assurance to Adaptive Systems: The Scenario Coevolution Paradigm
From formal and practical analysis, we identify new challenges that
self-adaptive systems pose to the process of quality assurance. When tackling
these, the effort spent on various tasks in the process of software engineering
is naturally re-distributed. We claim that all steps related to testing need to
become self-adaptive to match the capabilities of the self-adaptive
system-under-test. Otherwise, the adaptive system's behavior might elude
traditional variants of quality assurance. We thus propose the paradigm of
scenario coevolution, which describes a pool of test cases and other
constraints on system behavior that evolves in parallel to the (in part
autonomous) development of behavior in the system-under-test. Scenario
coevolution offers a simple structure for the organization of adaptive testing
that allows for both human-controlled and autonomous intervention, supporting
software engineering for adaptive systems on a procedural as well as technical
level.Comment: 17 pages, published at ISOLA 201
Continuing education in structural biology for science teachers
The present paper sought to identify what perception teachers from Natural Science fields have on the use of instructional strategies that make use of models to represent biomolecules. The data presented are related to two continuing education courses\ud
carried out with teachers from public schools of the state of São Paulo (Brazil). Such data showed that the teachers approved the use of instructional materials such as the ones suggested in the courses (e.g., construction of a 3-D biomolecular structure) and\ud
they pointed out some advantages and obstacles to the use of such materials.\ud
© 2010 Elsevier Ltd. All rights reserved
Finding a short and accurate decision rule in disjunctive normal form by exhaustive search
Greedy approaches suffer from a restricted search space which could lead to suboptimal classifiers in terms of performance and classifier size. This study discusses exhaustive search as an alternative to greedy search for learning short and accurate decision rules. The Exhaustive Procedure for LOgic-Rule Extraction (EXPLORE) algorithm is presented, to induce decision rules in disjunctive normal form (DNF) in a systematic and efficient manner. We propose a method based on subsumption to reduce the number of values considered for instantiation in the literals, by taking into account the relational operator without loss of performance. Furthermore, we describe a branch-and-bound approach that makes optimal use of user-defined performance constraints. To improve the generalizability we use a validation set to determine the optimal length of the DNF rule. The performance and size of the DNF rules induced by EXPLORE are compared to those of eight well-known rule learners. Our results show that an exhaustive approach to rule learning in DNF results in significantly smaller classifiers than those of the other rule learners, while securing comparable or even better performance. Clearly, exhaustive search is computer-intensive and may not always be feasible. Nevertheless, based on this study, we believe that exhaustive search should be considered an alternative for greedy search in many problems
Ontology of core data mining entities
In this article, we present OntoDM-core, an ontology of core data mining
entities. OntoDM-core defines themost essential datamining entities in a three-layered
ontological structure comprising of a specification, an implementation and an application
layer. It provides a representational framework for the description of mining
structured data, and in addition provides taxonomies of datasets, data mining tasks,
generalizations, data mining algorithms and constraints, based on the type of data.
OntoDM-core is designed to support a wide range of applications/use cases, such as
semantic annotation of data mining algorithms, datasets and results; annotation of
QSAR studies in the context of drug discovery investigations; and disambiguation of
terms in text mining. The ontology has been thoroughly assessed following the practices
in ontology engineering, is fully interoperable with many domain resources and
is easy to extend
- …