An experimental study on rank methods for prototype selection
Prototype selection is one of the most popular approaches for addressing the low efficiency typically associated with the well-known k-Nearest Neighbour classification rule. These techniques select a representative subset of an original collection of prototypes while aiming to maintain the same classification accuracy. Recently, rank methods have been proposed as an alternative way to develop new selection strategies. Following a given heuristic, these methods sort the elements of the initial collection according to their relevance and then select the best possible subset by means of a parameter that sets the amount of data to retain. Because these methods are relatively new, their performance and competitiveness against other strategies are still unclear. This work presents an exhaustive experimental study of such methods for prototype selection. A representative collection of both classic and more sophisticated algorithms is compared with these techniques on a number of datasets, including versions with different levels of induced noise. The results show that the rank methods are remarkably competitive and offer an excellent trade-off between prototype reduction and achieved accuracy.
This work has been supported by the Vicerrectorado de Investigación, Desarrollo e Innovación de la Universidad de Alicante through the FPU programme (UAFPU2014-5883), the Spanish Ministerio de Educación, Cultura y Deporte through an FPU Fellowship (Ref. AP2012-0939), the Spanish Ministerio de Economía y Competitividad through Project TIMuL (No. TIN2013-48152-C2-1-R, supported by UE FEDER funds) and the Consejería de Educación de la Comunidad Valenciana through project PROMETEO/2012/017.
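The rank-then-select scheme summarised in this abstract can be illustrated with a minimal Python sketch. The relevance heuristic used here (counting how often a prototype is the nearest same-class neighbour of another point) is a hypothetical stand-in chosen for brevity, not one of the rank methods evaluated in the study; `keep_ratio` plays the role of the parameter that sets how much data to retain, and all function names are illustrative.

```python
import numpy as np


def rank_prototypes(X, y):
    """Rank prototypes by a hypothetical relevance score, most relevant first.

    Score of prototype j = number of other points of the same class whose
    nearest neighbour is j (an illustrative heuristic, not the paper's).
    """
    # Pairwise squared Euclidean distances between all prototypes.
    d = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)   # a point may not vote for itself
    nearest = d.argmin(axis=1)    # nearest neighbour of every point
    scores = np.zeros(len(X))
    for i, j in enumerate(nearest):
        if y[i] == y[j]:
            scores[j] += 1.0      # j would classify i correctly under 1-NN
    # Sort by decreasing score; the stable sort keeps ties in input order.
    return np.argsort(-scores, kind="stable")


def select_prototypes(X, y, keep_ratio):
    """Keep the top `keep_ratio` fraction (0 < keep_ratio <= 1) of the ranking."""
    order = rank_prototypes(X, y)
    n_keep = max(1, int(round(keep_ratio * len(X))))
    idx = order[:n_keep]
    return X[idx], y[idx]


def predict_1nn(X_ref, y_ref, X_query):
    """Plain 1-NN rule against the (possibly reduced) reference set."""
    d = ((X_query[:, None, :] - X_ref[None, :, :]) ** 2).sum(axis=-1)
    return y_ref[d.argmin(axis=1)]
```

Because the ranking is computed once, trying several values of `keep_ratio` only re-slices the same ordering, which is what makes this family of methods convenient for exploring the reduction/accuracy trade-off.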
Efficient Data Representation by Selecting Prototypes with Importance Weights
Prototypical examples that best summarize and compactly represent an underlying complex data distribution communicate meaningful insights to humans in domains where simple explanations are hard to extract. In this paper we present algorithms with strong theoretical guarantees to mine these datasets and select prototypes (a.k.a. representatives) that optimally describe them. Our work notably generalizes the recent work by Kim et al. (2016): in addition to selecting prototypes, we also associate with them non-negative weights that indicate their importance. This extension provides a single coherent framework under which both prototypes and criticisms (i.e. outliers) can be found. Furthermore, our framework works for any symmetric positive definite kernel, thus addressing one of the key open questions laid out in Kim et al. (2016). By establishing that our objective function enjoys the key property of weak submodularity, we present a fast algorithm, ProtoDash, and derive approximation guarantees for it. We demonstrate the efficacy of our method on diverse domains: retail, digit recognition (MNIST), and 40 publicly available health questionnaires obtained from the Centers for Disease Control and Prevention (CDC) website maintained by the US Department of Health and Human Services. We validate the results quantitatively, as well as qualitatively based on expert feedback and recently published scientific studies on public health, thus showcasing the power of our technique in providing actionability (for retail), utility (for MNIST) and insight (on the CDC datasets), which arguably are the hallmarks of an effective data mining method.
Comment: Accepted for publication in the International Conference on Data Mining (ICDM) 201
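As a rough illustration of selecting prototypes together with non-negative importance weights, the sketch below greedily grows a prototype set under an assumed MMD-style objective f(w) = w·mu - 0.5 * w·K·w with w >= 0, where K is a kernel matrix and mu[j] is the mean similarity of candidate j to the data. The abstract does not state the objective, so treat it as an assumption: this is a simplified sketch in the spirit of ProtoDash, not the authors' algorithm, and the Gaussian kernel, the projected-gradient weight refit and all names are illustrative choices.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel, one example of a symmetric positive definite kernel."""
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d)


def weighted_prototypes(X, m, gamma=1.0, steps=500):
    """Greedily pick m prototypes from X with non-negative importance weights.

    Assumed objective: f(w) = w @ mu - 0.5 * w @ K @ w subject to w >= 0.
    """
    K = rbf_kernel(X, X, gamma)
    mu = K.mean(axis=1)              # mean similarity of each candidate to the data
    w = np.zeros(len(X))
    selected = []
    for _ in range(m):
        grad = mu - K @ w            # gradient of f at the current weights
        if selected:
            grad[selected] = -np.inf # only unselected candidates may be added
        selected.append(int(np.argmax(grad)))
        # Refit the weights on the selected support by projected gradient ascent.
        S = np.array(selected)
        K_S, mu_S = K[np.ix_(S, S)], mu[S]
        w_S = w[S]
        step = 1.0 / np.linalg.eigvalsh(K_S).max()  # 1/L step for this quadratic
        for _ in range(steps):
            w_S = np.maximum(0.0, w_S + step * (mu_S - K_S @ w_S))
        w = np.zeros(len(X))
        w[S] = w_S
    return np.array(selected), w[selected]
```

On two well-separated clusters, the first pick lands in one cluster and the gradient for its close neighbours nearly vanishes, so the second pick comes from the other cluster; the fitted weights then reflect how much of the data each prototype accounts for.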
Combination of linear classifiers using score function -- analysis of possible combination strategies
In this work, we address the issue of combining linear classifiers using their score functions. The value of the score function depends on the distance from the decision boundary. Two score functions were tested and four different combination strategies were investigated. In the experimental study, the proposed approach was applied to a heterogeneous ensemble and compared with two reference methods: majority voting and model averaging. The comparison was made in terms of seven different quality criteria. The results show that the strategies based on the simple average and the trimmed average are the best of the geometrical combination strategies.
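Combining linear classifiers through a score that depends on the distance from the decision boundary can be sketched as follows. This minimal Python example assumes binary classifiers given as weight vectors and biases and uses the signed Euclidean distance to each hyperplane as the score; the two fusion rules (plain mean and a trimmed mean that drops the most extreme scores) mirror strategies named in the abstract, but the paper's exact score functions are not given here and all names are hypothetical.

```python
import numpy as np


def signed_distance(W, b, x):
    """Signed distance of x from each hyperplane w.x + b = 0.

    W: (n_classifiers, n_features), b: (n_classifiers,), x: (n_features,).
    """
    return (W @ x + b) / np.linalg.norm(W, axis=1)


def combine(scores, strategy="mean", trim=1):
    """Fuse per-classifier scores into one ensemble score."""
    if strategy == "mean":
        return scores.mean()
    if strategy == "trimmed":
        if len(scores) <= 2 * trim:
            raise ValueError("too few classifiers to trim")
        s = np.sort(scores)
        return s[trim:len(s) - trim].mean()  # drop `trim` extremes at each end
    raise ValueError(f"unknown strategy: {strategy}")


def predict(W, b, x, strategy="mean"):
    """Return class +1 if the fused score is non-negative, else -1."""
    return 1 if combine(signed_distance(W, b, x), strategy) >= 0 else -1
```

With one classifier lying far from its boundary on the wrong side, the plain mean can be pulled across zero while the trimmed mean discards that extreme score; this is one plausible reason a trimmed average can behave robustly in a heterogeneous ensemble.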
Evolving Non-Dominated Parameter Sets for Computational Models from Multiple Experiments
© Peter C. R. Lane, Fernand Gobet. This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. (CC BY-NC 3.0)
Creating robust, reproducible and optimal computational models is a key challenge for theorists in many sciences. Psychology and cognitive science face particular challenges, as large amounts of data are collected and many models are not amenable to analytical techniques for calculating parameter sets. Two particular problems are locating the full range of acceptable model parameters for a given dataset and confirming the consistency of model parameters across different datasets. Resolving these problems will provide a better understanding of the behaviour of computational models and so support the development of general and robust models. In this article, we address these problems by using evolutionary algorithms to develop parameters for computational models against multiple sets of experimental data; in particular, we propose the ‘speciated non-dominated sorting genetic algorithm’ for evolving models in several theories. We discuss the problem of developing a model of categorisation using twenty-nine sets of data and models drawn from four different theories. We find that the evolutionary algorithms generate high-quality models, adapted to provide a good fit to all available data.
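The non-dominated sorting underlying the proposed algorithm can be illustrated with a short Python sketch, assuming each candidate parameter set is scored by its error on each experimental dataset (lower is better). It shows only the Pareto-dominance test and the peeling of successive fronts used in NSGA-style algorithms; the speciation step and the genetic operators of the authors' method are not reproduced, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better
    on at least one (all objectives are errors to be minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(fitness):
    """Indices of parameter sets not dominated by any other set."""
    return [
        i for i, f in enumerate(fitness)
        if not any(dominates(g, f) for j, g in enumerate(fitness) if j != i)
    ]


def sort_into_fronts(fitness):
    """Peel off successive non-dominated fronts, as in NSGA-style sorting."""
    remaining = list(range(len(fitness)))
    fronts = []
    while remaining:
        # Indices local to `remaining`, mapped back to the original positions.
        local = non_dominated([fitness[i] for i in remaining])
        front = [remaining[k] for k in local]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts
```

The first front is the set of parameter sets for which no alternative fits every dataset at least as well and some dataset strictly better, which is one way to represent "the full range of acceptable model parameters" when several datasets must be fitted at once.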
Recruitment and selection processes through an effective GDSS
This study proposes a multiple-criteria group decision support system (GDSS) to assist in the recruitment and selection (R&S) of human resources. A two-phase decision-making procedure is first suggested; various techniques involving multiple criteria and group participation are then defined for each step in the procedure. A wide range of personnel characteristics is evaluated, and the role of consensus is strengthened. The recommended procedure is expected to be more effective than traditional approaches. In addition, the procedure is implemented on a network-based PC system with web interfaces to support R&S activities. Finally, key personnel in the human resources department of a chemical company in southern Taiwan confirmed the feasibility of the illustrated example.