Search CORE

5,446 research outputs found

A Clustering-Based Algorithm for Data Reduction

Author: Lee Shie-Jue
Ouyang Jeng
Yeh Chi-Yuan
Publication venue: IEEE SMC Hiroshima Chapter
Publication date: 01/11/2009

Finding an efficient data reduction method for large-scale problems is an imperative task. In this paper, we propose a similarity-based self-constructing fuzzy clustering algorithm to sample instances for the classification task. Instances that are similar to each other are grouped into the same cluster. Once all the instances have been fed in, a number of clusters have formed automatically. The statistical mean of each cluster is then taken to represent all the instances covered by that cluster. This approach has two advantages. One is that it can be faster and use less storage memory. The other is that the number of new representative instances need not be specified in advance by the user. Experiments on real-world datasets show that our method can run faster and obtain a better reduction rate than other methods.
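The grouping-then-averaging idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the Gaussian similarity, the fixed `threshold`, the restriction of clusters to a single class, and the name `reduce_by_clustering` are all assumptions made for illustration.

```python
import math

def reduce_by_clustering(instances, labels, threshold=0.5):
    """One-pass incremental clustering; returns (mean, label) prototypes.

    Assumption: similarity is exp(-squared Euclidean distance) to the
    cluster mean, and an instance only joins a cluster of its own class.
    """
    clusters = []  # each cluster: {"label", "sum": list of floats, "count": int}
    for x, y in zip(instances, labels):
        best, best_sim = None, threshold
        for c in clusters:
            if c["label"] != y:
                continue
            mean = [s / c["count"] for s in c["sum"]]
            sim = math.exp(-math.dist(x, mean) ** 2)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            # No cluster is similar enough: start a new one from this instance.
            clusters.append({"label": y, "sum": list(x), "count": 1})
        else:
            # Fold the instance into the most similar cluster.
            best["sum"] = [s + v for s, v in zip(best["sum"], x)]
            best["count"] += 1
    # The mean of each cluster stands in for all instances it covers.
    return [([s / c["count"] for s in c["sum"]], c["label"]) for c in clusters]

prototypes = reduce_by_clustering(
    [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)], [0, 0, 1, 1]
)
```

The number of prototypes is not passed in; it falls out of the threshold, which mirrors the abstract's point that the user need not fix it in advance.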

Hiroshima University Institutional Repository

Okayama University Scientific Achievement Repository

MRPR: a MapReduce solution for prototype reduction in big data classification

Author: Isaac Triguero
Daniel Peralta
Jaume Bacardit
Salvador García
Francisco Herrera
Publication venue: 'Elsevier BV'
Publication date: 03/03/2014

In the era of big data, analyzing and extracting knowledge from large-scale data sets is a very interesting and challenging task. The application of standard data mining tools to such data sets is not straightforward. Hence, a new class of scalable mining methods that embraces the huge storage and processing capacity of cloud platforms is required. In this work, we propose a novel distributed partitioning methodology for prototype reduction techniques in nearest neighbor classification. These methods aim at representing original training data sets as a reduced number of instances. Their main purposes are to speed up the classification process and to reduce the storage requirements and sensitivity to noise of the nearest neighbor rule. However, the standard prototype reduction methods cannot cope with very large data sets. To overcome this limitation, we develop a MapReduce-based framework to distribute the functioning of these algorithms through a cluster of computing elements, proposing several algorithmic strategies to integrate multiple partial solutions (reduced sets of prototypes) into a single one. The proposed model enables prototype reduction algorithms to be applied to big data classification problems without significant accuracy loss. We test the speed-up capabilities of our model with data sets of up to 5.7 million instances. The results show that this model is a suitable tool to enhance the performance of the nearest neighbor classifier with big data.
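The partition, reduce, and integrate flow described in this abstract can be sketched in plain Python, without a real MapReduce runtime. Everything named here is hypothetical: `class_centroids` is only a stand-in for an actual prototype reduction algorithm, and simple concatenation is assumed as the integration strategy, which is just one of the options the abstract alludes to.

```python
from collections import defaultdict
from functools import reduce

def split(data, n_parts):
    """Deal instances round-robin into n_parts disjoint partitions."""
    return [data[i::n_parts] for i in range(n_parts)]

def class_centroids(partition):
    """Stand-in prototype reducer: one centroid per class in the partition."""
    groups = defaultdict(list)
    for features, label in partition:
        groups[label].append(features)
    return [
        (tuple(sum(col) / len(rows) for col in zip(*rows)), label)
        for label, rows in groups.items()
    ]

def mrpr_sketch(data, n_parts, reducer=class_centroids):
    """Apply the reducer to each partition, then join the partial results."""
    partial = map(reducer, split(data, n_parts))               # map phase
    return reduce(lambda acc, part: acc + part, partial, [])   # reduce phase: join

data = [((float(x), 0.0), "a") for x in (0, 2, 4, 6)] + [
    ((float(x), 10.0), "b") for x in (10, 12, 14, 16)
]
prototypes = mrpr_sketch(data, n_parts=2)
```

Because each partition is reduced independently, the map phase could run on separate workers; only the comparatively small partial prototype sets need to be brought together at the end.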

Nottingham ePrints

Nottingham eTheses

Crossref

Repository@Nottingham

Repositorio Institucional Universidad de Granada

An Overview of Classifier Fusion Methods

Author: Gabrys Bogdan
Ruta Dymitr
Publication date: 01/01/2000

A number of classifier fusion methods have been developed recently, opening an alternative approach that may lead to improved classification performance. As there is little theory of information fusion itself, we are currently faced with different methods designed for different problems and producing different results. This paper gives an overview of classifier fusion methods and attempts to identify new trends that may dominate this area of research in the future. A taxonomy of fusion methods, trying to bring some order into the existing “pudding of diversities”, is also provided.

CiteSeerX

Bournemouth University Research Online

An Overview of Classifier Fusion Methods

Author: Ruta Dymitr
Gabrys Bogdan
Publication date: 01/02/2000

Crossref

Bournemouth University Research Online

Multilabel Prototype Generation for Data Reduction in k-Nearest Neighbour classification

Author: Alonso-Jiménez Pablo
Gallego Antonio Javier
Serra Xavier
Valero-Mas Jose J.
Publication date: 22/07/2022

Prototype Generation (PG) methods are typically considered for improving the efficiency of the k-Nearest Neighbour (kNN) classifier when tackling high-size corpora. Such approaches aim at generating a reduced version of the corpus without decreasing the classification performance when compared to the initial set. Despite their large application in multiclass scenarios, very few works have addressed the proposal of PG methods for the multilabel space. In this regard, this work presents the novel adaptation of four multiclass PG strategies to the multilabel case. These proposals are evaluated with three multilabel kNN-based classifiers, 12 corpora comprising a varied range of domains and corpus sizes, and different noise scenarios artificially induced in the data. The results obtained show that the proposed adaptations are capable of significantly improving -- both in terms of efficiency and classification performance -- the only reference multilabel PG work in the literature as well as the case in which no PG method is applied, also presenting a statistically superior robustness in noisy scenarios. Moreover, these novel PG strategies allow prioritising either the efficiency or efficacy criteria through their configuration depending on the target scenario, hence covering a wide area in the solution space not previously filled by other works.

arXiv.org e-Print Archive

Repositorio Institucional de la Universidad de Alicante

UPF Digital Repository