Search CORE

12 research outputs found

A Clustering-Based Algorithm for Data Reduction

Author: Lee Shie-Jue
Ouyang Jeng
Yeh Chi-Yuan
Publication venue: IEEE SMC Hiroshima Chapter
Publication date: 01/11/2009

Finding an efficient data reduction method for large-scale problems is an imperative task. In this paper, we propose a similarity-based self-constructing fuzzy clustering algorithm that samples instances for the classification task. Instances that are similar to each other are grouped into the same cluster. When all the instances have been fed in, a number of clusters have been formed automatically. The statistical mean of each cluster is then taken to represent all the instances covered by that cluster. This approach has two advantages. One is that it can be faster and use less storage memory. The other is that the number of new representative instances need not be specified in advance by the user. Experiments on real-world datasets show that our method can run faster and obtain a better reduction rate than other methods.
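The idea in this abstract (group similar instances in one pass, then keep each cluster mean as the representative) can be illustrated with a minimal sketch. This is not the authors' fuzzy algorithm: the Gaussian similarity to the running cluster mean, the per-class clusters, the fixed `sigma`, and the names `reduce_by_clustering` and `threshold` are all assumptions made for illustration.

```python
import math


def reduce_by_clustering(instances, labels, threshold=0.5, sigma=1.0):
    """Single-pass similarity-based clustering; returns one mean per cluster.

    Hypothetical simplification: similarity is a Gaussian of the squared
    distance to the running cluster mean, and clusters never mix classes.
    """
    clusters = []  # each cluster: {"label", "sum" (per-feature), "count"}
    for x, y in zip(instances, labels):
        best, best_sim = None, 0.0
        for c in clusters:
            if c["label"] != y:
                continue
            mean = [s / c["count"] for s in c["sum"]]
            dist2 = sum((a - b) ** 2 for a, b in zip(x, mean))
            sim = math.exp(-dist2 / (2 * sigma ** 2))
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            # Similar enough: absorb the instance into the best cluster.
            best["sum"] = [s + a for s, a in zip(best["sum"], x)]
            best["count"] += 1
        else:
            # No sufficiently similar cluster: start a new one.
            clusters.append({"label": y, "sum": list(x), "count": 1})
    # The cluster mean stands in for every instance the cluster covers.
    return [([s / c["count"] for s in c["sum"]], c["label"]) for c in clusters]


data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
reduced = reduce_by_clustering(data, [0, 0, 1, 1])
print(len(reduced))  # number of representatives, not chosen in advance
```

In this sketch the number of representatives is never passed in; it follows from `threshold`, which mirrors the abstract's point that the user need not fix it beforehand.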

Hiroshima University Institutional Repository

Okayama University Scientific Achievement Repository

Selección de prototipos basada en conjuntos rugosos difusos

Author: Cornelis Chris
Herrera Francisco
Verbiest Nele
Publication venue: 'Universidad de Valladolid'
Publication date: 01/01/2012

In this work we address one of the main problems of k-nearest neighbors (kNN): its sensitivity to noise. We perform Prototype Selection (PS), that is, we remove noisy instances to improve the quality of k-nearest-neighbor classification. Specifically, building on an existing instance selection method based on fuzzy rough sets, we construct a wrapper-type algorithm that takes into account the optimal granularity of the fuzzy indiscernibility relation for each dataset. We call this method Selección de Prototipos a base de Conjuntos Aproximados Difusos (SPCAD), that is, Prototype Selection based on Fuzzy Rough Sets. Comparing the approach with the state of the art in Prototype Selection confirms that our method gives good results: it outperforms all existing prototype selection methods with respect to classification accuracy.

Ghent University Academic Bibliography

OWA-FRPS: A Prototype Selection method based on Ordered Weighted Average Fuzzy Rough Set Theory

Author: C. Brodley
C. Cornelis
D. Dubois
E. Marchiori
G. Gates
I. Tomek
J. Cano
J. Derrac
J. Riquelme
J. Sanchez
K. Hattori
L. Kuncheva
R. Barandela
R. Yager
S. García
T. Cover
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

The Nearest Neighbor (NN) algorithm is a well-known and effective classification algorithm. Prototype Selection (PS), which provides NN with a good training set to pick its neighbors from, is an important topic as NN is highly susceptible to noisy data. Accurate state-of-the-art PS methods are generally slow, which motivates us to propose a new PS method, called OWA-FRPS. Based on the Ordered Weighted Average (OWA) fuzzy rough set model, we express the quality of instances, and use a wrapper approach to decide which instances to select. An experimental evaluation shows that OWA-FRPS is significantly more accurate than state-of-the-art PS methods without requiring a high computational cost. Spanish Government TIN2011-2848

Crossref

Ghent University Academic Bibliography

Repositorio Institucional Universidad de Granada

Data Mining with Supervised Instance Selection Improves Artificial Neural Network Classification Accuracy

Author: S. Srinivas Reddy , et al.
Publication venue: Auricle Global Society of Education and Research
Publication date: 31/08/2023

Intrusion detection systems (IDSs) may monitor intrusion logs, traffic control packets, and attacks. Networks generate large amounts of data. IDS log characteristics are used to determine whether a record or connection is an attack or regular network activity. Reducing the feature size aids machine learning classification. This paper describes a standardised and systematic intrusion detection classification approach. Using dataset signatures, the Naive Bayes, Random Tree, and Neural Network classifiers are assessed. We examine the feature reduction efficacy of PCA and the Fisher score in this study. The first round of testing uses a reduced dataset without decreasing the components set, and the second uses principal components analysis. PCA boosts classification accuracy by 1.66 percent. Artificial immune systems, inspired by the human immune system, use learning, long-term memory, and association to recognise and classify. The Artificial Neural Network (ANN) classifier model and its development issues are introduced. Experiments on the Iris and Wine data from the UCI learning repository show that the ANN approach works. The role of dimension reduction in ANN-based classifiers is examined. Detailed mutual information-based feature selection methods are provided. Simulations on KDD Cup'99 demonstrate the method's efficacy. Classifying big data is important for tackling most engineering, health, science, and business challenges. Labelled data samples train a classifier model, which classifies unlabelled data samples into numerous categories. Fuzzy logic and artificial neural networks (ANNs) are used to classify data in this dissertation.

International Journal on Recent and Innovation Trends in Computing and Communication

Profiling Instances in Noise Reduction

Author: Delany Sarah Jane
MacNamee Brian
Segata Nicola
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2012

The dependency on the quality of the training data has led to significant work in noise reduction for instance-based learning algorithms. This paper presents an empirical evaluation of current noise reduction techniques, not just from the perspective of their comparative performance, but from the perspective of investigating the types of instances that they focus on for removal. A novel instance profiling technique known as RDCL profiling allows the structure of a training set to be analysed at the instance level, categorising each instance based on modelling their local competence properties. This profiling approach offers the opportunity of investigating the types of instances removed by the noise reduction techniques that are currently in use in instance-based learning. The paper also considers the effect of removing instances with specific profiles from a dataset and shows that a very simple approach of removing instances that are misclassified by the training set and cause other instances in the dataset to be misclassified is an effective noise reduction technique.
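The "very simple approach" in the last sentence can be sketched as follows. This is an illustration under stated assumptions, not the paper's RDCL profiling: it uses leave-one-out 1-NN with squared Euclidean distance, treats an instance as "causing" a misclassification when it is the nearest neighbour of a misclassified instance, and the function name `noisy_instances_1nn` is hypothetical.

```python
def noisy_instances_1nn(instances, labels):
    """Indices of instances that are both misclassified and harmful.

    Assumption: 1-NN, leave-one-out. An instance is 'misclassified' if its
    nearest neighbour has a different label, and 'harmful' if it is the
    nearest neighbour of some misclassified instance.
    """
    n = len(instances)

    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # Nearest neighbour of each instance, excluding the instance itself.
    nearest = [
        min((k for k in range(n) if k != i),
            key=lambda k, i=i: dist2(instances[i], instances[k]))
        for i in range(n)
    ]
    misclassified = {i for i in range(n) if labels[nearest[i]] != labels[i]}
    harmful = {nearest[i] for i in misclassified}
    return sorted(misclassified & harmful)


# One-dimensional toy data; index 2 carries a label that disagrees
# with its surroundings.
points = [(0.25,), (0.30,), (0.40,), (0.52,), (5.0,), (5.1,)]
labels = [0, 0, 1, 0, 1, 1]
print(noisy_instances_1nn(points, labels))
```

On this toy data only index 2 satisfies both conditions: index 3 is misclassified but harms no one, and index 1 harms index 2 but is itself correctly classified.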

Arrow@TUDublin

Instance selection of linear complexity for big data

Over recent decades, database sizes have grown considerably. Larger sizes present new challenges, because machine learning algorithms are not prepared to process such large volumes of information. Instance selection methods can alleviate this problem when the size of the data set is medium to large. However, even these methods face similar problems with very large-to-massive data sets. In this paper, two new algorithms with linear complexity for instance selection purposes are presented. Both algorithms use locality-sensitive hashing to find similarities between instances. While the complexity of conventional methods (usually quadratic, O(n^2), or log-linear, O(n log n)) means that they are unable to process large-sized data sets, the new proposal shows competitive results in terms of accuracy. Even more remarkably, it shortens execution time, as the proposal manages to reduce complexity and make it linear with respect to the data set size. The new proposal has been compared with some of the best known instance selection methods for testing and has also been evaluated on large data sets (up to a million instances). Supported by the Research Projects TIN 2011-24046 and TIN 2015-67534-P from the Spanish Ministry of Economy and Competitiveness.
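A rough sketch of how locality-sensitive hashing can give a single-pass, linear-time selection is shown below. It is not a reproduction of the two algorithms in the paper: the random-projection hash family, the rule "keep the first instance of each class per bucket", and the names `lsh_instance_selection`, `n_hashes` and `width` are assumptions for illustration.

```python
import math
import random


def lsh_instance_selection(instances, labels, n_hashes=4, width=1.0, seed=0):
    """Keep the first instance of each class seen in each LSH bucket.

    Assumed hash family: h(x) = floor((a . x + b) / width), with a drawn
    from a standard Gaussian and b uniform in [0, width). For a fixed
    number of hash functions, each instance is hashed once, so the pass
    is linear in the number of instances.
    """
    rng = random.Random(seed)
    dim = len(instances[0])
    funcs = [
        ([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
        for _ in range(n_hashes)
    ]
    seen = set()      # (bucket key, class label) pairs already represented
    selected = []     # indices of retained instances
    for i, (x, y) in enumerate(zip(instances, labels)):
        key = tuple(
            math.floor((sum(a * v for a, v in zip(vec, x)) + b) / width)
            for vec, b in funcs
        )
        if (key, y) not in seen:
            seen.add((key, y))
            selected.append(i)
    return selected


data = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (100.0, 100.0)]
kept = lsh_instance_selection(data, [0, 0, 1, 0])
print(kept)
```

Identical instances always share a bucket, so the duplicate at index 1 is dropped, while index 2 survives because it is the first instance of its class in that bucket.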

Elsevier - Publisher Connector

Crossref

Repositorio Institucional de la Universidad de Burgos

Hit Miss Networks with Applications to Instance Selection

Author: Marchiori E.
Publication date: 01/01/2008

Contains fulltext: 84542.pdf (publisher's version) (Open Access), 21 p.

CiteSeerX

Radboud Repository

Estudio de métodos de selección de instancias

Author: Arnaiz González Álvar
Publication venue: 'Universidad de Burgos'
Publication date: 01/01/2018

This thesis presents a study of instance selection techniques, analysing the state of the art and developing new methods to cover areas that had not received due attention so far. The first two chapters present new instance selection methods for regression, a topic little studied in the literature to date. The third chapter studies how combining instance selection algorithms for regression yields better results than the individual methods on their own. The last chapter presents a novel idea: using locality-sensitive hash functions to design two new instance selection algorithms for classification. The advantage of this solution is that both algorithms have linear complexity. The results of this thesis have been published in four articles in first-quartile JCR journals. Funding: Ministerio de Economía, Industria y Competitividad, the Junta de Castilla y León and the European Regional Development Fund, projects TIN 2011-24046, TIN 2015-67534-P (MINECO/FEDER) and BU085P17 (JCyL/FEDER).

Crossref

Repositorio Institucional de la Universidad de Burgos

Fuzzy rough and evolutionary approaches to instance selection

Author: Verbiest Nele
Publication venue: Ghent University. Faculty of Sciences
Publication date: 01/01/2014

Ghent University Academic Bibliography