2,787 research outputs found

An Enhanced Initialization Method to Find an Initial Center for K-modes Clustering

Author: S. Saranya, Dr. P. Jayanthi
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 26/02/2017

Data mining is a technique that extracts information from large amounts of data. Clustering is used to group objects that have similar characteristics. The K-means clustering algorithm is very efficient for large data sets of numerical quantities; however, it does not work well for real-world data sets in which most attributes are categorical. In such cases the K-modes algorithm is used in place of K-means. In the existing system, the initialization of K-modes clustering is considered from the viewpoint of outlier detection, which prevents several initial cluster centers from being drawn from the same cluster. To overcome the limitation described above, it uses the Initial_Distance and Initial_Entropy algorithms, which apply a new weightage formula to calculate the degree of outlierness of each object. The K-modes algorithm can then guarantee that the chosen initial cluster centers are not outliers. To improve performance further, a new modified distance metric, the weighted matching distance, is used to calculate the distance between two objects during initialization. In addition, a data pre-processing method is used to improve the quality of the data. Experiments were carried out on several data sets from the UCI repository, and the results demonstrated the effectiveness of the initialization method in the proposed algorithm.

International Journal on Recent and Innovation Trends in Computing and Communication
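
For readers unfamiliar with K-modes, the abstract above builds on two standard ingredients: a matching distance between categorical objects and a cluster center made of per-attribute modes. The sketch below shows only that textbook baseline. It is not the paper's weighted matching distance or its Initial_Distance and Initial_Entropy algorithms, whose formulas the abstract does not give; the function names `matching_distance` and `cluster_mode` are illustrative assumptions.

```python
from collections import Counter


def matching_distance(a, b):
    """Simple matching distance: number of attributes on which a and b differ.

    This is the unweighted baseline used by standard K-modes, NOT the
    weighted variant proposed in the paper.
    """
    if len(a) != len(b):
        raise ValueError("objects must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(objects):
    """Return the mode of a cluster: the most frequent value per attribute."""
    # zip(*objects) iterates over attributes (columns) instead of objects (rows).
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*objects))


if __name__ == "__main__":
    cluster = [("red", "small"), ("red", "large"), ("blue", "large")]
    print(cluster_mode(cluster))
    print(matching_distance(("red", "small", "round"), ("red", "large", "round")))
```

In K-modes these two pieces replace the Euclidean distance and the arithmetic mean of K-means, which is why the choice of initial modes matters for categorical data.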

Outlier Detection using Boxplot-Mean Algorithm

Author: Deeksha Agrawal, Rajesh Boghey
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 28/04/2016

In this paper, we present a novel method for detecting outliers in intrusion detection systems. The proposed detection algorithm is a hybrid that combines two algorithms, k-means and boxplot. Experimental results show it to be superior to the existing SCF algorithm. Common problems in existing SCF detection techniques include ignoring dependency among categorical variables and handling data streams and mixed data sets. Moreover, identifying the number of outliers in advance is impractical in the SCF algorithm and in other outlier identification techniques. This paper investigates the performance of the boxplot-mean method for detecting different types of abnormal data. Keywords: outlier detection techniques, clustering, SCF, genetic, boxplot-mean technique

International Institute for Science, Technology and Education (IISTE): E-Journals
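
The abstract above does not describe how the boxplot step is combined with k-means, so the following is only a minimal sketch of the conventional boxplot rule on its own: values outside the fences Q1 - 1.5·IQR and Q3 + 1.5·IQR are flagged. The function name `boxplot_outliers` and the multiplier `k=1.5` are assumptions taken from the usual boxplot convention, not from the paper.

```python
import statistics


def boxplot_outliers(values, k=1.5):
    """Return the values lying outside the boxplot fences.

    Assumption: the conventional fences Q1 - k*IQR and Q3 + k*IQR with k = 1.5.
    The paper's hybrid boxplot-mean procedure is not reproduced here.
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


if __name__ == "__main__":
    print(boxplot_outliers([10, 12, 11, 13, 12, 11, 95]))
```

One appeal of this rule, in line with the abstract's criticism of SCF, is that the number of outliers does not have to be fixed in advance.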

A Method Non-Deterministic and Computationally Viable for Detecting Outliers in Large Datasets

Author: Abreu Ortega Miguel
Berna-Martinez Jose Vicente
Fernández Oliva Alberto
Maciá Pérez Francisco
Publication venue: Institute of Information Science, Academia Sinica
Publication date: 01/05/2020

This paper presents an outlier detection method that is based on a Variable Precision Rough Set Model (VPRSM). This method generalizes the standard set inclusion relation, which is the foundation of the Rough Sets Basic Model (RSBM). The main contribution of this research is an improvement in the quality of detection because this generalization allows us to classify when there is some degree of uncertainty. From the proposed method, a computationally viable algorithm for large volumes of data is also introduced. The experiments performed in a real scenario and a comparison of the results with the RSBM-based method demonstrate the efficiency of both the method and the algorithm in diverse contexts that involve large volumes of data. This work has been supported by grant TIN2016-78103-C2-2-R, and University of Alicante projects GRE14-02 and Smart University

Repositorio Institucional de la Universidad de Alicante
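
The generalized inclusion relation mentioned in the abstract above can be illustrated with the usual variable precision rough set definition: the relative misclassification error of X with respect to Y is c(X, Y) = 1 - |X ∩ Y| / |X|, and X is included in Y with precision β when c(X, Y) ≤ β, where 0 ≤ β < 0.5. Setting β = 0 recovers standard set inclusion. The sketch below shows only this relation; the paper's detection algorithm is not given in the abstract, and the function names are illustrative assumptions.

```python
def misclassification_error(x, y):
    """Relative misclassification error c(X, Y) = 1 - |X ∩ Y| / |X|.

    By convention c(X, Y) = 0 when X is empty.
    """
    x, y = set(x), set(y)
    if not x:
        return 0.0
    return 1.0 - len(x & y) / len(x)


def included_with_precision(x, y, beta):
    """True if X is included in Y up to an admissible error beta (0 <= beta < 0.5)."""
    if not 0.0 <= beta < 0.5:
        raise ValueError("beta must satisfy 0 <= beta < 0.5")
    return misclassification_error(x, y) <= beta


if __name__ == "__main__":
    x = set(range(1, 11))  # ten objects
    y = set(range(1, 10))  # nine of them also belong to y
    print(included_with_precision(x, y, beta=0.2))  # tolerant inclusion
    print(included_with_precision(x, y, beta=0.0))  # standard inclusion
```

Allowing a small error β is what lets the model classify objects when there is some degree of uncertainty, as the abstract puts it.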

Data mining based cyber-attack detection

Author: Tianfield Huaglory
Publication date: 31/05/2017

ResearchOnline@GCU

Data Stream Clustering: Challenges and Issues

Author: Khalilian Madjid
Mustapha Norwati
Publication date: 01/01/2010

Very large databases are required to store massive amounts of data that are continuously inserted and queried. Analyzing huge data sets and extracting valuable patterns is of interest to researchers in many applications. We can identify two main groups of techniques for mining huge databases. One group treats the data as a stream and applies mining techniques to it, whereas the second group attempts to solve the problem directly with efficient algorithms. Recently many researchers have focused on data streams as an efficient strategy for mining huge databases, instead of mining the entire database. The main problem in data stream mining is that evolving data is more difficult to detect with these techniques; therefore unsupervised methods should be applied. However, clustering techniques can lead us to discover hidden information. In this survey, we try to clarify: first, the different problem definitions related to data stream clustering in general; second, the specific difficulties encountered in this field of research; third, the varying assumptions, heuristics, and intuitions forming the basis of different approaches; and how several prominent solutions tackle different problems. Index Terms: Data Stream, Clustering, K-Means, Concept drift. Comment: IMECS201

arXiv.org e-Print Archive

CiteSeerX

A review of clustering techniques and developments

Author: Bharill N
Ding W
Er MJ
Gupta A
Lin CT
Patel OP
Prasad M
Saxena A
Tiwari A
Publication venue: 'Elsevier BV'
Publication date: 06/12/2017

© 2017 Elsevier B.V. This paper presents a comprehensive study of clustering: existing methods and developments made at various times. Clustering is defined as unsupervised learning in which objects are grouped on the basis of some similarity inherent among them. There are different methods for clustering objects, such as hierarchical, partitional, grid, density-based, and model-based. The approaches used in these methods are discussed together with their respective state of the art and applicability. The measures of similarity and the evaluation criteria, which are the central components of clustering, are also presented in the paper. The applications of clustering in fields such as image segmentation, object and character recognition, and data mining are highlighted.

OPUS - University of Technology Sydney