Comparative Analysis of Clustering-Based, Distance-Based and Density-Based Methods in Outlier Detection
ABSTRACT: Data mining is the process of finding interesting patterns and trends in large databases. An outlier is defined as a data point that, by some measure, differs markedly from the other points in its data set. Although outliers behave abnormally, they often carry very useful information. Outlier detection plays an important role in applications such as fraud detection, network robustness analysis, and intrusion detection. Outliers are usually found using a notion of proximity that relates each point to the rest of the data. In high-dimensional data, however, the data become sparse, and as a result the notion of proximity between points breaks down. This final project compares methods for finding outliers in high-dimensional data: clustering-based, density-based, and distance-based. Each of the methods compared supports high-dimensional data.
Keywords: data mining, outlier, outlier detection, outlier detection methods
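As a rough illustration of the distance-based idea mentioned in this abstract, the sketch below scores each point by the Euclidean distance to its k-th nearest neighbour and flags the highest score as the outlier candidate. It is not the method evaluated in the thesis; the function name `knn_distance_scores`, the value of `k`, and the sample points are assumptions made for this example.

```python
import math


def knn_distance_scores(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbour.

    A larger score means the point is farther from the rest of the data.
    Hypothetical helper for illustration only.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Four points packed together and one far away (made-up data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_distance_scores(points, k=2)

# Index of the point with the largest k-th neighbour distance.
outlier_index = max(range(len(points)), key=scores.__getitem__)
print(outlier_index, scores)
```

In high-dimensional data the scores of ordinary points and outliers tend to move closer together, which is the failure of proximity that the abstract describes.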
SOTXTSTREAM: Density-based self-organizing clustering of text streams
A streaming data clustering algorithm is presented, building upon the density-based self-organizing stream clustering algorithm SOSTREAM. Many density-based clustering algorithms are limited by their inability to identify clusters with heterogeneous density. SOSTREAM addresses this limitation through the use of local (nearest neighbor-based) density determinations. Additionally, many stream clustering algorithms use a two-phase clustering approach. In the first phase, a micro-clustering solution is maintained online, while in the second phase, the micro-clustering solution is clustered offline to produce a macro solution. By performing self-organization techniques on micro-clusters in the online phase, SOSTREAM is able to maintain a macro clustering solution in a single phase. Leveraging concepts from SOSTREAM, a new density-based self-organizing text stream clustering algorithm, SOTXTSTREAM, is presented that addresses several shortcomings of SOSTREAM. Gains in clustering performance of this new algorithm are demonstrated on several real-world text stream datasets.
Cluster Evaluation of Density Based Subspace Clustering
Clustering real-world data often runs into the curse of dimensionality,
because such data often have many dimensions. Clustering of multidimensional
data can be evaluated through a density-based approach. Density-based
approaches follow the paradigm introduced by DBSCAN clustering. In this
approach, the density of each object's neighbourhood is calculated using the
MinPoints parameter. Clusters change as the density of each object's
neighbourhood changes. The neighbours of each object are typically determined
using a distance function, for example the Euclidean distance. In this paper
the SUBCLU, FIRES and INSCY methods are applied to cluster 6x1595 dimension
synthetic datasets. IO Entropy, F1 Measure, coverage, accuracy and time
consumption are used as performance evaluation parameters. The evaluation
results show that SUBCLU requires considerable time to process subspace
clustering; however, its coverage is better. Meanwhile, INSCY is more accurate
than the other two methods, although its computation time is longer.
Comment: 6 pages, 15 figures
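The DBSCAN-style density test described in this abstract can be sketched as follows: count the neighbours of each object within a radius under the Euclidean distance, and treat the object as dense when the count reaches MinPoints. This is a minimal sketch and not the SUBCLU, FIRES or INSCY implementations; the function name `core_points`, the radius `eps`, and the sample data are assumptions for illustration.

```python
import math


def core_points(points, eps, min_points):
    """Return indices of points with at least min_points points within eps.

    The point itself is counted as part of its own neighbourhood.
    Hypothetical helper for illustration only.
    """
    core = []
    for i, p in enumerate(points):
        # Neighbourhood size under the Euclidean distance.
        neighbours = sum(1 for q in points if math.dist(p, q) <= eps)
        if neighbours >= min_points:
            core.append(i)
    return core


# Four points on a small square and one isolated point (made-up data).
points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2), (3.0, 3.0)]
dense = core_points(points, eps=0.3, min_points=3)
print(dense)
```

Subspace methods apply a test of this kind inside selected subsets of the dimensions rather than in the full space, which is where much of their running time goes.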
Fast multi-image matching via density-based clustering
We consider the problem of finding consistent matches
across multiple images. Previous state-of-the-art solutions
use constraints on cycles of matches together with convex
optimization, leading to computationally intensive iterative
algorithms. In this paper, we propose a clustering-based
formulation. We first rigorously show its equivalence with
the previous one, and then propose QuickMatch, a novel
algorithm that identifies multi-image matches from a density
function in feature space. We use the density to order the
points in a tree, and then extract the matches by breaking this
tree using feature distances and measures of distinctiveness.
Our algorithm outperforms previous state-of-the-art methods
(such as MatchALS) in accuracy, and it is significantly faster
(up to 62 times faster on some benchmarks), and can scale to
large datasets (with more than twenty thousand features).
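The general idea of ordering points in a tree by density and then breaking the tree can be sketched as below. This is a simplified density-tree clustering and not the authors' QuickMatch: it uses only a feature-distance threshold to break edges and omits the distinctiveness measures the abstract mentions. The function name `density_tree_clusters`, the Gaussian kernel, and the parameters `bandwidth` and `max_edge` are all assumptions for this example.

```python
import math


def density_tree_clusters(points, bandwidth, max_edge):
    """Group points by linking each one to its nearest denser neighbour.

    Links longer than max_edge are never created, which plays the role of
    breaking the tree. Returns one root index per point; points that share
    a root belong to the same group. Hypothetical helper for illustration.
    """
    n = len(points)
    # Gaussian kernel density estimate at every point.
    density = [
        sum(math.exp(-math.dist(p, q) ** 2 / (2 * bandwidth ** 2)) for q in points)
        for p in points
    ]
    parent = list(range(n))
    for i in range(n):
        best = None
        for j in range(n):
            # Candidate parents are strictly denser; ties are broken by index,
            # so following parents can never loop.
            if (density[j], j) > (density[i], i):
                d = math.dist(points[i], points[j])
                if d <= max_edge and (best is None or d < best[0]):
                    best = (d, j)
        if best is not None:
            parent[i] = best[1]

    def root(i):
        # Follow parent links up to the root of the tree.
        while parent[i] != i:
            i = parent[i]
        return i

    return [root(i) for i in range(n)]


# One-dimensional made-up features forming two well-separated groups.
features = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,)]
labels = density_tree_clusters(features, bandwidth=0.5, max_edge=1.0)
print(labels)
```

In a multi-image matching setting each resulting group would be read as one set of matched features; this sketch does not enforce any constraint on which image a feature comes from.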