7 research outputs found
Streamed Sampling on Dynamic data as Support for Classification Model
Data mining on dynamically changing data faces several problems, such as unknown data size and a changing class distribution. Random sampling is commonly applied to extract a general synopsis from a very large database. In this research, Vitter's reservoir algorithm is used to retrieve k records from the database and place them into the sample, which then serves as input for a classification task in data mining. The sample is a backing sample, saved as a table containing id, priority, and timestamp values; the priority indicates the probability of how long a record is retained in the sample. Kullback-Leibler divergence is applied to measure the similarity between the database and sample class distributions. The results show that samples can be drawn randomly and continuously as transactions occur. Kullback-Leibler divergence in the interval from 0 to 0.0001 is a very good measure for maintaining a similar class distribution between database and sample. The sample stays up to date under new transactions while preserving a similar class distribution, and a classifier built from a balanced class distribution shows better performance than one built from an imbalanced distribution.
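The abstract above combines two standard building blocks: reservoir sampling to keep a fixed-size uniform sample of a stream of unknown length, and KL divergence to compare the class distributions of database and sample. A minimal sketch of both is below, using the basic Algorithm R variant of reservoir sampling as a stand-in (the paper's exact Vitter variant, priority scheme, and data are not reproduced here):

```python
import math
import random

def reservoir_sample(stream, k, rng=None):
    """Algorithm R: maintain a uniform random sample of k records from a
    stream of unknown length (the simplest reservoir-sampling variant)."""
    rng = rng or random.Random(0)
    reservoir = []
    for i, record in enumerate(stream):
        if i < k:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)   # record i+1 is kept with probability k/(i+1)
            if j < k:
                reservoir[j] = record
    return reservoir

def class_distribution(labels):
    """Empirical class distribution of a list of class labels."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    total = len(labels)
    return {c: n / total for c, n in counts.items()}

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) between two discrete
    distributions given as {class: probability} dicts."""
    # q.get(..., 1e-12) guards against classes absent from the sample
    return sum(p[c] * math.log(p[c] / q.get(c, 1e-12)) for c in p if p[c] > 0)

# Toy stream of class labels standing in for database transactions.
stream = ["A"] * 500 + ["B"] * 500
random.Random(1).shuffle(stream)

sample = reservoir_sample(stream, 100)
d = kl_divergence(class_distribution(stream), class_distribution(sample))
print(round(d, 4))  # near zero when the sample's class mix tracks the database's
```

As in the abstract, a small divergence (here, thresholded at 0.0001 in the paper) signals that the sample still mirrors the database's class distribution and can keep feeding the classifier.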
Adaptive Evolutionary Clustering
In many practical applications of clustering, the objects to be clustered
evolve over time, and a clustering result is desired at each time step. In such
applications, evolutionary clustering typically outperforms traditional static
clustering by producing clustering results that reflect long-term trends while
being robust to short-term variations. Several evolutionary clustering
algorithms have recently been proposed, often by adding a temporal smoothness
penalty to the cost function of a static clustering method. In this paper, we
introduce a different approach to evolutionary clustering by accurately
tracking the time-varying proximities between objects followed by static
clustering. We present an evolutionary clustering framework that adaptively
estimates the optimal smoothing parameter using shrinkage estimation, a
statistical approach that improves a naive estimate using additional
information. The proposed framework can be used to extend a variety of static
clustering algorithms, including hierarchical, k-means, and spectral
clustering, into evolutionary clustering algorithms. Experiments on synthetic
and real data sets indicate that the proposed framework outperforms static
clustering and existing evolutionary clustering algorithms in many scenarios.Comment: To appear in Data Mining and Knowledge Discovery, MATLAB toolbox
available at http://tbayes.eecs.umich.edu/xukevin/affec
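The core idea of the framework above is to smooth the noisy proximity matrix at each time step toward the previous smoothed matrix before running any static clustering on it. The sketch below illustrates that convex-combination step; the `shrinkage_alpha` heuristic is a hypothetical stand-in chosen for illustration, not the paper's actual shrinkage estimator:

```python
def smooth_proximities(psi_prev, W_t, alpha):
    """One smoothing step: psi_t = alpha * psi_{t-1} + (1 - alpha) * W_t,
    where W_t is the current (noisy) proximity matrix."""
    n = len(W_t)
    return [[alpha * psi_prev[i][j] + (1 - alpha) * W_t[i][j]
             for j in range(n)] for i in range(n)]

def shrinkage_alpha(psi_prev, W_t, noise_var):
    """Toy adaptive choice of the smoothing weight: shrink harder toward the
    past when the apparent change between steps is small relative to the
    assumed noise level. (Illustrative only; the paper derives its own
    shrinkage estimator.)"""
    n = len(W_t)
    change = sum((W_t[i][j] - psi_prev[i][j]) ** 2
                 for i in range(n) for j in range(n)) / (n * n)
    return noise_var / (noise_var + change)  # lies in (0, 1]

# Two-node example: yesterday's smoothed proximities vs. today's noisy ones.
psi_prev = [[1.0, 0.9], [0.9, 1.0]]
W_t = [[1.0, 0.3], [0.3, 1.0]]
alpha = shrinkage_alpha(psi_prev, W_t, noise_var=0.1)
psi_t = smooth_proximities(psi_prev, W_t, alpha)
```

The smoothed matrix `psi_t` can then be handed to any static method that accepts proximities (hierarchical, k-means on its rows, or spectral clustering), which is what lets the framework extend those algorithms to the evolutionary setting.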
Computational Methods for Learning and Inference on Dynamic Networks.
Networks are ubiquitous in science, serving as a natural representation for many complex physical, biological, and social phenomena. Significant efforts have been dedicated to analyzing such network representations to reveal their structure and provide some insight towards the phenomena of interest. Computational methods for analyzing networks have typically been designed for static networks, which cannot capture the time-varying nature of many complex phenomena.
In this dissertation, I propose new computational methods for machine learning and statistical inference on dynamic networks with time-evolving structures. Specifically, I develop methods for visualization, tracking, clustering, and prediction of dynamic networks. The proposed methods take advantage of the dynamic nature of the network by intelligently combining observations at multiple time steps. This involves the development of novel statistical models and state-space representations of dynamic networks. Using the methods proposed in this dissertation, I identify long-term trends and structural changes in a variety of dynamic network data sets including a social network of spammers and a network of physical proximity among employees and students at a university campus.
PHD, Electrical Engineering-Systems, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/94022/1/xukevin_1.pd
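To make "state-space representations of dynamic networks" concrete, here is a minimal sketch of the general idea on a single edge: treat the true edge weight as a hidden state following a random walk and filter its noisy per-snapshot observations with a scalar Kalman filter. This is an illustrative toy, not the dissertation's actual models:

```python
def kalman_track(observations, q=0.01, r=0.25):
    """Track a drifting edge weight under the random-walk state-space model
        x_t = x_{t-1} + w_t,  w_t ~ N(0, q)   (hidden true weight)
        y_t = x_t + v_t,      v_t ~ N(0, r)   (noisy snapshot observation)
    and return the filtered estimate of x_t after each observation."""
    x, p = observations[0], 1.0        # initial state estimate and variance
    estimates = []
    for y in observations:
        p = p + q                      # predict: variance grows by process noise
        k = p / (p + r)                # Kalman gain
        x = x + k * (y - x)            # update toward the new observation
        p = (1 - k) * p
        estimates.append(x)
    return estimates

# Noisy snapshots of one edge's weight across five time steps.
filtered = kalman_track([0.5, 0.8, 0.4, 0.6, 0.5])
```

Combining observations across time steps this way is what lets the filtered trajectory separate long-term structural trends from short-term observation noise, the same intuition the dissertation applies at the whole-network scale.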
Mining Naturally Smooth Evolution of Clusters from Dynamic Data
Many clustering algorithms have been proposed to partition a set of static data points into groups. In this paper, we consider an evolutionary clustering problem where the input data points may move, disappear, and emerge. Generally, these changes should result in a smooth evolution of the clusters. Mining this naturally smooth evolution is valuable for providing an aggregated view of the numerous individual behaviors. We solve this novel and generalized form of the clustering problem by converting it into a Bayesian learning problem. Analogous to the way the EM clustering algorithm clusters static data points by learning a Gaussian mixture model, our method mines the evolution of clusters from dynamic data points by learning a hidden semi-Markov model (HSMM). By utilizing characteristic