7 research outputs found
Streamed Sampling on Dynamic data as Support for Classification Model
Data mining on dynamically changing data faces several problems, such as unknown data size and a changing class distribution. Random sampling is commonly applied to extract a general synopsis from a very large database. In this research, Vitter's reservoir algorithm is used to retrieve k records from the database and place them into the sample, which then serves as input for a classification task in data mining. The sample is a backing sample, saved as a table containing id, priority, and timestamp values; the priority indicates the probability of how long a record is retained in the sample. Kullback-Leibler divergence is applied to measure the similarity between the database and sample class distributions. The results show that samples can be drawn randomly and continuously as transactions occur. Kullback-Leibler divergence in the interval from 0 to 0.0001 is a very good measure for maintaining a similar class distribution between database and sample. The sample stays up to date under new transactions while preserving a similar class distribution, and a classifier built from a balanced class distribution shows better performance than one built from an imbalanced distribution.
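The abstract above combines two standard building blocks: reservoir sampling to keep a fixed-size uniform sample of a stream of unknown length, and KL divergence to compare the class distributions of database and sample. A minimal sketch of both is below, using the basic Algorithm R variant of reservoir sampling as a stand-in (the paper's exact Vitter variant, priority scheme, and data are not reproduced here):

```python
import math
import random

def reservoir_sample(stream, k, rng=None):
    """Algorithm R: maintain a uniform random sample of k records from a
    stream of unknown length (the simplest reservoir-sampling variant)."""
    rng = rng or random.Random(0)
    reservoir = []
    for i, record in enumerate(stream):
        if i < k:
            reservoir.append(record)
        else:
            j = rng.randint(0, i)   # record i+1 is kept with probability k/(i+1)
            if j < k:
                reservoir[j] = record
    return reservoir

def class_distribution(labels):
    """Empirical class distribution of a list of class labels."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    total = len(labels)
    return {c: n / total for c, n in counts.items()}

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) between two discrete
    distributions given as {class: probability} dicts."""
    # q.get(..., 1e-12) guards against classes absent from the sample
    return sum(p[c] * math.log(p[c] / q.get(c, 1e-12)) for c in p if p[c] > 0)

# Toy stream of class labels standing in for database transactions.
stream = ["A"] * 500 + ["B"] * 500
random.Random(1).shuffle(stream)

sample = reservoir_sample(stream, 100)
d = kl_divergence(class_distribution(stream), class_distribution(sample))
print(round(d, 4))  # near zero when the sample's class mix tracks the database's
```

As in the abstract, a small divergence (here, thresholded at 0.0001 in the paper) signals that the sample still mirrors the database's class distribution and can keep feeding the classifier.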
Adaptive Evolutionary Clustering
In many practical applications of clustering, the objects to be clustered
evolve over time, and a clustering result is desired at each time step. In such
applications, evolutionary clustering typically outperforms traditional static
clustering by producing clustering results that reflect long-term trends while
being robust to short-term variations. Several evolutionary clustering
algorithms have recently been proposed, often by adding a temporal smoothness
penalty to the cost function of a static clustering method. In this paper, we
introduce a different approach to evolutionary clustering by accurately
tracking the time-varying proximities between objects followed by static
clustering. We present an evolutionary clustering framework that adaptively
estimates the optimal smoothing parameter using shrinkage estimation, a
statistical approach that improves a naive estimate using additional
information. The proposed framework can be used to extend a variety of static
clustering algorithms, including hierarchical, k-means, and spectral
clustering, into evolutionary clustering algorithms. Experiments on synthetic
and real data sets indicate that the proposed framework outperforms static
clustering and existing evolutionary clustering algorithms in many scenarios.Comment: To appear in Data Mining and Knowledge Discovery, MATLAB toolbox
available at http://tbayes.eecs.umich.edu/xukevin/affec
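The core idea of the framework above is to smooth the noisy proximity matrix at each time step toward the previous smoothed matrix before running any static clustering on it. The sketch below illustrates that convex-combination step; the `shrinkage_alpha` heuristic is a hypothetical stand-in chosen for illustration, not the paper's actual shrinkage estimator:

```python
def smooth_proximities(psi_prev, W_t, alpha):
    """One smoothing step: psi_t = alpha * psi_{t-1} + (1 - alpha) * W_t,
    where W_t is the current (noisy) proximity matrix."""
    n = len(W_t)
    return [[alpha * psi_prev[i][j] + (1 - alpha) * W_t[i][j]
             for j in range(n)] for i in range(n)]

def shrinkage_alpha(psi_prev, W_t, noise_var):
    """Toy adaptive choice of the smoothing weight: shrink harder toward the
    past when the apparent change between steps is small relative to the
    assumed noise level. (Illustrative only; the paper derives its own
    shrinkage estimator.)"""
    n = len(W_t)
    change = sum((W_t[i][j] - psi_prev[i][j]) ** 2
                 for i in range(n) for j in range(n)) / (n * n)
    return noise_var / (noise_var + change)  # lies in (0, 1]

# Two-node example: yesterday's smoothed proximities vs. today's noisy ones.
psi_prev = [[1.0, 0.9], [0.9, 1.0]]
W_t = [[1.0, 0.3], [0.3, 1.0]]
alpha = shrinkage_alpha(psi_prev, W_t, noise_var=0.1)
psi_t = smooth_proximities(psi_prev, W_t, alpha)
```

The smoothed matrix `psi_t` can then be handed to any static method that accepts proximities (hierarchical, k-means on its rows, or spectral clustering), which is what lets the framework extend those algorithms to the evolutionary setting.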
Computational Methods for Learning and Inference on Dynamic Networks.
Networks are ubiquitous in science, serving as a natural representation for many complex physical, biological, and social phenomena. Significant efforts have been dedicated to analyzing such network representations to reveal their structure and provide some insight towards the phenomena of interest. Computational methods for analyzing networks have typically been designed for static networks, which cannot capture the time-varying nature of many complex phenomena.
In this dissertation, I propose new computational methods for machine learning and statistical inference on dynamic networks with time-evolving structures. Specifically, I develop methods for visualization, tracking, clustering, and prediction of dynamic networks. The proposed methods take advantage of the dynamic nature of the network by intelligently combining observations at multiple time steps. This involves the development of novel statistical models and state-space representations of dynamic networks. Using the methods proposed in this dissertation, I identify long-term trends and structural changes in a variety of dynamic network data sets including a social network of spammers and a network of physical proximity among employees and students at a university campus.
PHD, Electrical Engineering-Systems, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/94022/1/xukevin_1.pd
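To make "state-space representations of dynamic networks" concrete, here is a minimal sketch of the general idea on a single edge: treat the true edge weight as a hidden state following a random walk and filter its noisy per-snapshot observations with a scalar Kalman filter. This is an illustrative toy, not the dissertation's actual models:

```python
def kalman_track(observations, q=0.01, r=0.25):
    """Track a drifting edge weight under the random-walk state-space model
        x_t = x_{t-1} + w_t,  w_t ~ N(0, q)   (hidden true weight)
        y_t = x_t + v_t,      v_t ~ N(0, r)   (noisy snapshot observation)
    and return the filtered estimate of x_t after each observation."""
    x, p = observations[0], 1.0        # initial state estimate and variance
    estimates = []
    for y in observations:
        p = p + q                      # predict: variance grows by process noise
        k = p / (p + r)                # Kalman gain
        x = x + k * (y - x)            # update toward the new observation
        p = (1 - k) * p
        estimates.append(x)
    return estimates

# Noisy snapshots of one edge's weight across five time steps.
filtered = kalman_track([0.5, 0.8, 0.4, 0.6, 0.5])
```

Combining observations across time steps this way is what lets the filtered trajectory separate long-term structural trends from short-term observation noise, the same intuition the dissertation applies at the whole-network scale.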
Mining Naturally Smooth Evolution of Clusters from Dynamic Data
Many clustering algorithms have been proposed to partition a set of static data points into groups. In this paper, we consider an evolutionary clustering problem where the input data points may move, disappear, and emerge. Generally, these changes should result in a smooth evolution of the clusters. Mining this naturally smooth evolution is valuable for providing an aggregated view of the numerous individual behaviors. We solve this novel and generalized form of the clustering problem by converting it into a Bayesian learning problem. Analogous to the way the EM clustering algorithm clusters static data points by learning a Gaussian mixture model, our method mines the evolution of clusters from dynamic data points by learning a hidden semi-Markov model (HSMM). By utilizing characteristic