Online Spectral Clustering on Network Streams
Graphs are an extremely useful representation of a wide variety of practical systems in data analysis. Recently, with the fast accumulation of stream data from various types of networks, significant research interest has arisen in spectral clustering for network streams (or evolving networks). Compared with the general spectral clustering problem, the analysis of this new type of problem may have additional requirements, such as short processing time, scalability in distributed computing environments, and temporal variation tracking. Designing a spectral clustering method that satisfies these requirements is non-trivial, and the new algorithm design faces three major challenges. The first challenge is online clustering computation. Most existing spectral methods for evolving networks are off-line methods that use standard eigensystem solvers such as the Lanczos method and must recompute solutions from scratch at each time point. The second challenge is the parallelization of algorithms. Parallelizing such algorithms is non-trivial because standard eigensolvers are iterative and the number of iterations cannot be predetermined. The third challenge is the very limited existing work. In addition, the existing method has multiple limitations, such as computational inefficiency on large similarity changes, the lack of a sound theoretical basis, and the lack of an effective way to handle accumulated approximation errors and large data variations over time. In this thesis, we proposed a new online spectral graph clustering approach with a family of three novel spectrum approximation algorithms. Our algorithms incrementally update the eigenpairs in an online manner to improve computational performance. Our approaches outperformed the existing method in computational efficiency and scalability while retaining competitive or even better clustering accuracy.
We derived our spectrum approximation techniques GEPT and EEPT through formal theoretical analysis. The well-established matrix perturbation theory forms a solid theoretical foundation for our online clustering method. We equipped our clustering method with a new metric to track accumulated approximation errors and measure short-term temporal variation. The metric not only provides a balance between computational efficiency and clustering accuracy, but also offers a useful tool for adapting the online algorithm to unexpected drastic noise. In addition, we discussed our preliminary work on approximate graph mining with an evolutionary process, non-stationary Bayesian Network structure learning from non-stationary time series data, and Bayesian Network structure learning with text priors imposed by non-parametric hierarchical topic modeling.
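The abstract does not spell out GEPT or EEPT, so the following is only a sketch of the general idea it describes: updating eigenpairs incrementally with matrix perturbation theory instead of re-solving from scratch. It uses the textbook first-order perturbation formulas for a symmetric matrix with distinct eigenvalues. The function name `first_order_update` and its parameters are assumptions made for illustration, not names from the thesis.

```python
import numpy as np

def first_order_update(eigvals, eigvecs, delta):
    """Approximate the eigenpairs of A + delta from those of a symmetric A.

    Textbook first-order perturbation theory (NOT the thesis's GEPT/EEPT):
      lambda_i' ~= lambda_i + v_i^T delta v_i
      v_i'      ~= v_i + sum_{j != i} (v_j^T delta v_i) / (lambda_i - lambda_j) * v_j
    Assumes distinct eigenvalues and a small symmetric perturbation.
    """
    # m[j, i] = v_j^T delta v_i: the perturbation expressed in the eigenbasis.
    m = eigvecs.T @ delta @ eigvecs
    new_vals = eigvals + np.diag(m)

    # gaps[j, i] = lambda_i - lambda_j; infinite on the diagonal so j == i drops out.
    gaps = eigvals[None, :] - eigvals[:, None]
    np.fill_diagonal(gaps, np.inf)
    coeffs = m / gaps

    # Column i of the result is v_i plus its first-order correction.
    new_vecs = eigvecs + eigvecs @ coeffs
    new_vecs /= np.linalg.norm(new_vecs, axis=0)
    return new_vals, new_vecs
```

Because the error of a first-order update grows with the size of the perturbation and shrinks with the eigenvalue gaps, an online method of this kind would plausibly need to fall back to a full eigendecomposition once accumulated error becomes large, which matches the role the abstract gives to its error-tracking metric.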
Fast Color Quantization Using Weighted Sort-Means Clustering
Color quantization is an important operation with numerous applications in
graphics and image processing. Most quantization methods are essentially based
on data clustering algorithms. However, despite its popularity as a general
purpose clustering algorithm, k-means has not received much respect in the
color quantization literature because of its high computational requirements
and sensitivity to initialization. In this paper, a fast color quantization
method based on k-means is presented. The method involves several modifications
to the conventional (batch) k-means algorithm including data reduction, sample
weighting, and the use of the triangle inequality to speed up the nearest neighbor
search. Experiments on a diverse set of images demonstrate that, with the
proposed modifications, k-means becomes very competitive with state-of-the-art
color quantization methods in terms of both effectiveness and efficiency. Comment: 30 pages, 2 figures, 4 tables
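Two of the modifications named in the abstract, data reduction and sample weighting, can be sketched as follows. This is a minimal illustration under assumptions, not the paper's implementation: it collapses duplicate colors, weights each unique color by its pixel count, and runs batch k-means on the reduced set. The triangle-inequality speed-up is omitted, the initialization is a plain random choice, and the name `weighted_kmeans_palette` is hypothetical.

```python
import numpy as np

def weighted_kmeans_palette(pixels, k, iters=20, seed=0):
    """Return (palette, labels_of_unique_colors) for an array of RGB pixels.

    Sketch only: requires k <= number of unique colors; random initialization
    is an assumption, not the scheme used in the paper.
    """
    # Data reduction: keep each distinct color once, remember how often it occurs.
    colors, counts = np.unique(pixels.reshape(-1, 3), axis=0, return_counts=True)
    colors = colors.astype(np.float64)
    weights = counts.astype(np.float64)

    rng = np.random.default_rng(seed)
    centers = colors[rng.choice(len(colors), size=k, replace=False)]

    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        d2 = ((colors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center becomes the count-weighted mean of its colors.
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = np.average(colors[mask], axis=0, weights=weights[mask])

    # Final assignment against the last set of centers.
    d2 = ((colors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)
```

The weighted mean over unique colors gives the same centers as an unweighted mean over all pixels, so the reduction changes the cost of each iteration but not the result of the update step.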
A Study on Clustering for Clustering Based Image De-Noising
In this paper, the problem of de-noising an image contaminated with
Additive White Gaussian Noise (AWGN) is studied. This has been an open
problem in signal processing for more than 50 years. Local methods suggested in
recent years have obtained better results than global methods. However, by
training more intelligently, so that, first, important data is more effective
for training and, second, clustering places the training blocks in
low-rank subspaces, we can design a dictionary applicable to image de-noising
and obtain results near those of state-of-the-art local methods. In the present
paper, we suggest a method based on global clustering of image constructing
blocks. As the type of clustering plays an important role in clustering-based
de-noising methods, we address two questions about the clustering: first,
which parts of the data should be considered for clustering, and second,
which data clustering method is suitable for de-noising? Clustering is then
exploited to learn an overcomplete dictionary. By obtaining a sparse
decomposition of the noisy image blocks in terms of the dictionary atoms, the
de-noised version is achieved. In addition to our framework, 7 popular
dictionary learning methods are simulated and compared. The results are
compared based on two major factors: (1) de-noising performance and (2)
execution time. Experimental results show that our dictionary learning
framework outperforms its competitors in terms of both factors. Comment: 9 pages, 8 figures, Journal of Information Systems and
Telecommunications (JIST)
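The sparse decomposition step mentioned in the abstract can be illustrated with orthogonal matching pursuit (OMP). The abstract does not say which sparse coding algorithm the authors use, so OMP is a stand-in chosen here for illustration, and the function name `omp` and its parameters are assumptions. The sketch expects a dictionary whose columns (atoms) have unit norm.

```python
import numpy as np

def omp(dictionary, signal, n_nonzero):
    """Greedy sparse code of `signal` over the columns of `dictionary`.

    Stand-in for the paper's unspecified sparse decomposition step.
    Returns a coefficient vector with at most `n_nonzero` nonzero entries.
    """
    signal = np.asarray(signal, dtype=np.float64)
    residual = signal.copy()
    support = []
    coef = np.zeros(dictionary.shape[1])

    for _ in range(n_nonzero):
        # Pick the atom most correlated with what is still unexplained.
        idx = int(np.argmax(np.abs(dictionary.T @ residual)))
        if idx in support:
            break  # no new atom helps; stop early
        support.append(idx)
        # Re-fit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(dictionary[:, support], signal, rcond=None)
        residual = signal - dictionary[:, support] @ sol
        coef[:] = 0.0
        coef[support] = sol
    return coef
```

A de-noised block would then be reconstructed as `dictionary @ omp(dictionary, noisy_block, n_nonzero)`, with the sparsity level acting as the knob that trades detail against noise suppression.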