
    Fast k-means based on KNN Graph

    In the era of big data, k-means clustering has been widely adopted as a basic processing tool in various contexts. However, its computational cost can become prohibitively high when both the data size and the number of clusters are large. It is well known that the processing bottleneck of k-means lies in seeking the closest centroid for each sample in every iteration. In this paper, a novel solution to the scalability issue of k-means is presented. In the proposal, k-means is supported by an approximate k-nearest neighbor graph. In each k-means iteration, each data sample is compared only to the clusters in which its nearest neighbors reside. Since the number of nearest neighbors considered is much smaller than k, the processing cost of this step becomes minor and independent of k, which removes the bottleneck. Notably, the k-nearest neighbor graph is itself constructed by iteratively calling the fast k-means. Compared with existing fast k-means variants, the proposed algorithm achieves speed-ups of hundreds to thousands of times while maintaining high clustering quality. When tested on 10 million 512-dimensional samples, it takes only 5.2 hours to produce 1 million clusters. In contrast, traditional k-means would take 3 years to perform clustering at the same scale.
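    The restricted assignment step can be illustrated with a small Python sketch. It is not the authors' implementation and rests on several assumptions: the KNN graph is given as a list of neighbor indices per sample (built here by brute force as a stand-in for the approximate graph the paper constructs with fast k-means itself), centroids are updated in batch Lloyd style, and the names `knn_graph_kmeans` and `brute_force_knn` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_knn(data: List[Vector], m: int) -> List[List[int]]:
    """Exact m-nearest-neighbor lists (hypothetical stand-in for an approximate KNN graph)."""
    graph = []
    for i, x in enumerate(data):
        others = sorted((j for j in range(len(data)) if j != i),
                        key=lambda j: sq_dist(x, data[j]))
        graph.append(others[:m])
    return graph


def knn_graph_kmeans(data: List[Vector], knn: List[List[int]],
                     labels: List[int], max_iters: int = 20
                     ) -> Tuple[List[int], List[List[float]]]:
    """Lloyd-style k-means whose assignment step scans only candidate clusters."""
    k = max(labels) + 1
    dim = len(data[0])
    labels = list(labels)
    centroids: List[List[float]] = [[0.0] * dim for _ in range(k)]
    for _ in range(max_iters):
        # Update step: each centroid becomes the mean of its current members.
        sums = [[0.0] * dim for _ in range(k)]
        counts = [0] * k
        for x, c in zip(data, labels):
            counts[c] += 1
            for d in range(dim):
                sums[c][d] += x[d]
        for c in range(k):
            if counts[c]:
                centroids[c] = [s / counts[c] for s in sums[c]]
        # Assignment step: candidates are the sample's own cluster plus the
        # clusters its neighbors belong to, so the cost per sample depends on
        # len(knn[i]) rather than on k. Every candidate cluster is non-empty.
        new_labels = []
        for i, x in enumerate(data):
            candidates = {labels[i]} | {labels[j] for j in knn[i]}
            new_labels.append(min(candidates, key=lambda c: sq_dist(x, centroids[c])))
        if new_labels == labels:
            break  # converged: no sample changed cluster
        labels = new_labels
    return labels, centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    knn = brute_force_knn(data, m=2)
    labels, _ = knn_graph_kmeans(data, knn, [0, 0, 1, 1, 1, 1])
    print(labels)  # [0, 0, 0, 1, 1, 1]
```

    In the usage example, the sample at (1, 0) starts in the wrong cluster, but because its neighbors sit in cluster 0, that cluster enters its candidate set and the sample moves there. A sample can only ever move to a cluster that one of its neighbors already occupies, which is what makes the step cheap.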

    Mathematical approaches to digital color image denoising

    Many mathematical models have been designed to remove noise from images. Most of them focus on grayscale images with additive artificial noise; very few specifically target natural color photos taken by a digital camera with real noise. Noise in natural color photos has special characteristics that differ substantially from those of artificially added noise. In this thesis, previous denoising models are reviewed. We analyze the strengths and weaknesses of existing denoising models by showing where they perform well and where they do not. We put special focus on two models: the steering kernel regression model and the non-local model. For the kernel regression model, an adaptive bilateral filter is introduced as a complement to enhance it. A non-local bilateral filter is also proposed as an application of the idea behind the non-local means filter. The thesis then proposes the idea of cross-channel denoising, which exploits the characteristics of digital noise in natural color images and is effective in denoising monochromatic images. A non-traditional color space is also introduced specifically for this purpose. The cross-channel paradigm can be applied to most existing models to greatly improve their performance in denoising natural color images.
    Ph.D. Committee Chair: Haomin Zhou; Committee Member: Luca Dieci; Committee Member: Ronghua Pan; Committee Member: Sung Ha Kang; Committee Member: Yang Wan
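    For reference, the classic (non-adaptive) bilateral filter that the adaptive and non-local variants build on can be sketched in Python as below. This is the standard filter, not the thesis's adaptive or non-local version; the grayscale list-of-lists image type and the parameter names `radius`, `sigma_s`, and `sigma_r` are illustrative assumptions.

```python
import math
from typing import List

Image = List[List[float]]  # assumed: grayscale intensities, e.g. in [0, 1]


def bilateral_filter(img: Image, radius: int = 2, sigma_s: float = 2.0,
                     sigma_r: float = 0.1) -> Image:
    """Standard bilateral filter: weights combine spatial and intensity closeness."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            center = img[y][x]
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if not (0 <= yy < h and 0 <= xx < w):
                        continue  # skip window positions outside the image
                    v = img[yy][xx]
                    # Spatial weight: nearby pixels count more.
                    ws = math.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
                    # Range weight: pixels with similar intensity count more,
                    # which is what preserves edges.
                    wr = math.exp(-((v - center) ** 2) / (2 * sigma_r ** 2))
                    num += ws * wr * v
                    den += ws * wr
            out[y][x] = num / den  # den > 0: the center pixel has weight 1
    return out


if __name__ == "__main__":
    step = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    smoothed = bilateral_filter(step)
    print([round(v, 3) for v in smoothed[0]])
```

    With a small `sigma_r`, pixels across the step edge receive negligible range weight, so the edge survives while small intensity fluctuations within a flat region are averaged out.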