
    Unsupervised cryo-EM data clustering through adaptively constrained K-means algorithm

    In single-particle cryo-electron microscopy (cryo-EM), the K-means clustering algorithm is widely used for unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Due to background noise in single-particle images and uncertainty in molecular orientations, the traditional K-means clustering algorithm may assign images to the wrong classes and produce classes with a large variation in membership. Overcoming these limitations requires further development of clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method that builds upon the traditional K-means algorithm. By introducing an adaptive constraint term in the objective function, our algorithm not only avoids a large variation in class sizes but also produces more accurate data clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alternative to the traditional K-means algorithm in single-particle cryo-EM analysis. Comment: 35 pages, 14 figures

    Improved Dynamic Parallel K-Means Algorithm using Dunn's Index Method

    K-Means is a popular and widely used clustering technique. Much research has been done on improving the K-Means clustering algorithm, but further investigation is still required to answer important questions such as "Is it possible to find the optimal number of clusters dynamically while ignoring empty clusters?" or "Does the parallel execution of a clustering algorithm really improve its performance in terms of speedup?". This research presents an improved K-Means algorithm that calculates the number of clusters dynamically using Dunn's index and then executes the algorithm in parallel using Microsoft's Task Parallel Library. The original K-Means and the improved parallel K-Means algorithms were run on two-dimensional raw data sets containing different numbers of records. The results show that the improved K-Means performs better in all scenarios, whether the number of clusters is increased or the number of records in the raw data is changed. For the same number of input clusters and different data sets, the improved parallel K-Means is 20 to 50 percent better than the original K-Means in terms of execution time and speedup
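
    As an illustration of the Dunn's index approach described in this abstract, here is a minimal Python sketch. It is not the authors' implementation, which runs on Microsoft's Task Parallel Library. It assumes Euclidean distance and the common definition of the index: the smallest distance between points in different clusters divided by the largest cluster diameter. The function name `dunn_index` is hypothetical.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Return min inter-cluster distance / max intra-cluster diameter.

    Assumption: Euclidean distance, single-linkage separation and
    complete diameter; other variants of the index exist.
    """
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")
    groups = list(clusters.values())

    # Largest distance between two points of the same cluster.
    max_diameter = max(
        (dist(a, b) for g in groups for a, b in combinations(g, 2)),
        default=0.0,
    )
    # Smallest distance between two points of different clusters.
    min_separation = min(
        dist(a, b)
        for g, h in combinations(groups, 2)
        for a in g
        for b in h
    )
    if max_diameter == 0.0:
        return float("inf")  # every cluster is a single point
    return min_separation / max_diameter
```

    A higher value indicates more compact, better separated clusters, so one plausible way to choose the number of clusters is to run K-Means for several candidate values and keep the labeling with the highest index.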

    Unsupervised clustering approach for network anomaly detection

    This paper describes the advantages of using the anomaly detection approach over the misuse detection technique in detecting unknown network intrusions or attacks. It also investigates the performance of various clustering algorithms when applied to anomaly detection. Five different algorithms are used: k-Means, improved k-Means, k-Medoids, EM clustering and distance-based outlier detection. Our experiment shows that misuse detection techniques, implemented with four different classifiers (naïve Bayes, rule induction, decision tree and nearest neighbour), performed poorly on network traffic that contained a large number of unknown intrusions: the highest accuracy was only 63.97% and the lowest false positive rate was 17.90%. On the other hand, the anomaly detection module showed promising results, with the distance-based outlier detection algorithm outperforming the other algorithms with an accuracy of 80.15%. The accuracy for EM clustering was 78.06%, for k-Medoids it was 76.71%, for improved k-Means it was 65.40% and for k-Means it was 57.81%. Unfortunately, our anomaly detection module produces a high false positive rate (more than 20%) for all four clustering algorithms. Therefore, our future work will focus on reducing the false positive rate and improving the accuracy using more advanced machine learning techniques

    Advanced Methods to Improve Performance of K-Means Algorithm: A Review

    Clustering is an unsupervised classification, that is, the partitioning of a data set into a set of meaningful subsets. Each object in the data set shares some common property, often proximity according to some defined distance measure. Among the various types of clustering techniques, K-Means is one of the most popular algorithms. The objective of the K-Means algorithm is to make the distances between objects in the same cluster as small as possible. Algorithms, systems and frameworks that address clustering challenges have been elaborated further in recent years. In this review paper, we present the K-Means algorithm and its improved techniques
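
    The objective stated in this abstract can be illustrated with the standard Lloyd iteration, sketched below in Python. This is a minimal illustration rather than any of the improved techniques the review covers. It assumes Euclidean distance, points given as tuples, and initial centers drawn at random from the data. The names `kmeans` and `within_cluster_sse` are hypothetical.

```python
import random
from math import dist


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd iteration; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: random data points as seeds
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


def within_cluster_sse(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
```

    The iteration only reaches a local minimum of the sum of squared distances, and the result depends on the initial centers, which is one reason improved initialization schemes are a recurring topic.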

    Improvement of k-means Clustering Algorithm for Analyzing the Morphology of Ice Ridge Sails

    An improved k-means clustering algorithm is proposed after analyzing the disadvantages of the traditional k-means algorithm. The cluster centers are initialized by combining the sample mean and standard deviation, the optimal cluster centers are searched for by hybridizing particle swarm optimization with the traditional k-means algorithm, and the criterion function is improved during the iteration process to search for the optimal number of clusters. Theoretical analysis and experimental results show that the improved algorithm not only avoids local optima but also has greater searching capability than the traditional algorithm. The improved algorithm is used to analyze the morphology of the ridge sail (the upper surface of ice ridges). The comparison with the measured data shows that the influences of the geographical locations and the growing environments on the formation of ice ridges can be perfectly reflected by the clustered results