
    Automatic Clustering with Single Optimal Solution

    Determining the optimal number of clusters in a dataset is a challenging task. Though some methods are available, no algorithm produces a unique clustering solution. The paper proposes Automatic Merging for Single Optimal Solution (AMSOS), which aims to generate unique and nearly optimal clusters for a given dataset automatically. AMSOS iteratively merges the closest clusters, validating the result with a cluster validity measure, to find a single and nearly optimal clustering for the given dataset. Experiments on both synthetic and real data show that the proposed algorithm finds a single and nearly optimal clustering structure in terms of number of clusters, compactness and separation. Comment: 13 pages, 4 tables, 3 figures
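
The merge-and-validate loop described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the AMSOS algorithm itself: the abstract does not name the validity measure or the starting partition, so the compactness/separation ratio in `validity`, the singleton initialization, and the function names are all assumptions.

```python
from itertools import combinations
from math import dist


def centroid(cluster):
    """Mean point of a cluster given as a list of coordinate tuples."""
    n = len(cluster)
    return tuple(sum(c) / n for c in zip(*cluster))


def validity(clusters):
    """Assumed stand-in index (lower is better):
    mean point-to-centroid distance divided by the smallest centroid gap."""
    cents = [centroid(c) for c in clusters]
    n_points = sum(len(c) for c in clusters)
    compact = sum(dist(p, m) for c, m in zip(clusters, cents) for p in c) / n_points
    separation = min(dist(a, b) for a, b in combinations(cents, 2))
    return float("inf") if separation == 0 else compact / separation


def merge_until_best(points, min_clusters=2):
    """Repeatedly merge the two clusters with the closest centroids and
    keep the partition that scored best on the validity index."""
    clusters = [[p] for p in points]  # assumption: start from singletons
    best_score, best = float("inf"), [list(c) for c in clusters]
    while len(clusters) > min_clusters:
        cents = [centroid(c) for c in clusters]
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(cents[ij[0]], cents[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        score = validity(clusters)
        if score < best_score:
            best_score, best = score, [list(c) for c in clusters]
    return best
```

Starting from singletons keeps the sketch short, but it makes early partitions look artificially compact. Starting instead from an over-segmented partition is one plausible alternative.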

    PFU: Profiling Forum users in online social networks, a knowledge driven data mining approach

    Online Social Networks (OSNs) provide a platform to raise opinions on various issues and to create and spread news rapidly in Online Social Network Forums (OSNFs). This work proposes a novel method for Profiling Forum Users (PFU) that dynamically explores their behavioral characteristics, based on their involvement in various topics of discussion and the number of posts they make in each topic in OSNFs. The proposed method is modeled mathematically, and the PFU algorithm is illustrated for its adequacy and accuracy.

    Balancing Exploitation And Exploration Search Behavior On Nature-Inspired Clustering Algorithms

    Nature-inspired optimization-based clustering techniques are powerful, robust and more sophisticated than conventional clustering methods due to their stochastic and heuristic characteristics. Unfortunately, these algorithms suffer from several drawbacks, such as a tendency to become trapped or stagnate in local optima and slow convergence rates. These drawbacks are consequences of the difficulty of balancing the exploration and exploitation processes, which directly affects the final quality of the clustering solutions. Hence, this research proposes three enhanced frameworks for data clustering: Optimized Gravitational-based Clustering (OGC), Density-Based Particle Swarm Optimization (DPSO), and Variance-based Differential Evolution with an Optional Crossover (VDEO). In the OGC framework, the explorative search behavior exhibited by the Gravitational Clustering (GC) algorithm is addressed by (i) eliminating the agent velocity accumulation, and (ii) integrating an initialization method for agents that uses variance and median to subrogate the exploration process. In the DPSO framework, the balance between the exploration and exploitation processes is handled using a combination of (i) a kernel density estimation technique associated with a new bandwidth estimation method and (ii) estimated multi-dimensional gravitational learning coefficients. Lastly, the VDEO framework proposes (i) a single-based solution representation, (ii) a switchable mutation scheme, (iii) a vector-based estimation of the mutation factor, and (iv) an optional crossover strategy. The overall performance of the three proposed frameworks has been compared with several current state-of-the-art clustering algorithms on 15 benchmark datasets from the UCI repository. The experimental results are also thoroughly evaluated and verified via non-parametric statistical analysis.
    Based on the experimental results, the OGC, DPSO, and VDEO frameworks achieved average improvements in classification accuracy of up to 24.36%, 9.38%, and 11.98%, respectively. All the frameworks also ranked first under the Friedman aligned-ranks (FA) test in all evaluation metrics. Moreover, the three frameworks provided convergent performance in terms of repeatability. The OGC framework achieved significant performance in terms of classification accuracy, whereas the VDEO framework achieved significant performance in terms of cluster compactness. The DPSO framework, on the other hand, favored the balanced state by producing very competitive results compared to the OGC and VDEO frameworks in both evaluation metrics. In conclusion, balancing the search behavior notably enhanced the overall performance of the three proposed frameworks and made each of them an excellent tool for data clustering.

    U-Control Chart Based Differential Evolution Clustering for Determining the Number of Cluster in k-Means

    Automatic clustering differential evolution (ACDE) is a clustering method that can determine the number of clusters automatically. However, ACDE still relies on a manual strategy to determine the k activation threshold, which affects its performance. In this study, this problem is addressed using the u-control chart (UCC), and the cluster number generated by ACDE is then fed to k-means. The performance of the proposed method was tested using six public datasets from the UCI repository about academic efficiency (AE) and evaluated with the Davies Bouldin Index (DBI) and the Cosine Similarity (CS) measure. The results show that the proposed method yields excellent performance compared to prior research.
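
For readers unfamiliar with the Davies Bouldin Index mentioned here, the following is a minimal sketch of its standard definition using Euclidean distance. It is not taken from the paper: the function names and the list-of-tuples cluster representation are assumptions, and the sketch expects at least two clusters with distinct centroids.

```python
from math import dist


def centroid(cluster):
    """Mean point of a cluster given as a list of coordinate tuples."""
    n = len(cluster)
    return tuple(sum(c) / n for c in zip(*cluster))


def davies_bouldin(clusters):
    """Davies-Bouldin index; lower values mean compact, well-separated clusters.
    Assumes at least two clusters whose centroids do not coincide."""
    cents = [centroid(c) for c in clusters]
    # Scatter of a cluster: average distance from its points to its centroid.
    scatter = [sum(dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k
```

A candidate cluster count can then be compared against alternatives by clustering with each count and preferring the partition with the lower index.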

    A Weighted Voting Classifier Based on Differential Evolution

    Ensemble learning employs multiple individual classifiers and combines their predictions, which can achieve better performance than a single classifier. Considering that different base classifiers contribute differently to the final classification result, this paper assigns greater weights to the classifiers with better performance and proposes a weighted voting approach based on differential evolution. After optimizing the weights of the base classifiers by differential evolution, the proposed method combines the results of each classifier according to the weighted voting combination rule. Experimental results show that the proposed method not only improves classification accuracy, but also has strong generalization ability and universality.
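
The idea of tuning voting weights with differential evolution can be sketched as below. The abstract does not specify the DE variant, the fitness function, or the parameter values, so the DE/rand/1/bin scheme, accuracy as fitness, weights clipped to [0, 1], and all names and defaults here are assumptions for illustration only.

```python
import random


def weighted_vote(predictions, weights):
    """Sum the weights of the classifiers voting for each label; return the top label."""
    totals = {}
    for label, w in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + w
    return max(totals, key=totals.get)


def accuracy(weights, all_preds, y_true):
    """Fraction of samples where the weighted vote matches the true label."""
    hits = sum(weighted_vote(p, weights) == y for p, y in zip(all_preds, y_true))
    return hits / len(y_true)


def de_optimize_weights(all_preds, y_true, pop_size=20, generations=50,
                        F=0.5, CR=0.9, seed=0):
    """DE/rand/1/bin over weight vectors in [0, 1]^n (assumed variant)."""
    rng = random.Random(seed)
    n = len(all_preds[0])
    # Include uniform weights so the result is never worse than plain voting.
    pop = [[1.0] * n] + [[rng.random() for _ in range(n)] for _ in range(pop_size - 1)]
    fit = [accuracy(w, all_preds, y_true) for w in pop]
    for _ in range(generations):
        for i in range(pop_size):
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            j_rand = rng.randrange(n)
            trial = []
            for j in range(n):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    trial.append(min(1.0, max(0.0, v)))  # clip to [0, 1]
                else:
                    trial.append(pop[i][j])
            f = accuracy(trial, all_preds, y_true)
            if f >= fit[i]:  # greedy selection
                pop[i], fit[i] = trial, f
    best = max(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

Each row of `all_preds` holds the labels predicted by the base classifiers for one sample. In practice the weights would be tuned on a held-out validation set rather than on the data used to report accuracy.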

    Attribute Selection Algorithm with Clustering based Optimization Approach based on Mean and Similarity Distance

    With hundreds or thousands of attributes in high-dimensional data, the computational workload is challenging. Attributes that have no meaningful influence on class predictions increase the computing load throughout the classification process. This article's goal is to use attribute selection to reduce the size of high-dimensional data, which will lessen the computational load. The approach considers selected attribute subsets that cover all attributes. As a result, there are two stages to the process: filtering out superfluous information, and settling on a single attribute to stand in for a group of similar but otherwise meaningless characteristics. Numerous studies on attribute selection, including backward and forward selection, have been undertaken. Based on this experiment and the accuracy of the classification results, a k-means-based PSO clustering approach to attribute selection is recommended. It is likely that related attributes are present in the same cluster, while irrelevant attributes are not identified in any cluster. Datasets for Credit Approval, Ionosphere, Annealing, Madelon, Isolet, and Multiple Attributes are employed alongside two other high-dimensional datasets. Both of these datasets include the class label for each data point. Our test demonstrates that attribute selection using k-means clustering can offer a subset of characteristics, and that doing so produces classification outcomes with accuracy above 80%.
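
The second stage described above, choosing one attribute to stand in for a group of similar ones, can be sketched as follows. This sketch swaps in plain k-means with a deterministic initialization in place of the paper's k-means-based PSO, and it treats each attribute as a vector of its values across samples. The function names, the Euclidean distance, and the nearest-to-centroid selection rule are assumptions.

```python
from math import dist


def kmeans(vectors, k, iters=20):
    """Plain k-means; the first k vectors seed the centers (assumed, for determinism)."""
    centers = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assign every vector to its nearest center.
        assign = [min(range(k), key=lambda c: dist(v, centers[c])) for v in vectors]
        # Move each center to the mean of its members (empty clusters stay put).
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return assign, centers


def select_representatives(columns, k):
    """Cluster attribute columns and keep the one nearest each cluster center."""
    assign, centers = kmeans(columns, k)
    chosen = []
    for c in range(k):
        idx = [i for i, a in enumerate(assign) if a == c]
        if idx:
            chosen.append(min(idx, key=lambda i: dist(columns[i], centers[c])))
    return sorted(chosen)
```

Here `columns` holds one tuple per attribute, and the returned indices identify the attributes to keep. Attributes on very different scales would usually be standardized first so that distance reflects similarity rather than magnitude.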