
    A review of clustering techniques and developments

    © 2017 Elsevier B.V. This paper presents a comprehensive study of clustering: existing methods and the developments made in them over time. Clustering is defined as an unsupervised learning task in which objects are grouped on the basis of some similarity inherent among them. There are different methods for clustering objects, such as hierarchical, partitional, grid-based, density-based, and model-based methods. The approaches used in these methods are discussed along with their respective states of the art and applicability. The similarity measures and evaluation criteria, which are the central components of clustering, are also presented in the paper. The applications of clustering in fields such as image segmentation, object and character recognition, and data mining are highlighted.
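As a minimal sketch of the partitional family the review mentions, the classic Lloyd's k-means loop can be written from scratch as follows (the function, data, and parameter choices here are illustrative, not taken from the paper):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (keep the old centroid if a cluster is empty).
        new = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:  # converged: assignments can no longer change
            break
        centroids = new
    return centroids, clusters

# Two well-separated 2-D blobs of three points each.
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 5.0), (5.1, 4.9)]
cents, cls = kmeans(data, k=2)
```

Hierarchical, density-based, and model-based methods replace this fixed-k assignment/update cycle with merging, neighbourhood-density, or likelihood criteria respectively.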

    Dimensionality Reduction With Unsupervised Feature Selection and Applying Non-Euclidean Norms for Classification Accuracy

    This article presents a two-phase scheme that selects a reduced number of features from a dataset using a Genetic Algorithm (GA) and then tests the classification accuracy (CA) of the dataset with the reduced feature set. In the first phase, an unsupervised approach to selecting a subset of features is applied: the GA stochastically selects reduced feature sets, with the Sammon error as the fitness function, yielding several different feature subsets. In the second phase, each reduced feature set is used to test the CA of the dataset, validated with the supervised k-nearest-neighbour (k-NN) algorithm. The novelty of the proposed scheme is that each reduced feature set obtained in the first phase is evaluated for CA using k-NN classification under different Minkowski metrics, i.e., non-Euclidean norms, instead of the conventional Euclidean norm (L2). Final results are presented in the article with extensive simulations on seven real and one synthetic dataset. The investigation reveals that using different norms produces better CA and hence offers scope for better feature subset selection.
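The second phase described above — k-NN classification under a Minkowski metric of varying order p — can be sketched as follows (the toy data are my own; the paper's GA-based feature selection that would precede this step is omitted):

```python
from collections import Counter

def minkowski(u, v, p):
    """Minkowski (L_p) distance; p = 2 recovers the conventional Euclidean norm."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1 / p)

def knn_predict(train_X, train_y, x, k=3, p=2):
    """Classify x by majority vote among its k nearest training points under L_p."""
    nearest = sorted(range(len(train_X)),
                     key=lambda i: minkowski(train_X[i], x, p))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

# Tiny two-class toy set standing in for a GA-reduced feature space.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (1.1, 0.9), (0.9, 1.1)]
y = ["a", "a", "a", "b", "b", "b"]
for p in (1, 2, 3):  # L1, L2 (Euclidean), and a non-Euclidean L3 norm
    print(p, knn_predict(X, y, (0.15, 0.1), k=3, p=p))
```

In the paper's scheme, the CA obtained under each norm is compared across the candidate feature subsets, so the choice of p becomes part of the feature-selection evaluation.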

    A Machine Learning Clustering Technique for Autism Screening and Other Applications

    Clustering is one of the more challenging machine learning techniques because of its unsupervised nature. While many clustering algorithms constrain each object to a single cluster, overlapping partitioning methods based on K-means relax this constraint and allow objects to belong to more than one cluster, to better fit hidden structures in the data. However, when datasets contain outliers, the outliers can significantly distort the mean distance of the data objects to their respective clusters. Most researchers address this problem by simply removing the outliers, which can itself be problematic in applications such as autism screening, fraud detection, and cybersecurity attack detection, among others. This thesis proposes an alternative solution that captures outliers and stores them on the fly in a new cluster instead of discarding them. The new algorithm is named Outlier-based Multi-Cluster Overlapping K-Means Extension (OMCOKE). It addresses an issue previously ignored by other work on overlapping clustering and therefore benefits various stakeholders, since these outliers can have real-life significance. The proposed solution has been evaluated on a crucial behavioural-science problem, the screening of autistic traits, to improve the performance of detecting autism spectrum disorder (ASD) traits and reduce feature redundancy. OMCOKE was integrated as the learning algorithm of a semi-supervised ML framework called Clustering based Autistic Trait Classification (CATC) in Chapter 5. Based on the experimental results obtained on real datasets related to autism screening, OMCOKE was able to identify potential autism cases from the similarity of their traits, as opposed to the conventional scoring functions used by ASD screening tools.
Moreover, the empirical results obtained by OMCOKE on different datasets involving children, adolescents, and adults were compared with results produced by common ML techniques. The results showed that our semi-supervised framework offers models with higher predictive accuracy, sensitivity, and specificity rates than other intelligent classification approaches such as Artificial Neural Networks (ANN), Random Forest, Random Trees, and Rule Induction. These models are useful because they can be exploited by diagnosticians and other stakeholders involved in ASD screening, besides highlighting the most influential features. The chapters of this thesis have been disseminated in, or are under review at, various reputable journals and refereed conference proceedings.
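The thesis does not reproduce OMCOKE's exact rules here, but the two ideas it combines — multi-cluster assignment and capturing outliers in a cluster of their own rather than deleting them — can be illustrated with a toy assignment step. The overlap ratio and outlier threshold below are my own assumptions, not the algorithm's:

```python
import statistics

def overlap_assign(points, centroids, overlap_ratio=1.2, outlier_factor=10.0):
    """Assign each point to every centroid within overlap_ratio of its nearest
    distance; points whose nearest distance exceeds outlier_factor times the
    median nearest distance are captured in a separate outlier set instead of
    being discarded. Both thresholds are illustrative, not from OMCOKE."""
    def dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

    nearest = [min(dist(p, c) for c in centroids) for p in points]
    cutoff = outlier_factor * statistics.median(nearest)
    clusters = [set() for _ in centroids]
    outliers = set()
    for idx, p in enumerate(points):
        if nearest[idx] > cutoff:
            outliers.add(idx)            # captured on the fly, not removed
            continue
        for j, c in enumerate(centroids):
            if dist(p, c) <= overlap_ratio * nearest[idx]:
                clusters[j].add(idx)     # a point may join several clusters

    return clusters, outliers

# Two groups, one bridging point, and one far-away outlier.
pts = [(0, 0), (0.2, 0.1), (1, 1), (1.1, 0.9), (0.6, 0.5), (50, 50)]
cls, out = overlap_assign(pts, centroids=[(0.1, 0.05), (1.05, 0.95)])
```

Here the bridging point lands in both clusters, while the distant point goes to the outlier set — the behaviour that, in the autism-screening setting, keeps atypical cases available for inspection instead of silently dropping them.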