9,425 research outputs found

    Optimizing an Organized Modularity Measure for Topographic Graph Clustering: a Deterministic Annealing Approach

    This paper proposes an organized generalization of Newman and Girvan's modularity measure for graph clustering. Optimized via a deterministic annealing scheme, this measure produces topologically ordered graph clusterings that lead to faithful and readable graph representations based on clustering-induced graphs. Topographic graph clustering provides an alternative to more classical solutions in which a standard graph clustering method is applied to build a simpler graph that is then represented with a graph layout algorithm. A comparative study on four real-world graphs ranging from 34 to 1,133 vertices demonstrates the value of the proposed approach compared with classical solutions and with self-organizing maps for graphs.
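
    For reference, the standard Newman and Girvan modularity that the paper generalizes can be written as Q = sum over clusters c of [ e_c / m - (d_c / 2m)^2 ], where m is the number of edges, e_c the number of edges inside cluster c, and d_c the sum of vertex degrees in c. The minimal Python sketch below computes this plain measure only; it is not the organized measure or the deterministic annealing optimizer proposed in the paper. The function name `modularity`, the edge-list input format, and the toy graph are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once, no self-loops.
    community: dict mapping every vertex to a cluster label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")
    intra = defaultdict(int)   # number of edges with both endpoints in cluster c
    degree = defaultdict(int)  # sum of vertex degrees in cluster c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    # Q = sum_c [ e_c / m - (d_c / 2m)^2 ]
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {v: "a" for v in range(6)}

q_split = modularity(edges, split)    # one cluster per triangle
q_merged = modularity(edges, merged)  # everything in a single cluster
```

    On this toy graph the two-triangle partition scores 5/14 (about 0.357), while the single-cluster partition scores 0, which matches the intuition that modularity rewards dense groups with few links between them.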

    Unsupervised machine learning clustering and data exploration of radio-astronomical images

    In this thesis, I demonstrate a novel and efficient unsupervised clustering and data exploration method that combines a Self-Organising Map (SOM) with a Convolutional Autoencoder, applied to radio-astronomical images from the Radio Galaxy Zoo (RGZ) dataset. The rapidly increasing volume and complexity of radio-astronomical data have ushered in a new era of big-data astronomy, which has increased the demand for Machine Learning (ML) solutions. In this era, the sheer amount of image data produced with modern instruments has resulted in a significant data deluge. Furthermore, the morphologies of objects captured in these radio-astronomical images are highly complex and challenging to classify conclusively due to their intricate and indiscrete nature. Additionally, major radio-astronomical discoveries are unplanned and found in the unexpected, which makes unsupervised ML highly desirable because it operates with few assumptions and without labelled training data. In this thesis, I developed a novel unsupervised ML approach as a practical solution to these astronomy challenges. Using this system, I demonstrated the use of convolutional autoencoders and SOMs as a dimensionality reduction method to delineate the complexity and volume of astronomical data. My optimised system shows that coupling these methods is a powerful means of data exploration and unsupervised clustering of radio-astronomical images. The results of this thesis show that this approach is capable of accurately separating features by complexity on a SOM manifold and unified distance matrix with neighbourhood similarity and hierarchical clustering of the mapped astronomical features. This method provides an effective means to explore the high-level topological relationships of image features and morphology in large datasets automatically with minimal processing time and computational resources.
    I achieved these capabilities with a new and innovative method of SOM training using the autoencoder-compressed latent feature vector representations of radio-astronomical data, rather than raw images. Using this system, I successfully investigated SOM affine transformation invariance and analysed the true nature of rotational effects on this manifold using autoencoder random rotation training augmentations. Throughout this thesis, I present my method as a powerful new data exploration technique and a contribution to the field. The speed and effectiveness of this method indicate excellent scalability and hold implications for use on large future surveys, on large-scale instruments such as the Square Kilometre Array, and in other big-data and complexity analysis applications.
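
    The idea of training a SOM on compact feature vectors rather than raw images can be illustrated with a minimal Python sketch. It is not the thesis implementation: the short 2-D vectors merely stand in for autoencoder latent codes, and the helper names `train_som` and `best_matching_unit`, the 2x2 grid, and the linear decay schedules for the learning rate and neighbourhood radius are all assumptions chosen for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid coordinate whose weight vector is closest to x."""
    return min(
        weights,
        key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)),
    )


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, sigma0=None, seed=0):
    """Train a small rectangular SOM with a Gaussian neighbourhood function."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    sigma0 = sigma0 or max(rows, cols) / 2
    # One weight vector per grid node, initialised from randomly chosen inputs.
    weights = {
        (r, c): list(rng.choice(vectors)) for r in range(rows) for c in range(cols)
    }
    steps = epochs * len(vectors)
    t = 0
    for _ in range(epochs):
        order = list(vectors)
        rng.shuffle(order)
        for x in order:
            frac = t / steps
            lr = lr0 * (1 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1 - frac), 0.5)  # radius decays, floored at 0.5
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2 * sigma ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])  # pull node towards the input
            t += 1
    return weights


# Hypothetical stand-ins for latent vectors of two morphological groups.
blob_a = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]]
blob_b = [[10.0, 9.9], [9.8, 10.0], [10.1, 10.2]]
weights = train_som(blob_a + blob_b, rows=2, cols=2)

a_nodes = {best_matching_unit(weights, x) for x in blob_a}
b_nodes = {best_matching_unit(weights, x) for x in blob_b}
```

    After training, inputs from the two well-separated groups map to different grid nodes, which is the property that makes the map usable for exploration and clustering.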

    Clustering Algorithms: Their Application to Gene Expression Data

    Gene expression data hide vital information required to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a tremendous opportunity to strengthen the understanding of functional genomics. The complexity of biological networks and the number of genes involved make it harder to comprehend and interpret the resulting mass of data, which consists of millions of measurements; these data also exhibit vagueness, imprecision, and noise. Therefore, the use of clustering techniques is a first step toward addressing these challenges, and it is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. The clustering of gene expression data has proven useful in revealing the natural structure inherent in gene expression data, understanding gene functions, cellular processes, and subtypes of cells, mining useful information from noisy data, and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge of the appropriate clustering technique that will guarantee stability and a high degree of accuracy in its analysis procedure.

    Adaptive Cooperative Learning Methodology for Oil Spillage Pattern Clustering and Prediction

    The serious environmental, economic and social consequences of oil spillages could devastate any nation of the world. Notable effects include loss of (or serious threat to) lives, huge financial losses, and colossal damage to the ecosystem. Hence, understanding the pattern and making precise predictions in real time is required (as opposed to the existing rough and discrete predictions) to give decision makers a more realistic picture of the environment. This paper seeks to address this problem by exploiting oil spillage features with sets of collected data of oil spillage scenarios. The proposed system integrates three state-of-the-art tools: self-organizing maps (SOM), ensembles of deep neural networks (k-DNN) and an adaptive neuro-fuzzy inference system (ANFIS). It begins with unsupervised learning using SOM, where four natural clusters were discovered and used in making the data suitable for classification and prediction (supervised learning) by ensembles of k-DNN and ANFIS. The results showed significant classification and prediction improvements, which are largely attributed to the hybrid learning approach, ensemble learning and cognitive reasoning capabilities. However, optimization of the k-DNN structure and weights would be needed for speed enhancement. The system would provide a means of understanding the nature, type and severity of oil spillages, thereby facilitating a rapid response to impending oil spillages. Keywords: SOM, ANFIS, Fuzzy Logic, Neural Network, Oil Spillage, Ensemble Learning

    A review of clustering techniques and developments

    © 2017 Elsevier B.V. This paper presents a comprehensive study on clustering: existing methods and developments made at various times. Clustering is defined as unsupervised learning in which objects are grouped on the basis of some similarity inherent among them. There are different methods for clustering objects, such as hierarchical, partitional, grid, density-based and model-based. The approaches used in these methods are discussed with their respective states of the art and applicability. The measures of similarity as well as the evaluation criteria, which are the central components of clustering, are also presented in the paper. Applications of clustering in fields such as image segmentation, object and character recognition, and data mining are highlighted.
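
    As one concrete instance of the partitional family named above, the following Python sketch implements Lloyd's k-means iteration with squared Euclidean distance as the dissimilarity measure. The review itself does not prescribe this code; the function names `kmeans` and `squared_distance`, the explicitly supplied initial centroids, and the toy points are assumptions for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centroids, iterations=10):
    """Partition points into len(initial_centroids) groups with Lloyd's algorithm."""
    centroids = [list(c) for c in initial_centroids]
    k = len(centroids)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, initial_centroids=[points[0], points[3]])
```

    Supplying the starting centroids explicitly keeps the sketch deterministic; practical implementations usually choose them with a randomized seeding strategy and may run several restarts.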