448 research outputs found

    Partition Around Medoids Clustering on the Intel Xeon Phi Many-Core Coprocessor

    Abstract. The paper addresses the problem of implementing the Partition Around Medoids (PAM) clustering algorithm for the Intel Many Integrated Core architecture. PAM is a form of the well-known k-medoids clustering algorithm and is applied in various subject domains, e.g. bioinformatics, text analysis and intelligent transportation systems. An optimized version of PAM for the Intel Xeon Phi coprocessor is introduced, which uses OpenMP parallelization, loop vectorization, tiling and efficient computation of the distance matrix for the Euclidean metric. Experimental results on different data sets confirm the efficiency of the proposed algorithm.
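
    As a rough illustration of what PAM computes, the following is a minimal, unoptimized Python sketch of the classic BUILD and SWAP phases on a precomputed Euclidean distance matrix. It is not the paper's Xeon Phi implementation: there is no OpenMP, vectorization or tiling here, and the function names (`distance_matrix`, `total_cost`, `pam`) are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def distance_matrix(points):
    """Full pairwise Euclidean distance matrix (O(n^2) memory)."""
    return [[dist(p, q) for q in points] for p in points]


def total_cost(d, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in d)


def pam(d, k):
    """Naive PAM: greedy BUILD, then SWAP until no swap lowers the cost."""
    n = len(d)
    medoids = []
    # BUILD: repeatedly add the point that yields the lowest total cost.
    for _ in range(k):
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(d, medoids + [i]))
        medoids.append(best)
    # SWAP: try replacing each medoid with each non-medoid point.
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                if total_cost(d, trial) < total_cost(d, medoids):
                    medoids, improved = trial, True
    # Assign every point to the position of its nearest medoid.
    labels = [min(range(k), key=lambda j: row[medoids[j]]) for row in d]
    return medoids, labels
```

    The sketch recomputes the full cost for every candidate swap, which is the kind of repeated work an optimized implementation would restructure.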

    Effective Spell Checking Methods Using Clustering Algorithms

    This paper presents a novel approach to spell checking using dictionary clustering. The main goal is to reduce the number of distance calculations needed to find target words for misspellings. The method is unsupervised and combines anomalous pattern initialization with partition around medoids (PAM). To evaluate the method, we used an English misspelling list compiled from real examples extracted from the Birkbeck spelling error corpus. Final published version.
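
    The idea of saving distance calculations through clustering can be sketched as follows, assuming the dictionary has already been partitioned into clusters keyed by their medoid word. This is a simplified illustration rather than the paper's method: the `suggest` function and the example clusters are hypothetical, Levenshtein distance is an assumed choice of metric, and searching only the nearest cluster is a heuristic that can miss the closest word.

```python
def levenshtein(a, b):
    """Edit distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def suggest(word, clusters):
    """Return (best candidate, number of distance computations).

    `clusters` maps each medoid word to the list of words in its cluster.
    The misspelling is compared with the medoids first, then only with
    the members of the nearest medoid's cluster.
    """
    nearest = min(clusters, key=lambda m: levenshtein(word, m))
    members = clusters[nearest]
    best = min(members, key=lambda w: levenshtein(word, w))
    return best, len(clusters) + len(members)
```

    With two hypothetical clusters, `{"cat": ["cat", "cut", "car"], "house": ["house", "horse", "mouse"]}`, the misspelling `"hause"` is matched against two medoids and then three members, instead of all six dictionary words.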

    MACOC: a medoid-based ACO clustering algorithm

    The application of ACO-based algorithms in data mining has been growing over the last few years, and several supervised and unsupervised learning algorithms have been developed using this bio-inspired approach. Most recent works concerning unsupervised learning have focused on clustering, showing the great potential of ACO-based techniques. This work presents an ACO-based clustering algorithm inspired by the ACO Clustering (ACOC) algorithm. The proposed approach restructures ACOC from a centroid-based technique to a medoid-based technique, where the properties of the search space are not necessarily known. Instead, it relies only on information about the distances among the data. The new algorithm, called MACOC, has been compared against well-known algorithms (K-means and Partition Around Medoids) and with ACOC. The experiments measure the accuracy of the algorithm on both synthetic datasets and real-world datasets extracted from the UCI Machine Learning Repository.

    Automatic Dimension Selection for a Non-negative Factorization Approach to Clustering Multiple Random Graphs

    We consider the problem of grouping multiple graphs into several clusters using singular value thresholding and non-negative factorization. We derive a model selection information criterion to estimate the number of clusters. We demonstrate our approach using the "Swimmer" data set as well as a simulated data set, and compare its performance with two standard clustering algorithms. Comment: This paper has been withdrawn by the author due to a newer version with overlapping content.

    Deep Gaussian Mixture Models

    Deep learning is a hierarchical inference method formed by multiple successive layers of learning, able to describe complex relationships more efficiently. In this work, Deep Gaussian Mixture Models are introduced and discussed. A Deep Gaussian Mixture Model (DGMM) is a network of multiple layers of latent variables where, at each layer, the variables follow a mixture of Gaussian distributions. Thus, the deep mixture model consists of a set of nested mixtures of linear models, which globally provide a nonlinear model able to describe the data in a very flexible way. In order to avoid overparameterized solutions, dimension reduction by factor models can be applied at each layer of the architecture, thus resulting in deep mixtures of factor analysers. Comment: 19 pages, 4 figures.

    Modeling Wheezing Spells Identifies Phenotypes with Different Outcomes and Genetic Associates

    Funding Information: Supported by the UK Medical Research Council (UK MRC) Programme grant MR/S025340/1 and grants G0601361 and MR/K002449/1. R.G. is in part funded through Wellcome Trust Strategic Award 108818/15/Z. The UK MRC and Wellcome (grant ref: 217065/Z/19/Z) and the University of Bristol provide core support for ALSPAC (Avon Longitudinal Study of Parents and Children). MAAS (Manchester Asthma and Allergy Study) was supported by the Asthma UK Grants No 301 (1995–1998), No 362 (1998–2001), No 01/012 (2001–2004), No 04/014 (2004–2007), British Medical Association James Trust (2005), and the JP Moulton Charitable Foundation (2004–2016), the North West Lung Centre Charity (1997–current), and the UK MRC grant MR/L012693/1 (2014–2018). Acknowledgment: This article is dedicated to the memory of our wonderful colleague and friend Prof. John Henderson (1958–2019), whose contribution to the understanding of the heterogeneity of childhood asthma cannot be overstated. Rainbow chasers and UNICORN riders forever. Peer reviewed. Publisher PDF.

    A New Partitioning Around Medoids Algorithm

    Kaufman & Rousseeuw (1990) proposed the clustering algorithm Partitioning Around Medoids (PAM), which maps a distance matrix into a specified number of clusters. A particularly nice property is that PAM allows clustering with respect to any specified distance metric. In addition, the medoids are robust representations of the cluster centers, which is particularly important in the common context that many elements do not belong well to any cluster. Based on our experience in clustering gene expression data, we have noticed that PAM does have problems recognizing relatively small clusters in situations where good partitions around medoids clearly exist. In this note, we propose to partition around medoids by maximizing a criterion, "Average Silhouette", defined by Kaufman & Rousseeuw. We also propose a fast-to-compute approximation of "Average Silhouette". We implement these two new partitioning around medoids algorithms and illustrate their performance relative to existing partitioning methods in simulations.
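
    For reference, the average silhouette of a given partition can be computed directly from a distance matrix. The sketch below follows the standard definition s(i) = (b(i) - a(i)) / max(a(i), b(i)), with s(i) = 0 for singleton clusters. It only evaluates the criterion for a fixed labelling; it does not implement the maximization or the fast approximation proposed in the note. The function name is hypothetical, and at least two clusters are assumed.

```python
def average_silhouette(d, labels):
    """Mean silhouette width for a distance matrix `d` and cluster labels.

    Assumes at least two distinct labels. Singleton clusters contribute
    s(i) = 0, following the usual convention.
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # s(i) = 0 for a singleton cluster
        # a(i): mean distance to the other members of i's own cluster.
        a = sum(d[i][j] for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(sum(d[i][j] for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(labels)
```

    Values close to 1 indicate compact, well-separated clusters, while negative values suggest that points sit closer to another cluster than to their own.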

    Disrupted Modularity and Local Connectivity of Brain Functional Networks in Childhood-Onset Schizophrenia

    Modularity is a fundamental concept in systems neuroscience, referring to the formation of local cliques or modules of densely intra-connected nodes that are sparsely inter-connected with nodes in other modules. Topological modularity of brain functional networks can quantify theoretically anticipated abnormality of brain network community structure – so-called dysmodularity – in developmental disorders such as childhood-onset schizophrenia (COS). We used graph theory to investigate the topology of networks derived from resting-state fMRI data on 13 COS patients and 19 healthy volunteers. We measured functional connectivity between each pair of 100 regional nodes, focusing on wavelet correlation in the frequency interval 0.05–0.1 Hz, then applied global and local thresholding rules to construct graphs from each individual association matrix over the full range of possible connection densities. We show how local thresholding based on the minimum spanning tree facilitates group comparisons of networks by forcing the connectedness of sparse graphs. Threshold-dependent graph theoretical results are compatible with the results of a k-means unsupervised learning algorithm and a multi-resolution (spin glass) approach to modularity, both of which also find community structure but do not require thresholding of the association matrix. In general, modularity of brain functional networks was significantly reduced in COS, due to a relatively reduced density of intra-modular connections between neighboring regions. Other network measures of local organization such as clustering were also decreased, while complementary measures of global efficiency and robustness were increased, in the COS group. The group differences in complex network properties were mirrored by differences in simpler statistical properties of the data, such as the variability of the global time series and the internal homogeneity of the time series within anatomical regions of interest.
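
    One way a spanning tree can force sparse graphs to stay connected is sketched below: keep the spanning tree of strongest associations as a backbone (Kruskal's algorithm on descending weights), then add the strongest remaining edges until a target edge count is reached. This is a simplified assumption for illustration and not necessarily the exact local thresholding rule used in the study; the function name `mst_threshold` is hypothetical.

```python
def mst_threshold(w, n_edges):
    """Return a connected edge set of (at least) the spanning-tree size.

    `w` is a symmetric association matrix (higher = stronger); only the
    upper triangle is read. The strongest-association spanning tree is
    always kept, then the strongest remaining edges are added until
    `n_edges` edges are present.
    """
    n = len(w)
    edges = sorted(((w[i][j], i, j)
                    for i in range(n) for j in range(i + 1, n)),
                   reverse=True)

    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree, rest = [], []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj       # edge joins two components: keep it
            tree.append((i, j))
        else:
            rest.append((i, j))   # would close a cycle: candidate extra

    extra = max(0, n_edges - len(tree))
    return set(tree + rest[:extra])
```

    By contrast, a purely global threshold that keeps the strongest `n_edges` associations can leave weakly correlated nodes isolated at low connection densities.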