    Cluster validity in clustering methods

    Meta-optimizations for Cluster Analysis

    This dissertation thesis deals with advances in the automation of cluster analysis.

    Estimates of unresolved point sources contribution to WMAP 5

    We present an alternative estimate of the unresolved point source contribution to the WMAP temperature power spectrum based on current knowledge of sources from radio surveys in the 1.4-90 GHz range. We implement a stochastic extrapolation of radio point sources in the NRAO-VLA Sky Survey (NVSS) catalog, from the original 1.4 GHz to the ~100 GHz frequency range relevant for CMB experiments. With a bootstrap approach, we generate an ensemble of realizations that provides the probability distribution for the flux of each NVSS source at the final frequency. The predicted source counts agree with WMAP results for S > 1 Jy, and the corresponding sky maps correlate with WMAP observed maps in the Q-, V- and W-bands for sources with flux S > 0.2 Jy. The low-frequency radio surveys found a steeper frequency dependence for sources just below the WMAP nominal threshold than the one estimated by the WMAP team. This feature is present in our simulations and translates into a shift of 0.3-0.4 \sigma in the estimated value of the tilt of the power spectrum of scalar perturbations, n_s, as well as \omega_c. This approach demonstrates the use of external point source datasets for CMB data analysis. Comment: 12 pages, 8 figures, to be published on MNRA

    Adaptive Cooperative Learning Methodology for Oil Spillage Pattern Clustering and Prediction

    The serious environmental, economic and social consequences of oil spillages could devastate any nation of the world. The notable aftermath includes loss of (or serious threat to) lives, huge financial losses, and colossal damage to the ecosystem. Hence, understanding the pattern and making precise predictions in real time is required (as opposed to existing rough and discrete prediction) to give decision makers a more realistic picture of the environment. This paper seeks to address this problem by exploiting oil spillage features with sets of collected data of oil spillage scenarios. The proposed system integrates three state-of-the-art tools: self-organizing maps (SOM), ensembles of deep neural networks (k-DNN) and an adaptive neuro-fuzzy inference system (ANFIS). It begins with unsupervised learning using SOM, where four natural clusters were discovered and used in making the data suitable for classification and prediction (supervised learning) by ensembles of k-DNN and ANFIS. The results showed significant classification and prediction improvements, which are largely attributed to the hybrid learning approach, ensemble learning and cognitive reasoning capabilities. However, optimization of the k-DNN structure and weights would be needed for speed enhancement. The system would provide a means of understanding the nature, type and severity of oil spillages, thereby facilitating a rapid response to impending oil spillages. Keywords: SOM, ANFIS, Fuzzy Logic, Neural Network, Oil Spillage, Ensemble Learning

    clusterBMA: Bayesian model averaging for clustering

    Various methods have been developed to combine inference across multiple sets of results for unsupervised clustering, within the ensemble clustering literature. The approach of reporting results from one 'best' model out of several candidate clustering models generally ignores the uncertainty that arises from model selection, and results in inferences that are sensitive to the particular model and parameters chosen. Bayesian model averaging (BMA) is a popular approach for combining results across multiple models that offers some attractive benefits in this setting, including probabilistic interpretation of the combined cluster structure and quantification of model-based uncertainty. In this work we introduce clusterBMA, a method that enables weighted model averaging across results from multiple unsupervised clustering algorithms. We use clustering internal validation criteria to develop an approximation of the posterior model probability, used for weighting the results from each model. From a consensus matrix representing a weighted average of the clustering solutions across models, we apply symmetric simplex matrix factorisation to calculate final probabilistic cluster allocations. In addition to outperforming other ensemble clustering methods on simulated data, clusterBMA offers unique features including probabilistic allocation to averaged clusters, combining allocation probabilities from 'hard' and 'soft' clustering algorithms, and measuring model-based uncertainty in averaged cluster allocation. This method is implemented in an accompanying R package of the same name.
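
    The weighted consensus matrix mentioned in the abstract above can be illustrated with a minimal sketch. This is not the clusterBMA implementation (which is an R package): the function names, the three example partitions and the weights below are hypothetical, and the weights are plain placeholders rather than posterior model probabilities approximated from internal validation criteria. The final matrix factorisation step is not shown.

```python
from typing import List, Sequence


def co_association(labels: Sequence[int]) -> List[List[float]]:
    """Return an n x n matrix holding 1.0 where two points share a cluster label."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]


def weighted_consensus(partitions: Sequence[Sequence[int]],
                       weights: Sequence[float]) -> List[List[float]]:
    """Average the co-association matrices of several hard clusterings.

    Weights are normalised to sum to 1, so each entry of the result is the
    weighted share of clusterings that place points i and j together.
    """
    if len(partitions) != len(weights):
        raise ValueError("need exactly one weight per partition")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(partitions[0])
    consensus = [[0.0] * n for _ in range(n)]
    for labels, weight in zip(partitions, weights):
        if len(labels) != n:
            raise ValueError("all partitions must label the same points")
        matrix = co_association(labels)
        for i in range(n):
            for j in range(n):
                consensus[i][j] += (weight / total) * matrix[i][j]
    return consensus


# Hypothetical example: three clusterings of four points. Label values are
# arbitrary, so the first and third partitions describe the same grouping.
partitions = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
weights = [0.5, 0.25, 0.25]  # placeholder model weights (assumption)
consensus = weighted_consensus(partitions, weights)
```

    Because the matrix depends only on whether two points share a label, it is unaffected by how each algorithm happens to number its clusters, which is what allows results from different algorithms to be averaged.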

    The Consensus Clustering as a Contribution to Parental Recognition Problem Based on Hand Biometrics

    Clustering analysis is a subject that has interested researchers from several areas, such as health (medical diagnosis, clustering of proteins and genes), marketing (market analysis and image segmentation), and information management (clustering of web pages). Clustering algorithms are usually applied in data mining, allowing the identification of natural groups in a given data set. Using different clustering methods on the same data set can produce different groups, so several studies have sought to validate the resulting clusters. There has been increasing interest in how to determine a consensus clustering that combines the different individual clusterings, reflecting the main cluster structure inherent to each of them, as a way to obtain a higher-quality clustering. As several consensus clustering techniques have been researched, the present work focuses on the problem of finding the best partition in consensus clustering. We analyze the techniques most often cited in the literature, which use different mechanisms to achieve consensus, namely voting mechanisms, the co-association matrix, mutual information and hypergraphs, as well as an existing multi-objective consensus clustering technique. In this paper we discuss these approaches and present a comparative study based on a set of experiments using two-dimensional synthetic data sets with different characteristics (number of clusters, their cardinality, shape, homogeneity and separability) and a real-world data set based on the biometric shape of hands, in the context of parental recognition. With these data we investigate the ability of the consensus clustering algorithms to correctly cluster a child with her/his parents. This has enormous business potential and great economic value, since with this technology a website could match data, such as photographs of hands, and indicate whether A and B are related.
We conclude that, in some cases, the multi-objective technique outperforms the other techniques and, unlike them, is little influenced by poor base clusterings, even in situations such as added noise and clusters that overlap or differ in homogeneity. Furthermore, it can capture the performance of the best base clustering and still outperform it. On the real data, no technique was capable of identifying a person's mother or father. However, examining the distances between the hands of a person and those of his/her father, mother and siblings can yield the probability that a given person is a relative. This does not enable the identification of relatives but instead reduces the size of the database to be searched for matches.

    An overview of clustering methods with guidelines for application in mental health research

    Cluster analyses have been widely used in mental health research to decompose inter-individual heterogeneity by identifying more homogeneous subgroups of individuals. However, despite advances in new algorithms and increasing popularity, there is little guidance on model choice, analytical framework and reporting requirements. In this paper, we aimed to address this gap by introducing the philosophy, design, advantages/disadvantages and implementation of major algorithms that are particularly relevant in mental health research. Extensions of basic models, such as kernel methods, deep learning, semi-supervised clustering, and clustering ensembles, are subsequently introduced. How to choose algorithms to address common issues, as well as methods for pre-clustering data processing, clustering evaluation and validation, are then discussed. Importantly, we also provide general guidance on clustering workflow and reporting requirements. To facilitate the implementation of different algorithms, we provide information on R functions and libraries.