17,638 research outputs found

    Soft clustering analysis of galaxy morphologies: A worked example with SDSS

    Full text link
    Context: The huge and still rapidly growing amount of galaxies in modern sky surveys raises the need of an automated and objective classification method. Unsupervised learning algorithms are of particular interest, since they discover classes automatically. Aims: We briefly discuss the pitfalls of oversimplified classification methods and outline an alternative approach called "clustering analysis". Methods: We categorise different classification methods according to their capabilities. Based on this categorisation, we present a probabilistic classification algorithm that automatically detects the optimal classes preferred by the data. We explore the reliability of this algorithm in systematic tests. Using a small sample of bright galaxies from the SDSS, we demonstrate the performance of this algorithm in practice. We are able to disentangle the problems of classification and parametrisation of galaxy morphologies in this case. Results: We give physical arguments that a probabilistic classification scheme is necessary. The algorithm we present produces reasonable morphological classes and object-to-class assignments without any prior assumptions. Conclusions: There are sophisticated automated classification algorithms that meet all necessary requirements, but a lot of work is still needed on the interpretation of the results.Comment: 18 pages, 19 figures, 2 tables, submitted to A

    Quality Assessment of Linked Datasets using Probabilistic Approximation

    Full text link
    With the increasing application of Linked Open Data, assessing the quality of datasets by computing quality metrics becomes an issue of crucial importance. For large and evolving datasets, an exact, deterministic computation of the quality metrics is too time consuming or expensive. We employ probabilistic techniques such as Reservoir Sampling, Bloom Filters and Clustering Coefficient estimation for implementing a broad set of data quality metrics in an approximate but sufficiently accurate way. Our implementation is integrated in the comprehensive data quality assessment framework Luzzu. We evaluated its performance and accuracy on Linked Open Datasets of broad relevance.Comment: 15 pages, 2 figures, To appear in ESWC 2015 proceeding

    Mixtures of Skew-t Factor Analyzers

    Get PDF
    In this paper, we introduce a mixture of skew-t factor analyzers as well as a family of mixture models based thereon. The mixture of skew-t distributions model that we use arises as a limiting case of the mixture of generalized hyperbolic distributions. Like their Gaussian and t-distribution analogues, our mixture of skew-t factor analyzers are very well-suited to the model-based clustering of high-dimensional data. Imposing constraints on components of the decomposed covariance parameter results in the development of eight flexible models. The alternating expectation-conditional maximization algorithm is used for model parameter estimation and the Bayesian information criterion is used for model selection. The models are applied to both real and simulated data, giving superior clustering results compared to a well-established family of Gaussian mixture models

    Parsimonious Shifted Asymmetric Laplace Mixtures

    Full text link
    A family of parsimonious shifted asymmetric Laplace mixture models is introduced. We extend the mixture of factor analyzers model to the shifted asymmetric Laplace distribution. Imposing constraints on the constitute parts of the resulting decomposed component scale matrices leads to a family of parsimonious models. An explicit two-stage parameter estimation procedure is described, and the Bayesian information criterion and the integrated completed likelihood are compared for model selection. This novel family of models is applied to real data, where it is compared to its Gaussian analogue within clustering and classification paradigms
    • …
    corecore