
    Cluster validation by measurement of clustering characteristics relevant to the user

    There are many cluster analysis methods that can produce quite different clusterings on the same dataset. Cluster validation is about evaluating the quality of a clustering; "relative cluster validation" is about using such criteria to compare clusterings. This can be used to select one of a set of clusterings from different methods, or from the same method run with different parameters such as different numbers of clusters. There are many cluster validation indexes in the literature. Most of them attempt to measure the overall quality of a clustering by a single number, but this can be inappropriate. Various characteristics of a clustering can be relevant in practice, depending on the aim of clustering, such as low within-cluster distances and high between-cluster separation. In this paper, a number of validation criteria are introduced that refer to different desirable characteristics of a clustering and that characterise a clustering in a multidimensional way. In specific applications the user may be interested in some of these criteria rather than others. A focus of the paper is on methodology to standardise the different characteristics so that users can aggregate them in a suitable way, specifying weights for the various criteria that are relevant in the clustering application at hand.
    Comment: 20 pages, 2 figures
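    As a concrete illustration of the multidimensional idea, the sketch below computes two of the characteristics named above (within-cluster distances and between-cluster separation) for a set of candidate clusterings, standardises them, and aggregates them with user-chosen weights. The z-score standardisation and the function names are illustrative stand-ins, not the paper's own scheme.

```python
import numpy as np
from itertools import combinations

def within_cluster_distances(X, labels):
    """Mean pairwise distance inside each cluster, averaged over clusters."""
    vals = []
    for k in np.unique(labels):
        pts = X[labels == k]
        if len(pts) < 2:
            continue
        d = [np.linalg.norm(a - b) for a, b in combinations(pts, 2)]
        vals.append(np.mean(d))
    return np.mean(vals)  # smaller is better

def between_cluster_separation(X, labels):
    """Minimum distance between points of different clusters."""
    seps = []
    for i, j in combinations(np.unique(labels), 2):
        A, B = X[labels == i], X[labels == j]
        seps.append(min(np.linalg.norm(a - b) for a in A for b in B))
    return min(seps)  # larger is better

def aggregate(clusterings, X, weights=(0.5, 0.5)):
    """Standardise each criterion across the candidates, then combine
    with user-chosen weights (signs flipped so larger = better)."""
    crit = np.array([[-within_cluster_distances(X, lab),
                      between_cluster_separation(X, lab)]
                     for lab in clusterings])
    z = (crit - crit.mean(axis=0)) / crit.std(axis=0)  # illustrative z-scoring
    return z @ np.array(weights)  # one aggregate score per candidate
```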

    Habitat filtering determines spatial variation of macroinvertebrate community traits in northern headwater streams

    Although our knowledge of the spatial distribution of stream organisms has increased rapidly in recent decades, there is still little consensus about trait-based variability of macroinvertebrate communities within and between catchments in near-pristine systems. Our aim was to examine the taxonomic and trait-based stability vs. variability of stream macroinvertebrates in three high-latitude catchments in Finland. The collected taxa were assigned to unique trait combinations (UTCs) using biological traits. We found that each UTC was formed by only a single taxon or a highly limited number of taxa, suggesting a low degree of redundancy. Our analyses revealed significant differences in the environmental conditions of the streams among the three catchments. Linear models, rarefaction curves and beta-diversity measures showed that the catchments differed in both alpha and beta diversity. Taxon- and trait-based multivariate analyses also indicated that the three catchments were significantly different in terms of macroinvertebrate communities. All these findings suggest that habitat filtering, i.e., environmental differences among catchments, determines the variability of macroinvertebrate communities, thereby contributing to the significant biological differences among the catchments. The main implication of our study is that the sensitivity of trait-based analyses to natural environmental variation should be carefully incorporated in the assessment of environmental degradation, and that further studies are needed for a deeper understanding of trait-based community patterns across near-pristine streams.
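    For readers unfamiliar with UTCs, the sketch below shows the grouping step in its simplest form: taxa sharing the same tuple of trait modalities fall into the same UTC. The taxon names and trait values here are hypothetical placeholders, not the study's data.

```python
from collections import defaultdict

# Hypothetical taxon -> trait assignments; real studies draw these
# from standardised biological trait databases.
traits = {
    "Baetis":      ("scraper",  "small",  "multivoltine"),
    "Heptagenia":  ("scraper",  "small",  "multivoltine"),
    "Isoperla":    ("predator", "medium", "univoltine"),
    "Limnephilus": ("shredder", "large",  "univoltine"),
}

utcs = defaultdict(list)
for taxon, combo in traits.items():
    utcs[combo].append(taxon)  # taxa with identical trait tuples share a UTC

# Redundancy is low when most UTCs contain only one taxon, as the study found.
singletons = sum(1 for members in utcs.values() if len(members) == 1)
print(f"{len(utcs)} UTCs, {singletons} represented by a single taxon")
```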

    Data granulation by the principles of uncertainty

    Research in granular modeling has produced a variety of mathematical models, such as intervals, (higher-order) fuzzy sets, rough sets, and shadowed sets, all suitable for characterizing so-called information granules. Modeling the uncertainty of the input data is recognized as a crucial aspect of information granulation. Moreover, uncertainty is a well-studied concept in many mathematical settings, such as probability theory, fuzzy set theory, and possibility theory. This suggests that an appropriate quantification of the uncertainty expressed by the information granule model could be used to define an invariant property, to be exploited in practical situations of information granulation. In this perspective, a procedure of information granulation is effective if the uncertainty conveyed by the synthesized information granule is in a monotonically increasing relation with the uncertainty of the input data. In this paper, we present a data granulation framework that elaborates on the principles of uncertainty introduced by Klir. Since uncertainty is a mesoscopic descriptor of systems and data, such principles can be applied regardless of the input data type and the specific mathematical setting adopted for the information granules. The proposed framework is conceived (i) to offer a guideline for the synthesis of information granules and (ii) to build a groundwork for comparing and quantitatively judging different data granulation procedures. As a case study, we introduce a new data granulation technique based on the minimum sum of distances, designed to generate type-2 fuzzy sets. We analyze the procedure through experiments on two distinct data types: feature vectors and labeled graphs. Results show that the uncertainty of the input data is suitably conveyed by the generated type-2 fuzzy set models.
    Comment: 16 pages, 9 figures, 52 references
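    To make the monotonicity principle concrete, here is a minimal one-dimensional sketch under simplifying assumptions: granules are plain intervals (not the paper's type-2 fuzzy sets), the "minimum sum of distances" prototype is taken to be the medoid, and interval width stands in for the granule's uncertainty measure. The point is only to exhibit the invariant being checked: more uncertain input should yield a more uncertain granule.

```python
import numpy as np

def medoid(data):
    """Element minimising the sum of distances to all others
    (the minimum-sum-of-distances idea, here in one dimension)."""
    dists = np.abs(data[:, None] - data[None, :]).sum(axis=1)
    return data[np.argmin(dists)]

def granulate(data, coverage=0.9):
    """Synthesise an interval granule around the medoid covering a fixed
    fraction of the data; its width acts as the uncertainty measure."""
    c = medoid(data)
    r = np.quantile(np.abs(data - c), coverage)
    return (c - r, c + r)

rng = np.random.default_rng(0)
for sigma in (0.5, 1.0, 2.0):          # increasing input uncertainty
    lo, hi = granulate(rng.normal(0.0, sigma, 500))
    print(f"sigma={sigma}: granule width = {hi - lo:.2f}")
# Widths should grow with sigma: granule uncertainty tracks input uncertainty.
```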

    Earnings efficiency and poverty dominance analysis: a spatial approach

    The paper estimates an earnings frontier by the method of Corrected Ordinary Least Squares (COLS) and categorizes households as efficient or inefficient based on a benchmark efficiency score and the estimated frontier. The spatial distribution of poor and non-poor households is then explored by constructing a poverty segregation curve across efficiency zones. Robust poverty comparisons across the efficient and inefficient groups reveal that poverty is in fact higher for the efficient group than for the inefficient one. The paper thus indirectly supports the "poor but efficient" hypothesis.
    Keywords: Earnings Frontier, Poverty, Stochastic Dominance, Treatment Effect
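    A minimal sketch of the COLS step, assuming log earnings as the dependent variable: fit OLS, shift the intercept by the largest residual so the frontier envelopes all observations from above, and read efficiency off the shifted residuals. The covariates, the error structure, and the 0.8 benchmark below are hypothetical; the paper's specification may differ.

```python
import numpy as np

def cols_frontier(X, y):
    """Corrected OLS: fit OLS, then shift the intercept by the largest
    residual so the fitted frontier lies on or above every observation."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    beta[0] += resid.max()            # the COLS correction
    return beta, resid - resid.max()  # shifted residuals are <= 0

def efficiency_scores(shifted_resid):
    """With y in logs, exp(shifted residual) is the ratio of observed
    to frontier earnings, i.e. an efficiency score in (0, 1]."""
    return np.exp(shifted_resid)

# Hypothetical use: classify households against a benchmark score of 0.8.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))                                    # e.g. education, experience
y = 1.0 + X @ np.array([0.5, 0.3]) - rng.exponential(0.4, 200)   # log earnings
_, u = cols_frontier(X, y)
efficient = efficiency_scores(u) >= 0.8  # benchmark is an assumption
```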

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is absent, poorly sampled, or not well defined. This unique situation constrains the learning of efficient classifiers by defining the class boundary with knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general OCC problem through a taxonomy of study based on the availability of training data, the algorithms used, and the application domains. We further delve into each category of the proposed taxonomy and present a comprehensive literature review of OCC algorithms, techniques and methodologies, with a focus on their significance, limitations and applications. We conclude by discussing some open research problems in the field of OCC and presenting our vision for future research.
    Comment: 24 pages + 11 pages of references, 8 figures
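    As a minimal illustration of the outlier/novelty-detection theme, the sketch below fits scikit-learn's OneClassSVM on positive examples only; the synthetic data and the nu value are arbitrary choices for the example, not recommendations from the survey.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(2)
X_pos = rng.normal(loc=0.0, scale=1.0, size=(500, 2))  # positive class only

# nu bounds the fraction of training points treated as outliers;
# no negative examples are ever seen during fitting.
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_pos)

X_new = np.array([[0.1, -0.2],   # near the positive class
                  [6.0, 6.0]])   # far away
print(clf.predict(X_new))        # +1 = positive class, -1 = outlier
```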