
    A clustering based technique for large scale prioritization during requirements elicitation

    We consider the prioritization problem in cases where the number of requirements to prioritize is large, using a clustering technique. Clustering is a method for finding classes of data elements with respect to their attributes. K-means, one of the most popular clustering algorithms, was adopted in this research. To use the k-means algorithm for requirements prioritization, weights of attributes of requirement sets from relevant project stakeholders are required as input parameters. This paper showed that the output of running the k-means algorithm on requirement sets varies with the weights provided by relevant stakeholders. The proposed approach was validated using a requirements dataset known as RALIC. The results suggested that a synthetic method with scrambled centroids is effective for prioritizing requirements using k-means clustering.
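
    A minimal sketch of the core step, assuming scikit-learn's KMeans and a small illustrative matrix of stakeholder-assigned weights (not the RALIC data or the paper's scrambled-centroid setup): requirements are clustered by their weight vectors and the resulting groups are ranked by centroid magnitude.

# Minimal sketch: k-means over stakeholder-assigned requirement weights.
# The weight matrix and cluster count are illustrative, not taken from RALIC.
import numpy as np
from sklearn.cluster import KMeans

# Rows = requirements, columns = weights assigned by individual stakeholders.
weights = np.array([
    [5, 4, 5],   # R1
    [2, 1, 3],   # R2
    [4, 5, 4],   # R3
    [1, 2, 1],   # R4
    [3, 3, 2],   # R5
])

# Partition requirements into priority groups (e.g. high / medium / low).
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(weights)

# Rank clusters by the mean weight of their centroids: higher mean = higher priority.
order = np.argsort(-km.cluster_centers_.mean(axis=1))
priority_of_cluster = {int(cluster): rank for rank, cluster in enumerate(order)}
for i, label in enumerate(km.labels_):
    print(f"R{i + 1}: priority group {priority_of_cluster[int(label)]}")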

    Unravelling small world networks

    New classes of random graphs have recently been shown to exhibit the small world phenomenon - they are clustered like regular lattices and yet have small average path lengths like traditional random graphs. Small world behaviour has been observed in a number of real life networks, and hence these random graphs represent a useful modelling tool. In particular, Grindrod [Phys. Rev. E 66 (2002) 066702-1] has proposed a class of range dependent random graphs for modelling proteome networks in bioinformatics. A property of these graphs is that, when suitably ordered, most edges in the graph are short-range, in the sense that they connect near-neighbours, and relatively few are long-range. Grindrod also looked at an inverse problem: given a graph that is known to be an instance of a range dependent random graph, but with vertices in arbitrary order, can we reorder the vertices so that the short-range/long-range connectivity structure is apparent? When the graph is viewed in terms of its adjacency matrix, this becomes a problem in sparse matrix theory: find a symmetric row/column reordering that places most nonzeros close to the diagonal. Algorithms of this general nature have been proposed for other purposes, most notably for reordering to reduce fill-in and for clustering large data sets. Here, we investigate their use in the small world reordering problem. Our numerical results suggest that a spectral reordering algorithm is extremely promising, and we give some theoretical justification for this observation via the maximum likelihood principle.
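
    A minimal sketch of one such spectral reordering, assuming NumPy and a toy adjacency matrix: vertices are sorted by the Fiedler vector (the eigenvector associated with the second-smallest eigenvalue of the graph Laplacian), which tends to place connected vertices at nearby positions and hence nonzeros near the diagonal. This illustrates the general idea only, not the authors' exact algorithm or their maximum likelihood analysis.

# Minimal sketch of spectral reordering: sort vertices by the Fiedler vector so
# that short-range edges cluster near the diagonal of the permuted adjacency matrix.
import numpy as np

def spectral_reordering(A):
    """Return a vertex permutation derived from the Fiedler vector of A."""
    d = A.sum(axis=1)
    L = np.diag(d) - A                  # combinatorial Laplacian
    vals, vecs = np.linalg.eigh(L)      # eigenvalues returned in ascending order
    fiedler = vecs[:, 1]                # eigenvector of the second-smallest eigenvalue
    return np.argsort(fiedler)

# Toy example: a 6-cycle plus one long-range chord.
A = np.zeros((6, 6))
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

perm = spectral_reordering(A)
print("reordering:", perm)
print("reordered adjacency:\n", A[np.ix_(perm, perm)])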

    Soft clustering analysis of galaxy morphologies: A worked example with SDSS

    Context: The huge and still rapidly growing number of galaxies in modern sky surveys raises the need for an automated and objective classification method. Unsupervised learning algorithms are of particular interest, since they discover classes automatically. Aims: We briefly discuss the pitfalls of oversimplified classification methods and outline an alternative approach called "clustering analysis". Methods: We categorise different classification methods according to their capabilities. Based on this categorisation, we present a probabilistic classification algorithm that automatically detects the optimal classes preferred by the data. We explore the reliability of this algorithm in systematic tests. Using a small sample of bright galaxies from the SDSS, we demonstrate the performance of this algorithm in practice. We are able to disentangle the problems of classification and parametrisation of galaxy morphologies in this case. Results: We give physical arguments that a probabilistic classification scheme is necessary. The algorithm we present produces reasonable morphological classes and object-to-class assignments without any prior assumptions. Conclusions: There are sophisticated automated classification algorithms that meet all necessary requirements, but a lot of work is still needed on the interpretation of the results. Comment: 18 pages, 19 figures, 2 tables, submitted to A
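
    As an illustration of what a probabilistic, class-count-selecting clustering looks like in practice, here is a minimal sketch using a Gaussian mixture model with BIC-based model selection on synthetic two-dimensional "features"; the algorithm, feature names, and data are stand-ins, not the paper's method or its SDSS sample.

# Minimal sketch of soft clustering with model selection: fit Gaussian mixtures
# for several candidate class counts and keep the one with the lowest BIC.
# The synthetic "morphology features" below are illustrative, not SDSS data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two synthetic feature columns (say, concentration and colour) for 300 objects.
X = np.vstack([
    rng.normal([0.3, 0.8], 0.05, size=(150, 2)),
    rng.normal([0.7, 0.3], 0.08, size=(150, 2)),
])

models = {k: GaussianMixture(n_components=k, random_state=0).fit(X) for k in range(1, 6)}
best_k = min(models, key=lambda k: models[k].bic(X))
probs = models[best_k].predict_proba(X)   # soft object-to-class assignments

print("number of classes preferred by BIC:", best_k)
print("membership probabilities of the first object:", np.round(probs[0], 3))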

    General fuzzy min-max neural network for clustering and classification

    This paper describes a general fuzzy min-max (GFMM) neural network which is a generalization and extension of the fuzzy min-max clustering and classification algorithms of Simpson (1992, 1993). The GFMM method combines supervised and unsupervised learning in a single training algorithm. The fusion of clustering and classification results in an algorithm that can be used for pure clustering, pure classification, or hybrid clustering-classification. It exhibits the property of finding decision boundaries between classes while clustering patterns that cannot be said to belong to any of the existing classes. As in the original algorithms, hyperbox fuzzy sets are used as a representation of clusters and classes. Learning is usually completed in a few passes and consists of placing and adjusting the hyperboxes in the pattern space; this is an expansion-contraction process. The classification results can be crisp or fuzzy. New data can be included without the need for retraining. While retaining all the interesting features of the original algorithms, a number of modifications to their definition have been made in order to accommodate fuzzy input patterns in the form of lower and upper bounds, combine the supervised and unsupervised learning, and improve the effectiveness of operations. A detailed account of the GFMM neural network, a comparison with Simpson's fuzzy min-max neural networks, a set of examples, and an application to leakage detection and identification in water distribution systems are given.
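
    A minimal sketch of the hyperbox idea, with a simplified membership function and expansion test rather than the exact GFMM formulas: each cluster or class is a box in pattern space, membership decays with distance outside the box, and a box may expand to absorb a new pattern only while it stays below a size limit.

# Minimal sketch of the hyperbox idea behind fuzzy min-max networks. The
# membership function and expansion rule are simplified illustrations,
# not the exact GFMM definitions.
import numpy as np

GAMMA = 4.0   # sensitivity: how quickly membership decays outside a box
THETA = 0.3   # maximum allowed box size per dimension

def membership(x, v, w, gamma=GAMMA):
    """Fuzzy membership of point x in the hyperbox with min corner v and max corner w."""
    outside = np.maximum(v - x, 0) + np.maximum(x - w, 0)   # per-dimension overshoot
    return float(np.mean(np.maximum(0.0, 1.0 - gamma * outside)))

def expand(x, v, w, theta=THETA):
    """Try to grow the box to include x; return (new_v, new_w, accepted)."""
    new_v, new_w = np.minimum(v, x), np.maximum(w, x)
    if np.all(new_w - new_v <= theta):
        return new_v, new_w, True
    return v, w, False

v, w = np.array([0.2, 0.2]), np.array([0.4, 0.3])
print("membership of [0.45, 0.25]:", membership(np.array([0.45, 0.25]), v, w))
v, w, ok = expand(np.array([0.45, 0.25]), v, w)
print("expanded box:", v, w, "accepted:", ok)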

    Median evidential c-means algorithm and its application to community detection

    Median clustering is of great value for partitioning relational data. In this paper, a new prototype-based clustering method, called Median Evidential C-Means (MECM), is proposed as an extension of median c-means and median fuzzy c-means within the theoretical framework of belief functions. The median variant relaxes the restriction of a metric-space embedding for the objects but constrains the prototypes to be in the original data set. Owing to these properties, MECM can be applied to graph clustering problems. A community detection scheme for social networks based on MECM is investigated, and the obtained credal partitions of graphs, which are more refined than crisp and fuzzy ones, enable us to gain a better understanding of the graph structures. An initial prototype-selection scheme based on evidential semi-centrality is presented to avoid premature local convergence, and an evidential modularity function is defined to choose the optimal number of communities. Finally, experiments on synthetic and real data sets illustrate the performance of MECM and show how it differs from other methods.
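
    A minimal sketch of the "median" constraint alone (prototypes restricted to actual objects, so only a pairwise dissimilarity matrix is needed), written as a plain medoid-style c-means loop; the evidential machinery of MECM, the credal partitions, and the modularity-based model selection are not reproduced here.

# Minimal sketch of median/medoid c-means: prototypes must be data objects,
# so only a dissimilarity matrix D is required (no metric embedding).
# This is not the full evidential MECM algorithm.
import numpy as np

def median_c_means(D, c, n_iter=20, seed=0):
    """Cluster objects given a dissimilarity matrix D into c groups with object prototypes."""
    rng = np.random.default_rng(seed)
    prototypes = rng.choice(D.shape[0], size=c, replace=False)
    for _ in range(n_iter):
        labels = np.argmin(D[:, prototypes], axis=1)   # assign each object to its nearest prototype
        for k in range(c):
            members = np.flatnonzero(labels == k)
            if members.size:
                # New prototype = the member minimising total dissimilarity within its cluster.
                prototypes[k] = members[np.argmin(D[np.ix_(members, members)].sum(axis=1))]
    return np.argmin(D[:, prototypes], axis=1), prototypes

# Toy dissimilarity matrix for 5 objects forming two obvious groups.
D = np.array([
    [0, 1, 1, 9, 9],
    [1, 0, 1, 9, 9],
    [1, 1, 0, 9, 9],
    [9, 9, 9, 0, 1],
    [9, 9, 9, 1, 0],
], dtype=float)
labels, prototypes = median_c_means(D, c=2)
print("labels:", labels, "prototypes:", prototypes)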

    Autonomous clustering using rough set theory

    This paper proposes a clustering technique that minimises the need for subjective human intervention and is based on elements of rough set theory. The proposed algorithm is unified in its approach to clustering and makes use of both local and global data properties to obtain clustering solutions. It handles single-type and mixed-attribute data sets with ease, and results from three data sets of single and mixed attribute types are used to illustrate the technique and establish its efficiency.
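
    For readers unfamiliar with the rough set vocabulary, the sketch below illustrates lower and upper approximations in a clustering context, in the general style of rough k-means rather than this paper's specific algorithm: objects that are unambiguously closest to one centroid enter that cluster's lower approximation, while ambiguous objects enter only the upper approximations of the clusters they might belong to.

# Minimal sketch of lower/upper approximations in rough clustering (rough
# k-means style); the epsilon threshold and data are illustrative only.
import numpy as np

def rough_assign(X, centroids, epsilon=1.3):
    """Return (lower, upper) approximation index lists for each centroid."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    lower = [[] for _ in centroids]
    upper = [[] for _ in centroids]
    for i, row in enumerate(d):
        nearest = np.argmin(row)
        # Clusters whose distance is within a factor epsilon of the nearest one.
        close = np.flatnonzero(row <= epsilon * row[nearest])
        if close.size == 1:
            lower[nearest].append(i)     # unambiguous: certain member
        for k in close:
            upper[k].append(i)           # possible member of every close cluster
    return lower, upper

X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1], [0.5, 0.5]])
centroids = np.array([[0.0, 0.0], [1.0, 1.0]])
lower, upper = rough_assign(X, centroids)
print("lower approximations:", lower)
print("upper approximations:", upper)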

    Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering

    This study introduces a new method for detecting and sorting spikes from multiunit recordings. The method combines the wavelet transform, which localizes distinctive spike features, with superparamagnetic clustering, which allows automatic classification of the data without assumptions such as low variance or Gaussian distributions. Moreover, an improved method for setting amplitude thresholds for spike detection is proposed. We describe several criteria for implementation that render the algorithm unsupervised and fast. The algorithm is compared to other conventional methods using several simulated data sets whose characteristics closely resemble those of in vivo recordings. For these data sets, we found that the proposed algorithm outperformed conventional methods.
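
    A minimal sketch of the detection-plus-features pipeline described here, assuming PyWavelets and scikit-learn, with k-means standing in for superparamagnetic clustering and a synthetic signal in place of real recordings; the threshold uses a common robust noise estimate based on the median absolute value rather than the paper's exact criterion.

# Minimal sketch: robust amplitude threshold for spike detection plus wavelet
# coefficients as spike features. KMeans stands in for superparamagnetic
# clustering, and the signal, wavelet, and parameters are illustrative.
import numpy as np
import pywt
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
signal = rng.normal(0, 1, 30_000)                     # stand-in for a filtered recording
signal[::1000] += rng.choice([8.0, -12.0], size=30)   # injected "spikes"

# Robust noise estimate from the median absolute value, then threshold at ~4 sigma.
sigma = np.median(np.abs(signal)) / 0.6745
threshold = 4.0 * sigma
spike_times = np.flatnonzero(np.abs(signal) > threshold)

# Extract a short window around each spike and describe it by wavelet coefficients.
window = 32
features = []
for t in spike_times:
    if window <= t < len(signal) - window:
        snippet = signal[t - window : t + window]
        coeffs = np.concatenate(pywt.wavedec(snippet, "haar", level=4))
        features.append(coeffs)
features = np.array(features)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print("detected spikes:", len(spike_times), "cluster sizes:", np.bincount(labels))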