
    HIERARCHICAL CLUSTERING USING LEVEL SETS

    Over the past several decades, clustering algorithms have earned their place as a go-to solution for database mining. This paper introduces a new concept which is used to develop a new recursive version of DBSCAN that can successfully perform hierarchical clustering, called Level-Set Clustering (LSC). A level-set is the subset of points of a data-set whose densities are greater than some threshold, ‘t’. Graphing the size of each level-set against its respective ‘t’ produces indents in the line graph which correspond to clusters in the data-set, as the points in a cluster have very similar densities. This new algorithm is able to produce the clustering result with the same O(n log n) time complexity as DBSCAN and OPTICS, while catching clusters the others missed.
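The level-set definition in this abstract can be illustrated with a minimal sketch. This is not the paper's LSC algorithm: the density estimate (a count of neighbors within a radius `eps`), the function names, and the toy data are all assumptions made for illustration, and the brute-force neighbor search is O(n²) rather than O(n log n).

```python
from math import dist

def neighbor_counts(points, eps):
    """Density proxy (assumed): number of other points within distance eps."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and dist(p, q) <= eps)
        for i, p in enumerate(points)
    ]

def level_set_sizes(densities, thresholds):
    """Size of the level set {x : density(x) > t} for each threshold t."""
    return [sum(1 for d in densities if d > t) for t in thresholds]

# Hypothetical toy data: a tight group, a looser pair, and an isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.4, 5.0),
          (9.0, 0.0)]

densities = neighbor_counts(points, eps=0.5)
sizes = level_set_sizes(densities, thresholds=range(0, 4))
print(densities)  # [3, 3, 3, 3, 1, 1, 0]
print(sizes)      # [6, 4, 4, 0]
```

In this toy run the level-set size stays at 4 for t = 1 and t = 2, because the four points of the tight group share the same density. A flat stretch of this kind is one plausible reading of the "indents" the abstract describes.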

    Automated Software Architecture Extraction Using Graph-based Clustering

    As the size and complexity of software grows, developers have an ever-increasing need to understand software in a modular way. Most complex software systems can be divided into smaller modules if the developer has domain knowledge of the code or up-to-date documentation. If neither of these exists, discovery of code modules can be a tedious, manual process. This research hypothesizes that graph-based clustering can be used effectively for automated software architecture extraction. We propose methods of representing relationships between program artifacts as graphs and then propose new partitional algorithms to extract software modules from those graphs. To validate our hypothesis and the partitional algorithms, a new set of tools, including a software data miner, cluster builder, graph viewer, and cluster score calculator, was created. This toolset was used to implement partitional algorithms and analyze their performance in extracting modules. The Xinu operating system was used as a case study because it has defined modules that can be compared to the results of the partitional algorithm.

    Cluster analysis of flow cytometric list mode data on a personal computer

    A cluster analysis algorithm, dedicated to analysis of flow cytometric data, is described. The algorithm is written in Pascal and implemented on an MS-DOS personal computer. It uses k-means, initialized with a large number of seed points, followed by a modified nearest neighbor technique to reduce the large number of subclusters. Thus we combine the advantage of the k-means (speed) with that of the nearest neighbor technique (accuracy). In order to achieve a rapid analysis, no complex data transformations such as principal components analysis were used. Results of the cluster analysis on both real and artificial flow cytometric data are presented and discussed. The results show that it is possible to get very good cluster analysis partitions, which compare favorably with manually gated analysis in both time and reliability, using a personal computer.
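The two-stage idea in this abstract (over-segment with k-means, then merge subclusters) can be sketched as follows. The abstract does not specify the "modified nearest neighbor technique", so this sketch substitutes a plain rule that merges subclusters whose centroids lie within a threshold distance. The function names, the `merge_dist` parameter, the seeds, and the data are assumptions for illustration only.

```python
from math import dist

def kmeans(points, seeds, iterations=10):
    """Plain Lloyd's k-means starting from the given seed centroids."""
    centroids = list(seeds)
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: dist(p, centroids[k]))
            groups[nearest].append(p)
        # Recompute each centroid; keep the old one if its group is empty.
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[k]
            for k, g in enumerate(groups)
        ]
    return centroids

def merge_centroids(centroids, merge_dist):
    """Stand-in merge step: union subclusters whose centroids are close."""
    parent = list(range(len(centroids)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if dist(centroids[i], centroids[j]) <= merge_dist:
                parent[find(i)] = find(j)
    roots = sorted({find(i) for i in range(len(centroids))})
    return [roots.index(find(i)) for i in range(len(centroids))]

# Hypothetical data: two populations, deliberately over-seeded with 4 seeds.
points = [(0.0, 0.0), (0.2, 0.0), (1.0, 0.0), (1.2, 0.0),
          (10.0, 10.0), (10.2, 10.0), (11.0, 10.0), (11.2, 10.0)]
seeds = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0), (11.0, 10.0)]

centroids = kmeans(points, seeds)
labels = merge_centroids(centroids, merge_dist=2.0)
print(labels)  # [0, 0, 1, 1]
```

Here the four k-means subclusters collapse into two final clusters. The merge step compares only centroids, not individual events, so it stays cheap even when the number of data points is large.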

    A theoretical analysis of the validity of the Van Hiele levels of reasoning in graph theory

    The need to develop consistent theoretical frameworks for the teaching and learning of discrete mathematics, specifically of graph theory, has attracted the attention of researchers in mathematics education. Responding to this demand, the scope of the Van Hiele model has been extended to the field of graphs through a proposal of four levels of reasoning whose descriptors need to be validated according to the structure of this model. In this paper, the validity of these descriptors has been approached with a theoretical analysis that is organized by means of the so-called processes of reasoning, which are different mathematical abilities that students activate when solving graph theory problems: recognition, use and formulation of definitions, classification, and proof. The analysis gives support to the internal validity of the levels of reasoning in graph theory, as the properties of the Van Hiele levels have been verified: fixed sequence, adjacency, distinction, and separation. Moreover, the external validity of the levels has been supported by providing evidence of their coherence with the levels of geometrical reasoning from which they originally emerge. The results thus point to the suitability of applying the Van Hiele model in the teaching and learning of graph theory.

    Towards an online mitigation strategy for N2O emissions through principal components analysis and clustering techniques

    Emission of N2O represents an increasing concern in wastewater treatment, in particular for its large contribution to the plant's carbon footprint (CFP). In view of the potential introduction of more stringent regulations regarding wastewater treatment plants' CFP, there is a growing need for advanced monitoring with online implementation of mitigation strategies for N2O emissions. Mechanistic kinetic models in full-scale applications often rely on a very detailed representation of the biological mechanisms, which results in elevated uncertainty in the many parameters used, while being limited by a poor representation of hydrodynamics. This is particularly true for current N2O kinetic models. In this paper, a possible full-scale implementation of a data mining approach linking plant-specific dynamics to N2O production is proposed. The data mining approach was tested on full-scale data along with different clustering techniques to identify process criticalities. The algorithm was designed to provide an applicable solution for full-scale plants' control logics aimed at online N2O emission mitigation. Results show the ability of the algorithm to isolate specific N2O emission pathways, and highlight possible solutions towards emission control.

    On morphological hierarchical representations for image processing and spatial data clustering

    Hierarchical data representations in the context of classification and data clustering were put forward during the fifties. Recently, hierarchical image representations have gained renewed interest for segmentation purposes. In this paper, we briefly survey fundamental results on hierarchical clustering and then detail recent paradigms developed for the hierarchical representation of images in the framework of mathematical morphology: constrained connectivity and ultrametric watersheds. Constrained connectivity can be viewed as a way to constrain an initial hierarchy in such a way that a set of desired constraints is satisfied. The framework of ultrametric watersheds provides a generic scheme for computing any hierarchical connected clustering, in particular when such a hierarchy is constrained. The suitability of this framework for solving practical problems is illustrated with applications in remote sensing.