80,955 research outputs found

    Relational visual cluster validity

    Get PDF
    The assessment of cluster validity plays a very important role in cluster analysis. Most commonly used cluster validity methods are based on statistical hypothesis testing or finding the best clustering scheme by computing a number of different cluster validity indices. A number of visual methods of cluster validity have been produced to display directly the validity of clusters by mapping data into two- or three-dimensional space. However, these methods may lose too much information to correctly estimate the results of clustering algorithms. Although the visual cluster validity (VCV) method of Hathaway and Bezdek can successfully solve this problem, it can only be applied for object data, i.e. feature measurements. There are very few validity methods that can be used to analyze the validity of data where only a similarity or dissimilarity relation exists – relational data. To tackle this problem, this paper presents a relational visual cluster validity (RVCV) method to assess the validity of clustering relational data. This is done by combining the results of the non-Euclidean relational fuzzy c-means (NERFCM) algorithm with a modification of the VCV method to produce a visual representation of cluster validity. RVCV can cluster complete and incomplete relational data and adds to the visual cluster validity theory. Numeric examples using synthetic and real data are presente

    Mapping similarities in temporal parking occupancy behavior based on city-wide parking meter data

    Get PDF
    The search for a parking space is a severe and stressful problem for drivers in many cities. The provision of maps with parking space occupancy information assists drivers in avoiding the most crowded roads at certain times. Since parking occupancy reveals a repetitive pattern per day and per week, typical parking occupancy patterns can be extracted from historical data. In this paper, we analyze city-wide parking meter data from Hannover, Germany, for a full year. We describe an approach of clustering these parking meters to reduce the complexity of this parking occupancy information and to reveal areas with similar parking behavior. The parking occupancy at every parking meter is derived from a timestamp of ticket payment and the validity period of the parking tickets. The similarity of the parking meters is computed as the mean-squared deviation of the average daily patterns in parking occupancy at the parking meters. Based on this similarity measure, a hierarchical clustering is applied. The number of clusters is determined with the Davies-Bouldin Index and the Silhouette Index. Results show that, after extensive data cleansing, the clustering leads to three clusters representing typical parking occupancy day patterns. Those clusters differ mainly in the hour of the maximum occupancy. In addition, the lo-cations of parking meter clusters, computed only based on temporal similarity, also show clear spatial distinctions from other clusters

    Model order selection for bio-molecular data clustering

    Get PDF
    Background: Cluster analysis has been widely applied for investigating structure in bio-molecular data. A drawback of most clustering algorithms is that they cannot automatically detect the ”natural ” number of clusters underlying the data, and in many cases we have no enough ”a priori ” biological knowledge to evaluate both the number of clusters as well as their validity. Recently several methods based on the concept of stability have been proposed to estimate the ”optimal ” number of clusters, but despite their successful application to the analysis of complex bio-molecular data, the assessment of the statistical significance of the discovered clustering solutions and the detection of multiple structures simultaneously present in high-dimensional bio-molecular data are still major problems. Results: We propose a stability method based on randomized maps that exploits the high-dimensionality and relatively low cardinality that characterize bio-molecular data, by selecting subsets of randomized linear combinations of the input variables, and by using stability indices based on the overall distribution of similarity measures between multiple pairs of clusterings performed on the randomly projected data. A χ 2-based statistical test is proposed to assess the significance of the clustering solutions and to detect significant and if possible multi-level structures simultaneously present in the data (e.g. hierarchical structures)

    Email grouping method

    Get PDF

    Fuzzy clustering with volume prototypes and adaptive cluster merging

    Get PDF
    Two extensions to the objective function-based fuzzy clustering are proposed. First, the (point) prototypes are extended to hypervolumes, whose size can be fixed or can be determined automatically from the data being clustered. It is shown that clustering with hypervolume prototypes can be formulated as the minimization of an objective function. Second, a heuristic cluster merging step is introduced where the similarity among the clusters is assessed during optimization. Starting with an overestimation of the number of clusters in the data, similar clusters are merged in order to obtain a suitable partitioning. An adaptive threshold for merging is proposed. The extensions proposed are applied to Gustafson–Kessel and fuzzy c-means algorithms, and the resulting extended algorithm is given. The properties of the new algorithm are illustrated by various examples
    • …
    corecore