
    Overlapping Multi-hop Clustering for Wireless Sensor Networks

    Clustering is a standard approach for achieving efficient and scalable performance in wireless sensor networks. Traditionally, clustering algorithms aim at generating a number of disjoint clusters that satisfy some criteria. In this paper, we formulate a novel clustering problem that aims at generating overlapping multi-hop clusters. Overlapping clusters are useful in many sensor network applications, including inter-cluster routing, node localization, and time synchronization protocols. We also propose a randomized, distributed multi-hop clustering algorithm (KOCA) for solving the overlapping clustering problem. KOCA aims at generating connected overlapping clusters that cover the entire sensor network with a specific average overlapping degree. Through analysis and simulation experiments, we show how to select the parameter values that achieve the objectives of the clustering process. Moreover, the results show that KOCA produces approximately equal-sized clusters, which allows the load to be distributed evenly over different clusters. In addition, KOCA is scalable: cluster formation terminates in constant time regardless of the network size.
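The abstract does not spell out KOCA's protocol, so the following is only an illustrative sketch of overlapping multi-hop clustering in general, not the authors' algorithm. It assumes that each node elects itself cluster head with a hypothetical probability `p`, that a head's cluster is every node within a hypothetical `k` hops, and that any node left uncovered promotes itself to head. The function names and the "memberships per node" measure are likewise assumptions; the paper's own definition of overlapping degree is not given in the abstract.

```python
import random
from collections import deque


def k_hop_neighbors(adj, source, k):
    """Return all nodes within k hops of source (breadth-first search)."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if hops[node] == k:
            continue  # do not expand beyond k hops
        for nxt in adj[node]:
            if nxt not in hops:
                hops[nxt] = hops[node] + 1
                queue.append(nxt)
    return set(hops)


def overlapping_clusters(adj, p, k, seed=0):
    """Assumed scheme: random heads, k-hop membership, fallback heads for coverage."""
    rng = random.Random(seed)
    heads = [n for n in adj if rng.random() < p]
    clusters = {h: k_hop_neighbors(adj, h, k) for h in heads}
    covered = set().union(*clusters.values())
    for n in adj:
        if n not in covered:  # uncovered node becomes a head itself
            clusters[n] = k_hop_neighbors(adj, n, k)
            covered |= clusters[n]
    return clusters


def memberships_per_node(adj, clusters):
    """Mean number of clusters each node belongs to (a simple overlap proxy)."""
    return sum(len(members) for members in clusters.values()) / len(adj)


# Hypothetical 10-node line topology: node i is linked to i-1 and i+1.
line = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
clusters = overlapping_clusters(line, p=0.3, k=2)
```

Because each cluster is a k-hop ball around its head, every cluster is connected, and a node that lies within k hops of several heads belongs to all of them, which is where the overlap comes from.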

    When Should You Adjust Standard Errors for Clustering?

    In empirical work in economics it is common to report standard errors that account for clustering of units. Typically, the motivation given for the clustering adjustments is that unobserved components in outcomes for units within clusters are correlated. However, because correlation may occur across more than one dimension, this motivation makes it difficult to justify why researchers use clustering in some dimensions, such as geographic, but not others, such as age cohorts or gender. It also makes it difficult to explain why one should not cluster with data from a randomized experiment. In this paper, we argue that clustering is in essence a design problem, either a sampling design or an experimental design issue. It is a sampling design issue if sampling follows a two-stage process where, in the first stage, a subset of clusters is sampled randomly from a population of clusters, while in the second stage, units are sampled randomly from the sampled clusters. In this case the clustering adjustment is justified by the fact that there are clusters in the population that we do not see in the sample. Clustering is an experimental design issue if the assignment is correlated within the clusters. We take the view that this second perspective best fits the typical setting in economics where clustering adjustments are used. This perspective allows us to shed new light on three questions: (i) when should one adjust the standard errors for clustering, (ii) when is the conventional adjustment for clustering appropriate, and (iii) when does the conventional adjustment of the standard errors matter.
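To make "the conventional adjustment" concrete, here is a minimal sketch for the simplest case, the standard error of a sample mean. The abstract gives no formula; the code assumes the usual cluster-robust (Liang-Zeger) sandwich estimator without a finite-sample correction, which for an intercept-only model reduces to summing residuals within each cluster before squaring. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def naive_se(y):
    """Standard error of the mean, treating all units as independent."""
    n = len(y)
    mean = sum(y) / n
    variance = sum((v - mean) ** 2 for v in y) / (n - 1)
    return math.sqrt(variance / n)


def cluster_robust_se(y, clusters):
    """Cluster-robust SE of the mean, with no small-sample correction.

    Residuals are summed within each cluster first, so correlation
    among units in the same cluster is not averaged away.
    """
    n = len(y)
    mean = sum(y) / n
    residual_sums = defaultdict(float)
    for value, cluster in zip(y, clusters):
        residual_sums[cluster] += value - mean
    return math.sqrt(sum(s ** 2 for s in residual_sums.values())) / n


# Toy data: outcomes are identical within each of three clusters.
y = [1, 1, 1, 1, 5, 5, 5, 5, 9, 9, 9, 9]
clusters = ["a"] * 4 + ["b"] * 4 + ["c"] * 4

se_naive = naive_se(y)
se_clustered = cluster_robust_se(y, clusters)
```

With outcomes perfectly correlated within clusters, the clustered standard error comes out larger than the naive one. The code only computes the adjustment; whether applying it is appropriate is the design question the paper addresses.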

    Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes

    A key issue in cluster analysis is the choice of an appropriate clustering method and the determination of the best number of clusters. Different clusterings are optimal on the same data set according to different criteria, and the choice of such criteria depends on the context and aim of clustering. Therefore, researchers need to consider what data analytic characteristics the clusters they are aiming at are supposed to have, among others within-cluster homogeneity, between-clusters separation, and stability. Here, a set of internal clustering validity indexes measuring different aspects of clustering quality is proposed, including some indexes from the literature. Users can choose the indexes that are relevant in the application at hand. In order to measure the overall quality of a clustering (for comparing clusterings from different methods and/or different numbers of clusters), the index values are calibrated for aggregation. Calibration is relative to a set of random clusterings on the same data. Two specific aggregated indexes are proposed and compared with existing indexes on simulated and real data. Comment: 42 pages, 11 figures
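The idea of calibrating indexes against random clusterings before aggregating them can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the two indexes (`homogeneity`, `separation`), the uniform random label assignment, the z-score standardization, and the unweighted mean used for aggregation are all hypothetical choices made for the example.

```python
import math
import random
import statistics


def _pair_distances(points, labels, same_cluster):
    """Distances over point pairs that do (or do not) share a cluster."""
    n = len(points)
    return [math.dist(points[i], points[j])
            for i in range(n) for j in range(i + 1, n)
            if (labels[i] == labels[j]) == same_cluster]


def homogeneity(points, labels):
    """Negative mean within-cluster distance (larger means tighter)."""
    dists = _pair_distances(points, labels, same_cluster=True)
    return -statistics.mean(dists) if dists else 0.0


def separation(points, labels):
    """Mean between-cluster distance (larger means better separated)."""
    dists = _pair_distances(points, labels, same_cluster=False)
    return statistics.mean(dists) if dists else 0.0


def calibrate(index_fn, points, labels, k, n_random=200, seed=0):
    """Standardize an index against its values on random k-clusterings."""
    rng = random.Random(seed)
    baseline = [index_fn(points, [rng.randrange(k) for _ in points])
                for _ in range(n_random)]
    sd = statistics.stdev(baseline)
    if sd == 0:
        return 0.0  # index does not vary on random clusterings
    return (index_fn(points, labels) - statistics.mean(baseline)) / sd


def aggregated_index(points, labels, k):
    """Unweighted mean of the calibrated indexes (assumed aggregation)."""
    return statistics.mean(calibrate(fn, points, labels, k)
                           for fn in (homogeneity, separation))


# Two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
good = [0, 0, 0, 0, 1, 1, 1, 1]   # matches the two groups
bad = [0, 1, 0, 1, 0, 1, 0, 1]    # splits both groups
```

Standardizing first puts indexes measured on different scales (here a negated distance and a plain distance) on a common footing, so that averaging them is meaningful. On this toy data, `aggregated_index(points, good, 2)` exceeds `aggregated_index(points, bad, 2)`.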