Overlapping Multi-hop Clustering for Wireless Sensor Networks
Clustering is a standard approach for achieving efficient and scalable
performance in wireless sensor networks. Traditionally, clustering algorithms
aim at generating a number of disjoint clusters that satisfy some criteria. In
this paper, we formulate a novel clustering problem that aims at generating
overlapping multi-hop clusters. Overlapping clusters are useful in many sensor
network applications, including inter-cluster routing, node localization, and
time synchronization protocols. We also propose a randomized, distributed
multi-hop clustering algorithm (KOCA) for solving the overlapping clustering
problem. KOCA aims at generating connected overlapping clusters that cover the
entire sensor network with a specified average overlapping degree. Through
analysis and simulation experiments, we show how to choose the parameter
values so as to meet the clustering objectives. Moreover, the
results show that KOCA produces approximately equal-sized clusters, which
allows distributing the load evenly over different clusters. In addition, KOCA
is scalable; cluster formation terminates in constant time regardless of
the network size.
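The overlapping formation the abstract describes can be sketched in a few lines, under loud assumptions: each node becomes a cluster head with some probability `p`, and every node joins the cluster of each head within `k` hops, so a node reachable from several heads belongs to several clusters. Function and parameter names are illustrative; this is not the paper's pseudocode.

```python
import random
from collections import deque

def koca_sketch(adj, p=0.3, k=2, seed=1):
    """Hedged sketch of KOCA-style overlapping multi-hop clustering.

    adj maps each node to its list of neighbours. Each node is made a
    cluster head with probability p; every node then joins the cluster
    of every head at most k hops away, so nodes covered by several
    heads end up in several (overlapping) clusters.
    """
    rng = random.Random(seed)
    heads = [v for v in adj if rng.random() < p]
    membership = {v: set() for v in adj}
    for h in heads:
        # BFS out to k hops from this head
        dist = {h: 0}
        q = deque([h])
        while q:
            u = q.popleft()
            if dist[u] == k:
                continue
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        # every node the head reached joins its cluster
        for v in dist:
            membership[v].add(h)
    return heads, membership
```

The average overlapping degree can then be read off as the mean number of cluster memberships per node, which is the quantity the head probability and hop limit jointly control.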
When Should You Adjust Standard Errors for Clustering?
In empirical work in economics it is common to report standard errors that
account for clustering of units. Typically, the motivation given for the
clustering adjustments is that unobserved components in outcomes for units
within clusters are correlated. However, because correlation may occur across
more than one dimension, this motivation makes it difficult to justify why
researchers use clustering in some dimensions, such as geographic, but not
others, such as age cohorts or gender. It also makes it difficult to explain
why one should not cluster with data from a randomized experiment. In this
paper, we argue that clustering is in essence a design problem, either a
sampling design or an experimental design issue. It is a sampling design issue
if sampling follows a two-stage process where, in the first stage, a subset of
clusters is sampled randomly from a population of clusters and, in the
second stage, units are sampled randomly from the sampled clusters. In this
case the clustering adjustment is justified by the fact that there are clusters
in the population that we do not see in the sample. Clustering is an
experimental design issue if the assignment is correlated within the clusters.
We take the view that this second perspective best fits the typical setting in
economics where clustering adjustments are used. This perspective allows us to
shed new light on three questions: (i) when should one adjust the standard
errors for clustering, (ii) when is the conventional adjustment for clustering
appropriate, and (iii) when does the conventional adjustment of the standard
errors matter.
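The mechanics of the clustering adjustment can be illustrated on the simplest estimator, a sample mean: the cluster-robust formula sums residuals within each cluster before squaring, so positively correlated errors within a cluster inflate the standard error relative to the iid formula. This is a generic sandwich-style sketch, not code from the paper, and it omits finite-sample corrections.

```python
from math import sqrt

def clustered_se_of_mean(values, clusters):
    """Cluster-robust standard error of the sample mean (sketch).

    Residuals are summed within each cluster before squaring, so
    within-cluster correlation shows up in the variance estimate.
    """
    n = len(values)
    xbar = sum(values) / n
    group_sums = {}
    for x, g in zip(values, clusters):
        group_sums[g] = group_sums.get(g, 0.0) + (x - xbar)
    return sqrt(sum(s * s for s in group_sums.values())) / n

def iid_se_of_mean(values):
    """Conventional iid standard error of the sample mean."""
    n = len(values)
    xbar = sum(values) / n
    return sqrt(sum((x - xbar) ** 2 for x in values) / (n - 1) / n)
```

With outcomes that are identical within clusters (the extreme of within-cluster correlation), the clustered standard error exceeds the iid one, which is the pattern motivating the adjustment.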
Comparing clusterings and numbers of clusters by aggregation of calibrated clustering validity indexes
A key issue in cluster analysis is the choice of an appropriate clustering
method and the determination of the best number of clusters. Different
clusterings are optimal on the same data set according to different criteria,
and the choice of such criteria depends on the context and aim of clustering.
Therefore, researchers need to consider what data analytic characteristics the
clusters they are aiming at are supposed to have, among others within-cluster
homogeneity, between-clusters separation, and stability. Here, a set of
internal clustering validity indexes measuring different aspects of clustering
quality is proposed, including some indexes from the literature. Users can
choose the indexes that are relevant in the application at hand. In order to
measure the overall quality of a clustering (for comparing clusterings from
different methods and/or different numbers of clusters), the index values are
calibrated for aggregation. Calibration is relative to a set of random
clusterings on the same data. Two specific aggregated indexes are proposed and
compared with existing indexes on simulated and real data.
Comment: 42 pages, 11 figures
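The calibration-then-aggregation idea can be sketched as follows: each raw index value is standardised against the distribution of the same index over random clusterings of the same data (here, random relabelings preserving cluster sizes), and the calibrated values are averaged. The z-score calibration, the relabeling scheme, and all names here are illustrative assumptions, not the paper's exact procedure.

```python
import random
from statistics import mean, pstdev

def neg_within_ss(data, labels):
    """Toy internal index for 1-D data: negative within-cluster sum of
    squares, so that larger values mean more homogeneous clusters."""
    groups = {}
    for x, g in zip(data, labels):
        groups.setdefault(g, []).append(x)
    total = 0.0
    for xs in groups.values():
        m = sum(xs) / len(xs)
        total += sum((x - m) ** 2 for x in xs)
    return -total

def calibrate_and_aggregate(index_fns, data, labels, n_random=50, seed=0):
    """Calibrate each index against random clusterings, then average.

    For each index, the raw value on `labels` is turned into a z-score
    relative to the index's distribution over random relabelings of the
    same data with the same cluster sizes; calibrated values are then
    aggregated by their mean.
    """
    rng = random.Random(seed)
    calibrated = []
    for fn in index_fns:
        raw = fn(data, labels)
        ref = []
        for _ in range(n_random):
            shuffled = labels[:]
            rng.shuffle(shuffled)  # random clustering, same sizes
            ref.append(fn(data, shuffled))
        sd = pstdev(ref) or 1.0  # guard against a degenerate reference set
        calibrated.append((raw - mean(ref)) / sd)
    return mean(calibrated)
```

Because each index is expressed on the common scale of "standard deviations above random", indexes measuring different aspects of quality (homogeneity, separation, stability) become comparable before aggregation, which is what makes comparisons across methods and numbers of clusters meaningful.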