14,776 research outputs found
Distributed Holistic Clustering on Linked Data
Link discovery is an active field of research to support data integration in
the Web of Data. Due to the huge size and number of available data sources,
efficient and effective link discovery is a very challenging task. Common
pairwise link discovery approaches do not scale to many sources with very large
entity sets. We here propose a distributed holistic approach to link many data
sources based on a clustering of entities that represent the same real-world
object. Our clustering approach provides a compact and fused representation of
entities, and can identify errors in existing links as well as many new links.
We support a distributed execution of the clustering approach to achieve faster
execution times and scalability for large real-world data sets. We provide a
novel gold standard for multi-source clustering, and evaluate our methods with
respect to effectiveness and efficiency for large data sets from the geographic
and music domains
Element-centric clustering comparison unifies overlaps and hierarchy
Clustering is one of the most universal approaches for understanding complex
data. A pivotal aspect of clustering analysis is quantitatively comparing
clusterings; clustering comparison is the basis for many tasks such as
clustering evaluation, consensus clustering, and tracking the temporal
evolution of clusters. In particular, the extrinsic evaluation of clustering
methods requires comparing the uncovered clusterings to planted clusterings or
known metadata. Yet, as we demonstrate, existing clustering comparison measures
have critical biases which undermine their usefulness, and no measure
accommodates both overlapping and hierarchical clusterings. Here we unify the
comparison of disjoint, overlapping, and hierarchically structured clusterings
by proposing a new element-centric framework: elements are compared based on
the relationships induced by the cluster structure, as opposed to the
traditional cluster-centric philosophy. We demonstrate that, in contrast to
standard clustering similarity measures, our framework does not suffer from
critical biases and naturally provides unique insights into how the clusterings
differ. We illustrate the strengths of our framework by revealing new insights
into the organization of clusters in two applications: the improved
classification of schizophrenia based on the overlapping and hierarchical
community structure of fMRI brain networks, and the disentanglement of various
social homophily factors in Facebook social networks. The universality of
clustering suggests far-reaching impact of our framework throughout all areas
of science
Predictive intelligence to the edge through approximate collaborative context reasoning
We focus on Internet of Things (IoT) environments where a network of sensing and computing devices are responsible to locally process contextual data, reason and collaboratively infer the appearance of a specific phenomenon (event). Pushing processing and knowledge inference to the edge of the IoT network allows the complexity of the event reasoning process to be distributed into many manageable pieces and to be physically located at the source of the contextual information. This enables a huge amount of rich data streams to be processed in real time that would be prohibitively complex and costly to deliver on a traditional centralized Cloud system. We propose a lightweight, energy-efficient, distributed, adaptive, multiple-context perspective event reasoning model under uncertainty on each IoT device (sensor/actuator). Each device senses and processes context data and infers events based on different local context perspectives: (i) expert knowledge on event representation, (ii) outliers inference, and (iii) deviation from locally predicted context. Such novel approximate reasoning paradigm is achieved through a contextualized, collaborative belief-driven clustering process, where clusters of devices are formed according to their belief on the presence of events. Our distributed and federated intelligence model efficiently identifies any localized abnormality on the contextual data in light of event reasoning through aggregating local degrees of belief, updates, and adjusts its knowledge to contextual data outliers and novelty detection. We provide comprehensive experimental and comparison assessment of our model over real contextual data with other localized and centralized event detection models and show the benefits stemmed from its adoption by achieving up to three orders of magnitude less energy consumption and high quality of inference
- …