19,864 research outputs found
When Things Matter: A Data-Centric View of the Internet of Things
With the recent advances in radio-frequency identification (RFID), low-cost
wireless sensor devices, and Web technologies, the Internet of Things (IoT)
approach has gained momentum in connecting everyday objects to the Internet and
facilitating machine-to-human and machine-to-machine communication with the
physical world. While IoT offers the capability to connect and integrate both
digital and physical entities, enabling a whole new class of applications and
services, several significant challenges need to be addressed before these
applications and services can be fully realized. A fundamental challenge
centers around managing IoT data, typically produced in dynamic and volatile
environments, which is not only extremely large in scale and volume, but also
noisy, and continuous. This article surveys the main techniques and
state-of-the-art research efforts in IoT from data-centric perspectives,
including data stream processing, data storage models, complex event
processing, and searching in IoT. Open research issues for IoT data management
are also discussed
Network Sampling: From Static to Streaming Graphs
Network sampling is integral to the analysis of social, information, and
biological networks. Since many real-world networks are massive in size,
continuously evolving, and/or distributed in nature, the network structure is
often sampled in order to facilitate study. For these reasons, a more thorough
and complete understanding of network sampling is critical to support the field
of network science. In this paper, we outline a framework for the general
problem of network sampling, by highlighting the different objectives,
population and units of interest, and classes of network sampling methods. In
addition, we propose a spectrum of computational models for network sampling
methods, ranging from the traditionally studied model based on the assumption
of a static domain to a more challenging model that is appropriate for
streaming domains. We design a family of sampling methods based on the concept
of graph induction that generalize across the full spectrum of computational
models (from static to streaming) while efficiently preserving many of the
topological properties of the input graphs. Furthermore, we demonstrate how
traditional static sampling algorithms can be modified for graph streams for
each of the three main classes of sampling methods: node, edge, and
topology-based sampling. Our experimental results indicate that our proposed
family of sampling methods more accurately preserves the underlying properties
of the graph for both static and streaming graphs. Finally, we study the impact
of network sampling algorithms on the parameter estimation and performance
evaluation of relational classification algorithms
Loom: Query-aware Partitioning of Online Graphs
As with general graph processing systems, partitioning data over a cluster of
machines improves the scalability of graph database management systems.
However, these systems will incur additional network cost during the execution
of a query workload, due to inter-partition traversals. Workload-agnostic
partitioning algorithms typically minimise the likelihood of any edge crossing
partition boundaries. However, these partitioners are sub-optimal with respect
to many workloads, especially queries, which may require more frequent
traversal of specific subsets of inter-partition edges. Furthermore, they
largely unsuited to operating incrementally on dynamic, growing graphs.
We present a new graph partitioning algorithm, Loom, that operates on a
stream of graph updates and continuously allocates the new vertices and edges
to partitions, taking into account a query workload of graph pattern
expressions along with their relative frequencies.
First we capture the most common patterns of edge traversals which occur when
executing queries. We then compare sub-graphs, which present themselves
incrementally in the graph update stream, against these common patterns.
Finally we attempt to allocate each match to single partitions, reducing the
number of inter-partition edges within frequently traversed sub-graphs and
improving average query performance.
Loom is extensively evaluated over several large test graphs with realistic
query workloads and various orderings of the graph updates. We demonstrate
that, given a workload, our prototype produces partitionings of significantly
better quality than existing streaming graph partitioning algorithms Fennel and
LDG
- …