Extensible Context-aware Stream Processing on the Cloud
Rationale and Challenges for Massive Data Stream Processing on the Cloud
The ubiquity of mobile devices, location services, and sensor pervasiveness, e.g., in smart city initiatives, call for scalable computing platforms and massively parallel architectures to process the vast amounts of streamed data they generate. Cloud computing provides some of the features needed for these massive data streaming applications. For example, the dynamic allocation of resources on an as-needed basis addresses the variability in sensor and location data distributions over time. However, today’s cloud computing platforms lack important features needed to support the massive volume of data streams expected from the ubiquitous dissemination of sensors and mobile devices of all sorts in smart-city-scale applications.
bdbms -- A Database Management System for Biological Data
Biologists are increasingly using databases for storing and managing their
data. Biological databases typically consist of a mixture of raw data,
metadata, sequences, annotations, and related data obtained from various
sources. Current database technology lacks several functionalities that are
needed by biological databases. In this paper, we introduce bdbms, an
extensible prototype database management system for supporting biological data.
bdbms extends the functionalities of current DBMSs to include: (1) Annotation
and provenance management including storage, indexing, manipulation, and
querying of annotation and provenance as first class objects in bdbms, (2)
Local dependency tracking to track the dependencies and derivations among data
items, (3) Update authorization to support data curation via content-based
authorization, in contrast to identity-based authorization, and (4) New access
methods and their supporting operators that support pattern matching on various
types of compressed biological data types. This paper presents the design of
bdbms along with the techniques proposed to support these functionalities
including an extension to SQL. We also outline some open issues in building
bdbms.
Comment: This article is published under a Creative Commons License Agreement
(http://creativecommons.org/licenses/by/2.5/). You may copy, distribute,
display, and perform the work, make derivative works and make commercial use
of the work, but you must attribute the work to the author and CIDR 2007.
3rd Biennial Conference on Innovative Data Systems Research (CIDR), January
7-10, 2007, Asilomar, California, US
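The local dependency tracking named in the bdbms abstract can be pictured with a small sketch. This is only one possible reading, not the bdbms design: the class `DependencyTracker`, its method names, and the idea of flagging derived items as "outdated" are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class DependencyTracker:
    """Hypothetical illustration: remember which items were derived from which,
    and flag derived items when one of their sources changes."""

    def __init__(self):
        self.dependents = defaultdict(set)  # source item -> items derived from it
        self.outdated = set()               # items whose sources have changed

    def add_dependency(self, source, derived):
        self.dependents[source].add(derived)

    def mark_updated(self, item):
        """Flag every item derived, directly or transitively, from `item`."""
        queue = deque([item])
        while queue:
            current = queue.popleft()
            for derived in self.dependents.get(current, ()):
                if derived not in self.outdated:  # also guards against cycles
                    self.outdated.add(derived)
                    queue.append(derived)

    def revalidate(self, item):
        """Clear the flag once a curator has re-derived or checked the item."""
        self.outdated.discard(item)


tracker = DependencyTracker()
tracker.add_dependency("gene_sequence", "protein_sequence")
tracker.add_dependency("protein_sequence", "predicted_function")
tracker.mark_updated("gene_sequence")
```

In this toy run, changing `gene_sequence` flags both `protein_sequence` and `predicted_function`; the item names are invented examples.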
An Update-intensive LSM-based R-tree Index
Many applications require update-intensive workloads on spatial objects,
e.g., social-network services and shared-riding services that track moving
objects. By buffering insert and delete operations in memory, the Log
Structured Merge Tree (LSM) has been used widely in various systems because of
its ability to handle write-heavy workloads. While the focus on LSM has been on
key-value stores and their optimizations, there is a need to study how to
efficiently support LSM-based secondary indexes (e.g., location-based
indexes) as modern, heterogeneous data necessitates the use of secondary
indexes. In this paper, we investigate the augmentation of a main-memory-based
memo structure into an LSM secondary index structure to handle update-intensive
workloads efficiently. We conduct this study in the context of an R-tree-based
secondary index. In particular, we introduce the LSM RUM-tree that demonstrates
the use of an Update Memo in an LSM-based R-tree to enhance the performance of
the R-tree's insert, delete, update, and search operations. The LSM RUM-tree
introduces new strategies to control the size of the Update Memo to make sure
it always fits in memory for high performance. The Update Memo is a
light-weight in-memory structure that is suitable for handling update-intensive
workloads without introducing significant overhead. Experimental results using
real spatial data demonstrate that the LSM RUM-tree achieves up to 9.6x speedup
on update operations and up to 2400x speedup on query processing over existing
LSM R-tree implementations.
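The general idea of an update memo can be sketched in a few lines. This is a simplified illustration, not the LSM RUM-tree itself: a plain list stands in for the LSM R-tree components, and the class `MemoIndex`, the stamp scheme, and the `compact` step are assumptions chosen to show how an in-memory memo lets updates skip the lookup of the old entry and how cleaning keeps the memo small.

```python
from itertools import count

TOMBSTONE = -1  # assumed marker: the object was deleted


class MemoIndex:
    """Toy append-only spatial index with an in-memory update memo."""

    def __init__(self):
        self.entries = []      # (oid, stamp, x, y); stands in for on-disk components
        self.memo = {}         # oid -> stamp of the only valid entry
        self._clock = count()  # source of increasing stamps

    def upsert(self, oid, x, y):
        stamp = next(self._clock)
        self.entries.append((oid, stamp, x, y))  # old entry is not searched for
        self.memo[oid] = stamp                   # older entries become obsolete

    def delete(self, oid):
        self.memo[oid] = TOMBSTONE

    def _is_valid(self, oid, stamp):
        # An entry with no memo record has no obsolete siblings, so it is valid.
        return oid not in self.memo or self.memo[oid] == stamp

    def range_query(self, xmin, ymin, xmax, ymax):
        return sorted(oid for oid, stamp, x, y in self.entries
                      if self._is_valid(oid, stamp)
                      and xmin <= x <= xmax and ymin <= y <= ymax)

    def compact(self):
        """Drop obsolete entries; afterwards the memo can be emptied."""
        self.entries = [e for e in self.entries if self._is_valid(e[0], e[1])]
        self.memo.clear()


idx = MemoIndex()
idx.upsert("a", 1, 1)
idx.upsert("b", 5, 5)
idx.upsert("a", 9, 9)  # moves "a"; the stale entry at (1, 1) stays in place
```

Here a query over the box (0, 0)-(6, 6) returns only `"b"`, because the memo marks the entry for `"a"` at (1, 1) as obsolete. Calling `compact()` removes that stale entry and empties the memo, which mirrors, in a very reduced form, the goal of keeping the memo small enough to stay in memory.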