Emerging multidisciplinary research across database management systems
The database community is exploring more and more multidisciplinary avenues:
Data semantics overlaps with ontology management; reasoning tasks venture into
the domain of artificial intelligence; and data stream management and
information retrieval shake hands, e.g., when processing Web click-streams.
These new research avenues become evident, for example, in the topics that
doctoral students choose for their dissertations. This paper surveys the
emerging multidisciplinary research by doctoral students in database systems
and related areas. It is based on PIKM 2010, the 3rd Ph.D.
workshop at the International Conference on Information and Knowledge
Management (CIKM). The topics addressed include ontology development, data
streams, natural language processing, medical databases, green energy, cloud
computing, and exploratory search. In addition to core ideas from the workshop,
we list some open research questions in these multidisciplinary areas.
Quality-Driven Disorder Handling for M-way Sliding Window Stream Joins
Sliding window join is one of the most important operators for stream
applications. To produce high quality join results, a stream processing system
must deal with the ubiquitous disorder within input streams which is caused by
network delay, asynchronous source clocks, etc. Disorder handling involves an
inevitable tradeoff between the latency and the quality of produced join
results. To meet different requirements of stream applications, it is desirable
to provide a user-configurable result-latency vs. result-quality tradeoff.
Existing disorder handling approaches either do not provide such
configurability, or support only user-specified latency constraints.
In this work, we advocate the idea of quality-driven disorder handling, and
propose a buffer-based disorder handling approach for sliding window joins,
which minimizes the sizes of input-sorting buffers, and thus the result latency, while
respecting user-specified result-quality requirements. The core of our approach
is an analytical model which directly captures the relationship between sizes
of input buffers and the produced result quality. Our approach is generic. It
supports m-way sliding window joins with arbitrary join conditions. Experiments
on real-world and synthetic datasets show that, compared to the state of the
art, our approach can reduce the result latency incurred by disorder handling
by up to 95% while providing the same level of result quality.
Comment: 12 pages, 11 figures, IEEE ICDE 201
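The latency versus quality tradeoff of an input-sorting buffer can be illustrated with a minimal sketch. It is not the analytical model from the abstract above: it only shows a generic fixed-slack sorting buffer, and the class name `KSlackBuffer`, the parameter `k`, and the integer timestamps are assumptions made for illustration. A larger `k` reorders more late tuples (higher quality) but holds every tuple longer (higher latency).

```python
import heapq


class KSlackBuffer:
    """Hypothetical fixed-slack sorting buffer for one input stream.

    A tuple is released once its timestamp is at least `k` time units
    older than the largest timestamp seen so far.
    """

    def __init__(self, k):
        self.k = k                      # slack: how long tuples are held back
        self.max_ts = float("-inf")     # largest timestamp observed so far
        self._heap = []                 # min-heap ordered by timestamp
        self._seq = 0                   # tie-breaker for equal timestamps

    def push(self, ts, item):
        """Buffer one tuple and return the tuples that are now safe to emit."""
        self.max_ts = max(self.max_ts, ts)
        heapq.heappush(self._heap, (ts, self._seq, item))
        self._seq += 1
        released = []
        while self._heap and self._heap[0][0] <= self.max_ts - self.k:
            t, _, it = heapq.heappop(self._heap)
            released.append((t, it))
        return released

    def flush(self):
        """Emit everything still buffered, in timestamp order."""
        out = []
        while self._heap:
            t, _, it = heapq.heappop(self._heap)
            out.append((t, it))
        return out


if __name__ == "__main__":
    buf = KSlackBuffer(k=2)
    for ts, item in [(1, "a"), (3, "b"), (2, "c"), (5, "d"), (4, "e")]:
        print(ts, "->", buf.push(ts, item))
    print("flush ->", buf.flush())
```

In this sketch the out-of-order tuple with timestamp 2 is still emitted before timestamp 3 because the slack of 2 kept timestamp 3 buffered long enough. With `k=0` every tuple would be released immediately and the disorder would pass through to the join.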
FPTree: A Hybrid SCM-DRAM Persistent and Concurrent B-Tree for Storage Class Memory
The advent of Storage Class Memory (SCM) is driving a rethink of storage systems towards a single-level architecture where memory and storage are merged. In this context, several works have investigated how to design persistent trees in SCM as a fundamental building block for these novel systems. However, these trees are significantly slower than DRAM-based counterparts since trees are latency-sensitive and SCM exhibits higher latencies than DRAM. In this paper, we propose a novel hybrid SCM-DRAM persistent and concurrent B-Tree, named Fingerprinting Persistent Tree (FPTree), that achieves similar performance to DRAM-based counterparts. In this novel design, leaf nodes are persisted in SCM while inner nodes are placed in DRAM and rebuilt upon recovery. The FPTree uses Fingerprinting, a technique that limits the expected number of in-leaf probed keys to one. In addition, we propose a hybrid concurrency scheme for the FPTree that is partially based on Hardware Transactional Memory. We conduct a thorough performance evaluation and show that the FPTree outperforms state-of-the-art persistent trees with different SCM latencies by up to a factor of 8.2. Moreover, we show that the FPTree scales very well on a machine with 88 logical cores. Finally, we integrate the evaluated trees in memcached and a prototype database. We show that the FPTree incurs an almost negligible performance overhead over using fully transient data structures, while significantly outperforming other persistent trees.
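The idea of filtering in-leaf probes with fingerprints can be sketched as follows. This is an illustrative in-memory toy, not the FPTree implementation: the one-byte BLAKE2b hash, the slot layout, the `Leaf` class, and the default capacity are all assumptions, and persistence, recovery, and concurrency are left out entirely. The point is only that a lookup scans the small fingerprint array first and compares a full key only where the fingerprint matches.

```python
import hashlib


def fingerprint(key: bytes) -> int:
    """One-byte hash of a key (hash choice is an assumption of this sketch)."""
    return hashlib.blake2b(key, digest_size=1).digest()[0]


class Leaf:
    """Hypothetical unsorted leaf with a fingerprint per slot."""

    def __init__(self, capacity=32):
        self.fingerprints = [None] * capacity   # None marks a free slot
        self.keys = [None] * capacity
        self.values = [None] * capacity
        self.key_probes = 0                     # full key comparisons in get()

    def insert(self, key: bytes, value) -> bool:
        """Insert or update a key; return False if the leaf is full."""
        fp = fingerprint(key)
        free = None
        for i, slot_fp in enumerate(self.fingerprints):
            if slot_fp is None:
                if free is None:
                    free = i
            elif slot_fp == fp and self.keys[i] == key:
                self.values[i] = value          # key already present: update
                return True
        if free is None:
            return False                        # a real tree would split here
        self.fingerprints[free] = fp
        self.keys[free] = key
        self.values[free] = value
        return True

    def get(self, key: bytes):
        """Return the value for key, or None if it is absent."""
        fp = fingerprint(key)
        for i, slot_fp in enumerate(self.fingerprints):
            if slot_fp == fp:                   # cheap one-byte comparison
                self.key_probes += 1            # only now touch the full key
                if self.keys[i] == key:
                    return self.values[i]
        return None


if __name__ == "__main__":
    leaf = Leaf(capacity=8)
    for n in range(8):
        leaf.insert(f"key{n}".encode(), n)
    print(leaf.get(b"key5"), "full key comparisons:", leaf.key_probes)
```

With one-byte fingerprints, two different keys collide with probability about 1/256 under a uniform hash, so a lookup for a present key usually compares only that key in full.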
PIKM 2010: ACM Workshop for Ph.D. Students in Information and Knowledge Management
The PIKM workshop focuses on papers consisting mainly of the Ph.D. dissertation proposals of doctoral students. The workshop covers a wide range of topics in databases, information retrieval, and knowledge management. The areas of interest are similar to those of the three corresponding tracks at the CIKM main conference. Interdisciplinary work across these tracks is encouraged.
Real-Time Networking over HIPPI
HIPPI provides a very-high-speed communication medium that is well suited to a large number of bandwidth-demanding distributed applications. Unfortunately, its circuit-switched nature makes it very difficult to provide real-time guarantees when connections contend for network resources. We present a time-division-multiplex access scheme designed to give timing guarantees to high-speed connections. We describe the problem of scheduling access to a HIPPI network and show that, although the problem is very unlikely to be computationally tractable, very simple heuristics give high network utilization for moderately sized networks. We present the RMP/RMCP protocol, our implementation of the scheme described in this paper on the XUNET-West HIPPI testbed.
1 Introduction
A large number of applications in distributed control, distributed virtual reality, and remote laboratoring demand hard delay guarantees in order to satisfy the timing requirements of their time-critical com..
Using Containment Information for View Evolution in Dynamic Distributed Environments
The maintenance of materialized views in large-scale environments composed of numerous information sources (ISs), such as the WWW, is complicated by ISs continuously modifying not only their contents but also their capabilities (schemas and query interfaces). With current view technology, views become undefined when ISs change their capabilities. Our Evolvable View Environment (EVE) project addresses this new problem of evolving views under IS capability changes, which we call the view synchronization problem. Key principles of EVE include a user-specified preference model for view evolution (Evolvable SQL (E-SQL)) and a Model for Information Source Descriptions (MISD). In this paper, we first present a formal characterization of the correctness of view synchronization using containment constraints defined in MISD. Then, we give a novel view synchronization algorithm for view rewriting that exploits general containment constraints between the to-be-replaced relation and its replacement.
1. Int..