1,410 research outputs found
Implementing Performance Competitive Logical Recovery
New hardware platforms, e.g. cloud, multi-core, etc., have led to a
reconsideration of database system architecture. Our Deuteronomy project
separates transactional functionality from data management functionality,
enabling a flexible response to exploiting new platforms. This separation
requires, however, that recovery is described logically. In this paper, we
extend current recovery methods to work in this logical setting. While this is
straightforward in principle, performance is an issue. We show how ARIES style
recovery optimizations can work for logical recovery where page information is
not captured on the log. In side-by-side performance experiments using a common
log, we compare logical recovery with a state-of-the art ARIES style recovery
implementation and show that logical redo performance can be competitive.Comment: VLDB201
Supporting service discovery, querying and interaction in ubiquitous computing environments.
In this paper, we contend that ubiquitous computing environments will be highly heterogeneous, service rich domains. Moreover, future applications will consequently be required to interact with multiple, specialised service location and interaction protocols simultaneously. We argue that existing service discovery techniques do not provide sufficient support to address the challenges of building applications targeted to these emerging environments. This paper makes a number of contributions. Firstly, using a set of short ubiquitous computing scenarios we identify several key limitations of existing service discovery approaches that reduce their ability to support ubiquitous computing applications. Secondly, we present a detailed analysis of requirements for providing effective support in this domain. Thirdly, we provide the design of a simple extensible meta-service discovery architecture that uses database techniques to unify service discovery protocols and addresses several of our key requirements. Lastly, we examine the lessons learnt through the development of a prototype implementation of our architecture
Data Vaults: A Symbiosis between Database Technology and Scientific File Repositories
In this short paper we outline the Data Vault, a database-attached external file repository.
It provides a true symbiosis between a DBMS and existing file-based repositories.
Data is kept in its original format while scalable processing functionality is provided through the DBMS facilities.
In particular, it provides transparent access to all data kept in the repository through an (array-based)
query language using the file-type specific scientific libraries.
The design space for data vaults is characterized by requirements coming from various fields.
We present a reference architecture for their realization in (commercial) DBMSs and a concrete
implementation in MonetDB for remote sensing data geared at content-based image retrieval
A comparison of two physical data designs for interactive social networking actions
This paper compares the performance of an SQL solution that implements a relational data model with a document store named MongoDB. We report on the performance of a single node configuration of each data store and assume the database is small enough to fit in main memory. We analyze utilization of the CPU cores and the network bandwidth to compare the two data stores. Our key findings are as follows. First, for those social networking actions that read and write a small amount of data, the join operator of the SQL solution is not slower than the JSON representation of MongoDB. Second, with a mix of actions, the SQL solution provides either the same performance as MongoDB or outperforms it by 20%. Third, a middle-tier cache enhances the performance of both data stores as query result look up is significantly faster than query processing with either system.
Performance Testing of Distributed Component Architectures
Performance characteristics, such as response time, throughput andscalability, are key quality attributes of distributed applications. Current practice,however, rarely applies systematic techniques to evaluate performance characteristics.We argue that evaluation of performance is particularly crucial in early developmentstages, when important architectural choices are made. At first glance, thiscontradicts the use of testing techniques, which are usually applied towards the endof a project. In this chapter, we assume that many distributed systems are builtwith middleware technologies, such as the Java 2 Enterprise Edition (J2EE) or theCommon Object Request Broker Architecture (CORBA). These provide servicesand facilities whose implementations are available when architectures are defined.We also note that it is the middleware functionality, such as transaction and persistenceservices, remote communication primitives and threading policy primitives,that dominates distributed system performance. Drawing on these observations, thischapter presents a novel approach to performance testing of distributed applications.We propose to derive application-specific test cases from architecture designs so thatthe performance of a distributed application can be tested based on the middlewaresoftware at early stages of a development process. We report empirical results thatsupport the viability of the approach
Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)
Database management systems (DBMSs) carefully optimize complex multi-join
queries to avoid expensive disk I/O. As servers today feature tens or hundreds
of gigabytes of RAM, a significant fraction of many analytic databases becomes
memory-resident. Even after careful tuning for an in-memory environment, a
linear disk I/O model such as the one implemented in PostgreSQL may make query
response time predictions that are up to 2X slower than the optimal multi-join
query plan over memory-resident data. This paper introduces a memory I/O cost
model to identify good evaluation strategies for complex query plans with
multiple hash-based equi-joins over memory-resident data. The proposed cost
model is carefully validated for accuracy using three different systems,
including an Amazon EC2 instance, to control for hardware-specific differences.
Prior work in parallel query evaluation has advocated right-deep and bushy
trees for multi-join queries due to their greater parallelization and
pipelining potential. A surprising finding is that the conventional wisdom from
shared-nothing disk-based systems does not directly apply to the modern
shared-everything memory hierarchy. As corroborated by our model, the
performance gap between the optimal left-deep and right-deep query plan can
grow to about 10X as the number of joins in the query increases.Comment: 15 pages, 8 figures, extended version of the paper to appear in
SoCC'1
- ā¦