Dynamic Action Scheduling in a Parallel Database System
This paper describes a scheduling technique for parallel database systems to obtain high performance, both in terms of response time and throughput. The technique enables both intra- and inter-transaction parallelism while controlling concurrency between transactions correctly. Scheduling is performed dynamically at transaction execution time, taking into account dynamic aspects of the execution and allowing parallelism between the scheduling and transaction execution processes. The technique has a solid conceptual background, based on a simple graph-based approach. The usability and effectiveness of the technique are demonstrated by implementation in and measurements on the parallel PRISMA database system.
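The abstract does not reproduce the paper's actual graph model, but the general idea of scheduling actions dynamically over a precedence graph can be sketched as follows; the action names, the edge representation, and the batch-at-a-time execution loop are all illustrative assumptions, not the PRISMA implementation.

```python
from collections import defaultdict, deque

def schedule(actions, conflicts):
    """Dynamically schedule actions over a precedence graph.

    `conflicts` is a list of (before, after) edges: `after` may only run
    once `before` has finished.  At each step, every action with no
    unfinished predecessor is ready; a real system would run each ready
    batch in parallel, while this sketch simply serialises it."""
    indeg = {a: 0 for a in actions}
    succ = defaultdict(list)
    for before, after in conflicts:
        succ[before].append(after)
        indeg[after] += 1
    ready = deque(a for a in actions if indeg[a] == 0)
    order = []
    while ready:
        batch = list(ready)   # this whole batch could run concurrently
        ready.clear()
        for a in batch:
            order.append(a)
            for b in succ[a]:
                indeg[b] -= 1
                if indeg[b] == 0:
                    ready.append(b)
    return order
```

Because the ready set is recomputed as actions finish, the schedule adapts to run-time behaviour rather than being fixed in advance, which is the dynamic aspect the abstract emphasises.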
UTILISING NETWORKED WORKSTATIONS TO ACCELERATE DATABASE QUERIES
The rapid growth in the size of databases and the advances made in query languages have resulted in increasingly complex SQL queries being submitted by users, which in turn slows down the speed of information retrieval from the database. The future of high-performance database systems lies in parallelism. Commercial database vendors have introduced parallel solutions, but these have proved to be extremely expensive.
This paper investigates how networked resources such as workstations can be utilised, by means of the Parallel Virtual Machine (PVM), to optimise database query execution. An investigation and experiments on the scalability of PVM are conducted. PVM is used to implement parallelism in two separate ways:
(i) it removes the workload of deriving and maintaining rules from the data server for Semantic Query Optimisation (SQO), thereby clearing the way for more widespread use of SQO in databases [16], [5];
(ii) it answers users' queries with a proposed Parallel Query Algorithm (PQA), which works over a network of workstations coupled with a sequential Database Management System (DBMS), PostgreSql, on a prototype called the Expandable Server Architecture (ESA) [11], [12], [21], [13].
Experiments have been conducted to tackle the problems of parallel and distributed systems such as task scheduling, load balancing and fault tolerance.
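PVM itself is a C/Fortran message-passing library, so the following is only an analogy: a sketch of the partition-scan-merge pattern behind a parallel query algorithm, using Python threads as a stand-in for PVM worker tasks. The table layout, the column name `salary`, and the filter predicate are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def scan_fragment(rows, threshold):
    # each worker scans its own horizontal fragment of the table
    return [r for r in rows if r["salary"] > threshold]

def parallel_query(table, n_workers, threshold):
    # horizontally partition the table, one fragment per worker
    fragments = [table[i::n_workers] for i in range(n_workers)]
    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        futures = [ex.submit(scan_fragment, f, threshold) for f in fragments]
        # merge the partial answers, mirroring the result-collection step
        return [row for fut in futures for row in fut.result()]
```

In a PVM setting the fragments would live on different workstations and the merge would happen via message passing rather than shared memory, but the decomposition of one query into independent fragment scans is the same.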
Memory aware query scheduling in a database cluster
Query throughput is one of the primary optimization goals in interactive web-based information systems in order to achieve the performance necessary to serve large user communities. Queries in this application domain differ significantly from those in traditional database applications: they are of lower complexity and almost exclusively read-only. The architecture we propose here is specifically tailored to take advantage of the query characteristics. It is based on a large parallel shared-nothing database cluster where each node runs a separate server with a fully replicated copy of the database. A query is assigned to, and entirely executed on, a single node, avoiding network contention and synchronization effects. However, the actual key to enhanced throughput is a resource-efficient scheduling of the arriving queries. We develop a simple and robust scheduling scheme that takes the currently memory-resident data at each server into account and trades off memory re-use and execution time, reordering queries as necessary. Our experimental evaluation demonstrates the effectiveness of this approach when scaling the system to hundreds of nodes, showing super-linear speedup.
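The paper's actual scheduling scheme is not given in the abstract; the following is only a minimal sketch of the assignment half of such a memory-aware scheduler (the reordering step is omitted), with an invented node structure: prefer the replica whose memory-resident tables overlap most with the query, breaking ties by the shortest pending-work queue.

```python
def assign(query_tables, nodes):
    """Pick a replica node for a query in a fully replicated cluster.

    `query_tables` is the set of tables the query touches; each node is
    a dict with a `resident` set of cached tables and a `pending` load
    counter (both fields are assumptions of this sketch).  Prefer memory
    re-use, then the least-loaded node."""
    def score(node):
        reuse = len(query_tables & node["resident"])
        return (reuse, -node["pending"])
    best = max(nodes, key=score)
    best["pending"] += 1
    best["resident"] |= query_tables  # running the query caches its tables
    return best["name"]
```

Routing queries toward nodes that already hold the needed data in memory is what lets each node behave like a cache specialised for a subset of the workload, which is one plausible source of the super-linear scaling the abstract reports.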
Towards Efficient Locality Aware Parallel Data Stream Processing
Abstract: Parallel data processing and parallel streaming systems have become quite popular. They are employed in various domains such as real-time signal processing, OLAP database systems, or high-performance data extraction. One of the key components of these systems is the task scheduler, which plans and executes tasks spawned by the application on available CPU cores. Modern multiprocessor systems and CPU architectures have become quite complex, which makes task scheduling a challenging problem. In this paper, we propose a novel task scheduling strategy for parallel data stream systems that reflects many technical issues of the current hardware. In addition, we have implemented a NUMA-aware memory allocator that improves data locality in NUMA systems. The proposed task scheduler combined with the new memory allocator achieves up to 3× speedup on a NUMA system and up to 10% speedup on an older SMP system with respect to the unoptimized versions of the scheduler and allocator. Many of the ideas implemented in our parallel framework may be adopted for task scheduling in other domains that focus on different priorities or employ additional constraints.
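The paper's scheduler is not described in detail here, but the core locality idea behind NUMA-aware task scheduling can be sketched with per-node task queues: a worker drains its own NUMA node's queue first and only steals from remote nodes when idle. The class and method names are illustrative, not the paper's API.

```python
from collections import deque

class LocalityScheduler:
    """Toy locality-aware scheduler: one FIFO queue per NUMA node, so
    tasks usually run on the node where their data was allocated."""

    def __init__(self, n_nodes):
        self.queues = [deque() for _ in range(n_nodes)]

    def submit(self, task, home_node):
        # enqueue the task on the node holding its data
        self.queues[home_node].append(task)

    def next_task(self, worker_node):
        # local work first: remote memory access is the expensive case
        if self.queues[worker_node]:
            return self.queues[worker_node].popleft(), worker_node
        for node, q in enumerate(self.queues):  # fall back to stealing
            if q:
                return q.popleft(), node
        return None, None
```

The stealing fallback keeps cores busy under load imbalance, trading a remote-memory penalty for utilisation; a real implementation would pair this with a NUMA-aware allocator so that "home node" actually matches the physical placement of the task's data.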
Executing Multidatabase Transactions
In a multidatabase environment, the traditional transaction model has been found to be too restrictive. Therefore, several extended transaction models have been proposed in which some of the requirements of transactions, such as isolation or atomicity, are optional. The authors describe one such extension, the flexible transaction model, and discuss the scheduling of transactions involving multiple autonomous database systems managed by heterogeneous DBMSs.
The scheduling algorithm for flexible transactions is implemented using L.0, a logically parallel language which provides a framework for concisely specifying multidatabase transactions and for scheduling them. The key aspects of a flexible transaction specification, such as subtransaction execution dependencies and transaction success criteria, can be naturally represented in L.0. Furthermore, scheduling in L.0 achieves the maximal parallelism allowed by the transaction specifications, which improves their response times.
To provide access to multiple heterogeneous hardware and software systems, the authors use the Distributed Operation Language (DOL). The DOL approach is based on a common communication and data exchange protocol, and uses local access managers to protect the autonomy of member software systems. When L.0 determines that a subtransaction is ready to execute, it hands the subtransaction through an interface to the DOL system for execution. The interface between L.0 and DOL provides the former with the execution status of subtransactions.
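Neither L.0 nor DOL is shown in the abstract, so the following is only a sketch of the success-criteria idea in the flexible transaction model: the transaction commits if any one alternative path of subtransactions commits in full. The `execute` callback and the list-of-alternatives encoding are assumptions of this sketch; a real system would also compensate the partial effects of a failed alternative.

```python
def run_flexible(alternatives, execute):
    """Run a flexible transaction.

    `alternatives` is an ordered list of paths, each path an ordered list
    of subtransactions; `execute` returns True on commit and False on
    abort.  The success criterion is met as soon as one whole path
    commits; earlier subtransactions in a path gate later ones, which is
    the execution-dependency aspect of the model."""
    for path in alternatives:
        if all(execute(sub) for sub in path):
            return path  # success criterion satisfied
    return None  # every alternative failed: the flexible transaction aborts
```

For instance, "book a seat on airline A, else on airline B" becomes two alternatives; atomicity is relaxed because the transaction as a whole can succeed even though the first alternative aborted midway.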
Disk Scheduling for Intermediate Results of Large Join Queries in Shared-Disk Parallel Database Systems
In shared-disk database systems, disk access has to be scheduled properly to avoid unnecessary contention between processors. The first part of this report studies the allocation of intermediate results of join queries (buckets) on disk and derives heuristics to determine the number of processing nodes and disks to employ. Using an analytical model, we show that declustering should be applied even for single buckets to ensure optimal performance. In the second part, we consider the order of reading the buckets and demonstrate the necessity of highly dynamic load balancing to prevent excessive disk contention, especially under skew conditions.
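The report's analytical model is not reproduced here, but the declustering policy it argues for is easy to illustrate: spread the pages of each bucket round-robin over the available disks, so that even a single bucket can later be read back with full I/O parallelism. The page/disk representation below is an assumption for illustration.

```python
def decluster(bucket_pages, disks):
    """Round-robin the pages of one join bucket over `disks` disks.

    Returns a mapping from disk number to the list of pages placed on
    it; reading the bucket back can then drive all disks at once instead
    of serialising on a single spindle."""
    placement = {d: [] for d in range(disks)}
    for i, page in enumerate(bucket_pages):
        placement[i % disks].append(page)
    return placement
```

The second half of the report then concerns the read side: in which order concurrent joins should fetch their buckets so that two processors do not pile onto the same disk at the same time, which this static placement alone cannot prevent.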
A batch scheduler with high level components
In this article we present the design choices and the evaluation of a batch
scheduler for large clusters, named OAR. This batch scheduler is based upon an
original design that emphasizes on low software complexity by using high level
tools. The global architecture is built upon the scripting language Perl and
the relational database engine Mysql. The goal of the project OAR is to prove
that it is possible today to build a complex system for ressource management
using such tools without sacrificing efficiency and scalability. Currently, our
system offers most of the important features implemented by other batch
schedulers such as priority scheduling (by queues), reservations, backfilling
and some global computing support. Despite the use of high level tools, our
experiments show that our system has performances close to other systems.
Furthermore, OAR is currently exploited for the management of 700 nodes (a
metropolitan GRID) and has shown good efficiency and robustness
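OAR itself is written in Perl over MySQL, so the following Python fragment is only an illustration of one feature the abstract lists, EASY-style backfilling: a waiting job may start out of order if it fits on the currently free nodes and will finish before the reserved start time of the job at the head of the queue. The job encoding and function signature are assumptions of this sketch, not OAR's interface.

```python
def backfill(queue, free_nodes, head_start_time, now):
    """EASY-style backfilling sketch.

    `queue` holds the waiting jobs behind the head job, each a dict with
    `name`, `nodes` (node count) and `walltime`; `head_start_time` is
    the reservation made for the head job.  Start every job that fits on
    the free nodes and ends before that reservation, so backfilled jobs
    never delay the head job."""
    started = []
    for job in list(queue):          # copy: we mutate `queue` while iterating
        fits = job["nodes"] <= free_nodes
        ends_in_time = now + job["walltime"] <= head_start_time
        if fits and ends_in_time:
            free_nodes -= job["nodes"]
            queue.remove(job)
            started.append(job["name"])
    return started, free_nodes
```

Expressing this kind of policy as a short loop over rows that, in OAR's case, live in a relational database is exactly the "high-level tools, low software complexity" trade-off the article defends.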