A Framework for Developing Real-Time OLAP algorithm using Multi-core processing and GPU: Heterogeneous Computing
The rapidly growing amount of stored data has spurred researchers to seek
different methods of exploiting it, most of which face a response-time problem
because of the enormous size of the data. Most solutions have suggested
materialization as the preferred approach. However, such a solution cannot
deliver Real-Time answers. In this paper we propose a framework illustrating
the barriers to, and suggested solutions for, achieving Real-Time OLAP answers,
which are widely used in decision support systems and data warehouses.
Distributed Database Design: A Case Study
Data Allocation is an important problem in Distributed Database Design. Generally, evolutionary algorithms are used to determine the assignments of fragments to sites. Data Allocation Algorithms should handle replication, query frequencies, quality of service (QoS), site capacities, table update costs, and selection and projection costs. Most of the algorithms in the literature attack one or a few components of the problem. In this paper, we present a case study considering all of these features. The proposed model uses Integer Linear Programming for the formulation of the problem. (C) 2014 The Authors. Published by Elsevier B.V.
On hierarchical clustering-based approach for RDDBS design
Distributed database system (DDBS) design is still an open challenge even after decades of research, especially in a dynamic network setting. Hence, to meet the demands of high-speed data gathering and for the management and preservation of huge systems, it is important to construct a distributed database for real-time data storage. Fragmentation schemes such as horizontal, vertical, and hybrid are widely used for DDBS design. At the same time, data allocation cannot be done without first physically fragmenting the data, because the fragmentation process is the foundation of the DDBS design. Extensive research has been conducted to develop effective solutions for DDBS design problems, but the great majority of them barely consider the RDDBS's initial design. Therefore, this work proposes a clustering-based horizontal fragmentation and allocation technique to handle both the early and late stages of the DDBS design. To ensure that each operation flows into the next without any increase in complexity, fragmentation and allocation are done simultaneously. With this approach, the main goals are to minimize communication expenses, response time, and irrelevant data access. Most importantly, it has been observed that the proposed approach may effectively improve RDDBS performance by simultaneously fragmenting and assigning various relations. Through simulations and experiments on synthetic and real databases, we demonstrate the viability of our strategy and how it considerably lowers communication costs for typical access patterns at both the early and late stages of design.
Solving Large Scale Instances of the Distribution Design Problem Using Data Mining
In this paper we approach the solution of large instances of the distribution design problem. The traditional approaches do not consider that the instance size can significantly reduce the efficiency of the solution process. We propose a new approach that uses data mining techniques to compress the original instance into a new one. The goal of the transformation is to condense the operation access pattern of the original instance so as to reduce the amount of resources needed to solve it, without significantly reducing the quality of its solution. To validate the approach, we tested it by proposing two instance compression methods on a new model of the replicated version of the distribution design problem that incorporates generalized database objects. The experimental results show that our approach reduces the computational resources needed for solving large instances by at least 65%, without significantly reducing the quality of the solution. Given the encouraging results, we are currently working on the design and implementation of efficient instance compression methods using other data mining techniques.
A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing
Data Grids have been adopted as the platform for scientific communities that
need to share, access, transport, process and manage large data collections
distributed worldwide. They combine high-end computing technologies with
high-performance networking and wide-area storage management techniques. In
this paper, we discuss the key concepts behind Data Grids and compare them with
other data sharing and distribution paradigms such as content delivery
networks, peer-to-peer networks and distributed databases. We then provide
comprehensive taxonomies that cover various aspects of architecture, data
transportation, data replication and resource allocation and scheduling.
Finally, we map the proposed taxonomy to various Data Grid systems not only to
validate the taxonomy but also to identify areas for future exploration.
Through this taxonomy, we aim to categorise existing systems to better
understand their goals and their methodology. This would help evaluate their
applicability for solving similar problems. This taxonomy also provides a "gap
analysis" of this area through which researchers can potentially identify new
issues for investigation. Finally, we hope that the proposed taxonomy and
mapping also help to provide an easy way for new practitioners to understand
this complex area of research. Comment: 46 pages, 16 figures, Technical Report
A threshold based dynamic data allocation algorithm - a Markov Chain model approach
In this study, a new dynamic data allocation algorithm for non-replicated Distributed Database Systems (DDS), namely the threshold algorithm, is formulated and proposed. The threshold algorithm reallocates data with respect to changing data access patterns. The proposed algorithm is distributed in the sense that each node autonomously decides whether or not to transfer the ownership of a fragment in the DDS to another node. The transfer decision depends on the past accesses of the fragment. A fragment migrates away from any node at which it has not been accessed locally for more than a certain number of past accesses, namely the threshold value. The threshold algorithm is modeled for a fragment of the database as a finite Markov chain with constant node access probabilities. In the model, a special case is analyzed in which all nodes have equal access probabilities except one, which has a different access probability. It is shown that for positive threshold values the fragment tends to remain at the node with the higher access probability, and that the greater the threshold value, the stronger this tendency. The threshold algorithm is especially suitable for a DDS where the data access pattern changes dynamically.
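The threshold rule described in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not spell out the exact counting rule, so the code assumes a counter of consecutive remote accesses that is reset by any local access, and assumes ownership passes to the accessing node once the counter exceeds the threshold. The function name `simulate_threshold` and all of its parameters are hypothetical.

```python
import random


def simulate_threshold(access_probs, threshold, steps, start_node=0, seed=0):
    """Simulate one fragment under an ASSUMED counter-based threshold rule.

    access_probs: constant per-node access probabilities (as in the abstract).
    threshold:    number of consecutive remote accesses tolerated (assumption).
    Returns the fraction of steps the fragment spent at each node.
    """
    rng = random.Random(seed)
    nodes = list(range(len(access_probs)))
    owner = start_node
    remote_run = 0  # consecutive remote accesses since the last local access
    time_at_node = [0] * len(nodes)

    for _ in range(steps):
        # Pick the node issuing this access according to the fixed probabilities.
        accessor = rng.choices(nodes, weights=access_probs)[0]
        if accessor == owner:
            remote_run = 0  # a local access resets the counter (assumption)
        else:
            remote_run += 1
            if remote_run > threshold:
                # Assumed rule: ownership moves to the node that just accessed.
                owner = accessor
                remote_run = 0
        time_at_node[owner] += 1

    return [t / steps for t in time_at_node]


if __name__ == "__main__":
    # Special case from the abstract: equal probabilities except one node.
    shares = simulate_threshold([0.4, 0.2, 0.2, 0.2], threshold=3, steps=100_000)
    print(shares)
```

Running the script with several values of `threshold` gives a rough empirical view of how long the fragment stays at each node, which can be compared informally with the Markov chain analysis the abstract summarizes.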
A Methodology for Vertically Partitioning in a Multi-Relation Database Environment
Vertical partitioning, in which attributes of a relation are assigned to partitions, is aimed at improving database performance. We extend previous research that is based on a single relation to a multi-relation database environment by including referential integrity constraints, an access-time-based heuristic, and a comprehensive cost model that considers most transaction types, including updates and joins. The algorithm was applied to a real-world insurance CLAIMS database. Simulation experiments were conducted, and the results show a performance improvement of 36% to 65% over the unpartitioned case.
Application of our method to small databases resulted in partitioning schemes that are comparable to optimal. Facultad de Informátic