Search CORE

13 research outputs found

Data mining techniques using decision tree model in materialised projection and selection view

Author: Teh Y. W.
Publication venue: Universitat Politècnica de Catalunya. Secció de Matemàtiques i Informàtica
Publication date: 01/01/2004
Field of study

With the availability of very large data storage today, redundant data structures are no longer a big issue. However, an intelligent way of managing materialised projection and selection views that can lead to fast access of data is the central issue dealt with in this paper. A set of implementation steps for the data warehouse administrators or decision makers to improve the response time of queries is also defined. The study concludes that both attributes and tuples, are important factors to be considered to improve the response time of a query. The adoption of data mining techniques in the physical design of data warehouses has been shown to be useful in practice

UPCommons. Portal del coneixement obert de la UPC

Introducing Perf to a Query Optimization Algorithm

Author: Abdallah Raef
Publication venue: 'Oklahoma State University Library'
Publication date: 01/12/2002
Field of study

SHAREOK repository

Optimizing Queries Using a Materialized View in a Data Warehoue

Author: Hu Jing
Publication venue: 'Oklahoma State University Library'
Publication date: 01/07/2006
Field of study

A data warehouse is a user-centered environment for data analysis and decision support. To support decision maker in making decisions quickly and accurately, using materialized views can provide significant improvements in query processing time. The problem of answering queries using views is to find efficient methods of answering a query using a set of previously materialized views over the database, rather than accessing the database relations. The known algorithms, the bucket algorithm, the inverse-rules algorithm have been used to rewrite queries using views before executing the queries. The bucket algorithm, predominantly used to rewrite queries, generates a candidate rewriting to a query using views then checks that the rewriting is contained in the original query. However, we show same deficiencies in the bucket algorithm then describe the containment bucket algorithm and give an optimal method to solve this problem. We present an experiment comparing the performance of both algorithms.Computer Science Departmen

SHAREOK repository

Path indexing for efficient path query processing in graph databases

Author: Sumrall J.M.
Publication venue
Publication date: 01/01/2015
Field of study

Repository TU/e

Pure OAI Repository

GhostDB: Querying Visible and Hidden Data Without Leaks

Author: Anciaux Nicolas
Benzine Mehdi
Bouganim Luc
Pucheral Philippe
Shasha Dennis
Publication venue: HAL CCSD
Publication date: 01/01/2007
Field of study

International audienceImagine that you have been entrusted with private data, such as corporate product information, sensitive government information, or symptom and treatment information about hospital patients. You may want to issue queries whose result will combine private and public data, but private data must not be revealed. GhostDB is an architecture and system to achieve this. You carry private data in a smart USB key (a large Flash persistent store combined with a tamper and snoop-resistant CPU and small RAM). When the key is plugged in, you can issue queries that link private and public data and be sure that the only information revealed to a potential spy is which queries you pose. Queries linking public and private data entail novel distributed processing techniques on extremely unequal devices (standard computer and smart USB key). This paper presents the basic framework to make this all work intuitively and efficiently

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

HAL UVSQ

The Design and Implementation of Modern Column-Oriented Database Systems

Author: Abadi D.
Boncz P.A. (Peter)
Harizopoulos S.
Idreos S. (Stratos)
Madden S. (Samuel)
Publication venue: 'Now Publishers'
Publication date: 01/12/2013
Field of study

CWI's Institutional Repository

Letter from the Special Issue Editor

Author: Boncz P.A. (Peter)
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/12/2010
Field of study

Editorial work for DEBULL on a special issue on data management on Storage Class Memory (SCM) technologies

CWI's Institutional Repository

Index-based Join Operations in Hive

Author: Mofidpoor Mahsa
Publication venue
Publication date: 01/04/2013
Field of study

ABSTRACT INDEX-BASED JOIN OPERATIONS IN HIVE MAHSA MOFIDPOOR The exponential growth of data being generated, manipulated, analyzed, and archived nowadays introduces new challenges and opportunities for dealing with the so called big data. Hive is a batch-oriented big data software, well suited for query processing and data analysis. Originally developed by Facebook in 2009 and now under the Apache Software Foundation, Hive is gaining popularity for its SQL like query language HiveQL and for supporting majority of the SQL operations in relational database management systems (RDBMS). Being the expensive operation in RDBMS, join has been the focus of many query optimization techniques to improve performance of database systems. We investigate such techniques for join operations in Hive and develop an index-based join algorithm for queries in HiveQL. When a query requires only a small subset of data selected by a predicate in the WHERE clause, the brute-force method which scans the entire tables results in poor performance for redundant disk I/Os, and irrelevant maps initiation in case the query is issued using the mapreduce. In this work, we implement the proposed index-based technique and integrate it in Hive. To add our extension, we obtain Hive architecture details by reverse engineering the code and map our design to the conceptual optimization flow.To evaluate the performance, after setting up the environment, we run relevant test queries on datasets generated using the industry standard benchmark, TPC-H. Our results indicate significant performance gain over relatively large data or highly selective queries

Concordia University Research Repository

Recommended from our members

Physical Plan Instrumentation in Databases: Mechanisms and Applications

Author: Psallidas Fotis
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2019
Field of study

Database management systems (DBMSs) are designed with the goal set to compile SQL queries to physical plans that, when executed, provide results to the SQL queries. Building on this functionality, an ever-increasing number of application domains (e.g., provenance management, online query optimization, physical database design, interactive data profiling, monitoring, and interactive data visualization) seek to operate on how queries are executed by the DBMS for a wide variety of purposes ranging from debugging and data explanation to optimization and monitoring. Unfortunately, DBMSs provide little, if any, support to facilitate the development of this class of important application domains. The effect is such that database application developers and database system architects either rewrite the database internals in ad-hoc ways; work around the SQL interface, if possible, with inevitable performance penalties; or even build new databases from scratch only to express and optimize their domain-specific application logic over how queries are executed. To address this problem in a principled manner in this dissertation, we introduce a prototype DBMS, namely, Smoke, that exposes instrumentation mechanisms in the form of a framework to allow external applications to manipulate physical plans. Intuitively, a physical plan is the underlying representation that DBMSs use to encode how a SQL query will be executed, and providing instrumentation mechanisms at this representation level allows applications to express and optimize their logic on how queries are executed. Having such an instrumentation-enabled DBMS in-place, we then consider how to express and optimize applications that rely their logic on how queries are executed. To best demonstrate the expressive and optimization power of instrumentation-enabled DBMSs, we express and optimize applications across several important domains including provenance management, interactive data visualization, interactive data profiling, physical database design, online query optimization, and query discovery. Expressivity-wise, we show that Smoke can express known techniques, introduce novel semantics on known techniques, and introduce new techniques across domains. Performance-wise, we show case-by-case that Smoke is on par with or up-to several orders of magnitudes faster than state-of-the-art imperative and declarative implementations of important applications across domains. As such, we believe our contributions provide evidence and form the basis towards a class of instrumentation-enabled DBMSs with the goal set to express and optimize applications across important domains with core logic over how queries are executed by DBMSs

Columbia University Academic Commons

Compilation and Code Optimization for Data Analytics

Author: Abdullah Abdul Rahim
Kasim Rizanaliah
Mohamad Basir Muhammad Sufyan Safwan
Ramli Mohd Zulkifli
Selamat Nur Asmiza
Publication venue: Lausanne, EPFL
Publication date: 01/03/2016
Field of study

The trade-offs between the use of modern high-level and low-level programming languages in constructing complex software artifacts are well known. High-level languages allow for greater programmer productivity: abstraction and genericity allow for the same functionality to be implemented with significantly less code compared to low-level languages. Modularity, object-orientation, functional programming, and powerful type systems allow programmers not only to create clean abstractions and protect them from leaking, but also to define code units that are reusable and easily composable, and software architectures that are adaptable and extensible. The abstraction, succinctness, and modularity of high-level code help to avoid software bugs and facilitate debugging and maintenance. The use of high-level languages comes at a performance cost: increased indirection due to abstraction, virtualization, and interpretation, and superfluous work, particularly in the form of tempory memory allocation and deallocation to support objects and encapsulation. As a result of this, the cost of high-level languages for performance-critical systems may seem prohibitive. The vision of abstraction without regret argues that it is possible to use high-level languages for building performance-critical systems that allow for both productivity and high performance, instead of trading off the former for the latter. In this thesis, we realize this vision for building different types of data analytics systems. Our means of achieving this is by employing compilation. The goal is to compile away expensive language features -- to compile high-level code down to efficient low-level code

Infoscience - École polytechnique fédérale de Lausanne

Universiti Teknikal Malaysia Melaka (UTeM) Repository