
    CORADD: Correlation Aware Database Designer for Materialized Views and Indexes

We describe an automatic database design tool that exploits correlations between attributes when recommending materialized views (MVs) and indexes. Although there is a substantial body of related work exploring how to select an appropriate set of MVs and indexes for a given workload, none of this work has explored the effect of correlated attributes (e.g., attributes encoding related geographic information) on designs. Our tool identifies a set of MVs and secondary indexes such that correlations between the clustered attributes of the MVs and the secondary indexes are enhanced, which can dramatically improve query performance. It uses a form of Integer Linear Programming (ILP) called ILP Feedback to pick the best set of MVs and indexes for given database size constraints. We compare our tool with a state-of-the-art commercial database designer on two workloads, APB-1 and SSB (Star Schema Benchmark, similar to TPC-H). Our results show that a correlation-aware database designer can improve query performance by up to 6 times within the same space budget when compared to a commercial database designer.
Funding: National Science Foundation (U.S.) (Grant IIS-0704424); SAP Corporation (Grant
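The core ILP of such a designer can be stated compactly: each candidate MV or index is a binary variable, the objective is the estimated workload benefit, and the space budget is a linear constraint. The following is a minimal sketch of that formulation in Python with PuLP; the candidate names, benefits, and sizes are made up for illustration, and CORADD's ILP Feedback loop (which re-generates candidates based on the solver's output before re-solving) is omitted.

```python
from pulp import LpProblem, LpMaximize, LpVariable, lpSum

# Hypothetical candidates: (name, estimated workload benefit, size in MB).
candidates = [("mv_sales_by_region", 120, 800),
              ("mv_orders_by_month", 90, 500),
              ("idx_customer_nation", 40, 150),
              ("idx_part_brand", 25, 100)]
budget_mb = 1000

prob = LpProblem("design_selection", LpMaximize)
pick = {name: LpVariable(name, cat="Binary") for name, _, _ in candidates}

# Objective: maximize the total estimated benefit of the chosen design.
prob += lpSum(benefit * pick[name] for name, benefit, _ in candidates)
# Constraint: the chosen MVs and indexes must fit in the space budget.
prob += lpSum(size * pick[name] for name, _, size in candidates) <= budget_mb

prob.solve()
print([name for name in pick if pick[name].value() == 1])
```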

    INVESTIGATING THE STAR SCHEMA BENCHMARK AS A REPLACEMENT FOR THE TPC-H DECISION SUPPORT SYSTEM

Decision Support Systems (DSS) are at the core of business intelligence systems. Implementation costs for an enterprise-level Database Management System (DBMS) and DSS average $10,461 for installation. This does not include costs associated with database migrations or testing, which can double the cost, nor does this quoted price include the cost of yearly licensing or support agreements. Depending on the software vendor, there may be additional costs associated with using an application cluster, logical and virtual partitioning, data guards, and even costs per processor core. It is easy to see how the cost of operating a database server can grow rapidly. Information Technology (IT) decision makers and software architects need the ability to choose a DBMS to suit their application's needs. To choose the correct DBMS solution, a comprehensive and adaptive benchmark is needed. This benchmark must be capable of predicting how the performance of a given system will scale, as well as offering an estimate of cost. A problematic benchmark that is unable to accurately predict these values is worthless and leads to costly software decision mistakes. To remain successful and competitive in a given industry, it is important for organizations to know their customers, target and acquire new markets, and anticipate future trends. This is where database business intelligence and decision support systems become useful. DSS allow users to mine critical information about their workflows, sales history, and trends, and to have the data readily available so that they may make informed decisions and plan future growth. Business intelligence tools and decision support systems provide executive officers and members of management the tools needed to create complex ad-hoc queries and mine important data. Presently, IT decision makers and software engineers use the TPC-H decision support system benchmark as a guide to determining the optimal hardware and database vendor configurations for their decision support systems. The TPC-H benchmark is a popular decision support system benchmark. In recent years, however, TPC-H has been heavily criticized for its many problems. The issues outlined within this thesis can lead IT decision makers to purchase and implement improper hardware and software solutions. This thesis examines the criticisms and issues of the TPC-H benchmark. Utilizing Amazon Web Services cloud computing resources, we evaluate the Star Schema Benchmark (SSB) as an alternative to TPC-H. We successfully identify and demonstrate several previously unreported problems in the TPC-H benchmark. Our results conclude that the SSB not only resolves the issues inherent in TPC-H but should also serve as its replacement

    New Fundamental Technologies in Data Mining

The progress of data mining technology and its growing public popularity establish a need for a comprehensive text on the subject. The series of books entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. In addition to helping readers understand each chapter deeply, the two books present useful hints and strategies for solving the problems discussed in subsequent chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant development in the field of data mining

    Query execution in column-oriented database systems

Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 145-148).

There are two obvious ways to map a two-dimensional relational database table onto a one-dimensional storage interface: store the table row-by-row, or store the table column-by-column. Historically, database system implementations and research have focused on the row-by-row data layout, since it performs best on the most common application for database systems: business transactional data processing. However, there is a set of emerging applications for database systems for which the row-by-row layout performs poorly. These applications are more analytical in nature; their goal is to read through the data to gain new insight and use it to drive decision making and planning.

In this dissertation, we study the problem of the poor performance of the row-by-row data layout for these emerging applications, and evaluate the column-by-column data layout as a solution to this problem. There have been a variety of proposals in the literature for how to build a database system on top of a column-by-column layout. These proposals require different levels of implementation effort and have different performance characteristics. If one wanted to build a new database system that utilizes the column-by-column data layout, it is unclear which proposal to follow. This dissertation provides (to the best of our knowledge) the only detailed study of multiple implementation approaches to such systems, categorizing the approaches into three broad categories and evaluating the tradeoffs between them. We conclude that building a query executor specifically designed for the column-by-column data layout is essential to achieve good performance.

Consequently, we describe the implementation of C-Store, a new database system with a storage layer and query executor built for the column-by-column data layout. We introduce three new query execution techniques that significantly improve performance. First, we look at the problem of integrating compression and execution so that the query executor is capable of operating directly on compressed data. This improves performance on both I/O (less data needs to be read off disk) and CPU (the data need not be decompressed). We also describe our solution to the problem of executor extensibility: how can new compression techniques be added to the system without having to rewrite the operator code? A minimal sketch of direct operation on compressed data follows.
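As an illustration of operating directly on compressed data (a sketch under assumed run-length encoding, not C-Store's actual operator interface), consider a group-by count over a run-length-encoded column: the executor performs one addition per run instead of one per row, so CPU cost tracks the compressed size of the data. The column values and run lengths below are made up.

```python
# A sorted column compresses to (value, run_length) pairs.
rle_region = [("ASIA", 250_000), ("EU", 350_000), ("US", 400_000)]

def group_count_rle(runs):
    """Group-by count computed directly on RLE runs: one addition per
    run rather than one per row, and no decompression step."""
    counts = {}
    for value, length in runs:
        counts[value] = counts.get(value, 0) + length
    return counts

print(group_count_rle(rle_region))
# {'ASIA': 250000, 'EU': 350000, 'US': 400000}
```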
Second, we analyze the problem of tuple construction (stitching together attributes from multiple columns into a row-oriented "tuple"). Tuple construction is required when operators need to access multiple attributes from the same tuple; however, if it is done at the wrong point in a query plan, a significant performance penalty is paid. We introduce an analytical model and some heuristics that help decide where in a query plan tuple construction should occur. Third, we introduce a new join technique, the "invisible join", which improves the performance of a specific type of join that is common in the applications for which the column-by-column data layout is a good fit; a sketch of the idea follows the benchmark summary below.

Finally, we benchmark the performance of the complete C-Store database system against other column-oriented database system implementation approaches, and against row-oriented databases. We benchmark two applications. The first is a typical analytical application for which the column-by-column data layout is known to outperform the row-by-row layout. The second is another emerging application, the Semantic Web, for which column-oriented database systems are not currently used. We find that on the first application, the complete C-Store system performs 10 to 18 times faster than alternative column-store implementation approaches, and 6 to 12 times faster than a commercial database system that uses a row-by-row data layout. On the Semantic Web application, we find that C-Store outperforms other state-of-the-art data management techniques by an order of magnitude, and outperforms other common data management techniques by almost two orders of magnitude. Benchmark queries that used to take multiple minutes to execute can now be answered in several seconds.

by Daniel J. Abadi. Ph.D.
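To make the invisible join concrete, here is a minimal sketch of its three phases on toy, made-up data (the names and values are illustrative, not from the dissertation): the join is rewritten into a predicate on the dimension table, the fact table's foreign-key column is probed to build a list of qualifying positions, and only then are values extracted from the other fact columns at those positions, so no joined tuples are ever materialized.

```python
# Toy star schema: one dimension table and two fact-table columns.
dim_nation = {1: "US", 2: "EU", 3: "ASIA"}   # dimension key -> value
fact_custkey = [1, 3, 2, 1, 1, 3]            # fact foreign-key column
fact_revenue = [10, 20, 30, 40, 50, 60]      # fact measure column

# Phase 1: apply the predicate on the dimension; keep matching keys.
matching = {k for k, v in dim_nation.items() if v == "US"}

# Phase 2: probe the foreign-key column to find qualifying positions.
positions = [i for i, k in enumerate(fact_custkey) if k in matching]

# Phase 3: extract measure values only at the qualifying positions.
print(sum(fact_revenue[i] for i in positions))  # 100
```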

    Context-Aware Service Registry: Modeling and Implementation

Modern societies have become very dependent on information and services, and technology is adapting to the increasing demands of people and businesses. Context-aware systems are becoming ubiquitous. These systems comprise mechanisms to acquire knowledge about the surrounding environment and adapt their behaviour and service provision accordingly. Service-oriented computing is the mainstream software development methodology. In service-oriented applications (SOA), service providers publish the services they create in service registries; service requesters access these services during the discovery process. For large-scale SOA, the registry structure and the types of queries it can handle are central to efficient service discovery. Moreover, the role of context in determining services and affecting execution is central. This thesis investigates the structure of a context-aware service registry in which context-aware services are stored by service producers and retrieved by service requesters in different contexts. The thesis builds on an existing rich theoretical service model in which contract, functionality, and contexts are bundled together. The thesis investigates generic models and structures for context, context history, and the context-aware registry. It also studies state-of-the-art database technologies to analyse their suitability for implementing a registry for rich services. Specifically, the thesis provides a thorough study of the structures, implementation, performance, limitations, and features of key-value, document-oriented, and column-oriented databases while considering options for implementing a rich service registry. Database models of contexts and context-aware services are discussed and implemented; a sketch of one such model appears below. The relative performance of the models is discussed after evaluating the results of tests run on large data sets. Based upon the test results, a justification for the selected model is given
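As an illustration of what a document-oriented registry record bundling contract, functionality, and context might look like, here is a minimal Python sketch; the field names and the matching rule are hypothetical, not the thesis's actual schema or query semantics.

```python
# Hypothetical document-style registry: each record bundles a service's
# contract, functionality, and the contexts in which it is available.
registry = [
    {
        "service_id": "printing-42",
        "contract": {"protocol": "REST", "latency_ms": 200},
        "functionality": "document-printing",
        "contexts": [{"location": "building-A", "time": "office-hours"}],
    },
]

def discover(registry, functionality, context):
    """Return services offering the requested functionality whose
    declared contexts match the requester's current context."""
    def matches(declared, current):
        return all(declared.get(k) == v for k, v in current.items())
    return [s for s in registry
            if s["functionality"] == functionality
            and any(matches(c, context) for c in s["contexts"])]

hits = discover(registry, "document-printing",
                {"location": "building-A", "time": "office-hours"})
print([s["service_id"] for s in hits])  # ['printing-42']
```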

    Proceedings of the GIS Research UK 18th Annual Conference GISRUK 2010

This volume holds the papers from the 18th annual GIS Research UK conference (GISRUK). This year the conference was hosted at University College London (UCL), from Wednesday 14 to Friday 16 April 2010. The conference covered core geographic information science research as well as application domains such as crime and health, and technological developments in LBS and the geoweb. UCL’s research mission as a global university is based around a series of Grand Challenges that affect us all, and these were accommodated in GISRUK 2010. The overarching theme this year was “Global Challenges”, with specific focus on the following themes:

* Crime and Place
* Environmental Change
* Intelligent Transport
* Public Health and Epidemiology
* Simulation and Modelling
* London as a global city
* The geoweb and neo-geography
* Open GIS and Volunteered Geographic Information
* Human-Computer Interaction and GIS

Traditionally, GISRUK has provided a platform for early career researchers as well as those with a significant track record of achievement in the area. As such, the conference provides a welcome blend of innovative thinking and mature reflection. GISRUK is the premier academic GIS conference in the UK, and we are keen to maintain its outstanding record of achievement in developing GIS in the UK and beyond

    Adjoined Dimension Column Clustering to Improve Data Warehouse Query Performance


    Towards end-to-end security in internet of things based healthcare

Healthcare IoT systems are distinguished in that they are designed to serve human beings, which primarily raises the requirements of security, privacy, and reliability. Such systems have to provide real-time notifications and responses concerning the status of patients. Physicians, patients, and other caregivers demand a reliable system in which the results are accurate and timely and the service is reliable and secure. To guarantee these requirements, the smart components in the system require a secure and efficient end-to-end communication method between the end-points (e.g., patients, caregivers, and medical sensors) of a healthcare IoT system. The main challenge faced by existing security solutions is a lack of secure end-to-end communication. This thesis addresses this challenge by presenting a novel end-to-end security solution enabling end-points to securely and efficiently communicate with each other. The proposed solution meets the security requirements of a wide range of healthcare IoT systems while minimizing the overall hardware overhead of end-to-end communication. End-to-end communication is enabled by the holistic integration of the following contributions.

The first contribution is the implementation of two architectures for remote monitoring of bio-signals. The first architecture is based on a low-power IEEE 802.15.4 protocol known as ZigBee. It consists of a set of sensor nodes that read data from various medical sensors, process the data, and send them wirelessly over ZigBee to a server node. The second architecture is implemented on an IP-based wireless sensor network, using IEEE 802.11 Wireless Local Area Network (WLAN). The system consists of an IEEE 802.11 based sensor module that accesses bio-signals from patients and sends them to a remote server. In both architectures, the server node collects the health data from several client nodes and updates a remote database. The remote web server accesses the database and updates the webpage in real time, and the webpage can be accessed remotely.

The second contribution is a novel secure mutual authentication scheme for Radio Frequency Identification (RFID) implant systems. The proposed scheme relies on elliptic curve cryptography and the D-Quark lightweight hash design. The scheme consists of three main phases: (1) reader authentication and verification, (2) tag identification, and (3) tag verification. We show that among the existing public-key cryptosystems, elliptic curve cryptography is the optimal choice due to its small key size as well as its efficiency in computations; the D-Quark lightweight hash design has been tailored for resource-constrained devices.

The third contribution is a low-latency and secure approach to cryptographic key generation based on Electrocardiogram (ECG) features. This is performed by taking advantage of the uniqueness and randomness of the ECG's main features, comprising the PR, RR, PP, QT, and ST intervals. The approach achieves low latency because it relies on reference-free ECG features that can be acquired in a short time. The approach is called Several ECG Features (SEF)-based cryptographic key generation; a minimal sketch of the idea is given at the end of this abstract.

The fourth contribution is a novel secure and efficient end-to-end security scheme for mobility-enabled healthcare IoT.
The proposed scheme consists of: (1) a secure and efficient end-user authentication and authorization architecture based on the certificate-based Datagram Transport Layer Security (DTLS) handshake protocol, (2) a secure end-to-end communication method based on DTLS session resumption, and (3) support for robust mobility based on interconnected smart gateways in the fog layer.

Finally, the fifth and last contribution is an analysis of the performance of state-of-the-art end-to-end security solutions in healthcare IoT systems, including our own. In this regard, we first identify and present the essential requirements of robust security solutions for healthcare IoT systems. We then analyze the performance of the state-of-the-art end-to-end security solutions (including our scheme) by developing a prototype healthcare IoT system
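To illustrate the SEF idea, here is a minimal Python sketch under loudly stated assumptions: the 4 ms quantization step, the choice of intervals, and the use of HMAC-SHA-256 are illustrative, not the thesis's actual construction, and a real biometric key scheme would also need error tolerance (e.g., a fuzzy extractor) so that two independent measurements of the same heart agree on the key.

```python
import hashlib
import hmac

def sef_key_sketch(intervals_ms, nonce):
    """Derive a 256-bit key from ECG timing features (PR, RR, PP, QT,
    ST intervals, in milliseconds). Quantizing into 4 ms bins (an
    illustrative choice) absorbs small measurement noise; an HMAC over
    the quantized features yields the key material."""
    quantized = bytes((int(v) // 4) % 256 for v in intervals_ms)
    return hmac.new(nonce, quantized, hashlib.sha256).digest()

# Hypothetical measurements from one heartbeat window (milliseconds).
key = sef_key_sketch([160.2, 820.5, 818.9, 402.7, 310.1], b"session-nonce")
print(key.hex())
```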