57 research outputs found

    Introductory Editorial

    The Open Journal of Databases (OJDB) is a new open access journal covering all aspects of database research and technology. This editorial presents the first issue of the journal.

    Special Issue on High-Level Declarative Stream Processing

    Stream processing as an information processing paradigm has been investigated by various research communities within computer science and appears in various applications: real-time analytics, online machine learning, continuous computation, ETL operations, and more. The special issue on "High-Level Declarative Stream Processing" investigates the declarative aspects of stream processing, a topic undergoing intense study. It is published in the Open Journal of Web Technologies (OJWT) (www.ronpub.com/ojwt). This editorial provides an overview of the aims and scope of the special issue and of the accepted papers.

    An NVM Aware MariaDB Database System and Associated IO Workload on File Systems

    MariaDB is a community-developed fork of the MySQL relational database management system and was originally designed and implemented to use the traditional spinning disk architecture. With Non-Volatile Memory (NVM) technology now at the forefront and in the mainstream for server storage (data centers), MariaDB addresses the need by adding support for NVM devices and introducing the NVM Compression method. NVM Compression is a novel hybrid technique that combines application-level compression with flash awareness for optimal performance and storage efficiency. Utilizing new interface primitives exported by Flash Translation Layers (FTLs), we leverage the garbage collection available in flash devices to optimize the capacity management required by compression systems. We implement NVM Compression in the popular MariaDB database and use variants of commonly available POSIX file system interfaces to provide the extended FTL capabilities to the user space application. The experimental results show that the hybrid approach of NVM Compression can improve compression performance by 2-7x, deliver compression performance for flash devices that is within 5% of uncompressed performance, improve storage efficiency by 19% over legacy Row-Compression, reduce data writes by up to 4x when combined with other flash-aware techniques such as Atomic Writes, and deliver further advantages in power efficiency and CPU utilization. Various micro-benchmark measurements and findings on sparse files call for improvements in how file systems handle punch-hole operations on files.

    Which NoSQL Database? A Performance Overview

    NoSQL data stores are widely used to store and retrieve possibly large amounts of data, typically in a key-value format. NoSQL databases come in many types with different performance characteristics, so it is important to compare them and verify how performance relates to the database type. In this paper, we evaluate five of the most popular NoSQL databases: Cassandra, HBase, MongoDB, OrientDB and Redis. We compare those databases in terms of query performance, based on reads and updates, taking into consideration the typical workloads, as represented by the Yahoo! Cloud Serving Benchmark. This comparison allows users to choose the most appropriate database according to the specific mechanisms and application needs.

    Provenance Management over Linked Data Streams

    Provenance describes how results are produced, starting from data sources and continuing through curation, recovery, and intermediate processing to the final results. Provenance has been applied to solve many problems, and in particular to understand how errors are propagated in large-scale environments such as the Internet of Things and Smart Cities. In such environments, operations on data are often performed by multiple uncoordinated parties, each potentially introducing or propagating errors. These errors cause uncertainty in the overall data analytics process, which is further amplified when many data sources are combined and errors are propagated across multiple parties. The ability to properly identify how such errors influence the results is crucial for assessing the quality of the results. This problem becomes even more challenging in the case of Linked Data Streams, where data is dynamic and often incomplete. In this paper, we introduce methods to compute provenance over Linked Data Streams. More specifically, we propose provenance management techniques to compute provenance of continuous queries executed over complete Linked Data streams. Unlike traditional provenance management techniques, which are applied on static data, we focus strictly on the dynamicity and heterogeneity of Linked Data streams. Specifically, in this paper we describe: i) means to deliver a dynamic provenance trace of the results to the user, ii) a system capable of executing queries over dynamic Linked Data and computing the provenance of these queries, and iii) an empirical evaluation of our approach using real-world datasets.

    Using Business Intelligence to Improve DBA Productivity

    The amount of data collected and used by companies has grown rapidly over the last decade. Business leaders are now using Business Intelligence (BI) systems to make effective business decisions against large amounts of data. The growth in the size of data has been a major challenge for Database Administrators (DBAs). The rapid increase in the number and size of databases has made it difficult for DBA teams to provide the level of service that the business requires. The methods that DBAs have used in the last several decades can no longer be performed with the efficiency needed over all of the databases they administer. This paper presents the first BI system to improve DBA productivity and provide important data metrics for Information Technology (IT) managers. The BI system has been well received by Sherwin Williams Database Administrators. It has i) enabled the DBA team to quickly determine which databases needed work by a DBA without manually logging into the system; ii) helped the DBA team and its management to easily answer other business users' questions without using DBAs' time to research the issue; and iii) helped the DBA team to provide the business data for unanticipated audit requests.

    Machine Learning on Large Databases: Transforming Hidden Markov Models to SQL Statements

    Machine Learning is a research field with substantial relevance for many applications in different areas. Because of technical improvements in sensor technology, its value for real-life applications has increased further in recent years. Nowadays, it is possible to gather massive amounts of data at any time at comparatively little cost. While this availability of data could be used to develop complex models, their implementation is often constrained by limitations in computing power. In order to overcome performance problems, developers have several options, such as improving their hardware, optimizing their code, or using parallelization techniques like the MapReduce framework. However, these options might be too costly, not suitable, or too time-consuming to learn and realize. Following the premise that developers usually are not SQL experts, we would like to discuss another approach in this paper: using transparent database support for Big Data Analytics. Our aim is to automatically transform Machine Learning algorithms to parallel SQL database systems. In this paper, we show in particular how a Hidden Markov Model, given in the analytics language R, can be transformed to a sequence of SQL statements. These SQL statements will be the basis for an (inter-operator and intra-operator) parallel execution on parallel DBMSs as a second step of our research, which is not part of this paper.
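
    To make the idea of expressing a Hidden Markov Model as a sequence of SQL statements more concrete, the following is a minimal hand-written sketch of the standard forward algorithm, with each step issued as SQL against an in-memory SQLite database. It is not the transformation described in the paper: the table names (init, trans, emit, alpha, next_alpha), the function name forward_likelihood, and the use of SQLite instead of a parallel DBMS are all assumptions made for illustration.

```python
import sqlite3


def forward_likelihood(init, trans, emit, observations):
    """Return P(observations) for an HMM, running each forward step as SQL.

    Hypothetical helper for illustration only.
    init:  list of (state, probability)
    trans: list of (src_state, dst_state, probability)
    emit:  list of (state, symbol, probability)
    observations: non-empty list of symbols
    """
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE init(state TEXT, p REAL);
        CREATE TABLE trans(src TEXT, dst TEXT, p REAL);
        CREATE TABLE emit(state TEXT, symbol TEXT, p REAL);
        CREATE TABLE alpha(state TEXT, p REAL);
        CREATE TABLE next_alpha(state TEXT, p REAL);
        """
    )
    con.executemany("INSERT INTO init VALUES (?, ?)", init)
    con.executemany("INSERT INTO trans VALUES (?, ?, ?)", trans)
    con.executemany("INSERT INTO emit VALUES (?, ?, ?)", emit)

    # Initialization: alpha_1(s) = pi(s) * B(s, o_1)
    con.execute(
        """
        INSERT INTO alpha
        SELECT i.state, i.p * e.p
        FROM init i JOIN emit e ON e.state = i.state
        WHERE e.symbol = ?
        """,
        (observations[0],),
    )

    # Induction: alpha_t(j) = sum_i alpha_{t-1}(i) * A(i, j) * B(j, o_t)
    for symbol in observations[1:]:
        con.execute(
            """
            INSERT INTO next_alpha
            SELECT t.dst, SUM(a.p * t.p * e.p)
            FROM alpha a
            JOIN trans t ON t.src = a.state
            JOIN emit e ON e.state = t.dst
            WHERE e.symbol = ?
            GROUP BY t.dst
            """,
            (symbol,),
        )
        con.execute("DELETE FROM alpha")
        con.execute("INSERT INTO alpha SELECT state, p FROM next_alpha")
        con.execute("DELETE FROM next_alpha")

    # Termination: the likelihood is the sum of the final alpha values.
    (likelihood,) = con.execute("SELECT SUM(p) FROM alpha").fetchone()
    con.close()
    return likelihood
```

    Each induction step is a join followed by a grouped aggregation, which is the kind of operation a parallel DBMS could in principle partition across workers.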

    Branch-and-Bound Ranked Search by Minimizing Parabolic Polynomials

    The Branch-and-Bound Ranked Search algorithm (BRS) is an efficient method for answering top-k queries based on R-trees using multivariate scoring functions. To make BRS effective with ascending rankings, the algorithm must be able to identify lower bounds of the scoring functions for exploring search partitions. This paper presents BRS supporting parabolic polynomials. These functions are commonly used to minimize combined scores over different attributes and cover a variety of applications. To the best of our knowledge, the problem of developing an algorithm for computing lower bounds for the BRS method has not yet been well addressed.
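
    As a rough illustration of what a lower bound over a search partition can look like, the sketch below minimizes a separable quadratic score over an axis-aligned rectangle such as an R-tree bounding box. The separable form f(x) = c + sum_i (a_i x_i^2 + b_i x_i) with every a_i > 0 is an assumption made here for simplicity; the abstract does not give the paper's exact definition of parabolic polynomials, and the function name parabolic_lower_bound is hypothetical.

```python
def parabolic_lower_bound(a, b, c, lo, hi):
    """Minimum of f(x) = c + sum_i (a[i]*x_i**2 + b[i]*x_i) over the box [lo, hi].

    Assumes every a[i] > 0, so each one-dimensional term is an upward-opening
    parabola and the dimensions can be minimized independently.
    """
    total = c
    for ai, bi, low, high in zip(a, b, lo, hi):
        if ai <= 0:
            raise ValueError("this sketch only handles a[i] > 0")
        vertex = -bi / (2 * ai)              # unconstrained minimizer
        x = min(max(vertex, low), high)      # clamp it into the box side
        total += ai * x * x + bi * x
    return total
```

    Under this assumption the bound is exact. In a branch-and-bound search, a partition whose bound already exceeds the current k-th best score can be skipped.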

    High-Dimensional Spatio-Temporal Indexing

    There exist numerous indexing methods which handle either spatio-temporal or high-dimensional data well. However, those indexing methods which handle spatio-temporal data well have certain drawbacks when confronted with high-dimensional data. As the most efficient spatio-temporal indexing methods are based on the R-tree and its variants, they face the well-known problems in high-dimensional space. Furthermore, most high-dimensional indexing methods try to reduce the number of dimensions in the data being indexed and compress the information given by all dimensions into a few dimensions, but are not able to store now-relative data. One of the most efficient high-dimensional indexing methods, the Pyramid Technique, is able to handle high-dimensional point data only. Nonetheless, we take this technique and extend it such that it is able to handle spatio-temporal data as well. We introduce a technique for querying in this structure with spatio-temporal queries. We compare our technique, the Spatio-Temporal Pyramid Adapter (STPA), to the RST-tree for in-memory and on-disk applications. We show that for high dimensions, the extra query cost for reducing the dimensionality in the Pyramid Technique is clearly exceeded by the rising query cost in the RST-tree. In conclusion, we discuss the main drawbacks and advantages of our technique.
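
    For readers unfamiliar with the dimensionality reduction mentioned above, the following minimal sketch shows how the original Pyramid Technique maps a point in the unit hypercube to a single value that can be stored in a one-dimensional index. It covers point data only and does not attempt the spatio-temporal extension (STPA) proposed in the paper; the function name pyramid_value is chosen here for illustration.

```python
def pyramid_value(v):
    """Map a point v in [0, 1]^d to its one-dimensional pyramid value.

    The unit cube is split into 2*d pyramids that meet at the center point.
    The pyramid value is the pyramid number plus the point's height in it.
    """
    d = len(v)
    # The dimension in which the point lies farthest from the center.
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    # Pyramids 0..d-1 lie below the center in j_max, d..2d-1 above it.
    number = j_max if v[j_max] < 0.5 else j_max + d
    height = abs(0.5 - v[j_max])
    return number + height
```

    For example, the point (0.1, 0.6) falls into pyramid 0 at height 0.4, so its pyramid value is 0.4.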

    XML-based Execution Plan Format (XEP)

    Execution plan analysis is one of the most common SQL tuning tasks performed by relational database administrators and developers. Currently each database management system (DBMS) provides its own execution plan format, which supports system-specific details for execution plans and contains inherent plan operators. This makes SQL tuning challenging. Firstly, administrators and developers often work with more than one DBMS and thus have to switch between different plan formats. In addition, tools for analyzing execution plans either support only a single DBMS or have to implement separate logic to handle the specific plan format of each DBMS. To address these problems, this paper proposes an XML-based Execution Plan format (XEP), aiming to standardize the representation of execution plans of relational DBMSs. Two approaches are developed for transforming DBMS-specific execution plans into XEP format. They have been successfully evaluated for IBM DB2, Oracle Database and Microsoft SQL