8 research outputs found

    Compressed positionally encoded record filters in distributed query processing.

    Unlike a centralized database system, distributed query processing involves data transmission among distributed sites, which makes reducing transmission cost a major goal of distributed query optimization. The Positionally Encoded Record Filter (PERF) has attracted research attention as a cost-effective operator for reducing transmission cost. A PERF is a bit array generated in relation tuple scan order rather than by hashing, so it retains the compact size of a Bloom filter while losing no join information to hash collisions. Our proposed algorithm PERF_C (Compressed PERF) further reduces the transmission cost of algorithm PERF by compressing both the join attributes and the corresponding PERF filters using arithmetic coding. We prove by time complexity analysis that compression is more efficient than sorting, which earlier research proposed for removing duplicates in algorithm PERF. In experiments on our synthetic testbed with 36 types of distributed queries, algorithm PERF_C reduces the transmission cost by 62% to 77% relative to IFS, and outperforms PERF by 16% to 36% in cost reduction ratio. A new metric measuring compression speed in bits per second, compression bps, is defined as a guideline for deciding when compression is beneficial: once compression overhead is taken into account, compression pays off only if compression bps exceeds the data transfer speed. Tests on both randomly generated and specially designed distributed queries identify the number of join attributes, the size of join attributes and relations, and the level of duplication as the critical database factors affecting compression. Measured on three typical real computing platforms over a wide range of data sizes, compression bps falls between 4 Mb/s and 9 Mb/s. Compared with the relatively slow data transfer rates currently available over the Internet, compression is therefore an effective means of reducing transmission cost in distributed query processing. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .Z565. Source: Masters Abstracts International, Volume: 43-01, page: 0249. Adviser: J. Morrissey. Thesis (M.Sc.)--University of Windsor (Canada), 2004
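    The PERF idea itself is easy to illustrate. The sketch below is a minimal illustration, assuming two sites joining relations R and S on a single attribute; the relation names, the dictionary representation, and the use of a Python list of booleans as the bit array are illustrative choices, not details from the thesis. It shows how the filter is built in tuple scan order rather than by hashing, so no join information is lost to collisions, and how only the reduced relation then needs to travel to the join site. In the thesis's terms, compressing the shipped columns and filters on top of this pays off only when compression bps exceeds the link's transfer rate.

```python
# Minimal sketch of a PERF-based semijoin between two sites (illustrative only).
# Site 1 holds R, Site 2 holds S; they join on a single attribute.

def perf_semijoin(R, S, attr_r, attr_s):
    # Step 1 (Site 1 -> Site 2): ship only R's join-attribute column.
    r_values = [t[attr_r] for t in R]          # scan order is preserved

    # Step 2 (Site 2): probe S and build a PERF filter -- one bit per shipped
    # value, set in the SAME scan order, so there are no hash collisions.
    s_values = {t[attr_s] for t in S}
    perf = [v in s_values for v in r_values]   # bit array of |R| bits

    # Step 3 (Site 2 -> Site 1): ship the compact bit array back.
    # Step 4 (Site 1): keep exactly the R tuples whose bit is set.
    return [t for t, bit in zip(R, perf) if bit]

# Tiny usage example with dictionaries standing in for tuples.
R = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}, {"id": 3, "x": "c"}]
S = [{"id": 1, "y": 10}, {"id": 3, "y": 30}]
print(perf_semijoin(R, S, "id", "id"))   # tuples with id 1 and 3 survive
```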

    An Experimental Study Into the Effect of Varying the Join Selectivity Factor on the Performance of Join Methods in Relational Databases

    Relational database systems use join queries to retrieve data from two relations, and several join methods can be used to execute these queries. This study investigated the effect of varying the join selectivity factor on the performance of the join methods. Experiments in the ORACLE environment measured the performance of three join methods: nested loop join, sort merge join, and hash join. Performance was measured in terms of total elapsed time, CPU time, and the number of I/O reads. The study found that the hash join performs better than the nested loop and the sort merge under all the conditions tested. The nested loop is competitive with the hash join at low join selectivity factors. The results also showed that the sort merge join performs better than the nested loop when a predicate is applied to the inner table.
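    As a rough intuition for why the hash join tends to dominate, the sketch below contrasts the two access patterns in plain Python; it is a simplified in-memory comparison, not the ORACLE implementations measured in the study, and the table names and sizes are invented. The nested loop rescans the inner input once per outer row, while the hash join scans each input once.

```python
# Simplified in-memory illustration of nested loop join vs. hash join
# (the study itself measured ORACLE's join implementations, not this code).

def nested_loop_join(outer, inner, key):
    # For every outer row, scan the whole inner table: O(|outer| * |inner|).
    return [(o, i) for o in outer for i in inner if o[key] == i[key]]

def hash_join(build, probe, key):
    # Build a hash table on the smaller input, then probe it once:
    # roughly O(|build| + |probe|) row visits.
    table = {}
    for b in build:
        table.setdefault(b[key], []).append(b)
    return [(b, p) for p in probe for b in table.get(p[key], [])]

# Invented example data: 10,000 orders joined to 100 customers.
orders = [{"cust": c % 100, "order": c} for c in range(10_000)]
customers = [{"cust": c} for c in range(100)]
assert len(hash_join(customers, orders, "cust")) == \
       len(nested_loop_join(orders, customers, "cust"))
```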

    PossDB: An Uncertainty Data Management System Based on Conditional Tables

    With the ever increasing importance of the Internet, interoperability of heterogeneous data sources is likewise increasingly important. Interoperability can be achieved, for instance, through data integration and data exchange. Common to both approaches is the need for the database management system to store and query incomplete databases. In this thesis we present PossDB, a database management system capable of storing and querying incomplete databases. The system is a wrapper over PostgreSQL, and the query language is an extension of a subset of standard SQL. Our experimental results show that our system scales well, in fact better than comparable systems.
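    The conditional-table idea behind such systems can be pictured very compactly: tuples may contain labeled nulls and carry a local condition, and an answer is certain only if the query returns it in every valuation of the nulls. The toy sketch below illustrates that idea only; it is not PossDB's actual storage scheme or SQL dialect, and the table contents, variable domains, and query are invented.

```python
from itertools import product

# Toy conditional table (c-table): each tuple is (values, condition), where
# values may contain variables ("x") and condition is a predicate over a
# valuation of the variables. PossDB's representation and language differ.

ctable = [
    (("alice", "x"), lambda v: True),                  # alice's dept unknown
    (("bob", "sales"), lambda v: v["x"] != "sales"),   # bob present only if alice is not in sales
]
domains = {"x": ["sales", "hr"]}        # finite domains for the labeled nulls

def worlds(ctable, domains):
    # Enumerate every valuation of the variables -> one ordinary relation per world.
    names, values = zip(*domains.items())
    for combo in product(*values):
        val = dict(zip(names, combo))
        yield {tuple(val.get(c, c) for c in row) for row, cond in ctable if cond(val)}

def certain(query, ctable, domains):
    # An answer is certain if the query returns it in every possible world.
    results = [query(w) for w in worlds(ctable, domains)]
    return set.intersection(*results)

# Query: names of everyone in the table. Only "alice" appears in every world.
print(certain(lambda w: {name for name, dept in w}, ctable, domains))
```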

    Data Mining and Machine Learning Techniques for Intrusion Detection in the Cybersecurity of Robotic and Autonomous Systems

    The thesis addresses a cybersecurity problem concerning methods for analysing large data sets for robotic systems. The object of the work is a research system built on a parallel computing methodology using Hadoop tools; the subject is the methods and processes of Data Mining and machine learning techniques for detecting intrusions into the cybersecurity of robotic and autonomous systems. The work reviews the main features of an existing SIEM system that allows large volumes of data to be processed, together with its advantages and disadvantages, and analyses the tactics for building a Security Analytics System that affect the accuracy, reliability, performance, and scalability of the IDS being designed. A research system based on parallel computing with Hadoop tools was implemented, providing effective operation under attack conditions. The system can be used in the work of a specific institution and adopted by other institutions to improve parallel computing with Hadoop tools, and the approach presented can also serve as a methodological guide for developing intrusion detection systems for the cybersecurity of robotic and autonomous systems. It increases data processing speed and reduces data analysis time by using the MapReduce paradigm. The explanatory note is 111 pages long and contains 31 illustrations, 26 tables, and 5 appendices.
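    The MapReduce step can be pictured with a Hadoop Streaming style mapper/reducer pair. The sketch below is a hypothetical example, not code from the thesis: the log field layout, the per-source-IP counting, and the "suspicious" threshold are all assumptions made purely for illustration of the map and reduce phases.

```python
from collections import defaultdict

# Hadoop Streaming style mapper/reducer for a toy intrusion-detection step:
# count connection events per source IP, then flag hosts above a threshold.
# Field positions and the threshold are illustrative assumptions.

def mapper(lines):
    # Each log line is assumed to be "<timestamp> <src_ip> <dst_ip> <dst_port> ...".
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            yield fields[1], 1              # emit (src_ip, 1)

def reducer(pairs):
    # In real Hadoop the pairs arrive grouped and sorted by key; here we
    # simply aggregate them in memory to keep the sketch self-contained.
    counts = defaultdict(int)
    for ip, n in pairs:
        counts[ip] += n
    threshold = 3                           # toy threshold for "suspicious"
    for ip, n in counts.items():
        yield ip, n, n >= threshold

log = [
    "1718000001 10.0.0.5 10.0.0.9 22",
    "1718000002 10.0.0.5 10.0.0.9 23",
    "1718000003 10.0.0.5 10.0.0.9 445",
    "1718000004 10.0.0.7 10.0.0.9 80",
]
for ip, n, suspicious in reducer(mapper(log)):
    print(ip, n, "SUSPICIOUS" if suspicious else "ok")
```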

    Quality of Service and Optimization in Data Integration Systems

    This work presents techniques for the construction of a global data integration system. Similar to distributed databases, this system allows declarative queries in order to express user-specific information needs. Scalability towards global data integration systems and openness were major design goals for the architecture and techniques developed in this work. It is shown how service composition, extensibility, and quality of service can be supported in an open system of providers of data, of functionality for query processing operations, and of computing power.
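    One way to picture quality-of-service-aware selection among such providers is a simple scoring step over candidates for a single query operator. The sketch below is a hypothetical illustration only; the provider attributes, weights, and scoring function are invented and are not taken from the dissertation.

```python
# Hypothetical illustration of QoS-aware provider selection for one query
# operator: score each candidate provider on latency, cost, and availability.
# Attribute names, values, and weights are invented for illustration.

providers = [
    {"name": "site-A", "latency_ms": 40, "cost": 0.02, "availability": 0.999},
    {"name": "site-B", "latency_ms": 15, "cost": 0.08, "availability": 0.990},
    {"name": "site-C", "latency_ms": 90, "cost": 0.01, "availability": 0.950},
]

def score(p, weights):
    # Lower latency and cost are better, higher availability is better,
    # so availability enters the score with a negative sign.
    return (weights["latency"] * p["latency_ms"]
            + weights["cost"] * p["cost"] * 1000
            - weights["availability"] * p["availability"] * 100)

weights = {"latency": 1.0, "cost": 0.5, "availability": 2.0}
best = min(providers, key=lambda p: score(p, weights))
print("chosen provider:", best["name"])
```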

    Join Algorithm Costs Revisited

    A method of analysing join algorithms based upon the time required to access, transfer, and perform the relevant CPU-based operations on a disk page is proposed. The costs of variations of several of the standard join algorithms, including nested block, sort-merge, GRACE hash, and hybrid hash, are presented. For a given total buffer size, the cost of these join algorithms depends on how the buffer is divided among its uses; for example, when joining two relations using the nested block join algorithm, the amount of buffer space allocated to the outer and inner relations can significantly affect the cost of the join. Analysis of expected and experimental results for various join algorithms shows that a combination of the optimal nested block and optimal GRACE hash join algorithms usually provides the greatest cost benefit. Algorithms to quickly determine the buffer allocation producing the minimal cost for each of these algorithms are presented.
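    The buffer-split point is concrete enough to sketch. The example below uses the textbook block nested loop cost model, which may differ in detail from the paper's own model, and picks the cheapest split between outer and inner by simple enumeration; the page counts and buffer size are invented.

```python
from math import ceil

# Textbook block nested loop cost model (may differ in detail from the paper):
# with b_outer buffer pages for the outer relation, the outer is read once and
# the inner is rescanned once per chunk of b_outer outer pages.

def nested_block_cost(pages_outer, pages_inner, b_outer):
    return pages_outer + ceil(pages_outer / b_outer) * pages_inner

def best_allocation(pages_outer, pages_inner, total_buffer):
    # Reserve at least one page each for the inner relation and the output,
    # then try every remaining split for the outer relation.
    candidates = range(1, total_buffer - 1)
    return min(candidates,
               key=lambda b: nested_block_cost(pages_outer, pages_inner, b))

R_pages, S_pages, B = 1_000, 500, 52   # invented sizes
b = best_allocation(R_pages, S_pages, B)
print("outer pages:", b, "-> cost (page I/Os):", nested_block_cost(R_pages, S_pages, b))
# Under this simple model the cheapest split gives the outer relation as many
# pages as possible (B - 2 here); the paper analyses this kind of allocation
# question, with a richer cost model, for each of the join algorithms above.
```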
