697 research outputs found
A bloom-filter strategy for response time reduction in distributed query processing.
In distributed database systems, query optimization seeks strategies that minimize the amount of data transmitted over the network. Optimization algorithms have an important impact on the performance of distributed query processing. Since optimal query processing in distributed database systems has been shown to be NP-hard [WC96], heuristics are applied to find a cost-effective and efficient (but suboptimal) processing strategy. Many query optimization strategies have been proposed to minimize either the total cost or the response time. Approaches to distributed query processing have mainly focused on the use of joins, semijoins, and filters. In this thesis, we propose a new reduction strategy based on Bloom filters to significantly reduce the response time of a distributed query. The algorithm can process general queries consisting of an arbitrary number of relations and join attributes. Its response time is compared against that of the Initial Feasible Solution (IFS) in a series of experiments. Compared to the IFS, our algorithm provides a significantly improved query solution. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2003 .G36. Source: Masters Abstracts International, Volume: 43-05, page: 1749. Thesis (M.Sc.)--University of Windsor (Canada), 2003
An evaluation of a 2-way semijoin distributed query processing algorithm
The 2-way semijoin is an extension of the semijoin, itself an effective operator for minimizing transmission cost in distributed query processing; it adds a backward reduction to maximize the reduction capability of the traditional semijoin. In this thesis, we evaluate the 2-way semijoin algorithm objectively against a full reducer, that is, an algorithm that fully reduces all relations involved in a query by eliminating all non-participating tuples. Instead of using a filter-based algorithm, our algorithm is implemented so that it avoids hash collisions and also allows for composite semijoins. A series of experiments with various queries is carried out to study these issues. The results show that using our 2-way semijoin algorithm to reduce the query relations achieves a significant reduction effect. It performs well on both the average percentage reduction of query relations and the percentage of queries that achieve full reduction in terms of total cost. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2000 .C344. Source: Masters Abstracts International, Volume: 40-03, page: 0721. Adviser: Joan M. Morrissey. Thesis (M.Sc.)--University of Windsor (Canada), 2001
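The forward and backward reduction described in this abstract can be illustrated with a minimal in-memory sketch. It is not the thesis implementation: the function name `two_way_semijoin`, the dictionary row format, and the use of exact key sets are assumptions made for illustration, and the sketch ignores sites, transmission costs, and the choice of which key set is cheaper to ship back.

```python
def two_way_semijoin(r_rows, s_rows, key):
    """Reduce both relations on one join attribute (hypothetical helper).

    Forward step: the join values of R are used to reduce S.
    Backward step: the values that actually matched in S are used to reduce R.
    """
    # Forward reduction: keep only S rows whose key appears in R.
    r_keys = {row[key] for row in r_rows}
    s_reduced = [row for row in s_rows if row[key] in r_keys]

    # Backward reduction: keep only R rows whose key survived in S.
    matched = {row[key] for row in s_reduced}
    r_reduced = [row for row in r_rows if row[key] in matched]
    return r_reduced, s_reduced


r = [{"a": 1}, {"a": 2}, {"a": 3}]
s = [{"a": 2, "b": "x"}, {"a": 3, "b": "y"}, {"a": 4, "b": "z"}]
r_out, s_out = two_way_semijoin(r, s, "a")
```

Because exact key sets are exchanged rather than hashed bit vectors, no collisions can occur, which matches the property the abstract highlights.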
Reduction of collisions in Bloom filters during distributed query optimization.
The goal of distributed query optimization is to find the optimal strategy for the execution of a given query. Approaches to distributed query processing have mainly focused on the use of joins, semijoins, and filters. Semijoins have the advantage over joins that they never increase data sizes. However, a semijoin needs more local processing, such as projection, and higher data transmission. To improve distributed query processing, the filter-based approach is used. One limitation of this approach is collisions. We investigate how collisions affect the performance of the algorithm and how performance can be improved given those collisions. Our proposed algorithm uses two sets of filters to reduce collisions, which improves performance when collisions exist. The algorithm is evaluated objectively by comparison to a full reducer, that is, an algorithm that fully reduces all relations involved in a query by eliminating all non-participating tuples. The results of the evaluation show that: (1) With a perfect hash function, on average, our algorithm eliminates 97.41% of the unneeded data and fully reduces the relations of over 70% of the queries. (2) Using a single set of filters with specific percentages of collisions, on average, fewer than half of the queries are fully reduced by the algorithm; collisions therefore substantially affect performance. (3) Using two sets of filters, on average, our algorithm eliminates 95% of noncontributive tuples and achieves over 60% full reduction. In conclusion, our improved algorithm uses the two sets of filters to substantially reduce the effects of collisions, which are the major problem in using Bloom filters during distributed query optimization. Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis1999 .L53. Source: Masters Abstracts International, Volume: 39-02, page: 0528. Adviser: Joan Morrissey. Thesis (M.Sc.)--University of Windsor (Canada), 1999
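The abstract does not say how the two sets of filters are built, so the following is only one plausible reading, sketched under stated assumptions: two bit vectors are filled using two independent hash functions (here salted SHA-256 from Python's `hashlib`), and a tuple is kept only if its join value hits a set bit in every filter. The names `bit_index`, `build_filter`, and `filter_semijoin`, and the filter size of 128 bits, are hypothetical.

```python
import hashlib


def bit_index(value, salt, size):
    """Map a join value to a bit position; the salt selects the hash function."""
    digest = hashlib.sha256(f"{salt}:{value}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % size


def build_filter(values, salt, size):
    """Build one single-hash bit vector over the join values of a relation."""
    bits = [False] * size
    for value in values:
        bits[bit_index(value, salt, size)] = True
    return bits


def filter_semijoin(s_rows, key, filters, size):
    """Keep rows of S whose key passes every (salt, bits) filter."""
    return [
        row for row in s_rows
        if all(bits[bit_index(row[key], salt, size)] for salt, bits in filters)
    ]


SIZE = 128  # assumed filter length in bits
r_keys = list(range(0, 200, 4))
s_rows = [{"k": k} for k in range(200)]
one_set = [("h1", build_filter(r_keys, "h1", SIZE))]
two_sets = one_set + [("h2", build_filter(r_keys, "h2", SIZE))]
kept_one = filter_semijoin(s_rows, "k", one_set, SIZE)
kept_two = filter_semijoin(s_rows, "k", two_sets, SIZE)
```

Under this reading, a non-matching tuple survives only if it collides in both filters, so the second filter can remove false positives but never a genuinely matching tuple.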
Spinning Relations: High-Speed Networks for Distributed Join Processing
By leveraging modern networking hardware (RDMA-enabled network cards), we can shift priorities in distributed database processing significantly. Complex and sophisticated mechanisms to avoid network traffic can be replaced by a scheme that takes advantag
Implementation of composite semijoins using a variation of Bloom filters.
Unlike query processing in a centralized database system, distributed query processing involves data transmission among different sites, and this communication cost dominates the local processing cost. The objective of distributed query optimization is therefore to find strategies that minimize the amount of data transmitted over the network. Since optimal query processing in distributed database systems has been shown to be an NP-hard problem, heuristics are applied to find a near-optimal processing strategy. Previous research has mainly focused on the use of joins, semijoins, and hash semijoins (Bloom filters). The semijoin is a commonly recognized operator that provides efficient query results. As a variation of the semijoin, the composite semijoin performs semijoins on several columns as one composite operation rather than as multiple single-column semijoins. The hash semijoin (which uses a Bloom filter) is used to minimize the cost of a semijoin operation. This thesis provides a summary of each category of query processing techniques and optimization algorithms. We also propose a new algorithm, called Composite Semijoin Filter, that combines the ideas of composite semijoins, Bloom filters, and PERF joins. One advantage of this algorithm is that it avoids collisions. The algorithm is evaluated and compared with the initial feasible solution (IFS) and another filter-based algorithm. The results show that the algorithm gives a substantial reduction in relation sizes and total cost. Dept. of Computer Science. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .Z58. Source: Masters Abstracts International, Volume: 43-01, page: 0249. Adviser: Joan Morrissey. Thesis (M.Sc.)--University of Windsor (Canada), 2004
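The difference between a composite semijoin and a sequence of single-column semijoins can be shown with a small sketch. It covers only the composite-key idea and uses exact key sets; it does not reproduce the thesis's Composite Semijoin Filter, which also draws on Bloom filters and PERF joins. The helper name `semijoin` and the dictionary row format are assumptions.

```python
def semijoin(s_rows, r_rows, columns):
    """Keep rows of S whose values on `columns` appear together in some row of R."""
    r_keys = {tuple(row[c] for c in columns) for row in r_rows}
    return [row for row in s_rows if tuple(row[c] for c in columns) in r_keys]


r = [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]
s = [{"a": 1, "b": "y"}, {"a": 2, "b": "y"}, {"a": 1, "b": "x"}]

# Two single-column semijoins: each column is checked on its own.
single = semijoin(semijoin(s, r, ["a"]), r, ["b"])

# One composite semijoin: both columns must match within the same row of R.
composite = semijoin(s, r, ["a", "b"])
```

In this data the row with `a = 1, b = "y"` survives the single-column semijoins, because each value occurs somewhere in R, but is removed by the composite semijoin, because the pair never occurs in R.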
Semi-join strategies for total cost minimization in distributed query processing.
A new static heuristic, called Algorithm W, is presented as an efficient method for reducing the total volume of data transmitted over the network during distributed query processing. It uses the concepts of profit, marginal profit, and gain to construct small, highly selective reducers using cost-effective semi-join sequences. In most cases the heuristic has a complexity of O(nm). A limitation of static strategies such as Algorithm W is that they rely on accurate estimates to perform properly; estimation errors may lead to sub-optimal solutions. One solution to this problem is a dynamic strategy (Boderick, 1985; Boderick, Pyra et al., 1989) in which the schedule of operations is monitored and corrected if the performance deteriorates. A purely dynamic heuristic, Algorithm DW, is proposed, which uses up-to-date information and so eliminates the need for schedule monitoring. It is shown that the overheads incurred by using exact information are minimal with respect to the overall total cost. A benchmark database is proposed on which the empirical performance of the heuristics can be measured. Algorithm W is evaluated against the AHY General (total time) algorithm (Apers, Hevner, Yao, 1983) to investigate whether improvements are possible. The performance of the proposed heuristics is evaluated to test the hypothesis that a dynamic strategy using better estimates will produce improved schedules. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis1995 .B42. Source: Masters Abstracts International, Volume: 34-06, page: 2394. Adviser: J. M. Morrissey. Thesis (M.Sc.)--University of Windsor (Canada), 1996
An evaluation between Bloom Filter join and PERF join in Distributed Query Processing
With the explosion of information and the arrival of the telecommunication era, more and more large applications encourage decentralization of data while accessing data from different sites [HFB00]. The process of retrieving data from different sites is called distributed query processing. The objective of distributed query optimization is to find the most cost-effective way of executing a query across the network [OV99]. The semijoin [BC81] [BG+81] is known as an effective operator for eliminating the tuples of a relation that do not contribute to a query. The 2-way semijoin [KR87] is an extended version of the semijoin that not only performs forward reduction, as the traditional semijoin does, but also provides backward reduction, always in a cost-effective way. Bloom filter [B70] and PERF [LR95] are two filter-based techniques that use a bit vector to represent the projection of the original join attributes during data transmission. Whereas a Bloom filter generates a bit array with a hash function, the PERF join relies on the tuple scan order and so avoids the information loss caused by hash collisions. In this thesis, we apply both Bloom filters and PERF to 2-way semijoin algorithms to reduce the transmission cost of distributed queries. The performance of the proposed algorithms is compared against each other and against the IFS (Initial Feasible Solution) through a number of experiments. Keywords: Distributed Query Processing, Semijoin, Bloom Filter, PERF Join
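The positional idea behind PERF, as contrasted with hashing in this abstract, can be sketched as follows. The sketch assumes that one site sends its join values in scan order and the other site answers with one bit per received value, in the same order; the function names `perf_bits` and `apply_perf` are hypothetical, and the code is an illustration rather than the algorithm evaluated in the thesis.

```python
def perf_bits(r_projection, s_keys):
    """At S's site: answer with one bit per received value, in arrival order."""
    s_set = set(s_keys)
    return [value in s_set for value in r_projection]


def apply_perf(r_projection, bits):
    """At R's site: keep the values whose positional bit is set."""
    return [value for value, bit in zip(r_projection, bits) if bit]


r_projection = [5, 7, 9, 11]   # join values of R in scan order
s_keys = [7, 11, 13]           # join values present at S's site
bits = perf_bits(r_projection, s_keys)
r_reduced = apply_perf(r_projection, bits)
```

Each bit refers to a position rather than a hash bucket, so two different values can never share a bit, and the backward reduction costs one bit per value sent.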
Public-Private Partnership for Urban Regeneration: The Case of the Urban Transformation Companies
In many industrialized countries, the debate surrounding privatization is undergoing significant changes. Whereas during the 1980s and 1990s attention was placed on the sale of public enterprises to private operators, recent studies and practices have increasingly focused on the cooperation and involvement of private enterprises in the delivery of public services (Bach, 1999; Montanheiro et al., 1998; Osborne, 2000). This work is based on the change from “competition” with the private sector to “cooperation” with it and, in particular, on the dynamics and characteristics of the cooperation between public and private bodies. Furthermore, this work analyzes a specific type of public-private partnership, namely the companies for urban transformation established in Italy by municipalities and metropolitan areas to plan and implement urban transformation measures. Keywords: public-private partnership; new public management; public governance; urban regeneration; urban transformation companies.
The Parallelism Motifs of Genomic Data Analysis
Genomic data sets are growing dramatically as the cost of sequencing
continues to decline and small sequencing devices become available. Enormous
community databases store and share this data with the research community, but
some of these genomic data analysis problems require large scale computational
platforms to meet both the memory and computational requirements. These
applications differ from scientific simulations that dominate the workload on
high end parallel systems today and place different requirements on programming
support, software libraries, and parallel architectural design. For example,
they involve irregular communication patterns such as asynchronous updates to
shared data structures. We consider several problems in high performance
genomics analysis, including alignment, profiling, clustering, and assembly for
both single genomes and metagenomes. We identify some of the common
computational patterns or motifs that help inform parallelization strategies
and compare our motifs to some of the established lists, arguing that at least
two key patterns, sorting and hashing, are missing.
Distributed transaction processing in the Escada protocol
Database replication is an invaluable technique for implementing fault-tolerant databases and is also frequently used to improve database performance. Unfortunately, when strong consistency among the replicas and the ability to update the database at any replica are both required, the replication protocols currently available in commercial database managers do not scale up. The problem is related to the number of interactions among the replicas needed to guarantee consistency and to the termination protocols used to ensure that all replicas agree on the result of each transaction. Roughly, the number of aborts, deadlocks, and messages exchanged among the replicas grows drastically as the number of replicas increases. Related work has proved that database replication in such a scenario is impractical.
In order to overcome these problems, several studies have been carried out. Initially, most of them relaxed the strong consistency or the update-anywhere requirement to achieve feasible solutions. More recently, replication protocols based on group communication were proposed, in which the strong consistency and update-anywhere requirements are preserved and the problems circumvented. This is the context of the Escada project. Briefly, it aims to study, design, and implement transactional replication mechanisms suited to large-scale distributed systems. In particular, the project exploits partial replication techniques to provide strong consistency criteria without introducing significant synchronization and performance overheads.
In this thesis, we augment Escada with a distributed query processing model and mechanism, which is an unavoidable requirement in a partially replicated environment. Moreover, exploiting characteristics of its protocols, we propose a semantic cache to reduce the overhead generated by accessing remote replicas. We also improve the certification process, attempting to reduce aborts by using the semantic information available in the transactions.
Finally, to evaluate the Escada protocols, the semantic cache, and the certification process, we use a simulation model that combines simulated and real code, which allows us to evaluate our proposals under distinct scenarios and configurations. Furthermore, instead of using unrealistic workloads, we test our proposals using workloads based on the TPC-W and TPC-C benchmarks.
Fundação para a Ciência e a Tecnologia - POSI/CHS/41285/2001