
    Predicate Transfer: Efficient Pre-Filtering on Multi-Join Queries

    This paper presents predicate transfer, a novel method that optimizes join performance by pre-filtering tables to reduce the join input sizes. Predicate transfer generalizes Bloom join, which conducts pre-filtering within a single join operation, to multi-table joins such that the filtering benefits can be significantly increased. Predicate transfer is inspired by the seminal theoretical results of Yannakakis, which use semi-joins to pre-filter acyclic queries. Predicate transfer generalizes these theoretical results to arbitrary join graphs and uses Bloom filters in place of semi-joins, leading to significant speedup. Evaluation shows that predicate transfer can outperform Bloom join by 3.1x on average on the TPC-H benchmark. Comment: 6 pages, 4 figures
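
The single-join case that the abstract calls Bloom join can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `BloomFilter` class, the filter size, the hash construction, and the `bloom_prefilter` and `hash_join` helpers are all assumptions made for the example. A Bloom filter built over the join keys of one table discards probe-side rows that cannot match before the join runs. False positives may let some non-matching rows through, but no matching row is dropped, so the join result is unchanged.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (hypothetical helper; sizes chosen for illustration)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # True for every added key; may also be True for others (false positive).
        return all(self.bits[pos] for pos in self._positions(key))


def bloom_prefilter(build_rows, probe_rows, build_key, probe_key):
    """Keep only the probe rows whose join key may appear in build_rows."""
    bloom = BloomFilter()
    for row in build_rows:
        bloom.add(row[build_key])
    return [row for row in probe_rows if bloom.might_contain(row[probe_key])]


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Plain in-memory hash join, used here to check the pre-filter."""
    table = {}
    for row in build_rows:
        table.setdefault(row[build_key], []).append(row)
    return [
        {**match, **row}
        for row in probe_rows
        for match in table.get(row[probe_key], [])
    ]
```

Joining the pre-filtered probe rows gives the same result as joining the full probe table. The saving comes from the smaller join input.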

    Variation of bloom filters applied in distributed query optimization.

    Distributed query processing is important for distributed database systems. In past years, research on distributed query processing has focused on how to realize join operations with different operators such as the semi-join and the Bloom filter. Experiments show that the hash-semijoin, which uses Bloom filters, almost always does better than the semi-join for query processing. However, as long as a Bloom filter is used, collisions cannot be avoided. To make processing cheaper, some past work uses two or more Bloom filters to perform the hash-semijoin. Several factors still affect the cost and the optimization result: (1) how to decide the best number of Bloom filters and which kind of Bloom filter to choose; (2) there is no way to avoid collisions when using Bloom filters; (3) with a Bloom filter, the exact location information of the joining attributes cannot be kept (loss of join information); (4) with a Bloom filter, useful composite semi-joins can never be combined into the process. Taking the idea of PERF join into account, why not use the Bloom filter (hash-semijoin) concept but come up with a new kind of filter, the Complete Reducing Filter (CRF), which avoids the disadvantages of the Bloom filter while inheriting its advantages? We propose and implement a new algorithm called Complete Reducing Filter (CRF), based on PERF join, which keeps the join location information and also lowers transmission cost (because it still uses the filter concept). At the same time, CRF can combine composite semi-joins into the process, which is impossible when using only a Bloom filter. With this variation of the Bloom filter, we try to achieve better performance at lower cost. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2003 .Z535. Source: Masters Abstracts International, Volume: 42-03, page: 0979. Adviser: Joan Morrissey.
Thesis (M.Sc.)--University of Windsor (Canada), 2003

    An evaluation between Bloom Filter join and PERF join in Distributed Query Processing

    Nowadays, with the explosion of information and the coming of the telecommunication era, more and more large applications encourage decentralization of data while accessing data from different sites [HFB00]. The process of retrieving data from different sites is called Distributed Query Processing. The objective of distributed query optimization is to find the most cost-effective way of executing a query across the network [OV99]. Semijoin [BC81] [BG+81] is known as an effective operator for eliminating the tuples of a relation that do not contribute to a query. 2-way semijoin [KR87] is an extended version of semijoin that not only performs forward reduction as the traditional semijoin does, but also provides backward reduction, always in a cost-effective way. Bloom Filter [B70] and PERF [LR95] are two filter-based techniques that use a bit vector to represent the projection of the original join attributes during data transmission. Whereas a Bloom filter generates a bit array with hash functions, PERF join is based on the tuple scan order, which avoids the information loss caused by hash collisions. In this thesis, we apply both Bloom filter and PERF to 2-way semijoin algorithms to reduce the transmission cost of distributed queries. The performance of the proposed algorithms is compared against each other and against IFS (Initial Feasible Solution) through a number of experiments. Keywords: Distributed Query Processing, Semijoin, Bloom Filter, PERF Join
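
The scan-order idea behind a PERF-style filter can be illustrated with a small sketch. This is a simplified reading of the abstract, not the thesis's algorithm: the function names `perf_bit_vector` and `apply_perf` are hypothetical, and the example assumes one site has already shipped its join-attribute values to the other site in scan order. The receiving site answers with one bit per received value, in the same order. Because each bit is tied to a position rather than to a hash bucket, there are no collisions and no false positives.

```python
def perf_bit_vector(received_values, local_values):
    """Build the reply at the receiving site.

    Bit i is 1 exactly when the i-th received value has a join partner
    locally, so the reply preserves the sender's scan order.
    """
    local = set(local_values)
    return [1 if value in local else 0 for value in received_values]


def apply_perf(rows, bits):
    """At the sending site, keep the rows whose positional bit is set."""
    return [row for row, bit in zip(rows, bits) if bit]


# Illustrative data (assumed): relation R at site 1, join keys of S at site 2.
r_rows = [{"k": 5}, {"k": 7}, {"k": 5}, {"k": 9}]
s_keys = [7, 9, 11]

bits = perf_bit_vector([row["k"] for row in r_rows], s_keys)
reduced_r = apply_perf(r_rows, bits)
```

The reply costs one bit per shipped value, whereas a Bloom filter's size is chosen independently of the data and trades space against false positives.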

    The moral of Ulysses

    Many critics are confused about the total meaning of James Joyce's Ulysses. David Daiches in The Novel and the Modern World states that critics can acclaim the style, the organisation, the complexity, the insight, the ingenuity, and many other separate aspects of the work, but what are they to say of the whole? Daiches is obviously among those critics who pass Ulysses off as art for art's sake. On the other hand, William M. Schutte points out that critics who have a good deal to say about Ulysses as a whole are unfortunately saying the wrong things. These critics whom Schutte attacks believe that Ulysses comes to a happy and fruitful close, while it is my intention in this thesis to support Schutte's contention that Ulysses ends in utter failure, since Leopold Bloom and Stephen Dedalus will never join together in a common purpose to save Ireland. Along with maintaining Schutte's contention, I intend to prove that Joyce is making a strong moral statement in Ulysses through Bloom and Stephen's inability to join together. Joyce is attempting to show to Ireland and the world the need for a union of understanding between men which will enable them to join their talents and to strive together in a common and purposeful endeavor to better their condition.

    The End of Slow Networks: It's Time for a Redesign

    Next generation high-performance RDMA-capable networks will require a fundamental rethinking of the design and architecture of modern distributed DBMSs. These systems are commonly designed and optimized under the assumption that the network is the bottleneck: the network is slow and "thin", and thus needs to be avoided as much as possible. Yet this assumption no longer holds true. With InfiniBand FDR 4x, the bandwidth available to transfer data across the network is in the same ballpark as the bandwidth of one memory channel, and it increases even further with the most recent EDR standard. Moreover, with the increasing advances of RDMA, the latency improves similarly fast. In this paper, we first argue that the "old" distributed database design is not capable of taking full advantage of the network. Second, we propose architectural redesigns for OLTP, OLAP and advanced analytical frameworks to take better advantage of the improved bandwidth, latency and RDMA capabilities. Finally, for each of the workload categories, we show that remarkable performance improvements can be achieved.

    Opportunistic linked data querying through approximate membership metadata

    Between URI dereferencing and the SPARQL protocol lies a largely unexplored axis of possible interfaces to Linked Data, each with its own combination of trade-offs. One of these interfaces is Triple Pattern Fragments, which allows clients to execute SPARQL queries against low-cost servers, at the cost of higher bandwidth. Increasing a client's efficiency means lowering the number of requests, which can be achieved, among other ways, through additional metadata in responses. We noted that typical SPARQL query evaluations against Triple Pattern Fragments require a significant portion of membership subqueries, which check the presence of a specific triple rather than a variable pattern. This paper studies the impact of providing approximate membership functions, i.e., Bloom filters and Golomb-coded sets, as extra metadata. In addition to reducing HTTP requests, such functions make it possible to achieve full result recall earlier when temporarily allowing lower precision. Half of the tested queries from a WatDiv benchmark test set could be executed with up to a third fewer HTTP requests with only marginally higher server cost. Query times, however, did not improve, likely due to slower metadata generation and transfer. This indicates that approximate membership functions can partly improve the client-side query process with minimal impact on the server and its interface.