
8 research outputs found

Evaluation of Main Memory Join Algorithms for Joins with Set Comparison Predicates

Author: Helmer Sven
Moerkotte Guido
Publication date: 01/01/1996

Current data models like the NF2 model and object-oriented models support set-valued attributes. Hence, it becomes possible to have join predicates based on set comparison. This paper introduces and evaluates several main memory algorithms to evaluate this kind of join efficiently. More specifically, we concentrate on the set equality and the subset predicates.
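
To make the two predicates named in this abstract concrete, here is a minimal Python sketch of a set-equality join and a subset join over set-valued attributes. It is an illustration only and not the algorithms evaluated in the paper: the function names, the `(id, set)` tuple layout, and the 32-bit signature width are all assumptions made for the example.

```python
from collections import defaultdict


def set_equality_join(r, s):
    """Return (r_id, s_id) pairs whose set-valued attributes are equal.

    r and s are lists of (id, set) tuples (a hypothetical layout).
    Each set is hashed as a frozenset, so the join is a plain hash join.
    """
    index = defaultdict(list)
    for s_id, s_set in s:
        index[frozenset(s_set)].append(s_id)
    return [
        (r_id, s_id)
        for r_id, r_set in r
        for s_id in index.get(frozenset(r_set), [])
    ]


def signature(values, bits=32):
    """Superimpose one bit per element into a fixed-width bit signature."""
    sig = 0
    for v in values:
        sig |= 1 << (hash(v) % bits)
    return sig


def subset_join(r, s, bits=32):
    """Return (r_id, s_id) pairs where the r set is a subset of the s set.

    The signature test is only a cheap filter: it can produce false
    positives, so every surviving pair is verified with an exact check.
    """
    s_rows = [(s_id, frozenset(s_set), signature(s_set, bits)) for s_id, s_set in s]
    out = []
    for r_id, r_set in r:
        r_fs = frozenset(r_set)
        r_sig = signature(r_fs, bits)
        for s_id, s_fs, s_sig in s_rows:
            # If r is a subset of s, every bit set in r_sig is also set in s_sig.
            if (r_sig & ~s_sig) == 0 and r_fs <= s_fs:
                out.append((r_id, s_id))
    return out
```

The subset join here still compares every pair of signatures; it only avoids the more expensive exact set comparison for pairs the filter rejects.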

MAnnheim DOCument Server

Sorting in Space: Multidimensional, spatial, and metric data structures for applications in spatial databases, geographic information systems (GIS), and location-based services

Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Cost estimation of spatial join in spatialhadoop

Author: Belussi A.
Eldawy A.
Migliorini S.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Spatial join is an important operation in geo-spatial applications, since it is frequently used for performing data analysis involving geographical information. Much effort has been devoted in the past decades to providing efficient algorithms for spatial join, and this becomes particularly important as the amount of spatial data to be processed increases. In recent years, the MapReduce approach has become a de-facto standard for processing large amounts of data (big data), and some attempts have been made to extend existing frameworks for the processing of spatial data. In this context, several different MapReduce implementations of spatial join have been defined, which mainly differ in the use of a spatial index and in the way this index is built and used. In general, none of these algorithms can be considered better than the others, but the choice might depend on the characteristics of the involved datasets. The aim of this work is to analyse them in depth and define a cost model for ranking them based on the characteristics of the dataset at hand (i.e., selectivity or spatial properties). This cost model has been extensively tested w.r.t. a set of synthetic datasets in order to prove its effectiveness.
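
As a rough illustration of the partition-based style of spatial join that this abstract refers to, the following Python sketch assigns rectangles to uniform grid cells and joins the two inputs cell by cell, with each cell playing the role of one reduce partition. It is not SpatialHadoop code and not the paper's cost model; the rectangle layout `(xmin, ymin, xmax, ymax)`, the function names, and the cell size are assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import product


def intersects(a, b):
    """True if two rectangles (xmin, ymin, xmax, ymax) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cells(rect, cell):
    """Yield the (column, row) index of every grid cell the rectangle covers."""
    x0, y0 = int(rect[0] // cell), int(rect[1] // cell)
    x1, y1 = int(rect[2] // cell), int(rect[3] // cell)
    return product(range(x0, x1 + 1), range(y0, y1 + 1))


def grid_spatial_join(r, s, cell=10.0):
    """Join two dicts of id -> rectangle on rectangle intersection.

    A rectangle spanning several cells is replicated into each of them,
    so the result is collected in a set to drop duplicate pairs.
    """
    grid = defaultdict(lambda: ([], []))
    for rid, rect in r.items():
        for c in cells(rect, cell):
            grid[c][0].append(rid)
    for sid, rect in s.items():
        for c in cells(rect, cell):
            grid[c][1].append(sid)

    result = set()
    for r_ids, s_ids in grid.values():
        for rid in r_ids:
            for sid in s_ids:
                if intersects(r[rid], s[sid]):
                    result.add((rid, sid))
    return result
```

Because any two intersecting rectangles share at least one grid cell, comparing only within cells does not miss pairs; the cell size trades replication against the number of comparisons per cell.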

Catalogo dei prodotti della ricerca

Use of a weighted matching algorithm to sequence clusters in spatial join processing

Author: Husen Husen
Publication venue: Edith Cowan University, Research Online, Perth, Western Australia
Publication date: 01/01/2006

One of the most expensive operations in a spatial database is spatial join processing. This study focuses on how to improve the performance of such processing. The main objective is to reduce the Input/Output (I/O) cost of the spatial join process by using a technique called cluster-scheduling. Generally, the spatial join is processed in two steps, namely filtering and refinement. The cluster-scheduling technique is performed after the filtering step and before the refinement step and is part of the housekeeping phase. The key point of this technique is to realise an order wherein two consecutive clusters in the sequence have maximal overlapping objects. However, finding the maximal overlapping order has been shown to be Nondeterministic Polynomial-time (NP)-complete. This study proposes an algorithm to provide an approximate maximal overlapping (AMO) order in a Cluster Overlapping (CO) graph. The study proposes the use of an efficient maximum weighted matching algorithm to solve the problem of finding the AMO order. As a result, the I/O cost in spatial join processing can be minimised.
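
The goal of cluster sequencing described in this abstract can be illustrated with a small Python sketch. Note that this uses a simple greedy nearest-neighbour heuristic, swapped in for brevity; it is not the maximum weighted matching algorithm the study proposes. The cluster names, the dict-of-sets layout, and the function names are assumptions for the example.

```python
def overlap(a, b):
    """Number of objects two clusters share (an edge weight in the CO graph)."""
    return len(a & b)


def greedy_cluster_order(clusters):
    """Order clusters so that consecutive clusters tend to share many objects.

    clusters maps a cluster name to the set of object ids it references.
    Starting from the smallest name, repeatedly pick the unvisited cluster
    with the largest overlap with the current one (ties go to the smallest name).
    """
    if not clusters:
        return []
    remaining = dict(clusters)
    current = min(remaining)
    order = [current]
    current_set = remaining.pop(current)
    while remaining:
        nxt = max(sorted(remaining), key=lambda n: overlap(current_set, remaining[n]))
        order.append(nxt)
        current_set = remaining.pop(nxt)
    return order


def total_overlap(order, clusters):
    """Sum of overlaps between consecutive clusters in the given order."""
    return sum(overlap(clusters[a], clusters[b]) for a, b in zip(order, order[1:]))
```

A higher total overlap means more objects loaded for one cluster can be reused by the next, which is the intuition behind the I/O saving; the greedy order carries no optimality guarantee.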

Research Online @ ECU

Efficient Index-based Methods for Processing Large Biological Databases.

Author: Kim You Jung

Over the last few decades, advances in life sciences have generated a vast amount of biological data. To cope with the rapid increase in data volume, there is a pressing need for efficient computational methods to query large biological datasets. This thesis develops efficient and scalable querying methods for biological data. For an efficient sequence database search, we developed two q-gram index-based algorithms, miBLAST and ProbeMatch. miBLAST is designed to expedite batch identification of statistically significant sequence alignments. ProbeMatch is designed for identifying sequence alignments based on a k-mismatch model. For an efficient protein structure database search, we also developed a multi-dimensional index-based method called proCC, an automatic and efficient classification framework. All these algorithms result in substantial performance improvements over existing methods. When designing index-based methods, the right choice of indexing methods is essential. In addition to developing index-based methods for biological applications, we also investigated an essential database problem that reexamines the state-of-the-art indexing methods by experimental evaluation. Our experimental study provides valuable insight for choosing the right indexing method and also motivates a careful consideration of index structures when designing index-based methods. In the long run, index-based methods can lead to new and more efficient algorithms for querying and mining biological datasets. The examples above, which include query processing on biological sequence and geometrical structure datasets, employ index-based methods very effectively. While the database research community has long recognized the need for index-based query processing algorithms, the bioinformatics community has been slow to adopt such algorithms.
However, since many biological datasets are growing very rapidly, database-style index-based algorithms are likely to play a crucial role in modern bioinformatics methods. The work proposed in this thesis lays the foundation for such methods.
Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/61570/1/youjkim_1.pd
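
The q-gram indexing idea mentioned in this abstract can be sketched in a few lines of Python. This is a generic inverted index over q-grams used to shortlist candidate sequences, not miBLAST or ProbeMatch; the function names, the dict-based layout, and the choice of q = 3 are assumptions for illustration.

```python
from collections import defaultdict


def build_qgram_index(sequences, q=3):
    """Map every length-q substring to the (sequence id, position) pairs where it occurs.

    sequences is a dict of id -> string (a hypothetical layout).
    """
    index = defaultdict(list)
    for seq_id, seq in sequences.items():
        for pos in range(len(seq) - q + 1):
            index[seq[pos:pos + q]].append((seq_id, pos))
    return index


def candidate_hits(index, query, q=3):
    """Count, per sequence, how many indexed q-gram occurrences match the query's q-grams.

    Sequences with high counts are candidates for a full alignment step,
    which this sketch does not perform.
    """
    counts = defaultdict(int)
    for pos in range(len(query) - q + 1):
        for seq_id, _ in index.get(query[pos:pos + q], ()):
            counts[seq_id] += 1
    return dict(counts)
```

The index is built once and reused across queries, which is where an index-based search gains over scanning every sequence for each query.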

Deep Blue Documents at the University of Michigan

Object-based and image-based object representations

Author: Abel D. J.
Aggarwal C.
Ang C. H.
Aref W. G.
Arge L.
Baumgart B. G.
Becker B.
Beckmann N.
Bell S. B. M.
Berchtold S.
Brabec F.
Brinkhoff T.
Brodsky A.
Burt P. J.
Burton F. W.
Chakrabarti K.
Chen L.
Choubey R.
de Berg M.
DeWitt D. J.
Dittrich J.-P.
Dori D.
Douglas D. H.
Dyer C. R.
Esperança C.
Faloutsos C.
Finkel R. A.
Foley J. D.
Franklin W. R.
Freeston M.
Gaede V.
García Y. J.
Gottschalk S.
Greene D.
Guttman A.
Günther O.
Hanan Samet
Hellerstein J. M.
Henrich A.
Hilbert D.
Hoel E. G.
Ichikawa T.
Jagadish H. V.
Joy K. I.
Kamel I.
Katayama N.
Klinger A.
Knowlton K.
Koudas N.
Kriegel H.-P.
Leutenegger S. T.
Liu X.
Lo M.-L.
Meagher D.
Miller R.
Moitra A.
O'Rourke J.
Orenstein J. A.
Ottmann T.
Patel J. M.
Peano G.
Preparata F. P.
Robinson J. T.
Rosenfeld A.
Ross K. A.
Roussopoulos N.
Saalfeld A.
Sakurai Y.
Schiwietz M.
Schrack G.
Sellis T.
Shamos M. I.
Shekhar S.
Sloan
Solntseff N.
Srihari S. N.
Stonebraker M.
Tanimoto S. L.
Theodoridis Y.
Tropf H.
van den Bercken J.
van Oosterom P.
Wang W.
White D. A.
White M.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Benchmarking Spatial Join Operations with Spatial Output

Author: Erik G. Hoel
Hanan Samet

The spatial join operation is benchmarked using variants of well-known spatial data structures such as the R-tree, R*-tree, R+-tree, and the PMR quadtree. The focus is on a spatial join with spatial output because the result of the spatial join frequently serves as input to subsequent spatial operations (i.e., a cascaded spatial join as would be common in a spatial spreadsheet). Thus, in addition to the time required to perform the spatial join itself (whose output is not always required to be spatial), the time to build the spatial data structure also plays an important role in the benchmark. The studied quantities are the time to build the data structure and the time to do the spatial join in an application domain consisting of planar line segment data. Experiments reveal that spatial data structures based on a disjoint decomposition of space and bounding boxes (i.e., the R+-tree and the PMR quadtree with bounding boxes) outperform the other structures that are based upon a non-disjoint decomposition (i.e., the R-tree and R*-tree). As the size of the output of the spatial join increases with respect to the larger of the two inputs, the advantage of the bounding boxes used in methods based on a disjoint non-regular decomposition is no longer a factor an

CiteSeerX