
120 research outputs found

The Buddy Effect: An efficient and robust access method for spatial data base systems

Author: Kriegel Hans-Peter
Seeger B.
Publication venue
Publication date: 01/01/1990
Field of study

Advance of the Access Methods

Author: Ivanova Krassimira
Karastanev Stefan
Markov Krassimir
Mitov Ilia
Publication venue: Institute of Information Theories and Applications FOI ITHEA
Publication date: 01/01/2008
Field of study

The goal of this paper is to outline the advances in access methods over the last ten years, as well as to review all the methods available in the accessible bibliography.

Bulgarian Digital Mathematics Library at IMI-BAS


Heuristics and multi-dimensional physical database design

Author: Fu Z.
Publication venue
Publication date
Field of study

An expert system approach has recently been used in parameter selection for VSAM (Virtual Storage Access Method) file organisation [AL87a]. This system was developed to help in-house users apply relevant facts and heuristics to optimise VSAM file design. Multi-dimensional physical database design is more sophisticated and complicated than VSAM file design. The expert system approach can be applied to select and tune physical database designs for various applications. A great deal of work has been done in developing diverse algorithms or access methods to organise automated information on secondary storage devices [FA86b] [FR86] [FR88] [GU84] [HU88a] [KS88a] [KS86] [L087] [NI84] [OR88b] [OR86] [OT85] [R081], etc. However, little work has been done to enable designers to select an access method that matches a projected application profile (features and requirements) against the perceived strengths and weaknesses of candidate algorithms. This thesis considers a number of grid-based algorithms and makes expert assessments of each according to its strengths and weaknesses. It analyses the features of various access methods and, using expert knowledge, matches the features of a range of m-d (multi-dimensional) algorithms with the corresponding characteristics of an application. The knowledge-based system presented in this thesis can be applied either manually or in computerised form to give a systematic approach to m-d algorithm selection. A system is proposed to (1) heuristically select an initial algorithm; (2) describe how the selection process is evaluated against actual m-d algorithm performance; and (3) show how the results of the evaluation can be used to refine the expert knowledge embodied in the selection system. Heuristic assessments are given for several m-d access algorithms. Examples are presented to show how these heuristics are used to select an m-d access algorithm for a specific application.
It is reasonable to suppose that the initial heuristic assessments are not entirely accurate. A tuning mechanism for the system heuristics is given in section 4.9. The system selection process is thereby able to adjust to real-world results. Finally, we present a simple example to illustrate how the proposed system works.

City Research Online

Multidimensional access methods

Author: ABEL D. J.
ANG C.
AREF W. G.
BAYER R.
BECKER B.
BECKMANN N.
BELUSSI A.
BENTLEY J. L.
BERCHTOLD S.
BLANKEN H.
BRINKHOFF T.
BRODSKY A.
BURKHARD W.
BURKHARD W.A.
EVANGELIDIS G.
FALOUTSOS C.
FINKEL R.
FLAJOLET P.
FRANK A.
FREESTON M.
GAEDE V.
GREENE D.
GUNTHER O.
GUTING R. H.
GUTTMAN A.
HELLERSTEIN J. M.
HENRICH A.
HOEL E. G.
HUTFLESZ A.
JAGADISH H. V.
JAGADISH H.V.
KAMEL I.
KANELLAKIS P. C.
KEDEM G.
KLINGER A.
KNOTT G.
KOLOVSON C.
KORNACKER M.
KRIEGEL H.-P.
KUMAR A.
LARSON P.A.
LIN K.-I.
LITWIN W.
LOMET D. B.
LOMET D.B.
MATSUYAMA T.
MCDONELL K. J.
NELSON R.
NG R. T.
NG V.
NIEVERGELT
NIEVERGELT ICHS
OHSAWA Y.
Oliver Günther
OoI
ORENSTEIN J.
ORENSTEIN J. A.
OTOO E. J.
OUKSEL M.
PAGEL B. U.
PAPADIAS D.
PAPADOPOULOS A.
RAVISHANKAR C.
ROBINSON J.T.
ROTEM D.
ROUSSOPOULOS N.
SCHNEIDER R.
SCHOLL M.
SEEGER B.
SELLIS T.
SEVCIK K.
SEXTON P.
SHEKHAR S.
SIEMENS
SIX H.
SMITH T. R.
STONEBRAKER M.
STUCKEY P.
SUBRAMANIAN S.
TAMMINEN M.
THEODORIDIS Y.
TROPF H.
Volker Gaede
WHITE M.
WIDMAYER P.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date
Field of study

Crossref

Incremental elasticity for array databases

Author: Ang K. H.
de Witt S.
Ganesan P.
P.
Stonebraker M.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/06/2014
Field of study

Relational databases benefit significantly from elasticity, whereby they execute on a set of changing hardware resources provisioned to match their storage and processing requirements. Such flexibility is especially attractive for scientific databases because their users often have a no-overwrite storage model, in which they delete data only when their available space is exhausted. This results in a database that is regularly growing and expanding its hardware proportionally. Also, scientific databases frequently store their data as multidimensional arrays optimized for spatial querying. This brings about several novel challenges in clustered, skew-aware data placement on an elastic shared-nothing database. In this work, we design and implement elasticity for an array database. We address this challenge on two fronts: determining when to expand a database cluster and how to partition the data within it. In both steps we propose incremental approaches, affecting a minimum set of data and nodes, while maintaining high performance. We introduce an algorithm for gradually augmenting an array database's hardware using a closed-loop control system. After the cluster adds nodes, we optimize data placement for n-dimensional arrays. Many of our elastic partitioners incrementally reorganize an array, redistributing data only to new nodes. By combining these two tools, the scientific database efficiently and seamlessly manages its monotonically increasing hardware resources. Intel Corporation (Science and Technology Center for Big Data

CiteSeerX

DSpace@MIT

Crossref

Architecting Data Centers for High Efficiency and Low Latency

Author: Zhang Yunqi
Publication venue
Publication date: 01/01/2018
Field of study

Modern data centers, housing remarkably powerful computational capacity, are built at massive scale and consume a huge amount of energy. The energy consumption of data centers has mushroomed from virtually nothing to about three percent of the global electricity supply in the last decade, and will continue to grow. Unfortunately, a significant fraction of this energy consumption is wasted due to the inefficiency of current data center architectures, and one of the key reasons behind this inefficiency is the stringent response latency requirements of the user-facing services hosted in these data centers, such as web search and social networks. To deliver such low response latency, data center operators often have to overprovision resources to handle high peaks in user load and unexpected load spikes, resulting in low efficiency. This dissertation investigates data center architecture designs that reconcile high system efficiency and low response latency. To increase efficiency, we propose techniques that understand both microarchitectural-level resource sharing and system-level resource usage dynamics to enable highly efficient co-location of latency-critical services and low-priority batch workloads. We investigate resource sharing on real-system simultaneous multithreading (SMT) processors to enable SMT co-locations by precisely predicting the performance interference. We then leverage historical resource usage patterns to further optimize the task scheduling algorithm and data placement policy to improve the efficiency of workload co-locations. Moreover, we introduce methodologies to better manage the response latency by automatically attributing the source of tail latency to low-level architectural and system configurations in both offline load-testing and online production environments.
We design and develop a response latency evaluation framework at microsecond-level precision for data center applications, with which we construct statistical inference procedures to attribute the source of tail latency. Finally, we present an approach that proactively enacts carefully designed causal inference micro-experiments to diagnose the root causes of response latency anomalies, and automatically corrects them to reduce the response latency.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/144144/1/yunqi_1.pd

Deep Blue Documents at the University of Michigan

Bridging the gap between algorithmic and learned index structures

Author: Hadian Ali
Publication venue: Computing, Imperial College London
Publication date: 01/07/2022
Field of study

Index structures such as B-trees and bloom filters are the well-established petrol engines of database systems. However, these structures do not fully exploit patterns in data distribution. To address this, researchers have suggested using machine learning models as electric engines that can entirely replace index structures. Such a paradigm shift in data system design, however, opens many unsolved design challenges. More research is needed to understand the theoretical guarantees and design efficient support for insertion and deletion. In this thesis, we adopt a different position: index algorithms are good enough, and instead of going back to the drawing board to fit data systems with learned models, we should develop lightweight hybrid engines that build on the benefits of both algorithmic and learned index structures. The indexes that we suggest provide the theoretical performance guarantees and updatability of algorithmic indexes while using position prediction models to leverage the data distributions and thereby improve the performance of the index structure. We investigate the potential for minimal modifications to algorithmic indexes such that they can leverage data distribution similar to how learned indexes work. In this regard, we propose and explore the use of helping models that boost classical index performance using techniques from machine learning. Our suggested approach inherits performance guarantees from its algorithmic baseline index, but at the same time it considers the data distribution to improve performance considerably. We study single-dimensional range indexes, spatial indexes, and stream indexing, and show that the suggested approach results in range indexes that outperform the algorithmic indexes and have comparable performance to the read-only, fully learned indexes and hence can be reliably used as a default index structure in a database engine. 
In addition, we consider the updatability of the indexes and suggest solutions for updating the index, notably when the data distribution changes drastically over time (e.g., for indexing data streams). In particular, we propose a specific learning-augmented index for indexing a sliding window with timestamps in a data stream. Additionally, we highlight the limitations of learned indexes for low-latency lookup on real-world data distributions. To tackle this issue, we suggest adding an algorithmic enhancement layer to a learned model to correct the prediction error with a small memory latency. This approach enables efficient modelling of the data distribution and resolves the local biases of a learned model at the cost of roughly one memory lookup. Open Acces
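
The idea of pairing a position prediction model with an algorithmic correction step can be illustrated with a minimal sketch. This is not the index proposed in the thesis: the class name `LinearModelIndex`, the single linear model fitted between the smallest and largest key, and the bounded binary search are all assumptions chosen for illustration, and the sketch assumes a non-empty list of numeric keys.

```python
from bisect import bisect_left


class LinearModelIndex:
    """Sorted keys, a linear position model, and a bounded local search.

    Hypothetical illustration only; assumes a non-empty list of numbers.
    """

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        lo, hi = self.keys[0], self.keys[-1]
        # Assumed model: a straight line through (min key, 0) and (max key, n - 1).
        self.slope = (n - 1) / (hi - lo) if hi != lo else 0.0
        self.intercept = -self.slope * lo
        # The largest prediction error over the stored keys bounds the search window.
        self.max_err = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        # Predicted position, clamped to the valid index range.
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the index of the first occurrence of key, or None if absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_err, 0)
        hi = min(guess + self.max_err + 1, len(self.keys))
        # Algorithmic correction: binary search restricted to the error window.
        i = bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


idx = LinearModelIndex([1, 4, 9, 16, 25, 36, 49])
print(idx.lookup(16), idx.lookup(17))
```

The model supplies a starting position from the key distribution, while the bounded search preserves the correctness of an ordinary sorted-array lookup for stored keys; how well the window shrinks depends entirely on how closely the assumed model fits the data.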

Spiral - Imperial College Digital Repository

Design and Analysis of Multidimensional Data Structures

Author: Barbaro M. B.
Caballero Carretero Juan Antonio
Giusti Carlotta
González Jiménez Raúl
Ivanov M. V.
Meucci Andrea
Udías J. M.
Publication venue: Universitat Politècnica de Catalunya
Publication date: 09/12/2004
Field of study

This thesis is about the design and analysis of point multidimensional data structures: data structures that store K-dimensional keys, which we may abstract as points in [0,1]^K. These data structures have applications in many areas of computing, such as geographical information systems, robotics, image processing, the World Wide Web and data mining, among others. They are also frequently used as indexes of more complex data structures, possibly stored in external memory.
Point multidimensional data structures must have capabilities such as insertion, deletion and (exact) search of items, but in addition they must support the so-called associative queries. Examples of these queries are orthogonal range queries (which items fall inside a given hyper-rectangle?) and nearest neighbour queries (which is the closest item to some given point?).
The contributions of this thesis are two-fold. The first part contributes to the design of point multidimensional data structures: the design of randomized K-d trees, the design of randomized quad trees and the design of fingered multidimensional search trees. The second part contributes to the analysis of the performance of point multidimensional data structures: the average-case analysis of partial match queries in relaxed K-d trees and the average-case analysis of orthogonal range queries in various multidimensional data structures.
Concerning the design of randomized point multidimensional data structures, we propose randomized insertion and deletion algorithms for K-d trees and quad trees that produce random K-d trees and quad trees independently of the order in which items are inserted into them and after any sequence of interleaved insertions and deletions. The use of randomization provides expected performance guarantees, irrespective of any assumption on the data distribution, while retaining the simplicity and flexibility of standard K-d trees and quad trees. Also related to the design of point multidimensional data structures is the proposal of fingered multidimensional search trees, a new technique that enhances point multidimensional data structures to exploit locality of reference in highly correlated associative queries.
With regard to performance analysis, we start by giving a precise analysis of the cost of partial matches in randomized K-d trees. We use these results as a building block in our analysis of orthogonal range queries, together with combinatorial and geometric arguments, and we provide a tight asymptotic estimate of the cost of orthogonal range search in randomized K-d trees. We finally show that the techniques used apply easily to other data structures, so we can provide an analysis of the average cost of orthogonal range search in other data structures such as standard K-d trees, quad trees, quad tries, and K-d tries.
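
To make the orthogonal range query mentioned in the abstract concrete, here is a minimal sketch on a standard K-d tree, in which the discriminating coordinate cycles with depth. It does not implement the randomized or relaxed variants studied in the thesis; the names `Node`, `insert` and `range_query` are hypothetical and chosen for illustration.

```python
class Node:
    """One K-d tree node: a point and the coordinate it discriminates on."""

    __slots__ = ("point", "axis", "left", "right")

    def __init__(self, point, axis):
        self.point = point
        self.axis = axis
        self.left = None
        self.right = None


def insert(root, point, depth=0):
    """Insert a point at a leaf; the discriminant cycles through coordinates."""
    if root is None:
        return Node(point, depth % len(point))
    a = root.axis
    if point[a] < root.point[a]:
        root.left = insert(root.left, point, depth + 1)
    else:
        root.right = insert(root.right, point, depth + 1)
    return root


def range_query(root, lo, hi, out=None):
    """Collect all points p with lo[i] <= p[i] <= hi[i] for every coordinate i."""
    if out is None:
        out = []
    if root is None:
        return out
    p, a = root.point, root.axis
    if all(l <= x <= h for l, x, h in zip(lo, p, hi)):
        out.append(p)
    # The left subtree holds coordinates strictly below p[a].
    if lo[a] < p[a]:
        range_query(root.left, lo, hi, out)
    # The right subtree holds coordinates at or above p[a].
    if hi[a] >= p[a]:
        range_query(root.right, lo, hi, out)
    return out


root = None
for pt in [(0.5, 0.5), (0.2, 0.8), (0.7, 0.1), (0.9, 0.9), (0.3, 0.3)]:
    root = insert(root, pt)
print(sorted(range_query(root, (0.0, 0.0), (0.5, 0.5))))
```

The query visits a subtree only when the hyper-rectangle can overlap the half-space that subtree covers, which is the pruning whose average cost the thesis analyses.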

arXiv.org e-Print Archive

Tesis Doctorals en Xarxa

Ghent University Academic Bibliography

idUS. Depósito de Investigación Universidad de Sevilla