235 research outputs found

    A limit process for partial match queries in random quadtrees and 2-d trees

    We consider the problem of recovering items matching a partially specified pattern in multidimensional trees (quadtrees and $k$-d trees). We assume the traditional model where the data consist of independent and uniform points in the unit square. For this model, in a structure on $n$ points, it is known that the number of nodes $C_n(\xi)$ to visit in order to report the items matching a random query $\xi$, independent and uniformly distributed on $[0,1]$, satisfies $\mathbf{E}[C_n(\xi)]\sim\kappa n^{\beta}$, where $\kappa$ and $\beta$ are explicit constants. We develop an approach based on the analysis of the cost $C_n(s)$ of any fixed query $s\in[0,1]$, and give precise estimates for the variance and limit distribution of the cost $C_n(x)$. Our results permit us to describe a limit process for the costs $C_n(x)$ as $x$ varies in $[0,1]$; one of the consequences is that $\mathbf{E}[\max_{x\in[0,1]}C_n(x)]\sim\gamma n^{\beta}$; this settles a question of Devroye [Pers. Comm., 2000]. Comment: Published at http://dx.doi.org/10.1214/12-AAP912 in the Annals of Applied Probability (http://www.imstat.org/aap/) by the Institute of Mathematical Statistics (http://www.imstat.org). arXiv admin note: text overlap with arXiv:1107.223
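
    The cost of a fixed query described in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' method: it assumes a point quadtree built by successive insertion of uniform points in the unit square, and it counts the nodes whose region meets the vertical line at first coordinate s. The names Node, insert, partial_match_cost and average_cost are hypothetical and chosen here for illustration.

```python
import random


class Node:
    """One point of a two-dimensional point quadtree (illustrative name)."""
    __slots__ = ("x", "y", "children")

    def __init__(self, x, y):
        self.x, self.y = x, y
        # Quadrant index = 2 * (px >= x) + (py >= y): 0 = SW, 1 = NW, 2 = SE, 3 = NE.
        self.children = [None, None, None, None]


def insert(root, x, y):
    """Insert the point (x, y) and return the (possibly new) root."""
    if root is None:
        return Node(x, y)
    node = root
    while True:
        idx = 2 * (x >= node.x) + (y >= node.y)
        if node.children[idx] is None:
            node.children[idx] = Node(x, y)
            return root
        node = node.children[idx]


def partial_match_cost(node, s):
    """Number of nodes visited when searching for first coordinate equal to s."""
    if node is None:
        return 0
    east = s >= node.x  # the line x = s lies in exactly one vertical half
    cost = 1
    for north in (0, 1):  # both quadrants of that half must be explored
        cost += partial_match_cost(node.children[2 * east + north], s)
    return cost


def average_cost(n, s, trials=20, seed=0):
    """Empirical mean cost of the fixed query s over random quadtrees of size n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        root = None
        for _ in range(n):
            root = insert(root, rng.random(), rng.random())
        total += partial_match_cost(root, s)
    return total / trials
```

    Under these assumptions, comparing average_cost(n, s) for several values of n and s gives a rough empirical picture of how the cost depends on the query position; the sketch makes no claim about the constants in the abstract.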

    A limit field for orthogonal range searches in two-dimensional random point search trees

    We consider the cost of general orthogonal range queries in random quadtrees. The cost of a given query is encoded into a (random) function of four variables which characterize the coordinates of two opposite corners of the query rectangle. We prove that, when suitably shifted and rescaled, the random cost function converges uniformly in probability towards a random field that is characterized as the unique solution to a distributional fixed-point equation. We also state similar results for 2-d trees. Our results imply for instance that the worst case query satisfies the same asymptotic estimates as a typical query, and thereby resolve an old question of Chanzy, Devroye and Zamora-Cura [Acta Inf., 37:355-383, 2000]. Comment: 24 pages, 8 figures

    Partial Match Queries in Two-Dimensional Quadtrees : a Probabilistic Approach

    We analyze the mean cost of partial match queries in random two-dimensional quadtrees. The method is based on fragmentation theory. Convergence is guaranteed by a coupling argument for Markov chains, whereas the value of the limit is computed as the fixed point of an integral equation.
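
    As a hedged illustration of the kind of limit statement this abstract refers to, the mean cost of a fixed query $x\in(0,1)$ in a random two-dimensional quadtree on $n$ uniform points is known to take the shape below. The symbols $N_n(x)$ (cost of the query $x$) and $K_0$ (a positive constant, left unspecified here) are notation assumed for this sketch, and $\beta$ is the classical exponent for random partial match queries in two dimensions.

```latex
\[
  \lim_{n\to\infty} n^{-\beta}\,\mathbf{E}\bigl[N_n(x)\bigr]
    = K_0\,\bigl(x(1-x)\bigr)^{\beta/2},
  \qquad
  \beta = \frac{\sqrt{17}-3}{2} \approx 0.5616 .
\]
```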

    On the cost of fixed partial match queries in K-d trees

    The final publication is available at Springer via http://dx.doi.org/10.1007/s00453-015-0097-4. Partial match queries constitute the most basic type of associative queries in multidimensional data structures such as K-d trees or quadtrees. Given a query $q=(q_0,\ldots,q_{K-1})$ where $s$ of the coordinates are specified and $K-s$ are left unspecified ($q_i=*$), a partial match search returns the subset of data points $x=(x_0,\ldots,x_{K-1})$ in the data structure that match the given query, that is, the data points such that $x_i=q_i$ whenever $q_i\neq *$. There exists a wealth of results about the cost of partial match searches in many different multidimensional data structures, but most of these results deal with random queries. Only recently have a few papers begun to investigate the cost of partial match queries with a fixed query $q$. This paper represents a new contribution in this direction, giving a detailed asymptotic estimate of the expected cost $P_{n,q}$ for a given fixed query $q$. From previous results on the cost of partial matches with a fixed query and the ones presented here, a deeper understanding is emerging, uncovering the following functional shape for $P_{n,q}$: $P_{n,q}=\nu\cdot\bigl(\prod_{i:\,q_i\text{ is specified}} q_i(1-q_i)\bigr)^{\alpha/2}\cdot n^{\alpha}+\text{l.o.t.}$ (l.o.t. denotes lower order terms, throughout this work) in many multidimensional data structures, which differ only in the exponent $\alpha$ and the constant $\nu$, both dependent on $s$ and $K$, and, for some data structures, on the whole pattern of specified and unspecified coordinates in $q$ as well. Although it is tempting to conjecture that this functional shape is "universal", we have shown experimentally that it seems not to be true for a variant of K-d trees called squarish K-d trees. Peer Reviewed. Postprint (author's final draft)
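
    The partial match search defined in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's code: it assumes a standard K-d tree whose discriminant cycles through the coordinates level by level, and it represents an unspecified coordinate ($q_i=*$) by None. The names KDNode, kd_insert and partial_match are hypothetical.

```python
class KDNode:
    """Node of a standard K-d tree (illustrative name)."""

    def __init__(self, point, disc):
        self.point = point  # tuple of K coordinates
        self.disc = disc    # index of the coordinate used to split at this node
        self.left = None
        self.right = None


def kd_insert(root, point):
    """Insert a point; the discriminant cycles 0, 1, ..., K-1 with the depth."""
    point = tuple(point)
    k = len(point)
    if root is None:
        return KDNode(point, 0)
    node = root
    while True:
        d = node.disc
        side = "left" if point[d] < node.point[d] else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, KDNode(point, (d + 1) % k))
            return root
        node = child


def partial_match(node, query, out):
    """Append matching points to out and return the number of nodes visited.

    query is a tuple of length K; None marks an unspecified coordinate.
    """
    if node is None:
        return 0
    visited = 1
    if all(q is None or q == x for q, x in zip(query, node.point)):
        out.append(node.point)
    q = query[node.disc]
    if q is None:
        # Unspecified discriminant: both subtrees may contain matches.
        visited += partial_match(node.left, query, out)
        visited += partial_match(node.right, query, out)
    elif q < node.point[node.disc]:
        visited += partial_match(node.left, query, out)
    else:
        visited += partial_match(node.right, query, out)
    return visited
```

    The returned count corresponds to the cost measured in the abstract (nodes visited), while out collects the reported points. For example, a query such as (2, None) fixes the first coordinate and leaves the second free.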

    Fixed partial match queries in quadtrees

    Several recent papers in the literature have addressed the analysis of the cost $\mathcal{P}_{n,q}$ of partial match search for a given fixed query $q$, which has $s$ out of $K$ specified coordinates, in different multidimensional data structures. Indeed, detailed asymptotic estimates for the main term in the expected cost $P_{n,q} = \mathbf{E}\{\mathcal{P}_{n,q}\}$ in standard and relaxed K-d trees are known (for any dimension $K$ and any number $s$ of specified coordinates), as well as stronger distributional results on $\mathcal{P}_{n,q}$ for standard 2-d trees and 2-dimensional quadtrees. In this work we derive a precise asymptotic estimate for the main order term of $P_{n,q}$ in quadtrees, for any values of $K$ and $s$, 0 [...] $\infty$ exists, where $\alpha$ is the exponent of $n$ in the expected cost of a random partial match query with $s$ specified coordinates in a random K-dimensional quadtree. Peer Reviewed. Postprint (published version)

    Analysis of partial match queries in multidimensional search trees

    The cover reads "Article-based thesis". Thesis with several sections removed owing to publisher's rights.

    The main contribution of this thesis is to deepen and generalize previous work on the average-case analysis of partial match (PM) queries in several types of multidimensional search trees. In particular, our focus has been the analysis of fixed PM queries. Our results generalize previous results, which covered either the case where only one coordinate is specified in the PM query (for any dimension) or the case of 2-dimensional data structures. Using a combinatorial approach, different from the probabilistic approaches used by other researchers, we obtain asymptotic formulas for the expected cost of fixed PM queries in relaxed and standard K-d trees. We establish that, in both cases, the expected cost follows a common pattern in its relationship with the expected cost of random PM queries. Moreover, the same pattern appeared in the analysis, previously carried out by other researchers, of the expected cost of fixed partial match in 2-dimensional quad trees. Those results led us to conjecture that such a formula would hold generally for the expected cost of partial match queries in many different multidimensional trees, assuming some additional technical conditions on the family of multidimensional search trees under consideration. Indeed, we prove this to be the case also for K-dimensional quad trees. However, we disprove that conjecture for a new variant of K-d trees with local balancing that we define and that satisfies those conditions: relaxed K-dt trees. We analyze the expected cost of random PM queries and fixed PM queries in them and, while we do not find a closed-form expression for the expected cost of fixed PM queries, we prove that it cannot be of the form that we had conjectured. For random PM queries in both relaxed and standard K-dt trees, we obtain two very general results that unify several specific results that appear scattered across the literature. Finally, we also analyze random PM queries in quad-K-d trees, a generalization of both quad trees and K-d trees, and obtain a very general result that includes as particular cases previous results for relaxed K-d trees and quad trees.

    Many applications need to manage collections of multidimensional data in which each object is identified by a point in a real or abstract space; geographic information systems are a paradigmatic example. These applications often use multidimensional data structures that support associative queries (those that specify conditions on more than one coordinate) in addition to the traditional operations of insertion, update, deletion and exact search. One of the main types of associative query is the partial match query, where only some coordinates are specified and the goal is to determine which objects match them. Partial match queries are particularly important because their analysis forms the basis for the analysis of other types of associative queries, such as orthogonal range searches (which points lie inside a given (hyper)rectangular area?), region queries (for example, given a point and a distance, which points lie at that distance or less from the point?) or nearest-neighbour queries (where the k points closest to a given point must be found).

    In this thesis we analyze in depth the average performance of partial match queries in representative multidimensional search trees, which constitute a significant subclass of multidimensional data structures. Multidimensional search trees, in particular quadtrees and K-d trees, were defined in the mid-1970s as a generalization of binary search trees. Partial match queries are answered in them by a recursive traversal of some subtrees. For many years the analysis of multidimensional search trees was carried out under the important, and often implicit, assumption that new coordinates of the partial match query are generated at random in each recursive call. The reason for this simplifying assumption was that, for average costs, this analysis is equivalent to analyzing the performance of the partial match algorithm when the input is a random partial match query. At the beginning of this decade, some teams began to analyze the average case of partial match queries without this assumption: the specified coordinates of the query remain fixed throughout all recursive calls. These queries are called fixed partial match queries. The goal of this recent approach is still to analyze the performance of the partial match algorithm, but now the quantities of interest depend on the particular query q given as input. The analysis of fixed partial match queries, together with that of random ones (which plays an important role in the analysis of the former), gives a very detailed and precise description of the performance of the partial match algorithm that could be extended to other relevant associative queries. Postprint (published version)

    On a functional contraction method

    Within the last twenty years, the contraction method has turned out to be a fruitful approach to distributional convergence of sequences of random variables which obey additive recurrences. It was mainly developed for applications in the real-valued framework; however, in recent years, more complex state spaces such as Hilbert spaces have been under consideration. Based upon the family of Zolotarev metrics, which were introduced in the late seventies, we develop the method in the context of Banach spaces and work it out in detail in the cases of continuous and càdlàg functions on the unit interval. We formulate sufficient conditions, on both the sequence under consideration and its possible limit, which satisfies a stochastic fixed-point equation, that allow functional limit theorems to be deduced in applications. As a first application we present a new and short proof of the classical invariance principle due to Donsker. It is based on a recursive decomposition. Moreover, we apply the method to the analysis of the complexity of partial match queries in two-dimensional search trees such as quadtrees and 2-d trees. These important data structures have been under heavy investigation since their invention in the seventies. Our results give answers to problems that were left open in the pioneering work of Flajolet et al. in the eighties and nineties. We expect that the functional contraction method will contribute significantly to solutions for similar problems involving additive recursions in the coming years.

    Partial match queries in quad-K-d trees

    Quad-K-d trees [Bereczky et al., 2014] are a generalization of several well-known hierarchical K-dimensional data structures. They were introduced to provide a unified framework for the analysis of associative queries and to investigate the trade-offs between the cost of different operations and the memory needs (each node of a quad-K-d tree has arity $2^m$ for some $m$, $1 \le m \le K$). Indeed, we consider here partial match, one of the fundamental associative queries, for several families of quad-K-d trees including, among others, relaxed K-d trees and quadtrees. In particular, we prove that the expected cost $\hat{P}_n$ of a random partial match that has $s$ out of $K$ specified coordinates in a random quad-K-d tree of size $n$ is $\hat{P}_n \sim \beta\cdot n^{\alpha}$, where $\alpha$ and $\beta$ are constants given in terms of $K$ and $s$ as well as additional parameters that characterize the specific family of quad-K-d trees under consideration. Additionally, we derive a precise asymptotic estimate for the main order term of $P_{n,q}$, the expected cost of a fixed partial match in a random quad-K-d tree of size $n$. The techniques and procedures used to derive the mentioned costs extend those already successfully applied to derive analogous results in quadtrees and relaxed K-d trees; our results show that the previous results are just particular cases, and extend the validity of the conjecture made in [Duch et al., 2016] to a wider variety of multidimensional data structures. This work has been supported by funds from the MOTION Project (Project PID2020-112581GB-C21) of the Spanish Ministry of Science and Innovation MCIN/AEI/10.13039/501100011033. Peer Reviewed. Postprint (published version)

    On a functional contraction method

    Methods for proving functional limit laws are developed for sequences of stochastic processes which allow a recursive distributional decomposition either in time or space. Our approach is an extension of the so-called contraction method to the space $\mathcal{C}[0,1]$ of continuous functions endowed with the uniform topology and the space $\mathcal{D}[0,1]$ of càdlàg functions with the Skorokhod topology. The contraction method originated from the probabilistic analysis of algorithms and random trees where characteristics satisfy natural distributional recurrences. It is based on stochastic fixed-point equations, where probability metrics can be used to obtain contraction properties and allow the application of Banach's fixed-point theorem. We develop the use of the Zolotarev metrics on the spaces $\mathcal{C}[0,1]$ and $\mathcal{D}[0,1]$ in this context. Applications are given; in particular, a short proof of Donsker's functional limit theorem is derived and recurrences arising in the probabilistic analysis of algorithms are discussed. Comment: Published at http://dx.doi.org/10.1214/14-AOP919 in the Annals of Probability (http://www.imstat.org/aop/) by the Institute of Mathematical Statistics (http://www.imstat.org)