164 research outputs found

    On Density, Threshold and Emptiness Queries for Intervals in the Streaming Model

    In this paper, we study maximum density, threshold and emptiness queries for intervals in the streaming model. The input is a stream S of n points on the real line R and a floating closed interval W of width alpha. The specific problems we consider are as follows. - Maximum density: find a placement of W in R containing the maximum number of points of S. - Threshold query: find a placement of W in R, if one exists, that contains at least Delta elements of S. - Emptiness query: find, if possible, a placement of W within the extent of S such that the interior of W does not contain any element of S. The stream S, being huge, does not fit into main memory and can be read sequentially at most a constant number of times, usually once. The problems studied here in the geometric setting are related to frequency estimation and heavy hitter identification in a stream of data. We provide lower bounds and results on the trade-off between extra space and quality of solution. We also discuss generalizations to higher-dimensional variants for a few cases.
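
    To make the three query definitions concrete, here is a minimal offline sketch in Python. It is not the paper's streaming method: it assumes all of S fits in memory and can be sorted, which is exactly what the streaming model rules out. The function names (`max_density`, `threshold_query`, `emptiness_query`) are hypothetical and chosen for illustration only.

```python
from typing import List, Optional, Tuple


def max_density(points: List[float], alpha: float) -> Tuple[int, Optional[float]]:
    """Offline baseline: return (count, left) such that the closed interval
    [left, left + alpha] contains the maximum number of points."""
    pts = sorted(points)
    best_count, best_left = 0, None
    lo = 0
    for hi, p in enumerate(pts):
        # Advance the left pointer until pts[lo]..p fits in width alpha.
        while p - pts[lo] > alpha:
            lo += 1
        if hi - lo + 1 > best_count:
            best_count, best_left = hi - lo + 1, pts[lo]
    return best_count, best_left


def threshold_query(points: List[float], alpha: float, delta: int) -> Optional[float]:
    """Return a left endpoint whose interval holds at least delta points, else None."""
    count, left = max_density(points, alpha)
    return left if count >= delta else None


def emptiness_query(points: List[float], alpha: float) -> Optional[float]:
    """Return a left endpoint, within the extent of the points, of an interval
    of width alpha whose open interior contains no point, else None."""
    pts = sorted(points)
    for a, b in zip(pts, pts[1:]):
        # A gap of at least alpha between neighbours leaves room for W.
        if b - a >= alpha:
            return a
    return None


S = [1.0, 2.0, 2.5, 3.0, 7.0, 10.0]
print(max_density(S, 2.0))         # (4, 1.0)
print(threshold_query(S, 2.0, 5))  # None
print(emptiness_query(S, 2.0))     # 3.0
```

    Sorting costs O(n log n) time and O(n) space. The abstract's interest lies in what can be achieved when only limited extra space and a constant number of passes are available.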

    LIPIcs, Volume 274, ESA 2023, Complete Volume


    Doctor of Philosophy in Computing

    In the last two decades, an increasingly large amount of data has become available. Massive collections of videos, astronomical observations, social networking posts, network routing information, mobile location history and so forth are examples of real-world data requiring processing for applications ranging from classification to prediction. Computational resources grow at a far more constrained rate, hence the need for efficient algorithms that scale well. Over the past twenty years, high-quality theoretical algorithms have been developed for two central problems: nearest neighbor search and dimensionality reduction over Euclidean distances in worst-case distributions. These two tasks are interesting in their own right. Nearest neighbor search corresponds to a database query lookup, while dimensionality reduction is a form of compression on massive data. Moreover, these are also subroutines in algorithms ranging from clustering to classification. However, many highly relevant settings and distance measures have not received attention comparable to that given to worst-case point sets in Euclidean space. The Bregman divergences include the information-theoretic distances, such as entropy, that are most relevant in many machine learning applications, and yet prior to this dissertation they lacked efficient dimensionality reductions, nearest neighbor algorithms, or even lower bounds on what could be possible. Furthermore, even in the Euclidean setting, theoretical algorithms do not exploit the fact that almost all real-world datasets have significant low-dimensional substructure. In this dissertation, we explore different models and techniques for similarity search and dimensionality reduction. What upper bounds can be obtained for nearest neighbors for Bregman divergences? What upper bounds can be achieved for dimensionality reduction for information-theoretic measures? Are these problems intrinsically of harder computational complexity than in the Euclidean setting? Can we improve the state-of-the-art nearest neighbor algorithms for real-world datasets in Euclidean space? These are the questions we investigate in this dissertation, and on which we shed some new light. In the first part of the dissertation, we focus on Bregman divergences. We exhibit nearest neighbor algorithms, contingent on a distributional constraint on the datasets. We next show lower bounds suggesting that this constraint is in some sense inherent to the problem complexity. After this, we explore dimensionality reduction techniques for the Jensen-Shannon and Hellinger distances, two popular information-theoretic measures. In the second part, we show that even for the more well-studied Euclidean case, worst-case nearest neighbor algorithms can be improved upon sharply for real-world datasets with spectral structure.

    LIPIcs, Volume 261, ICALP 2023, Complete Volume


    LIPIcs, Volume 244, ESA 2022, Complete Volume


    35th Symposium on Theoretical Aspects of Computer Science: STACS 2018, February 28-March 3, 2018, Caen, France



    During the past four decades, miniaturization has made computing devices ubiquitous and pervasive. Today, the number of objects connected to the Internet is increasing at a rapid pace, and this trend does not seem to be slowing down. These objects, which can be smartphones, vehicles, or any kind of sensor, generate large amounts of data that are almost always associated with a spatio-temporal context. The amount of this data is often so large that its processing requires the creation of a distributed system, which involves the cooperation of several computers. The ability to process these data is important for society. For example, the data collected during car journeys already makes it possible to avoid traffic jams or to organize carpooling. Another example: in the near future, maintenance interventions on the road network will be planned using data collected by gyroscopes that detect potholes. The application domains are therefore numerous, as are the problems associated with them. The articles that make up this thesis deal with systems that share two key characteristics: a spatio-temporal context and a decentralized architecture. In addition, the systems described in these articles revolve around three temporal perspectives: the present, the past, and the future. Systems associated with the present perspective enable a very large number of connected objects to communicate in near real time, according to a spatial context. Our contributions in this area enable this type of decentralized system to be scaled out on commodity hardware, i.e., to adapt as the volume of data arriving in the system increases. Systems associated with the past perspective, often referred to as trajectory indexes, are intended to provide access to the large volume of spatio-temporal data collected by connected objects. Our contributions in this area make it possible to handle particularly dense trajectory datasets, a problem that had not been addressed previously. Finally, systems associated with the future perspective rely on past trajectories to predict the trajectories that the connected objects will follow. Our contributions predict the trajectories followed by connected objects with a previously unmet granularity. Although they involve different domains, these contributions are structured around the common denominators of the underlying systems, which opens the possibility of dealing with these problems more generically in the near future.

    36th International Symposium on Theoretical Aspects of Computer Science: STACS 2019, March 13-16, 2019, Berlin, Germany

