
    Deterministic Sampling and Range Counting in Geometric Data Streams

    We present memory-efficient deterministic algorithms for constructing epsilon-nets and epsilon-approximations of streams of geometric data. Unlike probabilistic approaches, these deterministic samples provide guaranteed bounds on their approximation factors. We show how our deterministic samples can be used to answer approximate online iceberg geometric queries on data streams. We use these techniques to approximate several robust statistics of geometric data streams, including Tukey depth, simplicial depth, regression depth, the Theil-Sen estimator, and the least median of squares. Our algorithms use only a polylogarithmic amount of memory, provided the desired approximation factors are inverse-polylogarithmic. We also include a lower bound for non-iceberg geometric queries.
    Comment: 12 pages, 1 figure
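
    A minimal sketch of the guarantee at play, in Python: an epsilon-approximation is a subset whose relative weight inside every query range is within epsilon of the full set's. The helper below (is_eps_approximation is a name invented here, not the authors' code) only verifies that property over a given family of rectangles, using a random sample for illustration; the paper's contribution is constructing such samples deterministically over a stream.

    ```python
    import random

    def is_eps_approximation(points, sample, eps, ranges):
        """Check the epsilon-approximation guarantee over a family of
        axis-aligned query rectangles: for every range R, the fraction of
        sample points in R is within eps of the fraction of all points in R."""
        def frac_inside(pts, r):
            x1, y1, x2, y2 = r
            return sum(x1 <= x <= x2 and y1 <= y <= y2 for x, y in pts) / len(pts)
        return all(abs(frac_inside(points, r) - frac_inside(sample, r)) <= eps
                   for r in ranges)

    # A uniform random sample satisfies this with high probability; the paper
    # achieves the same guarantee deterministically in a streaming setting.
    pts = [(random.random(), random.random()) for _ in range(10_000)]
    smp = random.sample(pts, 2_000)
    queries = [(0.1, 0.1, 0.6, 0.8), (0.0, 0.0, 0.5, 0.5), (0.3, 0.2, 0.9, 0.4)]
    print(is_eps_approximation(pts, smp, eps=0.05, ranges=queries))
    ```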

    Sweep-Line Extensions to the Multiple Object Intersection Problem: Methods and Applications in Graph Mining

    Identifying and quantifying the size of multiple overlapping axis-aligned geometric objects is an essential computational geometry problem. The ability to solve this problem can effectively inform a number of spatial data mining methods and can provide support in decision making for a variety of critical applications. The state-of-the-art approach to such problems is an algorithmic paradigm collectively known as the sweep-line or plane-sweep algorithm. However, it inherits a number of limitations, including a lack of versatility and a lack of support for ad hoc intersection queries. With these limitations in mind, we design and implement a novel, exact, fast and scalable yet versatile sweep-line based algorithm, named SLIG. The key idea of our algorithm lies in constructing an auxiliary data structure, an intersection graph, while the sweep line is applied. This graph can effectively be used to provide connectivity properties among overlapping objects and to inform answers to ad hoc intersection queries. It can also be employed to find the location and size of the common area of multiple overlapping objects. SLIG performs significantly faster than classic sweep-line based algorithms, is more versatile, and provides a suite of powerful querying capabilities. To demonstrate the versatility of our SLIG algorithm we show how it can be utilized for evaluating the importance of nodes in a trajectory network - a type of dynamic network where the nodes are moving objects (cars, pedestrians, etc.) and the edges represent interactions (contacts) between objects as defined by a proximity threshold. The key observation is that the time intervals of these interactions can be represented as 1-dimensional axis-aligned geometric objects. A variant of our SLIG algorithm, named SLOT, then efficiently computes the metrics of interest, including node degree, triangle membership and connected components for each node, over time.
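
    The core observation translates into a compact sweep. The following Python sketch (function names are invented here, not the authors' code) sorts 1D interval endpoints and connects each newly opened interval to all currently active ones, yielding the kind of intersection graph SLIG maintains and from which SLOT-style trajectory metrics can be read off:

    ```python
    from collections import defaultdict

    def interval_intersection_graph(intervals):
        """Sweep over 1D interval endpoints and build an intersection graph:
        nodes are interval ids, edges connect pairs of overlapping intervals."""
        events = []  # (coordinate, is_end, id); starts sort before ends at ties
        for i, (lo, hi) in enumerate(intervals):
            events.append((lo, 0, i))
            events.append((hi, 1, i))
        events.sort()
        graph, active = defaultdict(set), set()
        for _, is_end, i in events:
            if is_end:
                active.discard(i)
            else:
                for j in active:       # the new interval overlaps every active one
                    graph[i].add(j)
                    graph[j].add(i)
                active.add(i)
        return graph

    # Contacts in a trajectory network as 1D time intervals (SLOT-style use):
    contacts = [(0, 5), (3, 9), (8, 12), (20, 25)]
    g = interval_intersection_graph(contacts)
    print({i: sorted(nbrs) for i, nbrs in g.items()})  # degrees, components follow
    ```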

    Hidden surface removal for rectangles

    A simple but important special case of the hidden surface removal problem is one in which the scene consists of n rectangles with sides parallel to the x- and y-axes, with the viewpoint at z=∞ (that is, an orthographic projection). This special case has application to overlapping windows in computer displays. An algorithm with running time O(n log n + k log n) is given for static scenes, where k is the number of line segments in the output. Algorithms are given for a dynamic setting (that is, rectangles may be inserted and deleted) that take time O(log^2 n log log n + k log^2 n) per insert or delete, where k is now the number of visible line segments that change (appear or disappear). Algorithms for point location in the visible scene are also given.
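
    As a point of reference for the last capability mentioned, point location in the visible scene: with the viewpoint at z=∞, the rectangle visible at a point is simply the containing rectangle nearest the viewer (largest z). The O(n)-per-query brute-force Python sketch below (names invented here) is the naive baseline that the paper's logarithmic-time structures improve upon:

    ```python
    def visible_rectangle_at(point, rects):
        """Brute-force point location in the visible scene: rects are
        (x1, y1, x2, y2, z) axis-aligned rectangles viewed from z = infinity,
        so the visible one at a point is the containing rectangle with largest z."""
        px, py = point
        best = None
        for idx, (x1, y1, x2, y2, z) in enumerate(rects):
            if x1 <= px <= x2 and y1 <= py <= y2:
                if best is None or z > rects[best][4]:
                    best = idx
        return best  # None if the background is visible

    windows = [(0, 0, 4, 4, 1.0), (2, 2, 6, 6, 2.0)]   # overlapping screen windows
    print(visible_rectangle_at((3, 3), windows))        # -> 1 (the nearer window)
    ```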

    A Heterogeneous High Performance Computing Framework For Ill-Structured Spatial Join Processing

    The frequently employed spatial join processing over two large layers of polygonal datasets to detect cross-layer polygon pairs (CPPs) satisfying a join predicate faces challenges common to ill-structured sparse problems, namely, that of identifying the few intersecting cross-layer edges out of the quadratic universe. The algorithmic engineering challenge is compounded by the GPGPU SIMT architecture. Spatial join involves a lightweight filter phase, typically using an overlap test over minimum bounding rectangles (MBRs) to discard the majority of CPPs, followed by a refinement phase to rigorously test the join predicate over the edges of the surviving CPPs. In this dissertation, we develop new techniques - algorithms, data structures, I/O, load balancing and system implementation - to accelerate the two-phase spatial join processing. We present a new filtering technique, called the Common MBR Filter (CMF), which changes the overall characteristic of spatial join algorithms so that the refinement phase is no longer the computational bottleneck. CMF is designed based on the insight that intersecting cross-layer edges must lie within the rectangular intersection of the MBRs of a CPP, their common MBR (CMBR). We also address a key limitation of CMF for a class of spatial datasets with either large or dense active CMBRs through an extended CMF, called CMF-grid, which effectively employs both CMBR and grid techniques by embedding a uniform grid over the CMBR of each CPP, of suitably engineered sizes for different CPPs. To show the efficiency of CMF-based filters, extensive mathematical and experimental analysis is provided. Then, two GPU-based spatial join systems are proposed based on the two CMF versions, comprising four components: 1) a sort-based MBR filter, 2) CMF/CMF-grid, 3) a point-in-polygon test, and 4) an edge-intersection test. The systems show a two-orders-of-magnitude speedup over the optimized sequential GEOS C++ library. Furthermore, we present a distributed system of heterogeneous compute nodes to exploit GPU-CPU computing in order to scale up the computation. A load balancing model based on Integer Linear Programming (ILP) is formulated for this system. We also provide three heuristic algorithms to approximate the ILP. Finally, we develop the MPI-cuda-GIS system based on this heterogeneous computing model by integrating our CUDA-based GPU system into a newly designed distributed framework based on the Message Passing Interface (MPI). Experimental results show good scalability and performance of the MPI-cuda-GIS system.
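
    A sequential Python sketch of the CMF idea (names invented here; the dissertation's GPU implementation is far more elaborate): compute the CMBR of a surviving CPP and keep only the edges whose own bounding boxes meet it, shrinking the edge set handed to the refinement phase:

    ```python
    def mbr_overlap(a, b):
        """Filter-phase test: do two MBRs (x1, y1, x2, y2) overlap?"""
        return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

    def common_mbr(a, b):
        """The CMBR: rectangular intersection of two overlapping MBRs."""
        return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))

    def segment_mbr_overlaps(seg, r):
        """Conservative test: does the segment's own bounding box meet r?"""
        (x1, y1), (x2, y2) = seg
        return mbr_overlap((min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)), r)

    def cmf_candidate_edges(poly_a, poly_b):
        """CMF insight: only edges whose bounding boxes meet the CMBR can take
        part in a cross-layer intersection; discard the rest before refinement."""
        def mbr(poly):
            xs, ys = zip(*poly)
            return (min(xs), min(ys), max(xs), max(ys))
        def edges(poly):
            return [(poly[i], poly[(i + 1) % len(poly)]) for i in range(len(poly))]
        ma, mb = mbr(poly_a), mbr(poly_b)
        if not mbr_overlap(ma, mb):
            return []                       # pair eliminated by the MBR filter
        cmbr = common_mbr(ma, mb)
        ea_list = [e for e in edges(poly_a) if segment_mbr_overlaps(e, cmbr)]
        eb_list = [e for e in edges(poly_b) if segment_mbr_overlaps(e, cmbr)]
        return [(ea, eb) for ea in ea_list for eb in eb_list]

    tri = [(0, 0), (4, 0), (2, 3)]
    sq = [(1, 1), (5, 1), (5, 5), (1, 5)]
    print(len(cmf_candidate_edges(tri, sq)), "candidate edge pairs for refinement")
    ```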

    Efficient Analysis in Multimedia Databases

    The rapid progress of digital technology has led to a situation where computers have become ubiquitous tools. Now we can find them in almost every environment, be it industrial or even private. With ever-increasing performance, computers have assumed more and more vital tasks in engineering, climate and environmental research, medicine and the content industry. Previously, these tasks could only be accomplished by spending enormous amounts of time and money. Through digital sensor devices, like earth observation satellites, genome sequencers or video cameras, the amount and complexity of data with a spatial or temporal relation has grown enormously. This has led to new challenges for data analysis and requires the use of modern multimedia databases. This thesis aims at developing efficient techniques for the analysis of complex multimedia objects such as CAD data, time series and videos. It is assumed that the data is modeled by commonly used representations. For example, CAD data is represented as a set of voxels, and audio and video data as multi-represented, multi-dimensional time series. The main part of this thesis focuses on finding efficient methods for collision queries of complex spatial objects. One way to speed up those queries is to employ a cost-based decomposition, which uses interval groups to approximate a spatial object. For example, this technique can be used in the Digital Mock-Up (DMU) process, which helps engineers to ensure short product cycles. This thesis also defines and discusses a new similarity measure for time series called threshold similarity: two time series are considered similar if they expose similar behavior regarding the transgression of a given threshold value. Another part of the thesis is concerned with the efficient calculation of reverse k-nearest neighbor (RkNN) queries in general metric spaces using conservative and progressive approximations. The aim of such RkNN queries is to determine the impact of single objects on the whole database. Finally, the thesis deals with video retrieval and hierarchical genre classification of music using multiple representations. The practical relevance of the discussed genre classification approach is highlighted with a prototype tool that helps the user to organize large music collections. Both the efficiency and the effectiveness of the presented techniques are thoroughly analyzed, and the benefits over traditional approaches are shown by evaluating the new methods on real-world test datasets.
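
    To make the threshold-similarity notion concrete, here is a hedged Python sketch (the thesis's exact measure and its efficient query algorithms differ; Jaccard overlap is used here purely for illustration): two series are compared by the time intervals during which they exceed a threshold tau.

    ```python
    def crossing_intervals(series, tau):
        """Maximal intervals [start, end) where the series is above tau."""
        intervals, start = [], None
        for i, v in enumerate(series):
            if v > tau and start is None:
                start = i
            elif v <= tau and start is not None:
                intervals.append((start, i)); start = None
        if start is not None:
            intervals.append((start, len(series)))
        return intervals

    def threshold_similarity(s1, s2, tau):
        """Jaccard overlap of the time spent above tau: 1.0 means both series
        transgress the threshold over exactly the same time intervals."""
        above1 = {i for a, b in crossing_intervals(s1, tau) for i in range(a, b)}
        above2 = {i for a, b in crossing_intervals(s2, tau) for i in range(a, b)}
        union = above1 | above2
        return len(above1 & above2) / len(union) if union else 1.0

    print(threshold_similarity([0, 3, 5, 2, 0], [0, 2, 6, 4, 0], tau=2.5))
    ```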

    On the expected size of the 2d visibility complex

    We study the expected size of the 2D visibility complex of randomly distributed objects in the plane. We prove that the asymptotic expected number of free bitangents (which correspond to 0-faces of the visibility complex) among unit discs (or polygons of bounded aspect ratio and similar size) is linear and exhibit bounds in terms of the density of the objects. We also make an experimental assessment of the size of the visibility complex for disjoint random unit discs. We provide experimental estimates of the onset of the linear behavior and of the asymptotic slope and y-intercept of the number of free bitangents in terms of the density of discs. Finally, we analyze the quality of our estimates in terms of the density of discs.
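
    The experimental assessment can be reproduced in spirit with a brute-force Monte Carlo sketch in Python (invented names, O(n^3) time, equal-radius discs only): generate disjoint random unit discs, enumerate the up-to-four bitangent segments of each pair, and count those blocked by no third disc; at a fixed density the count should grow roughly linearly with the number of discs.

    ```python
    import math, random

    def dist_point_segment(p, a, b):
        ax, ay = a; bx, by = b; px, py = p
        dx, dy = bx - ax, by - ay
        t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
        t = max(0.0, min(1.0, t))
        return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

    def bitangent_segments(c1, c2, r=1.0):
        """Tangent segments (two outer, plus two inner when the discs are
        disjoint) between two discs of equal radius r."""
        d = math.hypot(c2[0] - c1[0], c2[1] - c1[1])
        e = ((c2[0] - c1[0]) / d, (c2[1] - c1[1]) / d)   # unit center direction
        perp = (-e[1], e[0])
        segs = []
        for s in (1.0, -1.0):                            # outer bitangents
            n = (s * perp[0], s * perp[1])
            segs.append(((c1[0] + r * n[0], c1[1] + r * n[1]),
                         (c2[0] + r * n[0], c2[1] + r * n[1])))
        if d > 2 * r:                                    # inner bitangents
            cphi = 2 * r / d
            sphi = math.sqrt(1 - cphi * cphi)
            for s in (1.0, -1.0):
                n = (cphi * e[0] + s * sphi * perp[0], cphi * e[1] + s * sphi * perp[1])
                segs.append(((c1[0] + r * n[0], c1[1] + r * n[1]),
                             (c2[0] - r * n[0], c2[1] - r * n[1])))
        return segs

    def count_free_bitangents(centers, r=1.0):
        """Brute-force count of bitangents not blocked by any third disc."""
        free = 0
        for i in range(len(centers)):
            for j in range(i + 1, len(centers)):
                for seg in bitangent_segments(centers[i], centers[j], r):
                    if all(dist_point_segment(centers[k], *seg) >= r - 1e-9
                           for k in range(len(centers)) if k not in (i, j)):
                        free += 1
        return free

    # Disjoint random unit discs in a square, rejection-sampled:
    random.seed(1)
    centers = []
    while len(centers) < 60:
        p = (random.uniform(0, 40), random.uniform(0, 40))
        if all(math.hypot(p[0] - q[0], p[1] - q[1]) > 2.0 for q in centers):
            centers.append(p)
    print(count_free_bitangents(centers))  # grows ~linearly with the disc count
    ```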

    Efficient bulk-loading methods for temporal and multidimensional index structures

    Nearly all scientific disciplines benefit from the latest methods for analyzing and processing large volumes of data. These methods require the efficient processing of spatial and temporal data, since time and position are important attributes of many data. Efficient query processing is enabled in particular by the use of index structures. This work focuses on two index structures: the Multiversion B-tree (MVBT) and the R-tree. The first structure is used for managing temporal data, the second for indexing multidimensional rectangle data. The constantly and rapidly growing volume of data poses a major challenge for computer science. Building and updating indexes with conventional methods (record by record) is no longer efficient. To enable timely and cost-efficient data processing, methods for the fast bulk loading of index structures are urgently needed. In the first part of this work, we address the question of whether a loading method for the MVBT exists that has the same I/O complexity as external sorting. Until now, this question remained unanswered. In this work we develop a new construction method and show that it has the same I/O complexity as external sorting, employing two algorithmic techniques: weight balancing and buffer trees. Our experiments show that the result is not only of theoretical significance. In the second part of the work, we investigate whether and how statistical information about spatial queries can be exploited to improve the query performance of R-trees. Our new method uses information such as the aspect ratio and side lengths of a representative query rectangle to build a good R-tree with respect to a commonly used cost model. If this information is not available, we optimize R-trees with respect to the sum of the volumes of the minimum bounding rectangles of the leaf nodes. Since building R-trees that are optimal with respect to this cost measure is NP-hard, we first reduce the problem to a one-dimensional partitioning problem by sorting the data along optimized space-filling curves, and then solve this problem using dynamic programming. The I/O complexity of the method equals that of external sorting, since its I/O running time is dominated by the sorting step. In the last part of the work, we apply the developed partitioning methods to the construction of spatial histograms, since these, like R-trees, induce a disjoint partitioning of space. Results of extensive experiments show that, using the new partitioning techniques, both R-trees with better query performance and spatial histograms with better estimation quality can be generated compared to competing methods.
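
    A simplified Python sketch of the second part's pipeline (assumptions: Z-order instead of the thesis's optimized space-filling curves, areas instead of volumes, nonnegative coordinates, and an O(nB) dynamic program; names are invented here): sort rectangles along the curve, then cut the sorted sequence into leaves of at most B entries minimizing the total area of leaf MBRs.

    ```python
    def z_order(x, y, bits=16):
        """Interleave the bits of quantized x and y: Morton / Z-order key."""
        key = 0
        for b in range(bits):
            key |= ((x >> b) & 1) << (2 * b) | ((y >> b) & 1) << (2 * b + 1)
        return key

    def pack_leaves(rects, B):
        """Sort rectangles (x1, y1, x2, y2) by Z-order of their centers, then
        cut the 1D sequence into leaves of at most B entries via dynamic
        programming, minimizing the total area of the leaf MBRs."""
        def center_key(r):
            cx, cy = (r[0] + r[2]) / 2, (r[1] + r[3]) / 2
            return z_order(int(cx * 1000), int(cy * 1000))
        rects = sorted(rects, key=center_key)
        n = len(rects)
        INF = float("inf")
        cost = [INF] * (n + 1)   # cost[i]: best total MBR area for first i rects
        cut = [0] * (n + 1)
        cost[0] = 0.0
        for i in range(1, n + 1):
            x1 = y1 = INF; x2 = y2 = -INF
            for j in range(i - 1, max(-1, i - 1 - B), -1):   # leaf = rects[j:i]
                x1, y1 = min(x1, rects[j][0]), min(y1, rects[j][1])
                x2, y2 = max(x2, rects[j][2]), max(y2, rects[j][3])
                c = cost[j] + (x2 - x1) * (y2 - y1)
                if c < cost[i]:
                    cost[i], cut[i] = c, j
        leaves, i = [], n
        while i > 0:
            leaves.append(rects[cut[i]:i]); i = cut[i]
        return leaves[::-1]

    boxes = [(i % 10, i // 10, i % 10 + 0.5, i // 10 + 0.5) for i in range(30)]
    print([len(leaf) for leaf in pack_leaves(boxes, B=8)])
    ```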

    I/O-Efficient Planar Range Skyline and Attrition Priority Queues

    In the planar range skyline reporting problem, we store a set P of n 2D points in a structure such that, given a query rectangle Q = [a_1, a_2] x [b_1, b_2], the maxima (a.k.a. skyline) of P \cap Q can be reported efficiently. The query is 3-sided if an edge of Q is grounded, giving rise to two variants: top-open (b_2 = \infty) and left-open (a_1 = -\infty) queries. All our results are in external memory under the O(n/B) space budget, for both the static and dynamic settings:
    * For static P, we give structures that answer top-open queries in O(log_B n + k/B), O(loglog_B U + k/B), and O(1 + k/B) I/Os when the universe is R^2, a U x U grid, and a rank space grid [O(n)]^2, respectively (where k is the number of reported points). The query complexity is optimal in all cases.
    * We show that the left-open case is harder: any linear-size structure must incur \Omega((n/B)^e + k/B) I/Os per query. In fact, this case is as difficult as general 4-sided queries, for which we give a static structure with the optimal query cost O((n/B)^e + k/B).
    * We give a dynamic structure that supports top-open queries in O(log_{2B^e}(n/B) + k/B^{1-e}) I/Os, and updates in O(log_{2B^e}(n/B)) I/Os, for any e satisfying 0 \le e \le 1. This leads to a dynamic structure for 4-sided queries with optimal query cost O((n/B)^e + k/B), and amortized update cost O(log(n/B)).
    As a contribution of independent interest, we propose an I/O-efficient version of the fundamental structure known as the priority queue with attrition (PQA). Our PQA supports FindMin, DeleteMin, and InsertAndAttrite all in O(1) worst-case I/Os, and O(1/B) amortized I/Os per operation. We also add a new CatenateAndAttrite operation that catenates two PQAs in O(1) worst-case and O(1/B) amortized I/Os. This operation is a non-trivial extension to the classic PQA of Sundar, even in internal memory.
    Comment: Appeared at PODS 2013, New York, 19 pages, 10 figures. arXiv admin note: text overlap with arXiv:1208.4511, arXiv:1207.234
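
    For intuition, a simplified internal-memory PQA can be written as a monotone deque in a few lines of Python (a sketch of one common formulation, with amortized rather than Sundar's worst-case bounds, and without the paper's I/O efficiency or CatenateAndAttrite):

    ```python
    from collections import deque

    class PriorityQueueWithAttrition:
        """Keys are kept strictly increasing, so the minimum sits at the front.
        InsertAndAttrite(x) deletes every stored key >= x before appending x.
        All operations here are amortized O(1); the full structure achieves
        worst-case bounds."""
        def __init__(self):
            self._q = deque()
        def find_min(self):
            return self._q[0] if self._q else None
        def delete_min(self):
            return self._q.popleft() if self._q else None
        def insert_and_attrite(self, x):
            while self._q and self._q[-1] >= x:   # attrition of dominated keys
                self._q.pop()
            self._q.append(x)

    pqa = PriorityQueueWithAttrition()
    for x in [5, 3, 8, 7, 2]:
        pqa.insert_and_attrite(x)
    print(pqa.find_min())   # -> 2: inserting 2 attrited every larger key
    ```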