1,463 research outputs found
Deterministic Sampling and Range Counting in Geometric Data Streams
We present memory-efficient deterministic algorithms for constructing
epsilon-nets and epsilon-approximations of streams of geometric data. Unlike
probabilistic approaches, these deterministic samples provide guaranteed bounds
on their approximation factors. We show how our deterministic samples can be
used to answer approximate online iceberg geometric queries on data streams. We
use these techniques to approximate several robust statistics of geometric data
streams, including Tukey depth, simplicial depth, regression depth, the
Thiel-Sen estimator, and the least median of squares. Our algorithms use only a
polylogarithmic amount of memory, provided the desired approximation factors
are inverse-polylogarithmic. We also include a lower bound for non-iceberg
geometric queries.Comment: 12 pages, 1 figur
Sweep-Line Extensions to the Multiple Object Intersection Problem: Methods and Applications in Graph Mining
Identifying and quantifying the size of multiple overlapping axis-aligned geometric objects is an essential computational geometry problem. The ability to solve this problem can effectively inform a number of spatial data mining methods and can provide support in decision making for a variety of critical applications. The state-of-the-art approach for addressing such problems resorts to an algorithmic paradigm, collectively known as the sweep-line or plane sweep algorithm. However, its application inherits a number of limitations including lack of versatility and lack of support for ad hoc intersection queries. With these limitations in mind, we design and implement a novel, exact, fast and scalable yet versatile, sweep-line based algorithm, named SLIG. The key idea of our algorithm lies in constructing an auxiliary data structure when the sweep line algorithm is applied, an intersection graph. This graph can effectively be used to provide connectivity properties among overlapping objects and to inform answers to ad hoc intersection queries. It can also be employed to find the location and size of the common area of multiple overlapping objects. SLIG performs significantly faster than classic sweep-line based algorithms, it is more versatile, and provides a suite of powerful querying capabilities.
To demonstrate the versatility of our SLIG algorithm we show how it can be utilized for evaluating the importance of nodes in a trajectory network - a type of dynamic network where the nodes are moving objects (cars, pedestrians, etc.) and the edges represent interactions (contacts) between objects as defined by a proximity threshold. The key observation to address the problem is that the time intervals of these interactions can be represented as 1-dimensional axis-aligned geometric objects. Then, a variant of our SLIG algorithm, named SLOT, is utilized that effectively computes the metrics of interest, including node degree, triangle membership and connected components for each node, over time
Hidden surface removal for rectangles
AbstractA simple but important special case of the hidden surface removal problem is one in which the scene consists of n rectangles with sides parallel to the x- and y-axes, with viewpoint at z=â (that is, an orthographic projection). This special case has application to overlapping windows in computer displays. An algorithm with running time O(n log n + k log n) is given for static scenes, where k is the number of line segments in the output. Algorithms are given for a dynamic setting (that is, rectangles may be inserted and deleted) that take time O(log2n log log n + k log2 n) per insert or delete, where k is now the number of visible line segments that change (appear or disappear). Algorithms for point location in the visible scene are also given
A Heterogeneous High Performance Computing Framework For Ill-Structured Spatial Join Processing
The frequently employed spatial join processing over two large layers of polygonal datasets to detect cross-layer polygon pairs (CPP) satisfying a join-predicate faces challenges common to ill-structured sparse problems, namely, that of identifying the few intersecting cross-layer edges out of the quadratic universe. The algorithmic engineering challenge is compounded by GPGPU SIMT architecture. Spatial join involves lightweight filter phase typically using overlap test over minimum bounding rectangles (MBRs) to discard majority of CPPs, followed by refinement phase to rigorously test the join predicate over the edges of the surviving CPPs. In this dissertation, we develop new techniques - algorithms, data structure, i/o, load balancing and system implementation - to accelerate the two-phase spatial-join processing. We present a new filtering technique, called Common MBR Filter (CMF), which changes the overall characteristic of the spatial join algorithms wherein the refinement phase is no longer the computational bottleneck. CMF is designed based on the insight that intersecting cross-layer edges must lie within the rectangular intersection of the MBRs of CPPs, their common MBRs (CMBR). We also address a key limitation of CMF for class of spatial datasets with either large or dense active CMBRs by extended CMF, called CMF-grid, that effectively employs both CMBR and grid techniques by embedding a uniform grid over CMBR of each CPP, but of suitably engineered sizes for different CPPs. To show efficiency of CMF-based filters, extensive mathematical and experimental analysis is provided. Then, two GPU-based spatial join systems are proposed based on two CMF versions including four components: 1) sort-based MBR filter, 2) CMF/CMF-grid, 3) point-in-polygon test, and, 4) edge-intersection test. The systems show two orders of magnitude speedup over the optimized sequential GEOS C++ library. Furthermore, we present a distributed system of heterogeneous compute nodes to exploit GPU-CPU computing in order to scale up the computation. A load balancing model based on Integer Linear Programming (ILP) is formulated for this system. We also provide three heuristic algorithms to approximate the ILP. Finally, we develop MPI-cuda-GIS system based on this heterogeneous computing model by integrating our CUDA-based GPU system into a newly designed distributed framework designed based on Message Passing Interface (MPI). Experimental results show good scalability and performance of MPI-cuda-GIS system
Efficient Analysis in Multimedia Databases
The rapid progress of digital technology has led to a situation
where computers have become ubiquitous tools. Now we can find them
in almost every environment, be it industrial or even private. With
ever increasing performance computers assumed more and more vital
tasks in engineering, climate and environmental research, medicine
and the content industry. Previously, these tasks could only be
accomplished by spending enormous amounts of time and money. By
using digital sensor devices, like earth observation satellites,
genome sequencers or video cameras, the amount and complexity of
data with a spatial or temporal relation has gown enormously. This
has led to new challenges for the data analysis and requires the use
of modern multimedia databases.
This thesis aims at developing efficient techniques for the analysis
of complex multimedia objects such as CAD data, time series and
videos. It is assumed that the data is modeled by commonly used
representations. For example CAD data is represented as a set of
voxels, audio and video data is represented as multi-represented,
multi-dimensional time series.
The main part of this thesis focuses on finding efficient methods
for collision queries of complex spatial objects. One way to speed
up those queries is to employ a cost-based decompositioning,
which uses interval groups to approximate a spatial object. For
example, this technique can be used for the Digital Mock-Up (DMU)
process, which helps engineers to ensure short product cycles. This
thesis defines and discusses a new similarity measure for time
series called threshold-similarity. Two time series are
considered similar if they expose a similar behavior regarding the
transgression of a given threshold value. Another part of the thesis
is concerned with the efficient calculation of reverse
k-nearest neighbor (RkNN) queries in general metric spaces
using conservative and progressive approximations. The aim of such
RkNN queries is to determine the impact of single objects on the
whole database. At the end, the thesis deals with video
retrieval and hierarchical genre classification of music
using multiple representations. The practical relevance of the
discussed genre classification approach is highlighted with a
prototype tool that helps the user to organize large music
collections.
Both the efficiency and the effectiveness of the presented
techniques are thoroughly analyzed. The benefits over traditional
approaches are shown by evaluating the new methods on real-world
test datasets
On the expected size of the 2d visibility complex
We study the expected size of the 2D visibility complex of randomly distributed objects in the plane. We prove that the asymptotic expected number of free bitangents (which correspond to 0-faces of the visibility complex) among unit discs (or polygons of bounded aspect ratio and similar size) is linear and exhibit bounds in terms of the density of the objects. We also make an experimental assessment of the size of the visibility complex for disjoint random unit discs. We provide experimental estimates of the onset of the linear behavior and of the asymptotic slope and y-intercept of the number of free bitangents in terms of the density of discs. Finally, we analyze the quality of our estimates in terms of the density of discs.
Efficient bulk-loading methods for temporal and multidimensional index structures
Nahezu alle naturwissenschaftlichen Bereiche profitieren von neuesten Analyse- und Verarbeitungsmethoden fĂŒr groĂe Datenmengen. Diese Verfahren setzten eine effiziente Verarbeitung von geo- und zeitbezogenen Daten voraus, da die Zeit und die Position wichtige Attribute vieler Daten
sind. Die effiziente Anfrageverarbeitung wird insbesondere durch den Einsatz von Indexstrukturen
ermöglicht. Im Fokus dieser Arbeit liegen zwei Indexstrukturen: Multiversion B-Baum
(MVBT) und R-Baum. Die erste Struktur wird fĂŒr die Verwaltung von zeitbehafteten Daten,
die zweite fĂŒr die Indexierung von mehrdimensionalen Rechteckdaten eingesetzt.
StĂ€ndig- und schnellwachsendes Datenvolumen stellt eine groĂe Herausforderung an die Informatik
dar. Der Aufbau und das Aktualisieren von Indexen mit herkömmlichen Methoden (Datensatz
fĂŒr Datensatz) ist nicht mehr effizient. Um zeitnahe und kosteneffiziente Datenverarbeitung
zu ermöglichen, werden Verfahren zum schnellen Laden von Indexstrukturen dringend benötigt.
Im ersten Teil der Arbeit widmen wir uns der Frage, ob es ein Verfahren fĂŒr das Laden von MVBT
existiert, das die gleiche I/O-KomplexitÀt wie das externe Sortieren besitz. Bis jetzt blieb diese
Frage unbeantwortet. In dieser Arbeit haben wir eine neue Kostruktionsmethode entwickelt und
haben gezeigt, dass diese gleiche ZeitkomplexitÀt wie das externe Sortieren besitzt. Dabei haben
wir zwei algorithmische Techniken eingesetzt: Gewichts-Balancierung und Puffer-BĂ€ume. Unsere
Experimenten zeigen, dass das Resultat nicht nur theoretischer Bedeutung ist.
Im zweiten Teil der Arbeit beschÀftigen wir uns mit der Frage, ob und wie statistische Informationen
ĂŒber Geo-Anfragen ausgenutzt werden können, um die Anfrageperformanz von R-BĂ€umen zu
verbessern. Unsere neue Methode verwendet Informationen wie SeitenverhÀltnis und SeitenlÀngen
eines reprĂ€sentativen Anfragerechtecks, um einen guten R-Baum bezĂŒglich eines hĂ€ufig eingesetzten
Kostenmodells aufzubauen. Falls diese Informationen nicht verfĂŒgbar sind, optimieren
wir R-BĂ€ume bezĂŒglich der Summe der Volumina von minimal umgebenden Rechtecken der Blattknoten.
Da das Problem des Aufbaus von optimalen R-BĂ€umen bezĂŒglich dieses KostenmaĂes
NP-hart ist, fĂŒhren wir zunĂ€chst das Problem auf ein eindimensionales Partitionierungsproblem
zurĂŒck, indem wir die Daten bezĂŒglich optimierte raumfĂŒllende Kurven sortieren. Dann lösen
wir dieses Problem durch Einsatz vom dynamischen Programmieren. Die I/O-KomplexitÀt des
Verfahrens ist gleich der von externem Sortieren, da die I/O-Laufzeit der Methode durch die
Laufzeit des Sortierens dominiert wird.
Im letzten Teil der Arbeit haben wir die entwickelten Partitionierungsvefahren fĂŒr den Aufbau
von Geo-Histogrammen eingesetzt, da diese Àhnlich zu R-BÀumen eine disjunkte Partitionierung
des Raums erzeugen. Ergebnisse von intensiven Experimenten zeigen, dass sich unter Verwendung
von neuen Partitionierungstechniken sowohl R-BĂ€ume mit besserer Anfrageperformanz als
auch Geo-Histogrammen mit besserer SchÀtzqualitÀt im Vergleich zu Konkurrenzverfahren generieren
lassen
I/O-Efficient Planar Range Skyline and Attrition Priority Queues
In the planar range skyline reporting problem, we store a set P of n 2D
points in a structure such that, given a query rectangle Q = [a_1, a_2] x [b_1,
b_2], the maxima (a.k.a. skyline) of P \cap Q can be reported efficiently. The
query is 3-sided if an edge of Q is grounded, giving rise to two variants:
top-open (b_2 = \infty) and left-open (a_1 = -\infty) queries.
All our results are in external memory under the O(n/B) space budget, for
both the static and dynamic settings:
* For static P, we give structures that answer top-open queries in O(log_B n
+ k/B), O(loglog_B U + k/B), and O(1 + k/B) I/Os when the universe is R^2, a U
x U grid, and a rank space grid [O(n)]^2, respectively (where k is the number
of reported points). The query complexity is optimal in all cases.
* We show that the left-open case is harder, such that any linear-size
structure must incur \Omega((n/B)^e + k/B) I/Os for a query. We show that this
case is as difficult as the general 4-sided queries, for which we give a static
structure with the optimal query cost O((n/B)^e + k/B).
* We give a dynamic structure that supports top-open queries in O(log_2B^e
(n/B) + k/B^1-e) I/Os, and updates in O(log_2B^e (n/B)) I/Os, for any e
satisfying 0 \le e \le 1. This leads to a dynamic structure for 4-sided queries
with optimal query cost O((n/B)^e + k/B), and amortized update cost O(log
(n/B)).
As a contribution of independent interest, we propose an I/O-efficient
version of the fundamental structure priority queue with attrition (PQA). Our
PQA supports FindMin, DeleteMin, and InsertAndAttrite all in O(1) worst case
I/Os, and O(1/B) amortized I/Os per operation.
We also add the new CatenateAndAttrite operation that catenates two PQAs in
O(1) worst case and O(1/B) amortized I/Os. This operation is a non-trivial
extension to the classic PQA of Sundar, even in internal memory.Comment: Appeared at PODS 2013, New York, 19 pages, 10 figures. arXiv admin
note: text overlap with arXiv:1208.4511, arXiv:1207.234
- âŠ