On Geometric Range Searching, Approximate Counting and Depth Problems
In this thesis we deal with problems connected to range searching,
which is one of the central areas of computational geometry.
The dominant problems in this area are
halfspace range searching, simplex range searching, and orthogonal range searching;
research into these problems has spanned decades.
For many range searching problems, the best possible
data structures cannot offer fast (i.e., polylogarithmic) query
times if we limit ourselves to near-linear storage.
Even worse, it is conjectured (and proved in some cases)
that only very small improvements to these bounds might be possible.
This inefficiency has encouraged many researchers to seek alternatives through approximations.
In this thesis we continue this line of research and focus on
relative approximation of range counting problems.
One important problem where approximation yields a significant speedup
is halfspace range counting in 3D.
Building on previous research,
we obtain the first optimal data structure for approximate halfspace range counting in 3D.
Our data structure has the slight advantage of being Las Vegas (the result is always correct), in contrast
to the previous methods, which were Monte Carlo (the result is correct with high probability).
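A minimal Python sketch of what a relative approximation guarantees, assuming the common two-sided form in which an estimate k' of the true count k is acceptable when (1 - ε)k ≤ k' ≤ (1 + ε)k (the thesis may use a one-sided variant). The brute-force counter is only the exact O(n)-per-query baseline that such data structures approximate; `halfspace_count` and `is_relative_approx` are hypothetical names.

```python
from typing import Sequence, Tuple

Point3 = Tuple[float, float, float]


def halfspace_count(points: Sequence[Point3], a: Point3, b: float) -> int:
    """Exact number of points p with a . p <= b (brute force, O(n) per query)."""
    return sum(1 for p in points if a[0] * p[0] + a[1] * p[1] + a[2] * p[2] <= b)


def is_relative_approx(estimate: float, exact: int, eps: float) -> bool:
    """Assumed two-sided guarantee: (1 - eps) * exact <= estimate <= (1 + eps) * exact."""
    return (1 - eps) * exact <= estimate <= (1 + eps) * exact
```

Note that a relative guarantee is strict for small counts: when the true count is 0, only an estimate of exactly 0 is acceptable.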
Another series of problems where approximation can provide us with
substantial speedup comes from robust statistics.
We recognize three problems here:
approximate Tukey depth, regression depth and simplicial depth queries.
In 2D, we obtain an optimal data structure capable of approximating
the regression depth of a query hyperplane.
We also offer a linear-space data structure that can answer approximate
Tukey depth queries efficiently in 3D.
These data structures are obtained by applying our ideas for the
approximate halfspace counting problem.
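The two depth measures above can be pinned down with brute-force definitions. The following Python sketch (hypothetical function names, quadratic time, unrelated to the thesis's optimal structures) computes the 2D Tukey depth of a point q as the minimum number of input points in a closed halfplane whose boundary passes through q, which is exactly a minimization over halfspace range counts. It also computes the regression depth of a line y = slope·x + intercept using the Rousseeuw-Hubert "nonfit" definition: the fewest points whose removal leaves a vertical split with all residuals positive on one side and negative on the other. The angular tolerance `tol` is an assumption added to absorb floating-point noise.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tukey_depth(points: Sequence[Point], q: Point, tol: float = 1e-12) -> int:
    """Exact 2D Tukey depth of q: min number of points in a closed halfplane through q."""
    vecs = [(x - q[0], y - q[1]) for x, y in points]
    two_pi = 2 * math.pi
    # #{p : u . (p - q) >= 0} can only change when the normal u becomes
    # orthogonal to some p - q, so collect those critical angles.
    angles: List[float] = []
    for dx, dy in vecs:
        if dx == 0 and dy == 0:
            continue  # points equal to q lie in every halfplane through q
        base = math.atan2(dy, dx) + math.pi / 2
        angles.append(base % two_pi)
        angles.append((base + math.pi) % two_pi)
    angles.sort()
    crit: List[float] = []
    for a in angles:
        if not crit or a - crit[-1] > tol:  # merge angles equal up to rounding
            crit.append(a)
    if len(crit) > 1 and crit[0] + two_pi - crit[-1] <= tol:
        crit.pop()  # first and last angle coincide across the 0 / 2*pi seam
    # The minimum is attained strictly between consecutive critical angles,
    # so probe the midpoint of every gap (including the wrap-around gap).
    if crit:
        probes = [(a + b) / 2 for a, b in zip(crit, crit[1:] + [crit[0] + two_pi])]
    else:
        probes = [0.0]
    best = len(points)
    for phi in probes:
        ux, uy = math.cos(phi), math.sin(phi)
        best = min(best, sum(1 for dx, dy in vecs if ux * dx + uy * dy >= 0))
    return best


def regression_depth(points: Sequence[Point], slope: float, intercept: float) -> int:
    """2D regression depth of the line y = slope * x + intercept (nonfit definition)."""
    if not points:
        return 0
    res = [(x, y - (slope * x + intercept)) for x, y in points]
    xs = sorted({x for x, _ in res})
    # Candidate split positions: left of all x, between distinct x, right of all x.
    cuts = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    best = len(points)
    for v in cuts:
        left = [r for x, r in res if x < v]
        right = [r for x, r in res if x > v]
        # Points to remove so residuals are > 0 on one side and < 0 on the other.
        pos_left = sum(r <= 0 for r in left) + sum(r >= 0 for r in right)
        pos_right = sum(r >= 0 for r in left) + sum(r <= 0 for r in right)
        best = min(best, pos_left, pos_right)
    return best
```

For example, for the four corners of a square and q at its center, `tukey_depth` returns 2, since every halfplane through the center contains two corners; a line passing above every data point has regression depth 0.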
Approximating the simplicial depth turns out to be much more
difficult, however.
Computing the simplicial depth of a given point is more computationally
challenging than most other definitions of data depth.
In 2D, we obtain the first data structure that uses near-linear space
and can answer approximate simplicial depth queries in polylogarithmic time.
As applications of this result, we provide two non-trivial methods to
approximate the simplicial depth of a given point in higher dimensions.
Along the way, we establish a tight combinatorial relationship between
the Tukey depth of any given point and its simplicial depth.
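The naive way to compute simplicial depth in 2D is to enumerate all triangles spanned by the input and count those containing q, which takes O(n^3) time and makes the cost of the bare definition visible. The sketch below uses hypothetical names and counts closed triangles. It assumes general position: collinear triples are simply skipped, which is a simplification rather than a standard convention. The count is often normalized by the number of triangles, C(n, 3).

```python
from itertools import combinations
from typing import Sequence, Tuple

Point = Tuple[int, int]


def orient(a: Point, b: Point, c: Point) -> int:
    """Twice the signed area of triangle abc (> 0 if counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def simplicial_depth(points: Sequence[Point], q: Point) -> int:
    """Number of closed triangles with vertices in `points` that contain q."""
    count = 0
    for a, b, c in combinations(points, 3):
        if orient(a, b, c) == 0:
            continue  # assumption: skip degenerate (collinear) triples
        d1, d2, d3 = orient(a, b, q), orient(b, c, q), orient(c, a, q)
        has_neg = d1 < 0 or d2 < 0 or d3 < 0
        has_pos = d1 > 0 or d2 > 0 or d3 > 0
        if not (has_neg and has_pos):  # q is inside or on the boundary
            count += 1
    return count
```

With integer coordinates the orientation tests are exact; for the square with corners (±2, ±2), the point (1, 1) lies in 3 of the 4 triangles.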
Another problem investigated in this thesis is the dominance reporting problem,
an important special case of orthogonal range reporting.
In three dimensions, we solve this
problem in the pointer machine model and the external memory model
by offering the first optimal data structures in these models of computation.
Also, in the RAM model and for points from
an integer grid, we reduce the space complexity of the fastest
known data structure to optimal.
Using known techniques from the literature, our
results also yield solutions for the orthogonal range searching problem.
The query complexities offered by our orthogonal range reporting data structures
match the most efficient query complexities
known in the literature, but our space bounds are lower than those of previous methods in the external
memory model and in the RAM model where the input is a subset of an integer grid.
The results also yield improved orthogonal range searching in
higher dimensions (which shows the significance
of the dominance reporting problem).
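To make the relationship concrete: a dominance query reports the points p with p[i] ≤ q[i] in every coordinate, and one textbook reduction turns a d-dimensional box query into a 2d-dimensional dominance query by lifting each coordinate c to the pair (-c, c). The Python sketch below shows that reduction by brute force; the function names are hypothetical, and this is not necessarily the technique the thesis relies on.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[int, ...]


def dominance_report(points: Sequence[Vec], q: Vec) -> List[Vec]:
    """Report every point p with p[i] <= q[i] in every coordinate (brute force)."""
    return [p for p in points if all(pi <= qi for pi, qi in zip(p, q))]


def lift(p: Vec) -> Vec:
    """Map (c1, ..., cd) to (-c1, c1, ..., -cd, cd)."""
    out: List[int] = []
    for c in p:
        out.extend((-c, c))
    return tuple(out)


def box_report_via_dominance(points: Sequence[Vec], lo: Vec, hi: Vec) -> List[Vec]:
    """Report points in the box [lo, hi] using one 2d-dimensional dominance query.

    a <= c <= b is equivalent to -c <= -a and c <= b.
    """
    q: List[int] = []
    for a, b in zip(lo, hi):
        q.extend((-a, b))
    hits = dominance_report([lift(p) for p in points], tuple(q))
    return [h[1::2] for h in hits]  # undo the lift: keep the +c coordinates
```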
Intersection searching is a generalization of range searching where
we deal with more complicated geometric objects instead of points.
We investigate the rectilinear disjoint polygon counting problem
which is a specialized intersection counting problem.
We provide a linear-size data structure capable of counting
the number of disjoint rectilinear polygons
intersecting any rectilinear polygon of constant size.
The query time (as well as some other properties of our data structure) resembles that of
the classical simplex range searching data structures.
Deterministic Sampling and Range Counting in Geometric Data Streams
We present memory-efficient deterministic algorithms for constructing
epsilon-nets and epsilon-approximations of streams of geometric data. Unlike
probabilistic approaches, these deterministic samples provide guaranteed bounds
on their approximation factors. We show how our deterministic samples can be
used to answer approximate online iceberg geometric queries on data streams. We
use these techniques to approximate several robust statistics of geometric data
streams, including Tukey depth, simplicial depth, regression depth, the
Theil-Sen estimator, and the least median of squares. Our algorithms use only a
polylogarithmic amount of memory, provided the desired approximation factors
are inverse-polylogarithmic. We also include a lower bound for non-iceberg
geometric queries.
Comment: 12 pages, 1 figure
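The merge-and-reduce idea behind many deterministic streaming ε-approximations can be illustrated in one dimension, where the ranges are half-lines x ≤ t. In the Python sketch below, which is a simplified 1D stand-in rather than the paper's algorithm for geometric ranges, full buffers of k sorted items are carried upward like a binary counter. Two buffers at level L (each item standing for 2**L stream elements) are merged and halved by keeping every other item. Each such halving shifts any rank by at most 2**L, so the error bound is deterministic rather than probabilistic. The class name and the buffer size k are hypothetical.

```python
import bisect
from typing import Dict, List


class MergeReduceSummary:
    """Deterministic 1D summary for half-line counts #{x <= t} (merge-and-reduce sketch)."""

    def __init__(self, k: int = 64):
        if k < 2 or k % 2:
            raise ValueError("k must be an even integer >= 2")
        self.k = k
        self.buffer: List[float] = []             # unsorted level-0 items, weight 1
        self.levels: Dict[int, List[float]] = {}  # level L -> k sorted items, weight 2**L
        self.n = 0

    def insert(self, x: float) -> None:
        self.n += 1
        self.buffer.append(x)
        if len(self.buffer) == self.k:
            self._push(0, sorted(self.buffer))
            self.buffer = []

    def _push(self, level: int, items: List[float]) -> None:
        # Like a binary counter: two buffers at one level become one at the next.
        while level in self.levels:
            merged = sorted(self.levels.pop(level) + items)  # 2k items
            items = merged[1::2]  # keep every other item: k items, double weight
            level += 1
        self.levels[level] = items

    def count_leq(self, t: float) -> int:
        """Approximate number of stream elements <= t (never an overestimate here)."""
        total = sum(1 for x in self.buffer if x <= t)
        for level, items in self.levels.items():
            total += (2 ** level) * bisect.bisect_right(items, t)
        return total
```

Memory is at most one buffer per level, i.e. O(k log(n/k)) items, and the total weight always equals n exactly; only the split of that weight around a threshold t is approximate.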
A Simple FPTAS for Counting Edge Covers
An edge cover of a graph is a set of edges such that every vertex has at
least one incident edge in it. Previously, an approximation algorithm for counting
edge covers was known only for 3-regular graphs, and it was randomized. We design
a very simple deterministic fully polynomial-time approximation scheme (FPTAS)
for counting the number of edge covers of any graph. Our main technique is
correlation decay, which is a powerful tool for designing FPTASes for counting
problems. To obtain an FPTAS for general graphs without a degree bound, we
make use of a stronger notion called computationally efficient correlation
decay, which was introduced in [Li, Lu, Yin SODA 2012].
Comment: To appear in SODA 201
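To make the counted quantity concrete, the Python sketch below counts edge covers exactly by enumerating all 2^m edge subsets. It is not the FPTAS and is usable only for tiny graphs; it computes the exact value that an FPTAS approximates within a (1 ± ε) factor in time polynomial in the graph size and 1/ε. The function name is hypothetical.

```python
from typing import Hashable, Sequence, Tuple


def count_edge_covers(vertices: Sequence[Hashable],
                      edges: Sequence[Tuple[Hashable, Hashable]]) -> int:
    """Exact number of edge subsets touching every vertex (O(2^m * m) brute force)."""
    required = set(vertices)
    m = len(edges)
    total = 0
    for mask in range(1 << m):
        covered = set()
        for i in range(m):
            if mask >> i & 1:  # edge i is in this subset
                covered.update(edges[i])
        if covered >= required:  # every vertex has an incident chosen edge
            total += 1
    return total
```

For example, a triangle has 4 edge covers (any two edges, or all three), and a graph with an isolated vertex has none.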
A Few Photons Among Many: Unmixing Signal and Noise for Photon-Efficient Active Imaging
Conventional LIDAR systems require hundreds or thousands of photon detections
to form accurate depth and reflectivity images. Recent photon-efficient
computational imaging methods are remarkably effective with only 1.0 to 3.0
detected photons per pixel, but they have not been demonstrated at
signal-to-background ratio (SBR) below 1.0 because their imaging accuracies
degrade significantly in the presence of high background noise. We introduce a
new approach to depth and reflectivity estimation that focuses on unmixing
contributions from signal and noise sources. At each pixel in an image,
short-duration range gates are adaptively determined and applied to remove
detections likely to be due to noise. For pixels with too few detections to
perform this censoring accurately, we borrow data from neighboring pixels to
improve depth estimates, where the neighborhood formation is also adaptive to
scene content. Algorithm performance is demonstrated on experimental data at
varying levels of noise. Results show improved performance of both reflectivity
and depth estimates over state-of-the-art methods, especially at low
signal-to-background ratios. In particular, accurate imaging is demonstrated
with SBR as low as 0.04. This validation of a photon-efficient, noise-tolerant
method demonstrates the viability of rapid, long-range, and low-power LIDAR
imaging.
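A much-simplified Python sketch of per-pixel censoring, for intuition only. It keeps the detections inside the densest time window of a fixed width and converts the median round-trip time t to depth c·t/2. The fixed gate width, the densest-window rule, the median estimator, and the function names are assumptions standing in for the paper's adaptive gate selection, and the borrowing of data from neighboring pixels is not modeled.

```python
import statistics
from typing import List, Optional

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def censor_detections(times_ns: List[float], gate_ns: float) -> List[float]:
    """Keep the detections in the width-gate_ns window containing the most detections."""
    ts = sorted(times_ns)
    best_lo, best_count, hi = 0, 0, 0
    for lo in range(len(ts)):
        # Two pointers: extend hi while ts[hi] stays within the gate starting at ts[lo].
        while hi < len(ts) and ts[hi] - ts[lo] <= gate_ns:
            hi += 1
        if hi - lo > best_count:
            best_lo, best_count = lo, hi - lo
    return ts[best_lo:best_lo + best_count]


def depth_estimate_m(times_ns: List[float], gate_ns: float = 1.0) -> Optional[float]:
    """Depth in meters from the median round-trip time of the kept detections."""
    kept = censor_detections(times_ns, gate_ns)
    if not kept:
        return None
    t_seconds = statistics.median(kept) * 1e-9
    return SPEED_OF_LIGHT * t_seconds / 2  # light travels to the target and back
```

With detections at 66.6, 66.7, 66.7, and 66.8 ns plus scattered background detections, the window keeps the four clustered ones and the estimate is about 10 m.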
Statistical structures for internet-scale data management
Efficient query processing in traditional database management systems relies on statistics on base data. For centralized systems, there is a rich body of research results on such statistics, from simple aggregates to more elaborate synopses such as sketches and histograms. For Internet-scale distributed systems, on the other hand, statistics management still poses major challenges. With this work, we aim to endow peer-to-peer data management over structured overlays with the power associated with such statistical information, with emphasis on meeting the scalability challenge. To this end, we first contribute efficient, accurate, and decentralized algorithms that can compute key aggregates such as Count, CountDistinct, Sum, and Average. We show how to construct several types of histograms, such as simple Equi-Width, Average-Shifted Equi-Width, and Equi-Depth histograms. We present a full-fledged open-source implementation of these tools for distributed statistical synopses and report on a comprehensive experimental evaluation of our contributions in terms of efficiency, accuracy, and scalability.
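For reference, centralized versions of two of the histogram types named above are sketched in Python below. The paper computes its synopses in a decentralized way over a structured overlay; this sketch ignores distribution entirely. The function names and bucket conventions are assumptions: values equal to `hi` fall into the last equi-width bucket, and equi-depth boundary j is the value at 1-based rank ⌊j·n/B⌋.

```python
from typing import List, Sequence


def equi_width(values: Sequence[float], lo: float, hi: float, buckets: int) -> List[int]:
    """Counts per bucket for `buckets` equal-width buckets over [lo, hi]."""
    width = (hi - lo) / buckets
    counts = [0] * buckets
    for v in values:
        if lo <= v <= hi:
            # Values equal to hi are placed in the last bucket.
            counts[min(int((v - lo) / width), buckets - 1)] += 1
    return counts


def equi_depth_boundaries(values: Sequence[float], buckets: int) -> List[float]:
    """Upper boundary of each bucket so each holds about len(values) / buckets values."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        return []
    # Boundary j is the value at 1-based rank floor(j * n / buckets).
    return [s[max(0, (j * n) // buckets - 1)] for j in range(1, buckets + 1)]
```

For the values 1 to 100 and four buckets, equi-width buckets over [0, 100] hold 24, 25, 25, and 26 values, while the equi-depth boundaries are 25, 50, 75, and 100.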