Approximating the Distribution of the Median and other Robust Estimators on Uncertain Data

Authors: Kevin Buchin, Jeff M. Phillips, Pingfan Tang
Publication date: 01/01/2018

Robust estimators, like the median of a point set, are important for data analysis in the presence of outliers. We study robust estimators for locationally uncertain points with discrete distributions. That is, each point in a data set has a discrete probability distribution describing its location. The probabilistic nature of uncertain data makes it challenging to compute such estimators, since the true value of the estimator is now described by a distribution rather than a single point. We show how to construct and estimate the distribution of the median of a point set. Building the approximate support of the distribution takes near-linear time, and assigning probability to that support takes quadratic time. We also develop a general approximation technique for distributions of robust estimators with respect to ranges with bounded VC dimension. This includes the geometric median for high dimensions and the Siegel estimator for linear regression.

Comment: Full version of a paper to appear at SoCG 201
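To illustrate what "the distribution of the median" means for locationally uncertain points, here is a minimal brute-force sketch in Python. It assumes one-dimensional locations and a very small input, and it enumerates every realization, so it runs in exponential time; it is not the near-linear construction described in the abstract. The function name `median_distribution` and the sample data are hypothetical.

```python
from collections import defaultdict
from itertools import product
from statistics import median_low


def median_distribution(uncertain_points):
    """Return {median value: probability} by enumerating all realizations.

    uncertain_points: list of uncertain points, each given as a list of
    (location, probability) pairs whose probabilities sum to 1.
    """
    distribution = defaultdict(float)
    # Each realization picks exactly one location per uncertain point.
    for realization in product(*uncertain_points):
        probability = 1.0
        for _, p in realization:
            probability *= p
        locations = [location for location, _ in realization]
        # median_low always returns one of the input locations.
        distribution[median_low(locations)] += probability
    return dict(distribution)


# Hypothetical data: three uncertain points on the real line.
points = [
    [(0.0, 0.5), (10.0, 0.5)],
    [(1.0, 1.0)],
    [(2.0, 0.5), (3.0, 0.5)],
]
distribution = median_distribution(points)
```

With this input the median is 1.0 whenever the first point realizes at 0.0, and otherwise follows the third point, so the result is a distribution over three values rather than a single number.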

arXiv.org e-Print Archive

Pure OAI Repository

Dagstuhl Research Online Publication Server

Convex Hulls under Uncertainty

Authors: Pankaj K. Agarwal, Sariel Har-Peled, Subhash Suri, Hakan Yildiz, Wuzhou Zhang
Publication date: 01/01/2014

We study the convex-hull problem in a probabilistic setting, motivated by the need to handle data uncertainty inherent in many applications, including sensor databases, location-based services and computer vision. In our framework, the uncertainty of each input site is described by a probability distribution over a finite number of possible locations, including a null location to account for non-existence of the point. Our results include both exact and approximation algorithms for computing the probability of a query point lying inside the convex hull of the input, time-space tradeoffs for the membership queries, a connection between Tukey depth and membership queries, as well as a new notion of β-hull that may be a useful representation of uncertain hulls.

arXiv.org e-Print Archive

CiteSeerX

Crossref

OpenMETU (Middle East Technical University)

Uncertain Curve Simplification

Authors: Kevin Buchin, Aleksandr Popov, Marcel Roeloffzen
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 46th International Symposium on Mathematical Foundations of Computer Science (MFCS 2021)
Publication date: 01/01/2021

We study the problem of polygonal curve simplification under uncertainty, where instead of a sequence of exact points, each uncertain point is represented by a region, which contains the (unknown) true location of the vertex. The regions we consider are disks, line segments, convex polygons, and discrete sets of points. We are interested in finding the shortest subsequence of uncertain points such that no matter what the true location of each uncertain point is, the resulting polygonal curve is a valid simplification of the original polygonal curve under the Hausdorff or the Fréchet distance. For both these distance measures, we present polynomial-time algorithms for this problem.

Comment: 25 pages, 5 figure

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

On the expected diameter, width, and complexity of a stochastic convex-hull

Authors: A Jørgensen, C Li, L Huang, M Löffler, P Kamousi, PK Agarwal, S Suri
Publication date: 01/05/2017

We investigate several computational problems related to the stochastic convex hull (SCH). Given a stochastic dataset consisting of $n$ points in $\mathbb{R}^d$, each of which has an existence probability, a SCH refers to the convex hull of a realization of the dataset, i.e., a random sample including each point with its existence probability. We are interested in computing certain expected statistics of a SCH, including diameter, width, and combinatorial complexity. For diameter, we establish the first deterministic 1.633-approximation algorithm with a time complexity polynomial in both $n$ and $d$. For width, two approximation algorithms are provided: a deterministic $O(1)$-approximation running in $O(n^{d+1} \log n)$ time, and a fully polynomial-time randomized approximation scheme (FPRAS). For combinatorial complexity, we propose an exact $O(n^d)$-time algorithm. Our solutions exploit many geometric insights in Euclidean space, some of which might be of independent interest.
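The quantity "expected diameter of a SCH" can be made concrete with a small exact computation. The sketch below, in Python, simply enumerates all $2^n$ realizations of the dataset and weights the diameter of each by its probability. It is a brute-force illustration under the assumption of independent existence probabilities, not the 1.633-approximation algorithm from the abstract, and the name `expected_diameter` is hypothetical.

```python
from itertools import combinations, product
from math import dist


def expected_diameter(points, probabilities):
    """Exact expected diameter of a stochastic point set by enumeration.

    points: list of coordinate tuples in R^d.
    probabilities: independent existence probability of each point.
    A realization with fewer than two points has diameter 0.
    """
    total = 0.0
    # Every mask says which points are present in one realization.
    for mask in product([False, True], repeat=len(points)):
        probability = 1.0
        for present, p in zip(mask, probabilities):
            probability *= p if present else 1.0 - p
        sample = [pt for pt, present in zip(points, mask) if present]
        # The hull diameter equals the largest pairwise distance.
        diameter = max(
            (dist(a, b) for a, b in combinations(sample, 2)), default=0.0
        )
        total += probability * diameter
    return total


# Two points at distance 5, each present with probability 1/2:
# only the realization containing both contributes, so the value is 0.25 * 5.
value = expected_diameter([(0.0, 0.0), (3.0, 4.0)], [0.5, 0.5])
```

The enumeration is exponential in $n$, which is the cost that polynomial-time approximation algorithms are meant to avoid.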

arXiv.org e-Print Archive

Crossref

From Proximity to Utility: A Voronoi Partition of Pareto Optima

Authors: Hsien-Chih Chang, Sariel Har-Peled, Benjamin Raichel
Publication date: 01/01/2014

We present an extension of Voronoi diagrams in which, when considering which site a client is going to use, other site attributes (for example, prices or weights) are considered in addition to the site distances. A cell in this diagram is then the locus of all clients that consider the same set of sites to be relevant. In particular, the precise site a client might use from this candidate set depends on parameters that might change between usages, and the candidate set lists all of the relevant sites. The resulting diagram is significantly more expressive than Voronoi diagrams, but naturally has the drawback that its complexity, even in the plane, might be quite high. Nevertheless, we show that if the attributes of the sites are drawn from the same distribution (note that the locations are fixed), then the expected complexity of the candidate diagram is near linear. To this end, we derive several new technical results, which are of independent interest. In particular, we provide a high-probability, asymptotically optimal bound on the number of Pareto optima points in a point set uniformly sampled from the $d$-dimensional hypercube. To do so we revisit the classical backward analysis technique, both simplifying and improving relevant results in order to achieve the high-probability bounds.
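As a concrete reading of "Pareto optima points in a point set", the following Python sketch computes them with a naive quadratic scan. It assumes the maximization convention (a point is Pareto optimal if no other point is at least as large in every coordinate and strictly larger in one); the abstract does not fix a convention, and the function names are hypothetical.

```python
import random


def dominates(a, b):
    """True if a is >= b in every coordinate and > b in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(
        x > y for x, y in zip(a, b)
    )


def pareto_optima(points):
    """Return the points not dominated by any other point (O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Small hand-checkable example in the plane.
example = [(1, 5), (2, 4), (3, 3), (1, 1), (2, 2), (3, 1)]
optima = pareto_optima(example)  # the three points on the "staircase"

# Points sampled uniformly from the d-dimensional unit hypercube,
# the setting the abstract's bound refers to (seed chosen arbitrarily).
rng = random.Random(0)
d, n = 3, 200
sample = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
sample_optima = pareto_optima(sample)
```

Running the second part for growing `n` gives an empirical feel for how slowly the number of Pareto optima grows on uniform samples.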

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server

Querying Probabilistic Neighborhoods in Spatial Data Sets Efficiently

Authors: D Krioukov, H Samet, H-P Kriegel, HW Hethcote, L Arge, M von Looz, R Aldecoa, V Batagelj
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 16/08/2016

In this paper we define the notion of a probabilistic neighborhood in spatial data: Let a set $P$ of $n$ points in $\mathbb{R}^d$, a query point $q \in \mathbb{R}^d$, a distance metric $\operatorname{dist}$, and a monotonically decreasing function $f : \mathbb{R}^+ \rightarrow [0,1]$ be given. Then a point $p \in P$ belongs to the probabilistic neighborhood $N(q, f)$ of $q$ with respect to $f$ with probability $f(\operatorname{dist}(p,q))$. We envision applications in facility location, sensor networks, and other scenarios where a connection between two entities becomes less likely with increasing distance. A straightforward query algorithm would determine a probabilistic neighborhood in $\Theta(n\cdot d)$ time by probing each point in $P$. To answer the query in sublinear time for the planar case, we augment a quadtree suitably and design a corresponding query algorithm. Our theoretical analysis shows that, for certain distributions of planar $P$, our algorithm answers a query in $O((|N(q,f)| + \sqrt{n})\log n)$ time with high probability (whp). This matches up to a logarithmic factor the cost induced by quadtree-based algorithms for deterministic queries and is asymptotically faster than the straightforward approach whenever $|N(q,f)| \in o(n / \log n)$. As practical proofs of concept we use two applications, one in the Euclidean and one in the hyperbolic plane. In particular, our results yield the first generator for random hyperbolic graphs with arbitrary temperatures in subquadratic time. Moreover, our experimental data show the usefulness of our algorithm even if the point distribution is unknown or not uniform: The running time savings over the pairwise probing approach constitute at least one order of magnitude already for a modest number of points and queries.

Comment: The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-44543-4_3
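The straightforward $\Theta(n\cdot d)$ query mentioned in the abstract can be sketched in a few lines of Python: probe every point and keep it with probability $f(\operatorname{dist}(p,q))$. This is only the baseline, not the quadtree-based algorithm; the Euclidean metric, the function name `probabilistic_neighborhood`, and the example decay function are assumptions made for illustration.

```python
import math
import random


def probabilistic_neighborhood(points, q, f, rng):
    """Sample N(q, f) by probing each point once.

    points: list of coordinate tuples in R^d.
    q: query point as a coordinate tuple.
    f: monotonically decreasing function from distances to [0, 1].
    rng: a random.Random instance, passed in for reproducibility.
    """
    neighborhood = []
    for p in points:
        # Each distance costs O(d), so the whole query costs Theta(n * d).
        if rng.random() < f(math.dist(p, q)):
            neighborhood.append(p)
    return neighborhood


# Hypothetical usage: exponential decay of the connection probability.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
query = (0.0, 0.0)
sampled = probabilistic_neighborhood(
    points, query, lambda r: math.exp(-r), random.Random(42)
)
```

Because the result is a random sample, repeated queries with different seeds generally return different neighborhoods; nearby points simply appear more often.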

arXiv.org e-Print Archive

Crossref

Non-zero probability of nearest neighbor searching

Authors: A. Mesrikhani, M. Davoodi
Publication venue: 'International Ocean Discovery Program (IODP)'
Publication date: 01/03/2017

Nearest neighbor (NN) searching is a challenging problem in data management and has been widely studied in data mining, pattern recognition, and computational geometry. The goal of NN searching is to efficiently report the data nearest to a given query object. Most studies assume that both the data and the query are precise. However, in real applications of NN searching, such as tracking and locating services, GIS, and data mining, both may be imprecise. In this situation, a natural way to handle the issue is to report the data that have a nonzero probability of being the nearest neighbor of a given query (called nonzero nearest neighbors). Formally, let P be a set of n uncertain points modeled by some regions. We first consider the following variation of the NN searching problem under uncertainty. If both the query and the data are uncertain points modeled by distinct unit segments parallel to the x-axis, we propose an efficient algorithm that reports nonzero nearest neighbors under the Manhattan metric in O(n^2 α(n^2)) preprocessing and O(log n + k) query time, where α(·) is the extremely slowly growing functional inverse of Ackermann's function. Finally, for segments of arbitrary length parallel to the x-axis, we propose an approximation algorithm that reports nonzero nearest neighbors with maximum error L in O(n^2 α(n^2)) preprocessing and O(log n + k) query time, where L is the length of the query.

Directory of Open Access Journals