Computational Geometry Column 34
Problems presented at the open-problem session of the 14th Annual ACM
Symposium on Computational Geometry are listed.
Incremental Entity Resolution from Linked Documents
In many government applications we often find that information about
entities, such as persons, is available in disparate data sources such as
passports, driving licences, bank accounts, and income tax records. Similar
scenarios are commonplace in large enterprises having multiple customer,
supplier, or partner databases. Each data source maintains different aspects of
an entity, and resolving entities based on these attributes is a well-studied
problem. However, in many cases documents in one source reference those in
others; e.g., a person may provide his driving-licence number while applying
for a passport, or vice-versa. These links define relationships between
documents of the same entity (as opposed to inter-entity relationships, which
are also often used for resolution). In this paper we describe an algorithm to
cluster documents that are highly likely to belong to the same entity by
exploiting inter-document references in addition to attribute similarity. Our
technique uses a combination of iterative graph-traversal, locality-sensitive
hashing, iterative match-merge, and graph-clustering to discover unique
entities based on a document corpus. A unique feature of our technique is that
new sets of documents can be added incrementally while having to re-resolve
only a small subset of a previously resolved entity-document collection. We
present performance and quality results on two data-sets: a real-world database
of companies and a large synthetically generated 'population' database. We also
demonstrate the benefit of using inter-document references for clustering in the
form of enhanced recall of documents for resolution.
Comment: 15 pages, 8 figures, patented work
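The idea of combining inter-document references with attribute similarity can be illustrated with a minimal sketch. This is not the authors' algorithm: it swaps locality-sensitive hashing for a brute-force all-pairs comparison, uses Jaccard similarity over attribute tokens, and merges documents with a union-find structure. The function names, the document identifiers, and the `threshold` parameter are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two collections of attribute tokens."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class UnionFind:
    """Disjoint-set structure used to merge documents into clusters."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        # Path halving keeps the trees shallow.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[ry] = rx


def cluster_documents(docs, links, threshold=0.5):
    """Cluster documents that likely describe the same entity.

    docs:  {doc_id: iterable of attribute tokens}   (hypothetical layout)
    links: iterable of (doc_id, doc_id) inter-document references
    """
    uf = UnionFind(docs)
    # 1. Explicit references between documents are treated as same-entity links.
    for a, b in links:
        if a in docs and b in docs:
            uf.union(a, b)
    # 2. Attribute similarity; all pairs are compared here, which is only
    #    feasible for small inputs (assumption: no LSH blocking step).
    for a, b in combinations(docs, 2):
        if jaccard(docs[a], docs[b]) >= threshold:
            uf.union(a, b)
    clusters = {}
    for d in docs:
        clusters.setdefault(uf.find(d), set()).add(d)
    return sorted(sorted(c) for c in clusters.values())
```

With a passport record that references a driving-licence record, the two end up in one cluster even when their attributes barely overlap, which is the recall benefit the abstract describes.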
An Efficient Algorithm for Computing High-Quality Paths amid Polygonal Obstacles
We study a path-planning problem amid a set $\mathcal{O}$ of obstacles in
$\mathbb{R}^2$, in which we wish to compute a short path between two points
while also maintaining a high clearance from $\mathcal{O}$; the clearance of a
point is its distance from a nearest obstacle in $\mathcal{O}$. Specifically,
the problem asks for a path minimizing the reciprocal of the clearance
integrated over the length of the path. We present the first polynomial-time
approximation scheme for this problem. Let $n$ be the total number of obstacle
vertices and let $\varepsilon \in (0,1]$. Our algorithm computes in time
$O(\frac{n^2}{\varepsilon^2} \log \frac{n}{\varepsilon})$ a path of total cost
at most $(1+\varepsilon)$ times the cost of the optimal path.
Comment: A preliminary version of this work appeared in the Proceedings of the
27th Annual ACM-SIAM Symposium on Discrete Algorithms.
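The cost function in the abstract above, the reciprocal of the clearance integrated over the length of the path, can be evaluated numerically for a given polyline. The sketch below only evaluates the cost of a fixed path; it is not the approximation scheme itself. It assumes point obstacles rather than polygons to keep the clearance computation short, and the names `clearance`, `path_cost`, and `samples_per_segment` are hypothetical.

```python
import math


def clearance(p, obstacles):
    """Distance from point p to the nearest obstacle (point obstacles assumed)."""
    return min(math.dist(p, o) for o in obstacles)


def path_cost(path, obstacles, samples_per_segment=1000):
    """Approximate the integral of 1/clearance along a polyline.

    Uses the midpoint rule on each segment. Assumes the path never touches
    an obstacle, since the cost is unbounded at zero clearance.
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(path, path[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        ds = seg_len / samples_per_segment
        for i in range(samples_per_segment):
            t = (i + 0.5) / samples_per_segment
            p = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
            total += ds / clearance(p, obstacles)
    return total
```

As a sanity check, with a single obstacle at the origin and a straight path from (1, 0) to (e, 0), the cost is the integral of 1/x from 1 to e, which equals 1.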
On Range Searching with Semialgebraic Sets II
Let $P$ be a set of $n$ points in $\mathbb{R}^d$. We present a linear-size data
structure for answering range queries on $P$ with constant-complexity
semialgebraic sets as ranges, in time close to $O(n^{1-1/d})$. It essentially
matches the performance of similar structures for simplex range searching, and,
for $d \ge 5$, significantly improves earlier solutions by the first two authors
obtained in 1994. This almost settles a long-standing open problem in range
searching.
The data structure is based on the polynomial-partitioning technique of Guth
and Katz [arXiv:1011.4105], which shows that for a parameter $r$, $1 < r \le n$, there exists a $d$-variate polynomial $f$ of degree $O(r^{1/d})$ such that
each connected component of $\mathbb{R}^d \setminus Z(f)$ contains at most $n/r$ points
of $P$, where $Z(f)$ is the zero set of $f$. We present an efficient randomized
algorithm for computing such a polynomial partition, which is of independent
interest and is likely to have additional applications.
Approximating Dynamic Time Warping and Edit Distance for a Pair of Point Sequences
We give the first subquadratic-time approximation schemes for dynamic time
warping (DTW) and edit distance (ED) of several natural families of point
sequences in $\mathbb{R}^d$, for any fixed $d \ge 1$. In particular, our
algorithms compute $(1+\varepsilon)$-approximations of DTW and ED in time
near-linear for point sequences drawn from $\kappa$-packed or $\kappa$-bounded curves, and
subquadratic for backbone sequences. Roughly speaking, a curve is
$\kappa$-packed if the length of its intersection with any ball of radius $r$
is at most $\kappa r$, and a curve is $\kappa$-bounded if the sub-curve
between two curve points does not go too far from the two points compared to
the distance between the two points. In backbone sequences, consecutive points
are spaced at approximately equal distances apart, and no two points lie very
close together. Recent results suggest that a subquadratic algorithm for DTW or
ED is unlikely for an arbitrary pair of point sequences even for $d = 1$. Our
algorithms work by constructing a small set of rectangular regions that cover
the entries of the dynamic programming table commonly used for these distance
measures. The weights of entries inside each rectangle are roughly the same, so
we are able to use efficient procedures to approximately compute the cheapest
paths through these rectangles.
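For reference, the dynamic programming table mentioned in the abstract above is the one used by the textbook quadratic-time DTW algorithm, sketched below. This is the exact baseline, not the subquadratic approximation scheme; the function name `dtw` and the use of Euclidean distance between points are assumptions for illustration.

```python
import math


def dtw(P, Q):
    """Exact dynamic time warping distance between two point sequences.

    P and Q are sequences of equal-dimension coordinate tuples.
    D[i][j] holds the cheapest warping cost of aligning the first i points
    of P with the first j points of Q; filling it takes O(len(P) * len(Q)).
    """
    n, m = len(P), len(Q)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(P[i - 1], Q[j - 1])
            # Each entry extends the cheapest of its three predecessors.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Each table entry depends only on its left, lower, and diagonal neighbours, so an alignment corresponds to a monotone path through the table. That is why covering the table with rectangles of roughly uniform entry weight, as the abstract describes, can be used to approximate the cheapest path.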
Weak Lensing Effect on CMB in the Presence of a Dipole Anisotropy
We investigate the weak lensing effect on the cosmic microwave background (CMB) in
the presence of a dipole anisotropy, using the flat-sky approximation.
We determine the two functions that
appear in the expressions for the lensed CMB power spectrum in the presence of a
dipole anisotropy. We determine the correction to the B-mode power spectrum, which
is found to be appreciable at low multipoles. However, the temperature
and E-mode power spectra are not altered significantly.
Comment: 9 pages