Efficient Algorithms for the Closest Pair Problem and Applications
The closest pair problem (CPP) is one of the well studied and fundamental
problems in computing. Given a set of points in a metric space, the problem is
to identify the pair of closest points. Another closely related problem is the
fixed radius nearest neighbors problem (FRNNP). Given a set of points and a
radius, the problem is to identify, for every input point, all the
other input points that lie within that radius of it. A naive
deterministic algorithm can solve these problems in quadratic time. CPP as well
as FRNNP play a vital role in computational biology, computational finance,
share market analysis, weather prediction, entomology, electrocardiography,
N-body simulations, molecular simulations, etc. As a result, any improvements
made in solving CPP and FRNNP will have immediate implications for the solution
of numerous problems in these domains. We live in an era of big data, and
processing these data takes large amounts of time. Speeding up data processing
algorithms is thus more essential now than ever before. In this paper we
present algorithms for CPP and FRNNP that improve (in theory and/or practice)
the best-known algorithms reported in the literature for CPP and FRNNP. These
algorithms also improve the best-known algorithms for related applications
including time series motif mining and the two locus problem in Genome Wide
Association Studies (GWAS).
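The quadratic-time baseline mentioned in the abstract above can be sketched as follows. This is only the naive all-pairs approach that the paper sets out to improve on, not the paper's algorithms; the function names and the choice of Euclidean distance are assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def closest_pair(points):
    """Return (distance, p, q) for the closest pair, checking all pairs."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    # O(n^2) distance evaluations: every unordered pair is examined once.
    return min((dist(p, q), p, q) for p, q in combinations(points, 2))


def fixed_radius_neighbors(points, radius):
    """Map each point index to the indices of all other points within radius."""
    neighbors = {i: [] for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= radius:
            neighbors[i].append(j)
            neighbors[j].append(i)
    return neighbors
```

For example, `closest_pair([(0, 0), (3, 4), (1, 0), (10, 10)])` examines six pairs and selects the pair `(0, 0)`, `(1, 0)`.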
Approximation Algorithms for Confidence Bands for Time Series
Confidence intervals are a standard technique for analyzing data. When applied to time series, confidence intervals are computed for each time point separately. Alternatively, we can compute confidence bands, where we are required to find the smallest area enveloping k time series, where k is a user parameter. Confidence bands can then be used to detect abnormal time series, not just individual observations within the time series. We show that, despite the problem being NP-hard, it is possible to find an optimal confidence band for some k. We do this by considering a different problem: discovering regularized bands, where we minimize the envelope area minus the number of included time series weighted by a parameter a. Unlike with normal confidence bands, we can solve this problem exactly by using a minimum cut. By varying a we can obtain solutions for various k. If we have a constraint k for which we cannot find an appropriate a, we demonstrate a simple algorithm that yields an O(sqrt(n)) approximation guarantee by connecting the problem to a minimum k-union problem. This connection also implies that we cannot approximate the problem better than O(n^(1/4)) under some (mild) assumptions. Finally, we consider a variant where instead of minimizing the area we minimize the maximum width. Here, we demonstrate a simple 2-approximation algorithm and show that we cannot achieve a better approximation guarantee.
Peer reviewed
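To make the objectives in the abstract above concrete, here is a minimal sketch of the envelope area, the regularized score, and an exhaustive search for the best band of k series. It assumes equal-length series sampled at the same time points and takes "area" to mean the sum of the envelope widths over those points. The exhaustive search is exponential in general and is not the minimum-cut or approximation algorithm from the paper. All function names are hypothetical.

```python
from itertools import combinations


def envelope_area(series):
    """Sum over time points of (max - min) across the given series."""
    return sum(max(col) - min(col) for col in zip(*series))


def regularized_score(series, a):
    """Envelope area minus a times the number of included series."""
    return envelope_area(series) - a * len(series)


def best_band_bruteforce(all_series, k):
    """Try every subset of k series and keep the one with the smallest area."""
    def area_of(indices):
        return envelope_area([all_series[i] for i in indices])

    best = min(combinations(range(len(all_series)), k), key=area_of)
    return list(best), area_of(best)
```

On four short series where one is a clear outlier, `best_band_bruteforce(series, 3)` returns the three series that exclude the outlier, which is the sense in which a band can flag an abnormal time series as a whole.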
Convex Hull of Points Lying on Lines in o(n log n) Time after Preprocessing
Motivated by the desire to cope with data imprecision, we study methods for
taking advantage of preliminary information about point sets in order to speed
up the computation of certain structures associated with them.
In particular, we study the following problem: given a set L of n lines in
the plane, we wish to preprocess L such that later, upon receiving a set P of n
points, each of which lies on a distinct line of L, we can construct the convex
hull of P efficiently. We show that in quadratic time and space it is possible
to construct a data structure on L that enables us to compute the convex hull
of any such point set P in O(n alpha(n) log* n) expected time. If we further
assume that the points are "oblivious" with respect to the data structure, the
running time improves to O(n alpha(n)). The analysis applies almost verbatim
when L is a set of line-segments, and yields similar asymptotic bounds. We
present several extensions, including a trade-off between space and query time
and an output-sensitive algorithm. We also study the "dual problem" where we
show how to efficiently compute the (<= k)-level of n lines in the plane, each
of which lies on a distinct point (given in advance).
We complement our results by Omega(n log n) lower bounds under the algebraic
computation tree model for several related problems, including sorting a set of
points (according to, say, their x-order), each of which lies on a given line
known in advance. Therefore, the convex hull problem under our setting is
easier than sorting, contrary to the "standard" convex hull and sorting
problems, in which the two problems require Theta(n log n) steps in the worst
case (under the algebraic computation tree model).
Comment: 26 pages, 5 figures, 1 appendix; a preliminary version appeared at
SoCG 201
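For comparison with the bounds in the abstract above, the "standard" convex hull computation without any preprocessing can be sketched with Andrew's monotone chain algorithm, which runs in O(n log n) time because of the initial sort. This is a textbook baseline, not the preprocessing data structure described in the paper; the function names are chosen for illustration.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (monotone chain)."""
    pts = sorted(set(points))  # the sort dominates the running time
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]
```

The sort is exactly the step that the lower bound discussion above concerns: the paper's setting avoids it by exploiting the lines known in advance.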
Net and Prune: A Linear Time Algorithm for Euclidean Distance Problems
We provide a general framework for getting expected linear time constant
factor approximations (and in many cases FPTAS's) to several well known
problems in Computational Geometry, such as k-center clustering and farthest
nearest neighbor. The new approach is robust to variations in the input
problem, and yet it is simple, elegant and practical. In particular, many of
these well studied problems which fit easily into our framework, either
previously had no linear time approximation algorithm, or required rather
involved algorithms and analysis. A short list of the problems we consider
includes farthest nearest neighbor, k-center clustering, smallest disk
enclosing k points, kth largest distance, kth smallest m-nearest
neighbor distance, kth heaviest edge in the MST and other spanning forest
type problems, problems involving upward closed set systems, and more. Finally,
we show how to extend our framework such that the linear running time bound
holds with high probability.
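As a point of reference for one of the problems listed above, k-center clustering has a classical greedy 2-approximation (farthest-point traversal, usually attributed to Gonzalez) that runs in O(nk) time. The sketch below shows that classical method only; it is not the net-and-prune framework of the paper, and the function name and the choice of the first input point as the initial center are assumptions.

```python
from math import dist


def greedy_k_center(points, k):
    """Pick k centers by repeatedly taking the point farthest from the centers so far.

    Returns (centers, radius), where radius is the largest distance from any
    point to its nearest chosen center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = [points[0]]  # arbitrary starting center
    nearest = [dist(p, centers[0]) for p in points]
    while len(centers) < k:
        far = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[far])
        # Update each point's distance to its nearest center.
        nearest = [min(d, dist(p, points[far])) for d, p in zip(nearest, points)]
    return centers, max(nearest)
```

With points `(0, 0)`, `(1, 0)`, `(10, 0)`, `(11, 0)` and k = 2, the traversal picks one center in each cluster and reports a radius of 1.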
NcorpiN : A software for N-body integration in collisional and fragmenting systems
NcorpiN is an N-body software developed for the time-efficient
integration of collisional and fragmenting systems of planetesimals or moonlets
orbiting a central mass. It features a fragmentation model, based on crater
scaling and ejecta models, able to realistically simulate a violent impact. The
user of NcorpiN can choose between four different built-in modules
to compute self-gravity and detect collisions. One of these makes use of a
mesh-based algorithm to treat mutual interactions in O(N) time.
Another module, much more efficient than the standard Barnes-Hut tree code, is
a tree-based algorithm called FalcON. It relies on fast
multipole expansion for gravity computation and we adapted it to collision
detection as well. Computation time is reduced by building the tree structure
using a three-dimensional Hilbert curve. For the same precision in mutual
gravity computation, NcorpiN is found to be up to 25 times faster
than the famous software REBOUND. NcorpiN is written entirely in
the C language and only needs a C compiler to run. A Python add-on, which
requires only basic Python libraries, produces animations of the simulations
from the output files. The name NcorpiN, reminiscent of a scorpion,
comes from the French N-corps, meaning N-body, and from the mathematical
notation O(N), due to the running time of the software being almost
linear in the total number N of moonlets. NcorpiN is designed
for the study of accreting or fragmenting disks of planetesimals or moonlets. It
detects collisions and computes mutual gravity faster than REBOUND, and unlike
other N-body integrators, it can resolve a collision by fragmentation. The
fast multipole expansions are implemented up to order six to allow for a high
precision in mutual gravity computation.
Comment: 29 pages, 6 figures
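The general idea behind mesh-based collision detection can be sketched with a uniform grid: bodies are hashed into cubic cells, and each body is tested only against bodies in its own and adjacent cells. This is a generic illustration under stated assumptions (spherical bodies, a cell size of twice the largest radius, hypothetical function and parameter names). It is written in Python for brevity, is not NcorpiN's C implementation, and reproduces none of its actual modules.

```python
from collections import defaultdict
from itertools import product
from math import dist, floor


def find_collisions(positions, radii):
    """Return sorted index pairs (i, j), i < j, whose spheres overlap."""
    if not positions:
        return []
    cell = 2.0 * max(radii)
    if cell <= 0:
        raise ValueError("at least one radius must be positive")
    # Two overlapping spheres have centers at most `cell` apart, so their
    # cell indices differ by at most 1 along each axis.
    grid = defaultdict(list)
    for i, p in enumerate(positions):
        grid[tuple(floor(c / cell) for c in p)].append(i)
    pairs = set()
    for (cx, cy, cz), members in grid.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j and dist(positions[i], positions[j]) <= radii[i] + radii[j]:
                        pairs.add((i, j))
    return sorted(pairs)
```

When bodies have comparable sizes and are not heavily clustered, each cell holds only a few bodies, so the total work grows roughly linearly with the number of bodies rather than quadratically.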