Self-improving Algorithms for Coordinate-wise Maxima
Computing the coordinate-wise maxima of a planar point set is a classic and
well-studied problem in computational geometry. We give an algorithm for this
problem in the \emph{self-improving setting}. We have n independent
distributions \cD_1, \cD_2, ..., \cD_n of planar points. An input pointset
is generated by taking an independent sample from
each \cD_i, so the input distribution \cD is the product \prod_i \cD_i. A
self-improving algorithm repeatedly gets input sets from the distribution \cD
(which is \emph{a priori} unknown) and tries to optimize its running time for
\cD. Our algorithm uses the first few inputs to learn salient features of the
distribution, and then becomes an optimal algorithm for distribution \cD. Let
\text{OPT}_\cD denote the expected depth of an \emph{optimal} linear comparison
tree computing the maxima for distribution \cD. Our algorithm eventually has
an expected running time of O(\text{OPT}_\cD + n), even though it did not
know \cD to begin with.
Our result requires new tools to understand linear comparison trees for
computing maxima. We show how to convert general linear comparison trees to
very restricted versions, which can then be related to the running time of our
algorithm. An interesting feature of our algorithm is an interleaved search,
where the algorithm tries to determine the likeliest point to be maximal with
minimal computation. This allows the running time to be truly optimal for the
distribution \cD.
Comment: To appear in Symposium of Computational Geometry 2012 (17 pages, 2
figures)
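For orientation, the problem itself (not the self-improving algorithm of the paper) can be solved by a classic sort-and-sweep in O(n log n) time. The sketch below is only that baseline; the function name `planar_maxima` and the tuple representation of points are assumptions made for illustration, and input points are assumed distinct.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def planar_maxima(points: List[Point]) -> List[Point]:
    """Return the coordinate-wise maxima of a planar point set.

    A point is maximal if no other point is at least as large in both
    coordinates. This is the textbook sort-and-sweep baseline, not the
    self-improving algorithm described in the abstract.
    """
    maxima: List[Point] = []
    best_y = float("-inf")
    # Visit points from right to left (ties: higher y first). A point is
    # maximal exactly when its y exceeds every y seen so far.
    for x, y in sorted(points, key=lambda p: (-p[0], -p[1])):
        if y > best_y:
            maxima.append((x, y))
            best_y = y
    return maxima


if __name__ == "__main__":
    print(planar_maxima([(1, 4), (2, 3), (3, 1), (2, 2), (0, 0)]))
```

The sort dominates the running time; a self-improving algorithm aims to beat this bound in expectation when the input distribution has low entropy.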
RGFGA: An efficient representation and crossover for grouping genetic algorithms
There is substantial research into genetic algorithms that are used to group large numbers of
objects into mutually exclusive subsets based upon some fitness function. However, nearly all
methods involve degeneracy to some degree.
We introduce a new representation for grouping genetic algorithms, the restricted growth function
genetic algorithm, that effectively removes all degeneracy, resulting in a more efficient search.
A new crossover operator is also described that exploits a measure of similarity between
chromosomes in a population. Using several synthetic datasets, we compare the performance of
our representation and crossover with another well-known state-of-the-art GA method, a strawman
optimisation method and a well-established statistical clustering algorithm, with encouraging results.
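To illustrate why a restricted growth function removes degeneracy, the sketch below relabels an arbitrary group assignment into the standard restricted growth string, in which the first object gets label 0 and each later label exceeds the largest earlier label by at most 1. The function names and the 0-based labelling are assumptions; the abstract does not give the paper's exact encoding or its crossover operator.

```python
from typing import Hashable, List, Sequence


def to_restricted_growth(labels: Sequence[Hashable]) -> List[int]:
    """Relabel groups in order of first appearance (0, 1, 2, ...).

    Two label vectors describing the same partition map to the same
    output, so each grouping has exactly one representation.
    """
    mapping = {}
    out = []
    for lab in labels:
        if lab not in mapping:
            mapping[lab] = len(mapping)  # next unused group number
        out.append(mapping[lab])
    return out


def is_restricted_growth(seq: Sequence[int]) -> bool:
    """Check that seq[i] <= max(seq[:i]) + 1, with the first entry equal to 0."""
    highest = -1
    for v in seq:
        if v < 0 or v > highest + 1:
            return False
        highest = max(highest, v)
    return True


if __name__ == "__main__":
    # Both vectors encode the partition {0, 2}, {1}, {3}.
    print(to_restricted_growth([2, 0, 2, 1]))
    print(to_restricted_growth([5, 7, 5, 9]))
```

Both calls print the same string, which is the point: chromosomes that differ only by a renaming of groups collapse to one canonical form.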
RRR: Rank-Regret Representative
Selecting the best items in a dataset is a common task in data exploration.
However, the concept of "best" lies in the eyes of the beholder: different
users may consider different attributes more important, and hence arrive at
different rankings. Nevertheless, one can remove "dominated" items and create a
"representative" subset of the data set, comprising the "best items" in it. A
Pareto-optimal representative is guaranteed to contain the best item of each
possible ranking, but it can be almost as big as the full data. A smaller
representative can be found if we relax the requirement to include the best item for every
possible user, and instead just limit the users' "regret". Existing work
defines regret as the loss in score by limiting consideration to the
representative instead of the full data set, for any chosen ranking function.
However, the score is often not a meaningful number and users may not
understand its absolute value. Sometimes small ranges in score can include
large fractions of the data set. In contrast, users do understand the notion of
rank ordering. Therefore, alternatively, we consider the position of the items
in the ranked list for defining the regret and propose the {\em rank-regret
representative} as the minimal subset of the data containing at least one of
the top-k of any possible ranking function. This problem is NP-complete. We
use the geometric interpretation of items to bound their ranks on ranges of
functions and to utilize combinatorial geometry notions for developing
effective and efficient approximation algorithms for the problem. Experiments
on real datasets demonstrate that we can efficiently find small subsets with
small rank-regrets.
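The notion of rank-regret can be made concrete with a small sketch. It assumes linear ranking functions given as weight vectors and evaluates only a finite sample of them, whereas the abstract's definition ranges over every possible ranking function; the name `rank_regret` and its signature are hypothetical.

```python
from typing import Sequence


def rank_regret(
    items: Sequence[Sequence[float]],
    subset_idx: Sequence[int],
    weight_vectors: Sequence[Sequence[float]],
) -> int:
    """Worst rank, over the sampled linear functions, of the subset's best item.

    Rank 1 means the subset contains the top item for that function.
    Assumes subset_idx is non-empty and higher scores are better.
    """
    worst = 0
    for w in weight_vectors:
        scores = [sum(wi * xi for wi, xi in zip(w, item)) for item in items]
        best_in_subset = max(scores[i] for i in subset_idx)
        # Rank = 1 + number of items that strictly beat the subset's best.
        rank = 1 + sum(1 for s in scores if s > best_in_subset)
        worst = max(worst, rank)
    return worst


if __name__ == "__main__":
    items = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6), (0.2, 0.2)]
    weights = [(1, 0), (0, 1), (0.5, 0.5)]
    print(rank_regret(items, [2], weights))
```

In this toy data the single item (0.6, 0.6) is never worse than second place for the sampled functions, so a subset of size one has rank-regret 2 on the sample.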
Phase-space structures II: Hierarchical Structure Finder
A new multi-dimensional Hierarchical Structure Finder (HSF) to study the
phase-space structure of dark matter in N-body cosmological simulations is
presented. The algorithm depends mainly on two parameters, which control the
level of connectivity of the detected structures and their significance
compared to Poisson noise. By working in 6D phase-space, where contrasts are
much more pronounced than in 3D position space, our HSF algorithm is capable of
detecting subhaloes including their tidal tails, and can recognise other
phase-space structures such as pure streams and candidate caustics. If an
additional unbinding criterion is added, the algorithm can be used as a
self-consistent halo and subhalo finder. As a test, we apply it to a large halo
of the Millennium Simulation, where 19 % of the halo mass is found to belong
to bound substructures, which is more than what is detected with conventional
3D substructure finders, and an additional 23-36 % of the total mass belongs to
unbound HSF structures. The distribution of identified phase-space density
peaks is clearly bimodal: high peaks are dominated by the bound structures and
low peaks belong mostly to tidal streams. In order to better understand what
HSF provides, we examine the time evolution of structures, based on the merger
tree history. Bound structures typically make only up to 6 orbits inside the
main halo. Still, HSF can identify at the present time at least 80 % of the
original content of structures with a redshift of infall as high as z <= 0.3,
which illustrates the significant power of this tool to perform dynamical
analyses in phase-space.
Comment: Submitted to MNRAS, 24 pages, 18 figures
On smoothed analysis of quicksort and Hoare's find
We provide a smoothed analysis of Hoare's find algorithm, and we revisit the smoothed analysis of quicksort. Hoare's find algorithm - often called quickselect or one-sided quicksort - is an easy-to-implement algorithm for finding the k-th smallest element of a sequence. While the worst-case number of comparisons that Hoare's find needs is Theta(n^2), the average-case number is Theta(n). We analyze what happens between these two extremes by providing a smoothed analysis. In the first perturbation model, an adversary specifies a sequence of n numbers in [0,1], and then, to each number of the sequence, we add a random number drawn independently from the interval [0,d]. We prove that Hoare's find needs Theta(n/(d+1) sqrt(n/d) + n) comparisons in expectation if the adversary may also specify the target element (even after seeing the perturbed sequence) and slightly fewer comparisons for finding the median. In the second perturbation model, each element is marked with a probability of p, and then a random permutation is applied to the marked elements. We prove that the expected number of comparisons to find the median is Omega((1−p)n/p log n). Finally, we provide lower bounds for the smoothed number of comparisons of quicksort and Hoare's find for the median-of-three pivot rule, which usually yields faster algorithms than always selecting the first element: the pivot is the median of the first, middle, and last element of the sequence. We show that median-of-three does not yield a significant improvement over the classic rule.
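A minimal sketch of Hoare's find with the classic first-element pivot rule, instrumented to count comparisons, together with the first perturbation model (independent additive noise from [0,d]). The names `hoare_find` and `perturb`, the 0-based rank k, and the list-based partitioning are assumptions chosen for readability; the paper's analysis does not depend on them.

```python
import random
from typing import List, Sequence, Tuple


def hoare_find(seq: Sequence[float], k: int) -> Tuple[float, int]:
    """Return (k-th smallest element, number of comparisons), k 0-based.

    The pivot is always the first element of the current subsequence,
    and each remaining element is compared with the pivot once.
    """
    items = list(seq)
    if not 0 <= k < len(items):
        raise ValueError("k out of range")
    comparisons = 0
    while True:
        pivot = items[0]
        smaller: List[float] = []
        larger: List[float] = []
        for x in items[1:]:
            comparisons += 1
            (smaller if x < pivot else larger).append(x)
        if k < len(smaller):
            items = smaller
        elif k == len(smaller):
            return pivot, comparisons
        else:
            k -= len(smaller) + 1  # skip the smaller elements and the pivot
            items = larger


def perturb(seq: Sequence[float], d: float, rng: random.Random) -> List[float]:
    """Add independent uniform noise from [0, d] to every element."""
    return [x + rng.uniform(0.0, d) for x in seq]


if __name__ == "__main__":
    n = 10
    # Sorted input with target = maximum is a worst case for this pivot rule.
    print(hoare_find(range(n), n - 1))
    noisy = perturb([i / n for i in range(n)], 0.5, random.Random(0))
    print(hoare_find(noisy, n // 2))
```

On the sorted input every round discards only the pivot, giving n(n-1)/2 comparisons; running the same target on perturbed copies with growing d shows the count moving toward the linear average case.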
G-SELC: Optimization by sequential elimination of level combinations using genetic algorithms and Gaussian processes
Identifying promising compounds from a vast collection of feasible compounds
is an important and yet challenging problem in the pharmaceutical industry. An
efficient solution to this problem will help reduce the expenditure at the
early stages of drug discovery. In an attempt to solve this problem, Mandal, Wu
and Johnson [Technometrics 48 (2006) 273--283] proposed the SELC algorithm.
Although powerful, it fails to extract substantial information from the data to
guide the search efficiently, as this methodology is not based on any
statistical modeling. The proposed approach uses Gaussian Process (GP) modeling
to improve upon SELC, and is hence named G-SELC. The performance of
the proposed methodology is illustrated using four and five dimensional test
functions. Finally, we implement the new algorithm on a real pharmaceutical
data set for finding a group of chemical compounds with optimal properties.
Comment: Published at http://dx.doi.org/10.1214/08-AOAS199 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)