Polynomial kernels for 3-leaf power graph modification problems
A graph G=(V,E) is a 3-leaf power iff there exists a tree T whose leaves are
V and such that (u,v) is an edge of G iff u and v are at distance at most 3 in
T. The 3-leaf power graph edge modification problems, i.e. edition (also known
as the closest 3-leaf power), completion and edge-deletion, are FPT when
parameterized by the size of the edge modification set. However, no polynomial
kernel was known for any of these three problems. For each of them, we provide
a cubic kernel that can be computed in linear time.
We thereby answer an open problem first mentioned by Dom, Guo, Hüffner and
Niedermeier (2005).
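To make the definition concrete, here is a minimal Python sketch (not the
paper's kernelization) that builds the 3-leaf power of a given tree; the
adjacency-dict representation and the toy tree are illustrative assumptions.

```python
# Minimal sketch: the 3-leaf power of a tree T is the graph on T's
# leaves in which two leaves are adjacent iff their distance in T is
# at most 3. The tree representation (adjacency dict) is an assumption.
from collections import deque

def bfs_distances(adj, source):
    """Distance from `source` to every node of the tree `adj`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def three_leaf_power(adj):
    """Edge set of the 3-leaf power of the tree `adj`."""
    leaves = [u for u, nbrs in adj.items() if len(nbrs) == 1]
    edges = set()
    for u in leaves:
        dist = bfs_distances(adj, u)
        edges |= {(u, v) for v in leaves if u < v and dist[v] <= 3}
    return edges

# Toy example: a star with center 'c'; every pair of leaves is at
# distance 2, so the 3-leaf power is a triangle on {'a', 'b', 'd'}.
tree = {'c': ['a', 'b', 'd'], 'a': ['c'], 'b': ['c'], 'd': ['c']}
print(three_leaf_power(tree))  # {('a','b'), ('a','d'), ('b','d')}
```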
A Minimal Periods Algorithm with Applications
Kosaraju in ``Computation of squares in a string'' briefly described a
linear-time algorithm for computing the minimal squares starting at each
position in a word. Using the same construction of suffix trees, we generalize
his result and describe in detail how to compute, in O(k|w|) time, the minimal
k-th power with period of length larger than s starting at each position in a
word w, for an arbitrary exponent k and integer s. We provide a complete proof
of correctness of the algorithm, a point that is not entirely clear in
Kosaraju's original paper. The algorithm can be used as a subroutine to detect
certain types of pseudo-patterns in words, which was our original motivation
for studying this generalization.
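For concreteness, the following brute-force Python sketch computes the same
quantity naively, in O(|w|^2) time rather than the paper's O(k|w|), directly
from the definition; the function name and the default s = 0 are illustrative.

```python
# Brute-force sketch (quadratic, unlike the paper's O(k|w|) suffix-tree
# algorithm): for each position i of w, find the shortest k-th power
# x^k with period |x| > s starting at i.
def minimal_kth_powers(w, k, s=0):
    n = len(w)
    result = [None] * n  # result[i] = period of the minimal k-th power at i
    for i in range(n):
        for p in range(s + 1, (n - i) // k + 1):
            if w[i:i + k * p] == w[i:i + p] * k:
                result[i] = p
                break
    return result

# Squares (k = 2) in "abaabaab": position 0 starts ("aba")^2, so the
# first entry is 3; position 2 starts "aa" = ("a")^2, so it is 1.
print(minimal_kth_powers("abaabaab", 2))
# [3, 3, 1, None, None, 1, None, None]
```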
Hierarchical testing designs for pattern recognition
We explore the theoretical foundations of a ``twenty questions'' approach to
pattern recognition. The object of the analysis is the computational process
itself rather than probability distributions (Bayesian inference) or decision
boundaries (statistical learning). Our formulation is motivated by applications
to scene interpretation in which there are a great many possible explanations
for the data, one (``background'') is statistically dominant, and it is
imperative to restrict intensive computation to genuinely ambiguous regions.
The focus here is then on pattern filtering: Given a large set \mathcal Y of
possible patterns or explanations, narrow down the true one, Y, to a small
(random) subset \hat Y \subset \mathcal Y of ``detected'' patterns to be
subjected to further, more intensive, processing. To this end, we consider a
family of hypothesis tests for Y \in A versus the nonspecific alternatives
Y \in A^c. Each test has null type I error, and the candidate sets
A \subset \mathcal Y are arranged in a hierarchy of nested partitions. These
tests are then characterized by scope (|A|), power (or type
II error) and algorithmic cost. We consider sequential testing strategies in
which decisions are made iteratively, based on past outcomes, about which test
to perform next and when to stop testing. The set \hat Y is then taken to be
the set of patterns that have not been ruled out by the tests performed. The
total cost of a strategy is the sum of the ``testing cost'' and the
``postprocessing cost'' (proportional to |\hat Y|) and the corresponding
optimization problem is analyzed.
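The following Python sketch illustrates the coarse-to-fine sequential strategy
under stated assumptions: the hierarchy is a hypothetical explicit tree of
nested candidate sets, and `test` stands in for a hypothesis test with null
type I error, i.e. one that never rejects a cell containing the true pattern,
so pruning never discards the truth.

```python
# Coarse-to-fine sketch of the sequential testing strategy. A cell is
# (patterns, children): a candidate set A plus the sub-cells refining it.
# `test(A)` may return False (A ruled out) only if the true pattern is
# not in A -- the "null type I error" property assumed above.
def detected_set(cell, test):
    """Return the detected set: patterns not ruled out by any test."""
    patterns, children = cell
    if not test(patterns):      # A rejected: prune the whole subtree
        return set()
    if not children:            # leaf cell: all survivors are detected
        return set(patterns)
    detected = set()
    for child in children:      # refine: test each sub-cell in turn
        detected |= detected_set(child, test)
    return detected

# Toy example: patterns {1,...,4}, split twice; the (hypothetical) test
# rules out exactly the cells not containing the true pattern 3.
hierarchy = ({1, 2, 3, 4},
             [({1, 2}, [({1}, []), ({2}, [])]),
              ({3, 4}, [({3}, []), ({4}, [])])])
print(detected_set(hierarchy, lambda A: 3 in A))  # {3}
```

The testing cost is the number of `test` calls made, and the postprocessing
cost is proportional to the size of the returned set, matching the two cost
terms in the abstract.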
High-dimensional approximate nearest neighbor: k-d Generalized Randomized Forests
We propose a new data-structure, the generalized randomized kd forest, or
kgeraf, for approximate nearest neighbor searching in high dimensions. In
particular, we introduce new randomization techniques to specify a set of
independently constructed trees where search is performed simultaneously, hence
increasing accuracy. We omit backtracking, and we optimize distance
computations, thus accelerating queries. We release the public-domain software
geraf and compare it to existing implementations of state-of-the-art methods,
including BBD-trees, Locality Sensitive Hashing, randomized kd forests, and
product quantization. Experimental results indicate that our method would be
the method of choice in dimensions around 1,000, and probably up to 10,000, for
point sets of cardinality up to a few hundred thousand or even one million;
this range of inputs is encountered in many critical applications today. For
instance, we handle a real dataset of images represented in 960 dimensions with
a query time of less than sec on average and 90% of responses being true
nearest neighbors.
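As a rough illustration of the generic randomized kd-forest idea (a sketch,
not the authors' geraf implementation), the following Python code builds
several trees with randomly chosen split dimensions and answers a query by a
single root-to-leaf descent in each tree, with no backtracking, ranking the
pooled candidates by exact distance; all names and parameters are illustrative.

```python
# Randomized kd-forest sketch: independently randomized trees, each
# queried by one root-to-leaf descent (no backtracking); the pooled
# leaf buckets are then ranked by exact squared Euclidean distance.
import random

def build(points, leaf_size=8):
    if len(points) <= leaf_size:
        return ('leaf', points)
    axis = random.randrange(len(points[0]))  # randomized split dimension
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return ('node', axis, pts[mid][axis],
            build(pts[:mid], leaf_size), build(pts[mid:], leaf_size))

def descend(tree, q):
    while tree[0] == 'node':
        _, axis, split, left, right = tree
        tree = left if q[axis] < split else right
    return tree[1]                           # leaf bucket, no backtracking

def ann_query(forest, q):
    candidates = [p for tree in forest for p in descend(tree, q)]
    return min(candidates,
               key=lambda p: sum((a - b) ** 2 for a, b in zip(p, q)))

# Usage: 5 independently randomized trees over 1,000 random 16-d points.
data = [tuple(random.random() for _ in range(16)) for _ in range(1000)]
forest = [build(data) for _ in range(5)]
print(ann_query(forest, tuple(0.5 for _ in range(16))))
```

In this scheme accuracy comes from the number of independently randomized
trees rather than from backtracking: each extra tree raises the chance that
some leaf bucket contains the query's true nearest neighbor.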