Compression via Matroids: A Randomized Polynomial Kernel for Odd Cycle Transversal
The Odd Cycle Transversal problem (OCT) asks whether a given graph can be
made bipartite by deleting at most k of its vertices. In a breakthrough
result, Reed, Smith, and Vetta (Operations Research Letters, 2004) gave a
\BigOh(4^k kmn) time algorithm for it, the first algorithm whose runtime is
polynomial of uniform degree for every fixed k. It is known that this implies
a polynomial-time compression algorithm that turns OCT instances into
equivalent instances of size at most \BigOh(4^k), a so-called kernelization.
Since then, the existence of a polynomial kernel for OCT, i.e., a
kernelization with size bounded polynomially in k, has turned into one of the
main open questions in the study of kernelization.
This work provides the first (randomized) polynomial kernelization for OCT.
We introduce a novel kernelization approach based on matroid theory, where we
encode all relevant information about a problem instance into a matroid with
a representation of size polynomial in k. For OCT, the matroid is built to
allow us to simulate the computation of the iterative compression step of the
algorithm of Reed, Smith, and Vetta, applied (for only one round) to an
approximate odd cycle transversal which it is aiming to shrink to size k. The
process is randomized with one-sided error exponentially small in k: the
result can contain false positives but no false negatives, and the size
guarantee is cubic in the size of the approximate solution. Combined with an
\BigOh(\sqrt{\log n})-approximation (Agarwal et al., STOC 2005), we get a
reduction of the instance to size \BigOh(k^{4.5}), implying a randomized
polynomial kernelization.
Comment: Minor changes to agree with SODA 2012 version of the paper
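To make the problem being kernelized concrete, here is a minimal brute-force
sketch of the OCT decision problem in Python (assuming networkx); it simply
tries every vertex subset of size at most k, so it is exponential in n and
stands in for neither the \BigOh(4^k kmn) algorithm nor the matroid-based
compression above.

    # Brute-force Odd Cycle Transversal: toy inputs only.
    from itertools import combinations
    import networkx as nx

    def has_oct_of_size(G: nx.Graph, k: int) -> bool:
        """Return True iff deleting at most k vertices makes G bipartite."""
        for size in range(k + 1):
            for S in combinations(G.nodes, size):
                H = G.copy()
                H.remove_nodes_from(S)       # delete the candidate transversal
                if nx.is_bipartite(H):
                    return True
        return False

    G = nx.cycle_graph(5)          # an odd cycle is not bipartite
    print(has_oct_of_size(G, 0))   # False
    print(has_oct_of_size(G, 1))   # True: removing any one vertex suffices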
Compressive Embedding and Visualization using Graphs
Visualizing high-dimensional data has been a focus of the data analysis
community for decades, leading to the design of many algorithms, some of
which are now considered references (such as t-SNE, for example). In our era
of overwhelming data volumes, the scalability of such methods has become more
and more important. In this work, we present a method that allows any
visualization or embedding algorithm to be applied to very large datasets by
considering only a fraction of the data as input and then extending the
information to all data points using a graph encoding their global
similarity. We show that in most cases, using only a small fraction of
samples is sufficient to diffuse the information to all data points. In
addition, we propose quantitative methods to measure the quality of
embeddings and demonstrate the validity of our technique on both synthetic
and real-world datasets.
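As a rough illustration of the sample-then-extend idea, the sketch below
(function name and parameters are hypothetical, assuming scikit-learn)
embeds a random subsample with t-SNE and places every remaining point at the
inverse-distance-weighted mean of its nearest embedded samples; the paper
instead diffuses the information through a similarity graph over all points.

    import numpy as np
    from sklearn.manifold import TSNE
    from sklearn.neighbors import NearestNeighbors

    def sampled_embedding(X, n_samples=500, k=10, random_state=0):
        rng = np.random.default_rng(random_state)
        idx = rng.choice(len(X), size=min(n_samples, len(X)), replace=False)
        # embed only the subsample with the reference algorithm (t-SNE here)
        Y_s = TSNE(n_components=2, random_state=random_state).fit_transform(X[idx])
        # extend to all points: weighted average over nearest embedded samples
        nn = NearestNeighbors(n_neighbors=k).fit(X[idx])
        dist, nbrs = nn.kneighbors(X)
        w = 1.0 / (dist + 1e-12)                 # inverse-distance weights
        w /= w.sum(axis=1, keepdims=True)
        Y = (w[..., None] * Y_s[nbrs]).sum(axis=1)
        Y[idx] = Y_s                             # keep exact coords for samples
        return Y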
A practical fpt algorithm for Flow Decomposition and transcript assembly
The Flow Decomposition problem, which asks for the smallest set of weighted
paths that "covers" a flow on a DAG, has recently been used as an important
computational step in transcript assembly. We prove the problem is in FPT
when parameterized by the number of paths, by giving a practical linear fpt
algorithm. Further, we implement and engineer a Flow Decomposition solver
based on this algorithm and evaluate its performance on RNA sequencing data.
Crucially, our solver finds exact solutions while achieving runtimes
competitive with a state-of-the-art heuristic. Finally, we contextualize our
design choices with two hardness results related to preprocessing and weight
recovery. Specifically, k-Flow Decomposition does not admit polynomial
kernels under standard complexity assumptions, and the related problem of
assigning (known) weights to a given set of paths is NP-hard.
Comment: Introduces software package Toboggan: Version 1.0.
http://dx.doi.org/10.5281/zenodo.82163
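For intuition about what a flow decomposition is, here is a sketch of the
classic greedy widest-path heuristic, which repeatedly peels off a
maximum-bottleneck source-to-sink path; this is not the exact fpt algorithm
implemented in Toboggan, and it may use more paths than the optimum.

    from collections import defaultdict

    def greedy_flow_decomposition(edges, source, sink):
        """Greedily peel widest paths from a flow on a DAG.
        edges: dict mapping (u, v) -> nonnegative flow value."""
        flow = dict(edges)
        nodes = {x for e in edges for x in e}
        out, indeg = defaultdict(list), defaultdict(int)
        for (u, v) in edges:
            out[u].append(v)
            indeg[v] += 1
        # Kahn's algorithm for a topological order
        order, stack = [], [v for v in nodes if indeg[v] == 0]
        while stack:
            u = stack.pop()
            order.append(u)
            for v in out[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    stack.append(v)
        paths = []
        while True:
            # widest-path DP: width[v] = best bottleneck of any source->v path
            width = {v: 0 for v in nodes}
            width[source] = float("inf")
            pred = {}
            for u in order:
                for v in out[u]:
                    w = min(width[u], flow[(u, v)])
                    if w > width[v]:
                        width[v], pred[v] = w, u
            if width[sink] == 0:
                break
            # reconstruct the path and subtract its bottleneck from the flow
            path, v = [sink], sink
            while v != source:
                v = pred[v]
                path.append(v)
            path.reverse()
            for u, v in zip(path, path[1:]):
                flow[(u, v)] -= width[sink]
            paths.append((path, width[sink]))
        return paths

    # Example: a flow of value 5 that splits into two paths
    edges = {("s", "a"): 3, ("s", "b"): 2, ("a", "t"): 3, ("b", "t"): 2}
    print(greedy_flow_decomposition(edges, "s", "t"))
    # [(['s', 'a', 't'], 3), (['s', 'b', 't'], 2)]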
Support Vector Machines in High Energy Physics
This lecture will introduce the Support Vector algorithms for classification
and regression. They are an application of the so-called kernel trick, which
allows the extension of a certain class of linear algorithms to the nonlinear
case. The kernel trick will be introduced, and, in the context of structural
risk minimization, large-margin algorithms for classification and regression
will be presented. Current applications in high energy physics will be
discussed.
Comment: 11 pages, 12 figures. Part of the proceedings of the Track
'Computational Intelligence for HEP Data Analysis' at iCSC 200
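A minimal illustration of the kernel trick, assuming scikit-learn, on
synthetic concentric classes (standing in for a signal/background separation
task): the same large-margin classifier fails with a linear kernel but
separates the classes once an RBF kernel maps them to a space where a linear
boundary suffices.

    from sklearn.datasets import make_circles
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # two concentric rings: not linearly separable in the input space
    X, y = make_circles(n_samples=500, noise=0.1, factor=0.4, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for kernel in ("linear", "rbf"):
        clf = SVC(kernel=kernel, C=1.0).fit(X_tr, y_tr)
        print(kernel, clf.score(X_te, y_te))   # rbf should score far higher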
Filling in CMB map missing data using constrained Gaussian realizations
For analyzing maps of the cosmic microwave background sky, it is necessary to
mask out the region around the galactic equator, where the parasitic
foreground emission is strongest, as well as the brightest compact sources.
Since many of the analyses of the data, particularly those searching for
non-Gaussianity of a primordial origin, are most straightforwardly carried
out on full-sky maps, it is of great interest to develop efficient algorithms
for filling in the missing information in a plausible way. We explore
practical algorithms for filling in the masked regions based on constrained
Gaussian realizations. Although carrying out such realizations is in
principle straightforward, for finely pixelized maps, such as those required
for the Planck analysis, a direct brute-force method is not numerically
tractable. We present some concrete solutions to this problem, both on a
spatially flat sky with periodic boundary conditions and on the pixelized
sphere. One approach is to solve the linear system with an appropriately
preconditioned conjugate gradient method. While this approach was
successfully implemented on a rectangular domain with periodic boundary
conditions and worked even for very wide masked regions, we found that the
method failed on the pixelized sphere for reasons that we explain here. We
present an approach that works for full-sky pixelized maps on the sphere,
involving a kernel-based multi-resolution Laplace solver followed by a series
of conjugate gradient corrections near the boundary of the mask.
Comment: 22 pages, 14 figures, minor changes, a few missing references added
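As a much-simplified sketch of the flat-sky case (assuming numpy and scipy;
the function is hypothetical), the code below fills a masked region by
solving the discrete Laplace equation with conjugate gradients on a periodic
grid; a genuine constrained Gaussian realization would additionally draw a
fluctuation term consistent with the CMB power spectrum.

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, cg

    def harmonic_fill(image, mask):
        """Fill masked pixels so the discrete Laplacian vanishes inside the
        mask; known pixels act as boundary data, boundaries are periodic."""
        filled = image.astype(float)
        filled[mask] = 0.0
        idx = np.flatnonzero(mask.ravel())

        def neighbor_sum(full):
            return (np.roll(full, 1, 0) + np.roll(full, -1, 0)
                    + np.roll(full, 1, 1) + np.roll(full, -1, 1))

        def matvec(x):
            # left-hand side: 4*u_p minus the masked-neighbor contribution
            full = np.zeros(image.shape)
            full.ravel()[idx] = x
            return 4.0 * x - neighbor_sum(full).ravel()[idx]

        # right-hand side: contribution of the known (unmasked) neighbors
        b = neighbor_sum(filled).ravel()[idx]
        A = LinearOperator((idx.size, idx.size), matvec=matvec)
        u, info = cg(A, b)
        filled.ravel()[idx] = u
        return filled

    # toy demo: mask a disc out of a random field and fill it in
    rng = np.random.default_rng(0)
    img = rng.standard_normal((64, 64))
    yy, xx = np.mgrid[:64, :64]
    mask = (yy - 32) ** 2 + (xx - 32) ** 2 < 100
    print(np.abs(harmonic_fill(img, mask)[mask]).max())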