
    Compression via Matroids: A Randomized Polynomial Kernel for Odd Cycle Transversal

    The Odd Cycle Transversal problem (OCT) asks whether a given graph can be made bipartite by deleting at most k of its vertices. In a breakthrough result, Reed, Smith, and Vetta (Operations Research Letters, 2004) gave an O(4^k kmn)-time algorithm for it, the first algorithm whose runtime is polynomial of uniform degree for every fixed k. It is known that this implies a polynomial-time compression algorithm that turns OCT instances into equivalent instances of size at most O(4^k), a so-called kernelization. Since then, the existence of a polynomial kernel for OCT, i.e., a kernelization with size bounded polynomially in k, has become one of the main open questions in the study of kernelization. This work provides the first (randomized) polynomial kernelization for OCT. We introduce a novel kernelization approach based on matroid theory, in which all relevant information about a problem instance is encoded into a matroid with a representation of size polynomial in k. For OCT, the matroid is built to let us simulate the computation of the iterative compression step of the algorithm of Reed, Smith, and Vetta, applied (for only one round) to an approximate odd cycle transversal which it is aiming to shrink to size k. The process is randomized with one-sided error exponentially small in k: the result can contain false positives but no false negatives, and the size guarantee is cubic in the size of the approximate solution. Combined with an O(√(log n))-approximation (Agarwal et al., STOC 2005), we get a reduction of the instance to size O(k^{4.5}), implying a randomized polynomial kernelization.
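    As a rough, back-of-the-envelope reading of how the pieces above combine (this arithmetic is an illustration, not quoted from the abstract): the O(√(log n))-approximation returns an odd cycle transversal of size

        s = O(k · √(log n)) ≤ O(k^{1.5})    whenever log n ≤ k,

    so a kernel cubic in s has size s^3 = O(k^{4.5}); in the remaining case log n > k, one has 4^k < n^2, and the O(4^k kmn)-time algorithm of Reed, Smith, and Vetta already decides the instance in polynomial time.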

    Compressive Embedding and Visualization using Graphs

    Visualizing high-dimensional data has been a focus of the data analysis community for decades, leading to the design of many algorithms, some of which are now considered references (such as t-SNE, for example). In our era of overwhelming data volumes, the scalability of such methods has become more and more important. In this work, we present a method that allows any visualization or embedding algorithm to be applied to very large datasets by considering only a fraction of the data as input and then extending the information to all data points using a graph encoding their global similarity. We show that in most cases, using only O(log N) samples is sufficient to diffuse the information to all N data points. In addition, we propose quantitative methods to measure the quality of embeddings and demonstrate the validity of our technique on both synthetic and real-world datasets.
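    The abstract above does not spell out the extension step; the sketch below is one plausible reading of the pipeline (embed an O(log N) subsample with t-SNE, then propagate embedded coordinates to the remaining points over a nearest-neighbour graph), not the authors' implementation. The function name embed_subsample_and_extend and all constants are made up for illustration.

import numpy as np
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors

def embed_subsample_and_extend(X, n_samples=None, n_neighbors=10, seed=0):
    """Embed a random subsample with t-SNE, then place every remaining point
    at the mean embedded position of its nearest sampled neighbours."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    if n_samples is None:
        # O(log N) samples; the constant factor here is an arbitrary choice.
        n_samples = max(50, int(20 * np.log(n)))
    idx = rng.choice(n, size=min(n_samples, n), replace=False)
    Y_sub = TSNE(n_components=2,
                 perplexity=min(30, len(idx) - 1),
                 random_state=seed).fit_transform(X[idx])
    # Connect every point to its nearest sampled points and take one
    # averaging (diffusion) step of the embedded coordinates along the graph.
    k = min(n_neighbors, len(idx))
    nn = NearestNeighbors(n_neighbors=k).fit(X[idx])
    _, nbrs = nn.kneighbors(X)
    Y = Y_sub[nbrs].mean(axis=1)
    Y[idx] = Y_sub  # sampled points keep their exact embedded positions
    return Y

# Example usage on random data:
#   Y = embed_subsample_and_extend(np.random.rand(10000, 50))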

    A practical fpt algorithm for Flow Decomposition and transcript assembly

    The Flow Decomposition problem, which asks for the smallest set of weighted paths that "covers" a flow on a DAG, has recently been used as an important computational step in transcript assembly. We prove the problem is in FPT when parameterized by the number of paths by giving a practical linear-time fpt algorithm. Further, we implement and engineer a Flow Decomposition solver based on this algorithm and evaluate its performance on RNA-sequence data. Crucially, our solver finds exact solutions while achieving runtimes competitive with a state-of-the-art heuristic. Finally, we contextualize our design choices with two hardness results related to preprocessing and weight recovery. Specifically, k-Flow Decomposition does not admit polynomial kernels under standard complexity assumptions, and the related problem of assigning (known) weights to a given set of paths is NP-hard. The paper introduces the software package Toboggan, Version 1.0. http://dx.doi.org/10.5281/zenodo.82163
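    For intuition about what a flow decomposition is, the snippet below implements a simple greedy widest-path baseline on a DAG whose vertices are given in topological order. This is explicitly not the paper's linear fpt algorithm: the greedy strategy is a common heuristic and can use more paths than the minimum.

def greedy_flow_decomposition(nodes, flow, source, sink):
    """Greedy widest-path decomposition of a flow on a DAG.
    `nodes` is a topological order; `flow[u][v]` is the flow on edge (u, v).
    Repeatedly extracts the source-to-sink path with the largest bottleneck."""
    flow = {u: dict(adj) for u, adj in flow.items()}  # work on a copy
    paths = []
    while any(f > 0 for adj in flow.values() for f in adj.values()):
        # DP over the topological order: width[v] is the largest bottleneck
        # of any source->v path; pred[v] remembers how to reach v.
        width = {v: 0 for v in nodes}
        width[source] = float("inf")
        pred = {}
        for u in nodes:
            for v, f in flow.get(u, {}).items():
                w = min(width[u], f)
                if w > width[v]:
                    width[v] = w
                    pred[v] = u
        w = width[sink]
        if w <= 0:
            break
        # Recover the widest path and subtract its weight from every edge.
        path = [sink]
        while path[-1] != source:
            path.append(pred[path[-1]])
        path.reverse()
        for u, v in zip(path, path[1:]):
            flow[u][v] -= w
        paths.append((w, path))
    return paths

# Example: a flow of value 5 that decomposes into two weighted paths.
#   nodes = ["s", "a", "b", "t"]
#   flow = {"s": {"a": 3, "b": 2}, "a": {"t": 3}, "b": {"t": 2}, "t": {}}
#   greedy_flow_decomposition(nodes, flow, "s", "t")
#   -> [(3, ['s', 'a', 't']), (2, ['s', 'b', 't'])]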

    Support Vector Machines in High Energy Physics

    This lecture will introduce the Support Vector algorithms for classification and regression. They are an application of the so-called kernel trick, which allows the extension of a certain class of linear algorithms to the non-linear case. The kernel trick will be introduced, and, in the context of structural risk minimization, large-margin algorithms for classification and regression will be presented. Current applications in high energy physics will be discussed.
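    As a minimal illustration of the kernel trick the lecture is about (not code from the lecture itself): the same large-margin SVM optimisation is run twice on data that is not linearly separable in the input space, once with a linear kernel and once with an RBF kernel, which corresponds to an implicit non-linear feature map.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric circles: not linearly separable in the original coordinates.
X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0).fit(X_train, y_train)
    print(kernel, clf.score(X_test, y_test))

# The optimisation problem is identical in both cases; only the inner product
# x·x' is replaced by the kernel k(x, x'), which is the kernel trick.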

    Filling in CMB map missing data using constrained Gaussian realizations

    For analyzing maps of the cosmic microwave background sky, it is necessary to mask out the region around the galactic equator, where the parasitic foreground emission is strongest, as well as the brightest compact sources. Since many analyses of the data, particularly those searching for non-Gaussianity of primordial origin, are most straightforwardly carried out on full-sky maps, it is of great interest to develop efficient algorithms for filling in the missing information in a plausible way. We explore practical filling-in algorithms based on constrained Gaussian realizations. Although carrying out such realizations is in principle straightforward, a direct brute-force method is not numerically tractable for finely pixelized maps such as those required for the Planck analysis. We present some concrete solutions to this problem, both on a spatially flat sky with periodic boundary conditions and on the pixelized sphere. One approach is to solve the linear system with an appropriately preconditioned conjugate gradient method. While this approach was successfully implemented on a rectangular domain with periodic boundary conditions and worked even for very wide masked regions, we found that the method fails on the pixelized sphere, for reasons that we explain here. We present an approach that works for full-sky pixelized maps on the sphere, involving a kernel-based multi-resolution Laplace solver followed by a series of conjugate gradient corrections near the boundary of the mask.
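    As a toy companion to the abstract, the sketch below draws a constrained Gaussian realization on a 1D periodic domain, solving the masked-pixel system with (unpreconditioned) conjugate gradients. It is a Hoffman-Ribak style fill-in under an arbitrary power spectrum and mask, far simpler than the flat-sky and spherical multi-resolution solvers the paper actually develops.

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

n = 512
k = np.fft.fftfreq(n) * n                       # integer wavenumbers on the ring
power = 1.0 / (1.0 + (np.abs(k) / 10.0) ** 2)   # arbitrary smooth power spectrum

def apply_S(v):
    """Apply the stationary (circulant) covariance S via FFTs."""
    return np.fft.ifft(power * np.fft.fft(v)).real

rng = np.random.default_rng(1)
truth = np.fft.ifft(np.sqrt(power) * np.fft.fft(rng.standard_normal(n))).real
mask = np.zeros(n, dtype=bool)
mask[200:280] = True                            # pixels to fill in
obs = ~mask
data = truth[obs]

# Hoffman-Ribak style constrained realization:
#   x = s + S[:, obs] (S[obs, obs])^{-1} (data - s[obs])
# where s is an unconstrained realization. The inner solve uses CG with a
# matvec that embeds onto the full ring, applies S, and restricts to obs.
s = np.fft.ifft(np.sqrt(power) * np.fft.fft(rng.standard_normal(n))).real

def S_oo_matvec(y):
    full = np.zeros(n)
    full[obs] = y
    return apply_S(full)[obs]

A = LinearOperator((int(obs.sum()), int(obs.sum())), matvec=S_oo_matvec, dtype=float)
alpha, info = cg(A, data - s[obs])
full_alpha = np.zeros(n)
full_alpha[obs] = alpha
filled = s + apply_S(full_alpha)
# `filled` agrees with `data` on observed pixels (up to CG tolerance) and is a
# plausible Gaussian draw inside the masked region.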