    Efficient Compression Technique for Sparse Sets

    Recent technological advancements have led to the generation of huge amounts of data over the web, such as text, images, audio and video. Most of this data is high-dimensional and sparse, e.g., the bag-of-words representation used for representing text. Many applications, such as clustering, nearest-neighbour search, ranking and indexing, need an efficient search for similar data points. Even though computational power has increased significantly, a simple brute-force similarity search on such datasets is inefficient and at times impossible. Thus, it is desirable to obtain a compressed representation that preserves the similarity between data points. In this work, we consider the data points as sets and use the Jaccard similarity as the similarity measure. Compression techniques are generally evaluated on the following parameters: 1) the randomness required for compression, 2) the time required for compression, 3) the dimension of the data after compression, and 4) the space required to store the compressed data. Ideally, the compressed representation of the data should preserve the similarity between each pair of data points, while keeping the time and the randomness required for compression as low as possible. We show that the compression technique suggested by Pratap and Kulkarni also works well for the Jaccard similarity. We present a theoretical proof of this and complement it with rigorous experiments on synthetic as well as real-world datasets. We also compare our results with the state-of-the-art "min-wise independent permutation", and show that our compression algorithm achieves almost equal accuracy while significantly reducing the compression time and the randomness required.
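    For context, the "min-wise independent permutation" baseline compared against here can be sketched in a few lines: the exact Jaccard similarity of two sets, and a MinHash signature whose coordinate-wise agreement rate estimates it. The Python sketch below is illustrative only; it uses random linear hash functions as a simplified stand-in for truly min-wise independent permutations and is not the paper's compression scheme.

```python
import random

def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity |A intersect B| / |A union B|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def minhash_signature(s: set, num_hashes: int = 128, seed: int = 0) -> list:
    """MinHash sketch: the minimum of a random hash over the set,
    once per hash function. Random linear hashes stand in for
    min-wise independent permutations (an illustrative baseline)."""
    rng = random.Random(seed)
    prime = (1 << 61) - 1  # a large Mersenne prime
    params = [(rng.randrange(1, prime), rng.randrange(prime))
              for _ in range(num_hashes)]
    return [min((a * x + b) % prime for x in s) for a, b in params]

def estimated_jaccard(sig1: list, sig2: list) -> float:
    """The fraction of agreeing coordinates estimates the Jaccard
    similarity of the underlying sets."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)

# Two sets of word indices with true Jaccard similarity 40/120 = 1/3.
A, B = set(range(0, 80)), set(range(40, 120))
print(jaccard(A, B))
print(estimated_jaccard(minhash_signature(A), minhash_signature(B)))
```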

    From Gap-ETH to FPT-Inapproximability: Clique, Dominating Set, and More

    We consider questions that arise from the intersection between the areas of polynomial-time approximation algorithms, subexponential-time algorithms, and fixed-parameter tractable algorithms. The questions, which have been asked several times (e.g., [Marx08, FGMS12, DF13]), are whether there is a non-trivial FPT-approximation algorithm for the Maximum Clique (Clique) and Minimum Dominating Set (DomSet) problems parameterized by the size of the optimal solution. In particular, letting $\mathrm{OPT}$ be the optimum and $N$ be the size of the input, is there an algorithm that runs in $t(\mathrm{OPT}) \cdot \mathrm{poly}(N)$ time and outputs a solution of size $f(\mathrm{OPT})$, for any functions $t$ and $f$ that are independent of $N$ (for Clique, we want $f(\mathrm{OPT}) = \omega(1)$)? In this paper, we show that both Clique and DomSet admit no non-trivial FPT-approximation algorithm, i.e., there is no $o(\mathrm{OPT})$-FPT-approximation algorithm for Clique and no $f(\mathrm{OPT})$-FPT-approximation algorithm for DomSet, for any function $f$ (e.g., this holds even if $f$ is the Ackermann function). In fact, our results imply something even stronger: the best way to solve Clique and DomSet, even approximately, is to essentially enumerate all possibilities. Our results hold under the Gap Exponential Time Hypothesis (Gap-ETH) [Dinur16, MR16], which states that no $2^{o(n)}$-time algorithm can distinguish between a satisfiable 3SAT formula and one which is not even $(1 - \epsilon)$-satisfiable for some constant $\epsilon > 0$. Besides Clique and DomSet, we also rule out non-trivial FPT-approximation for Maximum Balanced Biclique, Maximum Subgraphs with Hereditary Properties, and Maximum Induced Matching in bipartite graphs. Additionally, we rule out any $k^{o(1)}$-FPT-approximation algorithm for Densest $k$-Subgraph, although this ratio does not yet match the trivial $O(k)$-approximation algorithm. (Comment: 43 pages. To appear in FOCS'17.)
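    To see what "essentially enumerate all possibilities" means here, the trivial exact (and hence also approximate) algorithm for Clique tries every $k$-subset of vertices, taking roughly $N^{O(k)}$ time; the lower bound says that under Gap-ETH nothing substantially faster is possible. A minimal Python sketch of that trivial enumeration, assuming a graph given as adjacency sets (illustrative only):

```python
from itertools import combinations

def has_clique(adj: dict, k: int) -> bool:
    """Brute-force test for a clique of size k.

    adj maps each vertex to the set of its neighbours. Runs in
    roughly C(N, k) * k^2 time: the trivial enumeration that the
    Gap-ETH lower bound says cannot be substantially beaten, even
    if we only ask for an approximate solution.
    """
    for subset in combinations(list(adj), k):
        if all(v in adj[u] for u, v in combinations(subset, 2)):
            return True
    return False

# A triangle plus a pendant vertex: the largest clique has size 3.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(has_clique(graph, 3))  # True
print(has_clique(graph, 4))  # False
```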

    Symbolic Execution Game Semantics

    We present a framework for symbolically executing and model checking higher-order programs with external (open) methods. We focus on the client-library paradigm, and in particular we aim to check libraries with respect to any definable client. We combine traditional symbolic execution techniques with operational game semantics to build a symbolic execution semantics that captures arbitrary external behaviour. We prove the symbolic semantics to be sound and complete. This yields a bounded technique by imposing bounds on the depth of recursion and callbacks. We provide an implementation of our technique in the K framework and showcase its performance on a custom benchmark based on higher-order coding errors such as reentrancy bugs. (Comment: 41 pages, 5 figures.)
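    To illustrate the kind of higher-order coding error the benchmark targets, consider a library method that invokes a client-supplied callback while its internal invariant is temporarily broken. The following Python sketch is a hypothetical example in the spirit of reentrancy bugs, not code from the paper's benchmark:

```python
class Account:
    """A toy library with a reentrancy bug: the client-supplied
    callback runs while the balance update is still pending, so a
    reentrant call observes (and can exploit) a stale balance."""

    def __init__(self, balance: int):
        self.balance = balance

    def withdraw(self, amount: int, on_success) -> None:
        if self.balance >= amount:
            on_success(amount)        # external call before the update
            self.balance -= amount    # invariant restored too late

account = Account(100)

def evil_callback(amount: int) -> None:
    # Re-enters the library: the balance check passes again because
    # the first withdrawal has not been applied yet.
    if account.balance >= amount:
        account.withdraw(amount, lambda _: None)

account.withdraw(100, evil_callback)
print(account.balance)  # -100: the invariant balance >= 0 is violated
```

    Checking the library against "any definable client" means such an adversarial callback must be accounted for, which is exactly what the operational game semantics supplies.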

    Fifth Biennial Report: June 1999 - August 2001

    RAVEN: Reinforcement Learning for Generating Verifiable Run-Time Requirement Enforcers for MPSoCs

    In embedded systems, applications frequently have to meet non-functional requirements regarding, e.g., real-time or energy-consumption constraints when executing on a given MPSoC target platform. Feedback-based controllers have been proposed that react to transient environmental factors by adapting the DVFS settings or the degree of parallelism following some predefined control strategy. However, it is in general not possible to give formal guarantees that the obtained controllers satisfy a given set of non-functional requirements. Run-time requirement enforcement has emerged as a field of research for enforcing non-functional requirements at run-time; it makes it possible to define and formally verify properties of the respective control strategies, which are specified by automata. However, techniques for the automatic generation of such controllers have not yet been established. In this paper, we propose a technique that uses reinforcement learning to automatically generate verifiable feedback-based enforcers. To this end, we train a control policy on a representative input sequence at design time. The learned control strategy is then transformed into a verifiable enforcement automaton, which constitutes our run-time control model and can handle unseen input data. As a case study, we apply the approach to generate controllers that increase the probability of satisfying a given set of requirement verification goals compared to multiple state-of-the-art approaches, as can be verified by model checkers.
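    The pipeline described above can be pictured with a deliberately toy sketch; all names, states and the environment model below are invented for illustration, and this is not the RAVEN implementation. A tabular Q-learning agent is trained at design time to pick a DVFS level from an observed latency bin, and the converged policy is then frozen into a finite state-to-action table, the kind of finite object from which an enforcement automaton can be built and model checked:

```python
import random

# Hypothetical setup: 3 discretized latency states, 2 DVFS levels.
STATES, ACTIONS = range(3), range(2)  # latency bins / frequency levels

def reward(state: int, action: int) -> float:
    # Invented stand-in for "deadline met at low energy": only high
    # latency justifies the high frequency; otherwise save energy.
    return 1.0 if (state == 2) == (action == 1) else -1.0

def step(state: int, action: int) -> int:
    # Invented environment: higher frequency tends to lower latency.
    if action == 1:
        return max(0, state - 1)
    return min(2, state + random.choice((0, 1)))

q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
state = 0
for _ in range(20000):  # tabular Q-learning at design time
    if random.random() < 0.1:  # epsilon-greedy exploration
        action = random.choice(list(ACTIONS))
    else:
        action = max(ACTIONS, key=lambda a: q[(state, a)])
    nxt = step(state, action)
    target = reward(state, action) + 0.9 * max(q[(nxt, a)] for a in ACTIONS)
    q[(state, action)] += 0.1 * (target - q[(state, action)])
    state = nxt

# Freeze the learned policy into a finite lookup table. Unlike the
# exploring learner above, this fixed state -> action map is a finite
# object that can be turned into an automaton and model checked.
enforcer = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
print(enforcer)  # e.g. {0: 0, 1: 0, 2: 1}
```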

    Seventh Biennial Report: June 2003 - March 2005

    Strategic Issues, Problems and Challenges in Inductive Theorem Proving

    (Automated) Inductive Theorem Proving (ITP) is a challenging field in automated reasoning and theorem proving. Typically, (Automated) Theorem Proving (TP) refers to methods, techniques and tools for automatically proving general (most often first-order) theorems. Nowadays, the field of TP has reached a certain degree of maturity, and powerful TP systems are widely available and used. The situation with ITP is strikingly different, in the sense that proving inductive theorems in an essentially automatic way is still a very challenging task, even for the most advanced existing ITP systems. Both in general TP and in ITP, strategies for guiding the proof search process are of fundamental importance, in automated as well as in interactive or mixed settings. In the paper we will analyze and discuss the most important strategic and proof-search issues in ITP, compare ITP with TP, and argue why ITP is in a sense much more challenging. More generally, we will systematically isolate, investigate and classify the main problems and challenges in ITP w.r.t. automation, on different levels and from different points of view. Finally, based on this analysis, we will present some theses about the state of the art in the field, possible criteria for what could be considered substantial progress, and promising lines of research for the future, towards (more) automated ITP.
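    To make the contrast concrete, here is the kind of inductive theorem at stake, written as a Lean 4 sketch (an illustration chosen for this summary, not an example from the paper). Even this textbook goal requires choosing the induction variable and then closing a base case and a step case, exactly the kind of proof-search decision that ITP strategies must automate:

```lean
-- A classic inductive theorem: list append is associative.
-- The prover (or its strategy) must pick xs as the induction
-- variable; the base and step cases are then routine.
theorem append_assoc (xs ys zs : List Nat) :
    (xs ++ ys) ++ zs = xs ++ (ys ++ zs) := by
  induction xs with
  | nil => rfl
  | cons x xs ih => simp [ih]
```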

    Point Line Cover: The Easy Kernel is Essentially Tight

    The input to the NP-hard Point Line Cover problem (PLC) consists of a set $P$ of $n$ points in the plane and a positive integer $k$, and the question is whether there exists a set of at most $k$ lines which pass through all points in $P$. A simple polynomial-time reduction reduces any input to one with at most $k^2$ points. We show that this is essentially tight under standard assumptions. More precisely, unless the polynomial hierarchy collapses to its third level, there is no polynomial-time algorithm that reduces every instance $(P, k)$ of PLC to an equivalent instance with $O(k^{2-\epsilon})$ points, for any $\epsilon > 0$. This answers, in the negative, an open problem posed by Lokshtanov (PhD Thesis, 2009). Our proof uses the machinery for deriving lower bounds on the size of kernels developed by Dell and van Melkebeek (STOC 2010). It has two main ingredients: We first show, by reduction from Vertex Cover, that PLC, conditionally, has no kernel of total size $O(k^{2-\epsilon})$ bits. This does not directly imply the claimed lower bound on the number of points, since the best known polynomial-time encoding of a PLC instance with $n$ points requires $\omega(n^{2})$ bits. To get around this we build on work of Goodman et al. (STOC 1989) and devise an oracle communication protocol of cost $O(n \log n)$ for PLC; its main building block is a bound of $O(n^{O(n)})$ on the number of order types of $n$ points that are not necessarily in general position, and an explicit algorithm that enumerates all possible order types of $n$ points. This protocol and the lower bound on total size together yield the stated lower bound on the number of points. While a number of essentially tight polynomial lower bounds on total sizes of kernels are known, our result is, to the best of our knowledge, the first to show a nontrivial lower bound for structural/secondary parameters.
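    For context, the simple reduction mentioned above works because two distinct lines share at most one point: a line covering at least $k+1$ points must therefore belong to every solution with $k$ lines, and once no such line remains, a YES-instance has at most $k \cdot k = k^2$ points. A minimal Python sketch of this folklore kernelization, assuming integer coordinates (illustrative, not the paper's contribution):

```python
from itertools import combinations
from math import gcd

def line_through(p, q):
    """Canonical (a, b, c) with a*x + b*y + c = 0 through p and q."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    c = -(a * x1 + b * y1)
    g = gcd(gcd(abs(a), abs(b)), abs(c)) or 1
    a, b, c = a // g, b // g, c // g
    if (a, b, c) < (0, 0, 0):  # fix the sign so each line has one key
        a, b, c = -a, -b, -c
    return (a, b, c)

def plc_kernel(points, k):
    """Reduce a PLC instance to an equivalent one with <= k^2 points.

    Repeatedly commit to any line through more than k points (it is
    forced, since k other lines could cover at most k of its points),
    then reject if more than k^2 points survive.
    """
    points = set(points)
    while k > 0:
        cover = {}  # line key -> points of the input lying on it
        for p, q in combinations(points, 2):
            cover.setdefault(line_through(p, q), set()).update((p, q))
        heavy = next((pts for pts in cover.values() if len(pts) > k), None)
        if heavy is None:
            break
        points -= heavy  # this line is in every size-k solution
        k -= 1
    if len(points) > k * k:
        return None      # provably a NO-instance
    return points, k     # equivalent instance with <= k^2 points
```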