Efficient Compression Technique for Sparse Sets
Recent technological advancements have led to the generation of huge amounts
of data over the web, such as text, image, audio and video. Most of this data
is high-dimensional and sparse, e.g., the bag-of-words representation of
text. Many applications, such as clustering, nearest-neighbour search,
ranking and indexing, require an efficient search for similar data points.
Even though computational power has increased significantly, a simple
brute-force similarity search on such datasets is inefficient and at times
impossible. It is therefore desirable to obtain a compressed representation
that preserves the similarity between data points. In this
work, we consider the data points as sets and use Jaccard similarity as the
similarity measure. Compression techniques are generally evaluated on the
following parameters: (1) the randomness required for compression, (2) the time
required for compression, (3) the dimension of the data after compression, and
(4) the space required to store the compressed data. Ideally, the compressed
representation should preserve the similarity between every pair of data
points while keeping the time and the randomness required for compression as
low as possible.
We show that the compression technique suggested by Pratap and Kulkarni also
works well for Jaccard similarity. We give a theoretical proof of this claim
and complement it with rigorous experiments on synthetic as well as
real-world datasets. We also compare our results with the state-of-the-art
"min-wise independent permutation" technique, and show that our compression
algorithm achieves almost equal accuracy while significantly reducing the
compression time and the randomness required.
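The sketch below shows only the min-wise permutation baseline named above. It does not implement the compression technique the abstract proposes. Jaccard similarity is $|A\cap B|/|A\cup B|$. Under a uniformly random permutation of the universe, two sets have the same minimum element with probability exactly their Jaccard similarity. The fraction of agreeing signature entries is therefore an unbiased estimate of it. The universe size, the number of permutations and the helper names (`jaccard`, `random_permutations`, `minhash_signature`, `estimate_jaccard`) are illustrative assumptions. Storing every explicit permutation of the universe is the kind of randomness cost the abstract aims to reduce.

```python
import random


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity |A & B| / |A | B| (taken as 1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def random_permutations(universe_size: int, count: int, seed: int = 0) -> list:
    """Draw `count` uniformly random permutations of range(universe_size)."""
    rng = random.Random(seed)
    perms = []
    for _ in range(count):
        perm = list(range(universe_size))
        rng.shuffle(perm)
        perms.append(perm)
    return perms


def minhash_signature(s: set, perms: list) -> list:
    """For each permutation, keep the smallest permuted position of an element of s.

    Assumes s is non-empty and its elements lie in range(universe_size).
    """
    return [min(perm[x] for x in s) for perm in perms]


def estimate_jaccard(sig_a: list, sig_b: list) -> float:
    """Fraction of signature positions on which the two sets agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    perms = random_permutations(universe_size=1000, count=200, seed=42)
    a, b = set(range(0, 600)), set(range(300, 900))
    print("exact:", jaccard(a, b))  # 300 shared elements out of 900
    print("estimate:", estimate_jaccard(minhash_signature(a, perms),
                                        minhash_signature(b, perms)))
```

With 200 permutations, the estimate's standard deviation is about $\sqrt{J(1-J)/200}$, roughly 0.03 for $J = 1/3$.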
From Gap-ETH to FPT-Inapproximability: Clique, Dominating Set, and More
We consider questions that arise from the intersection between the areas of
polynomial-time approximation algorithms, subexponential-time algorithms, and
fixed-parameter tractable algorithms. The questions, which have been asked
several times (e.g., [Marx08, FGMS12, DF13]), are whether there is a
non-trivial FPT-approximation algorithm for the Maximum Clique (Clique) and
Minimum Dominating Set (DomSet) problems parameterized by the size of the
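optimal solution. In particular, letting $\mathrm{OPT}$ be the optimum and $N$ be
the size of the input, is there an algorithm that runs in
$t(\mathrm{OPT})\cdot\mathrm{poly}(N)$ time and outputs a solution of size
$f(\mathrm{OPT})$, for any functions $t$ and $f$ that are independent of $N$ (for
Clique, we want $f(\mathrm{OPT}) = \omega(1)$)?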
In this paper, we show that both Clique and DomSet admit no non-trivial
FPT-approximation algorithm, i.e., there is no
$o(\mathrm{OPT})$-FPT-approximation algorithm for Clique and no
$f(\mathrm{OPT})$-FPT-approximation algorithm for DomSet, for any function $f$
(e.g., this holds even if $f$ is the Ackermann function). In fact, our results
imply something even stronger: the best way to solve Clique and DomSet, even
approximately, is to essentially enumerate all possibilities. Our results hold
under the Gap Exponential Time Hypothesis (Gap-ETH) [Dinur16, MR16], which
states that no $2^{o(n)}$-time algorithm can distinguish between a satisfiable
3SAT formula and one which is not even $(1-\varepsilon)$-satisfiable for some
constant $\varepsilon > 0$.
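Under Gap-ETH, essentially nothing beats enumeration. Exhaustive search tries every $k$-subset of the $N$ vertices, which takes roughly $N^{k}$ time. Below is a minimal Python sketch of that brute force for both problems. The graph representation (a dict mapping each vertex to its set of neighbours) and the function names are assumptions made for illustration.

```python
from itertools import combinations


def has_clique(adj: dict, k: int) -> bool:
    """Try every k-subset of vertices (about N^k of them for N vertices)."""
    for subset in combinations(adj, k):
        if all(v in adj[u] for u, v in combinations(subset, 2)):
            return True
    return False


def has_dominating_set(adj: dict, k: int) -> bool:
    """Try every k-subset; it dominates if every vertex is in it or adjacent to it."""
    for subset in combinations(adj, k):
        covered = set(subset)
        for u in subset:
            covered |= adj[u]
        if len(covered) == len(adj):
            return True
    return False


# A triangle 0-1-2 with a path 2-3-4 attached.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
print(has_clique(graph, 3), has_clique(graph, 4))                  # True False
print(has_dominating_set(graph, 1), has_dominating_set(graph, 2))  # False True
```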
Besides Clique and DomSet, we also rule out non-trivial FPT-approximation for
Maximum Balanced Biclique, Maximum Subgraphs with Hereditary Properties, and
Maximum Induced Matching in bipartite graphs. Additionally, we rule out
$k^{o(1)}$-FPT-approximation algorithms for Densest $k$-Subgraph, although this
ratio does not yet match the trivial $O(k)$-approximation algorithm.
Comment: 43 pages. To appear in FOCS'1
Symbolic Execution Game Semantics
41 pages, 5 figures
We present a framework for symbolically executing and model checking higher-order programs with external (open) methods. We focus on the client-library paradigm and, in particular, we aim to check libraries with respect to any definable client. We combine traditional symbolic execution techniques with operational game semantics to build a symbolic execution semantics that captures arbitrary external behaviour. We prove the symbolic semantics to be sound and complete. Imposing bounds on the depth of recursion and callbacks then yields a bounded technique. We provide an implementation of our technique in the K framework and showcase its performance on a custom benchmark based on higher-order coding errors such as reentrancy bugs.
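The example below makes "reentrancy bug" concrete. It defines a hypothetical library object that hands control to client code (a callback) before updating its own state, and a client that re-enters it. It then runs a crude bounded check over re-entrancy depths. This is ordinary concrete execution against one fixed family of clients, not symbolic execution or game semantics. All class and function names are illustrative assumptions. Bounding the re-entrancy depth only loosely mirrors the bound on callbacks described above.

```python
from typing import Callable


class VulnerableWallet:
    """Hypothetical library: calls the client before updating its own state."""

    def __init__(self, balance: int) -> None:
        self.balance = balance
        self.paid_out = 0

    def withdraw(self, amount: int, on_transfer: Callable[[int], None]) -> None:
        if self.balance >= amount:
            on_transfer(amount)      # control passes to unknown client code
            self.balance -= amount   # state is updated only after the callback
            self.paid_out += amount


class SafeWallet(VulnerableWallet):
    """Same library with the state update moved before the callback."""

    def withdraw(self, amount: int, on_transfer: Callable[[int], None]) -> None:
        if self.balance >= amount:
            self.balance -= amount
            self.paid_out += amount
            on_transfer(amount)


def reentrant_client(wallet, amount: int, depth: int) -> None:
    """A client whose callback re-enters withdraw() until `depth` callbacks have run."""
    calls = 0

    def on_transfer(_: int) -> None:
        nonlocal calls
        calls += 1
        if calls < depth:
            wallet.withdraw(amount, on_transfer)

    wallet.withdraw(amount, on_transfer)


def bounded_check(wallet_cls, initial: int, max_depth: int) -> bool:
    """Check 'never pays out more than the initial balance' for depths 1..max_depth."""
    for depth in range(1, max_depth + 1):
        wallet = wallet_cls(initial)
        reentrant_client(wallet, initial, depth)
        if wallet.paid_out > initial:
            return False
    return True


print(bounded_check(VulnerableWallet, 10, 3))  # False
print(bounded_check(SafeWallet, 10, 3))        # True
```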
RAVEN: Reinforcement Learning for Generating Verifiable Run-Time Requirement Enforcers for MPSoCs
In embedded systems, applications frequently have to meet non-functional requirements regarding, e.g., real-time or energy consumption constraints, when executing on a given MPSoC target platform.
Feedback-based controllers have been proposed that react to transient environmental factors by adapting the DVFS settings or degree of parallelism according to some predefined control strategy. However, it is, in general, not possible to give formal guarantees that the obtained controllers satisfy a given set of non-functional requirements. Run-time requirement enforcement has emerged as a field of research for enforcing non-functional requirements at run-time, allowing one to define and formally verify properties of control strategies specified by automata. However, techniques for automatically generating such controllers have not yet been established.
In this paper, we propose a technique that uses reinforcement learning to automatically generate verifiable feedback-based enforcers. To this end, we train a control policy on a representative input sequence at design time. The learned control strategy is then transformed into a verifiable enforcement automaton, which constitutes our run-time control model and can handle unseen input data. As a case study, we apply the approach to generate controllers that increase the probability of satisfying a given set of requirement verification goals compared to multiple state-of-the-art approaches, as can be verified by model checkers.
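The two steps described above (learn a policy offline, then freeze it into a finite automaton) can be sketched on a toy problem. The setting here is entirely hypothetical. The workload level cycles through a fixed trace, the controller picks one of three frequency levels, and the reward favours meeting the workload at low energy with a small switching penalty. Tabular Q-learning stands in for whatever reinforcement-learning method RAVEN actually uses. The trace, levels, reward weights and function names are assumptions. The resulting table is a finite Mealy machine, but no formal verification is shown here.

```python
import random

# Hypothetical toy setting, not RAVEN's model.
TRACE = [0, 1, 2]   # representative workload sequence (repeats)
LEVELS = [0, 1, 2]  # available frequency levels


def reward(workload: int, freq: int, prev_freq: int) -> float:
    met = 1.0 if freq >= workload else -1.0   # requirement: fast enough for the workload
    energy = 0.2 * freq                       # higher frequency costs more energy
    switching = 0.05 * abs(freq - prev_freq)  # changing the setting has a small cost
    return met - energy - switching


def train(steps: int = 60000, alpha: float = 0.1, gamma: float = 0.5,
          epsilon: float = 0.3, seed: int = 0) -> dict:
    """Tabular Q-learning; a state is (current frequency level, observed workload)."""
    rng = random.Random(seed)
    q = {((p, w), a): 0.0 for p in LEVELS for w in LEVELS for a in LEVELS}
    prev = 0
    for t in range(steps):
        w = TRACE[t % len(TRACE)]
        state = (prev, w)
        if rng.random() < epsilon:   # explore
            a = rng.choice(LEVELS)
        else:                        # exploit the current estimate
            a = max(LEVELS, key=lambda x: q[(state, x)])
        nxt = (a, TRACE[(t + 1) % len(TRACE)])
        target = reward(w, a, prev) + gamma * max(q[(nxt, b)] for b in LEVELS)
        q[(state, a)] += alpha * (target - q[(state, a)])
        prev = a
    return q


def to_automaton(q: dict) -> dict:
    """Freeze the greedy policy into a finite Mealy-machine table:
    (current level, workload input) -> (next level, output action)."""
    table = {}
    for p in LEVELS:
        for w in LEVELS:
            a = max(LEVELS, key=lambda x: q[((p, w), x)])
            table[(p, w)] = (a, a)
    return table


automaton = to_automaton(train())
for (level, workload), (_, action) in sorted(automaton.items()):
    print(f"level={level} workload={workload} -> set level {action}")
```

In this toy, the learned table simply sets the frequency level equal to the workload level from every state. Only a finite table like this, rather than the learned Q-values, would be handed to a model checker.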
Strategic Issues, Problems and Challenges in Inductive Theorem Proving
(Automated) Inductive Theorem Proving (ITP) is a challenging field in automated reasoning and theorem proving. Typically, (Automated) Theorem Proving (TP) refers to methods, techniques and tools for automatically proving general (most often first-order) theorems. Nowadays, the field of TP has reached a certain degree of maturity, and powerful TP systems are widely available and used. The situation with ITP is strikingly different: proving inductive theorems in an essentially automatic way is still a very challenging task, even for the most advanced existing ITP systems. Both in general TP and in ITP, strategies for guiding the proof search process are of fundamental importance, in automated as well as in interactive or mixed settings. In this paper we analyze and discuss the most important strategic and proof search issues in ITP, compare ITP with TP, and argue why ITP is in a sense much more challenging. More generally, we systematically isolate, investigate and classify the main problems and challenges in ITP with respect to automation, on different levels and from different points of view. Finally, based on this analysis, we present some theses about the state of the art in the field, possible criteria for what could be considered substantial progress, and promising lines of research towards (more) automated ITP.
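The example below (ours, not the paper's) shows why some theorems need induction. In Lean 4, `n + 0 = n` holds by unfolding the definition of addition on `Nat`. By contrast, `0 + n = n` does not reduce for a variable `n` and needs an explicit induction. This is a sketch assuming a recent Lean 4 toolchain without Mathlib; the theorem names are illustrative.

```lean
-- `Nat.add` recurses on its second argument, so this holds by computation.
theorem my_add_zero (n : Nat) : n + 0 = n := rfl

-- Here the recursion is stuck on the variable `n`, so we induct on it.
theorem my_zero_add (n : Nat) : 0 + n = n := by
  induction n with
  | zero => rfl
  | succ k ih => exact congrArg Nat.succ ih
```

In the successor case, `0 + (k + 1)` unfolds to `Nat.succ (0 + k)`, so the induction hypothesis `ih : 0 + k = k` closes the goal after applying `Nat.succ` to both sides. Finding such auxiliary steps automatically, and the lemmas they depend on, is part of what makes ITP hard.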
Point Line Cover: The Easy Kernel is Essentially Tight
The input to the NP-hard Point Line Cover problem (PLC) consists of a set $P$
of $n$ points on the plane and a positive integer $k$, and the question is
whether there exists a set of at most $k$ lines which pass through all points
in $P$. A simple polynomial-time reduction reduces any input to one with at
most $k^2$ points. We show that this is essentially tight under standard
assumptions. More precisely, unless the polynomial hierarchy collapses to its
third level, there is no polynomial-time algorithm that reduces every instance
$(P, k)$ of PLC to an equivalent instance with $O(k^{2-\epsilon})$ points, for
any $\epsilon > 0$. This answers, in the negative, an open problem posed by
Lokshtanov (PhD Thesis, 2009).
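The simple reduction mentioned above presumably follows the textbook rule below. Suppose some line contains more than $k$ of the remaining points. Any other line meets it in at most one point, so leaving it out would require more than $k$ lines. The line therefore belongs to every solution of size at most $k$: take it, delete its points and decrease $k$. Once no such line exists, $k$ lines cover at most $k^2$ points, so more than $k^2$ remaining points means a no-instance. Below is a minimal Python sketch, assuming integer coordinates so that collinearity can be tested exactly. The function names are illustrative, and the code favours clarity over running time.

```python
from itertools import combinations


def collinear(p, q, r) -> bool:
    """Exact collinearity test via the 2D cross product of (q - p) and (r - p)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]) == 0


def kernelize(points, k: int):
    """Return an equivalent (points, k) with at most k*k points, or None for a no-instance."""
    pts = set(points)
    changed = True
    while changed and k >= 0:
        changed = False
        for p, q in combinations(pts, 2):
            on_line = {r for r in pts if collinear(p, q, r)}
            if len(on_line) > k:   # this line is forced into every solution
                pts -= on_line
                k -= 1
                changed = True
                break              # restart the scan on the smaller point set
    if k < 0 or len(pts) > k * k:
        return None
    return pts, k


grid = [(x, y) for x in range(3) for y in range(3)]
print(kernelize(grid, 2))  # None: a 3x3 grid needs three lines
```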
Our proof uses the machinery for deriving lower bounds on the size of kernels
developed by Dell and van Melkebeek (STOC 2010). It has two main ingredients:
We first show, by reduction from Vertex Cover, that PLC, conditionally, has
no kernel of total size $O(k^{2-\epsilon})$ bits. This does not directly imply
the claimed lower bound on the number of points, since the best known
polynomial-time encoding of a PLC instance with $n$ points requires
$\omega(n^2)$ bits. To get around this, we build on work of Goodman et al.
(STOC 1989) and devise an oracle communication protocol of cost $O(n\log n)$
for PLC; its main building block is a bound of $O(n^{O(n)})$ for the order
types of $n$ points that are not necessarily in general position, and an
explicit algorithm that enumerates all possible order types of $n$ points. This
protocol and the lower bound on total size together yield the stated lower
bound on the number of points.
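The order type of a labelled point set records, for every triple, whether it makes a left turn, a right turn, or is collinear. Zero entries are allowed because the points need not be in general position. Whether a set of points lies on a common line is determined by its collinear triples, so the order type carries all the collinearity information a PLC instance depends on. Informally, this is why a bound on the number of order types is useful here. Below is a minimal Python sketch of computing an order type from integer coordinates. The names are illustrative, and this is not the enumeration algorithm from the paper.

```python
from itertools import combinations


def orientation(p, q, r) -> int:
    """+1 for a counter-clockwise turn p -> q -> r, -1 for clockwise, 0 if collinear."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (cross < 0)


def order_type(points) -> dict:
    """Map each index triple i < j < k to the orientation of the corresponding points."""
    return {(i, j, k): orientation(points[i], points[j], points[k])
            for i, j, k in combinations(range(len(points)), 3)}


def collinear_triples(points) -> list:
    """The zero entries of the order type, i.e. the collinear triples."""
    return [t for t, s in order_type(points).items() if s == 0]


pts = [(0, 0), (2, 0), (4, 0), (0, 2)]
print(order_type(pts))         # {(0, 1, 2): 0, (0, 1, 3): 1, (0, 2, 3): 1, (1, 2, 3): 1}
print(collinear_triples(pts))  # [(0, 1, 2)]
```

Translating and positively scaling the points leaves the order type unchanged, which is what makes it a combinatorial rather than metric description.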
While a number of essentially tight polynomial lower bounds on total sizes of
kernels are known, our result is, to the best of our knowledge, the first to
show a nontrivial lower bound for structural/secondary parameters
- …