Separating sets of strings by finding matching patterns is almost always hard
© 2017 Elsevier B.V. We study the complexity of the problem of searching for a set of patterns that separate two given sets of strings. This problem has applications in a wide variety of areas, most notably data mining, computational biology, and understanding the complexity of genetic algorithms. We show that the basic problem of finding a small set of patterns that match one set of strings but do not match any string in a second set is difficult (NP-complete, W[2]-hard when parameterized by the size of the pattern set, and APX-hard). We then perform a detailed parameterized analysis of the problem, separating tractable and intractable variants. In particular, we show that parameterizing by the size of the pattern set together with the number of strings, or by the size of the alphabet together with the number of strings, yields FPT results, among others.
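The separation condition described in this abstract can be illustrated with a minimal sketch. It assumes a pattern is a string of the same length as the input in which `*` matches any single symbol; the abstract does not fix a pattern syntax, so this is an illustrative choice, and the function names `matches` and `separates` are hypothetical. The sketch only verifies a candidate pattern set; it does not search for one, which is the hard part.

```python
def matches(pattern: str, s: str, wildcard: str = "*") -> bool:
    """Return True if `pattern` matches `s` position by position.

    Assumption: the wildcard matches exactly one arbitrary symbol.
    """
    return len(pattern) == len(s) and all(
        p == wildcard or p == c for p, c in zip(pattern, s)
    )


def separates(patterns, good, bad) -> bool:
    """Check that every string in `good` is matched by some pattern
    and that no string in `bad` is matched by any pattern."""
    covers_good = all(any(matches(p, g) for p in patterns) for g in good)
    avoids_bad = not any(matches(p, b) for p in patterns for b in bad)
    return covers_good and avoids_bad
```

For instance, under these assumptions the single pattern `0*1` separates `["001", "011"]` from `["000", "110"]`, while `***` does not because it also matches the second set.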
Fast Algorithms for Parameterized Problems with Relaxed Disjointness Constraints
In parameterized complexity, it is a natural idea to consider different
generalizations of classic problems. Usually, such generalizations are obtained
by introducing a "relaxation" variable, where the original problem corresponds
to setting this variable to a constant value. For instance, the problem of
packing sets of size at most $p$ into a given universe generalizes the Maximum
Matching problem, which is recovered by taking $p=2$. Most often, the
complexity of the problem increases with the relaxation variable, but very
recently Abasi et al. have given a surprising example of a problem,
$r$-Simple $k$-Path, that can be solved by a randomized algorithm with
running time $O^*(2^{O(k \frac{\log r}{r})})$. That is, the complexity of the
problem decreases with $r$. In this paper we pursue further the direction
sketched by Abasi et al. Our main contribution is a derandomization tool that
provides a deterministic counterpart of the main technical result of Abasi et
al.: the $O^*(2^{O(k \frac{\log r}{r})})$ algorithm for $(r,k)$-Monomial
Detection, which is the problem of finding a monomial of total degree $k$ and
individual degrees at most $r$ in a polynomial given as an arithmetic circuit.
Our technique works for a large class of circuits, and in particular it can be
used to derandomize the result of Abasi et al. for $r$-Simple $k$-Path. On our
way to this result we introduce the notion of representative sets for
multisets, which may be of independent interest. Finally, we give two more
examples of problems that were already studied in the literature, where the
same relaxation phenomenon happens. The first one is a natural relaxation of
the Set Packing problem, where we allow the packed sets to overlap at each
element at most $r$ times. The second one is Degree Bounded Spanning Tree,
where we seek a spanning tree of the graph with a small maximum degree.
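To make the relaxed Set Packing condition concrete, the following brute-force sketch picks a given number of sets such that no element is covered more than a bounded number of times. The parameter names `k` (number of sets to pack) and `r` (overlap bound) and the function name `relaxed_packing` are assumptions made for illustration. This exhaustive search is not the algorithm of the paper and only serves to state the problem precisely.

```python
from collections import Counter
from itertools import combinations


def relaxed_packing(sets, k, r):
    """Return k of the given sets in which every element occurs in at
    most r of the chosen sets, or None if no such choice exists.

    Brute force over all k-subsets; exponential, for illustration only.
    """
    for combo in combinations(sets, k):
        # Count how many chosen sets contain each element.
        counts = Counter(x for s in combo for x in s)
        if all(c <= r for c in counts.values()):
            return list(combo)
    return None
```

With `r = 1` this reduces to ordinary Set Packing (pairwise disjoint sets); larger `r` admits more solutions, for example the three sets `{1,2}`, `{2,3}`, `{1,3}` can all be packed once `r = 2`.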
IIFA: Modular Inter-app Intent Information Flow Analysis of Android Applications
Android apps cooperate through message passing via intents. However, when
apps do not have identical sets of privileges, inter-app communication (IAC) can
accidentally or maliciously be misused, e.g., to leak sensitive information
contrary to users' expectations. Recent research considered static program
analysis to detect dangerous data leaks due to inter-component communication
(ICC) or IAC, but suffers from shortcomings with respect to precision,
soundness, and scalability. To solve these issues we propose a novel approach
for static ICC/IAC analysis. We perform a fixed-point iteration of ICC/IAC
summary information to precisely resolve intent communication with more than
two apps involved. We integrate these results with information flows generated
by a baseline (i.e., not considering intents) information flow analysis, and
determine whether sensitive data is flowing (transitively) through components/apps in
order to be ultimately leaked. Our main contribution is the first fully
automatic sound and precise ICC/IAC information flow analysis that is scalable
for realistic apps due to modularity, avoiding combinatorial explosion: Our
approach determines communicating apps using short summaries rather than
inlining intent calls, which often requires simultaneously analyzing all tuples
of apps. We evaluated our tool IIFA in terms of scalability, precision, and
recall. Using benchmarks we establish that precision and recall of our
algorithm are considerably better than prominent state-of-the-art analyses for
IAC. Most importantly, applied to the 90 most popular applications from the Google
Play Store, IIFA demonstrated its scalability to a large corpus of real-world
apps. IIFA reports 62 problematic ICC-/IAC-related information flows via two or
more apps/components.
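The idea of iterating summaries to a fixed point can be sketched on a toy model. Here each component has a set of sensitive labels it reads locally and a set of components it sends intents to; labels are propagated along sends until nothing changes. The data model and the names `propagate` and `leaks` are hypothetical simplifications and are not taken from IIFA, whose summaries and intent resolution are considerably richer.

```python
def propagate(sends, sources):
    """Propagate sensitive labels along intent sends until a fixed point.

    sends:   dict mapping a component to the set of components it sends to
    sources: dict mapping a component to the labels it reads locally
    Returns a dict mapping every component to the labels that may reach it.
    """
    reach = {c: set(labels) for c, labels in sources.items()}
    for sender, receivers in sends.items():
        reach.setdefault(sender, set())
        for rcv in receivers:
            reach.setdefault(rcv, set())

    changed = True
    while changed:  # stop once a full pass adds no new label
        changed = False
        for sender, receivers in sends.items():
            for rcv in receivers:
                before = len(reach[rcv])
                reach[rcv] |= reach[sender]
                if len(reach[rcv]) != before:
                    changed = True
    return reach


def leaks(sends, sources, sinks):
    """Report, per sink component, the sensitive labels that may reach it."""
    reach = propagate(sends, sources)
    return {c: reach[c] for c in sinks if reach.get(c)}
```

In this toy model, a chain where `A` reads `location` and sends to `B`, which sends to a sink `C`, is reported as a transitive flow of `location` into `C`, even though `A` and `C` never communicate directly.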
A web content mining application for detecting relevant pages using Jaccard similarity
The tremendous growth in text data available from a variety of sources creates a slew of concerns and obstacles to discovering meaningful information. Advances in digital technology have dispersed texts over millions of web sites. Unstructured texts are densely packed with textual information, and discovering valuable and intriguing relationships in them demands more computer processing. Text mining has therefore developed into an attractive area of study for obtaining organized and useful data. One purpose of this research is to discuss text pre-processing in the automobile marketing domain in order to create a structured database. Regular expressions were used to extract data from unstructured vehicle advertisements, resulting in a well-organized database. We manually develop unique rule-based ways of extracting structured data from unstructured web pages. The information retrieved from these advertisements is then searched systematically for certain noteworthy attributes. There are numerous approaches to query recommendation, and it is vital to understand which one should be employed. Additionally, this research attempts to determine the optimal similarity value for query suggestions based on user-supplied parameters by comparing MySQL pattern matching and Jaccard similarity.
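The two matching strategies compared in this abstract can be contrasted with a small sketch. Jaccard similarity is computed here over lowercase whitespace-separated tokens, and the pattern-matching side is approximated by a case-insensitive substring test in the spirit of SQL `LIKE '%query%'`. The tokenization, the function names, and the sample advertisements are assumptions for illustration and are not taken from the paper.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| over lowercase word tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # convention chosen here: two empty texts are identical
    return len(ta & tb) / len(ta | tb)


def like_match(query: str, text: str) -> bool:
    """Rough stand-in for SQL LIKE '%query%': case-insensitive substring."""
    return query.lower() in text.lower()


def rank(query: str, ads):
    """Order advertisements by descending Jaccard similarity to the query."""
    return sorted(ads, key=lambda ad: jaccard(query, ad), reverse=True)
```

Substring matching is all-or-nothing and sensitive to word order, whereas Jaccard similarity gives a graded score: the query `toyota corolla 2015` does not occur as a substring of `red toyota corolla`, yet the two share two of four distinct tokens.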