13,097 research outputs found

    Separating sets of strings by finding matching patterns is almost always hard

    © 2017 Elsevier B.V. We study the complexity of the problem of searching for a set of patterns that separate two given sets of strings. This problem has applications in a wide variety of areas, most notably in data mining, computational biology, and in understanding the complexity of genetic algorithms. We show that the basic problem of finding a small set of patterns that match one set of strings but do not match any string in a second set is difficult (NP-complete, W[2]-hard when parameterized by the size of the pattern set, and APX-hard). We then perform a detailed parameterized analysis of the problem, separating tractable and intractable variants. In particular, we show that parameterizing by the size of the pattern set together with the number of strings, or by the size of the alphabet together with the number of strings, gives FPT results, among others.
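The separation condition described in this abstract can be illustrated with a minimal sketch. The abstract does not define what a pattern is, so the code assumes, purely for illustration, that a pattern is a fixed-length string in which `*` matches any single character. The names `matches` and `separates` are hypothetical and are not taken from the paper.

```python
WILDCARD = "*"  # assumed "don't care" symbol; not specified in the abstract


def matches(pattern, s):
    """Return True if pattern matches s position by position.

    Assumption: pattern and string must have equal length, and the
    wildcard matches any single character.
    """
    if len(pattern) != len(s):
        return False
    return all(p == WILDCARD or p == c for p, c in zip(pattern, s))


def separates(patterns, good, bad):
    """Check whether a pattern set separates two sets of strings.

    Every string in `good` must be matched by at least one pattern,
    and no string in `bad` may be matched by any pattern.
    """
    covers_good = all(any(matches(p, g) for p in patterns) for g in good)
    avoids_bad = not any(matches(p, b) for p in patterns for b in bad)
    return covers_good and avoids_bad


if __name__ == "__main__":
    good = ["110", "100"]
    bad = ["111"]
    print(separates(["1*0"], good, bad))  # one pattern covers both good strings
    print(separates(["1**"], good, bad))  # too general: also matches "111"
```

Checking a candidate pattern set is cheap. The hardness results in the abstract concern finding a small set that passes this check.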

    Fast Algorithms for Parameterized Problems with Relaxed Disjointness Constraints

    In parameterized complexity, it is a natural idea to consider different generalizations of classic problems. Usually, such generalizations are obtained by introducing a "relaxation" variable, where the original problem corresponds to setting this variable to a constant value. For instance, the problem of packing sets of size at most $p$ into a given universe generalizes the Maximum Matching problem, which is recovered by taking $p=2$. Most often, the complexity of the problem increases with the relaxation variable, but very recently Abasi et al. have given a surprising example of a problem, $r$-Simple $k$-Path, that can be solved by a randomized algorithm with running time $O^*(2^{O(k \frac{\log r}{r})})$. That is, the complexity of the problem decreases with $r$. In this paper we pursue further the direction sketched by Abasi et al. Our main contribution is a derandomization tool that provides a deterministic counterpart of the main technical result of Abasi et al.: the $O^*(2^{O(k \frac{\log r}{r})})$ algorithm for $(r,k)$-Monomial Detection, which is the problem of finding a monomial of total degree $k$ and individual degrees at most $r$ in a polynomial given as an arithmetic circuit. Our technique works for a large class of circuits, and in particular it can be used to derandomize the result of Abasi et al. for $r$-Simple $k$-Path. On our way to this result we introduce the notion of representative sets for multisets, which may be of independent interest. Finally, we give two more examples of problems that were already studied in the literature, where the same relaxation phenomenon happens. The first one is a natural relaxation of the Set Packing problem, where we allow the packed sets to overlap at each element at most $r$ times. The second one is Degree Bounded Spanning Tree, where we seek a spanning tree of the graph with a small maximum degree.

    IIFA: Modular Inter-app Intent Information Flow Analysis of Android Applications

    Android apps cooperate through message passing via intents. However, when apps do not have identical sets of privileges, inter-app communication (IAC) can accidentally or maliciously be misused, e.g., to leak sensitive information contrary to users' expectations. Recent research considered static program analysis to detect dangerous data leaks due to inter-component communication (ICC) or IAC, but suffers from shortcomings with respect to precision, soundness, and scalability. To solve these issues we propose a novel approach for static ICC/IAC analysis. We perform a fixed-point iteration of ICC/IAC summary information to precisely resolve intent communication with more than two apps involved. We integrate these results with information flows generated by a baseline (i.e., not considering intents) information flow analysis, and resolve whether sensitive data flows (transitively) through components/apps and is ultimately leaked. Our main contribution is the first fully automatic, sound, and precise ICC/IAC information flow analysis that is scalable for realistic apps due to modularity, avoiding combinatorial explosion: our approach determines communicating apps using short summaries rather than inlining intent calls, which often requires simultaneously analyzing all tuples of apps. We evaluated our tool IIFA in terms of scalability, precision, and recall. Using benchmarks, we establish that the precision and recall of our algorithm are considerably better than those of prominent state-of-the-art analyses for IAC. Most importantly, applied to the 90 most popular applications from the Google Play Store, IIFA demonstrated its scalability to a large corpus of real-world apps. IIFA reports 62 problematic ICC-/IAC-related information flows via two or more apps/components.

    A web content mining application for detecting relevant pages using Jaccard similarity

    The tremendous growth in the availability of text data from a variety of sources creates a slew of concerns and obstacles to discovering meaningful information. Advances in digital technology have dispersed texts over millions of web sites. Unstructured texts are densely packed with textual information, and discovering valuable and intriguing relationships in them demands more computer processing. Text mining has therefore developed into an attractive area of study for obtaining organized and useful data. One of the purposes of this research is to discuss text pre-processing in the automobile marketing domain in order to create a structured database. Regular expressions were used to extract data from unstructured vehicle advertisements, resulting in a well-organized database. We manually develop unique rule-based ways of extracting structured data from unstructured web pages. Using the information retrieved from these advertisements, a systematic search for certain noteworthy qualities is performed. There are numerous approaches for query recommendation, and it is vital to understand which one should be employed. Additionally, this research attempts to determine the optimal similarity value for query suggestions based on user-supplied parameters by comparing MySQL pattern matching and Jaccard similarity.
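Jaccard similarity, as named in this abstract, is the size of the intersection of two sets divided by the size of their union. The following is a minimal sketch of that standard definition applied to word tokens. The whitespace tokenizer and the sample advertisement strings are illustrative assumptions and are not taken from the paper.

```python
def tokenize(text):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return text.lower().split()


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two token collections."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        # Convention chosen here: two empty sets count as identical.
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    ad = tokenize("red Toyota Corolla 2015")   # hypothetical advertisement text
    query = tokenize("Toyota Corolla")         # hypothetical user query
    print(jaccard(ad, query))  # 2 shared tokens out of 4 distinct tokens
```

A score of 1.0 means the token sets are identical and 0.0 means they share no token, so a threshold on this score could be used to rank or filter candidate pages for a query.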

    Improved bibliographic reference parsing based on repeated patterns
