
    A Note on the Complexity of Comparing Succinctly Represented Integers, with an Application to Maximum Probability Parsing

    The following two decision problems capture the complexity of comparing integers or rationals that are succinctly represented in product-of-exponentials notation, or equivalently, via arithmetic circuits using only multiplication and division gates and integer inputs. Input instance: four lists of positive integers, a_1, …, a_n; b_1, …, b_n; c_1, …, c_m; d_1, …, d_m, where each integer is represented in binary. Problem 1 (equality testing): decide whether a_1^{b_1} a_2^{b_2} ⋯ a_n^{b_n} = c_1^{d_1} c_2^{d_2} ⋯ c_m^{d_m}. Problem 2 (inequality testing): decide whether a_1^{b_1} a_2^{b_2} ⋯ a_n^{b_n} ≥ c_1^{d_1} c_2^{d_2} ⋯ c_m^{d_m}. Problem 1 is easily decidable in polynomial time using a simple iterative algorithm. Problem 2 is much harder. We observe that the complexity of Problem 2 is intimately connected to deep conjectures and results in number theory. In particular, if a refined form of the ABC conjecture formulated by Baker in 1998 holds, or if the older Lang-Waldschmidt conjecture (formulated in 1978) on linear forms in logarithms holds, then Problem 2 is decidable in P-time (in the standard Turing model of computation). Moreover, it follows from the best available quantitative bounds on linear forms in logarithms, e.g., by Baker and Wüstholz (1993) or Matveev (2000), that if m and n are fixed universal constants, then Problem 2 is decidable in P-time (without relying on any conjectures). This latter fact was observed earlier by Shub (1993). We describe one application: P-time maximum probability parsing for arbitrary stochastic context-free grammars (where ε-rules are allowed).
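
    The abstract does not spell out the paper's simple iterative algorithm for Problem 1, but the flavor of computing with such succinct integers can be conveyed by a randomized sketch (an illustration under assumptions, not the paper's deterministic method): since Python's three-argument pow runs in time polynomial in the bit lengths, both products can be compared modulo random primes without ever expanding them. The prime bit length and trial count below are arbitrary choices that should grow with the input size.

        import random

        def _is_probable_prime(n, rounds=20):
            # Miller-Rabin probabilistic primality test.
            if n < 2:
                return False
            for small in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
                if n % small == 0:
                    return n == small
            d, s = n - 1, 0
            while d % 2 == 0:
                d //= 2
                s += 1
            for _ in range(rounds):
                a = random.randrange(2, n - 1)
                x = pow(a, d, n)
                if x in (1, n - 1):
                    continue
                for _ in range(s - 1):
                    x = pow(x, 2, n)
                    if x == n - 1:
                        break
                else:
                    return False
            return True

        def _random_prime(bits):
            # Sample odd candidates with the top bit set until one passes.
            while True:
                p = random.getrandbits(bits) | (1 << (bits - 1)) | 1
                if _is_probable_prime(p):
                    return p

        def products_probably_equal(a, b, c, d, bits=256, trials=10):
            # Monte Carlo test of a_1^{b_1}...a_n^{b_n} == c_1^{d_1}...c_m^{d_m}.
            # "False" is always correct; "True" errs with small probability.
            for _ in range(trials):
                p = _random_prime(bits)
                lhs = rhs = 1
                for base, exp in zip(a, b):
                    lhs = lhs * pow(base, exp, p) % p
                for base, exp in zip(c, d):
                    rhs = rhs * pow(base, exp, p) % p
                if lhs != rhs:
                    return False
            return True

        print(products_probably_equal([8, 3], [4, 5], [2, 3], [12, 5]))  # True: 8^4 * 3^5 == 2^12 * 3^5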

    Computing and evolving variants of computational depth

    The structure and organization of information in binary strings and (infinite) binary sequences are investigated using two computable measures of complexity related to computational depth. First, fundamental properties of recursive computational depth, a refinement of Bennett's original notion of computational depth, are developed, and it is shown that the recursively weakly (respectively, strongly) deep sequences form a proper subclass of the class of weakly (respectively, strongly) deep sequences. It is then shown that every weakly useful sequence is recursively strongly deep, strengthening a theorem by Juedes, Lathrop, and Lutz. It follows from these results that not every strongly deep sequence is weakly useful, thereby answering an open question posed by Juedes. Second, compression depth, a feasibly computable depth measurement, is developed based on the Lempel-Ziv compression algorithm. LZ compression depth is further formalized by introducing strongly (compression) deep sequences and showing that analogues of the main properties of computational depth hold for compression depth. Critical to these results, it is shown that a sequence that is not normal must be compressible by the Lempel-Ziv algorithm. This yields a new, simpler proof that the Champernowne sequence is normal. Compression depth is also used to measure the organization of genes in genetic algorithms. Using finite-state machines to control the actions of an automaton playing prisoner's dilemma, a genetic algorithm is used to evolve a population of finite-state machines (players) to play prisoner's dilemma against each other. Since the fitness function is based solely on how well a player performs against all other players in the population, any accumulation of compression depth (organization) in the genetic structure of a player can only be attributed to more fit players having a more highly organized genetic structure. It is shown experimentally that this is the case.
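
    As a rough illustration of the compression side (a sketch, not the thesis's formal compression-depth measure), the LZ78-style parse below counts distinct phrases in a string: highly repetitive, non-normal strings compress well and yield markedly fewer phrases than, e.g., a prefix of the binary Champernowne sequence of the same length.

        def lz78_phrase_count(s):
            # Parse s into LZ78 dictionary phrases and count them; a low
            # count relative to len(s) indicates LZ-compressibility.
            seen = set()
            phrase = ""
            count = 0
            for ch in s:
                phrase += ch
                if phrase not in seen:
                    seen.add(phrase)
                    count += 1
                    phrase = ""
            if phrase:
                count += 1  # trailing partial phrase
            return count

        # Binary Champernowne sequence: concatenate 1, 10, 11, 100, ... in binary.
        champernowne = "".join(format(i, "b") for i in range(1, 500))
        repetitive = "01" * (len(champernowne) // 2)
        print(lz78_phrase_count(champernowne), lz78_phrase_count(repetitive))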

    SPARTA: A Graphical User Interface for Malicious Mobile Code Fingerprinting.

    This thesis introduces and describes SPARTA (for Stochastic Profiling Application for the Rendering of Trees and Automata), a graphical user interface used as a front end to a collection of tools written in C that collectively convert a log of registry system calls performed by an application into binary descriptions of PSTs (for Probabilistic Suffix Trees) and PSAs (for Probabilistic Suffix Automata), which are models used to represent application behavior on Windows-based systems. SPARTA works by rendering these binary descriptions into graphical form, showcasing a variety of features intended to make user interaction with PSTs and PSAs informative and insightful. The ultimate goal of SPARTA is to aid in profiling applications based on the system calls they make, using characteristics of PSTs and PSAs that are more easily noticed in graphical form to define "normal" behavior for Windows applications. With knowledge of normal behavior, these same models can be used to measure deviations that might precede the destructive actions of malicious mobile code, enabling the user to halt or quarantine the offending application before those actions take place.
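
    The statistics behind a PST are simple to sketch (a hedged illustration, not SPARTA's actual binary format or the C tools' algorithm): for every context of recent system calls up to a fixed depth, count which call follows, then score new traces by how surprising each call is under its longest matching context. The registry call names below are hypothetical example data.

        from collections import defaultdict, Counter
        import math

        def pst_counts(trace, max_depth=3):
            # Next-call counts for every context (suffix of recent calls) up
            # to max_depth -- the statistics a Probabilistic Suffix Tree stores.
            counts = defaultdict(Counter)
            for i in range(len(trace)):
                for d in range(max_depth + 1):
                    if d > i:
                        break
                    counts[tuple(trace[i - d:i])][trace[i]] += 1
            return counts

        def surprise(trace, counts, max_depth=3):
            # Negative log-likelihood under the longest matching context;
            # high values flag deviation from the modeled "normal" behavior.
            total = 0.0
            for i in range(len(trace)):
                for d in range(min(max_depth, i), -1, -1):
                    ctx = tuple(trace[i - d:i])
                    if ctx in counts and trace[i] in counts[ctx]:
                        dist = counts[ctx]
                        total -= math.log(dist[trace[i]] / sum(dist.values()))
                        break
                else:
                    total -= math.log(1e-6)  # unseen call: heavily penalized
            return total

        normal = ["RegOpenKey", "RegQueryValue", "RegCloseKey"] * 50
        model = pst_counts(normal)
        print(surprise(["RegOpenKey", "RegSetValue", "RegCloseKey"], model))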

    A Framework for File Format Fuzzing with Genetic Algorithms

    Secure software, meaning software free from vulnerabilities, is desirable in today's marketplace. Consumers are beginning to value a product's security posture as well as its functionality. Software development companies are recognizing this trend, and they are factoring security into their entire software development lifecycle. Secure development practices like threat modeling, static analysis, safe programming libraries, run-time protections, and software verification are being mandated during product development. Mandating these practices improves a product's security posture before customer delivery, and these practices increase the difficulty of discovering and exploiting vulnerabilities. Since the 1980s, security researchers have uncovered software defects by fuzz testing applications. In fuzz testing's infancy, randomly generated data could discover multiple defects quickly. However, as software matures and software development companies integrate secure development practices into their development lifecycles, fuzzers must apply more sophisticated techniques to retain their ability to uncover defects. Fuzz testing must evolve, and fuzz testing practitioners must devise new algorithms to exercise an application in unexpected ways. This dissertation's objective is to create a proof-of-concept genetic algorithm fuzz testing framework to exercise an application's file format parsing routines. The framework includes multiple genetic algorithm variations, provides a configuration scheme, and correlates data gathered from static and dynamic analysis to guide negative test case evolution. Experiments conducted for this dissertation illustrate the effectiveness of a genetic algorithm fuzzer in comparison to standard fuzz testing tools. The experiments showcase a genetic algorithm fuzzer's ability to discover multiple unique defects within a limited number of negative test cases. These experiments also highlight an application's increased execution time when fuzzing with a genetic algorithm. To combat increased execution time, a distributed architecture is implemented, and additional experiments demonstrate a decrease in execution time comparable to standard fuzz testing tools. A final set of experiments provides guidance on fitness function selection for a CHC genetic algorithm fuzzer with different population size configurations.
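
    A minimal skeleton of the loop such a fuzzer is built around (a sketch under assumptions: the fitness function is a hypothetical stand-in for the dissertation's coverage-guided scoring, and the operators are textbook mutation and crossover rather than the CHC variant):

        import random

        def mutate(data, rate=0.01):
            # Randomize a fraction of the bytes of a test case.
            out = bytearray(data)
            for i in range(len(out)):
                if random.random() < rate:
                    out[i] = random.randrange(256)
            return bytes(out)

        def crossover(a, b):
            # Single-point crossover of two test cases.
            cut = random.randrange(1, min(len(a), len(b)))
            return a[:cut] + b[cut:]

        def evolve(seed, fitness, pop_size=32, generations=50):
            # Generational GA over file-format test cases. 'fitness' is assumed
            # to score a candidate, e.g. by basic blocks covered when the
            # target parses it (hypothetical coverage instrumentation).
            pop = [mutate(seed, 0.05) for _ in range(pop_size)]
            for _ in range(generations):
                ranked = sorted(pop, key=fitness, reverse=True)
                elite = ranked[: pop_size // 4]
                pop = elite + [
                    mutate(crossover(random.choice(elite), random.choice(elite)))
                    for _ in range(pop_size - len(elite))
                ]
            return max(pop, key=fitness)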

    Scallop: A Language for Neurosymbolic Programming

    We present Scallop, a language that combines the benefits of deep learning and logical reasoning. Scallop enables users to write a wide range of neurosymbolic applications and to train them in a data- and compute-efficient manner. It achieves these goals through three key features: 1) a flexible symbolic representation based on the relational data model; 2) a declarative logic programming language based on Datalog that supports recursion, aggregation, and negation; and 3) a framework for automatic and efficient differentiable reasoning based on the theory of provenance semirings. We evaluate Scallop on a suite of eight neurosymbolic applications from the literature. Our evaluation demonstrates that Scallop is capable of expressing algorithmic reasoning in diverse and challenging AI tasks, provides a succinct interface for machine learning programmers to integrate logical domain knowledge, and yields solutions that are comparable or superior to state-of-the-art models in terms of accuracy. Furthermore, Scallop's solutions outperform these models in aspects such as runtime and data efficiency, interpretability, and generalizability.
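
    A toy rendering of the provenance-semiring idea (plain Python, not Scallop's actual API or semantics): facts carry probability tags, conjunction of subgoals in a rule body multiplies tags, and alternative derivations are merged with noisy-or (assuming independence), which is how a probability-like semiring propagates weights through a Datalog-style rule such as path(x, z) <- edge(x, y), edge(y, z). The edge facts are hypothetical example data.

        def conj(p, q):
            # Combine the tags of subgoals joined in one rule body.
            return p * q

        def disj(p, q):
            # Merge the tags of alternative derivations (noisy-or).
            return p + q - p * q

        # Probabilistic edge facts for a toy graph.
        edge = {("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.1}

        def path_two_hop(x, z):
            # path(x, z) <- edge(x, z)  or  edge(x, y), edge(y, z)
            p = edge.get((x, z), 0.0)
            for (u, v), pe in edge.items():
                if u == x and v != z:
                    p = disj(p, conj(pe, edge.get((v, z), 0.0)))
            return p

        print(path_two_hop("a", "c"))  # direct edge merged with the a->b->c derivation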

    Integrated supertagging and parsing

    EuroMatrixPlus project funded by the European Commission, 7th Framework Programme.

    Parsing is the task of assigning syntactic or semantic structure to a natural language sentence. This thesis focuses on syntactic parsing with Combinatory Categorial Grammar (CCG; Steedman 2000). CCG allows incremental processing, which is essential for speech recognition and some machine translation models, and it can build semantic structure in tandem with syntactic parsing. Supertagging solves a subset of the parsing task by assigning lexical types to words in a sentence using a sequence model. It has emerged as a way to improve the efficiency of full CCG parsing (Clark and Curran, 2007) by reducing the parser's search space. This has been very successful, and it is the central theme of this thesis. We begin with an analysis of how efficiency is traded for accuracy in supertagging. Pruning the search space by supertagging is inherently approximate, so to contrast it we include A* in our analysis, a classic exact search technique. Interestingly, we find that combining the two methods improves efficiency, but we also demonstrate that excessive pruning by a supertagger significantly lowers the upper bound on the accuracy of a CCG parser. Inspired by this analysis, we design a single integrated model with both supertagging and parsing features, rather than separating them into distinct models chained together in a pipeline. To overcome the resulting complexity, we experiment with both loopy belief propagation and dual decomposition approaches to inference, the first empirical comparison of these algorithms that we are aware of on a structured natural language processing problem. Finally, we address training the integrated model. We adopt the idea of optimising directly for a task-specific metric, as is common in other areas such as statistical machine translation. We demonstrate how a novel dynamic programming algorithm enables us to optimise for F-measure, our task-specific evaluation metric, and we experiment with approximations, which prove to be excellent substitutes. Each of the presented methods improves over the state of the art in CCG parsing. Moreover, the improvements are additive, achieving a labelled/unlabelled dependency F-measure on CCGbank of 89.3%/94.0% with gold part-of-speech tags, and 87.2%/92.8% with automatic part-of-speech tags, the best reported results for this task to date. Our techniques are general, and we expect them to apply to other parsing problems, including lexicalised tree adjoining grammar and context-free grammar parsing.
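
    The pruning step at the heart of supertagger-driven parsing is easy to sketch (a simplified rendering of adaptive supertagging in the style of Clark and Curran (2007), not this thesis's integrated model; the categories and probabilities below are made up): each word keeps every lexical category whose probability is within a factor beta of its best category, and the parser searches only over those.

        def prune_supertags(tag_probs, beta=0.075):
            # tag_probs: one dict per word mapping CCG category -> probability.
            # Keep categories within a factor beta of each word's best one;
            # a larger beta prunes harder, shrinking the parser's search space
            # but risking the loss of the correct category.
            pruned = []
            for probs in tag_probs:
                best = max(probs.values())
                pruned.append({cat: p for cat, p in probs.items() if p >= beta * best})
            return pruned

        sentence = [
            {"NP": 0.85, "N": 0.12, "S/S": 0.03},   # "Mary"
            {"(S\\NP)/NP": 0.90, "N": 0.10},        # "saw"
            {"NP": 0.70, "N": 0.25, "NP/N": 0.05},  # "John"
        ]
        print([sorted(p) for p in prune_supertags(sentence, beta=0.1)])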

    Parsing Inside-Out

    The inside-outside probabilities are typically used for reestimating Probabilistic Context-Free Grammars (PCFGs), just as the forward-backward probabilities are typically used for reestimating HMMs. I show several novel uses, including improving parser accuracy by matching parsing algorithms to evaluation criteria; speeding up DOP parsing by a factor of 500; and achieving 30 times faster PCFG thresholding at a given accuracy level. I also give an elegant, state-of-the-art grammar formalism, which can be used to compute inside-outside probabilities, and a parser description formalism, which makes it easy to derive inside-outside formulas and many others. Comment: Ph.D. Thesis, 257 pages, 40 PostScript figures.
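
    The inside half of the computation is compact enough to sketch (a minimal CKY-style implementation for a PCFG in Chomsky normal form, with a made-up toy grammar; the outside pass runs top-down over the same chart analogously):

        from collections import defaultdict

        def inside_probs(words, lex, rules):
            # Inside probabilities for a CNF PCFG.
            #   lex:   dict (A, word) -> P(A -> word)
            #   rules: dict (A, B, C) -> P(A -> B C)
            # Returns chart[(i, j)][A] = P(A derives words[i:j]).
            n = len(words)
            chart = defaultdict(lambda: defaultdict(float))
            for i, w in enumerate(words):
                for (A, word), p in lex.items():
                    if word == w:
                        chart[(i, i + 1)][A] += p
            for span in range(2, n + 1):
                for i in range(n - span + 1):
                    j = i + span
                    for k in range(i + 1, j):
                        for (A, B, C), p in rules.items():
                            pb = chart[(i, k)][B]
                            pc = chart[(k, j)][C]
                            if pb and pc:
                                chart[(i, j)][A] += p * pb * pc
            return chart

        lex = {("NP", "Mary"): 0.9, ("VP", "sleeps"): 0.6}
        rules = {("S", "NP", "VP"): 1.0}
        chart = inside_probs(["Mary", "sleeps"], lex, rules)
        print(chart[(0, 2)]["S"])  # 0.54 = 1.0 * 0.9 * 0.6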