82 research outputs found

    Faster Compression of Deterministic Finite Automata

    Full text link
    Deterministic finite automata (DFA) are a classic tool for high throughput matching of regular expressions, both in theory and practice. Due to their high space consumption, extensive research has been devoted to compressed representations of DFAs that still support efficient pattern matching queries. Kumar~et~al.~[SIGCOMM 2006] introduced the \emph{delayed deterministic finite automaton} (\ddfa{}) which exploits the large redundancy between inter-state transitions in the automaton. They showed it to obtain up to two orders of magnitude compression of real-world DFAs, and their work formed the basis of numerous subsequent results. Their algorithm, as well as later algorithms based on their idea, have an inherent quadratic-time bottleneck, as they consider every pair of states to compute the optimal compression. In this work we present a simple, general framework based on locality-sensitive hashing for speeding up these algorithms to achieve sub-quadratic construction times for \ddfa{}s. We apply the framework to speed up several algorithms to near-linear time, and experimentally evaluate their performance on real-world regular expression sets extracted from modern intrusion detection systems. We find an order of magnitude improvement in compression times, with either little or no loss of compression, or even significantly better compression in some cases

    Subpath Queries on Compressed Graphs: A Survey

    Get PDF
    Text indexing is a classical algorithmic problem that has been studied for over four decades: given a text T, pre-process it off-line so that, later, we can quickly count and locate the occurrences of any string (the query pattern) in T in time proportional to the query’s length. The earliest optimal-time solution to the problem, the suffix tree, dates back to 1973 and requires up to two orders of magnitude more space than the plain text just to be stored. In the year 2000, two breakthrough works showed that efficient queries can be achieved without this space overhead: a fast index be stored in a space proportional to the text’s entropy. These contributions had an enormous impact in bioinformatics: today, virtually any DNA aligner employs compressed indexes. Recent trends considered more powerful compression schemes (dictionary compressors) and generalizations of the problem to labeled graphs: after all, texts can be viewed as labeled directed paths. In turn, since finite state automata can be considered as a particular case of labeled graphs, these findings created a bridge between the fields of compressed indexing and regular language theory, ultimately allowing to index regular languages and promising to shed new light on problems, such as regular expression matching. This survey is a gentle introduction to the main landmarks of the fascinating journey that took us from suffix trees to today’s compressed indexes for labeled graphs and regular languages

    Doctor of Philosophy

    Get PDF
    dissertationSynthetic biology is a new field in which engineers, biologists, and chemists are working together to transform genetic engineering into an advanced engineering discipline, one in which the design and construction of novel genetic circuits are made possible through the application of engineering principles. This dissertation explores two engineering strategies to address the challenges of working with genetic technology, namely the development of standards for describing genetic components and circuits at separate yet connected levels of detail and the use of Genetic Design Automation (GDA) software tools to simplify and speed up the process of optimally designing genetic circuits. Its contributions to the field of synthetic biology include (1) a proposal for the next version of the Synthetic Biology Open Language (SBOL), an existing standard for specifying and exchanging genetic designs electronically, and (2) a GDA work ow that enables users of the software tool iBioSim to create an abstract functional specication, automatically select genetic components that satisfy the specication from a design library, and compose the selected components into a standardized genetic circuit design for subsequent analysis and physical construction. Ultimately, this dissertation demonstrates how existing techniques and concepts from electrical and computer engineering can be adapted to overcome the challenges of genetic design and is an example of what is possible when working with publicly available standards for genetic design

    Structure and conformational dynamics of fatty acid synthases

    Get PDF
    Multistep reactions rely on substrate channeling between active sites. Carrier protein-based enzyme systems constitute the most versatile class of shuttling systems due to their capability of linking multiple catalytic centers. In eukaryotes and some bacteria, these systems have evolved to multifunctional enzymes, which integrate all functional domains involved into one or more giant polypeptide chains. The metazoan fatty acid synthase (FAS) is a key paradigm for carrier protein-based multienzymes. It catalyzes the de novo biosynthesis of fatty acids from carbohydrate-derived precursors in more than 40 individual reactions steps. Its seven functional domains are encoded on one polypeptide chain, which assembles into an X-shaped dimer for activity. The dimer features two lateral reaction clefts, each equipped with a full set of active sites and a flexibly tethered carrier protein. Substrate loading and condensation in the condensing region are structurally and functionally separated from the β-carbon processing domains in the modifying region. At the beginning of this thesis, only a single crystal structure of an intact metazoan FAS was known. FAS, in particular its modifying region, displays extensive conformational variability, according to electron microscopy (EM) studies. Thus, the aim was to obtain a crystal structure of the FAS modifying region to identify a ground-state structure of the FAS modifying region and to characterize its structural heterogeneity. The second aim was to establish a method for mapping conformational changes in multienzymes at high spatiotemporal resolution. Chapter 1 introduces FAS and gives a methodological overview of studying conformational dynamics of multienzymes. In chapter 2, the 2.7-Å crystal structure of the entire 250-kDa modifying region of insect FAS is presented. It presents a conserved ground-state conformation adopted by the most divergent member of metazoan FAS. Remarkably, even the V-shape of the central dehydratase dimer is conserved, despite a minimal interface. Structural comparison to polyketide synthases (PKSs) highlights distinct properties of FAS such as strong domain interactions and the absence of an N-terminal β-α-β-α extension of the lateral non-catalytic pseudo-ketoreductase. Chapter 3 presents a novel approach for identifying conformational dynamics of multienzymes in solution by filming with high-speed atomic force microscopy (HS-AFM) at a spatial resolution of 2-5 nm. The temporal resolution of 10 fps correlates with the timescale of large-scale conformational changes in FAS. Varied viewing angles are provided by combining different molecular tethering strategies. Reference-free particle classification enables the quantitative characterization of conformational states and their transitions. Chapter 4 discusses the implications of the results on multienzyme biology with respect to biological questions and biotechnological applications. The results of this thesis provide the tools to understand the role of conformational dynamics in multienzymes with respect to their function. These findings will ultimately advance the engineering of enzymatic assembly lines for tailored compound production

    Tools and Algorithms for the Construction and Analysis of Systems

    Get PDF
    This open access two-volume set constitutes the proceedings of the 27th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2021, which was held during March 27 – April 1, 2021, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021. The conference was planned to take place in Luxembourg and changed to an online format due to the COVID-19 pandemic. The total of 41 full papers presented in the proceedings was carefully reviewed and selected from 141 submissions. The volume also contains 7 tool papers; 6 Tool Demo papers, 9 SV-Comp Competition Papers. The papers are organized in topical sections as follows: Part I: Game Theory; SMT Verification; Probabilities; Timed Systems; Neural Networks; Analysis of Network Communication. Part II: Verification Techniques (not SMT); Case Studies; Proof Generation/Validation; Tool Papers; Tool Demo Papers; SV-Comp Tool Competition Papers

    Automatic Detection and Repair of Input Validation and Sanitization Bugs

    Get PDF
    A crucial problem in developing dependable web applications is thecorrectness of the input validation and sanitization. Bugs in stringmanipulation operations used for validation and sanitization are common,resulting in erroneous application behavior and vulnerabilities that areexploitable by malicious users. In this dissertation, we investigate theproblem of automatic detection and repair of validation and sanitization bugsboth at the client-side (JavaScript) and the server-side (PHP or Java) code.We first present a formal model for input validation and sanitizationfunctions along with a new domain specific intermediate languageto represent them. Then, we show how to extract input validation andsanitization functions in our intermediate language from both client andserver-side code in web applications. After the extraction phase, we useautomata-based static string-analysis techniques to automatically verifyand fix the extracted functions. One of our contributions is the developmentof efficient automata-based string analysis techniques for frequently used,complex string operations.We developed two basic approaches to bug detection and repair: 1)policy-based, and 2) differential. In the policy-based approach, inputvalidation and sanitization policies are expressed using two regularexpressions, one specifying the maximum policy (the upper bound for theset of strings that should be allowed) and the other specifying the minimumpolicy (the lower bound for the set of strings that should be allowed). Usingour string analysis techniques we can identify two types of errors inan input validation and sanitization function: 1) it accepts a set of strings thatis not permitted by the maximum policy (i.e., it is under-constrained),or 2) it rejects a set of strings that is permitted by the minimum policy(i.e., it is over-constrained).Our differential bug detection and repair approach does not require anypolicy specifications. It exploits the fact that, in web applications,developers typically perform redundant input validation and sanitizationin both the client and the server-side since client-side checks canbe by-passed. Using automata-based string analysis, we compare theinput validation and sanitization functions extracted from the client- andserver-side code, and identify and report the inconsistencies between them.Finally, we present an automated differential repair technique that canrepair client and server-side code with respect to each other, or acrossapplications in order to strengthen the validation and sanitizationchecks. Given a reference and a target function, our differential repairtechnique strengthens the validation and sanitization operations in thetarget function based on the reference function by automatically generatinga set of patches.We experimented with a number of real world web applications and found manybugs and vulnerabilities. Our analysis generates counter-example behaviorsdemonstrating the detected bugs and vulnerabilities to help the developerswith the debugging process. Moreover, we automatically generate patchesthat can be used to mitigate the detected bugs and vulnerabilities untildevelopers write their own patches

    Proceedings of the 22nd Conference on Formal Methods in Computer-Aided Design – FMCAD 2022

    Get PDF
    The Conference on Formal Methods in Computer-Aided Design (FMCAD) is an annual conference on the theory and applications of formal methods in hardware and system verification. FMCAD provides a leading forum to researchers in academia and industry for presenting and discussing groundbreaking methods, technologies, theoretical results, and tools for reasoning formally about computing systems. FMCAD covers formal aspects of computer-aided system design including verification, specification, synthesis, and testing

    Proceedings of the 22nd Conference on Formal Methods in Computer-Aided Design – FMCAD 2022

    Get PDF
    The Conference on Formal Methods in Computer-Aided Design (FMCAD) is an annual conference on the theory and applications of formal methods in hardware and system verification. FMCAD provides a leading forum to researchers in academia and industry for presenting and discussing groundbreaking methods, technologies, theoretical results, and tools for reasoning formally about computing systems. FMCAD covers formal aspects of computer-aided system design including verification, specification, synthesis, and testing

    Programming Languages and Systems

    Get PDF
    This open access book constitutes the proceedings of the 30th European Symposium on Programming, ESOP 2021, which was held during March 27 until April 1, 2021, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021. The conference was planned to take place in Luxembourg and changed to an online format due to the COVID-19 pandemic. The 24 papers included in this volume were carefully reviewed and selected from 79 submissions. They deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems
    • …
    corecore