Faster Compression of Deterministic Finite Automata
Deterministic finite automata (DFA) are a classic tool for high throughput
matching of regular expressions, both in theory and practice.
Due to their high space consumption, extensive research has been devoted to
compressed representations of DFAs that still support efficient pattern
matching queries.
Kumar et al. [SIGCOMM 2006] introduced the delayed deterministic finite
automaton (D²FA), which exploits the large redundancy between inter-state
transitions in the automaton.
They showed that it achieves up to two orders of magnitude compression of
real-world DFAs, and their work formed the basis of numerous subsequent
results.
Their algorithm, like the later algorithms based on their idea, has an
inherent quadratic-time bottleneck: every pair of states is considered to
compute the optimal compression.
In this work we present a simple, general framework based on
locality-sensitive hashing for speeding up these algorithms to achieve
sub-quadratic construction times for D²FAs.
We apply the framework to speed up several algorithms to near-linear time,
and experimentally evaluate their performance on real-world regular expression
sets extracted from modern intrusion detection systems.
We find an order-of-magnitude improvement in compression times, with little
or no loss of compression, and in some cases significantly better
compression.
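The default-transition idea behind D²FAs, and the LSH-style bucketing used to avoid comparing all state pairs, can be sketched roughly as follows. This is a hypothetical toy, not the paper's actual algorithm: `compress_d2fa`, the band count, and the restriction of default pointers to lower-numbered states (a crude stand-in for the spanning-forest constructions in the literature, which also bound default-path length) are all illustrative simplifications.

```python
# Toy D2FA-style compression: replace transitions shared with a "default"
# state by a single default pointer, finding candidate defaults via
# LSH-style banding instead of comparing all O(n^2) state pairs.
from collections import defaultdict

def compress_d2fa(dfa, bands=4):
    """dfa: {state: tuple(next_state for each alphabet symbol)}.
    Returns {state: (default_state, {symbol_index: next_state})}."""
    n_syms = len(next(iter(dfa.values())))
    band_size = (n_syms + bands - 1) // bands
    buckets = defaultdict(list)          # LSH: similar rows collide in a band
    for s, row in dfa.items():
        for b in range(bands):
            buckets[(b, row[b * band_size:(b + 1) * band_size])].append(s)

    compressed = {}
    for s, row in dfa.items():
        # Candidate defaults: states colliding with s in at least one band.
        # Restricting to lower-numbered states keeps default chains acyclic.
        cands = {c for b in range(bands)
                 for c in buckets[(b, row[b * band_size:(b + 1) * band_size])]
                 if c < s}
        best, best_shared = None, 1      # need >1 shared transition to pay off
        for c in sorted(cands):
            shared = sum(x == y for x, y in zip(row, dfa[c]))
            if shared > best_shared:
                best, best_shared = c, shared
        if best is None:
            compressed[s] = (None, dict(enumerate(row)))
        else:
            compressed[s] = (best, {i: x for i, (x, y)
                                    in enumerate(zip(row, dfa[best])) if x != y})
    return compressed
```

At match time, a state with a default pointer stores only its differing transitions; on a miss, the matcher follows the default pointer and retries, which is where the space/throughput trade-off of D²FAs comes from.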
Subpath Queries on Compressed Graphs: A Survey
Text indexing is a classical algorithmic problem that has been studied for over four decades: given a text T, pre-process it off-line so that, later, we can quickly count and locate the occurrences of any string (the query pattern) in T in time proportional to the query’s length. The earliest optimal-time solution to the problem, the suffix tree, dates back to 1973 and requires up to two orders of magnitude more space than the plain text just to be stored. In the year 2000, two breakthrough works showed that efficient queries can be achieved without this space overhead: a fast index can be stored in space proportional to the text’s entropy. These contributions had an enormous impact in bioinformatics: today, virtually any DNA aligner employs compressed indexes. Recent trends considered more powerful compression schemes (dictionary compressors) and generalizations of the problem to labeled graphs: after all, texts can be viewed as labeled directed paths. In turn, since finite state automata can be considered as a particular case of labeled graphs, these findings created a bridge between the fields of compressed indexing and regular language theory, ultimately allowing us to index regular languages and promising to shed new light on problems such as regular expression matching. This survey is a gentle introduction to the main landmarks of the fascinating journey that took us from suffix trees to today’s compressed indexes for labeled graphs and regular languages.
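The indexing problem described above can be illustrated with a toy suffix-array sketch (all names here are assumptions for illustration; real indexes such as suffix trees and FM-indexes achieve far better time and space bounds than this demo):

```python
# Toy text index: sort all suffix start positions once, then count
# occurrences of any pattern by binary search over the sorted suffixes.
from bisect import bisect_left, bisect_right

def build_suffix_array(text):
    # O(n^2 log n) construction -- fine for a demo, hopeless for genomes.
    return sorted(range(len(text)), key=lambda i: text[i:])

def count_occurrences(text, sa, pattern):
    # All suffixes starting with `pattern` form a contiguous run in sa.
    suffixes = [text[i:] for i in sa]        # materialized only for clarity
    lo = bisect_left(suffixes, pattern)
    hi = bisect_right(suffixes, pattern + "\uffff")  # sentinel past any prefix
    return hi - lo
```

The contiguity of matching suffixes in sorted order is the single structural fact that every index in this survey's lineage, from suffix trees to FM-indexes, exploits in some compressed form.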
Doctor of Philosophy
Synthetic biology is a new field in which engineers, biologists, and chemists are working together to transform genetic engineering into an advanced engineering discipline, one in which the design and construction of novel genetic circuits are made possible through the application of engineering principles. This dissertation explores two engineering strategies to address the challenges of working with genetic technology, namely the development of standards for describing genetic components and circuits at separate yet connected levels of detail and the use of Genetic Design Automation (GDA) software tools to simplify and speed up the process of optimally designing genetic circuits. Its contributions to the field of synthetic biology include (1) a proposal for the next version of the Synthetic Biology Open Language (SBOL), an existing standard for specifying and exchanging genetic designs electronically, and (2) a GDA workflow that enables users of the software tool iBioSim to create an abstract functional specification, automatically select genetic components that satisfy the specification from a design library, and compose the selected components into a standardized genetic circuit design for subsequent analysis and physical construction. Ultimately, this dissertation demonstrates how existing techniques and concepts from electrical and computer engineering can be adapted to overcome the challenges of genetic design and is an example of what is possible when working with publicly available standards for genetic design.
Structure and conformational dynamics of fatty acid synthases
Multistep reactions rely on substrate channeling between active sites. Carrier protein-based enzyme systems constitute the most versatile class of shuttling systems due to their capability of linking multiple catalytic centers. In eukaryotes and some bacteria, these systems have evolved into multifunctional enzymes, which integrate all functional domains involved into one or more giant polypeptide chains. The metazoan fatty acid synthase (FAS) is a key paradigm for carrier protein-based multienzymes. It catalyzes the de novo biosynthesis of fatty acids from carbohydrate-derived precursors in more than 40 individual reaction steps. Its seven functional domains are encoded on one polypeptide chain, which assembles into an X-shaped dimer for activity. The dimer features two lateral reaction clefts, each equipped with a full set of active sites and a flexibly tethered carrier protein. Substrate loading and condensation in the condensing region are structurally and functionally separated from the β-carbon processing domains in the modifying region.
At the beginning of this thesis, only a single crystal structure of an intact metazoan FAS was known. According to electron microscopy (EM) studies, FAS, in particular its modifying region, displays extensive conformational variability. Thus, the first aim was to obtain a crystal structure of the FAS modifying region, identify its ground-state conformation, and characterize its structural heterogeneity. The second aim was to establish a method for mapping conformational changes in multienzymes at high spatiotemporal resolution. Chapter 1 introduces FAS and gives a methodological overview of studying conformational dynamics of multienzymes.
In chapter 2, the 2.7-Å crystal structure of the entire 250-kDa modifying region of insect FAS is presented. It reveals a conserved ground-state conformation adopted even by this most divergent member of the metazoan FAS family. Remarkably, even the V-shape of the central dehydratase dimer is conserved, despite a minimal interface. Structural comparison to polyketide synthases (PKSs) highlights distinct properties of FAS, such as strong domain interactions and the absence of an N-terminal β-α-β-α extension of the lateral non-catalytic pseudo-ketoreductase.
Chapter 3 presents a novel approach for identifying conformational dynamics of multienzymes in solution by filming with high-speed atomic force microscopy (HS-AFM) at a spatial resolution of 2-5 nm. The temporal resolution of 10 fps correlates with the timescale of large-scale conformational changes in FAS. Varied viewing angles are provided by combining different molecular tethering strategies. Reference-free particle classification enables the quantitative characterization of conformational states and their transitions.
Chapter 4 discusses the implications of the results for multienzyme biology, with respect to both biological questions and biotechnological applications. The results of this thesis provide the tools to understand the role of conformational dynamics in multienzyme function. These findings will ultimately advance the engineering of enzymatic assembly lines for tailored compound production.
Tools and Algorithms for the Construction and Analysis of Systems
This open access two-volume set constitutes the proceedings of the 27th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2021, which was held during March 27 – April 1, 2021, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021. The conference was planned to take place in Luxembourg and changed to an online format due to the COVID-19 pandemic. A total of 41 full papers presented in the proceedings were carefully reviewed and selected from 141 submissions. The volumes also contain 7 tool papers, 6 tool demo papers, and 9 SV-COMP competition papers. The papers are organized in topical sections as follows: Part I: Game Theory; SMT Verification; Probabilities; Timed Systems; Neural Networks; Analysis of Network Communication. Part II: Verification Techniques (not SMT); Case Studies; Proof Generation/Validation; Tool Papers; Tool Demo Papers; SV-COMP Tool Competition Papers.
Understanding Flaws in the Deployment and Implementation of Web Encryption
In recent years, the web has switched from using the unencrypted HTTP protocol to using encrypted communications. Primarily, this resulted in increasing deployment of TLS to mitigate information leakage over the network. This development has led many web service operators to mistakenly think that migrating from HTTP to HTTPS will magically protect them from information leakage, without any additional effort on their end to guarantee the desired security properties. In reality, despite the fact that enough infrastructure is in place and the protocols have been “tested” (by virtue of being in wide, but not ubiquitous, use for many years), deploying HTTPS is a highly challenging task, due to the technical complexity of its underlying protocols (i.e., HTTP, TLS) as well as the complexity of the TLS certificate ecosystem and that of popular client applications such as web browsers. For example, we found that many websites still avoid ubiquitous encryption and force only critical functionality and sensitive data access over encrypted connections, while allowing more innocuous functionality to be accessed over HTTP. In practice, this approach is prone to flaws that can expose sensitive information or functionality to third parties. Thus, it is crucial for developers to verify the correctness of their deployments and implementations.
In this dissertation, in an effort to improve users’ privacy, we highlight semantic flaws in the implementations of both web servers and clients, caused by the improper deployment of web encryption protocols. First, we conduct an in-depth assessment of major websites and explore what functionality and information is exposed to attackers that have hijacked a user’s HTTP cookies. We identify a recurring pattern across websites with partially deployed HTTPS, namely, that service personalization inadvertently results in the exposure of private information. The separation of functionality across multiple cookies with different scopes and inter-dependencies further complicates matters, as imprecise access control renders restricted account functionality accessible to non-secure cookies. Our cookie hijacking study reveals a number of severe flaws; for example, attackers can obtain the user’s saved address and visited websites from Google, while Bing and Yahoo allow attackers to extract the contact list and send emails from the user’s account. To estimate the extent of the threat, we run measurements on a university public wireless network for a period of 30 days and detect over 282K accounts exposing the cookies required for our hijacking attacks.
Next, we study security mechanisms designed to eliminate this problem by enforcing encryption, such as HSTS and HTTPS Everywhere. We evaluate each mechanism in terms of its adoption and effectiveness. We find that all mechanisms suffer from implementation flaws or deployment issues, and argue that, as long as servers do not support ubiquitous encryption across their entire domain, no mechanism can effectively protect users from cookie hijacking and information leakage.
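The partial-deployment pitfalls discussed above can be made concrete with a minimal header audit. This is a hypothetical sketch, not a tool from the dissertation: `audit_headers` and its message strings are illustrative, and it only checks the two attributes most relevant to cookie hijacking.

```python
# Minimal sketch: given a site's response headers, flag cookies that lack
# the Secure attribute (leakable over plain HTTP) or HttpOnly, and check
# whether an HSTS (Strict-Transport-Security) header is present at all.
def audit_headers(headers):
    """headers: list of (name, value) response-header pairs."""
    findings = []
    if not any(n.lower() == "strict-transport-security" for n, _ in headers):
        findings.append("no HSTS header: requests may still go over HTTP")
    for name, value in headers:
        if name.lower() != "set-cookie":
            continue
        cookie = value.split("=", 1)[0].strip()
        attrs = {a.strip().lower() for a in value.split(";")[1:]}
        if "secure" not in attrs:
            findings.append(f"cookie '{cookie}' lacks Secure: exposed over HTTP")
        if "httponly" not in attrs:
            findings.append(f"cookie '{cookie}' lacks HttpOnly: readable by scripts")
    return findings
```

A session cookie without the Secure attribute is exactly the artifact the hijacking study above exploits: it accompanies any plain-HTTP request and can be captured on an open wireless network.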
Finally, as the security guarantees of TLS (and in turn HTTPS) are critically dependent on the correct validation of X.509 server certificates, we study hostname verification, a critical component in the certificate validation process. We develop HVLearn, a novel testing framework to verify the correctness of hostname verification implementations, and use HVLearn to analyze a number of popular TLS libraries and applications. In doing so, we found 8 unique violations of the RFC specifications. Several of these violations are critical and can render the affected implementations vulnerable to man-in-the-middle attacks.
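The rules that hostname verification must implement can be sketched in a few lines; this is a simplified, hypothetical reading of RFC 6125 (a `*` only as the whole left-most label, matching exactly one label, and never an entire registered domain). Real implementations handle many more corner cases, which is precisely why HVLearn-style testing finds bugs in them.

```python
# Simplified certificate hostname matching (illustrative, not a real
# TLS library's logic): case-insensitive label comparison, with a
# wildcard allowed only as the entire left-most label.
def hostname_matches(pattern, hostname):
    p = pattern.lower().split(".")
    h = hostname.lower().rstrip(".").split(".")
    if len(p) != len(h) or p[1:] != h[1:]:
        return False                 # non-wildcard labels must match exactly
    if p[0] == "*":
        # wildcard matches exactly one label; reject patterns like "*.com"
        # that could match an entire registered domain
        return len(p) >= 3
    return p[0] == h[0]
```

Disagreements between such a specification and a library's actual behavior (e.g. accepting `*.com`, or letting `*` span multiple labels) are exactly the RFC violations that make man-in-the-middle attacks possible.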
Automatic Detection and Repair of Input Validation and Sanitization Bugs
A crucial problem in developing dependable web applications is the correctness of the input validation and sanitization. Bugs in string manipulation operations used for validation and sanitization are common, resulting in erroneous application behavior and vulnerabilities that are exploitable by malicious users. In this dissertation, we investigate the problem of automatic detection and repair of validation and sanitization bugs both in client-side (JavaScript) and server-side (PHP or Java) code. We first present a formal model for input validation and sanitization functions, along with a new domain-specific intermediate language to represent them. Then, we show how to extract input validation and sanitization functions in our intermediate language from both client- and server-side code in web applications. After the extraction phase, we use automata-based static string-analysis techniques to automatically verify and fix the extracted functions. One of our contributions is the development of efficient automata-based string analysis techniques for frequently used, complex string operations. We developed two basic approaches to bug detection and repair: 1) policy-based, and 2) differential. In the policy-based approach, input validation and sanitization policies are expressed using two regular expressions, one specifying the maximum policy (the upper bound for the set of strings that should be allowed) and the other specifying the minimum policy (the lower bound for the set of strings that should be allowed). Using our string analysis techniques, we can identify two types of errors in an input validation and sanitization function: 1) it accepts a set of strings that is not permitted by the maximum policy (i.e., it is under-constrained), or 2) it rejects a set of strings that is permitted by the minimum policy (i.e., it is over-constrained). Our differential bug detection and repair approach does not require any policy specifications.
It exploits the fact that, in web applications, developers typically perform redundant input validation and sanitization in both the client- and the server-side code, since client-side checks can be bypassed. Using automata-based string analysis, we compare the input validation and sanitization functions extracted from the client- and server-side code, and identify and report the inconsistencies between them. Finally, we present an automated differential repair technique that can repair client- and server-side code with respect to each other, or across applications, in order to strengthen the validation and sanitization checks. Given a reference and a target function, our differential repair technique strengthens the validation and sanitization operations in the target function based on the reference function by automatically generating a set of patches. We experimented with a number of real-world web applications and found many bugs and vulnerabilities. Our analysis generates counter-example behaviors demonstrating the detected bugs and vulnerabilities to help the developers with the debugging process. Moreover, we automatically generate patches that can be used to mitigate the detected bugs and vulnerabilities until developers write their own patches.
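The policy-based check described above can be illustrated with a toy, sampling-based stand-in. The dissertation computes exact language containment on automata; this sketch merely probes a validator with sample inputs, and `check_policy` and the example policies are hypothetical names for illustration.

```python
# Toy policy-based bug detection: a validator is under-constrained if it
# accepts a string outside the maximum policy, and over-constrained if it
# rejects a string inside the minimum policy. (Sampling only -- the real
# analysis proves containment over the whole language via automata.)
import re

def check_policy(validator, max_policy, min_policy, samples):
    """validator: str -> bool; policies are regexes over whole strings."""
    max_re = re.compile(max_policy)
    min_re = re.compile(min_policy)
    bugs = []
    for s in samples:
        accepted = validator(s)
        if accepted and not max_re.fullmatch(s):
            bugs.append(("under-constrained", s))   # accepts a forbidden string
        if not accepted and min_re.fullmatch(s):
            bugs.append(("over-constrained", s))    # rejects a required string
    return bugs
```

The differential approach then plays the same game without policies: the server-side validator takes the role of the reference, and any input on which the client- and server-side functions disagree is reported as an inconsistency.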
Proceedings of the 22nd Conference on Formal Methods in Computer-Aided Design – FMCAD 2022
The Conference on Formal Methods in Computer-Aided Design (FMCAD) is an annual conference on the theory and applications of formal methods in hardware and system verification. FMCAD provides a leading forum for researchers in academia and industry to present and discuss groundbreaking methods, technologies, theoretical results, and tools for reasoning formally about computing systems. FMCAD covers formal aspects of computer-aided system design, including verification, specification, synthesis, and testing.
Programming Languages and Systems
This open access book constitutes the proceedings of the 30th European Symposium on Programming, ESOP 2021, which was held during March 27 – April 1, 2021, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021. The conference was planned to take place in Luxembourg and changed to an online format due to the COVID-19 pandemic. The 24 papers included in this volume were carefully reviewed and selected from 79 submissions. They deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems.