80 research outputs found

    Stream Processing using Grammars and Regular Expressions

    In this dissertation we study regular expression based parsing and the use of grammatical specifications for the synthesis of fast, streaming string-processing programs. In the first part we develop two linear-time algorithms for regular expression based parsing with Perl-style greedy disambiguation. The first algorithm operates in two passes in a semi-streaming fashion, using a constant amount of working memory and an auxiliary tape storage which is written in the first pass and consumed by the second. The second algorithm is a single-pass and optimally streaming algorithm which outputs as much of the parse tree as is semantically possible based on the input prefix read so far, and resorts to buffering as many symbols as is required to resolve the next choice. Optimality is obtained by performing a PSPACE-complete pre-analysis on the regular expression. In the second part we present Kleenex, a language for expressing high-performance streaming string processing programs as regular grammars with embedded semantic actions, and its compilation to streaming string transducers with worst-case linear-time performance. Its underlying theory is based on transducer decomposition into oracle and action machines, and a finite-state specialization of the streaming parsing algorithm presented in the first part. In the second part we also develop a new linear-time streaming parsing algorithm for parsing expression grammars (PEG) which generalizes the regular grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm reformulated using least fixed points and evaluated using an instance of the chaotic iteration scheme by Cousot and Cousot.
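To make "Perl-style greedy disambiguation" concrete, the following minimal sketch shows which parse that policy selects when a regular expression is ambiguous. It uses Python's standard `re` module, which is a backtracking matcher and not the linear-time algorithms of the dissertation; the helper name `greedy_parse` is hypothetical and introduced only for illustration.

```python
import re

def greedy_parse(pattern, text):
    """Return the capture groups of the full match chosen by Python's
    backtracking engine (leftmost alternative first, greedy repetition),
    or None if the whole text does not match."""
    m = re.fullmatch(pattern, text)
    return m.groups() if m else None

# "abcd" can be split as a+bcd+"" or as ab+c+d. Greedy disambiguation
# prefers the leftmost alternative at each choice point, so "a" wins.
print(greedy_parse(r"(a|ab)(c|bcd)(d*)", "abcd"))  # ('a', 'bcd', '')

# With two starred groups, the first star consumes as much as it can.
print(greedy_parse(r"(a*)(a*)", "aaa"))  # ('aaa', '')
```

A longest-match (POSIX-style) policy would pick the split `ab`, `c`, `d` for the first example instead, which is why the disambiguation rule has to be stated as part of the parsing problem.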

    Comparison of two recent algorithms for grammatical inference of regular languages using nondeterministic automata

    The development of new algorithms that are both convergent and efficient is a necessary step toward the productive use of grammatical inference in solving larger, real-world problems. This work presents two algorithms, DeLeTe2 and MRIA, that perform grammatical inference using nondeterministic automata, in contrast to the more commonly used algorithms, which rely on deterministic automata. The advantages and disadvantages of this change in the representation model are considered through a detailed description and comparison of the two inference algorithms with respect to the approach used in their implementation, their computational complexity, their termination criteria, and their performance on a body of synthetic data.

    Mining Multiple Web Sources Using Non-Deterministic Finite State Automata

    Existing web content extraction systems use unsupervised, supervised, and semi-supervised approaches. The WebOMiner system is an automatic web content data extraction system which models a specific Business to Customer (B2C) web site such as bestbuy.com using an object-oriented database schema. WebOMiner extracts different web page content types, such as product, list, and text, using a manually generated non-deterministic finite automaton (NFA). This thesis extends the automatic web content data extraction techniques proposed in the WebOMiner system to handle multiple web sites and generate an integrated data warehouse automatically. We develop WebOMiner-2, which generates NFAs of specific domain classes from regular expressions extracted from the frequent patterns of web page DOM trees. Our algorithm can also handle NFA epsilon (ε) transitions and convert the NFA to a deterministic finite automaton (DFA) to identify different content tuples from a list of tuples. Experimental results show that our system is highly effective and performs the content extraction task with 100% precision and 98.35% recall.
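The conversion of an NFA with epsilon transitions into a DFA mentioned in this abstract is commonly done with the textbook subset construction. The sketch below illustrates that general technique only; it is not code from WebOMiner-2, and the names `EPS`, `epsilon_closure`, `nfa_to_dfa`, `dfa_accepts`, and the toy automaton are assumptions made for this example.

```python
from collections import deque

EPS = ""  # label for epsilon transitions (a convention of this sketch)

def epsilon_closure(states, delta):
    """Return all NFA states reachable from `states` by epsilon moves alone."""
    stack = list(states)
    closure = set(states)
    while stack:
        s = stack.pop()
        for t in delta.get((s, EPS), ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def nfa_to_dfa(start, accepting, delta, alphabet):
    """Subset construction: each DFA state is a frozenset of NFA states."""
    dfa_start = epsilon_closure({start}, delta)
    dfa_delta = {}
    seen = {dfa_start}
    queue = deque([dfa_start])
    while queue:
        current = queue.popleft()
        for sym in alphabet:
            moved = set()
            for s in current:
                moved.update(delta.get((s, sym), ()))
            target = epsilon_closure(moved, delta)
            if not target:
                continue  # no move on this symbol; the DFA stays partial
            dfa_delta[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_accepting = {q for q in seen if q & set(accepting)}
    return dfa_start, dfa_accepting, dfa_delta

def dfa_accepts(dfa, word):
    """Run the (partial) DFA on `word`; a missing transition rejects."""
    start, accepting, delta = dfa
    state = start
    for sym in word:
        state = delta.get((state, sym))
        if state is None:
            return False
    return state in accepting

# Toy NFA for "t p*" (say, one title token followed by any number of
# price tokens): 0 --t--> 1 --eps--> 2, with a p-loop on the accepting state 2.
toy_delta = {(0, "t"): {1}, (1, EPS): {2}, (2, "p"): {2}}
toy_dfa = nfa_to_dfa(0, {2}, toy_delta, alphabet=["t", "p"])
print(dfa_accepts(toy_dfa, "tpp"))  # True
print(dfa_accepts(toy_dfa, "pt"))   # False
```

Representing DFA states as frozensets keeps them hashable, so they can serve directly as dictionary keys in the transition table.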

    QUALITY IMPROVEMENT AND VALIDATION TECHNIQUES ON SOFTWARE SPECIFICATION AND DESIGN

    Ph.D., Doctor of Philosophy

    Improving Programming Support for Hardware Accelerators Through Automata Processing Abstractions

    The adoption of hardware accelerators, such as Field-Programmable Gate Arrays, into general-purpose computation pipelines continues to rise, driven by recent trends in data collection and analysis as well as pressure from challenging physical design constraints in hardware. The architectural designs of many of these accelerators stand in stark contrast to the traditional von Neumann model of CPUs. Consequently, existing programming languages, maintenance tools, and techniques are not directly applicable to these devices, meaning that additional architectural knowledge is required for effective programming and configuration. Current programming models and techniques are akin to assembly-level programming on a CPU, thus placing significant burden on developers tasked with using these architectures. Because programming is currently performed at such low levels of abstraction, the software development process is tedious and challenging and hinders the adoption of hardware accelerators. This dissertation explores the thesis that theoretical finite automata provide a suitable abstraction for bridging the gap between high-level programming models and maintenance tools familiar to developers and the low-level hardware representations that enable high-performance execution on hardware accelerators. We adopt a principled hardware/software co-design methodology to develop a programming model providing the key properties that we observe are necessary for success, namely performance and scalability, ease of use, expressive power, and legacy support. First, we develop a framework that allows developers to port existing, legacy code to run on hardware accelerators by leveraging automata learning algorithms in a novel composition with software verification, string solvers, and high-performance automata architectures. 
Next, we design a domain-specific programming language to aid programmers writing pattern-searching algorithms and develop compilation algorithms to produce finite automata, which support efficient execution on a wide variety of processing architectures. Then, we develop an interactive debugger for our new language, which allows developers to accurately identify the locations of bugs in software while maintaining support for high-throughput data processing. Finally, we develop two new automata-derived accelerator architectures to support additional applications, including the detection of security attacks and the parsing of recursive and tree-structured data. Using empirical studies, logical reasoning, and statistical analyses, we demonstrate that our prototype artifacts scale to real-world applications, maintain manageable overheads, and support developers' use of hardware accelerators. Collectively, the research efforts detailed in this dissertation help ease the adoption and use of hardware accelerators for data analysis applications, while supporting high-performance computation. PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/155224/1/angstadt_1.pd

    Formal synthesis of control and communication schemes

    Thesis (Ph.D.)--Boston University. In traditional motion planning, the problem is simply specified as "go from A to B while avoiding obstacles", where A and B are two configurations or regions of interest in the robot workspace. However, a large number of robotic applications require more expressive specification languages, which allow for logical and temporal statements about the satisfaction of properties of interest. Examples include "visit A and B infinitely often, always avoid C, and do not visit D unless E was visited before". Such task specifications cannot be trivially converted to a sequence of "go from A to B" primitives. This thesis establishes theoretical and computational frameworks for automatic synthesis of robot control and communication schemes that are correct-by-construction from task specifications given in expressive languages. We consider a purely discrete scenario, in which the dynamics of each robot is modeled as a finite discrete system. The first problem addressed in this thesis is the generation of provably-correct individual control and communication strategies for a team of robots from rich task specifications in the case when the workspace is static. The second problem relaxes this assumption and considers a scenario in which the environment changes according to some unknown patterns. It proposes a combined learning and formal synthesis approach to generate correct control policies. To tackle the first problem, we draw inspiration from the research fields of formal verification and synthesis, distributed formal synthesis, and concurrency theory. We consider a team of robots that can move among the regions of a partitioned environment and have known capabilities of servicing a set of requests that can occur in the regions of the partition. Some of these requests can be serviced by a robot individually, while some require the cooperation of groups of robots. 
We propose a top-down approach, in which global specifications given as Regular Expressions (RE) or Linear Temporal Logic (LTL) formulas can be decomposed into local (individual) specifications, which can then be used to automatically synthesize robot control and communication strategies. To address the second problem, we bring together automata learning methods from the field of theoretical linguistics and techniques from temporal logic games and probabilistic model checking, to develop a provably-correct control strategy for robots moving in an environment with unknown dynamics. The robots are required to achieve a surveillance mission, in which a certain request needs to be serviced repeatedly, while the expected time in between consecutive services is minimized and additional temporal logic constraints are satisfied. We define a fragment of Linear Temporal Logic (LTL) to describe such a mission. We consider a single agent case at first and then extend the results to multi-agent systems. To this end, we apply approximate dynamic programming to our computational framework, which leads to a significant reduction in computation time. To demonstrate the proposed theoretical and computational frameworks, we implement the derived algorithms on two experimental platforms, the Robotic Urban-Like Environment (RULE) and the Robotic InDoor-like Environment (RIDE). We assign tasks to the team using Regular Expressions or Linear Temporal Logic formulas over requests occurring at regions in the environment. The robots are automatically deployed to complete the missions.

    Quantitative Verification and Synthesis of Resilient Networks
