12 research outputs found

    Sound regular expression semantics for dynamic symbolic execution of JavaScript

    Get PDF
    Existing support for regular expressions in automated test generation or verification tools is lacking. Common aspects of regular expression engines found in mainstream programming languages, such as backreferences or greedy matching, are commonly ignored or imprecisely approximated, leading to poor test coverage or failed proofs. In this paper, we present the first complete strategy to faithfully reason about regular expressions in the context of symbolic execution, focusing on the operators found in JavaScript. We model regular expression operations using string constraints and classical regular expressions and use a refinement scheme to address the problem of matching precedence and greediness. Our survey of over 400,000 JavaScript packages from the NPM software repository shows that one fifth make use of complex regular expressions features. We implemented our model in a dynamic symbolic execution engine for JavaScript and evaluated it on over 1,000 Node.js packages containing regular expressions, demonstrating that the strategy is effective and can increase line coverage of programs by up to 30%Comment: This arXiv version (v4) contains fixes for some typographical errors of the PLDI'19 version (the numbering of indices in Section 4.1 and the example in Section 4.3

    Deterministic Regular Expressions with Back-References

    Get PDF
    Most modern libraries for regular expression matching allow back-references (i.e. repetition operators) that substantially increase expressive power, but also lead to intractability. In order to find a better balance between expressiveness and tractability, we combine these with the notion of determinism for regular expressions used in XML DTDs and XML Schema. This includes the definition of a suitable automaton model, and a generalization of the Glushkov construction

    Stream Processing using Grammars and Regular Expressions

    Full text link
    In this dissertation we study regular expression based parsing and the use of grammatical specifications for the synthesis of fast, streaming string-processing programs. In the first part we develop two linear-time algorithms for regular expression based parsing with Perl-style greedy disambiguation. The first algorithm operates in two passes in a semi-streaming fashion, using a constant amount of working memory and an auxiliary tape storage which is written in the first pass and consumed by the second. The second algorithm is a single-pass and optimally streaming algorithm which outputs as much of the parse tree as is semantically possible based on the input prefix read so far, and resorts to buffering as many symbols as is required to resolve the next choice. Optimality is obtained by performing a PSPACE-complete pre-analysis on the regular expression. In the second part we present Kleenex, a language for expressing high-performance streaming string processing programs as regular grammars with embedded semantic actions, and its compilation to streaming string transducers with worst-case linear-time performance. Its underlying theory is based on transducer decomposition into oracle and action machines, and a finite-state specialization of the streaming parsing algorithm presented in the first part. In the second part we also develop a new linear-time streaming parsing algorithm for parsing expression grammars (PEG) which generalizes the regular grammars of Kleenex. The algorithm is based on a bottom-up tabulation algorithm reformulated using least fixed points and evaluated using an instance of the chaotic iteration scheme by Cousot and Cousot

    Xcerpt: A Rule-Based Query and Transformation Language for the Web

    Get PDF
    This thesis investigates querying the Web and the Semantic Web. It proposes a new rulebased query language called Xcerpt. Xcerpt differs from other query languages in that it uses patterns instead of paths for the selection of data, and in that it supports both rule chaining and recursion. Rule chaining serves for structuring large queries, as well as for designing complex query programs (e.g. involving queries to the Semantic Web), and for modelling inference rules. Query patterns may contain special constructs like partial subqueries, optional subqueries, or negated subqueries that account for the particularly flexible structure of data on the Web. Furthermore, this thesis introduces the syntax of the language Xcerpt, which is illustrated on a large collection of use cases both from the conventional Web and the Semantic Web. In addition, a declarative semantics in form of a Tarski-style model theory is described, and an algorithm is proposed that performs a backward chaining evaluation of Xcerpt programs. This algorithm has also been implemented (partly) in a prototypical runtime system. A salient aspect of this algorithm is the specification of a non-standard unification algorithm called simulation unification that supports the new query constructs described above. This unification is symmetric in the sense that variables in both terms can be bound. On the other hand it is in contrast to standard unification assymmetric in the sense that the unification determines that the one term is a subterm of the other term.Diese Arbeit untersucht das Anfragen des Webs und des Semantischen Webs. Sie stellt eine neue regel-basierte Anfragesprache namens Xcerpt vor. Xcerpt unterscheidet sich von anderen Anfragesprachen insofern, als dass es zur Selektion von Daten sog. Pattern (,,Muster'') verwendet und sowohl Regelschliessen als auch Rekursion unterstützt, was sowohl zur Strukturierung größerer Anfragen als auch zur Erstellung komplexer Anfrageprogramme, und zur Modellierung von Inferenzregeln dient. Anfrage-Pattern können spezielle Konstrukte, wie partielle Teilanfragen, optionale Teilanfragen, oder negierte Teilanfragen, enthalten, die der besonders flexiblen Struktur von Daten im Web genügen. In dieser Arbeit wird weiterhin die Syntax von Xcerpt eingeführt, und mit Hilfe mehrerer Anwendungsszenarien sowohl aus dem konventionellen als auch aus dem semantischen Web erläutert. Ausserdem wird eine deklarative Semantik im Stil von Tarski's Modelltheorie beschrieben und ein Algorithmus vorgeschlagen, der eine rückwärtsschliessende Auswertung von Xcerpt durchführt und in einem prototypischen Laufzeitsystem implementiert wurde. Wesentlicher Bestandteil des Rückwärtsschliessens ist die Spezifikation eines nicht-standard Unifikations-Algorithmus, der die oben genannten speziellen Xcerpt-Konstrukte berücksichtigt. Diese Unifikation ist symmetrisch in dem Sinne, dass sie Variablen in beiden angeglichenen (,,unifizierten'') Termen binden kann. Andererseits ist sie im Gegensatz zur Standardunifikation assymmetrisch in dem Sinne, dass der dadurch geleistete Angleich den einen Term als ,,Teilterm'' des anderen erkennt

    Cruiser and PhoTable: Exploring Tabletop User Interface Software for Digital Photograph Sharing and Story Capture

    Get PDF
    Digital photography has not only changed the nature of photography and the photographic process, but also the manner in which we share photographs and tell stories about them. Some traditional methods, such as the family photo album or passing around piles of recently developed snapshots, are lost to us without requiring the digital photos to be printed. The current, purely digital, methods of sharing do not provide the same experience as printed photographs, and they do not provide effective face-to-face social interaction around photographs, as experienced during storytelling. Research has found that people are often dissatisfied with sharing photographs in digital form. The recent emergence of the tabletop interface as a viable multi-user direct-touch interactive large horizontal display has provided the hardware that has the potential to improve our collocated activities such as digital photograph sharing. However, while some software to communicate with various tabletop hardware technologies exists, software aspects of tabletop user interfaces are still at an early stage and require careful consideration in order to provide an effective, multi-user immersive interface that arbitrates the social interaction between users, without the necessary computer-human interaction interfering with the social dialogue. This thesis presents PhoTable, a social interface allowing people to effectively share, and tell stories about, recently taken, unsorted digital photographs around an interactive tabletop. In addition, the computer-arbitrated digital interaction allows PhoTable to capture the stories told, and associate them as audio metadata to the appropriate photographs. By leveraging the tabletop interface and providing a highly usable and natural interaction we can enable users to become immersed in their social interaction, telling stories about their photographs, and allow the computer interaction to occur as a side-effect of the social interaction. Correlating the computer interaction with the corresponding audio allows PhoTable to annotate an automatically created digital photo album with audible stories, which may then be archived. These stories remain useful for future sharing -- both collocated sharing and remote (e.g. via the Internet) -- and also provide a personal memento both of the event depicted in the photograph (e.g. as a reminder) and of the enjoyable photo sharing experience at the tabletop. To provide the necessary software to realise an interface such as PhoTable, this thesis explored the development of Cruiser: an efficient, extensible and reusable software framework for developing tabletop applications. Cruiser contributes a set of programming libraries and the necessary application framework to facilitate the rapid and highly flexible development of new tabletop applications. It uses a plugin architecture that encourages code reuse, stability and easy experimentation, and leverages the dedicated computer graphics hardware and multi-core processors of modern consumer-level systems to provide a responsive and immersive interactive tabletop user interface that is agnostic to the tabletop hardware and operating platform, using efficient, native cross-platform code. Cruiser's flexibility has allowed a variety of novel interactive tabletop applications to be explored by other researchers using the framework, in addition to PhoTable. To evaluate Cruiser and PhoTable, this thesis follows recommended practices for systems evaluation. The design rationale is framed within the above scenario and vision which we explore further, and the resulting design is critically analysed based on user studies, heuristic evaluation and a reflection on how it evolved over time. The effectiveness of Cruiser was evaluated in terms of its ability to realise PhoTable, use of it by others to explore many new tabletop applications, and an analysis of performance and resource usage. Usability, learnability and effectiveness of PhoTable was assessed on three levels: careful usability evaluations of elements of the interface; informal observations of usability when Cruiser was available to the public in several exhibitions and demonstrations; and a final evaluation of PhoTable in use for storytelling, where this had the side effect of creating a digital photo album, consisting of the photographs users interacted with on the table and associated audio annotations which PhoTable automatically extracted from the interaction. We conclude that our approach to design has resulted in an effective framework for creating new tabletop interfaces. The parallel goal of exploring the potential for tabletop interaction as a new way to share digital photographs was realised in PhoTable. It is able to support the envisaged goal of an effective interface for telling stories about one's photos. As a serendipitous side-effect, PhoTable was effective in the automatic capture of the stories about individual photographs for future reminiscence and sharing. This work provides foundations for future work in creating new ways to interact at a tabletop and to the ways to capture personal stories around digital photographs for sharing and long-term preservation
    corecore