Linear Matching of JavaScript Regular Expressions
Modern regex languages have strayed far from well-understood traditional
regular expressions: they include features that fundamentally transform the
matching problem. In exchange for these features, modern regex engines at times
suffer from exponential complexity blowups, a frequent source of
denial-of-service vulnerabilities in JavaScript applications. Worse, regex
semantics differ across languages, and the impact of these divergences on
algorithmic design and worst-case matching complexity has seldom been
investigated.
This paper provides a novel perspective on JavaScript's regex semantics by
identifying a larger-than-previously-understood subset of the language that can
be matched with linear time guarantees. In the process, we discover several
cases where state-of-the-art algorithms were either wrong (semantically
incorrect), inefficient (suffering from superlinear complexity) or excessively
restrictive (assuming certain features could not be matched linearly). We
introduce novel algorithms to restore correctness and linear complexity. We
further advance the state-of-the-art in linear regex matching by presenting the
first nonbacktracking algorithms for matching lookarounds in linear time: one
supporting captureless lookbehinds in any regex language, and another
leveraging a JavaScript property to support unrestricted lookaheads and
lookbehinds. Finally, we describe new time and space complexity tradeoffs for
regex engines. All of our algorithms are practical: we validated them in a
prototype implementation, and some have also been merged in the V8 JavaScript
implementation used in Chrome and Node.js.
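To make the contrast with backtracking concrete, here is a minimal sketch of the classic state-set (automaton simulation) technique for a tiny regex subset: literal characters, `.`, and a postfix `*` on a single character. It is not the paper's algorithm and handles none of the JavaScript features discussed above (captures, lookarounds); the function names `parse`, `closure` and `full_match` are hypothetical. It only illustrates why tracking a set of pattern positions bounds the work per input character by the pattern size.

```python
def parse(pattern):
    """Split a tiny regex into (char, starred) tokens.

    Assumed subset: literal characters, '.', and a postfix '*'
    applied to the single preceding character.
    """
    tokens = []
    i = 0
    while i < len(pattern):
        starred = i + 1 < len(pattern) and pattern[i + 1] == "*"
        tokens.append((pattern[i], starred))
        i += 2 if starred else 1
    return tokens


def closure(tokens, states):
    """Add every position reachable by skipping starred tokens."""
    result = set(states)
    stack = list(states)
    while stack:
        s = stack.pop()
        if s < len(tokens) and tokens[s][1] and s + 1 not in result:
            result.add(s + 1)
            stack.append(s + 1)
    return result


def full_match(pattern, text):
    """Return True if the whole text matches the pattern.

    Each input character is processed once against a set of at most
    len(tokens) + 1 positions, so the running time is
    O(len(text) * len(pattern)), with no backtracking.
    """
    tokens = parse(pattern)
    current = closure(tokens, {0})
    for ch in text:
        following = set()
        for s in current:
            if s < len(tokens):
                tok, starred = tokens[s]
                if tok == "." or tok == ch:
                    # A starred token may repeat; a plain one advances.
                    following.add(s if starred else s + 1)
        current = closure(tokens, following)
        if not current:
            return False
    return len(tokens) in current
```

A pattern such as `a*a*a*a*b` run against a long string of `a` characters is a typical trigger for superlinear behaviour in a backtracking matcher; in this sketch the state set simply stays at a handful of positions for every character.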
Backtracking reference stores
François Pottier's unionFind library is parameterized over an underlying store of mutable references, and provides the usual references, transactional reference stores (for rolling back some changes in case of higher-level errors), and persistent reference stores. We extend this library with a new implementation of backtracking reference stores, to get a Union-Find implementation that efficiently supports arbitrary backtracking and also subsumes the transactional interface. Our backtracking reference stores are not specific to unionFind; they can be used to build arbitrary backtracking data structures. The natural implementation, using a journal to record all writes, provides amortized-constant-time operations with a space overhead linear in the number of store updates. A refined implementation reduces the memory overhead to be linear in the number of store cells updated, and gives performance that matches non-backtracking references in practice.
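The journal-based approach described above can be sketched as follows. This is an illustrative Python model, not the unionFind library's OCaml interface: the class and method names (`Ref`, `JournalStore`, `make`, `get`, `set`, `snapshot`, `restore`) are assumptions made for this example. Every write appends the old value to a journal, so the space overhead grows with the number of updates, which is the cost the refined implementation is said to reduce.

```python
class Ref:
    """A mutable cell; the store mediates all reads and writes."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value


class JournalStore:
    """Hypothetical backtracking store that records every write."""

    def __init__(self):
        # Each entry is (ref, value the ref held before the write).
        self._journal = []

    def make(self, value):
        return Ref(value)

    def get(self, ref):
        return ref.value

    def set(self, ref, value):
        # Record the old value first so the write can be undone later.
        self._journal.append((ref, ref.value))
        ref.value = value

    def snapshot(self):
        # A snapshot is simply the current journal length.
        return len(self._journal)

    def restore(self, snap):
        # Undo writes in reverse order until the journal is back at
        # the length it had when the snapshot was taken.
        while len(self._journal) > snap:
            ref, old = self._journal.pop()
            ref.value = old
```

In this sketch, restoring to a snapshot invalidates any snapshot taken after it, since the journal entries those later snapshots refer to have been discarded.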
Kindly Bent to Free Us
Systems programming often requires the manipulation of resources like file
handles, network connections, or dynamically allocated memory. Programmers need
to follow certain protocols to handle these resources correctly. Violating
these protocols causes bugs ranging from type mismatches and data races to
use-after-free errors and memory leaks. These bugs often lead to security
vulnerabilities.
While statically typed programming languages guarantee type soundness and
memory safety by design, most of them do not address issues arising from
improper handling of resources. An important step towards handling resources is
the adoption of linear and affine types that enforce single-threaded resource
usage. However, the few languages supporting such types require heavy type
annotations.
We present Affe, an extension of ML that manages linearity and affinity
properties using kinds and constrained types. In addition, Affe supports the
exclusive and shared borrowing of affine resources, inspired by features of
Rust. Moreover, Affe retains the defining features of the ML family: it is an
impure, strict, functional expression language with complete principal type
inference and type abstraction. Affe does not require any linearity annotations
in expressions and supports common functional programming idioms.
Comment: ICFP 202
Programming Languages and Systems
This open access book constitutes the proceedings of the 31st European Symposium on Programming, ESOP 2022, which was held during April 5-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 21 regular papers presented in this volume were carefully reviewed and selected from 64 submissions. They deal with fundamental issues in the specification, design, analysis, and implementation of programming languages and systems.
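From Types to Logical Assertions: Automatic or Assisted Proof of Properties of Functional Programs (Des types aux assertions logiques : preuve automatique ou assistée de propriétés sur les programmes fonctionnels)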
This work studies two approaches to improving the safety of computer programs using static analysis. The first one is typing, which guarantees that the evaluation of a program cannot fail. The functional language ML has a very rich type system and also an algorithm that infers types automatically. We focus on its adaptation to generalized algebraic data types (GADTs). In this setting, efficient computation of a most general type is impossible. We propose a stratification of the language that retains the usual characteristics of the ML fragment and makes the use of GADTs explicit. The resulting language, MLGX, places a burden on the programmer, who must annotate programs heavily. A second stratum, MLGI, offers a mechanism to infer most of the type annotations locally, in a way that is predictable and efficient, yet incomplete. The first part concludes with an illustration of the expressiveness of GADTs in encoding the invariants of the pushdown automata used in LR parsing. The second approach augments the language with logic assertions that enable arbitrarily complex specifications to be expressed. We check the compliance of the program semantics with these specifications using a method called Hoare logic and semi-automatic computer-based proofs. The design choices make it possible to handle first-class functions. They are guided by an implementation, which is illustrated by the certification of a module of trees that denote finite sets.
This thesis studies two approaches based on static analysis to increase the dependability and correctness of computer programs. The first approach is typing, which makes it possible to prove automatically that a program evaluates without failing. The functional language ML has a very rich type system and an algorithm that infers these types automatically. We study the adaptation of this algorithm to generalized algebraic data types (GADTs), a restricted form of Coq's inductive types, which were introduced by Hongwei Xi in 2003. In this setting, efficient computation of a most general type is impossible. We propose a stratification that preserves the usual characteristics of the ML fragment and isolates the treatment of GADTs by making their use explicit. The resulting language, MLGX, requires type annotations that weigh programs down. A second stratum, MLGI, offers the programmer a mechanism for local inference of most of these annotations that is predictable and efficient, although incomplete. The first part ends with a demonstration of the expressiveness of GADTs for encoding the invariants of the pushdown automata used in LR parsing. The second approach augments the programming language with logical assertions that can express specifications of arbitrary complexity in polymorphically typed higher-order logic. We statically check that the program's semantics conforms to these specifications using a technique called Hoare logic, which consists of generating a set of proof obligations from an annotated program. Once these proof obligations have been discharged, if a program is used correctly and returns a value, then that value is certain to be correct. This technique is usually applied to imperative languages. With a pure functional language, the problems related to memory state vanish, while higher-order functions and polymorphism raise new ones. Our design choices seek to maximize the opportunities for using automatic provers by carefully translating higher-order objects into first-order objects. A prototype implementation of the system illustrates this with the almost fully automatic proof of a CAML module of balanced trees denoting finite sets.
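As a rough illustration of what a specification for "trees that denote finite sets" can look like, the sketch below states a precondition and two postconditions for insertion into a binary search tree. It is only a runtime-checked analogue written in Python: the thesis discharges such obligations statically with Hoare logic and provers, and certifies balanced trees in CAML, whereas this example uses plain unbalanced trees and `assert`. All names (`elements`, `is_search_tree`, `insert`) are hypothetical.

```python
# A tree is either None (the empty set) or a tuple (left, key, right).

def elements(t):
    """Denotation: the finite set of keys that a tree stands for."""
    if t is None:
        return frozenset()
    left, key, right = t
    return elements(left) | {key} | elements(right)


def is_search_tree(t, lo=None, hi=None):
    """Invariant: every key lies strictly between lo and hi."""
    if t is None:
        return True
    left, key, right = t
    if lo is not None and key <= lo:
        return False
    if hi is not None and key >= hi:
        return False
    return is_search_tree(left, lo, key) and is_search_tree(right, key, hi)


def _insert(x, t):
    """Insertion without any checks; returns a new tree."""
    if t is None:
        return (None, x, None)
    left, key, right = t
    if x < key:
        return (_insert(x, left), key, right)
    if x > key:
        return (left, key, _insert(x, right))
    return t


def insert(x, t):
    """Insertion wrapped in its specification, checked at run time."""
    assert is_search_tree(t)                      # precondition
    result = _insert(x, t)
    assert is_search_tree(result)                 # invariant preserved
    assert elements(result) == elements(t) | {x}  # denotes t plus x
    return result
```

The second postcondition is the interesting one: it relates the concrete tree to the mathematical set it denotes, which is the kind of property a type system alone does not express.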