37 research outputs found

    Set-theoretic Types for Erlang

    Full text link
    Erlang is a functional programming language with dynamic typing. The language offers great flexibility for destructing values through pattern matching and dynamic type tests. Erlang also comes with a type language supporting parametric polymorphism, equi-recursive types, as well as union and a limited form of intersection types. However, type signatures only serve as documentation, there is no check that a function body conforms to its signature. Set-theoretic types and semantic subtyping fit Erlang's feature set very well. They allow expressing nearly all constructs of its type language and provide means for statically checking type signatures. This article brings set-theoretic types to Erlang and demonstrates how existing Erlang code can be statically typechecked without or with only minor modifications to the code. Further, the article formalizes the main ingredients of the type system in a small core calculus, reports on an implementation of the system, and compares it with other static typecheckers for Erlang.Comment: 14 pages, 9 figures, IFL 2022; latexmk -pdf to buil

    From Curry to Haskell: Paths to Abstraction in Programming Languages

    Get PDF

    Content-Based Textual File Type Detection at Scale

    Full text link
    Programming language detection is a common need in the analysis of large source code bases. It is supported by a number of existing tools that rely on several features, and most notably file extensions, to determine file types. We consider the problem of accurately detecting the type of files commonly found in software code bases, based solely on textual file content. Doing so is helpful to classify source code that lack file extensions (e.g., code snippets posted on the Web or executable scripts), to avoid misclassifying source code that has been recorded with wrong or uncommon file extensions, and also shed some light on the intrinsic recognizability of source code files. We propose a simple model that (a) use a language-agnostic word tokenizer for textual files, (b) group tokens in 1-/2-grams, (c) build feature vectors based on N-gram frequencies, and (d) use a simple fully connected neural network as classifier. As training set we use textual files extracted from GitHub repositories with at least 1000 stars, using existing file extensions as ground truth. Despite its simplicity the proposed model reaches 85% in our experiments for a relatively high number of recognized classes (more than 130 file types)

    Best-Effort Lazy Evaluation for Python Software Built on APIs

    Get PDF
    This paper focuses on an important optimization opportunity in Python-hosted domain-specific languages (DSLs): the use of laziness for optimization, whereby multiple API calls are deferred and then optimized prior to execution (rather than executing eagerly, which would require executing each call in isolation). In existing supports of lazy evaluation, laziness is "terminated" as soon as control passes back to the host language in any way, limiting opportunities for optimization. This paper presents Cunctator, a framework that extends this laziness to more of the Python language, allowing intermediate values from DSLs like NumPy or Pandas to flow back to the host Python code without triggering evaluation. This exposes more opportunities for optimization and, more generally, allows for larger computation graphs to be built, producing 1.03-14.2X speedups on a set of programs in common libraries and frameworks

    Constructing Hybrid Incremental Compilers for Cross-Module Extensibility with an Internal Build System

    Full text link
    Context: Compilation time is an important factor in the adaptability of a software project. Fast recompilation enables cheap experimentation with changes to a project, as those changes can be tested quickly. Separate and incremental compilation has been a topic of interest for a long time to facilitate fast recompilation. Inquiry: Despite the benefits of an incremental compiler, such compilers are usually not the default. This is because incrementalization requires cross-cutting, complicated, and error-prone techniques such as dependency tracking, caching, cache invalidation, and change detection. Especially in compilers for languages with cross-module definitions and integration, correctly and efficiently implementing an incremental compiler can be a challenge. Retrofitting incrementality into a compiler is even harder. We address this problem by developing a compiler design approach that reuses parts of an existing non-incremental compiler to lower the cost of building an incremental compiler. It also gives an intuition into compiling difficult-to-incrementalize language features through staging. Approach: We use the compiler design approach presented in this paper to develop an incremental compiler for the Stratego term-rewriting language. This language has a set of features that at first glance look incompatible with incremental compilation. Therefore, we treat Stratego as our critical case to demonstrate the approach on. We show how this approach decomposes the original compiler and has a solution to compile Stratego incrementally. The key idea on which we build our incremental compiler is to internally use an incremental build system to wire together the components we extract from the original compiler. Knowledge: The resulting compiler is already in use as a replacement of the original whole-program compiler. We find that the incremental build system inside the compiler is a crucial component of our approach. This allows a compiler writer to think in multiple steps of compilation, and combine that into a incremental compiler almost effortlessly. Normally, separate compilation \`a la C is facilitated by an external build system, where the programmer is responsible for managing dependencies between files. We reuse an existing sound and optimal incremental build system, and integrate its dependency tracking into the compiler. Grounding: The incremental compiler for Stratego is available as an artefact along with this article. We evaluate it on a large Stratego project to test its performance. The benchmark replays edits to the Stratego project from version control. These benchmarks are part of the artefact, packaged as a virtual machine image for easy reproducibility. Importance: Although we demonstrate our design approach on the Stratego programming language, we also describe it generally throughout this paper. Many currently used programming languages have a compiler that is much slower than necessary. Our design provides an approach to change this, by reusing an existing compiler and making it incremental within a reasonable amount of time

    Transient Typechecks are (Almost) Free

    Get PDF
    Transient gradual typing imposes run-time type tests that typically cause a linear slowdown in programs’ performance. This performance impact discourages the use of type annotations because adding types to a program makes the program slower. A virtual machine can employ standard justin-time optimizations to reduce the overhead of transient checks to near zero. These optimizations can give gradually-typed languages performance comparable to state-of-the-art dynamic languages, so programmers can add types to their code without affecting their programs’ performance

    DRAFT-What you always wanted to know but could not find about block-based environments

    Get PDF
    Block-based environments are visual programming environments, which are becoming more and more popular because of their ease of use. The ease of use comes thanks to their intuitive graphical representation and structural metaphors (jigsaw-like puzzles) to display valid combinations of language constructs to the users. Part of the current popularity of block-based environments is thanks to Scratch. As a result they are often associated with tools for children or young learners. However, it is unclear how these types of programming environments are developed and used in general. So we conducted a systematic literature review on block-based environments by studying 152 papers published between 2014 and 2020, and a non-systematic tool review of 32 block-based environments. In particular, we provide a helpful inventory of block-based editors for end-users on different topics and domains. Likewise, we focused on identifying the main components of block-based environments, how they are engineered, and how they are used. This survey should be equally helpful for language engineering researchers and language engineers alike
    corecore