37 research outputs found
Set-theoretic Types for Erlang
Erlang is a functional programming language with dynamic typing. The language
offers great flexibility for destructing values through pattern matching and
dynamic type tests. Erlang also comes with a type language supporting
parametric polymorphism, equi-recursive types, as well as union and a limited
form of intersection types. However, type signatures only serve as
documentation, there is no check that a function body conforms to its
signature. Set-theoretic types and semantic subtyping fit Erlang's feature set
very well. They allow expressing nearly all constructs of its type language and
provide means for statically checking type signatures. This article brings
set-theoretic types to Erlang and demonstrates how existing Erlang code can be
statically typechecked without or with only minor modifications to the code.
Further, the article formalizes the main ingredients of the type system in a
small core calculus, reports on an implementation of the system, and compares
it with other static typecheckers for Erlang.Comment: 14 pages, 9 figures, IFL 2022; latexmk -pdf to buil
Content-Based Textual File Type Detection at Scale
Programming language detection is a common need in the analysis of large
source code bases. It is supported by a number of existing tools that rely on
several features, and most notably file extensions, to determine file types. We
consider the problem of accurately detecting the type of files commonly found
in software code bases, based solely on textual file content. Doing so is
helpful to classify source code that lack file extensions (e.g., code snippets
posted on the Web or executable scripts), to avoid misclassifying source code
that has been recorded with wrong or uncommon file extensions, and also shed
some light on the intrinsic recognizability of source code files. We propose a
simple model that (a) use a language-agnostic word tokenizer for textual files,
(b) group tokens in 1-/2-grams, (c) build feature vectors based on N-gram
frequencies, and (d) use a simple fully connected neural network as classifier.
As training set we use textual files extracted from GitHub repositories with at
least 1000 stars, using existing file extensions as ground truth. Despite its
simplicity the proposed model reaches 85% in our experiments for a relatively
high number of recognized classes (more than 130 file types)
Best-Effort Lazy Evaluation for Python Software Built on APIs
This paper focuses on an important optimization opportunity in Python-hosted domain-specific languages (DSLs): the use of laziness for optimization, whereby multiple API calls are deferred and then optimized prior to execution (rather than executing eagerly, which would require executing each call in isolation). In existing supports of lazy evaluation, laziness is "terminated" as soon as control passes back to the host language in any way, limiting opportunities for optimization. This paper presents Cunctator, a framework that extends this laziness to more of the Python language, allowing intermediate values from DSLs like NumPy or Pandas to flow back to the host Python code without triggering evaluation. This exposes more opportunities for optimization and, more generally, allows for larger computation graphs to be built, producing 1.03-14.2X speedups on a set of programs in common libraries and frameworks
Constructing Hybrid Incremental Compilers for Cross-Module Extensibility with an Internal Build System
Context: Compilation time is an important factor in the adaptability of a
software project. Fast recompilation enables cheap experimentation with changes
to a project, as those changes can be tested quickly. Separate and incremental
compilation has been a topic of interest for a long time to facilitate fast
recompilation.
Inquiry: Despite the benefits of an incremental compiler, such compilers are
usually not the default. This is because incrementalization requires
cross-cutting, complicated, and error-prone techniques such as dependency
tracking, caching, cache invalidation, and change detection. Especially in
compilers for languages with cross-module definitions and integration,
correctly and efficiently implementing an incremental compiler can be a
challenge. Retrofitting incrementality into a compiler is even harder. We
address this problem by developing a compiler design approach that reuses parts
of an existing non-incremental compiler to lower the cost of building an
incremental compiler. It also gives an intuition into compiling
difficult-to-incrementalize language features through staging.
Approach: We use the compiler design approach presented in this paper to
develop an incremental compiler for the Stratego term-rewriting language. This
language has a set of features that at first glance look incompatible with
incremental compilation. Therefore, we treat Stratego as our critical case to
demonstrate the approach on. We show how this approach decomposes the original
compiler and has a solution to compile Stratego incrementally. The key idea on
which we build our incremental compiler is to internally use an incremental
build system to wire together the components we extract from the original
compiler.
Knowledge: The resulting compiler is already in use as a replacement of the
original whole-program compiler. We find that the incremental build system
inside the compiler is a crucial component of our approach. This allows a
compiler writer to think in multiple steps of compilation, and combine that
into a incremental compiler almost effortlessly. Normally, separate compilation
\`a la C is facilitated by an external build system, where the programmer is
responsible for managing dependencies between files. We reuse an existing sound
and optimal incremental build system, and integrate its dependency tracking
into the compiler.
Grounding: The incremental compiler for Stratego is available as an artefact
along with this article. We evaluate it on a large Stratego project to test its
performance. The benchmark replays edits to the Stratego project from version
control. These benchmarks are part of the artefact, packaged as a virtual
machine image for easy reproducibility.
Importance: Although we demonstrate our design approach on the Stratego
programming language, we also describe it generally throughout this paper. Many
currently used programming languages have a compiler that is much slower than
necessary. Our design provides an approach to change this, by reusing an
existing compiler and making it incremental within a reasonable amount of time
Transient Typechecks are (Almost) Free
Transient gradual typing imposes run-time type tests that typically cause a linear slowdown in
programs’ performance. This performance impact discourages the use of type annotations because
adding types to a program makes the program slower. A virtual machine can employ standard justin-time optimizations to reduce the overhead of transient checks to near zero. These optimizations
can give gradually-typed languages performance comparable to state-of-the-art dynamic languages,
so programmers can add types to their code without affecting their programs’ performance
DRAFT-What you always wanted to know but could not find about block-based environments
Block-based environments are visual programming environments, which are becoming more and more popular because of their ease of use. The ease of use comes thanks to their intuitive graphical representation and structural metaphors (jigsaw-like puzzles) to display valid combinations of language constructs to the users. Part of the current popularity of block-based environments is thanks to Scratch. As a result they are often associated with tools for children or young learners. However, it is unclear how these types of programming environments are developed and used in general. So we conducted a systematic literature review on block-based environments by studying 152 papers published between 2014 and 2020, and a non-systematic tool review of 32 block-based environments. In particular, we provide a helpful inventory of block-based editors for end-users on different topics and domains. Likewise, we focused on identifying the main components of block-based environments, how they are engineered, and how they are used. This survey should be equally helpful for language engineering researchers and language engineers alike