
    A foundation for synthesising programming language semantics

    Programming or scripting languages used in real-world systems are seldom designed with a formal semantics in mind from the outset. Therefore, the first step in developing well-founded analysis tools for these systems is to reverse-engineer a formal semantics, which can take months or years of effort. Could we automate this process, at least partially? Though desirable, automatically reverse-engineering semantic rules from an implementation is very challenging, as found by Krishnamurthi, Lerner and Elberty. They propose automatically learning desugaring translation rules, mapping the language whose semantics we seek to a simplified, core version whose semantics is much easier to write. The present thesis contains an analysis of their challenge, as well as the first steps towards a solution. Scaling methods with the size of the language is very difficult due to state space explosion, so this thesis proposes an incremental approach to learning the translation rules. I present a formalisation that both clarifies the informal description of the challenge by Krishnamurthi et al. and reformulates the problem, shifting the focus to the conditions for incremental learning. The central definition of the new formalisation is the desugaring extension problem, i.e. extending a set of established translation rules by synthesising new ones. In a synthesis algorithm, the choice of search space is important and non-trivial, as it needs to strike a good balance between expressiveness and efficiency. The rest of the thesis focuses on defining search spaces for translation rules via typing rules. Two prerequisites are required for comparing search spaces. The first is a series of benchmarks: a set of source and target languages equipped with intended translation rules between them. The second is an enumerative synthesis algorithm for efficiently enumerating typed programs. I show how algebraic enumeration techniques can be applied to enumerating well-typed translation rules, and discuss the properties expected from a type system to ensure that typed programs are efficiently enumerable. The thesis presents and empirically evaluates two search spaces. A baseline search space yields the first practical solution to the challenge. The second search space is based on a natural heuristic for translation rules, restricting variables so that each is used exactly once. I present a linear type system designed to efficiently enumerate translation rules in which this heuristic is enforced. Through informal analysis and empirical comparison to the baseline, I then show that using linear types can speed up the synthesis of translation rules by an order of magnitude.
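
    As a hedged illustration of the setting (not taken from the thesis), the OCaml sketch below invents a tiny surface language with one piece of sugar and a core language, together with the kind of desugaring translation rule a synthesiser would have to discover.

        (* Toy surface and core languages, invented for illustration only. *)
        type surface =
          | SVar of string
          | SIf of surface * surface * surface
          | SAnd of surface * surface          (* sugar: e1 && e2 *)

        type core =
          | CVar of string
          | CIf of core * core * core
          | CFalse

        (* Candidate rule: e1 && e2  ~>  if e1 then e2 else false *)
        let rec desugar = function
          | SVar x -> CVar x
          | SIf (c, t, e) -> CIf (desugar c, desugar t, desugar e)
          | SAnd (e1, e2) -> CIf (desugar e1, desugar e2, CFalse)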

    UMSL Bulletin 2023-2024

    The 2023-2024 Bulletin and Course Catalog for the University of Missouri–St. Louis.

    Complete and easy type inference for first-class polymorphism

    The Hindley-Milner (HM) typing discipline is remarkable in that it allows statically typing programs without requiring the programmer to annotate them with types. This is due to the HM system offering complete type inference, meaning that if a program is well typed, the inference algorithm is able to determine all the necessary typing information. Let bindings implicitly perform generalisation, allowing a let-bound variable to receive the most general possible type, which in turn may be instantiated appropriately at each of the variable’s use sites. As a result, the HM type system has become the foundation for type inference in programming languages such as Haskell as well as the ML family of languages, and has been extended in a multitude of ways. The original HM system only supports prenex polymorphism, where type variables are universally quantified only at the outermost level. This precludes many useful programs, such as passing a data structure to a function in the form of a fold function, which would need to be polymorphic in the type of the accumulator; this in turn requires a nested quantifier in the type of the overall function. As a result, one direction of extending the HM system is to add support for first-class polymorphism, allowing arbitrarily nested quantifiers and instantiating type variables with polymorphic types. In such systems, restrictions are necessary to retain decidability of type inference. This work presents FreezeML, a novel approach for integrating first-class polymorphism into the HM system, focused on simplicity. It eschews sophisticated yet hard-to-grasp heuristics in the type system and does not extend the language of types, while still requiring only a modest amount of annotations. In particular, FreezeML leverages the mechanisms for generalisation and instantiation that are already at the heart of ML. Generalisation and instantiation are performed by let bindings and variables, respectively, but extended to types beyond prenex polymorphism. The defining feature of FreezeML is the ability to freeze variables, which prevents the usual instantiation of their types, allowing them instead to keep their original, fully polymorphic types. We demonstrate that FreezeML is as expressive as System F by providing a translation from the latter to the former; the reverse direction is also shown. Further, we prove that FreezeML is indeed a conservative extension of ML: when considering only ML programs, FreezeML accepts exactly the same programs as ML itself. We show that type inference for FreezeML can easily be integrated into HM-like type systems by presenting a sound and complete inference algorithm for FreezeML that extends Algorithm W, the original inference algorithm for the HM system. Since the inception of Algorithm W in the 1970s, type inference for the HM system and its descendants has been modernised by approaches involving constraint solving, which have proved to be more modular and extensible. In such systems, a term is translated to a logical constraint whose solutions correspond to the types of the original term; a solver for such constraints may then be defined independently. To this end, we demonstrate such a constraint-based inference approach for FreezeML. We also discuss the effects of integrating the value restriction into FreezeML and provide detailed comparisons with other approaches towards first-class polymorphism in ML, alongside a collection of examples found in the literature.
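
    The prenex limitation described above can be seen in plain OCaml (this sketch is not FreezeML syntax): representing a data structure by its fold function needs a quantifier nested inside an argument type, which HM inference alone cannot express, so OCaml resorts to a record with an explicitly polymorphic field.

        (* A list passed around as its fold function. The accumulator type 'acc must
           stay universally quantified inside the field type, beyond prenex polymorphism. *)
        type 'elt church_list = { fold : 'acc. ('elt -> 'acc -> 'acc) -> 'acc -> 'acc }

        let nil = { fold = fun _f z -> z }
        let cons x xs = { fold = fun f z -> f x (xs.fold f z) }

        (* Instantiate the polymorphic field at int to sum the elements. *)
        let sum (l : int church_list) = l.fold ( + ) 0

        let () = assert (sum (cons 1 (cons 2 nil)) = 3)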

    Set-theoretic Types for Erlang

    Erlang is a functional programming language with dynamic typing. The language offers great flexibility for destructuring values through pattern matching and dynamic type tests. Erlang also comes with a type language supporting parametric polymorphism, equi-recursive types, as well as union and a limited form of intersection types. However, type signatures only serve as documentation; there is no check that a function body conforms to its signature. Set-theoretic types and semantic subtyping fit Erlang's feature set very well. They allow expressing nearly all constructs of its type language and provide means for statically checking type signatures. This article brings set-theoretic types to Erlang and demonstrates how existing Erlang code can be statically typechecked with no or only minor modifications to the code. Further, the article formalizes the main ingredients of the type system in a small core calculus, reports on an implementation of the system, and compares it with other static typecheckers for Erlang.
    Comment: 14 pages, 9 figures, IFL 2022.
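
    As a rough stand-in (written in OCaml rather than Erlang, and not code from the article), the sketch below uses a polymorphic-variant type in the role of a union type, with the compiler checking that the function body conforms to the signature it is annotated with; names and cases are illustrative only.

        (* A union-like type of shapes and a function checked against its signature. *)
        type shape = [ `Circle of float | `Rect of float * float ]

        let area : shape -> float = function
          | `Circle r -> Float.pi *. r *. r
          | `Rect (w, h) -> w *. h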

    Software Test Case Generation Tools and Techniques: A Review

    The software industry has been evolving at a very fast pace over the last two decades, and many software development, testing, and test case generation approaches have emerged in that time to deliver quality products and services. Testing plays a vital role in ensuring the quality and reliability of software products. In this paper, the authors conduct a systematic study of testing tools and techniques. Six of the most popular e-resources, namely IEEE, Springer, the Association for Computing Machinery (ACM), Elsevier, Wiley and Google Scholar, were used to download 738 manuscripts, of which 125 were selected for the study. Of the 125 selected manuscripts, approximately 79% are from reputed journals and around 21% are from conferences of repute. The testing tools discussed in this paper are broadly divided into five categories: open source; academic and research; commercial; academic and open source; and commercial and open source. The paper also discusses several benchmark datasets, including EvoSuite 10, the SF100 corpus, the Defects4J repository, Neo4j, JSON, Mocha JS and Node JS. The aim of this paper is to make researchers aware of the various test case generation tools and techniques introduced in the last 11 years, along with their salient features.

    Guided rewriting and constraint satisfaction for parallel GPU code generation

    Graphics Processing Units (GPUs) are notoriously hard to optimise for manually due to their scheduling and memory hierarchies. What is needed are good automatic code generators and optimisers for such parallel hardware. Functional approaches such as Accelerate, Futhark and LIFT leverage a high-level algorithmic Intermediate Representation (IR) to expose parallelism and abstract the implementation details away from the user. However, producing efficient code for a given accelerator remains challenging. Existing code generators depend either on user input to choose from a subset of hard-coded optimisations or on automated exploration of the implementation search space. The former suffers from a lack of extensibility, while the latter is too costly due to the size of the search space. A hybrid approach is needed, in which a space of valid implementations is built automatically and explored with the aid of human expertise. This thesis presents a solution combining user-guided rewriting and automatically generated constraints to produce high-performance code. The first contribution is an automatic tuning technique to find a balance between performance and memory consumption. Leveraging its functional patterns, the LIFT compiler is empowered to infer tuning constraints and limit the search to valid tuning combinations only. Next, the thesis reframes parallelisation as a constraint satisfaction problem. Parallelisation constraints are extracted automatically from the input expression, and a solver is used to identify valid rewritings. The constraints truncate the search space to valid parallel mappings only by capturing the scheduling restrictions of the GPU in the context of a given program. A synchronisation barrier insertion technique is proposed to prevent data races and improve the efficiency of the generated parallel mappings. The final contribution of this thesis is the guided rewriting method, where the user encodes a design space of structural transformations using high-level IR nodes called rewrite points. These strongly typed pragmas express macro rewrites and expose design choices as explorable parameters. The thesis proposes a small set of reusable rewrite points to achieve tiling, cache locality, data reuse and memory optimisation. A comparison with the vendor-provided, handwritten kernels of the ARM Compute Library and with the TVM code generator demonstrates the effectiveness of this thesis' contributions. With convolution as a use case, LIFT-generated direct and GEMM-based convolution implementations are shown to perform on par with state-of-the-art solutions on a mobile GPU. Overall, this thesis demonstrates that a functional IR lends itself well to user-guided and automatic rewriting for high-performance code generation.
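
    The OCaml sketch below is an invented miniature of the guided-rewriting idea (it is not LIFT's actual IR): a structural rewrite turns a flat map into a tiled one, and in the full system a constraint solver would decide where such rewrites are valid.

        (* A toy expression IR with one tiling rewrite. *)
        type expr =
          | Input of string
          | Map of string * expr      (* Map (f, xs): apply the named function f over xs *)
          | Split of int * expr       (* cut a 1-D array into tiles of the given size *)
          | Join of expr              (* flatten tiles back into a flat array *)

        (* Tiling rewrite: Map (f, xs)  ==>  Join (Map ("map " ^ f, Split (n, xs))).
           A real system would model the nested map as an IR node rather than a string. *)
        let tile_map n = function
          | Map (f, xs) -> Some (Join (Map ("map " ^ f, Split (n, xs))))
          | _ -> None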

    Mapping the Focal Points of WordPress: A Software and Critical Code Analysis

    Programming languages or code can be examined through numerous analytical lenses. This project is a critical analysis of WordPress, a prevalent web content management system, applying four modes of inquiry. The project draws on theoretical perspectives and areas of study in media, software, platforms, code, language, and power structures. The applied research is based on Critical Code Studies, an interdisciplinary field of study that holds potential as a theoretical lens and methodological toolkit for understanding computational code beyond its function. The project begins with a critical code analysis of WordPress, examining its origins and source code and mapping selected vulnerabilities. This is followed by an examination of the influence of digital and computational thinking. The work also explores the intersection of code patching and vulnerability management and how code shapes our sense of control, trust, and empathy, ultimately arguing that a rhetorical-cultural lens can be used to better understand code's controlling influence. Recurring themes throughout these analyses and observations are the connections to power and vulnerability in WordPress' code and how cultural, processual, rhetorical, and ethical implications can be expressed through its code, creating a particular worldview. Code's emergent properties help illustrate how human values and practices (e.g., empathy, aesthetics, language, and trust) become encoded in software design and how people perceive the software through its worldview. These connected analyses reveal cultural, processual, and vulnerability focal points and the influence these entanglements have on WordPress as code, software, and platform. WordPress is a complex sociotechnical platform worthy of further study, as is the interdisciplinary merging of theoretical perspectives and disciplines to critically examine code. Ultimately, this project helps enrich the field by introducing focal points in code, examining sociocultural phenomena within the code, and offering techniques for applying critical code methods.

    Staged Specifications for Automated Verification of Higher-Order Imperative Programs

    Higher-order functions and imperative references are language features supported by many mainstream languages. Their combination makes it possible to package references to code blocks together with state captured from their environment. Higher-order imperative programs are expressive and useful, but they complicate formal specification and reasoning due to the use of yet-to-be-instantiated function parameters, especially when their invocations may mutate memory captured by or reachable from their arguments. Existing state-of-the-art works for verifying higher-order imperative behaviors are restricted in one of two ways: they achieve strong theoretical results without automated implementations, or they achieve automation with the help of strong assumptions from dedicated type systems (e.g. Rust). To enable an automated verification solution for imperative languages without these restrictions, we introduce Higher-order Staged Separation Logic (HSSL), an extension of Hoare logic for call-by-value higher-order functions with ML-like local references. In this paper, we design a novel staged specification logic, prove its soundness, develop a new automated higher-order verifier, Heifer, for a core OCaml-like language, report on experimental results, and present various case studies investigating its capabilities.
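
    The short OCaml example below (not taken from the paper) shows the flavour of program the abstract targets: a closure captures a mutable reference, and a higher-order caller invokes it without knowing which state it mutates, which is exactly what a specification would have to describe.

        (* A counter closure over a hidden reference. *)
        let make_counter () =
          let n = ref 0 in
          fun () -> incr n; !n

        (* The result of apply_twice depends on state reachable only through f. *)
        let apply_twice (f : unit -> int) = ignore (f ()); f ()

        let () =
          let tick = make_counter () in
          assert (apply_twice tick = 2)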

    Design of new algorithms for gene network reconstruction applied to in silico modeling of biomedical data

    Doctoral Programme in Biotechnology, Engineering and Chemical Technology. Research line: Engineering, Data Science and Bioinformatics. Programme code: DBI. Line code: 111.
    The root causes of disease are still poorly understood. The success of current therapies is limited because persistent diseases are frequently treated based on their symptoms rather than the underlying cause of the disease. Therefore, biomedical research is experiencing a technology-driven shift to data-driven, holistic approaches to better characterize the molecular mechanisms causing disease. Using omics data as an input, emerging disciplines like network biology attempt to model the relationships between biomolecules. To this effect, gene co-expression networks arise as a promising tool for deciphering the relationships between genes in large transcriptomic datasets. However, because of their low specificity and high false positive rate, they demonstrate a limited capacity to retrieve the disrupted mechanisms that lead to disease onset, progression, and maintenance. Within the context of statistical modeling, we dove deeper into the reconstruction of gene co-expression networks with the specific goal of discovering disease-specific features directly from expression data. Using ensemble techniques, which combine the results of various metrics, we were able to capture biologically significant relationships between genes more precisely. With the help of prior biological knowledge and the development of new network inference techniques, we were able to find potential disease-specific features de novo. Through our different approaches, we analyzed large gene sets across multiple samples and used gene expression as a surrogate marker for the inherent biological processes, reconstructing robust gene co-expression networks that are simple to explore. By mining disease-specific gene co-expression networks, we arrive at a useful framework for identifying new omics-phenotype associations from conditional expression datasets. In this sense, understanding diseases from the perspective of biological network perturbations will improve personalized medicine, impacting rational biomarker discovery, patient stratification and drug design, and ultimately leading to more targeted therapies.
    Universidad Pablo de Olavide de Sevilla. Departamento de Deporte e Informática.
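
    As a minimal sketch of the basic step behind co-expression network reconstruction (written in OCaml for illustration; the ensemble metrics and prior-knowledge integration described above are not shown), each pair of genes is scored with a correlation metric and an edge is kept when the score passes a placeholder threshold.

        (* Pearson correlation of two equal-length expression profiles. *)
        let pearson xs ys =
          let n = float_of_int (Array.length xs) in
          let mean a = Array.fold_left ( +. ) 0. a /. n in
          let mx, my = mean xs, mean ys in
          let num = ref 0. and sx = ref 0. and sy = ref 0. in
          Array.iteri (fun i x ->
              let dx = x -. mx and dy = ys.(i) -. my in
              num := !num +. dx *. dy;
              sx := !sx +. dx *. dx;
              sy := !sy +. dy *. dy) xs;
          !num /. sqrt (!sx *. !sy)

        (* Keep an edge between genes whose profiles correlate above the threshold. *)
        let coexpression_edges ?(threshold = 0.8) profiles =
          List.concat_map (fun (g1, e1) ->
              List.filter_map (fun (g2, e2) ->
                  if g1 < g2 && abs_float (pearson e1 e2) >= threshold
                  then Some (g1, g2) else None)
                profiles)
            profiles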

    Certificates for decision problems in temporal logic using context-based tableaux and sequent calculi.

    115 p.
    This thesis addresses satisfiability and model checking problems, providing certificates of the results. It works with three temporal logics: Propositional Linear Temporal Logic (PLTL), Computation Tree Logic (CTL) and Extended Computation Tree Logic (ECTL). First, the work on Certified Satisfiability is presented: an adaptation of the existing dual method of context-based tableaux and sequent calculi for the satisfiability of PLTL formulas in Negation Normal Form. Certificate generation has been developed for the case in which the formulas are unsatisfiable, and a soundness proof of the method is provided. Second, the Certified Satisfiability method has been optimised with SAT solvers for the setting of Certified Model Checking, and several examples of systems and properties are provided. Third, a new dual method of context-based tableaux and sequent calculi has been created to perform Certified Satisfiability for CTL and ECTL formulas. The method is presented together with an algorithm that generates either a model, when the formulas are satisfiable, or a proof, when they are not. Finally, an implementation of the method for CTL is presented, along with an experimental evaluation comparing the proposed method with another method of similar characteristics.
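
    For illustration only (the thesis' own syntax and calculus are not reproduced here), the OCaml fragment below defines a datatype for PLTL formulas in Negation Normal Form, the input format of the certified satisfiability method, together with one example formula.

        (* PLTL formulas in Negation Normal Form: negation only on literals. *)
        type pltl =
          | Lit of string * bool            (* a proposition or its negation *)
          | And of pltl * pltl
          | Or of pltl * pltl
          | Next of pltl                    (* X f *)
          | Until of pltl * pltl            (* f U g *)
          | Release of pltl * pltl          (* f R g *)

        (* Example: p U (q /\ X p) *)
        let example = Until (Lit ("p", true), And (Lit ("q", true), Next (Lit ("p", true))))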