322 research outputs found

    Language integrated relational lenses

    Get PDF
    Relational databases are ubiquitous. Such monolithic databases accumulate large amounts of data, yet applications typically only work on small portions of the data at a time. A subset of the database defined as a computation on the underlying tables is called a view. Querying views is helpful, but it is also desirable to update them and have these changes be applied to the underlying database. This view update problem has been the subject of much previous work before, but support by database servers is limited and only rarely available. Lenses are a popular approach to bidirectional transformations, a generalization of the view update problem in databases to arbitrary data. However, perhaps surprisingly, lenses have seldom actually been used to implement updatable views in databases. Bohannon, Pierce and Vaughan propose an approach to updatable views called relational lenses. However, to the best of our knowledge this proposal has not been implemented or evaluated prior to the work reported in this thesis. This thesis proposes programming language support for relational lenses. Language integrated relational lenses support expressive and efficient view updates, without relying on updatable view support from the database server. By integrating relational lenses into the programming language, application development becomes easier and less error-prone, avoiding the impedance mismatch of having two programming languages. Integrating relational lenses into the language poses additional challenges. As defined by Bohannon et al. relational lenses completely recompute the database, making them inefficient as the database scales. The other challenge is that some parts of the well-formedness conditions are too general for implementation. Bohannon et al. specify predicates using possibly infinite abstract sets and define the type checking rules using relational algebra. Incremental relational lenses equip relational lenses with change-propagating semantics that map small changes to the view into (potentially) small changes to the source tables. We prove that our incremental semantics are functionally equivalent to the non-incremental semantics, and our experimental results show orders of magnitude improvement over the non-incremental approach. This thesis introduces a concrete predicate syntax and shows how the required checks are performed on these predicates and show that they satisfy the abstract predicate specifications. We discuss trade-offs between static predicates that are fully known at compile time vs dynamic predicates that are only known during execution and introduce hybrid predicates taking inspiration from both approaches. This thesis adapts the typing rules for relational lenses from sequential composition to a functional style of sub-expressions. We prove that any well-typed functional relational lens expression can derive a well-typed sequential lens. We use these additions to relational lenses as the foundation for two practical implementations: an extension of the Links functional language and a library written in Haskell. The second implementation demonstrates how type-level computation can be used to implement relational lenses without changes to the compiler. These two implementations attest to the possibility of turning relational lenses into a practical language feature

    Towards A Practical High-Assurance Systems Programming Language

    Full text link
    Writing correct and performant low-level systems code is a notoriously demanding job, even for experienced developers. To make the matter worse, formally reasoning about their correctness properties introduces yet another level of complexity to the task. It requires considerable expertise in both systems programming and formal verification. The development can be extremely costly due to the sheer complexity of the systems and the nuances in them, if not assisted with appropriate tools that provide abstraction and automation. Cogent is designed to alleviate the burden on developers when writing and verifying systems code. It is a high-level functional language with a certifying compiler, which automatically proves the correctness of the compiled code and also provides a purely functional abstraction of the low-level program to the developer. Equational reasoning techniques can then be used to prove functional correctness properties of the program on top of this abstract semantics, which is notably less laborious than directly verifying the C code. To make Cogent a more approachable and effective tool for developing real-world systems, we further strengthen the framework by extending the core language and its ecosystem. Specifically, we enrich the language to allow users to control the memory representation of algebraic data types, while retaining the automatic proof with a data layout refinement calculus. We repurpose existing tools in a novel way and develop an intuitive foreign function interface, which provides users a seamless experience when using Cogent in conjunction with native C. We augment the Cogent ecosystem with a property-based testing framework, which helps developers better understand the impact formal verification has on their programs and enables a progressive approach to producing high-assurance systems. Finally we explore refinement type systems, which we plan to incorporate into Cogent for more expressiveness and better integration of systems programmers with the verification process

    Functorial String Diagrams for Reverse-Mode Automatic Differentiation

    Get PDF
    We formulate a reverse-mode automatic differentiation (RAD) algorithm for (applied) simply typed lambda calculus in the style of Pearlmutter and Siskind [Barak A. Pearlmutter and Jeffrey Mark Siskind, 2008], using the graphical formalism of string diagrams. Thanks to string diagram rewriting, we are able to formally prove for the first time the soundness of such an algorithm. Our approach requires developing a calculus of string diagrams with hierarchical features in the spirit of functorial boxes, in order to model closed monoidal (and cartesian closed) structure. To give an efficient yet principled implementation of the RAD algorithm, we use foliations of our hierarchical string diagrams

    Parallel and Flow-Based High Quality Hypergraph Partitioning

    Get PDF
    Balanced hypergraph partitioning is a classic NP-hard optimization problem that is a fundamental tool in such diverse disciplines as VLSI circuit design, route planning, sharding distributed databases, optimizing communication volume in parallel computing, and accelerating the simulation of quantum circuits. Given a hypergraph and an integer kk, the task is to divide the vertices into kk disjoint blocks with bounded size, while minimizing an objective function on the hyperedges that span multiple blocks. In this dissertation we consider the most commonly used objective, the connectivity metric, where we aim to minimize the number of different blocks connected by each hyperedge. The most successful heuristic for balanced partitioning is the multilevel approach, which consists of three phases. In the coarsening phase, vertex clusters are contracted to obtain a sequence of structurally similar but successively smaller hypergraphs. Once sufficiently small, an initial partition is computed. Lastly, the contractions are successively undone in reverse order, and an iterative improvement algorithm is employed to refine the projected partition on each level. An important aspect in designing practical heuristics for optimization problems is the trade-off between solution quality and running time. The appropriate trade-off depends on the specific application, the size of the data sets, and the computational resources available to solve the problem. Existing algorithms are either slow, sequential and offer high solution quality, or are simple, fast, easy to parallelize, and offer low quality. While this trade-off cannot be avoided entirely, our goal is to close the gaps as much as possible. We achieve this by improving the state of the art in all non-trivial areas of the trade-off landscape with only a few techniques, but employed in two different ways. Furthermore, most research on parallelization has focused on distributed memory, which neglects the greater flexibility of shared-memory algorithms and the wide availability of commodity multi-core machines. In this thesis, we therefore design and revisit fundamental techniques for each phase of the multilevel approach, and develop highly efficient shared-memory parallel implementations thereof. We consider two iterative improvement algorithms, one based on the Fiduccia-Mattheyses (FM) heuristic, and one based on label propagation. For these, we propose a variety of techniques to improve the accuracy of gains when moving vertices in parallel, as well as low-level algorithmic improvements. For coarsening, we present a parallel variant of greedy agglomerative clustering with a novel method to resolve cluster join conflicts on-the-fly. Combined with a preprocessing phase for coarsening based on community detection, a portfolio of from-scratch partitioning algorithms, as well as recursive partitioning with work-stealing, we obtain our first parallel multilevel framework. It is the fastest partitioner known, and achieves medium-high quality, beating all parallel partitioners, and is close to the highest quality sequential partitioner. Our second contribution is a parallelization of an n-level approach, where only one vertex is contracted and uncontracted on each level. This extreme approach aims at high solution quality via very fine-grained, localized refinement, but seems inherently sequential. We devise an asynchronous n-level coarsening scheme based on a hierarchical decomposition of the contractions, as well as a batch-synchronous uncoarsening, and later fully asynchronous uncoarsening. In addition, we adapt our refinement algorithms, and also use the preprocessing and portfolio. This scheme is highly scalable, and achieves the same quality as the highest quality sequential partitioner (which is based on the same components), but is of course slower than our first framework due to fine-grained uncoarsening. The last ingredient for high quality is an iterative improvement algorithm based on maximum flows. In the sequential setting, we first improve an existing idea by solving incremental maximum flow problems, which leads to smaller cuts and is faster due to engineering efforts. Subsequently, we parallelize the maximum flow algorithm and schedule refinements in parallel. Beyond the strive for highest quality, we present a deterministically parallel partitioning framework. We develop deterministic versions of the preprocessing, coarsening, and label propagation refinement. Experimentally, we demonstrate that the penalties for determinism in terms of partition quality and running time are very small. All of our claims are validated through extensive experiments, comparing our algorithms with state-of-the-art solvers on large and diverse benchmark sets. To foster further research, we make our contributions available in our open-source framework Mt-KaHyPar. While it seems inevitable, that with ever increasing problem sizes, we must transition to distributed memory algorithms, the study of shared-memory techniques is not in vain. With the multilevel approach, even the inherently slow techniques have a role to play in fast systems, as they can be employed to boost quality on coarse levels at little expense. Similarly, techniques for shared-memory parallelism are important, both as soon as a coarse graph fits into memory, and as local building blocks in the distributed algorithm

    Automated tailoring of system software stacks

    Get PDF
    In many industrial sectors, device manufacturers are moving away from expensive special-purpose hardware units and consolidate their systems on commodity hardware. As part of this change, developers are enabled to run their applications on general-purpose operating systems like Linux, which already supports thousands of different devices out of the box and can be used in a wide range of target scenarios. Furthermore, the Linux ecosystem allows them to integrate existing implementations of standard functionality in the form of shared libraries. However, as the libraries and the Linux kernel are designed as generic building blocks in order to support as many applications as possible, they cannot make assumptions about specific use cases for a single-purpose device. This generality leads to unnecessary overheads in narrowly defined target scenarios, as unneeded components do not only take up space on the target system but have to be maintained over the lifetime of the device as well. While the Linux kernel provides a configuration system to disable unneeded functionality like device drivers, determining the required features from over 16000 options is an infeasible task. Even worse, most shared libraries cannot be customized even though only around 10 percent of their functions are ever used by applications. In this thesis, I present my approaches for the automated identification and removal of unnecessary components in all layers of the software stack. As the configuration system is an integral part of the Linux kernel, we embrace its presence and automatically generate custom-fitted configurations for observed target scenarios with the help of an extracted variability model. For the much more diverse realm of shared libraries, with different programming languages, build systems, and a lack of configurability, I demonstrate a different approach. By identifying individual functions as logically distinct units, we construct a symbol-level dependency graph across the applications and all their required libraries. We then remove unneeded code at the binary level and rearrange the remaining parts to take up minimal space in the binary file by formulating their placement as an optimization problem. To lower the number of unnecessary updates to unused components in a deployed system, I lastly present an automated method to determine the impact of software changes on a target scenario and provide guidance for developers on whether they need to update their systems. Applying these techniques to different target systems, I demonstrate that we can disable up to 87 percent of configuration options in a Debian Linux kernel, shrink the size of an embedded OpenWrt kernel by 59 percent, and speed up the boot process of the embedded system by 21 percent. As part of the shared library tailoring process, we can remove 13060 functions from all libraries in OpenWrt and reduce their total size by 31 percent. In the memcached Docker container, we identify 381 entirely unneeded shared libraries and shrink the container image size by 82 percent. An analysis of the development history of two large library projects over the course of more than two years further shows that between 68 and 82 percent of all changes are not required for an OpenWrt appliance, reducing the number of patch days by up to 69 percent. These results demonstrate the broad applicability of our automated methods for both the Linux kernel and shared libraries to a wide range of scenarios. From embedded systems to server applications, custom-tailored system software stacks contribute to the reduction of overheads in space and time

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    Get PDF
    LIPIcs, Volume 261, ICALP 2023, Complete Volum

    Jornadas Nacionales de Investigación en Ciberseguridad: actas de las VIII Jornadas Nacionales de Investigación en ciberseguridad: Vigo, 21 a 23 de junio de 2023

    Get PDF
    Jornadas Nacionales de Investigación en Ciberseguridad (8ª. 2023. Vigo)atlanTTicAMTEGA: Axencia para a modernización tecnolóxica de GaliciaINCIBE: Instituto Nacional de Cibersegurida

    String Diagrams for λ\lambda-calculi and Functional Computation

    Full text link
    This tutorial gives an advanced introduction to string diagrams and graph languages for higher-order computation. The subject matter develops in a principled way, starting from the two dimensional syntax of key categorical concepts such as functors, adjunctions, and strictification, and leading up to Cartesian Closed Categories, the core mathematical model of the lambda calculus and of functional programming languages. This methodology inverts the usual approach of proceeding from syntax to a categorical interpretation, by rationally reconstructing a syntax from the categorical model. The result is a graph syntax -- more precisely, a hierarchical hypergraph syntax -- which in many ways is shown to be an improvement over the conventional linear term syntax. The rest of the tutorial focuses on applications of interest to programming languages: operational semantics, general frameworks for type inference, and complex whole-program transformations such as closure conversion and automatic differentiation

    Teaching informatics to novices: big ideas and the necessity of optimal guidance

    Get PDF
    This thesis reports on the two main areas of our research: introductory programming as the traditional way of accessing informatics and cultural teaching informatics through unconventional pathways. The research on introductory programming aims to overcome challenges in traditional programming education, thus increasing participation in informatics. Improving access to informatics enables individuals to pursue more and better professional opportunities and contribute to informatics advancements. We aimed to balance active, student-centered activities and provide optimal support to novices at their level. Inspired by Productive Failure and exploring the concept of notional machine, our work focused on developing Necessity Learning Design, a design to help novices tackle new programming concepts. Using this design, we implemented a learning sequence to introduce arrays and evaluated it in a real high-school context. The subsequent chapters discuss our experiences teaching CS1 in a remote-only scenario during the COVID-19 pandemic and our collaborative effort with primary school teachers to develop a learning module for teaching iteration using a visual programming environment. The research on teaching informatics principles through unconventional pathways, such as cryptography, aims to introduce informatics to a broader audience, particularly younger individuals that are less technical and professional-oriented. It emphasizes the importance of understanding informatics's cultural and scientific aspects to focus on the informatics societal value and its principles for active citizenship. After reflecting on computational thinking and inspired by the big ideas of science and informatics, we describe our hands-on approach to teaching cryptography in high school, which leverages its key scientific elements to emphasize its social aspects. Additionally, we present an activity for teaching public-key cryptography using graphs to explore fundamental concepts and methods in informatics and mathematics and their interdisciplinarity. In broadening the understanding of informatics, these research initiatives also aim to foster motivation and prime for more professional learning of informatics
    corecore