322 research outputs found
Language integrated relational lenses
Relational databases are ubiquitous. Such monolithic databases accumulate large
amounts of data, yet applications typically only work on small portions of the data
at a time. A subset of the database defined as a computation on the underlying
tables is called a view. Querying views is helpful, but it is also desirable to update
them and have these changes be applied to the underlying database. This view
update problem has been the subject of much previous work before, but support
by database servers is limited and only rarely available.
Lenses are a popular approach to bidirectional transformations, a generalization
of the view update problem in databases to arbitrary data. However, perhaps surprisingly, lenses have seldom actually been used to implement updatable views in
databases. Bohannon, Pierce and Vaughan propose an approach to updatable views called relational lenses. However, to the best of our knowledge this
proposal has not been implemented or evaluated prior to the work reported in
this thesis.
This thesis proposes programming language support for relational lenses. Language integrated relational lenses support expressive and efficient view updates,
without relying on updatable view support from the database server. By integrating relational lenses into the programming language, application development
becomes easier and less error-prone, avoiding the impedance mismatch of having
two programming languages. Integrating relational lenses into the language poses
additional challenges. As defined by Bohannon et al. relational lenses completely
recompute the database, making them inefficient as the database scales. The
other challenge is that some parts of the well-formedness conditions are too general for implementation. Bohannon et al. specify predicates using possibly infinite
abstract sets and define the type checking rules using relational algebra.
Incremental relational lenses equip relational lenses with change-propagating semantics that map small changes to the view into (potentially) small changes
to the source tables. We prove that our incremental semantics are functionally
equivalent to the non-incremental semantics, and our experimental results show
orders of magnitude improvement over the non-incremental approach. This thesis introduces a concrete predicate syntax and shows how the required checks
are performed on these predicates and show that they satisfy the abstract predicate specifications. We discuss trade-offs between static predicates that are fully
known at compile time vs dynamic predicates that are only known during execution and introduce hybrid predicates taking inspiration from both approaches.
This thesis adapts the typing rules for relational lenses from sequential composition to a functional style of sub-expressions. We prove that any well-typed
functional relational lens expression can derive a well-typed sequential lens.
We use these additions to relational lenses as the foundation for two practical implementations: an extension of the Links functional language and a library written
in Haskell. The second implementation demonstrates how type-level computation can be used to implement relational lenses without changes to the compiler.
These two implementations attest to the possibility of turning relational lenses
into a practical language feature
Towards A Practical High-Assurance Systems Programming Language
Writing correct and performant low-level systems code is a notoriously demanding job, even for experienced developers. To make the matter worse, formally reasoning about their correctness properties introduces yet another level of complexity to the task. It requires considerable expertise in both systems programming and formal verification. The development can be extremely costly due to the sheer complexity of the systems and the nuances in them, if not assisted with appropriate tools that provide abstraction and automation.
Cogent is designed to alleviate the burden on developers when writing and verifying systems code. It is a high-level functional language with a certifying compiler, which automatically proves the correctness of the compiled code and also provides a purely functional abstraction of the low-level program to the developer. Equational reasoning techniques can then be used to prove functional correctness properties of the program on top of this abstract semantics, which is notably less laborious than directly verifying the C code.
To make Cogent a more approachable and effective tool for developing real-world systems, we further strengthen the framework by extending the core language and its ecosystem. Specifically, we enrich the language to allow users to control the memory representation of algebraic data types, while retaining the automatic proof with a data layout refinement calculus. We repurpose existing tools in a novel way and develop an intuitive foreign function interface, which provides users a seamless experience when using Cogent in conjunction with native C. We augment the Cogent ecosystem with a property-based testing framework, which helps developers better understand the impact formal verification has on their programs and enables a progressive approach to producing high-assurance systems. Finally we explore refinement type systems, which we plan to incorporate into Cogent for more expressiveness and better integration of systems programmers with the verification process
Functorial String Diagrams for Reverse-Mode Automatic Differentiation
We formulate a reverse-mode automatic differentiation (RAD) algorithm for (applied) simply typed lambda calculus in the style of Pearlmutter and Siskind [Barak A. Pearlmutter and Jeffrey Mark Siskind, 2008], using the graphical formalism of string diagrams. Thanks to string diagram rewriting, we are able to formally prove for the first time the soundness of such an algorithm. Our approach requires developing a calculus of string diagrams with hierarchical features in the spirit of functorial boxes, in order to model closed monoidal (and cartesian closed) structure. To give an efficient yet principled implementation of the RAD algorithm, we use foliations of our hierarchical string diagrams
Parallel and Flow-Based High Quality Hypergraph Partitioning
Balanced hypergraph partitioning is a classic NP-hard optimization problem that is a fundamental tool in such diverse disciplines as VLSI circuit design, route planning, sharding distributed databases, optimizing communication volume in parallel computing, and accelerating the simulation of quantum circuits.
Given a hypergraph and an integer , the task is to divide the vertices into disjoint blocks with bounded size, while minimizing an objective function on the hyperedges that span multiple blocks.
In this dissertation we consider the most commonly used objective, the connectivity metric, where we aim to minimize the number of different blocks connected by each hyperedge.
The most successful heuristic for balanced partitioning is the multilevel approach, which consists of three phases.
In the coarsening phase, vertex clusters are contracted to obtain a sequence of structurally similar but successively smaller hypergraphs.
Once sufficiently small, an initial partition is computed.
Lastly, the contractions are successively undone in reverse order, and an iterative improvement algorithm is employed to refine the projected partition on each level.
An important aspect in designing practical heuristics for optimization problems is the trade-off between solution quality and running time.
The appropriate trade-off depends on the specific application, the size of the data sets, and the computational resources available to solve the problem.
Existing algorithms are either slow, sequential and offer high solution quality, or are simple, fast, easy to parallelize, and offer low quality.
While this trade-off cannot be avoided entirely, our goal is to close the gaps as much as possible.
We achieve this by improving the state of the art in all non-trivial areas of the trade-off landscape with only a few techniques, but employed in two different ways.
Furthermore, most research on parallelization has focused on distributed memory, which neglects the greater flexibility of shared-memory algorithms and the wide availability of commodity multi-core machines.
In this thesis, we therefore design and revisit fundamental techniques for each phase of the multilevel approach, and develop highly efficient shared-memory parallel implementations thereof.
We consider two iterative improvement algorithms, one based on the Fiduccia-Mattheyses (FM) heuristic, and one based on label propagation.
For these, we propose a variety of techniques to improve the accuracy of gains when moving vertices in parallel, as well as low-level algorithmic improvements.
For coarsening, we present a parallel variant of greedy agglomerative clustering with a novel method to resolve cluster join conflicts on-the-fly.
Combined with a preprocessing phase for coarsening based on community detection, a portfolio of from-scratch partitioning algorithms, as well as recursive partitioning with work-stealing, we obtain our first parallel multilevel framework.
It is the fastest partitioner known, and achieves medium-high quality, beating all parallel partitioners, and is close to the highest quality sequential partitioner.
Our second contribution is a parallelization of an n-level approach, where only one vertex is contracted and uncontracted on each level.
This extreme approach aims at high solution quality via very fine-grained, localized refinement, but seems inherently sequential.
We devise an asynchronous n-level coarsening scheme based on a hierarchical decomposition of the contractions, as well as a batch-synchronous uncoarsening, and later fully asynchronous uncoarsening.
In addition, we adapt our refinement algorithms, and also use the preprocessing and portfolio.
This scheme is highly scalable, and achieves the same quality as the highest quality sequential partitioner (which is based on the same components), but is of course slower than our first framework due to fine-grained uncoarsening.
The last ingredient for high quality is an iterative improvement algorithm based on maximum flows.
In the sequential setting, we first improve an existing idea by solving incremental maximum flow problems, which leads to smaller cuts and is faster due to engineering efforts.
Subsequently, we parallelize the maximum flow algorithm and schedule refinements in parallel.
Beyond the strive for highest quality, we present a deterministically parallel partitioning framework.
We develop deterministic versions of the preprocessing, coarsening, and label propagation refinement.
Experimentally, we demonstrate that the penalties for determinism in terms of partition quality and running time are very small.
All of our claims are validated through extensive experiments, comparing our algorithms with state-of-the-art solvers on large and diverse benchmark sets.
To foster further research, we make our contributions available in our open-source framework Mt-KaHyPar.
While it seems inevitable, that with ever increasing problem sizes, we must transition to distributed memory algorithms, the study of shared-memory techniques is not in vain.
With the multilevel approach, even the inherently slow techniques have a role to play in fast systems, as they can be employed to boost quality on coarse levels at little expense.
Similarly, techniques for shared-memory parallelism are important, both as soon as a coarse graph fits into memory, and as local building blocks in the distributed algorithm
Automated tailoring of system software stacks
In many industrial sectors, device manufacturers are moving away from expensive special-purpose hardware units and consolidate their systems on commodity hardware. As part of this change, developers are enabled to run their applications on general-purpose operating systems like Linux, which already supports thousands of different devices out of the box and can be used in a wide range of target scenarios. Furthermore, the Linux ecosystem allows them to integrate existing implementations of standard functionality in the form of shared libraries.
However, as the libraries and the Linux kernel are designed as generic building blocks in order to support as many applications as possible, they cannot make assumptions about specific use cases for a single-purpose device. This generality leads to unnecessary overheads in narrowly defined target scenarios, as unneeded components do not only take up space on the target system but have to be maintained over the lifetime of the device as well. While the Linux kernel provides a configuration system to disable unneeded functionality like device drivers, determining the required features from over 16000 options is an infeasible task. Even worse, most shared libraries cannot be customized even though only around 10 percent of their functions are ever used by applications.
In this thesis, I present my approaches for the automated identification and removal of unnecessary components in all layers of the software stack. As the configuration system is an integral part of the Linux kernel, we embrace its presence and automatically generate custom-fitted configurations for observed target scenarios with the help of an extracted variability model. For the much more diverse realm of shared libraries, with different programming languages, build systems, and a lack of configurability, I demonstrate a different approach. By identifying individual functions as logically distinct units, we construct a symbol-level dependency graph across the applications and all their required libraries. We then remove unneeded code at the binary level and rearrange the remaining parts to take up minimal space in the binary file by formulating their placement as an optimization problem. To lower the number of unnecessary updates to unused components in a deployed system, I lastly present an automated method to determine the impact of software changes on a target scenario and provide guidance for developers on whether they need to update their systems.
Applying these techniques to different target systems, I demonstrate that we can disable up to 87 percent of configuration options in a Debian Linux kernel, shrink the size of an embedded OpenWrt kernel by 59 percent, and speed up the boot process of the embedded system by 21 percent. As part of the shared library tailoring process, we can remove 13060 functions from all libraries in OpenWrt and reduce their total size by 31 percent. In the memcached Docker container, we identify 381 entirely unneeded shared libraries and shrink the container image size by 82 percent. An analysis of the development history of two large library projects over the course of more than two years further shows that between 68 and 82 percent of all changes are not required for an OpenWrt appliance, reducing the number of patch days by up to 69 percent.
These results demonstrate the broad applicability of our automated methods for both the Linux kernel and shared libraries to a wide range of scenarios. From embedded systems to server applications, custom-tailored system software stacks contribute to the reduction of overheads in space and time
LIPIcs, Volume 261, ICALP 2023, Complete Volume
LIPIcs, Volume 261, ICALP 2023, Complete Volum
Jornadas Nacionales de Investigación en Ciberseguridad: actas de las VIII Jornadas Nacionales de Investigación en ciberseguridad: Vigo, 21 a 23 de junio de 2023
Jornadas Nacionales de Investigación en Ciberseguridad (8ª. 2023. Vigo)atlanTTicAMTEGA: Axencia para a modernización tecnolóxica de GaliciaINCIBE: Instituto Nacional de Cibersegurida
String Diagrams for -calculi and Functional Computation
This tutorial gives an advanced introduction to string diagrams and graph
languages for higher-order computation. The subject matter develops in a
principled way, starting from the two dimensional syntax of key categorical
concepts such as functors, adjunctions, and strictification, and leading up to
Cartesian Closed Categories, the core mathematical model of the lambda calculus
and of functional programming languages. This methodology inverts the usual
approach of proceeding from syntax to a categorical interpretation, by
rationally reconstructing a syntax from the categorical model. The result is a
graph syntax -- more precisely, a hierarchical hypergraph syntax -- which in
many ways is shown to be an improvement over the conventional linear term
syntax. The rest of the tutorial focuses on applications of interest to
programming languages: operational semantics, general frameworks for type
inference, and complex whole-program transformations such as closure conversion
and automatic differentiation
Teaching informatics to novices: big ideas and the necessity of optimal guidance
This thesis reports on the two main areas of our research: introductory programming as the traditional way of accessing informatics and cultural teaching informatics through unconventional pathways.
The research on introductory programming aims to overcome challenges in traditional programming education, thus increasing participation in informatics. Improving access to informatics enables individuals to pursue more and better professional opportunities and contribute to informatics advancements. We aimed to balance active, student-centered activities and provide optimal support to novices at their level. Inspired by Productive Failure and exploring the concept of notional machine, our work focused on developing Necessity Learning Design, a design to help novices tackle new programming concepts. Using this design, we implemented a learning sequence to introduce arrays and evaluated it in a real high-school context. The subsequent chapters discuss our experiences teaching CS1 in a remote-only scenario during the COVID-19 pandemic and our collaborative effort with primary school teachers to develop a learning module for teaching iteration using a visual programming environment.
The research on teaching informatics principles through unconventional pathways, such as cryptography, aims to introduce informatics to a broader audience, particularly younger individuals that are less technical and professional-oriented. It emphasizes the importance of understanding informatics's cultural and scientific aspects to focus on the informatics societal value and its principles for active citizenship. After reflecting on computational thinking and inspired by the big ideas of science and informatics, we describe our hands-on approach to teaching cryptography in high school, which leverages its key scientific elements to emphasize its social aspects. Additionally, we present an activity for teaching public-key cryptography using graphs to explore fundamental concepts and methods in informatics and mathematics and their interdisciplinarity. In broadening the understanding of informatics, these research initiatives also aim to foster motivation and prime for more professional learning of informatics
- …