
    Language integrated relational lenses

    Relational databases are ubiquitous. Such monolithic databases accumulate large amounts of data, yet applications typically only work on small portions of the data at a time. A subset of the database defined as a computation on the underlying tables is called a view. Querying views is helpful, but it is also desirable to update them and have these changes be applied to the underlying database. This view update problem has been the subject of much previous work, but support by database servers is limited and only rarely available. Lenses are a popular approach to bidirectional transformations, a generalization of the view update problem in databases to arbitrary data. However, perhaps surprisingly, lenses have seldom actually been used to implement updatable views in databases. Bohannon, Pierce and Vaughan propose an approach to updatable views called relational lenses; however, to the best of our knowledge, this proposal had not been implemented or evaluated prior to the work reported in this thesis.

    This thesis proposes programming language support for relational lenses. Language-integrated relational lenses support expressive and efficient view updates without relying on updatable view support from the database server. By integrating relational lenses into the programming language, application development becomes easier and less error-prone, avoiding the impedance mismatch of having two programming languages. Integrating relational lenses into the language poses additional challenges. As defined by Bohannon et al., relational lenses completely recompute the database, making them inefficient as the database scales. A second challenge is that some parts of the well-formedness conditions are too general for implementation: Bohannon et al. specify predicates using possibly infinite abstract sets and define the type checking rules using relational algebra.

    Incremental relational lenses equip relational lenses with change-propagating semantics that map small changes to the view into (potentially) small changes to the source tables. We prove that our incremental semantics are functionally equivalent to the non-incremental semantics, and our experimental results show orders-of-magnitude improvement over the non-incremental approach. This thesis also introduces a concrete predicate syntax, shows how the required checks are performed on these predicates, and shows that they satisfy the abstract predicate specifications. We discuss trade-offs between static predicates that are fully known at compile time and dynamic predicates that are only known during execution, and introduce hybrid predicates that take inspiration from both approaches. This thesis adapts the typing rules for relational lenses from sequential composition to a functional style of sub-expressions, and we prove that a well-typed sequential lens can be derived from any well-typed functional relational lens expression.

    We use these additions to relational lenses as the foundation for two practical implementations: an extension of the Links functional language and a library written in Haskell. The second implementation demonstrates how type-level computation can be used to implement relational lenses without changes to the compiler. These two implementations attest to the possibility of turning relational lenses into a practical language feature.
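    To make the change-propagation idea concrete, the following Python sketch models a toy select lens: get computes the view, the classic put rewrites the visible portion of the source wholesale, and the delta-based put maps a small view delta to a small source delta. This is illustrative only; the function names and the list-of-tuples representation are invented for the example and are not the thesis's Links or Haskell implementation.

        def select_get(source, pred):
            """View: the rows of `source` satisfying `pred`."""
            return [r for r in source if pred(r)]

        def select_put(source, new_view, pred):
            """Non-incremental put: rebuild the source from the whole new view."""
            return [r for r in source if not pred(r)] + list(new_view)

        def select_put_delta(source, inserted, deleted, pred):
            """Incremental put: translate a view delta into a source delta."""
            assert all(pred(r) for r in inserted + deleted), "delta rows must be visible"
            return [r for r in source if r not in deleted] + inserted

        people = [("alice", 30), ("bob", 17), ("carol", 45)]
        adults = lambda r: r[1] >= 18
        assert select_get(people, adults) == [("alice", 30), ("carol", 45)]
        print(select_put_delta(people, inserted=[("dan", 52)],
                               deleted=[("carol", 45)], pred=adults))
        # [('alice', 30), ('bob', 17), ('dan', 52)]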

    LIPIcs, Volume 251, ITCS 2023, Complete Volume


    Automated identification and behaviour classification for modelling social dynamics in group-housed mice

    Mice are often used in biology as exploratory models of human conditions, due to their genetic and physiological similarity to humans. Unfortunately, research on behaviour has traditionally been limited to studying individuals in isolated environments and over short periods of time. This can miss critical time-effects and, since mice are social creatures, bias results. This work addresses this gap by developing tools to analyse the individual behaviour of group-housed mice in the home-cage over several days and with minimal disruption. Using data provided by the Mary Lyon Centre at MRC Harwell, we designed an end-to-end system that (a) tracks and identifies mice in a cage, (b) infers their behaviour, and subsequently (c) models the group dynamics as functions of individual activities. In support of the above, we also curated and made available a large dataset of mouse localisation and behaviour classifications (IMADGE), as well as two smaller annotated datasets for training/evaluating the identification (TIDe) and behaviour inference (ABODe) systems. This research is the first of its kind in terms of the scale and challenges addressed. The data source (side-view single-channel video with clutter and no identification markers for mice) presents challenging conditions for analysis, but has the potential to give richer information while using industry-standard housing.

    A Tracking and Identification module was developed to automatically detect, track and identify the (visually similar) mice in the cluttered home-cage using only single-channel IR video and coarse position from RFID readings. Existing detectors and trackers were combined with a novel Integer Linear Programming formulation to assign anonymous tracks to mouse identities, using a probabilistic weight model of the affinity between detections and RFID pickups. Next, we implemented the Activity Labelling module, which classifies the behaviour of each mouse and handles occlusion to avoid giving unreliable classifications when the mice cannot be observed. Two key aspects of this were (a) careful feature selection and (b) judicious balancing of the system's errors in line with their repercussions for our setup. Given these sequences of individual behaviours, we analysed the interaction dynamics between mice in the same cage by collapsing the group behaviour into a sequence of interpretable latent regimes, using both static and temporal (Markov) models. Using a permutation matrix, we were able to automatically assign mice to roles in the hidden Markov model (HMM), fit a global model to a group of cages, and analyse abnormalities in data from a different demographic.
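    As a rough illustration of the track-to-identity assignment step, the sketch below solves the special case where each track maps to exactly one mouse, using the Hungarian algorithm; the thesis's actual formulation is a more general Integer Linear Program over tracklets and RFID pickups, and the affinity matrix here is invented toy data.

        import numpy as np
        from scipy.optimize import linear_sum_assignment

        def assign_tracks(affinity):
            """Assign anonymous tracks to mouse identities.

            affinity[t, m] is a weight for track t belonging to mouse m, e.g.
            accumulated from how often the track's detections coincide with
            RFID pickups of mouse m's tag.
            """
            tracks, mice = linear_sum_assignment(affinity, maximize=True)
            return dict(zip(tracks.tolist(), mice.tolist()))

        affinity = np.array([[0.90, 0.05, 0.05],   # track 0 co-occurs with mouse 0
                             [0.10, 0.20, 0.70],
                             [0.20, 0.70, 0.10]])
        print(assign_tracks(affinity))  # {0: 0, 1: 2, 2: 1}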

    Declarative Specification of Intraprocedural Control-flow and Dataflow Analysis

    Static program analysis plays a crucial role in ensuring the quality and security of software applications by detecting and fixing bugs and potential security vulnerabilities in the code. The use of declarative paradigms in dataflow analysis, as part of static program analysis, has become increasingly popular in recent years. This popularity stems from their enhanced expressivity and modularity, which allow a higher-level programming approach and make development easier and more efficient.

    The aim of this thesis is to explore the design and implementation of control-flow and dataflow analyses using the declarative Reference Attribute Grammars formalism. Specifically, we focus on the construction of analyses directly on the source code rather than on an intermediate representation.

    The main result of this thesis is our language-agnostic framework, called IntraCFG. IntraCFG enables efficient and effective dataflow analysis by allowing the construction of precise, source-level control-flow graphs, which the framework superimposes on top of the abstract syntax tree of the program. The effectiveness of IntraCFG is demonstrated through two case studies, IntraJ and IntraTeal, which showcase the potential and flexibility of IntraCFG in diverse contexts such as bug detection and education. IntraJ supports the Java programming language, while IntraTeal is a tool designed for teaching program analysis using an educational language, Teal.

    IntraJ has proven to be faster than, and as precise as, well-known industrial tools. The combination of precision, performance, and on-demand evaluation in IntraJ leads to low latency in querying the analysis results, making it suitable for use in interactive tools. Preliminary experiments have also been conducted to demonstrate how IntraJ can support interactive bug detection and fixing.

    Additionally, this thesis presents JFeature, a tool for automatically extracting and summarising the features of a Java corpus, including the use of different Java features (e.g., lambda expressions) across different Java versions. JFeature provides researchers and developers with a deeper understanding of the characteristics of corpora, enabling them to identify suitable benchmarks for the evaluation of their tools and methodologies.
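    To ground what such a dataflow analysis computes, here is a minimal backward liveness analysis as a plain worklist fixpoint over an explicit control-flow graph. IntraCFG expresses this kind of computation declaratively with Reference Attribute Grammars on the AST; the Python below is only a hand-rolled illustration with invented names.

        def liveness(succ, use, defs):
            """Backward may-analysis: live_in[n] = use[n] | (live_out[n] - defs[n])."""
            pred = {n: [p for p in succ if n in succ[p]] for n in succ}
            live_in = {n: set() for n in succ}
            live_out = {n: set() for n in succ}
            worklist = list(succ)
            while worklist:
                n = worklist.pop()
                live_out[n] = set().union(*(live_in[s] for s in succ[n]))
                new_in = use[n] | (live_out[n] - defs[n])
                if new_in != live_in[n]:
                    live_in[n] = new_in
                    worklist.extend(pred[n])  # revisit predecessors on change
            return live_in, live_out

        # x = 1; y = x + 1; return y
        succ = {0: [1], 1: [2], 2: []}
        use  = {0: set(), 1: {"x"}, 2: {"y"}}
        defs = {0: {"x"}, 1: {"y"}, 2: set()}
        print(liveness(succ, use, defs)[0])  # {0: set(), 1: {'x'}, 2: {'y'}}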

    A Differential Datalog Interpreter

    The core reasoning task for datalog engines is materialization: the evaluation of a datalog program over a database alongside its physical incorporation into the database itself. The de facto method of computing it is the recursive application of inference rules. Because materialization is a costly operation, it is a must for datalog engines to provide incremental materialization, that is, to adjust the computation to new data instead of restarting from scratch. One of the major caveats is that deleting data is notoriously more involved than adding it, since one has to take into account all data that has been entailed from what is being deleted. Differential Dataflow is a computational model for iterative dataflows that provides efficient incremental maintenance and work distribution, notably with comparable performance for additions and deletions. In this paper we investigate the performance of materialization with three reference datalog implementations, one of which is built on top of a lightweight relational engine, while the two others are differential-dataflow and non-differential versions of the same rewrite algorithm, with the same optimizations.
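    As a concrete reference point for what materialization does, the sketch below performs semi-naive evaluation of the classic transitive-closure program: each round joins only the newest facts against the base relation, so work is driven by deltas rather than full recomputation. It is a toy, single-rule illustration, not one of the paper's three implementations, and it also hints at why deletion is harder: removing an edge would require tracking which derived facts still have alternative derivations.

        def transitive_closure(edges):
            """Materialize path(x,z) :- edge(x,z).  path(x,z) :- path(x,y), edge(y,z)."""
            total = set(edges)   # all facts materialized so far
            delta = set(edges)   # facts derived in the previous round
            while delta:
                new = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
                delta = new - total          # only genuinely new facts recurse
                total |= delta
            return total

        print(sorted(transitive_closure({(1, 2), (2, 3), (3, 4)})))
        # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]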

    Processing genome-wide association studies within a repository of heterogeneous genomic datasets

    Background: Genome-wide association studies (GWAS) are based on the observation of genome-wide sets of genetic variants – typically single-nucleotide polymorphisms (SNPs) – in different individuals that are associated with phenotypic traits. Research efforts have so far been directed at improving GWAS techniques rather than at making the results of GWAS interoperable with other genomic signals; this is currently hindered by the use of heterogeneous formats and uncoordinated experiment descriptions.

    Results: To practically facilitate integrative use, we propose to include GWAS datasets within the META-BASE repository, exploiting an integration pipeline previously studied for other genomic datasets that brings several heterogeneous data types into the same format, queryable from the same systems. We represent GWAS SNPs and metadata by means of the Genomic Data Model and include metadata within a relational representation by extending the Genomic Conceptual Model with a dedicated view. To further reduce the gap with the descriptions of other signals in the repository of genomic datasets, we perform a semantic annotation of phenotypic traits. Our pipeline is demonstrated using two important data sources, initially organized according to different data models: the NHGRI-EBI GWAS Catalog and FinnGen (University of Helsinki). The integration effort finally allows us to use these datasets within multisample processing queries that respond to important biological questions, and makes them usable for multi-omic studies together with, e.g., somatic and reference mutation data, genomic annotations, and epigenetic signals.

    Conclusions: As a result of our work on GWAS datasets, we enable 1) their interoperable use with several other homogenized and processed genomic datasets in the context of the META-BASE repository; and 2) their big data processing by means of the GenoMetric Query Language and its associated system. Future large-scale tertiary data analysis may extensively benefit from the addition of GWAS results to inform several different downstream analysis workflows.
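    The following sketch illustrates the flavour of the homogenization step: one simplified NHGRI-EBI GWAS Catalog row is mapped into a region-based record in the spirit of the Genomic Data Model. The field names of GwasRegion and the coordinate convention are illustrative assumptions, not the exact META-BASE schema.

        from dataclasses import dataclass

        @dataclass
        class GwasRegion:
            chrom: str
            start: int      # 0-based start, stop exclusive (illustrative convention)
            stop: int
            snp_id: str
            trait: str      # phenotype; ideally a semantically annotated term
            p_value: float

        def from_catalog_row(row):
            """Map a simplified GWAS Catalog row to a region-based record."""
            pos = int(row["CHR_POS"])
            return GwasRegion(chrom=row["CHR_ID"], start=pos - 1, stop=pos,
                              snp_id=row["SNPS"], trait=row["MAPPED_TRAIT"],
                              p_value=float(row["P-VALUE"]))

        row = {"CHR_ID": "7", "CHR_POS": "117559590", "SNPS": "rs123",
               "MAPPED_TRAIT": "cystic fibrosis", "P-VALUE": "3e-12"}
        print(from_catalog_row(row))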

    Generalised latent variable models for location, scale, and shape parameters

    Latent Variable Models (LVM) are widely used in the social, behavioural, and educational sciences to uncover underlying associations in multivariate data using a smaller number of latent variables. However, the classical LVM framework rests on assumptions that can be restrictive in empirical applications: in particular, that the distribution of the observed variables is from the exponential family, and that the latent variables influence only the conditional mean of the observed variables. This thesis addresses these limitations and contributes to the current literature in two ways.

    First, we propose a novel class of models called Generalised Latent Variable Models for Location, Scale, and Shape parameters (GLVM-LSS). These models use linear functions of latent factors to model the location, scale, and shape parameters of the items' conditional distributions. By doing so, we model higher-order moments such as variance, skewness, and kurtosis in terms of the latent variables, providing a more flexible framework compared to classical factor models. The model parameters are estimated by maximum likelihood.

    Second, we address the challenge of interpreting the GLVM-LSS, which can be complex due to its increased number of parameters. We propose a penalised maximum likelihood estimation approach with automatic selection of tuning parameters, extending previous work on penalised estimation in the LVM literature to cases without closed-form solutions. Our findings suggest that modelling the entire distribution of the items, not just the conditional mean, leads to improved model fit and deeper insights into how the items reflect the latent constructs they are intended to measure.

    To assess the performance of the proposed methods, we conduct extensive simulation studies and apply them to real-world data from educational testing and public opinion research. The results highlight the efficacy of the GLVM-LSS framework in capturing complex relationships between observed variables and latent factors, providing valuable insights for researchers in various fields.
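    To make the modelling idea concrete, the sketch below writes down the marginal log-likelihood of a toy one-factor Gaussian GLVM-LSS in which both the item mean and the item log-standard-deviation are linear in the latent variable, with the latent variable integrated out by Gauss-Hermite quadrature. The parameterisation and names are invented for illustration and do not reproduce the thesis's exact formulation or its penalised estimator.

        import numpy as np
        from scipy.stats import norm

        def marginal_loglik(params, Y, n_quad=31):
            """Marginal log-likelihood with z ~ N(0, 1) integrated out.

            params: one row (a, b, c, d) per item, with mean = a + b*z and
            log-sd = c + d*z, so the scale also depends on the latent variable.
            """
            nodes, weights = np.polynomial.hermite_e.hermegauss(n_quad)
            weights = weights / np.sqrt(2 * np.pi)   # N(0, 1) quadrature weights
            ll = 0.0
            for y in Y:
                dens = np.ones(n_quad)               # joint density at each node
                for (a, b, c, d), yj in zip(params, y):
                    dens *= norm.pdf(yj, a + b * nodes, np.exp(c + d * nodes))
                ll += np.log(dens @ weights)
            return ll

        rng = np.random.default_rng(0)
        z = rng.standard_normal(200)
        Y = np.column_stack([rng.normal(0.5 + 1.0 * z, np.exp(-0.2 + 0.3 * z)),
                             rng.normal(-0.3 + 0.8 * z, np.exp(0.1 - 0.2 * z))])
        params = np.array([[0.5, 1.0, -0.2, 0.3], [-0.3, 0.8, 0.1, -0.2]])
        print(marginal_loglik(params, Y))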

    Resilient and Scalable Forwarding for Software-Defined Networks with P4-Programmable Switches

    Traditional networking devices support only fixed features and limited configurability. Network softwarization leverages programmable software and hardware platforms to remove those limitations. In this context, the concept of programmable data planes allows the packet processing pipeline of networking devices to be programmed directly, and enables custom control plane algorithms. This flexibility enables the design of novel networking mechanisms where the status quo struggles to meet the high demands of next-generation networks like 5G, the Internet of Things, cloud computing, and Industry 4.0. P4 is the most popular technology for implementing programmable data planes. However, programmable data planes, and in particular the P4 technology, emerged only recently. Thus, P4 support for some well-established networking concepts is still lacking, and several issues remain unsolved due to the different characteristics of programmable data planes in comparison to traditional networking.

    The research of this thesis focuses on two open issues of programmable data planes. First, it develops resilient and efficient forwarding mechanisms for the P4 data plane, as no satisfactory state-of-the-art best practices exist yet. Second, it enables BIER in high-performance P4 data planes. BIER is a novel, scalable, and efficient transport mechanism for IP multicast traffic which so far has only very limited support on high-performance forwarding platforms.

    The main results of this thesis are published as eight peer-reviewed publications and one post-publication peer-reviewed publication. The results cover the development of suitable resilience mechanisms for P4 data planes, the development and implementation of resilient BIER forwarding in P4, and extensive evaluations of all developed and implemented mechanisms. Furthermore, the results contain a comprehensive P4 literature study. Two more peer-reviewed papers contain additional content that is not directly related to the main results: they implement congestion avoidance mechanisms in P4 and develop a scheduling concept to find cost-optimized load schedules based on day-ahead forecasts.
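    To illustrate what BIER (Bit Index Explicit Replication) forwarding does, the sketch below models the per-packet replication logic: each egress router owns one bit, the packet header carries a bitstring of intended egresses, and each hop sends at most one copy per next hop with the already-covered bits pruned. The BIFT contents are invented toy data, and this Python model is only conceptual; the thesis implements this logic in P4 targets.

        # Bit Index Forwarding Table: bit position -> (next hop, forwarding
        # bitmask covering all egress bits reachable via that next hop).
        BIFT = {
            0: ("nh_A", 0b0011),
            1: ("nh_A", 0b0011),
            2: ("nh_B", 0b1100),
            3: ("nh_B", 0b1100),
        }

        def bier_forward(bitstring):
            """Yield (next_hop, pruned bitstring) packet copies."""
            while bitstring:
                bit = (bitstring & -bitstring).bit_length() - 1   # lowest set bit
                next_hop, fbm = BIFT[bit]
                yield next_hop, bitstring & fbm   # copy serves only this hop's bits
                bitstring &= ~fbm                 # never serve those bits twice

        for hop, bits in bier_forward(0b1010):    # egresses 1 and 3
            print(hop, bin(bits))
        # nh_A 0b10
        # nh_B 0b1000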