4,227 research outputs found

    CONFLLVM: A Compiler for Enforcing Data Confidentiality in Low-Level Code

    Full text link
    We present an instrumenting compiler for enforcing data confidentiality in low-level applications (e.g. those written in C) in the presence of an active adversary. In our approach, the programmer marks secret data by writing lightweight annotations on top-level definitions in the source code. The compiler then uses a static flow analysis coupled with efficient runtime instrumentation, a custom memory layout, and custom control-flow integrity checks to prevent data leaks even in the presence of low-level attacks. We have implemented our scheme as part of the LLVM compiler. We evaluate it on the SPEC micro-benchmarks for performance, and on larger, real-world applications (including OpenLDAP, which is around 300KLoC) for programmer overhead required to restructure the application when protecting the sensitive data such as passwords. We find that performance overheads introduced by our instrumentation are moderate (average 12% on SPEC), and the programmer effort to port OpenLDAP is only about 160 LoC.Comment: Technical report for CONFLLVM: A Compiler for Enforcing Data Confidentiality in Low-Level Code, appearing at EuroSys 201

    Mutual information based clustering of market basket data for profiling users

    Get PDF
    Attraction and commercial success of web sites depend heavily on the additional values visitors may find. Here, individual, automatically obtained and maintained user profiles are the key for user satisfaction. This contribution shows for the example of a cooking information site how user profiles might be obtained using category information provided by cooking recipes. It is shown that metrical distance functions and standard clustering procedures lead to erroneous results. Instead, we propose a new mutual information based clustering approach and outline its implications for the example of user profiling

    Structure-Aware Sampling: Flexible and Accurate Summarization

    Full text link
    In processing large quantities of data, a fundamental problem is to obtain a summary which supports approximate query answering. Random sampling yields flexible summaries which naturally support subset-sum queries with unbiased estimators and well-understood confidence bounds. Classic sample-based summaries, however, are designed for arbitrary subset queries and are oblivious to the structure in the set of keys. The particular structure, such as hierarchy, order, or product space (multi-dimensional), makes range queries much more relevant for most analysis of the data. Dedicated summarization algorithms for range-sum queries have also been extensively studied. They can outperform existing sampling schemes in terms of accuracy on range queries per summary size. Their accuracy, however, rapidly degrades when, as is often the case, the query spans multiple ranges. They are also less flexible - being targeted for range sum queries alone - and are often quite costly to build and use. In this paper we propose and evaluate variance optimal sampling schemes that are structure-aware. These summaries improve over the accuracy of existing structure-oblivious sampling schemes on range queries while retaining the benefits of sample-based summaries: flexible summaries, with high accuracy on both range queries and arbitrary subset queries

    Separation logic for high-level synthesis

    Get PDF
    High-level synthesis (HLS) promises a significant shortening of the digital hardware design cycle by raising the abstraction level of the design entry to high-level languages such as C/C++. However, applications using dynamic, pointer-based data structures remain difficult to implement well, yet such constructs are widely used in software. Automated optimisations that leverage the memory bandwidth of dedicated hardware implementations by distributing the application data over separate on-chip memories and parallelise the implementation are often ineffective in the presence of dynamic data structures, due to the lack of an automated analysis that disambiguates pointer-based memory accesses. This thesis takes a step towards closing this gap. We explore recent advances in separation logic, a rigorous mathematical framework that enables formal reasoning about the memory access of heap-manipulating programs. We develop a static analysis that automatically splits heap-allocated data structures into provably disjoint regions. Our algorithm focuses on dynamic data structures accessed in loops and is accompanied by automated source-to-source transformations which enable loop parallelisation and physical memory partitioning by off-the-shelf HLS tools. We then extend the scope of our technique to pointer-based memory-intensive implementations that require access to an off-chip memory. The extended HLS design aid generates parallel on-chip multi-cache architectures. It uses the disjointness property of memory accesses to support non-overlapping memory regions by private caches. It also identifies regions which are shared after parallelisation and which are supported by parallel caches with a coherency mechanism and synchronisation, resulting in automatically specialised memory systems. We show up to 15x acceleration from heap partitioning, parallelisation and the insertion of the custom cache system in demonstrably practical applications.Open Acces
    corecore