91 research outputs found

    Transparent pointer compression for linked data structures

    64-bit address spaces are increasingly important for modern applications, but they come at a price: pointers use twice as much memory, reducing the effective cache capacity and memory bandwidth of the system (compared to 32-bit address spaces). This paper presents a sophisticated, automatic transformation that shrinks pointers from 64 bits to 32 bits. The approach is “macroscopic,” i.e., it operates on an entire logical data structure in the program at a time. It allows an individual data structure instance, or even a subset thereof, to grow up to 2^32 bytes in size, and can compress pointers to some data structures but not others. Together, these properties allow efficient usage of a large (64-bit) address space. We also describe (but have not implemented) a dynamic version of the technique that can transparently expand the pointers in an individual data structure if it exceeds the 4GB limit. For a collection of pointer-intensive benchmarks, we show that the transformation reduces peak heap sizes substantially (by 20% to 2x) for several of these benchmarks and improves overall performance significantly in some cases.
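
    For illustration, the C++ sketch below shows the general idea behind index-based pointer compression, hand-written rather than compiler-generated: nodes of one linked structure live in a dedicated pool, and links are stored as 32-bit offsets into that pool instead of 64-bit pointers. The names (Pool, NodeRef, kNull) are illustrative assumptions, not taken from the paper.

```cpp
// Hedged sketch: hand-written index-based pointer compression for a singly
// linked list. The paper's transformation derives an equivalent layout
// automatically; Pool, NodeRef, and kNull are illustrative names only.
#include <cstdint>
#include <iostream>
#include <vector>

using NodeRef = std::uint32_t;              // 32-bit "pointer" into the pool
constexpr NodeRef kNull = 0xFFFFFFFFu;      // sentinel for a null link

struct Node {
    int     value;
    NodeRef next;                           // 4 bytes instead of an 8-byte Node*
};

class Pool {
public:
    NodeRef alloc(int value, NodeRef next) {    // all nodes of one list share this pool
        nodes_.push_back({value, next});
        return static_cast<NodeRef>(nodes_.size() - 1);
    }
    Node& at(NodeRef r) { return nodes_[r]; }
private:
    std::vector<Node> nodes_;               // contiguous storage, at most 2^32 nodes
};

int main() {
    Pool pool;
    NodeRef head = kNull;
    for (int i = 3; i >= 1; --i)            // build the list 1 -> 2 -> 3
        head = pool.alloc(i, head);
    for (NodeRef r = head; r != kNull; r = pool.at(r).next)
        std::cout << pool.at(r).value << ' ';
    std::cout << '\n';
    return 0;
}
```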

    Semi-Supervised Object Detection in the Open World

    Existing approaches for semi-supervised object detection assume a fixed set of classes present in the training and unlabeled datasets, i.e., in-distribution (ID) data. The performance of these techniques degrades significantly when they are deployed in the open world, because the unlabeled and test data may contain objects that were not seen during training, i.e., out-of-distribution (OOD) data. The two key questions that we explore in this paper are: can we detect these OOD samples and, if so, can we learn from them? With these considerations in mind, we propose the Open World Semi-supervised Detection framework (OWSSD), which effectively detects OOD data along with a semi-supervised learning pipeline that learns from both ID and OOD data. We introduce an ensemble-based OOD detector consisting of lightweight auto-encoder networks trained only on ID data. Through extensive evaluation, we demonstrate that our method performs competitively against state-of-the-art OOD detection algorithms and also significantly boosts semi-supervised learning performance in open-world scenarios.
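
    The abstract does not spell out the ensemble's decision rule; the C++ sketch below assumes a simple scheme in which each auto-encoder reports a reconstruction error, per-model thresholds are calibrated on ID data, and a proposal is flagged OOD when a majority of models exceed their thresholds. The actual OWSSD scoring rule may differ.

```cpp
// Hedged sketch of an ensemble OOD decision rule. Assumption: each
// auto-encoder in the ensemble reconstructs ID data well (low error), so a
// proposal is voted OOD by a model when its reconstruction error exceeds a
// threshold calibrated on ID data; a majority of votes flags it as OOD.
#include <cstddef>
#include <iostream>
#include <vector>

bool isOutOfDistribution(const std::vector<double>& reconstructionErrors,
                         const std::vector<double>& thresholds) {
    std::size_t votes = 0;
    for (std::size_t i = 0; i < reconstructionErrors.size(); ++i)
        if (reconstructionErrors[i] > thresholds[i])   // poor reconstruction -> OOD vote
            ++votes;
    return votes * 2 > reconstructionErrors.size();    // simple majority
}

int main() {
    // Errors from three auto-encoders for one region proposal (illustrative numbers).
    std::vector<double> errors     {0.91, 0.72, 0.15};
    std::vector<double> thresholds {0.50, 0.60, 0.55};
    std::cout << (isOutOfDistribution(errors, thresholds) ? "OOD" : "ID") << '\n';
    return 0;
}
```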

    Technical Report: Region and Effect Inference for Safe Parallelism

    In this paper, we present the first full regions-and-effects inference algorithm for explicitly parallel fork-join programs. We infer annotations inspired by Deterministic Parallel Java (DPJ) for a type-safe subset of C++. We chose the DPJ annotations because they give the strongest safety guarantees of any existing concurrency-checking approach we know of, static or dynamic, and DPJ is also the most expressive static checking system we know of that gives strong safety guarantees. This expressiveness, however, makes manual annotation difficult and tedious, which motivates the need for automatic inference, but it also makes the inference problem very challenging: the code may use region polymorphism, imperative updates with complex aliasing, arbitrary recursion, hierarchical region specifications, and wildcard elements to describe potentially infinite sets of regions. We express the inference as a constraint satisfaction problem and develop, implement, and evaluate an algorithm for solving it. The region and effect annotations inferred by the algorithm constitute a checkable proof of safe parallelism, and they can be recorded both for documentation and for fast and modular safety checking.
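
    As a hedged illustration of the kind of code the inference targets, the C++ sketch below shows a fork-join tree update with DPJ-style region and effect annotations written as comments; the tool's concrete annotation syntax for its C++ subset may differ. The two recursive calls write disjoint regions, which is exactly the property the inferred annotations let a checker verify.

```cpp
// Hedged sketch: fork-join code with DPJ-style region/effect annotations
// shown as comments (illustrative, not the paper's concrete syntax). The
// parallel calls write disjoint regions (Left and Right under each node),
// so they do not interfere.
#include <iostream>
#include <thread>

struct Tree {                       // region parameter: Tree<region R>
    int   value;                    // value  in R
    Tree* left  = nullptr;          // *left  in R::Left
    Tree* right = nullptr;          // *right in R::Right
};

// effect: writes R::* (the whole subtree rooted in region R)
void increment(Tree* t) {
    if (!t) return;
    ++t->value;                                   // writes R
    std::thread lhs(increment, t->left);          // writes R::Left::*
    increment(t->right);                          // writes R::Right::* (disjoint)
    lhs.join();
}

int main() {
    Tree l{1}, r{2}, root{0, &l, &r};
    increment(&root);
    std::cout << root.value << ' ' << l.value << ' ' << r.value << '\n';  // prints: 1 2 3
    return 0;
}
```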

    The influence of random delays on parallel execution times

    Automatic Pool Allocation: Compile-Time Control of Data Structure Layout in the Heap

    Despite the potential importance of data structure layouts and traversal patterns, compiler transformations on pointer-intensive programs are performed primarily using pointer analysis, not by controlling and using information about the layout of high-level data structures. This paper describes a compiler transformation called Automatic Pool Allocation that segregates instances of “logical” data structures in the heap into distinct pools, and allows different heuristics to be used to partially control the internal layout of those data structures. Because these are rigorous transformations, their results, combined with pointer analysis information, can be used to perform further compiler analyses and transformations, and we briefly list a few examples. Automatic Pool Allocation also provides several direct performance benefits for pointer-intensive programs, most importantly that traversals of a logical data structure allocated to a separate pool can have better spatial locality and smaller working sets. We evaluate the performance and cache behavior of code transformed by Automatic Pool Allocation on a series of heap-intensive and general-purpose benchmarks, and find that it speeds up several C programs by 10-40% or more, and does not hurt (or help) other programs.
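
    The C++ sketch below is a hand-written analogue of what the transformation arranges automatically: nodes of one logical list are carved out of a contiguous per-structure pool rather than interleaved general-purpose allocations, so a traversal touches consecutive memory. The NodePool class and its fixed capacity are illustrative assumptions, not the paper's runtime interface.

```cpp
// Hedged sketch: a per-data-structure pool, written by hand to mimic the
// layout Automatic Pool Allocation produces. All nodes of one logical list
// are bump-allocated from one contiguous chunk, improving spatial locality
// for traversals. NodePool and the capacity limit are illustrative only.
#include <cstddef>
#include <iostream>
#include <vector>

struct Node {
    int   value;
    Node* next;
};

class NodePool {                          // one pool per logical data structure
public:
    explicit NodePool(std::size_t capacity) { chunk_.reserve(capacity); }
    Node* alloc(int value, Node* next) {
        chunk_.push_back({value, next});  // bump allocation, contiguous layout
        return &chunk_.back();            // stable as long as capacity is not exceeded
    }
private:
    std::vector<Node> chunk_;
};

int main() {
    NodePool pool(1000);
    Node* head = nullptr;
    for (int i = 999; i >= 0; --i)
        head = pool.alloc(i, head);       // all nodes contiguous in the pool
    long sum = 0;
    for (Node* n = head; n; n = n->next)  // cache-friendly traversal
        sum += n->value;
    std::cout << sum << '\n';             // prints: 499500
    return 0;
}
```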

    Parallel Programming Must Be Deterministic By Default

    We examine the problem of providing a parallel programming model that guarantees deterministic semantics. We propose a research agenda focusing on the following questions: (1) how to guarantee determinism in a modern object-oriented language; (2) how to provide sound guarantees when parts of the program either cannot be proved deterministic or have "harmless" nondeterminism; (3) how to specify explicit nondeterminism when needed; and (4) how to make it easier to port programs to the language.

    An Empirical Study of Reported Bugs in Server Software with Implications for Automated Bug Diagnosis

    Reproducing bug symptoms is a prerequisite for performing automatic bug diagnosis. Do bugs have characteristics that ease or hinder automatic bug diagnosis? In this paper, we conduct a thorough empirical study of several key characteristics of bugs that affect reproducibility at the production site. We examine randomly selected bug reports of six server applications and consider their implications for automatic bug diagnosis tools. Our results are promising. From the study, we find that nearly 82% of bug symptoms can be reproduced deterministically by re-running with the same set of inputs at the production site. We further find that very few input requests are needed to reproduce most failures; in fact, just one input request after session establishment suffices to reproduce the failure in nearly 77% of the cases. We describe the implications of the results on reproducing software failures and designing automated diagnosis tools for production runs.