4,816 research outputs found

    Permission-Based Separation Logic for Multithreaded Java Programs

    Get PDF
    This paper motivates and presents a program logic for reasoning about multithreaded Java-like programs with concurrency primitives such as dynamic thread creation, thread joining and reentrant object monitors. The logic is based on concurrent separation logic. It is the first detailed adaptation of concurrent separation logic to a multithreaded Java-like language. The program logic associates a unique static access permission with each heap location, ensuring exclusive write accesses and ruling out data races. Concurrent reads are supported through fractional permissions. Permissions can be transferred between threads upon thread starting, thread joining, initial monitor entrancies and final monitor exits.\ud This paper presents the basic principles to reason about thread creation and thread joining. It finishes with an outlook how this logic will evolve into a full-fledged verification technique for Java (and possibly other multithreaded languages)

    The force on the flex: Global parallelism and portability

    Get PDF
    A parallel programming methodology, called the force, supports the construction of programs to be executed in parallel by an unspecified, but potentially large, number of processes. The methodology was originally developed on a pipelined, shared memory multiprocessor, the Denelcor HEP, and embodies the primitive operations of the force in a set of macros which expand into multiprocessor Fortran code. A small set of primitives is sufficient to write large parallel programs, and the system has been used to produce 10,000 line programs in computational fluid dynamics. The level of complexity of the force primitives is intermediate. It is high enough to mask detailed architectural differences between multiprocessors but low enough to give the user control over performance. The system is being ported to a medium scale multiprocessor, the Flex/32, which is a 20 processor system with a mixture of shared and local memory. Memory organization and the type of processor synchronization supported by the hardware on the two machines lead to some differences in efficient implementations of the force primitives, but the user interface remains the same. An initial implementation was done by retargeting the macros to Flexible Computer Corporation's ConCurrent C language. Subsequently, the macros were caused to directly produce the system calls which form the basis for ConCurrent C. The implementation of the Fortran based system is in step with Flexible Computer Corporations's implementation of a Fortran system in the parallel environment

    Gunrock: A High-Performance Graph Processing Library on the GPU

    Full text link
    For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs have been two significant challenges for developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We evaluate Gunrock on five key graph primitives and show that Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives, and better performance than any other GPU high-level graph library.Comment: 14 pages, accepted by PPoPP'16 (removed the text repetition in the previous version v5

    PluralisMAC: a generic multi-MAC framework for heterogeneous, multiservice wireless networks, applied to smart containers

    Get PDF
    Developing energy-efficient MAC protocols for lightweight wireless systems has been a challenging task for decades because of the specific requirements of various applications and the varying environments in which wireless systems are deployed. Many MAC protocols for wireless networks have been proposed, often custom-made for a specific application. It is clear that one MAC does not fit all the requirements. So, how should a MAC layer deal with an application that has several modes (each with different requirements) or with the deployment of another application during the lifetime of the system? Especially in a mobile wireless system, like Smart Monitoring of Containers, we cannot know in advance the application state (empty container versus stuffed container). Dynamic switching between different energy-efficient MAC strategies is needed. Our architecture, called PluralisMAC, contains a generic multi-MAC framework and a generic neighbour monitoring and filtering framework. To validate the real-world feasibility of our architecture, we have implemented it in TinyOS and have done experiments on the TMote Sky nodes in the w-iLab.t testbed. Experimental results show that dynamic switching between MAC strategies is possible with minimal receive chain overhead, while meeting the various application requirements (reliability and low-energy consumption)

    Gunrock: GPU Graph Analytics

    Full text link
    For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock's overall performance on different GPU architectures on a wide range of graph primitives that span from traversal-based algorithms and ranking algorithms, to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order of magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries such as Ligra and Galois, and better performance than any other GPU high-level graph library.Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing (TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance Graph Processing Library on the GPU

    Platform for Testing and Evaluation of PUF and TRNG Implementations in FPGAs

    Get PDF
    Implementation of cryptographic primitives like Physical Unclonable Functions (PUFs) and True Random Number Generators (TRNGs) depends significantly on the underlying hardware. Common evaluation boards offered by FPGA vendors are not suitable for a fair benchmarking, since they have different vendor dependent configuration and contain noisy switching power supplies. The proposed hardware platform is primary aimed at testing and evaluation of cryptographic primitives across different FPGA and ASIC families. The modular platform consists of a motherboard and exchangeable daughter board modules. These are designed to be as simple as possible to allow cheap and independent evaluation of cryptographic blocks and namely PUFs. The motherboard is based on the Microsemi SmartFusion 2 SoC FPGA. It features a low-noise power supply, which simplifies evaluation of vulnerability to the side channel attacks. It provides also means of communication between the PC and the daughter module. Available software tools can be easily customized, for example to collect data from the random number generator located in the daughter module and to read it via USB interface. The daughter module can be plugged into the motherboard or connected using an HDMI cable to be placed inside a Faraday cage or a temperature control chamber. The whole platform was designed and optimized to fullfil the European HECTOR project (H2020) requirements
    corecore