4,816 research outputs found
Permission-Based Separation Logic for Multithreaded Java Programs
This paper motivates and presents a program logic for reasoning about multithreaded Java-like programs with concurrency primitives such as dynamic thread creation, thread joining and reentrant object monitors. The logic is based on concurrent separation logic. It is the first detailed adaptation of concurrent separation logic to a multithreaded Java-like language. The program logic associates a unique static access permission with each heap location, ensuring exclusive write accesses and ruling out data races. Concurrent reads are supported through fractional permissions. Permissions can be transferred between threads upon thread starting, thread joining, initial monitor entrancies and final monitor exits.\ud
This paper presents the basic principles to reason about thread creation and thread joining. It finishes with an outlook how this logic will evolve into a full-fledged verification technique for Java (and possibly other multithreaded languages)
The force on the flex: Global parallelism and portability
A parallel programming methodology, called the force, supports the construction of programs to be executed in parallel by an unspecified, but potentially large, number of processes. The methodology was originally developed on a pipelined, shared memory multiprocessor, the Denelcor HEP, and embodies the primitive operations of the force in a set of macros which expand into multiprocessor Fortran code. A small set of primitives is sufficient to write large parallel programs, and the system has been used to produce 10,000 line programs in computational fluid dynamics. The level of complexity of the force primitives is intermediate. It is high enough to mask detailed architectural differences between multiprocessors but low enough to give the user control over performance. The system is being ported to a medium scale multiprocessor, the Flex/32, which is a 20 processor system with a mixture of shared and local memory. Memory organization and the type of processor synchronization supported by the hardware on the two machines lead to some differences in efficient implementations of the force primitives, but the user interface remains the same. An initial implementation was done by retargeting the macros to Flexible Computer Corporation's ConCurrent C language. Subsequently, the macros were caused to directly produce the system calls which form the basis for ConCurrent C. The implementation of the Fortran based system is in step with Flexible Computer Corporations's implementation of a Fortran system in the parallel environment
Gunrock: A High-Performance Graph Processing Library on the GPU
For large-scale graph analytics on the GPU, the irregularity of data access
and control flow, and the complexity of programming GPUs have been two
significant challenges for developing a programmable high-performance graph
library. "Gunrock", our graph-processing system designed specifically for the
GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on
operations on a vertex or edge frontier. Gunrock achieves a balance between
performance and expressiveness by coupling high performance GPU computing
primitives and optimization strategies with a high-level programming model that
allows programmers to quickly develop new graph primitives with small code size
and minimal GPU programming knowledge. We evaluate Gunrock on five key graph
primitives and show that Gunrock has on average at least an order of magnitude
speedup over Boost and PowerGraph, comparable performance to the fastest GPU
hardwired primitives, and better performance than any other GPU high-level
graph library.Comment: 14 pages, accepted by PPoPP'16 (removed the text repetition in the
previous version v5
PluralisMAC: a generic multi-MAC framework for heterogeneous, multiservice wireless networks, applied to smart containers
Developing energy-efficient MAC protocols for lightweight wireless systems has been a challenging task for decades because of the specific requirements of various applications and the varying environments in which wireless systems are deployed. Many MAC protocols for wireless networks have been proposed, often custom-made for a specific application. It is clear that one MAC does not fit all the requirements. So, how should a MAC layer deal with an application that has several modes (each with different requirements) or with the deployment of another application during the lifetime of the system? Especially in a mobile wireless system, like Smart Monitoring of Containers, we cannot know in advance the application state (empty container versus stuffed container). Dynamic switching between different energy-efficient MAC strategies is needed. Our architecture, called PluralisMAC, contains a generic multi-MAC framework and a generic neighbour monitoring and filtering framework. To validate the real-world feasibility of our architecture, we have implemented it in TinyOS and have done experiments on the TMote Sky nodes in the w-iLab.t testbed. Experimental results show that dynamic switching between MAC strategies is possible with minimal receive chain overhead, while meeting the various application requirements (reliability and low-energy consumption)
Gunrock: GPU Graph Analytics
For large-scale graph analytics on the GPU, the irregularity of data access
and control flow, and the complexity of programming GPUs, have presented two
significant challenges to developing a programmable high-performance graph
library. "Gunrock", our graph-processing system designed specifically for the
GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on
operations on a vertex or edge frontier. Gunrock achieves a balance between
performance and expressiveness by coupling high performance GPU computing
primitives and optimization strategies with a high-level programming model that
allows programmers to quickly develop new graph primitives with small code size
and minimal GPU programming knowledge. We characterize the performance of
various optimization strategies and evaluate Gunrock's overall performance on
different GPU architectures on a wide range of graph primitives that span from
traversal-based algorithms and ranking algorithms, to triangle counting and
bipartite-graph-based algorithms. The results show that on a single GPU,
Gunrock has on average at least an order of magnitude speedup over Boost and
PowerGraph, comparable performance to the fastest GPU hardwired primitives and
CPU shared-memory graph libraries such as Ligra and Galois, and better
performance than any other GPU high-level graph library.Comment: 52 pages, invited paper to ACM Transactions on Parallel Computing
(TOPC), an extended version of PPoPP'16 paper "Gunrock: A High-Performance
Graph Processing Library on the GPU
Platform for Testing and Evaluation of PUF and TRNG Implementations in FPGAs
Implementation of cryptographic primitives like
Physical Unclonable Functions (PUFs) and True Random Number
Generators (TRNGs) depends significantly on the underlying
hardware. Common evaluation boards offered by FPGA vendors
are not suitable for a fair benchmarking, since they have different
vendor dependent configuration and contain noisy switching
power supplies. The proposed hardware platform is primary
aimed at testing and evaluation of cryptographic primitives
across different FPGA and ASIC families. The modular platform
consists of a motherboard and exchangeable daughter board
modules. These are designed to be as simple as possible to
allow cheap and independent evaluation of cryptographic blocks
and namely PUFs. The motherboard is based on the Microsemi
SmartFusion 2 SoC FPGA. It features a low-noise power supply,
which simplifies evaluation of vulnerability to the side channel
attacks. It provides also means of communication between the
PC and the daughter module. Available software tools can be
easily customized, for example to collect data from the random
number generator located in the daughter module and to read it
via USB interface. The daughter module can be plugged into
the motherboard or connected using an HDMI cable to be
placed inside a Faraday cage or a temperature control chamber.
The whole platform was designed and optimized to fullfil the
European HECTOR project (H2020) requirements
- …