243 research outputs found

    Lightweight Communications and Marshalling for Low-Latency Interprocess Communication

    Get PDF
    We describe the Lightweight Communications and Marshalling (LCM) library for message passing and data marshalling. The primary goal of LCM is to simplify the development of low-latency message passing systems, targeted at real-time robotics applications. LCM is comprised of several components: a data type specification language, a message passing system, logging/playback tools, and real-time analysis tools. LCM provides a platform- and language-independent type specification language. These specifications can be compiled into platform and language specific implementations, eliminating the need for users to implement marshalling code while guaranteeing run-time type safety. Messages can be transmitted between different processes using LCM's message-passing system, which implements a publish/subscribe model. LCM's implementation is notable in providing low-latency messaging and eliminating the need for a central communications "hub". This architecture makes it easy to mix simulated, recorded, and live data sources. A number of logging, playback, and traffic inspection tools simplify common development and debugging tasks. LCM is targeted at robotics and other real-time systems where low latency is critical; its messaging model permits dropping messages in order to minimize the latency of new messages. In this paper, we explain LCM's design, evaluate its performance, and describe its application to a number of autonomous land, underwater, and aerial robots

    Separating presentation from interface in RPC and IDLs

    Get PDF
    Journal ArticleIn RPC-based communication, we term the interface the set of remote procedures and the types of their arguments; the presentation is the way these procedures and types are mapped to the target language environment in a particular client or server, including semantic requirements. For example, presentation includes the local names assigned to RPC stubs, the physical representation of a logical block of data (e.g., in-line, out-of-line, linked blocks), and trust requirements (e.g., integrity, security). In existing systems, the presentation of a given RPC construct is largely fixed. Separating presentation from interface, both in the interface definition language (IDL) itself and in the RPC implementation, is the key to interoperability, with many benefits in the area of elegance, as well. This separation and resulting cleanliness makes it manageable to generate specialized kernel code paths for each type of client-server pair. This is a key element o/end-to-end optimization. The separation should also allow the integration of disparate RPC optimization techniques, such as those applied in LRPC[2] and fbufs[6], into a single system, in a uniform and fully interoperable way. In initial work we demonstrate a variant of threaded code generation and two presentation-based optimizations, transparently activated by the RPC system. Each of these optimizations speeds up local RPC by approximately 25%.

    Accelerating Climate and Weather Simulations through Hybrid Computing

    Get PDF
    Unconventional multi- and many-core processors (e.g. IBM (R) Cell B.E.(TM) and NVIDIA (R) GPU) have emerged as effective accelerators in trial climate and weather simulations. Yet these climate and weather models typically run on parallel computers with conventional processors (e.g. Intel, AMD, and IBM) using Message Passing Interface. To address challenges involved in efficiently and easily connecting accelerators to parallel computers, we investigated using IBM's Dynamic Application Virtualization (TM) (IBM DAV) software in a prototype hybrid computing system with representative climate and weather model components. The hybrid system comprises two Intel blades and two IBM QS22 Cell B.E. blades, connected with both InfiniBand(R) (IB) and 1-Gigabit Ethernet. The system significantly accelerates a solar radiation model component by offloading compute-intensive calculations to the Cell blades. Systematic tests show that IBM DAV can seamlessly offload compute-intensive calculations from Intel blades to Cell B.E. blades in a scalable, load-balanced manner. However, noticeable communication overhead was observed, mainly due to IP over the IB protocol. Full utilization of IB Sockets Direct Protocol and the lower latency production version of IBM DAV will reduce this overhead

    Darkside: A Heterogeneous RISC-V Compute Cluster for Extreme-Edge On-Chip DNN Inference and Training

    Get PDF
    On-chip DNN inference and training at the Extreme-Edge (TinyML) impose strict latency, throughput, accuracy and flexibility requirements. Heterogeneous clusters are promising solutions to meet the challenge, combining the flexibility of DSP-enhanced cores with the performance and energy boost of dedicated accelerators. We present Darkside, a System-on-Chip with a heterogeneous cluster of 8 RISC-V cores enhanced with 2-b to 32-b mixed-precision integer arithmetic. To boost performance and efficiency on key compute-intensive Deep Neural Network (DNN) kernels, the cluster is enriched with three digital accelerators: a specialized engine for low-data-reuse depthwise convolution kernels (up to 30 MAC/cycle); a minimal overhead datamover to marshal 1-b to 32-b data on-the-fly; a 16-b floating point Tensor Product Engine (TPE) for tiled matrix-multiplication acceleration. Darkside is implemented in 65nm CMOS technology. The cluster achieves a peak integer performance of 65 GOPS and a peak efficiency of 835 GOPS/W when working on 2-b integer DNN kernels. When targeting floating-point tensor operations, the TPE provides up to 18.2 GFLOPS of performance or 300 GFLOPS/W of efficiency – enough to enable on-chip floating-point training at competitive speed coupled with ultra-low power quantized inference

    Communication in Microkernel-Based Operating Systems

    Get PDF
    Communication in microkernel-based systems is much more frequent than system calls known from monolithic kernels. This can be attributed to the placement of system services into their own protection domains. Communication has to be fast to avoid unnecessary overhead. Also, communication channels in microkernel-based systems are used for more than just remote procedure calls. In distributed systems, which also have a componentized design, it is state of the art to use tools to generate stubs for the communication between components. The communication interfaces of components are described in an interface definition language (IDL). In contrast to distributed systems, components of a microkernel-based system run on the same architecture and message delivery is guaranteed. In this Thesis, I explore the different kinds of communication, which can be used in microkernel-based systems, as well as their possible representation in IDL. Specifically, I introduce the syntax to describe kernel objects in IDL. I discuss the complexity of IDL compilers and its relation to the complexity of the IDL. Furthermore, I evaluate the performance of the communication stubs generated by different IDL compilers and discuss techniques to minimize performance overhead in generated stubs. I validated these techniques by implementing the Drops IDL Compiler - Dice. Finally, this Thesis presents a mechanism to measure the frequency and performance of invocations of generated communication code. I used this technique to conduct measurements in highly complex systems and introducing the least possible overhead

    Communication in Microkernel-Based Operating Systems

    Get PDF
    Communication in microkernel-based systems is much more frequent than system calls known from monolithic kernels. This can be attributed to the placement of system services into their own protection domains. Communication has to be fast to avoid unnecessary overhead. Also, communication channels in microkernel-based systems are used for more than just remote procedure calls. In distributed systems, which also have a componentized design, it is state of the art to use tools to generate stubs for the communication between components. The communication interfaces of components are described in an interface definition language (IDL). In contrast to distributed systems, components of a microkernel-based system run on the same architecture and message delivery is guaranteed. In this Thesis, I explore the different kinds of communication, which can be used in microkernel-based systems, as well as their possible representation in IDL. Specifically, I introduce the syntax to describe kernel objects in IDL. I discuss the complexity of IDL compilers and its relation to the complexity of the IDL. Furthermore, I evaluate the performance of the communication stubs generated by different IDL compilers and discuss techniques to minimize performance overhead in generated stubs. I validated these techniques by implementing the Drops IDL Compiler - Dice. Finally, this Thesis presents a mechanism to measure the frequency and performance of invocations of generated communication code. I used this technique to conduct measurements in highly complex systems and introducing the least possible overhead

    Communication in Microkernel-Based Operating Systems

    Get PDF
    Communication in microkernel-based systems is much more frequent than system calls known from monolithic kernels. This can be attributed to the placement of system services into their own protection domains. Communication has to be fast to avoid unnecessary overhead. Also, communication channels in microkernel-based systems are used for more than just remote procedure calls. In distributed systems, which also have a componentized design, it is state of the art to use tools to generate stubs for the communication between components. The communication interfaces of components are described in an interface definition language (IDL). In contrast to distributed systems, components of a microkernel-based system run on the same architecture and message delivery is guaranteed. In this Thesis, I explore the different kinds of communication, which can be used in microkernel-based systems, as well as their possible representation in IDL. Specifically, I introduce the syntax to describe kernel objects in IDL. I discuss the complexity of IDL compilers and its relation to the complexity of the IDL. Furthermore, I evaluate the performance of the communication stubs generated by different IDL compilers and discuss techniques to minimize performance overhead in generated stubs. I validated these techniques by implementing the Drops IDL Compiler - Dice. Finally, this Thesis presents a mechanism to measure the frequency and performance of invocations of generated communication code. I used this technique to conduct measurements in highly complex systems and introducing the least possible overhead
    • …
    corecore