243 research outputs found

Lightweight Communications and Marshalling for Low-Latency Interprocess Communication

Author: Huang Albert
Moore David
Olson Edwin
Publication date: 02/09/2009

We describe the Lightweight Communications and Marshalling (LCM) library for message passing and data marshalling. The primary goal of LCM is to simplify the development of low-latency message passing systems, targeted at real-time robotics applications. LCM comprises several components: a data type specification language, a message passing system, logging/playback tools, and real-time analysis tools. LCM provides a platform- and language-independent type specification language. These specifications can be compiled into platform- and language-specific implementations, eliminating the need for users to implement marshalling code while guaranteeing run-time type safety. Messages can be transmitted between different processes using LCM's message-passing system, which implements a publish/subscribe model. LCM's implementation is notable in providing low-latency messaging and eliminating the need for a central communications "hub". This architecture makes it easy to mix simulated, recorded, and live data sources. A number of logging, playback, and traffic inspection tools simplify common development and debugging tasks. LCM is targeted at robotics and other real-time systems where low latency is critical; its messaging model permits dropping messages in order to minimize the latency of new messages. In this paper, we explain LCM's design, evaluate its performance, and describe its application to a number of autonomous land, underwater, and aerial robots.

DSpace@MIT

Separating presentation from interface in RPC and IDLs

Author: Ford Bryan
Hibler Michael J.
Publication venue: University of Utah
Publication date: 01/01/1995

Journal Article

In RPC-based communication, we term the interface the set of remote procedures and the types of their arguments; the presentation is the way these procedures and types are mapped to the target language environment in a particular client or server, including semantic requirements. For example, presentation includes the local names assigned to RPC stubs, the physical representation of a logical block of data (e.g., in-line, out-of-line, linked blocks), and trust requirements (e.g., integrity, security). In existing systems, the presentation of a given RPC construct is largely fixed. Separating presentation from interface, both in the interface definition language (IDL) itself and in the RPC implementation, is the key to interoperability, with many benefits in the area of elegance as well. This separation and the resulting cleanliness make it manageable to generate specialized kernel code paths for each type of client-server pair. This is a key element of end-to-end optimization. The separation should also allow the integration of disparate RPC optimization techniques, such as those applied in LRPC[2] and fbufs[6], into a single system, in a uniform and fully interoperable way. In initial work we demonstrate a variant of threaded code generation and two presentation-based optimizations, transparently activated by the RPC system. Each of these optimizations speeds up local RPC by approximately 25%.

The University of Utah: J. Willard Marriott Digital Library

Accelerating Climate and Weather Simulations through Hybrid Computing

Author: Cruz Carlos
Duffy Daniel
Purcell Mark
Tucker Robert
Zhou Shujia

Unconventional multi- and many-core processors (e.g., IBM(R) Cell B.E.(TM) and NVIDIA(R) GPU) have emerged as effective accelerators in trial climate and weather simulations. Yet these climate and weather models typically run on parallel computers with conventional processors (e.g., Intel, AMD, and IBM) using Message Passing Interface. To address challenges involved in efficiently and easily connecting accelerators to parallel computers, we investigated using IBM's Dynamic Application Virtualization(TM) (IBM DAV) software in a prototype hybrid computing system with representative climate and weather model components. The hybrid system comprises two Intel blades and two IBM QS22 Cell B.E. blades, connected with both InfiniBand(R) (IB) and 1-Gigabit Ethernet. The system significantly accelerates a solar radiation model component by offloading compute-intensive calculations to the Cell blades. Systematic tests show that IBM DAV can seamlessly offload compute-intensive calculations from Intel blades to Cell B.E. blades in a scalable, load-balanced manner. However, noticeable communication overhead was observed, mainly due to IP over the IB protocol. Full utilization of IB Sockets Direct Protocol and the lower latency production version of IBM DAV will reduce this overhead.

NASA Technical Reports Server

Services Everywhere: an Object-Oriented Distributed Platform to Support Pervasive Access to HW and SW Objects in Ambient Intelligence Environments

Author: David Villa
Felix Jesus Villanueva
Fernando Rincon
Francisco Moya
Jesus Barba
Juan Carlos Lopez
Maria Jose Santofimia
Publication venue: 'IntechOpen'
Publication date: 01/03/2010

IntechOpen

Darkside: A Heterogeneous RISC-V Compute Cluster for Extreme-Edge On-Chip DNN Inference and Training

Author: Benini Luca
Conti Francesco
Garofalo Angelo
Nadalini Alessandro
Perotti Matteo
Rossi Davide
Tortorella Yvan
Valente Luca
Publication date: 01/01/2022

On-chip DNN inference and training at the Extreme-Edge (TinyML) impose strict latency, throughput, accuracy and flexibility requirements. Heterogeneous clusters are promising solutions to meet the challenge, combining the flexibility of DSP-enhanced cores with the performance and energy boost of dedicated accelerators. We present Darkside, a System-on-Chip with a heterogeneous cluster of 8 RISC-V cores enhanced with 2-b to 32-b mixed-precision integer arithmetic. To boost performance and efficiency on key compute-intensive Deep Neural Network (DNN) kernels, the cluster is enriched with three digital accelerators: a specialized engine for low-data-reuse depthwise convolution kernels (up to 30 MAC/cycle); a minimal-overhead datamover to marshal 1-b to 32-b data on-the-fly; and a 16-b floating point Tensor Product Engine (TPE) for tiled matrix-multiplication acceleration. Darkside is implemented in 65nm CMOS technology. The cluster achieves a peak integer performance of 65 GOPS and a peak efficiency of 835 GOPS/W when working on 2-b integer DNN kernels. When targeting floating-point tensor operations, the TPE provides up to 18.2 GFLOPS of performance or 300 GFLOPS/W of efficiency, enough to enable on-chip floating-point training at competitive speed coupled with ultra-low power quantized inference.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Communication in Microkernel-Based Operating Systems

Author: Aigner Ronald
Publication date: 07/03/2006

Communication in microkernel-based systems is much more frequent than system calls known from monolithic kernels. This can be attributed to the placement of system services into their own protection domains. Communication has to be fast to avoid unnecessary overhead. Also, communication channels in microkernel-based systems are used for more than just remote procedure calls. In distributed systems, which also have a componentized design, it is state of the art to use tools to generate stubs for the communication between components. The communication interfaces of components are described in an interface definition language (IDL). In contrast to distributed systems, components of a microkernel-based system run on the same architecture and message delivery is guaranteed. In this thesis, I explore the different kinds of communication that can be used in microkernel-based systems, as well as their possible representation in IDL. Specifically, I introduce the syntax to describe kernel objects in IDL. I discuss the complexity of IDL compilers and its relation to the complexity of the IDL. Furthermore, I evaluate the performance of the communication stubs generated by different IDL compilers and discuss techniques to minimize performance overhead in generated stubs. I validated these techniques by implementing the Drops IDL Compiler - Dice. Finally, this thesis presents a mechanism to measure the frequency and performance of invocations of generated communication code. I used this technique to conduct measurements in highly complex systems while introducing the least possible overhead.

Digital Commons

Technische Universität Dresden: Qucosa