Search CORE

7,612 research outputs found

HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA

Author: Benini Luca
Capotondi Alessandro
Kurth Andreas
Marongiu Andrea
Vogel Pirmin
Publication venue
Publication date: 01/01/2017
Field of study

Heterogeneous embedded systems on chip (HESoCs) co-integrate a standard host processor with programmable manycore accelerators (PMCAs) to combine general-purpose computing with domain-specific, efficient processing capabilities. While leading companies successfully advance their HESoC products, research lags behind due to the challenges of building a prototyping platform that unites an industry-standard host processor with an open research PMCA architecture. In this work we introduce HERO, an FPGA-based research platform that combines a PMCA composed of clusters of RISC-V cores, implemented as soft cores on an FPGA fabric, with a hard ARM Cortex-A multicore host processor. The PMCA architecture mapped on the FPGA is silicon-proven, scalable, configurable, and fully modifiable. HERO includes a complete software stack that consists of a heterogeneous cross-compilation toolchain with support for OpenMP accelerator programming, a Linux driver, and runtime libraries for both host and PMCA. HERO is designed to facilitate rapid exploration on all software and hardware layers: run-time behavior can be accurately analyzed by tracing events, and modifications can be validated through fully automated hard ware and software builds and executed tests. We demonstrate the usefulness of HERO by means of case studies from our research

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

NaNet: a Low-Latency, Real-Time, Multi-Standard Network Interface Card with GPUDirect Features

Author: Ameli F.
Ammendola R.
Biagioni A.
Frezza O.
Lamanna G.
Lo Cicero F.
Lonardo A.
Martinelli M.
Nicolau C.
Paolucci P.S.
Pastorelli E.
Pontisso L.
Rossetti D.
Simeone F.
Simula F.
Sozzi M.
Tosoratto L.
Vicini P.
Publication venue
Publication date: 13/06/2014
Field of study

While the GPGPU paradigm is widely recognized as an effective approach to high performance computing, its adoption in low-latency, real-time systems is still in its early stages. Although GPUs typically show deterministic behaviour in terms of latency in executing computational kernels as soon as data is available in their internal memories, assessment of real-time features of a standard GPGPU system needs careful characterization of all subsystems along data stream path. The networking subsystem results in being the most critical one in terms of absolute value and fluctuations of its response latency. Our envisioned solution to this issue is NaNet, a FPGA-based PCIe Network Interface Card (NIC) design featuring a configurable and extensible set of network channels with direct access through GPUDirect to NVIDIA Fermi/Kepler GPU memories. NaNet design currently supports both standard - GbE (1000BASE-T) and 10GbE (10Base-R) - and custom - 34~Gbps APElink and 2.5~Gbps deterministic latency KM3link - channels, but its modularity allows for a straightforward inclusion of other link technologies. To avoid host OS intervention on data stream and remove a possible source of jitter, the design includes a network/transport layer offload module with cycle-accurate, upper-bound latency, supporting UDP, KM3link Time Division Multiplexing and APElink protocols. After NaNet architecture description and its latency/bandwidth characterization for all supported links, two real world use cases will be presented: the GPU-based low level trigger for the RICH detector in the NA62 experiment at CERN and the on-/off-shore data link for KM3 underwater neutrino telescope

arXiv.org e-Print Archive

CERN Document Server

Learning Concise Models from Long Execution Traces

Author: Jeppu Natasha Yogananda
Kroening Daniel
Melham Tom
O'Leary John
Publication venue
Publication date: 01/01/2020
Field of study

Abstract models of system-level behaviour have applications in design exploration, analysis, testing and verification. We describe a new algorithm for automatically extracting useful models, as automata, from execution traces of a HW/SW system driven by software exercising a use-case of interest. Our algorithm leverages modern program synthesis techniques to generate predicates on automaton edges, succinctly describing system behaviour. It employs trace segmentation to tackle complexity for long traces. We learn concise models capturing transaction-level, system-wide behaviour--experimentally demonstrating the approach using traces from a variety of sources, including the x86 QEMU virtual platform and the Real-Time Linux kernel

arXiv.org e-Print Archive

Crossref

Oxford University Research Archive

Recommended from our members

Computing infrastructure issues in distributed communications systems : a survey of operating system transport system architectures

Author: Schmidt Douglas C.
Suda Tatsuya
Publication venue: eScholarship, University of California
Publication date: 01/01/1992
Field of study

The performance of distributed applications (such as file transfer, remote login, tele-conferencing, full-motion video, and scientific visualization) is influenced by several factors that interact in complex ways. In particular, application performance is significantly affected both by communication infrastructure factors and computing infrastructure factors. Several communication infrastructure factors include channel speed, bit-error rate, and congestion at intermediate switching nodes. Computing infrastructure factors include (among other things) both protocol processing activities (such as connection management, flow control, error detection, and retransmission) and general operating system factors (such as memory latency, CPU speed, interrupt and context switching overhead, process architecture, and message buffering). Due to a several orders of magnitude increase in network channel speed and an increase in application diversity, performance bottlenecks are shifting from the network factors to the transport system factors.This paper defines an abstraction called an "Operating System Transport System Architecture" (OSTSA) that is used to classify the major components and services in the computing infrastructure. End-to-end network protocols such as TCP, TP4, VMTP, XTP, and Delta-t typically run on general-purpose computers, where they utilize various operating system resources such as processors, virtual memory, and network controllers. The OSTSA provides services that integrate these resources to support distributed applications running on local and wide area networks.A taxonomy is presented to evaluate OSTSAs in terms of their support for protocol processing activities. We use this taxonomy to compare and contrast five general-purpose commercial and experimental operating systems including System V UNIX, BSD UNIX, the x-kernel, Choices, and Xinu

eScholarship - University of California

Operating-system support for distributed multimedia

Author: Leslie Ian M.
Mcauley Derek
Mullender Sape J.
Publication venue: University of Twente, Faculty of Computer Science
Publication date: 01/01/1993
Field of study

Multimedia applications place new demands upon processors, networks and operating systems. While some network designers, through ATM for example, have considered revolutionary approaches to supporting multimedia, the same cannot be said for operating systems designers. Most work is evolutionary in nature, attempting to identify additional features that can be added to existing systems to support multimedia. Here we describe the Pegasus project's attempt to build an integrated hardware and operating system environment from\ud the ground up specifically targeted towards multimedia

CiteSeerX

University of Twente Research Information

Extending sensor networks into the cloud using Amazon web services

Author: Hughes D
Joosen W
Lee K
Murray D
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010
Field of study

Sensor networks provide a method of collecting environmental data for use in a variety of distributed applications. However, to date, limited support has been provided for the development of integrated environmental monitoring and modeling applications. Specifically, environmental dynamism makes it difficult to provide computational resources that are sufficient to deal with changing environmental conditions. This paper argues that the Cloud Computing model is a good fit with the dynamic computational requirements of environmental monitoring and modeling. We demonstrate that Amazon EC2 can meet the dynamic computational needs of environmental applications. We also demonstrate that EC2 can be integrated with existing sensor network technologies to offer an end-to-end environmental monitoring and modeling solution

CiteSeerX

Deakin Research Online

Crossref

Nottingham Trent Institutional Repository (IRep)

Research Repository

On Making Emerging Trusted Execution Environments Accessible to Developers

Author: Asokan N.
McGillion Brian
Nyman Thomas
Publication venue
Publication date: 30/06/2015
Field of study

New types of Trusted Execution Environment (TEE) architectures like TrustLite and Intel Software Guard Extensions (SGX) are emerging. They bring new features that can lead to innovative security and privacy solutions. But each new TEE environment comes with its own set of interfaces and programming paradigms, thus raising the barrier for entry for developers who want to make use of these TEEs. In this paper, we motivate the need for realizing standard TEE interfaces on such emerging TEE architectures and show that this exercise is not straightforward. We report on our on-going work in mapping GlobalPlatform standard interfaces to TrustLite and SGX.Comment: Author's version of article to appear in 8th Internation Conference of Trust & Trustworthy Computing, TRUST 2015, Heraklion, Crete, Greece, August 24-26, 201

arXiv.org e-Print Archive

TUbiblio

OpenCL Actors - Adding Data Parallelism to Actor-based Programming with CAF

Author: A Klöckner
D Charousset
G Agha
G Agha
J Nickolls
JD Owens
K Wu
L Dagum
S Srinivasan
S Wienke
T Desell
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

The actor model of computation has been designed for a seamless support of concurrency and distribution. However, it remains unspecific about data parallel program flows, while available processing power of modern many core hardware such as graphics processing units (GPUs) or coprocessors increases the relevance of data parallelism for general-purpose computation. In this work, we introduce OpenCL-enabled actors to the C++ Actor Framework (CAF). This offers a high level interface for accessing any OpenCL device without leaving the actor paradigm. The new type of actor is integrated into the runtime environment of CAF and gives rise to transparent message passing in distributed systems on heterogeneous hardware. Following the actor logic in CAF, OpenCL kernels can be composed while encapsulated in C++ actors, hence operate in a multi-stage fashion on data resident at the GPU. Developers are thus enabled to build complex data parallel programs from primitives without leaving the actor paradigm, nor sacrificing performance. Our evaluations on commodity GPUs, an Nvidia TESLA, and an Intel PHI reveal the expected linear scaling behavior when offloading larger workloads. For sub-second duties, the efficiency of offloading was found to largely differ between devices. Moreover, our findings indicate a negligible overhead over programming with the native OpenCL API.Comment: 28 page

arXiv.org e-Print Archive

Crossref

REPOSIT