
    ControlPULP: A RISC-V On-Chip Parallel Power Controller for Many-Core HPC Processors with FPGA-Based Hardware-In-The-Loop Power and Thermal Emulation

    High-Performance Computing (HPC) processors are nowadays integrated cyber-physical systems demanding complex and high-bandwidth closed-loop power and thermal control strategies. To efficiently satisfy real-time multi-input multi-output (MIMO) optimal power requirements, high-end processors integrate an on-die power controller system (PCS). While traditional PCSs are based on a simple microcontroller (MCU)-class core, more scalable and flexible PCS architectures are required to support advanced MIMO control algorithms for managing the ever-increasing number of cores, power states, and process, voltage, and temperature variability. This paper presents ControlPULP, an open-source HW/SW RISC-V parallel PCS platform consisting of a single-core MCU with fast interrupt handling, coupled with a scalable multi-core programmable cluster accelerator and a specialized DMA engine for the parallel acceleration of real-time power management policies. ControlPULP relies on FreeRTOS to schedule a reactive power control firmware (PCF) application layer. We demonstrate ControlPULP in a power management use case targeting a next-generation 72-core HPC processor. We first show that the multi-core cluster accelerates the PCF, achieving a 4.9x speedup over single-core execution and enabling more advanced power management algorithms within the control hyper-period at a small area overhead, about 0.1% of the area of a modern HPC CPU die. We then assess the PCS and PCF by designing an FPGA-based, closed-loop emulation framework that leverages the heterogeneous SoC paradigm, achieving DVFS tracking with a mean deviation within 3% of the plant's thermal design power (TDP) against a software-equivalent model-in-the-loop approach. Finally, we show that the proposed PCF compares favorably with an industry-grade control algorithm under computationally intensive workloads. (Comment: 33 pages, 11 figures.)
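
    To make the control hyper-period concrete, the following is a minimal sketch of the kind of per-core step a power control firmware might execute each period; the PI gains, temperature cap, power model, and frequency bounds are illustrative assumptions, not ControlPULP's actual PCF algorithm.

        /* Minimal sketch of one per-core power-control step, in the spirit
         * of the PCF control hyper-period described above. Gains, the
         * temperature cap, and the frequency bounds are illustrative. */
        #include <stdio.h>

        #define N_CORES 72
        #define F_MIN 0.8   /* GHz */
        #define F_MAX 3.0   /* GHz */

        typedef struct {
            double kp, ki;   /* PI gains */
            double integ;    /* integral state */
        } pi_ctrl_t;

        /* One PI step: drive the measured core temperature toward its cap
         * by adjusting the core's frequency target, clamped to the range. */
        static double pi_step(pi_ctrl_t *c, double t_cap, double t_meas,
                              double f_cur)
        {
            double err = t_cap - t_meas;   /* positive: thermal headroom */
            c->integ += err;
            double f = f_cur + c->kp * err + c->ki * c->integ;
            if (f < F_MIN) f = F_MIN;
            if (f > F_MAX) f = F_MAX;
            return f;
        }

        int main(void)
        {
            pi_ctrl_t ctrl[N_CORES] = {0};
            double freq[N_CORES], temp[N_CORES];
            for (int i = 0; i < N_CORES; i++) {
                ctrl[i].kp = 0.05; ctrl[i].ki = 0.001;
                freq[i] = 2.0; temp[i] = 70.0 + (i % 8);  /* mock telemetry */
            }
            /* One control hyper-period: update all 72 per-core targets.
             * On the real PCS, a loop of this shape is what the multi-core
             * cluster parallelizes across its processing elements. */
            for (int i = 0; i < N_CORES; i++)
                freq[i] = pi_step(&ctrl[i], 85.0, temp[i], freq[i]);
            printf("core0 target: %.2f GHz\n", freq[0]);
            return 0;
        }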

    StochSoCs: High performance biocomputing simulations for large scale Systems Biology

    The stochastic simulation of large-scale biochemical reaction networks is of great importance for systems biology, since it enables the study of inherently stochastic biological mechanisms at the whole-cell scale. Stochastic Simulation Algorithms (SSA) allow us to simulate the dynamic behavior of complex kinetic models, but their high computational cost makes them very slow for many realistic-size problems. We present a pilot service, named WebStoch, developed in the context of our StochSoCs research project, that allows life scientists with no high-performance computing expertise to perform, over the internet, stochastic simulations of large-scale biological network models described in the SBML standard format. Biomodels submitted to the service are parsed automatically and then placed for parallel execution on distributed worker nodes. The workers are implemented using multi-core and many-core processors, or FPGA accelerators, and can handle the simulation of thousands of stochastic repetitions of complex biomodels with possibly thousands of reactions and interacting species. Using benchmark LCSE biomodels, whose workload can be scaled on demand, we demonstrate linear speedup and more than two orders of magnitude higher throughput than existing serial simulators. (Comment: The 2017 International Conference on High Performance Computing & Simulation (HPCS 2017), 8 pages.)
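
    At their core, SSA workers iterate some variant of Gillespie's direct method: draw an exponential waiting time from the total propensity, then pick the next reaction with probability proportional to its propensity. A minimal sketch for a two-reaction birth-death model follows; the model and rate constants are illustrative, not taken from the service.

        /* Gillespie direct-method SSA sketch for a birth-death model:
         * 0 -> X at rate k1, and X -> 0 at rate k2*x. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <math.h>

        int main(void)
        {
            double k1 = 10.0, k2 = 0.1;   /* illustrative rate constants */
            int x = 0;                    /* copy number of species X */
            double t = 0.0, t_end = 100.0;
            srand(42);
            while (t < t_end) {
                double a1 = k1, a2 = k2 * x, a0 = a1 + a2; /* propensities */
                double r1 = (rand() + 1.0) / (RAND_MAX + 2.0); /* in (0,1) */
                double r2 = (double)rand() / RAND_MAX;
                t += -log(r1) / a0;          /* exponential waiting time */
                if (r2 * a0 < a1) x += 1;    /* birth with prob a1/a0 */
                else              x -= 1;    /* otherwise death */
            }
            printf("X(t=%.0f) = %d\n", t_end, x);
            return 0;
        }

    Each of the "thousands of stochastic repetitions" mentioned above is an independent run of this loop with a different random seed, which is why the workload parallelizes so well across worker nodes.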

    The Signal Synchronous Multiclock Approach to the Design of Distributed Embedded Systems

    This paper presents the design of distributed embedded systems using the synchronous multiclock model of the Signal language. It proposes a methodology that ensures a correct-by-construction functional implementation of these systems from high-level models, and it shows the capability of the synchronous approach to apply formal techniques and tools that guarantee the reliability of the designed systems. Such a capability is necessary and highly valuable when dealing with safety-critical systems. The proposed methodology is demonstrated through a case study consisting of a simple avionic application, which aims to help the reader pragmatically understand the manipulated formal concepts and apply them easily to solve the system-correctness issues encountered in practice. The application's functionality is modeled first, together with its distribution over a generic hardware architecture. This relies on the endochrony and endo-isochrony properties of Signal specifications, defined previously. The considered architectures include asynchronous communication mechanisms, which are also modeled in Signal and proved to achieve message exchanges correctly. Furthermore, the synchronizability of the different parts of the resulting system is addressed after its deployment on a specific execution platform with multirate clocks. After all these steps, distributed code can be generated automatically.

    Contracts for System Design

    Systems design has become a key challenge and differentiating factor for system companies over the last decades. Aircraft, trains, cars, plants, and distributed telecommunication, military, and health-care systems, among others, involve systems design as a critical step. Complexity has caused system design times and costs to go so severely over budget as to threaten the health of entire industrial sectors. Heuristic methods and standard practices do not seem to scale with complexity, so novel design methods and tools based on a strong theoretical foundation are sorely needed. Model-based design, as well as other methodologies such as layered and compositional design, has been used recently, but a unified intellectual framework with a complete design flow supported by formal tools is still lacking, although some attempts at such a framework, such as Platform-Based Design, have been successfully deployed. Recently, an "orthogonal" approach has been proposed that can be applied to all the methodologies proposed thus far to provide a rigorous scaffolding for verification, analysis, and abstraction/refinement: contract-based design. Several results have been obtained in this domain, but a unified treatment of the topic that can help put contract-based design in perspective is still missing. This paper intends to provide such a treatment, where contracts are precisely defined and characterized so that they can be used in design methodologies such as the ones mentioned above without ambiguity. The contracts considered cover not only typing properties of interfaces but also include an abstract description of behaviors; the paper develops a meta-theory, that is, a generic theory of contracts, which enables the separate development of subsystems and specializes into each of the known contract theories. In addition, the paper establishes an important link between interfaces and contracts to show similarities and correspondences. Examples of the use of contracts in design are provided, as well as an in-depth analysis of the existing literature.
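
    A standard set-based instance of the assume-guarantee contract algebra helps make the meta-theory concrete. The following definitions are the usual formulation over sets of behaviors, a sketch of one instance rather than the paper's full generic theory.

        % One concrete instance of assume-guarantee contracts: assumptions
        % and guarantees are sets of behaviors over a common alphabet.
        A contract is a pair \(\mathcal{C} = (A, G)\); it is saturated when
        \(\neg A \subseteq G\). For saturated contracts:
        \begin{align*}
          M \models \mathcal{C} &\iff M \cap A \subseteq G
            && \text{(implementation)} \\
          \mathcal{C}_2 \preceq \mathcal{C}_1 &\iff A_2 \supseteq A_1
            \ \text{and}\ G_2 \subseteq G_1
            && \text{(refinement)} \\
          \mathcal{C}_1 \otimes \mathcal{C}_2 &=
            \bigl((A_1 \cap A_2) \cup \neg(G_1 \cap G_2),\ G_1 \cap G_2\bigr)
            && \text{(composition)}
        \end{align*}

    Refinement is what justifies substituting a component for its specification during separate development; composition yields the weakest assumption under which both guarantees jointly hold.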

    FPGAs in Bioinformatics: Implementation and Evaluation of Common Bioinformatics Algorithms in Reconfigurable Logic

    Life. Much effort is devoted to granting humanity some insight into this fascinating and complex but fundamental topic. To understand the relations involved and to derive consequences, humans have begun to sequence their genomes, i.e., to determine their DNA sequences in order to infer information, e.g., related to genetic diseases. The process of DNA sequencing, as well as the subsequent analysis, presents a computational challenge for current computing systems due to the sheer amount of data alone. Runtimes of more than one day for the analysis of simple datasets are common, even when the process is run on a CPU cluster. This thesis shows how this general problem in the area of bioinformatics can be tackled with reconfigurable hardware, especially FPGAs. Three compute-intensive problems are highlighted: sequence alignment, SNP interaction analysis, and genotype imputation. In the area of sequence alignment, the software BLASTp for protein database searches is presented as an example, implemented, and evaluated. SNP interaction analysis is presented with three applications performing an exhaustive search for interactions, including the corresponding statistical tests: BOOST, iLOCi, and the mutual information measurement. All applications are implemented in FPGA hardware and evaluated, resulting in an impressive speedup of more than three orders of magnitude compared to standard computers. The last topic, genotype imputation, involves a two-step process composed of a phasing step and the actual imputation step. The focus lies on the phasing step, which is targeted by the SHAPEIT2 application. SHAPEIT2 is discussed in detail together with its underlying mathematical methods, and finally implemented and evaluated. A remarkable speedup of 46x is reached here as well.
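
    The mutual-information test named above reduces, per SNP pair, to an entropy computation over a 3x3 genotype contingency table; the exhaustive scan repeats it for every pair, which is the work the FPGA parallelizes. A minimal sketch follows; the genotype coding and counts are illustrative, not taken from the thesis.

        /* Pairwise mutual information I(X;Y) in bits for one SNP pair,
         * from a 3x3 joint count table (genotypes coded 0/1/2). */
        #include <stdio.h>
        #include <math.h>

        static double snp_mutual_info(const int n[3][3])
        {
            double total = 0, px[3] = {0}, py[3] = {0}, mi = 0;
            for (int i = 0; i < 3; i++)
                for (int j = 0; j < 3; j++) total += n[i][j];
            for (int i = 0; i < 3; i++)       /* marginal distributions */
                for (int j = 0; j < 3; j++) {
                    px[i] += n[i][j] / total;
                    py[j] += n[i][j] / total;
                }
            for (int i = 0; i < 3; i++)       /* sum p(x,y) log2 ratio */
                for (int j = 0; j < 3; j++) {
                    double pxy = n[i][j] / total;
                    if (pxy > 0)
                        mi += pxy * log2(pxy / (px[i] * py[j]));
                }
            return mi;
        }

        int main(void)
        {
            /* Illustrative genotype counts for a single SNP pair. */
            int counts[3][3] = {{30, 10, 2}, {12, 40, 9}, {3, 11, 25}};
            printf("I(SNP1;SNP2) = %.4f bits\n", snp_mutual_info(counts));
            return 0;
        }

    An exhaustive scan evaluates this statistic for all of the roughly n^2/2 SNP pairs, which is why even a cheap per-pair kernel becomes a hardware-acceleration target at genome scale.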

    Evaluating the performance of legacy applications on emerging parallel architectures

    The gap between a supercomputer's theoretical maximum ("peak") floating-point performance and that actually achieved by applications has grown wider over time. Today, a typical scientific application achieves only 5-20% of any given machine's peak processing capability, and this gap leaves room for significant improvements in execution times. This problem is most pronounced for modern "accelerator" architectures: collections of hundreds of simple, low-clocked cores capable of executing the same instruction on dozens of pieces of data simultaneously. This is a significant change from the low number of high-clocked cores found in traditional CPUs, and effective utilisation of accelerators typically requires extensive code and algorithmic changes. In many cases, the best way to map a parallel workload onto these new architectures is unclear. The principal focus of the work presented in this thesis is the evaluation of emerging parallel architectures (specifically, modern CPUs, GPUs and Intel MIC) for two benchmark codes, the LU benchmark from the NAS Parallel Benchmark Suite and Sandia's miniMD benchmark, which exhibit complex parallel behaviours representative of many scientific applications. Using combinations of low-level intrinsic functions, OpenMP, CUDA and MPI, we demonstrate performance improvements of up to 7x for these workloads. We also detail a code development methodology that permits application developers to target multiple architecture types without maintaining completely separate implementations for each platform. Using OpenCL, we develop performance-portable implementations of the LU and miniMD benchmarks that are faster than the original codes, and at most 2x slower than versions highly tuned for particular hardware. Finally, we demonstrate the importance of evaluating architectures at scale (as opposed to on single nodes) through performance modelling techniques, highlighting the problems associated with strong scaling on emerging accelerator architectures.
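
    The 5-20%-of-peak figure comes from comparing an application's achieved FLOP rate against the machine's theoretical maximum. Below is a minimal sketch of that measurement for a data-parallel kernel using OpenMP; the peak value is an assumed, machine-specific constant, not a figure from the thesis.

        /* Time a simple same-instruction/many-data kernel and report its
         * achieved FLOP rate as a fraction of an assumed machine peak.
         * Build with an OpenMP-enabled compiler, e.g. gcc -fopenmp -O2. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <omp.h>

        #define N (1 << 24)
        #define PEAK_GFLOPS 500.0   /* assumed peak; set per system */

        int main(void)
        {
            float *a = malloc(N * sizeof *a), *b = malloc(N * sizeof *b);
            for (int i = 0; i < N; i++) { a[i] = 1.0f; b[i] = 2.0f; }

            double t0 = omp_get_wtime();
            /* axpy-like kernel: one multiply and one add per element, the
             * data-parallel pattern accelerator architectures favour. */
            #pragma omp parallel for simd
            for (int i = 0; i < N; i++)
                a[i] = a[i] + 3.0f * b[i];
            double dt = omp_get_wtime() - t0;

            double gflops = 2.0 * N / dt / 1e9;   /* 2 flops per element */
            printf("achieved %.1f GFLOP/s = %.1f%% of assumed peak\n",
                   gflops, 100.0 * gflops / PEAK_GFLOPS);
            free(a); free(b);
            return 0;
        }

    A memory-bound kernel like this one will sit far below peak regardless of tuning, which is one reason the percent-of-peak gap the thesis studies is so persistent.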

    Sensor data collection and performance evaluation using a TK1 board

    Monitoring applications are abundant in today's world. Our goal is to monitor an individual and his neighborhood using wearable sensors. The system is smart in the sense that it can process the captured data in near real time and communicate opportunistically with other such systems as well as with smartphones and computers. We develop the hardware platform from existing components to support these functionalities. The Nvidia Jetson TK1 (Tegra K1) board is used as the processor, as it is one of the most powerful processors for embedded applications and offers the flexibility to connect a plethora of sensors. Data transfer for communication is facilitated via Bluetooth and Wireless Fidelity (Wi-Fi). Results on the performance of this setup are reported from experiments with different sensors, such as cameras, a microphone, a gas sensor, a temperature/pressure/humidity sensor, and a Garmin smart health watch reporting heart rate, distance, speed, altitude, and position (latitude and longitude), using metrics such as read/write speed, heat generated by the Central Processing Unit (CPU) and the TK board, and transmission delay.
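
    Of the metrics listed, read/write speed is the most direct to reproduce: time a bulk transfer to the board's storage and divide by its size. A minimal sketch follows; the file path and buffer size are illustrative assumptions, not details from the report.

        /* Time a bulk write to storage and report throughput in MB/s.
         * Note this measures buffered (page-cached) throughput; syncing
         * to the device would be needed for raw storage speed. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <time.h>

        #define MB (1024 * 1024)
        #define SIZE (64 * MB)

        static double now_s(void)
        {
            struct timespec ts;
            clock_gettime(CLOCK_MONOTONIC, &ts);
            return ts.tv_sec + ts.tv_nsec / 1e9;
        }

        int main(void)
        {
            char *buf = calloc(SIZE, 1);
            FILE *f = fopen("/tmp/tk1_bench.bin", "wb"); /* hypothetical path */
            if (!buf || !f) return 1;

            double t0 = now_s();
            fwrite(buf, 1, SIZE, f);
            fflush(f);
            fclose(f);
            double t1 = now_s();

            printf("write: %.1f MB/s\n", (SIZE / (double)MB) / (t1 - t0));
            free(buf);
            return 0;
        }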