1,221 research outputs found

A Study on Compiler-Based Cache Coherency Control for Multicore Processors

Author: Adhi Boma Anantasatya
Publication venue
Publication date: 01/01/2020
Field of study

Waseda University degree certificate number: Shin 8512 (新8512), Waseda Univ.

Waseda University Repository

HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA

Author: Benini Luca
Capotondi Alessandro
Kurth Andreas
Marongiu Andrea
Vogel Pirmin
Publication venue
Publication date: 01/01/2017
Field of study

Heterogeneous embedded systems on chip (HESoCs) co-integrate a standard host processor with programmable manycore accelerators (PMCAs) to combine general-purpose computing with domain-specific, efficient processing capabilities. While leading companies successfully advance their HESoC products, research lags behind due to the challenges of building a prototyping platform that unites an industry-standard host processor with an open research PMCA architecture. In this work we introduce HERO, an FPGA-based research platform that combines a PMCA composed of clusters of RISC-V cores, implemented as soft cores on an FPGA fabric, with a hard ARM Cortex-A multicore host processor. The PMCA architecture mapped on the FPGA is silicon-proven, scalable, configurable, and fully modifiable. HERO includes a complete software stack that consists of a heterogeneous cross-compilation toolchain with support for OpenMP accelerator programming, a Linux driver, and runtime libraries for both host and PMCA. HERO is designed to facilitate rapid exploration on all software and hardware layers: run-time behavior can be accurately analyzed by tracing events, and modifications can be validated through fully automated hardware and software builds and executed tests. We demonstrate the usefulness of HERO by means of case studies from our research.

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Performance Aspects of Synthesizable Computing Systems

Author: Schleuniger Pascal
Publication venue: Technical University of Denmark
Publication date: 01/01/2014
Field of study

Online Research Database In Technology

From plasma to beefarm: Design experience of an FPGA-based multicore prototype

Author: Arcas Abella Oriol
Cristal Kestelman Adrián
Hur Ibrahim
Sayilar Gokhan
Singh Satnam
Sonmez Nehir
Unsal Osman Sabri
Valero Cortés Mateo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

In this paper, we take a MIPS-based open-source uniprocessor soft core, Plasma, and extend it to obtain the Beefarm infrastructure for FPGA-based multiprocessor emulation, a popular research topic of the last few years both in the FPGA and the computer architecture communities. We discuss various design tradeoffs and we demonstrate superior scalability through experimental results compared to traditional software instruction set simulators. Based on our experience of designing and building a complete FPGA-based multiprocessor emulation system that supports run-time and compiler infrastructure and on the actual executions of our experiments running Software Transactional Memory (STM) benchmarks, we comment on the pros, cons and future trends of using hardware-based emulation for research. Peer reviewed. Postprint (author's final draft).

CiteSeerX

UPCommons. Portal del coneixement obert de la UPC

Modularizing and Specifying Protocols among Threads

Author: Arbab Farhad
Jongmans Sung-Shik T. Q.
Publication venue: 'Open Publishing Association'
Publication date: 01/02/2013
Field of study

We identify three problems with current techniques for implementing protocols among threads, which complicate and impair the scalability of multicore software development: implementing synchronization, implementing coordination, and modularizing protocols. To mend these deficiencies, we argue for the use of domain-specific languages (DSLs) based on existing models of concurrency. To demonstrate the feasibility of this proposal, we explain how to use the model of concurrency Reo as a high-level protocol DSL, which offers appropriate abstractions and a natural separation of protocols and computations. We describe a Reo-to-Java compiler and illustrate its use through examples. Comment: In Proceedings PLACES 2012, arXiv:1302.579

arXiv.org e-Print Archive

Directory of Open Access Journals

Embedded System Optimization of Radar Post-processing in an ARM CPU Core

Author: Ogbonnia Chibundu
Publication venue
Publication date: 04/05/2022
Field of study

Algorithms executed on the radar processor system contribute to a significant performance bottleneck of the overall radar system. One key performance concern is the latency in target detection when dealing with hard-deadline systems. Research has shown software optimization to be one major contributor to radar system performance improvements. This thesis aims at software optimization using a manual and an automatic approach, and at analyzing the results to make informed future decisions when working with an ARM processor system. To ascertain an optimized implementation, the question put forward was whether the algorithms on the ARM processor could work with a 6-antenna implementation without a decline in performance. An answer would also help project how many additional algorithms can still be added without a performance decline. The manual optimization was based on a quantitative analysis of the software execution time. It applied a vectorization strategy using the NEON vector registers on the ARM CPU to reimplement the initial Constant False Alarm Rate (CFAR) detection algorithm. An additional optimization was eliminating redundant loops while going through the range gates and Doppler filters. To determine the best compiler for automatic code optimization of the radar algorithms on the ARM processor, the GCC and Clang compilers were used to compile the initial algorithms and the optimized implementation of the radar post-processing stage. Analysis of the optimization results showed that it is possible to run the radar post-processing algorithms on the ARM processor in the 6-antenna implementation without system load stress. In addition, the results show an excellent headroom margin based on the defined scenario. The result analysis further revealed that the effect of dynamic memory allocation should not be underestimated in situations where performance is a significant concern.
The results additionally demonstrated that the GCC and Clang compilers each have their strengths and weaknesses when used for compilation. One limiting factor to note for the optimization using the NEON registers is the effect of the sample size on the optimized implementation. Although it fits the test samples used in the defined scenario, results might vary for other window cell sizes and might not necessarily improve the time constraints.

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Multimedia ONline ARchiv CHemnitz

Evaluating Cache Coherent Shared Virtual Memory for Heterogeneous Multicore Chips

Author: Hechtman Blake A.
Sorin Daniel J.
Publication venue
Publication date: 01/01/2013
Field of study

The trend in industry is towards heterogeneous multicore processors (HMCs), including chips with CPUs and massively-threaded throughput-oriented processors (MTTOPs) such as GPUs. Although current homogeneous chips tightly couple the cores with cache-coherent shared virtual memory (CCSVM), this is not the communication paradigm used by any current HMC. In this paper, we present a CCSVM design for a CPU/MTTOP chip, as well as an extension of the pthreads programming model, called xthreads, for programming this HMC. Our goal is to evaluate the potential performance benefits of tightly coupling heterogeneous cores with CCSVM.

arXiv.org e-Print Archive

CiteSeerX

Crossref

T-Crest: A Time-Predictable Multi-Core Platform For Aerospace Applications

Author: Rocha Andre
Schoeberl Martin
Silva Claudio
Publication venue: 'European Space Agency'
Publication date: 01/01/2014
Field of study

Online Research Database In Technology