323 research outputs found
High-Performance Simultaneous Multiprocessing for Heterogeneous System-on-Chip
This paper presents a methodology for simultaneous heterogeneous computing,
named ENEAC, where a quad core ARM Cortex-A53 CPU works in tandem with a
preprogrammed on-board FPGA accelerator. A heterogeneous scheduler distributes
the tasks optimally among all the resources and all compute units run
asynchronously, which allows for improved performance for irregular workloads.
ENEAC achieves up to 17\% performance improvement \ignore{and 14\% energy usage
reduction,} when using all platform resources compared to just using the FPGA
accelerators and up to 865\% performance increase \ignore{and up to 89\% energy
usage decrease} when using just the CPU. The workflow uses existing commercial
tools and C/C++ as a single programming language for both accelerator design
and CPU programming for improved productivity and ease of verification.Comment: 7 pages, 5 figures, 1 table Presented at the 13th International
Workshop on Programmability and Architectures for Heterogeneous Multicores,
2020 (arXiv:2005.07619
Refactoring software to heterogeneous parallel platforms
In summary, the papers included in this special issue are representative of the progress achieved by the research community at various levels from the very high level using parallel patterns to lower levels using, for example, transactional software memory. Also the integration of GPUs and FPGAs in the landscape is essential to achieve better performance in different categories of applications. All these innovative research directions will contribute to better achieve the long-term goal of better refactoring of existing applications to new and evolving parallel heterogeneous architectures
Lightweight asynchronous scheduling in heterogeneous reconfigurable systems
The trend for heterogeneous embedded systems is the integration of accelerators and general-purpose CPU cores on the same die. In these integrated architectures, like the Zynq UltraScale+ board (CPU+FPGA) that we target in this work, hardware support for shared memory and low-overhead synchronization between the accelerator and the CPU cores make the case for exploring strategies that exploit a tight collaboration between the CPUs and the accelerator. In this paper we propose a novel lightweight scheduling strategy, FastFit, targeted to FPGA accelerators, and a new scheduler based on it, named MultiFastFit, which asynchronously tackles heterogeneous systems comprised of a variety of CPU cores and FPGA IPs. Our strategy significantly reduces the overhead to automatically compute the near-optimal chunksizes when compared to a previous state-of-the-art auto-tuned approach, which makes our approach more suitable for fine-grained applications. Additionally, our scheduler MultiFastFit has been designed to enable the efficient co-execution of work among compute devices in such a way that all the devices are busy while minimizing the load unbalance. Our approaches have been evaluated using four benchmarks carefully tuned for the low-power UltraScale+ platform. Our experiments demonstrate that the FastFit strategy always finds the near-optimal FPGA chunksize for any device configuration at a reasonable cost, even for fine-grained and irregular applications, and that heterogeneous CPU+FPGA co-executions that exploit all the compute devices are usually faster and more energy efficient than the CPU-only and FPGA-only executions. We have also compared MultiFastFit with other state-of-the-art scheduling strategies, finding that it outperforms other auto-tuned approach up to 2x and it achieves similar results to manually-tuned schedulers without requiring an offline search of the ideal CPU-FPGA partition or FPGA chunk granularity. © 2022 The Author
Multiprocessor platform using LEON3 processor
The recent advances in embedded systems world, lead us to more complex systems with
application specific blocks (IP cores), the System on Chip (SoC) devices. A good example
of these complex devices can be encountered in the cell phones that can have image processing
cores, communication cores, memory card cores, and others.
The need of augmenting systems’ processing performance with lowest power, leads to a
concept of Multiprocessor System on Chip (MSoC) in which the execution of multiple
tasks can be distributed along various processors.
This thesis intends to address the creation of a synthesizable multiprocessing system to be
placed in a FPGA device, providing a good flexibility to tailor the system to a specific application.
To deliver a multiprocessing system, will be used the synthesisable 32-bit
SPARC V8 compliant, LEON3 processor.Os avanços recentes no mundo dos sistemas embebidos levam-nos a sistemas mais
complexos com blocos para aplicações específicas (IP cores), os dispositivos System on
Chip (SoC). Um bom exemplo destes complexos dispositivos pode ser encontrado nos
telemóveis, que podem conter cores de processamento de imagem, cores de comunicações,
cores para cartões de memória, entre outros.
A necessidade de aumentar o desempenho dos sistemas de processamento com o menor
consumo possível, leva ao conceito de Multiprocessor System on Chip (MSoC) em que a
execução de múltiplas tarefas pode ser distribuída por vários processadores.
Esta Tese pretende abordar a criação de um sistema de multiprocessamento sintetizável
para ser colocado numa FPGA, proporcionando uma boa flexibilidade para a adaptação do
sistema a uma aplicação específica. Para obter o sistema multiprocessamento, irá ser
utilizado o processador sintetizável SPARC V8 de 32-bit, LEON3
A TrustZone-assisted secure silicon on a co-design framework
Dissertação de mestrado em Engenharia Eletrónica Industrial e ComputadoresEmbedded systems were for a long time, single-purpose and closed systems, characterized
by hardware resource constraints and real-time requirements. Nowadays, their functionality is
ever-growing, coupled with an increasing complexity and heterogeneity. Embedded applications
increasingly demand employment of general-purpose operating systems (GPOSs) to handle operator
interfaces and general-purpose computing tasks, while simultaneously ensuring the strict
timing requirements. Virtualization, which enables multiple operating systems (OSs) to run on
top of the same hardware platform, is gaining momentum in the embedded systems arena,
driven by the growing interest in consolidating and isolating multiple and heterogeneous environments.
The penalties incurred by classic virtualization approaches is pushing research towards
hardware-assisted solutions. Among the existing commercial off-the-shelf (COTS) technologies for
virtualization, ARM TrustZone technology is gaining momentum due to the supremacy and lower
cost of TrustZone-enabled processors.
Programmable system-on-chips (SoCs) are becoming leading players in the embedded systems
space, because the combination of a plethora of hard resources with programmable logic
enables the efficient implementation of systems that perfectly fit the heterogeneous nature of
embedded applications. Moreover, novel disruptive approaches make use of field-programmable
gate array (FPGA) technology to enhance virtualization mechanisms.
This master’s thesis proposes a hardware-software co-design framework for easing the economy
of addressing the new generation of embedded systems requirements. ARM TrustZone is
exploited to implement the root-of-trust of a virtualization-based architecture that allows the execution
of a GPOS side-by-side with a real-time OS (RTOS). RTOS services were offloaded to hardware,
so that it could present simultaneous improvements on performance and determinism. Instead
of focusing in a concrete application, the goal is to provide a complete framework, specifically tailored
for Zynq-base devices, that developers can use to accelerate a bunch of distinct applications
across different embedded industries.Os sistemas embebidos foram, durante muitos anos, sistemas com um simples e único
propósito, caracterizados por recursos de hardware limitados e com cariz de tempo real. Hoje
em dia, o número de funcionalidades começa a escalar, assim como o grau de complexidade
e heterogeneidade. As aplicações embebidas exigem cada vez mais o uso de sistemas operativos
(OSs) de uso geral (GPOS) para lidar com interfaces gráficas e tarefas de computação de
propósito geral. Porém, os seus requisitos primordiais de tempo real mantém-se. A virtualização
permite que vários sistemas operativos sejam executados na mesma plataforma de hardware.
Impulsionada pelo crescente interesse em consolidar e isolar ambientes múltiplos e heterogéneos,
a virtualização tem ganho uma crescente relevância no domínio dos sistemas embebidos.
As adversidades que advém das abordagens de virtualização clássicas estão a direcionar estudos
no âmbito de soluções assistidas por hardware. Entre as tecnologias comerciais existentes, a
tecnologia ARM TrustZone está a ganhar muita relevância devido à supremacia e ao menor custo
dos processadores que suportam esta tecnologia.
Plataformas hibridas, que combinam processadores com lógica programável, estão em crescente
penetração no domínio dos sistemas embebidos pois, disponibilizam um enorme conjunto
de recursos que se adequam perfeitamente à natureza heterogénea dos sistemas atuais. Além
disso, existem soluções recentes que fazem uso da tecnologia de FPGA para melhorar os mecanismos
de virtualização.
Esta dissertação propõe uma framework baseada em hardware-software de modo a cumprir
os requisitos da nova geração de sistemas embebidos. A tecnologia TrustZone é explorada para
implementar uma arquitetura que permite a execução de um GPOS lado-a-lado com um sistemas
operativo de tempo real (RTOS). Os serviços disponibilizados pelo RTOS são migrados
para hardware, para melhorar o desempenho e determinismo do OS. Em vez de focar numa
aplicação concreta, o objetivo é fornecer uma framework especificamente adaptada para dispositivos
baseados em System-on-chips Zynq, de forma a que developers possam usar para acelerar
um vasto número de aplicações distintas em diferentes setores
Cooperative CPU, GPU, and FPGA heterogeneous execution with EngineCL
Heterogeneous systems are the core architecture of most of the high-performance computing nodes, due to their excellent performance and energy efficiency. However, a key challenge that remains is programmability, specifically, releasing the programmer from the burden of managing data and devices with different architectures. To this end, we extend EngineCL to support FPGA devices. Based on OpenCL, EngineCL is a high-level framework providing load balancing among devices. Our proposal fully integrates FPGAs into the framework, enabling effective cooperation between CPU, GPU, and FPGA. With command overlapping and judicious data management, our work improves performance by up to 96% compared with single-device execution and delivers energy-delay gains of up to 37%. In addition, adopting FPGAs does not require programmers to make big changes in their applications because the extensions do not modify the user-facing interface of EngineCL
- …