44 research outputs found
Dynamic Scheduling, Allocation, and Compaction Scheme for Real-Time Tasks on FPGAs
Run-time reconfiguration (RTR) is a method of computing on reconfigurable logic, typically FPGAs, changing hardware configurations from phase to phase of a computation at run-time. Recent research has expanded from a focus on a single application at a time to encompass a view of the reconfigurable logic as a resource shared among multiple applications or users. In real-time system design, task deadlines play an important role. Real-time multi-tasking systems not only need to support sharing of the resources in space, but also need to guarantee execution of the tasks. At the operating system level, sharing logic gates, wires, and I/O pins among multiple tasks needs to be managed. From the high level standpoint, access to the resources needs to be scheduled according to task deadlines. This thesis describes a task allocator for scheduling, placing, and compacting tasks on a shared FPGA under real-time constraints. Our consideration of task deadlines is novel in the setting of handling multiple simultaneous tasks in RTR. Software simulations have been conducted to evaluate the performance of the proposed scheme. The results indicate significant improvement by decreasing the number of tasks rejected
A Reconfigurable Processor for Heterogeneous Multi-Core Architectures
A reconfigurable processor is a general-purpose processor coupled with an FPGA-like reconfigurable fabric. By deploying application-specific accelerators, performance for a wide range of applications can be improved with such a system. In this work concepts are designed for the use of reconfigurable processors in multi-tasking scenarios and as part of multi-core systems
MURAC: A unified machine model for heterogeneous computers
Includes bibliographical referencesHeterogeneous computing enables the performance and energy advantages of multiple distinct processing architectures to be efficiently exploited within a single machine. These systems are capable of delivering large performance increases by matching the applications to architectures that are most suited to them. The Multiple Runtime-reconfigurable Architecture Computer (MURAC) model has been proposed to tackle the problems commonly found in the design and usage of these machines. This model presents a system-level approach that creates a clear separation of concerns between the system implementer and the application developer. The three key concepts that make up the MURAC model are a unified machine model, a unified instruction stream and a unified memory space. A simple programming model built upon these abstractions provides a consistent interface for interacting with the underlying machine to the user application. This programming model simplifies application partitioning between hardware and software and allows the easy integration of different execution models within the single control ow of a mixed-architecture application. The theoretical and practical trade-offs of the proposed model have been explored through the design of several systems. An instruction-accurate system simulator has been developed that supports the simulated execution of mixed-architecture applications. An embedded System-on-Chip implementation has been used to measure the overhead in hardware resources required to support the model, which was found to be minimal. An implementation of the model within an operating system on a tightly-coupled reconfigurable processor platform has been created. This implementation is used to extend the software scheduler to allow for the full support of mixed-architecture applications in a multitasking environment. Different scheduling strategies have been tested using this scheduler for mixed-architecture applications. The design and implementation of these systems has shown that a unified abstraction model for heterogeneous computers provides important usability benefits to system and application designers. These benefits are achieved through a consistent view of the multiple different architectures to the operating system and user applications. This allows them to focus on achieving their performance and efficiency goals by gaining the benefits of different execution models during runtime without the complex implementation details of the system-level synchronisation and coordination
A TrustZone-assisted secure silicon on a co-design framework
Dissertação de mestrado em Engenharia Eletrónica Industrial e ComputadoresEmbedded systems were for a long time, single-purpose and closed systems, characterized
by hardware resource constraints and real-time requirements. Nowadays, their functionality is
ever-growing, coupled with an increasing complexity and heterogeneity. Embedded applications
increasingly demand employment of general-purpose operating systems (GPOSs) to handle operator
interfaces and general-purpose computing tasks, while simultaneously ensuring the strict
timing requirements. Virtualization, which enables multiple operating systems (OSs) to run on
top of the same hardware platform, is gaining momentum in the embedded systems arena,
driven by the growing interest in consolidating and isolating multiple and heterogeneous environments.
The penalties incurred by classic virtualization approaches is pushing research towards
hardware-assisted solutions. Among the existing commercial off-the-shelf (COTS) technologies for
virtualization, ARM TrustZone technology is gaining momentum due to the supremacy and lower
cost of TrustZone-enabled processors.
Programmable system-on-chips (SoCs) are becoming leading players in the embedded systems
space, because the combination of a plethora of hard resources with programmable logic
enables the efficient implementation of systems that perfectly fit the heterogeneous nature of
embedded applications. Moreover, novel disruptive approaches make use of field-programmable
gate array (FPGA) technology to enhance virtualization mechanisms.
This master’s thesis proposes a hardware-software co-design framework for easing the economy
of addressing the new generation of embedded systems requirements. ARM TrustZone is
exploited to implement the root-of-trust of a virtualization-based architecture that allows the execution
of a GPOS side-by-side with a real-time OS (RTOS). RTOS services were offloaded to hardware,
so that it could present simultaneous improvements on performance and determinism. Instead
of focusing in a concrete application, the goal is to provide a complete framework, specifically tailored
for Zynq-base devices, that developers can use to accelerate a bunch of distinct applications
across different embedded industries.Os sistemas embebidos foram, durante muitos anos, sistemas com um simples e único
propósito, caracterizados por recursos de hardware limitados e com cariz de tempo real. Hoje
em dia, o número de funcionalidades começa a escalar, assim como o grau de complexidade
e heterogeneidade. As aplicações embebidas exigem cada vez mais o uso de sistemas operativos
(OSs) de uso geral (GPOS) para lidar com interfaces gráficas e tarefas de computação de
propósito geral. Porém, os seus requisitos primordiais de tempo real mantém-se. A virtualização
permite que vários sistemas operativos sejam executados na mesma plataforma de hardware.
Impulsionada pelo crescente interesse em consolidar e isolar ambientes múltiplos e heterogéneos,
a virtualização tem ganho uma crescente relevância no domínio dos sistemas embebidos.
As adversidades que advém das abordagens de virtualização clássicas estão a direcionar estudos
no âmbito de soluções assistidas por hardware. Entre as tecnologias comerciais existentes, a
tecnologia ARM TrustZone está a ganhar muita relevância devido à supremacia e ao menor custo
dos processadores que suportam esta tecnologia.
Plataformas hibridas, que combinam processadores com lógica programável, estão em crescente
penetração no domínio dos sistemas embebidos pois, disponibilizam um enorme conjunto
de recursos que se adequam perfeitamente à natureza heterogénea dos sistemas atuais. Além
disso, existem soluções recentes que fazem uso da tecnologia de FPGA para melhorar os mecanismos
de virtualização.
Esta dissertação propõe uma framework baseada em hardware-software de modo a cumprir
os requisitos da nova geração de sistemas embebidos. A tecnologia TrustZone é explorada para
implementar uma arquitetura que permite a execução de um GPOS lado-a-lado com um sistemas
operativo de tempo real (RTOS). Os serviços disponibilizados pelo RTOS são migrados
para hardware, para melhorar o desempenho e determinismo do OS. Em vez de focar numa
aplicação concreta, o objetivo é fornecer uma framework especificamente adaptada para dispositivos
baseados em System-on-chips Zynq, de forma a que developers possam usar para acelerar
um vasto número de aplicações distintas em diferentes setores
Control Software for Reconfigurable Coprocessors
On-line data processing at the ATLAS general purpose particle detector, which is currently under construction at Geneva, generates demands on computing power that are difficult to satisfy with commodity CPU-based computers. One of the most demanding applications is the recognition of particle tracks that originate from B-quark decays. However, this and many others applications can benefit from parallel execution on field programmable gate arrays (FPGA). After the demonstration of accelerated track recognition with big FPGA-based custom computers, the development of FPGA based coprocessors started in the late 1990's. Applications of FPGA coprocessors are usually partitioned between the host and the tightly coupled coprocessor. The objective of the research that I present in this thesis was the development of software that mediates to applications the access to FPGA coprocessors. I used a software process based on iterative prototyping to cope with the expected changing requirements. Also, I used a strict bottom-up design to create classes that model devices on the coprocessors. Using these low-level classes, I developed tools which were used for bootstrapping, debugging, and firmware update of the coprocessors during their development and maintenance. Measurements show that the software overhead introduced by object-oriented programming and software layering is small. The software-support for six different coprocessors was partitioned into corresponding independent packages, which reuse a set of packages that provide common and basic functions. The steady evolution and use of the software during more than four years shows that the software is maintainable, adaptable, and usable
Space Station Freedom data management system growth and evolution report
The Information Sciences Division at the NASA Ames Research Center has completed a 6-month study of portions of the Space Station Freedom Data Management System (DMS). This study looked at the present capabilities and future growth potential of the DMS, and the results are documented in this report. Issues have been raised that were discussed with the appropriate Johnson Space Center (JSC) management and Work Package-2 contractor organizations. Areas requiring additional study have been identified and suggestions for long-term upgrades have been proposed. This activity has allowed the Ames personnel to develop a rapport with the JSC civil service and contractor teams that does permit an independent check and balance technique for the DMS
Towards the development of a reliable reconfigurable real-time operating system on FPGAs
In the last two decades, Field Programmable Gate Arrays (FPGAs) have been
rapidly developed from simple “glue-logic” to a powerful platform capable of
implementing a System on Chip (SoC). Modern FPGAs achieve not only the high
performance compared with General Purpose Processors (GPPs), thanks to hardware
parallelism and dedication, but also better programming flexibility, in comparison to
Application Specific Integrated Circuits (ASICs). Moreover, the hardware
programming flexibility of FPGAs is further harnessed for both performance and
manipulability, which makes Dynamic Partial Reconfiguration (DPR) possible. DPR
allows a part or parts of a circuit to be reconfigured at run-time, without interrupting
the rest of the chip’s operation. As a result, hardware resources can be more
efficiently exploited since the chip resources can be reused by swapping in or out
hardware tasks to or from the chip in a time-multiplexed fashion. In addition, DPR
improves fault tolerance against transient errors and permanent damage, such as
Single Event Upsets (SEUs) can be mitigated by reconfiguring the FPGA to avoid
error accumulation. Furthermore, power and heat can be reduced by removing
finished or idle tasks from the chip. For all these reasons above, DPR has
significantly promoted Reconfigurable Computing (RC) and has become a very hot
topic. However, since hardware integration is increasing at an exponential rate, and
applications are becoming more complex with the growth of user demands, highlevel
application design and low-level hardware implementation are increasingly
separated and layered. As a consequence, users can obtain little advantage from DPR
without the support of system-level middleware.
To bridge the gap between the high-level application and the low-level hardware
implementation, this thesis presents the important contributions towards a Reliable,
Reconfigurable and Real-Time Operating System (R3TOS), which facilitates the
user exploitation of DPR from the application level, by managing the complex
hardware in the background. In R3TOS, hardware tasks behave just like software
tasks, which can be created, scheduled, and mapped to different computing resources
on the fly. The novel contributions of this work are: 1) a novel implementation of an efficient task scheduler and allocator; 2) implementation of a novel real-time
scheduling algorithm (FAEDF) and two efficacious allocating algorithms (EAC and
EVC), which schedule tasks in real-time and circumvent emerging faults while
maintaining more compact empty areas. 3) Design and implementation of a faulttolerant
microprocessor by harnessing the existing FPGA resources, such as Error
Correction Code (ECC) and configuration primitives. 4) A novel symmetric
multiprocessing (SMP)-based architectures that supports shared memory programing
interface. 5) Two demonstrations of the integrated system, including a) the K-Nearest
Neighbour classifier, which is a non-parametric classification algorithm widely used
in various fields of data mining; and b) pairwise sequence alignment, namely the
Smith Waterman algorithm, used for identifying similarities between two biological
sequences.
R3TOS gives considerably higher flexibility to support scalable multi-user, multitasking
applications, whereby resources can be dynamically managed in respect of
user requirements and hardware availability. Benefiting from this, not only the
hardware resources can be more efficiently used, but also the system performance
can be significantly increased. Results show that the scheduling and allocating
efficiencies have been improved up to 2x, and the overall system performance is
further improved by ~2.5x. Future work includes the development of Network on
Chip (NoC), which is expected to further increase the communication throughput; as
well as the standardization and automation of our system design, which will be
carried out in line with the enablement of other high-level synthesis tools, to allow
application developers to benefit from the system in a more efficient manner
Run-time management for future MPSoC platforms
In recent years, we are witnessing the dawning of the Multi-Processor Systemon- Chip (MPSoC) era. In essence, this era is triggered by the need to handle more complex applications, while reducing overall cost of embedded (handheld) devices. This cost will mainly be determined by the cost of the hardware platform and the cost of designing applications for that platform. The cost of a hardware platform will partly depend on its production volume. In turn, this means that ??exible, (easily) programmable multi-purpose platforms will exhibit a lower cost. A multi-purpose platform not only requires ??exibility, but should also combine a high performance with a low power consumption. To this end, MPSoC devices integrate computer architectural properties of various computing domains. Just like large-scale parallel and distributed systems, they contain multiple heterogeneous processing elements interconnected by a scalable, network-like structure. This helps in achieving scalable high performance. As in most mobile or portable embedded systems, there is a need for low-power operation and real-time behavior. The cost of designing applications is equally important. Indeed, the actual value of future MPSoC devices is not contained within the embedded multiprocessor IC, but in their capability to provide the user of the device with an amount of services or experiences. So from an application viewpoint, MPSoCs are designed to ef??ciently process multimedia content in applications like video players, video conferencing, 3D gaming, augmented reality, etc. Such applications typically require a lot of processing power and a signi??cant amount of memory. To keep up with ever evolving user needs and with new application standards appearing at a fast pace, MPSoC platforms need to be be easily programmable. Application scalability, i.e. the ability to use just enough platform resources according to the user requirements and with respect to the device capabilities is also an important factor. Hence scalability, ??exibility, real-time behavior, a high performance, a low power consumption and, ??nally, programmability are key components in realizing the success of MPSoC platforms. The run-time manager is logically located between the application layer en the platform layer. It has a crucial role in realizing these MPSoC requirements. As it abstracts the platform hardware, it improves platform programmability. By deciding on resource assignment at run-time and based on the performance requirements of the user, the needs of the application and the capabilities of the platform, it contributes to ??exibility, scalability and to low power operation. As it has an arbiter function between different applications, it enables real-time behavior. This thesis details the key components of such an MPSoC run-time manager and provides a proof-of-concept implementation. These key components include application quality management algorithms linked to MPSoC resource management mechanisms and policies, adapted to the provided MPSoC platform services. First, we describe the role, the responsibilities and the boundary conditions of an MPSoC run-time manager in a generic way. This includes a de??nition of the multiprocessor run-time management design space, a description of the run-time manager design trade-offs and a brief discussion on how these trade-offs affect the key MPSoC requirements. This design space de??nition and the trade-offs are illustrated based on ongoing research and on existing commercial and academic multiprocessor run-time management solutions. Consequently, we introduce a fast and ef??cient resource allocation heuristic that considers FPGA fabric properties such as fragmentation. In addition, this thesis introduces a novel task assignment algorithm for handling soft IP cores denoted as hierarchical con??guration. Hierarchical con??guration managed by the run-time manager enables easier application design and increases the run-time spatial mapping freedom. In turn, this improves the performance of the resource assignment algorithm. Furthermore, we introduce run-time task migration components. We detail a new run-time task migration policy closely coupled to the run-time resource assignment algorithm. In addition to detailing a design-environment supported mechanism that enables moving tasks between an ISP and ??ne-grained recon??gurable hardware, we also propose two novel task migration mechanisms tailored to the Network-on-Chip environment. Finally, we propose a novel mechanism for task migration initiation, based on reusing debug registers in modern embedded microprocessors. We propose a reactive on-chip communication management mechanism. We show that by exploiting an injection rate control mechanism it is possible to provide a communication management system capable of providing a soft (reactive) QoS in a NoC. We introduce a novel, platform independent run-time algorithm to perform quality management, i.e. to select an application quality operating point at run-time based on the user requirements and the available platform resources, as reported by the resource manager. This contribution also proposes a novel way to manage the interaction between the quality manager and the resource manager. In order to have a the realistic, reproducible and ??exible run-time manager testbench with respect to applications with multiple quality levels and implementation tradev offs, we have created an input data generation tool denoted Pareto Surfaces For Free (PSFF). The the PSFF tool is, to the best of our knowledge, the ??rst tool that generates multiple realistic application operating points either based on pro??ling information of a real-life application or based on a designer-controlled random generator. Finally, we provide a proof-of-concept demonstrator that combines these concepts and shows how these mechanisms and policies can operate for real-life situations. In addition, we show that the proposed solutions can be integrated into existing platform operating systems