106 research outputs found

    A TrustZone-assisted secure silicon on a co-design framework

    Master's dissertation in Industrial Electronics and Computers Engineering. Embedded systems were, for a long time, single-purpose, closed systems characterized by hardware resource constraints and real-time requirements. Nowadays, their functionality is ever-growing, coupled with increasing complexity and heterogeneity. Embedded applications increasingly demand the use of general-purpose operating systems (GPOSs) to handle operator interfaces and general-purpose computing tasks, while simultaneously meeting strict timing requirements. Virtualization, which enables multiple operating systems (OSs) to run on top of the same hardware platform, is gaining momentum in the embedded systems arena, driven by the growing interest in consolidating and isolating multiple heterogeneous environments. The penalties incurred by classic virtualization approaches are pushing research towards hardware-assisted solutions. Among the existing commercial off-the-shelf (COTS) technologies for virtualization, ARM TrustZone is gaining momentum due to the market dominance and lower cost of TrustZone-enabled processors. Programmable system-on-chips (SoCs) are becoming leading players in the embedded systems space, because the combination of a plethora of hard resources with programmable logic enables the efficient implementation of systems that fit the heterogeneous nature of embedded applications. Moreover, novel disruptive approaches use field-programmable gate array (FPGA) technology to enhance virtualization mechanisms. This master's thesis proposes a hardware-software co-design framework that eases the cost of addressing the requirements of the new generation of embedded systems. ARM TrustZone is exploited to implement the root of trust of a virtualization-based architecture that allows the execution of a GPOS side by side with a real-time OS (RTOS). RTOS services are offloaded to hardware, providing simultaneous improvements in performance and determinism. Instead of focusing on a single concrete application, the goal is to provide a complete framework, specifically tailored for Zynq-based devices, that developers can use to accelerate a wide range of distinct applications across different embedded industries
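
    To make the architecture concrete, the following minimal C sketch illustrates how a normal-world component might request a service from the secure world through a Secure Monitor Call (SMC), the TrustZone primitive used to switch worlds. It is an illustration only, assuming an ARMv7-A core, GCC inline assembly, and a hypothetical RTOS_SERVICE_YIELD function ID; it is not the framework's actual interface.

    /* Hypothetical illustration of a TrustZone world switch; must execute in a
     * privileged normal-world mode, shown here only to make the call path
     * concrete. */
    #include <stdint.h>

    /* Issue a Secure Monitor Call with one argument and return the result
     * placed by the secure monitor in r0. */
    static inline uint32_t smc_call(uint32_t function_id, uint32_t arg0)
    {
        register uint32_t r0 asm("r0") = function_id;
        register uint32_t r1 asm("r1") = arg0;
        asm volatile("smc #0" : "+r"(r0) : "r"(r1) : "memory");
        return r0;
    }

    /* Hypothetical function ID: ask the secure-world RTOS side to run its
     * scheduler and return control to the normal-world GPOS. */
    #define RTOS_SERVICE_YIELD 0x83000001u

    int main(void)
    {
        uint32_t status = smc_call(RTOS_SERVICE_YIELD, 0);
        return (int)status;
    }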

    A New Methodology to Manage FPGA Distributed Memory Content via Bitstream for Xilinx ZYNQ Devices

    This paper proposes a methodology to access data and manage the content of distributed memories in FPGA designs through the configuration bitstream. Thanks to the proposed methods, it is possible to read and write the data content of registers in a straightforward fashion without using the registers' in/out ports. Hence, it offers the possibility of performing several operations, such as loading, copying, or comparing the information stored in registers, without the need for physical interconnections. This work includes two flows that simplify the design process when using the proposed approach: the first enables writing on different partial regions to be protected or unprotected through the bitstream, while the second permits homogeneous instances of a design implemented in different reconfigurable regions to be obtained without losing efficiency. The approach is based on, and has been physically validated on, the Xilinx ZYNQ, and when using partially reconfigurable designs, it affects neither the hardware overhead nor the maximum operating frequency of the design. This work has been supported, within the fund for research groups of the Basque university system IT1440-22, by the Department of Education and, within the PILAR ZE-2020/00022 and COMMUTE ZE-2021/00931 projects, by the Hazitek program, both of the Basque Government; the latter also by the Ministerio de Ciencia e Innovación of Spain through the Centro para el Desarrollo Tecnológico Industrial (CDTI) within the projects IDI-20201264 and IDI-20220543, and through the Fondo Europeo de Desarrollo Regional 2014–2020 (FEDER funds)
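
    As an illustration of working with readback data, the sketch below extracts a single register bit from a configuration readback buffer by indexing a (frame, word, bit) coordinate. The buffer contents and the coordinates are hypothetical placeholders; only the 101-words-per-frame figure reflects the 7-series family the ZYNQ belongs to, and the paper's actual flows and tooling are not reproduced here.

    /* Minimal sketch, assuming a readback bitstream has already been captured
     * into memory (e.g., through the Zynq configuration port). Frame, word and
     * bit coordinates are hypothetical; real offsets depend on the device and
     * on where the implementation tools placed the register. */
    #include <stdint.h>
    #include <stdio.h>

    #define WORDS_PER_FRAME 101u   /* 7-series frame length in 32-bit words */

    /* Extract one stored bit from the captured readback buffer. */
    static int read_config_bit(const uint32_t *readback, uint32_t frame_index,
                               uint32_t word_index, uint32_t bit_index)
    {
        uint32_t word = readback[frame_index * WORDS_PER_FRAME + word_index];
        return (int)((word >> bit_index) & 1u);
    }

    int main(void)
    {
        /* Placeholder buffer standing in for real readback data. */
        static uint32_t readback[4 * WORDS_PER_FRAME];
        readback[2 * WORDS_PER_FRAME + 50] = 1u << 7;

        /* Hypothetical coordinates of one bit of a distributed-memory cell. */
        printf("bit = %d\n", read_config_bit(readback, 2, 50, 7));
        return 0;
    }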

    Multi-Tenant Cloud FPGA: A Survey on Security

    With the exponentially increasing demand for performance and scalability in cloud applications and systems, data center architectures have evolved to integrate heterogeneous computing fabrics that leverage CPUs, GPUs, and FPGAs. FPGAs differ from traditional processing platforms such as CPUs and GPUs in that they are reconfigurable at run-time, providing increased and customized performance, flexibility, and acceleration. FPGAs can perform large-scale search optimization, acceleration, and signal processing tasks with advantages in power, latency, and processing speed. Many public cloud provider giants, including Amazon, Huawei, Microsoft, Alibaba, etc., have already started integrating FPGA-based cloud acceleration services. While FPGAs in cloud applications enable customized acceleration with low power consumption, they also incur new security challenges that still need to be reviewed. Allowing cloud users to reconfigure the hardware design after deployment could open backdoors for malicious attackers, potentially putting the cloud platform at risk. Considering these security risks, public cloud providers still do not offer multi-tenant FPGA services. This paper analyzes the security concerns of multi-tenant cloud FPGAs, gives a thorough description of the security problems associated with them, and discusses upcoming future challenges in this field of study

    Automatic synthesis of application-specific processors

    Thesis (D. Tech. (Engineering: Electrical)) -- Central University of Technology, Free State, 2012. This thesis describes a method for the automatic generation of application-specific processors. The thesis was organized into three separate but interrelated studies, which together provide: a justification for the method used, a theory that supports the method, and a software application that realizes the method. The first study looked at how modern-day microprocessors utilize their hardware resources and proposed a metric, called core density, for measuring the utilization rate. The core density is a function of the microprocessor's instruction set and the application scheduled to run on that microprocessor. This study concluded that modern-day microprocessors use their resources very inefficiently and proposed the use of subset processors to execute the same applications more efficiently. The second study sought to provide a theoretical framework for the use of subset processors by developing a generic formal model of computer architecture. To demonstrate the model's versatility, it was used to describe a number of computer architecture components and entire computing systems. The third study describes the development of a set of software tools that enable the automatic generation of application-specific processors. The FiT toolkit automatically generates a unique Hardware Description Language (HDL) description of a processor based on an application binary file and a parameterizable template of a generic microprocessor. Area-optimized and performance-optimized custom soft processors were generated using the FiT toolkit, and the utilization of the hardware resources by the custom soft processors was characterized. The FiT toolkit was combined with an ANSI C compiler and a third-party tool for programming field-programmable gate arrays (FPGAs) to create an unconstrained C-to-silicon compiler
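
    The following toy C program sketches one plausible reading of a core-density-style metric: the fraction of an instruction set that an application binary actually exercises. The 8-opcode ISA and the opcode trace are made up for illustration, and the thesis's formal definition is not reproduced here.

    /* Toy utilization metric: ratio of distinct opcodes used by an application
     * to the total number of opcodes the processor implements. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    #define ISA_SIZE 8   /* hypothetical instruction set with 8 opcodes */

    static double core_density(const int *opcodes, size_t n)
    {
        bool used[ISA_SIZE] = { false };
        size_t distinct = 0;

        for (size_t i = 0; i < n; i++) {
            if (!used[opcodes[i]]) {
                used[opcodes[i]] = true;
                distinct++;
            }
        }
        return (double)distinct / (double)ISA_SIZE;
    }

    int main(void)
    {
        /* Hypothetical opcode trace extracted from an application binary. */
        const int trace[] = { 0, 1, 1, 3, 0, 3, 5 };
        printf("core density = %.2f\n",
               core_density(trace, sizeof trace / sizeof trace[0]));
        return 0;
    }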

    Multi-core architectures with coarse-grained dynamically reconfigurable processors for broadband wireless access technologies

    Broadband Wireless Access technologies have significant market potential, especially the WiMAX protocol, which can deliver data rates of tens of Mbps. Strong demand for high-performance WiMAX solutions is forcing designers to seek help from multi-core processors that offer competitive advantages in terms of all performance metrics, such as speed, power and area. By providing a degree of flexibility similar to that of a DSP and performance and power-consumption advantages approaching those of an ASIC, coarse-grained dynamically reconfigurable processors are proving to be strong candidates for processing cores used in future high-performance multi-core processor systems. This thesis investigates multi-core architectures with a newly emerging dynamically reconfigurable processor – RICA – targeting WiMAX physical layer applications. A novel master-slave multi-core architecture is proposed, using RICA processing cores. A SystemC-based simulator, called MRPSIM, is devised to model this multi-core architecture. This simulator provides fast simulation speed and timing accuracy, offers flexible architectural options to configure the multi-core architecture, and enables the analysis and investigation of multi-core architectures. Meanwhile, a profiling-driven mapping methodology is developed to partition the WiMAX application into multiple tasks as well as schedule and map these tasks onto the multi-core architecture, aiming to reduce the overall system execution time. Both the MRPSIM simulator and the mapping methodology are seamlessly integrated with the existing RICA tool flow. Based on the proposed master-slave multi-core architecture, a series of diverse homogeneous and heterogeneous multi-core solutions are designed for different fixed WiMAX physical layer profiles. Implemented in ANSI C and executed on the MRPSIM simulator, these multi-core solutions contain different numbers of cores, combine various memory architectures and task partitioning schemes, and deliver high throughputs at relatively low area costs. Meanwhile, a design space exploration methodology is developed to search the design space for multi-core systems to find suitable solutions under certain system constraints. Finally, laying a foundation for future multithreading exploration on the proposed multi-core architecture, this thesis investigates the porting of a real-time operating system – MicroC/OS-II – to a single RICA processor. A multitasking version of WiMAX is implemented on a single RICA processor with the operating system support
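
    As a rough illustration of the profiling-driven mapping idea (not the thesis's actual methodology), the C sketch below greedily assigns tasks with profiled costs to the least-loaded core; the task costs and core count are hypothetical.

    /* Greedy mapping sketch: each task goes to the currently least-loaded core,
     * keeping the estimated makespan low. Costs are assumed pre-sorted in
     * descending order, as in the classic longest-processing-time heuristic. */
    #include <stdio.h>

    #define NUM_CORES 3
    #define NUM_TASKS 6

    int main(void)
    {
        /* Hypothetical profiled cost (e.g., cycles) of each pipeline task. */
        const int cost[NUM_TASKS] = { 900, 700, 650, 400, 300, 150 };
        int load[NUM_CORES] = { 0 };

        for (int t = 0; t < NUM_TASKS; t++) {
            int best = 0;
            for (int c = 1; c < NUM_CORES; c++)   /* pick least-loaded core */
                if (load[c] < load[best])
                    best = c;
            load[best] += cost[t];
            printf("task %d -> core %d\n", t, best);
        }
        for (int c = 0; c < NUM_CORES; c++)
            printf("core %d load = %d\n", c, load[c]);
        return 0;
    }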

    Run-time management of many-core SoCs: A communication-centric approach

    The single-core performance hit the power and complexity limits at the beginning of this century, moving the industry towards the design of multi- and many-core system-on-chips (SoCs). The on-chip communication between the cores plays a critical role in the performance of these SoCs, with power dissipation, communication latency, scalability to many cores, and reliability against transistor failures as the main design challenges. Accordingly, we dedicate this thesis to the communication-centered management of many-core SoCs, with the goal to advance the state-of-the-art in addressing these challenges. To this end, we contribute to on-chip communication of many-core SoCs in three main directions. First, we start with a synthesizable SoC with full system simulation. We demonstrate the importance of the networking overhead in a practical system, and propose our sophisticated network interface (NI) that offloads the work from SW to HW. Our results show around 5x and up to 50x higher network performance, compared to previous works. As the second direction of this thesis, we study the significance of run-time application mapping. We demonstrate that contiguous application mapping not only improves the network latency (by 23%) and power dissipation (by 50%), but also improves the system throughput (by 3%) and quality-of-service (QoS) of soft real-time applications (up to 100x fewer deadline misses). Also, our hierarchical run-time application mapping provides 99.41% successful mapping when up to 8 links are broken. As the final direction of the thesis, we propose a fault-tolerant routing algorithm, the maze-routing. It is the first-in-class algorithm that provides guaranteed delivery, a fully-distributed solution, low area overhead (by 16x), and instantaneous reconfiguration (vs. 40K cycles down time of previous works), all at the same time. Besides the individual goals of each contribution, when applicable, we ensure that our solutions scale to extreme network sizes like 12x12 and 16x16. This thesis concludes that the communication overhead and its optimization play a significant role in the performance of many-core SoCs
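
    For contrast with maze-routing, the baseline C sketch below shows plain XY (dimension-order) routing with a naive detour around a faulty output port; it offers no delivery guarantee, which is exactly the gap the maze-routing algorithm closes. The mesh coordinates and fault map are hypothetical.

    /* Baseline only: XY routing with a naive detour, not the maze-routing
     * algorithm proposed in the thesis. */
    #include <stdbool.h>
    #include <stdio.h>

    enum { EAST, WEST, NORTH, SOUTH, LOCAL };

    /* Choose an output port at router (x, y) for a packet destined to (dx, dy). */
    static int route_xy(int x, int y, int dx, int dy, const bool faulty[5])
    {
        int dir;
        if (dx > x)      dir = EAST;
        else if (dx < x) dir = WEST;
        else if (dy > y) dir = NORTH;
        else if (dy < y) dir = SOUTH;
        else             return LOCAL;

        if (!faulty[dir])
            return dir;

        /* Naive detour: take any healthy port; no guarantee of delivery. */
        for (int alt = EAST; alt <= SOUTH; alt++)
            if (alt != dir && !faulty[alt])
                return alt;
        return dir;   /* no healthy port left in this toy model */
    }

    int main(void)
    {
        const bool faulty[5] = { true, false, false, false, false }; /* EAST link down */
        printf("chosen port = %d\n", route_xy(1, 1, 3, 1, faulty));
        return 0;
    }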

    MURAC: A unified machine model for heterogeneous computers

    Heterogeneous computing enables the performance and energy advantages of multiple distinct processing architectures to be efficiently exploited within a single machine. These systems are capable of delivering large performance increases by matching applications to the architectures that are most suited to them. The Multiple Runtime-reconfigurable Architecture Computer (MURAC) model has been proposed to tackle the problems commonly found in the design and usage of these machines. This model presents a system-level approach that creates a clear separation of concerns between the system implementer and the application developer. The three key concepts that make up the MURAC model are a unified machine model, a unified instruction stream and a unified memory space. A simple programming model built upon these abstractions provides the user application with a consistent interface for interacting with the underlying machine. This programming model simplifies application partitioning between hardware and software and allows the easy integration of different execution models within the single control flow of a mixed-architecture application. The theoretical and practical trade-offs of the proposed model have been explored through the design of several systems. An instruction-accurate system simulator has been developed that supports the simulated execution of mixed-architecture applications. An embedded System-on-Chip implementation has been used to measure the overhead in hardware resources required to support the model, which was found to be minimal. An implementation of the model within an operating system on a tightly-coupled reconfigurable processor platform has been created. This implementation is used to extend the software scheduler to allow for the full support of mixed-architecture applications in a multitasking environment. Different scheduling strategies have been tested using this scheduler for mixed-architecture applications. The design and implementation of these systems has shown that a unified abstraction model for heterogeneous computers provides important usability benefits to system and application designers. These benefits are achieved through a consistent view of the multiple different architectures to the operating system and user applications. This allows them to focus on achieving their performance and efficiency goals by gaining the benefits of different execution models during runtime without the complex implementation details of the system-level synchronisation and coordination

    Software support for dynamic partial reconfigurable FPGAs on heterogeneous platforms

    This thesis addresses the design and implementation of software support for real-time systems developed on heterogeneous platforms that include a processor and an FPGA with dynamic partial reconfiguration capabilities. The software support enables tasks to request the execution of accelerated functions on the FPGA in parallel with other tasks running on the processor. Accelerated functions are dynamically allocated on the FPGA depending on the availability of area and on the online requests issued by the processor, thus extending the concept of multitasking to the FPGA resource domain. The performance of the allocation mechanism has been evaluated in terms of speed-up and response times. The achieved results show that the system is able to guarantee bounded delays and acceptable overhead that can be taken into account for a future schedulability analysis of real-time applications
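
    The hypothetical C sketch below illustrates the kind of interaction such software support enables: a task acquires a reconfigurable slot, has a partial bitstream loaded into it, uses the accelerator, and then releases the slot. All function and type names are invented placeholders, not the thesis's actual API.

    /* Invented placeholder API for requesting an accelerated function; a real
     * system would route these calls through the partial-reconfiguration
     * driver and the run-time allocator described in the thesis. */
    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { int slot_id; const char *bitstream; } fpga_slot_t;

    /* Pretend to allocate a free reconfigurable region and load a partial
     * bitstream into it. */
    static bool acquire_accelerator(const char *bitstream, fpga_slot_t *slot)
    {
        slot->slot_id = 0;            /* assume slot 0 is free */
        slot->bitstream = bitstream;
        printf("loading %s into slot %d\n", bitstream, slot->slot_id);
        return true;
    }

    static void release_accelerator(const fpga_slot_t *slot)
    {
        printf("slot %d released\n", slot->slot_id);
    }

    int main(void)
    {
        fpga_slot_t slot;
        if (acquire_accelerator("fir_filter_partial.bit", &slot)) {
            /* ... start the accelerator and wait for completion ... */
            release_accelerator(&slot);
        }
        return 0;
    }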

    Run-time management for future MPSoC platforms

    In recent years, we are witnessing the dawning of the Multi-Processor System-on-Chip (MPSoC) era. In essence, this era is triggered by the need to handle more complex applications while reducing the overall cost of embedded (handheld) devices. This cost will mainly be determined by the cost of the hardware platform and the cost of designing applications for that platform. The cost of a hardware platform will partly depend on its production volume. In turn, this means that flexible, (easily) programmable multi-purpose platforms will exhibit a lower cost. A multi-purpose platform not only requires flexibility, but should also combine a high performance with a low power consumption. To this end, MPSoC devices integrate computer architectural properties of various computing domains. Just like large-scale parallel and distributed systems, they contain multiple heterogeneous processing elements interconnected by a scalable, network-like structure. This helps in achieving scalable high performance. As in most mobile or portable embedded systems, there is a need for low-power operation and real-time behavior. The cost of designing applications is equally important. Indeed, the actual value of future MPSoC devices is not contained within the embedded multiprocessor IC, but in their capability to provide the user of the device with a set of services or experiences. So from an application viewpoint, MPSoCs are designed to efficiently process multimedia content in applications like video players, video conferencing, 3D gaming, augmented reality, etc. Such applications typically require a lot of processing power and a significant amount of memory. To keep up with ever-evolving user needs and with new application standards appearing at a fast pace, MPSoC platforms need to be easily programmable. Application scalability, i.e. the ability to use just enough platform resources according to the user requirements and with respect to the device capabilities, is also an important factor. Hence scalability, flexibility, real-time behavior, a high performance, a low power consumption and, finally, programmability are key components in realizing the success of MPSoC platforms. The run-time manager is logically located between the application layer and the platform layer. It has a crucial role in realizing these MPSoC requirements. As it abstracts the platform hardware, it improves platform programmability. By deciding on resource assignment at run-time, based on the performance requirements of the user, the needs of the application and the capabilities of the platform, it contributes to flexibility, scalability and low-power operation. As it has an arbiter function between different applications, it enables real-time behavior. This thesis details the key components of such an MPSoC run-time manager and provides a proof-of-concept implementation. These key components include application quality management algorithms linked to MPSoC resource management mechanisms and policies, adapted to the provided MPSoC platform services. First, we describe the role, the responsibilities and the boundary conditions of an MPSoC run-time manager in a generic way. This includes a definition of the multiprocessor run-time management design space, a description of the run-time manager design trade-offs and a brief discussion on how these trade-offs affect the key MPSoC requirements.
This design space definition and the trade-offs are illustrated based on ongoing research and on existing commercial and academic multiprocessor run-time management solutions. Consequently, we introduce a fast and efficient resource allocation heuristic that considers FPGA fabric properties such as fragmentation. In addition, this thesis introduces a novel task assignment algorithm for handling soft IP cores, denoted as hierarchical configuration. Hierarchical configuration managed by the run-time manager enables easier application design and increases the run-time spatial mapping freedom. In turn, this improves the performance of the resource assignment algorithm. Furthermore, we introduce run-time task migration components. We detail a new run-time task migration policy closely coupled to the run-time resource assignment algorithm. In addition to detailing a design-environment-supported mechanism that enables moving tasks between an ISP and fine-grained reconfigurable hardware, we also propose two novel task migration mechanisms tailored to the Network-on-Chip environment. Finally, we propose a novel mechanism for task migration initiation, based on reusing debug registers in modern embedded microprocessors. We propose a reactive on-chip communication management mechanism. We show that by exploiting an injection-rate control mechanism it is possible to provide a communication management system capable of providing a soft (reactive) QoS in a NoC. We introduce a novel, platform-independent run-time algorithm to perform quality management, i.e. to select an application quality operating point at run-time based on the user requirements and the available platform resources, as reported by the resource manager. This contribution also proposes a novel way to manage the interaction between the quality manager and the resource manager. In order to have a realistic, reproducible and flexible run-time manager testbench with respect to applications with multiple quality levels and implementation trade-offs, we have created an input data generation tool denoted Pareto Surfaces For Free (PSFF). The PSFF tool is, to the best of our knowledge, the first tool that generates multiple realistic application operating points, either based on profiling information of a real-life application or based on a designer-controlled random generator. Finally, we provide a proof-of-concept demonstrator that combines these concepts and shows how these mechanisms and policies can operate in real-life situations. In addition, we show that the proposed solutions can be integrated into existing platform operating systems
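
    To illustrate the injection-rate-control idea behind the reactive communication management (not the thesis's mechanism), the C sketch below uses a token bucket: a sender may inject a packet into the NoC only while it has credit, and credit is replenished at a configured rate each control epoch. All parameter values are made up.

    /* Token-bucket style injection throttle: bounds a sender's share of NoC
     * bandwidth by limiting how many packets it may inject per epoch. */
    #include <stdio.h>

    typedef struct {
        int tokens;       /* current credit, in packets */
        int refill;       /* credit added per control epoch */
        int max_tokens;   /* bucket depth */
    } rate_limiter_t;

    static void epoch_tick(rate_limiter_t *rl)
    {
        rl->tokens += rl->refill;
        if (rl->tokens > rl->max_tokens)
            rl->tokens = rl->max_tokens;
    }

    static int try_inject(rate_limiter_t *rl)
    {
        if (rl->tokens <= 0)
            return 0;     /* hold the packet: rate limit reached */
        rl->tokens--;
        return 1;         /* packet may enter the network */
    }

    int main(void)
    {
        rate_limiter_t rl = { .tokens = 2, .refill = 2, .max_tokens = 4 };
        for (int cycle = 0; cycle < 6; cycle++) {
            printf("cycle %d: inject=%d\n", cycle, try_inject(&rl));
            if (cycle % 3 == 2)
                epoch_tick(&rl);   /* hypothetical control epoch every 3 cycles */
        }
        return 0;
    }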

    Parallel and Distributed Computing

    The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. In particular, the topics that are addressed are programmable and reconfigurable devices and systems, dependability of GPUs (graphics processing units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peer-to-peer networks, large-scale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing