15 research outputs found

    Support des communications dans des architectures multicœurs par l’intermédiaire de mécanismes matériels et d’interfaces de programmation standardisées

    The application constraints driving the design of embedded systems constantly demand higher performance and better power efficiency. To meet these constraints, current SoC platforms replicate processing cores and add dedicated hardware accelerators to handle specific tasks. However, developing embedded applications is becoming a key challenge: application workloads continue to grow, while software technologies are not evolving as fast as hardware architectures, leaving a gap in the full system design. Indeed, the increased programming complexity can be attributed to the lack of software standards that support heterogeneity, which frequently leads to custom solutions. On the other hand, implementing a standard software solution for embedded systems may induce significant performance and memory usage overheads if it is not adapted to the architecture. Therefore, this thesis focuses on narrowing this gap by implementing hardware mechanisms in co-design with a standard programming interface for embedded systems. The main objectives are to increase programmability through the implementation of a standardized communication application programming interface (MCAPI), and to decrease the overheads imposed by the software implementation through the use of the developed hardware mechanisms. The contributions of the thesis comprise the implementation of MCAPI for a generic multi-core platform and dedicated hardware mechanisms that improve the communication connection phase and the overall performance of the data transfer phase. It is demonstrated that the proposed mechanisms can be exploited by the software implementation without increasing software complexity. Furthermore, performance estimations obtained using a SystemC/TLM simulation model of the reference multi-core architecture show that the proposed mechanisms provide significant gains in terms of latency (up to 97%), throughput (40x increase) and network traffic (up to 68%), while reducing processor workload and communication time, for both characterization test cases and real application benchmarks.

    Suunnittelutason parametrisointi Kactus2:ssa

    Embedded systems are growing larger and more complex. Even now, system designs can contain hundreds of Intellectual Property (IP) components. To keep up with productivity demands, the reusability of IP components must be improved. This is the scope of the IEEE standard IP-XACT. This thesis is based on Kactus2, an open source IP-XACT tool developed at Tampere University of Technology. Kactus2 provides a graphical user interface for System-on-Chip and embedded system IP packaging, design capture and VHDL/Verilog code generation. This thesis describes the development and implementation of version 2.8 of Kactus2. The requirements and solutions are presented in detail for each of the new features and improvements implemented in version 2.8. Alternative solutions are presented, and the selected alternatives are justified. Possible future implementations are also given. In version 2.8, the parameter usage of IP components is improved through the use of universally unique IDs (UUIDs), which requires many changes, e.g. to the IP-XACT Component editors. New features include parameter importing, design level configuration through parameters and a parameter propagation mechanism. Remapping of IP-XACT Memory Maps through the use of Remap States and memory Remap Elements has also been added. To facilitate the storing of hierarchical IP components, a new save action has been added to the Kactus2 toolbar. Version 2.8 of Kactus2 was released on schedule. The development of the version is considered a success, as it improves the design level parameterization in Kactus2 while incorporating additional new features. Within a month of its release, Kactus2 version 2.8 had been downloaded over 200 times, and its benefits over the previous version have been confirmed by industrial System-on-Chip developers

    Prosessori- ja system-on-chip-työkalujen yhteiskäyttö

    Transport-triggered architecture (TTA) processors provide an efficient middle ground for creating intellectual property (IP) components for system-on-chip (SoC) designs. With TTAs, the design effort is greatly reduced compared to an ASIC approach, and a more economical and efficient implementation is possible than with a general purpose processor. This thesis examines ways to accelerate the design flow when using TTA processors in SoC designs. The proposed flows combine the TTA-based Co-design Environment (TCE) tool set and the Kactus2 IP-XACT design environment. The IP-XACT standard and the Kactus2 tool make it easy to integrate and configure IP components from multiple vendors, whereas the TCE tools provide a fast and efficient path from C to VHDL. The thesis presents three use cases for TTA: as a ready-made fixed accelerator, as a general purpose processor, and as a tailored application-specific processor. Moreover, management of instance-specific data in IP-XACT is discussed. For each use case, the design flow is presented step by step in detail, a case example is given, and the design time spent on each step is evaluated. The flows contain between 15 and 18 steps and use between 8 and 12 different program tools from the studied tool sets. Provided that the C source code and an IP-XACT library are available, an engineer without a hardware background can implement an FPGA based multiprocessor product in less than 4 hours. Based on the results, further development suggestions for the TCE tools and Kactus2 are made

    Self-adaptivity of applications on network on chip multiprocessors: the case of fault-tolerant Kahn process networks

    Technology scaling, accompanied by higher operating frequencies and the ability to integrate more functionality in the same chip, has been the driving force behind delivering higher performance computing systems at lower costs. Embedded computing systems, which have been riding the same wave of success, have evolved into complex architectures encompassing a high number of cores interconnected by an on-chip network (usually identified as Multiprocessor Systems-on-Chip). However, these trends are hindered by issues that arise as technology scaling continues towards deep submicron scales. Firstly, the growing complexity of these systems and the variability introduced by process technologies make it ever harder to perform a thorough optimization of the system at design time. Secondly, designers are faced with a reliability wall that emerges as age-related degradation reduces the lifetime of transistors, and as the probability of defects escaping post-manufacturing testing increases. In this thesis, we take on these challenges within the context of streaming applications running on network-on-chip based parallel (not necessarily homogeneous) systems-on-chip that adopt the no-remote memory access model. In particular, this thesis tackles two main problems: (1) fault-aware online task remapping, and (2) application-level self-adaptation for quality management. For the former, by viewing fault tolerance as a self-adaptation aspect, we adopt a cross-layer approach that aims at graceful performance degradation by addressing permanent faults in processing elements mostly at system level, in particular by exploiting the redundancy available in multi-core platforms. We propose an optimal solution based on an integer linear programming formulation (suitable for design time adoption) as well as heuristic-based solutions to be used at run-time. We assess the impact of our approach on lifetime reliability. We propose two recovery schemes based on a checkpoint-and-rollback and a rollforward technique. For the latter, we propose two variants of a monitor-controller-adapter loop that adapts application-level parameters to meet performance goals. We demonstrate not only that fault tolerance and self-adaptivity can be achieved in embedded platforms, but also that this can be done without incurring large overheads. In addressing these problems, we present techniques which have been realized (depending on their characteristics) in the form of a design tool, a run-time library or a hardware core to be added to the basic architecture

    Integrated support for Adaptivity and Fault-tolerance in MPSoCs

    Technology improvements and the adoption of increasingly complex applications in consumer electronics are forcing a rapid increase in the complexity of multiprocessor systems on chip (MPSoCs). Following this trend, MPSoCs are becoming increasingly dynamic and adaptive, for several reasons. One is that applications are becoming intrinsically dynamic. Another is that the workload on emerging MPSoCs cannot be predicted, because modern systems are open to new incoming applications at run-time. A third reason that calls for adaptivity is the decreasing component reliability associated with technology scaling. Components below the 32-nm node are more prone to transient or even permanent faults. When a system component malfunctions, the rest of the system is supposed to take over its tasks. Thus, the system adaptivity goal influences several design decisions, listed below: 1) The applications should be specified such that system adaptivity can be easily supported. To this end, we consider Polyhedral Process Networks (PPNs) as the model of computation to specify applications. PPNs are composed of concurrent and autonomous processes that communicate with each other using bounded FIFO channels. Moreover, in PPNs the control is completely distributed, as are the memories. This is a good match for emerging MPSoC architectures, in which processing elements and memories are usually distributed. Most importantly, the simple operational semantics of PPNs allows for an easy adoption of system adaptivity mechanisms. 2) The hardware platform should guarantee the flexibility that adaptivity mechanisms require. Networks-on-Chip (NoCs) are emerging communication infrastructures for MPSoCs that, among many other advantages, allow for system adaptivity. This is because NoCs are generic: the same platform can be used to run different applications, or to run the same application with a different mapping of processes. However, there is a mismatch between the generic structure of NoCs and the semantics of the PPN model. Therefore, in this thesis we investigate and propose several communication approaches to overcome this mismatch. 3) The system must be able to change the process mapping at run-time, using process migration. To this end, a process migration mechanism has been proposed and evaluated. This mechanism takes into account specific requirements of the embedded domain such as predictability and efficiency. To address the problem of graceful degradation of the system, we enriched the MADNESS NoC platform by adding fault tolerance support at both the software and hardware levels. The proposed process migration mechanism can be exploited to cope with permanent faults by migrating the processes running on the faulty processing element. A fast heuristic is used to determine the new mapping of the processes to tiles. The experimental results show that the execution time overhead, due to the remapping heuristic together with the actual process migration, is almost negligible compared to the execution time of the whole application. This means that the proposed approach allows the system to change its performance metrics and to react to faults without a substantial impact on the user experience

    Automatically Parallelizing Embedded Legacy Software on Soft-Core SoCs

    Embedded systems are now used in many areas and have become omnipresent, making people's lives more comfortable. They have to handle more and more functionality in many products. Because low energy consumption is often required, multi-core systems are used to provide high performance at moderate energy consumption. The development started with dual-core processors and has today reached many-core designs with dozens or hundreds of processor cores. However, existing applications can barely leverage the potential of that many cores. Legacy applications are usually written sequentially and thus typically use only one processor core, so they do not benefit from the advantages provided by modern many-core systems. Rewriting those applications to use multiple cores requires new skills from developers and is also time-consuming and highly error-prone. Dozens of languages, APIs and compilers have been presented in the past decades to aid the user in parallelizing applications. Fully automatic parallelizing compilers are seen as the holy grail, since the user effort is kept minimal. However, automatic parallelizers often cannot extract parallelism as well as user-aided approaches. Most of these parallelization tools are designed for desktop and high-performance systems and are thus not tuned for, or applicable to, low-performance embedded systems. To improve this situation, this work presents an automatic parallelizer for embedded systems that in most cases delivers better quality than user-aided approaches and otherwise allows easy manual fine-tuning. Parallelization tools extract concurrently executable tasks from an application. These tasks can then be executed on different processor cores. Parallelization tools, and automatic parallelizers in particular, often struggle to efficiently map the extracted parallelism to an existing multi-core processor. This work uses soft-core processors on FPGAs, which makes it possible to realize custom multi-core designs in hardware within a few minutes. This makes it possible to adapt the multi-core processor to the characteristics of the extracted parallelism. In particular, core interconnects can be optimized to fit the communication pattern of the parallel application. Embedded applications are often structured as follows: receive input data, (multiple) data processing steps, data output. The multiple processing steps are often realized as consecutive, loosely coupled transformations. These steps naturally model the structure of a processing pipeline. The goal of this work is to extract this kind of pipeline parallelism from an application and map it to multiple cores to increase the overall throughput of the system. Multiple cores forming a chain with direct communication channels fit this pattern ideally. This so-called pipeline parallelism is barely addressed in most parallelization tools. Also, current multi-core designs often do not support the hardware flexibility provided by the soft-cores targeted in this approach. The main contribution of this work is an automatic parallelizer that maps different processing steps from the source code of a sequential application to different cores in a multi-core pipeline. Users only specify the required processing speed after parallelization. The developed tool tries to extract a matching parallelized software design, along with a custom multi-core design, from sequential embedded legacy applications. The automatically created multi-core system already contains the peripherals used by the application, extracted from the source code, and is ready to be used. The presented parallelizer implements multi-objective optimization to generate a minimal hardware design that just fulfills the user-defined requirement. To the best of my knowledge, the possibility to generate such a multi-core pipeline defined by the demands of the parallelized software has never been presented before. The approach is implemented for two soft-core processors, and the evaluation shows high speedups of 12x and higher for both targets at a reasonable hardware overhead. Compared to other automatic parallelizers, which mainly focus on speedups through latency reduction, significantly higher speedups can be achieved depending on the given application structure

    Actor-Oriented Programming for Resource Constrained Multiprocessor Networks on Chip

    Multiprocessor Networks on Chip (MPNoCs) are an attractive architecture for integrated circuits as they can benefit from the improved performance of ever smaller transistors but are not severely constrained by the poor performance of global on-chip wires. As the number of processors increases it becomes ever more expensive to provide coherent shared memory but this is a foundational assumption of thread-level parallelism. Threaded models of concurrency cannot efficiently address architectures where shared memory is not coherent or does not exist. In this thesis an extended actor oriented programming model is proposed to enable the design of complex and general purpose software for highly parallel and decentralised multiprocessor architectures. This model requires the encapsulation of an execution context and state into isolated Machines which may only initiate communication with one another via explicitly named channels. An emphasis on message passing and strong isolation of computation encourages application structures that are congruent with the nature of non-shared memory multiprocessors, and the model also avoids creating dependences on specific hardware topologies. A realisation of the model called Machine Java is presented to demonstrate the applicability of the model to a general purpose programming language. Applications designed with this framework are shown to be capable of scaling to large numbers of processors and remain independent of the hardware targets. Through the use of an efficient compilation technique, Machine Java is demonstrated to be portable across several architectures and viable even in the highly constrained context of an FPGA hosted MPNoC

    Kirjastonhallinnan toteutus Kactus2 IP-XACT työkalussa

    The size and complexity of embedded systems have grown at an accelerating pace in recent years. This creates demand to improve the productivity of the design process, e.g. by enhancing the reusability of logic components, also called IP-blocks. Improving reusability requires new design tools and methods. IP-XACT is an XML-based metadata standard that describes IP-blocks in a tool, implementation and vendor neutral way. One obstacle to the adoption of IP-XACT has been tool support: until now there have been no open source design tools supporting IP-XACT, and the commercial tools are expensive, which limits the ability of small and medium-sized companies to use IP-XACT. This thesis presents an open source IP-XACT design tool called Kactus2. The scope of the thesis is the library management and IP-packaging modules, which make it possible to create metadata descriptions for IP-blocks and to manage the blocks automatically. The thesis presents a few extensions to the standard, which expand the original scope of IP-XACT towards product management. The design and implementation of the library management and IP-packaging classes and the user interfaces are described. The implementation language was C++ and the development framework was the open source version 4.8.3 of Qt. The development environment was Microsoft Visual Studio 2008 with the Qt add-in installed. Qt enables cross-platform development, which facilitated the release of Kactus2 for both Windows and Linux operating systems. The sizes of the presented modules in lines of code are 7,500 for library management and 21,000 for IP-packaging. The corresponding class counts are 26 and 156. The whole Kactus2 tool comprises 103,000 lines of code. Library management contains two views of the library structure and a section for defining search options. The packaging module contains 28 editors for different elements of the metadata. The graphical user interface was designed to be clear and easy to use, so that users can readily adopt new design methods. The tool also contains a context-based help system, which reacts to the user's actions and gives advice related to the task at hand. The total download count for the different Kactus2 versions is over 1,700