    CONTREX: Design of embedded mixed-criticality CONTRol systems under consideration of EXtra-functional properties

    The increasing processing power of today’s HW/SW platforms leads to the integration of more and more functions in a single device. Additional design challenges arise when these functions share computing resources and belong to different criticality levels. CONTREX complements current activities in the area of predictable computing platforms and segregation mechanisms with techniques to consider the extra-functional properties, i.e., timing constraints, power, and temperature. CONTREX enables energy-efficient and cost-aware design through analysis and optimization of these properties with regard to application demands at different criticality levels. This article presents an overview of the CONTREX European project, its main innovative technologies (extension of a model-based design approach, functional and extra-functional analysis with executable models, and run-time management), and the final results of three industrial use cases from different domains (avionics, automotive, and telecommunications). The work leading to these results has received funding from the European Community’s Seventh Framework Programme FP7/2007-2011 under grant agreement no. 611146.

    Design Space Exploration and Resource Management of Multi/Many-Core Systems

    The growing demand to process more applications and related data on computing platforms has led to reliance on multi-/many-core chips, as they facilitate parallel processing. However, these platforms are also expected to be energy-efficient and reliable, and they need to perform secure computations in the interest of the whole community. This book provides perspectives on these aspects from leading researchers, covering state-of-the-art contributions and upcoming trends.

    Revisiting the high-performance reconfigurable computing for future datacenters

    Modern datacenters are reinforcing their computational power and energy efficiency by assimilating field programmable gate arrays (FPGAs). The sustainability of this large-scale integration depends on enabling multi-tenant FPGAs. This requirement amplifies the importance of a communication architecture and a virtualization method that have the features needed to meet this high-end objective. Consequently, in the last decade, academia and industry have proposed several virtualization techniques and hardware architectures for addressing resource management, scheduling, adoptability, segregation, scalability, performance overhead, availability, programmability, time-to-market, security, and, mainly, multitenancy. This paper provides an extensive survey covering three important aspects: a discussion of non-standard terms used in the existing literature, network-on-chip evaluation choices as a means to explore the communication architecture, and virtualization methods under the latest classification. The purpose is to emphasize the importance of choosing an appropriate communication architecture, virtualization technique, and standard language to evolve multi-tenant FPGAs in datacenters. No previous survey has covered these aspects in a single work. Open problems are also indicated for the scientific community.

    Sequence-To-Sequence Neural Networks Inference on Embedded Processors Using Dynamic Beam Search

    Sequence-to-sequence deep neural networks have become the state of the art for a variety of machine learning applications, ranging from neural machine translation (NMT) to speech recognition. Many mobile and Internet of Things (IoT) applications would benefit from the ability to perform sequence-to-sequence inference directly on embedded devices, thereby reducing the amount of raw data transmitted to the cloud and obtaining benefits in terms of response latency, energy consumption, and security. However, due to the high computational complexity of these models, specific optimization techniques are needed to achieve acceptable performance and energy consumption on single-core embedded processors. In this paper, we present a new optimization technique called dynamic beam search, in which the inference complexity is tuned at runtime to the difficulty of the processed input sequence. Results based on measurements on a real embedded device and on three state-of-the-art deep learning models show that our method is able to reduce inference time and energy by up to 25% without loss of accuracy.
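
The abstract does not say how input difficulty is measured or how it drives the search, so the following is only a minimal sketch of the general idea under one assumed policy: at each decoding step the beam width is scaled with the normalized entropy of the best hypothesis's next-token distribution. The names `step_fn`, `choose_beam_width`, and `dynamic_beam_search` are hypothetical and do not come from the paper.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical model interface: maps a token prefix to next-token probabilities.
StepFn = Callable[[Tuple[int, ...]], Sequence[float]]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    return sum(-p * math.log(p) for p in probs if p > 0.0)


def choose_beam_width(probs: Sequence[float], min_width: int, max_width: int) -> int:
    """Assumed policy: a confident (low-entropy) step gets a narrow beam."""
    max_entropy = math.log(len(probs))
    ratio = entropy(probs) / max_entropy if max_entropy > 0.0 else 0.0
    return min_width + round(ratio * (max_width - min_width))


def dynamic_beam_search(step_fn: StepFn, eos: int, max_len: int,
                        min_width: int = 1, max_width: int = 4):
    """Beam search whose width is re-chosen at every decoding step."""
    beams: List[Tuple[Tuple[int, ...], float]] = [((), 0.0)]  # (prefix, log-prob)
    finished: List[Tuple[Tuple[int, ...], float]] = []
    widths: List[int] = []
    for _ in range(max_len):
        if not beams:
            break
        # Estimate difficulty from the current best hypothesis only.
        best_probs = step_fn(beams[0][0])
        width = choose_beam_width(best_probs, min_width, max_width)
        widths.append(width)
        kept = beams[:width]
        # Evaluate the model only for the hypotheses that survive the cut.
        dists = [best_probs] + [step_fn(prefix) for prefix, _ in kept[1:]]
        candidates = []
        for (prefix, score), probs in zip(kept, dists):
            for token, p in enumerate(probs):
                if p > 0.0:
                    candidates.append((prefix + (token,), score + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:width]:
            (finished if prefix[-1] == eos else beams).append((prefix, score))
    finished.extend(beams)  # keep unfinished hypotheses if max_len was reached
    best_prefix, _ = max(finished, key=lambda c: c[1])
    return best_prefix, widths
```

The saving in this sketch comes from the fact that the model is evaluated once per kept hypothesis, so narrow steps cost fewer model evaluations. Other difficulty measures (for example, the score gap between the top hypotheses) would fit the same structure.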

    A hierarchical run-time adaptive resource allocation framework for large-scale MPSoC systems

    In the embedded computer system domain, MPSoC systems have become increasingly popular due to the ever-increasing performance demands of modern embedded applications. The number of processing elements in these MPSoCs is also steadily increasing. Whereas current MPSoCs still contain a limited number of processing elements, future MPSoCs will feature tens to hundreds of (heterogeneous) processing elements, all integrated on a single chip. On these future large-scale MPSoC systems, the mapping of applications onto the hardware resources plays an important role in fully exploiting the parallelism of applications. In this article, a hierarchical run-time adaptive resource allocation framework that uses an intelligent task remapping approach is proposed to improve system performance for large-scale MPSoCs.

    Experimenting with a tool suite to automate the path from a model-based design to an implementation, by way of architectural exploration

    SUMMARY Today, embedded systems are increasingly complex to develop, especially real-time systems. These projects integrate cutting-edge research technologies that are complicated to put in place. The design complexity of these systems stems from the need to balance the required computing power, the board area and number of hardware resources used, and the power consumption of the circuit. With increasingly strict time-to-market requirements added on top, the need for efficient tools and design flows is becoming ever more pressing. With this in mind, many system specification languages have been developed. They span different abstraction levels, from high-level languages such as SysML or AADL down to low-level RTL, by way of ESL (electronic system level) specifications such as SystemC. These languages are tied to model-based methodologies. The research project presented in this thesis puts forward a design methodology for embedded systems. The methodology is illustrated through a design flow that uses the AADL system description language and the SpaceStudio codesign platform. It aims to develop software applications in parallel with the hardware platforms on which they must run. The challenge of this project is therefore to bridge the AADL language and the SpaceStudio platform. The tool that provides this bridge compiles AADL code and generates a Python script. This script is read by the API of the SpaceStudio software, which generates a project on its codesign platform. The tool created during this project, named AADL2Space, is tested on an example AADL model available on the Internet. 
Next, an MJPEG video decoding application is used to illustrate the design flow. An AADL model of this application was developed to provide the architectural description of the system. The application part of the system was coded in C and associated with the AADL model. A complete system is thus compiled by AADL2Space to generate a SpaceStudio project. Once the project is instantiated on the codesign platform, it is simulated and analyzed to obtain metrics that validate or reject the architecture. In this way, several architectures are tested in order to satisfy constraints on real-time scheduling, processor utilization, hardware resource usage, etc. The chosen architecture is finally synthesized for implementation on a board. This project led to a conference paper at the IEEE International Symposium on Rapid System Prototyping (RSP).
ABSTRACT Nowadays, embedded systems are increasingly complex to design. The design complexity of these systems stems from the need to balance the required computing power, the on-chip area and hardware resources used, and the system's power consumption. This issue mainly occurs for real-time systems. For such systems, time-to-market requirements are more and more demanding. Consequently, new tools and design flows are needed. This project bridges and validates two of these technologies. To this end, numerous system description languages and libraries have been developed. They span different abstraction levels, from high-level languages such as SysML or AADL, to low-level RTL, by way of ESL (electronic system level) languages such as SystemC. The aim of the research project presented in this work is to demonstrate an embedded system design methodology. This methodology is illustrated through a design flow using the AADL description language and the SpaceStudio™ HW/SW co-design platform. 
It targets the parallel design of software applications and the hardware platform on which they will run. The challenge of this project is to bridge the gap between the AADL description language and the SpaceStudio platform. SpaceStudio is a scriptable tool: everything that can be done graphically can also be done through a Python script. The proposed tool bridges this gap by compiling AADL code and generating a Python script that serves as an input description for SpaceStudio. The resulting tool, called AADL2Space, is tested on an example AADL model available on the Internet. Next, an MJPEG video decoder application is used to illustrate the design flow. An AADL model of this application has been designed to provide the system’s architectural description. The software part of the system has been coded in C and bound to the AADL model. A complete system is thereby compiled by the tool and generated as a SpaceStudio project. Once the project has been instantiated on the co-design platform, it is simulated and analyzed to validate its performance metrics. Different architecture configurations are tested to meet system constraints such as real-time scheduling, processor utilization, and use of hardware resources. The chosen architecture configuration is finally synthesized to be implemented on an FPGA.

    Extending the battery life of mobile device by computation offloading

    Doctor of Philosophy. Computing and Information Sciences. Daniel A. Andresen. The need for increased performance of mobile devices directly conflicts with the desire for longer battery life. Offloading computation to resourceful servers is an effective method to reduce energy consumption and enhance performance for mobile applications. Today, most mobile devices have fast wireless links such as 4G and Wi-Fi, making computation offloading a reasonable solution for extending the battery life of mobile devices. Android provides mechanisms for creating mobile applications but lacks a native scheduling system for determining where code should be executed. We present Jade, a system that adds sophisticated energy-aware computation offloading capabilities to Android applications. Jade monitors device and application status and automatically decides where code should be executed. Jade dynamically adjusts its offloading strategy by adapting to workload variation, communication costs, and device status. Jade minimizes the burden on developers of building applications with computation offloading ability by providing an easy-to-use Jade API. Evaluation shows that Jade can reduce the average power consumption of a mobile device by up to 37% while improving application performance.
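
The abstract does not give Jade's decision rule, so the sketch below only illustrates the kind of trade-off an energy-aware offloader weighs, using a common first-order model (an assumption, not Jade's policy): offload when the energy spent transmitting the input and idling during remote execution is lower than the energy of running locally. All class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    cycles: float        # CPU cycles needed to run the task
    input_bytes: float   # data that must be sent to the server


@dataclass
class DeviceState:
    cpu_hz: float           # local CPU speed in cycles per second
    cpu_power_w: float      # power drawn while computing locally
    radio_power_w: float    # power drawn while transmitting
    idle_power_w: float     # power drawn while waiting for the server
    bytes_per_s: float      # current uplink throughput
    server_speedup: float   # server speed relative to the local CPU


def local_energy_j(task: Task, dev: DeviceState) -> float:
    """Energy (joules) to execute the task on the device."""
    return dev.cpu_power_w * task.cycles / dev.cpu_hz


def offload_energy_j(task: Task, dev: DeviceState) -> float:
    """Energy (joules) the device spends sending input and waiting for the result."""
    transfer_s = task.input_bytes / dev.bytes_per_s
    remote_s = task.cycles / (dev.cpu_hz * dev.server_speedup)
    return dev.radio_power_w * transfer_s + dev.idle_power_w * remote_s


def should_offload(task: Task, dev: DeviceState) -> bool:
    """Offload only when it costs the device less energy than local execution."""
    return offload_energy_j(task, dev) < local_energy_j(task, dev)
```

Under this model, compute-heavy tasks with small inputs favor offloading, while data-heavy tasks with little computation favor local execution. Re-evaluating `should_offload` as `bytes_per_s` changes mirrors the adaptation to communication costs described in the abstract.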

    Comparison of strategies for computing bounds on NoCs

    The Kalray MPPA2-256 processor integrates 256 processing cores and 32 management cores on a chip. These cores are grouped into clusters, and the clusters are connected by a high-performance network on chip (NoC). This NoC provides hardware mechanisms (egress traffic limiters) that can be configured to offer bounded latencies. This paper presents how network calculus can be used to bound these latencies while computing the routes of data flows, using linear programming. It then shows how other approaches can also be used and adapted to analyze this NoC. Their performance is then compared on three case studies: two small ones taken from previous studies, and a realistic one with 128 or 256 flows. On these case studies, it shows that modeling the shaping introduced by links is of major importance for obtaining accurate bounds. Moreover, when packets are of constant size, Total Flow Analysis gives, on average, bounds 20%-25% smaller than all other methods.
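
To make the notion of a network-calculus latency bound concrete, here is a minimal sketch of the textbook single-flow case. It is not the paper's linear-programming formulation or Total Flow Analysis. It assumes a flow constrained by a token-bucket arrival curve (burst b, rate r) crossing servers that each offer a rate-latency service curve (rate R, latency T); all class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TokenBucket:
    burst: float   # b: maximum burst size
    rate: float    # r: sustained rate


@dataclass(frozen=True)
class RateLatency:
    rate: float      # R: guaranteed service rate
    latency: float   # T: worst-case initial delay


def delay_bound(flow: TokenBucket, server: RateLatency) -> float:
    """Classic bound T + b/R, valid when the flow rate does not exceed R."""
    if flow.rate > server.rate:
        raise ValueError("unstable: flow rate exceeds service rate")
    return server.latency + flow.burst / server.rate


def output_curve(flow: TokenBucket, server: RateLatency) -> TokenBucket:
    """Arrival curve of the flow after the server: burst grows by r*T."""
    return TokenBucket(flow.burst + flow.rate * server.latency, flow.rate)


def hop_by_hop_bound(flow: TokenBucket, path: Sequence[RateLatency]) -> float:
    """Sum of per-hop bounds, propagating the increased burst downstream."""
    total = 0.0
    for server in path:
        total += delay_bound(flow, server)
        flow = output_curve(flow, server)
    return total


def end_to_end_bound(flow: TokenBucket, path: Sequence[RateLatency]) -> float:
    """Bound from the concatenated service curve (the burst is paid only once)."""
    combined = RateLatency(min(s.rate for s in path), sum(s.latency for s in path))
    return delay_bound(flow, combined)
```

For a flow with b = 4 and r = 1 crossing two servers with R = 2 and T = 1, the hop-by-hop sum is 6.5 while the concatenated bound is 4.0. This illustrates why the choice of analysis method can change the tightness of the bounds considerably.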

    Exploring resource/performance trade-offs for streaming applications on embedded multiprocessors

    Embedded system design is challenged by the gap between ever-increasing customer demands and limited resource budgets. Tough competition demands ever-shorter time-to-market and product lifecycles. To solve, or at least alleviate, these issues, designers and manufacturers need model-based quantitative analysis techniques for early design-space exploration to study the trade-offs of different implementation candidates. Moreover, modern embedded applications, especially the streaming applications addressed in this thesis, face increasingly dynamic input content, and the platforms they run on are more flexible and allow runtime configuration. Quantitative analysis techniques for embedded system design have to be able to handle such dynamic, adaptable systems. This thesis makes the following contributions: (1) a resource-aware extension to the Synchronous Dataflow (SDF) model of computation; (2) trade-off analysis techniques, both in the time-domain and in the iteration-domain (i.e., on an SDF iteration basis), with support for resource sharing; (3) bottleneck-driven design-space exploration techniques for resource-aware SDF; (4) a game-theoretic approach to controller synthesis, guaranteeing performance under dynamic input. As a first contribution, we propose a new model, an extension of static synchronous dataflow (SDF) graphs that allows the explicit modeling of resources with consistency checking. The model is called resource-aware SDF (RASDF). The extension enables us to investigate resource sharing and to explore different scheduling options (ways to allocate the resources to the different tasks) using state-space exploration techniques. Consistent SDF and RASDF graphs have the property that an execution occurs in so-called iterations. An iteration typically corresponds to the processing of a meaningful piece of data, and it returns the graph to its initial state. 
    On multiprocessor platforms, iterations may be executed in a pipelined fashion, which makes performance analysis challenging. As the second contribution, this thesis develops trade-off analysis techniques for RASDF, both in the time-domain and in the iteration-domain (i.e., on an SDF iteration basis), to dimension resources on platforms. The time-domain analysis allows interleaving of different iterations, but the size of the explored state space grows quickly. The iteration-based technique trades the potential of interleaving iterations for a compact iteration state space. An efficient bottleneck-driven design-space exploration technique for streaming applications, the third main contribution of this thesis, is derived from analysis of the critical cycle of the state space, to reveal the bottleneck resources that limit the throughput. All techniques are based on state-based exploration. They enable system designers to tailor their platform to the required applications, based on their own specific performance requirements. Pruning techniques for efficient exploration of the state space have been developed. Pareto dominance in terms of performance and resource usage is used for exact pruning, and approximation techniques are used for heuristic pruning. Finally, the thesis investigates dynamic scheduling techniques to respond to dynamic changes in input streams. The fourth contribution of this thesis is a game-theoretic approach to controller synthesis that selects the appropriate schedules in response to dynamic inputs from the environment. The approach transforms the explored iteration state space of a scenario- and resource-aware SDF (SARA SDF) graph into a bipartite game graph, and maps the controller synthesis problem to the problem of finding a winning positional strategy in a classical mean payoff game. 
    A winning strategy of the game can be used to synthesize a schedule controller for the system that is guaranteed to satisfy the throughput requirement given by the designer.
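
The notions of consistency and iteration used above come from standard SDF theory and can be illustrated with a short sketch. It is not the thesis's RASDF tooling. It solves the balance equations of a plain, connected SDF graph to find the smallest number of firings per actor that returns every channel to its initial token count; the `Edge` tuple layout and the function name are assumptions made for illustration.

```python
from fractions import Fraction
from functools import reduce
from math import gcd
from typing import Dict, List, Optional, Tuple

# Assumed edge layout: (producer, consumer, tokens produced, tokens consumed).
Edge = Tuple[str, str, int, int]


def repetition_vector(actors: List[str], edges: List[Edge]) -> Optional[Dict[str, int]]:
    """Return firings per actor for one iteration, or None if inconsistent."""
    rate: Dict[str, Fraction] = {actors[0]: Fraction(1)}
    pending = [actors[0]]
    while pending:
        actor = pending.pop()
        for src, dst, prod, cons in edges:
            # Balance equation on each edge: q[src] * prod == q[dst] * cons.
            if src == actor:
                other, value = dst, rate[actor] * prod / cons
            elif dst == actor:
                other, value = src, rate[actor] * cons / prod
            else:
                continue
            if other not in rate:
                rate[other] = value
                pending.append(other)
            elif rate[other] != value:
                return None  # balance equations have no non-trivial solution
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")
    # Scale the rational solution to the smallest positive integers.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (r.denominator for r in rate.values()), 1)
    counts = {a: int(r * scale) for a, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {a: c // common for a, c in counts.items()}
```

For a chain in which A produces 2 tokens per firing, B consumes 3 and produces 1, and C consumes 2, one iteration fires A three times, B twice, and C once. After these firings the net token change on every channel is zero, which is the sense in which an iteration returns the graph to its initial state.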