13 research outputs found

    Ocelotl: Large Trace Overviews Based on Multidimensional Data Aggregation

    Performance analysis of parallel applications is commonly based on execution traces, which can be investigated through visualization techniques. These techniques scale poorly when traces grow in both time (many recorded events) and space (many processing elements), a very common situation for current large-scale HPC applications. In this paper we present an approach to tackle such scenarios and give a correct overview of the behavior recorded in very large traces. Two configurable and controlled aggregation-based techniques are presented: one based exclusively on temporal aggregation, and another that consists of a spatiotemporal aggregation algorithm. The paper also details the implementation and evaluation of these techniques in Ocelotl, a performance analysis and visualization tool that overcomes current graphical and interpretation limitations by providing a concise overview of the behavior recorded in traces. The experimental results show that Ocelotl helps detect anomalies quickly and accurately in 8 GB traces containing up to two hundred million events.

    Génération automatique d'architectures multiprocesseurs hétérogènes: aspects logiciel et matériel

    Embedded systems are now ubiquitous, and increasing integration capacity allows for more features and capabilities. This trend has led to the emergence of Heterogeneous Multiprocessor Systems-on-Chip (H-MPSoC), which provide a way to meet the cost and performance constraints inherent to embedded systems. However, they also make designing and programming such systems a long and arduous process. The skills required, along with the long development time, are obstacles to their diffusion. It is thus necessary to develop tools that free designers from architectural and programming details, so that they can focus on the tasks where they bring added value. The objective is to automate the tedious tasks that burden the design of H-MPSoC, in particular on FPGA, by providing a higher level of abstraction following a method that brings together High-Level Synthesis and hardware/software co-design, beyond existing solutions, which are either incomplete or unfit. The presented work aims to answer these problems. It introduces a design framework that relies on automating tedious tasks while letting designers apply their expertise where they want to. For this, we rely on an architecture model defined with a high-level formalism independent of implementation details, which compensates for the lack of an underlying multiprocessor architecture in FPGAs. This specification model also allows designers to provide design constraints in accordance with their level of expertise or involvement. The design space exploration is implemented as a scalable algorithm relying on fast and accurate estimation techniques. A method for the exploration of hardware accelerators, based on high-level synthesis to provide fast cost estimations, is introduced. Finally, the integration of model-driven engineering methods enables portability and reuse by generating the final design implementation, including target-specific implementation files. The framework is validated through two case studies: an MJPEG video decoder and a more complex face detection application.

    Memory Organisation Cartography & Analysis

    Although performance analysis is one of the most important phases of High Performance Computing application development, analysis tools are complex to use. Most of the time, they rely on performance counters, which are hard for the end user to understand. Therefore only a few advanced users are able to carry out an efficient and precise analysis. Moreover, these counters focus on the processor, while most performance issues are due to poor memory usage. In this report, we present MOCA, a new kind of analysis tool that provides an overview of memory accesses over time. We present MOCA's trace visualization through Ocelotl, a tool designed to provide an aggregated view of data while losing as little information as possible. Finally, we explain how, using these tools, the user can identify memory usage patterns and execution phases, and how to interpret them.

    A framework for high-level synthesis of heterogeneous MP-SoC

    In this paper we propose an ESL synthesis framework that, from the C code of an application and a description of a generic architecture, automatically explores and generates a complete synthesizable version of an H-MPSoC architecture along with the adapted application code. We developed a Design Space Exploration (DSE) algorithm that merges hardware specialization, data-parallelism exploration, processor instantiation and task mapping according to user performance and cost constraints. We also inserted HLS in the DSE loop to get fast exploration of hardware acceleration. A new ESL framework is presented; it combines our contributions with legacy tools from our team and another team. We validated our framework with a case study of an MJPEG decoder.

    Experiment Centric Teaching for Reconfigurable Processors

    This paper presents a setup for teaching configware to master's students. Our approach focuses on experimentation and learning-by-doing while being supported by research activity. The central project we give to students consists of building a simple RISC processor that supports an extensible instruction set thanks to its reconfigurable functional unit (RFU). The originality lies in the students' use of the Biniou framework. Biniou is a research tool whose approach covers tasks ranging from describing the RFU and synthesizing it as VHDL code to implementing applications on it. Once done, students exhibit a deep understanding of the domain, which ensures they can adapt quickly to state-of-the-art techniques.

    TBES: Template-Based Exploration and Synthesis of Heterogeneous Multiprocessor Architectures on FPGA

    This paper describes TBES, an end-to-end software environment for synthesizing multi-task applications on FPGAs. The implementation follows a template-based approach for creating heterogeneous multiprocessor architectures. Heterogeneity stems from the use of general-purpose processors along with custom accelerators. Experimental results demonstrate substantial speedup for several classes of applications. Furthermore, this work reduces development costs and saves development time for the software architect, the domain expert, and the optimization expert alike. This work provides a framework that brings together various existing tools and optimization algorithms. The advantages are manifold: modularity and flexibility, easy customization for best-fit algorithm selection, durability and evolution over time, and legacy preservation, including the domain expert's know-how. In addition to the use of architecture templates for the overall system, a second contribution lies in using high-level synthesis (HLS) to promote exploration of hardware IPs. The domain expert, who best knows which tasks are good candidates for hardware implementation, selects parts of the initial application to be potentially synthesized as dedicated accelerators. As a consequence, the general HLS problem turns into a constrained and more tractable one, and automation capabilities eliminate the need for tedious and error-prone manual processes during domain space exploration. The automation only takes place once the application has been broken down into concurrent tasks by the designer, who can then drive the synthesis process with a set of parameters provided by TBES to balance tradeoffs between optimization effort and quality of results. The approach is demonstrated step by step, up to FPGA implementation and execution, with an MJPEG benchmark and a complex Viola-Jones face detection application. We show that TBES achieves up to 10X speedup, reduces development times and widens design space exploration.

    Fast Template-based Heterogeneous MPSoC Synthesis on FPGA

    Our contribution lies in offering a fast and parametrized domain-space exploration to the designer, whose expertise drives the whole process and who remains the actor of added-value creation. In this paper, we present two new features and two important improvements of our H-MPSoC synthesis framework. The first feature is a new template-based approach for automated design space exploration and synthesis. A template describes an architecture model for a specific domain and has three levels of specification, each representing a different level of design expertise. We also rely on the Model-Driven Architecture (MDA) paradigm to provide flexibility, reusability and code generation for different FPGA targets. We have refined the communication models to get more accurate performance estimations. Finally, we have improved our mapping decision algorithm, which drastically reduces the simulation time. The output is the synthesizable code of the hardware architecture, the adapted code of the C application and the project files for FPGA design tools. We use an MJPEG decoder as a case study to validate our framework on a Xilinx FPGA.

    phospho-Ku70 induced by DNA damage interacts with RNA Pol II and promotes the formation of 53BP1 foci to ensure optimal cNHEJ

    Canonical non-homologous end-joining (cNHEJ) is the prominent mammalian DNA double-strand break (DSB) repair pathway and is operative throughout the cell cycle. Phosphorylation of Ku70 at ser27-ser33 (pKu70) is induced by DNA DSBs and has been shown to regulate cNHEJ activity, but the underlying mechanism remained unknown. Here, we established that following DNA damage induction, Ku70 moves from nucleoli to the sites of damage, and once linked to DNA, it is phosphorylated. Notably, the novel emanating functions of pKu70 are evidenced through the recruitment of RNA Pol II and the concomitant formation of phospho-53BP1 foci. Phosphorylation is also a prerequisite for the dynamic release of Ku70 from the repair complex through neddylation-dependent ubiquitylation. Although the non-phosphorylable ala-Ku70 form does not compromise the formation of the NHEJ core complex per se, cells expressing this form displayed constitutive and stress-inducible chromosomal instability. Consistently, upon targeted induction of DSBs by the I-SceI meganuclease in an intrachromosomal reporter substrate, cells expressing pKu70, rather than ala-Ku70, are protected against the joining of distal DNA ends. Collectively, our results underpin the essential role of pKu70 in the orchestration of DNA repair execution in living cells and substantiate how it paves the way for the maintenance of genome stability.