    Mira: A Framework for Static Performance Analysis

    The performance model of an application can provide understanding about its runtime behavior on particular hardware. Such information can be analyzed by developers for performance tuning. However, model building and analysis are frequently ignored during software development until performance problems arise, because they require significant expertise and can involve many time-consuming application runs. In this paper, we propose a fast, accurate, flexible and user-friendly tool, Mira, for generating performance models by applying static program analysis, targeting scientific applications running on supercomputers. We parse both the source code and the binary to estimate performance attributes with better accuracy than considering just source or just binary code. Because our analysis is static, the target program does not need to be executed on the target architecture, which enables users to perform analysis on available machines instead of conducting expensive experiments on potentially expensive resources. Moreover, statically generated models enable performance prediction on non-existent or unavailable architectures. In addition to flexibility, because model generation time is significantly reduced compared to dynamic analysis approaches, our method is suitable for rapid application performance analysis and improvement. We present several scientific application validation results to demonstrate the current capabilities of our approach on small benchmarks and a mini application
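
The idea of building a performance model without running the program can be illustrated with a toy sketch. It is not Mira's implementation: it works on Python source only (the abstract describes parsing both source and binary), and the names `count_arith_ops`, `_trip_count`, `_count_block`, and `KERNEL` are hypothetical. It assumes loops of the form `for i in range(n)` and multiplies the number of arithmetic operations in each loop body by the loop's trip count.

```python
import ast


def count_arith_ops(source, params):
    """Estimate how many arithmetic operations `source` would execute, without running it.

    `params` maps symbolic loop bounds (e.g. "n") to concrete values.
    """
    return _count_block(ast.parse(source).body, params)


def _trip_count(loop, params):
    """Trip count of `for _ in range(<constant or parameter>)`; other forms are unsupported."""
    call = loop.iter
    if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            and call.func.id == "range" and len(call.args) == 1):
        raise ValueError("unsupported loop form")
    bound = call.args[0]
    if isinstance(bound, ast.Constant):
        return int(bound.value)
    if isinstance(bound, ast.Name):
        return params[bound.id]
    raise ValueError("unsupported loop bound")


def _count_block(stmts, params):
    """Sum operation counts over statements, scaling loop bodies by their trip counts."""
    total = 0
    for stmt in stmts:
        if isinstance(stmt, ast.For):
            total += _trip_count(stmt, params) * _count_block(stmt.body, params)
        else:
            # Straight-line statement: count binary and augmented-assignment operators.
            total += sum(isinstance(node, (ast.BinOp, ast.AugAssign))
                         for node in ast.walk(stmt))
    return total


# Hypothetical kernel: one add and one multiply per inner iteration.
KERNEL = """
for i in range(n):
    for j in range(m):
        acc = acc + a * b
"""

if __name__ == "__main__":
    print(count_arith_ops(KERNEL, {"n": 100, "m": 50}))
```

Because the kernel is only parsed, never executed, the same model can be evaluated for any values of `n` and `m`, which loosely mirrors the abstract's point about predicting behavior for machines or problem sizes that are not at hand.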

    A Survey of Phase Classification Techniques for Characterizing Variable Application Behavior

    Adaptable computing is an increasingly important paradigm that specializes system resources to variable application requirements, environmental conditions, or user requirements. Adapting computing resources to variable application requirements (or application phases) is otherwise known as phase-based optimization. Phase-based optimization takes advantage of application phases, that is, execution intervals of an application that behave similarly, to enable effective and beneficial adaptability. For phase-based optimization to be effective, the phases must first be classified to determine when application phases begin and end, and to ensure that system resources are accurately specialized. In this paper, we present a survey of phase classification techniques that have been proposed to exploit the advantages of adaptable computing through phase-based optimization. We focus on recent techniques and classify these techniques with respect to several factors in order to highlight their similarities and differences. We divide the techniques by their major defining characteristics: online/offline and serial/parallel. In addition, we discuss other characteristics such as prediction and detection techniques, the characteristics used for prediction, interval type, etc. We also identify gaps in the state of the art and discuss future research directions to enable and fully exploit the benefits of adaptable computing. Comment: To appear in IEEE Transactions on Parallel and Distributed Systems (TPDS)
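
One simple way to picture phase classification is sketched below. It is not taken from the survey. It assumes that each execution interval is summarized by a vector of counts (for example, how often each code region ran) and that an interval joins an existing phase when its normalized vector lies within a Manhattan-distance threshold of that phase's first representative. The names `classify_phases`, `phase_boundaries`, and the threshold value are hypothetical.

```python
def normalize(vector):
    """Scale a count vector so its entries sum to 1 (all-zero vectors are returned unchanged)."""
    total = sum(vector)
    return [x / total for x in vector] if total else list(vector)


def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify_phases(intervals, threshold):
    """Assign a phase id to each interval, in execution order.

    An interval joins the first known phase whose representative is within
    `threshold`; otherwise it starts a new phase.
    """
    representatives = []  # one normalized vector per discovered phase
    labels = []
    for counts in intervals:
        signature = normalize(counts)
        for phase_id, rep in enumerate(representatives):
            if manhattan(signature, rep) <= threshold:
                labels.append(phase_id)
                break
        else:
            representatives.append(signature)
            labels.append(len(representatives) - 1)
    return labels


def phase_boundaries(labels):
    """Indices of intervals where the phase differs from the previous interval."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


if __name__ == "__main__":
    # Hypothetical per-interval counts for three code regions.
    trace = [[90, 10, 0], [88, 12, 0], [5, 5, 90], [6, 4, 90], [91, 9, 0]]
    labels = classify_phases(trace, threshold=0.2)
    print(labels, phase_boundaries(labels))
```

Processing intervals one at a time, as here, corresponds to the online side of the online/offline split mentioned in the abstract; an offline variant would see the whole trace before assigning phases.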

    ASAM: Automatic Architecture Synthesis and Application Mapping

    Abstract: This paper focuses on mastering the automatic architecture synthesis and application mapping for heterogeneous massively-parallel MPSoCs based on customizable application-specific instruction-set processors (ASIPs). It presents an overview of the research currently being performed in the scope of the European project ASAM (Architecture Synthesis and Application Mapping) of the ARTEMIS program. The paper briefly presents the results of our analysis of the main problems to be solved and challenges to be faced in the design of such heterogeneous MPSoCs. It explains which system, design, and electronic design automation (EDA) concepts seem to be adequate to resolve the problems and address the challenges. Finally, it introduces and briefly discusses the design flow and its main stages proposed by the ASAM project consortium to enable an effective and efficient solution of these problems. Index Terms: embedded systems, heterogeneous multiprocessor system-on-chip (MPSoC), customizable ASIPs, architecture synthesis, MPSoC and ASIP design automation

    Generalization of Decremental Performance Analysis toward Differential Analysis (Généralisation de l’analyse de performance décrémentale vers l’analyse différentielle)

    A crucial step in the process of application performance analysis is the accurate detection of program bottlenecks. A bottleneck is any event that contributes to extending the execution time. Determining their cause is important for application developers, as it enables them to detect code design and generation flaws. Bottleneck detection is becoming a difficult art. Techniques such as event counts, which succeeded in finding bottlenecks easily in the past, have become less effective because of the increasing complexity of modern microprocessors and the introduction of parallelism at several levels. Consequently, there is a real need for new analysis approaches to face these challenges. Our work focuses on performance analysis and bottleneck detection of compute-intensive loops in scientific applications. We work on Decan, a performance analysis and bottleneck detection tool, which offers an interesting and promising approach called Decremental Analysis. The tool, which operates at the binary level, is based on the idea of performing controlled modifications on the instructions of a loop and comparing the new version (called a variant) to the original one. The goal is to assess the cost of specific events, and thus whether or not bottlenecks exist. Our first contribution consists of extending Decan with new variants that we designed, tested and validated. Based on these variants, we developed analysis methods which we used to characterize hot loops and find their bottlenecks. We later integrated the tool into a performance analysis methodology (Pamda) which coordinates several analysis tools in order to achieve a more efficient application performance analysis. Second, we introduce several improvements to the Decan tool. Techniques developed to preserve the control flow of the modified programs made it possible to use the tool on real applications instead of extracted kernels. Support for parallel programs (thread and process based) was also added. Finally, since our tool relies primarily on execution time for its analysis process, we study the opportunity of also using other hardware-generated events, through a study of their stability, precision and overhead.
    One of the most crucial steps in analyzing an application's performance is the detection of bottlenecks. Since a bottleneck is any event that contributes to lengthening the execution time, detecting its causes is important for application developers in order to understand code design and generation flaws. However, bottleneck detection is becoming a difficult art. In the past, techniques based on counting events found bottlenecks easily. Now, the increased complexity of modern microarchitectures and the introduction of several levels of parallelism have made these techniques much less effective. Consequently, there is a real need to think about new approaches. Our work concerns the development of performance evaluation tools for compute loops taken from scientific applications. We work on Decan, a performance analysis tool that offers an interesting and promising approach called Decremental Analysis. Decan rests on the idea of making controlled changes to the loops of the program and comparing the resulting version (called a variant) with the original version, thereby detecting the presence or absence of bottlenecks. First, we enriched Decan with new variants that we designed, tested and validated. These variants are then integrated into an advanced performance analysis called Differential Analysis. We integrated the tool and the analysis into a more global performance analysis methodology called Pamda. We also describe the various contributions to the Decan tool. In particular, we detail the techniques for preserving the program's control structures, as well as the added support for parallel programs. Finally, we carry out a statistical study to verify the possibility of using event counters, other than execution time, as comparison metrics between the variants Deca
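
The decremental idea (remove one class of work from a loop, re-time it, and attribute the time difference to what was removed) can be sketched at source level as follows. This is only an analogy: Decan is described as operating on binaries, and the variant names `no_arith` and `no_load`, the kernels, and the helper functions are hypothetical. As with real variants, the modified loops no longer compute a meaningful result; only their timing matters.

```python
import statistics
import time


def original(data):
    """Reference loop: reads every element and does arithmetic on it."""
    acc = 0.0
    for x in data:
        acc += x * 1.0001
    return acc


def variant_no_arith(data):
    """Variant with the arithmetic removed; the element reads are kept."""
    acc = 0.0
    for x in data:
        acc = x
    return acc


def variant_no_load(data):
    """Variant with the element reads removed; the arithmetic is kept."""
    acc = 0.0
    x = 1.0
    for _ in range(len(data)):
        acc += x * 1.0001
    return acc


def median_seconds(fn, data, repeats=5):
    """Median wall-clock time of `fn(data)` over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def relative_savings(t_original, t_variants):
    """Fraction of the original time saved by each variant.

    A large saving suggests the removed work was a bottleneck.
    """
    return {name: (t_original - t) / t_original for name, t in t_variants.items()}


if __name__ == "__main__":
    data = [float(i) for i in range(200_000)]
    t_ref = median_seconds(original, data)
    report = relative_savings(t_ref, {
        "no_arith": median_seconds(variant_no_arith, data),
        "no_load": median_seconds(variant_no_load, data),
    })
    for name, saving in report.items():
        print(f"{name}: {saving:+.1%} of original time")
```

Timings from an interpreter are noisy and say little about hardware bottlenecks, so the printed numbers should be read only as a demonstration of the comparison step, not as a measurement method.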

    Modernization of a legacy system:event streaming with Apache Kafka and Spring Boot

    Abstract. In this thesis, we design, implement, and evaluate a brand new replacement, the Watcher, for a legacy system built over two decades ago. The Watcher is able to track changes in our PDM system and notify users of the changes by email or as a push notification using SSE. Functional requirements for the new system come from the legacy system, including the possibility to create subscriptions with a wide range of options to filter out redundant data traffic. The Watcher will also be able to carry out all the operations of its predecessor with increased performance and efficiency. The main focus is on scalability, maintainability, and fault tolerance. The reason for building a new system is mainly the cost of maintaining and further developing the legacy system, as well as features removed due to obsolete technologies. In the literature review, we go through the theory of the technologies related to the project. We create a REST API with Spring Boot for interactions between users and the system, implement a powerful event streaming and processing environment using Apache Kafka, and build a message service responsible for providing information via scheduled emails or SSE. In the end, we use Docker to containerize all the services. In the project design, we present functional as well as technical requirements that we use later on to evaluate the project’s success. We also compare the legacy system to the new one using metrics such as speed and ease of the installation process. In the end, we discuss the project’s future, including steps before going to production such as automatic testing, and further development for years to come such as orchestration.
    Modernizing a legacy service: a real-time system using Apache Kafka and Spring Boot. Abstract. In this thesis, we design, implement, and evaluate a new system that will replace a legacy system created over two decades ago. This new system, the Watcher, can track changes in our PDM system and notify users of the changes by email and by push notifications. We reuse functional requirements that were already defined for the old system, for example creating subscriptions with multiple filters while reducing unnecessary data traffic. The Watcher can perform every task that the old system could, and additionally offers better performance and efficiency. The main focus of the system is scalability, maintainability, and fault tolerance. In the literature review, we go through the theory of the technologies related to the project. Using the Spring Boot framework, we implement a REST API through which users can communicate with the system. We also build an efficient environment for data processing and real-time messaging using Apache Kafka. Finally, we create a message service that is responsible for informing users through a technology called SSE and by sending emails at the times users request. Lastly, we place all the services in containers using Docker. In the project design section, we present both the functional and the technical requirements, which we later use to evaluate the project's success. We also compare the old and the new system using metrics such as speed and the simplicity of the installation process. At the end, we discuss the project's future, including steps that should be completed before the system can be taken into production use, such as automated testing, and the development of functionality in the coming years
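
The subscription-and-notification flow described above can be pictured with a small in-memory sketch. It does not use Kafka or Spring Boot and is not the thesis's code: the `ChangeEvent` and `Subscription` types, their fields, and the `change` event name are hypothetical. The one part taken from a standard is the Server-Sent Events wire format, in which each message consists of `event:` and `data:` lines terminated by a blank line.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ChangeEvent:
    """A change observed in the tracked system (fields are illustrative)."""
    item_id: str
    item_type: str
    change: str


@dataclass(frozen=True)
class Subscription:
    """A user's filter; an empty `item_types` set means 'any item type'."""
    user: str
    item_types: frozenset = frozenset()

    def matches(self, event):
        return not self.item_types or event.item_type in self.item_types


def recipients(event, subscriptions):
    """Users whose subscription filters accept this event."""
    return [sub.user for sub in subscriptions if sub.matches(event)]


def to_sse(event, event_name="change"):
    """Frame an event as one Server-Sent Events message."""
    payload = json.dumps(asdict(event))
    return f"event: {event_name}\ndata: {payload}\n\n"


if __name__ == "__main__":
    event = ChangeEvent(item_id="P-1", item_type="drawing", change="revised")
    subs = [
        Subscription("ann", frozenset({"drawing"})),
        Subscription("bob", frozenset({"part"})),
        Subscription("cy"),
    ]
    print(recipients(event, subs))
    print(to_sse(event), end="")
```

Filtering before framing means that users who did not subscribe to an item type never receive a message for it, which is the traffic-reducing role the abstract assigns to subscriptions.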

    ASAM: Automatic Architecture Synthesis and Application Mapping

    This paper focuses on mastering the automatic architecture synthesis and application mapping for heterogeneous massively-parallel MPSoCs based on customizable application-specific instruction-set processors (ASIPs). It presents an overview of the research currently being performed in the scope of the European project ASAM of the ARTEMIS program. The paper briefly presents the results of our analysis of the main problems to be solved and challenges to be faced in the design of such heterogeneous MPSoCs. It explains which system, design, and electronic design automation (EDA) concepts seem to be adequate to resolve the problems and address the challenges. Finally, it introduces and briefly discusses the ASAM design flow and its main stages