
    On Characterizing the Data Movement Complexity of Computational DAGs for Parallel Execution

    Technology trends are making the cost of data movement increasingly dominant, in terms of both energy and time, over the cost of performing arithmetic operations in computer systems. The fundamental ratio of aggregate data movement bandwidth to total computational power (also referred to as the machine balance parameter) in parallel computer systems is decreasing. It is therefore of considerable importance to characterize the inherent data movement requirements of parallel algorithms, so that the minimal architectural balance parameters required to support them on future systems can be well understood. In this paper, we develop an extension of the well-known red-blue pebble game to derive lower bounds on the data movement complexity of the parallel execution of computational directed acyclic graphs (CDAGs) on parallel systems. We model multi-node multi-core parallel systems, with the total physical memory distributed across the nodes (which are connected through some interconnection network) and with a multi-level shared cache hierarchy for the processors within a node. We also develop new techniques for the lower-bound characterization of non-homogeneous CDAGs. We demonstrate the use of the methodology by analyzing the CDAGs of several numerical algorithms to develop lower bounds on the data movement required for their parallel execution.
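
The machine balance parameter mentioned above admits a simple quantitative reading: a kernel whose inherent data movement per operation exceeds the machine's words-per-flop ratio is necessarily bandwidth-bound. The sketch below illustrates that comparison for square matrix multiplication using the classical sequential red-blue pebble-game bound; the hardware numbers and fast-memory size are placeholder assumptions, and the bound shown is the classical sequential one, not the parallel bounds developed in the paper.

```python
# Illustrative sketch (not the paper's model): compares a kernel's inherent
# data-movement needs against a machine's balance parameter, roofline-style.
# All hardware numbers below are made-up placeholders.

import math

def machine_balance(bandwidth_words_per_s: float, peak_flops: float) -> float:
    """Machine balance = aggregate data-movement bandwidth / total compute rate."""
    return bandwidth_words_per_s / peak_flops

def matmul_io_lower_bound(n: int, fast_mem_words: int) -> float:
    """Classical sequential red-blue pebble-game bound for n x n matrix multiply:
    data movement is Omega(n^3 / sqrt(M)) words for a fast memory of M words."""
    return n ** 3 / math.sqrt(fast_mem_words)

def is_bandwidth_bound(n: int, fast_mem_words: int,
                       bandwidth_words_per_s: float, peak_flops: float) -> bool:
    flops = 2 * n ** 3                      # multiply-adds in classical matmul
    words_moved = matmul_io_lower_bound(n, fast_mem_words)
    # Kernel needs more words per flop than the machine supplies -> bandwidth bound.
    return (words_moved / flops) > machine_balance(bandwidth_words_per_s, peak_flops)

# Example with placeholder hardware: 100 GW/s bandwidth, 10 Tflop/s, 32 MWord fast memory.
print(is_bandwidth_bound(n=4096, fast_mem_words=32 * 2**20,
                         bandwidth_words_per_s=1e11, peak_flops=1e13))
```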

    Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop

    We summarize the accomplishments of a multi-disciplinary workshop exploring the computational and scientific issues surrounding the discovery of linguistic units (subwords and words) in a language without orthography. We study the replacement of orthographic transcriptions by images and/or translated text in a well-resourced language to help unsupervised discovery from raw speech. Comment: Accepted to ICASSP 2018.
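
One ingredient of such zero-resource pipelines is clustering acoustic frames into pseudo-subword categories without any transcriptions. The toy sketch below illustrates the idea with synthetic 13-dimensional features and k-means; it is not one of the workshop's systems, and the feature values, cluster count, and library choices are assumptions made purely for illustration.

```python
# Minimal illustrative sketch of one ingredient of zero-resource unit discovery:
# clustering acoustic frames into pseudo-phone categories. This is a toy stand-in,
# not a system from the workshop; the "features" here are synthetic.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Pretend we extracted 13-dim MFCC-like frames from untranscribed speech.
frames = np.vstack([rng.normal(loc=c, scale=0.5, size=(200, 13))
                    for c in (-2.0, 0.0, 2.0)])        # 3 hidden "units"

# Cluster frames; each cluster id acts as a discovered subword-like label.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(frames)
pseudo_labels = kmeans.labels_

# A transcription-free "decoding" of an utterance is then its sequence of cluster ids.
print(pseudo_labels[:20])
```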

    Virtual Environments for multiphysics code validation on Computing Grids

    In this paper we advocate the use of grid-based infrastructures designed to offer a seamless approach to numerical expert users, i.e., multiphysics application designers. The approach relies on sophisticated computing environments based on computing grids that connect heterogeneous computing resources: mainframes, PC clusters and workstations running multiphysics codes and utility software, e.g., visualization tools. It builds on concepts defined by the HEAVEN consortium, a European scientific consortium including industrial partners from the aerospace, telecommunication and software industries, as well as academic research institutes. The HEAVEN consortium is currently working on a project that aims to create advanced service platforms. It is intended to enable "virtual private grids" supporting various environments for users through a suitable high-level interface. This will become the basis for future generalized services allowing the integration of various services without the need to deploy specific grid infrastructures.

    The Zero Resource Speech Challenge 2017

    We describe a new challenge aimed at discovering subword and word units from raw speech. This challenge is the follow-up to the Zero Resource Speech Challenge 2015. It aims at constructing systems that generalize across languages and adapt to new speakers. The design features and evaluation metrics of the challenge are presented and the results of seventeen models are discussed. Comment: IEEE ASRU (Automatic Speech Recognition and Understanding) 2017, Okinawa, Japan.
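
A common way to score such systems without transcriptions is an ABX discriminability test: a representation is good if tokens of the same sound category lie closer to each other than to tokens of a different category. The sketch below computes a heavily simplified ABX error over fixed-size embeddings; the real evaluation operates on frame-level representations with DTW alignment, and the data here are synthetic.

```python
# Simplified sketch of an ABX-style discriminability test. Real ABX scoring works
# on frame-level representations with DTW alignment and tie handling; here each
# token is a single fixed vector and ties count as errors.

import numpy as np
from itertools import product

def abx_error(a_tokens: np.ndarray, b_tokens: np.ndarray) -> float:
    """Fraction of (a, b, x) triples, with x drawn from category A,
    where x is *not* closer to a than to b (lower is better)."""
    errors, total = 0, 0
    for i, x in enumerate(a_tokens):
        others = np.delete(a_tokens, i, axis=0)   # x must be a different token than a
        for a, b in product(others, b_tokens):
            total += 1
            if np.linalg.norm(x - a) >= np.linalg.norm(x - b):
                errors += 1
    return errors / total

rng = np.random.default_rng(1)
cat_a = rng.normal(0.0, 1.0, size=(10, 16))   # embeddings of sound category A
cat_b = rng.normal(1.5, 1.0, size=(10, 16))   # embeddings of sound category B
print(f"ABX error: {abx_error(cat_a, cat_b):.3f}")
```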

    TABARNAC: Tools for Analyzing Behavior of Applications Running on NUMA Architecture

    In modern parallel architectures, memory accesses represent a common bottleneck. Thus, optimizing the way applications access memory is an important way to improve performance and energy consumption. Memory accesses matter even more on NUMA machines, as the access time to a piece of data depends on its location in memory. Many efforts have been made to develop adaptive tools that improve memory accesses at runtime by optimizing the mapping of data and threads to NUMA nodes. However, these tools are not able to change the memory access pattern of the original application; therefore, code written without considering memory performance might not benefit from them. Moreover, automatic mapping tools take time to converge towards the best mapping, losing optimization opportunities. A deeper understanding of the memory behavior can help optimize it, removing the need for runtime analysis. In this paper, we present TABARNAC, a tool for analyzing the memory behavior of parallel applications with a focus on NUMA architectures. TABARNAC provides a new visualization of the memory access behavior, focusing on the distribution of accesses by thread and by data structure. Such visualizations allow the developer to easily understand why performance issues occur and how to fix them. Using TABARNAC, we explain why some applications do not benefit from data and thread mapping. Moreover, we propose several code modifications to improve the memory access behavior of several parallel applications.
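
To make the kind of view TABARNAC provides concrete, the sketch below tabulates how (fabricated) memory accesses are distributed across threads and data structures and prints each structure's per-thread share. TABARNAC itself instruments real executions, so the structure names and counts here are illustrative assumptions only.

```python
# Toy illustration of the kind of view TABARNAC produces: how memory accesses are
# distributed across threads and data structures. TABARNAC instruments real runs;
# the counts below are fabricated for demonstration only.

import numpy as np

threads = [f"T{i}" for i in range(4)]
structures = ["matrix_A", "matrix_B", "result_C"]

rng = np.random.default_rng(42)
# Rows: threads, columns: data structures; values: simulated access counts.
access_counts = rng.integers(1_000, 100_000, size=(len(threads), len(structures)))

# For each structure, the share of its accesses issued by each thread; a skewed
# column means one thread (and hence one NUMA node) dominates that structure's traffic.
shares = access_counts / access_counts.sum(axis=0, keepdims=True)

header = "thread  " + "  ".join(f"{s:>10}" for s in structures)
print(header)
for t, row in zip(threads, shares):
    print(f"{t:<7}" + "  ".join(f"{v:10.2%}" for v in row))
```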