10 research outputs found

    Design and Code Optimization for Systems with Next-generation Racetrack Memories

    Get PDF
    With the rise of computationally expensive application domains such as machine learning, genomics, and fluids simulation, the quest for performance and energy-efficient computing has gained unprecedented momentum. The significant increase in computing and memory devices in modern systems has resulted in an unsustainable surge in energy consumption, a substantial portion of which is attributed to the memory system. The scaling of conventional memory technologies and their suitability for the next-generation system is also questionable. This has led to the emergence and rise of nonvolatile memory ( NVM ) technologies. Today, in different development stages, several NVM technologies are competing for their rapid access to the market. Racetrack memory ( RTM ) is one such nonvolatile memory technology that promises SRAM -comparable latency, reduced energy consumption, and unprecedented density compared to other technologies. However, racetrack memory ( RTM ) is sequential in nature, i.e., data in an RTM cell needs to be shifted to an access port before it can be accessed. These shift operations incur performance and energy penalties. An ideal RTM , requiring at most one shift per access, can easily outperform SRAM . However, in the worst-cast shifting scenario, RTM can be an order of magnitude slower than SRAM . This thesis presents an overview of the RTM device physics, its evolution, strengths and challenges, and its application in the memory subsystem. We develop tools that allow the programmability and modeling of RTM -based systems. For shifts minimization, we propose a set of techniques including optimal, near-optimal, and evolutionary algorithms for efficient scalar and instruction placement in RTMs . For array accesses, we explore schedule and layout transformations that eliminate the longer overhead shifts in RTMs . We present an automatic compilation framework that analyzes static control flow programs and transforms the loop traversal order and memory layout to maximize accesses to consecutive RTM locations and minimize shifts. We develop a simulation framework called RTSim that models various RTM parameters and enables accurate architectural level simulation. Finally, to demonstrate the RTM potential in non-Von-Neumann in-memory computing paradigms, we exploit its device attributes to implement logic and arithmetic operations. As a concrete use-case, we implement an entire hyperdimensional computing framework in RTM to accelerate the language recognition problem. Our evaluation shows considerable performance and energy improvements compared to conventional Von-Neumann models and state-of-the-art accelerators

    XX Workshop de Investigadores en Ciencias de la Computación - WICC 2018 : Libro de actas

    Get PDF
    Actas del XX Workshop de Investigadores en Ciencias de la Computación (WICC 2018), realizado en Facultad de Ciencias Exactas y Naturales y Agrimensura de la Universidad Nacional del Nordeste, los dìas 26 y 27 de abril de 2018.Red de Universidades con Carreras en Informática (RedUNCI

    Algorithmic skeletons for exact combinatorial search at scale

    Get PDF
    Exact combinatorial search is essential to a wide range of application areas including constraint optimisation, graph matching, and computer algebra. Solutions to combinatorial problems are found by systematically exploring a search space, either to enumerate solutions, determine if a specific solution exists, or to find an optimal solution. Combinatorial searches are computationally hard both in theory and practice, and efficiently exploring the huge number of combinations is a real challenge, often addressed using approximate search algorithms. Alternatively, exact search can be parallelised to reduce execution time. However, parallel search is challenging due to both highly irregular search trees and sensitivity to search order, leading to anomalies that can cause unexpected speedups and slowdowns. As core counts continue to grow, parallel search becomes increasingly useful for improving the performance of existing searches, and allowing larger instances to be solved. A high-level approach to parallel search allows non-expert users to benefit from increasing core counts. Algorithmic Skeletons provide reusable implementations of common parallelism patterns that are parameterised with user code which determines the specific computation, e.g. a particular search. We define a set of skeletons for exact search, requiring the user to provide in the minimal case a single class that specifies how the search tree is generated and a parameter that specifies the type of search required. The five are: Sequential search; three general-purpose parallel search methods: Depth-Bounded, Stack-Stealing, and Budget; and a specific parallel search method, Ordered, that guarantees replicable performance. We implement and evaluate the skeletons in a new C++ parallel search framework, YewPar. YewPar provides both high-level skeletons and low-level search specific schedulers and utilities to deal with the irregularity of search and knowledge exchange between workers. YewPar is based on the HPX library for distributed task-parallelism potentially allowing search to execute on multi-cores, clusters, cloud, and high performance computing systems. Underpinning the skeleton design is a novel formal model, MT^3 , a parallel operational semantics that describes multi-threaded tree traversals, allowing reasoning about parallel search, e.g. describing common parallel search phenomena such as performance anomalies. YewPar is evaluated using seven different search applications (and over 25 specific instances): Maximum Clique, k-Clique, Subgraph Isomorphism, Travelling Salesperson, Binary Knapsack, Enumerating Numerical Semigroups, and the Unbalanced Tree Search Benchmark. The search instances are evaluated at multiple scales from 1 to 255 workers, on a 17 host, 272 core Beowulf cluster. The overheads of the skeletons are low, with a mean 6.1% slowdown compared to hand-coded sequential implementation. Crucially, for all search applications YewPar reduces search times by an order of magnitude, i.e hours/minutes to minutes/seconds, and we commonly see greater than 60% (average) parallel efficiency speedups for up to 255 workers. Comparing skeleton performance reveals that no one skeleton is best for all searches, highlighting a benefit of a skeleton approach that allows multiple parallelisations to be explored with minimal refactoring. The Ordered skeleton avoids slowdown anomalies where, due to search knowledge being order dependent, a parallel search takes longer than a sequential search. Analysis of Ordered shows that, while being 41% slower on average (73% worse-case) than Depth-Bounded, in nearly all cases it maintains the following replicable performance properties: 1) parallel executions are no slower than one worker sequential executions 2) runtimes do not increase as workers are added, and 3) variance between repeated runs is low. In particular, where Ordered maintains a relative standard deviation (RSD) of less than 15%, Depth-Bounded suffers from an RSD greater than 50%, showing the importance of carefully controlling search orders for repeatability

    XX Workshop de Investigadores en Ciencias de la Computación - WICC 2018 : Libro de actas

    Get PDF
    Actas del XX Workshop de Investigadores en Ciencias de la Computación (WICC 2018), realizado en Facultad de Ciencias Exactas y Naturales y Agrimensura de la Universidad Nacional del Nordeste, los dìas 26 y 27 de abril de 2018.Red de Universidades con Carreras en Informática (RedUNCI

    Survey of Template-Based Code Generation

    Full text link
    L'automatisation de la génération des artefacts textuels à partir des modèles est une étape critique dans l'Ingénierie Dirigée par les Modèles (IDM). C'est une transformation de modèles utile pour générer le code source, sérialiser les modèles dans de stockages persistents, générer les rapports ou encore la documentation. Parmi les différents paradigmes de transformation de modèle-au-texte, la génération de code basée sur les templates (TBCG) est la plus utilisée en IDM. La TBCG est une technique de génération qui produit du code à partir des spécifications de haut niveau appelées templates. Compte tenu de la diversité des outils et des approches, il est nécessaire de classifier et de comparer les techniques de TBCG existantes afin d'apporter un soutien approprié aux développeurs. L'objectif de ce mémoire est de mieux comprendre les caractéristiques des techniques de TBCG, identifier les tendances dans la recherche, et éxaminer l'importance du rôle de l'IDM par rapport à cette approche. J'évalue également l'expressivité, la performance et la mise à l'échelle des outils associés selon une série de modèles. Je propose une étude systématique de cartographie de la littérature qui décrit une intéressante vue d'ensemble de la TBCG et une étude comparitive des outils de la TBCG pour mieux guider les dévloppeurs dans leur choix. Cette étude montre que les outils basés sur les modèles offrent plus d'expressivité tandis que les outils basés sur le code sont les plus performants. Enfin, Xtend2 offre le meilleur compromis entre l'expressivité et la performance.A critical step in model-driven engineering (MDE) is the automatic synthesis of a textual artifact from models. This is a very useful model transformation to generate application code, to serialize the model in persistent storage, generate documentation or reports. Among the various model-to-text transformation paradigms, Template-Based Code Generation (TBCG) is the most popular in MDE. TBCG is a synthesis technique that produces code from high-level specifications, called templates. It is a popular technique in MDE given that they both emphasize abstraction and automation. Given the diversity of tools and approaches, it is necessary to classify and compare existing TBCG techniques to provide appropriate support to developers. The goal of this thesis is to better understand the characteristics of TBCG techniques, identify research trends, and assess the importance of the role of MDE in this code synthesis approach. We also evaluate the expressiveness, performance and scalability of the associated tools based on a range of models that implement critical patterns. To this end, we conduct a systematic mapping study of the literature that paints an interesting overview of TBCG and a comparative study on TBCG tools to better guide developers in their choices. This study shows that model-based tools offer more expressiveness whereas code-based tools performed much faster. Xtend2 offers the best compromise between the expressiveness and the performance

    High Performance Computing for Computational Science - VECPAR 2014 : 11th International Conference, Eugene, OR, USA, June 30 - July 3, 2014, Revised Selected Papers

    No full text
    International audienceThis book constitutes the thoroughly refereed post-conference proceedings of the 11th International Conference on High Performance Computing for Computational Science, VECPAR 2014, held in Eugene, OR, USA, in June/July 2014. The 25 papers presented were carefully reviewed and selected of numerous submissions. The papers are organized in topical sections on algorithms for GPU and manycores, large-scale applications, numerical algorithms, direct/hybrid methods for solving sparse matrices, performance tuning. The volume also contains the papers presented at the 9th International Workshop on Automatic Performance Tuning

    Software for Exascale Computing - SPPEXA 2016-2019

    Get PDF
    This open access book summarizes the research done and results obtained in the second funding phase of the Priority Program 1648 "Software for Exascale Computing" (SPPEXA) of the German Research Foundation (DFG) presented at the SPPEXA Symposium in Dresden during October 21-23, 2019. In that respect, it both represents a continuation of Vol. 113 in Springer’s series Lecture Notes in Computational Science and Engineering, the corresponding report of SPPEXA’s first funding phase, and provides an overview of SPPEXA’s contributions towards exascale computing in today's sumpercomputer technology. The individual chapters address one or more of the research directions (1) computational algorithms, (2) system software, (3) application software, (4) data management and exploration, (5) programming, and (6) software tools. The book has an interdisciplinary appeal: scholars from computational sub-fields in computer science, mathematics, physics, or engineering will find it of particular interest
    corecore