    Shiftsreduce: Minimizing shifts in racetrack memory 4.0

    Racetrack memories (RMs) have evolved significantly since their conception in 2008, making them a serious contender in the field of emerging memory technologies. Despite key technological advancements, the access latency and energy consumption of an RM-based system are still highly influenced by the number of shift operations. These operations are required to move bits to the right positions in the racetracks. This article presents data-placement techniques for RMs that maximize the likelihood that consecutive references access nearby memory locations at runtime, thereby minimizing the number of shifts. We present an integer linear programming (ILP) formulation for optimal data placement in RMs, and we revisit existing offset assignment heuristics, originally proposed for random-access memories. We introduce a novel heuristic tailored to a realistic RM and combine it with a genetic search to further improve the solution. We show a reduction in the number of shifts of up to 52.5%, outperforming the state of the art by up to 16.1%.
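To make the placement problem concrete, the following is a minimal sketch under a deliberately simplified model that is an assumption of this example, not the article's ILP formulation or heuristic: one track, one access port, one variable per offset, and a shift cost equal to the distance between consecutively accessed offsets. The names `shift_cost`, `trace`, and `declaration_order` are hypothetical, and the exhaustive search is only feasible for a handful of variables.

```python
from itertools import permutations


def shift_cost(trace, placement):
    """Total shifts for an access trace on one track with a single port.

    placement[i] is the variable stored at offset i. Each access costs the
    distance between the previous and the current offset (assumed model).
    """
    position = {var: offset for offset, var in enumerate(placement)}
    return sum(abs(position[a] - position[b]) for a, b in zip(trace, trace[1:]))


# Hypothetical access trace over four variables.
trace = "abacabad"

# Placement in declaration order versus the best of all 4! placements.
declaration_order = "abcd"
best_placement = min(
    permutations(sorted(set(trace))),
    key=lambda p: shift_cost(trace, p),
)
best_cost = shift_cost(trace, best_placement)

print(shift_cost(trace, declaration_order), best_cost)  # prints "11 8"
```

In this toy trace, `a` alternates with every other variable, so the cheaper placements put `a` between its two most frequent partners. Exhaustive search grows factorially with the number of variables, which is why exact formulations and heuristics such as those described in the abstract are needed at realistic sizes.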

    Design and Code Optimization for Systems with Next-generation Racetrack Memories

    With the rise of computationally expensive application domains such as machine learning, genomics, and fluids simulation, the quest for performance and energy-efficient computing has gained unprecedented momentum. The significant increase in computing and memory devices in modern systems has resulted in an unsustainable surge in energy consumption, a substantial portion of which is attributed to the memory system. The scaling of conventional memory technologies and their suitability for next-generation systems are also questionable. This has led to the emergence and rise of nonvolatile memory (NVM) technologies. Today, several NVM technologies, at different stages of development, are competing for rapid access to the market. Racetrack memory (RTM) is one such nonvolatile memory technology that promises SRAM-comparable latency, reduced energy consumption, and unprecedented density compared to other technologies. However, RTM is sequential in nature, i.e., data in an RTM cell needs to be shifted to an access port before it can be accessed. These shift operations incur performance and energy penalties. An ideal RTM, requiring at most one shift per access, can easily outperform SRAM. However, in the worst-case shifting scenario, RTM can be an order of magnitude slower than SRAM. This thesis presents an overview of the RTM device physics, its evolution, strengths and challenges, and its application in the memory subsystem. We develop tools that allow the programmability and modeling of RTM-based systems. For shift minimization, we propose a set of techniques including optimal, near-optimal, and evolutionary algorithms for efficient scalar and instruction placement in RTMs. For array accesses, we explore schedule and layout transformations that eliminate the longer overhead shifts in RTMs.
We present an automatic compilation framework that analyzes static control flow programs and transforms the loop traversal order and memory layout to maximize accesses to consecutive RTM locations and minimize shifts. We develop a simulation framework called RTSim that models various RTM parameters and enables accurate architectural-level simulation. Finally, to demonstrate the RTM potential in non-von-Neumann in-memory computing paradigms, we exploit its device attributes to implement logic and arithmetic operations. As a concrete use case, we implement an entire hyperdimensional computing framework in RTM to accelerate the language recognition problem. Our evaluation shows considerable performance and energy improvements compared to conventional von Neumann models and state-of-the-art accelerators.

    Bit-Flip Aware Data Structures for Phase Change Memory

    Large, byte-addressable, low-cost, and fast non-volatile memories such as Phase Change Memory are appearing in the marketplace. They have the capability to unify memory and storage and allow us to rethink the present memory hierarchy. An important drawback of Phase Change Memory is its limited write endurance. In addition, Phase Change Memory shares with other Non-Volatile Random Access Memories an asymmetry in the energy costs of writes and reads. The best use of Non-Volatile Random Access Memories limits the number of times a memory cell changes its contents, called a bit-flip. While the future of main memory is still unknown, we should already start to create data structures for these memories in order to shape the coming era. This thesis investigates the creation of bit-flip aware data structures. The thesis first considers general ways in which a data structure can save bit-flips by smart overwrites and by using the exclusive-or of pointers. It then shows how a simple content-dependent encoding can reduce bit-flips for web corpora. It then shows how to build hash-based dictionary structures for Linear Hashing and Spiral Storage. Finally, the thesis presents Gray counters, close to bit-flip optimal counters that even enable age-based wear leveling with counters managed by the Non-Volatile Random Access Memories themselves instead of by the operating system.
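As an illustration of why a Gray-coded counter saves bit-flips, the sketch below compares a plain binary counter with the standard reflected binary Gray code, in which consecutive values differ in exactly one bit. This is an assumption made for the example: the thesis's own Gray counter design may differ, and the helper names `to_gray` and `total_bit_flips` are hypothetical.

```python
def to_gray(n):
    """Standard reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def total_bit_flips(codes):
    """Number of bits that change across consecutive stored values."""
    return sum(bin(a ^ b).count("1") for a, b in zip(codes, codes[1:]))


# Count from 0 to 15, storing each value in the same memory word.
binary_codes = list(range(16))
gray_codes = [to_gray(n) for n in binary_codes]

print(total_bit_flips(binary_codes))  # prints 26
print(total_bit_flips(gray_codes))    # prints 15
```

Each increment of the Gray-coded counter flips a single cell, whereas the binary counter flips every bit involved in a carry, so the gap widens as the counter gets wider.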

    Accelerating Graph Computation with Racetrack Memory and Pointer-Assisted Graph Representation

    Tools and Algorithms for the Construction and Analysis of Systems

    This open access two-volume set constitutes the proceedings of the 27th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2021, which was held during March 27 – April 1, 2021, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021. The conference was planned to take place in Luxembourg and changed to an online format due to the COVID-19 pandemic. The 41 full papers presented in the proceedings were carefully reviewed and selected from 141 submissions. The volume also contains 7 tool papers, 6 tool demo papers, and 9 SV-Comp competition papers. The papers are organized in topical sections as follows: Part I: Game Theory; SMT Verification; Probabilities; Timed Systems; Neural Networks; Analysis of Network Communication. Part II: Verification Techniques (not SMT); Case Studies; Proof Generation/Validation; Tool Papers; Tool Demo Papers; SV-Comp Tool Competition Papers.

    Data Resource Management in Throughput Processors

    Graphics Processing Units (GPUs) are becoming common in data centers for tasks like neural network training and image processing due to their high performance and efficiency. GPUs maintain high throughput by running thousands of threads simultaneously, issuing instructions from ready threads to hide latency in others that are stalled. While this is effective for keeping the arithmetic units busy, the challenge in GPU design is moving the data for computation at the same high rate. Any inefficiency in data movement and storage will compromise the throughput and energy efficiency of the system. Since energy consumption and cooling make up a large part of the cost of provisioning and running a data center, making GPUs more suitable for this environment requires removing the bottlenecks and overheads that limit their efficiency. The performance of GPU workloads is often limited by the throughput of the memory resources inside each GPU core, and though many of the power-hungry structures in CPUs are not found in GPU designs, there is overhead for storing each thread's state. When a GPU is shared between workloads, contention for resources also causes interference and slowdown. This thesis develops techniques to manage and streamline the data movement and storage resources in GPUs in each of these places. The first part of this thesis resolves data movement restrictions inside each GPU core. The GPU memory system is optimized for sequential accesses, but many workloads load data in irregular or transposed patterns that cause a throughput bottleneck even when all loads are cache hits. This work identifies and leverages opportunities to merge requests across threads before sending them to the cache. While requests are waiting for merges, they can be reordered to achieve a higher cache hit rate. These methods yielded a 38% speedup for memory-throughput-limited workloads. Another opportunity for optimization is found in the register file.
Since it must store the registers for thousands of active threads, it is the largest on-chip data storage structure on a GPU. The second work in this thesis replaces the register file with a smaller, more energy-efficient register buffer. Compiler directives allow the GPU to know ahead of time which registers will be accessed, allowing the hardware to store only the registers that will be imminently accessed in the buffer, with the rest moved to main memory. This technique reduced total GPU energy by 11%. Finally, in a data center, many different applications will be launching GPU jobs, and just as multiple processes can share the same CPU to increase its utilization, running multiple workloads on the same GPU can increase its overall throughput. However, co-runners interfere with each other in unpredictable ways, especially when sharing memory resources. The final part of this thesis controls this interference, allowing a GPU to be shared between two tiers of workloads: one tier with a high performance target and another suitable for batch jobs without deadlines. At a 90% performance target, this technique increased GPU throughput by 9.3%. GPUs' high efficiency and performance make them a valuable accelerator in the data center. The contributions in this thesis further increase their efficiency by removing data movement and storage overheads and unlock additional performance by enabling resources to be shared between workloads while controlling interference.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/146122/1/jklooste_1.pd

    An Innovative Human Machine Interface for UAS Flight Management System

    The thesis concerns the development of an innovative Human Machine Interface for a UAS Flight Management System. In particular, touchscreens have been selected as the data entry interface. The thesis was carried out at Alenia Aermacch

    2D and 3D quantitative TEM mapping of CoNi nanowires

    Magnetic nanowires are a rapidly growing field of research. With their cylindrical cross-section, they allow magnetic domain walls to propagate at very high velocities and to interact strongly with spin waves, which makes them particularly interesting for the development of future spintronic components. The aim of this thesis is to provide a complete quantitative and qualitative analysis of the local magnetic configuration in cylindrical magnetic nanowires of CoNi alloy with perpendicular magnetocrystalline anisotropy, using the advanced magnetic imaging techniques of transmission electron microscopy (TEM), mainly focused on electron holography (EH). A correlative study between the structural properties, the local variations in composition, and the magnetic configurations of these nanowires was carried out. In addition, the complex three-dimensional (3D) configurations of the magnetic domains and domain walls were analyzed by holographic vector field tomography in order to obtain the three components of the magnetic induction. Finally, a protocol was developed to study in situ, by Lorentz microscopy, the magnetic configuration of these nanowires during the injection of current pulses. The first part of this work focuses on correlating the magnetic configurations of individual CoNi nanowires with the local structural and chemical properties. The orientation of the crystal phase was mapped by precession electron diffraction and combined with composition measurements by electron energy loss spectroscopy. The results reveal a coexistence of fcc-phase and hcp-phase grains, the latter having its crystallographic c direction oriented almost perpendicular to the nanowire axis.
This coexistence of crystallographic phases is the origin of localized and abrupt variations in the magnetic configuration. Two main configurations were observed: a chain of vortex-like states transverse to the nanowire axis, and a longitudinal state. We observed that the transverse states are linked to the hcp phase, which has a strong perpendicular magnetocrystalline anisotropy, as confirmed by micromagnetic simulations. Another part of this work concerns the study of the 3D magnetic structure of the domains and domain walls in the hcp phase. This study was carried out for different remanent states, depending on whether a saturation field was applied perpendicular or parallel to the nanowire axis. The measurements were performed with the holographic vector field tomography method in order to extract the three components of the magnetic induction and reconstruct the local magnetic configuration of the nanowire in 3D. The results show the stabilization of a vortex chain in the case of perpendicular saturation, and longitudinal curling states separated by transverse domain walls after the application of an external field parallel to the wire axis. The last part of the manuscript presents the results obtained by in situ Lorentz microscopy, demonstrating the possibility of manipulating the magnetic domain walls of a CoNi nanowire by injecting electrical pulses. This proof of concept is regarded as the precursor of in situ observations of domain wall dynamics in EH. A precise protocol, focused on the crucial sample-preparation steps and on the developments still needed to carry out these delicate experiments, is detailed.
Cylindrical magnetic nanowires (NWs) are currently subjects of high interest due to fast domain wall velocities and interaction with spin waves, which are considered interesting qualities for developing future spintronic devices.
This thesis aims to provide a comprehensive quantitative and qualitative analysis of the local magnetic configuration in cylindrical Co-rich CoNi NWs with perpendicular magnetocrystalline anisotropy using state-of-the-art transmission electron microscopy (TEM) magnetic imaging techniques, mainly focused on two-dimensional (2D) and three-dimensional (3D) electron holography (EH). A correlative study between the NW's texture, modulation in composition, and magnetic configuration has been conducted. Further, the complex 3D nature of the domain and domain wall configurations has been analyzed using holographic vector field electron tomography (VFET) to retrieve all three components of the magnetic induction. Finally, I have successfully manipulated the magnetic configuration observed by Lorentz microscopy in Fresnel mode by the in situ injection of a current pulse. A TEM study comparing the magnetic configuration to the local NW structure was performed on single NWs. The crystal phase analysis was done by precession electron diffraction assisted automated crystal orientation mapping in the TEM combined with compositional analysis by scanning-TEM (STEM) electron energy loss spectroscopy (EELS) for a detailed correlation with the sample's magnetic configuration. The results reveal a coexistence of fcc grains and hcp phase with its c-axis oriented close to perpendicular to the wire axis in the same NW, which is identified as the origin of drastic local changes in the magnetic configuration. Two main configurations are observed in the NW region: a chain of transversal vortex-like states and a longitudinal curling state. The chain of vortices is linked to the hcp grain with the perpendicular magnetocrystalline anisotropy, as confirmed by micromagnetic simulations.
The 3D magnetic structure of the domains and domain walls observed in the hcp grain of the NWs has been studied for two different remanent states: after the application of a saturation field perpendicular (i) and parallel (ii) to the NW axis. The measurements were done using state-of-the-art holographic VFET to extract all three components of the magnetic induction in the sample, as well as a 3D reconstruction of the volume from the measured electric potentials, giving insight into the local morphology of the NW. The results show a stabilization of a vortex chain in the case of perpendicular saturation, but longitudinal curling states separated by transversal domain walls after applying a parallel external field. Finally, preliminary Lorentz microscopy results are presented, documenting the manipulation of magnetic domain walls by the in situ injection of electrical pulses on a single cylindrical CoNi nanowire contacted by focused ion beam induced deposition. This is believed to be the forerunner for quantitative electrical measurements and in situ observations of domain wall dynamics using EH at the CEMES. A detailed protocol focusing on the crucial steps and challenges ahead for such a delicate experiment is presented, together with suggestions for future work to continue the developments.