Derived ECC for Protection of On-Demand Data

Author: Ollivier SEBASTIEN
Publication date: 23/01/2019

Traditional error correction coding (ECC) methodologies store parity bits along with the data they protect. Upon accessing the combined data and parity bits, it is then possible to discover and correct faults in either the data or the correction bits. In domain-wall memories (DWMs), a type of sequential-access non-volatile memory related to spin-transfer torque magnetic memory (STT-MRAM), it is not convenient or efficient to store data that protects against the access errors caused by data shifting during sequential access. To solve this problem, we propose a new technique called derived error correction (DEC) for cases where it is intractable to record metadata for error correction. Instead, we rebuild the metadata on demand and store only the parity bits that protect the metadata. The DWM metadata is constructed using a novel transverse read (TR). TR reads in a direction orthogonal to a traditional DWM access point and can be used to calculate the number of ones in a DWM. Faults in the metadata correspond to shift errors in the DWM. Using ECC, we can correct these faults in the metadata and use these corrections to repair the shifting state of the DWM to ensure correct operation. Through these techniques, we propose a shift-aware error correction code that provides a lifetime of over 10 years while reducing area by 2.6 and 3.7 times relative to the state-of-the-art technique.
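
As a rough illustration of how a ones-count taken over a wire segment can expose a shift misalignment, the toy model below may help. It is only a sketch under stated assumptions: the padding of known all-ones domains, the names `PAD`, `physical_view`, `transverse_read` and `measured_offset`, and the zero-filled empty cells are all hypothetical and are not taken from the abstract, and the parity bits that protect the derived metadata are not modeled.

```python
from typing import List

PAD = 4  # hypothetical: number of known padding domains, all set to 1


def physical_view(tape: List[int], offset: int, wire_len: int) -> List[int]:
    """Bits seen in each physical cell after the tape is shifted right by `offset`.

    Cells not covered by the tape are modeled as 0 (an assumption).
    """
    view = [0] * wire_len
    for i, bit in enumerate(tape):
        view[i + offset] = bit
    return view


def transverse_read(view: List[int], start: int, end: int) -> int:
    """Toy transverse read: count the ones in physical cells [start, end)."""
    return sum(view[start:end])


def measured_offset(view: List[int]) -> int:
    """Derive the shift position from the ones-count over the first PAD cells.

    Each one-position shift moves one padding '1' out of the read window,
    so the count drops by one per shift (valid for offsets 0..PAD).
    """
    return PAD - transverse_read(view, 0, PAD)


data = [1, 0, 1, 1, 0, 0, 1, 0]
tape = [1] * PAD + data          # known padding followed by the payload
wire_len = len(tape) + PAD       # spare cells so nothing falls off the wire

intended = 2                     # shift the controller asked for
actual = 3                       # shift that really happened (over-shift fault)

view = physical_view(tape, actual, wire_len)
measured = measured_offset(view)
correction = intended - measured # negative means "shift back"
repaired = physical_view(tape, actual + correction, wire_len)

print(measured, correction)      # 3 -1
```

In this toy setting the count alone identifies the over-shift by one position, and applying the correction restores the intended alignment. The scheme in the abstract additionally stores parity over the derived metadata, which this sketch leaves out.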

D-Scholarship@Pitt

Shiftsreduce: Minimizing shifts in racetrack memory 4.0

Author: Bläsing R., Castrillon J., Hameed F., Khan A., Parkin S.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 08/03/2019

Racetrack memories (RMs) have significantly evolved since their conception in 2008, making them a serious contender in the field of emerging memory technologies. Despite key technological advancements, the access latency and energy consumption of an RM-based system are still highly influenced by the number of shift operations. These operations are required to move bits to the right positions in the racetracks. This article presents data-placement techniques for RMs that maximize the likelihood that consecutive references access nearby memory locations at runtime, thereby minimizing the number of shifts. We present an integer linear programming (ILP) formulation for optimal data placement in RMs, and we revisit existing offset assignment heuristics, originally proposed for random-access memories. We introduce a novel heuristic tailored to a realistic RM and combine it with a genetic search to further improve the solution. We show a reduction in the number of shifts of up to 52.5%, outperforming the state of the art by up to 16.1%.
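
To make the placement problem concrete, here is a minimal sketch of how a placement changes the shift count for a given access trace. It assumes a single track with one access port and counts shifts as the distance between consecutively accessed positions, ignoring the initial alignment. The exhaustive search is only a stand-in for the ILP formulation and heuristics named in the abstract, and the names `shift_cost`, `best_placement` and the example trace are hypothetical.

```python
from itertools import permutations
from typing import Sequence, Tuple


def shift_cost(trace: Sequence[str], placement: Sequence[str]) -> int:
    """Total shifts on a single-port track for `trace` under `placement`.

    The cost of each access is the distance, in domains, between the
    previously accessed variable and the current one.
    """
    pos = {var: i for i, var in enumerate(placement)}
    return sum(abs(pos[a] - pos[b]) for a, b in zip(trace, trace[1:]))


def best_placement(trace: Sequence[str]) -> Tuple[str, ...]:
    """Exhaustive search over all orderings; only feasible for tiny inputs."""
    variables = sorted(set(trace))
    return min(permutations(variables), key=lambda p: shift_cost(trace, p))


trace = ["a", "c", "a", "c", "b", "d", "b", "d"]

naive = ("a", "b", "c", "d")      # declaration order
optimal = best_placement(trace)

print(shift_cost(trace, naive))   # 13
print(shift_cost(trace, optimal)) # 7
```

Placing `a` next to `c` and `b` next to `d` brings every transition in this trace down to a single shift. The search space grows factorially with the number of variables, which is why realistic instances need a solver or a heuristic rather than enumeration.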

arXiv.org e-Print Archive

MPG.PuRe

Magnetic racetrack memory: from physics to the cusp of applications within a decade

Author: Bläsing R., Castrillon J., Filippou P., Garg C., Hameed F., Khan A., Parkin S.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/08/2020

Racetrack memory (RTM) is a novel spintronic memory-storage technology that has the potential to overcome fundamental constraints of existing memory and storage devices. It is unique in that its core differentiating feature is the movement of data, which is composed of magnetic domain walls (DWs), by short current pulses. This enables more data to be stored per unit area compared to any other current technologies. On the one hand, RTM has the potential for mass data storage with unlimited endurance using considerably less energy than today's technologies. On the other hand, RTM promises an ultrafast nonvolatile memory competitive with static random access memory (SRAM) but with a much smaller footprint. During the last decade, the discovery of novel physical mechanisms to operate RTM has led to a major enhancement in the efficiency with which nanoscopic, chiral DWs can be manipulated. New materials and artificially atomically engineered thin-film structures have been found to increase the speed and lower the threshold current with which the data bits can be manipulated. With these recent developments, RTM has attracted the attention of the computer architecture community that has evaluated the use of RTM at various levels in the memory stack. Recent studies advocate RTM as a promising compromise between, on the one hand, power-hungry, volatile memories and, on the other hand, slow, nonvolatile storage. By optimizing the memory subsystem, significant performance improvements can be achieved, enabling a new era of cache, graphical processing units, and high capacity memory devices. In this article, we provide an overview of the major developments of RTM technology from both the physics and computer architecture perspectives over the past decade. We identify the remaining challenges and give an outlook on its future

MPG.PuRe

Design and Code Optimization for Systems with Next-generation Racetrack Memories

Author: Khan Asif Ali
Publication date: 16/06/2022

With the rise of computationally expensive application domains such as machine learning, genomics, and fluids simulation, the quest for performance and energy-efficient computing has gained unprecedented momentum. The significant increase in computing and memory devices in modern systems has resulted in an unsustainable surge in energy consumption, a substantial portion of which is attributed to the memory system. The scaling of conventional memory technologies and their suitability for next-generation systems is also questionable. This has led to the emergence and rise of nonvolatile memory (NVM) technologies. Today, several NVM technologies in different development stages are competing for rapid access to the market. Racetrack memory (RTM) is one such nonvolatile memory technology that promises SRAM-comparable latency, reduced energy consumption, and unprecedented density compared to other technologies. However, RTM is sequential in nature, i.e., data in an RTM cell needs to be shifted to an access port before it can be accessed. These shift operations incur performance and energy penalties. An ideal RTM, requiring at most one shift per access, can easily outperform SRAM. However, in the worst-case shifting scenario, RTM can be an order of magnitude slower than SRAM. This thesis presents an overview of the RTM device physics, its evolution, strengths and challenges, and its application in the memory subsystem. We develop tools that allow the programmability and modeling of RTM-based systems. For shift minimization, we propose a set of techniques including optimal, near-optimal, and evolutionary algorithms for efficient scalar and instruction placement in RTMs. For array accesses, we explore schedule and layout transformations that eliminate the longer overhead shifts in RTMs. We present an automatic compilation framework that analyzes static control flow programs and transforms the loop traversal order and memory layout to maximize accesses to consecutive RTM locations and minimize shifts. We develop a simulation framework called RTSim that models various RTM parameters and enables accurate architectural-level simulation. Finally, to demonstrate the RTM potential in non-Von-Neumann in-memory computing paradigms, we exploit its device attributes to implement logic and arithmetic operations. As a concrete use case, we implement an entire hyperdimensional computing framework in RTM to accelerate the language recognition problem. Our evaluation shows considerable performance and energy improvements compared to conventional Von-Neumann models and state-of-the-art accelerators.

Technische Universität Dresden: Qucosa

DESTINY: A Comprehensive Tool with 3D and Multi-Level Cell Memory Modeling Capability

Author: Mittal Sparsh, Vetter Jeffrey, Wang Rujia
Publication venue: MDPI AG
Publication date: 01/01/2017

To enable the design of large-capacity memory structures, novel memory technologies such as non-volatile memory (NVM) and novel fabrication approaches, e.g., 3D stacking and multi-level cell (MLC) design, have been explored. The existing modeling tools, however, cover only a few memory technologies, technology nodes and fabrication approaches. We present DESTINY, a tool for modeling 2D/3D memories designed using SRAM, resistive RAM (ReRAM), spin transfer torque RAM (STT-RAM), phase change RAM (PCM) and embedded DRAM (eDRAM), and 2D memories designed using spin orbit torque RAM (SOT-RAM), domain wall memory (DWM) and Flash memory. In addition to single-level cell (SLC) designs for all of these memories, DESTINY also supports modeling MLC designs for NVMs. We have extensively validated DESTINY against commercial and research prototypes of these memories. DESTINY is very useful for performing design-space exploration across several dimensions, such as optimizing for a target (e.g., latency, area or energy-delay product) for a given memory technology, or choosing the suitable memory technology or fabrication method (i.e., 2D vs. 3D) for a given optimization target. We believe that DESTINY will boost studies of next-generation memory architectures used in systems ranging from mobile devices to extreme-scale supercomputers. The latest source code of DESTINY is available from the following git repository: https://bitbucket.org/sparsh_mittal/destiny_v2

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Research Archive of Indian Institute of Technology Hyderabad

High-Performance and Low-Power Magnetic Material Memory Based Cache Design

Author: Sun Zhenyu
Publication date: 29/01/2014

Magnetic memory technologies are very promising candidates for universal memory due to their good scalability, zero standby power and radiation hardness. With a cell area much smaller than that of SRAM, magnetic memory can be used to construct a much larger cache within the same die footprint, leading to significant improvements in overall system performance and power consumption, especially in this multi-core era. However, magnetic memories have their own drawbacks, such as slow writes, read disturbance and scaling limitations, which make their use as caches challenging. This dissertation comprehensively studies the two most popular magnetic memory technologies. Design exploration and optimization for the cache design are presented across different design layers, including the memory devices, peripheral circuits, memory array structure and micro-architecture. By leveraging device features, two major micro-architectures, a multi-retention cache hierarchy and a process-variation-aware cache, are presented to improve the write performance of STT-RAM. The enhancement in write performance results in the degradation of read operations, in terms of both speed and data reliability. This dissertation also presents an architecture to resolve the STT-RAM read disturbance issue. Furthermore, the scaling of STT-RAM is hindered by the required size of the switching transistor. To break the cell area limitation of STT-RAM, racetrack memory is studied to achieve an even higher memory density, better performance and lower energy consumption. With dedicated elaboration, a racetrack-memory-based cache design can achieve a significant area reduction and energy saving when compared to optimized STT-RAM.

D-Scholarship@Pitt

Spokane Intercollegiate Research Conference 2012

Author: Gonzaga University
Publication venue: Whitworth University
Publication date: 21/04/2012

Whitworth University

Study and manufacturing of biosensors based on plasmonic effects and built on silicon

Author: Calvo Michele
Publication venue: Universite de Sherbrooke
Publication date: 01/01/2020

Lab-on-a-chip (LOC) devices scale down the laboratory processes for detecting illnesses and monitoring sick patients without the need for medical laboratories. Well-known examples of LOC devices are pregnancy test kits and portable HIV sensors. To be useful, LOC devices must be sensitive, specific, compact, and affordable. These criteria are made possible by a transducer that can convert the biological presence of the target molecule into electrical information. Since the early 2000s, integrated photonics has offered a possible solution for a transducer compatible with LOC needs. In particular, silicon micro-ring resonators represent a compact and sensitive choice of transducer for LOC devices. In agreement with the requirements of LOC devices, the objective of this project is to design and assess the performance of a compact photonic biosensor. The system will be based on integrated photonic transduction. The requirements are that it is compatible with an industrial fabrication platform and fluidic systems, has a sensitivity equal to or higher than the state of the art, and is simple to functionalize in order to localize the target molecules in the sensitive regions of the sensor. This project details the design, fabrication, and characterization of such a biosensor. We found that ring resonators with a Hybrid Plasmonic Waveguide (HPWG) cross-section fulfill the LOC requirements and outperform the state-of-the-art biosensor. Furthermore, based on a principle called mode lift, we patented a new geometry of HPWG, which will be the object of an article. We simulated the HPWG structure to understand the coupling mechanisms of the modes inside the structure (more specifically, the plasmonic and the ridge dielectric modes). The fabrication was possible thanks to the collaboration of the industrial cleanroom of STMicroelectronics and the university cleanrooms of the Université de Sherbrooke and the Institut de Nanotechnologies de Lyon. An advantage of industrial production is that we can reproducibly create the geometric components necessary for the LOC in a high-throughput manner, thus lowering the cost per unit. Once the 300 mm Si wafers were patterned, the university cleanroom fabrication process added the metallic waveguides. The Au nanopatterning on the devices characterized in this project was created using the lift-off method. Preliminary measurements defined the optimal testing liquid (glucose monohydrate) and the measurement uncertainty. The HPWG samples showed an experimental sensitivity lower than the simulations. After adjusting the fabrication parameters (mainly Au and Cr deposition rates and thicknesses), the second-generation HPWG devices suggest that the mode lift improves the sensitivity for waveguides below cutoff (the sensitivity increases from 210 nm/RIU to 320 nm/RIU when only 10% of the ring resonator has an HPWG section and the rest is a ridge waveguide). Even in the case where ridge waveguides are above the cutoff, the sensitivity increases by 40 nm/RIU when using mode lift. We also showed the compatibility of the fabricated devices' surface with differential functionalization by means of fluorescent nanoparticles. Due to time limitations, the presence of the nanoparticles will be measured with the fabricated devices in future experiments.

Savoirs UdeS

Fusión de los niveles L1 y L2 de la jerarquía de memoria cache utilizando DWM (Merging the L1 and L2 levels of the cache memory hierarchy using DWM)

Author: Tárrega Sánchez Hugo
Publication venue: Universitat Politecnica de Valencia
Publication date: 12/11/2021

The present work addresses the growing need of the semiconductor industry for denser cache memories with lower power consumption than the current ones. Since the most widely used current technology, SRAM, cannot offer these improvements, this work proposes the use of DWM (Domain Wall Memory) magnetic memories as a substitute emerging technology. This work mainly addresses one of the major drawbacks of DWM, which is the variable latency for accessing data stored on a magnetic tape. This is especially critical in first-level (L1) caches, as they are located in the processor pipeline and have a direct impact on system performance. To overcome this problem, a bottom-up L1 data cache design is proposed. First, a DWM memory cell is designed that is capable of storing multiple bits and of reducing the impact of the variable access latency by using multiple access ports on the tape, among other features. Next, a cache module is designed that integrates multiple DWM cells, such that the sets are interleaved between ports, favoring the spatial locality exhibited by applications in L1 and thus reducing the variable access latency issue. Finally, the use of DWM modules allows implementing the complete data array of an n-way set-associative data cache. The high density of DWM allows merging the L1 and L2 levels into a single DWM level with the goal of increasing performance over a conventional SRAM cache hierarchy design. The proposed DWM cache is implemented and evaluated on the Multi2Sim cycle-accurate simulator, which is widely used in both industry and academia. Experimental results show that the DWM cache significantly reduces the average memory access penalty, misses per kilo-instruction, and stall cycles in the reorder buffer compared to a conventional cache design. This leads to a 10% improvement in average system performance, not only over a conventional SRAM-based design but also over the state-of-the-art DWM design, referred to as TapeCache and implemented as part of this work.

Tárrega Sánchez, H. (2021). Fusión de los niveles L1 y L2 de la jerarquía de memoria cache utilizando DWM. Universitat Politècnica de València. http://hdl.handle.net/10251/17701
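
The effect of multiple access ports on the variable access latency can be sketched with a small model. This is an illustrative assumption, not the cell design from the thesis: the tape is a row of domains, ports sit at fixed physical positions, each access shifts the tape to the nearest port, and latency is taken to be proportional to the number of shifts. The names `shifts_to_access` and `total_shifts`, the port positions and the trace are hypothetical, and set interleaving is not modeled.

```python
from typing import List, Sequence, Tuple


def shifts_to_access(domain: int, offset: int, ports: Sequence[int]) -> Tuple[int, int]:
    """Shift the tape so `domain` sits under its nearest port.

    `offset` is the current displacement of the tape, so the domain's
    physical position is `domain + offset`. Returns (shift count, new offset).
    """
    position = domain + offset
    nearest = min(ports, key=lambda p: abs(p - position))
    delta = nearest - position
    return abs(delta), offset + delta


def total_shifts(trace: Sequence[int], ports: Sequence[int]) -> int:
    """Sum of the shifts needed to serve every access in `trace` in order."""
    offset = 0
    total = 0
    for domain in trace:
        cost, offset = shifts_to_access(domain, offset, ports)
        total += cost
    return total


trace: List[int] = [0, 5, 10, 15, 3]   # domains accessed on a 16-domain tape

single_port = total_shifts(trace, ports=[0])
four_ports = total_shifts(trace, ports=[0, 4, 8, 12])

print(single_port)  # 27
print(four_ports)   # 3
```

With four evenly spaced ports, no domain in this model is ever more than two positions from a port, so the worst-case shift distance drops sharply. The trade-off, not modeled here, is the extra area that each additional port requires.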

RiuNet