900 research outputs found

    Performance impact of a slower main memory: a case study of STT-MRAM in HPC

    Get PDF
    In high-performance computing (HPC), significant effort is invested in research and development of novel memory technologies. One of them is Spin Transfer Torque Magnetic Random Access Memory (STT-MRAM) --- byte-addressable, high-endurance non-volatile memory with slightly higher access time than DRAM. In this study, we conduct a preliminary assessment of HPC system performance impact with STT-MRAM main memory with recent industry estimations. Reliable timing parameters of STT-MRAM devices are unavailable, so we also perform a sensitivity analysis that correlates overall system slowdown trend with respect to average device latency. Our results demonstrate that the overall system performance of large HPC clusters is not particularly sensitive to main-memory latency. Therefore, STT-MRAM, as well as any other emerging non-volatile memories with comparable density and access time, can be a viable option for future HPC memory system design.This work was supported by the Collaboration Agreement between Samsung Electronics Co., Ltd. and BSC, Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology through TIN2015-65316-P project and by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272). This work has also received funding from the European Union's Horizon 2020 research and innovation programme under ExaNoDe project (grant agreement No 671578).Peer ReviewedPostprint (author's final draft

    Soft eSkin:distributed touch sensing with harmonized energy and computing

    Get PDF
    Inspired by biology, significant advances have been made in the field of electronic skin (eSkin) or tactile skin. Many of these advances have come through mimicking the morphology of human skin and by distributing few touch sensors in an area. However, the complexity of human skin goes beyond mimicking few morphological features or using few sensors. For example, embedded computing (e.g. processing of tactile data at the point of contact) is centric to the human skin as some neuroscience studies show. Likewise, distributed cell or molecular energy is a key feature of human skin. The eSkin with such features, along with distributed and embedded sensors/electronics on soft substrates, is an interesting topic to explore. These features also make eSkin significantly different from conventional computing. For example, unlike conventional centralized computing enabled by miniaturized chips, the eSkin could be seen as a flexible and wearable large area computer with distributed sensors and harmonized energy. This paper discusses these advanced features in eSkin, particularly the distributed sensing harmoniously integrated with energy harvesters, storage devices and distributed computing to read and locally process the tactile sensory data. Rapid advances in neuromorphic hardware, flexible energy generation, energy-conscious electronics, flexible and printed electronics are also discussed. This article is part of the theme issue ‘Harmonizing energy-autonomous computing and intelligence’

    Evaluation of STT-MRAM main memory for HPC and real-time systems

    Get PDF
    It is questionable whether DRAM will continue to scale and will meet the needs of next-generation systems. Therefore, significant effort is invested in research and development of novel memory technologies. One of the candidates for nextgeneration memory is Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM). STT-MRAM is an emerging non-volatile memory with a lot of potential that could be exploited for various requirements of different computing systems. Being a novel technology, STT-MRAM devices are already approaching DRAM in terms of capacity, frequency and device size. Special STT-MRAM features such as intrinsic radiation hardness, non-volatility, zero stand-by power and capability to function in extreme temperatures also make it particularly suitable for aerospace, avionics and automotive applications. Despite of being a conceivable alternative for main memory technology, to this day, academic research of STT-MRAM main memory remains marginal. This is mainly due to the unavailability of publicly available detailed timing parameters of this novel technology, which are required to perform a cycle accurate main memory simulation. Some researchers adopt simplistic memory models to simulate main memory, but such models can introduce significant errors in the analysis of the overall system performance. Therefore, detailed timing parameters are a must-have for any evaluation or architecture exploration study of STT-MRAM main memory. These detailed parameters are not publicly available because STT-MRAM manufacturers are reluctant to release any delicate information on the technology. This thesis demonstrates an approach to perform a cycle accurate simulation of STT-MRAM main memory, being the first to release detailed timing parameters of this technology from academia, essentially enabling researchers to conduct reliable system level simulation of STT-MRAM using widely accepted existing simulation infrastructure. Our results show that, in HPC domain STT-MRAM provide performance comparable to DRAM. Results from the power estimation indicates that STT-MRAM power consumption increases significantly for Activation/Precharge power while Burst power increases moderately and Background power does not deviate much from DRAM. The thesis includes detailed STT-MRAM main memory timing parameters to the main repositories of DramSim2 and Ramulator, two of the most widely used and accepted state-of-the-art main memory simulators. The STT-MRAM timing parameters that has been originated as a part of this thesis, are till date the only reliable and publicly available timing information on this memory technology published from academia. Finally, the thesis analyzes the feasibility of using STT-MRAM in real-time embedded systems by investigating STT-MRAM main memory impact on average system performance and WCET. STT-MRAM's suitability for the real-time embedded systems is validated on benchmarks provided by the European Space Agency (ESA), EEMBC Autobench and MediaBench suite by analyzing performance and WCET impact. In quantitative terms, our results show that STT-MRAM main memory in real-time embedded systems provides performance and WCET comparable to conventional DRAM, while opening up opportunities to exploit various advantages.Es cuestionable si DRAM continuará escalando y cumplirá con las necesidades de los sistemas de la próxima generación. Por lo tanto, se invierte un esfuerzo significativo en la investigación y el desarrollo de nuevas tecnologías de memoria. Uno de los candidatos para la memoria de próxima generación es la Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM). STT-MRAM es una memoria no volátil emergente con un gran potencial que podría ser explotada para diversos requisitos de diferentes sistemas informáticos. Al ser una tecnología novedosa, los dispositivos STT-MRAM ya se están acercando a la DRAM en términos de capacidad, frecuencia y tamaño del dispositivo. Las características especiales de STTMRAM, como la dureza intrínseca a la radiación, la no volatilidad, la potencia de reserva cero y la capacidad de funcionar en temperaturas extremas, también lo hacen especialmente adecuado para aplicaciones aeroespaciales, de aviónica y automotriz. A pesar de ser una alternativa concebible para la tecnología de memoria principal, hasta la fecha, la investigación académica de la memoria principal de STT-MRAM sigue siendo marginal. Esto se debe principalmente a la falta de disponibilidad de los parámetros de tiempo detallados públicamente disponibles de esta nueva tecnología, que se requieren para realizar un ciclo de simulación de memoria principal precisa. Algunos investigadores adoptan modelos de memoria simplistas para simular la memoria principal, pero tales modelos pueden introducir errores significativos en el análisis del rendimiento general del sistema. Por lo tanto, los parámetros de tiempo detallados son indispensables para cualquier evaluación o estudio de exploración de la arquitectura de la memoria principal de STT-MRAM. Estos parámetros detallados no están disponibles públicamente porque los fabricantes de STT-MRAM son reacios a divulgar información delicada sobre la tecnología. Esta tesis demuestra un enfoque para realizar un ciclo de simulación precisa de la memoria principal de STT-MRAM, siendo el primero en lanzar parámetros de tiempo detallados de esta tecnología desde la academia, lo que esencialmente permite a los investigadores realizar una simulación confiable a nivel de sistema de STT-MRAM utilizando una simulación existente ampliamente aceptada infraestructura. Nuestros resultados muestran que, en el dominio HPC, STT-MRAM proporciona un rendimiento comparable al de la DRAM. Los resultados de la estimación de potencia indican que el consumo de potencia de STT-MRAM aumenta significativamente para la activation/Precharge power, mientras que la Burst power aumenta moderadamente y la Background power no se desvía mucho de la DRAM. La tesis incluye parámetros detallados de temporización memoria principal de STT-MRAM a los repositorios principales de DramSim2 y Ramulator, dos de los simuladores de memoria principal más avanzados y más utilizados y aceptados. Los parámetros de tiempo de STT-MRAM que se han originado como parte de esta tesis, son hasta la fecha la única información de tiempo confiable y disponible al público sobre esta tecnología de memoria publicada desde la academia. Finalmente, la tesis analiza la viabilidad de usar STT-MRAM en real-time embedded systems mediante la investigación del impacto de la memoria principal de STT-MRAM en el rendimiento promedio del sistema y WCET. La idoneidad de STTMRAM para los real-time embedded systems se valida en los applicaciones proporcionados por la European Space Agency (ESA), EEMBC Autobench y MediaBench, al analizar el rendimiento y el impacto de WCET. En términos cuantitativos, nuestros resultados muestran que la memoria principal de STT-MRAM en real-time embedded systems proporciona un desempeño WCET comparable al de una memoria DRAM convencional, al tiempo que abre oportunidades para explotar varias ventajas

    Cost-Driven Hardware-Software Co-Optimization of Machine Learning Pipelines

    Full text link
    Researchers have long touted a vision of the future enabled by a proliferation of internet-of-things devices, including smart sensors, homes, and cities. Increasingly, embedding intelligence in such devices involves the use of deep neural networks. However, their storage and processing requirements make them prohibitive for cheap, off-the-shelf platforms. Overcoming those requirements is necessary for enabling widely-applicable smart devices. While many ways of making models smaller and more efficient have been developed, there is a lack of understanding of which ones are best suited for particular scenarios. More importantly for edge platforms, those choices cannot be analyzed in isolation from cost and user experience. In this work, we holistically explore how quantization, model scaling, and multi-modality interact with system components such as memory, sensors, and processors. We perform this hardware/software co-design from the cost, latency, and user-experience perspective, and develop a set of guidelines for optimal system design and model deployment for the most cost-constrained platforms. We demonstrate our approach using an end-to-end, on-device, biometric user authentication system using a $20 ESP-EYE board

    Limits on Fundamental Limits to Computation

    Full text link
    An indispensable part of our lives, computing has also become essential to industries and governments. Steady improvements in computer hardware have been supported by periodic doubling of transistor densities in integrated circuits over the last fifty years. Such Moore scaling now requires increasingly heroic efforts, stimulating research in alternative hardware and stirring controversy. To help evaluate emerging technologies and enrich our understanding of integrated-circuit scaling, we review fundamental limits to computation: in manufacturing, energy, physical space, design and verification effort, and algorithms. To outline what is achievable in principle and in practice, we recall how some limits were circumvented, compare loose and tight limits. We also point out that engineering difficulties encountered by emerging technologies may indicate yet-unknown limits.Comment: 15 pages, 4 figures, 1 tabl
    • …
    corecore