
    Main memory in HPC: do we need more, or could we live with less?

    An important aspect of High-Performance Computing (HPC) system design is the choice of main memory capacity. This choice is becoming increasingly important now that 3D-stacked memories are entering the market. Compared with conventional Dual In-line Memory Modules (DIMMs), 3D memory chiplets provide better performance and energy efficiency but lower memory capacities. Therefore, the adoption of 3D-stacked memories in the HPC domain depends on whether we can find use cases that require much less memory than is available now. This study analyzes the memory capacity requirements of important HPC benchmarks and applications. We find that the High-Performance Conjugate Gradients (HPCG) benchmark could be an important success story for 3D-stacked memories in HPC, but High-Performance Linpack (HPL) is likely to be constrained by 3D memory capacity. The study also emphasizes that analyzing the memory footprints of production HPC applications is complex and requires an understanding of application scalability and target category, i.e., whether the users target capability or capacity computing. The results show that most of the HPC applications under study have per-core memory footprints in the range of hundreds of megabytes, but we also detect applications and use cases that require gigabytes per core. Overall, the study identifies the HPC applications and use cases with memory footprints that could be provided by 3D-stacked memory chiplets, making a first step toward adoption of this novel technology in the HPC domain.

    This work was supported by the Collaboration Agreement between Samsung Electronics Co., Ltd. and BSC, by the Spanish Government through the Severo Ochoa programme (SEV-2015-0493), by the Spanish Ministry of Science and Technology through the TIN2015-65316-P project, and by the Generalitat de Catalunya (contracts 2014-SGR-1051 and 2014-SGR-1272). This work has also received funding from the European Union’s Horizon 2020 research and innovation programme under the ExaNoDe project (grant agreement No 671578). Darko Zivanovic holds the Severo Ochoa grant (SVP-2014-068501) of the Ministry of Economy and Competitiveness of Spain. The authors thank Harald Servat from BSC and Vladimir Marjanović from the High Performance Computing Center Stuttgart for their technical support.

    Postprint (published version)

    Near-Memory Address Translation

    Memory and logic integration on the same chip is becoming increasingly cost-effective, creating the opportunity to offload data-intensive functionality to processing units placed inside memory chips. The introduction of memory-side processing units (MPUs) into conventional systems faces virtual memory as the first big showstopper: without efficient hardware support for address translation, MPUs have highly limited applicability. Unfortunately, conventional translation mechanisms fall short of providing fast translations because contemporary memories exceed the reach of TLBs, making expensive page walks common. In this paper, we are the first to show that the historically important flexibility to map any virtual page to any page frame is unnecessary in today's servers. We find that limiting the associativity of the virtual-to-physical mapping incurs no penalty and, if combined with careful data placement in the MPU's memory, can break the translate-then-fetch serialization, allowing translation and data fetch to proceed independently and in parallel. We propose the Distributed Inverted Page Table (DIPTA), a near-memory structure in which the smallest memory partition keeps the translation information for its data share, ensuring that the translation completes together with the data fetch. DIPTA completely eliminates the performance overhead of translation, achieving speedups of up to 3.81x and 2.13x over conventional translation using 4KB and 1GB pages, respectively.

    Comment: 15 pages, 9 figures
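    The idea of a limited-associativity virtual-to-physical mapping can be illustrated with a toy model. The sketch below is not the DIPTA design from the paper; the class name, the modulo set-index function, and the frame numbering are all assumptions made for illustration. It only shows the property the abstract relies on: the virtual page number alone determines which small group of frames can hold the page, so the hardware would know where to start fetching before translation finishes.

```python
class SetAssociativePageMap:
    """Toy model (hypothetical, not the paper's DIPTA): each virtual page
    may live only in one of `ways` frames of a set chosen by its number."""

    def __init__(self, num_frames, ways):
        if num_frames % ways != 0:
            raise ValueError("num_frames must be a multiple of ways")
        self.ways = ways
        self.num_sets = num_frames // ways
        # tags[s][w] is the virtual page number held in way w of set s.
        # This plays the role of a per-partition inverted page table.
        self.tags = [[None] * ways for _ in range(self.num_sets)]

    def set_of(self, vpn):
        # Assumed index function: low-order bits of the virtual page number.
        return vpn % self.num_sets

    def translate(self, vpn):
        """Return the frame holding vpn, or None if it is not mapped."""
        s = self.set_of(vpn)
        for w, tag in enumerate(self.tags[s]):
            if tag == vpn:
                return s * self.ways + w
        return None

    def map_page(self, vpn):
        """Place vpn in a free way of its set and return the frame number."""
        existing = self.translate(vpn)
        if existing is not None:
            return existing
        s = self.set_of(vpn)
        for w in range(self.ways):
            if self.tags[s][w] is None:
                self.tags[s][w] = vpn
                return s * self.ways + w
        raise MemoryError("all ways of the set are occupied")


page_map = SetAssociativePageMap(num_frames=16, ways=4)
frame_a = page_map.map_page(5)   # set 5 % 4 = 1, first free way
frame_b = page_map.map_page(9)   # same set, next free way
```

    In this model a lookup inspects at most `ways` tags, all stored next to the candidate frames, which is why the tag check and the data access could overlap instead of running one after the other.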

    Robust and parallel scalable iterative solutions for large-scale finite cell analyses

    The finite cell method is a highly flexible discretization technique for numerical analysis on domains with complex geometries. By using a non-boundary-conforming computational domain that can be easily meshed, automated computations can be performed on a wide range of geometrical models. Application of the finite cell method, and of other immersed methods, to large real-life and industrial problems is often limited by the conditioning problems associated with these methods. These conditioning problems have caused researchers to resort to direct solution methods, which significantly limit the maximum size of solvable systems. Iterative solvers are better suited for large-scale computations than their direct counterparts due to their lower memory requirements and suitability for parallel computing. These benefits can, however, only be exploited when systems are properly conditioned. In this contribution we present an Additive-Schwarz type preconditioner that enables efficient and parallel scalable iterative solutions of large-scale multi-level hp-refined finite cell analyses.

    Comment: 32 pages, 17 figures
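    For orientation, the generic one-level additive Schwarz preconditioner is commonly written as below. The abstract does not state which blocks the authors select or whether they add a coarse level, so this is the textbook form only; the symbols $A$ (system matrix), $R_i$ (restriction onto the degrees of freedom of block $i$), and $N$ (number of blocks) are notation assumed here.

```latex
M_{\mathrm{AS}}^{-1} \;=\; \sum_{i=1}^{N} R_i^{T} \left( R_i A R_i^{T} \right)^{-1} R_i
```

    Each term solves a small local problem independently of the others, which is what makes this kind of preconditioner suitable for parallel computing.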

    Hybrid M-FSK/DQPSK Modulations for CubeSat Picosatellites

    Conventional CubeSat radio systems typically use one of several basic modulations, such as AFSK, GMSK, BPSK, QPSK and OOK, or switch between them on demand if possible. These modulations represent a balanced trade-off between the good energy efficiency of high-order M-FSK modulation and the good spectral efficiency of high-order M-QAM modulation. Modulations with the best energy efficiency cannot be used because of strict limits on occupied frequency bandwidth. This paper presents a proposed group of hybrid modulations together with a proposed hybrid modulator and demodulator. The novel solution offers interesting possibilities for increasing both the spectral efficiency and the energy efficiency of basic M-FSK modulation by embedding DQPSK symbols between two M-FSK symbols. Such a group of hybrid modulations offers properties suitable for picosatellites, e.g., simple realization onboard the picosatellite, better energy and spectral efficiency, low PAPR, a wide range of adaptation by changing the order of M-FSK, suitability for easy non-coherent demodulation, and good immunity to the Doppler effect with DM-FSK coding.