56 research outputs found

    Exploiting NVM in Large-scale Graph Analytics

    Data center applications like graph analytics require servers with ever larger memory capacities. DRAM scaling, however, cannot keep up with the growing demand for capacity. Emerging byte-addressable, non-volatile memory (NVM) technologies offer a more scalable alternative, with memory that is directly addressable by software, but at higher latency and lower bandwidth. Using an NVM hardware emulator, we study the suitability of NVM for meeting the memory demands of four state-of-the-art graph analytics frameworks: Graphlab, Galois, X-Stream and Graphmat. We evaluate their performance on popular algorithms (Pagerank, BFS, Triangle Counting and Collaborative Filtering) by allocating memory exclusively from DRAM (DRAM-only) or from emulated NVM (NVM-only). While all of these applications are sensitive to the higher latency or lower bandwidth of NVM, resulting in performance degradation of up to 4X with NVM-only (compared to DRAM-only), we show that the performance impact is somewhat mitigated in the frameworks that exploit CPU memory-level parallelism and hardware prefetchers. Further, we demonstrate that, in a hybrid memory system with NVM and DRAM, intelligent placement of application data based on its relative importance may help offset the overheads of the NVM-only solution in a cost-effective manner (i.e., using only a small amount of DRAM). Specifically, we show that, depending on the algorithm, Graphmat can achieve close to DRAM-only performance (within 1.2X) by placing only 6.7% to 31.5% of its total memory footprint in DRAM.
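
The placement idea in the abstract above (keep the most important data in a small amount of DRAM, leave the rest in NVM) can be illustrated with a minimal sketch. This is not the paper's method: the greedy "accesses per byte" ranking, the `Region` type, the `place_in_dram` function and all numbers below are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A hypothetical application data structure with profiled statistics."""
    name: str
    size_bytes: int   # assumed to be > 0
    accesses: int     # assumed to come from an offline profile


def place_in_dram(regions, dram_fraction):
    """Greedily assign regions to DRAM by access density until the budget is full.

    dram_fraction is the share of the total footprint allowed in DRAM.
    Returns (placement dict, bytes of DRAM used).
    """
    total = sum(r.size_bytes for r in regions)
    budget = int(total * dram_fraction)
    placement = {}
    used = 0
    # Hottest data per byte first; anything that does not fit stays in NVM.
    ranked = sorted(regions, key=lambda r: r.accesses / r.size_bytes, reverse=True)
    for r in ranked:
        if used + r.size_bytes <= budget:
            placement[r.name] = "DRAM"
            used += r.size_bytes
        else:
            placement[r.name] = "NVM"
    return placement, used


# Made-up footprint: small, hot vertex data and a large, colder edge list.
regions = [
    Region("vertex_values", 100, 10_000),
    Region("edge_list", 800, 4_000),
    Region("scratch", 100, 100),
]
placement, used = place_in_dram(regions, dram_fraction=0.25)
```

With these made-up numbers the large edge list stays in NVM while the two small regions fit in the DRAM budget. A real system would also need to weigh read/write mix and access pattern, which this sketch ignores.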

    Implicit Decomposition for Write-Efficient Connectivity Algorithms

    The future of main memory appears to lie in the direction of new technologies that provide strong capacity-to-performance ratios, but have write operations that are much more expensive than reads in terms of latency, bandwidth, and energy. Motivated by this trend, we propose sequential and parallel algorithms to solve graph connectivity problems using significantly fewer writes than conventional algorithms. Our primary algorithmic tool is the construction of an o(n)-sized "implicit decomposition" of a bounded-degree graph G on n nodes, which combined with read-only access to G enables fast answers to connectivity and biconnectivity queries on G. The construction breaks the linear-write "barrier", resulting in costs that are asymptotically lower than conventional algorithms while adding only a modest cost to querying time. For general non-sparse graphs on m edges, we also provide the first parallel algorithms for connectivity and biconnectivity that use o(m) writes and O(m) operations. These algorithms provide insight into how applications can efficiently process computations on large graphs in systems with read-write asymmetry.

    Adding Machine Intelligence to Hybrid Memory Management

    Computing platforms increasingly incorporate heterogeneous memory hardware technologies as a way to scale application performance and memory capacity and to achieve cost effectiveness. However, this heterogeneity, along with the greater irregularity in the behavior of emerging workloads, renders existing hybrid memory management approaches ineffective, calling for more intelligent methods. To this end, this thesis reveals new insights, develops novel methods and contributes system-level mechanisms towards the practical integration of machine learning into hybrid memory management, boosting application performance and system resource efficiency. First, this thesis builds Kleio, a hybrid memory page scheduler with machine intelligence. Kleio deploys Recurrent Neural Networks to learn memory access patterns at page granularity and to improve the selection of dynamic page migrations across the memory hardware components. Kleio focuses the machine learning on the subset of pages whose timely movement yields the greatest application performance improvement, while preserving lightweight history-based management for the rest of the pages. In this way, Kleio bridges on average 80% of the relative existing performance gap, while laying the groundwork for practical machine-intelligent data management with manageable learning overheads. In addition, this thesis contributes three system-level mechanisms to further boost application performance and reduce the operational and learning overheads of machine learning-based hybrid memory management. First, this thesis builds Cori, a system-level solution for tuning the operational frequency of periodic page schedulers for hybrid memories. Cori leverages insights on data reuse times to fine-tune the page migration frequency in a lightweight manner. Second, this thesis contributes Coeus, a page grouping mechanism for page schedulers like Kleio. Coeus leverages Cori's data reuse insights to tune the granularity at which patterns are interpreted by the page scheduler and to enable the training of a single Recurrent Neural Network per page cluster, reducing model training times by 3x. The combined effects of Cori and Coeus provide 3x additional performance improvements to Kleio. Finally, this thesis proposes Cronus, an image-based page selector for page schedulers like Kleio. Cronus uses visualization to accelerate the process of selecting which page patterns should be managed with machine learning, reducing the operational overheads of Kleio by 75x. Cronus lays the foundations for future use of visualization and computer vision methods in memory management, such as image-based memory access pattern classification, recognition and prediction.
    Ph.D.
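
As a rough illustration of the "history-based lightweight management" baseline that the abstract above contrasts with learned scheduling, the following minimal sketch promotes the pages that were hottest in the previous period. It is not Kleio's or Cori's implementation: the function name, the trace format and the hit-rate metric are assumptions made for this example.

```python
from collections import Counter


def history_based_hit_rate(access_trace, dram_capacity_pages, period):
    """Simulate a periodic, history-based page scheduler.

    At the end of each period, the pages accessed most often in that period
    are moved to DRAM for the next one. Returns the fraction of accesses
    that were served from DRAM.
    """
    dram = set()          # pages currently resident in DRAM (empty at start)
    hits = 0
    total = 0
    for start in range(0, len(access_trace), period):
        window = access_trace[start:start + period]
        hits += sum(1 for page in window if page in dram)
        total += len(window)
        # Use the period that just ended as the prediction for the next one.
        counts = Counter(window)
        dram = {page for page, _ in counts.most_common(dram_capacity_pages)}
    return hits / total if total else 0.0


# Made-up trace of page numbers: two periods of four accesses each.
trace = [1, 1, 2, 3, 1, 1, 2, 4]
rate = history_based_hit_rate(trace, dram_capacity_pages=2, period=4)
```

The `period` parameter is the kind of knob the abstract describes as being tuned from data reuse times. Too long a period reacts slowly to changes, while too short a period migrates pages on too little evidence.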

    Explorando a substituição de DRAM por NVM na memória principal através de simulação (Exploring the replacement of DRAM with NVM in main memory through simulation)

    Advisors: Rodolfo Jardim de Azevedo, Emílio de Camargo Francesquini. Dissertation (master's), Universidade Estadual de Campinas, Instituto de Computação.

    Abstract: Computer memory systems have long relied on volatile memories for performance. SRAM is used in the layer closest to the CPU to accelerate access to main memory, which is traditionally built from DRAM. Non-volatile memories are relegated to secondary storage, serving as an extension of main memory and allowing data to persist. Because persistent data reside in the memory layer farthest from the CPU, they are usually not manipulated directly but through transient copies, which may differ in form from the persistent data and therefore require translation between the two forms. These transient copies may also be present in more than one volatile level of the memory hierarchy, resulting in data replication. This scenario may change with the adoption of emerging non-volatile memories (NVMs), such as phase change memory, which may allow persistent data to exist in main memory. Persistent data could then be manipulated directly, speeding up access and potentially reducing the number of transient copies. Unfortunately, NVMs are still not broadly available on the market, and research on their use is mostly done through simulation. We present a simulator for exploring the use of NVMs in main memory. We demonstrate the simulator in two scenarios: one in which DRAM is completely replaced by NVM, and one in which a hybrid architecture combining DRAM and NVM is explored. For now, DRAM provides faster access times than NVMs, which limits the use of NVM in main memory on performance grounds. We show that a main memory composed exclusively of NVM may incur slowdowns as high as 5.3, although the impact is negligible for some programs. In the hybrid scenario, we show that, although persistent data can be manipulated directly, there are cases in which it is still better to work with transient copies in DRAM, depending on how frequently the persistent data are used. To allow programs to use the non-volatility available in main memory, we provide an API, called NVMalloc, that allocates persistent memory in main memory. We expect the simulator to be a starting point for future research on the use of NVMs.

    Degree: Master's (Mestre em Ciência da Computação), Computer Science. 1564396 CAPE