Search CORE

17 research outputs found

Large-Scale Sorting in Uniform Memory Hierarchies

Author: Nodine Mark H.
Vitter Jeffrey Scott
Publication venue: 'Elsevier BV'
Publication date: 21/03/2011
Field of study

We present several e cient algorithms for sorting on the uniform memory hierarchy (UMH), introduced by Alpern, Carter, and Feig, and its paral- lelization P-UMH.We give optimal and nearly-optimal algorithms for a wide range of bandwidth degradations, including a parsimonious algorithm for constant bandwidth. We also develop optimal sorting algorithms for all bandwidths for other versions of UMH and P-UMH, including natural restrictions we introduce called RUMH and P-RUMH, which more closely correspond to current programming languages

KU ScholarWorks

vMCA: Memory Capacity Aggregation and Management in Cloud Environments

Author: Carpenter Paul
Garrido Luis A.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 31/05/2018
Field of study

In cloud environments, the VMs within the computing nodes generate varying memory demand profiles. When memory utilization reaches its limits due to this, costly (virtual) disk accesses and/or VM migrations can occur. Since some nodes might have idle memory, some costly operations could be avoided by making the idle memory available to the nodes that need it. In view of this, new architectures have been introduced that provide hardware support for a shared global address space that, together with fast interconnects, can share resources across nodes. Thus, memory becomes a global resource. This paper presents a memory capacity aggregation mechanism for cloud environments called vMCA (Virtualized Memory Capacity Aggregation) based on Xen's Transcendent Memory (Tmem). vMCA distributes the system's total memory within a single node and globally across multiple nodes using a user-space process with high-level memory management policies. We evaluate vMCA using CloudSuite 3.0 on Linux and Xen. Our results demonstrate a peak running time improvement of 76.8% when aggregating memory, and of 37.5% when aggregating memory and implementing our policies.This research has received funding from the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement number 610456 (Euroserver). The research was also supported by the Ministry of Economy and Competitiveness of Spain (TIN2012-34557 and TIN2015-65316), HiPEAC Network of Excellence (ICT-287759 and ICT-687698), the FI-DGR Grant Program (2016FI-B-00947) of the Government of Catalonia and the Severo Ochoa Program (SEV-2011-00067) of the Spanish Government.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

Towards a theory of cache-efficient algorithms

Author: Chatterjee Siddhartha
Sen Sandeep
Publication venue: Society for Industrial and Applied Mathematics Philadelphia
Publication date: 01/01/2000
Field of study

We describe a model that enables us to analyze the running time of an algorithm in a computer with a memory hierarchy with limited associativity, in terms of various cache parameters. Our model, an extension of Aggarwal and Vitter's I/O model, enables us to establish useful relationships between the cache complexity and the I/O complexity of computations. As a corollary, we obtain cache-optimal algorithms for some fundamental problems like sorting, FFT, and an important subclass of permutations in the single-level cache model. We also show that ignoring associativity concerns could lead to inferior performance, by analyzing the average-case cache behavior of mergesort. We further extend our model to multiple levels of cache with limited associativity and present optimal algorithms for matrix transpose and sorting. Our techniques may be used for systematic exploitation of the memory hierarchy starting from the algorithm design stage, and dealing with the hitherto unresolved problem of limited associativity

Effective Large Scale Computing Software for Parallel Mesh Generation

Author: Kot andriy
Publication venue: W&M ScholarWorks
Publication date: 01/01/2011
Field of study

Scientists commonly turn to supercomputers or Clusters of Workstations with hundreds (even thousands) of nodes to generate meshes for large-scale simulations. Parallel mesh generation software is then used to decompose the original mesh generation problem into smaller sub-problems that can be solved (meshed) in parallel. The size of the final mesh is limited by the amount of aggregate memory of the parallel machine. Also, requesting many compute nodes on a shared computing resource may result in a long waiting, far surpassing the time it takes to solve the problem.;These two problems (i.e., insufficient memory when computing on a small number of nodes, and long waiting times when using many nodes from a shared computing resource) can be addressed by using out-of-core algorithms. These are algorithms that keep most of the dataset out-of-core (i.e., outside of memory, on disk) and load only a portion in-core (i.e., into memory) at a time.;We explored two approaches to out-of-core computing. First, we presented a traditional approach, which is to modify the existing in-core algorithms to enable out-of-core computing. While we achieved good performance with this approach the task is complex and labor intensive. An alternative approach, we presented a runtime system designed to support out-of-core applications. It requires little modification of the existing in-core application code and still produces acceptable results. Evaluation of the runtime system showed little performance degradation while simplifying and shortening the development cycle of out-of-core applications. The overhead from using the runtime system for small problem sizes is between 12% and 41% while the overlap of computation, communication and disk I/O is above 50% and as high as 61% for large problems.;The main contribution of our work is the ability to utilize computing resources more effectively. The user has a choice of either solving larger problems, that otherwise would not be possible, or solving problems of the same size but using fewer computing nodes, thus reducing the waiting time on shared clusters and supercomputers. We demonstrated that the latter could potentially lead to substantially shorter wall-clock time

College of William & Mary: W&M Publish

Cache-oblivious algorithms

Author: Prokop Harald, 1975-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1999
Field of study

Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999.Includes bibliographical references (p. 67-70).by Harald Prokop.S.M

CiteSeerX

DSpace@MIT

Performance Analysis of Irregular Task-Based Applications on Hybrid Platforms: Structure Matters

Author: Legrand Arnaud
Mello Schnorr Lucas
Miletto Marcelo,
Nesi Lucas,
Publication venue: 'Elsevier BV'
Publication date: 01/10/2022
Field of study

International audienceEfficiently exploiting computational resources in heterogeneous platforms is a real challenge which has motivated the adoption of the task-based programming paradigm where resource usage is dynamic and adaptive. Unfortunately, classical performance visualization techniques used in routine performance analysis often fail to provide any insight in this new context, especially when the application structure is irregular. In this paper, we propose several performance visualization techniques and modeling strategies motivated by the analysis of task-based multifrontal sparse linear solvers whose structure is particularly complex. We show that by building on both a performance model of irregular tasks and on structure of the application (in particular the elimination tree), we can detect and highlight anomalies and understand resource utilization from the application point-of-view in a very insightful way. We validate these novel performance analysis techniques with the QR_mumps sparse parallel solver by describing a series of case studies where we identify and address non trivial performance issues thanks to our visualization methodology

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Virtualization techniques for memory resource exploitation

Author: Garrido Platero Luis Angel
Publication venue: Universitat Politècnica de Catalunya
Publication date: 26/11/2019
Field of study

Cloud infrastructures have become indispensable in our daily lives with the rise of cloud-based services offered by companies like Facebook, Google, Amazon and many others. These cloud infrastructures use a large numbers of servers provisioned with their own computing resources. Each of these servers use a piece of software, called the Hypervisor (``HV''), that allows them to create multiple virtual instances of the server's physical computing resources and abstract them into "Virtual Machines'' (VMs). A VM runs an Operating System, which in turn runs the applications. The VMs within the servers generate varying memory demand behavior. When the demand increases, costly operations such as (virtual) disk accesses and/or VM migrations can occur. As a result, it is necessary to optimize the utilization of the local memory resources within a single computing server. However, pressure on the memory resources can still increase, making it necessary to migrate the VM to a different server with larger memory or add more memory to the same server. At this point, it is important to consider that some of the servers in the cloud infrastructure might have memory resources that they are not using. Considering the possibility to make memory available to the server, new architectures have been introduced that provide hardware support to enable servers to share their memory capacity. This thesis presents multiple contributions to the memory management problem. First, it addresses the problem of optimizing memory resources in a virtualized server through different types of memory abstractions. Two full contributions are presented for managing memory within a single server called SmarTmem and CARLEMM. In this respect, a third contribution is also presented, called CAVMem, that works as the foundation for CARLEMM. Second, this thesis presents two contributions for memory capacity aggregation across multiple servers, offering two mechanisms called GV-Tmem and vMCA, this latter being based on GV-Tmem but with significant enhancements. These mechanisms distribute the server's total memory within a single-server and globally across computing servers using a user-space process with high-level memory management policies.Las infraestructuras para la nube se han vuelto indispensables en nuestras vidas diarias con la proliferación de los servicios ofrecidos por compañías como Facebook, Google, Amazon entre otras. Estas infraestructuras utilizan una gran cantidad de servidores proveídos con sus propios recursos computacionales. Cada unos de estos servidores utilizan un software, llamado el Hipervisor (“HV”), que les permite crear múltiples instancias virtuales de los recursos físicos de computación del servidor y abstraerlos en “Máquinas Virtuales” (VMs). Una VM ejecuta un Sistema Operativo (OS), el cual a su vez ejecuta aplicaciones. Las VMs dentro de los servidores generan un comportamiento variable de demanda de memoria. Cuando la demanda de memoria aumenta, operaciones costosas como accesos al disco (virtual) y/o migraciones de VMs pueden ocurrir. Como resultado, es necesario optimizar la utilización de los recursos de memoria locales dentro del servidor. Sin embargo, la demanda por memoria puede seguir aumentando, haciendo necesario que la VM migre a otro servidor o que se añada más memoria al servidor. En este punto, es importante considerar que algunos servidores podrían tener recursos de memoria que no están utilizando. Considerando la posibilidad de hacer más memoria disponible a los servidores que lo necesitan, nuevas arquitecturas de servidores han sido introducidos que brindan el soporte de hardware necesario para habilitar que los servidores puedan compartir su capacidad de memoria. Esta tesis presenta múltiples contribuciones para el problema de manejo de memoria. Primero, se enfoca en el problema de optimizar los recursos de memoria en un servidor virtualizado a través de distintos tipos de abstracciones de memoria. Dos contribuciones son presentadas para administrar memoria de manera automática dentro de un servidor virtualizado, llamadas SmarTmem y CARLEMM. En este contexto, una tercera contribución es presentada, llamada CAVMem, que proporciona los fundamentos para el desarrollo de CARLEMM. Segundo, la tesis presenta dos contribuciones enfocadas en la agregación de capacidad de memoria a través de múltiples servidores, ofreciendo dos mecanismos llamados GV-Tmem y vMCA, siendo este último basado en GV-Tmem pero con mejoras significativas. Estos mecanismos administran la memoria total de un servidor a nivel local y de manera global a lo largo de los servidores de la infraestructura de nube utilizando un proceso de usuario que implementa políticas de manejo de ..

Tesis Doctorals en Xarxa

Virtualization techniques for memory resource exploitation

Author: Garrido Platero Luis Ángel
Publication venue: Universitat Politècnica de Catalunya
Publication date: 26/11/2019
Field of study

UPCommons. Portal del coneixement obert de la UPC