
    Improving Index Performance through Prefetching


    Performance Analysis of the Alpha 21264-Based Compaq ES40 System

    This paper evaluates performance characteristics of the Compaq ES40 shared memory multiprocessor. The ES40 system contains up to four Alpha 21264 CPUs together with a high-performance memory system. We qualitatively describe architectural features included in the 21264 microprocessor and the surrounding system chipset. We further quantitatively show the performance effects of these features using benchmark results and profiling data collected from industry-standard commercial and technical workloads. The profile data includes basic performance information – such as instructions per cycle, branch mispredicts, and cache misses – as well as other data that specifically characterizes the 21264. Wherever possible, we compare and contrast the ES40 to the AlphaServer 4100 – a previous-generation Alpha system containing four Alpha 21164 microprocessors – to highlight the architectural advances in the ES40. We find that the Compaq ES40 often provides 2 to 3 times the performance of the AlphaServer 4100 at similar clock frequencies. We also find that the ES40 memory system has about five times the memory bandwidth of the 4100. These performance improvements come from numerous microprocessor and platform enhancements, including out-of-order execution, branch prediction, functional units, and the memory system.

    Secure execution environments through reconfigurable lightweight cryptographic components

    Software protection is one of the most important problems in computing, as it affects a multitude of players such as software vendors, digital content providers, users, and government agencies. There are multiple dimensions to this broad problem. The most important ones are: (1) protecting software from reverse engineering; (2) protecting software from tampering (or modification); (3) preventing software piracy; and (4) verifying the integrity of the software. In this thesis we focus on these areas of software protection. The basic requirement for achieving these goals is a secure execution environment, which ensures that programs behave as they were designed to and that execution platforms respect certain types of wishes specified by the program. We take the approach of providing a secure execution environment through architecture support, and we exploit the power of reconfigurable components to achieve this. The first problem we consider is architecture support for obfuscation, which also indirectly achieves the goals of tamper resistance, copy protection, and IP protection. Our approach is based on the intuition that software is a sequence of instructions (and data), and that if both the sequence and the contents are obfuscated, then all the required goals can be achieved. The second problem we solve is integrity verification of software, particularly in embedded devices. Our solution is based on the intuition that an obfuscated (permuted) binary image without any dynamic traces reveals very little information about the IP of the program. Moreover, if this obfuscation function becomes a shared secret between the verifier and the embedded device, then verification can be performed in a trustworthy manner. Cryptographic components form the underlying building blocks (primitives) of any secure execution environment. Our use of reconfigurable components to provide software protection in both Arc3D and TIVA led us to an interesting observation about the power of reconfigurable components: they provide the ability to use the secret (or key) in a much stronger way than conventional cryptographic designs. This opened up an opportunity for us to explore the use of reconfigurable gates to build cryptographic functions.

    Memory architectures for exaflop computing systems

    Most computing systems are heavily dependent on their main memories, either as their primary storage or as an intermediate cache for slower storage systems (HDDs). The capacity of memory systems, as well as their performance, has a direct impact on the overall computing capabilities of the system, and both are major contributors to its initial and operating costs. Dynamic Random Access Memory (DRAM) technology has dominated the main memory landscape from its beginnings in the 1970s until today. However, due to DRAM's inherent limitations, its steady rate of development has saturated over the past decade, creating a disparity between CPU and main memory performance known as the memory wall. Modern parallel architectures, such as High-Performance Computing (HPC) clusters and manycore solutions, put even more stress on their memory systems. It is not trivial to estimate the memory requirements that these systems will have in the future, or whether DRAM technology will be able to meet them or a novel memory solution will be needed. This thesis attempts to give insight into the most important technological challenges that future memory systems need to address in order to meet the ever-growing requirements of users and their applications in the manycore and HPC context. We try to describe the limitations of DRAM, as the dominant technology in today's main memory systems, that may impede performance or increase the cost of future systems. We discuss some of the emerging memory technologies and, by comparing them with DRAM, try to estimate their potential usage in future memory systems. The thesis evaluates the requirements of manycore scientific applications, in terms of memory bandwidth and footprint, and estimates how these requirements may change in the future.
With this evaluation in mind, we propose a hybrid memory solution that employs DRAM and PCM, together with several page placement and page migration policies, to bridge the gap between fast but small DRAM and larger but slower non-volatile memory. As the aforementioned evaluations required custom software solutions, we present the tools we produced over the course of this PhD, which continue to be used in the Heterogeneous Computer Architectures group at the Barcelona Supercomputing Center. First, Limpio, a LIghtweight MPI instrumentatiOn framework that provides an interface for low-overhead instrumentation and profiling of MPI applications with user-defined routines. Second, MemTraceMPI, a Valgrind tool used to produce memory access traces of MPI applications, which includes several innovative concepts (filter-cache, iteration tracing, compressed trace files).