2,858 research outputs found

    The future of computing beyond Moore's Law.

    Get PDF
    Moore's Law is a techno-economic model that has enabled the information technology industry to double the performance and functionality of digital electronics roughly every 2 years within a fixed cost, power and area. Advances in silicon lithography have enabled this exponential miniaturization of electronics, but, as transistors reach atomic scale and fabrication costs continue to rise, the classical technological driver that has underpinned Moore's Law for 50 years is failing and is anticipated to flatten by 2025. This article provides an updated view of what a post-exascale system will look like and the challenges ahead, based on our most recent understanding of technology roadmaps. It also discusses the tapering of historical improvements, and how it affects options available to continue scaling of successors to the first exascale machine. Lastly, this article covers the many different opportunities and strategies available to continue computing performance improvements in the absence of historical technology drivers. This article is part of a discussion meeting issue 'Numerical algorithms for high-performance computational science'

    Energy challenges for ICT

    Get PDF
    The energy consumption from the expanding use of information and communications technology (ICT) is unsustainable with present drivers, and it will impact heavily on the future climate change. However, ICT devices have the potential to contribute signi - cantly to the reduction of CO2 emission and enhance resource e ciency in other sectors, e.g., transportation (through intelligent transportation and advanced driver assistance systems and self-driving vehicles), heating (through smart building control), and manu- facturing (through digital automation based on smart autonomous sensors). To address the energy sustainability of ICT and capture the full potential of ICT in resource e - ciency, a multidisciplinary ICT-energy community needs to be brought together cover- ing devices, microarchitectures, ultra large-scale integration (ULSI), high-performance computing (HPC), energy harvesting, energy storage, system design, embedded sys- tems, e cient electronics, static analysis, and computation. In this chapter, we introduce challenges and opportunities in this emerging eld and a common framework to strive towards energy-sustainable ICT

    Characterization of interconnection networks in CMPs using full-system simulation

    Get PDF
    Los computadores más recientes incluyen complejos chips compuestos de varios procesadores y una cantidad significativa de memoria cache. La tendencia actual consiste en conectar varios nodos, cada uno de ellos con un procesador y uno o más niveles de cache privada y/o compartida, utilizando una red de interconexión. La importancia de esta red está aumentando a medida que crece el número de nodos que se integran en un chip, ya que pueden aparecer cuellos de botella en la comunicación que reduzcan las prestaciones. Además, la red contribuye en gran medida al consumo de energía y área del chip. En este proyecto, comparamos el comportamiento de tres topologías: el anillo bidireccional, la malla y el toro. El anillo es una topología mínima con bajo coste en energía pero peor rendimiento debido a la mayor latencia de comunicación entre nodos. Por otro lado, el toro tiene mayor número de enlaces entre nodos y ofrece mejores prestaciones. La malla ha sido incluida como una opción intermedia altamente popular. Analizaremos también dos topologías de anillo adicionales que aprovechan la reducida área y complejidad del mismo: una con mayor ancho de banda y otra con routers de menor número de ciclos. Modelamos cuidadosamente todos los componentes del sistema (procesadores, jerarquía de memoria y red de interconexión) utilizando simulación de sistema completo. Ejecutamos aplicaciones reales en arquitecturas con 16 y 64 nodos, incluyendo tanto cargas paralelas como multiprogramadas (ejecución de varias aplicaciones independientes). Demostramos que la topología de la red afecta en gran medida al rendimiento en sistemas con 64 nodos. Con las topologías de anillo, los tiempos de ejecución son mucho mayores debido al aumento del número de saltos que le cuesta a un mensaje atravesar la red. El toro es la topología que ofrece mejor rendimiento, pero la elección más óptima sería la malla si tenemos en cuenta también energía y área. Por otro lado, para chips con 16 nodos, las diferencias en rendimiento son menores y un anillo con routers de 3 cyclos ofrece un tiempo de ejecución aceptable con el menor coste en área y energía. Nuestra aportación más significativa está relacionada con la distribución del tráfico en la red. Vemos que el tráfico no está distribuido uniformemente y que los nodos con mayores tasas de inyección varían con la aplicación. Hasta donde nosotros sabemos, no hay ningún trabajo de investigación previo que destaque este comportamiento

    STATISTICAL MACHINE LEARNING BASED MODELING FRAMEWORK FOR DESIGN SPACE EXPLORATION AND RUN-TIME CROSS-STACK ENERGY OPTIMIZATION FOR MANY-CORE PROCESSORS

    Get PDF
    The complexity of many-core processors continues to grow as a larger number of heterogeneous cores are integrated on a single chip. Such systems-on-chip contains computing structures ranging from complex out-of-order cores, simple in-order cores, digital signal processors (DSPs), graphic processing units (GPUs), application specific processors, hardware accelerators, I/O subsystems, network-on-chip interconnects, and large caches arranged in complex hierarchies. While the industry focus is on putting higher number of cores on a single chip, the key challenge is to optimally architect these many-core processors such that performance, energy and area constraints are satisfied. The traditional approach to processor design through extensive cycle accurate simulations are ill-suited for designing many-core processors due to the large microarchitecture design space that must be explored. Additionally it is hard to optimize such complex processors and the applications that run on them statically at design time such that performance and energy constraints are met under dynamically changing operating conditions. The dissertation establishes statistical machine learning based modeling framework that enables the efficient design and operation of many-core processors that meets performance, energy and area constraints. We apply the proposed framework to rapidly design the microarchitecture of a many-core processor for multimedia, computer graphics rendering, finance, and data mining applications derived from the Parsec benchmark. We further demonstrate the application of the framework in the joint run-time adaptation of both the application and microarchitecture such that energy availability constraints are met

    Doctor of Philosophy

    Get PDF
    dissertationPortable electronic devices will be limited to available energy of existing battery chemistries for the foreseeable future. However, system-on-chips (SoCs) used in these devices are under a demand to offer more functionality and increased battery life. A difficult problem in SoC design is providing energy-efficient communication between its components while maintaining the required performance. This dissertation introduces a novel energy-efficient network-on-chip (NoC) communication architecture. A NoC is used within complex SoCs due it its superior performance, energy usage, modularity, and scalability over traditional bus and point-to-point methods of connecting SoC components. This is the first academic research that combines asynchronous NoC circuits, a focus on energy-efficient design, and a software framework to customize a NoC for a particular SoC. Its key contribution is demonstrating that a simple, asynchronous NoC concept is a good match for low-power devices, and is a fruitful area for additional investigation. The proposed NoC is energy-efficient in several ways: simple switch and arbitration logic, low port radix, latch-based router buffering, a topology with the minimum number of 3-port routers, and the asynchronous advantages of zero dynamic power consumption while idle and the lack of a clock tree. The tool framework developed for this work uses novel methods to optimize the topology and router oorplan based on simulated annealing and force-directed movement. It studies link pipelining techniques that yield improved throughput in an energy-efficient manner. A simulator is automatically generated for each customized NoC, and its traffic generators use a self-similar message distribution, as opposed to Poisson, to better match application behavior. Compared to a conventional synchronous NoC, this design is superior by achieving comparable message latency with half the energy
    corecore