Search CORE

138 research outputs found

ANALYTICAL MODEL FOR CHIP MULTIPROCESSOR MEMORY HIERARCHY DESIGN AND MAMAGEMENT

Author: Oh Tae Cheol
Publication venue
Publication date: 30/01/2011
Field of study

Continued advances in circuit integration technology has ushered in the era of chip multiprocessor (CMP) architectures as further scaling of the performance of conventional wide-issue superscalar processor architectures remains hard and costly. CMP architectures take advantageof Moore¡¯s Law by integrating more cores in a given chip area rather than a single fastyet larger core. They achieve higher performance with multithreaded workloads. However,CMP architectures pose many new memory hierarchy design and management problems thatmust be addressed. For example, how many cores and how much cache capacity must weintegrate in a single chip to obtain the best throughput possible? Which is more effective,allocating more cache capacity or memory bandwidth to a program?This thesis research develops simple yet powerful analytical models to study two newmemory hierarchy design and resource management problems for CMPs. First, we considerthe chip area allocation problem to maximize the chip throughput. Our model focuses onthe trade-off between the number of cores, cache capacity, and cache management strategies.We find that different cache management schemes demand different area allocation to coresand cache to achieve their maximum performance. Second, we analyze the effect of cachecapacity partitioning on the bandwidth requirement of a given program. Furthermore, ourmodel considers how bandwidth allocation to different co-scheduled programs will affect theindividual programs¡¯ performance. Since the CMP design space is large and simulating only one design point of the designspace under various workloads would be extremely time-consuming, the conventionalsimulation-based research approach quickly becomes ineffective. We anticipate that ouranalytical models will provide practical tools to CMP designers and correctly guide theirdesign efforts at an early design stage. Furthermore, our models will allow them to betterunderstand potentially complex interactions among key design parameters

D-Scholarship@Pitt

Recommended from our members

Modeling the Effects of Memory Hierarchy Performance on Throughput of Multithreaded Processors

Author: Fedorova Alexandra
Seltzer Margo I.
Smith Michael D.
Publication venue
Publication date: 25/02/2016
Field of study

Understanding the relationship between the performance of the on-chip processor caches and the overall performance of the processor is critical for both hardware design and software program optimization. While this relationship is well understood for conventional processors, it is not understood for new multithreaded processors that hide a workload's memory latency by executing instructions from several threads in parallel. In this paper we present a model for estimating processor throughput as a function of the cache hierarchy performance. Our model has a closed-form solution, is robust against a range of workloads and input parameters, and gives estimates of processor throughput that are within 13% of measured values for heterogeneous workloads. We demonstrate how this model can be used in an operating system scheduler tailored for multithreaded processor systems.Engineering and Applied Science

Harvard University - DASH

Design and Performance of Scalable High-Performance Programmable Routers - Doctoral Dissertation, August 2002

Author: Wolf Tilman
Publication venue: Washington University Open Scholarship
Publication date: 03/05/2002
Field of study

The flexibility to adapt to new services and protocols without changes in the underlying hardware is and will increasingly be a key requirement for advanced networks. Introducing a processing component into the data path of routers and implementing packet processing in software provides this ability. In such a programmable router, a powerful processing infrastructure is necessary to achieve to level of performance that is comparable to custom silicon-based routers and to demonstrate the feasibility of this approach. This work aims at the general design of such programmable routers and, specifically, at the design and performance analysis of the processing subsystem. The necessity of programmable routers is motivated, and a router design is proposed. Based on the design, a general performance model is developed and quantitatively evaluated using a new network processor benchmark. Operational challenges, like scheduling of packets to processing engines, are addressed, and novel algorithms are presented. The results of this work give qualitative and quantitative insights into this new domain that combines issues from networking, computer architecture, and system design

Washington University St. Louis: Open Scholarship

Automatic synthesis and optimization of chip multiprocessors

Author: Nikitin Nikita
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2013
Field of study

The microprocessor technology has experienced an enormous growth during the last decades. Rapid downscale of the CMOS technology has led to higher operating frequencies and performance densities, facing the fundamental issue of power dissipation. Chip Multiprocessors (CMPs) have become the latest paradigm to improve the power-performance efficiency of computing systems by exploiting the parallelism inherent in applications. Industrial and prototype implementations have already demonstrated the benefits achieved by CMPs with hundreds of cores.CMP architects are challenged to take many complex design decisions. Only a few of them are:- What should be the ratio between the core and cache areas on a chip?- Which core architectures to select?- How many cache levels should the memory subsystem have?- Which interconnect topologies provide efficient on-chip communication?These and many other aspects create a complex multidimensional space for architectural exploration. Design Automation tools become essential to make the architectural exploration feasible under the hard time-to-market constraints. The exploration methods have to be efficient and scalable to handle future generation on-chip architectures with hundreds or thousands of cores.Furthermore, once a CMP has been fabricated, the need for efficient deployment of the many-core processor arises. Intelligent techniques for task mapping and scheduling onto CMPs are necessary to guarantee the full usage of the benefits brought by the many-core technology. These techniques have to consider the peculiarities of the modern architectures, such as availability of enhanced power saving techniques and presence of complex memory hierarchies.This thesis has several objectives. The first objective is to elaborate the methods for efficient analytical modeling and architectural design space exploration of CMPs. The efficiency is achieved by using analytical models instead of simulation, and replacing the exhaustive exploration with an intelligent search strategy. Additionally, these methods incorporate high-level models for physical planning. The related contributions are described in Chapters 3, 4 and 5 of the document.The second objective of this work is to propose a scalable task mapping algorithm onto general-purpose CMPs with power management techniques, for efficient deployment of many-core systems. This contribution is explained in Chapter 6 of this document.Finally, the third objective of this thesis is to address the issues of the on-chip interconnect design and exploration, by developing a model for simultaneous topology customization and deadlock-free routing in Networks-on-Chip. The developed methodology can be applied to various classes of the on-chip systems, ranging from general-purpose chip multiprocessors to application-specific solutions. Chapter 7 describes the proposed model.The presented methods have been thoroughly tested experimentally and the results are described in this dissertation. At the end of the document several possible directions for the future research are proposed

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura

Empirical and Statistical Application Modeling Using on -Chip Performance Monitors.

Author: Cameron Kirk William
Publication venue: LSU Digital Commons
Publication date: 01/01/2000
Field of study

To analyze the performance of applications and architectures, both programmers and architects desire formal methods to explain anomalous behavior. To this end, we present various methods that utilize non-intrusive, performance-monitoring hardware only recently available on microprocessors to provide further explanations of observed behavior. All the methods attempt to characterize and explain the instruction-level parallelism achieved by codes on different architectures. We also present a prototype tool automating the analysis process to exploit the advantages of the empirical and statistical methods proposed. The empirical, statistical and hybrid methods are discussed and explained with case study results provided. The given methods further the wealth of tools available to programmer\u27s and architects for generally understanding the performance of scientific applications. Specifically, the models and tools presented provide new methods for evaluating and categorizing application performance. The empirical memory model serves to quantify the hierarchical memory performance of applications by inferring the incurred latencies of codes after the effect of latency hiding techniques are realized. The instruction-level model and its extensions model on-chip performance analytically giving insight into inherent performance bottlenecks in superscalar architectures. The statistical model and its hybrid extension provide other methods of categorizing codes via their statistical variations. The PTERA performance tool automates the use of performance counters for use by these methods across platforms making the modeling process easier still. These unique methods provide alternatives to performance modeling and categorizing not available previously in an attempt to utilize the inherent modeling capabilities of performance monitors on commodity processors for scientific applications

Louisiana State University

Recommended from our members

Cache-Fair Thread Scheduling for Multicore Processors

Author: Fedorova Alexandra
Seltzer Margo I.
Smith Michael D.
Publication venue
Publication date: 04/03/2016
Field of study

We present a new operating system scheduling algorithm for multicore processors. Our algorithm reduces the effects of unequal CPU cache sharing that occur on these processors and cause unfair CPU sharing, priority inversion, and inadequate CPU accounting. We describe the implementation of our algorithm in the Solaris operating system and demonstrate that it produces fairer schedules enabling better priority enforcement and improved performance stability for applications. With conventional scheduling algorithms, application performance on multicore processors varies by up to 36% depending on the runtime characteristics of concurrent processes. We reduce this variability by up to a factor of seven.Engineering and Applied Science

Harvard University - DASH

High-performance and hardware-aware computing: proceedings of the second International Workshop on New Frontiers in High-performance and Hardware-aware Computing (HipHaC\u2711), San Antonio, Texas, USA, February 2011 ; (in conjunction with HPCA-17)

Author: Buchty Rainer
Weiß Jan-Philipp
Publication venue: KIT Scientific Publishing, Karlsruhe
Publication date: 01/01/2011
Field of study

High-performance system architectures are increasingly exploiting heterogeneity. The HipHaC workshop aims at combining new aspects of parallel, heterogeneous, and reconfigurable microprocessor technologies with concepts of high-performance computing and, particularly, numerical solution methods. Compute- and memory-intensive applications can only benefit from the full hardware potential if all features on all levels are taken into account in a holistic approach

KITopen

Analysis and simulation of emergent architectures for internet of things

Author: Roca Marí Damián
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/02/2018
Field of study

The Internet of Things (IoT) promises a plethora of new services and applications supported by a wide range of devices that includes sensors and actuators. To reach its potential IoT must break down the silos that limit applications' interoperability and hinder their manageability. These silos' result from existing deployment techniques where each vendor set up its own infrastructure, duplicating the hardware and increasing the costs. Fog Computing can serve as the underlying platform to support IoT applications thus avoiding the silos'. Each application becomes a system formed by IoT devices (i.e. sensors, actuators), an edge infrastructure (i.e. Fog Computing) and the Cloud. In order to improve several aspects of human lives, different systems can interact to correlate data obtaining functionalities not achievable by any of the systems in isolation. Then, we can analyze the IoT as a whole system rather than a conjunction of isolated systems. Doing so leads to the building of Ultra-Large Scale Systems (ULSS), an extension of the concept of Systems of Systems (SoS), in several verticals including Autonomous Vehicles, Smart Cities, and Smart Grids. The scope of ULSS is large in the number of things and complex in the variety of applications, volume of data, and diversity of communication patterns. To handle this scale and complexity in this thesis we propose Hierarchical Emergent Behaviors (HEB), a paradigm that builds on the concepts of emergent behavior and hierarchical organization. Rather than explicitly program all possible situations in the vast space of ULSS scenarios, HEB relies on emergent behaviors induced by local rules that define the interactions of the "things" between themselves and also with their environment. We discuss the modifications to classical IoT architectures required by HEB, as well as the new challenges. Once these challenges such as scalability and manageability are addressed, we can illustrate HEB's usefulness dealing with an IoT-based ULSS through a case study based on Autonomous Vehicles (AVs). To this end we design and analyze well-though simulations that demonstrate its tremendous potential since small modifications to the basic set of rules induce different and interesting behaviors. Then we design a set of primitives to perform basic maneuver such as exiting a platoon formation and maneuvering in anticipation of obstacles beyond the range of on-board sensors. These simulations also evaluate the impact of a HEB deployment assisted by Fog nodes to enlarge the informational scope of vehicles. To conclude we develop a design methodology to build, evaluate, and run HEB-based solutions for AVs. We provide architectural foundations for the second level and its implications in major areas such as communications. These foundations are then validated through simulations that incorporate new rules, obtaining valuable experimental observations. The proposed architecture has a tremendous potential to solve the scalability issue found in ULSS, enabling IoT deployments to reach its true potential.El Internet de las Cosas (IoT) promete una plétora de nuevos servicios y aplicaciones habilitadas por una amplia gama de dispositivos que incluye sensores y actuadores. Para alcanzar su potencial, IoT debe superar los silos que limitan la interoperabilidad de las aplicaciones y dificultan su administración. Estos silos son el resultado de las técnicas de implementación existentes en las que cada proveedor instala su propia infraestructura y duplica el hardware, incrementando los costes. Fog Computing puede servir como la plataforma subyacente que soporte aplicaciones del IoT evitando así los silos. Cada aplicación se convierte en un sistema formado por dispositivos IoT (por ejemplo sensores y actuadores), una infraestructura (como Fog Computing) y la nube. Con el fin de mejorar varios aspectos de la vida humana, diferentes sistemas pueden interactuar para correlacionar datos obteniendo funcionalidades que no pueden lograrse por ninguno de los sistemas de forma aislada. Entonces, podemos analizar el IoT como un único sistema en lugar de una conjunción de sistemas aislados. Esta perspectiva conduce a la construcción de Ultra-Large Scale Systems (ULSS), una extensión del concepto de Systems of Systems (SoS), en varios verticales, incluidos los vehículos autónomos, Smart Cities y Smart Grids. El alcance de ULSS es vasto debido a la cantidad de dispositivos y complejo en la variedad de aplicaciones, volumen de datos y diversidad de patrones de comunicación. Para manejar esta escala y complejidad, en esta tesis proponemos Hierarchical Emergent Behaviors (HEB), un paradigma que se basa en los conceptos de comportamientos emergente y organización jerárquica. En lugar de programar explícitamente todas las situaciones posibles en el vasto espacio de escenarios presentes en los ULSS, HEB se basa en comportamientos emergentes inducidos por reglas locales que definen las interacciones de las "cosas" entre ellas y también con su entorno. Discutimos las modificaciones a las arquitecturas clásicas de IoT requeridas por HEB, así como los nuevos desafíos. Una vez que se abordan estos desafíos, como la escalabilidad y la capacidad de administración, podemos ilustrar la utilidad de HEB cuando se ocupa de un ULSS basado en IoT a través de un caso de estudio basado en Vehículos Autónomos (AV). Con este fin, diseñamos y analizamos simulaciones que demuestran su enorme potencial, ya que pequeñas modificaciones en el conjunto básico de reglas inducen comportamientos diferentes e interesantes. Luego, diseñamos un conjunto de primitivas para realizar una maniobra básica, como salir de un pelotón y maniobrar en anticipación de obstáculos más allá del alcance de los sensores de a bordo. Estas simulaciones también evalúan el impacto de una implementación de HEB asistida por nodos de Fog Computing para ampliar el alcance sensorial de los vehículos. Para concluir, desarrollamos una metodología de diseño para construir, evaluar y ejecutar soluciones basadas en HEB para AV. Brindamos fundamentos arquitectónicos para el segundo nivel de HEB y sus implicaciones en áreas importantes como las comunicaciones. Estas bases se validan a través de simulaciones que incorporan nuevas reglas, obteniendo valiosas observaciones experimentales. La arquitectura propuesta tiene un enorme potencial para resolver el problema de escalabilidad que presentan los ULSS, permitiendo que las implementaciones de IoT alcancen su verdadero potencial.Postprint (published version

UPCommons. Portal del coneixement obert de la UPC

Analysis and simulation of emergent architectures for internet of things

Author: Roca Marí Damián
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2018
Field of study

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa