3 research outputs found
Dynamic Processor Reconfiguration for Power, Performance and Reliability Management
Technology advancements have allowed more transistors to be packed into a smaller area, while improved transistor performance helped achieve higher clock frequencies. Unfortunately, this led to a power density problem, forcing the processor industry to lower clock frequencies and integrate multiple cores on the same die. Depending on core characteristics, the cores on a die can be symmetric or asymmetric. Asymmetric multi-core processors (AMPs) have been proposed as an alternative to symmetric multi-cores to improve power efficiency. AMPs comprise cores that implement the same ISA but differ in performance and power characteristics due to varying sizes of micro-architectural resources. As the computational bottleneck of a workload shifts from one resource to another during execution, reassigning it to a core on which it runs more efficiently can improve overall power efficiency. Achieving high power efficiency on AMPs thus requires (i) a diverse set of cores optimized for various program phases, (ii) runtime analysis to determine the best core to run on, and (iii) low overhead for reassigning a thread to a different core type.
Decisions to swap threads between AMP cores are made at a coarse granularity of millions of instructions to mitigate the impact of thread migration overhead. But the computational needs of a program change rapidly over the course of its execution: the core configuration that optimizes both power consumption and performance for an application shifts at a fine granularity of thousands of instructions. This dissertation explores ways to design core micro-architectures so that, if switching overhead can be lowered, fine-grain switching becomes feasible and high power efficiency can be achieved.
To take advantage of power-saving opportunities at fine granularity, this thesis explores reconfigurable (morphable) architectures in which core resources are reconfigured on demand to suit the needs of the executing application. We first explore reconfigurable architectures consisting of two kinds of cores: out-of-order (OOO) big cores and in-order (InO) small cores. The big cores provide higher performance, while the small cores are more power efficient. In the proposed architecture, an OOO core reconfigures into an InO core at run time. Our proposed online management scheme decides when to switch between these core types so that significant power benefits are obtained without impacting performance. We also observe that the resource requirements of applications can be quite diverse; consequently, resource bottlenecks or excesses can vary considerably. Thus, reconfiguration between just two core modes may not fully exploit power and performance improvement opportunities.
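The core of such an online management scheme can be sketched as a simple threshold rule. The function below is an illustrative assumption, not the dissertation's actual controller: it assumes per-interval IPC estimates for each mode (e.g. derived from performance counters) and a configurable slowdown budget.

```python
def choose_mode(ipc_ooo_est, ipc_ino_est, power_ooo, power_ino,
                max_slowdown=0.05):
    """Pick the in-order (InO) mode when its estimated slowdown
    relative to the out-of-order (OOO) mode stays within the
    allowed budget and it actually saves power; otherwise stay OOO.

    Hypothetical sketch: parameter names and the 5% default budget
    are assumptions for illustration.
    """
    slowdown = 1.0 - ipc_ino_est / ipc_ooo_est
    if slowdown <= max_slowdown and power_ino < power_ooo:
        return "InO"
    return "OOO"
```

Run at each fine-grain interval boundary, such a rule keeps the core in InO mode through phases where the OOO machinery buys little (e.g. memory-bound phases) and reverts to OOO when the slowdown would exceed the budget.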
We therefore explore reconfigurable architectures with diverse core types that are not limited to big and little cores. A single core can reconfigure into multiple core modes, each with unique power and performance characteristics. Workload performance in a particular core mode depends on a large set of processor resources: some workloads are highly memory intensive, some exhibit long instruction dependence chains, some experience high branch misprediction rates, while others exhibit large amounts of exploitable instruction-level parallelism. A diverse set of core modes is needed to address the shifting resource needs across the program phases of an application. Different power and performance trade-offs can be achieved by shrinking or expanding various resources, and the trade-offs for each core mode are also affected by the operating voltage and frequency. We therefore propose joint core resource resizing with dynamic voltage and frequency scaling (DVFS), which is important for applications whose performance is sensitive to changes in frequency. Thus, at fine granularity, the core should adapt its instruction window size, execution bandwidth, and frequency to meet the run-time demands of the workload and improve power efficiency.
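Joint resource resizing with DVFS can be framed as picking, from a discrete set of (resource size, voltage/frequency) configurations, the one that minimizes energy per instruction subject to a performance floor. The configuration names and numbers below are hypothetical assumptions for illustration:

```python
def pick_config(configs, min_ips):
    """Among (name, instructions_per_second, power_watts) tuples,
    return the name of the configuration with the lowest energy per
    instruction (power / ips) that still meets the performance
    floor; if none qualifies, fall back to the fastest one."""
    feasible = [c for c in configs if c[1] >= min_ips]
    if not feasible:
        return max(configs, key=lambda c: c[1])[0]
    return min(feasible, key=lambda c: c[2] / c[1])[0]

# Hypothetical core modes: (name, instructions/s, power in W)
configs = [
    ("wide@2.0GHz",   2.0e9, 8.0),  # full window, high frequency
    ("wide@1.0GHz",   1.4e9, 3.5),  # same resources, scaled V/f
    ("narrow@1.0GHz", 1.0e9, 2.0),  # shrunk window + low frequency
]
```

Under this sketch, a memory-bound phase that tolerates a lower instruction rate would select a down-scaled mode, while a frequency-sensitive phase keeps the high-frequency mode despite its worse energy per instruction.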
Many current processors employ DVFS aggressively to improve power efficiency and maximize performance. This dissertation studies the power-efficiency trade-off between fine-grain DVFS and the reconfigurable architectures described above. We also explore another important problem: the continued scaling of devices results in higher vulnerability to soft errors. We consider dynamic core reconfiguration from the perspectives of both power efficiency and soft-error vulnerability, and propose an online management scheme such that core reconfiguration upon a thread switch not only improves power efficiency but also does not increase vulnerability to soft errors.
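The joint power/reliability constraint can be expressed as a simple acceptance test on candidate core modes. The field names below (energy per instruction `epi`, vulnerability proxy `avf`) are hypothetical assumptions, not the dissertation's actual metrics:

```python
def accept_switch(cur, cand):
    """Accept a candidate core mode only if it improves energy per
    instruction without raising the soft-error vulnerability proxy.

    Illustrative sketch: 'epi' and 'avf' are assumed per-mode
    estimates, e.g. from an online model."""
    return cand["epi"] < cur["epi"] and cand["avf"] <= cur["avf"]
```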
In summary, this thesis proposes several solutions for improving power efficiency by integrating heterogeneity within the core. We also examine how popular power-reduction techniques such as DVFS compare to our approach. Finally, we address reliability challenges alongside improving power efficiency.
Performance Prediction in Multi-core Environments with Shared Accelerators
Two of the major drivers of increased performance in single-thread applications, higher operating frequency and exploitation of instruction-level parallelism, have advanced little in recent years due to power constraints. In addition, the intrinsically sequential portions of applications limit the exploitation of thread-level parallelism. In this context, integrating hardware accelerators into multi-core architectures has become increasingly common. These systems exploit the most advantageous features of each component: while the multi-core arrangement exploits thread-level parallelism, hardware accelerators execute functions with performance and energy efficiency orders of magnitude higher than execution on the general-purpose cores. In the current state of the art, few works have proposed metrics that evaluate characteristics other than the performance or energy of multi-core configurations that share hardware accelerators among processing elements. MPSoCs (multi-processor systems-on-chip) have popularized the integration of accelerators into these architectures, but no study has examined the speedup expected from such accelerators, or the expected cost-benefit of adding new accelerators to these architectures. The absence of metrics that evaluate other characteristics of heterogeneous multi-core architectures may severely limit their potential. The goal of this work is to propose a new metric for the integration of shared hardware accelerators in multi-core architectures, SACL (Shared Accelerator Concurrency Level), which is uncorrelated with thread-level parallelism (TLP). The metric evaluates the fraction of an application during which accelerable basic blocks execute simultaneously in different active threads, thus competing for use of the accelerator. From this value, a designer can predict the expected speedup for a given application and establish the cost-benefit of adding (or not) new accelerators to the system. The metric is independent of the accelerator and can be used with both application-specific and reconfigurable accelerators.
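A metric of this shape (the fraction of execution during which two or more threads are simultaneously inside accelerable regions) can be computed from per-thread interval traces with a sweep over interval endpoints. This is a hypothetical sketch of such a computation, not the work's actual SACL definition:

```python
def concurrency_fraction(thread_intervals, total_time):
    """Fraction of total_time during which at least two threads are
    simultaneously executing accelerable basic blocks.

    thread_intervals: per thread, a list of (start, end) times spent
    in accelerable basic blocks. Sweep-line over the endpoints:
    +1 at each interval start, -1 at each end."""
    events = []
    for intervals in thread_intervals:
        for start, end in intervals:
            events.append((start, +1))
            events.append((end, -1))
    events.sort()  # ends sort before starts at equal times (-1 < +1)
    active, prev_t, overlap = 0, 0.0, 0.0
    for t, delta in events:
        if active >= 2:
            overlap += t - prev_t  # time with >= 2 contenders
        active += delta
        prev_t = t
    return overlap / total_time
```

For example, with one thread accelerable during [0, 4) and another during [2, 6) in a 10-unit run, the threads contend for the accelerator during [2, 4), giving a value of 0.2.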
Methods for Application-Specific Efficiency Improvement of Adaptive Processor Platforms
General-purpose processors are optimized for the average use case, so the available resources are not used efficiently. This thesis investigates to what extent a general-purpose processor can be adapted to individual applications in order to increase efficiency. The adaptation can be performed at run time by the processor or the runtime system, based on the respective system parameters, to achieve an efficiency gain.