Search CORE

15 research outputs found

Iso-energy-efficiency: An approach to power-constrained parallel computation

Author: Ge Rong
Kirk Cameron
Song Shuaiwen
Su Chunyi
Vishnu Abhinav
Publication venue
Publication date: 01/01/2010
Field of study

Future large scale high performance supercomputer systems require high energy efficiency to achieve exaflops computational power and beyond. Despite the need to understand energy efficiency in high-performance systems, there are few techniques to evaluate energy efficiency at scale. In this paper, we propose a system-level iso-energy-efficiency model to analyze, evaluate and predict energy-performance of data intensive parallel applications with various execution patterns running on large scale power-aware clusters. Our analytical model can help users explore the effects of machine and application dependent characteristics on system energy efficiency and isolate efficient ways to scale system parameters (e.g. processor count, CPU power/frequency, workload size and network bandwidth) to balance energy use and performance. We derive our iso-energy-efficiency model and apply it to the NAS Parallel Benchmarks on two power-aware clusters. Our results indicate that the model accurately predicts total system energy consumption within 5% error on average for parallel applications with various execution and communication patterns. We demonstrate effective use of the model for various application contexts and in scalability decision-making

Computer Science Technical Reports @Virginia Tech

A dynamic power-aware partitioner with task migration for multicore embedded systems

Author: A. El-Haj-Mahmoud
B.B. Brandenburg
C. Hung
C. McNairy
E. Brião
F. Cazorla
H. Aydin
J. Donald
L. Zheng
Q. Wu
R. Kalla
R. Ubal
R. Watanabe
S. Kato
T.A. AlEnawy
T.P. Baker
Y. Wei
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

Nowadays, a key design issue in embedded systems is how to reduce the power consumption, since batteries have a limited energy budget. For this purpose, several techniques such as Dynamic Voltage Scaling (DVS) or task migration can be used. DVS allows reducing power by selecting the optimal voltage supply, while task migration achieves this effect by balancing the workload among cores. This paper first analyzes the impact on energy due to task migration in multicore embedded systems with DVS capability and using the well-known Worst Fit (WF) partitioning heuristic. To reduce overhead, migrations are only performed at the time that a task arrives to and/or leaves the system and, in such a case, only one migration is allowed. The huge potential on energy saving due to task migration, leads us to propose a new dynamic partitioner, namely DP, that migrates tasks in a more efficient way than typical partitioners. Unlike WF, the proposed algorithm examines which is the optimal target core before allowing a migration. Experimental results show that DP can improve energy consumption in a factor up to 2.74 over the typical WF algorithm. © 2011 Springer-Verlag.This work was supported by Spanish CICYT under Grant TIN2009-14475-C04-01, and by Consolider-Ingenio under Grant CSD2006-00046.March Cabrelles, JL.; Sahuquillo Borrás, J.; Petit Martí, SV.; Hassan Mohamed, H.; Duato Marín, JF. (2011). A dynamic power-aware partitioner with task migration for multicore embedded systems. En Euro-Par 2011 Parallel Processing. Springer Verlag (Germany). 2011(6852):218-229. https://doi.org/10.1007/978-3-642-23400-2_21S21822920116852AlEnawy, T.A., Aydin, H.: Energy-Aware Task Allocation for Rate Monotonic Scheduling. In: Proceedings of the 11th Real Time on Embedded Technology and Applications Symposium, March 7-10, pp. 213–223. IEEE Computer Society, San Francisco (2005)Aydin, H., Yang, Q.: Energy-Aware Partitioning for Multiprocessor Real-Time Systems. In: Proceedings of the 17th International Parallel and Distributed Processing Symposium, Workshop on Parallel and Distributed Real-Time Systems, April 22-26, p. 113. IEEE Computer Society, Nice (2003)Baker, T.P.: An Analysis of EDF schedulability on a multiprocessor. IEEE Transactions on Parallel and Distributed Systems 16(8), 760–768 (2005)Brandenburg, B.B., Calandrino, J.M., Anderson, J.H.: On the Scalability of Real-Time Scheduling Algorithms on Multicore Platforms: A Case Study. In: Proceedings of the 29th Real-Time Systems Symposium, November 30-December 3, pp. 157–169. IEEE Computer Society, Barcelona (2008)Brião, E., Barcelos, D., Wronski, F., Wagner, F.R.: Impact of Task Migration in NoC-based MPSoCs for Soft Real-time Applications. In: Proceedings of the International Conference on VLSI, October 15-17, pp. 296–299. IEEE Computer Society, Atlanta (2007)Cazorla, F., Knijnenburg, P., Sakellariou, R., Fernández, E., Ramirez, A., Valero, M.: Predictable Performance in SMT Processors: Synergy between the OS and SMTs. IEEE Transactions on Computers 55(7), 785–799 (2006)Donald, J., Martonosi, M.: Techniques for Multicore Thermal Management: Classification and New Exploration. In: Proceedings of the 33rd Annual International Symposium on Computer Architecture, June 17-21, pp. 78–88. IEEE Computer Society, Boston (2006)El-Haj-Mahmoud, A., AL-Zawawi, A., Anantaraman, A., Rotenberg, E.: Virtual Multiprocessor: An Analyzable, High-Performance Architecture for Real-Time Computing. In: Proceedings of the International Conference on Compilers, Architectures and Synthesis for Embedded Systems, September 24-27, pp. 213–224. ACM Press, San Francisco (2005)Hung, C., Chen, J., Kuo, T.: Energy-Efficient Real-Time Task Scheduling for a DVS System with a Non-DVS Processing Element. In: Proceedings of the 27th Real-Time Systems Symposium, December 5-8, pp. 303–312. IEEE Computer Society, Rio de Janeiro (2006)Kalla, R., Sinharoy, B., Tendler, J.M.: IBM Power5 Chip: A Dual-Core Multithreaded Processor. IEEE Micro 24(2), 40–47 (2004)Kato, S., Yamasaki, N.: Global EDF-based Scheduling with Efficient Priority Promotion. In: Proceedings of the 14th International Conference on Embedded and Real-Time Computing Systems and Applications, August 25-27, pp. 197–206. IEEE Computer Society, Kaohisung (2008)Malardalen Real-Time Research Center, Vasteras, Sweden: WCET Analysis Project. WCET Benchmark Programs (2006), [Online], http://www.mrtc.mdh.se/projects/wcet/March, J., Sahuquillo, J., Hassan, H., Petit, S., Duato, J.: A New Energy-Aware Dynamic Task Set Partitioning Algorithm for Soft and Hard Embedded Real-Time Systems. To be published on The Computer Journal (2011)McNairy, C., Bhatia, R.: Montecito: A Dual-Core, Dual-Thread Itanium Processor. IEEE Micro 25(2), 10–20 (2005)Seo, E., Jeong, J., Park, S., Lee, J.: Energy Efficient Scheduling of Real-Time Tasks on Multicore Processors. IEEE Transactions on Parallel and Distributed Systems 19(11), 1540–1552 (2008)Shah, A.: Arm plans to add multithreading to chip design. ITworld (2010), [Online], http://www.itworld.com/hardware/122383/arm-plans-add-multithreading-chip-designUbal, R., Sahuquillo, J., Petit, S., López, P.: Multi2Sim: A Simulation Framework to Evaluate Multicore-Multithreaded Processors. In: Proceedings of the 19th International Symposium on Computer Architecture and High Performance Computing, October 24-27, pp. 62–68. IEEE Computer Society, Gramado (2007)Watanabe, R., Kondo, M., Imai, M., Nakamura, H., Nanya, T.: Task Scheduling under Performance Constraints for Reducing the Energy Consumption of the GALS Multi-Processor SoC. In: Proceedings of the Design Automation and Test in Europe, April 16-20, pp. 797–802. ACM, Nice (2007)Wei, Y., Yang, C., Kuo, T., Hung, S.: Energy-Efficient Real-Time Scheduling of Multimedia Tasks on Multi-Core Processors. In: Proceedings of the 25th Symposium on Applied Computing, March 22-26, pp. 258–262. ACM, Sierre (2010)Wu, Q., Martonosi, M., Clark, D.W., Reddi, V.J., Connors, D., Wu, Y., Lee, J., Brooks, D.: A Dynamic Compilation Framework for Controlling Microprocessor Energy and Performance. In: Proceedings of the 38th Annual IEEE/ACM International Symposium on Microarchitecture, November 12-16, pp. 271–282. IEEE Computer Society, Barcelona (2005)Zheng, L.: A Task Migration Constrained Energy-Efficient Scheduling Algorithm for Multiprocessor Real-time Systems. In: Proceedings of the International Conference on Wireless Communications, Networking and Mobile Computing, September 21-25, pp. 3055–3058. IEEE Computer Society, Shanghai (2007

Crossref

RiuNet

HAPPE: Human and Application-Driven Frequency Scaling for Processor Power Efficiency

Author: L Yang
Lei Yang
Member IEEE Gokhan Memik
Member IEEE Peter Dinda
Member IEEE Robert P Dick
Publication venue
Publication date: 01/01/2013
Field of study

Abstract-Conventional dynamic voltage and frequency scaling techniques use high CPU utilization as a predictor for user dissatisfaction, to which they react by increasing CPU frequency. In this paper, we demonstrate that for many interactive applications, perceived performance is highly dependent upon the particular user and application, and is not linearly related to CPU utilization. This observation reveals an opportunity for reducing power consumption. We propose Human and Application driven frequency scaling for Processor Power Efficiency (HAPPE), an adaptive user-and-application-aware dynamic CPU frequency scaling technique. HAPPE continuously adapts processor frequency and voltage to the learned performance requirement of the current user and application. Adaptation to user requirements is quick and requires minimal effort from the user (typically a handful of key strokes). Once the system has adapted to the user's performance requirements, the user is not required to provide continued feedback but is permitted to provide additional feedback to adjust the control policy to changes in preferences. HAPPE was implemented on a Linux-based laptop and evaluated in 22 hours of controlled user studies. Compared to the default Linux CPU frequency controller, HAPPE reduces the measured system-wide power consumption of CPU-intensive interactive applications by 25 percent on average while maintaining user satisfaction. Index Terms-Power, CPU frequency scaling, user-driven study, mobile systems Ç 1I NTRODUCTION P OWER efficiency has been a major technology driver for battery-powered mobile systems, such as mobile phones, personal digital assistants, MP3 players, and laptops. Power efficiency has also become a new focus for line-powered desktop systems and data centers because of its impact on power dissipation and chip temperature, which affect performance, reliability, and lifetime. Processor power consumption is often a substantial portion of system power consumption in mobile systems Traditional CPU power management approaches can lose sight of an important fact: The ultimate goal of any computer system is to satisfy its users, not to execute a particular number of instructions per second. Although CPU utilization is a good indication of processor performance, the actual perceivable system performance depends on individual users and applications, and user satisfaction is not linearly related to CPU utilization. We conducted a study on 10 users with four interactive applications and found that for some applications, some users are satisfied with system performance when the processor is at the lowest frequency, while other users may not be satisfied even when it operates at the highest frequency. We also found that users may be insensitive to varying processor frequency for one application, but may be very sensitive to such changes for another application. Traditional DVFS policies that consider only CPU utilization or other useroblivious performance metrics are often too pessimistic about user performance requirements, and use a high frequency to satisfy all users, resulting in wasted power. Similar findings were also reported in other studies In this paper, we propose Human and Application driven frequency scaling for Processor Power Efficiency (HAPPE), a CPU DVFS technique that adapts voltage and frequency to the performance requirement of the curren

CiteSeerX

A Dynamic Power-Aware Partitioner with Real-Time Task Migration for Embedded Multicore Processors

Author: March Cabrelles José Luis
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 18/06/2013
Field of study

[ES] Analizar el impacto de permitir que las tareas de tiempo real puedan migrar su ejecución de un core a otro, sobre el consumo en sistemas empotrados multicore.[EN] A major design issue in embedded systems is reducing the power consumption since batteries have a limited energy budget. For this purpose, several techniques such as Dynamic Voltage and Frequency Scaling (DVFS) or task migration are being used. DVFS circuitry allows reducing power by selecting the optimal voltage supply, while task migration achieves this effect by balancing the workload among cores. This work focuses on power-aware scheduling allowing task migration to reduce energy consumption in multicore embedded systems implementing DVFS capabilities. To address energy savings, the devised schedulers follow two main rules: migrations are allowed at specific points of time and only one task is allowed to migrate each time. Two algorithms have been proposed working under real-time constraints. The simpler algorithm, namely, Single Option Migration (SOM) only checks one target core before performing a migration. In contrast, the Multiple Option Migration (MOM) searches the optimal target core. In general, the MOM algorithm achieves better energy savings than the SOM algorithm, although differences are wider for a reduced number of cores and frequency/voltage levels. Moreover, the MOM algorithm reduces energy consumption as much as 40% over the typical Worst Fit (WF) strategy.March Cabrelles, JL. (2012). A Dynamic Power-Aware Partitioner with Real-Time Task Migration for Embedded Multicore Processors. http://hdl.handle.net/10251/29847Archivo delegad

RiuNet

Chapter One – An Overview of Architecture-Level Power- and Energy-Efficient Design Techniques

Author: Bežanić Nikola
Cristal Kestelman Adrián
Milutinović Veljko
Ratkovic Ivan
Unsal Osma S.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015
Field of study

Power dissipation and energy consumption became the primary design constraint for almost all computer systems in the last 15 years. Both computer architects and circuit designers intent to reduce power and energy (without a performance degradation) at all design levels, as it is currently the main obstacle to continue with further scaling according to Moore's law. The aim of this survey is to provide a comprehensive overview of power- and energy-efficient “state-of-the-art” techniques. We classify techniques by component where they apply to, which is the most natural way from a designer point of view. We further divide the techniques by the component of power/energy they optimize (static or dynamic), covering in that way complete low-power design flow at the architectural level. At the end, we conclude that only a holistic approach that assumes optimizations at all design levels can lead to significant savings.Peer ReviewedPostprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

DynaMOS: Dynamic Schedule Migration for Heterogeneous Cores

Author: Andrew Lukefahr
Reetuparna Das
Scott Mahlke
Shruti Padmanabha
Publication venue
Publication date: 06/03/2020
Field of study

ABSTRACT InOrder (InO) cores achieve limited performance because their inability to dynamically reorder instructions prevents them from exploiting Instruction-Level-Parallelism. Conversely, Out-of-Order (OoO) cores achieve high performance by aggressively speculating past stalled instructions and creating highly optimized issue schedules. It has been observed that these issue schedules tend to repeat for sequences of instructions with predictable control and data-flow. An equally provisioned InO core can potentially achieve OoO's performance at a fraction of the energy cost if provided with an OoO schedule. In the context of a fine-grained heterogeneous multicore system composed of a big (OoO) core and a little (InO) core, we could offload recurring issue schedules from the big to the little core, to achieve energyefficiency while maintaining performance. To this end, we introduce the DynaMOS architecture. Recurring issue schedules may contain instructions that speculate across branches, utilize renamed registers to eliminate false dependencies, and reorder memory operations. DynaMOS provisions little with an OinO mode to replay a speculative schedule while ensuring program correctness. Any divergence from the recorded instruction sequence causes execution to restart in program order from a previously checkpointed state. On a system capable of switching between big and little cores rapidly with low overheads, DynaMOS schedules 38% of execution on the little on average, increasing utilization of the energy-efficient core by 2.9X over prior work. This amounts to energy savings of 32% over execution on only big core, with an allowable 5% performance loss

CiteSeerX

DVFS power management in HPC systems

Author: Etinski Maja
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2012
Field of study

Recent increase in performance of High Performance Computing (HPC) systems has been followed by even higher increase in power consumption. Power draw of modern supercomputers leads to very high operating costs and reliability concerns. Furthermore, it has negative consequences on the environment. Accordingly, over the last decade there have been many works dealing with power/energy management in HPC systems. Since CPUs accounts for a high portion of the total system power consumption, our work aims at CPU power reduction. Dynamic Voltage Frequency Scaling (DVFS) is a widely used technique for CPU power management. Running an application at lower frequency/voltage reduces its power consumption. However, frequency scaling should be used carefully since it has negative effects on the application performance. We argue that the job scheduler level presents a good place for power management in an HPC center having in mind that a parallel job scheduler has a global overview of the entire system. In this thesis we propose power-aware parallel job scheduling policies where the scheduler determines the job CPU frequency, besides the job execution order. Based on the goal, the proposed policies can be classified into two groups: energy saving and power budgeting policies. The energy saving policies aim to reduce CPU energy consumption with a minimal job performance penalty. The first of the energy saving policies assigns the job frequency based on system utilization while the other makes job performance predictions. While for less loaded workloads these policies achieve energy savings, highly loaded workloads suffer from a substantial performance degradation because of higher job wait times due to an increase in load caused by longer job run times. Our results show higher potential of the DVFS technique when applied for power budgeting. The second group of policies are policies for power constrained systems. In contrast to the systems without a power limitation, in the case of a given power budget the DVFS technique even improves overall job performance reducing the average job wait time. This comes from a lower job power consumption that allows more jobs to run simultaneously. The first proposed policy from this group assigns CPU frequency using the job predicted performance and current power draw of already running jobs. The other power budgeting policy is based on an optimization problem which solution determines the job execution order, as well as power distribution among jobs selected for execution. This policy fully exploits available power and leads to further performance improvements. The last contribution of the thesis is an analysis of the DVFS technique potential for energyperformance trade-off in current and future HPC systems. Ongoing changes in technology decrease the DVFS applicability for energy savings but the technique still reduces power consumption making it useful for power constrained systems. In order to analyze DVFS potential, a model of frequency scaling impact on MPI application execution time has been proposed and validated against measurements on a large-scale system. This parametric analysis showed for which application/platform characteristic, frequency scaling leads to energy savings.El aumento de rendimiento que han experimentado los sistemas de altas prestaciones ha venido acompañado de un aumento aún mayor en el consumo de energía. El consumo de los supercomputadores actuales implica unos costes muy altos de funcionamiento. Estos costes no tienen simplemente implicaciones a nivel económico sino también implicaciones en el medio ambiente. Dado la importancia del problema, en los últimos tiempos se han realizado importantes esfuerzos de investigación para atacar el problema de la gestión eficiente de la energía que consumen los sistemas de supercomputación. Dado que la CPU supone un alto porcentaje del consumo total de un sistema, nuestro trabajo se centra en la reducción y gestión eficiente de la energía consumida por la CPU. En concreto, esta tesis se centra en la viabilidad de realizar esta gestión mediante la técnica de Dynamic Voltage Frequency Scalingi (DVFS), una técnica ampliamente utilizada con el objetivo de reducir el consumo energético de la CPU. Sin embargo, esta técnica puede implicar una reducción en el rendimiento de las aplicaciones que se ejecutan, ya que implica una reducción de la frecuencia. Si tenemos en cuenta que el contexto de esta tesis son sistemas de alta prestaciones, minimizar el impacto en la pérdida de rendimiento será uno de nuestros objetivos. Sin embargo, en nuestro contexto, el rendimiento de un trabajo viene determinado por dos factores, tiempo de ejecución y tiempo de espera, por lo que habrá que considerar los dos componentes. Los sistemas de supercomputación suelen estar gestionados por sistemas de colas. Los trabajos, dependiendo de la política que se aplique y el estado del sistema, deberán esperar más o menos tiempo antes de ser ejecutado. Dado las características del sistema objetivo de esta tesis, nosotros consideramos que el Planificador de trabajo (o Job Scheduler), es el mejor componente del sistema para incluir la gestión de la energía ya que es el único punto donde se tiene una visión global de todo el sistema. En este trabajo de tesis proponemos un conjunto de políticas de planificación que considerarán el consumo energético como un recurso más. Estas políticas decidirán que trabajo ejecutar, el número de cpus asignadas y la lista de cpus (y nodos) sino también la frecuencia a la que estas cpus se ejecutarán. Estas políticas estarán orientadas a dos objetivos: reducir la energía total consumida por un conjunto de trabajos y controlar en consumo puntual de un conjunto puntual para evitar saturaciones del sistema en aquellos centros que puedan tener una capacidad limitada (permanente o puntual). El primer grupo de políticas intentará reducir el consumo total minimizando el impacto en el rendimiento. En este grupo encontramos una primera política que asigna la frecuencia de las cpus en función de la utilización del sistema y una segunda que calcula una estimación de la penalización que sufrirá el trabajo que va a empezar para decidir si reducir o no la frecuencia. Estas políticas han mostrado unos resultados aceptables con sistemas poco cargados, pero han mostrado unas pérdidas de rendimiento significativas cuando el sistema está muy cargado. Estas pérdidas de rendimiento no han sido a nivel de incremento significativo del tiempo de ejecución de los trabajos, pero sí de las métricas de rendimiento que incluyen el tiempo de espera de los trabajos (habituales en este contexto). El segundo grupo de políticas, orientadas a sistemas con limitaciones en cuanto a la potencia que pueden consumir, han mostrado un gran potencial utilizando DVFS como mecanismo de gestión. En este caso, comparado con un sistema que no incluya esta gestión, han demostrado mejoras en el rendimiento ya que permiten ejecutar más trabajos de forma simultánea, reduciendo significativamente el tiempo de espera de los trabajos. En este segundo grupo proponemos una política basada en el rendimiento del trabajo que se va a ejecutar y una segunda que considera la asignación de todos los recursos como un problema de optimización lineal. Esta última política es la contribución más importante de la tesis ya que demuestra un buen comportamiento en todos los casos evaluados. La última contribución de la tesis es un estudio del potencial de DVFS como técnica de gestión de la energía en un futuro próximo, en función de un estudio de las características de las aplicaciones, de la reducción de DVFS en el consumo de la CPU y del peso de la CPU dentro de todo el sistema. Este estudio indica que la capacidad de DVFS de ahorrar energía será limitado pero sigue mostrando un gran potencial de cara al control del consumo energético

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura

Hardware-Oriented Cache Management for Large-Scale Chip Multiprocessors

Author: Hammoud Mohammad Hussein
Publication venue
Publication date: 30/09/2010
Field of study

One of the key requirements to obtaining high performance from chip multiprocessors (CMPs) is to effectively manage the limited on-chip cache resources shared among co-scheduled threads/processes. This thesis proposes new hardware-oriented solutions for distributed CMP caches. Computer architects are faced with growing challenges when designing cache systems for CMPs. These challenges result from non-uniform access latencies, interference misses, the bandwidth wall problem, and diverse workload characteristics. Our exploration of the CMP cache management problem suggests a CMP caching framework (CC-FR) that defines three main approaches to solve the problem: (1) data placement, (2) data retention, and (3) data relocation. We effectively implement CC-FR's components by proposing and evaluating multiple cache management mechanisms.Pressure and Distance Aware Placement (PDA) decouples the physical locations of cache blocks from their addresses for the sake of reducing misses caused by destructive interferences. Flexible Set Balancing (FSB), on the other hand, reduces interference misses via extending the life time of cache lines through retaining some fraction of the working set at underutilized local sets to satisfy far-flung reuses. PDA implements CC-FR's data placement and relocation components and FSB applies CC-FR's retention approach.To alleviate non-uniform access latencies and adapt to phase changes in programs, Adaptive Controlled Migration (ACM) dynamically and periodically promotes cache blocks towards L2 banks close to requesting cores. ACM lies under CC-FR's data relocation category. Dynamic Cache Clustering (DCC), on the other hand, addresses diverse workload characteristics and growing non-uniform access latencies challenges via constructing a cache cluster for each core and expands/contracts all clusters synergistically to match each core's cache demand. DCC implements CC-FR's data placement and relocation approaches. Lastly, Dynamic Pressure and Distance Aware Placement (DPDA) combines PDA and ACM to cooperatively mitigate interference misses and non-uniform access latencies. Dynamic Cache Clustering and Balancing (DCCB), on the other hand, combines DCC and FSB to employ all CC-FR's categories and achieve higher system performance. Simulation results demonstrate the effectiveness of the proposed mechanisms and show that they compare favorably with related cache designs

D-Scholarship@Pitt

Developing New Power Management and High-Reliability Schemes in Data-Intensive Environment

Author: Wang Ruijun
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2016
Field of study

With the increasing popularity of data-intensive applications as well as the large-scale computing and storage systems, current data centers and supercomputers are often dealing with extremely large data-sets. To store and process this huge amount of data reliably and energy-efficiently, three major challenges should be taken into consideration for the system designers. Firstly, power conservation–Multicore processors or CMPs have become a mainstream in the current processor market because of the tremendous improvement in transistor density and the advancement in semiconductor technology. However, the increasing number of transistors on a single die or chip reveals a super-linear growth in power consumption [4]. Thus, how to balance system performance and power-saving is a critical issue which needs to be solved effectively. Secondly, system reliability–Reliability is a critical metric in the design and development of replication-based big data storage systems such as Hadoop File System (HDFS). In the system with thousands machines and storage devices, even in-frequent failures become likely. In Google File System, the annual disk failure rate is 2:88%,which means you were expected to see 8,760 disk failures in a year. Unfortunately, given an increasing number of node failures, how often a cluster starts losing data when being scaled out is not well investigated. Thirdly, energy efficiency–The fast processing speeds of the current generation of supercomputers provide a great convenience to scientists dealing with extremely large data sets. The next generation of exascale supercomputers could provide accurate simulation results for the automobile industry, aerospace industry, and even nuclear fusion reactors for the very first time. However, the energy cost of super-computing is extremely high, with a total electricity bill of 9 million dollars per year. Thus, conserving energy and increasing the energy efficiency of supercomputers has become critical in recent years. This dissertation proposes new solutions to address the above three key challenges for current large-scale storage and computing systems. Firstly, we propose a novel power management scheme called MAR (model-free, adaptive, rule-based) in multiprocessor systems to minimize the CPU power consumption subject to performance constraints. By introducing new I/O wait status, MAR is able to accurately describe the relationship between core frequencies, performance and power consumption. Moreover, we adopt a model-free control method to filter out the I/O wait status from the traditional CPU busy/idle model in order to achieve fast responsiveness to burst situations and take full advantage of power saving. Our extensive experiments on a physical testbed demonstrate that, for SPEC benchmarks and data-intensive (TPC-C) benchmarks, an MAR prototype system achieves 95.8-97.8% accuracy of the ideal power saving strategy calculated offline. Compared with baseline solutions, MAR is able to save 12.3-16.1% more power while maintain a comparable performance loss of about 0.78-1.08%. In addition, more simulation results indicate that our design achieved 3.35-14.2% more power saving efficiency and 4.2-10.7% less performance loss under various CMP configurations as compared with various baseline approaches such as LAST, Relax, PID and MPC. Secondly, we create a new reliability model by incorporating the probability of replica loss to investigate the system reliability of multi-way declustering data layouts and analyze their potential parallel recovery possibilities. Our comprehensive simulation results on Matlab and SHARPE show that the shifted declustering data layout outperforms the random declustering layout in a multi-way replication scale-out architecture, in terms of data loss probability and system reliability by upto 63% and 85% respectively. Our study on both 5-year and 10-year system reliability equipped with various recovery bandwidth settings shows that, the shifted declustering layout surpasses the two baseline approaches in both cases by consuming up to 79 % and 87% less recovery bandwidth for copyset, as well as 4.8% and 10.2% less recovery bandwidth for random layout. Thirdly, we develop a power-aware job scheduler by applying a rule based control method and taking into account real world power and speedup profiles to improve power efficiency while adhering to predetermined power constraints. The intensive simulation results shown that our proposed method is able to achieve the maximum utilization of computing resources as compared to baseline scheduling algorithms while keeping the energy cost under the threshold. Moreover, by introducing a Power Performance Factor (PPF) based on the real world power and speedup profiles, we are able to increase the power efficiency by up to 75%

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)