209 research outputs found

    A performance study on dynamic load balancing algorithms.

    by Sau-ming Lau. Thesis (M.Phil.)--Chinese University of Hong Kong, 1995. Includes bibliographical references (leaves 131-134).
    Contents: Abstract; Acknowledgement; List of Tables; List of Figures
    Chapter 1: Introduction
    Chapter 2: Basic Concepts and Related Work
        2.1 Components of Dynamic Load Balancing Algorithms
        2.2 Classification of Load Balancing Algorithms
            2.2.1 Casavant and Kuhl's Taxonomy
    Chapter 3: System Model and Assumptions
        3.1 The System Model and Assumptions
        3.2 Survey on Cost Models
            3.2.1 Eager, Lazowska, and Zahorjan's Model
            3.2.2 Shivaratri, Krueger, and Singhal's Model
        3.3 Our Cost Model
            3.3.1 Design Philosophy
            3.3.2 Polling Query Cost Model
            3.3.3 Load State Broadcasting Cost Model
            3.3.4 Task Assignment Cost Model
            3.3.5 Task Migration Cost Model
            3.3.6 Execution Priority
            3.3.7 Simulation Parameter Values
        3.4 Performance Metrics
    Chapter 4: A Performance Study on Load Information Dissemination Strategies
        4.1 Algorithm Descriptions
            4.1.1 Transfer Policy
            4.1.2 Information Policy
            4.1.3 Location Policy
            4.1.4 Categorization of the Algorithms
        4.2 Simulations and Analysis of Results
            4.2.1 Performance Comparisons
            4.2.2 Effect of Imbalance Factor on AWLT Algorithms
            4.2.3 Comparison of Average Performance
            4.2.4 Raw Simulation Results
        4.3 Discussions
    Chapter 5: Resolving Processor Thrashing with Batch Assignment
        5.1 The GR.batch Algorithm
            5.1.1 The Guarantee and Reservation Protocol
            5.1.2 The Location Policy
            5.1.3 Batch Size Determination
            5.1.4 The Complete GR.batch Description
        5.2 Additional Performance Metrics
        5.3 Simulations and Analysis of Results
        5.4 Discussions
    Chapter 6: Applying Batch Assignment to Systems with Bursty Task Arrival Patterns
        6.1 Bursty Workload Pattern Characterization Model
        6.2 Algorithm Descriptions
            6.2.1 The GR.batch Algorithm
            6.2.2 The SK.single Algorithm
            6.2.3 Summary of Algorithm Properties
        6.3 Analysis of Simulation Results
            6.3.1 Performance Comparison
            6.3.2 Time Trace
        6.4 Discussions
    Chapter 7: A Preliminary Study on Task Assignment Augmented with Migration
        7.1 Algorithm Descriptions
            7.1.1 Information Policy
            7.1.2 Location Policy
            7.1.3 Transfer Policy
            7.1.4 The Three Load Balancing Algorithms
        7.2 Simulations and Analysis of Results
            7.2.1 Even Task Service Time
            7.2.2 Uneven Task Service Time
        7.3 Discussions
    Chapter 8: Assignment Augmented with Migration Revisited
        8.1 Algorithm Descriptions
            8.1.1 The GR.BATCH.A Algorithm
            8.1.2 The SK.SINGLE.AM Algorithm
            8.1.3 Summary of Algorithm Properties
        8.2 Simulations and Analysis of Results
            8.2.1 Performance Comparisons
            8.2.2 Effect of Workload Imbalance
        8.3 Discussions
    Chapter 9: Applying Batch Transfer to Heterogeneous Systems with Many Task Classes
        9.1 Heterogeneous System Model
            9.1.1 Processing Node Specification
            9.1.2 Task Type Specification
            9.1.3 Workload State Measurement
            9.1.4 Task Selection Candidates
        9.2 Algorithm Descriptions
            9.2.1 First Category - The SK.single Variations
            9.2.2 Second Category - The GR.batch Variation Modeled with SSP
        9.3 Analysis of Simulation Results
    Chapter 10: Conclusions and Future Work
    Bibliography
    Appendix A: System Model Notations and Definitions
        A.1 Processing Node Model
        A.2 Cost Models
        A.3 Load Measurement
        A.4 Batch Size Determination Rules
        A.5 Bursty Arrivals Modeling
        A.6 Heterogeneous Systems Modeling
    Appendix B: Shivaratri and Krueger's Location Policy
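The policy components that recur throughout this table of contents (transfer, information, and location policies) can be sketched in a few lines. The sketch below is a generic threshold-based, sender-initiated balancer with random polling; the names, thresholds, and polling limit are illustrative placeholders, not the thesis's GR.batch or SK.single algorithms.

```python
import random

# A generic threshold-based, sender-initiated load balancer, shown only to
# illustrate the three policy components named in the table of contents
# (transfer, information, location). Thresholds, the polling limit, and the
# random-polling strategy are textbook placeholders, not the thesis's
# GR.batch or SK.single algorithms.

THRESHOLD = 5    # queue length above which a node counts as overloaded
POLL_LIMIT = 3   # cap on polls per decision, bounding the polling cost

def transfer_policy(queue_len):
    """Transfer policy: should a newly arrived task leave this node?"""
    return queue_len > THRESHOLD

def location_policy(loads, self_id, rng):
    """Location policy: randomly poll up to POLL_LIMIT peers and pick the
    first underloaded one. Polling doubles as a demand-driven information
    policy: load state is learned only when a transfer is considered."""
    peers = [n for n in loads if n != self_id]
    for peer in rng.sample(peers, min(POLL_LIMIT, len(peers))):
        if loads[peer] <= THRESHOLD:
            return peer
    return None  # no willing receiver: execute the task locally

def on_task_arrival(loads, self_id, rng):
    loads[self_id] += 1
    if transfer_policy(loads[self_id]):
        peer = location_policy(loads, self_id, rng)
        if peer is not None:
            loads[self_id] -= 1
            loads[peer] += 1

rng = random.Random(42)
loads = {0: 8, 1: 1, 2: 2}   # current queue lengths; node 0 is overloaded
on_task_arrival(loads, 0, rng)
print(loads)                  # the new task lands on an underloaded peer
```

Batch assignment, the thesis's main theme, generalizes the last step: instead of moving one task per negotiation, the sender and receiver agree on a batch size, amortizing the negotiation cost.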

    E-EON : Energy-Efficient and Optimized Networks for Hadoop

    Energy efficiency and performance improvement have been two of the major concerns of current data centers. With the advent of Big Data, more information is generated year after year, and the ever-growing network traffic produced by current Big Data frameworks has surpassed even the most aggressive forecasts of the largest network equipment manufacturer. Hadoop, currently one of the most famous and widely discussed frameworks designed to store, retrieve, and process the information constantly generated by users and machines, has gained considerable attention from industry in recent years, and its name now describes a whole ecosystem designed to tackle the most varied requirements of today's cloud applications. This thesis concerns Hadoop clusters, focusing mainly on their interconnects, which are commonly considered the bottleneck of the ecosystem. We conducted research on energy efficiency and on performance optimizations, namely improvements in cluster throughput and network latency. Regarding energy, a significant proportion of a data center's consumption is caused by the network, which accounts for 12% of total system power at full load. With network traffic growing continuously, both industry and the academic community want network energy consumption to be proportional to utilization. Considering cluster performance, although Hadoop is a network-throughput-sensitive workload with less stringent requirements for network latency, there is increasing interest in running batch and interactive workloads concurrently on the same cluster. Doing so maximizes system utilization and extracts the greatest benefit from capital and operational expenditures. For this to happen, cluster throughput must not be impacted when network latency is minimized.
The two biggest challenges faced during the development of this thesis were achieving near-proportional energy consumption for the interconnects and improving the network latency found in Hadoop clusters, with virtually no loss of cluster throughput. These challenges led to a correspondingly sized opportunity: proposing new techniques that solve these problems for the current generation of Hadoop clusters. We named the set of techniques presented in this work E-EON, which stands for Energy-Efficient and Optimized Networks for Hadoop. E-EON can be used to reduce network energy consumption and, at the same time, to reduce network latency while cluster throughput is improved. Furthermore, these techniques are not exclusive to Hadoop and are expected to yield similar benefits when applied to any other Big Data framework infrastructure that fits the problem characterization presented throughout this thesis. With E-EON we were able to reduce energy consumption by up to 80% compared to the state-of-the-art technique. We were also able to reduce network latency by up to 85% and, in some cases, even improve cluster throughput by 10%. Although these were the two major accomplishments of this thesis, we also present minor benefits, such as easier configuration compared to state-of-the-art techniques. Finally, we enrich the discussion with recommendations targeting network administrators and network equipment manufacturers.
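The energy-proportionality goal in the abstract can be made concrete with a small model. Only the 12% network share at full load comes from the abstract; the idle-power fraction and the utilization trace below are assumed values for illustration.

```python
# A small model making the energy-proportionality goal concrete. Only the
# 12% network share at full load comes from the abstract; the idle-power
# fraction and the utilization trace are assumed values for illustration.

FULL_LOAD_NETWORK_SHARE = 0.12   # network's share of total system power
IDLE_POWER_FRACTION = 0.85       # assumed: non-proportional gear draws most
                                 # of its peak power even when idle

def network_power(utilization):
    """Linear power model: a fixed idle floor plus a load-dependent part,
    expressed as a fraction of the network's peak power."""
    return IDLE_POWER_FRACTION + (1.0 - IDLE_POWER_FRACTION) * utilization

trace = [0.05, 0.10, 0.30, 0.60, 0.20]   # assumed utilization samples
actual = sum(network_power(u) for u in trace) / len(trace)
ideal = sum(trace) / len(trace)          # perfectly proportional network
system_saving = (actual - ideal) * FULL_LOAD_NETWORK_SHARE

print(f"network draws {actual:.0%} of peak; proportional would draw {ideal:.0%}")
print(f"closing that gap saves {system_saving:.1%} of total system power")
```

With these assumed numbers the non-proportional network wastes most of its energy at low load, which is the gap techniques like E-EON target.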

    Non-Metaheuristic Clustering Algorithms for Energy-Efficient Cooperative Communication in Wireless Sensor Networks: A Comparative Study

    Wireless Sensor Networks (WSNs) are now considered a vital technology enabling the gathering and distribution of data in applications such as environmental monitoring and industrial automation. Nevertheless, the finite energy resources of sensor nodes pose significant obstacles to the long-term viability and effectiveness of these networks. Researchers have developed and studied various non-metaheuristic algorithms to improve energy efficiency, data transfer, and network lifespan, efforts that contribute to enhancing cooperative communication modules. This study conducts a detailed examination and comparative evaluation of well-known clustering methods for WSNs, providing significant insights for improving cooperative communication. Our purpose is to offer a comprehensive perspective on how these algorithms contribute to energy efficiency in WSNs, by examining their practical implementations, underlying mathematical principles, strengths, shortcomings, real-world applications, and potential for further improvement.
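One of the classic non-metaheuristic clustering heuristics that comparative studies of WSN clustering commonly cover is LEACH's rotating cluster-head election, sketched below. The network size, desired head fraction, and round number are illustrative values, not taken from the article.

```python
import random

# A minimal sketch of LEACH-style cluster-head rotation, one of the classic
# non-metaheuristic clustering heuristics covered by comparative studies of
# WSN clustering. The network size, desired head fraction, and round number
# are illustrative values, not taken from the article.

P = 0.1  # desired fraction of nodes acting as cluster heads per round

def leach_threshold(p, round_no):
    """LEACH election threshold T(n), valid for nodes that have not yet
    served as head in the current epoch of 1/p rounds."""
    return p / (1.0 - p * (round_no % int(1.0 / p)))

def elect_heads(node_ids, round_no, eligible, rng):
    """Each eligible node independently draws a random number; drawing below
    T(n) makes it a head. Rotating the role spreads the extra energy cost of
    aggregating and relaying cluster traffic across the network."""
    t = leach_threshold(P, round_no)
    return {n for n in node_ids if n in eligible and rng.random() < t}

rng = random.Random(7)
nodes = list(range(100))
heads = elect_heads(nodes, round_no=0, eligible=set(nodes), rng=rng)
print(f"round 0: {len(heads)} of {len(nodes)} nodes elected cluster head")
```

Note how the threshold rises toward 1.0 as the epoch progresses, guaranteeing that every node eventually takes a turn as head.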

    Observable dynamic compilation

    Managed language platforms such as the Java Virtual Machine rely on a dynamic compiler to achieve high performance. Despite the benefits that dynamic compilation provides, it also introduces some challenges to program profiling. Firstly, profilers based on bytecode instrumentation may yield wrong results in the presence of an optimizing dynamic compiler, either due to not being aware of optimizations, or because the inserted instrumentation code disrupts such optimizations. To avoid such perturbations, we present a technique to make profilers based on bytecode instrumentation aware of the optimizations performed by the dynamic compiler, and make the dynamic compiler aware of the inserted code. We implement our technique for separating inserted instrumentation code from base-program code in Oracle's Graal compiler, integrating our extension into the OpenJDK Graal project. We demonstrate its significance with concrete profilers. On the one hand, we improve accuracy of existing profiling techniques, for example, to quantify the impact of escape analysis on bytecode-level allocation profiling, to analyze object life-times, and to evaluate the impact of method inlining when profiling method invocations. On the other hand, we also illustrate how our technique enables new kinds of profilers, such as a profiler for non-inlined callsites, and a testing framework for locating performance bugs in dynamic compiler implementations. Secondly, the lack of profiling support at the intermediate representation (IR) level complicates the understanding of program behavior in the compiled code. This issue cannot be addressed by bytecode instrumentation because it cannot precisely capture the occurrence of IR-level operations. Binary instrumentation is not suited either, as it lacks a mapping from the collected low-level metrics to higher-level operations of the observed program. 
To fill this gap, we present an easy-to-use event-based framework for profiling operations at the IR level. We integrate the IR profiling framework in the Graal compiler, together with our instrumentation-separation technique. We illustrate our approach with a profiler that tracks the execution of memory barriers within compiled code. In addition, using a deoptimization profiler based on our IR profiling framework, we conduct an empirical study on deoptimization in the Graal compiler. We focus on situations which cause program execution to switch from machine code to the interpreter, and compare application performance using three different deoptimization strategies which influence the amount of extra compilation work done by Graal. Using an adaptive deoptimization strategy, we manage to improve the average start-up performance of benchmarks from the DaCapo, ScalaBench, and Octane suites by avoiding wasted compilation work. We also find that different deoptimization strategies have little impact on steady-state performance.
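The actual framework described above lives inside the Graal compiler and is written against its intermediate representation; the standalone Python sketch below only mirrors its event-based shape: the compiler emits named events as it processes IR nodes, and profilers subscribe with callbacks. The event names and payloads here are illustrative, not Graal's API.

```python
from collections import Counter

# Event-based profiling skeleton. This is NOT Graal's API; it only mirrors
# the shape of an event-based IR profiling framework: compilation passes
# emit named events, and profilers subscribe with callbacks.

class EventProfiler:
    def __init__(self):
        self._handlers = {}
        self.counts = Counter()

    def subscribe(self, event, handler):
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event, **payload):
        """Called at the point where the compiler lowers the matching IR
        operation; all subscribed profilers observe the occurrence."""
        self.counts[event] += 1
        for handler in self._handlers.get(event, []):
            handler(payload)

profiler = EventProfiler()
barriers = []
profiler.subscribe("memory_barrier", lambda p: barriers.append(p["kind"]))

# A compilation pass would emit events like these while lowering IR:
profiler.emit("memory_barrier", kind="store-store")
profiler.emit("memory_barrier", kind="load-load")
profiler.emit("deoptimization", reason="unreached-branch")
print(profiler.counts)
```

The memory-barrier profiler described in the abstract is, in this shape, just one more subscriber; bytecode or binary instrumentation could not observe these events because they exist only at the IR level.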

    Performance Analysis of a Dynamic Bandwidth Allocation Algorithm in a Circuit-Switched Communications Network

    Military communications networks typically employ a gateway multiplexer to aggregate all communications traffic onto a single link. These multiplexers typically use a static bandwidth allocation method via time-division multiplexing (TDM). Inefficiencies occur when a high-bandwidth circuit, e.g., a video teleconferencing circuit, is relatively inactive, leaving a considerable portion of the aggregate bandwidth wasted. Dynamic bandwidth allocation (DBA) reclaims unused bandwidth from circuits with low utilization and reallocates it to circuits with higher utilization without adversely affecting queuing delay. The proposed DBA algorithm developed here measures instantaneous utilization by counting frames arriving during the transmission time of a single frame on the aggregate link. The maximum calculated utilization observed over a monitoring period is then used to calculate the bandwidth available for reallocation. A key advantage of the proposed approach is that it can be applied now and to existing systems supporting heterogeneous permanent virtual circuits. With the inclusion of DBA, military communications networks can bring information to the warfighter more efficiently and in a shorter time, even for small bandwidths allocated to deployed sites. The algorithm is general enough to be applied to multiple TDM platforms and robust enough to function at any line speed, making it a viable option for high-speed multiplexers. The proposed DBA algorithm provides a powerful performance boost by optimizing the available resources of the communications network. Utilization results indicate the proposed DBA algorithm significantly outperforms the static allocation model in all cases. The best configuration uses a 65536 bps allocation granularity and a 10-second monitoring period. Utilization gains observed with this configuration were almost 17% over the static allocation method. Queuing delays increased by 50% but remained acceptable, even for real-time traffic.
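The measurement and reclamation step described in the abstract can be sketched directly: count frames arriving during one frame's transmission time to get instantaneous utilization, keep the maximum over the monitoring period, and free the remainder in whole granularity units. Only the 65536 bps granularity comes from the text; the circuit rate, window size, and frame counts are invented for illustration.

```python
# Sketch of the proposed DBA algorithm's measurement and reclamation step.
# Only the 65536 bps granularity is taken from the abstract; the circuit
# rate, window size, and frame counts are invented for illustration.

GRANULARITY = 65536  # bps; the best-performing allocation granularity found

def instantaneous_utilization(frames_arrived, slots_per_window):
    """Utilization during one frame-transmission window on the aggregate."""
    return min(frames_arrived / slots_per_window, 1.0)

def reclaimable_bandwidth(samples, circuit_rate_bps):
    """Size the circuit by its peak utilization over the monitoring period
    (conservative, protecting queuing delay) and release the rest to other
    circuits in whole granularity units."""
    peak = max(samples)
    unused = circuit_rate_bps * (1.0 - peak)
    return int(unused // GRANULARITY) * GRANULARITY

# e.g. a 1 Mbps circuit whose busiest window used 4 of 10 frame slots:
counts = [1, 0, 4, 2, 1]
samples = [instantaneous_utilization(c, 10) for c in counts]
freed = reclaimable_bandwidth(samples, circuit_rate_bps=1_000_000)
print(f"peak utilization {max(samples):.0%}; {freed} bps can be reallocated")
```

Using the peak rather than the mean is what keeps the queuing-delay penalty bounded: a circuit is never squeezed below the highest demand it actually exhibited during the monitoring period.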

    Minimizing queueing delays in computer networks

    Ph.D. (Doctor of Philosophy)

    Performance Analysis of NAND Flash Memory Solid-State Disks

    As their prices decline, their storage capacities increase, and their endurance improves, NAND flash Solid-State Disks (SSD) provide an increasingly attractive alternative to Hard Disk Drives (HDD) for portable computing systems and PCs. HDDs have been an integral component of computing systems for several decades as long-term, non-volatile storage in the memory hierarchy. Today's typical hard disk drive is a highly complex electro-mechanical system resulting from decades of research, development, and fine-tuned engineering. Compared to HDDs, flash memory provides a simpler interface, one without the complexities of mechanical parts. On the other hand, today's typical solid-state disk drive is still a complex storage system with its own peculiarities and system problems. Due to the lack of publicly available SSD models, we have developed our own NAND flash SSD models and integrated them into DiskSim, which is extensively used in academia for studying storage system architectures. With our flash memory simulator, we model various solid-state disk architectures for a typical portable computing environment, quantify their performance under real user PC workloads, and explore the potential for further improvements. We find the following:
    * The real limitation to NAND flash memory performance is not its low per-device bandwidth but its internal core interface.
    * NAND flash memory media transfer rates do not need to scale up to those of HDDs for good performance.
    * SSD organizations that exploit concurrency at both the system and device level improve performance significantly.
    * These system- and device-level concurrency mechanisms are, to a significant degree, orthogonal: the performance increase due to one does not come at the expense of the other, as each exploits a different facet of concurrency exhibited within the PC workload.
    * SSD performance can be further improved by implementing flash-oriented queuing algorithms, access reordering, and bus-ordering algorithms which exploit the flash memory interface and its timing differences between read and write requests.
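The orthogonality finding can be illustrated with a back-of-the-envelope throughput model: system-level concurrency (independent flash channels) and device-level concurrency (interleaving dies on one channel) compose, because each hides a different latency component. All timings below are invented for illustration, not measurements from the dissertation.

```python
# Back-of-the-envelope model of the orthogonality finding: system-level
# concurrency (independent flash channels) and device-level concurrency
# (interleaving dies on one channel) compose, because each hides a
# different latency component. All timings are invented for illustration,
# not measurements from the dissertation.

PAGE_READ_US = 25    # flash array access time per page (assumed)
PAGE_XFER_US = 100   # page transfer time over the core interface (assumed)

def throughput_pages_per_ms(channels, ways_per_channel):
    """With enough queued requests, dies on a channel interleave: one
    transfers over the shared core interface while others do array access.
    The channel saturates once transfers alone fill the interface."""
    per_page_us = max(PAGE_XFER_US,
                      (PAGE_READ_US + PAGE_XFER_US) / ways_per_channel)
    return channels * (1000 / per_page_us)

baseline    = throughput_pages_per_ms(1, 1)   # single channel, no interleave
system_only = throughput_pages_per_ms(4, 1)   # 4x from channels alone
device_only = throughput_pages_per_ms(1, 4)   # capped by the core interface
both        = throughput_pages_per_ms(4, 4)   # the gains compose
print(baseline, system_only, device_only, both)
```

The model also echoes the first finding: with interleaving enabled, the single-channel case is limited by the core-interface transfer time, not by the flash array itself.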

    Energy Efficient Ethernet on MapReduce Clusters: Packet Coalescing To Improve 10GbE Links

    An important challenge of modern data centers is to reduce energy consumption, a substantial proportion of which is due to the network. Switches and NICs supporting the recent Energy Efficient Ethernet (EEE) standard are now available, but current practice is to disable EEE in production use, since its effect on real-world application performance is poorly understood. This paper contributes to this discussion by analyzing the impact of EEE on MapReduce workloads, in terms of performance overheads and energy savings. MapReduce is the central programming model of Apache Hadoop, one of the most widely used application frameworks in modern data centers. We find that, while 1 GbE links (edge links) achieve good energy savings using the standard EEE implementation, optimum energy savings in the 10 GbE links (aggregation and core links) are only possible if these links employ packet coalescing. Packet coalescing must, however, be carefully configured in order to avoid excessive performance degradation. With our new analysis of how the static parameters of packet coalescing perform under different cluster loads, we were able to cover both the idle and the heavy-load periods that can exist in this type of environment. Finally, we evaluate our recommendation for packet coalescing for 10 GbE links using the energy-delay metric.
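The coalescing mechanism the paper evaluates can be sketched in a few lines: the link stays in low-power idle while arrivals are buffered, and wakes only when a coalescing timer expires or a packet-count threshold is reached, amortizing the wake/sleep overhead over a burst. The timer, count, and arrival trace below are illustrative, not the settings the paper recommends.

```python
# Minimal sketch of EEE packet coalescing: buffer arrivals while the link
# sleeps, and wake on a timer expiry or a packet-count threshold. The
# parameter values and arrival trace are illustrative only, not the
# recommended settings from the paper.

COALESCE_TIMER_US = 50   # max time a packet may wait, bounding added latency
COALESCE_COUNT = 4       # wake early once this many packets are buffered

def coalesce(arrivals_us):
    """Group packet arrival times (microseconds) into bursts that are
    released together, each burst costing one link wake-up."""
    bursts, pending = [], []
    for t in arrivals_us:
        if pending and t - pending[0] >= COALESCE_TIMER_US:
            bursts.append(pending)        # timer expired before this arrival
            pending = []
        pending.append(t)
        if len(pending) >= COALESCE_COUNT:
            bursts.append(pending)        # count threshold reached: wake now
            pending = []
    if pending:
        bursts.append(pending)
    return bursts

# Without coalescing these 8 packets would trigger 8 wake transitions:
arrivals = [0, 5, 9, 12, 70, 75, 140, 200]
bursts = coalesce(arrivals)
print(f"{len(arrivals)} packets -> {len(bursts)} link wake-ups")
```

The tension the paper analyzes is visible in the two parameters: a longer timer saves more energy but delays every buffered packet, which is why coalescing must be tuned to the cluster load.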
This paper is an extension of our previous work [1], which was published in the Proceedings of the 40th Annual IEEE Conference on Local Computer Networks (LCN 2015). This work was supported in part by the European Union's Seventh Framework Programme (FP7/2007-2013) under Grant 610456 (EUROSERVER), in part by the Spanish Government through the Severo Ochoa programme (SEV-2011-00067 and SEV-2015-0493), in part by the Spanish Ministry of Economy and Competitiveness under Contract TIN2012-34557 and Contract TIN2015-65316-P, and in part by the Generalitat de Catalunya under Contract 2014-SGR-1051 and Contract 2014-SGR-1272. Peer reviewed. Postprint (author's final draft).