
    Garbage collection auto-tuning for Java MapReduce on Multi-Cores

    MapReduce has been widely accepted as a simple programming pattern that can form the basis for efficient, large-scale, distributed data processing. The success of the MapReduce pattern has led to a variety of implementations for different computational scenarios. In this paper we present MRJ, a MapReduce Java framework for multi-core architectures. We evaluate its scalability on a four-core, hyperthreaded Intel Core i7 processor, using a set of standard MapReduce benchmarks. We investigate the significant impact that Java runtime garbage collection has on the performance and scalability of MRJ. We propose the use of memory management auto-tuning techniques based on machine learning. With our auto-tuning approach, we are able to achieve MRJ performance within 10% of optimal on 75% of our benchmark tests.
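MRJ's own source is not shown here, but the MapReduce pattern it implements can be sketched in a few lines. The function names and the thread pool below are illustrative stand-ins for MRJ's Java worker pool, not its actual API:

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Illustrative word count in the MapReduce style (not MRJ's actual API).
def map_fn(document):
    # Map phase: emit (word, 1) pairs for one input split.
    return [(word, 1) for word in document.split()]

def reduce_fn(word, counts):
    # Reduce phase: combine all values emitted for one key.
    return (word, sum(counts))

def map_reduce(documents, workers=4):
    # Run map_fn over splits on a worker pool (threads stand in
    # for MRJ's multi-core workers).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_fn, documents))
    # Shuffle phase: group intermediate values by key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

counts = map_reduce(["a b a", "b c"])
```

In a managed runtime like the JVM, each phase allocates many short-lived intermediate pairs, which is exactly the allocation pattern whose garbage-collection cost the paper's auto-tuning targets.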

    Distributed Training Large-Scale Deep Architectures

    Scale of data and scale of computation infrastructures together enable the current deep learning renaissance. However, training large-scale deep architectures demands both algorithmic improvement and careful system configuration. In this paper, we focus on employing the system approach to speed up large-scale training. Via lessons learned from our routine benchmarking effort, we first identify bottlenecks and overheads that hinder data parallelism. We then devise guidelines that help practitioners configure an effective system and fine-tune parameters to achieve the desired speedup. Specifically, we develop a procedure for setting minibatch size and choosing computation algorithms. We also derive lemmas for determining the quantity of key components such as the number of GPUs and parameter servers. Experiments and examples show that these guidelines help effectively speed up large-scale deep learning training.
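The abstract does not spell out its minibatch procedure. A common rule of thumb from the data-parallel training literature, not necessarily this paper's exact guideline, is to scale the learning rate linearly with the global minibatch size:

```python
# Hedged sketch: the linear-scaling heuristic for synchronous data
# parallelism. Both functions and their parameters are illustrative,
# not taken from the paper.
def global_batch(per_gpu_batch, num_gpus):
    # Effective minibatch size when each GPU processes its own shard.
    return per_gpu_batch * num_gpus

def scaled_lr(base_lr, base_batch, batch):
    # Grow the learning rate in proportion to the minibatch size.
    return base_lr * (batch / base_batch)

batch = global_batch(per_gpu_batch=32, num_gpus=8)   # 256
lr = scaled_lr(base_lr=0.1, base_batch=256, batch=1024)
```

Heuristics like this are what such system-configuration guidelines formalize: given a hardware budget (number of GPUs), they fix the dependent training parameters rather than leaving them to trial and error.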

    ACTS in Need: Automatic Configuration Tuning with Scalability Guarantees

    To support the variety of Big Data use cases, many Big Data related systems expose a large number of user-specifiable configuration parameters. As highlighted in our experiments, a MySQL deployment with well-tuned configuration parameters achieves a peak throughput 12 times that of one with the default settings. However, finding the best setting for the tens or hundreds of configuration parameters is practically impossible for ordinary users. Worse still, many Big Data applications require the support of multiple systems co-deployed in the same cluster. As these co-deployed systems can interact to affect the overall performance, they must be tuned together. Automatic configuration tuning with scalability guarantees (ACTS) is needed to help system users. Solutions to ACTS must scale to various systems, workloads, deployments, parameters and resource limits. Proposing and implementing an ACTS solution, we demonstrate that ACTS can benefit users not only in improving system performance and resource utilization, but also in saving costs and enabling fairer benchmarking.
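The ACTS solution itself is not shown in the abstract. A toy random search over a hypothetical parameter space conveys the black-box flavor of automatic configuration tuning; `PARAM_SPACE` and `benchmark` below are invented for illustration, not taken from the paper:

```python
import random

# Hypothetical parameter space and black-box benchmark; the real ACTS
# search strategy and its scalability guarantees are not reproduced here.
PARAM_SPACE = {
    "buffer_pool_mb": [128, 256, 512, 1024],
    "max_connections": [50, 100, 200],
}

def benchmark(config):
    # Stand-in for deploying the config and measuring throughput (ops/s).
    return config["buffer_pool_mb"] * 0.5 + config["max_connections"]

def random_search(space, trials=20, seed=0):
    # Sample configurations and keep the best-performing one.
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(trials):
        cfg = {k: rng.choice(v) for k, v in space.items()}
        score = benchmark(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

best_cfg, best_score = random_search(PARAM_SPACE)
```

Even this naive search illustrates why tuning co-deployed systems together is hard: the search space is the cross product of every system's parameters, so each added system multiplies the number of candidate configurations.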

    A Review on Energy Consumption Optimization Techniques in IoT Based Smart Building Environments

    In recent years, due to the unnecessary wastage of electrical energy in residential buildings, the requirement of energy optimization and user comfort has gained vital importance. In the literature, various techniques have been proposed addressing the energy optimization problem. The goal of each technique is to maintain a balance between user comfort and energy requirements, such that the user can achieve the desired comfort level with the minimum amount of energy consumption. Researchers have addressed the issue with the help of different optimization algorithms and variations in the parameters to reduce energy consumption. To the best of our knowledge, this problem is not yet solved due to its challenging nature. The gap in the literature stems from advancements in technology, drawbacks of existing optimization algorithms, and the introduction of new optimization algorithms. Further, many newly proposed optimization algorithms have produced better accuracy on benchmark instances but have not yet been applied to the optimization of energy consumption in smart homes. In this paper, we carry out a detailed literature review of the techniques used for the optimization of energy consumption and scheduling in smart homes. A detailed discussion covers the different factors contributing towards thermal comfort, visual comfort, and air quality comfort. We also review the fog and edge computing techniques used in smart homes.

    Situational Intelligence for Improving Power System Operations Under High Penetration of Photovoltaics

    Nowadays, power grid operators face challenges and pressure to balance interconnected grid frequency under rapidly increasing photovoltaic (PV) power penetration levels. PV sources are variable and intermittent. To mitigate the effect of this intermittency, power system frequency is regulated towards its security limits. Under these stressed regimes, frequency oscillations are inevitable, especially during disturbances, and may lead to costly consequences such as brownouts or blackouts. Hence, power system operations need to be improved so that appropriate decisions can be made in time. Specifically, operation centers need precise concurrent or ahead-of-time power system frequencies, simplified and easy-to-comprehend power system visualizations, and well-performing coordinated automatic generation controls (AGC) for multiple areas. The first study in this dissertation focuses on developing general frequency prediction structures for electric grids integrated with PV and phasor measurement units, to improve the situational awareness (SA) of the power system operation center in making normal and emergency decisions ahead of time. Thus, a frequency situational intelligence (FSI) methodology capable of multi-bus-type and multi-timescale prediction is presented, based on the cellular computational network (CCN) structure with multi-layer perceptron (MLP) and generalized neuron (GN) algorithms. The results show that both CCMLPN and CCGNN can provide precise multi-timescale frequency predictions, with the CCGNN outperforming the CCMLPN. The second study improves the SA of operation centers by developing an online visualization tool based on the synchronous generator vulnerability index (GVI) and the corresponding power system vulnerability index (SVI), considering dynamic PV penetration. The GVI and SVI are developed from the coherency grouping of synchronous generators using the K-Harmonic Means Clustering (KHMC) algorithm. Furthermore, the CCGNN-based FSI method has been implemented for the online coherency grouping procedure to achieve faster-than-real-time grouping performance. Last but not least, multi-area AGCs under different PV-integrated power system operating conditions are investigated on a multi-area multi-source interconnected testbed, especially with severe load disturbances. An onward asynchronous tuning method and a two-step (synchronous) tuning method utilizing the particle swarm optimization algorithm are developed to refine the multi-area AGCs, giving power system balancing authorities more opportunities to interconnect freely and to utilize more PV power. In summary, this dissertation presents a number of methods for improving interconnected power system situational intelligence under high levels of PV power penetration.
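The dissertation's testbed and objective function are not reproduced here. As a rough illustration of the particle-swarm-based tuning step, the sketch below tunes two hypothetical AGC controller gains against an invented quadratic cost:

```python
import random

# Minimal particle swarm optimization sketch. The gains (Kp, Ki) and the
# quadratic cost surface are hypothetical stand-ins for the dissertation's
# actual AGC testbed and objective.
def cost(gains):
    kp, ki = gains
    # Invented surrogate with optimum at Kp = 2.0, Ki = 0.5.
    return (kp - 2.0) ** 2 + (ki - 0.5) ** 2

def pso(cost, dim=2, swarm=20, iters=100, seed=1):
    rng = random.Random(seed)
    pos = [[rng.uniform(0.0, 5.0) for _ in range(dim)] for _ in range(swarm)]
    vel = [[0.0] * dim for _ in range(swarm)]
    pbest = [p[:] for p in pos]              # each particle's best position
    gbest = min(pbest, key=cost)             # swarm-wide best position
    w, c1, c2 = 0.7, 1.5, 1.5                # inertia and attraction weights
    for _ in range(iters):
        for i in range(swarm):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if cost(pos[i]) < cost(pbest[i]):
                pbest[i] = pos[i][:]
                if cost(pbest[i]) < cost(gbest):
                    gbest = pbest[i][:]
    return gbest

best_gains = pso(cost)
```

In the real setting, evaluating `cost` would mean simulating the multi-area system under load disturbances and scoring the frequency response, which is why derivative-free optimizers such as PSO are attractive for this tuning problem.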

    Cloud Storage Level Service Offering in Virtualized Load Balancer using AWS

    Distributed computing epitomizes an approach perfectly suited to the realm of IT commitments, leveraging the aggregation of information and resources through cloud service providers utilizing interconnected, primarily online hardware and software, all at a reasonable cost. However, resource sharing can lead to challenges in accessibility, potentially causing system crashes. To counter this, the technique of distributing network traffic across multiple servers, known as load balancing, plays a pivotal role. The approach presented in this paper ensures that no single server is overwhelmed, thereby preventing overloads and enhancing user responsiveness by equitably distributing tasks. Moreover, it significantly enhances the accessibility of tasks and websites to users. The fundamental objective of this concept is to comprehend load regulation, which operates in tandem with associated frameworks within communication structures like the Web. Load balancing stands as a critical domain within distributed computing, designed to prevent overburdening and to provide equally significant support. Various algorithms are employed to assess the system's complexity. In our proposed strategy, a process is outlined to determine optimal storage space utilization in real-time, utilizing 100 virtual computers, achieving an impressive 92% accuracy rate in its computations. This innovative approach promises efficient resource allocation within the distributed computing framework, thereby optimizing performance and accessibility for end-users.
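As a minimal illustration of the load-balancing idea (not the paper's virtualized AWS setup, where Elastic Load Balancing provides this as a managed service), round-robin distribution of requests over a set of servers can be written as:

```python
from itertools import cycle

# Toy round-robin load balancer: server names and request IDs are
# invented for illustration.
class RoundRobinBalancer:
    def __init__(self, servers):
        self._next_server = cycle(servers)   # endless rotation over servers
        self.assigned = {s: 0 for s in servers}

    def route(self, request):
        # Hand the request to the next server in rotation, so load
        # spreads equitably and no single server is overwhelmed.
        server = next(self._next_server)
        self.assigned[server] += 1
        return server

lb = RoundRobinBalancer(["vm-1", "vm-2", "vm-3"])
targets = [lb.route(f"req-{i}") for i in range(9)]
```

Production balancers layer health checks and weighted or least-connections policies on top of this core rotation, but the equitable-distribution principle is the same.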

    Gunrock: GPU Graph Analytics

    For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library. "Gunrock", our graph-processing system designed specifically for the GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on operations on a vertex or edge frontier. Gunrock achieves a balance between performance and expressiveness by coupling high-performance GPU computing primitives and optimization strategies with a high-level programming model that allows programmers to quickly develop new graph primitives with small code size and minimal GPU programming knowledge. We characterize the performance of various optimization strategies and evaluate Gunrock's overall performance on different GPU architectures across a wide range of graph primitives, from traversal-based and ranking algorithms to triangle counting and bipartite-graph-based algorithms. The results show that on a single GPU, Gunrock has on average at least an order-of-magnitude speedup over Boost and PowerGraph, comparable performance to the fastest GPU hardwired primitives and CPU shared-memory graph libraries such as Ligra and Galois, and better performance than any other GPU high-level graph library. Comment: 52 pages; invited paper to ACM Transactions on Parallel Computing (TOPC), an extended version of the PPoPP'16 paper "Gunrock: A High-Performance Graph Processing Library on the GPU".
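Gunrock itself is a CUDA/C++ library; the sketch below only mimics its frontier-centric abstraction in plain Python, with each BFS step built from an advance over the current frontier followed by a filter of already-visited vertices:

```python
# Frontier-centric BFS sketch of Gunrock's data-centric model (an
# illustration of the abstraction, not Gunrock's API or its GPU kernels).
def frontier_bfs(adj, source):
    depth = {source: 0}        # visited vertices and their BFS depth
    frontier = [source]        # current vertex frontier
    while frontier:
        # "Advance" operator: expand the frontier to all neighbors.
        next_frontier = []
        for u in frontier:
            for v in adj.get(u, []):
                # "Filter" operator: keep only unvisited vertices.
                if v not in depth:
                    depth[v] = depth[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier
    return depth

graph = {0: [1, 2], 1: [3], 2: [3], 3: [4]}
depths = frontier_bfs(graph, 0)
```

On a GPU, each advance/filter step maps to data-parallel kernels over the whole frontier, which is why this bulk-synchronous formulation suits irregular graph workloads.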