Garbage collection auto-tuning for Java MapReduce on Multi-Cores
MapReduce has been widely accepted as a simple programming pattern that can form the basis for efficient, large-scale, distributed data processing. The success of the MapReduce pattern has led to a variety of implementations for different computational scenarios. In this paper we present MRJ, a MapReduce Java framework for multi-core architectures. We evaluate its scalability on a four-core, hyperthreaded Intel Core i7 processor, using a set of standard MapReduce benchmarks. We investigate the significant impact that Java runtime garbage collection has on the performance and scalability of MRJ. We propose the use of memory management auto-tuning techniques based on machine learning. With our auto-tuning approach, we are able to achieve MRJ performance within 10% of optimal on 75% of our benchmark tests
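The map/group/reduce pattern that MRJ implements on multi-cores can be sketched in a few lines. The code below is an illustrative word-count in Python (not MRJ's Java API): the map and reduce phases run across worker processes, and the many short-lived key-value tuples it allocates hint at why garbage-collection behavior matters so much for this workload.

```python
from collections import defaultdict
from multiprocessing import Pool

def map_words(line):
    # Map phase: emit (word, 1) pairs for one input line.
    return [(w.lower(), 1) for w in line.split()]

def reduce_counts(item):
    # Reduce phase: sum the counts emitted for one word.
    word, counts = item
    return word, sum(counts)

def word_count(lines, workers=4):
    with Pool(workers) as pool:
        mapped = pool.map(map_words, lines)            # parallel map
        groups = defaultdict(list)
        for pairs in mapped:                           # shuffle: group by key
            for word, count in pairs:
                groups[word].append(count)
        reduced = pool.map(reduce_counts, groups.items())  # parallel reduce
    return dict(reduced)

print(word_count(["the quick brown fox", "the lazy dog"]))
```

Every intermediate pair above is a fresh heap allocation; in a managed runtime such as the JVM, that allocation churn is exactly the pressure the paper's GC auto-tuning targets.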
Distributed Training Large-Scale Deep Architectures
Scale of data and scale of computation infrastructures together enable the
current deep learning renaissance. However, training large-scale deep
architectures demands both algorithmic improvement and careful system
configuration. In this paper, we focus on employing the system approach to
speed up large-scale training. Via lessons learned from our routine
benchmarking effort, we first identify bottlenecks and overheads that hinder
data parallelism. We then devise guidelines that help practitioners to
configure an effective system and fine-tune parameters to achieve desired
speedup. Specifically, we develop a procedure for setting minibatch size and
choosing computation algorithms. We also derive lemmas for determining the
quantity of key components such as the number of GPUs and parameter servers.
Experiments and examples show that these guidelines help effectively speed up
large-scale deep learning training
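The trade-off behind such sizing guidelines can be made concrete with a simple cost model. The sketch below is an illustration of the general reasoning, not the paper's actual lemmas: under data parallelism, per-step time is compute divided across GPUs plus a gradient-communication term, so speedup saturates as GPUs are added.

```python
def estimated_speedup(t_compute, t_comm, n_gpus):
    """Estimated data-parallel speedup under a simple cost model
    (illustrative assumption, not the paper's derivation):
    per-step time = compute spread over n GPUs + fixed comm cost."""
    t_parallel = t_compute / n_gpus + t_comm
    return t_compute / t_parallel

# Sweep GPU counts (hypothetical timings, in seconds per step)
# to see where adding GPUs stops paying off.
for n in (1, 2, 4, 8, 16):
    print(n, round(estimated_speedup(0.8, 0.1, n), 2))
```

With these toy numbers the returns diminish quickly past 8 GPUs, which is the kind of saturation point the paper's guidelines help practitioners locate before provisioning hardware.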
ACTS in Need: Automatic Configuration Tuning with Scalability Guarantees
To support the variety of Big Data use cases, many Big Data related systems
expose a large number of user-specifiable configuration parameters. As
highlighted in our experiments, a MySQL deployment with well-tuned
configuration parameters achieves a peak throughput 12 times that of one with
the default settings. However, finding the best setting for the tens or
hundreds of configuration parameters is practically impossible for ordinary
users. Worse still, many Big Data
applications require the support of multiple systems co-deployed in the same
cluster. As these co-deployed systems can interact to affect the overall
performance, they must be tuned together. Automatic configuration tuning with
scalability guarantees (ACTS) is needed to help system users. Solutions to
ACTS must scale to various systems, workloads, deployments, parameters and
resource limits. Proposing and implementing an ACTS solution, we demonstrate
that ACTS can benefit users not only in improving system performance and
resource utilization, but also in saving costs and enabling fairer
benchmarking
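The core loop of automatic configuration tuning, sample a configuration, measure the system, keep the best, can be sketched briefly. This is a generic random-search illustration, not the ACTS algorithm itself; the parameter names and the throughput model are hypothetical stand-ins for a real benchmarked deployment.

```python
import random

def tune(score, space, budget=200, seed=0):
    """Random-search configuration tuner (a hedged sketch of automatic
    configuration tuning in general, not the ACTS method)."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(budget):
        cfg = {k: rng.choice(v) for k, v in space.items()}  # sample a config
        s = score(cfg)  # in practice: measured throughput of a benchmark run
        if s > best_score:
            best_cfg, best_score = cfg, s
    return best_cfg, best_score

# Toy stand-in for a measured system: throughput peaks at a 64 MB buffer
# and 8 worker threads (hypothetical numbers for illustration).
def throughput(cfg):
    return -abs(cfg["buffer_mb"] - 64) - 4 * abs(cfg["threads"] - 8)

space = {"buffer_mb": [16, 32, 64, 128, 256], "threads": [2, 4, 8, 16]}
print(tune(throughput, space))
```

The scalability challenge the abstract raises is visible even here: each added parameter multiplies the search space, and co-deployed systems couple their parameters, which is why naive search stops scaling and a dedicated ACTS solution is needed.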
A Review on Energy Consumption Optimization Techniques in IoT Based Smart Building Environments
In recent years, due to the unnecessary wastage of electrical energy in
residential buildings, the requirement of energy optimization and user comfort
has gained vital importance. In the literature, various techniques have been
proposed addressing the energy optimization problem. The goal of each technique
was to maintain a balance between user comfort and energy requirements such
that the user can achieve the desired comfort level with the minimum amount of
energy consumption. Researchers have addressed the issue with the help of
different optimization algorithms and variations in the parameters to reduce
energy consumption. To the best of our knowledge, this problem is not yet
solved, owing to its challenging nature. The gap in the literature stems from
advances in technology, the drawbacks of existing optimization algorithms, and
the introduction of new ones. Further, many newly proposed optimization
algorithms have produced better accuracy on benchmark instances but have not
yet been applied to the optimization of energy consumption in smart homes. In
this paper, we have carried out a
detailed literature review of the techniques used for the optimization of
energy consumption and scheduling in smart homes. A detailed discussion covers
the different factors contributing to thermal comfort, visual comfort, and air
quality comfort. We have also reviewed the fog and edge
computing techniques used in smart homes
Situational Intelligence for Improving Power System Operations Under High Penetration of Photovoltaics
Nowadays, power grid operators are experiencing challenges and pressure to balance interconnected grid frequency as photovoltaic (PV) power penetration levels rapidly increase. PV sources are variable and intermittent. To mitigate the effect of this intermittency, power system frequency is regulated towards its security limits. Under such stressed regimes, frequency oscillations are inevitable, especially during disturbances, and may lead to costly consequences such as brownouts or blackouts. Hence, power system operations need to be improved so that appropriate decisions can be made in time. Specifically, operation centers need precise concurrent or ahead-of-time power system frequencies, simplified and straightforward-to-comprehend power system visualizations, and well-coordinated, well-performing automatic generation controls (AGC) for multiple areas.
The first study in this dissertation focuses on developing general frequency prediction structures for electric grids with integrated PV and phasor measurement units, to improve the situational awareness (SA) of the power system operation center in making normal and emergency decisions ahead of time. To this end, a frequency situational intelligence (FSI) methodology capable of multi-bus-type and multi-timescale prediction is presented, based on the cellular computational network (CCN) structure with multi-layer perceptron (MLP) and generalized neuron (GN) algorithms. The results show that both the CCMLPN and the CCGNN can provide precise multi-timescale frequency predictions, with the CCGNN outperforming the CCMLPN.
The second study of this dissertation improves the SA of operation centers by developing an online visualization tool based on the synchronous generator vulnerability index (GVI) and the corresponding power system vulnerability index (SVI), considering dynamic PV penetration. The GVI and SVI are derived from the coherency grouping of synchronous generators using the K-Harmonic Means Clustering (KHMC) algorithm. Furthermore, the CCGNN-based FSI method has been applied to the online coherency grouping procedure to achieve faster-than-real-time grouping performance.
Last but not least, multi-area AGCs under different PV-integrated power system operating conditions are investigated on a multi-area, multi-source interconnected testbed, especially under severe load disturbances. Furthermore, an onward asynchronous tuning method and a two-step (synchronous) tuning method utilizing the particle swarm optimization algorithm are developed to refine the multi-area AGCs, giving power system balancing authorities more opportunities to interconnect freely and to utilize more PV power.
In summary, a number of methods for improving interconnected power system situational intelligence under high PV power penetration have been presented in this dissertation
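The particle swarm optimization used to refine the AGC controllers can be illustrated in miniature. The sketch below is a generic PSO minimizing a toy objective, not the dissertation's AGC tuning setup; the inertia and attraction coefficients are common textbook defaults, and the quadratic objective stands in for a real AGC cost such as integrated frequency error.

```python
import random

def pso(objective, dim, iters=60, swarm=20, seed=1):
    """Minimal particle swarm optimization sketch (illustrative only)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(swarm)]
    vel = [[0.0] * dim for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(swarm), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(swarm):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (0.7 * vel[i][d]
                             + 1.5 * r1 * (pbest[i][d] - pos[i][d])
                             + 1.5 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy objective standing in for an AGC cost surface.
best, best_val = pso(lambda x: sum(v * v for v in x), dim=2)
print(best, best_val)
```

In the dissertation's setting, each objective evaluation would be a simulation of the multi-area testbed under a disturbance, which is what makes derivative-free methods like PSO attractive for this tuning problem.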
Cloud Storage Level Service Offering in Virtualized Load Balancer using AWS
Distributed computing epitomizes an approach well suited to modern IT commitments, aggregating information and resources through cloud service providers using interconnected hardware and software delivered primarily online, all at a reasonable cost. However, resource sharing can create accessibility challenges and may even cause system crashes. To counter this, load balancing, the technique of distributing network traffic across multiple servers, plays a pivotal role. The approach described in this paper ensures that no single server is overwhelmed, preventing overloads and improving user responsiveness by distributing tasks equitably. Moreover, it significantly enhances the accessibility of tasks and websites to users. The fundamental objective is to understand load regulation, which operates in tandem with associated frameworks within communication structures such as the Web. Load balancing is a critical domain within distributed computing, designed to prevent overburdening and to provide equitable service. Various algorithms are employed to assess the system's complexity. In our proposed strategy, a process is outlined to determine optimal storage space utilization in real time, using 100 virtual machines and achieving a 92% accuracy rate in its computations. This approach promises efficient resource allocation within the distributed computing framework, optimizing performance and accessibility for end users
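The load-distribution idea at the heart of the abstract can be shown with the two simplest classical policies. This is a generic sketch, not the paper's AWS setup or its 100-VM experiment; the server names are hypothetical.

```python
import itertools

class RoundRobinBalancer:
    """Round-robin: hand each new request to the next server in turn."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)

class LeastConnectionsBalancer:
    """Least-connections: send each request to the least-loaded server,
    which directly prevents any single server from being overwhelmed."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}

    def acquire(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1

lb = RoundRobinBalancer(["vm-1", "vm-2", "vm-3"])
print([lb.pick() for _ in range(6)])
```

Managed services such as AWS Elastic Load Balancing apply the same principles across pools of virtual machines, adding health checks and autoscaling on top.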
Gunrock: GPU Graph Analytics
For large-scale graph analytics on the GPU, the irregularity of data access
and control flow, and the complexity of programming GPUs, have presented two
significant challenges to developing a programmable high-performance graph
library. "Gunrock", our graph-processing system designed specifically for the
GPU, uses a high-level, bulk-synchronous, data-centric abstraction focused on
operations on a vertex or edge frontier. Gunrock achieves a balance between
performance and expressiveness by coupling high performance GPU computing
primitives and optimization strategies with a high-level programming model that
allows programmers to quickly develop new graph primitives with small code size
and minimal GPU programming knowledge. We characterize the performance of
various optimization strategies and evaluate Gunrock's overall performance on
different GPU architectures on a wide range of graph primitives that span from
traversal-based algorithms and ranking algorithms, to triangle counting and
bipartite-graph-based algorithms. The results show that on a single GPU,
Gunrock has on average at least an order of magnitude speedup over Boost and
PowerGraph, comparable performance to the fastest GPU hardwired primitives and
CPU shared-memory graph libraries such as Ligra and Galois, and better
performance than any other GPU high-level graph library.
Comment: 52 pages; invited paper to ACM Transactions on Parallel Computing (TOPC); an extended version of the PPoPP'16 paper "Gunrock: A High-Performance Graph Processing Library on the GPU"
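The frontier-centric abstraction the abstract describes can be illustrated with breadth-first search. The sketch below is a sequential Python stand-in written in the spirit of Gunrock's advance operator, not its CUDA API: each iteration expands the whole current frontier in bulk, and that bulk step is what a GPU parallelizes across threads.

```python
from collections import defaultdict

def bfs_frontiers(edges, source):
    """Frontier-based BFS over an undirected graph, returning each
    vertex's depth from the source."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    depth = {source: 0}
    frontier = [source]
    while frontier:
        # "Advance": visit every neighbor of the current frontier in bulk.
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in depth:
                    depth[v] = depth[u] + 1
                    next_frontier.append(v)
        frontier = next_frontier
    return depth

print(bfs_frontiers([(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)], 0))
```

Expressing traversal as operations on frontiers, rather than per-vertex programs, is what lets a system like Gunrock apply load-balancing and kernel-fusion optimizations without the programmer writing low-level GPU code.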