A Techno-economic and Spatial Analysis for the Optimal Planning of Wind Energy in Kythira Island, Greece
Low-enthalpy geothermal resources for electricity production: A demand-side management study for intelligent communities
The Unexpected Efficiency of Bin Packing Algorithms for Dynamic Storage Allocation in the Wild: An Intellectual Abstract
Recent work has shown that viewing allocators as black-box 2DBP solvers bears
meaning. For instance, there exists a 2DBP-based fragmentation metric which
often correlates monotonically with maximum resident set size (RSS). Given the
field's indeterminacy with respect to fragmentation definitions, as well as the
immense value of physical memory savings, we are motivated to set
allocator-generated placements against their 2DBP-devised, makespan-optimizing
counterparts. Of course, allocators must operate online while 2DBP algorithms
work on complete request traces; but since both sides optimize criteria related
to minimizing memory wastage, the idea of studying their relationship preserves
its intellectual--and practical--interest.
Unfortunately, no implementations of 2DBP algorithms for DSA are available.
This paper presents a first, though partial, implementation of the
state-of-the-art. We validate its functionality by comparing its outputs'
makespan to the theoretical upper bound provided by the original authors. Along
the way, we identify and document key details to assist analogous future
efforts.
Our experiments comprise 4 modern allocators and 8 real application
workloads. We make several notable observations on our empirical evidence: in
terms of makespan, allocators outperform Robson's worst-case lower bound
of the time. In of cases, GNU's \texttt{malloc}
implementation demonstrates equivalent or superior performance to the 2DBP
state-of-the-art, despite the latter operating offline.
Most surprisingly, the 2DBP algorithm proves competent in terms of
fragmentation, producing up to x better solutions. Future research can
leverage such insights towards memory-targeting optimizations.
Comment: 13 pages, 10 figures, 3 tables. To appear in ISMM '2
Resource Aware GPU Scheduling in Kubernetes Infrastructure
Nowadays, an ever-increasing number of artificial intelligence inference workloads is pushed and executed on the cloud. To serve and manage these computational demands effectively, data center operators have provisioned their infrastructures with accelerators. For GPUs specifically, efficient management support is lacking: state-of-the-art schedulers and orchestrators treat GPUs as typical compute resources, ignoring their unique characteristics and application properties. Combined with the GPU over-provisioning problem, this leads to severe resource under-utilization. Even though prior work has addressed this problem by colocating applications on a single accelerator device, its resource-agnostic nature fails to resolve resource under-utilization and quality-of-service violations, especially for latency-critical applications.
In this paper, we design a resource-aware GPU scheduling framework able to efficiently colocate applications on the same GPU accelerator card. We integrate our solution with Kubernetes, one of the most widely used cloud orchestration frameworks. We show that our scheduler achieves 58.8% lower 99th-percentile end-to-end job execution time, while delivering 52.5% higher GPU memory usage, 105.9% higher GPU utilization on average and 44.4% lower energy consumption on average, compared to state-of-the-art schedulers, for a variety of representative ML workloads.
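The key idea of resource-aware colocation, as opposed to treating GPUs as opaque units, can be illustrated with a toy scoring function. The names, fields and thresholds below are hypothetical, not the paper's design: the sketch just picks the GPU whose free memory and utilization headroom best fit an incoming job.

```python
# Illustrative sketch (not the paper's scheduler): best-fit GPU selection
# on free memory, filtered by a compute-utilization headroom cap.
from typing import List, Optional, NamedTuple

class GPU(NamedTuple):
    name: str
    mem_free_mb: int
    util_pct: float  # current compute utilization, 0-100

def pick_gpu(gpus: List[GPU], job_mem_mb: int, util_cap: float = 90.0) -> Optional[str]:
    """Return the name of the feasible GPU with the tightest memory fit,
    or None if no GPU can host the job."""
    feasible = [g for g in gpus
                if g.mem_free_mb >= job_mem_mb and g.util_pct < util_cap]
    if not feasible:
        return None  # caller falls back to queueing / the default scheduler
    # Tightest fit packs colocated jobs together, keeping large GPUs free.
    return min(feasible, key=lambda g: g.mem_free_mb - job_mem_mb).name
```

A real integration would plug such a score into the orchestrator's scheduling path (e.g. a Kubernetes scheduler extension) and feed it live per-GPU telemetry rather than static fields.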
Oops: Optimizing operation-mode selection for IoT edge devices
The massive increase of IoT devices and their collected data raises the question of how to analyze all that data. Edge computing provides a suitable compromise, but the question remains: how much processing should be done locally vs. offloaded to other devices? Diverse application requirements and limited resources at the edge compound these challenges.
We propose Oops, an optimization framework that adapts resource management at runtime in a distributed fashion. It orchestrates the IoT devices and adapts their operation mode with respect to their constraints and the gateway's limited shared resources. Oops significantly reduces runtime overhead while increasing user utility compared to the state of the art.
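The underlying optimization — one operation mode per device, maximizing total utility under the gateway's shared resource budget — can be sketched as follows. This is an illustrative formulation, not the Oops framework itself; a runtime system would use a distributed heuristic rather than exhaustive search.

```python
# Illustrative sketch (not Oops itself): each device offers operation modes
# trading user utility against gateway load; choose one mode per device to
# maximize total utility within the gateway's shared budget.
from itertools import product
from typing import List, Tuple

Mode = Tuple[float, float]  # (utility, gateway_cost)

def select_modes(devices: List[List[Mode]], budget: float):
    """Exhaustive search over mode assignments; fine for a handful of
    devices, exponential in general."""
    best_utility, best_choice = 0.0, [0] * len(devices)
    for choice in product(*(range(len(m)) for m in devices)):
        cost = sum(devices[i][c][1] for i, c in enumerate(choice))
        if cost > budget:
            continue  # violates the gateway's shared-resource constraint
        utility = sum(devices[i][c][0] for i, c in enumerate(choice))
        if utility > best_utility:
            best_utility, best_choice = utility, list(choice)
    return best_utility, best_choice
```

With two devices, each offering a cheap low-utility mode and a costly high-utility mode, the budget decides which device gets to offload: the sketch picks the assignment with the highest achievable total utility.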
Dataflow acceleration of Smith-Waterman with Traceback for high throughput Next Generation Sequencing
The Smith-Waterman algorithm is widely adopted by most popular DNA sequence aligners. The algorithm's inherent computational intensity and the vast amount of NGS input data it operates on create a bottleneck in genomic analysis flows for short-read alignment. FPGA architectures have been extensively leveraged to alleviate the problem, each one adopting a different approach. In existing solutions, effective co-design of NGS short-read alignment still remains an open issue, mainly due to a narrow view of real integration aspects, such as system-wide communication and accelerator call overheads. In this paper, we propose a dataflow architecture for the Smith-Waterman Matrix-fill and Traceback alignment stages, to perform short-read alignment on NGS data. The architectural decision of moving both stages on chip eliminates the communication overhead and, coupled with radical software restructuring, allows for efficient integration into the widely-used Bowtie2 aligner. This approach delivers a ×18 speedup over the respective Bowtie2 standalone components, while our co-designed Bowtie2 demonstrates a 35% boost in performance.
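The two stages the abstract moves on-chip are the standard Smith-Waterman recurrence and its traceback. A plain-Python reference version (linear gap penalty; the scoring parameters are illustrative, not the aligner's) shows the data dependencies a dataflow architecture must respect: each matrix-fill cell depends on its three neighbours, and traceback walks back from the best-scoring cell.

```python
# Reference sketch of Smith-Waterman local alignment: matrix fill followed
# by traceback. Linear gap penalty; parameters are illustrative defaults.
def smith_waterman(a: str, b: str, match=2, mismatch=-1, gap=-2):
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_ij = 0, (0, 0)
    # Matrix-fill stage: each cell depends on its NW, N and W neighbours.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = H[i-1][j-1] + (match if a[i-1] == b[j-1] else mismatch)
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            if H[i][j] > best:
                best, best_ij = H[i][j], (i, j)
    # Traceback stage: walk back from the best cell until a zero score.
    i, j = best_ij
    aln_a, aln_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        score = match if a[i-1] == b[j-1] else mismatch
        if H[i][j] == H[i-1][j-1] + score:        # diagonal move
            aln_a.append(a[i-1]); aln_b.append(b[j-1]); i, j = i - 1, j - 1
        elif H[i][j] == H[i-1][j] + gap:          # gap in b
            aln_a.append(a[i-1]); aln_b.append("-"); i -= 1
        else:                                     # gap in a
            aln_a.append("-"); aln_b.append(b[j-1]); j -= 1
    return best, "".join(reversed(aln_a)), "".join(reversed(aln_b))
```

Keeping traceback on-chip matters because it reads back the matrix the fill stage just produced; done off-chip, that matrix would have to cross the host-accelerator boundary for every read.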