148,943 research outputs found
Exploiting Data Longevity for Enhancing the Lifetime of Flash-based Storage Class Memory
Storage-class memory (SCM) combines the benefits of a solid-state memory,
such as high performance and robustness, with the archival capabilities and low
cost of conventional hard-disk magnetic storage. Among candidate solid-state
nonvolatile memory technologies that could potentially be used to construct
SCM, flash memory is a well-established technology and has been widely used in
commercially available SCM incarnations. Flash-based SCM enables much better
tradeoffs between performance, space and power than disk-based systems.
However, write endurance is a significant challenge for a flash-based SCM (each
act of writing a bit may slightly damage a cell, so one flash cell can be
written 10^4--10^5 times, depending on the flash technology, before it becomes
unusable). This well-documented problem has received considerable attention
from manufacturers, who use some combination of write-reduction and
wear-leveling techniques to achieve longer lifetimes. In an effort to
improve flash lifetime, first, by quantifying data longevity in an SCM, we show
that a majority of the data stored in a solid-state SCM does not require the
long retention times provided by flash memory (i.e., up to 10 years in modern
devices); second, by exploiting retention time relaxation, we propose a novel
mechanism, called Dense-SLC (D-SLC), which enables multiple writes into a
cell during each erase cycle, thereby extending lifetime; and finally, we
discuss the required changes in the flash management software (FTL) in order to
use this characteristic for extending the lifetime of the solid-state part of
an SCM. Using an extensive simulation-based analysis of a flash-based SCM, we
demonstrate that D-SLC is able to significantly improve device lifetime
(between 5.1X and 8.6X) with no performance overhead and only very small
changes to the FTL software.
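The core idea lends itself to a compact illustration. Below is a minimal, hypothetical Python model of the D-SLC mechanism as summarized above: because short-lived data does not need full retention, the cell's voltage window is treated as several short-retention SLC levels, each consumed by one write, so several writes fit into a single program/erase (P/E) cycle. The class, its parameters, and the four-level split are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch (not the paper's implementation): the cell's voltage
# window is split into `levels` short-retention SLC states, and each write
# consumes the next state, so several writes fit in one P/E cycle.

class DSLCCell:
    def __init__(self, levels=4, pe_budget=10_000):
        self.levels = levels        # writes available per erase cycle (assumed 4)
        self.level = 0              # next unused level in the current cycle
        self.pe_budget = pe_budget  # erase cycles the cell survives (10^4..10^5)

    def write(self):
        """Program the next level; erase first if the cycle is exhausted."""
        if self.level == self.levels:
            self.erase()
        self.level += 1

    def erase(self):
        if self.pe_budget == 0:
            raise RuntimeError("cell worn out")
        self.pe_budget -= 1
        self.level = 0

# With 4 levels per cycle, the same P/E budget absorbs roughly 4x as many
# writes of short-lived data as conventional SLC would.
```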
MPI Windows on Storage for HPC Applications
Upcoming HPC clusters will feature hybrid memories and storage devices per
compute node. In this work, we propose to use the MPI one-sided communication
model and MPI windows as a unique interface for programming memory and storage.
We describe the design and implementation of MPI storage windows, and present
their benefits for out-of-core execution, parallel I/O, and fault tolerance. In
addition, we explore the integration of heterogeneous window allocations, where
memory and storage share a unified virtual address space. When performing
large, irregular memory operations, we verify that MPI windows on local
storage incur a 55% performance penalty on average. When using a Lustre
parallel file system, asymmetric performance is observed, with over 90%
degradation in write operations. Nonetheless, experimental results of a
Distributed Hash Table, the
HACC I/O kernel mini-application, and a novel MapReduce implementation based on
the use of MPI one-sided communication, indicate that the overall penalty of
MPI windows on storage can be negligible in most real-world applications.
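As a rough illustration of the programming model, the sketch below allocates an MPI window and performs a one-sided put with mpi4py; the same calls apply whether the window is backed by memory or storage. The info keys used to request storage backing are hypothetical placeholders, since the actual hints are implementation-specific.

```python
# Minimal sketch of an MPI window used uniformly for memory or storage.
# The info keys below are assumed placeholders, not a standard MPI API.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
info = MPI.Info.Create()
info.Set("alloc_type", "storage")         # assumed hint: back window by storage
info.Set("storage_path", "/tmp/win.dat")  # assumed hint: backing file

nbytes = 1 << 20
win = MPI.Win.Allocate(nbytes, disp_unit=1, info=info, comm=comm)

# One-sided put into rank 0's window: identical code whether the window
# lives in DRAM or on a storage device.
buf = np.full(1024, comm.Get_rank(), dtype=np.uint8)
win.Lock(rank=0)
win.Put(buf, target_rank=0)
win.Unlock(rank=0)

win.Free()
info.Free()
```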
Splotch: porting and optimizing for the Xeon Phi
With the increasing size and complexity of data produced by large scale
numerical simulations, it is of primary importance for scientists to be able to
exploit all available hardware in heterogeneous High Performance Computing
environments for increased throughput and efficiency. We focus on the porting
and optimization of Splotch, a scalable visualization algorithm, to utilize the
Xeon Phi, Intel's coprocessor based upon the new Many Integrated Core
architecture. We discuss steps taken to offload data to the coprocessor and
algorithmic modifications to aid faster processing on the many-core
architecture and make use of the uniquely wide vector capabilities of the
device, with accompanying performance results using multiple Xeon Phi devices.
Finally, performance is compared against results achieved with the GPU
implementation of Splotch.
Comment: Version 1, 11 pages, 14 figures. Accepted for publication in the
International Journal of High Performance Computing Applications (IJHPCA).
Unix Memory Allocations are Not Poisson
In multitasking operating systems, requests for free memory are traditionally
modeled as a stochastic counting process with independent,
exponentially-distributed interarrival times because of the analytic simplicity
such Poisson models afford. We analyze the distribution of several million Unix
page commits to show that although this approach could be valid over relatively
long timespans, the behavior of the arrival process over shorter periods is
decidedly not Poisson. We find that this result holds regardless of the
originator of the request: unlike network packets, there is little difference
between system- and user-level page-request distributions. We believe this to
be due to the bursty nature of page allocations, which tend to occur in either
small or extremely large increments. Burstiness and persistent variance have
recently been found in self-similar processes in computer networks, but we show
that although page commits are both bursty and possess high variance over long
timescales, they are probably not self-similar. These results suggest that
altogether different models are needed for fine-grained analysis of memory
systems, an important consideration not only for understanding behavior but
also for the design of online control systems.
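One standard way to check this kind of claim on a trace is the index of dispersion: for a Poisson process, event counts in equal-length windows have variance roughly equal to their mean, while bursty arrivals push the ratio far above 1. The sketch below applies this test to synthetic data; it illustrates the methodology and is not the paper's analysis code.

```python
# Index-of-dispersion check: variance/mean of windowed counts is ~1 for a
# Poisson process and much larger for bursty, heavy-tailed arrivals.
import numpy as np

def index_of_dispersion(timestamps, window):
    """Count arrivals in consecutive windows and return variance/mean."""
    t = np.asarray(timestamps)
    edges = np.arange(t.min(), t.max() + window, window)
    counts, _ = np.histogram(t, bins=edges)
    return counts.var() / counts.mean()

# Synthetic comparison (illustrative, not the paper's page-commit data):
rng = np.random.default_rng(0)
poisson = np.cumsum(rng.exponential(1.0, 100_000))  # exponential gaps
bursty = np.cumsum(rng.pareto(1.5, 100_000))        # heavy-tailed gaps

print(index_of_dispersion(poisson, window=50.0))    # close to 1
print(index_of_dispersion(bursty, window=50.0))     # far above 1
```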
DynaChannAl: Dynamic Channel Allocation with Minimal End-to-end Delay for Wireless Sensor Networks
With recent advances in wireless communication, networking, and low-power
sensor technology, wireless sensor network (WSN) systems have begun to take
significant roles in various applications, ranging from environmental sensing
to mobile healthcare sensing. While some WSN applications only require a
limited amount of bandwidth, new emerging applications operate with a
noticeably large amount of data transfers. One way to deal with such
applications is to maximize the available capacity by utilizing multiple
wireless channels. This work proposes DynaChannAl, a distributed dynamic
wireless channel allocation algorithm with the goal of effectively
distributing nodes across multiple wireless channels in WSN systems.
Specifically, DynaChannAl targets applications where mobile nodes connect to a
pre-existing wireless backbone, and it takes the expected end-to-end queuing
delay as its core metric. We use the link quality indicator (LQI) values
provided by IEEE 802.15.4 radios to white-list potential links with good link
quality, and we evaluate such links by the aggregated packet transmission
latency at each hop. Our approach is useful for applications that require
minimal end-to-end delay (e.g., healthcare applications). DynaChannAl is a
lightweight and highly adoptable scheme that can easily be incorporated with
various pre-developed components and pre-deployed applications. We evaluate
DynaChannAl on a 45-node WSN testbed. As the first study to consider
end-to-end latency as the core metric for channel allocation in WSN systems,
the experimental results indicate that DynaChannAl successfully distributes
multiple (mobile) source nodes across different wireless channels and enables
the nodes to select the wireless channels and links that minimize end-to-end
latency.
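The selection rule described above can be sketched compactly: white-list links whose LQI clears a threshold, then pick the candidate whose aggregated per-hop latency to the backbone is smallest. The threshold value and data layout below are hypothetical.

```python
# Illustrative sketch of DynaChannAl's selection rule: filter candidate
# links by LQI, then minimize aggregated end-to-end queuing delay.

LQI_THRESHOLD = 200  # assumed white-listing cutoff for IEEE 802.15.4 LQI

def best_channel(candidates):
    """candidates: dicts with 'channel', 'lqi', and 'path_latency'
    (aggregated queuing + transmission delay to the backbone)."""
    good = [c for c in candidates if c["lqi"] >= LQI_THRESHOLD]
    if not good:
        return None
    return min(good, key=lambda c: c["path_latency"])

links = [
    {"channel": 15, "lqi": 230, "path_latency": 42.0},
    {"channel": 20, "lqi": 250, "path_latency": 57.5},
    {"channel": 25, "lqi": 180, "path_latency": 30.0},  # filtered: low LQI
]
print(best_channel(links))  # -> the channel-15 link
```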
ADARES: Adaptive Resource Management for Virtual Machines
Virtual execution environments allow for consolidation of multiple
applications onto the same physical server, thereby enabling more efficient use
of server resources. However, users often statically configure the resources of
virtual machines through guesswork, resulting in either insufficient resource
allocations that hinder VM performance, or excessive allocations that waste
precious data center resources. In this paper, we first characterize real-world
resource allocation and utilization of VMs through the analysis of an extensive
dataset, consisting of more than 250k VMs from over 3.6k private enterprise
clusters. Our large-scale analysis confirms that VMs are often misconfigured,
either overprovisioned or underprovisioned, and that this problem is pervasive
across a wide range of private clusters. We then propose ADARES, an adaptive
system that dynamically adjusts VM resources using machine learning techniques.
In particular, ADARES leverages the contextual bandits framework to effectively
manage the adaptations. Our system exploits easily collectible data, at the
cluster, node, and VM levels, to make more sensible allocation decisions, and
uses transfer learning to safely explore the configuration space and speed up
training. Our empirical evaluation shows that ADARES can significantly improve
system utilization without sacrificing performance. For instance, when compared
to threshold and prediction-based baselines, it achieves more predictable
VM-level performance and also reduces the amount of virtual CPUs and memory
provisioned by up to 35% and 60%, respectively, for synthetic workloads on real
clusters.
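As a sketch of the contextual-bandit machinery ADARES leverages, the following implements a standard LinUCB policy over a hypothetical set of VM resize actions; the feature vector, action set, and reward are illustrative assumptions, not the system's actual design.

```python
# LinUCB-style contextual bandit: arms are VM resize actions, the context
# is a utilization feature vector, and the reward would come from observed
# post-adjustment performance. All names here are hypothetical.
import numpy as np

ACTIONS = ["scale_down", "keep", "scale_up"]  # assumed arm set

class LinUCB:
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # per-arm covariance
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # per-arm reward stats

    def choose(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # ridge estimate
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))                    # optimistic arm

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Context: e.g. [cpu_util, mem_util, swap_rate] for one VM (hypothetical).
bandit = LinUCB(n_arms=len(ACTIONS), dim=3)
x = np.array([0.92, 0.40, 0.05])
arm = bandit.choose(x)             # pick a resize action
bandit.update(arm, x, reward=0.7)  # reward from observed performance
```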
Minimizing Total Busy Time for Energy-Aware Virtual Machine Allocation Problems
This paper investigates energy-aware virtual machine (VM) allocation problems
in clouds with the following characteristics: multiple resources, fixed
interval times, and non-preemption of virtual machines. Many previous works
propose using a minimum number of physical machines; however, this is not
necessarily a good solution for minimizing total energy consumption in VM
placement with multiple resources, fixed interval times, and non-preemption. We
observe that minimizing the sum of the total busy time of all physical
machines also minimizes the total energy consumption of the physical machines.
In addition, if several mappings of a VM onto physical machines yield the same
total busy time, the best mapping is the one that minimizes the physical
machine's remaining available resources. Based on these observations, we
propose a heuristic-based EM algorithm to solve energy-aware VM allocation
with fixed starting times and durations. In addition, this work studies
heuristics for sorting the list of virtual machines (e.g., by the earliest
starting time, the latest finishing time, or the longest duration first) when
allocating VMs. We evaluate EM using the CloudSim toolkit and job log-traces
from Feitelson's Parallel Workloads Archive. Simulation results show that the
EM-ST, EM-LFT, and EM-LDTF algorithms all reduce total energy consumption
compared to state-of-the-art power-aware VM allocation algorithms (e.g.,
Power-Aware Best-Fit Decreasing (PABFD) [7]).
Comment: 8 pages, Proceedings of the Sixth International Symposium on
Information and Communication Technology. arXiv admin note: substantial text
overlap with arXiv:1511.0682
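The flavor of these heuristics can be sketched as a greedy placement: sort VMs by a chosen key (earliest start time, as in EM-ST) and place each one on the host whose total busy time, i.e. the length of the union of its VMs' intervals, grows the least. Capacity constraints are omitted here, and all names are illustrative.

```python
# Greedy EM-ST-style sketch: earliest-start-first, minimize busy-time growth.

def union_length(intervals):
    """Total busy time: length of the union of [start, end) intervals."""
    total, cur_s, cur_e = 0, None, None
    for s, e in sorted(intervals):
        if cur_e is None or s > cur_e:       # gap: close the current run
            if cur_e is not None:
                total += cur_e - cur_s
            cur_s, cur_e = s, e
        else:                                # overlap: extend the current run
            cur_e = max(cur_e, e)
    if cur_e is not None:
        total += cur_e - cur_s
    return total

def em_st(vms, n_hosts):
    """vms: list of (start, end) intervals; returns per-host interval lists."""
    hosts = [[] for _ in range(n_hosts)]
    for vm in sorted(vms):                   # earliest starting time first
        best = min(hosts, key=lambda h: union_length(h + [vm]) - union_length(h))
        best.append(vm)
    return hosts

print(em_st([(0, 4), (1, 3), (5, 9), (2, 8)], n_hosts=2))
```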
Cost-oblivious storage reallocation
Databases need to allocate and free blocks of storage on disk. Freed blocks
introduce holes where no data is stored. Allocation systems attempt to reuse
such deallocated regions in order to minimize the footprint on disk. If
previously allocated blocks cannot be moved, the problem is called the memory
allocation problem, which is known to have a logarithmic overhead in the
footprint.
This paper defines the storage reallocation problem, where previously
allocated blocks can be moved, or reallocated, but at some cost. The algorithms
presented here are cost oblivious, in that they work for a broad and reasonable
class of cost functions, even when they do not know what the cost function is.
The objective is to minimize the storage footprint, that is, the largest
memory address containing an allocated object, while simultaneously minimizing
the reallocation costs. This paper gives asymptotically optimal algorithms for
storage reallocation, in which the storage footprint is at most (1+epsilon)
times optimal, and the reallocation cost is at most (1/epsilon) times the
original allocation cost, which is also optimal. The algorithms are cost
oblivious as long as the allocation/reallocation cost function is subadditive.
Comment: 20 pages, 3 figures; to appear in Transactions on Algorithms. Full
journal version of a previous conference paper in PODS 201
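A toy version of the footprint invariant reads as follows: allocate at the end, and whenever frees leave the footprint above (1+epsilon) times the live volume, compact everything toward address zero, paying the reallocation cost. The paper's algorithms are substantially more refined; this sketch only illustrates the invariant being maintained.

```python
# Toy allocator maintaining footprint <= (1 + eps) * live volume by
# compacting after frees. Illustrative only, not the paper's algorithm.

class Reallocator:
    def __init__(self, eps=0.5):
        self.eps = eps
        self.objs = {}      # object id -> (offset, size)
        self.footprint = 0  # one past the highest allocated address

    def live(self):
        return sum(size for _, size in self.objs.values())

    def alloc(self, oid, size):
        self.objs[oid] = (self.footprint, size)  # bump-allocate at the end
        self.footprint += size

    def free(self, oid):
        del self.objs[oid]
        if self.footprint > (1 + self.eps) * max(self.live(), 1):
            self.compact()                       # reallocate, paying move cost

    def compact(self):
        off = 0
        for oid, (_, size) in sorted(self.objs.items(), key=lambda kv: kv[1][0]):
            self.objs[oid] = (off, size)         # move object toward address 0
            off += size
        self.footprint = off
```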
A Joint Optimization of Operational Cost and Performance Interference in Cloud Data Centers
Virtual machine (VM) scheduling is an important technique to efficiently
operate the computing resources in a data center. Previous work has mainly
focused on consolidating VMs to improve resource utilization and thus to
optimize energy consumption. However, the interference between collocated VMs
is usually ignored, which can result in severe performance degradation for the
applications running in those VMs due to contention for the shared resources.
Based on this observation, we aim to design efficient VM assignment and
scheduling strategies that optimize both the operational cost of the data
center and the performance degradation of running applications, and we propose
a general model that captures the inherent tradeoff between the two
contradictory objectives. We present offline and online solutions for this
problem by exploiting the spatial and temporal information of VMs, scheduling
VMs by jointly considering their combinations and the overlap of their
life cycles. Evaluation results show
that the proposed methods can generate efficient schedules for VMs, achieving
low operational cost while significantly reducing the performance degradation
of applications in cloud data centers.
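A minimal sketch of such a joint objective scores each candidate host as a weighted sum of the added operational (energy) cost and an interference penalty against already-collocated VMs; the weight and the interference rule below are hypothetical placeholders for the paper's model.

```python
# Joint cost/interference placement sketch: pick the host minimizing a
# weighted sum of added energy cost and collocation interference.

def placement_cost(host, vm, interference, energy_per_load, weight=0.5):
    """host: list of resident VMs; interference(a, b) returns a penalty."""
    energy = energy_per_load * vm["load"]  # added operational cost (assumed linear)
    contention = sum(interference(vm, other) for other in host)
    return weight * energy + (1 - weight) * contention

def place(vm, hosts, interference, energy_per_load=1.0):
    return min(hosts, key=lambda h: placement_cost(h, vm, interference,
                                                   energy_per_load))

# Example: cache-heavy VMs interfere strongly when collocated (made-up rule).
def interference(a, b):
    return 1.0 if a["type"] == b["type"] == "cache-heavy" else 0.1

hosts = [[{"type": "cache-heavy", "load": 0.6}],
         [{"type": "cpu-bound", "load": 0.4}]]
target = place({"type": "cache-heavy", "load": 0.5}, hosts, interference)
target.append({"type": "cache-heavy", "load": 0.5})
```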
Adaptive Event Dispatching in Serverless Computing Infrastructures
Serverless computing is an emerging Cloud service model. It is currently
gaining momentum as the next step in the evolution of hosted computing from
capacitated machine virtualisation and microservices towards utility computing.
The term "serverless" has become a synonym for the entirely
resource-transparent deployment model of cloud-based event-driven distributed
applications. This work investigates how adaptive event dispatching can improve
serverless platform resource efficiency and contributes a novel approach that
allows for better scaling and fitting of the platform's resource consumption to
actual demand.
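For illustration only, one plausible form of adaptive dispatching routes each event to the worker with the smallest expected wait, estimated from queue length and an exponentially weighted average of observed latencies; the estimator and its parameters below are assumptions, not the paper's design.

```python
# Latency-aware event dispatcher sketch: expected wait per worker is
# (queued events + 1) x an EWMA estimate of its per-event service latency.

class AdaptiveDispatcher:
    def __init__(self, workers, decay=0.9):
        self.decay = decay
        self.latency = {w: 0.0 for w in workers}  # EWMA of service latency
        self.queued = {w: 0 for w in workers}     # events in flight per worker

    def dispatch(self):
        w = min(self.queued,
                key=lambda w: (self.queued[w] + 1) * self.latency[w])
        self.queued[w] += 1
        return w

    def complete(self, worker, observed_latency):
        self.queued[worker] -= 1
        self.latency[worker] = (self.decay * self.latency[worker]
                                + (1 - self.decay) * observed_latency)

d = AdaptiveDispatcher(["w1", "w2"])
target = d.dispatch()     # route the next event
d.complete(target, 0.12)  # feed back the observed latency
```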