
    Exploiting Data Longevity for Enhancing the Lifetime of Flash-based Storage Class Memory

    Storage-class memory (SCM) combines the benefits of solid-state memory, such as high performance and robustness, with the archival capabilities and low cost of conventional hard-disk magnetic storage. Among candidate solid-state nonvolatile memory technologies that could potentially be used to construct SCM, flash memory is a well-established technology and has been widely used in commercially available SCM incarnations. Flash-based SCM enables much better tradeoffs between performance, space and power than disk-based systems. However, write endurance is a significant challenge for a flash-based SCM (each act of writing a bit may slightly damage a cell, so one flash cell can be written 10^4--10^5 times, depending on the flash technology, before it becomes unusable). This is a well-documented problem and has received a lot of attention from manufacturers, who use some combination of write reduction and wear-leveling techniques to achieve longer lifetime. In an effort to improve flash lifetime, first, by quantifying data longevity in an SCM, we show that a majority of the data stored in a solid-state SCM does not require the long retention times provided by flash memory (i.e., up to 10 years in modern devices); second, by exploiting retention time relaxation, we propose a novel mechanism, called Dense-SLC (D-SLC), which enables us to perform multiple writes into a cell during each erase cycle for lifetime extension; and finally, we discuss the required changes in the flash management software (FTL) in order to use this characteristic for extending the lifetime of the solid-state part of an SCM. Using an extensive simulation-based analysis of a flash-based SCM, we demonstrate that D-SLC is able to significantly improve device lifetime (between 5.1X and 8.6X) with no performance overhead and only very small changes to the FTL software.

    MPI Windows on Storage for HPC Applications

    Upcoming HPC clusters will feature hybrid memories and storage devices per compute node. In this work, we propose to use the MPI one-sided communication model and MPI windows as a unique interface for programming memory and storage. We describe the design and implementation of MPI storage windows, and present their benefits for out-of-core execution, parallel I/O and fault tolerance. In addition, we explore the integration of heterogeneous window allocations, where memory and storage share a unified virtual address space. When performing large, irregular memory operations, we verify that MPI windows on local storage incur a 55% performance penalty on average. When using a Lustre parallel file system, asymmetric performance is observed, with over 90% degradation in write operations. Nonetheless, experimental results of a Distributed Hash Table, the HACC I/O kernel mini-application, and a novel MapReduce implementation based on the use of MPI one-sided communication indicate that the overall penalty of MPI windows on storage can be negligible in most cases in real-world applications.

    Splotch: porting and optimizing for the Xeon Phi

    With the increasing size and complexity of data produced by large-scale numerical simulations, it is of primary importance for scientists to be able to exploit all available hardware in heterogeneous High Performance Computing environments for increased throughput and efficiency. We focus on the porting and optimization of Splotch, a scalable visualization algorithm, to utilize the Xeon Phi, Intel's coprocessor based upon the new Many Integrated Core architecture. We discuss steps taken to offload data to the coprocessor and algorithmic modifications to aid faster processing on the many-core architecture and make use of the uniquely wide vector capabilities of the device, with accompanying performance results using multiple Xeon Phi. Finally, performance is compared against results achieved with the GPU implementation of Splotch. Comment: Version 1, 11 pages, 14 figures. Accepted for publication in International Journal of High Performance Computing Applications (IJHPCA)

    Unix Memory Allocations are Not Poisson

    In multitasking operating systems, requests for free memory are traditionally modeled as a stochastic counting process with independent, exponentially distributed interarrival times because of the analytic simplicity such Poisson models afford. We analyze the distribution of several million Unix page commits to show that although this approach could be valid over relatively long timespans, the behavior of the arrival process over shorter periods is decidedly not Poisson. We find that this result holds regardless of the originator of the request: unlike network packets, there is little difference between system- and user-level page-request distributions. We believe this to be due to the bursty nature of page allocations, which tend to occur in either small or extremely large increments. Burstiness and persistent variance have recently been found in self-similar processes in computer networks, but we show that although page commits are both bursty and possess high variance over long timescales, they are probably not self-similar. These results suggest that altogether different models are needed for fine-grained analysis of memory systems, an important consideration not only for understanding behavior but also for the design of online control systems.
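One simple way to probe whether an arrival trace looks Poisson is the index of dispersion of per-window counts: for a Poisson process the variance of the counts equals their mean, so the ratio is close to 1, while bursty traces give values well above 1. The sketch below is only an illustration of that check, not the analysis used in the paper; the function name `index_of_dispersion` and the fixed-window binning are assumptions made here.

```python
from collections import Counter
from statistics import mean, pvariance
from typing import Sequence


def index_of_dispersion(timestamps: Sequence[float], window: float) -> float:
    """Variance-to-mean ratio of event counts in fixed-width time windows.

    Hypothetical helper for illustration: about 1 for Poisson arrivals,
    below 1 for very regular arrivals, above 1 for bursty arrivals.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if not timestamps:
        raise ValueError("need at least one timestamp")

    start = min(timestamps)
    # Number of windows needed to cover the whole trace.
    n_windows = int((max(timestamps) - start) // window) + 1
    # Count events per window; windows with no events count as zero.
    counts = Counter(int((t - start) // window) for t in timestamps)
    per_window = [counts.get(i, 0) for i in range(n_windows)]
    return pvariance(per_window) / mean(per_window)


# Perfectly regular arrivals: every window holds the same number of events.
regular = list(range(100))
# Bursty arrivals: all events packed into the first and last windows.
bursty = [0.0] * 50 + [99.0] * 50

print(index_of_dispersion(regular, window=10))
print(index_of_dispersion(bursty, window=10))
```

Repeating the computation for several window sizes gives a rough view of how the arrival behavior changes between short and long timescales.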

    DynaChanAl: Dynamic Channel Allocation with Minimal End-to-end Delay for Wireless Sensor Networks

    With recent advances in wireless communication, networking, and low power sensor technology, wireless sensor network (WSN) systems have begun to take significant roles in various applications ranging from environmental sensing to mobile healthcare sensing. While some WSN applications only require a limited amount of bandwidth, new emerging applications operate with a noticeably large amount of data transfers. One way to deal with such applications is to maximize the available capacity by using multiple wireless channels. This work proposes DynaChannAl, a distributed dynamic wireless channel algorithm with the goal of effectively distributing nodes on multiple wireless channels in WSN systems. Specifically, DynaChannAl targets applications where mobile nodes connect to a pre-existing wireless backbone and takes the expected end-to-end queuing delay as its core metric. We use the link quality indicator (LQI) values provided by IEEE 802.15.4 radios to white-list potential links with good link quality and evaluate such links with the aggregated packet transmission latency at each hop. Our approach is useful for applications that require minimal end-to-end delay (e.g., healthcare applications). DynaChannAl is a lightweight and highly adoptable scheme that can be easily incorporated with various pre-developed components and pre-deployed applications. We evaluate DynaChannAl on a 45-node WSN testbed. As the first study to consider end-to-end latency as the core metric for channel allocation in WSN systems, the experimental results indicate that DynaChannAl successfully distributes multiple (mobile) source nodes on different wireless channels and enables the nodes to select wireless channels and links that can minimize the end-to-end latency.

    ADARES: Adaptive Resource Management for Virtual Machines

    Virtual execution environments allow for consolidation of multiple applications onto the same physical server, thereby enabling more efficient use of server resources. However, users often statically configure the resources of virtual machines through guesswork, resulting in either insufficient resource allocations that hinder VM performance, or excessive allocations that waste precious data center resources. In this paper, we first characterize real-world resource allocation and utilization of VMs through the analysis of an extensive dataset, consisting of more than 250k VMs from over 3.6k private enterprise clusters. Our large-scale analysis confirms that VMs are often misconfigured, either overprovisioned or underprovisioned, and that this problem is pervasive across a wide range of private clusters. We then propose ADARES, an adaptive system that dynamically adjusts VM resources using machine learning techniques. In particular, ADARES leverages the contextual bandits framework to effectively manage the adaptations. Our system exploits easily collectible data, at the cluster, node, and VM levels, to make more sensible allocation decisions, and uses transfer learning to safely explore the configuration space and speed up training. Our empirical evaluation shows that ADARES can significantly improve system utilization without sacrificing performance. For instance, when compared to threshold and prediction-based baselines, it achieves more predictable VM-level performance and also reduces the number of virtual CPUs and the amount of memory provisioned by up to 35% and 60% respectively for synthetic workloads on real clusters.

    Minimizing Total Busy Time for Energy-Aware Virtual Machine Allocation Problems

    This paper investigates energy-aware virtual machine (VM) allocation problems in clouds with the following characteristics: multiple resources, fixed interval time and non-preemption of virtual machines. Many previous works have proposed using a minimum number of physical machines; however, this is not necessarily a good solution for minimizing total energy consumption in VM placement with multiple resources, fixed interval time and non-preemption. We observed that minimizing the sum of the total busy time of all physical machines implies minimizing the total energy consumption of the physical machines. In addition, if several mappings of a VM onto physical machines have the same total busy time, then the best mapping is the one that minimizes the physical machine's remaining available resources. Based on these observations, we propose the heuristic-based EM algorithm to solve energy-aware VM allocation with fixed starting time and duration. In addition, this work studies some heuristics for sorting the list of virtual machines (e.g., sorting by earliest starting time, latest finishing time, or longest duration time first) before allocating VMs. We evaluate EM using the CloudSim toolkit and job log traces from Feitelson's Parallel Workloads Archive. Simulation results show that the EM-ST, EM-LFT and EM-LDTF algorithms could all reduce total energy consumption compared to state-of-the-art power-aware VM allocation algorithms (e.g., Power-Aware Best-Fit Decreasing (PABFD) [7]). Comment: 8 pages, Proceedings of the Sixth International Symposium on Information and Communication Technology. arXiv admin note: substantial text overlap with arXiv:1511.0682
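The busy-time objective in this abstract can be illustrated with a small sketch. It assumes, as a working definition not spelled out in the abstract, that a physical machine's busy time is the length of the union of the fixed intervals of the VMs placed on it. The names `busy_time`, `total_busy_time` and the machine labels are hypothetical, and this is not the EM algorithm itself, only the quantity it is described as minimizing.

```python
from typing import Dict, List, Tuple

Interval = Tuple[float, float]  # (start, end) of a VM, with start <= end


def busy_time(intervals: List[Interval]) -> float:
    """Length of the union of the intervals (assumed definition of busy time)."""
    total = 0.0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Gap before this interval: close the previous busy period.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlap: extend the current busy period.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def total_busy_time(placement: Dict[str, List[Interval]]) -> float:
    """Sum of busy times over all physical machines in a placement."""
    return sum(busy_time(vms) for vms in placement.values())


# Two placements of the same three VMs, both using two physical machines.
overlapping_together = {"pm1": [(0, 4), (2, 6)], "pm2": [(10, 12)]}
overlapping_apart = {"pm1": [(0, 4), (10, 12)], "pm2": [(2, 6)]}

print(total_busy_time(overlapping_together))  # 8.0
print(total_busy_time(overlapping_apart))     # 10.0
```

Both placements use the same number of machines, yet the one that co-locates overlapping VMs has a smaller total busy time, which is the sense in which minimizing machine count alone need not minimize energy.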

    Cost-oblivious storage reallocation

    Databases need to allocate and free blocks of storage on disk. Freed blocks introduce holes where no data is stored. Allocation systems attempt to reuse such deallocated regions in order to minimize the footprint on disk. If previously allocated blocks cannot be moved, the problem is called the memory allocation problem, which is known to have a logarithmic overhead in the footprint. This paper defines the storage reallocation problem, where previously allocated blocks can be moved, or reallocated, but at some cost. The algorithms presented here are cost oblivious, in that they work for a broad and reasonable class of cost functions, even when they do not know what the cost function is. The objective is to minimize the storage footprint, that is, the largest memory address containing an allocated object, while simultaneously minimizing the reallocation costs. This paper gives asymptotically optimal algorithms for storage reallocation, in which the storage footprint is at most (1+epsilon) times optimal, and the reallocation cost is at most (1/epsilon) times the original allocation cost, which is also optimal. The algorithms are cost oblivious as long as the allocation/reallocation cost function is subadditive. Comment: 20 pages, 3 figures; to appear in Transactions on Algorithms. Full journal version of previous conference paper in PODS 201

    A Joint Optimization of Operational Cost and Performance Interference in Cloud Data Centers

    Virtual machine (VM) scheduling is an important technique for efficiently operating the computing resources in a data center. Previous work has mainly focused on consolidating VMs to improve resource utilization and thus to optimize energy consumption. However, the interference between collocated VMs is usually ignored, which can result in severe performance degradation for the applications running in those VMs due to contention for the shared resources. Based on this observation, we aim at designing efficient VM assignment and scheduling strategies that optimize both the operational cost of the data center and the performance degradation of running applications, and we propose a general model which captures the inherent tradeoff between the two contradictory objectives. We present offline and online solutions for this problem by exploiting the spatial and temporal information of VMs, where VM scheduling is done by jointly considering the combinations and the life-cycle overlapping of the VMs. Evaluation results show that the proposed methods can generate efficient schedules for VMs, achieving low operational cost while significantly reducing the performance degradation of applications in cloud data centers.

    Adaptive Event Dispatching in Serverless Computing Infrastructures

    Serverless computing is an emerging cloud service model. It is currently gaining momentum as the next step in the evolution of hosted computing from capacitated machine virtualisation and microservices towards utility computing. The term "serverless" has become a synonym for the entirely resource-transparent deployment model of cloud-based event-driven distributed applications. This work investigates how adaptive event dispatching can improve serverless platform resource efficiency and contributes a novel approach that allows for better scaling and fitting of the platform's resource consumption to actual demand.