
    Towards Optimizing Storage Costs on the Cloud

    We study the problem of optimizing data storage and access costs on the cloud while ensuring that the desired performance or latency is unaffected. First, we propose an optimizer that selects the cloud data placement tier and the compression scheme to apply for given data partitions with temporal access predictions. Second, we propose a model that learns the compression performance of multiple algorithms across data partitions in different formats, generating compression performance predictions on the fly as inputs to the optimizer. Third, we approach the data partitioning problem fundamentally differently from the current default in most data lakes, where partitioning follows ingestion batches: we propose access-pattern-aware data partitioning and formulate an optimization problem that optimizes the size and reading costs of partitions subject to access patterns. We study these optimization problems theoretically as well as empirically, and provide theoretical bounds as well as hardness results. We combine the different modules into a unified cost-minimization pipeline called SCOPe. We extensively compare the performance of our methods with related baselines from the literature on TPC-H data as well as enterprise datasets (ranging from GB to PB in volume) and show that SCOPe substantially improves over the baselines, with cost savings of the order of 50% to 83% over platform baselines on enterprise data lake datasets ranging from terabytes to petabytes in volume. Comment: The first two authors contributed equally. 12 pages. Accepted to the International Conference on Data Engineering (ICDE) 202
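
    The abstract includes no code; as a minimal sketch of the kind of per-partition decision such an optimizer makes, the snippet below enumerates (tier, codec) pairs and picks the cheapest combination that respects a latency bound. All prices, compression ratios, and latencies are invented placeholders, not values from the paper, and the actual SCOPe optimizer is certainly more sophisticated.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical tier and codec parameters; real numbers would come from the
# cloud provider's price sheet and the learned compression-performance model.
TIERS = {  # $/GB/month storage, $/GB read, read latency (ms/GB)
    "hot":     (0.023, 0.0000, 10),
    "cool":    (0.010, 0.0100, 50),
    "archive": (0.002, 0.0500, 60000),
}
CODECS = {  # predicted compression ratio, decompression latency (ms/GB)
    "none": (1.0, 0),
    "lz4":  (2.5, 15),
    "zstd": (3.5, 40),
}

@dataclass
class Partition:
    size_gb: float
    reads_per_month: float   # temporal access prediction
    latency_bound_ms: float  # per-GB latency the workload tolerates

def place(p: Partition):
    """Pick the (tier, codec) pair minimizing monthly cost under the latency bound."""
    best = None
    for (tier, (store, read, t_lat)), (codec, (ratio, c_lat)) in product(
            TIERS.items(), CODECS.items()):
        if t_lat + c_lat > p.latency_bound_ms:
            continue  # would violate the performance requirement
        stored = p.size_gb / ratio
        cost = stored * store + stored * read * p.reads_per_month
        if best is None or cost < best[0]:
            best = (cost, tier, codec)
    return best

# A rarely read 500 GB partition lands on hot+zstd under a tight latency bound.
print(place(Partition(size_gb=500, reads_per_month=2, latency_bound_ms=100)))
```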

    Cheap Data Analytics using Cold Storage Devices

    Enterprise databases use storage tiering to lower capital and operational expenses. In such a setting, data waterfalls from an SSD-based high-performance tier when it is "hot" (frequently accessed) to a disk-based capacity tier, and finally to a tape-based archival tier when "cold" (rarely accessed). To address the unprecedented growth in the amount of cold data, hardware vendors have introduced new devices named Cold Storage Devices (CSD), explicitly targeted at cold data workloads. With access latencies in the tens of seconds and cost as low as $0.01/GB/month, CSD provide a middle ground between the low-latency (ms), high-cost, HDD-based capacity tier and the high-latency (minutes to hours), low-cost, tape-based archival tier. Driven by the price/performance aspect of CSD, this paper makes a case for using CSD as a replacement for both the capacity and archival tiers of enterprise databases. Although CSD offer major cost savings, we show that current database systems can suffer a severe performance drop when CSD are used as a replacement for HDD, due to the mismatch between the design assumptions made by the query execution engine and the actual storage characteristics of CSD. We then build a CSD-driven query execution framework, called Skipper, that modifies both the database execution engine and the CSD scheduling algorithms to be aware of each other. Using results from our implementation of the architecture based on PostgreSQL and OpenStack Swift, we show that Skipper is capable of completely masking the high access latency of CSD, thereby opening up CSD for wider adoption as a storage tier for cheap data analytics over cold data.
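
    The engine/storage co-design is described only at a high level here; the sketch below illustrates one ingredient any CSD-aware scheduler needs: serving pending block requests group by group to amortize the multi-second spin-up cost, instead of in arrival order. This is an illustrative reading, not Skipper's actual algorithm, and all names and numbers are made up.

```python
from collections import defaultdict, deque

class GroupAwareScheduler:
    """Serve requests group-by-group to amortize spin-up latency, rather than
    in arrival order (which would thrash between disk groups). Assumes a CSD
    where only one disk group can be spun up at a time, at ~10 s per switch."""
    def __init__(self):
        self.queues = defaultdict(deque)  # disk_group -> pending requests
        self.active = None                # currently spun-up group

    def submit(self, query_id, disk_group, block):
        self.queues[disk_group].append((query_id, block))

    def drain(self):
        while any(self.queues.values()):
            if not self.queues.get(self.active):
                # Switch to the group with the most pending work:
                # one spin-up, many reads.
                self.active = max(self.queues, key=lambda g: len(self.queues[g]))
                print(f"spin up group {self.active} (~10 s)")
            q = self.queues[self.active]
            while q:
                query_id, block = q.popleft()
                # Out-of-order delivery: the engine must accept results in
                # whatever order the storage layer produces them.
                print(f"  serve block {block} to query {query_id}")
            del self.queues[self.active]

s = GroupAwareScheduler()
for group, block, query in [(1, "a", 1), (2, "b", 1), (1, "c", 2), (1, "d", 2)]:
    s.submit(query, group, block)
s.drain()  # all of group 1, then group 2: two spin-ups instead of three
```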

    MACHS: Mitigating the Achilles Heel of the Cloud through High Availability and Performance-aware Solutions

    Cloud computing is continuously growing as a business model for hosting information and communication technology applications. However, many concerns arise regarding the quality of service (QoS) offered by the cloud. One major challenge is the high availability (HA) of cloud-based applications. The key to achieving availability requirements is to develop an approach that is immune to cloud failures while minimizing service level agreement (SLA) violations. To this end, this thesis addresses the HA of cloud-based applications from different perspectives. First, the thesis proposes a component HA-aware scheduler (CHASE) to manage the deployments of carrier-grade cloud applications while maximizing their HA and satisfying their QoS requirements. Second, a Stochastic Petri Net (SPN) model is proposed to capture the stochastic characteristics of cloud services and quantify the expected availability offered by an application deployment. The SPN model is then associated with an extensible policy-driven cloud scoring system that integrates other cloud challenges (i.e., green and cost concerns) with HA objectives. The proposed HA-aware solutions are extended with a live virtual machine migration model that provides a trade-off between migration time and downtime while maintaining the HA objective. Furthermore, the thesis proposes a generic input template for cloud simulators, GITS, to facilitate the creation of cloud scenarios while ensuring reusability, simplicity, and portability. Finally, an availability-aware CloudSim extension, ACE, is proposed. ACE extends the CloudSim simulator with failure injection, computational paths, repair, failover, load balancing, and other availability-based modules.
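
    As a hedged illustration of the kind of quantity such an availability model ultimately produces, the sketch below computes steady-state availability from per-component MTBF/MTTR and a replica count. The formula is the standard one for independent redundant replicas; the numbers are made up rather than taken from the thesis.

```python
# Steady-state availability of one component: A = MTBF / (MTBF + MTTR).
# With n independent redundant replicas, the service is down only when
# all replicas are down simultaneously: A_n = 1 - (1 - A)**n.
def availability(mtbf_h: float, mttr_h: float, replicas: int = 1) -> float:
    a = mtbf_h / (mtbf_h + mttr_h)
    return 1 - (1 - a) ** replicas

# Hypothetical VM: fails every 1000 h on average, takes 2 h to repair.
for n in (1, 2, 3):
    print(n, f"{availability(1000, 2, n):.8f}")
# 1 replica  ~0.99800399 ("two nines")
# 2 replicas ~0.99999602
# 3 replicas ~0.99999999
```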

    Cloud Based IoT Architecture

    The Internet of Things (IoT) and cloud computing have grown in popularity over the past decade as the internet has become faster and more ubiquitous. Cloud platforms are well suited to handle IoT systems, as they are accessible and resilient and provide a scalable solution for storing and analyzing large amounts of IoT data. IoT applications are complex software systems, and software developers need a thorough understanding of the capabilities, limitations, architecture, and design patterns of cloud platforms and cloud-based IoT tools to build an efficient, maintainable, and customizable IoT application. As the IoT landscape is constantly changing, research into cloud-based IoT platforms is either lacking or out of date. The goal of this thesis is to describe the basic components and requirements of a cloud-based IoT platform, to provide useful insights and experiences from implementing a cloud-based IoT solution using Microsoft Azure, and to discuss some of the shortcomings of combining IoT with a cloud platform.
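
    Since the thesis implements its solution on Microsoft Azure, a minimal device-to-cloud telemetry snippet may help orient the reader. It uses the azure-iot-device Python SDK; the connection string and payload are placeholders, and the SDK surface may differ between versions, so treat this as a sketch rather than the thesis's implementation.

```python
import json
from azure.iot.device import IoTHubDeviceClient, Message

# Placeholder: a real connection string comes from the IoT Hub device registry.
CONN_STR = "HostName=<hub>.azure-devices.net;DeviceId=<id>;SharedAccessKey=<key>"

client = IoTHubDeviceClient.create_from_connection_string(CONN_STR)
client.connect()

# Encode a sensor reading as JSON telemetry; IoT Hub routes it onward
# (e.g., to stream processing or a storage endpoint) for analysis.
msg = Message(json.dumps({"deviceId": "sensor-01", "temperature": 21.7}))
msg.content_type = "application/json"
msg.content_encoding = "utf-8"
client.send_message(msg)
client.shutdown()
```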

    Whole Farm Net Zero: approaches to quantification of climate regulation ecosystem services at the whole farm scale. Vermont Payment for Ecosystem Services Technical Report #7

    In this report, approaches to the quantification of climate mitigation ecosystem services at the whole farm scale are reviewed and summarized for easy comparison. Eight quantification tools, and three case studies demonstrating possible tool applications, are summarized to fulfill the requirements of the Technical Services Contract (Task 7). Information from a combination of literature review and expert interviews served to document the inputs, outputs, strengths, weaknesses, opportunities, and threats for each quantification tool. This research was conducted in service to the Vermont Soil Health and Payment for Ecosystem Services (PES) Working Group (VT PES Working Group). It is our hope that this report provides productive information and insights for the implementation of whole farm scale payment for ecosystem services programs, Vermont's Climate Action Plan, and similar efforts elsewhere. Emissions reductions on farms are of interest to farmers in Vermont and will be required by the implementation of the Global Warming Solutions Act (GWSA). Management changes that reduce emissions at the farm scale could be supported and encouraged through a PES program. Given the work and goals of the PES Working Group and the requirements to implement the GWSA, it is critical to understand the degree of accuracy and the scope of currently available greenhouse gas assessment tools that could be used to measure and monitor outcomes from Vermont agriculture. Section 2 of this report describes the methods used to collect information reviewing eight tools for quantifying agricultural greenhouse gas emissions and sequestration rates: the CarbOn Management & Emissions Tool (COMET)-Farm, COMET-Planner, COOL-Farm, DayCent, DNDC (DeNitrification-DeComposition), Environmental Policy Integrated Climate (EPIC) & Agricultural Policy/Environmental eXtender (APEX), Holos, and the Integrated Farm Systems Model (IFSM). These eight tools were each reviewed using a systematic literature review, interviews with experts who are well-versed in using the specific tools, and a Strengths-Weaknesses-Opportunities-Threats (SWOT) analysis. Section 3 presents some larger-context considerations for choosing an appropriate tool. Section 4 gives a high-level overview of the SWOT analysis performed for each tool reviewed for this task. Section 5 describes three example applications of emissions modeling tools. Section 6 contains concluding remarks. The report's Appendix section includes the SWOT analyses for each tool to allow for more in-depth review, as well as a series of tables to present a high-level comparison of the tools.
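
    All of the reviewed tools ultimately express farm emissions in a common unit. As a hedged illustration (not taken from any of the reviewed tools), the sketch below converts per-gas totals to CO2-equivalents using IPCC AR5 100-year global warming potentials and nets out soil carbon sequestration; the farm numbers are invented.

```python
# GWP-100 factors from IPCC AR5 (CH4 = 28, N2O = 265); individual tools
# may use AR4 or AR6 values instead, which shifts the totals.
GWP = {"co2": 1.0, "ch4": 28.0, "n2o": 265.0}

def whole_farm_co2e(emissions_t: dict, sequestration_t_co2: float = 0.0) -> float:
    """Net whole-farm balance in t CO2e/yr: gross emissions minus soil C sequestration."""
    gross = sum(GWP[gas] * tonnes for gas, tonnes in emissions_t.items())
    return gross - sequestration_t_co2

# Hypothetical dairy farm: fuel CO2, enteric CH4, manure/soil N2O,
# and 40 t CO2/yr sequestered in soils under cover cropping.
print(whole_farm_co2e({"co2": 55.0, "ch4": 12.0, "n2o": 0.9}, sequestration_t_co2=40.0))
# 55 + 12*28 + 0.9*265 - 40 = 589.5 t CO2e/yr
```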

    iURBAN

    iURBAN: Intelligent Urban Energy Tool introduces an urban energy tool that integrates different ICT energy management systems (both hardware and software) in two European cities, providing useful data to a novel decision support system that makes available the necessary parameters for the generation and further operation of associated business models. The business models contribute at a global level to efficiently managing and distributing the energy produced and consumed at a local level (city or neighbourhood), incorporating the behavioural aspects of users, and prosumers in general, into the software platform. iURBAN integrates a smart Decision Support System (smartDSS) that collects real-time or near-real-time data, then aggregates, analyses, and suggests actions on energy consumption and production from different buildings, renewable energy production resources, combined heat and power plants, electric vehicle (EV) charge stations, storage systems, sensors, and actuators. The consumption and production data is collected via heterogeneous data communication protocols and networks. Through a Local Decision Support System, the iURBAN smartDSS allows citizens to analyse the consumption and production they are generating, receive information about CO2 savings, get advice on demand response, and participate actively in the energy market. Through a Centralised Decision Support System, it allows utilities, ESCOs, municipalities, and other authorised third parties to: get a continuous snapshot of city energy consumption and production; manage energy consumption and production; forecast energy consumption; plan new energy "producers" for the future needs of the city; and visualise, analyse, and take decisions on all the end points that consume or produce energy at city level, permitting them to forecast and plan the renewable power generation available in the city.
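
    No iURBAN code appears in the abstract; as a rough sketch of the city-level "snapshot" the Centralised Decision Support System exposes, the snippet below aggregates per-prosumer meter readings into consumption, production, and net-import totals. All names and numbers are illustrative assumptions, not project artifacts.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    prosumer_id: str
    consumed_kwh: float   # energy drawn from the grid in the interval
    produced_kwh: float   # local generation (PV, CHP, ...) in the interval

def city_snapshot(readings: list[Reading]) -> dict:
    """Aggregate per-prosumer readings into a city-level view."""
    consumed = sum(r.consumed_kwh for r in readings)
    produced = sum(r.produced_kwh for r in readings)
    return {
        "consumed_kwh": consumed,
        "produced_kwh": produced,
        "net_import_kwh": consumed - produced,  # >0 means the city imports energy
        "self_sufficiency": produced / consumed if consumed else 1.0,
    }

print(city_snapshot([
    Reading("household-1", 3.2, 0.0),
    Reading("school-1", 12.0, 4.5),   # rooftop PV
    Reading("chp-1", 0.5, 9.0),       # combined heat and power plant
]))
```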

    Interconnected Services for Time-Series Data Management in Smart Manufacturing Scenarios

    The rise of Smart Manufacturing, together with the strategic initiatives carried out worldwide, has promoted its adoption among manufacturers, who are increasingly interested in boosting data-driven applications for different purposes, such as product quality control, predictive maintenance of equipment, etc. However, the adoption of these approaches faces diverse technological challenges with regard to the data-related technologies supporting the manufacturing data life-cycle. The main contributions of this dissertation focus on two specific challenges related to the early stages of the manufacturing data life-cycle: optimized storage of the massive amounts of data captured during production processes, and their efficient pre-processing. The first contribution is the design and development of a system that facilitates the pre-processing of captured time-series data through an automated approach that helps select the most adequate pre-processing techniques to apply to each data type. The second contribution is the design and development of a three-level hierarchical architecture for time-series data storage in cloud environments that helps to manage and reduce the required data storage resources (and consequently their associated costs). Moreover, with regard to the later stages, a third contribution is proposed that leverages advanced data analytics to build an alarm prediction system, enabling predictive maintenance of equipment by anticipating the activation of the different types of alarms that can occur in a real Smart Manufacturing scenario.
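
    The storage architecture is only described at a high level; the sketch below illustrates one plausible reading of a three-level time-series hierarchy, where data is downsampled as it ages so that older levels need fewer resources. Level names, age thresholds, and resolutions are assumptions for illustration, not the author's design.

```python
import datetime as dt

# Assumed policy: raw readings stay "hot" for 7 days, are downsampled to
# 1-minute means in a "warm" level until 90 days, then to hourly means
# in a "cold" level backed by cheap object storage.
LEVELS = [
    ("hot",  dt.timedelta(days=7),  None),  # raw resolution
    ("warm", dt.timedelta(days=90), 60),    # seconds per point
    ("cold", dt.timedelta.max,      3600),
]

def level_for(age: dt.timedelta) -> tuple:
    """Return (level name, resolution in seconds) for a sample of the given age."""
    for name, max_age, resolution in LEVELS:
        if age < max_age:
            return name, resolution
    return LEVELS[-1][0], LEVELS[-1][2]

print(level_for(dt.timedelta(days=2)))    # ('hot', None)
print(level_for(dt.timedelta(days=30)))   # ('warm', 60)
print(level_for(dt.timedelta(days=365)))  # ('cold', 3600)
```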

    A survey on architectures and energy efficiency in Data Center Networks

    Data Center Networks (DCNs) are attracting growing interest from both academia and industry to keep pace with the exponential growth in cloud computing and enterprise networks. Modern DCNs face two main challenges: scalability and cost-effectiveness. The architecture of a DCN directly impacts its scalability, while its cost is largely driven by its power consumption. In this paper, we conduct a detailed survey of the most recent advances and research activities in DCNs, with a special focus on the architectural evolution of DCNs and their energy efficiency. The paper provides a qualitative categorization of existing DCN architectures into switch-centric and server-centric topologies, as well as their design technologies. Energy efficiency in data centers is discussed in detail, with a survey of existing techniques in energy savings, green data centers, and renewable energy approaches. Finally, we outline potential future research directions in DCNs.
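
    The switch-centric category is anchored by the classic k-ary fat-tree. As a small worked example (standard topology arithmetic, not a result of this paper), the snippet below computes its sizing: k pods, (k/2)^2 core switches, 5k^2/4 identical k-port switches in total, and k^3/4 hosts.

```python
def fat_tree(k: int) -> dict:
    """Sizing of a k-ary fat-tree (k even), the canonical switch-centric DCN topology."""
    assert k % 2 == 0, "port count k must be even"
    return {
        "pods": k,
        "edge_switches": k * (k // 2),         # k/2 edge switches per pod
        "aggregation_switches": k * (k // 2),  # k/2 aggregation switches per pod
        "core_switches": (k // 2) ** 2,
        "total_switches": 5 * k * k // 4,
        "hosts": k ** 3 // 4,
    }

# With commodity 48-port switches: 27,648 hosts from 2,880 identical switches.
print(fat_tree(48))
```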