126 research outputs found

    Big Data and Large-scale Data Analytics: Efficiency of Sustainable Scalability and Security of Centralized Clouds and Edge Deployment Architectures

    One of the significant shifts in next-generation computing technologies will certainly be the development of Big Data (BD) deployment architectures. Apache Hadoop, the BD landmark, has evolved into a widely deployed BD operating system. Its new features include a federation structure and many associated frameworks, which give Hadoop 3.x the maturity to serve different markets. This dissertation addresses two leading issues in exploiting BD and large-scale data analytics on the Hadoop platform: (i) scalability, which directly affects system performance and overall throughput, addressed using portable Docker containers; and (ii) security, which spreads the adoption of data protection practices among practitioners, addressed using access controls. The main contributions of this thesis are an Enhanced MapReduce Environment (EME), an OPportunistic and Elastic Resource Allocation (OPERA) scheduler, a BD Federation Access Broker (BDFAB), and a Secure Intelligent Transportation System (SITS) with a multi-tier architecture for data streaming to the cloud.
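    The security contribution centers on access control for federated data. As a rough illustration of the kind of role-based check an access broker performs, consider the Python sketch below; the AccessBroker class, its policy layout, and the example roles are invented for illustration and are not taken from the dissertation.

    class AccessBroker:
        """Grant or deny dataset operations based on per-role permissions."""

        def __init__(self, policies):
            # policies maps a role name to a set of (dataset, operation) pairs
            self.policies = policies

        def is_allowed(self, role, dataset, operation):
            # Deny by default: unknown roles get an empty permission set.
            return (dataset, operation) in self.policies.get(role, set())

    broker = AccessBroker({
        "analyst": {("trips", "read")},
        "admin": {("trips", "read"), ("trips", "write")},
    })
    assert broker.is_allowed("analyst", "trips", "read")
    assert not broker.is_allowed("analyst", "trips", "write")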

    Technical Report: A Trace-Based Performance Study of Autoscaling Workloads of Workflows in Datacenters

    To improve customer experience, datacenter operators offer support for simplifying application and resource management. For example, running workloads of workflows on behalf of customers is desirable, but requires increasingly sophisticated autoscaling policies, that is, policies that dynamically provision resources for the customer. Although selecting and tuning autoscaling policies is a challenging task for datacenter operators, so far relatively few studies have investigated the performance of autoscaling for workloads of workflows. Complementing previous knowledge, in this work we present the first comprehensive performance study in the field. Using trace-based simulation, we compare state-of-the-art autoscaling policies across multiple application domains, workload arrival patterns (e.g., burstiness), and system utilization levels. We further investigate the interplay between autoscaling and regular allocation policies, and the complexity cost of autoscaling. Our quantitative study focuses not only on traditional performance metrics and on state-of-the-art elasticity metrics, but also on time- and memory-related autoscaling-complexity metrics. Our main results give strong and quantitative evidence about previously unreported operational behavior, for example, that autoscaling policies perform differently across application domains, and by how much they differ.
    Comment: Technical report for the CCGrid 2018 submission "A Trace-Based Performance Study of Autoscaling Workloads of Workflows in Datacenters".
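    To make the notion of an autoscaling policy concrete, here is a minimal sketch of a reactive, demand-based policy of the general kind such studies compare: it sizes the node pool from current task demand, clamped between fixed bounds. The function name, parameters, and scaling rule are assumptions for this sketch, not the policies evaluated in the report.

    def plan_capacity(queued_tasks, running_tasks, tasks_per_node,
                      min_nodes=1, max_nodes=100):
        """Return the node count to provision for the next scaling interval."""
        demand = queued_tasks + running_tasks
        needed = -(-demand // tasks_per_node)  # ceiling division
        return max(min_nodes, min(max_nodes, needed))

    # Example: 37 queued + 12 running tasks at 4 tasks per node -> 13 nodes.
    print(plan_capacity(queued_tasks=37, running_tasks=12, tasks_per_node=4))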

    Sphere: Simulator of edge infrastructures for the optimization of performance and resources energy consumption

    Edge computing constitutes a key paradigm for addressing the new requirements of areas such as smart cars, Industry 4.0, and health care, where massive amounts of heterogeneous data from continuous, geographically distributed sources have to be processed and computed in near real time. To this end, new distributed infrastructures consisting of small computing clusters close to the data sources, also known as Cloudlets, have emerged. To evaluate the performance of these solutions, we present Sphere, a simulation tool that enables researchers to establish various scenarios, including: (a) the topology and orchestration model of the infrastructure; (b) incoming workload patterns; (c) resource-management models; and (d) scheduling policies. Moreover, Sphere allows researchers to apply efficiency and performance policies at both the infrastructure and cluster levels. The simulator offers the following benefits: (a) evaluation of various orchestration models; (b) analysis of resource-efficiency and performance strategies at the Edge-infrastructure and cluster (Cloudlet/Cloud) levels; (c) execution of diverse workload generation patterns; (d) evaluation of strategies for infrastructure communication, as well as their impact on task completion time (makespan); and (e) simulation of each cluster (Cloudlet/Cloud) independently, including resource-management, scheduling, and resource-efficiency models. Finally, we performed a thorough evaluation based on realistic Edge-Computing use cases. The results of the experiments confirm that Sphere is a performant and reliable tool for the analysis of orchestration, graph-resolving, energy-efficiency, resource-management, and scheduling strategies in Edge-Computing environments.
    Funding: Ministerio de Ciencia, Innovación y Universidades RTI2018-098062-A-I0
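    As a toy illustration of what such a simulator computes (not Sphere's actual API), the sketch below dispatches a workload round-robin across Cloudlets of different speeds and reports per-cluster busy time and the resulting makespan; all names and the simple cost model (time = task size / cluster speed) are assumptions made for this sketch.

    def simulate(cloudlet_speeds, task_sizes):
        """Round-robin tasks over cloudlets; return busy times and makespan."""
        busy = [0.0] * len(cloudlet_speeds)
        for i, size in enumerate(task_sizes):
            c = i % len(cloudlet_speeds)          # round-robin placement
            busy[c] += size / cloudlet_speeds[c]  # execution time on cloudlet c
        return busy, max(busy)

    busy, makespan = simulate(cloudlet_speeds=[2.0, 1.0], task_sizes=[4, 4, 2, 2])
    print(busy, makespan)  # [3.0, 6.0] 6.0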

    Allocation of Virtual Machines in Cloud Data Centers - A Survey of Problem Models and Optimization Algorithms

    Data centers in public, private, and hybrid cloud settings make it possible to provision virtual machines (VMs) with unprecedented flexibility. However, purchasing, operating, and maintaining the underlying physical resources incurs significant monetary costs as well as environmental impact. Therefore, cloud providers must optimize the usage of physical resources through a careful allocation of VMs to hosts, continuously balancing the conflicting requirements of performance and operational cost. In recent years, several algorithms have been proposed for this important optimization problem. Unfortunately, the proposed approaches are hardly comparable because of subtle differences in the problem models used. This paper surveys the problem formulations and optimization algorithms in use, highlighting their strengths and limitations and pointing out areas that need further research in the future.
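    A classic baseline in this problem family is bin-packing-style placement. The sketch below implements first-fit decreasing (FFD) over a single CPU dimension; real formulations in the surveyed literature add memory and network dimensions, migration costs, and energy terms, so this is only a minimal illustration.

    def first_fit_decreasing(vm_demands, host_capacity):
        """Place each VM on the first host with room; open a new host if none fits."""
        free = []        # remaining capacity of each opened host
        placement = {}   # VM name -> host index
        for vm, demand in sorted(vm_demands.items(), key=lambda kv: -kv[1]):
            for h, room in enumerate(free):
                if demand <= room:
                    free[h] -= demand
                    placement[vm] = h
                    break
            else:
                free.append(host_capacity - demand)
                placement[vm] = len(free) - 1
        return placement, len(free)

    vms = {"vm1": 6, "vm2": 5, "vm3": 4, "vm4": 3}
    print(first_fit_decreasing(vms, host_capacity=10))  # packs onto 2 hosts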