126 research outputs found
Big Data and Large-scale Data Analytics: Efficiency of Sustainable Scalability and Security of Centralized Clouds and Edge Deployment Architectures
One of the significant shifts of the next-generation computing technologies will certainly be in
the development of Big Data (BD) deployment architectures. Apache Hadoop, the BD
landmark, evolved as a widely deployed BD operating system. Its new features include
federation structure and many associated frameworks, which provide Hadoop 3.x with the
maturity to serve different markets. This dissertation addresses two leading issues in
exploiting BD and large-scale data analytics on the Hadoop platform: (i) scalability, which
directly affects system performance and overall throughput, addressed using portable Docker
containers; and (ii) security, addressed using access controls to spread the adoption of data
protection practices among practitioners. The main contributions of this thesis are an
Enhanced Mapreduce Environment (EME), the OPportunistic and Elastic Resource Allocation
(OPERA) scheduler, the BD Federation Access Broker (BDFAB), and a Secure Intelligent
Transportation System (SITS) with a multi-tier architecture for streaming data to the cloud.
Technical Report: A Trace-Based Performance Study of Autoscaling Workloads of Workflows in Datacenters
To improve customer experience, datacenter operators offer support for
simplifying application and resource management. For example, running workloads
of workflows on behalf of customers is desirable, but requires increasingly
more sophisticated autoscaling policies, that is, policies that dynamically
provision resources for the customer. Although selecting and tuning autoscaling
policies is a challenging task for datacenter operators, so far relatively few
studies investigate the performance of autoscaling for workloads of workflows.
Complementing previous knowledge, in this work we propose the first
comprehensive performance study in the field. Using trace-based simulation, we
compare state-of-the-art autoscaling policies across multiple application
domains, workload arrival patterns (e.g., burstiness), and system utilization
levels. We further investigate the interplay between autoscaling and regular
allocation policies, and the complexity cost of autoscaling. Our quantitative
study focuses not only on traditional performance metrics and on
state-of-the-art elasticity metrics, but also on time- and memory-related
autoscaling-complexity metrics. Our main results give strong and quantitative
evidence about previously unreported operational behavior, for example, that
autoscaling policies perform differently across application domains and by how
much they differ.
Comment: Technical Report for the CCGrid 2018 submission "A Trace-Based
Performance Study of Autoscaling Workloads of Workflows in Datacenters"
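As a rough illustration of what an autoscaling policy and an elasticity metric look like, the sketch below replays a demand trace through a toy reactive policy and counts under- and over-provisioned node-steps. It is not one of the policies or metrics evaluated in the report; the names `reactive_policy` and `provisioning_error`, the `tasks_per_node` parameter, and the sample trace are all assumptions made for this example.

```python
import math


def reactive_policy(demand_trace, tasks_per_node=4, min_nodes=1):
    """Toy reactive autoscaler (hypothetical): the node count in effect at
    each step is sized for the demand observed in the previous step."""
    supply = []
    nodes = min_nodes
    for demand in demand_trace:
        supply.append(nodes)  # capacity available during this step
        # React to the demand just observed; takes effect next step.
        nodes = max(min_nodes, math.ceil(demand / tasks_per_node))
    return supply


def provisioning_error(demand_trace, supply, tasks_per_node=4):
    """Count node-steps of under- and over-provisioning over the trace."""
    under = over = 0
    for demand, nodes in zip(demand_trace, supply):
        needed = math.ceil(demand / tasks_per_node)
        under += max(0, needed - nodes)
        over += max(0, nodes - needed)
    return under, over


# Illustrative bursty trace: concurrent tasks per time step (made-up data).
trace = [2, 2, 16, 16, 3, 3]
supply = reactive_policy(trace)
under, over = provisioning_error(trace, supply)
print(supply, under, over)
```

With a bursty trace, this policy lags the burst by one step in both directions, which is the kind of behavior a trace-based comparison would quantify.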
Sphere: Simulator of edge infrastructures for the optimization of performance and resources energy consumption
Edge computing constitutes a key paradigm to address the new requirements of areas such as
smart cars, industry 4.0, and health care, where massive amounts of heterogeneous data from
continuous geographically-distributed sources have to be processed and computed in near real time.
To this end, new distributed infrastructures consisting of small computing clusters close to
data sources, also known as Cloudlets, have emerged. To evaluate the performance of these
solutions, we present Sphere, a simulation tool that enables researchers to establish various
scenarios, including: (a) topology and orchestration model of the infrastructure; (b) incoming
workload patterns; (c) resource-managing models; and (d) scheduling policies. Moreover, Sphere
allows researchers to apply efficiency and performance policies both at infrastructure and cluster
levels. The simulator presents the following benefits: (a) Evaluation of various orchestration
models; (b) Analysis of resource-efficiency and performance strategies at Edge-infrastructure and
cluster (Cloudlet/Cloud) level; (c) Execution of diverse workload generation patterns; (d)
Evaluation of strategies for infrastructure communication, as well as their impact on task
completion time (makespan); and (e) Simulation of each cluster (Cloudlet/Cloud) independently,
including resource-managing, scheduling and resource-efficiency models. Finally, we performed
an in-depth evaluation based on realistic Edge-Computing use cases. The results of the experiments
confirm that Sphere is a performant and reliable tool for the analysis of orchestration, graph-resolving,
energy-efficiency, resource-managing and scheduling strategies in Edge-computing environments.
Ministerio de Ciencia, Innovación y Universidades RTI2018-098062-A-I0
Allocation of Virtual Machines in Cloud Data Centers - A Survey of Problem Models and Optimization Algorithms
Data centers in public, private, and hybrid cloud settings make it possible to provision virtual machines
(VMs) with unprecedented flexibility. However, purchasing, operating, and maintaining the underlying physical
resources incur significant monetary costs and environmental impact. Therefore, cloud providers must
optimize the usage of physical resources through careful allocation of VMs to hosts, continuously balancing
the conflicting requirements of performance and operational costs. In recent years, several algorithms have been
proposed for this important optimization problem. Unfortunately, the proposed approaches are hardly comparable
because of subtle differences in the problem models used. This paper surveys the problem formulations and
optimization algorithms in use, highlighting their strengths and limitations and pointing out the areas that need
further research.
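One simple instance of the VM-to-host allocation problem treats it as one-dimensional bin packing. The sketch below uses the classic first-fit decreasing heuristic as an example; it is not presented as an algorithm taken from the survey. The function name, the single integer "size" per VM, and the uniform `host_capacity` are simplifying assumptions; real problem models typically add further resource dimensions, migration costs, and performance constraints.

```python
def first_fit_decreasing(vm_sizes, host_capacity):
    """Place VMs on as few hosts as the heuristic manages.

    Hypothetical one-dimensional model: each VM has a single integer size
    and every host has the same capacity. Returns a list of hosts, each
    host being the list of VM sizes placed on it.
    """
    hosts = []  # VM sizes placed on each host
    free = []   # remaining capacity of each host
    for size in sorted(vm_sizes, reverse=True):
        if size > host_capacity:
            raise ValueError(f"VM of size {size} exceeds host capacity")
        for i, remaining in enumerate(free):
            if size <= remaining:
                hosts[i].append(size)
                free[i] -= size
                break
        else:
            # No existing host fits this VM: power on a new host.
            hosts.append([size])
            free.append(host_capacity - size)
    return hosts


# Illustrative input: VM sizes in arbitrary resource units (made-up data).
placement = first_fit_decreasing([4, 8, 1, 4, 2, 1], host_capacity=10)
print(len(placement), placement)
```

Minimizing the number of active hosts is only one possible objective; changing the objective or the constraints changes the problem model, which is one reason approaches built on different models are hard to compare directly.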
- …