On Evaluating Commercial Cloud Services: A Systematic Review
Background: Cloud Computing is booming in industry, with many competing
providers and services. Accordingly, evaluation of commercial Cloud services
is necessary. However, the existing evaluation studies are relatively chaotic:
there is considerable confusion, and a gap between practice and theory, in
Cloud services evaluation. Aim: To help relieve this chaos, this work aims to
synthesize the existing evaluation implementations to outline the state of
the practice and to identify research opportunities in Cloud services
evaluation. Method: Based on a conceptual evaluation model comprising six
steps, the Systematic Literature Review (SLR) method was employed to collect
relevant evidence and investigate Cloud services evaluation step by step.
Results: This SLR identified 82 relevant evaluation studies. The overall data
collected from these studies represent the current practical landscape of
implementing Cloud services evaluation, and in turn can be reused to
facilitate future evaluation work. Conclusions: Evaluation of commercial
Cloud services has become a worldwide research topic. Some findings of this
SLR identify research gaps in Cloud services evaluation (e.g., Elasticity and
Security evaluation of commercial Cloud services could remain a long-term
challenge), while other findings suggest trends in the adoption of commercial
Cloud services (e.g., compared with PaaS, IaaS seems more suitable for
customers and is particularly important in industry). The SLR study itself
also confirms previous experiences and reveals new Evidence-Based Software
Engineering (EBSE) lessons.
Resource provisioning in Science Clouds: Requirements and challenges
Cloud computing has permeated the information technology industry in the last
few years and is now emerging in scientific environments. Science user
communities demand a broad range of computing resources, such as local
clusters, high-performance computing systems, and computing grids, to satisfy
the needs of high-performance applications. Different computational models
call for different workloads, and the cloud is already considered a promising
paradigm. Scheduling and resource allocation are challenging in any form of
computation, and clouds are no exception. Science applications have unique
features that differentiate their workloads; hence, their requirements have
to be taken into consideration when building a Science Cloud. This paper
discusses the main scheduling and resource allocation challenges for any
Infrastructure as a Service provider supporting scientific applications.
MapReduce in the Clouds for Science
The utility computing model introduced by cloud computing, combined with the rich set of cloud infrastructure services, offers a very viable alternative to traditional servers and computing clusters. The MapReduce distributed data processing architecture has become the weapon of choice for data-intensive analyses in the clouds and in commodity clusters due to its excellent fault tolerance, scalability, and ease of use. Currently, there are several options for using MapReduce in cloud environments, such as using MapReduce as a service, setting up one's own MapReduce cluster on cloud instances, or using specialized cloud MapReduce runtimes that take advantage of cloud infrastructure services. In this paper, we introduce AzureMapReduce, a novel MapReduce runtime built using the Microsoft Azure cloud infrastructure services. The AzureMapReduce architecture successfully leverages the high-latency, eventually consistent, yet highly scalable Azure infrastructure services to provide an efficient, on-demand alternative to traditional MapReduce clusters. Further, we evaluate the use and performance of MapReduce frameworks, including AzureMapReduce, in cloud environments for scientific applications, using sequence assembly and sequence alignment as use cases.
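To make the programming model concrete, here is a minimal sketch of the map/shuffle/reduce contract that any such runtime schedules, using the canonical word-count example. The function names are illustrative only; they are not the AzureMapReduce API.

    # Minimal sketch of the MapReduce contract a runtime schedules;
    # names are illustrative, not the actual AzureMapReduce API.
    from collections import defaultdict

    def map_phase(records, map_fn):
        # Apply the user map function to every input record.
        for record in records:
            yield from map_fn(record)

    def shuffle(pairs):
        # Group intermediate (key, value) pairs by key.
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups

    def reduce_phase(groups, reduce_fn):
        # Apply the user reduce function to each key group.
        return {k: reduce_fn(k, vs) for k, vs in groups.items()}

    # Word count, the canonical MapReduce example:
    def word_map(line):
        for word in line.split():
            yield word, 1

    def word_reduce(word, counts):
        return sum(counts)

    lines = ["cloud mapreduce", "cloud clusters"]
    result = reduce_phase(shuffle(map_phase(lines, word_map)), word_reduce)
    print(result)  # {'cloud': 2, 'mapreduce': 1, 'clusters': 1}

In a cloud runtime such as the one described, the map and reduce tasks would be dispatched to workers via cloud queue and storage services rather than run in-process.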
A cloudification methodology for high performance simulations
Many scientific areas make extensive use of computer simulations to study complex real-world processes. These computations are typically very resource-intensive and present scalability issues as experiments get larger, even on dedicated supercomputers, since these are limited by their own hardware resources. Cloud computing arises as an option to move toward the ideal of unlimited scalability by providing virtually infinite resources, yet applications must be adapted to this paradigm.
The major goal of this thesis is to analyze the suitability of performing simulations in clouds through a paradigm shift, from classic parallel approaches to data-centric models, in those applications where that is possible. The aim is to maintain the scalability achieved in traditional HPC infrastructures while taking advantage of the features of the Cloud Computing paradigm. The thesis also explores the characteristics that make simulators suitable or unsuitable for deployment on HPC or Cloud infrastructures, defining a generic architecture and extracting common elements present among the majority of simulators.
As a result, we propose a generalist cloudification methodology based on the MapReduce paradigm to migrate high-performance simulations into the cloud and provide greater scalability. We analyzed its viability by applying it to a real engineering simulator and running the resulting implementation on HPC and cloud environments. Our evaluations aim to show that the cloudified application is highly scalable and that there is still a large margin to improve the theoretical model and its implementations, as well as to extend it to a wider range of simulations.
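The core idea of such a cloudification can be illustrated schematically: partition the simulation domain into independent chunks, run the solver on each chunk as a map task, and merge partial results in a reduce step. The sketch below is an illustration under those assumptions, not the thesis's actual code; simulate_chunk stands in for a real solver kernel.

    # Schematic illustration of recasting a simulation as a data-centric
    # MapReduce-style job (not the thesis's actual implementation).
    from multiprocessing import Pool

    def simulate_chunk(chunk):
        # Map task: run the solver on one partition of the input domain.
        # Placeholder physics: a sum of squares stands in for a real kernel.
        return sum(x * x for x in chunk)

    def merge(partials):
        # Reduce task: combine partial results into the global output.
        return sum(partials)

    domain = list(range(1_000_000))
    chunks = [domain[i:i + 100_000] for i in range(0, len(domain), 100_000)]

    if __name__ == "__main__":
        with Pool() as pool:
            partials = pool.map(simulate_chunk, chunks)  # scales with workers
        print(merge(partials))

Because the chunks carry no inter-task dependencies, adding workers (or cloud instances) increases throughput, which is the scalability property the methodology targets.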
Developing eThread pipeline using SAGA-pilot abstraction for large-scale structural bioinformatics
While most computational annotation approaches are sequence-based, threading methods are becoming increasingly attractive because the predicted structural information can uncover the underlying function. However, threading tools are generally compute-intensive, and the number of protein sequences from even small genomes such as prokaryotes is large, typically many thousands, prohibiting their application as a genome-wide structural systems biology tool. To extend its utility, we have developed a pipeline for eThread, a meta-threading protein structure modeling tool, that uses computational resources efficiently and effectively. We employ a pilot-based approach that supports seamless data- and task-level parallelism and manages large variation in workload and computational requirements. Our scalable pipeline is deployed on Amazon EC2 and can efficiently select resources based upon task requirements. We present a runtime analysis to characterize the computational complexity of eThread and the EC2 infrastructure. Based on these results, we suggest a pathway to an optimized solution with respect to metrics such as time-to-solution or cost-to-solution. Our eThread pipeline can scale to support a large number of sequences and is expected to be a viable solution for genome-scale structural bioinformatics and structure-based annotation, particularly for small genomes such as prokaryotes. The developed pipeline is easily extensible to other types of distributed cyberinfrastructure.
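The pilot pattern the abstract relies on is worth illustrating: a pool of pre-acquired workers pulls variable-sized tasks from a shared queue, so load balances itself despite large variation in task runtimes. The sketch below is a simplified, local-threads rendering of that idea; the actual pipeline uses the SAGA-Pilot abstraction, and the sleep call stands in for a threading-tool run.

    # Hedged sketch of the pilot pattern: workers pull tasks from a
    # shared queue until it drains (illustrative, not SAGA-Pilot code).
    import queue
    import random
    import threading
    import time

    tasks = queue.Queue()
    for seq_id in range(20):          # e.g. one task per protein sequence
        tasks.put(seq_id)

    def worker(worker_id):
        while True:
            try:
                seq_id = tasks.get_nowait()
            except queue.Empty:
                return                # pilot worker retires when queue drains
            time.sleep(random.uniform(0.01, 0.1))  # stand-in for real work
            print(f"worker {worker_id} finished sequence {seq_id}")
            tasks.task_done()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

Because workers pull work rather than being assigned fixed shares, a long-running sequence occupies one worker while the others keep draining the queue, which is how the pipeline absorbs heterogeneous task runtimes.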
Towards an MPI-like Framework for Azure Cloud Platform
The Message Passing Interface (MPI) has been widely used for implementing parallel and distributed applications. The emergence of cloud computing offers a scalable, fault-tolerant, on-demand alternative to traditional on-premise clusters. In this thesis, we investigate the possibility of adopting the cloud platform as an alternative to conventional MPI-based solutions. We show that the cloud platform can exhibit competitive performance and benefit its users with a fault-tolerant architecture and on-demand access for a robust solution. Extensive research is done to identify the difficulties of designing and implementing an MPI-like framework for the Azure cloud platform. We present the details of the key components required for implementing such a framework, along with our experimental results for benchmarking multiple basic operations of the MPI standard implemented in the cloud and their practical application in solving well-known large-scale algorithmic problems.
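The central abstraction such a framework must provide is point-to-point send/receive between ranks. The toy communicator below sketches that contract with in-process queues; it is an assumption-laden illustration (MiniComm is an invented name), whereas a real implementation would back each inbox with Azure queue or blob storage.

    # Toy message-passing communicator: one inbox queue per rank.
    # Illustrative only; not the thesis framework's API.
    import queue

    class MiniComm:
        def __init__(self, size):
            self.size = size
            self.inboxes = [queue.Queue() for _ in range(size)]

        def send(self, data, dest):
            # Deliver a message to the destination rank's inbox.
            self.inboxes[dest].put(data)

        def recv(self, rank):
            # Block until a message arrives, like MPI_Recv.
            return self.inboxes[rank].get()

    comm = MiniComm(size=2)
    comm.send("hello from rank 0", dest=1)
    print(comm.recv(rank=1))   # hello from rank 0

Collective operations such as broadcast or reduce can then be composed from these primitives, which is where a cloud queue's high latency becomes the key performance concern the thesis benchmarks.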
Using power-law properties of social groups for cloud defense and community detection
The power-law distribution can be used to describe various aspects of social group behavior. For mussels, sociobiological research has shown that the Lévy walk best describes their self-organizing movement strategy: a mussel's step length is drawn from a power-law distribution, and its direction is drawn from a uniform distribution. In the area of social networks, theories such as preferential attachment seek to explain why the degree distribution tends to be scale-free. The aim of this dissertation is to glean insight from these works to help solve problems in two domains: cloud computing systems and community detection.
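The Lévy walk just described is easy to make concrete. Assuming a power-law step-length density p(l) proportional to l^(-mu) with minimum step l_min, inverse-transform sampling gives l = l_min * u^(-1/(mu-1)) for uniform u in (0, 1]; the direction is uniform on [0, 2*pi). The exponent and minimum step in this sketch are assumed values, not parameters taken from the dissertation.

    # Worked example: sampling one Levy-walk step, with step length
    # from a power law and direction from a uniform distribution.
    import math
    import random

    def levy_step(mu=2.0, l_min=1.0):
        u = 1.0 - random.random()              # u in (0, 1], avoids u == 0
        length = l_min * u ** (-1.0 / (mu - 1.0))
        angle = random.uniform(0.0, 2.0 * math.pi)
        return length * math.cos(angle), length * math.sin(angle)

    x = y = 0.0
    for _ in range(1000):                      # simulate one mussel's walk
        dx, dy = levy_step()
        x, y = x + dx, y + dy
    print(f"final position after 1000 steps: ({x:.1f}, {y:.1f})")

Most steps are short, but the heavy tail occasionally produces very long relocations, which is the mixing behavior the defense strategy later draws on.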
Privacy and security are two areas of concern for cloud systems. Recent research has provided evidence indicating how a malicious user could perform co-residence profiling and public-to-private IP mapping to target and exploit customers who share physical resources. This work proposes a defense strategy, in part inspired by mussel self-organization, that relies on user account and workload clustering to mitigate co-residence profiling. To obfuscate the public-to-private IP map, clusters are managed and accessed by account proxies. This work also describes a set of capabilities and attack paths an attacker needs to execute for targeted co-residence, and presents arguments to show how the defense strategy disrupts the critical steps in the attack path in most cases. Further, it performs a risk assessment to determine the likelihood that an individual user will be victimized, given that a successful non-directed exploit has occurred. Results suggest that while possible, this event is highly unlikely.
As for community detection, several algorithms have been proposed. Most of these, however, share similar disadvantages. Some algorithms require a priori information, such as threshold values or the desired number of communities, while others are computationally expensive. A third category of algorithms suffers from a combination of the two. This work proposes a greedy community detection heuristic which exploits the scale-free properties of social networks. It hypothesizes that highly connected nodes, or hubs, form the basic building blocks of communities. A detection technique that exploits these characteristics remains largely unexplored in recent literature. To show its effectiveness, the algorithm is tested on commonly used real network data sets. In most cases, it classifies nodes into communities which coincide with their respective known structures. Unlike other implementations, the proposed heuristic is computationally inexpensive, deterministic, and does not require a priori information.
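A simplified rendering of the hub-seeded idea might look as follows: repeatedly take the highest-degree unassigned node as a community seed and greedily claim its unassigned neighbors. This is a sketch of the general concept, not the dissertation's exact algorithm, and hub_communities is an invented name.

    # Greedy hub-seeded community sketch (illustrative, not the
    # dissertation's exact heuristic). adj: node -> set of neighbors.
    def hub_communities(adj):
        unassigned = set(adj)
        communities = []
        while unassigned:
            # Deterministic seed: highest degree, ties to smallest node.
            hub = max(sorted(unassigned), key=lambda n: len(adj[n]))
            members = {hub} | (adj[hub] & unassigned)
            unassigned -= members
            communities.append(members)
        return communities

    # Two triangles joined by a single edge form two obvious groups.
    adj = {
        1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
        4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
    }
    print(hub_communities(adj))  # [{1, 2, 3, 4}, {5, 6}]

Like the heuristic described above, the sketch needs no threshold or target community count and is deterministic; its cost is dominated by the repeated degree scans, which a real implementation would amortize with a sorted-degree structure.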