    Elastic Cloud Services Compliance with Gustafson’s and Amdahl’s Laws

    Proceedings of: Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016). Sofia (Bulgaria), October 6-7, 2016.
The speedup that can be achieved with parallel and distributed architectures is limited by at least two laws: Amdahl's law and Gustafson's law. The former limits the speedup to a constant value when a fixed-size problem is executed on a multiprocessor, while the latter bounds the speedup by its linear value for fixed-time problems, that is, by the number of processors used. However, a superlinear speedup (a speedup greater than the number of processors used) can be achieved due to insufficient memory, while parallel and, especially, distributed systems can even run slower than the sequential execution because of communication overhead. Since cloud performance is uncertain and can be influenced by the available memory and networks, in this paper we investigate whether it follows the same speedup pattern as other traditional distributed systems. The focus is on determining how elastic cloud services behave in differently scaled environments. We define several scaled systems and model the corresponding performance indicators. The analysis shows that both laws limit the speedup for a specific range of the input parameters and type of scaling. Moreover, the speedup in cloud systems follows Gustafson's extreme cases, i.e. the insufficient-memory and communication-bound domains.
This work is partially supported by the European Union's Horizon 2020 research and innovation programme under the grant agreements 644179 ENTICE: dEcentralized repositories for traNsparent and efficienT vIrtual maChine opErations (first two authors) and 643946, CloudLightning: Self-organizing, self-managing Heterogeneous Clouds (fourth author).
The authors would like to acknowledge networking support by the COST programme Action IC1305, Network for Sustainable Ultrascale Computing (NESUS). European Commission
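The two laws named in the abstract above can be written down in a few lines. The following is a minimal sketch, not the paper's model: it uses the textbook forms of both laws, and the function names and the serial fraction of 0.1 are assumptions chosen only for illustration. Note that the serial fraction is measured on the sequential run for Amdahl's law and on the parallel run for Gustafson's law, so the two values of `serial_fraction` are not the same quantity.

```python
def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Fixed-size speedup: S(p) = 1 / (s + (1 - s) / p)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)


def gustafson_speedup(serial_fraction: float, processors: int) -> float:
    """Fixed-time (scaled) speedup: S(p) = s + (1 - s) * p."""
    return serial_fraction + (1.0 - serial_fraction) * processors


if __name__ == "__main__":
    s = 0.1  # assumed serial fraction, for illustration only
    for p in (1, 4, 16, 64):
        # Amdahl's value stays below 1 / s; Gustafson's grows linearly with p.
        print(p, round(amdahl_speedup(s, p), 2), round(gustafson_speedup(s, p), 2))
```

Under these assumptions, Amdahl's speedup never exceeds 1 / s however many processors are added, whereas Gustafson's speedup never exceeds the number of processors.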

    Using the Cloud for Parameter Estimation Problems: Comparing Spark vs MPI with a Case-Study

    Date of Conference: 14-17 May 2017. Conference Location: Madrid.
[Abstract] Systems biology is an emerging approach focused on generating new knowledge about complex biological systems by combining experimental data with mathematical modeling and advanced computational techniques. Many problems in this field are extremely challenging and require substantial supercomputing resources to be solved. This is the case for parameter estimation in large-scale nonlinear dynamic systems biology models. Recently, Cloud Computing has emerged as a new paradigm for on-demand delivery of computing resources. However, the scientific computing community has been quite hesitant to use the Cloud, because traditional programming models do not fit well with the new paradigm, and the earliest cloud programming models do not allow most scientific computations to be run efficiently in the Cloud. In this paper we explore and compare two distributed computing models: the MPI (message-passing interface) model, which is high-performance oriented, and the Spark model, which is throughput oriented but outperforms other cloud programming solutions by adding improved support for iterative algorithms through in-memory computing. The performance of a very well-known metaheuristic, the Differential Evolution algorithm, has been thoroughly assessed using a challenging parameter estimation problem from the domain of computational systems biology. The experiments have been carried out both in a local cluster and in the Microsoft Azure public cloud, allowing performance and cost evaluation for both infrastructures.
Gobierno de España; DPI2014-55276-C5-2-R. Fondos Feder; TIN2016-75845-P. Xunta de Galicia; R2016/045. Xunta de Galicia; GRC2013/05
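For readers unfamiliar with the metaheuristic named in the abstract above, here is a minimal sequential sketch of the classic DE/rand/1/bin variant of Differential Evolution. It is not the paper's Spark or MPI implementation; the function name, the parameter defaults and the toy objective in the usage note are assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple


def differential_evolution(
    objective: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 20,
    f_weight: float = 0.8,        # differential weight F (assumed default)
    crossover_rate: float = 0.9,  # crossover probability CR (assumed default)
    generations: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimise `objective` inside box `bounds` with DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if d == forced or rng.random() < crossover_rate:
                    value = pop[a][d] + f_weight * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_cost = objective(trial)
            if trial_cost <= cost[i]:  # greedy one-to-one selection
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]
```

A call such as `differential_evolution(lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 3)` minimises a toy sphere function. In a parameter estimation setting the objective would instead measure the mismatch between model output and experimental data, and the objective evaluations are the part that a parallel implementation would typically distribute.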

    Accelerating Heuristic Search for AI Planning

    AI Planning is an important research field. Heuristic search is the most commonly used method in solving planning problems. Despite recent advances in improving the quality of heuristics and devising better search strategies, the high computational cost of heuristic search remains a barrier that severely limits its application to real world problems. In this dissertation, we propose theories, algorithms and systems to accelerate heuristic search for AI planning. We make four major contributions in this dissertation. First, we propose a state-space reduction method called Stratified Planning to accelerate heuristic search. Stratified Planning can be combined with any heuristic search to prune redundant paths in state space, without sacrificing the optimality and completeness of search algorithms. Second, we propose a general theory for partial order reduction in planning. The proposed theory unifies previous reduction algorithms for planning, and ushers in new partial order reduction algorithms that can further accelerate heuristic search by pruning more nodes in state space than previously proposed algorithms. Third, we study the local structure of state space and propose using random walks to accelerate plateau exploration for heuristic search. We also implement two state-of-the-art planners that perform competitively in the Seventh International Planning Competition. Last, we utilize cloud computing to further accelerate search for planning. We propose a portfolio stochastic search algorithm that takes advantage of the cloud. We also implement a cloud-based planning system to which users can submit planning tasks and make full use of the computational resources provided by the cloud. We push the state of the art in AI planning by developing theories and algorithms that can accelerate heuristic search for planning. We implement state-of-the-art planning systems that have strong speed and quality performance

    Proceedings of the Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016) Sofia, Bulgaria

    Combinatorial Auction-Based Virtual Machine Provisioning And Allocation In Clouds

    Current cloud providers use fixed-price mechanisms to allocate Virtual Machine (VM) instances to their users. But economic theory states that when a large amount of resources has to be allocated to a large number of users, auctions are the most efficient allocation mechanisms. Auctions achieve allocation efficiency and also maximize the providers' revenue, which a fixed-price mechanism is unable to do. We argue that combinatorial auctions are best suited to the problem of VM provisioning and allocation in clouds, since they provide users with the most flexible way to express their requirements. In combinatorial auctions, users bid for bundles of items rather than individual ones; therefore they are able to express whether the items they require are complementary to each other. The objective of this Ph.D. dissertation is to design, study, and implement combinatorial auction-based mechanisms for efficient provisioning and allocation of VM instances in clouds. The central hypothesis is that allocation efficiency and revenue maximization can be obtained by inducing users to fully express and truthfully report their preferences to the system. The rationale for our research is that, once efficient resource provisioning and allocation mechanisms that take into account the incentives of the users and cloud providers are developed and implemented, it will become more efficient to utilize cloud computing environments for solving challenging problems in business, science and engineering. In this dissertation, we present three combinatorial auction-based offline mechanisms to provision and allocate VM instances in clouds. We also present an online mechanism for dynamic provisioning of virtual machine instances in clouds. Finally, we design an efficient bidding algorithm to assist users in submitting bids to combinatorial auction-based mechanisms in order to execute parallel jobs in the cloud.
We outline our contributions and possible directions for future research in this field.
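To make the idea of bidding for bundles concrete, the sketch below solves a tiny winner determination problem by brute force: it selects the set of bundle bids that maximizes total offered value without exceeding the provider's VM capacity. This is only an illustration of the problem setting, not one of the dissertation's mechanisms. It ignores payment rules and truthfulness, assumes one bid per user, and the `Bid` type, the function name and the VM type names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# (user, bundle: VM type -> requested count, offered price); hypothetical format
Bid = Tuple[str, Dict[str, int], float]


def winner_determination(
    bids: List[Bid], capacity: Dict[str, int]
) -> Tuple[List[str], float]:
    """Exhaustively search for the feasible set of bids with the highest value."""
    best_users: List[str] = []
    best_value = 0.0
    for k in range(1, len(bids) + 1):
        for subset in combinations(bids, k):
            # Total demand per VM type for this candidate set of winners.
            demand: Dict[str, int] = {}
            for _, bundle, _ in subset:
                for vm_type, count in bundle.items():
                    demand[vm_type] = demand.get(vm_type, 0) + count
            feasible = all(
                count <= capacity.get(vm_type, 0) for vm_type, count in demand.items()
            )
            if feasible:
                value = sum(price for _, _, price in subset)
                if value > best_value:
                    best_users = [user for user, _, _ in subset]
                    best_value = value
    return best_users, best_value
```

The exhaustive search is exponential in the number of bids, so it is only usable for toy instances; practical mechanisms rely on approximation or heuristic allocation.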

    High performance computing in the cloud

    In recent years, interest in both scientific and business workflows has increased. A workflow is composed of a series of tools, which should be executed in a predefined order to perform an analysis. Traditionally, these workflows were executed manually, sending the output of one tool to the next one in the analysis process. Many applications that execute workflows automatically have appeared recently. These applications ease the work of users while executing their analyses. In addition, from the computational point of view, some workflows require a significant amount of resources. Consequently, workflow execution has moved from single workstations to distributed environments such as Grids or Clouds. Data management and task scheduling are required to execute workflows efficiently in such environments. In this thesis, we propose a cloud-based HPC environment, focusing on task scheduling, resource auto-scaling, data management and simplifying access to the resources with software clients. First, the cloud computing infrastructure is devised, which includes the base software (i.e. OpenStack) plus several additional modules aimed at improving authentication (i.e. LDAP) and data management (i.e. GridFTP, Globus Online and CloudFuse). Second, built on top of this infrastructure, the TORQUE distributed resource manager and the Maui scheduler have been configured to schedule and distribute tasks to the cloud-based workers. To reduce the number of idle nodes and the incurred cost of the active cloud resources, we also propose a configurable auto-scaling technique, which is able to scale the execution cluster depending on the workload. Additionally, in order to simplify task submission to the TORQUE execution cluster, we have interconnected the Galaxy workflow management system with it, so that users benefit from a simple way to execute their tasks.
Finally, we conducted an experimental evaluation, composed of a number of studies with synthetic and real-world applications, to show the behaviour of the auto-scaled execution cluster managed by TORQUE and Maui. All experiments were performed in an OpenStack cloud computing environment, and the benchmarked applications come from a benchmarking suite specially designed for workflow scheduling in cloud computing environments. CyberShake, LIGO and Montage were the synthetic applications selected from the suite. GECKO and a GWAS pipeline represent the real-world use cases, both having a diverse and heterogeneous set of tasks.
The numerous technological advances in data acquisition techniques allow the massive production of enormous amounts of data in diverse fields such as astronomy, health and social networks. Nowadays, only a small part of this data can be analysed because of the lack of computational resources. High Performance Computing (HPC) strategies represent the only option for analysing such an overwhelming amount of data. However, in general, HPC techniques require large and expensive computing and storage infrastructures, usually not affordable or available for most users. Cloud computing, where users pay for the resources they need when they actually need them, appears as an interesting alternative. Besides the savings in hardware infrastructure, cloud computing offers further advantages such as the removal of installation, administration and supply requirements. In addition, it enables users to use better hardware than they could usually afford and to scale the resources depending on their needs, and it offers greater fault tolerance, among other benefits. The efficient utilisation of HPC resources becomes a fundamental task, particularly in cloud computing.
We need to consider the cost of using HPC resources, especially in the case of cloud-based infrastructures, where users have to pay for storing, transferring and analysing data. Therefore, generic task scheduling and auto-scaling techniques are essential to exploit the computational resources efficiently. It is equally important to make these tasks user-friendly through the development of tools and applications (software clients) that act as an interface between the user and the infrastructure.
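The abstract above does not spell out the auto-scaling rule, so the following is only a hypothetical sketch of what a workload-driven scaling decision can look like: the target number of workers is derived from the number of queued and running tasks and kept within configured limits. The function name and all parameters are assumptions, not the thesis's configuration options.

```python
import math


def desired_workers(
    queued_tasks: int,
    running_tasks: int,
    slots_per_worker: int,
    min_workers: int,
    max_workers: int,
) -> int:
    """Return how many workers the cluster should have for the current load."""
    # Workers needed if every pending or running task occupies one slot.
    needed = math.ceil((queued_tasks + running_tasks) / slots_per_worker)
    # Never go below the configured floor or above the configured ceiling.
    return max(min_workers, min(max_workers, needed))
```

A controller built on such a rule would call it periodically and start or stop cloud instances to close the gap between the current and the desired worker count, which is how idle nodes, and the cost they incur, can be reduced.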

    Adaptive, efficient, parallel execution of parallel programs

    Future multicore processors will be heterogeneous, increasingly less reliable, and operate in dynamically changing operating conditions. Such environments will result in a constantly varying pool of hardware resources which can greatly complicate the task of efficiently exposing a program's parallelism onto these resources. Coupled with this uncertainty is the diverse set of efficiency metrics that users may desire. This paper proposes Varuna, a system that dynamically, continuously, rapidly and transparently adapts a program's parallelism to best match the instantaneous capabilities of the hardware resources while satisfying different efficiency metrics. Varuna is applicable to both multithreaded and task-based programs and can be seamlessly inserted between the program and the operating system without needing to change the source code of either. We demonstrate Varuna's effectiveness in diverse execution environments using unaltered C/C++ parallel programs from various benchmark suites. Regardless of the execution environment, Varuna always outperformed the state-of-the-art approaches for the efficiency metrics considered.

    Libro de Actas JCC&BD 2018 : VI Jornadas de Cloud Computing & Big Data

    This volume collects the papers presented at the VI Jornadas de Cloud Computing & Big Data (JCC&BD), held from 25 to 29 June 2018 at the Facultad de Informática of the Universidad Nacional de La Plata.
Universidad Nacional de La Plata (UNLP) - Facultad de Informática

    Effizienz in Cluster-Datenbanksystemen - Dynamische und Arbeitslastberücksichtigende Skalierung und Allokation

    Database systems have been vital in all forms of data processing for a long time. In recent years, the amount of processed data has been growing dramatically, even in small projects. Nevertheless, database management systems tend to be static in terms of size and performance, which makes scaling a difficult and expensive task. Because of performance and especially cost advantages, more and more installed systems have a shared-nothing cluster architecture. Due to the massive parallelism of the hardware, programming paradigms from high-performance computing are being carried over to data processing. Database research struggles to keep up with this trend. A key feature of traditional database systems is to provide transparent access to the stored data. This introduces data dependencies and increases system complexity and inter-process communication. Therefore, many developers are trading this feature for better scalability. However, explicitly managing the data distribution and data flow requires a deep understanding of the distributed system and reduces the possibilities for automatic and autonomic optimization. In this thesis we present an approach for database system scaling and allocation that features good scalability although it keeps the data distribution transparent. The first part of this thesis analyzes the challenges and opportunities for self-scaling database management systems in cluster environments. Scalability is a major concern of Internet-based applications. Access peaks that overload the application are a financial risk. Therefore, systems are usually configured to be able to process peaks at any given moment. As a result, server systems often have a very low utilization. In distributed systems the efficiency can be increased by adapting the number of nodes to the current workload. We propose a processing model and an architecture that allow efficient self-scaling of cluster database systems.
In the second part we consider different allocation approaches. To increase the efficiency we present a workload-aware, query-centric model. The approach is formalized; optimal and heuristic algorithms are presented. The algorithms optimize the data distribution for local query execution and balance the workload according to the query history. We present different query classification schemes for different forms of partitioning. The approach is evaluated for OLTP and OLAP style workloads. It is shown that variants of the approach scale well for both fields of application. The third part of the thesis considers benchmarks for large, adaptive systems. First, we present a data generator for cloud-sized applications. Due to its architecture the data generator can easily be extended and configured. A key feature is the high degree of parallelism that makes linear speedup for arbitrary numbers of nodes possible. To simulate systems with user interaction, we have analyzed a production online e-learning management system. Based on our findings, we present a model for workload generation that considers the temporal dependency of user interaction.
Database systems have long been the foundation of all kinds of information processing. In recent years, data volumes have risen dramatically even in small projects. Nevertheless, many database systems are static with respect to capacity and processing speed, which makes scaling laborious and expensive. Because of their good performance, and above all for cost reasons, more and more systems have a shared-nothing architecture, that is, they consist of independent, loosely coupled compute nodes. Because this design principle exhibits a very high degree of parallelism, programming paradigms from classical high-performance computing are increasingly used for information processing. This trend poses great challenges for database research.
One of the fundamental properties of traditional database systems is transparent access to the stored data, which lets users access the data independently of its internal organization. The resulting independence leads to dependencies in the data and increases the complexity of the systems and of the communication between individual processes. Many developers therefore sacrifice transparency for better scalability. This decision means that data organization and data flow must be handled explicitly, which restricts the possibilities for automatic and autonomic optimization of the system. The approach to scaling and allocation presented in this thesis preserves transparent access while being fully automatable and scaling very well. The first part of this dissertation addresses the challenges and opportunities for self-scaling database management systems operated on computer clusters. Good scalability is a necessary property of applications that are accessible over the Internet. Access peaks that overload the application pose a financial risk. Systems are therefore configured so that they can handle possible load peaks at any time. This usually leads to a very low average utilization of the underlying systems. One way to counter this inefficiency is to adapt the number of compute nodes in use to the current load. This dissertation presents a model and an architecture for query processing that make it possible to scale database systems on cluster computers simply and efficiently. The second part of the thesis covers different options for data distribution.
To increase efficiency, a model is used that takes into account the load distribution in the query stream. The approach is formalized, and optimal and heuristic solutions are presented. The presented algorithms optimize the data distribution for local execution of all queries and balance the load across the compute nodes. Different kinds of query classification are presented, which lead to different kinds of partitioning. The approach is evaluated for both online transaction processing and online analytical processing. The evaluation shows that the approach scales very well for both fields. The last part of the thesis presents different techniques for measuring the performance of large, adaptive systems. First, a data generation approach is shown that makes it possible to generate very large data volumes fully in parallel. To simulate the user interaction of online systems, a production e-learning system was analyzed. Based on the analysis, a model for workload generation was created that takes into account the temporal dependencies of user interaction.