39 research outputs found

    A Low Cost Two-Tier Architecture Model For High Availability Clusters Application Load Balancing

    Full text link
    This article proposes a design and implementation of a low cost two-tier architecture model for high availability cluster combined with load-balancing and shared storage technology to achieve desired scale of three-tier architecture for application load balancing e.g. web servers. The research work proposes a design that physically omits Network File System (NFS) server nodes and implements NFS server functionalities within the cluster nodes, through Red Hat Cluster Suite (RHCS) with High Availability (HA) proxy load balancing technologies. In order to achieve a low-cost implementation in terms of investment in hardware and computing solutions, the proposed architecture will be beneficial. This system intends to provide steady service despite any system components fails due to uncertainly such as network system, storage and applications.Comment: Load balancing, high availability cluster, web server cluster

    Latency-driven replication for globally distributed systems

    Get PDF
    Steen, M.R. van [Promotor]Pierre, G.E.O. [Copromotor

    Effective task assignment strategies for distributed systems under highly variable workloads

    Get PDF
    Heavy-tailed workload distributions are commonly experienced in many areas of distributed computing. Such workloads are highly variable, where a small number of very large tasks make up a large proportion of the workload, making the load very hard to distribute effectively. Traditional task assignment policies are ineffective under these conditions as they were formulated based on the assumption of an exponentially distributed workload. Size-based task assignment policies have been proposed to handle heavy-tailed workloads, but their applications are limited by their static nature and assumption of prior knowledge of a task's service requirement. This thesis analyses existing approaches to load distribution under heavy-tailed workloads, and presents a new generalised task assignment policy that significantly improves performance for many distributed applications, by intelligently addressing the negative effects on performance that highly variable workloads cause. Many problems associated with the modelling and optimisations of systems under highly variable workloads were then addressed by a novel technique that approximated these workloads with simpler mathematical representations, without losing any of their pertinent original properties. Finally, we obtain advance queuing metrics (such as the variance of key measurements like waiting time and slowdown that are difficult to obtain analytically) through rigorous simulation

    Aggregate matrix-analytic techniques and their applications

    Get PDF
    The complexity of computer systems affects the complexity of modeling techniques that can be used for their performance analysis. In this dissertation, we develop a set of techniques that are based on tractable analytic models and enable efficient performance analysis of computer systems. Our approach is three pronged: first, we propose new techniques to parameterize measurement data with Markovian-based stochastic processes that can be further used as input into queueing systems; second, we propose new methods to efficiently solve complex queueing models; and third, we use the proposed methods to evaluate the performance of clustered Web servers and propose new load balancing policies based on this analysis.;We devise two new techniques for fitting measurement data that exhibit high variability into Phase-type (PH) distributions. These techniques apply known fitting algorithms in a divide-and-conquer fashion. We evaluate the accuracy of our methods from both the statistics and the queueing systems perspective. In addition, we propose a new methodology for fitting measurement data that exhibit long-range dependence into Markovian Arrival Processes (MAPs).;We propose a new methodology, ETAQA, for the exact solution of M/G/1-type processes, (GI/M/1-type processes, and their intersection, i.e., quasi birth-death (QBD) processes. ETAQA computes an aggregate steady state probability distribution and a set of measures of interest. E TAQA is numerically stable and computationally superior to alternative solution methods. Apart from ETAQA, we propose a new methodology for the exact solution of a class of GI/G/1-type processes based on aggregation/decomposition.;Finally, we demonstrate the applicability of the proposed techniques by evaluating load balancing policies in clustered Web servers. We address the high variability in the service process of Web servers by dedicating the servers of a cluster to requests of similar sizes and propose new, content-aware load balancing policies. Detailed analysis shows that the proposed policies achieve high user-perceived performance and, by continuously adapting their scheduling parameters to the current workload characteristics, provide good performance under conditions of transient overload

    Towards Data Sharing across Decentralized and Federated IoT Data Analytics Platforms

    Get PDF
    In the past decade the Internet-of-Things concept has overwhelmingly entered all of the fields where data are produced and processed, thus, resulting in a plethora of IoT platforms, typically cloud-based, that centralize data and services management. In this scenario, the development of IoT services in domains such as smart cities, smart industry, e-health, automotive, are possible only for the owner of the IoT deployments or for ad-hoc business one-to-one collaboration agreements. The realization of "smarter" IoT services or even services that are not viable today envisions a complete data sharing with the usage of multiple data sources from multiple parties and the interconnection with other IoT services. In this context, this work studies several aspects of data sharing focusing on Internet-of-Things. We work towards the hyperconnection of IoT services to analyze data that goes beyond the boundaries of a single IoT system. This thesis presents a data analytics platform that: i) treats data analytics processes as services and decouples their management from the data analytics development; ii) decentralizes the data management and the execution of data analytics services between fog, edge and cloud; iii) federates peers of data analytics platforms managed by multiple parties allowing the design to scale into federation of federations; iv) encompasses intelligent handling of security and data usage control across the federation of decentralized platforms instances to reduce data and service management complexity. The proposed solution is experimentally evaluated in terms of performances and validated against use cases. Further, this work adopts and extends available standards and open sources, after an analysis of their capabilities, fostering an easier acceptance of the proposed framework. We also report efforts to initiate an IoT services ecosystem among 27 cities in Europe and Korea based on a novel methodology. We believe that this thesis open a viable path towards a hyperconnection of IoT data and services, minimizing the human effort to manage it, but leaving the full control of the data and service management to the users' will

    Effizienz in Cluster-Datenbanksystemen - Dynamische und Arbeitslastberücksichtigende Skalierung und Allokation

    Get PDF
    Database systems have been vital in all forms of data processing for a long time. In recent years, the amount of processed data has been growing dramatically, even in small projects. Nevertheless, database management systems tend to be static in terms of size and performance which makes scaling a difficult and expensive task. Because of performance and especially cost advantages more and more installed systems have a shared nothing cluster architecture. Due to the massive parallelism of the hardware programming paradigms from high performance computing are translated into data processing. Database research struggles to keep up with this trend. A key feature of traditional database systems is to provide transparent access to the stored data. This introduces data dependencies and increases system complexity and inter process communication. Therefore, many developers are exchanging this feature for a better scalability. However, explicitly managing the data distribution and data flow requires a deep understanding of the distributed system and reduces the possibilities for automatic and autonomic optimization. In this thesis we present an approach for database system scaling and allocation that features good scalability although it keeps the data distribution transparent. The first part of this thesis analyzes the challenges and opportunities for self-scaling database management systems in cluster environments. Scalability is a major concern of Internet based applications. Access peaks that overload the application are a financial risk. Therefore, systems are usually configured to be able to process peaks at any given moment. As a result, server systems often have a very low utilization. In distributed systems the efficiency can be increased by adapting the number of nodes to the current workload. We propose a processing model and an architecture that allows efficient self-scaling of cluster database systems. In the second part we consider different allocation approaches. To increase the efficiency we present a workload-aware, query-centric model. The approach is formalized; optimal and heuristic algorithms are presented. The algorithms optimize the data distribution for local query execution and balance the workload according to the query history. We present different query classification schemes for different forms of partitioning. The approach is evaluated for OLTP and OLAP style workloads. It is shown that variants of the approach scale well for both fields of application. The third part of the thesis considers benchmarks for large, adaptive systems. First, we present a data generator for cloud-sized applications. Due to its architecture the data generator can easily be extended and configured. A key feature is the high degree of parallelism that makes linear speedup for arbitrary numbers of nodes possible. To simulate systems with user interaction, we have analyzed a productive online e-learning management system. Based on our findings, we present a model for workload generation that considers the temporal dependency of user interaction.Datenbanksysteme sind seit langem die Grundlage für alle Arten von Informationsverarbeitung. In den letzten Jahren ist das Datenaufkommen selbst in kleinen Projekten dramatisch angestiegen. Dennoch sind viele Datenbanksysteme statisch in Bezug auf ihre Kapazität und Verarbeitungsgeschwindigkeit was die Skalierung aufwendig und teuer macht. Aufgrund der guten Geschwindigkeit und vor allem aus Kostengründen haben immer mehr Systeme eine Shared-Nothing-Architektur, bestehen also aus unabhängigen, lose gekoppelten Rechnerknoten. Da dieses Konstruktionsprinzip einen sehr hohen Grad an Parallelität aufweist, werden zunehmend Programmierparadigmen aus dem klassischen Hochleistungsrechen für die Informationsverarbeitung eingesetzt. Dieser Trend stellt die Datenbankforschung vor große Herausforderungen. Eine der grundlegenden Eigenschaften traditioneller Datenbanksysteme ist der transparente Zugriff zu den gespeicherten Daten, der es dem Nutzer erlaubt unabhängig von der internen Organisation auf die Daten zuzugreifen. Die resultierende Unabhängigkeit führt zu Abhängigkeiten in den Daten und erhöht die Komplexität der Systeme und der Kommunikation zwischen einzelnen Prozessen. Daher wird Transparenz von vielen Entwicklern für eine bessere Skalierbarkeit geopfert. Diese Entscheidung führt dazu, dass der die Datenorganisation und der Datenfluss explizit behandelt werden muss, was die Möglichkeiten für eine automatische und autonome Optimierung des Systems einschränkt. Der in dieser Arbeit vorgestellte Ansatz zur Skalierung und Allokation erhält den transparenten Zugriff und zeichnet sich dabei durch seine vollständige Automatisierbarkeit und sehr gute Skalierbarkeit aus. Im ersten Teil dieser Dissertation werden die Herausforderungen und Chancen für selbst-skalierende Datenbankmanagementsysteme behandelt, die in auf Computerclustern betrieben werden. Gute Skalierbarkeit ist eine notwendige Eigenschaft für Anwendungen, die über das Internet zugreifbar sind. Lastspitzen im Zugriff, die die Anwendung überladen stellen ein finanzielles Risiko dar. Deshalb werden Systeme so konfiguriert, dass sie eventuelle Lastspitzen zu jedem Zeitpunkt verarbeiten können. Das führt meist zu einer im Schnitt sehr geringen Auslastung der unterliegenden Systeme. Eine Möglichkeit dieser Ineffizienz entgegen zu steuern ist es die Anzahl der verwendeten Rechnerknoten an die vorliegende Last anzupassen. In dieser Dissertation werden ein Modell und eine Architektur für die Anfrageverarbeitung vorgestellt, mit denen es möglich ist Datenbanksysteme auf Clusterrechnern einfach und effizient zu skalieren. Im zweiten Teil der Arbeit werden verschieden Möglichkeiten für die Datenverteilung behandelt. Um die Effizienz zu steigern wird ein Modell verwendet, das die Lastverteilung im Anfragestrom berücksichtigt. Der Ansatz ist formalisiert und optimale und heuristische Lösungen werden präsentiert. Die vorgestellten Algorithmen optimieren die Datenverteilung für eine lokale Ausführung aller Anfragen und balancieren die Last auf den Rechnerknoten. Es werden unterschiedliche Arten der Anfrageklassifizierung vorgestellt, die zu verschiedenen Arten von Partitionierung führen. Der Ansatz wird sowohl für Onlinetransaktionsverarbeitung, als auch Onlinedatenanalyse evaluiert. Die Evaluierung zeigt, dass der Ansatz für beide Felder sehr gut skaliert. Im letzten Teil der Arbeit werden verschiedene Techniken für die Leistungsmessung von großen, adaptiven Systemen präsentiert. Zunächst wird ein Datengenerierungsansatz gezeigt, der es ermöglicht sehr große Datenmengen völlig parallel zu erzeugen. Um die Benutzerinteraktion von Onlinesystemen zu simulieren wurde ein produktives E-learningsystem analysiert. Anhand der Analyse wurde ein Modell für die Generierung von Arbeitslasten erstellt, das die zeitlichen Abhängigkeiten von Benutzerinteraktion berücksichtigt

    Seventh Biennial Report : June 2003 - March 2005

    No full text

    Organization based multiagent architecture for distributed environments

    Get PDF
    [EN]Distributed environments represent a complex field in which applied solutions should be flexible and include significant adaptation capabilities. These environments are related to problems where multiple users and devices may interact, and where simple and local solutions could possibly generate good results, but may not be effective with regards to use and interaction. There are many techniques that can be employed to face this kind of problems, from CORBA to multi-agent systems, passing by web-services and SOA, among others. All those methodologies have their advantages and disadvantages that are properly analyzed in this documents, to finally explain the new architecture presented as a solution for distributed environment problems. The new architecture for solving complex solutions in distributed environments presented here is called OBaMADE: Organization Based Multiagent Architecture for Distributed Environments. It is a multiagent architecture based on the organizations of agents paradigm, where the agents in the architecture are structured into organizations to improve their organizational capabilities. The reasoning power of the architecture is based on the Case-Based Reasoning methology, being implemented in a internal organization that uses agents to create services to solve the external request made by the users. The OBaMADE architecture has been successfully applied to two different case studies where its prediction capabilities have been properly checked. Those case studies have showed optimistic results and, being complex systems, have demonstrated the abstraction and generalizations capabilities of the architecture. Nevertheless OBaMADE is intended to be able to solve much other kind of problems in distributed environments scenarios. It should be applied to other varieties of situations and to other knowledge fields to fully develop its potencial.[ES]Los entornos distribuidos representan un campo de conocimiento complejo en el que las soluciones a aplicar deben ser flexibles y deben contar con gran capacidad de adaptación. Este tipo de entornos está normalmente relacionado con problemas donde varios usuarios y dispositivos entran en juego. Para solucionar dichos problemas, pueden utilizarse sistemas locales que, aunque ofrezcan buenos resultados en términos de calidad de los mismos, no son tan efectivos en cuanto a la interacción y posibilidades de uso. Existen múltiples técnicas que pueden ser empleadas para resolver este tipo de problemas, desde CORBA a sistemas multiagente, pasando por servicios web y SOA, entre otros. Todas estas mitologías tienen sus ventajas e inconvenientes, que se analizan en este documento, para explicar, finalmente, la nueva arquitectura presentada como una solución para los problemas generados en entornos distribuidos. La nueva arquitectura aquí se llama OBaMADE, que es el acrónimo del inglés Organization Based Multiagent Architecture for Distributed Environments (Arquitectura Multiagente Basada en Organizaciones para Entornos Distribuidos). Se trata de una arquitectura multiagente basasa en el paradigma de las organizaciones de agente, donde los agentes que forman parte de la arquitectura se estructuran en organizaciones para mejorar sus capacidades organizativas. La capacidad de razonamiento de la arquitectura está basada en la metodología de razonamiento basado en casos, que se ha implementado en una de las organizaciones internas de la arquitectura por medio de agentes que crean servicios que responden a las solicitudes externas de los usuarios. La arquitectura OBaMADE se ha aplicado de forma exitosa a dos casos de estudio diferentes, en los que se han demostrado sus capacidades predictivas. Aplicando OBaMADE a estos casos de estudio se han obtenido resultados esperanzadores y, al ser sistemas complejos, se han demostrado las capacidades tanto de abstracción como de generalización de la arquitectura presentada. Sin embargo, esta arquitectura está diseñada para poder ser aplicada a más tipo de problemas de entornos distribuidos. Debe ser aplicada a más variadas situaciones y a otros campos de conocimiento para desarrollar completamente el potencial de esta arquitectura
    corecore