Search CORE

12 research outputs found

Performance and energy optimization on terasort algorithm by task self-resizing

Author: Pahl Claus
Song Jie
Xu Shu
Yu Ge
Zhang Li
Publication venue: 'Kaunas University of Technology (KTU)'
Publication date: 30/09/2014
Field of study

In applications of MapReduce, Terasort is one of the most successful ones, which has helped Hadoop to win the Sort Benchmark three times. While Terasort is known for its sorting speed on big data, its performance and energy consumption still can be optimized. We have analyzed the characteristics of Terasort and have identified the existence of idle notes, which does not only waste energy but also loses performance. Therefore, we optimize Terasort through a single-task distributed algorithm and a task self-resizing algorithm to save time and reduce the energy that is consumed by map nodes, which is caused by waiting for tasks and reduce nodes waiting for input. The algorithm proposed in this paper has proved to be effective in optimizing performance and energy consumption through a series of experiments. It can also be adapted to other applications in the MapReduce environment

Irish Universities

DCU Online Research Access Service

Automatic Rescaling and Tuning of Big Data Applications on Container-Based Virtual Environments

Author: Enes Jonatan
Publication venue
Publication date: 01/01/2020
Field of study

Programa Oficial de Doutoramento en Investigación en Tecnoloxías da Información. 524V01[Resumo] As aplicacións Big Data actuais evolucionaron dun xeito significativo, dende fluxos de traballo baseados en procesamento por lotes ata outros máis complexos que poden requirir múltiples etapas de procesamento usando diferentes tecnoloxías, e mesmo executándose en tempo real. Doutra banda, para despregar estas aplicacións, os clusters ‘commodity’ foron substituídos nalgúns casos por paradigmas máis flexibles como o Cloud, ou mesmo por outros emerxentes como a computación ‘serverless’, precisando ambos paradigmas de tecnoloxías de virtualización. Esta Tese propón dúas contornas que proporcionan modos alternativos de realizar unha análise en profundidade e unha mellor xestión dos recursos de aplicacións Big Data despregadas en contornas virtuais baseadas en contedores software. Por unha banda, a contorna BDWatchdog permite realizar unha análise de gran fino e en tempo real en termos do uso dos recursos do sistema e do perfilado do código. Doutra banda, descríbese unha contorna para o reescalado dinámico e en tempo real dos recursos segundo un conxunto de políticas configurables. A primeira política proposta céntrase no reescalado automático dos recursos dos contedores segundo o uso real que as aplicacións fan dos mesmos, proporcionando así unha contorna ‘serverless’. Ademais, preséntase unha política alternativa centrada na xestión enerxética que permite implementar os conceptos de limitación e presuposto de potencia, que poden aplicarse a contedores, aplicacións ou mesmo usuarios. En xeral, as contornas propostas nesta Tese tratan de poñer de relevo o potencial de aplicar novos xeitos de analizar e axustar os recursos das aplicacións Big Data despregadas en clusters de contedores, mesmo en tempo real. Os casos de uso presentados son exemplos diso, demostrando que as aplicacións Big Data poden adaptarse a novas tecnoloxías ou paradigmas sen teren que cambiar as súas características máis intrínsecas.[Resumen] Las aplicaciones Big Data actuales han evolucionado de forma significativa, desde flujos de trabajo basados en procesamiento por lotes hasta otros más complejos que pueden requerir múltiples etapas de procesamiento usando distintas tecnologías, e incluso ejecutándose en tiempo real. Por otra parte, para desplegar estas aplicaciones, los clusters ‘commodity’ se han reemplazado en algunos casos por paradigmas más flexibles como el Cloud, o incluso por otros emergentes como la computación ‘serverless’, requiriendo ambos paradigmas de tecnologías de virtualización. Esta Tesis propone dos entornos que proporcionan formas alternativas de realizar un análisis en profundidad y una mejor gestión de los recursos de aplicaciones Big Data desplegadas en entornos virtuales basados en contenedores software. Por un lado, el entorno BDWatchdog permite realizar un análisis de grano fino y en tiempo real en lo que respecta a la monitorización de los recursos del sistema y al perfilado del código. Por otro lado, se describe un entorno para el reescalado dinámico y en tiempo real de los recursos de acuerdo a un conjunto de políticas configurables. La primera política propuesta se centra en el reescalado automático de los recursos de los contenedores de acuerdo al uso real que las aplicaciones hacen de los mismos, proporcionando así un entorno ‘serverless’. Además, se presenta una política alternativa centrada en la gestión energética que permite implementar los conceptos de limitación y presupuesto de potencia, pudiendo aplicarse a contenedores, aplicaciones o incluso usuarios. En general, los entornos propuestos en esta Tesis tratan de resaltar el potencial de aplicar nuevas formas de analizar y ajustar los recursos de las aplicaciones Big Data desplegadas en clusters de contenedores, incluso en tiempo real. Los casos de uso que se han presentado son ejemplos de esto, demostrando que las aplicaciones Big Data pueden adaptarse a nuevas tecnologías o paradigmas sin tener que cambiar su características más intrínsecas.[Abstract] Current Big Data applications have significantly evolved from its origins, moving from mostly batch workloads to more complex ones that may involve many processing stages using different technologies or even working in real time. Moreover, to deploy these applications, commodity clusters have been in some cases replaced in favor of newer and more flexible paradigms such as the Cloud or even emerging ones such as serverless computing, usually involving virtualization techniques. This Thesis proposes two frameworks that provide alternative ways to perform indepth analysis and improved resource management for Big Data applications deployed on virtual environments based on software containers. On the one hand, the BDWatchdog framework is capable of performing real-time, fine-grain analysis in terms of system resource monitoring and code profiling. On the other hand, a framework for the dynamic and real-time scaling of resources according to several tuning policies is described. The first proposed policy revolves around the automatic scaling of the containers’ resources according to the real usage of the applications, thus providing a serverless environment. Furthermore, an alternative policy focused on energy management is presented in a scenario where power capping and budgeting functionalities are implemented for containers, applications or even users. Overall, the frameworks proposed in this Thesis aim to showcase how novel ways of analyzing and tuning the resources given to Big Data applications in container clusters are possible, even in real time. The supported use cases that were presented are examples of this, and show how Big Data applications can be adapted to newer technologies or paradigms without having to lose their distinctive characteristics

Repositorio da Universidade da Coruña

Optimized terasort algorithm for data analytics: case of climate data analysis

Author: Matu Fiona Mugure
Publication venue: Strathmore University
Publication date: 01/01/2017
Field of study

Thesis submitted in partial fulfillment of the requirements for the Degree of Master of Science in Information Technology (MSIT) at Strathmore UniversityWeather forecasting has proven valuable in unravelling the causes of the occurrence of natural phenomena and predicting of future climatic conditions. Subsequently, better preparation and policy making regarding these occurrences can be done using resultant information from techniques employed in weather forecasting. Analysis of vast amounts of data are characteristic of climatology hence require computing intensive techniques such as numerical weather prediction (NWP). This has made climate modelling a preserve of high performance computing (HPC) until the recent entrance of big data analytics. It is therefore necessary to optimize the algorithms used in the big data environment so as to give comparable performance to that offered by HPC environments. The study aimed at improving the big data MapReduce framework of analysis by optimizing the TeraSort benchmark algorithm. The algorithm proposed employed classical sort techniques and incorporated quantum computing mechanisms. Historical weather data collected at weather stations across the world was gathered and converted into organised, human readable format to suffice as input to the program. The proposed algorithm constituting of a map, sort and reduction phase transformed the bulky observational data into a compact summary of monthly temperature averages in linear complexity. This is a significant improvement in performance in comparison to the TeraSort algorithm on a single node. The study concludes by suggesting areas that may be explored for further optimization with emphasis on quantum computing capabilities

SU+ Digital Repository

Learning workload behaviour models from monitored time-series for resource estimation towards data center optimization

Author: Buchaca Prats David
Publication venue: Universitat Politècnica de Catalunya
Publication date: 14/01/2021
Field of study

In recent years there has been an extraordinary growth of the demand of Cloud Computing resources executed in Data Centers. Modern Data Centers are complex systems that need management. As distributed computing systems grow, and workloads benefit from such computing environments, the management of such systems increases in complexity. The complexity of resource usage and power consumption on cloud-based applications makes the understanding of application behavior through expert examination difficult. The difficulty increases when applications are seen as "black boxes", where only external monitoring can be retrieved. Furthermore, given the different amount of scenarios and applications, automation is required. To deal with such complexity, Machine Learning methods become crucial to facilitate tasks that can be automatically learned from data. Firstly, this thesis proposes an unsupervised learning technique to learn high level representations from workload traces. Such technique provides a fast methodology to characterize workloads as sequences of abstract phases. The learned phase representation is validated on a variety of datasets and used in an auto-scaling task where we show that it can be applied in a production environment, achieving better performance than other state-of-the-art techniques. Secondly, this thesis proposes a neural architecture, based on Sequence-to-Sequence models, that provides the expected resource usage of applications sharing hardware resources. The proposed technique provides resource managers the ability to predict resource usage over time as well as the completion time of the running applications. The technique provides lower error predicting usage when compared with other popular Machine Learning methods. Thirdly, this thesis proposes a technique for auto-tuning Big Data workloads from the available tunable parameters. The proposed technique gathers information from the logs of an application generating a feature descriptor that captures relevant information from the application to be tuned. Using this information we demonstrate that performance models can generalize up to a 34% better when compared with other state-of-the-art solutions. Moreover, the search time to find a suitable solution can be drastically reduced, with up to a 12x speedup and almost equal quality results as modern solutions. These results prove that modern learning algorithms, with the right feature information, provide powerful techniques to manage resource allocation for applications running in cloud environments. This thesis demonstrates that learning algorithms allow relevant optimizations in Data Center environments, where applications are externally monitored and careful resource management is paramount to efficiently use computing resources. We propose to demonstrate this thesis in three areas that orbit around resource management in server environmentsEls Centres de Dades (Data Centers) moderns són sistemes complexos que necessiten ser gestionats. A mesura que creixen els sistemes de computació distribuïda i les aplicacions es beneficien d’aquestes infraestructures, també n’augmenta la seva complexitat. La complexitat que implica gestionar recursos de còmput i d’energia en sistemes de computació al núvol fa difícil entendre el comportament de les aplicacions que s'executen de manera manual. Aquesta dificultat s’incrementa quan les aplicacions s'observen com a "caixes negres", on només es poden monitoritzar algunes mètriques de les caixes de manera externa. A més, degut a la gran varietat d’escenaris i aplicacions, és necessari automatitzar la gestió d'aquests recursos. Per afrontar-ne el repte, l'aprenentatge automàtic juga un paper cabdal que facilita aquestes tasques, que poden ser apreses automàticament en base a dades prèvies del sistema que es monitoritza. Aquesta tesi demostra que els algorismes d'aprenentatge poden aportar optimitzacions molt rellevants en la gestió de Centres de Dades, on les aplicacions són monitoritzades externament i la gestió dels recursos és de vital importància per a fer un ús eficient de la capacitat de còmput d'aquests sistemes. En primer lloc, aquesta tesi proposa emprar aprenentatge no supervisat per tal d’aprendre representacions d'alt nivell a partir de traces d'aplicacions. Aquesta tècnica ens proporciona una metodologia ràpida per a caracteritzar aplicacions vistes com a seqüències de fases abstractes. La representació apresa de fases és validada en diferents “datasets” i s'aplica a la gestió de tasques d'”auto-scaling”, on es conclou que pot ser aplicable en un medi de producció, aconseguint un millor rendiment que altres mètodes de vanguardia. En segon lloc, aquesta tesi proposa l'ús de xarxes neuronals, basades en arquitectures “Sequence-to-Sequence”, que proporcionen una estimació dels recursos usats per aplicacions que comparteixen recursos de hardware. La tècnica proposada facilita als gestors de recursos l’habilitat de predir l'ús de recursos a través del temps, així com també una estimació del temps de còmput de les aplicacions. Tanmateix, redueix l’error en l’estimació de recursos en comparació amb d’altres tècniques populars d'aprenentatge automàtic. Per acabar, aquesta tesi introdueix una tècnica per a fer “auto-tuning” dels “hyper-paràmetres” d'aplicacions de Big Data. Consisteix així en obtenir informació dels “logs” de les aplicacions, generant un vector de característiques que captura informació rellevant de les aplicacions que s'han de “tunejar”. Emprant doncs aquesta informació es valida que els ”Regresors” entrenats en la predicció del rendiment de les aplicacions són capaços de generalitzar fins a un 34% millor que d’altres “Regresors” de vanguàrdia. A més, el temps de cerca per a trobar una bona solució es pot reduir dràsticament, aconseguint un increment de millora de fins a 12 vegades més dels resultats de qualitat en contraposició a alternatives modernes. Aquests resultats posen de manifest que els algorismes moderns d'aprenentatge automàtic esdevenen tècniques molt potents per tal de gestionar l'assignació de recursos en aplicacions que s'executen al núvol.Arquitectura de computador

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Adaptive Failure-Aware Scheduling for Hadoop

Author: Soualhia Mbarka
Publication venue
Publication date: 06/02/2018
Field of study

Given the dynamic nature of cloud environments, failures are the norm rather than the exception in data centers powering cloud frameworks. Despite the diversity of integrated recovery mechanisms in cloud frameworks, their schedulers still generate poor scheduling decisions leading to tasks' failures due to unforeseen events such as unpredicted demands of services or hardware outages. Traditionally, simulation and analytical modeling have been widely used to analyze the impact of the scheduling decisions on the failures rates. However, they cannot provide accurate results and exhaustive coverage of the cloud systems especially when failures occur. In this thesis, we present new approaches for modeling and verifying an adaptive failure-aware scheduling algorithm for Hadoop to early detect these failures and to reschedule tasks according to changes in the cloud. Hadoop is the framework of choice on many off-the-shelf clusters in the cloud to process data-intensive applications by efficiently running them across distributed multiple machines. The proposed scheduling algorithm for Hadoop relies on predictions made by machine learning algorithms trained on previously executed tasks and data collected from the Hadoop environment. To further improve Hadoop scheduling decisions on the fly, we use reinforcement learning techniques to select an appropriate scheduling action for a scheduled task. Furthermore, we propose an adaptive algorithm to dynamically detect failures of nodes in Hadoop. We implement the above approaches in ATLAS: an AdapTive Failure-Aware Scheduling algorithm that can be built on top of existing Hadoop schedulers. To illustrate the usefulness and benefits of ATLAS, we conduct a large empirical study on a Hadoop cluster deployed on Amazon Elastic MapReduce (EMR) to compare the performance of ATLAS to those of three Hadoop scheduling algorithms (FIFO, Fair, and Capacity). Results show that ATLAS outperforms these scheduling algorithms in terms of failures' rates, execution times, and resources utilization. Finally, we propose a new methodology to formally identify the impact of the scheduling decisions of Hadoop on the failures rates. We use model checking to verify some of the most important scheduling properties in Hadoop (schedulability, resources-deadlock freeness, and fairness) and provide possible strategies to avoid their occurrences in ATLAS. The formal verification of the Hadoop scheduler allows to identify more tasks failures and hence reduce the number of failures in ATLAS

Concordia University Research Repository

Task Scheduling in Big Data Platforms: A Systematic Literature Review

Author: Khomh Foutse
Soualhia Mbarka
Tahar Sofiène
Publication venue: 'Elsevier BV'
Publication date: 01/01/2017
Field of study

Context: Hadoop, Spark, Storm, and Mesos are very well known frameworks in both research and industrial communities that allow expressing and processing distributed computations on massive amounts of data. Multiple scheduling algorithms have been proposed to ensure that short interactive jobs, large batch jobs, and guaranteed-capacity production jobs running on these frameworks can deliver results quickly while maintaining a high throughput. However, only a few works have examined the effectiveness of these algorithms. Objective: The Evidence-based Software Engineering (EBSE) paradigm and its core tool, i.e., the Systematic Literature Review (SLR), have been introduced to the Software Engineering community in 2004 to help researchers systematically and objectively gather and aggregate research evidences about different topics. In this paper, we conduct a SLR of task scheduling algorithms that have been proposed for big data platforms. Method: We analyse the design decisions of different scheduling models proposed in the literature for Hadoop, Spark, Storm, and Mesos over the period between 2005 and 2016. We provide a research taxonomy for succinct classification of these scheduling models. We also compare the algorithms in terms of performance, resources utilization, and failure recovery mechanisms. Results: Our searches identifies 586 studies from journals, conferences and workshops having the highest quality in this field. This SLR reports about different types of scheduling models (dynamic, constrained, and adaptive) and the main motivations behind them (including data locality, workload balancing, resources utilization, and energy efficiency). A discussion of some open issues and future challenges pertaining to improving the current studies is provided

Crossref

Concordia University Research Repository

PolyPublie

Energy-aware service provisioning in P2P-assisted cloud ecosystems

Author: Sharifi Leila
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2016
Field of study

Cotutela Universitat Politècnica de Catalunya i Instituto Tecnico de LisboaEnergy has been emerged as a first-class computing resource in modern systems. The trend has primarily led to the strong focus on reducing the energy consumption of data centers, coupled with the growing awareness of the adverse impact on the environment due to data centers. This has led to a strong focus on energy management for server class systems. In this work, we intend to address the energy-aware service provisioning in P2P-assisted cloud ecosystems, leveraging economics-inspired mechanisms. Toward this goal, we addressed a number of challenges. To frame an energy aware service provisioning mechanism in the P2P-assisted cloud, first, we need to compare the energy consumption of each individual service in P2P-cloud and data centers. However, in the procedure of decreasing the energy consumption of cloud services, we may be trapped with the performance violation. Therefore, we need to formulate a performance aware energy analysis metric, conceptualized across the service provisioning stack. We leverage this metric to derive energy analysis framework. Then, we sketch a framework to analyze the energy effectiveness in P2P-cloud and data center platforms to choose the right service platform, according to the performance and energy characteristics. This framework maps energy from the hardware oblivious, top level to the particular hardware setting in the bottom layer of the stack. Afterwards, we introduce an economics-inspired mechanism to increase the energy effectiveness in the P2P-assisted cloud platform as well as moving toward a greener ICT for ICT for a greener ecosystem.La energía se ha convertido en un recurso de computación de primera clase en los sistemas modernos. La tendencia ha dado lugar principalmente a un fuerte enfoque hacia la reducción del consumo de energía de los centros de datos, así como una creciente conciencia sobre los efectos ambientales negativos, producidos por los centros de datos. Esto ha llevado a un fuerte enfoque en la gestión de energía de los sistemas de tipo servidor. En este trabajo, se pretende hacer frente a la provisión de servicios de bajo consumo energético en los ecosistemas de la nube asistida por P2P, haciendo uso de mecanismos basados en economía. Con este objetivo, hemos abordado una serie de desafíos. Para instrumentar un mecanismo de servicio de aprovisionamiento de energía consciente en la nube asistida por P2P, en primer lugar, tenemos que comparar el consumo energético de cada servicio en la nube P2P y en los centros de datos. Sin embargo, en el procedimiento de disminuir el consumo de energía de los servicios en la nube, podemos quedar atrapados en el incumplimiento del rendimiento. Por lo tanto, tenemos que formular una métrica, sobre el rendimiento energético, a través de la pila de servicio de aprovisionamiento. Nos aprovechamos de esta métrica para derivar un marco de análisis de energía. Luego, se esboza un marco para analizar la eficacia energética en la nube asistida por P2P y en la plataforma de centros de datos para elegir la plataforma de servicios adecuada, de acuerdo con las características de rendimiento y energía. Este marco mapea la energía desde el alto nivel independiente del hardware a la configuración de hardware particular en la capa inferior de la pila. Posteriormente, se introduce un mecanismo basado en economía para aumentar la eficacia energética en la plataforma en la nube asistida por P2P, así como avanzar hacia unas TIC más verdes, para las TIC en un ecosistema más verde.Postprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Effizienz in Cluster-Datenbanksystemen - Dynamische und Arbeitslastberücksichtigende Skalierung und Allokation

Author: Rabl Tilmann
Publication venue
Publication date: 08/12/2011
Field of study

Database systems have been vital in all forms of data processing for a long time. In recent years, the amount of processed data has been growing dramatically, even in small projects. Nevertheless, database management systems tend to be static in terms of size and performance which makes scaling a difficult and expensive task. Because of performance and especially cost advantages more and more installed systems have a shared nothing cluster architecture. Due to the massive parallelism of the hardware programming paradigms from high performance computing are translated into data processing. Database research struggles to keep up with this trend. A key feature of traditional database systems is to provide transparent access to the stored data. This introduces data dependencies and increases system complexity and inter process communication. Therefore, many developers are exchanging this feature for a better scalability. However, explicitly managing the data distribution and data flow requires a deep understanding of the distributed system and reduces the possibilities for automatic and autonomic optimization. In this thesis we present an approach for database system scaling and allocation that features good scalability although it keeps the data distribution transparent. The first part of this thesis analyzes the challenges and opportunities for self-scaling database management systems in cluster environments. Scalability is a major concern of Internet based applications. Access peaks that overload the application are a financial risk. Therefore, systems are usually configured to be able to process peaks at any given moment. As a result, server systems often have a very low utilization. In distributed systems the efficiency can be increased by adapting the number of nodes to the current workload. We propose a processing model and an architecture that allows efficient self-scaling of cluster database systems. In the second part we consider different allocation approaches. To increase the efficiency we present a workload-aware, query-centric model. The approach is formalized; optimal and heuristic algorithms are presented. The algorithms optimize the data distribution for local query execution and balance the workload according to the query history. We present different query classification schemes for different forms of partitioning. The approach is evaluated for OLTP and OLAP style workloads. It is shown that variants of the approach scale well for both fields of application. The third part of the thesis considers benchmarks for large, adaptive systems. First, we present a data generator for cloud-sized applications. Due to its architecture the data generator can easily be extended and configured. A key feature is the high degree of parallelism that makes linear speedup for arbitrary numbers of nodes possible. To simulate systems with user interaction, we have analyzed a productive online e-learning management system. Based on our findings, we present a model for workload generation that considers the temporal dependency of user interaction.Datenbanksysteme sind seit langem die Grundlage für alle Arten von Informationsverarbeitung. In den letzten Jahren ist das Datenaufkommen selbst in kleinen Projekten dramatisch angestiegen. Dennoch sind viele Datenbanksysteme statisch in Bezug auf ihre Kapazität und Verarbeitungsgeschwindigkeit was die Skalierung aufwendig und teuer macht. Aufgrund der guten Geschwindigkeit und vor allem aus Kostengründen haben immer mehr Systeme eine Shared-Nothing-Architektur, bestehen also aus unabhängigen, lose gekoppelten Rechnerknoten. Da dieses Konstruktionsprinzip einen sehr hohen Grad an Parallelität aufweist, werden zunehmend Programmierparadigmen aus dem klassischen Hochleistungsrechen für die Informationsverarbeitung eingesetzt. Dieser Trend stellt die Datenbankforschung vor große Herausforderungen. Eine der grundlegenden Eigenschaften traditioneller Datenbanksysteme ist der transparente Zugriff zu den gespeicherten Daten, der es dem Nutzer erlaubt unabhängig von der internen Organisation auf die Daten zuzugreifen. Die resultierende Unabhängigkeit führt zu Abhängigkeiten in den Daten und erhöht die Komplexität der Systeme und der Kommunikation zwischen einzelnen Prozessen. Daher wird Transparenz von vielen Entwicklern für eine bessere Skalierbarkeit geopfert. Diese Entscheidung führt dazu, dass der die Datenorganisation und der Datenfluss explizit behandelt werden muss, was die Möglichkeiten für eine automatische und autonome Optimierung des Systems einschränkt. Der in dieser Arbeit vorgestellte Ansatz zur Skalierung und Allokation erhält den transparenten Zugriff und zeichnet sich dabei durch seine vollständige Automatisierbarkeit und sehr gute Skalierbarkeit aus. Im ersten Teil dieser Dissertation werden die Herausforderungen und Chancen für selbst-skalierende Datenbankmanagementsysteme behandelt, die in auf Computerclustern betrieben werden. Gute Skalierbarkeit ist eine notwendige Eigenschaft für Anwendungen, die über das Internet zugreifbar sind. Lastspitzen im Zugriff, die die Anwendung überladen stellen ein finanzielles Risiko dar. Deshalb werden Systeme so konfiguriert, dass sie eventuelle Lastspitzen zu jedem Zeitpunkt verarbeiten können. Das führt meist zu einer im Schnitt sehr geringen Auslastung der unterliegenden Systeme. Eine Möglichkeit dieser Ineffizienz entgegen zu steuern ist es die Anzahl der verwendeten Rechnerknoten an die vorliegende Last anzupassen. In dieser Dissertation werden ein Modell und eine Architektur für die Anfrageverarbeitung vorgestellt, mit denen es möglich ist Datenbanksysteme auf Clusterrechnern einfach und effizient zu skalieren. Im zweiten Teil der Arbeit werden verschieden Möglichkeiten für die Datenverteilung behandelt. Um die Effizienz zu steigern wird ein Modell verwendet, das die Lastverteilung im Anfragestrom berücksichtigt. Der Ansatz ist formalisiert und optimale und heuristische Lösungen werden präsentiert. Die vorgestellten Algorithmen optimieren die Datenverteilung für eine lokale Ausführung aller Anfragen und balancieren die Last auf den Rechnerknoten. Es werden unterschiedliche Arten der Anfrageklassifizierung vorgestellt, die zu verschiedenen Arten von Partitionierung führen. Der Ansatz wird sowohl für Onlinetransaktionsverarbeitung, als auch Onlinedatenanalyse evaluiert. Die Evaluierung zeigt, dass der Ansatz für beide Felder sehr gut skaliert. Im letzten Teil der Arbeit werden verschiedene Techniken für die Leistungsmessung von großen, adaptiven Systemen präsentiert. Zunächst wird ein Datengenerierungsansatz gezeigt, der es ermöglicht sehr große Datenmengen völlig parallel zu erzeugen. Um die Benutzerinteraktion von Onlinesystemen zu simulieren wurde ein produktives E-learningsystem analysiert. Anhand der Analyse wurde ein Modell für die Generierung von Arbeitslasten erstellt, das die zeitlichen Abhängigkeiten von Benutzerinteraktion berücksichtigt

Raspberry Pi Technology

Author
Publication venue: 'MDPI AG'
Publication date: 20/11/2017
Field of study

Portsmouth University Research Portal (Pure)