Performance analysis methods for understanding scaling bottlenecks in multi-threaded applications
In this dissertation we propose three new methods for analyzing the performance of multi-threaded programs. Our first method, criticality stacks, is useful for analyzing imbalance between threads. To construct these stacks we propose a new criticality metric that splits the execution time of an application into a share for each thread. The larger this share is for a thread, the more critical that thread is to the application. The second method, bottle graphs, represents each thread of a multi-threaded program as a rectangle in a graph. The height of the rectangle is computed using our criticality metric, and the width represents the parallelism of a thread. Rectangles at the top of the graph, in the neck of the bottle as it were, have limited parallelism, so we consider them "bottlenecks" for the application. Our third method, speedup stacks, shows the achieved speedup of an application and the components that limit it in a stacked graph. The intuition behind this concept is that reducing the impact of a given component increases the speedup of an application proportionally to the size of that component in the stack.
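The criticality metric lends itself to a compact illustration. The Python sketch below divides each measured interval of execution time evenly among the threads that were running during it, so a thread that runs alone while others wait accumulates a large share; the interval format and function name are assumptions for illustration, not the dissertation's implementation.

```python
from collections import defaultdict

def criticality_stack(intervals):
    """Split execution time into per-thread criticality shares.

    intervals: list of (duration, running_thread_ids) tuples, one per
    measured interval; illustrative format, not the paper's actual input.
    Each interval's duration is divided evenly among the threads that
    were running, so threads that run while others block become critical.
    """
    shares = defaultdict(float)
    for duration, running in intervals:
        for tid in running:
            shares[tid] += duration / len(running)
    return dict(shares)

# Example: thread 0 runs alone for 4s while thread 1 blocks,
# then both run for 2s -> thread 0 is clearly the critical thread.
print(criticality_stack([(4.0, [0]), (2.0, [0, 1])]))
# {0: 5.0, 1: 1.0}
```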
A framework for the characterization and analysis of software systems scalability
The term scalability appears frequently in the computing literature, but it is poorly defined and poorly understood. It is an important attribute of computer systems that is frequently asserted but rarely validated in any meaningful, systematic way. The lack of a consistent, uniform and systematic treatment of scalability makes it difficult to identify and avoid scalability problems, to clearly and objectively describe the scalability of software systems, to evaluate claims of scalability, and to compare claims from different sources.
This thesis provides a definition of scalability and describes a systematic framework for the characterization and analysis of software systems scalability. The framework comprises a goal-oriented approach for describing, modeling and reasoning about scalability requirements, and an analysis technique that captures the dependency relationships that underlie typical notions of scalability. The framework is validated against a real-world data analysis system and is used to recast a number of examples taken from the computing literature and from industry, in order to demonstrate its use across different application domains and system designs.
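One way to picture the goal-oriented, dependency-capturing analysis is as a predicate over a scaling range, checked against measurements: quality metrics depend on the scaling variable, and a goal holds only while the dependency stays within bounds. The class and field names below are illustrative assumptions, not the thesis's actual notation.

```python
from dataclasses import dataclass

@dataclass
class ScalabilityGoal:
    """A goal-oriented scalability requirement: as the scaling variable
    grows over a range, a quality metric must stay within a bound.
    Illustrative structure only; not the thesis's actual framework."""
    variable: str       # e.g. "concurrent users"
    scale_range: tuple  # (low, high)
    metric: str         # e.g. "p50 latency (ms)"
    bound: float        # metric must not exceed this value

def evaluate(goal, observations):
    """observations: list of (scale_value, metric_value) measurements.
    The goal holds if every observation inside the scaling range
    satisfies the bound, i.e. the scale/quality dependency is met."""
    in_range = [(s, m) for s, m in observations
                if goal.scale_range[0] <= s <= goal.scale_range[1]]
    violations = [(s, m) for s, m in in_range if m > goal.bound]
    return len(violations) == 0, violations

goal = ScalabilityGoal("concurrent users", (100, 1000), "p50 latency (ms)", 200.0)
ok, bad = evaluate(goal, [(100, 90.0), (500, 140.0), (1000, 230.0)])
print(ok, bad)  # False [(1000, 230.0)]
```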
SCALO: Scalability-Aware Parallelism Orchestration for Multi-Threaded Workloads
This article contributes a solution for orchestrating the execution of concurrent applications to increase throughput. SCALO monitors co-executing applications at runtime to evaluate their scalability.
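A runtime orchestrator of this kind can be sketched as a greedy core allocator driven by each application's measured speedup curve: each core goes to whichever application gains the most from it. The fixed speedup tables and the allocation policy below are assumptions for illustration, not SCALO's actual algorithm, which derives scalability online.

```python
def allocate_cores(speedup, total_cores):
    """Greedily hand out cores one at a time to whichever application
    gains the most throughput from one more core.

    speedup: dict app -> list where speedup[app][n] is the measured
    speedup with n cores (speedup[app][0] == 0). Illustrative input.
    """
    alloc = {app: 0 for app in speedup}
    for _ in range(total_cores):
        best = max(
            speedup,
            key=lambda a: speedup[a][alloc[a] + 1] - speedup[a][alloc[a]]
            if alloc[a] + 1 < len(speedup[a]) else float("-inf"),
        )
        alloc[best] += 1
    return alloc

curves = {
    "A": [0, 1.0, 1.9, 2.7, 3.4],   # scales well
    "B": [0, 1.0, 1.2, 1.3, 1.35],  # scales poorly
}
print(allocate_cores(curves, 4))  # {'A': 3, 'B': 1}
```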
Scalability and Performance Analysis of OpenMP Codes Using the Periscope Toolkit
In this paper, we present two new approaches, along with the necessary extensions to Periscope, for performing scalability and performance analysis of OpenMP codes. Periscope is an online performance analysis toolkit consisting of a user-defined number of analysis agents that automatically search for performance properties while the application is running. To detect the scalability and performance bottlenecks of OpenMP codes with Periscope, several newly defined performance properties and meta-properties are formalized. We demonstrate our implementation by evaluating the NAS OpenMP benchmarks. As shown in our results, our approach identifies code regions that do not scale well, along with other performance problems such as load imbalance in the NAS parallel benchmarks.
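A performance property of the kind Periscope searches for can be phrased as a predicate over per-thread measurements. The load-imbalance condition and threshold below are illustrative assumptions, not Periscope's actual property definitions.

```python
def load_imbalance(region_times, threshold=0.1):
    """Report whether a parallel region shows load imbalance.

    region_times: per-thread execution times for one OpenMP region.
    Severity is the time the slowest thread spends beyond the average,
    relative to the average; the property 'holds' above the threshold.
    Illustrative formulation, not Periscope's exact definition.
    """
    avg = sum(region_times) / len(region_times)
    severity = (max(region_times) - avg) / avg
    return severity > threshold, severity

holds, sev = load_imbalance([1.0, 1.1, 1.0, 2.4])
print(holds, round(sev, 2))  # True 0.75
```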
Real-time resource scaling platform for Big Data workloads on serverless environments
The serverless execution paradigm is becoming an increasingly popular option when workloads are to be deployed in an abstracted way, more specifically, without specifying any infrastructure requirements. Currently, such workloads are typically composed of small programs or even a series of single functions used as event triggers or to process a data stream. Other applications that may also fit in a serverless scenario are stateless services that may need to seamlessly scale in terms of resources, such as a web server. Although several commercial serverless services are available (e.g., Amazon Lambda), their use cases are mostly limited to the execution of functions or scripts that can be adapted to predefined templates or specifications. However, current research efforts point out that it is interesting for the serverless paradigm to evolve from single functions and support more flexible infrastructure units such as operating-system-level virtualization in the form of containers. In this paper we present a novel platform to automatically scale container resources in real time, while they are running, and without any need for reboots. This platform is evaluated using Big Data workloads, both batch and streaming, as representative examples of applications that could be initially regarded as unsuitable for the serverless paradigm considering the currently available services. The results show how our serverless platform can improve CPU utilization by up to 77% with an execution time overhead of only 6%, while remaining scalable when using a 32-container cluster.
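Scaling a running container without a restart can be sketched by rewriting its cgroup CPU quota in place. The paths below assume a cgroup v2 hierarchy and a hypothetical container cgroup name; this illustrates the general mechanism, not the platform's actual implementation.

```python
import os

CGROUP_ROOT = "/sys/fs/cgroup"  # assumes cgroup v2; path varies by system

def set_cpu_limit(container_cgroup, cores, period_us=100_000):
    """Resize a running container's CPU allocation in place.

    Writes '<quota> <period>' to cpu.max, the cgroup v2 interface that
    container runtimes map CPU limits onto; no restart is required.
    'container_cgroup' is a hypothetical cgroup path for illustration.
    """
    quota_us = int(cores * period_us)
    path = os.path.join(CGROUP_ROOT, container_cgroup, "cpu.max")
    with open(path, "w") as f:
        f.write(f"{quota_us} {period_us}")

# e.g. grow a Spark worker container from 2 to 4 cores mid-job
# (requires privileges and an existing cgroup at this path):
# set_cpu_limit("system.slice/docker-spark-worker-1.scope", cores=4)
```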
Separating data from metadata for robustness and scalability
When building storage systems that aim to simultaneously provide robustness, scalability, and efficiency, one faces a fundamental tension, as higher robustness typically incurs higher costs and thus hurts both efficiency and scalability. My research shows that an approach to storage system design based on a simple principle, separating data from metadata, can yield systems that elegantly and effectively address that tension in a variety of settings. One observation motivates our approach: much of the cost paid by many strong protection techniques is incurred to detect errors. This observation suggests an opportunity: if we can build a low-cost oracle to detect errors and identify correct data, it may be possible to reduce the cost of protection without weakening its guarantees. This dissertation shows that metadata, if carefully designed, can serve as such an oracle and help a storage system protect its data with minimal cost. It shows how to effectively apply this idea in three very different systems: Gnothi, a storage replication protocol that combines the high availability of asynchronous replication and the low cost of synchronous replication for small-scale block storage; Salus, a large-scale block store with unprecedented guarantees in terms of consistency, availability, and durability in the face of a wide range of server failures; and Exalt, a tool to emulate a large storage system with 100 times fewer machines.
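The low-cost-oracle idea can be illustrated with a toy read path: small, strongly protected metadata (a version number plus a checksum) identifies which replica's data is correct, so the bulky data itself needs only cheap protection. The structures below are a hedged illustration of the data/metadata split, not Gnothi's or Salus's actual protocols.

```python
import hashlib

def read_with_oracle(metadata, replicas):
    """Return the correct block contents, using metadata as an oracle.

    metadata: (version, sha256 hex digest), kept small and strongly
    protected; replicas: candidate byte strings from data servers.
    Toy illustration only, not the dissertation's real protocol.
    """
    _version, digest = metadata
    for data in replicas:
        if hashlib.sha256(data).hexdigest() == digest:
            return data  # oracle confirms this replica is correct
    raise IOError("no replica matches metadata; repair needed")

meta = (7, hashlib.sha256(b"payload-v7").hexdigest())
print(read_with_oracle(meta, [b"corrupt!!", b"payload-v7"]))  # b'payload-v7'
```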
Evaluation and optimization of Big Data Processing on High Performance Computing Systems
Nowadays, Big Data technologies are used by many organizations to extract valuable information from large-scale datasets. As the size of these datasets increases, meeting the huge performance requirements of data processing applications becomes more challenging. This Thesis focuses on evaluating and optimizing these applications by proposing two new tools, namely BDEv and Flame-MR. On the one hand, BDEv allows users to thoroughly assess the behavior of widespread Big Data processing frameworks such as Hadoop, Spark and Flink. It manages the configuration and deployment of the frameworks, generating the input datasets and launching the workloads specified by the user. During each workload, it automatically extracts several evaluation metrics that include performance, resource utilization, energy efficiency and microarchitectural behavior. On the other hand, Flame-MR optimizes the performance of existing Hadoop MapReduce applications. Its overall design is based on an event-driven architecture that improves the efficiency of the system resources by pipelining data movements and computation. Moreover, it avoids redundant memory copies present in Hadoop, while also using efficient sort and merge algorithms for data processing. Flame-MR replaces the underlying MapReduce data processing engine in a transparent way, so the source code of existing applications does not need to be modified. The performance benefits provided by Flame-MR have been thoroughly evaluated on cluster and cloud systems by using both standard benchmarks and real-world applications, showing reductions in execution time that range from 40% to 90%. This Thesis provides Big Data users with powerful tools to analyze and understand the behavior of data processing frameworks and reduce the execution time of their applications without requiring expert knowledge.
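The event-driven overlap of data movement and computation can be sketched with a producer/consumer pipeline: while one chunk is being processed, the next is already being read. The chunking scheme and function names below are illustrative assumptions, not Flame-MR's actual engine.

```python
import threading, queue

def pipeline(read_chunk, process_chunk, n_chunks, depth=4):
    """Overlap I/O with computation via a bounded queue.

    read_chunk(i) fetches chunk i (I/O bound); process_chunk(c) does
    the CPU work. While the consumer processes chunk i, the producer
    thread is already reading chunk i+1. Sketch only, not Flame-MR.
    """
    q = queue.Queue(maxsize=depth)

    def producer():
        for i in range(n_chunks):
            q.put(read_chunk(i))
        q.put(None)  # sentinel: no more chunks

    threading.Thread(target=producer, daemon=True).start()
    results = []
    while (chunk := q.get()) is not None:
        results.append(process_chunk(chunk))
    return results

# e.g. sum chunks of synthetic data while the next chunk is "read"
print(pipeline(lambda i: list(range(i * 3, i * 3 + 3)), sum, n_chunks=4))
# [3, 12, 21, 30]
```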
BB-ML: Basic Block Performance Prediction using Machine Learning Techniques
Recent years have seen the adoption of Machine Learning (ML) techniques to predict the performance of large-scale applications, mostly at a coarse level. In contrast, we propose to use ML techniques for performance prediction at a much finer granularity, namely at the Basic Block (BB) level; basic blocks are single-entry, single-exit code blocks that compilers use to break a large code into manageable pieces for analysis. We extrapolate the basic block execution counts of GPU applications and use them to predict performance for large input sizes from the counts of smaller input sizes. We train a Poisson Neural Network (PNN) model using random input values, as well as the lowest input values of the application, to learn the relationship between inputs and basic block counts. Experimental results show that the model can accurately predict the basic block execution counts of 16 GPU benchmarks. We achieve an accuracy of 93.5% in extrapolating the basic block counts for large input sets when trained on smaller input sets, and an accuracy of 97.7% in predicting basic block counts on random instances. In a case study, we apply the ML model to CUDA GPU benchmarks for performance prediction across a spectrum of applications. We use a variety of metrics for evaluation, including global memory requests and the active cycles of tensor cores, ALU, and FMA units. Results demonstrate the model's capability of predicting the performance of large datasets with an average error rate of 0.85% and 0.17% for global and shared memory requests, respectively. Additionally, to address the utilization of the main functional units in Ampere architecture GPUs, we calculate the active cycles for tensor cores, ALU, FMA, and FP64 units, achieving an average error of 2.3% and 10.66% for the ALU and FMA units, while the maximum observed error across all tested applications and units reaches 18.5%.
Accepted at the 29th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2023).
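Because execution counts are non-negative integers, a Poisson model is a natural fit: it predicts a rate lambda = exp(w x + b) for each basic block. The scikit-learn Poisson GLM below is a simplified stand-in for the paper's Poisson Neural Network, trained on made-up input sizes and counts purely for illustration.

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

# Synthetic stand-in data: input size -> execution count of one basic
# block (roughly linear growth plus Poisson noise). Made up for
# illustration; not the paper's benchmarks or profiling data.
rng = np.random.default_rng(0)
sizes = np.array([64, 128, 256, 512, 1024], dtype=float)
counts = rng.poisson(lam=5 * sizes)

# Poisson GLM with a log link: count ~ Poisson(exp(w * log(size) + b)),
# so the log feature lets exp() recover power-law / linear count growth.
X = np.log(sizes).reshape(-1, 1)
model = PoissonRegressor(alpha=0.0).fit(X, counts)

# Extrapolate the count for a larger, unseen input size.
big = np.log([[4096.0]])
print(int(model.predict(big)[0]))  # roughly 5 * 4096
```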