149 research outputs found
SusTrainable: Promoting Sustainability as a Fundamental Driver in Software Development Training and Education. 2nd Teacher Training, January 23-27, 2023, Pula, Croatia. Revised lecture notes
This volume exhibits the revised lecture notes of the 2nd teacher training
organized as part of the project Promoting Sustainability as a Fundamental
Driver in Software Development Training and Education, held at the Juraj
Dobrila University of Pula, Croatia, in the week January 23-27, 2023. It is the
Erasmus+ project No. 2020-1-PT01-KA203-078646 - Sustrainable. More details can
be found at the project web site https://sustrainable.github.io/
One of the most important contributions of the project are two summer
schools. The 2nd SusTrainable Summer School (SusTrainable - 23) will be
organized at the University of Coimbra, Portugal, in the week July 10-14, 2023.
The summer school will consist of lectures and practical work for master and
PhD students in computing science and closely related fields. There will be
contributions from Babe\c{s}-Bolyai University, E\"{o}tv\"{o}s Lor\'{a}nd
University, Juraj Dobrila University of Pula, Radboud University Nijmegen,
Roskilde University, Technical University of Ko\v{s}ice, University of
Amsterdam, University of Coimbra, University of Minho, University of Plovdiv,
University of Porto, University of Rijeka.
To prepare and streamline the summer school, the consortium organized a
teacher training in Pula, Croatia. This was an event of five full days,
organized by Tihana Galinac Grbac and Neven Grbac. The Juraj Dobrila University
of Pula is very concerned with the sustainability issues. The education,
research and management are conducted with sustainability goals in mind.
The contributions in the proceedings were reviewed and provide a good
overview of the range of topics that will be covered at the summer school. The
papers in the proceedings, as well as the very constructive and cooperative
teacher training, guarantee the highest quality and beneficial summer school
for all participants.Comment: 85 pages, 8 figures, 3 code listings and 1 table; editors: Tihana
Galinac Grbac, Csaba Szab\'{o}, Jo\~{a}o Paulo Fernande
Adaptation-Aware Architecture Modeling and Analysis of Energy Efficiency for Software Systems
This thesis presents an approach for the design time analysis of energy efficiency for static and self-adaptive software systems.
The quality characteristics of a software system, such as performance and operating costs, strongly depend upon its architecture. Software architecture is a high-level view on software artifacts that reflects essential quality characteristics of a system under design. Design decisions made on an architectural level have a decisive impact on the quality of a system. Revising architectural design decisions late into development requires significant effort. Architectural analyses allow software architects to reason about the impact of design decisions on quality, based on an architectural description of the system. An essential quality goal is the reduction of cost while maintaining other quality goals. Power consumption accounts for a significant part of the Total Cost of Ownership (TCO) of data centers. In 2010, data centers contributed 1.3% of the world-wide power consumption. However, reasoning on the energy efficiency of software systems is excluded from the systematic analysis of software architectures at design time. Energy efficiency can only be evaluated once the system is deployed and operational. One approach to reduce power consumption or cost is the introduction of self-adaptivity to a software system. Self-adaptive software systems execute adaptations to provision costly resources dependent on user load. The execution of reconfigurations can increase energy efficiency and reduce cost. If performed improperly, however, the additional resources required to execute a reconfiguration may exceed their positive effect.
Existing architecture-level energy analysis approaches offer limited accuracy or only consider a limited set of system features, e.g., the used communication style. Predictive approaches from the embedded systems and Cloud Computing domain operate on an abstraction that is not suited for architectural analysis. The execution of adaptations can consume additional resources. The additional consumption can reduce performance and energy efficiency. Design time quality analyses for self-adaptive software systems ignore this transient effect of adaptations.
This thesis makes the following contributions to enable the systematic consideration of energy efficiency in the architectural design of self-adaptive software systems: First, it presents a modeling language that captures power consumption characteristics on an architectural abstraction level. Second, it introduces an energy efficiency analysis approach that uses instances of our power consumption modeling language in combination with existing performance analyses for architecture models. The developed analysis supports reasoning on energy efficiency for static and self-adaptive software systems. Third, to ease the specification of power consumption characteristics, we provide a method for extracting power models for server environments. The method encompasses an automated profiling of servers based on a set of restrictions defined by the user. A model training framework extracts a set of power models specified in our modeling language from the resulting profile. The method ranks the trained power models based on their predicted accuracy. Lastly, this thesis introduces a systematic modeling and analysis approach for considering transient effects in design time quality analyses. The approach explicitly models inter-dependencies between reconfigurations, performance and power consumption. We provide a formalization of the execution semantics of the model. Additionally, we discuss how our approach can be integrated with existing quality analyses of self-adaptive software systems.
We validated the accuracy, applicability, and appropriateness of our approach in a variety of case studies. The first two case studies investigated the accuracy and appropriateness of our modeling and analysis approach. The first study evaluated the impact of design decisions on the energy efficiency of a media hosting application. The energy consumption predictions achieved an absolute error lower than 5.5% across different user loads. Our approach predicted the relative impact of the design decision on energy efficiency with an error of less than 18.94%. The second case study used two variants of the Spring-based community case study system PetClinic. The case study complements the accuracy and appropriateness evaluation of our modeling and analysis approach. We were able to predict the energy consumption of both variants with an absolute error of no more than 2.38%. In contrast to the first case study, we derived all models automatically, using our power model extraction framework, as well as an extraction framework for performance models. The third case study applied our model-based prediction to evaluate the effect of different self-adaptation algorithms on energy efficiency. It involved scientific workloads executed in a virtualized environment. Our approach predicted the energy consumption with an error below 7.1%, even though we used coarse grained measurement data of low accuracy to train the input models. The fourth case study evaluated the appropriateness and accuracy of the automated model extraction method using a set of Big Data and enterprise workloads. Our method produced power models with prediction errors below 5.9%. A secondary study evaluated the accuracy of extracted power models for different Virtual Machine (VM) migration scenarios. The results of the fifth case study showed that our approach for modeling transient effects improved the prediction accuracy for a horizontally scaling application. Leveraging the improved accuracy, we were able to identify design deficiencies of the application that otherwise would have remained unnoticed
Limits to parallelism in scientific computing
The goal of our research is to decrease the execution time of scientific computing applications. We exploit the application\u27s inherent parallelism to achieve this goal. This exploitation is expensive as we analyze sequential applications and port them to parallel computers. Many scientifically computational problems appear to have considerable exploitable parallelism; however, upon implementing a parallel solution on a parallel computer, limits to the parallelism are encountered. Unfortunately, many of these limits are characteristic of a specific parallel computer. This thesis explores these limits.;We study the feasibility of exploiting the inherent parallelism of four NASA scientific computing applications. We use simple models to predict each application\u27s degree of parallelism at several levels of granularity. From this analysis, we conclude that it is infeasible to exploit the inherent parallelism of two of the four applications. The interprocessor communication of one application is too expensive relative to its computation cost. The input and output costs of the other application are too expensive relative to its computation cost. We exploit the parallelism of the remaining two applications and measure their performance on an Intel iPSC/2 parallel computer. We parallelize an Optimal Control Boundary Value Problem. This guidance control problem determines an optimal trajectory of a boat in a river. We parallelize the Carbon Dioxide Slicing technique which is a macrophysical cloud property retrieval algorithm. This technique computes the height at the top of a cloud using cloud imager measurements. We consider the feasibility of exploiting its massive parallelism on a MasPar MP-2 parallel computer. We conclude that many limits to parallelism are surmountable while other limits are inescapable.;From these limits, we elucidate some fundamental issues that must be considered when porting similar problems to yet-to-be designed computers. We conclude that the technological improvements to reduce the isolation of computational units frees a programmer from many of the programmer\u27s current concerns about the granularity of the work. We also conclude that the technological improvements to relax the regimented guidance of the computational units allows a programmer to exploit the inherent heterogeneous parallelism of many applications
Data Parallel C++
Learn how to accelerate C++ programs using data parallelism. This open access book enables C++ programmers to be at the forefront of this exciting and important new development that is helping to push computing to new levels. It is full of practical advice, detailed explanations, and code examples to illustrate key topics. Data parallelism in C++ enables access to parallel resources in a modern heterogeneous system, freeing you from being locked into any particular computing device. Now a single C++ application can use any combination of devices—including GPUs, CPUs, FPGAs and AI ASICs—that are suitable to the problems at hand. This book begins by introducing data parallelism and foundational topics for effective use of the SYCL standard from the Khronos Group and Data Parallel C++ (DPC++), the open source compiler used in this book. Later chapters cover advanced topics including error handling, hardware-specific programming, communication and synchronization, and memory model considerations. Data Parallel C++ provides you with everything needed to use SYCL for programming heterogeneous systems. What You'll Learn Accelerate C++ programs using data-parallel programming Target multiple device types (e.g. CPU, GPU, FPGA) Use SYCL and SYCL compilers Connect with computing’s heterogeneous future via Intel’s oneAPI initiative Who This Book Is For Those new data-parallel programming and computer programmers interested in data-parallel programming using C++
Data Parallel C++
Learn how to accelerate C++ programs using data parallelism. This open access book enables C++ programmers to be at the forefront of this exciting and important new development that is helping to push computing to new levels. It is full of practical advice, detailed explanations, and code examples to illustrate key topics. Data parallelism in C++ enables access to parallel resources in a modern heterogeneous system, freeing you from being locked into any particular computing device. Now a single C++ application can use any combination of devices—including GPUs, CPUs, FPGAs and AI ASICs—that are suitable to the problems at hand. This book begins by introducing data parallelism and foundational topics for effective use of the SYCL standard from the Khronos Group and Data Parallel C++ (DPC++), the open source compiler used in this book. Later chapters cover advanced topics including error handling, hardware-specific programming, communication and synchronization, and memory model considerations. Data Parallel C++ provides you with everything needed to use SYCL for programming heterogeneous systems. What You'll Learn Accelerate C++ programs using data-parallel programming Target multiple device types (e.g. CPU, GPU, FPGA) Use SYCL and SYCL compilers Connect with computing’s heterogeneous future via Intel’s oneAPI initiative Who This Book Is For Those new data-parallel programming and computer programmers interested in data-parallel programming using C++
Generation of Application Specific Hardware Extensions for Hybrid Architectures: The Development of PIRANHA - A GCC Plugin for High-Level-Synthesis
Architectures combining a field programmable gate array (FPGA) and a general-purpose processor on a single chip became increasingly popular in recent years. On the one hand, such hybrid architectures facilitate the use of application specific hardware accelerators that improve the performance of the software on the host processor. On the other hand, it obliges system designers to handle the whole process of hardware/software co-design. The complexity of this process is still one of the main reasons, that hinders the widespread use of hybrid architectures. Thus, an automated process that aids programmers with the hardware/software partitioning and the generation of application specific accelerators is an important issue. The method presented in this thesis neither requires restrictions of the used high-level-language nor special source code annotations. Usually, this is an entry barrier for programmers without deeper understanding of the underlying hardware platform.
This thesis introduces a seamless programming flow that allows generating hardware accelerators for unrestricted, legacy C code. The implementation consists of a GCC plugin that automatically identifies application hot-spots and generates hardware accelerators accordingly. Apart from the accelerator implementation in a hardware description language, the compiler plugin provides the generation of a host processor interfaces and, if necessary, a prototypical integration with the host operating system. An evaluation with typical embedded applications shows general benefits of the approach, but also reveals limiting factors that hamper possible performance improvements
Optimization of high-throughput real-time processes in physics reconstruction
La presente tesis se ha desarrollado en colaboración entre
la Universidad de Sevilla y la Organización Europea para la
Investigación Nuclear, CERN.
El detector LHCb es uno de los cuatro grandes detectores
situados en el Gran Colisionador de Hadrones, LHC. En LHCb,
se colisionan partÃculas a altas energÃas para comprender la
diferencia existente entre la materia y la antimateria. Debido a la
cantidad ingente de datos generada por el detector, es necesario
realizar un filtrado de datos en tiempo real, fundamentado en
los conocimientos actuales recogidos en el Modelo Estándar de
fÃsica de partÃculas. El filtrado, también conocido como High
Level Trigger, deberá procesar un throughput de 40 Tb/s de datos,
y realizar un filtrado de aproximadamente 1 000:1, reduciendo
el throughput a unos 40 Gb/s de salida, que se almacenan para
posterior análisis.
El proceso del High Level Trigger se subdivide a su vez en
dos etapas: High Level Trigger 1 (HLT1) y High Level Trigger
2 (HLT2). El HLT1 transcurre en tiempo real, y realiza una reducción de datos de aproximadamente 30:1. El HLT1 consiste
en una serie de procesos software que reconstruyen lo que ha
sucedido en la colisión de partÃculas. En la reconstrucción del
HLT1 únicamente se analizan las trayectorias de las partÃculas
producidas fruto de la colisión, en un problema conocido como
reconstrucción de trazas, para dictaminar el interés de las colisiones.
Por contra, el proceso HLT2 es más fino, requiriendo más
tiempo en realizarse y reconstruyendo todos los subdetectores
que componen LHCb.
Hacia 2020, el detector LHCb, asà como todos los componentes
del sistema de adquisici´on de datos, serán actualizados acorde
a los últimos desarrollos técnicos. Como parte del sistema
de adquisición de datos, los servidores que procesan HLT1 y
HLT2 también sufrirán una actualización. Al mismo tiempo, el
acelerador LHC será también actualizado, de manera que la
cantidad de datos generada en cada cruce de grupo de partÃculas
aumentare en aproxidamente 5 veces la actual. Debido a
las actualizaciones tanto del acelerador como del detector, se
prevé que la cantidad de datos que deberá procesar el HLT en
su totalidad sea unas 40 veces mayor a la actual.
La previsión de la escalabilidad del software actual a 2020
subestim´ó los recursos necesarios para hacer frente al incremento
en throughput. Esto produjo que se pusiera en marcha un
estudio de todos los algoritmos tanto del HLT1 como del HLT2,
asà como una actualización del código a nuevos estándares, para
mejorar su rendimiento y ser capaz de procesar la cantidad de
datos esperada.
En esta tesis, se exploran varios algoritmos de la reconstrucción de LHCb. El problema de reconstrucción de trazas se analiza
en profundidad y se proponen nuevos algoritmos para su
resolución. Ya que los problemas analizados exhiben un paralelismo
masivo, estos algoritmos se implementan en lenguajes
especializados para tarjetas gráficas modernas (GPUs), dada su
arquitectura inherentemente paralela. En este trabajo se dise Ëœnan
dos algoritmos de reconstrucción de trazas. Además, se diseñan
adicionalmente cuatro algoritmos de decodificación y un algoritmo
de clustering, problemas también encontrados en el HLT1.
Por otra parte, se diseña un algoritmo para el filtrado de Kalman,
que puede ser utilizado en ambas etapas.
Los algoritmos desarrollados cumplen con los requisitos esperados
por la colaboración LHCb para el año 2020. Para poder
ejecutar los algoritmos eficientemente en tarjetas gráficas, se
desarrolla un framework especializado para GPUs, que permite
la ejecución paralela de secuencias de reconstrucción en GPUs.
Combinando los algoritmos desarrollados con el framework, se
completa una secuencia de ejecución que asienta las bases para
un HLT1 ejecutable en GPU.
Durante la investigación llevada a cabo en esta tesis, y gracias
a los desarrollos arriba mencionados y a la colaboración de
un pequeño equipo de personas coordinado por el autor, se
completa un HLT1 ejecutable en GPUs. El rendimiento obtenido
en GPUs, producto de esta tesis, permite hacer frente al reto de
ejecutar una secuencia de reconstrucción en tiempo real, bajo
las condiciones actualizadas de LHCb previstas para 2020. As´ı
mismo, se completa por primera vez para cualquier experimento
del LHC un High Level Trigger que se ejecuta únicamente en
GPUs. Finalmente, se detallan varias posibles configuraciones
para incluir tarjetas gr´aficas en el sistema de adquisición de
datos de LHCb.The current thesis has been developed in collaboration between
Universidad de Sevilla and the European Organization for Nuclear
Research, CERN.
The LHCb detector is one of four big detectors placed alongside
the Large Hadron Collider, LHC. In LHCb, particles are
collided at high energies in order to understand the difference
between matter and antimatter. Due to the massive quantity
of data generated by the detector, it is necessary to filter data
in real-time. The filtering, also known as High Level Trigger,
processes a throughput of 40 Tb/s of data and performs a selection
of approximately 1 000:1. The throughput is thus reduced
to roughly 40 Gb/s of data output, which is then stored for
posterior analysis.
The High Level Trigger process is subdivided into two stages:
High Level Trigger 1 (HLT1) and High Level Trigger 2 (HLT2).
HLT1 occurs in real-time, and yields a reduction of data of approximately
30:1. HLT1 consists in a series of software processes
that reconstruct particle collisions. The HLT1 reconstruction only
analyzes the trajectories of particles produced at the collision,
solving a problem known as track reconstruction, that determines
whether the collision data is kept or discarded. In contrast,
HLT2 is a finer process, which requires more time to execute
and reconstructs all subdetectors composing LHCb.
Towards 2020, the LHCb detector and all the components
composing the data acquisition system will be upgraded. As
part of the data acquisition system, the servers that process
HLT1 and HLT2 will also be upgraded. In addition, the LHC
accelerator will also be updated, increasing the data generated in
every bunch crossing by roughly 5 times. Due to the accelerator
and detector upgrades, the amount of data that the HLT will
require to process is expected to increase by 40 times.
The foreseen scalability of the software through 2020 underestimated
the required resources to face the increase in data
throughput. As a consequence, studies of all algorithms composing
HLT1 and HLT2 and code modernizations were carried
out, in order to obtain a better performance and increase the
processing capability of the foreseen hardware resources in the
upgrade.
In this thesis, several algorithms of the LHCb recontruction
are explored. The track reconstruction problem is analyzed
in depth, and new algorithms are proposed. Since the analyzed
problems are massively parallel, these algorithms are implemented
in specialized languages for modern graphics cards
(GPUs), due to their inherently parallel architecture. From this
work stem two algorithm designs. Furthermore, four additional
decoding algorithms and a clustering algorithms have been designed
and implemented, which are also part of HLT1. Apart
from that, an parallel Kalman filter algorithm has been designed
and implemented, which can be used in both HLT stages.
The developed algorithms satisfy the requirements of the
LHCb collaboration for the LHCb upgrade. In order to execute
the algorithms efficiently on GPUs, a software framework specialized
for GPUs is developed, which allows executing GPU
reconstruction sequences in parallel. Combining the developed
algorithms with the framework, an execution sequence is completed
as the foundations of a GPU HLT1.
During the research carried out in this thesis, the aforementioned
developments and a small group of collaborators coordinated
by the author lead to the completion of a full GPU
HLT1 sequence. The performance obtained on GPUs allows
executing a reconstruction sequence in real-time, under LHCb
upgrade conditions. The developed GPU HLT1 constitutes the
first GPU high level trigger ever developed for an LHC experiment.
Finally, various possible realizations of the GPU HLT1 to
integrate in a production GPU-equipped data acquisition system
are detailed
Database Principles and Technologies – Based on Huawei GaussDB
This open access book contains eight chapters that deal with database technologies, including the development history of database, database fundamentals, introduction to SQL syntax, classification of SQL syntax, database security fundamentals, database development environment, database design fundamentals, and the application of Huawei’s cloud database product GaussDB database. This book can be used as a textbook for database courses in colleges and universities, and is also suitable as a reference book for the HCIA-GaussDB V1.5 certification examination. The Huawei GaussDB (for MySQL) used in the book is a Huawei cloud-based high-performance, highly applicable relational database that fully supports the syntax and functionality of the open source database MySQL. All the experiments in this book can be run on this database platform. As the world’s leading provider of ICT (information and communication technology) infrastructure and smart terminals, Huawei’s products range from digital data communication, cyber security, wireless technology, data storage, cloud computing, and smart computing to artificial intelligence
Intelligent Management of Inter-Thread Synchronization Dependencies for Concurrent Programs.
Power dissipation limits and design complexity have made the microprocessor industry less successful in improving the performance of monolithic processors, even though semiconductor technology continues to scale. Consequently, chip multiprocessors (CMPs) have become a standard for all ranges of computing from cellular phones to high-performance servers. As sufficient thread level parallelism (TLP) is necessary to exploit the computational power provided by CMPs, most performance-aware programmers need to parallelize their programs.
For shared memory multi-threaded programs, synchronization mechanisms such as mutexes, barriers, and condition variables, are used to enforce the threads to interact with each other in the way the programmers intended. However, employing synchronization operations in both correct and efficient way at the same time is extremely difficult, and there have been trade-offs between programmability and efficiency of using synchronizations.
This thesis proposes a collection of works that increase the programmability and efficiency of concurrent programs by intelligently managing the synchronization operations. First, we focus on mutex locks and unlocks. Many concurrency bug detection tools and automated bug fixers rely on the precise identification of critical sections guarded by lock/unlock operations. We suggest a practical lock/unlock pairing mechanism that combines static analysis with dynamic instrumentation to identify critical sections in POSIX multi-threaded C/C++ programs. Second, we present Dynamic Core Boosting (DCB) to accelerate critical paths in multi-thread programs. Inter-thread dependencies through synchronizations form critical paths. These critical paths are major performance bottlenecks for concurrent programs, and they are exacerbated by workload imbalances in performance asymmetric CMPs. DCB coordinates its compiler, runtime subsystem, and architecture to mitigates such performance bottlenecks. Finally, we propose exploiting synchronization operations for better energy efficiency through dynamic power management.PhDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/108886/1/netforce_1.pd
- …