
    Modern computing: Vision and challenges

    Over the past six decades, the computing systems field has undergone significant transformations that have profoundly affected society, among them the Internet and the commodification of computing. Underpinned by technological advancements, computer systems, far from being static, have continuously evolved and adapted to cover multifaceted societal niches. This has led to new paradigms such as cloud, fog, and edge computing and the Internet of Things (IoT), which offer fresh economic and creative opportunities. Nevertheless, this rapid change poses complex research challenges, especially in maximizing potential and enhancing functionality. To maintain an economical level of performance that meets ever-tighter requirements, one must therefore understand what drives the emergence and expansion of new models, and how contemporary challenges differ from past ones. To that end, this article investigates and assesses the factors influencing the evolution of computing systems, covering established systems and architectures as well as newer developments such as serverless computing, quantum computing, and on-device AI on edge devices. Tracing the technological trajectory reveals several trends: the rapid obsolescence of frameworks due to business and technical constraints, a move towards specialized systems and models, and varying approaches to centralized and decentralized control. This comprehensive review of modern computing systems looks ahead to the future of research in the field, highlighting key challenges and emerging trends and underscoring their importance in cost-effectively driving technological progress.

    Towards a centralized multicore automotive system

    Today’s automotive systems are inundated with embedded electronics to host chassis, powertrain, infotainment, advanced driver assistance systems, and other modern vehicle functions. As many as 100 embedded microcontrollers execute hundreds of millions of lines of code in a single vehicle. To control the increasing complexity in vehicle electronics and services, automakers are planning to consolidate different on-board automotive functions as software tasks on centralized multicore hardware platforms. However, these vehicle software services have different and contrasting timing, safety, and security requirements. Existing vehicle operating systems are ill-equipped to provide all the required service guarantees on a single machine. A centralized automotive system aims to tackle this by assigning software tasks to multiple criticality domains or levels according to their consequences of failures, or international safety standards like ISO 26262. This research investigates several emerging challenges in time-critical systems for a centralized multicore automotive platform and proposes a novel vehicle operating system framework to address them. This thesis first introduces an integrated vehicle management system (VMS), called DriveOS™, for a PC-class multicore hardware platform. Its separation kernel design enables temporal and spatial isolation among critical and non-critical vehicle services in different domains on the same machine. Time- and safety-critical vehicle functions are implemented in a sandboxed Real-time Operating System (OS) domain, and non-critical software is developed in a sandboxed general-purpose OS (e.g., Linux, Android) domain. To leverage the advantages of model-driven vehicle function development, DriveOS provides a multi-domain application framework in Simulink. This thesis also presents a real-time task pipeline scheduling algorithm in multiprocessors for communication between connected vehicle services with end-to-end guarantees. 
The benefits and performance of the overall automotive system framework are demonstrated with hardware-in-the-loop testing using real-world applications, car datasets, and simulated benchmarks, and with an early-stage deployment in a production-grade luxury electric vehicle.

    Secure storage systems for untrusted cloud environments

    The cloud has become established for applications that need to be scalable and highly available. However, moving data to data centers owned and operated by a third party, i.e., the cloud provider, raises security concerns: a cloud provider could easily access and manipulate the data or program flow, which prevents the cloud from being used for certain applications, such as medical or financial ones. Hardware vendors are addressing these concerns by developing Trusted Execution Environments (TEEs) that make the CPU state and parts of memory inaccessible to the host software. While TEEs protect the current execution state, they provide no security guarantees for data that does not fit or reside in the protected memory area, such as data on the network and in persistent storage. In this work, we address these limitations of TEEs in three ways: first, we extend the trust of TEEs to persistent storage; second, we extend the trust to multiple nodes in a network; and third, we propose a compiler-based solution for accessing heterogeneous memory regions. More specifically:
    • SPEICHER extends the trust provided by TEEs to persistent storage. SPEICHER implements a key-value interface. Its design is based on LSM data structures, but extends them to provide confidentiality, integrity, and freshness for the stored data. Thus, SPEICHER can prove to the client that the data has not been tampered with by an attacker.
    • AVOCADO is a distributed in-memory key-value store (KVS) that extends the trust that TEEs provide across the network to multiple nodes, allowing KVSs to scale beyond the boundaries of a single node. On each node, AVOCADO carefully divides data between trusted memory and untrusted host memory to maximize the amount of data that can be stored on each node. AVOCADO leverages the fact that we can model network attacks as crash-faults to trust other nodes with a hardened ABD replication protocol.
    • TOAST is based on the observation that modern high-performance systems often use several heterogeneous memory regions that are not easily distinguishable by the programmer. The number of regions increases further because TEEs divide memory into trusted and untrusted regions. TOAST is a compiler-based approach that unifies access to different heterogeneous memory regions and provides programmability and portability. TOAST uses a load/store interface to abstract most library interfaces for different memory regions.
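
The integrity and freshness guarantees mentioned for SPEICHER can be illustrated with a small sketch. This is not SPEICHER's LSM-based design: it is a toy store, under the assumption that a secret key and a per-key version counter fit in trusted memory while values live in untrusted storage. The class name `TrustedCounterStore` and all of its members are hypothetical, and confidentiality (encryption) is deliberately left out.

```python
import hashlib
import hmac


class TrustedCounterStore:
    """Toy key-value store: values in untrusted storage, version counters in trusted memory."""

    def __init__(self, secret: bytes):
        self._secret = secret    # assumed to never leave the trusted area
        self._versions = {}      # trusted: key -> latest version number
        self.untrusted = {}      # untrusted: key -> (version, value, tag)

    def _tag(self, key: str, version: int, value: bytes) -> bytes:
        # Bind key, version, and value together under one MAC.
        # (Toy encoding: keys are assumed not to contain NUL bytes.)
        msg = key.encode() + b"\x00" + str(version).encode() + b"\x00" + value
        return hmac.new(self._secret, msg, hashlib.sha256).digest()

    def put(self, key: str, value: bytes) -> None:
        version = self._versions.get(key, 0) + 1
        self._versions[key] = version
        self.untrusted[key] = (version, value, self._tag(key, version, value))

    def get(self, key: str) -> bytes:
        version, value, tag = self.untrusted[key]
        # Integrity: the stored tag must match a freshly computed one.
        if not hmac.compare_digest(tag, self._tag(key, version, value)):
            raise ValueError("integrity violation")
        # Freshness: the version must be the latest one the trusted side has seen.
        if version != self._versions[key]:
            raise ValueError("stale data (rollback)")
        return value
```

Replaying an old but correctly tagged record passes the integrity check and fails only the freshness check, which is why the counter has to live on the trusted side.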

    Untersuchung von Performanzveränderungen auf Quelltextebene (Analysis of performance changes at the source-code level)

    Changes to the source code of a software system can change its performance. To prevent regressions and to verify the effect of source changes that are expected to improve performance, it is necessary both to measure the impact of source code changes on performance and to understand in depth the runtime behaviour of the source code constructs involved. Specifying benchmarks or load tests that can detect regressions requires immense manual effort, and understanding the changes often requires further experiments afterwards. This thesis develops the Peass approach (Performanzanalyse von Softwaresystemen, performance analysis of software systems). Peass is based on the assumption that performance changes can be identified by measuring the performance of unit tests. Peass consists of (1) a method for regression test selection, which determines between which commits the performance may have changed, based on static source code analysis and analysis of the runtime behaviour, (2) a method for transforming unit tests into performance tests and for statistically reliable and reproducible measurement of performance, and (3) a method for supporting the understanding of the root causes of performance changes. The Peass approach thereby makes it possible to examine automatically those performance changes that are measurable with the workload of unit tests. The validity of the approach is evaluated by showing that Peass can find (1) typical performance problems in artificial test cases and (2) real, developer-tagged performance changes. Furthermore, a case study in an ongoing software development project shows that Peass is able to detect relevant performance changes.
    Contents:
    1 Introduction 1.1 Motivation 1.2 Approach 1.3 Research questions 1.4 Contributions 1.5 Structure of the thesis
    2 Foundations 2.1 Software Performance Engineering 2.2 Model-based approach 2.2.1 Overview 2.2.2 Performance antipatterns 2.3 Measurement-based approach 2.3.1 Measurement process 2.3.2 Analysis of measured values 2.4 Measurement in artificial environments 2.4.1 Benchmarking 2.4.2 Load tests 2.4.3 Performance tests 2.5 Measurement in real environments: monitoring 2.5.1 Overview 2.5.2 Implementation 2.5.3 Tools
    3 Regression test selection 3.1 Approach 3.1.1 Basic idea 3.1.2 Prerequisites 3.1.3 Two-stage process 3.2 Static test selection 3.2.1 Selected changes 3.2.2 Process 3.2.3 Implementation 3.3 Trace comparison 3.3.1 Selected changes 3.3.2 Process 3.3.3 Implementation 3.3.4 Combination with static analysis 3.4 Evaluation 3.4.1 Implementation 3.4.2 Exactness 3.4.3 Correctness 3.4.4 Discussion of validity 3.5 Related work 3.5.1 Functional regression test selection 3.5.2 Regression test selection for performance tests
    4 Measurement process 4.1 Comparison of measurement and analysis methods 4.1.1 Procedure 4.1.2 Error analysis 4.1.3 Workload size of the artificial unit test pairs 4.2 Measurement method 4.2.1 Structure of an iteration 4.2.2 Stopping measurements 4.2.3 Garbage collection per iteration 4.2.4 Handling of standard output 4.2.5 Summary of the measurement method 4.3 Analysis method 4.3.1 Choice of the statistical test 4.3.2 Outlier removal 4.3.3 Parallelization 4.4 Evaluation 4.4.1 Comparison with JMH 4.4.2 Reproducibility of the results 4.4.3 Conclusion 4.5 Related work 4.5.1 Stopping measurements 4.5.2 Change detection 4.5.3 Anomaly detection
    5 Root cause analysis 5.1 Reducing the overhead of measuring individual methods 5.1.1 Generation of example projects 5.1.2 Measurement of method execution durations 5.1.3 Options for overhead reduction 5.1.4 Measurement results 5.1.5 Verification with MooBench 5.2 Measurement configuration of the root cause analysis 5.2.1 Foundations 5.2.2 Error analysis 5.2.3 Approach 5.2.4 Measurement results 5.3 Related work 5.3.1 Monitoring overhead 5.3.2 Root cause analysis for performance changes 5.3.3 Root cause analysis for performance problems
    6 Evaluation 6.1 Validation using artificial performance problems 6.1.1 Reproduction using benchmarks 6.1.2 Conversion of the benchmarks 6.1.3 Checking problems with Peass 6.2 Evaluation using real performance problems 6.2.1 Investigation of documented performance changes in open projects 6.2.2 Investigation of the performance changes in GeoMap
    7 Summary and outlook 7.1 Summary 7.2 Outlook
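
To make the idea of statistically comparing measurements between two commits concrete, here is a minimal sketch. The abstract does not say which statistical test Peass uses, so Welch's t-test is an assumption chosen for illustration. The function names `welch_t` and `performance_changed` are hypothetical, and the threshold `t_critical=2.0` is a rough placeholder rather than a calibrated significance level.

```python
import math
import statistics


def welch_t(before, after):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of freedom.

    Each sample needs at least two values and the samples must not both be constant.
    """
    n1, n2 = len(before), len(after)
    m1, m2 = statistics.mean(before), statistics.mean(after)
    v1, v2 = statistics.variance(before), statistics.variance(after)  # sample variances
    se2 = v1 / n1 + v2 / n2  # squared standard error of the mean difference
    t = (m2 - m1) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


def performance_changed(before, after, t_critical=2.0):
    """Flag a change when |t| exceeds an assumed critical value."""
    t, _ = welch_t(before, after)
    return abs(t) > t_critical


# Hypothetical execution times (ms) of one unit test on two commits.
old_commit = [10.0, 10.2, 9.9, 10.1, 10.0]
new_commit = [12.0, 12.1, 11.9, 12.2, 12.0]
print(performance_changed(old_commit, new_commit))
```

In practice the critical value would be derived from the degrees of freedom and a chosen significance level, and measurements would be repeated across several JVM starts to keep them reproducible.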

    Efficient concurrent data structure access parallelism techniques for increasing scalability

    Multi-core processors have revolutionised the way data structures are designed by bringing parallelism to mainstream computing. Concurrent data structures are key to exploiting the hardware parallelism available in multi-core processors. However, some concurrent data structure abstractions are inherently sequential and cannot harness the parallel performance of multi-core processors. Designing and implementing concurrent data structures that harness hardware parallelism is challenging because they must be correct, efficient, and practical under various application constraints. In this thesis, our research contribution is towards improving concurrent data structure access parallelism to increase data structure performance. We propose new design frameworks that improve the access parallelism of existing concurrent data structure designs. We also propose new concurrent data structure designs with significant performance improvements. To give insight into the interplay between hardware and concurrent data structure access parallelism, we give a detailed analysis and model the performance scalability with varying parallelism. In the first part of the thesis, we focus on data structure semantic relaxation. Relaxing the semantics of a data structure unveils a bigger design space that allows weaker synchronization and more useful parallelism. Investigating new data structure designs capable of trading semantics for better performance in a monotonic way is a major challenge in the area. We address this challenge algorithmically in this part of the thesis. We present an efficient, lock-free, concurrent data structure design framework for out-of-order semantic relaxation. We introduce a new two-dimensional algorithmic design that uses multiple instances of a given data structure to improve access parallelism. 
In the second part of the thesis, we propose an efficient priority queue that improves access parallelism by reducing the number of synchronization points for each operation. Priority queues are fundamental abstract data types, often used to manage limited resources in parallel systems. Typical parallel priority queue implementations are based on heaps or skip lists. In recent literature, skip lists have been shown to be the most efficient design choice for implementing priority queues. Though numerous intricate implementations of skip-list-based queues have been proposed in the literature, their performance is constrained by the high number of global atomic updates per operation and the high memory consumption, which are proportional to the number of sub-lists in the queue. In this part of the thesis, we propose an alternative approach for designing lock-free linearizable priority queues that significantly improves memory efficiency and throughput by reducing the number of global atomic updates and the memory consumption compared to skip-list-based queues. To achieve this, our new design combines two structures, a search tree and a linked list, forming what we call a Tree Search List Queue (TSLQueue). Subsequently, we analyse and introduce a model for lock-free concurrent data structure access parallelism. The major impediment to scaling concurrent data structures is memory contention when accessing shared data structure access points, which leads to thread serialisation and hinders parallelism. To address this challenge, a significant amount of work in the literature has proposed multi-access techniques that improve concurrent data structure parallelism. However, there is little work on analysing and modelling the execution behaviour of concurrent multi-access data structures, especially in a shared memory setting. 
In this part of the thesis, we analyse and model the general execution behaviour of concurrent multi-access data structures in the shared memory setting. We study and analyse the behaviour of the two popular random access patterns, shared (Remote) and exclusive (Local) access, and of the two atomic primitives most commonly used for designing lock-free data structures: Compare-and-Swap and Fetch-and-Add.
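
The idea of trading strict ordering for more access points can be illustrated with a small single-threaded sketch. It is not the thesis's lock-free two-dimensional framework: the class `RelaxedQueue`, its round-robin enqueue, and its random dequeue are assumptions made for illustration, and unlike a bounded relaxation the sketch enforces no limit on how far items may be reordered.

```python
import random
from collections import deque


class RelaxedQueue:
    """FIFO queue relaxed across several sub-queues (single-threaded illustration)."""

    def __init__(self, width, seed=0):
        self._subqueues = [deque() for _ in range(width)]
        self._rng = random.Random(seed)
        self._next = 0  # round-robin cursor for enqueues

    def enqueue(self, item):
        # Spread items over the sub-queues; in a concurrent version each
        # sub-queue would be a separate access point with its own contention.
        self._subqueues[self._next].append(item)
        self._next = (self._next + 1) % len(self._subqueues)

    def dequeue(self):
        # Take the head of any non-empty sub-queue: order is preserved within
        # a sub-queue, but not across sub-queues.
        candidates = [q for q in self._subqueues if q]
        if not candidates:
            raise IndexError("dequeue from empty queue")
        return self._rng.choice(candidates).popleft()
```

With `width=1` the structure degenerates to a strict FIFO queue. Larger widths offer more independent access points at the cost of weaker ordering, which is the trade-off that semantic relaxation exploits.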

    Programming parallel dense matrix factorizations and inversion for new-generation NUMA architectures

    We propose a methodology to address the programmability issues derived from the emergence of new-generation shared-memory NUMA architectures. For this purpose, we employ dense matrix factorizations and matrix inversion (DMFI) as a use case, and we target two modern architectures (AMD Rome and Huawei Kunpeng 920) that exhibit configurable NUMA topologies. Our methodology pursues performance portability across different NUMA configurations by proposing multi-domain implementations for DMFI plus a hybrid task- and loop-level parallelization that configures multi-threaded executions to fix core-to-data binding, exploiting locality at the expense of minor code modifications. In addition, we introduce a generalization of the multi-domain implementations for DMFI that offers support for virtually any NUMA topology in present and future architectures. Our experimentation on the two target architectures for three representative dense linear algebra operations validates the proposal, reveals insights on the necessity of adapting both the codes and their execution to improve data access locality, and reports performance across architectures and inter- and intra-socket NUMA configurations competitive with state-of-the-art message-passing implementations, maintaining the ease of development usually associated with shared-memory programming. This research was sponsored by project PID2019-107255GB of Ministerio de Ciencia, Innovación y Universidades; project S2018/TCS-4423 of Comunidad de Madrid; project 2017-SGR-1414 of the Generalitat de Catalunya and the Madrid Government under the Multiannual Agreement with UCM in the line Program to Stimulate Research for Young Doctors in the context of the V PRICIT, project PR65/19-22445. This project has also received funding from the European High-Performance Computing Joint Undertaking (JU) under grant agreement No 955558. 
The JU receives support from the European Union’s Horizon 2020 research and innovation programme, and Spain, Germany, France, Italy, Poland, Switzerland, Norway. The work is also supported by grants PID2020-113656RB-C22 and PID2021-126576NB-I00 of MCIN/AEI/10.13039/501100011033 and by “ERDF A way of making Europe”. Peer reviewed. Postprint (published version).

    A Survey of FPGA Optimization Methods for Data Center Energy Efficiency

    This article surveys the academic literature on field-programmable gate arrays (FPGAs) and their use for energy-efficient acceleration in data centers. The goal is to critically present the existing FPGA energy optimization techniques and discuss how they can be applied to such systems. To do so, the article explores current energy trends and their projection into the future, with particular attention to the requirements set out by the European Code of Conduct for Data Center Energy Efficiency. The article then analyzes over ten years of research in energy optimization techniques, classifying them by purpose, method of application, and impact on the sources of consumption. Finally, we conclude with the challenges and possible innovations we expect for this sector. Comment: Accepted for publication in IEEE Transactions on Sustainable Computing.

    Evaluating Architectural Safeguards for Uncertain AI Black-Box Components

    Although tremendous progress has been made in Artificial Intelligence (AI), it entails new challenges. The growing complexity of learning tasks requires more complex AI components, which increasingly exhibit unreliable behaviour. In this book, we present a model-driven approach to model architectural safeguards for AI components and analyse their effect on the overall system reliability.