668 research outputs found
Towards a centralized multicore automotive system
Today’s automotive systems are inundated with embedded electronics to host chassis, powertrain, infotainment, advanced driver assistance systems, and other modern vehicle functions. As many as 100 embedded microcontrollers execute hundreds of millions of lines of code in a single vehicle. To control the increasing complexity in vehicle electronics and services, automakers are planning to consolidate different on-board automotive functions as software tasks on centralized multicore hardware platforms. However, these vehicle software services have different and contrasting timing, safety, and security requirements. Existing vehicle operating systems are ill-equipped to provide all the required service guarantees on a single machine. A centralized automotive system aims to tackle this by assigning software tasks to multiple criticality domains or levels according to their consequences of failures, or international safety standards like ISO 26262. This research investigates several emerging challenges in time-critical systems for a centralized multicore automotive platform and proposes a novel vehicle operating system framework to address them.
This thesis first introduces an integrated vehicle management system (VMS), called DriveOS™, for a PC-class multicore hardware platform. Its separation kernel design enables temporal and spatial isolation among critical and non-critical vehicle services in different domains on the same machine. Time- and safety-critical vehicle functions are implemented in a sandboxed Real-time Operating System (OS) domain, and non-critical software is developed in a sandboxed general-purpose OS (e.g., Linux, Android) domain. To leverage the advantages of model-driven vehicle function development, DriveOS provides a multi-domain application framework in Simulink. This thesis also presents a real-time task pipeline scheduling algorithm in multiprocessors for communication between connected vehicle services with end-to-end guarantees. The benefits and performance of the overall automotive system framework are demonstrated with hardware-in-the-loop testing using real-world applications, car datasets and simulated benchmarks, and with an early-stage deployment in a production-grade luxury electric vehicle
GPU devices for safety-critical systems: a survey
Graphics Processing Unit (GPU) devices and their associated software programming languages and frameworks can deliver the computing performance required to facilitate the development of next-generation high-performance safety-critical systems such as autonomous driving systems. However, the integration of complex, parallel, and computationally demanding software functions with different safety-criticality levels on GPU devices with shared hardware resources contributes to several safety certification challenges. This survey categorizes and provides an overview of research contributions that address GPU devices’ random hardware failures, systematic failures, and independence of execution.This work has been partially supported by the European Research Council with Horizon 2020 (grant agreements No. 772773 and 871465), the Spanish Ministry of Science and Innovation under grant PID2019-107255GB, the HiPEAC Network of Excellence and the Basque Government under grant KK-2019-00035. The Spanish Ministry of Economy and Competitiveness has also partially supported Leonidas Kosmidis with a Juan de la Cierva Incorporación postdoctoral fellowship (FJCI-2020- 045931-I).Peer ReviewedPostprint (author's final draft
LIPIcs, Volume 261, ICALP 2023, Complete Volume
LIPIcs, Volume 261, ICALP 2023, Complete Volum
Software Fault Tolerance in Real-Time Systems: Identifying the Future Research Questions
Tolerating hardware faults in modern architectures is becoming a prominent problem due to the miniaturization of the hardware components, their increasing complexity, and the necessity to reduce the costs. Software-Implemented Hardware Fault Tolerance approaches have been developed to improve the system dependability to hardware faults without resorting to custom hardware solutions. However, these come at the expense of making the satisfaction of the timing constraints of the applications/activities harder from a scheduling standpoint. This paper surveys the current state of the art of fault tolerance approaches when used in the context real-time systems, identifying the main challenges and the cross-links between these two topics. We propose a joint scheduling-failure analysis model that highlights the formal interactions among software fault tolerance mechanisms and timing properties. This model allows us to present and discuss many open research questions with the final aim to spur the future research activities
Pushing the Boundaries of Spacecraft Autonomy and Resilience with a Custom Software Framework and Onboard Digital Twin
This research addresses the high CubeSat mission failure rates caused by inadequate software and overreliance on ground control. By applying a reliable design methodology to flight software development and developing an onboard digital twin platform with fault prediction capabilities, this study provides a solution to increase satellite resilience and autonomy, thus reducing the risk of mission failure. These findings have implications for spacecraft of all sizes, paving the way for more resilient space missions
Applying Hypervisor-Based Fault Tolerance Techniques to Safety-Critical Embedded Systems
This document details the work conducted through the development of this thesis, and it
is structured as follows:
• Chapter 1, Introduction, has briefly presented the motivation, objectives, and contributions
of this thesis.
• Chapter 2, Fundamentals, exposes a series of concepts that are necessary to correctly
understand the information presented in the rest of the thesis, such as the
concepts of virtualization, hypervisors, or software-based fault tolerance. In addition,
this chapter includes an exhaustive review and comparison between the different
hypervisors used in scientific studies dealing with safety-critical systems, and a
brief review of some works that try to improve fault tolerance in the hypervisor itself,
an area of research that is outside the scope of this work, but that complements
the mechanism presented and could be established as a line of future work.
• Chapter 3, Problem Statement and Related Work, explains the main reasons why
the concept of Hypervisor-Based Fault Tolerance was born and reviews the main
articles and research papers on the subject. This review includes both papers related
to safety-critical embedded systems (such as the research carried out in this thesis)
and papers related to cloud servers and cluster computing that, although not directly
applicable to embedded systems, may raise useful concepts that make our solution
more complete or allow us to establish future lines of work.
• Chapter 4, Proposed Solution, begins with a brief comparison of the work presented
in Chapter 3 to establish the requirements that our solution must meet in order to
be as complete and innovative as possible. It then sets out the architecture of the
proposed solution and explains in detail the two main elements of the solution: the
Voter and the Health Monitoring partition.
• Chapter 5, Prototype, explains in detail the prototyping of the proposed solution,
including the choice of the hypervisor, the processing board, and the critical functionality
to be redundant. With respect to the voter, it includes prototypes for both
the software version (the voter is implemented in a virtual machine) and the hardware
version (the voter is implemented as IP cores on the FPGA).
• Chapter 6, Evaluation, includes the evaluation of the prototype developed in Chapter
5. As a preliminary step and given that there is no evidence in this regard, an
exercise is carried out to measure the overhead involved in using the XtratuM hypervisor
versus not using it. Subsequently, qualitative tests are carried out to check that
Health Monitoring is working as expected and a fault injection campaign is carried
out to check the error detection and correction rate of our solution. Finally, a comparison
is made between the performance of the hardware and software versions of
Voter.
• Chapter 7, Conclusions and Future Work, is dedicated to collect the conclusions
obtained and the contributions made during the research (in the form of articles in
journals, conferences and contributions to projects and proposals in the industry).
In addition, it establishes some lines of future work that could complete and extend
the research carried out during this doctoral thesis.Programa de Doctorado en Ciencia y Tecnología Informática por la Universidad Carlos III de MadridPresidente: Katzalin Olcoz Herrero.- Secretario: Félix García Carballeira.- Vocal: Santiago Rodríguez de la Fuent
LIPIcs, Volume 274, ESA 2023, Complete Volume
LIPIcs, Volume 274, ESA 2023, Complete Volum
Fundamentals
Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, their summary and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to resource requirements and how to enhance scalability on diverse computing architectures ranging from embedded systems to large computing clusters
Systematic Approaches for Telemedicine and Data Coordination for COVID-19 in Baja California, Mexico
Conference proceedings info:
ICICT 2023: 2023 The 6th International Conference on Information and Computer Technologies
Raleigh, HI, United States, March 24-26, 2023
Pages 529-542We provide a model for systematic implementation of telemedicine within a large evaluation center for COVID-19 in the area of Baja California, Mexico. Our model is based on human-centric design factors and cross disciplinary collaborations for scalable data-driven enablement of smartphone, cellular, and video Teleconsul-tation technologies to link hospitals, clinics, and emergency medical services for point-of-care assessments of COVID testing, and for subsequent treatment and quar-antine decisions. A multidisciplinary team was rapidly created, in cooperation with different institutions, including: the Autonomous University of Baja California, the Ministry of Health, the Command, Communication and Computer Control Center
of the Ministry of the State of Baja California (C4), Colleges of Medicine, and the College of Psychologists. Our objective is to provide information to the public and to evaluate COVID-19 in real time and to track, regional, municipal, and state-wide data in real time that informs supply chains and resource allocation with the anticipation of a surge in COVID-19 cases. RESUMEN Proporcionamos un modelo para la implementación sistemática de la telemedicina dentro de un gran centro de evaluación de COVID-19 en el área de Baja California, México. Nuestro modelo se basa en factores de diseño centrados en el ser humano y colaboraciones interdisciplinarias para la habilitación escalable basada en datos de tecnologías de teleconsulta de teléfonos inteligentes, celulares y video para vincular hospitales, clínicas y servicios médicos de emergencia para evaluaciones de COVID en el punto de atención. pruebas, y para el tratamiento posterior y decisiones de cuarentena. Rápidamente se creó un equipo multidisciplinario, en cooperación con diferentes instituciones, entre ellas: la Universidad Autónoma de Baja California, la Secretaría de Salud, el Centro de Comando, Comunicaciones y Control Informático.
de la Secretaría del Estado de Baja California (C4), Facultades de Medicina y Colegio de Psicólogos. Nuestro objetivo es proporcionar información al público y evaluar COVID-19 en tiempo real y rastrear datos regionales, municipales y estatales en tiempo real que informan las cadenas de suministro y la asignación de recursos con la anticipación de un aumento de COVID-19. 19 casos.ICICT 2023: 2023 The 6th International Conference on Information and Computer Technologieshttps://doi.org/10.1007/978-981-99-3236-
Improving the Performance of Big Data Analytics Platforms by Task and I/O Granularity Adjustment
Department of Computer Science and EngineeringWith the massive increase in the amount of semi-structured and unstructured web data, big data analytics platforms have emerged and started to evolve rapidly. Apache Hadoop has been developed for batch processing on a large dataset, and systems for interactive and general purpose applications have been developed alongside NoSQL databases. Numerous efforts have been made to improve the performance of Hadoop and NoSQL databases, including utilizing a new device called NVMM for NoSQL databases. Nonetheless, their performance is still far from satisfactory due to inadequate granularity for tasks and I/O. In this dissertation, we present novel techniques to improve the performance of Apache Hadoop and NVMM-based LSM-tree by adjusting task and I/O granularity.
First, we analyze YARN container overhead and present dynamic input split size adjustment scheme, which can logically combine multiple HDFS blocks and increase the input size of each container, thereby enabling a single map wave and reducing the number of containers and their initialization overhead. Experimental results shows that we can avoid recurring container overhead by selecting the right size for input splits and reducing the number of containers. Second, we present a novel HDFS block coalescing scheme that mitigates the YARN con tainer overhead. Our assorted block coalescing scheme combines multiple HDFS blocks and creates large input splits of various sizes, reducing the number of containers and their initializa tion overhead. Our experimental study shows the block coalescing scheme significantly reduces the container overhead while it achieves good load balancing and job scheduling fairness without impairing the degree of overlap between map phase and reduce phase.
Third, we discuss design choice of using NVMM for indexing structure in NoSQL databases and present ZipperDB, a key-value store that redesigns LSM-tree for byte-addressable persistent memory. To benefit from the byte-addressability of persistent memory, ZipperDB employs byte addressable persistent SkipLists and performs Zipper Compaction, a novel in-place compaction algorithm that merges two adjacent persistent SkipLists without compromising the failure atomicity. The byte-addressable compaction helps mitigate the write amplification problem, which is known to be the root cause of the write stall problem in LSM-tree.
Finally, we present ListDB, a write-optimized key-value store for NVMM to overcome the gap between DRAM and NVMM write latencies and thereby, resolve the write stall problem. ListDB consists of three novel techniques: (i) byte-addressable Index-Unified Logging, which incrementally converts write-ahead logs into SkipLists, (ii) Braided SkipList, a simple NUMA aware SkipList that effectively reduces the NUMA effects of NVMM, and (iii) NUMA-aware Zipper Compaction. Using the three techniques, ListDB makes background flush and com paction fast enough to resolve the infamous write stall problem and shows 1.6x and 25x higher write throughputs than PACTree and Intel Pmem-RocksDB, respectively.ope
- …