118 research outputs found

    Computer Architectures to Close the Loop in Real-time Optimization

    © 2015 IEEE. Many modern control, automation, signal processing and machine learning applications rely on solving a sequence of optimization problems, which are updated with measurements of a real system that evolves in time. The solutions of these optimization problems are then used to make decisions, which may be followed by changing some parameters of the physical system, thereby creating a feedback loop between the computing system and the physical system. Real-time optimization is not the same as fast optimization, because the computation is affected by an uncertain system that evolves in time. The suitability of a design should therefore not be judged from the optimality of a single optimization problem, but from the evolution of the entire cyber-physical system. The algorithms and hardware used for solving a single optimization problem in the office might therefore be far from ideal when solving a sequence of real-time optimization problems. Instead of there being a single, optimal design, one has to trade off a number of objectives, including performance, robustness, energy usage, size and cost. We therefore provide a tutorial introduction to some of the questions and implementation issues that arise in real-time optimization applications. We concentrate on some of the decisions that have to be made when designing the computing architecture and algorithm, and argue that the choice of one informs the other.

    Qduino: a cyber-physical programming platform for multicore Systems-on-Chip

    Emerging multicore Systems-on-Chip are enabling new cyber-physical applications such as autonomous drones, driverless cars and smart manufacturing using web-connected 3D printers. Common to those applications is a communicating task pipeline that acquires and processes sensor data and produces outputs that control actuators. As a result, these applications usually have timing requirements both for individual tasks and for the task pipelines formed for sensor data processing and actuation. Current cyber-physical programming platforms, such as Arduino and embedded Linux with the POSIX interface, do not allow application developers to specify those timing requirements. Moreover, none of them provides a programming interface to schedule tasks and map them to processor cores, while managing I/O in a predictable manner, on multicore hardware platforms. Hence, this thesis presents the Qduino programming platform. Qduino adopts the simplicity of the Arduino API, with additional support for real-time multithreaded sketches on multicore architectures. Qduino allows application developers to specify timing properties of individual tasks as well as task pipelines at the design stage. To this end, we propose a mathematical framework to derive each task’s budget and period from the specified end-to-end timing requirements. The second part of the thesis is motivated by the observation that at the center of these pipelines are tasks that typically require complex software support, such as sensor data fusion or image processing algorithms. Such software usually takes many man-years of engineering effort to develop and is therefore commonly found on General-Purpose Operating Systems (GPOS). Therefore, in order to support modern, intelligent cyber-physical applications, we enhance the Qduino platform’s extensibility by taking advantage of the Quest-V virtualized partitioning kernel. The platform’s usability is demonstrated by building a novel web-connected 3D printer and a prototypical autonomous drone framework in Qduino.

    Energy Concerns with HPC Systems and Applications

    For various reasons, including those related to climate change, energy has become a critical concern in all relevant activities and technical designs. For the specific case of computer activities, the problem is exacerbated by the emergence and pervasiveness of so-called intelligent devices. From the application side, we point out the special topic of Artificial Intelligence, which clearly needs efficient computing support in order to succeed in its purpose of being a ubiquitous assistant. There are mainly two contexts where energy is one of the top-priority concerns: embedded computing and supercomputing. For the former, power consumption is critical because the amount of energy available to the devices is limited. For the latter, the heat dissipated is a serious source of failure, and the financial cost related to energy is likely to be a significant part of the maintenance budget. On a single computer, the problem is commonly considered through the electrical power consumption. In this paper, written in the form of a survey, we depict the landscape of energy concerns in computer activities, from both the hardware and the software standpoints. Comment: 20 pages

    Multi-core devices for safety-critical systems: a survey

    Multi-core devices are envisioned to support the development of next-generation safety-critical systems, enabling the on-chip integration of functions of different criticality. This integration provides multiple potential system-level benefits such as cost, size, power, and weight reduction. However, safety certification becomes a challenge and several fundamental safety technical requirements must be addressed, such as temporal and spatial independence, reliability, and diagnostic coverage. This survey provides a categorization and overview at different device abstraction levels (nanoscale, component, and device) of selected key research contributions that support compliance with these fundamental safety requirements. This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under grant TIN2015-65316-P, Basque Government under grant KK-2019-00035 and the HiPEAC Network of Excellence. The Spanish Ministry of Economy and Competitiveness has also partially supported Jaume Abella under Ramon y Cajal postdoctoral fellowship (RYC-2013-14717). Peer Reviewed. Postprint (author's final draft)

    SIMD-Swift: Improving Performance of Swift Fault Detection

    The general tendency in modern hardware is an increase in fault rates, caused by decreasing operating voltages and feature sizes. Previously, hardware faults were mainly addressed only in high-availability enterprise servers and in safety-critical applications, such as the transport or aerospace domains. These fields generally have very tight requirements, but also higher budgets. However, as fault rates increase, fault tolerance solutions are starting to be required in applications that have much smaller profit margins. This brings to the fore the idea of software-implemented hardware fault tolerance, that is, the ability to detect and tolerate hardware faults using software-based techniques on commodity CPUs, which provides resilience almost for free. Current solutions, however, are lacking in performance, even though they show quite good fault tolerance results. This thesis explores the idea of using Single Instruction Multiple Data (SIMD) technology to execute all of a program's operations on two copies of the same data. This idea is based on the observation that SIMD is ubiquitous in modern CPUs and is usually an underutilized resource. It allows us to detect bit-flips in hardware by a simple comparison of the two copies, under the assumption that only one copy is affected by a fault. We implemented this idea as a source-to-source compiler which hardens a program at the source code level. The evaluation of our several implementations shows that the approach is beneficial for applications that are dominated by arithmetic or logical operations, but those that have more control-flow or memory operations actually perform better with regular instruction replication. For example, we achieved only 15% performance overhead on a Fast Fourier Transform benchmark, which is dominated by arithmetic instructions, but the memory-access-dominated Dijkstra algorithm showed a high overhead of 200%.
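
    The duplicate-and-compare idea in the abstract above can be illustrated with a small sketch. This is not the thesis's source-to-source compiler and uses no real SIMD instructions: the `Dup` class, `flip_bit`, and `hardened_dot` are hypothetical names, and a pair of Python integers merely stands in for two SIMD lanes holding copies of the same value.

```python
from dataclasses import dataclass


class FaultDetected(Exception):
    """Raised when the two copies of a value disagree."""


@dataclass(frozen=True)
class Dup:
    """An integer kept as two copies, standing in for two SIMD lanes (assumption)."""
    lane0: int
    lane1: int

    @classmethod
    def of(cls, value: int) -> "Dup":
        # Duplicate the input into both lanes.
        return cls(value, value)

    def __add__(self, other: "Dup") -> "Dup":
        # One "instruction" updates both lanes, as a SIMD add would.
        return Dup(self.lane0 + other.lane0, self.lane1 + other.lane1)

    def __mul__(self, other: "Dup") -> "Dup":
        return Dup(self.lane0 * other.lane0, self.lane1 * other.lane1)

    def check(self) -> int:
        # Compare the copies; a mismatch means one lane was corrupted.
        if self.lane0 != self.lane1:
            raise FaultDetected(f"lanes differ: {self.lane0} != {self.lane1}")
        return self.lane0


def flip_bit(value: Dup, bit: int) -> Dup:
    """Simulate a hardware fault by flipping one bit in lane 1 only."""
    return Dup(value.lane0, value.lane1 ^ (1 << bit))


def hardened_dot(xs, ys) -> Dup:
    """Dot product in which every operation runs on both copies."""
    acc = Dup.of(0)
    for x, y in zip(xs, ys):
        acc = acc + Dup.of(x) * Dup.of(y)
    return acc


if __name__ == "__main__":
    result = hardened_dot([1, 2, 3], [4, 5, 6])
    print(result.check())
    try:
        flip_bit(result, 3).check()
    except FaultDetected as err:
        print("fault detected:", err)
```

    The sketch relies on the same assumption the abstract states: a fault corrupts only one of the two copies, so a plain equality check at the point of use is enough to detect it.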

    Development and certification of mixed-criticality embedded systems based on probabilistic timing analysis

    An increasing variety of emerging systems relentlessly replaces or augments the functionality of mechanical subsystems with embedded electronics. Given their quantity, complexity, and use, the safety of such subsystems is an increasingly important matter. Accordingly, those systems are subject to safety certification to demonstrate the system's safety through rigorous development processes and hardware/software constraints. The massive increase in the complexity of embedded processors makes the arduous certification task significantly harder to achieve. The focus of this thesis is to address the certification challenges in multicore architectures: despite their potential to integrate several applications on a single platform, their inherent complexity imperils their timing predictability and certification. Recently, the Measurement-Based Probabilistic Timing Analysis (MBPTA) technique emerged as an alternative to deal with hardware/software complexity. The innovation that MBPTA brings about is, however, a major step away from current certification procedures and standards. The particular contributions of this Thesis include: (i) the definition of certification arguments for mixed-criticality integration upon multicore processors. In particular, we propose a set of safety mechanisms and procedures as required to comply with functional safety standards. For timing predictability, (ii) we present a quantitative approach to assess the likelihood of execution-time exceedance events with respect to the risk reduction requirements of safety standards. To this end, we build upon the MBPTA approach and present the design of a safety-related source of randomization (SoR), which plays a key role in the platform-level randomization needed by MBPTA. And (iii) we evaluate current certification guidance with respect to emerging high-performance design trends like caches. 
Overall, this Thesis pushes the certification limits in the use of multicore and MBPTA technology in Critical Real-Time Embedded Systems (CRTES) and paves the way towards their adoption in industry. A growing variety of emerging systems replace or augment the functionality of mechanical subsystems with embedded electronic components. The increase in the number and complexity of these electronic subsystems, together with their purpose, makes their safety a matter of growing importance. Accordingly, the commercialization of these critical systems is subject to rigorous certification processes in which system safety is guaranteed through strict constraints on the development process and on the design of hardware and software. This thesis addresses the new challenges and difficulties posed by the introduction of multicore processors in such critical systems: although their higher performance attracts industry interest in integrating multiple applications on a single platform, they entail greater complexity. Their architecture challenges timing analysis by traditional methods and, likewise, their certification is increasingly complex and costly. To deal with these limitations, a novel measurement-based probabilistic timing analysis technique (MBPTA) has recently been developed. The innovation of this technique, however, represents a major cultural shift with respect to traditional certification standards and procedures. Along these lines, the contributions of this thesis are grouped into three main axes: (i) the definition of safety arguments for the certification of mixed-criticality applications on multicore platforms. In particular, safety mechanisms, diagnostic techniques, and fault reactions are defined in accordance with the IEC 61508 standard on a reference multicore architecture. 
Regarding timing analysis, (ii) we present the quantification of the probability of exceeding a time bound and its relation to the risk reduction requirements derived from functional safety standards. To this end, we build on the MBPTA technique and present the design of a safe random number source, a key component for achieving the randomization properties required by MBPTA at the platform level. Finally, (iii) we extrapolate current guidance for the certification of multicore architectures to a commercial 8-core solution and evaluate it with respect to emerging high-performance design trends (caches). With these contributions, this thesis addresses the challenges that the use of multicore processors and MBPTA implies for the certification process of critical real-time systems and thereby facilitates their adoption by industry. Postprint (published version)

    Design of the Splash Programming Language Supporting Real-Time Stream Processing and Sensor Fusion for Autonomous Machines

    Doctoral dissertation, Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, February 2020. Seongsoo Hong. Autonomous machines have begun to be widely used in various application domains due to recent remarkable advances in machine intelligence. As these autonomous machines are equipped with diverse sensors, multicore processors and distributed computing nodes, the complexity of the underlying software platform is increasing at a rapid pace, overwhelming developers with implementation details. This leads to a demand for a new programming framework that has an easy-to-use programming abstraction. In this thesis, we present a graphical programming framework named Splash that explicitly addresses the programming challenges that arise during the development of an autonomous machine. We set four design goals to solve the challenges. First, Splash should provide an easy-to-use, effective programming abstraction. Second, it must support real-time stream processing for deep-learning-based machine intelligence. Third, it must provide programming support for the real-time control systems of autonomous machines, such as sensor fusion and mode change. Finally, it should support performance optimization of software systems running on a heterogeneous multicore distributed computing platform. Splash allows programmers to specify genuine, end-to-end timing constraints. It also provides a best-effort runtime system that tries to meet the annotated timing constraints, together with exception handling mechanisms that monitor violations of such constraints. To implement these runtime mechanisms, Splash provides underlying timing semantics: (1) it provides an abstract global clock that is shared by machines in the distributed system and (2) it lets programmers write a birthmark on every stream data item. Splash offers a multithreaded process model to support concurrent programming. In the multithreaded process model, a programmer can write a multithreaded program using Splash threads, which we call sthreads. 
An sthread is a logical entity of independent execution. In addition, Splash provides a language construct named build unit that allows programmers to allocate sthreads to processes and threads of an underlying operating system. Splash provides three additional language semantics to support real-time stream processing and real-time control systems. First, it provides rate control semantics to solve the uncontrolled jitter and unbounded FIFO queue problems caused by variability in communication delay and execution time. Second, it supports fusion semantics to handle timing issues caused by asynchronous sensors in the system. Finally, it provides mode change semantics to meet varying requirements in real-time control systems. In this paper, we describe each language semantics and the runtime mechanism that realizes it in detail. To show the utility of our framework, we have written a lane keeping assist system (LKAS) in Splash as an example. We evaluated rate control, sensor fusion, mode change and build unit-based allocation. First, using the rate controller, the jitter was reduced from 30.61 milliseconds to 1.66 milliseconds. Also, the average lateral deviation and heading angle were reduced from 0.180 meters to 0.016 meters and from 0.043 rad to 0.008 rad, respectively. Second, we showed that the fusion operator works as intended, with a run-time overhead of only 7 microseconds on average. Third, the mode change mechanism operated correctly and incurred a run-time overhead of only 0.53 milliseconds. Finally, as we increased the number of build units from 1 to 8, the average end-to-end latency increased from 75.79 microseconds to 2022.96 microseconds. These results show that the language semantics and runtime mechanisms proposed in this thesis are designed and implemented correctly, and that Splash can be used to effectively develop applications for an autonomous machine. Thanks to the rapid progress of deep-learning-based machine intelligence, autonomous machines are being used in a variety of fields. 
Because these machines are equipped with diverse sensors, multicore processors, and distributed computing nodes, the complexity of the underlying software platform needed to support them is increasing rapidly. Accordingly, there is a growing need for a programming framework that lets developers handle complex software structures effectively. This dissertation proposes Splash, a graphical programming framework for solving the problems that arise during the development of an autonomous machine. The name Splash is formed from the initial letters of the first three words of "stream processing language for autonomous machine". The name reflects the intention to develop a programming language and runtime system for handling stream data that flows like water. In this dissertation, we set four design goals for handling complex software structures effectively. First, Splash should hide detailed implementation issues from developers and provide an easy-to-use programming abstraction. Second, Splash must support real-time stream processing for machine intelligence. Third, Splash must provide support for functions widely used in real-time control systems, such as sensor fusion, mode change, and exception handling. Fourth, Splash should support performance optimization of software systems running on heterogeneous multicore distributed computing platforms. For real-time stream processing, Splash lets developers specify genuine end-to-end timing constraints in the program. It also provides a best-effort runtime system that is aware of the specified timing constraints and tries to meet them, along with an exception handling mechanism that monitors and handles violations of the timing constraints. To implement these runtime mechanisms, Splash provides two underlying timing semantics. First, it provides a global time base that all machines in the distributed system can share. Second, it has every stream data item entering Splash record its own birthmark. Splash provides a multithreaded processing model to support concurrent programming. A Splash programmer can develop programs using logical units of execution called sthreads. Splash also provides a language construct called the build unit to help allocate sthreads to processes and threads, the execution units of the actual operating system. Building on the timing semantics and the multithreaded processing model, Splash additionally supports three language semantics for real-time stream processing and real-time control systems. The first is rate control semantics, which solves the jitter and unbounded queue problems caused by communication or processing delays of stream data. The second is fusion semantics, which resolves timing issues caused by temporally unsynchronized sensor inputs during sensor fusion. The last is mode change semantics, which supports changing the execution logic to meet the varying requirements of control systems. This dissertation describes each language semantics in detail and designs and implements the runtime mechanisms that realize them. To verify the utility of Splash, we developed an LKAS application in Splash and ran experiments on the Splash runtime system. We verified the rate control mechanism, the sensor fusion mechanism, the mode change mechanism, and build unit-based allocation, each with selected performance metrics. First, using Splash's rate controller reduced jitter from 30.61 ms to 1.66 ms, which in turn improved the vehicle's lateral deviation and heading angle from 0.180 m to 0.016 m and from 0.043 rad to 0.008 rad, respectively. Second, the fusion operator proposed for sensor fusion works as designed and incurs a low overhead of only 7 µs on average. Third, we verified correct operation of the mode change function, whose temporal overhead was only 0.53 ms on average. Finally, for a synthetic workload, as the number of build units mapped to components increased through 1, 2, 4, and 8, the average end-to-end latency increased to 75.79 µs, 330.80 µs, 591.87 µs, and 2022.96 µs. These results show that the language semantics and runtime mechanisms proposed in this dissertation are designed and implemented as intended, and that they make it possible to develop applications for autonomous machines effectively.
Contents:
Chapter 1 Introduction: 1.1 Motivation; 1.2 Splash Overview; 1.3 Organization of This Dissertation
Chapter 2 Related Work: 2.1 Kahn Process Network; 2.2 Firing Rule Applied to a Process; 2.3 Programming Framework for an Autonomous Machine; 2.4 Runtime Software for an Autonomous Machine; 2.5 Rate Control (2.5.1 Traffic Shaping, 2.5.2 Traffic Policing); 2.6 Sensor Fusion (2.6.1 Measurement Fusion, 2.6.2 Situation Fusion); 2.7 Mode Change (2.7.1 Synchronous Mode Change, 2.7.2 Asynchronous Mode Change)
Chapter 3 Motivation and Contributions: 3.1 Problem Description; 3.2 Limitations of Kahn Process Network; 3.3 Contributions of this Dissertation
Chapter 4 Underlying Timing Semantics of Splash: 4.1 End-to-End Timing Constraints; 4.2 Global Time Base and In-order Delivery; 4.3 Integrating Three Distinct Computing Models
Chapter 5 Splash Language Constructs: 5.1 Processing Component; 5.2 Port; 5.3 Channel and Clink; 5.4 Fusion Operator; 5.5 Factory and Mode Change; 5.6 Build Unit; 5.7 Exception Handling
Chapter 6 Splash Runtime Mechanisms: 6.1 Rate Control Mechanism; 6.2 Sensor Fusion Mechanism; 6.3 Mode Change Mechanism
Chapter 7 Code Generation and Runtime System: 7.1 Build Unit-based Allocation; 7.2 Code Generation Template; 7.3 Splash Runtime System
Chapter 8 Experimental Evaluation: 8.1 LKAS Program; 8.2 Experimental Environment; 8.3 Evaluating Rate Control; 8.4 Evaluating Sensor Fusion; 8.5 Evaluating Mode Change; 8.6 Evaluating Build Unit-based Allocation
Chapter 9 Conclusion
Bibliography
Abstract in Korean
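
    The timing semantics described in the Splash abstract (a clock shared by all machines, plus a birthmark written on every stream data item) suggest how an end-to-end timing constraint could be monitored. The following is only a rough sketch under stated assumptions, not Splash's actual API: `StreamItem`, `ManualClock`, `stamp`, and `check_deadline` are hypothetical names, and a manually advanced counter in microseconds stands in for the global clock.

```python
from dataclasses import dataclass
from typing import Any


class DeadlineViolation(Exception):
    """Raised when an item exceeds its end-to-end latency bound."""


@dataclass(frozen=True)
class StreamItem:
    payload: Any
    birthmark_us: int  # time the item entered the system, on the shared clock


class ManualClock:
    """Stand-in for a shared global clock, in microseconds (assumption)."""

    def __init__(self) -> None:
        self.now_us = 0

    def advance(self, delta_us: int) -> None:
        self.now_us += delta_us

    def __call__(self) -> int:
        return self.now_us


def stamp(payload: Any, clock: ManualClock) -> StreamItem:
    """Attach a birthmark to a payload when it enters the pipeline."""
    return StreamItem(payload, clock())


def check_deadline(item: StreamItem, clock: ManualClock, max_latency_us: int) -> int:
    """Return the item's end-to-end latency, or raise if the bound is exceeded."""
    latency_us = clock() - item.birthmark_us
    if latency_us > max_latency_us:
        raise DeadlineViolation(
            f"latency {latency_us} us exceeds bound {max_latency_us} us"
        )
    return latency_us


if __name__ == "__main__":
    clock = ManualClock()
    frame = stamp("camera frame", clock)
    clock.advance(20_000)  # pretend processing took 20 ms
    print(check_deadline(frame, clock, max_latency_us=50_000))
```

    Because the birthmark travels with the item, the check at the end of the pipeline needs no knowledge of how many stages the item passed through, only a clock that agrees with the one used at the source.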

    Thermal Balancing Policy for Streaming Computing on Multiprocessor Architectures

    As feature sizes decrease, power dissipation and heat generation density increase exponentially. Thus, temperature gradients in Multiprocessor Systems on Chip (MPSoCs) can seriously impact system performance and reliability. Thermal balancing policies based on task migration have been proposed to modulate power distribution between processing cores to achieve temperature flattening. However, in the context of MPSoCs for multimedia streaming computing, where timeliness is critical, the impact of migration on quality of service must be carefully analyzed. In this paper we present the design and implementation of a lightweight thermal balancing policy that reduces on-chip temperature gradients via task migration. This policy exploits run-time temperature and load information to balance the chip temperature. Moreover, we assess the effectiveness of the proposed policy for streaming computing architectures by analyzing deadline misses and architectural thermal effects of task migration using a cycle-accurate thermal-aware emulation infrastructure. Our results using a real-life software-defined radio multitask benchmark show that our policy achieves thermal balancing while keeping migration costs bounded.