1,128 research outputs found

A SimEvents Model for the Analysis of Scheduling and Memory Access Delays in Multicores

Author: Brandberg Caroline
Di Natale Marco
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Crossref

Archivio della ricerca della Scuola Superiore Sant'Anna

Investigation on AUTOSAR-Compliant Solutions for Many-Core Architectures

Author: Becker Matthias
Behnam Moris
Dasari Dakshina
Nelis Vincent
Nolte Thomas
Pinho Luís Miguel
Publication date: 01/01/2015

As of today, AUTOSAR is the de facto standard in the automotive industry, providing a common software architecture and development process for automotive applications. While this standard was originally written for single-core Electronic Control Units (ECUs), new guidelines and recommendations have been added recently to provide support for multicore architectures. This update came as a response to the steady increase in the number and complexity of the software functions embedded in modern vehicles, which call for the computing power of multicore execution environments. In this paper, we enumerate and analyze the design options and the challenges of porting AUTOSAR-based automotive applications onto multicore platforms. In particular, we investigate those options when considering the emerging many-core architectures that provide a more scalable environment than traditional multicore systems. Such platforms are suitable for massively parallel execution, and their design is better suited to partitioning and isolating the software components. Euromicro Conference on Digital System Design (DSD 2015), Funchal, Portugal

Repositório Científico do Instituto Politécnico do Porto

Crossref

Accelerating sequential programs using FastFlow and self-offloading

Author: Aldinucci Marco
Danelutto Marco
Kilpatrick Peter
Meneghin Massimiliano
Torquati Massimo
Publication date: 12/02/2010

FastFlow is a programming environment specifically targeting cache-coherent shared-memory multi-cores. FastFlow is implemented as a stack of C++ template libraries built on top of lock-free (fence-free) synchronization mechanisms. In this paper we present a further evolution of FastFlow enabling programmers to offload part of their workload on a dynamically created software accelerator running on unused CPUs. The offloaded function can be easily derived from pre-existing sequential code. We emphasize in particular the effective trade-off between human productivity and execution efficiency of the approach. Comment: 17 pages + cove

arXiv.org e-Print Archive

UnipiEprints

Reliable scalable symbolic computation: The design of SymGridPar2

Author: Maier Patrick
Stewart R.
Trinder P.W.
Publication venue: 'Elsevier BV'
Publication date: 01/04/2014

Symbolic computation is an important area of both Mathematics and Computer Science, with many large computations that would benefit from parallel execution. Symbolic computations are, however, challenging to parallelise as they have complex data and control structures, and both dynamic and highly irregular parallelism. The SymGridPar framework (SGP) has been developed to address these challenges on small-scale parallel architectures. However, the multicore revolution means that the number of cores and the number of failures are growing exponentially, and that the communication topology is becoming increasingly complex. Hence an improved parallel symbolic computation framework is required. This paper presents the design and initial evaluation of SymGridPar2 (SGP2), a successor to SymGridPar that is designed to provide scalability onto 10^5 cores, and hence also provide fault tolerance. We present the SGP2 design goals, principles and architecture. We describe how scalability is achieved using layering and by allowing the programmer to control task placement. We outline how fault tolerance is provided by supervising remote computations, and outline higher-level fault tolerance abstractions. We describe the SGP2 implementation status and development plans. We report the scalability and efficiency, including weak scaling to about 32,000 cores, and investigate the overheads of tolerating faults for simple symbolic computations.

Sheffield Hallam University Research Archive

Thermal-Aware Networked Many-Core Systems

Author: Vaddina Kameswar Rao
Publication venue: Turku Centre for Computer Science
Publication date: 23/05/2014

Advancements in IC processing technology have led to the innovation and growth happening in the consumer electronics sector and to the evolution of the IT infrastructure supporting this exponential growth. One of the most difficult obstacles to this growth is the removal of the large amount of heat generated by the processing and communicating nodes on the system. The scaling down of technology and the increase in power density have a direct and consequential effect on the rise in temperature. This has resulted in increased cooling budgets and affects both the lifetime reliability and the performance of the system. Hence, reducing on-chip temperatures has become a major design concern for modern microprocessors. This dissertation addresses the thermal challenges at different levels for both 2D planar and 3D stacked systems. It proposes a self-timed thermal monitoring strategy based on the liberal use of on-chip thermal sensors. This strategy uses noise-variation-tolerant, leakage-current-based thermal sensing for monitoring purposes. In order to study thermal management issues from early design stages, accurate thermal modeling and analysis at design time is essential. In this regard, the spatial temperature profile of the global Cu nanowire for on-chip interconnects has been analyzed. The dissertation also presents a 3D thermal model of a multicore system in order to investigate the effects of hotspots and of the placement of silicon die layers on the thermal performance of a modern flip-chip package. For a 3D stacked system, the primary design goal is to maximise the performance within the given power and thermal envelopes. Hence, a thermally efficient routing strategy for 3D NoC-Bus hybrid architectures has been proposed to mitigate on-chip temperatures by herding most of the switching activity to the die that is closer to the heat sink. Finally, an exploration of various thermal-aware placement approaches for both the 2D and 3D stacked systems has been presented.
Various thermal models have been developed and thermal control metrics have been extracted. An efficient thermal-aware application mapping algorithm for a 2D NoC has been presented. It has been shown that the proposed mapping algorithm reduces the effective area reeling under high temperatures when compared to the state of the art.

UTUPub

Reliable scalable symbolic computation: The design of SymGridPar2

Author: Maier Patrick
Stewart R.
Trinder P.W.
Publication venue: 'Elsevier BV'
Publication date: 01/04/2014

Heriot Watt Pure

Crossref

Stirling Online Research Repository (RIOXX)

Sheffield Hallam University Research Archive

Stirling Online Research Repository

A survey of techniques for reducing interference in real-time applications on multicore platforms

Author: Carretero Pérez Jesús
Fernández Muñoz Javier
Lozano Santiago
Lugo Tamara
Publication venue: IEEE
Publication date: 15/02/2022

This survey reviews the scientific literature on techniques for reducing interference in real-time multicore systems, focusing on the approaches proposed between 2015 and 2020. It also presents proposals that use interference reduction techniques without considering the predictability issue. The survey highlights interference sources and categorizes proposals from the perspective of the shared resource. It covers techniques for reducing contentions in main memory, cache memory, a memory bus, and the integration of interference effects into schedulability analysis. Every section contains an overview of each proposal and an assessment of its advantages and disadvantages. This work was supported in part by the Comunidad de Madrid Government "Nuevas Técnicas de Desarrollo de Software de Tiempo Real Embarcado Para Plataformas MPSoC de Próxima Generación" under Grant IND2019/TIC-17261.

Universidad Carlos III de Madrid e-Archivo

Performance Aspects of Synthesizable Computing Systems

Author: Schleuniger Pascal
Publication venue: Technical University of Denmark
Publication date: 01/01/2014

Online Research Database In Technology

Design of the Splash Programming Language Supporting Real-Time Stream Processing and Sensor Fusion for Autonomous Machines

Author: 노순현
Publication venue: Seoul National University Graduate School
Publication date: 01/02/2020

Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2020. 2. 홍성수.

Autonomous machines have begun to be widely used in various application domains due to recent remarkable advances in machine intelligence. As these autonomous machines are equipped with diverse sensors, multicore processors and distributed computing nodes, the complexity of the underlying software platform is increasing at a rapid pace, overwhelming the developers with implementation details. This leads to a demand for a new programming framework that has an easy-to-use programming abstraction. In this thesis, we present a graphical programming framework named Splash that explicitly addresses the programming challenges that arise during the development of an autonomous machine. We set four design goals to solve the challenges. First, Splash should provide an easy-to-use, effective programming abstraction. Second, it must support real-time stream processing for deep-learning-based machine intelligence. Third, it must provide programming support for the real-time control systems of autonomous machines, such as sensor fusion and mode change. Finally, it should support performance optimization of a software system running on a heterogeneous multicore distributed computing platform. Splash allows programmers to specify genuine, end-to-end timing constraints. Also, it provides a best-effort runtime system that tries to meet the annotated timing constraints and exception handling mechanisms to monitor the violation of such constraints. To implement these runtime mechanisms, Splash provides underlying timing semantics: (1) it provides an abstract global clock that is shared by machines in the distributed system and (2) it lets programmers write a birthmark on every stream data item. Splash offers a multithreaded process model to support concurrent programming. In the multithreaded process model, a programmer can write a multithreaded program using Splash threads, which we call sthreads.
An sthread is a logical entity of independent execution. In addition, Splash provides a language construct named build unit that allows programmers to allocate sthreads to processes and threads of an underlying operating system. Splash provides three additional language semantics to support real-time stream processing and real-time control systems. First, it provides rate control semantics to solve uncontrolled jitter and an unbounded FIFO queue problem due to the variability in communication delay and execution time. Second, it supports fusion semantics to handle timing issues caused by asynchronous sensors in the system. Finally, it provides mode change semantics to meet varying requirements in the real-time control systems. In this thesis, we describe each language semantics and the runtime mechanism that realizes it in detail. To show the utility of our framework, we have written a lane keeping assist system (LKAS) in Splash as an example. We evaluated rate control, sensor fusion, mode change and build unit-based allocation. First, using the rate controller, the jitter was reduced from 30.61 milliseconds to 1.66 milliseconds. Also, the average lateral deviation and heading angle were reduced from 0.180 meters to 0.016 meters and from 0.043 rad to 0.008 rad, respectively. Second, we showed that the fusion operator works as intended, with a run-time overhead of only 7 microseconds on average. Third, the mode change mechanism operated correctly and incurred a run-time overhead of only 0.53 milliseconds. Finally, as we increased the number of build units from 1 to 8, the average end-to-end latency increased from 75.79 microseconds to 2022.96 microseconds. These results show that the language semantics and runtime mechanisms proposed in this thesis are designed and implemented correctly, and that Splash can be used to effectively develop applications for an autonomous machine.

Owing to rapid advances in deep-learning-based machine intelligence, autonomous machines are being used in a variety of fields.
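Because these machines are equipped with diverse sensors, multicore processors, and distributed computing nodes, the complexity of the underlying software platform that supports them is increasing rapidly. Accordingly, there is a growing need for a programming framework that lets developers handle complex software structures effectively. This dissertation proposes Splash, a graphical programming framework for solving the problems that arise during the development of an autonomous machine. The name Splash is formed from the initial letters of the first three words of "stream processing language for autonomous machine". The name reflects the intent to develop a programming language and runtime system for handling stream data that flows like water. To handle complex software structures effectively, this dissertation sets four design goals. First, Splash must hide detailed implementation issues from developers and provide an easy-to-use programming abstraction. Second, Splash must support real-time stream processing for machine intelligence. Third, Splash must support functions widely used in real-time control systems, such as sensor fusion, mode change, and exception handling. Fourth, Splash must support performance optimization of software systems running on heterogeneous multicore distributed computing platforms. For real-time stream processing, Splash lets developers specify genuine end-to-end timing constraints in a program. It also provides a best-effort runtime system that is aware of the specified timing constraints and tries to meet them as far as possible, together with exception handling mechanisms that monitor and handle violations of those constraints. To implement these runtime mechanisms, Splash provides two basic timing semantics. First, it provides a global time base that all machines in the distributed system can share. Second, it has every stream data item that enters Splash record its own birthmark. Splash provides a multithreaded process model to support concurrent programming. A Splash programmer can develop programs using logical units of execution called sthreads. Splash also provides a language construct called the build unit, which helps allocate sthreads to processes and threads, the execution units of the actual operating system. Building on the timing semantics and the multithreaded process model, Splash supports three additional language semantics for real-time stream processing and real-time control systems. The first is rate control semantics, which addresses jitter and unbounded queue problems caused by communication or processing delays of stream data. The second is fusion semantics, which addresses timing issues caused by sensor inputs that are not temporally synchronized during sensor fusion. The last is mode change semantics, which supports changing the execution logic to meet the varying requirements of a control system. This dissertation describes each language semantics in detail and designs and implements the runtime mechanisms that realize them. To validate the utility of Splash, we developed an LKAS application in Splash and ran experiments with it on the Splash runtime system. We evaluated the rate control mechanism, the sensor fusion mechanism, the mode change mechanism, and build unit-based allocation, each with selected performance metrics.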
First, with the Splash rate controller, jitter decreased from 30.61 ms to 1.66 ms, and as a result the lateral deviation and heading angle of the driving vehicle improved from 0.180 m to 0.016 m and from 0.043 rad to 0.008 rad, respectively. Second, the fusion operator proposed for sensor fusion worked as designed and incurred a low overhead of only 7 µs on average. Third, we verified that the mode change function operated correctly, with a timing overhead of only 0.53 ms on average. Finally, for a synthetic workload, as the number of build units mapped to the components increased through 1, 2, 4, and 8, the average end-to-end latency increased to 75.79 µs, 330.80 µs, 591.87 µs, and 2022.96 µs. These results show that the language semantics and runtime mechanisms proposed in this dissertation were designed and implemented as intended, and that applications for autonomous machines can be developed effectively with them.

Chapter 1 Introduction
1.1 Motivation
1.2 Splash Overview
1.3 Organization of This Dissertation
Chapter 2 Related Work
2.1 Kahn Process Network
2.2 Firing Rule Applied to a Process
2.3 Programming Framework for an Autonomous Machine
2.4 Runtime Software for an Autonomous Machine
2.5 Rate Control
2.5.1 Traffic Shaping
2.5.2 Traffic Policing
2.6 Sensor Fusion
2.6.1 Measurement Fusion
2.6.2 Situation Fusion
2.7 Mode Change
2.7.1 Synchronous Mode Change
2.7.2 Asynchronous Mode Change
Chapter 3 Motivation and Contributions
3.1 Problem Description
3.2 Limitations of Kahn Process Network
3.3 Contributions of this Dissertation
Chapter 4 Underlying Timing Semantics of Splash
4.1 End-to-End Timing Constraints
4.2 Global Time Base and In-order Delivery
4.3 Integrating Three Distinct Computing Models
Chapter 5 Splash Language Constructs
5.1 Processing Component
5.2 Port
5.3 Channel and Clink
5.4 Fusion Operator
5.5 Factory and Mode Change
5.6 Build Unit
5.7 Exception Handling
Chapter 6 Splash Runtime Mechanisms
6.1 Rate Control Mechanism
6.2 Sensor Fusion Mechanism
6.3 Mode Change Mechanism
Chapter 7 Code Generation and Runtime System
7.1 Build Unit-based Allocation
7.2 Code Generation Template
7.3 Splash Runtime System
Chapter 8 Experimental Evaluation
8.1 LKAS Program
8.2 Experimental Environment
8.3 Evaluating Rate Control
8.4 Evaluating Sensor Fusion
8.5 Evaluating Mode Change
8.6 Evaluating Build Unit-based Allocation
Chapter 9 Conclusion
Bibliography
Abstract in Korean
Docto

SNU Open Repository and Archive