Search CORE

450 research outputs found

Schedulability Analysis for Multi-Core Systems Accounting for Resource Stress and Sensitivity

Author: Bate Iain
Davis Robert I.
Griffin David
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 33rd Euromicro Conference on Real-Time Systems (ECRTS 2021)
Publication date: 01/01/2021
Field of study

Timing verification of multi-core systems is complicated by contention for shared hardware resources between co-running tasks on different cores. This paper introduces the Multi-core Resource Stress and Sensitivity (MRSS) task model that characterizes how much stress each task places on resources and how much it is sensitive to such resource stress. This model facilitates a separation of concerns, thus retaining the advantages of the traditional two-step approach to timing verification (i.e. timing analysis followed by schedulability analysis). Response time analysis is derived for the MRSS task model, providing efficient context-dependent and context independent schedulability tests for both fixed priority preemptive and fixed priority non-preemptive scheduling. Dominance relations are derived between the tests, and proofs of optimal priority assignment provided. The MRSS task model is underpinned by a proof-of-concept industrial case study

Dagstuhl Research Online Publication Server

Network Time with a Consensus on Clock

Author: Handan Kilinc Alper
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 07/06/2021
Field of study

Decentralized protocols which require synchronous communication usually achieve it with the help of the time that computer clocks show. These clocks are mostly adjusted by centralized systems such as Network Time Protocol (NTP) because these adjustments are indispensable to reduce the effects of random drifts on clocks. On the other hand, an attack on these systems (which has happened in the past) can cause corruption of the protocols which rely on the time data that they provide to preserve synchronicity. So, we are facing the dilemma of relying on a centralized solution to adjust our timers or risking the security of our decentralized protocols. In this paper, we propose a Global Universal Composable (GUC) model for the physical clock synchronization problem in the decentralized systems by modeling the notion of consensus on clocks. Consensus on clocks is agreed upon considering the local clocks of all parties in a protocol which are possibly drifted. In this way, we model the functionality that e.g. NTP provides in a decentralized manner. In the end, we give a simple but useful protocol relying on a blockchain network that realizes our model. Our protocol can be used by the full nodes of a blockchain that need synchronous clocks in the real world to preserve the correctness and the security of the blockchain protocol. One advantage of our protocol is that it does not cause any extra communication overhead on the underlying blockchain protocol

Physical-Layer Security, Quantum Key Distribution and Post-quantum Cryptography

Author
Publication venue: 'MDPI AG'
Publication date: 16/09/2022
Field of study

The growth of data-driven technologies, 5G, and the Internet place enormous pressure on underlying information infrastructure. There exist numerous proposals on how to deal with the possible capacity crunch. However, the security of both optical and wireless networks lags behind reliable and spectrally efficient transmission. Significant achievements have been made recently in the quantum computing arena. Because most conventional cryptography systems rely on computational security, which guarantees the security against an efficient eavesdropper for a limited time, with the advancement in quantum computing this security can be compromised. To solve these problems, various schemes providing perfect/unconditional security have been proposed including physical-layer security (PLS), quantum key distribution (QKD), and post-quantum cryptography. Unfortunately, it is still not clear how to integrate those different proposals with higher level cryptography schemes. So the purpose of the Special Issue entitled “Physical-Layer Security, Quantum Key Distribution and Post-quantum Cryptography” was to integrate these various approaches and enable the next generation of cryptography systems whose security cannot be broken by quantum computers. This book represents the reprint of the papers accepted for publication in the Special Issue

Resource-aware Programming in a High-level Language - Improved performance with manageable effort on clustered MPSoCs

Author: Zwinkau Andreas
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2018
Field of study

Bis 2001 bedeutete Moores und Dennards Gesetz eine Verdoppelung der Ausführungszeit alle 18 Monate durch verbesserte CPUs. Heute ist Nebenläufigkeit das dominante Mittel zur Beschleunigung von Supercomputern bis zu mobilen Geräten. Allerdings behindern neuere Phänomene wie "Dark Silicon" zunehmend eine weitere Beschleunigung durch Hardware. Um weitere Beschleunigung zu erreichen muss sich auch die Software mehr ihrer Hardware Resourcen gewahr werden. Verbunden mit diesem Phänomen ist eine immer heterogenere Hardware. Supercomputer integrieren Beschleuniger wie GPUs. Mobile SoCs (bspw. Smartphones) integrieren immer mehr Fähigkeiten. Spezialhardware auszunutzen ist eine bekannte Methode, um den Energieverbrauch zu senken, was ein weiterer wichtiger Aspekt ist, welcher mit der reinen Geschwindigkeit abgewogen werde muss. Zum Beispiel werden Supercomputer auch nach "Performance pro Watt" bewertet. Zur Zeit sind systemnahe low-level Programmierer es gewohnt über Hardware nachzudenken, während der gemeine high-level Programmierer es vorzieht von der Plattform möglichst zu abstrahieren (bspw. Cloud). "High-level" bedeutet nicht, dass Hardware irrelevant ist, sondern dass sie abstrahiert werden kann. Falls Sie eine Java-Anwendung für Android entwickeln, kann der Akku ein wichtiger Aspekt sein. Irgendwann müssen aber auch Hochsprachen resourcengewahr werden, um Geschwindigkeit oder Energieverbrauch zu verbessern. Innerhalb des Transregio "Invasive Computing" habe ich an diesen Problemen gearbeitet. In meiner Dissertation stelle ich ein Framework vor, mit dem man Hochsprachenanwendungen resourcengewahr machen kann, um so die Leistung zu verbessern. Das könnte beispielsweise erhöhte Effizienz oder schnellerer Ausführung für das System als Ganzes bringen. Ein Kerngedanke dabei ist, dass Anwendungen sich nicht selbst optimieren. Stattdessen geben sie alle Informationen an das Betriebssystem. Das Betriebssystem hat eine globale Sicht und trifft Entscheidungen über die Resourcen. Diesen Prozess nennen wir "Invasion". Die Aufgabe der Anwendung ist es, sich an diese Entscheidungen anzupassen, aber nicht selbst welche zu fällen. Die Herausforderung besteht darin eine Sprache zu definieren, mit der Anwendungen Resourcenbedingungen und Leistungsinformationen kommunizieren. So eine Sprache muss ausdrucksstark genug für komplexe Informationen, erweiterbar für neue Resourcentypen, und angenehm für den Programmierer sein. Die zentralen Beiträge dieser Dissertation sind: Ein theoretisches Modell der Resourcen-Verwaltung, um die Essenz des resourcengewahren Frameworks zu beschreiben, die Korrektheit der Entscheidungen des Betriebssystems bezüglich der Bedingungen einer Anwendung zu begründen und zum Beweis meiner Thesen von Effizienz und Beschleunigung in der Theorie. Ein Framework und eine Übersetzungspfad resourcengewahrer Programmierung für die Hochsprache X10. Zur Bewertung des Ansatzes haben wir Anwendungen aus dem High Performance Computing implementiert. Eine Beschleunigung von 5x konnte gemessen werden. Ein Speicherkonsistenzmodell für die X10 Programmiersprache, da dies ein notwendiger Schritt zu einer formalen Semantik ist, die das theoretische Modell und die konkrete Implementierung verknüpft. Zusammengefasst zeige ich, dass resourcengewahre Programmierung in Hoch\-sprachen auf zukünftigen Architekturen mit vielen Kernen mit vertretbarem Aufwand machbar ist und die Leistung verbessert

OpenCL Actors - Adding Data Parallelism to Actor-based Programming with CAF

Author: A Klöckner
D Charousset
G Agha
G Agha
J Nickolls
JD Owens
K Wu
L Dagum
S Srinivasan
S Wienke
T Desell
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

The actor model of computation has been designed for a seamless support of concurrency and distribution. However, it remains unspecific about data parallel program flows, while available processing power of modern many core hardware such as graphics processing units (GPUs) or coprocessors increases the relevance of data parallelism for general-purpose computation. In this work, we introduce OpenCL-enabled actors to the C++ Actor Framework (CAF). This offers a high level interface for accessing any OpenCL device without leaving the actor paradigm. The new type of actor is integrated into the runtime environment of CAF and gives rise to transparent message passing in distributed systems on heterogeneous hardware. Following the actor logic in CAF, OpenCL kernels can be composed while encapsulated in C++ actors, hence operate in a multi-stage fashion on data resident at the GPU. Developers are thus enabled to build complex data parallel programs from primitives without leaving the actor paradigm, nor sacrificing performance. Our evaluations on commodity GPUs, an Nvidia TESLA, and an Intel PHI reveal the expected linear scaling behavior when offloading larger workloads. For sub-second duties, the efficiency of offloading was found to largely differ between devices. Moreover, our findings indicate a negligible overhead over programming with the native OpenCL API.Comment: 28 page

arXiv.org e-Print Archive

Advances in parallel programming for electronic design automation

Author: Lin Chun-Xun
Publication venue
Publication date: 01/08/2020
Field of study

The continued miniaturization of the technology node increases not only the chip capacity but also the circuit design complexity. How does one efficiently design a chip with millions or billions transistors? This has become a challenging problem in the integrated circuit (IC) design industry, especially for the developers of electronic design automation (EDA) tools. To boost the performance of EDA tools, one promising direction is via parallel computing. In this dissertation, we explore different parallel computing approaches, from CPU to GPU to distributed computing, for EDA applications. Nowadays multi-core processors are prevalent from mobile devices to laptops to desktop, and it is natural for software developers to utilize the available cores to maximize the performance of their applications. Therefore, in this dissertation we first focus on multi-threaded programming. We begin by reviewing a C++ parallel programming library called Cpp-Taskflow. Cpp-Taskflow is designed to facilitate programming parallel applications, and has been successfully applied to an EDA timing analysis tool. We will demonstrate Cpp-Taskflow’s programming model and interface, software architecture and execution flow. Then, we improve Cpp-Taskflow in several aspects. First, we enhance Cpp-Taskflow’s usability through restructuring the software architecture. Second, we introduce task graph composition to support composability and modularity, which makes it easier for users to construct large and complex parallel patterns. Third, we add a new task type in Cpp-Taskflow to let users control the graph execution flow. This feature empowers the graph model with the ability to describe complex control flow. Aside from the above enhancements, we have designed a new scheduler to adaptively manage the threads based on available parallelism. The new scheduler uses a simple and effective strategy which can not only prevent resource from being underutilized, but also mitigate resource over-subscription. We have evaluated the new scheduler on both micro-benchmarks and a very-large-scale integration (VLSI) application, and the results show that the new scheduler can achieve good performance and is very energy-efficient. Next we study the applicability of heterogeneous computing, specifically the graphics processing unit (GPU), to EDA. We demonstrate how to use GPU to accelerate VLSI placement, and we show that GPU can bring substantial performance gain to VLSI placement. Finally, as the design size keeps increasing, a more scalable solution will be distributed computing. We introduce a distributed power grid analysis framework built on top of DtCraft. This framework allows users to flexibly partition the design and automatically deploy the computations across several machines. In addition, we propose a job scheduler that can efficiently utilize cluster resource to improve the framework’s performance

Practical Condition Synchronization for Transactional Memory

Author: Wang Chao
Publication venue: Lehigh Preserve
Publication date
Field of study

Few transactional memory implementations allow for condition synchronization among transactions. The problems are many, most notably the lack of consensus about a single appropriate linguistic construct, and the lack of mechanisms that are compatible with hardware transactional memory. In this thesis, we introduce a broadly useful mechanism for supporting condition synchronization among transactions. Our mechanism supports a number of linguistic constructs for coordinating transactions, and does so without introducing overhead on in-flight hardware transactions. Experiments show that our mechanisms work well, and that the diversity of linguistic constructs allows programmers to chose the technique that is best suited to a particular application

Lehigh University: Lehigh Preserve