Search CORE

123 research outputs found

Heterogeneity-aware scheduling and data partitioning for system performance acceleration

Author: Yu Teng
Publication venue: The University of St Andrews
Publication date: 14/04/2020
Field of study

Over the past decade, heterogeneous processors and accelerators have become increasingly prevalent in modern computing systems. Compared with previous homogeneous parallel machines, the hardware heterogeneity in modern systems provides new opportunities and challenges for performance acceleration. Classic operating system optimisation problems such as task scheduling, and application-specific optimisation techniques such as the adaptive data partitioning of parallel algorithms, must work together to address hardware heterogeneity. Significant effort has been invested in this problem, but existing work either focuses on a specific type of heterogeneous system or algorithm, or offers a high-level framework without insight into how heterogeneity differs between types of system. A general software framework is required, which can not only be adapted to multiple types of systems and workloads, but is also equipped with the techniques to address a variety of hardware heterogeneity. This thesis presents approaches to designing general heterogeneity-aware software frameworks for system performance acceleration. It covers a wide variety of systems, including an OS scheduler targeting on-chip asymmetric multi-core processors (AMPs) on mobile devices, a hierarchical many-core supercomputer, and multi-FPGA systems for high performance computing (HPC) centers. Considering heterogeneity from on-chip AMPs, such as thread criticality, core sensitivity, and relative fairness, it suggests a collaborative approach that co-designs the task selector and core allocator of the OS scheduler. Considering the typical sources of heterogeneity in HPC systems, such as the memory hierarchy, bandwidth limitations and asymmetric physical connection, it proposes an application-specific automatic data partitioning method for a modern supercomputer, and a schedule based on a topological-ranking heuristic for a multi-FPGA based reconfigurable cluster.
Experiments on both a full system simulator (GEM5) and real systems (Sunway TaihuLight Supercomputer and Xilinx Multi-FPGA based clusters) demonstrate the significant advantages of the suggested approaches compared against the state-of-the-art on a variety of workloads.
"This work is supported by St Leonards 7th Century Scholarship and Computer Science PhD funding from University of St Andrews; by UK EPSRC grant Discovery: Pattern Discovery and Program Shaping for Manycore Systems (EP/P020631/1)." -- Acknowledgement

University of St. Andrews - Pure

St Andrews Research Repository

Adaptive Dispatching of Tasks in the Cloud

Author: Gelenbe Erol
Wang Lan
Publication venue
Publication date: 03/01/2015
Field of study

The increasingly wide application of Cloud Computing enables the consolidation of tens of thousands of applications in shared infrastructures. Thus, meeting the quality of service requirements of so many diverse applications in such shared resource environments has become a real challenge, especially since the characteristics and workload of applications differ widely and may change over time. This paper presents an experimental system that can exploit a variety of online, quality-of-service-aware adaptive task allocation schemes. Three such schemes are designed and compared: a measurement-driven algorithm that uses reinforcement learning, a "sensible" allocation algorithm that assigns jobs to sub-systems that are observed to provide a lower response time, and an algorithm that splits the job arrival stream into sub-streams at rates computed from the hosts' processing capabilities. All of these schemes are compared via measurements, among themselves and with a simple round-robin scheduler, on two experimental test-beds with homogeneous and heterogeneous hosts having different processing capacities.
Comment: 10 pages, 9 figures
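As a rough illustration of two of the allocation schemes described in this abstract, the Python sketch below turns measurements into host-selection probabilities. It is not the authors' implementation: the function names are hypothetical, and weighting hosts by the inverse of their observed response time is only one assumed reading of "sensible" allocation. The reinforcement-learning scheme is not sketched.

```python
import random
from itertools import cycle


def sensible_weights(observed_response_times):
    """Selection probabilities that favour hosts with lower observed response time.

    Assumption (not stated in the abstract): the probability of choosing a
    host is proportional to 1 / its observed response time.
    """
    inverse = [1.0 / t for t in observed_response_times]
    total = sum(inverse)
    return [w / total for w in inverse]


def capacity_weights(processing_rates):
    """Split the arrival stream in proportion to each host's processing rate."""
    total = sum(processing_rates)
    return [r / total for r in processing_rates]


def dispatch(weights, rng=random):
    """Pick one host index at random according to the given probabilities."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def round_robin(num_hosts):
    """Baseline: hand jobs to hosts in a fixed cyclic order."""
    return cycle(range(num_hosts))
```

With measured response times of 1, 2 and 4 time units, `sensible_weights([1.0, 2.0, 4.0])` sends the largest share of jobs to the first host, whereas `round_robin(3)` ignores the measurements entirely.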

arXiv.org e-Print Archive

CiteSeerX

Spiral - Imperial College Digital Repository

Service-Oriented and Model-Based Software Development Methodology for Collaborative Robots

Author: 홍혜선
Publication venue: Seoul National University Graduate School
Publication date: 01/02/2020
Field of study

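Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, 2020. 2. 하순회.
In the near future, it will be common to see a variety of robots cooperating on a single mission in a variety of fields. Two difficulties stand in the way of realizing this, however. First, most existing work on specifying software for operating robots assumes that the developer is knowledgeable about the robot's hardware and software. It is therefore not easy for users without knowledge of robots or computers to write scenarios in which several robots cooperate. Second, because robot software is closely tied to the characteristics of the robot's hardware, developing software for a variety of robots is not simple either. This thesis proposes a new software development framework that separates high-level mission specification from robot behavior programming. The framework also supports robots ranging from small ones to those with ample computing power forming swarms to carry out missions together. We propose a scripting language with which even users who lack knowledge of robot hardware or software can specify robot behavior at a high level. The proposed language supports four features that existing scripting languages do not: team composition, service-based programming of each team, dynamic mode change, and multi-tasking. Robots can be grouped into teams, and the functions a robot can perform are abstracted as services from which new composite services can be specified. The notion of a 'plan' is introduced for multi-tasking, and events generated within a composite service allow the mode to change dynamically. Furthermore, to make the cooperation of multiple robots more robust, flexible, and scalable, we assume that when a swarm is operated a robot may run into problems during its mission and may be dynamically reassigned to a different behavior depending on the situation. To this end, the scripting language is extended with new features: teams can be formed dynamically, group services in which several robots perform one service are supported, and one-to-many communication is available. The extended high-level scripting language thus lets non-experts easily specify various types of cooperative missions. Various software development frameworks are being studied for programming robot behavior. Frameworks that emphasize reusability and scalability have recently become widely used, but most of them assume an operating system, such as Linux, that requires substantial hardware resources. Moreover, because they do not consider program analysis or performance estimation, they are ill-suited to developing software for small robots with severe resource constraints. We therefore use a formal model employed in embedded software design. This model permits static analysis and performance estimation, but it is limited in its ability to express robot behavior. For robots that change behavior mid-execution in response to external events, this thesis therefore applies an extended model that combines a finite state machine model with a dataflow model so that dynamic behavior can be specified. To analyze compute-intensive applications such as deep learning, we propose a model that can express loop structures explicitly. Finally, two models are used to represent information shared among robots for cooperative operation. Shared information that is managed centrally is represented by a special task called a library task. In addition, a new port for multicasting is added so that a robot can share its information with nearby robots. Actual robot code is generated automatically from the extended formal model, which benefits software design productivity and development efficiency. An intermediate strategy layer is added to translate scripts written by non-experts into the formal task model. To validate the proposed methodology, experiments were carried out on cooperative scenarios using simulation and several real robots.
In the near future, it will be common for a variety of robots to cooperate on a mission in various fields. There are two software challenges when deploying collaborative robots: how to specify a cooperative mission and how to program each robot to accomplish its mission. In this paper, we propose a novel software development framework that separates mission specification and robot behavior programming, which is called the service-oriented and model-based (SeMo) framework. It can also support distributed robot systems, swarm robots, and their hybrid. For mission specification, a novel scripting language is proposed whose expressive capability covers team composition and service-oriented behavior specification of each team, and allows dynamic mode change of operation and multi-tasking. Robots are grouped into teams, and the behavior of each team is defined with a composite service. The internal behavior of a composite service is defined by a sequence of services that the robots will perform. The notion of a plan is applied to express multi-tasking. Because a robot may have various operating modes, mode change is triggered by events generated in a composite service. Moreover, to improve the robustness, scalability, and flexibility of robot collaboration, the high-level mission scripting language is extended with new features such as team hierarchy, group service, and one-to-many communication. We assume that any robot may fail during the execution of a scenario, and that robots can be grouped dynamically at run-time. Therefore, the extended mission specification enables a casual user to specify various types of cooperative missions easily. For robot behavior programming, an extended dataflow model is used for task-level behavior specification that does not depend on the robot hardware platform.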
To specify the dynamic behavior of the robot, we apply an extended task model that supports a hybrid specification of dataflow and finite state machine models. Furthermore, we propose a novel extension to allow the explicit specification of loop structures. This extension lets compute-intensive applications, which contain many loop structures, be specified explicitly and analyzed at compile time. Two types of information sharing, global information sharing and local knowledge sharing, are supported for robot collaboration in the dataflow graph. For global information, we use the library task, which supports shared resource management and server-client interaction. To share information locally with nearby robots, on the other hand, we add another type of port for multicasting and use the knowledge sharing technique. The actual code for each robot is automatically generated from the associated task graph, which minimizes the human effort in low-level robot programming and improves software design productivity significantly. By abstracting the tasks or algorithms as services and adding the strategy description layer in the design flow, the mission specification is refined into a task-graph specification automatically. The viability of the proposed methodology is verified with preliminary experiments on three cooperative mission scenarios with heterogeneous robot platforms and a robot simulator.
Chapter 1. Introduction: 1.1 Motivation; 1.2 Contribution; 1.3 Dissertation Organization
Chapter 2. Background and Existing Research: 2.1 Terminologies; 2.2 Robot Software Development Frameworks; 2.3 Parallel Embedded Software Development Framework
Chapter 3. Overview of the SeMo Framework: 3.1 Motivational Examples
Chapter 4. Robot Behavior Programming: 4.1 Related works; 4.2 Model-based Task Graph Specification for Individual Robots; 4.3 Model-based Task Graph Specification for Cooperating Robots; 4.4 Automatic Code Generation; 4.5 Experiments
Chapter 5. High-level Mission Specification: 5.1 Service-oriented Mission Specification; 5.2 Strategy Description; 5.3 Automatic Task Graph Generation; 5.4 Related works; 5.5 Experiments
Chapter 6. Conclusion: 6.1 Future Research
Bibliography
Appendices
Abstract (in Korean)
Docto

SNU Open Repository and Archive

Timing Predictability in Future Multi-Core Avionics Systems

Author
Publication venue: 'Linkoping University Electronic Press'
Publication date
Field of study

Crossref

Overlay virtualized wireless sensor networks for application in industrial internet of things : a review

Author: Abu-Mahfouz Adnan M.
Hancke Gerhard P.
Nkomo Malvin
Onumanyi Adeiza. J.
Sinha Saurabh
Publication venue
Publication date: 01/01/2018
Field of study

Abstract: In recent times, Wireless Sensor Networks (WSNs) have been broadly applied in the Industrial Internet of Things (IIoT) in order to enhance the productivity and efficiency of existing and prospective manufacturing industries. In particular, an area of interest that concerns the use of WSNs in IIoT is the concept of sensor network virtualization and overlay networks. Both network virtualization and overlay networks are considered contemporary because they provide the capacity to create services and applications at the edge of existing virtual networks without changing the underlying infrastructure. This capability makes both network virtualization and overlay network services highly beneficial, particularly for the dynamic needs of IIoT-based applications such as smart industry, smart city, and smart home applications. Consequently, the study of both WSN virtualization and overlay networks has attracted considerable attention in the literature, leading to the growth and maturity of the research area. In line with this growth, this paper provides a review of the developments made thus far concerning virtualized sensor networks, with emphasis on the application of overlay networks in IIoT. Principally, the process of virtualization in WSNs is discussed along with its importance in IIoT applications. Different challenges in WSNs are also presented along with possible solutions given by the use of virtualized WSNs. Further details are also presented concerning the use of overlay networks as the next step to supporting virtualization in shared sensor networks. Our discussion closes with an exposition of the existing challenges in the use of virtualized WSNs for IIoT applications. In general, because overlay networks will contribute to the future development and advancement of smart industrial and smart city applications, this review may serve as a reference point for researchers interested in this growing field.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

University of Johannesburg Institutional Repository

UPSpace at the University of Pretoria

Towards multiprogrammed GPUs

Author: Tanasić Ivan
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2017
Field of study

Programmable Graphics Processing Units (GPUs) have recently become the most pervasive massively parallel processors. They have come a long way, from fixed function ASICs designed to accelerate graphics tasks to a programmable architecture that can also execute general-purpose computations. Because of their performance and efficiency, an increasing amount of software is relying on them to accelerate data parallel and computationally intensive sections of code. They have earned a place in many systems, from low power mobile devices to the biggest data centers in the world. However, GPUs are still plagued by the fact that they essentially have no multiprogramming support, resulting in low system performance if the GPU is shared among multiple programs. In this dissertation we set out to provide rich GPU multiprogramming support by improving the multitasking capabilities and increasing the virtual memory functionality and performance. The main issue hindering the multitasking support in GPUs is the nonpreemptive execution of GPU kernels. Here we propose two preemption mechanisms with different design philosophies that can be used by a scheduler to preempt execution on GPU cores and make room for some other process. We also argue for the spatial sharing of the GPU and propose a concrete hardware scheduler implementation that dynamically partitions the GPU cores among running kernels, according to their set priorities. Opposing the assumptions made in the related work, we demonstrate that preemptive execution is feasible and the desired approach to GPU multitasking. We further show improved system fairness and responsiveness with our scheduling policy. We also pinpoint that at the core of the insufficient virtual memory support lies the exception handling mechanism used by modern GPUs. Currently, GPUs offload the actual exception handling work to the CPU, while the faulting instruction is stalled in the GPU core.
This stall-on-fault model prevents some of the virtual memory features and optimizations and is especially harmful in multiprogrammed environments because it prevents context switching the GPU unless all the in-flight faults are resolved. In this dissertation, we propose three GPU core organizations with varying performance-complexity trade-offs that get rid of the stall-on-fault execution and enable preemptible exceptions on the GPU (i.e., the faulting instruction can be squashed and restarted later). Building on this support, we implement two use cases and demonstrate their utility. One is a scheme that performs a context switch of the faulted threads and tries to find some other useful work to do in the meantime, hiding the latency of the fault and improving the system performance. The other enables the fault handling code to run locally, on the GPU, instead of relying on CPU offloading, and shows that local fault handling can also improve performance.
Programmable Graphics Processing Units (GPUs) have recently become the most widespread massively parallel processors. They have come a long way from fixed-function ASICs designed to accelerate graphics tasks to a programmable architecture that can also execute general-purpose computations. Because of their performance and efficiency, a growing amount of software relies on them to accelerate computationally intensive sections of code that exhibit data parallelism. They have earned a place in many systems, from low-power mobile devices to the largest data centers in the world. However, GPUs are still plagued by the fact that they essentially have no multiprogramming support, which results in low system performance when the GPU is shared among multiple programs.
In this dissertation we focus on providing multiprogramming support for GPUs by improving multitasking capabilities and virtual memory support. The main problem hindering multitasking support in GPUs is the non-preemptive execution of GPU kernels. We propose two preemption mechanisms with different design philosophies, which a scheduler can use to preempt GPU cores and assign them to other processes. We also argue for spatial partitioning of the GPU and propose a concrete implementation of a hardware scheduler that dynamically divides the GPU cores among the running kernels according to their set priorities. Contrary to assumptions made by others in related work, we demonstrate that preemptive execution is feasible and is the desired approach to multitasking on GPUs. In addition, we show greater system fairness and responsiveness with our GPU core allocation policy. We also point out that the main cause of the insufficient virtual memory support in GPUs is the exception handling mechanism used by modern GPUs. Currently, GPUs offload exception handling to the CPU while the faulting instruction waits in the GPU core. This stall-on-fault model prevents some virtual memory features and optimizations and is especially harmful in multiprogrammed environments because it prevents context switching the GPU unless all pending faults are resolved. In this dissertation, we propose three implementations of the GPU core pipeline that offer different performance-complexity trade-offs and allow the core to be preempted even when exceptions are pending (that is, the faulting instruction can be restarted later).
Building on this new functionality, we implement two use cases to demonstrate its utility. The first is a scheduler that assigns the core to other threads when a fault occurs, in order to do useful work while the fault is resolved, thereby hiding the fault latency and improving system performance. The second allows the fault handling code to run locally on the GPU instead of offloading the handling to the CPU, showing that local fault handling can also improve performance.
Postprint (published version)
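The spatial-sharing idea in this abstract, dividing GPU cores among running kernels according to their priorities, can be pictured with the short Python sketch below. The abstract does not give the actual partitioning policy of the proposed hardware scheduler, so the proportional split with largest-remainder rounding, the function name `partition_cores` and the kernel names are all assumptions made for illustration.

```python
def partition_cores(num_cores, priorities):
    """Split num_cores among kernels in proportion to their priority weights.

    priorities maps a (hypothetical) kernel name to a positive weight.
    Assumption: shares are proportional to priority; the dissertation's
    real policy may differ.
    """
    total = sum(priorities.values())
    exact = {k: num_cores * p / total for k, p in priorities.items()}
    # Start from the whole-core part of each exact share.
    shares = {k: int(x) for k, x in exact.items()}
    leftover = num_cores - sum(shares.values())
    # Give the remaining cores to the kernels with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda k: exact[k] - shares[k], reverse=True)
    for k in by_remainder[:leftover]:
        shares[k] += 1
    return shares
```

Calling the function again whenever a kernel starts or finishes gives a dynamic repartition: for example, 16 cores split between a priority-3 and a priority-1 kernel yields 12 and 4 cores respectively.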

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Hardware support for memory protection in sensor nodes

Author: Lopriore Lanfranco
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014
Field of study

With reference to the typical hardware configuration of a sensor node, we present the architecture of a memory protection unit (MPU) designed as a low-complexity addition to the microcontroller. The MPU is aimed at supporting memory protection and the privileged execution mode. It is connected to the system buses, and is seen by the processor as a memory-mapped input/output device. The contents of the internal MPU registers specify the composition of the protection contexts of the running program in terms of access rights for the memory pages. The MPU generates a hardware interrupt to the processor when it detects a protection violation. The proposed MPU architecture is evaluated from a number of salient viewpoints, which include the distribution, review and revocation of access permissions, and the support for important memory protection paradigms, including hierarchical contexts and protection rings.
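The per-page access check described in this abstract can be modelled in software as a small Python sketch. It is only a behavioural illustration under stated assumptions: the class and method names, the 256-byte page size and the use of an exception to stand in for the hardware interrupt are hypothetical, and protection contexts, rings and the privileged mode are not modelled.

```python
from enum import Flag


class Access(Flag):
    NONE = 0
    READ = 1
    WRITE = 2
    EXECUTE = 4


class ProtectionViolation(Exception):
    """Stands in for the hardware interrupt raised on a violation."""


class MemoryProtectionUnit:
    def __init__(self, page_size=256):
        # page_size is an assumed value, not taken from the paper.
        self.page_size = page_size
        self.rights = {}  # page number -> Access flags held for that page

    def grant(self, page, rights):
        """Add access rights for a page (distribution of permissions)."""
        self.rights[page] = self.rights.get(page, Access.NONE) | rights

    def revoke(self, page, rights):
        """Remove access rights for a page (revocation of permissions)."""
        self.rights[page] = self.rights.get(page, Access.NONE) & ~rights

    def check(self, address, requested):
        """Raise ProtectionViolation unless every requested right is held."""
        page = address // self.page_size
        held = self.rights.get(page, Access.NONE)
        if (requested & held) != requested:
            raise ProtectionViolation(f"page {page}: {requested} denied")
```

A caller would grant rights with `mpu.grant(2, Access.READ | Access.WRITE)` and then call `mpu.check(0x200, Access.READ)` before each access; an access to a page with no entry is always rejected.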

Crossref

Archivio della Ricerca - Università di Pisa

Parallel and Distributed Computing

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021
Field of study

The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. In particular, the topics addressed are programmable and reconfigurable devices and systems, dependability of GPUs (graphics processing units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peer-to-peer networks, large-scale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing.

Directory of Open Access Books (DOAB)

Algorithmic and Software System Support to Accelerate Data Processing in CPU-GPU Hybrid Computing Environments

Author: Wang Kaibo
Publication venue: The Ohio State University / OhioLINK
Publication date: 01/01/2015
Field of study

OhioLINK Electronic Thesis and Dissertation Center