182,655 research outputs found

    An Implementation of a Predictable Cache-coherent Multi-core System

    Get PDF
    Multi-core platforms have entered the realm of the embedded systems to meet the ever growing performance requirements of the real-time embedded applications. Real-time applications leverage the hardware parallelism from multi-cores while keeping the hardware cost minimum. However, when the real-time tasks are deployed on the multi-core platforms, they experience interference due to sharing of hardware resources such as shared bus, last level cache, and main memory. As a result, it complicates computing the worst-case execution time of the real-time tasks. In this thesis, I present a hardware prototype that implements a predictable cache-coherent real-time multi-core system. The designed hardware follows the design guidelines outlined in the predictable cache coherence protocol. The hardware uses a latency insensitive interfaces to integrate the multi-core components such as the processor, cache controller, and interconnecting bus. The prototyped multi-core hardware is synthesized and implemented in a low-cost and high-performing FPGA board. The hardware is validated and verified on a tethered system that enables the design to run multi-threaded pthread applications

    EMVS: Embedded Multi Vector-core System

    Get PDF
    With the increase in the density and performance of digital electronics, the demand for a power-efficient high-performance computing (HPC) system has been increased for embedded applications. The existing embedded HPC systems suffer from issues like programmability, scalability, and portability. Therefore, a parameterizable and programmable high-performance processor system architecture is required to execute the embedded HPC applications. In this work, we proposed an Embedded Multi Vector-core System (EMVS) which executes the embedded application by managing the multiple vectorized tasks and their memory operations. The system is designed and ported on an Altera DE4 FPGA development board. The performance of EMVS is compared with the Heterogeneous Multi-Processing Odroid XU3, Parallela and GPU Jetson TK1 embedded systems. In contrast to the embedded systems, the results show that EMVS improves 19.28 and 10.22 times of the application and system performance respectively and consumes 10.6 times less energy.Peer ReviewedPostprint (author's final draft

    On-Board High-Performance Computing For Multi-Robot Aerial Systems

    Get PDF
    With advancements in low-energy-consumption multi/many core embedded-computing devices, a logical transition for robotic systems is Supercomputing, formally known as high performance computing (HPC), a tool currently used for solving the most complex problems for humankind such as the origin of the universe, the finding of deceases’ cures, etc. As such, HPC has always been focused on scientific inquires. However, its scope can be widening up to include missions carried out with robots. Since a robot could be embedded with computing devices, a set of robots could be set as a cluster of computers, the most reliable HPC infrastructure. The advantages of setting up such an infrastructure are many, from speeding up on-board computation up to providing a multi-robot system with robustness, scalability, user transparency, etc., all key features in supercomputing. This chapter presents a middleware technology for the enabling of high performance computing in multi-robot systems, in particular for aerial robots. The technology can be used for the automatic deployment of cluster computing in multi-robot systems, the utilization of standard HPC technologies, and the development of HPC applications in multiple fields such as precision agriculture, military, civilian, search and rescue, etc

    On the Task Mapping and Scheduling for DAG-based Embedded Vision Applications on Heterogeneous Multi/Many-core Architectures

    Get PDF
    Embedded vision applications have stringent performance constraints that must be satisfied when they are run on low-power embedded systems. OpenVX has emerged as the de-facto reference standard to develop such applications. Starting with a DAG representation of the application and by relying on a primitive-based programming model, it allows for automatic system-level optimizations and synthesis of an implementation onto the target heterogeneous multi-core architecture. However, the state-of-the-art algorithm for task mapping and scheduling in OpenVX does not provide the performance necessary for such applications when deployed on embedded multi-/many-core architectures. %does not implement an efficient algorithm task mapping and scheduling onto embedded multi/many-core architectures. Our work addresses this challenge by making the following three contributions. First, we implemented a static task scheduling and mapping approach for OpenVX using the heterogeneous earliest finish time (HEFT) heuristic. We show that HEFT allows us to improve the system performance up to 70% on one of the most widespread embedded vision systems (i.e., NVIDIA VisionWorks on NVIDIA Jetson TX2). Second, we show that HEFT, in the context of an embedded vision application where some primitives may have multiple implementations (e.g., for CPU and for GPU), can lead to an imbalance in load amongst heterogeneous computing elements (CEs); thereby, suffering from degraded performance. Third, we propose an algorithm called exclusive earliest finish time (XEFT) that introduces the notion of exclusive overlap between single implementation primitives to improve the load balancing. We show that XEFT can further improve the system performance up to 33% over HEFT, and 82% over OpenVX. We present the results on different benchmarks, including a real-world localization and mapping application (ORB-SLAM) combined with the NVIDIA image recognition application based on deep-learning

    The Multy Supply Function Abstraction for Multiprocessors

    Get PDF
    Multi-core platforms are becoming the dominant computing architecture for next generation embedded systems. Nevertheless, designing, programming, and analyzing such systems is not easy and a solid methodology is still missing. In this paper, we propose two powerful abstractions to model the computing power of a parallel machine, which provide a general interface for developing and analyzing real-time applications in isolation, independently of the physical platform. The proposed abstractions can be applied on top of different types of service mechanisms, such as periodic servers, static partitions, and P-fair time partitions. In addition, we developed the schedulability analysis of a set of real-time tasks on top of a parallel machine that is compliant with the proposed abstractions

    A Survey and Comparative Study of Hard and Soft Real-time Dynamic Resource Allocation Strategies for Multi/Many-core Systems

    Get PDF
    Multi-/many-core systems are envisioned to satisfy the ever-increasing performance requirements of complex applications in various domains such as embedded and high-performance computing. Such systems need to cater to increasingly dynamic workloads, requiring efficient dynamic resource allocation strategies to satisfy hard or soft real-time constraints. This article provides an extensive survey of hard and soft real-time dynamic resource allocation strategies proposed since the mid-1990s and highlights the emerging trends for multi-/many-core systems. The survey covers a taxonomy of the resource allocation strategies and considers their various optimization objectives, which have been used to provide comprehensive comparison. The strategies employ various principles, such as market and biological concepts, to perform the optimizations. The trend followed by the resource allocation strategies, open research challenges, and likely emerging research directions have also been provided

    Hardware/Software Approach for Code Synchronization in Low-Power Multi-Core Sensor Nodes

    Get PDF
    Latest embedded bio-signal analysis applications, targeting low-power Wireless Body Sensor Nodes (WBSNs), present conflicting requirements. On one hand, bio-signal analysis applications are continuously increasing their demand for high computing capabilities. On the other hand, long-term signal processing in WBSNs must be provided within their highly constrained energy budget. In this context, parallel processing effectively increases the power efficiency of WBSNs, but only if the execution can be properly synchronized among computing elements. To address this challenge, in this work we propose a hardware/software approach to synchronize the execution of bio-signal processing applications in multi-core WBSNs. This new approach requires little hardware resources and very few adaptations in the source code. Moreover, it provides the necessary flexibility to execute applications with an arbitrarily large degree of complexity and parallelism, enabling considerable reductions in power consumption for all multi-core WBSN execution conditions. Experimental results show that a multi-core WBSN architecture using the illustrated approach can obtain energy savings of up to 40%, with respect to an equivalent singlecore architecture, when performing advanced bio-signal analysi

    CROSS-LAYER CUSTOMIZATION FOR LOW POWER AND HIGH PERFORMANCE EMBEDDED MULTI-CORE PROCESSORS

    Get PDF
    Due to physical limitations and design difficulties, computer processor architecture has shifted to multi-core and even many-core based approaches in recent years. Such architectures provide potentials for sustainable performance scaling into future peta-scale/exa-scale computing platforms, at affordable power budget, design complexity, and verification efforts. To date, multi-core processor products have been replacing uni-core processors in almost every market segment, including embedded systems, general-purpose desktops and laptops, and super computers. However, many issues still remain with multi-core processor architectures that need to be addressed before their potentials could be fully realized. People in both academia and industry research community are still seeking proper ways to make efficient and effective use of these processors. The issues involve hardware architecture trade-offs, the system software service, the run-time management, and user application design, which demand more research effort into this field. Due to the architectural specialties with multi-core based computers, a Cross-Layer Customization framework is proposed in this work, which combines application specific information and system platform features, along with necessary operating system service support, to achieve exceptional power and performance efficiency for targeted multi-core platforms. Several topics are covered with specific optimization goals, including snoop cache coherence protocol, inter-core communication for producer-consumer applications, synchronization mechanisms, and off-chip memory bandwidth limitations. Analysis of benchmark program execution with conventional mechanisms is made to reveal the overheads in terms of power and performance. Specific customizations are proposed to eliminate such overheads with support from hardware, system software, compiler, and user applications. Experiments show significant improvement on system performance and power efficiency

    A system model and stack for the parallelization of time-critical applications on many-core architectures

    Get PDF
    3rd Workshop on High-performance and Real-time Embedded Systems (HIRES 2015). 21, Jan, 2015. Amsterdam, Netherlands.Many embedded systems are subject to stringent timing requirementsthat compel them to "react" within prede_ned time bounds.The said "reaction" may be understood as simply outputting the resultsof a basic computation, but may also mean engaging in complex interactionswith the surrounding environment. Although these strict temporalrequirements advocate the use of simple and predictable hardwarearchitectures that allow for the computation of tight upper-bounds onthe software response time, meanwhile most of these embedded systemssteadily demand for more and more computational performance, whichweighs in favor of specialized, complex, and optimized multi-core andmany-core processors on which the execution of the application can beparallelized. However, it is not straightforward how event-based embeddedapplications can be structured in order to take advantage and fullyexploit the parallelization opportunities and achieve higher performanceand energy-e_fficient computing. The P-SOCRATES project envisions thenecessity to bring together next-generation many-core accelerators fromthe embedded computing domain with the programming models andtechniques from the high-performance computing domain, supportingthis with real-time methodologies to provide timing predictability. This paper gives an overview of the system model and software stackproposed in the P-SOCRATES project to facilitate the deployment andexecution of parallel applications on many-core infrastructures, whilepreserving the time-predictability of the execution required by real-timepractices to upper-bound the response time of the embedded application
    • …
    corecore