1,083 research outputs found

    Energy-efficient and high-performance lock speculation hardware for embedded multicore systems

    Full text link
    Embedded systems are becoming increasingly common in everyday life and like their general-purpose counterparts, they have shifted towards shared memory multicore architectures. However, they are much more resource constrained, and as they often run on batteries, energy efficiency becomes critically important. In such systems, achieving high concurrency is a key demand for delivering satisfactory performance at low energy cost. In order to achieve this high concurrency, consistency across the shared memory hierarchy must be accomplished in a cost-effective manner in terms of performance, energy, and implementation complexity. In this article, we propose Embedded-Spec, a hardware solution for supporting transparent lock speculation, without the requirement for special supporting instructions. Using this approach, we evaluate the energy consumption and performance of a suite of benchmarks, exploring a range of contention management and retry policies. We conclude that for resource-constrained platforms, lock speculation can provide real benefits in terms of improved concurrency and energy efficiency, as long as the underlying hardware support is carefully configured.This work is supported in part by NSF under Grants CCF-0903384, CCF-0903295, CNS-1319495, and CNS-1319095 as well the Semiconductor Research Corporation under grant number 1983.001. (CCF-0903384 - NSF; CCF-0903295 - NSF; CNS-1319495 - NSF; CNS-1319095 - NSF; 1983.001 - Semiconductor Research Corporation

    EOS: A project to investigate the design and construction of real-time distributed embedded operating systems

    Get PDF
    The EOS project is investigating the design and construction of a family of real-time distributed embedded operating systems for reliable, distributed aerospace applications. Using the real-time programming techniques developed in co-operation with NASA in earlier research, the project staff is building a kernel for a multiple processor networked system. The first six months of the grant included a study of scheduling in an object-oriented system, the design philosophy of the kernel, and the architectural overview of the operating system. In this report, the operating system and kernel concepts are described. An environment for the experiments has been built and several of the key concepts of the system have been prototyped. The kernel and operating system is intended to support future experimental studies in multiprocessing, load-balancing, routing, software fault-tolerance, distributed data base design, and real-time processing

    Castell: a heterogeneous cmp architecture scalable to hundreds of processors

    Get PDF
    Technology improvements and power constrains have taken multicore architectures to dominate microprocessor designs over uniprocessors. At the same time, accelerator based architectures have shown that heterogeneous multicores are very efficient and can provide high throughput for parallel applications, but with a high-programming effort. We propose Castell a scalable chip multiprocessor architecture that can be programmed as uniprocessors, and provides the high throughput of accelerator-based architectures. Castell relies on task-based programming models that simplify software development. These models use a runtime system that dynamically finds, schedules, and adds hardware-specific features to parallel tasks. One of these features is DMA transfers to overlap computation and data movement, which is known as double buffering. This feature allows applications on Castell to tolerate large memory latencies and lets us design the memory system focusing on memory bandwidth. In addition to provide programmability and the design of the memory system, we have used a hierarchical NoC and added a synchronization module. The NoC design distributes memory traffic efficiently to allow the architecture to scale. The synchronization module is a consequence of the large performance degradation of application for large synchronization latencies. Castell is mainly an architecture framework that enables the definition of domain-specific implementations, fine-tuned to a particular problem or application. So far, Castell has been successfully used to propose heterogeneous multicore architectures for scientific kernels, video decoding (using H.264), and protein sequence alignment (using Smith-Waterman and clustalW). It has also been used to explore a number of architecture optimizations such as enhanced DMA controllers, and architecture support for task-based programming models. ii

    PORTING OF FREERTOS ON A PYTHON VIRTUAL MACHINE FOR EMBEDDED AND IOT DEVICES

    Get PDF
    The fourth industrial revolution, The Industry 4.0, puts emphasis on the need of “Smart” and “Connected” objects through the use of services provided by Internet of Things, cyber-physical systems and cloud computing to optimize the cost, development time and remote connectivity. Development of highly scalable and flexible IoT applications is the need of time. These solutions require connectivity, less development time, time-to-market and at the same time offers a high performance and great reliability. Zerynth, a small company, provides its full stack for IoT solutions. Zerynth Virtual Machine is the core component among other components in stack which allow the programmers to code in python or hybrid C/Python coding with multithreaded Real Time OS with negligible memory footprint. The Python layer, Application Layer, is totally agnostic of underlying RTOS and hardware abstraction layer. This layered software architecture of Zerynth VM makes it totally compatible with new Industry 4.0 standard. The Hardware abstraction layer, VHAL, abstracts the hardware features of supported MCU and its peripherals while RTOS layer, VOSAL, uses the features of underlying Real Time OS. Zerynth VM can be ported with different Real Time OS and various hardware platforms depending upon the application’s cost, features and other relevant parameters. Configuring Kinetis MCU (MK64FN1M0VDC12) with existing VM became the first objective of my thesis. This configuration covers from scratch the clock, boot loading and peripheral support. Since previous version of Zerynth VM had a support of only Chibi2 OS which has certain dependency on the hardware layer underneath so this became another objective to separate the Chibi2 OS from VHAL layer for total independence. Finally, Porting of FreeRTOS on Zerynth VM with Hexiwear MCU as target board could a make a room for another RTOS hence enhancing the features and support of currently available VM. This thesis report describes all porting steps, procedures and testing methodologies starting from configuring a new hardware platform Hexiwear to FreeRTOS porting on Zerynth V

    Shared-Semaphored Cache Implementation for Parallel Program Execution in Multi-Core Systems

    Get PDF
    The paper brings forward the idea of multi-threadedcomputation synchronization based on the shared semaphoredcache in the multi-core CPUs. It is dedicated to the implementationof multi-core PLC control, embedded solution or parallelcomputation of models described using hardware description languages.The shared semaphored cache is implemented as guardedmemory cells within a dedicated section of the cache memory thatis shared by multiple cores. This enables the cores to speed up thedata exchange and seamlessly synchronize the computation. Theidea has been verified by creating a multi-core system model usingVerilog HDL. The simulation of task synchronization methodsallows for proving the benefits of shared semaphored memorycells over standard synchronization methods. The proposed ideaenhances the computation in the algorithms that consist ofrelatively short tasks that can be processed in parallel andrequires fast synchronization mechanisms to avoid data raceconditions
    • …
    corecore