Search CORE

70 research outputs found

Multi-core architectures with coarse-grained dynamically reconfigurable processors for broadband wireless access technologies

Author: Han Wei
Publication venue: The University of Edinburgh
Publication date: 01/01/2010
Field of study

Broadband Wireless Access technologies have significant market potential, especially the WiMAX protocol which can deliver data rates of tens of Mbps. Strong demand for high performance WiMAX solutions is forcing designers to seek help from multi-core processors that offer competitive advantages in terms of all performance metrics, such as speed, power and area. Through the provision of a degree of flexibility similar to that of a DSP and performance and power consumption advantages approaching that of an ASIC, coarse-grained dynamically reconfigurable processors are proving to be strong candidates for processing cores used in future high performance multi-core processor systems. This thesis investigates multi-core architectures with a newly emerging dynamically reconfigurable processor – RICA, targeting WiMAX physical layer applications. A novel master-slave multi-core architecture is proposed, using RICA processing cores. A SystemC based simulator, called MRPSIM, is devised to model this multi-core architecture. This simulator provides fast simulation speed and timing accuracy, offers flexible architectural options to configure the multi-core architecture, and enables the analysis and investigation of multi-core architectures. Meanwhile a profiling-driven mapping methodology is developed to partition the WiMAX application into multiple tasks as well as schedule and map these tasks onto the multi-core architecture, aiming to reduce the overall system execution time. Both the MRPSIM simulator and the mapping methodology are seamlessly integrated with the existing RICA tool flow. Based on the proposed master-slave multi-core architecture, a series of diverse homogeneous and heterogeneous multi-core solutions are designed for different fixed WiMAX physical layer profiles. Implemented in ANSI C and executed on the MRPSIM simulator, these multi-core solutions contain different numbers of cores, combine various memory architectures and task partitioning schemes, and deliver high throughputs at relatively low area costs. Meanwhile a design space exploration methodology is developed to search the design space for multi-core systems to find suitable solutions under certain system constraints. Finally, laying a foundation for future multithreading exploration on the proposed multi-core architecture, this thesis investigates the porting of a real-time operating system – Micro C/OS-II to a single RICA processor. A multitasking version of WiMAX is implemented on a single RICA processor with the operating system support

Edinburgh Research Archive

Design and Implementation of IDCT/IDST-Specific Accelerators for HEVC Standard on Heterogeneous Accelerator-Rich Platform

Author: Pourabed Mohammad Ali
Publication venue
Publication date: 08/05/2019
Field of study

Having High Efficiency Video Coding (HEVC) is important for image processing, reducing bandwidth, and increasing video quality. There are different methods that can be used to implement HEVC. This thesis focuses on design and implementation of application-specific accelerators for IDCT/IDST algorithms dedicated for HEVC standard. Those algorithms are parallel-in-nature tasks which makes them suitable to be executed by heterogeneous multicore platforms. This is done using accelerators which are required for power efficient processing. In this study, Coarse-Grained Reconfigurable Arrays (CGRAs) are used for making a template for an accelerator. CGRA has one of the major roles in a Heterogeneous Accelerator-Rich Platforms (HARP) as it is capable of accelerating non-parallel loops with lower loop counts. This thesis includes various algorithms for the use of IDCT and IDST with different designs and templates, reaching a unique final architecture. The final output intended is to reach 4 points IDST together with a 4/8 points IDCT. Another feature added to the hypothesis is the use of different dimensions for the CGRA template in order to have a different type of accelerator. The many CGRAs are combined together in successive arrangement with Reduced Instructions Set Computers (RISC) over the Network-on-Chip (NoC). The aim is to study the performance of the accelerator used for the IDCT and the IDST. This can be evaluated as the data movement through NoC network along with comparison of performance of accelerator with clock cycles in order to calculate the efficiency of the system. The results show that a four point IDST and IDCT can be computed in 56 clock cycles. In addition, the 8 point IDCT can be implemented in 64 cycles. One important factor to consider during the study is the power and energy consumption which is important in this century. The dynamic power dissipation usage for the routing of data has reached a value of 4.03 mW. Whereas, the energy consumption was 1.76

\mu

J for the 4 points system (IDCT and IDST) and 3.06

\mu

J for the 8 points (IDCT). Processing Elements (PEs) are used for implementing the transform algorithm and units were operated at 200 MHz. Finally, these results show that 1080P image at 30 frames per second can be attained by using FPGA

Trepo - Institutional Repository of Tampere University

Exploiting reconfigurable heterogenous parallel architectures in a multitasking context: a systems approach

Author: Συρίβελης Δημήτρης
Publication venue
Publication date: 01/01/2009
Field of study

University of Thessaly Institutional Repository

Reconfiguration of field programmable logic in embedded systems

Author: Kennedy Irwin O.
Publication venue: The University of Edinburgh
Publication date: 01/01/2005
Field of study

Edinburgh Research Archive

Stochastic Performance Throttling for Multicore Architectures under Spatial and Temporal Dependencies

Author: Iqbal Nabeel
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2013
Field of study

KITopen

협업 로봇을 위한 서비스 기반과 모델 기반의 소프트웨어 개발 방법론

Author: 홍혜선
Publication venue: 서울대학교 대학원
Publication date: 01/02/2020
Field of study

학위논문(박사)--서울대학교 대학원 :공과대학 전기·컴퓨터공학부,2020. 2. 하순회.가까운 미래에는 다양한 로봇이 다양한 분야에서 하나의 임무를 협력하여 수행하는 모습은 흔히 볼 수 있게 될 것이다. 그러나 실제로 이러한 모습이 실현되기에는 두 가지의 어려움이 있다. 먼저 로봇을 운용하기 위한 소프트웨어를 명세하는 기존 연구들은 대부분 개발자가 로봇의 하드웨어와 소프트웨어에 대한 지식을 알고 있는 것을 가정하고 있다. 그래서 로봇이나 컴퓨터에 대한 지식이 없는 사용자들이 여러 대의 로봇이 협력하는 시나리오를 작성하기는 쉽지 않다. 또한, 로봇의 소프트웨어를 개발할 때 로봇의 하드웨어의 특성과 관련이 깊어서, 다양한 로봇의 소프트웨어를 개발하는 것도 간단하지 않다. 본 논문에서는 상위 수준의 미션 명세와 로봇의 행위 프로그래밍으로 나누어 새로운 소프트웨어 개발 프레임워크를 제안한다. 또한, 본 프레임워크는 크기가 작은 로봇부터 계산 능력이 충분한 로봇들이 서로 군집을 이루어 미션을 수행할 수 있도록 지원한다. 본 연구에서는 로봇의 하드웨어나 소프트웨어에 대한 지식이 부족한 사용자도 로봇의 동작을 상위 수준에서 명세할 수 있는 스크립트 언어를 제안한다. 제안하는 언어는 기존의 스크립트 언어에서는 지원하지 않는 네 가지의 기능인 팀의 구성, 각 팀의 서비스 기반 프로그래밍, 동적으로 모드 변경, 다중 작업(멀티 태스킹)을 지원한다. 우선 로봇은 팀으로 그룹 지을 수 있고, 로봇이 수행할 수 있는 기능을 서비스 단위로 추상화하여 새로운 복합 서비스를 명세할 수 있다. 또한 로봇의 멀티 태스킹을 위해 '플랜' 이라는 개념을 도입하였고, 복합 서비스 내에서 이벤트를 발생시켜서 동적으로 모드가 변환할 수 있도록 하였다. 나아가 여러 로봇의 협력이 더욱 견고하고, 유연하고, 확장성을 높이기 위해, 군집 로봇을 운용할 때 로봇이 임무를 수행하는 도중에 문제가 생길 수 있으며, 상황에 따라 로봇을 동적으로 다른 행위를 수행할 수 있다고 가정한다. 이를 위해 동적으로도 팀을 구성할 수 있고, 여러 대의 로봇이 하나의 서비스를 수행하는 그룹 서비스를 지원하고, 일대 다 통신과 같은 새로운 기능을 스크립트 언어에 반영하였다. 따라서 확장된 상위 수준의 스크립트 언어는 비전문가도 다양한 유형의 협력 임무를 쉽게 명세할 수 있다. 로봇의 행위를 프로그래밍하기 위해 다양한 소프트웨어 개발 프레임워크가 연구되고 있다. 특히 재사용성과 확장성을 중점으로 둔 연구들이 최근 많이 사용되고 있지만, 대부분의 이들 연구는 리눅스 운영체제와 같이 많은 하드웨어 자원을 필요로 하는 운영체제를 가정하고 있다. 또한, 프로그램의 분석 및 성능 예측 등을 고려하지 않기 때문에, 자원 제약이 심한 크기가 작은 로봇의 소프트웨어를 개발하기에는 어렵다. 그래서 본 연구에서는 임베디드 소프트웨어를 설계할 때 쓰이는 정형적인 모델을 이용한다. 이 모델은 정적 분석과 성능 예측이 가능하지만, 로봇의 행위를 표현하기에는 제약이 있다. 그래서 본 논문에서 외부의 이벤트에 의해 수행 중간에 행위를 변경하는 로봇을 위해 유한 상태 머신 모델과 데이터 플로우 모델이 결합하여 동적 행위를 명세할 수 있는 확장된 모델을 적용한다. 그리고 딥러닝과 같이 계산량을 많이 필요로 하는 응용을 분석하기 위해, 루프 구조를 명시적으로 표현할 수 있는 모델을 제안한다. 마지막으로 여러 로봇의 협업 운용을 위해 로봇 사이에 공유되는 정보를 나타내기 위해 두 가지 모델을 사용한다. 먼저 중앙에서 공유 정보를 관리하기 위해 라이브러리 태스크라는 특별한 태스크를 통해 공유 정보를 나타낸다. 또한, 로봇이 자신의 정보를 가까운 로봇들과 공유하기 위해 멀티캐스팅을 위한 새로운 포트를 추가한다. 이렇게 확장된 정형적인 모델은 실제 로봇 코드로 자동 생성되어, 소프트웨어 설계 생산성 및 개발 효율성에 이점을 가진다. 비전문가가 명세한 스크립트 언어는 정형적인 태스크 모델로 변환하기 위해 중간 단계인 전략 단계를 추가하였다. 제안하는 방법론의 타당성을 검증하기 위해, 시뮬레이션과 여러 대의 실제 로봇을 이용한 협업하는 시나리오에 대해 실험을 진행하였다.In the near future, it will be common that a variety of robots are cooperating to perform a mission in various fields. There are two software challenges when deploying collaborative robots: how to specify a cooperative mission and how to program each robot to accomplish its mission. In this paper, we propose a novel software development framework that separates mission specification and robot behavior programming, which is called service-oriented and model-based (SeMo) framework. Also, it can support distributed robot systems, swarm robots, and their hybrid. For mission specification, a novel scripting language is proposed with the expression capability. It involves team composition and service-oriented behavior specification of each team, allowing dynamic mode change of operation and multi-tasking. Robots are grouped into teams, and the behavior of each team is defined with a composite service. The internal behavior of a composite service is defined by a sequence of services that the robots will perform. The notion of plan is applied to express multi-tasking. And the robot may have various operating modes, so mode change is triggered by events generated in a composite service. Moreover, to improve the robustness, scalability, and flexibility of robot collaboration, the high-level mission scripting language is extended with new features such as team hierarchy, group service, one-to-many communication. We assume that any robot fails during the execution of scenarios, and the grouping of robots can be made at run-time dynamically. Therefore, the extended mission specification enables a casual user to specify various types of cooperative missions easily. For robot behavior programming, an extended dataflow model is used for task-level behavior specification that does not depend on the robot hardware platform. To specify the dynamic behavior of the robot, we apply an extended task model that supports a hybrid specification of dataflow and finite state machine models. Furthermore, we propose a novel extension to allow the explicit specification of loop structures. This extension helps the compute-intensive application, which contains a lot of loop structures, to specify explicitly and analyze at compile time. Two types of information sharing, global information sharing and local knowledge sharing, are supported for robot collaboration in the dataflow graph. For global information, we use the library task, which supports shared resource management and server-client interaction. On the other hand, to share information locally with near robots, we add another type of port for multicasting and use the knowledge sharing technique. The actual robot code per robot is automatically generated from the associated task graph, which minimizes the human efforts in low-level robot programming and improves the software design productivity significantly. By abstracting the tasks or algorithms as services and adding the strategy description layer in the design flow, the mission specification is refined into task-graph specification automatically. The viability of the proposed methodology is verified with preliminary experiments with three cooperative mission scenarios with heterogeneous robot platforms and robot simulator.Chapter 1. Introduction 1 1.1 Motivation 1 1.2 Contribution 7 1.3 Dissertation Organization 9 Chapter 2. Background and Existing Research 11 2.1 Terminologies 11 2.2 Robot Software Development Frameworks 25 2.3 Parallel Embedded Software Development Framework 31 Chapter 3. Overview of the SeMo Framework 41 3.1 Motivational Examples 45 Chapter 4. Robot Behavior Programming 47 4.1 Related works 48 4.2 Model-based Task Graph Specification for Individual Robots 56 4.3 Model-based Task Graph Specification for Cooperating Robots 70 4.4 Automatic Code Generation 74 4.5 Experiments 78 Chapter 5. High-level Mission Specification 81 5.1 Service-oriented Mission Specification 82 5.2 Strategy Description 93 5.3 Automatic Task Graph Generation 96 5.4 Related works 99 5.5 Experiments 104 Chapter 6. Conclusion 114 6.1 Future Research 116 Bibliography 118 Appendices 133 요약 158Docto

SNU Open Repository and Archive

A TrustZone-assisted secure silicon on a co-design framework

Author: Pereira Sérgio Augusto Gomes
Publication venue
Publication date: 01/01/2018
Field of study

Dissertação de mestrado em Engenharia Eletrónica Industrial e ComputadoresEmbedded systems were for a long time, single-purpose and closed systems, characterized by hardware resource constraints and real-time requirements. Nowadays, their functionality is ever-growing, coupled with an increasing complexity and heterogeneity. Embedded applications increasingly demand employment of general-purpose operating systems (GPOSs) to handle operator interfaces and general-purpose computing tasks, while simultaneously ensuring the strict timing requirements. Virtualization, which enables multiple operating systems (OSs) to run on top of the same hardware platform, is gaining momentum in the embedded systems arena, driven by the growing interest in consolidating and isolating multiple and heterogeneous environments. The penalties incurred by classic virtualization approaches is pushing research towards hardware-assisted solutions. Among the existing commercial off-the-shelf (COTS) technologies for virtualization, ARM TrustZone technology is gaining momentum due to the supremacy and lower cost of TrustZone-enabled processors. Programmable system-on-chips (SoCs) are becoming leading players in the embedded systems space, because the combination of a plethora of hard resources with programmable logic enables the efficient implementation of systems that perfectly fit the heterogeneous nature of embedded applications. Moreover, novel disruptive approaches make use of field-programmable gate array (FPGA) technology to enhance virtualization mechanisms. This master’s thesis proposes a hardware-software co-design framework for easing the economy of addressing the new generation of embedded systems requirements. ARM TrustZone is exploited to implement the root-of-trust of a virtualization-based architecture that allows the execution of a GPOS side-by-side with a real-time OS (RTOS). RTOS services were offloaded to hardware, so that it could present simultaneous improvements on performance and determinism. Instead of focusing in a concrete application, the goal is to provide a complete framework, specifically tailored for Zynq-base devices, that developers can use to accelerate a bunch of distinct applications across different embedded industries.Os sistemas embebidos foram, durante muitos anos, sistemas com um simples e único propósito, caracterizados por recursos de hardware limitados e com cariz de tempo real. Hoje em dia, o número de funcionalidades começa a escalar, assim como o grau de complexidade e heterogeneidade. As aplicações embebidas exigem cada vez mais o uso de sistemas operativos (OSs) de uso geral (GPOS) para lidar com interfaces gráficas e tarefas de computação de propósito geral. Porém, os seus requisitos primordiais de tempo real mantém-se. A virtualização permite que vários sistemas operativos sejam executados na mesma plataforma de hardware. Impulsionada pelo crescente interesse em consolidar e isolar ambientes múltiplos e heterogéneos, a virtualização tem ganho uma crescente relevância no domínio dos sistemas embebidos. As adversidades que advém das abordagens de virtualização clássicas estão a direcionar estudos no âmbito de soluções assistidas por hardware. Entre as tecnologias comerciais existentes, a tecnologia ARM TrustZone está a ganhar muita relevância devido à supremacia e ao menor custo dos processadores que suportam esta tecnologia. Plataformas hibridas, que combinam processadores com lógica programável, estão em crescente penetração no domínio dos sistemas embebidos pois, disponibilizam um enorme conjunto de recursos que se adequam perfeitamente à natureza heterogénea dos sistemas atuais. Além disso, existem soluções recentes que fazem uso da tecnologia de FPGA para melhorar os mecanismos de virtualização. Esta dissertação propõe uma framework baseada em hardware-software de modo a cumprir os requisitos da nova geração de sistemas embebidos. A tecnologia TrustZone é explorada para implementar uma arquitetura que permite a execução de um GPOS lado-a-lado com um sistemas operativo de tempo real (RTOS). Os serviços disponibilizados pelo RTOS são migrados para hardware, para melhorar o desempenho e determinismo do OS. Em vez de focar numa aplicação concreta, o objetivo é fornecer uma framework especificamente adaptada para dispositivos baseados em System-on-chips Zynq, de forma a que developers possam usar para acelerar um vasto número de aplicações distintas em diferentes setores

Universidade do Minho: RepositoriUM

Realizing Software Defined Radio - A Study in Designing Mobile Supercomputers.

Author: Lin Yuan
Publication venue
Publication date: 01/01/2008
Field of study

The physical layer of most wireless protocols is traditionally implemented in custom hardware to satisfy the heavy computational requirements while keeping power consumption to a minimum. These implementations are time consuming to design and difficult to verify. A programmable hardware platform capable of supporting software implementations of the physical layer, or Software Defined Radio (SDR), has a number of advantages. These include support for multiple protocols, faster time-to-market, higher chip volumes, and support for late implementation changes. The challenge is to achieve this under the power budget of a mobile device. Wireless communications belong to an emerging class of applications with the processing requirements of a supercomputer but the power constraints of a mobile device -- mobile supercomputing. This thesis presents a set of design proposals for building a programmable wireless communication solution. In order to design a solution that can meet the lofty requirements of SDR, this thesis takes an application-centric design approach -- evaluate and optimize all aspects of the design based on the characteristics of wireless communication protocols. This includes a DSP processor architecture optimized for wireless baseband processing, wireless algorithm optimizations, and language and compilation tool support for the algorithm software and the processor hardware. This thesis first analyzes the software characteristics of SDR. Based on the analysis, this thesis proposes the Signal-Processing On-Demand Architecture (SODA), a fully programmable multi-core architecture that can support the computation requirements of third generation wireless protocols, while operating within the power budget of a mobile device. This thesis then presents wireless algorithm implementations and optimizations for the SODA processor architecture. A signal processing language extension (SPEX) is proposed to help the software development efforts of wireless communication protocols on SODA-like multi-core architecture. And finally, the SPIR compiler is proposed to automatically map SPEX code onto the multi-core processor hardware.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/61760/1/linyz_1.pd

CiteSeerX

Deep Blue Documents at the University of Michigan

Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip

Author: Medardoni Simone
Publication venue
Publication date: 01/01/2009
Field of study

The sustained demand for faster, more powerful chips has been met by the availability of chip manufacturing processes allowing for the integration of increasing numbers of computation units onto a single die. The resulting outcome, especially in the embedded domain, has often been called SYSTEM-ON-CHIP (SoC) or MULTI-PROCESSOR SYSTEM-ON-CHIP (MP-SoC). MPSoC design brings to the foreground a large number of challenges, one of the most prominent of which is the design of the chip interconnection. With a number of on-chip blocks presently ranging in the tens, and quickly approaching the hundreds, the novel issue of how to best provide on-chip communication resources is clearly felt. NETWORKS-ON-CHIPS (NoCs) are the most comprehensive and scalable answer to this design concern. By bringing large-scale networking concepts to the on-chip domain, they guarantee a structured answer to present and future communication requirements. The point-to-point connection and packet switching paradigms they involve are also of great help in minimizing wiring overhead and physical routing issues. However, as with any technology of recent inception, NoC design is still an evolving discipline. Several main areas of interest require deep investigation for NoCs to become viable solutions: • The design of the NoC architecture needs to strike the best tradeoff among performance, features and the tight area and power constraints of the onchip domain. • Simulation and verification infrastructure must be put in place to explore, validate and optimize the NoC performance. • NoCs offer a huge design space, thanks to their extreme customizability in terms of topology and architectural parameters. Design tools are needed to prune this space and pick the best solutions. • Even more so given their global, distributed nature, it is essential to evaluate the physical implementation of NoCs to evaluate their suitability for next-generation designs and their area and power costs. This dissertation performs a design space exploration of network-on-chip architectures, in order to point-out the trade-offs associated with the design of each individual network building blocks and with the design of network topology overall. The design space exploration is preceded by a comparative analysis of state-of-the-art interconnect fabrics with themselves and with early networkon- chip prototypes. The ultimate objective is to point out the key advantages that NoC realizations provide with respect to state-of-the-art communication infrastructures and to point out the challenges that lie ahead in order to make this new interconnect technology come true. Among these latter, technologyrelated challenges are emerging that call for dedicated design techniques at all levels of the design hierarchy. In particular, leakage power dissipation, containment of process variations and of their effects. The achievement of the above objectives was enabled by means of a NoC simulation environment for cycleaccurate modelling and simulation and by means of a back-end facility for the study of NoC physical implementation effects. Overall, all the results provided by this work have been validated on actual silicon layout

EprintsUnife

Archivio istituzionale della ricerca - Università di Ferrara

A hardware-software codesign framework for cellular computing

Author: Mudry Pierre-André
Publication venue: Lausanne, EPFL
Publication date: 29/01/2009
Field of study

Until recently, the ever-increasing demand of computing power has been met on one hand by increasing the operating frequency of processors and on the other hand by designing architectures capable of exploiting parallelism at the instruction level through hardware mechanisms such as super-scalar execution. However, both these approaches seem to have reached a plateau, mainly due to issues related to design complexity and cost-effectiveness. To face the stabilization of performance of single-threaded processors, the current trend in processor design seems to favor a switch to coarser-grain parallelization, typically at the thread level. In other words, high computational power is achieved not only by a single, very fast and very complex processor, but through the parallel operation of several processors, each executing a different thread. Extrapolating this trend to take into account the vast amount of on-chip hardware resources that will be available in the next few decades (either through further shrinkage of silicon fabrication processes or by the introduction of molecular-scale devices), together with the predicted features of such devices (e.g., the impossibility of global synchronization or higher failure rates), it seems reasonable to foretell that current design techniques will not be able to cope with the requirements of next-generation electronic devices and that novel design tools and programming methods will have to be devised. A tempting source of inspiration to solve the problems implied by a massively parallel organization and inherently error-prone substrates is biology. In fact, living beings possess characteristics, such as robustness to damage and self-organization, which were shown in previous research as interesting to be implemented in hardware. For instance, it was possible to realize relatively simple systems, such as a self-repairing watch. Overall, these bio-inspired approaches seem very promising but their interest for a wider audience is problematic because their heavily hardware-oriented designs lack some of the flexibility achievable with a general purpose processor. In the context of this thesis, we will introduce a processor-grade processing element at the heart of a bio-inspired hardware system. This processor, based on a single-instruction, features some key properties that allow it to maintain the versatility required by the implementation of bio-inspired mechanisms and to realize general computation. We will also demonstrate that the flexibility of such a processor enables it to be evolved so it can be tailored to different types of applications. In the second half of this thesis, we will analyze how the implementation of a large number of these processors can be used on a hardware platform to explore various bio-inspired mechanisms. Based on an extensible platform of many FPGAs, configured as a networked structure of processors, the hardware part of this computing framework is backed by an open library of software components that provides primitives for efficient inter-processor communication and distributed computation. We will show that this dual software–hardware approach allows a very quick exploration of different ways to solve computational problems using bio-inspired techniques. In addition, we also show that the flexibility of our approach allows it to exploit replication as a solution to issues that concern standard embedded applications

Infoscience - École polytechnique fédérale de Lausanne