18 research outputs found

    Application specific instruction set processor design for embedded application using the coware tool

    An Application Specific Instruction Set Processor (ASIP) is widely used as a System on a Chip (SoC) component. ASIPs possess an instruction set that is tailored to benefit a specific application. Such specialization allows ASIPs to serve as an intermediate between two dominant processor design styles: ASICs, which have high processing abilities at the cost of limited programmability, and programmable solutions such as FPGAs, which provide programming flexibility at the cost of lower energy efficiency. In this dissertation the goal is to design an ASIP with a temperature sensor system in mind. The platform used for processor design is the LISA 2.0 description language and the Processor Designer environment from CoWare. CoWare Processor Designer allows the processor architecture to be defined at an abstract level, with automatic generation of a chain of software tools (assembler, linker and simulator) for functional verification, followed by an RTL description. The RTL description is used to generate a synthesis report of the design using RTL Compiler, and finally the layout is created using Cadence Encounter.

    The implementation of an LDPC decoder in a Network on Chip environment

    The proposed project originates from NEWCOM++, a cooperation initiative among research groups to develop 3G wireless mobile systems. This work focuses in particular on the communication errors arising in a message signal transmitted under the WiMAX 802.16e standard. It will be shown how this latest-generation wireless protocol needs specific, flexible instrumentation and why an LDPC error correction code is suitable for meeting the quality restrictions. A chapter is dedicated to describing, not from a mathematical point of view, the theory of the LDPC algorithm and how it can be represented graphically to better organize the decoding process. The main objective of this work is to validate the PHAL concept when addressing a complex and computationally intensive design like the LDPC encoder/decoder. The expected results are twofold: first, conceptual, identifying the shortcomings of the PHAL concept when addressing a real problem; and second, determining the overhead introduced by PHAL in the implementation of an LDPC decoder. The mission is to build a NoC (Network on Chip) able to perform the same task as a general-purpose processor, but in less time and with better efficiency in terms of component flexibility and throughput. The single element of the network is a basic processor element (PE) formed by the union of two separate components: a special-purpose processor (ASIP), responsible for LDPC decoding of the input data, and the router component PHAL, which checks incoming data packets and monitors the timing of task execution. Supported by a specific programming tool, the ASIP has been completely designed, from the architecture resources to the instruction set, in a C-like language. Realized in SystemC code and converted to VHDL, it has been synthesized to fit onto an FPGA of the Xilinx Virtex-5 family. Although the main purpose is to make the application as flexible as possible, a WiMAX-oriented LDPC implemented on an FPGA saves space and resources, given a choice of the device that best suits the project synthesis. This is because encoders and decoders will have to fit into communication devices (e.g. modems) as well as possible. The whole network scenario has been set up through a Linux application acting as a master element. The entire environment requires the use of VPI libraries and components able to manage the communication protocols and interfacing mechanisms.

    Low power architectures for streaming applications

    High-Level Design Space and Flexibility Exploration for Adaptive, Energy-Efficient WCDMA Channel Estimation Architectures

    Due to the fast changing wireless communication standards coupled with strict performance constraints, the demand for flexible yet high-performance architectures is increasing. To tackle the flexibility requirement, software-defined radio (SDR) is emerging as an obvious solution, where the underlying hardware implementation is tuned via software layers to the varied standards depending on power-performance and quality requirements leading to adaptable, cognitive radio. In this paper, we conduct a case study for representatives of two complexity classes of WCDMA channel estimation algorithms and explore the effect of flexibility on energy efficiency using different implementation options. Furthermore, we propose new design guidelines for both highly specialized architectures and highly flexible architectures using high-level synthesis, to enable the required performance and flexibility to support multiple applications. Our experiments with various design points show that the resulting architectures meet the performance constraints of WCDMA and a wide range of options are offered for tuning such architectures depending on power/performance/area constraints of SDR

    An automated OpenCL FPGA compilation framework targeting a configurable, VLIW chip multiprocessor

    Modern systems-on-chip augment their baseline CPU with coprocessors and accelerators to increase overall computational capacity and power efficiency, and thus have evolved into heterogeneous systems. Several languages have been developed to enable this paradigm shift, including CUDA and OpenCL. This thesis discusses a unified compilation environment to enable heterogeneous system design through the use of OpenCL and a customised VLIW chip multiprocessor (CMP) architecture, known as the LE1. An LLVM compilation framework was researched and a prototype developed to enable the execution of OpenCL applications on the LE1 CPU. The framework fully automates the compilation flow and supports work-item coalescing to better utilise the CPU cores and alleviate the effects of thread divergence. This thesis discusses in detail both the software stack and the target hardware architecture, and evaluates the scalability of the proposed framework on a highly precise cycle-accurate simulator. This is achieved through the execution of 12 benchmarks across 240 different machine configurations, as well as further results utilising an incomplete development branch of the compiler. It is shown that the problems generally scale well with the LE1 architecture up to eight cores, at which point the memory system becomes a serious bottleneck. Results demonstrate superlinear performance on certain benchmarks (x9 for the bitonic sort benchmark with 8 dual-issue cores), with further improvements from compiler optimisations (x14 for bitonic with the same configuration).

    Design methodologies for instruction-set extensible processors

    Ph.D. (Doctor of Philosophy)

    Extensible microprocessor without interlocked pipeline stages (emips), the reconfigurable microprocessor

    In this thesis we propose to realize the performance benefits of application-specific hardware optimizations in a general-purpose, multi-user system environment using a dynamically extensible microprocessor architecture. We have called our dynamically extensible microprocessor design the Extensible Microprocessor without Interlocked Pipeline Stages, or eMIPS. The eMIPS architecture uses the interaction of fixed and configurable logic available in modern Field Programmable Gate Arrays (FPGAs). This interaction is used to address the limitations of current microprocessor architectures based solely on Application Specific Integrated Circuits (ASICs). These limitations include inflexibility, size, and application-specific performance optimization. The eMIPS system allows multiple secure extensions to load dynamically and to plug into the stages of a pipelined central processing unit (CPU) data path, thereby extending the core instruction set of the microprocessor. Extensions can also be used to realize on-chip peripherals and, if area permits, even multiple cores. Extension instructions dramatically reduce the execution time of frequently executed instruction patterns. The new functionalities we have developed can be exploited by patching the binaries of existing applications, without any changes to the compilers. An FPGA-based workstation prototype and a flexible simulation system implementing this design demonstrate speedups of 2x-3x on a set of applications that include video games, real-time programs and the SPEC2000 integer benchmarks. eMIPS is the first realized workstation based entirely on a dynamically extensible microprocessor that is safe for general-purpose, multi-user applications. By exposing the individual stages of the data path, eMIPS allows optimizations not previously possible. These include permitting safe and coherent accesses to memory from within an extension, optimizing multi-branched blocks, and throwing precise and restartable exceptions from within an extension. This work describes a simplified implementation of an extensible microprocessor architecture based on the Microprocessor without Interlocked Pipeline Stages (MIPS) Reduced Instruction Set Computer (RISC) architecture. The concepts and methods contained within this thesis may be applied to other similar architectures. Given this simplified prototype, we go on to propose how this architecture will be expanded as it matures.

    Run-time management for future MPSoC platforms

    In recent years, we have been witnessing the dawn of the Multi-Processor System-on-Chip (MPSoC) era. In essence, this era is triggered by the need to handle more complex applications while reducing the overall cost of embedded (handheld) devices. This cost will mainly be determined by the cost of the hardware platform and the cost of designing applications for that platform. The cost of a hardware platform will partly depend on its production volume. In turn, this means that flexible, (easily) programmable multi-purpose platforms will exhibit a lower cost. A multi-purpose platform not only requires flexibility, but should also combine high performance with low power consumption. To this end, MPSoC devices integrate computer architectural properties of various computing domains. Just like large-scale parallel and distributed systems, they contain multiple heterogeneous processing elements interconnected by a scalable, network-like structure. This helps in achieving scalable high performance. As in most mobile or portable embedded systems, there is a need for low-power operation and real-time behavior. The cost of designing applications is equally important. Indeed, the actual value of future MPSoC devices is not contained within the embedded multiprocessor IC, but in their capability to provide the user of the device with a number of services or experiences. So from an application viewpoint, MPSoCs are designed to efficiently process multimedia content in applications like video players, video conferencing, 3D gaming, augmented reality, etc. Such applications typically require a lot of processing power and a significant amount of memory. To keep up with ever-evolving user needs and with new application standards appearing at a fast pace, MPSoC platforms need to be easily programmable. Application scalability, i.e. the ability to use just enough platform resources according to the user requirements and with respect to the device capabilities, is also an important factor. Hence scalability, flexibility, real-time behavior, high performance, low power consumption and, finally, programmability are key components in realizing the success of MPSoC platforms. The run-time manager is logically located between the application layer and the platform layer. It has a crucial role in realizing these MPSoC requirements. As it abstracts the platform hardware, it improves platform programmability. By deciding on resource assignment at run-time, based on the performance requirements of the user, the needs of the application and the capabilities of the platform, it contributes to flexibility, scalability and low-power operation. As it has an arbiter function between different applications, it enables real-time behavior. This thesis details the key components of such an MPSoC run-time manager and provides a proof-of-concept implementation. These key components include application quality management algorithms linked to MPSoC resource management mechanisms and policies, adapted to the provided MPSoC platform services. First, we describe the role, the responsibilities and the boundary conditions of an MPSoC run-time manager in a generic way. This includes a definition of the multiprocessor run-time management design space, a description of the run-time manager design trade-offs and a brief discussion of how these trade-offs affect the key MPSoC requirements. This design space definition and the trade-offs are illustrated based on ongoing research and on existing commercial and academic multiprocessor run-time management solutions. Subsequently, we introduce a fast and efficient resource allocation heuristic that considers FPGA fabric properties such as fragmentation. In addition, this thesis introduces a novel task assignment algorithm for handling soft IP cores, denoted hierarchical configuration. Hierarchical configuration managed by the run-time manager enables easier application design and increases the run-time spatial mapping freedom. In turn, this improves the performance of the resource assignment algorithm. Furthermore, we introduce run-time task migration components. We detail a new run-time task migration policy closely coupled to the run-time resource assignment algorithm. In addition to detailing a design-environment-supported mechanism that enables moving tasks between an ISP and fine-grained reconfigurable hardware, we also propose two novel task migration mechanisms tailored to the Network-on-Chip environment. Finally, we propose a novel mechanism for task migration initiation, based on reusing debug registers in modern embedded microprocessors. We propose a reactive on-chip communication management mechanism. We show that by exploiting an injection rate control mechanism it is possible to provide a communication management system capable of providing a soft (reactive) QoS in a NoC. We introduce a novel, platform-independent run-time algorithm to perform quality management, i.e. to select an application quality operating point at run-time based on the user requirements and the available platform resources, as reported by the resource manager. This contribution also proposes a novel way to manage the interaction between the quality manager and the resource manager. In order to have a realistic, reproducible and flexible run-time manager testbench with respect to applications with multiple quality levels and implementation trade-offs, we have created an input data generation tool denoted Pareto Surfaces For Free (PSFF). The PSFF tool is, to the best of our knowledge, the first tool that generates multiple realistic application operating points either based on profiling information of a real-life application or based on a designer-controlled random generator. Finally, we provide a proof-of-concept demonstrator that combines these concepts and shows how these mechanisms and policies can operate in real-life situations. In addition, we show that the proposed solutions can be integrated into existing platform operating systems.