385 research outputs found

    MORA - an architecture and programming model for a resource efficient coarse grained reconfigurable processor

    Get PDF
    This paper presents an architecture and implementation details for MORA, a novel coarse grained reconfigurable processor for accelerating media processing applications. The MORA architecture involves a 2-D array of several such processors, to deliver low cost, high throughput performance in media processing applications. A distinguishing feature of the MORA architecture is the co-design of hardware architecture and low-level programming language throughout the design cycle. The implementation details for the single MORA processor, and benchmark evaluation using a cycle accurate simulator are presented

    Coarse-grained reconfigurable array architectures

    Get PDF
    Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code

    Domain-specific and reconfigurable instruction cells based architectures for low-power SoC

    Get PDF

    Reconfigurable Instruction Cell Architecture Reconfiguration and Interconnects

    Get PDF

    Exploring Processor and Memory Architectures for Multimedia

    Get PDF
    Multimedia has become one of the cornerstones of our 21st century society and, when combined with mobility, has enabled a tremendous evolution of our society. However, joining these two concepts introduces many technical challenges. These range from having sufficient performance for handling multimedia content to having the battery stamina for acceptable mobile usage. When taking a projection of where we are heading, we see these issues becoming ever more challenging by increased mobility as well as advancements in multimedia content, such as introduction of stereoscopic 3D and augmented reality. The increased performance needs for handling multimedia come not only from an ongoing step-up in resolution going from QVGA (320x240) to Full HD (1920x1080) a 27x increase in less than half a decade. On top of this, there is also codec evolution (MPEG-2 to H.264 AVC) that adds to the computational load increase. To meet these performance challenges there has been processing and memory architecture advances (SIMD, out-of-order superscalarity, multicore processing and heterogeneous multilevel memories) in the mobile domain, in conjunction with ever increasing operating frequencies (200MHz to 2GHz) and on-chip memory sizes (128KB to 2-3MB). At the same time there is an increase in requirements for mobility, placing higher demands on battery-powered systems despite the steady increase in battery capacity (500 to 2000mAh). This leaves negative net result in-terms of battery capacity versus performance advances. In order to make optimal use of these architectural advances and to meet the power limitations in mobile systems, there is a need for taking an overall approach on how to best utilize these systems. The right trade-off between performance and power is crucial. On top of these constraints, the flexibility aspects of the system need to be addressed. All this makes it very important to reach the right architectural balance in the system. The first goal for this thesis is to examine multimedia applications and propose a flexible solution that can meet the architectural requirements in a mobile system. Secondly, propose an automated methodology of optimally mapping multimedia data and instructions to a heterogeneous multilevel memory subsystem. The proposed methodology uses constraint programming for solving a multidimensional optimization problem. Results from this work indicate that using today’s most advanced mobile processor technology together with a multi-level heterogeneous on-chip memory subsystem can meet the performance requirements for handling multimedia. By utilizing the automated optimal memory mapping method presented in this thesis lower total power consumption can be achieved, whilst performance for multimedia applications is improved, by employing enhanced memory management. This is achieved through reduced external accesses and better reuse of memory objects. This automatic method shows high accuracy, up to 90%, for predicting multimedia memory accesses for a given architecture

    Reconfigurable architectures for beyond 3G wireless communication systems

    Get PDF

    Reconfigurable architectures for the next generation of mobile device telecommunications systems

    Get PDF
    Mobile devices have become a dominant tool in our daily lives. Business and personal usage has escalated tremendously since the emergence of smartphones and tablets. The combination of powerful processing in mobile devices, such as smartphones and the Internet, have established a new era for communications systems. This has put further pressure on the performance and efficiency of telecommunications systems in delivering the aspirations of users. Mobile device users no longer want devices that merely perform phone calls and messaging. Rather, they look for further interactive applications such as video streaming, navigation and real time social interaction. Such applications require a new set of hardware and standards. The WiFi (IEEE 802.11) standard has been at the forefront of reliable and high-speed internet access telecommunications. This is due to its high signal quality (quality of service) and speed (throughput). However, its limited availability and short range highlights the need for further protocols, in particular when far away from access points or base stations. This led to the emergence of 3G followed by 4G and the upcoming 5G standard that, if fully realised, will provide another dimension in “anywhere, anytime internet connectivity.” On the other hand, the WiMAX (IEEE 802.16) standard promises to exceed the WiFi signal coverage range. The coverage range could be extended to kilometres at least with a better or similar WiFi signal level. This thesis considers a dynamically reconfigurable architecture that is capable of processing various modules within telecommunications systems. Forward error correction, coder and navigation modules are deployed in a unified low power communication platform. These modules have been selected since they are among those with the highest demand in terms of processing power, strict processing time or throughput. The modules are mainly realised within WiFi and WiMAX systems in addition to global positioning systems (GPS). The idea behind the selection of these modules is to investigate the possibility of designing an architecture capable of processing various systems and dynamically reconfiguring between them. The GPS system is a power-hungry application and, at the same time, it is not needed all of the time. Hence, one key idea presented in this thesis is to effectively exploit the dynamic reconfiguration capability so as to reconfigure the architecture (GPS) when it is not needed in order to process another needed application or function such as WiFi or WiMAX. This will allow lower energy consumption and the optimum usage of the hardware available on the device. This work investigates the major current coarse-grain reconfigurable architectures. A novel multi-rate convolution encoder is then designed and realised as a reconfigurable fabric. This demonstrates the ability to adapt the algorithms involved to meet various requirements. A throughput of between 200 and 800 Mbps has been achieved for the rates 1/2 to 7/8, which is a great achievement for the proposed novel architecture. A reconfigurable interleaver is designed as a standalone fabric and on a dynamically reconfigurable processor. High throughputs exceeding 90 Mbps are achieved for the various supported block sizes. The Reed Solomon coder is the next challenging system to be designed into a dynamically reconfigurable processor. A novel Galois Field multiplier is designed and integrated into the developed Reed Solomon reconfigurable processor. As a result of this work, throughputs of 200Mbps and 93Mbps respectively for RS encoding and decoding are achieved. A GPS correlation module is also investigated in this work. This is the main part of the GPS receiver responsible for continuously tracking GPS satellites and extracting messages from them. The challenging aspect of this part is its real-time nature and the associated critical time constraints. This work resulted in a novel dynamically reconfigurable multi-channel GPS correlator with up to 72 simultaneous channels. This work is a contribution towards a global unified processing platform that is capable of processing communication-related operations efficiently and dynamically with minimum energy consumption

    Reconfigurable microarchitectures at the programmable logic interface

    Get PDF

    Domain specific high performance reconfigurable architecture for a communication platform

    Get PDF

    Further Specialization of Clustered VLIW Processors: A MAP Decoder for Software Defined Radio

    Get PDF
    Turbo codes are extensively used in current communications standards and have a promising outlook for future generations. The advantages of software defined radio, especially dynamic reconfiguration, make it very attractive in this multi-standard scenario. However, the complex and power consuming implementation of the maximum a posteriori (MAP) algorithm, employed by turbo decoders, sets hurdles to this goal. This work introduces an ASIP architecture for the MAP algorithm, based on a dual-clustered VLIW processor. It displays the good performance of application specific designs along with the versatility of processors, which makes it compliant with leading edge standards. The machine deals with multi-operand instructions in an innovative way, the fetching and assertion of data is serialized and the addressing is automatized and transparent for the programmer. The performance-area trade-off of the proposed architecture achieves a throughput of 8 cycles per symbol with very low power dissipation
    corecore