385 research outputs found
MORA - an architecture and programming model for a resource efficient coarse grained reconfigurable processor
This paper presents an architecture and implementation details for MORA, a novel coarse grained reconfigurable processor for accelerating media processing applications. The MORA architecture involves a 2-D array of several such processors, to deliver low cost, high throughput performance in media processing applications. A distinguishing feature of the MORA architecture is the co-design of hardware architecture and low-level programming language throughout the design cycle. The implementation details for the single MORA processor, and benchmark evaluation using a cycle accurate simulator are presented
Coarse-grained reconfigurable array architectures
Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code
Exploring Processor and Memory Architectures for Multimedia
Multimedia has become one of the cornerstones of our 21st century society and, when combined with mobility, has enabled a tremendous evolution of our society. However, joining these two concepts introduces many technical challenges. These range from having sufficient performance for handling multimedia content to having the battery stamina for acceptable mobile usage. When taking a projection of where we are heading, we see these issues becoming ever more challenging by increased mobility as well as advancements in multimedia content, such as introduction of stereoscopic 3D and augmented reality. The increased performance needs for handling multimedia come not only from an ongoing step-up in resolution going from QVGA (320x240) to Full HD (1920x1080) a 27x increase in less than half a decade. On top of this, there is also codec evolution (MPEG-2 to H.264 AVC) that adds to the computational load increase. To meet these performance challenges there has been processing and memory architecture advances (SIMD, out-of-order superscalarity, multicore processing and heterogeneous multilevel memories) in the mobile domain, in conjunction with ever increasing operating frequencies (200MHz to 2GHz) and on-chip memory sizes (128KB to 2-3MB). At the same time there is an increase in requirements for mobility, placing higher demands on battery-powered systems despite the steady increase in battery capacity (500 to 2000mAh). This leaves negative net result in-terms of battery capacity versus performance advances. In order to make optimal use of these architectural advances and to meet the power limitations in mobile systems, there is a need for taking an overall approach on how to best utilize these systems. The right trade-off between performance and power is crucial. On top of these constraints, the flexibility aspects of the system need to be addressed. All this makes it very important to reach the right architectural balance in the system. The first goal for this thesis is to examine multimedia applications and propose a flexible solution that can meet the architectural requirements in a mobile system. Secondly, propose an automated methodology of optimally mapping multimedia data and instructions to a heterogeneous multilevel memory subsystem. The proposed methodology uses constraint programming for solving a multidimensional optimization problem. Results from this work indicate that using today’s most advanced mobile processor technology together with a multi-level heterogeneous on-chip memory subsystem can meet the performance requirements for handling multimedia. By utilizing the automated optimal memory mapping method presented in this thesis lower total power consumption can be achieved, whilst performance for multimedia applications is improved, by employing enhanced memory management. This is achieved through reduced external accesses and better reuse of memory objects. This automatic method shows high accuracy, up to 90%, for predicting multimedia memory accesses for a given architecture
Reconfigurable architectures for the next generation of mobile device telecommunications systems
Mobile devices have become a dominant tool in our daily lives. Business and
personal usage has escalated tremendously since the emergence of smartphones
and tablets. The combination of powerful processing in mobile devices, such as
smartphones and the Internet, have established a new era for communications
systems. This has put further pressure on the performance and efficiency of
telecommunications systems in delivering the aspirations of users. Mobile device
users no longer want devices that merely perform phone calls and messaging.
Rather, they look for further interactive applications such as video streaming,
navigation and real time social interaction. Such applications require a new set of
hardware and standards. The WiFi (IEEE 802.11) standard has been at the forefront
of reliable and high-speed internet access telecommunications. This is due to its
high signal quality (quality of service) and speed (throughput). However, its limited
availability and short range highlights the need for further protocols, in particular
when far away from access points or base stations. This led to the emergence of 3G
followed by 4G and the upcoming 5G standard that, if fully realised, will provide
another dimension in “anywhere, anytime internet connectivity.” On the other
hand, the WiMAX (IEEE 802.16) standard promises to exceed the WiFi signal
coverage range. The coverage range could be extended to kilometres at least with a
better or similar WiFi signal level.
This thesis considers a dynamically reconfigurable architecture that is capable of
processing various modules within telecommunications systems. Forward error
correction, coder and navigation modules are deployed in a unified low power
communication platform. These modules have been selected since they are among
those with the highest demand in terms of processing power, strict processing time
or throughput. The modules are mainly realised within WiFi and WiMAX systems
in addition to global positioning systems (GPS). The idea behind the selection of
these modules is to investigate the possibility of designing an architecture capable
of processing various systems and dynamically reconfiguring between them. The
GPS system is a power-hungry application and, at the same time, it is not needed
all of the time. Hence, one key idea presented in this thesis is to effectively exploit
the dynamic reconfiguration capability so as to reconfigure the architecture (GPS)
when it is not needed in order to process another needed application or function
such as WiFi or WiMAX. This will allow lower energy consumption and the
optimum usage of the hardware available on the device.
This work investigates the major current coarse-grain reconfigurable architectures.
A novel multi-rate convolution encoder is then designed and realised as a
reconfigurable fabric. This demonstrates the ability to adapt the algorithms
involved to meet various requirements. A throughput of between 200 and 800
Mbps has been achieved for the rates 1/2 to 7/8, which is a great achievement for
the proposed novel architecture. A reconfigurable interleaver is designed as a
standalone fabric and on a dynamically reconfigurable processor. High throughputs
exceeding 90 Mbps are achieved for the various supported block sizes. The Reed
Solomon coder is the next challenging system to be designed into a dynamically
reconfigurable processor. A novel Galois Field multiplier is designed and
integrated into the developed Reed Solomon reconfigurable processor. As a result
of this work, throughputs of 200Mbps and 93Mbps respectively for RS encoding
and decoding are achieved. A GPS correlation module is also investigated in this
work. This is the main part of the GPS receiver responsible for continuously
tracking GPS satellites and extracting messages from them. The challenging aspect
of this part is its real-time nature and the associated critical time constraints. This
work resulted in a novel dynamically reconfigurable multi-channel GPS correlator
with up to 72 simultaneous channels.
This work is a contribution towards a global unified processing platform that is
capable of processing communication-related operations efficiently and
dynamically with minimum energy consumption
Further Specialization of Clustered VLIW Processors: A MAP Decoder for Software Defined Radio
Turbo codes are extensively used in current communications standards and have a promising outlook for future generations. The advantages of software defined radio, especially dynamic reconfiguration, make it very attractive in this multi-standard scenario. However, the complex and power consuming implementation of the maximum a posteriori (MAP) algorithm, employed by turbo decoders, sets hurdles to this goal. This work introduces an ASIP architecture for the MAP algorithm, based on a dual-clustered VLIW processor. It displays the good performance of application specific designs along with the versatility of processors, which makes it compliant with leading edge standards. The machine deals with multi-operand instructions in an innovative way, the fetching and assertion of data is serialized and the addressing is automatized and transparent for the programmer. The performance-area trade-off of the proposed architecture achieves a throughput of 8 cycles per symbol with very low power dissipation
- …