1,145 research outputs found
State of the art baseband DSP platforms for Software Defined Radio: A survey
Software Defined Radio (SDR) is an innovative approach which is becoming a more and more promising technology for future mobile handsets. Several proposals in the field of embedded systems have been introduced by different universities and industries to support SDR applications. This article presents an overview of current platforms and analyzes the related architectural choices, the current issues in SDR, as well as potential future trends.Peer reviewe
Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC\u2710 - May 17-19, 2010 Karlsruhe, Germany. (KIT Scientific Reports ; 7551)
ReCoSoC is intended to be a periodic annual meeting to expose and discuss gathered expertise as well as state of the art research around SoC related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow\u27s challenges in the multibillion transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability
Energy-Efficient Computing for Mobile Signal Processing
Mobile devices have rapidly proliferated, and deployment of handheld devices continues to increase at a spectacular rate. As today's devices not only support advanced signal processing of wireless communication data but also provide rich sets of applications, contemporary mobile computing requires both demanding computation and efficiency. Most mobile processors combine general-purpose processors, digital signal processors, and hardwired application-specific integrated circuits to satisfy their high-performance and low-power requirements. However, such a heterogeneous platform is inefficient in area, power and programmability. Improving the efficiency of programmable mobile systems is a critical challenge and an active area of computer systems research.
SIMD (single instruction multiple data) architectures are very effective for data-level-parallelism intense algorithms in mobile signal processing. However, new characteristics of advanced wireless/multimedia algorithms require architectural re-evaluation to achieve better energy efficiency. Therefore, fourth generation wireless protocol and high definition mobile video algorithms are analyzed to enhance a wide-SIMD architecture. The key enhancements include 1) programmable crossbar to support complex data alignment, 2) SIMD partitioning to support fine-grain SIMD computation, and 3) fused operation to support accelerating frequently used instruction pairs.
Near-threshold computation has been attractive in low-power architecture research because it balances performance and power. To further improve energy efficiency in mobile computing, near-threshold computation is applied to a wide SIMD architecture. This proposed near-threshold wide SIMD architecture-Diet SODA-presents interesting architectural design decisions such as 1) very wide SIMD datapath to compensate for degraded performance induced by near-threshold computation and 2) scatter-gather data prefetcher to exploit large latency gap between memory and the SIMD datapath. Although near-threshold computation provides excellent energy efficiency, it suffers from increased delay variations. A systematic study of delay variations in near-threshold computing is performed and simple techniques-structural duplication and voltage/frequency margining-are explored to tolerate and mitigate the delay variations in near-threshold wide SIMD architectures.
This dissertation analyzes representative wireless/multimedia mobile signal processing algorithms, proposes an energy-efficient programmable platform, and evaluates performance and power. A main theme of this dissertation is that the performance and efficiency of programmable embedded systems can be significantly improved with a combination of parallel SIMD and near-threshold computations.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/86356/1/swseo_1.pd
Recommended from our members
Design and performance optimization of asynchronous networks-on-chip
As digital systems continue to grow in complexity, the design of conventional synchronous systems is facing unprecedented challenges. The number of transistors on individual chips is already in the multi-billion range, and a greatly increasing number of components are being integrated onto a single chip. As a consequence, modern digital designs are under strong time-to-market pressure, and there is a critical need for composable design approaches for large complex systems.
In the past two decades, networks-on-chip (NoC’s) have been a highly active research area. In a NoC-based system, functional blocks are first designed individually and may run at different clock rates. These modules are then connected through a structured network for on-chip global communication. However, due to the rigidity of centrally-clocked NoC’s, there have been bottlenecks of system scalability, energy and performance, which cannot be easily solved with synchronous approaches. As a result, there has been significant recent interest in combing the notion of asynchrony with NoC designs. Since the NoC approach inherently separates the communication infrastructure, and its timing, from computational elements, it is a natural match for an asynchronous paradigm. Asynchronous NoC’s, therefore, enable a modular and extensible system composition for an ‘object-orient’ design style.
The thesis aims to significantly advance the state-of-art and viability of asynchronous and globally-asynchronous locally-synchronous (GALS) networks-on-chip, to enable high-performance and low-energy systems. The proposed asynchronous NoC’s are nearly entirely based on standard cells, which eases their integration into industrial design flows. The contributions are instantiated in three different directions.
First, practical acceleration techniques are proposed for optimizing the system latency, in order to break through the latency bottleneck in the memory interfaces of many on-chip parallel processors. Novel asynchronous network protocols are proposed, along with concrete NoC designs. A new concept, called ‘monitoring network’, is introduced. Monitoring networks are lightweight shadow networks used for fast-forwarding anticipated traffic information, ahead of the actual packet traffic. The routers are therefore allowed to initiate and perform arbitration and channel allocation in advance. The technique is successfully applied to two topologies which belong to two different categories – a variant mesh-of-trees (MoT) structure and a 2D-mesh topology. Considerable and stable latency improvements are observed across a wide range of traffic patterns, along with moderate throughput gains.
Second, for the first time, a high-performance and low-power asynchronous NoC router is compared directly to a leading commercial synchronous counterpart in an advanced industrial technology. The asynchronous router design shows significant performance improvements, as well as area and power savings. The proposed asynchronous router integrates several advanced techniques, including a low-latency circular FIFO for buffer design, and a novel end-to-end credit-based virtual channel (VC) flow control. In addition, a semi-automated design flow is created, which uses portions of a standard synchronous tool flow.
Finally, a high-performance multi-resource asynchronous arbiter design is developed. This small but important component can be directly used in existing asynchronous NoC’s for performance optimization. In addition, this standalone design promises use in opening up new NoC directions, as well as for general use in parallel systems. In the proposed arbiter design, the allocation of a resource to a client is divided into several steps. Multiple successive client-resource pairs can be selected rapidly in pipelined sequence, and the completion of the assignments can overlap in parallel.
In sum, the thesis provides a set of advanced design solutions for performance optimization of asynchronous and GALS networks-on-chip. These solutions are at different levels, from network protocols, down to router- and component-level optimizations, which can be directly applied to existing basic asynchronous NoC designs to provide a leap in performance improvement
High-Level Design for Ultra-Fast Software Defined Radio Prototyping on Multi-Processors Heterogeneous Platforms
International audienceThe design of Software Defined Radio (SDR) equipments (terminals, base stations, etc.) is still very challenging. We propose here a design methodology for ultra-fast prototyping on heterogeneous platforms made of GPPs (General Purpose Processors), DSPs (Digital Signal Processors) and FPGAs (Field Programmable Gate Array). Lying on a component-based approach, the methodology mainly aims at automating as much as possible the design from an algorithmic validation to a multi-processing heterogeneous implementation. The proposed methodology is based on the SynDEx CAD design approach, which was originally dedicated to multi-GPPs networks. We show how this was changed so that it is made appropriate with an embedded context of DSP. The implication of FPGAs is then addressed and integrated in the design approach with very little restrictions. Apart from a manual HW/SW partitioning, all other operations may be kept automatic in a heterogeneous processing context. The targeted granularity of the components, which are to be assembled in the design flow, is roughly the same size as that of a FFT, a filter or a Viterbi decoder for instance. The re-use of third party or pre-developed IPs is a basis for this design approach. Thanks to the proposed design methodology it is possible to port "ultra" fast a radio application over several platforms. In addition, the proposed design methodology is not restricted to SDR equipment design, and can be useful for any real-time embedded heterogeneous design in a prototyping context
Parallel Architectures for Many-Core Systems-On-Chip in Deep Sub-Micron Technology
Despite the several issues faced in the past, the evolutionary trend of silicon has kept its constant pace. Today an ever increasing number of cores is integrated onto the same die. Unfortunately, the extraordinary performance achievable by the many-core paradigm is limited by several factors. Memory bandwidth limitation, combined with inefficient synchronization mechanisms, can severely overcome the potential computation capabilities. Moreover, the huge HW/SW design space requires accurate and flexible tools to perform architectural explorations and validation of design choices.
In this thesis we focus on the aforementioned aspects: a flexible and accurate Virtual Platform has been developed, targeting a reference many-core architecture. Such tool has been used to perform architectural explorations, focusing on instruction caching architecture and hybrid HW/SW synchronization mechanism. Beside architectural implications, another issue of embedded systems is considered: energy efficiency. Near Threshold Computing is a key research area in the Ultra-Low-Power domain, as it promises a tenfold improvement in energy efficiency compared to super-threshold operation and it mitigates thermal bottlenecks. The physical implications of modern deep sub-micron technology are severely limiting performance and reliability of modern designs. Reliability becomes a major obstacle when operating in NTC, especially memory operation becomes unreliable and can compromise system correctness. In the present work a novel hybrid memory architecture is devised to overcome reliability issues and at the same time improve energy efficiency by means of aggressive voltage scaling when allowed by workload requirements. Variability is another great drawback of near-threshold operation. The greatly increased sensitivity to threshold voltage variations in today a major concern for electronic devices. We introduce a variation-tolerant extension of the baseline many-core architecture. By means of micro-architectural knobs and a lightweight runtime control unit, the baseline architecture becomes dynamically tolerant to variations
Reconfigurable architectures for the next generation of mobile device telecommunications systems
Mobile devices have become a dominant tool in our daily lives. Business and
personal usage has escalated tremendously since the emergence of smartphones
and tablets. The combination of powerful processing in mobile devices, such as
smartphones and the Internet, have established a new era for communications
systems. This has put further pressure on the performance and efficiency of
telecommunications systems in delivering the aspirations of users. Mobile device
users no longer want devices that merely perform phone calls and messaging.
Rather, they look for further interactive applications such as video streaming,
navigation and real time social interaction. Such applications require a new set of
hardware and standards. The WiFi (IEEE 802.11) standard has been at the forefront
of reliable and high-speed internet access telecommunications. This is due to its
high signal quality (quality of service) and speed (throughput). However, its limited
availability and short range highlights the need for further protocols, in particular
when far away from access points or base stations. This led to the emergence of 3G
followed by 4G and the upcoming 5G standard that, if fully realised, will provide
another dimension in “anywhere, anytime internet connectivity.” On the other
hand, the WiMAX (IEEE 802.16) standard promises to exceed the WiFi signal
coverage range. The coverage range could be extended to kilometres at least with a
better or similar WiFi signal level.
This thesis considers a dynamically reconfigurable architecture that is capable of
processing various modules within telecommunications systems. Forward error
correction, coder and navigation modules are deployed in a unified low power
communication platform. These modules have been selected since they are among
those with the highest demand in terms of processing power, strict processing time
or throughput. The modules are mainly realised within WiFi and WiMAX systems
in addition to global positioning systems (GPS). The idea behind the selection of
these modules is to investigate the possibility of designing an architecture capable
of processing various systems and dynamically reconfiguring between them. The
GPS system is a power-hungry application and, at the same time, it is not needed
all of the time. Hence, one key idea presented in this thesis is to effectively exploit
the dynamic reconfiguration capability so as to reconfigure the architecture (GPS)
when it is not needed in order to process another needed application or function
such as WiFi or WiMAX. This will allow lower energy consumption and the
optimum usage of the hardware available on the device.
This work investigates the major current coarse-grain reconfigurable architectures.
A novel multi-rate convolution encoder is then designed and realised as a
reconfigurable fabric. This demonstrates the ability to adapt the algorithms
involved to meet various requirements. A throughput of between 200 and 800
Mbps has been achieved for the rates 1/2 to 7/8, which is a great achievement for
the proposed novel architecture. A reconfigurable interleaver is designed as a
standalone fabric and on a dynamically reconfigurable processor. High throughputs
exceeding 90 Mbps are achieved for the various supported block sizes. The Reed
Solomon coder is the next challenging system to be designed into a dynamically
reconfigurable processor. A novel Galois Field multiplier is designed and
integrated into the developed Reed Solomon reconfigurable processor. As a result
of this work, throughputs of 200Mbps and 93Mbps respectively for RS encoding
and decoding are achieved. A GPS correlation module is also investigated in this
work. This is the main part of the GPS receiver responsible for continuously
tracking GPS satellites and extracting messages from them. The challenging aspect
of this part is its real-time nature and the associated critical time constraints. This
work resulted in a novel dynamically reconfigurable multi-channel GPS correlator
with up to 72 simultaneous channels.
This work is a contribution towards a global unified processing platform that is
capable of processing communication-related operations efficiently and
dynamically with minimum energy consumption
- …