28 research outputs found

    An Approach Based on Edge Coloring of Tripartite Graph for Designing Parallel LDPC Interleaver Architecture

    No full text
    International audienceA practical and feasible solution for LDPC decoder is to design partially-parallel hardware architecture. These architectures are efficient in terms of area, cost, flexibility and performances. However, this type of architecture is complex to design since concurrent read and write accesses to data have to be performed at each time instance without any conflict. To solve this memory mapping problem, we present in this paper, an original approach based on a tripartite graph modeling and a modified edge coloring algorithm to design parallel LDPC interleaver architecture

    Design of Parallel LDPC Interleaver Architecture: A Bipartite Edge Coloring Approach

    No full text
    International audienceParallel hardware architecture proves to be an excellent compromise between area, cost, flexibility and high throughput in the hardware design of LDPC decoder. However, this type of architecture suffers from memory mapping problem: concurrent read and write accesses to data have to be performed at each time instance without any conflict. In this paper, we present an original approach based on the tanner graph modeling and a modified bipartite edge coloring algorithm to design parallel LDPC interleaver architecture

    High-Throughput Contention-Free Concurrent Interleaver Architecture for Multi-Standard Turbo Decoder

    Get PDF
    To meet the higher data rate requirement of emerging wireless communication technology, numerous parallel turbo decoder architectures have been developed. However, the interleaver has become a major bottleneck that limits the achievable throughput in the parallel decoders due to the massive memory conflicts. In this paper, we propose a flexible Double-Buffer based Contention-Free (DBCF) interleaver architecture that can efficiently solve the memory conflict problem for parallel turbo decoders with very high parallelism. The proposed DBCF architecture enables high throughput concurrent interleaving for multi-standard turbo decoders that support UMTS/HSPA+, LTE and WiMAX, with small datapath delays and low hardware cost. We implemented the DBCF interleaver with a 65nm CMOS technology. The implementation of this highly efficient DBCF interleaver architecture shows significant improvement in terms of the maximum throughput and occupied chip area compared to the previous work.HuaweiNational Science Foundatio

    Reconfiguration of field programmable logic in embedded systems

    Get PDF

    Portable Waveform Development for Software Defined Radios

    Get PDF
    This work focuses on the question: "How can we build waveforms that can be moved from one platform to another?\u27\u27 Therefore an approach based on the Model Driven Architecture was evaluated. Furthermore, a proof of concept is given with the port of a TETRA waveform from a USRP platform to an SFF SDR platform

    Energy-efficient design and implementation of turbo codes for wireless sensor network

    No full text
    The objective of this thesis is to apply near Shannon limit Error-Correcting Codes (ECCs), particularly the turbo-like codes, to energy-constrained wireless devices, for the purpose of extending their lifetime. Conventionally, sophisticated ECCs are applied to applications, such as mobile telephone networks or satellite television networks, to facilitate long range and high throughput wireless communication. For low power applications, such as Wireless Sensor Networks (WSNs), these ECCs were considered due to their high decoder complexities. In particular, the energy efficiency of the sensor nodes in WSNs is one of the most important factors in their design. The processing energy consumption required by high complexity ECCs decoders is a significant drawback, which impacts upon the overall energy consumption of the system. However, as Integrated Circuit (IC) processing technology is scaled down, the processing energy consumed by hardware resources reduces exponentially. As a result, near Shannon limit ECCs have recently begun to be considered for use in WSNs to reduce the transmission energy consumption [1,2]. However, to ensure that the transmission energy consumption reduction granted by the employed ECC makes a positive improvement on the overall energy efficiency of the system, the processing energy consumption must still be carefully considered.The main subject of this thesis is to optimise the design of turbo codes at both an algorithmic and a hardware implementation level for WSN scenarios. The communication requirements of the target WSN applications, such as communication distance, channel throughput, network scale, transmission frequency, network topology, etc, are investigated. Those requirements are important factors for designing a channel coding system. Especially when energy resources are limited, the trade-off between the requirements placed on different parameters must be carefully considered, in order to minimise the overall energy consumption. Moreover, based on this investigation, the advantages of employing near Shannon limit ECCs in WSNs are discussed. Low complexity and energy-efficient hardware implementations of the ECC decoders are essential for the target applications

    BASEBAND RADIO MODEM DESIGN USING GRAPHICS PROCESSING UNITS

    Get PDF
    A modern radio or wireless communications transceiver is programmed via software and firmware to change its functionalities at the baseband. However, the actual implementation of the radio circuits relies on dedicated hardware, and the design and implementation of such devices are time consuming and challenging. Due to the need for real-time operation, dedicated hardware is preferred in order to meet stringent requirements on throughput and latency. With increasing need for higher throughput and shorter latency, while supporting increasing bandwidth across a fragmented spectrum, dedicated subsystems are developed in order to service individual frequency bands and specifications. Such a dedicated-hardware-intensive approach leads to high resource costs, including costs due to multiple instantiations of mixers, filters, and samplers. Such increases in hardware requirements in turn increases device size, power consumption, weight, and financial cost. If it can meet the required real-time constraints, a more flexible and reconfigurable design approach, such as a software-based solution, is often more desirable over a dedicated hardware solution. However, significant challenges must be overcome in order to meet constraints on throughput and latency while servicing different frequency bands and bandwidths. Graphics processing unit (GPU) technology provides a promising class of platforms for addressing these challenges. GPUs, which were originally designed for rendering images and video sequences, have been adapted as general purpose high-throughput computation engines for a wide variety of application areas beyond their original target domains. Linear algebra and signal processing acceleration are examples of such application areas. In this thesis, we apply GPUs as software-based, baseband radios and demonstrate novel, software-based implementations of key subsystems in modern wireless transceivers. In our work, we develop novel implementation techniques that allow communication system designers to use GPUs as accelerators for baseband processing functions, including real-time filtering and signal transformations. More specifically, we apply GPUs to accelerate several computationally-intensive, frontend radio subsystems, including filtering, signal mixing, sample rate conversion, and synchronization. These are critical subsystems that must operate in real-time to reliably receive waveforms. The contributions of this thesis can be broadly organized into 3 major areas: (1) channelization, (2) arbitrary resampling, and (3) synchronization. 1. Channelization: a wideband signal is shared between different users and channels, and a channelizer is used to separate the components of the shared signal in the different channels. A channelizer is often used as a pre-processing step in selecting a specific channel-of-interest. A typical channelization process involves signal conversion, resampling, and filtering to reject adjacent channels. We investigate GPU acceleration for a particularly efficient form of channelizer called a polyphase filterbank channelizer, and demonstrate a real-time implementation of our novel channelizer design. 2. Arbitrary resampling: following a channelization process, a signal is often resampled to at least twice the data rate in order to further condition the signal. Since different communication standards require different resampling ratios, it is desirable for a resampling subsystem to support a variety of different ratios. We investigate optimized, GPU-based methods for resampling using polyphase filter structures that are mapped efficiently into GPU hardware. We investigate these GPU implementation techniques in the context of interpolation (integer-factor increases in sampling rate), decimation (integer-factor decreases in sampling rate), and rational resampling. Finally, we demonstrate an efficient implementation of arbitrary resampling using GPUs. This implementation exploits specialized hardware units within the GPU to enable efficient and accurate resampling processes involving arbitrary changes in sample rate. 3. Synchronization: incoming signals in a wireless communications transceiver must be synchronized in order to recover the transmitted data properly from complex channel effects such as thermal noise, fading, and multipath propagation. We investigate timing recovery in GPUs to accelerate the most computationally intensive part of the synchronization process, and correctly align the incoming data symbols in the receiver. Furthermore, we implement fully-parallel timing error detection to accelerate maximum likelihood estimation

    Doctor of Philosophy

    Get PDF
    dissertationThe embedded system space is characterized by a rapid evolution in the complexity and functionality of applications. In addition, the short time-to-market nature of the business motivates the use of programmable devices capable of meeting the conflicting constraints of low-energy, high-performance, and short design times. The keys to achieving these conflicting constraints are specialization and maximally extracting available application parallelism. General purpose processors are flexible but are either too power hungry or lack the necessary performance. Application-specific integrated circuits (ASICS) efficiently meet the performance and power needs but are inflexible. Programmable domain-specific architectures (DSAs) are an attractive middle ground, but their design requires significant time, resources, and expertise in a variety of specialties, which range from application algorithms to architecture and ultimately, circuit design. This dissertation presents CoGenE, a design framework that automates the design of energy-performance-optimal DSAs for embedded systems. For a given application domain and a user-chosen initial architectural specification, CoGenE consists of a a Compiler to generate execution binary, a simulator Generator to collect performance/energy statistics, and an Explorer that modifies the current architecture to improve energy-performance-area characteristics. The above process repeats automatically until the user-specified constraints are achieved. This removes or alleviates the time needed to understand the application, manually design the DSA, and generate object code for the DSA. Thus, CoGenE is a new design methodology that represents a significant improvement in performance, energy dissipation, design time, and resources. This dissertation employs the face recognition domain to showcase a flexible architectural design methodology that creates "ASIC-like" DSAs. The DSAs are instruction set architecture (ISA)-independent and achieve good energy-performance characteristics by coscheduling the often conflicting constraints of data access, data movement, and computation through a flexible interconnect. This represents a significant increase in programming complexity and code generation time. To address this problem, the CoGenE compiler employs integer linear programming (ILP)-based 'interconnect-aware' scheduling techniques for automatic code generation. The CoGenE explorer employs an iterative technique to search the complete design space and select a set of energy-performance-optimal candidates. When compared to manual designs, results demonstrate that CoGenE produces superior designs for three application domains: face recognition, speech recognition and wireless telephony. While CoGenE is well suited to applications that exhibit a streaming behavior, multithreaded applications like ray tracing present a different but important challenge. To demonstrate its generality, CoGenE is evaluated in designing a novel multicore N-wide SIMD architecture, known as StreamRay, for the ray tracing domain. CoGenE is used to synthesize the SIMD execution cores, the compiler that generates the application binary, and the interconnection subsystem. Further, separating address and data computations in space reduces data movement and contention for resources, thereby significantly improving performance compared to existing ray tracing approaches

    Récepteur itératif pour les systèmes MIMO-OFDM basé sur le décodage sphérique : convergence, performance et complexité

    Get PDF
    Recently, iterative processing has been widely considered to achieve near-capacity performance and reliable high data rate transmission, for future wireless communication systems. However, such an iterative processing poses significant challenges for efficient receiver design. In this thesis, iterative receiver combining multiple-input multiple-output (MIMO) detection with channel decoding is investigated for high data rate transmission. The convergence, the performance and the computational complexity of the iterative receiver for MIMO-OFDM system are considered. First, we review the most relevant hard-output and soft-output MIMO detection algorithms based on sphere decoding, K-Best decoding, and interference cancellation. Consequently, a low-complexity K-best (LCK- Best) based decoder is proposed in order to substantially reduce the computational complexity without significant performance degradation. We then analyze the convergence behaviors of combining these detection algorithms with various forward error correction codes, namely LTE turbo decoder and LDPC decoder with the help of Extrinsic Information Transfer (EXIT) charts. Based on this analysis, a new scheduling order of the required inner and outer iterations is suggested. The performance of the proposed receiver is evaluated in various LTE channel environments, and compared with other MIMO detection schemes. Secondly, the computational complexity of the iterative receiver with different channel coding techniques is evaluated and compared for different modulation orders and coding rates. Simulation results show that our proposed approaches achieve near optimal performance but more importantly it can substantially reduce the computational complexity of the system. From a practical point of view, fixed-point representation is usually used in order to reduce the hardware costs in terms of area, power consumption and execution time. Therefore, we present efficient fixed point arithmetic of the proposed iterative receiver based on LC-KBest decoder. Additionally, the impact of the channel estimation on the system performance is studied. The proposed iterative receiver is tested in a real-time environment using the MIMO WARP platform.Pour permettre l’accroissement de débit et de robustesse dans les futurs systèmes de communication sans fil, les processus itératifs sont de plus considérés dans les récepteurs. Cependant, l’adoption d’un traitement itératif pose des défis importants dans la conception du récepteur. Dans cette thèse, un récepteur itératif combinant les techniques de détection multi-antennes avec le décodage de canal est étudié. Trois aspects sont considérés dans un contexte MIMOOFDM: la convergence, la performance et la complexité du récepteur. Dans un premier temps, nous étudions les différents algorithmes de détection MIMO à décision dure et souple basés sur l’égalisation, le décodage sphérique, le décodage K-Best et l’annulation d’interférence. Un décodeur K-best de faible complexité (LC-K-Best) est proposé pour réduire la complexité sans dégradation significative des performances. Nous analysons ensuite la convergence de la combinaison de ces algorithmes de détection avec différentes techniques de codage de canal, notamment le décodeur turbo et le décodeur LDPC en utilisant le diagramme EXIT. En se basant sur cette analyse, un nouvel ordonnancement des itérations internes et externes nécessaires est proposé. Les performances du récepteur ainsi proposé sont évaluées dans différents modèles de canal LTE, et comparées avec différentes techniques de détection MIMO. Ensuite, la complexité des récepteurs itératifs avec différentes techniques de codage de canal est étudiée et comparée pour différents modulations et rendement de code. Les résultats de simulation montrent que les approches proposées offrent un bon compromis entre performance et complexité. D’un point de vue implémentation, la représentation en virgule fixe est généralement utilisée afin de réduire les coûts en termes de surface, de consommation d’énergie et de temps d’exécution. Nous présentons ainsi une représentation en virgule fixe du récepteur itératif proposé basé sur le décodeur LC K-Best. En outre, nous étudions l’impact de l’estimation de canal sur la performance du système. Finalement, le récepteur MIMOOFDM itératif est testé sur la plateforme matérielle WARP, validant le schéma proposé
    corecore