
    State of the art baseband DSP platforms for Software Defined Radio: A survey

    Software Defined Radio (SDR) is an innovative approach that is becoming an increasingly promising technology for future mobile handsets. Several proposals in the field of embedded systems have been introduced by universities and industry to support SDR applications. This article presents an overview of current platforms and analyzes the related architectural choices, the current issues in SDR, and potential future trends. Peer reviewed.

    Feasibility study of multiantenna transmitter baseband processing on customized processor core in wireless local area devices

    The world of wireless communications is governed by a wide variety of standards, each tailored to its specific applications and targets. The IEEE 802.11 family is one such standard, created and maintained by an IEEE committee to implement Wireless Local Area Network (WLAN) communication. With the notably rapid growth of devices that exploit WLAN technology and the increasing demand for rich multimedia functionality and broad Internet access, WLAN technology must be enhanced to support the required specifications. In this regard, IEEE 802.11ac, the latest amendment of the WLAN standard, was released; it builds on the previous draft versions while introducing certain changes, especially to the PHY layer, to satisfy the promised requirements. This thesis evaluates the feasibility of a software-based implementation of the MIMO transmitter baseband processing conforming to the IEEE 802.11ac standard on a DSP core with vector extensions. The transmitter is implemented in four different transmission scenarios, which include 2x2 and 4x4 MIMO configurations, yielding beyond 1 Gbps transmit bit rate. The implementation covers the frequency-domain processing, and real-time operation has been achieved when running at a clock frequency of 500 MHz. The developed software solution is evaluated by profiling and analysing the implementation using the tools provided by the vendor. We present the results in terms of clock cycles, power and energy consumption, and memory usage. The performance analysis shows that the SDR-based implementation provides improved flexibility and reduced design effort compared to conventional approaches while maintaining power consumption close to fixed-function hardware solutions.
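    The frequency-domain transmitter work described above is dominated by per-subcarrier matrix operations, such as mapping the spatial streams onto the transmit chains. The sketch below shows the general shape of that computation as a plain scalar loop; the complex-float data layout, the function name, and the per-subcarrier spatial-mapping matrix are illustrative assumptions, and the thesis's actual core would use the vendor DSP's vector intrinsics rather than this scalar form.

```cpp
// Minimal sketch of per-subcarrier spatial mapping in an 802.11ac-style
// frequency-domain transmitter: each of nss spatial streams is mapped onto
// ntx transmit chains by a per-subcarrier matrix Q[k]. Names and the plain
// scalar loops are illustrative only.
#include <complex>
#include <vector>

using cfloat = std::complex<float>;

// x[ss][k] : frequency-domain sample of spatial stream ss on subcarrier k
// Q[k]     : ntx x nss spatial-mapping matrix for subcarrier k (row-major)
// y[tx][k] : output sample for transmit chain tx on subcarrier k
void spatial_map(const std::vector<std::vector<cfloat>>& x,
                 const std::vector<std::vector<cfloat>>& Q,
                 std::vector<std::vector<cfloat>>& y,
                 int nss, int ntx, int nsc)
{
    for (int k = 0; k < nsc; ++k) {          // subcarriers are independent -> easy to vectorize
        for (int tx = 0; tx < ntx; ++tx) {
            cfloat acc(0.0f, 0.0f);
            for (int ss = 0; ss < nss; ++ss)
                acc += Q[k][tx * nss + ss] * x[ss][k];
            y[tx][k] = acc;
        }
    }
}
```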

    Datacenter Design for Future Cloud Radio Access Network.

    Cloud radio access network (C-RAN), an emerging cloud service that combines the traditional radio access network (RAN) with cloud computing technology, has been proposed as a solution to handle the growing energy consumption and cost of the traditional RAN. Through aggregating baseband units (BBUs) in a centralized cloud datacenter, C-RAN reduces energy and cost, and improves wireless throughput and quality of service. However, designing a datacenter for C-RAN has not yet been studied. In this dissertation, I investigate how a datacenter for C-RAN BBUs should be built on commodity servers. I first design WiBench, an open-source benchmark suite containing the key signal processing kernels of many mainstream wireless protocols, and study its characteristics. The characterization study shows that there is abundant data level parallelism (DLP) and thread level parallelism (TLP). Based on this result, I then develop high performance software implementations of C-RAN BBU kernels in C++ and CUDA for both CPUs and GPUs. In addition, I generalize the GPU parallelization techniques of the Turbo decoder to the trellis algorithms, an important family of algorithms that are widely used in data compression and channel coding. Then I evaluate the performance of commodity CPU servers and GPU servers. The study shows that the datacenter with GPU servers can meet the LTE standard throughput with 4x to 16x fewer machines than with CPU servers. A further energy and cost analysis shows that GPU servers can save on average 13x more energy and 6x more cost. Thus, I propose the C-RAN datacenter be built using GPUs as a server platform. Next I study resource management techniques to handle the temporal and spatial traffic imbalance in a C-RAN datacenter. I propose a "hill-climbing" power management that combines powering off GPUs and DVFS to match the temporal C-RAN traffic pattern. Under a practical traffic model, this technique saves 40% of the BBU energy in a GPU-based C-RAN datacenter. For spatial traffic imbalance, I propose three workload distribution techniques to improve load balance and throughput. Among all three techniques, pipelining packets yields the largest throughput improvement, at 10% and 16% for balanced and unbalanced loads, respectively. PhD thesis, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/120825/1/qizheng_1.pd
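    As a rough illustration of the kind of controller a "hill-climbing" power management suggests, the sketch below steps DVFS levels down and powers GPUs off while utilization is low, and reverses those steps as load returns. The thresholds, the step ordering, and the GpuCluster structure are assumptions made purely for illustration; they are not the dissertation's actual policy or interface.

```cpp
// Illustrative hill-climbing-style controller that trades off DVFS steps and
// powering GPUs on/off against observed BBU load. All numbers and the
// GpuCluster layout are assumed for this sketch.
struct GpuCluster {
    int active_gpus;      // GPUs currently powered on
    int freq_level;       // current DVFS level, 0 = lowest
    int max_gpus;
    int max_freq_level;
};

// One control step: climb "down the hill" (save power) while the cluster is
// underutilized, climb back up when load approaches the processing deadline.
void hill_climb_step(GpuCluster& c, double utilization)
{
    const double low = 0.5, high = 0.85;   // assumed utilization thresholds

    if (utilization < low) {
        if (c.freq_level > 0)
            --c.freq_level;                // first try a cheaper DVFS state
        else if (c.active_gpus > 1)
            --c.active_gpus;               // then power a whole GPU off
    } else if (utilization > high) {
        if (c.freq_level < c.max_freq_level)
            ++c.freq_level;                // restore headroom cheaply first
        else if (c.active_gpus < c.max_gpus)
            ++c.active_gpus;               // otherwise bring a GPU back up
    }
    // between the thresholds: hold the current configuration
}
```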

    Efficient implementation of channel estimation algorithm for beamforming

    Abstract. The future 5G mobile network technology is expected to offer significantly better performance than its predecessors. Improved data rates in conjunction with low latency are believed to enable technological revolutions such as self-driving cars. To achieve faster data rates, MIMO systems can be utilized. These systems enable the use of a spatial filtering technique known as beamforming. Beamforming based on a pre-acquired channel matrix is computationally very demanding, which makes low latency hard to achieve. By acquiring the channel matrix as efficiently as possible, we can ease this challenge. In this thesis we examined the implementation of a channel estimation algorithm for beamforming on a digital signal processor specialized in vector computation. We present implementations for different antenna configurations based on three different approaches. The results show that the best performance is achieved by adapting the algorithm to the limitations imposed by the system and the processor architecture. Although exploiting the parallel architecture proved challenging, the implementation of the algorithm would have benefited from a greater amount of parallelism. The current parallel resources will be a challenge especially in the future, as the size of antenna configurations is expected to grow.

    Keilanmuodostuksen tarvitseman kanavaestimointialgoritmin tehokas toteutus (Efficient implementation of the channel estimation algorithm required for beamforming). Abstract, translated from Finnish. The future fifth-generation mobile network technology is expected to offer significantly better performance than its predecessor. The high data rates it provides, combined with low latency, are believed to enable, for example, self-driving cars. To achieve higher data rates, a MIMO system operating over a multipath channel can be exploited, enabling the use of a spatial filtering technique known as beamforming. Beamforming based on pre-acquired channel state information is computationally very expensive, which makes it challenging to meet the network's low-latency requirement. This thesis studied the efficient implementation of a channel estimation algorithm intended for beamforming, using a processor architecture specialized in vector computation. Implementations based on three different approaches are presented for antenna configurations of different sizes. The results show that the best performance is achieved by adapting the algorithm to the constraints imposed by the system and the architecture. Although exploiting the parallel architecture posed its own challenges, the implementation of the algorithm would have benefited from a greater amount of parallelism. The current amount of parallelism will be a challenge especially in the future, as the size of antenna configurations is expected to grow.
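    The abstract does not spell out the estimator itself, so as an assumed illustration of the per-subcarrier, data-parallel work that maps naturally onto a vector DSP, the sketch below shows a generic least-squares pilot-division channel estimate for a single antenna pair. The function name and data layout are hypothetical and this is not necessarily the algorithm implemented in the thesis.

```cpp
// Generic least-squares pilot-based channel estimate for one antenna pair:
// H_hat[k] = Y[k] / X[k] on each pilot subcarrier. Used here only to show
// the shape of the per-subcarrier work; each element is independent, which
// is what makes this kind of kernel amenable to vector execution.
#include <complex>
#include <vector>

using cfloat = std::complex<float>;

std::vector<cfloat> ls_channel_estimate(const std::vector<cfloat>& rx_pilots,
                                        const std::vector<cfloat>& tx_pilots)
{
    std::vector<cfloat> h(rx_pilots.size());
    for (std::size_t k = 0; k < rx_pilots.size(); ++k)
        h[k] = rx_pilots[k] / tx_pilots[k];   // independent per subcarrier -> vectorizable
    return h;
}
```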

    Multipass communication systems for tiled processor architectures

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006. Includes bibliographical references (p. 191-202). Multipass communication systems utilize multiple sets of parallel baseband receiver functions to balance communication data rates and available computation capabilities. This is achieved by spatially pipelining baseband functions across parallel resources to perform multiple processing passes on the same set of received values, thus allowing the system to simultaneously convey multiple sequences of data using a single wireless link. The use of multiple passes mitigates the effects of data rate on receiver processing bottlenecks, making the use of general-purpose processing elements for high data rate communication functions viable. The flexibility of general-purpose processing, in turn, allows the receiver composition to trade off resource usage and required processing rate. For instance, a communication system could be distributed across 2 passes using 2x the overall area, but reducing the data rate for each pass, and hence the resultant overall required processing rate and clock speed, by 1/2. Lowering the clock speed can also be leveraged to reduce power through voltage scaling and/or the use of higher-Vt devices. The characteristics of general-purpose parallel processors for communications processing are explored, as well as the applicability of specific parallel designs to communications processing. In particular, an in-depth look is taken at the Raw processor's tiled architecture as a general-purpose parallel processor particularly well suited to portable communications processing. An example of a multipass system, based on the 802.11a baseband and implemented on the Raw processor along with the accompanying hardware implementation, is presented both as a proof of concept and as a means to explore some of the advantages and trade-offs of such a system. A bit-error rate study shows this multipass system to be within a small fraction of a dB of the performance of an equivalent data rate single-pass system, thus demonstrating the viability of the multipass algorithm. In addition, the capability of tiled processors to maximize processing capabilities at the system block level, as well as the system architectural level, is shown. Parallel implementations of two processing-intensive functions, the FFT and the Viterbi decoder, are shown. A parallelized assembly language FFT utilizing 16 tiles is shown to have a 1,000x improvement, and a parallelized 48-tile assembly language Viterbi decoder a 10,000x improvement, over corresponding serial C implementations. by Nathan Robert Shnidman. Ph.D.
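    The area-versus-clock-rate argument in the abstract can be made concrete with a small back-of-the-envelope calculation: splitting the same aggregate sample rate across N passes divides the per-pass rate, and hence the required clock frequency, by N, at the cost of roughly N times the area. The sample rate and cycles-per-sample figures below are illustrative assumptions, not values from the thesis.

```cpp
// Back-of-the-envelope illustration of the multipass trade-off: more passes
// lower the per-pass processing rate (and required clock), while area grows
// roughly with the number of passes.
#include <cstdio>

int main()
{
    const double aggregate_rate_msps = 20.0;  // assumed total received sample rate (Msamples/s)
    const double cycles_per_sample   = 50.0;  // assumed receiver work per sample

    for (int passes = 1; passes <= 4; passes *= 2) {
        double per_pass_rate = aggregate_rate_msps / passes;
        double required_mhz  = per_pass_rate * cycles_per_sample;
        std::printf("%d pass(es): %.1f Msps each -> ~%.0f MHz per pass, ~%dx area\n",
                    passes, per_pass_rate, required_mhz, passes);
    }
    return 0;
}
```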