5,461 research outputs found

    Large-Scale MIMO Detection for 3GPP LTE: Algorithms and FPGA Implementations

    Large-scale (or massive) multiple-input multiple-output (MIMO) is expected to be one of the key technologies in next-generation multi-user cellular systems, based on the upcoming 3GPP LTE Release 12 standard, for example. In this work, we propose - to the best of our knowledge - the first VLSI design enabling high-throughput data detection in single-carrier frequency-division multiple access (SC-FDMA)-based large-scale MIMO systems. We propose a new approximate matrix inversion algorithm relying on a Neumann series expansion, which substantially reduces the complexity of linear data detection. We analyze the associated error, and we compare its performance and complexity to those of an exact linear detector. We present corresponding VLSI architectures, which perform exact and approximate soft-output detection for large-scale MIMO systems with various antenna/user configurations. Reference implementation results for a Xilinx Virtex-7 XC7VX980T FPGA show that our designs are able to achieve more than 600 Mb/s for a 128-antenna, 8-user 3GPP LTE-based large-scale MIMO system. We finally provide a performance/complexity trade-off comparison using the presented FPGA designs, which reveals that the detector circuit of choice is determined by the ratio between BS antennas and users, as well as the desired error-rate performance.
    Comment: To appear in the IEEE Journal of Selected Topics in Signal Processing
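
    To make the detection idea concrete, the following is a minimal NumPy sketch of a Neumann-series approximation to the regularized Gram-matrix inverse used in linear MMSE-style detection, as described above. The function name, the choice of the diagonal as the expansion point, and all sizes and parameter values (128 antennas, 8 users, the noise level) are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def neumann_inverse(A, K=3):
    """Approximate A^{-1} with a K-term Neumann series expanded around diag(A).

    Writing A = D - (D - A) with D = diag(A) gives
    A^{-1} = sum_k (D^{-1}(D - A))^k D^{-1}; the series converges quickly when
    A is strongly diagonally dominant, as the regularized Gram matrix of a
    large-scale MIMO channel is when base-station antennas far outnumber users.
    """
    D = np.diag(np.diag(A))
    D_inv = np.diag(1.0 / np.diag(A))
    E = D_inv @ (D - A)
    approx = term = D_inv
    for _ in range(1, K):
        term = E @ term                  # next series term, E^k D^{-1}
        approx = approx + term
    return approx

# Toy setup (illustrative sizes): 128 BS antennas, 8 single-antenna users.
rng = np.random.default_rng(0)
B, U, N0 = 128, 8, 0.1
H = (rng.standard_normal((B, U)) + 1j * rng.standard_normal((B, U))) / np.sqrt(2.0)
A = H.conj().T @ H + N0 * np.eye(U)      # regularized Gram matrix for MMSE detection
y_mf = H.conj().T @ (H @ np.ones(U))     # matched-filter output for a toy transmit vector
x_exact = np.linalg.solve(A, y_mf)       # exact linear estimate
x_approx = neumann_inverse(A, K=3) @ y_mf
print("relative error:", np.linalg.norm(x_approx - x_exact) / np.linalg.norm(x_exact))
```

    The approximation hinges on the Gram matrix being strongly diagonally dominant, which is why its accuracy improves as the ratio of base-station antennas to users grows - consistent with the trade-off the abstract describes.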

    Randomized protocols for asynchronous consensus

    The famous Fischer, Lynch, and Paterson impossibility proof shows that it is impossible to solve the consensus problem in a natural model of an asynchronous distributed system if even a single process can fail. Since its publication, two decades of work on fault-tolerant asynchronous consensus algorithms have evaded this impossibility result by using extended models that provide (a) randomization, (b) additional timing assumptions, (c) failure detectors, or (d) stronger synchronization mechanisms than are available in the basic model. Concentrating on the first of these approaches, we illustrate the history and structure of randomized asynchronous consensus protocols by giving detailed descriptions of several such protocols.
    Comment: 29 pages; survey paper written for the PODC 20th anniversary issue of Distributed Computing
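
    As a concrete illustration of the first (randomization) approach, here is a toy, lock-step simulation of a Ben-Or-style randomized binary consensus with local coin flips. It only sketches the report/propose phase structure and the role of the coin; it does not model asynchrony, crashes, or an adversarial scheduler, and the function name, failure bound t, and round structure are simplifying assumptions rather than any specific protocol from the survey.

```python
import random

def ben_or_simulation(values, t=1, max_rounds=100, seed=1):
    """Lock-step toy simulation of a Ben-Or-style randomized binary consensus.

    Each process reports its preference, proposes a value only if it saw a
    strict majority for it, decides once at least t+1 processes propose the
    same value, and otherwise adopts a proposed value or flips a local coin.
    """
    random.seed(seed)
    n = len(values)
    prefs = list(values)
    decided = [None] * n
    for _ in range(max_rounds):
        # Phase 1: every process reports its current preference.
        reports = list(prefs)
        proposals = []
        for _ in range(n):
            for w in (0, 1):
                if reports.count(w) * 2 > n:      # strict majority for w
                    proposals.append(w)
                    break
            else:
                proposals.append(None)            # no majority seen: propose "?"
        # Phase 2: act on the collected proposals.
        for p in range(n):
            for w in (0, 1):
                if proposals.count(w) >= t + 1:   # enough support: decide w
                    decided[p], prefs[p] = w, w
                    break
            else:
                seen = [w for w in (0, 1) if w in proposals]
                prefs[p] = seen[0] if seen else random.choice((0, 1))
        if all(d is not None for d in decided):
            return decided
    return decided

print(ben_or_simulation([0, 1, 1, 0, 1]))   # all processes agree on a single value
```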

    GPU accelerated Monte Carlo simulation of Brownian motors dynamics with CUDA

    This work presents an updated and extended guide to properly accelerating the Monte Carlo integration of stochastic differential equations on commonly available NVIDIA graphics processing units using the CUDA programming environment. We outline the general aspects of scientific computing on graphics cards and demonstrate them with two models of the well-known phenomenon of noise-induced transport of Brownian motors in periodic structures. As sources of fluctuations in the considered systems, we selected the three most commonly occurring noises: Gaussian white noise, white Poissonian noise, and the dichotomous process, also known as a random telegraph signal. A detailed discussion of various aspects of the applied numerical schemes is also presented. The measured speedup can be of the astonishing order of about 3000 when compared to a typical CPU. This number significantly expands the range of problems solvable by stochastic simulations, in some cases even allowing interactive research.
    Comment: 21 pages, 5 figures; Comput. Phys. Commun., accepted, 201
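
    For orientation, the following plain NumPy (CPU) sketch shows the kind of Euler-Maruyama Monte Carlo estimate - here the mean velocity of an overdamped Brownian particle in a tilted periodic potential driven by Gaussian white noise - that a CUDA implementation would accelerate. The potential, the parameter values, and the function name are illustrative assumptions, not the authors' exact model or code.

```python
import numpy as np

def brownian_motor_velocity(n_paths=4096, n_steps=10_000, dt=1e-3,
                            force=0.5, diffusion=0.2, seed=0):
    """Euler-Maruyama Monte Carlo estimate of the mean velocity of an
    overdamped Brownian particle in the tilted periodic potential
    V(x) = -cos(x) - force*x, driven by Gaussian white noise.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_paths)                              # all trajectories start at x = 0
    for _ in range(n_steps):
        drift = -np.sin(x) + force                     # deterministic drift, -V'(x)
        x += drift * dt + np.sqrt(2.0 * diffusion * dt) * rng.standard_normal(n_paths)
    return x.mean() / (n_steps * dt)                   # ensemble-averaged drift velocity

print(f"estimated mean velocity: {brownian_motor_velocity():.3f}")
```

    On a GPU, each trajectory would be handled by one thread, which is what makes the very large speedups quoted above possible for this embarrassingly parallel workload.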

    Systematic Analysis of Majorization in Quantum Algorithms

    Motivated by the need to uncover some underlying mathematical structure of optimal quantum computation, we carry out a systematic analysis of a wide variety of quantum algorithms from the majorization theory point of view. We conclude that step-by-step majorization is found in the known instances of fast and efficient algorithms, namely in the quantum Fourier transform, in Grover's algorithm, in the hidden affine function problem, in searching by quantum adiabatic evolution, and in deterministic quantum walks in continuous time solving a classically hard problem. On the other hand, the optimal quantum algorithm for parity determination, which does not provide any computational speed-up, does not show step-by-step majorization. Lack of both speed-up and step-by-step majorization is also a feature of the adiabatic quantum algorithm solving the 2-SAT "ring of agrees" problem. Furthermore, the quantum algorithm for the hidden affine function problem does not make use of any entanglement while it does obey majorization. All the above results give support to a step-by-step Majorization Principle necessary for optimal quantum computation.
    Comment: 15 pages, 14 figures, final version
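
    For reference, the step-by-step analysis rests on the standard majorization partial order between probability distributions (here, the measurement probabilities of the register at successive steps of an algorithm). The sketch below is a generic implementation of that test; it is not code from the paper, and the example distributions are arbitrary.

```python
import numpy as np

def majorizes(q, p, tol=1e-12):
    """Return True if distribution q majorizes distribution p.

    Both inputs are assumed to be probability distributions (non-negative,
    summing to one). q majorizes p when the partial sums of q, sorted in
    decreasing order, dominate the corresponding partial sums of p.
    """
    q_sorted = np.sort(q)[::-1]
    p_sorted = np.sort(p)[::-1]
    return bool(np.all(np.cumsum(q_sorted) >= np.cumsum(p_sorted) - tol))

# Toy check: a sharply peaked distribution majorizes the uniform one,
# but not the other way around.
uniform = np.full(4, 0.25)
peaked = np.array([0.7, 0.2, 0.05, 0.05])
print(majorizes(peaked, uniform))   # True
print(majorizes(uniform, peaked))   # False
```

    Step-by-step majorization then means that the distribution at step t+1 majorizes the one at step t for every step of the algorithm, i.e. the probability mass becomes progressively more concentrated on the sought outcome.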

    Low-power digital processor for wireless sensor networks

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. Includes bibliographical references (p. 69-72).
    In order to make sensor networks cost-effective and practical, the electronic components of a wireless sensor node need to run for months to years on the same battery. This thesis explores the design of a low-power digital processor for these sensor nodes, employing techniques such as hardwired algorithms, lowered supply voltages, clock gating, and subsystem shutdown. Prototypes were built on both FPGA and ASIC platforms in order to verify functionality and characterize power consumption. The resulting 0.18 µm silicon, fabricated in National Semiconductor Corporation's process, was operational for supply voltages ranging from 0.5 V to 1.8 V. At the lowest operating voltage of 0.5 V and a frequency of 100 kHz, the chip performs 8 full-accuracy FFT computations per second and consumes 1.2 nJ of total energy per cycle. Although this energy/cycle metric does not surpass existing low-energy processors demonstrated in the literature or commercial products, several low-power techniques are suggested that could drastically improve the energy metrics of a future implementation.
    by Daniel Frederic Finchelstein. S.M.
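
    A quick back-of-the-envelope check of the figures quoted above (0.5 V, 100 kHz, 1.2 nJ per cycle, 8 full-accuracy FFTs per second): the derived average power, cycles per FFT, and energy per FFT below are simple arithmetic under the assumption that every clock cycle is spent on the FFTs, not numbers reported by the thesis.

```python
# Figures quoted in the abstract.
energy_per_cycle = 1.2e-9      # J per clock cycle
clock_freq = 100e3             # Hz (100 kHz operating point at 0.5 V)
ffts_per_second = 8

average_power = energy_per_cycle * clock_freq        # J/s = W
cycles_per_fft = clock_freq / ffts_per_second        # assumes all cycles go to FFTs
energy_per_fft = energy_per_cycle * cycles_per_fft

print(f"average power  : {average_power * 1e6:.0f} uW")   # 120 uW
print(f"cycles per FFT : {cycles_per_fft:.0f}")            # 12500
print(f"energy per FFT : {energy_per_fft * 1e6:.1f} uJ")   # 15.0 uJ
```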

    Register-transfer-level power profiling for system-on-chip power distribution network design and signoff

    Abstract. This thesis is a study of how register-transfer-level (RTL) power profiling can help the design and signoff of the power distribution network in digital integrated circuits. RTL power profiling is a method which collects RTL power estimation results into a single power profile, which can then be analysed in order to find interesting time windows for specifying power distribution network design and signoff. The thesis starts with a theory part. Complementary metal-oxide-semiconductor (CMOS) inverter power dissipation is studied first. Next, the power distribution network structure and voltage drop problems are introduced. Voltage drop is demonstrated using power distribution network impedance figures. A common on-chip power distribution network structure is introduced, and the power distribution network design flow is outlined. Finally, the function of decoupling capacitors and their impact on power distribution network impedance are thoroughly explained. The practical part of the thesis contains the RTL power profiling flow details and power profiling results for one simulation case in one design block. Some methods of improving RTL power estimation accuracy are also discussed, and calibration with extracted parasitics is then used to obtain a new set of power profiling time windows. After the results are presented, the overall RTL power estimation accuracy is analysed and the resulting time windows are compared to reference gate-level time windows. The RTL power profiling result analysis shows that the resulting time windows match the theory and that RTL power profiling seems to be a promising method for finding time windows for power distribution network design and signoff.
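
    To illustrate what "finding time windows" can mean in practice, here is a generic sliding-window sketch that locates the highest-average-power window in a power profile, the kind of window one would then hand to power-grid analysis or signoff. The function name, window length, and synthetic profile are assumptions for illustration and do not reproduce the thesis flow.

```python
import numpy as np

def worst_power_window(power, window_len):
    """Return the start index and average power of the highest-mean-power
    window of `window_len` samples in a power profile.

    `power` is a 1-D array of per-sample power estimates, e.g. one value per
    simulation timestep of an RTL power estimation run.
    """
    kernel = np.ones(window_len) / window_len
    means = np.convolve(power, kernel, mode="valid")   # moving average over windows
    start = int(np.argmax(means))
    return start, float(means[start])

# Synthetic profile: low baseline activity with a short burst of switching.
profile = np.concatenate([np.full(200, 1.0), np.full(20, 8.0), np.full(200, 1.5)])
start, avg = worst_power_window(profile, window_len=20)
print(f"worst 20-sample window starts at sample {start}, mean power {avg:.2f}")
```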

    A high-accuracy optical linear algebra processor for finite element applications

    Optical linear processors are computationally efficient computers for solving matrix-matrix and matrix-vector oriented problems. Optical system errors limit their dynamic range to 30-40 dB, which limits their accuracy to 9-12 bits. Large problems, such as the finite element problem in structural mechanics (with tens or hundreds of thousands of variables), which can exploit the speed of optical processors, require the 32-bit accuracy obtainable from digital machines. To obtain this required 32-bit accuracy with an optical processor, the data can be digitally encoded, thereby reducing the dynamic range requirements of the optical system (i.e., decreasing the effect of optical errors on the data) while providing increased accuracy. This report describes a new digitally encoded optical linear algebra processor architecture for solving finite element and banded matrix-vector problems. A linear static plate bending case study is described which quantifies the processor requirements. Multiplication by digital convolution is explained, and the digitally encoded optical processor architecture is advanced.
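
    The phrase "multiplication by digital convolution" refers to assembling a full-precision product from low-precision digit-by-digit partial products. The sketch below shows that idea in software: the digit vectors of the two operands are convolved, and carries are then propagated to recover the exact product. The base, function name, and carry handling are a generic illustration, not the report's specific optical encoding.

```python
import numpy as np

def multiply_by_convolution(a, b, base=10):
    """Multiply two non-negative integers by convolving their digit vectors.

    Each digit product is at most (base - 1)**2, so the partial products can be
    formed with low-precision hardware; carry propagation afterwards restores
    full accuracy.
    """
    digits_a = [int(d) for d in str(a)[::-1]]       # least-significant digit first
    digits_b = [int(d) for d in str(b)[::-1]]
    raw = np.convolve(digits_a, digits_b)           # per-position partial-product sums
    result, carry = 0, 0
    for position, value in enumerate(raw):
        total = int(value) + carry
        result += (total % base) * base ** position # keep one digit at this position
        carry = total // base                       # pass the rest to the next position
    return result + carry * base ** len(raw)

print(multiply_by_convolution(12345, 6789), 12345 * 6789)   # both print 83810205
```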