5,461 research outputs found
Large-Scale MIMO Detection for 3GPP LTE: Algorithms and FPGA Implementations
Large-scale (or massive) multiple-input multiple-output (MIMO) is expected to
be one of the key technologies in next-generation multi-user cellular systems,
based on the upcoming 3GPP LTE Release 12 standard, for example. In this work,
we propose - to the best of our knowledge - the first VLSI design enabling
high-throughput data detection in single-carrier frequency-division multiple
access (SC-FDMA)-based large-scale MIMO systems. We propose a new approximate
matrix inversion algorithm relying on a Neumann series expansion, which
substantially reduces the complexity of linear data detection. We analyze the
associated error, and we compare its performance and complexity to those of an
exact linear detector. We present corresponding VLSI architectures, which
perform exact and approximate soft-output detection for large-scale MIMO
systems with various antenna/user configurations. Reference implementation
results for a Xilinx Virtex-7 XC7VX980T FPGA show that our designs are able to
achieve more than 600 Mb/s for a 128 antenna, 8 user 3GPP LTE-based large-scale
MIMO system. We finally provide a performance/complexity trade-off comparison
using the presented FPGA designs, which reveals that the detector circuit of
choice is determined by the ratio between BS antennas and users, as well as the
desired error-rate performance.Comment: To appear in the IEEE Journal of Selected Topics in Signal Processin
Randomized protocols for asynchronous consensus
The famous Fischer, Lynch, and Paterson impossibility proof shows that it is
impossible to solve the consensus problem in a natural model of an asynchronous
distributed system if even a single process can fail. Since its publication,
two decades of work on fault-tolerant asynchronous consensus algorithms have
evaded this impossibility result by using extended models that provide (a)
randomization, (b) additional timing assumptions, (c) failure detectors, or (d)
stronger synchronization mechanisms than are available in the basic model.
Concentrating on the first of these approaches, we illustrate the history and
structure of randomized asynchronous consensus protocols by giving detailed
descriptions of several such protocols.Comment: 29 pages; survey paper written for PODC 20th anniversary issue of
Distributed Computin
GPU accelerated Monte Carlo simulation of Brownian motors dynamics with CUDA
This work presents an updated and extended guide on methods of a proper
acceleration of the Monte Carlo integration of stochastic differential
equations with the commonly available NVIDIA Graphics Processing Units using
the CUDA programming environment. We outline the general aspects of the
scientific computing on graphics cards and demonstrate them with two models of
a well known phenomenon of the noise induced transport of Brownian motors in
periodic structures. As a source of fluctuations in the considered systems we
selected the three most commonly occurring noises: the Gaussian white noise,
the white Poissonian noise and the dichotomous process also known as a random
telegraph signal. The detailed discussion on various aspects of the applied
numerical schemes is also presented. The measured speedup can be of the
astonishing order of about 3000 when compared to a typical CPU. This number
significantly expands the range of problems solvable by use of stochastic
simulations, allowing even an interactive research in some cases.Comment: 21 pages, 5 figures; Comput. Phys. Commun., accepted, 201
Systematic Analysis of Majorization in Quantum Algorithms
Motivated by the need to uncover some underlying mathematical structure of
optimal quantum computation, we carry out a systematic analysis of a wide
variety of quantum algorithms from the majorization theory point of view. We
conclude that step-by-step majorization is found in the known instances of fast
and efficient algorithms, namely in the quantum Fourier transform, in Grover's
algorithm, in the hidden affine function problem, in searching by quantum
adiabatic evolution and in deterministic quantum walks in continuous time
solving a classically hard problem. On the other hand, the optimal quantum
algorithm for parity determination, which does not provide any computational
speed-up, does not show step-by-step majorization. Lack of both speed-up and
step-by-step majorization is also a feature of the adiabatic quantum algorithm
solving the 2-SAT ``ring of agrees'' problem. Furthermore, the quantum
algorithm for the hidden affine function problem does not make use of any
entanglement while it does obey majorization. All the above results give
support to a step-by-step Majorization Principle necessary for optimal quantum
computation.Comment: 15 pages, 14 figures, final versio
Low-power digital processor for wireless sensor networks
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.Includes bibliographical references (p. 69-72).In order to make sensor networks cost-effective and practical, the electronic components of a wireless sensor node need to run for months to years on the same battery. This thesis explores the design of a low-power digital processor for these sensor nodes, employing techniques such as hardwired algorithms, lowered supply voltages, clock gating and subsystem shutdown. Prototypes were built on both a FPGA and ASIC platform, in order to verify functionality and characterize power consumption. The resulting 0.18[micro]m silicon fabricated in National Semiconductor Corporation's process was operational for supply voltages ranging from 0.5V to 1.8V. At the lowest operating voltage of 0.5V and a frequency of 100KHz, the chip performs 8 full-accuracy FFT computations per second and draws 1.2nJ of total energy per cycle. Although this energy/cycle metric does not surpass existing low-energy processors demonstrated in literature or commercial products, several low-power techniques are suggested that could drastically improve the energy metrics of a future implementation.by Daniel Frederic Finchelstein.S.M
Register-transfer-level power profiling for system-on-chip power distribution network design and signoff
Abstract. This thesis is a study of how register-transfer-level (RTL) power profiling can help the design and signoff of power distribution network in digital integrated circuits. RTL power profiling is a method which collects RTL power estimation results to a single power profile which then can be analysed in order to find interesting time windows for specifying power distribution network design and signoff.
The thesis starts with theory part. Complementary metal-oxide semiconductor (CMOS) inverter power dissipation is studied at first. Next, power distribution network structure and voltage drop problems are introduced. Voltage drop is demonstrated by using power distribution network impedance figures. Common on-chip power distribution network structure is introduced, and power distribution network design flow is outlined. Finally, decoupling capacitors function and impact on power distribution network impedance are thoroughly explained.
The practical part of the thesis contains RTL power profiling flow details and power profiling flow results for one simulation case in one design block. Also, some methods of improving RTL power estimation accuracy are discussed and calibration with extracted parasitic is then used to get new set of power profiling time windows. After the results are presented, overall RTL power estimation accuracy is analysed and resulted time windows are compared to reference gate-level time windows. RTL power profiling result analysis shows that resulted time windows match the theory and RTL power profiling seems to be a promising method for finding time windows for power distribution network design and signoff.Rekisterisiirtotason tehoprofilointi järjestelmäpiirin tehonsiirtoverkon suunnittelussa ja verifioinnissa. Tiivistelmä. Tässä työssä tutkitaan, miten rekisterisiirtotason (RTL) tehoprofilointi voi auttaa digitaalisten integroitujen piirien tehonsiirtoverkon suunnittelussa ja verifioinnissa. RTL-tehoprofilointi on menetelmä, joka analysoi RTL-tehoestimoinnista saadusta tehokäyrästä hyödyllisiä aikaikkunoita tehonsiirtoverkon suunnitteluun ja verifiointiin.
Työ alkaa teoriaosuudella, jonka aluksi selitetään, miten CMOS-invertteri kuluttaa tehoa. Seuravaksi esitellään tehonsiirtoverkon rakenne ja pahimmat tehonsiirtoverkon jännitehäviön aiheuttajat. Jännitehäviötä havainnollistetaan myös piirikaavioiden ja impedanssikäyrien avustuksella. Lisäksi integroidun piirin tehonsiirtoverkon suunnitteluvuo ja yleisin rakenne on esitelty. Lopuksi teoriaosuus käsittelee yksityiskohtaisesti ohituskondensaattoreiden toiminnan ja vaikutuksen tehonsiirtoverkon kokonaisimpedanssiin.
Työn kokeellisessa osuudessa esitellään ensin tehoprofiloinnin vuo ja sen jälkeen vuon tulokset yhdelle esimerkkilohkolle yhdessä simulaatioajossa. Lisäksi tässä osiossa käsitellään RTL-tehoestimoinnin tarkkuutta ja tehdään RTL-tehoprofilointi loisimpedansseilla kalibroidulle RTL-mallille. Lopuksi RTL-tehoestimoinnin tuloksia ja saatuja RTL-tehoprofiloinnin aikaikkunoita analysoidaan ja verrataan porttitason mallin tuloksiin. RTL-tehoprofiloinnin tulosten analysointi osoittaa, että saatavat aikaikkunat vastaavat teoriaa ja että RTL-tehoprofilointi näyttää lupaavalta menetelmältä tehosiirtoverkon analysoinnin ja verifioinnin aikaikkunoiden löytämiseen
A high-accuracy optical linear algebra processor for finite element applications
Optical linear processors are computationally efficient computers for solving matrix-matrix and matrix-vector oriented problems. Optical system errors limit their dynamic range to 30-40 dB, which limits their accuray to 9-12 bits. Large problems, such as the finite element problem in structural mechanics (with tens or hundreds of thousands of variables) which can exploit the speed of optical processors, require the 32 bit accuracy obtainable from digital machines. To obtain this required 32 bit accuracy with an optical processor, the data can be digitally encoded, thereby reducing the dynamic range requirements of the optical system (i.e., decreasing the effect of optical errors on the data) while providing increased accuracy. This report describes a new digitally encoded optical linear algebra processor architecture for solving finite element and banded matrix-vector problems. A linear static plate bending case study is described which quantities the processor requirements. Multiplication by digital convolution is explained, and the digitally encoded optical processor architecture is advanced
- …