High Performance Sparse Multivariate Polynomials: Fundamental Data Structures and Algorithms
Polynomials may be represented sparsely to conserve memory and provide a succinct, natural representation. Moreover, polynomials which are themselves sparse (having very few non-zero terms) waste memory and computation time if represented, and operated on, densely. This waste is exacerbated as the number of variables increases. We provide practical implementations of sparse multivariate data structures focused on data locality and cache complexity. We look to develop high-performance algorithms and implementations of fundamental polynomial operations, using these sparse data structures, such as arithmetic (addition, subtraction, multiplication, and division) and interpolation. We revisit a sparse arithmetic scheme introduced by Johnson in 1974, adapting and optimizing these algorithms for modern computer architectures, with our implementations over the integers and rational numbers vastly outperforming the current widespread implementations. We develop a new algorithm for sparse pseudo-division based on the sparse polynomial division algorithm, with very encouraging results. Polynomial interpolation is explored through univariate, dense multivariate, and sparse multivariate methods. Arithmetic and interpolation together form a solid high-performance foundation from which many higher-level and more interesting algorithms can be built.
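To make the idea of sparse arithmetic concrete, here is a minimal Python sketch of heap-based sparse multiplication, in the spirit of the heap technique commonly associated with Johnson's 1974 scheme. It is not the authors' implementation. The representation (a list of (exponent tuple, coefficient) pairs sorted in decreasing lexicographic order) and the names add_exp and sparse_mul are assumptions made for illustration.

```python
import heapq


def add_exp(a, b):
    """Add two exponent tuples (multiply the corresponding monomials)."""
    return tuple(x + y for x, y in zip(a, b))


def sparse_mul(f, g):
    """Multiply sparse polynomials f and g.

    Each polynomial is a list of (exponent_tuple, coefficient) pairs sorted
    by exponent in decreasing lexicographic order (an assumed layout).
    """
    if not f or not g:
        return []

    def key(i, j):
        # heapq is a min-heap, so negate exponents to pop the largest first.
        return tuple(-e for e in add_exp(f[i][0], g[j][0]))

    # One heap entry per term of f: the product f[i] * g[j] for the current j.
    heap = [(key(i, 0), i, 0) for i in range(len(f))]
    heapq.heapify(heap)

    result = []
    while heap:
        neg_exp, i, j = heapq.heappop(heap)
        exp = tuple(-e for e in neg_exp)
        coeff = f[i][1] * g[j][1]
        if result and result[-1][0] == exp:
            # Same monomial as the previous product: combine like terms.
            result[-1] = (exp, result[-1][1] + coeff)
        else:
            if result and result[-1][1] == 0:
                result.pop()  # drop a term that cancelled to zero
            result.append((exp, coeff))
        if j + 1 < len(g):
            # Stream the next product for this term of f.
            heapq.heappush(heap, (key(i, j + 1), i, j + 1))
    if result and result[-1][1] == 0:
        result.pop()
    return result


# (x + y) * (x - y) in variables (x, y)
f = [((1, 0), 1), ((0, 1), 1)]
g = [((1, 0), 1), ((0, 1), -1)]
product = sparse_mul(f, g)
```

The heap holds at most one pending product per term of f, so only the non-zero terms of the inputs are ever touched. This is the property that makes sparse representations attractive when most coefficients are zero.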
On digit-recurrence division algorithms for self-timed circuits
The optimization of algorithms for self-timed or asynchronous circuits requires specific solutions. Due to the variable-time capabilities of asynchronous circuits, the average computation time should be optimized, not only the worst-case signal propagation. While efficient algorithms and implementations are known for asynchronous addition and multiplication, only straightforward algorithms have been studied for division. This paper compares several digit-recurrence division algorithms in terms of speed, area, and circuit activity (used to estimate power consumption). The comparison is based on simulations of the different operators described at the gate level. This work shows that the best solutions for asynchronous circuits are quite different from those used in synchronous circuits.
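For readers unfamiliar with digit recurrence, the following Python sketch shows the simplest member of the family: radix-2 restoring division on unsigned integers, producing one quotient bit per iteration. It is only an illustration of the recurrence and is not claimed to be one of the algorithms compared in the paper. The function name restoring_divide and its n_bits parameter are assumptions.

```python
def restoring_divide(dividend, divisor, n_bits):
    """Radix-2 restoring division: one quotient bit per iteration.

    dividend must fit in n_bits bits; divisor must be positive.
    Returns (quotient, remainder).
    """
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    if not 0 <= dividend < (1 << n_bits):
        raise ValueError("dividend does not fit in n_bits")

    remainder = 0
    quotient = 0
    for i in range(n_bits - 1, -1, -1):
        # Shift the partial remainder left and bring down the next dividend bit.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # Trial subtraction: keep it only if the result is non-negative.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << i
    return quotient, remainder


q, r = restoring_divide(100, 7, 8)
```

In hardware, each loop iteration corresponds to one recurrence step. The number of steps is fixed here, but the time each step takes can depend on the data, which is why average-case behavior matters for self-timed designs.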
MIMO Transmission with Residual Transmit-RF Impairments
Physical transceiver implementations for multiple-input multiple-output
(MIMO) wireless communication systems suffer from transmit-RF (Tx-RF)
impairments. In this paper, we study the effect on channel capacity and
error-rate performance of residual Tx-RF impairments that defy proper
compensation. In particular, we demonstrate that such residual distortions
severely degrade the performance of (near-)optimum MIMO detection algorithms.
To mitigate this performance loss, we propose an efficient algorithm, which is
based on an i.i.d. Gaussian model for the distortion caused by these
impairments. In order to validate this model, we provide measurement results
based on a 4-stream Tx-RF chain implementation for MIMO orthogonal
frequency-division multiplexing (OFDM).
Comment: to be presented at the International ITG Workshop on Smart Antennas -
WSA 201
Complexity Analysis of Reed-Solomon Decoding over GF(2^m) Without Using Syndromes
For the majority of the applications of Reed-Solomon (RS) codes, hard
decision decoding is based on syndromes. Recently, there has been renewed
interest in decoding RS codes without using syndromes. In this paper, we
investigate the complexity of syndromeless decoding for RS codes, and compare
it to that of syndrome-based decoding. Aiming to provide guidelines to
practical applications, our complexity analysis differs in several aspects from
existing asymptotic complexity analysis, which is typically based on
multiplicative fast Fourier transform (FFT) techniques and is usually in big O
notation. First, we focus on RS codes over characteristic-2 fields, over which
some multiplicative FFT techniques are not applicable. Secondly, due to
moderate block lengths of RS codes in practice, our analysis is complete since
all terms in the complexities are accounted for. Finally, in addition to fast
implementation using additive FFT techniques, we also consider direct
implementation, which is still relevant for RS codes with moderate lengths.
Comparing the complexities of both syndromeless and syndrome-based decoding
algorithms based on direct and fast implementations, we show that syndromeless
decoding algorithms have higher complexities than syndrome-based ones for high
rate RS codes regardless of the implementation. Both errors-only and
errors-and-erasures decoding are considered in this paper. We also derive
tighter bounds on the complexities of fast polynomial multiplications based on
Cantor's approach and the fast extended Euclidean algorithm.
Comment: 11 pages, submitted to EURASIP Journal on Wireless Communications and
Networkin
Faster Geometric Algorithms via Dynamic Determinant Computation
The computation of determinants or their signs is the core procedure in many
important geometric algorithms, such as convex hull, volume and point location.
As the dimension of the computation space grows, a higher percentage of the
total computation time is consumed by these computations. In this paper we
study the sequences of determinants that appear in geometric algorithms. The
computation of a single determinant is accelerated by using the information
from the previous computations in that sequence.
We propose two dynamic determinant algorithms with quadratic arithmetic
complexity when employed in convex hull and volume computations, and with
linear arithmetic complexity when used in point location problems. We implement
the proposed algorithms and perform an extensive experimental analysis. On one
hand, our analysis serves as a performance study of state-of-the-art
determinant algorithms and implementations. On the other hand, we demonstrate
the superiority of our methods over state-of-the-art implementations of
determinant and geometric algorithms. Our experimental results include speed-ups
of 20 and 78 times in volume and point location computations in dimensions 6 and
11, respectively.
Comment: 29 pages, 8 figures, 3 table
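One standard way to reuse earlier work across a sequence of determinants is the matrix determinant lemma. If A' is A with column k replaced by a vector u, then det(A') = det(A) * (row k of A^{-1}) · u, which costs a linear number of operations once det(A) and A^{-1} are known. The Python sketch below illustrates this identity only. Whether it matches the algorithms proposed in the paper is an assumption, and the function name det_after_column_replace is hypothetical.

```python
from fractions import Fraction


def det_after_column_replace(det_a, inv_a, k, new_col):
    """Determinant of A with column k replaced by new_col.

    Uses det(A') = det(A) * (row k of A^{-1}) . new_col, which follows from
    the matrix determinant lemma. Needs det(A) and A^{-1} from earlier work.
    """
    return det_a * sum(inv_a[k][i] * new_col[i] for i in range(len(new_col)))


# A = [[2, 1], [1, 1]] has det(A) = 1 and the inverse below.
det_a = Fraction(1)
inv_a = [[Fraction(1), Fraction(-1)],
         [Fraction(-1), Fraction(2)]]

# Replace column 0 with (3, 4): A' = [[3, 1], [4, 1]]
d0 = det_after_column_replace(det_a, inv_a, 0, [3, 4])
# Replace column 1 with (3, 4): A' = [[2, 3], [1, 4]]
d1 = det_after_column_replace(det_a, inv_a, 1, [3, 4])
```

Exact rational arithmetic via Fraction is used so that determinant signs, which geometric predicates depend on, are not affected by rounding.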
Pipelining Of Double Precision Floating Point Division And Square Root Operations On Field-programmable Gate Arrays
Many space applications, such as vision-based systems, synthetic aperture radar, and radar altimetry rely increasingly on high data rate DSP algorithms. These algorithms use double precision floating point arithmetic operations. While most DSP applications can be executed on DSP processors, the DSP numerical requirements of these new space applications surpass by far the numerical capabilities of many current DSP processors. Since the tradition in DSP processing has been to use fixed point number representation, only recently have DSP processors begun to incorporate floating point arithmetic units, even though most of these units handle only single precision floating point addition/subtraction, multiplication, and occasionally division. While DSP processors are slowly evolving to meet the numerical requirements of newer space applications, FPGA densities have rapidly increased to parallel and surpass even the gate densities of many DSP processors and commodity CPUs. This makes them attractive platforms to implement compute-intensive DSP computations. Even in the presence of this clear advantage on the side of FPGAs, few attempts have been made to examine how wide precision floating point arithmetic, particularly division and square root operations, can perform on FPGAs to support these compute-intensive DSP applications. In this context, this thesis presents the sequential and pipelined designs of IEEE-754 compliant double precision floating point division and square root operations based on low radix digit recurrence algorithms. FPGA implementations of these algorithms have the advantage of being easily testable. In particular, the pipelined designs are synthesized based on careful partial and full unrolling of the iterations in the digit recurrence algorithms. Overall, the implementations of the sequential and pipelined designs are common-denominator implementations which do not use any performance-enhancing embedded components such as multipliers and block memory.
As these implementations exploit exclusively the fine-grain reconfigurable resources of Virtex FPGAs, they are easily portable to other FPGAs with similar reconfigurable fabrics without any major modifications. The pipelined designs of these two operations are evaluated in terms of area, throughput, and dynamic power consumption as a function of pipeline depth. Pipelining experiments reveal that the area overhead tends to remain constant regardless of the degree of pipelining applied to the design, while the throughput increases with pipeline depth. In addition, these experiments reveal that pipelining reduces power considerably in shallow pipelines. Pipelining these designs further does not necessarily lead to significant power reduction. By partitioning these designs into deeper pipelines, they can reach throughputs close to the 100 MFLOPS mark while consuming a modest 1% to 8% of the reconfigurable fabric within a Virtex-II XC2VX000 (e.g., XC2V1000 or XC2V6000) FPGA.
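As a rough illustration of a low radix digit recurrence for square root, the Python sketch below computes an integer square root one result bit per iteration (radix 2), consuming two operand bits per step. It works on plain unsigned integers rather than IEEE-754 mantissas and is not the thesis's design. The function name sqrt_digit_recurrence and the n_bits parameter are assumptions.

```python
def sqrt_digit_recurrence(n, n_bits):
    """Radix-2 digit-recurrence integer square root.

    n must fit in n_bits bits and n_bits must be even.
    Returns (root, remainder) with root * root + remainder == n.
    """
    if n_bits % 2 != 0:
        raise ValueError("n_bits must be even")
    if not 0 <= n < (1 << n_bits):
        raise ValueError("n does not fit in n_bits")

    root = 0
    remainder = 0
    for i in range(n_bits // 2 - 1, -1, -1):
        # Bring down the next pair of operand bits.
        remainder = (remainder << 2) | ((n >> (2 * i)) & 3)
        # Trial value 4*root + 1: subtracting it appends a 1 bit to the root.
        trial = (root << 2) | 1
        root <<= 1
        if remainder >= trial:
            remainder -= trial
            root |= 1
    return root, remainder


root, rem = sqrt_digit_recurrence(25, 6)
```

Each loop iteration is one recurrence step. Unrolling some or all of the iterations into separate hardware stages is what produces the pipelines of varying depth described in the abstract.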
Multi-service systems: an enabler of flexible 5G air-interface
A multi-service system is an enabler for flexibly supporting the
diverse communication requirements of next-generation
wireless communications. In such a system, multiple types of
services co-exist in one baseband system with each service having
its optimal frame structure and low out of band emission (OoBE)
waveforms operating on the service frequency band to reduce the
inter-service-band-interference (ISvcBI). In this article, a
framework for multi-service systems is established, and the
challenges and possible solutions are studied. The multi-service
system implementation in both the time and frequency domains is
discussed. Two representative subband filtered multicarrier
(SFMC) waveforms, filtered orthogonal frequency division
multiplexing (F-OFDM) and universal filtered multi-carrier
(UFMC), are considered. Specifically, the design
methodology, criteria, orthogonality conditions and prospective
application scenarios in the context of 5G are discussed. We
consider both single-rate (SR) and multi-rate (MR) signal
processing methods. Compared with the SR system, the MR
system has significantly reduced computational complexity at the
expense of performance loss due to inter-subband-interference
(ISubBI). The ISvcBI and ISubBI in MR systems
are investigated, and low-complexity interference
cancellation algorithms are proposed to enable multi-service
operation at low interference levels.