Joint Algorithm-Architecture Optimization of CABAC
This paper uses joint algorithm and architecture design to enable high coding efficiency together with high processing speed and low area cost. Specifically, it presents several optimizations of Context Adaptive Binary Arithmetic Coding (CABAC), the form of entropy coding used in H.264/AVC, to achieve the throughput necessary for real-time, low-power, high-definition video coding. The combination of syntax element partitions and interleaved entropy slices, referred to as Massively Parallel CABAC, increases the number of binary symbols (bins) that can be processed in a cycle. Subinterval reordering is used to reduce the cycle time required to process each bin. Under common conditions using the JM12.0 software, Massively Parallel CABAC increases the bins per cycle by 2.7 to 32.8× at a cost of 0.25 to 6.84% coding loss compared with sequential single-slice H.264/AVC CABAC. It also provides a 2× reduction in area cost and reduces memory bandwidth. Subinterval reordering reduces the critical path delay by 14 to 22%, while modifications to context selection reduce the memory requirement by 67%. This work demonstrates that accounting for implementation cost during video coding algorithm design can enable higher processing speed and reduce hardware cost, while still delivering high coding efficiency in the next-generation video coding standard. Texas Instruments Incorporated (Graduate Women's Fellowship for Leadership in Microelectronics); Natural Sciences and Engineering Research Council of Canada.
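As a concrete illustration of the per-bin operation whose latency and serialism the paper attacks, the sketch below performs the core interval split of a binary arithmetic coder. It is a deliberately simplified toy, not the standard's algorithm: real H.264/AVC CABAC uses a 9-bit range register, a 64-state probability model with table-based range lookup, and renormalization.

```python
# Toy interval update of a binary arithmetic coder, the per-bin
# operation at the heart of CABAC. Simplified sketch only: real
# H.264/AVC CABAC uses a 9-bit range register, a 64-state probability
# model with table-based range lookup, and renormalization.

def encode_bin_step(low, rng, p_lps, bin_is_lps):
    """Split [low, low + rng) into MPS and LPS subintervals and keep
    the one selected by the coded bin."""
    r_lps = max(1, int(rng * p_lps))   # least-probable-symbol width
    r_mps = rng - r_lps                # most-probable-symbol width
    if bin_is_lps:
        low += r_mps                   # LPS subinterval lies above MPS
        rng = r_lps
    else:
        rng = r_mps
    return low, rng

low, rng = 0, 1 << 16
for bit in [0, 0, 1, 0]:               # 0 = MPS, 1 = LPS
    low, rng = encode_bin_step(low, rng, p_lps=0.2, bin_is_lps=bool(bit))
print(low, rng)                        # → 33556 6711
```

Each update depends on the previous bin's result, which is why throughput gains must come either from processing more bins per cycle (the paper's syntax element partitions and interleaved entropy slices) or from shortening this dependency path (subinterval reordering).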
Distributed multi-user MIMO transmission using real-time sigma-delta-over-fiber for next generation fronthaul interface
To achieve the massive device connectivity and high data rates demanded by 5G, wireless transmission with wider signal bandwidths and higher-order multiple-input multiple-output (MIMO) is inevitable. This work demonstrates a possible function split option for the next generation fronthaul interface (NGFI). The proof-of-concept downlink architecture consists of real-time sigma-delta modulated signal over fiber (SDoF) links in combination with distributed multi-user (MU) MIMO transmission. The setup is fully implemented using off-the-shelf and in-house developed components. A single SDoF link achieves an error vector magnitude (EVM) of 3.14% for a 163.84 MHz-bandwidth 256-QAM OFDM signal (958.64 Mbps) with a carrier frequency around 3.5 GHz, transmitted over 100 m of OM4 multi-mode fiber at 850 nm using a commercial QSFP module. The centralized architecture of the proposed setup introduces no frequency asynchronism among remote radio units. In most cases, the 2×2 MU-MIMO transmission shows little performance degradation compared to SISO, on average 0.8 dB EVM degradation for 40.96 MHz-bandwidth signals and 1.4 dB for 163.84 MHz-bandwidth signals, implying that the wireless spectral efficiency almost doubles by exploiting spatial multiplexing. A 1.4 Gbps data rate (720 Mbps per user, 163.84 MHz bandwidth, 64-QAM) is reached with an average EVM of 6.66%. The performance shows that this approach is feasible for the high-capacity hot-spot scenario.
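The EVM figures quoted above are the standard RMS metric. For reference, the sketch below computes it for a handful of made-up QPSK symbols and shows the percent-to-dB conversion; the symbol values and the fixed error are illustrative, not measurement data.

```python
import math

# RMS error-vector-magnitude (EVM) as commonly defined for QAM/OFDM
# links: RMS of the error vectors normalized to the RMS reference
# amplitude. The QPSK symbols and the fixed error below are made-up
# illustration values, not measurement data.

def evm_percent(ref, rx):
    err_pow = sum(abs(r - x) ** 2 for r, x in zip(ref, rx))
    ref_pow = sum(abs(r) ** 2 for r in ref)
    return 100.0 * math.sqrt(err_pow / ref_pow)

ref = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]      # ideal QPSK symbols
rx = [r + 0.05 * (1 + 1j) for r in ref]       # each received 5% off
evm = evm_percent(ref, rx)
evm_db = 20.0 * math.log10(evm / 100.0)       # percent <-> dB conversion
print(round(evm, 2), round(evm_db, 2))        # → 5.0 -26.02
```

By the same conversion, the 3.14% figure in the abstract corresponds to roughly -30.1 dB.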
Performance Prediction of Nonbinary Forward Error Correction in Optical Transmission Experiments
In this paper, we compare different metrics for predicting the error rate of optical systems based on nonbinary forward error correction (FEC). It is shown that the correct metric for predicting the performance of coded modulation based on nonbinary FEC is the mutual information. The accuracy of the prediction is verified in a detailed example with multiple constellation formats and FEC overheads, in both simulations and optical transmission experiments over a recirculating loop. It is shown that the employed FEC codes must be universal if performance prediction based on thresholds is used. A tutorial introduction to the computation of the threshold from optical transmission measurements is also given. Comment: submitted to the IEEE/OSA Journal of Lightwave Technology.
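To make the advocated metric concrete, here is a minimal Monte-Carlo estimator of the symbol-wise mutual information of an equiprobable constellation over an AWGN channel. The 4-PAM alphabet, noise level, and sample count are illustrative choices, not the paper's setup; a real measurement would plug in received samples from the link.

```python
import math
import random

# Monte-Carlo estimate of the symbol-wise mutual information of an
# equiprobable constellation over AWGN. The 4-PAM alphabet, noise
# level, and sample count are illustrative, not the paper's setup.

random.seed(0)

def mi_awgn(constellation, sigma, n_samples=20000):
    m = len(constellation)
    total = 0.0
    for _ in range(n_samples):
        x = random.choice(constellation)
        y = x + random.gauss(0.0, sigma)
        # Gaussian likelihoods p(y|s); common factors cancel in the ratio.
        num = math.exp(-((y - x) ** 2) / (2 * sigma ** 2))
        den = sum(math.exp(-((y - s) ** 2) / (2 * sigma ** 2))
                  for s in constellation)
        total += math.log2(m * num / den)
    return total / n_samples

mi_val = mi_awgn([-3.0, -1.0, 1.0, 3.0], sigma=0.5)
print(round(mi_val, 2))   # close to log2(4) = 2 bits at this SNR
```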
On the BICM Capacity
Optimal binary labelings, input distributions, and input alphabets are analyzed for the so-called bit-interleaved coded modulation (BICM) capacity, paying special attention to the low signal-to-noise ratio (SNR) regime. For 8-ary pulse amplitude modulation (PAM) and for 0.75 bit/symbol, the folded binary code results in a higher capacity than the binary reflected Gray code (BRGC) and the natural binary code (NBC). The 1 dB gap between the additive white Gaussian noise (AWGN) capacity and the BICM capacity with the BRGC can be almost completely removed if the input symbol distribution is properly selected. First-order asymptotics of the BICM capacity are developed for arbitrary input alphabets and distributions, dimensions, mean, variance, and binary labeling. These asymptotics are used to define first-order optimal (FOO) constellations for BICM, i.e., constellations that make BICM achieve the Shannon limit of -1.59 dB. It is shown that the Eb/N0 required for reliable transmission at asymptotically low rates in BICM can be as high as infinity, that for uniform input distributions and 8-PAM there are only 72 classes of binary labelings with a different first-order asymptotic behavior, and that this number is reduced to only 26 for 8-ary phase shift keying (PSK). A general answer to the question of FOO constellations for BICM is also given: using the Hadamard transform, it is found that for uniform input distributions, a constellation for BICM is FOO if and only if it is a linear projection of a hypercube. A constellation based on PAM or quadrature amplitude modulation input alphabets is FOO if and only if it is labeled by the NBC; if the constellation is based on PSK input alphabets instead, it can never be FOO if the input alphabet has more than four points, regardless of the labeling. Comment: submitted to the IEEE Transactions on Information Theory.
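The Hadamard-transform criterion in the final sentences can be checked numerically. Per the stated result for uniform inputs, only the mean and the single-bit (power-of-two-indexed) Hadamard components of the labeled constellation may be nonzero; the toy check below confirms that 4-PAM is FOO under the NBC but not under the BRGC (both examples are zero-mean).

```python
# Numerical check of the Hadamard-transform criterion: for uniform
# inputs, a constellation is first-order optimal (FOO) iff its
# Hadamard spectrum is nonzero only at the mean (index 0) and at
# power-of-two (single-bit) indices, i.e. the constellation is a
# linear projection of a hypercube. Shown for 4-PAM, NBC vs. BRGC.

def hadamard(x):
    """Fast Walsh-Hadamard transform, Sylvester (natural) ordering."""
    x = list(x)
    h = 1
    while h < len(x):
        for i in range(0, len(x), 2 * h):
            for j in range(i, i + h):
                x[j], x[j + h] = x[j] + x[j + h], x[j] - x[j + h]
        h *= 2
    return x

def is_foo(points_by_label):
    """points_by_label[k] is the amplitude whose binary label is k."""
    spec = hadamard(points_by_label)
    # Index 0 (the mean) and power-of-two indices (single-bit
    # components) may be nonzero; every other component must vanish.
    return all(v == 0 for k, v in enumerate(spec) if k & (k - 1) != 0)

nbc = [-3, -1, 1, 3]     # labels 00, 01, 10, 11 -> natural binary code
brgc = [-3, -1, 3, 1]    # labels 00, 01, 10, 11 -> binary reflected Gray
print(is_foo(nbc), is_foo(brgc))   # → True False
```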
Novel Methods in the Improvement of Turbo Codes and their Decoding
The performance of turbo codes can often be improved by improving their weight spectra. Methods of producing the weight spectra of turbo codes have been investigated, and many improvements were made to refine the techniques. A much faster method of weight spectrum evaluation has been developed that allows calculation of weight spectra within a few minutes on a typical desktop PC. Simulation results show that new high-performance turbo codes are produced by the optimisation methods presented. Two further important areas of concern are the code itself and its decoding. Improvements to the code are accomplished through optimisation of the interleaver and the choice of constituent coders. Optimisation of interleavers can also be accomplished automatically using the algorithms described in this work.
The addition of a CRC as an outer code proved to offer a vast improvement in overall code performance. This was achieved without any code-rate loss, as the turbo code is punctured to make way for the CRC remainder. The results show a gain of 0.4 dB compared to the non-CRC (1014,676) turbo code.
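The CRC remainder mentioned above is standard polynomial long division over GF(2). A minimal sketch follows; the 8-bit generator used here (x^8 + x^2 + x + 1) is purely illustrative, as the thesis's actual CRC length and polynomial are not given in this summary.

```python
# Toy CRC-remainder computation of the kind appended as an outer code.
# Polynomial long division over GF(2); the 8-bit generator here is
# illustrative -- the actual CRC parameters are not in this summary.

def crc_remainder(bits, poly):
    """Divide bits * x^(deg poly) by poly over GF(2); return remainder."""
    msg = list(bits) + [0] * (len(poly) - 1)
    for i in range(len(bits)):
        if msg[i]:
            for j, p in enumerate(poly):
                msg[i + j] ^= p
    return msg[len(bits):]

message = [1, 0, 1, 1, 0, 0, 1]
poly = [1, 0, 0, 0, 0, 0, 1, 1, 1]        # x^8 + x^2 + x + 1
rem = crc_remainder(message, poly)
# Appending the remainder makes the codeword divisible by the
# generator, which is exactly the check the decoder uses.
assert crc_remainder(message + rem, poly) == [0] * 8
```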
Another improvement to the decoding performance was achieved through a combination of MAP decoding and ordered reliability decoding. The simulations show a performance just 0.2 dB from the Shannon limit; the same code without ordered reliability decoding has a performance curve 0.6 dB from the Shannon limit. In situations where the MAP decoder fails to converge, ordered reliability decoding succeeds in producing a codeword much closer to the received vector, often the correct codeword. Ordered reliability decoding adds to the computational complexity but lends itself to FPGA implementation. Engineering and Physical Sciences Research Council (EPSRC).
Energy-efficient acceleration of MPEG-4 compression tools
We propose novel hardware accelerator architectures for the most computationally demanding algorithms of the MPEG-4 video compression standard: motion estimation, binary motion estimation (for shape coding), and the forward/inverse discrete cosine transforms (incorporating shape-adaptive modes). These accelerators have been designed using general low-energy design philosophies at the algorithmic/architectural abstraction levels. The themes of these philosophies are avoiding waste and trading area/performance for power and energy gains. Each core has been synthesised targeting TSMC 0.09 μm TCBN90LP technology, and the experimental results presented in this paper show that the proposed cores improve upon the prior art.
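The kernel inside motion-estimation accelerators like these is block matching with a sum-of-absolute-differences (SAD) cost. The sketch below does an exhaustive full search over a tiny synthetic frame; real designs save energy with smarter search orders and data reuse, and the 8×8 gradient "frame" here is purely illustrative.

```python
# Minimal full-search block matching with the sum-of-absolute-
# differences (SAD) cost, the kernel inside motion-estimation
# accelerators. The 8x8 "frame" is a synthetic gradient; real designs
# use fast search strategies and data reuse rather than full search.

def sad(block_a, block_b):
    return sum(abs(a - b) for ra, rb in zip(block_a, block_b)
               for a, b in zip(ra, rb))

def best_match(ref, cur_block, bx, by, search):
    """Search ref around (bx, by) for the displacement minimizing SAD."""
    n = len(cur_block)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if 0 <= y <= len(ref) - n and 0 <= x <= len(ref[0]) - n:
                cand = [row[x:x + n] for row in ref[y:y + n]]
                cost = sad(cur_block, cand)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best

ref = [[8 * r + c for c in range(8)] for r in range(8)]
cur = [row[3:5] for row in ref[2:4]]   # 2x2 block taken from (x=3, y=2)
result = best_match(ref, cur, 2, 2, 2)
print(result)                          # → (0, 1, 0): zero SAD at dx=1, dy=0
```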
Replacing the Soft FEC Limit Paradigm in the Design of Optical Communication Systems
The FEC limit paradigm is the prevalent practice for designing optical communication systems to attain a certain bit-error rate (BER) without forward error correction (FEC). This practice assumes that there is an FEC code that will reduce the BER after decoding to the desired level. In this paper, we challenge this practice and show that the concept of a channel-independent FEC limit is invalid for soft-decision bit-wise decoding. It is shown that for low code rates and high-order modulation formats, the use of the soft FEC limit paradigm can underestimate the spectral efficiencies by up to 20%. A better predictor for the BER after decoding is the generalized mutual information, which is shown to give consistent post-FEC BER predictions across different channel conditions and modulation formats. Extensive optical full-field simulations and experiments are carried out in both the linear and nonlinear transmission regimes to confirm the theoretical analysis.
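As a concrete illustration of the advocated predictor, the sketch below estimates the GMI from bit-wise LLRs for BPSK over AWGN, where the exact LLR is known in closed form (2y/σ²). The mapping, noise level, and sample count are toy choices, not the paper's experimental setup.

```python
import math
import random

# Monte-Carlo estimate of the generalized mutual information (GMI)
# from bit-wise LLRs. Toy setup: BPSK over AWGN, where the exact LLR
# is 2y/sigma^2; parameters are illustrative, not from the paper.

random.seed(1)

def gmi_per_bit(bits, llrs):
    """GMI in bit/bit: 1 - E[log2(1 + exp(-(1 - 2b) * L))]."""
    loss = sum(math.log2(1.0 + math.exp(-(1 - 2 * b) * l))
               for b, l in zip(bits, llrs)) / len(bits)
    return 1.0 - loss

sigma = 0.7
bits = [random.randint(0, 1) for _ in range(20000)]
rx = [(1 - 2 * b) + random.gauss(0.0, sigma) for b in bits]  # bit 0 -> +1
llrs = [2.0 * y / sigma ** 2 for y in rx]                    # LLR for bit = 0
gmi_val = gmi_per_bit(bits, llrs)
print(round(gmi_val, 3))
```

Summed over the bit levels of a higher-order format, the same per-bit losses give the GMI that is shown here to predict post-FEC BER consistently across channels.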
A low-energy rate-adaptive bit-interleaved passive optical network
Energy consumption of customer premises equipment (CPE) has become a serious issue in the new generations of time-division multiplexing passive optical networks, which operate at 10 Gb/s or higher. It is becoming a major factor in global network energy consumption, and it poses problems during emergencies when CPE is battery-operated. In this paper, a low-energy passive optical network (PON) that uses a novel bit-interleaving downstream protocol is proposed. The network architecture, the protocol, and the key enabling implementation aspects, including dynamic traffic interleaving, rate-adaptive descrambling of decimated traffic, and the design and implementation of a downsampling clock and data recovery circuit, are described. The proposed concept is shown to reduce the energy consumption for protocol processing by a factor of 30. A detailed analysis of the energy consumption in the CPE shows that the interleaving protocol reduces the total energy consumption of the CPE significantly in comparison to the standard 10 Gb/s PON CPE. Experimental results obtained from measurements on the implemented CPE prototype confirm that the CPE consumes significantly less energy than the standard 10 Gb/s PON CPE.
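The core bit-interleaving idea can be sketched in a few lines: the OLT interleaves the users' bit streams round-robin into the aggregate downstream frame, and each CPE recovers its own stream by keeping every Nth bit (decimation), so most of its receive path can run at 1/N of the line rate. Framing, scrambling, and the real protocol fields are omitted in this toy sketch.

```python
# Sketch of round-robin bit interleaving and per-CPE decimation, the
# conceptual core of a bit-interleaved PON downstream. Framing,
# scrambling, and the actual protocol fields are omitted.

def interleave(user_streams):
    """Round-robin bit interleaving of N equal-length streams."""
    return [bit for group in zip(*user_streams) for bit in group]

def decimate(frame, n_users, user_index):
    """What one CPE does conceptually: sample every n_users-th bit."""
    return frame[user_index::n_users]

users = [[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 1]]
frame = interleave(users)
print(frame)   # → [1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1]
assert all(decimate(frame, 4, i) == users[i] for i in range(4))
```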