Study of combining GPU/FPGA accelerators for high-performance computing
This contribution presents the performance modeling of a super desktop with GPU and FPGA accelerators, using OpenCL for the GPU and a high-level synthesis compiler for the FPGAs. The performance model is used to evaluate the different high-level synthesis optimizations, taking into account the resource usage, and to compare the compute power of the FPGA with the GPU.
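A resource-aware throughput estimate of the kind such a performance model might use can be sketched as follows. This is not the study's model: it assumes an idealized pipelined loop whose throughput is clock × unroll factor × operations per iteration / initiation interval, and it treats DSP blocks as the only limiting resource. The function names and the DSP figures are hypothetical.

```python
def pipelined_throughput(clock_hz, unroll, ii, ops_per_iter):
    """Idealized steady-state throughput (ops/s) of a pipelined, unrolled HLS loop.

    Each of the `unroll` parallel lanes starts one loop iteration every `ii`
    clock cycles (the initiation interval) and performs `ops_per_iter`
    operations. Memory bandwidth and pipeline fill/drain are ignored.
    """
    return clock_hz * unroll * ops_per_iter / ii


def best_unroll(clock_hz, ii, ops_per_iter, dsp_per_lane, dsp_available, max_unroll=1024):
    """Largest power-of-two unroll factor whose DSP usage fits the budget.

    Returns (unroll, throughput), or None if even a single lane does not fit.
    """
    best = None
    unroll = 1
    while unroll <= max_unroll and unroll * dsp_per_lane <= dsp_available:
        best = (unroll, pipelined_throughput(clock_hz, unroll, ii, ops_per_iter))
        unroll *= 2
    return best


# Illustrative numbers only (not from the study): 200 MHz clock, II = 1,
# one multiply-add (2 ops) per iteration, 5 DSP blocks per lane, 1518 DSPs.
print(best_unroll(200e6, 1, 2, 5, 1518))  # (256, 102400000000.0)
```

Restricting the search to powers of two mirrors a common HLS practice of choosing unroll factors that divide array partitions evenly; a real model would also account for LUT, BRAM, and memory-bandwidth limits.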
A Scalable Correlator Architecture Based on Modular FPGA Hardware, Reuseable Gateware, and Data Packetization
A new generation of radio telescopes is achieving unprecedented levels of
sensitivity and resolution, as well as increased agility and field-of-view, by
employing high-performance digital signal processing hardware to phase and
correlate large numbers of antennas. The computational demands of these imaging
systems scale in proportion to BMN^2, where B is the signal bandwidth, M is the
number of independent beams, and N is the number of antennas. The
specifications of many new arrays lead to demands in excess of tens of PetaOps
per second.
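The B·M·N^2 scaling can be made concrete with a short sketch. It assumes dual-polarization antennas (four polarization products for full Stokes), counts autocorrelations as baselines, and assumes one complex sample per second per hertz of bandwidth; the helper names are illustrative and are not part of the authors' open-source libraries.

```python
def relative_compute(bandwidth_hz, beams, antennas):
    """Compute demand up to a constant factor, following the B*M*N^2 scaling."""
    return bandwidth_hz * beams * antennas ** 2


def num_baselines(antennas, include_autos=True):
    """Antenna pairs a full correlator must cross-multiply: N(N-1)/2 cross
    products, plus N autocorrelations if requested."""
    cross = antennas * (antennas - 1) // 2
    return cross + antennas if include_autos else cross


def cmac_rate(antennas, bandwidth_hz, pol_products=4):
    """Rough complex multiply-accumulate rate (CMAC/s), assuming one complex
    sample per second per Hz of bandwidth for every baseline and polarization
    product (a simplifying assumption, not a figure from the paper)."""
    return num_baselines(antennas) * pol_products * bandwidth_hz


# Doubling the number of antennas quadruples the demand:
print(relative_compute(200e6, 1, 32) / relative_compute(200e6, 1, 16))  # 4.0
# 16 antennas, 200 MHz bandwidth, full Stokes (4 polarization products):
print(num_baselines(16), cmac_rate(16, 200e6))  # 136 108800000000.0
```

The N^2 term comes from the baseline count, which is why the cross-multiplication stage dominates for large arrays.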
To meet this challenge, we have developed a general purpose correlator
architecture using standard 10-Gbit Ethernet switches to pass data between
flexible hardware modules containing Field Programmable Gate Array (FPGA)
chips. These chips are programmed using open-source signal processing libraries
we have developed to be flexible, scalable, and chip-independent. This work
reduces the time and cost of implementing a wide range of signal processing
systems, with correlators foremost among them, and facilitates upgrading to new
generations of processing technology. We present several correlator
deployments, including a 16-antenna, 200-MHz bandwidth, 4-bit, full Stokes
parameter application deployed on the Precision Array for Probing the Epoch of
Reionization. Comment: Accepted to Publications of the Astronomical Society of the Pacific. 31
pages. v2: corrected typo, v3: corrected Fig. 1
Low-Complexity Sub-band Digital Predistortion for Spurious Emission Suppression in Noncontiguous Spectrum Access
Noncontiguous transmission schemes combined with high power-efficiency
requirements pose significant challenges for radio transmitter and power amplifier (PA)
design and implementation. Due to the nonlinear nature of the PA, severe
unwanted emissions can occur, which can potentially interfere with neighboring
channel signals or even desensitize the device's own receiver in frequency-division
duplexing (FDD) transceivers. In this article, to suppress such unwanted
emissions, a low-complexity sub-band digital predistortion (DPD) solution, specifically tailored for
spectrally noncontiguous transmission schemes in low-cost devices, is proposed.
The proposed technique aims at mitigating only the selected spurious
intermodulation distortion components at the PA output, hence allowing for
substantially reduced processing complexity compared to classical linearization
solutions. Furthermore, novel decorrelation-based parameter learning solutions
are also proposed and formulated, which offer reduced computing complexity in
parameter estimation as well as the ability to track time-varying features
adaptively. Comprehensive simulation and RF measurement results are provided,
using a commercial LTE-Advanced mobile PA, to evaluate and validate the
effectiveness of the proposed solution in real world scenarios. The obtained
results demonstrate that highly efficient spurious component suppression can be
obtained using the proposed solutions.
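The idea of targeting a single spur and learning its coefficient by decorrelation can be illustrated with a toy model. For two carriers at f1 and f2, third-order intermodulation (IM3) products fall at 2f1 - f2 and 2f2 - f1, and the product x1^2 · conj(x2) oscillates at 2f1 - f2, making it a natural basis for that spur. The noise-free PA model, the coefficient `beta`, the step size `mu`, and all function names are assumptions for illustration; the article's actual estimator is not reproduced here.

```python
import cmath
import math
import random


def im3_frequencies(f1, f2):
    """Third-order intermodulation (IM3) frequencies of carriers at f1 and f2."""
    return 2 * f1 - f2, 2 * f2 - f1


def im3_basis(x1, x2):
    """Basis sample for the IM3 spur at 2*f1 - f2: x1^2 * conj(x2)."""
    return x1 * x1 * x2.conjugate()


def learn_subband_coefficient(beta, num_samples=1000, mu=0.05, seed=0):
    """Toy decorrelation-based learning of one sub-band DPD coefficient alpha.

    beta is a hypothetical PA coefficient for the spur. The residual spur at the
    PA output is modeled, noise-free, as (beta + alpha) * u[n]; alpha is adapted
    until that residual is uncorrelated with the basis u[n], i.e. alpha -> -beta.
    """
    rng = random.Random(seed)
    alpha = 0j
    for _ in range(num_samples):
        # Unit-modulus samples with random phases stand in for the two carriers.
        x1 = cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
        x2 = cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
        u = im3_basis(x1, x2)
        residual = (beta + alpha) * u
        alpha -= mu * residual * u.conjugate()  # decorrelation update
    return alpha


print(im3_frequencies(-10e6, 10e6))  # (-30000000.0, 30000000.0)
print(learn_subband_coefficient(0.1 - 0.05j))  # close to -(0.1 - 0.05j)
```

Only one basis function and one coefficient are adapted, which is where the complexity saving over full-band linearization comes from in this simplified picture.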
Massive MIMO with Non-Ideal Arbitrary Arrays: Hardware Scaling Laws and Circuit-Aware Design
Massive multiple-input multiple-output (MIMO) systems are cellular networks
where the base stations (BSs) are equipped with unconventionally many antennas,
deployed on co-located or distributed arrays. Huge spatial degrees-of-freedom
are achieved by coherent processing over these massive arrays, which provide
strong signal gains, resilience to imperfect channel knowledge, and low
interference. This comes at the price of more infrastructure; the hardware cost
and circuit power consumption scale linearly/affinely with the number of BS
antennas, N. Hence, the key to cost-efficient deployment of large arrays is
low-cost antenna branches with low circuit power, in contrast to today's
conventional expensive and power-hungry BS antenna branches. Such low-cost
transceivers are prone to hardware imperfections, but it has been conjectured
that the huge degrees-of-freedom would bring robustness to such imperfections.
We prove this claim for a generalized uplink system with multiplicative
phase-drifts, additive distortion noise, and noise amplification. Specifically,
we derive closed-form expressions for the user rates and a scaling law that
shows how fast the hardware imperfections can increase with the number of antennas N while
maintaining high rates. The connection between this scaling law and the power
consumption of different transceiver circuits is rigorously exemplified. This
reveals that one can make the circuit power increase sublinearly with the number of antennas, instead of
linearly, by careful circuit-aware system design. Comment: Accepted for publication in IEEE Transactions on Wireless
Communications, 16 pages, 8 figures. The results can be reproduced using the
following Matlab code: https://github.com/emilbjornson/hardware-scaling-law
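The signal gain from coherent processing can be checked with a small Monte Carlo sketch. Under ideal hardware, perfect channel knowledge, and i.i.d. Rayleigh fading h_n ~ CN(0, 1) (all assumptions here, unlike the non-ideal model studied in the paper), maximum-ratio combining scales the SNR by ||h||^2, whose mean equals the number of antennas N. The function name is hypothetical.

```python
import random


def avg_mrc_gain(num_antennas, trials=1000, seed=1):
    """Monte Carlo estimate of E[||h||^2], the SNR gain of maximum-ratio combining
    over num_antennas i.i.d. Rayleigh-fading channels h_n ~ CN(0, 1)."""
    rng = random.Random(seed)
    sigma = 0.5 ** 0.5  # real and imaginary parts each have variance 1/2
    total = 0.0
    for _ in range(trials):
        total += sum(rng.gauss(0.0, sigma) ** 2 + rng.gauss(0.0, sigma) ** 2
                     for _ in range(num_antennas))
    return total / trials


for n in (1, 16, 64, 256):
    print(n, round(avg_mrc_gain(n), 1))  # grows roughly in proportion to n
```

This ideal baseline is the array gain that the paper's scaling law shows can be largely preserved even when each antenna branch uses cheaper, less accurate hardware.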
Millimeter-wave Wireless LAN and its Extension toward 5G Heterogeneous Networks
Millimeter-wave (mmw) frequency bands, especially 60 GHz unlicensed band, are
considered as a promising solution for gigabit short range wireless
communication systems. IEEE standard 802.11ad, also known as WiGig, is
standardized for the usage of the 60 GHz unlicensed band for wireless local
area networks (WLANs). By using this mmw WLAN, multi-Gbps rate can be achieved
to support bandwidth-intensive multimedia applications. Exhaustive-search
beamforming (BF) is usually used to overcome the high 60 GHz propagation loss
and accomplish data transmissions in such mmw WLANs. Because mmw links are short
range and highly susceptible to path blockage, multiple mmw access points (APs)
must be deployed to fully cover a typical target environment for future
high-capacity multi-Gbps WLANs. Therefore, coordination among mmw APs is
essential to avoid packet collisions resulting from uncoordinated
exhaustive-search BF and to increase the total capacity of mmw WLANs. In this
paper, we first review the current status of mmw WLANs, including our WiGig AP
prototype. Then, we highlight the strong need for coordinated
transmissions among mmw APs as a key enabler for future high capacity mmw
WLANs. Two different types of coordinated mmw WLAN architecture are introduced.
One is a distributed-antenna architecture that realizes centralized
coordination, while the other achieves autonomous coordination with the assistance
of legacy Wi-Fi signaling. Moreover, two heterogeneous network (HetNet)
architectures are also introduced to efficiently extend the coordinated mmw
WLANs to future fifth-generation (5G) cellular networks. Comment: 18 pages, 24 figures, accepted, invited paper
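A minimal sketch of exhaustive beam search over transmit and receive codebooks shows why the procedure is costly: every (TX, RX) beam pair is measured, so the training effort grows as the product of the two codebook sizes. The measurement callback and the toy SNR profile are hypothetical, and this is not the IEEE 802.11ad beamforming training protocol itself.

```python
from itertools import product


def exhaustive_beam_search(measure_snr_db, num_tx_beams, num_rx_beams):
    """Measure every (tx, rx) beam pair and keep the best one.

    measure_snr_db is a hypothetical callback standing in for one training
    measurement. Returns (best_pair, best_snr_db, number_of_measurements).
    """
    best_pair, best_snr = None, float("-inf")
    count = 0
    for tx, rx in product(range(num_tx_beams), range(num_rx_beams)):
        snr = measure_snr_db(tx, rx)
        count += 1
        if snr > best_snr:
            best_pair, best_snr = (tx, rx), snr
    return best_pair, best_snr, count


def toy_snr_db(tx, rx):
    # Hypothetical link whose strongest path pairs tx beam 5 with rx beam 2.
    return 20.0 - 1.5 * abs(tx - 5) - 2.0 * abs(rx - 2)


print(exhaustive_beam_search(toy_snr_db, 16, 8))  # ((5, 2), 20.0, 128)
```

When several uncoordinated APs run such sweeps at the same time, their training frames can collide, which is the motivation given above for coordinating the APs.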
Temporal unpredictability detection of real-time video sequence
Neuromorphic Hardware In The Loop: Training a Deep Spiking Network on the BrainScaleS Wafer-Scale System
Emulating spiking neural networks on analog neuromorphic hardware offers
several advantages over simulating them on conventional computers, particularly
in terms of speed and energy consumption. However, this usually comes at the
cost of reduced control over the dynamics of the emulated networks. In this
paper, we demonstrate how iterative training of a hardware-emulated network can
compensate for anomalies induced by the analog substrate. We first convert a
deep neural network trained in software to a spiking network on the BrainScaleS
wafer-scale neuromorphic system, thereby enabling an acceleration factor of
10,000 compared to the biological time domain. This mapping is followed by the
in-the-loop training, where in each training step, the network activity is
first recorded in hardware and then used to compute the parameter updates in
software via backpropagation. An essential finding is that the parameter
updates do not have to be precise, but only need to approximately follow the
correct gradient, which simplifies the computation of updates. Using this
approach, after only several tens of iterations, the spiking network shows an
accuracy close to the ideal software-emulated prototype. The presented
techniques show that deep spiking networks emulated on analog neuromorphic
devices can attain good computational performance despite the inherent
variations of the analog substrate. Comment: 8 pages, 10 figures, submitted to IJCNN 201
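The in-the-loop scheme, and the observation that updates only need to approximately follow the gradient, can be illustrated with a toy linear layer. Here the "hardware" is simulated by an unknown positive gain per output neuron, standing in for analog mismatch, and the software computes updates from recorded hardware outputs as if the gains were 1. The model, gains, and function names are assumptions; this is not the BrainScaleS toolchain or the authors' training code.

```python
def hardware_forward(weights, gains, x):
    """Simulated analog forward pass of a linear layer: output neuron i carries a
    fixed gain mismatch gains[i] that the software model does not know about."""
    return [g * sum(w * xi for w, xi in zip(row, x)) for g, row in zip(gains, weights)]


def in_the_loop_train(weights, gains, data, lr=0.05, epochs=1000):
    """Hardware-in-the-loop training sketch (toy linear model).

    Each step records the output on the simulated hardware, then updates the
    weights in software using the gradient of the *ideal* model (gain 1). The
    update misses the unknown factor gains[i], so it only approximately follows
    the true gradient, yet it still descends the loss while every gain is positive.
    """
    for _ in range(epochs):
        for x, target in data:
            y_hw = hardware_forward(weights, gains, x)  # 1. measure on "hardware"
            for i, row in enumerate(weights):           # 2. update in software
                err = y_hw[i] - target[i]
                for j, xj in enumerate(x):
                    row[j] -= lr * err * xj
    return weights


gains = [0.6, 1.4]  # hypothetical per-neuron mismatch of the analog substrate
target_map = [[1.0, -2.0], [0.5, 3.0]]
inputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
data = [(x, [sum(t * xi for t, xi in zip(row, x)) for row in target_map]) for x in inputs]
weights = in_the_loop_train([[0.0, 0.0], [0.0, 0.0]], gains, data)
print([[round(w, 3) for w in row] for row in weights])  # weights ≈ target_map[i] / gains[i]
```

Because the mismatch only rescales each output's gradient, the learned weights absorb it, so the hardware outputs match the targets even though the software never models the gains.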