MicroTCA implementation of synchronous Ethernet-Based DAQ systems for large scale experiments
Large LAr TPCs are among the most powerful detectors to address open problems
in particle and astro-particle physics, such as CP violation in the leptonic
sector, neutrino properties and their astrophysical implications, and proton
decay searches. The scale of such detectors imposes severe constraints on their
readout and DAQ system. In this article we describe a data acquisition scheme
for this new generation of large detectors. The main challenge is to propose a
scalable and easy-to-use solution able to manage a large number of channels at
the lowest cost. These constraints are very similar to those found in the
network telecommunication industry. We propose to
study how emerging technologies like ATCA and µTCA could be used in
neutrino experiments. We describe the design of an Advanced Mezzanine Card
(AMC) including 32 ADC channels. This board receives 32 analog channels at
the front panel and sends the formatted data through the µTCA backplane
using a Gigabit Ethernet link. The gigabit switch of the MCH is used to
centralize the data and send it to the event building computer. The core of
this card is an FPGA (Arria GX from Altera) including the whole system except the
memories. A hardware accelerator has been implemented using a NIOS II µP
and a Gigabit MAC IP. To reconstruct the tracks
from the events, a time synchronisation system is mandatory. We decided to
implement the IEEE 1588 standard, also called the Precision Time Protocol, another
emerging and promising technology in the telecommunication industry. In this
article we describe a Gigabit PTP implementation using the recovered clock of
the gigabit link. By doing so the drift is directly cancelled and the PTP is
used only to evaluate and correct the offset.
Comment: Talk presented at the 2009 Real Time Conference, Beijing, May '09,
submitted to the proceedings
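The abstract does not spell out how the offset is evaluated. A minimal sketch, assuming the standard IEEE 1588 delay request-response exchange and a symmetric link, is given below. The class name, field names and timestamp values are hypothetical and chosen only for illustration. Because the slave runs from the recovered link clock, the two clocks share one frequency, so the offset found in one exchange stays valid until the next.

```python
from dataclasses import dataclass


@dataclass
class PtpExchange:
    """Four timestamps of one Sync / Delay_Req exchange, in nanoseconds."""
    t1: int  # master sends Sync (read on the master clock)
    t2: int  # slave receives Sync (read on the slave clock)
    t3: int  # slave sends Delay_Req (read on the slave clock)
    t4: int  # master receives Delay_Req (read on the master clock)


def offset_and_delay(x: PtpExchange):
    """Return (slave offset from master, one-way delay), assuming a symmetric path."""
    master_to_slave = x.t2 - x.t1  # equals delay + offset
    slave_to_master = x.t4 - x.t3  # equals delay - offset
    offset = (master_to_slave - slave_to_master) / 2
    delay = (master_to_slave + slave_to_master) / 2
    return offset, delay


# Hypothetical numbers: the slave clock is 250 ns ahead, the link delay is 40 ns.
exchange = PtpExchange(t1=1_000, t2=1_290, t3=2_000, t4=1_790)
offset, delay = offset_and_delay(exchange)

# A slave timestamp is brought onto the master time base by subtracting the offset.
corrected_t2 = exchange.t2 - offset
```

With these illustrative timestamps the function recovers an offset of 250 ns and a delay of 40 ns, and the corrected receive time equals the master-side arrival time of the Sync message.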
Flexible Power Modeling of LTE Base Stations
With the explosion of wireless communications in the number of users and data rates, the reduction of network power consumption becomes more and more critical. This is especially true for base stations, which represent a dominant share of the total power in cellular networks. In order to study power reduction techniques, a convenient power model is required, providing estimates of the power consumption in different scenarios. This paper proposes such a model, accurate but simple to use. It evaluates the base station power consumption for different types of cells supporting the 3GPP LTE standard. It is flexible enough to enable comparisons between state-of-the-art and advanced configurations, and an easy adaptation to various scenarios. The model is based on a combination of base station components and sub-components as well as power scaling rules as functions of the main system parameters.
Low-latency adiabatic quantum-flux-parametron circuit integrated with a hybrid serializer/deserializer
Adiabatic quantum-flux-parametron (AQFP) logic is an ultra-low-power
superconductor logic family. AQFP logic gates are powered and clocked by
dedicated clocking schemes using ac excitation currents to implement an
energy-efficient switching process, adiabatic switching. We have proposed a
low-latency clocking scheme, delay-line clocking, and demonstrated basic AQFP
logic gates. In order to test more complex circuits, a serializer/deserializer
(SerDes) should be incorporated into the AQFP circuit under test, since the
number of input/output (I/O) cables is limited by equipment. Therefore, in the
present study we propose and develop a novel SerDes for testing
delay-line-clocked AQFP circuits by combining AQFP and rapid
single-flux-quantum (RSFQ) logic families, which we refer to as the AQFP/RSFQ
hybrid SerDes. The hybrid SerDes comprises RSFQ shift registers to facilitate
the data storage during serial-to-parallel and parallel-to-serial conversion.
Furthermore, all the component circuits in the hybrid SerDes are clocked by the
identical excitation current to synchronize the AQFP and RSFQ parts. We
fabricate and demonstrate a delay-line-clocked AQFP circuit (8-to-3 encoder,
which is the largest delay-line-clocked circuit ever designed) integrated with
the hybrid SerDes at 4.2 K up to 4.5 GHz. Our measurement results indicate that
the hybrid SerDes enables the testing of delay-line-clocked AQFP circuits with
only a few I/O cables and is thus a powerful tool for the development of very
large-scale integration AQFP circuits.
Comment: 7 pages, 6 figures
NaNet: a Low-Latency, Real-Time, Multi-Standard Network Interface Card with GPUDirect Features
While the GPGPU paradigm is widely recognized as an effective approach to
high performance computing, its adoption in low-latency, real-time systems is
still in its early stages.
Although GPUs typically show deterministic behaviour in terms of latency in
executing computational kernels as soon as data is available in their internal
memories, assessment of real-time features of a standard GPGPU system needs
careful characterization of all subsystems along the data stream path.
The networking subsystem turns out to be the most critical one in terms of
absolute value and fluctuations of its response latency.
Our envisioned solution to this issue is NaNet, an FPGA-based PCIe Network
Interface Card (NIC) design featuring a configurable and extensible set of
network channels with direct access through GPUDirect to NVIDIA Fermi/Kepler
GPU memories.
The NaNet design currently supports both standard channels, GbE (1000BASE-T) and
10GbE (10GBASE-R), and custom ones, 34 Gbps APElink and 2.5 Gbps
deterministic-latency KM3link, but its modularity allows for a straightforward
inclusion of other link technologies.
To avoid host OS intervention on data stream and remove a possible source of
jitter, the design includes a network/transport layer offload module with
cycle-accurate, upper-bound latency, supporting UDP, KM3link Time Division
Multiplexing and APElink protocols.
After describing the NaNet architecture and characterizing its latency and
bandwidth for all supported links, we present two real-world use cases:
the GPU-based low-level trigger for the RICH detector in the NA62
experiment at CERN and the on-/off-shore data link for the KM3 underwater
neutrino telescope.
High-Speed Low-Voltage Line Driver for SerDes Applications
The driving factor behind this research was to design and develop a line driver capable of meeting the demanding specifications of the next generation of SerDes devices. In this thesis various line driver topologies were analysed to identify a topology suited for a high-speed low-voltage operating environment.
This thesis starts off by introducing a relatively new high-speed communication device called SerDes. SerDes is used in wired chip-to-chip communications and operates by converting a parallel data stream into a serial data stream that can then be transmitted at a higher bit rate. Existing SerDes devices operate up to 12.5 Gbps. A matching SerDes device at the destination will then convert the serial data stream back into a parallel data stream to be read by the destination ASIC. SerDes typically uses a line driver with a differential output. Using a differential line driver increases the resilience to outside sources of noise and reduces the amount of EM radiation produced by transmission.
The focus of this research is to design and develop a line driver that can operate at 40 Gbps and can function with a power supply of less than 1 V. This demanding specification was chosen as a realistic representation of the future requirements that a line driver in a SerDes device will have to meet.
A suitable line driver with a differential output was identified to meet the demanding specifications and was modified so that it can perform an equalisation technique called pre-distortion. Two variations of the new topology were outlined and a behavioural model was created for both using Matlab Simulink. The behavioural model for both variants proved the concept; however, only one variant maintained its performance once the designs were implemented at transistor level in Cadence, using a 65 nm CMOS technology provided by Texas Instruments.
The final line driver design was then converted into a layout design, again using Cadence, and RC parasitics were extracted to perform a post-layout simulation. The post-layout simulation shows that the novel line driver can operate at 40 Gbps with a power supply of 1 V to 0.8 V and has a power consumption of 4.54 mW/Gbps. The deterministic jitter added by the line driver is 12.9 ps.
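The parallel-to-serial and serial-to-parallel conversion described in the thesis abstract can be sketched at the bit level as follows. This is only a behavioural illustration, not the thesis design: the function names, the 8-bit word width and the MSB-first bit order are assumptions made for the example.

```python
def serialize(words, width):
    """Flatten parallel words into one serial bit list, most significant bit first."""
    bits = []
    for word in words:
        for position in reversed(range(width)):
            bits.append((word >> position) & 1)
    return bits


def deserialize(bits, width):
    """Regroup a serial bit list into parallel words of the given width."""
    if len(bits) % width != 0:
        raise ValueError("bit stream length is not a multiple of the word width")
    words = []
    for start in range(0, len(bits), width):
        word = 0
        for bit in bits[start:start + width]:
            word = (word << 1) | bit
        words.append(word)
    return words


# Hypothetical 8-bit parallel bus carrying two words.
parallel_in = [0xA5, 0x3C]
serial = serialize(parallel_in, width=8)
parallel_out = deserialize(serial, width=8)
```

Each parallel word becomes `width` consecutive bits on the line, which is why the serial link has to run at `width` times the parallel word rate.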
CMOS VLSI correlator design for radio-astronomical signal processing : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Engineering at Massey University, Auckland, New Zealand
Multi-element radio telescopes employ methods of indirect imaging to capture the image of the sky. These methods are in contrast to direct imaging methods, whereby the image is constructed from sensor measurements directly, and involve extensive signal processing on antenna signals. The Square Kilometre Array, or the SKA, is a future radio telescope of this type that, once built, will become the largest telescope in the world. The unprecedented scale of the SKA requires novel solutions to be developed for its signal processing pipeline, one of the most resource-consuming parts of which is the correlator. The SKA uses the FX correlator construction that consists of two parts: the F part that translates antenna signals into the frequency domain and the X part that cross-correlates these signals with each other. This research focuses on the integrated circuit design and VLSI implementation issues of the X part of a very large FX correlator in 28 nm and 130 nm CMOS. The correlator’s main processing operation is the complex multiply-accumulation (CMAC), for which custom 28 nm CMAC designs are presented and evaluated. Performance of various memories inside the correlator also affects overall efficiency, and input-buffered and output-buffered approaches are considered with the goal of improving upon it. For output-buffered designs, custom memory control circuits have been designed and prototyped in 130 nm that improve upon eDRAM by taking advantage of sequential access patterns. For the input-buffered architecture, a new scheme is proposed that decreases the usage of the input-buffer memory by a third by making use of multiple accumulators in every CMAC. Because cross-correlation is a very data-intensive process, high-performance SerDes I/O is essential to any practical ASIC implementation. For the I/O design, a 28 nm full-rate transmitter delivering 15 Gbps per lane is presented. This design consists of the scrambler, the serialiser, the digital VCO with analog fine-tuning and the SST driver, including features of a 4-tap FFE, impedance tuning and amplitude tuning.
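The complex multiply-accumulate operation at the heart of the X part can be illustrated in software. The sketch below is a plain functional model for one frequency channel, not the ASIC architecture of the thesis: the function name, the data layout and the choice to conjugate the second antenna of each pair are assumptions.

```python
from itertools import combinations_with_replacement


def cross_correlate(samples):
    """Accumulate x_i * conj(x_j) over time for every antenna pair with i <= j.

    samples[t][a] is the complex sample of antenna a at time step t,
    for a single frequency channel produced by the F part.
    """
    antenna_count = len(samples[0])
    pairs = list(combinations_with_replacement(range(antenna_count), 2))
    accumulators = {pair: 0j for pair in pairs}
    for frame in samples:
        for i, j in pairs:
            # One CMAC: complex multiply, then add to the running sum.
            accumulators[(i, j)] += frame[i] * frame[j].conjugate()
    return accumulators


# Hypothetical input: two antennas, two time steps.
samples = [
    [1 + 1j, 1 - 1j],
    [2 + 0j, 0 - 1j],
]
visibilities = cross_correlate(samples)
```

For N antennas this model keeps N(N+1)/2 accumulators per frequency channel, including the autocorrelations, which gives a feel for why the X part dominates resource usage at SKA scale.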
Demystifying the Characteristics of 3D-Stacked Memories: A Case Study for Hybrid Memory Cube
Three-dimensional (3D)-stacking technology, which enables the integration of
DRAM and logic dies, offers high bandwidth and low energy consumption. This
technology also empowers new memory designs for executing tasks not
traditionally associated with memories. A practical 3D-stacked memory is Hybrid
Memory Cube (HMC), which provides significant access bandwidth and low power
consumption in a small area. Although several studies have taken advantage of
the novel architecture of HMC, its characteristics in terms of latency and
bandwidth or their correlation with temperature and power consumption have not
been fully explored. This paper is the first, to the best of our knowledge, to
characterize the thermal behavior of HMC in a real environment using the AC-510
accelerator and to identify temperature as a new limitation for this
state-of-the-art design space. Moreover, besides bandwidth studies, we
deconstruct factors that contribute to latency and reveal their sources for
high- and low-load accesses. The results of this paper demonstrate essential
behaviors and performance bottlenecks for future explorations of
packet-switched and 3D-stacked memories.
Comment: IEEE Catalog Number: CFP17236-USB ISBN 13: 978-1-5386-1232-
Evaluation of Giga-bit Ethernet Instrumentation for SalSA Electronics Readout (GEISER)
An instrumentation prototype for acquiring high-speed transient data from an
array of high bandwidth antennas is presented. Multi-kilometer cable runs
complicate acquisition of such large bandwidth radio signals from an extensive
antenna array. Solutions using analog fiber optic links are being explored,
though they are very expensive. We propose an inexpensive solution that allows for
individual operation of each antenna element, operating at potentially high
local self-trigger rates. Digitized data packets are transmitted to the surface
via commercially available Giga-bit Ethernet hardware. Events are then
reconstructed on a computer farm by sorting the received packets using standard
networking gear, eliminating the need for custom, very high-speed trigger
hardware. Such a system is completely scalable and leverages the huge capital
investment made by the telecommunications industry. Test results from a
demonstration prototype are presented.
Comment: 8 pages, to be submitted to NIM
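Reconstructing events by sorting self-triggered packets could look like the sketch below. The abstract gives no packet format or coincidence criterion, so the timestamp field, the time window and the minimum number of antennas are all hypothetical parameters introduced for illustration.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    antenna_id: int    # which antenna self-triggered (hypothetical field)
    timestamp_ns: int  # trigger time stamped at the antenna (hypothetical field)


def build_events(packets, window_ns, min_antennas=2):
    """Sort packets by time and group those falling within one coincidence window."""
    events = []
    current = []

    def flush():
        # Keep a group only if enough distinct antennas contributed to it.
        if current and len({p.antenna_id for p in current}) >= min_antennas:
            events.append(list(current))
        current.clear()

    for packet in sorted(packets, key=lambda p: p.timestamp_ns):
        if current and packet.timestamp_ns - current[0].timestamp_ns > window_ns:
            flush()
        current.append(packet)
    flush()
    return events


# Packets arrive out of order over the network; values are illustrative only.
received = [
    Packet(3, 1050), Packet(1, 1000), Packet(2, 1020),
    Packet(1, 5000),
    Packet(4, 9000), Packet(2, 9030),
]
events = build_events(received, window_ns=100)
```

In this example the isolated packet at 5000 ns is discarded as a single-antenna trigger, while the other five packets form two events. All of the selection is done in software on commodity hardware, which is the point the abstract makes about avoiding custom trigger electronics.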