NaNet: a Low-Latency, Real-Time, Multi-Standard Network Interface Card with GPUDirect Features
While the GPGPU paradigm is widely recognized as an effective approach to
high performance computing, its adoption in low-latency, real-time systems is
still in its early stages.
Although GPUs typically show deterministic behaviour in terms of latency in
executing computational kernels as soon as data is available in their internal
memories, assessment of real-time features of a standard GPGPU system needs
careful characterization of all subsystems along the data stream path.
The networking subsystem turns out to be the most critical one in terms of
the absolute value and fluctuations of its response latency.
Our envisioned solution to this issue is NaNet, an FPGA-based PCIe Network
Interface Card (NIC) design featuring a configurable and extensible set of
network channels with direct access through GPUDirect to NVIDIA Fermi/Kepler
GPU memories.
The NaNet design currently supports both standard - GbE (1000BASE-T) and 10GbE
(10GBASE-R) - and custom - 34 Gbps APElink and 2.5 Gbps deterministic latency
KM3link - channels, but its modularity allows for a straightforward inclusion
of other link technologies.
To avoid host OS intervention on the data stream and remove a possible source of
jitter, the design includes a network/transport layer offload module with
cycle-accurate, upper-bound latency, supporting UDP, KM3link Time Division
Multiplexing and APElink protocols.
After a description of the NaNet architecture and its latency/bandwidth
characterization for all supported links, two real world use cases will be
presented: the GPU-based low level trigger for the RICH detector in the NA62
experiment at CERN and the on-/off-shore data link for the KM3 underwater neutrino
telescope.
High-speed data transfer with FPGAs and QSFP+ modules
We present test results and characterization of a data transmission system
based on a latest-generation FPGA and a commercial QSFP+ (Quad Small Form-factor
Pluggable Plus) module. The QSFP+ standard defines a hot-pluggable transceiver
available in copper or optical cable assemblies for an aggregated bandwidth of
up to 40 Gbps. We implemented a complete testbench based on a commercial
development card mounting an Altera Stratix IV FPGA with 24 serial transceivers
at 8.5 Gbps, together with a custom mezzanine hosting three QSFP+ modules. We
present test results and signal integrity measurements up to an aggregated
bandwidth of 12 Gbps.
Comment: 5 pages, 3 figures, published in JINST (Journal of Instrumentation),
proceedings of the Topical Workshop on Electronics for Particle Physics 2010,
20-24 September 2010, Aachen, Germany (R Ammendola et al 2010 JINST 5 C12019)
High resolution infrared and Raman spectra of 13C12CD2: The CD stretching fundamentals and associated combination and hot bands
11 pages; 4 figures; 9 tables. © 2015 AIP Publishing LLC. Infrared and Raman spectra of mono 13C fully deuterated acetylene, 13C12CD2, have been recorded and analysed to obtain detailed information on the C–D stretching fundamentals and associated combination, overtone, and hot bands. Infrared spectra were recorded at an instrumental resolution ranging between 0.006 and 0.01 cm⁻¹ in the region 1800–7800 cm⁻¹. Sixty new bands involving the ν1 and ν3 C–D stretching modes, also associated with the ν4 and ν5 bending vibrations, have been observed and analysed. In total, 5881 transitions have been assigned in the investigated spectral region. In addition, the Q branch of the ν1 fundamental was recorded using inverse Raman spectroscopy, with an instrumental resolution of about 0.003 cm⁻¹. The transitions relative to each stretching mode, i.e., the fundamental band, its first overtone, and associated hot and combination bands involving bending states with υ4 + υ5 up to 2, were fitted simultaneously. The usual Hamiltonian appropriate to a linear molecule, including vibration and rotation l-type and the Darling–Dennison interaction between υ4 = 2 and υ5 = 2 levels associated with the stretching states, was adopted for the analysis. The standard deviation for each global fit is ≤0.0004 cm⁻¹, of the same order of magnitude as the measurement precision. Slightly improved parameters for the bending and the ν2 manifold have also been determined. Precise values of spectroscopic parameters deperturbed from the resonance interactions have been obtained. They provide quantitative information on the anharmonic character of the potential energy surface, which can be useful, in addition to those reported in the literature, for the determination of a general anharmonic force field for the molecule. Finally, the obtained values of the Darling–Dennison constants can be valuable for understanding energy flows between independent vibrations.
The Bologna authors acknowledge the Università di Bologna and the financial support of the Ministero dell'Istruzione dell'Università e della Ricerca (PRIN 2012 "Spettroscopia e Tecniche computazionali per la ricerca Astrofisica, atmosferica e Radioastronomica"). D.B. and R.Z.M. acknowledge the financial support of the Ministry of Economy and Competitiveness through Research Grant No. FIS2012-38175. Peer Reviewed
apeNEXT: A multi-TFlops Computer for Simulations in Lattice Gauge Theory
We present the APE (Array Processor Experiment) project for the development
of dedicated parallel computers for numerical simulations in lattice gauge
theories. While APEmille is a production machine in today's physics simulations
at various sites in Europe, a new machine, apeNEXT, is currently being
developed to provide multi-Tflops computing performance. Like previous APE
machines, the new supercomputer is largely custom designed and specifically
optimized for simulations of Lattice QCD.
Comment: Poster at the XXIII Physics in Collisions Conference (PIC03),
Zeuthen, Germany, June 2003, 3 pages, Latex. PSN FRAP15. Replaced for adding
forgotten author
Toll-like receptor kinetics in septic shock patients: a preliminary study.
The aim of this study is to evaluate changes in some inflammatory parameters in septic shock patients and their possible correlation with clinical outcome, in particular when continuous veno-venous hemofiltration (CVVH) treatment is required. Given the objective difficulty of enrolling this kind of patient, a preliminary study was initiated on seventeen septic shock patients admitted to a medical and surgical ICU. The mRNA expression of Toll-like receptor (TLR)-1, TLR-2, TLR-4, TLR-5, TLR-9, TNFalpha, IL-8 and IL-1beta was assessed, and the plasma concentrations of IL-18, IL-2, IL-10 and TNFalpha were measured, on the day of sepsis diagnosis and after 72 h. In those patients who developed acute renal failure unresponsive to medical treatment and who underwent CVVH treatment, the same parameters were measured every 24 h during CVVH and after completion of the treatment. On sepsis diagnosis, gene expression of TLRs was up-regulated compared to the housekeeping gene in all the patients. After 72 h, a down-regulation of these genes compared to day 1 was found in 35% of the patients, but it was not associated with a reduction of cytokine serum levels or with improved clinical signs, better outcome or reduced mortality. After high volume hemofiltration treatment, cytokine serum levels and TLR expression were not significantly modified. In conclusion, given the small number of cases in this preliminary study, we cannot conclusively correlate TLR over-expression in septic shock patients with severity or outcome scores.
GPU-based Real-time Triggering in the NA62 Experiment
Over the last few years the GPGPU (General-Purpose computing on Graphics
Processing Units) paradigm represented a remarkable development in the world of
computing. Computing for High-Energy Physics is no exception: several works
have demonstrated the effectiveness of the integration of GPU-based systems in
the high level triggers of different experiments. On the other hand, the use of GPUs
in the low level trigger systems, characterized by stringent real-time
constraints, such as tight time budget and high throughput, poses several
challenges. In this paper we focus on the low level trigger in the CERN NA62
experiment, investigating the use of real-time computing on GPUs in this
synchronous system. Our approach aimed at harvesting the GPU computing power to
build in real-time refined physics-related trigger primitives for the RICH
detector, as the knowledge of Cherenkov ring parameters allows stringent
conditions to be built for data selection at trigger level. Latencies of all
components of the trigger chain have been analyzed, pointing out that
networking is the most critical one. To keep the latency of data transfer task
under control, we devised NaNet, an FPGA-based PCIe Network Interface Card
(NIC) with GPUDirect capabilities. For the processing task, we developed
specific multiple ring trigger algorithms to leverage the parallel architecture
of GPUs and increase the processing throughput to keep up with the high event
rate. Results obtained during the first months of the 2016 NA62 run are presented
and discussed.
APEnet+: high bandwidth 3D torus direct network for petaflops scale commodity clusters
We describe herein the APElink+ board, a PCIe interconnect adapter featuring
the latest advances in wire speed and interface technology plus hardware
support for an RDMA programming model and experimental acceleration of GPU
networking; this design allows us to build a low latency, high bandwidth PC
cluster, the APEnet+ network, the new generation of our cost-effective,
tens-of-thousands-scalable cluster network architecture. Some test results and
characterization of data transmission of a complete testbench, based on a
commercial development card mounting an Altera FPGA, are provided.
Comment: 6 pages, 7 figures, proceedings of CHEP 2010, Taiwan, October 18-2
A simple method for generating full length cDNA from low abundance partial genomic clones
BACKGROUND: PCR amplification of target molecules involves sequence-specific primers that flank the region to be amplified. While this technique is generally routine, it may not be sufficient to generate a desired target molecule from two separate regions involving intron/exon boundaries. In these situations, generating full-length complementary DNAs from two partial genomic clones becomes necessary for families of low abundance genes. RESULTS: The first approach we used for the isolation of full-length cDNA from two known genomic clones of Hox genes was based on fusion PCR. Here we describe a simple and efficient method for amplifying homeobox D13 (HOXD13) full-length cDNA from two partial genomic clones. Specific 5' and 3' untranslated region (UTR) primer pairs and a website program (primer3_www.cgv0.2) were key to this process. CONCLUSIONS: We have devised a simple, rapid and easy method for generating a cDNA clone from genomic sequences. The full-length HOXD13 clone (1.1 kb) generated with this technique was confirmed by sequence analysis. This simple approach can be utilized to generate full-length cDNA clones from available partial genomic sequences.
Progress and status of APEmille
We report on the progress and status of the APEmille project: a SIMD parallel
computer with a peak performance in the TeraFlops range which is now in an
advanced development phase. We discuss the hardware and software architecture,
and present some performance estimates for Lattice Gauge Theory (LGT)
applications.
Comment: Talk presented at LATTICE97, 3 pages, Latex