23,410 research outputs found
Lattice QCD Production on Commodity Clusters at Fermilab
We describe the construction and results to date of Fermilab's three
Myrinet-networked lattice QCD production clusters (an 80-node dual Pentium III
cluster, a 48-node dual Xeon cluster, and a 128-node dual Xeon cluster). We
examine a number of aspects of performance of the MILC lattice QCD code running
on these clusters.Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics
(CHEP03), La Jolla, Ca, USA, March 2003, 6 pages, LaTeX, 8 eps figures. PSN
TUIT00
A review of advances in pixel detectors for experiments with high rate and radiation
The Large Hadron Collider (LHC) experiments ATLAS and CMS have established
hybrid pixel detectors as the instrument of choice for particle tracking and
vertexing in high rate and radiation environments, as they operate close to the
LHC interaction points. With the High Luminosity-LHC upgrade now in sight, for
which the tracking detectors will be completely replaced, new generations of
pixel detectors are being devised. They have to address enormous challenges in
terms of data throughput and radiation levels, ionizing and non-ionizing, that
harm the sensing and readout parts of pixel detectors alike. Advances in
microelectronics and microprocessing technologies now enable large scale
detector designs with unprecedented performance in measurement precision (space
and time), radiation hard sensors and readout chips, hybridization techniques,
lightweight supports, and fully monolithic approaches to meet these challenges.
This paper reviews the world-wide effort on these developments.Comment: 84 pages with 46 figures. Review article.For submission to Rep. Prog.
Phy
Scalable Layer-2/Layer-3 Multistage Switching Architectures for Software Routers
Software routers are becoming an important alternative to proprietary and expensive network devices, because they exploit the economy of scale of the PC market and open-source software. When considering maximum performance in terms of throughput, PC-based routers suffer from limitations stemming from the single PC architecture, e.g., limited bus bandwidth, and high memory access latency. To overcome these limitations, in this paper we present a multistage architecture that combines a layer-2 load-balancer front-end and a layer-3 routing back-end, interconnected by standard Ethernet switches. Both the front-end and the back-end are implemented using standard PCs and open- source software. After describing the architecture, evaluation is performed on a lab test-bed, to show its scalability. While the proposed solution allows to increase performance of PC- based routers, it also allows to distribute packet manipulation functionalities, and to automatically recover from component failures
GPU peer-to-peer techniques applied to a cluster interconnect
Modern GPUs support special protocols to exchange data directly across the
PCI Express bus. While these protocols could be used to reduce GPU data
transmission times, basically by avoiding staging to host memory, they require
specific hardware features which are not available on current generation
network adapters. In this paper we describe the architectural modifications
required to implement peer-to-peer access to NVIDIA Fermi- and Kepler-class
GPUs on an FPGA-based cluster interconnect. Besides, the current software
implementation, which integrates this feature by minimally extending the RDMA
programming model, is discussed, as well as some issues raised while employing
it in a higher level API like MPI. Finally, the current limits of the technique
are studied by analyzing the performance improvements on low-level benchmarks
and on two GPU-accelerated applications, showing when and how they seem to
benefit from the GPU peer-to-peer method.Comment: paper accepted to CASS 201
Demystifying the Characteristics of 3D-Stacked Memories: A Case Study for Hybrid Memory Cube
Three-dimensional (3D)-stacking technology, which enables the integration of
DRAM and logic dies, offers high bandwidth and low energy consumption. This
technology also empowers new memory designs for executing tasks not
traditionally associated with memories. A practical 3D-stacked memory is Hybrid
Memory Cube (HMC), which provides significant access bandwidth and low power
consumption in a small area. Although several studies have taken advantage of
the novel architecture of HMC, its characteristics in terms of latency and
bandwidth or their correlation with temperature and power consumption have not
been fully explored. This paper is the first, to the best of our knowledge, to
characterize the thermal behavior of HMC in a real environment using the AC-510
accelerator and to identify temperature as a new limitation for this
state-of-the-art design space. Moreover, besides bandwidth studies, we
deconstruct factors that contribute to latency and reveal their sources for
high- and low-load accesses. The results of this paper demonstrates essential
behaviors and performance bottlenecks for future explorations of
packet-switched and 3D-stacked memories.Comment: EEE Catalog Number: CFP17236-USB ISBN 13: 978-1-5386-1232-
Design of multimedia processor based on metric computation
Media-processing applications, such as signal processing, 2D and 3D graphics
rendering, and image compression, are the dominant workloads in many embedded
systems today. The real-time constraints of those media applications have
taxing demands on today's processor performances with low cost, low power and
reduced design delay. To satisfy those challenges, a fast and efficient
strategy consists in upgrading a low cost general purpose processor core. This
approach is based on the personalization of a general RISC processor core
according the target multimedia application requirements. Thus, if the extra
cost is justified, the general purpose processor GPP core can be enforced with
instruction level coprocessors, coarse grain dedicated hardware, ad hoc
memories or new GPP cores. In this way the final design solution is tailored to
the application requirements. The proposed approach is based on three main
steps: the first one is the analysis of the targeted application using
efficient metrics. The second step is the selection of the appropriate
architecture template according to the first step results and recommendations.
The third step is the architecture generation. This approach is experimented
using various image and video algorithms showing its feasibility
- âŠ