A 3-D Track-Finding Processor for the CMS Level-1 Muon Trigger
We report on the design and test results of a prototype processor for the CMS
Level-1 trigger that performs 3-D track reconstruction and measurement from
data recorded by the cathode strip chambers of the endcap muon system. The
tracking algorithms are written in C++ using a class library we developed that
facilitates automatic conversion to Verilog. The code is synthesized into
firmware for field-programmable gate arrays from the Xilinx Virtex-II series. A
second-generation prototype has been developed and is currently under test. It
performs regional track-finding in a 60-degree azimuthal sector and accepts 3
GB/s of input data synchronously with the 40 MHz beam crossing frequency. The
latency of the track-finding algorithms is expected to be 250 ns, including
geometrical alignment correction of incoming track segments and a final
momentum assignment based on the muon trajectory in the non-uniform magnetic
field in the CMS endcaps.
Comment: 7 pages, 5 figures, proceedings for the conference on Computing in
High Energy and Nuclear Physics, March 24-28, 2003, La Jolla, California
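The timing figures quoted in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below assumes that "3 GB/s" means 3 x 10^9 bytes per second and that the latency is counted in whole beam crossings; the abstract states neither, and the variable names are illustrative.

```python
# Rough cross-check of the figures quoted in the abstract.
# Assumption: 1 GB = 10**9 bytes (the abstract does not say which convention it uses).
BEAM_CROSSING_HZ = 40_000_000      # 40 MHz beam crossing frequency
INPUT_BYTES_PER_S = 3 * 10**9      # 3 GB/s of input data
LATENCY_NS = 250                   # expected track-finding latency

# Time between two consecutive beam crossings, in nanoseconds.
crossing_period_ns = 10**9 // BEAM_CROSSING_HZ

# Number of beam crossings that elapse during the track-finding latency.
latency_in_crossings = LATENCY_NS // crossing_period_ns

# Input data arriving per beam crossing, in bytes.
bytes_per_crossing = INPUT_BYTES_PER_S // BEAM_CROSSING_HZ

print(crossing_period_ns, latency_in_crossings, bytes_per_crossing)
```

Under these assumptions one beam crossing lasts 25 ns, so the expected latency corresponds to 10 crossings and the input rate to 75 bytes per crossing.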
Breadth First Search Vectorization on the Intel Xeon Phi
Breadth First Search (BFS) is a building block for graph algorithms and has
recently been used for large scale analysis of information in a variety of
applications including social networks, graph databases and web searching. Due
to its importance, a number of different parallel programming models and
architectures have been exploited to optimize the BFS. However, due to the
irregular memory access patterns and the unstructured nature of the large
graphs, its efficient parallelization is a challenge. The Xeon Phi is a
massively parallel architecture available as an off-the-shelf accelerator,
which includes a powerful 512 bit vector unit with optimized scatter and gather
functions. Given its potential benefits, work related to graph traversing on
this architecture is an active area of research.
We present a set of experiments in which we explore architectural features of
the Xeon Phi and how best to exploit them in a top-down BFS algorithm; the
techniques can also be applied to the current state-of-the-art hybrid
(top-down plus bottom-up) algorithms.
We focus on the exploitation of the vector unit by developing an improved
highly vectorized OpenMP parallel algorithm, using vector intrinsics, and
understanding the use of data alignment and prefetching. In addition, we
investigate the impact of hyperthreading and thread affinity on performance, a
topic that appears under-researched in the literature. As a result, we achieve
what we believe is the fastest published top-down BFS algorithm on the version
of Xeon Phi used in our experiments. The vectorized top-down BFS source code
presented in this paper is available on request as free-to-use software.
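For readers unfamiliar with the term, a top-down BFS can be sketched as a level-by-level frontier expansion. The code below is a plain scalar illustration, not the vectorized OpenMP algorithm described in the abstract. The CSR (compressed sparse row) layout, the function name and the small example graph are assumptions made for this sketch.

```python
def top_down_bfs(row_offsets, col_indices, source):
    """Level-synchronous top-down BFS over a graph stored in CSR form.

    Returns a list with the BFS level of each vertex (-1 if unreachable).
    """
    num_vertices = len(row_offsets) - 1
    level = [-1] * num_vertices
    level[source] = 0
    frontier = [source]
    depth = 0
    while frontier:
        depth += 1
        next_frontier = []
        for u in frontier:
            # Neighbours of u are stored contiguously in col_indices. This
            # inner loop is the irregular-access part that a vector unit
            # with gather/scatter support would target.
            for idx in range(row_offsets[u], row_offsets[u + 1]):
                v = col_indices[idx]
                if level[v] == -1:      # not visited yet
                    level[v] = depth
                    next_frontier.append(v)
        frontier = next_frontier
    return level


# Hypothetical undirected graph: edges 0-1, 0-2, 1-3, 2-3; vertex 4 is isolated.
ROW_OFFSETS = [0, 2, 4, 6, 8, 8]
COL_INDICES = [1, 2, 0, 3, 0, 3, 1, 2]

print(top_down_bfs(ROW_OFFSETS, COL_INDICES, 0))  # [0, 1, 1, 2, -1]
```

Each iteration of the outer loop processes one BFS level. The accesses to `level[v]` depend on the graph data, which is the irregular memory access pattern the abstract identifies as the main obstacle to efficient parallelization.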
Vienna FORTRAN: A FORTRAN language extension for distributed memory multiprocessors
Exploiting the performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna FORTRAN is a language extension of FORTRAN which provides the user with a wide range of facilities for such mapping of data structures. However, programs in Vienna FORTRAN are written using global data references. Thus, the user has the advantage of a shared memory programming paradigm while explicitly controlling the placement of data. The basic features of Vienna FORTRAN are presented along with a set of examples illustrating the use of these features.
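The idea of global data references over explicitly placed data can be illustrated with a small sketch. It is written in Python rather than Vienna FORTRAN and does not reproduce the language's syntax. The block distribution, the function names and the example sizes are assumptions chosen for illustration; the abstract does not list the distributions the language offers.

```python
def block_size(array_size, num_procs):
    """Elements per processor under a simple block distribution (rounded up)."""
    return -(-array_size // num_procs)


def distribute(values, num_procs):
    """Split a global array into per-processor local blocks."""
    size = block_size(len(values), num_procs)
    return [values[p * size:(p + 1) * size] for p in range(num_procs)]


def owner_and_offset(global_index, array_size, num_procs):
    """Map a global index to (owning processor, index in its local block)."""
    size = block_size(array_size, num_procs)
    return global_index // size, global_index % size


def global_read(local_blocks, global_index, array_size):
    """Read an element by its global index, hiding where it is stored."""
    proc, offset = owner_and_offset(global_index, array_size, len(local_blocks))
    return local_blocks[proc][offset]


# Hypothetical example: 10 elements spread over 4 processors.
data = [i * i for i in range(10)]
blocks = distribute(data, 4)
print(blocks)                        # [[0, 1, 4], [9, 16, 25], [36, 49, 64], [81]]
print(owner_and_offset(7, 10, 4))    # (2, 1)
print(global_read(blocks, 7, 10))    # 49
```

The caller of `global_read` uses only global indices, while `distribute` fixes where each element lives. That mirrors the split the abstract describes between a shared memory programming view and explicit control over data placement.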
Compiling Vector Pascal to the XeonPhi
Intel's XeonPhi is a highly parallel x86 architecture chip. It has a number of novel features which make it a particularly challenging target for the compiler writer. This paper describes the techniques used to port the Glasgow Vector Pascal Compiler to this architecture and assesses its performance by comparing the XeonPhi with three other machines running the same algorithms.