
    A 3-D Track-Finding Processor for the CMS Level-1 Muon Trigger

    We report on the design and test results of a prototype processor for the CMS Level-1 trigger that performs 3-D track reconstruction and measurement from data recorded by the cathode strip chambers of the endcap muon system. The tracking algorithms are written in C++ using a class library we developed that facilitates automatic conversion to Verilog. The code is synthesized into firmware for field-programmable gate arrays from the Xilinx Virtex-II series. A second-generation prototype has been developed and is currently under test. It performs regional track-finding in a 60-degree azimuthal sector and accepts 3 GB/s of input data synchronously with the 40 MHz beam crossing frequency. The latency of the track-finding algorithms is expected to be 250 ns, including geometrical alignment correction of incoming track segments and a final momentum assignment based on the muon trajectory in the non-uniform magnetic field in the CMS endcaps. Comment: 7 pages, 5 figures, proceedings for the conference on Computing in High Energy and Nuclear Physics, March 24-28 2003, La Jolla, California.
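The timing figures quoted in this abstract can be related to each other with a little arithmetic. The sketch below is illustrative only: the constant names are made up for this example, and it assumes 1 GB = 10^9 bytes, which the abstract does not specify.

```python
# Figures taken from the abstract; names are illustrative, not from the paper.
BEAM_CROSSING_HZ = 40_000_000      # 40 MHz beam crossing frequency
INPUT_BYTES_PER_S = 3_000_000_000  # 3 GB/s, ASSUMING 1 GB = 10**9 bytes
LATENCY_NS = 250                   # expected track-finding latency

# Time between consecutive beam crossings, in nanoseconds.
crossing_period_ns = 1_000_000_000 / BEAM_CROSSING_HZ

# Latency expressed as a number of beam crossings.
latency_in_crossings = LATENCY_NS / crossing_period_ns

# Average input data arriving per beam crossing, in bytes.
bytes_per_crossing = INPUT_BYTES_PER_S / BEAM_CROSSING_HZ

print(crossing_period_ns, latency_in_crossings, bytes_per_crossing)
```

Under that assumption, one crossing lasts 25 ns, the expected latency spans 10 crossings, and the processor receives about 75 bytes per crossing.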

    Breadth First Search Vectorization on the Intel Xeon Phi

    Breadth First Search (BFS) is a building block for graph algorithms and has recently been used for large-scale analysis of information in a variety of applications, including social networks, graph databases and web searching. Due to its importance, a number of different parallel programming models and architectures have been exploited to optimize BFS. However, due to the irregular memory access patterns and the unstructured nature of large graphs, its efficient parallelization is a challenge. The Xeon Phi is a massively parallel architecture available as an off-the-shelf accelerator, which includes a powerful 512-bit vector unit with optimized scatter and gather functions. Given its potential benefits, work related to graph traversal on this architecture is an active area of research. We present a set of experiments in which we explore architectural features of the Xeon Phi and how best to exploit them in a top-down BFS algorithm; the techniques can also be applied to the current state-of-the-art hybrid (top-down plus bottom-up) algorithms. We focus on the exploitation of the vector unit by developing an improved, highly vectorized OpenMP parallel algorithm using vector intrinsics, and by understanding the use of data alignment and prefetching. In addition, we investigate the impact of hyperthreading and thread affinity on performance, a topic that appears under-researched in the literature. As a result, we achieve what we believe is the fastest published top-down BFS algorithm on the version of Xeon Phi used in our experiments. The vectorized top-down BFS source code presented in this paper is available on request as free-to-use software.
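For readers unfamiliar with the term, a top-down BFS expands the current frontier level by level, with each frontier vertex claiming its unvisited neighbours. The following is a minimal scalar sketch of that idea, not the paper's vectorized OpenMP implementation; the function name `top_down_bfs` and the adjacency-dict graph representation are assumptions made for illustration.

```python
def top_down_bfs(adj, source):
    """Level-synchronous top-down BFS.

    adj maps each vertex to an iterable of its neighbours.
    Returns a dict mapping every reachable vertex to its BFS parent
    (the source is recorded as its own parent).
    """
    parent = {source: source}
    frontier = [source]
    while frontier:
        next_frontier = []
        for u in frontier:
            # Top-down step: each frontier vertex inspects its neighbours.
            for v in adj.get(u, ()):
                if v not in parent:      # first visit claims the vertex
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parent


graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
parents = top_down_bfs(graph, 0)
print(parents)
```

The inner neighbour loop is where the irregular memory accesses mentioned in the abstract arise, since the visited check touches vertices in an order dictated by the graph rather than by memory layout.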

    A highly parameterized and efficient FPGA-based skeleton for pairwise biological sequence alignment


    Vienna FORTRAN: A FORTRAN language extension for distributed memory multiprocessors

    Exploiting the performance potential of distributed memory machines requires a careful distribution of data across the processors. Vienna FORTRAN is a language extension of FORTRAN that provides the user with a wide range of facilities for such mapping of data structures. Programs in Vienna FORTRAN are nevertheless written using global data references, so the user has the advantage of a shared memory programming paradigm while explicitly controlling the placement of data. The basic features of Vienna FORTRAN are presented along with a set of examples illustrating the use of these features.

    Compiling Vector Pascal to the XeonPhi

    Intel's XeonPhi is a highly parallel x86 architecture chip. It has a number of novel features which make it a particularly challenging target for the compiler writer. This paper describes the techniques used to port the Glasgow Vector Pascal Compiler to this architecture and assesses its performance by comparing the XeonPhi with 3 other machines running the same algorithms.