675 research outputs found

    Massively Parallel Computing and the Search for Jets and Black Holes at the LHC

    Full text link
    Massively parallel computing at the LHC could be the next leap necessary to reach an era of new discoveries at the LHC after the Higgs discovery. Scientific computing is a critical component of the LHC experiment, including operation, trigger, LHC computing GRID, simulation, and analysis. One way to improve the physics reach of the LHC is to take advantage of the flexibility of the trigger system by integrating coprocessors based on Graphics Processing Units (GPUs) or the Many Integrated Core (MIC) architecture into its server farm. This cutting edge technology provides not only the means to accelerate existing algorithms, but also the opportunity to develop new algorithms that select events in the trigger that previously would have evaded detection. In this article we describe new algorithms that would allow to select in the trigger new topological signatures that include non-prompt jet and black hole--like objects in the silicon tracker.Comment: 15 pages, 11 figures, submitted to NIM

    Highly parallel computation

    Get PDF
    Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines and current research focuses on which architectures designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed

    First Evaluation of the CPU, GPGPU and MIC Architectures for Real Time Particle Tracking based on Hough Transform at the LHC

    Full text link
    Recent innovations focused around {\em parallel} processing, either through systems containing multiple processors or processors containing multiple cores, hold great promise for enhancing the performance of the trigger at the LHC and extending its physics program. The flexibility of the CMS/ATLAS trigger system allows for easy integration of computational accelerators, such as NVIDIA's Tesla Graphics Processing Unit (GPU) or Intel's \xphi, in the High Level Trigger. These accelerators have the potential to provide faster or more energy efficient event selection, thus opening up possibilities for new complex triggers that were not previously feasible. At the same time, it is crucial to explore the performance limits achievable on the latest generation multicore CPUs with the use of the best software optimization methods. In this article, a new tracking algorithm based on the Hough transform will be evaluated for the first time on a multi-core Intel Xeon E5-2697v2 CPU, an NVIDIA Tesla K20c GPU, and an Intel \xphi\ 7120 coprocessor. Preliminary time performance will be presented.Comment: 13 pages, 4 figures, Accepted to JINS

    How to cluster in parallel with neural networks

    Get PDF
    Partitioning a set of N patterns in a d-dimensional metric space into K clusters - in a way that those in a given cluster are more similar to each other than the rest - is a problem of interest in astrophysics, image analysis and other fields. As there are approximately K(N)/K (factorial) possible ways of partitioning the patterns among K clusters, finding the best solution is beyond exhaustive search when N is large. Researchers show that this problem can be formulated as an optimization problem for which very good, but not necessarily optimal solutions can be found by using a neural network. To do this the network must start from many randomly selected initial states. The network is simulated on the MPP (a 128 x 128 SIMD array machine), where researchers use the massive parallelism not only in solving the differential equations that govern the evolution of the network, but also by starting the network from many initial states at once, thus obtaining many solutions in one run. Researchers obtain speedups of two to three orders of magnitude over serial implementations and the promise through Analog VLSI implementations of speedups comensurate with human perceptual abilities

    Fast Low Fidelity Microsimulation of Vehicle Traffic on Supercomputers

    Full text link
    A set of very simple rules for driving behavior used to simulate roadway traffic gives realistic results. Because of its simplicity, it is easy to implement the model on supercomputers (vectorizing and parallel), where we have achieved real time limits of more than 4~million~kilometers (or more than 53~million vehicle sec/sec). The model can be used for applications where both high simulation speed and individual vehicle resolution are needed. We use the model for extended statistical analysis to gain insight into traffic phenomena near capacity, and we discuss that this model is a good candidate for network routing applications. (Submitted to Transportation Research Board Meeting, Jan. 1994, Washington D.C.)Comment: 11 pages, latex, figs. available upon request, Cologne-WP 93.14

    Three Highly Parallel Computer Architectures and Their Suitability for Three Representative Artificial Intelligence Problems

    Get PDF
    Virtually all current Artificial Intelligence (AI) applications are designed to run on sequential (von Neumann) computer architectures. As a result, current systems do not scale up. As knowledge is added to these systems, a point is reached where their performance quickly degrades. The performance of a von Neumann machine is limited by the bandwidth between memory and processor (the von Neumann bottleneck). The bottleneck is avoided by distributing the processing power across the memory of the computer. In this scheme the memory becomes the processor (a smart memory ). This paper highlights the relationship between three representative AI application domains, namely knowledge representation, rule-based expert systems, and vision, and their parallel hardware realizations. Three machines, covering a wide range of fundamental properties of parallel processors, namely module granularity, concurrency control, and communication geometry, are reviewed: the Connection Machine (a fine-grained SIMD hypercube), DADO (a medium-grained MIMD/SIMD/MSIMD tree-machine), and the Butterfly (a coarse-grained MIMD Butterflyswitch machine)
    corecore