675 research outputs found
Massively Parallel Computing and the Search for Jets and Black Holes at the LHC
Massively parallel computing at the LHC could be the next leap necessary to
reach an era of new discoveries at the LHC after the Higgs discovery.
Scientific computing is a critical component of the LHC experiment, including
operation, trigger, LHC computing GRID, simulation, and analysis. One way to
improve the physics reach of the LHC is to take advantage of the flexibility of
the trigger system by integrating coprocessors based on Graphics Processing
Units (GPUs) or the Many Integrated Core (MIC) architecture into its server
farm. This cutting edge technology provides not only the means to accelerate
existing algorithms, but also the opportunity to develop new algorithms that
select events in the trigger that previously would have evaded detection. In
this article we describe new algorithms that would allow to select in the
trigger new topological signatures that include non-prompt jet and black
hole--like objects in the silicon tracker.Comment: 15 pages, 11 figures, submitted to NIM
Highly parallel computation
Highly parallel computing architectures are the only means to achieve the computation rates demanded by advanced scientific problems. A decade of research has demonstrated the feasibility of such machines and current research focuses on which architectures designated as multiple instruction multiple datastream (MIMD) and single instruction multiple datastream (SIMD) have produced the best results to date; neither shows a decisive advantage for most near-homogeneous scientific problems. For scientific problems with many dissimilar parts, more speculative architectures such as neural networks or data flow may be needed
First Evaluation of the CPU, GPGPU and MIC Architectures for Real Time Particle Tracking based on Hough Transform at the LHC
Recent innovations focused around {\em parallel} processing, either through
systems containing multiple processors or processors containing multiple cores,
hold great promise for enhancing the performance of the trigger at the LHC and
extending its physics program. The flexibility of the CMS/ATLAS trigger system
allows for easy integration of computational accelerators, such as NVIDIA's
Tesla Graphics Processing Unit (GPU) or Intel's \xphi, in the High Level
Trigger. These accelerators have the potential to provide faster or more energy
efficient event selection, thus opening up possibilities for new complex
triggers that were not previously feasible. At the same time, it is crucial to
explore the performance limits achievable on the latest generation multicore
CPUs with the use of the best software optimization methods. In this article, a
new tracking algorithm based on the Hough transform will be evaluated for the
first time on a multi-core Intel Xeon E5-2697v2 CPU, an NVIDIA Tesla K20c GPU,
and an Intel \xphi\ 7120 coprocessor. Preliminary time performance will be
presented.Comment: 13 pages, 4 figures, Accepted to JINS
How to cluster in parallel with neural networks
Partitioning a set of N patterns in a d-dimensional metric space into K clusters - in a way that those in a given cluster are more similar to each other than the rest - is a problem of interest in astrophysics, image analysis and other fields. As there are approximately K(N)/K (factorial) possible ways of partitioning the patterns among K clusters, finding the best solution is beyond exhaustive search when N is large. Researchers show that this problem can be formulated as an optimization problem for which very good, but not necessarily optimal solutions can be found by using a neural network. To do this the network must start from many randomly selected initial states. The network is simulated on the MPP (a 128 x 128 SIMD array machine), where researchers use the massive parallelism not only in solving the differential equations that govern the evolution of the network, but also by starting the network from many initial states at once, thus obtaining many solutions in one run. Researchers obtain speedups of two to three orders of magnitude over serial implementations and the promise through Analog VLSI implementations of speedups comensurate with human perceptual abilities
Fast Low Fidelity Microsimulation of Vehicle Traffic on Supercomputers
A set of very simple rules for driving behavior used to simulate roadway
traffic gives realistic results. Because of its simplicity, it is easy to
implement the model on supercomputers (vectorizing and parallel), where we have
achieved real time limits of more than 4~million~kilometers (or more than
53~million vehicle sec/sec). The model can be used for applications where both
high simulation speed and individual vehicle resolution are needed. We use the
model for extended statistical analysis to gain insight into traffic phenomena
near capacity, and we discuss that this model is a good candidate for network
routing applications. (Submitted to Transportation Research Board Meeting, Jan.
1994, Washington D.C.)Comment: 11 pages, latex, figs. available upon request, Cologne-WP 93.14
Three Highly Parallel Computer Architectures and Their Suitability for Three Representative Artificial Intelligence Problems
Virtually all current Artificial Intelligence (AI) applications are designed to run on sequential (von Neumann) computer architectures. As a result, current systems do not scale up. As knowledge is added to these systems, a point is reached where their performance quickly degrades. The performance of a von Neumann machine is limited by the bandwidth between memory and processor (the von Neumann bottleneck). The bottleneck is avoided by distributing the processing power across the memory of the computer. In this scheme the memory becomes the processor (a smart memory ).
This paper highlights the relationship between three representative AI application domains, namely knowledge representation, rule-based expert systems, and vision, and their parallel hardware realizations. Three machines, covering a wide range of fundamental properties of parallel processors, namely module granularity, concurrency control, and communication geometry, are reviewed: the Connection Machine (a fine-grained SIMD hypercube), DADO (a medium-grained MIMD/SIMD/MSIMD tree-machine), and the Butterfly (a coarse-grained MIMD Butterflyswitch machine)
- …