2,952 research outputs found
Stability of the Gains of the STAR Endcap Calorimeter from 2009 to 2012
The Solenoid Tracker at RHIC (STAR) experiment, based at Brookhaven National Laboratory\u27s Relativistic Heavy Ion Collider, uses polarized-proton collisions to investigate sea quark and gluon contributions to the known proton spin. The STAR detector\u27s Endcap Electromagnetic Calorimeter (EEMC) is of particular interest in this experiment because it covers a kinematic region, which is sensitive to gluons carrying a low fraction of the proton momentum, where the gluon spin is almost entirely unconstrained. The EEMC is located in the intermediate pseudorapidity range, 1 \u3c η \u3c 2, and measures the electromagnetic energy of particles produced by the collisions using a lead-scintillator sampling calorimeter. The calorimeter consists of several layers that include pre-shower, shower maximum, tower, and post-shower detectors. In these detectors, the energy gains, which convert a measured signal into an energy deposition, have been determined using data taken from the years 2009, 2011, and 2012. These gains will be analyzed and studied in order to understand the calibration of the detector
goSLP: Globally Optimized Superword Level Parallelism Framework
Modern microprocessors are equipped with single instruction multiple data
(SIMD) or vector instruction sets which allow compilers to exploit superword
level parallelism (SLP), a type of fine-grained parallelism. Current SLP
auto-vectorization techniques use heuristics to discover vectorization
opportunities in high-level language code. These heuristics are fragile, local
and typically only present one vectorization strategy that is either accepted
or rejected by a cost model. We present goSLP, a novel SLP auto-vectorization
framework which solves the statement packing problem in a pairwise optimal
manner. Using an integer linear programming (ILP) solver, goSLP searches the
entire space of statement packing opportunities for a whole function at a time,
while limiting total compilation time to a few minutes. Furthermore, goSLP
optimally solves the vector permutation selection problem using dynamic
programming. We implemented goSLP in the LLVM compiler infrastructure,
achieving a geometric mean speedup of 7.58% on SPEC2017fp, 2.42% on SPEC2006fp
and 4.07% on NAS benchmarks compared to LLVM's existing SLP auto-vectorizer.Comment: Published at OOPSLA 201
Spatial variation of water supply and demand in Sri Lanka
Water supplyWater demandRiver basinsRunoffIrrigation efficiency
A parallel Viterbi decoder for block cyclic and convolution codes
We present a parallel version of Viterbi's decoding procedure, for which we are able to demonstrate that the resultant task graph has restricted complexity in that the number of communications to or from any processor cannot exceed 4 for BCH codes. The resulting algorithm works in lock step making it suitable for implementation on a systolic processor array, which we have implemented on a field programmable gate array and demonstrate the perfect scaling of the algorithm for two exemplar BCH codes. The parallelisation strategy is applicable to all cyclic codes and convolution codes. We also present a novel method for generating the state transition diagrams for these codes
Format Abstraction for Sparse Tensor Algebra Compilers
This paper shows how to build a sparse tensor algebra compiler that is
agnostic to tensor formats (data layouts). We develop an interface that
describes formats in terms of their capabilities and properties, and show how
to build a modular code generator where new formats can be added as plugins. We
then describe six implementations of the interface that compose to form the
dense, CSR/CSF, COO, DIA, ELL, and HASH tensor formats and countless variants
thereof. With these implementations at hand, our code generator can generate
code to compute any tensor algebra expression on any combination of the
aforementioned formats.
To demonstrate our technique, we have implemented it in the taco tensor
algebra compiler. Our modular code generator design makes it simple to add
support for new tensor formats, and the performance of the generated code is
competitive with hand-optimized implementations. Furthermore, by extending taco
to support a wider range of formats specialized for different application and
data characteristics, we can improve end-user application performance. For
example, if input data is provided in the COO format, our technique allows
computing a single matrix-vector multiplication directly with the data in COO,
which is up to 3.6 faster than by first converting the data to CSR.Comment: Presented at OOPSLA 201
Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks
Predicting the number of clock cycles a processor takes to execute a block of
assembly instructions in steady state (the throughput) is important for both
compiler designers and performance engineers. Building an analytical model to
do so is especially complicated in modern x86-64 Complex Instruction Set
Computer (CISC) machines with sophisticated processor microarchitectures in
that it is tedious, error prone, and must be performed from scratch for each
processor generation. In this paper we present Ithemal, the first tool which
learns to predict the throughput of a set of instructions. Ithemal uses a
hierarchical LSTM--based approach to predict throughput based on the opcodes
and operands of instructions in a basic block. We show that Ithemal is more
accurate than state-of-the-art hand-written tools currently used in compiler
backends and static machine code analyzers. In particular, our model has less
than half the error of state-of-the-art analytical models (LLVM's llvm-mca and
Intel's IACA). Ithemal is also able to predict these throughput values just as
fast as the aforementioned tools, and is easily ported across a variety of
processor microarchitectures with minimal developer effort.Comment: Published at 36th International Conference on Machine Learning (ICML)
201
- …