136,092 research outputs found
Probing transverse momentum dependent gluon distribution functions from hadronic quarkonium pair production
The inclusive hadronic production of ( or ) pair is
proposed to extract the transverse momentum dependent(TMD) gluon distribution
functions. We use nonrelativistic QCD(NRQCD) for the production of .
Under nonrelativistic limit TMD factorization for this process is assumed to
make a lowest order calculation. For unpolarized initial hadrons, unpolarized
and linearly polarized gluon distributions can be extracted by studying
different angular distributions.Comment: 7 pages, 3 figures, 1 tabl
AutoAccel: Automated Accelerator Generation and Optimization with Composable, Parallel and Pipeline Architecture
CPU-FPGA heterogeneous architectures are attracting ever-increasing attention
in an attempt to advance computational capabilities and energy efficiency in
today's datacenters. These architectures provide programmers with the ability
to reprogram the FPGAs for flexible acceleration of many workloads.
Nonetheless, this advantage is often overshadowed by the poor programmability
of FPGAs whose programming is conventionally a RTL design practice. Although
recent advances in high-level synthesis (HLS) significantly improve the FPGA
programmability, it still leaves programmers facing the challenge of
identifying the optimal design configuration in a tremendous design space.
This paper aims to address this challenge and pave the path from software
programs towards high-quality FPGA accelerators. Specifically, we first propose
the composable, parallel and pipeline (CPP) microarchitecture as a template of
accelerator designs. Such a well-defined template is able to support efficient
accelerator designs for a broad class of computation kernels, and more
importantly, drastically reduce the design space. Also, we introduce an
analytical model to capture the performance and resource trade-offs among
different design configurations of the CPP microarchitecture, which lays the
foundation for fast design space exploration. On top of the CPP
microarchitecture and its analytical model, we develop the AutoAccel framework
to make the entire accelerator generation automated. AutoAccel accepts a
software program as an input and performs a series of code transformations
based on the result of the analytical-model-based design space exploration to
construct the desired CPP microarchitecture. Our experiments show that the
AutoAccel-generated accelerators outperform their corresponding software
implementations by an average of 72x for a broad class of computation kernels
Hardness Results for Structured Linear Systems
We show that if the nearly-linear time solvers for Laplacian matrices and
their generalizations can be extended to solve just slightly larger families of
linear systems, then they can be used to quickly solve all systems of linear
equations over the reals. This result can be viewed either positively or
negatively: either we will develop nearly-linear time algorithms for solving
all systems of linear equations over the reals, or progress on the families we
can solve in nearly-linear time will soon halt
- …