A Parallel Iterative Method for Computing Molecular Absorption Spectra
We describe a fast parallel iterative method for computing molecular
absorption spectra within TDDFT linear response and using the LCAO method. We
use a local basis of "dominant products" to parametrize the space of orbital
products that occur in the LCAO approach. In this basis, the dynamical
polarizability is computed iteratively within an appropriate Krylov subspace.
The iterative procedure uses a matrix-free GMRES method to determine the
(interacting) density response. The resulting code is about one order of
magnitude faster than our previous full-matrix method. This acceleration makes
the speed of our TDDFT code comparable with codes based on Casida's equation.
The implementation of our method uses hybrid MPI and OpenMP parallelization in
which load balancing and memory access are optimized. To validate our approach
and to establish benchmarks, we compute spectra of large molecules on various
types of parallel machines.
The methods developed here are fairly general and we believe they will find
useful applications in molecular physics/chemistry, even for problems that are
beyond TDDFT, such as organic semiconductors, particularly in photovoltaics.
Comment: 20 pages, 17 figures, 3 tables
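The "matrix-free GMRES" idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function name `gmres`, the toy operator `apply_shifted_laplacian`, and the tolerance are assumptions made for illustration. The point is only that the solver touches the operator through a matrix-vector product callback and never stores the matrix.

```python
import math


def gmres(apply_A, b, tol=1e-10, max_iter=50):
    """Unrestarted GMRES from x0 = 0; A is only touched through apply_A(v).

    Assumes A is nonsingular (illustrative sketch, no breakdown handling).
    """
    n = len(b)
    beta = math.sqrt(sum(x * x for x in b))
    if beta == 0.0:
        return [0.0] * n
    V = [[x / beta for x in b]]   # orthonormal Krylov basis vectors
    R = []                        # columns of the rotated, upper-triangular Hessenberg matrix
    cs, sn = [], []               # Givens rotation coefficients
    g = [beta]                    # rotated right-hand side; |g[-1]| is the residual norm
    for j in range(min(max_iter, n)):
        w = apply_A(V[j])         # the only place the operator is used
        h = []
        for v in V:               # modified Gram-Schmidt orthogonalisation
            hij = sum(wi * vi for wi, vi in zip(w, v))
            h.append(hij)
            w = [wi - hij * vi for wi, vi in zip(w, v)]
        h_next = math.sqrt(sum(wi * wi for wi in w))
        h.append(h_next)
        for i in range(j):        # apply earlier rotations to the new column
            top = cs[i] * h[i] + sn[i] * h[i + 1]
            h[i + 1] = -sn[i] * h[i] + cs[i] * h[i + 1]
            h[i] = top
        r = math.hypot(h[j], h[j + 1])
        c, s = h[j] / r, h[j + 1] / r
        cs.append(c)
        sn.append(s)
        h[j], h[j + 1] = r, 0.0   # new rotation zeroes the subdiagonal entry
        g.append(-s * g[j])
        g[j] = c * g[j]
        R.append(h)
        if abs(g[j + 1]) <= tol * beta:
            break
        V.append([wi / h_next for wi in w])
    k = len(R)
    y = [0.0] * k
    for i in range(k - 1, -1, -1):  # back substitution on the triangular system
        acc = g[i] - sum(R[col][i] * y[col] for col in range(i + 1, k))
        y[i] = acc / R[i][i]
    return [sum(y[col] * V[col][i] for col in range(k)) for i in range(n)]


def apply_shifted_laplacian(v):
    """Hypothetical toy operator: tridiagonal (-1, 3, -1), applied without storing a matrix."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append(3.0 * v[i] - left - right)
    return out


b = [1.0] * 8
x = gmres(apply_shifted_laplacian, b)
residual = max(abs(p - q) for p, q in zip(apply_shifted_laplacian(x), b))
```

In a production setting one would more likely pass a callback to an existing library solver; the hand-written loop is used here only to keep the sketch dependency-free.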
Powerful Trend Function Tests That Are Robust to Strong Serial Correlation with an Application to the Prebisch-Singer Hypothesis
In this paper we propose tests for hypotheses regarding the parameters of the deterministic trend function of a univariate time series. The tests do not require knowledge of the form of serial correlation in the data and they are robust to strong serial correlation. The data can contain a unit root and the tests still have the correct size asymptotically. The tests we analyze are standard heteroskedasticity autocorrelation (HAC) robust tests based on nonparametric kernel variance estimators. We analyze these tests using the fixed-b asymptotic framework recently proposed by Kiefer and Vogelsang (2002). This analysis allows us to analyze the power properties of the tests with regards to bandwidth and kernel choices. Our analysis shows that among popular kernels, there are specific kernel and bandwidth choices that deliver tests with maximal power within a specific class of tests. Based on the theoretical results, we propose a data dependent bandwidth rule that maximizes integrated power. Our recommended test is shown to have power that dominates a related test proposed by Vogelsang (1998). We apply the recommended test to the logarithm of a net barter terms of trade series and we find that this series has a statistically significant negative slope. This finding is consistent with the well known Prebisch-Singer hypothesis.
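As a rough illustration of the kind of nonparametric kernel variance estimator the abstract above refers to, the sketch below computes a Bartlett-kernel long-run variance with the bandwidth set to a fixed fraction `b` of the sample size, which is the parametrization used in fixed-b analysis. The function name `bartlett_lrv`, the choice of the Bartlett kernel, and the assumption that the input series is already mean-zero (for example, regression residuals) are illustrative; this is not the paper's recommended kernel or bandwidth rule.

```python
def bartlett_lrv(u, b):
    """Bartlett-kernel long-run variance of a mean-zero series u.

    The bandwidth is M = b * T with 0 < b <= 1 (fixed-b parametrization).
    """
    T = len(u)
    if T == 0 or not 0.0 < b <= 1.0:
        raise ValueError("need a non-empty series and 0 < b <= 1")
    M = b * T

    def gamma(j):
        # Sample autocovariance at lag j (no demeaning: u is assumed mean-zero).
        return sum(u[t] * u[t - j] for t in range(j, T)) / T

    lrv = gamma(0)
    for j in range(1, T):
        weight = 1.0 - j / M      # Bartlett kernel k(x) = 1 - |x| for |x| <= 1
        if weight <= 0.0:
            break                 # lags beyond the bandwidth get zero weight
        lrv += 2.0 * weight * gamma(j)
    return lrv


# Negatively autocorrelated toy residuals shrink the estimate below gamma(0).
alternating = bartlett_lrv([1.0, -1.0, 1.0, -1.0], b=0.5)
# Positively autocorrelated toy residuals inflate it.
persistent = bartlett_lrv([1.0, 1.0, 1.0, 1.0], b=0.5)
```

A robust t-statistic would divide the estimated trend coefficient by a standard error built from such a variance estimate; under fixed-b asymptotics its critical values depend on `b` and the kernel rather than on the standard normal table.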
Preemptive Thread Block Scheduling with Online Structural Runtime Prediction for Concurrent GPGPU Kernels
Recent NVIDIA Graphics Processing Units (GPUs) can execute multiple kernels
concurrently. On these GPUs, the thread block scheduler (TBS) uses the FIFO
policy to schedule their thread blocks. We show that FIFO leaves performance to
chance, resulting in significant loss of performance and fairness. To improve
performance and fairness, we propose use of the preemptive Shortest Remaining
Time First (SRTF) policy instead. Although SRTF requires an estimate of runtime
of GPU kernels, we show that such an estimate of the runtime can be easily
obtained using online profiling and exploiting a simple observation on GPU
kernels' grid structure. Specifically, we propose a novel Structural Runtime
Predictor. Using a simple Staircase model of GPU kernel execution, we show that
the runtime of a kernel can be predicted by profiling only the first few thread
blocks. We evaluate an online predictor based on this model on benchmarks from
ERCBench, and find that it can estimate the actual runtime reasonably well
after the execution of only a single thread block. Next, we design a thread
block scheduler that is both concurrent kernel-aware and uses this predictor.
We implement the SRTF policy and evaluate it on two-program workloads from
ERCBench. SRTF improves STP by 1.18x and ANTT by 2.25x over FIFO. When compared
to MPMax, a state-of-the-art resource allocation policy for concurrent kernels,
SRTF improves STP by 1.16x and ANTT by 1.3x. To improve fairness, we also
propose SRTF/Adaptive which controls resource usage of concurrently executing
kernels to maximize fairness. SRTF/Adaptive improves STP by 1.12x, ANTT by
2.23x and Fairness by 2.95x compared to FIFO. Overall, our implementation of
SRTF achieves system throughput to within 12.64% of Shortest Job First (SJF, an
oracle optimal scheduling policy), bridging 49% of the gap between FIFO and
SJF.
Comment: 14 pages, full pre-review version of PACT 2014 poster
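The prediction-plus-SRTF idea in the abstract above can be sketched as follows. The exact form of the Staircase model is not given in the abstract; the version below assumes that thread blocks execute in waves of `resident_blocks` concurrent blocks, each wave taking the profiled `block_time`, so the runtime grows in steps. All function and field names are hypothetical.

```python
import math


def staircase_runtime(total_blocks, resident_blocks, block_time):
    """Predicted runtime when blocks run in waves of `resident_blocks` (assumed model)."""
    if resident_blocks <= 0:
        raise ValueError("resident_blocks must be positive")
    waves = math.ceil(total_blocks / resident_blocks)
    return waves * block_time


def remaining_time(kernel):
    """Predicted time left for a kernel, based on the blocks not yet completed."""
    left = kernel["total_blocks"] - kernel["done_blocks"]
    return staircase_runtime(left, kernel["resident_blocks"], kernel["block_time"])


def pick_srtf(kernels):
    """Shortest Remaining Time First: choose the kernel predicted to finish soonest."""
    return min(kernels, key=remaining_time)["name"]


# block_time would come from online profiling of the first thread block(s).
kernels = [
    {"name": "A", "total_blocks": 100, "done_blocks": 20,
     "resident_blocks": 16, "block_time": 2.0},
    {"name": "B", "total_blocks": 40, "done_blocks": 0,
     "resident_blocks": 8, "block_time": 1.5},
]
next_kernel = pick_srtf(kernels)
```

Here kernel A has 5 waves left at 2.0 time units each and kernel B has 5 waves at 1.5 each, so the sketch selects B. A real scheduler would re-evaluate this choice as blocks complete and as the profiled block time is refined.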
Asteroseismic measurement of surface-to-core rotation in a main-sequence A star, KIC 11145123
We have discovered rotationally split core g-mode triplets and surface p-mode triplets and quintuplets in a terminal age main-sequence A star, KIC 11145123, that shows both δ Sct p-mode pulsations and γ Dor g-mode pulsations. This gives the first robust determination of the rotation of the deep core and surface of a main-sequence star, essentially model independently.
We find its rotation to be nearly uniform with a period near 100 d, but we show with high confidence that the surface rotates slightly faster than the core. A strong angular momentum transfer mechanism must be operating to produce the nearly rigid rotation, and a mechanism other than viscosity must be operating to produce a more rapidly rotating surface than core. Our asteroseismic result, along with previous asteroseismic constraints on internal rotation in some B stars, and measurements of internal rotation in some subgiant, giant and white dwarf stars, has made angular momentum transport in stars throughout their lifetimes an observational science.