Preemptive Thread Block Scheduling with Online Structural Runtime Prediction for Concurrent GPGPU Kernels
Recent NVIDIA Graphics Processing Units (GPUs) can execute multiple kernels
concurrently. On these GPUs, the thread block scheduler (TBS) uses the FIFO
policy to schedule their thread blocks. We show that FIFO leaves performance to
chance, resulting in significant loss of performance and fairness. To improve
performance and fairness, we propose use of the preemptive Shortest Remaining
Time First (SRTF) policy instead. Although SRTF requires an estimate of runtime
of GPU kernels, we show that such an estimate of the runtime can be easily
obtained using online profiling and exploiting a simple observation on GPU
kernels' grid structure. Specifically, we propose a novel Structural Runtime
Predictor. Using a simple Staircase model of GPU kernel execution, we show that
the runtime of a kernel can be predicted by profiling only the first few thread
blocks. We evaluate an online predictor based on this model on benchmarks from
ERCBench, and find that it can estimate the actual runtime reasonably well
after the execution of only a single thread block. Next, we design a thread
block scheduler that is both concurrent kernel-aware and uses this predictor.
We implement the SRTF policy and evaluate it on two-program workloads from
ERCBench. SRTF improves STP by 1.18x and ANTT by 2.25x over FIFO. When compared
to MPMax, a state-of-the-art resource allocation policy for concurrent kernels,
SRTF improves STP by 1.16x and ANTT by 1.3x. To improve fairness, we also
propose SRTF/Adaptive which controls resource usage of concurrently executing
kernels to maximize fairness. SRTF/Adaptive improves STP by 1.12x, ANTT by
2.23x and Fairness by 2.95x compared to FIFO. Overall, our implementation of
SRTF achieves system throughput to within 12.64% of Shortest Job First (SJF, an
oracle optimal scheduling policy), bridging 49% of the gap between FIFO and
SJF. Comment: 14 pages, full pre-review version of PACT 2014 poster
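The abstract above does not spell out the Staircase model or the metrics, so the following is only a minimal sketch under stated assumptions. It assumes that thread blocks run in equal-length waves of `resident_blocks` at a time, so that predicted runtime grows in steps, and it uses the commonly cited multi-program definitions of STP (sum of per-program speedups relative to running alone) and ANTT (mean per-program slowdown). All function and parameter names are hypothetical and are not taken from the paper.

```python
import math


def staircase_runtime(total_blocks, resident_blocks, block_time):
    """Assumed staircase estimate: one step per wave of resident thread blocks."""
    waves = math.ceil(total_blocks / resident_blocks)
    return waves * block_time


def srtf_pick(kernels):
    """Return the name of the kernel with the shortest predicted remaining time.

    Each kernel is a dict with hypothetical keys: name, blocks_left,
    resident_blocks, and block_time (measured from the first finished block).
    """
    def remaining(k):
        return staircase_runtime(k["blocks_left"], k["resident_blocks"], k["block_time"])

    return min(kernels, key=remaining)["name"]


def stp(alone, shared):
    """System throughput: sum of T_alone / T_shared over all programs."""
    return sum(a / s for a, s in zip(alone, shared))


def antt(alone, shared):
    """Average normalized turnaround time: mean of T_shared / T_alone."""
    return sum(s / a for a, s in zip(alone, shared)) / len(alone)
```

Under these assumptions, a kernel with 100 thread blocks, 16 resident at a time, and 2.0 time units per block would be predicted to take 7 waves, i.e. 14.0 time units. Higher STP is better, while lower ANTT is better, which is why the abstract reports both.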
Hazard Contribution Modes of Machine Learning Components
Amongst the essential steps to be taken towards developing and deploying safe systems with embedded learning-enabled components (LECs), i.e., software components that use machine learning (ML), are to analyze and understand the contribution of the constituent LECs to safety, and to assure that those contributions have been appropriately managed. This paper addresses both steps by, first, introducing the notion of hazard contribution modes (HCMs), a categorization of the ways in which the ML elements of LECs can contribute to hazardous system states; and, second, describing how argumentation patterns can capture the reasoning that can be used to assure HCM mitigation. Our framework is generic in the sense that the categories of HCMs developed i) can admit different learning schemes, i.e., supervised, unsupervised, and reinforcement learning, and ii) are not dependent on the type of system in which the LECs are embedded, i.e., both cyber and cyber-physical systems. One of the goals of this work is to serve as a starting point for systematizing L analysis towards eventually automating it in a tool
On the Asymptotic Distribution of the Transaction Price in a Clock Model of a Multi-Unit, Oral, Ascending-Price Auction within the Common-Value Paradigm
Using a clock model of a multi-unit, oral, ascending-price auction, within the common-value paradigm, we analyse the asymptotic behaviour of the transaction price as the number of bidders gets large. We find that even though the transaction price is determined by a (potentially small) fraction of losing drop-out bids, that price converges in probability to the ex ante unknown, true value. Subsequently, we derive the asymptotic distribution of the transaction price. Finally, we apply our methods to data from an auction of taxi license plates held in Shenzhen, China. Keywords: common value, information aggregation, multi-unit auctions, taxis
Radiation Pressure Induced Instabilities in Laser Interferometric Detectors of Gravitational Waves
The large scale interferometric gravitational wave detectors consist of
Fabry-Perot cavities operating at very high powers ranging from tens of kW to
MW for next generations. The high powers may result in several nonlinear
effects which would affect the performance of the detector. In this paper, we
investigate the effects of radiation pressure, which tend to displace the
mirrors from their resonant position resulting in the detuning of the cavity.
We observe a remarkable effect, namely, that the freely hanging mirrors gain
energy continuously and swing with increasing amplitude. It is found that the
'time delay', that is, the time taken for the field to adjust to its
instantaneous equilibrium value, when the mirrors are in motion, is responsible
for this effect. This effect is likely to be important in the optimal operation
of the full-scale interferometers such as VIRGO and LIGO. Comment: 27 pages, 11 figures, RevTex style
Similarity laws of lunar and terrestrial volcanic flows
A mathematical model of a one-dimensional, steady duct flow of a mixture of a gas and small solid particles (rock) was analyzed and applied to lunar and terrestrial volcanic flows under geometrically and dynamically similar conditions. Numerical results for the equilibrium two-phase flows of lunar and terrestrial volcanoes under similar conditions are presented. The study indicates that: (1) the lunar crater is much larger than the corresponding terrestrial crater; (2) the exit velocity of the lunar volcanic flow may be higher than the lunar escape velocity, but the exit velocity of the terrestrial volcanic flow is much less than that of the lunar case; and (3) the thermal effects on the lunar volcanic flow are much larger than those of the terrestrial case.
Optimising the directional sensitivity of LISA
It was shown in a previous work that the data combinations canceling laser
frequency noise constitute a module - the module of syzygies. The cancellation
of laser frequency noise is crucial for obtaining the requisite sensitivity for
LISA. In this work we show how the sensitivity of LISA can be optimised for a
monochromatic source - a compact binary - whose direction is known, by using
appropriate data combinations in the module. A stationary source in the
barycentric frame appears to move in the LISA frame and our strategy consists
of "coherently tracking" the source by appropriately "switching" the data
combinations so that they remain optimal at all times. Assuming that the
polarisation of the source is not known, we average the signal over the
polarisations. We find that the best statistic is the 'network' statistic, in
which case LISA can be construed as two independent detectors. We compare
our results with the Michelson combination, which has been used for obtaining
the standard sensitivity curve for LISA, and with the observable obtained by
optimally switching the three Michelson combinations. We find that for sources
lying in the ecliptic plane the improvement in SNR increases from 34% at low
frequencies to nearly 90% at around 20 mHz. Finally we present the
signal-to-noise ratios for some known binaries in our galaxy. We also show
that, if at low frequencies SNRs of both polarisations can be measured, the
inclination angle of the plane of the orbit of the binary can be estimated. Comment: 16 pages, 8 figures, submitted to Phys Rev
Unified continuum approach to crystal surface morphological relaxation
A continuum theory is used to predict scaling laws for the morphological
relaxation of crystal surfaces in two independent space dimensions. The goal is
to unify previously disconnected experimental observations of decaying surface
profiles. The continuum description is derived from the motion of interacting
atomic steps. For isotropic diffusion of adatoms across each terrace, induced
adatom fluxes transverse and parallel to step edges obey different laws,
yielding a tensor mobility for the continuum surface flux. The partial
differential equation (PDE) for the height profile expresses an interplay of
step energetics and kinetics, and aspect ratio of surface topography that
plausibly unifies observations of decaying bidirectional surface corrugations.
The PDE reduces to known evolution equations for axisymmetric mounds and
one-dimensional periodic corrugations. Comment: 5 pages, 1 figure