    Preemptive Thread Block Scheduling with Online Structural Runtime Prediction for Concurrent GPGPU Kernels

    Recent NVIDIA Graphics Processing Units (GPUs) can execute multiple kernels concurrently. On these GPUs, the thread block scheduler (TBS) uses the FIFO policy to schedule their thread blocks. We show that FIFO leaves performance to chance, resulting in significant loss of performance and fairness. To improve performance and fairness, we propose use of the preemptive Shortest Remaining Time First (SRTF) policy instead. Although SRTF requires an estimate of runtime of GPU kernels, we show that such an estimate of the runtime can be easily obtained using online profiling and exploiting a simple observation on GPU kernels' grid structure. Specifically, we propose a novel Structural Runtime Predictor. Using a simple Staircase model of GPU kernel execution, we show that the runtime of a kernel can be predicted by profiling only the first few thread blocks. We evaluate an online predictor based on this model on benchmarks from ERCBench, and find that it can estimate the actual runtime reasonably well after the execution of only a single thread block. Next, we design a thread block scheduler that is both concurrent kernel-aware and uses this predictor. We implement the SRTF policy and evaluate it on two-program workloads from ERCBench. SRTF improves STP by 1.18x and ANTT by 2.25x over FIFO. When compared to MPMax, a state-of-the-art resource allocation policy for concurrent kernels, SRTF improves STP by 1.16x and ANTT by 1.3x. To improve fairness, we also propose SRTF/Adaptive which controls resource usage of concurrently executing kernels to maximize fairness. SRTF/Adaptive improves STP by 1.12x, ANTT by 2.23x and Fairness by 2.95x compared to FIFO. Overall, our implementation of SRTF achieves system throughput to within 12.64% of Shortest Job First (SJF, an oracle optimal scheduling policy), bridging 49% of the gap between FIFO and SJF. Comment: 14 pages, full pre-review version of PACT 2014 poster

    Hazard Contribution Modes of Machine Learning Components

    Amongst the essential steps to be taken towards developing and deploying safe systems with embedded learning-enabled components (LECs), i.e., software components that use machine learning (ML), are to analyze and understand the contribution of the constituent LECs to safety, and to assure that those contributions have been appropriately managed. This paper addresses both steps by, first, introducing the notion of hazard contribution modes (HCMs), a categorization of the ways in which the ML elements of LECs can contribute to hazardous system states; and, second, describing how argumentation patterns can capture the reasoning that can be used to assure HCM mitigation. Our framework is generic in the sense that the categories of HCMs developed i) can admit different learning schemes, i.e., supervised, unsupervised, and reinforcement learning, and ii) are not dependent on the type of system in which the LECs are embedded, i.e., both cyber and cyber-physical systems. One of the goals of this work is to serve as a starting point for systematizing LEC analysis towards eventually automating it in a tool.

    On the Asymptotic Distribution of the Transaction Price in a Clock Model of a Multi-Unit, Oral, Ascending-Price Auction within the Common-Value Paradigm

    Using a clock model of a multi-unit, oral, ascending-price auction, within the common-value paradigm, we analyse the asymptotic behaviour of the transaction price as the number of bidders gets large. We find that even though the transaction price is determined by a (potentially small) fraction of losing drop-out bids, that price converges in probability to the ex ante unknown, true value. Subsequently, we derive the asymptotic distribution of the transaction price. Finally, we apply our methods to data from an auction of taxi license plates held in Shenzhen, China. Keywords: common value, information aggregation, multi-unit auctions, taxis

    Influence of seed source on growth and nitrogen fixation of Maackia amurensis Rupr. & Maxim.

    Radiation Pressure Induced Instabilities in Laser Interferometric Detectors of Gravitational Waves

    The large-scale interferometric gravitational wave detectors consist of Fabry-Perot cavities operating at very high powers ranging from tens of kW to MW for next generations. The high powers may result in several nonlinear effects which would affect the performance of the detector. In this paper, we investigate the effects of radiation pressure, which tend to displace the mirrors from their resonant position resulting in the detuning of the cavity. We observe a remarkable effect, namely, that the freely hanging mirrors gain energy continuously and swing with increasing amplitude. It is found that the 'time delay', that is, the time taken for the field to adjust to its instantaneous equilibrium value, when the mirrors are in motion, is responsible for this effect. This effect is likely to be important in the optimal operation of the full-scale interferometers such as VIRGO and LIGO. Comment: 27 pages, 11 figures, RevTex style

    Similarity laws of lunar and terrestrial volcanic flows

    A mathematical model of a one-dimensional, steady duct flow of a mixture of a gas and small solid particles (rock) was analyzed and applied to the lunar and the terrestrial volcanic flows under geometrically and dynamically similar conditions. Numerical results for the equilibrium two-phase flows of lunar and terrestrial volcanoes under similar conditions are presented. The study indicates that: (1) the lunar crater is much larger than the corresponding terrestrial crater; (2) the exit velocity from the lunar volcanic flow may be higher than the lunar escape velocity but the exit velocity of terrestrial volcanic flow is much less than that of the lunar case; and (3) the thermal effects on the lunar volcanic flow are much larger than those of the terrestrial case.

    Optimising the directional sensitivity of LISA

    It was shown in a previous work that the data combinations canceling laser frequency noise constitute a module - the module of syzygies. The cancellation of laser frequency noise is crucial for obtaining the requisite sensitivity for LISA. In this work we show how the sensitivity of LISA can be optimised for a monochromatic source - a compact binary - whose direction is known, by using appropriate data combinations in the module. A stationary source in the barycentric frame appears to move in the LISA frame and our strategy consists of "coherently tracking" the source by appropriately "switching" the data combinations so that they remain optimal at all times. Assuming that the polarisation of the source is not known, we average the signal over the polarisations. We find that the best statistic is the 'network' statistic, in which case LISA can be construed as two independent detectors. We compare our results with the Michelson combination, which has been used for obtaining the standard sensitivity curve for LISA, and with the observable obtained by optimally switching the three Michelson combinations. We find that for sources lying in the ecliptic plane the improvement in SNR increases from 34% at low frequencies to nearly 90% at around 20 mHz. Finally we present the signal-to-noise ratios for some known binaries in our galaxy. We also show that, if at low frequencies SNRs of both polarisations can be measured, the inclination angle of the plane of the orbit of the binary can be estimated. Comment: 16 pages, 8 figures, submitted to Phys Rev

    Unified continuum approach to crystal surface morphological relaxation

    A continuum theory is used to predict scaling laws for the morphological relaxation of crystal surfaces in two independent space dimensions. The goal is to unify previously disconnected experimental observations of decaying surface profiles. The continuum description is derived from the motion of interacting atomic steps. For isotropic diffusion of adatoms across each terrace, induced adatom fluxes transverse and parallel to step edges obey different laws, yielding a tensor mobility for the continuum surface flux. The partial differential equation (PDE) for the height profile expresses an interplay of step energetics and kinetics, and aspect ratio of surface topography that plausibly unifies observations of decaying bidirectional surface corrugations. The PDE reduces to known evolution equations for axisymmetric mounds and one-dimensional periodic corrugations. Comment: 5 pages, 1 figure