5 research outputs found

    LIPIcs

    We revisit the problem of estimating the entropy of discrete distributions from independent samples, studied recently by Acharya, Orlitsky, Suresh and Tyagi (SODA 2015), improving their upper and lower bounds on the necessary sample size n. For estimating the Rényi entropy of order α, up to constant accuracy and error probability, we show the following:
    * Upper bound: n = O(1) · 2^{(1-1/α) H_α} for integer α > 1, as the worst case over distributions with Rényi entropy equal to H_α.
    * Lower bound: n = Ω(1) · K^{1-1/α} for any real α > 1, with the constant being an inverse polynomial of the accuracy, as the worst case over all distributions on K elements.
    Our upper bounds essentially replace the alphabet size by a factor exponential in the entropy, which offers improvements especially in low- or medium-entropy regimes (interesting, for example, in anomaly detection). As for the lower bounds, our proof explicitly shows how the complexity depends on both the alphabet size and the accuracy, partially solving the open problem posed in previous works. The argument for the upper bounds derives a clean identity for the variance of falling-power sums of a multinomial distribution. Our approach for the lower bounds uses convex optimization to find a distribution with possibly worse estimation performance, and may be of independent interest as a tool for working with Le Cam's two-point method.
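
    The falling-power identity mentioned above underlies the standard bias-corrected estimator for integer α: the counts n_i of a multinomial sample satisfy E[(n_i)_α] = (n)_α · p_i^α, where (x)_α = x(x-1)···(x-α+1) is the falling power, so Σ_i (n_i)_α / (n)_α is an unbiased estimate of the power sum Σ_i p_i^α. A minimal Python/NumPy sketch of this textbook estimator (function names are ours, not code from the paper):

        import numpy as np

        def falling(x, k):
            # falling power x * (x - 1) * ... * (x - k + 1), elementwise
            r = 1.0
            for j in range(k):
                r = r * (x - j)
            return r

        def renyi_entropy_estimate(samples, alpha):
            # Bias-corrected plug-in estimate of H_alpha in bits, for integer alpha > 1.
            # For multinomial counts, E[falling(n_i, alpha)] = falling(n, alpha) * p_i^alpha,
            # so the ratio below is an unbiased estimate of the power sum sum_i p_i^alpha.
            counts = np.bincount(samples)          # n_i for each symbol 0..K-1
            n = counts.sum()
            power_sum = falling(counts, alpha).sum() / falling(float(n), alpha)
            return np.log2(power_sum) / (1 - alpha)

        rng = np.random.default_rng(0)
        p = np.array([0.5, 0.25, 0.125, 0.125])    # true H_2 = -log2(0.34375) ≈ 1.54
        samples = rng.choice(len(p), size=10_000, p=p)
        print(renyi_entropy_estimate(samples, alpha=2))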

    Forsøk med parallell sortering på flerkjerne-CPU og GPU (Experiments with parallel sorting on multi-core CPU and GPU)

    Modern computers are becoming increasingly parallel. It is now common for processors to have anywhere from 2 to 8 cores. In addition, it is becoming more common to use the graphics card as a co-processor for certain kinds of computation. This means that it is increasingly important for us as developers to take parallel architectures into account if we want good performance in our applications. In this thesis we attempt to parallelize algorithms both for multi-core CPUs and for GPUs. At the same time, we take a closer look at one of the most fundamental problems in computer science, namely sorting. We study merge sort and left-radix sort, and investigate how these algorithms can be parallelized. We also look for general methods for parallelizing algorithms. Through our work we find that, on a multi-core CPU, it is easy to achieve a certain degree of parallelism, but that it is hard to write algorithms that scale well as the number of cores increases. On graphics cards we find that writing efficient algorithms is far harder, and that there are fewer use cases where it pays off.
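
    The multi-core pattern described here, sort chunks in parallel and then merge, can be sketched in a few lines of Python (a rough illustration of the idea, not code from the thesis; the chunk count and merge strategy are our choices):

        import heapq
        from concurrent.futures import ProcessPoolExecutor

        def parallel_sort(data, workers=4):
            # Sort equal-sized chunks in separate processes, then k-way merge.
            # The sequential final merge is one reason speedup flattens as the
            # number of cores grows, matching the scaling problem noted above.
            chunk = (len(data) + workers - 1) // workers
            parts = [data[i:i + chunk] for i in range(0, len(data), chunk)]
            with ProcessPoolExecutor(max_workers=workers) as pool:
                sorted_parts = list(pool.map(sorted, parts))
            return list(heapq.merge(*sorted_parts))

        if __name__ == "__main__":
            import random
            xs = [random.randrange(10**6) for _ in range(100_000)]
            assert parallel_sort(xs) == sorted(xs)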

    High-performance tsunami modelling with modern GPU technology

    PhD thesis. Earthquake-induced tsunamis commonly propagate in the deep ocean as long waves and develop into sharp-fronted surges moving rapidly coastward, which may be effectively simulated by hydrodynamic models solving the nonlinear shallow water equations (SWEs). Tsunamis can cause substantial economic and human losses, which could be mitigated through early warning systems given efficient and accurate modelling. Most existing tsunami models require long simulation times for real-world applications. This thesis presents a graphics processing unit (GPU) accelerated finite volume hydrodynamic model using the compute unified device architecture (CUDA) for computationally efficient tsunami simulations. Compared with a standard PC, the model reduces run-time by a factor of more than 40. The validated model is used to reproduce the 2011 Japan tsunami. Two source models were tested, one based on tsunami waveform inversion and another using deep-ocean tsunameters. Vertical sea surface displacement is computed by the Okada model, assuming instantaneous sea-floor deformation. Both source models can reproduce the wave propagation at offshore and nearshore gauges, but the tsunameter-based model better simulates the first wave amplitude. The effects of grid resolution (450-3600 m), slope limiters, and numerical accuracy are also investigated for the simulation of the 2011 Japan tsunami. Grid resolutions of 1-2 km perform well with a proper limiter; the Sweby limiter is optimal for coarser resolutions, recovering wave peaks better than minmod and remaining more numerically stable than Superbee. One hour of tsunami propagation can be predicted roughly 50 times faster on a regular low-cost PC-hosted GPU than on a single CPU, and roughly 70 times faster at 450 m resolution on a larger-memory server-hosted GPU. Finally, two adaptive mesh refinement (AMR) techniques, a simplified dynamic adaptive grid on the CPU and a static adaptive grid on the GPU, are introduced to provide multi-scale simulations. Both can reduce run-time by a factor of about 3 while maintaining acceptable accuracy. The proposed computationally efficient tsunami model is expected to provide a new practical tool for tsunami modelling for different purposes, including real-time warning, evacuation planning, risk management and city planning.
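
    The three slope limiters compared above are standard TVD limiters; in flux-limiter form they differ only in the limiter function φ(r) applied to the ratio r of consecutive solution gradients. A small NumPy sketch of the textbook forms (illustrative only, not code from the thesis):

        import numpy as np

        def minmod(r):
            # most diffusive of the three: phi(r) = max(0, min(1, r))
            return np.maximum(0.0, np.minimum(1.0, r))

        def superbee(r):
            # least diffusive TVD limiter: phi(r) = max(0, min(2r, 1), min(r, 2))
            return np.maximum.reduce([np.zeros_like(r),
                                      np.minimum(2.0 * r, 1.0),
                                      np.minimum(r, 2.0)])

        def sweby(r, beta=1.5):
            # Sweby family for beta in [1, 2]; beta = 1 recovers minmod and
            # beta = 2 recovers superbee, so it interpolates between the two
            return np.maximum.reduce([np.zeros_like(r),
                                      np.minimum(beta * r, 1.0),
                                      np.minimum(r, beta)])

        # r_i compares adjacent gradients, e.g. r_i = (q_i - q_{i-1}) / (q_{i+1} - q_i);
        # the limited slope used in the reconstruction is phi(r_i) * (q_{i+1} - q_i).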