
    Multiple Uncertainties in Time-Variant Cosmological Particle Data

    Though the media for visualization are limited, the potential dimensions of a dataset are not. In many areas of scientific study, understanding the correlations between those dimensions and their uncertainties is pivotal to mining useful information from a dataset. Obtaining this insight can necessitate visualizing the many relationships among temporal, spatial, and other dimensionalities of data and its uncertainties. We utilize multiple views for interactive dataset exploration and selection of important features, and we apply those techniques to the unique challenges of cosmological particle datasets. We show how interactivity and the incorporation of multiple visualization techniques help overcome the problem of limited visualization dimensions and allow many types of uncertainty to be seen in correlation with other variables.
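
    As a rough illustration of the multiple-view idea described in this abstract, the sketch below plots synthetic particle data in two linked panels with matplotlib: a spatial view colour-mapped by per-particle uncertainty and a profile view with error bars. All names and numbers in it are invented for the example; it is not the authors' tool.

    # Minimal multiple-view sketch: spatial scatter coloured by uncertainty,
    # plus a profile view with error bars for the same (synthetic) particles.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    n = 500
    x, y = rng.uniform(0, 100, (2, n))            # spatial coordinates
    value = np.sin(x / 10) + 0.1 * rng.standard_normal(n)
    sigma = rng.uniform(0.02, 0.3, n)             # per-particle uncertainty

    fig, (ax_space, ax_profile) = plt.subplots(1, 2, figsize=(10, 4))

    # View 1: spatial distribution, colour-mapped by uncertainty.
    sc = ax_space.scatter(x, y, c=sigma, s=8, cmap="viridis")
    fig.colorbar(sc, ax=ax_space, label="uncertainty")
    ax_space.set(xlabel="x", ylabel="y", title="spatial view")

    # View 2: the same quantity along one axis, with error bars.
    order = np.argsort(x)
    ax_profile.errorbar(x[order], value[order], yerr=sigma[order],
                        fmt=".", ms=3, alpha=0.5)
    ax_profile.set(xlabel="x", ylabel="value", title="profile view")

    plt.tight_layout()
    plt.show()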

    Leveraging Coding Techniques for Speeding up Distributed Computing

    Large-scale clusters leveraging distributed computing frameworks such as MapReduce routinely process data on the order of petabytes or more. The sheer size of the data precludes processing on a single computer. The philosophy in these methods is to partition the overall job into smaller tasks that are executed on different servers; this is called the map phase. This is followed by a data shuffling phase in which appropriate data are exchanged between the servers. The final phase, the so-called reduce phase, completes the computation. One potential approach, explored in prior work for reducing the overall execution time, is to exploit a natural tradeoff between computation and communication. Specifically, the idea is to run redundant copies of map tasks that are placed on judiciously chosen servers. The shuffle phase exploits the location of the nodes and utilizes coded transmission. The main drawback of this approach is that it requires the original job to be split into a number of map tasks that grows exponentially in the system parameters. This is problematic, as we demonstrate that splitting jobs too finely can in fact adversely affect the overall execution time. In this work we show that one can simultaneously obtain low communication loads while ensuring that jobs do not need to be split too finely. Our approach uncovers a deep relationship between this problem and a class of combinatorial structures called resolvable designs. Appropriate interpretation of resolvable designs allows for the development of coded distributed computing schemes whose splitting levels are exponentially lower than in prior work. We present experimental results obtained on Amazon EC2 clusters for a widely known distributed algorithm, namely TeraSort. We obtain over a 4.69× improvement in speedup over the baseline approach and more than 2.6× over the current state of the art.
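
    The coded-shuffle idea can be seen in a tiny toy example: when two servers each still need an intermediate value that the other has already computed, a third server that mapped both blocks can multicast a single XOR instead of sending two unicasts. The sketch below is our own minimal illustration of that principle; it does not implement the paper's resolvable-design construction.

    # Toy illustration of coded shuffling: one XOR multicast replaces two unicasts.
    # The placement of map tasks here is hand-picked, not derived from a design.

    def xor_bytes(a: bytes, b: bytes) -> bytes:
        return bytes(x ^ y for x, y in zip(a, b))

    # Intermediate map outputs (all the same length for the XOR to make sense).
    # Server 1 mapped blocks A and B; server 2 holds A; server 3 holds B.
    values = {"A": b"stats-of-block-A",
              "B": b"stats-of-block-B"}

    # For the reduce phase, server 2 needs B and server 3 needs A.
    # Uncoded shuffle: server 1 sends B to server 2 and A to server 3 (2 messages).
    # Coded shuffle: server 1 multicasts A xor B once; each receiver cancels
    # the part it already holds locally.
    coded = xor_bytes(values["A"], values["B"])

    recovered_at_2 = xor_bytes(coded, values["A"])   # server 2 already holds A
    recovered_at_3 = xor_bytes(coded, values["B"])   # server 3 already holds B

    assert recovered_at_2 == values["B"]
    assert recovered_at_3 == values["A"]
    print("one coded multicast served two receivers")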

    Systemic: A Testbed For Characterizing the Detection of Extrasolar Planets. I. The Systemic Console Package

    We present the systemic Console, a new all-in-one, general-purpose software package for the analysis and combined multiparameter fitting of Doppler radial velocity (RV) and transit timing observations. We give an overview of the computational algorithms implemented in the Console and describe the tools offered for streamlining the characterization of planetary systems. We illustrate the capabilities of the package by analyzing an updated radial velocity data set for the HD128311 planetary system. HD128311 harbors a pair of planets that appear to be participating in a 2:1 mean motion resonance. We show that the dynamical configuration cannot be fully determined from the current data. We find that if a planetary system like HD128311 is found to undergo transits, then self-consistent Newtonian fits to combined radial velocity data and a small number of timing measurements of transit midpoints can provide an immediate and vastly improved characterization of the planet's dynamical state. Comment: 10 pages, 5 figures, accepted for publication in PASP. Additional material at http://www.ucolick.org/~smeschia/systemic.ph
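
    For orientation, the sketch below fits the simplest possible radial-velocity model (a single planet on a circular orbit) to synthetic data with scipy. It is a toy under those stated assumptions, not the systemic Console's multiparameter Keplerian and Newtonian fitting.

    # Minimal single-planet, circular-orbit RV fit (a toy, not the systemic Console).
    import numpy as np
    from scipy.optimize import curve_fit

    def rv_model(t, K, P, phi, v0):
        """Circular orbit: semi-amplitude K, period P, phase phi, velocity offset v0."""
        return K * np.sin(2 * np.pi * t / P + phi) + v0

    # Synthetic observations (days, m/s) with Gaussian noise.
    rng = np.random.default_rng(1)
    t = np.sort(rng.uniform(0, 500, 60))
    rv = rv_model(t, K=60.0, P=120.0, phi=0.4, v0=-3.0) + rng.normal(0, 4.0, t.size)

    # Initial guesses; in practice the period would come from a periodogram search.
    popt, pcov = curve_fit(rv_model, t, rv, p0=[50, 118, 0.3, 0],
                           sigma=np.full(t.size, 4.0), absolute_sigma=True)
    perr = np.sqrt(np.diag(pcov))
    for name, val, err in zip(["K", "P", "phi", "v0"], popt, perr):
        print(f"{name} = {val:.2f} +/- {err:.2f}")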

    Information Gains from Cosmological Probes

    In light of the growing number of cosmological observations, it is important to develop versatile tools to quantify the constraining power and consistency of cosmological probes. Originally motivated from information theory, we use the relative entropy to compute the information gained by Bayesian updates in units of bits. This measure quantifies both the improvement in precision and the 'surprise', i.e. the tension arising from shifts in central values. Our starting point is a WMAP9 prior which we update with observations of the distance ladder, supernovae (SNe), baryon acoustic oscillations (BAO), and weak lensing as well as the 2015 Planck release. We consider the parameters of the flat ΛCDM concordance model and some of its extensions which include curvature and the Dark Energy equation of state parameter w. We find that, relative to WMAP9 and within these model spaces, the probes that have provided the greatest gains are Planck (10 bits), followed by BAO surveys (5.1 bits) and SNe experiments (3.1 bits). The other cosmological probes, including weak lensing (1.7 bits) and H_0 measures (1.7 bits), have contributed information but at a lower level. Furthermore, we do not find any significant surprise when updating the constraints of WMAP9 with any of the other experiments, meaning that they are consistent with WMAP9. However, when we choose Planck15 as the prior, we find that, accounting for the full multi-dimensionality of the parameter space, the weak lensing measurements of CFHTLenS produce a large surprise of 4.4 bits which is statistically significant at the 8σ level. We discuss how the relative entropy provides a versatile and robust framework to compare cosmological probes in the context of current and future surveys. Comment: 26 pages, 5 figures
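
    For intuition about the quantity being reported, the sketch below evaluates the relative entropy (Kullback-Leibler divergence) of a one-dimensional Gaussian posterior with respect to a Gaussian prior, converted to bits. The means and widths are made up for illustration; the paper itself works with the full multi-dimensional posteriors.

    # Relative entropy of a 1-D Gaussian posterior w.r.t. a Gaussian prior, in bits.
    # All numbers below are invented for illustration.
    import numpy as np

    def gaussian_kl_bits(mu_post, sig_post, mu_prior, sig_prior):
        """D(posterior || prior) for 1-D Gaussians, converted from nats to bits."""
        nats = (np.log(sig_prior / sig_post)
                + (sig_post**2 + (mu_post - mu_prior)**2) / (2 * sig_prior**2)
                - 0.5)
        return nats / np.log(2)

    # Pure precision gain: same mean, posterior three times narrower.
    print(gaussian_kl_bits(0.30, 0.005, 0.30, 0.015))   # ~0.9 bits

    # Precision gain plus a shift in the central value ("surprise").
    print(gaussian_kl_bits(0.32, 0.005, 0.30, 0.015))   # ~2.2 bits; the shift adds information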

    Compressing DNA sequence databases with coil

    Background: Publicly available DNA sequence databases such as GenBank are large, and are growing at an exponential rate. The sheer volume of data being dealt with presents serious storage and data communications problems. Currently, sequence data is usually kept in large "flat files," which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which rarely achieves good compression ratios. While much research has been done on compressing individual DNA sequences, surprisingly little has focused on the compression of entire databases of such sequences. In this study we introduce the sequence database compression software coil. Results: We have designed and implemented a portable software package, coil, for compressing and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared towards achieving high compression ratios at the expense of execution time and memory usage during compression – the compression time represents a "one-off investment" whose cost is quickly amortised if the resulting compressed file is transmitted many times. Decompression requires little memory and is extremely fast. We demonstrate a 5% improvement in compression ratio over state-of-the-art general-purpose compression tools for a large GenBank database file containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental additions to a sequence database. Conclusion: coil presents a compelling alternative to conventional compression of flat files for the storage and distribution of DNA sequence databases having a narrow distribution of sequence lengths, such as EST data. Increasing compression levels for databases having a wide distribution of sequence lengths is a direction for future work.
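
    The redundancy across near-identical sequences that coil exploits can be hinted at with off-the-shelf tools: compressing similar sequences as one stream lets a general-purpose compressor reuse matches between them. The sketch below is only a crude stand-in for coil's edit-tree coding, using synthetic mutated copies of a random reference.

    # Crude illustration of cross-sequence redundancy (not coil's edit-tree coding):
    # compressing near-duplicate sequences together beats compressing them apart.
    import random
    import zlib

    random.seed(0)
    reference = "".join(random.choice("ACGT") for _ in range(2000))

    def mutate(seq, rate=0.01):
        """Return a copy of seq with a small fraction of random point substitutions."""
        out = list(seq)
        for i in range(len(out)):
            if random.random() < rate:
                out[i] = random.choice("ACGT")
        return "".join(out)

    database = [mutate(reference) for _ in range(50)]

    separate = sum(len(zlib.compress(s.encode())) for s in database)
    together = len(zlib.compress("".join(database).encode()))

    print(f"compressed separately: {separate} bytes")
    print(f"compressed as one stream: {together} bytes")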

    One simulation to fit them all - changing the background parameters of a cosmological N-body simulation

    We demonstrate that the output of a cosmological N-body simulation can, to remarkable accuracy, be scaled to represent the growth of large-scale structure in a cosmology with parameters similar to but different from those originally assumed. Our algorithm involves three steps: a reassignment of length, mass and velocity units, a relabelling of the time axis, and a rescaling of the amplitudes of individual large-scale fluctuation modes. We test it using two matched pairs of simulations. Within each pair, one simulation assumes parameters consistent with analyses of the first-year WMAP data. The other has lower matter and baryon densities and a 15% lower fluctuation amplitude, consistent with analyses of the three-year WMAP data. The pairs differ by a factor of a thousand in mass resolution, enabling performance tests on both linear and nonlinear scales. Our scaling reproduces the mass power spectra of the target cosmology to better than 0.5% on large scales (k < 0.1 h/Mpc) both in real and in redshift space. In particular, the BAO features of the original cosmology are removed and are correctly replaced by those of the target cosmology. Errors are still below 3% for k < 1 h/Mpc. Power spectra of the dark halo distribution are even more precisely reproduced, with errors below 1% on all scales tested. A halo-by-halo comparison shows that centre-of-mass positions and velocities are reproduced to better than 90 kpc/h and 5%, respectively. Halo masses, concentrations and spins are also reproduced at about the 10% level, although with small biases. Halo assembly histories are accurately reproduced, leading to central galaxy magnitudes with errors of about 0.25 magnitudes and a bias of about 0.13 magnitudes for a representative semi-analytic model. Comment: 14 pages, 12 figures. Submitted to MNRAS
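
    Of the three steps listed, only the first (the reassignment of length and mass units) is easy to sketch in a few lines; the snippet below does so for a box of equal-mass particles with comoving Mpc/h positions, using an illustrative stretch factor and cosmological parameters chosen only for the example. The time relabelling and the mode-amplitude rescaling, which carry the substance of the method, are not reproduced here.

    # Sketch of the unit-reassignment step only (lengths and particle masses).
    # The time relabelling and the rescaling of individual large-scale modes
    # are not shown; all parameter values are illustrative.
    import numpy as np

    RHO_CRIT = 2.775e11  # critical density in h^2 Msun / Mpc^3

    def particle_mass(omega_m, boxsize, nparticles):
        """Equal particle mass for a box of side boxsize (Mpc/h) with nparticles^3 particles."""
        return RHO_CRIT * omega_m * (boxsize / nparticles) ** 3   # Msun/h

    # Original and target cosmologies (illustrative numbers only).
    omega_m_orig, box_orig = 0.25, 500.0
    omega_m_new = 0.226

    # An illustrative stretch factor for the length unit; in the actual method it
    # is chosen by matching the linear fluctuation spectra of the two cosmologies.
    s = 1.03
    box_new = s * box_orig

    positions = np.random.default_rng(2).uniform(0, box_orig, (1000, 3))
    positions_new = positions * s            # rescaled comoving positions

    m_orig = particle_mass(omega_m_orig, box_orig, 512)
    m_new = particle_mass(omega_m_new, box_new, 512)
    print(f"box: {box_orig} -> {box_new:.1f} Mpc/h")
    print(f"particle mass: {m_orig:.3e} -> {m_new:.3e} Msun/h")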

    From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

    Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms. Comment: 18 pages, 4 figures, accepted for publication in Scientific Programming
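
    The discretization mentioned above is standard higher-order finite differencing; as a point of reference, a hand-written fourth-order first-derivative stencil on a periodic 1-D grid looks like the sketch below. Chemora generates and tunes such kernels automatically from the tensor-level problem description, so this is only an orientation aid, not its output.

    # Fourth-order centred finite-difference first derivative on a periodic 1-D grid.
    import numpy as np

    def d_dx_4th(u, dx):
        """(-u[i+2] + 8 u[i+1] - 8 u[i-1] + u[i-2]) / (12 dx), periodic boundaries."""
        return (-np.roll(u, -2) + 8 * np.roll(u, -1)
                - 8 * np.roll(u, 1) + np.roll(u, 2)) / (12 * dx)

    n = 256
    x = np.linspace(0, 2 * np.pi, n, endpoint=False)
    dx = x[1] - x[0]

    u = np.sin(x)
    err = np.max(np.abs(d_dx_4th(u, dx) - np.cos(x)))
    print(f"max error: {err:.2e}")   # shrinks as dx**4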