Multiple Uncertainties in Time-Variant Cosmological Particle Data
Though the media for visualization are limited, the potential dimensions of a dataset are not. In many areas of scientific study, understanding the correlations between those dimensions and their uncertainties is pivotal to mining useful information from a dataset. Obtaining this insight can necessitate visualizing the many relationships among temporal, spatial, and other dimensionalities of data and its uncertainties. We utilize multiple views for interactive dataset exploration and selection of important features, and we apply those techniques to the unique challenges of cosmological particle datasets. We show how interactivity and incorporation of multiple visualization techniques help overcome the problem of limited visualization dimensions and allow many types of uncertainty to be seen in correlation with other variables.
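Below is a minimal sketch of the linked-views idea: a spatial view with per-particle uncertainty encoded as colour, and a second attribute view that highlights whatever region is brushed in the first. The synthetic particle sample, the attribute names, and the use of matplotlib's RectangleSelector are illustrative assumptions, not the authors' implementation.

```python
# Minimal linked-views sketch: brushing a region in the spatial view
# highlights the corresponding particles in the attribute view.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import RectangleSelector

rng = np.random.default_rng(0)
n = 2000
pos = rng.uniform(0.0, 100.0, size=(n, 2))      # hypothetical x, y positions
velocity = rng.normal(0.0, 1.0, size=n)         # hypothetical line-of-sight velocity
uncertainty = rng.uniform(0.0, 1.0, size=n)     # hypothetical per-particle uncertainty

fig, (ax_space, ax_attr) = plt.subplots(1, 2, figsize=(10, 4))
ax_space.scatter(pos[:, 0], pos[:, 1], c=uncertainty, s=4, cmap="viridis")
ax_space.set_title("spatial view (colour = uncertainty)")
attr_plot = ax_attr.scatter(velocity, uncertainty, s=4, color="lightgray")
ax_attr.set_title("velocity vs. uncertainty (selection highlighted)")

def on_select(eclick, erelease):
    # Propagate a rectangular selection from the spatial view to the attribute view.
    x0, x1 = sorted((eclick.xdata, erelease.xdata))
    y0, y1 = sorted((eclick.ydata, erelease.ydata))
    mask = (pos[:, 0] >= x0) & (pos[:, 0] <= x1) & (pos[:, 1] >= y0) & (pos[:, 1] <= y1)
    attr_plot.set_color(np.where(mask, "crimson", "lightgray"))
    fig.canvas.draw_idle()

selector = RectangleSelector(ax_space, on_select, useblit=True)
plt.show()
```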
Leveraging Coding Techniques for Speeding up Distributed Computing
Large-scale clusters leveraging distributed computing frameworks such as
MapReduce routinely process data on the order of petabytes or more.
The sheer size of the data precludes the processing of the data on a single
computer. The philosophy in these methods is to partition the overall job into
smaller tasks that are executed on different servers; this is called the map
phase. This is followed by a data shuffling phase where appropriate data is
exchanged between the servers. The final, so-called reduce phase completes the
computation.
One potential approach for reducing the overall execution time, explored in
prior work, is to exploit a natural tradeoff between computation and
communication. Specifically, the idea is to run redundant copies of map tasks
that are placed on judiciously chosen servers. The shuffle phase exploits the
location of the nodes and utilizes coded transmission. The main drawback of
this approach is that it requires the original job to be split into a number of
map tasks that grows exponentially in the system parameters. This is
problematic, as we demonstrate that splitting jobs too finely can in fact
adversely affect the overall execution time.
In this work we show that one can simultaneously obtain low communication
loads while ensuring that jobs do not need to be split too finely. Our approach
uncovers a deep relationship between this problem and a class of combinatorial
structures called resolvable designs. Appropriate interpretation of resolvable
designs can allow for the development of coded distributed computing schemes
where the splitting levels are exponentially lower than prior work. We present
experimental results obtained on Amazon EC2 clusters for a widely known
distributed algorithm, namely TeraSort. We obtain over a 4.69x improvement in
speedup over the baseline approach and more than 2.6x over the current
state of the art.
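A toy sketch of the coded-shuffle principle this line of work builds on (not the paper's resolvable-design construction): with 3 servers, 3 input files, and each file mapped redundantly on 2 servers, a single XOR-coded broadcast can serve two servers at once. The placement, the hash-based stand-in for map outputs, and all names are illustrative.

```python
import hashlib

def map_file(file_id: int, reducer_id: int) -> bytes:
    """Hypothetical map output: the intermediate value v[reducer_id][file_id]."""
    return hashlib.sha256(f"v_{reducer_id}_{file_id}".encode()).digest()

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Redundant placement: server k holds every file except file k (replication r = 2).
files_at = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}

# Server k is responsible for reduce output k and is missing only v[k][k],
# the intermediate value computed from the one file it does not hold.
# Server 1 can compute both values missing at the *other* servers:
v_2_2 = map_file(2, 2)   # needed by server 2, computable at servers 1 and 3
v_3_3 = map_file(3, 3)   # needed by server 3, computable at servers 1 and 2
coded_packet = xor(v_2_2, v_3_3)   # one broadcast instead of two unicasts

# Server 2 already knows v[3][3] (it holds file 3), so it can peel it off:
decoded_at_2 = xor(coded_packet, map_file(3, 3))
assert decoded_at_2 == v_2_2

# Server 3 symmetrically recovers v[2][2] using its local copy of file 2:
decoded_at_3 = xor(coded_packet, map_file(2, 2))
assert decoded_at_3 == v_3_3
print("coded broadcast decoded correctly at both servers")
```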
Systemic: A Testbed For Characterizing the Detection of Extrasolar Planets. I. The Systemic Console Package
We present the systemic Console, a new all-in-one, general-purpose software
package for the analysis and combined multiparameter fitting of Doppler radial
velocity (RV) and transit timing observations. We give an overview of the
computational algorithms implemented in the Console, and describe the tools
offered for streamlining the characterization of planetary systems. We
illustrate the capabilities of the package by analyzing an updated radial
velocity data set for the HD128311 planetary system. HD128311 harbors a pair of
planets that appear to be participating in a 2:1 mean motion resonance. We show
that the dynamical configuration cannot be fully determined from the current
data. We find that if a planetary system like HD128311 is found to undergo
transits, then self-consistent Newtonian fits to combined radial velocity data
and a small number of timing measurements of transit midpoints can provide an
immediate and vastly improved characterization of the planet's dynamical state.
Comment: 10 pages, 5 figures, accepted for publication in PASP. Additional
material at http://www.ucolick.org/~smeschia/systemic.ph
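As a rough illustration of the kind of multiparameter radial-velocity fit such a console automates, the sketch below fits a single circular-orbit signal to synthetic data with scipy. The data, starting guesses, and parameter values are invented for the example and are unrelated to the Systemic code or to HD128311.

```python
import numpy as np
from scipy.optimize import curve_fit

def rv_model(t, period, amplitude, phase, offset):
    """Circular-orbit radial-velocity curve: v(t) = K sin(2*pi*t/P + phi) + gamma."""
    return amplitude * np.sin(2.0 * np.pi * t / period + phase) + offset

rng = np.random.default_rng(42)
t_obs = np.sort(rng.uniform(0.0, 900.0, 60))     # observation epochs [days]
true_params = (300.0, 55.0, 1.2, 3.0)            # illustrative P [d], K [m/s], phase, offset
v_obs = rv_model(t_obs, *true_params) + rng.normal(0.0, 4.0, t_obs.size)

# Rough starting guesses; a real fitter would search period space more carefully.
popt, pcov = curve_fit(rv_model, t_obs, v_obs,
                       p0=(290.0, 40.0, 1.0, 0.0),
                       sigma=np.full(t_obs.size, 4.0), absolute_sigma=True)
period, amplitude, phase, offset = popt
print(f"best-fit period = {period:.1f} d, semi-amplitude K = {amplitude:.1f} m/s")
```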
Information Gains from Cosmological Probes
In light of the growing number of cosmological observations, it is important
to develop versatile tools to quantify the constraining power and consistency
of cosmological probes. Originally motivated from information theory, we use
the relative entropy to compute the information gained by Bayesian updates in
units of bits. This measure quantifies both the improvement in precision and
the 'surprise', i.e. the tension arising from shifts in central values. Our
starting point is a WMAP9 prior which we update with observations of the
distance ladder, supernovae (SNe), baryon acoustic oscillations (BAO), and weak
lensing as well as the 2015 Planck release. We consider the parameters of the
flat ΛCDM concordance model and some of its extensions, which include
curvature and the Dark Energy equation-of-state parameter w. We find that,
relative to WMAP9 and within these model spaces, the probes that have provided
the greatest gains are Planck (10 bits), followed by BAO surveys (5.1 bits) and
SNe experiments (3.1 bits). The other cosmological probes, including weak
lensing (1.7 bits) and H_0 measures (1.7 bits), have contributed
information but at a lower level. Furthermore, we do not find any significant
surprise when updating the constraints of WMAP9 with any of the other
experiments, meaning that they are consistent with WMAP9. However, when we
choose Planck15 as the prior, we find that, accounting for the full
multi-dimensionality of the parameter space, the weak lensing measurements of
CFHTLenS produce a large surprise of 4.4 bits, which is statistically
significant at the 8σ level. We discuss how the relative entropy
provides a versatile and robust framework to compare cosmological probes in the
context of current and future surveys.
Comment: 26 pages, 5 figures
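The "bits of information gain" quoted above are relative entropies between posterior and prior. As a minimal sketch, assuming both distributions are Gaussian (the toy means and covariances below are invented and are not the WMAP9 or Planck constraints), the quantity can be computed as follows.

```python
import numpy as np

def kl_gaussian_bits(mu_post, cov_post, mu_prior, cov_prior):
    """D_KL(posterior || prior) for multivariate Gaussians, expressed in bits."""
    d = len(mu_post)
    cov_prior_inv = np.linalg.inv(cov_prior)
    diff = np.asarray(mu_prior) - np.asarray(mu_post)
    nats = 0.5 * (np.trace(cov_prior_inv @ cov_post)
                  + diff @ cov_prior_inv @ diff
                  - d
                  + np.log(np.linalg.det(cov_prior) / np.linalg.det(cov_post)))
    return nats / np.log(2.0)   # convert nats to bits

# Toy 2-parameter example: the update shrinks both errors and shifts one mean,
# so the gain reflects both improved precision and a small "surprise".
mu_prior, cov_prior = [0.30, 0.70], np.diag([0.02**2, 0.03**2])
mu_post,  cov_post  = [0.31, 0.69], np.diag([0.01**2, 0.015**2])
print(f"information gain: {kl_gaussian_bits(mu_post, cov_post, mu_prior, cov_prior):.2f} bits")
```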
Compressing DNA sequence databases with coil
Background: Publicly available DNA sequence databases such as GenBank are large, and are
growing at an exponential rate. The sheer volume of data being dealt with presents serious storage
and data communications problems. Currently, sequence data is usually kept in large "flat files,"
which are then compressed using standard Lempel-Ziv (gzip) compression – an approach which
rarely achieves good compression ratios. While much research has been done on compressing
individual DNA sequences, surprisingly little has focused on the compression of entire databases
of such sequences. In this study we introduce the sequence database compression software coil.
Results: We have designed and implemented a portable software package, coil, for compressing
and decompressing DNA sequence databases based on the idea of edit-tree coding. coil is geared
towards achieving high compression ratios at the expense of execution time and memory usage
during compression – the compression time represents a "one-off investment" whose cost is
quickly amortised if the resulting compressed file is transmitted many times. Decompression
requires little memory and is extremely fast. We demonstrate a 5% improvement in compression
ratio over state-of-the-art general-purpose compression tools for a large GenBank database file
containing Expressed Sequence Tag (EST) data. Finally, coil can efficiently encode incremental
additions to a sequence database.
Conclusion: coil presents a compelling alternative to conventional compression of flat files for the
storage and distribution of DNA sequence databases having a narrow distribution of sequence
lengths, such as EST data. Increasing compression levels for databases having a wide distribution of
sequence lengths is a direction for future work.
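The sketch below illustrates the general idea of storing sequences as edit scripts against similar, previously stored sequences. It uses Python's difflib as a stand-in for coil's edit-tree coder; the tiny database, the cost model, and the parent-selection rule are invented for the example.

```python
import difflib

def edit_script(reference: str, target: str):
    """A compact list of operations that rebuild `target` from `reference`."""
    matcher = difflib.SequenceMatcher(a=reference, b=target, autojunk=False)
    ops = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2, ""))             # copy reference[i1:i2]
        else:
            ops.append(("emit", i1, i2, target[j1:j2]))  # literal bases to insert
    return ops

def apply_script(reference: str, ops) -> str:
    return "".join(reference[i1:i2] if tag == "copy" else text
                   for tag, i1, i2, text in ops)

database = ["ACGTACGTTTGACCA", "ACGTACGATTGACCA", "TTTTGGGGCCCCAAAA"]
stored = []
for seq in database:
    # Choose the previously stored sequence whose edit script is cheapest,
    # a crude stand-in for coil's clustering of similar sequences.
    best = None
    for parent_idx in range(len(stored)):
        ops = edit_script(database[parent_idx], seq)
        cost = sum(len(text) for _, _, _, text in ops) + 3 * len(ops)
        if best is None or cost < best[0]:
            best = (cost, parent_idx, ops)
    if best is not None and best[0] < len(seq):
        stored.append(("delta", best[1], best[2]))
    else:
        stored.append(("raw", None, seq))

# Round trip: every sequence is recoverable from its stored representation.
for original, (kind, parent_idx, payload) in zip(database, stored):
    restored = payload if kind == "raw" else apply_script(database[parent_idx], payload)
    assert restored == original
print([kind for kind, _, _ in stored])   # e.g. ['raw', 'delta', 'raw']
```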
One simulation to fit them all - changing the background parameters of a cosmological N-body simulation
We demonstrate that the output of a cosmological N-body simulation can, to
remarkable accuracy, be scaled to represent the growth of large-scale structure
in a cosmology with parameters similar to but different from those originally
assumed. Our algorithm involves three steps: a reassignment of length, mass and
velocity units, a relabelling of the time axis, and a rescaling of the
amplitudes of individual large-scale fluctuation modes. We test it using two
matched pairs of simulations. Within each pair, one simulation assumes
parameters consistent with analyses of the first-year WMAP data. The other has
lower matter and baryon densities and a 15% lower fluctuation amplitude,
consistent with analyses of the three-year WMAP data. The pairs differ by a
factor of a thousand in mass resolution, enabling performance tests on both
linear and nonlinear scales. Our scaling reproduces the mass power spectra of
the target cosmology to better than 0.5% on large scales (k < 0.1 h/Mpc) both
in real and in redshift space. In particular, the BAO features of the original
cosmology are removed and are correctly replaced by those of the target
cosmology. Errors are still below 3% for k < 1 h/Mpc. Power spectra of the dark
halo distribution are even more precisely reproduced, with errors below 1% on
all scales tested. A halo-by-halo comparison shows that centre-of-mass
positions and velocities are reproduced to better than 90 kpc/h and 5%,
respectively. Halo masses, concentrations and spins are also reproduced at
about the 10% level, although with small biases. Halo assembly histories are
accurately reproduced, leading to central galaxy magnitudes with errors of
about 0.25 magnitudes and a bias of about 0.13 magnitudes for a representative
semi-analytic model.
Comment: 14 pages, 12 figures. Submitted to MNRAS
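A minimal sketch of the third step, rescaling the amplitudes of individual large-scale Fourier modes so the field follows a target power spectrum. The 1-D toy field, the power-law spectra, and the k < 0.1 h/Mpc cutoff are illustrative stand-ins for the paper's 3-D particle data.

```python
import numpy as np

def rescale_large_scale_modes(delta, box_size, p_orig, p_target, k_max):
    """Multiply Fourier modes with 0 < k < k_max by sqrt(P_target(k) / P_orig(k))."""
    n = delta.size
    delta_k = np.fft.rfft(delta)
    k = 2.0 * np.pi * np.fft.rfftfreq(n, d=box_size / n)   # angular wavenumbers
    scale = np.ones_like(k)
    large = (k > 0) & (k < k_max)
    scale[large] = np.sqrt(p_target(k[large]) / p_orig(k[large]))
    return np.fft.irfft(delta_k * scale, n=n)

rng = np.random.default_rng(1)
box_size = 1000.0                        # hypothetical box size [Mpc/h]
delta = rng.normal(0.0, 1.0, 512)        # toy overdensity field on a 1-D grid

# Illustrative power laws standing in for the original and target linear spectra;
# the target has a 15% lower fluctuation amplitude, i.e. 0.85**2 less power.
p_orig = lambda k: k ** -1.5
p_target = lambda k: 0.85 ** 2 * k ** -1.5

delta_rescaled = rescale_large_scale_modes(delta, box_size, p_orig, p_target, k_max=0.1)
print(delta_rescaled.std() / delta.std())   # slightly below 1: only large scales were damped
```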
From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation
Starting from a high-level problem description in terms of partial
differential equations using abstract tensor notation, the Chemora framework
discretizes, optimizes, and generates complete high performance codes for a
wide range of compute architectures. Chemora extends the capabilities of
Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient
manner for complex applications, without low-level code tuning. Chemora
achieves parallelism through MPI and multi-threading, combining OpenMP and
CUDA. Optimizations include high-level code transformations, efficient loop
traversal strategies, dynamically selected data and instruction cache usage
strategies, and JIT compilation of GPU code tailored to the problem
characteristics. The discretization is based on higher-order finite differences
on multi-block domains. Chemora's capabilities are demonstrated by simulations
of black hole collisions. This problem provides an acid test of the framework,
as the Einstein equations contain hundreds of variables and thousands of terms.
Comment: 18 pages, 4 figures, accepted for publication in Scientific
Programming
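As a small illustration of the kind of kernel such a framework ultimately emits, the sketch below applies a fourth-order central finite-difference stencil by hand in NumPy. It is written directly rather than generated by Chemora, and the grid, spacing, and test function are illustrative.

```python
import numpy as np

def d_dx_4th_order(u, dx):
    """Fourth-order central difference: (-u[i+2] + 8u[i+1] - 8u[i-1] + u[i-2]) / (12 dx)."""
    du = np.zeros_like(u)
    du[2:-2] = (-u[4:] + 8.0 * u[3:-1] - 8.0 * u[1:-3] + u[:-4]) / (12.0 * dx)
    return du   # boundary points would be filled by one-sided stencils or ghost zones

x = np.linspace(0.0, 2.0 * np.pi, 201)
dx = x[1] - x[0]
u = np.sin(x)
error = np.max(np.abs(d_dx_4th_order(u, dx)[2:-2] - np.cos(x)[2:-2]))
print(f"max interior error: {error:.2e}")   # small: the error scales as dx**4
```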