34,260 research outputs found
The National Virtual Observatory
As a scientific discipline, Astronomy is rather unique. We only have one
laboratory, the Universe, and we cannot, of course, change the initial
conditions and study the resulting effects. On top of this, acquiring
Astronomical data has historically been a very labor-intensive effort. As a
result, data has traditionally been preserved for posterity. With recent
technological advances, however, the rate at which we acquire new data has
grown exponentially, which has generated a Data Tsunami, whose wave train
threatens to overwhelm the field. In this conference proceedings, we present
and define the concept of virtual observatories, which we feel is the only
logical answer to this dilemma.Comment: 5 pages, uses newpasp.sty (included), to appear in "Extragalactic Gas
at Low Redshfit", ASP Conf. Series, J. S. Mulchaey and J. T. Stocke (eds.
Data Driven Discovery in Astrophysics
We review some aspects of the current state of data-intensive astronomy, its
methods, and some outstanding data analysis challenges. Astronomy is at the
forefront of "big data" science, with exponentially growing data volumes and
data rates, and an ever-increasing complexity, now entering the Petascale
regime. Telescopes and observatories from both ground and space, covering a
full range of wavelengths, feed the data via processing pipelines into
dedicated archives, where they can be accessed for scientific analysis. Most of
the large archives are connected through the Virtual Observatory framework,
that provides interoperability standards and services, and effectively
constitutes a global data grid of astronomy. Making discoveries in this
overabundance of data requires applications of novel, machine learning tools.
We describe some of the recent examples of such applications.Comment: Keynote talk in the proceedings of ESA-ESRIN Conference: Big Data
from Space 2014, Frascati, Italy, November 12-14, 2014, 8 pages, 2 figure
Challenges of Big Data Analysis
Big Data bring new opportunities to modern society and challenges to data
scientists. On one hand, Big Data hold great promises for discovering subtle
population patterns and heterogeneities that are not possible with small-scale
data. On the other hand, the massive sample size and high dimensionality of Big
Data introduce unique computational and statistical challenges, including
scalability and storage bottleneck, noise accumulation, spurious correlation,
incidental endogeneity, and measurement errors. These challenges are
distinguished and require new computational and statistical paradigm. This
article give overviews on the salient features of Big Data and how these
features impact on paradigm change on statistical and computational methods as
well as computing architectures. We also provide various new perspectives on
the Big Data analysis and computation. In particular, we emphasis on the
viability of the sparsest solution in high-confidence set and point out that
exogeneous assumptions in most statistical methods for Big Data can not be
validated due to incidental endogeneity. They can lead to wrong statistical
inferences and consequently wrong scientific conclusions
BOSS-LDG: A Novel Computational Framework that Brings Together Blue Waters, Open Science Grid, Shifter and the LIGO Data Grid to Accelerate Gravitational Wave Discovery
We present a novel computational framework that connects Blue Waters, the
NSF-supported, leadership-class supercomputer operated by NCSA, to the Laser
Interferometer Gravitational-Wave Observatory (LIGO) Data Grid via Open Science
Grid technology. To enable this computational infrastructure, we configured,
for the first time, a LIGO Data Grid Tier-1 Center that can submit
heterogeneous LIGO workflows using Open Science Grid facilities. In order to
enable a seamless connection between the LIGO Data Grid and Blue Waters via
Open Science Grid, we utilize Shifter to containerize LIGO's workflow software.
This work represents the first time Open Science Grid, Shifter, and Blue Waters
are unified to tackle a scientific problem and, in particular, it is the first
time a framework of this nature is used in the context of large scale
gravitational wave data analysis. This new framework has been used in the last
several weeks of LIGO's second discovery campaign to run the most
computationally demanding gravitational wave search workflows on Blue Waters,
and accelerate discovery in the emergent field of gravitational wave
astrophysics. We discuss the implications of this novel framework for a wider
ecosystem of Higher Performance Computing users.Comment: 10 pages, 10 figures. Accepted as a Full Research Paper to the 13th
IEEE International Conference on eScienc
- …