Data-Mining a Large Digital Sky Survey: From the Challenges to the Scientific Results
The analysis and an efficient scientific exploration of the Digital Palomar
Observatory Sky Survey (DPOSS) represents a major technical challenge. The
input data set consists of 3 Terabytes of pixel information, and contains a few
billion sources. We describe some of the specific scientific problems posed by
the data, including searches for distant quasars and clusters of galaxies, and
the data-mining techniques we are exploring in addressing them.
Machine-assisted discovery methods may become essential for the analysis of
such multi-Terabyte data sets. New and future approaches involve unsupervised
classification and clustering analysis in the Giga-object data space, including
various Bayesian techniques. In addition to the searches for known types of
objects in this data base, these techniques may also offer the possibility of
discovering previously unknown, rare types of astronomical objects.
Comment: Invited paper, to appear in Applications of Digital Image Processing
XX, ed. A. Tescher, Proc. S.P.I.E. vol. 3164, in press; 10 pages, a
self-contained TeX file, and 3 separate postscript figures
The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch
Recent and forthcoming advances in instrumentation, and giant new surveys,
are creating astronomical data sets that are not amenable to the methods of
analysis familiar to astronomers. Traditional methods are often inadequate not
merely because of the size in bytes of the data sets, but also because of the
complexity of modern data sets. Mathematical limitations of familiar algorithms
and techniques in dealing with such data sets create a critical need for new
paradigms for the representation, analysis and scientific visualization (as
opposed to illustrative visualization) of heterogeneous, multiresolution data
across application domains. Some of the problems presented by the new data sets
have been addressed by other disciplines such as applied mathematics,
statistics and machine learning and have been utilized by other sciences such
as space-based geosciences. Unfortunately, valuable results pertaining to these
problems are mostly to be found only in publications outside of astronomy. Here
we offer brief overviews of a number of concepts, techniques and developments,
some "old" and some new. These are generally unknown to most of the
astronomical community, but are vital to the analysis and visualization of
complex datasets and images. In order for astronomers to take advantage of the
richness and complexity of the new era of data, and to be able to identify,
adopt, and apply new solutions, the astronomical community needs a certain
degree of awareness and understanding of the new concepts. One of the goals of
this paper is to help bridge the gap between applied mathematics, artificial
intelligence and computer science on the one side and astronomy on the other.
Comment: 24 pages, 8 Figures, 1 Table. Accepted for publication: "Advances in
Astronomy", special issue "Robotic Astronomy"
Processing Images from the Zwicky Transient Facility
The Zwicky Transient Facility is a new robotic-observing program, in which a
newly engineered 600-MP digital camera with a pioneeringly large field of view,
47 square degrees, will be installed into the 48-inch Samuel Oschin Telescope
at the Palomar Observatory. The camera will generate roughly a petabyte of raw
image data over three years of operations. In parallel related work, new
hardware and software systems are being developed to process these data in real
time and build a long-term archive for the processed products. The first public
release of archived products is planned for early 2019, which will include
processed images and astronomical-source catalogs of the northern sky in the
and bands. Source catalogs based on two different methods will be
generated for the archive: aperture photometry and point-spread-function
fitting.
Comment: 6 pages, 4 figures, submitted to RTSRE Proceedings (www.rtsre.org)
Approximate Inference for Constructing Astronomical Catalogs from Images
We present a new, fully generative model for constructing astronomical
catalogs from optical telescope image sets. Each pixel intensity is treated as
a random variable with parameters that depend on the latent properties of stars
and galaxies. These latent properties are themselves modeled as random. We
compare two procedures for posterior inference. One procedure is based on
Markov chain Monte Carlo (MCMC) while the other is based on variational
inference (VI). The MCMC procedure excels at quantifying uncertainty, while the
VI procedure is 1000 times faster. On a supercomputer, the VI procedure
efficiently uses 665,000 CPU cores to construct an astronomical catalog from 50
terabytes of images in 14.6 minutes, demonstrating the scaling characteristics
necessary to construct catalogs for upcoming astronomical surveys.
Comment: accepted to the Annals of Applied Statistics
Grids and the Virtual Observatory
We consider several projects from astronomy that benefit from the Grid paradigm and
associated technology, many of which involve either massive datasets or the federation
of multiple datasets. We cover image computation (mosaicking, multi-wavelength
images, and synoptic surveys); database computation (representation through XML,
data mining, and visualization); and semantic interoperability (publishing, ontologies,
directories, and service descriptions).
Montage: a grid portal and software toolkit for science-grade astronomical image mosaicking
Montage is a portable software toolkit for constructing custom, science-grade
mosaics by composing multiple astronomical images. The mosaics constructed by
Montage preserve the astrometry (position) and photometry (intensity) of the
sources in the input images. The mosaic to be constructed is specified by the
user in terms of a set of parameters, including dataset and wavelength to be
used, location and size on the sky, coordinate system and projection, and
spatial sampling rate. Many astronomical datasets are massive, and are stored
in distributed archives that are, in most cases, remote with respect to the
available computational resources. Montage can be run on both single- and
multi-processor computers, including clusters and grids. Standard grid tools
are used to run Montage in the case where the data or computers used to
construct a mosaic are located remotely on the Internet. This paper describes
the architecture, algorithms, and usage of Montage as both a software toolkit
and as a grid portal. Timing results are provided to show how Montage
performance scales with number of processors on a cluster computer. In
addition, we compare the performance of two methods of running Montage in
parallel on a grid.
Comment: 16 pages, 11 figures
The IPAC Image Subtraction and Discovery Pipeline for the intermediate Palomar Transient Factory
We describe the near real-time transient-source discovery engine for the
intermediate Palomar Transient Factory (iPTF), currently in operations at the
Infrared Processing and Analysis Center (IPAC), Caltech. We call this system
the IPAC/iPTF Discovery Engine (or IDE). We review the algorithms used for
PSF-matching, image subtraction, detection, photometry, and machine-learned
(ML) vetting of extracted transient candidates. We also review the performance
of our ML classifier. For a limiting signal-to-noise ratio of 4 in relatively
unconfused regions, "bogus" candidates from processing artifacts and imperfect
image subtractions outnumber real transients by ~ 10:1. This can be
considerably higher for image data with inaccurate astrometric and/or
PSF-matching solutions. Despite this occasionally high contamination rate, the
ML classifier is able to identify real transients with an efficiency (or
completeness) of ~ 97% for a maximum tolerable false-positive rate of 1% when
classifying raw candidates. All subtraction-image metrics, source features, ML
probability-based real-bogus scores, contextual metadata from other surveys,
and possible associations with known Solar System objects are stored in a
relational database for retrieval by the various science working groups. We
review our efforts in mitigating false-positives and our experience in
optimizing the overall system in response to the multitude of science projects
underway with iPTF.
Comment: 66 pages, 21 figures, 7 tables, accepted by PASP