
Data-Mining a Large Digital Sky Survey: From the Challenges to the Scientific Results

Author: de Carvalho R. R.
Djorgovski S. G.
Gal R. R.
Gray A.
Odewahn S. C.
Roden J.
Stolorz P.
Publication venue: 'SPIE-Intl Soc Optical Eng'
Publication date: 01/01/1997

The analysis and an efficient scientific exploration of the Digital Palomar Observatory Sky Survey (DPOSS) represents a major technical challenge. The input data set consists of 3 Terabytes of pixel information, and contains a few billion sources. We describe some of the specific scientific problems posed by the data, including searches for distant quasars and clusters of galaxies, and the data-mining techniques we are exploring in addressing them. Machine-assisted discovery methods may become essential for the analysis of such multi-Terabyte data sets. New and future approaches involve unsupervised classification and clustering analysis in the Giga-object data space, including various Bayesian techniques. In addition to the searches for known types of objects in this database, these techniques may also offer the possibility of discovering previously unknown, rare types of astronomical objects. Comment: Invited paper, to appear in Applications of Digital Image Processing XX, ed. A. Tescher, Proc. S.P.I.E. vol. 3164, in press; 10 pages, a self-contained TeX file, and 3 separate postscript figures.
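The abstract above mentions unsupervised classification and clustering of sources. As a minimal sketch, assuming each source has already been reduced to a short numeric feature vector (the two-feature toy points below are invented, not DPOSS measurements), plain k-means (Lloyd's algorithm) groups sources without any labels. It is a deliberately simple stand-in for, not an example of, the Bayesian techniques the authors mention; the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], k: int, iterations: int = 50,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids: List[Point] = rng.sample(points, k)  # initialise from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each source joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated: List[Point] = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # leave an empty cluster where it was
        if updated == centroids:  # converged: assignments can no longer change
            break
        centroids = updated
    return centroids, labels


# Two well-separated toy groups in a hypothetical 2-D feature space.
sources = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, labels = kmeans(sources, k=2)
print(labels)
```

k-means needs the number of clusters in advance and assumes roughly spherical groups; the Bayesian approaches the abstract refers to can relax both assumptions at a higher computational cost.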

arXiv.org e-Print Archive

CiteSeerX

Caltech Authors

CERN Document Server

The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch

Author: McCollum Bruce
Pesenson Isaac Z.
Pesenson Meyer Z.
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2010

Recent and forthcoming advances in instrumentation, and giant new surveys, are creating astronomical data sets that are not amenable to the methods of analysis familiar to astronomers. Traditional methods are often inadequate not merely because of the size in bytes of the data sets, but also because of the complexity of modern data sets. Mathematical limitations of familiar algorithms and techniques in dealing with such data sets create a critical need for new paradigms for the representation, analysis and scientific visualization (as opposed to illustrative visualization) of heterogeneous, multiresolution data across application domains. Some of the problems presented by the new data sets have been addressed by other disciplines such as applied mathematics, statistics and machine learning, and the resulting methods have been used in other sciences such as space-based geosciences. Unfortunately, valuable results pertaining to these problems are mostly to be found only in publications outside of astronomy. Here we offer brief overviews of a number of concepts, techniques and developments, some "old" and some new. These are generally unknown to most of the astronomical community, but are vital to the analysis and visualization of complex datasets and images. In order for astronomers to take advantage of the richness and complexity of the new era of data, and to be able to identify, adopt, and apply new solutions, the astronomical community needs a certain degree of awareness and understanding of the new concepts. One of the goals of this paper is to help bridge the gap between applied mathematics, artificial intelligence and computer science on the one side and astronomy on the other. Comment: 24 pages, 8 figures, 1 table. Accepted for publication in Advances in Astronomy, special issue "Robotic Astronomy".

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Caltech Authors

Processing Images from the Zwicky Transient Facility

Author: Barlow Tom
Beck Ron
Bellm Eric
Bue Brian
Cenko S. B.
Dekany Richard G.
Flynn Dave
Graham Matthew
Groom Steve
Hacopians Eugean
Helou George
Jackson Ed
Kasliwal Mansi M.
Kulkarni Shrinivas R.
Kupfer Thomas
Laher Russ R.
Landry Walter
Masci Frank J.
Miller Adam A.
Patterson Maria
Prince Thomas A.
Rebbapragada Umaa
Rusholme Benjamin
Shupe David L.
Smith Roger M.
Surace Jason
Terek Scott
Yan Lin
Publication date: 16/10/2017

The Zwicky Transient Facility is a new robotic-observing program, in which a newly engineered 600-MP digital camera with a pioneeringly large field of view, 47 square degrees, will be installed into the 48-inch Samuel Oschin Telescope at the Palomar Observatory. The camera will generate ~1 petabyte of raw image data over three years of operations. In parallel related work, new hardware and software systems are being developed to process these data in real time and build a long-term archive for the processed products. The first public release of archived products, planned for early 2019, will include processed images and astronomical-source catalogs of the northern sky in the g and r bands. Source catalogs based on two different methods will be generated for the archive: aperture photometry and point-spread-function fitting. Comment: 6 pages, 4 figures, submitted to RTSRE Proceedings (www.rtsre.org).
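Of the two catalog methods named above, aperture photometry is the simpler to illustrate. The sketch below is a minimal version under stated assumptions (the function name and the radii r_ap, r_in and r_out are hypothetical, and nothing here reflects the ZTF pipeline's actual implementation): sum the pixels inside a circle around the source and subtract a local sky level estimated from a surrounding annulus. PSF fitting would instead fit a scaled PSF model to the source pixels.

```python
import math
from statistics import median
from typing import List


def aperture_flux(image: List[List[float]], x0: float, y0: float,
                  r_ap: float, r_in: float, r_out: float) -> float:
    """Sky-subtracted flux in a circular aperture of radius r_ap.

    Pixel (x, y) is treated as a point at its integer centre (no partial-pixel
    weighting). The sky level is the median of pixels with r_in <= r < r_out.
    """
    aperture: List[float] = []
    annulus: List[float] = []
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            r = math.hypot(x - x0, y - y0)
            if r < r_ap:
                aperture.append(value)
            elif r_in <= r < r_out:
                annulus.append(value)
    sky = median(annulus)
    return sum(aperture) - sky * len(aperture)


# Toy 11x11 frame: flat sky of 10 counts plus a compact source totalling 200 counts.
frame = [[10.0] * 11 for _ in range(11)]
frame[5][5] += 100.0
for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
    frame[5 + dy][5 + dx] += 25.0
flux = aperture_flux(frame, x0=5, y0=5, r_ap=2.0, r_in=4.0, r_out=5.0)
print(flux)
```

The median makes the sky estimate robust to a few contaminating stars in the annulus; a production pipeline would also weight pixels partially covered by the aperture edge.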

arXiv.org e-Print Archive

Crossref

Caltech Authors

iStarDB (The Astronomy Education Research Repository)

Approximate Inference for Constructing Astronomical Catalogs from Images

Author: Adams Ryan P.
McAuliffe Jon D.
Miller Andrew C.
Prabhat
Regier Jeffrey
Schlegel David
Publication date: 09/04/2019

We present a new, fully generative model for constructing astronomical catalogs from optical telescope image sets. Each pixel intensity is treated as a random variable with parameters that depend on the latent properties of stars and galaxies. These latent properties are themselves modeled as random. We compare two procedures for posterior inference. One procedure is based on Markov chain Monte Carlo (MCMC) while the other is based on variational inference (VI). The MCMC procedure excels at quantifying uncertainty, while the VI procedure is 1000 times faster. On a supercomputer, the VI procedure efficiently uses 665,000 CPU cores to construct an astronomical catalog from 50 terabytes of images in 14.6 minutes, demonstrating the scaling characteristics necessary to construct catalogs for upcoming astronomical surveys. Comment: accepted to the Annals of Applied Statistics.
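The MCMC side of this comparison can be illustrated on a drastically reduced version of the same idea: pixel counts treated as Poisson random variables whose rate depends on a latent source property. The sketch below is a toy under stated assumptions, not the authors' model (which covers many stars and galaxies with many latent parameters): it infers the flux of one point source with a known PSF and a known sky level using random-walk Metropolis. All names and numbers are hypothetical.

```python
import math
import random
from typing import List, Sequence


def log_posterior(flux: float, counts: Sequence[int], psf: Sequence[float],
                  background: float) -> float:
    """Poisson log-likelihood (dropping the log k! constant), flat prior on flux >= 0."""
    if flux < 0:
        return -math.inf
    total = 0.0
    for k, p in zip(counts, psf):
        rate = background + flux * p  # expected counts in this pixel
        total += k * math.log(rate) - rate
    return total


def metropolis(counts: Sequence[int], psf: Sequence[float], background: float,
               start: float, n_samples: int = 5000, step: float = 20.0,
               seed: int = 1) -> List[float]:
    """Random-walk Metropolis sampler for the single flux parameter."""
    rng = random.Random(seed)
    current = start
    current_lp = log_posterior(current, counts, psf, background)
    samples: List[float] = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        proposal_lp = log_posterior(proposal, counts, psf, background)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


# Toy 1-D "image": five pixels, PSF weights summing to 1, sky of 10 counts per pixel.
psf = [0.05, 0.2, 0.5, 0.2, 0.05]
counts = [60, 210, 510, 210, 60]  # noise-free counts for a true flux of 1000
chain = metropolis(counts, psf, background=10.0, start=500.0)
kept = chain[1000:]  # discard burn-in
mean = sum(kept) / len(kept)
sd = math.sqrt(sum((s - mean) ** 2 for s in kept) / (len(kept) - 1))
print(f"flux = {mean:.0f} +/- {sd:.0f}")
```

The spread of the retained samples is what gives MCMC its direct uncertainty estimate. A VI procedure would instead optimise the parameters of an approximating distribution, which is typically much cheaper but only as accurate as that approximation.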

arXiv.org e-Print Archive

Princeton University Open Access Repository

eScholarship - University of California

Grids and the Virtual Observatory

Author: Williams Roy
Publication venue: 'Royal College of Obstetricians & Gynaecologists (RCOG)'
Publication date: 01/01/2003

We consider several projects from astronomy that benefit from the Grid paradigm and associated technology, many of which involve either massive datasets or the federation of multiple datasets. We cover image computation (mosaicking, multi-wavelength images, and synoptic surveys); database computation (representation through XML, data mining, and visualization); and semantic interoperability (publishing, ontologies, directories, and service descriptions).

Caltech Authors

Montage: a grid portal and software toolkit for science-grade astronomical image mosaicking

Author: Berriman G. Bruce
Deelman Ewa
Good John
Jacob Joseph C.
Katz Daniel S.
Kesselman Carl
Laity Anastasia C.
Prince Thomas A.
Singh Gurmeet
Su Mei-Hui
Williams Roy
Publication date: 01/01/2009

Montage is a portable software toolkit for constructing custom, science-grade mosaics by composing multiple astronomical images. The mosaics constructed by Montage preserve the astrometry (position) and photometry (intensity) of the sources in the input images. The mosaic to be constructed is specified by the user in terms of a set of parameters, including dataset and wavelength to be used, location and size on the sky, coordinate system and projection, and spatial sampling rate. Many astronomical datasets are massive, and are stored in distributed archives that are, in most cases, remote with respect to the available computational resources. Montage can be run on both single- and multi-processor computers, including clusters and grids. Standard grid tools are used to run Montage in the case where the data or computers used to construct a mosaic are located remotely on the Internet. This paper describes the architecture, algorithms, and usage of Montage as both a software toolkit and as a grid portal. Timing results are provided to show how Montage performance scales with the number of processors on a cluster computer. In addition, we compare the performance of two methods of running Montage in parallel on a grid. Comment: 16 pages, 11 figures.
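The last step of any mosaicking pipeline is combining overlapping inputs. The sketch below assumes the inputs have already been reprojected onto one common output grid (the hard part, which Montage handles together with background matching) and simply averages whichever inputs cover each output pixel, with None marking no coverage. It is a hypothetical illustration of the idea, not Montage's code or interface.

```python
from typing import List, Optional

Grid = List[List[Optional[float]]]


def coadd(images: List[Grid]) -> Grid:
    """Mean-combine images that share one output grid.

    None marks a pixel an input does not cover; output pixels covered by no
    input remain None.
    """
    rows, cols = len(images[0]), len(images[0][0])
    mosaic: Grid = [[None] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            values = [img[y][x] for img in images if img[y][x] is not None]
            if values:
                mosaic[y][x] = sum(values) / len(values)
    return mosaic


# Two partially overlapping 1x3 strips on the same grid.
left = [[1.0, 2.0, None]]
right = [[None, 4.0, 6.0]]
print(coadd([left, right]))
```

Because each output pixel depends only on the inputs at that position, the loop parallelises naturally by splitting the output grid into tiles, which is the kind of decomposition that lets mosaicking scale across cluster and grid processors.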

arXiv.org e-Print Archive

Crossref

The IPAC Image Subtraction and Discovery Pipeline for the intermediate Palomar Transient Factory

Author: Barlow Tom
Bellm Eric
Cao Yi
Cenko S. Bradley
Doran Gary
Grillmair Carl
Helou George
Jackson Ed
Kasliwal Mansi
Kulkarni Shrinivas
Laher Russ
Masci Frank
Miller Adam
Ofek Eran
Prince Thomas
Rebbapragada Umaa
Shupe David
Storrie-Lombardi Lisa
Surace Jason
Yan Lin
Publication venue: 'IOP Publishing'
Publication date: 03/10/2016

We describe the near real-time transient-source discovery engine for the intermediate Palomar Transient Factory (iPTF), currently in operations at the Infrared Processing and Analysis Center (IPAC), Caltech. We coin this system the IPAC/iPTF Discovery Engine (or IDE). We review the algorithms used for PSF-matching, image subtraction, detection, photometry, and machine-learned (ML) vetting of extracted transient candidates. We also review the performance of our ML classifier. For a limiting signal-to-noise ratio of 4 in relatively unconfused regions, "bogus" candidates from processing artifacts and imperfect image subtractions outnumber real transients by ~ 10:1. This can be considerably higher for image data with inaccurate astrometric and/or PSF-matching solutions. Despite this occasionally high contamination rate, the ML classifier is able to identify real transients with an efficiency (or completeness) of ~ 97% for a maximum tolerable false-positive rate of 1% when classifying raw candidates. All subtraction-image metrics, source features, ML probability-based real-bogus scores, contextual metadata from other surveys, and possible associations with known Solar System objects are stored in a relational database for retrieval by the various science working groups. We review our efforts in mitigating false positives and our experience in optimizing the overall system in response to the multitude of science projects underway with iPTF. Comment: 66 pages, 21 figures, 7 tables, accepted by PASP.
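An operating point such as "~97% completeness at a maximum tolerable false-positive rate of 1%" is read off the classifier's score distributions for labelled real and bogus candidates. A minimal sketch of that bookkeeping, assuming real-bogus scores in [0, 1] where higher means more likely real (the function name and the toy scores below are hypothetical, not iPTF code or data):

```python
import math
from typing import Sequence, Tuple


def completeness_at_fpr(real_scores: Sequence[float],
                        bogus_scores: Sequence[float],
                        max_fpr: float = 0.01) -> Tuple[float, float]:
    """Loosest threshold whose false-positive rate is <= max_fpr, and its completeness.

    A candidate is called real when score >= threshold. Raising the threshold
    can only lower the false-positive rate, so scanning candidate thresholds in
    ascending order and stopping at the first acceptable one maximises completeness.
    """
    for t in sorted(set(real_scores) | set(bogus_scores)):
        fpr = sum(s >= t for s in bogus_scores) / len(bogus_scores)
        if fpr <= max_fpr:
            completeness = sum(s >= t for s in real_scores) / len(real_scores)
            return t, completeness
    return math.inf, 0.0  # only a threshold above every score meets the target


# Hypothetical real-bogus scores for a labelled validation set.
bogus = [i / 100 for i in range(100)]
real = [0.5, 0.95, 0.995, 0.999]
threshold, completeness = completeness_at_fpr(real, bogus)
print(threshold, completeness)
```

With bogus candidates outnumbering real ones by ~10:1, even a 1% false-positive rate leaves a sizeable absolute number of bogus detections, which is why the threshold is fixed on the false-positive side first.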

arXiv.org e-Print Archive

NASA Technical Reports Server

Caltech Authors