Astronomy has a long history of acquiring, systematizing, and interpreting
large quantities of data. From the earliest sky atlases to the first major
photographic sky surveys of the 20th century, this tradition continues today,
at an ever-increasing rate.
Like many other fields, astronomy has become a very data-rich science, driven
by advances in telescope, detector, and computer technology. Numerous large
digital sky surveys and archives already exist, with information content
measured in multiple terabytes, and even larger, multi-petabyte data sets are
on the horizon. Systematic observations of the sky, over a range of
wavelengths, are becoming the primary source of astronomical data. Numerical
simulations are also producing comparable volumes of information. Data mining
promises both to make the scientific use of these data sets more effective and
complete and to open entirely new avenues of astronomical research.
The technological challenges range from database design and federation to data
mining and advanced visualization, and addressing them is producing a new
toolkit for astronomical research. Similar challenges arise in other
data-intensive fields today.
These advances are now being organized through the concept of Virtual
Observatories: federations of data archives and services that represent a new
information infrastructure for 21st-century astronomy. In this article, we
provide an overview of some of the major datasets in astronomy, discuss
different techniques used for archiving data, and conclude with a discussion of
the future of massive datasets in astronomy.

Comment: 46 pages, 21 figures. Invited review for the Handbook of Massive
Datasets, editors J. Abello, P. Pardalos, and M. Resende. Due to space
limitations, this version has low-resolution figures. For the full-resolution
review, see http://www.astro.caltech.edu/~rb/publications/hmds.ps.g