Some statistical and computational challenges, and opportunities in astronomy
The data complexity and volume of astronomical findings have increased in recent decades due to major technological improvements in instrumentation and data collection methods. The contemporary astronomer is flooded with terabytes of raw data that produce enormous multidimensional catalogs of objects (stars, galaxies, quasars, etc.) numbering in the billions, with hundreds of measured numbers for each object. The astronomical community thus faces a key task: to enable efficient and objective scientific exploitation of enormous multifaceted data sets and the complex links between data and astrophysical theory. In recognition of this task, the National Virtual Observatory (NVO) initiative recently emerged to federate numerous large digital sky archives, and to develop tools to explore and understand these vast volumes of data. The effective use of such integrated massive data sets presents a variety of new challenging statistical and algorithmic problems that require methodological advances. An interdisciplinary team of statisticians, astronomers and computer scientists from The Pennsylvania State University, California Institute of Technology and Carnegie Mellon University is developing statistical methodology for the NVO. A brief glimpse into the Virtual Observatory and the work of the Penn State-led team is provided here.
Multivariate statistical analysis software technologies for astrophysical research involving large data bases
We developed a package to process and analyze the data from the digital version of the Second Palomar Sky Survey. This system, called SKICAT, incorporates the latest in machine learning and expert systems software technology, in order to classify the detected objects objectively and uniformly, and facilitate handling of the enormous data sets from digital sky surveys and other sources. The system provides a powerful, integrated environment for the manipulation and scientific investigation of catalogs from virtually any source. It serves three principal functions: image catalog construction, catalog management, and catalog analysis. Through use of the GID3* Decision Tree artificial induction software, SKICAT automates the process of classifying objects within CCD and digitized plate images. To exploit these catalogs, the system also provides tools to merge them into a large, complete database which may be easily queried and modified when new data or better methods of calibrating or classifying become available. The most innovative feature of SKICAT is the facility it provides to experiment with and apply the latest in machine learning technology to the tasks of catalog construction and analysis. SKICAT provides a unique environment for implementing these tools for any number of future scientific purposes. Initial scientific verification and performance tests have been made using galaxy counts and measurements of galaxy clustering from small subsets of the survey data, and a search for very high redshift quasars. All of the tests were successful, and produced new and interesting scientific results. Attachments to this report give detailed accounts of the technical aspects of a package for multivariate statistical analysis of small and moderate-size data sets, called STATPROG. The package was tested extensively on a number of real scientific applications, and has produced real, published results.
The National Virtual Observatory
As a scientific discipline, Astronomy is rather unique. We only have one
laboratory, the Universe, and we cannot, of course, change the initial
conditions and study the resulting effects. On top of this, acquiring
Astronomical data has historically been a very labor-intensive effort. As a
result, data has traditionally been preserved for posterity. With recent
technological advances, however, the rate at which we acquire new data has
grown exponentially, which has generated a Data Tsunami, whose wave train
threatens to overwhelm the field. In this conference proceedings, we present
and define the concept of virtual observatories, which we feel is the only
logical answer to this dilemma.
Comment: 5 pages, uses newpasp.sty (included), to appear in "Extragalactic Gas
at Low Redshift", ASP Conf. Series, J. S. Mulchaey and J. T. Stocke (eds.)
Massive Datasets in Astronomy
Astronomy has a long history of acquiring, systematizing, and interpreting
large quantities of data. Starting from the earliest sky atlases through the
first major photographic sky surveys of the 20th century, this tradition is
continuing today, and at an ever increasing rate.
Like many other fields, astronomy has become a very data-rich science, driven
by the advances in telescope, detector, and computer technology. Numerous large
digital sky surveys and archives already exist, with information content
measured in multiple Terabytes, and even larger, multi-Petabyte data sets are
on the horizon. Systematic observations of the sky, over a range of
wavelengths, are becoming the primary source of astronomical data. Numerical
simulations are also producing comparable volumes of information. Data mining
promises to both make the scientific utilization of these data sets more
effective and more complete, and to open completely new avenues of astronomical
research.
Technological problems range from the issues of database design and
federation, to data mining and advanced visualization, leading to a new toolkit
for astronomical research. This is similar to challenges encountered in other
data-intensive fields today.
These advances are now being organized through a concept of the Virtual
Observatories, federations of data archives and services representing a new
information infrastructure for astronomy of the 21st century. In this article,
we provide an overview of some of the major datasets in astronomy, discuss
different techniques used for archiving data, and conclude with a discussion of
the future of massive datasets in astronomy.
Comment: 46 Pages, 21 Figures, Invited Review for the Handbook of Massive
Datasets, editors J. Abello, P. Pardalos, and M. Resende. Due to space
limitations this version has low resolution figures. For full resolution
review see http://www.astro.caltech.edu/~rb/publications/hmds.ps.g
Striking Photospheric Abundance Anomalies in Blue Horizontal-Branch Stars in Globular Cluster M13
High-resolution optical spectra of thirteen blue horizontal-branch (BHB)
stars in the globular cluster M13 show enormous deviations in element
abundances from the expected cluster metallicity. In the hotter stars (T_eff >
12000 K), helium is depleted by factors of 10 to 100 below solar, while iron is
enhanced to three times the solar abundance, two orders of magnitude above the
canonical metallicity [Fe/H] ~= -1.5 dex for this globular cluster. Nitrogen,
phosphorus, and chromium exhibit even more pronounced enhancements, and other
metals are also mildly overabundant, with the exception of magnesium, which
stays very near the expected cluster metallicity. These photospheric anomalies
are most likely due to diffusion --- gravitational settling of helium, and
radiative levitation of the other elements --- in the stable radiative
atmospheres of these hot stars. The effects of these mechanisms may have some
impact on the photometric morphology of the cluster's horizontal branch and on
estimates of its age and distance.
Comment: 11 pages, 1 Postscript figure, uses aaspp4.sty, accepted for
publication in ApJ Letters
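The abundance figures quoted in this abstract can be checked with a little arithmetic. The sketch below assumes the usual bracket notation, in which [X/H] is the base-10 logarithm of an abundance relative to solar; the function names are illustrative and do not come from the paper.

```python
import math

def bracket_to_linear(dex):
    """Convert a logarithmic abundance [X/H] (in dex) to a linear ratio relative to solar."""
    return 10.0 ** dex

def linear_to_bracket(ratio):
    """Convert a linear abundance ratio relative to solar to [X/H] in dex."""
    return math.log10(ratio)

# Values quoted in the abstract.
cluster_fe_h = -1.5      # canonical cluster metallicity [Fe/H]
observed_ratio = 3.0     # iron at three times the solar abundance

# Express the observed iron abundance in dex, then compare with the cluster value.
observed_fe_h = linear_to_bracket(observed_ratio)
enhancement_dex = observed_fe_h - cluster_fe_h
enhancement_factor = bracket_to_linear(enhancement_dex)

# Helium depletion by factors of 10 to 100 below solar, expressed in dex.
helium_dex_range = (linear_to_bracket(1.0 / 10.0), linear_to_bracket(1.0 / 100.0))

print(f"[Fe/H] observed: {observed_fe_h:+.2f} dex")
print(f"Enhancement over cluster value: {enhancement_dex:.2f} dex "
      f"(factor of about {enhancement_factor:.0f})")
print(f"Helium depletion: {helium_dex_range[0]:+.0f} to {helium_dex_range[1]:+.0f} dex")
```

Under these assumptions the iron enhancement comes out just under 2 dex, consistent with the "two orders of magnitude" stated in the abstract.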
A possible close supermassive black-hole binary in a quasar with optical periodicity
Quasars have long been known to be variable sources at all wavelengths. Their
optical variability is stochastic, can be due to a variety of physical
mechanisms, and is well-described statistically in terms of a damped random
walk model. The recent availability of large collections of astronomical time
series of flux measurements (light curves) offers new data sets for a
systematic exploration of quasar variability. Here we report on the detection
of a strong, smooth periodic signal in the optical variability of the quasar PG
1302-102 with a mean observed period of 1,884 ± 88 days. It was identified
in a search for periodic variability in a data set of light curves for 247,000
known, spectroscopically confirmed quasars with a temporal baseline of
about 9 years. While the interpretation of this phenomenon is still uncertain, the most
plausible mechanisms involve a binary system of two supermassive black holes
with a subparsec separation. Such systems are an expected consequence of galaxy
mergers and can provide important constraints on models of galaxy formation and
evolution.
Comment: 19 pages, 6 figures. Published online by Nature on 7 January 201
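To illustrate the damped random walk description mentioned in this abstract, here is a minimal sketch that simulates such a process (an Ornstein-Uhlenbeck process with an exact update step) and superimposes a sinusoid. Only the 1,884-day period comes from the abstract. The sampling cadence, damping timescale, amplitudes, and function names are hypothetical values chosen for illustration, and this is not the authors' analysis method.

```python
import math
import random

def simulate_drw(times, tau, sigma, mean=0.0, seed=0):
    """Simulate a damped random walk (Ornstein-Uhlenbeck process) at the given times.

    tau   : damping timescale, in the same units as `times`
    sigma : long-term standard deviation of the process
    """
    rng = random.Random(seed)
    # Start from a draw of the stationary distribution.
    values = [mean + sigma * rng.gauss(0.0, 1.0)]
    for t_prev, t_next in zip(times, times[1:]):
        decay = math.exp(-(t_next - t_prev) / tau)
        # Exact conditional scatter for an OU process over this time step.
        scatter = sigma * math.sqrt(1.0 - decay * decay)
        values.append(mean + (values[-1] - mean) * decay + scatter * rng.gauss(0.0, 1.0))
    return values

def add_sinusoid(times, values, period, amplitude, phase=0.0):
    """Superimpose a sinusoidal signal on an existing light curve."""
    return [v + amplitude * math.sin(2.0 * math.pi * t / period + phase)
            for t, v in zip(times, values)]

# Hypothetical sampling: one epoch every 10 days (assumption, not from the paper).
times = [10.0 * i for i in range(330)]

# Hypothetical DRW parameters (assumptions): tau = 200 days, sigma = 0.1 mag.
stochastic = simulate_drw(times, tau=200.0, sigma=0.1, seed=42)

# Period taken from the abstract; the 0.14 mag amplitude is an assumption.
periodic = add_sinusoid(times, stochastic, period=1884.0, amplitude=0.14)
```

Comparing `stochastic` with `periodic` shows why such a search is delicate: a damped random walk alone can wander on long timescales, so a periodic claim has to be weighed against what the stochastic model can produce by chance.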
Peculiar Multimodality on the Horizontal Branch of the Globular Cluster NGC 2808
We present distributions of colors of stars along the horizontal branch of
the globular cluster NGC 2808, from Hubble Space Telescope WFPC2 imaging in B,
V, and an ultraviolet filter (F218W). This cluster's HB is already known to be
strongly bimodal, with approximately equal-sized HB populations widely
separated in the color-magnitude diagram. Our images reveal a long blue tail
with two gaps, for a total of four nearly distinct HB groups. These gaps are
very narrow, corresponding to envelope-mass differences of only ~0.01 Msun.
This remarkable multimodality may be a signature of mass-loss processes, subtle
composition variations, or dynamical effects; we briefly summarize the
possibilities. The existence of narrow gaps between distinct clumps on the HB
presents a challenge for models that attempt to explain HB bimodality or other
peculiar HB structures.
Comment: LaTeX, including compressed figures. To appear in ApJL. Larger (851k)
PostScript version, including high-quality figures, available from
http://astro.berkeley.edu/~csosin/pub