Some statistical and computational challenges, and opportunities in astronomy
The data complexity and volume of astronomical findings have increased in recent decades due to major technological improvements in instrumentation and data collection methods. The contemporary astronomer is flooded with terabytes of raw data that produce enormous multidimensional catalogs of objects (stars, galaxies, quasars, etc.) numbering in the billions, with hundreds of measured numbers for each object. The astronomical community thus faces a key task: to enable efficient and objective scientific exploitation of enormous multifaceted data sets and the complex links between data and astrophysical theory. In recognition of this task, the National Virtual Observatory (NVO) initiative recently emerged to federate numerous large digital sky archives and to develop tools to explore and understand these vast volumes of data. The effective use of such integrated massive data sets presents a variety of new and challenging statistical and algorithmic problems that require methodological advances. An interdisciplinary team of statisticians, astronomers, and computer scientists from The Pennsylvania State University, the California Institute of Technology, and Carnegie Mellon University is developing statistical methodology for the NVO. A brief glimpse into the Virtual Observatory and the work of the Penn State-led team is provided here.
Multivariate statistical analysis software technologies for astrophysical research involving large data bases
We developed a package to process and analyze the data from the digital version of the Second Palomar Sky Survey. This system, called SKICAT, incorporates the latest in machine learning and expert systems software technology in order to classify the detected objects objectively and uniformly, and to facilitate handling of the enormous data sets from digital sky surveys and other sources. The system provides a powerful, integrated environment for the manipulation and scientific investigation of catalogs from virtually any source. It serves three principal functions: image catalog construction, catalog management, and catalog analysis. Through use of the GID3* decision-tree induction software, SKICAT automates the process of classifying objects within CCD and digitized plate images. To exploit these catalogs, the system also provides tools to merge them into a large, complete database which may be easily queried and modified when new data or better methods of calibration or classification become available. The most innovative feature of SKICAT is the facility it provides to experiment with and apply the latest in machine learning technology to the tasks of catalog construction and analysis. SKICAT provides a unique environment for implementing these tools for any number of future scientific purposes. Initial scientific verification and performance tests have been made using galaxy counts and measurements of galaxy clustering from small subsets of the survey data, and a search for very high redshift quasars. All of the tests were successful and produced new and interesting scientific results. Attachments to this report give detailed accounts of the technical aspects, including a second package, called STATPROG, for multivariate statistical analysis of small and moderate-size data sets. The STATPROG package was tested extensively on a number of real scientific applications, and has produced real, published results.
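GID3* belongs to the ID3 family of decision-tree learners. As a rough illustration of the core induction step such learners share, the sketch below picks the threshold on a single numeric catalog feature that maximizes information gain; the feature name, values, and labels are invented for illustration and are not SKICAT's actual attributes.

```python
# Minimal ID3-style split selection in pure Python, in the spirit of the
# decision-tree induction step SKICAT uses for object classification.
# The toy "extendedness" feature and its values are hypothetical.
import math

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def best_split(values, labels):
    """Return (threshold, gain) maximizing information gain for one feature."""
    base = entropy(labels)
    best = (None, 0.0)
    for t in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        if not left or not right:
            continue
        gain = (base
                - (len(left) / len(labels)) * entropy(left)
                - (len(right) / len(labels)) * entropy(right))
        if gain > best[1]:
            best = (t, gain)
    return best

# Toy catalog: point-like stars have low "extendedness", galaxies are fuzzy.
extendedness = [0.1, 0.2, 0.15, 0.9, 1.1, 1.0]
labels = ["star", "star", "star", "galaxy", "galaxy", "galaxy"]
print(best_split(extendedness, labels))  # -> (0.2, 1.0): a clean split
```

A full tree learner would apply this split search recursively to each resulting subset; GID3* additionally refines which attribute values are grouped into branches.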
Astrophysics in the Era of Massive Time-Domain Surveys
Synoptic sky surveys are now the largest data producers in astronomy, entering the Petascale regime and opening the time domain for systematic exploration. A great variety of interesting phenomena, spanning essentially all subfields of astronomy, can only be studied in the time domain, and these new surveys are producing large statistical samples of the known types of objects and events for further studies (e.g., SNe, AGN, variable stars of many kinds), and have already uncovered previously unknown subtypes of these (e.g., rare or peculiar types of SNe). These surveys are generating a new science and paving the way for even larger surveys to come, e.g., the LSST; our ability to fully exploit such forthcoming facilities depends critically on the science, methodology, and experience that are being accumulated now. Among the outstanding challenges, the foremost is our ability to conduct an effective follow-up of the interesting events discovered by the surveys in any wavelength regime. The follow-up resources, especially spectroscopy, are already severely limited, and will remain so for the foreseeable future, thus requiring an intelligent down-selection of the most astrophysically interesting events to follow. The first step in that process is an automated, real-time, iterative classification of events that incorporates heterogeneous data from the surveys themselves, archival and contextual information (spatial, temporal, and multiwavelength), and the incoming follow-up observations. The second step is an optimal automated event prioritization and allocation of the available follow-up resources, which themselves change in time. Both of these challenges are highly non-trivial, and require a strong cyber-infrastructure based on the Virtual Observatory data grid and the various astroinformatics efforts. Time-domain astronomy is inherently an astronomy of telescope-computational systems, and will increasingly depend on novel machine learning and artificial intelligence tools. Another arena with a strong potential for discovery is a purely archival, non-time-critical exploration of the time domain, with the time dimension adding complexity to the already challenging problem of data mining the highly dimensional parameter spaces produced by sky surveys.
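The down-selection step described above can be sketched in its simplest form, under the simplifying assumption that the classification stage has already reduced each event to a scalar "interest" score. The event names, scores, and capacity below are hypothetical; a real broker would update scores iteratively as new observations arrive.

```python
# Illustrative sketch (not the authors' system): greedy down-selection of
# transient events for limited spectroscopic follow-up slots.

def prioritize(events, capacity):
    """Return the top-`capacity` events by interest score, highest first."""
    return sorted(events, key=lambda e: e["score"], reverse=True)[:capacity]

events = [
    {"id": "SN-candidate-1", "score": 0.92},
    {"id": "AGN-flare-7",    "score": 0.41},
    {"id": "CV-outburst-3",  "score": 0.77},
    {"id": "unknown-12",     "score": 0.85},
]

# Only two spectroscopic slots tonight: keep the two most interesting events.
selected = prioritize(events, capacity=2)
print([e["id"] for e in selected])  # -> ['SN-candidate-1', 'unknown-12']
```

An optimal allocator would go further, matching events to specific instruments and time windows rather than simply ranking by score.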
The National Virtual Observatory
As a scientific discipline, astronomy is rather unique. We have only one laboratory, the Universe, and we cannot, of course, change the initial conditions and study the resulting effects. On top of this, acquiring astronomical data has historically been a very labor-intensive effort. As a result, data have traditionally been preserved for posterity. With recent technological advances, however, the rate at which we acquire new data has grown exponentially, generating a Data Tsunami whose wave train threatens to overwhelm the field. In these conference proceedings, we present and define the concept of virtual observatories, which we feel is the only logical answer to this dilemma.
Comment: 5 pages, uses newpasp.sty (included), to appear in "Extragalactic Gas at Low Redshift", ASP Conf. Series, J. S. Mulchaey and J. T. Stocke (eds.)
Striking Photospheric Abundance Anomalies in Blue Horizontal-Branch Stars in Globular Cluster M13
High-resolution optical spectra of thirteen blue horizontal-branch (BHB)
stars in the globular cluster M13 show enormous deviations in element
abundances from the expected cluster metallicity. In the hotter stars (T_eff >
12000 K), helium is depleted by factors of 10 to 100 below solar, while iron is
enhanced to three times the solar abundance, two orders of magnitude above the
canonical metallicity [Fe/H] ~= -1.5 dex for this globular cluster. Nitrogen,
phosphorus, and chromium exhibit even more pronounced enhancements, and other
metals are also mildly overabundant, with the exception of magnesium, which
stays very near the expected cluster metallicity. These photospheric anomalies
are most likely due to diffusion --- gravitational settling of helium, and
radiative levitation of the other elements --- in the stable radiative
atmospheres of these hot stars. The effects of these mechanisms may have some
impact on the photometric morphology of the cluster's horizontal branch and on
estimates of its age and distance.
Comment: 11 pages, 1 Postscript figure, uses aaspp4.sty, accepted for publication in ApJ Letters
A possible close supermassive black-hole binary in a quasar with optical periodicity
Quasars have long been known to be variable sources at all wavelengths. Their
optical variability is stochastic, can be due to a variety of physical
mechanisms, and is well-described statistically in terms of a damped random
walk model. The recent availability of large collections of astronomical time
series of flux measurements (light curves) offers new data sets for a
systematic exploration of quasar variability. Here we report on the detection
of a strong, smooth periodic signal in the optical variability of the quasar PG
1302-102, with a mean observed period of 1,884 ± 88 days. It was identified
in a search for periodic variability in a data set of light curves for 247,000
known, spectroscopically confirmed quasars with a multi-year temporal baseline.
While the interpretation of this phenomenon is still uncertain, the most
plausible mechanisms involve a binary system of two supermassive black holes
with a subparsec separation. Such systems are an expected consequence of galaxy
mergers and can provide important constraints on models of galaxy formation and
evolution.
Comment: 19 pages, 6 figures. Published online by Nature on 7 January 2015
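The kind of periodicity search described above can be illustrated with a Lomb-Scargle periodogram, the standard tool for irregularly sampled time series. The sketch below is not the authors' pipeline; the sampling pattern, noise level, and baseline of the synthetic light curve are invented, with only the period borrowed from the reported value.

```python
# Hedged sketch: recover the period of a synthetic, irregularly sampled
# light curve with SciPy's Lomb-Scargle periodogram.
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(42)
true_period = 1884.0                    # days, as reported for PG 1302-102
t = np.sort(rng.uniform(0, 9000, 300))  # invented irregular observation epochs
flux = np.sin(2 * np.pi * t / true_period) + rng.normal(0, 0.3, t.size)

periods = np.linspace(500, 3000, 2000)  # trial periods in days
freqs = 2 * np.pi / periods             # lombscargle expects angular frequencies
power = lombscargle(t, flux - flux.mean(), freqs, normalize=True)

best = periods[np.argmax(power)]
print(f"best-fit period: {best:.0f} days")  # peaks near the injected period
```

A real search over 247,000 light curves would also need a significance test against the stochastic red-noise variability (e.g., a damped random walk) that quasars exhibit even without a binary companion.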
Massive Datasets in Astronomy
Astronomy has a long history of acquiring, systematizing, and interpreting
large quantities of data. Starting from the earliest sky atlases through the
first major photographic sky surveys of the 20th century, this tradition is
continuing today, and at an ever increasing rate.
Like many other fields, astronomy has become a very data-rich science, driven
by the advances in telescope, detector, and computer technology. Numerous large
digital sky surveys and archives already exist, with information content
measured in multiple Terabytes, and even larger, multi-Petabyte data sets are
on the horizon. Systematic observations of the sky, over a range of
wavelengths, are becoming the primary source of astronomical data. Numerical
simulations are also producing comparable volumes of information. Data mining
promises to both make the scientific utilization of these data sets more
effective and more complete, and to open completely new avenues of astronomical
research.
Technological problems range from the issues of database design and
federation, to data mining and advanced visualization, leading to a new toolkit
for astronomical research. This is similar to challenges encountered in other
data-intensive fields today.
These advances are now being organized through a concept of the Virtual
Observatories, federations of data archives and services representing a new
information infrastructure for astronomy of the 21st century. In this article,
we provide an overview of some of the major datasets in astronomy, discuss
different techniques used for archiving data, and conclude with a discussion of
the future of massive datasets in astronomy.
Comment: 46 pages, 21 figures, invited review for the Handbook of Massive Datasets, editors J. Abello, P. Pardalos, and M. Resende. Due to space limitations this version has low resolution figures. For the full resolution review see http://www.astro.caltech.edu/~rb/publications/hmds.ps.g