
    Some statistical and computational challenges, and opportunities in astronomy

    The data complexity and volume of astronomical findings have increased in recent decades due to major technological improvements in instrumentation and data collection methods. The contemporary astronomer is flooded with terabytes of raw data that produce enormous multidimensional catalogs of objects (stars, galaxies, quasars, etc.) numbering in the billions, with hundreds of measured numbers for each object. The astronomical community thus faces a key task: to enable efficient and objective scientific exploitation of enormous multifaceted data sets and the complex links between data and astrophysical theory. In recognition of this task, the National Virtual Observatory (NVO) initiative recently emerged to federate numerous large digital sky archives, and to develop tools to explore and understand these vast volumes of data. The effective use of such integrated massive data sets presents a variety of new and challenging statistical and algorithmic problems that require methodological advances. An interdisciplinary team of statisticians, astronomers, and computer scientists from The Pennsylvania State University, California Institute of Technology, and Carnegie Mellon University is developing statistical methodology for the NVO. A brief glimpse into the Virtual Observatory and the work of the Penn State-led team is provided here.

    Multivariate statistical analysis software technologies for astrophysical research involving large data bases

    We developed a package to process and analyze the data from the digital version of the Second Palomar Sky Survey. This system, called SKICAT, incorporates the latest in machine learning and expert systems software technology, in order to classify the detected objects objectively and uniformly, and to facilitate handling of the enormous data sets from digital sky surveys and other sources. The system provides a powerful, integrated environment for the manipulation and scientific investigation of catalogs from virtually any source. It serves three principal functions: image catalog construction, catalog management, and catalog analysis. Through use of the GID3* decision tree induction software, SKICAT automates the process of classifying objects within CCD and digitized plate images. To exploit these catalogs, the system also provides tools to merge them into a large, complete database which may be easily queried and modified when new data or better methods of calibration or classification become available. The most innovative feature of SKICAT is the facility it provides to experiment with and apply the latest in machine learning technology to the tasks of catalog construction and analysis. SKICAT provides a unique environment for implementing these tools for any number of future scientific purposes. Initial scientific verification and performance tests have been made using galaxy counts and measurements of galaxy clustering from small subsets of the survey data, and a search for very high redshift quasars. All of the tests were successful, and produced new and interesting scientific results. Attachments to this report give detailed accounts of the technical aspects of a package for multivariate statistical analysis of small and moderate-size data sets, called STATPROG. The package was tested extensively on a number of real scientific applications, and has produced real, published results.
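The abstract describes GID3* decision-tree induction over measured image attributes. As a minimal, hypothetical sketch of that idea — a single-threshold "stump" on an invented concentration-index feature, not SKICAT's actual attributes or algorithm:

```python
# Minimal decision-stump sketch of attribute-based star/galaxy separation.
# The feature name and training values are invented illustrations; SKICAT's
# GID3* induction learns full multi-attribute decision trees.

def best_stump(values, labels):
    """Find the threshold on one feature that minimizes training errors.
    Objects with feature > threshold are called 'galaxy', else 'star'."""
    best = (None, len(labels) + 1)
    for t in sorted(set(values)):
        errors = sum(
            (v > t) != (lab == "galaxy")
            for v, lab in zip(values, labels)
        )
        if errors < best[1]:
            best = (t, errors)
    return best  # (threshold, number of training errors)

# Toy training set: a "concentration index" larger for extended sources.
conc  = [0.8, 0.9, 1.0, 1.1, 2.0, 2.2, 2.5, 2.7]
truth = ["star"] * 4 + ["galaxy"] * 4

threshold, errs = best_stump(conc, truth)
classify = lambda v: "galaxy" if v > threshold else "star"
```

A real induction system recurses this kind of split over many attributes to grow a tree; the point here is only the objective, data-driven nature of the classification rule.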

    Astrophysics in the Era of Massive Time-Domain Surveys

    Synoptic sky surveys are now the largest data producers in astronomy, entering the Petascale regime, opening the time domain for a systematic exploration. A great variety of interesting phenomena, spanning essentially all subfields of astronomy, can only be studied in the time domain, and these new surveys are producing large statistical samples of the known types of objects and events for further studies (e.g., SNe, AGN, variable stars of many kinds), and have already uncovered previously unknown subtypes of these (e.g., rare or peculiar types of SNe). These surveys are generating a new science, and paving the way for even larger surveys to come, e.g., the LSST; our ability to fully exploit such forthcoming facilities depends critically on the science, methodology, and experience that are being accumulated now. Among the outstanding challenges, the foremost is our ability to conduct an effective follow-up of the interesting events discovered by the surveys in any wavelength regime. The follow-up resources, especially spectroscopy, are already and, for the predictable future, will be severely limited, thus requiring an intelligent down-selection of the most astrophysically interesting events to follow. The first step in that process is an automated, real-time, iterative classification of events, that incorporates heterogeneous data from the surveys themselves, archival and contextual information (spatial, temporal, and multiwavelength), and the incoming follow-up observations. The second step is an optimal automated event prioritization and allocation of the available follow-up resources that also change in time. Both of these challenges are highly non-trivial, and require a strong cyber-infrastructure based on the Virtual Observatory data grid, and the various astroinformatics efforts. Time domain astronomy is inherently an astronomy of telescope-computational systems, and will increasingly depend on novel machine learning and artificial intelligence tools. 
Another arena with a strong potential for discovery is a purely archival, non-time-critical exploration of the time domain, with the time dimension adding complexity to the already challenging problem of data mining in the high-dimensional parameter spaces produced by sky surveys.
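The follow-up prioritization step described above can be caricatured with a toy scoring function; the score and its inputs here are invented placeholders, not the actual classifier outputs or resource models these surveys use:

```python
import heapq

# Toy sketch of allocating scarce follow-up slots to transient events.
# "rarity" stands in for a classifier's estimate of how unusual an event
# is; "mag" is its magnitude (lower = brighter). Both the scoring formula
# and the field names are hypothetical illustrations.

def priority(event):
    # Rarer and brighter events (cheaper to follow up) score higher.
    return event["rarity"] * (25.0 - event["mag"])

def select_followup(events, n_slots):
    """Return the n_slots highest-priority events for the night."""
    return heapq.nlargest(n_slots, events, key=priority)

events = [
    {"id": "evt1", "rarity": 0.9, "mag": 18.0},  # rare, fairly bright
    {"id": "evt2", "rarity": 0.1, "mag": 15.0},  # common, bright
    {"id": "evt3", "rarity": 0.8, "mag": 21.0},  # rare, faint
]
picks = select_followup(events, 2)
```

The real problem is iterative — scores change as new photometry, archival, and contextual data arrive — but the core down-selection is this kind of ranking under a hard resource budget.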

    The National Virtual Observatory

    As a scientific discipline, Astronomy is rather unique. We have only one laboratory, the Universe, and we cannot, of course, change the initial conditions and study the resulting effects. On top of this, acquiring astronomical data has historically been a very labor-intensive effort. As a result, data have traditionally been preserved for posterity. With recent technological advances, however, the rate at which we acquire new data has grown exponentially, generating a Data Tsunami whose wave train threatens to overwhelm the field. In this conference proceedings, we present and define the concept of virtual observatories, which we feel is the only logical answer to this dilemma. Comment: 5 pages, uses newpasp.sty (included), to appear in "Extragalactic Gas at Low Redshift", ASP Conf. Series, J. S. Mulchaey and J. T. Stocke (eds.)

    Striking Photospheric Abundance Anomalies in Blue Horizontal-Branch Stars in Globular Cluster M13

    High-resolution optical spectra of thirteen blue horizontal-branch (BHB) stars in the globular cluster M13 show enormous deviations in element abundances from the expected cluster metallicity. In the hotter stars (T_eff > 12000 K), helium is depleted by factors of 10 to 100 below solar, while iron is enhanced to three times the solar abundance, two orders of magnitude above the canonical metallicity [Fe/H] ~= -1.5 dex for this globular cluster. Nitrogen, phosphorus, and chromium exhibit even more pronounced enhancements, and other metals are also mildly overabundant, with the exception of magnesium, which stays very near the expected cluster metallicity. These photospheric anomalies are most likely due to diffusion --- gravitational settling of helium, and radiative levitation of the other elements --- in the stable radiative atmospheres of these hot stars. The effects of these mechanisms may have some impact on the photometric morphology of the cluster's horizontal branch and on estimates of its age and distance. Comment: 11 pages, 1 Postscript figure, uses aaspp4.sty, accepted for publication in ApJ Letters
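The quoted iron enhancement can be checked directly in the standard bracket notation:

```latex
[\mathrm{Fe}/\mathrm{H}] \equiv
  \log_{10}\!\left(\frac{N_\mathrm{Fe}}{N_\mathrm{H}}\right)_{\!\ast}
  - \log_{10}\!\left(\frac{N_\mathrm{Fe}}{N_\mathrm{H}}\right)_{\!\odot},
\qquad
[\mathrm{Fe}/\mathrm{H}]_{\mathrm{obs}} \approx \log_{10} 3 \approx +0.5 .
```

Relative to the canonical cluster value of $-1.5$ dex, this is $\Delta[\mathrm{Fe}/\mathrm{H}] \approx 2.0$ dex, i.e. a factor of $10^{2} = 100$ — the "two orders of magnitude" quoted above.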

    A possible close supermassive black-hole binary in a quasar with optical periodicity

    Quasars have long been known to be variable sources at all wavelengths. Their optical variability is stochastic, can be due to a variety of physical mechanisms, and is well described statistically in terms of a damped random walk model. The recent availability of large collections of astronomical time series of flux measurements (light curves) offers new data sets for a systematic exploration of quasar variability. Here we report on the detection of a strong, smooth periodic signal in the optical variability of the quasar PG 1302-102, with a mean observed period of 1,884 ± 88 days. It was identified in a search for periodic variability in a data set of light curves for 247,000 known, spectroscopically confirmed quasars with a temporal baseline of ∼9 years. While the interpretation of this phenomenon is still uncertain, the most plausible mechanisms involve a binary system of two supermassive black holes with a subparsec separation. Such systems are an expected consequence of galaxy mergers and can provide important constraints on models of galaxy formation and evolution. Comment: 19 pages, 6 figures. Published online by Nature on 7 January 2015
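The damped random walk mentioned above is an Ornstein-Uhlenbeck process, against which a smooth periodic signal stands out. A minimal simulation sketch of such a light curve — the timescale and amplitude values are illustrative, not fitted parameters for PG 1302-102:

```python
import math
import random

# Damped random walk (Ornstein-Uhlenbeck) sketch of stochastic quasar
# variability. Uses the exact discrete update, so sigma is the asymptotic
# (stationary) standard deviation. tau and sigma here are illustrative.

def simulate_drw(n, dt, tau, sigma, mean=0.0, seed=42):
    """Simulate n samples of a DRW with damping timescale tau."""
    rng = random.Random(seed)
    decay = math.exp(-dt / tau)                     # per-step memory
    kick = sigma * math.sqrt(1.0 - decay * decay)   # per-step noise scale
    x, out = mean, []
    for _ in range(n):
        x = mean + (x - mean) * decay + kick * rng.gauss(0.0, 1.0)
        out.append(x)
    return out

# 20,000 daily samples with a 200-day damping time, 0.2 mag amplitude.
lc = simulate_drw(n=20000, dt=1.0, tau=200.0, sigma=0.2)
rms = math.sqrt(sum(v * v for v in lc) / len(lc))
```

A periodicity search like the one reported here amounts to asking whether a sinusoid fits the data significantly better than this stochastic baseline alone.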

    Massive Datasets in Astronomy

    Astronomy has a long history of acquiring, systematizing, and interpreting large quantities of data. Starting from the earliest sky atlases through the first major photographic sky surveys of the 20th century, this tradition is continuing today, and at an ever increasing rate. Like many other fields, astronomy has become a very data-rich science, driven by the advances in telescope, detector, and computer technology. Numerous large digital sky surveys and archives already exist, with information content measured in multiple Terabytes, and even larger, multi-Petabyte data sets are on the horizon. Systematic observations of the sky, over a range of wavelengths, are becoming the primary source of astronomical data. Numerical simulations are also producing comparable volumes of information. Data mining promises to both make the scientific utilization of these data sets more effective and more complete, and to open completely new avenues of astronomical research. Technological problems range from the issues of database design and federation, to data mining and advanced visualization, leading to a new toolkit for astronomical research. This is similar to challenges encountered in other data-intensive fields today. These advances are now being organized through a concept of the Virtual Observatories, federations of data archives and services representing a new information infrastructure for astronomy of the 21st century. In this article, we provide an overview of some of the major datasets in astronomy, discuss different techniques used for archiving data, and conclude with a discussion of the future of massive datasets in astronomy. Comment: 46 pages, 21 figures, invited review for the Handbook of Massive Datasets, editors J. Abello, P. Pardalos, and M. Resende. Due to space limitations this version has low-resolution figures. For the full-resolution review see http://www.astro.caltech.edu/~rb/publications/hmds.ps.g