Data Mining the SDSS SkyServer Database
An earlier paper (Szalay et al., "Designing and Mining Multi-Terabyte Astronomy Archives: The Sloan Digital Sky Survey," ACM SIGMOD 2000) described the Sloan Digital Sky Survey's (SDSS) data management needs by defining twenty database queries and twelve data visualization tasks that a good data management system should support. We built a database and interfaces that support this query load, as well as a website for ad-hoc access. This paper describes the database design and the data loading pipeline, and reports on query implementation and performance. Each query typically translated to a single SQL statement. Most queries run in less than 20 seconds, allowing scientists to explore the database interactively. This paper is an in-depth tour of those queries. Readers should first study the companion overview paper, Szalay et al., "The SDSS SkyServer, Public Access to the Sloan Digital Sky Server Data," ACM SIGMOD 2002.
Comment: 40 pages. Original source is at http://research.microsoft.com/~gray/Papers/MSR_TR_O2_01_20_queries.do
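The abstract notes that each query usually became a single SQL statement. As a rough illustration only, the sketch below uses Python's built-in `sqlite3` module with a hypothetical, heavily simplified `PhotoObj` table (columns `objID`, `ra`, `dec`, `r`, `type` are assumptions chosen for the example). The real SkyServer ran on SQL Server with a much richer schema and dedicated spatial indexing, none of which is modeled here. The whole search (galaxies brighter than a magnitude limit inside an ra/dec box) is one statement.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a hypothetical, simplified PhotoObj table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE PhotoObj ("
        "objID INTEGER PRIMARY KEY, ra REAL, dec REAL, r REAL, type TEXT)"
    )
    # A composite index lets SQLite narrow the ra range; the production
    # system used its own spatial indexing, which is not reproduced here.
    conn.execute("CREATE INDEX idx_radec ON PhotoObj(ra, dec)")
    # Made-up rows for illustration only, not SDSS data.
    conn.executemany(
        "INSERT INTO PhotoObj VALUES (?, ?, ?, ?, ?)",
        [
            (1, 185.0, 15.0, 17.2, "GALAXY"),
            (2, 185.1, 15.1, 19.5, "GALAXY"),
            (3, 185.2, 15.2, 16.0, "STAR"),
            (4, 190.0, 20.0, 15.0, "GALAXY"),
        ],
    )
    return conn


def bright_galaxies_in_box(conn, ra_min, ra_max, dec_min, dec_max, r_limit):
    """Return objIDs of galaxies in an ra/dec box with r magnitude below r_limit.

    The entire search is a single parameterized SELECT statement.
    """
    sql = """
        SELECT objID
        FROM PhotoObj
        WHERE type = 'GALAXY'
          AND ra  BETWEEN ? AND ?
          AND dec BETWEEN ? AND ?
          AND r < ?
        ORDER BY objID
    """
    rows = conn.execute(sql, (ra_min, ra_max, dec_min, dec_max, r_limit))
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = make_demo_db()
    print(bright_galaxies_in_box(db, 184.0, 186.0, 14.0, 16.0, 18.0))  # prints [1]
```

Keeping the filter in one declarative statement leaves the choice of access path (index scan versus table scan) to the query optimizer, which is the property that makes single-statement queries attractive for interactive exploration.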
Online Scientific Data Curation, Publication, and Archiving
Science projects are data publishers. The scale and complexity of current and future science data change the nature of the publication process. Publication is becoming a major project component. At a minimum, a project must preserve the ephemeral data it gathers. Derived data can be reconstructed from metadata, but metadata is ephemeral. Longer term, a project should expect some archive to preserve the data. We observe that published scientific data needs to be available forever: this gives rise to the data pyramid of versions and to data inflation, where the derived data volumes explode. As an example, this article describes the Sloan Digital Sky Survey (SDSS) strategies for data publication, data access, curation, and preservation.
Comment: original at http://research.microsoft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2002-7
The Eighteenth Data Release of the Sloan Digital Sky Surveys: Targeting and First Spectra from SDSS-V
The eighteenth data release of the Sloan Digital Sky Surveys (SDSS) is the first one for SDSS-V, the fifth generation of the survey. SDSS-V comprises three primary scientific programs, or "Mappers": Milky Way Mapper (MWM), Black Hole Mapper (BHM), and Local Volume Mapper (LVM). This data release contains extensive targeting information for the two multi-object spectroscopy programs (MWM and BHM), including input catalogs and selection functions for their numerous scientific objectives. We describe the production of the targeting databases and their calibration- and scientifically-focused components. DR18 also includes ~25,000 new SDSS spectra and supplemental information for X-ray sources identified by eROSITA in its eFEDS field. We present updates to some of the SDSS software pipelines and preview changes anticipated for DR19. We also describe three value-added catalogs (VACs) based on SDSS-IV data that have been published since DR17, and one VAC based on the SDSS-V data in the eFEDS field.
Comment: Accepted to ApJ
The Seventeenth Data Release of the Sloan Digital Sky Surveys: Complete Release of MaNGA, MaStar and APOGEE-2 Data
This paper documents the seventeenth data release (DR17) from the Sloan Digital Sky Surveys, the fifth and final release from the fourth phase (SDSS-IV). DR17 contains the complete release of the Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey, which reached its goal of surveying over 10,000 nearby galaxies. The complete release of the MaNGA Stellar Library (MaStar) accompanies this data, providing observations of almost 30,000 stars through the MaNGA instrument during bright time. DR17 also contains the complete release of the Apache Point Observatory Galactic Evolution Experiment 2 (APOGEE-2) survey, which publicly releases infrared spectra of over 650,000 stars. The main sample from the Extended Baryon Oscillation Spectroscopic Survey (eBOSS), as well as the sub-survey Time Domain Spectroscopic Survey (TDSS) data, were fully released in DR16. New single-fiber optical spectroscopy released in DR17 is from the SPectroscopic IDentification of ERosita Survey (SPIDERS) sub-survey and the eBOSS-RM program. Along with the primary data sets, DR17 includes 25 new or updated Value Added Catalogs (VACs). This paper concludes the release of SDSS-IV survey data. SDSS continues into its fifth phase with observations already underway for the Milky Way Mapper (MWM), Local Volume Mapper (LVM), and Black Hole Mapper (BHM) surveys.
Jim Gray, Microsoft Research
Science projects are data publishers. The scale and complexity of current and future science data change the nature of the publication process. Publication is becoming a major project component. At a minimum, a project must preserve the ephemeral data it gathers. Derived data can be reconstructed from metadata, but metadata is ephemeral. Longer term, a project should expect some archive to preserve the data. We observe that published scientific data needs to be available forever: this gives rise to the data pyramid of versions and to data inflation, where the derived data volumes explode. As an example, this article describes the Sloan Digital Sky Survey (SDSS) strategies for data publication, data access, curation, and preservation.