18 research outputs found
Data Mining the SDSS SkyServer Database
An earlier paper (Szalay et. al. "Designing and Mining MultiTerabyte
Astronomy Archives: The Sloan Digital Sky Survey," ACM SIGMOD 2000) described
the Sloan Digital Sky Survey's (SDSS) data management needs by defining twenty
database queries and twelve data visualization tasks that a good data
management system should support. We built a database and interfaces to support
both the query load and also a website for ad-hoc access. This paper reports on
the database design, describes the data loading pipeline, and reports on the
query implementation and performance. The queries typically translated to a
single SQL statement. Most queries run in less than 20 seconds, allowing
scientists to interactively explore the database. This paper is an in-depth
tour of those queries. Readers should first have studied the companion overview
paper Szalay et. al. "The SDSS SkyServer, Public Access to the Sloan Digital
Sky Server Data" ACM SIGMOND 2002.Comment: 40 pages, Original source is at
http://research.microsoft.com/~gray/Papers/MSR_TR_O2_01_20_queries.do
Massive Stochastic Testing of SQL
: Deterministic testing of SQL database systems is human intensive and cannot adequately cover the SQL input domain. A system (RAGS), was built to stochastically generate valid SQL statements 1 million times faster than a human and execute them. This paper describes RAGS and the results from turning it lose on several commercial SQL systems. 1. Testing SQL is Hard Good test coverage of commercial SQL database systems is very hard. The input domain, all SQL statements, from any number of users, combined with all states of the database, is gigantic. It is also difficult to verify output for positive tests because the semantics of SQL are complicated. Software engineering technology exists to predictably improve quality ([1] for example). The techniques involve a software development process including unit tests and final system validation tests (to verify the absence of bugs). This process requires a substantial investment so commercial SQL vendors with tight schedules tend to use a m..
Microsoft TerraServer: A Spatial Data Warehouse
Microsoft TerraServer stores aerial, satellite, and topographic images of the earth in a SQL database available via the Internet. It is the world's largest online atlas, combining five terabytes of image data from the United States Geological Survey (USGS) and SPIN-2. Internet browsers provide intuitive spatial and text interfaces to the data. Users need no special hardware, software, or knowledge to locate and browse imagery. This paper describes how terabytes of "Internet unfriendly" geo-spatial images were scrubbed and edited into hundreds of millions of "Internet friendly" image tiles and loaded into a SQL data warehouse. Microsoft TerraServer demonstrates that generalpurpose relational database technology can manage large scale image repositories, and shows that web browsers can be a good geospatial image presentation system. 1. Overview The TerraServer is the world's largest public repository of highresolution aerial, satellite, and topographic data. It is designed to be access..
Designing and mining multi-terabyte astronomy archives: The Sloan Digital Sky Survey
classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions fro
Tom Barclay
Microsoft TerraServer stores aerial, satellite, and topographic images of the earth in a SQL database available via the Internet. It is the world's largest online atlas, combining eight terabytes of image data from the United States Geological Survey (USGS) and SPIN-2. Internet browsers provide intuitive spatial and text interfaces to the data. Users need no special hardware, software, or knowledge to locate and browse imagery. This paper describes how terabytes of "Internet unfriendly" geo-spatial images were scrubbed and edited into hundreds of millions of "Internet friendly" image tiles and loaded into a SQL data warehouse. All meta-data and imagery are stored in the SQL database. TerraServer demonstrates that general-purpose relational database technology can manage large scale image repositories, and shows that web browsers can be a good geo-spatial image presentation system