18 research outputs found

    Data Mining the SDSS SkyServer Database

    An earlier paper (Szalay et al., "Designing and Mining Multi-Terabyte Astronomy Archives: The Sloan Digital Sky Survey," ACM SIGMOD 2000) described the Sloan Digital Sky Survey's (SDSS) data management needs by defining twenty database queries and twelve data visualization tasks that a good data management system should support. We built a database and interfaces to support both the query load and a website for ad-hoc access. This paper reports on the database design, describes the data loading pipeline, and reports on the query implementation and performance. The queries typically translated to a single SQL statement. Most queries run in less than 20 seconds, allowing scientists to interactively explore the database. This paper is an in-depth tour of those queries. Readers should first have studied the companion overview paper, Szalay et al., "The SDSS SkyServer, Public Access to the Sloan Digital Sky Server Data," ACM SIGMOD 2002. Comment: 40 pages. Original source is at http://research.microsoft.com/~gray/Papers/MSR_TR_O2_01_20_queries.do

    Massive Stochastic Testing of SQL

    Deterministic testing of SQL database systems is human intensive and cannot adequately cover the SQL input domain. A system (RAGS) was built to stochastically generate valid SQL statements 1 million times faster than a human and execute them. This paper describes RAGS and the results from turning it loose on several commercial SQL systems.
    1. Testing SQL is Hard
    Good test coverage of commercial SQL database systems is very hard. The input domain, all SQL statements, from any number of users, combined with all states of the database, is gigantic. It is also difficult to verify output for positive tests because the semantics of SQL are complicated. Software engineering technology exists to predictably improve quality ([1] for example). The techniques involve a software development process including unit tests and final system validation tests (to verify the absence of bugs). This process requires a substantial investment, so commercial SQL vendors with tight schedules tend to use a m..

    Microsoft TerraServer: A Spatial Data Warehouse

    Microsoft TerraServer stores aerial, satellite, and topographic images of the earth in a SQL database available via the Internet. It is the world's largest online atlas, combining five terabytes of image data from the United States Geological Survey (USGS) and SPIN-2. Internet browsers provide intuitive spatial and text interfaces to the data. Users need no special hardware, software, or knowledge to locate and browse imagery. This paper describes how terabytes of "Internet unfriendly" geo-spatial images were scrubbed and edited into hundreds of millions of "Internet friendly" image tiles and loaded into a SQL data warehouse. Microsoft TerraServer demonstrates that general-purpose relational database technology can manage large-scale image repositories, and shows that web browsers can be a good geo-spatial image presentation system.
    1. Overview
    The TerraServer is the world's largest public repository of high-resolution aerial, satellite, and topographic data. It is designed to be access..

    Designing and mining multi-terabyte astronomy archives: The Sloan Digital Sky Survey


    Tom Barclay

    Microsoft TerraServer stores aerial, satellite, and topographic images of the earth in a SQL database available via the Internet. It is the world's largest online atlas, combining eight terabytes of image data from the United States Geological Survey (USGS) and SPIN-2. Internet browsers provide intuitive spatial and text interfaces to the data. Users need no special hardware, software, or knowledge to locate and browse imagery. This paper describes how terabytes of "Internet unfriendly" geo-spatial images were scrubbed and edited into hundreds of millions of "Internet friendly" image tiles and loaded into a SQL data warehouse. All meta-data and imagery are stored in the SQL database. TerraServer demonstrates that general-purpose relational database technology can manage large-scale image repositories, and shows that web browsers can be a good geo-spatial image presentation system.