    Petabyte Scale Data Mining: Dream or Reality?

    Science is becoming very data intensive [1]. Today's astronomy datasets with tens of millions of galaxies already present substantial challenges for data mining. In less than 10 years the catalogs are expected to grow to billions of objects, and image archives will reach Petabytes. Imagine having a 100GB database in 1996, when disk scanning speeds were 30MB/s, and database tools were immature. Such a task today is trivial, almost manageable with a laptop. We think that the issue of a PB database will be very similar in six years. In this paper we scale our current experiments in data archiving and analysis on the Sloan Digital Sky Survey [2,3] data six years into the future. We analyze these projections and look at the requirements of performing data mining on such data sets. We conclude that the task scales rather well: we could do the job today, although it would be expensive. There do not seem to be any show-stoppers that would prevent us from storing and using a Petabyte dataset six years from today. Comment: originals at http://research.microsoft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2002-8
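
    The scan-time comparison in this abstract can be checked with a short calculation. The sketch below is illustrative only: the 100GB size and 30MB/s rate come from the abstract, while the decimal unit convention, the one-hour target, and the function names are assumptions made for the example, not figures from the paper.

```python
import math

# Assumption: decimal units (1 GB = 1000 MB, 1 PB = 1,000,000 GB).
MB_PER_GB = 1000
GB_PER_PB = 1000 * 1000


def scan_seconds(size_gb: float, rate_mb_per_s: float, n_disks: int = 1) -> float:
    """Seconds to read size_gb once, striped evenly over n_disks."""
    if rate_mb_per_s <= 0 or n_disks < 1:
        raise ValueError("rate must be positive and n_disks at least 1")
    return size_gb * MB_PER_GB / (rate_mb_per_s * n_disks)


def disks_needed(size_gb: float, rate_mb_per_s: float, target_seconds: float) -> int:
    """Smallest number of parallel disks that finishes one scan in target_seconds."""
    if target_seconds <= 0:
        raise ValueError("target_seconds must be positive")
    return math.ceil(size_gb * MB_PER_GB / (rate_mb_per_s * target_seconds))


if __name__ == "__main__":
    # 100 GB at 30 MB/s on a single disk (figures from the abstract).
    minutes = scan_seconds(100, 30) / 60
    print(f"100 GB at 30 MB/s: {minutes:.1f} minutes")

    # Hypothetical: how many 30 MB/s disks would scan 1 PB within one hour?
    print("disks for 1 PB in 1 hour:", disks_needed(GB_PER_PB, 30, 3600))
```

    At 1996 speeds a single disk needs close to an hour for one pass over 100GB, which is why the abstract treats parallelism and faster hardware, rather than any change of method, as the route to Petabyte scale.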

    Data Mining the SDSS SkyServer Database

    An earlier paper (Szalay et al. "Designing and Mining MultiTerabyte Astronomy Archives: The Sloan Digital Sky Survey," ACM SIGMOD 2000) described the Sloan Digital Sky Survey's (SDSS) data management needs by defining twenty database queries and twelve data visualization tasks that a good data management system should support. We built a database and interfaces to support both the query load and also a website for ad-hoc access. This paper reports on the database design, describes the data loading pipeline, and reports on the query implementation and performance. The queries typically translated to a single SQL statement. Most queries run in less than 20 seconds, allowing scientists to interactively explore the database. This paper is an in-depth tour of those queries. Readers should first have studied the companion overview paper Szalay et al. "The SDSS SkyServer, Public Access to the Sloan Digital Sky Server Data" ACM SIGMOD 2002. Comment: 40 pages, Original source is at http://research.microsoft.com/~gray/Papers/MSR_TR_O2_01_20_queries.do

    Online Scientific Data Curation, Publication, and Archiving

    Science projects are data publishers. The scale and complexity of current and future science data changes the nature of the publication process. Publication is becoming a major project component. At a minimum, a project must preserve the ephemeral data it gathers. Derived data can be reconstructed from metadata, but metadata is ephemeral. Longer term, a project should expect some archive to preserve the data. We observe that published scientific data needs to be available forever; this gives rise to the data pyramid of versions and to data inflation where the derived data volumes explode. As an example, this article describes the Sloan Digital Sky Survey (SDSS) strategies for data publication, data access, curation, and preservation. Comment: original at http://research.microsoft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2002-7

    Galaxy Zoo: Disentangling the Environmental Dependence of Morphology and Colour

    We analyze the environmental dependence of galaxy morphology and colour with two-point clustering statistics, using data from the Galaxy Zoo, the largest sample of visually classified morphologies yet compiled, extracted from the Sloan Digital Sky Survey. We present two-point correlation functions of spiral and early-type galaxies, and we quantify the correlation between morphology and environment with marked correlation functions. These yield clear and precise environmental trends across a wide range of scales, analogous to similar measurements with galaxy colours, indicating that the Galaxy Zoo classifications themselves are very precise. We measure morphology marked correlation functions at fixed colour and find that they are relatively weak, with the only residual correlation being that of red galaxies at small scales, indicating a morphology gradient within haloes for red galaxies. At fixed morphology, we find that the environmental dependence of colour remains strong, and these correlations remain for fixed morphology and luminosity. An implication of this is that much of the morphology-density relation is due to the relation between colour and density. Our results also have implications for galaxy evolution: the morphological transformation of galaxies is usually accompanied by a colour transformation, but not necessarily vice versa. A spiral galaxy may move onto the red sequence of the colour-magnitude diagram without quickly becoming an early-type. We analyze the significant population of red spiral galaxies, and present evidence that they tend to be located in moderately dense environments and are often satellite galaxies in the outskirts of haloes. Finally, we combine our results to argue that central and satellite galaxies tend to follow different evolutionary paths. Comment: 19 pages, 18 figures. Accepted for publication in MNRA
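
    For readers unfamiliar with marked correlation functions, the following is a minimal sketch of one common pair-weighted definition: the mean product of the marks over all pairs in a separation bin, divided by the square of the overall mean mark. The abstract does not state which estimator the authors use, so this form is an assumption, as are the function name, the brute-force pair loop, and the toy inputs. Survey geometry, edge corrections, and redshift-space effects are ignored.

```python
import math
from itertools import combinations


def marked_correlation(points, marks, bin_edges):
    """Return M(r) for each separation bin, or None for an empty bin.

    M(r) = sum(m_i * m_j over pairs in the bin) / (n_pairs * mean(m)**2).
    Values above 1 mean that pairs at that separation carry larger marks
    than average; values near 1 mean the mark is independent of separation.
    """
    if len(points) != len(marks):
        raise ValueError("points and marks must have the same length")
    mean_mark = sum(marks) / len(marks)
    n_bins = len(bin_edges) - 1
    weighted = [0.0] * n_bins   # sum of m_i * m_j per bin
    counts = [0] * n_bins       # number of pairs per bin

    # Brute-force loop over all unique pairs: O(N^2), fine for a toy sample.
    for i, j in combinations(range(len(points)), 2):
        r = math.dist(points[i], points[j])
        for b in range(n_bins):
            if bin_edges[b] <= r < bin_edges[b + 1]:
                weighted[b] += marks[i] * marks[j]
                counts[b] += 1
                break

    return [
        w / (c * mean_mark ** 2) if c else None
        for w, c in zip(weighted, counts)
    ]


if __name__ == "__main__":
    # Toy data: two close pairs, with the high-mark objects sitting together.
    points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    marks = [2.0, 2.0, 1.0, 1.0]
    print(marked_correlation(points, marks, [0.0, 2.0, 20.0]))
```

    In the toy sample the small-separation bin comes out above 1 and the large-separation bin below 1, which is the kind of scale-dependent signal the abstract describes when the mark (for example a morphology likelihood) correlates with environment.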

    The SDSS SkyServer, Public Access to the Sloan Digital Sky Server Data

    The SkyServer provides Internet access to the public Sloan Digital Sky Survey (SDSS) data for both astronomers and for science education. This paper describes the SkyServer goals and architecture. It also describes our experience operating the SkyServer on the Internet. The SDSS data is public and well-documented so it makes a good test platform for research on database algorithms and performance. Comment: submitted for publication, original at http://research.microsoft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2001-10

    Galaxy Zoo: The large-scale spin statistics of spiral galaxies in the Sloan Digital Sky Survey

    We re-examine the evidence for a violation of large-scale statistical isotropy in the distribution of projected spin vectors of spiral galaxies. We have a sample of ~37,000 spiral galaxies from the Sloan Digital Sky Survey, with their line of sight spin direction confidently classified by members of the public through the online project Galaxy Zoo. After establishing and correcting for a certain level of bias in our handedness results we find the winding sense of the galaxies to be consistent with statistical isotropy. In particular we find no significant dipole signal, and thus no evidence for overall preferred handedness of the Universe. We compare this result to those of other authors and conclude that these may also be affected and explained by a bias effect. Comment: Accepted for publication in MNRAS. 8 pages, 5 figure

    Galaxy Zoo: Motivations of Citizen Scientists

    Citizen science, in which volunteers work with professional scientists to conduct research, is expanding due to large online datasets. To plan projects, it is important to understand volunteers' motivations for participating. This paper analyzes results from an online survey of nearly 11,000 volunteers in Galaxy Zoo, an astronomy citizen science project. Results show that volunteers' primary motivation is a desire to contribute to scientific research. We encourage other citizen science projects to study the motivations of their volunteers, to see whether and how these results may be generalized to inform the field of citizen science. Comment: 41 pages, including 6 figures and one appendix. In press at Astronomy Education Revie

    Galaxy Zoo: Exploring the Motivations of Citizen Science Volunteers

    The Galaxy Zoo citizen science website invites anyone with an Internet connection to participate in research by classifying galaxies from the Sloan Digital Sky Survey. As of April 2009, more than 200,000 volunteers had made more than 100 million galaxy classifications. In this paper, we present results of a pilot study into the motivations and demographics of Galaxy Zoo volunteers, and define a technique to determine motivations from free responses that can be used in larger multiple-choice surveys with similar populations. Our categories form the basis for a future survey, with the goal of determining the prevalence of each motivation. Comment: 15 pages, 3 figure

    Visuo-spatial ability in colonoscopy simulator training

    Visuo-spatial ability is associated with the quality of performance in a variety of surgical and medical skills. However, visuo-spatial ability is typically assessed using Visualization tests only, which has led to an incomplete understanding of the involvement of visuo-spatial ability in these skills. To remedy this situation, the current study investigated the role of a broad range of visuo-spatial factors in colonoscopy simulator training. Fifteen medical trainees (no clinical experience in colonoscopy) participated in two psychometric test sessions to assess four visuo-spatial ability factors. Next, participants trained flexible endoscope manipulation and navigation to the cecum on the GI Mentor II simulator, for four sessions within 1 week. Visualization, and to a lesser degree Spatial relations, were the only visuo-spatial ability factors to correlate with colonoscopy simulator performance. Visualization additionally covaried with learning rate for time on task on both simulator tasks. High Visualization ability indicated faster exercise completion. Similar to other endoscopic procedures, performance in colonoscopy is positively associated with Visualization, a visuo-spatial ability factor characterized by the ability to mentally manipulate complex visuo-spatial stimuli. The complexity of the visuo-spatial mental transformations required to successfully perform colonoscopy is likely responsible for the challenging nature of this technique, and should inform training and assessment design. Long-term training studies, as well as studies investigating the nature of visuo-spatial complexity in this domain, are needed to better understand the role of visuo-spatial ability in colonoscopy and other endoscopic techniques.