Petabyte Scale Data Mining: Dream or Reality?
Science is becoming very data intensive [1]. Today's astronomy datasets with
tens of millions of galaxies already present substantial challenges for data
mining. In less than 10 years the catalogs are expected to grow to billions of
objects, and image archives will reach Petabytes. Imagine having a 100GB
database in 1996, when disk scanning speeds were 30MB/s, and database tools
were immature. Such a task today is trivial, almost manageable with a laptop.
We think that the issue of a PB database will be very similar in six years. In
this paper we scale our current experiments in data archiving and analysis on
the Sloan Digital Sky Survey [2,3] data six years into the future. We analyze
these projections and look at the requirements of performing data mining on
such data sets. We conclude that the task scales rather well: we could do the
job today, although it would be expensive. There do not seem to be any
show-stoppers that would prevent us from storing and using a Petabyte dataset
six years from today. Comment: originals at
http://research.microsoft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2002-8
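The scaling argument in this abstract rests on simple scan-time arithmetic. The sketch below works it through using the one figure the abstract gives (a 100GB database read at 30MB/s). The helper names, the decimal unit convention, and the one-hour target are assumptions made for illustration. The abstract does not state future disk speeds, so the Petabyte figures only show how the problem divides across parallel disks.

```python
import math

# Decimal units are an assumption of this sketch, not stated in the abstract.
MB = 10**6
GB = 10**9
PB = 10**15


def scan_time_seconds(dataset_bytes, rate_bytes_per_s, n_disks=1):
    """Time for one full sequential scan, assuming ideal parallelism across disks."""
    if dataset_bytes < 0 or rate_bytes_per_s <= 0 or n_disks < 1:
        raise ValueError("size must be non-negative; rate and disk count positive")
    return dataset_bytes / (rate_bytes_per_s * n_disks)


def disks_needed(dataset_bytes, rate_bytes_per_s, target_seconds):
    """Smallest number of equally fast disks that finishes a scan within the target."""
    if rate_bytes_per_s <= 0 or target_seconds <= 0:
        raise ValueError("rate and target time must be positive")
    return math.ceil(dataset_bytes / (rate_bytes_per_s * target_seconds))


if __name__ == "__main__":
    # The 1996 scenario from the abstract: 100GB at 30MB/s on a single disk.
    t_1996 = scan_time_seconds(100 * GB, 30 * MB)
    print(f"100 GB at 30 MB/s: {t_1996 / 60:.1f} minutes")

    # Hypothetical: a Petabyte at the same per-disk rate, for illustration only.
    t_pb = scan_time_seconds(1 * PB, 30 * MB)
    print(f"1 PB at 30 MB/s on one disk: {t_pb / 86400:.1f} days")
    n = disks_needed(1 * PB, 30 * MB, 3600)
    print(f"Disks needed to scan 1 PB in one hour: {n}")
```

At the quoted rate a single disk reads 100GB in roughly 56 minutes, while a Petabyte at the same per-disk rate would take over a year. This is why any such projection depends on scanning many disks in parallel.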
Data Mining the SDSS SkyServer Database
An earlier paper (Szalay et al. "Designing and Mining MultiTerabyte
Astronomy Archives: The Sloan Digital Sky Survey," ACM SIGMOD 2000) described
the Sloan Digital Sky Survey's (SDSS) data management needs by defining twenty
database queries and twelve data visualization tasks that a good data
management system should support. We built a database and interfaces to support
both the query load and also a website for ad-hoc access. This paper reports on
the database design, describes the data loading pipeline, and reports on the
query implementation and performance. The queries typically translated to a
single SQL statement. Most queries run in less than 20 seconds, allowing
scientists to interactively explore the database. This paper is an in-depth
tour of those queries. Readers should first have studied the companion overview
paper Szalay et al. "The SDSS SkyServer, Public Access to the Sloan Digital
Sky Server Data" ACM SIGMOD 2002. Comment: 40 pages, Original source is at
http://research.microsoft.com/~gray/Papers/MSR_TR_O2_01_20_queries.do
Online Scientific Data Curation, Publication, and Archiving
Science projects are data publishers. The scale and complexity of current and
future science data changes the nature of the publication process. Publication
is becoming a major project component. At a minimum, a project must preserve
the ephemeral data it gathers. Derived data can be reconstructed from metadata,
but metadata is ephemeral. Longer term, a project should expect some archive to
preserve the data. We observe that published scientific data needs to be
available forever; this gives rise to the data pyramid of versions and to data
inflation where the derived data volumes explode. As an example, this article
describes the Sloan Digital Sky Survey (SDSS) strategies for data publication,
data access, curation, and preservation. Comment: original at
http://research.microsoft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2002-7
Galaxy Zoo: Disentangling the Environmental Dependence of Morphology and Colour
We analyze the environmental dependence of galaxy morphology and colour with
two-point clustering statistics, using data from the Galaxy Zoo, the largest
sample of visually classified morphologies yet compiled, extracted from the
Sloan Digital Sky Survey. We present two-point correlation functions of spiral
and early-type galaxies, and we quantify the correlation between morphology and
environment with marked correlation functions. These yield clear and precise
environmental trends across a wide range of scales, analogous to similar
measurements with galaxy colours, indicating that the Galaxy Zoo
classifications themselves are very precise. We measure morphology marked
correlation functions at fixed colour and find that they are relatively weak,
with the only residual correlation being that of red galaxies at small scales,
indicating a morphology gradient within haloes for red galaxies. At fixed
morphology, we find that the environmental dependence of colour remains strong,
and these correlations remain for fixed morphology and luminosity. An
implication of this is that much of the morphology-density relation is due to
the relation between colour and density. Our results also have implications for
galaxy evolution: the morphological transformation of galaxies is usually
accompanied by a colour transformation, but not necessarily vice versa. A
spiral galaxy may move onto the red sequence of the colour-magnitude diagram
without quickly becoming an early-type. We analyze the significant population
of red spiral galaxies, and present evidence that they tend to be located in
moderately dense environments and are often satellite galaxies in the outskirts
of haloes. Finally, we combine our results to argue that central and satellite
galaxies tend to follow different evolutionary paths. Comment: 19 pages, 18 figures. Accepted for publication in MNRA
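The marked correlation functions mentioned in this abstract can be illustrated with a brute-force sketch. It assumes the commonly used pair-weighted estimator M(r) = WW(r) / DD(r), where DD counts pairs in a separation bin and WW sums the product of the two marks, each normalised by the mean mark. The abstract does not specify the estimator, binning, or marks the authors used, so the function name, the toy positions, and the toy marks here are all hypothetical.

```python
import math
from itertools import combinations


def marked_correlation(positions, marks, bin_edges):
    """Estimate M(r) = WW(r) / DD(r) per separation bin by direct pair counting.

    positions: list of coordinate tuples (all the same dimension)
    marks:     one weight per object, e.g. a morphology likelihood (assumed)
    bin_edges: increasing separations; bin b covers [edge[b], edge[b+1])
    """
    if len(positions) != len(marks):
        raise ValueError("positions and marks must have the same length")
    mean_mark = sum(marks) / len(marks)
    if mean_mark == 0:
        raise ValueError("mean mark must be non-zero")

    n_bins = len(bin_edges) - 1
    dd = [0] * n_bins    # unweighted pair counts
    ww = [0.0] * n_bins  # pair counts weighted by normalised mark products

    for (p_i, m_i), (p_j, m_j) in combinations(zip(positions, marks), 2):
        r = math.dist(p_i, p_j)
        for b in range(n_bins):
            if bin_edges[b] <= r < bin_edges[b + 1]:
                dd[b] += 1
                ww[b] += (m_i * m_j) / mean_mark**2
                break

    # M(r) = 1 means the mark is uncorrelated with environment at that scale.
    return [w / d if d else float("nan") for w, d in zip(ww, dd)]


if __name__ == "__main__":
    # Toy data: like-marked objects sit close together, unlike ones far apart.
    toy_positions = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    toy_marks = [3.0, 3.0, 1.0, 1.0]
    print(marked_correlation(toy_positions, toy_marks, [0.0, 2.0, 20.0]))
```

Because the estimator is a ratio of pair counts over the same objects, this form needs no random catalogue. Values above 1 in a bin indicate that close pairs tend to carry larger-than-average marks. The quadratic pair loop is only practical for small samples.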
The SDSS SkyServer, Public Access to the Sloan Digital Sky Server Data
The SkyServer provides Internet access to the public Sloan Digital Sky Survey
(SDSS) data for both astronomers and for science education. This paper
describes the SkyServer goals and architecture. It also describes our
experience operating the SkyServer on the Internet. The SDSS data is public and
well-documented so it makes a good test platform for research on database
algorithms and performance. Comment: submitted for publication, original at
http://research.microsoft.com/scripts/pubs/view.asp?TR_ID=MSR-TR-2001-10
Galaxy Zoo: The large-scale spin statistics of spiral galaxies in the Sloan Digital Sky Survey
We re-examine the evidence for a violation of large-scale statistical
isotropy in the distribution of projected spin vectors of spiral galaxies. We
have a sample of spiral galaxies from the Sloan Digital Sky
Survey, with their line of sight spin direction confidently classified by
members of the public through the online project Galaxy Zoo. After establishing
and correcting for a certain level of bias in our handedness results we find
the winding sense of the galaxies to be consistent with statistical isotropy.
In particular we find no significant dipole signal, and thus no evidence for
overall preferred handedness of the Universe. We compare this result to those
of other authors and conclude that these may also be affected and explained by
a bias effect. Comment: Accepted for publication in MNRAS. 8 pages, 5 figure
Galaxy Zoo: Motivations of Citizen Scientists
Citizen science, in which volunteers work with professional scientists to
conduct research, is expanding due to large online datasets. To plan projects,
it is important to understand volunteers' motivations for participating. This
paper analyzes results from an online survey of nearly 11,000 volunteers in
Galaxy Zoo, an astronomy citizen science project. Results show that volunteers'
primary motivation is a desire to contribute to scientific research. We
encourage other citizen science projects to study the motivations of their
volunteers, to see whether and how these results may be generalized to inform
the field of citizen science. Comment: 41 pages, including 6 figures and one appendix. In press at Astronomy
Education Revie
Galaxy Zoo: Exploring the Motivations of Citizen Science Volunteers
The Galaxy Zoo citizen science website invites anyone with an Internet
connection to participate in research by classifying galaxies from the Sloan
Digital Sky Survey. As of April 2009, more than 200,000 volunteers had made
more than 100 million galaxy classifications. In this paper, we present results
of a pilot study into the motivations and demographics of Galaxy Zoo
volunteers, and define a technique to determine motivations from free responses
that can be used in larger multiple-choice surveys with similar populations.
Our categories form the basis for a future survey, with the goal of determining
the prevalence of each motivation. Comment: 15 pages, 3 figure
Visuo-spatial ability in colonoscopy simulator training
Visuo-spatial ability is associated with the quality of performance in a variety of surgical and medical skills. However, visuo-spatial ability is typically assessed using Visualization tests only, which has led to an incomplete understanding of the involvement of visuo-spatial ability in these skills. To remedy this situation, the current study investigated the role of a broad range of visuo-spatial factors in colonoscopy simulator training. Fifteen medical trainees (with no clinical experience in colonoscopy) participated in two psychometric test sessions to assess four visuo-spatial ability factors. Next, participants trained in flexible endoscope manipulation and navigation to the cecum on the GI Mentor II simulator for four sessions within one week. Visualization and, to a lesser degree, Spatial relations were the only visuo-spatial ability factors to correlate with colonoscopy simulator performance. Visualization additionally covaried with learning rate for time on task on both simulator tasks. High Visualization ability indicated faster exercise completion. Similar to other endoscopic procedures, performance in colonoscopy is positively associated with Visualization, a visuo-spatial ability factor characterized by the ability to mentally manipulate complex visuo-spatial stimuli. The complexity of the visuo-spatial mental transformations required to successfully perform colonoscopy is likely responsible for the challenging nature of this technique and should inform training and assessment design. Long-term training studies, as well as studies investigating the nature of visuo-spatial complexity in this domain, are needed to better understand the role of visuo-spatial ability in colonoscopy and other endoscopic techniques.