Immersive and Collaborative Data Visualization Using Virtual Reality Platforms
Effective data visualization is a key part of the discovery process in the
era of big data. It is the bridge between the quantitative content of the data
and human intuition, and thus an essential component of the scientific path
from data into knowledge and understanding. Visualization is also essential in
the data mining process, directing the choice of the applicable algorithms, and
in helping to identify and remove bad data from the analysis. However, a high
complexity or a high dimensionality of modern data sets represents a critical
obstacle. How do we visualize interesting structures and patterns that may
exist in hyper-dimensional data spaces? A better understanding of how we can
perceive and interact with multidimensional information poses some deep
questions in the field of cognition technology and human-computer interaction.
To this effect, we are exploring the use of immersive virtual reality platforms
for scientific data visualization, both as software and inexpensive commodity
hardware. These potentially powerful and innovative tools for multidimensional
data visualization can also provide an easy and natural path to a collaborative
data visualization and exploration, where scientists can interact with their
data and their colleagues in the same visual space. Immersion provides benefits
beyond the traditional desktop visualization tools: it leads to a demonstrably
better perception of a datascape geometry, more intuitive data understanding,
and a better retention of the perceived relationships in the data.
Comment: 6 pages, refereed proceedings of 2014 IEEE International Conference
on Big Data, page 609, ISBN 978-1-4799-5665-
Automated Real-Time Classification and Decision Making in Massive Data Streams from Synoptic Sky Surveys
The nature of scientific and technological data collection is evolving
rapidly: data volumes and rates grow exponentially, with increasing complexity
and information content, and there has been a transition from static data sets
to data streams that must be analyzed in real time. Interesting or anomalous
phenomena must be quickly characterized and followed up with additional
measurements via optimal deployment of limited assets. Modern astronomy
presents a variety of such phenomena in the form of transient events in digital
synoptic sky surveys, including cosmic explosions (supernovae, gamma ray
bursts), relativistic phenomena (black hole formation, jets), potentially
hazardous asteroids, etc. We have been developing a set of machine learning
tools to detect, classify and plan a response to transient events for astronomy
applications, using the Catalina Real-time Transient Survey (CRTS) as a
scientific and methodological testbed. The ability to respond rapidly to the
potentially most interesting events is a key bottleneck that limits the
scientific returns from the current and anticipated synoptic sky surveys.
Similar challenges arise in other contexts, from environmental monitoring using
sensor networks to autonomous spacecraft systems. Given the exponential growth
of data rates, and the time-critical response, we need a fully automated and
robust approach. We describe the results obtained to date, and the possible
future developments.
Comment: 8 pages, IEEE conference format, to appear in the refereed
proceedings of the IEEE e-Science 2014 conf., eds. C. Medeiros et al., IEEE,
in press (2014). arXiv admin note: substantial text overlap with
arXiv:1209.1681, arXiv:1110.465
Nonparametric Transient Classification using Adaptive Wavelets
Classifying transients based on multiband light curves is a challenging but
crucial problem in the era of GAIA and LSST since the sheer volume of
transients will make spectroscopic classification unfeasible. Here we present a
nonparametric classifier that uses the transient's light curve measurements to
predict its class given training data. It implements two novel components: the
first is the use of the BAGIDIS wavelet methodology - a characterization of
functional data using hierarchical wavelet coefficients. The second novelty is
the introduction of a ranked probability classifier on the wavelet coefficients
that handles both the heteroscedasticity of the data and the
potential non-representativity of the training set. The ranked classifier is
simple and quick to implement while a major advantage of the BAGIDIS wavelets
is that they are translation invariant, hence they do not need the light curves
to be aligned to extract features. Further, BAGIDIS is nonparametric so it can
be used for blind searches for new objects. We demonstrate the effectiveness of
our ranked wavelet classifier against the well-tested Supernova Photometric
Classification Challenge dataset in which the challenge is to correctly
classify light curves as Type Ia or non-Ia supernovae. We train our ranked
probability classifier on the spectroscopically-confirmed subsample (which is
not representative) and show that it gives good results for all supernovae with
observed light curve timespans greater than 100 days (roughly 55% of the
dataset). For such data, we obtain a Ia efficiency of 80.5% and a purity of
82.4% yielding a highly competitive score of 0.49 whilst implementing a truly
"model-blind" approach to supernova classification. Consequently this approach
may be particularly suitable for the classification of astronomical transients
in the era of large synoptic sky surveys.
Comment: 14 pages, 8 figures. Published in MNRA
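The score of 0.49 quoted in the abstract above can be cross-checked against the reported efficiency and purity. The sketch below assumes the figure of merit usually associated with the Supernova Photometric Classification Challenge, namely efficiency multiplied by a pseudo-purity in which false positives carry a weight of 3. The abstract does not state this formula, so both the definition and the function name `snpcc_figure_of_merit` are assumptions made for illustration.

```python
def snpcc_figure_of_merit(efficiency, purity, false_weight=3.0):
    """Return efficiency * pseudo-purity.

    Assumed definition (not stated in the abstract):
        FoM = (N_true / N_total) * N_true / (N_true + W * N_false)
    with W = false_weight penalising false positives.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie in [0, 1]")
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must lie in (0, 1]")
    # purity = N_true / (N_true + N_false), so N_false / N_true = (1 - purity) / purity
    false_to_true = (1.0 - purity) / purity
    pseudo_purity = 1.0 / (1.0 + false_weight * false_to_true)
    return efficiency * pseudo_purity


# Values quoted in the abstract: efficiency 80.5%, purity 82.4%
fom = snpcc_figure_of_merit(0.805, 0.824)
print(round(fom, 2))  # 0.49
```

Under this assumed definition the quoted efficiency and purity reproduce the quoted score, which suggests the abstract uses the same weighting.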
Feature Selection Strategies for Classifying High Dimensional Astronomical Data Sets
The amount of collected data in many scientific fields is increasing, and all of these fields share a common task: extracting knowledge from massive, multiparametric data sets as rapidly and efficiently as possible. This is especially true in astronomy, where synoptic sky surveys are opening new research frontiers in time-domain astronomy and posing several new object classification challenges in multidimensional spaces. Given the high number of parameters available for each object, feature selection is quickly becoming a crucial task in analyzing astronomical data sets. Using data sets extracted from the ongoing Catalina Real-Time Transient Survey (CRTS) and the Kepler Mission, we illustrate a variety of feature selection strategies used to identify the subsets that give the most information, and the results achieved by applying these techniques to three major astronomical problems
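To make the idea of feature selection concrete, here is a minimal sketch of one generic filter method: ranking features by a Fisher-style score (between-class separation divided by within-class spread). The abstract does not say which strategies the authors used, so this technique, the function names `fisher_scores` and `top_k_features`, and the toy data are all assumptions chosen for illustration only.

```python
from statistics import mean, pvariance


def fisher_scores(rows, labels):
    """Score each feature column: between-class scatter / within-class scatter."""
    classes = sorted(set(labels))
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        overall = mean(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [v for v, y in zip(column, labels) if y == c]
            between += len(values) * (mean(values) - overall) ** 2
            within += len(values) * pvariance(values)
        if within > 0:
            scores.append(between / within)
        else:
            # No spread inside any class: perfectly separating or constant feature
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k_features(rows, labels, k):
    """Return the indices of the k highest-scoring features."""
    scores = fisher_scores(rows, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not
rows = [[0.1, 5.0], [0.2, 1.0], [0.0, 3.0], [1.1, 4.0], [0.9, 2.0], [1.0, 3.0]]
labels = [0, 0, 0, 1, 1, 1]
print(top_k_features(rows, labels, 1))  # [0]
```

A filter of this kind scores each feature independently, so it is cheap on wide tables but cannot detect features that are informative only in combination.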