7 research outputs found

    Immersive and Collaborative Data Visualization Using Virtual Reality Platforms

    Get PDF
    Effective data visualization is a key part of the discovery process in the era of big data. It is the bridge between the quantitative content of the data and human intuition, and thus an essential component of the scientific path from data into knowledge and understanding. Visualization is also essential in the data mining process, directing the choice of the applicable algorithms, and in helping to identify and remove bad data from the analysis. However, a high complexity or a high dimensionality of modern data sets represents a critical obstacle. How do we visualize interesting structures and patterns that may exist in hyper-dimensional data spaces? A better understanding of how we can perceive and interact with multi dimensional information poses some deep questions in the field of cognition technology and human computer interaction. To this effect, we are exploring the use of immersive virtual reality platforms for scientific data visualization, both as software and inexpensive commodity hardware. These potentially powerful and innovative tools for multi dimensional data visualization can also provide an easy and natural path to a collaborative data visualization and exploration, where scientists can interact with their data and their colleagues in the same visual space. Immersion provides benefits beyond the traditional desktop visualization tools: it leads to a demonstrably better perception of a datascape geometry, more intuitive data understanding, and a better retention of the perceived relationships in the data.Comment: 6 pages, refereed proceedings of 2014 IEEE International Conference on Big Data, page 609, ISBN 978-1-4799-5665-

    Automated Real-Time Classification and Decision Making in Massive Data Streams from Synoptic Sky Surveys

    Get PDF
    The nature of scientific and technological data collection is evolving rapidly: data volumes and rates grow exponentially, with increasing complexity and information content, and there has been a transition from static data sets to data streams that must be analyzed in real time. Interesting or anomalous phenomena must be quickly characterized and followed up with additional measurements via optimal deployment of limited assets. Modern astronomy presents a variety of such phenomena in the form of transient events in digital synoptic sky surveys, including cosmic explosions (supernovae, gamma ray bursts), relativistic phenomena (black hole formation, jets), potentially hazardous asteroids, etc. We have been developing a set of machine learning tools to detect, classify and plan a response to transient events for astronomy applications, using the Catalina Real-time Transient Survey (CRTS) as a scientific and methodological testbed. The ability to respond rapidly to the potentially most interesting events is a key bottleneck that limits the scientific returns from the current and anticipated synoptic sky surveys. Similar challenge arise in other contexts, from environmental monitoring using sensor networks to autonomous spacecraft systems. Given the exponential growth of data rates, and the time-critical response, we need a fully automated and robust approach. We describe the results obtained to date, and the possible future developments.Comment: 8 pages, IEEE conference format, to appear in the refereed proceedings of the IEEE e-Science 2014 conf., eds. C. Medeiros et al., IEEE, in press (2014). arXiv admin note: substantial text overlap with arXiv:1209.1681, arXiv:1110.465

    Nonparametric Transient Classification using Adaptive Wavelets

    Full text link
    Classifying transients based on multi band light curves is a challenging but crucial problem in the era of GAIA and LSST since the sheer volume of transients will make spectroscopic classification unfeasible. Here we present a nonparametric classifier that uses the transient's light curve measurements to predict its class given training data. It implements two novel components: the first is the use of the BAGIDIS wavelet methodology - a characterization of functional data using hierarchical wavelet coefficients. The second novelty is the introduction of a ranked probability classifier on the wavelet coefficients that handles both the heteroscedasticity of the data in addition to the potential non-representativity of the training set. The ranked classifier is simple and quick to implement while a major advantage of the BAGIDIS wavelets is that they are translation invariant, hence they do not need the light curves to be aligned to extract features. Further, BAGIDIS is nonparametric so it can be used for blind searches for new objects. We demonstrate the effectiveness of our ranked wavelet classifier against the well-tested Supernova Photometric Classification Challenge dataset in which the challenge is to correctly classify light curves as Type Ia or non-Ia supernovae. We train our ranked probability classifier on the spectroscopically-confirmed subsample (which is not representative) and show that it gives good results for all supernova with observed light curve timespans greater than 100 days (roughly 55% of the dataset). For such data, we obtain a Ia efficiency of 80.5% and a purity of 82.4% yielding a highly competitive score of 0.49 whilst implementing a truly "model-blind" approach to supernova classification. Consequently this approach may be particularly suitable for the classification of astronomical transients in the era of large synoptic sky surveys.Comment: 14 pages, 8 figures. Published in MNRA

    Feature Selection Strategies for Classifying High Dimensional Astronomical Data Sets

    No full text
    The amount of collected data in many scientific fields is increasing, all of them requiring a common task: extract knowledge from massive, multi parametric data sets, as rapidly and efficiently possible. This is especially true in astronomy where synoptic sky surveys are enabling new research frontiers in the time domain astronomy and posing several new object classification challenges in multi dimensional spaces; given the high number of parameters available for each object, feature selection is quickly becoming a crucial task in analyzing astronomical data sets. Using data sets extracted from the ongoing Catalina Real-Time Transient Surveys (CRTS) and the Kepler Mission we illustrate a variety of feature selection strategies used to identify the subsets that give the most information and the results achieved applying these techniques to three major astronomical problems
    corecore