Data Driven Discovery in Astrophysics
We review some aspects of the current state of data-intensive astronomy, its
methods, and some outstanding data analysis challenges. Astronomy is at the
forefront of "big data" science, with exponentially growing data volumes and
data rates, and an ever-increasing complexity, now entering the Petascale
regime. Telescopes and observatories from both ground and space, covering a
full range of wavelengths, feed the data via processing pipelines into
dedicated archives, where they can be accessed for scientific analysis. Most of
the large archives are connected through the Virtual Observatory framework,
which provides interoperability standards and services, and effectively
constitutes a global data grid of astronomy. Making discoveries in this
overabundance of data requires the application of novel machine learning tools.
We describe some recent examples of such applications.
Comment: Keynote talk in the proceedings of ESA-ESRIN Conference: Big Data
from Space 2014, Frascati, Italy, November 12-14, 2014, 8 pages, 2 figure
CD-CNN: A Partially Supervised Cross-Domain Deep Learning Model for Urban Resident Recognition
Driven by the wave of urbanization in recent decades, migrant behavior
analysis has drawn great attention from both academia and
the government. Nevertheless, constrained by the cost of data collection and the
lack of modeling methods, most existing studies use only questionnaire
surveys with sparse samples and non-individual level statistical data to
achieve coarse-grained studies of migrant behaviors. In this paper, a partially
supervised cross-domain deep learning model named CD-CNN is proposed for
migrant/native recognition using mobile phone signaling data as behavioral
features and questionnaire survey data as incomplete labels. Specifically,
CD-CNN decomposes the mobile data into a location domain and a
communication domain, and adopts a joint learning framework that combines two
convolutional neural networks with a feature balancing scheme. Moreover, CD-CNN
employs a three-step algorithm for training, in which the co-training step is
of great value to partially supervised cross-domain learning. Comparative
experiments on data from the city of Wuxi demonstrate the high predictive power of CD-CNN.
Two interesting applications further highlight the suitability of CD-CNN for
in-depth migrant behavioral analysis.
Comment: 8 pages, 5 figures, conferenc
Some Pattern Recognition Challenges in Data-Intensive Astronomy
We review some recent developments and challenges posed by data
analysis in modern digital sky surveys, which are representative of
information-rich astronomy in the context of the Virtual Observatory. Illustrative
examples include the problems of automated star-galaxy classification in
complex and heterogeneous panoramic imaging data sets, and of automated,
iterative, dynamical classification of transient events detected in synoptic
sky surveys. These problems offer good opportunities for productive
collaborations between astronomers and applied computer scientists and
statisticians, and are representative of the kind of challenges now present in
all data-intensive fields. We discuss briefly some emergent types of scalable
scientific data analysis systems with broad applicability.
Comment: 8 pages, compressed pdf file, figures downgraded in quality in order
to match the arXiv size limi