The LSST Data Mining Research Agenda
We describe features of the LSST science database that are amenable to
scientific data mining, object classification, outlier identification, anomaly
detection, image quality assurance, and survey science validation. The data
mining research agenda includes: scalability (at petabyte scales) of existing
machine learning and data mining algorithms; development of grid-enabled
parallel data mining algorithms; designing a robust system for brokering
classifications from the LSST event pipeline (which may produce 10,000 or more
event alerts per night); multi-resolution methods for exploration of petascale
databases; indexing of multi-attribute multi-dimensional astronomical databases
(beyond spatial indexing) for rapid querying of petabyte databases; and more.
Comment: 5 pages. Presented at the "Classification and Discovery in Large Astronomical Surveys" meeting, Ringberg Castle, 14-17 October, 200
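One agenda item above, indexing multi-attribute astronomical databases beyond purely spatial keys, can be illustrated with a textbook k-d tree supporting rectangular range queries. This is a generic sketch: the attribute names (magnitude, color, variability) and catalog values are invented for illustration and are not LSST's actual database design.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    point: tuple
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build(points, depth=0):
    """Build a k-d tree by cycling through attributes as split axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

def range_query(node, lo, hi, out):
    """Collect points with lo[d] <= p[d] <= hi[d] on every attribute d."""
    if node is None:
        return out
    p, a = node.point, node.axis
    if all(l <= x <= h for x, l, h in zip(p, lo, hi)):
        out.append(p)
    if lo[a] <= p[a]:   # left subtree can still contain hits
        range_query(node.left, lo, hi, out)
    if p[a] <= hi[a]:   # right subtree can still contain hits
        range_query(node.right, lo, hi, out)
    return out

# Toy catalog: (magnitude, color, variability) per object.
catalog = [(18.2, 0.3, 0.01), (21.5, 1.2, 0.40),
           (19.9, 0.8, 0.05), (22.1, 0.1, 0.90)]
tree = build(catalog)
print(range_query(tree, (18.0, 0.0, 0.0), (20.0, 1.0, 0.1), []))
```

Unlike a spatial-only index, the same structure answers combined cuts on brightness, color, and variability without scanning the full catalog.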
Large-Scale Plant Classification with Deep Neural Networks
This paper discusses the potential of applying deep learning techniques for
plant classification and its usage for citizen science in large-scale
biodiversity monitoring. We show that plant classification using near
state-of-the-art convolutional network architectures such as ResNet50 achieves
significantly higher accuracy than the most widespread plant classification
application on test sets comprising thousands of species labels. We find that
the predictions can be confidently used as a
baseline classification in citizen science communities like iNaturalist (or its
Spanish fork, Natusfera) which in turn can share their data with biodiversity
portals like GBIF.
Comment: 5 pages, 3 figures, 1 table. Published in Proceedings of the ACM Computing Frontiers Conference 201
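One way classifier scores can serve as baseline labels in a citizen-science workflow is to surface only the top-ranked species whose probabilities are competitive with the best guess. The function, species names, and thresholding rule below are hypothetical illustrations, not taken from the paper:

```python
import numpy as np

def top_k_suggestions(logits, labels, k=3, min_conf=0.5):
    """Return up to k (label, probability) pairs whose softmax
    probability is at least min_conf times the top probability."""
    z = logits - logits.max()          # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()    # softmax over species scores
    order = np.argsort(p)[::-1][:k]
    top = p[order[0]]
    return [(labels[i], float(p[i])) for i in order if p[i] >= min_conf * top]

labels = ["Quercus robur", "Fagus sylvatica", "Acer campestre"]
print(top_k_suggestions(np.array([2.0, 1.9, -1.0]), labels))
```

Presenting a short ranked list rather than a single hard label fits the citizen-science setting, where human observers confirm or correct the suggestion.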
Construction of a Pragmatic Base Line for Journal Classifications and Maps Based on Aggregated Journal-Journal Citation Relations
A number of journal classification systems have been developed in
bibliometrics since the launch of the Citation Indices by the Institute of
Scientific Information (ISI) in the 1960s. These systems are used to normalize
citation counts with respect to field-specific citation patterns. The best
known system is the so-called "Web-of-Science Subject Categories" (WCs). In
other systems papers are classified by algorithmic solutions. Using the Journal
Citation Reports 2014 of the Science Citation Index and the Social Science
Citation Index (n of journals = 11,149), we examine options for developing a
new system based on journal classifications into subject categories using
aggregated journal-journal citation data. Combining routines in VOSviewer and
Pajek, a tree-like classification is developed. At each level one can generate
a map of science for all the journals subsumed under a category. Nine major
fields are distinguished at the top level. Further decomposition of the social
sciences is pursued for the sake of example with a focus on journals in
information science (LIS) and science studies (STS). The new classification
system improves on alternative options by avoiding the problem of randomness in
each run that has made algorithmic solutions hitherto irreproducible.
Limitations of the new system are discussed (e.g. the classification of
multi-disciplinary journals). The system's usefulness for field-normalization
in bibliometrics should be explored in future studies.
Comment: Accepted for publication in the Journal of Informetrics, 20 July 201
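The reproducibility point above, that a deterministic procedure gives the same classification on every run, can be sketched with a toy version of the idea: compare journals by the cosine similarity of their aggregated citation profiles and cut the similarity graph at a fixed threshold. This is an illustrative stand-in, not the paper's VOSviewer/Pajek routine, and the citation matrix is invented:

```python
import numpy as np

def cosine_sim(M):
    """Pairwise cosine similarity between rows (citation profiles)."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    X = M / np.where(norms == 0, 1, norms)
    return X @ X.T

def threshold_clusters(S, tau):
    """Connected components of the graph {(i, j): S[i, j] >= tau},
    via union-find; deterministic, so repeated runs agree."""
    n = len(S)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if S[i, j] >= tau:
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]

# Toy matrix: rows = citing journal, columns = cited journal.
# Journals 0-1 cite the same targets, as do journals 2-3.
C = np.array([[0, 1, 9, 0],
              [1, 0, 8, 1],
              [7, 0, 0, 1],
              [8, 1, 0, 0]], dtype=float)
S = cosine_sim(C)
print(threshold_clusters(S, 0.5))
```

Because no random initialization is involved, the partition is identical on every run, which is the property the abstract contrasts with stochastic algorithmic solutions.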
Classification methods for noise transients in advanced gravitational-wave detectors
Noise of non-astrophysical origin will contaminate science data taken by the
Advanced Laser Interferometer Gravitational-wave Observatory (aLIGO) and
Advanced Virgo gravitational-wave detectors. Prompt characterization of
instrumental and environmental noise transients will be critical for improving
the sensitivity of the advanced detectors in the upcoming science runs. During
the science runs of the initial gravitational-wave detectors, noise transients
were manually classified by visually examining the time-frequency scan of each
event. Here, we present three new algorithms designed for the automatic
classification of noise transients in advanced detectors. Two of these
algorithms are based on Principal Component Analysis. They are Principal
Component Analysis for Transients (PCAT), and an adaptation of LALInference
Burst (LIB). The third algorithm is a combination of an event generator called
Wavelet Detection Filter (WDF) and machine learning techniques for
classification. We test these algorithms on simulated data sets, and we show
their ability to automatically classify transients by frequency, SNR, and
waveform morphology.
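The PCA-based idea can be sketched in a few lines: project transient snippets onto the leading principal components of a training set and assign the nearest class centroid in that low-dimensional space. This is only in the spirit of PCAT, not the actual pipeline; the two waveform morphologies and all data below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)

def pca_fit(X, n_comp):
    """Mean and leading principal directions of the training set."""
    mu = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_comp]

def project(X, mu, comps):
    return (X - mu) @ comps.T

# Two synthetic transient morphologies: sine-Gaussian vs. ringdown.
t = np.linspace(0, 1, 256)
def sine_gauss(f): return np.sin(2*np.pi*f*t) * np.exp(-((t - 0.5)/0.1)**2)
def ringdown(f):   return np.sin(2*np.pi*f*t) * np.exp(-t/0.1)

train = np.array([sine_gauss(20 + rng.normal(0, 1)) for _ in range(20)] +
                 [ringdown(20 + rng.normal(0, 1)) for _ in range(20)])
labels = np.array([0]*20 + [1]*20)

mu, comps = pca_fit(train, n_comp=3)
Z = project(train, mu, comps)
centroids = np.array([Z[labels == c].mean(axis=0) for c in (0, 1)])

def classify(x):
    """Nearest-centroid classification in PCA space."""
    z = project(x[None, :], mu, comps)
    return int(np.argmin(np.linalg.norm(centroids - z, axis=1)))

print(classify(sine_gauss(20.5)), classify(ringdown(19.5)))
```

The low-dimensional projection is what makes classification by morphology tractable: distinct envelope shapes separate cleanly even when the central frequencies overlap.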
A Two-stage Classification Method for High-dimensional Data and Point Clouds
High-dimensional data classification is a fundamental task in machine
learning and imaging science. In this paper, we propose a two-stage multiphase
semi-supervised classification method for classifying high-dimensional data and
unstructured point clouds. To begin with, a fuzzy classification method such as
the standard support vector machine is used to generate a warm initialization.
We then apply a two-stage approach named SaT (smoothing and thresholding) to
improve the classification. In the first stage, an unconstrained convex
variational model is implemented to purify and smooth the initialization,
followed by the second stage which is to project the smoothed partition
obtained at stage one to a binary partition. These two stages can be repeated,
with the latest result as a new initialization, to keep improving the
classification quality. We show that the convex model of the smoothing stage
has a unique solution and can be solved by a specifically designed primal-dual
algorithm whose convergence is guaranteed. We test our method and compare it
with the state-of-the-art methods on several benchmark data sets. The
experimental results demonstrate clearly that our method is superior in both
the classification accuracy and computation speed for high-dimensional data and
point clouds.
Comment: 21 pages, 4 figure
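A schematic version of one smoothing-and-thresholding iteration can be written with a simple quadratic model: minimize ||u - u0||^2 + lam * u^T L u over a graph Laplacian L, whose unique minimizer is u = (I + lam*L)^{-1} u0, then threshold at 1/2. This stand-in shares the structure described in the abstract (unique solution of a convex smoothing stage, then projection to a binary partition) but is not the paper's model or its primal-dual solver; the graph and initialization are toy data:

```python
import numpy as np

def graph_laplacian(W):
    """Combinatorial Laplacian L = D - W of a weighted graph."""
    return np.diag(W.sum(axis=1)) - W

def sat_step(u0, W, lam=1.0):
    """One SaT iteration: convex smoothing, then binary thresholding."""
    L = graph_laplacian(W)
    u = np.linalg.solve(np.eye(len(u0)) + lam * L, u0)  # smoothing stage
    return (u >= 0.5).astype(float)                     # thresholding stage

# Toy similarity graph: two 3-node clumps joined by one weak edge.
W = np.zeros((6, 6))
for i, j, w in [(0,1,1), (1,2,1), (0,2,1),
                (3,4,1), (4,5,1), (3,5,1), (2,3,0.05)]:
    W[i, j] = W[j, i] = w

u0 = np.array([0.9, 0.2, 0.8, 0.1, 0.45, 0.0])  # noisy fuzzy initialization
print(sat_step(u0, W))
```

The smoothing stage pulls each node toward its neighborhood average, so isolated label noise (nodes 1 and 4 above) is corrected before the projection to a binary partition; repeating the step with the result as the new initialization mirrors the iteration described in the abstract.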
Data-Rich Astronomy: Mining Sky Surveys with PhotoRApToR
In the last decade a new generation of telescopes and sensors has allowed the
production of a very large amount of data and astronomy has become a data-rich
science. New automatic methods, largely based on machine learning, are needed
to cope with such a data tsunami. We present some results in the fields of
photometric redshifts and galaxy classification, obtained using the MLPQNA
algorithm available in the DAMEWARE (Data Mining and Web Application Resource)
for the SDSS galaxies (DR9 and DR10). We present PhotoRApToR (Photometric
Research Application To Redshift): a Java-based desktop application capable of
solving regression and classification problems and specialized for photo-z
estimation.
Comment: Proceedings of the IAU Symposium, Vol. 306, Cambridge University Press
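The photo-z task is at heart supervised regression: map photometric inputs to redshift. The sketch below trains a tiny one-hidden-layer MLP with plain gradient descent on synthetic data; it is only a schematic stand-in for MLPQNA, which uses quasi-Newton optimization, and the mock color-redshift relation is invented:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 4))            # mock photometric colors
z_true = 0.5 + 0.3 * X[:, 0] - 0.2 * X[:, 1]**2  # mock redshift relation

# One hidden layer of 8 tanh units, scalar output.
W1 = rng.normal(0, 0.5, (4, 8)); b1 = np.zeros(8)
W2 = rng.normal(0, 0.5, (8, 1)); b2 = np.zeros(1)

def forward(X):
    H = np.tanh(X @ W1 + b1)
    return H, (H @ W2 + b2).ravel()

def mse(pred):
    return float(np.mean((pred - z_true) ** 2))

lr = 0.1
_, pred = forward(X); start = mse(pred)
for _ in range(3000):
    H, pred = forward(X)
    g = 2 * (pred - z_true)[:, None] / len(X)    # dLoss/dOutput
    gW2 = H.T @ g; gb2 = g.sum(axis=0)
    gH = g @ W2.T * (1 - H**2)                   # backprop through tanh
    gW1 = X.T @ gH; gb1 = gH.sum(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1
_, pred = forward(X); end = mse(pred)
print(f"MSE {start:.4f} -> {end:.4f}")
```

In a real pipeline the inputs would be SDSS magnitudes or colors and the targets spectroscopic redshifts; the network and training loop are the same shape, just larger and with a better optimizer.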