Automated novelty detection in the WISE survey with one-class support vector machines
Wide-angle photometric surveys of previously uncharted sky areas or
wavelength regimes will always bring in unexpected sources whose existence and
properties cannot be easily predicted from earlier observations: novelties or
even anomalies. Such objects can be efficiently sought with novelty
detection algorithms. Here we present an application of such a method, called
one-class support vector machines (OCSVM), to search for anomalous patterns
among sources preselected from the mid-infrared AllWISE catalogue covering the
whole sky. To create a model of expected data we train the algorithm on a set
of objects with spectroscopic identifications from the SDSS DR13 database,
present also in AllWISE. OCSVM detects as anomalous those sources whose
patterns - WISE photometric measurements in this case - are inconsistent with
the model. Among the detected anomalies we find artefacts, such as objects with
spurious photometry due to blending, but most importantly also real sources of
genuine astrophysical interest. Among the latter, OCSVM has identified a sample
of heavily reddened AGN/quasar candidates distributed uniformly over the sky
and in a large part absent from other WISE-based AGN catalogues. It also
allowed us to find a specific group of sources of mixed types, mostly stars and
compact galaxies. By combining the semi-supervised OCSVM algorithm with
standard classification methods it will be possible to improve the latter by
accounting for sources which are not present in the training sample but are
otherwise well-represented in the target set. Anomaly detection adds
flexibility to automated source separation procedures and helps verify the
reliability and representativeness of the training samples. It should thus be
considered an essential step in supervised classification schemes to ensure
completeness and purity of produced catalogues.
Comment: 14 pages, 15 figures
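The train-on-known, flag-the-inconsistent workflow described in this abstract can be illustrated with a minimal sketch. It assumes scikit-learn's `OneClassSVM` and uses synthetic two-feature data as a stand-in for WISE photometry; the feature values, `nu`, and `gamma` are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in for the training set of "known" sources
# (hypothetical two-colour features, NOT real SDSS/AllWISE data).
train = rng.normal(loc=[0.5, 2.5], scale=[0.2, 0.5], size=(500, 2))

# Hypothetical target sources: one typical, one far from the training locus.
targets = np.array([
    [0.5, 2.5],
    [2.5, 6.0],
])

# Standardise with the training statistics so both features carry equal weight.
mean, std = train.mean(axis=0), train.std(axis=0)

# nu bounds the fraction of training points allowed outside the model;
# the value here is an assumption chosen for illustration.
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
model.fit((train - mean) / std)

# predict() returns +1 for sources consistent with the model, -1 for anomalies.
labels = model.predict((targets - mean) / std)
train_outlier_fraction = float(np.mean(model.predict((train - mean) / std) == -1))
```

In this toy setup the second target lies many standard deviations from the training sample and is flagged as anomalous, while the first is accepted. A real application would still need the follow-up inspection the abstract mentions, since flagged sources include artefacts as well as genuine novelties.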
Data-Mining a Large Digital Sky Survey: From the Challenges to the Scientific Results
The analysis and an efficient scientific exploration of the Digital Palomar
Observatory Sky Survey (DPOSS) represents a major technical challenge. The
input data set consists of 3 Terabytes of pixel information, and contains a few
billion sources. We describe some of the specific scientific problems posed by
the data, including searches for distant quasars and clusters of galaxies, and
the data-mining techniques we are exploring in addressing them.
Machine-assisted discovery methods may become essential for the analysis of
such multi-Terabyte data sets. New and future approaches involve unsupervised
classification and clustering analysis in the Giga-object data space, including
various Bayesian techniques. In addition to the searches for known types of
objects in this database, these techniques may also offer the possibility of
discovering previously unknown, rare types of astronomical objects.
Comment: Invited paper, to appear in Applications of Digital Image Processing
XX, ed. A. Tescher, Proc. S.P.I.E. vol. 3164, in press; 10 pages, a
self-contained TeX file, and 3 separate postscript figures
A Distance-Limited Imaging Survey of Sub-Stellar Companions to Solar Neighborhood Stars
We report techniques and results of a Palomar 200-inch (5 m) adaptive optics
imaging survey of sub-stellar companions to solar-type stars. The survey
consists of Ks coronagraphic observations of 21 FGK dwarfs out to 20 pc (median
distance about 17 pc). At 1-arcsec separation (17 projected AU) from a typical
target system, the survey achieves median sensitivities 7 mag fainter than the
parent star. In terms of companion mass, that corresponds to sensitivities of
50MJ (1 Gyr), 70MJ (solar age), and 75MJ (10 Gyr), using the evolutionary
models of Baraffe and colleagues. Using common proper motion to distinguish
companions from field stars, we find that no system shows positive evidence of
a previously unknown substellar companion (searchable separation about 20-250
projected AU at the median target distance).
Comment: 29 pages, 5 figures. Carson et al. 2008, AJ, in press
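The numbers quoted in this abstract follow from two standard relations: a magnitude difference Δm corresponds to a flux ratio of 10^(-0.4 Δm), and an angular separation in arcseconds multiplied by the distance in parsecs gives the projected separation in AU. The short sketch below works them through; the function names are illustrative only.

```python
def flux_ratio(delta_mag):
    """Flux ratio (faint/bright) for a magnitude difference delta_mag."""
    return 10 ** (-0.4 * delta_mag)

def projected_separation_au(sep_arcsec, distance_pc):
    """Small-angle relation: 1 arcsec at 1 pc subtends 1 AU."""
    return sep_arcsec * distance_pc

def angular_separation_arcsec(sep_au, distance_pc):
    """Inverse of the relation above."""
    return sep_au / distance_pc

# 7 mag fainter than the parent star is a contrast of roughly 1.6e-3.
contrast = flux_ratio(7.0)

# 1 arcsec at the median distance of 17 pc is 17 projected AU.
sep_au = projected_separation_au(1.0, 17.0)

# The 20-250 AU search range at 17 pc spans roughly 1.2 to 14.7 arcsec.
inner_arcsec = angular_separation_arcsec(20.0, 17.0)
outer_arcsec = angular_separation_arcsec(250.0, 17.0)
```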
Achieving the Way for Automated Segmentation of Nuclei in Cancer Tissue Images through Morphology-Based Approach: a Quantitative Evaluation
In this paper we address the problem of nuclear segmentation in cancer tissue images, which is critical for quantifying specific protein activity and for cancer diagnosis and therapy. We present a fully automated morphology-based technique that performs accurate nuclear segmentation in images with heterogeneous staining and multiple tissue layers, and we compare it with an alternative semi-automated method based on a well-established segmentation approach, namely active contours. We discuss the limitations of active contours in segmenting immunohistochemical images, and we demonstrate and motivate through extensive experiments the higher accuracy of our fully automated approach compared with various active contour implementations.
Data Mining and Machine Learning in Astronomy
We review the current state of data mining and machine learning in astronomy.
'Data Mining' can have a somewhat mixed connotation from the point of view of a
researcher in this field. If used correctly, it can be a powerful approach,
holding the potential to fully exploit the exponentially increasing amount of
available data, promising great scientific advance. However, if misused, it can
be little more than the black-box application of complex computing algorithms
that may give little physical insight, and provide questionable results. Here,
we give an overview of the entire data mining process, from data collection
through to the interpretation of results. We cover common machine learning
algorithms, such as artificial neural networks and support vector machines,
applications from a broad range of astronomy, emphasizing those where data
mining techniques directly resulted in improved science, and important current
and future directions, including probability density functions, parallel
algorithms, petascale computing, and the time domain. We conclude that, so long
as one carefully selects an appropriate algorithm, and is guided by the
astronomical problem at hand, data mining can be very much the powerful tool,
and not the questionable black box.
Comment: Published in IJMPD. 61 pages, uses ws-ijmpd.cls. Several extra
figures, some minor additions to the text
Soft clustering analysis of galaxy morphologies: A worked example with SDSS
Context: The huge and still rapidly growing number of galaxies in modern sky
surveys raises the need for an automated and objective classification method.
Unsupervised learning algorithms are of particular interest, since they
discover classes automatically. Aims: We briefly discuss the pitfalls of
oversimplified classification methods and outline an alternative approach
called "clustering analysis". Methods: We categorise different classification
methods according to their capabilities. Based on this categorisation, we
present a probabilistic classification algorithm that automatically detects the
optimal classes preferred by the data. We explore the reliability of this
algorithm in systematic tests. Using a small sample of bright galaxies from the
SDSS, we demonstrate the performance of this algorithm in practice. We are able
to disentangle the problems of classification and parametrisation of galaxy
morphologies in this case. Results: We give physical arguments that a
probabilistic classification scheme is necessary. The algorithm we present
produces reasonable morphological classes and object-to-class assignments
without any prior assumptions. Conclusions: There are sophisticated automated
classification algorithms that meet all necessary requirements, but a lot of
work is still needed on the interpretation of the results.
Comment: 18 pages, 19 figures, 2 tables, submitted to A
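To make the idea of probabilistic (soft) class assignment concrete, here is a minimal sketch using a generic two-class, one-dimensional Gaussian mixture fitted by expectation-maximisation. This is a swapped-in textbook technique, not the algorithm of the paper: the number of classes is fixed at two rather than detected from the data, and the input values are a hypothetical morphology parameter invented for illustration.

```python
import math

def gaussian(x, mu, var):
    """Normal probability density with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def membership(x, weights, mus, variances):
    """Posterior probability that x belongs to each class (sums to 1)."""
    p = [w * gaussian(x, m, v) for w, m, v in zip(weights, mus, variances)]
    total = sum(p)
    return [pk / total for pk in p]

def fit_mixture(data, n_iter=50, var_floor=1e-3):
    """Fit a two-class 1-D Gaussian mixture with expectation-maximisation."""
    weights, mus, variances = [0.5, 0.5], [min(data), max(data)], [1.0, 1.0]
    for _ in range(n_iter):
        # E-step: soft assignment of every object to every class.
        resp = [membership(x, weights, mus, variances) for x in data]
        # M-step: re-estimate each class from its weighted members.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            mus[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, var_floor)  # guard against collapse
            weights[k] = nk / len(data)
    return weights, mus, variances

# Hypothetical values of a single morphology parameter for ten galaxies.
data = [-3.0, -2.5, -2.0, -1.5, -1.0, 1.0, 1.5, 2.0, 2.5, 3.0]
weights, mus, variances = fit_mixture(data)

clear_case = membership(-2.0, weights, mus, variances)   # firmly in class 0
borderline = membership(0.0, weights, mus, variances)    # shared between classes
```

An object in the middle of one clump receives a membership close to 1 for that class, whereas an object halfway between the clumps is split about evenly. A hard classifier would have to force the borderline object into one class, which is the kind of information loss a probabilistic scheme avoids.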
The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch
Recent and forthcoming advances in instrumentation, and giant new surveys,
are creating astronomical data sets that are not amenable to the methods of
analysis familiar to astronomers. Traditional methods are often inadequate not
merely because of the size in bytes of the data sets, but also because of the
complexity of modern data sets. Mathematical limitations of familiar algorithms
and techniques in dealing with such data sets create a critical need for new
paradigms for the representation, analysis and scientific visualization (as
opposed to illustrative visualization) of heterogeneous, multiresolution data
across application domains. Some of the problems presented by the new data sets
have been addressed by other disciplines such as applied mathematics,
statistics and machine learning and have been utilized by other sciences such
as space-based geosciences. Unfortunately, valuable results pertaining to these
problems are mostly to be found only in publications outside of astronomy. Here
we offer brief overviews of a number of concepts, techniques and developments,
some "old" and some new. These are generally unknown to most of the
astronomical community, but are vital to the analysis and visualization of
complex datasets and images. In order for astronomers to take advantage of the
richness and complexity of the new era of data, and to be able to identify,
adopt, and apply new solutions, the astronomical community needs a certain
degree of awareness and understanding of the new concepts. One of the goals of
this paper is to help bridge the gap between applied mathematics, artificial
intelligence and computer science on the one side and astronomy on the other.
Comment: 24 pages, 8 Figures, 1 Table. Accepted for publication: "Advances in
Astronomy", special issue "Robotic Astronomy"