Fast conditional density estimation for quantitative structure-activity relationships
Many methods for quantitative structure-activity relationships (QSARs) deliver point estimates only, without quantifying the uncertainty inherent in the prediction. One way to quantify the uncertainty of a QSAR prediction is to predict the conditional density of the activity given the structure instead of a point estimate. Once a conditional density estimate is available, prediction intervals for activities are easy to derive. In this paper, we experimentally evaluate and compare three methods for conditional density estimation for their suitability in QSAR modeling. In contrast to traditional methods for conditional density estimation, they are based on generic machine learning schemes, more specifically, class probability estimators. Our experiments show that a kernel estimator based on class probability estimates from a random forest classifier is highly competitive with Gaussian process regression, while taking only a fraction of the time for training. Therefore, generic machine-learning-based methods for conditional density estimation may be a good and fast option for quantifying uncertainty in QSAR modeling.
http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/181
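As a rough illustration of how a prediction interval can be derived from class probability estimates, the sketch below assumes the activity range has been discretized into bins and that some classifier has already produced one probability per bin for a given compound. It uses a simple histogram density rather than the kernel estimator evaluated in the paper, and the function names, bin edges, and probabilities are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def histogram_density(bin_edges, class_probs):
    """Piecewise-constant density: each bin's probability divided by its width."""
    widths = [hi - lo for lo, hi in zip(bin_edges, bin_edges[1:])]
    return [p / w for p, w in zip(class_probs, widths)]


def quantile(bin_edges, class_probs, q):
    """Invert the piecewise-linear CDF implied by the histogram density."""
    cdf = list(accumulate(class_probs))
    # First bin whose cumulative probability reaches q (clamped to the last bin).
    i = min(bisect_left(cdf, q), len(class_probs) - 1)
    below = cdf[i - 1] if i > 0 else 0.0
    lo, hi = bin_edges[i], bin_edges[i + 1]
    if class_probs[i] == 0:
        return lo
    # Linear interpolation inside the bin.
    return lo + (q - below) / class_probs[i] * (hi - lo)


def prediction_interval(bin_edges, class_probs, coverage=0.9):
    """Central interval that holds `coverage` of the estimated probability mass."""
    tail = (1.0 - coverage) / 2.0
    return (quantile(bin_edges, class_probs, tail),
            quantile(bin_edges, class_probs, 1.0 - tail))


# Hypothetical example: four activity bins and per-bin class probabilities
# such as a random forest classifier might return for one compound.
edges = [0.0, 1.0, 2.0, 3.0, 4.0]
probs = [0.1, 0.4, 0.4, 0.1]
low, high = prediction_interval(edges, probs, coverage=0.8)
```

Finer bins give a smoother density but leave fewer training examples per class, which is one reason a kernel-based estimator can be preferable to the plain histogram used here.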
Hierarchic Bayesian models for kernel learning
Integrating diverse forms of informative data by learning an optimal combination of base kernels in classification or regression problems can provide better performance than that obtained from any single data source. We present a Bayesian hierarchical model which enables kernel learning, together with effective variational Bayes estimators for regression and classification. Illustrative experiments demonstrate the utility of the proposed method.
ASTErIsM - Application of topometric clustering algorithms in automatic galaxy detection and classification
We present a study on galaxy detection and shape classification using topometric clustering algorithms. We first use the DBSCAN algorithm to extract, from CCD frames, groups of adjacent pixels with significant fluxes, and we then apply the DENCLUE algorithm to separate the contributions of overlapping sources. The DENCLUE separation is based on the localization of patterns of local maxima, through an iterative algorithm which associates each pixel with the closest local maximum. Our main classification goal is to separate elliptical from spiral galaxies. We introduce new sets of features derived from the computation of geometrical invariant moments of the pixel group shape and from the statistics of the spatial distribution of the DENCLUE local maxima patterns. Ellipticals are characterized by a single group of local maxima, related to the galaxy core, while spiral galaxies have additional ones related to segments of spiral arms. We use two different supervised ensemble classification algorithms: Random Forest and Gradient Boosting. Using a sample of ~24000 galaxies taken from the Galaxy Zoo 2 main sample with spectroscopic redshifts, we test our classification against the Galaxy Zoo 2 catalog. We find that features extracted from our pipeline give on average an accuracy of ~93%, when testing on a test set with a size of 20% of our full data set, with features derived from the angular distribution of density attractors ranking highest in discriminative power.
Comment: 20 pages, 13 Figures, 8 Tables, Accepted for publication in the Monthly Notices of the Royal Astronomical Society
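The idea of associating each pixel with a local maximum can be illustrated with a minimal sketch. It is a discrete hill-climbing stand-in on a raw flux grid, not the kernel-density procedure that DENCLUE itself uses, and the function names, the flux threshold, and the toy frame are hypothetical.

```python
def climb_to_maximum(flux, start):
    """Follow the brightest of the 8 neighbours until no neighbour is brighter."""
    rows, cols = len(flux), len(flux[0])
    r, c = start
    while True:
        best = (r, c)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                inside = 0 <= nr < rows and 0 <= nc < cols
                if inside and flux[nr][nc] > flux[best[0]][best[1]]:
                    best = (nr, nc)
        if best == (r, c):
            return best  # no strictly brighter neighbour: a local maximum
        r, c = best  # flux strictly increases, so the loop terminates


def assign_pixels(flux, threshold):
    """Map every pixel at or above `threshold` to the local maximum it climbs to."""
    labels = {}
    for r, row in enumerate(flux):
        for c, value in enumerate(row):
            if value >= threshold:
                labels[(r, c)] = climb_to_maximum(flux, (r, c))
    return labels


# Toy frame with two overlapping sources peaking at (1, 1) and (1, 5).
frame = [
    [1, 2, 1, 0, 1, 3, 1],
    [2, 9, 2, 1, 2, 7, 2],
    [1, 2, 1, 0, 1, 2, 1],
]
labels = assign_pixels(frame, threshold=1)
maxima = set(labels.values())
```

In this toy frame the number of distinct maxima found in a pixel group plays the role of a very crude shape feature; the features described in the abstract are considerably richer.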