
    Fast conditional density estimation for quantitative structure-activity relationships

    Many methods for quantitative structure-activity relationships (QSARs) deliver point estimates only, without quantifying the uncertainty inherent in the prediction. One way to quantify the uncertainty of a QSAR prediction is to predict the conditional density of the activity given the structure instead of a point estimate. If a conditional density estimate is available, it is easy to derive prediction intervals for activities. In this paper, we experimentally evaluate and compare three methods for conditional density estimation for their suitability in QSAR modeling. In contrast to traditional methods for conditional density estimation, they are based on generic machine learning schemes, more specifically, class probability estimators. Our experiments show that a kernel estimator based on class probability estimates from a random forest classifier is highly competitive with Gaussian process regression, while taking only a fraction of the time for training. Therefore, generic machine-learning-based methods for conditional density estimation may be a good and fast option for quantifying uncertainty in QSAR modeling.
    http://www.aaai.org/ocs/index.php/AAAI/AAAI10/paper/view/181
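
    The step from a conditional density estimate to a prediction interval can be sketched in a few lines. The example below is not the kernel estimator evaluated in the paper; it assumes a simpler, histogram-style density in which some class probability estimator has already assigned a probability to each of a set of activity bins. The function name `prediction_interval` and the bin layout are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def prediction_interval(
    bin_edges: List[float], bin_probs: List[float], coverage: float = 0.9
) -> Tuple[float, float]:
    """Central prediction interval from a piecewise-constant conditional density.

    bin_edges: increasing edges of the activity bins (len = number of bins + 1).
    bin_probs: estimated probability of each bin for one compound (assumed to
               come from a class probability estimator; not computed here).
    """
    if len(bin_edges) != len(bin_probs) + 1:
        raise ValueError("need exactly one more edge than probabilities")
    total = sum(bin_probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    probs = [p / total for p in bin_probs]  # renormalise defensively

    def quantile(q: float) -> float:
        # Walk the cumulative distribution and interpolate linearly inside
        # the bin where the cumulative probability first reaches q.
        cum = 0.0
        for i, p in enumerate(probs):
            if p > 0 and cum + p >= q:
                frac = (q - cum) / p
                return bin_edges[i] + frac * (bin_edges[i + 1] - bin_edges[i])
            cum += p
        return bin_edges[-1]

    alpha = (1.0 - coverage) / 2.0
    return quantile(alpha), quantile(1.0 - alpha)
```

    Calling `prediction_interval(edges, probs, coverage=0.9)` returns the lower and upper activity values that enclose the central 90% of the estimated probability mass. A wide interval signals an uncertain prediction, which is information a point estimate alone cannot convey.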

    Hierarchic Bayesian models for kernel learning

    The integration of diverse forms of informative data by learning an optimal combination of base kernels in classification or regression problems can provide enhanced performance when compared to that obtained from any single data source. We present a Bayesian hierarchical model that enables kernel learning, together with effective variational Bayes estimators for regression and classification. Illustrative experiments demonstrate the utility of the proposed method.
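
    To make "a combination of base kernels" concrete, the sketch below builds a Gram matrix from a weighted sum of base kernels. It assumes fixed, non-negative weights that are normalised to sum to one; inferring those weights is the job of the hierarchical model described in the abstract and is not attempted here. The helper names (`linear_kernel`, `rbf_kernel`, `combined_gram`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, y: Vector) -> float:
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(gamma: float) -> Kernel:
    """Return a Gaussian (RBF) kernel with the given width parameter."""
    def k(x: Vector, y: Vector) -> float:
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
    return k


def combined_gram(
    X: List[Vector], kernels: List[Kernel], weights: List[float]
) -> List[List[float]]:
    """Gram matrix of the weighted sum of base kernels.

    Weights are assumed given (not learned) and are normalised to sum to one.
    A non-negative weighted sum of valid kernels is itself a valid kernel.
    """
    if len(kernels) != len(weights):
        raise ValueError("need one weight per kernel")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative and not all zero")
    total = sum(weights)
    w = [wi / total for wi in weights]
    n = len(X)
    return [
        [sum(wi * k(X[i], X[j]) for wi, k in zip(w, kernels)) for j in range(n)]
        for i in range(n)
    ]
```

    Each base kernel could in principle be computed from a different data source describing the same objects, so that the weights express how much each source contributes to the combined similarity.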

    ASTErIsM - Application of topometric clustering algorithms in automatic galaxy detection and classification

    We present a study on galaxy detection and shape classification using topometric clustering algorithms. We first use the DBSCAN algorithm to extract, from CCD frames, groups of adjacent pixels with significant fluxes, and we then apply the DENCLUE algorithm to separate the contributions of overlapping sources. The DENCLUE separation is based on the localization of patterns of local maxima, through an iterative algorithm that associates each pixel with the closest local maximum. Our main classification goal is to separate elliptical from spiral galaxies. We introduce new sets of features derived from the computation of geometrical invariant moments of the pixel group shape and from the statistics of the spatial distribution of the DENCLUE local maxima patterns. Ellipticals are characterized by a single group of local maxima, related to the galaxy core, while spiral galaxies have additional ones related to segments of spiral arms. We use two different supervised ensemble classification algorithms, Random Forest and Gradient Boosting. Using a sample of ~ 24000 galaxies taken from the Galaxy Zoo 2 main sample with spectroscopic redshifts, we test our classification against the Galaxy Zoo 2 catalog. We find that features extracted from our pipeline give on average an accuracy of ~ 93% when testing on a test set with a size of 20% of our full data set, with features derived from the angular distribution of density attractors ranking at the top in discrimination power.
    Comment: 20 pages, 13 figures, 8 tables. Accepted for publication in the Monthly Notices of the Royal Astronomical Society.
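
    The first stage described above, grouping adjacent bright pixels with DBSCAN, can be illustrated with a minimal pure-Python sketch. This is not the authors' pipeline: the flux threshold, the `eps` radius, the `min_pts` value, and the function names are all assumptions made for illustration, and the brute-force neighbour search is only suitable for small frames.

```python
from collections import deque
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]
NOISE = -1


def bright_pixels(image: List[List[float]], threshold: float) -> List[Pixel]:
    """Coordinates (row, col) of pixels whose flux exceeds the threshold."""
    return [
        (r, c)
        for r, row in enumerate(image)
        for c, flux in enumerate(row)
        if flux > threshold
    ]


def dbscan(points: List[Pixel], eps: float, min_pts: int) -> Dict[Pixel, int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""

    def neighbours(p: Pixel) -> List[Pixel]:
        # Brute-force search; includes p itself, as in the usual definition.
        return [
            q for q in points
            if (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 <= eps * eps
        ]

    labels: Dict[Pixel, int] = {}
    cluster = 0
    for p in points:
        if p in labels:
            continue
        nb = neighbours(p)
        if len(nb) < min_pts:
            labels[p] = NOISE  # may later be claimed as a border point
            continue
        labels[p] = cluster
        queue = deque(nb)
        while queue:
            q = queue.popleft()
            if labels.get(q) == NOISE:
                labels[q] = cluster  # border point: joins, but is not expanded
            if q in labels:
                continue
            labels[q] = cluster
            q_nb = neighbours(q)
            if len(q_nb) >= min_pts:  # q is a core point, keep growing
                queue.extend(q_nb)
        cluster += 1
    return labels
```

    With `eps` slightly above the diagonal pixel distance (for example 1.5), pixels that touch along an edge or a corner fall into the same group, while isolated bright pixels are labelled as noise. Separating overlapping sources inside one group would then be the task of the DENCLUE stage, which is not sketched here.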