4,627 research outputs found

    Geoadditive Regression Modeling of Stream Biological Condition

    Get PDF
    Indices of biotic integrity (IBI) have become an established tool to quantify the condition of small non-tidal streams and their watersheds. To investigate the effects of watershed characteristics on stream biological condition, we present a new technique for regressing IBIs on watershed-specific explanatory variables. Since IBIs are typically evaluated on an ordinal scale, our method is based on the proportional odds model for ordinal outcomes. To avoid overfitting, we do not use classical maximum likelihood estimation but a component-wise functional gradient boosting approach. Because component-wise gradient boosting has an intrinsic mechanism for variable selection and model choice, determinants of biotic integrity can be identified. In addition, the method offers a relatively simple way to account for spatial correlation in ecological data. An analysis of the Maryland Biological Streams Survey shows that nonlinear effects of predictor variables on stream condition can be quantified and, in addition, that accurate predictions of biological condition at unsurveyed locations can be obtained.
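    A minimal sketch of the component-wise gradient boosting idea underlying this approach, assuming squared-error loss and one-variable linear base learners in place of the paper's proportional-odds loss and smooth/spatial base learners; it only illustrates how selecting the single best-fitting component per iteration yields intrinsic variable selection. All data and names below are made up for illustration.

```python
# Minimal sketch of component-wise gradient boosting (illustration only).
# The paper boosts a proportional-odds model with smooth/spatial base learners;
# here squared-error loss and one-variable linear base learners are used to show
# how the component-wise update performs intrinsic variable selection.
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=n)  # only x0, x3 matter

nu, n_iter = 0.1, 300          # step length and number of boosting iterations
f = np.full(n, y.mean())       # current fit, initialised at the offset
coef = np.zeros(p)             # accumulated coefficient per covariate

for _ in range(n_iter):
    u = y - f                                   # negative gradient of squared-error loss
    # fit each candidate base learner (simple linear, no intercept) to u
    betas = X.T @ u / (X ** 2).sum(axis=0)
    rss = ((u[:, None] - X * betas) ** 2).sum(axis=0)
    j = int(np.argmin(rss))                     # update the best-fitting component only
    coef[j] += nu * betas[j]
    f += nu * betas[j] * X[:, j]

print("selected (nonzero) components:", np.flatnonzero(coef))
print("coefficient estimates:", np.round(coef, 2))
```

    In practice, models of this kind are usually fitted with dedicated boosting software (for instance the R package mboost) rather than hand-rolled code; the sketch only mirrors the selection-and-update step.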

    Separation of pulsar signals from noise with supervised machine learning algorithms

    Full text link
    We evaluate the performance of four different machine learning (ML) algorithms: an Artificial Neural Network Multi-Layer Perceptron (ANN MLP), Adaboost, Gradient Boosting Classifier (GBC), and XGBoost, for the separation of pulsars from radio frequency interference (RFI) and other sources of noise, using a dataset obtained from the post-processing of a pulsar search pipeline. This dataset was previously used for cross-validation of the SPINN-based machine learning engine used for the reprocessing of HTRU-S survey data (arXiv:1406.3627). We have used the Synthetic Minority Over-sampling Technique (SMOTE) to deal with the high class imbalance in the dataset. We report a variety of quality scores from all four of these algorithms on both the non-SMOTE and SMOTE datasets. For all the above ML methods, we report high accuracy and G-mean in both the non-SMOTE and SMOTE cases. We study the feature importances using Adaboost, GBC, and XGBoost, and also from the minimum Redundancy Maximum Relevance approach, to report an algorithm-agnostic feature ranking. From these methods, we find the signal-to-noise ratio of the folded profile to be the best feature. We find that all the ML algorithms report FPRs about an order of magnitude lower than the corresponding FPRs obtained in arXiv:1406.3627, for the same recall value. Comment: 14 pages, 2 figures. Accepted for publication in Astronomy and Computing
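    A minimal sketch of the kind of workflow described above: oversample the minority class with SMOTE on the training split only, fit a gradient boosting classifier, and report recall and G-mean. Synthetic imbalanced data stand in for the actual pulsar-candidate features, and the library calls assume scikit-learn and imbalanced-learn.

```python
# Sketch of a SMOTE + gradient-boosting workflow of the kind described above.
# Synthetic imbalanced data stand in for the actual pulsar-candidate features;
# requires scikit-learn and imbalanced-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from imblearn.over_sampling import SMOTE
from imblearn.metrics import geometric_mean_score

# Imbalanced two-class problem (~2% "pulsar" class), mimicking candidate data.
X, y = make_classification(n_samples=5000, n_features=8, n_informative=5,
                           weights=[0.98, 0.02], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

# Oversample the minority class on the training split only, never on the test set.
X_res, y_res = SMOTE(random_state=42).fit_resample(X_tr, y_tr)

clf = GradientBoostingClassifier(random_state=42).fit(X_res, y_res)
pred = clf.predict(X_te)

print("recall:", recall_score(y_te, pred))
print("G-mean:", geometric_mean_score(y_te, pred))
print("top features by importance:", clf.feature_importances_.argsort()[::-1][:3])
```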

    Boosted Beta Regression

    Get PDF
    Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings that are measured on a bounded scale. In this paper, we consider beta regression, which is a generalization of logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fitting a beta regression model is to use maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established, yet unstable, approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.
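    A toy sketch of the boosting step for a beta-distributed response, assuming a logit link for the mean and a fixed precision parameter phi; the paper additionally models the precision (and hence the variance) with flexible effects, which this sketch omits. The update fits simple linear base learners to the gradient of the beta log-likelihood with respect to the linear predictor; all data and names are illustrative.

```python
# Toy sketch of gradient boosting for the mean of a beta-distributed response
# (logit link, precision phi held fixed). The paper also models the precision
# with flexible effects; this only illustrates the gradient step for the mean.
import numpy as np
from scipy.special import digamma, expit

rng = np.random.default_rng(1)
n, p, phi = 500, 5, 20.0
X = rng.normal(size=(n, p))
mu_true = expit(1.0 * X[:, 0] - 0.8 * X[:, 2])
y = rng.beta(mu_true * phi, (1 - mu_true) * phi)

nu, n_iter = 0.1, 500
eta = np.full(n, np.log(y.mean() / (1 - y.mean())))    # start at logit of the mean
coef = np.zeros(p)

for _ in range(n_iter):
    mu = expit(eta)
    # gradient of the beta log-likelihood with respect to the linear predictor eta
    u = phi * mu * (1 - mu) * (np.log(y / (1 - y)) - digamma(mu * phi) + digamma((1 - mu) * phi))
    betas = X.T @ u / (X ** 2).sum(axis=0)              # one simple base learner per covariate
    j = int(np.argmin(((u[:, None] - X * betas) ** 2).sum(axis=0)))
    coef[j] += nu * betas[j]
    eta += nu * betas[j] * X[:, j]

print("boosted coefficients:", np.round(coef, 2))       # x0 and x2 should dominate
```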

    Algorithm selection on data streams

    Get PDF
    We explore the possibilities of meta-learning on data streams, in particular algorithm selection. In a first experiment, we calculate the characteristics of a small sample of a data stream and try to predict which classifier performs best on the entire stream. This yields promising results and interesting patterns. In a second experiment, we build a meta-classifier that predicts, based on measurable data characteristics in a window of the data stream, the best classifier for the next window. The results show that this meta-algorithm is very competitive with state-of-the-art ensembles, such as OzaBag, OzaBoost and Leveraged Bagging. The results of all experiments are made publicly available in an online experiment database, for the purpose of verifiability, reproducibility and generalizability.
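    A rough sketch of the window-based algorithm-selection setup described above, assuming a synthetic stream, simple hand-picked meta-features, and scikit-learn base learners in place of the stream classifiers and ensembles (OzaBag, OzaBoost, Leveraged Bagging) actually evaluated in the paper.

```python
# Sketch of window-based algorithm selection on a stream: meta-features of
# window t are labelled with the base learner that scores best on window t+1,
# and a meta-classifier learns that mapping. Synthetic drifting data and
# scikit-learn base learners stand in for the real streams and MOA ensembles.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
window, n_windows = 300, 40
base = {"nb": GaussianNB(), "lr": LogisticRegression(max_iter=500),
        "tree": DecisionTreeClassifier(max_depth=5)}

# Build a stream of windows with varying label noise to mimic concept change.
windows = [make_classification(n_samples=window, n_features=6, n_informative=4,
                               flip_y=rng.uniform(0, 0.3), random_state=i)
           for i in range(n_windows)]

def meta_features(X, y):
    # cheap, computable-on-the-fly characteristics of a window
    return np.r_[X.mean(axis=0), X.std(axis=0), [y.mean()]]

M, labels = [], []
for (Xa, ya), (Xb, yb) in zip(windows[:-1], windows[1:]):
    scores = {name: clf.fit(Xa, ya).score(Xb, yb) for name, clf in base.items()}
    M.append(meta_features(Xa, ya))
    labels.append(max(scores, key=scores.get))      # best learner on the *next* window

meta = RandomForestClassifier(random_state=0).fit(M[:30], labels[:30])
print("meta-classifier accuracy on held-out windows:",
      meta.score(M[30:], labels[30:]))
```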

    Ursa Major II - Reproducing the observed properties through tidal disruption

    Full text link
    Recent deep photometry of the dwarf spheroidal Ursa Major II's morphology, and spectroscopy of individual stars, have provided a number of new constraints on its properties. With a velocity dispersion of $\sim 6$ km s$^{-1}$, and under the assumption that the galaxy is virialised, the mass-to-light ratio is found to approach $\sim 2000$ - apparently heavily dark matter dominated. Using N-body simulations, we demonstrate that the observed luminosity, ellipticity, irregular morphology, velocity gradient, and velocity dispersion can be well reproduced through processes associated with tidal mass loss, and in the absence of dark matter. These results highlight the considerable uncertainty that exists in measurements of the dark matter content of Ursa Major II. The dynamics of the inner tidal tails, and tidal stream, cause the observed velocity dispersion of stars to be boosted to values of $>5$ km s$^{-1}$ ($>20$ km s$^{-1}$ at times). This effect is responsible for raising the velocity dispersion of our model to the observed values in UMaII. We test an iterative rejection technique for removing unbound stars from samples of UMaII stars whose positions on the sky, and line-of-sight velocities, are provided. We find this technique is very effective at providing an accurate bound mass from this information, and only fails when the galaxy has a bound mass less than 10% of its initial mass. However, when $<2$% of the mass remains bound, overestimates of the mass by $>3$ orders of magnitude are seen. Additionally, we find that mass measurements are sensitive to measurement uncertainty in line-of-sight velocities. Measurement uncertainties of 1-4 km s$^{-1}$ result in mass overestimates by a factor of $\sim$1.3-5.7. Comment: 17 pages, 12 figures, accepted to MNRAS: 23rd, May, 201
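    The abstract does not spell out the iterative rejection technique, so the sketch below only illustrates the general flavour of such membership cleaning with a simple iterative sigma-clipping of line-of-sight velocities on mock data; the 3-sigma threshold, the mock dispersions, and the stopping rule are assumptions, not the paper's procedure.

```python
# Illustrative sketch of an iterative rejection scheme for membership cleaning:
# repeatedly clip stars whose line-of-sight velocities lie more than k sigma
# from the mean of the retained sample, until the sample stops changing.
# The paper's exact scheme is not reproduced here; the 3-sigma threshold and
# the mock data below are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(3)
# mock sample: bound stars (dispersion ~3 km/s) plus unbound tidal-tail stars
v_bound = rng.normal(0.0, 3.0, size=150)
v_tails = rng.normal(0.0, 20.0, size=50)
v_los = np.concatenate([v_bound, v_tails])

def iterative_clip(v, k=3.0, max_iter=20):
    keep = np.ones(v.size, dtype=bool)
    for _ in range(max_iter):
        mu, sigma = v[keep].mean(), v[keep].std()
        new_keep = np.abs(v - mu) < k * sigma
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
    return keep

members = iterative_clip(v_los)
print("raw dispersion:     %.1f km/s" % v_los.std())
print("clipped dispersion: %.1f km/s" % v_los[members].std())
```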