Geoadditive Regression Modeling of Stream Biological Condition
Indices of biotic integrity (IBIs) have become an established tool for quantifying the condition of small non-tidal streams and their watersheds. To investigate the effects of watershed characteristics on stream biological condition, we present a new technique for regressing IBIs on watershed-specific explanatory variables. Since IBIs are typically evaluated on an ordinal scale, our method is based on the proportional odds model for ordinal outcomes. To avoid overfitting, we use a component-wise functional gradient boosting approach instead of classical maximum likelihood estimation. Because component-wise gradient boosting has an intrinsic mechanism for variable selection and model choice, determinants of biotic integrity can be identified. In addition, the method offers a relatively simple way to account for spatial correlation in ecological data. An analysis of the Maryland Biological Streams Survey shows that nonlinear effects of predictor variables on stream condition can be quantified, while accurate predictions of biological condition at unsurveyed locations are also obtained.
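The boosting idea can be illustrated with a minimal Python sketch. Several assumptions apply:

- It uses one common parameterization of the proportional odds model, P(Y <= k) = F(theta_k - eta), with F the logistic CDF.
- It uses linear base learners on centred covariates.
- The thresholds theta are held fixed and known.

The method described above also estimates the thresholds and uses nonlinear and spatial base learners, which this sketch omits. All function names are hypothetical. In each iteration, every covariate is fitted by least squares to the negative gradient of the loss. Only the best-fitting covariate is then updated by a small step nu, and this gives component-wise boosting its built-in variable selection.

```python
import math
import random


def logistic(z):
    """Numerically stable logistic CDF; returns 0.0 / 1.0 at -inf / +inf."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neg_gradient(y, eta, theta):
    """d/d(eta) of log P(Y = y) under the assumed P(Y <= k) = F(theta_k - eta).

    y is a category index 0..K-1; theta holds the K-1 ordered thresholds.
    """
    bounds = [-math.inf] + list(theta) + [math.inf]
    f_up = logistic(bounds[y + 1] - eta)
    f_lo = logistic(bounds[y] - eta)
    dens_up = f_up * (1.0 - f_up)  # logistic density at the upper bound
    dens_lo = f_lo * (1.0 - f_lo)  # logistic density at the lower bound
    prob = max(f_up - f_lo, 1e-12)  # guard against division by zero
    return (dens_lo - dens_up) / prob


def boost_prop_odds(X, y, theta, n_iter=100, nu=0.1):
    """Component-wise gradient boosting, one linear base learner per covariate.

    Hypothetical helper: thresholds stay fixed, only the coefficients are boosted.
    """
    n, p = len(X), len(X[0])
    # Centre covariates; the intercept is absorbed by the thresholds.
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    coef = [0.0] * p
    eta = [0.0] * n
    for _ in range(n_iter):
        u = [neg_gradient(y[i], eta[i], theta) for i in range(n)]
        best_j, best_b, best_rss = 0, 0.0, math.inf
        for j in range(p):
            xj = [row[j] for row in Xc]
            b = sum(ui * v for ui, v in zip(u, xj)) / sum(v * v for v in xj)
            rss = sum((ui - b * v) ** 2 for ui, v in zip(u, xj))
            if rss < best_rss:  # keep the covariate that best fits the gradient
                best_j, best_b, best_rss = j, b, rss
        coef[best_j] += nu * best_b  # update only the selected component
        eta = [eta[i] + nu * best_b * Xc[i][best_j] for i in range(n)]
    return coef


# Usage on simulated data: only the first of four covariates has an effect.
rng = random.Random(0)
theta = [-1.0, 1.0]
X = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(300)]
y = []
for row in X:
    u01 = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    latent = 2.0 * row[0] + math.log(u01 / (1.0 - u01))  # logistic noise
    y.append(sum(1 for t in theta if latent > t))
coef = boost_prop_odds(X, y, theta)
```

With this set-up the first coefficient should dominate. The noise covariates are only ever picked when they fit the gradient better than the informative one.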
Separation of pulsar signals from noise with supervised machine learning algorithms
We evaluate the performance of four different machine learning (ML) algorithms for separating pulsars from radio frequency interference (RFI) and other sources of noise: an Artificial Neural Network Multi-Layer Perceptron (ANN MLP), AdaBoost, a Gradient Boosting Classifier (GBC), and XGBoost. We use a dataset obtained from the post-processing of a pulsar search pipeline. This dataset was previously used for cross-validation of the SPINN-based machine learning engine used to reprocess HTRU-S survey data (arXiv:1406.3627). We used the Synthetic Minority Over-sampling Technique (SMOTE) to deal with the high class imbalance in the dataset. We report a variety of quality scores from all four algorithms on both the non-SMOTE and SMOTE datasets. For all of these ML methods, we report high accuracy and G-mean in both the non-SMOTE and SMOTE cases. We study feature importances using AdaBoost, GBC, and XGBoost, and also use the minimum Redundancy Maximum Relevance approach to obtain an algorithm-agnostic feature ranking. From these methods, we find the signal-to-noise ratio of the folded profile to be the best feature. All the ML algorithms yield false positive rates (FPRs) about an order of magnitude lower than the corresponding FPRs obtained in arXiv:1406.3627 for the same recall value.
Comment: 14 pages, 2 figures. Accepted for publication in Astronomy and Computing
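The following hedged sketch illustrates two ingredients named in this abstract. The first is a simplified form of SMOTE, in which synthetic minority samples are interpolated between a minority sample and one of its k nearest minority neighbours. The second is the recall, FPR and G-mean scores, taking the G-mean as the geometric mean of recall and specificity. The function names, the Euclidean distance and the random choice of base sample are assumptions for illustration, not the authors' pipeline.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE (hypothetical helper): each synthetic sample lies on the
    segment between a random minority sample and one of its k nearest
    minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, partner)])
    return synthetic


def binary_scores(y_true, y_pred):
    """Recall, false positive rate and G-mean, with 1 = pulsar (positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "recall": recall,
        "fpr": fp / (fp + tn),
        "g_mean": math.sqrt(recall * specificity),
    }


minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_new=10, k=2)
scores = binary_scores([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 1, 0])
# scores: recall 0.5, fpr 0.25, g_mean sqrt(0.5 * 0.75) ≈ 0.612
```

Because the synthetic points are convex combinations of minority samples, they stay inside the region spanned by the minority class. This is why SMOTE adds variety without simply duplicating rows.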
Boosted Beta regression
Regression analysis with a bounded outcome is a common problem in applied statistics. Typical examples include regression models for percentage outcomes and the analysis of ratings measured on a bounded scale. In this paper, we consider beta regression, which generalizes logit models to situations where the response is continuous on the interval (0,1). Consequently, beta regression is a convenient tool for analyzing percentage responses. The classical approach to fitting a beta regression model is maximum likelihood estimation with subsequent AIC-based variable selection. As an alternative to this established but unstable approach, we propose a new estimation technique called boosted beta regression. With boosted beta regression, estimation and variable selection can be carried out simultaneously in a highly efficient way. Additionally, both the mean and the variance of a percentage response can be modeled using flexible nonlinear covariate effects. As a consequence, the new method accounts for common problems such as overdispersion and non-binomial variance structures.
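The likelihood that a boosting step works with can be sketched as follows. The sketch assumes the common mean-precision parameterization y ~ Beta(mu*phi, (1 - mu)*phi) with a logit link for the mean; the abstract does not spell out the parameterization. It computes three things:

- the log-density;
- the implied variance mu(1 - mu)/(1 + phi);
- the derivative of the log-likelihood with respect to the mean predictor eta, which a gradient-boosting iteration would refit with base learners.

The precision phi is held fixed here, whereas the method described above can also model it with covariates. All function names are hypothetical.

```python
import math


def digamma(x):
    """Digamma function for x > 0 via recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:  # shift x upward using psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # psi(x) ~ ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result


def beta_loglik(y, mu, phi):
    """Log-density of y ~ Beta(mu * phi, (1 - mu) * phi) for 0 < y < 1."""
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))


def beta_variance(mu, phi):
    """Var(y) = mu (1 - mu) / (1 + phi): a smaller phi means more dispersion."""
    return mu * (1.0 - mu) / (1.0 + phi)


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def neg_gradient_eta(y, eta, phi):
    """d/d(eta) of the log-likelihood for the mean predictor, logit(mu) = eta.

    This is the quantity base learners would be fitted to in a boosting step.
    """
    mu = inv_logit(eta)
    score_mu = phi * (math.log(y / (1.0 - y))
                      - digamma(mu * phi) + digamma((1.0 - mu) * phi))
    return score_mu * mu * (1.0 - mu)  # chain rule: d mu / d eta = mu (1 - mu)


# Check the analytic gradient against a central finite difference.
y_obs, eta0, phi0, h = 0.4, 0.3, 5.0, 1e-5
numeric = (beta_loglik(y_obs, inv_logit(eta0 + h), phi0)
           - beta_loglik(y_obs, inv_logit(eta0 - h), phi0)) / (2 * h)
analytic = neg_gradient_eta(y_obs, eta0, phi0)
```

The variance formula shows why a separate precision parameter helps with overdispersion. For a fixed mean, the spread of the response can still vary through phi, unlike a binomial-type variance that is fully determined by mu.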
Algorithm selection on data streams
We explore the possibilities of meta-learning on data streams, in particular algorithm selection. In a first experiment, we calculate the characteristics of a small sample of a data stream and try to predict which classifier performs best on the entire stream. This yields promising results and interesting patterns. In a second experiment, we build a meta-classifier that predicts, based on measurable data characteristics in a window of the data stream, the best classifier for the next window. The results show that this meta-algorithm is very competitive with state-of-the-art ensembles such as OzaBag, OzaBoost and Leveraged Bagging. The results of all experiments are made publicly available in an online experiment database for the purpose of verifiability, reproducibility and generalizability.
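The window-based set-up of the second experiment can be sketched as follows. The abstract does not name the meta-features or the meta-learner. This sketch therefore assumes two simple meta-features (class entropy and mean absolute feature value) and a 1-nearest-neighbour meta-classifier. The classifier labels clf_a and clf_b, and all function and class names, are hypothetical placeholders.

```python
import math
from collections import Counter


def meta_features(window):
    """Two illustrative meta-features of a window of (features, label) pairs:
    class entropy in bits and the mean absolute feature value."""
    labels = [label for _, label in window]
    n = len(labels)
    counts = Counter(labels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    n_feat = len(window[0][0])
    mean_abs = sum(abs(v) for x, _ in window for v in x) / (n * n_feat)
    return (entropy, mean_abs)


class NearestNeighbourMetaLearner:
    """1-NN meta-classifier: predicts the classifier that was best on the
    window following the most similar previously seen window."""

    def __init__(self):
        self.history = []  # (meta-features, best classifier on the next window)

    def record(self, features, best_on_next_window):
        self.history.append((features, best_on_next_window))

    def predict(self, features):
        nearest = min(self.history, key=lambda h: math.dist(h[0], features))
        return nearest[1]


learner = NearestNeighbourMetaLearner()
learner.record(meta_features([([1.0], 0), ([1.0], 0)]), "clf_a")  # pure window
learner.record(meta_features([([1.0], 0), ([1.0], 1)]), "clf_b")  # mixed window
choice = learner.predict(meta_features([([2.0], 1), ([2.0], 0)]))
# choice is "clf_b": the new window is class-balanced, like the second one
```

In a real stream, the label recorded for each window would be whichever base classifier scored best on the following window. Any meta-learner could replace the 1-NN rule.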
Ursa Major II - Reproducing the observed properties through tidal disruption
Recent deep photometry of the dwarf spheroidal Ursa Major II's morphology, and spectroscopy of individual stars, have provided a number of new constraints on its properties. With a velocity dispersion of 6 km s^-1, and under the assumption that the galaxy is virialised, the mass-to-light ratio is found to approach 2000, so the galaxy appears heavily dark matter dominated. Using N-body simulations, we demonstrate that processes associated with tidal mass loss, in the absence of dark matter, can reproduce the observed properties well. These properties are the luminosity, ellipticity, irregular morphology, velocity gradient, and velocity dispersion. These results highlight the considerable uncertainty in measurements of the dark matter content of Ursa Major II. The dynamics of the inner tidal tails and the tidal stream boost the observed velocity dispersion of stars to values of 5 km s^-1 (20 km s^-1 at times). This effect is responsible for raising the velocity dispersion of our model to the observed values in UMaII. We test an iterative rejection technique for removing unbound stars from samples of UMaII stars whose positions on the sky and line-of-sight velocities are provided. We find that this technique is very effective at recovering an accurate bound mass from this information. It only fails when the galaxy has a bound mass of less than 10% of its initial mass. In that regime, however, mass overestimates of 3 orders of magnitude are seen. Additionally, we find that mass measurements are sensitive to measurement uncertainty in line-of-sight velocities: uncertainties of 1-4 km s^-1 result in mass overestimates by a factor of 1.3-5.7.
Comment: 17 pages, 12 figures, accepted to MNRAS: 23rd, May, 201
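The virial mass argument and the iterative rejection step can be sketched as follows. The abstract gives neither the rejection criterion nor the mass estimator, so the sketch makes these assumptions:

- a virial-type estimator M = k sigma^2 R / G, with a hypothetical prefactor k = 2.5;
- the (upper) median projected radius as R;
- an escape-speed cut on line-of-sight velocity offsets.

The function names are hypothetical, and this is not the authors' procedure. A final helper assumes that velocity errors add in quadrature to the intrinsic dispersion. Under that assumption, it shows why mass estimates, which scale as sigma^2, grow with measurement uncertainty.

```python
import math

G = 4.30091e-3  # gravitational constant in pc (km/s)^2 / M_sun


def virial_mass(sigma_kms, radius_pc, k=2.5):
    """Virial-type estimate M = k * sigma^2 * R / G in solar masses.
    The prefactor k depends on the assumed mass profile (hypothetical value)."""
    return k * sigma_kms ** 2 * radius_pc / G


def iterative_rejection(stars, k=2.5, max_iter=50):
    """stars: list of (projected radius in pc, line-of-sight velocity in km/s).

    Repeatedly estimate the mass from the current members and drop stars whose
    velocity offset exceeds the escape speed sqrt(2 G M / R) at their radius.
    """
    members = list(stars)
    mass = 0.0
    for _ in range(max_iter):
        n = len(members)
        v_mean = sum(v for _, v in members) / n
        sigma = math.sqrt(sum((v - v_mean) ** 2 for _, v in members) / n)
        r_med = sorted(r for r, _ in members)[n // 2]  # upper median radius
        mass = virial_mass(sigma, r_med, k)
        kept = [(r, v) for r, v in members
                if abs(v - v_mean) <= math.sqrt(2.0 * G * mass / r)]
        if len(kept) == n or len(kept) < 3:
            break  # converged, or too few stars left to continue
        members = kept
    return members, mass


def error_inflation(sigma_int, delta):
    """Factor by which a mass proportional to sigma^2 is overestimated when a
    velocity error delta adds in quadrature: sigma_obs^2 = sigma_int^2 + delta^2."""
    return (sigma_int ** 2 + delta ** 2) / sigma_int ** 2


# Ten bound stars plus one interloper at +60 km/s, all at R = 100 pc.
sample = [(100.0, v) for v in (-2, -1, 0, 1, 2, -2, -1, 0, 1, 2)] + [(100.0, 60.0)]
bound, bound_mass = iterative_rejection(sample)
```

In this toy sample the interloper inflates the first dispersion estimate, but it still lies beyond the resulting escape speed. It is therefore rejected, and the second pass converges on the ten bound stars.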