
    Self-consistent redshift estimation using correlation functions without a spectroscopic reference sample

    We present a new method to estimate redshift distributions and galaxy-dark matter bias parameters using correlation functions in a fully data-driven and self-consistent manner. Unlike other machine learning, template, or correlation redshift methods, this approach does not require a reference sample with known redshifts. By measuring the projected cross- and auto-correlations of different galaxy sub-samples, e.g., as chosen by simple cells in color-magnitude space, we are able to estimate the galaxy-dark matter bias model parameters and the shape of the redshift distribution of each sub-sample. The method fully marginalises over a flexible parameterisation of the redshift distributions and galaxy-dark matter bias parameters of the sub-samples, and thus provides a general Bayesian framework to incorporate redshift uncertainty into the cosmological analysis in a data-driven, consistent, and reproducible manner. We showcase how this method could be applied to real galaxies. Using idealised data vectors, in which all galaxy-dark matter model parameters and redshift distributions are known, the method is demonstrated to recover unbiased estimates of important quantities, such as the offset Δz between the means of the true and estimated redshift distributions, and the 68%, 95%, and 99.5% widths of the redshift distribution, to the accuracy required by current and future surveys. These constraints are further improved by an order of magnitude when cross-correlations with the CMB and with galaxy-galaxy lensing are included. Comment: 20 pages, 11 figures; text revised for clarification; version accepted by journal, conclusions unchanged.
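    The core forward model can be illustrated with a short sketch. The toy Python example below fits sub-sample redshift histograms and linear bias parameters to an idealised matrix of projected cross- and auto-correlation amplitudes. The histogram parameterisation, bias values, and dark-matter clustering amplitude are illustrative assumptions, not the paper's actual data vector or likelihood, and a simple least-squares fit stands in for the full Bayesian marginalisation:

    ```python
    import numpy as np
    from scipy.optimize import least_squares

    rng = np.random.default_rng(0)

    z_grid = np.linspace(0.1, 1.1, 6)      # centres of the redshift histogram bins
    n_sub, n_bins = 3, len(z_grid)         # sub-samples, e.g. color-magnitude cells

    # "True" normalised redshift distributions (rows) and linear bias per sub-sample.
    nz_true = rng.dirichlet(np.full(n_bins, 2.0), size=n_sub)
    bias_true = np.array([1.2, 1.5, 1.8])
    w_dm = 1.0 / (1.0 + z_grid)            # toy dark-matter clustering amplitude per bin

    def model(nz, bias):
        """Correlation amplitudes w_ij = b_i * b_j * sum_k n_i(z_k) n_j(z_k) w_dm(z_k)."""
        bn = bias[:, None] * nz
        return bn @ (bn * w_dm).T

    w_obs = model(nz_true, bias_true)      # idealised, noiseless data vector

    def unpack(p):
        nz = np.abs(p[:n_sub * n_bins]).reshape(n_sub, n_bins)
        nz /= nz.sum(axis=1, keepdims=True)        # each n(z) normalised to unity
        return nz, np.abs(p[n_sub * n_bins:])

    def resid(p):
        nz, bias = unpack(p)
        return (model(nz, bias) - w_obs).ravel()

    p0 = np.concatenate([np.full(n_sub * n_bins, 1.0 / n_bins), np.full(n_sub, 1.4)])
    fit = least_squares(resid, p0)

    nz_fit, bias_fit = unpack(fit.x)
    delta_z = nz_fit @ z_grid - nz_true @ z_grid   # offset of estimated vs true mean z
    print("Delta_z per sub-sample:", np.round(delta_z, 3))
    print("bias fit:", np.round(bias_fit, 2), "true:", bias_true)
    ```

    Normalising each n(z) to unity pushes the overall amplitude into the bias parameters, which is why the two sets of parameters must be fitted jointly rather than sequentially.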

    Feature importance for machine learning redshifts applied to SDSS galaxies

    We present an analysis of feature importance selection applied to photometric redshift estimation using the machine learning architecture of decision trees with the ensemble learning routine AdaBoost (hereafter RDF). We select a list of 85 easily measured (or derived) photometric quantities (or 'features') and spectroscopic redshifts for almost two million galaxies from the Sloan Digital Sky Survey Data Release 10. After identifying which features have the most predictive power, we use standard artificial Neural Networks (aNN) to show that the addition of these features, in combination with the standard magnitudes and colours, improves the machine learning redshift estimate by 18% and decreases the catastrophic outlier rate by 32%. We further compare the redshift estimates from the RDF with those from two different aNNs, and with photometric redshifts available from the SDSS. We find that the RDF requires orders of magnitude less computation time than the aNNs to obtain a machine learning redshift, while reducing both the catastrophic outlier rate by up to 43% and the redshift error by up to 25%. When compared to the SDSS photometric redshifts, the RDF machine learning redshifts both decrease the standard deviation of residuals scaled by 1/(1+z) by 36%, from 0.066 to 0.041, and decrease the fraction of catastrophic outliers by 57%, from 2.32% to 0.99%. Comment: 10 pages, 4 figures; updated to match version accepted in MNRAS.
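    As a hedged illustration of the RDF setup named above (decision trees boosted with AdaBoost, ranked by impurity-based feature importance), here is a minimal scikit-learn sketch. The feature names and the synthetic data are placeholders, not the 85 SDSS DR10 quantities used in the paper:

    ```python
    import numpy as np
    from sklearn.ensemble import AdaBoostRegressor
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(42)
    features = ["u-g", "g-r", "r-i", "i-z", "petroRad_r", "fracDeV_r"]  # placeholders
    X = rng.normal(size=(5000, len(features)))
    z_spec = 0.5 + 0.4 * X[:, 1] + 0.2 * X[:, 2] + 0.05 * rng.normal(size=5000)

    # Boosted decision trees; 'estimator=' requires scikit-learn >= 1.2
    # (older versions use 'base_estimator=').
    model = AdaBoostRegressor(estimator=DecisionTreeRegressor(max_depth=6),
                              n_estimators=100, random_state=0)
    model.fit(X, z_spec)

    # Rank features by mean impurity-based importance across the boosted trees.
    for name, score in sorted(zip(features, model.feature_importances_),
                              key=lambda t: -t[1]):
        print(f"{name:12s} {score:.3f}")
    ```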

    Tuning target selection algorithms to improve galaxy redshift estimates

    We showcase machine learning (ML) inspired target selection algorithms that determine which of all potential targets should be selected first for spectroscopic follow-up. Efficient target selection can improve the ML redshift uncertainties, as calculated on an independent sample, while requiring fewer targets to be observed. We compare the ML targeting algorithms with the Sloan Digital Sky Survey (SDSS) target order, and with a random targeting algorithm. The ML-inspired algorithms are constructed iteratively by estimating which of the remaining target galaxies will be most difficult for the machine learning methods, trained on the previously observed data, to assign accurate redshifts. This is performed by predicting the expected redshift error and redshift offset (or bias) of all of the remaining target galaxies. We find that the predicted values of bias and error are accurate to better than 10-30% of the true values, even with only limited training sample sizes. We construct a hypothetical follow-up survey and find that some of the ML targeting algorithms are able to obtain the same redshift predictive power with 2-3 times less observing time than the SDSS, or random, target selection algorithms. The reduction in the required follow-up resources could allow for a change to the follow-up strategy, for example by obtaining deeper spectroscopy, which could improve ML redshift estimates for deeper test data. Comment: 16 pages, 9 figures; updated to match MNRAS accepted version. Minor text changes, results unchanged.
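    A minimal sketch of one plausible version of this iterative loop is given below, assuming synthetic photometry and a random forest as both the redshift estimator and the error predictor (the paper's actual architectures and selection criteria may differ). Cross-validated predictions are used so the per-galaxy errors are not overfit:

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(1)
    X = rng.normal(size=(2000, 5))                 # placeholder photometric features
    z_true = 0.5 + 0.3 * X[:, 0] + 0.1 * rng.normal(size=2000)

    observed = list(rng.choice(2000, size=100, replace=False))  # initial spectroscopy
    remaining = [i for i in range(2000) if i not in set(observed)]

    for round_ in range(5):
        # Cross-validated redshift predictions give honest per-galaxy errors.
        zmodel = RandomForestRegressor(n_estimators=100, random_state=0)
        z_cv = cross_val_predict(zmodel, X[observed], z_true[observed], cv=3)
        err = np.abs(z_cv - z_true[observed])

        # Second model: predict the expected redshift error of unobserved targets.
        emodel = RandomForestRegressor(n_estimators=100, random_state=0)
        emodel.fit(X[observed], err)
        pred_err = emodel.predict(X[remaining])

        # Target next the galaxies predicted to be hardest for the estimator.
        chosen = [remaining[i] for i in np.argsort(pred_err)[-50:]]
        observed += chosen
        remaining = [i for i in remaining if i not in set(chosen)]
        print(f"round {round_}: training set now {len(observed)} galaxies")
    ```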

    Control strategies for the root-knot nematode Meloidogyne hapla in organic farming

    The root-knot nematode Meloidogyne hapla is a major pest in organic farming, causing severe damage especially on vegetables. Common practices such as high cropping frequencies of legumes and low frequencies of cereals, in association with unsatisfactory weed control, are assumed to be major factors in nematode build-up. Due to the broad host spectrum of M. hapla, strategies based solely on crop rotation are often not sufficient to control the nematode. A series of field experiments was conducted to develop more efficient control strategies. Based on the results, a recommendation for reducing high nematode densities was developed. It is built on black fallow throughout the main vegetation period, buffered by additional measures such as prior cultivation of an overwintering legume, incorporated early in spring before the nematode has multiplied, and followed by an overwintering cereal to conserve soil nutrients and avoid erosion. In the long term, any build-up of damaging levels of M. hapla needs to be avoided through a higher cropping frequency of non-host crops (e.g. cereals, Tagetes), growing of catch crops (e.g. fodder radish), satisfactory weed control, short periods of black fallow to allow the soil to rest, and avoidance of clover immediately before growing susceptible vegetables.

    Anomaly detection for machine learning redshifts applied to SDSS galaxies

    We present an analysis of anomaly detection for machine learning redshift estimation. Anomaly detection allows the removal of poor training examples, which can adversely influence redshift estimates. Anomalous training examples may be photometric galaxies with incorrect spectroscopic redshifts, or galaxies with one or more poorly measured photometric quantities. We select 2.5 million 'clean' SDSS DR12 galaxies with reliable spectroscopic redshifts, and 6730 'anomalous' galaxies whose spectroscopic redshift measurements are flagged as unreliable. We contaminate the clean base galaxy sample with the galaxies with unreliable redshifts and attempt to recover the contaminating galaxies using the Elliptical Envelope technique. We then train four machine learning architectures for redshift analysis on both the contaminated sample and the preprocessed 'anomaly-removed' sample, and measure redshift statistics on a clean validation sample generated without any preprocessing. We find an improvement of up to 80% on all measured statistics when training on the anomaly-removed sample as compared with training on the contaminated sample, for each of the machine learning routines explored. We further describe a method to estimate the contamination fraction of a base data sample. Comment: 13 pages, 8 figures, 1 table; minor text updates to match MNRAS accepted version.
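    The Elliptical Envelope step maps directly onto scikit-learn's EllipticEnvelope, which fits a robust Gaussian to the feature space and flags points outside it. The synthetic features and the contamination fraction below are illustrative assumptions, not the SDSS DR12 sample:

    ```python
    import numpy as np
    from sklearn.covariance import EllipticEnvelope

    rng = np.random.default_rng(3)
    clean = rng.normal(size=(5000, 4))                         # well-measured photometry
    anomalous = rng.normal(loc=6.0, scale=3.0, size=(150, 4))  # corrupted measurements
    X = np.vstack([clean, anomalous])

    # Fit a robust Gaussian envelope; points outside it are labelled -1.
    detector = EllipticEnvelope(contamination=0.03, random_state=0)
    labels = detector.fit_predict(X)

    keep = labels == 1
    print(f"kept {keep.sum()} of {len(X)} galaxies; flagged {(~keep).sum()} as anomalous")
    # The redshift architectures would then be trained on X[keep] only.
    ```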

    Stacking for machine learning redshifts applied to SDSS galaxies

    We present an analysis of a general machine learning technique called 'stacking' for the estimation of photometric redshifts. Stacking techniques can feed the photometric redshift estimate, as output by a base algorithm, back into the same algorithm as an additional input feature in a subsequent learning round. We show that all tested base algorithms benefit from at least one additional stacking round (or layer). To demonstrate the benefit of stacking, we apply the method to both unsupervised machine learning techniques based on self-organising maps (SOMs) and supervised machine learning methods based on decision trees. We explore a range of stacking architectures, such as the number of layers and the number of base learners per layer. Finally, we explore the effectiveness of stacking even when using a successful algorithm such as AdaBoost. We observe a significant improvement of between 1.9% and 21% on all computed metrics when stacking is applied to weak learners (such as SOMs and decision trees). When applied to strong learning algorithms (such as AdaBoost) the improvement shrinks but remains positive, between 0.4% and 2.5% for the explored metrics, and comes at almost no additional computational cost. Comment: 13 pages, 3 tables, 7 figures; version accepted by MNRAS, minor text updates. Results and conclusions unchanged.
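    A single stacking round, as described above, can be sketched in a few lines: the base learner's redshift estimate becomes an extra input feature for a second learner of the same architecture. The data are synthetic, and a decision tree stands in for the paper's SOM, tree, and AdaBoost base learners; a careful implementation would use cross-validated base predictions to avoid leakage between layers:

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(7)
    X = rng.normal(size=(4000, 5))
    z = 0.5 + 0.3 * X[:, 0] + 0.2 * X[:, 1] ** 2 + 0.05 * rng.normal(size=4000)
    X_tr, X_te, z_tr, z_te = X[:3000], X[3000:], z[:3000], z[3000:]

    # Layer 1: a weak base learner.
    base = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_tr, z_tr)

    # Layer 2: same architecture, with the layer-1 estimate appended as a feature.
    X_tr2 = np.column_stack([X_tr, base.predict(X_tr)])
    X_te2 = np.column_stack([X_te, base.predict(X_te)])
    stacked = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_tr2, z_tr)

    for name, mdl, Xe in [("base", base, X_te), ("stacked", stacked, X_te2)]:
        resid = mdl.predict(Xe) - z_te
        print(f"{name:8s} sigma(residuals) = {resid.std():.4f}")
    ```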

    Statistical downscaling of future hourly precipitation extremes in the UK using regional climate models and circulation patterns

    Observational trends, physical reasoning, and modelling results suggest an increase in extreme precipitation with climate warming. In particular, sub-daily precipitation extremes are expected to increase strongly, raising concerns about the future impacts of flash floods in urban environments and in small or steep river catchments. In order to quantify the potential risk of flash floods in the future, impact studies often require site-specific sub-daily estimates of precipitation extremes. At their current stage, however, most Regional Climate Models (RCMs) are only able to provide areal-averaged projections at ca. 12.5 km resolution, and simulated sub-daily precipitation extremes tend to be heavily biased. As a result, statistical downscaling methods are needed to provide more reliable, site-specific projections of sub-daily precipitation extremes. In this thesis, a statistical downscaling method was developed to project site-specific future hourly precipitation extremes over the UK. Circulation patterns (CPs) were classified using a fuzzy rules based approach to categorise extreme hourly precipitation events according to their corresponding atmospheric conditions. In the next step, an analogue day method was applied to find the most similar day in the past by comparing the RCM-simulated daily precipitation and temperature with the observations for each CP. The daily maximum hourly precipitation record on the most similar day was extracted and perturbed based on precipitation duration-temperature relationships conditioned on CPs. Within the field of statistical downscaling techniques, the applied method is best described as a hybrid of analogue and regression-based methods. It was shown that the method is capable of reproducing observed extreme hourly precipitation over different validation periods. Projections based on the applied statistical downscaling method indicate increases in UK hourly extremes, but with high variation across the twelve stations, the two future time periods, the two emission scenarios, and the four GCM-driven RCMs.
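    The analogue day step admits a compact sketch. Below, synthetic stand-ins replace the observed station records and the RCM output, and a normalised Euclidean distance in (daily precipitation, temperature) space serves as the similarity measure within each circulation pattern; the thesis then perturbs the retrieved hourly maximum using the duration-temperature scaling, which is omitted here:

    ```python
    import numpy as np

    rng = np.random.default_rng(11)
    n_obs = 3650                                        # ~10 years of observed days
    obs_cp = rng.integers(0, 8, size=n_obs)             # circulation pattern per day
    obs_daily = np.column_stack([
        rng.gamma(2.0, 3.0, n_obs),                     # daily precipitation total (mm)
        rng.normal(10.0, 6.0, n_obs),                   # daily mean temperature (C)
    ])
    obs_max_hourly = 0.3 * obs_daily[:, 0] * rng.uniform(0.5, 1.5, n_obs)  # mm/h

    def analogue_max_hourly(sim_precip, sim_temp, sim_cp):
        """Max hourly precipitation of the most similar observed day in the same CP."""
        mask = obs_cp == sim_cp
        cand = obs_daily[mask]
        scale = cand.std(axis=0)                        # normalise both variables
        dist = np.linalg.norm((cand - [sim_precip, sim_temp]) / scale, axis=1)
        return obs_max_hourly[mask][np.argmin(dist)]

    # One RCM-simulated day: 25 mm total, 14 C, circulation pattern 3.
    print(f"{analogue_max_hourly(25.0, 14.0, 3):.1f} mm/h")
    ```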