28 research outputs found

    Profiling Instances in Noise Reduction

    The dependency on the quality of the training data has led to significant work in noise reduction for instance-based learning algorithms. This paper presents an empirical evaluation of current noise reduction techniques, not just from the perspective of their comparative performance, but from the perspective of investigating the types of instances that they focus on for removal. A novel instance profiling technique known as RDCL profiling allows the structure of a training set to be analysed at the instance level, categorising each instance based on modelling their local competence properties. This profiling approach offers the opportunity of investigating the types of instances removed by the noise reduction techniques that are currently in use in instance-based learning. The paper also considers the effect of removing instances with specific profiles from a dataset and shows that a very simple approach of removing instances that are misclassified by the training set and cause other instances in the dataset to be misclassified is an effective noise reduction technique.
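The simple removal rule in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a leave-one-out k-nearest-neighbour classifier with Euclidean distance and majority voting, and the function names (`knn_indices`, `loo_predict`, `noisy_instances`) and the default `k` are hypothetical.

```python
from collections import Counter
import math


def knn_indices(X, i, k):
    """Indices of the k nearest neighbours of X[i], excluding i itself."""
    others = [j for j in range(len(X)) if j != i]
    # Break distance ties by index so the result is deterministic.
    others.sort(key=lambda j: (math.dist(X[i], X[j]), j))
    return others[:k]


def loo_predict(X, y, i, k):
    """Leave-one-out majority vote over the k nearest neighbours of instance i."""
    votes = Counter(y[j] for j in knn_indices(X, i, k))
    return votes.most_common(1)[0][0]


def noisy_instances(X, y, k=3):
    """Instances that are misclassified AND help misclassify another instance.

    Assumption: an instance j "causes" instance i to be misclassified when
    i is misclassified, j is one of i's k nearest neighbours, and j carries
    a different label from i.
    """
    n = len(X)
    misclassified = {i for i in range(n) if loo_predict(X, y, i, k) != y[i]}
    harmful = set()
    for i in misclassified:
        for j in knn_indices(X, i, k):
            if y[j] != y[i]:
                harmful.add(j)
    return sorted(misclassified & harmful)
```

Calling `noisy_instances(X, y, k=1)` on a list of coordinate tuples `X` and a parallel list of labels `y` returns the indices that satisfy both conditions. Requiring both keeps correctly classified neighbours of a noisy point in the training set, even though they contribute to that noisy point being misclassified.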

    Patterns of abundance across geographical ranges as a predictor for responses to climate change: Evidence from UK rocky shores

    Aim: Understanding patterns in the abundance of species across thermal ranges can give useful insights into the potential impacts of climate change. The abundant-centre hypothesis suggests that species will reach peak abundance at the centre of their thermal range where conditions are optimal, but evidence in support of this hypothesis is mixed and limited in geographical and taxonomic scope. We tested the applicability of the abundant-centre hypothesis across a range of intertidal organisms using a large, citizen science-generated data set. Location: UK. Methods: Species' abundance records were matched with their location within their thermal range. Patterns in abundance distribution for individual species, and across aggregated species abundances, were analysed using Kruskal–Wallis tests and quantile general additive models. Results: Individually, invertebrate species showed increasing abundances in the cooler half of the thermal range and decreasing abundances in the warmer half of the thermal range. The overall shape for aggregated invertebrate species abundances reflected a broad peak, with a cool-skewed maximum abundance. Algal species showed little evidence for an abundant-centre distribution individually, but overall the aggregated species abundances suggested a hump-backed abundance distribution. Main Conclusions: Our study follows others in showing mixed support for the abundant-centre hypothesis at an individual species level, but demonstrates an increased predictability in species responses when an aggregated overall response is considered.

    Early-life telomere dynamics differ between the sexes and predict growth in the barn swallow (Hirundo rustica)

    Telomeres are conserved DNA-protein structures at the termini of eukaryotic chromosomes which contribute to maintenance of genome integrity, and their shortening leads to cell senescence, with negative consequences for organismal functions. Because telomere erosion is influenced by extrinsic and endogenous factors, telomere dynamics may provide a mechanistic basis for evolutionary and physiological trade-offs. Yet, knowledge of fundamental aspects of telomere biology under natural selection regimes, including sex- and context-dependent variation in early-life, and the covariation between telomere dynamics and growth, is scant. In this study of barn swallows (Hirundo rustica) we investigated the sex-dependent telomere erosion during nestling period, and the covariation between relative telomere length and body and plumage growth. Finally, we tested whether any covariation between growth traits and relative telomere length depends on the social environment, as influenced by sibling sex ratio. Relative telomere length declined on average over the period of nestling maximal growth rate (between 7 and 16 days of age) and differently covaried with initial relative telomere length in either sex. The frequency distribution of changes in relative telomere length was bimodal, with most nestlings decreasing and some increasing relative telomere length, but none of the offspring traits predicted the a posteriori identified group to which individual nestlings belonged. Tail and wing length increased with relative telomere length, but more steeply in males than females, and this relationship held both at the within- and among-broods levels. Moreover, the increase in plumage phenotypic values was steeper when the sex ratio of an individual's siblings was female-biased. Our study provides evidence for telomere shortening during early life according to subtly different dynamics in either sex. Furthermore, it shows that the positive covariation between growth and relative telomere length depends on sex as well as social environment, in terms of sibling sex ratio.

    Noise Reduction for Instance-Based Learning with a Local Maximal Margin Approach

    To some extent the problem of noise reduction in machine learning has been finessed by the development of learning techniques that are noise-tolerant. However, it is difficult to make instance-based learning noise-tolerant, and noise reduction still plays an important role in k-nearest neighbour classification. There are also other motivations for noise reduction: for instance, the elimination of noise may result in simpler models, or data cleansing may be an end in itself. In this paper we present a novel approach to noise reduction based on local Support Vector Machines (LSVM) which brings the benefits of maximal margin classifiers to bear on noise reduction. This provides a more robust alternative to the majority rule on which almost all the existing noise reduction techniques are based. Roughly speaking, for each training sample an SVM is trained on its neighbourhood and if the SVM classification for the central sample disagrees with its actual class there is evidence in favour of removing it from the training set. We provide an empirical evaluation on 15 real datasets showing improved classification accuracy when using training data edited with our method as well as specific experiments regarding the spam filtering application domain. We present a further evaluation on two artificial datasets where we analyse two different types of noise (Gaussian sample noise and mislabelling noise) and the influence of different class densities. The conclusion is that LSVM noise reduction is significantly better than the other analysed algorithms for real datasets and for artificial datasets perturbed by Gaussian noise and in the presence of uneven class densities.
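The per-sample procedure described in the abstract above can be sketched roughly as follows. This is an illustration under several assumptions that do not come from the paper: labels are binary, the central sample is left out of its own neighbourhood, and a plain linear soft-margin SVM fitted by Pegasos-style subgradient descent stands in for whatever SVM formulation and kernel the authors use. The names `train_linear_svm` and `lsvm_noise_filter`, and the defaults for `k`, `epochs` and `lam`, are hypothetical.

```python
import math


def train_linear_svm(points, labels, epochs=200, lam=0.01):
    """Fit a linear soft-margin SVM (labels +1/-1) by subgradient descent."""
    dim = len(points[0])
    w, b, t = [0.0] * dim, 0.0, 0
    for _ in range(epochs):
        for x, label in zip(points, labels):
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = label * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            w = [wi * (1.0 - eta * lam) for wi in w]  # regularisation shrink
            if margin < 1.0:  # hinge loss is active for this point
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
                b += eta * label
    return w, b


def lsvm_noise_filter(X, y, k=5):
    """Indices whose label disagrees with an SVM trained on their k neighbours."""
    flagged = []
    for i, centre in enumerate(X):
        nbrs = sorted((j for j in range(len(X)) if j != i),
                      key=lambda j: (math.dist(centre, X[j]), j))[:k]
        classes = sorted({y[j] for j in nbrs})
        if len(classes) == 1:
            # Only one class nearby: no SVM needed, predict that class.
            predicted = classes[0]
        else:
            pos, neg = classes  # assumption: exactly two classes
            # Shift coordinates so the central sample sits at the origin;
            # its decision value is then just the bias term b.
            shifted = [[a - c for a, c in zip(X[j], centre)] for j in nbrs]
            signs = [1 if y[j] == pos else -1 for j in nbrs]
            _, b = train_linear_svm(shifted, signs)
            predicted = pos if b >= 0 else neg
        if predicted != y[i]:
            flagged.append(i)
    return flagged
```

`lsvm_noise_filter(X, y, k=3)` returns the indices that would be candidates for removal. Unlike a majority vote, the decision depends on where the margin falls relative to the central sample rather than on a simple count of neighbour labels, which is the property the abstract credits for the method's robustness.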