13 research outputs found

    Clustering Time Series from Mixture Polynomial Models with Discretised Data

    Clustering time series is an active research area with applications in many fields. One common feature of time series is the likely presence of outliers. These uncharacteristic data can significantly affect the quality of the clusters formed. This paper evaluates a method of overcoming the detrimental effects of outliers. We describe some of the alternative approaches to clustering time series, then specify a particular class of model for experimentation with k-means clustering and a correlation-based distance metric. For data derived from this class of model we demonstrate that discretising the data into a binary series of above and below the median improves the clustering when the data has outliers. More specifically, we show that, firstly, discretisation does not significantly affect the accuracy of the clusters when there are no outliers and, secondly, it significantly increases the accuracy in the presence of outliers, even when the probability of an outlier is very low.
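As a rough illustration of the median discretisation described in this abstract, the sketch below maps a series to a binary above/below-median series and compares a correlation-based distance before and after. It is not the authors' code: the function names, the toy data and the choice of one minus the Pearson correlation as the distance are all assumptions, since the abstract does not give the exact metric.

```python
from statistics import median


def discretise(series):
    """Map each value to 1 if it lies above the series median, else 0."""
    m = median(series)
    return [1 if x > m else 0 for x in series]


def correlation_distance(a, b):
    """Return 1 - Pearson correlation (assumed metric; 0 means perfectly correlated)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant series has no defined correlation")
    return 1 - cov / (var_a * var_b) ** 0.5


# Hypothetical toy data: the same rising trend, once clean and once with a single outlier.
clean = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
spiked = [1.0, 2.0, 3.0, 4.0, 500.0, 6.0]

raw_distance = correlation_distance(clean, spiked)
binary_distance = correlation_distance(discretise(clean), discretise(spiked))
```

In this toy case the outlier pushes the raw series apart, while the two discretised series are identical, so `binary_distance` is smaller than `raw_distance`. Distances like these could then be passed to a k-means-style procedure.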

    On allometric equations for predicting body mass of dinosaurs

    Packard and colleagues investigate the prediction of the body mass of dinosaurs, using allometric models, advocating parameter estimation via direct optimization of a least-squares criterion on arithmetic axes rather than the conventional approach based on linear least-squares regression on logarithmic axes. We examine the statistical assumptions underpinning each approach, and find the method of Packard to be conceptually unsatisfactory as it assumes absolute rather than relative variability in body mass for a given long-bone circumference, which is biologically implausible. Their proposed approach is thus unduly sensitive to small relative errors for large mammals; as the largest (the elephant) is comparatively light for its long-bone circumference, the resulting model grossly overestimates the body mass of small mammals and is likely to substantially underestimate the body mass of dinosaurs. It is also important to note, however, that the error bars for the conventional model already indicate substantial uncertainty in body mass, such that, for example, the body mass of Apatosaurus louisae may be as high as 63 metric tonnes, or as low as 23 metric tonnes, with a modal value of 38 metric tonnes.
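The conventional approach mentioned in this abstract, linear least-squares regression on logarithmic axes, can be sketched as follows for a power-law model mass = a * circumference**b. Fitting on log axes treats errors as relative (multiplicative) rather than absolute. The function name, coefficients and data are hypothetical, synthetic and noise-free. They are not taken from the paper or from any real specimens.

```python
import math


def fit_allometric_loglog(circumference, mass):
    """Fit mass = a * circumference**b by ordinary least squares on log-log axes."""
    xs = [math.log(c) for c in circumference]
    ys = [math.log(m) for m in mass]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope on log axes = allometric exponent
    a = math.exp(mean_y - b * mean_x)    # intercept on log axes, back-transformed
    return a, b


# Synthetic data generated from assumed coefficients a = 0.5, b = 2.7 (illustrative only).
circ = [10.0, 20.0, 40.0, 80.0]
mass = [0.5 * c ** 2.7 for c in circ]

a, b = fit_allometric_loglog(circ, mass)
```

With noise-free input the fit recovers the generating coefficients up to floating-point error. With real data, the residuals on log axes are what the error bars in the abstract refer to.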

    Sparse Bayesian kernel survival analysis for modeling the growth domain of microbial pathogens

    Survival analysis is a branch of statistics concerned with the time elapsing before "failure," with diverse applications in medical statistics and the analysis of the reliability of electrical or mechanical components. We introduce a parametric accelerated life survival analysis model based on kernel learning methods that, at least in principle, is able to learn arbitrary dependencies between a vector of explanatory variables and the scale of the distribution of survival times. The proposed kernel survival analysis method is then used to model the growth domain of Clostridium botulinum, the food processing and storage conditions permitting the growth of this foodborne microbial pathogen, leading to the production of the neurotoxin responsible for botulism. A Bayesian training procedure, based on the evidence framework, is used for model selection and to provide a credible interval on model predictions. The kernel survival analysis models are found to be more accurate than models based on more traditional survival analysis techniques but also suggest a risk assessment of the foodborne botulism hazard would benefit from the collection of additional data.

    Predictive uncertainty in environmental modelling

    Artificial neural networks have proved an attractive approach to non-linear regression problems arising in environmental modelling, such as statistical downscaling, short-term forecasting of atmospheric pollutant concentrations and rainfall run-off modelling. However, environmental datasets are frequently very noisy and characterized by a noise process that may be heteroscedastic (having input-dependent variance) and/or non-Gaussian. The aim of this paper is to review existing methodologies for estimating predictive uncertainty in such situations and, more importantly, to illustrate how a model of the predictive distribution may be exploited in assessing the possible impacts of climate change and to improve current decision-making processes. The results of the WCCI-2006 predictive uncertainty in environmental modelling challenge are also reviewed, suggesting a number of areas where further research may provide significant benefits.

    Highly Parallel Convolution Method to Compare DNA Sequences with Enforced In/Del and Mutation Tolerance

    The article text is not published in open access, in accordance with the journal's policy. A new error-tolerant method for the comparison and analysis of symbol sequences is proposed. The method is based on calculating a convolution function defined over binary numeric sequences obtained by a specific transformation of the original symbol sequence. The method allows a highly parallel implementation and is of great value in searching for insertion/deletion mutations. The implementation uses the fast Fourier transform to calculate the convolution function.
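To make the idea of FFT-based convolution over binary sequences concrete, the sketch below counts, for every alignment offset, how many symbols of a short sequence agree with a longer one. The abstract does not describe the paper's "specific transformation", so the one-indicator-sequence-per-symbol encoding used here is an assumption. The function names and the example strings are hypothetical, and the small radix-2 FFT is included only to keep the example self-contained.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT (unnormalised); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def match_counts(text, pattern, alphabet="ACGT"):
    """For each offset s, count positions where pattern aligned at s agrees with text."""
    size = 1
    while size < len(text) + len(pattern):  # zero-pad so the convolution does not wrap
        size *= 2
    totals = [0.0] * size
    for symbol in alphabet:
        # Assumed encoding: one binary indicator sequence per symbol.
        t = [1.0 if ch == symbol else 0.0 for ch in text]
        # Reversing the pattern turns the convolution into a correlation.
        p = [1.0 if ch == symbol else 0.0 for ch in reversed(pattern)]
        t += [0.0] * (size - len(t))
        p += [0.0] * (size - len(p))
        product = [x * y for x, y in zip(fft(t), fft(p))]
        conv = fft(product, invert=True)
        for i, c in enumerate(conv):
            totals[i] += c.real / size  # divide by size to complete the inverse FFT
    m = len(pattern)
    return [round(totals[s + m - 1]) for s in range(len(text) - m + 1)]


counts = match_counts("ACGTACGT", "ACGA")
```

Each symbol's indicator sequences are processed independently, so the per-symbol transforms could be run in parallel. Peaks in `counts` at different offsets indicate where the shorter sequence aligns well despite mismatches.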