The effect on inferences of population size of the sampling scheme for intraspecific DNA sequences
Variation in samples of DNA sequences from within one species can be informative about the demographic processes that have affected that species, revealing signals of past migration patterns and population size changes. The demographic models fitted to the data may vary, as may the way the data are used, but one almost ubiquitous assumption is that the sequenced samples are randomly chosen. Yet this assumption is rarely plausible, either because random sampling is practically impossible or because the samples for analysis are deliberately selected in some non-random way.
Taking a simulation approach, this thesis explores how robust a flexible class of models for inferring variable population size, the so-called skyline plot methods, is to non-random sampling. The particular sampling scheme investigated takes sequences belonging to one subtree (or haplogroup) of the genealogy of a non-recombining locus. Pitfalls of analyses that ignore the sampling scheme are reported, and a recommendation for interpreting such analyses is made.
This work uses the Bayesian skyline plot model to infer population sizes. In simulation settings, the model proves accurate in estimating population size as a function of time from random samples. When a non-random sample defined by a haplogroup is analysed, the model infers the shape of the population curve well but fails to capture its magnitude, compared with the population curve inferred from a random sample or with the true population curve. Functional data analysis techniques were used to explore the relationship between the population curves inferred from random and non-random samples. Having established that the two are indeed strongly related, the goal was to develop a straightforward post hoc correction to the population curve inferred from the non-random sample: one that is easy to apply and lets practitioners allow for the violation of model assumptions caused by the non-random sample, so obtaining a more reliable estimate of population size. The approach is illustrated on samples of human mitochondrial DNA sequences. The correction uses information on the prevalence of the mutation defining the non-random subtree.
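The abstract does not spell out the simulation set-up, so the following is only a rough, hypothetical illustration of the sampling scheme it describes. The sketch simulates a Kingman coalescent genealogy for a haploid locus (such as mtDNA) under a constant population size, takes the tips of one randomly chosen subtree as a stand-in "haplogroup" sample, and applies the classic skyline plot estimator, a simpler non-Bayesian relative of the Bayesian skyline plot used in the thesis, to both that sample and an equally sized random sample. In the interval during which k lineages exist, the classic skyline estimate is k(k-1)/2 times the interval length. The function names, the constant-size model, the subtree-selection rule and the parameter values are all assumptions made for illustration; they are not the thesis's own code or settings.

```python
import random


def simulate_coalescent(n, pop_size, rng):
    """Simulate a Kingman coalescent genealogy of n contemporaneous haploid tips.

    Assumes a constant population size `pop_size`, with time in generations.
    Returns (children, heights): `children` maps each internal node id to its
    two child ids, and `heights` maps every node id to its age (tips are 0).
    """
    lineages = list(range(n))                 # tips are nodes 0..n-1
    heights = {i: 0.0 for i in lineages}
    children = {}
    t, next_id = 0.0, n
    while len(lineages) > 1:
        k = len(lineages)
        # Waiting time to the next coalescence is exponential with rate C(k,2)/N.
        t += rng.expovariate(k * (k - 1) / 2 / pop_size)
        a, b = rng.sample(lineages, 2)        # two lineages merge, chosen uniformly
        lineages.remove(a)
        lineages.remove(b)
        children[next_id] = (a, b)
        heights[next_id] = t
        lineages.append(next_id)
        next_id += 1
    return children, heights


def clade_nodes(node, children):
    """Return (tips, internal_nodes) of the clade rooted at `node`."""
    tips, internal, stack = [], [], [node]
    while stack:
        current = stack.pop()
        if current in children:
            internal.append(current)
            stack.extend(children[current])
        else:
            tips.append(current)
    return tips, internal


def classic_skyline(coalescence_times):
    """Classic skyline estimates for a sample whose tips are all at time 0.

    In the interval with k lineages, N_hat = k(k-1)/2 * interval length.
    Returns a list of (start, end, N_hat), ordered from present to past.
    """
    times = sorted(coalescence_times)
    k, start, out = len(times) + 1, 0.0, []
    for end in times:
        out.append((start, end, k * (k - 1) / 2 * (end - start)))
        start, k = end, k - 1
    return out


def haplogroup_sample(children, heights, min_tips, rng):
    """Hypothetical haplogroup scheme: a random non-root clade with >= min_tips tips.

    Returns the coalescence times inside that clade.
    """
    root = max(children)                      # the last node created is the root
    candidates = [v for v in children
                  if v != root and len(clade_nodes(v, children)[0]) >= min_tips]
    _, internal = clade_nodes(rng.choice(candidates), children)
    return [heights[v] for v in internal]


if __name__ == "__main__":
    rng = random.Random(1)
    N = 10_000                                # illustrative true population size
    children, heights = simulate_coalescent(200, N, rng)
    sub_times = haplogroup_sample(children, heights, 20, rng)
    m = len(sub_times) + 1
    # A random sample of m lineages has the same genealogy distribution as an m-tip coalescent.
    rand_children, rand_heights = simulate_coalescent(m, N, rng)
    rand_times = [rand_heights[v] for v in rand_children]
    for label, times in (("random", rand_times), ("haplogroup", sub_times)):
        estimates = [nh for _, _, nh in classic_skyline(times)]
        print(label, m, "tips, mean N_hat =", round(sum(estimates) / len(estimates)))
```

Running the script prints the mean skyline estimate for each sample, which can be compared with the true value N. This toy setting is only a way to get a feel for the sampling scheme and is no substitute for the Bayesian skyline analysis and functional data analysis described above.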
Classification of cow diet based on milk mid infrared spectra: a data analysis competition at the "International workshop of spectroscopy and chemometrics 2022"
In April 2022, the Vistamilk SFI Research Centre organized the second edition of the "International Workshop on Spectroscopy and Chemometrics - Applications in Food and Agriculture". Within this event, a data challenge was held among the workshop participants. The competition aimed to develop a prediction model that discriminates dairy cows' diet based on milk spectral information collected in the mid-infrared region. An accurate and reliable discriminant model for dairy cows' diet can provide dairy processors with important authentication tools, allowing them to guarantee that dairy products originate from grass-fed animals. Participants employed a range of statistical and machine learning modelling approaches, involving different pre-processing steps and different degrees of complexity. The present paper describes the statistical methods that participants adopted to develop such classification models.
Comment: 27 pages, 9 figures
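The abstract does not say which pre-processing steps or classifiers the participants used. As a minimal sketch of the general pipeline only, the example below assumes standard normal variate (SNV) pre-processing, a common chemometrics transform that centres each spectrum on its mean and scales it by its standard deviation, followed by a nearest-centroid classifier. Both choices, the function names, the class labels "grass" and "other", and the five-point toy spectra are hypothetical; real milk MIR spectra contain many more wavelength points.

```python
import math


def snv(spectrum):
    """Standard normal variate: centre a spectrum on its mean, scale by its std.

    Removes per-sample offset and multiplicative scaling (a constant spectrum
    would give a zero standard deviation and is not handled here).
    """
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    return [(x - mean) / sd for x in spectrum]


def fit_centroids(spectra, labels):
    """Average the SNV-transformed spectra of each (hypothetical) diet class."""
    sums, counts = {}, {}
    for spectrum, label in zip(spectra, labels):
        x = snv(spectrum)
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, spectrum):
    """Assign the class whose centroid is closest in Euclidean distance."""
    x = snv(spectrum)
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


if __name__ == "__main__":
    # Toy spectra, invented for illustration: "grass" peaks mid-band, "other" dips.
    train = [
        [1.0, 2.0, 3.0, 2.0, 1.0],
        [2.0, 4.1, 6.0, 3.9, 2.0],
        [3.0, 2.0, 1.0, 2.0, 3.0],
        [6.0, 4.0, 2.1, 4.0, 6.0],
    ]
    labels = ["grass", "grass", "other", "other"]
    centroids = fit_centroids(train, labels)
    print(predict(centroids, [10.0, 20.0, 30.0, 20.0, 10.0]))  # prints "grass"
```

Because SNV removes offset and scale, the query spectrum above is treated the same as the first training spectrum. In practice, competition entries would be compared with cross-validated accuracy on held-out spectra rather than on a single toy query.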