Research into dog health has historically relied on small-scale cross-sectional
studies or specialised medical and clinical data, which are subject to bias and
are difficult to generalise to the wider canine population. The digital era
presents an opportunity to collect Big Data, defined as data that are high in
volume, velocity or variability, for health surveillance and research. Data
science techniques have made accessing, managing and analysing such
datasets more achievable.
Large-scale cohort studies are needed to estimate the incidence of
disease and to identify factors associated with long-term canine health. This
project was primarily based on dog owner questionnaires from Dogslife, an
internet-based cohort of Labrador Retrievers in the UK set up in 2010. In this
thesis, I designed data cleaning methods for Dogslife and validated some of
them on veterinary and human medical records. I also investigated the
epidemiology of canine health using Dogslife data, Google Trends and 16S
ribosomal RNA gene sequencing data derived from canine faecal samples.
A decision-making algorithm for identifying, correcting or removing
implausible values in growth measurements was designed and tested in
combination with five different data cleaning methods, which were then
applied to five datasets. The algorithm was most effective in combination with
non-linear mixed effects models, increasing average sensitivity and
specificity by 7.68% and 0.42%, respectively, compared with the models alone. This
method was adaptable and had several useful functions including allowing for
individual growth trajectories, preserving data where possible and removing
duplications.
A vomiting outbreak was evident in UK dogs between December 2019
and March 2020 in data from Dogslife and Google Trends search queries.
The odds of a vomiting incident being reported to Dogslife were 1.51 times
(95% CI: 1.24 – 1.84) those in the same time period in previous years
(December to March, 2010 to 2019). Dogslife data identified risk factors for a
dog experiencing a vomiting episode and differences in owners’ decision-making
about seeking veterinary attention for vomiting during the outbreak.
Compared with previous years (March 23rd to July 4th, 2010 to 2019), the
COVID-19 restrictions study period (March 23rd to July 4th 2020) was
associated with owners reporting increases in their dogs’ exercise and
worming and decreases in insurance, titbit-feeding and vaccination. The odds of
owners reporting that their dogs had an episode of coughing (odds ratio 0.20,
95% CI: 0.04 – 0.92) and that they took their dogs to a veterinarian with an
episode of any illness (odds ratio 0.58, 95% CI: 0.45 – 0.76) were lower during
the COVID-19 restrictions than in the same period of previous years.
A longitudinal sub-study of Dogslife Labrador Retriever puppies was
designed to investigate associations between environmental and health
factors and the development of the canine microbiome. When their puppies
were three to four, seven and 12 months of age, owners submitted digestive
health questionnaires and faecal samples from their puppies, which were
used to produce 16S ribosomal RNA gene sequencing data. Dogs’ faecal
microbiota were successfully characterised for each wave of sample
collection at the different dog ages. Differences between individual dogs
were the largest source of variation in the composition of dogs’ microbiomes,
explaining approximately 50% of it. The diversity and composition of the
microbiome were also associated with age, sex, coat colour, UK geographical
region, household type, coprophagia, contact with other animals, recent
antibiotic use and recent diarrhoea.
Owner-derived data can be used alongside other sources of Big Data and
provide valid and valuable information for the surveillance of veterinary
health, including detail about environmental factors not typically present in
medical records or clinical studies. Such information is becoming easier to
handle and analyse with the use of data science techniques. Furthermore,
cohort studies can be used for the recruitment of participants to sub-studies
that aim to answer a specialised question, such as microbiome research.