Transmission networks inferred from HIV sequence data
In the 1980s, HIV in the UK was concentrated among men who have sex with men
(MSM) and people who inject drugs (PWID), but heterosexual sex is now the most
frequently reported risk behaviour. Because these risk groups are associated with different
virus populations, this shift is reflected in the subtype diversification of the UK epidemic,
which was historically dominated by subtype B.
I have made use of a national database of HIV sequences collected during routine
clinical care, which also contains data on age, sex, route of exposure and ethnicity. The
2014 release of the UK HIV Drug Resistance Database contained data from over
60,000 patients.
In this thesis, I first describe the development of novel tools that rapidly and
automatically identify HIV clusters within phylogenetic trees containing tens of
thousands of sequences. These clusters are of interest because they represent
transmission chains within the larger infected population.
I use these tools to compare the HIV subtype B epidemics in the UK and Switzerland,
both of which had previously been described, but separately and using different approaches. Working
with Swiss colleagues, I was able to analyse the epidemics in exactly the same way
without having to share sensitive data. I found clustering in the UK to be much higher
at relaxed thresholds than in Switzerland (34% vs 16%), indicating that the UK
database is more likely to capture transmission chains. Downsampling revealed that
this pattern is driven by the larger size of the UK epidemic. At tighter cluster
thresholds, the epidemics were very similar.
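The effect of relaxing or tightening a cluster threshold can be illustrated with a minimal sketch. It is not the tool developed in the thesis: it assumes clusters are simply groups of aligned sequences connected by pairwise distances at or below a threshold (single linkage), whereas tree-based methods would typically also consider branch support. The function names, the toy sequences and the two threshold values are all hypothetical.

```python
from itertools import combinations


def hamming_fraction(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def find_clusters(seqs, threshold):
    """Group sequences connected by pairwise distances <= threshold (single linkage).

    `seqs` maps a sequence name to its aligned sequence. Only groups with at
    least two members are returned as clusters.
    """
    parent = {name: name for name in seqs}

    def root(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if hamming_fraction(seqs[a], seqs[b]) <= threshold:
            parent[root(a)] = root(b)

    groups = {}
    for name in seqs:
        groups.setdefault(root(name), set()).add(name)
    return [g for g in groups.values() if len(g) >= 2]


def proportion_clustered(seqs, threshold):
    """Fraction of all sequences that fall inside any cluster."""
    clustered = sum(len(c) for c in find_clusters(seqs, threshold))
    return clustered / len(seqs)


# Hypothetical toy alignment: p1-p2 differ at 1/20 sites, p2-p3 at 2/20 sites.
toy = {
    "p1": "A" * 20,
    "p2": "A" * 19 + "C",
    "p3": "A" * 17 + "GGC",
    "p4": "C" * 20,
}

print(proportion_clustered(toy, 0.06))  # tight threshold   -> 0.5
print(proportion_clustered(toy, 0.12))  # relaxed threshold -> 0.75
```

In this toy data the relaxed threshold pulls a third sequence into the cluster, which is the kind of threshold dependence the comparison above describes. The all-pairs loop is quadratic, so it only suits small illustrations, not databases of tens of thousands of sequences.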
I next use these tools to analyse the spread of emerging subtypes A1, C, D and G in
the UK. I found both risk group and cluster size to be predictive of cluster growth,
which I tested using simulations and a generalised linear model (GLM). Growth of MSM and crossover clusters
was significantly higher than expected for subtypes A1 and C, indicating that crossover
from heterosexuals to MSM has contributed to their expansion within the UK.
Numbers were small for subtypes D and G but the proportion of new diagnoses linking
to MSM and crossover clusters was similar to A1 and C, suggesting that the same
pattern may be emerging for D and G.
I conclude by evaluating the accuracy of a method previously described by our group
to generate transmission networks from HIV sequences. The interpretation of
clustering patterns from phylogenetic trees is difficult because of the absence of a
standardised statistical framework. In contrast, a body of work exists that relates
disease transmission to networks. Using large simulated datasets, I developed
algorithms that eliminate improbable links. I then reconstructed improved UK
transmission networks for subtypes A1, B and C and compared network metrics (such
as the degree distribution) between risk groups.
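As an illustration of comparing a network metric between risk groups, the sketch below computes a per-group degree distribution after dropping weak links. The pruning step is only a placeholder: it filters on a hypothetical per-link score and is not the algorithm developed in the thesis. The node names, scores and cutoff are likewise invented for the example.

```python
from collections import Counter, defaultdict


def prune_links(scored_links, min_score):
    """Keep links whose (hypothetical) support score reaches min_score."""
    return [(a, b) for a, b, score in scored_links if score >= min_score]


def degree_distribution_by_group(edges, risk_group):
    """Return {group: Counter({degree: number of nodes})} for an undirected network.

    `risk_group` maps every node to its group, so unlinked nodes count as degree 0.
    """
    degree = Counter({node: 0 for node in risk_group})
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1

    dist = defaultdict(Counter)
    for node, group in risk_group.items():
        dist[group][degree[node]] += 1
    return dict(dist)


# Hypothetical toy network: five individuals and four candidate links.
risk_group = {
    "n1": "MSM",
    "n2": "MSM",
    "n3": "heterosexual",
    "n4": "heterosexual",
    "n5": "PWID",
}
links = [
    ("n1", "n2", 0.9),
    ("n1", "n3", 0.8),
    ("n2", "n3", 0.2),  # weak link, removed by the cutoff below
    ("n3", "n4", 0.7),
]

kept = prune_links(links, min_score=0.5)
for group, counts in degree_distribution_by_group(kept, risk_group).items():
    print(group, dict(counts))
```

Representing each group's distribution as a `Counter` keeps the result easy to normalise or plot afterwards.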
Together with other evidence, this thesis demonstrates that the UK HIV epidemic
continues to be driven by transmission among MSM. The UK epidemic is no longer
compartmentalised and the crossing over of subtypes across risk groups has been
facilitated by MSM also having sex with women.
Visualization of Single Clusters
Evaluation of clustering partitions is a crucial step in data processing. A multitude of measures exist, which unfortunately give differing results for the same data set.
In this paper we present a technique for visualizing single clusters of high-dimensional data. Our method maps a single cluster to the plane while trying to preserve the membership degrees. The resulting scatter plot illustrates how well the respective cluster is separated and whether additional prototypes are needed. Since clusters are visualized individually, additional prototypes can be added locally where they are needed.
Visualization of Single Clusters
Abstract. Evaluation of clustering partitions is a crucial step in data processing. A multitude of measures exist, which unfortunately give differing results for the same data set. In this paper we present a technique for visualizing single clusters of high-dimensional data. Our method maps single clusters to the plane while trying to preserve the membership degrees that describe a data point’s gradual membership to a certain cluster. The resulting scatter plot illustrates how well the respective cluster is separated and whether additional prototypes are needed. Since clusters are visualized individually, additional prototypes can be added locally where they are needed.
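To make the notion of a membership degree concrete, the following sketch computes memberships for one data point given a set of cluster prototypes. It assumes fuzzy c-means style memberships with fuzzifier `m`, which the abstract does not state, and it does not reproduce the paper's mapping to the plane. The function name and the example prototypes are hypothetical.

```python
import math


def membership_degrees(point, prototypes, m=2.0):
    """Fuzzy c-means style membership of one point in each prototype's cluster.

    Uses u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)), where d_i is the
    Euclidean distance from the point to prototype i. The degrees sum to 1.
    """
    dists = [math.dist(point, p) for p in prototypes]

    # A point lying exactly on a prototype belongs fully to that prototype.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]

    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in dists) for d_i in dists]


# Two hypothetical prototypes; works for any dimension, 2-D used for brevity.
prototypes = [(0.0, 0.0), (4.0, 0.0)]
print(membership_degrees((1.0, 0.0), prototypes))  # close to [0.9, 0.1]
print(membership_degrees((2.0, 0.0), prototypes))  # equidistant: [0.5, 0.5]
```

A point near one prototype gets a membership close to 1 for that cluster, while a point midway between prototypes is shared evenly. Under this assumption, points of the second kind are the ones that suggest poor separation or a missing prototype.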