Ensuring Access to Safe and Nutritious Food for All Through the Transformation of Food Systems
The Adirondack Chronology
The Adirondack Chronology is intended to be a useful resource for researchers and others interested in the Adirondacks and Adirondack history.
Omics measures of ageing and disease susceptibility
While genomics has been a major field of study for decades due to relatively inexpensive genotyping arrays, recent technological advances have also allowed the measurement and study of various 'omics'. There are now numerous methods and platforms available that allow high-throughput and high-dimensional quantification of many types of biological molecules. Traditional genomics and transcriptomics are now joined by proteomics, metabolomics, glycomics, lipidomics and epigenomics.
I was lucky to have access to a unique resource in the Orkney Complex Disease Study (ORCADES), a cohort of extremely deeply annotated individuals from the Orkney Islands. Approximately 1000 individuals in ORCADES have genomics, proteomics, lipidomics, glycomics, metabolomics, epigenomics, clinical risk factors and disease phenotypes, as well as body composition measurements from whole body scans. In addition to these cross-sectional omics and health-related measures, these individuals also have linked electronic health records (EHR) available, allowing the assessment of the effect of these omics measures on incident disease over a ~10-year follow-up period. In this thesis I use this phenotype-rich resource to investigate the relationship between multiple types of omics measures and both ageing and health outcomes.
First, I used the ORCADES data to construct measures of biological age (BA). The idea is that there is an underlying rate at which the body deteriorates with age, that this rate varies between individuals of the same chronological age, and that the resulting biological age would be more indicative of health status, functional capacity and risk of age-related diseases than chronological age. Previous models estimating BA (ageing clocks) have predominantly been built using a single type of omics assay, and comparison between different omics ageing clocks has been limited. I performed the most exhaustive comparison of different omics ageing clocks yet, with eleven clocks spanning nine different omics assays. I show that different omics clocks overlap in the information they provide about age, that some omics clocks track more generalised ageing while others track specific disease risk factors, and that omics ageing clocks are prognostic of incident disease over and above chronological age.
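As a rough illustration of the ageing-clock idea, the sketch below fits a clock from a single biomarker by ordinary least squares and takes predicted minus chronological age as an age-acceleration measure. The one-feature linear model, the function names and the toy values are assumptions made for illustration only; the abstract does not state which models the thesis used.

```python
from statistics import mean


def fit_clock(biomarker, age):
    """Least-squares line predicting chronological age from one biomarker."""
    mx, my = mean(biomarker), mean(age)
    sxx = sum((x - mx) ** 2 for x in biomarker)
    sxy = sum((x - mx) * (y - my) for x, y in zip(biomarker, age))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


def age_acceleration(biomarker, age, slope, intercept):
    """Predicted (biological) age minus chronological age, per individual."""
    return [slope * x + intercept - y for x, y in zip(biomarker, age)]


# Toy, made-up values (not ORCADES data).
biomarker = [1.0, 2.0, 3.0, 4.0, 5.0]
age = [30.0, 42.0, 48.0, 62.0, 68.0]

slope, intercept = fit_clock(biomarker, age)
accel = age_acceleration(biomarker, age, slope, intercept)
```

A positive entry in `accel` marks an individual whose biomarker looks older than their chronological age; a real clock would use many features and be evaluated on held-out individuals.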
Second, I assessed whether, individually or in multivariable models, omics measures are associated with health-related risk factors or prognostic of incident disease over 10 years post-assessment. I show that 2,686 single omics biomarkers are associated with 10 risk factors and 44 subsequent incident diseases. I also show that models built using multiple biomarkers from whole body scans, metabolomics, proteomics and clinical risk factors are prognostic of subsequent diabetes mellitus and that clinical risk factors are prognostic of incident hypertensive disorders, obesity, ischaemic heart disease and Framingham risk score.
Third, I investigated the genetic architecture of a subset of the proteomics measures available in ORCADES, specifically 184 cardiovascular-related proteins. Combining genome-wide association study (GWAS) summary statistics from ORCADES and 17 other cohorts from the SCALLOP Consortium, giving a maximum sample size of 26,494 individuals, I performed 184 genome-wide association meta-analyses (GWAMAs) on the levels of these proteins circulating in plasma. I discovered 592 independent significant loci associated with the levels of at least one protein. I found that between 8% and 37% of these significant loci colocalise with known expression quantitative trait loci (eQTL). I also found evidence of causal associations between 11 plasma protein levels and disease susceptibility using Mendelian randomisation, highlighting potential candidate drug targets.
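Combining per-cohort summary statistics for one variant can be sketched as a fixed-effect, inverse-variance-weighted meta-analysis. This is an assumption for illustration: the abstract does not say which meta-analysis model was used, and the function name and numbers below are hypothetical.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted pooled effect, SE and z-score."""
    weights = [1.0 / se ** 2 for se in ses]  # precision of each cohort estimate
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se


# Two made-up cohort estimates for a single variant.
beta, se, z = ivw_meta([0.10, 0.20], [0.05, 0.05])
```

Cohorts with smaller standard errors receive more weight, so the pooled standard error is never larger than that of the most precise cohort.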
Statistical Learning for Gene Expression Biomarker Detection in Neurodegenerative Diseases
In this work, statistical learning approaches are used to detect biomarkers for neurodegenerative diseases (NDs). NDs are becoming increasingly prevalent as populations age, making understanding of disease and identification of biomarkers progressively important for facilitating early diagnosis and the screening of individuals for clinical trials. Advancements in gene expression profiling have enabled the exploration of disease biomarkers at an unprecedented scale. The work presented here demonstrates the value of gene expression data in understanding the underlying processes and detection of biomarkers of NDs. The value of novel approaches to previously collected -omics data is shown and it is demonstrated that new therapeutic targets can be identified. Additionally, the importance of meta-analysis to improve power of multiple small studies is demonstrated. The value of blood transcriptomics data is shown in applications to researching NDs to understand underlying processes using network analysis and a novel hub detection method. Finally, after demonstrating the value of blood gene expression data for investigating NDs, a combination of feature selection and classification algorithms was used to identify novel accurate biomarker signatures for the diagnosis and prognosis of Parkinson's disease (PD) and Alzheimer's disease (AD). Additionally, the use of feature pools based on previous knowledge of disease and the viability of neural networks in dimensionality reduction and biomarker detection are demonstrated and discussed. In summary, gene expression data is shown to be valuable for the investigation of ND and novel gene biomarker signatures for the diagnosis and prognosis of PD and AD
Unraveling the effect of sex on human genetic architecture
Sex is arguably the most important differentiating characteristic in most mammalian
species, separating populations into different groups, with varying behaviors, morphologies,
and physiologies based on their complement of sex chromosomes, amongst other factors. In
humans, despite males and females sharing nearly identical genomes, there are differences
between the sexes in complex traits and in the risk of a wide array of diseases. Sex provides
the genome with a distinct hormonal milieu, differential gene expression, and environmental
pressures arising from gender societal roles. This thus poses the possibility of observing
gene by sex (GxS) interactions between the sexes that may contribute to some of the
phenotypic differences observed. In recent years, there has been growing evidence of GxS,
with common genetic variation presenting different effects on males and females. These
studies have, however, been limited with regard to the number of traits studied and/or
statistical power. Understanding sex differences in genetic architecture is of great
importance as this could lead to improved understanding of potential differences in
underlying biological pathways and disease etiology between the sexes and in turn help
inform personalised treatments and precision medicine.
In this thesis we provide insights into both the scope and mechanism of GxS across the
genome of circa 450,000 individuals of European ancestry and 530 complex traits in the UK
Biobank. We found small yet widespread differences in genetic architecture across traits
through the calculation of sex-specific heritability, genetic correlations, and sex-stratified
genome-wide association studies (GWAS). We further investigated whether sex-agnostic
(non-stratified) efforts could potentially be missing information of interest, including sex-specific trait-relevant loci and increased phenotype prediction accuracies. Finally, we
studied the potential functional role of sex differences in genetic architecture through sex
biased expression quantitative trait loci (eQTL) and gene-level analyses.
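One common way to flag a candidate GxS effect from sex-stratified GWAS output is a z-test on the difference between the male and female effect sizes. The sketch below assumes the two estimates come from independent samples; the abstract does not specify the test used in the thesis, and the function name and numbers are hypothetical.

```python
import math


def sex_difference_z(beta_m, se_m, beta_f, se_f):
    """z-statistic and two-sided p-value for beta_m - beta_f.

    Assumes the male and female estimates are independent.
    """
    z = (beta_m - beta_f) / math.sqrt(se_m ** 2 + se_f ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Made-up per-sex effect sizes for one variant.
z, p = sex_difference_z(0.30, 0.03, 0.18, 0.04)
```

In a genome-wide setting the resulting p-values would still need correction for the number of variants and traits tested.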
Overall, this study marks a broad examination of the genetics of sex differences. Our findings
parallel previous reports, suggesting the presence of sexual genetic heterogeneity across
complex traits of generally modest magnitude. Furthermore, our results suggest the need to
consider sex-stratified analyses in future studies in order to shed light on possible sex-specific molecular mechanisms.
Hunting Wildlife in the Tropics and Subtropics
The hunting of wild animals for their meat has been a crucial activity in the evolution of humans. It continues to be an essential source of food and a generator of income for millions of Indigenous and rural communities worldwide. Conservationists rightly fear that excessive hunting of many animal species will cause their demise, as has already happened throughout the Anthropocene. Many species of large mammals and birds have been decimated or annihilated due to overhunting by humans. If such pressures continue, many other species will meet the same fate. Equally, if the use of wildlife resources is to continue by those who depend on it, sustainable practices must be implemented. These communities need to remain or become custodians of the wildlife resources within their lands, for their own well-being as well as for biodiversity in general. This title is also available via Open Access on Cambridge Core
Antimicrobial Resistance at the Human-Animal Interface
Livestock-associated methicillin-resistant Staphylococcus aureus (MRSA) are an emerging public-health issue in Australia, particularly amongst livestock and animal workers. We examined MRSA isolated from humans and animals in Australia using whole-genome sequencing and identified zoonotic and anthropozoonotic MRSA transmission, as well as antimicrobial-resistance gene transfer between MRSA of different host origin. This work highlights the need for expanded monitoring of microbial livestock pathogens and indicates the importance of prudent antimicrobial use in animal health.
Machine learning and large scale cancer omic data: decoding the biological mechanisms underpinning cancer
Many of the mechanisms underpinning cancer risk and tumorigenesis are still not
fully understood. However, the next-generation sequencing revolution and the
rapid advances in big data analytics allow us to study cells
and complex phenotypes at unprecedented depth and breadth. While experimental
and clinical data are still fundamental to validate findings and confirm
hypotheses, computational biology is key for the analysis of system- and
population-level data for detection of hidden patterns and the generation of
testable hypotheses.
In this work, I tackle two main questions regarding cancer risk and tumorigenesis
that require novel computational methods for the analysis of system-level omic
data. First, I focused on how frequent, low-penetrance inherited variants modulate
cancer risk in the broader population. Genome-Wide Association Studies (GWAS)
have shown that Single Nucleotide Polymorphisms (SNP) contribute to cancer risk
with multiple subtle effects, but they are still failing to give further insight
into their synergistic effects. I developed a novel hierarchical Bayesian
regression model, BAGHERA, to estimate heritability at the gene-level from GWAS
summary statistics. I then used BAGHERA to analyse data from 38 malignancies in
the UK Biobank. I showed that genes with high heritable risk are involved in key
processes associated with cancer and are often localised in genes that are
somatically mutated drivers.
Heritability analysis, like many other omics analysis methods, studies the effects of DNA
variants on single genes in isolation. However, we know that most biological
processes require the interplay of multiple genes and we often lack a broad
perspective on them. For the second part of this thesis, I then worked on the
integration of Protein-Protein Interaction (PPI) graphs and omics data, which
bridges this gap and recapitulates these interactions at a system level. First,
I developed a modular and scalable Python package, PyGNA, that enables
robust statistical testing of genesets' topological properties. PyGNA complements
the literature with a tool that can be routinely introduced in bioinformatics
automated pipelines. With PyGNA I processed multiple genesets obtained from
genomics and transcriptomics data. However, topological properties alone have
proven to be insufficient to fully characterise complex phenotypes.
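Testing a topological property of a geneset can be sketched as a permutation test: compare the number of network edges inside the geneset with the same count for random genesets of equal size. The code below is a generic illustration and does not use or reproduce the PyGNA API; the statistic, function names and toy network are assumptions.

```python
import random


def internal_edges(adjacency, geneset):
    """Count edges whose two endpoints are both in the geneset."""
    genes = set(geneset)
    # Each undirected edge is seen from both endpoints, hence the division by 2.
    return sum(len(adjacency[g] & genes) for g in genes) // 2


def permutation_pvalue(adjacency, geneset, n_perm=1000, seed=0):
    """Empirical p-value against random genesets of the same size."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    observed = internal_edges(adjacency, geneset)
    hits = sum(
        internal_edges(adjacency, rng.sample(nodes, len(geneset))) >= observed
        for _ in range(n_perm)
    )
    return observed, (hits + 1) / (n_perm + 1)


# Toy interaction network: a triangle A-B-C attached to a chain D..H.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("F", "G"), ("G", "H")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

observed, p_value = permutation_pvalue(adjacency, ["A", "B", "C"])
```

Sampling nodes uniformly ignores degree bias in real PPI networks, so a practical null model would usually be more careful than this one.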
Therefore, I focused on a model that combines topological and functional
data to detect multiple communities associated with a phenotype. Detecting
cancer-specific submodules is still an open problem, but it has the potential to
elucidate mechanisms detectable only by integrating multi-omics data. Building
on the recent advances in Graph Neural Networks (GNN), I present a supervised
geometric deep learning model that combines GNNs and Stochastic Block Models
(SBM). The model is able to learn multiple graph-aware representations, as
multiple joint SBMs, of the attributed network, accounting for nodes
participating in multiple processes. The simultaneous estimation of structure
and function provides an interpretable picture of how genes interact in specific
conditions and allows the detection of novel putative pathways associated with
cancer.
Breaking Ub with Leishmania mexicana: a ubiquitin activating enzyme as a novel therapeutic target for leishmaniasis
Leishmaniasis is a neglected tropical disease, which inflicts a variety of gruesome pathologies on humans. The number of individuals afflicted with leishmaniasis is thought to vary between 0.7 and 1.2 million annually, of whom it is estimated that 20 to 40 thousand die. This problem is exemplary of inequality in healthcare: current leishmaniasis treatments are inadequate due to toxicity, cost, and ineffectiveness, so there is an urgent need for improved chemotherapies.
Ubiquitination is a biochemical pathway that has received attention in cancer research. It is the process of adding the ubiquitin protein as a post-translational modification to substrate proteins, using an enzymatic cascade comprised of enzymes termed E1s, E2s, and E3s. Ubiquitination can lead to degradation of substrate proteins, or otherwise modulate their function. As the name suggests, this modification can be found across eukaryotic cell biology. As such, interfering with ubiquitination may interfere with essential biological processes, which means ubiquitination may present a new therapeutic target for leishmaniasis.
Before ubiquitination inhibitors can be designed, components of the ubiquitination system must be identified. To this end, a bioinformatic screening campaign employed BLASTs and hidden Markov models, using characterised orthologs from model organisms as bait, to screen publicly-available Leishmania mexicana genome sequence databases, searching for genes encoding putative E1s, E2s, and E3s. To confirm some of these identifications on a protein level, activity-based probes, protein pulldowns, and mass spectrometry were used. Using an activity-based probe that emulates the structure of adenylated ubiquitin, E1s were identified, and their relative abundance quantified. A chemical crosslinker extended the reach of this probe, allowing the identification of an E2 (LmxM.33.0900). It is noted that L. mexicana has two E1s, which is unusual for a single-celled organism. Of these E1s, LmxM.34.3060 was considerably more abundant than LmxM.23.0550 in both major life cycle stages of the in vitro Leishmania cultures.
It is important to describe the wider context of these enzymes: what is their interactome, and what are their substrates? To study this, CRISPR was used to fuse a proximity-based labelling system, BioID, to genes of interest, namely LmxM.34.3060 and LmxM.33.0900. The E2 (LmxM.33.0900) was shown to interact with the E1 (LmxM.34.3060), validating the results from the activity-based probe and crosslinker experiments. Due to sequence homology with characterised orthologs, the E2 was hypothesised to function in the endoplasmic reticulum degradation pathway. Immunoprecipitations of a ubiquitin motif, diglycine, were conducted with a view to gathering information on the substrates of ubiquitin. Anti-diglycine peptides included some of those identified by BioID. Experiments examining ubiquitin's role in the DNA damage response were also initiated, as were improvements to the proximity-based labelling system; however, these were not followed to completion due to a lack of time and resources.
To examine the possibility of finding novel drug targets in the ubiquitination cascade, recombinant proteins were expressed. LmxM.34.3060 was expressed in a functional form, while a putative SUMO E2 (LmxM.02.0390) was functional after refolding. Expressed LmxM.33.0900 was not functional and could not be refolded into a functional form. Drug assays were conducted on LmxM.34.3060, which found an inhibitor of the human ortholog, TAK-243, to be 20-fold less effective against the Leishmania enzyme. Additional assays found an inhibitor, 5'O-sulfamoyl adenosine, that was 50-fold more effective at inhibiting the Leishmania enzyme than its human equivalent. Furthermore, a new mechanism of action, inhibition of the E1, was identified for drugs previously characterised to inhibit protein synthesis. LmxM.34.3060 underwent biophysical characterisation, with structural information obtained using SAXS and protein crystallography. A crystal structure was solved to 3.1 Å, with the in-solution SAXS structure complementary to this. TAK-243 was modelled into the LmxM.34.3060 structure and clashes were predicted, concurring with TAK-243's reduced efficacy against the Leishmania enzyme in the drug assays.
This project aimed to characterise the potential of an understudied biochemical system to provide novel therapeutic targets for a neglected tropical pathogen. To achieve this aim it presents the identification of two E1s, an interactome, a structure, and a potent, selective inhibitor of a Leishmania ubiquitin activating enzyme.
Weed/Plant Classification Using Evolutionary Optimised Ensemble Based On Local Binary Patterns
This thesis presents a novel pixel-level weed classification method using rotation-invariant uniform local binary pattern (LBP) features for precision weed control. The method is based on a two-level optimisation structure: first, Genetic Algorithm (GA) optimisation selects the best rotation-invariant uniform LBP configurations; second, a Covariance Matrix Adaptation Evolution Strategy (CMA-ES) in the Neural Network (NN) ensemble selects the best combination of voting weights for the predicted outcome of each classifier. The model obtained 87.9% accuracy on the CWFID public benchmark.
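The rotation-invariant uniform LBP feature mentioned above can be sketched for a single pixel as follows. This follows the standard definition of the code (number of set bits for patterns with at most two circular transitions, P + 1 otherwise); the function name and sample values are hypothetical, and the GA and CMA-ES optimisation stages are not shown.

```python
def lbp_riu2(neighbours, centre):
    """Rotation-invariant uniform LBP code for one pixel.

    neighbours: grey values sampled in circular order around the centre.
    """
    bits = [1 if v >= centre else 0 for v in neighbours]
    p = len(bits)
    # Number of 0/1 transitions around the circle (wraps from last to first).
    transitions = sum(bits[i] != bits[(i + 1) % p] for i in range(p))
    # Uniform patterns map to their count of set bits; all others share one bin.
    return sum(bits) if transitions <= 2 else p + 1


# The same edge-like pattern at two rotations gives the same code.
code_a = lbp_riu2([9, 9, 9, 1, 1, 1, 1, 1], centre=5)
code_b = lbp_riu2([1, 1, 9, 9, 9, 1, 1, 1], centre=5)
# An alternating pattern is non-uniform and falls in the catch-all bin.
code_c = lbp_riu2([9, 1, 9, 1, 9, 1, 9, 1], centre=5)
```

With P neighbours the code takes only P + 2 distinct values, so a histogram of these codes over a patch gives a compact texture descriptor.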