1,499 research outputs found
Advanced sequencing technologies applied to human cytomegalovirus
The betaherpesvirus human cytomegalovirus (HCMV) is a ubiquitous viral pathogen. It is the most common cause of congenital infection in infants and of opportunistic infections in immunocompromised patients worldwide. The large double-stranded DNA genome of HCMV (236 kb) contains several genes that exhibit a high degree of variation among strains within an otherwise highly conserved sequence. These hypervariable genes encode immune escape, tropism or regulatory factors that may affect virulence. Variation arising from these genes and from an evolutionary history of recombination between strains has been hypothesised to be linked to disease severity. To investigate this, the HCMV genome has been scrutinised in detail over the years using a variety of molecular techniques, most looking only at one or a few of these genes at a time. The advent of high-throughput sequencing (HTS) technology 20 years ago then started to enable more in-depth whole-genome analyses. My study extends this field by using both HTS and the more recently developed long-read nanopore technology to determine HCMV genome sequences directly from clinical samples. Firstly, I used an Illumina HTS pipeline to sequence HCMV strains directly from formalin-fixed, paraffin-embedded (FFPE) tissues. FFPE samples are a valuable repository for the study of relatively rare diseases, such as congenital HCMV (cCMV). However, formalin fixation induces DNA fragmentation and cross-linking, making this a challenging sample type for DNA sequencing. I successfully sequenced five whole HCMV genomes from FFPE tissues. Next, I developed a pipeline utilising the single-molecule, long-read sequencer from Oxford Nanopore Technologies (ONT) to sequence HCMV initially from high-titre cell-cultured laboratory strains and then from clinical samples with high HCMV loads.
Finally, I utilised a direct RNA sequencing protocol with the ONT sequencer to characterise novel HCMV transcripts produced during infection in cell culture, demonstrating the existence of transcript isoforms with multiple splice sites. Overall, my findings demonstrate how advanced sequencing technologies can be used to characterise the genome and transcriptome of a large DNA virus, and will facilitate future studies on HCMV prognostic factors, novel antiviral targets and vaccine development.
Unlocking the genome of perch - From genes to ecology and back again
Eurasian perch Perca fluviatilis has been a popular model species for decades in the fields of aquatic ecology, community dynamics, behaviour, physiology and ecotoxicology. Yet, despite extensive research, progress in integrating a genomic perspective into existing ecological knowledge of perch has been relatively modest. Meanwhile, the emergence of high-throughput sequencing technologies has completely changed the methods for assessing genetic variation and conducting biodiversity and evolutionary research. During the last 5 years, three genome assemblies of P. fluviatilis have been generated, allowing substantial advancement of our understanding of the interactions between ecological and evolutionary processes at the whole-genome level. We review the past progress, current status and potential future impact of the genomic resources and tools for ecological research in Eurasian perch, focusing on the utility of recent whole-genome assemblies. Furthermore, we demonstrate the power of genome-wide approaches and newly developed tools and outline recent cases where genomics has contributed to new ecological and evolutionary knowledge. We explore how the availability of a reference assembly enables the efficient application of various statistical tools, and how genomic approaches can provide novel insights into resource polymorphism, host-parasite interactions, and genetic and phenotypic changes associated with climate change and harvesting-induced evolution. In summary, we call for increased integration of genomic tools into ecological research for perch, as well as for other fish species, which is likely to yield novel insights into processes linking adaptation and plasticity to ecosystem functioning and environmental change.
Characterising novel genetic causes of growth failure
To identify novel genetic causes of growth failure, I developed a unique, targeted whole gene panel for rapid and accurate genetic testing of patients with short stature and features of Growth Hormone Insensitivity (GHI) or unexplained short stature. This included 64 genes of interest, including those in the GH-IGF1 pathway and genes linked to conditions with overlapping features. In parallel, I also assessed these patients for copy number variants (CNVs). Using custom bioinformatic pipelines to filter these data sets and a variety of in silico prediction programs, I identified interesting novel genetic defects in both known and candidate growth genes. I then performed functional analysis of these defects to determine if they affected gene structure/function and could explain the patient phenotype. I identified several novel splicing mutations in the Growth Hormone Receptor (GHR) causing a spectrum of GHI. These include a novel mutation deep within intron 6 of GHR that leads to mis-splicing and pseudoexon inclusion. Pseudoexon inclusion leads to a frameshift of the GHR and thus causes a non-functional Growth Hormone Receptor and severe GHI. I discovered two novel heterozygous GHR mutations in patients with milder GHI phenotypes. These mutations both led to mis-splicing of exon 9 of the GHR and exert a dominant-negative effect on the GHR, reducing the efficacy of signalling and explaining their milder phenotypes. I identified a rare novel heterozygous IGF1 variant that I hypothesised would impair IGF-1 cleavage, causing functional IGF-1 deficiency. Our patient cohort was enriched for low-frequency CNVs, particularly in patients with subtle features of Silver-Russell Syndrome. This is the first study to assess CNVs in patients with GHI. From my CNV analysis, I identified CHD1L and HMGA2 as key candidate growth genes and functionally assessed several patient variants identified within our cohort.
Exponential power mixture model for regression : estimation and variable selection
The mixture regression model is an important technique used in statistical modelling to investigate the relationship between variables. It has been applied in many fields such as genetics, finance and biology. In this research, we focus on its application to genetic data. Gene expression data normally contain unknown correlation structures even after normalisation, which poses a considerable challenge for existing clustering methods such as the Gaussian mixture (GM) model and k-means. Here we use the exponential power distribution to cluster gene expression data robustly by treating the data as a mixture of regressions. The exponential power distribution (EPD) is a scale mixture of Gaussian distributions that has varying shape parameters. In this study we introduce and develop our method based on two different aspects of multiple regression with random errors distributed according to the exponential power distribution. The first aspect is estimation: we use both the Expectation-Maximisation (EM) algorithm and the Newton-Raphson method to estimate the parameters of the exponential power distribution mixture regression models. The second aspect is simultaneous variable selection and clustering: we develop a LASSO-type method to select only the relevant variables in a large dataset, especially a high-dimensional one. The novelty of this research with regard to the Expectation-Maximisation algorithm is that we convert each penalised mixture regression estimation problem to a LASSO (least absolute shrinkage and selection operator) problem. The performance of our method is assessed on both independent and dependent data. We also compare the EPD mixture regression with Gaussian mixture regressions by simulations and real data analyses, and derive model selection criteria such as AIC, BIC and EBIC for both EPD mixture and GM models.
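As an illustration of the ingredients named in this abstract, the following is a minimal sketch, not the thesis's implementation. It assumes one common parameterisation of the exponential power density, f(x) = p / (2σΓ(1/p)) · exp(−(|x − μ|/σ)^p), and the textbook definitions of AIC and BIC; the abstract does not state which forms the author uses. The function names (`epd_logpdf`, `responsibilities`, `aic`, `bic`) are hypothetical.

```python
import math

def epd_logpdf(x, mu=0.0, sigma=1.0, p=2.0):
    """Log-density of the exponential power distribution (assumed
    parameterisation: p = 2 gives a Gaussian, p = 1 a Laplace)."""
    return (math.log(p) - math.log(2.0 * sigma) - math.lgamma(1.0 / p)
            - (abs(x - mu) / sigma) ** p)

def responsibilities(y, x, weights, betas, sigmas, p):
    """E-step of EM for one observation (y, x): posterior probability
    that each regression component generated y."""
    logs = []
    for w, beta, s in zip(weights, betas, sigmas):
        mean = sum(b * xi for b, xi in zip(beta, x))  # linear predictor
        logs.append(math.log(w) + epd_logpdf(y, mu=mean, sigma=s, p=p))
    m = max(logs)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - m) for v in logs]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def aic(log_likelihood, n_params):
    """Akaike information criterion (standard definition)."""
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion (standard definition)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)
```

The M-step (Newton-Raphson updates and the LASSO-type penalty described in the abstract) is omitted here because the abstract gives no details of it.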
CHIMERIC ANTIGEN RECEPTORS RECOGNIZING CANCER-SPECIFIC TN GLYCOPEPTIDE VARIANTS
Disclosed are binding proteins, or fragments thereof, that specifically bind to a cancer-specific glycosylation variant of a protein and to a second epitope on the same protein, to a different protein presented on the same cell, or to a different protein presented on a different cell, such as an encoded polypeptide binding to both a cancer cell and an activated T cell. Also disclosed are polynucleotides encoding such binding proteins, including polynucleotides comprising codon-optimized coding regions and polynucleotides comprising coding regions that are not codon-optimized for expression in a particular host cell. Also disclosed are methods of making the encoded polypeptide and methods of using the polypeptide to treat, prevent or ameliorate the symptoms of a disease such as cancer.
The quantitative genetics of gene expression: regulatory complexity and patterns of variation
The maintenance of genetic variation in quantitative traits has been a difficult conundrum for decades, with no real consensus as to the relative roles of evolutionary forces in shaping levels of heritable variation. For a long time, it has been a common belief that traits under strong selection should carry reduced levels of heritable variation when compared to more weakly selected traits, due to the variance-reducing effect of selection. While early studies supported this statement by showing that fitness traits (such as life‑history traits) tend to have lower heritabilities than nonfitness traits (such as morphological traits), later studies showed that life‑history traits actually have higher levels of genetic variation, and their low heritabilities are mostly due to higher environmental variation. In light of these findings and the observation that fitness traits tend to have more complex regulation, it was proposed that the rate of mutational input to a trait must be important in shaping differences in heritable variation between traits, and that traits with more complex regulatory networks must have both larger mutational target sizes and more opportunity for environmental perturbation.
In this thesis, I directly test the influence of mutation in shaping differences in heritable variation between gene expression traits and begin exploring the role of the gene regulatory network in determining patterns of standing genetic, new mutational and environmental variation across traits. I begin by presenting a novel statistical method (Chapter 2) where pairwise correlations between multiple sets of variance components can be estimated in a double‑hierarchical mixed model framework. Using this model to analyse previously published data sets (Chapter 3) on two eukaryotic model systems, Saccharomyces cerevisiae and Caenorhabditis elegans, I find that the mutational variance of a gene expression trait is strongly correlated with its genetic variance, and that traits more susceptible to mutational input are also more susceptible to environmental perturbation. This observation suggests that the properties that govern mutational variation are related to those that govern environmental variation in gene expression traits, as would be expected of the biochemical features of gene regulation.
I then proceed to collect and analyse a similar data set on a prokaryotic model system, Escherichia coli, whose gene regulatory network is extremely well characterised, allowing for the study of the effects of the network properties of gene expression traits on observed levels of genetic, mutational and environmental variation (Chapter 4). Specifically, I analyse the effects of regulatory complexity of traits, quantified as the number of trans‑regulatory elements (including both sigma and transcription factors), and gene essentiality, quantified as the number of genes directly regulated by a gene product, which serve as proxies for the mutational (and environmental) target size and strength of selection acting on gene expression traits, respectively. Unfortunately, I could not detect any mutational variance in order to answer the proposed question, but did find that increased target size is associated with greater environmental perturbation.
The quantitative genetics of prokaryotes is mostly new territory, and this is one of the first quantitative genetic studies on a prokaryotic system. Importantly, I found the quantitative genetics of gene expression traits to be somewhat different to that of eukaryotes, with results, as well as the structure of the gene regulatory network, indicating that mutational target sizes must be considerably lower, which has important implications for expected rates of trait evolution. With the increasing recognition of the importance of quantitative trait variation to population dynamics, with important implications for predicting epidemics, species coexistence and competition, we can largely benefit from a better understanding of prokaryotic quantitative traits.
Development and application of a platform for harmonisation and integration of metabolomics data
Integrating diverse metabolomics data for molecular epidemiology analyses provides both opportunities and challenges in the field of human health research. Combining patient cohorts may improve power and sensitivity of analyses but is challenging due to significant technical and analytical variability. Additionally, current systems for the storage and analysis of metabolomics data suffer from scalability, query-ability, and integration issues that limit their adoption for molecular epidemiological research. Here, a novel platform for integrative metabolomics is developed, which addresses issues of storage, harmonisation, querying, scaling, and analysis of large-scale metabolomics data. Its use is demonstrated through an investigation of molecular trends of ageing in an integrated four-cohort dataset where the advantages and disadvantages of combining balanced and unbalanced cohorts are explored, and robust metabolite trends are successfully identified and shown to be concordant with previous studies.