321 research outputs found

    Strukturell variasjon som påvirker genetisk miljøtilpasning i laksefisk

    Structural variations (SVs), e.g. deletions, insertions, inversions and duplications of sequences, are a major source of genomic variation, affecting more base pairs in the genome than single nucleotide polymorphisms (SNPs). Despite their increasingly recognised importance in adaptive evolution and species diversification, SVs are vastly understudied in most species. Long-read sequencing, together with recently developed bioinformatic tools, has provided step-change improvements in the precision and recall of SV detection and allows us to increase the number of detected SVs manyfold across the species range. In addition, long reads represent a major shift in our ability to build continuous genome assemblies as fundamental resources for most genome-wide studies. The work in this thesis utilises long-read data to generate multiple genome sequences for two salmonid species, Atlantic salmon (Salmo salar) and lake whitefish (Coregonus clupeaformis). We present the first pan-genome for Atlantic salmon, comprising 11 long-read-based assemblies across the species range. Among these, the highest-quality genome has 2.55 Gbp assembled into chromosome sequences, 259 Mbp more sequence than in the previous Atlantic salmon reference genome. The genome has highly improved continuity, with contig N50 increasing from 58 kbp to 28.06 Mbp (484-fold). SV detection in these 11 individuals revealed 1,061,452 SVs, with an average of ~77.4 Mbp of sequence differing per sample. The Atlantic salmon has adapted to different river environments across a large geographical distribution. To investigate the genomic variation underlying these adaptations, we associated SVs with environmental data in a dataset of 366 short-read samples genotyped using genome graph analyses. These analyses highlighted multiple SVs contributing to environmental adaptations, including an 18 kbp deletion encompassing a polymorphic segmental duplication of three genes associated with annual precipitation.
Next, we use the Atlantic salmon pan-genome to study the emergence of supergenes. Because supergenes can be maintained over millions of years by balancing selection and typically exhibit strong recombination suppression, their underlying functional variants and how they are formed are largely unknown. Inversions are a type of rearrangement commonly associated with supergenes, and by directly comparing multiple highly continuous genome assemblies we were able to detect a number of large inversions in Atlantic salmon. A 3 Mb inversion, estimated to be ~15,000 years old and segregating in North American populations, displayed supergene signatures, with adaptive variation captured within the standard arrangement of the inversion as well as other adaptive variation accumulating after the inversion occurred. Characterization of other inversions with matched repeat structures at the breakpoints did not show any supergene signatures, suggesting that shared breakpoint repeats may obstruct supergene formation. Lastly, we created long-read-based genome assemblies for sympatric species pairs (Dwarf and Normal) belonging to lake whitefish (Coregonus clupeaformis). The species pairs offer a suitable model system for studying genomic patterns of differentiation and, in particular, the role of SVs in speciation. By combining long-read, direct assembly comparison, and short-read methods, we detect 89,909 high-confidence SVs in the species pair across two lakes, covering five times more sequence in the genome than SNPs. In the study, we highlight shared outliers of differentiation between the lakes, indicating that they contribute to speciation. Interestingly, we find that more than 70% of SVs differentiating between the Normal and Dwarf species pairs of lake whitefish overlap transposable elements.
This work demonstrates that SVs may play an important role in the differentiation and speciation of sympatric species pairs in lake whitefish. Structural variations (SVs), for example deletions, insertions, inversions and duplications of sequence, are an important source of genomic variation that collectively affects more base pairs in the genome than point mutations (SNPs). Despite increasing recognition that SVs play an important role in genetic adaptation to different environments and in speciation, this type of variation has been little studied in many species. New DNA sequencing technology with longer read lengths (long-read sequencing), together with the development of new bioinformatic tools, has led to drastic improvements in the detection of SVs. Long-read sequencing also makes it possible to build more complete and continuous genome sequences than before. In this thesis we use long-read data to build several high-quality genome sequences for two salmonid species: Atlantic salmon (Salmo salar) and a North American whitefish, lake whitefish (Coregonus clupeaformis). Here we report the first pan-genome for Atlantic salmon. It consists of 11 assemblies based on long-read sequencing of individuals from four different phylogeographic groups of wild salmon. The highest-quality assembly includes 2.55 Gbp of sequence in chromosomes, 259 Mbp more than the previous Atlantic salmon reference genome. In addition, the continuity of the sequence, measured as contig N50, increased from 58 kbp to 28.06 Mbp (484 times higher). We found 1,061,452 SVs across the 11 individuals, with ~77.4 Mbp average sequence difference per sample. Over time, the Atlantic salmon has adapted to the environment in different rivers. To study the genetic variation underlying this adaptation, we associated SVs with various environmental variables in a dataset of 366 short-read sequenced samples using a genome graph.
With these analyses we found several SVs that contribute to environmental adaptation, including an 18 kbp deletion containing three genes associated with the amount of precipitation in the area. We then used the Atlantic salmon pan-genome to study the formation of ‘supergenes’. Supergenes are a coupling of genetic variation in linkage disequilibrium that can arise, for example, through large inversions. Here we used 11 genome assemblies to identify and characterise a number of large inversions in Atlantic salmon. One of the inversions, 3 Mbp in size and estimated to be ~15,000 years old, showed signatures of development as a supergene. For the other inversions, which were flanked by repeated DNA, we found no characteristic features of supergenes, which suggests that the repetitive DNA prevents the formation of supergenes. Finally, we built genome sequences for different forms (‘Normal’ and ‘Dwarf’) of lake whitefish (Coregonus clupeaformis) that live in the same lakes in North America. The genome sequences enable studies of the genomic mechanisms behind speciation in this salmonid. By combining long-read data, direct comparison of assemblies, and short-read data, we found 89,909 SVs that distinguished the two forms of lake whitefish in two lakes. The SVs cover more than five times as many base pairs in the genome as SNPs. In the study we found several SVs with deviating occurrence (‘outliers’) in the two forms of lake whitefish, which indicates that these SVs contribute to speciation. Furthermore, we found that 70% of the SVs overlapped a form of repeated DNA called transposable elements. This work underlines that SVs can play an important role in speciation in lake whitefish.
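The contig N50 quoted in the abstract above is conventionally defined as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly. A minimal Python sketch of that definition follows; the function name `contig_n50` and the toy contig lengths are illustrative assumptions and are not taken from the thesis. Only the two N50 values in the last lines come from the abstract.

```python
def contig_n50(lengths):
    """Return the N50 of a collection of contig lengths (hypothetical helper)."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Made-up contig lengths in arbitrary units, for illustration only.
toy_lengths = [2, 3, 4, 5, 6]
toy_n50 = contig_n50(toy_lengths)

# Ratio of the two contig N50 values quoted in the abstract (28.06 Mbp vs 58 kbp).
fold_change = 28.06e6 / 58e3
print(toy_n50, round(fold_change))
```

The ratio of the two quoted N50 values rounds to the 484-fold improvement stated in the abstract.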

    Rates and patterns of indels in HIV-1 gp120 within and among hosts

    Insertions and deletions (indels) in the HIV-1 gp120 variable loops modulate sensitivity to neutralizing antibodies and are therefore implicated in HIV-1 immune escape. However, the rates and characteristics of variable loop indels have not been investigated within hosts. Here, I report a within-host phylogenetic analysis of gp120 variable loop indels, with reference to my preceding study of these indels among hosts. We processed longitudinally sampled gp120 sequences collected from a public database (n = 11,265) and the Novitsky Lab (n = 2,541). I generated time-scaled within-host phylogenies using BEAST, extracted indels by reconstructing ancestral sequences in Historian, and estimated variable loop indel rates by applying a Poisson-based model to indel counts and time data. Variable loop indel rates appeared higher within hosts than among hosts in subtype C. Our findings improve understanding of indel evolution in HIV-1 gp120 and enable the evaluation of models describing indels, which I present as work in progress.
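The abstract above says only that a Poisson-based model was applied to indel counts and time data. As a minimal sketch of the simplest such model, assume the indel count on each branch is Poisson-distributed with mean equal to a constant rate multiplied by the branch duration; the maximum-likelihood rate is then the total number of indels divided by the total time. The function names, the constant-rate assumption, and the example numbers are all hypothetical, and the model actually used in the study may differ.

```python
import math


def poisson_rate_mle(counts, durations):
    """Maximum-likelihood constant rate for Poisson counts with known exposure times."""
    if len(counts) != len(durations):
        raise ValueError("counts and durations must have the same length")
    total_time = sum(durations)
    if total_time <= 0:
        raise ValueError("total duration must be positive")
    return sum(counts) / total_time


def poisson_log_likelihood(rate, counts, durations):
    """Log-likelihood of a constant rate under the same Poisson assumption."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    log_lik = 0.0
    for k, t in zip(counts, durations):
        mean = rate * t
        # log of the Poisson probability mass: k*log(mean) - mean - log(k!)
        log_lik += k * math.log(mean) - mean - math.lgamma(k + 1)
    return log_lik


# Made-up indel counts per branch and branch durations (e.g. in years).
example_counts = [0, 2, 1, 3]
example_durations = [1.0, 2.5, 0.5, 4.0]
rate_hat = poisson_rate_mle(example_counts, example_durations)
print(rate_hat)  # indels per unit time under the constant-rate assumption
```

Comparing `poisson_log_likelihood` at `rate_hat` with its value at nearby rates is a quick check that the estimate sits at the maximum.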

    Revisiting a pollen-transmitted ilarvirus previously associated with angular mosaic of grapevine

    We report the characterization of a novel tri-segmented RNA virus infecting Mercurialis annua, a common crop weed and model species in plant science. The virus, named "Mercurialis latent virus" (MeLaV), was first identified in a mixed infection with the recently described Mercurialis orthotospovirus 1 (MerV1) on symptomatic plants grown in glasshouses in Lausanne (Switzerland). Both viruses were found to be transmitted by Thrips tabaci, which presumably helps the inoculation of infected pollen in the case of MeLaV. Complete genome sequencing of the latter revealed a typical ilarviral architecture and a close phylogenetic relationship with members of Ilarvirus subgroup 1. Surprisingly, a short portion of the MeLaV replicase was found to be identical to the partial sequence of grapevine angular mosaic virus (GAMV) reported in Greece in the early 1990s. However, we have compiled data that challenge the involvement of GAMV in angular mosaic of grapevine, and we propose alternative causal agents for this disorder. In parallel, three highly conserved MeLaV isolates were identified in symptomatic leaf samples in The Netherlands, including a herbarium sample collected in 1991. The virus was also traced in diverse RNA sequencing datasets from 2013-2020, corresponding to transcriptomic analyses of M. annua and other plant species from five European countries, as well as metaviromics analyses of bees in Belgium. Additional hosts are thus expected for MeLaV, yet we argue that infected pollen grains have likely contaminated several sequencing datasets and may have caused the initial characterization of MeLaV as GAMV.

    Charting genomic heterogeneity in tumours : from bulk to single cell

    Tumours do not consist of a single homogeneous population but are complex heterogeneous systems that contain billions of ever-evolving cells, with no two tumours being the same. Tumour heterogeneity is present at three levels: 1) inter-patient heterogeneity; 2) intra-patient heterogeneity; and 3) intra-tumour heterogeneity (ITH). Understanding all levels of heterogeneity is crucial for patient prognosis and treatment choice. To this end, we aimed to improve our understanding of all three levels of tumour heterogeneity. In paper I we investigated the prevalence, type, length, and genomic distribution of 853,218 somatic copy number alterations (SCNAs) across 20,249 tumours belonging to 32 cancer types. Based on the 1) number of SCNAs; 2) percentage of the genome altered; and 3) average SCNA size, we found high levels of inter-patient heterogeneity, both between and within cancer types. We found that specific chromosomes were preferentially lost or gained depending on cancer type. Lastly, we detected co-alterations of key oncogenes and tumour suppressor genes (TSGs). Taken together, we provided a comprehensive analysis of SCNAs across many cancer types as a valuable resource for the community. In paper II we sought to elucidate intra-patient heterogeneity in non-small cell lung cancers (NSCLC) and their matched brain metastases (BM). We performed shallow whole-genome sequencing (WGS) on 51 primary NSCLC and matched BM, whole-exome sequencing on 40 of the pairs, multi-region sequencing of 15 BMs, and shallow WGS on an additional cohort of 115 BMs. We showed that there is significant intra-patient heterogeneity at the SCNA level, with BM samples showing, on average, more SCNAs than their matched NSCLC. In contrast, multi-region sequencing of 15 BMs did not show significant ITH at the level of SCNAs. Finally, we identified putative metastatic driver SCNAs and single-nucleotide variants in key TSGs and oncogenes.
In paper III we aimed to assess the level of ITH in early localized prostate cancer. We performed organ-wide, multi-region, single-cell DNA sequencing on two prostate midsections. We found transient chromosomal instability (CIN) in both tumour and normal prostate tissue, evidenced by a large number of cells with unique chromosomal (arm) losses and/or gains. Furthermore, we found three distinct groups of cells within the prostate: 1) diploid cells; 2) pseudo-diploid cells; and 3) monster cells. We observed an enrichment of diploid cells in normal regions and pseudo-diploid cells in tumour-rich regions, while monster cells were equally distributed over the entire prostate, again suggesting that there were elevated CIN levels across the prostate. Lastly, we detected highly localized subclones that were exclusive to tumour-rich regions and harboured deletions in TSGs that are known to be frequently deleted in prostate cancer. Taken together, with this thesis, I have contributed to advancing the understanding of inter-patient, intra-patient, and intra-tumour heterogeneity.

    Histone deacetylase inhibitors modulate human polyomavirus JC replication

    M.S. Thesis. University of Hawaiʻi at Mānoa 201

    Finished Genome of the Fungal Wheat Pathogen Mycosphaerella graminicola Reveals Dispensome Structure, Chromosome Plasticity, and Stealth Pathogenesis.

    The plant-pathogenic fungus Mycosphaerella graminicola (asexual stage: Septoria tritici) causes septoria tritici blotch, a disease that greatly reduces the yield and quality of wheat. This disease is economically important in most wheat-growing areas worldwide and threatens global food production. Control of the disease has been hampered by a limited understanding of the genetic and biochemical bases of pathogenicity, including mechanisms of infection and of resistance in the host. Unlike most other plant pathogens, M. graminicola has a long latent period during which it evades host defenses. Although this type of stealth pathogenicity occurs commonly in Mycosphaerella and other Dothideomycetes, the largest class of plant-pathogenic fungi, its genetic basis is not known. To address this problem, the genome of M. graminicola was sequenced completely. The finished genome contains 21 chromosomes, eight of which could be lost with no visible effect on the fungus and thus are dispensable. This eight-chromosome dispensome is dynamic in field and progeny isolates, is different from the core genome in gene and repeat content, and appears to have originated by ancient horizontal transfer from an unknown donor. Synteny plots of the M. graminicola chromosomes versus those of the only other sequenced Dothideomycete, Stagonospora nodorum, revealed conservation of gene content but not order or orientation, suggesting a high rate of intra-chromosomal rearrangement in one or both species. This observed “mesosynteny” is very different from synteny seen between other organisms. A surprising feature of the M. graminicola genome compared to other sequenced plant pathogens was that it contained very few genes for enzymes that break down plant cell walls, a feature more similar to endophytes than to pathogens. The stealth pathogenesis of M. graminicola probably involves degradation of proteins rather than carbohydrates to evade host defenses during the biotrophic stage of infection and may have evolved from endophytic ancestors.

    The dynamic Eukaryote Genome: Evolution, mobile DNA, and the TE-Thrust hypothesis

    The discovery of transposable elements (TEs) by Barbara McClintock in the 1940s triggered a new dawning in the development of evolutionary theory. However, similar to Gregor Mendel’s development of the laws of heredity in the nineteenth century, it was a long time before the full significance of this discovery was appreciated. Nevertheless, by the beginning of the 21st century, the study and recognition of TEs as significant factors in evolution was well underway. However, many evolutionary biologists still choose to ignore them, to highlight the loss of fitness in some individuals caused by TEs, or to concentrate on the supposed parasitic nature of TEs and the diseases they cause. The major concept and theme of this thesis is that the ubiquitous and extremely ancient transposable elements are not merely “junk DNA” or “selfish parasites” but are instead ‘powerful facilitators of evolution’. They can create genomic dynamism, and cause genetic changes of great magnitude and variety in the genotypes and phenotypes of eukaryotic lineages. A large variety of data are presented supporting the theme of TEs as very significant forces in evolution. This concept is formalised into a hypothesis, the TE-Thrust hypothesis, which explicitly details how TEs can facilitate evolution. This hypothesis opens the way to explaining otherwise inexplicable aspects of evolution, such as the mismatch between the phyletic gradualism theory and the punctuated equilibrium concept, which is based on the fossil record. Data from the studies of many metazoans are analysed, with a focus on the well-studied mammals, especially the primates. Data from the seed plants are also included, with a strong focus on Darwin’s ‘abominable mystery’, the rapid origin and extraordinary success of the flowering plants. TEs are ubiquitous and many of them are extremely ancient, probably dating back to the origin of the eukaryotes, and some are also found in prokaryotes.
TEs can build, sculpt and reformat genomes by both active and passive means. Active TE-Thrust is due to transpositions by members of the TE consortium, or their retrotransposition of retrocopy genes, or by new acquisitions of TEs, or by the endogenisation of retroviruses, and other similar phenomena. Major results of this are that the promoters carried by TEs can result in very significant alterations in gene expression, and that sequences from the TEs themselves can become exapted or domesticated as novel genes. TEs can also cause exon shuffling, possibly building novel genes. Passive TE-Thrust is due to large homogeneous consortia of inactive TEs that can act passively by causing ectopic recombination, resulting in genomic deletions, duplications, and possibly karyotypic changes. TE-Thrust often works together with other facilitators of evolution, such as point mutations, which can occur in duplicated or retrocopy genes, sometimes resulting in new functions for such genes. A major concept in the TE-Thrust hypothesis is that although TEs are sometimes harmful to individuals, and can lower the fitness of a population, they endow the lineage of that population with adaptive potential and evolutionary potential. These are extremes of a continuum of intra-genomic potential, and are not separate entities. This adaptive/evolutionary potential, due to the presence and activities of the TE consortium of the genomes in a lineage, greatly enhances the future survival prospects of the lineage, and its ability to undergo evolutionary transitions and/or to radiate into a clade of multiple divergent lineages. Lineages may acquire a TE consortium by new infiltrations of TEs, either by horizontal transposon transfer, de novo synthesis, or endogenisation of retroviruses. Lineages lacking an effective TE consortium are likely to lack adaptive/evolutionary potential and could fail to diversify, become “living fossils”, or even become extinct, as many lineages ultimately do.
The opposite of extinction is the fecund radiation of lineages, and it is shown here that fecund species-rich lineages such as rodents (Order Rodentia), bats (Order Chiroptera) and the angiosperms are all well endowed with many viable active TEs. The Simian Primates, which have undergone major evolutionary transitions, are also well endowed with viable and periodically active TEs, and/or large homogeneous populations of TEs. Data on the “living fossils” such as the coelacanth and the tuatara are very limited, but indicate a lack of new acquisitions of TEs, and/or the mutational decay of ancient TE families in their genomes. Lineages are often in stasis, but a new acquisition of TEs, or other factors such as stress, hybridisation, or whole genome duplications (especially in angiosperms), may trigger a major burst of activity in the TE consortium, resulting in an evolutionary punctuation event. The TE-Thrust hypothesis thus offers an explanation for the punctuated equilibrium frequently observed in the fossil record. There are many other known facilitators of evolution, such as point mutations, whole genome duplications, changes in allele frequency, epigenetic changes, symbiosis, hybridisation, simple sequence repeats, karyotypic changes, drift in small populations, allopatric and sympatric reproductive isolation, co-evolution, environmental and ecological changes, and so on. In addition, there may be some as yet unknown facilitators of evolution. However, TEs usually make up between 20 and 80 percent of the genomes of eukaryotes, as against one or two percent for coding genes, and are known to be able to make genomic modifications (“mutations”) that cannot be made by other facilitators of evolution. TEs also come in many superfamilies, and in thousands of families, which make up the mobile DNA of the earth’s biota. It is apparent then that their influence on, and facilitation of, eukaryotic evolution has been very significant indeed.
In this thesis, data are presented which indicate that these ubiquitous and extremely ancient TEs are powerful facilitators of change, essential to the evolution of the earth’s biota. The TE-Thrust hypothesis, when fully explored, developed, and tested, must, if confirmed, result in an extension to the Modern Synthesis, or even become a part of a new paradigm of evolutionary theory.

    Distinct retroelement classes define evolutionary breakpoints demarcating sites of evolutionary novelty

    Background: Large-scale genome rearrangements brought about by chromosome breaks underlie numerous inherited diseases, initiate or promote many cancers and are also associated with karyotype diversification during species evolution. Recent research has shown that these breakpoints are nonrandomly distributed throughout the mammalian genome and many, termed "evolutionary breakpoints" (EB), are specific genomic locations that are "reused" during karyotypic evolution. When the phylogenetic trajectory of orthologous chromosome segments is considered, many of these EB are coincident with ancient centromere activity as well as new centromere formation. While EB have been characterized as repeat-rich regions, it has not been determined whether specific sequences have been retained during evolution that would indicate previous centromere activity or a propensity for new centromere formation. Likewise, the conservation of specific sequence motifs or classes at EBs among divergent mammalian taxa has not been determined. Results: To define conserved sequence features of EBs associated with centromere evolution, we performed comparative sequence analysis of more than 4.8 Mb within the tammar wallaby, Macropus eugenii, derived from centromeric regions (CEN), euchromatic regions (EU), and an evolutionary breakpoint (EB) that has undergone convergent breakpoint reuse and past centromere activity in marsupials. We found a dramatic enrichment for long interspersed nucleotide elements (LINE1s) and endogenous retroviruses (ERVs) and a depletion of short interspersed nucleotide elements (SINEs) shared between CEN and EBs.
We analyzed the orthologous human EB (14q32.33), known to be associated with translocations in many cancers including multiple myelomas and plasma cell leukemias, and found a conserved distribution of similar repetitive elements. Conclusion: Our data indicate that EBs tracked within the class Mammalia harbor sequence features retained since the divergence of marsupials and eutherians that may have predisposed these genomic regions to large-scale chromosomal instability.

    Molecular analysis of South African ovine herpesvirus 2 strains based on selected glycoprotein and tegument genes

    Ovine herpesvirus 2 (OvHV-2) is the causative agent of sheep-associated malignant catarrhal fever (SA-MCF), a generally fatal disease of cattle and other captive wild ruminants. Information on the genetic structure, diversity, and distribution pattern of the OvHV-2 strains circulating in South Africa (SA) and other African countries is not available. This study aimed to characterize the OvHV-2 strains circulating in SA using selected genes encoding glycoproteins and tegument proteins. To establish the genetic diversity of OvHV-2 strains, four genes, Ov 7, Ov 8 ex2, ORF 27 and ORF 73, were selected for analysis by PCR and DNA sequencing. Nucleotide and amino acid multiple sequence analyses revealed two genotypes for ORF 27 and ORF 73, and three genotypes for Ov 7 and Ov 8 ex2, randomly distributed throughout the regions. Ov 7 and ORF 27 nucleotide sequence analysis revealed variations that distinguished SA genotypes from those of reference OvHV-2 strains. Epitope mapping analysis showed that mutations identified in the investigated genes are not likely to affect the functions of the gene products, particularly those responsible for antibody binding activities associated with B-cell epitopes. Knowledge of the extent of genetic diversity existing among OvHV-2 strains has provided an understanding of the distribution patterns of OvHV-2 strains or genotypes across the regions of South Africa. This can facilitate the management of SA-MCF in SA, in terms of the introduction of control measures or safe practices to monitor and control OvHV-2 infection. The products encoded by the Ov 7, Ov 8 ex2 and ORF 27 genes are recommended for evaluation as possible antigens in the development of an OvHV-2 specific serodiagnostic assay. S1 Table. Average sequence identities determined for the Ov 7 nucleotide and amino acid sequences obtained between South African Ov 7 sequences compared to reference sequences.
S2 Table. Average sequence identities determined for the Ov 8 ex2 nucleotide (Figure A) and derived amino acid (Figure B) sequences obtained between South African OvHV-2 strains compared to reference strains. S3 Table. Average sequence identities determined for the ORF 27 nucleotide and derived amino acid sequences obtained between South African OvHV-2 strains compared to reference strains. S4 Table. Average sequence identities for the ORF 73 nucleotide and derived amino acid sequences obtained between South African OvHV-2 strains compared to reference strains. This work was supported by the Department of Science and Technology (DST) through research funding and by the Meat Industry Trust (MIT) through payment of university tuition. http://www.plosone.org Veterinary Tropical Disease