108 research outputs found

    Bioinformatics tools for development of fast and cost effective simple sequence repeat (SSR), and single nucleotide polymorphisms (SNP) markers from expressed sequence tags (ESTs)

    The development of current molecular biology techniques has generated a huge amount of gene sequence information through expressed sequence tag (EST) sequencing projects on a large number of plant species. This has opened a new era in crop molecular breeding with the identification and/or development of a new class of useful DNA markers called genic molecular markers (GMMs). These markers represent the functional component of the genome, in contrast to random DNA markers (RMMs). Many recent studies have demonstrated that GMMs may be superior to RMMs for marker-assisted selection, comparative mapping and exploration of functional genetic diversity in germplasm adapted to different environments. Identifying DNA sequences that can be used as markers therefore remains fundamental to the development of GMMs. Bioinformatics approaches, among others, are very useful here, making marker development much faster and cheaper, and a number of computer programs have already been implemented to identify molecular markers from sequence data. A review of current bioinformatics tools for the development of genic molecular markers is therefore crucial at this stage. This mini-review provides an overview of the bioinformatics tools available and their use in marker development, with particular reference to SNP and SSR markers.
    Keywords: Genic molecular marker, simple sequence repeat (SSR), and single nucleotide polymorphisms (SNP) markers from expressed sequence tags (ESTs).
    African Journal of Biotechnology Vol. 12(30), pp. 4713-472
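
    The kind of SSR mining the review refers to can be illustrated with a minimal sketch. It scans an EST-like sequence for perfect di-, tri- and tetranucleotide repeats with a regular expression. The function name `find_ssrs` and the minimum repeat counts are illustrative assumptions, not the settings of any tool covered by the review.

```python
import re

def find_ssrs(seq, min_repeats=None):
    """Return (start, motif, repeat_count) for perfect SSRs in seq.

    min_repeats maps motif length -> minimum number of tandem copies.
    The default thresholds are illustrative assumptions only.
    """
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5, 4: 4}
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_rep - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. AAAA or ATAT).
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return hits

# Toy sequence, not real EST data.
example = "GGC" + "AT" * 7 + "CCGT" + "CAG" * 5 + "TT"
print(find_ssrs(example))
```

    Start positions are 0-based. Imperfect or compound repeats, which dedicated SSR tools usually handle, are outside the scope of this sketch.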

    A comparative genomics study of subtelomere evolution and phenotypic variation in nematodes

    Doctoral thesis, Seoul National University Graduate School, College of Natural Sciences, School of Biological Sciences, February 2020. Junho Lee (이준호).
    Long-read sequencing technologies have contributed greatly to comparative genomics among species and can also be applied to study genomics within a species. In this study, to determine how substantial genomic changes are generated and tolerated within a species, I sequenced the C. elegans strain CB4856, one of the most genetically divergent strains relative to the N2 reference strain. For this comparison, the Pacific Biosciences (PacBio) RSII platform (80×, N50 read length 11.8 kb) was used, and a de novo genome assembly was generated to the level of pseudochromosomes comprising 76 contigs (contig N50 = 2.8 Mb). I identified structural variations affecting as many as 2,694 genes, most of which lie on chromosome arms. Subtelomeric regions contained the most extensive genomic rearrangements, which in some cases even created new subtelomeres. The subtelomere structure of Chromosome VR implies that ancestral telomere damage was repaired by alternative lengthening of telomeres (ALT), even in the presence of a functional telomerase gene, and that a new subtelomere was then formed by break-induced replication (BIR). My study demonstrates that substantial genomic changes, including structural variations and new subtelomeres, can be tolerated within a species, and that these changes may increase genetic diversity within a species. Secondly, I assembled draft genomes of two species related to C. elegans, Auanema freiburgensis and Auanema sp. APS14, which have a distinct reproductive mode (three sexes: male, female, and hermaphrodite) and behavioral repertoire (tube nictation). A. freiburgensis and Auanema sp. APS14 were sequenced using the PacBio RSII (270×, N50 read length 12.5 kb) and the Oxford Nanopore Technologies (ONT) MinION (113×, N50 read length 3.6 kb) platforms, respectively, and their reads assembled into genomes (55 and 69 Mb, respectively) that are considerably smaller than that of C. elegans (~100 Mb). Comparative genomic studies of these genomes will help to understand how genomic changes in closely related species affect the evolution of novel traits.
    Contents:
    Chapter I. Introduction: Long-read sequencing and de novo genome assembly; Caenorhabditis and Caenorhabditis elegans as a model system for comparative genomics; Repetitive nature of subtelomere and the trace of alternative lengthening of telomeres (ALT) in subtelomeric regions; Phenotypic diversity in the genus Auanema; Purposes of the study
    Materials and Methods
    Chapter II. De novo genome assembly of the CB4856 genome and subtelomere evolution via past ALT events in C. elegans
    Part I. De novo genome assembly of the CB4856 genome and structural variants compared to the reference strain, N2: Long-read sequencing and de novo assembly of the CB4856 genome; Long-read sequencing identified new structural variations
    Part II. Subtelomere evolution via past ALT events in C. elegans: Long-read sequencing revealed the hypervariable nature of subtelomeres; The structure of Chr VR subtelomere is unique, in consequence of past ALT and BIR events; New genes in the subtelomeric region
    Chapter III. Phenotypic characterization of Korean nematodes and draft genome assembly of two Auanema species: Korean nematode collection; Phenotypic diversification in the genus Auanema; Highly contiguous genome assembly using two long-read sequencing technologies
    Chapter IV. Discussion: Enrichment of genetic variations in chromosome arms and subtelomeres by background selection and error-prone recombination; New subtelomere formation by ALT and BIR
    References
    Abstract in Korean
    Acknowledgement
    Docto
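
    The N50 values quoted in this abstract (for reads and for contigs) follow the usual definition: the length L such that sequences of length L or longer together cover at least half of the total bases. A minimal sketch of that calculation follows. The function name `n50` and the toy lengths are assumptions for illustration and are not taken from the thesis data.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Walk the lengths from longest to shortest and return the length at
    which the running total first reaches half of the overall total.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be a non-empty collection of positive numbers")

# Toy contig lengths (arbitrary units), not the CB4856 assembly.
print(n50([2, 3, 4, 5, 6]))
```

    Because N50 depends only on the length distribution, it measures contiguity rather than correctness, so it is normally reported alongside coverage and contig counts, as the abstract does.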

    Fine-scale analysis of mechanisms and controlling factors in a meiotic recombination hotspot in dogs (Canis familiaris)

    Meiotic recombination reshuffles genomes from one generation to the next. In humans and most other mammals, meiotic recombination events are clustered in 1-2 kb wide recombination hotspots, whose locations are determined in trans by the protein PR-domain containing 9 (PRDM9). Mice lacking PRDM9 direct recombination to promoters and functional elements, resulting in meiotic defects. Dogs (Canis familiaris) lack a functional copy of PRDM9, yet linkage data showed that historical recombination events cluster in functional elements, suggesting that a mechanism may enable controlled recombination at these locations in the absence of PRDM9. However, nothing is known about the de novo activity of dog recombination hotspots or the patterns of recombination resolution in this PRDM9-deficient species. I investigated a dog recombination hotspot for de novo recombination events using pooled sperm typing, and uncovered high crossover frequencies affecting up to 1% of sperm. Frequencies can differ by an order of magnitude between dogs. Fine-scale analysis of crossover breakpoints revealed wide distributions of breaks across up to 10 kb within the hotspot region. I further detected asymmetric breakpoint distributions between crossover orientations and crossover-associated transmission distortion, suggesting biased recombination initiation or repair. This work is a detailed fine-scale dissection of an active, PRDM9-independent mammalian recombination hotspot.

    Genomic insights into fine-scale recombination variation in adaptively diverging threespine stickleback fish (Gasterosteus aculeatus)

    Meiotic recombination is one of the major molecular mechanisms generating genetic diversity and influencing genome evolution. By shuffling allelic combinations, it can directly influence the patterns and efficacy of natural selection. Studies in various organisms have shown that the rate and placement of recombination vary substantially within the genome, among individuals, between sexes and among species. It is hypothesized that this variation plays an important role in genome evolution. In this PhD thesis, I investigated the extent and molecular basis of recombination variation in adaptively diverging threespine stickleback fish (Gasterosteus aculeatus) to further understand its evolutionary implications. I used both ChIP-sequencing and whole genome sequencing of pedigrees to empirically identify and quantify double strand breaks (DSBs) and meiotic crossovers (COs). Whole genome sequencing of large nuclear families was performed to identify meiotic crossovers in 36 individuals of diverging marine and freshwater ecotypes and their hybrids. This produced the first genome-wide, high-resolution, sex-specific and ecotype-specific map of contemporary recombination events in sticklebacks. The results show striking differences in crossover number and placement between sexes. Females recombine nearly 1.76 times more than males, and their COs are distributed all over the chromosome, while male COs predominantly occur near the chromosomal periphery. When ecotypes were compared, a significant reduction in overall recombination rate was observed in hybrid females relative to pure forms. Even though the known loci underlying marine-freshwater adaptive divergence tend to fall in regions of low recombination, considerable female recombination is observed in the regions between adaptive loci. This suggests that the sexual dimorphism in recombination phenotype may have important evolutionary implications. 
At the fine scale, COs and male DSBs are nonrandomly distributed, involving ‘semi-hot’ hotspots and coldspots of recombination. I report a significant association of male DSBs and COs with functionally active open chromatin regions such as gene promoters, whereas female COs did not show an association beyond that expected by chance. However, the considerable number of COs and DSBs located away from any of the tested open chromatin marks suggests the possibility of additional, novel mechanisms of recombination regulation in sticklebacks. In addition, we developed a novel method for constructing individualized recombination maps from pooled gamete DNA using linked-read sequencing technology by 10X Genomics®. We tested the method by contrasting recombination profiles of gametic and somatic tissue from a hybrid mouse and stickleback fish. Our pipeline faithfully detects previously described recombination hotspots in mice at high resolution and identifies many novel hotspots across the genome in both species, thereby demonstrating the efficiency of the method. This method could be employed in large-scale QTL mapping studies to further understand the genetic basis of the recombination variation reported in this thesis. By bridging the gap between natural populations and lab organisms with large clutch sizes and tractable genetic tools, this work shows the utility of the stickleback system and provides important groundwork for further studies of heterochiasmy and divergence in recombination during adaptation to differing environments.

    Structural variation affecting genetic environmental adaptation in salmonids

    Structural variations (SVs), e.g. deletions, insertions, inversions and duplications of sequences, are a major source of genomic variation, affecting more base pairs in the genome than single nucleotide polymorphisms (SNPs). Despite their increasingly recognised importance in adaptive evolution and species diversification, SVs are vastly understudied in most species. Long-read sequencing, together with recently developed bioinformatic tools, has provided step-change improvements in the precision and recall of SV detection and allows us to increase the number of detected SVs manyfold across the species range. In addition, long reads represent a major shift in our ability to build continuous genome assemblies as fundamental resources for most genome-wide studies. The work in this thesis utilises long-read data to generate multiple genome sequences for two salmonid species, Atlantic salmon (Salmo salar) and lake whitefish (Coregonus clupeaformis). We present the first pan-genome for Atlantic salmon, comprising 11 long-read-based assemblies of individuals from four phylogeographic groups across the species range. Among these, the highest quality genome has 2.55 Gbp assembled into chromosome sequences, 259 Mbp more sequence than in the previous Atlantic salmon reference genome. The genome has greatly improved continuity, with contig N50 increasing from 58 kbp to 28.06 Mbp (484-fold). SV detection in these 11 individuals revealed 1,061,452 SVs, with an average of ~77.4 Mbp of sequence differing per sample. The Atlantic salmon has adapted to different river environments across a large geographical distribution. To investigate the genomic variation underlying these adaptations, we associated SVs with environmental data in a dataset of 366 short-read samples genotyped using genome graph analyses. These analyses highlighted multiple SVs contributing to environmental adaptation, including an 18 kbp deletion encompassing a polymorphic segmental duplication of three genes associated with annual precipitation. 
Next, we use the Atlantic salmon pan-genome to study the emergence of supergenes. Because supergenes can be maintained over millions of years by balancing selection and typically exhibit strong recombination suppression, their underlying functional variants and how they are formed are largely unknown. Inversions are a type of rearrangement commonly associated with supergenes, and by directly comparing multiple highly continuous genome assemblies we were able to detect a number of large inversions in Atlantic salmon. A 3 Mbp inversion, estimated to be ~15,000 years old and segregating in North American populations, displayed supergene signatures, with adaptive variation captured within the standard arrangement of the inversion as well as other adaptive variation accumulating after the inversion occurred. Characterization of other inversions with matched repeat structures at the breakpoints did not show any supergene signatures, suggesting that shared breakpoint repeats may obstruct supergene formation. Lastly, we created long-read-based genome assemblies for sympatric species pairs (Dwarf and Normal) of lake whitefish (Coregonus clupeaformis). The species pairs offer a suitable model system for studying genomic patterns of differentiation, and in particular the role of SVs in speciation. By combining long reads, direct assembly comparison, and short-read methods, we detect 89,909 high-confidence SVs in the species pair across two lakes, covering five times more sequence in the genome than SNPs. In the study, we highlight shared outliers of differentiation between the lakes, indicating that they contribute to speciation. Interestingly, we find that more than 70% of the SVs differentiating the Normal and Dwarf species pairs of lake whitefish overlap transposable elements. 
This work demonstrates that SVs may play an important role in the differentiation and speciation of sympatric species pairs in lake whitefish.

    De novo mutations in canine evolution and disease

    The domestic dog is an evolutionarily unique animal and has a special niche within genomics research. Since their domestication from the grey wolf, dogs have become one of the most phenotypically diverse living land animals. Man’s desire to create individuals with specialised morphological and behavioural traits has led to the development of over 400 recognised breeds. Dogs share a significant number of inherited disease phenotypes with humans and are regarded as valuable animal models for understanding evolution and disease. New mutations are the ultimate source of new phenotypic diversity and evolutionary change. They can also cause rare spontaneous genetic disorders, and collectively they make a significant contribution to disease burden in managed populations. To comprehensively understand the mechanisms of evolution and disease, it is essential to discover the rates of occurrence, types, and patterns of distribution of de novo mutations across the genome. Until recently, the characteristics of de novo mutations could be inferred only using indirect or biased methods. With recent technological advancements, it is now possible to observe de novo mutations that occur in a single generation directly, through parent-offspring sequencing studies. Whole genome sequencing provides the opportunity to directly identify genomic variants associated with rare diseases caused by spontaneous mutations. We are on the brink of being able to utilise these technologies more fully in the field of personalised medicine. In this thesis, de novo germline mutations affecting the evolution and occurrence of disease in the dog are identified and characterised. The inspiration for this work stemmed from the extraordinary phenotypic diversity in the species and its close relationship to people.

    Computational analysis of human genomic variants and lncRNAs from sequence data

    High-throughput sequencing technologies have been developed and applied to human genome studies for nearly 20 years. These technologies have enabled numerous research applications and have significantly expanded our knowledge of the human genome. In this thesis, computational methods that use sequence data to study human genomic variants and transcripts were evaluated and developed. Indels (insertions and deletions) are two common types of genomic variant that are widespread in the human genome. Detecting indels in human genomes is a crucial step in diagnosing indel-related genomic disorders and may identify novel indel markers for studying certain diseases. Compared with previous techniques, high-throughput sequencing technologies, especially next-generation sequencing (NGS), make it possible to detect indels accurately and efficiently across wide ranges of the genome. In the first part of the thesis, tools with indel-calling abilities are evaluated with an assortment of indels and different NGS settings. The results show that the choice of tools and NGS settings has a significant impact on indel detection, which provides guidance for tool selection and future development. In bioinformatics analysis, an indel’s position can be marked inconsistently on the reference genome, so that one indel may have different but equivalent representations, which causes trouble for downstream analysis such as the evaluation of indel calls. This problem is related to the complex sequence context of indels, for example short tandem repeats (STRs), where the same short stretch of nucleotides is repeated. In the second part of the thesis, a novel computational tool, VarSCAT, is described, which has various functions for annotating the sequence context of variants, including ambiguous positions, STRs, and other sequence context features. 
Analysis of several high-confidence human variant sets with VarSCAT reveals that a large number of genomic variants, especially indels, have sequence features associated with STRs. In the human genome, not all genes and their transcripts are translated into proteins. Long non-coding ribonucleic acid (lncRNA) is a typical example. Sequence recognition built on machine learning models has improved significantly in recent years. In the last part of the thesis, several machine learning-based lncRNA prediction tools were evaluated on their predictions of the coding potential of transcripts. The results suggest that tools based on deep learning identify lncRNAs best.
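
    The ambiguity described in this abstract, where one indel can be placed at several equivalent positions inside a repeat, can be shown with a small sketch. Given a reference string and a deletion, it lists every start position at which a deletion of the same length yields the same alternate sequence. The function name `deletion_equivalent_starts` and the toy reference are assumptions for illustration. This is not VarSCAT's implementation.

```python
def deletion_equivalent_starts(ref, start, length):
    """List all 0-based starts where deleting `length` bases gives the same result.

    Deleting ref[s:s+length] and ref[s-1:s-1+length] produce the same
    sequence exactly when ref[s-1] == ref[s+length-1], so the deletion
    can be slid left (and, symmetrically, right) one base at a time.
    """
    left = start
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1
    right = start
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1
    return list(range(left, right + 1))

# Toy reference with a CA repeat: deleting one "CA" unit is ambiguous.
ref = "GCACACAT"
starts = deletion_equivalent_starts(ref, 3, 2)
print(starts)
print({ref[:s] + ref[s + 2:] for s in starts})  # one identical alternate sequence
```

    The smallest value in the returned list corresponds to the left-aligned placement that variant normalization conventionally reports. The full range is the ambiguous region in which the same deletion could be called.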

    Mapping and functional characterisation of the Atlantic salmon genome and its regulation of pathogen response

    Atlantic salmon is a species of both scientific and economic importance, and Atlantic salmon farming is a highly profitable industry worldwide. One of the biggest challenges being faced by farms, which affects production efficiency and results in severe economic loss, is disease. In livestock production, one of the approaches taken to limit the impact of disease outbreaks is to selectively breed for improved resistance within farmed populations. Although traditional family-based resistance breeding programs have shown improvements in resistance to a variety of bacterial, viral and parasitic diseases on Atlantic salmon farms, response to selection can be slow. One way of increasing selection efficiency is through the incorporation of genetic markers into breeding programs, for marker-assisted or genomic selection. However, genomic resources for cultured aquatic species are sparse, and the generation of new and denser resources for use in selective breeding programs would be advantageous. The main focus of this thesis is the development of genomic resources in Atlantic salmon and the application of those resources to gain a better understanding of the salmon genome, particularly in the genetic basis of host resistance to infectious diseases. The first aim of this thesis was to develop improved genomic resources for Atlantic salmon, and to characterise the Atlantic salmon genome via construction and analysis of a SNP linkage map derived from RAD-Sequencing (RAD-Seq). Approximately 6,500 SNPs were assigned to 29 linkage groups, and ~1,800 male-segregating, and ~1,400 female-segregating SNPs were ordered and positioned. Overall map lengths and recombination ratios were relatively consistent between the sexes and across the linkage groups (~1:1.5, male:female). However, a substantial difference in the degree of marker clustering was seen between males and females, which is reflective of the difference in the positions of chiasmata between the two sexes. 
Using this map, ~4,000 Atlantic salmon reference genome contigs were assigned to a linkage group, and 112 contigs were assigned to multiple linkage groups, highlighting regions of homeology (large sections of duplicated chromosomal regions) within the salmon genome. Alignment of SNP-flanking sequences to the stickleback and rainbow trout genomes identified putative gene-associated SNPs and cross-species chromosomal orthologies, and provided evidence in support of the salmonid-specific genome duplication. In addition, based on this and other publicly available RAD-Seq datasets, the utility of RAD-Seq-derived data from different species and laboratories for population genetics analyses was tested. Short RAD-Seq contigs in Atlantic salmon and nine other teleost fish were used to identify cross-species orthologous genomic relationships. Several thousand orthologous RAD loci were identified across the species, with the number of RAD loci decreasing with evolutionary distance, as expected. Previously published broad-level relationships between orthologous chromosomes were confirmed. The identified cross-species orthologous RAD loci were used to estimate evolutionary relationships between the ten teleost fish species. Previously published relationships were recovered, suggesting that RAD-Seq data derived from different laboratories are useful for this purpose. The second aim was to characterise the genetic architecture of resistance to two viral diseases affecting Atlantic salmon production on farms: pancreas disease (PD) and infectious pancreatic necrosis (IPN). Using data and samples collected from a large population of salmon fry challenged with PD, a high heritability for resistance was estimated (h² ~0.5), and four QTL were identified, on chromosomes 3, 4, 7 and 23. The QTL explaining the highest within-family variation for resistance was located on chromosome 3. 
This QTL has been confirmed in a population of post-smolts by an independent research group, highlighting the potential for its incorporation into breeding programs to improve PD resistance. For IPN, the major resistance QTL had previously been mapped to linkage group 21. However, the mutation(s) underlying this QTL effect and the consequences of these mutation(s) on the affected genes and relevant biological resistance mechanisms are unknown. To generate a list of candidate genes within the vicinity of the IPN QTL, QTL-linked DNA sequences were aligned to four model fish genomes. This identified two QTL-orthologous regions in each of the species, and gene order within these regions was highly conserved across species. Analysis of gene expression patterns between IPN resistant and susceptible salmon in a viral challenge experiment revealed that the five most significantly differentially-expressed genes mapped to the QTL-orthologous region on linkage group II of stickleback. Pathway enrichment analysis across all differentially-expressed genes suggests that biological pathways influencing viral infection stress response/entry/replication, cellular energy production and apoptosis may be involved in resistance during the initial stages of IPN virus (IPNV) infection. These results have provided the basis for further study of the putative involvement of these candidate genes and pathways in genetic resistance to IPNV. In summary, the results and resources presented in this thesis extend our current understanding of the salmon genome and the genetic basis of resistance to two viral diseases, and provide resources with the potential to be used in Atlantic salmon selective breeding programs to tackle disease outbreaks

    Investigating the potential role of recombination regulator PRDM9 in mitochondria

    PhD Thesis. At present, 805 mitochondrial DNA (mtDNA) deletions have been described. Short direct repeat regions of DNA flank many of these deletions, suggesting that specific regions of the mtDNA molecule have a susceptibility to deletion formation. Despite this, the exact underlying cellular mechanisms facilitating mtDNA deletions are unclear. PR domain 9 (PRDM9) is a meiotic-specific protein responsible for determining the site of recombination in the nuclear genome. Through its zinc finger repeat region, PRDM9 binds a specific DNA consensus sequence, and acts as a methyltransferase, opening chromatin for DNA crossover events to occur. This is of interest as mitochondrial DNA also contains PRDM9 binding motif sites. This thesis outlines the experimental steps taken to determine if PRDM9 has any involvement in mtDNA maintenance and viability. Firstly, an in silico approach was used to screen mtDNA sequences from 31,551 individuals for the presence of the PRDM9 binding motif, identifying multiple putative binding sites in and around known deletion-forming flanking regions. In addition, population and phylogenetic stratification showed differential mtDNA binding motif patterns, potentially explaining the variable deletion frequencies between mtDNA haplogroups and populations. Secondly, to test the potential interaction between PRDM9 and mtDNA, complete genotyping of the PRDM9 zinc finger repeat region in a cohort of 48 mitochondrial single deletion patients and 50 healthy controls was performed. However, there was no association between PRDM9 haplotype and the formation of mtDNA deletions. Heterozygous individuals were significantly more frequent in the patient cohort than in controls, although no particular allele was associated with mtDNA deletion. Finally, PRDM9 protein levels were interrogated in cell lines and tissue samples. However, due to timing of expression it was not possible to reliably detect nascent protein using commercially available antibodies. 
To overcome this, stable cell lines overexpressing Flag-tagged PRDM9 were created. Low levels of PRDM9 expression were detected by immunoblotting, indicating that overexpression had worked but also that PRDM9 turnover in cells is likely rapid. Given the data presented, and despite the presence of multiple putative PRDM9 binding sites in almost all mitochondrial genomes studied, we conclude that it is unlikely that PRDM9 has a significant effect on the maintenance of mtDNA. However, to the best of my knowledge this is the first stable PRDM9 overexpression model created and it has provided a unique insight into some of the functions of this protein

    Describing Genomic and Epigenomic Traits Underpinning Emerging Fungal Pathogens.

    An unprecedented number of pathogenic fungi are emerging and causing disease in animals and plants, putting the resilience of wild and managed ecosystems in jeopardy. While the past decades have seen an increase in the number of pathogenic fungi, they have also seen the birth of new big data technologies and analytical approaches to tackle these emerging pathogens. We review how the linked fields of genomics and epigenomics are transforming our ability to address the challenge of emerging fungal pathogens. We explore the methodologies and bioinformatic toolkits that currently exist to rapidly analyze the genomes of unknown fungi, then discuss how these data can be used to address key questions that shed light on their epidemiology. We show how genomic approaches are leading a revolution into our understanding of emerging fungal diseases and speculate on future approaches that will transform our ability to tackle this increasingly important class of emerging pathogens