    A haplotype information theory method reveals genes of evolutionary interest in European vs. Asian pigs

    Asian and European wild boars were independently domesticated ca. 10,000 years ago. Since the 17th century, Chinese breeds have been imported to Europe to improve the genetics of European animals by introgression of favourable alleles, resulting in a complex mosaic of haplotypes. To interrogate the structure of these haplotypes further, we have run a new haplotype segregation analysis based on information theory, namely compression efficiency (CE). We applied the approach to sequence data from individuals from each phylogeographic region (n = 23 from Asia and Europe) including a number of major pig breeds. Our genome-wide CE is able to discriminate the breeds in a manner reflecting phylogeography. Furthermore, 24,956 non-overlapping sliding windows (each comprising 1,000 consecutive SNPs) were quantified for extent of haplotype sharing within and between Asia and Europe. The genome-wide distribution of the extent of haplotype sharing was quite different between groups. Unlike in European pigs, haplotype sharing in Asian pigs approximates a normal distribution. In line with this, we found that the European breeds possessed a number of genomic windows of dramatically higher haplotype sharing than the Asian breeds. Our CE analysis of sliding windows captures some of the genomic regions reported to contain signatures of selection in domestic pigs. Prominent among these regions, we highlight the role of a gene encoding the mitochondrial enzyme LACTB, which has been associated with obesity, and the gene encoding MYOG, a fundamental transcriptional regulator of myogenesis. The origin of these regions likely reflects either a population bottleneck in European animals, or selective targets on commercial phenotypes reducing allelic diversity in particular genes and/or regulatory regions.
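
    The abstract does not define compression efficiency, so the following is only a minimal sketch of one plausible reading: the fraction of bytes saved when the haplotypes in a window are compressed together, so that windows with more haplotype sharing compress better. The use of zlib, the 0/1 string encoding, and the names compression_efficiency and windows are assumptions for illustration, not the authors' implementation.

```python
import random
import zlib


def compression_efficiency(haplotypes):
    """Return 1 - compressed_size / raw_size for a list of haplotype strings.

    Assumed definition: values closer to 1 mean the haplotypes are more
    redundant (more sharing); this is not taken from the paper.
    """
    raw = "\n".join(haplotypes).encode("ascii")
    compressed = zlib.compress(raw, 9)
    return 1.0 - len(compressed) / len(raw)


def windows(n_snps, size):
    """Yield (start, end) for consecutive non-overlapping windows of `size` SNPs."""
    for start in range(0, n_snps - size + 1, size):
        yield start, start + size


# Toy data (simulated, not pig genotypes): 8 haplotypes over 3,000 SNPs.
rng = random.Random(0)
n_snps = 3000
base = "".join(rng.choice("01") for _ in range(n_snps))
shared = [base] * 8  # every haplotype identical: maximal sharing
diverse = ["".join(rng.choice("01") for _ in range(n_snps)) for _ in range(8)]

for start, end in windows(n_snps, 1000):
    ce_shared = compression_efficiency([h[start:end] for h in shared])
    ce_diverse = compression_efficiency([h[start:end] for h in diverse])
    print(start, end, round(ce_shared, 3), round(ce_diverse, 3))
```

    In this toy setting the identical haplotypes score higher in every window than the independent random ones, which is the qualitative behaviour the abstract relies on when it contrasts European and Asian breeds.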

    Networks of inbreeding coefficients in a selected population of rabbits

    The correlation between pedigree‐ and genomic‐based inbreeding coefficients is usually discussed in the literature. However, some of these correlations could be spurious. Using partial correlations and information theory, it is possible to distinguish a significant association between two variables which is independent from associations with a third variable. The objective of this study is to implement partial correlations and information theory to assess the relationship between different inbreeding coefficients using a selected population of rabbits. Data from pedigree and genomic information from a 200K SNP chip were available. After applying filtering criteria, the data set comprised 437 animals genotyped for 114,604 autosomal SNPs. Fifteen pedigree‐ and genome‐based inbreeding coefficients were estimated and used to build a network. The recent inbreeding coefficient based on runs of homozygosity had 9 edges linking it with different inbreeding coefficients. The partial correlations and information theory approach allowed us to infer meaningful associations between inbreeding coefficients and highlighted the importance of recent inbreeding based on runs of homozygosity, although a good proxy for it could be those pedigree‐based definitions reflecting recent inbreeding.
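
    To illustrate how a partial correlation separates a direct association from one induced by a third variable, here is a minimal sketch of the standard first-order formula for three variables. It covers only the partial-correlation half of the approach, the data are simulated, and the variable names f_roh, f_ped and f_grm are hypothetical labels rather than the coefficients estimated in the study.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Simulated example: two coefficients that both track a third one plus noise.
rng = random.Random(1)
f_roh = [rng.gauss(0, 1) for _ in range(2000)]   # hypothetical driver
f_ped = [v + rng.gauss(0, 0.5) for v in f_roh]   # hypothetical follower 1
f_grm = [v + rng.gauss(0, 0.5) for v in f_roh]   # hypothetical follower 2

raw = pearson(f_ped, f_grm)
partial = partial_corr(f_ped, f_grm, f_roh)
print(round(raw, 3), round(partial, 3))
```

    Here the raw correlation between the two followers is high, while the partial correlation given the driver is close to zero, so a network built on partial correlations would draw no edge between them.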

    Functional Analysis of Genomic Variation and Impact on Molecular and Higher Order Phenotypes

    Reverse genetics methods, particularly the production of gene knockouts and knockins, have revolutionized the understanding of gene function. High-throughput sequencing now makes it practical to exploit reverse genetics to simultaneously study functions of thousands of normal sequence variants and spontaneous mutations that segregate in intercross and backcross progeny generated by mating completely sequenced parental lines. To evaluate this new reverse genetic method we resequenced the genome of one of the oldest inbred strains of mice, DBA/2J, the father of the large family of BXD recombinant inbred strains. We analyzed ~100X whole-genome sequence data for the DBA/2J strain, relative to C57BL/6J, the reference strain for all mouse genomics and the mother of the BXD family. We generated the most detailed picture of molecular variation between the two mouse strains to date and identified 5.4 million sequence polymorphisms, including 4.46 million single nucleotide polymorphisms (SNPs), 0.94 million insertions/deletions (indels), and 20,000 structural variants. We systematically scanned massive databases of molecular phenotypes and ~4,000 classical phenotypes to detect linked functional consequences of sequence variants. In the majority of cases we successfully recovered known genotype-to-phenotype associations, and in several cases we linked sequence variants to novel phenotypes (Ahr, Fh1, Entpd2, and Col6a5). However, our most striking and consistent finding is that apparently deleterious homozygous SNPs, indels, and structural variants have undetectable or very modest additive effects on phenotypes.

    Whole-genome sequence analysis for pathogen detection and diagnostics

    This dissertation focuses on computational methods for improving the accuracy of commonly used nucleic acid tests for pathogen detection and diagnostics. Three specific biomolecular techniques are addressed: polymerase chain reaction, microarray comparative genomic hybridization, and whole-genome sequencing. These methods are potentially the future of diagnostics, but each requires sophisticated computational design or analysis to operate effectively. This dissertation presents novel computational methods that unlock the potential of these diagnostics by efficiently analyzing whole-genome DNA sequences. Improvements in the accuracy and resolution of each of these diagnostic tests promise more effective diagnosis of illness and rapid detection of pathogens in the environment. For designing real-time detection assays, an efficient data structure and search algorithm are presented to identify the most distinguishing sequences of a pathogen that are absent from all other sequenced genomes. Results are presented that show these "signature" sequences can be used to detect pathogens in complex samples and differentiate them from their non-pathogenic, phylogenetic near neighbors. For microarrays, novel pan-genomic design and analysis methods are presented for the characterization of unknown microbial isolates. To demonstrate the effectiveness of these methods, pan-genomic arrays are applied to the study of multiple strains of the foodborne pathogen Listeria monocytogenes, revealing new insights into the diversity and evolution of the species. Finally, multiple methods are presented for the validation of whole-genome sequence assemblies, which are capable of identifying assembly errors in even finished genomes. These validated assemblies provide the ultimate nucleic acid diagnostic, revealing the entire sequence of a genome.
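
    The idea of a "signature" sequence, present in the target pathogen and absent from every other genome, can be illustrated with a naive k-mer set difference. This is a deliberately simple stand-in and not the efficient data structure the dissertation describes; the function names, the toy sequences and the choice k = 5 are all assumptions, and reverse complements are ignored.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def signature_kmers(target, backgrounds, k):
    """Return k-mers found in target but in none of the background genomes."""
    candidates = kmers(target, k)
    for genome in backgrounds:
        candidates -= kmers(genome, k)  # drop anything seen elsewhere
    return candidates


# Toy sequences (made up): the first background differs from the target
# by a single base, mimicking a non-pathogenic near neighbor.
target = "ACGTACGGTTCAGGA"
backgrounds = ["ACGTACGTTTCAGGA", "TTTTACGTACGG"]

signature = signature_kmers(target, backgrounds, 5)
print(sorted(signature))
```

    Holding every k-mer in memory does not scale to all sequenced genomes, which is presumably why a dedicated data structure and search algorithm are needed in practice.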

    Phylogeography of Italian barbels (Cyprinidae, Barbus) inferred by mitochondrial and nuclear markers.

    Species of the genus Barbus, being primary freshwater fishes intolerant of salt water, are of great value for biogeographic studies since their dispersal strictly depends on the geological evolution of the landmasses (i.e. catchment watersheds, mountain chains and fluctuations of sea level). In the Italian peninsula four species are formally recognized: B. caninus, B. balcanicus, B. plebejus and B. tyberinus. Their genetic relationships were assessed using both mitochondrial and nuclear markers. The study was carried out by first developing new nuclear primers for the S7 ribosomal protein and the growth hormone (Gh) genes, then performing a SNP characterization of these loci on 18 populations (264 specimens in total). Results from nuclear sequences were then compared with those from partial sequences of the cytochrome b mitochondrial gene (733 bp). Recovered phylogenies were congruent with the current morphology-based systematics and taxonomy. Results highlighted the close relationship between the species belonging to the fluvio-lacustrine ecological group, B. plebejus and B. tyberinus, and the high genetic distance between the species belonging to the riverine group, B. caninus and B. balcanicus. Moreover, findings were congruent with hypotheses of partial permeability of the principal biogeographic barriers (the Alpine and Apennine chains) to freshwater fish fauna. Subsequently, the influence of different ecological preferences on gene flow was tested for B. caninus and B. tyberinus on 6 and 7 populations respectively. Results indicated that the riverine B. caninus has more strongly structured populations than B. tyberinus, probably due to the different dispersal ability and the different habitats colonized. Moreover, for the first time, molecular evidence was shown of hybridization events occurring between B. caninus and B. plebejus, B. tyberinus and B. barbus.

    The role of visual adaptation in cichlid fish speciation

    D. Shane Wright (1), Ole Seehausen (2), Ton G.G. Groothuis (1), Martine E. Maan (1)
    (1) University of Groningen; GELIFES; EGDB. (2) Department of Fish Ecology & Evolution, EAWAG Centre for Ecology, Evolution and Biogeochemistry, Kastanienbaum, and Institute of Ecology and Evolution, Aquatic Ecology, University of Bern.
    In less than 15,000 years, Lake Victoria cichlid fishes have radiated into as many as 500 different species. Ecological and sexual selection are thought to contribute to this ongoing speciation process, but genetic differentiation remains low. However, recent work on visual pigment genes, opsins, has shown more diversity. Unlike neighboring Lakes Malawi and Tanganyika, Lake Victoria is highly turbid, resulting in a long-wavelength shift in the light spectrum with increasing depth, providing an environmental gradient for exploring divergent coevolution in sensory systems and colour signals via sensory drive. Pundamilia pundamilia and Pundamilia nyererei are two sympatric species found at rocky islands across southern portions of Lake Victoria, differing in male colouration and the depth at which they reside. Previous work has shown species differentiation in colour discrimination, corresponding to divergent female preferences for conspecific male colouration. A mechanistic link between colour vision and preference would provide a rapid route to reproductive isolation between divergently adapting populations. This link is tested by experimental manipulation of colour vision: raising both species and their hybrids under light conditions mimicking shallow and deep habitats. We quantify the expression of retinal opsins and test behaviours important for speciation: mate choice, habitat preference, and foraging performance.

    Comparative Phylogeographic, Population Genomic, and Selection Inference with Development of Hierarchical Co-Demographic Models

    Comparing demographic histories across assemblages of populations, species, and sister pairs has been a focus in phylogeography since its inception. Initial approaches utilized organelle genetic data and involved qualitative comparisons of genetic patterns for evaluating hypotheses of shared evolutionary responses to past environmental changes. This endeavor has progressed with coalescent model-based statistical techniques and advances in next-generation sequencing, yet there remains a need for methods that can analyze aggregated genomic-scale data from non-model organisms within a unified framework that considers individual taxon uncertainty and variance. To this end, the aggregate site frequency spectrum (aSFS), an expansion of the site frequency spectrum to exploit SNP data collected from multiple independent populations, and the aggregate joint site frequency spectrum (ajSFS), an extension of the aSFS for population-pairs, are introduced and explored here for the purpose of assemblage-level demographic inference. Furthermore, introduced and described here is the R package Multi-DICE, a wrapper program that exploits existing simulation software for straightforward and flexible execution of hierarchical co-demographic model-based inference given either the aSFS or single-locus sequence data. These methodological developments were validated through a succession of in silico experiments that tested a range of sampling configurations, alternative inferential frameworks, and various prior specifications. Additionally, empirical demonstrations were conducted from published RAD-seq data of five threespine stickleback populations as well as eight local replicates of a lamprey species-pair. Synchronous demographic trajectories were detected for both of these analyses. Moreover, similar techniques were utilized to investigate LINE selection among population-level whole-genome vertebrate datasets. In brief, a null demographic background was inferred utilizing SNP data, which was then exploited to simulate a putative null distribution of summary statistics that was compared to LINE data for detecting selection. Subsequently, the null demographic model was leveraged to evaluate selection presence, directionality, and strength. There was a robust signal for purifying selection along with a pattern of LINE size affecting selection strength in two species. As large-scale SNP data become routine, the aSFS, Multi-DICE, ajSFS, and protocol employed here for detecting selection will collectively expand the potential for powerful comparative phylogeographic and population genomic inference.
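
    As a rough illustration of what a site frequency spectrum summarizes, the sketch below computes a folded SFS per population from per-SNP allele counts and then aggregates several spectra. The aggregation step, which sorts each frequency class across populations in descending order so that population labels are discarded, is an assumption about how the aSFS is built and is not specified in the abstract. The function names and toy counts are hypothetical, and this is unrelated to the Multi-DICE R package interface.

```python
from collections import Counter


def folded_sfs(allele_counts, n_chrom):
    """Folded SFS as proportions: bin i holds SNPs with minor allele count i.

    Monomorphic sites (minor allele count 0) are excluded.
    """
    max_minor = n_chrom // 2
    counts = Counter(min(c, n_chrom - c) for c in allele_counts)
    total = sum(counts[i] for i in range(1, max_minor + 1))
    return [counts[i] / total for i in range(1, max_minor + 1)]


def aggregate_sfs(spectra):
    """Sort each frequency class across populations, largest first.

    Assumed aggregation step: row r of the result holds the r-th largest
    proportion observed in every frequency class.
    """
    n_bins = len(spectra[0])
    columns = [sorted((s[i] for s in spectra), reverse=True) for i in range(n_bins)]
    return [[col[r] for col in columns] for r in range(len(spectra))]


# Toy allele counts per SNP (made up), 6 sampled chromosomes per population.
pop_a = [1, 1, 1, 2, 2, 3, 5, 5]
pop_b = [1, 2, 2, 3, 3, 3, 4, 6]
spectra = [folded_sfs(pop_a, 6), folded_sfs(pop_b, 6)]
asfs = aggregate_sfs(spectra)
for row in asfs:
    print([round(v, 3) for v in row])
```

    Because each column is sorted independently, a row of the aggregated matrix no longer corresponds to a single population, which is what would let the summary be compared across assemblages regardless of taxon order.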