27 research outputs found

    Genealogy Reconstruction: Methods and applications in cancer and wild populations

    Genealogy reconstruction is widely used in biology when relationships among entities are studied. Phylogenies, or evolutionary trees, show the differences between species. They are of profound importance because they help to obtain a better understanding of evolutionary processes. Pedigrees, or family trees, on the other hand, visualize the relatedness between individuals in a population. The reconstruction of pedigrees and the inference of parentage in general is now a cornerstone of molecular ecology. Applications include the direct inference of gene flow and the estimation of the effective population size and of parameters describing the population’s mating behaviour, such as rates of inbreeding. In the first part of this thesis, we construct genealogies of various types of cancer. Histopathological classification of human tumors relies in part on the degree of differentiation of the tumor sample. To date, there is no objective systematic method to categorize tumor subtypes by maturation. We introduce a novel algorithm to rank tumor subtypes according to the dissimilarity of their gene expression from that of stem cells and fully differentiated tissue, and thereby construct a phylogenetic tree of cancer. We validate our methodology with expression data of leukemia and liposarcoma subtypes and then apply it to a broader group of sarcomas and of breast cancer subtypes. The ranking of tumor subtypes resulting from our methodology allows the identification of genes correlated with differentiation and may help to identify novel therapeutic targets. Our algorithm represents the first phylogeny-based tool to analyze the differentiation status of human tumors. In contrast to asexually reproducing cancer cell populations, pedigrees of sexually reproducing populations cannot be represented by phylogenetic trees. Pedigrees are directed acyclic graphs (DAGs) and therefore more closely resemble phylogenetic networks, where reticulate events are indicated by vertices with two incoming arcs. In the second part of the thesis, we present a software package for pedigree reconstruction in natural populations using co-dominant genomic markers such as microsatellites and single nucleotide polymorphisms (SNPs). If available, the algorithm makes use of prior information such as known relationships (sub-pedigrees) or the age and sex of individuals. Statistical confidence is estimated by Markov chain Monte Carlo (MCMC) sampling. The accuracy of the algorithm is demonstrated for simulated data as well as an empirical data set with known pedigree. The parentage inference is robust even in the presence of genotyping errors. We further demonstrate the accuracy of the algorithm on simulated clonal populations. We show that the joint estimation of parameters of interest, such as the rate of self-fertilization or clonality, is possible with high accuracy even with marker panels of moderate power. Classical methods can assign only a very limited number of statistically significant parentages in this case and would therefore fail. The method is implemented in fast and easy-to-use open source software that scales to large datasets with many thousands of individuals.
    Contents: Abstract; Acknowledgments; 1 Introduction; 2 Cancer Phylogenies (2.1 Introduction; 2.2 Background: 2.2.1 Phylogenetic Trees, 2.2.2 Microarrays; 2.3 Methods: 2.3.1 Dataset compilation, 2.3.2 Statistical Methods and Analysis, 2.3.3 Comparison of our methodology to other methods; 2.4 Results: 2.4.1 Phylogenetic tree reconstruction method, 2.4.2 Comparison of tree reconstruction methods to other algorithms, 2.4.3 Systematic analysis of methods and parameters; 2.5 Discussion); 3 Wild Pedigrees (3.1 Introduction; 3.2 The molecular ecologist’s tools of the trade: 3.2.1 Sibship inference and parental reconstruction, 3.2.2 Parentage and paternity inference, 3.2.3 Multigenerational pedigree reconstruction; 3.3 Background: 3.3.1 Pedigrees, 3.3.2 Genotypes, 3.3.3 Mendelian segregation probability, 3.3.4 LOD Scores, 3.3.5 Genotyping Errors, 3.3.6 IBD coefficients, 3.3.7 Bayesian MCMC; 3.4 Methods: 3.4.1 Likelihood Model, 3.4.2 Efficient Likelihood Calculation, 3.4.3 Maximum Likelihood Pedigree, 3.4.4 Full siblings, 3.4.5 Algorithm, 3.4.6 Missing Values, 3.4.7 Allele frequencies, 3.4.8 Rates of Self-fertilization, 3.4.9 Rates of Clonality; 3.5 Results: 3.5.1 Real Microsatellite Data, 3.5.2 Simulated Human Population, 3.5.3 Simulated Clonal Plant Population; 3.6 Discussion); 4 Conclusions; A FRANz (A.1 Availability; A.2 Input files: A.2.1 Main input file, A.2.2 Known relationships, A.2.3 Allele frequencies, A.2.4 Sampling locations; A.3 Output files; A.4 Web 2.0 Interface); List of Figures; List of Tables; List of Abbreviations; Bibliography; Curriculum Vitae

    Present and past climatic effects on the current distribution and genetic diversity of the Iberian spadefoot toad (Pelobates cultripes): an integrative approach

    Aim: Predicting species responses to global change is one of the most pressing issues in Conservation Biogeography. A key part of the problem is understanding how organisms have reacted to climatic changes in the past. Here we use species distribution modelling to infer the effects of climate changes since the Last Interglacial (LIG, about 130,000 ybp) on patterns of genetic structure and diversity in the Western Spadefoot toad (Pelobates cultripes) in combination with spatially-explicit phylogeographic analyses. Location: Iberian Peninsula and mainland France. Methods: 524 individuals from 54 populations across the species range were sampled to document patterns of genetic diversity and infer their evolutionary history based on data from mtDNA and fourteen polymorphic microsatellites. Generalized linear models based on distribution data were used to infer climatic favourability for the species in the present and in projected scenarios for the LIG, the Mid Holocene and the last glacial maximum (LGM). Results: Estimates of genetic diversity show a decreasing trend from south to north, suggesting persistence of higher historical population sizes in the southern Iberian Peninsula. Species distribution models show differences in climatic favourability through time, with significant correlations between historically stable favourable areas and current patterns of genetic diversity. These results are corroborated by Bayesian Skyline Plots and continuous diffusion phylogeographic analyses. Main conclusions: The results indicate the presence of southern refugia, with moderate recent expansions at the northern end of the species’ range. Populations at the northern range margin exhibit the lowest genetic diversity and occupy historically unstable areas, classified as marginal in terms of favourability, rendering them most vulnerable to climate-mediated changes in the medium to long term

    Efficient Computational Genetics Methods for Multiparent Crosses

    Multiparent crosses are genetic populations bred in a controlled manner from a finite number of known founders. They represent experimental resources that are of potentially great value for understanding the genetic basis of complex diseases. An important new experimental technology that can be applied to multiparent crosses, namely high-throughput sequencing, generates an immense amount of data and provides unprecedented opportunities to study genetics at ultra-high resolution. However, to take advantage of such massive data, several computational genetics problems have to be resolved. These include RNA-Seq assembly and quantification, QTL mapping, and haplotype effect estimation. In order to tackle these problems, which are highly connected to each other, I propose a series of methods: GeneScissors is a novel method to detect errors caused by multiple alignments in RNA-Seq; RNA-Skim can rapidly quantify RNA-Seq data while still providing reliable results; HTreeQA is designed as a phylogeny-based QTL mapping method for genotypes with heterozygous sites; and Diploffect estimates founder effects with statistically valid interval estimates in multiparent crosses. These methods are extensively studied on both simulated and real data. These studies demonstrate that the proposed methods can make data analysis of multiparent crosses more effective and efficient and produce results that are more accurate and trustworthy than those of a number of existing alternative methods. Doctor of Philosophy

    Structure and size of two bear populations (Ursus arctos) from SNP-based pedigree analyses

    Reliable population estimates are an important aspect of sustainable wildlife management but usually difficult to obtain for rare and elusive large carnivores. I tested a new method developed by Creel and Rosenblatt (2013) to estimate the population size of two Swedish brown bear (Ursus arctos) populations. The Creel-Rosenblatt estimator (CRE) projects beyond the count of genotypes by including individuals that were inferred from the pedigree as well as undetected individuals into the population estimates. Using a recently developed panel of 96 single nucleotide polymorphisms (SNPs), hunter-collected fecal samples were genotyped for reconstructing pedigrees. Based on 434 genotypes from Dalarna-Gävleborg and 265 from Västerbotten, the CRE population estimates

    Age‐ and sex‐dependent variation in relatedness corresponds to reproductive skew, territory inheritance and workload in cooperatively breeding cichlids

    Kin selection plays a major role in the evolution of cooperative systems. However, many social species exhibit complex within-group relatedness structures, where kin selection alone cannot explain the occurrence of cooperative behavior. Understanding such social structures is crucial to elucidate the evolution and maintenance of multi-layered cooperative societies. In lamprologine cichlids, intragroup relatedness seems to correlate positively with reproductive skew, suggesting that in this clade dominants tend to provide reproductive concessions to unrelated subordinates to secure their participation in brood care. We investigate how patterns of within-group relatedness covary with direct and indirect fitness benefits of cooperation in a highly social vertebrate, the cooperatively breeding, polygynous lamprologine cichlid Neolamprologus savoryi. Behavioral and genetic data from 43 groups containing 578 individuals show that groups are socially and genetically structured into subgroups. About 17% of group members were unrelated immigrants, and average relatedness between breeders and brood care helpers declined with helper age due to group membership dynamics. Hence the relative importance of direct and indirect fitness benefits of cooperation depends on helper age. Our findings highlight how both direct and indirect fitness benefits of cooperation and group membership can select for cooperative behavior in societies comprising complex social and relatedness structures

    Evolutionary patterns and processes in amphibians from the Iberian peninsula: a comparative and multi-scale perspective

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología. Date of defense: 29-06-2017. Full-text access to this thesis is embargoed until 29-12-201

    The iridescent enigma: genome evolution and species boundaries of the blue-ringed octopus species complex (Octopodidae: Hapalochlaena)

    Brooke Whitelaw examined the evolution of the blue-ringed octopus genus (Octopodidae: Hapalochlaena) species complex. The current state of Hapalochlaena systematics was revealed to be insufficient for the species diversity observed. Furthermore, evolution of the Hapalochlaena genome revealed distinct differences to non-tetrodotoxin (TTX) bearing octopod genomes. This work provides a genetic basis for systematic re-evaluation of the genus, in conjunction with an annotated genome and linkage map for H. maculosa

    Gene Dispersal In Tropical Trees: Ecological Processes And Genetic Consequences.

    Tropical trees constitute an ecologically important functional group in terrestrial ecosystems because of the essential roles that they play in sustaining biodiversity and carbon storage. The persistence and evolutionary potentials of tropical trees are, however, increasingly threatened by human-induced rapid changes in abiotic and biotic environments. For long-lived forest trees, gene dispersal by seeds and pollen is critical for tracking shifting climatic niches and for maintaining genetic variation needed to adapt to changing environments. Understanding the potential responses of tropical trees to environmental changes depends in part upon quantifying the rates of seed and pollen dispersal. This dissertation aims to quantify the spatial extent and magnitude of seed and pollen dispersal and their respective genetic impacts in a comparative context, by focusing on four Neotropical tree species that have distinct dispersal and pollination syndromes and life-history strategies. By using parentage inference and inverse modeling, I found that long-distance gene dispersal by seeds is common in these vertebrate-dispersed tropical trees, in which models predicted 1–18% of dispersal events exceeding 1 km. This fraction of pollen dispersal >1 km could reach 10–20% in these species. Furthermore, simulations with gene dispersal distances realistically represented suggest that seed and pollen dispersal limitation can lead to genetic diversity loss in tropical tree populations. By examining the respective genetic impacts of seed vs. pollen dispersal, I found that seed dispersal is the primary force driving spatial genetic patterns in these species. 
It suggests that the functional loss of seed-dispersing vertebrates, as a result of anthropogenic disturbance in tropical forests, could alter not only tree population spatial structure and ecological dynamics, but also genetic structure and evolutionary dynamics. PhD, Ecology and Evolutionary Biology, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113619/1/weina_1.pd

    Recent identity by descent in human genetic data - methods and applications

    The thesis describes algorithms for detecting regions of recent identity by descent (IBD) from human genetic data and their applications in optimising resequencing studies, genomic predictions and detecting Mendelian subtypes of diseases. Firstly, we describe the algorithm ANCHAP, which scans pairs of multi-point SNP genotypes for sharing IBD of long haplotypes. A comparison with other methods shows that ANCHAP outperforms them in terms of speed or accuracy. We demonstrate the algorithm on data from population isolates - from Orcades, Croatian islands, and from a population of unrelated individuals. We compare the abundance of IBD segments between cohorts, and identify genetic regions where IBD is most common. Secondly, we verify the IBD regions detected from array data against exome sequence data. We estimate that where sharing IBD between a pair of individuals is inferred, this is confirmed by exome data in 98% of cases. Correctness of IBD detection varies with settings of ANCHAP, length of IBD segments, and position with respect to segment endpoints. We find that with sample sizes of 1000 individuals from an isolated population genotyped using a dense SNP array, and with 20% of these individuals sequenced, 65% of sequences of the un-sequenced subjects can be partially inferred. Implementation of such resequencing strategies requires an IBD-based imputation algorithm, which is outlined. Thirdly, we use recent IBD to detect carriers of Mendelian subtypes of colon cancer. We show this with the example of Lynch syndrome, which accounts for about 3% of colon cancer patients. We detect IBD sharing between known and unknown carriers around DNA mismatch-repair genes. Using the IBD relationship, we build and evaluate a model that predicts presence of Lynch Syndrome mutations. Finally, we discuss whether regions of identity by descent can be used for genomic predictions. 
We conclude that the utility of the inferred IBD regions depends on accuracy of detection, time to most recent common ancestors and mutation rates since