1,266 research outputs found

    Fast and scalable inference of multi-sample cancer lineages.

    Somatic variants can be used as lineage markers for the phylogenetic reconstruction of cancer evolution. Since somatic phylogenetics is complicated by sample heterogeneity, novel specialized tree-building methods are required for cancer phylogeny reconstruction. We present LICHeE (Lineage Inference for Cancer Heterogeneity and Evolution), a novel method that automates the phylogenetic inference of cancer progression from multiple somatic samples. LICHeE uses variant allele frequencies of somatic single nucleotide variants obtained by deep sequencing to reconstruct multi-sample cell lineage trees and infer the subclonal composition of the samples. LICHeE is open source and available at http://viq854.github.io/lichee
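    As a rough illustration of the input such a method works from, the sketch below computes variant allele frequencies (VAFs) from read counts and groups variants by the samples in which they are detected. It is not LICHeE's algorithm: the function names, the 0.05 detection threshold, and the example counts are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# (alt_reads, total_reads) for one variant in one sample
ReadCounts = Tuple[int, int]


def vaf(alt_reads: int, total_reads: int) -> float:
    """Variant allele frequency: fraction of reads supporting the variant."""
    return alt_reads / total_reads if total_reads > 0 else 0.0


def presence_profile(counts: List[ReadCounts], min_vaf: float = 0.05) -> Tuple[bool, ...]:
    """Mark a variant as present in each sample whose VAF reaches min_vaf.

    min_vaf is a hypothetical threshold, not a value taken from the paper.
    """
    return tuple(vaf(alt, total) >= min_vaf for alt, total in counts)


def group_by_profile(
    variants: Dict[str, List[ReadCounts]], min_vaf: float = 0.05
) -> Dict[Tuple[bool, ...], List[str]]:
    """Group variant names by their presence/absence pattern across samples."""
    groups: Dict[Tuple[bool, ...], List[str]] = {}
    for name, counts in variants.items():
        groups.setdefault(presence_profile(counts, min_vaf), []).append(name)
    return groups


# Made-up read counts for three variants in two samples
example_variants = {
    "snv1": [(50, 100), (40, 100)],
    "snv2": [(30, 100), (0, 100)],
    "snv3": [(45, 90), (20, 80)],
}
example_groups = group_by_profile(example_variants)
```

    Variants that share a presence pattern are candidates for the same branch of a lineage tree; a real method would additionally have to model noise in the frequencies and sample heterogeneity.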

    FRANz: reconstruction of wild multi-generation pedigrees

    Summary: We present a software package for pedigree reconstruction in natural populations using co-dominant genomic markers such as microsatellites and single nucleotide polymorphisms (SNPs). If available, the algorithm makes use of prior information such as known relationships (sub-pedigrees) or the age and sex of individuals. Statistical confidence is estimated by Markov chain Monte Carlo (MCMC) sampling. The accuracy of the algorithm is demonstrated for simulated data as well as an empirical dataset with known pedigree. The parentage inference is robust even in the presence of genotyping errors.
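    To make the role of co-dominant markers concrete, the following minimal sketch checks whether a candidate mother and father are compatible with an offspring under strict Mendelian inheritance at diploid loci. This is a simple exclusion check, not the likelihood-based method of FRANz, and it ignores genotyping errors, which the package is stated to tolerate. All names and genotype values are hypothetical.

```python
from typing import Dict, Tuple

# Diploid genotype at one locus: a pair of allele identifiers
# (for example, microsatellite fragment lengths)
Genotype = Tuple[int, int]


def trio_compatible(child: Genotype, mother: Genotype, father: Genotype) -> bool:
    """True if the child could have inherited one allele from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def count_mismatches(
    child: Dict[str, Genotype],
    mother: Dict[str, Genotype],
    father: Dict[str, Genotype],
) -> int:
    """Count loci, typed in all three individuals, that violate Mendelian inheritance."""
    return sum(
        not trio_compatible(child[locus], mother[locus], father[locus])
        for locus in child
        if locus in mother and locus in father
    )


# Made-up genotypes at two loci
example_child = {"locus1": (120, 124), "locus2": (200, 204)}
example_mother = {"locus1": (120, 122), "locus2": (200, 200)}
example_father = {"locus1": (124, 126), "locus2": (202, 206)}
example_mismatches = count_mismatches(example_child, example_mother, example_father)
```

    In this made-up trio, locus1 is compatible and locus2 is not, because allele 204 occurs in neither parent. A single mismatch could equally be a genotyping error, which is one reason a probabilistic treatment is preferable to hard exclusion.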

    Fast half-sibling population reconstruction: theory and algorithms


    Genealogy Reconstruction: Methods and applications in cancer and wild populations

    Genealogy reconstruction is widely used in biology when relationships among entities are studied. Phylogenies, or evolutionary trees, show the differences between species. They are of profound importance because they help to obtain a better understanding of evolutionary processes. Pedigrees, or family trees, on the other hand, visualize the relatedness between individuals in a population. The reconstruction of pedigrees and the inference of parentage in general is now a cornerstone of molecular ecology. Applications include the direct inference of gene flow and the estimation of the effective population size and of parameters describing the population’s mating behaviour, such as rates of inbreeding. In the first part of this thesis, we construct genealogies of various types of cancer. Histopathological classification of human tumors relies in part on the degree of differentiation of the tumor sample. To date, there is no objective systematic method to categorize tumor subtypes by maturation. We introduce a novel algorithm to rank tumor subtypes according to the dissimilarity of their gene expression from that of stem cells and fully differentiated tissue, and thereby construct a phylogenetic tree of cancer. We validate our methodology with expression data of leukemia and liposarcoma subtypes and then apply it to a broader group of sarcomas and of breast cancer subtypes. The ranking of tumor subtypes that results from our methodology allows the identification of genes correlated with differentiation and may help to identify novel therapeutic targets. Our algorithm represents the first phylogeny-based tool to analyze the differentiation status of human tumors. In contrast to asexually reproducing cancer cell populations, pedigrees of sexually reproducing populations cannot be represented by phylogenetic trees. Pedigrees are directed acyclic graphs (DAGs) and therefore more closely resemble phylogenetic networks, in which reticulate events are indicated by vertices with two incoming arcs. In the second part of the thesis, we present a software package for pedigree reconstruction in natural populations using co-dominant genomic markers such as microsatellites and single nucleotide polymorphisms (SNPs). If available, the algorithm makes use of prior information such as known relationships (sub-pedigrees) or the age and sex of individuals. Statistical confidence is estimated by Markov chain Monte Carlo (MCMC) sampling. The accuracy of the algorithm is demonstrated for simulated data as well as an empirical data set with known pedigree. The parentage inference is robust even in the presence of genotyping errors. We further demonstrate the accuracy of the algorithm on simulated clonal populations. We show that the joint estimation of parameters of interest, such as the rate of self-fertilization or clonality, is possible with high accuracy even with marker panels of moderate power. Classical methods can assign only a very limited number of statistically significant parentages in this case and would therefore fail. The method is implemented in fast, easy-to-use open source software that scales to large datasets with many thousands of individuals.
    Table of contents:
    Abstract
    Acknowledgments
    1 Introduction
    2 Cancer Phylogenies
        2.1 Introduction
        2.2 Background
            2.2.1 Phylogenetic Trees
            2.2.2 Microarrays
        2.3 Methods
            2.3.1 Dataset compilation
            2.3.2 Statistical Methods and Analysis
            2.3.3 Comparison of our methodology to other methods
        2.4 Results
            2.4.1 Phylogenetic tree reconstruction method
            2.4.2 Comparison of tree reconstruction methods to other algorithms
            2.4.3 Systematic analysis of methods and parameters
        2.5 Discussion
    3 Wild Pedigrees
        3.1 Introduction
        3.2 The molecular ecologist’s tools of the trade
            3.2.1 Sibship inference and parental reconstruction
            3.2.2 Parentage and paternity inference
            3.2.3 Multigenerational pedigree reconstruction
        3.3 Background
            3.3.1 Pedigrees
            3.3.2 Genotypes
            3.3.3 Mendelian segregation probability
            3.3.4 LOD Scores
            3.3.5 Genotyping Errors
            3.3.6 IBD coefficients
            3.3.7 Bayesian MCMC
        3.4 Methods
            3.4.1 Likelihood Model
            3.4.2 Efficient Likelihood Calculation
            3.4.3 Maximum Likelihood Pedigree
            3.4.4 Full siblings
            3.4.5 Algorithm
            3.4.6 Missing Values
            3.4.7 Allele frequencies
            3.4.8 Rates of Self-fertilization
            3.4.9 Rates of Clonality
        3.5 Results
            3.5.1 Real Microsatellite Data
            3.5.2 Simulated Human Population
            3.5.3 Simulated Clonal Plant Population
        3.6 Discussion
    4 Conclusions
    A FRANz
        A.1 Availability
        A.2 Input files
            A.2.1 Main input file
            A.2.2 Known relationships
            A.2.3 Allele frequencies
            A.2.4 Sampling locations
        A.3 Output files
        A.4 Web 2.0 Interface
    List of Figures
    List of Tables
    List of Abbreviations
    Bibliography
    Curriculum Vitae

    Colonization and dispersal patterns of the invasive American brine shrimp Artemia franciscana (Branchiopoda: Anostraca) in the Mediterranean region

    Cysts of the brine shrimp Artemia franciscana are harvested from the Great Salt Lake (GSL) and San Francisco Bay (SFB) saltworks in the USA, and marketed worldwide to provide live food for aquaculture. This species has become invasive in several countries. We investigated (1) whether the introduced populations in the Mediterranean region could have originated from these USA populations, (2) how the genetic diversity of Mediterranean populations compares to that at GSL and SFB, and (3) whether genetic patterns in the Mediterranean can shed light on colonization routes. We sequenced a fragment of the cytochrome c oxidase subunit I gene and screened microsatellite loci from Mediterranean populations and the two putative USA sources. Haplotypes from Mediterranean populations were identical or closely related to those from SFB and GSL, and not related to other available American populations. Microsatellite analyses showed reduced genetic diversity in most Mediterranean populations, suggesting bottleneck effects, but a few populations showed genetic diversity similar to or higher than that of the native ones; these are likely to be admixed from both GSL and SFB because of multiple introductions. Results suggest natural dispersal, potentially via flamingos, between two Spanish populations. Our analyses show that all invaded populations could have originated from those commercialized USA populations. © 2013 Springer Science+Business Media Dordrecht

    GENETIC MONITORING AND RESCUE IN MID-ATLANTIC BROOK TROUT (SALVELINUS FONTINALIS) POPULATIONS

    Brook trout (Salvelinus fontinalis) populations have experienced dramatic declines throughout their native range, in part, due to anthropogenic land use and habitat fragmentation. In the mid-Atlantic region, brook trout populations often occupy small, headwater habitat fragments in demographic and genetic isolation, making them vulnerable to inbreeding and genetic drift. My dissertation evaluates different methods for genetic assessment, monitoring, and management of small, isolated brook trout populations. First, I examined the potential value of effective number of breeders (Nb) estimates for genetic monitoring by determining whether Nb estimates were sensitive to habitat characteristics known to affect brook trout populations. Using genetic data from 71 brook trout habitat patches, I found significant evidence that Nb estimates were positively related to habitat size and base flow index, and negatively related to temperature. These results provide further support for the use of Nb in genetic assessments and monitoring of isolated salmonid populations. Human-mediated gene flow is a promising approach to reduce extinction risk and alleviate negative fitness effects associated with small effective population size (i.e., genetic rescue). However, there had not been an assessment of the statistical power of commonly used approaches to determine fitness effects of gene flow, despite calls for more widespread use of human-mediated gene flow. I addressed this need by using individual-based simulations of gene flow and found that these monitoring approaches frequently suffered from low statistical power but also identified strategies to improve inference. Finally, I examined the multigenerational effects of genetic rescue in a small, isolated population of brook trout and found consistent evidence of elevated fitness in F1 hybrids as compared to resident individuals. 
In contrast, I found a negative relationship between proportion of migrant ancestry and lifetime reproductive success in backcrosses (F2 and later generations). Still, backcrosses with less than 0.48 migrant ancestry had lifetime reproductive success greater than residents, on average. These results highlight that gene flow often introduces both beneficial and deleterious variation, with the net effect depending on the efficacy of natural selection, which suggests that ecological conditions affecting demography can play an outsized role in determining the outcome of genetic rescue attempts.
