
    A Differentiation-Based Phylogeny of Cancer Subtypes

    Histopathological classification of human tumors relies in part on the degree of differentiation of the tumor sample. To date, there is no objective systematic method to categorize tumor subtypes by maturation. In this paper, we introduce a novel computational algorithm to rank tumor subtypes according to the dissimilarity of their gene expression from that of stem cells and fully differentiated tissue, and thereby construct a phylogenetic tree of cancer. We validate our methodology with expression data of leukemia, breast cancer and liposarcoma subtypes and then apply it to a broader group of sarcomas. This ranking of tumor subtypes resulting from the application of our methodology allows the identification of genes correlated with differentiation and may help to identify novel therapeutic targets. Our algorithm represents the first phylogeny-based tool to analyze the differentiation status of human tumors.
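
    The ranking idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm and it does not build a tree. It assumes each subtype is summarized by one mean expression vector and scores it by its Euclidean distance to a stem-cell reference, relative to the summed distance to the stem-cell and differentiated-tissue references. The function names, the choice of distance, and the toy values are all illustrative assumptions.

```python
import math


def differentiation_score(profile, stem, differentiated):
    """Relative position of a profile between two reference profiles.

    Returns 0.0 when the profile equals the stem-cell reference and 1.0
    when it equals the differentiated-tissue reference.
    """
    d_stem = math.dist(profile, stem)
    d_diff = math.dist(profile, differentiated)
    if d_stem + d_diff == 0:
        raise ValueError("sample and both reference profiles are identical")
    return d_stem / (d_stem + d_diff)


def rank_subtypes(subtypes, stem, differentiated):
    """Sort subtype names from least to most differentiated."""
    return sorted(
        subtypes,
        key=lambda name: differentiation_score(subtypes[name], stem, differentiated),
    )


# Toy, made-up expression values for three genes (assumption, not real data).
stem_profile = [0.0, 0.0, 0.0]
differentiated_profile = [10.0, 10.0, 10.0]
toy_subtypes = {
    "subtype_B": [5.0, 5.0, 5.0],
    "subtype_C": [9.0, 9.0, 9.0],
    "subtype_A": [1.0, 1.0, 1.0],
}

ranking = rank_subtypes(toy_subtypes, stem_profile, differentiated_profile)
print(ranking)
```

    A real analysis would need normalized expression data and a distance suited to it. The sketch only shows how two reference profiles can induce an ordering of subtypes.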

    Genealogy Reconstruction: Methods and applications in cancer and wild populations

    Genealogy reconstruction is widely used in biology when relationships among entities are studied. Phylogenies, or evolutionary trees, show the differences between species. They are of profound importance because they help to obtain better understandings of evolutionary processes. Pedigrees, or family trees, on the other hand visualize the relatedness between individuals in a population. The reconstruction of pedigrees and the inference of parentage in general is now a cornerstone in molecular ecology. Applications include the direct inference of gene flow, estimation of the effective population size and parameters describing the population’s mating behaviour such as rates of inbreeding.

    In the first part of this thesis, we construct genealogies of various types of cancer. Histopathological classification of human tumors relies in part on the degree of differentiation of the tumor sample. To date, there is no objective systematic method to categorize tumor subtypes by maturation. We introduce a novel algorithm to rank tumor subtypes according to the dissimilarity of their gene expression from that of stem cells and fully differentiated tissue, and thereby construct a phylogenetic tree of cancer. We validate our methodology with expression data of leukemia and liposarcoma subtypes and then apply it to a broader group of sarcomas and of breast cancer subtypes. This ranking of tumor subtypes resulting from the application of our methodology allows the identification of genes correlated with differentiation and may help to identify novel therapeutic targets. Our algorithm represents the first phylogeny-based tool to analyze the differentiation status of human tumors.

    In contrast to asexually reproducing cancer cell populations, pedigrees of sexually reproducing populations cannot be represented by phylogenetic trees. Pedigrees are directed acyclic graphs (DAGs) and therefore more closely resemble phylogenetic networks, where reticulate events are indicated by vertices with two incoming arcs. In the second part of the thesis, we present a software package for pedigree reconstruction in natural populations using co-dominant genomic markers such as microsatellites and single nucleotide polymorphisms (SNPs). If available, the algorithm makes use of prior information such as known relationships (sub-pedigrees) or the age and sex of individuals. Statistical confidence is estimated by Markov chain Monte Carlo (MCMC) sampling. The accuracy of the algorithm is demonstrated for simulated data as well as an empirical data set with known pedigree. The parentage inference is robust even in the presence of genotyping errors. We further demonstrate the accuracy of the algorithm on simulated clonal populations. We show that the joint estimation of parameters of interest such as the rate of self-fertilization or clonality is possible with high accuracy even with marker panels of moderate power. Classical methods can only assign a very limited number of statistically significant parentages in this case and would therefore fail. The method is implemented in fast, easy-to-use open-source software that scales to large datasets with many thousands of individuals.

    Table of contents:
    Abstract
    Acknowledgments
    1 Introduction
    2 Cancer Phylogenies
        2.1 Introduction
        2.2 Background
            2.2.1 Phylogenetic Trees
            2.2.2 Microarrays
        2.3 Methods
            2.3.1 Dataset compilation
            2.3.2 Statistical Methods and Analysis
            2.3.3 Comparison of our methodology to other methods
        2.4 Results
            2.4.1 Phylogenetic tree reconstruction method
            2.4.2 Comparison of tree reconstruction methods to other algorithms
            2.4.3 Systematic analysis of methods and parameters
        2.5 Discussion
    3 Wild Pedigrees
        3.1 Introduction
        3.2 The molecular ecologist’s tools of the trade
            3.2.1 Sibship inference and parental reconstruction
            3.2.2 Parentage and paternity inference
            3.2.3 Multigenerational pedigree reconstruction
        3.3 Background
            3.3.1 Pedigrees
            3.3.2 Genotypes
            3.3.3 Mendelian segregation probability
            3.3.4 LOD Scores
            3.3.5 Genotyping Errors
            3.3.6 IBD coefficients
            3.3.7 Bayesian MCMC
        3.4 Methods
            3.4.1 Likelihood Model
            3.4.2 Efficient Likelihood Calculation
            3.4.3 Maximum Likelihood Pedigree
            3.4.4 Full siblings
            3.4.5 Algorithm
            3.4.6 Missing Values
            3.4.7 Allele frequencies
            3.4.8 Rates of Self-fertilization
            3.4.9 Rates of Clonality
        3.5 Results
            3.5.1 Real Microsatellite Data
            3.5.2 Simulated Human Population
            3.5.3 Simulated Clonal Plant Population
        3.6 Discussion
    4 Conclusions
    A FRANz
        A.1 Availability
        A.2 Input files
            A.2.1 Main input file
            A.2.2 Known relationships
            A.2.3 Allele frequencies
            A.2.4 Sampling locations
        A.3 Output files
        A.4 Web 2.0 Interface
    List of Figures
    List of Tables
    List of Abbreviations
    Bibliography
    Curriculum Vitae
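
    The thesis outline lists a Mendelian segregation probability as background to the pedigree likelihood. A minimal sketch of that standard quantity follows. It assumes a diploid organism, co-dominant markers, unlinked loci, no missing data, and no genotyping errors (which the thesis method does handle). It is not taken from the FRANz code; the function names and the toy microsatellite allele sizes are hypothetical.

```python
from fractions import Fraction
from itertools import product


def transmission_probability(offspring, mother, father):
    """P(offspring genotype | both parental genotypes) at one locus.

    Each genotype is a pair of alleles. Each parent passes on either of
    its two alleles with probability 1/2, giving four equally likely
    combinations; we count those matching the unordered offspring pair.
    """
    target = sorted(offspring)
    hits = sum(1 for m, f in product(mother, father) if sorted((m, f)) == target)
    return Fraction(hits, 4)


def multilocus_probability(offspring, mother, father):
    """Product of per-locus probabilities, assuming independent loci."""
    prob = Fraction(1)
    for o, m, f in zip(offspring, mother, father):
        prob *= transmission_probability(o, m, f)
    return prob


# Toy genotypes at two microsatellite loci (made-up allele sizes).
child = [(120, 124), (88, 90)]
mum = [(120, 120), (88, 88)]
dad = [(124, 128), (90, 90)]

p = multilocus_probability(child, mum, dad)
print(p)
```

    A probability of zero at any locus excludes the candidate parent pair under this error-free model, which is why practical methods add an explicit genotyping-error term.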

    Medoidshift clustering applied to genomic bulk tumor data.

    Despite the enormous medical impact of cancers and intensive study of their biology, detailed characterization of tumor growth and development remains elusive. This difficulty occurs in large part because of enormous heterogeneity in the molecular mechanisms of cancer progression, both tumor-to-tumor and cell-to-cell in single tumors. Advances in genomic technologies, especially at the single-cell level, are improving the situation, but these approaches are held back by limitations of the biotechnologies for gathering genomic data from heterogeneous cell populations and the computational methods for making sense of those data. One popular way to gain the advantages of whole-genome methods without the cost of single-cell genomics has been the use of computational deconvolution (unmixing) methods to reconstruct clonal heterogeneity from bulk genomic data. These methods, too, are limited by the difficulty of inferring genomic profiles of rare or subtly varying clonal subpopulations from bulk data, a problem that can be computationally reduced to that of reconstructing the geometry of point clouds of tumor samples in a genome space. Here, we present a new method to improve that reconstruction by better identifying subspaces corresponding to tumors produced from mixtures of distinct combinations of clonal subpopulations. We develop a nonparametric clustering method based on medoidshift clustering for identifying subgroups of tumors expected to correspond to distinct trajectories of evolutionary progression. We show on synthetic and real tumor copy-number data that this new method substantially improves our ability to resolve discrete tumor subgroups, a key step in the process of accurately deconvolving tumor genomic data and inferring clonal heterogeneity from bulk data.
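
    For readers unfamiliar with medoidshift clustering, here is a simplified sketch of the general idea, not the variant developed in this paper. Each sample points to the data point that minimizes the summed squared distance to that sample's neighbours, and following these pointers to a fixed point gives the cluster mode. The flat kernel, the bandwidth value, the cycle guard, and the 1-D toy points are all assumptions made for illustration.

```python
import math


def medoidshift(points, bandwidth):
    """Return, for each point, the index of the mode it converges to."""
    n = len(points)
    # Pairwise squared Euclidean distances.
    sq = [[math.dist(p, q) ** 2 for q in points] for p in points]
    h2 = bandwidth ** 2

    # One shift per point: the data point minimizing the summed squared
    # distance to the neighbours (flat kernel) of point i.
    nxt = []
    for i in range(n):
        neighbours = [j for j in range(n) if sq[i][j] <= h2]
        nxt.append(min(range(n), key=lambda k: sum(sq[k][j] for j in neighbours)))

    # Follow the pointers until a fixed point; the visited set guards
    # against cycles that ties could in principle create.
    modes = []
    for i in range(n):
        seen = set()
        r = i
        while nxt[r] != r and r not in seen:
            seen.add(r)
            r = nxt[r]
        modes.append(r)
    return modes


# Two well-separated toy groups on a line (made-up values).
toy_points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
modes = medoidshift(toy_points, bandwidth=1.0)
print(modes)
```

    Because the candidate modes are always data points, the procedure needs only a pairwise distance matrix, and the number of clusters falls out of the bandwidth rather than being fixed in advance.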

    A unified phylogeny-based nomenclature for histone variants

    Histone variants are non-allelic protein isoforms that play key roles in diversifying chromatin structure. The known number of such variants has greatly increased in recent years, but the lack of naming conventions for them has led to a variety of naming styles, multiple synonyms and misleading homographs that obscure variant relationships and complicate database searches. We propose here a unified nomenclature for variants of all five classes of histones that uses consistent but flexible naming conventions to produce names that are informative and readily searchable. The nomenclature builds on historical usage and incorporates phylogenetic relationships, which are strong predictors of structure and function. A key feature is the consistent use of punctuation to represent phylogenetic divergence, making explicit the relationships among variant subtypes that have previously been implicit or unclear. We recommend that by default new histone variants be named with organism-specific paralog-number suffixes that lack phylogenetic implication, while letter suffixes be reserved for structurally distinct clades of variants. For clarity and searchability, we encourage the use of descriptors that are separate from the phylogeny-based variant name to indicate developmental and other properties of variants that may be independent of structure.

    The Evolutionary Dynamics of the Lion Panthera leo Revealed by Host and Viral Population Genomics

    The lion Panthera leo is one of the world's most charismatic carnivores and is one of Africa's key predators. Here, we used a large dataset from 357 lions comprising 1.13 megabases of sequence data and genotypes from 22 microsatellite loci to characterize its recent evolutionary history. Patterns of molecular genetic variation in multiple maternal (mtDNA), paternal (Y-chromosome), and biparental nuclear (nDNA) genetic markers were compared with patterns of sequence and subtype variation of the lion feline immunodeficiency virus (FIVPle), a lentivirus analogous to human immunodeficiency virus (HIV). In spite of the ability of lions to disperse long distances, patterns of lion genetic diversity suggest substantial population subdivision (mtDNA ΦST = 0.92; nDNA FST = 0.18), and reduced gene flow, which, along with large differences in sero-prevalence of six distinct FIVPle subtypes among lion populations, refute the hypothesis that African lions consist of a single panmictic population. Our results suggest that extant lion populations derive from several Pleistocene refugia in East and Southern Africa (∼324,000–169,000 years ago), which expanded during the Late Pleistocene (∼100,000 years ago) into Central and North Africa and into Asia. During the Pleistocene/Holocene transition (∼14,000–7,000 years ago), another expansion occurred from southern refugia northwards towards East Africa, causing population interbreeding. In particular, lion and FIVPle variation affirms that the large, well-studied lion population occupying the greater Serengeti Ecosystem is derived from three distinct populations that admixed recently.

    Reconstruction of ancestral brains: Exploring the evolutionary process of encephalization in amniotes

    There is huge divergence in the size and complexity of vertebrate brains. Notably, mammals and birds have bigger brains than other vertebrates, largely because these animal groups established larger dorsal telencephali. Fossil evidence suggests that this anatomical trait could have evolved independently. However, recent comparative developmental analyses demonstrate surprising commonalities in neuronal subtypes among species, although this interpretation is highly controversial. In this review, we introduce intriguing evidence regarding brain evolution collected from recent studies in paleontology and developmental biology, and we discuss possible evolutionary changes in the cortical developmental programs that led to the encephalization and structural complexity of amniote brains. New research concepts and approaches will shed light on the origin and evolutionary processes of amniote brains, particularly the mammalian cerebral cortex.

    The GDR : a novel approach to detect large-scale genomic sequence patterns

    The development of new sequencing technology over the past two decades has allowed deeper dives into the biomolecular aspects of the human recipe. Whole-genome data from several hundred thousand people are already available, but assembling the growing amount of information into a meaningful functional interpretation is complicated and requires new methods. MicroRNA-mRNA interactions form an enormous gene regulatory network that is difficult to predict, even for today's best machine learning algorithms (1). These non-coding elements are involved in nearly all cellular processes in humans, primarily through partially complementary base pairing between microRNA and mRNA, but much about this network's role in our biology remains poorly understood (2-4). New methods are needed to explore genetic variation in this network, which could give new insight into how our genes are regulated. Here, the Group Diversity Ratio (GDR) is presented as a new metric to meet this challenge. The GDR can quantify the evolutionary structure of variation in large amounts of genomic sequence data, with a result that can be statistically validated. The method is based on measuring group structure in a distance-based phylogenetic tree of sequence data, for predefined groups of leaves in the tree. The groups represent a feature that may be related to the sequence data, and the method examines the extent to which the two are associated. The method is primarily useful for quickly obtaining an overview of large amounts of genomic sequence data, which can provide valuable insight for further investigation. To test the method, the GDR was used to identify variation associated with ethnic populations in 3’UTR data from the 1000 Genomes Project (1KGP). The 1KGP was the first major project to address the ethnic bias now present in genome databases, which is a good reason to explore ethnic genetic variation (5). In addition to identifying more than 1000 3’UTR sequences that contain significant ethnicity-specific variation, this study shows the GDR method's high potential for investigating genetic variation at large scale.

    The emergence of new sequencing technologies over the past two decades has enabled us to dive deeper into the biomolecular aspects of the human recipe. Entire genomes from several hundred thousand people are already accessible, but interpreting the connections between the blueprints and the phenotypes is complicated, even for the best-developed machine learning algorithms. Prediction of the microRNA-mRNA targeting network, which is involved in the gene regulation of all living cell processes, is a classic example. These non-coding features make up complex networks of interactions, where microRNAs primarily target 3’UTRs through partial complementary base-pairing. Investigating patterns in such large-scale genomic sequence data therefore requires new approaches. The Group Diversity Ratio (GDR) metric is presented here as a novel approach to aid in this challenge. The GDR quantifies genome-wide structure in large-scale sequence data with a statistically testable result. Patterns are measured for a group feature that may be related to variation in sequence samples, based on phylogenetic distance estimations. It opens opportunities to quickly gain insights into genomic regions of interest and can be used to guide further research. To demonstrate the use of the GDR metric, ethnicity-associated variation patterns in more than 1000 human 3’UTRs were identified with the GDR. The study set was from the 1000 Genomes Project, which was the first major effort to address the problem of ethnic bias in genetic studies and contained more than 2500 whole-genome sequences from 26 ethnic lineages. In addition to detecting significantly distinct 3’UTR elements for ethnic populations, the key finding of this study was the high potential of the GDR to facilitate more high-throughput characterization of genomic sequence data.

    M-BIA